Binary Search in Scheme

../img/hex-twist-variation-2.jpg

Intro

Just for fun, I’ve begun translating some of the algorithms from Mastering Algorithms with Perl into Scheme. My hope is that I’ll get two things out of this: a better knowledge of algorithms, and of Scheme hacking.

Binary search is one of the first algorithms listed in the book; it’s tricky to write a correct binary search, but I had the Perl code to work from. Let’s see how I did.

What’s binary search?

Binary search is a method for finding a specific item in a sorted list. Here’s how it works:

  1. Take a guess that the item you want is in the middle of the current search “window” (when you start, the search window is the entire list).
  2. If the item is where you guessed it would be, return the index (the location of your guess).
  3. If your guess is “less than” the item you want (based on a comparison function you choose), recur, this time raising the “bottom” of the search window to the midway point.
  4. If your guess is “greater than” the item you want (based on your comparison function), recur, this time lowering the “top” of the search window to the midway point.

In other words, you cut the size of the search window in half every time through the loop. This gives you a worst-case running time of about (/ (log n) (log 2)) steps. This means you can find an item in a sorted list of 20,000,000,000 (twenty billion) items in about 34 steps.

Reading lines from a file

Before I could start writing a binary search, I needed a sorted list of items. I decided to work with a sorted list of words from /usr/share/dict/words, so I wrote a couple of little procedures to make a list of words from a subset of that file. (I didn’t want to read the entire large file into a list in memory.)

Note: Both format and the Lisp-inspired #!optional keyword are available in MIT Scheme; they made writing the re-matches? procedure more convenient.

  • re-matches? checks if a regular expression matches a string (in this case, a line from a file).
  • make-list-of-words-matching is used to loop over the lines of the words file and return a list of lines matching the provided regular expression.

Now I have the tools I need to make my word list.

(load-option 'format)

(define (re-matches? re line #!optional display-matches)
  ;; Regex String . Boolean -> Boolean
  "Attempt to match RE against LINE. Print the match if DISPLAY-MATCHES is set."
  (let ((match (re-string-match re line)))
    (if match
        (if (not (default-object? display-matches))
            (begin (format #t "|~A|~%" (re-match-extract line match 0))
                   #t)
            #t)
        #f)))

(define (make-list-of-words-matching re file)
  ;; Regex String -> List
  "Given a regular expression RE, loop over FILE, gathering matches."
  (call-with-input-file file
    (lambda (port)
      (let loop ((source (read-line port)) (sink '()))
        (if (eof-object? source)
            sink
            (loop (read-line port) (if (re-matches? re source)
                             (cons source sink)
                             sink)))))))

Writing tests

Since I am not one of the 10% of programmers who can implement a correct binary search on paper, I started out by writing a test procedure. The test procedure grew over time as I found bugs and read an interesting discussion about the various edge cases a binary search procedure should handle. These include:

  • Empty list
  • List has one word
  • List has two word
  • Word is not there and “less than” anything in the list
  • Word is not there and “greater than” anything in the list
  • Word is first item
  • Word is last item
  • List is all one word
  • If multiple copies of word are in list, return the first word found (this could be implemented to return the first or last duplicated word)

Furthermore, I added a few “sanity checks” that check the return values against known outputs. Here are the relevant procedures:

  • assert= checks two numbers for equality and prints a result
  • assert-equal checks two Scheme objects against each other with equal? and prints a result
  • run-binary-search-tests reads in words from a file and runs all of our tests
(define (assert= expected got #!optional noise)
  ;; Int Int -> IO
  (if (= expected got)
      (format #t "~A is ~A\t...ok~%" expected got)
      (format #t "~A is not ~A\t...FAIL~%" expected got)))

(define (assert-equal? expected got #!optional noise)
  ;; Thing Thing -> IO
  (if (equal? expected got)
      (format #t "~A is ~A\t...ok~%" expected got)
      (format #t "~A is not ~A\t...FAIL~%" expected got)))

(define (run-binary-search-tests)
  ;; -> IO
  "Run our binary search tests using known words from the 'words' file.
This file should be in the current working directory."
  (with-working-directory-pathname (pwd)
    (lambda ()
      (if (file-exists? "words")
          (begin
            (format #t "file 'words' exists, making a list...~%")
            (let* ((unsorted (make-list-of-words-matching "acc" "words"))
                   (sorted (sort unsorted string<?)))
              (format #t "doing binary searches...~%")
              (assert-equal? #f (binary-search "test" '())) ; empty list
              (assert-equal? #f (binary-search "aardvark" sorted)) ; element absent and too small
              (assert-equal? #f (binary-search "zebra" sorted)) ; element absent and too large
              (assert= 0 (binary-search "accusive" '("accusive"))) ; list of length one
              (assert= 0 (binary-search "acca" sorted)) ; first element of list
              (assert= 1 (binary-search "aardvark" '("aardvark" "aardvark" "babylon"))) ; multiple copies of word in list
              (assert= 1 (binary-search "barbaric" '("accusive" "barbaric"))) ; list of length two
              (assert= 98 (binary-search "acclamator" sorted))
              (assert= 127 (binary-search "aardvark" (map (lambda (x) "aardvark") test-list))) ; list is all one value
              (assert= 143 (binary-search "accomplice" sorted))
              (assert= 254 (binary-search "accustomedly" sorted))
              (assert= 255 (binary-search "accustomedness" sorted)))))))) ; last element of list

The binary search procedure

Finally, here’s the binary search procedure; it uses a couple of helper procedures for clarity.

  • ->int is a helper procedure that does a quick and dirty integer conversion on its argument
  • split-difference takes a low and high number and returns the floor of the halfway point between the two
  • binary-search takes an optional debug-print argument that I used a lot while debugging. The format statements and the optional argument tests add a lot of bulk – now that the procedure is debugged, they can probably be removed. (Aside: I wonder how much “elegant” code started out like this and was revised after sufficient initial testing and debugging?)
(define (->int n)
  ;; Number -> Int
  "Given a number N, return its integer representation.
N can be an integer or flonum (yes, it's quick and dirty)."
  (flo:floor->exact (exact->inexact n)))

(define (split-difference low high)
  ;; Int Int -> Int
  "Given two numbers, return their rough average."
  (if (= (- high low) 1)
      1
      (->int (/ (- high low) 2))))

(define (binary-search word xs #!optional debug-print)
  ;; String List -> Int
  "Do binary search of list XS for WORD. Return the index found, or #f."
  (if (null? xs)
      #f
      (let loop ((low 0) (high (- (length xs) 1)))
        (let* ((try (+ low (split-difference low high)))
               (word-at-try (list-ref xs try)))
          (cond
           ((string=? word-at-try word) try)
           ((< (- high low) 1) #f)
           ((= (- high try) 1) 
            (if (string=? (list-ref xs low) word)
                low
                #f))
           ((string<? word-at-try word)
            (if (not (default-object? debug-print))
                (begin (format #f "(string<? ~A ~A) -> #t~%try: ~A high: ~A low: ~A ~2%"
                               word-at-try word try high low)
                       (loop (+ 1 try) high)) ; raise the bottom of the window
                (loop (+ 1 try) high)))
           ((string>? word-at-try word)
            (if (not (default-object? debug-print))
                (begin (format #f "(string>? ~A ~A) -> #t~%try: ~A high: ~A low: ~A ~2%"
                               word-at-try word try high low)
                       (loop low (+ 1 try))) ; lower the top of the window
                (loop low (+ 1 try))))
           (else #f))))))

Takeaways

This exercise has taught me a lot.

  1. Writing correct code is hard. (I’m confident that this code is not correct.) You need to figure out your invariants and edge cases first. I didn’t, and it made things a lot harder.
  2. It’s been said a million times, but tests are code. The tests required some debugging of their own.
  3. Once they worked, the tests were extremely helpful. Especially now that I’m at the point where (if this were “for real”) additional features would need to be added, the format calls removed, the procedure speeded up, and so on.

I hope this has been useful to some other aspiring Scheme wizards out there. Happy Hacking!

(Image courtesy Melisande under Creative Commons license.)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s