Just for fun, I’ve begun translating some of the algorithms from Mastering Algorithms with Perl into Scheme. My hope is that I’ll get two things out of this: a better knowledge of algorithms, and of Scheme hacking.
Binary search is one of the first algorithms listed in the book; it’s tricky to write a correct binary search, but I had the Perl code to work from. Let’s see how I did.
What’s binary search?
Binary search is a method for finding a specific item in a sorted list. Here’s how it works:
- Take a guess that the item you want is in the middle of the current search “window” (when you start, the search window is the entire list).
- If the item is where you guessed it would be, return the index (the location of your guess).
- If your guess is “less than” the item you want (based on a comparison function you choose), recur, this time raising the “bottom” of the search window to the midway point.
- If your guess is “greater than” the item you want (based on your comparison function), recur, this time lowering the “top” of the search window to the midway point.
In other words, you cut the size of the search window in half every time through the loop. This gives you a worst-case running time of about
(/ (log n) (log 2)) steps. This means you can find an item in a sorted list of 20,000,000,000 (twenty billion) items in about 34 steps.
Reading lines from a file
Before I could start writing a binary search, I needed a sorted list of items. I decided to work with a sorted list of words from
/usr/share/dict/words, so I wrote a couple of little procedures to make a list of words from a subset of that file. (I didn’t want to read the entire large file into a list in memory.)
format and the Lisp-inspired
#!optional keyword are available in MIT Scheme; they made writing the
re-matches? procedure more convenient.
re-matches?checks if a regular expression matches a string (in this case, a line from a file).
make-list-of-words-matchingis used to loop over the lines of the
wordsfile and return a list of lines matching the provided regular expression.
Now I have the tools I need to make my word list.
(load-option 'format) (define (re-matches? re line #!optional display-matches) ;; Regex String . Boolean -> Boolean "Attempt to match RE against LINE. Print the match if DISPLAY-MATCHES is set." (let ((match (re-string-match re line))) (if match (if (not (default-object? display-matches)) (begin (format #t "|~A|~%" (re-match-extract line match 0)) #t) #t) #f))) (define (make-list-of-words-matching re file) ;; Regex String -> List "Given a regular expression RE, loop over FILE, gathering matches." (call-with-input-file file (lambda (port) (let loop ((source (read-line port)) (sink '())) (if (eof-object? source) sink (loop (read-line port) (if (re-matches? re source) (cons source sink) sink)))))))
Since I am not one of the 10% of programmers who can implement a correct binary search on paper, I started out by writing a test procedure. The test procedure grew over time as I found bugs and read an interesting discussion about the various edge cases a binary search procedure should handle. These include:
- Empty list
- List has one word
- List has two word
- Word is not there and “less than” anything in the list
- Word is not there and “greater than” anything in the list
- Word is first item
- Word is last item
- List is all one word
- If multiple copies of word are in list, return the first word found (this could be implemented to return the first or last duplicated word)
Furthermore, I added a few “sanity checks” that check the return values against known outputs. Here are the relevant procedures:
assert=checks two numbers for equality and prints a result
assert-equalchecks two Scheme objects against each other with
equal?and prints a result
run-binary-search-testsreads in words from a file and runs all of our tests
(define (assert= expected got #!optional noise) ;; Int Int -> IO (if (= expected got) (format #t "~A is ~A\t...ok~%" expected got) (format #t "~A is not ~A\t...FAIL~%" expected got))) (define (assert-equal? expected got #!optional noise) ;; Thing Thing -> IO (if (equal? expected got) (format #t "~A is ~A\t...ok~%" expected got) (format #t "~A is not ~A\t...FAIL~%" expected got))) (define (run-binary-search-tests) ;; -> IO "Run our binary search tests using known words from the 'words' file. This file should be in the current working directory." (with-working-directory-pathname (pwd) (lambda () (if (file-exists? "words") (begin (format #t "file 'words' exists, making a list...~%") (let* ((unsorted (make-list-of-words-matching "acc" "words")) (sorted (sort unsorted string<?))) (format #t "doing binary searches...~%") (assert-equal? #f (binary-search "test" '())) ; empty list (assert-equal? #f (binary-search "aardvark" sorted)) ; element absent and too small (assert-equal? #f (binary-search "zebra" sorted)) ; element absent and too large (assert= 0 (binary-search "accusive" '("accusive"))) ; list of length one (assert= 0 (binary-search "acca" sorted)) ; first element of list (assert= 1 (binary-search "aardvark" '("aardvark" "aardvark" "babylon"))) ; multiple copies of word in list (assert= 1 (binary-search "barbaric" '("accusive" "barbaric"))) ; list of length two (assert= 98 (binary-search "acclamator" sorted)) (assert= 127 (binary-search "aardvark" (map (lambda (x) "aardvark") test-list))) ; list is all one value (assert= 143 (binary-search "accomplice" sorted)) (assert= 254 (binary-search "accustomedly" sorted)) (assert= 255 (binary-search "accustomedness" sorted)))))))) ; last element of list
The binary search procedure
Finally, here’s the binary search procedure; it uses a couple of helper procedures for clarity.
->intis a helper procedure that does a quick and dirty integer conversion on its argument
split-differencetakes a low and high number and returns the floor of the halfway point between the two
binary-searchtakes an optional
debug-printargument that I used a lot while debugging. The
formatstatements and the optional argument tests add a lot of bulk – now that the procedure is debugged, they can probably be removed. (Aside: I wonder how much “elegant” code started out like this and was revised after sufficient initial testing and debugging?)
(define (->int n) ;; Number -> Int "Given a number N, return its integer representation. N can be an integer or flonum (yes, it's quick and dirty)." (flo:floor->exact (exact->inexact n))) (define (split-difference low high) ;; Int Int -> Int "Given two numbers, return their rough average." (if (= (- high low) 1) 1 (->int (/ (- high low) 2)))) (define (binary-search word xs #!optional debug-print) ;; String List -> Int "Do binary search of list XS for WORD. Return the index found, or #f." (if (null? xs) #f (let loop ((low 0) (high (- (length xs) 1))) (let* ((try (+ low (split-difference low high))) (word-at-try (list-ref xs try))) (cond ((string=? word-at-try word) try) ((< (- high low) 1) #f) ((= (- high try) 1) (if (string=? (list-ref xs low) word) low #f)) ((string<? word-at-try word) (if (not (default-object? debug-print)) (begin (format #f "(string<? ~A ~A) -> #t~%try: ~A high: ~A low: ~A ~2%" word-at-try word try high low) (loop (+ 1 try) high)) ; raise the bottom of the window (loop (+ 1 try) high))) ((string>? word-at-try word) (if (not (default-object? debug-print)) (begin (format #f "(string>? ~A ~A) -> #t~%try: ~A high: ~A low: ~A ~2%" word-at-try word try high low) (loop low (+ 1 try))) ; lower the top of the window (loop low (+ 1 try)))) (else #f))))))
This exercise has taught me a lot.
- Writing correct code is hard. (I’m confident that this code is not correct.) You need to figure out your invariants and edge cases first. I didn’t, and it made things a lot harder.
- It’s been said a million times, but tests are code. The tests required some debugging of their own.
- Once they worked, the tests were extremely helpful. Especially now that I’m at the point where (if this were “for real”) additional features would need to be added, the
formatcalls removed, the procedure speeded up, and so on.
I hope this has been useful to some other aspiring Scheme wizards out there. Happy Hacking!