Three views on a loop

Recently I needed to translate a function I had written in Scheme to Common Lisp for … reasons. As it turns out, this was not entirely straightforward. The function was originally written in a very recursive style. In order to have something that would run portably across Common Lisp implementations I needed something that didn’t rely so heavily on recursion.

This was not easy for me. I have written much much more Scheme than Common Lisp in my life, and it feels quite natural to me now to design loops using recursion. So that bridge had to be crossed somehow. This is the story of how I crossed it.

The procedure in question is merge, which is used to implement the core of a merge sort.

Let’s start with the Scheme. The code below shows a straightforward implementation of merge from a Scheme programmer’s perspective (or at least this one’s). As mentioned above, this is the merging procedure that is used to implement the core of a merge sort. It’s very similar to the code presented in Bottom Up Merge Sort in Scheme.

(define (merge l r pred)
  (letrec ((merge-aux
            (lambda (left right pred result)
              (cond ((and (null? left)
                          (null? right))
                     (reverse result))
                    ((and (not (null? left))
                          (not (null? right)))
                     (if (pred (car left) (car right))
                         (merge-aux (cdr left)
                                    right pred (cons (car left) result))
                       (merge-aux left
                                  (cdr right) pred (cons (car right) result))))
                    ((not (null? left))
                     (merge-aux (cdr left) right pred (cons (car left) result)))
                    ((not (null? right))
                     (merge-aux left (cdr right) pred (cons (car right) result)))
                    (else #f)))))
    (merge-aux l r pred '())))

As you can see this code makes heavy use of recursive procedure calls, the default iteration construct in Scheme. As noted above, we can’t expect this sort of code to be supported in portable Common Lisp code, so it needs to be translated.

(Side note: If you are interested in some notes on which Lisp implementations support tail calls, recursive self-calls, etc., see this interesting article.)

Below is my first attempt to translate the Scheme code directly into Common Lisp. It still relies heavily on recursive function calls. If you compile this in a recent CMUCL or Clozure CL, the recursion doesn’t bother them, since their compilers optimize these tail calls. The performance will be pretty good, too. But that won’t be the case in every Lisp.

(Note that we call this function rml/merge because Common Lisp already has a built-in merge — about which more later.)

(defun rml/merge (l r pred)
  (labels ((merge-aux (pred left right result)
             (cond ((and (null left)
                         (null right))
                    (reverse result))
                   ((and (not (null left))
                         (not (null right)))
                    (if (funcall pred (car left) (car right))
                        (merge-aux pred (cdr left)
                                   right (cons (car left) result))
                        (merge-aux pred left
                                   (cdr right) (cons (car right) result))))
                   ((and (not (null left)) (null right))
                    (merge-aux pred (cdr left) right (cons (car left) result)))
                   ((and (null left) (not (null right)))
                    (merge-aux pred left (cdr right) (cons (car right) result)))
                   (t 'FAIL))))
    (merge-aux pred l r '())))

As we discussed, some Common Lisp implementations will not handle this level of recursion. One Lisp in particular was pretty salty about it. :-}



CLISP and ABCL will also explode pretty much immediately.

It took a bit of thinking, but I remembered reading about something called BLOCK in good old CLtL2. It lets you create a named block that you can later jump out of at your convenience, e.g.,

(block outer
   ;; ... do things
   ;; ok, i'm done!
   (return-from outer my-result))

This reminds me of a similar pattern I’ve used in Perl (or Java for that matter) for early returns.

LOOP: for my $elem (@list) {
    # do things ....
    next LOOP if check($elem);
    # do other things ...
}
Using my newfound weapon (hammer?) block, I was able to slay the recursion dragon as shown below in rml/merge2. This code is speaking Common Lisp, but it still has a heavy Scheme accent! The shape of the computation feels pretty similar, at least.

(defun rml/merge2 (left right pred)
  (let ((left left)
        (right right)
        (result '()))
    (block outer
      (loop repeat 1000000000 do
           (block inner
             (cond ((and (null left) (null right))
                    (return-from outer))
                   ((and (not (null left))
                         (not (null right)))
                    (if (funcall pred (car left) (car right))
                        (progn
                          (push (car left) result)
                          (setf left (cdr left))
                          (return-from inner))
                        (progn
                          (push (car right) result)
                          (setf right (cdr right))
                          (return-from inner))))
                   ((not (null left))
                    (push (car left) result)
                    (setf left (cdr left))
                    (return-from inner))
                   ((not (null right))
                    (push (car right) result)
                    (setf right (cdr right))
                    (return-from inner))
                   (t                   ; Should never get here (tm)
                    (error "ERROR: RML/MERGE2: ~A ~A ~A~%" left right result))))))
    (nreverse result)))

At a high level, the code works as follows:

  1. let-bind some variables to the arguments passed in, so we can bash on them at will later on.
  2. Open up an outer block to capture the state of the loop, with a name we can use for an early return. This is what we will jump out to when we’re done.
  3. Start a big loop with an inner block. I’m using a magic number here (one billion) as a shorthand for “just loop forever” since we are going to jump out when we’re done anyway. (“One-billion-element-long lists should be enough for anyone.”)
  4. Inside the loop’s inner block, we model the iteration created by recursive function calls using jumps. In every step of the cond, we jump out to the inner block’s label just as we would in Scheme by making a procedure call, except here we need to be explicit using return-from, whereas Scheme lets us be implicit.
  5. Once left and right are both empty, our work is done, so we jump back out to the outer block. Then, we destructively reverse the result list in place with nreverse, just because we can. (Hey, it was temporary anyway.)
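If the control flow is easier to see in a more mainstream notation, here is a quick sketch of the same shape in Python (purely illustrative; the names are mine): continue plays the role of return-from inner, and break the role of return-from outer.

```python
def merge2(left, right, pred):
    """Merge two sorted lists, modeling the recursion with explicit jumps."""
    left, right = list(left), list(right)   # local copies we can bash on
    result = []
    while True:                             # the big loop
        if not left and not right:
            break                           # like (return-from outer)
        if left and right:
            if pred(left[0], right[0]):
                result.append(left.pop(0))
            else:
                result.append(right.pop(0))
            continue                        # like (return-from inner)
        if left:
            result.append(left.pop(0))
            continue
        result.append(right.pop(0))
    return result   # append built it front-to-back, so no final reverse needed
```

Note one small difference: Python’s append builds the result in order, so the final nreverse step has no counterpart here.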

This isn’t very satisfying, though. It felt icky using block, which feels like a building block you’d need to build a looping construct (aka compiler/macro output), but not something “user-level” code should need to use. So I did some more reading of Common Lisp code. In particular I discovered a couple of things:

  • A bare loop with no arguments will happily go on forever; you don’t need a loop repeat 1000000000 clause.
  • Inside a loop, you don’t need to say do (something) unless, prior to that point, you’ve used loop’s “English” keyword syntax, e.g., loop for i from 1 to 10 do ....
  • Because this isn’t Scheme, we don’t need to do the 2-step dance of “put car of A here, set A to cdr of A there”. We can push the return value of pop straight onto the result, and rely on the fact that pop mutates its argument list to handle setting A to cdr of A. Mmmmm, yummy state-bashing.
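For what it’s worth, Python’s list.pop allows the same collapse: it returns the removed element while mutating the list, just like CL’s pop (though popping the head of a Python list costs O(n) per call, unlike popping a cons):

```python
left = [1, 2, 3]
result = []
# One expression removes the head of `left` AND hands it to append,
# the same move as (push (pop left) result) in Common Lisp:
result.append(left.pop(0))
print(left, result)   # [2, 3] [1]
```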

Thanks to the above, we now have code for rml/merge3. It got a lot shorter and simpler along the way, while saying the same things:

(defun rml/merge3 (left right pred)
  (let ((result '()))
    (loop
       (cond ((and (null left) (null right))
              (return (nreverse result)))
             ((and (not (null left))
                   (not (null right)))
              (if (funcall pred (car left) (car right))
                  (push (pop left) result)
                  (push (pop right) result)))
             ((not (null left))
              (push (pop left) result))
             ((not (null right))
              (push (pop right) result))
             (t                         ; Should never get here (tm)
              (error "ERROR: RML/MERGE3: ~A ~A ~A~%" left right result))))))

Now that I had some code, I needed to make sure it actually gave the right answers (aka the hard part). Because Common Lisp has a merge function built in, I was able to write a bit of test code to check my merge function’s implementation against what the Lisp wizards did. There’s no feeling like standing on the shoulders of giants!

Here is the test code, which you can paste into your REPL and try for yourself. It runs our merge function against the system’s merge 1000 times on the same randomly generated input lists. For each test case, if the outputs match, we output a t; otherwise nil.

(defun make-random-list (size &key max)
  (loop repeat (random size) collecting (random max)))

(defun check-merge-output (function size)
  (let* ((left (make-random-list size :max size))
         (right (make-random-list size :max size))
         (left-copy (copy-list left))
         (right-copy (copy-list right))
         (expected (merge 'list left-copy right-copy #'<))
         (got (funcall function left right #'<)))
    (if (equalp expected got)
        (values expected got))))

(defun run-merge-tests (function)
  (loop repeat 1000 collecting
       (if (check-merge-output function 1000) t nil)))
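This differential-testing trick travels well beyond Lisp. Here is the same idea sketched in Python (illustrative only; the names are mine, with heapq.merge standing in for the built-in merge). Note that this sketch sorts its inputs first, since a merge’s contract assumes sorted sequences:

```python
import heapq
import random

def my_merge(left, right):
    """The function under test: a simple two-way list merge."""
    result = []
    while left or right:
        if left and right:
            src = left if left[0] <= right[0] else right
        else:
            src = left or right
        result.append(src.pop(0))   # pop returns the head AND shortens src
    return result

def run_merge_tests(trials=1000, size=1000):
    """Compare my_merge against the library's heapq.merge on random inputs."""
    outcomes = []
    for _ in range(trials):
        left = sorted(random.randrange(size) for _ in range(random.randrange(size)))
        right = sorted(random.randrange(size) for _ in range(random.randrange(size)))
        expected = list(heapq.merge(left, right))    # trusted reference
        got = my_merge(list(left), list(right))      # copies: my_merge mutates
        outcomes.append(expected == got)
    return outcomes
```

Here, all(run_merge_tests()) plays the same role as scanning the printed list for a stray NIL.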

Here’s what the output of a successful run looks like (Luckily, I have not yet found any unsuccessful ones):

CL-USER> (run-merge-tests #'rml/merge3)
(T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T ...)

Based on the testing shown here, I think this implementation is Probably Fine ™. I’m happy with it, and it performs well enough for my needs.

In a future post, we’ll use the merge function we’ve shared here to write a merge sort.

Performance Postamble

But what about the performance? It’s not bad, actually! As usual, it can depend on the implementation. Below I’ll list out how it compares in a few implementations. However I should state right now for the record that:

  1. I’m not a performance or benchmarking expert. You should probably just stop reading here, because what follows is likely to be nonsense.
  2. My implementation only works on lists (not vectors, too), and it does not accept the sort key that the standard says merge must take.
  3. My implementation is not battle-hardened from living inside a Common Lisp implementation for decades (the most important point of all).

That said, what follows is what I saw when I ran the test code below on a few implementations:

(defun check-merge-performance (function)
  (let* ((len 1000000)
         (max 1000000)
         (xs (make-random-list len :max max))
         (ys (make-random-list len :max max)))
    (time (progn (funcall function xs ys #'<)
                 (values)))))

(defun check-system-merge-performance ()
  (let* ((len 1000000)
         (max 1000000)
         (xs (make-random-list len :max max))
         (ys (make-random-list len :max max)))
    (time (progn (merge 'list xs ys #'<)
                 (values)))))

Note that the following tests were all run on a several-years-old Windows laptop with spinning disks, slow RAM, and a bunch of other applications running. Did I mention I’m not a performance and benchmarking expert? :-}

Armed Bear Common Lisp

ABCL’s merge implementation did not do so well against mine, but I think I must be missing something. For example, there’s probably a lot of subtlety around JVM performance on Windows I just have no idea about.

Mine ran in about 17 seconds.

CL-USER> (check-merge-performance #'rml/merge3)
17.452 seconds real time
4000684 cons cells

Unfortunately, theirs took about 5 minutes to do about the same (*) computation. I’m not sure why. It’s possible I’m making some kind of mistake here, but I don’t know what it is. I used the timing code below because at first I thought the ABCL merge check had just hung altogether.

CL-USER> (time
          (let* ((len 1000000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
              (merge 'list xs ys #'<)
              (values)))

312.803 seconds real time
412497 cons cells

(*) It’s not exactly the same, because remember that CL’s merge works on lists or vectors, so there will be some dispatching.

Out of curiosity, I went back and tried a few smaller inputs. ABCL seems to do OK with 2 100,000-element lists of random numbers, as shown below. Further, if you look at the number of cons cells, the function does appear to be O(n), which you would expect to see. This makes me think the problem is simply that 2 lists of 1,000,000 elements in the JVM is just too big a load for my little laptop, somehow.

CL-USER> (time
          (let* ((len 10000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
              (merge 'list xs ys #'<)
              (values)))
0.028 seconds real time
4025 cons cells
NIL
CL-USER> (time
          (let* ((len 100000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
              (merge 'list xs ys #'<)
              (values)))
2.24 seconds real time
40113 cons cells


CLISP

CLISP’s implementation is about twice as fast as mine, and uses about half as much memory, as is right and proper.

CL-USER> (check-merge-performance #'rml/merge3)
Real time: 9.634908 sec.
Run time: 9.578125 sec.
Space: 56000000 Bytes
GC: 16, GC time: 0.578125 sec.
; No value
CL-USER> (check-system-merge-performance)
Real time: 4.6764607 sec.
Run time: 4.671875 sec.
Space: 32000000 Bytes
GC: 6, GC time: 0.40625 sec.
; No value

Corman Lisp

Corman Lisp is a Common Lisp implementation for Windows. It’s fast and has really good Win32 integration. I enjoy the IDE a lot.

If you like Lisp, you may enjoy this interesting talk by its creator Roger Corman where he discusses its development.

The system merge is about twice as fast as mine. It seems to come down to my implementation creating a lot of garbage, which is expected given my naive use of push and pop when I should be doing more direct mutation and structure sharing using e.g. rplacd.

(check-merge-performance #'rml/merge3)
Total Execution time: 0.360289 seconds
Time spent garbage collecting: 0.093065 seconds

(check-system-merge-performance)
Total Execution time: 0.180082 seconds
Time spent garbage collecting: 0.0 seconds

LispWorks (32-bit Hobbyist Edition for Windows)

LispWorks is a proprietary implementation put out by a company in England. Not only is their merge faster, they’re using about half as much memory. All as expected.

P.S. I just bought the “Hobbyist” version of the software, and I can definitely recommend it. Fast compiler, and the IDE, class browser, debugger, etc., are a lot of fun to use.

CL-USER 40 > (check-merge-performance #'rml/merge3)
Timing the evaluation of (LET* ((LEN 100000) (MAX 100000) (XS (MAKE-RANDOM-LIST LEN :MAX MAX)) (YS (MAKE-RANDOM-LIST LEN :MAX MAX))) (PROGN (FUNCALL FUNCTION XS YS (FUNCTION <)) (VALUES)))

User time    =        0.062
System time  =        0.015
Elapsed time =        0.063
Allocation   = 2895872 bytes
0 Page faults
CL-USER 41 > (check-system-merge-performance)

User time    =        0.015
System time  =        0.000
Elapsed time =        0.015
Allocation   = 1434380 bytes
0 Page faults

Anyway that’s all for now. As I said, you should take the “results” above with a grain of salt. Nonetheless it was interesting seeing how the different implementations behaved in an unscientific comparison.

Bottom Up Merge Sort in Scheme


Recently I’ve begun a project to implement a number of basic algorithms in Scheme, which I’d like to eventually grow into a free (as in freedom) ebook. Having just done a Binary Search in Scheme, I thought it would be fun to give merge sort a try.

According to the mighty interwebs, merge sort is a good choice for sorting linked lists (a.k.a., Lisp lists). Unfortunately the only Lisp merge sort implementation examples I’ve been able to find on the web have been recursive, not iterative.

The implementation described here is an iterative, bottom-up merge sort, written in a functional style. (I daren’t say the functional style, lest any real Scheme wizards show up and burn me to a crisp.)

First, generate a list of random numbers

In order to have something to sort, we need a procedure that generates a list of random numbers – note that the docstring is allowed by MIT/GNU Scheme; YMMV with other Schemes.

(define (make-list-of-random-numbers list-length max)
  ;; Int Int -> List
  "Make a list of random integers less than MAX that's LIST-LENGTH long."
  (letrec ((maker
            (lambda (list-length max result)
              (if (= list-length 0)
                  result
                  (maker (- list-length 1) max (cons (random max) result))))))
    (maker list-length max '())))

Then, write a merge procedure

This implementation of the merge procedure is a straight port of the one described on the Wikipedia Merge Sort page, with one minor difference to make the sort faster 1.

An English description of the merge operation is as follows:

  • If either item passed in is a bare number or string (rather than a list), wrap it in a list and recur. (In this example we only care about sorting numbers.)
  • If both lists are empty, return the result.
  • If neither list is empty:
    • If the first item in the first list is “less than” the first item in the second list, cons it onto the result and recur.
    • Otherwise, cons the first item in the second list on the result and recur.
  • If the first list still has items in it, cons the first item onto the result and recur.
  • If the second list still has items in it, cons the first item onto the result and recur.
  • If none of the above conditions are true, return #f. I put this here for debugging purposes while writing this code; now that the procedure is debugged, it is never reached. (Note: “debugged” just means “I haven’t found another bug yet”.)
(define (rml/merge pred l r)
  (letrec ((merge-aux
            (lambda (pred left right result)
              (cond
               ;; A bare number or string? Listify it and recur.
               ((number? left)
                (merge-aux pred (list left) right result))
               ((number? right)
                (merge-aux pred left (list right) result))
               ((string? left)
                (merge-aux pred (list left) right result))
               ((string? right)
                (merge-aux pred left (list right) result))
               ((and (null? left)
                     (null? right))
                (reverse result))
               ((and (not (null? left))
                     (not (null? right)))
                (if (pred (car left)
                          (car right))
                    (merge-aux pred
                               (cdr left)
                               right
                               (cons (car left) result))
                    (merge-aux pred
                               left
                               (cdr right)
                               (cons (car right) result))))
               ((not (null? left))
                (merge-aux pred (cdr left) right (cons (car left) result)))
               ((not (null? right))
                (merge-aux pred left (cdr right) (cons (car right) result)))
               (else #f)))))
    (merge-aux pred l r '())))

We can run a few merges to get a feel for how it works. The comparison predicate we pass as the first argument will let us sort all kinds of things, but for the purposes of this example we’ll stick to numbers:

(rml/merge < '(360 388 577) '(10 811 875 995))
;Value 11: (10 360 388 577 811 875 995)

(rml/merge < '(8 173 227 463 528 817) '(10 360 388 577 811 875 995))
;Value 12: (8 10 173 227 360 388 463 528 577 811 817 875 995)

(rml/merge < 
           '(218 348 486 520 639 662 764 766 886 957 961 964)
           '(8 10 173 227 360 388 463 528 577 811 817 875 995))
;Value 14: (8 10 173 218 227 348 360 388 463 486 520 528 577 639 662 764 766 811 817 875 886 957 961 964 995)

Finally, do a bottom up iterative merge sort

It took me a while to figure out how to do the iterative merge sort in a Schemely fashion. As usual, it wasn’t until I took the time to model the procedure on paper that I got somewhere. Here’s what I wrote in my notebook:

;;  XS                   |      RESULT

'(5 1 2 9 7 8 4 3 6)            '()
    '(2 9 7 8 4 3 6)            '((1 5))
        '(7 8 4 3 6)            '((2 9) (1 5))
            '(4 3 6)            '((7 8) (2 9) (1 5))
                '(6)            '((3 4) (7 8) (2 9) (1 5))
                 '()            '((6) (3 4) (7 8) (2 9) (1 5))

;; XS is null, and RESULT is not of length 1 (meaning it isn't sorted
;; yet), so we recur, swapping the two:

'((6) (3 4) (7 8) (2 9) (1 5))  '()
          '((7 8) (2 9) (1 5))  '((3 4 6))
                      '((1 5))  '((2 7 8 9) (3 4 6))
                           '()  '((1 5) (2 7 8 9) (3 4 6))

;; Once more XS is null, but RESULT is still not sorted, so we swap
;; and recur again

'((1 5) (2 7 8 9) (3 4 6))      '()
                  '(3 4 6)      '((1 2 5 7 8 9))
                       '()      '((3 4 6) (1 2 5 7 8 9))

;; Same story: swap and recur!

'((3 4 6) (1 2 5 7 8 9))        '()
                     '()        '((1 2 3 4 5 6 7 8 9))

;; Finally, we reach our base case: XS is null, and RESULT is of
;; length 1, meaning that it contains a sorted list

'(1 2 3 4 5 6 7 8 9)

This was a really fun little problem to think about and visualize. It just so happens that it fell out in a functional style; usually I don’t mind doing a bit of state-bashing, especially if it’s procedure-local. Here’s the code that does the sort shown above:

(define (rml/merge-sort xs pred)
  (let loop ((xs xs)
             (result '()))
    (cond ((and (null? xs)
                (null? (cdr result)))
           (car result))
          ((null? xs)
           (loop result '()))
          ((null? (cdr xs))
           (loop (cdr xs)
                 (cons (car xs) result)))
          (else
           (loop (cddr xs)
                 (cons (rml/merge pred
                                  (first xs)
                                  (second xs))
                       result))))))

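To connect the walkthrough to something runnable outside Scheme, here is a sketch of the same bottom-up iteration in Python (illustrative only; the names are mine). Unlike the Scheme version, it does pre-explode the input into one-element runs for simplicity (see footnote 1), then merges adjacent runs off one pile onto the other, swapping piles whenever the first empties:

```python
def merge(pred, left, right):
    """Two-way merge of sorted runs, in the spirit of rml/merge."""
    out = []
    left, right = list(left), list(right)
    while left or right:
        if left and right:
            src = left if pred(left[0], right[0]) else right
        else:
            src = left or right
        out.append(src.pop(0))
    return out

def merge_sort(xs, pred=lambda a, b: a < b):
    """Iterative bottom-up merge sort over two piles of runs."""
    if not xs:
        return []
    runs = [[x] for x in xs]      # start from one-element runs
    result = []
    while True:
        if not runs and len(result) == 1:
            return result[0]      # a single run left: it's sorted
        if not runs:              # pass finished: swap the piles
            runs, result = result, []
        elif len(runs) == 1:      # odd run out: carry it over
            result.append(runs.pop())
        else:                     # merge the next adjacent pair of runs
            result.append(merge(pred, runs[0], runs[1]))
            runs = runs[2:]
```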
That’s nice, but how does it perform?

A good test of our merge sort is to compare it to the system’s sort procedure. In the case of MIT/GNU Scheme, we’ll need to compile our code if we hope to get anywhere close to the system’s speed. If your Scheme is interpreted, you don’t have to bother of course.

To make the test realistic, we’ll create three lists of random numbers: one with 20,000 items, another with 200,000, and finally a giant list of 2,000,000 random numbers. This should give us a good idea of our sort’s performance. Here’s the output of timing the first two sorts, 20,000 and 200,000 2:

;;; Load compiled code

(load "mergesort")
;Loading ""... done
;Value: rml/insertion-sort2

;;; Define our lists

(define unsorted-20000 (make-list-of-random-numbers 20000 200000))
;Value: unsorted-20000

(define unsorted-200000 (make-list-of-random-numbers 200000 2000000))
;Value: unsorted-200000

;;; Sort the list with 20,000 items

(with-timing-output (rml/merge-sort unsorted-20000 <))
;Run time:      .03
;GC time:       0.
;Actual time:   .03

(with-timing-output (sort unsorted-20000 <))
;Run time:      .02
;GC time:       0.
;Actual time:   .021

;;; Sort the list with 200,000 items

(with-timing-output (rml/merge-sort unsorted-200000 <))
;Run time:      .23
;GC time:       0.
;Actual time:   .252

(with-timing-output (sort unsorted-200000 <))
;Run time:      .3
;GC time:       0.
;Actual time:   .3

As you can see, our sort procedure is on par with the system’s for these inputs. Now let’s turn up the heat. How about a list with 2,000,000 random numbers?

;;; Sort the list with 2,000,000 items

(define unsorted-2000000 (make-list-of-random-numbers 2000000 20000000))
;Value: unsorted-2000000

(with-timing-output (rml/merge-sort4 unsorted-2000000 <))
;Aborting!: out of memory
;GC #34: took:   0.80 (100%) CPU time,   0.10 (100%) real time; free: 11271137
;GC #35: took:   0.70 (100%) CPU time,   0.90  (81%) real time; free: 11271917
;GC #36: took:   0.60 (100%) CPU time,   0.90  (99%) real time; free: 11271917

(with-timing-output (sort unsorted-2000000 <))
;Run time:      2.48
;GC time:       0.
;Actual time:   2.474

No go. On a MacBook with 4GB of RAM, our merge sort runs out of memory, while the system sort procedure works just fine. It seems the wizards who implemented this Scheme system knew what they were doing after all! :-}

It should be pretty clear at this point why we’re running out of memory. In MIT/GNU Scheme, the system sort procedure uses vectors and mutation (and is no doubt highly tuned for the compiler), whereas we take a relatively brain-dead approach that uses lists and lots of cons-ing. I leave it as an exercise for the reader (or perhaps my future self) to rewrite this code so that it doesn’t run out of memory.

(Image courtesy mag3737 under Creative Commons license.)


1 An earlier implementation started off the sort by “exploding” the list to be sorted so that '(1 2 3) became '((1) (2) (3)). This is convenient for testing purposes, but very expensive. It’s also unnecessary after the first round of merging. We avoid the need to explode the list altogether by teaching merge to accept numbers and listify them when they appear. We could also do the same for strings and other types as necessary.

2 For the definition of the with-timing-output macro, see here.