Monthly Archives: July 2013

Functions, Exponents, and Egg-Producing Hens

What is a function? One way of looking at it is as a rule that maps the elements of one set onto the elements of another. For example, given the set A = {apple, orange, pear}, and the set B = {red, orange, yellow}, the elements of A could map onto the elements of B like so:

         apple -> red
         orange -> orange
         pear -> yellow

The arrow represents the mapping performed by the function. In this case, we might describe the function f applied to x as “get the color of x”. Thus, f(‘pear’) -> ‘yellow’.

Some of my favorite functions are exponential functions, e.g., f(x) = c^x, where c stands for some constant (it doesn’t matter which – it could be 3 or 2.96e12).

Likewise, one of my favorite constants called e. The most practical applications of e have to do with measuring growth rates.

Say for example that you farm chickens for a living. Since your chicken farm is a small one (left to you by your grandfather, perhaps?), you have only 82 chickens. As you know, chickens lay a lot of eggs, so let’s assume that roughly 75% of the eggs are harvested to sell, and your family manages to eat or give away to friends another 5%. This leaves 20% or so of your eggs which will be fertilized and eventually hatch. Assuming a roughly even distribution of male and female chicks, about 10% of the eggs you produce will become egg-laying hens themselves.

Sitting around the kitchen table in the farmhouse one night, you decide that you’d like to double the number of egg-producing hens you have now. You wonder how long it will take, so you get out your trusty calculator and compute the following table:

Year Egg-laying Hens

0 82
1 90.2
2 99.22
3 109.14
4 120.06
5 132.06
6 145.27
7 159.80

It’s not too difficult; you simply multiply your existing number of hens by ten percent and add the result to the principal. Alternatively, you can multiply the total number by 1.1 every year – either way will work, and will eventually show you that it will take about seven years to double your stock of egg-laying hens at your current rate.

What does this have to do with e? Quite a lot, actually. The formula e^rt is used (where r is the rate of change and t is the elapsed time). Since in our chicken farm example the rate of change is 10%, and it takes approximately seven years to double the number of egg-laying hens, we plug in the appropriate values for r and t and find that

e^rt = e^(0.1)(7) = e^0.7 = 2.014

The resulting value, 2.014, is then multiplied by our initial principal to yield an answer of 165.15. Our previous calculations were good enough (about 97% accurate compared to this one), but this method generalizes to other compound growth rate problems. And of course, it’s better for use in a computer program.

References

Bottom Up Merge Sort in Scheme

../img/37-tiles.jpg

Recently I’ve begun a project to implement a number of basic algorithms in Scheme, which I’d like to eventually grow into a free (as in freedom) ebook. Having just done a Binary Search in Scheme, I thought it would be fun to give merge sort a try.

According to the mighty interwebs, merge sort is a good choice for sorting linked lists (a.k.a., Lisp lists). Unfortunately the only Lisp merge sort implementation examples I’ve been able to find on the web have been recursive, not iterative.

The implementation described here is an iterative, bottom-up merge sort, written in a functional style. (I daren’t say the functional style, lest any real Scheme wizards show up and burn me to a crisp.)

First, generate a list of random numbers

In order to have something to sort, we need a procedure that generates a list of random numbers – note that the docstring is allowed by MIT/GNU Scheme; YMMV with other Schemes.

(define (make-list-of-random-numbers list-length max)
  ;; Int Int -> List
  "Make a list of random integers less than MAX that's LIST-LENGTH long."
  (letrec ((maker
            (lambda (list-length max result)
              (let loop ((n list-length) (result '()))
                (if (= n 0)
                    result 
                    (loop (- n 1) (cons (random max) result)))))))
    (maker list-length max '())))

Then, write a merge procedure

This implementation of the merge procedure is a straight port of the one described on the Wikipedia Merge Sort page, with one minor difference to make the sort faster 1.

An English description of the merge operation is as follows:

  • If both items passed in are numbers (or strings), wrap them up in lists and recur. (In this example we only care about sorting numbers)
  • If both lists are empty, return the result.
  • If neither list is empty:
    • If the first item in the first list is “less than” the first item in the second list, cons it onto the result and recur.
    • Otherwise, cons the first item in the second list on the result and recur.
  • If the first list still has items in it, cons the first item onto the result and recur.
  • If the second list still has items in it, cons the first item onto the result and recur.
  • If none of the above conditions are true, return #f. I put this here for debugging purposes while writing this code; now that the procedure is debugged, it is never reached. (Note: “debugged” just means “I haven’t found another bug yet”.)
(define (rml/merge pred l r)
  (letrec ((merge-aux
            (lambda (pred left right result)
              (cond 
               ((and (number? left)
                     (number? right))
                (merge-aux pred 
                           (list left) 
                           (list right) 
                           result))
               ((and (string? left)
                     (string? right))
                (merge-aux pred
                           (list left) 
                           (list right) 
                           result))
               ((and (null? left)
                     (null? right))
                (reverse result))
               ((and (not (null? left))
                     (not (null? right)))
                (if (pred (car left)
                          (car right))
                    (merge-aux pred
                               (cdr left)
                               right
                               (cons (car left) result))
                    (merge-aux pred
                               left
                               (cdr right)
                               (cons (car right) result))))
               ((not (null? left))
                (merge-aux pred (cdr left) right (cons (car left) result)))
               ((not (null? right))
                (merge-aux pred left (cdr right) (cons (car right) result)))
               (else #f)))))
    (merge-aux pred l r '())))

We can run a few merges to get a feel for how it works. The comparison predicate we pass as the first argument will let us sort all kinds of things, but for the purposes of this example we’ll stick to numbers:

(rml/merge < '(360 388 577) '(10 811 875 995))
;Value 11: (10 360 388 577 811 875 995)

(rml/merge < '(8 173 227 463 528 817) '(10 360 388 577 811 875 995))
;Value 12: (8 10 173 227 360 388 463 528 577 811 817 875 995)

(rml/merge < 
           '(218 348 486 520 639 662 764 766 886 957 961 964)
           '(8 10 173 227 360 388 463 528 577 811 817 875 995))
;Value 14: (8 10 173 218 227 348 360 388 463 486 520 528 577 639 662 764 766 811 817 875 886 957 961 964 995)

Finally, do a bottom up iterative merge sort

It took me a while to figure out how to do the iterative merge sort in a Schemely fashion. As usual, it wasn’t until I took the time to model the procedure on paper that I got somewhere. Here’s what I wrote in my notebook:

;;  XS                   |      RESULT
;;---------------------------------------------

'(5 1 2 9 7 8 4 3 6)            '()
    '(2 9 7 8 4 3 6)            '((1 5))
        '(7 8 4 3 6)            '((2 9) (1 5))
            '(4 3 6)            '((7 8) (2 9) (1 5))
                '(6)            '((3 4) (7 8) (2 9) (1 5))
                 '()            '((6) (3 4) (7 8) (2 9) (1 5))

;; XS is null, and RESULT is not of length 1 (meaning it isn't sorted
;; yet), so we recur, swapping the two:

'((6) (3 4) (7 8) (2 9) (1 5))  '()
          '((7 8) (2 9) (1 5))  '((3 4 6))
                      '((1 5))  '((2 7 8 9) (3 4 6))
                           '()  '((1 5) (2 7 8 9) (3 4 6))

;; Once more XS is null, but RESULT is still not sorted, so we swap
;; and recur again

'((1 5) (2 7 8 9) (3 4 6))      '()
                  '(3 4 6)      '((1 2 5 7 8 9))
                       '()      '((3 4 6) (1 2 5 7 8 9))

;; Same story: swap and recur!

'((3 4 6) (1 2 5 7 8 9))        '()
                     '()        '((1 2 3 4 5 6 7 8 9))

;; Finally, we reach our base case: XS is null, and RESULT is of
;; length 1, meaning that it contains a sorted list

'(1 2 3 4 5 6 7 8 9)

This was a really fun little problem to think about and visualize. It just so happens that it fell out in a functional style; usually I don’t mind doing a bit of state-bashing, especially if it’s procedure-local. Here’s the code that does the sort shown above:

(define (rml/merge-sort xs pred)
  (let loop ((xs xs)
             (result '()))
    (cond ((and (null? xs)
                (null? (cdr result)))
           (car result))
          ((null? xs)
           (loop result
                 xs))
          ((null? (cdr xs))
           (loop (cdr xs)
                 (cons (car xs) result)))
          (else
           (loop (cddr xs)
                 (cons (rml/merge <
                              (first xs)
                              (second xs))
                       result))))))

That’s nice, but how does it perform?

A good test of our merge sort is to compare it to the system’s sort procedure. In the case of MIT/GNU Scheme, we’ll need to compile our code if we hope to get anywhere close to the system’s speed. If your Scheme is interpreted, you don’t have to bother of course.

To make the test realistic, we’ll create three lists of random numbers: one with 20,000 items, another with 200,000, and finally a giant list of 2,000,000 random numbers. This should give us a good idea of our sort’s performance. Here’s the output of timing first two sorts, 20,000 and 200,000 2:

;;; Load compiled code

(load "mergesort")
;Loading "mergesort.so"... done
;Value: rml/insertion-sort2

;;; Define our lists

(define unsorted-20000 (make-list-of-random-numbers 20000 200000))
;Value: unsorted-20000

(define unsorted-200000 (make-list-of-random-numbers 200000 2000000))
;Value: unsorted-200000

;;; Sort the list with 20,000 items

(with-timing-output (rml/merge-sort unsorted-20000 <))
;Run time:      .03
;GC time:       0.
;Actual time:   .03

(with-timing-output (sort unsorted-20000 <))
;Run time:      .02
;GC time:       0.
;Actual time:   .021

;;; Sort the list with 200,000 items

(with-timing-output (rml/merge-sort unsorted-200000 <))
;Run time:      .23
;GC time:       0.
;Actual time:   .252

(with-timing-output (sort unsorted-200000 <))
;Run time:      .3
;GC time:       0.
;Actual time:   .3

As you can see, our sort procedure is on par with the system’s for these inputs. Now let’s turn up the heat. How about a list with 2,000,000 random numbers?

;;; Sort the list with 2,000,000 items

(define unsorted-2000000 (make-list-of-random-numbers 2000000 20000000))
;Value: unsorted-2000000

(with-timing-output (rml/merge-sort4 unsorted-2000000 <))
;Aborting!: out of memory
;GC #34: took:   0.80 (100%) CPU time,   0.10 (100%) real time; free: 11271137
;GC #35: took:   0.70 (100%) CPU time,   0.90  (81%) real time; free: 11271917
;GC #36: took:   0.60 (100%) CPU time,   0.90  (99%) real time; free: 11271917

(with-timing-output (sort unsorted-2000000 <))
;Run time:      2.48
;GC time:       0.
;Actual time:   2.474

No go. On a MacBook with 4GB of RAM, our merge sort runs out of memory, while the system sort procedure works just fine. It seems the wizards who implemented this Scheme system knew what they were doing after all! :-}

It should be pretty clear at this point why we’re running out of memory. In MIT/GNU Scheme, the system sort procedure uses vectors and mutation (and is no doubt highly tuned for the compiler), whereas we take a relatively brain-dead approach that uses lists and lots of cons-ing. I leave it as an exercise for the reader (or perhaps my future self) to rewrite this code so that it doesn’t run out of memory.

(Image courtesy mag3737 under Creative Commons license.)

Footnotes:

1 An earlier implementation started off the sort by “exploding” the list to be sorted so that ='(1 2 3)= became ='((1) (2) (3))=. This is convenient for testing purposes, but very expensive. It’s also unnecessary after the first round of merging. We avoid the need to explode the list altogether by teaching merge to accept numbers and listify them when they appear. We could also do the same for strings and other types as necessary.

2 For the definition of the with-timing-output macro, see here.