How to draw the Hilbert Curve using a computer program

In this note we will learn how to draw the Hilbert Curve using a computer program written in Scheme.

There are a number of descriptions in books, papers, and online that talk about how to draw the curve. (For a list of the materials that I reviewed while trying to learn this, see the References section below.) Unfortunately, I found most of the descriptions very difficult to understand!

Most descriptions of how to draw the Hilbert curve seem to fall into one of the following categories:

  1. Using magic tables of numbers to generate coordinates, which is fine if you already understand how to construct the curve and can work backwards from there to figure out how to generate those magic numbers. But it doesn’t really help you understand how to draw it in the first place.
  2. A “grammatical” approach using L-systems (about which see the wiki article linked above), which is fine if you want to take a detour into writing an interpreter for L-systems. Unfortunately, I did not — or at least, I did not think it would teach me much about how to actually draw the Hilbert curve. But it’s definitely a cool approach!
  3. Iterating over C arrays and doing magical bitshifts to represent both the points on the curve and the various rotations/reflections that are applied to those points.

In my opinion, none of these approaches are really useful for learning. They are fine if you are looking for implementation techniques for something you already understand.

Thankfully all was not lost. I spent a lot of time reading, and scratching my head, and drawing curves in my notebook. Then, I found the excellent article Crinkly Curves by Brian Hayes. In addition to the great information in the article, Hayes linked to a really helpful visualization on his website called “Constructing the Hilbert Curve”. I urge you to try it, since seeing the visualization is important to what follows.

Seeing this is what helped me finally understand. Thank you, Brian Hayes!

The algorithm displayed in Hayes’ visualization is approximately what I ended up implementing, and it looked like this in Scheme:

(define (hilbert order width)
  (let loop ((count 0)
             (shape (h0 width)))
    (if (= count order)
        shape
        (let* ((next-shape (shrink-shape shape 0.5))
               (upper-left (reflect-shape (rotate-shape next-shape (* 3.14159 1.5))))
               (upper-right (translate-shape next-shape 'east width))
               (lower-left (translate-shape next-shape 'south width))
               (lower-left* (reflect-shape (rotate-shape lower-left (* 3.14159 .5))))
               (lower-right (translate-shape next-shape 'southeast width)))
          (loop (+ count 1)
                (merge-shapes upper-left upper-right lower-right lower-left*))))))

It works like this:

  1. Given a “shape” (starting with the zeroth-order Hilbert curve, which is just a point in the center of a square), make 4 copies of that shape, each of which fits in a square 1/2 the width of the original square. Then, “move” each copy to one of the 4 corners of the original square. (The “move” is accomplished by a call to translate-shape as shown above; a sketch of these helper procedures follows this list.)
  2. Depending on which corner of the full-sized square the copy is now in, it may need to be rotated (spun) around its axis so it points in the right direction. In some cases it may also need to be reflected so that the locality of the points in the curve is correctly preserved. Note that the calls to rotate-shape take an angle in radians, a unit for measuring angles: 360 degrees is equal to (* 3.14159 2) radians. For more information, see this article.
  3. Keep repeating the previous two steps, bumping a counter and accumulating a list of points until we have all of the points for an Nth order curve. We accumulate the list of points using merge-shapes, which is really just append.
  4. Base case: The counter is now equal to the order argument, telling us we have an Nth order Hilbert curve. So we return the accumulated list of points.

These “points” are really just 2-element (X Y) lists, as you can see below:

> (hilbert 2 600)

((74.99970147174234 75.00029852944591) (224.9997014705541 74.99970147174234)
 (225.00029852825764 224.9997014705541)
 (75.00029852944591 225.00029852825764)
 (75 375)
 (75 525)
 (225 525)
 (225 375)
 (375 375)
 (375 525)
 (525 525)
 (525 375)
 (525.0000995095512 224.99990049031675)
 (375.00009950968325 225.00009950955126)
 (374.99990049044874 75.00009950968327)
 (524.9999004903167 74.99990049044875))

Using some not-so-interesting utility code that builds an SVG (XML) file and does some geometry things like shape rotation, etc., we can draw this list of points by outputting it into an SVG polyline:

(let ((content (tag->string (svg (shape->polyline (rotate-shape (hilbert 5 600) (* 3.14159 0.5)))))))
  (with-output-to-file "hilbert.svg"
    (lambda ()
      (display content))))
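
(If you are curious what shape->polyline might look like, here is a hedged sketch. It assumes tag->string serializes a simple list-based tag structure and that points are the (X Y) lists shown earlier; the real utility code may differ.)

(define (shape->polyline shape . color)
  ;; Sketch only: build a list-based <polyline> tag for tag->string.
  (define (point->string p)
    (string-append (number->string (car p)) ","
                   (number->string (cadr p)) " "))
  (list 'polyline
        (list (list 'points (apply string-append (map point->string shape)))
              (list 'stroke (if (null? color)
                                "black"
                                (symbol->string (car color))))
              (list 'fill "none"))))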

The file hilbert.svg output by the above code looks like this:

We can make an even prettier picture if we stack the curves atop each other in the same image as follows:

(let ((content (tag->string
                (svg (reverse (list
                               (shape->polyline (rotate-shape (hilbert 1 600) (* 3.14159 0.5)) 'black)
                               (shape->polyline (rotate-shape (hilbert 2 600) (* 3.14159 0.5)) 'red)
                               (shape->polyline (rotate-shape (hilbert 3 600) (* 3.14159 0.5)) 'blue)
                               (shape->polyline (rotate-shape (hilbert 4 600) (* 3.14159 0.5)) 'orange)
                               (shape->polyline (rotate-shape (hilbert 5 600) (* 3.14159 0.5)) 'yellow)))))))
  (with-output-to-file "multi-hilbert.svg" (lambda () (display content))))

Here is the image in multi-hilbert.svg:

References

Software design critique: dnf

I just posted the following comment on a Reddit thread entitled “Strange/confusing dnf and package management quirks”:

Possibly related: I just spun up an old Fedora (27) box I hadn’t used in several years, and went through the process of reacclimating myself to dnf to upgrade things, etc.

In general, I think the “porcelain” for dnf is not very good. It feels like there is no designer (in the Fred Brooks sense) in charge of its development.

It would be nice if the hypothetical chief dnf designer could limit themselves to like 5-7 commands (aka human working memory) with a “flags budget” of 1-2 flags per command. The current situation feels very messy, like people are just hacking on stuff (which is probably what is happening, since this is software).

All I really need from a dnf user porcelain is:

  • dnf update: update all packages on the system to latest versions
  • dnf search $string : show all packages matching $string
  • dnf upgrade $version: just do all the crap to upgrade me to version $version. Especially don’t nag me about stupid intermediate states (‘did you run “dnf upgrade --verify”?’ or whatever), which is, frankly, a sign of bad design – if the software knows about the intermediate state, then it should handle it
  • dnf install $pkg: install $pkg
  • dnf uninstall $pkg: uninstall $pkg

Everything else IMO should be punted to RPM and custom scripts, for people who have weird needs.

I’m probably dropping a bunch of use cases on the floor that are “really important” to that one dnf hacker or enterprise customer. But the design above would be perfect for my (end-user) needs. Current situation is very “google random commands, copypasta, and pray”. Current design has way too many bolts and knobs sticking out.

In general, I feel that most software design eventually falls victim to the “thousand hackers making their little semi-random PRs that make sense individually” approach unless there is a big meanie design chief standing at the bridge saying “YOU SHALL NOT PASS”. Which there approximately never is.

A full implementation of breadth-first search of a graph

In this post I’ll share a full, standalone implementation of breadth-first search of a graph. Breadth-first search (hereafter BFS) is used in many graph algorithms. In particular it is useful for finding the shortest path (in terms of fewest vertices, or “hops”) between two vertices A and B.

There are many pseudocode examples showing breadth-first search of a graph on the interwebs, and lots of textual descriptions. There are also many videos on Youtube talking about this algorithm.

Unfortunately, despite all of the above, I had a really hard time finding the complete source code of a graph traversal that I could read and study — at least, one that was at the right level of complexity for me to actually learn how it works. A lot of the code I found online or in books was too abstract and hid implementation details from me by relying on “helper” libraries which were not really explained. For example, this seemed to be the case with the Java code from Sedgewick. The underlying data structures of most algorithms’ implementations were hidden away in library code which I would have needed to study to understand what was really happening.

On the other hand, the couple of production-quality graph implementations I looked at were very hard to read for someone not very familiar with the code and the algorithm already.

I knew that in order to really learn how this algorithm works, I needed to aim somewhere in the middle. In the end, I decided to implement it myself to make sure I understood it. I’m sure that’s pretty common!

So what do I mean by a full, standalone implementation?

I mean that you can download just the code on this page, and run it on any machine where Perl is installed, and it will work. Other than the language it’s written in, there are no external libraries or other dependencies.

Without further ado, here is my take on a full implementation of breadth-first search of a graph. If you download only the code on this page and run it on your machine, it will work – the batteries are included, as they say.

(Incidentally, the statement above is also true of the Scheme code from Simple Network Search in Scheme, if you’d rather look at an implementation in a non-ALGOL language. However, that code is not really “my own creation”, it’s merely ported to Scheme from a Lisp book.)

Algorithm prose description

At a high level, the way BFS works is:

  1. Push the starting vertex A onto a queue.
  2. While the queue is not empty, get the next vertex from it (in this case A, since we just added it).
  3. Look at each neighboring vertex N of A and take the following steps:
     1. If N has already been marked as visited, skip it and go to the next neighbor of A.
     2. Add the pair (N, A) to a spanning tree (really a reverse linked list). If we find the end vertex B, we will walk this spanning tree backward to find the path from B back to the start vertex A.
     3. If the currently visited neighbor N is the end vertex B we’re looking for, walk the spanning tree backward to build the path taken from B back to A.
     4. If N is not the end vertex, push N onto the queue to be processed.
     5. Finally, mark N as visited.

Pseudocode

To supplement the prose description above, here is some pseudocode for BFS. It maps pretty directly onto the concrete implementation we will get into in the next section. I did complain a little above about pseudocode, but of course I’ll subject you to mine just the same. :-} I do think this is a reasonably “complete” pseudocode that you could actually use to implement BFS, unlike many examples that I found.

    function bfs(start, end, graph) {

        push start onto queue
        mark start as seen
        make a spanning tree for storing the path from start to end

        while queue is not empty {
            vertex = get next item from queue
            for each neighbor of vertex in the graph {
                if neighbor has been seen {
                    next
                }
                add (neighbor, vertex) to the spanning tree
                if this neighbor is the end node we want {
                    walk the spanning tree, printing path from start to end
                    exit
                }
                else {
                    push neighbor onto the queue
                }
                mark this neighbor as visited
            }
        }
      print "No path found"
      exit
    }

Implementation

The code below shows a direct implementation of BFS in Perl. The graph it will search is the one shown in the diagram at the top of this post. In code, the data structure for the graph is as shown below. It’s an array of arrays: the first element of each inner array is a vertex, followed by an array of that vertex’s neighbors. This is the classic “adjacency list” representation of a graph.

    my $graph = [['s', ['a', 'd']],
                 ['a', ['s', 'b', 'd']],
                 ['b', ['a', 'c', 'e']],
                 ['c', ['b']],
                 ['d', ['s', 'a', 'e']],
                 ['e', ['b', 'd', 'f']]];

Next, here is the main loop of BFS. If you know Perl, you can see that it maps pretty much 1:1 to the description in the above pseudocode. The shape of this code would be pretty much the same in another language such as Python or Ruby, with a few small syntactic differences, but hardly any semantic differences.

    sub find_path_between {
        my ( $start, $end, $graph ) = @_;

        return () unless defined $start && defined $end;

        my @path;     # Path so far
        my @queue;    # Vertices still to visit.
        my %seen;     # Vertices already seen.
        my $found;    # Whether we have found the wanted vertex.
        my $st = {};  # Spanning tree, used to find paths.

        if ( $start eq $end ) {
            push @path, $start;
            return @path;
        }

        push @queue, $start;
        $seen{$start}++;

        while (@queue) {
            my $v         = shift @queue;
            my $neighbors = get_neighbors( $v, $graph );

            for my $neighbor (@$neighbors) {
                next if $seen{$neighbor};
                _st_add( $v, $neighbor, $st );
                if ( $neighbor eq $end ) {
                    $found++;
                    @path = _st_walk( $start, $end, $st );
                    return @path;
                }
                else {
                    push @queue, $neighbor;
                }
                $seen{$neighbor}++;
            }
        }
        return $found ? @path : ();
    }

Once the main loop structure is written so that you are walking the vertices of the graph in the correct (breadth-first) order, one part I found slightly tricky was keeping a “trail of bread crumbs” from the end node back to the start. You may have noticed that the code above uses two helper functions — _st_add and _st_walk — that hide that complexity away a little bit. We will now look at them in more detail.

_st_add is trivial, and could have been written directly as a hash table access. The idea is to keep a logical “reverse linked list” structure in a hash table, where each vertex added to the list has a link pointing back to the previous vertex. I hid it from myself in this function so I could read the logic in the main search loop more easily.

    sub _st_add {
        my ( $vertex, $neighbor, $st ) = @_;
        $st->{$neighbor}->{prev} = $vertex;
    }

_st_walk is a little more interesting. As noted above, we kept a trail of bread crumbs from our end vertex (if such exists) back to the start vertex. _st_walk walks that trail of crumbs backward and builds an array holding all of the vertices visited along the way, which it returns to the caller.

    sub _st_walk {
        my ( $start, $end, $st ) = @_;

        my @path;
        push @path, $end;

        my $prev = $st->{$end}->{prev};
        while (1) {
            if ( $prev eq $start ) {
                push @path, $start;
                last;
            }
            push @path, $prev;
            $prev = $st->{$prev}->{prev};
            next;
        }
        return reverse @path;
    }

Finally, you may have noticed that in order to visit all of the neighbors of some node A we have to be able to list them. The function get_neighbors handles that task, along with its helper function _find_index. Let’s look at them in turn.

get_neighbors looks up some node N in the graph, and returns its list of neighbors, if there are any.

    sub get_neighbors {
        my ( $k, $graph ) = @_;

        my $index = _find_index( $k, $graph );

        if ( defined $index ) {
            return $graph->[$index]->[1];
        }
        else {
            return;
        }
    }

_find_index looks up the node N’s index in our array-based graph representation. This is actually not a very efficient way to do this, since it’s doing a linear search over an array. There are ways to speed this up, such as using a hash of arrays for faster vertex lookup (sketched after the listing below), or using a cache. However, I felt it would be better to keep the graph’s data representation as simple as possible for this example. (Incidentally, the below is exactly the sort of somewhat uninteresting but very necessary bookkeeping code I was having a hard time finding in much of the example code in books or online.)

    sub _find_index {
        my ( $wanted, $graph ) = @_;

        # Naive linear search, for now.
        my $i = 0;
        for my $elem (@$graph) {

        # Definedness check here is necessary because we delete
        # elements from the graph by setting them to undef.  In other
        # words, some graph elements can be undef.
            if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
                return $i;
            }
            $i++;
        }
        return;
    }
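
As an aside, here is a hedged sketch of the hash-of-arrays idea mentioned above. It builds an index over the graph once, making neighbor lookup O(1) instead of a linear scan. It is illustrative only and is not used in the listing below.

    # Illustrative sketch: index the adjacency lists by vertex name.
    sub build_neighbor_index {
        my ($graph) = @_;
        my %index;
        for my $elem (@$graph) {
            next unless defined $elem->[0];    # skip deleted entries
            $index{ $elem->[0] } = $elem->[1];
        }
        return \%index;
    }

    # Usage sketch:
    #   my $index     = build_neighbor_index($graph);
    #   my $neighbors = $index->{$k} || [];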

Finally, we have a main function that drives everything. First, we find the shortest path (by number of nodes) from ‘s’ to ‘c’. Then, we find the shortest path from ‘s’ to ‘f’:

    sub main {

        my $graph = [
            [ 's', [ 'a', 'd' ] ],
            [ 'a', [ 's', 'b', 'd' ] ],
            [ 'b', [ 'a', 'c', 'e' ] ],
            [ 'c', ['b'] ],
            [ 'd', [ 's', 'a', 'e' ] ],
            [ 'e', [ 'b', 'd', 'f' ] ]
        ];

        my $start = 's';
        my $end   = 'c';

        my @path = find_path_between( $start, $end, $graph );

        print qq[Path from '$start' to '$end' is: @path\n];

        # Find a second path.
        $end  = 'f';
        @path = find_path_between( $start, $end, $graph );
        print qq[Path from '$start' to '$end' is: @path\n];
    }

Putting everything above together and running it will print the following output:

    Path from 's' to 'c' is: s a b c
    Path from 's' to 'f' is: s d e f

References

1. Aho, Hopcroft, Ullman – Data Structures and Algorithms.

2. Sedgewick – Algorithms in Java, Part 5: Graph Algorithms.

3. Orwant, Hietaniemi, Macdonald – Mastering Algorithms with Perl.

Appendix: The complete program listing

    #!perl

    use strict;
    use warnings;

    sub find_path_between {
        my ( $start, $end, $graph ) = @_;

        return () unless defined $start && defined $end;

        my @path;     # Path so far
        my @queue;    # Vertices still to visit.
        my %seen;     # Vertices already seen.
        my $found;    # Whether we have found the wanted vertex.
        my $st = {};  # Spanning tree, used to find paths.

        if ( $start eq $end ) {
            push @path, $start;
            return @path;
        }

        push @queue, $start;
        $seen{$start}++;

        while (@queue) {
            my $v         = shift @queue;
            my $neighbors = get_neighbors( $v, $graph );

            for my $neighbor (@$neighbors) {
                next if $seen{$neighbor};
                _st_add( $v, $neighbor, $st );
                if ( $neighbor eq $end ) {
                    $found++;
                    @path = _st_walk( $start, $end, $st );
                    return @path;
                }
                else {
                    push @queue, $neighbor;
                }
                $seen{$neighbor}++;
            }
        }
        return $found ? @path : ();
    }

    sub _st_walk {
        my ( $start, $end, $st ) = @_;

        my @path;

        push @path, $end;
        my $prev = $st->{$end}->{prev};
        while (1) {
            if ( $prev eq $start ) {
                push @path, $start;
                last;
            }
            push @path, $prev;
            $prev = $st->{$prev}->{prev};
            next;
        }
        return reverse @path;
    }

    sub _st_add {
        my ( $vertex, $neighbor, $st ) = @_;
        $st->{$neighbor}->{prev} = $vertex;
    }

    sub get_neighbors {
        my ( $k, $graph ) = @_;

        my $index = _find_index( $k, $graph );

        if ( defined $index ) {
            return $graph->[$index]->[1];
        }
        else {
            return;
        }
    }

    sub _find_index {
        my ( $wanted, $graph ) = @_;

        # Naive linear search, for now.
        my $i = 0;
        for my $elem (@$graph) {

        # Definedness check here is necessary because we delete
        # elements from the graph by setting them to undef.  In other
        # words, some graph elements can be undef.
            if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
                return $i;
            }
            $i++;
        }
        return;
    }

    sub main {
        my $graph = [
            [ 's', [ 'a', 'd' ] ],
            [ 'a', [ 's', 'b', 'd' ] ],
            [ 'b', [ 'a', 'c', 'e' ] ],
            [ 'c', ['b'] ],
            [ 'd', [ 's', 'a', 'e' ] ],
            [ 'e', [ 'b', 'd', 'f' ] ]
        ];

        my $start = 's';
        my $end   = 'c';

        my @path = find_path_between( $start, $end, $graph );

        print qq[Path from '$start' to '$end' is: @path\n];

        # Find a second path.
        $end  = 'f';
        @path = find_path_between( $start, $end, $graph );
        print qq[Path from '$start' to '$end' is: @path\n];
    }

    main();

Three views on a loop

Recently I needed to translate a function I had written in Scheme to Common Lisp for … reasons. As it turns out, this was not entirely straightforward. The function was originally written in a very recursive style. In order to have something that would run portably across Common Lisp implementations I needed something that didn’t rely so heavily on recursion.

This was not easy for me. I have written much much more Scheme than Common Lisp in my life, and it feels quite natural to me now to design loops using recursion. So that bridge had to be crossed somehow. This is the story of how I crossed it.

The procedure in question is merge, which is used to implement the core of a merge sort.

Let’s start with the Scheme. The code below shows a straightforward implementation of merge from a Scheme programmer’s perspective (or at least this one’s). As mentioned above, this is the merging procedure that is used to implement the core of a merge sort. It’s very similar to the code presented in Bottom Up Merge Sort in Scheme.

(define (merge l r pred)
  (letrec ((merge-aux
            (lambda (left right pred result)
              (cond ((and (null? left)
                          (null? right))
                     (reverse result))
                    ((and (not (null? left))
                          (not (null? right)))
                     (if (pred (car left) (car right))
                         (merge-aux (cdr left)
                                    right pred (cons (car left) result))
                       (merge-aux left
                                  (cdr right) pred (cons (car right) result))))
                    ((not (null? left))
                     (merge-aux (cdr left) right pred (cons (car left) result)))
                    ((not (null? right))
                     (merge-aux left (cdr right) pred (cons (car right) result)))
                    (else #f)))))
    (merge-aux l r pred '())))

As you can see this code makes heavy use of recursive procedure calls, the default iteration construct in Scheme. As noted above, we can’t expect this sort of code to be supported in portable Common Lisp code, so it needs to be translated.

(Side note: If you are interested in some notes on which Lisp implementations support tail calls, recursive self-calls, etc., see this interesting article.)

Below is my first attempt to translate the Scheme code directly into Common Lisp. It still relies heavily on recursive function calls. If you compile this in a recent CMUCL or Clozure CL the recursion doesn’t seem to bother them, since their compilers support recursion well. And the performance will be pretty good, too. But that won’t be the case in every Lisp.

(Note that we call this function rml/merge because Common Lisp already has a built-in merge — about which more later.)

(defun rml/merge (l r pred)
  (labels ((merge-aux (pred left right result)
             (cond ((and (null left)
                         (null right))
                    (reverse result))
                   ((and (not (null left))
                         (not (null right)))
                    (if (funcall pred (car left) (car right))
                        (merge-aux pred (cdr left)
                                   right (cons (car left) result))
                        (merge-aux pred left
                                   (cdr right) (cons (car right) result))))
                   ((and (not (null left)) (null right))
                    (merge-aux pred (cdr left) right (cons (car left) result)))
                   ((and (null left) (not (null right)))
                    (merge-aux pred left (cdr right) (cons (car right) result)))
                   (t 'FAIL))))
    (merge-aux pred l r '())))

As we discussed, some Common Lisp implementations will not handle this level of recursion. One Lisp in particular was pretty salty about it. :-}

(Screenshot: Genera signaling a stack overflow.)

CLISP and ABCL will also explode pretty much immediately.

It took a bit of thinking, but I remembered reading about something called BLOCK in good old CLtL2. It lets you create a named block that you can later jump out of at your convenience, e.g.,

(block outer
   ;; ... do things
   ;; ok, i'm done!
   (return-from outer my-result))

This reminds me of a similar pattern I’ve used in Perl (or Java for that matter) for early returns.

LOOP: for my $elem (@list) {
    # do things ....
    next LOOP if check($elem);
    # do other things ...
}

Using my newfound weapon (hammer?) block, I was able to slay the recursion dragon as shown below in rml/merge2. This code is speaking Common Lisp, but it still has a heavy Scheme accent! The shape of the computation feels pretty similar, at least.

(defun rml/merge2 (left right pred)
  (let ((left left)
        (right right)
        (result '()))
    (block outer
      (loop repeat 1000000000 do
           (block inner
             (cond ((and (null left) (null right))
                    (return-from outer))
                   ((and (not (null left))
                         (not (null right)))
                    (if (funcall pred (car left) (car right))
                        (progn
                          (push (car left) result)
                          (setf left (cdr left))
                          (return-from inner))
                        (progn
                          (push (car right) result)
                          (setf right (cdr right))
                          (return-from inner))))
                   ((not (null left))
                    (progn
                      (push (car left) result)
                      (setf left (cdr left))
                      (return-from inner)))
                   ((not (null right))
                    (progn
                      (push (car right) result)
                      (setf right (cdr right))
                      (return-from inner)))
                   (t                   ; Should never get here (tm)
                    (error "ERROR: RML/MERGE2: ~A ~A ~A~%" left right result))))))
    (nreverse result)))

At a high level, the code works as follows:

  1. let-bind some variables to the arguments passed in, so we can bash on them at will later on.
  2. Open up an outer block to capture the state of the loop, with a name we can use for an early return. This is what we will jump out to when we’re done.
  3. Start a big loop with an inner block. I’m using a magic number here (one billion) as a shorthand for “just loop forever” since we are going to jump out when we’re done anyway. (“One-billion-element-long lists should be enough for anyone.”)
  4. Inside the loop’s inner block, we model the iteration created by recursive function calls using jumps. In every step of the cond, we jump out to the inner block’s label just as we would in Scheme by making a procedure call, except here we need to be explicit using return-from, whereas Scheme lets us be implicit.
  5. Once left and right are both empty, our work is done, so we jump back out to the outer block. Then, we destructively reverse the result list in place with nreverse, just because we can. (Hey, it was temporary anyway.)

This isn’t very satisfying, though. It felt icky using block, which feels like a building block you’d need to build a looping construct (aka compiler/macro output), but not something “user-level” code should need to use. So I did some more reading of Common Lisp code. In particular I discovered a couple of things:

  • A bare loop with no arguments will happily go on forever; you don’t need to say loop repeat 5 ....
  • Inside a loop, you don’t need to say do (something else) unless, prior to that point, you’ve used loop’s “English” syntax, e.g., loop for i from 1 to 10 do ....
  • Because this isn’t Scheme, we don’t need to do the 2-step dance of “put car of A here, set A to cdr of A there”. We can push the return value of pop onto A directly, and rely on the fact that pop mutates its argument list to handle setting A to cdr of A. Mmmmm, yummy state-bashing.

Thanks to the above, we now have code for rml/merge3. It got a lot shorter and simpler along the way, while saying the same things:

(defun rml/merge3 (left right pred)
  (let ((result '()))
    (loop
       (cond ((and (null left) (null right))
              (return (nreverse result)))
             ((and (not (null left))
                   (not (null right)))
              (if (funcall pred (car left) (car right))
                  (push (pop left) result)
                  (push (pop right) result)))
             ((not (null left))
              (push (pop left) result))
             ((not (null right))
              (push (pop right) result))
             (t                         ; Should never get here (tm)
              (error "ERROR: RML/MERGE2: ~A ~A ~A~%" left right result))))))

Now that I had some code, I needed to make sure it actually gave the right answers (aka the hard part). Because Common Lisp has a merge function built in, I was able to write a bit of test code to check my merge function’s implementation against what the Lisp wizards did. There’s no feeling like standing on the shoulders of giants!

Here is the test code, which you can paste into your REPL and try for yourself. It runs our merge function against the system’s merge 1000 times on the same randomly generated input lists. For each test case, if the outputs match, we output a t; otherwise nil.

(defun make-random-list (size &key max)
  (loop repeat (random size) collecting (random max)))

(defun check-merge-output (function size)
  (let* ((left (make-random-list size :max size))
         (right (make-random-list size :max size))
         (left-copy (copy-list left))
         (right-copy (copy-list right))
         (expected (merge 'list left-copy right-copy #'<))
         (got (funcall function left right #'<)))
    (if (equalp expected got)
        t
        (values nil expected got))))

(defun run-merge-tests (function)
  (loop repeat 1000 collecting
       (if (check-merge-output function 1000) t nil)))

Here’s what the output of a successful run looks like (luckily, I have not yet found any unsuccessful ones):

CL-USER> (run-merge-tests #'rml/merge3)
(T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T ...)

Based on the testing shown here, I think this implementation is Probably Fine ™. I’m happy with it, and it performs well enough for my needs.

In a future post, we’ll use the merge function we’ve shared here to write a merge sort.

Performance Postamble

But what about the performance? It’s not bad, actually! As usual, it can depend on the implementation. Below I’ll list out how it compares in a few implementations. However I should state right now for the record that:

  1. I’m not a performance or benchmarking expert. You should probably just stop reading here, because what follows is likely to be nonsense.
  2. My implementation only works on lists (the standard merge also handles vectors) and does not accept a sort key, as the standard dictates.
  3. My implementation is not battle-hardened from living inside a Common Lisp implementation for decades (the most important point of all).

That said, what follows is what I saw when I ran the test code below on a few implementations:

(defun check-merge-performance (function)
  (time
   (let* ((len 1000000)
          (max 1000000)
          (xs (make-random-list len :max max))
          (ys (make-random-list len :max max)))
     (progn
       (funcall function xs ys #'<)
       (values)))))

(defun check-system-merge-performance ()
  (time
   (let* ((len 1000000)
          (max 1000000)
          (xs (make-random-list len :max max))
          (ys (make-random-list len :max max)))
     (progn
       (merge 'list xs ys #'<)
       (values)))))

Note that the following tests were all run on a several-years-old Windows laptop with spinning disks, slow RAM, and a bunch of other applications running. Did I mention I’m not a performance and benchmarking expert? :-}

Armed Bear Common Lisp

ABCL’s merge implementation did not do so well against mine, but I think I must be missing something. For example, there’s probably a lot of subtlety around JVM performance on Windows I just have no idea about.

Mine ran in about 17 seconds.

CL-USER> (check-merge-performance #'rml/merge3)
17.452 seconds real time
4000684 cons cells
NIL

Unfortunately, theirs took about 5 minutes to do about the same (*) computation. I’m not sure why. It’s possible I’m making some kind of mistake here, but I don’t know what it is. I used the timing code below because at first I thought the ABCL merge check had just hung altogether.

CL-USER> (time
          (let* ((len 1000000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
            (progn
              (merge 'list xs ys #'<)
              (values))))

312.803 seconds real time
412497 cons cells
NIL

(*) It’s not exactly the same, because remember that CL’s merge works on lists or vectors, so there will be some dispatching.

Out of curiosity, I went back and tried a few smaller inputs. ABCL seems to do OK with two 100,000-element lists of random numbers, as shown below. Further, if you look at the number of cons cells, the function does appear to be O(n), which you would expect to see. This makes me think the problem is simply that two lists of 1,000,000 elements in the JVM is just too big a load for my little laptop, somehow.

CL-USER> (time
          (let* ((len 100000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
            (progn
              (merge 'list xs ys #'<)
              (values))))
0.028 seconds real time
4025 cons cells
NIL

CL-USER> (time
          (let* ((len 10000)
                 (max 1000000)
                 (xs (make-random-list len :max max))
                 (ys (make-random-list len :max max)))
            (progn
              (merge 'list xs ys #'<)
              (values))))
2.24 seconds real time
40113 cons cells
NIL

CLISP

CLISP’s implementation is about twice as fast as mine, and uses about half as much memory, as is right and proper.

CL-USER> (check-merge-performance #'rml/merge3)
Real time: 9.634908 sec.
Run time: 9.578125 sec.
Space: 56000000 Bytes
GC: 16, GC time: 0.578125 sec.
; No value
CL-USER> (check-system-merge-performance)
Real time: 4.6764607 sec.
Run time: 4.671875 sec.
Space: 32000000 Bytes
GC: 6, GC time: 0.40625 sec.
; No value

Corman Lisp

Corman Lisp is a Common Lisp implementation for Windows. It’s fast and has really good Win32 integration. I enjoy the IDE a lot.

If you like Lisp, you may enjoy this interesting talk by its creator Roger Corman where he discusses its development.

The system merge is about twice as fast as mine. It seems to come down to my implementation creating a lot of garbage, which is expected given my naive use of push and pop when I should be doing more direct mutation and structure sharing using e.g. rplacd (a sketch of that idea follows the timings below).

(check-merge-performance #'rml/merge3)
Total Execution time: 0.360289 seconds
Time spent garbage collecting: 0.093065 seconds

(check-system-merge-performance)
Total Execution time: 0.180082 seconds
Time spent garbage collecting: 0.0 seconds
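
For the record, here is a hedged sketch of what that rplacd idea might look like: it splices the input lists’ existing cons cells together instead of consing a fresh result, and so it destroys its inputs. I haven’t benchmarked or battle-tested it.

(defun rml/merge-destructive (left right pred)
  ;; Sketch only: destructive merge that reuses the input cells.
  (let* ((head (cons nil nil))          ; dummy cell anchoring the result
         (tail head))
    (loop
       (cond ((null left)  (rplacd tail right) (return (cdr head)))
             ((null right) (rplacd tail left)  (return (cdr head)))
             ((funcall pred (car left) (car right))
              (rplacd tail left)
              (setf tail left
                    left (cdr left)))
             (t
              (rplacd tail right)
              (setf tail right
                    right (cdr right)))))))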

LispWorks (32-bit Hobbyist Edition for Windows)

LispWorks is a proprietary implementation put out by a company in England. Not only is their merge faster, they’re using about half as much memory. All as expected.

P.S. I just bought the “Hobbyist” version of the software, and I can definitely recommend it. Fast compiler, and the IDE and class browser, debugger, etc., are a lot of fun to use.

CL-USER 40 > (check-merge-performance #'rml/merge3)
Timing the evaluation of (LET* ((LEN 100000) (MAX 100000) (XS (MAKE-RANDOM-LIST LEN :MAX MAX)) (YS (MAKE-RANDOM-LIST LEN :MAX MAX))) (PROGN (FUNCALL FUNCTION XS YS (FUNCTION <)) (VALUES)))

User time    =        0.062
System time  =        0.015
Elapsed time =        0.063
Allocation   = 2895872 bytes
0 Page faults

CL-USER 41 > (check-system-merge-performance)
Timing the evaluation of (LET* ((LEN 100000) (MAX 100000) (XS (MAKE-RANDOM-LIST LEN :MAX MAX)) (YS (MAKE-RANDOM-LIST LEN :MAX MAX))) (PROGN (MERGE (QUOTE LIST) XS YS (FUNCTION <)) (VALUES)))

User time    =        0.015
System time  =        0.000
Elapsed time =        0.015
Allocation   = 1434380 bytes
0 Page faults

Anyway that’s all for now. As I said, you should take the “results” above with a grain of salt. Nonetheless it was interesting seeing how the different implementations behaved in an unscientific comparison.

Good Reads: A Place of My Own

(Book: A Place of My Own, by Michael Pollan.)

I just finished a second reading of this book over the course of a couple of months of 10-20 minute sessions in the evenings.

Despite myself, I have to admit: I like it very much. Even though the author’s concerns and way of seeing the world are very different from mine, it was fun and interesting to follow along with him on a meandering and (dare I say) slightly self-indulgent journey through the process of building a little writing cottage in the woods on his property in Connecticut.

The meandering path Pollan takes you on, through various historical texts, architectural trends, and social changes, can be slow going in places but, depending on your point of view, it’s rather luxurious as well. So much writing these days is trying to get to some kind of “point”, it seems. Most of one’s reading can consist of internet listicles with straight-to-the-point, SEO-optimized headlines. Even very good technical nonfiction can become a bit of a drag at times. It’s nice to spend a few hours walking along a scenic and sometimes semi-circular path with an erudite, friendly, and self-effacing (if almost absurdly bourgeois) host like Pollan.

The literary excursions are interspersed with tales of Pollan’s interactions with his architect and a local carpenter whom he hires to help him with his project. These scenes are the most enjoyable ones to read, in my view. What stands out most is Pollan’s way of seeing other people, with a kindness and a tolerance for their foibles that I found endearing.

A final note: if you didn’t already know that architecture as a field is just plain off-the-rails crazy, you will once you’ve finished reading this book. It might be worth reading just for that.

I give it 4 out of 5 stars.

Pythagoras Pie

(Image courtesy Tomohiro Tachi under Creative Commons license.)

Recently I came across a fun programming challenge called the Pythagoras Pie, which was described as:

At a party a pie is to be shared by 100 guests. The first guest gets 1% of the pie, the second guest gets 2% of the remaining pie, the third gets 3% of the remaining pie, the fourth gets 4% and so on.

Write a script that figures out which guest gets the largest piece of pie.

I sat down for a few minutes, and wrote the obvious code. It iterates over the list of guests. For each guest, it calculates how large a piece of pie the guest will get. All the while, it stores the size of the largest piece of pie it has seen so far.

Here is a solution in Perl.

sub slice_pie {
    my $iters   = shift;
    my $pie     = 1;
    my $largest = 0;
    my $winner  = 0;
    for ( 0 .. $iters ) {
        my $iter_value = $_ * .01;
        my $portion    = ( $iter_value * $pie );
        $pie = $pie - $portion;

        if ( $portion >= $largest ) {
            $largest = $portion;
            $winner  = $_;
        }
    }
    print qq[Winner is guest # $winner with the largest portion: $largest\n];
}

slice_pie(100);

The answer, as it turns out, is that the 10th guest gets the largest piece of the pie: 0.0628156509555295, or about 6%.
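
A quick check on why guest 10 wins: guest k receives (k/100) × (1 − 1/100) × (1 − 2/100) × … × (1 − (k−1)/100) of the whole pie, so the ratio of successive portions is ((k+1)/k) × (1 − k/100). That ratio stays above 1 exactly while k² + k < 100, i.e. through k = 9, so the portions grow up to guest 10 and shrink from guest 11 onward.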

Just for fun, I wrote almost the same exact code once again, except this time in Perl 6. Even though this is a straightforward translation using the same basic loop structure, it has a few nice improvements:

  • No need for argument unpacking (saves a horizontal line — vertical compactness is good)
  • Nice type annotations mean we can call an integer an Int, which also helps the compiler
  • No need for parens around the for and if checks
sub slice-pie(Int $iters) {
    my $pie = 1;
    my $largest = 0;
    my Int $winner;

    for 0 .. $iters {
        my $iter_value = $_ * .01;
        my $portion = $iter_value * $pie;
        $pie -= $portion;

        if $portion >= $largest {
            $largest = $portion;
            $winner = $_;
        }
    }
    say qq[Winner is guest number $winner with the largest portion: $largest ];
}

slice-pie(100);

A Perl one liner to generate passwords

I’ve noticed that browsers like Safari and Chrome are helpfully offering to generate secure passwords for me when I create a new login somewhere.

Sometimes this is really nice! Certainly it’s better than having to take a few minutes to compose a new password, especially since the passwords I can easily generate on the spot are … shall we say of questionable quality sometimes, depending on time of day, blood sugar levels, etc.

So just for fun I decided to type out a Perl one liner to generate passwords for me in situations where I don’t necessarily have access to (or want) Safari and friends to do it for me.

I make no claims nor warranties about the “security” of the passwords generated by the following code, but I sure did enjoy writing it.  Just for fun, I did paste an output string into Kaspersky’s online password strength tester, and according to the tester it’s … actually not bad?  (Again: not an expert here)

Anyway, here’s the code.  It loops over an alphanumeric array with some special characters thrown in, grabbing one character at random for each iteration.  It also folds the case of the character if the number of the current iteration is odd (assuming the character is of the sort whose case can be folded, which some aren’t).

$ perl -E '@vals = split "", "0x1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ-?!@^&*()"; $_ % 2 == 0 ? print $vals[ rand(@vals) ] : print fc $vals[ rand(@vals) ] for 0 .. 24; say;'
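
Unrolled into a regular script, an equivalent sketch of the one-liner looks like this:

use strict;
use warnings;
use feature qw(say fc);

my @vals = split "", "0x1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ-?!@^&*()";
for my $i (0 .. 24) {
    my $char = $vals[ rand @vals ];        # grab a random character
    print $i % 2 == 0 ? $char : fc $char;  # fold case on odd iterations
}
say "";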

To give you a sense of the output of this script, here’s the password I typed into the Kaspersky checker for reference:

Vp8vJmNnN*8(CrE8*30*4@JlC

Genera Notes, Part 1/N

I think I’d like to start sharing some screenshots and notes taken while playing with my local Open Genera installation.  In part for historical capture reasons, and in part because I think it’s fun.

I set up the system using the instructions in this Youtube video, which can be tl;dr’d as:

  • install an old-ass Ubuntu in a Virtualbox
  • configure NFS and other random things
  • SSH in with PuTTY and fire up an Open Genera instance via X Windows

The old-ass Ubuntu is allegedly necessary due to some behavior in newer X that breaks Open Genera, but I haven’t verified that yet, only read it.

I’m planning to write up the (Virtualbox on Windows) installation process shown in that video soon for my own future reference.  At that point I’ll probably write a script to automate it as well.

If you don’t use Windows, there is already this excellent tutorial: Running Open Genera 2.0 on Linux.  I’ve exchanged mail with the author of this piece; he seems like quite a nice guy in addition to being pretty knowledgeable.  Apparently Open Genera runs more robustly on a Compaq AlphaServer DS10L (or similar machine), as was originally intended, though that hardware is much slower than modern systems.

13 November 2018

Currently reading the Genera User’s Guide section entitled Getting Acquainted with Dynamic Windows (link is to the exact page of the PDF on Internet Archive!).  There is a list of bookmarks on the right that I’d like to revisit and finish reading.

To add a document to the Bookmarks pane in Document Examiner, either:

  • Visit it (so it’s added automatically)
  • From somewhere else in the interface, when you see a link (AKA a hoverable title for a document or section thereof), press Shift and then click with the middle mouse button (denoted as Sh-Mouse-M by the system – you need a three-button mouse to use this system properly)

Note: the excessive (?) whitespace in the screenshot below is due to the fact that we’re running at 1920×1080, which is my laptop’s default resolution but is probably (?) larger than any physical Lisp Machine monitor that ever existed.  Based on some pictures of actual monitors I’ve seen, I wonder if this environment would profit from running on a vertically oriented monitor as well.  Something to play with.

As I read various docs, I’ve been taking notes in Zmacs.  The Zmacs buffer shown in the next screenshot is actually getting written to the Linux machine’s (virtual) disk, and can thus be backed up, edited from other text editors, etc.  It’s all happening over NFS!  And as you have probably deduced from the window borders, this Genera window is being served over the X Windows system (specifically XMing running on Windows).

Here’s the Zmacs window after being expanded using the System menu (shown, which can be accessed at any time via Sh-Mouse-R):

In addition to the Genera window, there is the “cold load” window that is also displayed via X Windows while Open Genera is running.  And lo!  As I began writing this, Genera crashed by trying to display an ellipse (an image from the ZMail documentation, specifically Conceptual Zmail Architecture), which caused it to try to reference an array out of bounds (I don’t know why, yet).  Here’s the backtrace as shown in the cold load window (the Genera window with Document Examiner just beeped and froze – when that happens it’s time to look at the cold load window):

In the cold load window’s debugger I was able to ascertain the following keys’ meanings, at least on my laptop (confusingly, the keys here do not map to the same keys as in the Genera X window):

  • Shift-E means “eval (program)”, dropping you into Lisp
  • * (asterisk) means Abort.  It popped me back up out of the cold load stream and into Genera (the Document Examiner in particular).  My document window state was nuked, but I was able to click the bookmark to return to the section I was reading.  (Now to go back and see if I can get Document Examiner to crash with a bad array subscript by viewing that page again!).

Note: the above crash(es) happened while I was simultaneously loading CLIM in the Listener, which seems to put a lot of load on the (smallish, Virtualbox’d) system I’m running on.  So it may have something to do with that.  Here’s what some of the output of loading CLIM looks like:

Oh, and another thing: Zmail can read GNU Emacs RMAIL files according to the docs!  I’m not 100% sure what “UNIX” mail format means in this context, but perhaps it means good old mbox?

Anyway I’ve got a lot left to explore on this system.  As it is I’ve been reading the documentation and browsing around in my off hours for the last week or two, and it feels like I’m just getting started.

Announcing flycheck-pod.el, Flycheck support for Perl 5 POD files

Recently I was writing some Plain Old Documentation (aka POD) for one of my Perl modules, and I thought it would be convenient to have Flycheck support while writing docs.

Since I couldn’t find the Elisp code already written to do this, I whipped something up: flycheck-pod.

As of right now, it supports error checking in Emacs’ various POD and Perl modes using podchecker in the background. I doubt it covers all of the features of Pod::Checker though — patches are certainly welcome. See the source code for details.

(Image courtesy Kenneth Lu via Creative Commons license.)

How to use Locate from Emacs on Windows

If you are like me, you like to:

  • Live in Emacs as much as possible to avoid context-switching
  • Set up Emacs so your environment abstracts the OS as much as possible

Being able to sit down at any of my computers and type M-x locate in Emacs is a requirement for me, even if it’s running Windows underneath.

In this post I’ll describe how to set up a locate(1) command on Windows 10, and how to access it from Emacs.

Step 1. Install Locate32

Download and install locate32 on your machine. It doesn’t have an installer, it just gives you a directory full of things, including the locate.exe binary. I put mine in "C:/Users/rml/Programs/locate32/", and added that location to my Windows %PATH%.

Step 2. Tell Emacs where to find it

In Emacs, set the value of the locate-command variable to wherever you ended up putting it. Here’s where it is on my machine:

(setq locate-command "c:/Users/rml/Programs/locate32/locate.exe")

Step 3. Locate all the things

Now when you run the M-x locate command from inside Emacs, it should give you a Dired buffer of results, the same way it does on other systems. Because it’s Dired, you can hit enter on a filename to visit it or mark files in various ways and then operate on them.

Here’s what it looks like on my Windows 10 laptop if I search for the text “svn”: