Make your own iterators in Scheme using closures



I recently bought a copy of Higher Order Perl (also known as HOP) by Mark Dominus. In chapter 4, he introduces iterators. At a high level, an iterator is a black box with an input funnel, an output funnel, and a big red “NEXT” button. You put something into the funnel at time T; then, at some later time T+n, you press the “NEXT” button and something useful comes out.

Iterators are useful in cases when you want to access a potentially huge list of things one at a time, but you don’t have access to a computer with infinite memory.

In this post we’ll look at a moderately useful iterator example: a directory walker that recurses down a directory tree on your file system. The implementation is given in Scheme (specifically, the scsh dialect); the code and the descriptions thereof are heavily inspired by Dominus’ excellent book. However, this post contains the work of an amateur. Any errors or omissions in the following code and explanations are mine.

Iterators are made with closures

In order to build a useful iterator, we need to have a basic understanding of closures. A closure is just a function packaged up with an environment. The environment allows us to store up state between function calls. As noted above, it’s easiest just to think of it as a little black box that we can pass around in our program.

In Scheme and Common Lisp, you build functions with LAMBDA. Here’s a very simple closure example in Scheme, translated from the first example given in HOP:

(define (upto m n)
  (set! m (- m 1)) ;needed since we don't have post-increment ($i++)
  (lambda ()
    (if (< m n)
        (begin (set! m (+ m 1)) m)
        #f)))

This is a function that returns a function (a closure). To use the closure returned by UPTO, assign it to a variable and call it whenever you want the next value. When the iterator is exhausted, it will return #f.

> (define 1to10 (upto 1 10))
> (1to10)
1
> (1to10)
2
> (1to10)
3
> (1to10)
4
> (1to10)
5
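Because the iterator signals exhaustion with #f, draining one into a list takes only a few lines. Here's a hypothetical helper (my name and code, not from HOP) that works with any iterator built this way:

```scheme
;; Hypothetical helper, not from HOP: drain any #f-terminated
;; iterator into a list by calling it until it's exhausted.
(define (iterator->list next)
  (let loop ((acc '()))
    (let ((x (next)))
      (if x
          (loop (cons x acc))
          (reverse acc)))))

(iterator->list (upto 1 5)) ; => (1 2 3 4 5)
```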

A trivial file tree iterator

A more interesting and practical use of this technique is to walk a directory tree on your computer. In the Scheme procedure below, we take a list of directories and then build and return a closure that, when called, will walk the directory tree breadth-first, handing back the name of the next file or directory each time.

(define (dir-walk queue)
  (let ((file #f))
    (lambda ()
      (if (not (null? queue))
          (begin (set! file (car queue))
                 (set! queue (cdr queue))
                 (cond ((file-directory? file)
                        (let ((new-files (directory-files file)))
                          (set! queue (append queue (map (lambda (filename)
                                                           (string-append file "/" filename))
                                                         new-files)))
                          file))
                       ((file-regular? file) file)
                       (else #f)))
          #f))))

The important part to notice is the inner LAMBDA. This is where we create a closure by packaging up a procedure with an environment. The closure’s environment remembers the contents of the variable QUEUE in memory, so we can ask for the elements of QUEUE at our leisure.

Here it is running on my machine:

> (define dir-iter1 (dir-walk '("/Users/rloveland/Desktop/current/")))
> (dir-iter1)
> (dir-iter1)
> (dir-iter1)
> (dir-iter1)
> (dir-iter1)

Just for fun, here’s a version of the same function in Common Lisp. It’s essentially the same as the scsh version, save that it uses the nice CL pathname machinery instead of appending raw strings, and also makes use of the CL-FAD convenience library.

(defun dir-walk (queue)
  (let ((file nil))
    (lambda ()
      (if (not (null queue))
          (progn
            (setf file (car queue))
            (setf queue (cdr queue))
            (cond ((cl-fad:directory-pathname-p file)
                   (let ((new-files (cl-fad:list-directory file)))
                     (setf queue (append queue (mapcar #'(lambda (filename)
                                                           (merge-pathnames filename file))
                                                       new-files)))
                     file))
                  ((cl-fad:file-exists-p file) file)
                  (t nil)))
          nil))))

Usage is also essentially the same:

CL-USER> (defvar *tree-walker* (dir-walk '(#P"/Users/rloveland/Desktop/current/")))
CL-USER> (type-of *tree-walker*)
CL-USER> (funcall *tree-walker*)
CL-USER> (funcall *tree-walker*)
CL-USER> (funcall *tree-walker*)
CL-USER> (funcall *tree-walker*)
CL-USER> (funcall *tree-walker*)
CL-USER> (funcall *tree-walker*)

A slightly less trivial file tree iterator

Listing out files and directories is nice, but not that useful. We’d like a way to see only those files and directories that have some interesting property.

This can be accomplished by passing another argument to our DIR-WALK function: a predicate, INTERESTING?, that tests the current file to see whether we care about it. It’s pretty easy to change DIR-WALK to accept such a function argument.

This time around, when we build our internal queue, we use a call to FILTER to make sure that only interesting files get added.

(define (dir-walk* interesting? queue)
  (let ((file #f))
    (lambda ()
      (if (not (null? queue))
          (begin (set! file (car queue))
                 (set! queue (cdr queue))
                 (cond ((file-directory? file)
                        (let ((new-files (directory-files file)))
                          (set! queue (append queue (filter interesting?
                                                            (map (lambda (filename)
                                                                   (string-append file "/" filename))
                                                                 new-files))))
                          (if (interesting? file) file)))
                       ((interesting? file) file)
                       (else #f)))
          #f))))

And here it is in use; in this example, we pass an INTERESTING? function that asks for only files that are not marked as executable:

> (define dir-iter2 (dir-walk* (lambda (f) (file-not-executable? f)) '("/home/rml/Desktop/current")))
> (dir-iter2)
> (dir-iter2)
> (dir-iter2)
> (dir-iter2)
> (dir-iter2)
> (dir-iter2)
> (dir-iter2)


There are still bugs and unexpected behavior in DIR-WALK*. For example, on the first call to the resulting iterator there is no result due to the one-armed IF. There are also strange things that happen if we want to filter out directories from our results, since we mix together the FILE-DIRECTORY? check and the INTERESTING? check inside the iterator. However, despite these small nits, it can still do useful work on my machine,[1] and it’s a good enough example of using closures to build iterators.
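For what it's worth, one way the one-armed IF could be patched is to loop past uninteresting entries instead of returning an unspecified value. The sketch below is mine, not from HOP or the code above (DIR-WALK** is a hypothetical name), and it changes the behavior slightly by filtering entries as they come off the queue rather than as they are added:

```scheme
;; Sketch of one possible fix (not from the original post): skip
;; uninteresting entries by looping instead of falling through a
;; one-armed IF, so every call yields either a file name or #f.
(define (dir-walk** interesting? queue)
  (lambda ()
    (let loop ()
      (if (null? queue)
          #f
          (let ((file (car queue)))
            (set! queue (cdr queue))
            ;; Directories still go on the queue so we keep descending.
            (if (file-directory? file)
                (set! queue (append queue
                                    (map (lambda (f) (string-append file "/" f))
                                         (directory-files file)))))
            (if (interesting? file) file (loop)))))))
```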

Here’s hoping you’ll enjoy playing with closures and iterators in your own code!



[1] In the grandest internet tradition, “It works on my machine” ™.

Why Libraries and Librarians are Amazing


Libraries: because you cannot have a conscience without a memory.

For some time now I’ve been meaning to write about why libraries (and librarians!) are important. After all, I’m a child of the library. In particular, I’m a child of this library, where I spent many happy hours growing up. As the son of a blue-collar family that was anything but “bookish”, well, I don’t know where I’d be without libraries.

What follows are a collection of random thoughts (linky-blurbs, really) about the general amazingness of libraries and librarians:

  • Librarians fought, and are still fighting, surveillance-state idiocy like the PATRIOT Act.
  • On a lighter note, the Internet Archive also has this collection of classic MS-DOS games that you can play in your browser. When I saw titles like Street Fighter, Sim City, Donkey Kong, and Castle Wolfenstein in this list, I have to admit I kinda freaked out. The future is amazing. Growing up, I had to memorize my friends’ phone numbers! And now we have magic tiny emulated computers in our browser tabs.

See also:

Never trust a corporation to do a library’s job.

(Image courtesy Cher Amio under Creative Commons license.)

Why Markdown is not my favorite text markup language

There are many text markup languages that purport to allow you to write in a simple markup format and publish to the web. Markdown has arguably emerged as the “king” of these formats. I quite like it myself when it’s used for writing short documents with relatively simple formatting needs. However, it falls a bit short when you start to do more elaborate work. This is especially the case when you are trying to do any kind of “serious” technical authoring.

I know that “Markdown” has been used to write technical books. Game Programming Patterns is one excellent example; you can read more about the author’s use of Markdown here, and the script he uses to extend Markdown to meet his needs is here. (I recommend reading all of his essays about how he wrote the book, by the way. They’re truly inspiring.) Based on that author’s experience (and some of my own), I know that Markdown can absolutely be used as a base upon which to build ebooks, websites, wikis, and more. However, this is exactly why I used the term “Markdown” in quotes at the beginning of this paragraph. By the time you’ve extended Markdown to cover your more featureful technical authoring use cases, it really isn’t “just” Markdown anymore. This is fine if you just want to get something done quickly that meets your own needs, but it’s not ideal if you want to work with a meaningful system that can be standardized and built on.

Below I’ll address just a few of the needs of “industrial” technical writing (the kind that I do, ostensibly) where Markdown falls a little short. Lest this come off as too negative, it’s worth stating for the record that a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing. I have turned to such a homebrewed setup myself in times of need. I’ve even written about how awesome writing in Markdown can be. However, this essay is an attempt to capture my thoughts on Markdown’s shortcomings. Like any good internet crank, I reserve the right to pull a Nickieben Bourbaki at a later date.

I. No native table support

If you are doing any kind of large-scale tech docs, you need tables. Although constraints are always good, and a simple list can probably replace 80% of your table usage if you’re disciplined, there are times when you really just need a big honkin’ table. And as much as I’m used to editing raw XML and HTML directly in Emacs (using its excellent tooling to completely sidestep the unwanted “upgrade” to the Confluence editor at $WORK), most writers probably don’t want to author tables directly in HTML, which is the “native” Markdown solution.

II. No native table of contents support

Yes, I can write a script myself to do this. I can also use one of the dozens of such scripts written by others. However, I’d rather have something built in, and consider it a weakness of the format.

III. Forcing the user to fall back to inline HTML is not really OK

Like tables, there are a number of other formatting and layout use cases that Markdown can’t handle natively. As with tables, you must resort to just slapping in some raw HTML. Two reasons why this isn’t so amazing are:

  • It’s hard for an editor to support well, since “regular” text markup and tag-based markup languages are quite different beasts to edit
  • It punts complexity to thousands of users in order to preserve implementation simplicity for a small number of implementors

I can sympathize with the reasoning behind this design decision, since I am usually the guy making his own little hacks that meet simple use cases, but again: not really OK for serious work.

IV. Too many different ways to express the same formatting

This has led to a number of incompatibilities among the different “Markdown” renderers out there. Just a few of the areas where ambiguity exists are: headers, lists, code sections, and links. For an introduction to Markdown’s flexible semantics, see the original syntax docs. Then, for a more elaborate description of the inconsistencies and challenges of rendering Markdown properly, see Why is a spec needed?, written by the CommonMark folks.

V. Too many incompatible flavors

There are too many incompatible flavors of Markdown that each render a document slightly differently. For a good description of the ways different Markdown implementations diverge, see the Babelmark 2 FAQ.

The “incompatible flavors” issue will hopefully be addressed with the advent of the CommonMark Standard, but if you read the spec it doesn’t address points I, II, or III at all. This makes sense from the perspective of the author of a standards document: a spec isn’t very useful unless you can achieve consensus and adoption among all the slightly different implementations out there right now, and Markdown as commonly understood doesn’t try to support those cases anyway.

VI. No native means of validation

There will of course be a reference implementation and tests for CommonMark, which will ensure that the content is valid Markdown, but for large-scale documentation deployments, you really need the ability to validate that the documentation sets you’re publishing have certain properties. These properties might include, but aren’t limited to:

  • “Do all of the links have valid targets?”
  • “Is every page reachable from some other page?”

Markdown doesn’t care about this. And to be fair it never said it would! You are of course free to use other tools to perform all of the validations you care about on the resulting HTML output. This isn’t necessarily so bad (in fact it’s not as bad as points I and II in my opinion, since those actually affect you while you’re authoring), but it’s an issue to be aware of.

This is one area where XML has some neat tooling and properties. Although I suppose you could do something workable with a strict subset of HTML. You could also use pandoc to generate XML, which you then validate according to your needs.


Markdown solves its original use case well, while punting on many others in classic Worse is Better fashion. To be fair to Markdown, it never purported to be anything other than a simple set of formatting conventions for web writing. And it’s worth saying once more that, even given its limitations, a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing.

Even so, I hope I’ve presented an argument for why Markdown is not ideal for large scale technical documentation work.

(Image courtesy Gerwin Sturm under a Creative Commons license.)

How to install Perlbrew on Fedora 20 and use it with (m)ksh


This post details how to go about installing perlbrew on a Fedora 20 Linux system for a user whose default shell is in the ksh family, specifically mksh, which is the shell I use.

The instructions in each section can be used separately if you are just a Fedora user or just a (m)ksh user.

Installing Perlbrew on Fedora 20

If you try to install Perlbrew using the instructions on its website, Fedora barfs. Fortunately, the workaround you need is detailed in this Github issue. (Hint: read the whole thread and then do what wumpus says.)

Using Perlbrew with ksh or mksh

The perlbrew bashrc file uses a few “bashisms” (unnecessarily, I think, but it is a “bash” rc after all). I managed to hack them out and, instead of sourcing perlbrew’s default bashrc file, I now source a kshrc built by applying the patch below.

It’s working well for me so far under mksh, but feel free to leave a comment if it doesn’t work for you and I’ll try to help.

Happy Perlbrewing!

--- bashrc	2014-12-30 12:49:14.110446784 -0500
+++ kshrc	2014-12-30 13:15:58.750454519 -0500
@@ -1,6 +1,5 @@
 __perlbrew_reinit() {
     if [[ ! -d "$PERLBREW_HOME" ]]; then
         mkdir -p "$PERLBREW_HOME"
@@ -13,7 +12,9 @@
 __perlbrew_purify () {
-    local path patharray outsep
+    path=
+    patharray= 
+    outsep=
     if [[ -n "$BASH_VERSION" ]]; then
         IFS=: read -ra patharray <<< "$1"
@@ -30,13 +31,13 @@
 __perlbrew_set_path () {
-    export MANPATH=$PERLBREW_MANPATH${PERLBREW_MANPATH:+:}$(__perlbrew_purify "$(manpath)")
-    export PATH=${PERLBREW_PATH:-$PERLBREW_ROOT/bin}:$(__perlbrew_purify "$PATH")
     hash -r
 __perlbrew_set_env() {
-    local code
+    code=
     code="$($perlbrew_command env $@)" || return $?
     eval "$code"
@@ -64,8 +65,8 @@
 perlbrew () {
-    local exit_status
-    local short_option
+    exit_status=
+    short_option=
     export SHELL
     if [[ $1 == -* ]]; then


Scsh is UNIX, Part 1: Sockets


Inspired by words from Ryan Tomayko and Jacob Kaplan-Moss, I set out to see how difficult it would be to implement a tiny UNIX echo server in scsh. It turns out to be fairly easy. In this post, we’ll cover:

  • How the UNIX socket programming model works at a high level
  • How to write a basic echo server in scsh

Unlike the echo servers of Tomayko and Kaplan-Moss, this one doesn’t use fork (yet). We will add forking to the server in a follow-up post. Hopefully this will make it easier for less experienced UNIX programmers (like me) to get started.

The UNIX socket programming model for dummies

At a high level, the steps to open a server-side socket are:

  • create a socket “thing”
  • toggle some settings on it so the OS knows it’s for interweb use
  • bind it to an address
  • listen for connections on it
  • start accepting those connections

scsh does its part to make this easy by providing a high-level Scheme procedure, BIND-LISTEN-ACCEPT-LOOP, which handles this stuff for you, in addition to some nifty error handling and other bookkeeping chores. But we’re going to ignore that for now and write everything by hand to see how it’s done.
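(For the curious, a call to the high-level helper might look something like the sketch below. The signature here, (bind-listen-accept-loop protocol-family proc arg), is my recollection of the scsh manual, so treat it as an assumption and check the docs for your version.)

```scheme
;; Sketch only: assumes (bind-listen-accept-loop protocol-family proc arg),
;; where PROC is called with the connection's socket and address.
(bind-listen-accept-loop
 protocol-family/internet
 (lambda (sock addr)
   (format (socket:outport sock) "hello~%")
   (force-output (socket:outport sock)))
 49995)
```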

You should be able to enter all of the code in the next section directly in the scsh top-level. You don’t need to open any additional packages; this is just plain old scsh.

Our echo server

Our server takes two arguments: PROC is a procedure that makes the server actually do something; we’ll write it in a minute. PORT is the port you want to serve on, e.g., 49995:

(define (server proc port)
  (let* ((sock (create-socket protocol-family/internet socket-type/stream))
         (addr (internet-address->socket-address internet-address/any port)))
    (set-socket-option sock level/socket socket/reuse-address #t)
    (bind-socket sock addr)
    (listen-socket sock 5)
    (let loop ()
      (call-with-values
          (lambda () (accept-connection sock))
        proc)
      (loop))))

The first thing you’ll notice is that it’s pretty sequential and quite simple really. We just go through the socket opening dance we described earlier: open, configure, bind, listen, and accept.

Since this is an echo server, we’ll just, like, echo stuff:

(define (echo socket address)
  (let* ((in (socket:inport socket))
         (out (socket:outport socket))
         (message (read-line in)))
      (format out "~A~%" message)
      (force-output out)))

As you can see, our ECHO procedure takes a socket and an address. (We don’t do anything with the address in this example, but our procedure needs to “implement this interface”, as they say, in order to work. You can see this for yourself in the scsh 0.6.7 tarball; it’s in scsh/network.scm.)

In our LET*-binding we create some convenient locally bound names for the socket structure’s input and output ports, and then we read a line from the socket’s input port.

Since this is an echo server, we just write the same data back out with FORMAT. We call FORCE-OUTPUT to flush output to the terminal immediately. Otherwise Scheme will wait for the operating system’s output buffer to fill before writing out, and it will appear to the user that nothing is happening.

Trying it out

Let’s start it up and see if it works. Assuming you’ve loaded the procedures above in your scsh image somehow, enter this at the REPL:

> (server echo 49995)

The REPL should appear to “hang” while the server is running. Now go to your terminal and connect to the echo server. There are several ways to do it; here’s what I used:

~ $ telnet localhost 49995
Trying ::1...
telnet: connect to address ::1: Connection refused
Connected to localhost.
Escape character is '^]'.
who's there?
who's there?
i don't know
i don't know
could you stop that?
could you stop that?
fine then
fine then
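If you'd rather test from another scsh process instead of telnet, a client-side sketch might look like the following. It assumes scsh's SOCKET-CONNECT procedure with the signature (socket-connect protocol-family socket-type . args); check the manual for your version.

```scheme
;; Sketch, assuming (socket-connect protocol-family socket-type . args).
;; Connects to the echo server, sends a line, and prints the echo.
(let ((sock (socket-connect protocol-family/internet
                            socket-type/stream
                            "localhost" 49995)))
  (format (socket:outport sock) "who's there?~%")
  (force-output (socket:outport sock))
  (display (read-line (socket:inport sock)))
  (newline)
  (close-socket sock))
```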

Hopefully that was enough to get you started playing with sockets in scsh. I’ll write a followup post where we talk about the UNIX forking process model and update this example to make it a “preforking” server.

(Image courtesy Hope for Gorilla under a Creative Commons license.)

Driving the Scsh disassembler from Geiser

(Note: This post refers to functionality provided by geiser-scsh.)

Most Scheme 48 and scsh programmers are familiar with the ,dis command. It allows you to disassemble Scheme code (a procedure or continuation) into the “assembly” understood by the Scheme 48 virtual machine. Here’s an example using scsh’s fork/pipe procedure:

> ,dis fork/pipe
  0 (protocol 0 +)
  4 (make-env 1)
  7 (global fork)
 10 (push-local0 1)
 13 (push)
 14 (global really-fork/pipe)
 17 (call 2)

This is pretty cool. However, I thought it would be even better if Geiser could run the disassembler on a region of Scheme code, and pop open the disassembler output in a new buffer. Geiser already supports similar functionality for macro-expansion via GEISER-EXPAND-REGION. There’s no reason why it shouldn’t do the same for the disassembler. So that’s what I taught it to do – in this screenshot I’m disassembling the LETREC expression:


(I have to admit I was inspired to work on this by my jealousy of the disassembler functionality available to Common Lispers via the SLIME-DISASSEMBLE-{DEFINITION,SYMBOL} commands in SLIME.)

How it Works

Here’s how it works:

  • Highlight a region of Scheme (scsh) code in a buffer
  • Call ‘M-x geiser-scsh-disassemble-region‘ (a.k.a., ‘C-c RET C-d‘).
  • Emacs sends the Scheme code to the running scsh process for disassembly via a newly exported Geiser macro, GE:DISASSEMBLE. This macro wraps the code in a LAMBDA before sending it to the disassembler to work around the fact that the S48 DISASSEMBLE procedure won’t accept arbitrary Scheme expressions such as (+ 1 1). (Wrapping expressions in LAMBDA like this is not ideal for reasons I’ll explain below.)

Future Work

This implementation is a proof of concept – don’t believe in it. Think of it as a prototype of how this functionality could work if built into Geiser proper. Here are some of the issues with it from an implementation perspective:

  • It doesn’t use the correct Geiser protocols for sending Scheme code to evaluate, instead just piping in raw strings. This was expedient because of the way s48/scsh use non-reader-friendly strings like #{Uncategorized} as the final lines of output for procedures whose outputs are not defined by the standard. I think this can be fixed by coming up with a better implementation of the Geiser code evaluation protocols (in scsh/geiser/evaluation.scm) so that they handle all of the weird cases in S48 output.
  • Related to the previous point, I’m doing some ugly regex stuff on the stringified output of the disassembler to make it nicer before piping it into the temp buffer.
  • This functionality should really be added to Geiser itself, via the GEISER-DEBUG-* namespace. Then it would be forced to address both of the above points. Right now it’s just an ugly little hack in geiser-scsh.el. In principle, with the right infrastructure in GEISER-DEBUG-*, there’s nothing preventing a Guile or Racket implementation (here’s the Guile disassembler in action – you can see that it’s not so different from S48):
    scheme@(guile-user)> (define (foo n) (expt n n))
    scheme@(guile-user)> ,x foo
    Disassembly of #<procedure foo (n)>:
       0    (assert-nargs-ee/locals 1)      ;; 1 arg, 0 locals
       2    (toplevel-ref 1)                ;; `expt'
       4    (local-ref 0)                   ;; `n'
       6    (local-ref 0)                   ;; `n'
       8    (tail-call 2)                                         at (unknown file):5:16
  • The disassembler, like the macro-expansion functionality, should be able to work recursively. That is, in places where the S48 assembly makes procedure calls like (global ge:really-fork/pipe) (as it does in our first example way at the top of this post), you should be able to ask for the disassembled output of GE:REALLY-FORK/PIPE as well, all the way down to the core primitives. Currently, there are cases where you still need to call ,dis <PROCEDURE> at the command prompt. I think this limitation is created by the way we wrap the expression in a LAMBDA before sending it to the disassembler. A better design is needed, one that (for example) calls PROCEDURE? on the proposed input, and then decides whether the expression needs to be wrapped in a LAMBDA before going to the disassembler.

Review: The Hard Thing about Hard Things


A while back, Ben Horowitz came to give a talk at the company where I work, AppNexus. During that talk, I discovered that he is an investor. After Mr. Horowitz’s visit, the company generously bought every employee a copy of his book The Hard Thing about Hard Things.

It took a while to get around to it, but I’ve finally given it a read on my long bus rides. My impressions are as follows:

  • It’s bullshit free. There is no management-speak, jargon, or other obfuscatory nonsense anywhere in this book. It is very information-dense and doesn’t waste your time.
  • It’s honest. Mr. Horowitz describes how gut-wrenchingly hard it is to be the CEO, especially when things aren’t going well. His experience is particularly informative because he’s been a CEO during many times when things weren’t going well. He relates his wife’s observation regarding how his service as CEO was met by some: “They called you everything but a child of God”.
  • It tells good stories. The book illustrates its salient points with stories taken from real experience. Most management books are written by consultants (as Mr. Horowitz mentions); this one is written by someone who’s been there.
  • It uses metaphor. The two that stick in my mind are (1) the difference between “Wartime CEO” and “Peacetime CEO”, and (2) something called “The Torture” (read the book to find out more).
  • It’s concrete. The author has real opinions, and he expresses them. For example, he makes it clear that Andy Grove’s High Output Management is one of his favorite business books. He also describes in frank terms how Ernst & Young almost tanked the sale of his company (when he was a client of theirs!), and thus are not his fave, to say the least. It’s hard to trust people who don’t express concrete opinions; Mr. Horowitz is not in that group.

That’s all very nice, but what does any of this high-falutin’ CEO talk have to do with my life as a technical writer? After all, it’s not always clear how books aimed at executives are of use to an individual contributor.

Even though this book is for and about executives, it has given me some insight into how an executive might be thinking about different functions in the company and their value. It seems that technical writing’s value to the company from this perspective is as follows:

  • Generate a positive impression of the company: For example, I’ve had someone who works at AppNexus tell me that, when he was working at another company, he and his team really valued our wiki’s API docs (I’d link but they’re behind a login). Good API docs made us the easiest platform for his team to build against. Because we were the easiest to build against, we were the first company they would integrate with. Not only is it nice to hear things like this, it really makes the value to customers clear. I would imagine that this is helpful for sales as well.
  • Reduce support costs: This is the flip-side of the previous comment. If customers can get things done faster by using information in the docs than they can by sending a support request, you win.
  • Stave off informational collapse: This sounds negative, but it must be accounted for. As a company grows, you can no longer scale by bringing new people up to speed with a “come sit next to me and I’ll show you” workflow. This also applies to existing employees who are working in an area of the product/platform that’s new to them. Someone has to do the nasty work of writing stuff down in order to scale your internal technical communication. Often engineers will do this work, but sometimes you need technical writers. This is why Google is hiring for Technical Writer, Internal Software Engineering positions right now (and has been for quite a while from what I’ve seen). At a certain size, word-of-mouth just doesn’t work that well anymore.

A final point worth noting is that the job of CEO is stressful as hell. After finishing this book, I can say that I’m much happier in my role than I would be as an executive. It doesn’t seem very glamorous at all based on Mr. Horowitz’s description. Although I’m sure the money is nice, it’s been shown over and over again that money is not much of a motivator past a certain point, nor does it buy happiness.

(The book image above is from pandodaily.)