I recently bought a copy of Higher Order Perl (also known as HOP) by Mark Dominus. In chapter 4, he introduces iterators. At a high level, an iterator is a black box with an input funnel, an output funnel, and a big red “NEXT” button. You put something into the funnel at time
T; then, at some later time
T+n, you press the “NEXT” button and something useful comes out.
Iterators are useful in cases when you want to access a potentially huge list of things one at a time, but you don’t have access to a computer with infinite memory.
In this post we’ll look at a moderately useful iterator example; a directory walker that recurses down a directory tree on your file system. The implementation is given in Scheme (specifically, the scsh dialect); the code and the descriptions thereof are heavily inspired by Dominus’ excellent book. However, this post contains the work of an amateur. Any errors or omissions in the following code and explanations are mine.
Iterators are made with closures
In order to build a useful iterator, we need to have a basic understanding of closures. A closure is just a function packaged up with an environment. The environment allows us to store up state between function calls. As noted above, it’s easiest just to think of it as a little black box that we can pass around in our program.
In Scheme and Common Lisp, you build functions with
LAMBDA. Here’s a very simple closure example in Scheme, translated from the first example given in HOP:
(define (upto m n) (set! m (- m 1)) ;needed since we don't have post-increment ($i++) (lambda () (if (< m n) (begin (set! m (+ m 1)) m) #f)))
This is a function that returns a function (a closure). To use the closure returned by
UPTO, assign it to a variable and call it whenever you want the next value. When the iterator is exhausted, it will return
> (define 1to10 (upto 1 10)) > (1to10) 1 > (1to10) 2 ... > (1to10) 9 > (1to10) 10 > (1to10) #f
A trivial file tree iterator
A more interesting and practical use of this technique is to walk a directory tree on your computer. In the Scheme procedure below, we take a list of directories and then build and return a closure that, when called, will walk the directory tree breadth-first, printing out the names of files as we go.
(define (dir-walk queue) (let ((file #f)) (lambda () (if (not (null? queue)) (begin (set! file (car queue)) (set! queue (cdr queue)) (cond ((file-directory? file) (let ((new-files (directory-files file))) (set! queue (append queue (map (lambda (filename) (string-append file "/" filename)) new-files))) file)) ((file-regular? file) file) (else #f)))))))
The important part to notice is the inner
LAMBDA. This is where we create a closure by packaging up a procedure with an environment. The closure’s environment remembers the contents of the variable
QUEUE in memory, so we can ask for the elements of
QUEUE at our leisure.
Here it is running on my machine:
> (define dir-iter1 (dir-walk '("/Users/rloveland/Desktop/current/"))) > (dir-iter1) "/Users/rloveland/Desktop/current/XProlog.jar" > (dir-iter1) "/Users/rloveland/Desktop/current/all-jira-issues.txt" > (dir-iter1) "/Users/rloveland/Desktop/current/automate-campaign-setup.txt" > (dir-iter1) "/Users/rloveland/Desktop/current/buflocal.scm" > (dir-iter1) "/Users/rloveland/Desktop/current/bundle.js"
Just for fun, here’s a version of the same function in Common Lisp. It’s essentially the same as the scsh version, save that it uses the nice CL pathname machinery instead of appending raw strings, and also makes use of the CL-FAD convenience library.
(defun dir-walk (queue) (let ((file nil)) (lambda () (if (not (null queue)) (progn (setf file (car queue)) (setf queue (cdr queue)) (cond ((cl-fad:directory-pathname-p file) (let ((new-files (cl-fad:list-directory file))) (setf queue (append queue (mapcar #'(lambda (filename) (merge-pathnames filename file)) new-files))) file)) ((cl-fad:file-exists-p file) file) (t nil)))))))
Usage is also essentially the same:
CL-USER> (defvar *tree-walker* (dir-walk '(#P"/Users/rloveland/Desktop/current/"))) *TREE-WALKER* CL-USER> (type-of *tree-walker*) FUNCTION CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Desktop/current/" CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Dropbox/current/.debuggerDefaults" CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Dropbox/current/.DS_Store" CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Dropbox/current/A" CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Dropbox/current/A.hi" CL-USER> (funcall *tree-walker*) #P"/Users/rloveland/Dropbox/current/A.hs"
A slightly less trivial file tree iterator
Listing out files and directories is nice, but not that useful. We’d like a way to see only those files and directories that have some interesting property.
This can be accomplished by passing in another argument to our
DIR-WALK function: this argument will be yet another function that will test the current file to see whether it’s interesting to us. It’s pretty easy to change
DIR-WALK to accept a function argument,
INTERESTING?. This arbitrary function is used to check the file to see if we care about it.
This time around, when we build our internal queue, we use a call to
FILTER to make sure that only interesting files get added.
(define (dir-walk* interesting? queue) (let ((file #f)) (lambda () (if (not (null? queue)) (begin (set! file (car queue)) (set! queue (cdr queue)) (cond ((file-directory? file) (let ((new-files (directory-files file))) (set! queue (append queue (filter interesting? (map (lambda (filename) (string-append file "/" filename)) new-files)))) (if (interesting? file) file))) ((interesting? file) file) (else #f))) #f))))
And here it is in use; in this example, we pass an
INTERESTING? function that asks for only files that are not marked as executable:
> (define dir-iter2 (dir-walk* (lambda (f) (file-not-executable? f)) '("/home/rml/Desktop/current"))) > (dir-iter2) > (dir-iter2) "/home/rml/Desktop/current/A.hi" > (dir-iter2) "/home/rml/Desktop/current/A.hs" > (dir-iter2) "/home/rml/Desktop/current/A.o" > (dir-iter2) "/home/rml/Desktop/current/ABBREV.xlsx" > (dir-iter2) "/home/rml/Desktop/current/AddInts.class" > (dir-iter2) "/home/rml/Desktop/current/AddInts.java"
There are still bugs and unexpected behavior in
DIR-WALK*. For example, on the first call to the resulting iterator there is no result due to the one-armed
IF. There are also strange things that happen if we want to filter out directories from our results, since we mix together the
FILE-DIRECTORY? check and the
INTERESTING? check inside the iterator. However, despite these small nits, it can still do useful work on my machine 1, and it’s a good enough example of using closures to build iterators.
Here’s hoping you’ll enjoy playing with closures and iterators in your own code!
1 In the grandest internet tradition, “It works on my machine” ™.