For a long time now, I’ve been looking to make my life more Lispy. As part of that transformation, I’ve begun porting some of my little Perl scripts over to Guile Scheme. Today I’m going to walk through a script that renames my files in a nice, *nix-friendly fashion. For example, if I download a file that someone has erroneously (if good-naturedly) called “My Cool Data.tar.gz”, this script will rename it to “my-cool-data.tar.gz”.
A note on filename style: I’ve never liked the common practice of naming files using underscores (`_`), so I use hyphens instead (`-`). It’s more Lispy! Also, regular expressions usually treat the underscore character as part of a word, so `my_cool_data` is considered one word, whereas `my-cool-data` will be treated as three. The latter is almost always what I’d prefer, since those are, in fact, three words.
OK. So what about Guile, then? It’s an R5RS-compatible Scheme, so you get all of that goodness. If you’re an Emacs user, check out Geiser, which turns Emacs into an AWESOME Scheme hacking environment. You don’t need to be an Emacs weirdo like me to write programs in Guile, however. Vim works very nicely, as a matter of fact, and it also highlights Scheme source code beautifully.
Finally, not that it matters that much, but this short essay is also a literate program, thanks to Orgmode (a.k.a. “The Teal Unicorn”). Fun!
Program Headers and Modules
Just like any other *nix script, we need to declare a path to our interpreter, as well as any arguments to the interpreter itself. In Guile’s case, there are two things to notice: (1) the `guile` executable must be passed the `-s` argument to execute in script mode, and (2) the opening `#!` in the interpreter path must be matched by a closing `!#`. That’s due to the way this particular Scheme works: Guile’s reader treats everything from `#!` to `!#` as a block comment, so the interpreter line means something to the operating system but is skipped by Guile itself.
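If you want to convince yourself that the `#!`/`!#` pair really is just a comment as far as Guile is concerned, here’s a small throwaway script you could run with `guile -s`. The `greeting` variable is purely illustrative. The prose between the two markers is never evaluated; only the code after `!#` runs.

```scheme
#!/usr/local/bin/guile -s
Guile skips everything from the opening marker on the first line
down to the closing marker below, even across several lines.
!#

;; This is the first thing Guile actually evaluates.
(define greeting "the reader skipped the block comment")
(display greeting)
(newline)
```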
Next, we declare the modules we’d like to use. In this case, it’s just the one: `(ice-9 regex)`. Please don’t ask me what the `ice-9` part means, but Guile has a whole bunch of functionality under the `ice-9` umbrella, such as regular expression support (which we’re using here), POSIX-related stuff, a `getopt-long` library, and more. For details, see the fine manual. Or just type “C-h i C-s guile” as $DEITY intended.
```scheme
#!/usr/local/bin/guile -s
!#
(use-modules (ice-9 regex))
```
We’re ready to start writing our actual program! Because we’re exciting and creative folk, we’ll call our single procedure `main`.
We’ll go ahead and use a `let` expression to grab all but the first element of the `program-arguments` list and stick it in the `args` variable for brevity (the first element is the name of the script itself). This use of `let` isn’t really required in such a simple program, but I find that it makes things easier to read, and if I expand the program later, it’s easier to modify.
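To make the `cdr` step concrete, here’s a sketch that uses a hard-coded list standing in for what `(program-arguments)` would return; the `sample-argv` name and its contents are made up for illustration. Note that when the script is run with no files at all, `cdr` leaves an empty list behind.

```scheme
;; Hypothetical stand-in for (program-arguments) after running:
;;   guile renamer.scm "My Cool Data.tar.gz" notes.txt
(define sample-argv '("renamer.scm" "My Cool Data.tar.gz" "notes.txt"))

;; cdr drops the first element (the script name), keeping only the files.
(define args (cdr sample-argv))
(write args)
(newline)

;; With no files on the command line, only the script name is present,
;; so cdr yields the empty list.
(define no-args (cdr '("renamer.scm")))
(write no-args)
(newline)
```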
```scheme
(define (main)
  (let ((args (cdr (program-arguments))))
```
We can’t just assume that the `args` list is going to have anything in it, however, so we’ll print a short message and exit the program if it’s empty. If it’s not empty, we travel on to the `else` clause of the `if` expression.
```scheme
    (if (null? args)
        (begin
          (display "No arguments, exiting...")
          (newline)
          (exit))
```
Now that we’ve invoked the interpreter with the right incantations, loaded our required module, and checked the program arguments list to make sure that we have something there to process, we can write the part of the program that actually does something. Sweet!
In the `else` clause of the `if` expression, we iterate over `args` using the `for-each` procedure. We use `for-each` in this case (rather than our beloved `map`) because we don’t want to build a new list by transforming each element of `args`; we just want to iterate over our list being all “side-effect-y” (a technical term that in this case means “affecting the state of stuff on disk”).
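Here’s a quick REPL-style comparison of the two, using a made-up list of names. `map` hands back a new list, while `for-each` is only useful for what it does along the way (printing, in this sketch; renaming files, in the real script).

```scheme
(define names '("A.TXT" "B.TXT"))

;; map builds and returns a brand-new list; names itself is untouched.
(define lowered (map string-downcase names))

;; for-each calls the procedure purely for its side effects (printing here).
;; Its return value is unspecified, so we don't bother keeping it.
(for-each (lambda (name)
            (display name)
            (newline))
          lowered)
```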
The best way to read Lisp code is usually “inside-out”. Begin with the innermost element, figure out what argument(s) it takes, and see what it passes along as a return value. That return value is then an input for something else. This is true in most computer languages, but in Lisp it becomes especially necessary to read things this way.
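One way to practice reading inside-out is to unroll a nested expression into named steps. The sketch below rewrites the renamer’s two innermost calls with `let*`, one binding per layer, starting from the innermost. The procedure name `clean-name/steps` is my own invention, and it leaves out `rename-file` so nothing on disk is touched.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: the renamer's nested expression, one layer per binding.
(define (clean-name/steps arg)
  (let* ((hyphenated                      ; innermost: swap separators for "-"
          (regexp-substitute/global #f "[,'!_ \t]+" arg 'pre "-" 'post))
         (lowered                         ; next layer out: lowercase it
          (string-downcase hyphenated)))
    lowered))                             ; this is what rename-file would receive

(write (clean-name/steps "Hey Kids I Have Spaces.txt"))
(newline)
```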
Therefore we’ll start inside the innermost expression, at `regexp-substitute/global`. The documentation says that it needs a port, a regular expression, and a string to match that regular expression against. Since `regexp-substitute/global` isn’t writing its output to a port, but handing its result on to `string-downcase`, we specify “no port” as `#f`, which makes it return the substituted string instead.
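The symbols `'pre` and `'post` tell it what to output around each match: `'pre` stands for the part of `arg` between the previous match (or the start of the string) and the current one, while `'post` has to do with making `regexp-substitute/global` recur on any unmatched parts of the string in `arg`. The literal `"-"` between them is what we’d like to replace our matches with. For more comprehensive information on `'post`, I actually needed to consult the documentation on `regexp-substitute`, since `regexp-substitute/global` is apparently a variant of it (and is perhaps implemented using `regexp-substitute`? I didn’t check, but it would be easy enough to do so).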
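To see what the port argument changes, here’s a small sketch that runs the same substitution twice on a made-up filename: once with `#f`, which returns a string we can keep, and once with `(current-output-port)`, which writes the same text straight to the terminal instead.

```scheme
(use-modules (ice-9 regex))

(define target "Hey Kids.txt")

;; Port #f: the result comes back as a string.
(define result
  (regexp-substitute/global #f "[,'!_ \t]+" target 'pre "-" 'post))
(write result)
(newline)

;; A real port: the same text is written to that port as a side effect.
(regexp-substitute/global (current-output-port)
                          "[,'!_ \t]+" target 'pre "-" 'post)
(newline)
```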
Let’s look at that regex, `[,'!_ \t]+`. In English, it means “match a run of one or more commas, apostrophes, exclamation points, underscores, blank spaces, or tabs” (inside the Scheme string, `\t` is a literal tab character). As noted above, we want to replace each such run with a hyphen (`-`).
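Here’s a sketch for poking at the regex on a few made-up names. The `hyphenate` procedure is just a name I’m using for the substitution on its own, before `string-downcase` gets involved.

```scheme
(use-modules (ice-9 regex))

;; Replace each run of commas, apostrophes, exclamation points,
;; underscores, spaces, or tabs with a single hyphen.
(define (hyphenate name)
  (regexp-substitute/global #f "[,'!_ \t]+" name 'pre "-" 'post))

(for-each (lambda (name)
            (write (hyphenate name))
            (newline))
          '("Oh_no_ugly_underscores.html"
            "Hello,  World!.txt"
            "Don't Panic.txt"))
```

Because of the `+`, the comma and two spaces in `Hello,  World!.txt` collapse into a single hyphen, giving `Hello-World-.txt`; note that the `!` right before the extension leaves a dangling hyphen. Likewise, `Don't Panic.txt` becomes `Don-t-Panic.txt`.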
For example, a string like `Hey Kids I Have Spaces.txt` would become `Hey-Kids-I-Have-Spaces.txt`. We then pass it out to the `string-downcase` procedure, which transforms it into `hey-kids-i-have-spaces.txt`.
That value is then passed as the second argument to the `rename-file` procedure, which renames `arg` (our original, uncool filename) to the new, cleaned-up name.
It’s all wrapped in a `lambda` expression, which bundles the several procedures we’ve discussed into a single one-argument procedure; `for-each` then applies that procedure to every item in our argument list.
```scheme
        (for-each
         (lambda (arg)
           (rename-file arg
                        (string-downcase
                         (regexp-substitute/global
                          #f "[,'!_ \t]+" arg 'pre "-" 'post))))
         args))))
```
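Because all of the side effects live in `rename-file`, it’s easy to sketch a dry-run variant that prints what would happen instead of touching the disk. The `clean-name` and `preview-renames` procedures below are hypothetical additions, not part of the original script.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: compute the new name without touching the disk.
(define (clean-name arg)
  (string-downcase
   (regexp-substitute/global #f "[,'!_ \t]+" arg 'pre "-" 'post)))

;; Hypothetical dry-run version of the for-each loop: print what
;; rename-file would do instead of actually doing it.
(define (preview-renames args)
  (for-each (lambda (arg)
              (display arg)
              (display " -> ")
              (display (clean-name arg))
              (newline))
            args))

(preview-renames '("Hey Kids I Got Spaces.txt" "Oh_no_ugly_underscores.html"))
```

Pulling the name computation out into its own procedure also makes it easy to try at the REPL before pointing the real script at your files.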
Invocation and Program Listing
In this way the file renaming operation that we’ve defined here is applied to each of our program’s arguments, and we invoke it like so (shown here operating on two files):
```
$ guile renamer.scm Hey\ Kids\ I\ Got\ Spaces.txt Oh_no_ugly_underscores.html
```
A final note: even for a program as simple as this, I didn’t sit down and bang it out all in one go. Especially with the regex, I was testing little parts of it at the REPL the whole way, consulting the documentation for these functions via the relevant Geiser and Emacs commands. But that’s a story for another day…
Finally, here’s the complete program listing:
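```scheme
#!/usr/local/bin/guile -s
!#
(use-modules (ice-9 regex))

(define (main)
  ;; Drop the script name; keep only the files to rename.
  (let ((args (cdr (program-arguments))))
    (if (null? args)
        (begin
          (display "No arguments, exiting...")
          (newline)
          (exit))
        ;; Hyphenate separators, lowercase the result, and rename each file.
        (for-each
         (lambda (arg)
           (rename-file arg
                        (string-downcase
                         (regexp-substitute/global
                          #f "[,'!_ \t]+" arg 'pre "-" 'post))))
         args))))

(main)
```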