A Trivial Utility: Prepend

Recently at work I needed to add a timestamp to the top of a bunch of Markdown files. There are plenty of ways to skin this particular cat. As you probably know, the semantics of how you navigate UNIX file contents mean it’s easy to add something to the end of a file, but it’s not as easy to add something to the beginning.

This is a pretty trivial task that other people have solved in lots of ways. In my case, I decided against a shell script using sh or the like because I use Windows a bunch, too, and I wanted something cross-platform. As usual for me, this meant breaking out Perl.

I decided to name the tool prepend, on the grounds that that’s what it does: it adds text content to the beginning of a file.

Since I like to design top-down, let’s look at how it’s meant to be used:


There are two ways to use it:

  1. Add a string to the beginning of one or more files
  2. Add the contents of a file to the beginning of one or more files

Let’s say I wanted to add a timestamp to every Markdown file in a directory. In such a case I’d add a string like so:

$ prepend '<!-- Converted on: 1/26/2017 -->' *.md

If I had some multi-line text I wanted to add to the beginning of every Markdown file, I’d say

$ prepend /tmp/multiline-preamble.md *.md

The code is shown below. I could have written it using a more low-level function such as seek but hey, why fiddle with details when memory is cheap and I can just read the entire file into an array using Tie::File?

#!/usr/bin/env perl

use strict;
use warnings;
use experimentals;
use autodie;
use IO::All;
use Tie::File;

my $usage = <<"EOF";
    \$ $0 '<!-- some text for the opening line -->' *.md
    \$ $0 /tmp/message.txt *.txt
die "$usage\n" unless scalar @ARGV >= 2;
my @files = @ARGV;

my $maybe_file = shift;
my $content;

if (-f $maybe_file) {
  $content = io($maybe_file)->slurp;
else {
  $content = $maybe_file;

for my $file (@files) {
  my @lines;
  tie @lines, 'Tie::File', $file;
  unshift @lines, $content;
  untie @lines;

Some thoughts about privacy and networked computers

According to the Merriam-Webster dictionary, privacy is:

a : the quality or state of being apart from company or observation

b : freedom from unauthorized intrusion

Historically, the most common use of this term was around one’s physical space. If you go into a room in your house and close the door, you are experiencing privacy. You are “apart from” others. They cannot see you, hear you, etc.

If you go and sit in your back yard, then depending on the visibility of your back yard to neighbors and passersby, you are experiencing some degree of privacy.

If you send a letter to a friend (the kind that is written on paper and wrapped in an envelope), you have an expectation of privacy in that you expect that your letter will not be read.

In each of these cases, there is a physical barrier that separates the space that you (or your communications) occupy from space that is available for other people to see and observe.

In each of these cases, the physical objects in question do not broadcast information. There is visual information available to any passersby or other residents of the home, but the passerby must take action to look, to seek it out.

It seems that historical notions of privacy have to do with physical presence and a third party must make an effort to transgress a boundary that you have explicitly put in place (a door, wall, envelope, etc.).

Networked computers do not have any of these characteristics!

A networked computer is, in essence, a beacon that is constantly shining in the night. Networked computers constantly transmit information to other computers, and you have to do a lot of work if you want to keep that from happening (and you will probably not succeed anyway, even if you have a lot of relevant expertise). To use the beacon metaphor, it is shining all of the time, you have to do a lot of work to keep it covered up, and any slip of the covering means you will be visible from many miles away.

This flips the notion of what we historically think of as “privacy” completely on its head. Rather than a third party being forced to transgress a boundary to see you in a private room, tear open your letter’s envelope, or jump your backyard fence, the third party running the computer network that you connect to, or the server that your browser connects to, would have to explicitly take action to drop information on the floor that they have already been given.

This is a fundamentally different thing to ask for. In the first case, you are saying, “please don’t cross this physical boundary”. In the second, you are saying “I am sending you this information, but please don’t read it. Well actually, that won’t work — you will have to read it to provide me the service I’m asking for. But after providing the service, please go back and erase the information. Definitely don’t store it anywhere.”

Whereas in the first case you were asking the third party to simply avoid a behavior, in the second case you are asking the third party to do work on your behalf. This is going to be fundamentally harder to accomplish, and you really need to understand that you are asking someone else to do something for you. Implied in asking someone else to do work on your behalf is that they are not obligated to do that work, except under certain conditions or relationships.

You almost certainly do not enjoy these conditions or relationships with network operators, computer manufacturers, the writers of web browsing software, web applications, or advertising technology. You are not in a position to demand extra work from these entities.

Another way of looking at it is: given a possibility space of all behaviors in the physical realm, traditional privacy just carves out a small area of the total space and says “don’t go here”. It looks like this:


Given the possibility space of all of the behaviors that can be engaged in by networked computers, “privacy” carves out an area of the total space and says “you will have to go here at least once to provide me with network connectivity and other services, but I want you to then take a second pass over that area and erase/drop the information you collected during the first pass”.


I hope the above explains why I do not really like or agree with the use of the word “privacy” in discussions about computers. It’s the wrong word. I don’t know if we even have a word for what we need going forward.

Advent of Code 2017, Day 2

This is my solution for Day 2 of this year’s Advent of Code.

You may also enjoy browsing the Day 2 solutions megathread on Reddit.


The spreadsheet consists of rows of apparently-random numbers. To make sure the recovery process is on the right track, they need you to calculate the spreadsheet’s checksum. For each row, determine the difference between the largest value and the smallest value; the checksum is the sum of all of these differences.

For example, given the following spreadsheet:

5 1 9 5
7 5 3
2 4 6 8

The first row’s largest and smallest values are 9 and 1, and their difference is 8.

The second row’s largest and smallest values are 7 and 3, and their difference is 4.

The third row’s difference is 6.

In this example, the spreadsheet’s checksum would be 8 + 4 + 6 = 18.


(define (line->list line)
  ;; String -> List
  (let ((read-ln (field-reader (infix-splitter (rx (+ whitespace)))))
        (in-port (make-string-input-port line)))
    (receive (record fields)
        (read-ln in-port)
      (map string->number fields))))

(define (read-spreadsheet file)
  ;; File -> List[List[Number]]
  (call-with-input-file file
    (lambda (port)
      (let loop ((line (read-line port))
                 (results '()))
        (if (eof-object? line)
            (loop (read-line port) (cons line results)))))))

(define (main prog+args)
  (let ((rows (read-spreadsheet "/Users/rloveland/Code/personal/advent-of-code/2017/02/02.dat")))
    (write (apply + (map
                     (lambda (row)
                       (let* ((xs (line->list row))
                              (min (apply min xs))
                              (max (apply max xs)))
                         (- max min)))

Advent of Code 2017, Day 1

This is my solution for Day 1 of this year’s Advent of Code.

You may also enjoy browsing the Day 1 solutions megathread on Reddit.


The captcha requires you to review a sequence of digits (your puzzle input) and find the sum of all digits that match the next digit in the list. The list is circular, so the digit after the last digit is the first digit in the list.

For example:

  • 1122 produces a sum of 3 (1 + 2) because the first digit (1) matches the second digit and the third digit (2) matches the fourth digit.

  • 1111 produces 4 because each digit (all 1) matches the next.

  • 1234 produces 0 because no digit matches the next.

  • 91212129 produces 9 because the only digit that matches the next one is the last digit, 9.


(define captcha-input "5994521226795838")

'(set! captcha-input "1111")

'(set! captcha-input "1122")

'(set! captcha-input "1234")

'(set! captcha-input "91212129")

(define (gather-matches s)
  ;; String -> List
  (let ((in-port (make-string-input-port s)) (count 0) (head #f) (vals '()))
    (let loop ((cur (read-char in-port)) (next (peek-char in-port)) (count count) (vals vals))
      (if (eof-object? next)
          (if (char=? cur head)
              (cons cur vals)
          (cond ((= count 0)
                   (set! head cur)
                   (loop cur next (+ 1 count) vals)))
                 ((char=? cur next)
                 (loop (read-char in-port) (peek-char in-port) (+ 1 count) (cons cur vals)))
                (else (loop (read-char in-port) (peek-char in-port) (+ 1 count) vals)))))))

(define (main prog+args)
  (let* ((matches (gather-matches captcha-input))
         (matches* (map (lambda (c) (string->number (string c))) matches))
         (sum (apply + matches*)))
      (format #t "MATCHES*: ~A~%" matches*)
      (format #t "SUM: ~A~%" sum))))

Thinking about software documentation as the output of a lossy compression algorithm


How many times have you heard or read the following comments?

  1. “The docs are always out of date”
  2. “I don’t bother reading the docs, I just read the source code”
  3. “If you write self-documenting code, you don’t need to write docs”

If you work in software, I bet the answer is: a lot. They range in truth value from #1, which is a tautology, to #3, which is a fantasy. I can relate to #2, since I had to do it just yesterday when using a semi-undocumented Ruby library.

I think all of these points of view (and more) are explained when you think about software documentation as the output of a lossy compression algorithm. Many (most? all?) of the things that you love and/or hate about the documentation for a particular piece of software are explained by this.

Quoth wiki:

Well-designed lossy compression technology often reduces file sizes significantly before degradation is noticed by the end-user. Even when noticeable by the user, further data reduction may be desirable (e.g., for real-time communication, to reduce transmission times, or to reduce storage needs).

As such, the features of software documentation are similar to the features of other products of lossy compression algorithms.

Take mp3s for example. The goal of an mp3 is not to be the highest-fidelity replication of the audio experience it represents. The goal of an mp3 is to provide a “good enough” audio experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in CPU time is it to run the decompression algorithm? Can the CPU on the device handle it?
  • Space: How much disk space will the file take on the device?

Similarly, we might say that the goal of software documentation is not to be the highest-fidelity replication of the “understanding” experience it (theoretically) represents. The goal of a piece of documentation is to provide a “good enough” learning experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in person time is it to “run the decompression algorithm”, i.e., learn how the system works well enough to write something coherent? And then to actually write it? How many technical writers does the organization employ per engineer? (In many companies it’s ~1 per 40-50 engineers) How many concurrent projects is that writer working on, across how many teams?
  • Space: How much information does the user need to use the system? How little information can you get away with providing before users give up in disgust?

Remember that fewer, more effective materials cost more to produce. This is similar to the way better compression algorithms may cost more than worse ones along various axes you care about (dollar cost for proprietary algorithms, CPU, memory, etc.)

It takes longer to write more concise documentation, draw clear system diagrams, etc., since those are signs that you actually understand the system better, and have thus compressed the relevant information about it into fewer bytes.

And oh by the way, in practice in an “agile” (lol) environment you don’t have enough time to write the “best” docs for any given feature X. Just like the programmer who wrote feature X would likely admit that she didn’t have enough time to write the “best” implementation according to her standards.

Quoth Pascal:

I would have written a shorter letter, but I did not have the time.

So the next time you are frustrated by the docs for some piece of software (if any docs exist at all), instead of some platitude about docs sucking, think “oh, lossy compression”.

(Image courtesy Jonathan Sureau under Creative Commons license.)

A Portable Scheme Module System



In this post I’d like to introduce load-module, a portable Scheme module system.

Why did I write a module system?

  • Simplicity: A single-file module system in about 200 lines of code
  • Understandability: The implementation avoids wizardry and should be accessible to anyone who knows the language
  • Portability: One system that can be used across multiple implementations

The way it works is this:

  1. You have a file (say, utils.scm) with Scheme code in it that implements stuff that you want to live in the same module.
  2. You create another file (utils.mod, but that extension is easy to change) which lists the procedures and syntax you want to export.
  3. The load-module procedure reads utils.scm, rewriting unexported procedure names such that only the procedures you want exported show up at the top-level. Everything else gets rewritten during load-time as an ignorable “gensym” of the form %--gensym-utils-random-integer-8190504171, where “utils” is the module name, and “random-integer” is the procedure internal to your module.

The module file format is very simple:

(define-module utils
  (exports random-integer atom? take))

The module system exports one procedure: load-module. Run it like so to get the procedures from the aforementioned hypothetical utils package into your environment:

> (load "load-module.scm")
> (load-module 'utils)
> (random-integer 199)
> (atom? 199)

If you care, there’s more information about over at the project README.

(Image courtesy Geoff Collins under Creative Commons license.)

Announcing cpan.el

The Cat's Eye Nebula, one of the first planetary nebulae discovered, also has one of the most complex forms known to this kind of nebula. Eleven rings, or shells, of gas make up the Cat's Eye. Credit: NASA, ESA, HEIC, and The Hubble Heritage Team (STScI/AURA) Acknowledgment: R. Corradi (Isaac Newton Group of Telescopes, Spain) and Z. Tsvetanov (NASA) The Hubble Space Telescope is a project of international cooperation between NASA and the European Space Agency. NASA's Goddard Space Flight Center manages the telescope. The Space Telescope Science Institute conducts Hubble science operations. Goddard is responsible for HST project management, including mission and science operations, servicing missions, and all associated development activities.

The CPAN shell is just another shell, so why not drive it from Emacs?

If you write Perl code in Emacs, you may have wondered why we don’t have a simple mode for driving the CPAN shell (at least I couldn’t find one!).

Well, I finally stopped wondering. It wasn’t that hard to rip out the sh-specific parts of shell.el and make a new mode for the CPAN shell.

Here’s the code:


It’s easy to load up and drive from Emacs:

(add-to-list 'load-path (expand-file-name "/path/to/cpan-el/"))
(setq cpan-file-name "cpan")

(require 'cpan)

To run it, type M-x cpan.

There aren’t too any bells and whistles yet (completion, etc.), but you it’s pretty small so feel free to hack away.

(Image courtesy NASA Goddard Photo and Video under Creative Commons License.)