A full implementation of breadth-first search of a graph

In this post I’ll share a full, standalone implementation of breadth-first search of a graph. Breadth-first search (hereafter BFS) is used in many graph algorithms. In particular it is useful for finding the shortest path (in terms of fewest vertices, or “hops”) between two vertices A and B.

There are many pseudocode examples showing breadth-first search of a graph on the interwebs, and lots of textual descriptions. There are also many videos on Youtube talking about this algorithm.

Unfortunately, despite all of the above, I had a really hard time finding the complete source code of a graph traversal that I could read and study — at least, one that was at the right level of complexity for me to actually learn how it works. A lot of the code I found online or in books was too abstract and hid implementation details from me by relying on “helper” libraries which were not really explained. For example, this seemed to be the case with the Java code from Sedgewick. The underlying data structures of most algorithms’ implementations were hidden away in library code which I would have needed to study to understand what was really happening.

On the other hand, the couple of production-quality graph implementations I looked at were very hard to read for someone not very familiar with the code and the algorithm already.

I knew that in order to really learn how this algorithm works, I needed to aim somewhere in the middle. In the end, I decided to implement it myself to make sure I understood it. I’m sure that’s pretty common!

So what do I mean by a full, standalone implementation?

I mean that you can download just the code on this page, and run it on any machine where Perl is installed, and it will work. Other than the language it’s written in, there are no external libraries or other dependencies.

Without further ado here is my take on a full implementation of breadth-first search of a graph. If you download only the code on this page and run it on your machine, it will work – the batteries are included, as they say.

(Incidentally, the statement above is also true of the Scheme code from Simple Network Search in Scheme, if you’d rather look at an implementation in a non-ALGOL language. However, that code is not really “my own creation”, it’s merely ported to Scheme from a Lisp book.)

Algorithm prose description

At a high level, the way BFS works is:

  • 1. Push the starting vertex A onto a queue
  • 2. While the queue is not empty, get the next vertex from it (in this case A, since we just added it)
  • 3. Look at each neighboring vertex N of A and take the following steps:
  • 1. If N has already been marked as visited, skip it and go to the next neighbor of A.
  • 2. Add the pair (N, A) to a spanning tree (or a reverse linked list). If we find the end node B, we will walk this spanning tree backward to find the path from the end vertex B back to the start vertex A.
  • 3. If the currently visited neighbor N is the end vertex B we’re looking for, we walk the spanning tree backward to build the path taken from B back to A.
  • 4. If the currently visited neighbor N is not the end vertex, we push N onto the queue to be processed.
  • 5. Finally, we mark N as visited.

Pseudocode

To supplement the prose description above, here is some pseudocode for BFS. It maps pretty directly onto the concrete implementation we will get into in the next section. I did complain a little above about pseudocode, but of course I’ll subject you to mine just the same. :-} I do think this is a reasonably “complete” pseudocode that you could actually use to implement BFS, unlike many examples that I found.

    function bfs(start, end, graph) {

        push start onto queue
        mark start as seen
        make a spanning tree for storing the path from start to end

        while queue is not empty {
            vertex = get next item from queue
            for each neighbor of vertex in the graph {
                if neighbor has been seen {
                    next
                }
                add (neighbor, vertex) to the spanning tree
                if this neighbor is the end node we want {
                    walk the spanning tree, printing path from start to end
                    exit
                }
                else {
                    push neighbor onto the queue
                }
                mark this neighbor as visited
            }
        }
      print "No path found"
      exit
    }

Implementation

The code below shows a direct implementation of BFS in Perl. The graph it will search is the one shown in the diagram at the top of this post. In code, the data structure for the graph is as shown below. It’s an array of arrays: the first element of each array is the vertex A, followed by an array of its neighbors. This is the classic “adjacency list” representation of a graph.

    my $graph = [['s', ['a', 'd']],
                 ['a', ['s', 'b', 'd']],
                 ['b', ['a', 'c', 'e']],
                 ['c', ['b']],
                 ['d', ['s', 'a', 'e']],
                 ['e', ['b', 'd', 'f']]];

Next, here is the main loop of BFS. If you know Perl, you can see that it maps pretty much 1:1 to the description in the above pseudocode. The shape of this code would be pretty much the same in another language such as Python or Ruby, with a few small syntactic differences, but hardly any semantic differences.

    sub find_path_between {
        my ( $start, $end, $graph ) = @_;

        return () unless defined $start && defined $end;

        my @path;     # Path so far
        my @queue;    # Vertices still to visit.
        my %seen;     # Vertices already seen.
        my $found;    # Whether we have found the wanted vertex.
        my $st = {};  # Spanning tree, used to find paths.

        if ( $start eq $end ) {
            push @path, $start;
            return @path;
        }

        push @queue, $start;
        $seen{$start}++;

        while (@queue) {
            my $v         = shift @queue;
            my $neighbors = get_neighbors( $v, $graph );

            for my $neighbor (@$neighbors) {
                next if $seen{$neighbor};
                _st_add( $v, $neighbor, $st );
                if ( $neighbor eq $end ) {
                    $found++;
                    @path = _st_walk( $start, $end, $st );
                    return @path;
                }
                else {
                    push @queue, $neighbor;
                }
                $seen{$neighbor}++;
            }
        }
        return $found ? @path : ();
    }

Once the main loop structure is written so that you are walking the vertices of the graph in the correct (breadth-first) order, one part I found slightly tricky was keeping a “trail of bread crumbs” from the end node back to the start. You may have noticed that the code above uses two helper functions — _st_add and _st_walk — that hide that complexity away a little bit. We will now look at them in more detail.

_st_add is trivial, and could have been written directly as a hash table access. The idea is to keep a logical “reverse linked list” structure in a hash table, where each vertex added to the list has a link pointing back to the previous vertex. I hid it from myself in this function so I could read the logic in the main search loop more easily.

    sub _st_add {
        my ( $vertex, $neighbor, $st ) = @_;
        $st->{$neighbor}->{prev} = $vertex;
    }

_st_walk is a little more interesting. As noted above, we kept a trail of bread crumbs from our end vertex (if such exists) back to the start vertex. _st_walk walks that trail of crumbs backward and builds an array holding all of the vertices visited along the way, which it returns to the caller.

    sub _st_walk {
        my ( $start, $end, $st ) = @_;

        my @path;
        push @path, $end;

        my $prev = $st->{$end}->{prev};
        while (1) {
            if ( $prev eq $start ) {
                push @path, $start;
                last;
            }
            push @path, $prev;
            $prev = $st->{$prev}->{prev};
            next;
        }
        return reverse @path;
    }

Finally, you may have noticed that in order to visit all of the neighbors of some node A we have to be able to list them. The function get_neighbors handles that task, along with its helper function _find_index. Let’s look at them in turn.

get_neighbors looks up some node N in the graph, and returns its list of neighbors, if there are any.

    sub get_neighbors {
        my ( $k, $graph ) = @_;

        my $index = _find_index( $k, $graph );

        if ( defined $index ) {
            return $graph->[$index]->[1];
        }
        else {
            return;
        }
    }

_find_index looks up the node N‘s index in our array-based graph representation. This is actually not a very performant way to do this, since it’s doing a linear search into an array. There are ways to speed this up, such as using a hash of arrays for faster vertex lookup, or using a cache. However I felt it would be better to keep the graph’s data representation as simple as possible for this example. (Incidentally, the below is exactly the sort of somewhat uninteresting but very necessary bookkeeping code I was having a hard time finding in much of the example code in books or online.)

    sub _find_index {
        my ( $wanted, $graph ) = @_;

        # Naive linear search, for now.
        my $i = 0;
        for my $elem (@$graph) {

            # Definedness check here is necessary because we delete
            # elements from the graph by setting the element's index to
            # undef.  In other words, some graph indices can be undef.
            if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
                return $i;
            }
            $i++;
        }
        return;
    }

Finally, we have a main function that drives everything. First, we find the shortest path (by number of nodes) from ‘s’ to ‘c’. Then, we find the shortest path from ‘s’ to ‘f’:

    sub main {

        my $graph = [
            [ 's', [ 'a', 'd' ] ],
            [ 'a', [ 's', 'b', 'd' ] ],
            [ 'b', [ 'a', 'c', 'e' ] ],
            [ 'c', ['b'] ],
            [ 'd', [ 's', 'a', 'e' ] ],
            [ 'e', [ 'b', 'd', 'f' ] ]
        ];

        my $start = 's';
        my $end   = 'c';

        my @path = find_path_between( $start, $end, $graph );

        print qq[Path from '$start' to '$end' is: @path\n];

        # Find a second path.
        $end  = 'f';
        @path = find_path_between( $start, $end, $graph );
        print qq[Path from '$start' to '$end' is: @path\n];
    }

Putting everything above together and running it will print the following output:

    Path from 's' to 'c' is: s a b c
    Path from 's' to 'f' is: s d e f

References

1. Aho, Ullman, Hopcroft – Data Structures and Algorithms.

2. Sedgewick – Algorithms in Java, Part 5: Graph Algorithms.

3. Orwant, Hietaniemi, Macdonald – Mastering Algorithms with Perl.

Appendix: The complete program listing

    #!perl

    use strict;
    use warnings;

    sub find_path_between {
        my ( $start, $end, $graph ) = @_;

        return () unless defined $start && defined $end;

        my @path;     # Path so far
        my @queue;    # Vertices still to visit.
        my %seen;     # Vertices already seen.
        my $found;    # Whether we have found the wanted vertex.
        my $st = {};  # Spanning tree, used to find paths.

        if ( $start eq $end ) {
            push @path, $start;
            return @path;
        }

        push @queue, $start;
        $seen{$start}++;

        while (@queue) {
            my $v         = shift @queue;
            my $neighbors = get_neighbors( $v, $graph );

            for my $neighbor (@$neighbors) {
                next if $seen{$neighbor};
                _st_add( $v, $neighbor, $st );
                if ( $neighbor eq $end ) {
                    $found++;
                    @path = _st_walk( $start, $end, $st );
                    return @path;
                }
                else {
                    push @queue, $neighbor;
                }
                $seen{$neighbor}++;
            }
        }
        return $found ? @path : ();
    }

    sub _st_walk {
        my ( $start, $end, $st ) = @_;

        my @path;

        push @path, $end;
        my $prev = $st->{$end}->{prev};
        while (1) {
            if ( $prev eq $start ) {
                push @path, $start;
                last;
            }
            push @path, $prev;
            $prev = $st->{$prev}->{prev};
            next;
        }
        return reverse @path;
    }

    sub _st_add {
        my ( $vertex, $neighbor, $st ) = @_;
        $st->{$neighbor}->{prev} = $vertex;
    }

    sub get_neighbors {
        my ( $k, $graph ) = @_;

        my $index = _find_index( $k, $graph );

        if ( defined $index ) {
            return $graph->[$index]->[1];
        }
        else {
            return;
        }
    }

    sub _find_index {
        my ( $wanted, $graph ) = @_;

        # Naive linear search, for now.
        my $i = 0;
        for my $elem (@$graph) {

            # Definedness check here is necessary because we delete
            # elements from the graph by setting the element's index to
            # undef.  In other words, some graph indices can be undef.
            if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
                return $i;
            }
            $i++;
        }
        return;
    }

    sub main {
        my $graph = [
            [ 's', [ 'a', 'd' ] ],
            [ 'a', [ 's', 'b', 'd' ] ],
            [ 'b', [ 'a', 'c', 'e' ] ],
            [ 'c', ['b'] ],
            [ 'd', [ 's', 'a', 'e' ] ],
            [ 'e', [ 'b', 'd', 'f' ] ]
        ];

        my $start = 's';
        my $end   = 'c';

        my @path = find_path_between( $start, $end, $graph );

        print qq[Path from '$start' to '$end' is: @path\n];

        # Find a second path.
        $end  = 'f';
        @path = find_path_between( $start, $end, $graph );
        print qq[Path from '$start' to '$end' is: @path\n];
    }

    main();

Pythagoras Pie

(Image courtesy Tomohiro Tachi under Creative Commons license.)

Recently I came across a fun programming challenge called the Pythagoras Pie, which was described as:

At a party a pie is to be shared by 100 guests. The first guest gets 1% of the pie, the second guest gets 2% of the remaining pie, the third gets 3% of the remaining pie, the fourth gets 4% and so on.

Write a script that figures out which guest gets the largest piece of pie.

I sat down for a few minutes, and wrote the obvious code. It iterates over the list of guests. For each guest, it calculates how large a piece of pie the guest will get. All the while, it stores size of the largest piece of pie it has seen so far.

Here is a solution in Perl.

sub slice_pie {
    my $iters   = shift;
    my $pie     = 1;
    my $largest = 0;
    my $winner  = 0;
    for ( 0 .. $iters ) {
        my $iter_value = $_ * .01;
        my $portion    = ( $iter_value * $pie );
        $pie = $pie - $portion;

        if ( $portion >= $largest ) {
            $largest = $portion;
            $winner  = $_;
        }
    }
    print qq[Winner is guest # $winner with the largest portion: $largest\n];
}

slice_pie(100);

The answer, as it turns out, is that the 10th guest gets the largest piece of the pie: 0.0628156509555295, or about 6%.

Just for fun, I wrote almost the same exact code once again, except this time in Perl 6. Even though this is a straightforward translation using the same basic loop structure, it has a few nice improvements:

  • No need for argument unpacking (saves a horizontal line — vertical compactness is good)
  • Nice type annotations mean we can call an integer an Int, which also helps the compiler
  • No need for parens around the for and if checks
sub slice-pie(Int $iters) {
    my $pie = 1;
    my $largest = 0;
    my Int $winner;

    for 0 .. $iters {
        my $iter_value = $_ * .01;
        my $portion = $iter_value * $pie;
        $pie -= $portion;

        if $portion >= $largest {
            $largest = $portion;
            $winner = $_;
        }
    }
    say qq[Winner is guest number $winner with the largest portion: $largest ];
}

slice-pie(100);

A Perl one liner to generate passwords

I’ve noticed that browsers like Safari and Chrome are helpfully offering to generate secure passwords for me when I create a new login somewhere.

Sometimes this is really nice! Certainly it’s better than having to take a few minutes to compose a new password, especially since the quality of passwords I can easily generate on the spot is … shall we say of questionable quality sometimes, depending on time of day, blood sugar levels, etc.

So just for fun I decided to type out a Perl one liner to generate passwords for me in situations where I don’t necessarily have access to (or want) Safari and friends to do it for me.

I make no claims nor warranties about the “security” of the passwords generated by the following code, but I sure did enjoy writing it.  Just for fun, I did paste an output string into Kaspersky’s online password strength tester, and according to the tester it’s … actually not bad?  (Again: not an expert here)

Anyway, here’s the code.  It loops over an alphanumeric array with some special characters thrown in, grabbing one character at random for each iteration.  It also folds the case of the character if the number of the current iteration is even (assuming the character is of the sort whose case can be folded, which some aren’t).

$ perl -E '@vals = split "", "0x1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ-?!@^&*()"; $_ % 2 == 0 ? print $vals[ rand($#vals) ] : print fc $vals[ rand($#vals) ] for 0 .. 24; say;'

To give you a sense of the output of this script, here’s the password I typed into the Kaspersky checker for reference:

Vp8vJmNnN*8(CrE8*30*4@JlC

Advent of Code 2017, Day 2

This is my solution for Day 2 of this year’s Advent of Code.

You may also enjoy browsing the Day 2 solutions megathread on Reddit.

PROBLEM

The spreadsheet consists of rows of apparently-random numbers. To make sure the recovery process is on the right track, they need you to calculate the spreadsheet’s checksum. For each row, determine the difference between the largest value and the smallest value; the checksum is the sum of all of these differences.

For example, given the following spreadsheet:

5 1 9 5
7 5 3
2 4 6 8

The first row’s largest and smallest values are 9 and 1, and their difference is 8.

The second row’s largest and smallest values are 7 and 3, and their difference is 4.

The third row’s difference is 6.

In this example, the spreadsheet’s checksum would be 8 + 4 + 6 = 18.

SOLUTION

(define (line->list line)
  ;; String -> List
  (let ((read-ln (field-reader (infix-splitter (rx (+ whitespace)))))
        (in-port (make-string-input-port line)))
    (receive (record fields)
        (read-ln in-port)
      (map string->number fields))))

(define (read-spreadsheet file)
  ;; File -> List[List[Number]]
  (call-with-input-file file
    (lambda (port)
      (let loop ((line (read-line port))
                 (results '()))
        (if (eof-object? line)
            results
            (loop (read-line port) (cons line results)))))))

(define (main prog+args)
  (let ((rows (read-spreadsheet "/Users/rloveland/Code/personal/advent-of-code/2017/02/02.dat")))
    (write (apply + (map
                     (lambda (row)
                       (let* ((xs (line->list row))
                              (min (apply min xs))
                              (max (apply max xs)))
                         (- max min)))
                     rows)))
    (newline)))

Advent of Code 2017, Day 1

This is my solution for Day 1 of this year’s Advent of Code.

You may also enjoy browsing the Day 1 solutions megathread on Reddit.

PROBLEM

The captcha requires you to review a sequence of digits (your puzzle input) and find the sum of all digits that match the next digit in the list. The list is circular, so the digit after the last digit is the first digit in the list.

For example:

  • 1122 produces a sum of 3 (1 + 2) because the first digit (1) matches the second digit and the third digit (2) matches the fourth digit.

  • 1111 produces 4 because each digit (all 1) matches the next.

  • 1234 produces 0 because no digit matches the next.

  • 91212129 produces 9 because the only digit that matches the next one is the last digit, 9.

SOLUTION

(define captcha-input "5994521226795838")

'(set! captcha-input "1111")

'(set! captcha-input "1122")

'(set! captcha-input "1234")

'(set! captcha-input "91212129")

(define (gather-matches s)
  ;; String -> List
  (let ((in-port (make-string-input-port s)) (count 0) (head #f) (vals '()))
    (let loop ((cur (read-char in-port)) (next (peek-char in-port)) (count count) (vals vals))
      (if (eof-object? next)
          (if (char=? cur head)
              (cons cur vals)
              vals)
          (cond ((= count 0)
                 (begin
                   (set! head cur)
                   (loop cur next (+ 1 count) vals)))
                 ((char=? cur next)
                 (loop (read-char in-port) (peek-char in-port) (+ 1 count) (cons cur vals)))
                (else (loop (read-char in-port) (peek-char in-port) (+ 1 count) vals)))))))

(define (main prog+args)
  (let* ((matches (gather-matches captcha-input))
         (matches* (map (lambda (c) (string->number (string c))) matches))
         (sum (apply + matches*)))
    (begin
      (format #t "MATCHES*: ~A~%" matches*)
      (format #t "SUM: ~A~%" sum))))

A Portable Scheme Module System

collins-mustang

 

In this post I’d like to introduce load-module, a portable Scheme module system.

Why did I write a module system?

  • Simplicity: A single-file module system in about 200 lines of code
  • Understandability: The implementation avoids wizardry and should be accessible to anyone who knows the language
  • Portability: One system that can be used across multiple implementations

The way it works is this:

  1. You have a file (say, utils.scm) with Scheme code in it that implements stuff that you want to live in the same module.
  2. You create another file (utils.mod, but that extension is easy to change) which lists the procedures and syntax you want to export.
  3. The load-module procedure reads utils.scm, rewriting unexported procedure names such that only the procedures you want exported show up at the top-level. Everything else gets rewritten during load-time as an ignorable “gensym” of the form %--gensym-utils-random-integer-8190504171, where “utils” is the module name, and “random-integer” is the procedure internal to your module.

The module file format is very simple:

(define-module utils
  (exports random-integer atom? take))

The module system exports one procedure: load-module. Run it like so to get the procedures from the aforementioned hypothetical utils package into your environment:

> (load "load-module.scm")
> (load-module 'utils)
#t
> (random-integer 199)
76
> (atom? 199)
#t

If you care, there’s more information about over at the project README.

(Image courtesy Geoff Collins under Creative Commons license.)

Announcing cpan.el

The Cat's Eye Nebula, one of the first planetary nebulae discovered, also has one of the most complex forms known to this kind of nebula. Eleven rings, or shells, of gas make up the Cat's Eye. Credit: NASA, ESA, HEIC, and The Hubble Heritage Team (STScI/AURA) Acknowledgment: R. Corradi (Isaac Newton Group of Telescopes, Spain) and Z. Tsvetanov (NASA) The Hubble Space Telescope is a project of international cooperation between NASA and the European Space Agency. NASA's Goddard Space Flight Center manages the telescope. The Space Telescope Science Institute conducts Hubble science operations. Goddard is responsible for HST project management, including mission and science operations, servicing missions, and all associated development activities.

The CPAN shell is just another shell, so why not drive it from Emacs?

If you write Perl code in Emacs, you may have wondered why we don’t have a simple mode for driving the CPAN shell (at least I couldn’t find one!).

Well, I finally stopped wondering. It wasn’t that hard to rip out the sh-specific parts of shell.el and make a new mode for the CPAN shell.

Here’s the code:

https://github.com/rmloveland/cpan-el

It’s easy to load up and drive from Emacs:

(add-to-list 'load-path (expand-file-name "/path/to/cpan-el/"))
(setq cpan-file-name "cpan")

(require 'cpan)

To run it, type M-x cpan.

There aren’t too any bells and whistles yet (completion, etc.), but you it’s pretty small so feel free to hack away.

(Image courtesy NASA Goddard Photo and Video under Creative Commons License.)

Scripting JIRA

black-origami-dragon

JIRA is everybody’s favorite enterprise issue management system. It’s the system many of us just love to hate.

Unlike some vocal people on the interwebs, I don’t “hate” JIRA, but I like to keep it at arm’s length. My job is not to be a JIRA jockey (although I do know me some JQL, about which more below) — my job is to get shit done.

And getting shit done quickly, for me at least, is usually a function of being able to control my tools from the command line, including via scripts.

That’s why I have written several scripts for interacting with JIRA from the command line (they’re explained below). You can get them from Github.

Note that they use some hard-coded values that match the JIRA server where I work, which you’ll have to change. However they do at least use your .netrc for passwords, etc., so you should be able to change them for your own use with a quick sed one-liner.

The functionality isn’t there to fully replace your web browser, especially if you work at a company that insists on enterprise-level JIRA jockeying, with crazy themes and labels and stuff, but here’s what’s included as of this writing — for each command, I’ve marked whether it is “plumbing” or “porcelain”:

  • jira-get-issue (plumbing): View an issue’s JSON, which you can pipe through tools such as jq to build other more generic tools you can control from your text editor, scripts, etc.
  • jira-create-issue (porcelain): This one is “end-user-ready”, in the sense that you call it and it pops up your text editor of choice so you can write the issue description, and when you close your editor the issue is created for you!
  • jira-search-issues (plumbing): Uses JQL to search your JIRA, and returns a bunch of issues as JSON for you to fiddle with however you prefer.
  • jira-add-comment (porcelain): Like jira-create-issue, this one pops open your text editor to add a comment to an issue.
  • jira-get-issue-status (plumbing): Gets back JSON describing an issue’s status. Used to check the status to transition an issue forwards or backwards using jira-set-issue-status (below).
  • jira-set-issue-status (porcelain-ish): Depending on an issue’s status, bump it forwards or backwards to an adjacent status. Help output describes the statuses and their ordering. (I’m not so sure about this one; more design work is needed, it’s still not really “porcelain”.)

As an example of using one of the plumbing tools to make a more user-friendly CLI tool, I have one I call jli that just gives me a list of my currently open issues:

#!/usr/bin/env sh

jira-search-issues "assignee = rloveland AND status not in \ 
(Closed, Resolved)" | jq '.key + " " + .fields.summary' | sed -e 's/\"//g';

Since it’s an arbitrary JQL query returning JSON, it’s easy to imagine how you might extend this to give separate lists per-project, ordered by status, etc. You can make your own little terminal- or text-file-based dashboard, send yourself a daily digest email using a larger script, and so forth.

There are some others, but I only have them on my work computer. For example, I have one that checks JIRA for new tickets by diffing the list of my assigned tickets and my local TODO list, which is in a text file.

Obviously these tools could use further development to be more modular and general-purpose, and they don’t come anywhere near covering the whole JIRA API, but they’re pretty small and easy to modify for your own use.

Most importantly, I use them every day at my job, so I know they work and are useful.

Hopefully they are useful to you too!

(Image courtesy Terry Robinson via Flickr under a Creative Commons license.)

Editing Chrome Textareas with Edwin

edwin-editing-textarea

In this post, I’ll describe how to edit Chrome textareas with the Edwin text editor that comes built-in with MIT/GNU Scheme.

If you just want to see the end result, see the screenshot and video at the end of this post.

These instructions will also work with recent releases of the Opera browser (since the newer Chromium-based versions can run Chrome plugins). They may also work at some point with Firefox, when Mozilla implements the new WebExtensions API.

At a high level, the steps to edit Chrome textareas with Edwin are:

  1. Install a browser add-on
  2. Customize Edwin with a few hacks
  3. Write a shell script to make it easy to launch Edwin from the command line
  4. Run a local “edit server” that interacts with the browser add-on and launches Edwin

On This Page

Install the ‘Edit with Emacs’ add-on

Install the Edit with Emacs add-on from the Chrome Web Store.

Load some Edwin hacks

The default way to open Edwin is to run

$ mit-scheme --edit

This just launches an Edwin editor window. From there, you need to manually open files and edit them.

What we need is a way to launch Edwin and open a specific file automatically. Most editors you are familiar with already do this, e.g.,

$ vim /tmp/foo.txt
$ emacsclient /tmp/bar.txt

To be able to launch Edwin in this way, we need to hack a few procedures in the file editor.scm in the MIT/GNU Scheme source and load them from the Edwin init file. We’ll tackle each of these tasks separately below.

Hacking editor.scm

To get Edwin to open a file on startup, we need to tweak three procedures in editor.scm to accept and/or pass around filename arguments:

  • CREATE-EDITOR
  • STANDARD-EDITOR-INITIALIZATION
  • EDIT

Here’s the code; you can just paste it into a file somewhere. For the purposes of this post we’ll call it open-edwin-on-file.scm:

;;;; open-edwin-on-file.scm -- Rich's hacks to open Edwin on a specific file.

;;; These (minor) changes are all to the file `editor.scm'. They are
;;; all that is needed to allow Edwin to be opened on a specific file
;;; by adding a `filename' argument to the EDIT procedure.

(define (create-editor file . args)
  (let ((args
     (if (null? args)
         create-editor-args
         (begin
           (set! create-editor-args args)
           args)))
        (filename (if (file-exists? file)
                      file
                      #f)))
    (reset-editor)
    (event-distributor/invoke! editor-initializations)
    (set! edwin-editor
      (make-editor "Edwin"
               (let ((name (and (not (null? args)) (car args))))
             (if name
                 (let ((type (name->display-type name)))
                   (if (not type)
                   (error "Unknown display type name:" name))
                   (if (not (display-type/available? type))
                   (error "Requested display type unavailable:"
                      type))
                   type)
                 (default-display-type '())))
               (if (null? args) '() (cdr args))))
    (set! edwin-initialization
      (lambda ()
        (set! edwin-initialization #f)
        (if filename
                (standard-editor-initialization filename)
                (standard-editor-initialization))
    (set! edwin-continuation #f)
    unspecific))))

(define (standard-editor-initialization #!optional filename)
  (with-editor-interrupts-disabled
   (lambda ()
     (if (and (not init-file-loaded?)
          (not inhibit-editor-init-file?))
     (begin
       (let ((filename (os/init-file-name)))
         (if (file-exists? filename)
         (load-edwin-file filename '(EDWIN) #t)))
       (set! init-file-loaded? #t)
       unspecific))))
  (let ((buffer (find-buffer initial-buffer-name))
        (filename (if (not (default-object? filename))
                      ((ref-command find-file) filename)
                      #f)))
    (if (and buffer
         (not inhibit-initial-inferior-repl?))
    (start-inferior-repl!
     buffer
     (nearest-repl/environment)
     (and (not (ref-variable inhibit-startup-message))
          (cmdl-message/append
           (cmdl-message/active
        (lambda (port)
          (identify-world port)
          (newline port)))
           (cmdl-message/strings
        "You are in an interaction window of the Edwin editor."
                "Type `C-h' for help, or `C-h t' for a tutorial."
                "`C-h m' will describe some commands."
                "`C-h' means: hold down the Ctrl key and type `h'.")))))))

(define (edit file . args)
  (call-with-current-continuation
   (lambda (continuation)
     (cond (within-editor?
        (error "edwin: Editor already running"))
       ((not edwin-editor)
        (apply create-editor file args))
       ((not (null? args))
        (error "edwin: Arguments ignored when re-entering editor" args))
       (edwin-continuation
        => (lambda (restart)
         (set! edwin-continuation #f)
         (within-continuation restart
           (lambda ()
             (set! editor-abort continuation)
             unspecific)))))
     (fluid-let ((editor-abort continuation)
         (current-editor edwin-editor)
         (within-editor? #t)
         (editor-thread (current-thread))
         (editor-thread-root-continuation)
         (editor-initial-threads '())
         (inferior-thread-changes? #f)
         (inferior-threads '())
         (recursive-edit-continuation #f)
         (recursive-edit-level 0))
       (editor-grab-display edwin-editor
     (lambda (with-editor-ungrabbed operations)
       (let ((message (cmdl-message/null)))
         (cmdl/start
          (make-cmdl
           (nearest-cmdl)
           dummy-i/o-port
           (lambda (cmdl)
         cmdl       ;ignore
         (bind-condition-handler (list condition-type:error)
             internal-error-handler
           (lambda ()
             (call-with-current-continuation
              (lambda (root-continuation)
            (set! editor-thread-root-continuation
                  root-continuation)
            (with-notification-output-port null-output-port
              (lambda ()
                (do ((thunks (let ((thunks editor-initial-threads))
                       (set! editor-initial-threads '())
                       thunks)
                     (cdr thunks)))
                ((null? thunks))
                  (create-thread root-continuation (car thunks)))
                (top-level-command-reader
                 edwin-initialization)))))))
         message)
           #f
           `((START-CHILD ,(editor-start-child-cmdl with-editor-ungrabbed))
         (CHILD-PORT ,(editor-child-cmdl-port (nearest-cmdl/port)))
         ,@operations))
          message))))))))

Update your Edwin init file

Then, you’ll need to tweak your Edwin init file (also known as ~/.edwin) to load this file into Edwin’s environment on startup:

(load "/path/to/open-edwin-on-file.scm" '(edwin))

Write a shell script to make it easier launch Edwin from the command line

Now that the EDIT procedure takes a filename argument, we can wrap this all up in a shell script that calls Edwin with the right arguments. There may be other ways to accomplish this than in the code shown below, but it works.

Note that the path to my local installation of MIT/GNU Scheme on Mac OS X is slightly tweaked from the official install location. What’s important is that Scheme is invoked using the right “band”, or image file. For more information, see the fine manual.

Take the code below and stick it somewhere on your $PATH; on my machine it lives at ~/bin/edwin.

#!/usr/bin/env sh

EDIT_FILE=$1
SCHEME_CODE="(edit \"$EDIT_FILE\")"

if [[ $(uname) == 'Darwin' ]]; then
  _SCHEME_DIR=/Applications/MIT-Scheme.app/Contents/Resources
  SCHEME=$_SCHEME_DIR/mit-scheme
  MITSCHEME_BAND=$SCHEME_DIR/all.com
  CMD=$SCHEME
fi

if [[ $(uname) == 'Linux' ]]; then
  CMD=scheme
fi

N=$RANDOM
F=/tmp/edit-$N.scm

touch $F
echo $SCHEME_CODE > $F

$CMD --load $F

Install an edit server

Although the extension is called ‘Edit with Emacs’, it can be used with any text editor. You just need to be able to run a local “edit server” that generates the right inputs and outputs. Since Chrome extensions can’t launch apps directly, the extension running in the browser needs to act as a client to a locally running server, which will launch the app.

Since we want to launch Edwin, we’ll need to run a local edit server. Here’s the one that I use:

https://gist.github.com/frodwith/367752

To get the server to launch Edwin, I save the gist somewhere as editserver.psgi and run the following script (for more information on the environment variables and what they mean, see the comments in the gist):

#!/usr/bin/env sh
EDITSERVER_CMD='edwin %s' \
EDITSERVER_BLOCKING=1 \
screen -d -m `which plackup` -s Starman -p 9292 -a ~/Code/mathoms/editserver.psgi

The relevant bit for running Edwin is the EDITSERVER_CMD environment variable, which we’ve set to run the edwin script shown above.

Note that this server is written in Perl and requires you to install the Starman and Plack modules. If you don’t like Perl or don’t know how to install Perl modules, there are other servers out there that should work for you, such as this one written in Python.

Edit text!

Once you’ve done everything above and gotten it working together, you should be able to click the “edit” button next to your browser textarea and start Edwin. It will look something like the following screenshot (which you saw at the beginning of this post):

edwin-editing-textarea

If you prefer video, check out this short demo on YouTube.

A Mini Python and Shell Tutorial

wooly-mammoth-cp

The following is an email I sent to a couple of coworkers whom I’d been teaching a short Python course for technical writers, using Automate the Boring Stuff with Python. The email was meant to show them a real-life example of how a technical writer can use Python and shell scripting to automate something that is, well, boring. In this case, the task was to clean up a CSV file containing a list of git commits to the AppNexus REST APIs.

Because of the way we received this data, it had duplicate entries, and lots of non-interesting merge commits that were unrelated to a feature (a feature is generally associated with a JIRA ticket). Our task was to review the commits and see if there was anything interesting that should be added to our monthly API release notes.

(The names of my coworkers have been changed, obv.)


To: Jane X. (‘REDACTED@appnexus.com’)

Subject: Filtered API git commits to review (bonus: mini Python & shell tutorial)

From: Rich Loveland (‘REDACTED@appnexus.com’)

CC: Victoria Y. (‘REDACTED@appnexus.com’)

Date: Wed, 18 Nov 2015 16:59:04 -0500

+Victoria for the code fun

Jane, the file of commit logs for you to review is attached (along with some others). But so what, that’s boring! Let’s talk about how it was made.

To make the really boring task of reviewing API git commits less awful, let’s do some programming for fun. First let’s write a short Python script to pull out only those commits that have a JIRA ticket ID in them (since we don’t care about the other ones), and call it ‘filter-commit-messages.py’:

  #!/usr/bin/env python

  import re
  import sys

  jira_pat = "[A-Z]+-[0-9]+"

  for line in sys.stdin.readlines():
      m = re.search(jira_pat, line)
      if m:
          print(line)

This tries to match a regular expression against each line of its input (in this case the compiled API git commit list), and prints the line if the match occurs.

Let’s make it executable from our shell:

$ cd ~/bin
$ ln -s ~/work/code/filter-commit-messages.py filter-commit-messages
$ chmod +x ~/bin/filter-commit-messages
$ export PATH=$HOME/bin:$PATH

Then we can run it on the text file with the git commits like so:

$ filter-commit-messages < api-release-november-2015.csv

(The “<” in the shell means “Read your input from this place”.)

This prints out only the matching lines, but there are a lot of annoying extra lines in the output. We can get rid of those lines while sorting them like so:

$ filter-commit-messages < api-release-november-2015.csv | sort 

(The ”

” in the shell means “Pass your output through to this other command”.)

Now that we are extracting only the important lines, let’s throw them in a file:

$ filter-commit-messages < api-release-november-2015.csv | sort > api-release-november-2015-actual.csv

(The “>” near the end means “Write all of the output to this place”.)

We can see how much less reading we have to do now by running a word count program (‘wc’) on the before and after files:

$ wc -l api-release-november-2015.csv # old
     201 api-release-november-2015.csv
$ wc -l api-release-november-2015-actual.csv # new
     115 api-release-november-2015-actual.csv

(The “-l” means “count the lines”.)

Now, since Jane and I each have to review half of the commits, we can use the ‘split’ shell command to break the file in half. Since we know the file is 115 lines, we need to tell ‘split’ how many lines to put in each half with the ‘-l’ option (see ‘man split’ in your terminal):

$ split -l 58 api-release-november-2015-actual.csv COMMITS-TO-REVIEW

‘split’ takes the last argument, “COMMITS-TO-REVIEW”, and creates two files based on that, “COMMITS-TO-REVIEWaa” and “COMMITS-TO-REVIEWbb”, which we can rename for each reviewer:

$ mv COMMITS-TO-REVIEWaa COMMITS-TO-REVIEW-RICH
$ mv COMMITS-TO-REVIEWbb COMMITS-TO-REVIEW-JANE

A nice thing is that because we sorted the lines of the files, each reviewer gets commits by a sorted subset of the engineers, making it easier to see their related commits next to each other.

p.s. We didn’t actually need a Python program for the first part, we could have just used ‘grep’ and stayed with shell commands. But hey!

p.p.s. With more work, this could all be put together into a single program if we were inclined, but since it doesn’t get used that often it’s probably OK to type a few commands.

(Image courtesy William Hartman under Creative Commons license.)