Scripting Language Idioms: The “Seen” Hash

The “seen” hash technique is an idiom that lets you use a hash (or dictionary if you prefer) as a set data type. It’s good for generating a de-duplicated list of things, where each thing appears only once. If your language of choice has a real set data type, you may want to use that instead.

To illustrate I’ll offer a real-world use case.

The other day at work I needed to grab a bunch of information about git commits from a batch of automated emails. For reasons that don’t matter right now, our team (Docs) gets automated emails about git commits on our API (it’s not really what we asked for, but it’s what we could get somebody to build).

As a result, we get a bunch of emails formatted like this (personal details changed to protect the guilty):

--------- Project: Foo Details: something something garbage noise etc.

J. Random Luser ABC-123 did something to some other thing
Alyssa P. Hacker ABC-124 fixed mistake in ABC-123
Ben Bitdiddle ABC-125 yet another thing was done
J. Random Luser oh yeah this thing too
Ben Bitdiddle Merge ABC-123 into Foo/master

Luckily I don’t need to read all of these darn things. I have a filter set up on my mail client that saves them all in my ‘Archive’ folder, where I can safely ignore them.

When we’re getting ready to do our API release notes, I go into my ‘Archive’ folder and search for all of the emails with the subject “Project: Foo” that arrived between our last set of release notes and today. I end up with (say) about 100 files formatted like the above.

The format is: Name, JIRA ticket ID, description. Except that sometimes there is no JIRA ticket ID. And sometimes there are duplicate ticket IDs, since the emails contain messages about merge commits.

As a tech writer, I don’t need to look at the contents of every commit. I need to generate a (de-duplicated) list of JIRA ticket IDs, so I can go and review those tickets to see if there is user-facing docs work that needs to happen for those commits. (Sometimes I still need to look at the commits anyway because a ticket has a description like “change the frobnitz”, but hey.)

So I save all of these email files into a directory, and I write some code that loops over each file, generating a set of JIRA ticket IDs, which I then print out. Here’s the code what done it (it’s written in Perl but could as easily be Ruby or Python or whatevs):

#!perl

use strict;
use warnings;
use feature     qw/ say   /;
use File::Slurp qw/ slurp /;

my @files = glob('*.eml');
my $jira_pat = '([A-Z]+-[0-9]+)';
my %seen;

for my $f (@files) {
  my @lines = slurp($f);

  for my $line (@lines) {
    next unless $line =~ /$jira_pat/; # Skip unless it has a JIRA ticket ID
    my $id = $1;                      # If it did match, save the capture
    $seen{$id}++;                     # Stick the ID in the hash (as a key)
  }
}

say for sort keys %seen;        # Print out all the keys (which are de-duped)

The reason this trick works is that a hash table can’t have duplicate keys. Therefore the ‘$seen{$id}++’ bit means: “Stick the ID in the hash, and increment its value”. Based on the example email above, you end up with a hash table that looks like this:

{
  ABC-123 => 2,
  ABC-124 => 1,
  ABC-125 => 1,
}

Then we print the keys using the line ‘say for sort keys %seen’, which just means “print the hash keys in sorted order”.
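For readers who don’t speak Perl, here is a rough Python sketch of the same idea. The function name `unique_ticket_ids` and the `sample` list are made up for illustration (the lines are the ones from the example email); the regex is the one from the Perl script. Like the Perl version, it only looks at the first ticket ID on each line.

```python
import re

JIRA_PAT = re.compile(r"([A-Z]+-[0-9]+)")  # same pattern as $jira_pat above


def unique_ticket_ids(lines):
    """Return the sorted, de-duplicated ticket IDs found in `lines`."""
    seen = {}  # a plain dict used as the "seen" hash
    for line in lines:
        m = JIRA_PAT.search(line)
        if not m:
            continue  # skip lines without a ticket ID
        ticket = m.group(1)
        seen[ticket] = seen.get(ticket, 0) + 1  # count each occurrence
    return sorted(seen)  # iterating a dict yields its (unique) keys


sample = [
    "J. Random Luser ABC-123 did something to some other thing",
    "Alyssa P. Hacker ABC-124 fixed mistake in ABC-123",
    "Ben Bitdiddle ABC-125 yet another thing was done",
    "J. Random Luser oh yeah this thing too",
    "Ben Bitdiddle Merge ABC-123 into Foo/master",
]
print(unique_ticket_ids(sample))  # ['ABC-123', 'ABC-124', 'ABC-125']
```

Python does have a real set type, so if you don’t care about the counts you could collect the IDs in a `set()` instead.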

Perl’s Autovivification FTW

Interestingly, part of the reason this idiom is cleaner in Perl than in, say, Ruby, is that Perl does something called “autovivification” of hash keys. Basically, it means that stuff gets created as soon as you mention it. That’s why you can call the ‘$seen{$id}++’ all in one line. (If you want more information about autovivification, there’s a good article on the Wikipedia.)

By contrast, in Ruby you have to first explicitly create the key’s value, and then increment it. As you can see below, if you try to bump the value of a key that doesn’t exist yet, you get an error (unless you use the technique from the Wikipedia article).

irb(main):015:0> RUBY_VERSION
=> "2.2.4"
irb(main):010:0> tix
=> {"ABC-123"=>2, "ABC-124"=>1}
irb(main):011:0> tix['ABC-125'] += 1
NoMethodError: undefined method `+' for nil:NilClass
    from (irb):11
    from c:/Ruby22/bin/irb:11:in `<main>'
irb(main):012:0> tix['ABC-125'] = 1
=> 1
irb(main):013:0> tix
=> {"ABC-123"=>2, "ABC-124"=>1, "ABC-125"=>1}
irb(main):014:0> tix['ABC-125'] += 1
=> 2
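Python sits somewhere in between, as far as I can tell. A plain dict complains about a missing key much like Ruby does, while `collections.defaultdict` gets you close to the Perl behavior. A small sketch, reusing the ticket IDs from the irb session above (the variable names are just for illustration):

```python
from collections import defaultdict

plain = {"ABC-123": 2, "ABC-124": 1}
try:
    plain["ABC-125"] += 1  # the key does not exist yet
except KeyError as err:
    print("plain dict raised:", repr(err))

tix = defaultdict(int, plain)  # missing keys start out as int(), i.e. 0
tix["ABC-125"] += 1            # works on first mention
tix["ABC-125"] += 1
print(dict(tix))
```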


How Dangerous is the Samsung Galaxy Note 7? It’s Safer than Driving

Now that the Galaxy Note 7 has been officially discontinued, I’m not sure it’s worth worrying about the failure rate of this device. But there’s something that really bothered me about the coverage of the device’s various recalls and eventual discontinuation (is that a word?), which was that almost nobody seemed to be running the numbers on the actual failure rates.

If you do the arithmetic on the device failure rates, you end up looking at the situation rather differently. This is not to say that the device being discontinued was the wrong decision — all it takes is one person being horribly burned to create a panic and do serious damage to the company, not to mention that person!

Rather, I think it’s interesting to do the arithmetic as a way of exploring how humans think about risk. It may not surprise you to hear that I think we are really bad at this. And oftentimes it’s because we don’t run the numbers.

With that said, let’s look at some numbers.

According to this article on the Galaxy Note 7 recall, there were about 2.5 million devices sold in the initial batch, and, at least in early September, there had been 35 handsets discovered with the issue. A later report said that over 70 devices had overheated.

The best final count I could find is the one from the Consumer Product Safety Commission. According to the CPSC, there have been 92 reports of the batteries overheating.

How much danger was I really in? (I just returned my Galaxy Note 7 yesterday, which I LOVED, which is part of why I’m writing this.)

  • 2.5 million phones
  • 92 incidents of overheating

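Turning to my trusty calculator, that looks like a 1 in 26,000 chance of the device overheating (the calculations below use 2.4 million devices rather than 2.5 million, which if anything slightly overstates the risk):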

CL-USER> (/ 2.4e6 92)
26086.957

Expressed as a percentage, there is approximately a 0.004% chance that your device would have been one of the ones to overheat:

CL-USER> (format t "~6$" (* (/ 92 2.4e6) 100))
0.003833

However, let’s make a more conservative assumption that 1000 devices (over 10x as many) would eventually overheat. That’s still 0.04%, far less than one tenth of one percent. Of course, that number “less than one tenth of one percent” was quoted by Samsung themselves during the initial recall:

CL-USER> (format t "~6$" (* (/ 1000 2.4e6) 100))
0.041667
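If you’d rather check the arithmetic in something other than a Lisp REPL, here is the same calculation as a small Python sketch. The function name `failure_pct` is made up; the inputs are the incident counts and the 2.4 million denominator from the calculations above.

```python
def failure_pct(incidents, devices):
    """Percentage of devices that failed."""
    return incidents / devices * 100


DEVICES = 2.4e6  # denominator used in the calculations above

for incidents in (92, 1000):
    pct = failure_pct(incidents, DEVICES)
    odds = DEVICES / incidents
    print(f"{incidents:>5} incidents: {pct:.6f}% (about 1 in {odds:,.0f})")
```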

Eventually, the bad PR due to overheating devices grew to be too much, and Samsung discontinued the model.

One lesson of this incident seems to be that you can make a product that is nearly perfect, with a 0.0038% failure rate, but if the failure mode is bad enough (it probably is), and if the media exposure is widespread enough to create a public outcry (it definitely was), you’re fucked. With lines like this one appearing in the Verge, it’s not hard to understand why Samsung realized they had to just kill it:

It’s easy to imagine how terrifying it would be to have a phone begin smoking like this on a plane or on your bedside table. No thanks.

The mobile device hardware industry is brutal. You can’t have a failure that occurs even in 0.0038% of devices. And even if you maintain that near-perfect safety record you still have to compete on price, features, and time-to-market. I really don’t envy those folks.

But what about risk assessment?

What I find most interesting, as I mentioned above, is what this reflects about how humans assess risk. For example, in 2012, 92 people died in car accidents every day, and nobody has ever considered doing a recall of all automobiles sold in the United States for being fundamentally unsafe!

According to the chart linked above, in 2012 there were 10.691 auto accident deaths per 100,000 people, which means that you had a 0.01% chance of literally dying in a horrible car crash:

CL-USER> (format t "~6$" (* (/ 10.691 1e5) 100))
0.010691

Compare this to the 0.004% chance of your phone overheating that we calculated above (based on the 92 incidents figure). Given the rather imprecise way we’ve been slinging these numbers around, let’s just assume there’s a lot of error there. Even so, the car crash figure comes out more than twice as high.

That’s how we arrive at our conclusion:

You had no more chance of your Samsung Galaxy Note 7 overheating than you did of dying in a car crash in 2012.
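Converting “per 100,000” into a percentage is an easy place to drop a decimal, so here is the comparison spelled out as a Python sketch. The inputs are the figures quoted above (10.691 deaths per 100,000 people, and 92 incidents out of the 2.4 million devices used in the earlier calculation); the variable names are mine.

```python
car_deaths_per_100k = 10.691                    # 2012 figure quoted above
car_pct = car_deaths_per_100k / 100_000 * 100   # per-100,000 rate as a percentage

phone_pct = 92 / 2.4e6 * 100                    # overheating rate from earlier

print(f"car crash death: {car_pct:.6f}%")
print(f"phone overheat:  {phone_pct:.6f}%")
print(f"car / phone:     {car_pct / phone_pct:.1f}x")
```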

(Note: this article and its headline do not constitute a claim that the Note 7 is “safe”, or that you should not return it as recommended, etc. This article is not advice on how to live your life, it’s just an exploration of how humans think about risk.)

How to Sync your Opera 12 bookmarks, history, and mail using Dropbox


In this tutorial I’m going to show you how to sync your Opera 12 bookmarks, history, and mail using Dropbox. It involves doing some annoying things to your files (symlinking, etc.) so that Dropbox does the hard work of keeping your bookmarks, mail, history, etc., updated across your various computers. On the bright side, you should only have to do it once.

This post assumes you are a Dropbox user, and you have Opera 12 installed on multiple computers. It doesn’t cover mobile devices, since there was never a mobile version of Opera’s desktop browser.

These instructions work for Linux and the Mac. I would like to do this on my Windows laptop too, but I haven’t been able to get it to work yet. I can’t seem to get Opera to use a link to a folder as if it’s a real folder. If someone knows how to do this, feel free to leave a comment.

One more thing:

OPERA 12 IS OLD AND UNSUPPORTED SOFTWARE THAT PROBABLY HAS SECURITY VULNERABILITIES. YOU ALMOST CERTAINLY SHOULD NOT USE IT. OPERA SOFTWARE ASA DOESN’T WANT YOU TO USE IT. THE ONLY REASON I FEEL “SAFE” USING OPERA 12 IS BECAUSE I BLACKLIST ALL JAVASCRIPT BY DEFAULT AND USE MODERN UP-TO-DATE BROWSERS FOR ONLINE BANKING, INTERACTIVE WEB APPS, ETC.

OK, now that I’ve shouted a disclaimer at you, let’s do this.

A note on using Dropbox

You may have some concerns about Dropbox’s file synchronization messing up your stuff. For what it’s worth, I’ve been doing this for several months now, and have found that for the most part it just works. In my experience Dropbox basically just handles file updates with no drama.

If it does encounter drama (e.g., finds that it can’t handle a conflict between two different versions of a file), it creates a new version of the file called something like “autosave\ (glamdring2's\ conflicted\ copy\ 2016-02-27).win”, where “glamdring2” is the name of the computer with the conflicted file.

This means you can manually fix things if you have to, but you basically never have to as long as you take care to shut down Opera on computer A and wait until Dropbox is fully synced to start Opera on computer B. This normally takes just a few seconds.
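If you want to check for those conflicted copies without clicking around, something like the following Python sketch should do it. It assumes the conflict files always have “conflicted copy” in their names, as in the example above; the function name is made up.

```python
from pathlib import Path


def find_conflicted_copies(root):
    """Return files under `root` whose names contain "conflicted copy"."""
    root = Path(root).expanduser()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and "conflicted copy" in p.name
    )
```

You would call it on wherever your synced Opera folder lives, e.g. `find_conflicted_copies("~/Dropbox/opera_config")` if you use the folder name from Step 2 below.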

Remember, you should not be doing this with your only copy of your Opera folder. You need to keep regular backups of your Opera folder, and syncing is not a backup.

Step 1. Find your Opera folder

Because Opera is a pretty sane app, it stores all of its junk in one folder (with the exception of the Mac version, about which more below).

Depending on your platform, you’ll find the folder in different places. For more information about where your version of Opera stores its files, see Files used by Opera.

On Linux, it’s easy. Everything is in

~/.opera

On Mac, there is a slight wrinkle. Opera 12 stores your Opera prefs and bookmarks in one place, and your mail in another. This means we need to get them all together in one place so we can sync them with Dropbox.

According to the link above, here’s where your bookmarks, history, etc. are stored if you’re a Mac user:

~/Library/Opera

And it stores your mail in:

~/Library/Application Support/Opera

This is because (according to the docs), user data of “significant size” should go into this other random folder. Um, okay.

In any case, in order to get your Mac install of Opera 12 to work with the actual mail you’ll be syncing in your Dropbox folder, you need to do some goofy stuff with symbolic links, namely:

$ cd ~/Library/Application\ Support/Opera/
$ mv mail ~/Library/Opera
$ ln -s ~/Library/Opera/mail .

Now everything should all be in ~/Library/Opera (Mac) or ~/.opera (Linux).

Step 2. Copy your Opera folder into Dropbox

Now that we have all of our Opera things in one place, let’s copy the single Opera folder into Dropbox and link it back to wherever your Opera app is expecting the folder to be. On Linux you would do something like this in your terminal – note that Opera should not be running when you do this:

$ mv ~/.opera ~/.opera.bak # Minimal backup copy, you should really do more
$ cp -R ~/.opera.bak ~/Dropbox/opera_config

For Mac it’s pretty similar:

$ mv ~/Library/Opera ~/Library/Opera.bak # Minimal backup copy
$ cp -R ~/Library/Opera.bak ~/Dropbox/opera_config

In English, this:

  • Makes a backup of your Opera folder so if something gets messed up, you can put things back the way they were
  • Copies the backup into your Dropbox folder, where it can be synced across all of your machines

Step 3. Link the folder from Dropbox back to where the app wants it

Now we need to make it look to the Opera 12 app like everything is totally normal. We’ll do that by creating a link from the folder sitting in Dropbox back to where the app expects it to be.

Again, I’m really happy to hear from people about how to do this on Windows. Thus far all of my attempts have failed, and the way Windows handles what it calls “links” seems basically Crazytown. But it’s probably just that I don’t understand them well enough.

Anyway, here’s how you do the linking on Linux:

$ ln -s ~/Dropbox/opera_config ~/.opera

And on Mac:

$ ln -s ~/Dropbox/opera_config ~/Library/Opera
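If you have several machines to set up, Steps 2 and 3 can be rolled into one small script. Here is a sketch in Python; the function name `move_into_dropbox` is made up, and the refusal to overwrite anything is my own caution rather than something the shell commands do. As above, Opera should not be running.

```python
import shutil
from pathlib import Path


def move_into_dropbox(opera_dir, dropbox_copy):
    """Back up `opera_dir`, copy it to `dropbox_copy`, and symlink it back."""
    opera_dir = Path(opera_dir).expanduser()
    dropbox_copy = Path(dropbox_copy).expanduser()
    backup = opera_dir.with_name(opera_dir.name + ".bak")

    if opera_dir.is_symlink():
        raise RuntimeError(f"{opera_dir} is already a symlink; nothing to do")
    if backup.exists() or dropbox_copy.exists():
        raise RuntimeError("backup or Dropbox copy already exists; not overwriting")

    opera_dir.rename(backup)               # mv ~/.opera ~/.opera.bak
    shutil.copytree(backup, dropbox_copy)  # cp -R ~/.opera.bak ~/Dropbox/opera_config
    # ln -s ~/Dropbox/opera_config ~/.opera
    opera_dir.symlink_to(dropbox_copy.resolve(), target_is_directory=True)
    return backup
```

On Linux that would be `move_into_dropbox("~/.opera", "~/Dropbox/opera_config")`, and on the Mac `move_into_dropbox("~/Library/Opera", "~/Dropbox/opera_config")`.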

Step 4. Profit?

At this point you should be syncing your mail, bookmarks, history, and sessions across computers. When you shut down Opera 12 on your work machine (a Mac laptop), you can open up Opera 12 on your home workstation running Linux and be in the same session, looking at the same tabs and emails. Everything should just magically work.

If it doesn’t, please let me know in the comments and I’ll try to help.

Editing Chrome Textareas with Edwin

edwin-editing-textarea

In this post, I’ll describe how to edit Chrome textareas with the Edwin text editor that comes built-in with MIT/GNU Scheme.

If you just want to see the end result, see the screenshot and video at the end of this post.

These instructions will also work with recent releases of the Opera browser (since the newer Chromium-based versions can run Chrome plugins). They may also work at some point with Firefox, when Mozilla implements the new WebExtensions API.

At a high level, the steps to edit Chrome textareas with Edwin are:

  1. Install a browser add-on
  2. Customize Edwin with a few hacks
  3. Write a shell script to make it easy to launch Edwin from the command line
  4. Run a local “edit server” that interacts with the browser add-on and launches Edwin


Install the ‘Edit with Emacs’ add-on

Install the Edit with Emacs add-on from the Chrome Web Store.

Load some Edwin hacks

The default way to open Edwin is to run

$ mit-scheme --edit

This just launches an Edwin editor window. From there, you need to manually open files and edit them.

What we need is a way to launch Edwin and open a specific file automatically. Most editors you are familiar with already do this, e.g.,

$ vim /tmp/foo.txt
$ emacsclient /tmp/bar.txt

To be able to launch Edwin in this way, we need to hack a few procedures in the file editor.scm in the MIT/GNU Scheme source and load them from the Edwin init file. We’ll tackle each of these tasks separately below.

Hacking editor.scm

To get Edwin to open a file on startup, we need to tweak three procedures in editor.scm to accept and/or pass around filename arguments:

  • CREATE-EDITOR
  • STANDARD-EDITOR-INITIALIZATION
  • EDIT

Here’s the code; you can just paste it into a file somewhere. For the purposes of this post we’ll call it open-edwin-on-file.scm:

;;;; open-edwin-on-file.scm -- Rich's hacks to open Edwin on a specific file.

;;; These (minor) changes are all to the file `editor.scm'. They are
;;; all that is needed to allow Edwin to be opened on a specific file
;;; by adding a `filename' argument to the EDIT procedure.

(define (create-editor file . args)
  (let ((args
     (if (null? args)
         create-editor-args
         (begin
           (set! create-editor-args args)
           args)))
        (filename (if (file-exists? file)
                      file
                      #f)))
    (reset-editor)
    (event-distributor/invoke! editor-initializations)
    (set! edwin-editor
      (make-editor "Edwin"
               (let ((name (and (not (null? args)) (car args))))
             (if name
                 (let ((type (name->display-type name)))
                   (if (not type)
                   (error "Unknown display type name:" name))
                   (if (not (display-type/available? type))
                   (error "Requested display type unavailable:"
                      type))
                   type)
                 (default-display-type '())))
               (if (null? args) '() (cdr args))))
    (set! edwin-initialization
      (lambda ()
        (set! edwin-initialization #f)
        (if filename
                (standard-editor-initialization filename)
                (standard-editor-initialization))
    (set! edwin-continuation #f)
    unspecific))))

(define (standard-editor-initialization #!optional filename)
  (with-editor-interrupts-disabled
   (lambda ()
     (if (and (not init-file-loaded?)
          (not inhibit-editor-init-file?))
     (begin
       (let ((filename (os/init-file-name)))
         (if (file-exists? filename)
         (load-edwin-file filename '(EDWIN) #t)))
       (set! init-file-loaded? #t)
       unspecific))))
  (let ((buffer (find-buffer initial-buffer-name))
        (filename (if (not (default-object? filename))
                      ((ref-command find-file) filename)
                      #f)))
    (if (and buffer
         (not inhibit-initial-inferior-repl?))
    (start-inferior-repl!
     buffer
     (nearest-repl/environment)
     (and (not (ref-variable inhibit-startup-message))
          (cmdl-message/append
           (cmdl-message/active
        (lambda (port)
          (identify-world port)
          (newline port)))
           (cmdl-message/strings
        "You are in an interaction window of the Edwin editor."
                "Type `C-h' for help, or `C-h t' for a tutorial."
                "`C-h m' will describe some commands."
                "`C-h' means: hold down the Ctrl key and type `h'.")))))))

(define (edit file . args)
  (call-with-current-continuation
   (lambda (continuation)
     (cond (within-editor?
        (error "edwin: Editor already running"))
       ((not edwin-editor)
        (apply create-editor file args))
       ((not (null? args))
        (error "edwin: Arguments ignored when re-entering editor" args))
       (edwin-continuation
        => (lambda (restart)
         (set! edwin-continuation #f)
         (within-continuation restart
           (lambda ()
             (set! editor-abort continuation)
             unspecific)))))
     (fluid-let ((editor-abort continuation)
         (current-editor edwin-editor)
         (within-editor? #t)
         (editor-thread (current-thread))
         (editor-thread-root-continuation)
         (editor-initial-threads '())
         (inferior-thread-changes? #f)
         (inferior-threads '())
         (recursive-edit-continuation #f)
         (recursive-edit-level 0))
       (editor-grab-display edwin-editor
     (lambda (with-editor-ungrabbed operations)
       (let ((message (cmdl-message/null)))
         (cmdl/start
          (make-cmdl
           (nearest-cmdl)
           dummy-i/o-port
           (lambda (cmdl)
         cmdl       ;ignore
         (bind-condition-handler (list condition-type:error)
             internal-error-handler
           (lambda ()
             (call-with-current-continuation
              (lambda (root-continuation)
            (set! editor-thread-root-continuation
                  root-continuation)
            (with-notification-output-port null-output-port
              (lambda ()
                (do ((thunks (let ((thunks editor-initial-threads))
                       (set! editor-initial-threads '())
                       thunks)
                     (cdr thunks)))
                ((null? thunks))
                  (create-thread root-continuation (car thunks)))
                (top-level-command-reader
                 edwin-initialization)))))))
         message)
           #f
           `((START-CHILD ,(editor-start-child-cmdl with-editor-ungrabbed))
         (CHILD-PORT ,(editor-child-cmdl-port (nearest-cmdl/port)))
         ,@operations))
          message))))))))

Update your Edwin init file

Then, you’ll need to tweak your Edwin init file (also known as ~/.edwin) to load this file into Edwin’s environment on startup:

(load "/path/to/open-edwin-on-file.scm" '(edwin))

Write a shell script to make it easier to launch Edwin from the command line

Now that the EDIT procedure takes a filename argument, we can wrap this all up in a shell script that calls Edwin with the right arguments. There may be other ways to accomplish this than in the code shown below, but it works.

Note that the path to my local installation of MIT/GNU Scheme on Mac OS X is slightly tweaked from the official install location. What’s important is that Scheme is invoked using the right “band”, or image file. For more information, see the fine manual.

Take the code below and stick it somewhere on your $PATH; on my machine it lives at ~/bin/edwin.

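#!/usr/bin/env bash
# Uses [[ ]] and $RANDOM, so this needs bash rather than plain sh.

EDIT_FILE=$1
SCHEME_CODE="(edit \"$EDIT_FILE\")"

if [[ $(uname) == 'Darwin' ]]; then
  _SCHEME_DIR=/Applications/MIT-Scheme.app/Contents/Resources
  SCHEME=$_SCHEME_DIR/mit-scheme
  # Export the band (image file) location so the scheme process sees it.
  export MITSCHEME_BAND=$_SCHEME_DIR/all.com
  CMD=$SCHEME
fi

if [[ $(uname) == 'Linux' ]]; then
  CMD=scheme
fi

N=$RANDOM
F=/tmp/edit-$N.scm

# Write the one-line Scheme program that opens the file in Edwin.
echo "$SCHEME_CODE" > "$F"

"$CMD" --load "$F"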
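One thing the shell script glosses over is that the filename gets pasted straight into a Scheme string, so a path containing a double quote or a backslash would produce broken Scheme code. Here is a sketch of the same launcher in Python that escapes the path and lets `tempfile` pick a unique file name. The function names are made up, and `scheme` as the default command is an assumption taken from the Linux branch of the script.

```python
import subprocess
import tempfile


def scheme_edit_expr(path):
    """Build the (edit "...") form, escaping backslashes and double quotes."""
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return f'(edit "{escaped}")'


def launch_edwin(path, scheme_cmd="scheme"):
    """Write the form to a unique temp file and pass it to Scheme via --load."""
    with tempfile.NamedTemporaryFile(
        "w", prefix="edit-", suffix=".scm", delete=False
    ) as f:
        f.write(scheme_edit_expr(path) + "\n")
    return subprocess.call([scheme_cmd, "--load", f.name])
```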

Install an edit server

Although the extension is called ‘Edit with Emacs’, it can be used with any text editor. You just need to be able to run a local “edit server” that generates the right inputs and outputs. Since Chrome extensions can’t launch apps directly, the extension running in the browser needs to act as a client to a locally running server, which will launch the app.

Since we want to launch Edwin, we’ll need to run a local edit server. Here’s the one that I use:

https://gist.github.com/frodwith/367752

To get the server to launch Edwin, I save the gist somewhere as editserver.psgi and run the following script (for more information on the environment variables and what they mean, see the comments in the gist):

#!/usr/bin/env sh
EDITSERVER_CMD='edwin %s' \
EDITSERVER_BLOCKING=1 \
screen -d -m `which plackup` -s Starman -p 9292 -a ~/Code/mathoms/editserver.psgi

The relevant bit for running Edwin is the EDITSERVER_CMD environment variable, which we’ve set to run the edwin script shown above.

Note that this server is written in Perl and requires you to install the Starman and Plack modules. If you don’t like Perl or don’t know how to install Perl modules, there are other servers out there that should work for you, such as this one written in Python.
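To give a sense of what an edit server actually does, here is a minimal sketch in Python using only the standard library. It rests on an assumption I have not checked against the extension’s source: that the extension POSTs the textarea contents as the request body and expects the edited text back as the response body. The names `edit_text`, `EditHandler`, and `serve` are made up; the port and the ‘edwin %s’ command template are the ones from the script above.

```python
import shlex
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

EDITOR_CMD = "edwin %s"  # same shape as EDITSERVER_CMD above


def edit_text(text, editor_cmd=EDITOR_CMD):
    """Save `text` to a temp file, run the editor on it, return the result."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
    path = Path(f.name)
    try:
        # Substitute the temp file path for the %s placeholder.
        argv = [str(path) if part == "%s" else part
                for part in shlex.split(editor_cmd)]
        subprocess.run(argv, check=True)  # blocks until the editor exits
        return path.read_text(encoding="utf-8")
    finally:
        path.unlink()


class EditHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed protocol: body in, edited body out.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        edited = edit_text(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(edited)))
        self.end_headers()
        self.wfile.write(edited)


def serve(port=9292):
    """Listen on localhost only and handle edit requests until interrupted."""
    HTTPServer(("127.0.0.1", port), EditHandler).serve_forever()
```

Calling `serve()` would start it on port 9292. Treat this as an illustration of the moving parts rather than a drop-in replacement for the servers linked above.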

Edit text!

Once you’ve done everything above and gotten it working together, you should be able to click the “edit” button next to your browser textarea and start Edwin. It will look something like the following screenshot (which you saw at the beginning of this post):

edwin-editing-textarea

If you prefer video, check out this short demo on YouTube.

A Mini Python and Shell Tutorial

wooly-mammoth-cp

The following is an email I sent to a couple of coworkers whom I’d been teaching a short Python course for technical writers, using Automate the Boring Stuff with Python. The email was meant to show them a real-life example of how a technical writer can use Python and shell scripting to automate something that is, well, boring. In this case, the task was to clean up a CSV file containing a list of git commits to the AppNexus REST APIs.

Because of the way we received this data, it had duplicate entries, and lots of non-interesting merge commits that were unrelated to a feature (a feature is generally associated with a JIRA ticket). Our task was to review the commits and see if there was anything interesting that should be added to our monthly API release notes.

(The names of my coworkers have been changed, obv.)


To: Jane X. (‘REDACTED@appnexus.com’)

Subject: Filtered API git commits to review (bonus: mini Python & shell tutorial)

From: Rich Loveland (‘REDACTED@appnexus.com’)

CC: Victoria Y. (‘REDACTED@appnexus.com’)

Date: Wed, 18 Nov 2015 16:59:04 -0500

+Victoria for the code fun

Jane, the file of commit logs for you to review is attached (along with some others). But so what, that’s boring! Let’s talk about how it was made.

To make the really boring task of reviewing API git commits less awful, let’s do some programming for fun. First let’s write a short Python script to pull out only those commits that have a JIRA ticket ID in them (since we don’t care about the other ones), and call it ‘filter-commit-messages.py’:

  #!/usr/bin/env python

  import re
  import sys

  jira_pat = "[A-Z]+-[0-9]+"

  for line in sys.stdin.readlines():
      m = re.search(jira_pat, line)
      if m:
          print(line)

This tries to match a regular expression against each line of its input (in this case the compiled API git commit list), and prints the line if the match occurs.

Let’s make it executable from our shell:

$ cd ~/bin
$ ln -s ~/work/code/filter-commit-messages.py filter-commit-messages
$ chmod +x ~/bin/filter-commit-messages
$ export PATH=$HOME/bin:$PATH

Then we can run it on the text file with the git commits like so:

$ filter-commit-messages < api-release-november-2015.csv

(The “<” in the shell means “Read your input from this place”.)

This prints out only the matching lines, but there are a lot of annoying extra lines in the output. We can get rid of those lines while sorting them like so:

$ filter-commit-messages < api-release-november-2015.csv | sort 

(The “|” in the shell means “Pass your output through to this other command”.)

Now that we are extracting only the important lines, let’s throw them in a file:

$ filter-commit-messages < api-release-november-2015.csv | sort > api-release-november-2015-actual.csv

(The “>” near the end means “Write all of the output to this place”.)

We can see how much less reading we have to do now by running a word count program (‘wc’) on the before and after files:

$ wc -l api-release-november-2015.csv # old
     201 api-release-november-2015.csv
$ wc -l api-release-november-2015-actual.csv # new
     115 api-release-november-2015-actual.csv

(The “-l” means “count the lines”.)

Now, since Jane and I each have to review half of the commits, we can use the ‘split’ shell command to break the file in half. Since we know the file is 115 lines, we need to tell ‘split’ how many lines to put in each half with the ‘-l’ option (see ‘man split’ in your terminal):

$ split -l 58 api-release-november-2015-actual.csv COMMITS-TO-REVIEW

‘split’ takes the last argument, “COMMITS-TO-REVIEW”, and creates two files based on that, “COMMITS-TO-REVIEWaa” and “COMMITS-TO-REVIEWab”, which we can rename for each reviewer:

$ mv COMMITS-TO-REVIEWaa COMMITS-TO-REVIEW-RICH
$ mv COMMITS-TO-REVIEWab COMMITS-TO-REVIEW-JANE

A nice thing is that because we sorted the lines of the files, each reviewer gets commits by a sorted subset of the engineers, making it easier to see their related commits next to each other.

p.s. We didn’t actually need a Python program for the first part, we could have just used ‘grep’ and stayed with shell commands. But hey!

p.p.s. With more work, this could all be put together into a single program if we were inclined, but since it doesn’t get used that often it’s probably OK to type a few commands.
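For what it’s worth, here is roughly what that single program might look like in Python. It is a sketch, not what we actually ran: the function names are made up, it drops exact duplicate lines as well as sorting, and Python’s sort order may differ a little from the shell’s ‘sort’ depending on your locale.

```python
import re

JIRA_PAT = re.compile(r"[A-Z]+-[0-9]+")  # same pattern as the script above


def commits_to_review(lines):
    """Keep lines with a JIRA ticket ID, drop exact duplicates, and sort."""
    return sorted({line.rstrip("\n") for line in lines if JIRA_PAT.search(line)})


def split_in_half(items):
    """Split into two lists; the first gets the extra item if the count is odd."""
    mid = (len(items) + 1) // 2
    return items[:mid], items[mid:]


def main(path):
    """Read the commit list at `path` and write one file per reviewer."""
    with open(path, encoding="utf-8") as f:
        first, second = split_in_half(commits_to_review(f))
    outputs = (("COMMITS-TO-REVIEW-RICH", first), ("COMMITS-TO-REVIEW-JANE", second))
    for name, chunk in outputs:
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(chunk) + "\n")
```

Running `main("api-release-november-2015.csv")` would write the two reviewer files into the current directory.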

(Image courtesy William Hartman under Creative Commons license.)

How to Install the Pentadactyl Firefox Add-On

wilkie-reservoir-december-2015-small

(Wilkie Reservoir, Queensbury, NY)

This post describes how to install the latest nightly build of Pentadactyl, a browser add-on for Firefox that gives it Vim-like keybindings and behavior.

These instructions are current as of the date of this post. I’m using Firefox 44.0.2 on the release channel as I write this.


Step 1. Turn off Add-on Signing

In Firefox, open about:config. You may have to click through a nanny warning about voiding your warranty.

In the text area at the top of the config screen, type xpinstall.signatures.required. This will filter out all of the other options.

Below the text area, double-click on the xpinstall.signatures.required row, which will change the Value to false (it defaults to true).

Step 2. Download the Add-on

Go to the Pentadactyl website, click nightly builds to get the latest version (sometimes the other downloads are outdated), and download the file pentadactyl-latest.xpi.

To install the extension directly from the downloaded file:

  • Go to about:addons
  • Click the gear icon at the upper right-hand side of the screen
  • A dropdown menu will appear; select Install Add-on From File
  • A popup will appear, asking if you want to install this unverified add-on; click Install

If the extension installs, it will change the Firefox UI a lot. It will look like the GUI is gone completely, and it can be a little confusing at first. Type :help to read the built-in docs. If that doesn’t work, type :open http://5digits.org/help/pentadactyl/ to read the online version.

Troubleshooting

Pentadactyl could not be installed because it is not compatible

If the add-on installation fails because of a version mismatch, you’ll need to do the following:

Open the XPI file (which is basically just a zip file) in Vim, Emacs, or something else that can edit the contents of zip files directly. Edit the install.rdf XML file so the em:maxVersion attribute is a number equal to or higher than your Firefox version:

<em:targetApplication>
    <Description
        em:id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"
        em:minVersion="31.0"
        em:maxVersion="42.*"/>
</em:targetApplication>
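If you don’t have an editor handy that can edit zip files in place, the same change can be scripted. Here is a sketch in Python that copies the XPI to a new file while rewriting the attribute. It assumes install.rdf sits at the top level of the archive and that em:maxVersion is written as an attribute, as in the snippet above; the function name is made up.

```python
import re
import zipfile

MAX_VERSION_RE = re.compile(r'(em:maxVersion=")[^"]*(")')


def bump_max_version(xpi_path, out_path, new_max):
    """Copy the XPI to `out_path`, rewriting em:maxVersion in install.rdf."""
    with zipfile.ZipFile(xpi_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "install.rdf":
                text = data.decode("utf-8")
                # Keep the attribute name and quotes, swap only the value.
                text = MAX_VERSION_RE.sub(
                    lambda m: m.group(1) + new_max + m.group(2), text
                )
                data = text.encode("utf-8")
            dst.writestr(item, data)  # every other entry is copied unchanged
```

For example, `bump_max_version("pentadactyl-latest.xpi", "pentadactyl-patched.xpi", "44.*")`, where the output file name is just a placeholder.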

I can’t turn off extension signing

Mozilla are planning to remove the ability to turn off the extension signing requirement in an upcoming version. If you are reading this after that has happened, a possible workaround is to install Firefox ESR (Extended Support Release), which is aimed at enterprise or education environments and lags several development cycles behind the “consumer” versions.

(The above photo was taken by me and is available under a Creative Commons license.)

Oh my, am I really considering XML?

Since writing Why Markdown is not my favorite text markup language, I’ve been thinking more about document formats.

More and more I begin to see the impetus for the design of XML, despite its sometimes ugly implementation. With XML you avoid much of the ambiguity of parsing plain-text-based formats and just write the document AST directly. Whether this is a good or bad thing seems to depend on the tools you have available to you, but I think I’m starting to see the light.
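To make the “you’re writing the AST directly” point a bit more concrete, here is a tiny Python sketch using the standard library’s ElementTree. The sample document and the `outline` helper are made up for illustration, and the helper ignores any text that follows a child element.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<section><title>Hello</title>"
    "<p>Some <em>emphasized</em> text.</p></section>"
)


def outline(node, depth=0):
    """Return one indented line per element, mirroring the tree as written."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (f": {text!r}" if text else "")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(doc)))
```

The parser has no guessing to do here: the nesting you typed is the nesting you get back.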

At $WORK, for example, I’ve been writing directly in the “XML-ish” Confluence storage format since it was introduced in Confluence 4. Combined with the right editing environment (such as that provided by Emacs’ nxml-mode), it’s easy to navigate XML “structurally” in such a way that you no longer really see the tags.

It’s sort of like being Neo in The Matrix except that, instead of making cool shit happen in an immersive virtual world you’re, um, writing XML.

However, not all is roses in XML-land. In an ideal world, you could maintain a set of XML documents and reliably transform them into other valid formats using a simple set of tools that are easy to learn and use. In reality, many of the extant XML tools such as XSLT exhibit a design aesthetic that is deeply unappealing to most programmers. The semantics of XSLT are interesting, but the syntax appears to be a result of the mistakes that are often made when programmers decide to create their own DSLs. Olin Shivers has a good discussion of the often-broken “little language” phenomenon in his scsh paper.

Speaking of Scheme, it’s possible that something reasonable can be built with SXML. I’ve also had good results using Perl and Mojo::DOM to build Graphviz diagrams of the links among Confluence wiki pages as part of a hacked-together “link checker” (Users of Confluence in an industrial setting will know that the built-in link-checking in Confluence only “sort of” works, which is indistinguishable in practice from not actually working — hence the need to build my own thing).

I’ve also been playing around with MIT Scheme’s built-in XML Parser, and so far I’m preferring it to the Perl or SXML way of doing things.