Advent of Code 2017, Day 2

This is my solution for Day 2 of this year’s Advent of Code.

You may also enjoy browsing the Day 2 solutions megathread on Reddit.

PROBLEM

The spreadsheet consists of rows of apparently-random numbers. To make sure the recovery process is on the right track, they need you to calculate the spreadsheet’s checksum. For each row, determine the difference between the largest value and the smallest value; the checksum is the sum of all of these differences.

For example, given the following spreadsheet:

5 1 9 5
7 5 3
2 4 6 8

The first row’s largest and smallest values are 9 and 1, and their difference is 8.

The second row’s largest and smallest values are 7 and 3, and their difference is 4.

The third row’s difference is 6.

In this example, the spreadsheet’s checksum would be 8 + 4 + 6 = 18.

SOLUTION

(define (line->list line)
  ;; String -> List
  (let ((read-ln (field-reader (infix-splitter (rx (+ whitespace)))))
        (in-port (make-string-input-port line)))
    (receive (record fields)
        (read-ln in-port)
      (map string->number fields))))

(define (read-spreadsheet file)
  ;; File -> List[List[Number]]
  (call-with-input-file file
    (lambda (port)
      (let loop ((line (read-line port))
                 (results '()))
        (if (eof-object? line)
            results
            (loop (read-line port) (cons line results)))))))

(define (main prog+args)
  (let ((rows (read-spreadsheet "/Users/rloveland/Code/personal/advent-of-code/2017/02/02.dat")))
    (write (apply + (map
                     (lambda (row)
                       (let* ((xs (line->list row))
                              (min (apply min xs))
                              (max (apply max xs)))
                         (- max min)))
                     rows)))
    (newline)))
Advertisements

Advent of Code 2017, Day 1

This is my solution for Day 1 of this year’s Advent of Code.

You may also enjoy browsing the Day 1 solutions megathread on Reddit.

PROBLEM

The captcha requires you to review a sequence of digits (your puzzle input) and find the sum of all digits that match the next digit in the list. The list is circular, so the digit after the last digit is the first digit in the list.

For example:

  • 1122 produces a sum of 3 (1 + 2) because the first digit (1) matches the second digit and the third digit (2) matches the fourth digit.

  • 1111 produces 4 because each digit (all 1) matches the next.

  • 1234 produces 0 because no digit matches the next.

  • 91212129 produces 9 because the only digit that matches the next one is the last digit, 9.

SOLUTION

(define captcha-input "5994521226795838")

'(set! captcha-input "1111")

'(set! captcha-input "1122")

'(set! captcha-input "1234")

'(set! captcha-input "91212129")

(define (gather-matches s)
  ;; String -> List
  (let ((in-port (make-string-input-port s)) (count 0) (head #f) (vals '()))
    (let loop ((cur (read-char in-port)) (next (peek-char in-port)) (count count) (vals vals))
      (if (eof-object? next)
          (if (char=? cur head)
              (cons cur vals)
              vals)
          (cond ((= count 0)
                 (begin
                   (set! head cur)
                   (loop cur next (+ 1 count) vals)))
                 ((char=? cur next)
                 (loop (read-char in-port) (peek-char in-port) (+ 1 count) (cons cur vals)))
                (else (loop (read-char in-port) (peek-char in-port) (+ 1 count) vals)))))))

(define (main prog+args)
  (let* ((matches (gather-matches captcha-input))
         (matches* (map (lambda (c) (string->number (string c))) matches))
         (sum (apply + matches*)))
    (begin
      (format #t "MATCHES*: ~A~%" matches*)
      (format #t "SUM: ~A~%" sum))))

Thinking about software documentation as the output of a lossy compression algorithm

stick

How many times have you heard or read the following comments?

  1. “The docs are always out of date”
  2. “I don’t bother reading the docs, I just read the source code”
  3. “If you write self-documenting code, you don’t need to write docs”

If you work in software, I bet the answer is: a lot. They range in truth value from #1, which is a tautology, to #3, which is a fantasy. I can relate to #2, since I had to do it just yesterday when using a semi-undocumented Ruby library.

I think all of these points of view (and more) are explained when you think about software documentation as the output of a lossy compression algorithm. Many (most? all?) of the things that you love and/or hate about the documentation for a particular piece of software are explained by this.

Quoth wiki:

Well-designed lossy compression technology often reduces file sizes significantly before degradation is noticed by the end-user. Even when noticeable by the user, further data reduction may be desirable (e.g., for real-time communication, to reduce transmission times, or to reduce storage needs).

As such, the features of software documentation are similar to the features of other products of lossy compression algorithms.

Take mp3s for example. The goal of an mp3 is not to be the highest-fidelity replication of the audio experience it represents. The goal of an mp3 is to provide a “good enough” audio experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in CPU time is it to run the decompression algorithm? Can the CPU on the device handle it?
  • Space: How much disk space will the file take on the device?

Similarly, we might say that the goal of software documentation is not to be the highest-fidelity replication of the “understanding” experience it (theoretically) represents. The goal of a piece of documentation is to provide a “good enough” learning experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in person time is it to “run the decompression algorithm”, i.e., learn how the system works well enough to write something coherent? And then to actually write it? How many technical writers does the organization employ per engineer? (In many companies it’s ~1 per 40-50 engineers) How many concurrent projects is that writer working on, across how many teams?
  • Space: How much information does the user need to use the system? How little information can you get away with providing before users give up in disgust?

Remember that fewer, more effective materials cost more to produce. This is similar to the way better compression algorithms may cost more than worse ones along various axes you care about (dollar cost for proprietary algorithms, CPU, memory, etc.)

It takes longer to write more concise documentation, draw clear system diagrams, etc., since those are signs that you actually understand the system better, and have thus compressed the relevant information about it into fewer bytes.

And oh by the way, in practice in an “agile” (lol) environment you don’t have enough time to write the “best” docs for any given feature X. Just like the programmer who wrote feature X would likely admit that she didn’t have enough time to write the “best” implementation according to her standards.

Quoth Pascal:

I would have written a shorter letter, but I did not have the time.

So the next time you are frustrated by the docs for some piece of software (if any docs exist at all), instead of some platitude about docs sucking, think “oh, lossy compression”.

(Image courtesy Jonathan Sureau under Creative Commons license.)

A Portable Scheme Module System

collins-mustang

 

In this post I’d like to introduce load-module, a portable Scheme module system.

Why did I write a module system?

  • Simplicity: A single-file module system in about 200 lines of code
  • Understandability: The implementation avoids wizardry and should be accessible to anyone who knows the language
  • Portability: One system that can be used across multiple implementations

The way it works is this:

  1. You have a file (say, utils.scm) with Scheme code in it that implements stuff that you want to live in the same module.
  2. You create another file (utils.mod, but that extension is easy to change) which lists the procedures and syntax you want to export.
  3. The load-module procedure reads utils.scm, rewriting unexported procedure names such that only the procedures you want exported show up at the top-level. Everything else gets rewritten during load-time as an ignorable “gensym” of the form %--gensym-utils-random-integer-8190504171, where “utils” is the module name, and “random-integer” is the procedure internal to your module.

The module file format is very simple:

(define-module utils
  (exports random-integer atom? take))

The module system exports one procedure: load-module. Run it like so to get the procedures from the aforementioned hypothetical utils package into your environment:

> (load "load-module.scm")
> (load-module 'utils)
#t
> (random-integer 199)
76
> (atom? 199)
#t

If you care, there’s more information about over at the project README.

(Image courtesy Geoff Collins under Creative Commons license.)

Announcing cpan.el

The Cat's Eye Nebula, one of the first planetary nebulae discovered, also has one of the most complex forms known to this kind of nebula. Eleven rings, or shells, of gas make up the Cat's Eye. Credit: NASA, ESA, HEIC, and The Hubble Heritage Team (STScI/AURA) Acknowledgment: R. Corradi (Isaac Newton Group of Telescopes, Spain) and Z. Tsvetanov (NASA) The Hubble Space Telescope is a project of international cooperation between NASA and the European Space Agency. NASA's Goddard Space Flight Center manages the telescope. The Space Telescope Science Institute conducts Hubble science operations. Goddard is responsible for HST project management, including mission and science operations, servicing missions, and all associated development activities.

The CPAN shell is just another shell, so why not drive it from Emacs?

If you write Perl code in Emacs, you may have wondered why we don’t have a simple mode for driving the CPAN shell (at least I couldn’t find one!).

Well, I finally stopped wondering. It wasn’t that hard to rip out the sh-specific parts of shell.el and make a new mode for the CPAN shell.

Here’s the code:

https://github.com/rmloveland/cpan-el

It’s easy to load up and drive from Emacs:

(add-to-list 'load-path (expand-file-name "/path/to/cpan-el/"))
(setq cpan-file-name "cpan")

(require 'cpan)

To run it, type M-x cpan.

There aren’t too any bells and whistles yet (completion, etc.), but you it’s pretty small so feel free to hack away.

(Image courtesy NASA Goddard Photo and Video under Creative Commons License.)

Scripting JIRA

black-origami-dragon

JIRA is everybody’s favorite enterprise issue management system. It’s the system many of us just love to hate.

Unlike some vocal people on the interwebs, I don’t “hate” JIRA, but I like to keep it at arm’s length. My job is not to be a JIRA jockey (although I do know me some JQL, about which more below) — my job is to get shit done.

And getting shit done quickly, for me at least, is usually a function of being able to control my tools from the command line, including via scripts.

That’s why I have written several scripts for interacting with JIRA from the command line (they’re explained below). You can get them from Github.

Note that they use some hard-coded values that match the JIRA server where I work, which you’ll have to change. However they do at least use your .netrc for passwords, etc., so you should be able to change them for your own use with a quick sed one-liner.

The functionality isn’t there to fully replace your web browser, especially if you work at a company that insists on enterprise-level JIRA jockeying, with crazy themes and labels and stuff, but here’s what’s included as of this writing — for each command, I’ve marked whether it is “plumbing” or “porcelain”:

  • jira-get-issue (plumbing): View an issue’s JSON, which you can pipe through tools such as jq to build other more generic tools you can control from your text editor, scripts, etc.
  • jira-create-issue (porcelain): This one is “end-user-ready”, in the sense that you call it and it pops up your text editor of choice so you can write the issue description, and when you close your editor the issue is created for you!
  • jira-search-issues (plumbing): Uses JQL to search your JIRA, and returns a bunch of issues as JSON for you to fiddle with however you prefer.
  • jira-add-comment (porcelain): Like jira-create-issue, this one pops open your text editor to add a comment to an issue.
  • jira-get-issue-status (plumbing): Gets back JSON describing an issue’s status. Used to check the status to transition an issue forwards or backwards using jira-set-issue-status (below).
  • jira-set-issue-status (porcelain-ish): Depending on an issue’s status, bump it forwards or backwards to an adjacent status. Help output describes the statuses and their ordering. (I’m not so sure about this one; more design work is needed, it’s still not really “porcelain”.)

As an example of using one of the plumbing tools to make a more user-friendly CLI tool, I have one I call jli that just gives me a list of my currently open issues:

#!/usr/bin/env sh

jira-search-issues "assignee = rloveland AND status not in \ 
(Closed, Resolved)" | jq '.key + " " + .fields.summary' | sed -e 's/\"//g';

Since it’s an arbitrary JQL query returning JSON, it’s easy to imagine how you might extend this to give separate lists per-project, ordered by status, etc. You can make your own little terminal- or text-file-based dashboard, send yourself a daily digest email using a larger script, and so forth.

There are some others, but I only have them on my work computer. For example, I have one that checks JIRA for new tickets by diffing the list of my assigned tickets and my local TODO list, which is in a text file.

Obviously these tools could use further development to be more modular and general-purpose, and they don’t come anywhere near covering the whole JIRA API, but they’re pretty small and easy to modify for your own use.

Most importantly, I use them every day at my job, so I know they work and are useful.

Hopefully they are useful to you too!

(Image courtesy Terry Robinson via Flickr under a Creative Commons license.)

Scripting Language Idioms: The “Seen” Hash

The “seen” hash technique is an idiom that lets you use a hash (or dictionary if you prefer) as a set data type. It’s good for generating a de-duplicated list of things, where each thing appears only once. If your language of choice has a real set data type, you may want to use that instead.

To illustrate I’ll offer a real-world use case.

The other day at work I needed to grab a bunch of information about git commits from a batch of automated emails. For reasons that don’t matter right now, our team (Docs) gets automated emails about git commits on our API (It’s not really what we asked for, but it’s what we could get somebody to build).

As a result, we get a bunch of emails formatted like this (personal details changed to protect the guilty):

--------- Project: Foo Details: something something garbage noise etc.

J. Random Luser ABC-123 did something to some other thing
Alyssa P. Hacker ABC-124 fixed mistake in ABC-123
Ben Bitdiddle ABC-125 yet another thing was done
J. Random Luser oh yeah this thing too
Ben Bitdiddle Merge ABC-123 into Foo/master

Luckily I don’t need to read all of these darn things. I have a filter set up on my mail client that saves them all in my ‘Archive’ folder, where I can safely ignore them.

When we’re getting ready to do our API release notes, I go into my ‘Archive’ folder and search for all of the emails with the subject “Project: Foo” that arrived between our last set of release notes and today. I end up with (say) about 100 files formatted like the above.

The format is: Name, JIRA ticket ID, description. Except that sometimes there is no JIRA ticket ID. And sometimes there are duplicate ticket IDs, since the emails contain messages about merge commits.

As a tech writer, I don’t need to look at the contents of every commit. I need to generate a (de-duplicated) list of JIRA ticket IDs, so I can go and review those tickets to see if there is user-facing docs work that needs to happen for those commits. (Sometimes I still need to look at the commits anyway because a ticket has a description like “change the frobnitz”, but hey.)

So I save all of these email files into a directory, and I write some code that loops over each file, generating a set of JIRA ticket IDs, which I then print out. Here’s the code what done it (it’s written in Perl but could as easily be Ruby or Python or whatevs):

#!perl

use strict;
use warnings;
use feature     qw/ say   /;
use File::Slurp qw/ slurp /;

my @files = glob('*.eml');
my $jira_pat = '([A-Z]+-[0-9]+)';
my %seen;

for my $f (@files) {
  my @lines = slurp($f);

  for my $line (@lines) {
    next unless $line =~ /$jira_pat/; # Skip unless it has a JIRA ticket ID
    my $id = $1;                      # If it did match, save the capture
    $seen{$id}++;                     # Stick the ID in the hash (as a key)
  }
}

say for sort keys %seen;        # Print out all the keys (which are de-duped)

The reason this trick works is that a hash table can’t have duplicate keys. Therefore the ‘$seen{$id}++’ bit means: “Stick the ID in the hash, and increment its value”. Based on the example email above, you end up with a hash table that looks like this:

{
  ABC-123 => 2,
  ABC-124 => 1,
  ABC-125 => 1,
}

Then we print the keys using the line say for sort keys %seen, which just means “print the hash keys in sorted order”.

Perl’s Autovivification FTW

Interestingly, part of the reason this idiom is cleaner in Perl than in, say, Ruby, is that Perl does something called “autovivification” of hash keys. Basically, it means that stuff gets created as soon as you mention it. That’s why you can call the ‘$seen{$id}++’ all in one line. (If you want more information about autovivification, there’s a good article on the Wikipedia.)

By contrast, in Ruby you have to first explicitly create the key’s value, and then increment it. As you can see below, if you try to bump the value of a key that doesn’t exist yet, you get an error (unless you use the technique from the Wikipedia article).

irb(main):015:0> RUBY_VERSION
=> "2.2.4"
irb(main):010:0> tix
=> {"ABC-123"=>2, "ABC-124"=>1}
irb(main):011:0> tix['ABC-125'] += 1
NoMethodError: undefined method `+' for nil:NilClass
    from (irb):11
    from c:/Ruby22/bin/irb:11:in `<main>'
irb(main):012:0> tix['ABC-125'] = 1
=> 1
irb(main):013:0> tix
=> {"ABC-123"=>2, "ABC-124"=>1, "ABC-125"=>1}
irb(main):014:0> tix['ABC-125'] += 1
=> 2

Further Reading