A Mini Python and Shell Tutorial

wooly-mammoth-cp

The following is an email I sent to a couple of coworkers whom I’d been teaching a short Python course for technical writers, using Automate the Boring Stuff with Python. The email was meant to show them a real-life example of how a technical writer can use Python and shell scripting to automate something that is, well, boring. In this case, the task was to clean up a CSV file containing a list of git commits to the AppNexus REST APIs.

Because of the way we received this data, it had duplicate entries, and lots of non-interesting merge commits that were unrelated to a feature (a feature is generally associated with a JIRA ticket). Our task was to review the commits and see if there was anything interesting that should be added to our monthly API release notes.

(The names of my coworkers have been changed, obv.)


To: Jane X. (‘REDACTED@appnexus.com’)

Subject: Filtered API git commits to review (bonus: mini Python & shell tutorial)

From: Rich Loveland (‘REDACTED@appnexus.com’)

CC: Victoria Y. (‘REDACTED@appnexus.com’)

Date: Wed, 18 Nov 2015 16:59:04 -0500

+Victoria for the code fun

Jane, the file of commit logs for you to review is attached (along with some others). But so what, that’s boring! Let’s talk about how it was made.

To make the really boring task of reviewing API git commits less awful, let’s do some programming for fun. First let’s write a short Python script to pull out only those commits that have a JIRA ticket ID in them (since we don’t care about the other ones), and call it ‘filter-commit-messages.py’:

  #!/usr/bin/env python

  import re
  import sys

  jira_pat = "[A-Z]+-[0-9]+"

  for line in sys.stdin.readlines():
      m = re.search(jira_pat, line)
      if m:
          print(line)

This tries to match a regular expression against each line of its input (in this case the compiled API git commit list), and prints the line if the match occurs.

Let’s make it executable from our shell:

$ cd ~/bin
$ ln -s ~/work/code/filter-commit-messages.py filter-commit-messages
$ chmod +x ~/bin/filter-commit-messages
$ export PATH=$HOME/bin:$PATH

Then we can run it on the text file with the git commits like so:

$ filter-commit-messages < api-release-november-2015.csv

(The “<” in the shell means “Read your input from this place”.)

This prints out only the matching lines, but there are a lot of annoying extra lines in the output. We can get rid of those lines while sorting them like so:

$ filter-commit-messages < api-release-november-2015.csv | sort 

(The ”

” in the shell means “Pass your output through to this other command”.)

Now that we are extracting only the important lines, let’s throw them in a file:

$ filter-commit-messages < api-release-november-2015.csv | sort > api-release-november-2015-actual.csv

(The “>” near the end means “Write all of the output to this place”.)

We can see how much less reading we have to do now by running a word count program (‘wc’) on the before and after files:

$ wc -l api-release-november-2015.csv # old
     201 api-release-november-2015.csv
$ wc -l api-release-november-2015-actual.csv # new
     115 api-release-november-2015-actual.csv

(The “-l” means “count the lines”.)

Now, since Jane and I each have to review half of the commits, we can use the ‘split’ shell command to break the file in half. Since we know the file is 115 lines, we need to tell ‘split’ how many lines to put in each half with the ‘-l’ option (see ‘man split’ in your terminal):

$ split -l 58 api-release-november-2015-actual.csv COMMITS-TO-REVIEW

‘split’ takes the last argument, “COMMITS-TO-REVIEW”, and creates two files based on that, “COMMITS-TO-REVIEWaa” and “COMMITS-TO-REVIEWbb”, which we can rename for each reviewer:

$ mv COMMITS-TO-REVIEWaa COMMITS-TO-REVIEW-RICH
$ mv COMMITS-TO-REVIEWbb COMMITS-TO-REVIEW-JANE

A nice thing is that because we sorted the lines of the files, each reviewer gets commits by a sorted subset of the engineers, making it easier to see their related commits next to each other.

p.s. We didn’t actually need a Python program for the first part, we could have just used ‘grep’ and stayed with shell commands. But hey!

p.p.s. With more work, this could all be put together into a single program if we were inclined, but since it doesn’t get used that often it’s probably OK to type a few commands.

(Image courtesy William Hartman under Creative Commons license.)

How to Install the Pentadactyl Firefox Add-On

wilkie-reservoir-december-2015-small

(Wilkie Reservoir, Queensbury, NY)

This post describes how to install the latest nightly build of Pentadactyl, a browser add-on for Firefox that gives it Vim-like keybindings and behavior.

These instructions are current as of the date of this post. I’m using Firefox 44.0.2 on the release channel as I write this.

On This Page

Step 1. Turn off Add-on Signing

In Firefox, open about:config. You may have to click through a nanny warning about voiding your warranty.

In the text area at the top of the config screen, type xpinstall.signatures.required. This will filter out all of the other options.

Below the text area, double-click on the xpinstall.signatures.required row, which will change the Value to false (it defaults to true).

Step 2. Download the Add-on

Go to the Pentadactyl website, click nightly builds to get the latest version (sometimes the other downloads are outdated), and download the file pentadactyl-latest.xpi.

To install the extension directly from the downloaded file:

  • Go to about:addons
  • Click the gear icon at the upper right-hand side of the screen
  • A dropdown menu will appear; select Install Add-on From File
  • A popup will appear, asking if you want to install this unverified add-on; click Install

If the extension installs, it will change the Firefox UI a lot. It will look like the GUI is gone completely, and it can be a little confusing at first. Type :help to read the built-in docs. If that doesn’t work, type :open http://5digits.org/help/pentadactyl/ to read the online version.

Troubleshooting

Pentadactyl could not be installed because it is not compatible

If the add-on installation fails because of a version mismatch, you’ll need to do the following:

Open the XPI file (which is basically just a zip file) in Vim, Emacs, or something else that can edit the contents of zip files directly. Edit the install.rdf XML file so the em:maxVersion attribute is a number equal to or higher than your Firefox version:

<em:targetApplication>
    <Description
        em:id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"
        em:minVersion="31.0"
        em:maxVersion="42.*"/>
</em:targetApplication>

I can’t turn off extension signing

Mozilla are planning to remove the ability to turn off extension signing requirement in an upcoming version. If you are reading this after that has happened, a possible workaround is to install Firefox ESR (Extended Support Release), which is aimed at enterprise or education environments and lags several development cycles behind the “consumer” versions.

(The above photo was taken by me and is available under a Creative Commons license.)

Oh my, am I really considering XML?

Since writing Why Markdown is not my favorite text markup language, I’ve been thinking more about document formats.

More and more I begin to see the impetus for the design of XML, despite its sometimes ugly implementation. With XML you avoid much of the ambiguity of parsing plain-text-based formats and just write the document AST directly. Whether this is a good or bad thing seems to depend on the tools you have available to you, but I think I’m starting to see the light.

At $WORK, for example, I’ve been writing directly in the “XML-ish” Confluence storage format since it was introduced in Confluence 4. Combined with the right editing environment (such as that provided by Emacs’ nxml-mode), it’s easy to navigate XML “structurally” in such a way that you no longer really see the tags.

It’s sort of like being Neo in The Matrix except that, instead of making cool shit happen in an immersive virtual world you’re, um, writing XML.

However, not all is roses in XML-land. In an ideal world, you could maintain a set of XML documents and reliably transform them into other valid formats using a simple set of tools that are easy to learn and use. In reality, many of the extant XML tools such as XSLT exhibit a design aesthetic that is deeply unappealing to most programmers. The semantics of XSLT are interesting, but the syntax appears to be a result of the mistakes that are often made when programmers decide to create their own DSLs. Olin Shivers has a good discussion of the often-broken “little language” phenomenon in his scsh paper.

Speaking of Scheme, it’s possible that something reasonable can be built with SXML. I’ve also had good results using Perl and Mojo::DOM to build Graphviz diagrams of the links among Confluence wiki pages as part of a hacked-together “link checker” (Users of Confluence in an industrial setting will know that the built-in link-checking in Confluence only “sort of” works, which is indistinguishable in practice from not actually working — hence the need to build my own thing).

I’ve also been playing around with MIT Scheme’s built-in XML Parser, and so far I’m preferring it to the Perl or SXML way of doing things.

Advent of Code, Day 3

In this post I’ll describe my solution for Day 3 of the Advent of Code.

Problem Description

Day 3: Perfectly Spherical Houses in a Vacuum

Santa is delivering presents to an infinite two-dimensional grid of houses.

He begins by delivering a present to the house at his starting location, and then an elf at the North Pole calls him via radio and tells him where to move next. Moves are always exactly one house to the north (‘^’), south (‘v’), east (‘>’), or west (‘<‘). After each move, he delivers another present to the house at his new location.

However, the elf back at the north pole has had a little too much eggnog, and so his directions are a little off, and Santa ends up visiting some houses more than once. How many houses receive at least one present?

For example:

‘>’ delivers presents to 2 houses: one at the starting location, and one to the east.

‘^>v<‘ delivers presents to 4 houses in a square, including twice to the house at his starting/ending location.

‘^v^v^v^v^v’ delivers a bunch of presents to some very lucky children at only 2 houses.

Solution

Broadly speaking, my solution consisted of:

  • Reading the directions file to determine the largest x and y values of the grid
  • Making a shaped array using those dimensions (specifically, we double the array dimensions to allow for movement up, down, forward, and back)
  • Starting in the center of the shaped array, follow the instructions from the “map” and mark every house (called a “position” in the code) if it hasn’t already been visited
  • Every time we visit a house we haven’t already visited, we bump a counter

Here’s the Scheme code that accomplishes those steps:

;; read in the string
;; sum the ^ and v chars to get the height of the matrix (graph)
;; sum the < and > chars to get the width of the matrix (graph)

(define (north? ch) (char=? ch #\^))
(define (south? ch) (char=? ch #\v))
(define (east? ch) (char=? ch #\>))
(define (west? ch) (char=? ch #\<))

(define (make-shape width height)
  ;; Int Int Int -> Shape
  (shape 0 width 0 height))

(define (read-shape-file file)
  ;; Pathname -> Shape
  (with-input-from-file file
    (lambda ()
      (let loop ((width 0)
         (height 0)
         (min-width  0)
         (max-width 0)
         (min-height 0)
         (max-height 0)
         (ch (read-char)))
    (if (eof-object? ch)
        (make-shape
         (* 2 (- max-width min-width))
         (* 2 (-  max-height min-height)))
        (cond ((north? ch)
           (loop width (+ height 1)
             min-width max-width
             (min min-height height)
             (max max-height height)
             (read-char)))
          ((south? ch)
           (loop width (- height 1)
             min-width max-width
             (min min-height height)
             (max max-height height)
             (read-char)))
          ((east? ch)
           (loop (+ width 1) height
             (min min-width width)
             (max max-width width)
             min-height max-height
             (read-char)))
          ((west? ch)
           (loop (- width 1) height
             (min min-width width)
             (max max-width width)
             min-height max-height
             (read-char)))
          (else (error "WHOA"))))))))

;; We make a shaped, multi-dimensional array (SRFI-25) in the size
;; it's determined we need by our earlier check.

(define (make-grid shape)
  ;; Shape -> Array
  (make-array shape #f))

;; The POSITION data type

(define-record-type position
  (make-position x y)
  position?
  (x position-x set-position-x!)
  (y position-y set-position-y!))

(define (array-center arr)
  ;; Array -> Position
  (let ((len-x (array-length arr 0))
    (len-y (array-length arr 1)))
    (make-position (/ len-x 2)
           (/ len-y 2))))

(define (make-relative-position x y ch)
  ;; Int Int Char -> Position
  (let ((vals '()))
    (cond ((north? ch)
       (set! vals (list (+ x 1) y)))
      ((south? ch)
       (set! vals (list (- x 1) y)))
      ((east? ch)
       (set! vals (list x (- y 1))))
      ((west? ch)
       (set! vals (list x (+ y 1))))
      (else (set! vals (list x y))))
    (make-position (first vals)
           (second vals))))

(define (visited? arr x y)
  ;; Array Int Int -> Bool
  (array-ref arr x y))

(define (set-visited! arr x y)
  ;; Array Int Int -> Undefined
  (array-set! arr x y #t))

(define (visit-locations arr file)
  ;; Pathname -> Int
  (let ((visited-count 0))
    (with-input-from-file file
      (lambda ()
    (let loop ((ch (read-char))
           (current-position (array-center arr)))
      (if (eof-object? ch)
          visited-count
          (begin
        (let* ((current-x (position-x current-position))
               (current-y (position-y current-position))
               (next-position
            (make-relative-position current-x current-y ch)))
          (if (not (visited? arr current-x current-y))
              (begin
            (set! visited-count (+ visited-count 1))
            (set-visited! arr current-x current-y)
            (loop (read-char)
                  next-position))
              (loop (read-char) next-position))))))))
    visited-count))

;; eof

Once this code is loaded up in the REPL, you can use it as shown below. (Note that the answer shown at the end isn’t real to avoid a spoiler.)

(set! *the-array* (make-grid (read-shape-file (expand-file-name "~/Code/personal/advent-of-code/03.dat"))))
'#{Array:srfi-9-record-type-descriptor}

> (array-size *the-array*)
42588

> (array-center *the-array*)
'#{Position}

> (define *the-file* (expand-file-name "~/Code/personal/advent-of-code/03.dat"))

> (visit-locations *the-array* *the-file*)
12345

Related Posts

Advent of Code, Day 2

This post describes my solution for Day 2 of the Advent of Code.

Problem Description

First, the problem description (copied from the website):


Day 2: I Was Told There Would Be No Math

The elves are running low on wrapping paper, and so they need to submit an order for more. They have a list of the dimensions (length l, width w, and height h) of each present, and only want to order exactly as much as they need.

Fortunately, every present is a box (a perfect right rectangular prism), which makes calculating the required wrapping paper for each gift a little easier: find the surface area of the box, which is 2 x l x w + 2 x w x h + 2 x h x l. The elves also need a little extra paper for each present: the area of the smallest side.

For example:

  • A present with dimensions 2x3x4 requires 2 x 6 + 2 x 12 + 2 x 8 = 52 square feet of wrapping paper plus 6 square feet of slack, for a total of 58 square feet.
  • A present with dimensions 1x1x10 requires 2 x 1 + 2 x 10 + 2 x 10 = 42 square feet of wrapping paper plus 1 square foot of slack, for a total of 43 square feet.

All numbers in the elves’ list are in feet. How many total square feet of wrapping paper should they order?


Solution

Once again, we’ll be working in Scheme.

For this problem, I decided to create a “box” data type. In addition to the automatically generated accessors (thanks SRFI-9!), I wrote several procedures to perform calculations on boxes, namely:

  • SURFACE-AREA: Calculate the box’s surface area.
  • SMALLEST-SIDE: Determine which of the box’s sides has the smallest surface area (the extra material makes it easier to wrap).

WRAPPING-PAPER is just a “wrapper” (pun intended) around the first two.

LINE->BOX, READ-BOXES, and SUM-BOXES are all about parsing the input file contents and shuffling them into the box data type that we use to do the actual calculation. The only part that required a bit of thought was the line with STRING-TOKENIZE in LINE->BOX. In Perl I’d use my @params = split /x/, $line without even thinking, but I was less familiar with Scheme’s facility for solving this problem, so it took a few minutes to puzzle out the right part of Scheme’s “API”. (STRING-TOKENIZE was helpfully provided by SRFI-13.)

Abstract data types FTW! I’ll be using them more as the month’s challenges progress.

;; ,open srfi-9 srfi-13 sort

(define-record-type box
  (make-box l w h)
  box?
  (l box-length set-box-length!)
  (w box-width set-box-width!)
  (h box-height set-box-height!))

(define (surface-area box)
  ;; Box -> Int
  (let ((l (box-length box))
    (w (box-width box))
    (h (box-height box)))
    (+ (* 2 l w)
       (* 2 w h)
       (* 2 h l))))

(define (smallest-side box)
  ;; Box -> Int
  (define (smallest-two xs)
    ;; List -> List
    (let ((sorted (sort-list xs <)))
      (list (first sorted)
        (second sorted))))
  (let ((l (box-length box))
    (w (box-width box))
    (h (box-height box)))
    (apply * (smallest-two (list l w h)))))

(define (wrapping-paper box)
  ;; Box -> Int
  (let ((minimum (surface-area box))
    (extra (smallest-side box)))
    (+ minimum extra)))

(define (line->box line)
  ;; String -> Box
  (define (line->lon s)
    ;; String -> List<Number>
    (let ((xs (string-tokenize s (char-set-complement (char-set #\x)))))
      (map string->number xs)))
  (let* ((dims (line->lon line)))
    (let ((l (first dims))
      (w (second dims))
      (h (third dims)))
      (make-box l w h))))

(define (read-boxes file)
  ;; Pathname -> List<Box>
  (with-input-from-file file
    (lambda ()
      (let loop ((line (read-line))
         (ys '()))
    (if (eof-object? line)
        ys
        (loop
         (read-line)
         (cons (line->box line) ys)))))))

(define (sum-boxes boxes)
  ;; List<Box> -> Int
  (let ((xs (map wrapping-paper boxes)))
    (apply + xs)))

;; eof

Related Posts

Advent of Code, Day 1

origami-dragon

Advent of Code is a site that provides a programming problem for every day in December leading up to Christmas.

I’ve become a little obsessed with it over the last few days, and thought I’d write up my results. So far I’ve been working in Scheme.

Here’s the Day 1 problem description:


Day 1: Not Quite Lisp

Santa was hoping for a white Christmas, but his weather machine’s “snow” function is powered by stars, and he’s fresh out! To save Christmas, he needs you to collect fifty stars by December 25th.

Collect stars by helping Santa solve puzzles. Two puzzles will be made available on each day in the advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

Here’s an easy puzzle to warm you up.

Santa is trying to deliver presents in a large apartment building, but he can’t find the right floor – the directions he got are a little confusing. He starts on the ground floor (floor 0) and then follows the instructions one character at a time.

An opening parenthesis, (, means he should go up one floor, and a closing parenthesis, ), means he should go down one floor.

The apartment building is very tall, and the basement is very deep; he will never find the top or bottom floors.

For example:

  • (()) and ()() both result in floor 0.
  • ((( and (()(()( both result in floor 3.
  • ))((((( also results in floor 3.
  • ()) and ))( both result in floor -1 (the first basement level).
  • ))) and )())()) both result in floor -3.

To what floor do the instructions take Santa?


The file of instructions looks something like this (but much larger):

()()((((()(((())(()(()((((((()(()(((())))((()(((()))((())(()((()

Here’s the (reasonably straight-forward) Scheme code. It basically just iterates through the input file, bumping a counter up or down based on the type of paren read from the input port. The definitions of UP-FLOOR? and DOWN-FLOOR? weren’t really necessary, but they made the main procedure a little easier to read.

(define (up-floor? x)
  (char=? x #\())

(define (down-floor? x)
  (char=? x #\)))

(define (find-floor file)
  (with-input-from-file file
    (lambda ()
      (let loop ((floor-number 0)
         (char (read-char)))
    (if (eof-object? char)
        floor-number
        (loop (cond ((up-floor? char)
             (+ floor-number 1))
            ((down-floor? char)
             (- floor-number 1))
            (else floor-number))
          (read-char)))))))

(Image courtesy Strongpaper under a Creative Commons license.)

A Solution to the Five Weekends Problem in Common Lisp

neuschwanstein

I’ve been enjoying playing around with Rosetta Code problems lately. Here’s a solution I posted last week for the Five Weekends problem in Common Lisp:

  ;;; http://rosettacode.org/wiki/Five_weekends

  ;; Given a date, get the day of the week.  Adapted from
  ;; http://lispcookbook.github.io/cl-cookbook/dates_and_times.html

  (defun day-of-week (day month year)
    (nth-value
     6
     (decode-universal-time
          (encode-universal-time 0 0 0 day month year 0)
          0)))

  (defparameter *long-months* '(1 3 5 7 8 10 12))

  (defun sundayp (day month year)
    (= (day-of-week day month year) 6))

  (defun ends-on-sunday-p (month year)
    (sundayp 31 month year))

  ;; We use the "long month that ends on Sunday" rule.
  (defun has-five-weekends-p (month year)
    (and (member month *long-months*)
             (ends-on-sunday-p month year)))

  ;; For the extra credit problem.
  (defun has-at-least-one-five-weekend-month-p (year)
    (let ((flag nil))
          (loop for month in *long-months* do
                   (if (has-five-weekends-p month year)
                           (setf flag t)))
          flag))

  (defun solve-it ()
    (let ((good-months '())
                  (bad-years 0))
          (loop for year from 1900 to 2100 do
             ;; First form in the PROGN is for the extra credit.
                   (progn (unless (has-at-least-one-five-weekend-month-p year)
                                    (incf bad-years))
                                  (loop for month in *long-months* do
                                           (when (has-five-weekends-p month year)
                                             (push (list month year) good-months)))))
          (let ((len (length good-months)))
            (format t "~A months have five weekends.~%" len)
            (format t "First 5 months: ~A~%" (subseq good-months (- len 5) len))
            (format t "Last 5 months: ~A~%" (subseq good-months 0 5))
            (format t "Years without a five-weekend month: ~A~%" bad-years))))

(Image copyright gacabo under Creative Commons license.)