A Mini Python and Shell Tutorial


The following is an email I sent to a couple of coworkers whom I’d been teaching a short Python course for technical writers, using Automate the Boring Stuff with Python. The email was meant to show them a real-life example of how a technical writer can use Python and shell scripting to automate something that is, well, boring. In this case, the task was to clean up a CSV file containing a list of git commits to the AppNexus REST APIs.

Because of the way we received this data, it had duplicate entries, and lots of non-interesting merge commits that were unrelated to a feature (a feature is generally associated with a JIRA ticket). Our task was to review the commits and see if there was anything interesting that should be added to our monthly API release notes.

(The names of my coworkers have been changed, obv.)

To: Jane X. (‘REDACTED@appnexus.com’)

Subject: Filtered API git commits to review (bonus: mini Python & shell tutorial)

From: Rich Loveland (‘REDACTED@appnexus.com’)

CC: Victoria Y. (‘REDACTED@appnexus.com’)

Date: Wed, 18 Nov 2015 16:59:04 -0500

+Victoria for the code fun

Jane, the file of commit logs for you to review is attached (along with some others). But so what, that’s boring! Let’s talk about how it was made.

To make the really boring task of reviewing API git commits less awful, let’s do some programming for fun. First let’s write a short Python script to pull out only those commits that have a JIRA ticket ID in them (since we don’t care about the other ones), and call it ‘filter-commit-messages.py’:

  #!/usr/bin/env python

  import re
  import sys

  jira_pat = "[A-Z]+-[0-9]+"

  for line in sys.stdin.readlines():
      m = re.search(jira_pat, line)
      if m:

This tries to match a regular expression against each line of its input (in this case the compiled API git commit list), and prints the line if the match occurs.

Let’s make it executable from our shell:

$ cd ~/bin
$ ln -s ~/work/code/filter-commit-messages.py filter-commit-messages
$ chmod +x ~/bin/filter-commit-messages
$ export PATH=$HOME/bin:$PATH

Then we can run it on the text file with the git commits like so:

$ filter-commit-messages < api-release-november-2015.csv

(The “<” in the shell means “Read your input from this place”.)

This prints out only the matching lines, but there are a lot of annoying extra lines in the output. We can get rid of those lines while sorting them like so:

$ filter-commit-messages < api-release-november-2015.csv | sort 

(The ”

” in the shell means “Pass your output through to this other command”.)

Now that we are extracting only the important lines, let’s throw them in a file:

$ filter-commit-messages < api-release-november-2015.csv | sort > api-release-november-2015-actual.csv

(The “>” near the end means “Write all of the output to this place”.)

We can see how much less reading we have to do now by running a word count program (‘wc’) on the before and after files:

$ wc -l api-release-november-2015.csv # old
     201 api-release-november-2015.csv
$ wc -l api-release-november-2015-actual.csv # new
     115 api-release-november-2015-actual.csv

(The “-l” means “count the lines”.)

Now, since Jane and I each have to review half of the commits, we can use the ‘split’ shell command to break the file in half. Since we know the file is 115 lines, we need to tell ‘split’ how many lines to put in each half with the ‘-l’ option (see ‘man split’ in your terminal):

$ split -l 58 api-release-november-2015-actual.csv COMMITS-TO-REVIEW

‘split’ takes the last argument, “COMMITS-TO-REVIEW”, and creates two files based on that, “COMMITS-TO-REVIEWaa” and “COMMITS-TO-REVIEWbb”, which we can rename for each reviewer:


A nice thing is that because we sorted the lines of the files, each reviewer gets commits by a sorted subset of the engineers, making it easier to see their related commits next to each other.

p.s. We didn’t actually need a Python program for the first part, we could have just used ‘grep’ and stayed with shell commands. But hey!

p.p.s. With more work, this could all be put together into a single program if we were inclined, but since it doesn’t get used that often it’s probably OK to type a few commands.

(Image courtesy William Hartman under Creative Commons license.)

Just for Fun: Accumulate

Some years ago, I came across a neat little shell utility that Mark Dominus had written called accumulate. You can read more about it at his blog.

He thought it so trivial that he worried about insulting the intelligence of his readers by sharing the source code; since I have no readers, I have no such concerns. Here is my Python version:

#!/usr/bin/env python

import sys
An ugly not-quite-port of Mark J. Dominus' 'accumulate' utility,
described at [http://blog.plover.com/prog/accumulate.html]

seen = {}

for line in sys.stdin.readlines():
  array = line.split()
  key = array[0]
  value = array[1:]
  if key in seen:
    seen[key] = value

for k, v in seen.iteritems():
  print k, ' '.join(v)