A Mini Python and Shell Tutorial


The following is an email I sent to a couple of coworkers whom I’d been teaching a short Python course for technical writers, using Automate the Boring Stuff with Python. The email was meant to show them a real-life example of how a technical writer can use Python and shell scripting to automate something that is, well, boring. In this case, the task was to clean up a CSV file containing a list of git commits to the AppNexus REST APIs.

Because of the way we received this data, it had duplicate entries and lots of uninteresting merge commits that were unrelated to any feature (a feature is generally associated with a JIRA ticket). Our task was to review the commits and see if there was anything interesting that should be added to our monthly API release notes.

(The names of my coworkers have been changed, obv.)


To: Jane X. (‘REDACTED@appnexus.com’)

Subject: Filtered API git commits to review (bonus: mini Python & shell tutorial)

From: Rich Loveland (‘REDACTED@appnexus.com’)

CC: Victoria Y. (‘REDACTED@appnexus.com’)

Date: Wed, 18 Nov 2015 16:59:04 -0500

+Victoria for the code fun

Jane, the file of commit logs for you to review is attached (along with some others). But so what, that’s boring! Let’s talk about how it was made.

To make the really boring task of reviewing API git commits less awful, let’s do some programming for fun. First let’s write a short Python script to pull out only those commits that have a JIRA ticket ID in them (since we don’t care about the other ones), and call it ‘filter-commit-messages.py’:

  #!/usr/bin/env python

  import re
  import sys

  # JIRA ticket IDs look like "PROJ-1234": capital letters, a
  # hyphen, then digits.
  jira_pat = "[A-Z]+-[0-9]+"

  # Print only the lines of standard input that mention a ticket.
  # (Each line already ends in a newline and print adds another,
  # so every match is followed by a blank line; we'll clean that
  # up in the shell below.)
  for line in sys.stdin.readlines():
      m = re.search(jira_pat, line)
      if m:
          print(line)

This tries to match a regular expression against each line of its input (in this case the compiled API git commit list), and prints the line if the match occurs.
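To see what the regular expression is doing, you can poke at it in the Python REPL. (These commit messages are made up for the example; the match output below is what Python 3 prints.)

  >>> import re
  >>> jira_pat = "[A-Z]+-[0-9]+"
  >>> print(re.search(jira_pat, "Merge branch 'update-docs'"))  # no ticket ID
  None
  >>> re.search(jira_pat, "APIDOCS-123 Add rate limiting docs")
  <re.Match object; span=(0, 11), match='APIDOCS-123'>

Note that re.search looks for the pattern anywhere in the line, not just at the beginning.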

Let’s make it executable from our shell:

$ cd ~/bin
$ ln -s ~/work/code/filter-commit-messages.py filter-commit-messages
$ chmod +x ~/bin/filter-commit-messages
$ export PATH=$HOME/bin:$PATH

Then we can run it on the text file with the git commits like so:

$ filter-commit-messages < api-release-november-2015.csv

(The “<” in the shell means “Read your input from this place”.)

This prints out only the matching lines, but the output still contains duplicates and a lot of annoying blank lines. We can get rid of those while sorting by passing the output through ‘sort -u’ (“u” for “unique”, which drops repeated lines):

$ filter-commit-messages < api-release-november-2015.csv | sort -u

(The “|” in the shell means “Pass your output through to this other command”.)

Now that we are extracting only the important lines, let’s throw them in a file:

$ filter-commit-messages < api-release-november-2015.csv | sort -u > api-release-november-2015-actual.csv

(The “>” near the end means “Write all of the output to this place”.)

We can see how much less reading we have to do now by running a word count program (‘wc’) on the before and after files:

$ wc -l api-release-november-2015.csv # old
     201 api-release-november-2015.csv
$ wc -l api-release-november-2015-actual.csv # new
     115 api-release-november-2015-actual.csv

(The “-l” means “count the lines”.)

Now, since Jane and I each have to review half of the commits, we can use the ‘split’ shell command to break the file in half. Since we know the file is 115 lines, we need to tell ‘split’ how many lines to put in each half with the ‘-l’ option (see ‘man split’ in your terminal):

$ split -l 58 api-release-november-2015-actual.csv COMMITS-TO-REVIEW

‘split’ takes the last argument, “COMMITS-TO-REVIEW”, and uses it as a prefix for the two files it creates, “COMMITS-TO-REVIEWaa” and “COMMITS-TO-REVIEWab”, which we can rename for each reviewer:

$ mv COMMITS-TO-REVIEWaa COMMITS-TO-REVIEW-RICH
$ mv COMMITS-TO-REVIEWab COMMITS-TO-REVIEW-JANE

A nice thing is that, because we sorted the lines, each reviewer gets an alphabetized slice of the commits, so each engineer’s related commits end up next to each other.

p.s. We didn’t actually need a Python program for the first part, we could have just used ‘grep’ and stayed with shell commands. But hey!
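In case you’re curious, the all-shell version would look something like this, using grep’s extended-regex flag:

$ grep -E '[A-Z]+-[0-9]+' api-release-november-2015.csv | sort -u > api-release-november-2015-actual.csv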

p.p.s. With more work, this could all be put together into a single program if we were inclined, but since it doesn’t get used that often it’s probably OK to type a few commands.
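If we ever did want that single program, here is a rough sketch of what it might look like (the script name, the two-reviewer split, and the output file names are just my choices for the example):

  #!/usr/bin/env python

  import re
  import sys

  jira_pat = re.compile("[A-Z]+-[0-9]+")

  def main(path):
      # Keep only the lines that mention a JIRA ticket, dropping
      # duplicates, then sort what's left.
      with open(path) as f:
          lines = sorted(set(line for line in f if jira_pat.search(line)))
      # Split the work in half, one output file per reviewer; the
      # first reviewer gets the extra line if the count is odd.
      half = (len(lines) + 1) // 2
      for chunk, reviewer in ((lines[:half], "RICH"), (lines[half:], "JANE")):
          with open("COMMITS-TO-REVIEW-" + reviewer, "w") as out:
              out.writelines(chunk)

  if __name__ == "__main__":
      main(sys.argv[1])

You would run it as ‘filter-and-split.py api-release-november-2015.csv’.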

(Image courtesy William Hartman under Creative Commons license.)

How to Write Documentation for People that Don’t Read

These are my notes from Kevin Burke’s talk at the 2015 Write the Docs conference in Portland, Oregon. Any brilliant ideas in the below text should be attributed to Mr. Burke. Any errors, omissions, or misrepresentations are mine. You can also watch a video of Mr. Burke’s talk.

Kevin Burke

Eye scans show that users read in an F-shaped pattern, from top left down the left side.

Users don’t look at big blocks of text.

They do look at:

  • meaningful text and images
  • starts of paras
  • links
  • bullets

Use bulleted lists, not walls of text.

Bulleted lists perform 125% better (2.25x) than paragraphs for getting people to retain information.

Use the Markdown format for Github READMEs so you can have bold and other formatting.

People skip things that look like ads; this means they SKIP BIG RED CALLOUTS.

(NTS: we use this a ton, think about revisiting)

Text can’t be too wide (shows JIRA’s REST API docs as bad example)

Text should be 65-90 characters wide, like books.

(NTS: “information scent”)

The biggest paragraph in the Github example has 2 sentences.

Users are also bad at searching; they don’t try other queries (“one and done”).

There is often benefit to having subtly different variants of questions around, as people tend to ask and search using different words.

People searched for client SDKs at Twilio as:

  • helper libraries
  • client SDKs
  • API SDKs
  • library bindings
  • language-specific wrappers

Example question: “how do I forward a number to my cell phone?”

Meanwhile, the docs said: “How to do everything with an incoming call”

A user won’t make that connection; it’s better to have targeted pages.

No one reads anything above or below code snippets.

Code snippets can often have implicit configuration (env vars, library requires, etc.).
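For example (a hypothetical snippet; the variable names are just illustrative), you can make the implicit configuration explicit at the top:

  import os

  # This snippet needs two environment variables set in your shell
  # before it will run.
  ACCOUNT_SID = os.environ["ACCOUNT_SID"]
  AUTH_TOKEN = os.environ["AUTH_TOKEN"]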

There can also be typos in code snippets.

A lot of people are not familiar with pip/gem/etc., so you have to help them figure out how to use peripheral tools.

Twilio has JS that fills in snippets with your actual credentials so when you copypasta a code sample it’s runnable.

People copy the dollar signs from bash snippets.

(NTS: WTF?)

every user failure is a potential job to be done

Say you get an SSL error authenticating to the Twilio API.

What are you going to do? Google it!

None of the top google results are your actual docs (SO, blogs, forums, etc.)

(NTS: if your stuff isn’t on google, it doesn’t exist.)

transgression: docs behind a wall

At one point 50% of twilio traffic was from Google.

If you have a login wall, you’re throwing away 50% of your traffic.

transgression: providing pdfs

PDFs are not searchable.

WikiHow stole traffic from Twilio!

If users get an error message from you, put that exact message in your docs.

Error messages should be copypastable into the goog.

Validation as documentation; maybe we can improve the error message.

Suggestion: put the user’s input back into the error message string so your user can see what they passed in to cause the error.

Strings are easy to find/change.
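For example, validation code along these lines (a hypothetical Python snippet; the names are mine, not from the talk) echoes the bad input and says how to fix it, which makes the resulting error both googleable and actionable:

  def set_callback_url(url):
      # Echo the user's input back in the error message so they can
      # see exactly what they passed in, and say how to fix it.
      if not url.startswith(("http://", "https://")):
          raise ValueError(
              "Invalid callback URL %r: expected it to start with "
              "'http://' or 'https://'." % url)
      return url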

Takeaways

  • break up text
  • make the first 3-5 words of every para count
  • more links
  • care about SEO
  • EVERY PAGE IS A LANDING PAGE
  • your docs should always win on google
  • make error messages explain how to fix the problem
  • better yet, solve the problem for the user

Scsh Manual Available in Texinfo Format


I am pleased to announce that, many years later, the famous Scsh Reference Manual has been converted to Texinfo. You can get your very own copy here: https://github.com/rmloveland/scsh-manual-texinfo.

This conversion represents quite a bit of work. Although I would have liked to do the conversion with a program, there was just too much of an impedance mismatch between the original LaTeX sources and Texinfo. Something tells me that’s why this conversion hasn’t happened until now. Whatever the reason, I’m glad to finally collect some internet cool points, even if it is almost twenty years later.

In the course of the conversion, I uncovered a number of small spelling, grammatical, and usage issues, which have now been fixed. To be clear: this is no criticism of the original authors. Several of them appear to be non-native English speakers. Their English is miles better than my German!

The resulting manual builds very nicely into HTML, PDF, or Info files. The PDF in particular is nice, since you get beautiful, clickable links everywhere! And the Info files are quite easy to install in your local Emacs.
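If you haven’t built from Texinfo before, the commands look roughly like this (assuming the top-level file is named scsh-manual.texi; check the repo for the actual file name):

$ makeinfo scsh-manual.texi          # Info files, for Emacs
$ makeinfo --html scsh-manual.texi   # HTML
$ texi2pdf scsh-manual.texi          # PDF, with the clickable links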

Speaking of Emacs, if you like hacking Scsh from Emacs, try my geiser-scsh setup. Better yet, help me hack on it!

(Image courtesy steeljam under CC-BY-NC-ND license.)