Statistics over Git Repositories with Scsh

Figure 1: A forest stream outside New Paltz, NY.

In this post I’ll share a scsh port of a nice shell script from Gary Bernhardt’s Destroy All Software screencasts. This is taken from season 1, episode 1 of the series.[1]

The script gathers statistics on a git repository. You pass it a regex that matches filenames, and it outputs a table showing, for each commit, the total number of lines in the matching files as of that commit.

For example, I might want to see how the number of lines of documentation in Markdown files changed across commits:

$ repo-stats ".md$"
... snip! ...
52      c36cc6d First version of diff-checking code.
52      9ed53c3 Tweaks.
52      9e17d7e Add new service.
64      b293c3d Describe how to use the diffing code.
64      1886164 Update comments and documentation.
64      4a7ba26 Bump TODO prio.

The scsh code to do this is below; it’s a nearly 1:1 translation of Mr. Bernhardt’s bash code into scsh. It does differ in a few ways:

  • No dynamic/global variables: The bash code uses variables inside functions that weren’t passed in as arguments to those functions. This is fine for small programs, but is probably not a Good Thing™.
  • Since scsh is based on Scheme 48, we get a nice inspector/debugger for free.
  • At this program size, we don’t need to break out the Scheme 48 module system. However, if we wanted to integrate this scsh code cleanly with a larger system, we could do so fairly easily.
  • Something about how scsh is calling git and piping its output isn’t turning off git’s dumb (IMO) “I will behave differently depending on what kind of output I think I’m writing to” behavior. Therefore, unlike in Mr. Bernhardt’s example, we need to set the GIT_PAGER environment variable to the empty string, which tells git not to launch a pager.
  • Mr. Bernhardt used bash in his video due to its ubiquity. Scsh fails utterly in this regard, since almost no one uses it. However, that doesn’t really matter unless you need to distribute your code to a wider audience.[2]
  • Subjectively, Scheme is an immeasurably nicer language than whatever weird flavor of POSIXy sh is available.

Enough rambling, let’s have some code:

#!/usr/local/bin/scsh \
-e main -s
!#

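;; An empty GIT_PAGER tells git not to launch a pager.
(setenv "GIT_PAGER" "")

;; All commit hashes reachable from HEAD, oldest first, as a list of strings.
(define (revisions)
  (run/strings (git rev-list --reverse HEAD)))

;; The abbreviated hash and subject line of REV, as a single string.
(define (commit-description rev)
  (run/string (git log --oneline -1 ,rev)))

;; Total number of lines, as of REV, in every file whose ls-tree entry
;; matches FILE-PATTERN: list the tree, keep the matching entries, pull
;; out the blob hash (field 3), print those blobs, and count the lines.
(define (number-of-lines file-pattern rev)
  (run/string
   (| (git ls-tree -r ,rev)
      (grep ,file-pattern)
      (awk "{print $3}")
      (xargs git show)
      (wc -l))))

;; Print one row per revision: line count, a tab, then the commit description.
(define (main prog+args)
  (let ((pat (second prog+args))
        (revs (revisions)))
    (for-each
     (lambda (rev)
       (let ((column-1 (string-trim-both (number-of-lines pat rev)))
             (column-2 (string-trim-both (commit-description rev))))
         (format #t "~A\t~A~%" column-1 column-2)))
     revs)))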
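
To make the middle of the number-of-lines pipeline a bit more concrete, here is a minimal shell sketch of just the grep and awk stages. It runs against made-up `git ls-tree -r` output, so it works without a repository. The names sample_tree and blob_hashes, the hashes, and the paths are all hypothetical; only the mode/type/hash/path line layout comes from git.

```shell
# Hypothetical stand-in for `git ls-tree -r <rev>`: each line is
# "<mode> <type> <hash><TAB><path>". Hashes and paths are made up.
sample_tree() {
  printf '100644 blob %s\t%s\n' \
    1111111111111111111111111111111111111111 README.md \
    2222222222222222222222222222222222222222 src/main.scm \
    3333333333333333333333333333333333333333 docs/usage.md
}

# Keep the entries whose line matches the pattern in $1, then print
# field 3, which is the blob hash that would be handed to `git show`.
blob_hashes() {
  grep -- "$1" | awk '{print $3}'
}

# Prints the hashes of the two .md entries, one per line.
sample_tree | blob_hashes '.md$'
```

Two things stand out when the stages are run in isolation: grep sees the whole ls-tree line rather than only the path, and the unescaped `.` in `.md$` matches any character, so a stricter pattern such as `'\.md$'` may be closer to what you want.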

Footnotes:

[1] I feel like I should note for the record that:

  1. This is a legitimate, paid copy of Mr. Bernhardt’s videos that we’re working from.
  2. Although I’m only a few episodes into season 1, I am really enjoying the series and would recommend it.

[2] And if you do need to distribute your code to a wider audience, there is an easy way to dump a heap image that should be runnable by any other scsh VM of the same version. I’ve done this myself to distribute reasonably large/complex scripts to coworkers. I’ve written a little scsh library to automate the process of installing an “app” in a heap image. I hope to write about it here soon.
