Tag Archives: writing

Thinking about software documentation as the output of a lossy compression algorithm

stick

How many times have you heard or read the following comments?

  1. “The docs are always out of date”
  2. “I don’t bother reading the docs, I just read the source code”
  3. “If you write self-documenting code, you don’t need to write docs”

If you work in software, I bet the answer is: a lot. They range in truth value from #1, which is a tautology, to #3, which is a fantasy. I can relate to #2, since I had to do it just yesterday when using a semi-undocumented Ruby library.

I think all of these points of view (and more) are explained when you think about software documentation as the output of a lossy compression algorithm. Many (most? all?) of the things that you love and/or hate about the documentation for a particular piece of software are explained by this.

Quoth wiki:

Well-designed lossy compression technology often reduces file sizes significantly before degradation is noticed by the end-user. Even when noticeable by the user, further data reduction may be desirable (e.g., for real-time communication, to reduce transmission times, or to reduce storage needs).

As such, the features of software documentation are similar to the features of other products of lossy compression algorithms.

Take mp3s for example. The goal of an mp3 is not to be the highest-fidelity replication of the audio experience it represents. The goal of an mp3 is to provide a “good enough” audio experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in CPU time is it to run the decompression algorithm? Can the CPU on the device handle it?
  • Space: How much disk space will the file take on the device?

Similarly, we might say that the goal of software documentation is not to be the highest-fidelity replication of the “understanding” experience it (theoretically) represents. The goal of a piece of documentation is to provide a “good enough” learning experience given the necessary tradeoffs that had to be made because of constraints such as:

  • Time: How expensive in person time is it to “run the decompression algorithm”, i.e., learn how the system works well enough to write something coherent? And then to actually write it? How many technical writers does the organization employ per engineer? (In many companies it’s ~1 per 40-50 engineers) How many concurrent projects is that writer working on, across how many teams?
  • Space: How much information does the user need to use the system? How little information can you get away with providing before users give up in disgust?

Remember that fewer, more effective materials cost more to produce. This is similar to the way better compression algorithms may cost more than worse ones along various axes you care about (dollar cost for proprietary algorithms, CPU, memory, etc.).

It takes longer to write more concise documentation, draw clear system diagrams, etc., since those are signs that you actually understand the system better, and have thus compressed the relevant information about it into fewer bytes.

And oh by the way, in practice in an “agile” (lol) environment you don’t have enough time to write the “best” docs for any given feature X. Just like the programmer who wrote feature X would likely admit that she didn’t have enough time to write the “best” implementation according to her standards.

Quoth Pascal:

I would have written a shorter letter, but I did not have the time.

So the next time you are frustrated by the docs for some piece of software (if any docs exist at all), instead of reaching for some platitude about how the docs suck, think “oh, lossy compression”.

(Image courtesy Jonathan Sureau under Creative Commons license.)

Oh my, am I really considering XML?

Since writing Why Markdown is not my favorite text markup language, I’ve been thinking more about document formats.

More and more I begin to see the impetus for the design of XML, despite its sometimes ugly implementation. With XML you avoid much of the ambiguity of parsing plain-text-based formats and just write the document AST directly. Whether this is a good or bad thing seems to depend on the tools you have available to you, but I think I’m starting to see the light.

At $WORK, for example, I’ve been writing directly in the “XML-ish” Confluence storage format since it was introduced in Confluence 4. Combined with the right editing environment (such as that provided by Emacs’ nxml-mode), it’s easy to navigate XML “structurally” in such a way that you no longer really see the tags.

It’s sort of like being Neo in The Matrix except that, instead of making cool shit happen in an immersive virtual world you’re, um, writing XML.

However, not all is roses in XML-land. In an ideal world, you could maintain a set of XML documents and reliably transform them into other valid formats using a simple set of tools that are easy to learn and use. In reality, many of the extant XML tools such as XSLT exhibit a design aesthetic that is deeply unappealing to most programmers. The semantics of XSLT are interesting, but the syntax appears to be a result of the mistakes that are often made when programmers decide to create their own DSLs. Olin Shivers has a good discussion of the often-broken “little language” phenomenon in his scsh paper.

Speaking of Scheme, it’s possible that something reasonable can be built with SXML. I’ve also had good results using Perl and Mojo::DOM to build Graphviz diagrams of the links among Confluence wiki pages as part of a hacked-together “link checker” (Users of Confluence in an industrial setting will know that the built-in link-checking in Confluence only “sort of” works, which is indistinguishable in practice from not actually working — hence the need to build my own thing).
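
That Mojo::DOM-plus-Graphviz combination is easy to sketch. Here’s a minimal, hypothetical version (not the actual script): it assumes you’ve already saved each wiki page’s XHTML to a local file, with made-up page titles and file names, and it prints a Graphviz dot graph of page-to-link edges on standard output.

#!/usr/bin/env perl
# Sketch only: extract links from saved XHTML pages and emit a Graphviz
# "dot" graph of page -> link-target edges. Page titles and file names
# below are hypothetical.

use strict;
use warnings;
use Mojo::DOM;
use File::Slurp qw< slurp >;

my %pages = (
    'Home'          => 'exported/Home.html',
    'Release Notes' => 'exported/Release-Notes.html',
);

print "digraph wiki_links {\n";
for my $title ( sort keys %pages ) {
    my $dom = Mojo::DOM->new( scalar slurp( $pages{$title} ) );
    # Every <a href="..."> on the page becomes an edge in the graph.
    for my $href ( $dom->find('a[href]')->map( attr => 'href' )->each ) {
        printf qq{  "%s" -> "%s";\n}, $title, $href;
    }
}
print "}\n";

Pipe the output through dot -Tsvg and you get a picture of which pages link where; checking that each target actually exists is only a small step beyond that.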

I’ve also been playing around with MIT Scheme’s built-in XML Parser, and so far I’m preferring it to the Perl or SXML way of doing things.

Include Code Samples in Markdown Files

angelfish.jpg

Introduction

One useful feature that Markdown lacks is a way to maintain properly formatted code samples in your documents. Instead you have to indent your code samples “by hand”. This is easy to mess up, especially if you have a team of people all editing the same files.

Code indentation and formatting is an important issue if you are writing tech docs intended for engineers. It’s mostly about ease of readability. Badly formatted code is jarring to the eye of your reader and makes the rest of your documentation seem instantly suspect.

In this post I’ll share a technique (a script, really) that I’ve developed for “including” longer code samples into your Markdown documents from external files 1.

Motivation

To understand the motivation for this technique, let’s look at some made-up code samples. If you already understand why one might want to do this, feel free to skip down to the code.

First up is the simple case: a code snippet that’s just a few lines long.

# Spaceship docking mechanism.

my $foo = Foo->new;
$foo->rotate(90);
$foo->engage_maglocks;

That wasn’t too bad. However, you may need a longer code sample like the one shown below which uses a lot of indentation. You really don’t want to be manually indenting this inside a Markdown file.

;; Ye Olde Merge Sort.

(define (merge pred l r)
  (letrec ((merge-aux
            (lambda (pred left right result)
              (cond ((and (null? left) (null? right))
                     (reverse result))
                    ((and (not (null? left)) (not (null? right)))
                     (if (pred (car left) (car right))
                         (merge-aux pred
                                    (cdr left)
                                    right
                                    (cons (car left) result))
                         (merge-aux pred
                                    left
                                    (cdr right)
                                    (cons (car right) result))))
                    ((not (null? left))
                     (merge-aux pred (cdr left) right (cons (car left) result)))
                    ((not (null? right))
                     (merge-aux pred left (cdr right) (cons (car right) result)))
                    (else #f)))))
    (merge-aux pred l r '())))

(define (merge-sort xs pred)
  ;; Bottom-up merge sort: start from singleton lists and repeatedly
  ;; merge adjacent pairs until one sorted list remains.
  (let loop ((xs (map list xs))
             (result '()))
    (cond ((and (null? xs) (null? (cdr result))) (car result))
          ((null? xs) (loop result xs))
          ((null? (cdr xs))
           (loop (cdr xs) (cons (car xs) result)))
          (else
           (loop (cddr xs)
                 (cons (merge pred (first xs) (second xs)) result))))))

Code to solve the problem

An easier way to do this is to “include” the code samples from somewhere else. Then you can maintain the code samples in separate files where you will edit them with the right support for syntax highlighting and indentation from your favorite code $EDITOR.

The particular inline syntax I’ve settled on for this is as follows:

{include code_dir/YourFile.java}

Where code_dir is a directory containing all of your code samples, and YourFile.java is some random Java source file in that directory. (It doesn’t have to be Java, it could be any language.)

The include syntax is not that important. What’s important is that we can easily maintain our code and text separately. We can edit Markdown in a Markdown-aware editor, and code in a code-aware editor.

Then we can build a “final” version of the Markdown file which includes the properly formatted code samples. One way to do it is with a shell command like this, passing the input file name as an argument so the script can resolve include paths relative to it (see below for the source of the expand_markdown_includes script):

$ expand_markdown_includes your-markdown-file.md.in > your-markdown-file.md

This assumes you use the convention that your not-quite-Markdown files (the ones with the {include *} syntax described here) use the extension .md.in.
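
To make the transformation concrete, here’s a tiny made-up example (the file name docking.pl is invented). If a line of your .md.in file reads:

{include code_dir/docking.pl}

then the script passes every other line through untouched and replaces that line with the contents of code_dir/docking.pl, indenting each included line by four spaces so that Markdown renders it as a code block.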

Another nice thing about this method is that you can automate the “include and build” step using a workflow like the one described in Best. Markdown. Writing. Setup. Ever.

Finally, here is the source of the expand_markdown_includes script. The script itself is not that important. It could be improved in any number of ways. Furthermore, because it’s so trivial, you can rewrite it in your favorite language.

#!/usr/bin/env perl

use strict;
use warnings;
use File::Basename;
use File::Slurp qw< slurp >;

my $input_file = shift;
my $input_pathname_directory = dirname( $input_file );

my @input_lines = slurp( $input_file );

# Matches lines like: {include code_dir/YourFile.java}
my $include_pat = qr/\{include ([\w.\/-]+)\}/;

for my $line ( @input_lines ) {
  if ( $line =~ $include_pat ) {
    # Replace the include directive with the contents of the named
    # file, indenting each line by four spaces so that Markdown
    # renders it as a code block.
    my $include_pathname = $1;
    my $program_file = build_full_pathname($input_pathname_directory,
                                           $include_pathname);
    my @program_text = slurp( $program_file );
    for my $program_line ( @program_text ) {
      printf( "    %s", $program_line );
    }
  }
  else {
    # Pass everything else through untouched.
    print $line;
  }
}

sub build_full_pathname {
  my ($dir, $file) = @_;
  return $dir . '/' . $file;
}

Footnotes:

1

While I was writing this I decided to do a bit of web searching and I discovered this interesting Stack Overflow thread that mentions a number of different tools that solve this problem. However I rather like mine (of course!) since it doesn’t require any particular Markdown implementation, just the small preprocessing script presented here.

(Image courtesy Claudia Mont under a Creative Commons license.)

The Debugger is a Notebook

penrose-tiling-based-modular-origami.jpg

My style of programming has changed since the ODB. I now write insanely fast, making numerous mistakes. This gives me something to search for with the ODB. It’s fun.

– Bil Lewis, creator of the Omniscient Debugger

I. Programmers and Poets

In this short essay I will explore some similarities between the way (some) programmers work when doing exploratory programming and the way poets write poems. I will also sprinkle in some attempts to make the case that a good debugger is core to the experience of composing a new program that you don’t understand very well yet, and compare that with the experience of writing a poem. I know a lot about the former because I am not a very good programmer, so many programs that explore computer science concepts are “exploratory” for me; I know a lot about the latter because I am a reasonably good poet who has written many poems (which is to say that I have edited many poems, which is really more important).

This work is largely inspired by:

  • The experience of working on programs for SICP exercises and getting popped into the MIT Scheme debugger a lot! 1
  • Using the Scheme 48/scsh inspector a bunch while working on geiser-scsh
  • Writing a lot of poems

II. Generating {Program,Poem} Text

Computer program texts are compressed descriptions of computational processes designed to be experienced by computers and humans. Similarly, poems are compressed descriptions of cognitive and emotional processes designed to be experienced by humans.

Both artifacts strive to encapsulate something that was understood by the author(s) at one point in time and convey that understanding to a reader at another point in time (human or machine). In poetry world, there are a number of different ways to work. There are ostensibly some writers who think really hard for a moment and write a fully-formed sentence. Then they think for a few moments more and write down another fully-formed sentence. And so on.

In reality, there are very few poets who work this way. Most people work using an approximation of what Sussman beautifully describes as “problem-solving by debugging almost-right plans”. 2 This is actually how human beings create new things! As my professor told our writing workshop, “I can’t teach you how to write. I can only teach you how to edit your own work”. Few people write well, and fewer edit well. But in the end, writing and editing are actually the same thing. When you first edit a poem, you may correct minor errors in the text. The more important work is “running the program” of the poem in your head, reading it over and over, reciting it aloud, testing whether it achieves the aesthetic purpose you have set for it. You will add a pronoun in one place, and replace an adjective in another. You might remove the last line, or add another stanza entirely. Do this for long enough, and you may find the poem has changed substantially over the course of having been “debugged”. It may also achieve a goal that you didn’t know existed when you began writing it. I suspect there is something very similar at work when people are doing exploratory programming sessions.

III. Debugger as Crutch/Enabler

Debuggers are seen by some as a crutch. I agree that debuggers are a crutch. There’s a reason crutches were invented. Because without them, you would have to crawl along, dragging your broken leg behind you in the dirt. And we all have a “broken leg” of sorts when we’re working on a problem we don’t understand very well.

I’d like to propose a better metaphor for debuggers. The debugger is a notebook where you can sketch out different versions of your program. You may correct minor errors in a variable declaration, or change a parameter to a procedure. You might redefine an existing procedure as a higher-order procedure that replaces two or three more verbose ones. And so on. All inside the debugger!

A sufficiently powerful debugger will give you the freedom to sketch out an idea quickly, watch it break, and play around in the environment where the breakage occurred, reading variable bindings, entering new definitions, etc.

I think of this style of programming as “sketching in your notebook” because you don’t write poems by staring at a blank sheet of paper for two minutes and then carefully writing a fully-formed sentence. You have an initial idea or impulse, and then you express it as fast as you can! You write down whatever parts of it you can manage to catch hold of, since your brain is working and associating much faster than your pen can really capture. What you end up with is a pile of things, some of which you will discard, some of which are worth keeping just as they are, and some of which are worth keeping but non-optimal and will need to be rewritten. If you actually have an idea worth expressing, you are in much greater danger of not capturing something than you are of making a mess. You will always start by making a mess and then cleaning it up 3.

I submit that a sufficiently powerful, expressive debugging environment is as necessary to the programmer as a pocket notebook to the poet.

Interesting Reads

These essays explore writing, debugging, and thinking in more depth:

(Image courtesy fdecomite under Creative Commons License.)

Footnotes:

1

For more information about how cool the MIT Scheme debugger is, see Joe Marshall’s informative blog post.

2

This technique is mentioned on his CSAIL page here. For more information, see the link to his paper elsewhere on this page.

3

You may enjoy an interesting essay with this title: Make a Mess, Clean it Up!

Why Libraries and Librarians are Amazing

mother-of-all-libraries-14

Libraries: because you cannot have a conscience without a memory.

For some time now I’ve been meaning to write about why libraries (and librarians!) are important. After all, I’m a child of the library. In particular, I’m a child of this library, where I spent many happy hours growing up. As the son of a blue-collar family that was anything but “bookish”, well, I don’t know where I’d be without libraries.

What follows are a collection of random thoughts (linky-blurbs, really) about the general amazingness of libraries and librarians:

  • Librarians fought, and are still fighting, surveillance-state idiocy like the PATRIOT Act.
  • On a lighter note, the Internet Archive also has this collection of classic MS-DOS games that you can play in your browser. When I saw titles like Street Fighter, Sim City, Donkey Kong, and Castle Wolfenstein in this list, I have to admit I kinda freaked out. The future is amazing. Growing up, I had to memorize my friends’ phone numbers! And now we have magic tiny emulated computers in our browser tabs.

See also:

Never trust a corporation to do a library’s job.

(Image courtesy Cher Amio under Creative Commons license.)

Why Markdown is not my favorite text markup language

origami-galerie-freising-tomoko-fuse

There are many text markup languages that purport to allow you to write in a simple markup format and publish to the web. Markdown has arguably emerged as the “king” of these formats. I quite like it myself when it’s used for writing short documents with relatively simple formatting needs. However, it falls a bit short when you start to do more elaborate work. This is especially the case when you are trying to do any kind of “serious” technical authoring.

I know that “Markdown” has been used to write technical books. Game Programming Patterns is one excellent example; you can read more about the author’s use of Markdown here, and the script he uses to extend Markdown to meet his needs is here. (I recommend reading all of his essays about how he wrote the book, by the way. They’re truly inspiring.) Based on that author’s experience (and some of my own), I know that Markdown can absolutely be used as a base upon which to build ebooks, websites, wikis, and more. However, this is exactly why I used the term “Markdown” in quotes at the beginning of this paragraph. By the time you’ve extended Markdown to cover your more featureful technical authoring use cases, it really isn’t “just” Markdown anymore. This is fine if you just want to get something done quickly that meets your own needs, but it’s not ideal if you want to work with a meaningful system that can be standardized and built on.

Below I’ll address just a few of the needs of “industrial” technical writing (the kind that I do, ostensibly) where Markdown falls a little short. Lest this come off as too negative, it’s worth stating for the record that a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing. I have turned to such a homebrewed setup myself in times of need. I’ve even written about how awesome writing in Markdown can be. However, this essay is an attempt to capture my thoughts on Markdown’s shortcomings. Like any good internet crank, I reserve the right to pull a Nickieben Bourbaki at a later date.

I. No native table support

If you are doing any kind of large-scale tech docs, you need tables. Although constraints are always good, and a simple list can probably replace 80% of your table usage if you’re disciplined, there are times when you really just need a big honkin’ table. And as much as I’m used to editing raw XML and HTML directly in Emacs using its excellent tooling to completely sidestep the unwanted “upgrade” to the Confluence editor at $WORK, most writers probably don’t want to be authoring tables directly in HTML (which is the “native” Markdown solution).
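
For a sense of what “just write HTML” means in practice, here is what even a trivial two-column table looks like inlined in an otherwise clean Markdown file (the field names are invented for illustration):

<table>
  <tr><th>Field</th><th>Description</th></tr>
  <tr><td>id</td><td>Unique identifier for the campaign</td></tr>
  <tr><td>status</td><td>One of ACTIVE, PAUSED, or DELETED</td></tr>
</table>

Multiply that by a few dozen rows per table and a few dozen tables per doc set, and the appeal of real table syntax becomes obvious.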

II. No native table of contents support

Yes, I can write a script myself to do this. I can also use one of the dozens of such scripts written by others. However, I’d rather have something built in, and consider it a weakness of the format.

III. Forcing the user to fall back to inline HTML is not really OK

Like tables, there are a number of other formatting and layout use cases that Markdown can’t handle natively. As with tables, you must resort to just slapping in some raw HTML. Two reasons why this isn’t so amazing are:

  • It’s hard for an editor to support well, since editing “regular” text markup and tag-based markup languages are quite different beasts
  • It punts complexity to thousands of users in order to preserve implementation simplicity for a small number of implementors

I can sympathize with the reasoning behind this design decision, since I am usually the guy making his own little hacks that meet simple use cases, but again: not really OK for serious work.

IV. Too many different ways to express the same formatting

This has led to a number of incompatibilities among the different “Markdown” renderers out there. Just a few of the areas where ambiguity exists are: headers, lists, code sections, and links. For an introduction to Markdown’s flexible semantics, see the original syntax docs. Then, for a more elaborate description of the inconsistencies and challenges of rendering Markdown properly, see Why is a spec needed?, written by the CommonMark folks.
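
As a small illustration, here are a few of the equivalent spellings that classic Markdown allows for headers and bullets; renderers have historically disagreed about the corner cases around them:

# A header          <-- ATX-style header
A header
========            <-- Setext-style header (same result)

* an item
+ an item
- an item           <-- all three bullet characters are legal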

V. Too many incompatible flavors

There are too many incompatible flavors of Markdown that each render a document slightly differently. For a good description of the ways different Markdown implementations diverge, see the Babelmark 2 FAQ.

The “incompatible flavors” issue will hopefully be addressed with the advent of the CommonMark Standard, but if you read the spec it doesn’t address points I, II, or III at all. This makes sense from the perspective of the author of a standards document: a spec isn’t very useful unless you can achieve consensus and adoption among all the slightly different implementations out there right now, and Markdown as commonly understood doesn’t try to support those cases anyway.

VI. No native means of validation

There will of course be a reference implementation and tests for CommonMark, which will ensure that the content is valid Markdown, but for large-scale documentation deployments, you really need the ability to validate that the documentation sets you’re publishing have certain properties. These properties might include, but aren’t limited to:

  • “Do all of the links have valid targets?”
  • “Is every page reachable from some other page?”

Markdown doesn’t care about this. And to be fair it never said it would! You are of course free to use other tools to perform all of the validations you care about on the resulting HTML output. This isn’t necessarily so bad (in fact it’s not as bad as points I and II in my opinion, since those actually affect you while you’re authoring), but it’s an issue to be aware of.
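
As an example of the kind of post-hoc check I mean, here’s a rough, hypothetical sketch (not a tool any Markdown implementation provides): it walks a made-up site/ directory of generated HTML with Mojo::DOM and complains about relative links whose targets don’t exist on disk.

#!/usr/bin/env perl
# Rough sketch: report relative links in generated HTML whose targets
# don't exist on disk. The "site/" directory name is hypothetical.

use strict;
use warnings;
use File::Basename;
use Mojo::DOM;
use File::Slurp qw< slurp >;

for my $page ( glob 'site/*.html' ) {
    my $dom = Mojo::DOM->new( scalar slurp($page) );
    for my $href ( $dom->find('a[href]')->map( attr => 'href' )->each ) {
        # Skip external links and bare fragments; only check relative files.
        next if $href =~ m{^(?:https?:|mailto:|#)};
        my $target = dirname($page) . '/' . $href;
        $target =~ s/#.*$//;    # drop any fragment
        print "$page: broken link to $href\n" unless -e $target;
    }
}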

This is one area where XML has some neat tooling and properties. Although I suppose you could do something workable with a strict subset of HTML. You could also use pandoc to generate XML, which you then validate according to your needs.

Conclusion

Markdown solves its original use case well, while punting on many others in classic Worse is Better fashion. To be fair to Markdown, it was never purported to be anything other than a simple set of formatting conventions for web writing. And it’s worth saying once more that, even given its limitations, a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing.

Even so, I hope I’ve presented an argument for why Markdown is not ideal for large scale technical documentation work.

(Image courtesy Gerwin Sturm under a Creative Commons license.)

Best. Markdown. Writing. Setup. Ever.

../img/markdown-emacs-compilation.png

When writing in a source format such as Markdown, it’s nice to be able to see your changes show up automatically in the output. One of my favorite ways to work is to have Emacs and Firefox open side by side (as shown above). Whenever I save my Markdown file, I want Emacs to automatically build a new HTML file from it, and I want Firefox to automatically refresh to show the latest changes.

Once you have this set up, all you have to do is write and save, write and save.

As it happens, fellow Redditor goodevilgenius was looking to accomplish just this workflow. I originally posted this answer on Reddit, but I’m reposting it here in the hope that it will help some kindly internet stranger someday.

I have this exact use case. I use compile-on-save mode and the Firefox Auto Reload extension.

So in a Markdown buffer (once you’ve installed compile-on-save mode):

  1. M-x compile-on-save-mode RET
  2. M-x compile RET markdown current-file.md > /tmp/current-file.html
  3. Open /tmp/current-file.html in Firefox.
  4. Write stuff and save. Emacs will auto-compile the Markdown, and Firefox will instantly auto-reload the HTML file.

With Emacs and Firefox open side-by-side, I find it pretty easy to enter a “flow” state, since all you have to do is write and save the file. Hope that helps!

The Emacs-savvy reader will note that this workflow isn’t confined to Markdown. For example, compile-on-save mode could kick off an XML doc build (or any other computation you like, for that matter).

More Recommendations for Technical Writers

../img/green-fractal-tree.jpg

In the same spirit of the last post in this series, I have more recommendations for technical writers working in software-land. As before, I make no claims to living up to these recommendations myself. It’s like, aspirational, man.

Without further ado:

Learn Markdown

Yes, Markdown. It’s become a de facto standard plain text format for all kinds of web writing and documentation. It’s easy to use, it’s everywhere, and there are many tools that you can use to work with it.

See also: What is Markdown?

If you want to know what could happen to you if you start using it for things, see Ryan Tomayko’s Why You Should Not Use Markdown.

Learn your web browser’s development tools

From time to time you will need to peek under the covers of a web app you are documenting and see what’s actually going on. Especially nowadays with single page apps, browser-side local storage, and all that fun stuff.

Even if you never need to know that stuff to write your UI docs, you can use the dev tools to rewrite text in user interfaces for making nicer screenshots.

Strunk & White! Yes, Really!

Read it. Then try like hell to live it. I’ll confine the rest of this document to “technical” issues, since I can’t improve on the advice given in that book.

(Again, this is an area where I need to work harder on following my own advice. My long sentences and passive constructions are killing me.)

Think about learning to read code

Everything in our world runs on code (or will soon). Learning how to read code at least up to a basic level can help you figure out what the hell’s going on, sometimes. Yes, you can talk to an engineer, but your conversation will be more productive if you at least have a starting point.

The reason is simple: It’s usually easier to go to someone with a wrong idea and have them correct you with the right ideas than it is to get that same someone to teach you everything from scratch.

This doesn’t mean you have to become a programmer yourself (although you should try to learn how to write scripts to automate boring computer-y stuff, as I noted previously). But like most foreign languages, it helps to be able to read a little to get directions from the locals. And in this case the locals (programmers) speak code.

Seek out pathological edge cases

In any really large system or “platform” designed and built by multiple people on different teams, not all of whom are talking to each other three times a day about every single design decision they make, there will be a number of edge cases. In other words, there will be parts of the system that don’t play well together, or are at best, um, unintuitive in the way they behave.

It’s your job to find these and document them as well as you can. Sometimes this will be low-priority because the issues will be fixed “soon” (for some value of soon). Sometimes it will be necessary, though. You will have to use your own best judgment, along with input from your friends in Engineering and the Product organization, about when to do this.

You probably won’t get them all (not even close), but if you actively seek them out as you go, you will develop a mindset that will help you learn the system a little better.

Also: no one will assume this is part of your job or give you actual time to work on this; you just have to do it.

Don’t listen to anybody

Perhaps that should be rephrased as: “Don’t listen to anybody … just yet.” Until you’ve done your own research and testing and had your own look at the thing you’re documenting (whatever that thing is), you can’t really write about it for someone else. When Product folks, Engineering, etc. tell you the sky is blue, you should still stick your head out the nearest window.

It’s not that they don’t know what they’re talking about (they usually know more than you), it’s that the features they’re telling you about may exist soon at your layer of the onion, but right now they’re sitting on a git branch in somebody’s laptop, or as a line in a technical spec on the internal wiki, or in a weird corner of the internal-only sandbox environment that you will need to hunt down in order to actually use the thing.

If you are documenting a web API, only the things the APIs actually do are real. Everything else is bullshit. That’s part of the meaning of “API”.

Nobody will explicitly budget time for this, either. But you have to do it.

Do your own research, but prepare to be Wrong

It’s always faster to bug someone with a quick question, and sometimes that’s necessary. That said, I’ve been embarrassed by asking quick questions that turned out to have stupidly easy answers that I could have found out for myself.

It’s better to be known as someone who tries to solve their own problems before reaching out for help. Doing some light functional testing as you document, trying out a few database queries, or even reading through code (if that even makes sense and if you are able), will teach you things about the system that could prove useful to you in the future. It will at least give you something to talk about when you do sit down with someone else.

Having said all that, you will still be Wrong a lot (yes, that’s capitalized and bold). It’s one thing to use a thing, and another to build it. The engineers will always know a lot more than you about the systems. If you can get them to point out your mistakes, you’re doing your job! You’re learning things! So don’t get discouraged.

But didn’t I just tell you not to listen to anybody? Yes, but it’s never that simple – these are complex systems.

You are probably an unpaid part-time QA person, embrace it

In the course of writing documentation, you will naturally test things out to ensure that what you’re writing isn’t totally useless (even so, that happens sometimes). You’ll probably spend a lot of time doing this, and taking notes on what you find. These notes will then be integrated into the documentation you write, or into bug reports of some kind to your engineering colleagues (which will probably be rejected because you are Wrong – see above).

Again: no one will assume this is part of your job or give you actual time to work on it; you just have to do it.

Never Forget that The Train Never Stops

Finally – and most importantly – don’t let the fact that everything is changing all of the time get you down. The user interface will change drastically; new APIs will be added, and old ones deprecated and removed; people you’ve gotten to know and like will come and go. If you stay at a job like this for a couple of years, you will be rewriting your Year Two revision of the rewrite you did when you started.

Through it all, “the platform” can never stop running and changing: upgrades, changes, and so on all have to happen while the train is running down the tracks at eighty miles per hour. It doesn’t stop for the engineering teams working on it, and it certainly isn’t going to stop for you.

Your best weapon against change is automating as much drudge work as possible (again with the scripting!) so that you can focus on:

  1. Learning your company’s systems as well as you can technically given your limited time and resources
  2. Knowing what’s getting built next, and why
  3. Knowing what the business cares about (this is usually what drives #2)

Finally, never forget that you are doing important work. Someday, when the people who designed and built the current system have moved on, people will wonder “Why does API service ‘X’ behave this way when I set the twiddle_frobs field on API ‘Y’ to true, but only when API Z’s ignore_frob_twiddle_settings field is not null?”

If you’ve done a good job (and gotten really lucky), there will be some documentation on that.

For a great perspective on why documentation is so important to the health of a company from a smart man who’s been around the block once or twice, see Tim Daly’s talk Literate Programming in the Large.

(Image courtesy typedow under CC-BY-NC-SA.)

Announcing confluence2html

../img/kyoto-swan.jpg

If you use (or used to use) a Confluence wiki, you may need to deliver content that was written in wiki markup as HTML. Confluence does have the ability to export an entire space to HTML, but not a single page (or section of a page). To overcome this limitation, I’ve written a script called confluence2html which can convert a subset of Confluence’s wiki markup to HTML. (You can check it out at my Github page; for examples of the supported subset of wiki syntax, see the t/files directory.)

Unfortunately, the new versions of Confluence use a hacky “not-quite-XML” storage format that is terrible for writers and for which there are basically no existing XML tools either. If you are trying to get your content out of a newer version of Confluence and back into wiki markup, check out Graham Hannington’s site (search the page for “wikifier”). His page also has tools to help you wrangle the new format, if you care to (I don’t).

With a bit of editing, you should be able to get the output of Graham’s tool to conform to the subset of Confluence wiki markup supported by this script. It supports some common macros (including tables!), and I’ve found it really useful.

Right now, it’s a command line tool only. You can use it like so:

$ confluence2html < wiki-input.txt > html-output.html

If you don’t know how to run commands from the shell, you can read about it at LinuxCommand.org. If you are on Windows, you can run shell commands using Cygwin. If you’re on a Mac, open Terminal, which you’ll find under Applications > Utilities in the Finder.

At this point I’ll quit blathering and point you to the Github page. Again: for examples of the supported subset of wiki syntax, see the t/files directory. For documentation on the supported wiki syntax, see the README.

Happy writing!

(Image courtesy of caribb under a Creative Commons license.)

Recommendations for Technical Writers

This document describes my recommendations for technical writers working in the software industry (at least the web-focused corner of it). Be forewarned: it’s opinionated. I don’t even live up to it, so you can think of it as an extended note to self.

Learn the UNIX command line

The entire modern internet computing infrastructure is built on open source, UNIX-like operating systems (mostly Linux). If you’re not familiar with them, you will have a more difficult time learning about how large-scale software systems running on networked computers work, otherwise known as: virtually all software systems of importance 1.

To get there, you’ll need to know how to interact with the command line interface (also known as a “terminal”). This is a very plain-looking text-only interface where you will type commands formatted in a strict language understood by the computer. You will learn (if you haven’t already) that the computer always does exactly what you’ve asked it to, but not necessarily what you wanted it to do.

There are a number of “shells” you can use on a UNIX-like OS. I recommend Bash because it’s ubiquitous.

Recommended Reading:

Learn REST API principles

Here’s a light-speed introduction to REST APIs. A large software system has lots of “objects”, which are generally just pieces of stored data about something. For example, where I work, there are objects that represent an advertising campaign or a user in the system.

It doesn’t really matter what data the objects represent; the point of an API is that it creates a single point of contact for external entities (usually programs) that want to interact with the data in the system. It ensures that only certain operations can be performed, and only by authorized parties.

Here’s where the REST part comes in: for any object represented in the system, you can (in theory) perform 4 basic operations on it:

  • Create: Make a new object of that type
  • Read: Look at what information the object contains
  • Update: Change the information the object contains
  • Delete: Delete the object from the system altogether
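
In HTTP terms, those four operations usually map onto the POST, GET, PUT (or PATCH), and DELETE methods. Here is a toy sketch using Perl’s core HTTP::Tiny module against a hypothetical campaigns API; the host, paths, and JSON payloads are all invented for illustration.

#!/usr/bin/env perl
# Toy sketch: the four CRUD operations as HTTP calls against a
# hypothetical REST API. The host, paths, and JSON are made up.

use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new;
my $base = 'https://api.example.com/campaigns';

# Create: POST a new campaign.
my $created = $http->post( $base, {
    headers => { 'Content-Type' => 'application/json' },
    content => '{"name":"Spring Sale"}',
} );

# Read: GET an existing campaign.
my $read = $http->get("$base/42");

# Update: PUT new data for that campaign.
my $updated = $http->put( "$base/42", {
    headers => { 'Content-Type' => 'application/json' },
    content => '{"name":"Spring Sale","status":"PAUSED"}',
} );

# Delete: remove the campaign.
my $deleted = $http->delete("$base/42");

printf "create: %s, read: %s, update: %s, delete: %s\n",
    map { $_->{status} } $created, $read, $updated, $deleted;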

Unfortunately, I know about RESTful APIs only from practice, not study. There’s a copy of O’Reilly’s RESTful Web Services at work that I’ve been meaning to read to fill in the gaps. See? This really is a note to self.

Recommended Reading:

Learn to use a programmer’s text editor

You’re a professional who gets paid to work with text all day. Programmers are also professionals who get paid to work with text all day. You should be following their example by using a powerful plain text editor; it will allow you to automate away some of the tedious parts of editing.

Remember, word processors are written by programmers for others to use. Text editors are written by programmers for themselves to use. It’s not even a contest.

I recommend vim or Emacs. Pick one, it doesn’t matter.

Recommended Reading:

Learn a programming language with strong UNIX integration

As noted above, you are going to be documenting software running on networked computers doing … networky stuff. For example, you might be making API calls to a “cloud” of some kind 2. A lot of it will be tedious and potentially time-consuming, which means that you should automate it using a programming language. Once you are pretty familiar with a range of terminal commands, it won’t be too difficult to pick up a language with strong UNIX integration. Tastes vary; I am most familiar with Perl. I do not recommend wasting much time writing shell scripts, as there are too many weird edge cases for beginners to trip over.
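
To make “automate the drudge work” a little more concrete, here is the flavor of thing I mean: a few lines of Perl that sweep a docs tree for leftover TODO markers before you publish. The docs/ directory and the markers themselves are just examples.

#!/usr/bin/env perl
# Example of small drudge-work automation: flag any leftover
# TODO/FIXME markers in a docs tree before publishing.
# The "docs/" directory and the markers are just examples.

use strict;
use warnings;
use File::Find;

find( sub {
    return unless /\.md$/;
    open my $fh, '<', $_ or die "Can't open $File::Find::name: $!";
    while ( my $line = <$fh> ) {
        print "$File::Find::name:$.: $line" if $line =~ /\b(?:TODO|FIXME)\b/;
    }
}, 'docs' );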

Recommended Reading:

  • Learning Perl by Randal L. Schwartz, brian d foy, and Tom Phoenix
  • Modern Perl by chromatic

Footnotes:

1 Where “importance” is defined as: someone will pay you good money to document it. Personally, I’d love to get a paying job rewriting half of the Emacs manual. But I digress.

2 A Freudian slip by the tech industry perhaps? After all, “the cloud” brings to mind an ephemeral, floating entity that is mostly vaporous, and exists at the whim of forces you don’t really understand, and certainly can’t control. In addition, it’s highly likely to be blown away at any time.