Tag Archives: markdown

Include Code Samples in Markdown Files

angelfish.jpg

Introduction

One useful feature that Markdown omits is any way to properly maintain formatted code samples in the text. Instead you have to indent your code samples “by hand”. This is easy to mess up, especially if you have a team of people all editing the same files.

Code indentation and formatting is an important issue if you are writing tech docs intended for engineers. It’s mostly about ease of readability. Badly formatted code is jarring to the eye of your reader and makes the rest of your documentation seem instantly suspect.

In this post I’ll share a technique (a script, really) that I’ve developed for “including” longer code samples into your Markdown documents from external files [1].

Motivation

To understand the motivation for this technique, let’s look at some made-up code samples. If you already understand why one might want to do this, feel free to skip down to the code.

First up is the simple case: a code snippet that’s just a few lines long.

# Spaceship docking mechanism.

my $foo = Foo->new;
$foo->rotate(90);
$foo->engage_maglocks;

That wasn’t too bad. However, you may need a longer code sample like the one shown below which uses a lot of indentation. You really don’t want to be manually indenting this inside a Markdown file.

;; Ye Olde Merge Sort.

(define (merge pred l r)
  (letrec ((merge-aux
            (lambda (pred left right result)
              (cond ((and (null? left) (null? right))
                     (reverse result))
                    ((and (not (null? left)) (not (null? right)))
                     (if (pred (car left) (car right))
                         (merge-aux pred
                                    (cdr left)
                                    right
                                    (cons (car left) result))
                         (merge-aux pred
                                    left
                                    (cdr right)
                                    (cons (car right) result))))
                    ((not (null? left))
                     (merge-aux pred (cdr left) right (cons (car left) result)))
                    ((not (null? right))
                     (merge-aux pred left (cdr right) (cons (car right) result)))
                    (else #f)))))
    (merge-aux pred l r '())))

(define (merge-sort xs pred)
  (let loop ((xs xs)
             (result '()))
    (cond ((and (null? xs) (null? (cdr result))) (car result))
          ((null? xs) (loop result xs))
          ((null? (cdr xs))
           (loop (cdr xs) (cons (car xs) result)))
          (else
           (loop (cddr xs)
                 (cons (merge pred (first xs) (second xs)) result))))))

Code to solve the problem

An easier way to do this is to “include” the code samples from somewhere else. Then you can maintain the code samples in separate files where you will edit them with the right support for syntax highlighting and indentation from your favorite code $EDITOR.

The particular inline syntax I’ve settled on for this is as follows:

{include code_dir/YourFile.java}

Where code_dir is a directory containing all of your code samples, and YourFile.java is some random Java source file in that directory. (It doesn’t have to be Java, it could be any language.)

The include syntax is not that important. What’s important is that we can easily maintain our code and text separately. We can edit Markdown in a Markdown-aware editor, and code in a code-aware editor.

Then we can build a “final” version of the Markdown file which includes the properly formatted code samples. One way to do it is with this shell redirection (see below for the source of the expand_markdown_includes script):

$ expand_markdown_includes your-markdown-file.md.in > your-markdown-file.md

This assumes you use the convention that your not-quite-Markdown files (the ones with the {include *} syntax described here) use the extension .md.in.

Another nice thing about this method is that you can automate the “include and build” step using a workflow like the one described in Best. Markdown. Writing. Setup. Ever.

Finally, here is the source of the expand_markdown_includes script. The script itself is not that important. It could be improved in any number of ways. Furthermore, because it’s so trivial, you can rewrite it in your favorite language.

#!/usr/bin/env perl

use strict;
use warnings;
use File::Basename;
use File::Slurp qw< slurp >;

my $input_file = shift;
my $input_pathname_directory = dirname( $input_file );

my @input_lines = slurp( $input_file );

# Match {include path}; \w covers upper- and lowercase letters, digits, and "_".
my $include_pat = qr!\{include ([/.\w-]+)\}!;

for my $line ( @input_lines ) {
  if ( $line =~ /$include_pat/ ) {
    # Replace the include line with the file's contents, indented four
    # spaces so that Markdown renders them as a code block.
    my $include_pathname = $1;
    my $program_file = build_full_pathname($input_pathname_directory,
                                           $include_pathname);
    my @program_text = slurp( $program_file );
    for my $program_line ( @program_text ) {
      printf( "    %s", $program_line );
    }
  }
  else {
    # Ordinary line: pass it through unchanged.
    print $line;
  }
}

sub build_full_pathname {
  my ($dir, $file) = @_;
  return $dir . '/' . $file;
}
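
Since the script is simple enough to rewrite in any language, here is a rough Python equivalent as a minimal sketch. The function name expand_includes and the exact character class in the pattern are my own assumptions for illustration; the behavior (replace each include line with the file's contents, indented four spaces) follows the Perl version.

```python
import re
from pathlib import Path

# Same directive as the Perl script: {include relative/path/to/file}.
# The character class is an assumption; widen it if your paths need more.
INCLUDE_PAT = re.compile(r"\{include ([/.\w-]+)\}")


def expand_includes(text: str, base_dir: Path) -> str:
    """Return text with each include line replaced by the indented file."""
    out = []
    for line in text.splitlines(keepends=True):
        match = INCLUDE_PAT.search(line)
        if match is None:
            out.append(line)  # ordinary line: pass through unchanged
            continue
        included = (base_dir / match.group(1)).read_text()
        # Four leading spaces make Markdown treat the lines as a code block.
        out.extend("    " + code_line
                   for code_line in included.splitlines(keepends=True))
    return "".join(out)
```

To use it, read a .md.in file, call expand_includes with the file's text and its parent directory, and write the result to the matching .md file.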

Footnotes:

[1] While I was writing this I decided to do a bit of web searching and I discovered this interesting Stack Overflow thread that mentions a number of different tools that solve this problem. However, I rather like mine (of course!) since it doesn’t require any particular Markdown implementation, just the small preprocessing script presented here.

(Image courtesy Claudia Mont under a Creative Commons license.)


Why Markdown is not my favorite text markup language

There are many text markup languages that purport to allow you to write in a simple markup format and publish to the web. Markdown has arguably emerged as the “king” of these formats. I quite like it myself when it’s used for writing short documents with relatively simple formatting needs. However, it falls a bit short when you start to do more elaborate work. This is especially the case when you are trying to do any kind of “serious” technical authoring.

I know that “Markdown” has been used to write technical books. Game Programming Patterns is one excellent example; you can read more about the author’s use of Markdown here, and the script he uses to extend Markdown to meet his needs is here. (I recommend reading all of his essays about how he wrote the book, by the way. They’re truly inspiring.) Based on that author’s experience (and some of my own), I know that Markdown can absolutely be used as a base upon which to build ebooks, websites, wikis, and more. However, this is exactly why I used the term “Markdown” in quotes at the beginning of this paragraph. By the time you’ve extended Markdown to cover your more featureful technical authoring use cases, it really isn’t “just” Markdown anymore. This is fine if you just want to get something done quickly that meets your own needs, but it’s not ideal if you want to work with a meaningful system that can be standardized and built on.

Below I’ll address just a few of the needs of “industrial” technical writing (the kind that I do, ostensibly) where Markdown falls a little short. Lest this come off as too negative, it’s worth stating for the record that a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing. I have turned to such a homebrewed setup myself in times of need. I’ve even written about how awesome writing in Markdown can be. However, this essay is an attempt to capture my thoughts on Markdown’s shortcomings. Like any good internet crank, I reserve the right to pull a Nickieben Bourbaki at a later date.

I. No native table support

If you are doing any kind of large-scale tech docs, you need tables. Although constraints are always good, and a simple list can probably replace 80% of your table usage if you’re disciplined, there are times when you really just need a big honkin’ table. And as much as I’m used to editing raw XML and HTML directly in Emacs using its excellent tooling to completely sidestep the unwanted “upgrade” to the Confluence editor at $WORK, most writers probably don’t want to be authoring tables directly in HTML (which is the “native” Markdown solution).
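
If you do end up embedding HTML tables, one way to take some of the sting out is to generate the markup from plain rows. The following is a minimal sketch in Python under that assumption; html_table is a hypothetical helper name, not part of any Markdown tool.

```python
from html import escape


def html_table(header: list[str], rows: list[list[str]]) -> str:
    """Render a header and rows as an HTML table for pasting into Markdown."""
    lines = ["<table>", "  <tr>"]
    lines += [f"    <th>{escape(cell)}</th>" for cell in header]
    lines.append("  </tr>")
    for row in rows:
        lines.append("  <tr>")
        lines += [f"    <td>{escape(cell)}</td>" for cell in row]
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

The opening and closing table tags are emitted unindented because the original Markdown syntax expects block-level HTML to start at the left margin, separated from the surrounding text by blank lines.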

II. No native table of contents support

Yes, I can write a script myself to do this. I can also use one of the dozens of such scripts written by others. However, I’d rather have something built in, and I consider its absence a weakness of the format.
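
For a sense of what such a script involves, here is a minimal sketch in Python. It only handles “#”-style headings, and the anchor scheme in slugify is an assumption: plain Markdown does not generate heading ids, so the links only work if your renderer happens to produce matching ones.

```python
import re

# "#"-style headings only ("## Title"); underlined headings are ignored here.
HEADING_PAT = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Hypothetical anchor scheme: lowercase, hyphen-separated words."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def table_of_contents(markdown: str) -> str:
    """Return a nested Markdown list linking to each heading."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # skip "#" comments inside code fences
            continue
        match = None if in_fence else HEADING_PAT.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            indent = "    " * (level - 1)
            entries.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(entries)
```

Even this toy version has to make decisions about fences, nesting, and anchors, and different scripts are free to decide them differently.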

III. Forcing the user to fall back to inline HTML is not really OK

Beyond tables, there are a number of other formatting and layout use cases that Markdown can’t handle natively. As with tables, you must resort to just slapping in some raw HTML. Two reasons why this isn’t so amazing are:

  • It’s hard for an editor to support well, since “regular” text markup and tag-based markup languages are quite different beasts to edit
  • It punts complexity to thousands of users in order to preserve implementation simplicity for a small number of implementors

I can sympathize with the reasoning behind this design decision, since I am usually the guy making his own little hacks that meet simple use cases, but again: not really OK for serious work.

IV. Too many different ways to express the same formatting

This has led to a number of incompatibilities among the different “Markdown” renderers out there. Just a few of the areas where ambiguity exists are: headers, lists, code sections, and links. For an introduction to Markdown’s flexible semantics, see the original syntax docs. Then, for a more elaborate description of the inconsistencies and challenges of rendering Markdown properly, see Why is a spec needed?, written by the CommonMark folks.

V. Too many incompatible flavors

There are too many incompatible flavors of Markdown that each render a document slightly differently. For a good description of the ways different Markdown implementations diverge, see the Babelmark 2 FAQ.

The “incompatible flavors” issue will hopefully be addressed with the advent of the CommonMark Standard, but if you read the spec it doesn’t address points I, II, or III at all. This makes sense from the perspective of the author of a standards document: a spec isn’t very useful unless you can achieve consensus and adoption among all the slightly different implementations out there right now, and Markdown as commonly understood doesn’t try to support those cases anyway.

VI. No native means of validation

There will of course be a reference implementation and tests for CommonMark, which will ensure that the content is valid Markdown, but for large-scale documentation deployments, you really need the ability to validate that the documentation sets you’re publishing have certain properties. These properties might include, but aren’t limited to:

  • “Do all of the links have valid targets?”
  • “Is every page reachable from some other page?”

Markdown doesn’t care about this. And to be fair it never said it would! You are of course free to use other tools to perform all of the validations you care about on the resulting HTML output. This isn’t necessarily so bad (in fact it’s not as bad as points I and II in my opinion, since those actually affect you while you’re authoring), but it’s an issue to be aware of.
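
As a rough illustration of checking the HTML output, here is a minimal sketch in Python using only the standard library. It assumes a flat doc set held in a dict that maps page names such as index.html to their HTML, and it treats “reachable” as “reachable from a chosen start page”, which is a slightly stricter reading of the second question above. The names internal_links and check_site are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [value for name, value in attrs
                           if name == "href" and value]


def internal_links(html: str) -> list[str]:
    """Return hrefs that point at other pages in the same doc set."""
    collector = LinkCollector()
    collector.feed(html)
    # Assumption: anything with a scheme or a leading "#" is out of scope.
    return [h.split("#")[0] for h in collector.hrefs
            if "://" not in h and not h.startswith(("#", "mailto:"))]


def check_site(pages: dict[str, str], start: str):
    """Return (broken links, pages unreachable from start)."""
    broken = [(name, href) for name, html in pages.items()
              for href in internal_links(html) if href not in pages]
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first walk over the link graph
        for href in internal_links(pages[queue.popleft()]):
            if href in pages and href not in seen:
                seen.add(href)
                queue.append(href)
    return broken, sorted(set(pages) - seen)
```

A real doc set would also need to resolve relative paths across directories and check fragment targets, which this sketch deliberately skips.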

This is one area where XML has some neat tooling and properties, although I suppose you could do something workable with a strict subset of HTML. You could also use pandoc to generate XML, which you then validate according to your needs.

Conclusion

Markdown solves its original use case well, while punting on many others in classic Worse is Better fashion. To be fair to Markdown, it was never purported to be anything other than a simple set of formatting conventions for web writing. And it’s worth saying once more that, even given its limitations, a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing.

Even so, I hope I’ve presented an argument for why Markdown is not ideal for large-scale technical documentation work.

(Image courtesy Gerwin Sturm under a Creative Commons license.)

Best. Markdown. Writing. Setup. Ever.

../img/markdown-emacs-compilation.png

When writing in a source format such as Markdown, it’s nice to be able to see your changes show up automatically in the output. One of my favorite ways to work is to have Emacs and Firefox open side by side (as shown above). Whenever I save my Markdown file, I want Emacs to automatically build a new HTML file from it, and I want Firefox to automatically refresh to show the latest changes.

Once you have this set up, all you have to do is write and save, write and save.

As it happens, fellow Redditor goodevilgenius was looking to accomplish just this workflow. I originally posted this answer on Reddit, but I’m reposting it here in the hope that it will help some kindly internet stranger someday.

I have this exact use case. I use compile-on-save mode and the Firefox Auto Reload extension.

So in a Markdown buffer (once you’ve installed compile-on-save mode):

M-x compile-on-save-mode RET
M-x compile RET markdown current-file.md > /tmp/current-file.html
Open /tmp/current-file.html in Firefox.
Write stuff and save. Emacs will auto-compile the Markdown, and Firefox will instantly auto-reload the HTML file.

With Emacs and Firefox open side-by-side, I find it pretty easy to enter a “flow” state, since all you have to do is write and save the file. Hope that helps!

The Emacs-savvy reader will note that this workflow isn’t confined to Markdown. For example, compile-on-save mode could kick off an XML doc build (or any other computation you like, for that matter).
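
For readers who don’t use Emacs, the rebuild half of this workflow can be approximated outside the editor. The following is a minimal sketch in Python that polls the source file and reruns the same markdown command when it changes. It is a stand-in for compile-on-save mode, not how that mode works, and the function names and the half-second interval are assumptions.

```python
import subprocess
import time
from pathlib import Path


def is_stale(source: Path, output: Path) -> bool:
    """True when output is missing or older than source."""
    return (not output.exists()
            or output.stat().st_mtime < source.stat().st_mtime)


def build(source: Path, output: Path) -> None:
    """Run the `markdown` command and write its stdout to output."""
    with output.open("w") as html:
        subprocess.run(["markdown", str(source)], stdout=html, check=True)


def watch(source: Path, output: Path, interval: float = 0.5) -> None:
    """Poll forever, rebuilding whenever the source is newer."""
    while True:
        if is_stale(source, output):
            build(source, output)
        time.sleep(interval)
```

Calling watch(Path("current-file.md"), Path("/tmp/current-file.html")) in a terminal, with the browser auto-reloading the HTML file, gives roughly the same write-and-save loop.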