Documenting the Difficulties of Documentation

2011-12-11

Programming , Technology

documentation , OpenSource , Scala

Over the years i’ve been involved in a wide-range of diverse open source projects. Some large. Some small. Some very obscure. But every single one has at one point or another, had a “problem” with documentation. Many of you reading this will no doubt have had a similar experience at some point in your programming career. Perhaps you were a project creator wondering how best to communicate the inherent awesome of something you’ve just created, or perhaps you were that eager n00b trying to get to grips with some new technology you came across on github - in either case, you have to create or consume “documentation”.

Now then, before we go any further I do have somewhat of a problem with the overloaded meaning of the term “documentation”. The widely accepted definition of documentation is the following:

Doc·u·men·ta·tion noun - Material that provides official information or evidence or that serves as a record.

Evidence that serves as a record? That sounds awfully vague doesn’t it. I’d like to suggest that this is, to all intent purposes, too vague. More specifically it is often an ambiguous umbrella term for what people are actually referring too. It’s this ambiguity that causes all manner of problems for software projects and their (would-be)users.

Disambiguating Documentation Components

There are many excellent projects in the Scala ecosystem, and many suffer this apparent “lack of documentation”. Curiously though, each project seems to suffer this affliction in its own specific way… at least, this is often what you might witness on the mailing-list of any project you might care to pick at random. Most projects have some thread or other in the archives of their mailing lists where someone has complained about their documentation, or distinct lack of whatever it is they were looking for.

With this in mind, i’d like to now illustrate what I think are the key aspects of the wider “documentation problem”, and disambiguate their respective intentions.

Tutorials and Introductions

The first thrust of documentation i’d like to define is that all-important introductory material people will need when coming to your project in the first instance. This is quite probably the most difficult type of text to write, particularly if the subject matter is highly technical in nature. It’s typically difficult for the following three reasons:

Assumptions - Assuming the reader fully understands a topic, line of code, operator or anything else must be one of the most common issues. In introductory texts and tutorials its highly likely that the reader will not be in possession of the implicit knowledge that’s relevant to properly grok the topic at hand.
Accessibility - “Joe Developer” can initially be easily scared by large words, complex-sounding terms that originate from academic theories, and terse writing styles. It is so vitally important that your tutorials and introduction texts are accessible. For the most part this may well mean that you have to sweep over some of the finer details, or forego some more abstract possibilities in order to effectively get the point across to newcomers. Ironically, it is often this simplification or frivolity with the facts that programmers struggle with when taking up the pen (ok - keyboard, but you know what I mean).
Authorship - Writing is hard. Be honest with yourself and recognise that you may not be any good at it. During the writing of Lift in Action I had to throw away a whole bunch of manuscript which was either too technical or just plain rubbish. Having a professional team of editors who could help me (re)learn about writing was really key… but i’m aware that this is hardly practical for the common case. The fact is that most of us have not written long documents since high-school or college, and it is incredibly difficult. Don’t forget this when writing your project introduction and tutorials - if you can’t do it the proper justice it deserves, then embrace the fact that us humans all have different skills and find someone who can plug the required gaps. Having your texts reviewed honestly by your peers is also another useful strategy to ensure what you’re writing is actually any good.

Examples and Explanations

The second branch of the documentation umbrella is somewhat of an extension to the first, but I decided to make it separate because its use-case feels distinctly different. With this in mind i’ll add the caveat that yes, examples often form parts of tutorials (and later, references), but in and of themselves you wouldn’t use verbose introductory-style writing within an example. In my mind at least, the primary difference is that example/explanatory documentation is tightly coupled with the code to which it relates. That is to say that the text is more often than not sitting next to the code itself in the comments, which usually means there are certain conventions to follow with regard to syntax and so forth (e.g. ScalaDoc, Wiki markup etc). Critically though, the tone of voice in the explanatory text when compared to that of the introductory & tutorial manuscripts is much shorter, more concise and really focusing on the line-by-line, blow-by-blow goings on of the code. More generally, its reasonable to suggest that examples are about illustrating concepts, and this is where the writing style really differentiates itself from the other branches listed in this article.

Finally, if you’re going to go to the trouble of writing examples and explanations, make sure the code actually works! Nothing is more frustrating to find an example that simply does not do what it should because the code either doesn’t compile or doesn’t run correctly.

Reference materials

Reference material is your last line of defence before forcing users to delve into the source code. Consider the type of person might be using a reference, or what their goal might be? I’d propose that when someone is looking at a reference, they know what they are looking for: they need something specific. It’s probably also fair to assume that before arriving at the reference they will have read tutorials and examples, so there is a degree of implied understanding. Reference materials are usually heavy on details and light on fluffy writing style, which allows the author to be far more technical, and satisfy the aforementioned need for presenting the exact facts.

Unlike the other aspects of documentation writing, references can have the tendency to become quite large; even for mid-sized projects. With this in mind you should take care to refactor the organisation and layout of the reference with each major change and strives for a reference that is logically ordered and consistent throughout. If your reference is massive, then you should seriously consider having a decent search function in addition to a logical layout.

Source Code

Your last line of documentation defence is the source code. That might seem odd, as i’m not talking about comments or ScalaDoc (or similar in your language of choice). I’m talking about types (sorry dynamically typed people!). Type annotations can be extremely useful when reading code and concisely communicating the result or intention of a particular item. With Scala for example, consider these two lines:

// actual code irrelevant
val foo = whatever.map(...).flatMap(...).foldLeft(...).map(...)
val foo: Option[Int] = whatever.map(...).flatMap(...).foldLeft(...).map(...)

Having the simple type annotation frees me from having to mull over the code in order to understand its result; its right there in the type annotation. When moving code between teams, or people, having the ability to simple scan complex blocks of code and understand it is a huge win (IMHO). Sure, make use of type inference where the value is obvious (e.g. simple assignment etc), but where you think something might not be directly understood explicitly annotating can serve as effective documentation.

…In any case, writing readable and well documented code could easily be the subject of a whole other blog post (or indeed, academic paper), so we’ll put a pin in that subject and move swiftly along…

In my mind at least, when your general users complain about documentation, they are typically complaining about one of first three branches of this documentation umbrella.

Isn’t all this a lot of work?

I won’t lie to you, good reader: this will take a lot of time and dedication to do. To do it well, will take more time and a fuckton more effort. However, if you want your software or project to be used by people other than yourself, then it is imperative the documentation is well structured, and exhibiting some - if not most - of the traits in this article. It’s also important that you realise that it will, in all likelihood, be “expensive” in terms of time; this is nearly unavoidable, but it will make your project more approachable and more usable.

Interestingly one thing that you often see are projects that try to distill the documentation effort by promoting community authorship, which when considering what a social activity programming has become in recent times, does not seem like such a crazy idea. The reality however is somewhat different to the ideal: people are often keen to submit bugs and patches, but those same keen people are still typically reluctant to contribute documentation. One could speculate that writing documentation was too tedious, or that perhaps it was not as fun as doing the coding and people didn’t want to spend time in that area… the reason is actually irrelevant, as the result is always the same: without a small core of dedicated people who write, maintain and constantly improve that wiki the whole documentation effort will fail. To qualify that, i’m not saying that community-powered documentation never works, as that clearly isn’t the case. I am however suggesting that by-and-large for most communities it simply does not operate effectively, and this has an overall negative impact on the project as a whole.

Who’s doing it right?

I’m not going to gratify this article with pointing out projects that are “doing it wrong”, as frankly most projects are making a hash of their documentation; irrespective of language or community. I do however want to highlight a couple of projects that are setting an excellent example:

Akka - The Akka team are making a superb job of documenting their project, even with extensive changes to the codebase they are very effective when it comes to ensuring the docs are up-to-date and covering all new or refactored features. The documentation is nearly exclusively maintained by the core team of programmers.
JQuery - Very different to Akka, and in a different community, JQuery has been very effective in delivering core reference materials that allow (and encourage) the wider community to write tutorials, introductions and other helpful articles. JQuery also makes extensive use of illustrative examples, and its good documentation is probably one of the reasons for its apparent ubiquity on the web today.

Both of these projects exhibit dedication on the part of the coders, who are typically the ones authoring the core reference materials and bulk of the ancillary texts. Learn from these projects and others like them. We can all do a better job of documenting our projects, myself included. Say no to undocumented projects, and the next time you’re throwing something on github, take the time to write some documentation… even if its a long README people will thank you for it.