reStructuredText for blog post formatting

This post documents my transition from Textile to reStructuredText, with Pygments for source code highlighting.

Leaving Textile

When I got tired of banging out HTML code by hand for my blog posts, I found Textile as a friendlier alternative.

However, I'm finally fed up with Textile, for several reasons:

No implementation does exactly what I want, so tweaking is essential. But Textile implementations were not designed for tweaking, and making them fit your needs is a painful experience.
Since I've been into Python lately, I've been using pytextile, which turned out to be a particularly bad implementation [1].
The source code formatting (in <pre> blocks) of the Textile processors kept clashing with WordPress.

Looking for a better solution, I ran into reStructuredText, which is part of the docutils package.

reStructuredText (reST)

reStructuredText has a few immediate benefits over Textile:

It is being developed very actively. A few busy mailing lists are always a good sign of healthy development activity.
The main implementation is in Python.
reStructuredText is considered a quasi-standard tool in the Python world, and is used to format docstrings and even PEPs.
Its architecture is designed to be hackable and extensible from the ground up, and the documentation is very extensive and detailed.
reStructuredText is suitable for more complex tasks than simple formatting. It can be used to format whole documents, with hyperlinked sections and a table of contents. The docutils developers certainly "eat their own dog food": the whole stack of documentation (and there's a lot of it) is formatted with reStructuredText.
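
To make the last point a bit more concrete, here is a minimal sketch of a two-section document with a generated table of contents. It assumes docutils is installed; the sample text and the names SAMPLE and toc_html are made up for illustration.

```python
from docutils.core import publish_parts

# A made-up reST document. The "contents" directive asks docutils to
# generate a table of contents that links to each section.
SAMPLE = """\
.. contents::

First section
=============

Some text.

Second section
==============

More text.
"""

# publish_parts returns a dict of document fragments; 'body' holds the
# rendered document body without the <html>/<head> wrapper.
parts = publish_parts(
    source=SAMPLE,
    writer_name='html',
    # keep the section titles as sections instead of a document title
    settings_overrides={'doctitle_xform': False},
)
toc_html = parts['body']
print(toc_html)
```

The printed HTML should contain a contents block whose entries link to the two section anchors.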

Installing reST

Installation was a snap. I downloaded docutils, followed the installation instructions, and was up and running in 2 minutes. docutils installs a few useful scripts into Python's scripts directory, and these can be used to turn text into various formats: HTML, XML, LaTeX, etc.

In principle, reST is similar to Textile, and learning it was very easy. It took me less than an hour to whip up a sample document that contains all the types of formatting I ever use for my blog posts. From a cursory glance, reST seems more powerful than Textile in several ways, providing more options. It is a tad less lightweight [2], but I think that's for a good reason: Textile's lightness is the cause of the poor quality of the parsers written for it.

The only problem I had with reST is its construct for formatting source code. It's quite easy to use (end a paragraph with :: and indent the following block of text, and it will be placed in <pre> tags), but it wouldn't be easy to connect it with the wp-syntax WordPress plugin I'm using to highlight code in my blog.
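
For reference, this is roughly what the built-in construct looks like when run through docutils from Python. It is only a sketch, assuming docutils is installed; the sample text and variable names are invented.

```python
from docutils.core import publish_parts

# In reST, a paragraph ending in "::" followed by an indented block
# marks that block as literal text.
SAMPLE = """\
Some code::

    def hello():
        return 42
"""

# 'body' is the rendered HTML body without the page header and footer
body = publish_parts(source=SAMPLE, writer_name='html')['body']
print(body)
```

The literal block comes out wrapped in a <pre> element with no highlighting at all, which is the part I wanted to improve.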

So I decided to give Pygments a try.

Pygments

Pygments is a Python library for source code highlighting. It is widely used [3] and respected and, best of all, connects easily to reST. After installing Pygments (just downloading it from its website and following the instructions), I modified the supplied external/rst-directive.py script for my needs and created a generic "runner script". It is called with a text file as an argument and creates from it an HTML file, formatted with reST and with Pygments syntax highlighting hooked to the sourcecode directive.
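
Before wiring Pygments into reST, it may help to see what it does on its own. The following is a minimal sketch, assuming Pygments is installed; the snippet being highlighted is made up.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# A made-up snippet to highlight
code = 'def add(a, b):\n    return a + b\n'

# Look the lexer up by its short name, as the directive does later on
lexer = get_lexer_by_name('python')

# noclasses=True emits inline style attributes instead of CSS classes,
# so the HTML can be pasted into a post without a separate stylesheet
formatter = HtmlFormatter(noclasses=True)

highlighted = highlight(code, lexer, formatter)
print(highlighted)
```

Inline styles are convenient for a blog because the post carries its own colors and doesn't depend on the theme's CSS.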

Here's the code of the runner script, together with my custom style class for Pygments:

# A 'runner' for HTML output
# Accepts the input file and output file names as command line
# arguments. Loads docutils and pygments and runs the formatter.
#
# Based on:
#  rst2html - from the docutils distribution
#  external/rst-directive.py - from the pygments distribution
#
# This code is in the public domain
# Eli Bendersky
#

import locale
try:
    # Use the user's default locale, if one is configured
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    pass

##
## Configuring Pygments
##
from pygments.formatters import HtmlFormatter
from pygments import highlight
from pygments.lexers import get_lexer_by_name, TextLexer

from pygments.style import Style
from pygments.token import Keyword, Name, Comment, String, Error, \
     Number, Operator, Generic, Whitespace, Text

class SciteStyle(Style):
    default_style = ""

    styles = {
        Whitespace:                 '#bbbbbb',
        Text:                       '#000000',

        Comment:                    '#007f00',

        Keyword:                    'bold #00007f',

        Operator.Word:              '#0000aa',

        Name.Builtin:               '#00007f',
        Name.Function:              '#00007f',
        Name.Class:                 '#00007f',
        Name.Namespace:             '#00007f',

        String:                     '#7f007f',

        Number:                     '#007f7f',

        Generic:                    '#000000',
        Generic.Heading:            'bold #000080',
        Generic.Subheading:         'bold #800080',
        Generic.Deleted:            '#aa0000',
        Generic.Inserted:           '#00aa00',
        Generic.Error:              '#aa0000',
        Generic.Emph:               'italic',
        Generic.Strong:             'bold',
        Generic.Prompt:             '#555555',
        Generic.Output:             '#888888',
        Generic.Traceback:          '#aa0000',

        Error:                      '#F00 bg:#FAA'
    }

# Set to True if you want inline CSS styles instead of classes
inlinestyles = True

# The default formatter
DEFAULT = HtmlFormatter(noclasses=inlinestyles, linenos=False, style=SciteStyle)

# Add name -> formatter pairs for every variant you want to use
VARIANTS = {
     'linenos': HtmlFormatter(noclasses=inlinestyles, linenos=True, style=SciteStyle)
}


def pygments_directive(name, arguments, options, content, lineno,
                       content_offset, block_text, state, state_machine):
    """ Will process the highlighted source-code directive.
    """
    try:
        lexer = get_lexer_by_name(arguments[0])
    except ValueError:
        # no lexer found - use the text one instead of an exception
        lexer = TextLexer()
    # take an arbitrary option if more than one is given
    formatter = VARIANTS[next(iter(options))] if options else DEFAULT
    parsed = highlight(u'\n'.join(content), lexer, formatter)
    return [nodes.raw('', parsed, format='html')]


##
## Loading docutils and registering the new directive
##
from docutils import nodes, io
from docutils.parsers.rst import directives
import docutils.core

pygments_directive.arguments = (1, 0, 1)
pygments_directive.content = 1
pygments_directive.options = dict([(key, directives.flag) for key in VARIANTS])
directives.register_directive('sourcecode', pygments_directive)


##
## Execution
##
import os, sys

infile = sys.argv[1]
outfile = os.path.splitext(infile)[0] + ".html"

print("Running HTML writer:\n-> %s" % outfile)

# Running publish_parts to get at the document body, without
# header, style specifications and footer
#
parts = docutils.core.publish_parts(
            source=open(infile, 'r'),
            source_class=io.FileInput,
            settings_overrides = {
                'doctitle_xform': 0,
                'initial_header_level': 3},
            writer_name='html')

# Close the output file explicitly once the body has been written
with open(outfile, 'w') as out:
    out.write(parts['body'])
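
The directive above uses the function-based interface. docutils also documents a class-based interface for directives, and the following is a rough sketch of how the same idea might look with it, not a drop-in replacement for my script. The class name SourceCodeDirective and the sample input are made up, and the sketch uses the default Pygments style rather than my SciteStyle.

```python
from docutils import nodes
from docutils.core import publish_parts
from docutils.parsers.rst import Directive, directives
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name
from pygments.util import ClassNotFound


class SourceCodeDirective(Directive):
    """Hypothetical class-based take on the sourcecode directive."""
    required_arguments = 1            # the language name
    optional_arguments = 0
    final_argument_whitespace = True
    has_content = True
    option_spec = {'linenos': directives.flag}

    def run(self):
        try:
            lexer = get_lexer_by_name(self.arguments[0])
        except ClassNotFound:
            # unknown language: fall back to plain text
            lexer = TextLexer()
        formatter = HtmlFormatter(noclasses=True,
                                  linenos='linenos' in self.options)
        parsed = highlight('\n'.join(self.content), lexer, formatter)
        # a raw node passes the generated HTML through untouched
        return [nodes.raw('', parsed, format='html')]


directives.register_directive('sourcecode', SourceCodeDirective)

# Made-up input, the way a post might use the directive
SAMPLE = """\
Here is some code:

.. sourcecode:: python
    :linenos:

    def add(a, b):
        return a + b
"""

body = publish_parts(source=SAMPLE, writer_name='html')['body']
print(body)
```

In the input file, the directive takes the language name as its argument, and the optional linenos flag turns on line numbers.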

Conclusion

So, now I'm a happy user of reStructuredText with Pygments. I must say that the transition to reST has been a very pleasant one. The docutils library is very well designed and documented - such libraries are a pleasure to work with.

This post was written in the new setup with reST, of course.

[1]	I'm not usually into bashing other people's open-source work, but I do believe that some constructive criticism is appropriate:

pytextile is very buggy.
It doesn't seem to be maintained - the authors don't respond to issues opened in the bug tracker.
The code is designed in a very customization-averse manner. Even fixing bugs is difficult, not to mention adding features.

[2]	Lightweight in the sense of formatting: some basic formatting constructs take a couple of extra keystrokes in reST. Overall, it's not too bad and very easy to get used to.

[3]	Used by, among others, Trac, ActiveState Code, and several popular pastebins.