reStructuredText for blog post formatting

August 24th, 2008 at 7:58 pm

This post documents my transition from Textile to reStructuredText, with Pygments for source code highlighting.

Leaving Textile

When I got tired of banging out HTML code for my blog posts, I found Textile to be a friendlier solution.

However, I’m finally fed up with Textile, for several reasons:

  1. No implementation does exactly what I want, and tweaking is essential. But Textile implementations were not designed for tweaking, so making them fit your needs is a painful experience.
  2. Since I’ve lately been into Python, I’ve been using pytextile, which turned out to be a particularly bad implementation [1].
  3. The source code formatting (in <pre> blocks) of the Textile processors kept clashing with WordPress.

And, looking for a better solution, I ran into reStructuredText, which is part of the docutils package.

reStructuredText (reST)

reStructuredText has a few immediate benefits over Textile:

  1. It is being developed very actively. A few busy mailing lists are always a good sign of healthy development activity.
  2. The main implementation is in Python.
  3. reStructuredText is considered a quasi-standard tool in the Python world, and is used to format docstrings and even PEPs.
  4. Its architecture is designed to be hackable and extensible from the ground up, and the documentation is very extensive and detailed.
  5. reStructuredText is suitable for more complex tasks than simple formatting. It can be used to format whole documents, with hyperlinked sections and a table of contents. They certainly "eat their own dog food": the whole stack of documentation (and there’s a lot of it) is formatted with reStructuredText.
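
To illustrate the table-of-contents point, here is a minimal sketch, assuming Python 3 and a reasonably recent docutils. The sample document and the `render_body` helper are made up for this example; `publish_parts` and its "body" part are docutils' own. The `contents` directive turns the section titles into a list of links pointing at each section's automatically generated id.

```python
from docutils.core import publish_parts

# Illustrative document: a "contents" directive followed by two sections.
SOURCE = """\
.. contents::

First section
=============

Some text.

Second section
==============

More text.
"""


def render_body(rst_text):
    """Hypothetical helper: render reST and return only the HTML body fragment."""
    parts = publish_parts(source=rst_text, writer_name="html")
    return parts["body"]


html = render_body(SOURCE)
print(html)
```

The section ids ("first-section", "second-section") are derived from the titles, so the contents entries link to them without any extra markup.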

Installing reST

Installation was a snap. I downloaded docutils, followed the installation instructions and was up and running in 2 minutes. docutils installs a few useful scripts into Python’s scripts directory, and these can be used to turn text into various formats: HTML, XML, LaTeX, etc.
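
The scripts (rst2html, rst2xml, rst2latex and friends) call `docutils.core.publish_cmdline` with a particular writer, so the same conversions can also be driven from Python. A minimal sketch, assuming Python 3 and a reasonably recent docutils; the writer names are docutils' own, while the sample text is made up:

```python
from docutils.core import publish_string

SOURCE = "Hello, *reST* world.\n"

outputs = {}
# "html", "xml" and "latex" are the writers used by rst2html, rst2xml and rst2latex.
for writer in ("html", "xml", "latex"):
    outputs[writer] = publish_string(
        source=SOURCE,
        writer_name=writer,
        # "unicode" makes publish_string return a str instead of encoded bytes.
        settings_overrides={"output_encoding": "unicode"},
    )

for writer, text in outputs.items():
    print(writer, "->", len(text), "characters")
```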

In principle, reST is similar to Textile, and learning it was very easy. It took me less than an hour to whip up a sample document for myself that contains all the types of formatting I ever use in my blog posts. At a cursory glance, reST seems to be more powerful than Textile in several ways, providing more options. It is a tad less lightweight [2], but I think this is for a good purpose: Textile’s lightness is the cause of the poor quality of the parsers written for it.
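
To give a feel for the markup, here is a sketch that renders a handful of common constructs (emphasis, strong text, an inline literal, an external link and an auto-numbered footnote) into an HTML fragment. It assumes Python 3 and docutils; the sample text is illustrative only and not the sample document mentioned above.

```python
from docutils.core import publish_parts

SAMPLE = """\
Some *emphasis*, some **strong** text and a ``literal``.

A link to `Python <https://www.python.org/>`_ and a footnote [#]_.

.. [#] The footnote text.
"""

# Only the body fragment is kept, without the <html>/<head> wrapper.
body = publish_parts(source=SAMPLE, writer_name="html")["body"]
print(body)
```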

The only problem I had with reST is its construct for formatting source code. It’s quite easy to use (end a paragraph with :: and indent the block of text that follows, and it will be placed in <pre> tags), but it wouldn’t be easy to connect it with the wp-syntax WordPress plugin I’m using to highlight code in my blog.
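
One detail worth knowing: an indented block becomes a literal (<pre>) block only when the paragraph before it ends with ::. After an ordinary paragraph, reST parses the indented text as a block quote instead. A quick sketch that checks both cases, assuming Python 3 with docutils (the sample texts are made up):

```python
from docutils.core import publish_parts


def body_of(rst_text):
    """Render reST and return the HTML body fragment."""
    return publish_parts(source=rst_text, writer_name="html")["body"]


# Paragraph ending in "::" -> the indented block is a literal block.
literal_body = body_of("""\
Here is some code::

    def greet(name):
        return name
""")

# Ordinary paragraph -> the indented block is a block quote.
quote_body = body_of("""\
Here is some code:

    indented text without a double colon
""")

print(literal_body)
print(quote_body)
```

Note that the "::" itself is rendered as a single colon at the end of the paragraph.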

So I’ve decided to give Pygments a try.

Pygments

Pygments is a Python library for source code highlighting. It is widely used [3] and respected, and best of all, it connects easily to reST. After installing Pygments (just downloading it from its website and following the instructions), I modified the supplied external/rst-directive.py script for my needs and created a generic "runner script". The runner is called with a text file as an argument and creates an HTML file from it, formatted with reST and with Pygments syntax highlighting (hooked to the sourcecode directive).
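
For reference, the core of Pygments is the highlight(code, lexer, formatter) call that the runner script below also uses. A minimal sketch, assuming Python 3 and the default Pygments style (the sample code string is made up), showing the two ways HtmlFormatter can emit colors: CSS classes plus a separate stylesheet, or inline style attributes as selected by noclasses=True:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

CODE = "def square(x):\n    return x * x\n"
lexer = get_lexer_by_name("python")

# Class-based output: tokens get short CSS classes ("k" for keywords, etc.),
# and the matching stylesheet has to be shipped separately.
formatter = HtmlFormatter()
html_with_classes = highlight(CODE, lexer, formatter)
css = formatter.get_style_defs(".highlight")

# Inline output: colors are written into style="..." attributes,
# so the HTML fragment works without any stylesheet.
html_inline = highlight(CODE, lexer, HtmlFormatter(noclasses=True))

print(html_with_classes)
print(html_inline)
```

Inline styles are the convenient choice for blog posts, where adding a stylesheet to the page may not be an option; classes keep the HTML smaller when you control the CSS.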

Here’s the code of the runner script, together with my custom style class for Pygments:

# A 'runner' for HTML output
# Accepts the input file name as a command line argument; the output
# file name is derived from it by swapping the extension for .html.
# Loads docutils and pygments and runs the formatter.
#
# Based on:
#  rst2html - from the docutils distribution
#  external/rst-directive.py - from the pygments distribution
#
# This code is in the public domain
# Eli Bendersky
#

import locale
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # Keep the default "C" locale if the user's locale is unsupported
    pass

##
## Configuring Pygments
##
from pygments.formatters import HtmlFormatter
from pygments import highlight
from pygments.lexers import get_lexer_by_name, TextLexer

from pygments.style import Style
from pygments.token import Keyword, Name, Comment, String, Error, \
     Number, Operator, Generic, Whitespace, Text

class SciteStyle(Style):
    default_style = ""

    styles = {
        Whitespace:                 '#bbbbbb',
        Text:                       '#000000',

        Comment:                    '#007f00',

        Keyword:                    'bold #00007f',

        Operator.Word:              '#0000aa',

        Name.Builtin:               '#00007f',
        Name.Function:              '#00007f',
        Name.Class:                 '#00007f',
        Name.Namespace:             '#00007f',

        String:                     '#7f007f',

        Number:                     '#007f7f',

        Generic:                    '#000000',
        Generic.Heading:            'bold #000080',
        Generic.Subheading:         'bold #800080',
        Generic.Deleted:            '#aa0000',
        Generic.Inserted:           '#00aa00',
        Generic.Error:              '#aa0000',
        Generic.Emph:               'italic',
        Generic.Strong:             'bold',
        Generic.Prompt:             '#555555',
        Generic.Output:             '#888888',
        Generic.Traceback:          '#aa0000',

        Error:                      '#F00 bg:#FAA'
    }

# Set to True if you want inline CSS styles instead of classes
inlinestyles = True

# The default formatter
DEFAULT = HtmlFormatter(noclasses=inlinestyles, linenos=False, style=SciteStyle)

# Add name -> formatter pairs for every variant you want to use
VARIANTS = {
     'linenos': HtmlFormatter(noclasses=inlinestyles, linenos=True, style=SciteStyle)
}


def pygments_directive(name, arguments, options, content, lineno,
                       content_offset, block_text, state, state_machine):
    """ Will process the highlighted source-code directive.
    """
    try:
        lexer = get_lexer_by_name(arguments[0])
    except ValueError:
        # no lexer found - use the text one instead of an exception
        lexer = TextLexer()
    # take an arbitrary option if more than one is given
    formatter = options and VARIANTS[options.keys()[0]] or DEFAULT
    parsed = highlight(u'\n'.join(content), lexer, formatter)
    return [nodes.raw('', parsed, format='html')]


##
## Loading docutils and registering the new directive
##
from docutils import nodes, io
from docutils.parsers.rst import directives
import docutils.core

pygments_directive.arguments = (1, 0, 1)
pygments_directive.content = 1
pygments_directive.options = dict([(key, directives.flag) for key in VARIANTS])
directives.register_directive('sourcecode', pygments_directive)


##
## Execution
##
import os, sys

infile = sys.argv[1]
outfile = os.path.splitext(infile)[0] + ".html"

print "Running HTML writer:\n-> %s" % outfile

# Running publish_parts to get at the document body, without
# header, style specifications and footer
#
parts = docutils.core.publish_parts(
            source=open(infile, 'r'),
            source_class=io.FileInput,
            settings_overrides = {
                'doctitle_xform': 0,
                'initial_header_level': 3},
            writer_name='html')

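# parts['body'] is a unicode string; write it out as UTF-8 so that
# non-ASCII text doesn't fail on output
import codecs
outf = codecs.open(outfile, 'w', 'utf-8')
outf.write(parts['body'])
outf.close()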
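
Newer docutils releases recommend class-based directives (subclasses of docutils.parsers.rst.Directive) over the function-plus-attributes style used above. Below is a sketch of an equivalent sourcecode directive in that style, assuming Python 3 with current docutils and Pygments; to stay short it uses the default Pygments style rather than SciteStyle, and the class name SourceCode and the render helper are my own. Recent docutils versions also ship a built-in code directive (with sourcecode as an alias) that uses Pygments when available; a directive registered under the same name with register_directive should take precedence over it.

```python
from docutils import nodes
from docutils.core import publish_parts
from docutils.parsers.rst import Directive, directives
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name
from pygments.util import ClassNotFound

# Default Pygments style with inline styles (the runner above uses SciteStyle).
DEFAULT = HtmlFormatter(noclasses=True)
VARIANTS = {"linenos": HtmlFormatter(noclasses=True, linenos=True)}


class SourceCode(Directive):
    """Class-based version of the 'sourcecode' directive (class name is my own)."""

    required_arguments = 1  # the language name, e.g. "python"
    optional_arguments = 0
    final_argument_whitespace = True
    has_content = True
    # Each variant name becomes a flag option, e.g. ":linenos:".
    option_spec = {key: directives.flag for key in VARIANTS}

    def run(self):
        try:
            lexer = get_lexer_by_name(self.arguments[0])
        except ClassNotFound:
            # Unknown language: fall back to plain text instead of failing.
            lexer = TextLexer()
        # Use the first option given, if any; otherwise the default formatter.
        formatter = VARIANTS[next(iter(self.options))] if self.options else DEFAULT
        parsed = highlight("\n".join(self.content), lexer, formatter)
        return [nodes.raw("", parsed, format="html")]


directives.register_directive("sourcecode", SourceCode)


def render(rst_text):
    """Hypothetical helper: reST in, HTML body fragment out."""
    return publish_parts(source=rst_text, writer_name="html")["body"]


with_numbers = render('.. sourcecode:: python\n   :linenos:\n\n   print("hi")\n')
without_numbers = render('.. sourcecode:: python\n\n   print("hi")\n')
unknown_lang = render(".. sourcecode:: no-such-language\n\n   plain text\n")
print(with_numbers)
```

With :linenos: the Pygments output is wrapped in a two-column table (line numbers and code); without it, it is a plain highlighted <pre> block.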

Conclusion

So, now I’m a happy user of reStructuredText with Pygments. I must say that the transition to reST has been a very pleasant one. The docutils library is very well designed and documented – such libraries are a pleasure to work with.

This post was written in the new setup with reST, of course.

[1] I’m not usually into bashing other people’s open-source work, but I do believe that some constructive criticism is appropriate here:

  1. pytextile is very buggy
  2. It doesn’t seem to be maintained – the authors don’t respond to issues opened in the bug tracker
  3. The code is designed in a very customization-averse manner. Even fixing bugs is difficult, not to mention adding features.
[2] Lightweight in the sense of formatting: some basic formatting takes a couple of extra keystrokes in reST. Overall, it’s not too bad and very easy to get used to.
[3] Used by, among others, Trac, ActiveState Code and several popular pastebins.


11 Responses to “reStructuredText for blog post formatting”

  1. John Says:

    I like reST, and I think — formatting-wise — it’s a good alternative to Markdown. The reason reST is not more widely-used is because there’s no easy way for non-technical users to just have a script that turns a reST-formatted document into an html fragment.

    Markdown has this. It’s just the Markdown.pl script from daringfireball. You download it and run it. Pass it a filename and out of stdout comes plain html (but no html or head tags, nor CSS — just an html fragment).

    Someone needs to write and distribute (via the Cheeseshop, please) a simple rst2htmlfrag.py script that does what Markdown.pl does (but for reST).

    Incidentally, no, I have no idea why the bundled rst2html.py tool does not feature an “--html-fragment-only” option. (Or, if it has such an option, I’m not seeing it in the online help.)

    And, please, don’t tell me it’s trivial to write such a script. I realize it’s trivial for someone who knows a little bit of python and who’s willing to dig around the docutils docs. Maybe it’s so darn trivial that no one thinks they need to do it and so very few users consider using reST when comparing it to Markdown, Textile, asciidoc, etc.

    Eli, maybe this would be a good exercise for you if you’ve never packaged and uploaded to the cheeseshop.

  2. John Says:

    Edit: Sorry, don’t want to sound harsh or rude, I just don’t like to see a good markup format not be widely-used because people can’t easily use it.

    Edit #2: Also, it looks like the Cheeseshop is now instead called the “python package index”, or just “package index” for short.

  3. Valentin Jacquemin Says:

    Hello Eli,
    I was just wondering… Why did you get tired of writing your posts in HTML, when WordPress has a great WYSIWYG editor?

  4. eliben Says:

    John: Indeed, it seems that publishing only the body fragment without the programmatic interface is impossible.

    I don’t have PyPI experience, but from what I’ve seen the author of reST is very responsive, and you should definitely tell him the idea you have. He’ll probably add it as a command line option to the published script.

    Valentin: I don’t like WYSIWYGs :-) They’re all good and nice while they serve your need. But once there’s something you can’t do with them, that’s it, you’re done. For example, I found no normal way to conveniently post source code and footnotes with WordPress’s WYSIWYG.

  5. Vasudev Ram Says:

    Interesting post, Eli.

    Coincidentally, I had been thinking about the same issue – generating content for blog posts from some form of structured text such as Textile or Markdown – just a little while before I read this post. Though I knew about reST (reStructuredText), for some reason I hadn’t remembered that option – so it’s good that I read your post …

    With my PC config, I see an issue with the output you got – i.e. this current post:

    The code shown is truncated on the right side. I tried resizing the text (using Firefox) with Ctrl-plus and Ctrl-minus, to different sizes, but the problem stays.

    The text (English) content of the post shows fine, though.

    - Vasudev

  6. Vasudev Ram Says:

    Checked again after getting a message from Eli – there is a scroll bar at the bottom of the box, but I missed seeing it, either due to a hurry or because it isn’t visible when you’re viewing the top or middle of the code (since the box is tall).

  7. eliben Says:

    Vasudev: yes, this is a limitation of those HTML text boxes, and I don’t know if I can do much about it. You can always copy-paste the whole code and view it comfortably in an outside editor if the scrolling becomes a real problem.

    Update: I think I’ve managed to limit the height of the ‘pre’ boxes (using the max-height CSS property). I hope it works out nicely now.

  8. Vasudev Ram Says:

    Thanks, Eli. Did a quick check – looks fine now.
    Will let you know if any other issues are found.

    - Vasudev

  9. Jason Samsa Says:

    Around the time of this post I took over the pytextile project. Since taking it over, I’ve completely rewritten a port of Textile 2.0 to Python and have built a lot of tests around the code. Some issues still exist but I think the current implementation (2.1.3) is much improved over past versions. It doesn’t have some of the non-textile features that existed in previous versions, but it does core textile much better.

    It would be nice to allow pluggable extensions so that advanced users can customize pytextile easily. Perhaps that will be a feature I build this year.

  10. eliben Says:

    Hi Jason, thanks for commenting!

    I’m glad to hear that Textile has moved forward – it’s a real shame it was neglected for so long.

    Personally, I’m very happy with reST now, and am using it for all my formatting needs.

  11. Somebody Says:

    Thanks for sharing.
    If you want to test this with Python 3, get “distribute” as an “easy_install” replacement for Python 3, then install “pygments” and “docutils”. Docutils currently has a reported bug with a documented workaround, which is included in this “2to3” conversion:

    # http://eli.thegreenplace.net/2008/08/24/restructuredtext-for-blog-post-formatting/
    # A 'runner' for HTML output
    # Accepts the input file name as a command line argument; the output
    # file name is derived from it by swapping the extension for .html.
    # Loads docutils and pygments and runs the formatter.
    #
    # Based on:
    #  rst2html - from the docutils distribution
    #  external/rst-directive.py - from the pygments distribution
    #
    # This code is in the public domain
    # Eli Bendersky
    #
    
    try:
        import locale
        locale.setlocale(locale.LC_ALL, '')
    except:
        pass
    
    ##
    ## Configuring Pygments
    ##
    from pygments.formatters import HtmlFormatter
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name, TextLexer
    
    from pygments.style import Style
    from pygments.token import Keyword, Name, Comment, String, Error, \
         Number, Operator, Generic, Whitespace, Text
    
    class SciteStyle(Style):
        default_style = ""
    
        styles = {
            Whitespace:                 '#bbbbbb',
            Text:                       '#000000',
    
            Comment:                    '#007f00',
    
            Keyword:                    'bold #00007f',
    
            Operator.Word:              '#0000aa',
    
            Name.Builtin:               '#00007f',
            Name.Function:              '#00007f',
            Name.Class:                 '#00007f',
            Name.Namespace:             '#00007f',
    
            String:                     '#7f007f',
    
            Number:                     '#007f7f',
    
            Generic:                    '#000000',
            Generic.Heading:            'bold #000080',
            Generic.Subheading:         'bold #800080',
            Generic.Deleted:            '#aa0000',
            Generic.Inserted:           '#00aa00',
            Generic.Error:              '#aa0000',
            Generic.Emph:               'italic',
            Generic.Strong:             'bold',
            Generic.Prompt:             '#555555',
            Generic.Output:             '#888888',
            Generic.Traceback:          '#aa0000',
    
            Error:                      '#F00 bg:#FAA'
        }
    
    # Set to True if you want inline CSS styles instead of classes
    inlinestyles = True
    
    # The default formatter
    DEFAULT = HtmlFormatter(noclasses=inlinestyles, linenos=False, style=SciteStyle)
    
    # Add name -> formatter pairs for every variant you want to use
    VARIANTS = {
         'linenos': HtmlFormatter(noclasses=inlinestyles, linenos=True, style=SciteStyle)
    }
    
    def pygments_directive(name, arguments, options, content, lineno,
                           content_offset, block_text, state, state_machine):
        """ Will process the highlighted source-code directive.
        """
        try:
            lexer = get_lexer_by_name(arguments[0])
        except ValueError:
            # no lexer found - use the text one instead of an exception
            lexer = TextLexer()
        # take an arbitrary option if more than one is given
        formatter = options and VARIANTS[list(options.keys())[0]] or DEFAULT
        parsed = highlight('\n'.join(content), lexer, formatter)
        return [nodes.raw('', parsed, format='html')]
    
    ##
    ## Loading docutils and registering the new directive
    ##
    from docutils import nodes, utils, io
    from docutils.parsers.rst import directives
    import docutils.core
    
    pygments_directive.arguments = (1, 0, 1)
    pygments_directive.content = 1
    pygments_directive.options = dict([(key, directives.flag) for key in VARIANTS])
    directives.register_directive('sourcecode', pygments_directive)
    
    ##
    ## Execution
    ##
    import os, sys
    
    infile = sys.argv[1]
    outfile = os.path.splitext(infile)[0] + ".html"
    
    print("Running HTML writer:\n-> %s" % outfile)
    
    # Running publish_parts to get at the document body, without
    # header, style specifications and footer
    #
    parts = docutils.core.publish_parts(
                source=open(infile, 'r'),
                source_class=io.FileInput,
                settings_overrides = {
                    'doctitle_xform': 0,
                    'initial_header_level': 3},
                writer_name='html')
    
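    # Write explicitly as UTF-8 rather than the platform's default encoding
    with open(outfile, 'w', encoding='utf-8') as f:
        f.write(parts['body'])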