Co-routines as an alternative to state machines

August 29th, 2009 at 2:32 pm

Observation:

Co-routines are to state machines what recursion is to stacks

When you have to traverse some sort of a nested data structure (say, a binary tree), one approach is to create a stack that remembers where in the tree you are. Another, much more elegant, approach is to write the function recursively. A recursive function employs the machine stack that implicitly implements function calls – you get the benefits of a stack without paying the cost in readability.
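To make the analogy concrete, here is a sketch of both approaches to in-order traversal (the Node class is a hypothetical helper I'm introducing just for this example):

```python
class Node:
    """A minimal binary tree node (a hypothetical helper for this sketch)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def inorder_stack(root):
    """In-order traversal with an explicit stack that remembers the path."""
    result, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)      # remember where we are in the tree
            node = node.left
        node = stack.pop()          # come back to the deepest remembered node
        result.append(node.value)
        node = node.right
    return result

def inorder_recursive(node):
    """The same traversal, recursively - the machine stack does the bookkeeping."""
    if node is None:
        return []
    return inorder_recursive(node.left) + [node.value] + inorder_recursive(node.right)
```

Both functions visit the same nodes in the same order; the recursive one simply lets the call stack do what `stack` does explicitly in the first.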

In this article I’ll try to show, using a simple yet very realistic example, why co-routines do the same for state machines.

The problem – serial framing

I’ve written a detailed article about framing earlier this month. The short summary is: we have an endless incoming stream of bytes, from which we need to extract structured data frames. That is, we have to find where a frame starts, where it ends and what data it carries. For this purpose we use a special header value, a footer value and an escape byte (DLE).

A complete Python implementation is described here, but in this article I will present the solution in a simplified manner, keeping all irrelevant details out.

The state machine

Given a stream from which we receive one byte at a time, here is the state machine that describes the framing process:

http://eli.thegreenplace.net/wp-content/uploads/2009/08/statemachine1_framing.png

Only inputs and state transitions are shown. The framing process outputs complete frames when moving from the IN_MSG state to the WAIT_HEADER state (this happens when a footer is received) [1].

Implementing the state machine

Here’s an implementation of this state machine in Python. The internal state is kept in an object:

class ProtocolWrapper(object):
    def __init__(self,
            header='\x61',
            footer='\x62',
            dle='\xAB',
            after_dle_func=lambda x: x):
        self.header = header
        self.footer = footer
        self.dle = dle
        self.after_dle_func = after_dle_func

        self.state = self.WAIT_HEADER
        self.frame = ''

    # internal state
    (WAIT_HEADER, IN_MSG, AFTER_DLE) = range(3)

    def input(self, byte):
        """ Receive a byte.
            If this byte completes a frame, the
            frame is returned. Otherwise, None
            is returned.
        """
        if self.state == self.WAIT_HEADER:
            if byte == self.header:
                self.state = self.IN_MSG
                self.frame = ''

            return None
        elif self.state == self.IN_MSG:
            if byte == self.footer:
                self.state = self.WAIT_HEADER
                return self.frame
            elif byte == self.dle:
                self.state = self.AFTER_DLE
            else:
                self.frame += byte
            return None
        elif self.state == self.AFTER_DLE:
            self.frame += self.after_dle_func(byte)
            self.state = self.IN_MSG
            return None
        else:
            raise AssertionError()

Note that the code of the input method closely follows the state diagram. This is typical of state machine implementations – it’s generally difficult to understand what’s going on in the code without having some sort of state diagram in front of your eyes. In this case the state machine has just 3 states, but it can easily be 20 for more complex needs. Understanding such a state function with 20 states is nearly impossible without a diagram.

Anyhow, here’s some test code that simulates a stream of data with a couple of frames and invalid data in between:

bytes = ''.join(chr(b) for b in
            [0x70, 0x24,
             0x61, 0x99, 0xAF, 0xD1, 0x62,
             0x56, 0x62,
             0x61, 0xAB, 0xAB, 0x14, 0x62,
             0x7
            ])

pw = ProtocolWrapper()

for byte in bytes:
    frame = pw.input(byte)
    if frame is not None:  # an empty frame ('') is still a valid frame
        print 'Got frame:', frame.encode('hex')

This prints:

Got frame: 99afd1
Got frame: ab14

Co-routines

I don’t intend to teach the theory behind co-routines here, and I’ll assume at least a basic familiarity with the concept. My goal is to show a real-life, relevant example that demonstrates how co-routines relate to state machines.

This link is a good tutorial on co-routines (in C, of all languages), and there’s of course Wikipedia and C2. But the absolute best tutorial, with a focus on Python, is David Beazley’s presentation from this year’s PyCon: A curious course on coroutines and concurrency. It was while reading this tutorial that the connection finally ‘clicked’ in my head. It is highly recommended [2].

If there’s one description of co-routines you should remember while reading this article and later, it is that co-routines save the control state of a function between calls. Kinda like recursion – you know exactly where you are going to return after a function call.

When you call a co-routine, it doesn’t start all over from the beginning. Rather, it starts from right after where it returned (yielded control) the previous time it was called.
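A tiny sketch shows this resumption behavior (the names are mine; it uses the built-in next for priming, so it runs under Python 2.6+ as well as Python 3):

```python
def echoer(log):
    """A minimal co-routine: each send() resumes right after the (yield)."""
    while True:
        value = (yield)   # suspend here; resumes when send() is called
        log.append(value)

log = []
e = echoer(log)
next(e)        # prime the co-routine: run it up to the first (yield)
e.send('a')    # resumes after (yield), appends 'a', suspends again
e.send('b')
# log is now ['a', 'b']
```

Note that between the two send calls the function's entire local state, including its position inside the loop, is preserved.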

This also explains why co-routines can replace state machines. The input method of ProtocolWrapper is invoked multiple times. Since it’s a "normal" function, it begins running from its first line on each invocation. This is why it needs to keep a state machine – to know its current "place in the world" when the next byte is received. With co-routines this isn’t necessary – a co-routine starts exactly where it stopped the previous time it was called – so no state keeping is required!

Using co-routines for framing

Without further ado, here is the co-routine implementation of the framing problem:

@coroutine
def unwrap_protocol(header='\x61',
                    footer='\x62',
                    dle='\xAB',
                    after_dle_func=lambda x: x,
                    target=None):
    """ Simplified framing (protocol unwrapping)
        co-routine.
    """
    # Outer loop looking for a frame header
    #
    while True:
        byte = (yield)
        frame = ''

        if byte == header:
            # Capture the full frame
            #
            while True:
                byte = (yield)
                if byte == footer:
                    target.send(frame)
                    break
                elif byte == dle:
                    byte = (yield)
                    frame += after_dle_func(byte)
                else:
                    frame += byte

Look how simple and elegant it is. You can tell immediately what it does just by looking at the source code – no state diagrams are needed.

We loop over frames. A frame starts with a header byte. After a header byte has been received, we accumulate the bytes of the frame until a footer is encountered. The (yield) expressions are where the magic happens. The function suspends at these points until it is called again [3]. Then, the value passed in the new call is returned from (yield) and the co-routine proceeds from the same place.

Note how the state machine is implicitly embedded in this code. It’s there, but you don’t see it – it’s hiding in the control structures (the IFs, ELSEs and the WHILEs) of the function.

When a complete frame is received, it is sent to the target of the co-routine, which may process it at will. After executing send, the co-routine breaks out of the inner loop and suspends waiting for a new header in the outer loop.

The @coroutine decorator is a simple utility required for Python co-routines:

def coroutine(func):
    def start(*args,**kwargs):
        cr = func(*args,**kwargs)
        cr.next()
        return cr
    return start

This is needed to bring a co-routine to its first yield and suspend it there. You can just use this decorator without worrying about the details, until you become comfortable enough with the concept to understand the exact inner workings described in PEP 342.
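For reference, here is a sketch of the same decorator in Python 3 syntax – the article's code targets Python 2, where generators have a .next() method; in Python 3 the built-in next is used instead:

```python
import functools

def coroutine(func):
    """Advance a new co-routine to its first yield so it's ready for send()."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)    # equivalent to cr.next() in Python 2
        return cr
    return start

@coroutine
def collector(log):
    """A trivial sink co-routine used here just to exercise the decorator."""
    while True:
        log.append((yield))

log = []
c = collector(log)   # already primed by the decorator - send() works immediately
c.send(42)
# log is now [42]
```

The functools.wraps call is optional; it merely preserves the wrapped function's name and docstring.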

To test this co-routine implementation we also need a simple "sink" co-routine (using Dave Beazley’s terminology from his presentation). This will be the receiver of the send calls made by our co-routine:

@coroutine
def frame_receiver():
    """ A simple co-routine "sink" for receiving
        full frames.
    """
    while True:
        frame = (yield)
        print 'Got frame:', frame.encode('hex')

bytes = ''.join(chr(b) for b in
            [0x70, 0x24,
             0x61, 0x99, 0xAF, 0xD1, 0x62,
             0x56, 0x62,
             0x61, 0xAB, 0xAB, 0x14, 0x62,
             0x7
            ])

unwrapper = unwrap_protocol(
                target=frame_receiver())

for byte in bytes:
    unwrapper.send(byte)

Prints:

Got frame: 99afd1
Got frame: ab14

Conclusion

I’ll repeat the quote from the beginning of the article:

Co-routines are to state machines what recursion is to stacks

Recursion helps process nested data structures without employing explicit stacks.

Similarly, co-routines help solve problems involving state, without using explicit state machines. The resulting code is not centered on the states, but rather on the logic of the tasks, which makes it much simpler to understand.

Co-routines are a useful tool to have in one’s toolbox. It is worthwhile to spend some time getting acquainted with them.

[1] Such a state machine is called a Mealy machine – it generates output based on the current state and input. Most state machines implemented in software are of this type.
[2] For Python there’s also PEP 342 – but I recommend going over it only after you’ve read Dave’s tutorial.
[3] Technically, a co-routine is created once by calling it. Then, we have a "co-routine object" on which we can execute send methods, passing the arguments to yield via send. This is how co-routines are implemented in Python. It might look different in another language, but the concept stays the same.
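The mechanics described in this footnote can be made concrete with a small sketch (names are mine; uses the built-in next, so it runs on Python 2.6+ and 3):

```python
def accumulator():
    """send(x) makes the (yield ...) expression evaluate to x inside the co-routine."""
    total = 0
    while True:
        total += (yield total)   # yield the running total, receive the next value

acc = accumulator()       # calling it only creates the co-routine object
assert next(acc) == 0     # advance to the first yield; it yields total == 0
assert acc.send(5) == 5   # 5 comes out of (yield), is added, and 5 is yielded back
assert acc.send(3) == 8
```

Each send both passes a value in (the result of the suspended yield expression) and receives the next yielded value back.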


17 Responses to “Co-routines as an alternative to state machines”

  1. andrew cooke Says:

    Thanks for writing this. It’s made me think again about coroutines. I thought I understood them, but now wonder if I have structured my code backwards (if I could explain that better I wouldn’t be so confused…).

    Said code (possibly backwards) is a recursive descent parser written in Python which also contains a pure-python implementation of regular expressions (which do not use coroutines, although the main parser does). Anyway, because it’s pure Python you can use regexps to match binary data. And indeed, not just regexps – you can use the entire recursive decent parser framework. Most of my work has been on text, so I am on the lookout for anyone who actually wants to parse binary data – any feedback appreciated. It may solve what you are trying to do here.

    Anyway, thanks again. It’s disturbing/worrying, but also hopeful, when you read something that makes you question what you thought you knew.

    Andrew

    PS Sorry, forgot, link to binary parser – http://www.acooke.org/lepl/binary.html
    PPS Managed to get your antispam question wrong! (wrote “four”)

  2. Cory Says:

    Great stuff – this is the best case I’ve seen made for using coroutines. That actually helps it be a successful explanation of coroutines, as well.

    Never thought of coroutines as a protocol parser before.

  3. Brian Says:

    This is a nice, illustrative article.

    I would not use this in the real world for the proposed problem domain though. You are calling a function for every *byte* read. Ouch.

  4. linkback Says:

    http://www.reddit.com/r/programming/comments/9fcna/coroutines_as_an_alternative_to_state_machines/

  5. Brian Takita Says:

    Uhh, you almost lost me when you claimed recursion is more readable than stacks. That is a subjective and probably unpopular (as I believe the majority of programmers would disagree with you) statement.
    It’s also a real pain in the ass to locate errors in stackless languages (Javascript, Erlang).
    I find your claim that recursion gives you the benefits without the costs to be just plain wrong.

    Maybe it’s my unfamiliarity with Python, but I also find the state machine example more readable, as I don’t have to hold more thoughts (state) in my head to understand what is going on.

  6. BC Says:

    python co-routines are a killer feature for scientific data-acquisition or instrumentation applications. Traditionally, this type of app makes extensive use of state-machines to control the interaction with external hardware. We’ve just finished a somewhat complex instrument application where we’ve used coroutines extensively and it’s been a huge success.

    We normally use data-acquisition devices (PCI cards, from National Instruments) to generate and collect analog and digital signals. These feed a “pipeline” of co-routines. The data processing pipeline can comprise elements like “data averaging”, “output-to-file”, “plotting” and also hardware interactions such as moving linear actuators and GPIB devices. Each one gets a coroutine to control it and manage its state.

    The data acquisition pipeline node looks like:

        @pipeline
        def daq_lifecycle(self, output):
            task = daq.initialiseTask()
            task.start()
            try:
                while True:
                    output.send((yield))
            finally:
                task.clear()

    where the ‘daq’ object represents the PCI card device (wrapped with ctypes).

    We also have:

        # a little decorator to 'ping' the coroutine's first .next() call
        def pipeline(func):
            def wrapper(*args, **kw):
                gen = func(*args, **kw)
                gen.next()
                return gen
            return wrapper
    When new data arrives, it is “sent” into the pipeline (in a callback called by the device drivers whenever new data is available). Each node in the pipeline is given an output co-routine to send its data into (except for the terminating node). A full pipeline is built by joining all the required nodes together.

    One cool feature is the use of the try-finally block. If there’s an exception downstream in the pipeline, it propagates upwards and all the finally blocks execute, restoring the hardware to a safe state. The pipeline can be shut down manually by “throw”ing an exception into it. The downstream nodes are automatically garbage-collected and all their finally blocks execute, ensuring a safe final state.

    Using co-routines means we can drastically reduce the number of state-maintaining attributes on the objects, which simplifies the code and reduces the opportunities for bugs. Also, describing state in terms of point-of-execution in the code makes the state evolution or lifecycle intuitive to understand.

    python is awesome.

  7. Ludvig Ericson Says:

    Well-written, nice and informative. Good language use, and a very interesting subject.

    However, have you looked at the performance implications, if any? It somehow feels like there’d be a price to pay for this.

  8. eliben Says:

    Brian,
    Here’s Python-pseudocode for inorder traversal of a binary tree:

    def traverse(node, func):
      if node is None:
        return
      traverse(node.left, func)
      func(node)
      traverse(node.right, func)

    I challenge you to write it in a more readable way using explicit stacks instead of recursion.

  9. eliben Says:

    Ludvig,

    I haven’t tried benchmarking, but in Dave Beazley’s article he came to the conclusion that co-routines are more efficient than the alternative.

    BC,

    Thanks, this is very interesting.

  10. Ryan Fox Says:

    I agree that state machines as nested conditions is hard to comprehend when the complexity increases, but have you considered an explicit state machine model?

    http://www.perl.com/pub/a/2004/09/23/fsms.html
    This article (written for Perl, but should be easy to understand) demonstrates this. The code just above the “Final Assembly” heading shows a nice example of an explicit state machine. You essentially build a table of [state, condition, action] to define the behaviour.

    Another benefit of this style of implementation is that it’s very easy to parse and then give to dot http://www.graphviz.org/ to draw a visual representation of the state machine.

  11. Ryan Fox Says:

    For the tree traversal, I won’t argue that the recursive version is simpler, but if you did implement it with an explicit stack, you could then very easily implement a breadth-first traversal by replacing the stack with a queue.

  12. James Thiele Says:

    The basic correspondence between the Python implementations of state machines in coroutines and “regular” state machines is that each (yield) in the coroutine corresponds to a state in the state machine. You assert that a state machine with 20 states would be difficult to follow. I doubt that a function with 20 (yield)s would be any easier.

  13. eliben Says:

    @Ryan,

    Just to put things straight, I’m not coming out against state machines here. State machines are an important tool – they definitely have their place in software. I’m just presenting co-routines as an alternative for certain cases.

    Explicitly specifying state machines is indeed better than just hand coding them from a diagram. There are nice tools out there, like Ragel, that even compile an FSM in some simple DSL to real code (C, C++, Java, Ruby – unfortunately no Python).

  14. roger rubygems Says:

    Thanks, good writeup. I’m not so sure about your conclusion, however… how would the comparison look, complexity-wise, if you had 10 states that all called each other?

    Being able to pipeline operations together seems to be a good fit for coroutines, as mentioned. Or using them to allow for functional programming in an evented architecture :)
    Cheers.

    -r

  15. José Hérnandez Says:

    @Roger: I think Eli makes a nice argument for coroutines. When one thinks about parsing, one would first imagine the code to parse the input without having to record the results in the middle of parsing. Coroutine parsers are written with this kind of frame of mind. If one writes a parser that just burns through the input without returning any of the parsed results, it would look strikingly like a parser written with coroutines. It’s this natural approach that I believe Eli finds appealing.

  16. Shai Shasag Says:

    I do not see the need for coroutines here. Here is a shorter, simpler implementation that uses iterators:

    bytes = (chr(b) for b in
                [0x70, 0x24,
                 0x61, 0x99, 0xAF, 0xD1, 0x62,
                 0x56, 0x62,
                 0x61, 0xAB, 0x62, 0x14, 0x62,
                 0x7
                ])
    def iterator_protocol(get_byte_iter,
                        header='\x61',
                        footer='\x62',
                        dle='\xAB'):
        while True:
            byte = get_byte_iter.next()
            frame = ''
            if byte == header:
                while True:
                    byte = get_byte_iter.next()
                    if byte == footer:
                        yield frame
                        break
                    elif byte == dle:
                        byte = get_byte_iter.next()
                        frame += byte
                    else:
                        frame += byte
    for frame in iterator_protocol(bytes):
        print 'Got frame:', frame.encode('hex')

  17. eliben Says:

    Shai,

    It’s pretty much the same thing – a minor variation. The co-routine approach is a bit more general, because unwrap_protocol does not have to worry about pulling bytes from an iterator – it can get them directly from yield. This way it can participate in a longer “pipe” of co-routines if needed.
