Frames and protocols for the serial port – in Python

August 20th, 2009 at 7:01 am

Some preliminaries

If you’ve been following this blog recently, you must have noticed that many of the posts in these past few weeks are about using Python to communicate via the serial port. I specifically decided to write them as separate posts and not as part of a series, because I think that each post is interesting in itself [1].

But just in case you got confused, here’s the logical order:

  1. Setting up Python to work with the serial port
  2. A “live” data monitor with Python, PyQt and PySerial
  3. Framing in serial communications

In this post I want to present some useful Python code to implement the ideas of (3). Additionally, I’ll introduce a very useful library for constructing frames from various forms of data.

Code

The code for this post is available for download as a single .zip file here. It contains the modules discussed, the sample code shown and even some unit tests.

Arrays of data in Python

When we think about a sequence of bytes in Python, two approaches come to mind: an array of integers in the range 0-255, or a ‘packed’ string. Here’s some Python terminal action that displays the difference:

>>> arr = [0x45, 0xAB, 0xC3, 0x16]
>>> arr
[69, 171, 195, 22]
>>> packed = '\x45\xAB\xC3\x16'
>>> packed
'E\xab\xc3\x16'
>>> ''.join(chr(b) for b in arr)
'E\xab\xc3\x16'
>>> [ord(b) for b in packed]
[69, 171, 195, 22]

This shows that the two formats are essentially interchangeable, and also that it’s very easy to convert between the two.

The format we’re going to use is the packed string, because this is what the pyserial module uses to send and receive data.
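On Python 3 the picture is a little different. The packed form is a bytes object rather than a str, and pySerial's write and read work with bytes. The conversions are even shorter there. Here's a quick sketch, assuming Python 3.5 or later (for bytes.hex):

```python
# Python 3 sketch: 'bytes' plays the role of the packed string
arr = [0x45, 0xAB, 0xC3, 0x16]

# list of ints -> packed bytes (every value must be in the range 0-255)
packed = bytes(arr)
print(packed)        # b'E\xab\xc3\x16'

# packed bytes -> list of ints (iterating over bytes yields ints)
print(list(packed))  # [69, 171, 195, 22]

# hex dump, handy for debugging frames
print(packed.hex())  # 45abc316
```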

Serializing data

So, to send data over the serial port we first have to turn it into a packed string – this is called serialization [2].

Python has a couple of built-in ways to do that: the array and struct modules. However, both are only suited to fairly simple, flat data. To serialize arbitrarily sophisticated data formats, it’s much better to use the powerful and flexible construct library [3].
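To be fair to struct: for a flat record of fixed-size fields it works fine. As an illustration (Python 3, with field values borrowed from the sample session later in this post), the first three fields of the message format below, two little-endian 16-bit words and a byte, pack like this. It's the variable-length data array and the bit-level flags that make struct clumsy.

```python
import struct

# '<' = little-endian with no padding; H = unsigned 16-bit, B = unsigned 8-bit
HEADER_FMT = '<HHB'

# msg_id, dest_addr, command_type (0x40 is RESTART in the sample format)
header = struct.pack(HEADER_FMT, 0x1234, 0xACBA, 0x40)
print(header.hex())  # 3412baac40

msg_id, dest_addr, command_type = struct.unpack(HEADER_FMT, header)
print(hex(msg_id), hex(dest_addr), hex(command_type))  # 0x1234 0xacba 0x40
```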

Here’s a sample message format defined with construct (from sampleformat.py in this article’s code archive):

from construct import *

message_crc = Struct('message_crc', ULInt32('crc'))

message_format = Struct('message_format',
    ULInt16('msg_id'),
    ULInt16('dest_addr'),
    Enum(Byte('command_type'),
        RESTART = 0x40,
        RESTART_ACK = 0x80,
        SIGNAL = 0x22,
        _default_ = Pass
    ),
    BitStruct('flags',
        Flag('on'),
        BitField('status', 3),
        Flag('cache'),
        Padding(3)
    ),
    Byte('datalen'),
    Array(lambda ctx: ctx['datalen'], Byte('data')),
    Embed(message_crc)
)

It shows off a few interesting features of construct:

  • Explicit specification of endianness for multi-byte fields
  • Enumerations
  • Support for byte-oriented and bit-oriented fields
  • Arrays of data with specified length
  • Embedded structs

The message should look roughly familiar to anyone designing and using binary protocols. It’s very typical of how real formats look – some ID fields, flags, data, CRC [4].
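To make the bit-oriented part concrete: judging by the packed output in the session below, where on=1, status=4, cache=0 packs to 0xc0, BitStruct fills the flags byte starting from the most significant bit. Doing the same by hand takes a few shifts and masks. A Python sketch (pack_flags and unpack_flags are illustrative names, not part of construct):

```python
# Illustrative helpers mirroring the 'flags' BitStruct:
# bit 7 = on, bits 6-4 = status, bit 3 = cache, bits 2-0 = padding (zero)

def pack_flags(on, status, cache):
    if not 0 <= status <= 7:
        raise ValueError('status must fit in 3 bits')
    return (int(bool(on)) << 7) | (status << 4) | (int(bool(cache)) << 3)


def unpack_flags(byte):
    return {
        'on': bool(byte & 0x80),
        'status': (byte >> 4) & 0x07,
        'cache': bool(byte & 0x08),
    }


print(hex(pack_flags(True, 4, False)))  # 0xc0
print(unpack_flags(0xE0))               # {'on': True, 'status': 6, 'cache': False}
```

With construct, all of this bookkeeping is captured declaratively by the BitStruct definition above.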

Here’s how this message format can be used to pack and unpack a message:

>>> from sampleformat import message_format
>>> from construct import *
>>> raw = message_format.build(Container(
...         msg_id=0x1234,
...         dest_addr=0xacba,
...         command_type='RESTART',
...         flags=Container(on=1, cache=0, status=4),
...         datalen=4,
...         data=[0x1, 0xff, 0xff, 0xdd],
...         crc=0x12345678))
>>> raw.encode('hex')
'3412baac40c00401ffffdd78563412'
>>> c = message_format.parse(raw)
>>> print c
Container:
    msg_id = 4660
    dest_addr = 44218
    command_type = 'RESTART'
    flags = Container:
        on = True
        status = 4
        cache = False
    datalen = 4
    data = [
        1
        255
        255
        221
    ]
    crc = 305419896

A few things to note here:

  • message_format is an object with two useful methods: build for packing data into a string, and parse for unpacking it back from a string.
  • Container is a class taken from construct. It’s just a simple data container holding its data items in attributes. Any compatible object would do here (duck typing!) – for example a namedtuple. I chose Container because it comes with construct anyway and is simple and useful.
  • raw is a packed string. The encode string method is used here to show the hex values of the string’s bytes.
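For comparison, here is roughly what parse saves you from writing: a hand-rolled Python 3 decoder for the same layout using struct, fed with the hex string from the session above. It's only a sketch. parse_message is a hypothetical name, it returns the raw command byte (0x40) instead of the enum name, and it raises struct.error if the buffer is too short.

```python
import struct

FIXED_FMT = '<HHBBB'  # msg_id, dest_addr, command_type, flags, datalen


def parse_message(raw):
    """Hypothetical hand-written inverse of message_format.build."""
    msg_id, dest_addr, command_type, flags, datalen = struct.unpack_from(FIXED_FMT, raw, 0)
    offset = struct.calcsize(FIXED_FMT)            # 7 bytes of fixed fields
    data = list(raw[offset:offset + datalen])      # variable-length part
    (crc,) = struct.unpack_from('<I', raw, offset + datalen)  # ULInt32 CRC
    return {
        'msg_id': msg_id,
        'dest_addr': dest_addr,
        'command_type': command_type,
        'flags': {
            'on': bool(flags & 0x80),
            'status': (flags >> 4) & 0x07,
            'cache': bool(flags & 0x08),
        },
        'datalen': datalen,
        'data': data,
        'crc': crc,
    }


raw = bytes.fromhex('3412baac40c00401ffffdd78563412')
print(parse_message(raw))
```

Every new field, length rule or enum means more code like this to write and keep in sync with the builder, which is exactly what a single construct definition avoids.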

Framing (protocol wrapping and unwrapping)

protocolwrapper.py in the code archive is a faithful Python implementation of the Framing in serial communications article.

Not much more to say about it here – the code is commented and should be simple to understand if you’re familiar with the theory.
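The core of the wrapping side is byte stuffing. Here is a minimal Python 3 sketch, not the archive's ProtocolWrapper. It assumes that every header, footer or DLE byte occurring inside the payload is escaped with a preceding DLE, which is consistent with the frame dumps discussed in the comments below. The function name wrap_frame is hypothetical, and the default byte values mirror the PROTOCOL_* constants used in the next section.

```python
HEADER = 0x11
FOOTER = 0x12
DLE = 0x90


def wrap_frame(payload, header=HEADER, footer=FOOTER, dle=DLE):
    """Sketch only: header + escaped payload + footer.

    Assumes any payload byte equal to header, footer or DLE is escaped by
    a preceding DLE, so the receiver never mistakes data for a boundary.
    """
    out = bytearray([header])
    for b in payload:
        if b in (header, footer, dle):
            out.append(dle)
        out.append(b)
    out.append(footer)
    return bytes(out)


frame = wrap_frame(bytes([0x34, 0x12, 0x90, 0x05]))
print(frame.hex())  # 1134901290900512
```

Escaping the DLE itself is what makes the scheme reversible: a receiver that sees a DLE always takes the following byte literally.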

Putting it all together

The process of sending is:

  1. Serialize all the fields into a packed string using the message format object
  2. Compute the CRC and insert it into the frame
  3. Wrap the frame with the protocol
  4. Now we have a string ready to send that represents the complete message

from zlib import crc32
from protocolwrapper import (
    ProtocolWrapper, ProtocolStatus)
from sampleformat import (
    message_format, message_crc, Container)


PROTOCOL_HEADER = '\x11'
PROTOCOL_FOOTER = '\x12'
PROTOCOL_DLE = '\x90'


def build_message_to_send(
        msg_id, dest_addr, command_type,
        flag_on, flag_cache, flag_status, data):
    """ Given the data, builds a message for
        transmittion, computing the CRC and packing
        the protocol.
        Returns the packed message ready for
        transmission on the serial port.
    """
    datalen = len(data)
    flags = Container(  on=flag_on,
                        cache=flag_cache,
                        status=flag_status)

    # Build the raw message string. The CRC field
    # is a zero placeholder for now
    #
    raw = message_format.build(Container(
        msg_id=msg_id,
        dest_addr=dest_addr,
        command_type=command_type,
        flags=flags,
        datalen=datalen,
        data=data,
        crc=0))

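    # Compute the CRC over everything except the
    # placeholder CRC field (the last 4 bytes).
    # Masking with 0xFFFFFFFF keeps the value
    # unsigned: since Python 2.6, crc32 can return
    # a negative number.
    #
    msg_without_crc = raw[:-4]
    msg_crc = message_crc.build(Container(
        crc=crc32(msg_without_crc) & 0xFFFFFFFF))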

    # Append the CRC field
    #
    msg = msg_without_crc + msg_crc

    pw = ProtocolWrapper(
            header=PROTOCOL_HEADER,
            footer=PROTOCOL_FOOTER,
            dle=PROTOCOL_DLE)

    return pw.wrap(msg)
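The CRC handling in isolation: on Python 3, where packed data is bytes, it could look like the sketch below. append_crc and check_crc are hypothetical helpers, and the 4-byte little-endian layout is assumed to match message_crc (a ULInt32). Python 3's zlib.crc32 always returns an unsigned value, so the 0xFFFFFFFF mask is a no-op there, but it is exactly what Python 2.6 and later 2.x versions need, since their crc32 can return a negative number.

```python
import struct
import zlib


def append_crc(payload):
    """Hypothetical helper: append a little-endian CRC-32 (ULInt32 layout assumed)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF  # no-op on Python 3, required on 2.6/2.7
    return payload + struct.pack('<I', crc)


def check_crc(message):
    """Return True if the last 4 bytes hold the CRC-32 of everything before them."""
    if len(message) < 4:
        return False
    (received,) = struct.unpack('<I', message[-4:])
    return received == (zlib.crc32(message[:-4]) & 0xFFFFFFFF)


msg = append_crc(b'123456789')
print(msg[-4:].hex())  # 2639f4cb
print(check_crc(msg))  # True
```

The input b'123456789' is the standard CRC-32 check string; its CRC is 0xCBF43926, which appears byte-reversed in the little-endian dump.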

The receiving process is:

  1. Unwrap the protocol to receive a frame
  2. Unpack the frame into separate fields using the frame format
  3. Compute the CRC and compare it to the one received
  4. If all is OK, we have received a new valid frame

# Sample: receiving a message
#
pw = ProtocolWrapper(
        header=PROTOCOL_HEADER,
        footer=PROTOCOL_FOOTER,
        dle=PROTOCOL_DLE)

# Feed all the bytes of 'msg' (a complete frame,
# e.g. as returned by build_message_to_send)
# sequentially into pw.input
#
status = map(pw.input, msg)

if status[-1] == ProtocolStatus.MSG_OK:
    rec_msg = pw.last_message

    # Parse the received CRC into a 32-bit integer
    #
    rec_crc = message_crc.parse(rec_msg[-4:]).crc

    # Compute the CRC on the message, masked to
    # unsigned 32 bits as on the sending side
    #
    calc_crc = crc32(rec_msg[:-4]) & 0xFFFFFFFF

    if rec_crc != calc_crc:
        print 'Error: CRC mismatch'

    print message_format.parse(rec_msg)

These are just examples, of course. Your own code will depend on the structure of your frames and how you receive your data. But it can serve as a basic template for implementing arbitrarily complex serial protocols in a robust way.
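The receiving side is a small state machine fed one byte at a time, which is what the pw.input calls above rely on. Here's a self-contained Python 3 sketch of such an unwrapper. The class and state names are my own (the archive's ProtocolWrapper and ProtocolStatus may differ in detail), and it assumes the same single-byte header, footer and DLE as the sending example. An unescaped header inside a frame is treated as the start of a new frame, which lets the receiver resynchronize after line noise.

```python
from enum import Enum


class Status(Enum):
    WAIT_HEADER = 0  # discarding bytes until a frame starts
    IN_MSG = 1       # collecting payload bytes
    AFTER_DLE = 2    # previous byte was DLE; take the next one literally
    MSG_OK = 3       # a complete frame was just received


class FrameReceiver:
    """Hypothetical byte-at-a-time unwrapper (not the archive's API)."""

    def __init__(self, header=0x11, footer=0x12, dle=0x90):
        self.header, self.footer, self.dle = header, footer, dle
        self.state = Status.WAIT_HEADER
        self.buf = bytearray()
        self.last_message = None

    def input(self, byte):
        """Feed one byte (an int 0-255); return the resulting state."""
        if self.state in (Status.WAIT_HEADER, Status.MSG_OK):
            # Outside a frame: ignore everything until a header arrives
            if byte == self.header:
                self.buf = bytearray()
                self.state = Status.IN_MSG
            else:
                self.state = Status.WAIT_HEADER
        elif self.state == Status.AFTER_DLE:
            # Escaped byte: store it literally, whatever its value
            self.buf.append(byte)
            self.state = Status.IN_MSG
        elif byte == self.dle:
            self.state = Status.AFTER_DLE
        elif byte == self.footer:
            self.last_message = bytes(self.buf)
            self.state = Status.MSG_OK
        elif byte == self.header:
            # Unescaped header inside a frame: assume lost sync, start over
            self.buf = bytearray()
        else:
            self.buf.append(byte)
        return self.state


rx = FrameReceiver()
frame = bytes.fromhex('11349012cdab40e0049090901281755ab51a6912')
statuses = [rx.input(b) for b in frame]
print(statuses[-1])           # Status.MSG_OK
print(rx.last_message.hex())  # 3412cdab40e004901281755ab51a69
```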

[1] By the way, all the posts on this topic are collected in the Serial Port category.
[2] Python already has nice libraries for serialization (pickle, shelve, json and others), but there’s a problem! It’s usually not Python we have on the other side of the serial link! Two Python programs would find a better, faster method to communicate (like TCP/IP). When we use Python with pyserial it’s because we actually want to communicate with some embedded hardware (implemented in C or even as an FPGA/ASIC with VHDL or Verilog) or other physical equipment. So pickling the data won’t help here.
[3] This is not a tutorial of construct, though. There’s a pretty good one on its website.
[4] construct has sample formats for well-known protocols like TCP and ARP, and binary files like PNG and ELF32.

9 Responses to “Frames and protocols for the serial port – in Python”

  1. Drew Says:

    Great write up! This is really useful for getting packet data from an embedded system

  2. Blake Says:

    Eli,
    Great post… I’ve used your ideas here to try and implement a protocol of my own for a customer project. I keep running into an issue though… How do you deal with multi-byte values for PROTOCOL_FOOTER in the wrapper? The engineer for this project has spec’d '\x0d\x0a' for the packet footer and I can’t get pw.status('MSG_OK') to work… all I get is 'IN_MSG'. Any ideas would be greatly appreciated.

  3. Manni Says:

    I’m very impressed by your clean explanations, good for us less experienced.
    I’m disassembling your code and beginning to catch it but I’m having one problem:
    -what’s that extra DLE in the message stream?
    If I run the tryit.py in your code I get the following output:

    $ python tryit.py
    11349012cdab40e0049090901281755ab51a6912
    Container({'command_type': 'RESTART',
    'crc': 1763358042,
    'data': [144, 18, 129, 117],
    'datalen': 4,
    'dest_addr': 43981,
    'flags': Container({'status': 6, 'on': True, 'cache': False}),
    'msg_id': 4660})

    I can disassemble the message (11349012cdab40e0049090901281755ab51a6912)
    this much:
    11 PROTOCOL_HEADER
    349012 msg_id=0x1234 with embedded dle for the 0x12 databyte
    cdab dest_addr=0xABCD,
    40 command_type='RESTART'
    e0 flag_on=1, flag_status=6,
    04 datalength
    90 <- what's this extra dle?
    90 dle for the dle inside data following
    90128175 data=[0x90, 0x12, 0x81, 0x75]
    5ab51a69 crc
    12 PROTOCOL_FOOTER

    I guess the extra DLE has something to do with the after_dle_func in your ProtocolWrapper,
    but I would be very glad if you could elaborate a little more on this.

  4. eliben Says:

    Blake,

    This code currently only supports single-byte footers. For multi-byte footers the footer detection code will have to be modified.

    Manni,

    After 0x04 come 3 instances of 0x90. The first escapes the second, so the first data byte is 0x90. The third escapes 0x12, since that’s PROTOCOL_FOOTER and has to be escaped also.

  5. Manni Says:

    Of course, sorry for being blind, I’m embarrassed.
    I caught that first occurrence of PROTOCOL_FOOTER escaping, but doing it twice was obviously too much for my simple mind.
    Thanks anyway!

  6. Manni Says:

    Can you cope with more of my investigations of this great code example?
    If I try other values for msg_id in the tryit.py code I get strange errors in the CRC calculation;
    for instance 0x1235 (and every other value, e.g. 0x1237, 0x1239 and so on) gives me the following output:

    $ python tryit.py
    /usr/local/lib/python2.6/dist-packages/construct-2.06-py2.6.egg/construct/core.py:359: DeprecationWarning: struct integer overflow masking is deprecated
    _write_stream(stream, self.length, self.packer.pack(obj))
    11359012cdab40e0049090901281759a6a94a812
    Error: CRC mismatch
    Container({'command_type': 'RESTART',
    'crc': 2828298906L,
    'data': [144, 18, 129, 117],
    'datalen': 4,
    'dest_addr': 43981,
    'flags': Container({'status': 6, 'on': True, 'cache': False}),
    'msg_id': 4661})

    Is it some kind of two’s complement overflow we trigger here?

  7. eliben Says:

    Manni,

    This appears to be the result of a change in Python 2.6 to the zlib.crc32 function return value. The fix is to “and” the return value of crc32 with 0xFFFFFFFF. There are two calls to crc32 in tryit.py. Can you verify that the fix works for you?

  8. Manni Says:

    Yes, I can confirm: these two lines seem to do the trick:
    .
    crc=crc32(msg_without_crc) & 0xFFFFFFFF))
    .
    calc_crc = crc32(rec_msg[:-4]) & 0xFFFFFFFF

    Thanks again.

  9. foresightyj Says:

    Thanks for recommending the construct library. Before learning about it, I always used the struct module to parse data.
