Some preliminaries

If you've been following this blog recently, you must have noticed that many of the posts in these past few weeks are about using Python to communicate via the serial port. I specifically decided to write them as separate posts and not as part of a series, because I think that each post is interesting in itself [1].

But just in case you got confused, here's the logical order:

  1. Setting up Python to work with the serial port
  2. A “live” data monitor with Python, PyQt and PySerial
  3. Framing in serial communications

In this post I want to present some useful Python code to implement the ideas of (3). Additionally, I'll introduce a very useful library for constructing frames from various forms of data.

Code

The code for this post is available here. It contains the modules discussed, the sample code shown and even some unit tests.

Arrays of data in Python

When we think about a sequence of bytes in Python, two approaches come to mind: an array of integers in the range 0-255, or a 'packed' string. Here's some Python terminal action that displays the difference:

>>> arr = [0x45, 0xAB, 0xC3, 0x16]
>>> arr
[69, 171, 195, 22]
>>> s = '\x45\xAB\xC3\x16'
>>> s
'E\xab\xc3\x16'
>>> ''.join(chr(b) for b in arr)
'E\xab\xc3\x16'
>>> [ord(b) for b in s]
[69, 171, 195, 22]

This shows that the two formats are essentially interchangeable, and also that it's very easy to convert between the two.

The format we're going to use is the packed string, because this is what the pyserial module uses to send and receive data.
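
The session above is Python 2. As a side note, here is a minimal sketch of the same two conversions assuming Python 3, where the packed string is a bytes object rather than a str:

```python
arr = [0x45, 0xAB, 0xC3, 0x16]

# List of integers -> packed bytes
packed = bytes(arr)

# Packed bytes -> list of integers (iterating over bytes yields ints)
unpacked = list(packed)

print(packed)    # b'E\xab\xc3\x16'
print(unpacked)  # [69, 171, 195, 22]
```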

Serializing data

So, to send data over the serial port we first have to turn it into a packed string - this is called serialization [2].

Python has a couple of built-in ways to do that: the array and struct modules. However, both are suitable only for fairly simple data, such as fixed sequences of numeric fields. To serialize arbitrarily sophisticated data formats, it's much better to use the powerful and flexible construct library [3].
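
For comparison, here's a small sketch of what the built-in struct module looks like for a few fixed-size fields. It assumes Python 3, and the field values are arbitrary examples:

```python
import struct

# '<' selects little-endian with no padding,
# 'H' is an unsigned 16-bit integer, 'B' an unsigned byte
header = struct.pack('<HHB', 0x1234, 0xACBA, 0x40)
print(header.hex())  # 3412baac40

# Unpacking uses the same format string
msg_id, dest_addr, command = struct.unpack('<HHB', header)
print(msg_id, dest_addr, command)  # 4660 44218 64
```

This works well as long as the layout is fixed. Bit fields, enumerations and variable-length arrays all have to be handled by hand.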

Here's a sample message format defined with construct (from sampleformat.py in this article's code archive):

from construct import *

message_crc = Struct('message_crc', ULInt32('crc'))

message_format = Struct('message_format',
    ULInt16('msg_id'),
    ULInt16('dest_addr'),
    Enum(Byte('command_type'),
        RESTART = 0x40,
        RESTART_ACK = 0x80,
        SIGNAL = 0x22,
        _default_ = Pass
    ),
    BitStruct('flags',
        Flag('on'),
        BitField('status', 3),
        Flag('cache'),
        Padding(3)
    ),
    Byte('datalen'),
    Array(lambda ctx: ctx['datalen'], Byte('data')),
    Embed(message_crc)
)

It shows off a few interesting features of construct:

  • Explicit specification of endianness for multi-byte fields
  • Enumerations
  • Support for byte-oriented and bit-oriented fields
  • Arrays of data with specified length
  • Embedded structs

The message should look roughly familiar for anyone designing and using binary protocols. It's very typical of how real formats look - some ID fields, flags, data, CRC [4].

Here's how this message format can be used to pack and unpack a message:

>>> from sampleformat import message_format
>>> from construct import *
>>> raw = message_format.build(Container(
...         msg_id=0x1234,
...         dest_addr=0xacba,
...         command_type='RESTART',
...         flags=Container(on=1, cache=0, status=4),
...         datalen=4,
...         data=[0x1, 0xff, 0xff, 0xdd],
...         crc=0x12345678))
>>> raw.encode('hex')
'3412baac40c00401ffffdd78563412'
>>> c = message_format.parse(raw)
>>> print c
Container:
    msg_id = 4660
    dest_addr = 44218
    command_type = 'RESTART'
    flags = Container:
        on = True
        status = 4
        cache = False
    datalen = 4
    data = [
        1
        255
        255
        221
    ]
    crc = 305419896

A few things to note here:

  • message_format is an object with two useful methods: build for packing data into a string, and parse for unpacking it back from a string.
  • Container is a class taken from construct. It's just a simple data container holding its data items in attributes. Any compatible object would do here (duck typing!) - for example a namedtuple. I chose Container because it comes with construct anyway and is simple and useful.
  • raw is a packed string. The encode string method is used here to show the hex values of the string's bytes.
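
To make the byte layout in that hex dump concrete, here's a rough sketch that builds the same message by hand with the struct module. It assumes Python 3; pack_message and unpack_message are hypothetical helpers of mine, not part of construct or of the code archive, and the bit order of the flags byte is inferred from the c0 byte in the dump above:

```python
import struct

COMMANDS = {'RESTART': 0x40, 'RESTART_ACK': 0x80, 'SIGNAL': 0x22}

def pack_message(msg_id, dest_addr, command_type,
                 on, status, cache, data, crc):
    # Flags byte, most significant bit first:
    # on (1 bit), status (3 bits), cache (1 bit), padding (3 bits)
    flags = (int(on) << 7) | ((status & 0x7) << 4) | (int(cache) << 3)
    head = struct.pack('<HHBBB', msg_id, dest_addr,
                       COMMANDS[command_type], flags, len(data))
    return head + bytes(data) + struct.pack('<I', crc)

def unpack_message(raw):
    # The fixed part of the message is 7 bytes long
    msg_id, dest_addr, command, flags, datalen = \
        struct.unpack_from('<HHBBB', raw, 0)
    data = list(raw[7:7 + datalen])
    (crc,) = struct.unpack_from('<I', raw, 7 + datalen)
    return dict(msg_id=msg_id, dest_addr=dest_addr, command=command,
                on=bool(flags & 0x80), status=(flags >> 4) & 0x7,
                cache=bool(flags & 0x08), data=data, crc=crc)

raw = pack_message(0x1234, 0xACBA, 'RESTART',
                   on=True, status=4, cache=False,
                   data=[0x1, 0xFF, 0xFF, 0xDD], crc=0x12345678)
print(raw.hex())  # 3412baac40c00401ffffdd78563412
```

Compare the amount of manual bit twiddling here with the declarative construct definition; this is the kind of bookkeeping the library takes care of.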

Framing (protocol wrapping and unwrapping)

protocolwrapper.py in the code archive is a faithful Python implementation of the scheme described in the Framing in serial communications article.

Not much more to say about it here - the code is commented and should be simple to understand if you're familiar with the theory.
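
If you don't have the archive at hand, here's a stripped-down sketch of the idea. It is not the code from protocolwrapper.py: wrap_frame and unwrap_frames are hypothetical names, and I'm assuming the simple byte-stuffing scheme in which any header, footer or DLE byte inside the data is prefixed with a DLE. It's written for Python 3:

```python
HEADER, FOOTER, DLE = 0x11, 0x12, 0x90  # example flag values

def wrap_frame(payload):
    """Return header + byte-stuffed payload + footer."""
    out = bytearray([HEADER])
    for b in payload:
        if b in (HEADER, FOOTER, DLE):
            out.append(DLE)   # escape a byte that collides with a flag
        out.append(b)
    out.append(FOOTER)
    return bytes(out)

def unwrap_frames(stream):
    """Return the list of payloads found in a byte stream."""
    frames, current, escaped = [], None, False
    for b in stream:
        if current is None:
            if b == HEADER:   # skip everything until a header arrives
                current = bytearray()
        elif escaped:
            current.append(b) # a byte after DLE is always data
            escaped = False
        elif b == DLE:
            escaped = True
        elif b == FOOTER:     # unescaped footer ends the frame
            frames.append(bytes(current))
            current = None
        else:
            current.append(b)
    return frames

payload = bytes([0x01, 0x11, 0x90, 0x12, 0x02])
framed = wrap_frame(payload)
print(framed.hex())  # 11019011909090120212
```

A real receiver would also want to report errors (for instance, a stream that ends in the middle of a frame), which this sketch silently ignores.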

Putting it all together

The process of sending is:

  1. Serialize all the fields into a packed string using the message format object
  2. Compute the CRC and insert it into the frame
  3. Wrap the frame with the protocol
  4. Now we have a string ready to send that represents the complete message

from zlib import crc32
from protocolwrapper import (
    ProtocolWrapper, ProtocolStatus)
from sampleformat import (
    message_format, message_crc, Container)


PROTOCOL_HEADER = '\x11'
PROTOCOL_FOOTER = '\x12'
PROTOCOL_DLE = '\x90'


def build_message_to_send(
        msg_id, dest_addr, command_type,
        flag_on, flag_cache, flag_status, data):
    """ Given the data, builds a message for
        transmission, computing the CRC and packing
        the protocol.
        Returns the packed message ready for
        transmission on the serial port.
    """
    datalen = len(data)
    flags = Container(
        on=flag_on,
        cache=flag_cache,
        status=flag_status)

    # Build the raw message string. CRC is empty
    # for now
    #
    raw = message_format.build(Container(
        msg_id=msg_id,
        dest_addr=dest_addr,
        command_type=command_type,
        flags=flags,
        datalen=datalen,
        data=data,
        crc=0))

    # Compute the CRC field and append it to the
    # message instead of the empty CRC specified
    # initially.
    #
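    msg_without_crc = raw[:-4]
    # Mask to 32 bits: crc32 may return a negative
    # (signed) value, which ULInt32 cannot encode
    #
    msg_crc = message_crc.build(Container(
        crc=crc32(msg_without_crc) & 0xffffffff))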

    # Append the CRC field
    #
    msg = msg_without_crc + msg_crc

    pw = ProtocolWrapper(
            header=PROTOCOL_HEADER,
            footer=PROTOCOL_FOOTER,
            dle=PROTOCOL_DLE)

    return pw.wrap(msg)
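
The CRC step on its own doesn't need construct at all. Here's a minimal sketch assuming Python 3 and the same little-endian 32-bit CRC field; append_crc is a hypothetical helper, not something from the code archive:

```python
import struct
from zlib import crc32

def append_crc(body):
    """Return body followed by its CRC-32 as 4 little-endian bytes."""
    # Masking keeps the value unsigned on every Python version
    crc = crc32(body) & 0xFFFFFFFF
    return body + struct.pack('<I', crc)

msg = append_crc(b'123456789')
print(msg[-4:].hex())  # 2639f4cb (CRC-32 of '123456789' is 0xCBF43926)
```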

The receiving process is:

  1. Unwrap the protocol to receive a frame
  2. Unpack the frame into separate fields using the frame format
  3. Compute the CRC and compare it to the one received
  4. If all is OK, we have received a new valid frame

# Sample: receiving a message
#
pw = ProtocolWrapper(
        header=PROTOCOL_HEADER,
        footer=PROTOCOL_FOOTER,
        dle=PROTOCOL_DLE)

# Feed all the bytes of 'msg' sequentially
# into pw.input
#
status = map(pw.input, msg)

if status[-1] == ProtocolStatus.MSG_OK:
    rec_msg = pw.last_message

    # Parse the received CRC into a 32-bit integer
    #
    rec_crc = message_crc.parse(rec_msg[-4:]).crc

    # Compute the CRC on the message, masked to an
    # unsigned 32-bit value to match the parsed field
    #
    calc_crc = crc32(rec_msg[:-4]) & 0xffffffff

    if rec_crc != calc_crc:
        print 'Error: CRC mismatch'
    else:
        print message_format.parse(rec_msg)
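
The CRC check can also be packaged as a small function that either returns the message body or refuses the frame. This is only a sketch assuming Python 3 and a little-endian CRC-32 in the last four bytes; split_and_verify is a hypothetical name:

```python
import struct
from zlib import crc32

def split_and_verify(frame):
    """Return the frame body if its trailing CRC-32 matches."""
    if len(frame) < 4:
        raise ValueError('frame too short to hold a CRC')
    body, tail = frame[:-4], frame[-4:]
    (received,) = struct.unpack('<I', tail)
    if received != (crc32(body) & 0xFFFFFFFF):
        raise ValueError('CRC mismatch')
    return body

good = b'123456789' + bytes.fromhex('2639f4cb')
print(split_and_verify(good))  # b'123456789'
```

Raising an exception instead of printing makes it harder to use a corrupted frame by accident.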

These are just examples, of course. Your own code will depend on the structure of your frames and how you receive your data. But they can serve as a basic template for implementing arbitrarily complex serial protocols in a robust way.

[1] By the way, all the posts on this topic are collected in the Serial Port category.
[2] Python already has nice libraries for serialization (pickle, shelve, json and others), but there's a problem! It's usually not Python we have on the other side of the serial link! Two Python programs would find a better, faster method to communicate (like TCP/IP). When we use Python with pyserial it's because we actually want to communicate with some embedded hardware (implemented in C or even as an FPGA/ASIC with VHDL or Verilog) or other physical equipment. So pickling the data won't help here.
[3] This is not a tutorial on construct though. There's a pretty good one on its website.
[4] construct has sample formats for well-known protocols like TCP and ARP, and binary files like PNG and ELF32.