Framing in serial communications

August 12th, 2009 at 5:16 am

Introduction

In the previous post we’ve seen how to send and receive data on the serial port with Python and plot it live using a pretty GUI.

Notice that the sender script (sender_sim.py) is just sending one byte at a time. The "chunks" of data in the protocol between the sender and receiver are single bytes. This is simple and convenient, but hardly sufficient in the general sense. We want to be able to send multiple-byte data frames between the communicating parties.

However, there are some challenges that arise immediately:

  • The receiver is just receiving a stream of bytes from the serial port. How does it know when a message begins or ends? How does it know how long the message is?
  • Even more seriously, we cannot assume a noise-free channel. This is real, physical hardware stuff. Bytes and whole chunks can and will be lost due to electrical noise. Worse, other bytes will be distorted (say, a single bit can be flipped due to noise).

To see how this can be done in a safe and tested manner, we first have to learn about the basics of the Data Link Layer in computer networks.

An example

Let’s now see a completely worked-out example that demonstrates how this works.

Suppose we define the following protocol:

  • Start flag: 0x12
  • End flag: 0x13
  • Escape (DLE): 0x7D

And the sender wants to send the following data message (let’s ignore its contents for the sake of the example – they’re really not that important). The original data is in (a):

http://eli.thegreenplace.net/wp-content/uploads/2009/08/example1.png

The data contains two flags that need to be escaped – an end flag at position 2 (counting from 0, of course!), and a DLE at position 4.

The sender’s data link layer [7] turns the data into the frame shown in (b) – start and end flags are added, and in-message flags are escaped.
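
Here is a minimal sketch of such a wrapping function in Python, using the flag values defined above. It assumes the simplest escaping rule, where an in-message flag is sent unchanged right after a DLE (no XOR-ing of the escaped byte). The name `wrap_frame` and the sample payload are made up for illustration; the payload only mimics the layout described above and is not necessarily the data shown in the figure.

```python
START_FLAG = 0x12
END_FLAG = 0x13
DLE = 0x7D

# Bytes that must never appear unescaped inside the frame body.
SPECIAL = {START_FLAG, END_FLAG, DLE}


def wrap_frame(data):
    """Wrap raw data in start/end flags, escaping in-message flags.

    Assumption: an escaped byte is sent as-is right after the DLE.
    """
    frame = bytearray([START_FLAG])
    for byte in data:
        if byte in SPECIAL:
            frame.append(DLE)
        frame.append(byte)
    frame.append(END_FLAG)
    return bytes(frame)


# Hypothetical payload: an end flag at position 2 and a DLE at position 4.
payload = bytes([0x01, 0x02, 0x13, 0x04, 0x7D, 0x05])
framed = wrap_frame(payload)
print(" ".join("%02x" % b for b in framed))
```

Note that the frame grows by one byte for every escaped byte, so the frame length depends on the contents of the data.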

Let’s see how the receiver handles such a frame. For demonstration, assume that the first byte the receiver draws from the serial port is not a real part of the message (we want to see how it handles this). In the following diagram, ‘Receiver state’ is the state of the receiver after the received byte. ‘Data buffer’ is the currently accumulated message buffer to pass to an upper level:

http://eli.thegreenplace.net/wp-content/uploads/2009/08/example1_rcv.png

A few things to note:

  • The "stray" byte before the header is ignored: according to the protocol each frame has to start with a header (the start flag), so this byte isn’t part of the frame.
  • The start and end flags are not inserted into the data buffer.
  • Escapes (DLEs) are correctly handled by a special state.
  • When the frame is finished with an end flag, the receiver has a frame ready to pass to an upper level, and goes back to waiting for a header – a new frame.
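
The receiver behavior in these notes can be sketched as a small three-state machine in Python. This is only an illustration under a few assumptions: escaped bytes arrive unchanged after the DLE (no XOR-ing), the state and class names (`WAIT_HEADER`, `IN_MSG`, `AFTER_DLE`, `FrameReceiver`) are invented here, and an unescaped start flag in the middle of a message restarts the frame, which is a choice the text above does not specify.

```python
START_FLAG = 0x12
END_FLAG = 0x13
DLE = 0x7D

WAIT_HEADER, IN_MSG, AFTER_DLE = range(3)


class FrameReceiver(object):
    def __init__(self):
        self.state = WAIT_HEADER
        self.buffer = bytearray()

    def feed(self, byte):
        """Process one received byte; return a full message or None."""
        if self.state == WAIT_HEADER:
            if byte == START_FLAG:
                self.buffer = bytearray()
                self.state = IN_MSG
            # Any other byte is stray and is silently dropped.
        elif self.state == IN_MSG:
            if byte == END_FLAG:
                self.state = WAIT_HEADER
                return bytes(self.buffer)
            elif byte == DLE:
                self.state = AFTER_DLE
            elif byte == START_FLAG:
                # Assumption: an unescaped start flag begins a new frame.
                self.buffer = bytearray()
            else:
                self.buffer.append(byte)
        else:  # AFTER_DLE: take the byte literally, whatever it is
            self.buffer.append(byte)
            self.state = IN_MSG
        return None


# A stray 0xFF followed by one frame (hypothetical sample bytes).
stream = [0xFF, 0x12, 0x01, 0x02, 0x7D, 0x13, 0x04, 0x7D, 0x7D, 0x05, 0x13]
receiver = FrameReceiver()
messages = []
for b in stream:
    msg = receiver.feed(b)
    if msg is not None:
        messages.append(msg)
print(messages)
```

Feeding one byte at a time fits the serial port well, since the receiver never needs to know in advance how many bytes to read.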

Finally, we see that the message received is exactly the message sent. All the protocol details (flags, escapes and so on) were transparently handled by the data link layer [8].

Conclusion

There are several methods of handling framing in communications, although most are unsuitable for use on top of the serial port. Among the ones that are suitable, the most commonly used is byte stuffing. By defining a couple of "magic value" flags and careful rules of escaping, this framing method is both robust and easy to implement as a software layer. It is also widely used: PPP, for one, depends on it.

Finally, it’s important to remember that for a high level of robustness, it’s required to add some kind of error checking into the protocol – such as computing a CRC on the message and appending it as the last word of the message, which the receiver can verify before deciding that the message is valid.
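
As a rough sketch of that idea in Python, the snippet below appends a CRC to a message and verifies it on the other side. The specifics are assumptions, not something the protocol above prescribes: it uses the CRC-32 from the standard `zlib` module, stores it as 4 big-endian bytes, and the helper names `append_crc` and `verify_crc` are invented for the example. The CRC would be appended before byte stuffing, so that any flag values inside the CRC itself get escaped like the rest of the data.

```python
import struct
import zlib

CRC_SIZE = 4  # assumption: CRC-32 stored as 4 big-endian bytes


def append_crc(data):
    """Return data with its CRC-32 appended as the last word."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return data + struct.pack(">I", crc)


def verify_crc(message):
    """Return the payload if the trailing CRC matches, else None."""
    if len(message) < CRC_SIZE:
        return None
    data, received = message[:-CRC_SIZE], message[-CRC_SIZE:]
    expected = struct.pack(">I", zlib.crc32(data) & 0xFFFFFFFF)
    return data if received == expected else None


message = append_crc(b"\x01\x02\x13\x04")

# Simulate line noise: flip a single bit in the first byte.
corrupted = bytearray(message)
corrupted[0] ^= 0x01
corrupted = bytes(corrupted)

print(verify_crc(message) is not None)
print(verify_crc(corrupted) is not None)
```

A receiver would call something like `verify_crc` on each unwrapped frame and simply drop the frame when it returns `None`.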

[1] The Data Link Layer is layer 2 in the OSI model. In the TCP/IP model it’s simply called the "link layer".
[2] The serial port can be configured to add parity bits to bytes. These days, this option is rarely used, because:
  • A single parity bit isn’t a very strong means of detecting errors. 2-bit errors fool it.
  • Error handling is usually done by stronger means at a higher level.
[3] For example Ethernet (802.3) uses 12 octets of idle characters between frames.
[4] You might run into the term DLE – Data Link Escape, which means the same thing. I will use the acronyms DLE and ESC interchangeably.
[5] Just like quotes and escape characters in strings! In C: "I say \"Hello\"". To escape the escape, repeat it: "Here comes the backslash: \\ - seen it?"
[6] I’d love to hear why this XOR-ing is required. One simple reason I can think of is to prevent the flag and escape bytes appearing "on the line" even after they’re escaped. Presumably this improves resynchronization if the escape byte is lost?
[7] Which is just a fancy way to say "a protocol wrapping function", since the layer is implemented in software.
[8] Such transparency is one of the greatest ideas of layered network protocols. It’s a good thing to keep in mind when we implement protocols in software: transparency aids modularity and decoupling.

6 Responses to “Framing in serial communications”

  1. Joseph Lisee Says:

    The character count method works fine if you have a relatively reliable link, checksummed packets and a defined resynchronization protocol. I have used such a system for real time robotics control for several years and I have had no issues. In the simplest terms, when the receiver detects a bad packet it returns an error code which causes the sender to send sync packets at the receiver until the receiver responds with a valid response.

    Of course the serial link in question is really done over USB through an onboard FTDI serial USB converter chip, so the chance of corrupt data is much lower than with normal analog serial comms.

  2. Marcelo MD Says:

    How do you compare it with something like:
    [header][size][data][crc]
    Byte stuffing sounds good when you have an ‘open ended’ byte stream, or when you have enough memory to hold the entire packet before processing it in the upper layers.

    I prefer little packets, but it might just be my ethernet background. Maybe I should experiment with it.

  3. eliben Says:

    @Joseph,
    As you said yourself, you have a very reliable link, so most issues don’t apply. In addition, your feedback-based resync method is considerably more complex than simple byte stuffing.

    @Marcelo,
    header + size can work, but you must byte stuff as well. This is why: suppose your data has a header byte somewhere, by chance. This easily happens when you’re transmitting some kind of physical measurement. Now, if your size byte is corrupt just once, you may never be able to resynchronize as the “fake” header byte will be tried as the message header over and over again.

    But as long as you use escaping for the header (and for the escape char), this method isn’t much different from header + footer.

  4. Marcelo MD Says:

    Oh, I see it now. I’ve never thought about getting the wrong size. Silly me.
    Thanks a lot =)

  5. Matthew Says:

    I understand the necessity of having the framing flags and escape characters, but I’m still unsure of how to handle bad packets on the sender’s side. Does the sender keep a quasi-infinite list of all the messages sent, deleting them as a valid reply is returned? Or does the sender poll the serial port awaiting a valid reply before it sends another message, or resends the bad message?

  6. eliben Says:

    @matthew,
    That’s a whole different level. My post refers to the data link layer, where the goal is to get a packet across. What you’re interested in is the transport layer (at least in the TCP/IP stack land). Read up on TCP – it uses a sliding window protocol to handle bad packets that drop from time to time, with minimal losses.
