Eli Bendersky's website - EE & Embedded

Sum of same-frequency sinusoids

2023-03-11T19:44:00-08:00

I was reviewing an electronics textbook the other day, and it made an offhand comment that "sinusoidal signals of the same frequency always add up to a sinusoid, even if their magnitudes and phases are different". This gave me pause; is that really so? Even with different phases?

Using EE notation, a sinusoidal signal with magnitude A_1, frequency w and phase \phi_1 is A_1 sin(wt+\phi_1) [1]. The book's statement amounts to:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=A_3 sin(wt+\phi_3)\]

The sum is also a sinusoid with the same frequency, but potentially different magnitude and phase. I couldn't find this equality in any of my reference books, so why is it true?

Empirical probing

Let's start by asking whether this is true at all. It's not at all obvious that this should work. Armed with Python, NumPy and matplotlib, I plotted two sinusoidal signals with the same frequency but different magnitudes and phases:

Now, plotting their sum in green on the same chart:

Well, look at that. It seems to be working. I guess it's time to prove it.

Proof using trig identities

The first proof I want to demonstrate doesn't use any fancy math beyond some basic trigonometric identities. One of the best-known ones is:

\[sin(a+b)=sin(a)cos(b)+cos(a)sin(b) \hspace{2cm} (id. 1)\]

Taking our sum of sinusoids:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)\]

Applying (id.1) to each of the terms, and then regrouping, we get:

\[\begin{align*} <sum>&=A_1\left [sin(wt)cos(\phi_1)+cos(wt)sin(\phi_1) \right ]+A_2\left [sin(wt)cos(\phi_2)+cos(wt)sin(\phi_2) \right ]\\ &=\left [A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ]sin(wt)+\left [ A_1 sin(\phi_1) + A_2 sin(\phi_2)\right ]cos(wt)\\ \end{align*}\]

Now, a change of variables trick: we'll assume we can solve the following set of equations for some B and \theta [2]:

\[\begin{align*} Bcos(\theta)&=A_1 cos(\phi_1)+A_2 cos(\phi_2) \hspace{2cm} (1)\\ Bsin(\theta)&=A_1 sin(\phi_1)+A_2 sin(\phi_2) \hspace{2cm} (2)\\ \end{align*}\]

To find B, we can square each of (1) and (2) and then add the squares together:

\[B^2 cos^2 (\theta)+B^2 sin^2 (\theta)=(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2\]

Using the fact that cos^2(a)+sin^2(a)=1, we get:

\[B=\sqrt{(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2}\]

To solve for \theta, we can divide equation (2) by (1), getting:

\[\frac{sin(\theta)}{cos(\theta)}=tan(\theta)=\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}\]

Meaning that:

\[\theta = atan{\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}}\]

Now that we have the values of B and \theta, let's put them aside for a bit and get back to the final line of our sum of sinusoids equation:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=\left [A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ]sin(wt)+\left [ A_1 sin(\phi_1) + A_2 sin(\phi_2)\right ]cos(wt)\]

On the right-hand side, we can apply equations (1) and (2) to get:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=B cos(\theta) sin(wt)+ B sin(\theta) cos(wt)\]

Applying (id.1) again, we get:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=B sin(wt + \theta)\]

We've just shown that the sum of sinusoids with the same frequency w is another sinusoid with frequency w, and we've calculated B and \theta from the other parameters (A_1, A_2, \phi_1 and \phi_2) \blacksquare
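As a quick numeric sanity check of this result, here's a minimal Python sketch that computes B and \theta from equations (1) and (2) and compares both sides of the final equality at a few sample times. The function name and the sample values are made up for illustration. One assumption worth flagging: the sketch uses math.atan2 rather than a plain atan, so that \theta lands in the right quadrant when the right-hand side of (1) is negative (as it is for the values chosen here).

```python
import math

def combine_sinusoids(a1, phi1, a2, phi2):
    """Return (B, theta) such that
    a1*sin(wt + phi1) + a2*sin(wt + phi2) == B*sin(wt + theta)."""
    x = a1 * math.cos(phi1) + a2 * math.cos(phi2)  # B*cos(theta), eq. (1)
    y = a1 * math.sin(phi1) + a2 * math.sin(phi2)  # B*sin(theta), eq. (2)
    return math.hypot(x, y), math.atan2(y, x)

# Arbitrary example values, chosen so that eq. (1) comes out negative.
A1, PHI1, A2, PHI2, W = 1.5, 0.4, 2.0, 2.9, 3.0
B, THETA = combine_sinusoids(A1, PHI1, A2, PHI2)

max_err = 0.0
for n in range(200):
    t = n * 0.01
    lhs = A1 * math.sin(W * t + PHI1) + A2 * math.sin(W * t + PHI2)
    rhs = B * math.sin(W * t + THETA)
    max_err = max(max_err, abs(lhs - rhs))

print(B, THETA, max_err)  # max_err should be at floating-point noise level
```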

Proof using complex numbers

The second proof uses a bit more advanced math, but overall feels more elegant to me. The plan is to use Euler's equation and prove a more general statement on the complex plane.

Instead of looking at the sum of real sinusoids, we'll first look at the sum of two complex exponential functions:

\[A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}\]

Reminder: Euler's equation for a complex exponential is

\[e^{jx}=cosx+jsinx\]

Regrouping our sum of exponentials a bit and then applying this equation:

\[\begin{align*} A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}&=e^{jwt}\left (A_1 e^{j\phi_1} + A_2 e^{j\phi_2}\right )\\ &=e^{jwt}\left ( A_1 cos(\phi_1) + jA_1 sin(\phi_1) + A_2 cos(\phi_2) + jA_2 sin(\phi_2)\right )\\ &=e^{jwt}\left [\left (A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ) + j\left(A_1 sin(\phi_1) + A_2 sin(\phi_2) \right ) \right ] \end{align*}\]

The value inside the square brackets can be viewed as a complex number in its rectangular form: x + jy. We can convert it to its polar form: re^{j\theta}, by calculating:

\[\begin{align*} r&=\sqrt{x^2+y^2}\\ \theta&=atan(\frac{y}{x}) \end{align*}\]

In our case:

\[r=\sqrt{(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2}\]

And:

\[\theta = atan{\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}}\]

Therefore, the sum of complex exponentials is another complex exponential with the same frequency, but a different magnitude and phase:

\[A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}= e^{jwt} r e^{j \theta}=r e^{j(wt + \theta)}\]

From here, we can use Euler's equation again to see the equivalence in terms of sinusoidal functions:

\[\begin{align*} A_1 cos(wt+\phi_1)+jA_1 sin(wt+\phi_1)&+\\ A_2 cos(wt+\phi_2)+jA_2 sin(wt+\phi_2)&=r cos(wt+\theta) + jr sin(wt+\theta) \end{align*}\]

If we only compare the imaginary parts of this equation, we get:

\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=r sin(wt+\theta)\]

With r and \theta as calculated earlier from the other constants \blacksquare

Note that by comparing the real parts of the equation, we can trivially prove a similar statement about the sum of cosines (which should surprise no one, since a cosine is just a phase-shifted sine).
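The complex-number route maps almost directly onto Python's cmath module. The following is only a sketch with invented names and sample values: it adds the two phasors A_1 e^{j\phi_1} and A_2 e^{j\phi_2}, converts the sum to polar form, and then checks both the sine (imaginary part) and cosine (real part) versions of the identity numerically.

```python
import cmath
import math

def add_phasors(a1, phi1, a2, phi2):
    """Add A1*e^{j*phi1} and A2*e^{j*phi2}; return (r, theta) of the sum."""
    total = cmath.rect(a1, phi1) + cmath.rect(a2, phi2)
    return cmath.polar(total)  # (magnitude, phase in radians)

# Arbitrary example values.
A1, PHI1, A2, PHI2, W = 1.5, 0.4, 2.0, 2.9, 3.0
R, THETA = add_phasors(A1, PHI1, A2, PHI2)

def residuals(t):
    """Left side minus right side, for the sine and the cosine forms."""
    sin_diff = (A1 * math.sin(W * t + PHI1) + A2 * math.sin(W * t + PHI2)
                - R * math.sin(W * t + THETA))
    cos_diff = (A1 * math.cos(W * t + PHI1) + A2 * math.cos(W * t + PHI2)
                - R * math.cos(W * t + THETA))
    return sin_diff, cos_diff

worst = max(max(abs(d) for d in residuals(n * 0.01)) for n in range(200))
print(R, THETA, worst)  # worst should be at floating-point noise level
```

Note that cmath.polar picks the phase with quadrant information taken into account, which a bare atan(y/x) would not do when x is negative.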

[1]	Electrical engineers prefer their signal frequencies in units of radian per second. We also like calling the imaginary unit j instead of i, because the latter is used for electrical current.

[2]	If you're wondering "hold on, why would this work?", recall that any point (x,y) on the Cartesian plane can be represented using polar coordinates with magnitude B and angle \theta.

Some notes on Luz - an assembler, linker and CPU simulator

2017-01-05T06:27:00-08:00

A few years ago I wrote about Luz - a self-educational project to implement a CPU simulator and a toolchain for it, consisting of an assembler and a linker. Since then, I received some questions by email that made me realize I could do a better job explaining what the project is and what one can learn from it.

So I went back to the Luz repository and fixed it up to be more modern, in line with current documentation standards on GitHub. The landing README page should now provide a good overview, but I also wanted to write up some less formal documentation I could point to - a place to show off some of the more interesting features in Luz; a blog post seemed like the perfect medium for this.

As before, it makes sense to start with the Luz toplevel diagram:

Luz is a collection of related libraries and programs written in Python, implementing all the stages shown in the diagram above.

The CPU simulator

The Luz CPU is inspired by MIPS (for the instruction set), by Altera Nios II (for the way "peripherals" are attached to the CPU), and by MPC 555 (for the memory controller) and is aimed at embedded uses, like Nios II. The Luz user manual lists the complete instruction set, explaining what each instruction means.

The simulator itself is functional only - it performs the instructions one after the other, without trying to simulate how long their execution takes. It's not very remarkable and is designed to be simple and readable. The most interesting feature it has, IMHO, is how it maps "peripherals" and even CPU control registers into memory. Rather than providing special instructions or traps for OS system calls, Luz facilitates "bare-metal" programming (by which I mean, without an OS) by mapping "peripherals" into memory, allowing the programmer to access them by reading and writing special memory locations.

My inspiration here was soft-core embeddable CPUs like Nios II, which let you configure what peripherals to connect and how to map them. The CPU can be configured before it's loaded onto real HW, for example to attach as many SPI interfaces as needed. For Luz, to create a new peripheral and attach it to the simulator one implements the Peripheral interface:

class Peripheral(object):
    """ An abstract memory-mapped peripheral interface.
        Memory-mapped peripherals are accessed through memory
        reads and writes.

        The address given to reads and writes is relative to the
        peripheral's memory map.
        Width is 1, 2, 4 for byte, halfword and word accesses.
    """
    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()

Luz implements some built-in features as peripherals as well; for example, the core registers (interrupt control, exception control, etc). The idea here is that embedded CPUs can have multiple custom "registers" to control various features, and creating dedicated names for them bloats instruction encoding (you need 5 bits to encode one of 32 registers, etc.); it's better to just map them to memory.

Another example is the debug queue - a peripheral useful for testing and debugging. It's a single word mapped to address 0xF0000 in the simulator. When the peripheral gets a write, it stores it in a special queue and optionally emits the value to stdout. The queue can later be examined. Here is a simple Luz assembly program that makes use of it:

# Counts from 0 to 9 [inclusive], pushing these numbers into the debug queue

    .segment code
    .global asm_main

    .define ADDR_DEBUG_QUEUE, 0xF0000

asm_main:
    li $k0, ADDR_DEBUG_QUEUE

    li $r9, 10                          # r9 is the loop limit
    li $r5, 0                           # r5 is the loop counter

loop:
    sw $r5, 0($k0)                      # store loop counter to debug queue
    addi $r5, $r5, 1                    # increment loop counter
    bltu $r5, $r9, loop                 # loop back if not reached limit

    halt

Using the interactive runner to run this program we get:

$ python run_test_interactive.py loop_simple_debugqueue
DebugQueue: 0x0
DebugQueue: 0x1
DebugQueue: 0x2
DebugQueue: 0x3
DebugQueue: 0x4
DebugQueue: 0x5
DebugQueue: 0x6
DebugQueue: 0x7
DebugQueue: 0x8
DebugQueue: 0x9
Finished successfully...
Debug queue contents:
['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7', '0x8', '0x9']
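To make the Peripheral interface a bit more concrete, here's a rough sketch of how a debug-queue-like peripheral could be written against it. This is not Luz's actual implementation: the class name SketchDebugQueue, its constructor argument and its behavior on reads are all assumptions. Only the overall behavior (store each written word, optionally echo it to stdout) follows the description above.

```python
class Peripheral(object):
    """Same interface as above: relative addresses, width of 1, 2 or 4."""
    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()


class SketchDebugQueue(Peripheral):
    """Hypothetical stand-in for a debug queue; not Luz's own class."""
    def __init__(self, print_values=True):
        self.items = []
        self.print_values = print_values

    def read_mem(self, addr, width):
        # Assumption: reads are not meaningful for this peripheral.
        return 0

    def write_mem(self, addr, width, data):
        # Only a single word at relative address 0 is mapped.
        if addr != 0 or width != 4:
            raise ValueError('debug queue accepts word writes at offset 0')
        self.items.append(data)
        if self.print_values:
            print('DebugQueue: 0x%X' % data)


queue = SketchDebugQueue()
for value in range(10):  # mirrors the assembly loop above
    queue.write_mem(0, 4, value)
print([hex(v) for v in queue.items])
```

In a simulator, the memory unit would subtract the peripheral's base address (0xF0000 for the debug queue) before calling write_mem, which is why the sketch only ever sees offset 0.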

Assembler

There's a small snippet of Luz assembly shown above. It's your run-of-the-mill RISC assembly, with the familiar set of instructions, fairly simple addressing modes and almost every instruction requiring registers (note how we can't store into the debug queue directly, for example, without dereferencing a register that holds its address).

The Luz user manual contains a complete reference for the instructions, including their encodings. Every instruction is a 32-bit word, with the 6 high bits for the opcode (meaning up to 64 distinct instructions are supported).
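As a small illustration of that encoding, here's a sketch of pulling the opcode out of an instruction word and putting one in. Only the position of the opcode (the 6 high bits of a 32-bit word) comes from the description above; the helper names are invented, the opcode value 0x2A is arbitrary rather than a real Luz opcode, and the layout of the remaining 26 bits is deliberately left opaque.

```python
def opcode_of(word):
    """Extract the opcode from the 6 high bits of a 32-bit instruction word."""
    return (word >> 26) & 0x3F

def with_opcode(opcode, low_bits=0):
    """Build a word with the given opcode; low_bits fills the other 26 bits."""
    if not 0 <= opcode < 64:
        raise ValueError('opcode must fit in 6 bits')
    return (opcode << 26) | (low_bits & 0x03FFFFFF)

word = with_opcode(0x2A, low_bits=0x123456)
print(hex(word), opcode_of(word))  # 0xa8123456 42
```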

The code snippet also shows off some special features of the full Luz toolchain, like the special label asm_main. I'll discuss these later on in the section about linking.

Assembly languages are usually fairly simple to parse, and Luz is no exception. When I started working on Luz, I decided to use the PLY library for the lexer and parser mainly because I wanted to play with it. These days I'd probably just hand-roll a parser.

Luz takes another cool idea from MIPS - register aliases. While the assembler doesn't enforce any specific ABI on the coder, some conventions are very important when writing large assembly programs, and especially when interfacing with routines written by other programmers. To facilitate this, Luz designates register aliases for callee-saved registers and temporary registers.

For example, the general-purpose register number 19 can be referred to in Luz assembly as $r19 but also as $s1 - the callee-saved register 1. When writing standalone Luz programs, one is free to ignore these conventions. To get a taste of how ABI-conformant Luz assembly would look, take a look at this example.

To be honest, ABI was on my mind because I was initially envisioning a full programming environment for Luz, including a C compiler. When you have a compiler, you must have some set of conventions for generated code like procedure parameter passing, saved registers and so on; in other words, the platform ABI.

Linker

In my view, one of the distinguishing features of Luz from other assembler projects out there is the linker. Luz features a full linker that supports creating single "binaries" from multiple assembly files, handling all the dirty work necessary to make that happen. Each assembly file is first "assembled" into a position-independent object file; these are glued together by the linker which applies the necessary relocations to resolve symbols across object files. The prime sieve example shows this in action - the program is divided into three .lasm files: two for subroutines and one for "main".

As we've seen above, the main subroutine in Luz is called asm_main. This is a special name for the linker (not unlike the _start symbol for modern Linux assemblers). The linker collects a set of object files produced by assembly, and makes sure to invoke asm_main from the special location 0x100000. This is where the simulator starts execution.

Luz also has the concept of object files. They are not unlike ELF images in nature: there's a segment table, an export table and a relocation table for each object, serving the expected roles. It is the job of the linker to make sense of this list of objects and correctly connect all call sites to final subroutine addresses.

Luz's standalone assembler can write an assembled image into a file in Intel HEX format, a popular format used in embedded systems to encode binary images or data in ASCII.
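For readers who haven't met Intel HEX before, here's a minimal sketch of formatting a single data record. It's based on the general format (byte count, 16-bit address, record type, data, checksum) and is not taken from Luz's own writer; the function name and the sample bytes are illustrative only.

```python
def ihex_data_record(address, data):
    """Format one Intel HEX data record (type 00) holding up to 255 bytes."""
    if not 0 <= address <= 0xFFFF or len(data) > 255:
        raise ValueError('record needs a 16-bit address and at most 255 bytes')
    body = bytearray([len(data), address >> 8, address & 0xFF, 0x00])
    body.extend(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    body.append(checksum)
    return ':' + ''.join('%02X' % b for b in body)

sample = bytes.fromhex('214601360121470136007EFE09D21901')
record = ihex_data_record(0x0100, sample)
print(record)  # :10010000214601360121470136007EFE09D2190140
```

A complete file would also need an end-of-file record, and extended address records for images located above 64 KB; both are left out of this sketch.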

The linker was quite a bit of effort to develop. Since all real Luz programs are small I didn't really need to break them up into multiple assembly files; but I really wanted to learn how to write a real linker :) Moreover, as already mentioned my original plans for Luz included a C compiler, and that would make a linker very helpful, since I'd need to link some "system" code into the user's program. Even today, Luz has some "startup code" it links into every image:

# The special segments added by the linker.
# __startup: 3 words
# __heap: 1 word
#
LINKER_STARTUP_CODE = string.Template(r'''
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main

        .segment __heap
        .global __heap
    __heap:
        .word 0
''')

This code sets up the stack pointer to the initial address allocated for the stack, and calls the user's asm_main.
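The template relies on two features of Python's string.Template: $$ is an escape that produces a literal $ (needed for the $sp register name), and ${SP_POINTER} is a placeholder filled in at link time. Here's a small sketch of the substitution using a trimmed copy of the template. The value 0x13FFFC is the stack pointer visible in the debugger session later in this post; how the linker arrives at it isn't shown here, so treat the call below as an assumption about usage.

```python
import string

STARTUP_SKETCH = string.Template(r'''
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main
''')

# Fill in the placeholder; "$$sp" turns into the register name "$sp".
startup_asm = STARTUP_SKETCH.substitute(SP_POINTER='0x%X' % 0x13FFFC)
print(startup_asm)
```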

Debugger and disassembler

Luz comes with a simple program runner that will execute a Luz program (consisting of multiple assembly files); it also has an interactive mode - a debugger. Here's a sample session with the simple loop example shown above:

$ python run_test_interactive.py -i loop_simple_debugqueue

LUZ simulator started at 0x00100000

[0x00100000] [lui $sp, 0x13] >> set alias 0
[0x00100000] [lui $r29, 0x13] >> s
[0x00100004] [ori $r29, $r29, 0xFFFC] >> s
[0x00100008] [call 0x40003 [0x10000C]] >> s
[0x0010000C] [lui $r26, 0xF] >> s
[0x00100010] [ori $r26, $r26, 0x0] >> s
[0x00100014] [lui $r9, 0x0] >> s
[0x00100018] [ori $r9, $r9, 0xA] >> s
[0x0010001C] [lui $r5, 0x0] >> s
[0x00100020] [ori $r5, $r5, 0x0] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> r
$r0   = 0x00000000   $r1   = 0x00000000   $r2   = 0x00000000   $r3   = 0x00000000
$r4   = 0x00000000   $r5   = 0x00000002   $r6   = 0x00000000   $r7   = 0x00000000
$r8   = 0x00000000   $r9   = 0x0000000A   $r10  = 0x00000000   $r11  = 0x00000000
$r12  = 0x00000000   $r13  = 0x00000000   $r14  = 0x00000000   $r15  = 0x00000000
$r16  = 0x00000000   $r17  = 0x00000000   $r18  = 0x00000000   $r19  = 0x00000000
$r20  = 0x00000000   $r21  = 0x00000000   $r22  = 0x00000000   $r23  = 0x00000000
$r24  = 0x00000000   $r25  = 0x00000000   $r26  = 0x000F0000   $r27  = 0x00000000
$r28  = 0x00000000   $r29  = 0x0013FFFC   $r30  = 0x00000000   $r31  = 0x0010000C

[0x00100028] [addi $r5, $r5, 0x1] >> s 100
[0x00100030] [halt] >> q

There are many interesting things here demonstrating how Luz works:

Note the start up at 0x100000 - this is where Luz places the start-up segment - three instructions that set up the stack pointer and then call the user's code (asm_main). The user's asm_main starts running at the fourth instruction executed by the simulator.
li is a pseudo-instruction, broken into two real instructions: lui for the upper half of the register, followed by ori for the lower half of the register. The reason for this is li having a 32-bit immediate, which can't fit in a Luz instruction. Therefore, it's broken into two parts which only need 16-bit immediates. This trick is common in RISC ISAs.
Jump labels are resolved to be relative by the assembler: the jump to loop is replaced by -2.
Disassembly! The debugger shows the instruction decoded from every word where execution stops. Note how this exposes pseudo-instructions.
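Two of these points are easy to reproduce with a few lines of Python. The helpers below are hypothetical (they are not part of Luz); the second one encodes an inference from the session above, namely that the branch offset is counted in instruction words relative to the branch's own address.

```python
def split_li(imm32):
    """Split a 32-bit immediate into (upper16, lower16) for lui + ori."""
    imm32 &= 0xFFFFFFFF
    return imm32 >> 16, imm32 & 0xFFFF

def branch_offset(branch_addr, target_addr):
    """Offset in 4-byte instruction words from the branch to its target."""
    return (target_addr - branch_addr) // 4

print([hex(h) for h in split_li(0xF0000)])   # ['0xf', '0x0']     li $k0, ...
print([hex(h) for h in split_li(0x13FFFC)])  # ['0x13', '0xfffc'] li $sp, ...
print(branch_offset(0x10002C, 0x100024))     # -2                 bltu ... loop
```

The printed halves match the lui/ori pairs in the session, and the offset matches the -2 shown for bltu.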

The in-progress RTL implementation

Luz was a hobby project, but an ambitious one :-) Even before I wrote the first line of the assembler or simulator, I started working on an actual CPU implementation in synthesizable VHDL, meaning to get a complete RTL image to run on FPGAs. Unfortunately, I didn't finish this part of the project and what you find in Luz's experimental/luz_uc directory is only 75% complete. The ALU is there, the registers, the hookups to peripherals, even parts of the control path - dealing with instruction fetching, decoding, etc. My original plan was to implement a pipelined CPU (a RISC ISA makes this relatively simple), which perhaps was a bit too much. I should have started simpler.

Conclusion

Luz was an extremely educational project for me. When I started working on it, I mostly had embedded programming experience and was just starting to get interested in systems programming. Luz flung me into the world of assemblers, linkers, binary images, calling conventions, and so on. Besides, Python was a new language for me at the time - Luz started just months after I first got into Python.

Its ~8000 lines of Python code are thus likely not my best Python code, but they should be readable and well commented. I did modernize it a bit over the years, for example to make it run on both Python 2 and 3.

I still hope to get back to the RTL implementation project one day. It's really very close to being able to run realistic assembly programs on real hardware (FPGAs). My dream back then was to fully close the loop by adding a Luz code generation backend to pycparser. Maybe I'll still fulfill it one day :-)

Introducing Luz

2010-05-05T19:43:38-07:00

OK, so the documentation still isn't complete, but I can't wait to introduce my newest concoction - Luz. Luz is a pure-Python implementation of a MIPS-like CPU (as a simulator, of course). This CPU is programmable in an assembly language, a complete assembler for which has been implemented, along with a linker that takes several object files and combines them into an executable image to run on the simulator. Oh, and did I mention that it also includes a rudimentary debugger and disassembler? All of this is Luz:

To call Luz new is a bit of a stretch, because I started working on it more than two years ago. It has been a jagged road, with occasional spurts of productivity, but now Luz is finally in a presentable form.

I'll paste from its "getting started guide":

What is Luz useful for? I don't know yet. It's a self-educational project of mine, and I learned a lot by working on it. I suppose that Luz's main value is as an educational tool. Its implementation focuses on simplicity and modularity, and is done in Python, which is a portable and very readable high-level language. Luz can serve as a sample of implementing a complete assembler, a complete linker, a complete CPU simulator. Other such tools exist, but usually not in the clean and self-contained form offered by Luz. In any case, if you've found Luz useful, I'd love to receive feedback.

This summarizes it, really. Not much more to add, except that Luz is available in source-only form for now, so you'll have to check it out from SVN or just look at the sources in the online browser. Checking the source out is recommended because it allows one to view the documentation in nice HTML format. A few example programs in Luz assembly are available. Luz requires Python 2.6 or higher and the PLY module installed. I tested it on Windows XP and Ubuntu.

I've written an assembler and a CPU simulator before, but that was for a very weird architecture (Knuth's MIX from TAOCP). Luz is a much more useful beast - the CPU is not far from real modern CPUs (the embedded kind, mostly), the assembly language is familiar and best of all, Luz also includes a linker, which will make it much easier to compile C for it in the future.

I'll write more about Luz sometime later, when I find the time to work on its documentation.

Framing in serial communications

2009-08-12T05:16:47-07:00

Introduction

In the previous post we've seen how to send and receive data on the serial port with Python and plot it live using a pretty GUI.

Notice that the sender script (sender_sim.py) is just sending one byte at a time. The "chunks" of data in the protocol between the sender and receiver are single bytes. This is simple and convenient, but hardly sufficient in the general sense. We want to be able to send multiple-byte data frames between the communicating parties.

However, there are some challenges that arise immediately:

The receiver is just receiving a stream of bytes from the serial port. How does it know when a message begins or ends? How does it know how long the message is?
Even more seriously, we can not assume a noise-free channel. This is real, physical hardware stuff. Bytes and whole chunks can and will be lost due to electrical noise. Worse, other bytes will be distorted (say, a single bit can be flipped due to noise).

To see how this can be done in a safe and tested manner, we first have to learn about the basics of the Data Link Layer in computer networks.

Data Link Layer

Given a physical layer that can transmit signals between devices, the job of the Data Link Layer [1] is (roughly stated) to transmit whole frames of data, with some means of assuring the integrity of the data (lack of errors). When we use sockets to communicate over TCP or UDP on the internet, the framing is taken care of deep in the hardware, and we don't even feel it. On the serial port, however, we must take care of the framing and error handling ourselves [2].

Framing

In chapter 3 of his "Computer Networks" textbook, Tanenbaum defines the following methods of framing:

1. Inserting time gaps between frames
2. Physical layer coding violations
3. Character count
4. Flag bytes with byte stuffing
5. Flag bytes with bit stuffing

Methods (1) and (2) are only suitable for a hardware-implemented data link layer [3]. It is very difficult (read: impossible) to ensure timing when multiple layers of software (running on Windows!) are involved. (2) is an interesting hardware method - but out of the scope of this article.

Method (3) means specifying in the frame header the number of bytes in the frame. The trouble with this is that the count can be garbled by a transmission error. In such a case, it's very difficult to "resynchronize". This method is rarely used.

Methods (4) and (5) are somewhat similar. In this article I'll focus on (4), as (5) is not suitable for serial port communications.

Flag bytes with byte stuffing

Let's begin with a simple idea and develop it into a full, robust scheme.

Flag bytes are special byte values that denote when a frame begins and ends. Suppose that we want to be able to send frames of arbitrary length. A special start flag byte will denote the beginning of the frame, and an end flag byte will denote its end.

A question arises, however. Suppose that the value of the end flag is 0x98. What if the value 0x98 appears somewhere in the data? The protocol will get confused and end the message.

There is a simple solution to this problem that will be familiar to all programmers who know about escaping quotes and special characters in strings. It is called byte stuffing, or octet stuffing, or simply escaping [4]. The scheme goes as follows:

Whenever a flag (start or end) byte appears in the data, we shall insert a special escape byte (ESC) before it. When the receiver sees an ESC, it knows to ignore it and not insert it into the actual data received (de-stuffing).
Whenever ESC itself has to appear in the data, another ESC is prepended to it. The receiver removes the first one but keeps the second one [5].

Here are a few examples:

Note that we didn't specify what the data is - it's arbitrary and up to the protocol to decide. The only really required part of the data is some kind of error checking - a checksum, or better yet a CRC. This is customarily the last byte (or last word) of the frame, referring to all the bytes in the frame (in its un-stuffed form).

This scheme is quite robust: any lost byte (be it a flag, an escape, a data byte or a checksum byte) will cause the receiver to lose just one frame, after which it will resynchronize onto the start flag byte of the next one.

PPP

As a matter of fact, this method is a slight simplification of the Point-to-Point Protocol (PPP) which is used by most ISPs for providing ADSL internet to home users, so there's a good chance you're using it now to surf the net and read this article! The framing of PPP is defined in RFC 1662.

In particular, PPP does the following:

Both the start and end flag bytes are 0x7E (they shouldn't really be different, if you think about it)
The escape byte is 0x7D
Whenever a flag or escape byte appears in the message, it is escaped by 0x7D and the byte itself is XOR-ed with 0x20. So, for example 0x7E becomes 0x7D 0x5E. Similarly 0x7D becomes 0x7D 0x5D. The receiver unstuffs the escape byte and XORs the next byte with 0x20 again to get the original [6].
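The PPP escaping rule is short enough to sketch in Python. The function names and the sample payload below are made up, and the sketch covers only the escaping rule from the list above; it leaves out everything else RFC 1662 specifies (the surrounding flags, the frame check sequence and so on).

```python
PPP_FLAG = 0x7E
PPP_ESC = 0x7D

def ppp_escape(payload):
    """Escape flag and escape bytes: emit 0x7D, then the byte XOR 0x20."""
    out = bytearray()
    for b in bytearray(payload):
        if b in (PPP_FLAG, PPP_ESC):
            out.append(PPP_ESC)
            out.append(b ^ 0x20)
        else:
            out.append(b)
    return bytes(out)

def ppp_unescape(stuffed):
    """Reverse ppp_escape: drop each 0x7D and XOR the following byte."""
    data = bytearray(stuffed)
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == PPP_ESC:
            i += 1
            if i == len(data):
                raise ValueError('dangling escape byte')
            b = data[i] ^ 0x20
        out.append(b)
        i += 1
    return bytes(out)

sample = bytes(bytearray([0x01, 0x7E, 0x02, 0x7D, 0x03]))
escaped = ppp_escape(sample)
print(escaped.hex())  # 017d5e027d5d03
```

After escaping, neither 0x7E nor a doubled 0x7D can appear inside the payload, so a bare 0x7E on the line always marks a frame boundary.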

An example

Let's now see a completely worked-out example that demonstrates how this works.

Suppose we define the following protocol:

Start flag: 0x12
End flag: 0x13
Escape (DLE): 0x7D

And the sender wants to send the following data message (let's ignore its contents for the sake of the example - they're really not that important). The original data is in (a):

The data contains two flags that need to be escaped - an end flag at position 2 (counting from 0, of course!), and a DLE at position 4.

The sender's data link layer [7] turns the data into the frame shown in (b) - start and end flags are added, and in-message flags are escaped.
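In code, the sender's side of this example protocol could look roughly like the sketch below. The flag values come from the protocol definition above, but the function name and the payload are hypothetical: only the end flag at position 2 and the DLE at position 4 follow the description, and the other byte values are invented.

```python
START_FLAG = 0x12
END_FLAG = 0x13
DLE = 0x7D

def build_frame(payload):
    """Wrap payload in start/end flags, prefixing special bytes with DLE."""
    frame = bytearray([START_FLAG])
    for b in bytearray(payload):
        if b in (START_FLAG, END_FLAG, DLE):
            frame.append(DLE)  # escape flags and the escape byte itself
        frame.append(b)
    frame.append(END_FLAG)
    return bytes(frame)

# Hypothetical payload: end flag at index 2, DLE at index 4.
payload = bytes(bytearray([0x10, 0x20, 0x13, 0x30, 0x7D, 0x40]))
frame = build_frame(payload)
print(frame.hex())  # 1210207d13307d7d4013
```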

Let's see how the receiver handles such a frame. For demonstration, assume that the first byte the receiver draws from the serial port is not a real part of the message (we want to see how it handles this). In the following diagram, 'Receiver state' is the state of the receiver after the received byte. 'Data buffer' is the currently accumulated message buffer to pass to an upper level:

A few things to note:

The "stray" byte before the header is ignored: according to the protocol each frame has to start with a header, so this isn't part of the frame.
The start and end flags are not inserted into the data buffer
Escapes (DLEs) are correctly handled by a special state
When the frame is finished with an end flag, the receiver has a frame ready to pass to an upper level, and comes back waiting for a header - a new frame.

Finally, we see that the message received is exactly the message sent. All the protocol details (flags, escapes and so on) were transparently handled by the data link layer [8].
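The receiver described above is a small state machine, and a sketch of it fits in a few dozen lines of Python. The state names, the class name and the input bytes are all invented for illustration (the input is a stray 0x55 followed by a made-up frame), and restarting the frame on an unexpected start flag is an assumption rather than something the protocol above prescribes.

```python
START_FLAG, END_FLAG, DLE = 0x12, 0x13, 0x7D
WAIT_HEADER, IN_MSG, AFTER_DLE = 'WAIT_HEADER', 'IN_MSG', 'AFTER_DLE'

class FrameReceiver(object):
    """Feed bytes one at a time; completed payloads collect in self.frames."""
    def __init__(self):
        self.state = WAIT_HEADER
        self.buffer = bytearray()
        self.frames = []

    def feed(self, b):
        if self.state == WAIT_HEADER:
            if b == START_FLAG:  # anything else is a stray byte
                self.buffer = bytearray()
                self.state = IN_MSG
        elif self.state == IN_MSG:
            if b == DLE:
                self.state = AFTER_DLE  # the next byte is literal data
            elif b == END_FLAG:
                self.frames.append(bytes(self.buffer))
                self.state = WAIT_HEADER
            elif b == START_FLAG:
                self.buffer = bytearray()  # assumption: restart the frame
            else:
                self.buffer.append(b)
        else:  # AFTER_DLE
            self.buffer.append(b)
            self.state = IN_MSG

receiver = FrameReceiver()
for byte in bytearray.fromhex('551210207d13307d7d4013'):
    receiver.feed(byte)
print(receiver.frames[0].hex())  # 102013307d40
```

Because the receiver only leaves WAIT_HEADER on a start flag, a frame damaged by a lost byte costs at most that one frame before the receiver locks onto the next start flag.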

Conclusion

There are several methods of handling framing in communications, although most are unsuitable to be used on top of the serial port. Among the ones that are suitable, the most commonly used is byte stuffing. By defining a couple of "magic value" flags and careful rules of escaping, this framing method is both robust and easy to implement as a software layer. It is also widely used, as PPP depends on it.

Finally, it's important to remember that for a high level of robustness, it's required to add some kind of error checking into the protocol - such as computing a CRC on the message and appending it as the last word of the message, which the receiver can verify before deciding that the message is valid.
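As one possible way to do this, here's a sketch that appends a CRC-32 to a message and verifies it on the receiving side. The choice of CRC-32 (via Python's zlib module), the big-endian byte order and the function names are assumptions made for the example; a real protocol would fix its own CRC polynomial and width. The CRC is computed on the un-stuffed message, so it would be added before byte stuffing and checked after de-stuffing.

```python
import struct
import zlib

def add_crc(message):
    """Append a big-endian CRC-32 of the message as its last 4 bytes."""
    crc = zlib.crc32(message) & 0xFFFFFFFF
    return message + struct.pack('>I', crc)

def check_crc(message_with_crc):
    """Return the message if its trailing CRC-32 matches, else None."""
    if len(message_with_crc) < 4:
        return None
    message, tail = message_with_crc[:-4], message_with_crc[-4:]
    expected = struct.pack('>I', zlib.crc32(message) & 0xFFFFFFFF)
    return message if tail == expected else None

protected = add_crc(b'\x10\x20\x13\x30\x7d\x40')
corrupted = bytearray(protected)
corrupted[1] ^= 0x04  # flip a single bit "on the line"

print(check_crc(protected) is not None)      # True
print(check_crc(bytes(corrupted)) is None)   # True
```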

[1]	The Data Link Layer is layer 2 in the OSI model. In the TCP/IP model it's simply called the "link layer".

[2]	The serial port can be configured to add parity bits to bytes. These days, this option is rarely used, because:

A single parity bit isn't a very strong means of detecting errors. 2-bit errors fool it.
Error handling is usually done by stronger means at a higher level.

[3]	For example Ethernet (802.3) uses 12 octets of idle characters between frames.

[4]	You might run into the term DLE - Data Link Escape, which means the same thing. I will use the acronyms DLE and ESC interchangeably.

[5]	Just like quotes and escape characters in strings! In C: `"I say \"Hello\""`. To escape the escape, repeat it: `"Here comes the backslash: \\ - seen it?"`

[6]	I'd love to hear why this XOR-ing is required. One simple reason I can think of is to prevent the flag and escape bytes appearing "on the line" even after they're escaped. Presumably this improves resynchronization if the escape byte is lost?

[7]	Which is just a fancy way to say "a protocol wrapping function", since the layer is implemented in software.

[8]	Such transparency is one of the greatest ideas of layered network protocols. It's worth keeping in mind when we implement protocols in software: transparency aids modularity and decoupling.

Book review: "Tab electronics guide to understanding Electricity and Electronics" by G. Randy Slone

2009-07-10T10:28:41-07:00

This book has two main aims. One is to teach the basics of electronics. The other is to serve as a guide to electronics hobbyists for setting up a lab to build stuff and experiment with circuits. It is the second aim which spurred me to purchase it. I think the best way to present this review is as a list of pros and cons of the book. Pros:

As far as I can tell (mainly from work experience) without actually building anything following the book's directions, the advice given on setting up a workbench is correct, interesting and easy to follow. Since this is one of the main goals of the book, it's an important point in its favor.
Basic electronics, up to and including BJTs are explained quite well - I think that any intelligent beginner can learn a lot from this book, even without prior experience. The author manages to leverage intuition in useful ways whenever possible, and doesn't get into the complex math you often see in textbooks on the subject.
The book contains a lot of interesting and useful circuits to build, in specially designated "Circuit Potpourri" sections. After some point in the book, the author presents several circuits at the end of each chapter, explaining in brief what they do and specifying which components to purchase, and how to connect everything together.

Cons:

The author has made an unfortunate decision to use electron current flow (current flows from negative to positive) instead of the conventional current flow used almost universally throughout the industry. While electron flow is admittedly more correct in most cases, convention plays a strong role here, and the author should've followed it. Doesn't it feel funny for a beginner to read that "in the transistor, the current goes against the arrow"? Why against?
Although the second edition is from 1990, the book is dated. This is clearly seen in the equipment photographs the author uses, as well as in some advice. For instance, AFAIK, no one uses paper catalogs for components these days - datasheets are downloaded from the web and catalogs are online too.
After the chapter on BJTs, the wind goes out of the author's sails when it comes to clear explanations. MOSFETs are explained in a very sketchy way, and op-amps are left practically without any explanation at all. Nevertheless, the author keeps piling up circuits, apparently assuming the reader will just learn this stuff somewhere else.

That said, for $16 (darn cheap for books on Electronics) the book isn't a bad deal. Especially if you're looking into setting up your own electronics lab, this book can be a valuable first guide. I wouldn't recommend seriously attempting to learn electronics with it as the only source, though. In this respect, this book can serve as only a basic introduction, with more serious texts being a must for deeper understanding of the more advanced concepts and circuits.

Solution to the RC circuit puzzle

2008-12-26T11:21:45-08:00

Here, as promised, is the solution to the RC circuit puzzle I posted earlier this week. Let's look at the circuit again:

The problem with my reasoning was the direction of current in the capacitor. I've quietly assumed that:

$I_{c}(t) = C\dot{V}_{c}(t)$

But this is wrong for the circuit above. Why? Because we must obey the voltage & current directions we've chosen. In passive elements, the positive current flows from the higher voltage to the lower voltage, meaning that in our circuit:

$I_{c}(t) = -C\dot{V}_{c}(t)$

This small minus sign makes all the difference, and now the solution will be correct. Physically, the intuition is that the current here flows from a discharging capacitor, hence it's "against" the voltage direction. Had it been a capacitor-charging circuit, there would be no confusion.
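
To see how much that sign matters, here is a minimal numerical sketch. It integrates the first-order equation for the current with either sign, using simple Euler steps. The component values (1 kΩ, 1 µF), the initial current and the function name `simulate_current` are assumptions picked purely for illustration.

```python
import math

def simulate_current(sign, i0, r, c, t_end, steps=5000):
    """Euler-integrate dI/dt = sign * I / (R*C) from t = 0 to t_end."""
    tau = r * c
    dt = t_end / steps
    current = i0
    for _ in range(steps):
        current += sign * current / tau * dt
    return current

# Assumed example values: R = 1 kOhm, C = 1 uF, so RC = 1 ms.
R, C, I0 = 1e3, 1e-6, 1e-3
T_END = 5 * R * C  # five time constants

wrong = simulate_current(+1, I0, R, C, T_END)    # sign from the flawed derivation
correct = simulate_current(-1, I0, R, C, T_END)  # sign after the fix

print("with + sign:", wrong, "closed form:", I0 * math.exp(T_END / (R * C)))
print("with - sign:", correct, "closed form:", I0 * math.exp(-T_END / (R * C)))
```

With the plus sign the current blows up, while with the minus sign it decays towards zero, as a discharging capacitor should.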

An RC circuit puzzle

2008-12-22T22:16:05-08:00

If you're interested in electronics, you'll find the following simple "paradox" amusing. It's the usual case of "proving that 2+2=5". The fun is finding where the mistake in the reasoning is. Consider the following circuit:

Assume that the capacitor is charged to some initial voltage before the switch is closed. At time 0, the switch is closed. What is the current in the circuit as a function of time? Let's solve it using the familiar RC circuit methods. We know that $V_{c}(t) = V_{R}(t)$ because of Kirchhoff's voltage law. We'll differentiate both sides by time: $\dot{V}_{c}(t) = \dot{V}_{R}(t)$ We know that for a capacitor, the relation between current and voltage is:

$I_{c}(t) = C\dot{V}_{c}(t)$

Substituting it into the equation above and also recalling that $V_{R}(t) = R I_{R}(t)$, we get:

$\frac{I_{c}(t)}{C} = R\dot{I}_{R}(t)$

But the current through the capacitor and resistor is the same current, so this can be rewritten simply as:

$\dot{I}(t) = \frac{I(t)}{RC}$

This is a simple first order differential equation, the solution of which is:

$I(t) = I_{0}e^{t/RC}$

For some initial current $I_{0}$. But wait a second, how can the exponent be positive, won't it grow to infinity with time? There's obviously a mistake here, somewhere. Can you find it? This problem gave me some headache last night, and today I've successfully stumped a few co-workers with it. I'll post a solution in a couple of days.

memmgr - a fixed-pool memory allocator

2008-10-17T12:44:17-07:00

In embedded systems, it is common to write code that runs on "bare metal", i.e. without an operating system. On one hand, it is very empowering. When you write your main function (assuming it's C, of course, but that's a safe assumption for 95% of embedded code), you know it has full control of the processor. Your program is the brains of the chip - whatever you write, the chip performs, without any external code getting in your way. On the other hand, code running this way misses many of the benefits operating systems provide: process control, memory management, file system, and so on.

When writing code to run on bare metal, there are some special precautions one must take. One important point to consider is the heap - dynamic memory allocation. An embedded system (think of the safety controller of a Boeing plane) can't just fail because the heap runs out. When malloc returns 0 to your desktop-application code, in most cases you will just bail out, because most probably it's the system's fault, and you don't have much choice. In an embedded controller, this is not an option. There is nowhere to bail out to, and in any case, running out of heap memory is your fault - a bug in your design or code.

To help manage these complications, embedded programmers often avoid heap allocation altogether, and only use static allocation (i.e. arrays allocated at compile (or more accurately - link/load) time). However, sometimes this is less than optimal, because:

Dynamic allocation helps write code in a more convenient and reusable way.
You may be using some 3rd party code that uses dynamic allocation.

The solutions to this problem are numerous, but like any self-respecting embedded programmer, I wrote my own fixed-pool memory allocator. It provides a pair of functions:


// 'malloc' clone
//
void* memmgr_alloc(ulong nbytes);

// 'free' clone
//
void memmgr_free(void* ap);

These can be used as drop-in replacements for malloc and free, but with a twist: there is no heap involved. All the memory is allocated from, and returned to, a fixed pool of memory that's allocated at link time (in simpler terms: a static array). This way, you know the maximal amount of space your heap will take even before running the program, and can use these functions to test that your program indeed doesn't allocate more than you assumed. Moreover, the library can print allocation statistics (which you can enhance, since the code is open) that help diagnose allocation problems and memory leaks. The library (350 LOC of ANSI C) can be downloaded from here. Let me know if you've found it useful.
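
To make the fixed-pool idea concrete, here is a toy model of such an allocator. It is written in Python purely for illustration and is not the memmgr code: the class name `FixedPool`, the first-fit policy and the `peak_used` statistic are all assumptions of this sketch. Blocks are handed out from a pool of a fixed size, and an allocation fails when no free block is large enough.

```python
class FixedPool:
    """Toy first-fit allocator over a fixed-size pool (a model, not memmgr)."""

    def __init__(self, size):
        self.size = size
        self.free_blocks = [(0, size)]  # sorted list of (offset, length)
        self.allocated = {}             # offset -> length
        self.peak_used = 0              # high-water mark, for statistics

    def used(self):
        return sum(self.allocated.values())

    def alloc(self, nbytes):
        """Return the offset of a block of nbytes, or None if nothing fits."""
        if nbytes <= 0:
            return None
        for i, (offset, length) in enumerate(self.free_blocks):
            if length >= nbytes:
                if length == nbytes:
                    del self.free_blocks[i]
                else:
                    # Carve the request off the front of the free block.
                    self.free_blocks[i] = (offset + nbytes, length - nbytes)
                self.allocated[offset] = nbytes
                self.peak_used = max(self.peak_used, self.used())
                return offset
        return None

    def free(self, offset):
        """Return a block to the pool and merge adjacent free blocks."""
        length = self.allocated.pop(offset)
        self.free_blocks.append((offset, length))
        self.free_blocks.sort()
        merged = []
        for off, ln in self.free_blocks:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + ln)
            else:
                merged.append((off, ln))
        self.free_blocks = merged


pool = FixedPool(64)
a = pool.alloc(16)
b = pool.alloc(32)
too_big = pool.alloc(32)  # only 16 bytes remain, so this fails
print(a, b, too_big, pool.peak_used)
```

Since the pool size is fixed up front, a failed allocation or an unexpectedly high `peak_used` during testing points at a design problem rather than at the system.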

TFTP

2007-04-03T12:42:39-07:00

Some time ago I heard about TFTP - something I'd never encountered before. TFTP is an acronym for Trivial File Transfer Protocol. Yes, like FTP, just Trivial. TFTP is a much watered down version of FTP - its only command is to transfer a file from one place to another - no directory listing, deleting, renaming, user authentication. What is it useful for, one may wonder.

Well, not every computer is a PC. In recent years more and more small embedded devices are becoming networked, and one of the best forms of networking is TCP / UDP / IP - the same set of protocols the Internet works on.

TFTP works on top of UDP, as opposed to FTP which works on TCP. UDP is a far simpler protocol than TCP, since it follows a "send and forget" model, without ensuring the correct, in-order arrival of data like TCP does. As a result, it is much easier to implement, which leads to an implementation with a smaller footprint, and this is important for embedded devices. TFTP itself is also much simpler than FTP. It ensures the correct transfer of data by employing a simple stop and wait protocol on top of UDP. I assume that this also makes it slower than FTP on non-congested networks, since FTP's reliability is achieved at the TCP level, which works in selective repeat. However, simplicity is often more important than performance, especially for embedded devices with small amounts of ROM.
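
The stop and wait scheme is simple enough to sketch in a few lines. The packet layouts below follow RFC 1350 (2-byte opcode, 2-byte block number, up to 512 bytes of data per block), but the transfer itself is an in-memory simulation with no sockets. The function names and the `lost_attempts` parameter are made up for this example, and only DATA packets are ever dropped.

```python
import struct

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4
BLOCK_SIZE = 512  # data bytes per DATA packet, as in RFC 1350

def build_rrq(filename, mode="octet"):
    """Read request: opcode, filename, 0, mode, 0."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

def build_data(block, payload):
    return struct.pack("!HH", OP_DATA, block) + payload

def build_ack(block):
    return struct.pack("!HH", OP_ACK, block)

def transfer(data, lost_attempts):
    """Simulate a stop and wait transfer.

    lost_attempts is a set of 0-based DATA transmission indexes that get
    lost on the way. Returns (received bytes, DATA packets sent).
    """
    received = bytearray()
    sent = 0
    block = 1
    offset = 0
    while True:
        payload = data[offset:offset + BLOCK_SIZE]
        packet = build_data(block, payload)
        # Sender: retransmit the same block until it gets through. On a real
        # network, the trigger for a retransmission is a timeout waiting for
        # the ACK.
        while True:
            lost = sent in lost_attempts
            sent += 1
            if not lost:
                break
        # Receiver: accept the block and acknowledge it.
        opcode, got_block = struct.unpack("!HH", packet[:4])
        assert opcode == OP_DATA and got_block == block
        received += packet[4:]
        ack = build_ack(got_block)
        # Sender: only move on once this block is acknowledged.
        ack_opcode, acked = struct.unpack("!HH", ack)
        assert ack_opcode == OP_ACK and acked == block
        if len(payload) < BLOCK_SIZE:
            break  # a short (possibly empty) block ends the transfer
        offset += BLOCK_SIZE
        block += 1
    return bytes(received), sent

payload = bytes(range(256)) * 5                       # 1280 bytes: 3 blocks
result, packets_sent = transfer(payload, lost_attempts={1})
print(len(result), packets_sent)
```

Only one block is ever in flight, which is what makes the protocol easy to implement and also what limits its throughput.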

So TFTP is perfect for embedded devices to transfer data to and from each other (and PCs) in a reliable, quick way (UDP / IP on Ethernet is far faster than serial RS232 / RS485 communication, the most common interconnection method of embedded devices).

The research into TFTP led me through a few interesting sources of information, on Wikipedia, HowStuffWorks and RFCs. RFC 1180 is especially helpful - it's a tutorial written in a very readable style that explains the basics of IP, ARP, routing tables and TCP / UDP. RFC 1350 describes TFTP. RFC 1123 is a thorough collection of all Internet related protocols with cross references to other relevant RFCs.

The TCP / UDP / IP network stack is one of the nicest examples of sound engineering, and IMHO it is beneficial to get at least a superficial understanding of how these things work under the hood.

Antialiasing filters and multirate systems

2006-05-10T17:27:30-07:00

What is this about?

Antialiasing is an important topic to understand when dealing with digital processing of data. In this article I concentrate on the various methods used to combat aliasing, and try to explain what multirate filtering is and how it relates to antialiasing.

Sampling

Both analog and digital signals can be sampled. If we use an ADC that runs at 10^6 samples per second, then we can say that the analog signal on the ADC's input is sampled at frequency Fs = 1 MHz. Each digital signal has some sampling frequency tied to it - the frequency at which it was sampled [1]. If we take a digital signal that was sampled at frequency Fs, and grab each 4th sample discarding the others, we get a digital signal sampled at Fs/4.

What is aliasing?

I won't go too much into mathematical details here, as I assume that the basics are well known. Whenever some signal (either analog or digital) is sampled at frequency Fs, aliasing will occur if the original signal had harmonics at Fs/2 or higher (that is, if the sampling frequency was below the Nyquist rate of the signal). See the links section at the end of this article for more details. Aliasing is "bad for you": it distorts a signal in a way that can't really be fixed, so engineers do their best to avoid it. Fortunately, avoiding it is quite simple, using antialias filters.
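
A quick way to see why aliasing can't be fixed after the fact is to compare the samples of two different tones. In the sketch below (the 20 kHz sampling rate and the 13 kHz tone are arbitrary choices, and the helper names are made up), a tone above Fs/2 produces exactly the same samples as a tone below Fs/2, so once sampled the two are indistinguishable.

```python
import math

def alias_frequency(freq_hz, fs_hz):
    """Frequency in [0, Fs/2] that a real tone at freq_hz shows up as."""
    folded = freq_hz % fs_hz
    return fs_hz - folded if folded > fs_hz / 2 else folded

def sample_cosine(freq_hz, fs_hz, count):
    """Samples of cos(2*pi*f*t) taken at t = n / Fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

FS = 20_000    # sampling frequency, Hz
TONE = 13_000  # above Fs/2, so it will alias

alias = alias_frequency(TONE, FS)
high = sample_cosine(TONE, FS, 50)
low = sample_cosine(alias, FS, 50)
worst = max(abs(h - l) for h, l in zip(high, low))
print("alias frequency:", alias, "largest sample difference:", worst)
```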

Antialias filters

So, we have a signal we want to sample, and we want to avoid aliasing. What should be done? Generally, given that our sampling rate is Fs, we just need to make sure that there are no harmonics above Fs/2 in the signal. How can we ensure this? By using a lowpass filter that cuts all the harmonics above Fs/2. Such a lowpass filter is called an "antialias filter".

Say that the input to your system is an analog signal. You know that it has no "important" information at harmonics above 10 kHz, so you can safely sample it with an ADC at 20 kHz. However, although nothing above 10 kHz interests you, the signal might (and will, in a real-world system) have some power at harmonics above 10 kHz, mostly because of noise and the imperfect nature of analog signals. So, how do you avoid aliasing? Right, by using an antialias filter. And how is that done? Exactly - just prepend a lowpass filter to the ADC, which cuts off at 10 kHz, and voilà, you'll have a clean sampled signal.

Analog antialias filters

This is nice in theory, but in the real world, such an implementation poses some serious difficulties. To get a clean signal, you must use a very accurate lowpass filter, one that passes everything below 10 kHz and nothing above 10 kHz. In DSP jargon such a filter is called a "brick-wall" filter, since its frequency response looks like a brick wall with perfectly right angles.

The sad truth is that such filters are impossible. They are unreal - a theoretical delirium. We can get quite close, but constructing an analog filter that is close to a brick wall requires accurate, high-order filtering circuitry, which is difficult and quite expensive. Fortunately, there is hope - multirate filtering to the rescue!

Digital antialias filters

I will discuss the general topic of multirate filtering below, but for now I want to explain how it helps with digital antialias filtering.

Consider the following solution to the problem presented in the last section: We know that there's interesting information at up to 10 kHz, so we should sample the signal at 20 kHz at least, according to the sampling theorem [2]. But nothing prevents us from sampling it at a much higher frequency, and gaining an important advantage by doing so.

Suppose we sample the signal at 100 kHz instead of 20 kHz. Now, to avoid aliasing in this sampling, we must attach a lowpass filter before the ADC that cuts off at 50 kHz. Note, however, that it doesn't have to be a brick-wall filter: 50 kHz is very far from 10 kHz where the information is, so we don't mind if some useless frequencies at 40+ kHz get attenuated as well. Hence, we can attach a very simple analog filter before the ADC - an RC for example, tuned to 50 kHz. This helps with the antialiasing of the 100 kHz sampling, but it doesn't ensure a clean signal, since frequencies between 10 and 50 kHz still pass through, disrupting the information stored below 10 kHz.

To solve this problem, we now apply another antialias filter, this time on the sampled data: a digital lowpass filter tuned to cut off at 10 kHz. Digital filters can't be brick-wall either, but they can easily approach it, at a fraction of the cost of an analog filter with the same specification!

So, the full solution is: sample the input signal at 100 kHz with an ADC, which has a simple RC filter at its input configured to cut off at 50 kHz. Next, we apply an "almost brick wall" digital lowpass filter configured to cut off at 10 kHz. Then, we can resample our 100 kHz signal to 20 kHz (by simply discarding 4 out of each 5 samples) and yay - we have a clean 10 kHz signal sampled at 20 kHz, with no aliasing and no noise disrupting the information.
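
The digital half of this scheme can be tried out with a small simulation. The sketch below is only an illustration under several assumptions: the test signal is a 2 kHz tone plus a 27 kHz interferer already sampled at 100 kHz, the "almost brick wall" filter is a 101-tap Hamming-windowed sinc FIR, and all function names are made up. It compares discarding samples directly with filtering first and then discarding.

```python
import cmath
import math

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc lowpass, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Convolve, keeping only outputs where the filter fully overlaps the input."""
    return [sum(taps[k] * signal[i - k] for k in range(len(taps)))
            for i in range(len(taps) - 1, len(signal))]

def tone_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

FS_HIGH, FS_LOW, FACTOR = 100_000, 20_000, 5
NUM_TAPS = 101

# 2 kHz "information" plus a 27 kHz interferer, sampled at 100 kHz.
signal = [math.sin(2 * math.pi * 2_000 * n / FS_HIGH)
          + 0.5 * math.sin(2 * math.pi * 27_000 * n / FS_HIGH)
          for n in range(1100)]

# Discarding samples without filtering: 27 kHz aliases to 7 kHz at Fs = 20 kHz.
naive = signal[NUM_TAPS - 1:][::FACTOR]
# Filtering at 10 kHz first, then discarding 4 out of each 5 samples.
clean = fir_filter(signal, lowpass_fir(NUM_TAPS, 10_000, FS_HIGH))[::FACTOR]

naive_alias = tone_amplitude(naive, 7_000, FS_LOW)
clean_alias = tone_amplitude(clean, 7_000, FS_LOW)
clean_tone = tone_amplitude(clean, 2_000, FS_LOW)
print("7 kHz alias without filter:", naive_alias)
print("7 kHz alias with filter:   ", clean_alias)
print("2 kHz tone with filter:    ", clean_tone)
```

Without the digital filter the interferer lands at 7 kHz at full strength; with it, the alias is strongly suppressed while the 2 kHz tone passes through essentially unchanged.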

Multirate systems

"Multirate" simply means multiple sampling rates. A multirate DSP system uses multiple sampling rates within the system. In the example above, we have a multirate system because the signal is first sampled at 100 kHz and later re-sampled at 20 kHz. Generally, if we can afford to increase the initial sampling frequency of the analog signal (which is called oversampling the signal), we can lower the overall cost of the system because the analog part becomes much simpler.

Decimation, Interpolation and Resampling

Decimation is decreasing the sampling rate of a signal. In our example, after the digital antialias filter is applied, the signal is decimated by a factor of 5 from 100 kHz to 20 kHz. Another common use for decimation is decreasing the sampling rate to ease the computational load. Suppose you just need to sample an audio signal, for which 44 kHz is usually enough, but you only have a 10 MHz ADC. Why overwhelm your processor with so many samples, when decimating by a factor of 100 would be just fine?

Interpolation is the reverse process - increasing the sampling rate of a signal. This is usually done by inserting a certain number of zeros between each pair of samples of a signal (inserting N zeros means an (N + 1)-fold increase in the signal's sampling frequency) and passing the signal through a digital lowpass filter. The aim is often to generate an input for a system with a faster sampling rate.

Resampling is a combination of Decimation and Interpolation. If you have a signal with sampling frequency Fs and you want to have a signal with a sampling frequency of 2.5 * Fs, you can first interpolate the signal by a factor of 5 and then decimate it by a factor of 2.
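
Here is a minimal sketch of the 2.5x example: interpolate by 5 (insert 4 zeros after each sample, then lowpass), then decimate by 2. As an assumption made to keep the code short, the lowpass is a triangular FIR kernel, whose effect is plain linear interpolation. That is a crude filter, and a real design would likely use a sharper one. The function names are made up for this example.

```python
def zero_stuff(samples, factor):
    """Insert factor - 1 zeros after each sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out

def interpolate(samples, factor):
    """Raise the sampling rate by an integer factor.

    Zero-stuffs, then applies a triangular lowpass kernel centered on each
    output sample. The result is linear interpolation between the inputs.
    """
    stuffed = zero_stuff(samples, factor)
    half = factor - 1
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for d in range(-half, half + 1):
            j = i - d
            if 0 <= j < len(stuffed):
                acc += (1 - abs(d) / factor) * stuffed[j]
        out.append(acc)
    # Drop the tail that would need a sample past the end of the input.
    return out[:(len(samples) - 1) * factor + 1]

def decimate(samples, factor):
    """Keep every factor-th sample (the input must already be band-limited)."""
    return samples[::factor]

def resample(samples, up, down):
    """Change the sampling rate by up / down."""
    return decimate(interpolate(samples, up), down)

ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
faster = resample(ramp, 5, 2)  # 2.5 times the original sampling rate
print(faster)
```

For a ramp input, linear interpolation is exact, so consecutive output samples are 0.4 apart: one input step spread over 2.5 output samples.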

Note: some digital FIR filter generation tools make it possible to combine an FIR with decimation and / or interpolation. This is because the combination allows for a more efficient implementation than separate stages of filtering and resampling.

Digital-only antialias filtering

A common misconception seems to be that it is possible to implement antialias filtering without analog circuitry. This is false. Between a real world analog signal and a digital system there must, somewhere, lie the boundary where the analog signal is sampled to turn it into a digital signal. And in real physical signals, wherever there is sampling, there is aliasing. So analog filtering is essential, unless you are very sure that your analog signal really doesn't have any power above Fs/2, which is rarely the case.

A good rule of thumb is: whenever you sample an analog signal for digital processing with an ADC at rate Fs, attach a simple RC lowpass filter configured to cut off at Fs/2 before the ADC (it is best to make it a little less than Fs/2 to account for the very imperfect performance of an RC filter). This keeps the sampled signal largely free of aliases. Later, you can apply multirate techniques with digital filters to further shape your digital signal.
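
To put some numbers on this rule of thumb, the sketch below uses the standard first-order RC lowpass formulas: the cutoff is fc = 1 / (2*pi*R*C) and the gain at frequency f is 1 / sqrt(1 + (f/fc)^2). The 100 kHz sampling rate, the 10 nF capacitor and the choice of placing the cutoff at 80% of Fs/2 are assumptions made only for this example.

```python
import math

def rc_resistor(cutoff_hz, capacitance_f):
    """Resistance that gives the requested cutoff with the given capacitor."""
    return 1.0 / (2.0 * math.pi * cutoff_hz * capacitance_f)

def rc_gain(freq_hz, cutoff_hz):
    """Magnitude response of a first-order RC lowpass at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)

FS = 100_000                # ADC sampling rate, Hz (assumed)
CUTOFF = 0.8 * (FS / 2)     # a little less than Fs/2 (the 0.8 is an assumption)
CAPACITOR = 10e-9           # 10 nF (assumed)

resistor = rc_resistor(CUTOFF, CAPACITOR)
print("R =", resistor, "ohm")
for freq in (10_000, 50_000, 90_000):
    print(freq, "Hz gain:", rc_gain(freq, CUTOFF))
```

The gain falls off slowly above the cutoff, which shows how imperfect an RC filter is and why a sharper digital filter is still needed afterwards.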

So remember, digital antialias filtering works only for digital signals.

Links

These links were active at the time I wrote the article. If you find a dead link, let me know. In any case, Googling for the link's title may bring you to its new location and other related sources.

Notes

[1] - For simplicity it is sometimes useful to assume that an analog signal is just a digital signal sampled at a very high frequency, say 10^50 Hz.

[2] - The sampling theorem states that in order for a band limited (at Fv) signal to be reconstructed fully, it must be sampled at a rate Fs >= 2*Fv.