<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Eli Bendersky's website - EE &amp; Embedded</title><link href="https://eli.thegreenplace.net/" rel="alternate"></link><link href="https://eli.thegreenplace.net/feeds/ee-embedded.atom.xml" rel="self"></link><id>https://eli.thegreenplace.net/</id><updated>2025-05-21T03:01:20-07:00</updated><entry><title>Convolutions, Polynomials and Flipped Kernels</title><link href="https://eli.thegreenplace.net/2025/convolutions-polynomials-and-flipped-kernels/" rel="alternate"></link><published>2025-05-20T20:01:00-07:00</published><updated>2025-05-21T03:01:20-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-05-20:/2025/convolutions-polynomials-and-flipped-kernels/</id><summary type="html">&lt;p&gt;This is a post about multiplying polynomials, convolution sums and the
connection between them.&lt;/p&gt;
&lt;div class="section" id="multiplying-polynomials"&gt;
&lt;h2&gt;Multiplying polynomials&lt;/h2&gt;
&lt;p&gt;Suppose we want to multiply one polynomial by another:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/96f9b24ba7ab07c468c2aa1e60608f958e9b539a.svg" style="height: 22px;" type="image/svg+xml"&gt;\[(3x^3+x^2+2x+1)\cdot(2x^2+6)\]&lt;/object&gt;
&lt;p&gt;This is basic middle-school math - we start by cross-multiplying:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/d4ab9f8944b83f765c0882de57371df7f71545c8.svg" style="height: 19px;" type="image/svg+xml"&gt;\[6x^5+2x^4+4x^3 …&lt;/object&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;This is a post about multiplying polynomials, convolution sums and the
connection between them.&lt;/p&gt;
&lt;div class="section" id="multiplying-polynomials"&gt;
&lt;h2&gt;Multiplying polynomials&lt;/h2&gt;
&lt;p&gt;Suppose we want to multiply one polynomial by another:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/96f9b24ba7ab07c468c2aa1e60608f958e9b539a.svg" style="height: 22px;" type="image/svg+xml"&gt;\[(3x^3+x^2+2x+1)\cdot(2x^2+6)\]&lt;/object&gt;
&lt;p&gt;This is basic middle-school math - we start by cross-multiplying:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/d4ab9f8944b83f765c0882de57371df7f71545c8.svg" style="height: 19px;" type="image/svg+xml"&gt;\[6x^5+2x^4+4x^3+2x^2+18x^3+6x^2+12x+6\]&lt;/object&gt;
&lt;p&gt;And then collect all the terms together by adding up the coefficients:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/16c1240b3f6d8f1024144d5c854a554dcb5cbae9.svg" style="height: 19px;" type="image/svg+xml"&gt;\[6x^5+2x^4+22x^3+8x^2+12x+6\]&lt;/object&gt;
&lt;p&gt;Let's look at a slightly different way to achieve the same result. Instead of
cross multiplying all terms and then adding up, we'll focus on what terms appear
in the output, and for each such term - what its coefficient is going to be.
This is easy to demonstrate with a table, where we lay out one polynomial
horizontally and the other vertically. Note that we have to be explicit about
the zero coefficient of &lt;em&gt;x&lt;/em&gt; in the second polynomial:&lt;/p&gt;
&lt;img alt="Table showing polynomial multiplication" class="align-center" src="https://eli.thegreenplace.net/images/2025/poly-mul-table.png" /&gt;
&lt;p&gt;The diagonals that add up to each term in the output are highlighted. For example,
to get the coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/9c51f590abd716225f526f3b099b27aa00172afa.svg" style="height: 15px;" type="image/svg+xml"&gt;x^3&lt;/object&gt; in the output, we have to add up:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/9c51f590abd716225f526f3b099b27aa00172afa.svg" style="height: 15px;" type="image/svg+xml"&gt;x^3&lt;/object&gt; in the first polynomial, multiplied by the
constant (coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/d4a92c6f08fa9e82c0bcd736856ea098b8c8a3b3.svg" style="height: 15px;" type="image/svg+xml"&gt;x^0&lt;/object&gt;) in the second polynomial&lt;/li&gt;
&lt;li&gt;The coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/7046d961a8144b7b2c2da6066849a9f889ff2ac9.svg" style="height: 15px;" type="image/svg+xml"&gt;x^2&lt;/object&gt; in the first polynomial, multiplied by the
coefficient of &lt;img alt="x" class="valign-0" src="https://eli.thegreenplace.net/images/math/11f6ad8ec52a2984abaafd7c3b516503785c2072.png" style="height: 8px;" /&gt; in the second polynomial&lt;/li&gt;
&lt;li&gt;The coefficient of &lt;img alt="x" class="valign-0" src="https://eli.thegreenplace.net/images/math/11f6ad8ec52a2984abaafd7c3b516503785c2072.png" style="height: 8px;" /&gt; in the first polynomial, multiplied by the
coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/7046d961a8144b7b2c2da6066849a9f889ff2ac9.svg" style="height: 15px;" type="image/svg+xml"&gt;x^2&lt;/object&gt; in the second polynomial&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(if the second polynomial had an &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/9c51f590abd716225f526f3b099b27aa00172afa.svg" style="height: 15px;" type="image/svg+xml"&gt;x^3&lt;/object&gt; term, there would be another
component to add)&lt;/p&gt;
&lt;p&gt;For what follows, let's move to a somewhat more abstract representation: a
polynomial &lt;em&gt;P&lt;/em&gt; can be represented as a sum of coefficients &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/ffa0d1acb27e00704b68904f64390704186cbc2c.svg" style="height: 15px;" type="image/svg+xml"&gt;P_i&lt;/object&gt;
multiplied by corresponding powers of &lt;em&gt;x&lt;/em&gt; &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/15f2df57859ae973d462d0d29eb209cc811b38d2.svg" style="height: 38px;" type="image/svg+xml"&gt;\[P=\sum_{i}P_i x^i\]&lt;/object&gt;
&lt;p&gt;Suppose we have two polynomials, &lt;em&gt;P&lt;/em&gt; and &lt;em&gt;R&lt;/em&gt;. When we multiply them together,
the resulting polynomial is &lt;em&gt;S&lt;/em&gt;. Based on our insight from the table diagonals
above, we can say that for each &lt;em&gt;k&lt;/em&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/44ed9e287aa311790a92c1ad3966fa622cb7ba46.svg" style="height: 38px;" type="image/svg+xml"&gt;\[S_k=\sum_{i}P_i\cdot R_{k-i}\]&lt;/object&gt;
&lt;p&gt;And then the entire polynomial &lt;em&gt;S&lt;/em&gt; is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/9df1cb964f21286b72cdb2cc3c5718fc63fcccf8.svg" style="height: 54px;" type="image/svg+xml"&gt;\[S=\sum_{k} \left( \sum_{i}P_i\cdot R_{k-i}\right)x^k\]&lt;/object&gt;
&lt;p&gt;It's important to understand this formulation, since it's key to this
post. Let's use our example polynomials to see how it works. First, we represent
the two
polynomials as sequences of coefficients (starting with 0, so the coefficient
of the constant is first, and the coefficient of the highest power is last):&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/b1263ee35215151cae0decb0bb89c0dded11cbdb.svg" style="height: 18px;" type="image/svg+xml"&gt;\[P=[1,2,1,3]\qquad R=[6,0,2]\]&lt;/object&gt;
&lt;p&gt;In this notation, &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/d6872ace349b2a492c852ebc70a17a9f50138954.svg" style="height: 15px;" type="image/svg+xml"&gt;P_0&lt;/object&gt; is the first element in the list for &lt;em&gt;P&lt;/em&gt;, etc.
To calculate the coefficient of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/9c51f590abd716225f526f3b099b27aa00172afa.svg" style="height: 15px;" type="image/svg+xml"&gt;x^3&lt;/object&gt; in the product:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/eebf5f0279c9ef5dda90ec7e83cb82285c55fa31.svg" style="height: 38px;" type="image/svg+xml"&gt;\[S_3=\sum_{i}P_i\cdot R_{3-i}\]&lt;/object&gt;
&lt;p&gt;Expanding the sum for all the non-zero coefficients of &lt;em&gt;P&lt;/em&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/c5d7ee4330c4b3ba32f69cdaf17235484ff532a6.svg" style="height: 14px;" type="image/svg+xml"&gt;\[S_3=P_0 R_3+P_1 R_2+P_2 R_1+P_3 R_0=0+4+0+18=22\]&lt;/object&gt;
&lt;p&gt;Similarly, we'll find that &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/b0479a6acc9805682c41bb933a8f1a85fc948858.svg" style="height: 15px;" type="image/svg+xml"&gt;S_4=2&lt;/object&gt;, &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/402d6d04034d21b87c66239a91f0a93d0d4ad08c.svg" style="height: 15px;" type="image/svg+xml"&gt;S_2=8&lt;/object&gt; and so on, resulting
in the final polynomial as before:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/16c1240b3f6d8f1024144d5c854a554dcb5cbae9.svg" style="height: 19px;" type="image/svg+xml"&gt;\[6x^5+2x^4+22x^3+8x^2+12x+6\]&lt;/object&gt;
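&lt;p&gt;The summation above translates almost directly into code. Here is a minimal Python sketch, assuming polynomials are stored as coefficient lists with the constant term first (as in the notation above); the function name poly_mul is made up for illustration, and coefficients outside a list are treated as zero:&lt;/p&gt;

```python
def poly_mul(p, r):
    # Multiply two polynomials given as coefficient lists, constant term first.
    # The product of degree-m and degree-n polynomials has m + n + 1 coefficients.
    s = [0] * (len(p) + len(r) - 1)
    for k in range(len(s)):
        for i in range(len(p)):
            # S_k = sum_i P_i * R_{k-i}; R_{k-i} counts as zero when k-i
            # falls outside the stored coefficients of R.
            if k - i in range(len(r)):
                s[k] += p[i] * r[k - i]
    return s


P = [1, 2, 1, 3]  # 3x^3 + x^2 + 2x + 1
R = [6, 0, 2]     # 2x^2 + 6
S = poly_mul(P, R)
print(S)  # [6, 12, 8, 22, 2, 6]
```

&lt;p&gt;Reading the printed list from the last element to the first gives the coefficients of the final polynomial shown above, highest power first.&lt;/p&gt;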
&lt;p&gt;There's a nice graphical approach to multiplying polynomials using this idea of
pairing up sums for each term in the output:&lt;/p&gt;
&lt;img alt="Graphical representation of poly mul, part 1" class="align-center" src="https://eli.thegreenplace.net/images/2025/poly-mul-slide1.png" /&gt;
&lt;p&gt;We start with the diagram on the left: one of the polynomials remains in its
original form, while the other is flipped around (constant term first, highest
power term last). We line up the polynomials vertically, and multiply the
corresponding coefficients: the constant coefficient of the output is just the
constant coefficient of the first polynomial times the constant coefficient of
the second polynomial.&lt;/p&gt;
&lt;p&gt;The diagram on the right shows the next step: the second polynomial is shifted
left by one term and lined up vertically again. The corresponding coefficients
are multiplied, and then the results are added to obtain the coefficient of &lt;em&gt;x&lt;/em&gt;
in the output polynomial.&lt;/p&gt;
&lt;p&gt;We continue the process by shifting the lower polynomial left:&lt;/p&gt;
&lt;img alt="Graphical representation of poly mul, part 2" class="align-center" src="https://eli.thegreenplace.net/images/2025/poly-mul-slide2.png" /&gt;
&lt;p&gt;Calculating the coefficients of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/7046d961a8144b7b2c2da6066849a9f889ff2ac9.svg" style="height: 15px;" type="image/svg+xml"&gt;x^2&lt;/object&gt; and then &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/9c51f590abd716225f526f3b099b27aa00172afa.svg" style="height: 15px;" type="image/svg+xml"&gt;x^3&lt;/object&gt;. A couple more
steps:&lt;/p&gt;
&lt;img alt="Graphical representation of poly mul, part 3" class="align-center" src="https://eli.thegreenplace.net/images/2025/poly-mul-slide3.png" /&gt;
&lt;p&gt;Now we have all the coefficients of the output. Take a moment to convince
yourself that this approach is equivalent to the summation shown just
before it, and also to the &amp;quot;diagonals in a table&amp;quot; approach shown further up.
They all calculate the same thing &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Hopefully it's now clear why the second polynomial is &amp;quot;flipped&amp;quot; to perform
this procedure. This all comes down to which input terms pair up to calculate
each output term. As seen above:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/44ed9e287aa311790a92c1ad3966fa622cb7ba46.svg" style="height: 38px;" type="image/svg+xml"&gt;\[S_k=\sum_{i}P_i\cdot R_{k-i}\]&lt;/object&gt;
&lt;p&gt;While the index &lt;em&gt;i&lt;/em&gt; moves in one direction (from the low power terms to the
high power terms) in &lt;em&gt;P&lt;/em&gt;, the index &lt;em&gt;k-i&lt;/em&gt; moves in the opposite direction in
&lt;em&gt;R&lt;/em&gt;.&lt;/p&gt;
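&lt;p&gt;The flip-and-slide procedure can also be sketched in Python. This is only an illustration under a few assumptions: coefficient lists are stored constant term first (so the slide direction is mirrored relative to the diagrams), both lists are non-empty, and the name flip_and_slide is made up for this example. One list is reversed once, and each output coefficient is the sum of element-wise products in the current window:&lt;/p&gt;

```python
def flip_and_slide(p, r):
    # Reverse R once; after that, each output term is a plain dot product.
    flipped = r[::-1]
    n_out = len(p) + len(r) - 1
    # Pad P with zeros on both sides so the flipped R can hang over the edges.
    pad = [0] * (len(r) - 1)
    padded = pad + p + pad
    out = []
    for shift in range(n_out):
        # The part of (padded) P currently lined up with the flipped R.
        window = padded[shift:shift + len(r)]
        out.append(sum(a * b for a, b in zip(window, flipped)))
    return out


print(flip_and_slide([1, 2, 1, 3], [6, 0, 2]))  # [6, 12, 8, 22, 2, 6]
```

&lt;p&gt;The result matches the coefficients obtained from the summation formula, constant term first.&lt;/p&gt;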
&lt;p&gt;If this procedure reminds you of computing a convolution between two arrays,
that's because it's exactly that!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="signals-systems-and-convolutions"&gt;
&lt;h2&gt;Signals, systems and convolutions&lt;/h2&gt;
&lt;p&gt;The theory of signals and systems is a large topic (typically taught for one
or two semesters in undergraduate engineering degrees), but here I want to focus
on just one aspect of it which I find really elegant.&lt;/p&gt;
&lt;p&gt;Let's define discrete signals and systems first, restricting ourselves to 1D
(similar ideas apply in higher dimensions):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Discrete signal&lt;/strong&gt;: An ordered sequence of numbers &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/97897883b2ac574b045db012a128c024c5a05498.svg" style="height: 18px;" type="image/svg+xml"&gt;x[n]&lt;/object&gt; with integer
indices &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/3de636a92d95a628df24e439ae90b5e4c9f4d5f3.svg" style="height: 19px;" type="image/svg+xml"&gt;n\in \left\{ \dots,-2,-1,0,1,2,\dots \right\}&lt;/object&gt;. Can also be
thought of as a function &lt;object class="valign-m1" data="https://eli.thegreenplace.net/images/math/cec8dca7e4892beffc8e2f86787fc5a93edb3071.svg" style="height: 13px;" type="image/svg+xml"&gt;x:\mathbb{Z} \rightarrow \mathbb{R}&lt;/object&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Discrete system&lt;/strong&gt;: A function mapping input signals &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/97897883b2ac574b045db012a128c024c5a05498.svg" style="height: 18px;" type="image/svg+xml"&gt;x[n]&lt;/object&gt; to output
signals &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7d09231348c8caf84d4fba7c80f26c2607e1d397.svg" style="height: 18px;" type="image/svg+xml"&gt;y[n]&lt;/object&gt;. For example, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7496db80e038dcb3c30672eef6f3a73e5856a0f1.svg" style="height: 18px;" type="image/svg+xml"&gt;y[n]=2x[n]&lt;/object&gt; is a system that scales
all signals by a factor of two.&lt;/p&gt;
&lt;p&gt;Here's an example signal:&lt;/p&gt;
&lt;img alt="Basic signal" class="align-center" src="https://eli.thegreenplace.net/images/2025/signal-basic.png" /&gt;
&lt;p&gt;This is a &lt;em&gt;finite&lt;/em&gt; signal. All values not explicitly shown in the chart are
assumed to be 0 (e.g. &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/c799c5cc1533f1a1dd5775dc7c358b57245fd3d7.svg" style="height: 18px;" type="image/svg+xml"&gt;x[3]=0&lt;/object&gt;, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/a13d41817d095b1a8d945047947396c6ab9293c9.svg" style="height: 18px;" type="image/svg+xml"&gt;x[-1]=0&lt;/object&gt;, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/a6a0a2893603aaa89006b12a6ed459a9bc15ea6e.svg" style="height: 18px;" type="image/svg+xml"&gt;x[99]=0&lt;/object&gt; and so
on).&lt;/p&gt;
&lt;p&gt;A very important signal is the &lt;em&gt;discrete impulse&lt;/em&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/017d13ff59a119f0e8c3455ba4ed63c6c76455f4.svg" style="height: 54px;" type="image/svg+xml"&gt;\[\delta[n]=\begin{cases}
  1\quad if \quad n=0\\
  0\quad otherwise
\end{cases}\]&lt;/object&gt;
&lt;p&gt;In graphical form, here's &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/6cae1d0346254097c7b55bb6167c5b959eecf5a6.svg" style="height: 18px;" type="image/svg+xml"&gt;\delta[n]&lt;/object&gt;, as well as a couple of time-shifted
variants of it. Note how we shift a signal left and right on the horizontal
axis by adding to or subtracting from &lt;em&gt;n&lt;/em&gt;, respectively. Take a moment to
double check why this works.&lt;/p&gt;
&lt;img alt="Discrete impulse function delta with shifts" class="align-center" src="https://eli.thegreenplace.net/images/2025/discrete-delta.png" /&gt;
&lt;p&gt;The impulse is useful because we can decompose any discrete signal into
a sum of scaled and shifted impulses. Our example signal
has three non-zero values at indices 0, 1 and 2; we can
represent it as follows:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/3c5acceab47e9fe68aa6996942cad0174e7f88c1.svg" style="height: 18px;" type="image/svg+xml"&gt;\[x[n]=x[0]\delta[n]+x[1]\delta[n-1]+x[2]\delta[n-2]=2\delta[n]+2\delta[n-1]+\delta[n-2]\]&lt;/object&gt;
&lt;p&gt;More generally, a signal &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/97897883b2ac574b045db012a128c024c5a05498.svg" style="height: 18px;" type="image/svg+xml"&gt;x[n]&lt;/object&gt; can be written as:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/0f09f5c114a18a1ac8d2a3c8b0c9162d5c73c8ba.svg" style="height: 38px;" type="image/svg+xml"&gt;\[x[n]=\sum_{k} x[k]\delta[n-k]\]&lt;/object&gt;
&lt;p&gt;(for all &lt;em&gt;k&lt;/em&gt; where &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/79671334a2b5653a26855789e1ae683d6a238237.svg" style="height: 18px;" type="image/svg+xml"&gt;x[k]&lt;/object&gt; is nonzero)&lt;/p&gt;
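&lt;p&gt;As a quick sanity check, the decomposition can be evaluated in Python. This is a small sketch, assuming the example signal is stored as a mapping from index to non-zero value; the helper names delta and x_from_impulses are made up for illustration:&lt;/p&gt;

```python
def delta(n):
    # Discrete impulse: 1 at n == 0, 0 everywhere else.
    return 1 if n == 0 else 0


# Non-zero values of the example signal, keyed by index.
x = {0: 2, 1: 2, 2: 1}


def x_from_impulses(n):
    # x[n] = sum over k of x[k] * delta[n - k]
    return sum(value * delta(n - k) for k, value in x.items())


samples = [x_from_impulses(n) for n in range(-2, 5)]
print(samples)  # [0, 0, 2, 2, 1, 0, 0]
```

&lt;p&gt;For every &lt;em&gt;n&lt;/em&gt;, at most one shifted impulse is non-zero, so the sum simply picks out the value of the signal at that index.&lt;/p&gt;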
&lt;p&gt;In the study of signals and systems, linear and time-invariant (LTI) systems
are particularly important.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Linear&lt;/strong&gt;: suppose &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/1a1af1865adce69bee27bc12ac7efe21fc5b4cc7.svg" style="height: 18px;" type="image/svg+xml"&gt;y_1[n]&lt;/object&gt; is the output of a system for input signal
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/d83ca8f12d64bb1cc32bde2de17916db3b241fbb.svg" style="height: 18px;" type="image/svg+xml"&gt;x_1[n]&lt;/object&gt;, and similarly &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/063fd2dc2abb49d7c840a759acc028022a516c6f.svg" style="height: 18px;" type="image/svg+xml"&gt;y_2[n]&lt;/object&gt; is the output for &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/1901362389403ab7a877ad93ae470f9f582611b0.svg" style="height: 18px;" type="image/svg+xml"&gt;x_2[n]&lt;/object&gt;.
A linear system outputs &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/8df8fffbb7ad264b8d90092fd487c7df828b7b21.svg" style="height: 18px;" type="image/svg+xml"&gt;a\cdot y_1[n]+b\cdot y_2[n]&lt;/object&gt; for the input
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/975188d1987d56029b91e1b72254e2ad24b307e4.svg" style="height: 18px;" type="image/svg+xml"&gt;a\cdot x_1[n] + b\cdot x_2[n]&lt;/object&gt; where &lt;em&gt;a&lt;/em&gt; and &lt;em&gt;b&lt;/em&gt; are constants.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Time-invariant&lt;/strong&gt;: if we delay the input signal by some constant:
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/981ba86a512ab5fbf28c730b5c561d0b20d08565.svg" style="height: 18px;" type="image/svg+xml"&gt;x_1[n-N]&lt;/object&gt;, the output is similarly delayed: &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/13cde0f8a6c1f155d9e21068dbdccb6b8abc3616.svg" style="height: 18px;" type="image/svg+xml"&gt;y_1[n-N]&lt;/object&gt;. In other
words, the response of the system has a similar shape, no matter when the
signal was received (it behaves today similarly to the way it behaved
yesterday).&lt;/p&gt;
&lt;p&gt;LTI systems are important because of the decomposition of a signal into impulses
discussed above. Suppose we want to characterize a system: what it does to an
arbitrary signal. For an LTI system, all we need to describe is its response to
an impulse!&lt;/p&gt;
&lt;p&gt;If the response of our system to &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/6cae1d0346254097c7b55bb6167c5b959eecf5a6.svg" style="height: 18px;" type="image/svg+xml"&gt;\delta[n]&lt;/object&gt; is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/3a63dbb689a8a0614073ad3414f4c071c18027d5.svg" style="height: 18px;" type="image/svg+xml"&gt;h[n]&lt;/object&gt;, then:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Its response to &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/b459c57afa0ea9b88b9c66ec25fc270c679d31d9.svg" style="height: 18px;" type="image/svg+xml"&gt;c\cdot \delta[n]&lt;/object&gt; is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/66a2d90a05a07da54c9f3f3d6f24ffecc469a8df.svg" style="height: 18px;" type="image/svg+xml"&gt;c\cdot h[n]&lt;/object&gt;, for any
constant &lt;em&gt;c&lt;/em&gt;, because the system is linear.&lt;/li&gt;
&lt;li&gt;Its response to &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/922d79bd72f7cae3b16dde94d2125d2e07f96764.svg" style="height: 18px;" type="image/svg+xml"&gt;\delta[n-k]&lt;/object&gt; is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/e0de39fedafcd96af7e968bab01da89a66e9db9b.svg" style="height: 18px;" type="image/svg+xml"&gt;h[n-k]&lt;/object&gt;, for any time shift
&lt;em&gt;k&lt;/em&gt;, because the system is time-invariant.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We'll combine these and use linearity again (note that in the following
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/79671334a2b5653a26855789e1ae683d6a238237.svg" style="height: 18px;" type="image/svg+xml"&gt;x[k]&lt;/object&gt; are just constants); the response to a signal decomposed into a sum
of shifted and scaled impulses:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/0f09f5c114a18a1ac8d2a3c8b0c9162d5c73c8ba.svg" style="height: 38px;" type="image/svg+xml"&gt;\[x[n]=\sum_{k} x[k]\delta[n-k]\]&lt;/object&gt;
&lt;p&gt;Is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/4edb642051dc77b74e5fc5ce3c49497cff17788f.svg" style="height: 38px;" type="image/svg+xml"&gt;\[y[n]=\sum_{k} x[k]h[n-k]\]&lt;/object&gt;
&lt;p&gt;This operation is called the &lt;em&gt;convolution&lt;/em&gt; between sequences &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/97897883b2ac574b045db012a128c024c5a05498.svg" style="height: 18px;" type="image/svg+xml"&gt;x[n]&lt;/object&gt; and
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/3a63dbb689a8a0614073ad3414f4c071c18027d5.svg" style="height: 18px;" type="image/svg+xml"&gt;h[n]&lt;/object&gt;, and is denoted with the &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/65cf200212223c762d7ed4756d09813cde57121d.svg" style="height: 9px;" type="image/svg+xml"&gt;\ast&lt;/object&gt; operator:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/013fb5362714d06a2aec4094fb18c577ec2be023.svg" style="height: 18px;" type="image/svg+xml"&gt;\[y[n]=x[n]\ast h[n]\]&lt;/object&gt;
&lt;p&gt;Let's work through an example. Suppose we have an LTI system with the following
response to an impulse:&lt;/p&gt;
&lt;img alt="Impulse response h[n]" class="align-center" src="https://eli.thegreenplace.net/images/2025/hn-impulse-response.png" /&gt;
&lt;p&gt;The response has &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/8737c322a24d90cf374ec9d086bd36bb88ba31b9.svg" style="height: 18px;" type="image/svg+xml"&gt;h[0]=1&lt;/object&gt;, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/f305f989a51e39fad8a3ee0d11366dc322617d4b.svg" style="height: 18px;" type="image/svg+xml"&gt;h[1]=3&lt;/object&gt; and zeros everywhere else.
Recall our sample signal from the top of this section (the sequence 2, 2, 1).
We can decompose it into a sum of scaled and shifted impulses, and then
calculate the system response to each of them. Like this:&lt;/p&gt;
&lt;img alt="Decomposed x[n] and the h[n] for each component" class="align-center" src="https://eli.thegreenplace.net/images/2025/hn-response-decompose.png" /&gt;
&lt;p&gt;The top row shows the input signal decomposed into scaled and shifted impulses;
the bottom row is the corresponding system response to each input. If we carefully
add up the responses for each &lt;em&gt;n&lt;/em&gt;, we'll get the system response &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7d09231348c8caf84d4fba7c80f26c2607e1d397.svg" style="height: 18px;" type="image/svg+xml"&gt;y[n]&lt;/object&gt;
to our input:&lt;/p&gt;
&lt;img alt="y[n] full system response" class="align-center" src="https://eli.thegreenplace.net/images/2025/yn-response.png" /&gt;
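&lt;p&gt;The same superposition can be sketched in Python: every input sample contributes a copy of the impulse response, scaled by the sample value and delayed by the sample index, and the copies are added up. This assumes both sequences start at index 0 and are zero elsewhere; the name lti_response is made up for illustration:&lt;/p&gt;

```python
def lti_response(x, h):
    # Output of an LTI system with impulse response h for input x,
    # built by adding up scaled and shifted copies of h.
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        # The impulse x[k]*delta[n-k] produces x[k]*h[n-k]:
        # h scaled by x[k] and delayed by k samples.
        for j, hj in enumerate(h):
            y[k + j] += xk * hj
    return y


x = [2, 2, 1]
h = [1, 3]
print(lti_response(x, h))  # [2, 8, 7, 3]
```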
&lt;p&gt;Now, let's calculate the same output, but this time using the convolution sum.
First, we'll represent the signal &lt;em&gt;x&lt;/em&gt; and the impulse response &lt;em&gt;h&lt;/em&gt; as sequences
(just like we did with polynomials), with &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/4d0df241f4afb921166d4c21a74ac89a317fc541.svg" style="height: 18px;" type="image/svg+xml"&gt;x[0]&lt;/object&gt; first, then &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/40c7c1e6b8e98be3051b26e1b151f1a437aba7d6.svg" style="height: 18px;" type="image/svg+xml"&gt;x[1]&lt;/object&gt;
etc:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/9517b9d2f9f8bb6b67055b9e05e590e9a485d5ad.svg" style="height: 18px;" type="image/svg+xml"&gt;\[x=[2,2,1]\qquad h=[1, 3]\]&lt;/object&gt;
&lt;p&gt;The convolution sum is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/4edb642051dc77b74e5fc5ce3c49497cff17788f.svg" style="height: 38px;" type="image/svg+xml"&gt;\[y[n]=\sum_{k} x[k]h[n-k]\]&lt;/object&gt;
&lt;p&gt;Recall that &lt;em&gt;k&lt;/em&gt; ranges over all the non-zero elements in &lt;em&gt;x&lt;/em&gt;. Let's calculate
each element of &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7d09231348c8caf84d4fba7c80f26c2607e1d397.svg" style="height: 18px;" type="image/svg+xml"&gt;y[n]&lt;/object&gt; (noting that &lt;em&gt;h&lt;/em&gt; is nonzero only for indices
0 and 1):&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/d485709c51ca2175b2254ba35e0bc0c665fe0db9.svg" style="height: 98px;" type="image/svg+xml"&gt;\[\begin{align*}
  y[0]&amp;amp;=x[0]h[0]=2\\
  y[1]&amp;amp;=x[0]h[1]+x[1]h[0]=8\\
  y[2]&amp;amp;=x[1]h[1]+x[2]h[0]=7\\
  y[3]&amp;amp;=x[2]h[1]=3
\end{align*}\]&lt;/object&gt;
&lt;p&gt;All subsequent values of &lt;em&gt;y&lt;/em&gt; are zero: &lt;em&gt;k&lt;/em&gt; only ranges up to 2, so for &lt;em&gt;n&lt;/em&gt; of 4 or more
the index &lt;em&gt;n-k&lt;/em&gt; is at least 2, where &lt;em&gt;h&lt;/em&gt; is zero (for example, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/c453485f399f8597542277efa0bb321f19097ea5.svg" style="height: 18px;" type="image/svg+xml"&gt;h[4-2]=h[2]=0&lt;/object&gt;).&lt;/p&gt;
&lt;p&gt;If you look carefully at this calculation, you'll notice that &lt;em&gt;h&lt;/em&gt; is &amp;quot;flipped&amp;quot;
in relation to &lt;em&gt;x&lt;/em&gt;, just like with the polynomials:&lt;/p&gt;
&lt;img alt="Convolution between signals by flipping one and sliding" class="align-center" src="https://eli.thegreenplace.net/images/2025/hn-flip-slide.png" /&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;We start with &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/97897883b2ac574b045db012a128c024c5a05498.svg" style="height: 18px;" type="image/svg+xml"&gt;x[n]&lt;/object&gt; (black) and flipped &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/3a63dbb689a8a0614073ad3414f4c071c18027d5.svg" style="height: 18px;" type="image/svg+xml"&gt;h[n]&lt;/object&gt; (blue), and line up the first
non-zero elements. This computes &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/56d6c463cc6373ce9ba08bafd265f552ec09a726.svg" style="height: 18px;" type="image/svg+xml"&gt;y[0]&lt;/object&gt;&lt;/li&gt;
&lt;li&gt;In subsequent steps, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/3a63dbb689a8a0614073ad3414f4c071c18027d5.svg" style="height: 18px;" type="image/svg+xml"&gt;h[n]&lt;/object&gt; slides right, one element at a time,
and the next value of &lt;em&gt;y&lt;/em&gt; is computed by adding up the element-wise products
of the lined up &lt;em&gt;x&lt;/em&gt; and &lt;em&gt;h&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just like with polynomials &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;, the reason why one of the inputs is flipped is
clear from the definition of the convolution sum, where one of the indices
increases (&lt;em&gt;k&lt;/em&gt;), while the other decreases (&lt;em&gt;n-k&lt;/em&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="properties-of-convolution"&gt;
&lt;h2&gt;Properties of convolution&lt;/h2&gt;
&lt;p&gt;The convolution operation has &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Convolution#Properties"&gt;many useful algebraic properties&lt;/a&gt;: linearity, associativity,
commutativity, distributivity, etc.&lt;/p&gt;
&lt;p&gt;The commutative property means that when computing convolutions graphically,
it doesn't matter which of the signals is &amp;quot;flipped&amp;quot;, because:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/9bd7b758f593d9706e0e448d9dd1854356d2c566.svg" style="height: 18px;" type="image/svg+xml"&gt;\[y[n]=x[n]\ast h[n]=h[n]\ast x[n]\]&lt;/object&gt;
&lt;p&gt;And therefore:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/0c732b0e414b373ae2cfabb26f1bc080f33ff003.svg" style="height: 38px;" type="image/svg+xml"&gt;\[y[n]=\sum_{k} x[k]h[n-k]=\sum_{k} x[n-k]h[k]\]&lt;/object&gt;
&lt;p&gt;But the most important property of the convolution is how it behaves in the
frequency domain. If we denote &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7d3e973d3685b53763332dc22c2edcca014c844b.svg" style="height: 19px;" type="image/svg+xml"&gt;\mathcal{F}(f)&lt;/object&gt; as the Fourier transform
of signal &lt;em&gt;f&lt;/em&gt;, then the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Convolution_theorem"&gt;convolution theorem&lt;/a&gt; states:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/28c9d86ff9f7121ee0f257da3a6ae4787e96fd07.svg" style="height: 19px;" type="image/svg+xml"&gt;\[\mathcal{F}(f\ast g)=\mathcal{F}(f)\cdot\mathcal{F}(g)\]&lt;/object&gt;
&lt;p&gt;The Fourier transform of a convolution is equal to simple multiplication
between the Fourier transforms of its operands. This fact - along with fast
algorithms like the FFT - makes it possible to
&lt;a class="reference external" href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.fftconvolve.html"&gt;implement convolutions&lt;/a&gt;
very efficiently.&lt;/p&gt;
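&lt;p&gt;To make the theorem concrete, here is a small Python sketch that convolves the example sequences from the previous section by multiplying their discrete Fourier transforms. It is only an illustration: it uses a naive quadratic-time DFT written with the standard cmath module rather than a real FFT, it rounds the result on the assumption that the inputs are integers, and the names dft and conv_via_dft are made up for this example:&lt;/p&gt;

```python
import cmath


def dft(seq, inverse=False):
    # Naive discrete Fourier transform (or its inverse) of a sequence.
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0
        for t in range(n):
            total += seq[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def conv_via_dft(x, h):
    n = len(x) + len(h) - 1
    # Zero-pad both inputs to the full output length, so the circular
    # convolution computed through the DFT equals the linear convolution.
    xp = list(x) + [0] * (n - len(x))
    hp = list(h) + [0] * (n - len(h))
    # Convolution theorem: multiply the transforms element by element.
    product = [a * b for a, b in zip(dft(xp), dft(hp))]
    # Integer inputs are assumed, so rounding removes floating-point noise.
    return [round(v.real) for v in dft(product, inverse=True)]


print(conv_via_dft([2, 2, 1], [1, 3]))  # [2, 8, 7, 3]
```

&lt;p&gt;An FFT computes the same transforms much faster than the nested loops above, which is where the efficiency of this approach comes from for long sequences.&lt;/p&gt;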
&lt;p&gt;This is a deep and fascinating topic, but we'll leave it as a story for
another day.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Theoretically, &lt;object class="valign-m2" data="https://eli.thegreenplace.net/images/math/acf8a35cfbd53e99e771cc7d3ceb05918188e693.svg" style="height: 14px;" type="image/svg+xml"&gt;-\infty&amp;lt;i&amp;lt;\infty&lt;/object&gt;, but we only care about the
non-zero coefficients. Therefore, we won't be specifying summation bounds;
instead, &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/2f073b6121c4685b116e69f307f6094903ea9a10.svg" style="height: 18px;" type="image/svg+xml"&gt;\sum_{i}&lt;/object&gt; means &amp;quot;sum over all the non-zero coefficients&amp;quot;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;As a cool exercise, explore how the same technique works for multiplying
two numbers together, treating each number as a polynomial in successive
powers of 10. The one slight complication is that carries
have to be taken into account at the end, but it works really well!
&lt;a class="reference external" href="https://charlesfrye.github.io/math/2019/02/20/multiplication-convoluted-part-one.html"&gt;This blog post&lt;/a&gt;
has more information, if you're interested.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;The similarity in behavior between polynomials and signals here is
quite beautiful, and far from accidental. In fact, if
we take polynomials with addition and multiplication, we get a
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Polynomial_ring"&gt;ring&lt;/a&gt; that's
isomorphic to the ring of finite sequences with element-wise addition and convolution.
Even operations like &amp;quot;delay&amp;quot; or &amp;quot;shift right&amp;quot; can be emulated with
multiplying a polynomial by a power of &lt;em&gt;x&lt;/em&gt; (which could be negative if
we want to &amp;quot;shift left&amp;quot;).&lt;/p&gt;
&lt;p class="last"&gt;This isomorphism is widely employed in mathematics; for example,
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Generating_function"&gt;ordinary generating functions&lt;/a&gt;
use it to represent sequences as polynomials and then manipulate them
algebraically. In signal processing it's also used for the
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Z-transform"&gt;Z transform&lt;/a&gt;.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Math"></category><category term="EE &amp; Embedded"></category></entry><entry><title>Sum of same-frequency sinusoids</title><link href="https://eli.thegreenplace.net/2023/sum-of-same-frequency-sinusoids/" rel="alternate"></link><published>2023-03-11T19:44:00-08:00</published><updated>2024-05-04T19:46:23-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2023-03-11:/2023/sum-of-same-frequency-sinusoids/</id><summary type="html">&lt;p&gt;I was reviewing an electronics textbook the other day, and it made an offhand
comment that &amp;quot;sinusoidal signals of the same frequency always add up to a
sinusoid, even if their magnitudes and phases are different&amp;quot;.
This gave me pause; is that really so? Even with different phases?&lt;/p&gt;
&lt;p&gt;Using EE …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I was reviewing an electronics textbook the other day, and it made an offhand
comment that &amp;quot;sinusoidal signals of the same frequency always add up to a
sinusoid, even if their magnitudes and phases are different&amp;quot;.
This gave me pause; is that really so? Even with different phases?&lt;/p&gt;
&lt;p&gt;Using EE notation, a sinusoidal signal with magnitude &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/86684571efbdc2d7a49562ba00dd15056c517135.svg" style="height: 16px;" type="image/svg+xml"&gt;A_1&lt;/object&gt;, frequency
&lt;img alt="w" class="valign-0" src="https://eli.thegreenplace.net/images/math/aff024fe4ab0fece4091de044c58c9ae4233383a.png" style="height: 8px;" /&gt; and phase &lt;object class="valign-m4" data="https://eli.thegreenplace.net/images/math/3d9f2f00378f60e70beb5531aa2169a534bffe40.svg" style="height: 16px;" type="image/svg+xml"&gt;\phi_1&lt;/object&gt; is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/6e641a468b62c6db3c69c21e1328e23ad284a748.svg" style="height: 19px;" type="image/svg+xml"&gt;A_1 sin(wt+\phi_1)&lt;/object&gt; &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;. The
book's statement amounts to:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/a1b13ea5023ad1e0b1f2dde7cf495c7da83f6657.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=A_3 sin(wt+\phi_3)\]&lt;/object&gt;
&lt;p&gt;The sum is also a sinusoid with the same frequency, but potentially different
magnitude and phase. I couldn't find this equality in any of my reference books,
so why is it true?&lt;/p&gt;
&lt;div class="section" id="empirical-probing"&gt;
&lt;h2&gt;Empirical probing&lt;/h2&gt;
&lt;p&gt;Let's start by asking whether this is true at all; it's far from obvious that
this should work. &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2023/sinusoid"&gt;Armed with Python, NumPy and matplotlib&lt;/a&gt;, I
plotted two sinusoidal signals with the same frequency but different magnitudes
and phases:&lt;/p&gt;
&lt;img alt="Two sinusoidal signals plotted together" class="align-center" src="https://eli.thegreenplace.net/images/2023/two-sinusoids.png" /&gt;
&lt;p&gt;Now, plotting their sum in green on the same chart:&lt;/p&gt;
&lt;img alt="Two sinusoidal signals plotted together with their sum signal" class="align-center" src="https://eli.thegreenplace.net/images/2023/sinusoids-with-sum.png" /&gt;
&lt;p&gt;Well, look at that. It seems to be working. I guess it's time to prove it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="proof-using-trig-identities"&gt;
&lt;h2&gt;Proof using trig identities&lt;/h2&gt;
&lt;p&gt;The first proof I want to demonstrate doesn't use any fancy math beyond some
basic trigonometric identities. One of the best-known ones is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/78bab16c56ec9b9da0b5dc2543c8a5dabee73f08.svg" style="height: 19px;" type="image/svg+xml"&gt;\[sin(a+b)=sin(a)cos(b)+cos(a)sin(b) \hspace{2cm} (id. 1)\]&lt;/object&gt;
&lt;p&gt;Taking our sum of sinusoids:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/72c2f9d4c2be1adde9b7b4ba1bf94f31b3a8dd15.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)\]&lt;/object&gt;
&lt;p&gt;Applying (id.1) to each of the terms, and then regrouping, we get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/dac6d8532a46bbf7490d4e9083835914b8b61005.svg" style="height: 45px;" type="image/svg+xml"&gt;\[\begin{align*}
&amp;lt;sum&amp;gt;&amp;amp;=A_1\left [sin(wt)cos(\phi_1)+cos(wt)sin(\phi_1)  \right ]+A_2\left [sin(wt)cos(\phi_2)+cos(wt)sin(\phi_2)  \right ]\\
&amp;amp;=\left [A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ]sin(wt)+\left [ A_1 sin(\phi_1) + A_2 sin(\phi_2)\right ]cos(wt)\\
\end{align*}\]&lt;/object&gt;
&lt;p&gt;Now, a change of variables trick: we'll assume we can solve the following
set of equations for some &lt;img alt="B" class="valign-0" src="https://eli.thegreenplace.net/images/math/ae4f281df5a5d0ff3cad6371f76d5c29b6d953ec.png" style="height: 12px;" /&gt; and &lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt; &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/c605365120049d7ca728477757b12b12efe13fdd.svg" style="height: 46px;" type="image/svg+xml"&gt;\[\begin{align*}
Bcos(\theta)&amp;amp;=A_1 cos(\phi_1)+A_2 cos(\phi_2) \hspace{2cm} (1)\\
Bsin(\theta)&amp;amp;=A_1 sin(\phi_1)+A_2 sin(\phi_2) \hspace{2cm} (2)\\
\end{align*}\]&lt;/object&gt;
&lt;p&gt;To find &lt;img alt="B" class="valign-0" src="https://eli.thegreenplace.net/images/math/ae4f281df5a5d0ff3cad6371f76d5c29b6d953ec.png" style="height: 12px;" /&gt;, we can square each of (1) and (2) and then add the
squares together:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/4fdeb60073659f55fa75878ec2f0867f9cbe7fd6.svg" style="height: 22px;" type="image/svg+xml"&gt;\[B^2 cos^2 (\theta)+B^2 sin^2 (\theta)=(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2\]&lt;/object&gt;
&lt;p&gt;Using the fact that &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/e188aa0292a1c0d17344548fdcc38dc26faf3429.svg" style="height: 20px;" type="image/svg+xml"&gt;cos^2(a)+sin^2(a)=1&lt;/object&gt;, we get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/28f6325da1ceee1767cdc9508fbc85b90427b1b0.svg" style="height: 23px;" type="image/svg+xml"&gt;\[B=\sqrt{(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2}\]&lt;/object&gt;
&lt;p&gt;To solve for &lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt;, we can divide equation (2) by (1), getting:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/a0ecd0ee8c1f85d0ddb6b1224f48f5a8f2c469de.svg" style="height: 43px;" type="image/svg+xml"&gt;\[\frac{sin(\theta)}{cos(\theta)}=tan(\theta)=\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}\]&lt;/object&gt;
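&lt;p&gt;Meaning that (adding &lt;em&gt;π&lt;/em&gt; when the denominator is negative, so that the angle lands in the correct quadrant):&lt;/p&gt;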
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/b66eb0f8ef309c637f017134d8c66368452594c8.svg" style="height: 43px;" type="image/svg+xml"&gt;\[\theta = atan{\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}}\]&lt;/object&gt;
&lt;p&gt;Now that we have the values of &lt;img alt="B" class="valign-0" src="https://eli.thegreenplace.net/images/math/ae4f281df5a5d0ff3cad6371f76d5c29b6d953ec.png" style="height: 12px;" /&gt; and &lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt;, let's put them aside
for a bit and get back to the final line of our sum of sinusoids equation:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/921c17b6fb04e7cb8cc10d3f5652ed28d6f508ed.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=\left [A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ]sin(wt)+\left [ A_1 sin(\phi_1) + A_2 sin(\phi_2)\right ]cos(wt)\]&lt;/object&gt;
&lt;p&gt;On the right-hand side, we can apply equations (1) and (2) to get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/580c6c30e3caaf5ab9bb0472620c46f53266a5a2.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=B cos(\theta) sin(wt)+ B sin(\theta) cos(wt)\]&lt;/object&gt;
&lt;p&gt;Applying (id.1) again, we get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/042c3173428218fb7a3410884288a540ecc78cd1.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=B sin(wt + \theta)\]&lt;/object&gt;
&lt;p&gt;We've just shown that the sum of sinusoids with the same frequency &lt;img alt="w" class="valign-0" src="https://eli.thegreenplace.net/images/math/aff024fe4ab0fece4091de044c58c9ae4233383a.png" style="height: 8px;" /&gt;
is another sinusoid with frequency &lt;img alt="w" class="valign-0" src="https://eli.thegreenplace.net/images/math/aff024fe4ab0fece4091de044c58c9ae4233383a.png" style="height: 8px;" /&gt;, and we've calculated &lt;img alt="B" class="valign-0" src="https://eli.thegreenplace.net/images/math/ae4f281df5a5d0ff3cad6371f76d5c29b6d953ec.png" style="height: 12px;" /&gt; and
&lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt; from the other parameters (&lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/86684571efbdc2d7a49562ba00dd15056c517135.svg" style="height: 16px;" type="image/svg+xml"&gt;A_1&lt;/object&gt;, &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/dfcd56bce194520e6f50a8f821c98f338cb9d65c.svg" style="height: 16px;" type="image/svg+xml"&gt;A_2&lt;/object&gt;,
&lt;object class="valign-m4" data="https://eli.thegreenplace.net/images/math/3d9f2f00378f60e70beb5531aa2169a534bffe40.svg" style="height: 16px;" type="image/svg+xml"&gt;\phi_1&lt;/object&gt; and &lt;object class="valign-m4" data="https://eli.thegreenplace.net/images/math/f80876d413b14edaee0aa2678ab67346f6da633c.svg" style="height: 16px;" type="image/svg+xml"&gt;\phi_2&lt;/object&gt;) &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/4a4e9e431da45a27bc880a8a1ca44d8b1b9bc143.svg" style="height: 12px;" type="image/svg+xml"&gt;\blacksquare&lt;/object&gt;&lt;/p&gt;
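&lt;p&gt;The result is easy to check numerically. The following is a minimal sketch using only the standard library; the function name &lt;tt class="docutils literal"&gt;combine&lt;/tt&gt; and the sample amplitudes, phases and frequency are arbitrary choices made for illustration. It uses &lt;tt class="docutils literal"&gt;atan2&lt;/tt&gt; rather than a plain arctangent, so that the angle lands in the correct quadrant even when the right-hand side of (1) is negative, as it is for these sample values:&lt;/p&gt;

```python
import math


def combine(a1, phi1, a2, phi2):
    """Return (B, theta) with a1*sin(wt+phi1) + a2*sin(wt+phi2) = B*sin(wt+theta)."""
    x = a1 * math.cos(phi1) + a2 * math.cos(phi2)   # right-hand side of (1)
    y = a1 * math.sin(phi1) + a2 * math.sin(phi2)   # right-hand side of (2)
    return math.hypot(x, y), math.atan2(y, x)


a1, phi1 = 2.0, 0.3     # arbitrary example values
a2, phi2 = 3.0, 2.8
w = 5.0                 # shared frequency in radians per second
b, theta = combine(a1, phi1, a2, phi2)

# Compare both sides of the identity on a grid of time points.
max_err = 0.0
for step in range(1000):
    t = step * 0.01
    lhs = a1 * math.sin(w * t + phi1) + a2 * math.sin(w * t + phi2)
    rhs = b * math.sin(w * t + theta)
    max_err = max(max_err, abs(lhs - rhs))
print(b, theta, max_err)
```

&lt;p&gt;The largest difference between the two sides should be on the order of floating-point rounding error.&lt;/p&gt;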
&lt;/div&gt;
&lt;div class="section" id="proof-using-complex-numbers"&gt;
&lt;h2&gt;Proof using complex numbers&lt;/h2&gt;
&lt;p&gt;The second proof uses a bit more advanced math, but overall feels more elegant
to me. The plan is to use Euler's equation and prove a more general statement
on the complex plane.&lt;/p&gt;
&lt;p&gt;Instead of looking at the sum of real sinusoids, we'll first look at the sum
of two complex exponential functions:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/ae18a24d9752babf28d9472d59bdc1e67bdb2074.svg" style="height: 20px;" type="image/svg+xml"&gt;\[A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}\]&lt;/object&gt;
&lt;p&gt;Reminder: Euler's equation for a complex exponential is&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/0741c35664a1df4373bf777ddecce046f75eb386.svg" style="height: 21px;" type="image/svg+xml"&gt;\[e^{jx}=cosx+jsinx\]&lt;/object&gt;
&lt;p&gt;Regrouping our sum of exponentials a bit and then applying this equation:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/12458dd213e9b9f3ef8ee7b9d34094410fcd3fbb.svg" style="height: 88px;" type="image/svg+xml"&gt;\[\begin{align*}
A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}&amp;amp;=e^{jwt}\left (A_1 e^{j\phi_1} + A_2 e^{j\phi_2}\right )\\
&amp;amp;=e^{jwt}\left ( A_1 cos(\phi_1) + jA_1 sin(\phi_1) + A_2 cos(\phi_2) + jA_2 sin(\phi_2)\right )\\
&amp;amp;=e^{jwt}\left [\left (A_1 cos(\phi_1) + A_2 cos(\phi_2) \right ) + j\left(A_1 sin(\phi_1) + A_2 sin(\phi_2) \right ) \right ]
\end{align*}\]&lt;/object&gt;
&lt;p&gt;The value inside the square brackets can be viewed as a complex number in its
rectangular form: &lt;object class="valign-m4" data="https://eli.thegreenplace.net/images/math/8e2a949b46783cd572f79c9ad9d6a3887f0fb462.svg" style="height: 16px;" type="image/svg+xml"&gt;x + jy&lt;/object&gt;. We can convert it to its polar form:
&lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/14073811ee769d485ac4495503a1d32292b73f45.svg" style="height: 15px;" type="image/svg+xml"&gt;re^{j\theta}&lt;/object&gt;, by calculating:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/dd1d145174f17f82bfcffce4c658ac249025aaf2.svg" style="height: 62px;" type="image/svg+xml"&gt;\[\begin{align*}
r&amp;amp;=\sqrt{x^2+y^2}\\
\theta&amp;amp;=atan(\frac{y}{x})
\end{align*}\]&lt;/object&gt;
&lt;p&gt;In our case:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/18a842fb6ac23aa16c0c4b4fe8c974c064933509.svg" style="height: 23px;" type="image/svg+xml"&gt;\[r=\sqrt{(A_1 cos(\phi_1)+A_2 cos(\phi_2))^2 + (A_1 sin(\phi_1)+A_2 sin(\phi_2))^2}\]&lt;/object&gt;
&lt;p&gt;And:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/b66eb0f8ef309c637f017134d8c66368452594c8.svg" style="height: 43px;" type="image/svg+xml"&gt;\[\theta = atan{\frac{A_1 sin(\phi_1)+A_2 sin(\phi_2)}{A_1 cos(\phi_1)+A_2 cos(\phi_2)}}\]&lt;/object&gt;
&lt;p&gt;Therefore, the sum of complex exponentials is another complex exponential with
the same frequency, but a different magnitude and phase:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/a5028bcfc1af3517f2e52448e4b51f25c80ade80.svg" style="height: 20px;" type="image/svg+xml"&gt;\[A_1 e^{j(wt + \phi_1)} + A_2 e^{j(wt + \phi_2)}= e^{jwt} r e^{j \theta}=r e^{j(wt + \theta)}\]&lt;/object&gt;
&lt;p&gt;From here, we can use Euler's equation again to see the equivalence in terms
of sinusoidal functions:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/dc5490843cd4fc3676a1bcf8fcdf0002ae0f227a.svg" style="height: 45px;" type="image/svg+xml"&gt;\[\begin{align*}
A_1 cos(wt+\phi_1)+jA_1 sin(wt+\phi_1)&amp;amp;+\\
A_2 cos(wt+\phi_2)+jA_2 sin(wt+\phi_2)&amp;amp;=r cos(wt+\theta) + jr sin(wt+\theta)
 \end{align*}\]&lt;/object&gt;
&lt;p&gt;If we only compare the imaginary parts of this equation, we get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/e1ef66dd4f3ec1496cb94bae8f52acf5f77229da.svg" style="height: 19px;" type="image/svg+xml"&gt;\[A_1 sin(wt+\phi_1)+A_2 sin(wt+\phi_2)=r sin(wt+\theta)\]&lt;/object&gt;
&lt;p&gt;Here &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/4dc7c9ec434ed06502767136789763ec11d2c4b7.svg" style="height: 8px;" type="image/svg+xml"&gt;r&lt;/object&gt; and &lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt; are the values we calculated earlier from the other
constants &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/4a4e9e431da45a27bc880a8a1ca44d8b1b9bc143.svg" style="height: 12px;" type="image/svg+xml"&gt;\blacksquare&lt;/object&gt;&lt;/p&gt;
&lt;p&gt;Note that by comparing the real parts of the equation, we can trivially prove a
similar statement about the sum of cosines (which should surprise no one, since
a cosine is just a phase-shifted sine).&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;Electrical engineers prefer their signal frequencies in units of
radians per second.&lt;/p&gt;
&lt;p class="last"&gt;We also like calling the imaginary unit &lt;em&gt;j&lt;/em&gt; instead of &lt;em&gt;i&lt;/em&gt;, because
the latter is used for electrical current.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;If you're wondering &amp;quot;hold on, why would this work?&amp;quot;, recall that
any point &lt;em&gt;(x,y)&lt;/em&gt; on the Cartesian plane can be represented using
&lt;em&gt;polar coordinates&lt;/em&gt; with magnitude &lt;img alt="B" class="valign-0" src="https://eli.thegreenplace.net/images/math/ae4f281df5a5d0ff3cad6371f76d5c29b6d953ec.png" style="height: 12px;" /&gt; and angle &lt;img alt="\theta" class="valign-0" src="https://eli.thegreenplace.net/images/math/cb005d76f9f2e394a770c2562c2e150a413b3216.png" style="height: 12px;" /&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Math"></category><category term="EE &amp; Embedded"></category></entry><entry><title>Some notes on Luz - an assembler, linker and CPU simulator</title><link href="https://eli.thegreenplace.net/2017/some-notes-on-luz-an-assembler-linker-and-cpu-simulator/" rel="alternate"></link><published>2017-01-05T06:27:00-08:00</published><updated>2024-05-04T19:46:23-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2017-01-05:/2017/some-notes-on-luz-an-assembler-linker-and-cpu-simulator/</id><summary type="html">&lt;p&gt;A few years ago I &lt;a class="reference external" href="https://eli.thegreenplace.net/2010/05/05/introducing-luz"&gt;wrote about Luz&lt;/a&gt; - a
self-educational project to implement a CPU simulator and a toolchain for it,
consisting of an assembler and a linker. Since then, I received some questions
by email that made me realize I could do a better job explaining what the
project …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A few years ago I &lt;a class="reference external" href="https://eli.thegreenplace.net/2010/05/05/introducing-luz"&gt;wrote about Luz&lt;/a&gt; - a
self-educational project to implement a CPU simulator and a toolchain for it,
consisting of an assembler and a linker. Since then, I received some questions
by email that made me realize I could do a better job explaining what the
project is and what one can learn from it.&lt;/p&gt;
&lt;p&gt;So I went back to the &lt;a class="reference external" href="https://github.com/eliben/luz-cpu"&gt;Luz repository&lt;/a&gt; and
fixed it up to be more modern, in line with current documentation standards on
GitHub. The landing &lt;cite&gt;README&lt;/cite&gt; page should now provide a good overview, but I also
wanted to write up some less formal documentation I could point to - a place to
show off some of the more interesting features in Luz; a blog post seemed like
the perfect medium for this.&lt;/p&gt;
&lt;p&gt;As before, it makes sense to start with the Luz toplevel diagram:&lt;/p&gt;
&lt;img alt="Luz toplevel diagram" class="align-center" src="https://eli.thegreenplace.net/images/2010/05/luz_proj_toplevel.png" /&gt;
&lt;p&gt;Luz is a collection of related libraries and programs written in Python,
implementing all the stages shown in the diagram above.&lt;/p&gt;
&lt;div class="section" id="the-cpu-simulator"&gt;
&lt;h2&gt;The CPU simulator&lt;/h2&gt;
&lt;p&gt;The Luz CPU is inspired by MIPS (for the instruction set), by Altera Nios II
(for the way &amp;quot;peripherals&amp;quot; are attached to the CPU), and by MPC 555 (for the
memory controller) and is aimed at embedded uses, like Nios II. The &lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/doc/luz_user_manual.rst"&gt;Luz user
manual&lt;/a&gt;
lists the complete instruction set, explaining what each instruction means.&lt;/p&gt;
&lt;p&gt;The simulator itself is functional only - it performs the instructions one after
the other, without trying to simulate how long their execution takes. It's not
very remarkable and is designed to be simple and readable. The most interesting
feature it has, IMHO, is how it maps &amp;quot;peripherals&amp;quot; and even CPU control
registers into memory. Rather than providing special instructions or traps for
OS system calls, Luz facilitates &amp;quot;bare-metal&amp;quot; programming (by which I mean,
without an OS) by mapping &amp;quot;peripherals&amp;quot; into memory, allowing the programmer to
access them by reading and writing special memory locations.&lt;/p&gt;
&lt;p&gt;My inspiration here was soft-core embeddable CPUs like Nios II, which let you
configure what peripherals to connect and how to map them. The CPU can be
configured before it's loaded onto real HW, for example to attach as many SPI
interfaces as needed. For Luz, to create a new peripheral and attach it to the
simulator one implements the &lt;tt class="docutils literal"&gt;Peripheral&lt;/tt&gt; interface:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Peripheral&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; An abstract memory-mapped peripheral interface.&lt;/span&gt;
&lt;span class="sd"&gt;        Memory-mapped peripherals are accessed through memory&lt;/span&gt;
&lt;span class="sd"&gt;        reads and writes.&lt;/span&gt;

&lt;span class="sd"&gt;        The address given to reads and writes is relative to the&lt;/span&gt;
&lt;span class="sd"&gt;        peripheral&amp;#39;s memory map.&lt;/span&gt;
&lt;span class="sd"&gt;        Width is 1, 2, 4 for byte, halfword and word accesses.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_mem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="ne"&gt;NotImplementedError&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_mem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="ne"&gt;NotImplementedError&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
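&lt;p&gt;To make this concrete, here is a sketch of what a trivial peripheral might look like. &lt;tt class="docutils literal"&gt;ScratchRegister&lt;/tt&gt; is a hypothetical example and not part of Luz; the base class is repeated only so the snippet runs on its own, and raising &lt;tt class="docutils literal"&gt;ValueError&lt;/tt&gt; on unsupported accesses is an assumption rather than Luz's actual error handling. Attaching the peripheral to the simulator's memory map is not shown here.&lt;/p&gt;

```python
class Peripheral(object):
    """Abstract memory-mapped peripheral interface (repeated from above)."""
    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()


class ScratchRegister(Peripheral):
    """Hypothetical peripheral: a single 32-bit word at relative address 0."""
    def __init__(self):
        self.value = 0

    def read_mem(self, addr, width):
        if addr != 0 or width != 4:
            raise ValueError("only word access at offset 0 is supported")
        return self.value

    def write_mem(self, addr, width, data):
        if addr != 0 or width != 4:
            raise ValueError("only word access at offset 0 is supported")
        self.value = data % 2**32   # keep the stored value within 32 bits


reg = ScratchRegister()
reg.write_mem(0, 4, 0x1234ABCD)
print(hex(reg.read_mem(0, 4)))
```

&lt;p&gt;Because addresses are relative to the peripheral's own memory map, the class does not need to know where the simulator ends up placing it.&lt;/p&gt;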
&lt;p&gt;Luz implements some built-in features as peripherals as well; for example, the
&lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/luz_asm_sim/lib/simlib/peripheral/coreregisters.py"&gt;core registers&lt;/a&gt;
(interrupt control, exception control, etc.). The idea here is that embedded CPUs
can have multiple custom &amp;quot;registers&amp;quot; to control various features, and creating
dedicated names for them bloats instruction encoding (you need 5 bits to encode
one of 32 registers, etc.); it's better to just map them to memory.&lt;/p&gt;
&lt;p&gt;Another example is the &lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/luz_asm_sim/lib/simlib/peripheral/debugqueue.py"&gt;debug queue&lt;/a&gt;
- a peripheral useful for testing and debugging. It's a single word mapped to
address &lt;tt class="docutils literal"&gt;0xF0000&lt;/tt&gt; in the simulator. When the peripheral gets a write, it
stores it in a special queue and optionally emits the value to stdout. The
queue can later be examined. Here is a simple Luz assembly program that makes
use of it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Counts from 0 to 9 [inclusive], pushing these numbers into the debug queue

    .segment code
    .global asm_main

    .define ADDR_DEBUG_QUEUE, 0xF0000

asm_main:
    li $k0, ADDR_DEBUG_QUEUE

    li $r9, 10                          # r9 is the loop limit
    li $r5, 0                           # r5 is the loop counter

loop:
    sw $r5, 0($k0)                      # store loop counter to debug queue
    addi $r5, $r5, 1                    # increment loop counter
    bltu $r5, $r9, loop                 # loop back if not reached limit

    halt
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the interactive runner to run this program we get:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ python run_test_interactive.py loop_simple_debugqueue
DebugQueue: 0x0
DebugQueue: 0x1
DebugQueue: 0x2
DebugQueue: 0x3
DebugQueue: 0x4
DebugQueue: 0x5
DebugQueue: 0x6
DebugQueue: 0x7
DebugQueue: 0x8
DebugQueue: 0x9
Finished successfully...
Debug queue contents:
[&amp;#39;0x0&amp;#39;, &amp;#39;0x1&amp;#39;, &amp;#39;0x2&amp;#39;, &amp;#39;0x3&amp;#39;, &amp;#39;0x4&amp;#39;, &amp;#39;0x5&amp;#39;, &amp;#39;0x6&amp;#39;, &amp;#39;0x7&amp;#39;, &amp;#39;0x8&amp;#39;, &amp;#39;0x9&amp;#39;]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="assembler"&gt;
&lt;h2&gt;Assembler&lt;/h2&gt;
&lt;p&gt;There's a small snippet of Luz assembly shown above. It's your run-of-the-mill
RISC assembly, with the familiar set of instructions, fairly simple addressing
modes and almost every instruction requiring registers (note how we can't store
into the debug queue directly, for example, without dereferencing a register
that holds its address).&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/doc/luz_user_manual.rst"&gt;Luz user manual&lt;/a&gt;
contains a complete reference for the instructions, including their encodings.
Every instruction is a 32-bit word, with the 6 high bits for the opcode (meaning
up to 64 distinct instructions are supported).&lt;/p&gt;
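&lt;p&gt;As a small illustration of that layout, the sketch below splits a 32-bit word into its 6-bit opcode and the remaining 26 bits. The function name &lt;tt class="docutils literal"&gt;split_opcode&lt;/tt&gt; and the sample word are made up for this example; how the remaining bits are divided into fields depends on the instruction and is described in the user manual, so it is not modeled here.&lt;/p&gt;

```python
def split_opcode(word):
    """Split a 32-bit instruction word into (opcode, remaining 26 bits)."""
    if word not in range(2**32):
        raise ValueError("expected a 32-bit unsigned value")
    # Dividing by 2**26 is equivalent to shifting right by 26 bits;
    # the remainder is the low 26 bits of the word.
    return divmod(word, 2**26)


word = 0xFC000001          # made-up bit pattern, not a real Luz instruction
opcode, rest = split_opcode(word)
print(opcode, hex(rest))
```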
&lt;p&gt;The code snippet also shows off some special features of the full Luz toolchain,
like the special label &lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt;. I'll discuss these later on in the section
about linking.&lt;/p&gt;
&lt;p&gt;Assembly languages are usually fairly simple to parse, and Luz is no exception.
When I started working on Luz, I decided to use the &lt;a class="reference external" href="http://www.dabeaz.com/ply/"&gt;PLY&lt;/a&gt; library for the lexer and parser mainly because I
wanted to play with it. These days I'd probably just hand-roll a parser.&lt;/p&gt;
&lt;p&gt;Luz takes another cool idea from MIPS - &lt;a class="reference external" href="https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Mips/altReg.html"&gt;register aliases&lt;/a&gt;. While
the assembler doesn't enforce any specific ABI on the coder, some conventions are
very important when writing large assembly programs, and especially when
interfacing with routines written by other programmers. To facilitate this, Luz
designates register aliases for callee-saved registers and temporary registers.&lt;/p&gt;
&lt;p&gt;For example, the general-purpose register number 19 can be referred to in Luz
assembly as &lt;tt class="docutils literal"&gt;$r19&lt;/tt&gt; but also as &lt;tt class="docutils literal"&gt;$s1&lt;/tt&gt; - the callee-saved register 1. When
writing standalone Luz programs, one is free to ignore these conventions. To
get a taste of how ABI-conformant Luz assembly would look, take a look at
&lt;a class="reference external" href="https://github.com/eliben/luz-cpu/tree/main/luz_asm_sim/tests_full/procedure_call_stack_convention"&gt;this example&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To be honest, ABI was on my mind because I was initially envisioning a full
programming environment for Luz, including a C compiler. When you have a
compiler, you must have some set of conventions for generated code like
procedure parameter passing, saved registers and so on; in other words, the
platform ABI.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="linker"&gt;
&lt;h2&gt;Linker&lt;/h2&gt;
&lt;p&gt;In my view, one of the features that set Luz apart from other assembler
projects out there is the linker. Luz features a full linker that supports
creating single &amp;quot;binaries&amp;quot; from multiple assembly files, handling all the dirty
work necessary to make that happen. Each assembly file is first &amp;quot;assembled&amp;quot; into
a position-independent object file; these are glued together by the linker which
applies the necessary relocations to resolve symbols across object files. The
&lt;a class="reference external" href="https://github.com/eliben/luz-cpu/tree/main/luz_asm_sim/tests_full/prime_sieve"&gt;prime sieve example&lt;/a&gt;
shows this in action - the program is divided into three &lt;tt class="docutils literal"&gt;.lasm&lt;/tt&gt; files: two
for subroutines and one for &amp;quot;main&amp;quot;.&lt;/p&gt;
&lt;p&gt;As we've seen above, the main subroutine in Luz is called &lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt;. This is
a special name for the linker (not unlike the &lt;tt class="docutils literal"&gt;_start&lt;/tt&gt; symbol for &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux"&gt;modern
Linux assemblers&lt;/a&gt;).
The linker collects a set of object files produced by assembly, and makes sure
to invoke &lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt; from the special location &lt;tt class="docutils literal"&gt;0x100000&lt;/tt&gt;. This is where
the simulator starts execution.&lt;/p&gt;
&lt;p&gt;Luz also has the concept of &lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/luz_asm_sim/lib/asmlib/objectfile.py"&gt;object files&lt;/a&gt;.
They are not unlike ELF images in nature: there's a segment table, an export
table and a relocation table for each object, serving the expected roles. It is
the job of the linker to make sense of this list of objects and correctly
connect all call sites to final subroutine addresses.&lt;/p&gt;
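&lt;p&gt;To make the roles of the export and relocation tables concrete, here is a toy two-pass linker. It is only a sketch: the dictionary layout, the &lt;tt class="docutils literal"&gt;link&lt;/tt&gt; function and the idea of patching a whole word with an absolute address are simplifying assumptions, not Luz's real object format or instruction encoding.&lt;/p&gt;

```python
# Toy model, not Luz's real object format: each "object" has a list of
# words, an export table and a relocation table.
BASE_ADDRESS = 0x100000
WORD_SIZE = 4


def link(objects):
    """Lay objects out back to back and patch each relocation site."""
    # Pass 1: assign an address to every exported symbol.
    symbols = {}
    addr = BASE_ADDRESS
    for obj in objects:
        for sym, offset in obj["exports"].items():
            if sym in symbols:
                raise ValueError("duplicate symbol: " + sym)
            symbols[sym] = addr + offset
        addr += len(obj["words"]) * WORD_SIZE

    # Pass 2: copy the words and fill in the final symbol addresses.
    image = []
    for obj in objects:
        words = list(obj["words"])
        for offset, sym in obj["relocations"]:
            if sym not in symbols:
                raise ValueError("undefined symbol: " + sym)
            words[offset // WORD_SIZE] = symbols[sym]
        image.extend(words)
    return symbols, image


main_obj = {
    "words": [0, 0, 0],                # the word at offset 4 is a call site
    "exports": {"asm_main": 0},
    "relocations": [(4, "helper")],
}
helper_obj = {
    "words": [0, 0],
    "exports": {"helper": 0},
    "relocations": [],
}

symbols, image = link([main_obj, helper_obj])
print(hex(symbols["helper"]), hex(image[1]))
```

&lt;p&gt;The two passes are the important part: symbol addresses are only known once every object has been placed, so call sites can be patched only afterwards.&lt;/p&gt;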
&lt;p&gt;Luz's &lt;a class="reference external" href="https://github.com/eliben/luz-cpu/blob/main/luz_asm_sim/luz_asm.py"&gt;standalone assembler&lt;/a&gt; can
write an assembled image into a file in &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Intel_HEX"&gt;Intel HEX format&lt;/a&gt;, a popular format used in embedded
systems to encode binary images or data in ASCII.&lt;/p&gt;
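&lt;p&gt;For readers who haven't met Intel HEX before, the sketch below encodes a single data record: a colon, the byte count, a 16-bit address, the record type, the data and a checksum. The function name is hypothetical and this is not how Luz's assembler is written; it only illustrates the record layout.&lt;/p&gt;

```python
def ihex_data_record(address, data):
    """Encode one Intel HEX data record (record type 00)."""
    if len(data) not in range(256):
        raise ValueError("a record holds at most 255 data bytes")
    if address not in range(0x10000):
        raise ValueError("record addresses are 16 bits wide")
    body = bytes([len(data), address // 256, address % 256, 0x00]) + bytes(data)
    # Checksum: two's complement of the low byte of the sum of all bytes.
    checksum = -sum(body) % 256
    return ":" + (body + bytes([checksum])).hex().upper()


print(ihex_data_record(0x0030, [0x02, 0x33, 0x7A]))   # :0300300002337A1E
```

&lt;p&gt;Because each line carries its own checksum, a corrupted line in the file can be detected before the image is loaded.&lt;/p&gt;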
&lt;p&gt;The linker was quite a bit of effort to develop. Since all real Luz programs are
small I didn't really need to break them up into multiple assembly files; but
I really wanted to learn how to write a real linker :) Moreover, as already
mentioned my original plans for Luz included a C compiler, and that would make a
linker very helpful, since I'd need to link some &amp;quot;system&amp;quot; code into the user's
program. Even today, Luz has some &amp;quot;startup code&amp;quot; it links into every image:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# The special segments added by the linker.
# __startup: 3 words
# __heap: 1 word
#
LINKER_STARTUP_CODE = string.Template(r&amp;#39;&amp;#39;&amp;#39;
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main

        .segment __heap
        .global __heap
    __heap:
        .word 0
&amp;#39;&amp;#39;&amp;#39;)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This code sets up the stack pointer to the initial address allocated for the
stack, and calls the user's &lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt;.&lt;/p&gt;
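&lt;p&gt;The doubled &lt;tt class="docutils literal"&gt;$$&lt;/tt&gt; in the template is there because &lt;tt class="docutils literal"&gt;string.Template&lt;/tt&gt; treats a single &lt;tt class="docutils literal"&gt;$&lt;/tt&gt; as the start of a placeholder. A small sketch of the substitution, using a shortened copy of the template and the stack pointer value that shows up in the debugger session in this post (how the linker actually computes that value is not shown here):&lt;/p&gt;

```python
import string

# Shortened copy of the template above, for illustration only.
STARTUP = string.Template(r'''
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main
''')

# 0x13FFFC is an assumption taken from the debugger session's $r29 value.
text = STARTUP.substitute(SP_POINTER="0x%X" % 0x13FFFC)
print(text)
```

&lt;p&gt;After substitution &lt;tt class="docutils literal"&gt;$$sp&lt;/tt&gt; becomes a plain &lt;tt class="docutils literal"&gt;$sp&lt;/tt&gt;, which the assembler then resolves like any other register alias.&lt;/p&gt;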
&lt;/div&gt;
&lt;div class="section" id="debugger-and-disassembler"&gt;
&lt;h2&gt;Debugger and disassembler&lt;/h2&gt;
&lt;p&gt;Luz comes with a simple program runner that will execute a Luz program
(consisting of multiple assembly files); it also has an interactive mode - a
debugger. Here's a sample session with the simple loop example shown above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ python run_test_interactive.py -i loop_simple_debugqueue

LUZ simulator started at 0x00100000

[0x00100000] [lui $sp, 0x13] &amp;gt;&amp;gt; set alias 0
[0x00100000] [lui $r29, 0x13] &amp;gt;&amp;gt; s
[0x00100004] [ori $r29, $r29, 0xFFFC] &amp;gt;&amp;gt; s
[0x00100008] [call 0x40003 [0x10000C]] &amp;gt;&amp;gt; s
[0x0010000C] [lui $r26, 0xF] &amp;gt;&amp;gt; s
[0x00100010] [ori $r26, $r26, 0x0] &amp;gt;&amp;gt; s
[0x00100014] [lui $r9, 0x0] &amp;gt;&amp;gt; s
[0x00100018] [ori $r9, $r9, 0xA] &amp;gt;&amp;gt; s
[0x0010001C] [lui $r5, 0x0] &amp;gt;&amp;gt; s
[0x00100020] [ori $r5, $r5, 0x0] &amp;gt;&amp;gt; s
[0x00100024] [sw $r5, 0($r26)] &amp;gt;&amp;gt; s
[0x00100028] [addi $r5, $r5, 0x1] &amp;gt;&amp;gt; s
[0x0010002C] [bltu $r5, $r9, -2] &amp;gt;&amp;gt; s
[0x00100024] [sw $r5, 0($r26)] &amp;gt;&amp;gt; s
[0x00100028] [addi $r5, $r5, 0x1] &amp;gt;&amp;gt; s
[0x0010002C] [bltu $r5, $r9, -2] &amp;gt;&amp;gt; s
[0x00100024] [sw $r5, 0($r26)] &amp;gt;&amp;gt; s
[0x00100028] [addi $r5, $r5, 0x1] &amp;gt;&amp;gt; r
$r0   = 0x00000000   $r1   = 0x00000000   $r2   = 0x00000000   $r3   = 0x00000000
$r4   = 0x00000000   $r5   = 0x00000002   $r6   = 0x00000000   $r7   = 0x00000000
$r8   = 0x00000000   $r9   = 0x0000000A   $r10  = 0x00000000   $r11  = 0x00000000
$r12  = 0x00000000   $r13  = 0x00000000   $r14  = 0x00000000   $r15  = 0x00000000
$r16  = 0x00000000   $r17  = 0x00000000   $r18  = 0x00000000   $r19  = 0x00000000
$r20  = 0x00000000   $r21  = 0x00000000   $r22  = 0x00000000   $r23  = 0x00000000
$r24  = 0x00000000   $r25  = 0x00000000   $r26  = 0x000F0000   $r27  = 0x00000000
$r28  = 0x00000000   $r29  = 0x0013FFFC   $r30  = 0x00000000   $r31  = 0x0010000C

[0x00100028] [addi $r5, $r5, 0x1] &amp;gt;&amp;gt; s 100
[0x00100030] [halt] &amp;gt;&amp;gt; q
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are many interesting things here demonstrating how Luz works:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Note the start up at &lt;tt class="docutils literal"&gt;0x100000&lt;/tt&gt; - this is where Luz places the start-up
segment - three instructions that set up the stack pointer and then &lt;tt class="docutils literal"&gt;call&lt;/tt&gt;
the user's code (&lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt;). The user's &lt;tt class="docutils literal"&gt;asm_main&lt;/tt&gt; starts running at
the fourth instruction executed by the simulator.&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;li&lt;/tt&gt; is a pseudo-instruction, broken into two real instructions: &lt;tt class="docutils literal"&gt;lui&lt;/tt&gt;
for the upper half of the register, followed by &lt;tt class="docutils literal"&gt;ori&lt;/tt&gt; for the lower half of
the register. The reason for this is &lt;tt class="docutils literal"&gt;li&lt;/tt&gt; having a 32-bit immediate, which
can't fit in a Luz instruction. Therefore, it's broken into two parts which
only need 16-bit immediates. This trick is common in RISC ISAs.&lt;/li&gt;
&lt;li&gt;Jump labels are resolved to be relative by the assembler: the jump to &lt;tt class="docutils literal"&gt;loop&lt;/tt&gt;
is replaced by &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-2&lt;/span&gt;&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;Disassembly! The debugger shows the instruction decoded from every word where
execution stops. Note how this exposes pseudo-instructions.&lt;/li&gt;
&lt;/ul&gt;
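&lt;p&gt;The arithmetic behind two of these points is easy to check. The sketch below uses hypothetical helper names (they are not taken from Luz's sources) and the numbers from the session: the stack pointer value split into &lt;tt class="docutils literal"&gt;lui&lt;/tt&gt;/&lt;tt class="docutils literal"&gt;ori&lt;/tt&gt; halves, and the branch at &lt;tt class="docutils literal"&gt;0x10002C&lt;/tt&gt; jumping back to &lt;tt class="docutils literal"&gt;0x100024&lt;/tt&gt;.&lt;/p&gt;

```python
def split_li(value):
    """Split a 32-bit immediate into the (upper, lower) 16-bit halves
    used by the lui/ori pair."""
    if value not in range(2 ** 32):
        raise ValueError("immediate does not fit in 32 bits")
    return value // 0x10000, value % 0x10000


def branch_offset(branch_addr, target_addr, word_size=4):
    """Relative branch offset in words, measured from the branch itself
    (this convention reproduces the -2 seen in the session)."""
    return (target_addr - branch_addr) // word_size


upper, lower = split_li(0x13FFFC)
print(hex(upper), hex(lower))                  # 0x13 0xfffc
print(branch_offset(0x10002C, 0x100024))       # -2
```

&lt;p&gt;Working in words rather than bytes lets a fixed-width immediate field reach four times as far.&lt;/p&gt;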
&lt;/div&gt;
&lt;div class="section" id="the-in-progress-rtl-implementation"&gt;
&lt;h2&gt;The in-progress RTL implementation&lt;/h2&gt;
&lt;p&gt;Luz was a hobby project, but an ambitious one :-) Even before I wrote the first
line of the assembler or simulator, I started working on an actual CPU
implementation in synthesizable VHDL, meaning to get a complete RTL image to run
on FPGAs. Unfortunately, I didn't finish this part of the project
and what you find in Luz's &lt;tt class="docutils literal"&gt;experimental/luz_uc&lt;/tt&gt; directory is only 75%
complete. The ALU is there, the registers, the hookups to peripherals, even
parts of the control path - dealing with instruction fetching, decoding, etc. My
original plan was to implement a pipelined CPU (a RISC ISA makes this relatively
simple), which perhaps was a bit too much. I should have started simpler.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Luz was an extremely educational project for me. When I started working on it,
I mostly had embedded programming experience and was just starting to get
interested in systems programming. Luz flung me into the world of assemblers,
linkers, binary images, calling conventions, and so on. Besides, Python was
a new language for me at the time - Luz started just months after
&lt;a class="reference external" href="https://eli.thegreenplace.net/2008/05/14/python"&gt;I first got into Python&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Its ~8000 lines of Python code are thus likely not my best Python code, but they
should be readable and well commented. I did modernize it a bit over the years,
for example to make it run on both Python 2 and 3.&lt;/p&gt;
&lt;p&gt;I still hope to get back to the RTL implementation project one day. It's really
very close to being able to run realistic assembly programs on &lt;em&gt;real hardware&lt;/em&gt;
(FPGAs). My dream back then was to fully close the loop by adding a Luz code
generation backend to &lt;a class="reference external" href="https://github.com/eliben/pycparser"&gt;pycparser&lt;/a&gt;. Maybe
I'll still fulfill it one day :-)&lt;/p&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Assembly"></category><category term="EE &amp; Embedded"></category><category term="Linkers and Loaders"></category><category term="Python"></category></entry><entry><title>Introducing Luz</title><link href="https://eli.thegreenplace.net/2010/05/05/introducing-luz" rel="alternate"></link><published>2010-05-05T19:43:38-07:00</published><updated>2023-02-04T13:41:52-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2010-05-05:/2010/05/05/introducing-luz</id><summary type="html">
&lt;p&gt;
OK, so the documentation still isn't complete, but I can't wait to introduce my newest concoction - &lt;a href="https://github.com/eliben/luz-cpu/"&gt;Luz&lt;/a&gt;. Luz is a pure-Python implementation of a MIPS-like CPU (as a simulator, of course). This CPU is programmable in an assembly language, a complete assembler for which has been implemented, along with a …&lt;/p&gt;</summary><content type="html">
&lt;p&gt;
OK, so the documentation still isn't complete, but I can't wait to introduce my newest concoction - &lt;a href="https://github.com/eliben/luz-cpu/"&gt;Luz&lt;/a&gt;. Luz is a pure-Python implementation of a MIPS-like CPU (as a simulator, of course). This CPU is programmable in an assembly language, a complete assembler for which has been implemented, along with a linker that takes several object files and combines them into an executable image to run on the simulator. Oh, and did I mention that it also includes a rudimentary debugger and disassembler? All of this is Luz:
&lt;/p&gt;

&lt;p&gt;
&lt;center&gt;
&lt;img src="https://eli.thegreenplace.net/images/2010/05/luz_proj_toplevel.png" title="luz_proj_toplevel" width="437" height="952" class="aligncenter size-full wp-image-2165" /&gt;
&lt;/center&gt;
&lt;/p&gt;

&lt;p&gt;
To call Luz new is a bit of a stretch, because I started working on it more than two years ago. It has been a jagged road, with occasional spurts of productivity, but now Luz is finally in a presentable form.
&lt;/p&gt;&lt;p&gt;

&lt;/p&gt;&lt;p&gt;
I'll paste from its "getting started guide":
&lt;/p&gt;

&lt;blockquote&gt;
&lt;strong&gt;What is Luz useful for?&lt;/strong&gt;
I don't know yet. It's a self-educational project of mine, and I learned a lot by working on it. I suppose that Luz's main value is as an educational tool. Its implementation focuses on simplicity and modularity, and is done in Python, which is a portable and very readable high-level language.
Luz can serve as a sample of implementing a complete assembler, a complete linker, a complete CPU simulator. Other such tools exist, but usually not in the clean and self-contained form offered by Luz. In any case, if you've found Luz useful, I'd love to receive feedback.
&lt;/blockquote&gt;

&lt;p&gt;
This summarizes it, really. Not much more to add, except that Luz is available in source-only form for now, so you'll have to check it out from SVN or just look at the sources in the online browser. Checking the source out is recommended because it allows one to view the documentation in nice HTML format. A few example programs in Luz assembly are available. Luz requires Python 2.6 or higher and the PLY module installed. I tested it on Windows XP and Ubuntu.
&lt;/p&gt;&lt;p&gt;

I've written &lt;a href="https://eli.thegreenplace.net/2005/02/20/mix-implementation-in-perl-completed/"&gt;an assembler and a CPU simulator before&lt;/a&gt;, but that was for a very weird architecture (Knuth's MIX from TAOCP). Luz is a much more useful beast - the CPU is not far from real modern CPUs (the embedded kind, mostly), the assembly language is familiar and best of all, Luz also includes a linker, which will make it much easier to compile C for it in the future.
&lt;/p&gt;&lt;p&gt;

I'll write more about Luz sometime later, when I find the time to work on its documentation.
&lt;/p&gt;


    </content><category term="misc"></category><category term="Assembly"></category><category term="EE &amp; Embedded"></category><category term="Linkers and loaders"></category></entry><entry><title>Framing in serial communications</title><link href="https://eli.thegreenplace.net/2009/08/12/framing-in-serial-communications" rel="alternate"></link><published>2009-08-12T05:16:47-07:00</published><updated>2023-06-30T23:16:27-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2009-08-12:/2009/08/12/framing-in-serial-communications</id><summary type="html">
        &lt;div class="section" id="introduction"&gt;
&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;In the &lt;a class="reference external" href="https://eli.thegreenplace.net/2009/08/07/a-live-data-monitor-with-python-pyqt-and-pyserial/"&gt;previous post&lt;/a&gt; we've seen how to send and receive data on the serial port with Python and plot it live using a pretty GUI.&lt;/p&gt;
&lt;p&gt;Notice that the sender script (sender_sim.py) is just sending one byte at a time. The &amp;quot;chunks&amp;quot; of data in the protocol between …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">
        &lt;div class="section" id="introduction"&gt;
&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;In the &lt;a class="reference external" href="https://eli.thegreenplace.net/2009/08/07/a-live-data-monitor-with-python-pyqt-and-pyserial/"&gt;previous post&lt;/a&gt; we've seen how to send and receive data on the serial port with Python and plot it live using a pretty GUI.&lt;/p&gt;
&lt;p&gt;Notice that the sender script (sender_sim.py) is just sending one byte at a time. The &amp;quot;chunks&amp;quot; of data in the protocol between the sender and receiver are single bytes. This is simple and convenient, but hardly sufficient in the general sense. We want to be able to send multiple-byte data frames between the communicating parties.&lt;/p&gt;
&lt;p&gt;However, there are some challenges that arise immediately:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The receiver is just receiving a stream of bytes from the serial port. How does it know when a message begins or ends? How does it know how long the message is?&lt;/li&gt;
&lt;li&gt;Even more seriously, we can not assume a noise-free channel. This is real, physical hardware stuff. Bytes and whole chunks can and will be lost due to electrical noise. Worse, other bytes will be distorted (say, a single bit can be flipped due to noise).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To see how this can be done in a safe and tested manner, we first have to learn about the basics of the Data Link Layer in computer networks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="data-link-layer"&gt;
&lt;h3&gt;Data Link Layer&lt;/h3&gt;
&lt;p&gt;Given a physical layer that can transmit signals between devices, the job of the Data Link Layer &lt;a class="footnote-reference" href="#id9" id="id1"&gt;[1]&lt;/a&gt; is (roughly stated) to transmit whole frames of data, with some means of assuring the integrity of the data (lack of errors). When we use sockets to communicate over TCP or UDP on the internet, the framing is taken care of deep in the hardware, and we don't even feel it. On the serial port, however, we must take care of the framing and error handling ourselves &lt;a class="footnote-reference" href="#id10" id="id2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="framing"&gt;
&lt;h4&gt;Framing&lt;/h4&gt;
&lt;p&gt;In chapter 3 of his &lt;a class="reference external" href="https://eli.thegreenplace.net/2009/08/08/book-review-computer-networks-4th-edition-by-andrew-tanenbaum/"&gt;&amp;quot;Computer Networks&amp;quot;&lt;/a&gt; textbook, Tanenbaum defines the following methods of framing:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Inserting time gaps between frames&lt;/li&gt;
&lt;li&gt;Physical layer coding violations&lt;/li&gt;
&lt;li&gt;Character count&lt;/li&gt;
&lt;li&gt;Flag bytes with byte stuffing&lt;/li&gt;
&lt;li&gt;Flag bytes with bit stuffing&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Methods (1) and (2) are only suitable for a hardware-implemented data link layer &lt;a class="footnote-reference" href="#id11" id="id3"&gt;[3]&lt;/a&gt;. It is very difficult (read: impossible) to ensure timing when multiple layers of software (running on Windows!) are involved. (2) is an interesting hardware method - but out of the scope of this article.&lt;/p&gt;
&lt;p&gt;Method (3) means specifying in the frame header the number of bytes in the frame. The trouble with this is that the count can be garbled by a transmission error. In such a case, it's very difficult to &amp;quot;resynchronize&amp;quot;. This method is rarely used.&lt;/p&gt;
&lt;p&gt;Methods (4) and (5) are somewhat similar. In this article I'll focus on (4), as (5) is not suitable for serial port communications.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="flag-bytes-with-byte-stuffing"&gt;
&lt;h4&gt;Flag bytes with byte stuffing&lt;/h4&gt;
&lt;p&gt;Let's begin with a simple idea and develop it into a full, robust scheme.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Flag bytes&lt;/em&gt; are special byte values that denote when a frame begins and ends. Suppose that we want to be able to send frames of arbitrary length. A special start flag byte will denote the beginning of the frame, and an end flag byte will denote its end.&lt;/p&gt;
&lt;img src="https://eli.thegreenplace.net/images/2009/08/flags_data.png" /&gt;
&lt;p&gt;A question arises, however. Suppose that the value of the end flag is 0x98. What if the value 0x98 appears somewhere in the data? The protocol will get confused and end the message.&lt;/p&gt;
&lt;p&gt;There is a simple solution to this problem that will be familiar to all programmers who know about escaping quotes and special characters in strings. It is called &lt;em&gt;byte stuffing&lt;/em&gt;, or &lt;em&gt;octet stuffing&lt;/em&gt;, or simply &lt;em&gt;escaping&lt;/em&gt; &lt;a class="footnote-reference" href="#id12" id="id4"&gt;[4]&lt;/a&gt;. The scheme goes as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Whenever a flag (start or end) byte appears in the data, we shall insert a special escape byte (ESC) before it. When the receiver sees an ESC, it knows to ignore it and not insert it into the actual data received (de-stuffing).&lt;/li&gt;
&lt;li&gt;Whenever ESC itself has to appear in the data, another ESC is prepended to it. The receiver removes the first one but keeps the second one &lt;a class="footnote-reference" href="#id13" id="id5"&gt;[5]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are a few examples:&lt;/p&gt;
&lt;img src="https://eli.thegreenplace.net/images/2009/08/escaping.png" /&gt;
&lt;p&gt;Note that we didn't specify what the data is - it's arbitrary and up to the protocol to decide. The only really required part of the data is some kind of error checking - a checksum, or better yet a CRC. This is customarily the last byte (or last word) of the frame, referring to all the bytes in the frame (in its un-stuffed form).&lt;/p&gt;
&lt;p&gt;This scheme is quite robust: any lost byte (be it a flag, an escape, a data byte or a checksum byte) will cause the receiver to lose just one frame, after which it will resynchronize onto the start flag byte of the next one.&lt;/p&gt;
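&lt;p&gt;The stuffing rules above fit in a few lines of Python. This is a sketch under assumptions: the three byte values are arbitrary examples, and &lt;tt class="docutils literal"&gt;unstuff&lt;/tt&gt; only handles one complete, undamaged frame (a receiver that copes with stray and lost bytes needs a small state machine instead).&lt;/p&gt;

```python
# Example values only; the scheme works with any three distinct bytes.
START, END, ESC = 0x12, 0x13, 0x7D
SPECIAL = (START, END, ESC)


def stuff(payload):
    """Wrap payload in START/END flags, inserting ESC before special bytes."""
    frame = bytearray([START])
    for byte in payload:
        if byte in SPECIAL:
            frame.append(ESC)
        frame.append(byte)
    frame.append(END)
    return bytes(frame)


def unstuff(frame):
    """Inverse of stuff() for one complete, undamaged frame."""
    if len(frame) == 0 or frame[0] != START or frame[-1] != END:
        raise ValueError("not a complete frame")
    payload = bytearray()
    escaped = False
    for byte in frame[1:-1]:
        if byte == ESC and not escaped:
            escaped = True        # drop the ESC, keep the next byte as data
            continue
        payload.append(byte)
        escaped = False
    if escaped:
        raise ValueError("frame ends in the middle of an escape")
    return bytes(payload)


data = bytes([0x01, 0x13, 0x7D, 0x02])
frame = stuff(data)
print(frame.hex())   # 12017d137d7d0213
```

&lt;p&gt;In the worst case (a payload made entirely of special bytes) the stuffed frame is twice the payload size plus the two flags, which is worth remembering when sizing buffers.&lt;/p&gt;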
&lt;/div&gt;
&lt;div class="section" id="ppp"&gt;
&lt;h4&gt;PPP&lt;/h4&gt;
&lt;p&gt;As a matter of fact, this method is a slight simplification of the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Point-to-Point_Protocol"&gt;Point-to-Point Protocol&lt;/a&gt; (PPP) which is used by most ISPs for providing ADSL internet to home users, so there's a good chance you're using it now to surf the net and read this article! The framing of PPP is defined in &lt;a class="reference external" href="http://tools.ietf.org/html/rfc1662"&gt;RFC 1662&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In particular, PPP does the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Both the start and end flag bytes are 0x7E (they shouldn't really be different, if you think about it)&lt;/li&gt;
&lt;li&gt;The escape byte is 0x7D&lt;/li&gt;
&lt;li&gt;Whenever a flag or escape byte appears in the message, it is escaped by 0x7D and the byte itself is XOR-ed with 0x20. So, for example 0x7E becomes 0x7D 0x5E. Similarly 0x7D becomes 0x7D 0x5D. The receiver unstuffs the escape byte and XORs the next byte with 0x20 again to get the original &lt;a class="footnote-reference" href="#id14" id="id6"&gt;[6]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
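&lt;p&gt;The escaping rule from the list above can be sketched as follows. The function names are made up, and the code covers only the two bytes mentioned here; it is not a full implementation of the framing in RFC 1662.&lt;/p&gt;

```python
PPP_FLAG = 0x7E
PPP_ESC = 0x7D


def ppp_escape(payload):
    """Escape flag and escape bytes: emit 0x7D, then the byte XOR 0x20."""
    out = bytearray()
    for byte in payload:
        if byte in (PPP_FLAG, PPP_ESC):
            out.append(PPP_ESC)
            out.append(byte ^ 0x20)
        else:
            out.append(byte)
    return bytes(out)


def ppp_unescape(stuffed):
    """Undo ppp_escape(): drop each 0x7D and XOR the following byte."""
    out = bytearray()
    escaped = False
    for byte in stuffed:
        if escaped:
            out.append(byte ^ 0x20)
            escaped = False
        elif byte == PPP_ESC:
            escaped = True
        else:
            out.append(byte)
    return bytes(out)


print(ppp_escape(bytes([0x7E, 0x41, 0x7D])).hex())   # 7d5e417d5d
```

&lt;p&gt;Note that after escaping, the value 0x7E never appears inside the payload, so any 0x7E the receiver sees on the line is a genuine frame boundary.&lt;/p&gt;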
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="an-example"&gt;
&lt;h3&gt;An example&lt;/h3&gt;
&lt;p&gt;Let's now see a completely worked-out example that demonstrates how this works.&lt;/p&gt;
&lt;p&gt;Suppose we define the following protocol:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Start flag: 0x12&lt;/li&gt;
&lt;li&gt;End flag: 0x13&lt;/li&gt;
&lt;li&gt;Escape (DLE): 0x7D&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the sender wants to send the following data message (let's ignore its contents for the sake of the example - they're really not that important). The original data is in &lt;strong&gt;(a)&lt;/strong&gt;:&lt;/p&gt;
&lt;img src="https://eli.thegreenplace.net/images/2009/08/example1.png" /&gt;
&lt;p&gt;The data contains two flags that need to be escaped - an end flag at position 2 (counting from 0, of course!), and a DLE at position 4.&lt;/p&gt;
&lt;p&gt;The sender's data link layer &lt;a class="footnote-reference" href="#id15" id="id7"&gt;[7]&lt;/a&gt; turns the data into the frame shown in &lt;strong&gt;(b)&lt;/strong&gt; - start and end flags are added, and in-message flags are escaped.&lt;/p&gt;
&lt;p&gt;Let's see how the receiver handles such a frame. For demonstration, assume that the first byte the receiver draws from the serial port is not a real part of the message (we want to see how it handles this). In the following diagram, 'Receiver state' is the state of the receiver &lt;em&gt;after&lt;/em&gt; the received byte. 'Data buffer' is the currently accumulated message buffer to pass to an upper level:&lt;/p&gt;
&lt;img src="https://eli.thegreenplace.net/images/2009/08/example1_rcv.png" /&gt;
&lt;p&gt;A few things to note:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The &amp;quot;stray&amp;quot; byte before the header is ignored: according to the protocol each frame has to start with a header, so this isn't part of the frame.&lt;/li&gt;
&lt;li&gt;The start and end flags are not inserted into the data buffer&lt;/li&gt;
&lt;li&gt;Escapes (DLEs) are correctly handled by a special state&lt;/li&gt;
&lt;li&gt;When the frame is finished with an end flag, the receiver has a frame ready to pass to an upper level, and goes back to waiting for a header - a new frame.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, we see that the message received is exactly the message sent. All the protocol details (flags, escapes and so on) were transparently handled by the data link layer &lt;a class="footnote-reference" href="#id16" id="id8"&gt;[8]&lt;/a&gt;.&lt;/p&gt;
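&lt;p&gt;The receiver described above is a small state machine, and a sketch of it in Python might look like this. The state names, the class and the message bytes are my own for illustration (the bytes are made up, chosen only to have an end flag at position 2 and a DLE at position 4); treating an unescaped start flag inside a message as a restart is a design choice of this sketch, not something the protocol above mandates.&lt;/p&gt;

```python
START, END, DLE = 0x12, 0x13, 0x7D

WAIT_HEADER, IN_MSG, AFTER_DLE = "WAIT_HEADER", "IN_MSG", "AFTER_DLE"


class FrameReceiver:
    """Feed bytes one at a time; completed frames pile up in self.frames."""

    def __init__(self):
        self.state = WAIT_HEADER
        self.buffer = bytearray()
        self.frames = []

    def feed(self, byte):
        if self.state == WAIT_HEADER:
            if byte == START:
                self.buffer = bytearray()
                self.state = IN_MSG
            # anything else is a stray byte and is ignored
        elif self.state == IN_MSG:
            if byte == DLE:
                self.state = AFTER_DLE
            elif byte == END:
                self.frames.append(bytes(self.buffer))
                self.state = WAIT_HEADER
            elif byte == START:
                # An unescaped start flag means the previous frame was
                # cut short, so drop it and start over.
                self.buffer = bytearray()
            else:
                self.buffer.append(byte)
        else:  # AFTER_DLE: take the byte literally
            self.buffer.append(byte)
            self.state = IN_MSG


# Made-up bytes: an end flag at position 2 and a DLE at position 4.
message = bytes([0x11, 0x22, 0x13, 0x44, 0x7D, 0x55])
on_the_wire = bytes([0xAB, START, 0x11, 0x22, DLE, 0x13,
                     0x44, DLE, 0x7D, 0x55, END])

receiver = FrameReceiver()
for b in on_the_wire:
    receiver.feed(b)
print(receiver.frames == [message])   # True
```

&lt;p&gt;Because the receiver keeps all of its state between calls to &lt;tt class="docutils literal"&gt;feed&lt;/tt&gt;, it works the same whether bytes arrive one at a time or in arbitrary chunks from the serial port.&lt;/p&gt;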
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;There are several methods of handling framing in communications, although most are unsuitable to be used on top of the serial port. Among the ones that are suitable, the most commonly used is &lt;em&gt;byte stuffing&lt;/em&gt;. By defining a couple of &amp;quot;magic value&amp;quot; flags and careful rules of escaping, this framing method is both robust and easy to implement as a software layer. It is also widely used, as PPP depends on it.&lt;/p&gt;
&lt;p&gt;Finally, it's important to remember that for a high level of robustness, it's required to add some kind of error checking into the protocol - such as computing a CRC on the message and appending it as the last word of the message, which the receiver can verify before deciding that the message is valid.&lt;/p&gt;
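&lt;p&gt;As a rough illustration of that last point, here is one way to append and verify a CRC using Python's standard &lt;tt class="docutils literal"&gt;zlib.crc32&lt;/tt&gt;. The choice of CRC-32, the big-endian byte order and the function names are assumptions of this sketch; a real protocol may well use a shorter CRC.&lt;/p&gt;

```python
import zlib


def add_crc(payload):
    """Append a CRC-32 of the payload as the last four bytes (big-endian)."""
    return bytes(payload) + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(message):
    """Return the payload if the trailing CRC matches, otherwise None."""
    payload, received = message[:-4], message[-4:]
    if len(received) != 4:
        return None
    if zlib.crc32(payload).to_bytes(4, "big") != received:
        return None
    return payload


sent = add_crc(b"hello")
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]   # flip one bit
print(check_crc(sent), check_crc(damaged))     # b'hello' None
```

&lt;p&gt;The CRC is computed over the un-stuffed message, so the sender adds it before byte stuffing and the receiver checks it after de-stuffing.&lt;/p&gt;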
&lt;div align="center" class="align-center"&gt;&lt;img class="align-center" src="https://eli.thegreenplace.net/images/hline.jpg" style="width: 320px; height: 5px;" /&gt;&lt;/div&gt;
&lt;table class="docutils footnote" frame="void" id="id9" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The Data Link Layer is layer 2 in the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/OSI_model"&gt;OSI model&lt;/a&gt;. In the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/TCP/IP_model"&gt;TCP/IP model&lt;/a&gt; it's simply called the &amp;quot;link layer&amp;quot;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id10" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The serial port can be configured to add parity bits to bytes. These days, this option is rarely used, because:&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;A single parity bit isn't a very strong means of detecting errors. 2-bit errors fool it.&lt;/li&gt;
&lt;li&gt;Error handling is usually done by stronger means at a higher level.&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="docutils footnote" frame="void" id="id11" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For example Ethernet (802.3) uses 12 octets of idle characters between frames.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id12" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;You might run into the term DLE - Data Link Escape, which means the same thing. I will use the acronyms DLE and ESC interchangeably.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id13" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id5"&gt;[5]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Just like quotes and escape characters in strings! In C: &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;&amp;quot;I&lt;/span&gt; &lt;span class="pre"&gt;say&lt;/span&gt; &lt;span class="pre"&gt;\&amp;quot;Hello\&amp;quot;&amp;quot;&lt;/span&gt;&lt;/tt&gt;. To escape the escape, repeat it: &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;&amp;quot;Here&lt;/span&gt; &lt;span class="pre"&gt;comes&lt;/span&gt; &lt;span class="pre"&gt;the&lt;/span&gt; &lt;span class="pre"&gt;backslash:&lt;/span&gt; &lt;span class="pre"&gt;\\&lt;/span&gt; &lt;span class="pre"&gt;-&lt;/span&gt; &lt;span class="pre"&gt;seen&lt;/span&gt; &lt;span class="pre"&gt;it?&amp;quot;&lt;/span&gt;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id14" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id6"&gt;[6]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;I'd love to hear why this XOR-ing is required. One simple reason I can think of is to prevent the flag and escape bytes appearing &amp;quot;on the line&amp;quot; even after they're escaped. Presumably this improves resynchronization if the escape byte is lost?&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id15" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id7"&gt;[7]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Which is just a fancy way to say &amp;quot;a protocol wrapping function&amp;quot;, since the layer is implemented in software.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="id16" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#id8"&gt;[8]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Such transparency is one of the greatest ideas of layered network protocols. So when we implement protocols in software, it's a good thing to keep in mind - transparency aids modularity and decoupling, it's a &lt;em&gt;good thing&lt;/em&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;

    </content><category term="misc"></category><category term="EE &amp; Embedded"></category><category term="Serial port"></category></entry></feed>