This is a post about multiplying polynomials, convolution sums and the connection between them.

Multiplying polynomials

Suppose we want to multiply one polynomial by another:

\[(3x^3+x^2+2x+1)\cdot(2x^2+6)\]

This is basic middle-school math - we start by cross-multiplying:

\[6x^5+2x^4+4x^3+2x^2+18x^3+6x^2+12x+6\]

And then collect all the terms together by adding up the coefficients:

\[6x^5+2x^4+22x^3+8x^2+12x+6\]
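
If you want to double-check such an expansion mechanically, a symbolic math library does the job; here's a minimal sketch using sympy (assuming it's available):

    import sympy

    x = sympy.symbols("x")
    product = sympy.expand((3*x**3 + x**2 + 2*x + 1) * (2*x**2 + 6))
    print(product)  # 6*x**5 + 2*x**4 + 22*x**3 + 8*x**2 + 12*x + 6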

Let's look at a slightly different way to achieve the same result. Instead of cross-multiplying all terms and then adding up, we'll focus on which terms appear in the output, and for each such term - what its coefficient is going to be. This is easy to demonstrate with a table, where we lay out one polynomial horizontally and the other vertically. Note that we have to be explicit about the zero coefficient of x in the second polynomial:

Table showing polynomial multiplication

The diagonals that add up to each term in the output are highlighted. For example, to get the coefficient of x^3 in the output, we have to add up:

  • The coefficient of x^3 in the first polynomial, multiplied by the constant (coefficient of x^0) in the second polynomial
  • The coefficient of x^2 in the first polynomial, multiplied by the coefficient of x in the second polynomial
  • The coefficient of x in the first polynomial, multiplied by the coefficient of x^2 in the second polynomial

(if the second polynomial had an x^3 term, there would be another component to add)

For what follows, let's move to a somewhat more abstract representation: a polynomial P can be represented as a sum of coefficients P_i multiplied by corresponding powers of x [1]:

\[P=\sum_{i}P_i x^i\]

Suppose we have two polynomials, P and R. When we multiply them together, the resulting polynomial is S. Based on our insight from the table diagonals above, we can say that for each k:

\[S_k=\sum_{i}P_i\cdot R_{k-i}\]

And then the entire polynomial S is:

\[S=\sum_{k} \left( \sum_{i}P_i\cdot R_{k-i}\right)x^k\]

It's important to understand this formulation, since it's key to this post. Let's use our example polynomials to see how it works. First, we represent the two polynomials as sequences of coefficients, ordered by increasing power (so the coefficient of the constant is first, and the coefficient of the highest power is last):

\[P=[1,2,1,3]\qquad R=[6,0,2]\]

In this notation, P_0 is the first element in the list for P, etc. To calculate the coefficient of x^3 in the product:

\[S_3=\sum_{i}P_i\cdot R_{3-i}\]

Expanding the sum for all the non-zero coefficients of P:

\[S_3=P_0 R_3+P_1 R_2+P_2 R_1+P_3 R_0=0+4+0+18=22\]

Similarly, we'll find that S_4=2, S_2=8 and so on, resulting in the final polynomial as before:

\[6x^5+2x^4+22x^3+8x^2+12x+6\]
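
To make this concrete, here's a minimal Python sketch that multiplies two polynomials by computing each S_k directly from the sum above. Coefficient lists are ordered constant-first, as in the example, out-of-range indices are treated as zero, and the function name is just for illustration:

    def poly_mul(P, R):
        """Multiply polynomials given as constant-first coefficient lists."""
        S = [0] * (len(P) + len(R) - 1)
        for k in range(len(S)):
            for i in range(len(P)):
                # R[k - i] is the coefficient of x^(k-i); out of range means zero.
                if 0 <= k - i < len(R):
                    S[k] += P[i] * R[k - i]
        return S

    print(poly_mul([1, 2, 1, 3], [6, 0, 2]))  # [6, 12, 8, 22, 2, 6]

Reading the output constant-first, this is exactly 6x^5+2x^4+22x^3+8x^2+12x+6.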

There's a nice graphical approach to multiplying polynomials using this idea of pairing up sums for each term in the output:

Graphical representation of poly mul, part 1

We start with the diagram on the left: one of the polynomials remains in its original form, while the other is flipped around (constant term first, highest power term last). We line up the polynomials vertically, and multiply the corresponding coefficients: the constant coefficient of the output is just the constant coefficient of the first polynomial times the constant coefficient of the second polynomial.

The diagram on the right shows the next step: the second polynomial is shifted left by one term and lined up vertically again. The corresponding coefficients are multiplied, and then the results are added to obtain the coefficient of x in the output polynomial.

We continue the process by shifting the lower polynomial left:

Graphical representation of poly mul, part 2

Calculating the coefficients of x^2 and then x^3. A couple more steps:

Graphical representation of poly mul, part 3

Now we have all the coefficients of the output. Take a moment to convince yourself that this approach is equivalent to the summation shown just before it, and also to the "diagonals in a table" approach shown further up. They all calculate the same thing [2]!
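
One way to convince yourself is to run the flip-and-slide procedure in code. Here's a small sketch (the function name is mine, just for illustration): R is flipped, and at each shift the lined-up elements are multiplied and summed:

    def poly_mul_slide(P, R):
        """Graphical method: flip R, slide it along P, sum the overlaps."""
        Rf = R[::-1]  # flipped copy of R: highest power first
        out = []
        for s in range(len(P) + len(R) - 1):
            # At shift s, the last element of Rf lines up with P[s];
            # multiply the lined-up elements and add them together.
            overlap = [P[s - j] * Rf[-1 - j]
                       for j in range(len(Rf))
                       if 0 <= s - j < len(P)]
            out.append(sum(overlap))
        return out

    print(poly_mul_slide([1, 2, 1, 3], [6, 0, 2]))  # [6, 12, 8, 22, 2, 6]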

Hopefully it's clear why the second polynomial is "flipped" to perform this procedure. It all comes down to which input terms pair up to produce each output term. As seen above:

\[S_k=\sum_{i}P_i\cdot R_{k-i}\]

While the index i moves in one direction (from the low power terms to the high power terms) in P, the index k-i moves in the opposite direction in R.

If this procedure reminds you of computing a convolution between two arrays, that's because it's exactly that!

Signals, systems and convolutions

The theory of signals and systems is a large topic (typically taught for one or two semesters in undergraduate engineering degrees), but here I want to focus on just one aspect of it which I find really elegant.

Let's define discrete signals and systems first, restricting ourselves to 1D (similar ideas apply in higher dimensions):

Discrete signal: An ordered sequence of numbers x[n] with integer indices n\in \left\{ \dots,-2,-1,0,1,2,\dots \right\}. Can also be thought of as a function x:\mathbb{Z} \rightarrow \mathbb{R}.

Discrete system: A function mapping input signals x[n] to output signals y[n]. For example, y[n]=2x[n] is a system that scales all signals by a factor of two.

Here's an example signal:

Basic signal

This is a finite signal. All values not explicitly shown in the chart are assumed to be 0 (e.g. x[3]=0, x[-1]=0, x[99]=0 and so on).

A very important signal is the discrete impulse:

\[\delta[n]=\begin{cases} 1 & \text{if } n=0\\ 0 & \text{otherwise} \end{cases}\]

In graphical form, here's \delta[n], as well as a couple of time-shifted variants of it. Note how we shift a signal left and right on the horizontal axis by adding to or subtracting from n, respectively. Take a moment to double check why this works.

Discrete impulse function delta with shifts

The impulse is useful because we can decompose any discrete signal into a sequence of scaled and shifted impulses. Our example signal has three non-zero values at indices 0, 1 and 2; we can represent it as follows:

\[x[n]=x[0]\delta[n]+x[1]\delta[n-1]+x[2]\delta[n-2]=2\delta[n]+2\delta[n-1]+\delta[n-2]\]

More generally, a signal x[n] can be written as:

\[x[n]=\sum_{k} x[k]\delta[n-k]\]

(for all k where x[k] is nonzero)
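
Here's a minimal sketch of this decomposition for our example signal, with the nonzero values stored in a small table (the names delta and coeffs are just for illustration):

    def delta(n):
        """Discrete impulse: 1 at n == 0, otherwise 0."""
        return 1 if n == 0 else 0

    coeffs = {0: 2, 1: 2, 2: 1}  # the nonzero values x[k] of the example signal

    def x(n):
        # x[n] = sum over k of x[k] * delta[n - k]
        return sum(c * delta(n - k) for k, c in coeffs.items())

    print([x(n) for n in range(-1, 4)])  # [0, 2, 2, 1, 0]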

In the study of signals and systems, linear and time-invariant (LTI) systems are particularly important.

Linear: suppose y_1[n] is the output of a system for input signal x_1[n], and similarly y_2[n] is the output for x_2[n]. A linear system outputs a\cdot y_1[n]+b\cdot y_2[n] for the input a\cdot x_1[n] + b\cdot x_2[n] where a and b are constants.

Time-invariant: if we delay the input signal by some constant: x_1[n-N], the output is similarly delayed: y_1[n-N]. In other words, the response of the system has the same shape no matter when the signal arrives (it behaves today the same way it behaved yesterday).

LTI systems are important because of the decomposition of a signal into impulses discussed above. Suppose we want to characterize a system: what it does to an arbitrary signal. For an LTI system, all we need to describe is its response to an impulse!

If the response of our system to \delta[n] is h[n], then:

  • Its response to c\cdot \delta[n] is c\cdot h[n], for any constant c, because the system is linear.
  • Its response to \delta[n-k] is h[n-k], for any time shift k, because the system is time-invariant.

We'll combine these and use linearity again (note that in the following x[k] are just constants); the response to a signal decomposed into a sum of shifted and scaled impulses:

\[x[n]=\sum_{k} x[k]\delta[n-k]\]

Is:

\[y[n]=\sum_{k} x[k]h[n-k]\]

This operation is called the convolution between sequences x[n] and h[n], and is denoted with the \ast operator:

\[y[n]=x[n]\ast h[n]\]

Let's work through an example. Suppose we have an LTI system with the following response to an impulse:

Impulse response h[n]

The response has h[0]=1, h[1]=3 and zeros everywhere else. Recall our sample signal from the top of this section (the sequence 2, 2, 1). We can decompose it to a sequence of scaled and shifted impulses, and then calculate the system response to each of them. Like this:

Decomposed x[n] and the h[n] for each component

The top row shows the input signal decomposed into scaled and shifted impulses; the bottom row is the corresponding system response to each input. If we carefully add up the responses for each n, we'll get the system response y[n] to our input:

y[n] full system response
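
Here's a minimal sketch of this superposition view, assuming the sequences from above with indices starting at 0 (x = [2, 2, 1], h = [1, 3]):

    x = [2, 2, 1]  # input signal, x[0] first
    h = [1, 3]     # impulse response, h[0] first

    # Each nonzero x[k] contributes a copy of h, scaled by x[k] and shifted by k;
    # adding up all the contributions gives the full response y.
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm

    print(y)  # [2, 8, 7, 3]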

Now, let's calculate the same output, but this time using the convolution sum. First, we'll represent the signal x and the impulse response h as sequences (just like we did with polynomials), with x[0] first, then x[1] etc:

\[x=[2,2,1]\qquad h=[1, 3]\]

The convolution sum is:

\[y[n]=\sum_{k} x[k]h[n-k]\]

Recall that k ranges over the indices of the non-zero elements of x. Let's calculate each element of y[n] (noting that h is nonzero only for indices 0 and 1):

\[\begin{align*} y[0]&=x[0]h[0]=2\\ y[1]&=x[0]h[1]+x[1]h[0]=8\\ y[2]&=x[1]h[1]+x[2]h[0]=7\\ y[3]&=x[2]h[1]=3 \end{align*}\]

All subsequent values of y are zero: k only goes up to 2, so for n of 4 and above, n-k is at least 2, where h is zero (e.g. h[4-2]=h[2]=0).

If you look carefully at this calculation, you'll notice that h is "flipped" in relation to x, just like with the polynomials:

Convolution between signals by flipping one and sliding
  • We start with x[n] (black) and flipped h[n] (blue), and line up the first non-zero elements. This computes y[0].
  • In subsequent steps, h[n] slides right, one element at a time, and the next value of y is computed by adding up the element-wise products of the lined-up x and h.

Just like with polynomials [3], the reason why one of the inputs is flipped is clear from the definition of the convolution sum, where one of the indices increases (k), while the other decreases (n-k).
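
Here's a small sketch that computes y[n] directly from the convolution sum, cross-checked against numpy.convolve, which computes exactly this operation:

    import numpy as np

    x = [2, 2, 1]
    h = [1, 3]

    # Direct convolution sum: y[n] = sum over k of x[k] * h[n - k]
    y = [sum(x[k] * h[n - k]
             for k in range(len(x)) if 0 <= n - k < len(h))
         for n in range(len(x) + len(h) - 1)]

    print(y)                  # [2, 8, 7, 3]
    print(np.convolve(x, h))  # [2 8 7 3]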

Properties of convolution

The convolution operation has many useful algebraic properties: linearity, associativity, commutativity, distributivity, etc.

The commutative property means that when computing convolutions graphically, it doesn't matter which of the signals is "flipped", because:

\[y[n]=x[n]\ast h[n]=h[n]\ast x[n]\]

And therefore:

\[y[n]=\sum_{k} x[k]h[n-k]=\sum_{k} x[n-k]h[k]\]
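
A quick numerical check with the sequences from above (using numpy.convolve) shows the same output either way:

    import numpy as np

    print(np.convolve([2, 2, 1], [1, 3]))  # [2 8 7 3]
    print(np.convolve([1, 3], [2, 2, 1]))  # [2 8 7 3]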

But the most important property of the convolution is how it behaves in the frequency domain. If we denote \mathcal{F}(f) as the Fourier transform of signal f, then the convolution theorem states:

\[\mathcal{F}(f\ast g)=\mathcal{F}(f)\cdot\mathcal{F}(g)\]

The Fourier transform of a convolution is equal to simple multiplication between the Fourier transforms of its operands. This fact - along with fast algorithms like the FFT - makes it possible to implement convolutions very efficiently.
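
Here's a small numerical illustration of the discrete counterpart of this theorem, using numpy's FFT. Multiplying DFTs corresponds to circular convolution, so both sequences are zero-padded to the full output length to make the result match the linear convolution:

    import numpy as np

    x = np.array([2, 2, 1])
    h = np.array([1, 3])
    n = len(x) + len(h) - 1  # length of the full (linear) convolution

    # Transform, multiply point-wise, transform back.
    X = np.fft.fft(x, n)
    H = np.fft.fft(h, n)
    y = np.fft.ifft(X * H).real.round().astype(int)

    print(y)                  # [2 8 7 3]
    print(np.convolve(x, h))  # [2 8 7 3]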

This is a deep and fascinating topic, but we'll leave it as a story for another day.


[1]Theoretically, -\infty<i<\infty, but we only care about the non-zero coefficients. Therefore, we won't be specifying summation bounds; instead, \sum_{i} means "sum over all the non-zero coefficients".
[2]As a cool exercise, explore how the same technique works for multiplying two numbers together, treating each number as a polynomial of subsequent powers of 10. There's just one slight complication with carries that have to be taken into account in the end, but it works really well! This blog post has more information, if you're interested.
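
A rough sketch of that exercise in Python (the helper name and the sample numbers are just for illustration): digits act as coefficients of powers of 10, the digit lists are convolved, and carries are propagated at the end:

    def multiply(a, b):
        """Multiply non-negative integers by convolving their digit lists."""
        # Digits, least-significant first, act as base-10 "coefficients".
        A = [int(d) for d in str(a)[::-1]]
        B = [int(d) for d in str(b)[::-1]]
        prod = [0] * (len(A) + len(B) - 1)
        for i, da in enumerate(A):
            for j, db in enumerate(B):
                prod[i + j] += da * db
        # Propagate carries to turn the raw sums into proper digits.
        result, carry = 0, 0
        for pos, digit in enumerate(prod):
            total = digit + carry
            result += (total % 10) * 10**pos
            carry = total // 10
        return result + carry * 10**len(prod)

    print(multiply(312, 26))  # 8112
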
[3]The similarity in behavior between polynomials and signals here is quite beautiful, and far from accidental. In fact, if we take polynomials with addition and multiplication, we get a ring that's isomorphic to the ring of finite sequences with similar operations. Even operations like "delay" or "shift right" can be emulated by multiplying a polynomial by a power of x (which could be negative if we want to "shift left").

This isomorphism is widely employed in mathematics; for example, ordinary generating functions use it to represent sequences as polynomials and then manipulate them algebraically. In signal processing it's also used for the Z transform.