
We'll start with a visual and intuitive representation of what a projection is. In the following diagram, we have vector b in the usual 3-dimensional space and two possible projections - one onto the z axis, and another onto the x,y plane.

Projection of a 3D vector onto an axis and onto a plane

If we think of 3D space as spanned by the usual basis vectors, a projection onto the z axis is simply:

\[b_z=\begin{bmatrix} 0 \\ 0 \\ z \end{bmatrix}\]

A couple of intuitive ways to think about what a projection means:

  • The projection of b on the z axis is a vector in the direction of the z axis that's closest to b.
  • The projection of b on the z axis is the shadow cast by b onto the z axis when light shines on it perpendicular to that axis.

We'll see a more formal definition soon. A projection onto the x,y plane is similarly easy to express.

Projection onto a line

Projecting onto an axis is easy - as the diagram shows, it's simply taking the vector component in the direction of the axis. But how about projections onto arbitrary lines?

Projection of a 3D vector onto another 3D vector

In vector space, a line is just all possible scalings of some vector [1].

Speaking more formally now, we're interested in the projection of \vec{b} onto \vec{a}, where the arrow over a letter means it's a vector. The projection (which we call \vec{b_a}) is the closest vector to \vec{b} in the direction of \vec{a}. In other words, the dotted line in the diagram is at a right angle to the line a; therefore, the error vector \vec{e} is orthogonal to \vec{a}.

This orthogonality gives us the tools we need to find the projection. We'll want to find a constant c such that:

\[\vec{b_a}=c\vec{a}\]

\vec{e} is orthogonal to \vec{a}, meaning that their dot product is zero: \vec{e}\cdot\vec{a}=0. We'll use the distributive property of the dot product in what follows:

\[\begin{align*} \vec{a}\cdot\vec{e}&=0 \\ \vec{a}\cdot(\vec{b}-c\vec{a})&=0\\ \vec{a}\cdot\vec{b}-c\vec{a}\cdot\vec{a}&=0\\ c&=\frac{\vec{a}\cdot\vec{b}}{\vec{a}\cdot\vec{a}} \end{align*}\]

Note that \vec{a}\cdot\vec{a} is the squared magnitude of \vec{a}; for a unit vector this would be 1. This is why it doesn't matter whether \vec{a} is a unit vector or not - the division by \vec{a}\cdot\vec{a} normalizes it anyway.

We have a formula for c now - we can find it given \vec{a} and \vec{b}. To prepare for what comes next, however, we'll switch notations. We'll use matrix notation, in which vectors are - by convention - column vectors, and a dot product can be expressed by a matrix multiplication between a row and a column vector. Therefore:

\[\begin{align*} c&=\frac{a^T b}{a^T a} \Rightarrow \\ b_a&=\frac{a^T b}{a^T a}a \end{align*}\]
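If you want to play with this numerically, here's a minimal NumPy sketch of the formula (the vectors chosen here are arbitrary examples, not taken from the text):

```python
import numpy as np

# Illustrative vectors (arbitrary choices).
a = np.array([2.0, 1.0, 2.0])   # direction of the line we project onto
b = np.array([1.0, 4.0, 0.0])   # vector to project

c = a @ b / (a @ a)             # c = (a^T b) / (a^T a)
b_a = c * a                     # projection of b onto the line spanned by a

e = b - b_a                     # error vector
print(np.isclose(a @ e, 0.0))   # True: the error is orthogonal to a
```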

Projection matrix

Since the fraction representing c is a constant, we can switch the order of the multiplication by a, and then use the fact that matrix multiplication is associative to write:

\[\begin{align*} b_a&=a\frac{a^T b}{a^T a}\\ b_a&=\frac{a a^T}{a^T a}b \end{align*}\]

In our case, since a is a 3D vector, a a^T is a 3x3 matrix [2], while a^Ta is a scalar. Thus we get our projection matrix - call it P:

\[\begin{align*} P&=\frac{a a^T}{a^T a}\\ b_a&=Pb \end{align*}\]

A recap: given some vector \vec{a}, we can construct a projection matrix P. This projection matrix can take any vector \vec{b} and help us calculate its projection onto \vec{a} by means of a simple matrix multiplication!
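Here's a small NumPy sketch of that recap (again with arbitrarily chosen vectors): build P once from \vec{a}, then project with a plain matrix multiplication. It also checks the earlier note - scaling \vec{a} doesn't change P, since the a^T a in the denominator cancels the scaling:

```python
import numpy as np

a = np.array([2.0, 1.0, 2.0])        # spans the line we project onto (arbitrary choice)
b = np.array([1.0, 4.0, 0.0])        # arbitrary vector to project

P = np.outer(a, a) / (a @ a)         # P = a a^T / (a^T a), a 3x3 matrix
b_a = P @ b                          # projection via a single matrix multiplication

# Scaling a leaves P unchanged: the a^T a denominator normalizes it away.
print(np.allclose(P, np.outer(3 * a, 3 * a) / ((3 * a) @ (3 * a))))  # True
```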

Example of line projection

Consider our original example - projection onto the z axis. First, we'll find a vector that spans the subspace represented by the z axis; a natural choice is the unit vector:

\[a_z=\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\]

What's the projection matrix corresponding to this vector?

\[P = \frac{a_z a_{z}^{T}}{1} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\begin{bmatrix}0&0&1\end{bmatrix}=\begin{bmatrix} 0&0&0\\ 0&0&0\\ 0&0&1 \end{bmatrix}\]

Now, given any arbitrary vector \vec{b} we can find its projection onto the z axis by multiplying with P. For example:

\[b_a=Pb=\begin{bmatrix} 0&0&0\\ 0&0&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} x\\ y\\ z \end{bmatrix}=\begin{bmatrix} 0\\ 0\\ z \end{bmatrix}\]

Another example - less trivial this time. Say we want to project vectors onto the line spanned by the vector:

\[a=\begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}\]

Let's compute the projection matrix:

\[P = \frac{a a^{T}}{a^T a} = \frac{1}{59}\begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}\begin{bmatrix}1&3&7\end{bmatrix}=\frac{1}{59}\begin{bmatrix} 1&3&7\\ 3&9&21\\ 7&21&49 \end{bmatrix}\]

Now we'll use it to calculate the projection of b=\begin{bmatrix}2 & 8 & -4\end{bmatrix}^T onto this line:

\[b_a=Pb=\frac{1}{59}\begin{bmatrix} 1&3&7\\ 3&9&21\\ 7&21&49 \end{bmatrix}\begin{bmatrix} 2\\ 8\\ -4 \end{bmatrix}=\frac{1}{59}\begin{bmatrix} -2\\ -6\\ -14 \end{bmatrix}\]

To verify this makes sense, we can calculate the error vector \vec{e}:

\[\begin{align*} e&=b-b_a=\begin{bmatrix} 2\\ 8\\ -4 \end{bmatrix}-\frac{1}{59}\begin{bmatrix} -2\\ -6\\ -14 \end{bmatrix}=\frac{1}{59}\begin{bmatrix} 120\\ 478\\ -222 \end{bmatrix} \end{align*}\]

And check that it's indeed orthogonal to \vec{a}:

\[a\cdot e = \frac{1}{59}(1\cdot 120 + 3\cdot 478 + 7 \cdot -222)=0\]
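The same numbers are easy to reproduce with a short NumPy sketch:

```python
import numpy as np

a = np.array([1.0, 3.0, 7.0])
b = np.array([2.0, 8.0, -4.0])

P = np.outer(a, a) / (a @ a)                 # (1/59) * [[1,3,7],[3,9,21],[7,21,49]]
b_a = P @ b                                  # projection of b onto the line spanned by a

print(np.allclose(b_a * 59, [-2, -6, -14]))  # True: matches the result above
print(np.isclose(a @ (b - b_a), 0.0))        # True: the error is orthogonal to a
```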

Projection onto a vector subspace

A subspace of a vector space is a subset of vectors from the vector space that's closed under vector addition and scalar multiplication. For \mathbb{R}^3, some common subspaces include lines that go through the origin and planes that go through the origin.

Therefore, the projection onto a line scenario we've discussed so far is just a special case of a projection onto a subspace. We'll look at the general case now.

Suppose we have an m-dimensional vector space \mathbb{R}^m, and a set of n linearly independent vectors \vec{a_1},\dots,\vec{a_n} \in \mathbb{R}^m. We want to find a combination of these vectors that's closest to some target vector \vec{b} - in other words, to find the projection of \vec{b} onto the subspace spanned by \vec{a_1},\dots,\vec{a_n}.

Arbitrary m-dimensional vectors are difficult to visualize, but the derivation here follows exactly the path we've taken for projections onto lines in 3D. There, we were looking for a constant c such that c\vec{a} was the closest vector to \vec{b}. Now, we're looking for a vector \vec{c} which represents a linear combination of \vec{a_1},\dots,\vec{a_n} that is closest to a target \vec{b}.

If we organize \vec{a_1},\dots,\vec{a_n} as columns into a matrix called A, we can express this as:

\[\vec{b_a}=A\vec{c}\]

This is a matrix multiplication: \vec{c} is a list of coefficients that describes some linear combination of the columns of A. As before, we want the error vector \vec{e}=\vec{b}-\vec{b_a} to be orthogonal to the subspace onto which we're projecting: this means it's orthogonal to every one of \vec{a_1},\dots,\vec{a_n}. The fact that each of \vec{a_1},\dots,\vec{a_n} is orthogonal to \vec{e} can be expressed as [3]:

\[\begin{align*} a_{1}^{T}e&=0\\ \vdots\\ a_{n}^{T}e&=0 \end{align*}\]

This is a system of linear equations, and thus it can be represented as a matrix multiplication by a matrix with vectors a_{k}^T in its rows; this matrix is just A^T:

\[A^T e=0\]

But e=b-Ac, so:

\[\begin{align*} A^T (b-Ac)&=0 \Rightarrow \\ A^Tb&=A^TAc \end{align*}\]

Since the columns of A are linearly independent, A^T A is an invertible matrix [4], so we can isolate c:

\[c=(A^T A)^{-1}A^T b\]

Then the projection \vec{b_a} is:

\[b_a=Ac=A(A^T A)^{-1}A^T b\]

Similarly to the line example, we can also define a projection matrix as:

\[P=A(A^T A)^{-1}A^T\]

Given a vector \vec{b}, P projects it onto the subspace spanned by the vectors \vec{a_1},\dots,\vec{a_n}:

\[b_a=Pb\]

Let's make sure the dimensions work out. Recall that A consists of n columns, each with m rows. So we have:

\[\begin{matrix} A & (m\times n) \\ A^T & (n\times m)\\ A^T A & (n\times n) \\ (A^T A)^{-1} & (n\times n) \\ A(A^T A)^{-1} & (m\times n) \\ A(A^T A)^{-1}A^T & (m\times m) \\ \end{matrix}\]

Since the vector \vec{b} is m-dimensional, Pb is valid and the result is another m-dimensional vector - the projection \vec{b_a}.
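Here's a minimal NumPy sketch of the general case (the columns of A and the vector b are arbitrary choices): solve the normal equations A^T A c = A^T b for c, build P, and check that both routes give the same projection and that the error is orthogonal to every column of A:

```python
import numpy as np

# Two linearly independent columns spanning a plane in R^3 (arbitrary choices).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([3.0, -1.0, 4.0])

# Solve the normal equations A^T A c = A^T b for the coefficients c.
c = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix P = A (A^T A)^{-1} A^T; here it's 3x3, as the dimension check predicts.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(A @ c, P @ b))            # True: Ac and Pb are the same projection
print(np.allclose(A.T @ (b - P @ b), 0.0))  # True: the error is orthogonal to the columns of A
```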

Example of subspace projection

At the beginning of this post there's a diagram showing the projection of an arbitrary vector \vec{b} onto a line and onto a plane. We'll find the projection matrix for the plane case now. The projection is onto the xy plane, which is spanned by these vectors:

\[a_x=\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} a_y=\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\]

Collecting them into a single matrix A, we get:

\[A=\begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{bmatrix}\]

To find P, let's first calculate A^T A:

\[A^T A= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{bmatrix}= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}\]

This happens to be the identity matrix, so its inverse is itself. Thus, we get:

\[P=A(A^T A)^{-1}A^T=AIA^T=AA^T= \begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix}\]

We can now project an arbitrary vector \vec{b} onto this plane by multiplying it with this P:

\[b_a=Pb= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}= \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}\]

Granted, this is a fairly trivial example - but it works in the general case. As an exercise, pick a different pair of independent vectors and find the projection matrix onto the plane spanned by them; then, verify that the resulting error is orthogonal to the plane.
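Here's one possible sketch of that exercise, with an arbitrarily chosen pair of independent vectors:

```python
import numpy as np

# An arbitrary pair of linearly independent vectors spanning a plane in R^3.
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([0.0, 1.0, 1.0])
A = np.column_stack([a1, a2])

P = A @ np.linalg.inv(A.T @ A) @ A.T    # projection onto the plane spanned by a1 and a2

b = np.array([2.0, -3.0, 5.0])          # arbitrary vector to project
e = b - P @ b                           # error vector

# The error is orthogonal to the plane, i.e. to both spanning vectors.
print(np.isclose(a1 @ e, 0.0), np.isclose(a2 @ e, 0.0))  # True True
```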

Properties of projection matrices

Projection matrices have some interesting properties that are educational to review.

First, projection matrices are symmetric. To understand why, first recall how a transpose of a matrix product is done:

\[(AB)^T=B^T A^T\]

As a warm-up, we can show that A^T A is symmetric:

\[(A^T A)^T=A^T (A^T)^T=A^T A\]

Now, let's transpose P:

\[\begin{align*} P&=A(A^T A)^{-1}A^T \\ P^T&=(A(A^T A)^{-1}A^T)^T\\ &=((A^T A)^{-1}A^T)^T A^T\\ &=A((A^T A)^{-1})^T A^T\\ &=A(A^T A)^{-1}A^T=P \end{align*}\]

Here we've used the fact that the inverse of a symmetric matrix is also symmetric, and we see that indeed P^T=P.

Second, projection matrices are idempotent: P^2=P; this isn't hard to prove either:

\[\begin{align*} P^2&=A(A^T A)^{-1}A^T A(A^T A)^{-1}A^T\\ &=A(A^T A)^{-1}(A^T A)(A^T A)^{-1}A^T\\ &=A(A^T A)^{-1}[(A^T A)(A^T A)^{-1}]A^T\\ &=A(A^T A)^{-1}IA^T\\ &=A(A^T A)^{-1}A^T=P \end{align*}\]

Intuitive explanation: think about what a projection does - given some \vec{b}, it calculates the closest vector to it in the desired subspace. If we try to project this projection again - what will we get? Well, still the closest vector in that subspace - itself! In other words:

\[b_a=Pb=P(Pb)\]
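Both properties are easy to confirm numerically; here's a short sketch with an arbitrary full-column-rank A:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                  # arbitrary matrix with independent columns
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto the column space of A

print(np.allclose(P, P.T))                  # True: P is symmetric
print(np.allclose(P @ P, P))                # True: P is idempotent
```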

Projections onto orthogonal subspaces

There's another special case of projections that is interesting to discuss: projecting a vector onto orthogonal subspaces. We'll work through this using an example.

Consider the vector:

\[a_1=\begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}\]

We'll find the projection matrix for this vector:

\[P_1=\frac{a_1 a_{1}^T}{a_{1}^T a_1}= \frac{1}{14} \begin{bmatrix} 1 & -2 & 3\\ -2 & 4 & -6\\ 3 & -6 & 9 \end{bmatrix}\]

Now, consider the following vector, which is orthogonal to \vec{a_1}:

\[a_2=\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}\]

Its projection matrix is:

\[P_2=\frac{a_2 a_{2}^T}{a_{2}^T a_2}= \frac{1}{10} \begin{bmatrix} 9 & 0 & -3\\ 0 & 0 & 0\\ -3 & 0 & 1 \end{bmatrix}\]

It's trivial to check that both P_1 and P_2 satisfy the properties of projection matrices; what's more interesting is that P_1 + P_2 does as well - so it's also a proper projection matrix!

To take it a step further, consider yet another vector:

\[a_3=\begin{bmatrix} -1 \\ -5 \\ -3 \end{bmatrix}\]

The vectors (\vec{a_1},\vec{a_2},\vec{a_3}) are all mutually orthogonal, and thus form an orthogonal basis for \mathbb{R}^3. We can calculate P_3 in the usual way, and get:

\[P_3=\frac{a_3 a_{3}^T}{a_{3}^T a_3}= \frac{1}{35} \begin{bmatrix} 1 & 5 & 3\\ 5 & 25 & 15\\ 3 & 15 & 9 \end{bmatrix}\]

Not only is P_1+P_2+P_3 a projection matrix, it's a very familiar matrix in general:

\[P_1+P_2+P_3=I\]

This is equivalent to saying that for any vector \vec{b}:

\[(P_1+P_2+P_3)b=b\]

Hopefully this makes intuitive sense because it's just expressing \vec{b} in an alternative basis for \mathbb{R}^3 [5].
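A quick numerical check of this identity, using the three vectors from this section (the line_projection helper is just a local convenience, not anything from the text):

```python
import numpy as np

def line_projection(a):
    """Projection matrix onto the line spanned by a: a a^T / (a^T a)."""
    return np.outer(a, a) / (a @ a)

a1 = np.array([1.0, -2.0, 3.0])
a2 = np.array([-3.0, 0.0, 1.0])
a3 = np.array([-1.0, -5.0, -3.0])

P = line_projection(a1) + line_projection(a2) + line_projection(a3)
print(np.allclose(P, np.eye(3)))  # True: projections onto an orthogonal basis sum to I
```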


[1] We're dealing with vector spaces, where we don't really have lines - only vectors. A line is just a visual way to think about certain subspaces of the vector space \mathbb{R}^3. Specifically, a line through the origin (lines that don't go through the origin belong in affine spaces) is a way to represent \forall c, c\vec{a} where \vec{a} is a vector in the same direction as this line and c is a constant; in other words, it's the subspace of \mathbb{R}^3 spanned by \vec{a}.
[2] By the rules of matrix multiplication: we're multiplying a column vector (a 3x1 matrix) by a row vector (a 1x3 matrix). The multiplication is allowed because the inner dimensions match, and the result is a 3x3 matrix.
[3] Recall from the earlier example: we're dropping the explicit vector markings to be able to write matrix arithmetic more naturally. By default vectors are column vectors, so v^T w expresses the dot product between vectors \vec{v} and \vec{w}.
[4] It's possible to prove this statement, but this post is already long enough.
[5] This is a special case of a change of basis, in which the basis is orthogonal.