
Knowing how to convert a vector to a different basis has many practical applications. Gilbert Strang has a nice quote about the importance of basis changes in his book [1] (emphasis mine):

The standard basis vectors for \mathbb{R}^n and \mathbb{R}^m are the columns of I. That choice leads to a standard matrix, and T(v)=Av in the normal way. But these spaces also have other bases, so the same T is represented by other matrices. A main theme of linear algebra is to choose the bases that give the best matrix for T.

This should serve as a good motivation, but I'll leave the applications for future posts; in this one, I will focus on the mechanics of basis change, starting from first principles.

The basis and vector components

A basis of a vector space V is a set of vectors in V that is linearly independent and spans V. An ordered basis is a list, rather than a set, meaning that the order of the vectors in an ordered basis matters. This is important with respect to the topics discussed in this post.

Let's now define components. If U = u_1,u_2,...,u_n is an ordered basis for V and v is a vector in V, then there's a unique [2] list of scalars c_1,c_2,...,c_n such that:

\[v = c_1u_1+c_2u_2+...+c_nu_n\]

These are called the components of v relative to the ordered basis U. We'll introduce a useful piece of notation here: collect the components c_1,c_2,...,c_n into a column vector and call it [v]_{\text{\tiny U}}: this is the component vector of v relative to the basis U.

Example: finding a component vector

Let's use \mathbb{R}^2 as an example. U=(2,3), (4,5) is an ordered basis for \mathbb{R}^2 (since the two vectors in it are linearly independent). Say we have v=(2,4). What is [v]_{\text{\tiny U}}? We'll need to solve the system of equations:

\[\begin{pmatrix} 2 \\ 4 \end{pmatrix}=c_1\begin{pmatrix} 2 \\ 3\end{pmatrix}+c_2\begin{pmatrix} 4 \\ 5 \end{pmatrix}\]

In the 2-D case this is trivial - the solution is c_1=3 and c_2=-1. Therefore:

\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\]

In the more general case of \mathbb{R}^n, this is akin to solving a linear system of n equations with n variables. Since the basis vectors are, by definition, linearly independent, solving the system is simply inverting a matrix [3].
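For those who like to check such things numerically, here's a minimal NumPy sketch that solves this system (the variable names are mine, not anything standard):

```python
import numpy as np

# Basis vectors of U laid out as the columns of a matrix (see [3]),
# and the vector v whose components we want.
U = np.array([[2.0, 4.0],
              [3.0, 5.0]])
v = np.array([2.0, 4.0])

# Solve U @ c = v for the component vector c = [v]_U.
c = np.linalg.solve(U, v)
print(c)  # [ 3. -1.]
```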

Change of basis matrix

Now comes the key part of the post. Say we have two different ordered bases for the same vector space: U = u_1,u_2,...,u_n and W= w_1,w_2,...,w_n. For some v\in V, we can find [v]_{\text{\tiny U}} and [v]_{\text{\tiny W}}. How are these two related?

Surely, given [v]_{\text{\tiny U}} we can find its coefficients in basis W the same way as we did in the example above [4]. It involves solving a linear system of n equations. We'll have to redo this operation for every vector v we want to convert. Is there a simpler way?

Luckily for science, yes. The key here is to find how the basis vectors of U look in basis W. In other words, we have to find [u_1]_{\text{\tiny W}}, [u_2]_{\text{\tiny W}} and so on to [u_n]_{\text{\tiny W}}.

Let's say we do that and find the coefficients to be a_{ij} such that:

\[\begin{matrix} u_1=a_{11}w_1+a_{21}w_2+...+a_{n1}w_n \\ u_2=a_{12}w_1+a_{22}w_2+...+a_{n2}w_n \\ ... \\ u_n=a_{1n}w_1+a_{2n}w_2+...+a_{nn}w_n \end{matrix}\]

Now, given some vector v \in V, suppose its components in basis U are:

\[[v]_{\text{\tiny U}}=\begin{pmatrix} c_1 \\ c_2 \\ ... \\ c_n \end{pmatrix}\]

Let's try to figure out how it looks in basis W. The above equation (by definition of components) is equivalent to:

\[v=c_1u_1+c_2u_2+...+c_nu_n\]

Substituting the expansion of the us in basis W, we get:

\[v=\begin{matrix} c_1(a_{11}w_1+a_{21}w_2+...+a_{n1}w_n)+ \\ c_2(a_{12}w_1+a_{22}w_2+...+a_{n2}w_n)+ \\ ... \\ c_n(a_{1n}w_1+a_{2n}w_2+...+a_{nn}w_n) \end{matrix}\]

Reordering a bit to find the multipliers of each w:

\[v=\begin{matrix} (c_1a_{11}+c_2a_{12}+...+c_na_{1n})w_1+ \\ (c_1a_{21}+c_2a_{22}+...+c_na_{2n})w_2+ \\ ... \\ (c_1a_{n1}+c_2a_{n2}+...+c_na_{nn})w_n \end{matrix}\]

By our definition of vector components, this equation is equivalent to:

\[[v]_{\text{\tiny W}}=\begin{pmatrix} c_1a_{11}+c_2a_{12}+...+c_na_{1n} \\ c_1a_{21}+c_2a_{22}+...+c_na_{2n} \\ ... \\ c_1a_{n1}+c_2a_{n2}+...+c_na_{nn} \end{pmatrix}\]

Now we're in vector notation again, so we can decompose the column vector on the right hand side to:

\[[v]_{\text{\tiny W}}=\begin{pmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22} & ... & a_{2n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n2} & ... & a_{nn} \end{pmatrix}\begin{pmatrix}c_1 \\ c_2 \\ ... \\ c_n \end{pmatrix}\]

This is matrix times a vector. The vector on the right is [v]_{\text{\tiny U}}. The matrix should look familiar too because it consists of those a_{ij} coefficients we've defined above. In fact, this matrix just represents the basis vectors of U expressed in basis W. Let's call this matrix A_{\text{\tiny U}\rightarrow \text{\tiny W}} - the change of basis matrix from U to W. It has [u_1]_{\text{\tiny W}} to [u_n]_{\text{\tiny W}} laid out in its columns:

\[A_{\text{\tiny U}\rightarrow \text{\tiny W}}=\begin{pmatrix}[u_1]_{\text{\tiny W}},[u_2]_{\text{\tiny W}},...,[u_n]_{\text{\tiny W}}\end{pmatrix}\]

So we have:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

To recap, given two bases U and W, we can spend some effort to compute the "change of basis" matrix A_{\text{\tiny U}\rightarrow \text{\tiny W}}, but then we can easily convert any vector in basis U to basis W if we simply left-multiply it by this matrix.
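As a sketch of what this might look like in code (assuming NumPy, with each basis given as a matrix whose columns are its vectors), the change of basis matrix can be built by solving one linear system per basis vector of U - or all of them at once:

```python
import numpy as np

def change_of_basis_matrix(U, W):
    """Return A_{U->W}, given bases U and W as matrix columns.

    Column i of the result is [u_i]_W: the solution of W @ x = u_i.
    Solving for all columns at once amounts to computing W^{-1} U.
    """
    return np.linalg.solve(W, U)
```

Left-multiplying a component vector [v]_{\text{\tiny U}} by the returned matrix then yields [v]_{\text{\tiny W}}, exactly as in the equation above.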

A reasonable question to ask at this point is - what about converting from W to U? Well, since the computations above are completely generic and don't special-case either basis, we can just flip the roles of U and W and get another change of basis matrix, A_{\text{\tiny W}\rightarrow \text{\tiny U}} - it converts vectors in basis W to vectors in basis U as follows:

\[[v]_{\text{\tiny U}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}\]

And this matrix is:

\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}=\begin{pmatrix}[w_1]_{\text{\tiny U}},[w_2]_{\text{\tiny U}},...,[w_n]_{\text{\tiny U}}\end{pmatrix}\]

We will soon see that the two change of basis matrices are intimately related; but first, an example.

Example: changing bases with matrices

Let's work through another concrete example in \mathbb{R}^2. We've used the basis U=(2,3), (4,5) before; let's use it again, and also add the basis W=(-1,1), (1,1). We've already seen that for v=(2,4) we have:

\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\]

Similarly, we can solve a set of two equations to find [v]_{\text {\tiny W}}:

\[[v]_{\text {\tiny W}}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]

OK, let's see how a change of basis matrix can be used to easily compute one given the other. First, to find A_{\text{\tiny U}\rightarrow \text{\tiny W}} we'll need [u_1]_{\text {\tiny W}} and [u_2]_{\text {\tiny W}}. We know how to do that. The result is:

\[[u_1]_{\text {\tiny W}}=\begin{pmatrix} 0.5 \\ 2.5 \end{pmatrix}\qquad[u_2]_{\text {\tiny W}}=\begin{pmatrix} 0.5 \\ 4.5 \end{pmatrix}\]

Now we can verify that given [v]_{\text {\tiny U}} and A_{\text{\tiny U}\rightarrow \text{\tiny W}}, we can easily find [v]_{\text {\tiny W}}:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}=\begin{pmatrix} 0.5 & 0.5 \\ 2.5 & 4.5 \end{pmatrix}\begin{pmatrix} 3 \\ -1 \end{pmatrix}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]

Indeed, it checks out! Let's also verify the other direction. To find A_{\text{\tiny W}\rightarrow \text{\tiny U}} we'll need [w_1]_{\text {\tiny U}} and [w_2]_{\text {\tiny U}}:

\[[w_1]_{\text {\tiny U}}=\begin{pmatrix} 4.5 \\ -2.5 \end{pmatrix}\qquad[w_2]_{\text {\tiny U}}=\begin{pmatrix}- 0.5 \\ 0.5 \end{pmatrix}\]

And now to find [v]_{\text {\tiny U}}:

\[[v]_{\text{\tiny U}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}=\begin{pmatrix} 4.5 & -0.5 \\ -2.5 & 0.5 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \end{pmatrix}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\]

Checks out again! If you have a keen eye, or have recently spent some time solving linear algebra problems, you'll notice something interesting about the two change of basis matrices used in this example: one is the inverse of the other! Is this some sort of coincidence? No - in fact, it's always true, and we can prove it.
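Before getting to the proof, here's a quick NumPy reproduction of this example (a sketch with my own variable names):

```python
import numpy as np

U = np.array([[2.0, 4.0], [3.0, 5.0]])    # u_1, u_2 as columns
W = np.array([[-1.0, 1.0], [1.0, 1.0]])   # w_1, w_2 as columns

A_U_to_W = np.linalg.solve(W, U)  # columns: [u_1]_W, [u_2]_W
A_W_to_U = np.linalg.solve(U, W)  # columns: [w_1]_U, [w_2]_U

v_U = np.array([3.0, -1.0])
v_W = A_U_to_W @ v_U
print(v_W)              # [1. 3.]
print(A_W_to_U @ v_W)   # [ 3. -1.] -- back where we started
```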

The inverse of a change of basis matrix

We've derived the change of basis matrix from U to W to perform the conversion:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

Left-multiplying this equation by A_{\text{\tiny W}\rightarrow \text{\tiny U}}:

\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

But the left-hand side is now, by our earlier definition, equal to [v]_{\text{\tiny U}}, so we get:

\[[v]_{\text{\tiny U}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

Since this is true for every vector [v]_{\text{\tiny U}}, it must be that:

\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}=I\]

From this, we can infer that A_{\text{\tiny W}\rightarrow \text{\tiny U}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}^{-1} and vice versa [5].
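This is easy to confirm numerically with the two matrices from the example above (again just a sketch):

```python
import numpy as np

A_U_to_W = np.array([[0.5, 0.5], [2.5, 4.5]])
A_W_to_U = np.array([[4.5, -0.5], [-2.5, 0.5]])

print(np.allclose(A_W_to_U @ A_U_to_W, np.eye(2)))     # True
print(np.allclose(np.linalg.inv(A_U_to_W), A_W_to_U))  # True
```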

Changing to and from the standard basis

You may have noticed that in the examples above, we short-circuited a little bit of rigor by making up a vector (such as v=(2,4)) without explicitly specifying the basis its components are relative to. This is because we're so used to working with the "standard basis" that we often forget it's there.

The standard basis (let's call it E) consists of unit vectors pointing in the directions of the axes of a Cartesian coordinate system. For \mathbb{R}^2 we have the basis vectors:

\[e_1=\begin{pmatrix} 1 \\ 0 \end{pmatrix}\qquad e_2=\begin{pmatrix} 0 \\ 1 \end{pmatrix}\]

And more generally in \mathbb{R}^n we have an ordered list of n vectors \left\{ e_i:1\leq i \leq n \right\} where e_i has 1 in the ith position and zeros elsewhere.

So when we say v=(2,4), what we actually mean is:

\[\begin{matrix} v=2e_1+4e_2 \\[1em] [v]_{\text {\tiny E}}=\begin{pmatrix} 2 \\ 4 \end{pmatrix} \end{matrix}\]

The standard basis is so ingrained in our intuition of vectors that we usually neglect to mention it. This is fine, as long as we're only dealing with the standard basis. Once change of basis is required, it's worthwhile to stick to a more consistent notation to avoid confusion. Moreover, it's often useful to change a vector's basis to or from the standard one. Let's see how that works. Recall how we use the change of basis matrix:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

Replacing the arbitrary basis W by the standard basis E in this equation, we get:

\[[v]_{\text{\tiny E}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}\]

And A_{\text{\tiny U}\rightarrow \text{\tiny E}} is the matrix with [u_1]_{\text {\tiny E}} to [u_n]_{\text {\tiny E}} in its columns. But wait, these are just the basis vectors of U! So finding the matrix A_{\text{\tiny U}\rightarrow \text{\tiny E}} for any given basis U is trivial - simply line up U's basis vectors as columns in their order to get a matrix. This means that any square, invertible matrix can be seen as a change of basis matrix from the basis spelled out in its columns to the standard basis. This is a natural consequence of how matrix-vector multiplication works: it linearly combines the matrix's columns.

OK, so we know how to find [v]_{\text {\tiny E}} given [v]_{\text {\tiny U}}. What about the other way around? We'll need A_{\text{\tiny E}\rightarrow \text{\tiny U}} for that, and we know that:

\[A_{\text{\tiny E}\rightarrow \text{\tiny U}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}\]

Therefore:

\[[v]_{\text{\tiny U}}=A_{\text{\tiny E}\rightarrow \text{\tiny U}}[v]_{\text{\tiny E}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}[v]_{\text{\tiny E}}\]
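In NumPy terms this is about as simple as it gets - a small sketch, reusing the basis U from the earlier examples:

```python
import numpy as np

# A_{U->E} is just U's basis vectors lined up as columns.
U = np.array([[2.0, 4.0], [3.0, 5.0]])

v_U = np.array([3.0, -1.0])
v_E = U @ v_U                   # [v]_E = A_{U->E} [v]_U
print(v_E)                      # [2. 4.]

# Going back is a linear solve (equivalently, multiplying by U^{-1}).
print(np.linalg.solve(U, v_E))  # [ 3. -1.]
```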

Chaining basis changes

What happens if we change a vector from one basis to another, and then change the resulting vector to yet another basis? I mean, for bases U, W and T and some arbitrary vector v, we'll do:

\[A_{\text{\tiny W}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]

This is simply applying the change of basis by matrix multiplication equation, twice:

\[A_{\text{\tiny W}\rightarrow \text{\tiny T}}(A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}})=A_{\text{\tiny W}\rightarrow \text{\tiny T}}[v]_{\text{\tiny W}}=[v]_{\text{\tiny T}}\]

What this means is that changes of basis can be chained, which isn't surprising given their linear nature. It also means that we've just found A_{\text{\tiny U}\rightarrow \text{\tiny T}}, since we found how to transform [v]_{\text{\tiny U}} to [v]_{\text{\tiny T}} (using an intermediary basis W).

\[A_{\text{\tiny U}\rightarrow \text{\tiny T}}=A_{\text{\tiny W}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}\]

Finally, let's say that the intermediary basis is not just some arbitrary W, but the standard basis E. So we have:

\[A_{\text{\tiny U}\rightarrow \text{\tiny T}}=A_{\text{\tiny E}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}=A_{\text{\tiny T}\rightarrow \text{\tiny E}}^{-1}A_{\text{\tiny U}\rightarrow \text{\tiny E}}\]

We prefer the last form, since finding A_{\text{\tiny U}\rightarrow \text{\tiny E}} for any basis U is, as we've seen above, trivial.
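Here's a small NumPy sanity check of the chaining property; the third basis T is made up for the occasion:

```python
import numpy as np

U = np.array([[2.0, 4.0], [3.0, 5.0]])
W = np.array([[-1.0, 1.0], [1.0, 1.0]])
T = np.array([[1.0, 2.0], [0.0, 1.0]])  # a made-up third basis

A_U_to_W = np.linalg.solve(W, U)
A_W_to_T = np.linalg.solve(T, W)
A_U_to_T = np.linalg.solve(T, U)  # the direct change of basis matrix

# Chaining through W gives the same matrix as going directly from U to T.
print(np.allclose(A_W_to_T @ A_U_to_W, A_U_to_T))  # True
```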

Example: standard basis and chaining

It's time to solidify the ideas of the last two sections with a concrete example. We'll use our familiar bases U=(2,3), (4,5) and W=(-1,1), (1,1) from the previous example, along with the standard basis for \mathbb{R}^2. Previously, we transformed a vector v from U to W and vice-versa using the change of basis matrices between these bases. This time, let's do it by chaining via the standard basis.

We'll pick v=(2,4). Formally, the components of v relative to the standard basis are:

\[[v]_{\text{\tiny E}} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}\]

In the last example we've already computed the components of v relative to U and W:

\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\qquad [v]_{\text {\tiny W}}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]

Previously, one was computed from the other using the "direct" basis change matrices from U to W and vice versa. Now we can use chaining via the standard basis to achieve the same result. For example, we know that:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny E}\rightarrow \text{\tiny W}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}\]

Finding the change of basis matrices from some basis to E is just laying out the basis vectors as columns, so we immediately know that:

\[A_{\text{\tiny U}\rightarrow \text{\tiny E}}=\begin{pmatrix} 2 & 4\\ 3 & 5 \end{pmatrix}\qquad \qquad A_{\text{\tiny W}\rightarrow \text{\tiny E}}=\begin{pmatrix} -1 & 1\\ 1 & 1 \end{pmatrix}\]

The change of basis matrix from E to some basis is the inverse, so by inverting the above matrices we find:

\[A_{\text{\tiny E}\rightarrow \text{\tiny U}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}=\begin{pmatrix} -2.5 & 2 \\ 1.5 & -1 \end{pmatrix}\qquad \qquad A_{\text{\tiny E}\rightarrow \text{\tiny W}}=A_{\text{\tiny W}\rightarrow \text{\tiny E}}^{-1}=\begin{pmatrix} -0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}\]

Now we have all we need to find [v]_{\text{\tiny W}} from [v]_{\text{\tiny U}}:

\[[v]_{\text{\tiny W}}=A_{\text{\tiny E}\rightarrow \text{\tiny W}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}=\begin{pmatrix} -0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}\begin{pmatrix} 2 & 4\\ 3 & 5 \end{pmatrix}\begin{pmatrix} 3 \\ -1 \end{pmatrix}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]

The other direction can be done similarly.
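If you want to replicate the whole chain in code, here's a NumPy sketch of both directions:

```python
import numpy as np

U = np.array([[2.0, 4.0], [3.0, 5.0]])    # A_{U->E}
W = np.array([[-1.0, 1.0], [1.0, 1.0]])   # A_{W->E}

v_U = np.array([3.0, -1.0])

# [v]_W = A_{E->W} A_{U->E} [v]_U
v_W = np.linalg.inv(W) @ U @ v_U
print(v_W)                          # [1. 3.]

# ...and back: [v]_U = A_{E->U} A_{W->E} [v]_W
print(np.linalg.inv(U) @ W @ v_W)   # [ 3. -1.]
```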


[1] Introduction to Linear Algebra, 4th edition, section 7.2
[2] Why is this list unique? Because given a basis U for a vector space V, every v\in V can be expressed uniquely as a linear combination of the vectors in U. The proof for this is very simple - just assume there are two different ways to express v - two alternative sets of components. Subtract one from the other and use linear independence of the basis vectors to conclude that the two ways must be the same one.
[3] The matrix here has the basis vectors laid out in its columns. Since the basis vectors are independent, the matrix is invertible. In our small example, the matrix equation we're looking to solve is:
\[\begin{pmatrix} 2 & 4 \\ 3 & 5 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}=\begin{pmatrix} 2 \\ 4 \end{pmatrix}\]
[4] The example converts from the standard basis to some other basis, but converting from a non-standard basis to another requires exactly the same steps: we try to find coefficients such that a combination of some set of basis vectors adds up to some components in another basis.
[5] For square matrices A and B, if AB=I then also BA=I.