Knowing how to convert a vector to a different basis has many practical applications. Gilbert Strang has a nice quote about the importance of basis changes in his book [1] (emphasis mine):
> The standard basis vectors for $\mathbb{R}^n$ and $\mathbb{R}^m$ are the columns of $I$. That choice leads to a standard matrix, and $T(v)=Av$ in the normal way. But these spaces also have other bases, so the same $T$ is represented by other matrices. *A main theme of linear algebra is to choose the bases that give the best matrix for $T$.*
This should serve as a good motivation, but I'll leave the applications for future posts; in this one, I will focus on the mechanics of basis change, starting from first principles.
The basis and vector components
A basis of a vector space $V$ is a set of vectors in $V$ that is linearly independent and spans $V$. An ordered basis is a list, rather than a set, meaning that the order of the vectors in an ordered basis matters. This distinction is important for the topics discussed in this post.
Let's now define components. If $U=(u_1,u_2,\dots,u_n)$ is an ordered basis for $V$ and $v$ is a vector in $V$, then there's a unique [2] list of scalars $c_1,c_2,\dots,c_n$ such that:
![\[v = c_1u_1+c_2u_2+...+c_nu_n\]](https://eli.thegreenplace.net/images/math/400f6b84c3ee13d328880d7b29bb7c467c868a33.png)
These $c_i$ are called the components of $v$ relative to the ordered basis $U$. We'll introduce a useful piece of notation here: collect the components $c_1,c_2,\dots,c_n$ into a column vector and call it $[v]_U$: this is the component vector of $v$ relative to the basis $U$.
Example: finding a component vector
Let's use $\mathbb{R}^2$ as an example. $U=\left((2,3)^T,(4,5)^T\right)$ is an ordered basis for $\mathbb{R}^2$ (since the two vectors in it are linearly independent). Say we have $v=(2,4)^T$. What is $[v]_U$? We'll need to solve the system of equations:
![\[\begin{pmatrix} 2 \\ 4 \end{pmatrix}=c_1\begin{pmatrix} 2 \\ 3\end{pmatrix}+c_2\begin{pmatrix} 4 \\ 5 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/1793547a360ef6b1c94da80aebb1698bcd69e20e.png)
In the 2-D case this is trivial - the solution is $c_1=3$ and $c_2=-1$. Therefore:
![\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/548b7f57bfc01932593b5cdaa597b77f531bd03b.png)
In the more general case of $\mathbb{R}^n$, this is akin to solving a linear system of $n$ equations in $n$ variables. Since the basis vectors are, by definition, linearly independent, solving the system amounts to inverting a matrix [3].
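To make this concrete, here's a minimal NumPy sketch of the computation (the helper name `components_in_basis` is my own, for illustration only):

```python
import numpy as np

def components_in_basis(basis_vectors, v):
    """Component vector of v relative to the given ordered basis.

    The basis vectors are laid out as the columns of a matrix A,
    and we solve A @ c = v for the components c.
    """
    A = np.column_stack(basis_vectors)
    return np.linalg.solve(A, v)

u1, u2 = np.array([2, 3]), np.array([4, 5])
v = np.array([2, 4])
print(components_in_basis([u1, u2], v))  # [ 3. -1.]
```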
Change of basis matrix
Now comes the key part of the post. Say we have two different ordered bases for the same vector space: $U=(u_1,u_2,\dots,u_n)$ and $W=(w_1,w_2,\dots,w_n)$. For some vector $v$, we can find $[v]_U$ and $[v]_W$. How are these two related?
Surely, given $v$ we can find its coefficients in basis $W$ the same way as we did in the example above [4]. It involves solving a linear system of $n$ equations. We'll have to redo this operation for every vector $v$ we want to convert. Is there a simpler way?
Luckily for science, yes. The key here is to find how the basis vectors of $U$ look in basis $W$. In other words, we have to find $[u_1]_W$, $[u_2]_W$, and so on up to $[u_n]_W$.
Let's say we do that and find the coefficients to be $a_{ij}$, such that:
![\[\begin{matrix} u_1=a_{11}w_1+a_{21}w_2+...+a_{n1}w_n \\ u_2=a_{12}w_1+a_{22}w_2+...+a_{n2}w_n \\ ... \\ u_n=a_{1n}w_1+a_{2n}w_2+...+a_{nn}w_n \end{matrix}\]](https://eli.thegreenplace.net/images/math/4c5dd3dc9c5d0acedafd6cd20c09a4498802577a.png)
Now, given some vector $v$, suppose its components in basis $U$ are:
![\[[v]_{\text{\tiny U}}=\begin{pmatrix} c_1 \\ c_2 \\ ... \\ c_n \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/83437d273512b461cab0f132626d1d64df3b32ae.png)
Let's try to figure out how it looks in basis $W$. The above equation (by the definition of components) is equivalent to:
![\[v=c_1u_1+c_2u_2+...+c_nu_n\]](https://eli.thegreenplace.net/images/math/fe3fcc27f581a058afb05460629e332bc2fae909.png)
Substituting the expansion of the $u$'s in basis $W$, we get:
![\[v=\begin{matrix} c_1(a_{11}w_1+a_{21}w_2+...+a_{n1}w_n)+ \\ c_2(a_{12}w_1+a_{22}w_2+...+a_{n2}w_n)+ \\ ... \\ c_n(a_{1n}w_1+a_{2n}w_2+...+a_{nn}w_n) \end{matrix}\]](https://eli.thegreenplace.net/images/math/450faf4b2cc27042f6b5fd90cdf39b6588f89e67.png)
Reordering a bit to find the multipliers of each $w_i$:
![\[v=\begin{matrix} (c_1a_{11}+c_2a_{12}+...+c_na_{1n})w_1+ \\ (c_1a_{21}+c_2a_{22}+...+c_na_{2n})w_2+ \\ ... \\ (c_1a_{n1}+c_2a_{n2}+...+c_na_{nn})w_n \end{matrix}\]](https://eli.thegreenplace.net/images/math/2504c20a0377cc5defb98814b0eed08043a2bfc3.png)
By our definition of vector components, this equation is equivalent to:
![\[[v]_{\text{\tiny W}}=\begin{pmatrix} c_1a_{11}+c_2a_{12}+...+c_na_{1n} \\ c_1a_{21}+c_2a_{22}+...+c_na_{2n} \\ ... \\ c_1a_{n1}+c_2a_{n2}+...+c_na_{nn} \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/0c385a0913584dc008dc0f088d429cf7fc432002.png)
Now we're in vector notation again, so we can decompose the column vector on the right-hand side into:
![\[[v]_{\text{\tiny W}}=\begin{pmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22} & ... & a_{2n} \\ ... & ... & ... \\ a_{n1} & a_{n2} & ... & a_{nn} \end{pmatrix}\begin{pmatrix}c_1 \\ c_2 \\ ... \\ c_n \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/b55ca14e97d3c8c1be864c00d4995f02f0406845.png)
This is a matrix times a vector. The vector on the right is $[v]_U$. The matrix should look familiar too, because it consists of the $a_{ij}$ coefficients we've defined above. In fact, this matrix just represents the basis vectors of $U$ expressed in basis $W$. Let's call this matrix $A_{U\rightarrow W}$ - the change of basis matrix from $U$ to $W$. It has $[u_1]_W$ to $[u_n]_W$ laid out in its columns:
![\[A_{\text{\tiny U}\rightarrow \text{\tiny W}}=\begin{pmatrix}[u_1]_{\text{\tiny W}},[u_2]_{\text{\tiny W}},...,[u_n]_{\text{\tiny W}}]\end{pmatrix}\]](https://eli.thegreenplace.net/images/math/48c37598af067782c747688c6f2f1b037539c14a.png)
So we have:
![\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/40ce253d22d14c257a969dca84539e9d06be237d.png)
To recap: given two bases $U$ and $W$, we can spend some effort to compute the "change of basis" matrix $A_{U\rightarrow W}$, but then we can easily convert any vector from basis $U$ to basis $W$ by simply left-multiplying its component vector by this matrix.
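Here's how this could look in code: a minimal NumPy sketch, with `change_of_basis_matrix` being a name I made up for illustration:

```python
import numpy as np

def change_of_basis_matrix(U, W):
    """Change of basis matrix from ordered basis U to ordered basis W.

    Column i is [u_i]_W. Solving W_mat @ X = U_mat finds all the
    columns in a single np.linalg.solve call.
    """
    U_mat = np.column_stack(U)
    W_mat = np.column_stack(W)
    return np.linalg.solve(W_mat, U_mat)
```

Once the matrix is computed, converting any $[v]_U$ to $[v]_W$ is a single matrix-vector product.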
A reasonable question to ask at this point is - what about converting from $W$ to $U$? Well, since the computations above are completely generic and don't special-case either basis, we can just flip the roles of $U$ and $W$ and get another change of basis matrix, $A_{W\rightarrow U}$ - it converts vectors in basis $W$ to vectors in basis $U$ as follows:
![\[[v]_{\text{\tiny U}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}\]](https://eli.thegreenplace.net/images/math/11b3d1909fe5b2e306590dbb6c5ab4b99c911e43.png)
And this matrix is:
![\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}=\begin{pmatrix}[w_1]_{\text{\tiny U}},[w_2]_{\text{\tiny U}},...,[w_n]_{\text{\tiny U}}]\end{pmatrix}\]](https://eli.thegreenplace.net/images/math/a3d5cd46635be2f2936ce51aa9c484d5c491a5be.png)
We will soon see that the two change of basis matrices are intimately related; but first, an example.
Example: changing bases with matrices
Let's work through another concrete example in $\mathbb{R}^2$. We've used the basis $U=\left((2,3)^T,(4,5)^T\right)$ before; let's use it again, and also add the basis $W=\left((-1,1)^T,(1,1)^T\right)$. We've already seen that for $v=(2,4)^T$ we have:
![\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/548b7f57bfc01932593b5cdaa597b77f531bd03b.png)
Similarly, we can solve a set of two equations to find $[v]_W$:
![\[[v]_{\text {\tiny W}}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/cef00c5fb242059409d28cfd7a9de38cb87839a3.png)
OK, let's see how a change of basis matrix can be used to easily compute one given the other. First, to find $A_{U\rightarrow W}$ we'll need $[u_1]_W$ and $[u_2]_W$. We know how to do that. The result is:
![\[[u_1]_{\text {\tiny W}}=\begin{pmatrix} 0.5 \\ 2.5 \end{pmatrix}\qquad[u_2]_{\text {\tiny W}}=\begin{pmatrix} 0.5 \\ 4.5 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/5eca6d9d809f75e136f1619b08ac6677448406d6.png)
Now we can verify that given $[v]_U$ and $A_{U\rightarrow W}$, we can easily find $[v]_W$:
![\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}= \\ \begin{pmatrix} 0.5 & 0.5 \\ 2.5 & 4.5 \end{pmatrix} \\ \begin{pmatrix} 3 \\ -1 \end{pmatrix}=\\ \begin{pmatrix} 1 \\ 3 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/95af5086112fead5258aabba524167e746969d37.png)
Indeed, it checks out! Let's also verify the other direction. To find $A_{W\rightarrow U}$ we'll need $[w_1]_U$ and $[w_2]_U$:
![\[[w_1]_{\text {\tiny U}}=\begin{pmatrix} 4.5 \\ -2.5 \end{pmatrix}\qquad[w_2]_{\text {\tiny U}}=\begin{pmatrix}- 0.5 \\ 0.5 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/028bc2e59c73e299a1cdbaf05df5ed605b737512.png)
And now to find $[v]_U$:
![\[[v]_{\text{\tiny U}}=A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}= \\ \begin{pmatrix} 4.5 & -0.5 \\ -2.5 & 0.5 \end{pmatrix} \\ \begin{pmatrix} 1 \\ 3 \end{pmatrix}=\\ \begin{pmatrix} 3 \\ -1 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/4e697d4673dcffaf65875566c470a2113defa3b0.png)
Checks out again! If you have a keen eye, or have recently spent some time solving linear algebra problems, you'll notice something interesting about the two basis change matrices used in this example: one is the inverse of the other! Is this some sort of coincidence? No - in fact, it's always true, and we can prove it.
The inverse of a change of basis matrix
We've derived the change of basis matrix from $U$ to $W$ to perform the conversion:
![\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/40ce253d22d14c257a969dca84539e9d06be237d.png)
Left-multiplying this equation by $A_{W\rightarrow U}$:
![\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}[v]_{\text{\tiny W}}=\\ A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/487b88a4e75475a46621a4cdbffb7fc37e30c920.png)
But the left-hand side is now, by our earlier definition, equal to $[v]_U$, so we get:
![\[[v]_{\text{\tiny U}}=\\ A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/8d6e70905656f42e21896dda00ce2590f6218766.png)
Since this is true for every vector $v$, it must be that:
![\[A_{\text{\tiny W}\rightarrow \text{\tiny U}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}=I\]](https://eli.thegreenplace.net/images/math/ff93ca3481c29a874a9ab5e903321b1d6c4e38f0.png)
From this, we can infer that $A_{W\rightarrow U}=A_{U\rightarrow W}^{-1}$, and vice versa [5].
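With `A_uw` and `A_wu` from the previous snippet, this is easy to confirm numerically (`np.allclose` sidesteps floating-point noise):

```python
print(np.allclose(A_wu @ A_uw, np.eye(2)))     # True
print(np.allclose(np.linalg.inv(A_uw), A_wu))  # True
```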
Changing to and from the standard basis
You may have noticed that in the examples above, we short-circuited a little bit of rigor by making up a vector (such as $v=(2,4)^T$) without explicitly specifying the basis its components are relative to. This is because we're so used to working with the "standard basis" that we often forget it's there.
The standard basis (let's call it $E$) consists of unit vectors pointing in the directions of the axes of a Cartesian coordinate system. For $\mathbb{R}^2$ we have the basis vectors:
![\[e_1=\begin{pmatrix} 1 \\ 0 \end{pmatrix}\qquad e_2=\begin{pmatrix} 0 \\ 1 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/cae83340f574d5d86529a8fc7e8bb578337b027b.png)
And more generally in $\mathbb{R}^n$ we have an ordered list of $n$ vectors $e_1,e_2,\dots,e_n$, where $e_i$ has 1 in the $i$th position and zeros elsewhere.
So when we say $v=(2,4)^T$, what we actually mean is:
![\[\begin{matrix} v=2e_1+4e_2 \\[1em] [v]_{\text {\tiny E}}=\begin{pmatrix} 2 \\ 4 \end{pmatrix} \end{matrix}\]](https://eli.thegreenplace.net/images/math/6e3fdb31e8f3be4dab7cd31785a9b649dfd472bd.png)
The standard basis is so ingrained in our intuition of vectors that we usually neglect to mention it. This is fine, as long as we're only dealing with the standard basis. Once change of basis is required, it's worthwhile to stick to a more consistent notation to avoid confusion. Moreover, it's often useful to change a vector's basis to or from the standard one. Let's see how that works. Recall how we use the change of basis matrix:
![\[[v]_{\text{\tiny W}}=A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/40ce253d22d14c257a969dca84539e9d06be237d.png)
Replacing the arbitrary basis $W$ with the standard basis $E$ in this equation, we get:
![\[[v]_{\text{\tiny E}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/db4f797209370d41a5069009b16269967a6ba3ea.png)
And $A_{U\rightarrow E}$ is the matrix with $[u_1]_E$ to $[u_n]_E$ in its columns. But wait, these are just the basis vectors of $U$! So finding the matrix $A_{U\rightarrow E}$ for any given basis $U$ is trivial - simply line up $U$'s basis vectors as columns, in their order, to get a matrix. This means that any square, invertible matrix can be seen as a change of basis matrix from the basis spelled out in its columns to the standard basis. This is a natural consequence of how multiplying a matrix by a vector works: by linearly combining the matrix's columns.
OK, so we know how to find $[v]_E$ given $[v]_U$. What about the other way around? We'll need $A_{E\rightarrow U}$ for that, and we know that:
![\[A_{\text{\tiny E}\rightarrow \text{\tiny U}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}\]](https://eli.thegreenplace.net/images/math/60679b364e239e7764aecc891a5579a3fc204ea3.png)
Therefore:
![\[[v]_{\text{\tiny U}}=\\ A_{\text{\tiny E}\rightarrow \text{\tiny U}}[v]_{\text{\tiny E}}=\\ A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}[v]_{\text{\tiny E}}\]](https://eli.thegreenplace.net/images/math/c9c3e8722c55c44dc21aec2ba823cdb0c1f8a5a0.png)
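Both directions translate to very short code; here's a sketch under the same assumptions as before (the helper names are mine):

```python
import numpy as np

def to_standard_basis(U, v_u):
    """[v]_E from [v]_U: A_{U->E} is just U's vectors as columns."""
    return np.column_stack(U) @ v_u

def from_standard_basis(U, v_e):
    """[v]_U from [v]_E: solving the system beats forming the inverse."""
    return np.linalg.solve(np.column_stack(U), v_e)

U = [np.array([2, 3]), np.array([4, 5])]
print(to_standard_basis(U, np.array([3, -1])))   # [2 4]
print(from_standard_basis(U, np.array([2, 4])))  # [ 3. -1.]
```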
Chaining basis changes
What happens if we change a vector from one basis to another, and then change the resulting vector to yet another basis? I mean, for bases $U$, $W$ and $T$ and some arbitrary vector $v$, we'll do:
![\[A_{\text{\tiny W}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/69c8d309f1e7fb5d65bd72570cdadc1769d315f0.png)
This is simply applying the change-of-basis equation (multiplication by the matrix) twice:
![\[A_{\text{\tiny W}\rightarrow \text{\tiny T}}(A_{\text{\tiny U}\rightarrow \text{\tiny W}}[v]_{\text{\tiny U}})=\\ A_{\text{\tiny W}\rightarrow \text{\tiny T}}[v]_{\text{\tiny W}}\\ =[v]_{\text{\tiny T}}\]](https://eli.thegreenplace.net/images/math/f80dd3f39e7e736972e31ec553e8541628ab038c.png)
What this means is that changes of basis can be chained, which isn't surprising given their linear nature. It also means that we've just found $A_{U\rightarrow T}$, since we found how to transform $[v]_U$ to $[v]_T$ (using an intermediary basis $W$).
![\[A_{\text{\tiny U}\rightarrow \text{\tiny T}}=\\ A_{\text{\tiny W}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny W}}\]](https://eli.thegreenplace.net/images/math/e1235761ce18868df7f936908c80f49c464550bc.png)
Finally, let's say that the intermediary basis is not just some arbitrary $W$, but the standard basis $E$. So we have:
![\[A_{\text{\tiny U}\rightarrow \text{\tiny T}}=\\ A_{\text{\tiny E}\rightarrow \text{\tiny T}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}=\\ A_{\text{\tiny T}\rightarrow \text{\tiny E}}^{-1}A_{\text{\tiny U}\rightarrow \text{\tiny E}}\]](https://eli.thegreenplace.net/images/math/5e2c2ecd7ad9e15cfa07d8d9f5eef1c26479c4cd.png)
We prefer the last form, since finding $A_{U\rightarrow E}$ for any basis $U$ is, as we've seen above, trivial.
Example: standard basis and chaining
It's time to solidify the ideas of the last two sections with a concrete example. We'll use our familiar bases $U$ and $W$ from the previous example, along with the standard basis $E$ for $\mathbb{R}^2$. Previously, we transformed a vector $v$ from $U$ to $W$ and vice versa using the change of basis matrices between these bases. This time, let's do it by chaining via the standard basis.
We'll pick $v=(2,4)^T$. Formally, the components of $v$ relative to the standard basis are:
![\[[v]_{\text{\tiny E}} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/69766b4a4bc3f500ac1f25d3367774958f163084.png)
In the last example we've already computed the components of $v$ relative to $U$ and $W$:
![\[[v]_{\text {\tiny U}}=\begin{pmatrix} 3 \\ -1 \end{pmatrix}\qquad [v]_{\text {\tiny W}}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/c7cac93827ffa171be55db031004356516fb98fa.png)
Previously, one was computed from the other using the "direct" basis change matrices from $U$ to $W$ and vice versa. Now we can use chaining via the standard basis to achieve the same result. For example, we know that:
![\[[v]_{\text{\tiny W}}=\\ A_{\text{\tiny E}\rightarrow \text{\tiny W}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}\]](https://eli.thegreenplace.net/images/math/ec831bdd78639c2bf290e705c7efb0cb4908cd16.png)
Finding the change of basis matrix from some basis to $E$ is just a matter of laying out the basis vectors as columns, so we immediately know that:
![\[A_{\text{\tiny U}\rightarrow \text{\tiny E}}=\begin{pmatrix} 2 & 4\\ 3 & 5 \end{pmatrix}\qquad \qquad \\ A_{\text{\tiny W}\rightarrow \text{\tiny E}}=\begin{pmatrix} -1 & 1\\ 1 & 1 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/56debe1dbb75f938c25fe7b4645b6602bb13e637.png)
The change of basis matrix from $E$ to some basis is the inverse, so by inverting the above matrices we find:
![\[A_{\text{\tiny E}\rightarrow \text{\tiny U}}=A_{\text{\tiny U}\rightarrow \text{\tiny E}}^{-1}=\begin{pmatrix} -2.5 & 2 \\ 1.5 & -1 \end{pmatrix}\qquad \qquad \\ A_{\text{\tiny E}\rightarrow \text{\tiny W}}=A_{\text{\tiny W}\rightarrow \text{\tiny E}}^{-1}=\begin{pmatrix} -0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/8810ea8701a2aba7df82e88302df93c890a26e26.png)
Now we have all we need to find $[v]_W$ from $[v]_U$:
![\[[v]_{\text{\tiny W}}=\\ A_{\text{\tiny E}\rightarrow \text{\tiny W}}A_{\text{\tiny U}\rightarrow \text{\tiny E}}[v]_{\text{\tiny U}}=\begin{pmatrix} -0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}\begin{pmatrix} 2 & 4\\ 3 & 5 \end{pmatrix}\begin{pmatrix} 3 \\ -1 \end{pmatrix}=\begin{pmatrix} 1 \\ 3 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/1f5875aba56faf3fda3c9f4c72b1421529958116.png)
The other direction can be done similarly.
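The whole chain fits in a few lines of NumPy; as before, this is just an illustrative sketch:

```python
import numpy as np

U = [np.array([2, 3]), np.array([4, 5])]
W = [np.array([-1, 1]), np.array([1, 1])]
v_u = np.array([3, -1])  # [v]_U

A_ue = np.column_stack(U)   # A_{U->E}: U's vectors as columns
A_we = np.column_stack(W)   # A_{W->E}: W's vectors as columns
A_ew = np.linalg.inv(A_we)  # A_{E->W} = A_{W->E}^{-1}

print(A_ew @ A_ue @ v_u)    # [1. 3.] == [v]_W, chaining U -> E -> W
```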
[1] Introduction to Linear Algebra, 4th edition, section 7.2
[2] Why is this list unique? Because given a basis $U$, suppose $v$ had two different component lists: $v=c_1u_1+\dots+c_nu_n$ and $v=d_1u_1+\dots+d_nu_n$. Subtracting the two gives $0=(c_1-d_1)u_1+\dots+(c_n-d_n)u_n$; since the $u_i$ are linearly independent, every $c_i-d_i$ must be zero, so the two lists are the same.
[3] The matrix here has the basis vectors laid out in its columns. Since the basis vectors are independent, the matrix is invertible. In our small example, the matrix equation we're looking to solve is:
![\[\begin{pmatrix} 2 & 4 \\ 3 & 5 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}=\begin{pmatrix} 2 \\ 4 \end{pmatrix}\]](https://eli.thegreenplace.net/images/math/6d840237e5940eaadf2002f888e8537e48e90158.png)
[4] The example converts from the standard basis to some other basis, but converting from a non-standard basis to another requires exactly the same steps: we try to find coefficients such that a combination of some set of basis vectors adds up to some components in another basis.
[5] For square matrices $A$ and $B$, if $AB=I$ then $B=A^{-1}$ and $A=B^{-1}$; in particular, $BA=I$ as well.