Knowing how to convert a vector to a different basis has many practical applications. Gilbert Strang has a nice quote about the importance of basis changes in his book [1] (emphasis mine):
The standard basis vectors for $\mathbb{R}^n$ and $\mathbb{R}^m$ are the columns of $I$. That choice leads to a standard matrix, and $T(v) = Av$ in the normal way. But these spaces also have other bases, so the same $T$ is represented by other matrices. A main theme of linear algebra is to choose the bases that give the best matrix for $T$.
This should serve as a good motivation, but I'll leave the applications for future posts; in this one, I will focus on the mechanics of basis change, starting from first principles.
The basis and vector components
A basis of a vector space $V$ is a set of vectors in $V$ that is linearly independent and spans $V$. An ordered basis is a list, rather than a set, meaning that the order of the vectors in an ordered basis matters. This is important with respect to the topics discussed in this post.
Let's now define components. If $U = (u_1, u_2, \ldots, u_n)$ is an ordered basis for $V$ and $v$ is a vector in $V$, then there's a unique [2] list of scalars $c_1, c_2, \ldots, c_n$ such that:

$$v = c_1 u_1 + c_2 u_2 + \cdots + c_n u_n$$
These are called the components of $v$ relative to the ordered basis $U$. We'll introduce a useful piece of notation here: collect the components into a column vector and call it $[v]_U$:

$$[v]_U = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$$

This is the component vector of $v$ relative to the basis $U$.
Example: finding a component vector
Let's use $\mathbb{R}^2$ as an example. Take an ordered basis $U = (u_1, u_2)$ for $\mathbb{R}^2$ (any two linearly independent vectors will do). Given some vector $v$, what is $[v]_U$? We'll need to solve the system of equations:

$$v = c_1 u_1 + c_2 u_2$$

In the 2-D case this is trivial - it's just two linear equations in the two unknowns $c_1$ and $c_2$. Therefore:

$$[v]_U = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$
In the more general case of $\mathbb{R}^n$, this is akin to solving a linear system of $n$ equations in $n$ variables. Since the basis vectors are, by definition, linearly independent, solving the system amounts to inverting a matrix [3].
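To make this concrete, here's a minimal numpy sketch of the computation. The basis vectors and $v$ below are purely illustrative (they aren't meant to reproduce any particular numbers); the point is only that finding $[v]_U$ is solving a small linear system.

```python
import numpy as np

# An illustrative ordered basis U for R^2; any two linearly independent
# vectors would do. The basis vectors are stacked as columns.
u1 = np.array([2.0, 1.0])
u2 = np.array([1.0, -1.0])
U = np.column_stack([u1, u2])

v = np.array([3.0, 4.0])  # an arbitrary vector, in standard coordinates

# Solve U @ c = v for the component vector c = [v]_U.
v_U = np.linalg.solve(U, v)

# Sanity check: recombining the basis vectors with these components gives v back.
assert np.allclose(v_U[0] * u1 + v_U[1] * u2, v)
print(v_U)
```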
Change of basis matrix
Now comes the key part of the post. Say we have two different ordered bases for the same vector space $V$: $U = (u_1, u_2, \ldots, u_n)$ and $W = (w_1, w_2, \ldots, w_n)$. For some vector $v \in V$, we can find $[v]_U$ and $[v]_W$. How are these two related?
Surely, given $v$, we can find its coefficients in basis $W$ the same way as we did in the example above [4]. It involves solving a linear system of equations. We'll have to redo this operation for every vector we want to convert. Is there a simpler way?
Luckily for science, yes. The key here is to find how the basis vectors of $U$ look in basis $W$. In other words, we have to find $[u_1]_W$, $[u_2]_W$, and so on up to $[u_n]_W$.
Let's say we do that and find the coefficients to be $a_{ij}$, such that:

$$u_j = a_{1j} w_1 + a_{2j} w_2 + \cdots + a_{nj} w_n \qquad j = 1, \ldots, n$$
Now, given some vector $v$, suppose its components in basis $U$ are:

$$[v]_U = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$$
Let's try to figure out how it looks in basis $W$. The above equation (by definition of components) is equivalent to:

$$v = c_1 u_1 + c_2 u_2 + \cdots + c_n u_n$$
Substituting the expansion of the $u_j$s in basis $W$, we get:

$$v = c_1 (a_{11} w_1 + a_{21} w_2 + \cdots + a_{n1} w_n) + c_2 (a_{12} w_1 + a_{22} w_2 + \cdots + a_{n2} w_n) + \cdots + c_n (a_{1n} w_1 + a_{2n} w_2 + \cdots + a_{nn} w_n)$$
Reordering a bit to find the multipliers of each $w_i$:

$$v = (a_{11} c_1 + a_{12} c_2 + \cdots + a_{1n} c_n) w_1 + (a_{21} c_1 + a_{22} c_2 + \cdots + a_{2n} c_n) w_2 + \cdots + (a_{n1} c_1 + a_{n2} c_2 + \cdots + a_{nn} c_n) w_n$$
By our definition of vector components, this equation is equivalent to:

$$[v]_W = \begin{pmatrix} a_{11} c_1 + a_{12} c_2 + \cdots + a_{1n} c_n \\ a_{21} c_1 + a_{22} c_2 + \cdots + a_{2n} c_n \\ \vdots \\ a_{n1} c_1 + a_{n2} c_2 + \cdots + a_{nn} c_n \end{pmatrix}$$
Now we're in vector notation again, so we can decompose the column vector on the right-hand side into:

$$[v]_W = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$$
This is a matrix times a vector. The vector on the right is $[v]_U$. The matrix should look familiar too, because it consists of those $a_{ij}$ coefficients we've defined above. In fact, this matrix just represents the basis vectors of $U$ expressed in basis $W$. Let's call this matrix $A_{U \to W}$ - the change of basis matrix from $U$ to $W$. It has $[u_1]_W$ to $[u_n]_W$ laid out in its columns:

$$A_{U \to W} = \begin{pmatrix} [u_1]_W & [u_2]_W & \cdots & [u_n]_W \end{pmatrix}$$
So we have:

$$[v]_W = A_{U \to W} [v]_U$$
To recap, given two bases $U$ and $W$, we can spend some effort to compute the "change of basis" matrix $A_{U \to W}$, but then we can easily convert any vector in basis $U$ to basis $W$ if we simply left-multiply it by this matrix.
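Here's a sketch of this mechanism in numpy, using two illustrative bases (the numbers are only for demonstration; the basis vectors are stacked as matrix columns):

```python
import numpy as np

# Two illustrative ordered bases for R^2, basis vectors as columns.
U = np.column_stack([[2.0, 1.0], [1.0, -1.0]])
W = np.column_stack([[1.0, 1.0], [0.0, 2.0]])

# Column j of A_{U->W} is [u_j]_W, i.e. the solution of W @ x = u_j.
# np.linalg.solve handles all columns of U at once.
A_U_to_W = np.linalg.solve(W, U)

v_U = np.array([3.0, -2.0])   # some component vector relative to U
v_W = A_U_to_W @ v_U          # [v]_W = A_{U->W} [v]_U

# Both component vectors should describe the same underlying vector.
assert np.allclose(U @ v_U, W @ v_W)
```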
A reasonable question to ask at this point is - what about converting from $W$ to $U$? Well, since the computations above are completely generic and don't special-case either basis, we can just flip the roles of $U$ and $W$ and get another change of basis matrix, $A_{W \to U}$ - it converts vectors in basis $W$ to vectors in basis $U$ as follows:

$$[v]_U = A_{W \to U} [v]_W$$
And this matrix is:

$$A_{W \to U} = \begin{pmatrix} [w_1]_U & [w_2]_U & \cdots & [w_n]_U \end{pmatrix}$$
We will soon see that the two change of basis matrices are intimately related; but first, an example.
Example: changing bases with matrices
Let's work through another concrete example in $\mathbb{R}^2$. We've used the basis $U = (u_1, u_2)$ before; let's use it again, and also add a second basis $W = (w_1, w_2)$. We've already seen that $[v]_U$ can be found for a given $v$ by solving a set of two equations.
Similarly, we can solve a set of two equations to find $[v]_W$.
OK, let's see how a change of basis matrix can be used to easily compute one given the other. First, to find $A_{U \to W}$ we'll need $[u_1]_W$ and $[u_2]_W$. We know how to do that: each is found by solving a small linear system, and the results go into the columns of $A_{U \to W}$.
Now we can verify that given $A_{U \to W}$ and $[v]_U$, we can easily find $[v]_W$: left-multiplying $[v]_U$ by $A_{U \to W}$ gives exactly the $[v]_W$ we computed directly.
Indeed, it checks out! Let's also verify the other direction. To find $A_{W \to U}$ we'll need $[w_1]_U$ and $[w_2]_U$, which again go into the matrix's columns.
And now, to find $[v]_U$, we left-multiply $[v]_W$ by $A_{W \to U}$.
Checks out again! If you have a keen eye, or have recently spent some time solving linear algebra problems, you'll notice something interesting about the two basis change matrices used in this example. One is an inverse of the other! Is this some sort of coincidence? No - in fact, it's always true, and we can prove it.
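Since the procedure matters more than the particular numbers, here's a hedged numpy sketch of the whole example with illustrative bases and an illustrative component vector (not meant to match any specific values): it builds both change of basis matrices, converts in each direction, and checks the inverse relationship we just noticed.

```python
import numpy as np

# Illustrative bases (columns are the basis vectors) and a component vector.
U = np.column_stack([[2.0, 1.0], [1.0, -1.0]])
W = np.column_stack([[1.0, 1.0], [0.0, 2.0]])
v_U = np.array([3.0, -2.0])

A_U_to_W = np.linalg.solve(W, U)   # columns are [u_1]_W, [u_2]_W
A_W_to_U = np.linalg.solve(U, W)   # columns are [w_1]_U, [w_2]_U

v_W = A_U_to_W @ v_U               # U components -> W components
back = A_W_to_U @ v_W              # ... and back again
assert np.allclose(back, v_U)

# The observation from the text: the two matrices are inverses of each other.
assert np.allclose(A_W_to_U @ A_U_to_W, np.eye(2))
```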
The inverse of a change of basis matrix
We've derived the change of basis matrix $A_{U \to W}$ from $U$ to $W$ to perform the conversion:

$$[v]_W = A_{U \to W} [v]_U$$
Left-multiplying this equation by $A_{W \to U}$:

$$A_{W \to U} [v]_W = A_{W \to U} A_{U \to W} [v]_U$$
But the left-hand side is now, by our earlier definition, equal to $[v]_U$, so we get:

$$[v]_U = A_{W \to U} A_{U \to W} [v]_U$$
Since this is true for every vector $v$, it must be that:

$$A_{W \to U} A_{U \to W} = I$$
From this, we can infer that $A_{W \to U} = (A_{U \to W})^{-1}$ and vice versa [5].
Changing to and from the standard basis
You may have noticed that in the examples above, we short-circuited a little bit of rigor by making up a vector (such as the $v$ in our examples) without explicitly specifying the basis its components are relative to. This is because we're so used to working with the "standard basis" we often forget it's there.
The standard basis (let's call it $E$) consists of unit vectors pointing in the directions of the axes of a Cartesian coordinate system. For $\mathbb{R}^2$ we have the basis vectors:

$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
And more generally in $\mathbb{R}^n$ we have an ordered list of $n$ vectors $(e_1, e_2, \ldots, e_n)$, where $e_i$ has 1 in the $i$th position and zeros elsewhere.
So when we write a vector $v$ as a plain column of numbers, what we actually mean is $[v]_E$ - the component vector of $v$ relative to the standard basis $E$.
The standard basis is so ingrained in our intuition of vectors that we usually neglect to mention it. This is fine, as long as we're only dealing with the standard basis. Once change of basis is required, it's worthwhile to stick to a more consistent notation to avoid confusion. Moreover, it's often useful to change a vector's basis to or from the standard one. Let's see how that works. Recall how we use the change of basis matrix:

$$[v]_W = A_{U \to W} [v]_U$$
Replacing the arbitrary basis $W$ by the standard basis $E$ in this equation, we get:

$$[v]_E = A_{U \to E} [v]_U$$
And $A_{U \to E}$ is the matrix with $[u_1]_E$ to $[u_n]_E$ in its columns. But wait, these are just the basis vectors of $U$! So finding the matrix $A_{U \to E}$ for any given basis $U$ is trivial - simply line up $U$'s basis vectors as columns in their order to get a matrix. This means that any square, invertible matrix can be seen as a change of basis matrix from the basis spelled out in its columns to the standard basis. This is a natural consequence of how multiplying a matrix by a vector works: by linearly combining the matrix's columns.
OK, so we know how to find $[v]_E$ given $[v]_U$. What about the other way around? We'll need $A_{E \to U}$ for that, and we know that:

$$A_{E \to U} = (A_{U \to E})^{-1}$$
Therefore:

$$[v]_U = A_{E \to U} [v]_E = (A_{U \to E})^{-1} [v]_E$$
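Here's a small numpy sketch of converting to and from the standard basis; the basis below is illustrative. The point is that $A_{U \to E}$ requires no computation at all - it's just $U$'s basis vectors as columns - while $A_{E \to U}$ is its inverse.

```python
import numpy as np

# An illustrative basis U; its vectors, as columns, *are* A_{U->E}.
U = np.column_stack([[2.0, 1.0], [1.0, -1.0]])
A_U_to_E = U
A_E_to_U = np.linalg.inv(U)        # the other direction is the inverse

v_U = np.array([3.0, -2.0])        # components of some v relative to U
v_E = A_U_to_E @ v_U               # the "ordinary" standard coordinates of v
assert np.allclose(A_E_to_U @ v_E, v_U)   # and back again
```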
Chaining basis changes
What happens if we change a vector from one basis to another, and then change the resulting vector to yet another basis? I mean, for bases $U$, $W$ and $T$ and some arbitrary vector $v$, we'll do:

$$[v]_U \longrightarrow [v]_W \longrightarrow [v]_T$$
This is simply applying the change of basis by matrix multiplication equation, twice:

$$[v]_T = A_{W \to T} [v]_W = A_{W \to T} A_{U \to W} [v]_U$$
What this means is that changes of basis can be chained, which isn't surprising given their linear nature. It also means that we've just found $A_{U \to T} = A_{W \to T} A_{U \to W}$, since we found how to transform $[v]_U$ to $[v]_T$ (using an intermediary basis $W$).
Finally, let's say that the intermediary basis is not just some arbitrary $W$, but the standard basis $E$. So we have:

$$A_{U \to T} = A_{E \to T} A_{U \to E} = (A_{T \to E})^{-1} A_{U \to E}$$
We prefer the last form, since finding the change of basis matrix from any basis to $E$ is, as we've seen above, trivial.
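A short numpy sketch of the chained form, with illustrative bases $U$ and $T$: since $A_{U \to E}$ and $A_{T \to E}$ are just the bases' columns, $A_{U \to T}$ comes out of a single solve (equivalent to inverting $T$ and multiplying).

```python
import numpy as np

# Illustrative bases U and T (columns are the basis vectors).
U = np.column_stack([[2.0, 1.0], [1.0, -1.0]])
T = np.column_stack([[1.0, 2.0], [3.0, 1.0]])

# A_{U->T} = (A_{T->E})^{-1} A_{U->E} = inv(T) @ U; solve() avoids the explicit inverse.
A_U_to_T = np.linalg.solve(T, U)

v_U = np.array([3.0, -2.0])
v_T = A_U_to_T @ v_U
assert np.allclose(U @ v_U, T @ v_T)   # same vector, expressed in two bases
```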
Example: standard basis and chaining
It's time to solidify the ideas of the last two sections with a concrete example. We'll use our familiar bases $U$ and $W$ from the previous example, along with the standard basis $E$ for $\mathbb{R}^2$. Previously, we transformed a vector $v$ from $U$ to $W$ and vice versa using the change of basis matrices between these bases. This time, let's do it by chaining via the standard basis.
We'll pick the same vector $v$. Formally, the components of $v$ relative to the standard basis are just its entries written as a column - this is $[v]_E$.
In the last example we've already computed the components of $v$ relative to $U$ and $W$: $[v]_U$ and $[v]_W$.
Previously, one was computed from the other using the "direct" basis change matrices from $U$ to $W$ and vice versa. Now we can use chaining via the standard basis to achieve the same result. For example, we know that:

$$A_{W \to U} = A_{E \to U} A_{W \to E} = (A_{U \to E})^{-1} A_{W \to E}$$
Finding the change of basis matrices from some basis to $E$ is just laying out the basis vectors as columns, so we immediately know that:

$$A_{U \to E} = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \qquad A_{W \to E} = \begin{pmatrix} w_1 & w_2 \end{pmatrix}$$
The change of basis matrix from $E$ to some basis is the inverse of the matrix going the other way, so by inverting the above matrices we find $A_{E \to U}$ and $A_{E \to W}$.
Now we have all we need to find $[v]_U$ from $[v]_W$:

$$[v]_U = A_{E \to U} A_{W \to E} [v]_W = (A_{U \to E})^{-1} A_{W \to E} [v]_W$$
The other direction can be done similarly.
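To close the loop, here is a hedged numpy sketch of this chaining example with illustrative numbers (the bases and vector are made up for demonstration): we route $[v]_W$ through the standard basis to get $[v]_U$, and confirm it matches the direct $A_{W \to U}$ route.

```python
import numpy as np

# Illustrative bases (columns) and a vector in standard coordinates.
U = np.column_stack([[2.0, 1.0], [1.0, -1.0]])
W = np.column_stack([[1.0, 1.0], [0.0, 2.0]])
v_E = np.array([3.0, 4.0])

v_U = np.linalg.solve(U, v_E)      # [v]_U = A_{E->U} [v]_E
v_W = np.linalg.solve(W, v_E)      # [v]_W = A_{E->W} [v]_E

# Chain W -> E -> U: first back to standard coordinates, then into U.
chained = np.linalg.solve(U, W @ v_W)
assert np.allclose(chained, v_U)

# It agrees with the direct change of basis matrix A_{W->U}.
A_W_to_U = np.linalg.solve(U, W)
assert np.allclose(A_W_to_U @ v_W, v_U)
```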
[1] Introduction to Linear Algebra, 4th edition, section 7.2
[2] Why is this list unique? Because given a basis $B$ for a vector space $V$, every $v \in V$ can be expressed uniquely as a linear combination of the vectors in $B$. The proof for this is very simple - just assume there are two different ways to express $v$ - two alternative sets of components. Subtract one from the other and use the linear independence of the basis vectors to conclude that the two ways must be the same one.
[3] The matrix here has the basis vectors laid out in its columns. Since the basis vectors are independent, the matrix is invertible. In our small example, the matrix equation we're looking to solve is $\begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = v$.
[4] The example converts from the standard basis to some other basis, but converting from a non-standard basis to another requires exactly the same steps: we try to find coefficients such that a combination of some set of basis vectors adds up to some components in another basis.
[5] For square matrices $A$ and $B$, if $AB = I$ then also $BA = I$.