Change of basis in Linear Algebra



Knowing how to convert a vector to a different basis has many practical applications. Gilbert Strang has a nice quote about the importance of basis changes in his book [1] (emphasis mine):

The standard basis vectors for $\mathbb{R}^n$ and $\mathbb{R}^m$ are the columns of $I$. That choice leads to a standard matrix, and $T(v) = Av$ in the normal way. But these spaces also have other bases, so the same $T$ is represented by other matrices. A main theme of linear algebra is to choose the bases that give the best matrix for $T$.

This should serve as a good motivation, but I'll leave the applications for future posts; in this one, I will focus on the mechanics of basis change, starting from first principles.

The basis and vector components

A basis of a vector space $V$ is a set of vectors in $V$ that is linearly independent and spans $V$. An ordered basis is a list, rather than a set, meaning that the order of the vectors in an ordered basis matters. This is important with respect to the topics discussed in this post.

Let's now define components. If $U = (u_1, u_2, \ldots, u_n)$ is an ordered basis for $V$ and $v$ is a vector in $V$, then there's a unique [2] list of scalars $c_1, c_2, \ldots, c_n$ such that:

$$v = c_1 u_1 + c_2 u_2 + \cdots + c_n u_n$$

These are called the components of $v$ relative to the ordered basis $U$. We'll introduce a useful piece of notation here: collect the components $c_1, c_2, \ldots, c_n$ into a column vector and call it $[v]_U$: this is the component vector of $v$ relative to the basis $U$.

Example: finding a component vector

Let's use $\mathbb{R}^2$ as an example. <math> is an ordered basis for $\mathbb{R}^2$ (since the two vectors in it are independent). Say we have <math>. What is $[v]_U$? We'll need to solve the system of equations:

<math>

In the 2-D case this is trivial - the solution is <math> and <math>. Therefore:

<math>

In the more general case of $\mathbb{R}^n$, this is akin to solving a linear system of n equations with n variables. Since the basis vectors are, by definition, linearly independent, solving the system is simply inverting a matrix [3].
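
To make the mechanics concrete, here is a small worked illustration with made-up numbers (these are not the vectors from the example above): take the basis $u_1 = (1, 1)$, $u_2 = (1, -1)$ and the vector $v = (3, 1)$, all written in standard coordinates. Finding $[v]_U$ means solving:

$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

Indeed, $2 \cdot (1, 1) + 1 \cdot (1, -1) = (3, 1)$, so $[v]_U = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$. (I'll reuse these made-up vectors in a few more illustrations below.)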

Change of basis matrix

Now comes the key part of the post. Say we have two different ordered bases for the same vector space $V$: $U = (u_1, u_2, \ldots, u_n)$ and $W = (w_1, w_2, \ldots, w_n)$. For some vector $v$ in $V$, we can find $[v]_U$ and $[v]_W$. How are these two related?

Surely, given $[v]_U$ we can find its coefficients in basis $W$ the same way as we did in the example above [4]. It involves solving a linear system of $n$ equations. We'll have to redo this operation for every vector $v$ we want to convert. Is there a simpler way?

Luckily for science, yes. The key here is to find how the basis vectors of $U$ look in basis $W$. In other words, we have to find $[u_1]_W$, $[u_2]_W$ and so on up to $[u_n]_W$.

Let's say we do that and find the coefficients to be $a_{ij}$ such that:

$$\begin{aligned} u_1 &= a_{11} w_1 + a_{21} w_2 + \cdots + a_{n1} w_n \\ u_2 &= a_{12} w_1 + a_{22} w_2 + \cdots + a_{n2} w_n \\ &\;\,\vdots \\ u_n &= a_{1n} w_1 + a_{2n} w_2 + \cdots + a_{nn} w_n \end{aligned}$$

Now, given some vector $v$, suppose its components in basis $U$ are:

$$[v]_U = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$$

Let's try to figure out how it looks in basis $W$. The above equation (by definition of components) is equivalent to:

$$v = c_1 u_1 + c_2 u_2 + \cdots + c_n u_n$$

Substituting the expansion of the $u_i$s in basis $W$, we get:

$$v = c_1 (a_{11} w_1 + a_{21} w_2 + \cdots + a_{n1} w_n) + c_2 (a_{12} w_1 + a_{22} w_2 + \cdots + a_{n2} w_n) + \cdots + c_n (a_{1n} w_1 + a_{2n} w_2 + \cdots + a_{nn} w_n)$$

Reordering a bit to find the multiplier of each $w_i$:

$$v = (a_{11} c_1 + a_{12} c_2 + \cdots + a_{1n} c_n) w_1 + (a_{21} c_1 + a_{22} c_2 + \cdots + a_{2n} c_n) w_2 + \cdots + (a_{n1} c_1 + a_{n2} c_2 + \cdots + a_{nn} c_n) w_n$$

By our definition of vector components, this equation is equivalent to:

$$[v]_W = \begin{pmatrix} a_{11} c_1 + a_{12} c_2 + \cdots + a_{1n} c_n \\ a_{21} c_1 + a_{22} c_2 + \cdots + a_{2n} c_n \\ \vdots \\ a_{n1} c_1 + a_{n2} c_2 + \cdots + a_{nn} c_n \end{pmatrix}$$

Now we're in vector notation again, so we can decompose the column vector on the right-hand side into:

$$[v]_W = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$$

This is a matrix times a vector. The vector on the right is $[v]_U$. The matrix should look familiar too, because it consists of those $a_{ij}$ coefficients we've defined above. In fact, this matrix just represents the basis vectors of $U$ expressed in basis $W$. Let's call this matrix $A_{U \to W}$ - the change of basis matrix from $U$ to $W$. It has $[u_1]_W$ to $[u_n]_W$ laid out in its columns:

$$A_{U \to W} = \begin{pmatrix} [u_1]_W & [u_2]_W & \cdots & [u_n]_W \end{pmatrix}$$

So we have:

$$[v]_W = A_{U \to W} \, [v]_U$$

To recap, given two bases $U$ and $W$, we can spend some effort to compute the "change of basis" matrix $A_{U \to W}$, but then we can easily convert any vector in basis $U$ to basis $W$ if we simply left-multiply it by this matrix.
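
Continuing the made-up illustration from earlier (different numbers from the worked example above): with $u_1 = (1, 1)$, $u_2 = (1, -1)$ and a second basis $w_1 = (1, 0)$, $w_2 = (1, 1)$, expressing the $u$s in basis $W$ gives $[u_1]_W = (0, 1)$ and $[u_2]_W = (2, -1)$. Therefore:

$$A_{U \to W} = \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix} \qquad [v]_W = A_{U \to W} \, [v]_U = \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

And indeed $2 w_1 + 1 w_2 = (3, 1) = v$, as expected.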

A reasonable question to ask at this point is - what about converting from $W$ to $U$? Well, since the computations above are completely generic and don't special-case either basis, we can just flip the roles of $U$ and $W$ and get another change of basis matrix, $A_{W \to U}$ - it converts vectors in basis $W$ to vectors in basis $U$ as follows:

$$[v]_U = A_{W \to U} \, [v]_W$$

And this matrix is:

$$A_{W \to U} = \begin{pmatrix} [w_1]_U & [w_2]_U & \cdots & [w_n]_U \end{pmatrix}$$

We will soon see that the two change of basis matrices are intimately related; but first, an example.

Example: changing bases with matrices

Let's work through another concrete example in $\mathbb{R}^2$. We've used the basis $U$ before; let's use it again, and also add the basis <math>. We've already seen that for <math> we have:

<math>

Similarly, we can solve a set of two equations to find $[v]_W$:

<math>

OK, let's see how a change of basis matrix can be used to easily compute one given the other. First, to find $A_{U \to W}$ we'll need $[u_1]_W$ and $[u_2]_W$. We know how to do that. The result is:

<math>

Now we can verify that given $[v]_U$ and $A_{U \to W}$, we can easily find $[v]_W$:

<math>

Indeed, it checks out! Let's also verify the other direction. To find $A_{W \to U}$ we'll need $[w_1]_U$ and $[w_2]_U$:

<math>

And now to find $[v]_U$:

<math>

Checks out again! If you have a keen eye, or have recently spent some time solving linear algebra problems, you'll notice something interesting about the two basis change matrices used in this example. One is the inverse of the other! Is this some sort of coincidence? No - in fact, it's always true, and we can prove it.

The inverse of a change of basis matrix

We've derived the change of basis matrix from $U$ to $W$ to perform the conversion:

$$[v]_W = A_{U \to W} \, [v]_U$$

Left-multiplying this equation by $A_{W \to U}$:

$$A_{W \to U} \, [v]_W = A_{W \to U} A_{U \to W} \, [v]_U$$

But the left-hand side is now, by our earlier definition, equal to $[v]_U$, so we get:

$$[v]_U = A_{W \to U} A_{U \to W} \, [v]_U$$

Since this is true for every vector $[v]_U$, it must be that:

$$A_{W \to U} A_{U \to W} = I$$

From this, we can infer that $A_{U \to W} = (A_{W \to U})^{-1}$ and vice versa [5].
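
As a quick sanity check with the made-up bases from the illustrations above (not the example's numbers): there we found $A_{U \to W} = \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix}$, and expressing $w_1, w_2$ in basis $U$ gives $[w_1]_U = (1/2, 1/2)$ and $[w_2]_U = (1, 0)$, so:

$$A_{W \to U} A_{U \to W} = \begin{pmatrix} 1/2 & 1 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$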

Changing to and from the standard basis

You may have noticed that in the examples above, we short-circuited a little bit of rigor by making up a vector (such as <math>) without explicitly specifying the basis its components are relative to. This is because we're so used to working with the "standard basis" we often forget it's there.

The standard basis (let's call it $E$) consists of unit vectors pointing in the directions of the axes of a Cartesian coordinate system. For $\mathbb{R}^2$ we have the basis vectors:

$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

And more generally in $\mathbb{R}^n$ we have an ordered list of $n$ vectors $e_1, e_2, \ldots, e_n$ where $e_i$ has 1 in the $i$th position and zeros elsewhere.

So when we say <math>, what we actually mean is:

<math>

The standard basis is so ingrained in our intuition of vectors that we usually neglect to mention it. This is fine, as long as we're only dealing with the standard basis. Once change of basis is required, it's worthwhile to stick to a more consistent notation to avoid confusion. Moreover, it's often useful to change a vector's basis to or from the standard one. Let's see how that works. Recall how we use the change of basis matrix:

$$[v]_W = A_{U \to W} \, [v]_U$$

Replacing the arbitrary basis $W$ by the standard basis $E$ in this equation, we get:

$$[v]_E = A_{U \to E} \, [v]_U$$

And $A_{U \to E}$ is the matrix with $[u_1]_E$ to $[u_n]_E$ in its columns. But wait, these are just the basis vectors of $U$! So finding the matrix $A_{U \to E}$ for any given basis $U$ is trivial - simply line up $U$'s basis vectors as columns in their order to get a matrix. This means that any square, invertible matrix can be seen as a change of basis matrix from the basis spelled out in its columns to the standard basis. This is a natural consequence of the fact that multiplying a matrix by a vector linearly combines the matrix's columns.

OK, so we know how to find $[v]_E$ given $[v]_U$. What about the other way around? We'll need $A_{E \to U}$ for that, and we know that:

$$A_{E \to U} = (A_{U \to E})^{-1}$$

Therefore:

$$[v]_U = A_{E \to U} \, [v]_E = (A_{U \to E})^{-1} \, [v]_E$$
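
For instance, with the made-up basis from the earlier illustrations, lining up $u_1 = (1, 1)$ and $u_2 = (1, -1)$ as columns gives $A_{U \to E}$, and inverting it gives $A_{E \to U}$:

$$A_{U \to E} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad A_{E \to U} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}$$

Applying $A_{E \to U}$ to $[v]_E = (3, 1)$ recovers $[v]_U = (2, 1)$, just as we computed directly before.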

Chaining basis changes

What happens if we change a vector from one basis to another, and then change the resulting vector to yet another basis? I mean, for bases $U$, $W$ and $T$ and some arbitrary vector $v$, we'll do:

$$A_{W \to T} \, (A_{U \to W} \, [v]_U)$$

This is simply applying the change of basis by matrix multiplication equation, twice:

$$A_{W \to T} \, (A_{U \to W} \, [v]_U) = A_{W \to T} \, [v]_W = [v]_T$$

What this means is that changes of basis can be chained, which isn't surprising given their linear nature. It also means that we've just found $A_{U \to T}$, since we found how to transform $[v]_U$ to $[v]_T$ (using an intermediary basis $W$).

$$A_{U \to T} = A_{W \to T} \, A_{U \to W}$$

Finally, let's say that the intermediary basis is not just some arbitrary $W$, but the standard basis $E$. So we have:

$$A_{U \to T} = A_{E \to T} \, A_{U \to E} = (A_{T \to E})^{-1} \, A_{U \to E}$$

We prefer the last form, since finding $A_{U \to E}$ for any basis $U$ is, as we've seen above, trivial.
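
Here's a minimal C++ sketch of this recipe (my own illustration, reusing the made-up bases from above; the tiny 2x2 matrix helpers are ad-hoc, not from any particular library). It builds $A_{U \to E}$ and $A_{T \to E}$ by laying out basis vectors as columns, computes $A_{U \to T} = (A_{T \to E})^{-1} A_{U \to E}$, and applies it to a component vector:

#include <array>
#include <cstdio>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;  // Mat2[i][j] is row i, column j.

// Change of basis matrix from basis B to the standard basis E: simply
// lay out B's basis vectors (given in standard coordinates) as columns.
Mat2 to_standard(const Vec2& b1, const Vec2& b2) {
  Mat2 m = {{{b1[0], b2[0]}, {b1[1], b2[1]}}};
  return m;
}

// Inverse of a 2x2 matrix; the determinant is non-zero whenever the
// columns form a basis.
Mat2 inverse(const Mat2& m) {
  double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
  Mat2 inv = {{{m[1][1] / det, -m[0][1] / det},
               {-m[1][0] / det, m[0][0] / det}}};
  return inv;
}

Mat2 multiply(const Mat2& a, const Mat2& b) {
  Mat2 r{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k) r[i][j] += a[i][k] * b[k][j];
  return r;
}

Vec2 apply(const Mat2& m, const Vec2& v) {
  return {m[0][0] * v[0] + m[0][1] * v[1],
          m[1][0] * v[0] + m[1][1] * v[1]};
}

int main() {
  // Made-up bases U and T, given in standard coordinates.
  Mat2 A_U_to_E = to_standard({1.0, 1.0}, {1.0, -1.0});
  Mat2 A_T_to_E = to_standard({1.0, 0.0}, {1.0, 1.0});

  // Chain through the standard basis: A_{U->T} = (A_{T->E})^-1 * A_{U->E}.
  Mat2 A_U_to_T = multiply(inverse(A_T_to_E), A_U_to_E);

  Vec2 v_U = {2.0, 1.0};  // [v]_U for v = (3, 1) in standard coordinates.
  Vec2 v_T = apply(A_U_to_T, v_U);
  std::printf("[v]_T = (%g, %g)\n", v_T[0], v_T[1]);
  return 0;
}

With these numbers, $T$ is the same basis called $W$ in the earlier illustrations, so the program prints $[v]_T = (2, 1)$, matching the hand computation.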

Example: standard basis and chaining

It's time to solidify the ideas of the last two sections with a concrete example. We'll use our familiar bases $U$ and $W$ from the previous example, along with the standard basis for $\mathbb{R}^2$. Previously, we transformed a vector $v$ from $U$ to $W$ and vice-versa using the change of basis matrices between these bases. This time, let's do it by chaining via the standard basis.

We'll pick <math>. Formally, the components of <math> relative to the standard basis are:

<math>

In the last example we've already computed the components of $v$ relative to $U$ and $W$:

<math>

Previously, one was computed from the other using the "direct" basis change matrices from $U$ to $W$ and vice versa. Now we can use chaining via the standard basis to achieve the same result. For example, we know that:

$$[v]_W = A_{E \to W} \, A_{U \to E} \, [v]_U$$

Finding the change of basis matrices from some basis to $E$ is just laying out the basis vectors as columns, so we immediately know that:

<math>

The change of basis matrix from $E$ to some basis is the inverse, so by inverting the above matrices we find:

<math>

Now we have all we need to find $[v]_W$ from $[v]_U$:

<math>

The other direction can be done similarly.


[1]Introduction to Linear Algebra, 4th edition, section 7.2
[2]Why is this list unique? Because given a basis $U$ for a vector space $V$, every $v$ in $V$ can be expressed uniquely as a linear combination of the vectors in $U$. The proof for this is very simple - just assume there are two different ways to express $v$ - two alternative sets of components. Subtract one from the other and use linear independence of the basis vectors to conclude that the two ways must be the same one.
[3]The matrix here has the basis vectors laid out in its columns. Since the basis vectors are independent, the matrix is invertible. In our small example, the matrix equation we're looking to solve is:
<math>
[4]The example converts from the standard basis to some other basis, but converting from a non-standard basis to another requires exactly the same steps: we try to find coefficients such that a combination of some set of basis vectors adds up to some components in another basis.
[5]For square matrices $A$ and $B$, if $AB = I$ then also $BA = I$.

Programmatic access to the call stack in C++



Sometimes when working on a large project, I find it useful to figure out all the places from which some function or method is called. Moreover, more often than not I don't just want the immediate caller, but the whole call stack. This is most useful in two scenarios - when debugging and when trying to figure out how some code works.

One possible solution is to use a debugger - run the program within a debugger, place a breakpoint in the interesting place, and examine the call stack when stopped. While this works and can sometimes be very useful, I personally prefer a more programmatic approach. I want to change the code in a way that will print out the call stack in every place I find interesting. Then I can use grepping and more sophisticated tools to analyze the call logs and thus gain a better understanding of the workings of some piece of code.

In this post, I want to present a relatively simple method to do this. It's aimed mainly at Linux, but should work with little modification on other Unixes (including OS X).

Obtaining the backtrace - libunwind

I'm aware of three reasonably well-known methods of accessing the call stack programmatically:

  1. The gcc builtin function __builtin_return_address: a very crude, low-level approach. This obtains the return address of the function on each frame on the stack. Note: just the address, not the function name. So extra processing is required to obtain the function name.
  2. glibc's backtrace and backtrace_symbols: can obtain the actual symbol names for the functions on the call stack (see the short sketch right after this list).
  3. libunwind
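
To show what option 2 looks like in practice, here is a minimal sketch using glibc's backtrace and backtrace_symbols (note that getting meaningful names for non-exported functions generally requires linking with -rdynamic, so that the symbols land in the dynamic symbol table):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

// Print the current call stack using glibc's backtrace facilities.
void print_backtrace(void) {
  void* addrs[64];
  int n = backtrace(addrs, 64);

  // Translate the addresses into strings describing them symbolically.
  char** symbols = backtrace_symbols(addrs, n);
  if (symbols == NULL) {
    perror("backtrace_symbols");
    return;
  }
  for (int i = 0; i < n; ++i) {
    printf("%s\n", symbols[i]);
  }
  free(symbols);
}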

Of the three, I strongly prefer libunwind, as it's the most modern, widespread and portable solution. It's also more flexible than backtrace, being able to provide extra information such as values of CPU registers at each stack frame.

Moreover, in the zoo of system programming, libunwind is the closest to the "official word" you can get these days. For example, gcc can use libunwind for implementing zero-cost C++ exceptions (which requires stack unwinding when an exception is actually thrown) [1]. LLVM also has a re-implementation of the libunwind interface in libc++, which is used for unwinding in LLVM toolchains based on this library.

Code sample

Here's a complete code sample for using libunwind to obtain the backtrace from an arbitrary point in the execution of a program. Refer to the libunwind documentation for more details about the API functions invoked here:

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

// Call this function to get a backtrace.
void backtrace() {
  unw_cursor_t cursor;
  unw_context_t context;

  // Initialize cursor to current frame for local unwinding.
  unw_getcontext(&context);
  unw_init_local(&cursor, &context);

  // Unwind frames one by one, going up the frame stack.
  while (unw_step(&cursor) > 0) {
    unw_word_t offset, pc;
    unw_get_reg(&cursor, UNW_REG_IP, &pc);
    if (pc == 0) {
      break;
    }
    printf("0x%lx:", pc);

    char sym[256];
    if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
      printf(" (%s+0x%lx)\n", sym, offset);
    } else {
      printf(" -- error: unable to obtain symbol name for this frame\n");
    }
  }
}

void foo() {
  backtrace(); // <-------- backtrace here!
}

void bar() {
  foo();
}

int main(int argc, char **argv) {
  bar();

  return 0;
}

libunwind is easy to install from source or as a package. I just built it from source with the usual configure, make and make install sequence and placed it into /usr/local/lib.

Once you have libunwind installed in a place the compiler can find [2], compile the code snippet with:

gcc -o libunwind_backtrace -Wall -g libunwind_backtrace.c -lunwind

Finally, run:

$ LD_LIBRARY_PATH=/usr/local/lib ./libunwind_backtrace
0x400958: (foo+0xe)
0x400968: (bar+0xe)
0x400983: (main+0x19)
0x7f6046b99ec5: (__libc_start_main+0xf5)
0x400779: (_start+0x29)

So we get the complete call stack at the point where backtrace is called. We can obtain the function symbol names and the address of the instruction where the call was made (more precisely, the return address which is the next instruction).

Sometimes, however, we want not only the caller's name, but also the call location (source file name + line number). This is useful when one function calls another from multiple locations and we want to pinpoint which one is actually part of a given call stack. libunwind gives us the call address, but nothing beyond. Fortunately, it's all in the DWARF information of the binary, and given the address we can extract the exact call location in a number of ways. The simplest is probably to call addr2line:

$ addr2line 0x400968 -e libunwind_backtrace
libunwind_backtrace.c:37

We pass the PC address to the left of the bar frame to addr2line and get the file name and line number.

Alternatively, we can use the dwarf_decode_address example from pyelftools to obtain the same information:

$ python <path>/dwarf_decode_address.py 0x400968 libunwind_backtrace
Processing file: libunwind_backtrace
Function: bar
File: libunwind_backtrace.c
Line: 37

If printing out the exact locations is important for you during the backtrace call, you can also go fully programmatic by using libdwarf to open the executable and read this information from it, in the backtrace call. There's a section and a code sample about a very similar task in my blog post on debuggers.
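
If spawning a child process is acceptable, a simpler (if hacky) alternative is to shell out to addr2line from within the backtrace loop itself. The helper below is a hypothetical sketch, not part of the sample above; it assumes a Linux system (it reads /proc/self/exe), that addr2line is installed, and a non-PIE executable so that runtime addresses match what addr2line expects. Call it with the pc value obtained from unw_get_reg:

#include <stdio.h>

// Resolve an address to "file:line" by invoking addr2line on our own
// executable and printing its output. pc is the address obtained from
// unw_get_reg(&cursor, UNW_REG_IP, &pc) in the backtrace loop.
void print_source_location(unsigned long pc) {
  char cmd[256];
  snprintf(cmd, sizeof(cmd), "addr2line -e /proc/self/exe 0x%lx", pc);

  FILE* pipe = popen(cmd, "r");
  if (pipe == NULL) {
    perror("popen");
    return;
  }
  char location[512];
  if (fgets(location, sizeof(location), pipe) != NULL) {
    printf("    at %s", location);  // addr2line's output ends with '\n'.
  }
  pclose(pipe);
}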

C++ and mangled function names

The code sample above works well, but these days one is most likely writing C++ code and not C, so there's a slight problem. In C++, names of functions and methods are mangled. This is essential to make C++ features like function overloading, namespaces and templates work. Let's say the actual call sequence is:

namespace ns {

template <typename T, typename U>
void foo(T t, U u) {
  backtrace(); // <-------- backtrace here!
}

}  // namespace ns

template <typename T>
struct Klass {
  T t;
  void bar() {
    ns::foo(t, true);
  }
};

int main(int argc, char** argv) {
  Klass<double> k;
  k.bar();

  return 0;
}

The backtrace printed will then be:

0x400b3d: (_ZN2ns3fooIdbEEvT_T0_+0x17)
0x400b24: (_ZN5KlassIdE3barEv+0x26)
0x400af6: (main+0x1b)
0x7fc02c0c4ec5: (__libc_start_main+0xf5)
0x4008b9: (_start+0x29)

Oops, that's not nice. While some seasoned C++ veterans can usually make sense of simple mangled names (kinda like system programmers who can read text from hex ASCII), when the code is heavily templated this can get ugly very quickly.

One solution is to use a command-line tool - c++filt:

$ c++filt _ZN2ns3fooIdbEEvT_T0_
void ns::foo<double, bool>(double, bool)

However, it would be nicer if our backtrace dumper would print the demangled name directly. Luckily, this is pretty easy to do, using the cxxabi.h API that's part of libstdc++ (more precisely, libsupc++). libc++ also provides it in the low-level libc++abi. All we need to do is call abi::__cxa_demangle. Here's a complete example:

#define UNW_LOCAL_ONLY
#include <cxxabi.h>
#include <libunwind.h>
#include <cstdio>
#include <cstdlib>

void backtrace() {
  unw_cursor_t cursor;
  unw_context_t context;

  // Initialize cursor to current frame for local unwinding.
  unw_getcontext(&context);
  unw_init_local(&cursor, &context);

  // Unwind frames one by one, going up the frame stack.
  while (unw_step(&cursor) > 0) {
    unw_word_t offset, pc;
    unw_get_reg(&cursor, UNW_REG_IP, &pc);
    if (pc == 0) {
      break;
    }
    std::printf("0x%lx:", pc);

    char sym[256];
    if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
      char* nameptr = sym;
      int status;
      char* demangled = abi::__cxa_demangle(sym, nullptr, nullptr, &status);
      if (status == 0) {
        nameptr = demangled;
      }
      std::printf(" (%s+0x%lx)\n", nameptr, offset);
      std::free(demangled);
    } else {
      std::printf(" -- error: unable to obtain symbol name for this frame\n");
    }
  }
}

namespace ns {

template <typename T, typename U>
void foo(T t, U u) {
  backtrace(); // <-------- backtrace here!
}

}  // namespace ns

template <typename T>
struct Klass {
  T t;
  void bar() {
    ns::foo(t, true);
  }
};

int main(int argc, char** argv) {
  Klass<double> k;
  k.bar();

  return 0;
}

This time, the backtrace is printed with all names nicely demangled:

$ LD_LIBRARY_PATH=/usr/local/lib ./libunwind_backtrace_demangle
0x400b59: (void ns::foo<double, bool>(double, bool)+0x17)
0x400b40: (Klass<double>::bar()+0x26)
0x400b12: (main+0x1b)
0x7f6337475ec5: (__libc_start_main+0xf5)
0x4008b9: (_start+0x29)

[1]AFAIK, gcc indeed uses libunwind by default on some architectures, though it uses an alternative unwinder on others. Please correct me if I'm missing something here.
[2]If your libunwind is in a non-standard location, you'll need to provide additional -I and -L flags.

Summary of reading: April - June 2015



  • "Introduction to Linear Algebra" by Gilbert Strang - This book is the lecture notes for the popular MIT OCW 18.06 course on Linear Algebra by the author. Note that I say "lecture notes", which is a better way to describe this book than "a textbook". It follows the lectures very closely, and the writing is the same conversational style Prof. Strang uses in his lectures. The book (and lectures) cover basic linear algebra really well and provide a useful glimpse into multiple cool applications of the math studied. On the flip side, the book could be more rigorous - Prof. Strang clearly doesn't believe in the usefulness of proofs and rigor in this first introduction to the subject, and relies on example and intuition. While there's no doubt that he's a gifted teacher, some amount of proving would be a good supplement.
  • "Biology: The Science of Life" by Stephen Novicki (audio course) - My interest in biology has mainly circled around genetics and evolution, so I wanted to expand my horizons a bit. This is a general "introduction to biology" course that encompasses a much broader view of the subject. Sure, there's a bunch of genetics in it, but also a lot of stuff about the structure of cells and organisms, as well as whole ecosystems and populations. This is also the longest course/audiobook I've head to date (36 hours).
  • "How To Prove It" by Daniel Velleman - link to full review.
  • "The Four Pillars of Investing" by William J. Bernstein - Somewhat similar to "A Random Walk Down Wall Street" - trashing stock picking, financial advisors, brokers and the financial press, recommending diversified market investment (ETFs, market indexing funds, etc). The first part of the book is a pretty good analysis of risk vs. return, both historically and conceptually. Another thing I liked is the analysis of investment strategies based on tax-protected (IRAs) or taxable accounts. For example, rebalancing a portfolio makes sense for tax-protected accounts, but not necessarily for taxables because of the capital gains taxes incurred. Conversely it's curious that the author (having written the book shortly after the dot-com bubble crash) is bearish on the stock market in the foreseeable future, something that didn't materialize (with the strong bull market after the 2008 crash). All in all a very good read on investing - there's certainly some depth to the book, so it may be worthwhile to revisit it in the future.
  • "Starship Troopers" by Robert Heinlein - Very nice science fiction about a mobile infantry fighter in a futuristic interstellar setting.
  • "1-2-3 Magic: Effective Discipline for Children 2–12" by Thomas W. Phelan - I very rarely read parenting books, but this one had such stellar reviews that I decided to try. Besides, I've noticed other parents "counting" their kids and was curious what it's all about. I wasn't disappointed - the book is excellent. Not only the main thesis - which is the why and how of counting, but also some "secondary" chapters on developing healthy relations with your children, are great. Just the chapter that talks about listening skills and overparenting is worth reading the book for, in my opinion, in addition to all the other great material. One small criticism is that the handling of "start" behaviors, though addressed in depth, felt somewhat weak compared to the other parts. But it may be because there's just no good solution, I'm not sure.
  • "The Deadline: A Novel About Project Management" by Tom DeMarco - an unusual book - hybrid between a weird pseudo-scifi novel and a "best guidelines for project management" book. There are some good ideas w.r.t. project management here, but also some weird ones. The preaching is very waterfall-oriented - meticulous collection of requirements, followed by very detailed design, followed by a very short period of coding. Leaving only 1/6th of the duration of the project for coding, because the design is so spotless? That's kinda weird. Getting rid of code inspections/reviews because, again, the design is perfect? You've got to be kidding me. Maybe this part of the book is a joke, who knows. Also, the super-detailed models they build to very accurately describe the project and all kinds of processes involved in developing it... Perhaps this would work for some very boilerplate projects, but I don't see its general applicability. This criticism notwithstanding, overall the book is pretty funny and not a bad way to teach some interesting lessons.

Re-reads:

  • "The soul of a new machine" by Tracy Kidder
  • "The rational optimist" by Matt Ridley