Visualizing matrix multiplication as a linear combination

When multiplying two matrices, there's a manual procedure we all know how to go through. Each result cell is computed separately as the dot-product of a row in the first matrix with a column in the second matrix. While it's the easiest way to compute the result manually, it may obscure a very interesting property of the operation: each column of the product AB is a linear combination of A's columns, with coefficients taken from the corresponding column of B. Another way to look at it is that each row of AB is a linear combination of the rows of B, with coefficients taken from the corresponding row of A.

In this quick post I want to show a colorful visualization that will make this easier to grasp.

Right-multiplication: combination of columns

Let's begin by looking at the right-multiplication of matrix X by a column vector:


Representing the columns of X by colorful boxes will help visualize this:

[Figure: Matrix by vector]

Sticking the white box with a in it to a vector just means: multiply this vector by the scalar a. The result is another column vector - a linear combination of X's columns, with a, b, c as the coefficients.
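In symbols, here's a sketch of the same statement, assuming X has three columns. The names x_1, x_2, x_3 for those columns are introduced here purely for illustration:

```latex
X \begin{pmatrix} a \\ b \\ c \end{pmatrix}
  = \begin{pmatrix} | & | & | \\ x_1 & x_2 & x_3 \\ | & | & | \end{pmatrix}
    \begin{pmatrix} a \\ b \\ c \end{pmatrix}
  = a\,x_1 + b\,x_2 + c\,x_3
```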

Right-multiplying X by a matrix is more of the same. Each resulting column is a different linear combination of X's columns:



[Figure: Matrix by matrix]

If you look hard at the equation above and squint a bit, you can recognize this column-combination property by examining each column of the result matrix.
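If squinting doesn't help, a few lines of plain Python can check the column-combination property numerically. This is only a sketch: the helper names (matmul, column, combine) and the sample matrices are made up for illustration.

```python
def matmul(A, B):
    """Textbook product: each cell is a row of A dotted with a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def column(M, j):
    """Return column j of matrix M as a list."""
    return [row[j] for row in M]

def combine(vectors, coeffs):
    """Linear combination: coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[1, 0],
     [2, 1],
     [0, 3]]

product = matmul(X, B)
x_columns = [column(X, k) for k in range(3)]

# Column j of X*B is the combination of X's columns weighted by column j of B.
for j in range(2):
    assert column(product, j) == combine(x_columns, column(B, j))
```

The loop passes silently, which is the point: the dot-product recipe and the column-combination view produce the same result.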

Left-multiplication: combination of rows

Now let's examine left-multiplication. Left-multiplying a matrix X by a row vector is a linear combination of X's rows:


Is represented graphically thus:

[Figure: Vector by matrix]

And left-multiplying by a matrix is the same thing repeated for every result row: it becomes the linear combination of the rows of X, with the coefficients taken from the rows of the matrix on the left. Here's the equation form:


And the graphical form:

[Figure: Matrix by matrix from the left]
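The row view can be checked the same way. Again this is a sketch in plain Python with made-up helper names and sample matrices; A plays the role of the matrix on the left.

```python
def matmul(A, B):
    """Textbook product: each cell is a row of A dotted with a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def combine(vectors, coeffs):
    """Linear combination: coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

A = [[1, 2, 0],
     [0, 1, 3]]
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

product = matmul(A, X)

# Row i of A*X is the combination of X's rows weighted by row i of A.
rows_as_combinations = [combine(X, a_row) for a_row in A]
assert product == rows_as_combinations
```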

Redirecting all kinds of stdout in Python

A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls.

But let's start with the basics.

Pure Python

The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it:

import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    print('foobar')
print('Got stdout: "{0}"'.format(f.getvalue()))

When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops.

If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own. I'll change its name slightly to avoid confusion:

from contextlib import contextmanager
import sys

@contextmanager
def stdout_redirector(stream):
    old_stdout = sys.stdout
    sys.stdout = stream
    try:
        yield
    finally:
        sys.stdout = old_stdout

So we're back in the game:

f = io.StringIO()
with stdout_redirector(f):
    print('foobar')
print('Got stdout: "{0}"'.format(f.getvalue()))

Redirecting C-level streams

Now, let's take our shiny redirector for a more challenging ride:

import ctypes
import io
import os
libc = ctypes.CDLL(None)

f = io.StringIO()
with stdout_redirector(f):
    print('foobar')
    libc.puts(b'this comes from C')
    os.system('echo and this is from echo')
print('Got stdout: "{0}"'.format(f.getvalue()))

I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is:

this comes from C
and this is from echo
Got stdout: "foobar
"

Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives?

To grasp why this didn't work, we have to first understand what sys.stdout actually is in Python.

Detour - on file descriptors and streams

This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult without it).

Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.).

File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. Here's a schematic, taken from The Linux Programming Interface:

[Figure: File descriptor diagram]

File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section.
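To make the "copy of a file descriptor" idea concrete before touching stdout, here's a small sketch that uses two temporary files instead. The variable names and the temporary files are assumptions for illustration only; os.dup2 is the Python wrapper for the dup2 system call.

```python
import os
import tempfile

# Two ordinary files, each with its own file descriptor.
fd_a, path_a = tempfile.mkstemp()
fd_b, path_b = tempfile.mkstemp()

# Make fd_b refer to the same open file as fd_a. Whatever fd_b pointed to
# before is closed as part of the call.
os.dup2(fd_a, fd_b)

# This write goes through fd_b, but lands in file A.
os.write(fd_b, b'written through fd_b')

os.close(fd_a)
os.close(fd_b)

with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
    contents_a = fa.read()
    contents_b = fb.read()

os.remove(path_a)
os.remove(path_b)
```

After this runs, contents_a holds the written bytes and contents_b is empty. Replace fd_b with descriptor 1 and you have the core of stdout redirection.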

File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets).

FILE is a fairly complex structure, but the most important things to know about it are that it holds a file descriptor to which the actual system calls are directed, and that it provides buffering, to ensure that the system call (which is expensive) is not called too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typical call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued.
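Python's own buffered file objects behave analogously, so the effect is easy to observe without writing C. The sketch below is an analogy rather than a demonstration of C's FILE itself; the 4096-byte buffer size and the temporary file are arbitrary choices for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'wb', buffering=4096) as f:
    f.write(b'ab')                                # only copied into the buffer
    size_before_flush = os.path.getsize(path)     # nothing has reached the file yet
    f.flush()                                     # now the write system call happens
    size_after_flush = os.path.getsize(path)

os.remove(path)
print(size_before_flush, size_after_flush)
```

The file stays empty until the flush, even though write has already returned.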

With this information in hand, it should be easy to understand what stdout actually is for a C program. stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on.

But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output?

Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapped in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route.
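Here's a sketch of that layering built by hand over a temporary file. The real sys.stdout is set up by the interpreter and may differ in its details; the file, the variable names and the explicit utf-8 encoding are assumptions made for the example.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()

raw = io.FileIO(fd, 'w')                               # thin wrapper around the descriptor
buffered = io.BufferedWriter(raw)                      # adds a buffer of bytes
text = io.TextIOWrapper(buffered, encoding='utf-8')    # encodes str into bytes

text.write('h\u00e9llo')    # a Unicode string goes in at the top...
text.flush()                # ...and is pushed down through all the layers

with open(path, 'rb') as check:
    on_disk = check.read()  # ...while encoded bytes come out at the bottom

text.close()                # closes the buffered and raw layers, and the fd
os.remove(path)
print(on_disk)
```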

The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout, the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor.

Redirecting with file descriptor duplication

Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]:

from contextlib import contextmanager
import ctypes
import io
import os, sys
import tempfile

libc = ctypes.CDLL(None)
c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout')

@contextmanager
def stdout_redirector(stream):
    # The original fd stdout points to. Usually 1 on POSIX systems.
    original_stdout_fd = sys.stdout.fileno()

    def _redirect_stdout(to_fd):
        """Redirect stdout to the given file descriptor."""
        # Flush the C-level buffer stdout
        libc.fflush(c_stdout)
        # Flush and close sys.stdout - also closes the file descriptor (fd)
        sys.stdout.close()
        # Make original_stdout_fd point to the same file as to_fd
        os.dup2(to_fd, original_stdout_fd)
        # Create a new sys.stdout that points to the redirected fd
        sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb'))

    # Save a copy of the original stdout fd in saved_stdout_fd
    saved_stdout_fd = os.dup(original_stdout_fd)
    try:
        # Create a temporary file and redirect stdout to it
        tfile = tempfile.TemporaryFile(mode='w+b')
        _redirect_stdout(tfile.fileno())
        # Yield to caller, then redirect stdout back to the saved fd
        yield
        _redirect_stdout(saved_stdout_fd)
        # Copy contents of temporary file to the given stream
        tfile.flush()
        tfile.seek(0, io.SEEK_SET)
        stream.write(tfile.read())
    finally:
        tfile.close()
        os.close(saved_stdout_fd)

There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation, if you're interested. The detour section should provide enough background to understand it.

Let's try this:

f = io.BytesIO()

with stdout_redirector(f):
    print('foobar')
    libc.puts(b'this comes from C')
    os.system('echo and this is from echo')
print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8')))

Gives us:

Got stdout: "and this is from echo
this comes from C
foobar
"

Success! A few things to note:

  1. The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e. between C and Python), further work is required to disable buffering on all relevant streams.
  2. You may wonder why the output of echo was redirected at all? The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking to echo, this is where its output went.
  3. We use a BytesIO here. This is because on the lowest level, the file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its in-memory understanding of Unicode, but who knows what is the right encoding for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller.
  4. The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being an io.TextIOWrapper).

Redirecting the stdout of a child process

We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured.

So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way.

The subprocess module's Swiss Army knife Popen class (which serves as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask for access to the child's stdout:

import subprocess

echo_cmd = ['echo', 'this', 'comes', 'from', 'echo']
proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE)
output = proc.communicate()[0]
print('Got stdout:', output)

The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output.
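As a sketch of the shell-style piping just mentioned, the following connects the stdout of one child to the stdin of another. Both child commands are hypothetical stand-ins, run through sys.executable so the example doesn't depend on particular Unix tools being installed.

```python
import subprocess
import sys

# Hypothetical producer and consumer, similar in spirit to `producer | sort`.
producer_cmd = [sys.executable, '-c', 'print("pear"); print("apple")']
sorter_cmd = [sys.executable, '-c',
              'import sys; sys.stdout.writelines(sorted(sys.stdin))']

producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
sorter = subprocess.Popen(sorter_cmd, stdin=producer.stdout,
                          stdout=subprocess.PIPE)
# Close our copy of the pipe's read end so the sorter is its only reader.
producer.stdout.close()

piped_output = sorter.communicate()[0]
producer.wait()
print('Got stdout:', piped_output)
```

Only the last process in the chain has its output captured by the parent; the data between the two children never passes through Python.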

If you only launch a single child process at a time and are interested in its output, there's an even simpler way:

output = subprocess.check_output(echo_cmd)
print('Got stdout:', output)

check_output will capture and return the child's standard output to you; it will also raise an exception if the child exits with a non-zero return code.
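Here's a sketch of the failure case. The child command is a hypothetical one-liner that prints a line and then exits with status 3; the exception raised is subprocess.CalledProcessError, which carries both the return code and whatever output was captured.

```python
import subprocess
import sys

# Hypothetical child: prints one line, then exits with a non-zero status.
failing_cmd = [sys.executable, '-c',
               'print("partial output"); raise SystemExit(3)']

try:
    subprocess.check_output(failing_cmd)
except subprocess.CalledProcessError as err:
    returncode = err.returncode
    captured = err.output   # bytes the child wrote before exiting
    print('Child failed with', returncode, 'after printing', captured)
```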


I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic in such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better.

Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect.
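For the pure-Python stderr case mentioned above, the standard library offers contextlib.redirect_stderr (available since Python 3.5), which mirrors redirect_stdout. A minimal sketch, with an arbitrary message:

```python
import io
import sys
from contextlib import redirect_stderr

buf = io.StringIO()
with redirect_stderr(buf):
    print('oops', file=sys.stderr)   # looks up sys.stderr at call time

captured_err = buf.getvalue()
print('Got stderr: "{0}"'.format(captured_err))
```

As with redirect_stdout, this only replaces sys.stderr and therefore catches only output written from Python code.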

[1] Do not despair. As of February 2015, a sizable share of Python programmers worldwide are in the same boat.
[2] Note the bytes object passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's Unicode strings.
[3] The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection.
[4] The approach taken here is inspired by this Stack Overflow answer.

Python version of the LLVM tutorial

The LLVM tutorial is a venerable and important part of the project's documentation. It's been there for as long as I've been using LLVM (and according to the logs, a few years before that), and it's almost always the first resource newcomers to the project are pointed to. It strikes just the right balance between simplicity and interesting content to provide an enticing introduction to LLVM. For a motivated reader, it shouldn't take more than a work day or two to go through it from start to finish, building a full compiler for a simple but "real" programming language in the process; how cool is that?

Anyway, it occurred to me that since the "official" version of the tutorial is in C++ and the only alternative checked into the tree is in OCaml, it may be interesting to re-implement the tutorial in Python. While I wouldn't write an industrial-strength compiler in Python, it's a great prototyping platform, and when thinking about compilers and languages in general, prototyping is very important. You want to try all kinds of possibilities and combinations of features quickly, to get a feel for writing code in the language before it's fully done - and Python (with LLVM) is great for that.

Enter Pykaleidoscope, a project I put on GitHub that follows the steps of the official LLVM tutorial but implements the Kaleidoscope compiler in Python, using llvmlite as the binding to LLVM.

Installing llvmlite is fairly easy - see this post if you have any issues.

While working on Pykaleidoscope, I was impressed with llvmlite's maturity and compatibility with the C++ LLVM IR APIs. I didn't run into any significant problems, except maybe lack of documentation. But documentation isn't a strong side of LLVM either, which is one of the problems the tutorial helps with. So I hope this Python version will help folks understand how to use llvmlite to build non-trivial LLVM IR in Python.