Redirecting all kinds of stdout in Python



A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls.

But let's start with the basics.

Pure Python

The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it:

import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    print('foobar')
    print(12)
print('Got stdout: "{0}"'.format(f.getvalue()))

When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops.

If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own. I'll change its name slightly to avoid confusion:

import sys
from contextlib import contextmanager

@contextmanager
def stdout_redirector(stream):
    old_stdout = sys.stdout
    sys.stdout = stream
    try:
        yield
    finally:
        sys.stdout = old_stdout

So we're back in the game:

f = io.StringIO()
with stdout_redirector(f):
    print('foobar')
    print(12)
print('Got stdout: "{0}"'.format(f.getvalue()))

Redirecting C-level streams

Now, let's take our shiny redirector for a more challenging ride:

import ctypes
import io
import os

libc = ctypes.CDLL(None)

f = io.StringIO()
with stdout_redirector(f):
    print('foobar')
    print(12)
    libc.puts(b'this comes from C')
    os.system('echo and this is from echo')
print('Got stdout: "{0}"'.format(f.getvalue()))

I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is:

this comes from C
and this is from echo
Got stdout: "foobar
12
"

Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives?

To grasp why this didn't work, we have to first understand what sys.stdout actually is in Python.

Detour - on file descriptors and streams

This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult).

Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.)

File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. Here's a schematic, taken from The Linux Programming Interface:

[Diagram: per-process file descriptor tables pointing into the system-wide open file table, from The Linux Programming Interface]

File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section.
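
To make this concrete, here's a minimal standalone sketch of dup and dup2 in action (POSIX only; log.txt is just an example file name):

import os

logfile = open('log.txt', 'wb')
saved_fd = os.dup(1)              # a copy of fd 1, still pointing at the terminal
os.dup2(logfile.fileno(), 1)      # fd 1 now points at log.txt
os.write(1, b'this goes to log.txt\n')
os.dup2(saved_fd, 1)              # restore fd 1 to the terminal
os.close(saved_fd)
logfile.close()
os.write(1, b'this goes to the terminal\n')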

File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets).

FILE is a fairly complex structure, but the most important things to know about it are that it holds a file descriptor to which the actual system calls are directed, and that it provides buffering, to ensure that the system call (which is expensive) is not invoked too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typical call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued.
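
The same trade-off is easy to observe from Python, where os.write wraps the raw system call while file objects buffer in user space. A minimal illustration (not a benchmark):

import os

# Unbuffered: each os.write issues a separate write(2) system call.
fd = os.open('out.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
for byte in (b'a', b'b', b'c'):
    os.write(fd, byte)        # one system call per byte - expensive
os.close(fd)

# Buffered: bytes accumulate in a user-space buffer; write(2) is issued
# only when the buffer fills up or the file is flushed/closed.
with open('out.bin', 'wb') as f:
    for byte in (b'a', b'b', b'c'):
        f.write(byte)         # usually just a copy into the buffer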

With this information in hand, it should be easy to understand what stdout actually is for a C program. stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on.

But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output?

Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapped in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route.
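
We can peek at these layers interactively (the output is for CPython 3 on Linux; the exact types may differ in other setups):

import sys

print(type(sys.stdout))             # <class '_io.TextIOWrapper'> - text, handles encoding
print(type(sys.stdout.buffer))      # <class '_io.BufferedWriter'> - bytes, buffers
print(type(sys.stdout.buffer.raw))  # <class '_io.FileIO'> - raw access to the fd
print(sys.stdout.fileno())          # the underlying file descriptor, usually 1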

The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout, the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor.

Redirecting with file descriptor duplication

Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]:

from contextlib import contextmanager
import ctypes
import io
import os, sys
import tempfile

libc = ctypes.CDLL(None)
c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout')

@contextmanager
def stdout_redirector(stream):
    # The original fd stdout points to. Usually 1 on POSIX systems.
    original_stdout_fd = sys.stdout.fileno()

    def _redirect_stdout(to_fd):
        """Redirect stdout to the given file descriptor."""
        # Flush the C-level buffer stdout
        libc.fflush(c_stdout)
        # Flush and close sys.stdout - also closes the file descriptor (fd)
        sys.stdout.close()
        # Make original_stdout_fd point to the same file as to_fd
        os.dup2(to_fd, original_stdout_fd)
        # Create a new sys.stdout that points to the redirected fd
        sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb'))

    # Save a copy of the original stdout fd in saved_stdout_fd
    saved_stdout_fd = os.dup(original_stdout_fd)
    try:
        # Create a temporary file and redirect stdout to it
        tfile = tempfile.TemporaryFile(mode='w+b')
        _redirect_stdout(tfile.fileno())
        # Yield to caller, then redirect stdout back to the saved fd
        yield
        _redirect_stdout(saved_stdout_fd)
        # Copy contents of temporary file to the given stream
        tfile.flush()
        tfile.seek(0, io.SEEK_SET)
        stream.write(tfile.read())
    finally:
        tfile.close()
        os.close(saved_stdout_fd)

There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation if you're interested. The detour section should provide enough background to understand them.

Let's try this:

f = io.BytesIO()

with stdout_redirector(f):
    print('foobar')
    print(12)
    libc.puts(b'this comes from C')
    os.system('echo and this is from echo')
print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8')))

Gives us:

Got stdout: "and this is from echo
this comes from C
foobar
12
"

Success! A few things to note:

  1. The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e. between C and Python), further work is required to disable or explicitly flush the buffering on all relevant streams; one direction is sketched right after this list.
  2. You may wonder why the output of echo was redirected at all. The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking echo, this is where its output went.
  3. We use a BytesIO here. This is because at the lowest level, file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its in-memory understanding of Unicode, but who knows what the right encoding is for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller.
  4. The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being an io.TextIOWrapper).
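
Regarding the first point, here's one sketch for keeping C and Python output interleaved in order: explicitly flush each layer's buffer right after writing to it, so everything hits the shared file descriptor immediately. This reuses the stdout_redirector defined above:

import ctypes
import io

libc = ctypes.CDLL(None)
c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout')

f = io.BytesIO()
with stdout_redirector(f):
    print('from Python', flush=True)   # flush Python's buffers right away
    libc.puts(b'from C')
    libc.fflush(c_stdout)              # flush the C library's stdout buffer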

Redirecting the stdout of a child process

We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured.

So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way.

The subprocess module's swiss-army-knife Popen class (which serves as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask for access to the child's stdout:

import subprocess

echo_cmd = ['echo', 'this', 'comes', 'from', 'echo']
proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE)
output = proc.communicate()[0]
print('Got stdout:', output)

The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output.
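
When you do want an actual pipeline, the same stdout parameter is used to chain processes together. Here's a sketch of a hypothetical two-stage pipeline, equivalent to echo foo bar | tr a-z A-Z in the shell:

import subprocess

p1 = subprocess.Popen(['echo', 'foo', 'bar'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['tr', 'a-z', 'A-Z'],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()             # let p1 receive SIGPIPE if p2 exits early
print(p2.communicate()[0])    # b'FOO BAR\n'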

If you only launch a single child process at a time and are interested in its output, there's an even simpler way:

output = subprocess.check_output(echo_cmd)
print('Got stdout:', output)

check_output will capture and return the child's standard output to you; it will also raise an exception if the child exits with a non-zero return code.

Conclusion

I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic in such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better.

Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect.


[1]Do not despair. As of February 2015, a sizable chunk of Python programmers worldwide are in the same boat.
[2]Note the bytes object passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's unicode strings.
[3]The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection.
[4]The approach taken here is inspired by this Stack Overflow answer.

Python version of the LLVM tutorial



The LLVM tutorial is a venerable and important part of the project's documentation. It's been there for as long as I've been using LLVM (and according to the logs, a few years before that), almost always the first resource newcomers to the project are pointed to. It strikes just the right balance between simplicity and interesting content to provide an enticing introduction to LLVM. For a motivated reader, it shouldn't take more than a work day or two to go through it from start to finish, building a full compiler for a simple but "real" programming language in the process; how cool is that?

Anyway, it occurred to me that since the "official" version of the tutorial is in C++ and the only alternative checked into the tree is in OCaml, it may be interesting to re-implement the tutorial in Python. While I wouldn't write an industrial-strength compiler in Python, it's a great prototyping platform, and when thinking about compilers and languages in general, prototyping is very important. You want to try all kinds of possibilities and combinations of features quickly, to get a feel for writing code in the language before it's fully done - and Python (with LLVM) is great for that.

Enter Pykaleidoscope, a project I put on Github that follows the steps of the official LLVM tutorial, but implementing the Kaleidoscope compiler in Python, using llvmlite as the binding to LLVM.

Installing llvmlite is fairly easy - see this post if you have any issues.

While working on Pykaleidoscope, I was impressed with llvmlite's maturity and compatibility with the C++ LLVM IR APIs. I didn't run into any significant problems, except maybe a lack of documentation. But documentation isn't LLVM's strong suit either, which is one of the problems the tutorial helps with. So I hope this Python version will help folks understand how to use llvmlite to build non-trivial LLVM IR in Python.
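
To give a taste, here's a minimal sketch using llvmlite's documented IR builder API: it constructs an IR function that adds two doubles, the same flavor of code Pykaleidoscope emits for Kaleidoscope function bodies:

import llvmlite.ir as ir

module = ir.Module(name='example')
fnty = ir.FunctionType(ir.DoubleType(), [ir.DoubleType(), ir.DoubleType()])
func = ir.Function(module, fnty, name='addtwo')
block = func.append_basic_block(name='entry')
builder = ir.IRBuilder(block)
a, b = func.args
builder.ret(builder.fadd(a, b, name='res'))
print(module)   # dumps the textual LLVM IR for the whole module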


Review of the Coursera Machine Learning course



I've just finished going through the Machine Learning course on Coursera, and this is a brief review.

The scope of the course is quite broad, covering a lot of topics from both supervised and unsupervised machine learning. "Practical advice" parts are sprinkled throughout the lectures and are extremely useful - how to evaluate your algorithm, how to know what to focus on, practical tips for improving speed and accuracy, etc.

Machine Learning is one of the oldest courses on Coursera, and is extremely polished by now. In the lectures, Andrew Ng foresees the questions that may arise and answers them beforehand. The programming exercises are very well organized, with very little detail left to chance. They're quite sizable too. I added extra work for myself by using Python instead of Octave (if you use Octave or Matlab, a lot gets set up for you by the code provided with the assignment), but that's probably not the bulk of the code. ~2200 lines of dense numerical code (in throwaway homework mode - barely any comments and few unit tests) is quite a lot.

One thing I liked somewhat less is the lack of mathematical depth. I realize the reason for this is the relatively low barrier to entry they want to set for the course; I'm sure that Stanford students taking the real course are expected to know some linear algebra and probability, so the formulae can be developed rather than just presented as an act of god. This was especially noticeable in the lecture on PCA, where Prof. Ng just dropped the eigenvector approach on us, and even went so far as to provide the exact Octave function to call, without much mathematical background or reasoning. Indeed, it is quite possible to implement the PCA part of the programming exercises without really understanding how PCA works.

Which brings me to a related topic. The course is way too easy to provide a meaningful certificate. The review quizzes are trivial and require very little thought. The programming exercises, while demanding in terms of invested time, don't go deep either, and don't deviate from the lecture materials - you basically follow very detailed instructions and transcribe formulae into code.

Overall, I enjoyed the course, and I would highly recommend it to anyone interested in getting into machine learning (or, by its new-age name, big data).

Finally, an interesting observation I had while working through the programming assignments in Python (using Numpy, Scipy and Scikit-learn). Domain-specific languages (like Matlab/Octave for general numerics and R for statistics) are often hailed as eye-openers because they come equipped with facilities tailored to scientific programming (like a convenient notation for defining constant matrices), but I don't think this is the right approach. Writing Python with Numpy et al., some basic things may take a few more keystrokes to achieve, but eventually you end up with very similar code; a small comparison is sketched below. Moreover, it's all the same Fortran-written LAPACK running under the hood anyway, so performance is the same.
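
For example, here's a hypothetical side-by-side (the Octave lines are shown as comments):

import numpy as np

# Octave:  A = [1 2; 3 4];  B = ones(2, 2);  C = A * B;
A = np.array([[1, 2], [3, 4]])
B = np.ones((2, 2))
C = A.dot(B)        # matrix product - a few more keystrokes, same idea
print(C)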

However, expressing a constant matrix in the most concise way possible is not the end of the story when writing a machine learning system. It's nice to have an actual programming language in your hands, with all that entails - community, libraries, etc. It was interesting to notice the impact of this in the spam classification assignment: it involves tons of textual preprocessing, for which Octave is not very well suited, and requires implementing some NLP algorithms, for which Octave has no libraries. In Python it was all a breeze, of course, including importing NLTK to take care of any NLP needs.

I'm not saying that Python is the best tool for every job. But for exploratory scientific programming, it seems like the strongest option out there. Numpy and its kin are very mature, fast and well-supported. Plotting with matplotlib is great (and I heard that the legendary ggplot from R now has Python bindings as well). And when you need to step outside the narrow domain of computations, you have the full programming language with all its support structure at your command.