Slow and fast methods for generating random integers in Python

The other day, while playing with a simple program involving randomness, I noticed something strange. Python's random.randint() function feels quite slow, in comparison to other randomness-generating functions. Since randint() is the canonical answer for "give me a random integer" in Python, I decided to dig deeper to understand what's going on.

This is a brief post that dives into the implementation of the random module, and discusses some alternative methods for generating pseudo-random integers.

First, a basic benchmark (Python 3.6):

$ python3 -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0523 usec per loop
$ python3 -m timeit -s 'import random' 'random.randint(0, 128)'
1000000 loops, best of 3: 1.09 usec per loop

Whoa! It's about 20x more expensive to generate a random integer in the range [0, 128] than to generate a random float in the range [0, 1). That's pretty steep, indeed.

To understand why randint() is so slow, we'll have to dig into the Python source. Let's start with random() [1]. In Lib/random.py, the exported function random is an alias to the random method of the class Random, which inherits this method directly from _Random. This is the C companion defined in Modules/_randommodule.c, and it defines its random method as follows:

static PyObject *
random_random(RandomObject *self, PyObject *Py_UNUSED(ignored))
{
    uint32_t a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}

Where getrand_int32 is defined directly above and implements a step of the Mersenne Twister PRNG. All in all, when we call random.random() in Python, the C function is directly invoked and there's not much extra work done beyond converting the result of genrand_int32 to a floating point number in a line of C.

Now let's take a look at what randint() is up to:

def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """

    return self.randrange(a, b+1)

It calls randrange, fair enough. Here it is:

def randrange(self, start, stop=None, step=1, _int=int):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.

    """

    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    istart = _int(start)
    if istart != start:
        raise ValueError("non-integer arg 1 for randrange()")
    if stop is None:
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    istop = _int(stop)
    if istop != stop:
        raise ValueError("non-integer stop for randrange()")
    width = istop - istart
    if step == 1 and width > 0:
        return istart + self._randbelow(width)
    if step == 1:
        raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

    # Non-unit step argument supplied.
    istep = _int(step)
    if istep != step:
        raise ValueError("non-integer step for randrange()")
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")

    if n <= 0:
        raise ValueError("empty range for randrange()")

    return istart + istep*self._randbelow(n)

That's quite a bit of case checking and setting up parameters before we get to the next level. There are a couple of fast-path cases (for example, when the stop parameter is not supplied, this function will be a bit faster), but overall after a bunch of checking we get to call the _randbelow() method.

By default, _randbelow() gets mapped to _randbelow_with_getrandbits():

def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Raises ValueError if n==0."

    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)          # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r

Note that it does a couple more computations and can end up invoking getrandbits() multiple times (esp. if n is far from a power of two). getrandbits() is in C, and while it also ends up invoking the PRNG getrand_int32(), it's somewhat heavier than random() and runs twice as slow.

In other words, there's a lot of Python and C code in the way to invoke the same underlying C function. Since Python is bytecode-interpreted, all of this ends up being quite a bit slower than simply calling the C function directly. A death by a thousand cuts. To be fair, randint() is also more flexible in that it can generate pseudo-random numbers of any size; that said, it's not very common to need huge pseudo-random numbers, and our tests were with small numbers anyway.

Here's a couple of experiments to help us test this hypothesis. First, let's try to hit the fast-path we've seen above in randrange, by calling randrange without a stop parameter:

$ python3 -m timeit -s 'import random' 'random.randrange(1)'
1000000 loops, best of 3: 0.784 usec per loop

As expected, the run-time is somewhat better than randint. Another experiment is to rerun the comparison in PyPy, which is a JIT compiler that should end up tracing through the Python code and generating efficient machine code that strips a lot of abstractions.

$ pypy -m timeit -s 'import random' 'random.random()'
100000000 loops, best of 3: 0.0139 usec per loop
$ pypy -m timeit -s 'import random' 'random.randint(0, 128)'
100000000 loops, best of 3: 0.0168 usec per loop

As expected, the difference between these calls in PyPy is small.

Faster methods for generating pseudo-random integers

So randint() turns out to be very slow. In most cases, no one cares; but just occasionally, we need many random numbers - so what is there to do?

One tried and true trick is just using random.random() instead, multiplying by our integer limit:

$ python3 -m timeit -s 'import random' 'int(128 * random.random())'
10000000 loops, best of 3: 0.193 usec per loop

This gives us pseudo-random integers in the range [0, 128), much faster. One word of caution: Python represents its floats in double-precision, with 53 bits of accuracy. When the limit is above 53 bits, the numbers we'll be getting using this method are not quite random - bits will be missing. This is rarely a problem because we don't usually need such huge integers, but definitely something to keep in mind [2].

Another quick way to generate pseudo-random integers is to use getrandbits() directly:

$ python3 -m timeit -s 'import random' 'random.getrandbits(7)'
10000000 loops, best of 3: 0.102 usec per loop

This method is fast but limited - it only supports ranges that are powers of two. If we want to limit the range we can't just compute a modulo - this will skew the distribution; rather we'll have to use a loop similarly to what _randbelow_with_getrandbits() does in the sample above. This will slow things down, of course.

Finally, we can turn away from the random module altogether, and use Numpy:

$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128)'
1000000 loops, best of 3: 1.21 usec per loop

Surprisingly, this is slow; that's because Numpy isn't great for working with single datums - it likes to amortize costs over large arrays created / manipulated in C. To see this in action, let's see how long it takes to generate 100 random integers:

$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128, size=100)'
1000000 loops, best of 3: 1.91 usec per loop

Only 60% slower than generating a single one! With 0.019 usec per integer, this is the fastest method by far - 3 times faster than calling random.random(). The reason this method is so fast is because the Python call overheads are amortized over all generated integers, and deep inside Numpy runs an efficient C loop to generate them.

To conclude, use Numpy if you want to generate large numbers of random ints; if you're just generating one-at-a-time, it may not be as useful (but then how much do you care about performance, really?)

Conclusion: performance vs. abstraction

In programming, performance and abstraction/flexibility are often at odds. By making a certain function more flexible, we inevitably make it slower - and randint() is a great practical example of this problem. 9 times out of 10 we don't care about the performance of these functions, but when we do, it's useful to know what's going on and how to improve the situation.

In a way, pure Python code itself is one of the slowest abstractions we encounter, since every line gets translated to a bunch of bytecode that has to be interpreted at run-time.

To mitigate these effects, Python programmers who care about performance have many techniques at their disposal. Libraries like Numpy carefully move as much compute as possible to underlying C code; PyPy is a JIT compiler that can speed up most pure Python code sequences considerably. Numba is somewhere in between, while Cython lets us re-write chosen functions in a restricted subset of Python that can be efficiently compiled to machine code.

[1]	From this point on, file names point to source files in the CPython repository. Feel free to follow along on your machine or on GitHub.

[2]

As an experiment, try to generate pseudo-random integers up to 2^54 using this technique. You'll notice that only even numbers are generated!

More generally, the closer the multiplier is to machine precision, the less random the result becomes. Knuth has an interesting discussion of this in volume 2 of TAOCP - it has to do with unbalanced rounding that has to happen every time a precision-limited float is multiplied by an integer. That said, if the multiplier is much smaller than the precision, we'll be fine; for generating numbers up to 2^40, say, the bad effects on the distribution will be negligible.