Affine transformations



This is a brief article on affine mappings and their relation to linear mappings, with some applications.

Linear vs. Affine

To start discussing affine mappings, we have to first address a common confusion around what it means for a function to be linear.

According to Wikipedia the term linear function can refer to two distinct concepts, based on the context:

  1. In Calculus, a linear function is a polynomial function of degree zero or one; in other words, a function of the form f(x)=ax+b for some constants a and b.
  2. In Linear Algebra, a linear function is a linear mapping, or linear transformation.

In this article we're going to be using (2) as the definition of linear, and it will soon become obvious why (1) is confusing when talking about transformations. To avoid some of the jumble going forward, I'm going to be using the term mapping instead of function, but in linear algebra the two are interchangeable (transformation is another synonym, which I'm going to be making less effort to avoid since it's not as overloaded [1]).

Linear transformations

Since we're talking about linear algebra, let's use the domain of vector spaces for the definitions. A transformation (or mapping) f is linear when for any two vectors \vec{v} and \vec{w} (assuming the vectors are in the same vector space, say \mathbb{R}^2):

  • f(\vec{v}+\vec{w})=f(\vec{v})+f(\vec{w})
  • f(k\vec{v})=kf(\vec{v}) for any scalar k

For example, the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle - where v_1 and v_2 are the components of \vec{v} - is linear. The mapping g(\vec{v})=\langle v_2,2v_{1}v_{2} \rangle is not linear (the product v_{1}v_{2} breaks both rules).

In fact, it can be shown that for the kind of vector spaces we're mostly interested in [2], any linear mapping can be represented by a matrix that is multiplied by the input vector. This is because we can represent any vector in terms of the standard basis vectors: \vec{v}=v_1\vec{e}_1+...+v_n\vec{e}_n. Then, since f is linear:

\[f(\vec{v})=f(\sum_{i=1}^{n}v_i\vec{e}_i)=\sum_{i=1}^{n}v_if(\vec{e}_i)\]

If we think of f(\vec{e}_i) as column vectors, this is precisely the multiplication of a matrix by \vec{v}:

\[f(\vec{v}) = \begin{pmatrix} \mid & \mid & & \mid \\ f(\vec{e}_1) & f(\vec{e}_2) & \cdots & f(\vec{e}_n) \\ \mid & \mid & & \mid \\ \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}\]

This multiplication by a matrix can also be seen as a change of basis for \vec{v} from the standard base to a base defined by f. If you want a refresher on how changes of basis work, take a look at my older post on this topic.

Let's get back to our earlier example of the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle. We can represent this mapping with the following matrix:

\[\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\]

Meaning that:

\[f(\vec{v})=\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}\]
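To make the matrix form concrete, here's a minimal C sketch of applying this matrix to a vector (the Vec2/Mat2 types and function names are mine, purely for illustration):

#include <stdio.h>

// A 2D vector and a 2x2 matrix, stored row by row.
typedef struct { double x, y; } Vec2;
typedef struct { double m[2][2]; } Mat2;

// Multiply a 2x2 matrix by a vector. The columns of the matrix are the
// images of the standard basis vectors under the mapping.
Vec2 mat2_apply(Mat2 a, Vec2 v) {
  Vec2 r;
  r.x = a.m[0][0] * v.x + a.m[0][1] * v.y;
  r.y = a.m[1][0] * v.x + a.m[1][1] * v.y;
  return r;
}

int main(void) {
  // The matrix representing f(v) = <3*v1 - 4*v2, v2>.
  Mat2 f = {{{3, -4}, {0, 1}}};
  Vec2 v = {1, 2};
  Vec2 fv = mat2_apply(f, v);
  printf("f(1, 2) = (%g, %g)\n", fv.x, fv.y);  // prints f(1, 2) = (-5, 2)
  return 0;
}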

Representing linear mappings this way gives us a number of interesting tools for working with them. For example, the associativity of matrix multiplication means that we can represent compositions of mappings by simply multiplying the mapping matrices together.

Consider the following mapping:

\[S=\begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}\]

In equational form: S(\vec{v})=\langle 2v_1,2v_2 \rangle. This mapping stretches the input vector by a factor of 2 in both dimensions. To visualize a mapping, it's useful to examine its effect on some standard vectors. Let's use the vectors (0,0), (0,1), (1,0), (1,1) (the "unit square"). In \mathbb{R}^2 they represent four points that can be connected together as follows [3]:

Unit vectors as points on the plane

It's easy to see that when transformed with S, we'll get:

Unit vectors transformed with 2x stretch

It's also well known that rotation around the origin by an angle \theta (in radians) can be modeled with the following mapping:

\[R=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\]

Transforming our unit square with this matrix we get:

Unit vectors transformed with rotation by one radian

Finally, let's say we want to combine these transformations. To stretch and then rotate a vector, we would do: f(\vec{v})=R(S\vec{v}). Since matrix multiplication is associative, this can also be rewritten as: f(\vec{v})=(RS)\vec{v}. In other words, we can find a matrix A=RS which represents the combined transformation, and we "find" it by simply multiplying R and S together [4]:

\[A=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta \\ -2\sin\theta & 2\cos\theta \end{pmatrix}\]

And when we multiply our unit square by this matrix we get:

Unit vectors transformed with rotation and stretch
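If you want to verify the composition numerically, here's a small C sketch in the same spirit as the earlier one (again, the types and helpers are mine); it builds A=RS for \theta=1 radian and maps the corners of the unit square through it:

#include <math.h>
#include <stdio.h>

typedef struct { double x, y; } Vec2;
typedef struct { double m[2][2]; } Mat2;

// Matrix product: applying (a*b) to a vector is the same as applying b, then a.
Mat2 mat2_mul(Mat2 a, Mat2 b) {
  Mat2 r;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
  return r;
}

Vec2 mat2_apply(Mat2 a, Vec2 v) {
  return (Vec2){a.m[0][0] * v.x + a.m[0][1] * v.y,
                a.m[1][0] * v.x + a.m[1][1] * v.y};
}

int main(void) {
  double t = 1.0;  // rotation angle in radians
  Mat2 S = {{{2, 0}, {0, 2}}};
  Mat2 R = {{{cos(t), sin(t)}, {-sin(t), cos(t)}}};
  Mat2 A = mat2_mul(R, S);  // stretch first, then rotate

  Vec2 square[4] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
  for (int i = 0; i < 4; ++i) {
    Vec2 p = mat2_apply(A, square[i]);
    printf("(%g, %g) -> (%.3f, %.3f)\n", square[i].x, square[i].y, p.x, p.y);
  }
  return 0;
}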

Affine transformations

Now that we have some good context on linear transformations, it's time to get to the main topic of this post - affine transformations.

For an affine space (we'll talk about what this is exactly in a later section), every affine transformation is of the form g(\vec{v})=A\vec{v}+\vec{b} where A is a matrix representing a linear transformation and \vec{b} is a vector. In other words, an affine transformation combines a linear transformation with a translation.

Quite obviously, every linear transformation is affine (just set \vec{b} to the zero vector). However, not every affine transformation is linear. For a non-zero \vec{b}, the linearity rules don't hold. Let's say that:

\[\begin{align*} f(\vec{v})&=A\vec{v}+\vec{b} \\ f(\vec{w})&=A\vec{w}+\vec{b} \end{align*}\]

Then if we try to add these together, we get:

\[f(\vec{v}+\vec{w})=A(\vec{v}+\vec{w})+\vec{b}\]

Whereas:

\[f(\vec{v})+f(\vec{w})=A\vec{v}+\vec{b}+A\vec{w}+\vec{b}=A(\vec{v}+\vec{w})+2\vec{b}\]

The violation of the scalar multiplication rule can be checked similarly.
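Spelled out, for a scalar k:

\[f(k\vec{v})=A(k\vec{v})+\vec{b}=kA\vec{v}+\vec{b} \qquad \textrm{whereas} \qquad kf(\vec{v})=k(A\vec{v}+\vec{b})=kA\vec{v}+k\vec{b}\]

The two sides differ whenever k \neq 1 and \vec{b} is non-zero.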

Let's examine the affine transformation that stretches a vector by a factor of two (similarly to the S transformation we've discussed before) and translates it by 0.5 in both dimensions:

\[f(\vec{v})=\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\vec{v}+\begin{pmatrix} 0.5 \\ 0.5\end{pmatrix}\]

Here is this transformation visualized:

Unit vectors translated and stretched

With some clever augmentation, we can represent affine transformations as a multiplication by a single matrix, if we add another dimension to the vectors [5]:

\[f(\vec{v})=T\vec{v}=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ 1 \end{pmatrix}\]

The translation vector is tacked onto the transform matrix as an extra column on the right, and the input vector gets a 1 in the extra dimension; the added bottom row of the matrix is all zeros except for the final 1. The result always has a 1 in the last dimension, which we can ignore.
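Here's the augmented-matrix trick in code - a minimal C sketch with made-up type and function names, applying T to the point (1, 1):

#include <stdio.h>

// A 2D point in homogeneous coordinates: (x, y, 1).
typedef struct { double x, y, w; } Vec3;
typedef struct { double m[3][3]; } Mat3;

Vec3 mat3_apply(Mat3 a, Vec3 v) {
  Vec3 r;
  r.x = a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.w;
  r.y = a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.w;
  r.w = a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.w;
  return r;
}

int main(void) {
  // Stretch by 2 and translate by (0.5, 0.5), as a single 3x3 matrix.
  Mat3 T = {{{2, 0, 0.5},
             {0, 2, 0.5},
             {0, 0, 1}}};
  Vec3 v = {1, 1, 1};  // the point (1, 1), augmented with a 1
  Vec3 tv = mat3_apply(T, v);
  printf("(1, 1) -> (%g, %g)\n", tv.x, tv.y);  // prints (2.5, 2.5); tv.w stays 1
  return 0;
}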

Affine transforms can be composed similarly to linear transforms, using matrix multiplication; this also makes their composition associative. As an example, let's compose the scaling+translation transform discussed most recently with the rotation transform mentioned earlier. This is the augmented matrix for the rotation:

\[R=\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

The composed transform will be f(\vec{v})=T(R(\vec{v}))=(TR)\vec{v}. Its matrix is:

\[TR=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta & 0.5 \\ -2\sin\theta & 2\cos\theta & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\]

The visualization is:

Translation after rotation

Affine subspaces

The previous section defined affine transformations in terms of the concept of an affine space, and now it's time to pay the rigor debt. According to Wikipedia, an affine space:

... is a geometric structure that generalizes the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and measure of angles, keeping only the properties related to parallelism and ratio of lengths for parallel line segments.

Since we've been using vectors and vector spaces so far in the article, let's see the relation between vector spaces and affine spaces. The best explanation I found online is the following.

Consider the vector space \mathbb{R}^2, with two lines:

Lines for subspace and affine space of R2

The blue line can be seen as a vector subspace (also known as linear subspace) of \mathbb{R}^2. On the other hand, the green line is not a vector subspace because it doesn't contain the zero vector. The green line is an affine subspace. This leads us to a definition:

A subset U \subset V of a vector space V is an affine subspace if there exists a u \in U such that U - u = \{x-u \mid x \in U\} is a vector subspace of V.

If you recall the definition of affine transformations from earlier on, this should seem familiar - linear and affine subspaces are related by using a translation vector. It can also be said that an affine space is a generalization of a linear space, in that it doesn't require a specific origin point. From Wikipedia, again:

Any vector space may be considered as an affine space, and this amounts to forgetting the special role played by the zero vector. In this case, the elements of the vector space may be viewed either as points of the affine space or as displacement vectors or translations. When considered as a point, the zero vector is called the origin. Adding a fixed vector to the elements of a linear subspace of a vector space produces an affine subspace. One commonly says that this affine subspace has been obtained by translating (away from the origin) the linear subspace by the translation vector.

When mathematicians define new algebraic structures, they don't do it just for fun (well, sometimes they do) but because such structures have some properties which can lead to useful generalizations. Affine spaces and transformations also have interesting properties, which make them useful. For example, an affine transformation always maps a line to a line (and not to, say, a parabola). Any two triangles can be mapped onto one another by an affine transform, and so on. This leads to interesting applications in computational geometry and 3D graphics.

Affine functions in linear regression and neural networks

Here I want to touch upon the linear vs. affine confusion again, in the context of machine learning. Recall that Linear Regression attempts to fit a line to data in an optimal way, the line being defined as the function:

\[y(x) = mx + b\]

But as this article explained, y(x) is not actually a linear function; it's an affine function (because of the constant term b). Should linear regression be renamed to affine regression? It's probably too late for that :-), but it's good to get the terminology right.

Similarly, a single fully connected layer in a neural network is often expressed mathematically as:

\[y(\vec{x})=W\vec{x}+\vec{b}\]

Where \vec{x} is the input vector, W is the weight matrix and \vec{b} is the bias vector. This function is also usually referred to as linear although it's actually affine.
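As a quick illustration, here's a hand-rolled C sketch of such a layer (not how any real framework implements it) - it's just the affine computation applied one output row at a time:

#include <stddef.h>

// Compute y = W*x + b for a fully connected layer with `in` inputs and
// `out` outputs. W is stored row-major: `out` rows of `in` columns each.
void dense_layer(const double *W, const double *b, const double *x,
                 double *y, size_t in, size_t out) {
  for (size_t i = 0; i < out; ++i) {
    double acc = b[i];              // the bias term makes the function affine
    for (size_t j = 0; j < in; ++j)
      acc += W[i * in + j] * x[j];  // the linear part
    y[i] = acc;
  }
}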

Affine expressions and array accesses

Pivoting from algebra to programming, affine functions have a use when discussing one of the most fundamental building blocks of computer science: accessing arrays.

Let's start by defining an affine expression:

An expression is affine w.r.t. variables v_1,v_2,...,v_n if it can be expressed as c_0+c_{1}v_1+...+c_{n}v_n where c_0,c_1,...,c_n are constants.

Affine expressions are interesting because they are often used to index arrays in loops. Consider the following loop in C that copies all elements in an MxN matrix "one to the left":

for (int i = 0; i < M; ++i) {
  for (int j = 1; j < N; ++j) {
    arr[i][j-1] = arr[i][j];
  }
}

Since C's memory layout for multi-dimensional arrays is row-major, the statement in the loop writes to the element at flat offset i*N + j - 1 from the beginning of the array on every iteration. i*N + j - 1 is an affine expression w.r.t. variables i and j [6].
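To make the flat offset explicit, here's an equivalent rewrite of the same loop (assuming arr is declared as double arr[M][N]; the element type doesn't matter for the point being made) that indexes the underlying memory directly with the affine expression:

// The same copy expressed over the flat row-major memory behind arr;
// the index i*N + j - 1 is affine in the loop variables i and j.
double *flat = &arr[0][0];
for (int i = 0; i < M; ++i) {
  for (int j = 1; j < N; ++j) {
    flat[i * N + j - 1] = flat[i * N + j];
  }
}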

When all expressions in a loop are affine, the loop is amenable to some advanced analyses and optimizations, but this is a topic for another post.


[1]Though it's also not entirely precise. Generally speaking, transformations are more limited than functions. A transformation is defined on a set as a bijection of the set to itself, whereas functions are more general (they can map between different sets, for example).
[2]Finite-dimensional vector spaces with a defined basis.
[3]Tossing a bit of rigor aside, we can imagine points and vectors to be isomorphic since both are represented by pairs of numbers on the \mathbb{R}^2 plane. Some resources will mention the Euclidean plane - \mathbb{E}^2 - when talking about points and lines, but the Euclidean plane can be modeled by a real plane of the same dimension, so I'll just be using \mathbb{R}^2.
[4]I'll admit this result looks fairly obvious. But longer chains of transforms work in exactly the same way, and the fact that we can represent such chains with a single matrix is very useful.
[5]This trick has a geometrical explanation: translation in 2D can be modeled as adding a dimension and performing a 3D shear operation, then projecting the resulting object onto a 2D plane again. The object will appear shifted.
[6]It's actually only affine if N is a compile-time constant or can be proven to be constant throughout the loop.

Summary of reading: October - December 2017



  • "How Asia Works" by Joe Studwell - an interesting analysis of the economies of some Asian countries and their paths from rags to riches in the last few decades. The author raises some intriguing questions about the paths the most successful countries taken as compared to the free market approach the great powers tried to enforce upon them, and ends with some speculation on the future of China's financial development.
  • "The Remains of the Day" by Kazuo Ishiguro - a truly delightful book, I have no other word to describe it :) Having watched and enjoyed "Downton Abbey" just a short while ago, it was very interesting to draw the similarities with the world Ishiguro describes in this book - the world of British nobles and their servants.
  • "Hillbilly Elegy: A Memoir of a Family and Culture in Crisis" by J.D. Vance - the author tells about growing up in a troubled family in the rust belt, and eventual "escape" to the Marine Corps and Yale law school. Interestingly, while I originally expected something different from the book, I did like it overall. It doesn't carry a lot of insight besides the very personal story, but perhaps such insight is best gained through individual stories and not editorial pieces that are prone to political biases.
  • "Between the World and Me" by Ta-Nehisi Coates - a troubling auto-biographic account of the author growing up as a black man in Baltimore and later through studies in Howard University. Reading this book is difficult and poignant, as it paints an angry picture of the racial divide in modern-day America.
  • "Exit Right: The People Who Left the Left and Reshaped the American Century" by Daniel Oppenheimer - a biographical account of 6 prominent Americans from the political left, their gradual estrangement from their party and eventual flip to conservatism (or neoconservatism). Provides a fairly interesting angle on major issues American politics of the 20th century.
  • "Little Fires Everywhere" by Celeste Ng - a well-to-do family in Cleveland intersects with a nomad artist and her daughter, with a thorny adoption story in the background. Somewhat soap-opera-y, but not bad overall, with some interesting thoughts on choosing life paths.
  • "To Make Men Free - A history of the Republican party" by Heather Cox Richardson - a detailed history of the Republican party from the times of Lincoln and until the end of Bush Jr's presidency. Very informative book overall, but far from great. For one, it's clearly biased towards the left - and bias makes such books lose a lot of credence; for another, it completely ignores the changes in the Democratic party - for example, very little is said about how the Democrats turned from racism to liberalism in the 1960s. In fact, the author insinuates that most of the changes in the political spectrum were either made by the Republicans, or as a mirror image of their views; clearly, the truth can't be so simple. On the positive side, I liked the exposition of the inherent conflict between the Declaration of Independence (equality of opportunity) and the Constitution (protection of property), which may help explain why there's no easy answers in American politics.
  • "The Omnivore's Dilemma: A Natural History of Four Meals" by Michael Pollan - a really well written book about the current industrial food chain, organic foods, "free-range" farm foods and also modern hunter-gathering. I liked the balanced approach and the book not being too preachy even though the conclusions are fairly clear.
  • "Programming Haskell" by Graham Hutton (2nd ed.) - full review.
  • "The Namesake" by Jhumpa Lahiri - this time a full-length novel about the same topic of Bengal immigrants to the US in the 1970s, following their life through the early 2000s. The protagonist is Gogol, born in Boston to immigrant parents, trying to have a normative American life in the late 20th century. I liked the first ~half of the book much more, as it was more tied to Lahiri's main immigrant theme. The second half felt like a fairly generic "coming-of-age" story for a young intellectual in New York. Still a good read overall, though.
  • "The Vital Question" by Nick Lane - another grand-sweeping account of fundamental bio-chemistry and evolution mixed with speculative musings on the big questions of the origin of life by Nick Lane. My opinion of this book is similar to that of the author's Life Ascending - really well written, tons of knowledge, but way too dense and way too speculative. I'd say the minimal education level for really understanding this book is being a grad student in biology or a related field; otherwise it's fairly hard to judge where the facts end and where the speculation begins. I did enjoy the book overall, however - there's tons of fascinating details in it that just aren't covered anywhere else in popular science.
  • "In Defense of Food: An Eater's Manifesto" by Michael Pollan - the book that gave birth to the saying "Eat food, not too much, mostly plants". A nice and pragmatic treatment of the important question of "what should we eat".

Re-reads:

  • "Travels with Charley in Search of America" by John Steinbeck

Concurrent Servers Part 5 - Redis case study



This is part 5 in a series of posts on writing concurrent network servers. After discussing techniques for constructing concurrent servers in parts 1-4, this time we're going to do a case study of an existing production-quality server - Redis.

Redis logo

Redis is a fascinating project and I've been following it with interest for a while now. One of the things I admire most about Redis is the clarity of its C source code. It also happens to be a great example of a high-performance concurrent in-memory database server, so the opportunity to use it as a case study for this series was too good to ignore.

Let's see how the ideas discussed in parts 1-4 apply to a real-world application.

All posts in the series:

Event-handling library

One of Redis's main claims to fame around the time of its original release in 2009 was its speed - the sheer number of concurrent client connections the server could handle. It was especially notable that Redis did this all in a single thread, without any complex locking and synchronization schemes on the data stored in memory.

This feat was achieved by Redis's own implementation of an event-driven library which wraps the fastest event-loop mechanism available on a system (epoll for Linux, kqueue for BSD and so on). This library is called ae. ae makes it possible to write a fast server as long as none of the internals are blocking, which Redis goes to great lengths to guarantee [1].

What mainly interests us here is ae's support of file events - registering callbacks to be invoked when file descriptors (like network sockets) have something interesting pending. Like libuv, ae supports multiple event loops and - having read parts 3 and 4 in this series - the signature of aeCreateFileEvent shouldn't be surprising:

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
                      aeFileProc *proc, void *clientData);

It registers a callback (proc) for new file events on fd, with the given event loop. When using epoll, it will call epoll_ctl to add an event on the file descriptor (either EPOLLIN, EPOLLOUT or both, depending on the mask parameter). ae's aeProcessEvents is the "run the event loop and dispatch callbacks" function, and it calls epoll_wait under the hood.
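To get a feel for the flow, here's a rough sketch of registering a read callback with ae. The handler body and names are mine; AE_READABLE/AE_ERR and the aeFileProc signature (event loop, fd, private data, event mask) are what I recall from ae.h, so double-check against the actual source:

#include "ae.h"

// Invoked by aeProcessEvents whenever fd becomes readable.
// A real handler would read() from fd here and act on the data.
void onReadable(aeEventLoop *eventLoop, int fd, void *clientData, int mask) {
}

void register_read_handler(aeEventLoop *loop, int listen_fd) {
  // Ask ae to invoke onReadable whenever listen_fd has data pending.
  if (aeCreateFileEvent(loop, listen_fd, AE_READABLE, onReadable, NULL) == AE_ERR) {
    // registration failed; handle the error
  }
}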

Handling client requests

Let's trace through the Redis server code to see how ae is used to register callbacks for client events. initServer starts it by registering a callback for read events on the socket(s) being listened to, by calling aeCreateFileEvent with the callback acceptTcpHandler. This callback is invoked when new client connections are available. It calls accept [2] and then acceptCommonHandler, which in turn calls createClient to initialize the data structures required to track a new client connection.

createClient's job is to start listening for data coming in from the client. It sets the socket to non-blocking mode (a key ingredient in an asynchronous event loop) and registers another file event callback with aeCreateFileEvent - for read events - readQueryFromClient. This function will be invoked by the event loop every time the client sends some data.

readQueryFromClient does just what we'd expect - parses the client's command and acts on it by querying and/or manipulating data and sending a reply back. Since the client socket is non-blocking, this function has to be able to handle EAGAIN, as well as partial data; data read from the client is accumulated in a client-specific buffer, and the full query may be split across multiple invocations of the callback.
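The general shape of such a handler - not Redis's actual code, just a generic sketch of the non-blocking read pattern with hypothetical client bookkeeping - looks something like this:

#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical per-client state: a buffer accumulating bytes read so far.
typedef struct {
  int fd;
  char buf[16 * 1024];
  size_t buflen;
} Client;

// Read whatever is currently available from a non-blocking client socket.
// A complete command may be split across many invocations, so we only
// accumulate here; parsing/executing happens once a full command is buffered.
void on_client_readable(Client *c) {
  ssize_t n = read(c->fd, c->buf + c->buflen, sizeof(c->buf) - c->buflen);
  if (n > 0) {
    c->buflen += n;
    // try_parse_and_execute(c);  // hypothetical: runs the query, queues a reply
  } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    // No data right now; the event loop will invoke us again when there is.
  } else {
    // n == 0 means the client closed the connection; otherwise a real error.
    close(c->fd);
  }
}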

Sending data back to clients

In the previous paragraph I said that readQueryFromClient ends up sending replies back to clients. This is logically true, because readQueryFromClient prepares the reply to be sent, but it doesn't actually do the physical sending - since there's no guarantee the client socket is ready for writing/sending data. We have to use the event loop machinery for that.

The way Redis does this is by registering a beforeSleep function to be called every time the event loop is about to go to sleep waiting for sockets to become available for reading/writing. One of the things beforeSleep does is call handleClientsWithPendingWrites. This function tries to send all available replies immediately by calling writeToClient; if some of the sockets are unavailable, it registers an event-loop callback to invoke sendReplyToClient when the socket is ready. This can be seen as a kind of optimization - if the socket is immediately ready for sending (which is often the case for TCP sockets), there's no need to register the event - just send the data. Since sockets are non-blocking, this never actually blocks the loop.
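In generic terms (hypothetical names, not Redis's actual functions), the pattern is: try to write everything right away, and only fall back to the event loop if the socket won't take it all:

#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical per-client reply state.
typedef struct {
  int fd;
  const char *reply;
  size_t replylen;
  size_t sent;
} PendingReply;

// Returns 1 if the whole reply went out immediately, 0 if the caller should
// register a writable-event callback and finish later (error handling elided).
int try_send_reply(PendingReply *r) {
  while (r->sent < r->replylen) {
    ssize_t n = write(r->fd, r->reply + r->sent, r->replylen - r->sent);
    if (n > 0) {
      r->sent += n;
    } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      return 0;  // socket buffer is full; the event loop will tell us when to retry
    } else {
      return 0;  // real error; a production server would clean up the client here
    }
  }
  return 1;      // everything was sent without ever blocking the loop
}

In Redis itself, the "register a callback and finish later" step is the aeCreateFileEvent registration of sendReplyToClient mentioned above.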

Why does Redis roll its own event library?

In part 4 we discussed building asynchronous concurrent servers using libuv. It's interesting to ponder the fact that Redis doesn't use libuv, or any similar event library, and instead implements its own - ae, including wrappers for epoll, kqueue and select. In fact, antirez (Redis's creator) answered precisely this question in a blog post in 2011. The gist of his answer: ae is ~770 lines of code he intimately understands; libuv is huge, without providing additional functionality Redis needs.

Today, ae has grown to ~1300 lines, which is still trivial compared to libuv's ~26K lines (excluding Windows support, tests, samples and docs). libuv is a far more general library, which makes it more complex and more difficult to adapt to the particular needs of another project; ae, on the other hand, was designed for Redis, co-evolved with Redis and contains only what Redis needs.

This is another great example of the dependencies in software projects formula I mentioned in a post earlier this year:

The benefit of dependencies is inversely proportional to the amount of effort spent on a software project.

antirez referred to this, to some extent, in his post. He mentioned that dependencies that provide a lot of added value ("foundational" dependencies in my post) make more sense (jemalloc and Lua are his examples) than dependencies like libuv, whose functionality is fairly easy to implement for the particular needs of Redis.

Multi-threading in Redis

For the vast majority of its history, Redis has been a purely single-threaded affair. Some people find this surprising, but it makes total sense with a bit of thought. Redis is inherently network-bound - as long as the database size is reasonable, for any given client request, much more time is spent waiting on the network than inside Redis's data structures.

These days, however, things are not quite that simple. There are several new capabilities in Redis that use threads:

  1. "Lazy" freeing of memory.
  2. Writing a persistence journal with fsync calls in a background thread.
  3. Running user-defined modules that need to perform a long-running operation.

For the first two features, Redis uses its own simple bio library (the acronym stands for "Background I/O"). The library is hard-coded for Redis's needs and can't be used outside it - it runs a pre-set number of threads, one per background job type Redis needs.

For the third feature, Redis modules can define new Redis commands, and thus are held to the same standards as regular Redis commands, including not blocking the main thread. If a custom Redis command defined in a module wants to perform a long-running operation, it has to spin up a thread to run it in the background. src/modules/helloblock.c in the Redis tree provides an example.

With these features, Redis combines an event loop with threading to get both speed in the common case and flexibility in the general case, similarly to the work queue discussion in part 4 of this series.


[1]A core aspect of Redis is its being an in-memory database; therefore, queries should never take too long to execute. There are all kinds of complications, however. In case of partitioning, a server may end up routing the request to another instance; in this case async I/O is used to avoid blocking other clients.
[2]Through anetAccept; anet is Redis's wrapper for TCP socket code.