This is a brief article on affine mappings and their relation to linear mappings, with some applications.

Linear vs. Affine

To start discussing affine mappings, we have to first address a common confusion around what it means for a function to be linear.

According to Wikipedia, the term linear function can refer to two distinct concepts, depending on the context:

  1. In Calculus, a linear function is a polynomial function of degree zero or one; in other words, a function of the form f(x)=ax+b for some constants a and b.
  2. In Linear Algebra, a linear function is a linear mapping, or linear transformation.

In this article we're going to be using (2) as the definition of linear, and it will soon become obvious why (1) is confusing when talking about transformations. To avoid some of the jumble going forward, I'm going to be using the term mapping instead of function, but in linear algebra the two are interchangeable (transformation is another synonym, which I'll be making less effort to avoid since it's not as overloaded [1]).

Linear transformations

Since we're talking about linear algebra, let's use the domain of vector spaces for the definitions. A transformation (or mapping) f is linear when for any two vectors \vec{v} and \vec{w} (assuming the vectors are in the same vector space, say \mathbb{R}^2):

  • f(\vec{v}+\vec{w})=f(\vec{v})+f(\vec{w})
  • f(k\vec{v})=kf(\vec{v}) for any scalar k

For example, the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle - where v_1 and v_2 are the components of \vec{v} - is linear. The mapping g(\vec{v})=\langle v_2,2v_{1}v_{2} \rangle is not linear.
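
To see why g violates these rules, it's enough to check the scalar multiplication rule with, say, k=2:

\[g(2\vec{v})=\langle 2v_2,2(2v_1)(2v_2) \rangle=\langle 2v_2,8v_{1}v_{2} \rangle\]

Whereas 2g(\vec{v})=\langle 2v_2,4v_{1}v_{2} \rangle; the second component scales quadratically rather than linearly, so the two sides disagree whenever v_{1}v_{2}\neq 0.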

In fact, it can be shown that for the kind of vector spaces we're mostly interested in [2], any linear mapping can be represented by a matrix that is multiplied by the input vector. This is because we can represent any vector in terms of the standard basis vectors: \vec{v}=v_1\vec{e}_1+...+v_n\vec{e}_n. Then, since f is linear:

\[f(\vec{v})=f(\sum_{i=1}^{n}v_i\vec{e}_i)=\sum_{i=1}^{n}v_if(\vec{e}_i)\]

If we think of f(\vec{e}_i) as column vectors, this is precisely the multiplication of a matrix by \vec{v}:

\[f(\vec{v}) = \begin{pmatrix} \mid & \mid & & \mid \\ f(\vec{e}_1) & f(\vec{e}_2) & \cdots & f(\vec{e}_n) \\ \mid & \mid & & \mid \\ \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}\]

This multiplication by a matrix can also be seen as a change of basis for \vec{v} from the standard basis to a basis defined by f. If you want a refresher on how changes of basis work, take a look at my older post on this topic.

Let's get back to our earlier example of the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle. We can represent this mapping with the following matrix:

\[\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\]

Meaning that:

\[f(\vec{v})=\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}\]
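
In code, applying such a matrix to a vector is just a couple of multiply-adds. Here's a minimal C sketch (the function name and layout are mine, for illustration only):

#include <stdio.h>

// Apply the 2x2 matrix m (row-major) to the vector (x, y), storing the
// result in (*ox, *oy).
void apply2x2(const double m[2][2], double x, double y, double *ox, double *oy) {
  *ox = m[0][0] * x + m[0][1] * y;
  *oy = m[1][0] * x + m[1][1] * y;
}

int main() {
  const double f[2][2] = {{3, -4}, {0, 1}};
  double ox, oy;
  apply2x2(f, 1, 1, &ox, &oy);
  printf("(%g, %g)\n", ox, oy);  // prints (-1, 1), i.e. <3*1-4*1, 1>
  return 0;
}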

Representing linear mappings this way gives us a number of interesting tools for working with them. For example, the associativity of matrix multiplication means that we can represent compositions of mappings by simply multiplying the mapping matrices together.

Consider the following mapping:

\[S=\begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}\]

In equational form: S(\vec{v})=\langle 2v_1,2v_2 \rangle. This mapping stretches the input vector 2x in both dimensions. To visualize a mapping, it's useful to examine its effects on some standard vectors. Let's use the vectors (0,0), (0,1), (1,0), (1,1) (the "unit square"). In \mathbb{R}^2 they represent four points that can be connected together as follows [3]:

Unit vectors as points on the plane

It's easy to see that when transformed with S, we'll get:

Unit vectors transformed with 2x stretch

It's also well known that rotation about the origin can be modeled with the following mapping, where \theta is the angle in radians (as written, this matrix rotates vectors clockwise by \theta):

\[R=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\]

Transforming our unit square with this matrix we get:

Unit vectors transformed with rotation by one radian

Finally, let's say we want to combine these transformations. To stretch and then rotate a vector, we would do: f(\vec{v})=R(S\vec{v}). Since matrix multiplication is associative, this can also be rewritten as: f(\vec{v})=(RS)\vec{v}. In other words, we can find a matrix A=RS which represents the combined transformation, and we "find" it by simply multiplying R and S together [4]:

\[A=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta \\ -2\sin\theta & 2\cos\theta \end{pmatrix}\]
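
To make this concrete in code, here's a self-contained C sketch (again with names of my own choosing; compile with -lm on most platforms) that computes A=RS for \theta=1 and uses it to transform the unit-square corner (1,1):

#include <math.h>
#include <stdio.h>

// Multiply two 2x2 matrices: out = a * b.
void mul2x2(const double a[2][2], const double b[2][2], double out[2][2]) {
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      out[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

int main() {
  double theta = 1.0;  // one radian
  const double r[2][2] = {{cos(theta), sin(theta)}, {-sin(theta), cos(theta)}};
  const double s[2][2] = {{2, 0}, {0, 2}};
  double a[2][2];
  mul2x2(r, s, a);  // A = RS: the stretch S applies first, then the rotation R
  // Transform the unit-square corner (1, 1) with the combined matrix.
  double x = a[0][0] * 1 + a[0][1] * 1;
  double y = a[1][0] * 1 + a[1][1] * 1;
  printf("(%f, %f)\n", x, y);
  return 0;
}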

And when we multiply our unit square by this matrix we get:

Unit vectors transformed with rotation and stretch

Affine transformations

Now that we have some good context on linear transformations, it's time to get to the main topic of this post - affine transformations.

For an affine space (we'll talk about what this is exactly in a later section), every affine transformation is of the form g(\vec{v})=A\vec{v}+\vec{b} where A is a matrix representing a linear transformation and \vec{b} is a vector. In other words, an affine transformation combines a linear transformation with a translation.

Quite obviously, every linear transformation is affine (just set \vec{b} to the zero vector). However, not every affine transformation is linear. For a non-zero \vec{b}, the linearity rules don't check out. Let's say that:

\[\begin{align*} f(\vec{v})&=A\vec{v}+\vec{b} \\ f(\vec{w})&=A\vec{w}+\vec{b} \end{align*}\]

Then, applying f to the sum of the two vectors, we get:

\[f(\vec{v}+\vec{w})=A(\vec{v}+\vec{w})+\vec{b}\]

Whereas:

\[f(\vec{v})+f(\vec{w})=A\vec{v}+\vec{b}+A\vec{w}+\vec{b}=A(\vec{v}+\vec{w})+2\vec{b}\]

The violation of the scalar multiplication rule can be checked similarly.
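
Concretely, for a scalar k:

\[\begin{align*} f(k\vec{v})&=kA\vec{v}+\vec{b} \\ kf(\vec{v})&=kA\vec{v}+k\vec{b} \end{align*}\]

These differ whenever k\neq 1 and \vec{b} is non-zero.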

Let's examine the affine transformation that stretches a vector by a factor of two (similarly to the S transformation we've discussed before) and translates it by 0.5 in both dimensions:

\[f(\vec{v})=\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\vec{v}+\begin{pmatrix} 0.5 \\ 0.5\end{pmatrix}\]

Here is this transformation visualized:

Unit vectors translated and stretched

With some clever augmentation, we can represent affine transformations as a multiplication by a single matrix, if we add another dimension to the vectors [5]:

\[f(\vec{v})=T\vec{v}=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ 1 \end{pmatrix}\]

The translation vector is tacked onto the transform matrix as an extra column on the right, the input vector gets a 1 in the added dimension, and the matrix gets a new bottom row of 0s ending in a 1. The result will always have a 1 in the final dimension, which we can ignore.
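
In code, the augmented form is just a 3x3 matrix-vector multiplication where the third component of the input is fixed at 1. A minimal C sketch (the function name is mine):

#include <stdio.h>

// Apply the 3x3 augmented transform t to the 2D point (x, y), treating it
// as the homogeneous vector (x, y, 1).
void apply_affine(const double t[3][3], double x, double y,
                  double *ox, double *oy) {
  *ox = t[0][0] * x + t[0][1] * y + t[0][2];
  *oy = t[1][0] * x + t[1][1] * y + t[1][2];
  // The third component would be t[2][0]*x + t[2][1]*y + t[2][2] = 1;
  // we simply ignore it.
}

int main() {
  const double t[3][3] = {{2, 0, 0.5}, {0, 2, 0.5}, {0, 0, 1}};
  double ox, oy;
  apply_affine(t, 1, 1, &ox, &oy);
  printf("(%g, %g)\n", ox, oy);  // prints (2.5, 2.5)
  return 0;
}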

Affine transforms can be composed similarly to linear transforms, using matrix multiplication; this also makes their composition associative. As an example, let's compose the scaling+translation transform discussed most recently with the rotation transform mentioned earlier. This is the augmented matrix for the rotation:

\[R=\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

The composed transform will be f(\vec{v})=T(R(\vec{v}))=(TR)\vec{v}. Its matrix is:

\[TR=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta & 0.5 \\ -2\sin\theta & 2\cos\theta & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\]

The visualization is:

Translation after rotation

Affine subspaces

The previous section defined affine transformations w.r.t. the concept of an affine space, and now it's time to pay the rigor debt. According to Wikipedia, an affine space:

... is a geometric structure that generalizes the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and measure of angles, keeping only the properties related to parallelism and ratio of lengths for parallel line segments.

Since we've been using vectors and vector spaces so far in the article, let's see the relation between vector spaces and affine spaces. The best explanation I found online is the following.

Consider the vector space \mathbb{R}^2, with two lines:

Lines for subspace and affine space of R2

The blue line can be seen as a vector subspace (also known as linear subspace) of \mathbb{R}^2. On the other hand, the green line is not a vector subspace because it doesn't contain the zero vector. The green line is an affine subspace. This leads us to a definition:

A subset U \subset V of a vector space V is an affine subspace if there exists a u \in U such that U - u = \{x-u \mid x \in U\} is a vector subspace of V.
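
For a concrete instance, suppose (purely for illustration) that the green line is y=x+1, i.e. U=\{(x,x+1) \mid x\in\mathbb{R}\}, and pick u=(0,1) \in U. Then:

\[U-u=\{(x,x+1)-(0,1) \mid x\in\mathbb{R}\}=\{(x,x) \mid x\in\mathbb{R}\}\]

This is the line y=x through the origin - a genuine vector subspace of \mathbb{R}^2.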

If you recall the definition of affine transformations from earlier on, this should seem familiar - linear and affine subspaces are related by using a translation vector. It can also be said that an affine space is a generalization of a linear space, in that it doesn't require a specific origin point. From Wikipedia, again:

Any vector space may be considered as an affine space, and this amounts to forgetting the special role played by the zero vector. In this case, the elements of the vector space may be viewed either as points of the affine space or as displacement vectors or translations. When considered as a point, the zero vector is called the origin. Adding a fixed vector to the elements of a linear subspace of a vector space produces an affine subspace. One commonly says that this affine subspace has been obtained by translating (away from the origin) the linear subspace by the translation vector.

When mathematicians define new algebraic structures, they don't do it just for fun (well, sometimes they do) but because such structures have properties that lead to useful generalizations. Affine spaces and transformations also have interesting properties, which make them useful. For example, an affine transformation always maps a line to a line (and not to, say, a parabola). Any triangle can be mapped onto any other triangle by an affine transform, and so on. This leads to interesting applications in computational geometry and 3D graphics.

Affine functions in linear regression and neural networks

Here I want to touch upon the linear vs. affine confusion again, in the context of machine learning. Recall that Linear Regression attempts to fit a line onto data in an optimal way, the line being defined as the function:

\[y(x) = mx + b\]

But as this article explained, y(x) is not actually a linear function; it's an affine function (because of the constant term b). Should linear regression be renamed to affine regression? It's probably too late for that :-), but it's good to get the terminology right.
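
The check is the same one we did for affine transformations; for b\neq 0:

\[y(x_1+x_2)=m(x_1+x_2)+b\neq y(x_1)+y(x_2)=m(x_1+x_2)+2b\]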

Similarly, a single fully connected layer in a neural network is often expressed mathematically as:

\[y(\vec{x})=W\vec{x}+\vec{b}\]

Where \vec{x} is the input vector, W is the weight matrix and \vec{b} is the bias vector. This function is also usually referred to as linear although it's actually affine.
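
Written out as a C sketch (the dimensions and names here are mine, for illustration), the layer is a loop of multiply-adds on top of the bias:

#include <stdio.h>

#define IN 3   // input dimension (illustrative)
#define OUT 2  // output dimension (illustrative)

// Compute y = W*x + b for a single fully connected layer.
void fc_layer(const double w[OUT][IN], const double x[IN],
              const double b[OUT], double y[OUT]) {
  for (int i = 0; i < OUT; ++i) {
    y[i] = b[i];  // the bias term is what makes the layer affine, not linear
    for (int j = 0; j < IN; ++j) {
      y[i] += w[i][j] * x[j];
    }
  }
}

int main() {
  const double w[OUT][IN] = {{1, 2, 3}, {4, 5, 6}};
  const double x[IN] = {1, 1, 1};
  const double b[OUT] = {0.5, -0.5};
  double y[OUT];
  fc_layer(w, x, b, y);
  printf("(%g, %g)\n", y[0], y[1]);  // prints (6.5, 14.5)
  return 0;
}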

Affine expressions and array accesses

Pivoting from algebra to programming, affine functions have a use when discussing one of the most fundamental building blocks of computer science: accessing arrays.

Let's start by defining an affine expression:

An expression is affine w.r.t. variables v_1,v_2,...,v_n if it can be expressed as c_0+c_{1}v_1+...+c_{n}v_n where c_0,c_1,...,c_n are constants.

Affine expressions are interesting because they are often used to index arrays in loops. Consider the following loop in C that copies all elements in an MxN matrix "one to the left":

for (int i = 0; i < M; ++i) {
  for (int j = 1; j < N; ++j) {
    // Shift each element one position to the left within its row;
    // column 0's original value is overwritten.
    arr[i][j-1] = arr[i][j];
  }
}

Since C's memory layout for multi-dimensional arrays is row-major, the statement in the loop assigns, at every iteration, to the element at linear offset i*N + j - 1 from the start of arr. i*N + j - 1 is an affine expression w.r.t. the variables i and j [6].
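
Matching the definition above, with i and j playing the role of the variables:

\[i \cdot N+j-1=\underbrace{(-1)}_{c_0}+\underbrace{N}_{c_1}\cdot i+\underbrace{1}_{c_2}\cdot j\]

Note that this treats N as a constant, which is exactly the caveat in [6].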

When all expressions in a loop are affine, the loop is amenable to some advanced analyses and optimizations, but this is a topic for another post.


[1]Though it's also not entirely precise. Generally speaking, transformations are more limited than functions. A transformation is defined on a set as a bijection of the set to itself, whereas functions are more general (they can map between different sets, for example).
[2]Finite-dimensional vector spaces with a defined basis.
[3]Tossing a bit of rigor aside, we can imagine points and vectors to be isomorphic, since both are represented by pairs of numbers on the \mathbb{R}^2 plane. Some resources will mention the Euclidean plane - \mathbb{E}^2 - when talking about points and lines, but the Euclidean plane can be modeled by the real plane of the same dimension, so I'll just be using \mathbb{R}^2.
[4]I'll admit this result looks fairly obvious. But longer chains of transforms work in exactly the same way, and the fact that we can represent such chains with a single matrix is very useful.
[5]This trick has a geometrical explanation: translation in 2D can be modeled as adding a dimension and performing a 3D shear operation, then projecting the resulting object onto a 2D plane again. The object will appear shifted.
[6]It's actually only affine if N is a compile-time constant or can be proven to be constant throughout the loop.