Covariance and contravariance in subtyping

Many programming languages support subtyping, a kind of polymorphism that lets us define hierarchical relations on types, with specific types being subtypes of more generic types. For example, a Cat could be a subtype of Mammal, which itself is a subtype of Vertebrate.

Intuitively, functions that accept any Mammal would accept a Cat too. More formally, this is known as the Liskov substitution principle:

Let \phi (x) be a property provable about objects x of type T. Then \phi (y) should be true for objects y of type S where S is a subtype of T.

A shorter way to say S is a subtype of T is S <: T. The relation <: is also sometimes expressed as \le, and can be thought of as "is less general than". So Cat <: Mammal and Mammal <: Vertebrate. Naturally, <: is transitive, so Cat <: Vertebrate; it's also reflexive, as T <: T for any type T [1].

Kinds of variance in subtyping

Variance refers to how subtyping between composite types (e.g. list of Cats versus list of Mammals) relates to subtyping between their components (e.g. Cats and Mammals). Let's use the general Composite<T> to refer to some composite type with components of type T.

Given types S and T with the relation S <: T, variance is a way to describe the relation between the composite types:

  • Covariant means the ordering of component types is preserved: Composite<S> <: Composite<T>.
  • Contravariant means the ordering is reversed: Composite<T> <: Composite<S> [2].
  • Bivariant means both covariant and contravariant.
  • Invariant means neither covariant nor contravariant.

That's a lot of theory and rules right in the beginning; the following examples should help clarify all of this.

Covariance in return types of overriding methods in C++

In C++, when a subclass method overrides a similarly named method in a superclass, their signatures have to match. There is an important exception to this rule, however. When the original return type is B* or B&, the return type of the overriding function is allowed to be D* or D& respectively, provided that D is a public subclass of B. This rule is important to implement methods like Clone:

struct Mammal {
  virtual ~Mammal() = 0;
  virtual Mammal* Clone() = 0;

struct Cat : public Mammal {
  virtual ~Cat() {}

  Cat* Clone() override {
    return new Cat(*this);

struct Dog : public Mammal {
  virtual ~Dog() {}

  Dog* Clone() override {
    return new Dog(*this);

And we can write functions like the following:

Mammal* DoSomething(Mammal* m) {
  Mammal* cloned = m->Clone();
  // Do something with cloned
  return cloned;

No matter what the concrete run-time class of m is, m->Clone() will return the right kind of object.

Armed with our new terminology, we can say that the return type rule for overriding methods is covariant for pointer and reference types. In other words, given Cat <: Mammal we have Cat* <: Mammal*.

Being able to replace Mammal* by Cat* seems like a natural thing to do in C++, but not all typing rules are covariant. Consider this code:

struct MammalClinic {
  virtual void Accept(Mammal* m);

struct CatClinic : public MammalClinic {
  virtual void Accept(Cat* c);

Looks legit? We have general MammalClinics that accept all mammals, and more specialized CatClinics that only accept cats. Given a MammalClinic*, we should be able to call Accept and the right one will be invoked at run-time, right? Wrong. CatClinic::Accept does not actually override MammalClinic::Accept; it simply overloads it. If we try to add the override keyword (as we should always do starting with C++11):

struct CatClinic : public MammalClinic {
  virtual void Accept(Cat* c) override;

We'll get:

error: ‘virtual void CatClinic::Accept(Cat*)’ marked ‘override’, but does not override
   virtual void Accept(Cat* c) override;

This is precisely what the override keyword was created for - help us find erroneous assumptions about methods overriding other methods. The reality is that function overrides are not covariant for pointer types. They are invariant. In fact, the vast majority of typing rules in C++ are invariant; std::vector<Cat> is not a subclass of std::vector<Mammal>, even though Cat <: Mammal. As the next section demonstrates, there's a good reason for that.

Covariant arrays in Java

Suppose we have PersianCat <: Cat, and some class representing a list of cats. Does it make sense for lists to be covariant? On initial thought, yes. Say we have this (pseudocode) function:

MakeThemMeow(List<Cat> lst) {
    for each cat in lst {

Why shouldn't we be able to pass a List<PersianCat> into it? After all, all persian cats are cats, so they can all meow! As long as lists are immutable, this is actually safe. The problem appears when lists can be modified. The best example of this problem can be demonstrated with actual Java code, since in Java array constructors are covariant:

class Main {
  public static void main(String[] args) {
    String strings[] = {"house", "daisy"};
    Object objects[] = strings; // covariant

    objects[1] = "cauliflower"; // works fine
    objects[0] = 5;             // throws exception

In Java, String <: Object, and since arrays are covariant, it means that String[] <: Object[], which makes the assignment on the line marked with "covariant" type-check successfully. From that point on, objects is an array of Object as far as the compiler is concerned, so assigning anything that's a subclass of Object to its elements is kosher, including integers [3]. Therefore the last line in main throws an exception at run-time:

Exception in thread "main" java.lang.ArrayStoreException: java.lang.Integer
    at Main.main(

Assigning an integer fails because at run-time it's known that objects is actually an array of strings. Thus, covariance together with mutability makes array types unsound. Note, however, that this is not just a mistake - it's a deliberate historical decision made when Java didn't have generics and polymorphism was still desired; the same problem exists in C# - read this for more details.

Other languages have immutable containers, which can then be made covariant without jeopardizing the soundness of the type system. For example in OCaml lists are immutable and covariant.

Contravariance for function types

Covariance seems like a pretty intuitive concept, but what about contravariance? When does it make sense to reverse the subtyping relation for composite types to get Composite<T> <: Composite<S> for S <: T?

An important use case is function types. Consider a function that takes a Mammal and returns a Mammal; in functional programming the type of this function is commonly referred to as Mammal -> Mammal. Which function types are valid subtypes of this type?

Here's a pseudo-code definition that makes it easier to discuss:

func user(f : Mammal -> Mammal) {
  // do stuff with 'f'

Can we call user providing it a function of type Mammal -> Cat as f? Inside its body, user may invoke f and expect its return value to be a Mammal. Since Mammal -> Cat returns cats, that's fine, so this usage is safe. It aligns with our earlier intuition that covariance makes sense for function return types.

Note that passing a Mammal -> Vertebrate function as f doesn't work as well, because user expects f to return Mammals, but our function may return a Vertebrate that's not a Mammal (maybe a Bird). Therefore, function return types are not contravariant.

But what about function parameters? So far we've been looking at function types that take Mammal - an exact match for the expected signature of f. Can we call user with a function of type Cat -> Mammal? No, because user expects to be able to pass any kind of Mammal into f, not just Cats. So function parameters are not covariant. On the other hand, it should be safe to pass a function of type Vertebrate -> Mammal as f, because it can take any Mammal, and that's what user is going to pass to it. So contravariance makes sense for function parameters.

Most generally, we can say that Vertebrate -> Cat is a subtype of Mammal -> Mammal, because parameters types are contravariant and return types are covariant. A nice quote that can help remember these rules is: be liberal in what you accept and conservative in what you produce.

This is not just theory; if we go back to C++, this is exactly how function types with std::function behave:

#include <functional>

struct Vertebrate {};
struct Mammal : public Vertebrate {};
struct Cat : public Mammal {};

Cat* f1(Vertebrate* v) {
  return nullptr;

Vertebrate* f2(Vertebrate* v) {
  return nullptr;

Cat* f3(Cat* v) {
  return nullptr;

void User(std::function<Mammal*(Mammal*)> f) {
  // do stuff with 'f'

int main() {
  User(f1);       // works

  return 0;

The invocation User(f1) compiles, because f1 is convertible to the type std::function<Mammal*(Mammal*)> [4]. Had we tried to invoke User(f2) or User(f3), they would fail because neither f2 nor f3 are proper subtypes of std::function<Mammal*(Mammal*)>.


So far we've seen examples of invariance, covariance and contravariance. What about bivariance? Recall, bivariance means that given S <: T, both Composite<S> <: Composite<T> and Composite<T> <: Composite<S> are true. When is this useful? Not often at all, it turns out.

In TypeScript, function parameters are bivariant. The following code compiles correctly but fails at run-time:

function trainDog(d: Dog) { ... }
function cloneAnimal(source: Animal, done: (result: Animal) => void): void { ... }
let c = new Cat();

// Runtime error here occurs because we end up invoking 'trainDog' with a 'Cat'
cloneAnimal(c, trainDog);

Once again, this is not because the TypeScript designers are incompetent. The reason is fairly intricate and explained on this page; the summary is that it's needed to help the type-checker treat functions that don't mutate their arguments as covariant for arrays.

That said, in TypeScript 2.6 this is being changed with a new strictness flag that treats parameters only contravariantly.

Explicit variance specification in Python type-checking

If you had to guess which of the mainstream languages has the most advanced support for variance in their type system, Python probably wouldn't be your first guess, right? I admit it wasn't mine either, because Python is dynamically (duck) typed. But the new type hinting support (described in PEP 484 with more details in PEP 483) is actually fairly advanced.

Here's an example:

class Mammal:

class Cat(Mammal):

def count_mammals_list(seq : List[Mammal]) -> int:
    return len(seq)

mlst = [Mammal(), Mammal()]

If we run mypy type-checking on this code, it will succeed. count_mammals_list takes a list of Mammals, and this is what we passed in; so far, so good. However, the following will fail:

clst = [Cat(), Cat()]

Because List is not covariant. Python doesn't know whether count_mammals_list will modify the list, so allowing calls with a list of Cats is potentially unsafe.

It turns out that the typing module lets us express the variance of types explicitly. Here's a very minimal "immutable list" implementation that only supports counting elements:

T_co = TypeVar('T_co', covariant=True)

class ImmutableList(Generic[T_co]):
    def __init__(self, items: Iterable[T_co]) -> None:
        self.lst = list(items)

    def __len__(self) -> int:
        return len(self.lst)

And now if we define:

def count_mammals_ilist(seq : ImmutableList[Mammal]) -> int:
    return len(seq)

We can actually invoke it with a ImmutableList of Cats, and this will pass type checking:

cimmlst = ImmutableList([Cat(), Cat()])

Similarly, we can support contravariant types, etc. The typing module also provides a number of useful built-ins; for example, it's not really necessary to create an ImmutableList type, as there's already a Sequence type that is covariant.

[1]In most cases <: is also antisymmetric, making it a partial order, but in some cases it isn't; for example, structs with permuted fields can be considered subtypes of each other (in most languages they aren't!) but such subtyping is not antisymmetric.
[2]These terms come from math, and a good rule of thumb to remember how they apply is: co means together, while contra means against. As long as the composite types vary together (in the same direction) as their component types, they are co-variant. When they vary against their component types (in the reverse direction), they are contra-variant.
[3]Strictly speaking, integer literals like 5 are primitives in Java and not objects at all. However, due to autoboxing, this is equivalent to wrapping the 5 in Integer prior to the assignment.
[4]Note that we're using pointer types here. The same example would work with std::function<Mammal(Mammal)> and corresponding f1 taking and returning value types. It's just that in C++ value types are not very useful for polymorphism, so pointer (or reference) values are much more commonly used.

Go hits the concurrency nail right on the head

More than thirteen years have passed since Herb Sutter declared that the free lunch is over and concurrency is upon us, and yet it's hard to claim that most mainstream languages have made a strong shift towards concurrent modes of programming. We have to admit that concurrency is just hard, and the struggles of some of the world's leading programming languages bear witness to this challenge.

Unfortunately, most languages didn't yet move past the threads vs. asynchronous dichotomy. You either use threads, or a single-threaded event loop decorated with a bunch of bells and whistles to make code more palatable. Mixing threads with event loops is possible, but so complicated that few programmers can afford the mental burden for their applications.

Threads aren't a bad thing in languages that have good library support for them, and their scalability is much better than it used to be a decade ago, but for very high levels of concurrency (~100,000 threads and above) they are still inadequate. On the other hand, event-driven programming models are usually single-threaded and don't make good use of the underlying HW. More offensively, they significantly complicate the programming model. I've enjoyed Bob Nystrom's What Color is Your Function for explaining how annoying the model of "non-blocking only here, please" is. The core idea is that in the asynchronous model we have to mentally note the blocking nature of every function, and this affects where we can call it from.

Python took the plunge with asyncio, which is so complicated that many Python luminaries admit they don't understand it, and of course it suffers from the "function color" problem as well, where any blocking call can ruin your day. C++ seems to be going in a similar direction with the coroutines proposal for C++20, but C++ has much less ability to hide magic from users than Python, so I predict it will end up with a glorious soup of templates that even fewer will be able to understand. The fundamental issue here is that both Python and C++ try to solve this problem on a library level, when it really needs a language runtime solution.

What Go does right

As you've probably guessed from this article's title, this brings us to Go. I'm happy to go on record claiming that Go is the mainstream language that gets this really right. And it does so by relying on two key principles in its core design:

  1. Seamless light-weight preemptive concurrency across cores
  2. CSP and sharing by communicating

These two principles are implemented very well in Go, and in unison make concurrent programming in it the best experience, by far, compared to other popular programming languages today. The main reason for this is that they are both implemented in the language runtime, rather than being delegated to libraries.

You can think of goroutines as threads, it's a fairly good mental model. They are truly cheap threads - because the Go runtime implements launching them and switching between them without deferring to the OS kernel. In a recent post I've measured goroutine switching time to be ~170 ns on my machine, 10x faster than thread switching time.

But it's not just the switching time; goroutines also have small stacks that can grow at run-time (something thread stacks cannot do), which is also carefully tuned to be able to run millions of goroutines simultaneously.

There's no magic here; consider this claim - if threads in C++ or JS or Python were extremely lightweight and fast, we wouldn't need async models. Well, this is the case with Go. As Bob Nystrom says in his post - Go has eliminated the distinction between synchronous and asynchronous code.

That's not all, however. The second principle is critical too. The main objections to threads are not just about performance, there's also correctness issues and complexity. Programming with threads is hard - it's hard to synchronize access to data structures without causing deadlocks; it's hard to reason about multiple threads accessing the same data, it's hard to choose the right locking granularity, etc.

And this is where Go's sharing by communicating principle comes in. In idiomatic Go programs you won't see a lot of mutexes, condition variables and critical areas protecting shared data. In fact, you probably won't see much locking at all. This is because Go encourages programmers to use channels instead, and channels are built into the language, with awesome features like select, and so on. Proper use of channels removes the need for more explicit locking, is easier to write correctly, tune for performance, and debug.

Moreover, building these capabilities into the runtime, Go can implement great tools like the race detector, which make concurrency bugs easier to smoke out. It all just fits together so nicely! Obviously many challenges of concurrent programming remain in Go - these are the essential complexities of the problem that no language is likely to remove; Go does a great job at removing the incidental complexities, though.

For these reasons, I believe Ryan Dahl - the creator of Node.js - is absolutely right when he says:

[...] if you’re building a server, I can’t imagine using anything other than Go. [...] I think Node is not the best system to build a massive server web. I would use Go for that. And honestly, that’s the reason why I left Node. It was the realization that: oh, actually, this is not the best server-side system ever.

Different languages are good for different things, which is why programmers should have several sufficiently different languages in their arsenal. If concurrency is central to your application, Go is the language to use.

Partial and Total Orders

Imagine a set of 2D rectangles of different sizes; let's assume for the sake of simplicity that no two rectangles in this set have exactly the same size. Here is a sample set:

Five boxes of different sizes

We'll say that box X fits inside box Y if we could physically enclose X inside Y; in other words, if Y's dimensions are larger than X's. In this example:

  • Box A can fit inside box B, but not the other way around
  • E can fit inside all other boxes, but no other box can fit inside it
  • A, B, D, E can fit inside C, which itself cannot fit in any of the other boxes
  • D cannot fit inside A or B; neither can A or B fit inside D

As we're going to see soon, in this case "fits" is a partial order on a set of 2D rectangular boxes, because even though we can order some of the boxes relative to each other, some other pairs of boxes have no relative order among themselves (for example A and D).

If all pairs of boxes in this set had relative ordering - for example, consider the set without box D - we could define a total order on the set. Another example for this is a set of 2D squares (rather than rectangles); as long as all the squares in the set have unique sizes [1], we can always define a total order on them because for any pair of squares either the first can fit in the second, or vice versa.

Mathematical definition of relations

To develop a mathematically sound approach to ordering, we'll have to dip our feet into set theory and relations. We'll only be talking about binary relations here.

Given a set A, a relation on A is a set of pairs with elements taken from A. A bit more rigorously, given that A\times A is the set containing all possible ordered pairs taken from A (a.k.a. the Cartesian product of A), then R is a relation on A if it's a subset of A\times A, or R\subseteq A\times A.

For example, given the set A=\{1,2,3\}, then:

\[A\times A=\{\left(1,1\right),\left(1,2\right),\left(1,3\right),\left(2,1\right),\left(2,2\right),\left(2,3\right),\left(3,1\right),\left(3,2\right),\left(3,3\right)\}\]

Note that we explicitly defined the pairs to be ordered, meaning that (1,2) and (2,1) are two distinct elements in this set.

By definition, any subset of A\times A is a relation on A. For example R=\{\left(1,1\right),\left(2,2\right),\left(3,3\right)\}. In programming, we often use the term predicate to express a similar idea. A predicate is a function with a binary outcome, and the correspondence to relations is trivial - we just say that all pairs belonging to the relation satisfy the predicate, and vice versa. If we defined a predicate R(x,y) to be true if and only if x==y, we'd get the relation above.

A shortcut notation that will make definitions cleaner: we say xRy when \left(x,y\right)\in R. In our example set 1R1, 2R2 and 3R3. This notation is a bit awkward, but it's the accepted standard in math; therefore I'm using it for consistency with other sources.

Besides, it becomes nicer when R is an operator. If we redefine R as ==, it becomes more natural: 1==1, 2==2, 3==3. The equality relation is a perfectly valid relation on a set - its elements are all the pairs where both members are the same value.

Properties of relations

There are a number of useful properties relations could have. Here are just a few that we'll need for the rest of the article; for a longer list, see the Wikipedia page.

Reflexive: every element in the set is related to itself, or \forall x\in A, xRx. The == relation shown above is reflexive.

Irreflexive: no element in the set is related to itself, or \neg\exists x\in A, xRx. For example if we define the < less than relation on numbers, it's irreflexive since no number is less than itself. In our boxes example, the "fits in" relation is irreflexive because no box can fit inside itself.

Transitive: intuitively, "if x fits inside y, and y fits inside z, then x fits inside z". Mathematically \forall x,y,z \in A, \left(xRy \wedge yRz \right )\rightarrow xRz. The < relation on numbers is obviously transitive.

Symmetric: if x is related to y, then y is related to x. This might sound obvious with the colloquial meaning of "related", but not in the mathematical sense. Most relations we deal with aren't symmetric. The definition is \forall x,y \in A, xRy \rightarrow yRx. For example, the relation == is symmetric, but < is not symmetric.

Antisymmetric: if x is related to y, then y is not related to x unless x and y are the same element; mathematically \forall x,y \in A, \left(xRy \wedge yRx \right ) \rightarrow x=y. For example, the relation \le (less than or equal) is antisymmetric; if x \le y and also y \le x then it must be that x and y are the same number. The relation < is also antisymmetric, though in the empty sense because we won't be able to find any pair x and y to satisfy the left side of the definition; in logic this is called vacuously.

Partial order

There are two kinds of partial orders we can define - weak and strong. The weak partial order is the more common one, so let's start with that. Whenever I'm saying just "partial order", I'll mean a weak partial order.

A weak partial order (a.k.a. non-strict) is a relation on a set A that is reflexive, transitive and antisymmetric. The \le relation on numbers is a classical example:

  • It is reflexive because for any number x we have x\le x
  • It is transitive because given x\le y and y\le z, we know that x\le z
  • It is antisymmetric because given x\le y and y \le x, we know that x and y are the same number

A strong partial order (a.k.a. strict) is a relation on a set A that is irreflexive, transitive and antisymmetric. The difference between weak and strong partial orders is reflexivity. In weak partial orders, every element is related to itself; in strong partial orders, no element is related to itself. The operator < on numbers is an example of strict partial order, since it satisfies all the properties; while \le is reflexive, < is irreflexive.

Our rectantular boxes with the "fits" relation is a good example to distinguish between the two. We can only define a strong partial order on them, because a box cannot fit inside itself.

Another good example is a morning dressing routine. The set of clothes to wear is {underwear, pants, jacket, shirt, left sock, right sock, left shoe, right shoe}, and the relation is "has to be worn before". The following drawing encodes the relation:

Partial ordering of dressing different clothes; what comes before what

This kind of drawing is called a Hasse diagram, which is useful to graphically represent partially ordered sets [2]; the arrow represents the relation. For example, the arrow from "pants" to "left shoe" encodes that pants have to be worn before the left shoe.

Note that this relation is irreflextive, because it's meaningless to say that "pants have to be worn before wearing pants". Therefore, the relation defines a strong partial order on the set.

Similarly to the rectangular boxes example, the partial order here lets us order only some of the elements in the set w.r.t. each other. Some elements like socks and a shirt don't have an order defined.

Total order

A total order is a partial order that has one additional property - any two elements in the set should be related. Mathematically:

\[\forall x\in A\forall y\in A, \left(xRy \vee yRx \right )\]

While a partial order lets us order some elements in a set w.r.t. each other, total order requires us to be able to order all elements in a set. In the boxes example, we can't define a total order for rectangular boxes (there is not "fits in" relation between boxes A and D, no matter which way we try). We can define a total order between square boxes, however, as long as their sizes are unique.

Neither can we define a total order for the dressing diagram shown above, because we can't say either "left socks have to be worn before shirts" or "shirts have to be worn before left socks".

Examples from programming

Partial and total orders frequently come up in programming, especially when thinking about sorts. Sorting an array usually implies finding some total order on its elements. Tie breaking is important, but not always possible. If there is no way to tell two elements apart, we cannot mathematically come up with a total order, but we can still sort (and we do have a weak partial order). This is where the distinction between regular and stable sorts comes in.

Sometimes we're sorting non-linear structures, like dependency graphs in the dressing example from above. In these cases a total order is impossible, but we do have a partial order which can be useful to find a "valid" dressing order - a linear sequence of dressing steps that wouldn't violate any constraints. This can be done with topological sorting which finds a valid "linearization" of the dependency graph.

[1]You may notice that saying "unique" when talking about sets can sound superfluous; after all, sets are defined to have distinct elements. That said, it's not clear what "distinct" means. In our case, distinct can refer to the complete identities of the boxes; for example, two boxes can have the exact same dimensions but different colors - so they are not the same as far as the set is concerned. Moreover, in programming identity is further moot and can be defined for specific types in specific ways. For these reasons I'm going to call out uniqueness explicitly to avoid confusion.
[2]A partially ordered set with R (or poset with R) is a set with a relation R that is a partial order on it.