Variadic templates in C++



Prior to C++11, the only way to write functions that take an arbitrary number of arguments was to use variadic functions like printf, with the ellipsis syntax (...) and the accompanying va_ family of macros. If you've ever written code using this approach you know how cumbersome it is. In addition to being type unsafe (all type resolution has to be done explicitly with casts in va_arg, at runtime), it's also tricky to get right. The va_ macros perform low-level memory manipulation, and I've seen a lot of code that segfaults because it isn't using them carefully enough.

But what always bothered me most with this approach is leaving something that is clearly known at compile-time, to run-time. Yes, when we write a variadic function we don't know all the ways it's going to be used. But when the compiler puts the whole program together, it does know. It sees perfectly well all the invocations of the function throughout the program, and all the possible argument types it gets passed (types are, after all, resolved at compile-time in C++).

Variadic templates

One of the new features of C++11 is variadic templates. Finally, there's a way to write functions that take an arbitrary number of arguments in a type-safe way and have all the argument handling logic resolved at compile-time, rather than run-time. Variadic templates can be used for much more than just functions that take an arbitrary number of arguments; in this article I want to demonstrate some of these capabilities.

Basic example

Let's dive in, by implementing a function that adds all of its arguments together:

template<typename T>
T adder(T v) {
  return v;
}

template<typename T, typename... Args>
T adder(T first, Args... args) {
  return first + adder(args...);
}

And here are a couple of ways we could call it:

long sum = adder(1, 2, 3, 8, 7);

std::string s1 = "x", s2 = "aa", s3 = "bb", s4 = "yy";
std::string ssum = adder(s1, s2, s3, s4);

adder will accept any number of arguments, and will compile properly as long as it can apply the + operator to them. This checking is done by the compiler, at compile time. There's nothing magical about it - it follows C++'s usual template and overload resolution rules.

typename... Args is called a template parameter pack, and Args.. args is called a function parameter pack (Args is, of course, a completely arbitrary name and could be anything else). Variadic templates are written just the way you'd write recursive code - you need a base case (the adder(T v) declaration above) and a general case which "recurses" [1]. The recursion itself happens in the call adder(args...). Note how the general adder is defined - the first argument is peeled off the template parameter pack into type T (and accordingly, argument first). So with each call, the parameter pack gets shorter by one parameter. Eventually, the base case is encountered.

To get a better feel for the process, one can use the __PRETTY_FUNCTION__ macro [2]. If we insert the following as the first line in both versions of adder above:

std::cout << __PRETTY_FUNCTION__ << "\n";

And then execute adder(1, 2, 3, 8, 7), we'll see:

T adder(T, Args...) [T = int, Args = <int, int, int, int>]
T adder(T, Args...) [T = int, Args = <int, int, int>]
T adder(T, Args...) [T = int, Args = <int, int>]
T adder(T, Args...) [T = int, Args = <int>]
T adder(T) [T = int]

Some simple variations

When reading about C++ template meta-programming, one often hears about "pattern matching" and how this part of the language constitutes a fairly complete compile-time functional language.

The example shown above is very basic - template arguments are peeled off one by one until the base case is hit. Here's a somewhat more interesting display of pattern matching:

template<typename T>
bool pair_comparer(T a, T b) {
  // In real-world code, we wouldn't compare floating point values like
  // this. It would make sense to specialize this function for floating
  // point types to use approximate comparison.
  return a == b;
}

template<typename T, typename... Args>
bool pair_comparer(T a, T b, Args... args) {
  return a == b && pair_comparer(args...);
}

pair_comparer accepts any number of arguments and returns true if and only if they are pair-wise equal. The types are not enforced - everything that can be compared goes. For example:

pair_comparer(1.5, 1.5, 2, 2, 6, 6)

Returns true. But if we change the second argument to just 1, this won't compile since a double and int are not the same type.

More interestingly, pair_comparer will only work for an even number of arguments because they are peeled off in pairs and the base case compares two. The following:

pair_comparer(1.5, 1.5, 2, 2, 6, 6, 7)

Does not compile; the compiler complains that the base case expects 2 arguments but only 1 is provided. To fix this, we can add another variation of the function template:

template<typename T>
bool pair_comparer(T a) {
  return false;
}

Here, we force all odd-numbered sequences of arguments to return false, because when only a single argument is left this version is matched.

Note that pair_comparer forces both members of the compared pair to be of the exact same type. A simple variation would be to allow different types, as long as they can be compared. I'll leave this an an exercise to the interested reader.

Performance

If you're concerned with the performance of code that relies on variadic templates, worry not. As there's no actual recursion involved, all we have is a sequence of function calls pre-generated at compile-time. This sequence is, in practice, fairly short (variadic calls with more than 5-6 arguments are rare). Since modern compilers are aggressively inlining code, it's likely to end up being compiled to machine code that has absolutely no function calls. What you end up with, actually, is not unlike loop unrolling.

Compared to the C-style variadic functions, this is a marked win, because C-style variadic arguments have to be resolved at runtime. The va_ macros are literally manipulating the runtime stack. Therefore, variadic templates are often a performance optimization for variadic functions.

Type-safe variadic functions

I have mentioned printf in the beginning of the article, as an example of a variadic function that doesn't use templates. However, as we all know, printf and its kin are not type safe. If you pass a number into a %s format, bad things may happen and the compiler won't warn you about it [3].

It's pretty obvious how variadic templates enable us to write type safe functions. In the case of printf, when the implementation reaches a new formatting directive it can actually assert the type of the argument passed. This assertion won't fire at compile-time, but it will fire - and a nice error message can be generated instead of undefined behavior.

I will not discuss the implementation of a type-safe printf further - it has been rehashed many times already. For some good examples see Stroustrup's new edition of "The C++ Programming Language", or Alexandrescu's "Variadic templates are funadic" talk.

Varidic data structures

This use-case is much more interesting, IMHO, because it was something that just wasn't possible prior to introduction of C++11, at least without considerable hackery.

Custom data structures (structs since the times of C and classes in C++) have compile-time defined fields. They can represent types that grow at runtime (std::vector, for example) but if you want to add new fields, this is something the compiler has to see. Variadic templates make it possible to define data structures that could have an arbitrary number of fields, and have this number configured per use. The prime example of this is a tuple class, and here I want to show how to construct one [4].

Let's start with the type definition:

template <class... Ts> struct tuple {};

template <class T, class... Ts>
struct tuple<T, Ts...> : tuple<Ts...> {
  tuple(T t, Ts... ts) : tuple<Ts...>(ts...), tail(t) {}

  T tail;
};

We start with the base case - the definition of a class template named tuple, which is empty. The specialization that follows peels off the first type from the parameter pack, and defines a member of that type named tail. It also derives from the tuple instantiated with the rest of the pack. This is a recursive definition that stops when there are no more types to peel off, and the base of the hierarchy is an empty tuple. To get a better feel for the resulting data structure, let's use a concrete example:

tuple<double, uint64_t, const char*> t1(12.2, 42, "big");

Ignoring the constructor, here's a pseudo-trace of the tuple structs created:

struct tuple<double, uint64_t, const char*> : tuple<uint64_t, const char*> {
  double tail;
}

struct tuple<uint64_t, const char*> : tuple<const char*> {
  uint64_t tail;
}

struct tuple<const char*> : tuple {
  const char* tail;
}

struct tuple {
}

The layout of data members in the original 3-element tuple will be:

[const char* tail, uint64_t tail, double tail]

Note that the empty base consumes no space, due to empty base optimization. Using Clang's layout dump feature, we can verify this:

*** Dumping AST Record Layout
   0 | struct tuple<double, unsigned long, const char *>
   0 |   struct tuple<unsigned long, const char *> (base)
   0 |     struct tuple<const char *> (base)
   0 |       struct tuple<> (base) (empty)
   0 |       const char * tail
   8 |     unsigned long tail
  16 |   double tail
     | [sizeof=24, dsize=24, align=8
     |  nvsize=24, nvalign=8]

Indeed, the size of the data structure and the internal layout of members is as expected.

So, the struct definition above lets us create tuples, but there's not much else we can do with them yet. The way to access tuples is with the get function template [5], so let's see how it works. First, we'll have to define a helper type that lets us access the type of the k-th element in a tuple:

template <class T, class... Ts>
struct elem_type_holder<0, tuple<T, Ts...>> {
  typedef T type;
};

template <size_t k, class T, class... Ts>
struct elem_type_holder<k, tuple<T, Ts...>> {
  typedef typename elem_type_holder<k - 1, tuple<Ts...>>::type type;
};

elem_type_holder is yet another variadic class template. It takes a number k and the tuple type we're interested in as template parameters. Note that this is a compile-time template metaprogramming construct - it acts on constants and types, not on runtime objects. For example, given elem_type_holder<2, some_tuple_type>, we'll get the following pseudo expansion:

struct elem_type_holder<2, tuple<T, Ts...>> {
  typedef typename elem_type_holder<1, tuple<Ts...>>::type type;
}

struct elem_type_holder<1, tuple<T, Ts...>> {
  typedef typename elem_type_holder<0, tuple<Ts...>>::type type;
}

struct elem_type_holder<0, tuple<T, Ts...>> {
  typedef T type;
}

So the elem_type_holder<2, some_tuple_type> peels off two types from the beginning of the tuple, and sets its type to the type of the third one, which is what we need. Armed with this, we can implement get:

template <size_t k, class... Ts>
typename std::enable_if<
    k == 0, typename elem_type_holder<0, tuple<Ts...>>::type&>::type
get(tuple<Ts...>& t) {
  return t.tail;
}

template <size_t k, class T, class... Ts>
typename std::enable_if<
    k != 0, typename elem_type_holder<k, tuple<T, Ts...>>::type&>::type
get(tuple<T, Ts...>& t) {
  tuple<Ts...>& base = t;
  return get<k - 1>(base);
}

Here, enable_if is used to select between two template overloads of get - one for when k is zero, and one for the general case which peels off the first type and recurses, as usual with variadic function templates.

Since it returns a reference, we can use get to both read tuple elements and write to them:

tuple<double, uint64_t, const char*> t1(12.2, 42, "big");

std::cout << "0th elem is " << get<0>(t1) << "\n";
std::cout << "1th elem is " << get<1>(t1) << "\n";
std::cout << "2th elem is " << get<2>(t1) << "\n";

get<1>(t1) = 103;
std::cout << "1th elem is " << get<1>(t1) << "\n";

Variadic templates for catch-all functions

Here is another example I find interesting. It's different from the ones already shown in the article, because it doesn't really use the traditional recursive approach of implementing variadic templates. Rather, it uses them to express the "any template parameters can go here" concept.

Say we want to write a function that can print out standard library containers. We want it to work for any container, and we also want the user to type as little as possible, so we don't want to act on iterators. We just want print_container(c) to work for any container c. Here's a first approach:

template <template <typename, typename> class ContainerType,
          typename ValueType,
          typename AllocType>
void print_container(const ContainerType<ValueType, AllocType>& c) {
  for (const auto& v : c) {
    std::cout << v << ' ';
  }
  std::cout << '\n';
}

Many of the STL containers are templates that can be parameterized by the value type and an allocator type; for example vector, list, deque, and so on. So we can write:

std::vector<double> vd{3.14, 8.1, 3.2, 1.0};
print_container(vd);

std::list<int> li{1, 2, 3, 5};
print_container(li);

And this works as expected. However, if we try to use it for map, we get a compile error:

std::map<std::string, int> msi{{"foo", 42}, {"bar", 81}, {"bazzo", 4}};
print_container(msi);
^~~~~~~~~~~~~~~
error: no matching function for call to 'print_container'
note: candidate template ignored: substitution failure :
      template template argument has different template
      parameters than its corresponding template template parameter

This is because map is a template parameterized by 4 template arguments, not 2. The same problem would occur for a set, which has 3 template arguments. This is annoying - while the contents of the print_container function would be the same for all these containers, the signature has to be different. What can we do without duplicating code? Variadic templates for the rescue:

template <template <typename, typename...> class ContainerType,
          typename ValueType, typename... Args>
void print_container(const ContainerType<ValueType, Args...>& c) {
  for (const auto& v : c) {
    std::cout << v << ' ';
  }
  std::cout << '\n';
}

What this says is - ContainerType is a template template parameter with any amount of template parameters itself. We don't care really, as long as the compiler can type-deduce them at the call. This version of the function will work for map, set, unordered_map and other containers [6]. One small addition we have to make to support mappings is:

// Implement << for pairs: this is needed to print out mappings where range
// iteration goes over (key, value) pairs.
template <typename T, typename U>
std::ostream& operator<<(std::ostream& out, const std::pair<T, U>& p) {
  out << "[" << p.first << ", " << p.second << "]";
  return out;
}

Variadic templates for forwarding

A somewhat related example is templates that don't do much on their own, but have to forward all their arguments to some other template or function. This turns out to be very useful because C++ has a commonly used construct that is inherently "variadic" when viewed from a template parameter point of view - the constructor. Given a generic type T, to invoke the constructor of T, we may need to pass in an arbitrary number of arguments. Unlike function types that specify their arguments at compile time, given just a generic type T we don't know which constructor(s) it has and how many arguments the constructor accepts.

A very important example of this is the std::make_unique function, available in the standard library since C++14. We want to be able to use it as follows:

std::unique_ptr<FooType> f = std::make_unique<FooType>(1, "str", 2.13);

FooType is an arbitrary type and can be constructed in arbitrary ways. How does make_unique know the signature of its constructor? With variadic templates, it doesn't have to know! Here's how make_unique is typically implemented:

template<typename T, typename... Args>
unique_ptr<T> make_unique(Args&&... args)
{
    return unique_ptr<T>(new T(std::forward<Args>(args)...));
}

Ignore the && syntax and std::forward for now; I will cover them in a future article. What's important for the sake of our current discussion is the use of a variadic template to convey "any amount of arguments can go here" and passing them through to the constructor of c in the new expression.


SFINAE and enable_if



There's an interesting issue one has to consider when mixing function overloading with templates in C++. The problem with templates is that they are usually overly inclusive, and when mixed with overloading, the result may be surprising:

void foo(unsigned i) {
  std::cout << "unsigned " << i << "\n";
}

template <typename T>
void foo(const T& t) {
  std::cout << "template " << t << "\n";
}

What do you think a call to foo(42) would print? The answer is "template 42", and the reason for this is that integer literals are signed by default (they only become unsigned with the U suffix). When the compiler examines the overload candidates to choose from for this call, it sees that the first function needs a conversion, while the second one matches perfectly, so that is the one it picks [1].

When the compiler looks at overload candidates that are templates, it has to actually perform substitution of explicitly specified or deduced types into the template arguments. This doesn't always result in sensical code, as the following example demonstrates; while artificial, it's representative of a lot of generic code written in modern C++:

int negate(int i) {
  return -i;
}

template <typename T>
typename T::value_type negate(const T& t) {
  return -t();
}

Consider a call to negate(42). It will pick up the first overload and return -42. However, while looking for the best overload, all candidates have to be considered. When the compiler considers the templated negate, it substitutes the deduced argument type of the call (int in this case) into the template, and comes up with the declaration:

int::value_type negate(const int& t);

This code is invalid, of course, since int has no member named value_type. So one could ask - should the compiler fail and emit an error message in this case? Well, no. If it did, writing generic code in C++ would be very difficult. In fact, the C++ standard has a special clause for such cases, explaining exactly how a compiler should behave.

SFINAE

In the latest draft of the C++11 standard, the relevant section is 14.8.2; it states that when a substitution failure, such as the one shown above, occurs, type deduction for this particular type fails. That's it. There's no error involved. The compiler simply ignores this candidate and looks at the others.

In the C++ folklore, this rule was dubbed "Substitution Failure Is Not An Error", or SFINAE.

The standard states:

If a substitution results in an invalid type or expression, type deduction fails. An invalid type or expression is one that would be ill-formed if written using the substituted arguments. Only invalid types and expressions in the immediate context of the function type and its template parameter types can result in a deduction failure.

And then goes on to list the possible scenarios that are deemed invalid, such as using a type that is not a class or enumeration type in a qualified name, attempting to create a reference to void, and so on.

But wait, what does it mean by the last sentence about "immediate context"? Consider this (non-sensical) example:

template <typename T>
void negate(const T& t) {
  typename T::value_type n = -t();
}

If type deduction matches this overload for some fundamental type, we'll actually get a compile error due to the T::value_type inside the function body. This is outside of the "immediate context of the function type and its template parameter types" mentioned by the standard. The lesson here is that if we want to write a template that only makes sense for some types, we must make it fail deduction for invalid types right in the declaration, to cause substitution failure. If the invalid type sneaks past the overload candidate selection phase, the program won't compile.

enable_if - a compile-time switch for templates

SFINAE has proved so useful that programmers started to explicitly rely on it very early on in the history of C++. One of the most notable tools used for this purpose is enable_if. It can be defined as follows:

template <bool, typename T = void>
struct enable_if
{};

template <typename T>
struct enable_if<true, T> {
  typedef T type;
};

And now we can do things like:

template <typename T>
void do_stuff(typename enable_if<std::is_integral<T>::value, T>::type &t) {
  // an implementation for integral types (int, char, unsigned, etc.)
}

template <typename T>
void do_stuff(typename enable_if<std::is_class<T>::value, T>::type &t) {
  // an implementation for class types
}

Note SFINAE at work here. When we make the call do_stuff(25), the compiler selects the first overload: since the condition std::is_integral<int> is true, the specialization of struct enable_if for true is used, and its internal type is set to int. The second overload is omitted because without the true specialization (std::is_class<int> is false) the general form of struct enable_if is selected, and it doesn't have a type, so the type of the argument results in a substitution failure.

enable_if has been part of Boost for many years, and since C++11 it's also in the standard C++ library as std::enable_if. Its usage is somewhat verbose though, so C++14 adds this type alias for convenience:

template <bool B, typename T = void>
using enable_if_t = typename enable_if<B, T>::type;

With this, the examples above can be rewritten a bit more succinctly:

template <typename T>
void do_stuff(std::enable_if_t<std::is_integral<T>::value, T> &t) {}

template <typename T>
void do_stuff(std::enable_if_t<std::is_class<T>::value, T> &t) {}

Uses of enable_if

enable_if is an extremely useful tool. There are hundreds of references to it in the C++11 standard template library. It's so useful because it's a key part in using type traits, a way to restrict templates to types that have certain properties. Without enable_if, templates are a rather blunt "catch-all" tool. If we define a function with a template argument, this function will be invoked on all possible types. Type traits and enable_if let us create different functions that act on different kinds of types, while still remaining generic [2].

One usage example I like is the two-argument constructor of std::vector:

// Create the vector {8, 8, 8, 8}
std::vector<int> v1(4, 8);

// Create another vector {8, 8, 8, 8}
std::vector<int> v2(std::begin(v1), std::end(v1));

// Create the vector {1, 2, 3, 4}
int arr[] = {1, 2, 3, 4, 5, 6, 7};
std::vector<int> v3(arr, arr + 4);

There are two forms of the two-argument constructor used here. Ignoring allocators, this is how these constructors could be declared:

template <typename T>
class vector {
    vector(size_type n, const T val);

    template <class InputIterator>
    vector(InputIterator first, InputIterator last);

    ...
}

Both constructors take two arguments, but the second one has the catch-all property of templates. Even though the template argument InputIterator has a descriptive name, it has no semantic meaning - the compiler wouldn't mind if it was called ARG42 or T. The problem here is that even for v1, the second constructor would be invoked if we didn't do something special. This is because the type of 4 is int rather than size_t. So to invoke the first constructor, the compiler would have to perform a type conversion. The second constructor would fit perfectly though.

So how does the library implementor avoid this problem and make sure that the second constructor is only called for iterators? By now we know the answer - with enable_if.

Here is how the second constructor is really defined:

template <class _InputIterator>
vector(_InputIterator __first,
       typename enable_if<__is_input_iterator<_InputIterator>::value &&
                          !__is_forward_iterator<_InputIterator>::value &&
                          ... more conditions ...
                          _InputIterator>::type __last);

It uses enable_if to only enable this overload for types that are input iterators, though not forward iterators. For forward iterators, there's a separate overload, because the constructors for these can be implemented more efficiently.

As I mentioned, there are many uses of enable_if in the C++11 standard library. The string::append method has a very similar use to the above, since it has several overloads that take two arguments and a template overload for iterators.

A somewhat different example is std::signbit, which is supposed to be defined for all arithmetic types (integer or floating point). Here's a simplified version of its declaration in the cmath header:

template <class T>
typename std::enable_if<std::is_arithmetic<T>, bool>::type
signbit(T x)
{
    // implementation
}

Without using enable_if, think about the options the library implementors would have. One would be to overload the function for each of the known arithmetic type. That's very verbose. Another would be to just use an unrestricted template. But then, had we actually passed a wrong type into it, say std::string, we'd most likely get a fairly obscure error at the point of use. With enable_if, we neither have to write boilerplate, nor to produce bad error messages. If we invoke std::signbit as defined above with a bad type we'll get a fairly helpful error saying that a suitable function cannot be found.

A more advanced version of enable_if

Admittedly, std::enable_if is clumsy, and even enable_if_t doesn't help much, though it's a bit less verbose. You still have to mix it into the declaration of a function in a way that often obscures the return type or an argument type. This is why some sources online suggest crafting more advanced versions that "get out of the way". Personally, I think this is the wrong tradeoff to make.

std::enable_if is a rarely used construct. So making it less verbose doesn't buy us much. On the other hand, making it more mysterious is detrimental, because every time we see it we have to think about how it works. The implementation shown here is fairly simple, and I'd keep it this way. Finally I'll note that the C++ standard library uses the verbose, "clumsy" version of std::enable_if without defining more complex versions. I think that's the right decision.


[1]If we had an overload for int, however, this is the one that would be picked, because in overload resolution non-templates are preferred over templates.
[2]Think of it as a mid-way between overloading and templates. C++ has another tool to implement something similar - runtime polymorphism. Type traits let us do that at compile time, without incurring any runtime cost.

Summary of reading: July - September 2014



  • "The Autobiography of Benjamin Franklin" by Benjamin Franklin (audiobook) - this book is surprisingly good, especially the first 2/3rds of it or so. Franklin was a very inspiring individual, worth learning from.
  • "Abundance: The Future Is Better Than You Think" by Peter Diamanidis and Steven Kotler - it's hard to avoid comparing this book to Matt Ridley's "Rational Optimist", if only because the authors quote from Ridley quite a bit. IMHO it's a strictly worse book - the interesting parts can all be found in Ridley's book, just explained better. This is not to say that it's a bad book; I certainly enjoyed parts of it. But if you only have to pick one book on this subject, "Rational Optimist" should be your choice.
  • "Creation: How Science is Reinventing Life Itself" by Adam Rutherford (audiobook) - very interesting book focusing on the biologic origins of life and genetic engineering. The first 2/3rds of the book are superb. The part that came later - about the ethical debate surrounding genetic engineering - I found too polemic and somewhat less interesting. I'd hope for more dive-ins into the latest scientific discoveries. The author does pick that up in the afterword that's actually a long and interesting chapter, describing more of the latest research.
  • "The Philadelphia chromosome" by Jessica Wapner (audiobook) - this is definitely one of the best books I've read this year. A very well put-together account of the research leading to a revolutionary treatment for CML - a type of leukemia. The treatment is one of the first drugs targeted specifically at the core cause of the cancer at the molecular level. Most of the book deals with the basic biological and genetic research, the slow scientific advances during decades, that enabled the drug. It also talks about the clinical trial process of new drugs, FDA approvals, and so on. Very highly recommended for anyone with an interest in biology & genetics.

Re-reads:

  • "La Soledad de los Numeros Primos" by Paolo Giordano