Testing multiple implementations of a trait in Rust

I've been hacking on a small practice project in Rust where I implement the same data structure in several different ways. When testing this project, I want to run exactly the same set of tests on several types that implement the same trait.

As a demonstrative example, let's take the following trait:

pub trait Calculator {
    fn new() -> Self;
    fn add(&self, a: u32, b: u32) -> u32;
}

A straightforward implementation could be Foo:

pub struct Foo {}

impl Calculator for Foo {
    fn new() -> Self {
        Self {}
    }

    fn add(&self, a: u32, b: u32) -> u32 {
        a + b
    }
}

Or, if you enjoy the Peano axioms, a somewhat more involved implementation could be Bar:

pub struct Bar {}

impl Calculator for Bar {
    fn new() -> Self {
        Self {}
    }

    fn add(&self, a: u32, b: u32) -> u32 {
        if b == 0 {
            a
        } else {
            self.add(a, b - 1) + 1
        }
    }
}

Our task is to write the same set of tests once, and invoke it on both Foo and Bar with as little boilerplate as possible. Let's examine several approaches for doing this [1].

Straightforward trait-based testing

The most basic approach to testing our types would be something like:

#[cfg(test)]
mod tests {
    use crate::calculator::{Bar, Calculator, Foo};

    fn trait_tester<C: Calculator>() {
        let c = C::new();
        assert_eq!(c.add(2, 3), 5);
        assert_eq!(c.add(10, 43), 53);
    }

    #[test]
    fn test_foo() {
        trait_tester::<Foo>();
    }

    #[test]
    fn test_bar() {
        trait_tester::<Bar>();
    }
}

The trait_tester function can be invoked on any type that implements the Calculator trait and can host a collection of tests. "Concrete" test functions like test_foo then call trait_tester; the concrete test functions are what the Rust testing framework sees because they're marked with the #[test] attribute.

On the surface, this approach seems workable; looking deeper, however, there is a serious issue.

Suppose we want to write multiple test functions that test different features and usages of our Calculator. We could add trait_tester_feature1, trait_tester_feature2, etc. Then, the concrete test functions would look something like:

#[test]
fn test_foo() {
    trait_tester::<Foo>();
    trait_tester_feature1::<Foo>();
    trait_tester_feature2::<Foo>();
}

#[test]
fn test_bar() {
    trait_tester::<Bar>();
    trait_tester_feature1::<Bar>();
    trait_tester_feature2::<Bar>();
}

Taken to the limit, there's quite a bit of repetition here. In a realistic project the number of tests can easily run into the dozens.

The problem doesn't end here, though. In Rust, the unit of testing is test_foo, not the trait_tester* functions. This means that only test_foo will show up in the test report, and there's no easy way to run only trait_tester_feature1, say. Moreover, test parallelization can only happen between #[test] functions.

The fundamental issue is this: what we really want is to mark each of the trait_tester* functions with #[test], but that doesn't work directly. A #[test] function takes no arguments and cannot be generic, so the compiler has to know at compile time which concrete type each #[test] function exercises.

Thankfully, Rust has just the tool for generating code at compile time.

First attempt with macros

Macros can help us generate functions tagged with #[test] at compile time. Let's try this:

macro_rules! calculator_tests {
    ($($name:ident: $type:ty,)*) => {
    $(
        #[test]
        fn $name() {
            let c = <$type>::new();
            assert_eq!(c.add(2, 3), 5);
            assert_eq!(c.add(10, 43), 53);
        }
    )*
    }
}

#[cfg(test)]
mod tests {
    use crate::calculator::{Bar, Calculator, Foo};

    calculator_tests! {
        foo: Foo,
        bar: Bar,
    }
}

The calculator_tests macro generates multiple #[test]-tagged functions, one per type. If we run cargo test, we'll see that the Rust testing framework recognizes and runs them:

[...]
test typetest::tests::bar ... ok
test typetest::tests::foo ... ok
[...]

However, there's an issue: how do we add more testing functions per type, as discussed previously? If only we could do something like fn ${name}_feature1 to name a function. This is tricky because declarative macros have no stable built-in way to concatenate identifiers, though procedural macros like the ones in the paste crate can help; see this code for an example.

In any case, I believe there's a better solution.

Second attempt with macros

Instead of encoding the type variant in the function name, we can use a Rust sub-module:

macro_rules! calculator_tests {
    ($($name:ident: $type:ty,)*) => {
    $(
        mod $name {
            use super::*;

            #[test]
            fn test() {
                let c = <$type>::new();
                assert_eq!(c.add(2, 3), 5);
                assert_eq!(c.add(10, 43), 53);
            }
        }
    )*
    }
}

#[cfg(test)]
mod tests {
    use crate::calculator::{Bar, Calculator, Foo};

    calculator_tests! {
        foo: Foo,
        bar: Bar,
    }
}

Now all functions are named test, but they're namespaced inside a module with a configurable name. And yes, now we can easily add more testing functions:

macro_rules! calculator_tests {
    ($($name:ident: $type:ty,)*) => {
    $(
        mod $name {
            use super::*;

            #[test]
            fn test() {
                let c = <$type>::new();
                assert_eq!(c.add(2, 3), 5);
                assert_eq!(c.add(10, 43), 53);
            }

            #[test]
            fn test_feature1() {
                let c = <$type>::new();
                assert_eq!(c.add(6, 9), 15);
            }
        }
    )*
    }
}

If we run cargo test, it works as expected:

test typetestmod::tests::bar::test ... ok
test typetestmod::tests::bar::test_feature1 ... ok
test typetestmod::tests::foo::test_feature1 ... ok
test typetestmod::tests::foo::test ... ok

Each test has its own full path, and is invoked separately. We can select which tests to run from the command line: running only the tests for Bar, say, or running all the feature1 tests for all types. Also notice that the test names are reported "out of order"; this is because they are all run concurrently!

To conclude, with some macro hackery the goal is achieved. We can now write any number of tests in a generic way, and invoke all these tests on multiple types with minimal duplication - just one extra line per type [2].

It's not all perfect, though. Macros add a layer of indirection, and it leaks into the error messages. If one of the assert_eq! invocations fails, the reported line is the point of macro instantiation, which is the same line for all tests of any given type. Re-running the test with RUST_BACKTRACE=1 helps find which of the asserts failed, since it appears in the trace.
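One possible way to soften this, offered as a sketch rather than something taken from the post's code: keep the assertions in ordinary generic functions, as in the first approach, and let the macro generate only thin #[test] wrappers inside per-type modules. Since the assert_eq! invocations then live outside the macro body, a failure should be reported at the assertion's own line. The helper names check_basic_add and check_feature1 are hypothetical.

```rust
pub trait Calculator {
    fn new() -> Self;
    fn add(&self, a: u32, b: u32) -> u32;
}

pub struct Foo {}

impl Calculator for Foo {
    fn new() -> Self {
        Self {}
    }

    fn add(&self, a: u32, b: u32) -> u32 {
        a + b
    }
}

pub struct Bar {}

impl Calculator for Bar {
    fn new() -> Self {
        Self {}
    }

    fn add(&self, a: u32, b: u32) -> u32 {
        if b == 0 {
            a
        } else {
            self.add(a, b - 1) + 1
        }
    }
}

// Hypothetical helpers: plain generic functions that hold the assertions.
// They are only called from tests, so a non-test build may warn about
// unused code.
pub fn check_basic_add<C: Calculator>() {
    let c = C::new();
    assert_eq!(c.add(2, 3), 5);
    assert_eq!(c.add(10, 43), 53);
}

pub fn check_feature1<C: Calculator>() {
    let c = C::new();
    assert_eq!(c.add(6, 9), 15);
}

// The macro only generates one module per type, with a thin #[test]
// wrapper around each generic helper.
macro_rules! calculator_tests {
    ($($name:ident: $type:ty,)*) => {
    $(
        mod $name {
            use super::*;

            #[test]
            fn test() {
                check_basic_add::<$type>();
            }

            #[test]
            fn test_feature1() {
                check_feature1::<$type>();
            }
        }
    )*
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    calculator_tests! {
        foo: Foo,
        bar: Bar,
    }
}
```

The cost is one wrapper per helper inside the macro, so adding a test means touching two places instead of one. Whether that trade is worth it depends on how often the tests fail in confusing ways.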

[1]	The full source code for this post can be found on GitHub.

[2]	Sharp-eyed readers will note that with this approach the common trait isn't actually needed at all! Macros work by textual substitution (AST substitution, to be precise), so the generated code creates a value of a concrete type and invokes its methods. The macro-based tests would work even if Foo and Bar didn't declare themselves as implementing the Calculator trait.
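To make the footnote concrete, here is a minimal sketch in which the two types share no trait and only happen to expose inherent new and add methods with matching signatures. The names Plain, Wrapping and adder_tests are made up for this illustration and don't appear in the post's code.

```rust
// Hypothetical types: neither implements a shared trait.
pub struct Plain;

impl Plain {
    pub fn new() -> Self {
        Plain
    }

    pub fn add(&self, a: u32, b: u32) -> u32 {
        a + b
    }
}

pub struct Wrapping;

impl Wrapping {
    pub fn new() -> Self {
        Wrapping
    }

    pub fn add(&self, a: u32, b: u32) -> u32 {
        a.wrapping_add(b)
    }
}

// Same shape as the module-based macro above. Nothing here mentions a
// trait; <$type>::new() and c.add(..) resolve to the inherent methods.
macro_rules! adder_tests {
    ($($name:ident: $type:ty,)*) => {
    $(
        mod $name {
            use super::*;

            #[test]
            fn test() {
                let c = <$type>::new();
                assert_eq!(c.add(2, 3), 5);
                assert_eq!(c.add(10, 43), 53);
            }
        }
    )*
    }
}

#[cfg(test)]
mod tests {
    use super::{Plain, Wrapping};

    adder_tests! {
        plain: Plain,
        wrapping: Wrapping,
    }
}
```

The flip side is that nothing checks the method signatures up front: a type with a mismatched add would only surface as a compile error inside the expanded macro.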