Tags Haskell

In this brief post I want to discuss a fairly unusual feature of Haskell - functions that can be parameterized by their return type.

Parametric vs. ad-hoc polymophism

It's worth beginning with a quick discussion of the two most common kinds of compile-time polymorphism present in Haskell: parametric polymophism and ad-hoc polymorphism.

Parametric polymorphism is possible when we can define a certain operation to work similarly on any type. A simple example is the list type [a], on which many operations are defined in a way that completely disregards what the actual type a is. For instance, the function length can find the length of the list without relying or caring about a. A slightly more interesting example is map, defined as follows for lists:

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

This will work on any list, regardless of what it contains - integers, other lists or some complicated user-defined type. The definition is written in a way that is oblivious to the actual type of a. This kind approach is commonly called generic programming; in C++, it's represented with templates.

This approach is sometimes limiting, however, because we may actually want functions to do something slightly different for every type (or at least for some types). This brings us to ad-hoc polymoprhism, which is represented by either function overloading or template specialization in C++.

Ad-hoc polymorphism in Haskell is achieved by using typeclasses. As an example, consider the built-in class Ord. If you define the <= operator for your type, it's then considered to comply with the Ord class and we can do some interesting things with it [1]. For example, we can implement merge-sorting as follows:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

msort :: Ord a => [a] -> [a]
msort [x] = [x]
msort xs = merge (msort left) (msort right)
  where
    (left, right) = (take halflen xs, drop halflen xs)
    halflen = length xs `div` 2

Note that msort works on a list of some type a, similarly to map; in this respect it's parametrically polymorphic, with a twist. The type constraint Ord a => tells the compiler that it's only legal to invoke msort on lists of type a if a implements the Ord class.

Most built-in Haskell types implement the Ord ord class, so we can use msort right away:

> msort [2,1,3,8,5]
[1,2,3,5,8]

However, if we want to use it on user-defined types, we'll need to implement the Ord class manually:

data Person = Person
  { lastName :: String
  , firstName :: String
  } deriving (Eq, Show)

-- Simple example of making Person sortable by defining Ord.
instance Ord Person where
  (Person lx fx) <= (Person ly fy) =
    if lx == ly
      then fx <= fy
      else lx <= ly

Note that this definition of <= relies on some semantic properties of the custom type (that last name is usually sorted before first name) that Haskell has no way of knowing. This is why ad-hoc is often necessary - per-type definitions describe a real-life domain in some way; for example, had Person had some unique ID it would be perhaps more correct to sort people by this ID.

Having done this, we can now sort a list of Persons:

> let folks = [(Person "Jones" "Jan"), (Person "Jones" "Areo"),
               (Person "Falcon" "Hugo"), (Person "Spearson" "Britney")]
> msort folks
[Person {lastName = "Falcon", firstName = "Hugo"},
 Person {lastName = "Jones", firstName = "Areo"},
 Person {lastName = "Jones", firstName = "Jan"},
 Person {lastName = "Spearson", firstName = "Britney"}]

It's important to reiterate that this example showcases both kinds of polymoprhism. msort is parametrically polymoprhic - it's the same code that will work for any type a, as long as a implements Ord. On the other hand, the <= operator is ad-hoc polymorphic - it works on different types, but each type should define its own version.

Return-type polymophism

Now that we have the above covered, it's time to turn to the main topic of this post - return-type polymophism. Combined with type inference, this is a fairly unique and cool aspect of Haskell, especially if you come from the C++ world where templates provide some degree of parametricity but it's limited in certain other ways.

Let's start with an example, the built-in function read:

read :: Read a => String -> a

The read function reads input from a string, which must be completely consumed
by the input process.

Something interesting is happening here: read is parameterized by type a, but this type is not one of its arguments; no, the argument of read is simply a String, which is a concrete type. a appears only in the return type.

Let's try to parse an integer by using read:

> read "1"

<interactive>:46:1:
    No instance for (Read a0) arising from a use of `read'
    The type variable `a0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    <...>

Oops, what went wrong? The issue here is that 1 could be of several types, and Haskell doesn't know which one to choose. We can fix that with an explicit type annotation:

> read "1" :: Int
1

But 1 can also be of other types:

> read "1" :: Double
1.0

So read can truly return multiple types [2], depending on how it's being called. This is return-type polymorphism.

Here's a more intriguing example:

> putStrLn (take (read "2") (read "\"haskell\""))
ha

Haskell didn't complain about read "2", even though no type annotation was provided. What's going on? The answer is type inference. Haskell looks at that code and sees the result of read "2" being fed into take, as its first argument. The type of take is Int -> [a] -> [a], so the result of read "2" is placed in an Int. This interaction of type inference with return-type polymophism is one of the most impressive features of Haskell, IMHO!

Now let's turn to a slightly more interesting example - monoids. I've mentioned the Monoid type class before, in the context of folds.

class Monoid a where

  The class of monoids (types with an associative binary operation that has an
  identity).

To create a new Monoid, we have to define two functions:

mempty :: a

  Identity of mappend

mappend :: a -> a -> a

  An associative operation

Note the type of mempty - it takes no arguments, but returns an arbitrary type a - return-type polymorphism. Here's how Haskell defines Monoid for lists:

instance Monoid [a] where
        mempty  = []
        mappend = (++)
        mconcat xss = [x | xs <- xss, x <- xs]

With this definition, we could use mempty for strings as follows:

> mempty :: String
""

For numbers, things are somewhat more interesting because there are two ways to define monoids on integers. One is using addition as the associative operation, another is using multiplication. In the former case, the zero element is 0, in the latter it's 1. Since there is no scenario which is clearly better than the other, Haskell doesn't define Monoid for Int:

> mempty :: Int

<interactive>:11:1:
    No instance for (Monoid Int) arising from a use of `mempty'
    Possible fix: add an instance declaration for (Monoid Int)

Rather, it adds two new types that wrap Int: Sum and Product. Here's an example:

> mempty :: Sum Int
Sum {getSum = 0}
> mappend (Sum 6) (Sum 7)
Sum {getSum = 13}

As before with read, note how mempty returns a different type based on what's expected from it. Type inference picks the right overload! In the following sample, type inference knows that mappend takes two arguments of the same type and the first one is a String, so it invokes the String-returning version of mempty:

> mappend "Foobar" mempty
"Foobar"

Similarly, here the Sum-returning version is invoked due to type inference:

> mappend (Sum 10) mempty
Sum {getSum = 10}

Providing multiple capabilities from the same function

I'll conclude with another cool example of return-type polymoprhism from the Haskell standard library: regular expression matching [3]. Haskell defines the RegexLike typeclass as an interface for regex-like matchers. It has the usual zoo of methods one can define (and if undefined, will use one another as the default implementation), with another class for the more generic usage:

-- | RegexContext is the polymorphic interface to do matching.  Since
-- 'target' is polymorphic you may need to suply the type explicitly
-- in contexts where it cannot be inferred.
-- <...>
class (RegexLike regex source) => RegexContext regex source target where
  match :: regex -> source -> target
  matchM :: (Monad m) => regex -> source -> m target

Feel free to dig in the source to see how it's all hooked up (a good example of building high-level abstractions with Haskell!), but match is wrapped in the =~ operator, which behaves polymophically based on the expected return type. In Bool context, is finds whether the match exists at all (this corresponds to the matchTest method of RegexLike):

> "january" =~ "an(ua)*" :: Bool
True

While in Int context, it counts the number of matches (this corresponds to matchCount):

> "january" =~ "an(ua)*" :: Int
1
> "january" =~ "a" :: Int
2

And so on... there are a few more possibilities (like returning the actual list of matches). Depending on your point of view, this is either extremely cool or scary, because sometimes the programmer has to perform type inference in their head to follow the path the compiler is taken, which can make code tricky to understand.


[1]This explanation over-simplifies a bit. The definition of Ord showcases other features of Haskell. First, Ord can only be defined on classes for which Eq is defined - this is enforced by the compiler. Second, Ord has default implementations for a bunch of functions so it suffices to define either <= or compare and get a bunch of others automatically provided.
[2]With different implementations for different types, of course. Parsing a Double from a string is different from parsing an Int.
[3]Inspired by this article by Matthew Manela - thanks, Matthew!