pthreads as a case study of good API design

The POSIX standard for threads (called pthreads) is available on most Unix-like systems these days [1], and it's probably the most popular threads API for programs written in C.

In my opinion, pthreads is a fine example of a great C API. Designing a good API, in any language, is something of an art. Arguably, the more abstraction the language allows, the better the APIs that can be created for it. By this line of thought, C is a language in which designing a good API is particularly difficult, because it doesn't provide many abstraction tools. However, as some APIs (pthreads among them) clearly demonstrate, designing a good C API is possible if you follow a few rules.

I don't claim to know all the rules, but here are a few pthreads got right.

Simplicity and orthogonality

The pthreads API is inherently simple. Not in the sense that it makes multi-threaded (MT) programming a breeze (I doubt this is possible), but in the sense that it provides everything that's needed to write MT programs, and only that. In other words, pthreads solves a single problem, and solves it well.

Simplicity and orthogonality lead to predictability. There's no duplication, no multiple ways of doing the same thing, which could create confusion. After you spend some time with the API and you need to use some part you've never used before, you just know where to look.

Consistent and logical naming

Speaking of knowing where to look - the importance of naming conventions cannot be overemphasized. This holds for programming in general, and for API design in particular. pthreads is great in this respect.

Types are named pthread_[type]_t (examples: pthread_t, pthread_cond_t, etc.)
Functions are called pthread_[type]_[action], with a few exceptions that are called pthread_[action] and pertain to the API as a whole rather than to a specific type.
Constants are named PTHREAD_[NAME]

As an example consider barriers. Suppose that you've just learned about them and are wondering how to clean them up. Having spent even a few hours with pthreads, you will without doubt immediately guess the correct function name (pthread_barrier_destroy), because the naming is so consistent. This is a simple example that saves perhaps a few seconds of looking up a function name, but it's important nevertheless, as each such experience leaves you with more confidence in the API.
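To make the naming point concrete, here is a minimal sketch of a barrier's whole life cycle. The worker count, the arrived array and the function names worker and run_barrier_demo are my own, made up for illustration; only the pthread_ calls come from the API. Note that barriers are an optional part of POSIX, so some platforms don't provide them.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4  /* hypothetical worker count, chosen for this sketch */

static pthread_barrier_t barrier;
static int arrived[NUM_WORKERS];

/* Each worker records its arrival, then waits for all the others. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    assert(id >= 0 && id < NUM_WORKERS);
    arrived[id] = 1;
    pthread_barrier_wait(&barrier);      /* pthread_[type]_[action] */
    return NULL;
}

/* Returns how many workers reached the barrier, or -1 on error. */
int run_barrier_demo(void)
{
    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int count = 0;

    if (pthread_barrier_init(&barrier, NULL, NUM_WORKERS) != 0)
        return -1;

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        arrived[i] = 0;
        /* A real program would also release and join the threads that
         * were already started before bailing out here. */
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            return -1;
    }

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            return -1;
        count += arrived[i];
    }

    /* The name you would guess is the name that exists. */
    if (pthread_barrier_destroy(&barrier) != 0)
        return -1;
    return count;
}
```

Calling run_barrier_demo() should report that all NUM_WORKERS workers passed the barrier. The part worth noticing is that init, wait and destroy all follow the same pattern as the equivalent mutex and condition variable calls.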

Opaque types

The types provided by pthreads are completely opaque. A type such as pthread_mutex_t reveals nothing of its implementation, and you can't even look up its definition without digging deep in the sources of the library.

Such opaqueness is great for encapsulation - an important concept in API design. Restrict what the user can do with your data and you won't get surprised by creative (ab)uses. APIs have to be restrictive - otherwise their abstractions will leak, which is dangerous.

A desirable corollary of this design is consistent memory management: new pthread objects are initialized with pthread_[type]_init functions [2] and cleaned up with pthread_[type]_destroy functions. These functions take pointers to pthread types and don't actually allocate or deallocate the objects themselves - only their contents.

This is the right approach, because:

The API knows best how to allocate the contents of its objects - the user doesn't even have to know what those contents are.
The user knows best how to allocate the objects themselves: in static storage, dynamically, or even on the stack for some uses. pthreads doesn't care - all it needs is a pointer to a valid object, through which the object can be initialized, interacted with, or destroyed.
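The division of labor described above can be sketched with a mutex placed in three different kinds of storage. The helper names exercise_mutex and storage_demo are hypothetical; the point is only that the same init/destroy pair works regardless of where the object lives.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Static storage: the object lives for the whole program. */
static pthread_mutex_t static_lock;

/* Initializes, locks, unlocks and destroys the mutex at m. The function
 * neither knows nor cares where m was allocated. Returns 0 on success,
 * or the first non-zero pthreads error code. */
static int exercise_mutex(pthread_mutex_t *m)
{
    assert(m != NULL);

    int rc = pthread_mutex_init(m, NULL);       /* sets up the contents only */
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(m);
    if (rc == 0)
        rc = pthread_mutex_unlock(m);

    int destroy_rc = pthread_mutex_destroy(m);  /* releases the contents only */
    return rc != 0 ? rc : destroy_rc;
}

/* Returns 0 if the mutex worked in all three kinds of storage. */
int storage_demo(void)
{
    int rc = exercise_mutex(&static_lock);      /* static storage */
    if (rc != 0)
        return rc;

    pthread_mutex_t stack_lock;                 /* automatic storage */
    rc = exercise_mutex(&stack_lock);
    if (rc != 0)
        return rc;

    pthread_mutex_t *heap_lock = malloc(sizeof *heap_lock);  /* dynamic storage */
    if (heap_lock == NULL)
        return -1;
    rc = exercise_mutex(heap_lock);
    free(heap_lock);                            /* the user frees the object itself */
    return rc;
}
```

Notice that free is called by the user, after pthread_mutex_destroy has already cleaned up whatever the library keeps inside the object.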

pthreads takes the opaqueness of its types very seriously. Consider the thread ID. When creating a new thread, pthread_create stores the ID in an object of the opaque type pthread_t, through a pointer that the user passes. The ID is also available to any thread by calling pthread_self. The user is not allowed to make any assumptions about this type [3]. Therefore, pthreads provides the pthread_equal function to compare two such IDs.
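Here is a small sketch of comparing thread IDs the portable way. The struct id_check and the function names are hypothetical, invented for this example. The new thread compares its own ID against its parent's, and the parent compares the ID stored by pthread_create against its own while the child is still joinable.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper type: carries the parent's ID into the child and
 * the comparison result back out. */
struct id_check {
    pthread_t parent;
    int child_is_parent;
};

/* Thread body: compares its own ID with the parent's. */
static void *compare_with_parent(void *arg)
{
    struct id_check *check = arg;
    assert(check != NULL);
    /* Use pthread_equal, never ==: pthread_t may be a struct on some systems. */
    check->child_is_parent = pthread_equal(pthread_self(), check->parent) != 0;
    return NULL;
}

/* Returns 1 if every comparison gives the expected answer, 0 if not,
 * and -1 on error. */
int thread_id_demo(void)
{
    struct id_check check = { pthread_self(), -1 };
    pthread_t child;

    if (pthread_create(&child, NULL, compare_with_parent, &check) != 0)
        return -1;

    /* The child has not been joined yet, so its ID is still valid here. */
    int child_differs = pthread_equal(child, pthread_self()) == 0;

    if (pthread_join(child, NULL) != 0)
        return -1;

    int parent_is_self = pthread_equal(check.parent, pthread_self()) != 0;
    return child_differs && check.child_is_parent == 0 && parent_is_self;
}
```

pthread_equal returns non-zero when the two IDs refer to the same thread and zero otherwise, which is why the results are normalized with != 0 and == 0 above.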

Attributes

This aspect is a bit trickier than the others, and unfortunately I haven't seen it used in a lot of other APIs, which is a shame, IMHO.

Non-trivial APIs frequently have large parameter lists for some functions, especially those dealing with creation and initialization. This is an unfortunate result of an unavoidable reality - complex APIs must be customizable. One of the best examples is perhaps the notorious Win32 CreateWindow function. 11 arguments! I bet that you can't remember their designation and order, unless you're Charles Petzold. Therefore, calls to CreateWindow are usually heavily commented to explain what is being passed and where [4]. This problem is especially acute with C, which has neither named arguments, nor default argument values.

To me, this is an example of an API designer being lazy at the expense of the user. It's probably the approach requiring the least amount of code for the API implementer - just shove all those arguments in a list, give them names, and voila - we have a function.

pthreads takes the opposite approach, favoring the user over the API implementer, by using opaque attribute objects.

An attribute object is exactly like any other pthreads object. The user allocates it, and then calls pthread_attr_init to initialize it and pthread_attr_destroy to clean it up (I'm focusing on attributes of threads here; there are also attributes of condition objects, and so on). A cursory count (don't catch me on this one, could be a couple more or a couple less) of thread attributes is 9. But pthread_create takes only 4 arguments (the thread object, an attribute object, the function to run in the thread and an argument to that function). This feat is accomplished through the use of an attribute object, which is an aggregate of all the attributes a user would want to set for a new thread.

Fine, I hear someone say, so pass in a struct full of attributes into the function instead of many arguments. pthreads takes a further step - the attributes object is also completely opaque. You set attributes with pthread_attr_set[name] and can retrieve them with pthread_attr_get[name].

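pthread_attr_t attr;
pthread_t my_thread;

pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 100000);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
pthread_attr_setguardsize(&attr, 1000);
pthread_create(&my_thread, &attr, thread_func, args);
pthread_attr_destroy(&attr);  /* no longer needed once the thread exists */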

Yes, it requires much more code from the API implementer. Yes, it even requires a bit more code from the user. However, complex parametrization of function calls with attributes is now completely self-explanatory. The user can see exactly what attributes are being set prior to calling a function - no comments are required. Code that is self-documenting in this manner is a worthy goal to strive for.
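The getter half of the attribute interface can be sketched the same way. The function attr_round_trip is a made-up name for this example; it sets two attributes and reads them back through the matching pthread_attr_get[name] calls, without ever touching the inside of pthread_attr_t.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Sets a stack size and a detach state on an attribute object, then reads
 * both back. Returns 1 if the getters report what the setters stored,
 * 0 if a call was rejected or the values differ, and -1 if the attribute
 * object could not be initialized. */
int attr_round_trip(size_t stack_size)
{
    pthread_attr_t attr;
    size_t got_size = 0;
    int got_state = -1;
    int ok = 0;

    assert(stack_size > 0);

    if (pthread_attr_init(&attr) != 0)
        return -1;

    /* Each setter can fail, for example when the requested stack is
     * smaller than the system minimum. */
    if (pthread_attr_setstacksize(&attr, stack_size) == 0 &&
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0 &&
        pthread_attr_getstacksize(&attr, &got_size) == 0 &&
        pthread_attr_getdetachstate(&attr, &got_state) == 0) {
        ok = (got_size == stack_size &&
              got_state == PTHREAD_CREATE_DETACHED);
    }

    pthread_attr_destroy(&attr);
    return ok;
}
```

For instance, attr_round_trip(1 << 20) asks for a 1 MiB stack, which I'd expect to be above the minimum on common systems. Because the setters return error codes, an invalid value is caught at the call that names the attribute, rather than somewhere inside a long argument list.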

Useful defaults

Not everything in pthreads favors explicitness over code size. For example, the default attributes used by pthread_create (when NULL is passed as the attribute pointer) are useful enough to be a perfectly valid default for most code.

Another example is exiting a thread. When the function running the thread returns, pthread_exit is implicitly called and the return value serves as the thread's exit status.
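Both defaults fit in one short sketch. The names double_it and run_with_defaults are hypothetical. The thread is created with NULL attributes, never calls pthread_exit explicitly, and its return value is picked up by pthread_join.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <pthread.h>

/* Thread body: doubles the int it was given and returns the same pointer.
 * There is no explicit pthread_exit call; returning is enough. */
static void *double_it(void *arg)
{
    int *n = arg;
    *n *= 2;
    return n;
}

/* Runs double_it in a thread with default attributes and returns the
 * doubled value, or -1 if the thread could not be created or joined. */
int run_with_defaults(int n)
{
    int value = n;
    pthread_t thread;
    void *status = NULL;

    /* NULL attributes: a joinable thread with the default stack size. */
    if (pthread_create(&thread, NULL, double_it, &value) != 0)
        return -1;

    /* status receives whatever the thread function returned. */
    if (pthread_join(thread, &status) != 0)
        return -1;

    assert(status == &value);
    return *(int *)status;
}
```

Here run_with_defaults(21) yields 42. The thread returns a pointer to a variable owned by the joining function, which stays alive until after the join, so the exit status is safe to dereference.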

Defaults are useful only when they make sense. It's perfectly OK to make some assumptions about the most common needs of the user, as long as it's well documented. As the saying goes, you should strive to make the easy things easy, and the difficult things possible.

Conclusion

I hope I've managed to convey some of my views on API design with this article. There are no fast recipes for great APIs. Rather, it is best to learn by example, both from good APIs and from bad APIs. In my humble opinion, pthreads is an example of a good design, for the reasons I've listed above, and perhaps a few more that I've missed.

I don't know if it can be considered a perfect API. Probably not - as I'm sure programmers more knowledgeable than I have found a few quirks with it. But overall, it can indeed serve as a good example.

[1]	There's even a Win32 port available.

[2]	Except for threads themselves, which are created with pthread_create. This makes sense, because pthread_create not only initializes the object, but also runs the actual thread. Hence, create is a more descriptive verb to use.

[3]	Although many users guess (correctly, on many platforms) that this is some kind of integral type, and print it out for debugging. This is not portable: on other systems pthread_t is a pointer or a struct.

[4]	That is, if you're lucky to be dealing with good code. In bad code they might not be commented at all, or worse, commented wrongly, which can cause a lot of grief and frequent MSDN counseling.