pthreads as a case study of good API design

April 5th, 2010 at 7:51 am

The POSIX standard for threads (called pthreads) is available on most Unix-like systems these days [1], and it’s probably the most popular threads API for programs written in C.

In my opinion, pthreads is a fine example of a great C API. Designing a good API, in any language, is something of an art. Arguably, the more abstraction a language allows, the better the APIs that can be created for it. By that reasoning, C is a language in which designing a good API is particularly difficult, because it doesn’t provide many abstraction tools. However, as some APIs (pthreads among them) clearly demonstrate, designing a good C API is possible if you follow a few rules.

I don’t claim to know all the rules, but here are a few pthreads got right.

Simplicity and orthogonality

The pthreads API is inherently simple. Not in the sense that it makes multi-threaded (MT) programming a breeze (I doubt this is possible), but in the sense that it provides everything that’s needed to write MT programs, and only that. In other words, pthreads solves a single problem, and solves it well.

Simplicity and orthogonality lead to predictability. There’s no duplication, no multiple ways of doing the same thing, which could create confusion. After you spend some time with the API and you need to use some part you’ve never used before, you just know where to look.

Consistent and logical naming

Speaking of knowing where to look – the importance of naming conventions cannot be overemphasized. Naming matters in programming in general, and in API design in particular. pthreads is great in this respect.

  • Types are named pthread_[type]_t (examples: pthread_t, pthread_cond_t, etc.)
  • Functions are called pthread_[type]_[action], with a few exceptions that are named pthread_[action] and pertain to the API as a whole rather than to a specific type.
  • Constants are named PTHREAD_[NAME]

As an example consider barriers. Suppose that you’ve just learned about them and are wondering how to clean them up. Having spent even a few hours with pthreads, you will without doubt immediately guess the correct function name (pthread_barrier_destroy), because the naming is so consistent. This is a simple example that saves perhaps a few seconds of looking up a function name, but it’s important nevertheless, as each such experience leaves you with more confidence in the API.
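To make the naming scheme concrete, here is a minimal sketch of a barrier's whole life cycle. The function name barrier_lifecycle is made up for illustration, and the barrier is created with a count of 1 so that a single thread can pass through it without blocking. Keep in mind that barriers are an optional part of POSIX, so not every platform provides them.

```c
/* Ask for POSIX.1-2008 declarations, which include barriers. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical helper: returns 1 if the caller was the "serial" thread,
 * 0 if it was not, and -1 if any pthreads call failed. */
int barrier_lifecycle(void)
{
    pthread_barrier_t barrier;              /* type:     pthread_[type]_t        */
    int rc;
    int was_serial;

    /* function: pthread_[type]_[action]; NULL selects default attributes */
    if (pthread_barrier_init(&barrier, NULL, 1) != 0)
        return -1;

    /* With a count of 1 the wait completes immediately. */
    rc = pthread_barrier_wait(&barrier);

    /* constant: PTHREAD_[NAME]; exactly one waiter per cycle receives it */
    was_serial = (rc == PTHREAD_BARRIER_SERIAL_THREAD);

    /* The name you would have guessed. */
    if (pthread_barrier_destroy(&barrier) != 0)
        return -1;

    return was_serial;
}
```

On a system that implements barriers, calling barrier_lifecycle() should return 1, since the only waiter is also the serial thread.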

Opaque types

The types provided by pthreads are completely opaque. A type such as pthread_mutex_t reveals nothing of its implementation, and you can’t even look up its definition without digging deep in the sources of the library.

Such opaqueness is great for encapsulation – an important concept in API design. Restrict what the user can do with your data and you won’t get surprised by creative (ab)uses. APIs have to be restrictive – otherwise their abstractions will leak, which is dangerous.

A desirable corollary of this design is consistent memory management: new pthread objects are initialized with pthread_[type]_init functions [2] and cleaned up with pthread_[type]_destroy functions. These functions take pointers to pthread types and don’t allocate or deallocate the objects themselves, only their contents.

This is the right approach, because:

  1. The API knows best how to allocate the contents of its objects – the user doesn’t even have to know what those contents are.
  2. The user knows best how to allocate the objects themselves. They may choose to place them in static storage, allocate them dynamically, or even put them on the stack for some uses. pthreads doesn’t care – all it needs is a pointer to a valid object, through which the object can be initialized, interacted with, or destroyed.
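The split of responsibilities described above can be sketched with a mutex placed in three different kinds of storage. The helper names use_mutex and storage_demo are mine, not part of pthreads; the point is only that the same init/destroy pair works no matter where the object lives.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t static_mutex;        /* lives in static storage */

/* Hypothetical helper: initialize, use and destroy a mutex through a pointer.
 * pthreads manages the contents; the caller owns the storage. */
static int use_mutex(pthread_mutex_t *m)
{
    if (pthread_mutex_init(m, NULL) != 0)
        return -1;
    pthread_mutex_lock(m);
    pthread_mutex_unlock(m);
    return pthread_mutex_destroy(m) == 0 ? 0 : -1;
}

/* Returns 0 if all three mutexes went through their life cycle cleanly. */
int storage_demo(void)
{
    pthread_mutex_t stack_mutex;                                /* on the stack */
    pthread_mutex_t *heap_mutex = malloc(sizeof *heap_mutex);   /* on the heap  */
    int rc;

    if (heap_mutex == NULL)
        return -1;

    rc = use_mutex(&static_mutex) | use_mutex(&stack_mutex) | use_mutex(heap_mutex);

    free(heap_mutex);   /* the caller frees what the caller allocated */
    return rc;
}
```

Note that pthread_mutex_destroy never frees the object itself, which is why the free call stays in the caller's hands.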

pthreads takes the opaqueness of its types very seriously. Consider the thread ID. When creating a new thread, pthread_create stores its ID in an object of the opaque type pthread_t, through a pointer that the user passes. The ID is also available to any thread by calling pthread_self. The user is not allowed to make any assumptions about this type [3]. Therefore, pthreads provides the pthread_equal function to compare two such IDs.
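Here is a small sketch of comparing thread IDs without peeking inside pthread_t. The struct probe type and the function names are assumptions made for this example. The IDs are compared only while the threads they refer to are still alive, since an ID stops being valid once its thread has been joined.

```c
#include <pthread.h>

/* Hypothetical record shared between the creating thread and its child. */
struct probe {
    pthread_t creator;          /* ID of the thread that filled in the probe */
    int same_as_creator;        /* set by the child                          */
};

static void *child(void *arg)
{
    struct probe *p = arg;
    /* Never compare pthread_t values with ==; use pthread_equal. */
    p->same_as_creator = pthread_equal(pthread_self(), p->creator) != 0;
    return NULL;
}

/* Returns 1 if the IDs compare as expected, 0 if not, -1 on error. */
int thread_id_demo(void)
{
    struct probe p = { .creator = pthread_self(), .same_as_creator = -1 };
    pthread_t tid;
    int child_is_me;
    int i_am_me;

    if (pthread_create(&tid, NULL, child, &p) != 0)
        return -1;

    /* tid is valid here because the child has not been joined yet. */
    child_is_me = pthread_equal(tid, pthread_self()) != 0;

    if (pthread_join(tid, NULL) != 0)
        return -1;

    i_am_me = pthread_equal(p.creator, pthread_self()) != 0;

    return i_am_me && !child_is_me && !p.same_as_creator;
}
```

Calling thread_id_demo() should return 1: the creating thread matches itself, and the child matches neither.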

Attributes

This aspect is a bit trickier than the others, and unfortunately I haven’t seen it used in a lot of other APIs, which is a shame, IMHO.

Non-trivial APIs frequently have large parameter lists for some functions, especially those dealing with creation and initialization. This is an unfortunate result of an unavoidable reality – complex APIs must be customizable. One of the best examples is perhaps the notorious Win32 CreateWindow function. 11 arguments! I bet that you can’t remember their designation and order, unless you’re Charles Petzold. Therefore, calls to CreateWindow are usually heavily commented to explain what is being passed and where [4]. This problem is especially acute with C, which has neither named arguments, nor default argument values.

To me, this is an example of an API designer being lazy at the expense of the user. It’s probably the approach requiring the least amount of code for the API implementer – just shove all those arguments in a list, give them names, and voilà – we have a function.

pthreads takes the opposite approach, favoring the user over the API implementer, by using opaque attribute objects.

An attribute object is exactly like any other pthreads object. The user allocates it, and then calls pthread_attr_init to initialize it and pthread_attr_destroy to clean it up (I’m focusing on attributes of threads here; there are also attributes of condition variables, and so on). A cursory count (don’t hold me to this one, it could be a couple more or a couple less) of thread attributes is 9. But pthread_create takes only 4 arguments (the thread object, an attribute object, the function to run in the thread and an argument to that function). This feat is accomplished through the use of an attribute object, which is an aggregate of all the attributes a user would want to set for a new thread.

Fine, I hear someone say, so pass a struct full of attributes into the function instead of many arguments. pthreads takes a further step – the attributes object is also completely opaque. You set attributes with pthread_attr_set[name] and can retrieve them with pthread_attr_get[name].

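pthread_attr_t attr;
pthread_attr_init(&attr);                                      /* start from the defaults */
pthread_attr_setstacksize(&attr, 100000);                      /* in bytes */
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
pthread_attr_setguardsize(&attr, 1000);                        /* in bytes */
pthread_create(&my_thread, &attr, thread_func, args);
pthread_attr_destroy(&attr);                                   /* the running thread is not affected */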

Yes, it requires much more code from the API implementer. Yes, it even requires a bit more code from the user. However, complex parametrization of function calls with attributes is now completely self-explanatory. The user can see exactly what attributes are being set prior to calling a function – no comments are required. Code that is self-documenting in this manner is a worthy goal to strive for.
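The getter side of the attribute object follows the same pattern. The sketch below sets two attributes and reads them back; the function name attr_roundtrip and the 1 MiB stack size are arbitrary choices for illustration (the size just needs to be at least PTHREAD_STACK_MIN on your system).

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the values read back from the attribute
 * object match the values that were set, 0 if not, -1 if init failed. */
int attr_roundtrip(void)
{
    const size_t wanted_stack = 1024 * 1024;    /* 1 MiB, an arbitrary example */
    pthread_attr_t attr;
    size_t stack = 0;
    int detach = -1;
    int ok;

    if (pthread_attr_init(&attr) != 0)
        return -1;

    /* Each setter and getter returns 0 on success. */
    ok = pthread_attr_setstacksize(&attr, wanted_stack) == 0
      && pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0
      && pthread_attr_getstacksize(&attr, &stack) == 0
      && pthread_attr_getdetachstate(&attr, &detach) == 0
      && stack == wanted_stack
      && detach == PTHREAD_CREATE_DETACHED;

    pthread_attr_destroy(&attr);
    return ok;
}
```

Because the object is opaque, this set/get pair is the only way in or out, which leaves the library free to change how the attributes are stored.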

Useful defaults

pthreads doesn’t always favor explicitness over code size. For example, the default attributes used by pthread_create (when NULL is passed as the attribute pointer) are useful enough to be a perfectly valid default for most code.

Another example is exiting a thread. When the function running the thread returns, pthread_exit is implicitly called and the return value serves as the thread’s exit status.
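Both defaults can be seen in a short sketch. The threads below are created with NULL attributes; one returns from its function and the other calls pthread_exit explicitly, and pthread_join should see the same exit status either way. All function names and the value 42 are made up for the example.

```c
#include <pthread.h>
#include <stdint.h>

/* Returning from the thread function implicitly calls pthread_exit. */
static void *returns_value(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* The explicit form of the same thing. */
static void *exits_explicitly(void *arg)
{
    (void)arg;
    pthread_exit((void *)(intptr_t)42);
    return NULL;    /* not reached */
}

/* Hypothetical helper: run fn in a thread with default attributes and
 * return its exit status as an integer, or -1 on error. */
static intptr_t run_and_join(void *(*fn)(void *))
{
    pthread_t thread;
    void *status = NULL;

    if (pthread_create(&thread, NULL, fn, NULL) != 0)   /* NULL: default attributes */
        return -1;
    if (pthread_join(thread, &status) != 0)
        return -1;
    return (intptr_t)status;
}

/* Returns 1 if both ways of exiting produced the same status. */
int exit_status_demo(void)
{
    return run_and_join(returns_value) == 42
        && run_and_join(exits_explicitly) == 42;
}
```

Calling exit_status_demo() should return 1.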

Defaults are useful only when they make sense. It’s perfectly OK to make some assumptions about the most common needs of the user, as long as it’s well documented. As the saying goes, you should strive to make the easy things easy, and the difficult things possible.

Conclusion

I hope I’ve managed to convey some of my views on API design with this article. There are no fast recipes for great APIs. Rather, it is best to learn by example, both from good APIs and from bad APIs. In my humble opinion, pthreads is an example of a good design, for the reasons I’ve listed above, and perhaps a few more that I’ve missed.

I don’t know if it can be considered a perfect API. Probably not – as I’m sure programmers more knowledgeable than I have found a few quirks with it. But overall, it can indeed serve as a good example.

[1] There’s even a Win32 port available.
[2] Except for threads themselves, which are created with pthread_create. This makes sense, because pthread_create not only initializes the object, but also runs the actual thread. Hence, create is a more descriptive verb to use.
[3] Although on many systems it is some kind of integral type, and users often print it out for debugging.
[4] That is, if you’re lucky to be dealing with good code. In bad code they might not be commented at all, or worse, commented wrongly, which can cause a lot of grief and frequent MSDN counseling.

13 Responses to “pthreads as a case study of good API design”

  1. Astrobe Says:

    “To me, [CreateWindow] is an example of an API designer being lazy at the expense of the user”

    Naming an ‘API’ is very misleading. It lets users think that it can be used ‘as is’ in applications. But obviously it is actually not usable.

    Other examples of so-called APIs come to mind; the BSD socket interface for instance. If I stretch things a little bit, I could argue that pthread’s attributes don’t look all that usable.
    I think that the term Application Programming Interface is misunderstood today. It was coined a long time ago when ‘interface’ had a significantly different meaning. Today, a slightly better name would be ‘system programming interface’ or something like that, meaning that it is only a ‘raw’ interface (it reminds me of my days with DOS’ OS services “API”, which were available via software interrupts).

    The problem with that kind of object is that not only must it be customizable (as you noticed), it must also be comprehensive. Hence sometimes a large number of arguments, but also concept overload or overhead (pthread’s attribute is an example), etc.
    So there must be some sort of complexity-reduction layer in order to have something convenient for the programmer to use. Of course, behind this layer one loses some possibilities, and there are potentially many different ways to design that layer depending on the tradeoffs one makes.

  2. Casey Duncan Says:

    This is such a vital subject that gets so little of the attention that it deserves, thank you! You offer a glowing review of the pthreads api here, and I wonder, is there anything about it that you do not like or would change?

  3. lorg Says:

    1. Good post, it was fun to read.
    2. Re naming – “There are only two hard things in Computer Science: cache invalidation and naming things” – Phil Karlton :)
    3. Actually, I find it mostly easy to write APIs in C. You don’t have a lot of choices to make, and it’s easy to make it standard.
    I find writing a good API in C++ much harder. C++ is much harder just by itself, but writing a good API is even worse. There are just so many choices…

  4. Paul Says:

    “Consistent and logical naming”? I don’t see how this can be so with confusing names like “condition variable” and “barrier”. The former makes no sense and the latter takes the name from something else (memory barrier) which is different. I’m not sure pthreads is as orthogonal as it could be, since condition variables and semaphores mostly overlap. And the fact that pthreads lacks a generic unified signaling scheme makes it weaker than Windows, despite my usual distaste for Windows APIs. There’s no way in pthreads to wait on two things at the same time. And please don’t tell me to “make my own;” that’s a cop-out.

  5. eliben Says:

    @Paul,

    AFAIK both condition variables and barriers are well-accepted terms that are used in other APIs as well. When such terms are known already, it makes sense not to invent new ones. Regarding orthogonality, condition variables and semaphores most certainly aren’t the same thing.

  6. Czeslaw Says:

    Thanks for the attributes part, it just came in handy for me.

  7. Chilla Says:

    I don’t know if anyone noticed it or not, but the first word of each function name is the link name of the library (pthread). This is a good practice: if you have multiple libraries linked into an executable and every library follows this naming format, it helps when debugging a stack trace or reading logging information. Another advantage is that if I link another thread library into my executable in addition to pthread, the names will not clash.

  9. Abafei Software Says:

    Wow, Yasher Koach, Baruch Hashem! This is very good information, especially the part about designing in a way which makes it easier for users, even if it may take more work by an implementer.

  10. Demetrius Miscovich Says:

    Hi there! Would you mind if I share your blog with my facebook group? There’s a lot of people that I think would really appreciate your content. Please let me know. Cheers

  11. eliben Says:

    Demetrius,

    Whatever you mean by “share”, you probably don’t need my permission for it.

  12. Frank Says:

    Another example of good API design is SQLite:
    http://www.sqlite.org
