A taxonomy of typing systems - Eli Bendersky's website

One topic that shows up very often in comparisons of programming languages is typing. When the pros discuss the relative merits of, say, Java and Lisp, you might (like me) sometimes find yourself lost in the cornucopia of terminology, wondering what the hell "static" typing is, why it is better or worse than "dynamic" typing, what about that "strong" typing thingy (it surely sounds way better than "weak" typing), and, oh, what "safe" typing has to do with it all. This article is an attempt to put it all in order, at least for myself. The topic of typing is controversial enough to have dozens of mutually contradictory definitions. Here I'll try to stick to the most commonly implied meanings of the terms.

So what is typing anyway?

In programming languages, data is divided into types: integers, floats, strings, arrays of stuff, you get the idea. For instance, in C, the following statement:


float pi = 3.14159265;

defines a variable pi with a "floating point" type and assigns it a value. Now this variable can be used anywhere a floating point value is expected, such as in a call to the sqrt function. In Perl, a string is defined and printed as follows:


my $str = "hello";
print $str;

After the first line, $str is of a "string" type.

Static vs. Dynamic typing

The simple examples of the last section actually demonstrate the difference between static and dynamic typing. Note how in the C code the float type is specified in the declaration of the variable pi, while in Perl no such specification is provided.

C is statically typed: the type is given to a variable when it is declared, that is, at compile time. In statically typed languages, the type pertains to a variable. From the moment of the declaration, the variable may be assigned only data of its type (give or take casts, which are also compile-time constructs). As a result, statically typed languages have a distinction between compile time and runtime: the compiler can perform type checking at compile time, before the program runs [1]. Other well-known languages with static typing are C++, C#, Java, Basic, Pascal, Ada and Fortran.

Perl is dynamically typed: the type of the data a variable holds can change during runtime. In dynamically typed languages, the type pertains to data. Variables just "point" to data (not to be confused with C pointers), and they can point to data of different types at different times. Dynamically typed languages usually don't have a sharp distinction between compile time and runtime, and type checking generally cannot be performed before runtime. Most of the "scripting" and "rapid development" languages are dynamically typed: JavaScript, Lisp, PHP, Python, Ruby, Scheme and others.
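To make "the type pertains to data" concrete, here is a minimal sketch in Python, one of the dynamically typed languages listed above. The names value and half are made up for illustration only. It shows a single variable bound to values of two different types, and a type error that surfaces only when the offending line actually runs.

```python
# A variable is just a name; the type belongs to the value it is bound to.
value = 3.14159265
first_type = type(value).__name__    # name of the type of the current value

value = "hello"                      # rebinding the same name to a string is allowed
second_type = type(value).__name__


def half(x):
    # No declared parameter type: any type check happens when this line executes.
    return x / 2


try:
    half("hello")                    # dividing a string by a number is not defined
    outcome = "no error"
except TypeError:
    outcome = "TypeError at run time"

print(first_type, second_type, outcome)
```

Nothing rejects the call half("hello") ahead of time; the mistake is reported only when that call is executed, which is exactly the trade-off described above.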

Strong vs. Weak typing

The division into strongly and weakly typed languages is much less clear-cut than static vs. dynamic, and there is a lot of confusion about what strong typing means. One common definition is that a strongly typed language disallows operations on incompatible types. Consider the following example from Perl (a weakly typed language):


print "2" + 4;

This code will print '6', although it seemingly performs addition on incompatible types. This is because Perl performs an implicit conversion when it sees that we want to add a string that represents a number to another number. On the other hand, Ruby is strongly typed, and the above statement will generate a TypeError. Another example is C (weak) vs. C++ (strong):


char* buf = malloc(20);

This line of code compiles cleanly in C, but generates an error in C++. malloc returns void*, and while C doesn't mind, C++ does, and demands an explicit cast.

While weak typing often allows for faster development, strong typing gives the compiler / runtime the ability to catch more errors that are potentially dangerous. Strongly typed languages demand more care from the programmer, requiring far more explicit type conversions. Weakly typed languages, on the other hand, resort to large amounts of implicit type conversions. Perl is probably the chief example of weak typing - the Perl hackers even have a name for this philosophy - DWIM (Do What I Mean). At the other extreme is Ada, where you sometimes feel like you are programming with your hands tied behind your back, but as many Ada programmers note, once you get the damn thing to compile, it will probably run correctly.

Another point to note is that static typing almost always goes hand in hand with strong typing. If you demand types at compile time (static typing), you had better enforce the usage of these types throughout the program and deny potentially dangerous implicit conversions (strong typing). The only notable statically typed languages without strong typing are C [2] and Basic.
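For comparison with the Perl example above, here is a small sketch of how the same "2" + 4 expression behaves in Python, which this article classifies as strongly typed. The function name add_mixed is hypothetical. Note that strength is a matter of degree: Python still converts implicitly between numeric types (1 + 2.5 evaluates to 3.5), so this only illustrates the string / number case.

```python
def add_mixed():
    # Python refuses to guess whether we meant arithmetic or concatenation.
    try:
        return "2" + 4
    except TypeError:
        return None


implicit_result = add_mixed()     # None: the mixed-type addition was rejected

# The programmer has to state the intended conversion explicitly.
explicit_sum = int("2") + 4       # numeric addition
explicit_text = "2" + str(4)      # string concatenation

print(implicit_result, explicit_sum, explicit_text)
```

The two explicit variants give different answers (6 and "24"), which is the ambiguity that a strongly typed language declines to resolve on the programmer's behalf.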

Safe vs. Unsafe typing

For a proper explanation of safety, let's first define trapped errors as execution errors that cause the computation to stop immediately, and untrapped errors as execution errors that go unnoticed and later cause arbitrary behavior. A program is safe if it cannot cause untrapped errors, and now we can formally define a safe programming language as one in which only safe programs can be written. The type safety enforcement of a programming language can be static, catching potential errors at compile time, or dynamic, associating type information with values at run time and consulting it as needed to detect imminent errors, or a combination of both.

There is a common misconception that weak typing goes hand in hand with lack of safety. This is far from the truth. Most weakly typed languages defer the checking of operations until the last possible opportunity - at run time, where all the information about the operation is available, which allows for more thorough safety checks. For example, while Perl has weak typing, it is definitely type safe. If you doubt this statement, try to crash the system with a pure Perl program. On the other hand, while C++ is strongly typed, the following minimal program, while compiling cleanly, will crash spectacularly:


int* p = 0;
*p = 5;

Worse yet, a variation of this code may corrupt the program in very mysterious ways that are notoriously difficult to debug. Languages that allow pointer arithmetic just can't be type-safe. Other languages provide the flexibility of pointers through references, which are safe forms of pointers, without the arithmetic. C++ itself also provides references, which are a good, safe substitute for pointers in most situations.
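As a rough illustration of trapped errors, the following Python sketch attempts the two classic mistakes that are untrapped in C and C++: writing outside the bounds of a buffer and writing through a "null" reference. The names buf, write_at and p are made up for this example. In both cases the runtime stops the operation immediately and raises an exception instead of silently corrupting memory.

```python
buf = [0] * 4                     # a four-element buffer


def write_at(index, val):
    # The runtime checks the index on every access.
    try:
        buf[index] = val
        return "ok"
    except IndexError:
        return "trapped: IndexError"


in_bounds = write_at(2, 5)        # a valid write
out_of_bounds = write_at(20, 5)   # far past the end of the buffer

p = None                          # the closest analogue of a null pointer
try:
    p.value = 5                   # None has no such attribute to assign to
    null_result = "ok"
except AttributeError:
    null_result = "trapped: AttributeError"

print(in_bounds, out_of_bounds, null_result, buf)
```

After the failed write the buffer still holds only the one valid update, so the error had no lasting side effect. The price is a check on every access, which is the performance cost that C deliberately avoids.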

Taxonomy of commonly used programming languages

Below is a list of some of the most commonly used (or discussed) programming languages, and the typing groups they belong to.

Language   | Static / Dynamic | Strong / Weak | Safety
Ada        | static           | strong        | safe
C          | static           | weak          | unsafe
C++        | static           | strong        | unsafe
Java       | static           | strong        | safe
JavaScript | dynamic          | weak          | safe
Lisp       | dynamic          | strong        | safe
Pascal     | static           | strong        | safe
Perl       | dynamic          | weak          | safe
PHP        | dynamic          | weak          | safe
Python     | dynamic          | strong        | safe
Ruby       | dynamic          | strong        | safe

Conclusion

I hope this article sheds some light on the widely misunderstood topic of typing. As with all engineering tools, different languages have different approaches to typing, and each has its merits and its disadvantages. Knowing the different approaches and what they entail helps us pick the correct tools at appropriate times.

Notes

[1] Providing the actual type for the compiler to see is called explicit typing, and this is what most common languages use. Some languages, like Haskell and OCaml, employ a technique called type inference to deduce the type of a variable automatically. These languages are implicitly, statically typed.

[2] C, in particular, is so notorious for its weak and unsafe typing that many programmers recommend using the C subset of C++ (sometimes referred to as C++--) instead of C itself for large-scale programming (C++ is also unsafe, but at least it is strongly typed, so the compiler is much less forgiving of errors that may lead to unsafe code), that many companies earn good money from creating static code checking (Lint-like) tools for C, and that embedded programmers invent restrictions on the usage of C to gain at least some safety (MISRA C). In defense of C, it can be said that C is deliberately unsafe because of performance considerations. The run-time checks needed to achieve safety are sometimes considered too expensive. Take array bounds checks, for instance: they cannot be completely eliminated at compile time, and they impose a performance penalty at run time.