A taxonomy of typing systems

November 25th, 2006 at 12:08 pm

One topic that shows up very often in comparisons of programming languages is typing. When the pros discuss the relative merits of, say, Java and Lisp, you might (like me) sometimes find yourself lost in the cornucopia of terminology wondering what the hell is “static” typing, why is it better / worse than “dynamic” typing, and how about that “strong” typing thingy, it surely sounds way better than “weak” typing, and oh, what has “safe” typing to do with it all.

This article is an attempt to put it all in order, at least for myself. The topic of typing is controversial enough to have dozens of cross-contradictory definitions. Here I’ll try to refer to the most commonly implied uses of the idioms.

So what is typing anyway ?

In programming languages, data is divided to types. integers, floats, strings, arrays of stuff, you get the idea. For instance, in C, the following statement:


float pi = 3.14159265;

Defines a variable pi with a “floating point” type and assigns it a value. Now this variable can be used anywhere a floating point value is expected, such as in the sqrt function.

In Perl, a string is defined and printed as follows:


my $str = "hello";
print $str;

After the first line, $str is of a “string” type.

Static vs. Dynamic typing

The simple examples of the last section actually demonstrate the difference between static and dynamic typing. Note how in the C code, the float type is specified in the declaration of the variable pi, while in Perl no such specification is provided.

C is statically typed – the type is given to a variable at declaration time, or at compile time. In statically typed languages, the type pertains to a variable. From the moment of the declaration, the variable may be assigned only data of its type (give or take casts, which are also compile time constructs). As a result, statically typed languages have a distinction between compile time and runtime – the compiler can perform type checking in compile time, before the program runs [1]. Other well-known languages with static typing are C++, C#, Java, Basic, Pascal, Ada and Fortran.

Perl is dynamically typed – the type of a variable can change during runtime. In dynamically typed languages, the type pertains to data. Variables just “point” to data (not to be confused with C pointers), and they can point to data of different types at different times. Dynamically typed languages usually don’t have a distinction between compile time and runtime, and type checking can not be performed before runtime. Most of the “scripting” and “rapid development” languages are dynamically typed – Javascript, Lisp, PHP, Python, Ruby, Scheme and others.

Strong vs. Weak typing

The division to strong and weak typed languages is much less clear-cut than static vs. dynamic, and there is a lot of confusion on what strong typing means. One common definition is disallowing operations on incompatible types. Consider the following example from Perl (a weakly typed language):


print "2" + 4;

This code will print ’6′, although it seemingly performs addition on incompatible types. This is because Perl is performing an implicit conversion when it sees we want to add a string that represents a number to another number. On the other hand, Ruby is strongly typed and the above statement will generate a TypeError.

Another example is C (weak) vs. C++ (strong):


char* buf = malloc(20);

This line of code cleanly compiles in C, but generates an error in C++. malloc returns void*, and while C doesn’t mind, C++ does, and demands an explicit cast.

While weak typing often allows for faster development, strong typing gives the compiler / runtime an ability to catch more errors that may be potentially dangerous. Strongly typed languages demand more care from the programmer, requiring far more explicit type conversions. Weakly typed languages, on the other hand, resort to large amounts of implicit type conversions. Perl is probably the chief example of weak typing – the Perl hackers even have a name for this philosophy – DWIM (Do What I Mean). On the other extreme is Ada, where you sometimes feel like programming when your hands are tied behind your back, but as many Ada programmers note, once you get the damn thing to compile, it will probably run correctly.

Another point to note is that static typing almost always goes hand in hand with strong typing. If you demand types in compile time (static typing), you better enforce the usage of these types throughout the program and deny potentially dangerous implicit conversions (strong typing). The only notable statically typed languages without strong typing are C [2] and Basic.

Safe vs. Unsafe typing

For a proper explanation of safety, let’s first define trapped errors as execution errors that cause the computation stop immediately, and untrapped errors as execution errors that go unnoticed and later cause arbitrary behavior. Now we can formally define a safe programming language as one that allows to write only programs that are safe.

The type safety enforcement of a programming can be static, catching potential errors at compile time, or dynamic, associating type information with values at run time and consulting them as needed to detect imminent errors, or a combination of both.

There is a common misconception that weak typing goes hand in hand with lack of safety. This is far from the truth. Most weakly typed languages defer the checking of operations until the last possible opportunity – at run time, where all the information about the operation is available, which allows for more thorough safety checks. For example, while Perl has weak typing, it is definitely type safe. Try to crash the system in a pure Perl program, if you challenge this statement. On the other hand, while C++ is strongly typed, the following minimal program, while compiling cleanly, will crash spectacularly:


int* p = 0;
*p = 5;

Worse yet, a variation of this code may corrupt the program in very mysterious ways which are notoriously difficult to debug.

Languages that allow pointer arithmetic just can’t be type-safe. The flexibility of pointers is implemented in other languages by using references, which are safe forms of pointers, without the arithmetic. C++ itself also provides references which is a good, safe substitute for pointers in most situations.

Taxonomy of commonly used programming languages

Below is a list of the some of most commonly used (or discussed) programming languages, and the typing groups they belong to.

Language | Static / Dynamic | Strong / Weak | Safety |
Ada static strong safe
C static weak unsafe
C++ static strong unsafe
Java static strong safe
Javascript dynamic weak safe
Lisp dynamic strong safe
Pascal static strong safe
Perl dynamic weak safe
PHP dynamic weak safe
Python dynamic strong safe
Ruby dynamic strong safe

Conclusion

I hope this article sheds some light on the widely misunderstood topic of typing. As with all engineering tools, different languages have different approaches to typing, and each has its merits and its disadvantages. Knowing the different approaches and what they entail helps up pick the correct tools at appropriate times.

Notes

[1] Providing the actual type for the compiler to see is called explicit typing and this is what most common languages use. Some languages, like Haskell and OCaml employ a technique called type inference to automatically deduce the type of the variable. These languages are implicit static typed.

[2] C, in particular, is so notorious for its weak and unsafe typing, that many programmers recommend using the C subset of C++ (sometimes referred to as C++–) instead of C itself for large scale programming (C++ is also unsafe, but at least it is strongly typed, so the compiler is much less forgiving to errors that may lead to unsafe code), that many companies earn good money from creating static code checking (Lint-like) tools for C, and that embedded programmers invent restrictions on the usage of C to earn at least some safety (Misra C.

In defense of C it can be said that C is deliberately unsafe, because of performance considerations. The run time checks needed to achieve safety are sometimes considered too expensive. Take array bounds checks, for instance, that cannot be completely eliminated at compile time, and will impose a performance degradation at run time.

Related posts:

  1. Programming language typing – a big mishmash
  2. type system
  3. Lisp on one hand, Ada on the other
  4. Typing Spanish characters on a standard keyboard
  5. Logical operators in Perl and Ruby

12 Responses to “A taxonomy of typing systems”

  1. MasklinnNo Gravatar Says:

    Thanks for the interesting resource, you explained things fairly clearly and cleanly, but I have an issue with your static/dynamic: in the way you phrase it, it looks like statically-typed languages have type specifications on (all of) their variables while dynamically-typed ones don’t, but this is missing the issues of explicit/implicit typing, and the ability to infer types.

    For example Haskell (which you probably know of) is definitely statically typed, and is yet mostly implicitely typed: few people every type their variables, and even functions usually don’t need typing that much, at least when you keep them simple

  2. elibenNo Gravatar Says:

    Thanks, Masklinn, for your helpful comment. I have updated the article to reflect explicit vs. implicit typing.

  3. Johan TibellNo Gravatar Says:

    I find the distinction between strong and weak typing interesting. Take your example:

    print “2″ + 4;

    If this succeeds the types where in fact compatible and the language is strongly typed as per the definition you gave. If it fails they were incompatible. Looked at this way I can’t see a reason for a language to be weakly typed as that would simply be another way of stating that it is unsafe. Does strongly vs weakly typed simply mean that there exists some morphism between types which makes conversions possible?

  4. Jonathan AllenNo Gravatar Says:

    One correction, most variants of Basic are dynamically typed. I’ll focus on Visual Basic as it is the most popular.

    By default, VB doesn’t require variable declaration, let alone types.

    With Option Explicit, variables are required but you still don’t have to give them types. If you don’t, then they are dynamically bound.

    With Option Strict (VB 7+), then VB requires types for all variables, essentially making them all statically typed.

    On a side note, VB programmers tend to use the terms “Early Binding” and “Late Binding” rather than static and dynamic.

  5. Jonathan AllenNo Gravatar Says:

    Johan,

    The thing about dynamic typing is that it waits until runtime to see if such a conversion exists.

    A = 5
    B = new BankAccount()
    C = A + B

    Obviously there is no way to add a bank account to the number 5. With a statically typed language, that would usually be caught at compile time. With a dynamically typed langauge, you wouldn’t find out the mistake until the program was running.

    Note that static typing isn’t perfect. Consider this code…

    Number A = 5
    Object B = new BankAccount()
    Number C = A + (Integer)B

    By making the cast on the third line, the programmer is saying “Yes, I know this might fail. But trust me, it will work.”. Of course if you give the compiler more info, errors are less likely…

    Number A = 5
    BankAccount B = new BankAccount()
    Number C = A + (Integer)B

    Here the compiler can tell the programmer, “No, you don’t know what you are doing. That will always fail.”

    And lastly, the correct code…

    Number A = 5
    BankAccount B = new BankAccount()
    Number C = A + B.Balance

  6. Jonathan AllenNo Gravatar Says:

    One thing missing is the difference between languages that can generate types at runtime and those that cannot.

    For example, VBScript is dynamically typed. There is no way to associate a specific type with a variable when it is declared. But all the possible types are known ahead of time.

    JavaScript, on the other hand, allows you to build up a type at runtime, essentially adding your own properties to it. As I understand it, Ruby and Python are that way too.

    So, do we have good terminology to distingush between dynamic typing and dynamic types?

  7. comNo Gravatar Says:

    Jonathan,

    These are just different things. Dynamic types, as in Ruby, naturally make it quite impossible for the language compiler to be statically typed – it must be dynamically typed. If the types don’t even exist at compile time, the compiler has no chance of enforcing typing.

  8. Paul PrescodNo Gravatar Says:

    You say that Perl is weakly typed because it does implicit conversions. But Java will also do implicit conversions. Where perl converts strings to integers when you add a string and an integer, Java converts the integer to a string. In addition, many languages implicitly coerece to boolean.

  9. TomPNo Gravatar Says:

    Many languages also perform automatic coercion of numeric types (like Fortran INTEGER, DOUBLE, COMPLEX, INTEGER*2, etc.) in arithmetic expressions. Perl’s automatic coercion of strings to numbers differs from this only in degree, not in kind.

  10. NikolasNo Gravatar Says:

    I think what emerges from this discussion is that most of the languages don’t fit neatly in this taxonomy. Though it’s important to understand the different typing concepts I don’t think we should ponder too much over classifying these languages.

  11. zdenekNo Gravatar Says:

    I would argue that C++ is NOT a strongly typed language. The strongness of it’s typing system is essentially the same as in C. It’s completely irrelevant if unsafe casts are explicit or implicit.

  12. Ricky ClarksonNo Gravatar Says:

    A far more rigorous discussion of typesystems is available at http://cdsmith.twu.net/types.html , by Chris Smith.