Some thoughts on JSON vs. S-expressions

Recently I've been thinking about data serialization formats. The once-ubiquitous XML has been displaced by JSON, at least to some extent and in some domains, especially with the advent of AJAX interfaces between JavaScript code running on the client and web applications on the server. That's, IMHO, a good thing: I find JSON friendlier and saner than XML, at least in most use cases.

However, some developers aim to take the simplicity advantage of JSON over XML further and propose using S-expressions instead. There are even claims that S-expressions are the "fat free" alternative to JSON. This is where I have to say "stop, wait just a moment".

I love Lisp and Scheme, I really do. However, I firmly believe that while being a great thought exercise about the equivalence of code and data, S-expressions are not always a worthy replacement for JSON. Here's why.

Consider the following JSON string that defines a nested list with pairs of fruit names and numbers:

'[["oranges", 2], ["apples", 6], ["pears", 5]]'

The equivalent S-expression is:

'((ORANGES 2) (APPLES 6) (PEARS 5))

Yep, this has less punctuation, which is nice. However, now suppose I want my data to be not just a nested list, but rather a mapping of fruit names to numbers. And I want it to be an efficient mapping, too. In other words, I want the JSON:

'{"oranges": 2, "apples": 6, "pears": 5}'

When reading this string with a JSON processing module in Python I would get a dictionary; in JavaScript, an object; and so on. The point is that this is clearly different from a nested list. In JSON, an object is by definition an unordered set of name/value pairs, which usually implies some sort of efficient associative lookup (hash table, balanced tree, and so on).
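To make the difference concrete, here is a small sketch using Python's standard `json` module on the two strings above. The variable names are mine, chosen only for illustration:

```python
import json

pairs = json.loads('[["oranges", 2], ["apples", 6], ["pears", 5]]')
mapping = json.loads('{"oranges": 2, "apples": 6, "pears": 5}')

print(type(pairs).__name__)    # list
print(type(mapping).__name__)  # dict

# Finding a value in the list of pairs requires scanning it...
count_from_pairs = next(n for name, n in pairs if name == "apples")
# ...while the dict supports direct keyed lookup.
count_from_mapping = mapping["apples"]
print(count_from_pairs, count_from_mapping)  # 6 6
```

The parser picks the container type from the syntax alone; the consuming code never has to be told which of the two was meant.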

As far as I know, there is no way to represent this natively as an S-expression without adding a level of abstraction and/or notation.
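As a rough illustration of that extra level, here is a deliberately minimal S-expression reader in Python. It is a sketch, not a real Lisp reader: it handles only parentheses, integers and bare symbols (no strings, no quoting), and the function names `tokenize`, `parse` and `read_sexpr` are my own.

```python
import re

def tokenize(text):
    # Split into parentheses and atoms; strings and quoting are not handled.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    # Recursive-descent reader: consumes tokens from the front of the list.
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    # Atoms: integers become int, everything else stays a symbol name (str).
    try:
        return int(token)
    except ValueError:
        return token

def read_sexpr(text):
    return parse(tokenize(text))

fruit = read_sexpr("((ORANGES 2) (APPLES 6) (PEARS 5))")
print(fruit)  # [['ORANGES', 2], ['APPLES', 6], ['PEARS', 5]]

# The reader cannot know a mapping was intended; the consumer has to
# apply the convention "a list of two-element lists is a dictionary".
fruit_map = {key: value for key, value in fruit}
print(fruit_map["APPLES"])  # 6
```

Whatever the reader does, the result is a nested list. Turning it into a mapping is a decision made by the consuming code, based on an agreement that lives outside the notation.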

Sure, Lisp-y languages have all kinds of associative array data structures. For instance, Common Lisp has hash tables. The "S-expression" representation [1] of a Common Lisp hash table, however, looks like this:

#S(HASH-TABLE :TEST FASTHASH-EQL (ORANGES . 2) (APPLES . 6) (PEARS . 5))

This is hardly a "fat-free" syntax any longer, especially for short dicts. It would also change from language to language: Scheme has its own implementation(s) of associative arrays, Clojure its own, and so on.

Another alternative would be to use some accepted notation to represent dictionaries in S-expressions. For example, by following key names with a colon. However, with this we get even farther from "standard" S-expressions.
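To show what such a notation costs, here is a sketch of a post-processing step for a hypothetical convention in which a list like `(oranges: 2 apples: 6 pears: 5)` denotes a dictionary. Both the convention and the `interpret` function are assumptions made up for this example. The input is the nested Python list that a plain S-expression reader might produce, with symbols represented as strings.

```python
def interpret(node):
    """Turn lists written in the hypothetical 'key: value' style into dicts."""
    if not isinstance(node, list):
        return node  # atoms pass through unchanged
    keys = node[0::2]
    is_mapping = (
        len(node) > 0
        and len(node) % 2 == 0
        and all(isinstance(k, str) and k.endswith(":") for k in keys)
    )
    if is_mapping:
        values = node[1::2]
        # Strip the trailing colon from each key and recurse into the values.
        return {k[:-1]: interpret(v) for k, v in zip(keys, values)}
    return [interpret(item) for item in node]

# What a plain reader might return for (oranges: 2 apples: 6 pears: 5)
print(interpret(["oranges:", 2, "apples:", 6, "pears:", 5]))
# {'oranges': 2, 'apples': 6, 'pears': 5}

# ...and for ((oranges 2) (apples 6)), which stays a nested list.
print(interpret([["oranges", 2], ["apples", 6]]))
# [['oranges', 2], ['apples', 6]]
```

Note that under this convention an empty list and an empty dictionary look the same, which is exactly the kind of ambiguity a native mapping syntax avoids.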

Essentially, when representing data there are two kinds of collections. There are ordered collections (lists, tuples, arrays) and unordered collections (dictionaries, associative arrays, maps, hashes, etc.). Both are important in programming, and both are natively supported and heavily used in pretty much every modern language out there [2]. When designing a data serialization scheme, it's important to take this distinction into account and it may be beneficial for the serialization format to support it natively.

So what's the point of this post? Just to demonstrate one aspect of the JSON vs. S-expressions debate that isn't always considered when comparing the two. Not that this is a fatal blow to S-expressions, far from it. There's a good chance your data does not need the distinction between ordered and unordered collections, in which case S-expressions are fine. They are fine even when you do need the distinction but are willing to invest in an extra layer of abstraction and define a notation to represent dictionaries on top of S-expressions. However, if you want something that is already a de facto standard, and moreover is uniformly supported across many programming languages, JSON is likely to be the better choice.

While it is nice to cut some loose fat here and there, in reality JSON is slim enough that any extra cutting is subject to the law of diminishing returns. In other words, here practicality definitely beats purity.

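[1]	S-expression syntax doesn't formally account for hash tables, so by this I mean the textual representation that would be converted to a hash table by Common Lisp's `read` function. The exact form shown is implementation-specific (it is what CLISP prints); the Common Lisp standard does not define a readable syntax for hash tables.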

[2]	Further, dictionaries are frequently used to represent attributes (methods, members, etc.) of objects.