Some thoughts on JSON vs. S-expressions

March 4th, 2012 at 9:05 pm

Recently I’ve been doing some thinking on the topic of data serialization formats. The once-ubiquitous XML has been displaced by JSON, at least to some extent and in some domains, especially with the advent of AJAX interfaces between JavaScript code running on the client and web applications on the server. That’s, IMHO, a good thing – I find JSON friendlier and saner than XML, at least in most use cases.

However, some developers aim to take the simplicity advantage of JSON over XML further and propose using S-expressions instead. There are even claims that S-expressions are the "fat free" alternative to JSON. This is where I have to say "stop, wait just a moment".

I love Lisp and Scheme, I really do. However, I firmly believe that while they are a great thought exercise about the equivalence of code and data, S-expressions are not always a worthy replacement for JSON. Here’s why.

Consider the following JSON string that defines a nested list with pairs of fruit names and numbers:

'[["oranges", 2], ["apples", 6], ["pears", 5] ]'

The equivalent S-expression is:

'((ORANGES 2) (APPLES 6) (PEARS 5))

Yep, this has less punctuation, which is nice. However, now suppose I want my data to be not just a nested list, but rather a mapping of fruit names to numbers. And I want it to be an efficient mapping, too. In other words, I want the JSON:

'{"oranges": 2, "apples": 6, "pears": 5}'

When reading this string with a JSON processing module in Python I would get a dictionary; in JavaScript, an object; and so on. The point is, this is clearly different from just a nested list. In JSON, an object is by definition an unordered set of name/value pairs, which usually implies some sort of efficient associative lookup (hash table, balanced tree, and so on).
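As a small illustration of that difference, here is a sketch using Python's standard `json` module on the two strings above. The variable names are mine, chosen only for the example.

```python
import json

# The same fruit data, once as a nested array and once as an object.
pairs = json.loads('[["oranges", 2], ["apples", 6], ["pears", 5]]')
mapping = json.loads('{"oranges": 2, "apples": 6, "pears": 5}')

print(type(pairs).__name__)    # list
print(type(mapping).__name__)  # dict

# With the list form, finding a fruit means scanning the pairs...
apples_from_pairs = next(count for name, count in pairs if name == "apples")
# ...while the dict form supports direct keyed lookup.
apples_from_mapping = mapping["apples"]
```

The parser picks the data structure from the notation alone; no schema or extra conversion step is involved.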

As far as I know, there is no way to represent this natively as an S-expression without adding a level of abstraction and/or notation.

Sure, Lisp-y languages have all kinds of associative array data structures. For instance, Common Lisp has hash tables. But the "S-expression" representation [1] of Common Lisp hash tables is:

#S(HASH-TABLE :TEST FASTHASH-EQL (ORANGES . 2) (APPLES . 6) (PEARS . 5))

This is hardly a "fat-free" syntax any longer, especially for short dicts. And it would change from language to language – Scheme has its own implementation(s) for associative arrays, Clojure its own, etc.

Another alternative would be to use some accepted notation to represent dictionaries in S-expressions, for example by following key names with a colon. However, with this we get even farther from "standard" S-expressions.
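To make the "extra level of abstraction" concrete, here is a minimal sketch of a toy S-expression reader in Python. It is not a real Lisp reader: it handles only parentheses, integers and bare symbols, and the function names (`tokenize`, `parse`, `read_sexp`, `as_mapping`) are made up for this example. The reader alone can only produce nested lists; turning them into a dictionary requires a convention that the consumer applies explicitly.

```python
import re

def tokenize(text):
    # Split into parentheses and atoms; strings and quoting are not handled.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    # Recursive-descent reader that consumes tokens from the front of the list.
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return int(token)
    except ValueError:
        return token  # symbols are kept as plain strings

def read_sexp(text):
    return parse(tokenize(text))

def as_mapping(pairs):
    # The extra convention: the caller decides that a list of
    # two-element lists should be treated as key/value pairs.
    return {key: value for key, value in pairs}

data = read_sexp("((ORANGES 2) (APPLES 6) (PEARS 5))")
print(data)              # [['ORANGES', 2], ['APPLES', 6], ['PEARS', 5]]
print(as_mapping(data))  # {'ORANGES': 2, 'APPLES': 6, 'PEARS': 5}
```

Nothing in the text tells the reader to call `as_mapping`; that knowledge lives outside the serialized data, which is the point being made here.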

Essentially, when representing data there are two kinds of collections. There are ordered collections (lists, tuples, arrays) and unordered collections (dictionaries, associative arrays, maps, hashes, etc.). Both are important in programming, and both are natively supported and heavily used in pretty much every modern language out there [2]. When designing a data serialization scheme, it’s important to take this distinction into account and it may be beneficial for the serialization format to support it natively.
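One way to see the distinction in practice, again assuming Python's standard `json` module: reordering the elements of an array changes the value, while reordering the members of an object does not.

```python
import json

# Arrays are ordered: the same elements in a different order compare unequal.
lists_equal = json.loads("[1, 2, 3]") == json.loads("[3, 2, 1]")

# Objects are unordered: member order in the text does not affect equality.
dicts_equal = (json.loads('{"oranges": 2, "apples": 6}')
               == json.loads('{"apples": 6, "oranges": 2}'))

print(lists_equal)  # False
print(dicts_equal)  # True
```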

So what’s the point of this post? Just to demonstrate one aspect of the JSON vs. S-expressions debate that isn’t always considered when comparing the two. Not that this is a fatal blow to S-expressions, far from it. There’s a good chance your data does not need the distinction between ordered and unordered collections, in which case S-expressions are fine. They are fine even when you do need the distinction but are willing to invest in an extra layer of abstraction and define a notation to represent dictionaries on top of S-expressions. However, if you want something that is already de facto standardized, and moreover is uniformly accepted in many programming languages, JSON is likely to be the better choice.

While it is nice to cut some loose fat here and there, in reality JSON is slim enough so that any extra cutting is subject to the law of diminishing returns. In other words, here practicality definitely beats purity.


[1] S-expression syntax doesn’t formally account for hash tables, so by this I mean the textual representation that would be converted to a hash table by Common Lisp’s read function.
[2] Further, dictionaries are frequently used to represent attributes (methods, members, etc.) of objects.

31 Responses to “Some thoughts on JSON vs. S-expressions”

  1. Nik Says:

    There is one thing still missing in JSON – standard for comments.

  2. foo Says:

    Adding a dictionary syntax based on hash-tables to Common Lisp is extremely simple. READ-MACROS provide the necessary API to the reader. CLtL2 even comes with an example that looks extremely similar.

  3. Julien Oster Says:

    You make a good case for when data is truly dynamic, i.e. you don’t know what to expect at all, but I’d guess that in the vast majority, you know exactly what your data will look like. So it’s really not about how the data is represented in your payload (after all, serializing it will turn any unordered list into an ordered representation) but how you will treat it afterwards, i.e. if you read it into e.g. a linked list or an unordered hash set.

    I won’t be happy when getting a list when I expected a dictionary, just as I won’t be happy when getting an integer instead of a string. So strictly speaking, I don’t need extra syntax to tell me what something is. And because of my expectations, it doesn’t really matter whether I implicitly get an unordered list or whether I have to call some .asHashMap() method to extract it; if I get misshapen data it will just fail in different ways (there’s an argument for static typing in there: .asHashMap() will fail right away in an obvious way, while the implicit integer or whatever else that I got instead of the expected dictionary might get passed around quite happily, until much later when it’s actually accessed).

    I pretty much agree with the rest of your article, though, that JSON shouldn’t be replaced by sexps. However, not because of efficiency reasons, but because I as a human can read JSON much better than sexps. As you say, I think it’s a pretty good compromise between a minimal and a descriptive format.

  4. eliben Says:

    foo,

    Sure, it is. But then you have a specific CL implementation. What about interfaces to other languages?

    Julien,

    That’s for simple data. But for nested lists of dicts of lists of dicts… how will you tell the difference?

  5. Julien Oster Says:

    Well you know at what level and position within the level to expect a dict or a list?

    Or do you mean cases where you can expect a list *or* a dict at the same position? This would not apply to what I said, it’s under the assumption that you know the structure of the data beforehand (which, I suspect, is the majority of cases).

    Otherwise could you give a specific example of what you mean?

  6. eliben Says:

    Julien,

    Yes, we know what is expected where. The code is much more convoluted than a simple read, though, isn’t it?

  7. Mike Says:

    I’m not sure “unordered sequences” is the right term here, as “sequence” implies order. Perhaps “unordered collections” or another term would be better.

  8. TechNeilogy Says:

    I’ve done a lot of thinking about and experimentation with the map/vector representation problem in s-expression languages. I think Clojure has the right idea: you really only need a few overloads to greatly simplify the syntax. Mostly, you just need a map syntax representation and a vector syntax representation. The rest can fit into standard s-expressions. When you think about it, this is also the approach JSON takes: cover 80 percent of the cases simply, and don’t sweat the rest.

  9. eliben Says:

    Mike,

    You’re right, I changed “sequence” to “collection” everywhere to be clearer.

  10. Foo Says:

    Easily fixed.

    http://www.reddit.com/r/programming/comments/7sfro/response_to_problems_with_lisp/

  11. Philo Says:

    >And it would change from language to language

    …which is one of the most compelling arguments for XML (and JSON, as it continues to grow in popularity). While it’s a chicken-and-egg problem to argue for a tool because it’s got wide adoption, it does provide a pretty significant impedance layer – if you want to displace an existing standard, you need a far better reason than “I don’t like XML because it’s wordy”

  12. Ian Says:

    The important distinction isn’t between ordered and unordered; it’s between list and map. If Lisp had a canonical representation for unordered collections, that wouldn’t be enough, in most cases. What you really want is a canonical representation for a key-value data structure. In most cases such a data structure is unordered, but that’s not the important point.

  13. Nathan Rice Says:

    I think ignoring hardcore lisp compatibility for the sake of expressiveness would probably be worthwhile here…

    (mapping ("oranges" 2) ("apples" 6) ("Pears" 5))

    Succinct, and the technique can be extended for process descriptions. Just my 2c…

  14. Anon Says:

    Why can’t

    '((ORANGES 2) (APPLES 6) (PEARS 5))

    be the representation for a dictionary as well? Or even

    '(ORANGES 2 APPLES 6 PEARS 5)

    Why should the textual storage format dictate how you can access a data structure?

  15. Andrew Pennebaker Says:

    Eh, you could use a property list for the front end but use an optimized hashmap behind the scenes.

    '(ORANGES 2 APPLES 6 PEARS 5)

    How do you like them apples?

  16. Tom Ritchford Says:

    First, there are a lot more reasons not to like XML than “it’s wordy”.

    For one thing, bitter experience has shown that when people edit XML files, the error rate is surprisingly high – partly the “wordiness” but partly the fact that matching closing tags is a tricky task.

    But more important, programming to an XML data structure is significantly harder.

    XML has *three* different containment mechanisms: a tag can contain subtags, it can have attributes, and it can have CDATA – and any given tag could theoretically do all three of these. As a result, addressing, that is, specifying a location within an XML tree, is challenging.

    JSON, by contrast, has only ONE containment mechanism. Yes, nodes can be either dictionaries or arrays, but it makes no difference for addressing – I can always uniquely refer to any element in my JSON tree with a simple series of names like body.lines.5.contents.

    The one big issue IMHO with JSON is comments. In my own code, I solve that neatly by having my files actually be YAML files. YAML is backward compatible with JSON – any JSON file is a YAML file – but YAML is even more compact than JSON *and* it allows comments.

    About 15 years ago, XML was all the rage. I spent many frustrating hours debugging XSLT expressions! Now many big shops have dropped it altogether – and I will never use it in another project unless I have to.

    JSON/YAML for the win !!!

  17. Tom Ritchford Says:

    Oh, and s-expressions are really a non-starter. Sorry!

    They are cool, but dictionaries and arrays are near-primitive types in almost every modern programming language. I can literally paste my JSON expression into my JS or Python program and have it work right the first time…

  18. Patrick Stein Says:

    I never leave JSON data in the format that it came in as. When I have JSON data, I want to pluck what I want out of it and build my own structures or use it as parameters to my own functions. Yes, for quick hacks, I might just pass the parsed JSON value around. But, only to get things going.

    Similarly, I’d never design my code to pass XML data around as a parsed DOM so that the various functions can pluck what they want out of it.

    [Eep. I suppose that some of my JavaScript code calls getElementById() during each onclick callback. I'm gonna have to fix that. ;) ]

    There is a huge dichotomy in most of the XML-handling code that I’ve seen (in C++, particularly, but elsewhere, too). The code simultaneously wants to pretend that the XML could be structured any which way at all and that would be fine for the program and yet wants to pretend that when it looks for something in the DOM, it knows exactly where to find it.

    JSON code often has the same dichotomy. “I’m going to read in this JSON data. It could be anything. It could be a map or an array or a single string. Okay, now I’ve read it in. I’m going to pass it into this function. That function knows it’s going to be an array where the third element is a map that has a key called “first name” whose value is going to be a string.”

    Occasionally, a JSON map is what I want. Occasionally, a Lisp p-list is what I want.

    Most of the time, I’ve got a big input-validation-conversion step I need to do right after parseJSON() or (read ...). At that point, it hardly matters what the input format looked like. To me, it should be formatted in whatever way is easiest for the producer.

    There’s very little data that I have that I only want to use with one program in one way.

  19. Vagif Verdi Says:

    Why do you equate s-expressions with Common Lisp?

    There’s a quite popular Lisp on the JVM (Clojure) that has the following syntax for maps:
    {"oranges": 2, "apples": 6, "pears": 5}

    EXACTLY like js.

  20. rc Says:

    Clojure explicitly addressed and solved this problem. Rich Hickey even talked about it in his Strange Loop presentation: most Lisps “complect” different data structures like lists and maps, so Clojure added syntax to separate them.

    It explains why JSON took off and not Sexps: Lispers didn’t seem to think that complecting was an issue. But if people are going to recommend Sexps in 2012, they should start from Clojure’s syntax.

  21. Josh Stone Says:

    Um… you do know that association lists are quite standard, right? And you also realize that when you talk about Python parsing it into a dictionary, you’re talking about a “layer of abstraction”?

    '((ORANGES . 2) (APPLES . 6) (PEARS . 5))

    This is the canonical representation of an association list. The fact that Common Lisp interprets this as an inefficient list doesn’t mean you couldn’t parse it into an efficient representation with your Lisp-ish JSON library. This is no different from the fact that your JSON example is, at first, interpreted by Python as a string — it is the library that turns it into a dictionary.

  22. David Says:

    JSON used to have comments, but it was removed for a number of reasons. See Douglas Crockford’s “The JSON Saga” on YouTube (~16:00) http://www.youtube.com/watch?v=-C-JoyNuQJs

  23. Nicolas Says:

    Quite interesting. But I think there are several things. First, JSON is not “compatible” out of thin air. It works natively with JS and the rest is using an API to work with it. It doesn’t even work so natively with JS if you want to deal with security problems.

    S-expressions can do the same and you can choose your preferred solution to encode a dictionary with them… Clojure notation is nice but in fact any function call would do:

    '(dictionary '(a b) '(c d)) or even (dictionary a b c d).

    With Clojure (or a reader macro of your choice) you just have {a b c d}, no need for “:” or “,”

    But you use S-expressions when you want to mix data and code, not just data. This brings you not just dictionaries and arrays or lists, but whatever you might want.

    Here the literal notation for JavaScript objects misses macros, and that increases verbosity and limits expressivity.

    But as a pure exchange format you don’t want executable code… And S-expressions are also executable code. Without the executable code part, S-expressions are far less interesting.

  24. cooking Says:

    Admiring the dedication you put into your site and in depth information you offer.
    It’s great to come across a blog every once in a while that isn’t the same out of date rehashed information. Excellent read! I’ve bookmarked your site and I’m including your RSS feeds to my Google account.

  25. Mike Says:

    Clojure forms are probably more interesting in this space than traditional Lisp s-expressions.

    They have a nice syntax and a lot more features than JSON:

    (some-dsl-header
          {:keyword :my-key
           :namespaced-keyword :my-ns/my-key
           :symbol foo
           :string "foo"
           :big-integer 769869869876976976987698609876N
           :big-decimal 16876876987687698.67896875858M
           :rational 12/79
           :vector [1 2 3 4]
           :set #{a b c d}
           :map {:a 1, :b 2, :c 3}})

  26. Josh T Says:

    Although Clojure was *perfectly* relevant to this discussion, for posterity I want to make sure a link to EDN (“eed-n”, Extensible Data Notation) is included: https://github.com/edn-format/edn. It’s a subset of Clojure syntax specifically meant for data transfer.

    Mike’s above sample of Clojure is probably a great example of what EDN looks like.

  27. Josh T Says:

    Also, Self-ML is another great example of Sexpr-based data format being used by contemporary software: http://chocolatapp.com/blog/self-ml

  28. Nick Says:

    I’m with Anon. The text representation or serialized format should have no negative impact on the in-memory runtime data structure. I’m not familiar with json but it would seem that if their serialized format allows you to specify the runtime data structure that would only save potentially a copy over LISP/SCHEME. I guess json could read the serialized format directly into a hashtable/dictionary as in LISP/SCHEME you would read into a list and then generate a hashtable, thus at some point in time you have two copies of the data instead of one.

  29. Yin Wang Says:

    I don’t think your comparison is fair. In the JSON string, you do have extra level of abstraction and/or notation:

    '{"oranges": 2, "apples": 6, "pears": 5}'

    In fact there is so much “extra” here if you see how different this is, compared to the “nested list with pairs format”:

    '[["oranges", 2], ["apples", 6], ["pears", 5] ]'

    All the curly braces and colons are “extra abstraction and notation” in your words.

    The only reason you think JSON is better is because this special treatment for unordered collections is “standardized”. S-expression could have been standardized too, and in much simpler ways (not as your Common Lisp example).

    But JSON shows its limitations when you have more complex data structures (e.g. expressions in programming languages). If you use the strict structure format of JSON, it will result in very cumbersome formats that nobody wants to edit. To get around this limitation, people designed unprincipled expression languages used by some databases such as MongoDB:

    db.scores.find({a: {'$gt': 15}});

    S-expression (> a 15) is much simpler and more general.

  30. Chris Bird Says:

    Yin Wang hit the nail on the head for me (and far more eloquently than my low level grousing about MongoDB). The strict structure format of JSON leads us to some weird side effects. Where we have genuinely monadic expressions there are occasions where we have to give a value, even though it is meaningless. However sometimes the value is evaluated, so we get into some lovely (NOT) side effects.
    I do think that each notation has some strengths. If all I want is data structures, then JSON is very handy. If all I want is immutable functions then I like my s-exps. My problem occurs when I want to store the immutable functional representation onto a permanent file store and “rehydrate” it when I want to work on it.

  31. Scott Kalter Says:

    Interesting. Two thoughts.

    First, I could easily live with JSON except for the lack of symbols which I’m surprised no one is talking about:

    (a "a") in JavaScript?

    The second is that I bet most s-expression enthusiasts would agree that JSON would be far better off if it would just drop the completely pointless ‘,’ everywhere.

    [1, 2, 3] ==> [1 2 3]

    and

    {"a":1, "b":2, "c":3} ==> {"a":1 "b":2 "c":3}

    Some will point out the ‘:’ is also questionable. I’m on the fence with respect to human readability. For machine readability, ‘:’ is completely pointless too.

    Anyway, for people used to s-expressions those commas are irritating to read and write both by hand and by machine.

    I came across this post because I’m looking for thoughts on how to solve the rather debilitating problem that JSON doesn’t have symbols. So, if I want to represent something like the following s-expression, for evaluation:

    (= a "a")

    i.e. look up the value of the variable ‘a’ in some environment and compare it to the string “a”. I’m not finding an obvious JSON equivalent representation.

    This:

    ["=", "a", "a"]

    obviously, does not cut it.

    In summary, if JSON had symbols and lost the pointless commas, I would consider it a potential improvement over s-expressions. The commas are an irritant, the lack of symbols seems pretty broken to me at this moment.
