I once heard someone, who will remain unnamed, say that JSON could be a faster encoding than raw binary.

I assume most of us know this isn’t true, but just to make sure, I wanted to write a test, gather some data, and graph it. Better to have data backing up the claim than not.

Note: if anyone notices a flaw in my test, feel free to let me know and I’ll be glad to fix it, re-collect the data, and update the post.

First of all, here is the source code for the test I used to generate this data. I ran it on an Amazon EC2 instance with 16 vCPUs and 64 GB of RAM.

This test measures average times for serializing and deserializing Blocks containing varying numbers of randomly generated Transactions. The data structures are as follows:

use serde::{Deserialize, Serialize};

// serde's derive macros generate the Serialize/Deserialize impls used by the test.
#[derive(Serialize, Deserialize)]
struct Transaction {
	from: String,
	to: String,
	value: usize,
}

#[derive(Serialize, Deserialize)]
struct Block {
	transactions: Vec<Transaction>,
	nonce: u64,
}
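
For context, here is a rough sketch of how a random Block could be generated for a run of the test. The helper names and the use of the rand crate (its 0.8-style API) are my own illustration, not necessarily what the actual test code does.

use rand::{distributions::Alphanumeric, Rng};

// Hypothetical helper: build a random alphanumeric string of the given length.
fn random_string(len: usize) -> String {
	rand::thread_rng()
		.sample_iter(&Alphanumeric)
		.take(len)
		.map(char::from)
		.collect()
}

// Hypothetical helper: build a Block containing `n` randomly generated Transactions.
fn random_block(n: usize) -> Block {
	let mut rng = rand::thread_rng();
	let transactions = (0..n)
		.map(|_| Transaction {
			from: random_string(32),
			to: random_string(32),
			value: rng.gen(),
		})
		.collect();
	Block {
		transactions,
		nonce: rng.gen(),
	}
}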

The test measures the average time it takes to serialize and to deserialize a Block; each average is computed from 50 samples. The number of transactions starts at 10,000 and goes up to 250,000 in steps of 10,000.
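
As a rough illustration of the methodology (not the exact test code), a single JSON sample could be timed like this; the real test does the same for each format and averages 50 such samples per transaction count.

use std::time::{Duration, Instant};

// Time one serialize/deserialize round trip of a Block through serde_json.
fn time_json_sample(block: &Block) -> (Duration, Duration) {
	let start = Instant::now();
	let encoded = serde_json::to_string(block).expect("serialize failed");
	let serialize_time = start.elapsed();

	let start = Instant::now();
	let _decoded: Block = serde_json::from_str(&encoded).expect("deserialize failed");
	let deserialize_time = start.elapsed();

	(serialize_time, deserialize_time)
}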

Let’s take a look at the results:

[Charts: Serialize Performance and Deserialize Performance]

The test I wrote uses the Serialize and Deserialize implementations derived by the serde library. In theory, performance might be improved with hand-written implementations, but that shouldn’t change the relative ordering of these results.

JSON is, unsurprisingly, the slowest of the three serialization/deserialization formats I tested. To serialize to JSON, a string must be constructed that contains many “extra” characters ({, }, [, ], ", ,, etc.). In all honesty, JSON serialization is even slower than I would expect, according to these results. Deserializing JSON is slow because text parsing must occur; think of a recursive-descent parser. Additionally, since everything in JSON (well, besides array elements) has a key assigned to it, hash tables may be needed. This may or may not be the case for serde_json; I haven’t dug through its code to determine this.
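
To make the “extra characters” point concrete, here is roughly what a single Transaction looks like once serialized with serde_json. The field values and the function name are made up for illustration; it assumes the Transaction struct from above with the serde derives.

fn json_example() {
	let tx = Transaction {
		from: "alice".to_string(),
		to: "bob".to_string(),
		value: 42,
	};

	// Every brace, quote, comma, and field name below is text that has to be
	// written during serialization and parsed back during deserialization.
	let json = serde_json::to_string(&tx).unwrap();
	assert_eq!(json, r#"{"from":"alice","to":"bob","value":42}"#);
}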

CBOR is, as far as I understand, “binary JSON”, similar to BSON. Therefore, it makes sense that deserializing it would have similar performance characteristics to the JSON deserialization. However, serialization is a fair bit faster; I would guess this is because it doesn’t have to do a bunch of string operations, but I don’t know the implementation details of serde_cbor.
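
For comparison, a round trip through serde_cbor’s to_vec/from_slice helpers looks like this for the same kind of Transaction; the output is a compact binary buffer rather than a string. Again, this is just an illustrative function, not code lifted from the test.

fn cbor_example(tx: &Transaction) {
	// CBOR still writes the field names, but as bytes inside a binary map
	// rather than as quoted text that has to be parsed back.
	let bytes = serde_cbor::to_vec(tx).unwrap();
	let decoded: Transaction = serde_cbor::from_slice(&bytes).unwrap();
	assert_eq!(decoded.value, tx.value);
}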

Finally, the bincode crate is by far the fastest serialization/deserialization format I tested. Unfortunately, it is also the format I know the least about in terms of how it actually works. I would guess that it does the simplest and most obvious thing: it just writes the data out and reads it back in. In my opinion, this is the “right” way to do serialization in many cases. Of course, there are other considerations, such as versioning, that may either call for a different format or complicate this simple one. Either way, bincode is clearly the winner in terms of performance.
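
For completeness, here is the equivalent round trip with the bincode 1.x API. As far as I can tell, the encoding is little more than the raw field bytes (integers written directly, strings as a length followed by their bytes), with no field names at all.

fn bincode_example(tx: &Transaction) {
	// No field names and no delimiters: the fields are written out directly
	// and read back in the same order.
	let bytes = bincode::serialize(tx).unwrap();
	let decoded: Transaction = bincode::deserialize(&bytes).unwrap();
	assert_eq!(decoded.value, tx.value);
}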

The point of this post is simply to say that JSON is not faster than binary, unless the test case is contrived. Even then, it would probably have to be an unfair test for JSON to outperform a binary format. It’s just less computational work to use a binary format.

That said, JSON has its uses. It’s a great format, just not for performance-sensitive communication; it’s better suited to things such as configuration files.

Anyway, /rant over