How is it that JSON serialization is so much faster than YAML serialization in Python?

Question

In general, it’s not the complexity of the output that determines the speed of parsing, but the complexity of the accepted input. The JSON grammar is very concise. YAML parsers are comparatively complex, which leads to increased overhead.

JSON’s foremost design goal is simplicity and universality. Thus, JSON is trivial to generate and parse, at the cost of reduced human readability. It also uses a lowest common denominator information model, ensuring any JSON data can be easily processed by every modern programming environment.

In contrast, YAML’s foremost design goals are human readability and support for serializing arbitrary native data structures. Thus, YAML allows for extremely readable files, but is more complex to generate and parse. In addition, YAML ventures beyond the lowest common denominator data types, requiring more complex processing when crossing between different programming environments.

I’m not a YAML parser implementor, so I can’t speak specifically to the orders of magnitude without some profiling data and a big corpus of examples. In any case, be sure to test over a large body of inputs before feeling confident in benchmark numbers.
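To make the benchmarking advice concrete, here is a rough sketch of how one might time both serializers over a corpus rather than a single document. The `make_corpus` and `bench` helpers and the document shape are made up for illustration; the YAML half assumes PyYAML is installed and is skipped otherwise. Treat any numbers it prints as specific to your machine and data.

```python
import json
import timeit

try:
    import yaml  # PyYAML, third-party; assumed installed for the YAML half
except ImportError:
    yaml = None


def make_corpus(n_docs=50, width=20):
    """Build a list of sample documents (hypothetical shape, illustration only)."""
    return [
        {
            "id": i,
            "name": f"item-{i}",
            "tags": [f"t{j}" for j in range(width)],
            "nested": {"values": [j * 0.5 for j in range(width)], "flag": i % 2 == 0},
        }
        for i in range(n_docs)
    ]


def bench(dump, corpus, repeat=5):
    """Return the best-of-`repeat` wall time to serialize every document once."""
    return min(timeit.repeat(lambda: [dump(doc) for doc in corpus], number=1, repeat=repeat))


corpus = make_corpus()
results = {"json.dumps": bench(json.dumps, corpus)}
if yaml is not None:
    results["yaml.dump"] = bench(yaml.dump, corpus)
    # CDumper is only available when PyYAML was built against libyaml.
    if hasattr(yaml, "CDumper"):
        results["yaml.dump (CDumper)"] = bench(
            lambda doc: yaml.dump(doc, Dumper=yaml.CDumper), corpus
        )

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

Swapping `make_corpus` for a loader over your real data is the important part; a synthetic corpus like this one only exercises a narrow slice of each serializer.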

Update: Whoops, I misread the question. 🙁 Serialization can still be blazingly fast despite the large input grammar; however, browsing the source, it looks like PyYAML’s Python-level serialization constructs a representation graph, whereas simplejson encodes built-in Python datatypes directly into text chunks.
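The difference between the two approaches can be poked at from the interpreter. This is only a sketch: it uses the standard library `json` module as a stand-in for simplejson (the stdlib module is derived from it), the sample `data` is made up, and the YAML half assumes PyYAML is installed. It is meant to show the extra intermediate step, not to explain the whole speed gap.

```python
import json

try:
    import yaml  # PyYAML, third-party; assumed installed for the YAML half
except ImportError:
    yaml = None

data = {"name": "example", "values": [1, 2, 3]}  # hypothetical sample data

# JSON: the encoder walks the object and yields text fragments directly.
chunks = list(json.JSONEncoder().iterencode(data))
json_text = "".join(chunks)
print(chunks)

if yaml is not None:
    from yaml.representer import SafeRepresenter

    # Stage 1: turn the Python object into a graph of node objects.
    node = SafeRepresenter().represent_data(data)
    print(type(node).__name__, node.tag)

    # Stage 2: serialize that node graph into YAML text.
    yaml_text = yaml.serialize(node)
    print(yaml_text)
```

On the JSON side there is no intermediate structure between the Python object and the output text, while the YAML side allocates a node for every mapping, sequence, and scalar before any text is emitted.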
