I will need some queries in the form “list all objects where one of altnames is ‘foobar’.” The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and it can also be indexed (Index For Finding Element in JSON array, for example). However, SHOULD it be done that way or is it a perverse workaround that’s not recommended?
It can be done that way but that doesn’t mean that you should. In some sense, the best practice is well documented already (see e.g. using hstore vs using XML vs using EAV vs using a separate table) with a new datatype which, for all intents and practical purposes (besides validation and syntax), is no different from prior unstructured or semi-structured options.
Put another way, it’s the same old pig with new makeup.
JSON offers the ability to use inverted search tree indexes, in the same way as hstore, array types and tsvectors do. They work fine, but keep in mind that they’re primarily designed for extracting points in a neighborhood (think geometry types) ordered by distance, rather than for extracting a list of values in lexicographical order.
To illustrate, take the two plans that Roman’s answer outlines:
- The one that does an index scan plows through disk pages directly, retrieving the rows in the order indicated by the index.
- The one that does a bitmap index scan starts by identifying every disk page that might contain a row, and reads them as they appear on disk, as if it was (and in fact, precisely like) doing a sequence scan that skips useless areas.
Getting back to your question: Cluttered and oversized inverted tree indexes will indeed improve the performance of your app if you use Postgres tables as giant JSON stores. But they’re not a silver bullet either, and they won’t get you as far as proper relational design when dealing with bottlenecks.
The bottom line, in the end, is no different from what you’d get when deciding to use hstore or an EAV:
- If it needs an index (i.e. it frequently appears in a where clause or, even more importantly, in a join clause), you likely want the data in a separate field.
- If it’s primarily cosmetic, JSON/hstore/EAV/XML/whatever-makes-you-sleep-at-night works fine.