PostgreSQL: Create an index to quickly distinguish NULL from non-NULL values

I’m interpreting your claim that it’s “overkill” in two ways: in terms of complexity (a B-Tree instead of a simple list) and in terms of space and performance.

For complexity, it’s not overkill. A B-Tree index is preferable because deletes from it are faster than they would be from some kind of “unordered” index (for lack of a better term), which would require a full index scan just to find the entry to delete. Given that, any gains from an unordered index would usually be outweighed by the drawbacks, so the development effort isn’t justified.

For space and performance, though, if you want a highly selective index for efficiency, you can add a WHERE clause to the index definition (a partial index), as noted in the fine manual:

CREATE INDEX ON my_table (name) WHERE name IS NOT NULL;
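A query whose WHERE clause implies the index predicate can then use the partial index. A minimal sketch, assuming a hypothetical table my_table with a name column (the table definition and index name here are illustrative, not from the original):

```sql
-- Hypothetical table for illustration.
CREATE TABLE my_table (
    id   serial PRIMARY KEY,
    name text
);

-- The partial index from above, with an explicit name.
CREATE INDEX my_table_name_not_null_idx
    ON my_table (name)
    WHERE name IS NOT NULL;

-- Because name = 'alice' implies name IS NOT NULL, the planner
-- can consider this index for queries like:
SELECT * FROM my_table WHERE name = 'alice';
```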

Note that you’ll only see benefits from this index if it allows PostgreSQL to skip a large number of rows when executing your query. E.g., if 99% of the rows have name IS NOT NULL, the index isn’t buying you anything over a full table scan; in fact, it would be less efficient (as @CraigRinger notes) since it would require extra disk reads. If, however, only 1% of rows have name IS NOT NULL, the index represents huge savings, since PostgreSQL can ignore most of the table for your query. If your table is very large, even eliminating 50% of the rows might be worth it. This is a tuning problem, and whether the index is valuable depends heavily on the size and distribution of your data.
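You can check whether the planner actually chooses the index with EXPLAIN; a sketch, assuming the index has been created as above (the index name is hypothetical):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE name = 'alice';
-- Look for an "Index Scan using my_table_name_not_null_idx" node in the
-- plan. On a tiny table, or when the predicate is not selective enough,
-- the planner may still prefer a sequential scan.
```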

Additionally, there is very little gain in terms of space if you still need another index for the name IS NULL rows. See Craig Ringer’s answer for details.
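For completeness, the complementary partial index for the NULL rows would look like this; a sketch only, and as noted above it may negate any space savings (index name hypothetical):

```sql
-- Partial index covering only the rows where name IS NULL.
CREATE INDEX my_table_name_null_idx
    ON my_table (name)
    WHERE name IS NULL;

-- It can serve queries restricted the same way, e.g.:
SELECT count(*) FROM my_table WHERE name IS NULL;
```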
