Keep PostgreSQL from sometimes choosing a bad query plan

If the query planner makes bad decisions, it’s mostly one of two things: 1. The statistics are inaccurate. Do you run ANALYZE enough? Also popular in its combined form, VACUUM ANALYZE. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE is run automatically. But consider: Are regular VACUUM ANALYZE still recommended under … Read more
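A quick way to check, sketched here with a hypothetical table name my_table: see when the statistics were last refreshed, and refresh them by hand if they are stale:

```sql
-- 'my_table' is a placeholder name
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';

-- Refresh statistics manually (and clean up dead tuples in the same pass)
VACUUM ANALYZE my_table;
```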

Improving query speed: simple SELECT in big postgres table

Extracting my comments into an answer: the index lookup here was very fast — all the time was spent retrieving the actual rows. 23 seconds / 7871 rows = 2.9 milliseconds per row, which is reasonable for retrieving data that’s scattered across the disk subsystem. Seeks are slow; you can a) fit your dataset in … Read more
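If CLUSTER is an option for you, one sketch (table and index names are placeholders) physically rewrites the table in index order so rows that are read together are stored together:

```sql
-- big_table / big_table_created_idx are placeholder names.
-- CLUSTER takes an exclusive lock and rewrites the whole table.
CLUSTER big_table USING big_table_created_idx;

-- The order is not maintained for new rows; re-run after heavy churn.
ANALYZE big_table;
```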

Any downsides of using data type “text” for storing strings?

Generally, there is no downside to using text in terms of performance/memory. On the contrary: text is the optimum. Other types have more or less relevant downsides. text is literally the “preferred” type among string types in the Postgres type system, which can affect function or operator type resolution. In particular, never use char(n) (alias … Read more
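A minimal sketch with hypothetical names: use text, optionally with a CHECK constraint, where you might otherwise reach for char(n) or varchar(n):

```sql
-- 'users' / 'username' are placeholder names (Postgres 10+ syntax)
CREATE TABLE users (
  user_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  -- text plus CHECK gives a length limit that is trivial to change
  -- later, unlike a varchar(50) baked into the column type
  username text NOT NULL CHECK (length(username) <= 50)
);
```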

How do I speed up counting rows in a PostgreSQL table?

For a very quick estimate: SELECT reltuples FROM pg_class WHERE relname = 'my_table'; There are several caveats, though. For one, relname is not necessarily unique in pg_class. There can be multiple tables with the same relname in multiple schemas of the database. To be unambiguous: SELECT reltuples::bigint FROM pg_class WHERE oid = 'my_schema.my_table'::regclass; If you do not … Read more
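Since reltuples goes stale between ANALYZE runs, a common refinement scales the rows-per-page figure by the table’s current page count (this sketch assumes the default 8 kB block size):

```sql
SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint AS estimate
FROM   pg_class
WHERE  oid = 'my_schema.my_table'::regclass
AND    relpages > 0;  -- guard against division by zero
```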

How to understand an EXPLAIN ANALYZE

While not as useful for a simple plan like this, http://explain.depesz.com is really useful. See http://explain.depesz.com/s/t4fi. Note the “stats” tab and the “options” pulldown. Things to note about this plan: The estimated row count (183) is reasonably comparable to the actual row count (25). It’s not hundreds of times more, nor is it 1. You’re … Read more
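To produce a plan worth pasting into explain.depesz.com, run the statement with timing and buffer statistics; the query below is a placeholder:

```sql
-- ANALYZE executes the statement for real; BUFFERS adds
-- cache-hit vs. disk-read counts for each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   my_table            -- placeholder table
WHERE  some_column = 42;   -- placeholder predicate
```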

Postgres query optimization (forcing an index scan)

For testing purposes you can force the use of the index by “disabling” sequential scans – best in your current session only: SET enable_seqscan = OFF; Do not use this on a production server. Details in the manual here. I quoted “disabling” because you cannot actually disable sequential table scans. But any other available option … Read more
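A typical test sequence, with placeholder names, confined to the current session:

```sql
SET enable_seqscan = OFF;   -- seq scans now carry a prohibitive cost penalty

EXPLAIN ANALYZE             -- check: does the planner pick the index now?
SELECT * FROM my_table WHERE some_column = 42;   -- placeholders

RESET enable_seqscan;       -- restore the default immediately
```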

Finding similar strings with PostgreSQL quickly

The way you have it, similarity between every element and every other element of the table has to be calculated (almost a cross join). If your table has 1000 rows, that’s already 1,000,000 (!) similarity calculations, before those can be checked against the condition and sorted. Scales terribly. Use SET pg_trgm.similarity_threshold and the % operator … Read more
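A sketch with assumed names (tbl, name): back the % operator with a trigram index so the comparisons are index-assisted instead of computed row by row:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 'tbl' / 'name' are placeholder names
CREATE INDEX tbl_name_trgm_idx ON tbl USING gin (name gin_trgm_ops);

SET pg_trgm.similarity_threshold = 0.8;   -- session-local; default is 0.3

SELECT name, similarity(name, 'searchterm') AS sim
FROM   tbl
WHERE  name % 'searchterm'                -- can use the trigram index
ORDER  BY sim DESC;
```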

Optimize GROUP BY query to retrieve latest row per user

For best read performance you need a multicolumn index: CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST); To make index-only scans possible, add the otherwise unneeded column payload to a covering index with the INCLUDE clause (Postgres 11 or later): CREATE INDEX log_combo_covering_idx ON log (user_id, log_date DESC NULLS LAST) INCLUDE … Read more
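With an index like that in place, the matching query (names taken from the snippet above) is typically a DISTINCT ON:

```sql
-- One row per user_id, the latest log_date winning; the sort order
-- matches the index, which enables an index-only scan
SELECT DISTINCT ON (user_id)
       user_id, log_date, payload
FROM   log
ORDER  BY user_id, log_date DESC NULLS LAST;
```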

Best way to delete millions of rows by ID

It all depends … This assumes no concurrent write access to the involved tables; otherwise you may have to lock tables exclusively, or this route may not be for you at all. Delete all indexes (possibly except the ones needed for the delete itself). Recreate them afterwards. That’s typically much faster than incremental updates to indexes. Check … Read more
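A sketch of the drop-and-recreate-indexes route, assuming the ids to delete sit in a staging table del_ids and nothing else writes concurrently (all names are placeholders):

```sql
BEGIN;

-- Drop secondary indexes first; bulk-deleting with them in place
-- means millions of incremental index updates
DROP INDEX IF EXISTS big_table_foo_idx;

DELETE FROM big_table b
USING  del_ids d
WHERE  b.id = d.id;

-- Recreate in one pass afterwards, which is typically much faster
CREATE INDEX big_table_foo_idx ON big_table (foo);

COMMIT;

-- VACUUM cannot run inside a transaction block, so do it after COMMIT
VACUUM ANALYZE big_table;
```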