Keep PostgreSQL from sometimes choosing a bad query plan

If the query planner makes bad decisions, it’s mostly one of two things: 1. The statistics are inaccurate. Do you run ANALYZE enough? Also popular in its combined form, VACUUM ANALYZE. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE is run automatically. But consider: Are regular VACUUM ANALYZE still recommended under … Read more
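A quick way to check, sketched here with a hypothetical table name my_table: see when the statistics were last refreshed, and refresh them by hand if they are stale:

```sql
-- 'my_table' is a placeholder name
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';

-- Refresh statistics manually (and clean up dead tuples in the same pass)
VACUUM ANALYZE my_table;
```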

Improving query speed: simple SELECT in big postgres table

Extracting my comments into an answer: the index lookup here was very fast — all the time was spent retrieving the actual rows. 23 seconds / 7871 rows = 2.9 milliseconds per row, which is reasonable for retrieving data that’s scattered across the disk subsystem. Seeks are slow; you can a) fit your dataset in … Read more
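If CLUSTER is an option for you, one sketch (table and index names are placeholders) physically rewrites the table in index order so rows that are read together are stored together:

```sql
-- big_table / big_table_created_idx are placeholder names.
-- CLUSTER takes an exclusive lock and rewrites the whole table.
CLUSTER big_table USING big_table_created_idx;

-- The order is not maintained for new rows; re-run after heavy churn.
ANALYZE big_table;
```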

Any downsides of using data type “text” for storing strings?

Generally, there is no downside to using text in terms of performance/memory. On the contrary: text is the optimum. Other types have more or less relevant downsides. text is literally the “preferred” type among string types in the Postgres type system, which can affect function or operator type resolution. In particular, never use char(n) (alias … Read more
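A minimal sketch with hypothetical names: use text, optionally with a CHECK constraint, where you might otherwise reach for char(n) or varchar(n):

```sql
-- 'users' / 'username' are placeholder names (Postgres 10+ syntax)
CREATE TABLE users (
  user_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  -- text plus CHECK gives a length limit that is trivial to change
  -- later, unlike a varchar(50) baked into the column type
  username text NOT NULL CHECK (length(username) <= 50)
);
```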

How do I speed up counting rows in a PostgreSQL table?

For a very quick estimate: SELECT reltuples FROM pg_class WHERE relname = 'my_table'; There are several caveats, though. For one, relname is not necessarily unique in pg_class. There can be multiple tables with the same relname in multiple schemas of the database. To be unambiguous: SELECT reltuples::bigint FROM pg_class WHERE oid = 'my_schema.my_table'::regclass; If you do not … Read more
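Since reltuples goes stale between ANALYZE runs, a common refinement scales the rows-per-page figure by the table’s current page count (this sketch assumes the default 8 kB block size):

```sql
SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint AS estimate
FROM   pg_class
WHERE  oid = 'my_schema.my_table'::regclass
AND    relpages > 0;  -- guard against division by zero
```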

How to understand an EXPLAIN ANALYZE

While not as useful for a simple plan like this, http://explain.depesz.com is really useful. See http://explain.depesz.com/s/t4fi. Note the “stats” tab and the “options” pulldown. Things to note about this plan: The estimated row count (183) is reasonably comparable to the actual row count (25). It’s not hundreds of times more, nor is it 1. You’re … Read more
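To produce a plan worth pasting into explain.depesz.com, run the statement with timing and buffer statistics; the query below is a placeholder:

```sql
-- ANALYZE executes the statement for real; BUFFERS adds
-- cache-hit vs. disk-read counts for each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   my_table            -- placeholder table
WHERE  some_column = 42;   -- placeholder predicate
```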

Postgres query optimization (forcing an index scan)

For testing purposes you can force the use of the index by “disabling” sequential scans – best in your current session only: SET enable_seqscan = OFF; Do not use this on a production server. Details in the manual here. I quoted “disabling” because you cannot actually disable sequential table scans. But any other available option … Read more
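A typical test sequence, with placeholder names, confined to the current session:

```sql
SET enable_seqscan = OFF;   -- seq scans now carry a prohibitive cost penalty

EXPLAIN ANALYZE             -- check: does the planner pick the index now?
SELECT * FROM my_table WHERE some_column = 42;   -- placeholders

RESET enable_seqscan;       -- restore the default immediately
```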

Finding similar strings with PostgreSQL quickly

The way you have it, similarity between every element and every other element of the table has to be calculated (almost a cross join). If your table has 1000 rows, that’s already 1,000,000 (!) similarity calculations, before those can be checked against the condition and sorted. Scales terribly. Use SET pg_trgm.similarity_threshold and the % operator … Read more
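A sketch with assumed names (tbl, name): back the % operator with a trigram index so the comparisons are index-assisted instead of computed row by row:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 'tbl' / 'name' are placeholder names
CREATE INDEX tbl_name_trgm_idx ON tbl USING gin (name gin_trgm_ops);

SET pg_trgm.similarity_threshold = 0.8;   -- session-local; default is 0.3

SELECT name, similarity(name, 'searchterm') AS sim
FROM   tbl
WHERE  name % 'searchterm'                -- can use the trigram index
ORDER  BY sim DESC;
```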

Optimize GROUP BY query to retrieve latest row per user

For best read performance you need a multicolumn index: CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST); To make index-only scans possible, add the otherwise unneeded column payload to a covering index with the INCLUDE clause (Postgres 11 or later): CREATE INDEX log_combo_covering_idx ON log (user_id, log_date DESC NULLS LAST) INCLUDE … Read more
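With an index like that in place, the matching query (names taken from the snippet above) is typically a DISTINCT ON:

```sql
-- One row per user_id, the latest log_date winning; the sort order
-- matches the index, which enables an index-only scan
SELECT DISTINCT ON (user_id)
       user_id, log_date, payload
FROM   log
ORDER  BY user_id, log_date DESC NULLS LAST;
```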

Best way to delete millions of rows by ID

It all depends … This assumes no concurrent write access to the involved tables; otherwise you may have to lock tables exclusively, or this route may not be for you at all. Delete all indexes (possibly except the ones needed for the delete itself). Recreate them afterwards. That’s typically much faster than incremental updates to indexes. Check … Read more
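A sketch of the drop-and-recreate-indexes route, assuming the ids to delete sit in a staging table del_ids and nothing else writes concurrently (all names are placeholders):

```sql
BEGIN;

-- Drop secondary indexes first; bulk-deleting with them in place
-- means millions of incremental index updates
DROP INDEX IF EXISTS big_table_foo_idx;

DELETE FROM big_table b
USING  del_ids d
WHERE  b.id = d.id;

-- Recreate in one pass afterwards, which is typically much faster
CREATE INDEX big_table_foo_idx ON big_table (foo);

COMMIT;

-- VACUUM cannot run inside a transaction block, so do it after COMMIT
VACUUM ANALYZE big_table;
```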