Reduce Postgres query time from seconds to milliseconds with partial indexes
February 26, 2018
A popular way to increase query performance in Postgres is to add indexes. However, if a table is sufficiently large, query performance can still be slow even with appropriate indexes are added. Fortunately, Postgres has an other trick up its sleeve. Partial indexes!
At SupportBee, we receive tens of thousands of emails each day. We process and eventually store those emails in a tickets
table. Many parts of our application query the tickets
table. Here’s one such query
SELECT tickets.id, tickets.last_reply_at FROM "tickets" WHERE "tickets"."account_id" = 1234 AND "tickets"."spam" = 'f' AND "tickets"."trash" = 'f' AND "tickets"."state" = 1 ORDER BY "tickets"."last_reply_at" ASC;
The query fetches legitimate (i.e. not spam or trash) unanswered tickets that belong to a specific account. Despite having indexes on the tickets
table, the query consumed upwards of 4 seconds!
A key reason why this query was so slow was because Postgres had to scan every ticket belonging to the account to determine if a ticket should be included in the query results. In SupportBee, 96% of an account’s tickets are either archived (i.e. they’ve been replied to and resolved) or are spam or trash. In an ideal scenario, Postgres can only scan 4% of an account’s tickets to respond to the query. This is a scenario where Postgres’ partial indexes shine. To improve this query’s performance, we decided to add the following partial index
CREATE INDEX index_unanswered_tickets_on_account_id_and_last_reply_at ON tickets(account_id, last_reply_at DESC) WHERE state = 1 AND spam = 'f' AND trash = 'f’
The partial index only contains legitimate unanswered tickets. The partial index reduced query time from 4 seconds to an unbelievable 3 milliseconds. The execution plan of the query is as follows
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan Backward using index_unanswered_tickets_on_account_id_and_last_reply_at on public.tickets (cost=0.00..151126.72 rows=159024 width=12) (actual time=0.147..0.939 rows=370 loops=1)
Output: id, last_reply_at
Index Cond: (tickets.account_id = 1234)
Buffers: shared hit=364
Total runtime: 3.021 ms
(5 rows)
Depending on the size of your tables and the nuqueries filter on, your mileage with partial indexes may vary. Nevertheless, partial indexes are a very handy alternative to complicated solutions like sharding etc. If you’d like to learn more about partial indexes, the chapter on partial indexes in Postgres’ Manual is a fantastic resource.