Postgres - index types and usage scenarios
By Niraj Zade | 2024 Oct 13 | 8 min read
Index creation template:
CREATE INDEX <index_name> ON <table_name> USING <index_type>(column_name);
Index types:
In-depth explanation in the postgres source code - /src/backend/access/nbtree/README
The postgres b-tree index uses the balanced tree data structure. So, naturally the fetching time will be uniform for all values, as all the elements will be almost equidistant from the root of the tree.
Properties
When to use
When not to use
CREATE INDEX btree_index_name ON your_table(column_name);
Supported operators
< <= = >= >
In-depth explanation in the postgres source code - /src/backend/access/hash/README
When to use
When not to use
Notes on performance
Hash index doesn't like too many hash values in a single bucket. This is exactly like a normal hash map, with a linked list for each bucket. Hash index performs best when the data is uniformly distributed. In other words - the ratio of
cardinality
tonumber_of_rows
is low.When the number of rows is high, but column cardinality is low, the hash index performance will suffer massively.
Reason - This is because the bucket in the hashmap will end up with too many values. Postgres stores the buckets in pages (postgres stores everything in pages). With too many values in a bucket, the bucket will be spread over many pages. This will quickly ruin performance at scale.
CREATE INDEX hash_index_name ON your_table USING hash (column_name);
Supported operators
=
In-depth explanation in the postgres source code - /src/backend/access/gist/README
Properties
When to use
CREATE INDEX gist_index_name ON your_table USING gist (column_name);
Supported operators
<< &< &> >> <<| &<| |&> |>> @> <@ ~= &&
In-depth explanation in the postgres source code - /src/backend/access/spgist/README
Properties
When to use
CREATE INDEX spgist_index_name ON your_table USING spgist (column_name);
Supported operators
<< >> ~= <@ <<| |>>
In-depth explanation in the postgres source code - /src/backend/access/gin/README
More details by the maintainers - Gin for PostgreSQL
Properties
When to use
When not to use
CREATE INDEX gin_index_name ON your_table USING gin (column_name);
Supported operators
<@ @> = &&
In-depth explanation in the postgres source code - /src/backend/access/brin/README
Brin index groups storage pages into blocks, and stores summary information about each block. The default page block size is 128.
When postgres is searching for a value using brin index, it can skip over entire blocks of pages if the index's min/max stats say that the value isn't in that particular page block.
Properties
When to use
CREATE INDEX brin_index_name ON your_table USING brin (column_name);
-- create brin index with custom page block size
CREATE index brin_index_name ON your_table USING brin(column_name) WITH (pages_per_range = 32);
Supported operators
< <= = >= >
A query may not necessarily use multiple indexes in a query.
This is why a single multicolumn index vs 2 single column indexes can give vastly different performance.
Use ANALYZE to actually see what the query is using.
Some queries with OR
, AND
may use multiple indexes for the query. These combined indexes are usually single column indexes - postgres will do a bitwise &
and/or *
operation over these indexes, and use the result to choose rows.
From the documentation:
As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.
More info in the official documentation - GiST and GIN Index Types
If the use case allows fuzzy searches, use the pg_trm module. It uses trigraph + gin/gist indexes and will radically speed up textual searches.