niraj zade :::: ~/blog ~/tools ~/hire

Postgres - index types and usage scenarios

By Niraj Zade | 2024 Oct 13 | tags: postgres 2 dbms 3

Types

Index creation template:

CREATE INDEX <index_name> ON <table_name> USING <index_type>(column_name);

Index types:

none - defaults to b-tree
hash
gist
spgist
gin
brin

B-Tree

In-depth explanation in the postgres source code - /src/backend/access/nbtree/README

The postgres b-tree index uses the balanced tree data structure. So, naturally the fetching time will be uniform for all values, as all the elements will be almost equidistant from the root of the tree.

Properties

Large index size.
Expensive to compute - tree has to be created, balanced

When to use

Default choice of index. Good for general purpose indexing. Works well for a wide range of data types.

When not to use

When the table is extremely large. B-tree are expensive. Are cpu and storage intensive to create and maintain.
When data is high velocity. The constant b-tree rebuild due to dml statements (update, delete) will use up a lot of cpu, and slow down the database.
Do not use when the data is highly skewed. Since b-tree uses a balanced tree, skewed data will trigger a lot of tree re-balancing. This is computationally expensive, and will also cause a lot of i/o as storage pages will be split and moved around.

CREATE INDEX btree_index_name ON your_table(column_name);

Supported operators

Search, insert, delete - single values or range of values

<   <=   =   >=   >

Hash

In-depth explanation in the postgres source code - /src/backend/access/hash/README

Generates a 32 bit hash code from the value of the indexed column.
Uses hash function to map key and values to locations. Perfect fit for fast exact-match/equality lookups. Time complexity to find a row is O(1).
Not good for range queries, sorting

When to use

You are performing exact-match lookups (=)
Column has high cardinality.

When not to use

The column has low cardinality
Your are performing operations other than exact-match lookups (other than =)

Notes on performance

Hash index doesn't like too many hash values in a single bucket. This is exactly like a normal hash map, with a linked list for each bucket. Hash index performs best when the data is uniformly distributed. In other words - the ratio of cardinality to number_of_rows is low.

When the number of rows is high, but column cardinality is low, the hash index performance will suffer massively.

Reason - This is because the bucket in the hashmap will end up with too many values. Postgres stores the buckets in pages (postgres stores everything in pages). With too many values in a bucket, the bucket will be spread over many pages. This will quickly ruin performance at scale.

CREATE INDEX hash_index_name ON your_table USING hash (column_name);

Supported operators

GIST - generalized search tree

In-depth explanation in the postgres source code - /src/backend/access/gist/README

Properties

Uses tree structure, but unlike B-Tree, can be customized for specific data types
Use for indexing multi-dimensional data.
Flexible index types. Useful for geometric and text data types

When to use

Use for spatial data types, text search, custom data types
Use when performing full text search on the column's data

CREATE INDEX gist_index_name ON your_table USING gist (column_name);

Supported operators

<<   &<   &>   >>   <<|   &<|   |&>   |>>   @>   <@   ~=   &&

SP-GIST - Space Partitioned Generalized Search Tree

In-depth explanation in the postgres source code - /src/backend/access/spgist/README

Properties

Extension of GIST. Divides index space into non-overlapping regions for efficient storage and retrieval

When to use

Column has space partitioned data like geographical regions, ranges
When gist index isn't fast enough, sp-gist can radically speed up queries

CREATE INDEX spgist_index_name ON your_table USING spgist (column_name);

Supported operators

<<   >>   ~=   <@   <<|   |>>

GIN - Generalized Inverted Index

In-depth explanation in the postgres source code - /src/backend/access/gin/README

More details by the maintainers - Gin for PostgreSQL

Stores inverted lists of keys and positions, enabling efficient searches for elements with these data structures

Properties

Computationally expensive to generate.
Can generate large index sizes. (as compared to b-tree indexes).

When to use

When the column is of JSONB type. Gin is the the preferred index for JSONB datatype.

When not to use

Table has a lot of updates. Re-calculating gin index is extremely expensive.

CREATE INDEX gin_index_name ON your_table USING gin (column_name);

Supported operators

<@   @>   =   &&

BRIN - Block Range Index

In-depth explanation in the postgres source code - /src/backend/access/brin/README

Brin index groups storage pages into blocks, and stores summary information about each block. The default page block size is 128.

When postgres is searching for a value using brin index, it can skip over entire blocks of pages if the index's min/max stats say that the value isn't in that particular page block.

Properties

Very space efficient. Generates extremely small indexes.
Very CPU efficient. Don't use up too much CPU to build/rebuild.

When to use

Use to optimize storage and query performance on very large tables where the data is ordered.
When data is sorted. Eg - time series, sequential data.
When the table is huge, and b-tree index is too expensive to use (especially if a lot of inserts and updates are occurring. re-computing b-tree is extremely expensive).
When the data has very high velocity (index re-calculation is cheap, so can keep up with data insert speeds)

CREATE INDEX brin_index_name ON your_table USING brin (column_name);

-- create brin index with custom page block size
CREATE index brin_index_name ON your_table USING brin(column_name) WITH (pages_per_range = 32);

Supported operators

<   <=   =   >=   >

Notes on indexing performance

Choosing single or multi-column index

A query may not necessarily use multiple indexes in a query.

This is why a single multicolumn index vs 2 single column indexes can give vastly different performance.

Use ANALYZE to actually see what the query is using.

Some queries with OR, AND may use multiple indexes for the query. These combined indexes are usually single column indexes - postgres will do a bitwise & and/or * operation over these indexes, and use the result to choose rows.

Postgres documentation - 11.5. Combining Multiple Indexes

Gist vs gin for full text searches

Gist indexes are lossy. So, the search results won't be exact and will return some false positives.
Gin indexes are lossless. But performance will degrade logarithmically as number of words increase
- Compared to gist indexes - gin indexes will be 2-3x larger, and generation will take 2-3x larger

From the documentation:

As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.

More info in the official documentation - GiST and GIN Index Types

If the use case allows fuzzy searches, use the pg_trm module. It uses trigraph + gin/gist indexes and will radically speed up textual searches.

Table Of Contents

Types
Notes on indexing performance
- Choosing single or multi-column index
- Gist vs gin for full text searches

Blog

tags: ai 1 blog 4 gen-ai 1 llm 1 meta 1

[2025 Jul 16] My subconscious doesn't like LLMs 11 min blog 4 llm 1 ai 1
[2023 Apr 08] Computers understanding humans makes codebases irrelevant 8 min blog 4 gen-ai 1
[2023 Feb 12] Own your email's domain 6 min blog 4
[2023 Jan 03] Why interpreted languages make superior scripting languages 1 min meta 1 blog 4

Resources

tags: advice 1 dbms 3 infrastructure 1 linux 2 postgres 2 python 2 resources 5 sqlserver 1

Handy linux commands [2025 Jul 10] linux 2
SqlServer - Date & Time formatters [2025 Jul 10] dbms 3 sqlserver 1
PostgreSQL [2025 Jul 04] dbms 3 postgres 2
Links [2025 Jun 15] resources 5
Topicwise resources dump [2025 Jun 15] resources 5
Conference talks, lectures [2025 Jun 02] resources 5
Papers [2025 Jun 02] resources 5
Lecture - You and your research by Dr. Richard Hamming [2024 Oct 14] advice 1
Python - recommended packages [2024 Oct 14] python 2 resources 5
Postgres - index types and usage scenarios [2024 Oct 13] postgres 2 dbms 3
SSH tunneling templates [2024 Sep 22] infrastructure 1 linux 2
Python unicode string comparison and normalization schemes [2024 May 06] python 2