Delta Lake - performance optimization and maintenance

By Niraj Zade  |  2026 Jan 03  |  12m read

Work in progress - The overall flow of this note is being fixed

If you're a new architect, or simply someone new to Databricks/Delta Lake, you'll want to go through this note a few times. Each time, think about how your existing data setup can be improved using the things explained here.

There is plenty of official documentation on the topic. But this note organizes it all in one place, and gives you a definite, actionable structure to hold in your head.

Gist

Things you should know in 1 line:

  • Auto compaction : solves the small file problem by merging small files into fewer larger ones, reducing file count overhead
  • Liquid clustering : continuously and incrementally groups rows with similar key values into the same files, improving file-level data skipping
  • ZORDER : rewrites data by sorting rows along a space-filling curve, allowing multi-column filters to skip more files
  • Partitioning : splits and organizes data into directory-based chunks, enabling coarse data pruning before files are scanned (most effective for single-column filters)
  • OPTIMIZE : executes a family of optimization operations that rewrite data files to improve read performance - file compaction, plus Z-ordering or liquid clustering (whichever is enabled)

On a delta table, you can use either liquid clustering, or partitioning+z-ordering. Liquid clustering isn't compatible with partitioning or z-ordering.

Liquid clustering and auto compaction are two different mechanisms, independent of each other. However, they are co-executed inside the same Spark write jobs. So in a way, the two work hand-in-hand.

  • Operationally
    • Automated management will automatically trigger liquid clustering and auto compaction (via OPTIMIZE)
    • If automated management is not available/enabled, periodically run OPTIMIZE via scheduled job runs
  • Tech choices
    • To prevent the small file problem - enable delta auto-compaction if not already enabled
    • To make sure files are organized well - prefer liquid clustering over zorder+partitioning

Automated mechanisms

Liquid clustering - Faster reads via data skipping

NOTE: Prefer liquid clustering over the older Hive partitioning & Z-ORDER mechanisms

Liquid clustering cannot be used with partitioning and bucketing. You have to give up all control to liquid clustering.

Preferred for tables loaded by streaming, or by small, frequently-run batches of appends. Liquid clustering especially shines when read patterns are not predictable.

Liquid clustering optimizes the data layout, aiming for a balanced layout - uniform file sizes, and an appropriate number of files for the size of the table. It takes care of cardinality, data skew, and the small file problem.

Use for:

  • skewed data
  • tables that regularly ingest new data (streaming, frequent batch inserts)
  • query patterns that constantly change with time
  • cases where partitioning leads to the small file problem

Liquid clustering comes in 2 flavours:

  • Automated liquid clustering (Databricks-only feature, part of Predictive Optimization) - the runtime will automatically choose columns to cluster on, based on the history of queries executed on the table. This is very similar to Snowflake's automated re-clustering.
  • Non-automated liquid clustering - you'll have to specify the columns to cluster on

When is liquid clustering triggered?

TRIGGER ON WRITE

Liquid clustering happens at write-time. Every time Delta writes to a table (INSERT, MERGE, COPY INTO etc), liquid clustering is performed as part of that write. So all liquid clustering work happens inside the Spark jobs that are touching the table. There is no separate background service that continuously optimizes the tables.
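
A minimal sketch of what this looks like, assuming a Databricks/Spark session spark and an existing liquid-clustered table events (both names are illustrative):

# Appends to a liquid-clustered table are clustered inline by the
# same write job - no background service is involved.
df = spark.createDataFrame(
    [("u1", "2026-01-03"), ("u2", "2026-01-03")],
    ["user_id", "event_date"],
)
df.write.format("delta").mode("append").saveAsTable("events")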

MANUAL/EXPLICIT TRIGGER

Liquid clustering can be manually triggered via the OPTIMIZE command:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, "path/to/table")
deltaTable.optimize().executeCompaction()  # on a clustered table, OPTIMIZE also clusters

OPTIMIZE table_name;

AUTOMATED TRIGGER (Databricks only)

When predictive optimization is enabled (requires the Databricks premium plan), Databricks will automatically run ANALYZE, OPTIMIZE, and VACUUM operations - by queueing them as jobs to be run on serverless compute instances.

Since OPTIMIZE is being run, it will also end up triggering liquid clustering.

What happens during liquid clustering?

  1. Delta reads clustering metadata - looks at existing files, their min/max ranges on clustering keys, clustering quality scores
  2. Incoming rows are sorted and grouped by the clustering keys.
  3. Delta will append rows into files whose key ranges are a good match for those rows. If there is no existing file with a good match, it will create a new file with a tight range.

There is no global sort, and no full table scan (only file statistics are read). The computational cost is proportional only to the new data.

If liquid clustering finds that the existing data is horribly clustered, it won't re-organize and re-write the whole table. Instead, it will only rewrite a few of the worst files. If it sees diminishing returns, it will stop early.

Why? Because it doesn't want to break streaming SLAs. Re-organizing the entire table may take a long time, breaking the SLA. So instead, it fixes only a few existing files on every run. Over time, as more and more writes trigger liquid clustering, the table will eventually become well-clustered.

So, in short, liquid clustering will touch a few files on every run, keeping the rewrite size small.

Automated liquid clustering

(Databricks only feature)

When automated liquid clustering is enabled, the runtime will analyze the historical query workload, and automatically identify the best columns for clustering. It will figure out the keys which will lead to cost savings from data skipping. In this case, the liquid clustering columns won't need to be manually declared.

-- Enable automatic liquid clustering on an existing table,
-- including tables that previously had manually specified keys.
ALTER TABLE table1 CLUSTER BY AUTO;

-- Disable automatic liquid clustering on an existing table.
ALTER TABLE table1 CLUSTER BY NONE;

-- Disable automatic liquid clustering by setting the clustering keys
-- to chosen clustering columns or new columns.
ALTER TABLE table1 CLUSTER BY (column01, column02);

Source: Databricks: Delta Clustering: Enable/disable automatic liquid clustering

What triggers reclustering?

A re-clustering is triggered when:

  • Files overlap on clustering keys
  • There is fragmentation
  • Query

Creating/modifying tables for liquid clustering

Liquid clustering is enabled on delta tables created with the CLUSTER BY clause, or on existing (unpartitioned) tables via ALTER TABLE ... CLUSTER BY.
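
To check whether (and on which columns) a table is clustered, DESCRIBE DETAIL exposes a clusteringColumns field. A quick sketch (table name is illustrative):

# Shows the clustering columns of the table (empty if not clustered).
spark.sql("DESCRIBE DETAIL sales").select("clusteringColumns").show()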

How to choose columns to cluster on

Source - delta.io: liquid-clustering

Clustering can be done on multiple columns. Order of columns doesn't matter. Cardinality of the column doesn't matter.

If existing tables are being migrated to liquid clustering based delta tables, choose liquid clustering columns as follows (a migration sketch follows the list):

  • Hive-style partitioning - use columns used for partitioning the data
  • Z-order - use ZORDER BY columns
  • Hive-style partitioning & zorder - use columns used for partitioning the data and ZORDER BY columns
  • Generated columns to reduce cardinality - use original column, don't create a generated column
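
For example, here is a minimal migration sketch, assuming sales_raw is an existing Hive-partitioned table partitioned by city (names are illustrative). The rewrite clusters the new table on the old partition column:

# Rewrite a Hive-partitioned table into a liquid-clustered one.
spark.sql("""
    CREATE TABLE sales_lc
    CLUSTER BY (city)
    AS SELECT * FROM sales_raw
""")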

Create new table with liquid clustering

from delta.tables import DeltaTable

(
    DeltaTable.create(spark)
    .tableName("sales")
    .addColumn("customer", dataType="STRING")
    .addColumn("city", dataType="STRING")
    .addColumn("sales", dataType="INT")
    .clusterBy("customer", "city")  # enables liquid clustering
    .execute()
)
CREATE TABLE table1(customer string, city string, sales int) USING DELTA CLUSTER BY (customer);

New table from existing table

CREATE TABLE sales_clustered CLUSTER BY (customer, city)
AS SELECT * FROM sales_raw;

Enable liquid clustering on existing table

ALTER TABLE events
CLUSTER BY (user_id, event_date);

Change liquid-clustering columns

ALTER TABLE table_name CLUSTER BY (new_column1, new_column2);

Disable liquid clustering

ALTER TABLE table_name CLUSTER BY NONE;

Auto-compaction - Control file size

Auto compaction kicks in when the number of small files is above a threshold (the Spark configuration variable is spark.databricks.delta.autoCompact.minNumFiles). The output file size is controlled using spark.databricks.delta.autoCompact.maxFileSize.
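
A minimal sketch of tuning these two knobs at the session level (the values are illustrative, not recommendations):

# Trigger compaction once 50+ small files accumulate; cap compacted
# output files at 128 MB.
spark.conf.set("spark.databricks.delta.autoCompact.minNumFiles", "50")
spark.conf.set("spark.databricks.delta.autoCompact.maxFileSize", str(128 * 1024 * 1024))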

Like liquid clustering, auto compaction comes in 2 flavours:

  • Automated auto-compaction (Databricks-only feature, part of Predictive Optimization) - the runtime will run OPTIMIZE, which will end up performing compaction
  • Non-automated auto-compaction - compaction runs at the end of write transactions when the auto compaction setting is enabled; otherwise, it has to be performed by running OPTIMIZE manually, via a scheduled Spark job or a user

Auto compaction executes an OPTIMIZE command at the end of a transaction. Keep in mind that since OPTIMIZE is being run, liquid clustering (if enabled) will also be performed on the table.


Enabling auto compaction

To enable auto compaction, either of the following can be set:

  • Table properties: delta.autoOptimize.autoCompact
  • Spark Session setting: spark.databricks.delta.autoCompact.enabled

Both of these settings take the same values:

| Option | Behavior |
|---|---|
| auto (recommended) | Tunes target file size while respecting other autotuning functionality. Requires Databricks Runtime 10.4 LTS or above. |
| legacy | Alias for true. Requires Databricks Runtime 10.4 LTS or above. |
| true | Uses 128 MB as the target file size. No dynamic sizing. |
| false | Turns off auto compaction. Can be set at the session level to override auto compaction for all Delta tables modified in the workload. |

Source: Databricks: Auto compaction for Delta Lake on Databricks
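
A minimal sketch of setting both, assuming a table named events:

# Session level - applies to all Delta tables written in this session.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "auto")

# Table level - the property sticks to the table itself.
spark.sql("ALTER TABLE events SET TBLPROPERTIES (delta.autoOptimize.autoCompact = 'auto')")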

Auto tuning

Write heavy tables

If the table is write-heavy and many MERGE and DML operations are executed on it, set table property delta.tuneFileSizesForRewrites to true.

When true, the target file size is set to a lower threshold, which makes write-intensive operations faster.

If not set, Databricks will detect if 9 of the last 10 operations on the table were MERGE, and set the table property to true automatically. To disable this behaviour, the value has to be explicitly set to false.
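
A minimal sketch, with dim_customers as an illustrative MERGE-heavy table:

# Target smaller, rewrite-friendly files for a MERGE-heavy table.
spark.sql("ALTER TABLE dim_customers SET TBLPROPERTIES (delta.tuneFileSizesForRewrites = 'true')")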

Estimating number of files

Databricks aims for smaller file sizes for smaller tables, and larger file sizes for larger tables. The aim is to make sure that the number of files doesn't grow too much.

However, it won't set the file size dynamically if you've explicitly set a target file size (delta table property delta.targetFileSize; its default value is None).
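
A sketch of pinning the target size explicitly ('256mb' is an illustrative value):

# Pinning the target file size disables dynamic sizing.
spark.sql("ALTER TABLE events SET TBLPROPERTIES (delta.targetFileSize = '256mb')")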

| Table size | Target file size | Approximate number of files in table |
|---|---|---|
| 10 GB | 256 MB | 40 |
| 1 TB | 256 MB | 4096 |
| 2.56 TB | 256 MB | 10240 |
| 3 TB | 307 MB | 12108 |
| 5 TB | 512 MB | 17339 |
| 7 TB | 716 MB | 20784 |
| 10 TB | 1 GB | 24437 |
| 20 TB | 1 GB | 34437 |
| 50 TB | 1 GB | 64437 |
| 100 TB | 1 GB | 114437 |

Enable background auto compaction

This is a 2 step process:

  • remove spark config spark.databricks.delta.autoCompact.enabled from cluster/notebook settings
  • for each table, run ALTER TABLE <table_name> UNSET TBLPROPERTIES (delta.autoOptimize.autoCompact). This will remove legacy auto-compaction settings.

After removing these, auto compaction will be triggered automatically (for all Unity Catalog managed tables)

Automating table maintenance (Databricks only)

Predictive optimization

This is a feature available for Unity Catalog managed tables.

When predictive optimization is enabled (requires the Databricks premium plan), Databricks will automatically run ANALYZE, OPTIMIZE, and VACUUM operations - by queueing them as jobs to be run on serverless compute (you will be billed for this compute).

Since OPTIMIZE is being run, it will end up triggering both liquid clustering and auto-compaction on tables.

Observability statistics for predictive optimization executions are stored in the table system.storage.predictive_optimization_operations_history.
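
A quick sketch of peeking at that history (requires access to Unity Catalog system tables):

# Inspect recent predictive optimization runs.
spark.sql("""
    SELECT *
    FROM system.storage.predictive_optimization_operations_history
    LIMIT 20
""").show(truncate=False)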

To explicitly enable/disable predictive optimization on a catalog or schema:

ALTER CATALOG [catalog_name] { ENABLE | DISABLE | INHERIT } PREDICTIVE OPTIMIZATION;
ALTER { SCHEMA | DATABASE } schema_name { ENABLE | DISABLE | INHERIT } PREDICTIVE OPTIMIZATION;

Manual intervention

Rough bullet points:

  • Aim for 128 MB - 1 GB per file
  • Liquid clustering
  • Small file problem

OPTIMIZE - data layout optimization

Source - Databricks: Delta Optimize

Performs bin-packing, plus Z-ordering or liquid clustering if enabled.

Merges small files into larger ones. Good for read heavy tables.
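
To bound the rewrite on large tables, OPTIMIZE accepts a predicate on partition columns. A sketch, assuming events is partitioned by event_date (illustrative):

# Compact only recent partitions instead of the whole table.
spark.sql("OPTIMIZE events WHERE event_date >= '2026-01-01'")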

Auto-optimization (wip)

The term "auto optimize" has been split into 2 separate features:

  • Optimized writes - https://docs.databricks.com/aws/en/delta/tune-file-size#optimized-writes
  • Auto compaction - https://docs.databricks.com/aws/en/delta/tune-file-size#auto-compact

ALTER TABLE events SET TBLPROPERTIES (
  delta.autoOptimize.optimizeWrite = true,
  delta.autoOptimize.autoCompact = true
)

Z-ORDER - faster reads by data-skipping

NOTE: As of now (2025), liquid clustering is recommended over Z-ORDERING.

WARNING:

A very costly operation to perform. Z-ORDER leads to rewrite amplification: every OPTIMIZE ZORDER will rewrite large chunks of data.

  • Improves file-level data skipping when queries are filtering on multiple columns
  • While writing - writes rows with similar values close together, and stores min/max stats per file

Now, Spark can skip entire files by looking at the min/max stats. Very effective for range and equality filters.

Z-ORDER has to be triggered manually (in pipelines or scheduled jobs). So, use Z-ORDER only for legacy tables, and one-off historical backfills.

By default, simply prefer liquid clustering.

To run z-order:

OPTIMIZE orders
ZORDER BY (customer_id, order_date)

Zorder vs bucketing

Bucketing is not supported for delta tables.

Similar to partitioning+bucketing for parquet tables, partitioning+z-ordering is the way to go for delta tables.

However, it is still recommended to simply use liquid clustering instead of partitioning+z-ordering.

Zorder and bucketing solve completely different problems. Zorder helps speed up reads through data-skipping. Bucketing helps speed up execution by preventing shuffles during joins and aggregations.

Z-order doesn't prevent the data-shuffle problem that bucketing does.

Partitioning - WIP

If liquid clustering is not available, Partitioning + Z-ORDER is the way to go, for enabling faster reads via file-skipping.

Naturally, partitioning fails to help only if queries don't filter on any low-cardinality columns. This almost never happens in practice, so partitioning the table is almost always the go-to strategy.

-- columns are illustrative; the point is the PARTITIONED BY clause
CREATE TABLE events (
  event_id STRING,
  event_date DATE
)
USING DELTA
PARTITIONED BY (event_date);

VACUUM - Removing stale files by trimming history

VACUUM will:

  • Trim time-travel history by deleting data files that are no longer referenced by table versions within the retention period.
  • Physically delete dropped rows (that were soft-deleted via deletion vectors)

Outside of predictive optimization, it is never run automatically. It always has to be executed by a job or a user.

Delete old data files (loses history)

VACUUM events RETAIN 168 HOURS
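
VACUUM also supports a DRY RUN that lists the files it would delete, without deleting them - a useful sanity check before trimming history:

# Preview which files would be deleted (no deletion happens).
spark.sql("VACUUM events RETAIN 168 HOURS DRY RUN").show(truncate=False)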
