SQL logical clause execution order

2023 Nov 15  |  2 min read  |  tags: sql (1) data-engineering (1)

SQL queries are declarative, and give deterministic output.

The output is always equivalent to the output you'll get after following the clause execution order below. It is exactly like operator precedence in programming languages.

The logical execution order

The order is:

  1. FROM, ON, JOIN (from, ON, join in this order)
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT (window functions are executed at this step)
  6. Apply distinct() if used
  7. ORDER BY
  8. LIMIT

To put it simply - The engine goes from larger to smaller

It takes in all relevant data and then progressively cuts it down until it get to the specified result.

  1. FROM, ON, JOIN - Take in all source tables. Join them to create a single large table.
  2. WHERE - Filter the large combined table to remove unnecessary rows.
  3. GROUP BY, HAVING - Group the rows, filter rows within the groups.
  4. SELECT, distinct() - Select required columns from the table. Apply window functions if required.
  5. ORDER BY - Order the rows.
  6. LIMIT - Remove rows that don't fit in the specified limit.
  7. Return the data as result.

This can change a bit depending on your SQL engine if it adds new non-standard features. For example - If QUALIFY clause is supported, it is executed after the SELECT clause. But the standard is pretty much the same across databases.

Theoretical background

SQL is declarative.

When you write a SQL query, you tell the execution engine about the outcome/end-state that you want. The database engine figures out the steps to achieve that end state. You don't tell the actual steps (operations and their order of execution).

The databases always give you one guarantee - The output will always be equivalent to the output you'll get after following the logical clause execution order.

Internally, the database may change up the actual order of processing, in order to optimize the query's execution. It may do weird things internally.

But you don't have to care about what the engine does internally. Because the final output will always be the logical output after following the clause execution order.

It is as if the query optimizer says "Why do you care how I cook the query? The dish will taste exactly as you expected."