ai-assisted

Fabric Liquid Clustering Can Quietly Waste Your Compute

Incremental Liquid Clustering in Fabric: when to use it and when not to

Frank Garofalo

31 May 2026 — 7 min read

Liquid Clustering is not a free performance button.

It is useful. It works. And it is still one of the easiest ways for Fabric teams to create permanent operational overhead for tables that never needed it in the first place. The real question is not whether Incremental Liquid Clustering in Fabric helps, but whether your workload earns the extra maintenance cost and complexity.

Why this decision matters now

Microsoft Fabric is moving fast, and optimization guidance is changing with it. For Delta tables in Fabric Lakehouse, Microsoft documentation recommends liquid clustering over Z-Order as the layout strategy starting with Runtime 2.0 (Microsoft Learn: liquid clustering, Microsoft Learn: Z-Order).

That still does not mean it belongs on every table type or every workload pattern.

Once a platform team labels something as “best practice,” it spreads far beyond the workloads that actually benefit. A feature intended for a few high-value analytical tables becomes a blanket rule for every Silver table, then every Gold table, then somehow a couple of Bronze landing zones too.

My opinion is simple: Incremental Liquid Clustering should be treated as a workload-specific layout optimization, not a default tuning step.

Last quarter, a retail analytics team showed me a Fabric Lakehouse where they had opened clustering discussions for 37 tables before benchmarking a single dashboard query or identifying the top three filter predicates on their largest sales model.

That is backwards.

What Incremental Liquid Clustering actually changes

At a technical level, liquid clustering is a declarative data layout strategy for Delta tables. You define clustering columns, and the table layout is organized to colocate related data values without forcing rigid Hive-style partition directory boundaries (Microsoft Learn: liquid clustering).

The important word is incremental.

This is not a one-time magical rewrite that permanently fixes performance. Layout quality improves over time as new data is written and maintenance reorganizes files around the chosen clustering columns. That is the promise and the trade-off: better read-path locality and data skipping, but with continuing maintenance implications.
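To see why colocating values helps data skipping, here is a toy model in plain Python, not Fabric code. Delta keeps per-file min/max statistics, and a reader can skip any file whose range cannot contain the filter value. In this sketch, sorting rows before splitting them into files is a deliberately simplified stand-in for clustering. It is not how Fabric actually lays out files, and the region values and file size are made up.

```python
# Toy model of min/max data skipping (illustration only, not Fabric or Delta code).
from typing import List


def split_into_files(values: List[str], rows_per_file: int) -> List[List[str]]:
    """Chunk a column's values into fixed-size 'files'."""
    return [values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file)]


def files_to_scan(files: List[List[str]], target: str) -> int:
    """Count files whose [min, max] range could contain `target`; the rest are skipped."""
    return sum(1 for f in files if min(f) <= target <= max(f))


# Hypothetical column: four regions arriving interleaved, six rows each.
regions = ["East", "North", "South", "West"] * 6

# Unclustered: every file spans East..West, so no file can be skipped.
unclustered = split_into_files(regions, rows_per_file=4)
# "Clustered" stand-in: sorting colocates equal values, so file ranges become narrow.
clustered = split_into_files(sorted(regions), rows_per_file=4)

print("unclustered:", files_to_scan(unclustered, "West"), "of", len(unclustered), "files scanned")
print("clustered:", files_to_scan(clustered, "West"), "of", len(clustered), "files scanned")
```

With these inputs, the interleaved layout scans all 6 files for `region = 'West'` and the sorted layout scans 2. The gain appears only for filters on the colocated column, which is why key choice matters more than the feature itself.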

It is also important to keep the scope precise:

This is a Lakehouse Delta table capability.
It is not the same thing as Fabric Data Warehouse clustering.
Architects should not assume one clustering story applies identically across both engines, because Fabric Data Warehouse has separate clustering capabilities in preview with different behavior and design considerations (Microsoft Learn: Data Warehouse clustering).

A simple way to explain the choice to a team:

Start with the table: is it large enough and queried often enough to matter?
Then look at access patterns: do the same few columns show up in selective filters repeatedly?
Then validate semantics: are those columns stable enough to be useful clustering keys?
Only then benchmark before and after on representative business queries.
If gains hold and maintenance cost is acceptable, keep it. If not, revert.
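The same sequence can be written as a gate that stops at the first failing check. This is a minimal sketch. `TableProfile`, its field names, and the size and frequency thresholds are hypothetical, so replace them with numbers that fit your capacity and workloads.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical thresholds; tune them to your own estate.
MIN_SIZE_GB = 100.0
MIN_QUERIES_PER_DAY = 50


@dataclass
class TableProfile:
    size_gb: float
    queries_per_day: int
    keys_semantically_stable: bool
    repeated_filter_columns: List[str] = field(default_factory=list)
    # Filled in only after benchmarking representative business queries.
    baseline_median_s: Optional[float] = None
    clustered_median_s: Optional[float] = None
    maintenance_cost_acceptable: Optional[bool] = None


def clustering_decision(t: TableProfile) -> str:
    """Walk the checks in order and return the first stopping point."""
    if t.size_gb < MIN_SIZE_GB or t.queries_per_day < MIN_QUERIES_PER_DAY:
        return "stop: not large or busy enough to matter"
    if not t.repeated_filter_columns:
        return "stop: no repeated selective filters"
    if not t.keys_semantically_stable:
        return "stop: candidate keys are not stable"
    if t.baseline_median_s is None or t.clustered_median_s is None:
        return "benchmark before and after"
    if t.maintenance_cost_acceptable is None:
        return "measure maintenance cost"
    if t.clustered_median_s < t.baseline_median_s and t.maintenance_cost_acceptable:
        return "keep"
    return "revert"


profile = TableProfile(size_gb=800, queries_per_day=1200, keys_semantically_stable=True,
                       repeated_filter_columns=["region", "order_date"])
print(clustering_decision(profile))  # benchmark before and after
```

The ordering is the point of the design. The cheap questions about size, traffic, and filters run before anyone spends compute on a benchmark.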

The right first question is not “Can we enable clustering?” It is “Is this table large enough, queried often enough, and filtered predictably enough to justify it?”

The real decision is not clustering versus no clustering

This is where the conventional wisdom misses the point.

The real choice set is broader:

Do nothing
Fix table design
Fix query design
Use partitioning
Use liquid clustering
Combine techniques selectively

Many slow Fabric queries have nothing to do with missing clustering. They come from scanning too many columns, weak predicate pushdown, poor join patterns, bad table grain, excessive small files, or querying a table that should have been pre-aggregated.
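One of those causes, excessive small files, is cheap to rule out before reaching for clustering. `DESCRIBE DETAIL` on a Delta table returns `numFiles` and `sizeInBytes`. The sketch below only does the arithmetic on those two numbers. The 32 MB threshold is an assumption of this example, not Fabric guidance.

```python
# Small-file check from two DESCRIBE DETAIL values (numFiles, sizeInBytes).
# In a notebook you might read them with:
#   row = spark.sql("DESCRIBE DETAIL sales_lc").select("numFiles", "sizeInBytes").first()

MB = 1024 * 1024
SMALL_FILE_THRESHOLD_MB = 32  # assumption: hypothetical cut-off, tune for your workload


def average_file_mb(num_files: int, size_in_bytes: int) -> float:
    """Average data file size in MB; 0.0 for an empty table."""
    if num_files == 0:
        return 0.0
    return size_in_bytes / num_files / MB


def has_small_file_problem(num_files: int, size_in_bytes: int,
                           threshold_mb: float = SMALL_FILE_THRESHOLD_MB) -> bool:
    """Flag tables with many files whose average size is below the threshold."""
    return num_files > 1 and average_file_mb(num_files, size_in_bytes) < threshold_mb


ten_gb = 10 * 1024 ** 3
print(round(average_file_mb(5000, ten_gb), 3), has_small_file_problem(5000, ten_gb))
print(round(average_file_mb(40, ten_gb), 3), has_small_file_problem(40, ten_gb))
```

If the check fires, compaction (for example `OPTIMIZE`) may fix the real problem before any clustering decision is needed.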

Fabric’s medallion guidance points clustering toward Silver and Gold layers, which is a useful signal: this is more relevant for curated analytical tables than for raw Bronze ingestion zones (Microsoft Learn: medallion architecture). But even in Silver and Gold, layout tuning is not step one.

Before you cluster anything, ask:

Is there an actual SLA problem?
Is the table large enough for layout to matter?
Are the same columns repeatedly used in selective filters?
Would a better table design solve more than clustering would?

If your dashboard already loads in 6 seconds and the business is happy, “faster” is not automatically worth another maintenance process.

When it fits, when it does not, and what to compare it against

Incremental Liquid Clustering is worth serious evaluation when all of the following are true:

The table is large, typically in Silver or Gold.
Queries repeatedly filter on a small set of columns.
Those columns have stable semantics and meaningful selectivity.
The table is queried often enough that read savings compound.
You want better file locality without the rigidity and skew risks of Hive-style partitioning.

Typical candidate columns are things like:

order_date
region
tenant_id
customer_id
status
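Candidate keys should come from evidence, not intuition. Here is a minimal sketch that assumes you have already extracted the set of filtered columns for each query from query history or logs. That extraction step and the sample data are hypothetical. The sketch counts how often each column appears and keeps the top few.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_filter_columns(filter_sets: Iterable[Set[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most frequently filtered columns with the share of queries using each."""
    counts: Counter = Counter()
    total = 0
    for cols in filter_sets:
        counts.update(set(cols))  # count each column once per query
        total += 1
    if total == 0:
        return []
    return [(col, n / total) for col, n in counts.most_common(k)]


# Hypothetical per-query filter columns for a sales table.
query_log_filters = [
    {"region", "order_date"},
    {"order_date"},
    {"region", "order_date", "status"},
    {"customer_id"},
    {"region"},
    {"order_date", "status"},
]

for col, share in top_filter_columns(query_log_filters):
    print(f"{col}: {share:.0%} of queries")
```

A column that dominates the filter history is a stronger candidate. If no column stands out, you are in the "chaotic predicate patterns" case where clustering is a poor fit.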

If you decide the workload is a fit, the next step is straightforward in Fabric Spark:

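# Create a Delta table with liquid clustering in Fabric Spark
from pyspark.sql import SparkSession

# Fabric notebooks already provide `spark`; getOrCreate() returns that session
spark = SparkSession.builder.getOrCreate()

# Declare the clustering columns when the table is created
spark.sql("""
CREATE TABLE IF NOT EXISTS sales_lc (
    order_id BIGINT,
    customer_id STRING,
    region STRING,
    order_date DATE,
    amount DECIMAL(18,2)
) USING DELTA
CLUSTER BY (region, order_date)
""")

# To change the keys on a table that already uses liquid clustering:
# spark.sql("ALTER TABLE sales_lc CLUSTER BY (region, order_date)")
# To remove the clustering keys if the gains do not hold:
# spark.sql("ALTER TABLE sales_lc CLUSTER BY NONE")

# Clustering is applied incrementally when OPTIMIZE runs on the table
spark.sql("OPTIMIZE sales_lc")

# Inspect table metadata, including the declared clustering columns
spark.sql("DESCRIBE DETAIL sales_lc").show(truncate=False)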

What matters operationally is not the syntax alone. Once you declare clustering, you are making an ongoing maintenance decision for that table.

Just as important: know when not to use it.

Do not use Incremental Liquid Clustering for:

small tables where scan pain is minimal
lightly queried tables where gains will not compound
Bronze ingestion zones focused on raw landing and write simplicity
tables with chaotic predicate patterns across many columns
shared environments with weak cost visibility
cases where precomputation, denormalization, or materialized lake views solve the bigger problem better (Microsoft Learn: materialized lake views)

And compare it honestly against the alternatives you would actually use in production:

Doing nothing: often the right answer if performance is already acceptable.
Partitioning: still useful, especially when concurrent writers need isolation across natural partitions (Microsoft Learn: partitioning).
Legacy Z-Order: still present in older estates, but new Lakehouse Delta decisions should start from current Fabric guidance rather than habit (Microsoft Learn: Z-Order).
Broader table design improvements: better grain, narrower projections, query rewrites, pre-aggregations, materialized lake views, and selective denormalization often produce higher ROI.

Azure Databricks documentation points in the same direction for Delta tables, but a recommended direction is not the same as a default for every table (Microsoft Learn: Azure Databricks liquid clustering).

How to validate it with benchmarks

The goal is to compare representative business queries before and after clustering, not to optimize synthetic microbenchmarks.

Run the same workload before and after, capture duration and scan-related metrics, and compare medians rather than cherry-picking the fastest run.

Here is a lightweight notebook pattern for timing a few representative queries:

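# Lightweight benchmarking pattern for baseline vs clustered layout
import time
from statistics import median

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Point TABLE at the unclustered baseline first, then at the clustered table
TABLE = "sales_lc"

queries = {
    "region_filter": f"SELECT count(*) FROM {TABLE} WHERE region = 'West'",
    "date_range": f"SELECT sum(amount) FROM {TABLE} WHERE order_date >= DATE '2025-01-01'",
    "combined": f"SELECT avg(amount) FROM {TABLE} WHERE region = 'East' AND order_date >= DATE '2025-02-01'",
}

def run_once(sql_text: str) -> float:
    """Run a query to completion and return elapsed wall-clock seconds."""
    t0 = time.perf_counter()
    spark.sql(sql_text).collect()  # collect() forces the query to execute
    return time.perf_counter() - t0

# One discarded warm-up run per query so metadata loading does not skew the first timing
for sql in queries.values():
    run_once(sql)

results = {name: [run_once(sql) for _ in range(5)] for name, sql in queries.items()}
summary = {name: round(median(times), 3) for name, times in results.items()}
print("Median seconds by query:", summary)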

What to observe: use multiple runs and compare medians. Single-run timings are too noisy to drive platform decisions.
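Comparing medians is a good start, but a small median difference can still be noise. The sketch below adds one conservative check, which is an assumption of this example and not a standard: it trusts an improvement only when the slowest clustered run still beats the fastest baseline run. The timings are invented.

```python
from statistics import median
from typing import Sequence, Tuple


def compare_runs(baseline_runs: Sequence[float], clustered_runs: Sequence[float]) -> Tuple[float, bool]:
    """Return (median improvement in percent, whether the two run ranges are fully separated)."""
    b_med = median(baseline_runs)
    c_med = median(clustered_runs)
    improvement_pct = (b_med - c_med) / b_med * 100
    # Conservative heuristic: no overlap between the clustered and baseline timing ranges.
    clearly_faster = max(clustered_runs) < min(baseline_runs)
    return improvement_pct, clearly_faster


# Invented timings in seconds, five runs each.
clear_gain = compare_runs([2.9, 2.8, 3.0, 2.7, 2.8], [1.2, 1.3, 1.1, 1.2, 1.4])
within_noise = compare_runs([2.0, 2.4, 2.1, 2.6, 2.2], [1.9, 2.3, 2.0, 2.5, 2.1])
print(f"clear_gain: {clear_gain[0]:.1f}% faster, separated={clear_gain[1]}")
print(f"within_noise: {within_noise[0]:.1f}% faster, separated={within_noise[1]}")
```

The second case still reports a positive median gain. Its run ranges overlap, though, so by this heuristic it is not evidence worth a permanent maintenance process.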

And once you have baseline and clustered timings, calculate improvement in a way that keeps the business context front and center:

# Compare two layouts and avoid synthetic-only conclusions
# Illustrative medians in seconds; replace with the summaries from your own baseline and clustered runs
baseline = {"region_filter": 2.8, "date_range": 3.1, "combined": 4.0}
clustered = {"region_filter": 1.2, "date_range": 1.6, "combined": 1.9}

for name, base_s in baseline.items():
    improvement = (base_s - clustered[name]) / base_s * 100
    print(f"{name}: {improvement:.1f}% lower median runtime")

# Caution: benchmark representative dashboards, ETL filters, and SLA-critical queries.
# Do not decide from a single microbenchmark that ignores concurrency, caching, and write cost.

A 50% speedup on a query nobody cares about is less valuable than a 15% improvement on an SLA-critical dashboard.
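One way to make that trade-off explicit is to weight each query's saving by how often it runs, and to flag SLA-critical queries. The query names, timings, and daily run counts below are hypothetical and exist only to show the arithmetic.

```python
from typing import Dict

# Hypothetical workload; replace with your own medians and run counts.
workload: Dict[str, Dict] = {
    # 50% faster, but it runs three times a day.
    "adhoc_report": {"baseline_s": 2.0, "clustered_s": 1.0, "runs_per_day": 3, "sla_critical": False},
    # 15% faster on an SLA-critical dashboard that is queried constantly.
    "exec_dashboard": {"baseline_s": 8.0, "clustered_s": 6.8, "runs_per_day": 2000, "sla_critical": True},
}


def pct_faster(w: Dict) -> float:
    return (w["baseline_s"] - w["clustered_s"]) / w["baseline_s"] * 100


def seconds_saved_per_day(w: Dict) -> float:
    return (w["baseline_s"] - w["clustered_s"]) * w["runs_per_day"]


# Rank SLA-critical queries first, then by total time saved.
ranked = sorted(workload.items(),
                key=lambda kv: (kv[1]["sla_critical"], seconds_saved_per_day(kv[1])),
                reverse=True)
for name, w in ranked:
    print(f"{name}: {pct_faster(w):.0f}% faster, "
          f"{seconds_saved_per_day(w):.0f} s/day saved, SLA-critical={w['sla_critical']}")
```

In this made-up workload, the 15% gain saves about 2,400 seconds a day. The 50% gain saves about 3.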

Concurrency also matters. Delta in Fabric uses optimistic concurrency control, so a maintenance job such as OPTIMIZE can conflict with a concurrent write that touches the same files, and the transaction that commits second fails with a conflict error (Microsoft Learn: concurrency control). If multiple jobs are writing to and reorganizing the same high-traffic assets, clustering becomes part of platform operations, not just storage layout.

That means you need:

ownership of clustering keys
a review cadence
cost attribution
maintenance windows or orchestration rules
a rollback decision if gains do not hold
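Those requirements are easier to enforce when each clustered table carries a small, reviewable record. This is a minimal sketch. `ClusteringPolicy`, its fields, the three-key cap, and the sample values are hypothetical house rules, not a Fabric feature. The 90-day cadence approximates a quarterly review.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

MAX_CLUSTERING_KEYS = 3  # assumption: a house rule, not a platform limit


@dataclass
class ClusteringPolicy:
    table: str
    owner: str
    clustering_keys: List[str]
    review_every_days: int
    last_reviewed: date
    cost_center: str          # who pays for OPTIMIZE and other maintenance compute
    maintenance_window: str   # when maintenance is allowed to run
    rollback_if: str          # condition that triggers removing the clustering keys

    def problems(self) -> List[str]:
        """List missing operational prerequisites."""
        issues = []
        if not self.owner:
            issues.append("no owner")
        if not self.clustering_keys:
            issues.append("no clustering keys")
        elif len(self.clustering_keys) > MAX_CLUSTERING_KEYS:
            issues.append(f"too many clustering keys ({len(self.clustering_keys)})")
        if not self.cost_center:
            issues.append("no cost attribution")
        if not self.maintenance_window:
            issues.append("no maintenance window")
        if not self.rollback_if:
            issues.append("no rollback criterion")
        return issues

    def review_due(self, today: date) -> bool:
        """True once the review cadence has elapsed since the last review."""
        return today >= self.last_reviewed + timedelta(days=self.review_every_days)


# Hypothetical policies for illustration.
sales_policy = ClusteringPolicy(
    table="sales_lc",
    owner="retail-analytics-data-team",
    clustering_keys=["region", "order_date"],
    review_every_days=90,
    last_reviewed=date(2026, 3, 1),
    cost_center="retail-analytics",
    maintenance_window="Sun 02:00-04:00 UTC",
    rollback_if="combined-query median gain below 10% at two consecutive reviews",
)
weak = ClusteringPolicy(
    table="events",
    owner="",
    clustering_keys=["tenant_id", "region", "status", "customer_id", "order_date"],
    review_every_days=90,
    last_reviewed=date(2026, 1, 1),
    cost_center="",
    maintenance_window="",
    rollback_if="",
)
print(sales_policy.problems(), sales_policy.review_due(date(2026, 5, 31)))
print(weak.problems())
```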

A pragmatic adoption rubric I would use in Fabric

If you want a repeatable decision model, score tables on these six dimensions:

Table size
Query frequency
Predicate stability
SLA sensitivity
Write concurrency
Cost visibility

My rule of thumb is blunt:

If you cannot name the top filters, quantify the pain, and identify the owner, do not cluster yet.
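The rubric and the rule of thumb combine naturally: apply the blunt gate first, then score. In this sketch every dimension is scored 1 to 5. Write concurrency is inverted, on the assumption that more concurrent writers make clustering harder to operate. The cut-off of 22 out of 30 is arbitrary, so calibrate it against tables you already understand.

```python
from typing import Dict

DIMENSIONS = (
    "table_size", "query_frequency", "predicate_stability",
    "sla_sensitivity", "write_concurrency", "cost_visibility",
)
# Assumption: a HIGH write_concurrency score means MORE concurrent writers,
# which counts against clustering, so it is inverted (6 - score).
INVERTED = {"write_concurrency"}
THRESHOLD = 22  # assumption: hypothetical cut-off out of a possible 30


def rubric_total(scores: Dict[str, int]) -> int:
    """Sum the six 1-5 scores, inverting write concurrency."""
    total = 0
    for dim in DIMENSIONS:
        s = scores[dim]
        if not 1 <= s <= 5:
            raise ValueError(f"{dim} must be scored 1-5, got {s}")
        total += (6 - s) if dim in INVERTED else s
    return total


def recommend(scores: Dict[str, int], can_name_top_filters: bool,
              pain_quantified: bool, owner_identified: bool) -> str:
    # The blunt gate comes first: no filters, no numbers, or no owner means wait.
    if not (can_name_top_filters and pain_quantified and owner_identified):
        return "do not cluster yet"
    return "evaluate clustering" if rubric_total(scores) >= THRESHOLD else "do not cluster"


# Hypothetical scores for two tables.
gold_sales = {"table_size": 5, "query_frequency": 5, "predicate_stability": 4,
              "sla_sensitivity": 5, "write_concurrency": 2, "cost_visibility": 4}
bronze_landing = {"table_size": 4, "query_frequency": 1, "predicate_stability": 1,
                  "sla_sensitivity": 1, "write_concurrency": 5, "cost_visibility": 2}

print(rubric_total(gold_sales), recommend(gold_sales, True, True, True))
print(rubric_total(bronze_landing), recommend(bronze_landing, True, True, True))
print(recommend(gold_sales, True, False, True))
```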

The most common failure modes I see are predictable:

treating clustering as a default medallion step
choosing too many clustering columns
applying it before fixing obvious model issues
ignoring shared-capacity cost visibility
confusing Lakehouse liquid clustering with Warehouse clustering

That last one matters more than people think. Engine-specific assumptions create expensive mistakes.

My recommended default for Fabric teams

Here is the opinionated version.

Default to no clustering for Bronze.
Default to no clustering for small or low-value tables.
Evaluate liquid clustering first for large, frequently queried Silver and Gold Delta tables with repeatable filter patterns.
Use partitioning deliberately when write isolation or partition-pruning behavior is the real requirement.
Reassess quarterly, because Fabric capabilities are evolving quickly and yesterday’s workaround can become tomorrow’s unnecessary complexity.

Incremental Liquid Clustering in Fabric is a sharp tool. It is not a universal one. Use it where read-path gains on important business queries clearly outweigh the maintenance, compute, and governance overhead. Everywhere else, restraint is the more mature architecture decision.

Where has liquid clustering paid off in production for you, and where did it add more maintenance than value?

#MicrosoftFabric #Deltalake #DataArchitecture

Sources & References


