Fabric Migration Exposes Every Weak ADF Assumption

Migrating Azure Data Factory Pipelines into Fabric Without Rebuilding Your Data Estate


Fabric migration is not a pipeline import project.

A CIO asked me last month, “If Microsoft says Fabric Data Factory is the next generation of Azure Data Factory, why not just move the pipelines and shut the old platform down?”

That question sounds operational. It is actually architectural.

Microsoft is clear that Data Factory in Microsoft Fabric is the next generation of Azure Data Factory, and Azure Data Factory guidance now points customers toward Fabric evaluation as part of mainstream planning. But that does not make migration a like-for-like service swap. Fabric Data Factory runs inside a broader SaaS analytics platform, and that changes orchestration ownership, governance, monitoring, and the role of OneLake in the target design (Microsoft ADF intro, Fabric migration planning, Microsoft Fabric overview).

Here is the case study I use with executive teams when this topic comes up. The company profile and figures are anonymized but are based on a real enterprise migration assessment.

A global manufacturer had:

  • 214 Azure Data Factory pipelines
  • 37 triggers
  • 61 linked services
  • 2 self-hosted integration runtimes
  • 14 on-premises or privately reachable sources
  • Power BI and SQL-based warehouse consumers downstream
  • a mandate to reduce analytics platform sprawl before the next budgeting cycle

The original assumption was simple: export ADF, import what we can into Fabric, and consolidate.

The outcome was better than that, but only because the team stopped treating migration as a portability exercise and started treating it as a controlled redesign of the operating model.

The situation: the estate looked portable until we mapped the dependencies

The first workshop was optimistic. On paper, the estate was “standard ADF”:

  • copy activities from ERP, SQL Server, SFTP, and SaaS apps
  • scheduled orchestration for nightly and intraday loads
  • some SQL-heavy transformations
  • parameterized pipelines reused across business units
  • self-hosted integration runtime for private network access

That is exactly why the migration looked attractive.

But the second workshop changed the tone.

In a windowless room in Q1, the team projected a dependency map for 214 pipelines and found that 46 of them depended on self-hosted integration runtime paths nobody had documented since a 2022 ERP cutover.

That was the moment the executive sponsor realized the hard part was not import. The hard part was whether the current governance boundaries, release process, and hybrid assumptions would survive a OneLake-centric platform.

The root cause: this was an operating model problem

What surfaced was not “Fabric incompatibility.” It was accumulated architectural debt.

Three root causes showed up quickly.

1. Orchestration had become the integration boundary

In ADF, the team had used pipelines as the practical control plane for:

  • scheduling
  • source connectivity
  • parameter management
  • handoffs between ingestion and warehouse teams
  • failure ownership

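Once Fabric entered the picture, that pattern had to be reconsidered. Moving orchestration into Fabric puts those jobs inside a shared SaaS analytics platform, so scheduling, connectivity, and failure ownership become workspace and governance decisions rather than settings inside one integration service.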

2. Hybrid access was carrying hidden risk

Self-hosted integration runtime remained essential for private and on-premises connectivity. In Azure Data Factory, self-hosted IR is the mechanism that enables movement between cloud services and data stores in private networks or on-premises environments (self-hosted IR documentation).

So every pipeline touching:

  • on-prem SQL Server
  • file shares
  • SFTP in segmented networks
  • private endpoints with tightly scoped access

had to be validated as a hybrid pattern, not just counted as “one more pipeline.”

3. The destination architecture was different

Fabric’s center of gravity is OneLake. That matters because some ingestion patterns should not be recreated exactly as they exist today.

For supported sources, Fabric Mirroring continuously replicates source data into OneLake with low latency and low operational overhead, which can replace some scheduled copy patterns (Fabric Mirroring overview).

So the migration question became:

  • which pipelines should move as orchestrations,
  • which should be redesigned around mirrored data,
  • which should be retired,
  • and which should stay where they are for now.

The decision: classify the estate before touching a single pipeline

The team made one decision that saved the program: no blanket migration mandate.

Instead, they classified every asset into four buckets:

  • retire
  • rehost
  • refactor
  • replace

They started with a lightweight inventory of the ADF estate to establish scope. This PowerShell example illustrates the kind of first-pass enumeration they used to count pipelines, triggers, linked services, and integration runtimes before any design debates began.

```powershell
# Enumerate ADF pipelines, triggers, linked services, and integration runtimes
param(
    [string]$ResourceGroupName = "rg-data",
    [string]$DataFactoryName = "adf-prod"
)

$pipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
$triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
$linkedServices = Get-AzDataFactoryV2LinkedService -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
$integrationRuntimes = Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName

[pscustomobject]@{
    Pipelines = $pipelines.Count
    Triggers = $triggers.Count
    LinkedServices = $linkedServices.Count
    IntegrationRuntimes = $integrationRuntimes.Count
} | Format-List
```

If your counts are fuzzy, your migration plan will be fiction.

Next, they exported metadata and built a simple migration flow to force sequencing discipline.

Diagram 2

The sequence matters: inventory first, classify second, validate connectivity and lineage before cutover.
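The scripts in this section read adf_inventory.json, which is a custom file rather than a native ADF export. A minimal sketch of how such a file could be assembled, assuming the factory is Git-integrated so that pipelines, datasets, and linked services are stored as one JSON file each under pipeline/, dataset/, and linkedService/ folders. The output keys (activities, linkedServices, integrationRuntimes) are hypothetical and simply match what the other scripts expect; "AzureIR" is an assumed label for linked services without a connectVia reference, and the adf-repo path is a placeholder.

```python
# Sketch: assemble an adf_inventory.json-style file from a Git-integrated ADF repo.
# ASSUMPTIONS: one JSON object per file under pipeline/, dataset/, linkedService/;
# the output schema is hypothetical and matches the other scripts in this article.
import json
from pathlib import Path


def load_folder(root: Path, folder: str) -> dict:
    """Load every JSON file in root/folder, keyed by the object's name."""
    return {
        doc["name"]: doc
        for doc in (json.loads(p.read_text()) for p in sorted((root / folder).glob("*.json")))
    }


def pipeline_entry(pipeline: dict, datasets: dict, linked_services: dict) -> dict:
    """Resolve activity -> dataset -> linked service -> integration runtime references."""
    activities = pipeline.get("properties", {}).get("activities", [])
    ls_names = set()
    for activity in activities:  # top-level only; nested container activities are skipped
        # Some activities (e.g. Stored Procedure) reference a linked service directly
        direct = activity.get("linkedServiceName", {}).get("referenceName")
        if direct:
            ls_names.add(direct)
        # Copy uses inputs/outputs; Lookup and Get Metadata use typeProperties.dataset
        refs = activity.get("inputs", []) + activity.get("outputs", [])
        dataset_ref = activity.get("typeProperties", {}).get("dataset")
        if dataset_ref:
            refs.append(dataset_ref)
        for ref in refs:
            ds = datasets.get(ref.get("referenceName"), {})
            ls = ds.get("properties", {}).get("linkedServiceName", {}).get("referenceName")
            if ls:
                ls_names.add(ls)
    # No connectVia means the default Azure IR; "AzureIR" is an assumed label
    irs = {
        linked_services.get(n, {}).get("properties", {}).get("connectVia", {}).get("referenceName") or "AzureIR"
        for n in ls_names
    }
    return {
        "name": pipeline["name"],
        "activities": activities,
        "linkedServices": sorted(ls_names),
        "integrationRuntimes": sorted(irs),
    }


def build_inventory(root: Path) -> dict:
    datasets = load_folder(root, "dataset")
    linked = load_folder(root, "linkedService")
    pipelines = load_folder(root, "pipeline")
    return {
        "pipelines": [pipeline_entry(p, datasets, linked) for p in pipelines.values()],
        "linkedServices": list(linked.values()),
    }


if __name__ == "__main__":
    repo = Path("adf-repo")  # hypothetical local clone of the factory's Git repository
    if repo.is_dir():
        Path("adf_inventory.json").write_text(json.dumps(build_inventory(repo), indent=2))
```

Because integrationRuntimes holds runtime names, a check for the literal "SelfHostedIR" only works if the self-hosted runtime actually carries that name; otherwise compare against the names of runtimes whose type is self-hosted.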

Assume adf_inventory.json is a custom exported metadata file containing pipelines, linked services, and integration runtime references. They then ran a simple classifier over that inventory to separate low-friction candidates from pipelines needing design review.

```python
# Load exported ADF inventory and classify assets by migration readiness
import json
from pathlib import Path

inventory = json.loads(Path("adf_inventory.json").read_text())
for pipeline in inventory.get("pipelines", []):
    activities = pipeline.get("activities", [])
    linked = set(pipeline.get("linkedServices", []))
    uses_custom = any(a.get("type") in {"Custom", "DatabricksNotebook"} for a in activities)
    uses_self_hosted_ir = "SelfHostedIR" in pipeline.get("integrationRuntimes", [])
    complexity = "high" if len(activities) > 15 or uses_custom else "medium" if len(activities) > 5 else "low"
    readiness = "review" if uses_self_hosted_ir or uses_custom else "ready"
    print({
        "pipeline": pipeline["name"],
        "activityCount": len(activities),
        "linkedServices": sorted(linked),  # dependencies to validate during design review
        "complexity": complexity,
        "fabricReadiness": readiness,
    })
```

This kind of script is illustrative, not production-ready, but it quickly highlights where “easy migration” narratives break down.


The result of the assessment:

  • 52 pipelines were retired because they duplicated loads already available elsewhere
  • 71 were tagged rehost candidates with low complexity
  • 63 required refactoring due to parameterization, trigger complexity, or downstream coupling
  • 28 were marked replace because mirroring or a different Fabric-native pattern made more sense
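The four buckets can be expressed as an ordered rule set. A minimal sketch, assuming each pipeline has already been annotated with the facts the workshops surfaced (duplicated loads, mirroring suitability, activity count, coupling signals); every field name and threshold below is hypothetical, and in practice the flags come from human review rather than from code.

```python
# Sketch of the four-bucket triage: retire / replace / rehost / refactor.
# ASSUMPTION: all PipelineFacts fields and both thresholds are hypothetical.
from collections import Counter
from dataclasses import dataclass


@dataclass
class PipelineFacts:
    name: str
    duplicates_existing_load: bool    # the same data is already landed elsewhere
    mirroring_candidate: bool         # source judged suitable for Fabric Mirroring
    activity_count: int
    parameterized_across_units: bool  # reused with business-unit parameters
    complex_triggers: bool
    downstream_consumers: int


def classify(p: PipelineFacts, max_low_activities: int = 5, max_consumers: int = 2) -> str:
    """Return one of the four buckets; rules are checked in a fixed order."""
    if p.duplicates_existing_load:
        return "retire"
    if p.mirroring_candidate:
        return "replace"
    coupled = (
        p.parameterized_across_units
        or p.complex_triggers
        or p.downstream_consumers > max_consumers
    )
    if p.activity_count <= max_low_activities and not coupled:
        return "rehost"
    return "refactor"


def summarize(pipelines: list) -> Counter:
    """Count pipelines per bucket so totals can be reconciled with the inventory."""
    return Counter(classify(p) for p in pipelines)
```

The rule order is a design choice: checking duplicates first keeps a redundant load from being migrated just because it would also be an easy rehost.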

That single exercise changed the board conversation. The question became not “Can we migrate ADF?” but “Where does Fabric reduce estate complexity without creating hidden operational debt?”

The implementation: move in waves, not all at once

The implementation happened in waves over 16 weeks.

Wave 1: low-risk orchestration moves

The first wave targeted pipelines with:

  • 5 or fewer activities
  • cloud-native sources
  • no self-hosted IR dependency
  • limited downstream coupling

The team created a wave plan from the classified inventory rather than letting business pressure dictate sequence.

```python
# Build a simple migration wave plan from classified pipeline inventory
import json
from pathlib import Path

inventory = json.loads(Path("adf_inventory.json").read_text())
waves = {"wave1": [], "wave2": [], "wave3": []}

for p in inventory.get("pipelines", []):
    activities = len(p.get("activities", []))
    self_hosted = "SelfHostedIR" in p.get("integrationRuntimes", [])
    if activities <= 5 and not self_hosted:
        waves["wave1"].append(p["name"])
    elif activities <= 15:
        waves["wave2"].append(p["name"])
    else:
        waves["wave3"].append(p["name"])

print(json.dumps(waves, indent=2))
```

The goal was not mathematical perfection. It was to avoid moving the hardest hybrid pipelines first.
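The wave rule above checks only activity count and the self-hosted IR flag, while the Wave 1 criteria also required cloud-native sources and limited downstream coupling. A stricter eligibility check is sketched below, assuming the same hypothetical inventory shape; downstreamDependents is a hypothetical field, the dependent-count threshold is illustrative, and the on-premises type list simply reuses the SqlServer, FileServer, and Sftp types treated as hotspot signals elsewhere in this article.

```python
# Sketch: apply all four Wave 1 criteria and report why a pipeline is excluded.
# ASSUMPTIONS: inventory shape as in the other scripts; "downstreamDependents" is a
# hypothetical field; ON_PREM_TYPES reuses the hotspot types and is not exhaustive.
ON_PREM_TYPES = {"SqlServer", "FileServer", "Sftp"}


def linked_service_types(inventory: dict) -> dict:
    """Map linked service name -> type using the inventory's linkedServices list."""
    return {
        ls["name"]: ls.get("properties", {}).get("type", "Unknown")
        for ls in inventory.get("linkedServices", [])
    }


def wave1_blockers(pipeline: dict, ls_types: dict, max_activities: int = 5, max_dependents: int = 2) -> list:
    """Return the Wave 1 criteria a pipeline fails; an empty list means eligible."""
    blockers = []
    if len(pipeline.get("activities", [])) > max_activities:
        blockers.append("too many activities")
    if any(ls_types.get(name) in ON_PREM_TYPES for name in pipeline.get("linkedServices", [])):
        blockers.append("non-cloud source")
    if "SelfHostedIR" in pipeline.get("integrationRuntimes", []):
        blockers.append("self-hosted IR")
    if pipeline.get("downstreamDependents", 0) > max_dependents:
        blockers.append("downstream coupling")
    return blockers
```

A pipeline lands in Wave 1 only when wave1_blockers returns an empty list; for everything else, the returned reasons double as the agenda for its design review.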

Wave 1 moved 43 pipelines in 4 weeks. Success criteria were strict:

  • identical schedule adherence
  • no increase in data freshness SLA breach rate
  • lineage visibility preserved
  • no increase in manual support escalations
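Success criteria like these are easiest to enforce when each one is a comparison against a pre-migration baseline. A minimal sketch, assuming the team captures comparable metrics for the same pipelines before and after the move; the WaveMetrics fields and the sample numbers are hypothetical, and "identical schedule adherence" is interpreted here as "no lower than baseline".

```python
# Sketch: evaluate Wave 1 success gates against a pre-migration baseline.
# ASSUMPTIONS: metric fields are hypothetical; "identical" adherence is treated
# as "no lower than baseline".
from dataclasses import dataclass


@dataclass
class WaveMetrics:
    schedule_adherence: float  # share of runs that started on schedule, 0..1
    sla_breach_rate: float     # share of loads that missed the freshness SLA, 0..1
    lineage_preserved: bool    # lineage still visible end to end after the move
    manual_escalations: int    # escalations over a comparable period


def wave_gate(baseline: WaveMetrics, current: WaveMetrics) -> dict:
    """Return gate name -> passed for the four Wave 1 success criteria."""
    return {
        "schedule adherence": current.schedule_adherence >= baseline.schedule_adherence,
        "freshness SLA breaches": current.sla_breach_rate <= baseline.sla_breach_rate,
        "lineage visibility": current.lineage_preserved,
        "manual escalations": current.manual_escalations <= baseline.manual_escalations,
    }


# Illustrative numbers only
baseline = WaveMetrics(schedule_adherence=0.99, sla_breach_rate=0.05, lineage_preserved=True, manual_escalations=4)
after = WaveMetrics(schedule_adherence=0.99, sla_breach_rate=0.02, lineage_preserved=True, manual_escalations=6)
failed = [gate for gate, passed in wave_gate(baseline, after).items() if not passed]
print(failed)  # ['manual escalations']
```

Any non-empty failed list blocks the wave from being declared complete, even when freshness improved, which keeps a single good metric from masking a support regression.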

Wave 2: dependency hotspots and governance review

The second wave focused on linked services and private connectivity. This was where the migration almost stalled.

The team used a hotspot review to identify linked services and integration runtime patterns that needed explicit testing and security sign-off.

```python
# Detect dependency hotspots such as self-hosted IR and on-prem linked services
import json
from pathlib import Path

doc = json.loads(Path("adf_inventory.json").read_text())
for ls in doc.get("linkedServices", []):
    props = ls.get("properties", {})
    ir = props.get("connectVia", {}).get("referenceName", "AzureIR")
    ls_type = props.get("type", "Unknown")
    hotspot = ir != "AzureIR" or ls_type in {"SqlServer", "FileServer", "Sftp"}
    print({
        "linkedService": ls["name"],
        "type": ls_type,
        "integrationRuntime": ir,
        "migrationHotspot": hotspot,
    })
```

Treat this as a triage heuristic, not a definitive dependency detector. A non-default connectVia reference or an on-premises-oriented linked service type is a signal for deeper review, not proof of identical migration risk across all assets.

This review surfaced:

  • 14 linked services with private or on-prem reachability constraints
  • 9 pipelines dependent on segmented network routes with undocumented firewall rules
  • 6 secrets-handling patterns that had to be redesigned for the target operating model
  • 11 pipelines whose support ownership was unclear once moved into Fabric workspaces

That last point mattered more than expected. Governance inside a unified SaaS analytics platform is not the same as governance around a standalone integration service. Workspace ownership, access boundaries, lineage expectations, and domain accountability all had to be redefined.
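One way to keep the ownership question from resurfacing after cutover is to treat an unowned pipeline as a visible defect. A minimal sketch, assuming the program maintains two hypothetical mappings (pipeline to target Fabric workspace, and workspace to accountable domain team); neither is produced by Fabric in this form, and the names in the usage example are placeholders.

```python
# Sketch: flag migrated pipelines whose support ownership is unclear.
# ASSUMPTION: both mappings are hypothetical, maintained by the migration team.
def ownership_gaps(pipeline_workspace: dict, workspace_owner: dict) -> dict:
    """Return pipeline -> reason for every pipeline without a clear owner."""
    gaps = {}
    for pipeline, workspace in sorted(pipeline_workspace.items()):
        if not workspace:
            gaps[pipeline] = "no target workspace assigned"
        elif not workspace_owner.get(workspace):
            gaps[pipeline] = f"workspace '{workspace}' has no accountable owner"
    return gaps


# Placeholder names for illustration
print(ownership_gaps(
    {"pl_sales_daily": "ws-sales", "pl_erp_gl": "ws-finance", "pl_hr_sync": None},
    {"ws-sales": "sales-data-team", "ws-finance": None},
))
```

Running a check like this before each wave turns "ownership was unclear" from a post-migration discovery into an entry criterion.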

Wave 3: replace legacy copy patterns where Fabric offered a better alternative

For several source systems, the team chose not to recreate scheduled copy pipelines. Instead, they evaluated whether mirrored access into OneLake would meet latency and operational requirements.

That mattered for two reasons:

  1. It reduced duplicated landing-zone logic.
  2. It aligned downstream engineering and analytics teams around shared data in OneLake.
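The evaluation described above reduces to a few explicit checks per source. A minimal sketch, assuming observed replication lag is measured in a pilot and compared with each consumer's freshness requirement; MIRRORING_SUPPORTED is a placeholder set, not an authoritative list, and should be replaced with the sources listed in the current Fabric Mirroring documentation.

```python
# Sketch: decide whether mirroring could replace a scheduled copy pipeline.
# ASSUMPTIONS: MIRRORING_SUPPORTED is a placeholder; lag comes from a pilot run.
from typing import Optional

MIRRORING_SUPPORTED = {"AzureSqlDatabase", "Snowflake"}  # placeholder, verify against docs


def mirroring_fit(source_type: str, required_freshness_min: float, observed_lag_min: Optional[float]) -> tuple:
    """Return (fits, reason) for replacing a scheduled copy with mirrored data."""
    if source_type not in MIRRORING_SUPPORTED:
        return False, "source type not in supported list"
    if observed_lag_min is None:
        return False, "no pilot lag measurement yet"
    if observed_lag_min > required_freshness_min:
        return False, "observed lag exceeds freshness requirement"
    return True, "candidate for replace"
```

A positive result only makes a source a replace candidate; ownership of the mirrored data in OneLake still goes through the same governance review as any migrated pipeline.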

The results: consolidation worked, but only because operations changed too

After 16 weeks, the program had measurable outcomes.

What moved

  • 134 of 214 pipelines were migrated or replaced in Fabric
  • 52 pipelines were retired
  • 28 remained in ADF temporarily due to hybrid complexity or low strategic value
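These outcomes reconcile with the assessment: the 162 pipelines not marked for retirement (71 rehost, 63 refactor, 28 replace) split into 134 moved and 28 still in ADF. A check of that kind, using only the figures reported in this case study, is worth automating so that wave reporting never drifts from the inventory.

```python
# Reconcile assessment buckets with migration outcomes (figures from this case study).
TOTAL = 214
assessment = {"retire": 52, "rehost": 71, "refactor": 63, "replace": 28}
outcome = {"migrated_or_replaced": 134, "retired": 52, "remained_in_adf": 28}

assert sum(assessment.values()) == TOTAL, "assessment buckets do not cover the estate"
assert sum(outcome.values()) == TOTAL, "outcomes do not cover the estate"
assert assessment["retire"] == outcome["retired"], "retirement count drifted"

planned_moves = TOTAL - assessment["retire"]
deferred = planned_moves - outcome["migrated_or_replaced"]
assert deferred == outcome["remained_in_adf"], "deferred pipelines are unaccounted for"
print(f"planned moves: {planned_moves}, deferred: {deferred}")  # planned moves: 162, deferred: 28
```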

What improved

  • analytics platform count for core ingestion/orchestration dropped from 3 to 2
  • median time to deploy a low-complexity orchestration change fell from 5 business days to 2
  • duplicate landing-zone datasets for the targeted domains dropped by 38%
  • median incident triage time for migrated pipelines fell from 47 minutes to 29 minutes after runbooks were rewritten
  • daily freshness SLA attainment for the first migrated domains improved from 94.8% to 98.1%
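Relative changes make these results easier to compare. Expressing freshness as a miss rate (100% minus SLA attainment) shows misses falling from 5.2% to 1.9%, roughly a 63% reduction, which is a stronger signal than the 3.3-point rise in attainment suggests. A minimal calculation using only the reported numbers:

```python
# Relative change for the reported before/after figures (numbers from this case study).
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after; negative values are reductions."""
    return (after - before) / before * 100


results = {
    "deploy time (business days)": (5, 2),
    "median triage time (minutes)": (47, 29),
    # Freshness expressed as a miss rate: 100% minus SLA attainment
    "freshness SLA misses (%)": (100 - 94.8, 100 - 98.1),
}
for name, (before, after) in results.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
# deploy time (business days): -60.0%
# median triage time (minutes): -38.3%
# freshness SLA misses (%): -63.5%
```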

What did not magically improve

  • hybrid connectivity effort: still high
  • release management complexity: initially worse during dual-running
  • security review cycle time: increased by 22% during the first 6 weeks because workspace and identity boundaries had to be re-approved

What the CFO cared about

The first-quarter financial impact was not “instant savings.” It was controlled consolidation:

  • approximately 11% reduction in duplicated operational effort across ingestion and BI support
  • temporary dual-running cost for 10 weeks
  • training and redesign costs that offset early platform savings

That is the honest business case. Consolidation can reduce sprawl, but the benefits come from standardization and ownership simplification, not from pretending migration is free.

The tradeoffs: where Fabric helped, where keeping ADF was the right call

Selective migration beat ideological migration.

Fabric was the right target when:

  • the pipeline was primarily orchestrating cloud-accessible sources
  • downstream consumers were already moving toward Fabric or Power BI-centric workflows
  • duplicated data movement could be reduced via OneLake-centric design
  • governance teams were ready to manage workspace-based ownership

ADF stayed in place when:

  • self-hosted IR and private connectivity were deeply embedded
  • the business value of moving a stable pipeline was low
  • release processes depended on patterns not yet redesigned for Fabric
  • the migration would have created more operational ambiguity than simplification

Migration only stabilized after the support model changed. The team rewrote runbooks, reassigned on-call ownership by domain, and created a cross-platform incident path for the dual-running period.

CI/CD needed the same treatment. Even where pipeline logic was portable, release management was not. Source control, environment promotion, parameter handling, rollback expectations, and test evidence all had to be redesigned around the Fabric operating model.
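Making those expectations explicit per release is what turns them into a process. A minimal sketch of a pre-promotion check, assuming a hypothetical convention in which environment-specific values appear as ${name} placeholders in a pipeline definition and are supplied per environment; the placeholder syntax, the parameter source, and the evidence flags are illustrative, not Fabric deployment features.

```python
# Sketch: pre-promotion gate for one pipeline definition.
# ASSUMPTIONS: "${name}" placeholders, per-environment parameter dicts, and the
# evidence flags are hypothetical conventions, not Fabric deployment features.
import re

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def promotion_blockers(definition: str, env_params: dict, has_test_evidence: bool, has_rollback_plan: bool) -> list:
    """Return reasons the definition should not be promoted; empty means clear."""
    blockers = []
    missing = sorted(set(PLACEHOLDER.findall(definition)) - set(env_params))
    if missing:
        blockers.append("unresolved parameters: " + ", ".join(missing))
    if not has_test_evidence:
        blockers.append("no test evidence attached")
    if not has_rollback_plan:
        blockers.append("no rollback plan")
    return blockers


def render(definition: str, env_params: dict) -> str:
    """Substitute placeholders; call only after promotion_blockers returns []."""
    return PLACEHOLDER.sub(lambda m: env_params[m.group(1)], definition)


template = '{"server": "${sqlServer}", "database": "${dbName}"}'
test_env = {"sqlServer": "sql-test.example.net", "dbName": "dw_test"}  # placeholder values
print(promotion_blockers(template, test_env, has_test_evidence=True, has_rollback_plan=True))  # []
print(render(template, test_env))  # {"server": "sql-test.example.net", "database": "dw_test"}
```

Keeping the gate separate from substitution means a release can be rejected with a readable reason before anything environment-specific is written.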

The executive takeaway: evaluate operating impact, not just portability

If you are evaluating this move now, here is the framework I recommend.

Score each domain on five criteria:

  1. Portability — How much pipeline logic is straightforward orchestration versus custom or tightly coupled behavior?
  2. Hybrid dependency — How much depends on self-hosted IR, private networking, or on-premises access?
  3. Governance fit — Are workspace ownership, access control, and lineage expectations clear in a shared Fabric estate?
  4. Operational readiness — Are monitoring, runbooks, support ownership, and release processes ready to change?
  5. Strategic value — Will OneLake consolidation or reduced duplication create real downstream benefit?

If a domain scores high on portability and strategic value, migrate now.

If it scores high on hybrid dependency and low on strategic value, keep it where it is.

If governance is immature, phase by domain rather than forcing a platform-wide motion.
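The three rules above can be encoded so that every domain is scored the same way. A minimal sketch, assuming each criterion is scored from 1 (low) to 5 (high); the thresholds for "high" and "low" are hypothetical, and because the framework gives no explicit rule for operational readiness, the sketch carries that score for reporting only. Domains that match none of the rules fall back to case-by-case review, which is also an assumption.

```python
# Sketch: score a domain on the five criteria and apply the three decision rules.
# ASSUMPTIONS: 1-5 scores and the high/low thresholds are hypothetical;
# operational_readiness is carried for reporting because no rule uses it.
from dataclasses import dataclass


@dataclass
class DomainScores:
    portability: int            # straightforward orchestration vs custom or coupled logic
    hybrid_dependency: int      # reliance on self-hosted IR, private networking, on-prem access
    governance_fit: int         # clarity of workspace ownership, access control, lineage
    operational_readiness: int  # monitoring, runbooks, support ownership, release process
    strategic_value: int        # benefit from OneLake consolidation or reduced duplication


def recommend(s: DomainScores, high: int = 4, low: int = 2) -> str:
    """Apply the rules in order; unmatched domains go to case-by-case review."""
    if s.hybrid_dependency >= high and s.strategic_value <= low:
        return "keep in ADF for now"
    if s.portability >= high and s.strategic_value >= high:
        # Immature governance does not block migration, but it narrows it to one domain at a time
        return "migrate, phased by domain" if s.governance_fit <= low else "migrate now"
    if s.governance_fit <= low:
        return "phase by domain"
    return "review case by case"
```

Scoring domains rather than individual pipelines keeps the decision at the level where ownership and governance are actually assigned.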

Migrating Azure Data Factory pipelines into Fabric without rebuilding your data estate is possible, but only if you accept that you are redesigning ownership and operating boundaries even when the pipeline logic itself looks portable.

Microsoft has clearly positioned Fabric Data Factory as the forward path, and for many enterprises that is the right strategic direction (ADF documentation hub, Fabric Data Factory migration planning). But the best programs are selective, sequenced, and explicit about where Fabric should replace old patterns versus where ADF should remain during a longer transition.

If your migration plan starts with “export and import,” it is too shallow.

If it starts with “which operating assumptions break when orchestration moves into OneLake’s orbit,” you are asking the right question.

Where does this break in your environment: self-hosted IR, workspace governance, or release management?

#MicrosoftFabric #AzureDataFactory #DataArchitecture


Code Reference

Additional code samples that complement the tutorial above.

Sample 1 (python)

```python
# Score pipeline migration complexity from exported ADF JSON files
import json
from pathlib import Path

def score_pipeline(doc: dict) -> int:
    activities = doc.get("properties", {}).get("activities", [])
    score = len(activities)
    score += sum(3 for a in activities if a.get("type") in {"ForEach", "Until", "IfCondition"})
    score += sum(5 for a in activities if a.get("type") in {"Custom", "ExecutePipeline"})
    return score

for file in Path("adf_export/pipelines").glob("*.json"):
    doc = json.loads(file.read_text())
    score = score_pipeline(doc)
    band = "low" if score < 8 else "medium" if score < 20 else "high"
    print(f"{doc['name']}: score={score}, complexity={band}")
```
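Sample 1 scores only top-level activities, so a ForEach or IfCondition that wraps ten child activities counts the same as an empty one. A variant that walks nested activities is sketched below, assuming children sit under the typeProperties keys ADF uses for container activities (activities for ForEach and Until, ifTrueActivities and ifFalseActivities for IfCondition, cases and defaultActivities for Switch). It also treats Switch as a container, which Sample 1 does not; the weights are otherwise unchanged and remain illustrative.

```python
# Variant of Sample 1 that also scores activities nested inside container activities.
# ASSUMPTION: children live under the typeProperties keys read in child_activities;
# verify against your own exported JSON. Weights match Sample 1 and remain illustrative.
import json
from pathlib import Path

CONTAINERS = {"ForEach", "Until", "IfCondition", "Switch"}  # Switch added vs Sample 1


def child_activities(activity: dict) -> list:
    """Collect the direct children of a container activity (empty for other types)."""
    tp = activity.get("typeProperties", {})
    children = list(tp.get("activities", []))   # ForEach, Until
    children += tp.get("ifTrueActivities", [])  # IfCondition
    children += tp.get("ifFalseActivities", [])
    for case in tp.get("cases", []):            # Switch
        children += case.get("activities", [])
    children += tp.get("defaultActivities", [])
    return children


def score_activities(activities: list) -> int:
    """Recursively apply Sample 1's weights to every activity at every depth."""
    score = 0
    for a in activities:
        score += 1
        if a.get("type") in CONTAINERS:
            score += 3
        if a.get("type") in {"Custom", "ExecutePipeline"}:
            score += 5
        score += score_activities(child_activities(a))
    return score


def score_pipeline(doc: dict) -> int:
    return score_activities(doc.get("properties", {}).get("activities", []))


if __name__ == "__main__":
    for file in sorted(Path("adf_export/pipelines").glob("*.json")):
        doc = json.loads(file.read_text())
        score = score_pipeline(doc)
        # Nested counting raises scores; recalibrate these cut-offs before reuse
        band = "low" if score < 8 else "medium" if score < 20 else "high"
        print(f"{doc['name']}: score={score}, complexity={band}")
```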

Sample 2 (powershell)

```powershell
# Export a lightweight governance inventory for migration assessment
param(
    [string]$ResourceGroupName = "rg-data",
    [string]$DataFactoryName = "adf-prod",
    [string]$OutputPath = ".\adf-inventory.csv"
)

Get-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName |
    ForEach-Object {
        [pscustomobject]@{
            AssetType = "Pipeline"
            Name = $_.Name
            Activities = $_.Activities.Count
            Folder = $_.Folder.Name
            Annotations = ($_.Annotations -join ";")
        }
    } | Export-Csv -Path $OutputPath -NoTypeInformation
```

Sample 3 (powershell)

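```powershell
# Flag self-hosted integration runtimes and the linked services that connect through them
param(
    [string]$ResourceGroupName = "rg-data",
    [string]$DataFactoryName = "adf-prod"
)

$irs = Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
$linked = Get-AzDataFactoryV2LinkedService -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName

# An explicit connectVia can also point at an Azure IR, so match against self-hosted IR names
$selfHostedNames = @($irs | Where-Object { $_.Type -eq "SelfHosted" } | ForEach-Object { $_.Name })

$irs | Select-Object Name, Type | Format-Table -AutoSize
$linked | ForEach-Object {
    $irName = $_.Properties.ConnectVia.ReferenceName
    [pscustomobject]@{
        Name = $_.Name
        # The SDK model class name identifies the type, e.g. SqlServerLinkedService
        Type = $_.Properties.GetType().Name
        IntegrationRuntime = $irName
        UsesSelfHostedIR = ($null -ne $irName) -and ($selfHostedNames -contains $irName)
    }
} | Format-Table -AutoSize
```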

Sample 4 (mermaid)

Diagram 9


Sources & References

  1. Microsoft Fabric documentation - Microsoft Fabric
  2. Introduction to Azure Data Factory - Azure Data Factory
  3. Create a self-hosted integration runtime - Azure Data Factory & Azure Synapse
  4. Mirroring - Microsoft Fabric
  5. Migration Planning for Azure Data Factory to Fabric Data Factory - Microsoft Fabric
  6. Azure Data Factory Documentation - Azure Data Factory
  7. Ingest Data with Microsoft Fabric - Training

Try it yourself

Run this tutorial as a Jupyter notebook: Download runbook.ipynb (28 cells, 24 KB).
