---
name: synapse-migration
description: >
  Port Azure Synapse Analytics Spark workloads to Microsoft Fabric.
  Translates mssparkutils calls to notebookutils (including the env→runtime namespace change),
  replaces Linked Services with Fabric Data Connections and OneLake Shortcuts.
  Covers Spark Pools, Lake Databases, Notebooks, and Spark Job Definitions.
  Use when the user wants to:
  (1) port Synapse Spark notebooks to Fabric Lakehouse or Spark Job Definitions,
  (2) replace mssparkutils or Linked Services in Synapse code.
  Triggers: "migrate from synapse", "synapse to fabric", "mssparkutils to notebookutils",
  "synapse linked service replacement", "port synapse notebooks", "synapse workspace migration".
---

> **Update Check — ONCE PER SESSION (mandatory)**
> The first time this skill is used in a session, run the **check-updates** skill before proceeding.
> - **GitHub Copilot CLI / VS Code**: invoke the `check-updates` skill.
> - **Claude Code / Cowork / Cursor / Windsurf / Codex**: compare local vs remote package.json version.
> - Skip if the check was already performed earlier in this session.

> **CRITICAL NOTES**
> 1. To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering
> 2. To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering
> 3. `mssparkutils` and `notebookutils` share the same API surface in most cases — the namespace is the primary change
> 4. Linked Services have no direct REST API equivalent in Fabric — they are replaced by Data Connections (for external sources) and OneLake Shortcuts (for storage mounts)
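
Notes 1 and 2 describe the same lookup: list, then filter by name. A JMESPath filter such as `value[?displayName=='Finance'].id | [0]` does this on the CLI. The sketch below shows a plain-Python equivalent. It assumes the list response is a `value` array whose entries carry `id`, `displayName`, and `type`. The `find_id` helper and all sample IDs and names are hypothetical.

```python
from typing import Optional

def find_id(payload: dict, display_name: str, item_type: Optional[str] = None) -> Optional[str]:
    """Return the ID of the first entry matching the name (and type, if given)."""
    for entry in payload.get("value", []):
        if entry.get("displayName") != display_name:
            continue
        if item_type is not None and entry.get("type") != item_type:
            continue
        return entry.get("id")
    return None

# Hypothetical list responses, trimmed to the fields used here.
workspaces = {"value": [
    {"id": "ws-111", "displayName": "Sales"},
    {"id": "ws-222", "displayName": "Finance"},
]}
items = {"value": [
    {"id": "it-1", "displayName": "Orders", "type": "Notebook"},
    {"id": "it-2", "displayName": "Orders", "type": "Lakehouse"},
]}

workspace_id = find_id(workspaces, "Finance")
lakehouse_id = find_id(items, "Orders", item_type="Lakehouse")
print(workspace_id, lakehouse_id)
```

Filtering on both name and type matters because two items of different types can share a display name.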

# Synapse Analytics → Microsoft Fabric Migration

## Prerequisite Knowledge

These companion documents provide general Fabric REST patterns. **Do NOT read them upfront** — reference only when a specific phase requires a pattern not already covered in this skill's resource files:

- [COMMON-CORE.md](../../common/COMMON-CORE.md) — General Fabric REST API patterns, authentication & token audiences, item discovery via JMESPath
- [COMMON-CLI.md](../../common/COMMON-CLI.md) — `az rest` / `az login` CLI patterns, authentication recipes
- [SPARK-AUTHORING-CORE.md](../../common/SPARK-AUTHORING-CORE.md) — Notebook/lakehouse creation (already covered in [spark-item-migration.md](resources/spark-item-migration.md) and [lake-database-migration.md](resources/lake-database-migration.md))
- [SQLDW-AUTHORING-CORE.md](../../common/SQLDW-AUTHORING-CORE.md) — Fabric Warehouse T-SQL (delegate to `sqldw-authoring-cli` skill)

> **Auth, API endpoints, and item payloads are fully documented in this skill's own files.** The common docs above are fallback references only.

---

## Table of Contents

| Topic | Reference |
|---|---|
| **Migration Orchestrator** | [migration-orchestrator.md](resources/migration-orchestrator.md) |
| API-Driven Migration Workflow | [§ API-Driven Migration Workflow](#api-driven-migration-workflow) |
| Migration Workload Map | [§ Migration Workload Map](#migration-workload-map) |
| Spark Pool → Environment Migration | [spark-pool-migration.md](resources/spark-pool-migration.md) |
| Lake Database → Lakehouse Migration | [lake-database-migration.md](resources/lake-database-migration.md) |
| External Hive Metastore → Lakehouse Migration | [external-hms-migration.md](resources/external-hms-migration.md) |
| Notebook & SJD Migration | [spark-item-migration.md](resources/spark-item-migration.md) |
| Library Compatibility (Synapse vs. Fabric RT 1.3) | [library-compatibility.md](resources/library-compatibility.md) |
| Connector Refactoring (Kusto, Cosmos DB, ADLS OAuth) | [connector-refactoring.md](resources/connector-refactoring.md) |
| `mssparkutils` → `notebookutils` API Mapping | [utility-api-mapping.md](resources/utility-api-mapping.md) |
| Linked Services → Data Connections / Shortcuts | [connectivity-migration.md](resources/connectivity-migration.md) |
| Before/After Code Patterns (incl. Catalog API gaps) | [code-patterns.md](resources/code-patterns.md) |
| Migration Report (with Fabric portal links) | [migration-report.md](resources/migration-report.md) |
| Migration Troubleshooting Guide | [migration-gotchas.md](resources/migration-gotchas.md) |
| Validation & Testing | [validation-testing.md](resources/validation-testing.md) |
| Security & Governance (Production Readiness) | [security-governance.md](resources/security-governance.md) |
| T-SQL & Spark Configuration Differences | [§ T-SQL & Spark Configuration Differences](#t-sql--spark-configuration-differences) |
| Capacity Sizing Reference | [§ Capacity Sizing Reference](#capacity-sizing-reference) |
| Must / Prefer / Avoid | [§ Must / Prefer / Avoid](#must--prefer--avoid) |
| Feature Parity Reference | [§ Feature Parity Reference](#feature-parity-reference) |
| Migration Gotchas — Quick Reference | [§ Migration Gotchas](#migration-gotchas--quick-reference) + [migration-gotchas.md](resources/migration-gotchas.md) |
| Post-Migration: What's Next | [§ Post-Migration: What's Next](#post-migration-whats-next) |

### Context Loading Guide

> **IMPORTANT — Load only what you need.** Do NOT read all resource files upfront. Load the specific file for the phase you are executing:

| When | Read This File | Lines |
|---|---|---|
| User asks to migrate a workspace (full orchestration) | [migration-orchestrator.md](resources/migration-orchestrator.md) | ~1264 |
| Phase 0: Spark Pools → Environments | [spark-pool-migration.md](resources/spark-pool-migration.md) | ~290 |
| Phase 1: Databases → Lakehouses (built-in HMS) | [lake-database-migration.md](resources/lake-database-migration.md) | ~574 |
| Phase 1: Databases → Lakehouses (external HMS) | [external-hms-migration.md](resources/external-hms-migration.md) | ~388 |
| Phase 2–3: Notebooks & SJDs | [spark-item-migration.md](resources/spark-item-migration.md) | ~326 |
| Code refactoring (mssparkutils, connectors) | [utility-api-mapping.md](resources/utility-api-mapping.md) + [connector-refactoring.md](resources/connector-refactoring.md) + [code-patterns.md](resources/code-patterns.md) | ~588 |
| Post-migration validation | [validation-testing.md](resources/validation-testing.md) | ~487 |
| Troubleshooting failures | [migration-gotchas.md](resources/migration-gotchas.md) | ~225 |
| Production security setup | [security-governance.md](resources/security-governance.md) | ~926 |
| Library version gaps | [library-compatibility.md](resources/library-compatibility.md) | ~106 |
| Generating migration report | [migration-report.md](resources/migration-report.md) | ~360 |
| Capacity sizing & SKU planning | [capacity-sizing.md](resources/capacity-sizing.md) | ~85 |
| Feature parity matrix | [feature-parity.md](resources/feature-parity.md) | ~65 |

---

## API-Driven Migration Workflow

This skill supports programmatic migration of Synapse Spark items via REST APIs (no UI-based Migration Assistant required).

### Authentication

| Target | Token Audience |
|---|---|
| Synapse ARM (management plane) | `https://management.azure.com` |
| Synapse Data Plane | `https://dev.azuresynapse.net` |
| Fabric REST API | `https://api.fabric.microsoft.com` |

> Use the token-acquisition recipe in [COMMON-CLI § Authentication Recipes](../../common/COMMON-CLI.md#authentication-recipes) with the audiences above.
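
As a rough illustration of the table above, the sketch below maps each target to its audience and builds an `az account get-access-token` command for it. The `AUDIENCES` keys and the `token_command` and `get_token` helpers are hypothetical names. The recipe in COMMON-CLI remains the reference.

```python
import subprocess

# Token audiences from the table above; the dictionary keys are illustrative.
AUDIENCES = {
    "synapse-arm": "https://management.azure.com",
    "synapse-data-plane": "https://dev.azuresynapse.net",
    "fabric": "https://api.fabric.microsoft.com",
}

def token_command(target: str) -> list[str]:
    """Build the Azure CLI command that requests a token for one target."""
    return [
        "az", "account", "get-access-token",
        "--resource", AUDIENCES[target],
        "--query", "accessToken",
        "--output", "tsv",
    ]

def get_token(target: str) -> str:
    """Run the command and return the token; requires a prior `az login`."""
    result = subprocess.run(
        token_command(target), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

# Only the command is printed here; get_token() is left for an authenticated shell.
print(" ".join(token_command("fabric")))
```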

### Migration Phases (Execute in Order)

| Phase | Synapse Source | Fabric Target | Resource |
|---|---|---|---|
| Phase 0 | Spark Pool | Environment | [spark-pool-migration.md](resources/spark-pool-migration.md) |
| Phase 1 | Lake Database (built-in HMS) | Lakehouse | [lake-database-migration.md](resources/lake-database-migration.md) |
| Phase 1 | External Hive Metastore | Lakehouse | [external-hms-migration.md](resources/external-hms-migration.md) |
| Phase 1b | Ad-hoc `abfss://` storage paths | OneLake Shortcuts | [migration-orchestrator.md](resources/migration-orchestrator.md) (migrate-and-modernize only) |
| Phase 2 | Notebooks | Notebook | [spark-item-migration.md](resources/spark-item-migration.md) |
| Phase 3 | Spark Job Definitions | SJD | [spark-item-migration.md](resources/spark-item-migration.md) |
| Final | Validation & Testing | — | [validation-testing.md](resources/validation-testing.md) |
| Optional | Security & Governance | — | [security-governance.md](resources/security-governance.md) |

> **Phase order matters**: Environments (Phase 0) must exist before notebooks/SJDs can bind to them. Lakehouses (Phase 1) must exist before notebooks can bind to them (Phase 2).

> For the full execution flow with sub-steps, decision points, lift-and-shift vs. modernize paths, and error recovery, see [migration-orchestrator.md](resources/migration-orchestrator.md).
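
The ordering constraint can be expressed as a dependency graph. The sketch below uses Python's `graphlib` to derive one valid execution order. The phase labels are shorthand for the table rows, and the edges marked as assumptions are not stated in the note above.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases that must complete before it starts.
DEPENDS_ON = {
    "phase0_environments": set(),
    "phase1_lakehouses": set(),
    # Assumption: shortcuts are created inside an existing Lakehouse.
    "phase1b_shortcuts": {"phase1_lakehouses"},
    "phase2_notebooks": {"phase0_environments", "phase1_lakehouses"},
    # Assumption: SJDs may reference a Lakehouse as well as an Environment.
    "phase3_sjds": {"phase0_environments", "phase1_lakehouses"},
    "final_validation": {"phase1b_shortcuts", "phase2_notebooks", "phase3_sjds"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order the sorter returns places Environments and Lakehouses ahead of the items that bind to them, which is the property the phase table relies on.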

### REST API Quick Reference

All Synapse and Fabric API endpoints, with request/response examples, are in [migration-orchestrator.md](resources/migration-orchestrator.md) (Steps 2a–2e). Token audiences are listed under [§ Authentication](#authentication).

> **API docs**: [Synapse ARM](https://learn.microsoft.com/en-us/rest/api/synapse) · [Synapse Data Plane](https://learn.microsoft.com/en-us/rest/api/synapse/data-plane) · [Fabric Items](https://learn.microsoft.com/en-us/rest/api/fabric/core/items) · [Fabric Shortcuts](https://learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts) · [Fabric Connections](https://learn.microsoft.com/en-us/rest/api/fabric/core/connections) · [Fabric Environments](https://learn.microsoft.com/en-us/rest/api/fabric/environment)

---

## Migration Workload Map

Use this table to determine the correct Fabric target for each Synapse component:

| Synapse Component | Fabric Target | Notes |
|---|---|---|
| **Spark Pool** (notebooks, jobs) | Fabric Spark (Lakehouse / Notebooks / SJD) | Starter Pool replaces on-demand pools for most workloads |
| **Dedicated SQL Pool** | **Fabric Warehouse** | T-SQL surface area differences apply — see [§ T-SQL & Spark Configuration Differences](#t-sql--spark-configuration-differences). *Procedural migration guide not yet available — separate migration track. For T-SQL authoring, delegate to `sqldw-authoring-cli`.* |
| **Serverless SQL Pool** | **Lakehouse SQL Endpoint** | Read-only Delta/Parquet queries; no DDL required |
| **Synapse Pipelines** | **Fabric Data Pipelines** | Activity types, triggers, and expressions are broadly compatible. *Pipeline migration resource not yet available — separate migration track.* |
| **Synapse Link for Cosmos DB / SQL** | **Fabric Mirroring** | Native mirroring replaces the Synapse Link connector pattern. *Not covered by this skill.* |
| **Linked Services** | **Data Connections** (external) / **OneLake Shortcuts** (storage) | See [connectivity-migration.md](resources/connectivity-migration.md) |
| **Integration Datasets** | **Fabric Pipeline source/sink config** | Dataset definitions are inlined into pipeline activities in Fabric. *Not covered by this skill.* |
| **Managed Virtual Networks** | **Fabric Managed Private Endpoints** | Configure in the Fabric workspace settings |
| **Synapse Studio** | **Fabric workspace** | All artifact types live in a single workspace with Git integration |

### Decision Tree: Which Fabric Spark Workload?

```text
Synapse Spark workload
├── Interactive notebook with data exploration → Fabric Notebook (attached to Lakehouse)
├── Scheduled/production job → Spark Job Definition (SJD)
├── T-SQL over files/Delta → Lakehouse SQL Endpoint (no migration needed — just point to OneLake)
└── Real-time ingest → Fabric Eventstream + Lakehouse
```

---

## T-SQL & Spark Configuration Differences

For detailed T-SQL surface area gaps (PolyBase → `COPY INTO`, distribution hints, result set caching) and Spark configuration mappings (pools, `%%configure`, runtime versions), see [feature-parity.md](resources/feature-parity.md).

> **Key actions**: Remove `DISTRIBUTION = HASH(col)` hints, replace `CREATE EXTERNAL TABLE` with `COPY INTO`, replace `spark.read.synapsesql()` with OneLake shortcuts or JDBC. Delegate T-SQL authoring to `sqldw-authoring-cli`.
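
One way to find the code these key actions apply to is a pattern scan over exported notebook or SQL source. The sketch below is a minimal example. The regular expressions and the `scan` helper are illustrative assumptions and will miss statements split across lines.

```python
import re

# Pattern -> suggested action, taken from the key actions above.
CHECKS = {
    r"DISTRIBUTION\s*=\s*(HASH\s*\([^)]*\)|ROUND_ROBIN|REPLICATE)": "remove distribution hint",
    r"CREATE\s+EXTERNAL\s+TABLE": "rewrite as COPY INTO",
    r"\.synapsesql\s*\(": "replace with OneLake shortcut read or JDBC",
}

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, action) pairs for every flagged line."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for pattern, action in CHECKS.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append((number, action))
    return findings

# Hypothetical mixed SQL / PySpark source used only to exercise the scan.
sample = """CREATE TABLE dbo.Fact (id INT) WITH (DISTRIBUTION = HASH(id));
df = spark.read.synapsesql("pool.dbo.Fact")
SELECT 1;"""

for number, action in scan(sample):
    print(f"line {number}: {action}")
```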

---

## Capacity Sizing Reference

For Synapse pool → Fabric SKU mapping tables, sizing decision guide, and cost model comparison, see [capacity-sizing.md](resources/capacity-sizing.md).

> **Quick guide**: Dev/test = F8–F16 with Starter Pool; standard production = F32–F64; enterprise = F128+. Use Fabric Trial (free F64, 60 days) for migration validation.

---

## Must / Prefer / Avoid

### MUST DO
- **Replace all `mssparkutils` imports with `notebookutils`** — see [utility-api-mapping.md](resources/utility-api-mapping.md) for the complete namespace table
- **Replace all Linked Services** with Fabric Data Connections (for external databases/services) or OneLake Shortcuts (for ADLS Gen2 / Blob storage mounts) — see [connectivity-migration.md](resources/connectivity-migration.md)
- **Replace `spark.read.synapsesql()`** with Lakehouse shortcut reads or JDBC connections to the Fabric Warehouse SQL endpoint
- **Re-test all notebooks** after migration against the target Fabric Runtime version — Spark minor version differences can surface deprecated API warnings
- **Externalize all workspace/item IDs** — never hardcode; use pipeline parameters or [Variable Libraries](#variable-library-for-environment-promotion)
- **Replace pool-level library installs** with Fabric Environments attached at the workspace or notebook level
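
The first item is mostly a mechanical rename. The exceptions are `mssparkutils.env`, which maps to `notebookutils.runtime` rather than to a same-named module, and import statements, which need rewriting by hand. A minimal sketch of that split follows, assuming notebook source is available as plain text. The `port_utils` helper is hypothetical, and every line it flags still needs the mapping in [utility-api-mapping.md](resources/utility-api-mapping.md).

```python
import re

# Lines that cannot be ported by a plain rename.
MANUAL = re.compile(r"\bmssparkutils\.env\b|\bimport\b.*\bmssparkutils\b")

def port_utils(source: str) -> tuple[str, list[int]]:
    """Rename mssparkutils to notebookutils; report lines needing manual review."""
    ported, review = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        if MANUAL.search(line):
            review.append(number)   # leave untouched for a human to rewrite
            ported.append(line)
        else:
            ported.append(re.sub(r"\bmssparkutils\b", "notebookutils", line))
    return "\n".join(ported), review

# Hypothetical Synapse notebook cell.
before = """from notebookutils import mssparkutils
files = mssparkutils.fs.ls("/data")
name = mssparkutils.env.getWorkspaceName()"""

after, needs_review = port_utils(before)
print(after)
print("manual review:", needs_review)
```

Leaving flagged lines unchanged keeps the output diffable against the original, so a reviewer sees exactly which calls were not ported.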

### PREFER
- **OneLake Shortcuts over full data copies** — mount existing ADLS Gen2 containers as shortcuts rather than re-ingesting data during migration
- **Fabric Starter Pool** for dev/test migrations — eliminates pool warm-up wait time inherent in Synapse on-demand pools
- **Lakehouse SQL Endpoint** as a drop-in for Serverless SQL Pool reads — point existing consumers at the endpoint with minimal query changes
- **Medallion architecture** for migrated data — align with Bronze/Silver/Gold patterns (see `e2e-medallion-architecture` skill)
- **Incremental migration** — migrate and validate workload by workload rather than performing a big-bang cutover
- **Parameterized notebooks** to allow environment promotion (dev → test → prod) without code changes

### AVOID
- **Do not copy-paste PolyBase `CREATE EXTERNAL TABLE` DDL** into Fabric Warehouse — rewrite as `COPY INTO` or use Lakehouse for external data access
- **Do not assume Synapse Linked Service connection strings are reusable** — credentials and endpoints must be reconfigured as Fabric Data Connections
- **Do not install libraries in notebook cells** (`%pip install` at runtime) for production workloads — use Fabric Environments for reproducible, versioned library management
- **Do not migrate Dedicated SQL Pool distribution hints** (`HASH`, `ROUND_ROBIN`, `REPLICATE`) verbatim — remove them; Fabric Warehouse handles distribution automatically
- **Do not use `wasb://` or `abfss://container@storageaccount.dfs.core.windows.net/` paths** as primary data paths — migrate data access to OneLake `abfss://workspace@onelake.dfs.fabric.microsoft.com/` paths
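
For the last item, a OneLake path is assembled from the workspace, the item, and a relative path. The sketch below assumes a Lakehouse target addressed by name and the `<lakehouse>.Lakehouse/Files` or `/Tables` layout. The `onelake_path` helper and the sample names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_path(workspace: str, lakehouse: str, relative: str, area: str = "Files") -> str:
    """Build an abfss:// URI for a file or table inside a Lakehouse."""
    if area not in ("Files", "Tables"):
        raise ValueError("area must be 'Files' or 'Tables'")
    tail = relative.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{area}/{tail}"

print(onelake_path("SalesWorkspace", "Bronze", "raw/orders.parquet"))
print(onelake_path("SalesWorkspace", "Bronze", "orders", area="Tables"))
```

Building paths in one place makes it easier to swap the workspace and Lakehouse names for parameters during environment promotion.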

---

## Examples

See [code-patterns.md](resources/code-patterns.md) for full before/after examples. Key quick references:

**`mssparkutils.env` → `notebookutils.runtime`**

```python
# Synapse
workspace = mssparkutils.env.getWorkspaceName()

# Fabric
workspace = notebookutils.runtime.context["currentWorkspaceName"]
```

**Linked Service credential → Key Vault secret**

```python
# Synapse
conn = mssparkutils.credentials.getConnectionStringOrCreds("MyLinkedService")

# Fabric
conn = notebookutils.credentials.getSecret("https://myvault.vault.azure.net/", "my-secret")
```

**Dedicated SQL Pool DDL → Fabric Warehouse DDL**

```sql
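-- Synapse Dedicated SQL Pool: distribution and index options sit in the WITH clause
CREATE TABLE dbo.Fact (...) WITH (DISTRIBUTION = HASH(id), CLUSTERED COLUMNSTORE INDEX);

-- Fabric Warehouse: drop the WITH clause; distribution is handled automatically
CREATE TABLE dbo.Fact (...);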
```

---

## Feature Parity Reference

Full Synapse → Fabric feature matrix (28 features), T-SQL surface area gaps, and Spark configuration differences are in [feature-parity.md](resources/feature-parity.md).

> **Key gaps** (⚠️/❌): `spark.read.synapsesql()` replaced by JDBC/shortcuts · Linked Services redesigned as Data Connections/Shortcuts · External HMS partial (migrate as shortcuts) · `mssparkutils.env` renamed to `notebookutils.runtime` · Result set caching ❌ · Workload management ❌ · PolyBase → `COPY INTO`

---

## Migration Gotchas — Quick Reference

The full troubleshooting guide with code examples and multi-option resolutions is in [migration-gotchas.md](resources/migration-gotchas.md). This summary surfaces the key issues for quick scanning during migration:

| # | Flag ID | Issue | Severity | Blocks? | Resolution Summary |
|---|---|---|---|---|---|
| G1 | `SYNAPSESQL_NO_EQUIVALENT` | `spark.read.synapsesql()` has no Fabric equivalent | High | Yes | Replace with OneLake shortcut read, Warehouse JDBC, or Data Pipeline |
| G2 | `LIBRARY_VERSION_CONFLICT` | Custom library version conflicts with Fabric Runtime | Medium | Maybe | Pin compatible version in Environment, or find Fabric-native alternative |
| G3 | `DELTA_PROTOCOL_MISMATCH` | Delta protocol version incompatibility | High | Yes | Rewrite table with matching protocol (`delta.minReaderVersion`/`minWriterVersion`) |
| G4 | `SECURITY_MODEL_INCOMPATIBLE` | Synapse managed identity / IP firewall not portable | Medium | Yes | Reconfigure as Workspace Identity + Fabric Managed Private Endpoints |
| G5 | `GPU_POOL_UNSUPPORTED` | GPU-accelerated Spark pools not available in Fabric | High | Yes | Migration blocker — keep workload in Synapse or use Azure ML |
| G6 | `DOTNET_SPARK_UNSUPPORTED` | .NET for Spark (C#/F# SJDs) not supported | High | Yes | Migration blocker — rewrite in PySpark or keep in Synapse |
| G7 | `NULLABLE_POOL_REFERENCE` | `bigDataPool`/`targetBigDataPool` field is `null` (not missing) — causes `NoneType` crash | Medium | No | Use `(x.get("bigDataPool") or {}).get(...)` pattern |
| G8 | `SESSION_CONFIG_IGNORED` | Some `%%configure` keys silently ignored in Fabric | Low | No | Remove unsupported keys; use Environment for pool-level config |
| G9 | `SHORTCUT_CONNECTION_FAILED` | ADLS shortcut creation fails (connection/permission) | High | Partial | Verify connection credential type (Key > WorkspaceIdentity > OAuth2) and RBAC |
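
G7 is easy to reproduce: a default passed to `dict.get` only applies when the key is missing, not when its value is `null`. The sketch below contrasts the two cases. The payloads are hypothetical, trimmed notebook properties, and `referenceName` is assumed to be the field that holds the pool name.

```python
from typing import Optional

def pool_name(properties: dict) -> Optional[str]:
    """Return the attached pool name, or None when no pool is set."""
    return (properties.get("bigDataPool") or {}).get("referenceName")

with_pool = {"bigDataPool": {"referenceName": "sparkpool01", "type": "BigDataPoolReference"}}
null_pool = {"bigDataPool": None}   # key present, value null
no_pool = {}                        # key missing

print(pool_name(with_pool), pool_name(null_pool), pool_name(no_pool))

# The naive form fails on the null case because the default is never used.
try:
    null_pool.get("bigDataPool", {}).get("referenceName")
except AttributeError as error:
    print("naive lookup failed:", error)
```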

---

## Post-Migration: What's Next

After completing Phases 0–3 and validation, hand off to these companion skills for ongoing operations:

### Agentic Exploration Workflow

Once data has landed in Fabric Lakehouses, use this sequence to validate and explore:

1. **Discover** → List schemas, tables, and row counts via Lakehouse SQL Endpoint (`sqldw-consumption-cli`)
2. **Sample** → `SELECT TOP 5` on migrated tables to verify data integrity
3. **Validate** → Run validation checks from [validation-testing.md](resources/validation-testing.md) (V1–V6)
4. **Explore** → Write Spark or T-SQL queries against migrated data using `spark-consumption-cli` or `sqldw-consumption-cli`
5. **Build** → Create Gold-layer aggregations with `e2e-medallion-architecture` (Bronze → Silver → Gold)
6. **Consume** → Build semantic models and reports with `semantic-model-authoring`

### Companion Skill Cross-References

| Post-Migration Task | Skill | When to Use |
|---|---|---|
| Interactive Lakehouse SQL queries | `sqldw-consumption-cli` | Exploring migrated data via SQL Endpoint |
| Interactive PySpark exploration | `spark-consumption-cli` | Ad-hoc Spark queries on migrated Lakehouses |
| Notebook & SJD authoring (new) | `spark-authoring-cli` | Creating new Spark items post-migration |
| Medallion architecture build-out | `e2e-medallion-architecture` | Structuring Bronze/Silver/Gold after lift-and-shift |
| Warehouse performance monitoring | `sqldw-operations-cli` | Diagnosing slow queries on Fabric Warehouse |
| Semantic model creation | `semantic-model-authoring` | Building Power BI models over migrated data |
| Report consumption & DAX | `semantic-model-consumption` | Querying existing semantic models |
| KQL analytics | `eventhouse-authoring-cli` / `eventhouse-consumption-cli` | If migrating real-time workloads to Eventhouse |

### Variable Library for Environment Promotion

After migration, avoid hardcoded workspace/item IDs by centralizing configuration in a **Variable Library** item:

```python
# Read config from Variable Library — works in notebooks
lib = notebookutils.variableLibrary.getLibrary("MigrationConfig")
lakehouse_name = lib.lakehouse_name
workspace_id = lib.workspace_id

# ❌ WRONG — .get() does not exist
# notebookutils.variableLibrary.get("MigrationConfig", "lakehouse_name")
```

- Use **Value Sets** (`valueSets/dev.json`, `valueSets/prod.json`) to promote across environments without code changes
- Boolean values are returned as strings — compare with `.lower() == "true"`, not `bool()`
- In Data Pipelines, reference via `@pipeline().libraryVariables.<name>` (not `@variables()`)
- Full Variable Library patterns → see [common/notebook-authoring/context-and-params.md § Variable Library](../../common/notebook-authoring/context-and-params.md#variable-library)
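
The string comparison for Boolean values can be wrapped in a small helper so the check is written once. In this sketch, `as_bool` is a hypothetical name, and treating only the text `true` (in any letter case) as true is an assumption.

```python
def as_bool(value) -> bool:
    """Interpret a Variable Library value as a Boolean."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"

print(bool("false"))     # True: any non-empty string is truthy
print(as_bool("false"))  # False
print(as_bool("True"))   # True
```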