---
name: databricks-migration
description: >
  Port Databricks notebooks and jobs to Microsoft Fabric. Provides an exhaustive dbutils
  to notebookutils substitution table: fs operations (mounts, with OneLake Shortcuts as the alternative),
  secret scope to Key Vault URL conversion, notebook run and exit, widget replacement with
  parameter-tagged cells, and library install replacement with Fabric Environments.
  Covers Unity Catalog three-level namespace reduction to Lakehouse two-level schemas,
  DBFS path conversion to OneLake, Databricks Jobs to Spark Job Definitions, MLflow
  tracking URI removal, and Photon to Native Execution Engine substitution. Use when the
  user wants to: (1) replace dbutils with notebookutils, (2) collapse Unity Catalog
  namespaces to Lakehouse schemas, (3) convert Databricks Jobs or Delta Live Tables.
  Triggers: "migrate from databricks", "databricks to fabric", "dbutils to notebookutils",
  "dbutils fabric", "unity catalog migration", "dbfs to onelake",
  "databricks notebook migration", "delta live tables fabric", "photon native execution".
---

> **Update Check — ONCE PER SESSION (mandatory)**
> The first time this skill is used in a session, run the **check-updates** skill before proceeding.
> - **GitHub Copilot CLI / VS Code**: invoke the `check-updates` skill.
> - **Claude Code / Cowork / Cursor / Windsurf / Codex**: compare local vs remote package.json version.
> - Skip if the check was already performed earlier in this session.

> **CRITICAL NOTES**
> 1. To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering
> 2. To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering
> 3. `dbutils.widgets` has **no direct equivalent** in Fabric: use a notebook parameters cell (cell tag `parameters`). `notebookutils.runtime.context` exposes execution metadata only, not parameter values
> 4. `dbutils.library` (runtime library install) has **no equivalent** — use Fabric Environments for reproducible library management
> 5. Unity Catalog uses a 3-level namespace (`catalog.schema.table`); Fabric Lakehouse uses 2-level (`schema.table` within a named Lakehouse)

# Databricks → Microsoft Fabric Migration

## Prerequisite Knowledge

Read these companion documents before executing migration tasks:

- [COMMON-CORE.md](../../common/COMMON-CORE.md) — Fabric REST API patterns, authentication, token audiences, item discovery
- [COMMON-CLI.md](../../common/COMMON-CLI.md) — `az rest`, `az login`, token acquisition, Fabric REST via CLI
- [SPARK-AUTHORING-CORE.md](../../common/SPARK-AUTHORING-CORE.md) — Notebook deployment, lakehouse creation, Spark job execution

For notebook and Lakehouse creation, see [spark-authoring-cli](../spark-authoring-cli/SKILL.md).
For Fabric Warehouse DDL/DML authoring, see [sqldw-authoring-cli](../sqldw-authoring-cli/SKILL.md).

---

## Table of Contents

| Topic | Reference |
|---|---|
| Migration Workload Map | [§ Migration Workload Map](#migration-workload-map) |
| Complete `dbutils` → `notebookutils` Mapping | [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) |
| Unity Catalog → Fabric Lakehouse Schemas | [catalog-migration.md](resources/catalog-migration.md) |
| Before/After Code Patterns | [code-patterns.md](resources/code-patterns.md) |
| Cluster Config → Fabric Spark Pools | [§ Cluster Config → Fabric Spark Pools](#cluster-config--fabric-spark-pools) |
| Databricks Jobs → Spark Job Definitions | [§ Databricks Jobs → Spark Job Definitions](#databricks-jobs--spark-job-definitions) |
| Delta Sharing → OneLake Shortcuts | [§ Delta Sharing → OneLake Shortcuts](#delta-sharing--onelake-shortcuts) |
| MLflow → Fabric ML Experiments | [§ MLflow → Fabric ML Experiments](#mlflow--fabric-ml-experiments) |
| Must / Prefer / Avoid | [§ Must / Prefer / Avoid](#must--prefer--avoid) |
| Authentication & Token Acquisition | [COMMON-CORE.md § Authentication](../../common/COMMON-CORE.md#authentication--token-acquisition) |
| Lakehouse Management | [SPARK-AUTHORING-CORE.md § Lakehouse Management](../../common/SPARK-AUTHORING-CORE.md#lakehouse-management) |
| Notebook Management | [SPARK-AUTHORING-CORE.md § Notebook Management](../../common/SPARK-AUTHORING-CORE.md#notebook-management) |

---

## Migration Workload Map

| Databricks Component | Fabric Target | Notes |
|---|---|---|
| **All-purpose cluster** (notebooks, REPL) | Fabric Notebook (Starter Pool or Custom Pool) | No persistent cluster — Fabric provisions compute on session start |
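| **Job cluster** (automated jobs) | **Spark Job Definition (SJD)** | A single-task job maps to one SJD; multi-task jobs need a Data Pipeline (see [§ Databricks Jobs → Spark Job Definitions](#databricks-jobs--spark-job-definitions)) |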
| **Unity Catalog** | **Fabric Lakehouse** (schema per namespace) | See [catalog-migration.md](resources/catalog-migration.md) |
| **Databricks Repos** (Git-backed notebooks) | **Fabric Git Integration** | Connect workspace to Azure DevOps or GitHub; notebooks are synced |
| **Delta Live Tables (DLT)** | **Fabric Notebooks** + **Data Pipelines** | No DLT equivalent — rewrite DLT datasets as parameterized notebook cells with pipeline orchestration |
| **Databricks SQL Warehouses** | **Fabric Warehouse** or **Lakehouse SQL Endpoint** | SQL warehouse sessions → Warehouse (for write) or SQL Endpoint (for read-only) |
| **MLflow Tracking** | **Fabric ML Experiments** | MLflow SDK is supported in Fabric — see [§ MLflow](#mlflow--fabric-ml-experiments) |
| **Delta Sharing** | **OneLake Shortcuts** + **Fabric external data sharing** | See [§ Delta Sharing → OneLake Shortcuts](#delta-sharing--onelake-shortcuts) |
| **Databricks Feature Store** | **Fabric Feature Store** (preview) | Direct conceptual equivalent; APIs differ |
| **dbutils** (all sub-modules) | **`notebookutils`** (most sub-modules) | See [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) for full mapping |

---

## `dbutils` → `notebookutils` Quick Reference

The complete side-by-side API table is in [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md). The key mappings are:

| `dbutils` Call | `notebookutils` Equivalent | Compatibility Note |
|---|---|---|
| `dbutils.fs.ls(path)` | `notebookutils.fs.ls(path)` | **Direct replacement** |
| `dbutils.fs.cp(src, dest)` | `notebookutils.fs.cp(src, dest)` | **Direct replacement** |
| `dbutils.fs.mv(src, dest)` | `notebookutils.fs.mv(src, dest, create_path, overwrite=False)` | ⚠️ Signature differs — see [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) |
| `dbutils.fs.rm(path, recurse)` | `notebookutils.fs.rm(path, recurse)` | **Direct replacement** |
| `dbutils.fs.mkdirs(path)` | `notebookutils.fs.mkdirs(path)` | **Direct replacement** |
| `dbutils.fs.put(path, contents)` | `notebookutils.fs.put(path, contents)` | **Direct replacement** |
| `dbutils.fs.head(path, maxBytes)` | `notebookutils.fs.head(path, max_bytes)` | ⚠️ Default differs — Python/Scala 100 KB, R 64 KB. See [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) |
| `dbutils.fs.mount(...)` | `notebookutils.fs.mount(source, mountPoint, extraConfigs=None)` | ✅ **Supported** — Microsoft Entra (default), `accountKey`, or `sasToken` auth. For cross-workspace / persistent sharing, prefer **OneLake Shortcuts** |
| `dbutils.secrets.get(scope, key)` | `notebookutils.credentials.getSecret(keyVaultUrl, secretName)` | Scope → Key Vault URL; key → secret name |
| `dbutils.notebook.run(path, timeout, args)` | `notebookutils.notebook.run(name, timeout, args)` | `path` → notebook `name` (relative to workspace) |
| `dbutils.notebook.exit(value)` | `notebookutils.notebook.exit(value)` | **Direct replacement** |
| `dbutils.widgets.get(name)` | See [§ Widgets Migration](#widgets-migration) | No direct equivalent |
| `dbutils.library.install(...)` | **Not available at runtime** — use **Fabric Environments** | `dbutils.library.restartPython()` → `notebookutils.session.restartPython()` |
| `dbutils.data.summarize(df)` | `display(df.summary())` | Use `display()` or pandas `describe()` |
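
As a rough way to triage a notebook before editing it, the table above can be turned into a small scanner. This is a minimal sketch: `classify_dbutils_calls` and the category labels are hypothetical names, and the regular expression only sees calls written literally as `dbutils.<module>.<function>`.

```python
import re

# Categories follow the quick-reference table; the labels are illustrative.
DIRECT = {"fs.ls", "fs.cp", "fs.rm", "fs.mkdirs", "fs.put", "notebook.exit"}
REVIEW = {"fs.mv", "fs.head", "fs.mount", "secrets.get", "notebook.run",
          "library.restartPython"}

_CALL = re.compile(r"\bdbutils\.([A-Za-z_]+\.[A-Za-z_]+)")


def classify_dbutils_calls(source: str) -> dict[str, str]:
    """Map each dbutils call found in `source` to a migration category."""
    result = {}
    for call in _CALL.findall(source):
        if call in DIRECT:
            result[call] = "direct"      # rename dbutils -> notebookutils
        elif call in REVIEW:
            result[call] = "review"      # arguments, defaults, or names differ
        else:
            result[call] = "redesign"    # widgets, library installs, unknown calls
    return result
```

Anything the scanner does not recognize falls into `redesign`, so unknown calls get a manual look instead of being silently treated as safe.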

### Widgets Migration

`dbutils.widgets` has no direct equivalent in Fabric. Use these patterns instead:

| Use Case | Fabric Pattern |
|---|---|
| Pass parameter from parent notebook | Mark a cell in the child notebook as a **parameters cell** (notebook UI: cell "..." menu → "Mark cell as parameters"). The parent calls `notebookutils.notebook.run("child", arguments={"param": "value"})` — at runtime the engine inserts a new cell beneath the parameters cell that overrides the defaults |
| Pipeline-driven parameterization | Same parameters-cell mechanism; the Fabric Pipeline notebook activity supplies override values via its **Base parameters** setting |
| Centralized cross-notebook config | Use `notebookutils.variableLibrary.getLibrary("<name>")` to read values from a Variable Library item (deployment pipelines activate the right value set per stage) |
| Interactive selection in notebook | Use `display()` with input cells, IPython widgets (Python only), or Fabric Data Activator |

> Note: `notebookutils.runtime.context` does **not** expose parameter values. It's for execution metadata (workspace/notebook/activity/user IDs, pipeline-vs-interactive flags, etc.). See [dbutils-to-notebookutils.md § Runtime Context](resources/dbutils-to-notebookutils.md#runtime-context).
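
To illustrate the first row of the table, the sketch below generates the body of a parameters cell from existing widget definitions. `parameters_cell_from_widgets` is a hypothetical helper; it assumes widget names are valid Python identifiers and that names and defaults are written as double-quoted string literals.

```python
import re

# Matches dbutils.widgets.text("name", "default", ...) and the dropdown and
# combobox variants. Only double-quoted literals are handled in this sketch.
_WIDGET_DEF = re.compile(
    r'dbutils\.widgets\.(?:text|dropdown|combobox)\(\s*"([^"]+)"\s*,\s*"([^"]*)"'
)


def parameters_cell_from_widgets(source: str) -> str:
    """Return source for a parameters cell holding each widget's default."""
    lines = ["# Parameters cell: these defaults are overridden at run time"]
    for name, default in _WIDGET_DEF.findall(source):
        lines.append(f"{name} = {default!r}")
    return "\n".join(lines)
```

The generated text is meant to be pasted into a cell that is then marked as the parameters cell; each `dbutils.widgets.get("name")` call in the notebook becomes a plain reference to the variable `name`.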

---

## Cluster Config → Fabric Spark Pools

| Databricks Cluster Concept | Fabric Spark Equivalent | Notes |
|---|---|---|
| All-purpose cluster (interactive) | **Starter Pool** | Auto-provisioned; no config; ideal for notebooks |
| Job cluster (single-use for jobs) | **Custom Pool** (or Starter Pool) attached to SJD | Configure node size, autoscale in Fabric capacity settings |
| Node type (e.g., `Standard_DS3_v2`) | **Fabric node size** (Small/Medium/Large/X-Large/XX-Large) | Map by vCore/memory ratio |
| Autoscale min/max workers | Custom Pool **min/max node** settings | Available in workspace Spark settings |
| `spark.conf` in cluster settings | **Fabric Environment** Spark properties | Move to Environment item; attach to workspace or notebook |
| `init_scripts` (cluster init) | **Fabric Environment** install script | Not fully equivalent — only library installs are supported |
| Databricks Runtime version | **Fabric Runtime** (1.1 = Spark 3.3, 1.2 = Spark 3.4, 1.3 = Spark 3.5) | Choose matching Spark version; test deprecated APIs |
| Photon accelerator | **Fabric Native Execution Engine (NEE)** | Enable in workspace Spark settings; vectorized execution similar to Photon |
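
The "map by vCore/memory ratio" step can be sketched as picking the smallest Fabric node size that covers the Databricks node's resources. The size table below (4 vCores / 32 GB up to 64 vCores / 512 GB) is an assumption to verify against current Fabric documentation, and `pick_fabric_node_size` is a hypothetical helper.

```python
# (name, vCores, memory in GB), smallest first. Assumed values: confirm them
# against the Fabric Spark compute documentation before relying on them.
FABRIC_NODE_SIZES = [
    ("Small", 4, 32),
    ("Medium", 8, 64),
    ("Large", 16, 128),
    ("X-Large", 32, 256),
    ("XX-Large", 64, 512),
]


def pick_fabric_node_size(vcores: int, memory_gb: float) -> str:
    """Return the smallest node size with at least the given vCores and memory."""
    for name, size_vcores, size_memory in FABRIC_NODE_SIZES:
        if size_vcores >= vcores and size_memory >= memory_gb:
            return name
    raise ValueError(f"No Fabric node size covers {vcores} vCores / {memory_gb} GB")
```

For example, a `Standard_DS3_v2` worker (4 vCPUs, 14 GB) would land on `Small` under these assumptions. Treat the result as a starting point and tune it after observing the migrated workload.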

---

## Databricks Jobs → Spark Job Definitions

| Databricks Jobs Concept | Fabric SJD Equivalent | Notes |
|---|---|---|
| Job with single notebook task | **SJD** referencing a notebook | Attach a default Lakehouse; pass parameters via SJD args |
| Multi-task job (DAG of tasks) | **Fabric Data Pipeline** orchestrating multiple SJDs/notebooks | Pipeline activities map to job tasks; dependencies = activity dependencies |
| Job schedule (cron) | **Pipeline schedule trigger** | Cron expression → recurrence trigger in pipeline |
| Job parameters | **SJD default arguments** or a **notebook parameters cell** | At runtime, override values are inserted in a new cell beneath the notebook's parameters cell |
| Job clusters per task | **Pool attached to SJD** | Each SJD can specify its Spark pool independently |
| Databricks Workflows | **Fabric Data Pipelines** | Full DAG orchestration with conditions, loops, and failure branches |
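
The first two rows of the table can be sketched as a planning helper that reads a Databricks job definition and suggests a Fabric target. It assumes the Jobs API 2.1 shape, where `tasks` is a list of objects with `task_key` and an optional `depends_on` list; `plan_job_migration` and its output format are hypothetical.

```python
def plan_job_migration(job_settings: dict) -> dict:
    """Suggest a Fabric target and activity graph for one Databricks job."""
    tasks = job_settings.get("tasks", [])
    activities = [
        {
            "name": task["task_key"],
            # Task dependencies become pipeline activity dependencies.
            "depends_on": [dep["task_key"] for dep in task.get("depends_on", [])],
        }
        for task in tasks
    ]
    # A single task fits one SJD; a DAG of tasks needs pipeline orchestration.
    target = "Spark Job Definition" if len(tasks) <= 1 else "Data Pipeline"
    return {"target": target, "activities": activities}
```

The output is only a plan for review; creating the SJD or pipeline items still goes through the authoring skill referenced below.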

> **Delegate to `spark-authoring-cli`** for SJD creation and notebook deployment.

---

## Delta Sharing → OneLake Shortcuts

| Databricks Delta Sharing Pattern | Fabric Equivalent |
|---|---|
| Provider publishes a Delta share | Fabric **external data sharing** (preview) or OneLake Shortcut to ADLS Gen2 where Delta data resides |
| Recipient reads shared data | Create a **OneLake Shortcut** pointing to the ADLS Gen2 Delta table; access via Lakehouse |
| Cross-workspace table sharing within org | **OneLake Shortcuts** pointing to another workspace's Lakehouse tables — no data copy |
| Cross-tenant sharing | Fabric **external data sharing** (GA roadmap) — use ADLS Gen2 shortcut as interim |
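
For the recipient pattern, a shortcut to an ADLS Gen2 Delta table can be created through the Fabric REST API. The sketch below only builds the request; the endpoint path and the `adlsGen2` field names (`location`, `subpath`, `connectionId`) are assumed from the Create Shortcut API and should be checked against the current reference. `adls_shortcut_request` is a hypothetical helper.

```python
def adls_shortcut_request(workspace_id: str, lakehouse_id: str, name: str,
                          location: str, subpath: str, connection_id: str,
                          path: str = "Tables") -> tuple[str, dict]:
    """Build the URL and JSON body for a shortcut to an ADLS Gen2 folder."""
    url = (
        "https://api.fabric.microsoft.com/v1/"
        f"workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    )
    body = {
        "path": path,   # folder inside the Lakehouse that will hold the shortcut
        "name": name,   # shortcut name, which becomes the table name under Tables
        "target": {
            "adlsGen2": {
                "location": location,          # https://<account>.dfs.core.windows.net
                "subpath": subpath,            # /<container>/<folder of the Delta table>
                "connectionId": connection_id, # existing Fabric connection to the account
            }
        },
    }
    return url, body
```

Send the body as a `POST` using the authentication pattern in COMMON-CORE.md.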

---

## MLflow → Fabric ML Experiments

Fabric ML Experiments are built on the MLflow SDK — most code is directly portable:

| Databricks MLflow Pattern | Fabric Equivalent | Migration Action |
|---|---|---|
| `mlflow.set_tracking_uri("databricks")` | Remove — Fabric tracking is automatic | Delete this line in Fabric notebooks |
| `mlflow.set_experiment("/path/exp")` | `mlflow.set_experiment("experiment_name")` | Use name only (not path); Fabric creates the Experiment item |
| `mlflow.log_metric(...)` | `mlflow.log_metric(...)` — **identical** | No change |
| `mlflow.log_artifact(...)` | `mlflow.log_artifact(...)` — **identical** | No change |
| `mlflow.autolog()` | `mlflow.autolog()` — **identical** | No change |
| `mlflow.register_model(...)` | `mlflow.register_model(...)` — **identical** | Model Registry is available in Fabric ML |
| Databricks Model Serving | **Azure ML Online Endpoints** or **Fabric Data Activator** | No direct Fabric model serving yet — use Azure ML |
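
The first two migration actions in the table are mechanical, so they can be sketched as a source rewrite. `port_mlflow_source` is a hypothetical helper; using the last path segment as the experiment name is an assumption, and only double-quoted, single-line calls are handled.

```python
import re

_TRACKING_URI = re.compile(r"^\s*mlflow\.set_tracking_uri\(.*\)\s*$")
_EXPERIMENT = re.compile(r'mlflow\.set_experiment\("([^"]*)"\)')


def _path_to_name(match: re.Match) -> str:
    # "/Users/someone/churn" -> "churn" (assumed naming convention)
    name = match.group(1).rstrip("/").split("/")[-1]
    return f'mlflow.set_experiment("{name}")'


def port_mlflow_source(source: str) -> str:
    """Drop set_tracking_uri lines and turn experiment paths into names."""
    kept = []
    for line in source.splitlines():
        if _TRACKING_URI.match(line):
            continue  # Fabric configures tracking automatically
        kept.append(_EXPERIMENT.sub(_path_to_name, line))
    return "\n".join(kept)
```

Lines such as `mlflow.log_metric(...)` and `mlflow.autolog()` pass through unchanged, in line with the "identical" rows above.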

---

## Must / Prefer / Avoid

### MUST DO
- **Replace all `dbutils.*` calls** using the mapping in [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) — `dbutils` is not available in Fabric notebooks
- **Migrate `dbutils.fs.mount()` to `notebookutils.fs.mount()`** (✅ supported — Microsoft Entra default, or `accountKey` / `sasToken` from Key Vault). For cross-workspace or persistent sharing, prefer **OneLake Shortcuts** instead. Always pair `mount()` with `unmount()` in `try/finally` — Fabric mounts are not released automatically on session end
- **Replace `dbutils.secrets.get(scope, key)`** with `notebookutils.credentials.getSecret(keyVaultUrl, secretName)` — secret scopes map to Azure Key Vault URLs
- **Redesign widget-based parameter passing** using notebook **parameters cells** (cell "..." menu → "Mark cell as parameters"); use `notebookutils.variableLibrary` for centralized cross-notebook config. `notebookutils.runtime.context` does **not** expose parameter values
- **Replace `dbutils.library.install*()`** with Fabric **Environments** — runtime library installs are not supported in production. `dbutils.library.restartPython()` maps to `notebookutils.session.restartPython()` (Python / PySpark only)
- **Adapt Unity Catalog 3-level namespaces** (`catalog.schema.table`) to Fabric 2-level (`schema.table` within a Lakehouse) — see [catalog-migration.md](resources/catalog-migration.md)
- **Map Databricks cluster init scripts** to Fabric Environments — cluster-level library installs must move to Environment items
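
The `mount()` / `unmount()` pairing from the list above can be wrapped in a context manager so the `try/finally` is written once. This is a minimal sketch: `mounted` is a hypothetical helper, and the file-system module is passed in as `fs` (in a Fabric notebook that would be `notebookutils.fs`) so the helper can also be exercised outside Fabric.

```python
from contextlib import contextmanager


@contextmanager
def mounted(fs, source, mount_point, extra_configs=None):
    """Mount `source` at `mount_point` and always unmount afterwards."""
    if extra_configs is None:
        fs.mount(source, mount_point)  # default Microsoft Entra authentication
    else:
        fs.mount(source, mount_point, extra_configs)  # e.g. accountKey or sasToken
    try:
        yield mount_point
    finally:
        fs.unmount(mount_point)  # runs even if the body raises


# Illustrative use inside a Fabric notebook (account and container are placeholders):
# with mounted(notebookutils.fs,
#              "abfss://container@account.dfs.core.windows.net", "/raw") as mp:
#     ...
```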

### PREFER
- **Fabric Native Execution Engine (NEE)** as the Photon equivalent — enable in workspace Spark settings for vectorized execution on Delta Lake
- **OneLake Shortcuts** over data copy for Delta tables that already exist in ADLS Gen2 — point directly without re-ingesting
- **Fabric Git Integration** as the replacement for Databricks Repos — connect workspace to ADO or GitHub for notebook version control
- **Fabric ML Experiments** for direct MLflow continuity — tracking code requires minimal changes (remove `set_tracking_uri`)
- **Medallion architecture** when restructuring migrated Databricks catalogs — align `bronze`, `silver`, `gold` Unity Catalog schemas to separate Fabric Lakehouses
- **Starter Pool** for migrating interactive notebook workflows: compute is auto-provisioned on session start, which avoids the cluster startup wait that was a common pain point in Databricks

### AVOID
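- **Do not import or call `dbutils`, and do not shim it with `dbutils = ...` assignments** in Fabric notebooks: a bare reference raises `NameError`, and a shim hides the signature differences in the mapping (`fs.mv`, `fs.head`, `secrets.get`); always use `notebookutils`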
- **Do not assume Unity Catalog governance policies transfer automatically** — RBAC, row-level security, and column masking must be reconfigured in Fabric using workspace roles and Lakehouse permissions
- **Do not use `%pip install` in production Fabric notebooks** at runtime — use Fabric Environments for stable, versioned library management
- **Do not attempt to port Delta Live Tables (DLT) pipelines verbatim** — DLT has no Fabric equivalent; rewrite as parameterized notebooks orchestrated by Fabric Pipelines
- **Do not rely on Databricks-specific Spark configurations** (e.g., `spark.databricks.*`) — these are proprietary and will be silently ignored or raise errors in Fabric
- **Do not use DBFS paths** (`dbfs:/...`) — there is no DBFS in Fabric; all paths must use OneLake `abfss://` or Lakehouse-relative paths
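
The DBFS path rule above can be sketched as a conversion helper. `dbfs_to_onelake` is a hypothetical name, and the sketch assumes that each former mount (`/mnt/<name>/...`) was recreated as a folder or shortcut of the same name under the Lakehouse `Files/` area, and that the workspace and Lakehouse names contain no characters needing escaping.

```python
def dbfs_to_onelake(path: str, workspace: str, lakehouse: str) -> str:
    """Convert a DBFS-style path to a OneLake abfss:// URI under Files/."""
    # Strip the dbfs:/ scheme or the /dbfs/ local-file prefix if present.
    for prefix in ("dbfs:/", "/dbfs/"):
        if path.startswith(prefix):
            path = path[len(prefix):]
            break
    relative = path.lstrip("/")
    # Assumption: /mnt/<name>/... maps to Files/<name>/...
    if relative.startswith("mnt/"):
        relative = relative[len("mnt/"):]
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{relative}"
    )
```

When the notebook has a default Lakehouse attached, the shorter `Files/...` relative form is an alternative to the full URI.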

---

## Examples

See [dbutils-to-notebookutils.md](resources/dbutils-to-notebookutils.md) and [code-patterns.md](resources/code-patterns.md) for the full mapping. Key quick references:

**`dbutils.fs` → `notebookutils.fs`**

```python
# Databricks
dbutils.fs.ls("/mnt/bronze/orders/")
dbutils.fs.cp("/mnt/raw/file.csv", "/mnt/archive/file.csv")

# Fabric (replace DBFS/mount paths with OneLake relative paths)
notebookutils.fs.ls("Files/bronze/orders/")
notebookutils.fs.cp("Files/raw/file.csv", "Files/archive/file.csv")
```

**`dbutils.secrets` → `notebookutils.credentials`**

```python
# Databricks
pwd = dbutils.secrets.get(scope="prod", key="db-password")

# Fabric (scope → Key Vault URL, key → secret name)
pwd = notebookutils.credentials.getSecret("https://myvault.vault.azure.net/", "db-password")
```
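
When a notebook reads from several secret scopes, the scope-to-vault rule can be kept in one lookup table. This is a sketch under assumptions: `SCOPE_TO_VAULT` and `get_secret` are hypothetical, the vault URL is the placeholder from the example above, and the credentials module is passed in (in Fabric, `notebookutils.credentials`).

```python
# Hypothetical mapping from Databricks secret scopes to Key Vault URLs.
SCOPE_TO_VAULT = {
    "prod": "https://myvault.vault.azure.net/",
}


def get_secret(credentials, scope: str, key: str) -> str:
    """Look up the vault for `scope` and fetch `key` from it."""
    vault_url = SCOPE_TO_VAULT[scope]  # KeyError flags an unmapped scope early
    return credentials.getSecret(vault_url, key)


# In a Fabric notebook:
# pwd = get_secret(notebookutils.credentials, "prod", "db-password")
```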

**Unity Catalog namespace → Lakehouse schema**

```python
# Databricks
df = spark.read.table("prod.silver.customers")

# Fabric (catalog dropped; Lakehouse context provides it)
df = spark.read.table("silver.customers")
```
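
For bulk renames, the namespace collapse can be sketched as a helper that returns the target Lakehouse together with the two-level name. `collapse_namespace` and the catalog-to-Lakehouse mapping are hypothetical, and backtick-quoted identifiers that contain dots are not handled.

```python
def collapse_namespace(name: str, catalog_to_lakehouse: dict[str, str]) -> tuple[str, str]:
    """Split catalog.schema.table into (lakehouse, 'schema.table')."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected catalog.schema.table, got {name!r}")
    catalog, schema, table = parts
    # The catalog no longer appears in the table name; it selects the Lakehouse.
    return catalog_to_lakehouse[catalog], f"{schema}.{table}"
```

With a mapping such as `{"prod": "sales_lakehouse"}` (an illustrative name), `prod.silver.customers` resolves to the `sales_lakehouse` Lakehouse and the table name `silver.customers`.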