---
name: databricks-lakebase
description: "Manage Lakebase Postgres Autoscaling projects, branches, and endpoints via Databricks CLI. Use when asked to create, configure, or manage Lakebase Postgres databases, projects, branches, computes, or endpoints, or to check database load, connection health, and activity (via Postgres catalog views over psql)."
compatibility: Requires databricks CLI (>= v0.294.0)
metadata:
  version: "0.1.0"
  parent: databricks-core
---

# Lakebase Postgres Autoscaling

**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

Lakebase is Databricks' serverless Postgres-compatible database (similar to Neon). It provides fully managed OLTP storage with autoscaling, branching, and scale-to-zero. Manage Lakebase Postgres projects, branches, endpoints, and databases via `databricks postgres` CLI commands.

## Resource Hierarchy

```
Project (top-level container)
└── Branch (isolated database environment, copy-on-write)
    ├── Endpoint (read-write or read-only)
    ├── Database (standard Postgres DB)
    └── Role (Postgres role)
```

- **Project**: Top-level container. Creating one auto-provisions a `production` branch and a `primary` read-write endpoint.
- **Branch**: Isolated database environment sharing storage with its parent (copy-on-write). States: `READY`, `ARCHIVED`.
- **Endpoint** (called **Compute** in the Lakebase UI): Compute resource powering a branch. Types: `ENDPOINT_TYPE_READ_WRITE`, `ENDPOINT_TYPE_READ_ONLY` (read replica).
- **Database**: Standard Postgres database within a branch. Default: `databricks_postgres`.
- **Role**: Postgres role within a branch. Manage roles via `databricks postgres create-role -h`.

### Resource Name Formats

| Resource | Format |
|----------|--------|
| Project | `projects/{project_id}` |
| Branch | `projects/{project_id}/branches/{branch_id}` |
| Endpoint | `projects/{project_id}/branches/{branch_id}/endpoints/{endpoint_id}` |
| Database | `projects/{project_id}/branches/{branch_id}/databases/{database_id}` |

All IDs: 1-63 characters, starting with a lowercase letter; lowercase letters, numbers, and hyphens only (RFC 1123).

## CLI Discovery — ALWAYS Do This First

> **Note:** "Lakebase" is the product name; the CLI command group is `postgres`. All commands use `databricks postgres ...`.

**Do NOT guess command syntax.** Discover available commands and their usage dynamically:

```bash
# List all postgres subcommands
databricks postgres -h

# Get detailed usage for any subcommand (flags, args, JSON fields)
databricks postgres <command> -h
```

Run `databricks postgres -h` before constructing any command, then `databricks postgres <command> -h` to discover the exact flags, positional arguments, and JSON spec fields for that subcommand.

## Create a Project

> **Do NOT list projects before creating.**

```bash
databricks postgres create-project \
  --json '{"spec": {"display_name": "<display_name>"}}' \
  --profile <profile>
```

- Auto-creates: `production` branch + `primary` read-write endpoint (1 CU min/max, scale-to-zero)
- Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately.
- Run `databricks postgres create-project -h` for all available spec fields (e.g. `pg_version`).

After creation, verify the auto-provisioned resources:

```bash
databricks postgres list-branches projects/<project_id> --profile <profile>
databricks postgres list-endpoints projects/<project_id>/branches/<branch_id> --profile <profile>
databricks postgres list-databases projects/<project_id>/branches/<branch_id> --profile <profile>
```
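If you prefer a scripted check, here is a minimal sketch. The `.branches[].name` path is an assumption about the JSON output shape; run the command once without `jq` to confirm the real shape before relying on it.

```bash
# Scripted verification sketch: list branch resource names for the new project.
# Assumes `-o json` output with a top-level "branches" array (confirm the shape first).
databricks postgres list-branches projects/<project_id> -o json --profile <profile> \
  | jq -r '.branches[].name'
```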
## Autoscaling

Endpoints use **compute units (CU)** for autoscaling. Configure min/max CU via `create-endpoint` or `update-endpoint`. Run `databricks postgres create-endpoint -h` to see all spec fields.

Scale-to-zero is enabled by default. When idle, compute scales down to zero; it resumes in seconds on the next connection.

## Branches

Branches are copy-on-write snapshots of an existing branch. Use them for **experimentation**: testing schema migrations, trying queries, or previewing data changes without affecting production.

```bash
databricks postgres create-branch projects/<project_id> \
  --json '{
    "spec": {
      "source_branch": "projects/<project_id>/branches/<branch_id>",
      "no_expiry": true
    }
  }' --profile <profile>
```

Branches require an expiration policy: use `"no_expiry": true` for permanent branches.

When done experimenting, delete the branch. Protected branches must be unprotected first: use `update-branch` to set `spec.is_protected` to `false`, then delete:

```bash
# Step 1 — unprotect
databricks postgres update-branch projects/<project_id>/branches/<branch_id> \
  --json '{"spec": {"is_protected": false}}' --profile <profile>

# Step 2 — delete (run -h to confirm positional arg format for your CLI version)
databricks postgres delete-branch projects/<project_id>/branches/<branch_id> \
  --profile <profile>
```

**Never delete the `production` branch** — it is the authoritative branch auto-provisioned at project creation.

## App Service Principal Permissions — CRITICAL

When a Databricks App connects to Lakebase with `CAN_CONNECT_AND_CREATE`, the app's **service principal (SP)** gets a Postgres role that can:

- **Connect** to the database
- **Create new schemas and tables** (which the SP will own)

The SP **cannot** access schemas or tables created by other roles (human users, pipelines, etc.) unless explicitly granted. This is the #1 cause of `permission denied for schema X` errors after deployment.

### ⚠️ Resource Removal Revokes All Grants

If the postgres resource is ever **removed** from the app (even temporarily — e.g. via a partial `databricks apps update --json` that omits `resources`, or a bundle deploy that drops the resource), the platform **revokes the SP's Postgres role entirely**. Re-adding the resource creates a fresh role, but **all manually applied SQL GRANTs are lost** and must be re-applied.

This means:

- `databricks apps update --json` with incomplete payloads can silently wipe the postgres resource (it does full replacement, not merge)
- A `databricks.yml` change that temporarily removes a resource and redeploys will revoke grants
- Schema-level grants (`GRANT USAGE ON SCHEMA ...`) are **not** managed by the platform — they are SQL-level and must be restored manually after any role recreation

**Always include ALL resources when using `databricks apps update --json`.** Prefer `databricks.yml` + `bundle deploy` to manage resources declaratively.

### Discovering the SP Role Name

The SP's Postgres role name is its **client ID** (a UUID). Find it from the app:

```bash
databricks apps get <app-name> --profile <profile> -o json
# → .service_principal_client_id is the Postgres role name
```
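An optional convenience sketch for scripting the GRANT statements in the next section: capture the client ID into a shell variable. It assumes the `service_principal_client_id` field shown above and that `jq` is available.

```bash
# Capture the SP's Postgres role name (its client ID) for reuse in GRANT statements.
SP_ROLE=$(databricks apps get <app-name> --profile <profile> -o json \
  | jq -r '.service_principal_client_id')
echo "$SP_ROLE"
```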
### Granting Cross-Schema Access

When an app needs to read/write schemas it did not create (e.g. `public`, `gold`, or any schema populated by pipelines or human users), you must connect as the **schema owner** and grant access to the SP role.

**Step 1 — Generate credentials and connect as the owner:**

```bash
databricks postgres generate-database-credential \
  projects/<project_id>/branches/<branch_id>/endpoints/<endpoint_id> \
  --profile <profile>
```

Use the returned token as the password for `psql`:

```bash
PGPASSWORD=<token> psql "host=<host> dbname=<database_name> sslmode=require user=<user>"
```

**Step 2 — Check schema ownership** (different schemas may have different owners):

```sql
SELECT schema_name, schema_owner FROM information_schema.schemata;
```

Tables synced by pipelines are typically owned by a `databricks_writer_XXXXX` role, not by a human user. You must run `ALTER DEFAULT PRIVILEGES` for **each distinct owner role** to cover future tables. Check table ownership with:

```sql
SELECT tablename, tableowner FROM pg_tables WHERE schemaname = '<schema>';
```

**Gotcha:** You may not have permission to run `ALTER DEFAULT PRIVILEGES FOR ROLE "databricks_writer_XXXXX"` (it requires `SET ROLE` to that role). In that case, `GRANT SELECT ON ALL TABLES` covers existing tables, but you'll need to re-run it after new tables are synced. The human-user `ALTER DEFAULT PRIVILEGES` only covers tables the human creates directly.

**Step 3 — Grant permissions** (replace `<sp_client_id>` with the UUID from the step above):

```sql
-- Read-only access to a schema (e.g. gold, synced from the lakehouse)
GRANT USAGE ON SCHEMA gold TO "<sp_client_id>";
GRANT SELECT ON ALL TABLES IN SCHEMA gold TO "<sp_client_id>";
ALTER DEFAULT PRIVILEGES FOR ROLE "<schema_owner>" IN SCHEMA gold
  GRANT SELECT ON TABLES TO "<sp_client_id>";

-- Read-write access to a schema (e.g. public)
GRANT USAGE ON SCHEMA public TO "<sp_client_id>";
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO "<sp_client_id>";
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO "<sp_client_id>";
ALTER DEFAULT PRIVILEGES FOR ROLE "<schema_owner>" IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE ON TABLES TO "<sp_client_id>";
ALTER DEFAULT PRIVILEGES FOR ROLE "<schema_owner>" IN SCHEMA public
  GRANT USAGE ON SEQUENCES TO "<sp_client_id>";

-- Full access to an app-managed schema (e.g. support_console)
GRANT USAGE, CREATE ON SCHEMA support_console TO "<sp_client_id>";
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA support_console TO "<sp_client_id>";
GRANT USAGE ON ALL SEQUENCES IN SCHEMA support_console TO "<sp_client_id>";
ALTER DEFAULT PRIVILEGES FOR ROLE "<schema_owner>" IN SCHEMA support_console
  GRANT ALL ON TABLES TO "<sp_client_id>";
```

`<schema_owner>` is typically your email (the human user who created the schema), but for pipeline-synced schemas check the actual owner (e.g. `databricks_writer_XXXXX`). You need `ALTER DEFAULT PRIVILEGES` for each owner role that creates tables in that schema.

**Step 4 — `ALTER DEFAULT PRIVILEGES` is essential.** Without it, new tables created by pipelines or migrations (under the owner role) will not be visible to the SP. Running it prevents the permissions from breaking again when data is re-synced. (A quick way to verify the grants is sketched at the end of this section.)

### Ownership vs Grants

- **Ownership transfer** (`ALTER ... OWNER TO`) requires `SET ROLE` privilege, which Lakebase may not grant between human and SP roles. Use `GRANT` instead.
- **AppKit cache**: AppKit creates an `appkit` schema at startup for persistent caching. If a human user previously created it, the SP gets `permission denied` or `must be owner`. Fix: `DROP SCHEMA appkit CASCADE` so the SP recreates it on the next deploy (the cache is ephemeral and safe to drop).
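Before redeploying the app, it can help to confirm the grants from the owner session. This is a sketch using standard Postgres privilege functions; `<sp_client_id>` and `gold.<table>` are placeholders for the SP role and any table the app should be able to read.

```bash
# Quick grant check from the owner session (non-interactive).
PGPASSWORD=<token> psql "host=<host> dbname=<database_name> sslmode=require user=<user>" -c \
  "SELECT has_schema_privilege('<sp_client_id>', 'gold', 'USAGE')         AS gold_usage,
          has_table_privilege('<sp_client_id>', 'gold.<table>', 'SELECT') AS table_select;"
```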
## What's Next

### Build a Databricks App

After creating a Lakebase project, scaffold a Databricks App connected to it.

**Step 1 — Discover the branch name** (use `.name` from a `READY` branch):

```bash
databricks postgres list-branches projects/<project_id> --profile <profile>
```

**Step 2 — Discover the database name** (use `.name` from the desired database; `<branch_id>` is the branch ID, not the full resource name):

```bash
databricks postgres list-databases projects/<project_id>/branches/<branch_id> --profile <profile>
```

**Step 3 — Scaffold the app** with the `lakebase` feature:

```bash
databricks apps init --name <app-name> \
  --features lakebase \
  --set "lakebase.postgres.branch=<branch>" \
  --set "lakebase.postgres.database=<database>" \
  --run none --profile <profile>
```

Where `<branch>` is the full resource name (e.g. `projects/<project_id>/branches/<branch_id>`) and `<database>` is the full resource name (e.g. `projects/<project_id>/branches/<branch_id>/databases/<database_id>`).

For the full app development workflow, use the **`databricks-apps`** skill.

### Other Workflows

**Connect a Postgres client**

Get the connection string from the endpoint, then connect with psql, DBeaver, or any standard Postgres client.

```bash
databricks postgres get-endpoint projects/<project_id>/branches/<branch_id>/endpoints/<endpoint_id> --profile <profile>
```

**Manage roles and permissions**

Create Postgres roles and grant access to databases or schemas.

```bash
databricks postgres create-role -h   # discover role spec fields
```

**Add a read-only endpoint**

Create a read replica for analytics or reporting workloads to avoid contention on the primary read-write endpoint.

```bash
databricks postgres create-endpoint projects/<project_id>/branches/<branch_id> \
  --json '{"spec": {"type": "ENDPOINT_TYPE_READ_ONLY"}}' --profile <profile>
```

## Query Performance Diagnostics

Lakebase ships with `pg_stat_statements` enabled by default. Use these queries to identify slow queries, cache efficiency, and currently running operations.

### Connecting via psql

Generate a short-lived credential and connect:

```bash
databricks postgres generate-database-credential \
  projects/<project_id>/branches/<branch_id>/endpoints/<endpoint_id> \
  --profile <profile>
```

```bash
PGPASSWORD=<token> psql "host=<host> dbname=<database_name> sslmode=require user=<user>"
```

### Database load and health

When the user asks about **database load**, **how busy the DB is**, **connection health**, or **runtime health** (short of slow-query tuning), answer with a **quick psql snapshot** using the connection steps above (a scripted, non-interactive variant is sketched after the interpretation notes below). Resolve `<host>` via `databricks postgres list-endpoints` or `get-endpoint`; the default database name is usually `databricks_postgres`; `user` is the Databricks account email (the same as the OAuth subject).

**Interpretation**

- **`pg_stat_activity`**: **point in time** — counts sessions by `state` (`active`, `idle`, `idle in transaction`, etc.) for the current database. Use this for "how many connections right now?" and whether anything is stuck waiting (`wait_event_type` / `wait_event` on `active` rows).
- **`pg_stat_database`**: **cumulative since `stats_reset`** (often `NULL` / long-lived) — commits, rollbacks, buffer hits vs reads, tuple counters, deadlocks, temp file usage. Use this for cache hit ratio and whether the instance has seen stress patterns over time, not for instant CPU %.
- **Not in Postgres catalogs**: sustained **CPU, memory, disk IOPS, and CU utilization** — those live in **Databricks workspace / Lakebase metrics** if the user needs time series or autoscaling pressure.
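For a one-shot, non-interactive snapshot (for example when scripting the check), the same connection string works with `psql -c`; a minimal sketch using the connections-by-state query that follows:

```bash
# Non-interactive snapshot of connection counts by state (same query as below).
PGPASSWORD=<token> psql "host=<host> dbname=<database_name> sslmode=require user=<user>" -c \
  "SELECT state, COUNT(*) AS connections
   FROM pg_stat_activity
   WHERE datname = current_database()
   GROUP BY state
   ORDER BY connections DESC;"
```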
**Connections by state**

```sql
SELECT state, COUNT(*) AS connections
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY connections DESC;
```

**Other active sessions and waits** (excluding your own `psql` backend)

```sql
SELECT
  COUNT(*) FILTER (WHERE state = 'active' AND pid <> pg_backend_pid()) AS other_active,
  COUNT(*) FILTER (WHERE wait_event_type IS NOT NULL AND state = 'active') AS active_waiting
FROM pg_stat_activity
WHERE datname = current_database();
```

**Database-wide cumulative stats** (current database row)

```sql
SELECT
  now() AS sampled_at,
  numbackends AS backends,
  xact_commit AS commits,
  xact_rollback AS rollbacks,
  blks_read,
  blks_hit,
  ROUND(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS buffer_hit_pct,
  tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted,
  deadlocks,
  temp_files,
  temp_bytes,
  stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```

Optional: `session_time`, `active_time`, and related fields on the same row summarize session time **in milliseconds** (PostgreSQL 14+). For **which queries** are expensive, use **`pg_stat_statements`** below, not these views alone.

### Slow Queries (pg_stat_statements)

Top queries by mean execution time — the primary tool for finding performance bottlenecks:

```sql
SELECT
  LEFT(query, 120) AS query_preview,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(mean_exec_time::numeric, 2) AS mean_ms,
  ROUND(min_exec_time::numeric, 2) AS min_ms,
  ROUND(max_exec_time::numeric, 2) AS max_ms,
  rows,
  shared_blks_hit,
  shared_blks_read
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
  AND query NOT LIKE '%EXPLAIN%'
  AND query NOT LIKE '%pg_settings%'
  AND query NOT LIKE '%pg_database_size%'
  AND query NOT LIKE '%pg_logical_slot%'
  AND query NOT LIKE '%pg_replication_slot%'
  AND query NOT LIKE '%__db_system%'
  AND query NOT LIKE '%__db_prog%'
ORDER BY mean_exec_time DESC
LIMIT 20;
```

**IMPORTANT**: Always include all exclusion filters above. Lakebase runs internal queries for replication, CDC, and catalog management that will dominate results if not filtered out. Focus on application queries only.

### Highest Total Time (cumulative impact)

Queries that consume the most total database time across all invocations:

```sql
SELECT
  LEFT(query, 120) AS query_preview,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(mean_exec_time::numeric, 2) AS mean_ms,
  rows
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
  AND query NOT LIKE '%EXPLAIN%'
  AND query NOT LIKE '%pg_settings%'
  AND query NOT LIKE '%pg_database_size%'
  AND query NOT LIKE '%pg_logical_slot%'
  AND query NOT LIKE '%pg_replication_slot%'
  AND query NOT LIKE '%__db_system%'
  AND query NOT LIKE '%__db_prog%'
ORDER BY total_exec_time DESC
LIMIT 20;
```

A query with a 2 ms mean but 100K calls may matter more than one with a 200 ms mean and 4 calls.
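To make that cumulative-impact pattern obvious at a glance, the same view can be expressed as each query's share of total execution time. A sketch (add the same `NOT LIKE` exclusion filters as above to drop Lakebase-internal queries):

```bash
# Share-of-total view: a high pct_of_total with a tiny total_ms-per-call signals "many cheap calls".
PGPASSWORD=<token> psql "host=<host> dbname=<database_name> sslmode=require user=<user>" <<'SQL'
SELECT
  LEFT(query, 80) AS query_preview,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND((100.0 * total_exec_time / SUM(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
SQL
```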
### Sequential scans and index candidates (`pg_stat_user_tables`)

When the user wants **performance optimization**, **indexing opportunities**, or **tables that do too many sequential scans**, query **`pg_stat_user_tables`**. This surfaces tables where Postgres reads many rows via full table scans (`seq_tup_read`) instead of index-driven lookups — often a sign of **missing or unused indexes** on filter/join columns.

**Default report**: top **10** tables by **`seq_tup_read`**, **`public`** schema only (adjust `schemaname` if the user cares about other schemas). Include **`seq_scan`**, **`idx_scan`**, and the optional scan/tuple mix columns so the reader can see seq-heavy access. Keep written summaries **brief**; `n_live_tup` is an **estimate** (stale stats can mislead).

```sql
SELECT
  schemaname || '.' || relname AS table_name,
  seq_tup_read,
  seq_scan,
  idx_scan,
  CASE WHEN seq_scan + idx_scan > 0
       THEN round(100.0 * seq_scan / (seq_scan + idx_scan), 1)
  END AS pct_scans_seq,
  CASE WHEN seq_tup_read + COALESCE(idx_tup_fetch, 0) > 0
       THEN round(100.0 * seq_tup_read / (seq_tup_read + idx_tup_fetch), 1)
  END AS pct_tuples_seq,
  n_live_tup::bigint AS est_rows
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY seq_tup_read DESC
LIMIT 10;
```

**How to read it**: very high **`seq_tup_read`** with a high **`pct_scans_seq`** (or **`idx_scan` near zero** for that table's workload) points to **index candidates** on columns used in `WHERE`, `JOIN`, and `ORDER BY`. Confirm with **`EXPLAIN (ANALYZE, …)`** on real application queries before creating indexes. **`idx_scan = 0`** does not mean "no indexes exist" — only that no index scan was recorded for that statistics window (e.g. only the primary key exists but filters use unindexed columns). Pair with **`pg_stat_statements`** (above) to tie slow queries to tables.

### Cache Hit Ratio

Healthy databases serve >99% of reads from shared buffers. A low ratio means queries are hitting disk:

```sql
SELECT ROUND(
  100.0 * SUM(shared_blks_hit) /
  NULLIF(SUM(shared_blks_hit) + SUM(shared_blks_read), 0),
  2
) AS cache_hit_pct
FROM pg_stat_statements;
```

### Currently Running Queries

Find long-running or stuck queries in real time:

```sql
SELECT
  pid,
  NOW() - query_start AS duration,
  state,
  LEFT(query, 120) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```

### Explain a Specific Query

Get the execution plan for a slow query to understand sequential scans, missing indexes, or join strategy:

```sql
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON) <query>;
```

Drop `ANALYZE` to get the plan without actually executing the query.

### Reset Statistics

After deploying a fix, reset stats to measure the improvement from a clean baseline:

```sql
SELECT pg_stat_statements_reset();
```

## Unity Catalog Grants for Apps with Analytics + Lakebase

Apps that use **both** the `analytics` plugin (SQL warehouse queries via `config/queries/*.sql`) and `lakebase` need **two layers** of grants for the SP:

1. **Unity Catalog grants** — for typegen (DESCRIBE) and SQL warehouse query execution at build time and runtime
2. **Lakebase Postgres grants** — for tRPC/pool queries at runtime

UC grants use the SP's application ID (UUID) as the principal. The `CAN_CONNECT_AND_CREATE` permission on the Lakebase resource does **not** grant UC access — you must set both independently.

```bash
# Find the SP application ID
databricks apps get <app-name> --profile <profile> -o json
# → .service_principal_client_id

# Grant UC catalog access (required for typegen to DESCRIBE queries)
databricks grants update catalog <catalog> \
  --json '{"changes": [{"add": ["USE_CATALOG"], "principal": "<sp_application_id>"}]}' \
  --profile <profile>

# Grant UC schema access
databricks grants update schema <catalog>.<schema> \
  --json '{"changes": [{"add": ["USE_SCHEMA", "SELECT"], "principal": "<sp_application_id>"}]}' \
  --profile <profile>
```

Without UC grants, typegen fails with `INSUFFICIENT_PERMISSIONS` during the platform build, producing empty types (`{}`), which breaks the TypeScript compilation.
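To confirm the grants landed before re-running typegen, one option is to read them back. This is a sketch: it assumes the `grants get-effective` subcommand and a `privilege_assignments` field in the JSON output; confirm both with `databricks grants -h` for your CLI version.

```bash
# Read back effective grants on the schema and filter to the app's SP.
databricks grants get-effective schema <catalog>.<schema> --profile <profile> -o json \
  | jq '.privilege_assignments[] | select(.principal == "<sp_application_id>")'
```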
## Troubleshooting

| Error | Solution |
|-------|----------|
| `cannot configure default credentials` | Use the `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace permissions |
| `permission denied for schema X` | App SP lacks access to a pre-existing schema. See **App Service Principal Permissions** above |
| `permission denied for schema appkit` | AppKit cache schema owned by a previous SP or human user. `DROP SCHEMA appkit CASCADE` so the new SP recreates it |
| `INSUFFICIENT_PERMISSIONS` / `USE CATALOG` during typegen | SP lacks Unity Catalog grants. See **Unity Catalog Grants** above |
| `permission denied for table X` (schema grants exist) | Table-level grants are separate from schema grants. Re-run `GRANT SELECT ON ALL TABLES IN SCHEMA ...` — the table may have been created after the original grant |
| `must be owner of table X` | Object was created by a different role. `DROP` and let the SP recreate it, or `GRANT ALL` to the SP |
| Grants lost after `apps update` or resource change | Removing/re-adding a postgres resource revokes the SP role and all grants. Re-apply all SQL GRANTs. See **Resource Removal Revokes All Grants** above |
| Protected branch cannot be deleted | `update-branch` to set `spec.is_protected` to `false` first |
| Long-running operation timeout | Use `--no-wait` and poll with `get-operation` (see the sketch below) |
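A minimal sketch of the `--no-wait` pattern from the last row. The operation resource name comes from the create response; confirm the exact field and argument format with `databricks postgres get-operation -h`.

```bash
# Kick off a long-running create without blocking, then poll the operation.
databricks postgres create-project \
  --json '{"spec": {"display_name": "<display_name>"}}' \
  --no-wait --profile <profile>

# Poll until done (operation name taken from the create response).
databricks postgres get-operation <operation_name> --profile <profile>
```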