## Databricks Schema ```mermaid graph LR W(DatabricksWorkspace) -- RESOURCE --> U(DatabricksUser) W -- RESOURCE --> S(DatabricksServicePrincipal) W -- RESOURCE --> G(DatabricksGroup) W -- RESOURCE --> T(DatabricksToken) W -- RESOURCE --> CP(DatabricksClusterPolicy) W -- RESOURCE --> IP(DatabricksInstancePool) W -- RESOURCE --> C(DatabricksCluster) W -- RESOURCE --> SS(DatabricksSecretScope) W -- RESOURCE --> IPL(DatabricksIpAccessList) U -- MEMBER_OF --> G S -- MEMBER_OF --> G G -- MEMBER_OF --> G U -- OWNER_OF --> T S -- OWNER_OF --> T C -- HAS_POLICY --> CP C -- USES_INSTANCE_POOL --> IP ``` ### DatabricksWorkspace A Databricks workspace, scoped by host URL. > **Ontology Mapping**: This node has the extra label `Tenant` to enable cross-platform queries for organizational tenants across different systems. | Field | Description | |-------|-------------| | **id** | Workspace host (e.g. `dbc-xxxx.cloud.databricks.com`) | | **host** | Full workspace URL (indexed) | | tokens_enabled | Whether PATs are enabled in the workspace | | max_token_lifetime_days | Max PAT lifetime in days from the workspace token management settings, or null when the workspace is on the Databricks default policy (the API encodes that as the string `"0"`) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - `DatabricksUser`, `DatabricksServicePrincipal`, `DatabricksGroup`, `DatabricksToken`, `DatabricksClusterPolicy`, `DatabricksInstancePool`, `DatabricksCluster`, `DatabricksSecretScope`, `DatabricksIpAccessList` belong to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->( :DatabricksUser, :DatabricksServicePrincipal, :DatabricksGroup, :DatabricksToken, :DatabricksClusterPolicy, :DatabricksInstancePool, :DatabricksCluster, :DatabricksSecretScope, :DatabricksIpAccessList ) ``` ### DatabricksUser A workspace SCIM user. > **Ontology Mapping**: This node has the extra label `UserAccount` to enable cross-platform queries for user accounts across different systems. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{scim_id}` (SCIM ids are not unique across workspaces) | | **scim_id** | Raw SCIM user ID returned by Databricks (indexed) | | **user_name** | SCIM `userName` (typically the email, indexed) | | **email** | Primary email address (indexed) | | display_name | SCIM display name | | external_id | External SCIM ID (federation) | | active | Whether the user is active | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksUser` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksUser) ``` - A `DatabricksUser` is a member of one or more `DatabricksGroup`. ``` (:DatabricksUser)-[:MEMBER_OF]->(:DatabricksGroup) ``` ### DatabricksServicePrincipal A workspace SCIM service principal. > **Ontology Mapping**: This node has the extra label `ServiceAccount` to enable cross-platform queries for non-human accounts across different systems. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{scim_id}` (SCIM ids are not unique across workspaces) | | **scim_id** | Raw SCIM service principal ID (indexed) | | **application_id** | OAuth application ID (indexed) | | display_name | SCIM display name | | external_id | External SCIM ID (federation) | | active | Whether the service principal is active | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksServicePrincipal` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksServicePrincipal) ``` - A `DatabricksServicePrincipal` is a member of one or more `DatabricksGroup`. ``` (:DatabricksServicePrincipal)-[:MEMBER_OF]->(:DatabricksGroup) ``` ### DatabricksGroup A workspace SCIM group. > **Ontology Mapping**: This node has the extra label `UserGroup` to enable cross-platform group queries. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{scim_id}` (SCIM ids are not unique across workspaces) | | **scim_id** | Raw SCIM group ID (indexed) | | **display_name** | Group display name (indexed) | | external_id | External SCIM ID (federation) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksGroup` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksGroup) ``` - A `DatabricksGroup` can be a member of another `DatabricksGroup` (nested groups). ``` (:DatabricksGroup)-[:MEMBER_OF]->(:DatabricksGroup) ``` ### DatabricksToken A Databricks personal access token (PAT) returned by the token management API. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{token_id}` (token-management ids are workspace-local) | | **token_id** | Raw token id returned by the token-management API (indexed) | | comment | Token description provided at creation | | creation_time | Native datetime when the token was created (UTC) | | expiry_time | Native datetime when the token expires (UTC); null when the token has no expiry | | owner_id | Workspace-scoped composite id of the token owner (matches `DatabricksUser.id` or `DatabricksServicePrincipal.id`) | | created_by_id | Workspace-scoped composite id of the principal that created the token | | created_by_username | Username/email of the principal that created the token (indexed) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksToken` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksToken) ``` - A `DatabricksUser` or `DatabricksServicePrincipal` owns a `DatabricksToken`. ``` (:DatabricksUser)-[:OWNER_OF]->(:DatabricksToken) (:DatabricksServicePrincipal)-[:OWNER_OF]->(:DatabricksToken) ``` ### DatabricksClusterPolicy A cluster policy returned by the policies API. Cluster policies define a set of allowed configurations a `DatabricksCluster` can be launched with. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{policy_id}` | | **policy_id** | Raw policy id (indexed) | | **name** | Policy display name (indexed) | | description | Free-text description | | definition | JSON-encoded policy definition (allowed fields, fixed values, …) | | policy_family_id | Policy family id when the policy is derived from a Databricks-provided family | | creator_user_name | User name of the policy creator (indexed) | | created_at | Native datetime when the policy was created (UTC) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksClusterPolicy` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksClusterPolicy) ``` - A `DatabricksCluster` is launched against a `DatabricksClusterPolicy`. ``` (:DatabricksCluster)-[:HAS_POLICY]->(:DatabricksClusterPolicy) ``` ### DatabricksInstancePool A pre-warmed instance pool that clusters can pull nodes from to reduce startup latency. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{instance_pool_id}` | | **instance_pool_id** | Raw pool id (indexed) | | **instance_pool_name** | Pool display name (indexed) | | node_type_id | Underlying VM instance type id | | min_idle_instances | Minimum number of idle instances kept warm | | max_capacity | Maximum number of instances the pool can scale to | | idle_instance_autotermination_minutes | Idle instance reclaim window | | enable_elastic_disk | Whether elastic disk autoscaling is enabled | | state | Pool state (`ACTIVE`, `STOPPED`, `DELETED`) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksInstancePool` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksInstancePool) ``` - A `DatabricksCluster` allocates nodes from a `DatabricksInstancePool`. ``` (:DatabricksCluster)-[:USES_INSTANCE_POOL]->(:DatabricksInstancePool) ``` ### DatabricksCluster A Databricks compute cluster returned by the clusters 2.1 API. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{cluster_id}` | | **cluster_id** | Raw cluster id (indexed) | | **cluster_name** | Cluster display name (indexed) | | state | Cluster state (`PENDING`, `RUNNING`, `TERMINATED`, …) | | spark_version | Spark / Databricks runtime version string | | runtime_engine | Runtime engine (`STANDARD`, `PHOTON`) | | node_type_id | Worker node VM type id | | driver_node_type_id | Driver node VM type id | | num_workers | Static worker count (null when autoscaling is enabled) | | autotermination_minutes | Idle auto-termination window in minutes | | cluster_source | What created the cluster (`UI`, `JOB`, `API`, `MODELS`, …) | | data_security_mode | UC access mode (`NONE`, `SINGLE_USER`, `USER_ISOLATION`, `LEGACY_*`) | | single_user_name | Owning user for single-user UC clusters (indexed) | | creator_user_name | User name of the cluster creator (indexed) | | instance_pool_id | Raw worker instance pool id, when the cluster targets one (indexed) | | driver_instance_pool_id | Raw driver instance pool id, when the driver targets a distinct pool (indexed) | | enable_local_disk_encryption | Whether local disks are encrypted | | enable_elastic_disk | Whether elastic disk autoscaling is enabled | | start_time | Native datetime when the cluster was first started (UTC) | | terminated_time | Native datetime when the cluster was last terminated (UTC), if applicable | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksCluster` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksCluster) ``` - A `DatabricksCluster` is governed by a `DatabricksClusterPolicy`. ``` (:DatabricksCluster)-[:HAS_POLICY]->(:DatabricksClusterPolicy) ``` - A `DatabricksCluster` allocates nodes from one or more `DatabricksInstancePool` — the worker pool and, when set, a distinct driver pool both land here. ``` (:DatabricksCluster)-[:USES_INSTANCE_POOL]->(:DatabricksInstancePool) ``` ### DatabricksSecretScope A Databricks secret scope. Scopes can be backed by Databricks's own store (`DATABRICKS`) or by an Azure Key Vault (`AZURE_KEYVAULT`). | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{name}` | | **name** | Scope name (indexed) | | backend_type | Backing store (`DATABRICKS`, `AZURE_KEYVAULT`) | | keyvault_resource_id | Azure Key Vault resource id when backend is `AZURE_KEYVAULT` (indexed) | | keyvault_dns_name | Azure Key Vault DNS name when backend is `AZURE_KEYVAULT` | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksSecretScope` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksSecretScope) ``` ### DatabricksIpAccessList An IP access list applied at the workspace level. Restricts inbound access to the workspace to ranges in the allow list, blocks ranges in the block list. | Field | Description | |-------|-------------| | **id** | Workspace-scoped composite id `{workspace_id}/{list_id}` | | **list_id** | Raw list id (indexed) | | **label** | List label (indexed) | | list_type | List type (`ALLOW` / `BLOCK`) | | enabled | Whether the list is enforced | | address_count | Number of addresses in the list | | ip_addresses | Source CIDR / IP entries in the list | | created_at | Native datetime when the list was created (UTC) | | updated_at | Native datetime when the list was last updated (UTC) | | firstseen | Timestamp of when a sync job first created this node | | lastupdated | Timestamp of the last time the node was updated | #### Relationships - A `DatabricksIpAccessList` belongs to a `DatabricksWorkspace`. ``` (:DatabricksWorkspace)-[:RESOURCE]->(:DatabricksIpAccessList) ```