--- title: Specification v0.3 outline: [2, 4] description: The complete xDBML v0.3 specification. Thirty chapters plus five appendices covering lexical conventions, entities, relationships, polymorphism, views, AI-readiness metadata, validation constraints, entity-level checks, the module system (including remote module sources), AST representation, conformance, and DBML 3.13.6 compatibility. --- **License:** [Apache License 2.0](https://github.com/xdbml/xdbml-spec?tab=Apache-2.0-1-ov-file#readme)
**Steward:** [Hackolade](https://hackolade.com) (IntegrIT SA/NV) -- pending governance evolution
**Repository:** [github.com/xdbml/xdbml-spec](https://github.com/xdbml/xdbml-spec)
**Canonical home:** [xdbml.org](https://xdbml.org/) ## Abstract xDBML (eXtended Database Markup Language) is an open markup language for describing the shape of structured data across heterogeneous storage technologies -- relational databases, document stores, columnar warehouses, search engines, graph databases, serialization formats, and API contracts. xDBML is a strict superset of DBML 3.13.6 (the Database Markup Language maintained by Holistics under Apache 2.0) and extends it with constructs for nested hierarchical structures, polymorphism, polyglot vocabulary, first-class JSON data, named reusable types (object-shaped and scalar), AI-readiness metadata, explicit namespace-level declarations, precise relationship cardinality, property-bearing edges for graph models, views, and a module system for splitting schemas across multiple files, locally or over HTTPS. xDBML is designed from the ground up for **AI-assisted data modeling** and **AI-mediated schema interchange**. The language matches the way modern LLMs already describe schemas: nested structures are first-class, polymorphism uses the same `oneOf`/`anyOf`/`allOf` vocabulary as JSON Schema, paths into nested fields use unambiguous dotted notation, and every construct accepts `synonyms:`, `business_term:`, `tags:`, and `granularity:` settings that let natural-language queries resolve to canonical schema elements without guesswork. An LLM asked to "find the monthly recurring revenue field" should not have to infer from column naming conventions; xDBML lets the schema declare the synonyms explicitly. xDBML is designed as the schema layer of a modern data architecture: upstream of physical engine DDLs (SQL, Cassandra CQL, MongoDB validators, Cypher), downstream of contract layers (ODCS) and semantic layers (OSI, dbt MetricFlow). It serves human authors, AI-assisted data modeling workflows, and bidirectional translation with the major serialization formats (Avro, Parquet, JSON Schema, OpenAPI). xDBML extends DBML into a unified metadata and semantic data modeling language with richer support for validation, semantics, cardinality, annotations, and AI-friendly metadata, while deliberately staying readable and Git-friendly. It is designed for AI-assisted data modeling and AI-mediated schema interchange. **Strict-superset claim**: every valid DBML document parses correctly under xDBML rules, and every DBML construct (used in a way valid in DBML) means the same thing in xDBML as in DBML. When a file declares `xdbml: 0.3` (or any 0.x), it has opted into xDBML's extended semantics; the semantic meaning of constructs in that context may legitimately differ from DBML's, because the file is no longer claiming to be DBML. **What's new in v0.3**: remote module sources (§26.14). A `use` or `reuse` directive may now import a module from an `https://` URL in addition to a relative path, with relative references inside a remote module resolving against that module's URL (§26.14.1). This is a strict, backward-compatible addition: every v0.2 document remains valid under v0.3. (v0.2 introduced the module system itself (§26) and scalar Named Types (§14.7).) ## Table of contents [Abstract](#abstract) 1. [Introduction](#_1-introduction) 2. [Document structure](#_2-document-structure) 3. [Lexical conventions](#_3-lexical-conventions) 4. [Version declaration](#_4-version-declaration) 5. [Project](#_5-project) 6. [Container](#_6-container) 7. [Entity](#_7-entity) 8. [Field](#_8-field) 9. [Index](#_9-index) 10. [Checks -- entity-level constraints](#_10-checks-entity-level-constraints) 11. [Relationship and cardinality](#_11-relationship-and-cardinality) 12. [Edge -- relationships with properties](#_12-edge-relationships-with-properties) 13. [View -- derived shapes with source queries](#_13-view-derived-shapes-with-source-queries) 14. [Named type](#_14-named-type) 15. [Enum](#_15-enum) 16. [TablePartial and TableGroup](#_16-tablepartial-and-tablegroup) 17. [Diagram View](#_17-diagram-view) 18. [Note](#_18-note) 19. [Path syntax](#_19-path-syntax) 20. [Polymorphism](#_20-polymorphism) 21. [JSON as a first-class data type](#_21-json-as-a-first-class-data-type) 22. [BSON scalar types](#_22-bson-scalar-types) 23. [AI-readiness and semantic-interop settings](#_23-ai-readiness-and-semantic-interop-settings) 24. [Validation constraints](#_24-validation-constraints) 25. [Records -- sample data](#_25-records-sample-data) 26. [Module system](#_26-module-system) 27. [AST representation](#_27-ast-representation) 28. [Round-trip semantics](#_28-round-trip-semantics) 29. [Relationship to other standards](#_29-relationship-to-other-standards) 30. [Conformance](#_30-conformance) [Appendix A. Reserved keywords](#appendix-a-reserved-keywords) [Appendix B. Lexical grammar reference](#appendix-b-lexical-grammar-reference) [Appendix C. Worked end-to-end examples](#appendix-c-worked-end-to-end-examples) [Appendix D. Compatibility with DBML 3.13.6](#appendix-d-compatibility-with-dbml-3-13-6) [Appendix E. Sources and acknowledgments](#appendix-e-sources-and-acknowledgments) ## 1. Introduction xDBML is born to solve the same frustrations DBML was designed for (difficulty building a mental "big picture" of database structure, opaque field meanings, ER diagrams and DDL that are hard to read and usually outdated), plus additional pains DBML cannot solve: AI-readiness metadata, LLM-portable schemas, nested structures and polymorphism, schema drift across polyglot stacks, and property-bearing graph edges. For the full positioning relative to DBML, see the [FAQ](https://xdbml.org/faq). ### 1.1 Scope xDBML describes the **structural and semantic layer** of data: entities, fields, types, relationships, classifications, validation rules, and AI-readiness metadata -- the *shape* and *meaning* of data. xDBML is the format for **humans and AI to exchange schemas with an xDBML tool**. It is not the round-trip format between a data modeling tool and a target technology. That tool-to-target round-trip happens in **native DDL or schema** -- the tool understands the target's complete capability surface (partitioning, sharding, tablespaces, replication, PL/SQL, triggers, identity columns, advanced constraints, refresh schedules) and preserves it in the tool's own canonical model. xDBML carries the parts of that model with meaning across boundaries. ![xDBML scope diagram](/diagrams/xdbml-scope.svg) **In scope:** - Entity, table, collection, record, and similar structural constructs (§7) - Fields with their declarative type, default, nullability, and validation constraints (§8) - Named reusable types and composition (§14) - Relationships and cardinality, including cross-engine references (§11) - Nested structures: objects, arrays, maps, sets, tuples, and polymorphism (§20, §21) - Property-bearing graph edges (§12) - Views as declarative shape + opaque source query (§13) - Indexes in declarative form (§9) - AI-readiness metadata: synonyms, business terms, classification tags, granularity hints (§23) - Custom properties for organization-specific extensions - Target name declarations and per-Container target overrides (§5) - Notes and prose annotations at every level (§18) **Out of scope, by design:** - **Engine operational features.** Partitioning strategies, sharding configuration, tablespaces, storage models, replication topology, materialized view refresh schedules, clustering keys, time-travel configuration. These stay native to each target. - **Procedural code.** PL/SQL, T-SQL, stored procedures, triggers, server-side functions, computed columns with engine-specific functions. xDBML expresses declarative shape and metadata, not behavior. - **Identity and sequencing details.** xDBML can declare a field as a primary key with auto-generation; the exact sequence configuration is engine-specific. - **Wire-protocol and evolution rules.** Avro schema evolution rules, Protobuf reserved fields, GraphQL federation directives, OpenAPI endpoint operations. - **Query languages.** SQL, Cypher, MQL, GraphQL queries. xDBML generates schemas; consumers write queries. - **Computational meaning.** Measures, metrics, aggregations belong in OSI, dbt MetricFlow, LookML. - **Contractual meaning.** Data quality rules, SLAs, ownership, pricing belong in ODCS. - **Inferential reasoning.** OWL and knowledge graph standards operate at a different layer. - **Provisioning and operations.** Terraform, operators, security policies handle the infrastructure layer. **Round-trip is lossy by design** with respect to target-specific features. An xDBML schema lifted from Oracle DDL preserves the logical content (tables, columns, types, declarative constraints, relationships, indexes in declarative form). It does not preserve operational specifics (partitions, storage configuration, triggers, PL/SQL) -- those are deliberately outside xDBML's scope and remain in the data modeling tool's native representation. The language is also: - Polyglot by default -- no construct privileges relational databases over document, columnar, graph, or exchange-format targets. - Closed-world -- a field not declared in an entity cannot exist in valid data. - Human-authorable, machine-readable, and AI-generable in equal measure. xDBML is a format, not a data modeling tool. The playground at xdbml.org demonstrates the language and works well for learning, prototyping, and small schemas, but sustained enterprise data modeling -- live-database reverse-engineering, target-native DDL generation across many engines, schema diffing and impact analysis, lineage and governance integration, and multi-user collaboration -- requires a purpose-built data modeling tool like ER/Studio, Erwin Data Modeler, or Hackolade. xDBML is designed as the textual exchange format and AI interaction surface that those tools can read, write, and round-trip with: complementing them, not replacing them. ### 1.2 Design principles The language follows seven principles that govern every construct: 1. **Strict superset of DBML.** Every valid DBML 3.13.6 document is a valid xDBML document with identical semantics. 2. **Polyglot by default.** No construct privileges relational databases over document, columnar, graph, or other paradigms. 3. **Vocabulary-permissive at every level.** Container, Entity, and type-level keywords accept target-native synonyms. 4. **Type-name pass-through.** Scalar type names are accepted as-is and lowered by target-specific generators. BSON-specific types (`objectId`, `Decimal128`, `BinData`, etc.) are recognized scalar names; see §22. 5. **Explicit is better than implicit, where it matters.** Path syntax requires explicit array iteration in relationships; polymorphic paths require explicit alternative selectors. 6. **Declarative metadata is welcome; computational and contractual meaning are not.** xDBML carries the declarative meaning of stored data -- names, descriptions, synonyms, glossary references, classification tags, validation constraints -- and leaves computational meaning (measures, metrics, aggregations) to semantic-layer formats and contractual concerns (SLAs, ownership, pricing) to data-contract formats like ODCS. 7. **Versioned evolution.** Every xDBML document declares the spec version it conforms to. ### 1.3 A first look A complete xDBML document spanning Oracle (RDBMS), MongoDB (document), Avro (events), and an LPG graph model: View in playground ```xdbml xdbml: 0.3 Project ecommerce { targets: [Oracle, MongoDB, Neo4j] Note: 'Reference example for the xDBML v0.1 specification' } Type Address { street varchar [not null] city varchar [not null] zip varchar country varchar [default: 'US'] } Type MonetaryAmount { amount Decimal128 [not null] currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3, note: 'ISO 4217 currency code'] } Container core [type: schema, target: Oracle] { Table customers { id int [pk] email varchar [unique, not null, pattern: '^[^@]+@[^@]+$'] display_name varchar [not null] primary_address Address created_at timestamp [granularity: second, default: `SYSTIMESTAMP`] } } Container orders_store [type: database, target: MongoDB] { Collection orders { _id objectId [pk] customer_id int32 [not null] placed_at Date [granularity: second] total MonetaryAmount line_items array [ line_item object { sku string [not null] quantity int32 [not null, minimum: 1] unit_price MonetaryAmount } ] payment_method oneOf { card object { last4 string [maxLength: 4] brand string } bank object { iban string } wallet object { provider string account string } } [discriminator: method_kind] } } Container catalog [type: schema, target: Oracle] { Table products { sku varchar [pk] name varchar [not null, synonyms: ['product name', 'item name']] price MonetaryAmount metadata json { weight_grams int dimensions object { length int width int height int } tags array [varchar] } } View top_sellers [materialized: true, refresh_schedule: 'daily'] { source_query: ''' SELECT p.sku, p.name, COUNT(*) AS order_count FROM products p JOIN orders_store.orders o ON o.line_items.sku = p.sku GROUP BY p.sku, p.name ORDER BY order_count DESC FETCH FIRST 100 ROWS ONLY ''' sku varchar [pk] name varchar order_count int } } Container social [type: database, target: Neo4j] { Edge FOLLOWS [source: core.customers, target: core.customers, source_cardinality: '0..*', target_cardinality: '0..*'] { since date [not null] is_close boolean [default: false] } } Ref: orders_store.orders.customer_id > core.customers.id [source: '1..*', target: '1..1'] Ref: orders_store.orders.line_items.[*].sku > catalog.products.sku ``` This single document expresses an Oracle relational schema, a MongoDB document collection with nested arrays and polymorphism, a catalog with both an entity and a materialized view, an LPG social-graph keyspace with a property-bearing `FOLLOWS` edge, two reusable named types, and cross-container relationships including one that traverses an array. It would round-trip to Oracle DDL, MongoDB collection validators, JSON Schema, Avro, Neo4j Cypher schema, and the ODCS schema section. ## 2. Document structure An xDBML document consists of, in order: 1. A version declaration (§4), optionally followed by an `experimental:` opt-in clause. 2. Zero or one `Project` declaration (§5). 3. Any number of top-level declarations in any order: `Container` (§6), `Entity` at project level (§7), `Edge` (§12), `View` (§13), `Type` (§14), `Enum` (§15), `Ref` (§11), `TablePartial`, `TableGroup` (§16), `DiagramView` (§17), `Note` (§18). Comments may appear anywhere whitespace is permitted. ## 3. Lexical conventions ### 3.1 Whitespace and comments Whitespace is insignificant except as a token separator. ```xdbml // Single-line comment, runs to end of line /* Block comment; may span lines */ ``` ### 3.2 Identifiers A bare identifier matches `[A-Za-z_][A-Za-z0-9_]*`. Quoted identifiers are written in double quotes and may contain any character: `"my-table"`, `"first name"`. ### 3.3 String literals Single-quoted (`'a string'`) with `\'` and `\\` escapes. Triple-quoted multi-line (`'''line one\nline two'''`) normalizes indentation. ### 3.4 Number literals Conventional syntax: `42`, `3.14`, `-100`, `1.5e10`. ### 3.5 Expression literals Backtick-quoted strings pass through as engine-native expressions: `` `SYSTIMESTAMP` ``, `` `uuid_generate_v4()` ``. ### 3.6 Block delimiters Curly braces `{ ... }` group block bodies. Square brackets `[ ... ]` carry settings and array type expressions. ### 3.7 Setting separator (the colon) xDBML uses a colon `:` to separate a **setting name from its value**. The colon is required wherever a setting has both a name and a value: View in playground ```xdbml Project ecommerce { targets: Oracle // setting name "targets", value Oracle Note: 'reference example' // setting name "Note", value "reference example" } Container core [type: schema] { // setting name "type", value schema Table customers { Note: 'customer master' // setting name "Note", value "customer master" id int [pk] email varchar [unique, not null, default: 'unknown'] // setting name "default", value "unknown" } } ``` The colon is **not** used for: - **Construct keywords**: `Project ecommerce {`, `Container core [...] {`, `Table customers {`, `Type Address {`. The construct keyword and the declared name are juxtaposed without a colon. - **Field type declarations**: `id int`, `name varchar`. The field name and type are juxtaposed. - **Boolean-style flags inside bracket lists**: `[pk]`, `[not null]`, `[unique]`, `[increment]`. These are markers, not key-value pairs. **One exception**: the `Ref:` statement keyword includes the colon as part of its syntax (`Ref: source > target`). This is a DBML inheritance preserved for compatibility, even though `Ref` is a top-level declaration rather than a setting. ### 3.8 Case sensitivity xDBML follows a two-tier case-sensitivity policy that balances tolerance (accept variant casings users naturally type) with precision (preserve identifier casing that target databases require). **Keywords are case-insensitive.** All language keywords -- construct names (`Project`, `Container`, `Table`, `Entity`, `Type`, `Edge`, `View`, `Note`), settings (`targets`, `target`, `default`, `pattern`, `not null`, `pk`), reserved type-related words (`int`, `varchar`, `objectId`, `Decimal128`), and structural words (`oneOf`, `anyOf`, `array`, `object`) -- match any casing. The parser accepts `note`, `Note`, `NOTE`, and `nOtE` as the same keyword. **Identifiers are case-sensitive.** Names you give to entities, fields, containers, named types, edges, views, and enums preserve user-supplied case. `customer_email` and `Customer_Email` are different identifiers. This is essential because target databases have different casing conventions: - MongoDB: typically camelCase (`orderId`, `customerEmail`) - Cassandra: lower_snake_case strictly enforced - Oracle: UPPER_SNAKE_CASE by default - PostgreSQL: lower_snake_case by convention A roundtrip from one of these targets to another may require explicit identifier renaming; xDBML does not silently re-case identifiers. **Target name values are case-insensitive and alias-normalized.** Values declared in `targets:` (Project) and `target:` (Container) settings are matched case-insensitively and resolved through an alias table (§5.1). So `targets: postgres`, `targets: Postgres`, `targets: PostgreSQL`, and `targets: POSTGRESQL` all normalize to the canonical `PostgreSQL`. **Documentation convention.** This specification and all official examples use a consistent stylistic convention even though the parser is tolerant: - Construct keywords are capital-initial: `Project`, `Container`, `Table`, `Entity`, `Type`, `Edge`, `View` - Block-body settings starting with capital are major annotations: `Note:` at Project/Container/Table/Entity level - Bracket-list settings are lowercase: `[pk]`, `[not null]`, `[default: 'x']`, `[note: 'inline note']` - Type and value keywords are lowercase: `int`, `varchar`, `string`, `array`, `object`, `oneOf` This pattern is inherited from DBML and recommended for new schemas. The parser accepts variant casings but emitting tools, formatters, and the spec itself follow this convention. ### 3.8 Element and field separators Whitespace (including newlines) is insignificant, so within a body the separator between items depends on the construct. Where a comma is optional it may be replaced by a newline; where a comma is required it may still be wrapped across lines. | Construct | Body holds | Separator | |---|---|---| | Entity, Edge, `object` / `struct` / `record`, `json { }` schema | fields | newline -- comma not allowed | | `oneOf` / `anyOf` / `allOf` | named alternatives | newline -- comma not allowed | | tuple (`array [ [0] ... [1] ... ]`) | positional elements | comma and/or newline (comma optional) | | `map` | key type, value type | comma (required) | | `union` | member types | comma (required) | | `array` / `list` / `set` | element type (or a comma-list that desugars to a `union`) | none, or comma between the union-sugar members | An `array`/`set` takes an element *type expression*. For a mix of element types, use a `union` (scalars/named types) or `oneOf` (object shapes), or a tuple (fixed positions); for `array` and `set`, a comma-separated list of scalar/named types is shorthand for a `union` (§8.4, §8.6, §20). ## 4. Version declaration ```xdbml xdbml: 0.3 ``` SemVer-style `MAJOR.MINOR[.PATCH]` numbering. Appears at the top of the file, before any other construct. ### 4.1 Parser behavior | Document declares | Parser supports | Behavior | |---|---|---| | (no declaration) | any | Treated as DBML 3.13.6 -- no xDBML extensions recognized. | | `xdbml: 0.1` | 0.1 | Parsed with v0.1 feature set. | | `xdbml: 0.2` | 0.2+ | Parsed successfully (MINOR-version backward compatibility). | | `xdbml: 0.3` | 0.2 | Warning + refuse to parse. | ### 4.2 Experimental features ```xdbml xdbml: 0.3 experimental: [graph_path_expressions, temporal_validity] ``` Without the opt-in, experimental constructs are not recognized. No backward-compatibility guarantee for experimental features. ## 5. Project ```xdbml Project ecommerce { targets: Oracle Note: 'Reference example' } ``` For polyglot schemas spanning multiple targets, `targets:` accepts an array: ```xdbml Project polyglot_ecommerce { targets: [Oracle, MongoDB, Avro] Note: 'Customer master in Oracle, orders in MongoDB, events in Avro' } ``` At most one `Project` may appear. Recognized settings: `targets`, `Note`. The legacy `database_type:` setting is also accepted for DBML compatibility (see §5.3). ### 5.1 Target name vocabulary Target names are bare identifiers when they have no spaces or special characters (`Oracle`, `PostgreSQL`, `MongoDB`, `Snowflake`, `Neo4j`, `Avro`, `Cassandra`, `BigQuery`, `Databricks`). They must be quoted when they contain spaces or punctuation (`'SQL Server'`, `'Cosmos DB'`, `'JSON Schema'`). Target name values are matched case-insensitively and resolved through the alias table below to a canonical form. The canonical form is what appears in the normalized AST and what generators key off of. | Canonical name | Recognized aliases | |---|---| | `Oracle` | -- | | `PostgreSQL` | `Postgres`, `pg` | | `'SQL Server'` | `MSSQL`, `'Microsoft SQL Server'`, `'T-SQL'` | | `MySQL` | -- | | `MariaDB` | -- | | `SQLite` | -- | | `Snowflake` | -- | | `Databricks` | -- | | `BigQuery` | `'Google BigQuery'` | | `Redshift` | `'Amazon Redshift'` | | `'Synapse Analytics'` | `'Azure Synapse'` | | `TimescaleDB` | -- | | `ClickHouse` | -- | | `MongoDB` | `Mongo` | | `DocumentDB` | `'AWS DocumentDB'` | | `'Cosmos DB'` | `Cosmos`, `'Azure Cosmos DB'` | | `Couchbase` | -- | | `RavenDB` | -- | | `DynamoDB` | -- | | `Cassandra` | `'Apache Cassandra'` | | `ScyllaDB` | -- | | `Neo4j` | -- | | `Memgraph` | -- | | `Neptune` | `'Amazon Neptune'` | | `JanusGraph` | -- | | `Avro` | `'Apache Avro'` | | `Parquet` | `'Apache Parquet'` | | `Protobuf` | `'Protocol Buffers'`, `Proto` | | `'JSON Schema'` | -- | | `OpenAPI` | `Swagger` | | `GraphQL` | -- | | `Elasticsearch` | -- | | `OpenSearch` | -- | The list is non-exhaustive. Target name values outside the table are accepted by the parser but may produce generic output from target-aware generators. The official list grows over time as the ecosystem adds first-class support for new targets. ### 5.2 Container target resolution Each Container has an effective target -- the storage technology, exchange format, or API platform it lowers to. The effective target is resolved as follows: 1. If the Container declares `target:` explicitly, that value is the effective target. 2. Otherwise, if the Project declares exactly one entry in `targets:`, the Container inherits that single target. 3. Otherwise (Project declares multiple targets but Container declares none), the Container's effective target is undefined and the parser MUST emit an error. 4. If the Container declares a `target:` value that does not appear in the Project's `targets:` list, the parser MUST emit an error. ### 5.3 DBML compatibility: the `database_type:` alias xDBML is a strict superset of DBML 3.13.6, which uses `database_type:` to declare a single target engine at the Project level. xDBML preserves this keyword as a legacy alias for single-target schemas: ```xdbml // DBML-compatible (legacy) Project simple { database_type: 'PostgreSQL' } // xDBML preferred equivalent Project simple { targets: PostgreSQL } ``` Compatibility rules: 1. `database_type:` accepts only a single string or bare identifier value. Array values are not permitted (DBML never supported them). 2. A Project that declares `database_type: X` is semantically equivalent to a Project that declares `targets: X`. Parsers MUST normalize the AST to use `targets:` with a single-element list. 3. A Project MUST NOT declare both `database_type:` and `targets:`. Doing so produces a parser error (ambiguous intent). 4. New schemas SHOULD prefer `targets:` over `database_type:`. The legacy form remains supported for round-trip fidelity with DBML tooling and for documents authored before xDBML v0.1. 5. The `database_type:` keyword has no Container-level equivalent. Container-level target declarations always use `target:`. ```xdbml // Single-target project; Container target is implicit Project simple { targets: PostgreSQL } Container app [type: schema] { // implicit target: PostgreSQL Table users { ... } } // Polyglot project; each Container must declare its target Project polyglot { targets: [Oracle, MongoDB] } Container core [type: schema, target: Oracle] { // explicit Table customers { ... } } Container orders_store [type: database, target: MongoDB] { // explicit Collection orders { ... } } ``` ## 6. Container The namespace-level construct between Project and Entity. ### 6.1 Declaration ```xdbml Container core [type: schema] { Note: 'Core domain entities' Entity customers { ... } Entity accounts { ... } } ``` ### 6.2 Keyword synonyms | Keyword | Native to | |---|---| | `Container` | xDBML canonical | | `Schema` | Oracle, PostgreSQL, SQL Server, Snowflake | | `Database` | MongoDB, Cosmos DB | | `Keyspace` | Cassandra, ScyllaDB | | `Namespace` | Avro, ORC, GraphQL | | `Dataset` | BigQuery | | `Bucket` | Couchbase | ### 6.3 The `type:` setting Names the target-native flavor (`schema`, `database`, `keyspace`, `namespace`, `dataset`, `bucket`, `index`). ### 6.4 Container settings ```xdbml Container core [type: schema, target: Oracle] { ... } Container kv_store [type: keyspace, target: Cassandra, replication: '{class: NetworkTopologyStrategy, dc1: 3, dc2: 2}', durable_writes: true] { ... } ``` The `target:` setting declares the specific target technology for the Container. It is optional in single-target projects (where it is inherited from `Project.targets`) and required in polyglot projects with more than one declared target. See §5.2 for the resolution rule. Other Container settings are open-vocabulary key-value pairs. Generators consume what they understand. ### 6.5 Optional containers Entities are not required to belong to a container. An entity belongs to at most one container. ### 6.6 Cross-container references Relationships, edges, and inline references use dotted notation with explicit container prefix: View in playground ```xdbml Container core [type: schema] { Table customers { id int [pk] } } Container sales [type: schema] { Table orders { id int [pk] customer_id int [ref: > core.customers.id] } } Ref: sales.orders.customer_id > core.customers.id ``` The dotted path is `container_name.entity_name.field_path`. ## 7. Entity A named field-bearing construct. ### 7.1 Declaration View in playground ```xdbml Table customers { id int [pk] email varchar [unique, not null] name varchar } ``` ### 7.2 Keyword synonyms | Keyword | Native to | |---|---| | `Entity` | xDBML canonical | | `Table` | Oracle, RDBMS, Cassandra, Hive, BigQuery, Databricks, Snowflake,... | | `Collection` | MongoDB, Cosmos DB, Couchbase | | `Record` | Avro, Parquet, GraphQL | ### 7.3 Schema-qualified declaration ```xdbml Table core.users { ... } ``` Produces an Entity named `users` inside an implicit container `core`. Explicit `Container { Entity ... }` blocks are equivalent and preferred for new xDBML. ### 7.4 Aliases ```xdbml Entity very_long_user_table as U { ... } Ref: U.id < posts.user_id ``` ### 7.5 Entity settings | Setting | Meaning | |---|---| | `note` | Documentation | | `headercolor` | Visualization color | | `synonyms` | Alternative names (§23.1) | | `business_term` | Glossary reference (§23.2) | ## 8. Field ```xdbml field_name field_type [field_settings] ``` ### 8.1 Field types - **Scalar type name**: `int`, `varchar`, `decimal(19,4)`, `timestamp`, etc. - **BSON scalar type** (§22): `string`, `int32`, `int64`/`long`, `double`, `bool`, `null`, `objectId`, `Decimal128`, `BinData`, `Date`, `Timestamp`, `Regex`, `JavaScript`, `MinKey`, `MaxKey`, `Symbol`, `DBPointer`. - **Complex type expression** (§8.2): `object`, `array`, `map`, `set`. - **Polymorphism construct** (§20): `union`, `oneOf`, `anyOf`, `allOf`. - **Named type reference** (§14): bare name of a previously declared `Type`. - **JSON-with-schema** (§21): `json`, `jsonb`, `variant`. ### 8.2 Complex type keywords | Concept | Primary | Aliases | |---|---|---| | Structured value | `object` | `struct`, `record` | | Ordered list | `array` | `list` | | Heterogeneous tuple | `array` (with positional indices) | `list` | | Unique collection | `set` | -- | | Key-value collection | `map` | `dict`, `dictionary` | ### 8.3 Object type ```xdbml address object { street string [not null] city string [not null] zip string country varchar [default: 'US'] } [not null, note: 'Primary mailing address'] ``` ### 8.4 Array of scalars The brackets after `array` hold the element *type*. The common case is a uniform element type: ```xdbml tags array [varchar] // every element is a varchar scores array [int [not null]] // the element type carries its own settings ``` JSON and BSON arrays are not required to be uniform. For a mix of scalar types, list them comma-separated: `array [T1, T2, ...]` is shorthand for `array [union [T1, T2, ...]]` (§20.1), so these two are equivalent: ```xdbml ids array [int, varchar] // shorthand ids array [union [int, varchar]] // the same thing, written out ``` The comma-list shorthand works the same for `array`, `list`, and `set`, and its members must be scalar or named types (or `null`) -- the members a `union` accepts. For a mix of object shapes use `oneOf` (§20.5); for a fixed positional layout use a tuple (§8.6); for a value with no element typing at all use an opaque `json` field (§21.1). ### 8.5 Array of named objects ```xdbml line_items array [ line_item object { sku varchar [not null] quantity int [not null, default: 1] unit_price decimal(10,2) [not null] } ] [not null] ``` ### 8.6 Heterogeneous tuple ```xdbml addresses array [ [0] billing object { street string city string zip string } [1] shipping object { street string city string zip string } ] ``` Tuple elements may be separated by a comma, a newline, or both -- the separator is optional and position-independent, so the compact single-line form below is equivalent to the multi-line form above: ```xdbml coords array [ [0] lat decimal(9,6), [1] lng decimal(9,6) ] ``` Positions are zero-indexed and must form a contiguous range starting at 0. ### 8.7 Maps and sets ```xdbml flags map [string, boolean] permissions set [varchar] ``` ### 8.8 Common field settings | Setting | Behavior | |---|---| | `pk` / `primary key` | Primary key | | `null` / `not null` / `required` | Cardinality | | `unique` | Single-field unique constraint | | `default: value` | Default value | | `increment` | Auto-increment | | `note: 'string'` | Documentation | | `ref: target` | Inline foreign key (§11) | | `check: expr` | Field-level check constraint | | Validation constraints | §24 | | AI-readiness | §23 | `required` is accepted as a synonym for `not null` and is provided for authors familiar with JSON Schema, OpenAPI, and Avro vocabularies where `required` is the conventional keyword. Parsers MUST normalize `required` to `not null` in the AST. Tools that emit xDBML SHOULD use `not null` as the canonical form. Both forms are semantically identical: the field must be present and must have a non-null value. ### 8.9 Cardinality vs. type-level nullability `null` / `not null` settings express **cardinality** -- whether a field is required to be present and whether `null` is acceptable. The token `null` inside a `union [ ... ]` list (§20.1) expresses **type membership** -- that the null literal is one of the permitted runtime types. These are distinct concepts. ## 9. Index View in playground ```xdbml Table bookings { id int country varchar booking_date date created_at timestamp indexes { (id, country) [pk] created_at [name: 'created_at_index'] booking_date (country, booking_date) [unique] booking_date [type: hash] (`id*2`) } } ``` ### 9.1 Index forms | Form | Example | |---|---| | Single field | `created_at [name: 'idx_created']` | | Composite | `(col1, col2)` | | Expression | `` (`expr`) `` | ### 9.2 Index settings | Setting | Values | |---|---| | `type` | `btree`, `hash`, engine-specific | | `name` | string | | `unique` | flag | | `pk` | flag | | `note` | string | ### 9.3 Indexes on nested fields Index paths use the path syntax of §19: ```xdbml indexes { shipping.address.country // object navigation line_items.sku // implicit array iteration line_items.[*].sku // explicit array iteration addresses.[0].city // tuple positional flags.["dark_mode"] // map literal key method.card.brand // polymorphic alternative } ``` For homogeneous arrays, the implicit iteration form is equivalent to the explicit form. For heterogeneous tuples and `Ref` source paths, explicit syntax is required. ## 10. Checks -- entity-level constraints A `checks { }` block declares multi-column constraint expressions on an entity. It is the entity-level counterpart to the field-level `check:` setting (§8.8): where the field-level form expresses a constraint on a single field, the entity-level block expresses constraints that span two or more fields. The block is a peer of the `indexes { }` block (§9) -- both are entity-level declarations that constrain or annotate the entity's data shape. ### 10.1 Declaration View in playground ```xdbml Entity users { id int [pk] wealth decimal(15,2) debt decimal(15,2) checks { `debt + wealth >= 0` [name: 'chk_positive_net_worth'] `wealth >= 0` [name: 'chk_non_negative_wealth'] `debt >= 0` } } ``` Each line inside `checks { }` is a **check expression** -- a backtick-wrapped boolean expression in the target engine's expression language. xDBML treats expressions as opaque strings; the parser does not interpret or validate them. Generators emit them verbatim as the target engine's `CHECK` constraint or equivalent. ### 10.2 Check settings Each check expression accepts optional bracket settings: | Setting | Value | Purpose | |---|---|---| | `name` | string | Name of the constraint as it appears in the generated DDL (`CONSTRAINT chk_name CHECK (...)`). If omitted, the generator chooses a name. | | `note` | string | Free-text annotation. Carries through to generators that support constraint comments. | When `name:` is omitted, generators MAY produce a deterministic synthetic name (e.g., `_chk_`) for reproducible DDL. ### 10.3 Check expressions and the target engine Check expressions are not portable across targets in general. An expression valid in PostgreSQL may not parse in MongoDB, and a MongoDB document-level validator uses an entirely different expression syntax. xDBML preserves the expression as written. Round-trip behavior: - **Single-target schemas**: the expression is emitted verbatim to the target. - **Polyglot schemas**: the expression is emitted to targets where it is syntactically valid; for other targets, generators MAY emit a comment placeholder or omit the constraint with a warning. For maximum portability, authors are encouraged to use expressions that work across the target families they care about, or to use field-level `check:` settings (§8.8) which are more often portable. ### 10.4 Relationship to field-level `check:` The field-level `check:` setting (§8.8) is preferred for single-column constraints because it sits next to the field it constrains: View in playground ```xdbml Entity products { price decimal(10,2) [check: `price > 0`] } ``` The entity-level `checks { }` block is for constraints that **cannot be expressed at field level** -- typically because they reference two or more fields: View in playground ```xdbml Entity reservations { start_date date end_date date checks { `start_date <= end_date` [name: 'chk_valid_date_range'] `end_date - start_date <= 30` [name: 'chk_max_30_days'] } } ``` Both forms generate `CHECK` constraints in SQL targets. Field-level constraints are scoped to a single column in the generated DDL; entity-level constraints reference multiple columns by name. ### 10.5 Relationship to validation constraints (§24) xDBML provides two complementary constraint mechanisms: - **Validation constraints** (§24) -- declarative shape constraints (`pattern`, `minLength`, `maxLength`, range, enum membership). Portable across targets. Lower naturally to JSON Schema, OpenAPI, Avro validators, and SQL `CHECK` for the constraints SQL can express. - **Check expressions** (this section, plus §8.8 for field-level) -- arbitrary boolean expressions in the target engine's language. Maximum expressiveness, but coupled to the target. Use validation constraints when the shape constraint can be expressed declaratively (`email matches pattern X`, `age between 0 and 150`). Use check expressions when you need cross-column logic or engine-specific predicates (`CASE WHEN status = 'shipped' THEN tracking_number IS NOT NULL END`). ## 11. Relationship and cardinality ### 11.1 Cardinality operators | Operator | Maximum-cardinality | |---|---| | `<` | one-to-many | | `>` | many-to-one | | `-` | one-to-one | | `<>` | many-to-many | ### 11.2 Declaration forms ```xdbml // Long form Ref name_optional { schema1.entity1.field1 < schema2.entity2.field2 } // Short form Ref name_optional: schema1.entity1.field1 < schema2.entity2.field2 // Inline form Entity schema2.entity2 { id integer field2 integer [ref: > schema1.entity1.field1] } ``` Relationships are declared at project level. They may span containers. ### 11.3 Cross-container references ```xdbml Ref: sales.orders.customer_id > core.customers.id ``` Dotted notation: `container_name.entity_name.field_path`. For relationships into nested fields, the path continues into the nested structure. ### 11.4 Composite foreign keys ```xdbml Ref: merchant_periods.(merchant_id, country_code) > merchants.(id, country_code) ``` ### 11.5 Relationships into nested fields ```xdbml Ref: orders.shipping.address.country > countries.iso_code Ref: customers.addresses.[0].zip > postal_codes.code ``` ### 11.6 Explicit `.[*]` required for array crossings When a `Ref` source path crosses a homogeneous array, the explicit `.[*]` form is required: ```xdbml Ref: orders.line_items.[*].sku > products.sku // valid Ref: orders.line_items.sku > products.sku // rejected ``` ### 11.7 Explicit cardinality The operators express maximum cardinality with optionality inferred from FK nullability. For precise cardinality on both ends, the `source:` and `target:` settings carry UML-style `'min..max'` strings: ```xdbml // Compact form: operators imply defaults Ref: orders.customer_id > customers.id // Explicit form: precise cardinality on both ends Ref: orders.customer_id > customers.id [source: '1..*', target: '1..1'] // Optional parent (each Pet may have no Owner) Ref: pets.owner_id > people.id [source: '0..*', target: '0..1'] ``` Recognized cardinality strings: | String | Meaning | |---|---| | `'1..1'` | Exactly one | | `'0..1'` | Zero or one | | `'1..*'` | One or more | | `'0..*'` | Zero or more | | `'N..M'` | N to M (non-negative integers; `*` allowed for M) | Alternative four-key form: ```xdbml Ref: pets.owner_id > people.id [min_source: 0, max_source: '*', min_target: 0, max_target: 1] ``` Both forms produce the same AST. The compact-string form is canonical. ### 11.8 Default cardinality inference | Operator | FK nullable? | Source | Target | |---|---|---|---| | `>` | no | `'1..*'` | `'1..1'` | | `>` | yes | `'0..*'` | `'1..1'` | | `<` | no | `'1..1'` | `'1..*'` | | `<` | yes | `'1..1'` | `'0..*'` | | `-` | no | `'1..1'` | `'1..1'` | | `-` | yes | `'1..1'` | `'0..1'` | | `<>` | (n/a) | `'0..*'` | `'0..*'` | ### 11.9 Relationship settings ```xdbml Ref: products.merchant_id > merchants.id [delete: cascade, update: no action] Ref: legacy.user_id > users.id [inactive, color: '#999999', note: 'historical FK; superseded by audit_log.user_ref'] ``` | Setting | Values | |---|---| | `delete` / `update` | `cascade`, `restrict`, `set null`, `set default`, `no action` | | `color` | `#rgb` or `#rrggbb` | | `source` / `target` | Cardinality strings (§11.7) | | `min_source` / `max_source` / `min_target` / `max_target` | Cardinality components | | `inactive` | Flag (no value). Marks the relationship as inactive; visualization tools render it as a dotted line and may exclude it from cardinality computation. The relationship still exists structurally; this is a visualization and documentation hint, not a removal. | | `note` | Free-text annotation. | The `inactive` flag is useful for documenting historical or deprecated foreign keys that remain in the schema but are no longer enforced or relied upon by application code. It has no effect on DDL generation in targets that don't distinguish active from inactive relationships. Settings are not supported on inline `ref:` declarations. ## 12. Edge -- relationships with properties Labeled Property Graph (LPG) databases -- Neo4j, Memgraph, JanusGraph, Amazon Neptune (LPG mode) -- and RDF-star treat relationships as first-class entities that carry properties. ### 12.1 Declaration View in playground ```xdbml Edge KNOWS [source: Person, target: Person] { since date [not null] intimacy int [minimum: 0, maximum: 10] context varchar } Edge RATED [source: User, target: Movie] { rating int [not null, minimum: 1, maximum: 5] rated_at timestamp [granularity: second, not null] review varchar } ``` An Edge declaration carries a name, `source` and `target` settings pointing to connected entities, and a body of fields. ### 12.2 Cardinality on edges Edges use `source_cardinality` and `target_cardinality` settings (distinct from `source:` and `target:` which name the connected entities): View in playground ```xdbml Edge OWNS [source: Person, target: Pet, source_cardinality: '0..*', target_cardinality: '0..1'] { acquired_date date purchase_price decimal(10,2) } ``` ### 12.3 Directionality View in playground ```xdbml Edge FRIENDS_WITH [source: Person, target: Person, undirected: true] { since date } ``` When `undirected: true`, source and target are interchangeable. ### 12.4 Multiple edges between the same entities View in playground ```xdbml Edge LIKES [source: User, target: Post] { liked_at timestamp } Edge SHARED [source: User, target: Post] { shared_at timestamp audience varchar } Edge BLOCKED [source: User, target: Post] { blocked_at timestamp reason varchar } ``` ### 12.5 Edges with named types and partials View in playground ```xdbml Type RelationshipMetadata { created_at timestamp [granularity: second] created_by varchar source_system varchar } Edge KNOWS [source: Person, target: Person] { ~RelationshipMetadata since date intimacy int } ``` ### 12.6 Indexes on edges View in playground ```xdbml Edge RATED [source: User, target: Movie] { rating int [not null] rated_at timestamp indexes { rated_at (rated_at, rating) } } ``` ### 12.7 Round-trip across targets | Target | Edge representation | |---|---| | Neo4j / LPG | Native relationship type with properties | | RDF-star | Quoted triple with annotation properties | | RDBMS | Junction table with FKs plus property columns | | MongoDB | Array of subdocuments on source-side entity | | Avro | Edge events with `source_id`, `target_id`, property fields | | RDF (plain) | Reified statements (lossy) | ## 13. View -- derived shapes with source queries ### 13.1 Declaration View in playground ```xdbml View active_customers [materialized: false] { source_query: ''' SELECT id, email, name, created_at FROM customers WHERE deleted_at IS NULL ''' id int [pk] email varchar name varchar created_at timestamp [granularity: second] } View monthly_revenue [materialized: true, refresh_schedule: 'daily', source_database: 'Oracle'] { source_query: ''' SELECT TRUNC(placed_at, 'MM') AS month, SUM(total) AS revenue FROM orders GROUP BY TRUNC(placed_at, 'MM') ''' month date [pk] revenue decimal(15,2) [granularity: month] } ``` ### 13.2 Materialized vs. virtual - **`materialized: false`** (default) -- virtual view; query executes on access. - **`materialized: true`** -- materialized view; query precomputed and stored. | Setting | Meaning | |---|---| | `refresh_schedule` | Refresh schedule (`hourly`, `daily`, `weekly`, engine-specific) | | `refresh_on` | Source entities triggering refresh | | `storage_options` | Engine-specific storage hints | ### 13.3 Source query as opaque metadata xDBML does not parse the `source_query` string -- it's target-specific. The query is captured verbatim and passed to generators. xDBML can express views for non-SQL query languages (Cypher, MongoDB aggregation pipelines, KSQL) by storing the engine-native query and tagging with `source_database`. ### 13.4 Views in containers ```xdbml Container analytics [type: schema] { Entity orders { ... } View revenue_by_month [materialized: true] { source_query: '...' month date revenue decimal(15,2) } } ``` Cross-container references treat views and entities identically. ### 13.5 View settings | Setting | Meaning | |---|---| | `source_query` | Defining query (opaque string) | | `materialized` | `true` or `false` (default `false`) | | `refresh_schedule` | Refresh schedule | | `refresh_on` | Source entities | | `source_database` | Database flavor of source query | | `note` | Documentation | | `synonyms`, `business_term` | AI-readiness (§23) | ### 13.6 Round-trip across targets | Target | View representation | |---|---| | Oracle / PostgreSQL / SQL Server / Snowflake | `CREATE VIEW` or `CREATE MATERIALIZED VIEW` | | MongoDB | `db.createView()` for virtual; `$merge` for materialized | | BigQuery | `CREATE VIEW` / `CREATE MATERIALIZED VIEW` | | Databricks | `CREATE OR REPLACE VIEW` / `CREATE MATERIALIZED VIEW` | | Elasticsearch | Continuous transforms (materialized) | ## 14. Named type A `Type` declaration defines a named, reusable type expression. ### 14.1 Declaration ```xdbml Type Address { street varchar [not null] city varchar [not null] zip varchar country varchar [default: 'US'] } Type MonetaryAmount { amount Decimal128 [not null] currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3] } ``` ### 14.2 Reference ```xdbml Table orders { shipping_address Address billing_address Address total MonetaryAmount [not null] } Entity customers { addresses array [address Address] } ``` Built-in type keywords always take precedence; named types cannot shadow them. ### 14.3 Composition and recursion ```xdbml Type ContactInfo { email varchar phone varchar address Address } Type TreeNode { value int children array [child TreeNode] // self-reference } ``` ### 14.4 Type settings ```xdbml Type Address [note: 'Used across customer and order entities'] { ... } ``` Recognized: `note`, `synonyms`, `business_term`. ### 14.5 Scope Project-scoped. Name collisions across `Type`, `Entity`, `View`, `Edge`, `Enum`, `TablePartial` are rejected at parse time. ### 14.6 Relationship to TablePartial | `Type` when... | `TablePartial` when... | |---|---| | Reusing a value shape | Reusing a set of entity fields | | Reference from nested structures | Splice fields at entity top level | | Cross-references into type's fields | Fields appear as direct columns | | One canonical definition for many | Each entity gets own copy of fields | ### 14.7 Scalar named types (new in v0.2) The Type declarations in §14.1 through §14.6 describe object-shaped reusable structures -- a Type body contains one or more named inner fields. v0.2 extends Named Types to also support **scalar shapes** -- a Type whose entire content is a single field's worth of declaration. **Syntax**: ```xdbml Type Email varchar [pattern: '^[^@]+@[^@]+$', tags: ['pii'], note: 'Standard enterprise email'] Type CountryCode varchar [pattern: '^[A-Z]{2}$', minLength: 2, maxLength: 2] Type Percentage decimal(5,2) [minimum: 0, maximum: 100] ``` The Type's base is a built-in type (or another Named Type), and the bracketed settings carry the full field-level surface (validation constraints, notes, AI-readiness metadata, custom properties). **Usage**: scalar named types are used identically to object-shaped Types -- as the type of a field declaration: View in playground ```xdbml Type Email varchar [pattern: '^[^@]+@[^@]+$', tags: ['pii']] Entity users { id int [pk] email Email } Entity admins { id int [pk] email Email } Entity contacts { id int [pk] email Email } ``` The three entities each get an `email` field of type `varchar` with the pattern validation and tags attached. **Importability**: scalar Named Types are importable via the `type` element type in the module system (§26): ```xdbml reuse { type Email } from './enterprise-types' Entity employees { work_email Email } ``` **Relationship to field-level imports**: scalar Named Types and field-level imports (§26.9) both follow the declaration-vs-placement pattern: the construct declares a reusable shape; placement happens by using the shape's name as a field's type. The difference is in where the reusable shape originates: scalar Named Types are explicitly declared as `Type`; field-level imports clone a shape from an existing entity field. Both mechanisms are valid; authors choose based on context. ## 15. Enum ```xdbml enum job_status { created [note: 'Waiting to be processed'] running done failure } enum grade { "A+" "A" "A-" } ``` Values with non-identifier characters must be quoted. Enums may be container-qualified. ## 16. TablePartial and TableGroup ### 16.1 TablePartial View in playground ```xdbml TablePartial base_template { id int [pk, not null] created_at timestamp [default: `SYSTIMESTAMP`] updated_at timestamp [default: `SYSTIMESTAMP`] } Entity users { ~base_template name varchar } ``` Conflict resolution: local entity definitions override partials; among partials, last-injected wins. TablePartials may also be injected into Edge declarations. ### 16.2 TableGroup ```xdbml TableGroup e_commerce [color: '#3498DB', note: 'Commerce entities'] { orders products inventory } ``` **TableGroup settings**: | Setting | Values | |---|---| | `color` | `#rgb` or `#rrggbb`. Visualization color for the group's frame in diagram renderers. | | `note` | Free-text annotation. | The `color` setting controls how renderers visually frame the group's member entities. Implementations that don't support group colors ignore the setting. ### 16.3 Naming `TablePartial` and `TableGroup` retain their DBML names without `Entity*` synonyms. ## 17. Diagram View ```xdbml DiagramView full_view { Tables { * } Notes { * } TableGroups { * } Containers { * } Views { * } Edges { * } } DiagramView sales_view { Tables { users; orders; products } } ``` Categories: `Tables`, `Notes`, `TableGroups`, `Containers`, `Views`, `Edges`. ## 18. Note Inline notes are settings or blocks on Project, Container, Entity, Field, Index, Edge, View, Type, TableGroup. Standalone notes act as canvas annotations: ```xdbml Note design_decision { ''' We chose to denormalize line items into the orders entity rather than maintaining a separate line_items entity because read patterns dominate writes by 100:1. ''' } ``` ## 19. Path syntax ### 19.1 Segment forms | Segment | Meaning | |---|---| | `.field_name` | Navigate into named field | | `.[N]` | Specific positional array element (zero-indexed) | | `.[*]` | All array elements (iteration) | | `.["literal_key"]` | Map literal key | | `.[*]` (map context) | All map values | | `."quoted name"` | Non-identifier name | | `.AlternativeName` | Polymorphic alternative selector | ### 19.2 Path examples ```xdbml customers.addresses.[0].city // tuple positional orders.line_items.[*].sku // every sku across line items events.[42].payload.user_id // 43rd event's payload flags.["dark_mode"] // specific map key data."user.id" // quoted field name payments.method.card.brand // polymorphic alternative sales.orders.line_items.[*].sku // cross-container with array crossing ``` ### 19.3 Implicit array iteration in indexes Allowed only in index paths. `Ref` paths and tuple positions require explicit syntax. ### 19.4 JSONPath alias acceptance JSONPath-style brackets (no leading dot) are accepted at parse time and normalized to the dot-prefixed canonical form: ```xdbml addresses[0].city // parsed; normalized to addresses.[0].city ``` ### 19.5 Formatter behavior Explicit `.[*]` whenever path depth exceeds two segments; implicit auto-iteration otherwise. Tuple positions and quoted segments always emitted explicitly. ## 20. Polymorphism ### 20.1 Scalar type union -- `union` View in playground ```xdbml Entity records { score union [int, decimal, null] legacy_id union [string, int] [not null] tags array [union [string, int]] } ``` Element ordering is significant. The runtime type acts as the discriminator. ### 20.2 Structural polymorphism -- `oneOf`, `anyOf`, `allOf` View in playground ```xdbml Entity payments { amount decimal(10,2) [not null] method oneOf { card object { last4 varchar(4) brand varchar } bank object { iban varchar } wallet object { provider varchar account varchar } } [discriminator: method_kind] } ``` ### 20.3 Keywords | Keyword | Semantics | |---|---| | `oneOf` | Exactly one of N alternatives | | `anyOf` | One or more alternatives may apply | | `allOf` | Value must satisfy all listed schemas | ### 20.4 Paths through polymorphism -- strict alternative selector ```xdbml indexes { method.card.brand // valid method.brand // rejected -- selector required even when unambiguous } ``` Paths through `union` (scalar) are rejected. ### 20.5 Polymorphism inside arrays ```xdbml events array [ event oneOf { user_event object { type varchar, user_id objectId, action varchar } item_event object { type varchar, item_id objectId, qty int } } [discriminator: type] ] ``` ## 21. JSON as a first-class data type ### 21.1 Opaque JSON View in playground ```xdbml Entity api_logs { request_body json [note: 'shape varies by endpoint'] response_body jsonb [note: 'large; may exceed 1 MB'] raw_data variant } ``` ### 21.2 Schema-known JSON View in playground ```xdbml Entity orders { id int [pk] payload json { shipping_address object { street varchar [not null] city varchar [not null] } items array [ item object { sku varchar [not null] quantity int [not null] } ] } } ``` ### 21.3 `json` vs. `object` `object` describes an abstract structural type -- encoding chosen by target. `json`, `jsonb`, `variant` describe a physical storage choice (JSON-typed column). ### 21.4 Keyword semantics | Keyword | Target intent | |---|---| | `json` | Generic JSON storage (Oracle 21c+, MySQL, MS SQL Server 2025+, BigQuery) | | `jsonb` | PostgreSQL-specific binary JSON, indexable | | `variant` | Snowflake-specific semi-structured with auto-typing | ## 22. BSON scalar types MongoDB stores documents in BSON, a JSON superset with additional scalar types. xDBML recognizes BSON type names as ordinary scalar type identifiers -- no special grammar treatment but preserved through the AST and emitted in MongoDB collection validators. | BSON type | Typical use | |---|---| | `string` | UTF-8 string (BSON element 0x02) | | `int32` | 32-bit signed integer (BSON element 0x10) | | `int64` | 64-bit signed integer (BSON element 0x12); `long` is a recognized alias | | `double` | 64-bit IEEE 754 floating point (BSON element 0x01) | | `bool` | Boolean (BSON element 0x08) | | `null` | Null type (BSON element 0x0A) | | `objectId` | 12-byte MongoDB primary key | | `Decimal128` | High-precision decimal (IEEE 754-2008) | | `BinData` | Binary data with subtype | | `Timestamp` | MongoDB-internal timestamp | | `Date` | UTC datetime | | `Regex` | Compiled regular expression | | `JavaScript` | JavaScript code | | `Symbol` | Deprecated; preserved for legacy reads | | `MinKey` / `MaxKey` | Sentinel comparison values | | `DBPointer` | Deprecated; preserved for legacy reads | `int64` is canonical; `long` is accepted at parse time and normalized to `int64` in the AST. MongoDB shell output uses `long`, while the BSON spec uses `int64` -- both forms are equivalent. ### 22.1 Recommended pattern: BSON-native types in MongoDB Containers When a field lives inside a `Container [type: database]` targeting MongoDB, use BSON-native type names (`string`, `int32`, `int64`, `Decimal128`, `Date`, `objectId`, etc.) rather than SQL conventions (`varchar`, `int`, `decimal(p,s)`, `timestamp`). The rationale: - BSON types are exactly what MongoDB's `$jsonSchema` validators emit -- no translation layer interpreting precision/scale metadata that has no BSON meaning. - `Decimal128` carries its IEEE 754-2008 precision intrinsically and does not accept `(p,s)` parameters. Writing `decimal(19,4)` inside a MongoDB Collection misleadingly suggests precision metadata that BSON cannot represent. - `int` is ambiguous between BSON `int32` and `int64`. Explicit `int32` or `int64` removes the ambiguity that cross-engine `Ref` validation depends on. Generators accept SQL-named scalars inside MongoDB Containers (§1.2 design principle 4 -- type-name pass-through) and convert them on lowering. The recommendation is for clarity and round-trip fidelity, not parser correctness. ### 22.2 Cross-engine references A `Ref` may cross between Containers whose engines use different type-name vocabularies. The reference is valid if the source and target field types map to a compatible value space per the round-trip table below. | Relational type | BSON equivalent | |---|---| | `int`, `integer` | `int32` (when target stores small IDs) or `int64` (when target stores large IDs) | | `bigint` | `int64` | | `varchar`, `varchar(n)`, `text` | `string` (with optional `maxLength`) | | `decimal(p,s)`, `numeric(p,s)` | `Decimal128` | | `boolean`, `bool` | `bool` | | `timestamp`, `timestamptz`, `datetime` | `Date` | A `Ref` from `int32` (BSON) to `int` (relational) is valid by default. A `Ref` from `int64` to a relational `int` issues a warning if the relational type is not `bigint` or wide enough to hold the BSON values. Reusable `Type` declarations (§14) used in both relational and MongoDB Containers should use BSON-native types in the Type body; relational generators lower them per the round-trip table: ```xdbml Type MonetaryAmount { amount Decimal128 [not null] currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3] } ``` This Type used in an Oracle `Entity` lowers `amount` to `NUMBER(38,4)` and `currency` to `VARCHAR2(3) CHECK (REGEXP_LIKE(currency, '^[A-Z]{3}$'))`. The same Type used in a MongoDB `Collection` emits `Decimal128` and `string` with `maxLength: 3` and a `pattern` in `$jsonSchema`. ### 22.3 Round-trip to non-MongoDB targets View in playground ```xdbml Container app_data [type: database] { Collection users { _id objectId [pk] email string [unique, not null, maxLength: 255] balance Decimal128 last_login Date avatar BinData } } ``` Round-trip to non-MongoDB targets: | BSON type | Oracle | PostgreSQL | Avro | JSON Schema | |---|---|---|---|---| | `string` | `varchar2(n)` or `clob` | `varchar(n)` or `text` | `string` | `string` | | `int32` | `number(10)` | `integer` | `int` | `integer` | | `int64` | `number(19)` | `bigint` | `long` | `integer` | | `double` | `binary_double` | `double precision` | `double` | `number` | | `bool` | `number(1) check (col in (0,1))` | `boolean` | `boolean` | `boolean` | | `null` | `null` | `null` | `null` | `null` | | `objectId` | `varchar2(24)` w/ check | `varchar(24)` or `bytea` | `string` w/ logical type | `string` w/ `pattern` | | `Decimal128` | `number(38,4)` | `numeric(34,0)` | `decimal` high precision | `string` | | `BinData` | `blob` | `bytea` | `bytes` | `string` w/ `format: byte` | | `Timestamp` | `timestamp with time zone` | `timestamptz` | `long` w/ `logicalType: timestamp-millis` | `string` w/ `format: date-time` | | `Date` | `timestamp with time zone` | `timestamptz` | as `Timestamp` | `string` w/ `format: date-time` | ## 23. AI-readiness and semantic-interop settings Four optional settings expose declarative metadata for AI consumers, semantic-layer tools (OSI, dbt MetricFlow, Cube), and governance platforms (Atlan, Collibra, DataHub, Alation). A fifth mechanism (the `x_` prefix) supports organization-specific custom properties without grammar changes. ### 23.1 `synonyms:` -- alternative names View in playground ```xdbml Entity customers { display_name varchar [synonyms: ['name', 'full name', 'customer name']] mrr_amount decimal(10,2) [synonyms: ['monthly revenue', 'recurring revenue']] } ``` ### 23.2 `business_term:` -- controlled-vocabulary identifier View in playground ```xdbml Entity orders { total_amount decimal(10,2) [business_term: 'Gross Order Value'] net_amount decimal(10,2) [business_term: 'https://glossary.acme.com/terms/net_order_value'] } ``` ### 23.3 `granularity:` -- temporal granularity View in playground ```xdbml Entity events { occurred_at timestamp [granularity: second] recorded_at timestamp [granularity: microsecond] } ``` Values: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`, `millisecond`, `microsecond`, `nanosecond`. ### 23.4 `tags:` -- free-form classification labels A list of string labels for classification, sensitivity, lifecycle, domain, audience, governance markers, or any other discoverable property. Tags are the dominant pattern in modern data catalogs (Atlan, Collibra, DataHub, Alation, Hackolade Studio). ```xdbml Entity customers { email varchar [tags: ['pii', 'contact', 'gdpr-subject']] ssn varchar [tags: ['pii', 'sensitive', 'gdpr-special', 'restricted']] internal_id int [tags: ['internal-only']] } Entity products [tags: ['catalog', 'public-api']] { ... } Type MonetaryAmount [tags: ['finance', 'reusable']] { ... } ``` Tags compose freely with `synonyms`, `business_term`, and other settings. They differ from synonyms (which name the same thing in multiple ways) and from business terms (which reference an external glossary): tags are *classification labels*, freeform and multi-valued. ### 23.5 Custom properties -- the `x_` prefix Settings beyond the recognized vocabulary are accepted as open-vocabulary key-value pairs and passed through to the AST. To distinguish *intentional* custom properties from possible typos in reserved keywords, xDBML reserves the `x_` prefix for explicit custom extensions. ```xdbml Entity customers { email varchar [ pattern: '^[^@]+@[^@]+$', x_sensitivity: 'pii-low', x_acme_data_steward: 'jane@acme.com', x_retention_days: 2555 ] } Container core [type: schema, x_provisioning_template: 'standard-tier-3'] { ... } ``` Conventions: - The prefix is `x_` (underscore-prefix). Dash-prefix is not used because xDBML identifiers do not allow dashes. - The prefix is **optional, not required**. The parser also accepts non-prefixed unknown properties. - The parser does not validate names or values of `x_*` properties -- they pass through to the AST. - Tooling can distinguish `x_*` properties as deliberately custom (versus possibly-misspelled reserved keywords); linters may warn on non-`x_`-prefixed unknown properties. A registry of de-facto-standardized custom properties is maintained at xdbml.org/registry. Custom properties that reach widespread adoption may be promoted to first-class settings in a future MINOR version. ### 23.6 Scope The four first-class settings and the `x_` mechanism carry *declarative* metadata about stored data -- names, descriptions, classifications, glossary references, AI-readiness labels. xDBML does not carry *computational* meaning (measures, metrics, aggregations, derived fields, hierarchies, default-aggregation hints) -- that belongs in adjacent semantic-layer formats -- nor *contractual* obligations (SLAs, ownership, pricing, support hours) -- that belongs in ODCS. ## 24. Validation constraints xDBML accepts the full JSON Schema validation vocabulary as field settings: | Setting | Applies to | Meaning | |---|---|---| | `pattern` | string | Regular expression | | `format` | string | `email`, `uri`, `uuid`, `date-time`, etc. | | `minLength` / `maxLength` | string | Length bounds | | `minimum` / `maximum` | number | Value bounds | | `exclusiveMinimum` / `exclusiveMaximum` | number | Exclusive bounds | | `multipleOf` | number | Multiple of N | | `enum` | any | Fixed list of values | | `minItems` / `maxItems` | array | Element count bounds | | `uniqueItems` | array | All elements distinct | | `minProperties` / `maxProperties` | object | Field count bounds | | `check` | any | Free-form expression | Example: View in playground ```xdbml Entity users { email varchar [pattern: '^[^@]+@[^@]+$', maxLength: 255] age int [minimum: 0, maximum: 150] username varchar [pattern: '^[a-z0-9_]{3,32}$'] tags array [varchar] [minItems: 1, maxItems: 10, uniqueItems: true] } ``` ### 24.1 Generator behavior | Target | Constraint output | |---|---| | Oracle / PostgreSQL / SQL Server / Snowflake | `CHECK` for `minimum`, `maximum`, `pattern`; others as comments | | MongoDB validator | Native `$jsonSchema` with all validations | | JSON Schema | Direct round-trip | | Avro | `pattern` → name validation; bounds → logical-type metadata | | Cassandra | Validations without equivalent dropped with warning | ## 25. Records -- sample data A `records { }` block declares sample data inline in the schema. Records serve documentation, testing, and example-data purposes -- they describe what real rows look like alongside the entity that defines their shape. Records can be declared in two forms: **inside an entity body** (implicit column list) or **at the top level** referencing an entity by name (explicit column list). ### 25.1 Records inside an entity When declared inside an entity body, the `records { }` block uses an implicit column list -- values are assigned to columns in the order the columns are declared on the entity: View in playground ```xdbml Entity users { id int [pk] name varchar email varchar records { 1, 'Alice', 'alice@example.com' 2, 'Bob', 'bob@example.com' 3, 'Carol', 'carol@example.com' } } ``` Each row in the block is a comma-separated list of values matching the column order. When an entity uses `~partial_name` injection (§16.1), the implicit column order follows the same precedence: locally declared fields first, then partial-injected fields in injection order. ### 25.2 Records outside an entity (top-level form) When declared outside an entity body, the `records` declaration names the entity and explicitly lists the columns being populated: ```xdbml records users (id, name, email) { 1, 'Alice', 'alice@example.com' 2, 'Bob', 'bob@example.com' } ``` Columns not listed default to `null` (or the column's declared default). The order of values in each row matches the explicit column list, not the entity's field declaration order. The top-level form is useful when: - Sample data is maintained separately from the entity definition (e.g., one file declares schemas, another declares fixtures) - Only a subset of an entity's columns has meaningful sample values - The same entity has multiple disjoint record sets that benefit from separate declarations ### 25.3 Cross-container references To declare records for an entity inside a Container, use the dotted path: ```xdbml records core.users (id, name, email) { 1, 'Alice', 'alice@example.com' } ``` ### 25.4 Value forms Each value in a record row is one of: | Form | Example | Notes | |---|---|---| | String | `'Alice'`, `'O\'Brien'` | Single-quoted. Escape inner single quotes with `\'`. | | Multi-line string | `'''multi\nline'''` | Triple single-quoted. Useful for description text. | | Number | `42`, `3.14`, `-1.5e-3` | Integer or decimal; standard scientific notation. | | Boolean | `true`, `false` | Lowercase. | | Null | `null` | The SQL/JSON null value. | | Date / Time | `'2026-06-10'`, `'10:30:00'`, `'2026-06-10T14:30:00Z'` | ISO 8601 inside single quotes. Interpreted per the target column's declared type (date, time, timestamp with or without timezone). | | Enum value | `Status.active` | When the column's type is a declared Enum, values may be written as `.` for clarity. | | Backtick expression | `` `gen_random_uuid()` `` | Backtick-wrapped. Treated as opaque target-engine expression, disabling static type checking; lowered verbatim by generators. Useful for database-side defaults or computed values. | Each value is type-checked against the target column's declared type at generation time. Implementations SHOULD report a clear error when a value cannot be coerced (e.g., `'not_a_number'` for an int column). ### 25.5 Generator behavior Records are sample data, not authoritative schema content. Generators MAY emit records as: - SQL `INSERT` statements (PostgreSQL, Oracle, MySQL, etc.) - MongoDB `insertMany` calls or BSON document fixtures - Avro `IndexedRecord` instances for test corpora - JSON document fixtures - Markdown tables in generated documentation Generators MAY also choose to skip records entirely when producing DDL-only output. The records' presence in the xDBML document is non-prescriptive about whether they appear in any generated artifact. ## 26. Module system A single xDBML file can describe a complete schema, and in many cases that is the appropriate structure. For larger architectures, however, splitting the schema across multiple files becomes valuable -- separating polyglot layers by engine, decomposing by domain, or sharing canonical declarations across data products. The module system lets one xDBML document import declarations from another. The design is intentionally close to DBML's module system documented at https://dbml.dbdiagram.io/syntax/module-system, with strict-superset semantics: every valid DBML document parses correctly under xDBML and DBML constructs mean the same thing in xDBML as in DBML. xDBML extends DBML's design in four ways, all of which are available only when the file declares `xdbml: 0.3` (or any 0.x): - **Extended element-type vocabulary** covering xDBML-specific constructs (Container, Entity, Collection, Record, Type, Edge, View, DiagramView) in addition to DBML's table, enum, tablepartial, note, schema, tablegroup. - **Field-level imports**, allowing an author to clone a single field declaration from another file as a reusable named shape. - **Container-scoped imports**, allowing imports to live inside the Container body where they belong. - **Clone blocks**, optionally embedding the imported content directly in the importing file so that the file remains parseable even if the referenced file is unavailable. Through v0.2 these imports were relative-path only. v0.3 adds remote module sources over HTTPS (§26.14). ### 26.1 Files and projects Every xDBML file is a complete document and may declare a `Project { ... }` block describing its standalone view of a schema. When file B imports from file A, what flows from A to B is the **content** of A's declarations -- its Containers, Entities, Types, Edges, Views, TablePartials, Enums, Notes, TableGroups, DiagramViews. What does NOT flow is A's `Project { ... }` declaration. Project declarations are intrinsic to each file; they are never importable. The same file can be opened standalone (its own Project is the project) or imported by another file (only its non-Project content contributes to the importer's project). A file MAY omit the Project declaration if its sole purpose is to be imported. ### 26.2 Import in full vs selective import Two import modes are supported. **Import in full**: ```xdbml reuse * from './path-to-file' ``` Brings every top-level declaration from the source file into the importing file's scope (except for the source's Project block). Cannot be combined with aliasing -- if you need to rename, use the selective form. **Selective import**: ```xdbml reuse { type name type name as alias ... } from './path-to-file' ``` Brings only the listed declarations. Each item may carry an optional `as alias` to rename the import in the current file's namespace. Multi-line and single-line comma-separated forms are both supported: ```xdbml reuse { entity products, entity categories, type Email } from './library' ``` A single directive may import multiple declarations from the same source file. When combined with a clone block (§26.6), the clone block contains all the imported declarations matched by name and element type. ### 26.3 Importable element types The element type names in the selective form are case-insensitive (`Table`, `TABLE`, `table` all work). | Keyword | What is imported | |---|---| | `table` | Entity declared with the DBML `Table` keyword | | `entity` | Entity declared with `Entity` (synonym for `table`) | | `collection` | Entity declared with `Collection` (synonym for `table`) | | `record` | Entity declared with `Record` (synonym for `table`) | | `enum` | Enum | | `tablepartial` | TablePartial | | `note` | Sticky Note | | `schema` | Container declared with `type: schema`, including all entities inside it | | `container` | Container of any type, including all entities inside it | | `tablegroup` | TableGroup, including its membership | | `type` | Named Type (object-shaped or scalar -- §14) | | `edge` | Edge | | `view` | View | | `diagramview` | DiagramView | | `field` | Single field declaration cloned from an entity as a reusable named shape (see §26.8) | The `project` element type is NOT importable. **Source paths mirror the source file's actual hierarchy**: | Source structure | Import path | |---|---| | Entity at top level (no container) | `EntityName` | | Entity inside a Container | `ContainerName.EntityName` | | Field on a top-level entity | `EntityName.FieldName` | | Field on an entity inside a container | `ContainerName.EntityName.FieldName` | | Nested field (object inside an entity) | `ContainerName.EntityName.OuterField.InnerField` | | Top-level Type, Enum, TablePartial, etc. | `Name` | ### 26.4 The `use` and `reuse` directives Both `use` and `reuse` bring declarations from another file into the current file. They differ in one respect: **transitivity to further importers**. **`reuse`** is the simpler default and is what most authors want. Declarations brought in via `reuse` are visible to files that further import the current file: ```xdbml // schemas/index.xdbml -- barrel file reuse * from './oracle-core' reuse * from './mongo-orders' // main.xdbml -- imports the barrel reuse * from './schemas/index' // All declarations from oracle-core and mongo-orders are available. ``` **`use`** is appropriate only when the author wants the imports to remain private to the importing file. Declarations brought in via `use` are NOT visible to files that further import the current file. **Recommendation: use `reuse` by default unless you have a specific reason to prevent transitivity.** Most authors will never need the non-transitive form. Both forms support both `*` (import all) and `{ ... }` (selective) syntaxes. ### 26.5 Where the directive goes determines where the import lives The location of a `use`/`reuse` directive in the importing file determines where the imported element lives in the merged AST. The rule is simple: **the directive's location is the placement location.** What you see textually is what the merged AST contains. There is no separate "visibility scope" or "placement scope" -- they are the same thing. **For non-field imports**, the directive may appear in any location where the imported element type is normally valid: - **Container imports** (`container`, `schema`): at file scope (Containers are top-level constructs) - **Top-level constructs** (`type`, `enum`, `tablepartial`, `edge`, `view`, `tablegroup`, `diagramview`, `note`): at file scope - **Entity imports** (`table`, `entity`, `collection`, `record`): at file scope (top-level uncontained entity) OR inside a Container body (the entity becomes a member of that Container) **For field imports**, the directive MUST appear at file scope (see §26.8 for the declaration-vs-placement pattern). **`use`/`reuse` directives MAY NOT appear inside Entity, Edge, View, or Type bodies.** Imports are declarations that establish structural elements; they don't belong inside the bodies of other structural elements. **Naming**: The imported element is named in the importing file's namespace based on **where it is placed**, not where it came from. For an entity import: - Directive at file scope → entity is named by its bare name at file top level (e.g., `products`) - Directive inside `Container ordering` → entity is named within that container (e.g., `ordering.products`) The source's path appears in the directive's `from` clause (recording where to find the element in the source); the local name reflects the directive's placement. ### 26.6 Clone blocks `use` and `reuse` directives MAY be followed by an optional **clone block** containing the imported content embedded directly in the importing file. When the clone is present, the parser uses it as the authoritative content for that import. When the clone is absent, the parser falls back to opening the referenced file at parse time (DBML-compatible behavior). **Syntax**: ```xdbml // Reference-only (DBML behavior, no clone) reuse { entity core.dim_customer } from './conformed-dimensions' // With clone (xDBML autonomous behavior) reuse { entity core.dim_customer } from './conformed-dimensions' { Entity dim_customer { id int [pk] email Email country CountryCode } } // With clone + tool-managed metadata reuse { entity core.dim_customer } from './conformed-dimensions' [cloned_at: '2026-06-06T14:30:00Z'] { Entity dim_customer { ... } } ``` **What the clone block contains**: The clone block contains exactly the imported declaration(s), without any surrounding wrappers from the source file. The shape of the clone follows the granularity of the import: | Import directive | Clone block content | |---|---| | `reuse { field X.Y.Z } from ...` | The field declaration alone (no entity wrapper, no container wrapper) | | `reuse { entity X.Y } from ...` | The entity declaration alone (no container wrapper) | | `reuse { type T } from ...` | The Type declaration | | `reuse { enum E } from ...` | The Enum declaration | | `reuse { tablepartial P } from ...` | The TablePartial declaration | | `reuse { container C } from ...` | The Container declaration including its intrinsic entities | | `reuse { schema S } from ...` | Same as container -- the schema's content travels with it | | `reuse { tablegroup G } from ...` | The TableGroup declaration including its membership | | `reuse { view V } from ...` | The View declaration | | `reuse { edge E } from ...` | The Edge declaration | | `reuse { diagramview D } from ...` | The DiagramView declaration | | `reuse { note N } from ...` | The sticky Note declaration | | `reuse * from ...` | All top-level declarations from the source's non-Project content, preserving their natural shape | The principle: **container wrappers are dropped when extracting an inner element, preserved when the container itself is the imported element.** **Multiple imports in one directive with one clone block** are supported. The parser matches each clone declaration to its corresponding import item by name and element type; order within the clone block is not significant. **Names in the clone use the importing file's vocabulary**. When a directive aliases an import (`as`), the clone uses the alias. Names referenced inside a clone also use the importing file's vocabulary -- no mediation between source names and importing-file names is required. **Semantics**: - When a clone block is present, the parser MUST use its content as the authoritative declarations for the directive's scope. The referenced path is not opened during parsing. - When a clone block is absent, the parser MUST open the referenced path and use the source file's content as authoritative. If the path cannot be resolved, the parse fails. - Clone content is parsed under the importing file's `xdbml:` version directive, not the source file's. **Tool-mediated clone refresh**: before submitting the file to a parser, tools MAY refresh clones by re-reading their referenced sources and updating clone content (and the `cloned_at` timestamp). Such refresh workflows are tool-defined and out of scope for this specification. **Autonomy property**: a file containing clone blocks for all of its imports is fully self-contained and parses correctly without needing access to any referenced files. Authors choose per-directive whether to include a clone based on workflow needs. **Clone metadata**: the directive MAY carry settings before the clone block. The only setting defined in this version is: - `cloned_at: ` -- ISO 8601 timestamp recording when the clone was captured This setting is informational. The parser does not act on it. Tooling MAY use it to drive drift-detection workflows. Other metadata fields (hashes, source versions, refresh policies) are reserved for future versions. **Drift detection**: drift between the clone content and the current state of the referenced file is **out of scope for the specification**. Tools that author or refresh clones define their own policies. **Nested references within clones**: when a clone is captured, it includes the directly-imported declarations only. If those declarations reference further declarations from other files, the clone does NOT recursively capture those further references. For full self-containment, the author must also import (with clone blocks) the transitively referenced declarations. ### 26.7 Worked example: `use` vs `reuse` **Private internal helper** (use private to importing file): ```xdbml // internal-helpers.xdbml Type InternalAuditFields { created_at timestamp created_by varchar } ``` ```xdbml // orders.xdbml -- keeps the helper private xdbml: 0.2 use { type InternalAuditFields } from './internal-helpers' Container ordering [type: schema] { Entity orders { id int [pk] audit InternalAuditFields customer_id int } } ``` ```xdbml // main.xdbml xdbml: 0.2 Project ecommerce { targets: [PostgreSQL] } reuse * from './orders' // 'ordering.orders' is available. 'InternalAuditFields' is NOT (orders.xdbml used it privately). ``` ### 26.8 Field-level imports Field-level imports clone a single field declaration from another file. They are useful when the importing file wants to reuse an existing field's complete definition without importing the whole entity or asking the library author to extract a Named Type. **Declaration vs placement**: Field imports are different from all other imports in one important respect: they declare a reusable shape without placing a field anywhere. A field cannot exist on its own outside an entity. So the field import directive declares an **available named field shape** at file scope, and the author then uses that name as the type of a field in some entity. This makes field imports semantically similar to scalar Named Types (§14.7): the directive declares a reusable shape; the author places fields of that shape elsewhere in the file. **Field imports must appear at file scope**, never inside a Container body. **Syntax**: ```xdbml reuse { field core.dim_customer.email } from './conformed-dimensions' ``` The path follows xDBML's standard `container.entity.field` form. Field paths into nested structures use dot notation: `field core.dim_customer.addresses.address.city`. **Usage**: the imported field's name (or alias) is used as the type of a field in some entity: ```xdbml reuse { field core.dim_customer.email } from './conformed-dimensions' Container facts [type: schema] { Entity fact_sales { customer_email email // 'email' here is used as the type, not the field name customer_id int [ref: > core.dim_customer.id] amount decimal(10,2) } } ``` To rename the imported field shape, use `as`: ```xdbml reuse { field core.dim_customer.email as canonical_email } from './conformed-dimensions' Entity fact_sales { customer_email canonical_email } ``` **What is copied** (cloning by value at clone-capture time or parse time): - **Type**: copied verbatim, including nested shapes - **Settings** (`not null`, `unique`, `default`, `increment`, `pk`): copied verbatim - **Validation constraints** (`pattern`, `minLength`, `maxLength`, range, enum membership, `check`): copied verbatim - **Notes** (both inline and block): copied verbatim - **AI-readiness metadata** (`synonyms`, `business_term`, `tags`, `granularity`): copied verbatim - **Custom properties** (`x_` prefixed): copied verbatim - **Inline `ref:` settings**: copied only if the ref's target is resolvable in the importing file's scope; otherwise dropped with a warning **Per-instance overrides**: When using the imported shape as a field type, the field's declaration may override settings: ```xdbml reuse { field core.dim_customer.email } from './conformed-dimensions' Entity employees { email email [unique] // adds 'unique' to the imported email's settings } ``` **Field imports support clone blocks**: ```xdbml reuse { field core.dim_customer.email } from './conformed-dimensions' [cloned_at: '2026-06-06T14:30:00Z'] { email varchar [unique, not null, pattern: '^[^@]+@[^@]+$', tags: ['pii']] } ``` ### 26.9 Aliases When two source files define elements with the same name, use `as` to rename imports and avoid conflicts: ```xdbml Container app [type: schema] { reuse { entity users as auth_users } from './auth' reuse { entity users as billing_users } from './billing' } ``` Once aliased, the alias is the only accessible name in the importing file under that import. The clone (if present) uses the alias as the declaration's name. ### 26.10 Name resolution Name resolution operates in two passes: **Pass 1 (declaration collection)**: the parser walks the import graph starting from the parse target. For each `use`/`reuse` directive, the parser uses the clone block if present, otherwise opens the referenced file. Declarations are collected with their fully-qualified names in the importing file's namespace (placement applied, aliases applied). **Pass 2 (reference resolution)**: references are resolved against the declarations collected in pass 1. Forward references within and across files are valid. **Conflict policy**: two imports conflict only if their fully-qualified names in the importing file's namespace collide. Two `Entity users` imported into different Containers (`ordering.users` and `audit.users`) do not conflict; two imports of `users` into the same Container without aliases DO conflict. On conflict, the parser MUST emit an error and require explicit aliasing. **Local declarations always win** over imported declarations of the same name. ### 26.11 Version directive across files Each file is parsed using its own `xdbml:` version directive (or as DBML if no directive is present). Different files in an import graph MAY use different version directives: - A `xdbml: 0.2` file MAY import a `xdbml: 0.1` file - A `xdbml: 0.2` file MAY import a DBML file (no directive) - A `xdbml: 0.1` file MAY import a DBML file - A `xdbml: 0.1` file MAY NOT use v0.2-only constructs such as clone blocks, scalar Named Types, or field imports When a clone is captured from a source file, the clone content is parsed under the **importing** file's version directive, not the source's. ### 26.12 Multi-file Project rule Every xDBML file MAY declare a Project, but **only the entry file's Project survives** in the merged AST. When file A imports from file B: - A's Project is part of the merged project AST - B's Project is ignored (B's declarations contribute, B's Project metadata does not) The same file plays different roles depending on whether it's the entry point or being imported. ### 26.13 Path resolution This section covers relative-path resolution. v0.3 also supports remote module sources over HTTPS, recognized by an `https://` scheme (§26.14); resolving relative references inside a remote module is covered in §26.14.1. A relative path: - Begins with `./` or `../` - Uses forward slashes as path separators on all platforms - Is resolved relative to the importing file's directory - Has an optional file extension (`./auth`, `./auth.xdbml`, `./auth.dbml` all work) When the extension is omitted, the parser searches `.xdbml` then `.dbml`. If neither exists AND no clone block is present, the parser reports an error. When a clone block IS present, path resolution failure is NOT an error -- the clone is authoritative. The parser MAY emit a warning. Paths without a leading `./` or `../` are reserved for future use and rejected in this phase. **Cross-platform compatibility**: forward-slash relative paths are accepted by all modern file APIs on Linux, macOS, and Windows. Browser-based renderers using the File System Access API support the same path syntax. **Browser-based renderer considerations**: when an xDBML file is parsed in a browser-based renderer, reading referenced files may require explicit user permission via the File System Access API or equivalent. Clone blocks eliminate this concern entirely: a file with clones for all its imports requires no file system access to referenced files. Authors writing files for browser-based consumption are encouraged to include clones. ### 26.14 Remote module sources Through v0.2, a directive's source (the string after `from`) was always a relative path to another xDBML file. This revision adds a second form of source: an absolute URL retrieved over HTTPS. A remote source is recognized purely by its scheme. A source string that begins with `https://` is a remote source; any other source string is a relative path and is resolved exactly as in v0.2 (§26.13). Only the `https` scheme is permitted. A source that uses `http`, `file`, `git`, `ssh`, `data`, a protocol-relative form such as `//host/path`, or a bare host such as `github.com/owner/repo/...` is rejected with a located error. It is never reinterpreted as a relative path. Plaintext transport is not acceptable for content that the importing document will trust as schema, and the other forms are excluded so that the boundary between local and remote resolution stays unambiguous. A remote source changes only where the bytes of a module come from. It does not change what an import does. The placement principle of §26.5 (a directive places its imported objects wherever the directive sits) is preserved without modification, and field-level imports (§26.8) are unaffected. Both directive keywords (`use` and `reuse`), both selection forms (the import-all `*` and the selective `{ ... }` list), per-symbol `as` aliasing (§26.9), and any directive settings behave identically whether the string after `from` is a path or a URL, and they bind imported objects the same way. ``` // Import everything a remote module exposes use * from 'https://raw.githubusercontent.com/acme/schemas/v1.4.0/core.xdbml' // Import selected objects; each item carries its element-type slot, // and an optional per-symbol alias reuse { entity Customer, entity Order as SalesOrder } from 'https://raw.githubusercontent.com/acme/schemas/v1.4.0/sales.xdbml' // A relative source continues to work exactly as before use * from './local/inventory.xdbml' ``` #### 26.14.1 Resolving relative references inside a remote module A remote module may itself contain imports. When a remote module contains a relative source, that source is resolved against the base URL of the module that contains it, following the reference resolution rules of RFC 3986. It is not resolved against the entry document, and it is not resolved against any filesystem location. This extends the path resolution of §26.13 to URL bases. For example, if a local entry document imports `https://raw.githubusercontent.com/acme/schemas/v1.4.0/core.xdbml`, and `core.xdbml` contains the relative source `./enums.xdbml`, that reference resolves to `https://raw.githubusercontent.com/acme/schemas/v1.4.0/enums.xdbml`. This lets an author publish a multi-file module set, expose a single entry file, and rely on the set's internal references continuing to work once the entry file is imported by URL. A consequence of resolving against the containing module's base URL is that a remote module can never reach the local filesystem. Every reference resolved against an `https` base is itself an `https` URL. This is a deliberate trust boundary: importing a remote module grants that module no access to the importing system's files. #### 26.14.2 Pinning and reproducibility A URL that names a mutable reference, such as a branch, can return different content over time, so the same import may yield a different model on a later resolution. To keep a model reproducible, an author SHOULD pin a remote source to an immutable reference, such as a commit identifier or a release tag, rather than a branch name. The following table shows immutable raw-content URL forms for common hosts. These forms are informative guidance, not part of the language. | Host | Immutable raw-content form (informative) | |---|---| | GitHub | `https://raw.githubusercontent.com/{owner}/{repo}/{commit-or-tag}/{path}` | | GitLab | `https://gitlab.com/{owner}/{repo}/-/raw/{commit-or-tag}/{path}` | | Bitbucket Cloud | `https://bitbucket.org/{workspace}/{repo}/raw/{commit-or-tag}/{path}` | | Azure DevOps | `https://dev.azure.com/{org}/{project}/_apis/git/repositories/{repo}/items?path=/{path}&versionDescriptor.version={commit}&versionDescriptor.versionType=commit&download=true&api-version=7.1` | Note: each host also serves a human-facing page, a "blob," "src," or "browse" view, at a different URL that returns HTML rather than raw content. A remote source MUST point at the raw-content URL. Recognizing and rewriting a human-facing URL into a raw-content URL is out of scope for the language and is addressed under resolver conformance (§26.14.5). Azure DevOps in particular has no dedicated raw host and serves content through its Items API, which typically also requires authentication; see §26.14.4. #### 26.14.3 Resolution failures Remote resolution can fail in ways local resolution cannot. A resolver MUST treat every one of the following as a hard error that aborts resolution with a located diagnostic. It MUST NOT drop the failed import silently, and it MUST NOT substitute a partial model for the missing one. The conditions are: a network or DNS failure; a response status outside the 2xx range; a request that exceeds the resolver's timeout; a TLS failure; a redirect to a non-`https` target; a redirect chain longer than the resolver's limit; a response larger than the resolver's size limit; and content that does not parse as a valid xDBML module. This requirement exists because a silent file-level drop produces a model that looks complete but is not, which is a known and costly failure mode of module resolution. #### 26.14.4 Credentials and authentication A remote source MUST NOT carry credentials. Userinfo in the authority component, such as `https://user:token@host/...`, and credential-bearing query parameters, such as an access token appended to the URL, are prohibited, and a resolver MUST reject a source that contains userinfo. Authentication for private content is outside the scope of the language. A resolver MAY obtain credentials from its own configuration, keyed by host, and apply them when issuing the request. Credentials are never read from, or written to, the document. This keeps private repositories, and hosts that require a token such as Azure DevOps, usable without placing secrets in source that is itself committed to version control. #### 26.14.5 Resolver conformance The following requirements apply to any resolver that fetches remote sources, and in particular to any resolver that runs in a server context, including a hosted playground or an application backend. They complement the parser implementation requirements of §26.17. A resolver MUST defend against server-side request forgery. It MUST refuse to resolve sources that target loopback addresses, link-local addresses, private address ranges, or cloud metadata endpoints, unless an operator has explicitly allowlisted them. A resolver MUST follow redirects only from `https` to `https`, MUST NOT downgrade to a non-`https` target, and MUST bound the number of redirects it follows. A resolver MUST enforce a maximum response size and a request timeout. A resolver MAY cache resolved content. When detecting import cycles (§26.15) and de-duplicating fetches, a resolver keys on the normalized URL, with the host lowercased and dot-segments resolved. De-duplication is best-effort, because the same content may be reachable through more than one URL, for example a redirecting host name and its raw-content host name. Rewriting a human-facing URL into a raw-content URL is permitted as a non-normative convenience, but it is not required and is not part of the language. A resolver that offers this convenience SHOULD document the host patterns it recognizes, and the parser SHOULD NOT be made responsible for host-specific routing, so that the language remains independent of any host's URL conventions. #### 26.14.6 Browser-based resolution and CORS (informative) A resolver that runs inside a web browser, such as the playground, is subject to the browser's cross-origin policy. A cross-origin fetch succeeds only if the host returns the appropriate cross-origin response headers. In practice, GitHub's raw-content host returns headers that permit cross-origin reads, while the raw or API endpoints of several other hosts do not by default. The playground therefore attempts to resolve any `https` source and reports a cross-origin failure as a distinct, clearly labeled error rather than a generic fetch failure, so that an author understands the cause. It does not maintain a fixed list of permitted hosts, because cross-origin support can change and can differ on self-hosted instances. An author who needs to resolve sources from a host that does not permit cross-origin reads in the browser should use a server-side resolver. Coverage across all hosts is a property of server-side resolvers, not of browser-based resolution. #### 26.14.7 Future considerations (informative) Two capabilities are intentionally deferred to a later revision and are noted here so that their absence is understood as deliberate. The first is an integrity attribute on an import, by which an author could record a content hash and a resolver could verify that the fetched content matches it. The second is a guardrail that prevents a relative reference inside a remote module from resolving above a declared base prefix. Neither is part of this revision. ### 26.15 Circular imports Circular imports are **allowed**, matching DBML's behavior. The two-pass name resolution handles cycles naturally. **Best practice (not a language rule)**: circular imports are often a signal that file boundaries could be improved. ### 26.16 Worked examples **Entity import inside a Container**: View in playground ```xdbml // catalog.xdbml xdbml: 0.2 Project catalog { targets: [PostgreSQL] } Container core [type: schema] { Entity products { id int [pk] sku varchar [unique, not null] name varchar [not null] price decimal(10,2) } } ``` View in playground ```xdbml // order-system.xdbml xdbml: 0.2 Project order_system { targets: [PostgreSQL] } Container ordering [type: schema] { Entity orders { id int [pk] customer_id int placed_at timestamp } reuse { entity core.products } from './catalog' [cloned_at: '2026-06-06T14:30:00Z'] { Entity products { id int [pk] sku varchar [unique, not null] name varchar [not null] price decimal(10,2) } } Entity order_lines { order_id int [ref: > ordering.orders.id] product_id int [ref: > ordering.products.id] quantity int } } ``` The imported `products` entity becomes `ordering.products` in the importing file. The source's `core` container appears in the directive's path but does not survive into the importing file's AST. **Field-level import**: ```xdbml // legacy-customer.xdbml Container ops [type: schema, target: Oracle] { Entity customer_legacy { id number(10) legacy_email varchar2(255) [pattern: '^[^@]+@[^@]+$', tags: ['pii']] } } ``` View in playground ```xdbml // new-system.xdbml xdbml: 0.2 Project new_system { targets: [PostgreSQL] } reuse { field ops.customer_legacy.legacy_email as contact_email } from './legacy-customer' [cloned_at: '2026-06-06T14:30:00Z'] { contact_email varchar2(255) [pattern: '^[^@]+@[^@]+$', tags: ['pii']] } Container app [type: schema] { Entity new_customer { id int [pk] contact_email contact_email } } ``` The field import declares `contact_email` as a reusable shape at file scope. The Entity uses `contact_email` as the type of one of its fields. ### 26.17 Parser implementation requirements A conformant parser supporting the module system MUST: 1. Accept any xDBML file as the parse target. The target file's `Project` is the merged project's Project. 2. For each `use`/`reuse` directive: if a clone block is present, use it as authoritative content. If absent, open the referenced path. 3. Walk the import graph by reading directives at file scope and inside Container bodies, resolving paths relative to the importing file's directory. 4. Visit each unique file at most once; circular imports must terminate. 5. Parse each referenced file with its own `xdbml:` version directive. Parse clone content with the importing file's version directive. 6. Place each imported element at the directive's location in the importing file's merged AST. 7. Reject field imports inside Container bodies; field imports MUST be at file scope. 8. Reject `use`/`reuse` directives inside Entity, Edge, View, or Type bodies. 9. Implement `use` vs `reuse` semantics correctly. 10. For field imports, copy the field declaration as a reusable named shape available for use as a field type, with `ref:` settings dropped (with warning) when the ref target is not in the importing file's scope. 11. When extracting a clone for an entity import, strip any surrounding Container wrapper. When extracting a clone for a container import, preserve the Container with its entities. Apply directive aliases in the clone's declaration names. 12. When a directive carries multiple imports and a clone block, match each clone declaration to its corresponding import item by name and element type. 13. Resolve names in two passes; an imported element's name follows its placement location, not its source path. 14. Detect and reject unresolved name conflicts. 15. Reject paths not starting with `./` or `../`. The parser SHOULD: - Cache parsed file contents during a single parse invocation - Emit warnings for shadowed-but-unaliased imports - Emit warnings for circular import chains - Emit warnings when a clone block is present but the referenced path could not be accessed - Emit warnings when a captured clone has unresolved external references Drift detection between clones and their sources is out of scope for the specification. ## 27. AST representation Abstract Schema Tree ``` Project ├── settings (targets, ...) ├── Note (project-level) ├── Container (0..n) │ ├── settings (type, target, replication, location, ...) │ ├── Note (container-level) │ ├── Entity (n per container) │ │ ├── Field (n; recursively nested) │ │ ├── Index (n) │ │ ├── Check (n) │ │ ├── Records (0..1) │ │ └── Note │ ├── View (n per container) │ │ ├── settings (source_query, materialized, ...) │ │ ├── Field (n) │ │ └── Note │ └── Edge (n per container) │ ├── settings (source, target, source_cardinality, target_cardinality, undirected, ...) │ ├── Field (n) │ ├── Index (n) │ └── Note ├── Entity (0..n; project-level) ├── View (0..n; project-level) ├── Edge (0..n; project-level) ├── Type (0..n) ├── Enum (n) ├── Ref (n; project-level) │ └── Cardinality (source/target min and max) ├── TablePartial (n) ├── TableGroup (n) ├── DiagramView (n) └── Note (n; sticky notes) ``` ### 27.1 Field recursion A Field node carries a type expression. Complex types contain nested Field nodes; tree depth is unbounded. ### 27.2 Path segments Field paths are ordered lists of `PathSegment` nodes, each tagged by kind: `field`, `array_index`, `array_iter`, `map_key`, `map_iter`, `alternative`. Implicit forms are normalized to explicit forms during parsing. ### 27.3 AST flavors - **Raw** -- references stored as strings, for partial-document analysis. - **Normalized** -- references resolved into direct pointers, for traversal and code generation. ## 28. Round-trip semantics ### 28.1 Round-trip targets | Format | Direction | Notes | |---|---|---| | DBML 3.13.6 | xDBML ↔ DBML | Strict superset. xDBML-only constructs emit as comments when downgrading. | | Oracle DDL | xDBML → SQL | `CREATE TABLE`, `CREATE TYPE`, `CHECK`, FKs. JSON via native `JSON` type (21c+). Views as `CREATE VIEW` / `CREATE MATERIALIZED VIEW`. | | PostgreSQL DDL | xDBML → SQL | JSON via `jsonb`. | | MongoDB validators | xDBML ↔ `$jsonSchema` | Object/array/oneOf/union direct. BSON types preserved. | | Avro schemas | xDBML ↔ Avro | Named types → Avro named records. Union element ordering preserved. | | Parquet schemas | xDBML → Parquet | Nested types → Parquet group types. | | JSON Schema | xDBML ↔ JSON Schema | Direct correspondence for most constructs. Named types → `$defs`. | | OpenAPI | xDBML → OpenAPI | Entities and types → `components.schemas`. | | Neo4j / Cypher schema | xDBML → Cypher | Entities → node labels; Edges → relationship types with properties. | | RDF-star | xDBML ↔ RDF-star | Edges → quoted triples with annotations. | | ODCS (schema section) | xDBML → ODCS | xDBML supplies schema portion; other ODCS sections out of scope. | | Hackolade PDM | xDBML ↔ PDM | Lossless round-trip. PDM `Description` fields receive xDBML notes (markdown). | ### 28.2 Lossiness boundary **Lossless:** Container declarations and settings; Entity, View, Edge, Field, Index, Ref structure; nested types; polymorphism; named types (where target supports `$ref`); validation constraints; cardinality; notes. **Lossy when target lacks the construct:** - Polymorphism to relational targets (lowered to discriminator + checks) - Named types to formats without `$ref` (inlined) - Edges to non-LPG relational targets (lowered to junction tables) - Views with non-portable source queries (passed verbatim; may fail on target) Generators emit warnings on lossy lowering. ## 29. Relationship to other standards ``` ┌──────────────────────────────────────────────────────────────────────┐ │ Agent knowledge / context OKF │ ├──────────────────────────────────────────────────────────────────────┤ │ Visualization Mermaid ER, ERD renderers │ ╞══════════════════════════════════════════════════════════════════════╡ │ Data product layer ODPS │ ├──────────────────────────────────────────────────────────────────────┤ │ Inferential / reasoning layer OWL, knowledge graphs │ ├──────────────────────────────────────────────────────────────────────┤ │ Semantic / business layer OSI, dbt MetricFlow, Cube │ ├──────────────────────────────────────────────────────────────────────┤ │ Contract layer ODCS │ ├──────────────────────────────────────────────────────────────────────┤ │ Validation layer JSON Schema, SHACL │ ├──────────────────────────────────────────────────────────────────────┤ │ Schema / shape layer xDBML ← here │ ├──────────────────────────────────────────────────────────────────────┤ │ Engine DDL layer SQL DDL, Cypher, CQL, ... │ └──────────────────────────────────────────────────────────────────────┘ ``` The top band (OKF and Mermaid ER) are cross-cutting consumers that reference or render the schema, rather than layers that own a slice of data-description meaning; each layer beneath the double rule owns one such slice. | Standard | Layer | xDBML relationship | |---|---|---| | SQL DDL | Engine | Forward-engineering target | | Cypher schema | Engine | Forward-engineering target for LPG | | JSON Schema | Schema (JSON-only) | Bidirectional translation | | OpenAPI | Schema (HTTP APIs) | Same shapes | | Avro / Parquet | Schema (serialization) | Forward-engineering target | | SHACL | Validation | Bidirectional translation | | ODCS | Contract | Supplies schema section | | ODPS | Data product | Sits above ODCS | | OSI | Semantic | Complementary via §23 shared vocabulary | | RDFS / OWL | Inferential | Shapes lift into knowledge graphs | | RDF-star | Inferential | Bidirectional for edges with properties | | [Mermaid ER](https://mermaid.ai/open-source/syntax/entityRelationshipDiagram.html) | Visualization | Rendering target: a diagram is generated from xDBML, not the reverse | | [OKF](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing) | Agent knowledge / context | An xDBML schema populates the schema of an OKF concept document | xDBML does not attempt to be SQL, OSI, ODCS, OWL, or JSON Schema. Each addresses part of the data-description problem at one layer. Nor does it replace diagramming languages such as Mermaid ER, which render a picture of a schema, or agent-context formats such as Google's Open Knowledge Format (OKF), whose concept documents an xDBML schema can populate while remaining the authoritative, generative source. ## 30. Conformance A conforming xDBML v0.3 implementation: 1. **Parses every valid xDBML v0.3 document** to the AST described in §27. 2. **Parses every valid xDBML v0.2 document** with v0.2 semantics (the version directive selects). 3. **Parses every valid xDBML v0.1 document** with v0.1 semantics (the version directive selects). 4. **Rejects malformed documents** with informative error messages. 5. **Honors the version declaration** per §4.1. 6. **Honors the `experimental:` opt-in** per §4.2. 7. **Normalizes implicit forms to canonical AST representations** for paths. 8. **Preserves declared keyword choices** in the raw AST flavor. 9. **Computes default cardinality** per §11.8 when not declared. 10. **Implements the module system** (§26) including relative-path imports, clone blocks, field-level imports, and scoped imports per §26.6, and remote module sources over HTTPS per §26.14 (the `https`-only scheme restriction, and resolution of relative references against a remote module's base URL per §26.14.1). Generators and importers are not required for spec conformance. ## Appendix A. Reserved keywords **Document-level:** `xdbml`, `experimental`, `Project` **Container-level:** `Container`, `Schema`, `Database`, `Keyspace`, `Namespace`, `Dataset`, `Bucket` **Entity-level:** `Entity`, `Table`, `Collection`, `Record` **Other top-level constructs:** `Edge`, `View`, `Type`, `Enum`, `enum`, `Ref`, `TablePartial`, `TableGroup`, `DiagramView`, `Note` **Field types:** `object`, `struct`, `record`, `array`, `list`, `map`, `dict`, `dictionary`, `set`, `union`, `oneOf`, `anyOf`, `allOf`, `json`, `jsonb`, `variant` **Block keywords:** `records`, `indexes`, `checks` **Module system keywords (new in v0.2):** `use`, `reuse`, `from`, `as` **Module system element-type slot values (new in v0.2):** `table`, `entity`, `collection`, `record`, `enum`, `tablepartial`, `note`, `schema`, `container`, `tablegroup`, `type`, `edge`, `view`, `diagramview`, `field` **Module system directive setting (new in v0.2):** `cloned_at` **Relationship operators:** `<`, `>`, `-`, `<>` **Field settings:** `pk`, `primary key`, `null`, `not null`, `unique`, `default`, `increment`, `note`, `ref`, `check` **Cardinality settings:** `source`, `target`, `min_source`, `max_source`, `min_target`, `max_target`, `source_cardinality`, `target_cardinality`, `undirected` **Relationship settings (new for v0.2):** `inactive` (flag for visualization-only deactivation), `color` (`#rgb` / `#rrggbb` for the relationship line) **TableGroup settings (new for v0.2):** `color` (`#rgb` / `#rrggbb` for the group's frame in diagrams) **Check settings:** `name` (constraint name in generated DDL), `note` (free-text annotation) **View settings:** `source_query`, `materialized`, `refresh_schedule`, `refresh_on`, `source_database`, `storage_options` **AI-readiness settings:** `synonyms`, `business_term`, `granularity`, `tags` **Custom-property prefix:** `x_` (reserved as a convention for organization-specific extensions; see §23.5) Quoted identifiers (`"keyword"`) may be used for entities or fields that share a name with a reserved keyword. ## Appendix B. Lexical grammar reference | Token class | Pattern | |---|---| | Identifier | `[A-Za-z_][A-Za-z0-9_]*` | | Quoted identifier | `"` any chars with `\"`, `\\` escapes `"` | | String literal | `'` any chars with `\'`, `\\` escapes `'` | | Multi-line string | `'''` any chars with indentation normalization `'''` | | Number | integer or decimal, optional sign and exponent | | Expression | `` ` `` any chars `` ` `` | | Cardinality string | `'N..M'` where N, M are non-negative integers or `*` | | Comment (single-line) | `//` to end of line | | Comment (block) | `/*` ... `*/` | | Container/entity-qualified name | `identifier.identifier` | | Composite reference | `identifier.(identifier, identifier, ...)` | | Field path | `identifier` followed by zero or more `.identifier`, `.[N]`, `.[*]`, `.["literal"]`, or `."quoted"` segments | A formal ANTLR4 grammar is published at [`grammar/xDBML.g4`](https://github.com/xdbml/xdbml-spec/blob/main/grammar/xDBML.g4) in this repository. ## Appendix C. Worked end-to-end examples ### C.1 Polyglot model: Oracle + MongoDB + events + graph View in playground ```xdbml xdbml: 0.3 Project polyglot_example { targets: [Oracle, MongoDB, Avro, Neo4j] } Type Address { street varchar [not null] city varchar [not null] country varchar } Type MonetaryAmount { amount Decimal128 [not null] currency string [not null, pattern: '^[A-Z]{3}$', minLength: 3, maxLength: 3] } Container core [type: schema, target: Oracle] { Table customers { id int [pk] email varchar [unique, not null, pattern: '^[^@]+@[^@]+$'] display_name varchar [not null] primary_address Address } } Container orders_store [type: database, target: MongoDB] { Collection orders { _id objectId [pk] customer_id int32 [not null] placed_at Date [granularity: second] total MonetaryAmount line_items array [ line_item object { sku string [not null] quantity int32 [not null, minimum: 1] unit_price MonetaryAmount } ] payment_method oneOf { card object { last4 string [maxLength: 4] brand string } bank object { iban string } wallet object { provider string account string } } [discriminator: method_kind] } } Container events [type: namespace, target: Avro] { Record OrderPlaced { event_id string [pk] occurred long [granularity: millisecond, not null, note: 'Unix epoch milliseconds; Avro logical-type timestamp-millis'] order_id string [not null, note: 'String form of order _id from orders_store'] customer_id int [not null] total MonetaryAmount } } Container social [type: database, target: Neo4j] { Edge FOLLOWS [source: core.customers, target: core.customers, source_cardinality: '0..*', target_cardinality: '0..*'] { since date [not null] is_close boolean [default: false] } } Container catalog [type: schema, target: Oracle] { Entity products { sku varchar [pk] name varchar [not null] price MonetaryAmount metadata json { weight_grams int dimensions object { length int width int height int } tags array [varchar] } } View top_sellers [materialized: true, refresh_schedule: 'daily'] { source_query: ''' SELECT p.sku, p.name, COUNT(*) AS order_count FROM catalog.products p JOIN orders_store.orders o ON o.line_items.sku = p.sku GROUP BY p.sku, p.name ORDER BY order_count DESC FETCH FIRST 100 ROWS ONLY ''' sku varchar [pk] name varchar order_count int } } Ref: orders_store.orders.customer_id > core.customers.id [source: '1..*', target: '1..1'] Ref: orders_store.orders.line_items.[*].sku > catalog.products.sku Ref: events.OrderPlaced.order_id > orders_store.orders._id Ref: events.OrderPlaced.customer_id > core.customers.id ``` ### C.2 Heterogeneous tuple View in playground ```xdbml xdbml: 0.3 Entity customers { id int [pk] name varchar addresses array [ [0] billing object { street varchar city varchar zip varchar }, [1] shipping object { street varchar city varchar zip varchar } ] indexes { addresses.[0].city addresses.[1].zip } } ``` ### C.3 Recursive named type View in playground ```xdbml xdbml: 0.3 Type TreeNode { id int value varchar children array [child TreeNode] } Entity organization_chart { root_id int [pk] tree TreeNode } ``` ### C.4 Graph model with property edges View in playground ```xdbml xdbml: 0.3 Project social_network { targets: Neo4j } Container social [type: database, target: Neo4j] { Entity Person { id int [pk] name varchar [not null] email varchar [unique] } Entity Movie { id int [pk] title varchar [not null] year int } Edge KNOWS [source: Person, target: Person, source_cardinality: '0..*', target_cardinality: '0..*'] { since date [not null] intimacy int [minimum: 0, maximum: 10] } Edge ACTED_IN [source: Person, target: Movie] { role varchar [not null] billing int [note: 'order in credits'] } Edge RATED [source: Person, target: Movie] { rating int [not null, minimum: 1, maximum: 5] rated_at timestamp [granularity: second] review varchar } } ``` ## Appendix D. Compatibility with DBML 3.13.6 xDBML is a strict superset of DBML 3.13.6: 1. Every valid DBML 3.13.6 document is a valid xDBML document at the lowest compatibility level (no version declaration). 2. A document declaring `xdbml: 0.1` opts into the v0.1 feature set; all DBML 3.13.6 constructs remain valid. 3. A document declaring `xdbml: 0.2` opts into the v0.2 feature set (which includes the v0.1 feature set plus the module system, scalar Named Types, and field-level imports); all DBML 3.13.6 constructs remain valid. 4. A document declaring `xdbml: 0.3` opts into the v0.3 feature set (the v0.2 feature set plus remote module sources over HTTPS, §26.14); all DBML 3.13.6 constructs remain valid. 5. xDBML extension constructs are not recognized in documents without a version declaration. 6. DBML's implicit-schema syntax (`Table core.users { ... }`) remains valid; produces an implicit Container. 7. DBML's `Table` keyword remains valid as a synonym for `Entity`. 8. DBML's `use` and `reuse` module-system directives (without clone blocks, without xDBML-specific element types) behave identically in xDBML. v0.2 adds clone blocks, field-level imports, the extended element-type vocabulary, and Container-scoped imports; v0.3 adds remote module sources over HTTPS -- all available only when the file declares the corresponding `xdbml:` version. A DBML 3.13.6 document is upgraded to xDBML by adding `xdbml: 0.3` at the top (or `xdbml: 0.2` / `xdbml: 0.1` for an earlier feature set). No structural changes required. ## Appendix E. Sources and acknowledgments xDBML builds on the work of Holistics and contributors to DBML (Apache License 2.0). The DBML language reference at dbml.dbdiagram.io served as the foundation for the constructs documented in §5, §7, §9, §10, §11, §15, §16, §17, §18, §25, §26. xDBML acknowledges intellectual debt to: - **[JSON Schema](https://json-schema.org/)** -- validation vocabulary, `oneOf`/`anyOf`/`allOf` - **[Avro](https://avro.apache.org/)** -- named records, namespaces, union semantics - **[OpenAPI](https://www.openapis.org/)** -- discriminator conventions, `components.schemas` reuse - **[ODCS](https://github.com/bitol-io/open-data-contract-standard)** (bitol.io) -- Open Data Contract Standard: schema/contract layering - **[OSI](https://github.com/open-semantic-interchange/OSI)** -- Open Semantic Interchange: schema/semantic-layer boundary vocabulary - **[openCypher](https://opencypher.org/) / Neo4j** -- labeled property graph semantics; edge properties - **[RDF-star](https://w3c-cg.github.io/rdf-star/cg-spec/editors_draft.html)** -- annotation properties on triples - **[Mermaid ER code](https://mermaid.ai/open-source/syntax/entityRelationshipDiagram.html)** -- cardinality notation principles - **[Hackolade Studio](https://hackolade.com)** -- polyglot data modeling principles, AI-readiness settings, path notation xDBML also interoperates with contemporary standards that postdate its design rather than informing it: - **[Open Knowledge Format (OKF)](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing)** (Google Cloud) -- vendor-neutral markdown format for agent-facing knowledge context; an xDBML schema populates the schema an OKF concept describes (see [§29](#_29-relationship-to-other-standards)) --- *End of xDBML Specification v0.3.*