--- name: building-dbt-semantic-layer description: Use when creating or modifying dbt Semantic Layer components including semantic models, metrics, and dimensions leveraging MetricFlow. user-invocable: false metadata: author: dbt-labs --- # Building the dbt Semantic Layer This skill guides the creation and modification of dbt Semantic Layer components: semantic models, entities, dimensions, and metrics. - **Semantic models** - Metadata configurations that define how dbt models map to business concepts - **Entities** - Keys that identify the grain of your data and enable joins between semantic models - **Dimensions** - Attributes used to filter or group metrics (categorical or time-based) - **Metrics** - Business calculations defined on top of semantic models (e.g., revenue, order count) ## Additional Resources - [Time Spine Setup](references/time-spine.md) - Required for time-based metrics and aggregations - [Best Practices](references/best-practices.md) - Design patterns and recommendations for semantic models and metrics > [!NOTE] This skill contains guidance for the new dbt semantic layer YAML spec, valid for dbt 1.12.0 and above. If the user is using a different version of dbt, you can use the [migration guide](https://docs.getdbt.com/docs/build/latest-metrics-spec) to help them migrate to the new spec and add new components to their semantic layer. Ask the user if they want to migrate to the new spec before proceeding. ## Entry Points Users may ask questions related to building metrics with the semantic layer in a few different ways. Here are the common entry points to look out for: ### Business Question First When the user describes a metric or analysis need (e.g., "I need to track customer lifetime value by segment"): 1. Search project models or existing semantic models by name, description, and column names for relevant candidates 2. Present top matches with brief context (model name, description, key columns) 3. User confirms which model(s) / semantic models to build on / extend / update 4. Work backwards from users need to define entities, dimensions, and metrics ### Model First When the user specifies a model to expose (e.g., "Add semantic layer to `customers` model"): 1. Read the model SQL and existing YAML config 2. Identify the grain (primary key / entity) 3. Suggest dimensions based on column types and names 4. Ask what metrics the user wants to define Both paths converge on the same implementation workflow. ### Open Ended User asks to build the semantic layer for a project or models that are not specified. ("Build the semantic layer for my project") 1. Identify high importance models in the project 2. Suggest some metrics and dimensions for those models 3. Ask the user if they want to create more metrics and dimensions or if there are any other models they want to build the semantic layer on ## Implementation Workflow ### Step 1: Enable Semantic Model Decide which dbt model to build the semantic model on. Add `semantic_model:` block to the model's YAML with `enabled: true`. Set `agg_time_dimension` to the primary time column. If the model does not have a time column, warn user that the model cannot contain metrics that are time-based. Ask the user if they want to create a derived time dimension. Example YAML: ```yaml models: - name: orders semantic_model: enabled: true # enable the semantic model agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model) ``` ### Step 2: Define Entities Identify the primary key column (check for `_id` suffix, uniqueness tests, or explicit config). Add `entity: \n\t type: primary` block to that column's entry. If the model has foreign keys, define those as `entity: type: foreign`. ```yaml models: - name: orders semantic_model: enabled: true # enable the semantic model agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model) columns: - name: order_id # this is the primary key column of the model entity: type: primary name: order - name: customer_id # this is a foreign key column of the model entity: type: foreign name: customer ``` ### Step 3: Define Dimensions Scan columns for dimension candidates. These would be useful columns to group by when querying a metrics: - Time columns → `dimension: type: time` with appropriate `granularity` (set at the column level) - Categorical columns (strings, booleans) → `dimension: type: categorical` Present suggested dimensions to user for confirmation. Example YAML: ```yaml models: - name: orders semantic_model: enabled: true # enable the semantic model agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model) columns: - name: order_id entity: type: primary name: order - name: customer_id entity: type: foreign name: customer - name: ordered_at granularity: day # set the granularity of the time column dimension: type: time - name: order_status dimension: type: categorical ``` ### Step 4: Define Metrics Create some simple metrics for the model. For each metric, collect: name, description, label, aggregation type, and expression. Support metric types: `simple`, `derived`, `cumulative`, `conversion`, `ratio`. ```yaml models: - name: orders semantic_model: enabled: true # enable the semantic model agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model) columns: - name: order_id entity: type: primary name: order - name: customer_id entity: type: foreign name: customer - name: ordered_at granularity: day # set the granularity of the time column dimension: type: time - name: order_status dimension: type: categorical metrics: - name: order_count type: simple agg: count expr: 1 - name: total_revenue type: simple agg: sum expr: amount - name: average_order_value type: simple agg: avg expr: amount ``` ## Validation After writing YAML, validate in two stages: 1. **Parse Validation**: Run `dbt parse` to confirm YAML syntax and references 2. **Semantic Layer Validation**: - `dbt sl validate` (dbt Cloud CLI / dbt Fusion) - `mf validate-configs` (MetricFlow CLI) Do not consider work complete until both validations pass. ## Editing Existing Components When modifying existing semantic layer config: - Check if the model's YAML already has `semantic_model:` block - Read existing entities, dimensions, and metrics before making changes - Preserve all existing YAML content not being modified - After edits, run full validation to ensure nothing broke ## YAML Format Reference ### Derived Dimensions and Entities if the user wants to create a derived dimension or entity that is not a column within the dbt model, then we can use the `derived_semantics` block. ```yaml derived_semantics: dimensions: - name: order_size_bucket type: categorical expr: case when amount > 100 then 'large' else 'small' end label: "Order Size" entities: - name: order_customer_key type: foreign expr: "order_id || '-' || customer_id" ``` ## Advanced Metric Examples All simple metrics are defined at the model level under the `metrics` key. Advanced metrics that refer to simple metrics _within the same model_ are defined within a model's YAML entry the `models.metrics` key. Advanced metrics that refer to simple metrics _across different models_ are defined at the top level under the `metrics` key. ### Derived Metrics ```yaml - name: revenue_per_order type: derived description: Average revenue per order label: Revenue per Order expr: total_revenue / total_orders input_metrics: - name: total_revenue - name: total_orders # With offset window - name: revenue_growth type: derived expr: total_revenue - revenue_last_week input_metrics: - name: total_revenue - name: total_revenue alias: revenue_last_week offset_window: 1 week filter: "{{ Dimension('order__status') }} = 'completed'" ``` ### Cumulative Metrics ```yaml - name: cumulative_revenue type: cumulative description: Running total of revenue label: Cumulative Revenue input_metric: total_revenue grain_to_date: week period_agg: first # With window - name: trailing_7d_revenue type: cumulative input_metric: total_revenue window: 7 days ``` ### Ratio Metrics ```yaml - name: conversion_rate type: ratio description: Orders divided by visits label: Conversion Rate numerator: total_orders denominator: total_visits # With filters - name: premium_conversion_rate type: ratio numerator: name: total_orders filter: "{{ Dimension('order__customer_segment') }} = 'premium'" alias: premium_orders denominator: total_visits ``` ### Conversion Metrics ```yaml - name: signup_to_purchase type: conversion description: Rate of signups converting to purchase label: Signup to Purchase entity: customer calculation: conversion_rate base_metric: signups conversion_metric: purchases window: 7 days constant_properties: - base_property: signup_channel conversion_property: purchase_channel ``` ### Top-level Metrics (Cross-model) ```yaml # For metrics depending on multiple semantic models metrics: - name: cross_model_ratio type: ratio numerator: name: metric_from_model_a filter: "{{ Dimension('entity__dim') }} > 10" denominator: name: metric_from_model_b config: group: example_group tags: - example_tag meta: owner: "@someone" ``` ### Filtering Metrics Filters can be added to simple metrics or metric inputs to advanced metrics. The format of a filters is a Jinja template that can reference entities, dimensions, and metrics, a boolean operator, and a value. Ensure the value matches the type of the column being filtered. Examples filter: | {{ Entity('entity_name') }} = 'value' filter: | {{ Dimension('primary_entity__dimension_name') }} > 100 filter: | {{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01' filter: | {{ Metric('metric_name', group_by=['entity_name']) }} > 100 ## Key Formatting Rules - `semantic_model:` block at model level with `enabled: true` - `agg_time_dimension:` at model level (not nested under `semantic_model`) - `entity:` and `dimension:` on columns (can use shorthand or full form) - `granularity:` required at column level for time dimensions - `metrics:` array at model level for single-model metrics - Top-level `metrics:` key for cross-model metrics (derived, ratio, cumulative, conversion only) ## Best Practices - **Start with entities** - Identify the grain before defining dimensions or metrics - **Use shorthand where possible** - `entity: primary` instead of full nested form for simple cases - **Name metrics for business users** - Use clear `label` values for non-technical users - **Keep metrics close to their data** - Simple metrics on their semantic model; top-level only for cross-model - **Set appropriate granularity** - Match the actual data grain (usually `day`) ## Common Pitfalls | Pitfall | Fix | |---------|-----| | Missing `agg_time_dimension` | Every semantic model needs a default time dimension | | `granularity` inside `dimension:` block | Must be at column level | | Both `entity` and `dimension` on same column | A column can only be one or the other | | Simple metrics in top-level `metrics:` | Top-level is only for cross-model metrics | | Using `window` and `grain_to_date` together | Cumulative metrics can only have one | | Missing `input_metrics` on derived metrics | Must list metrics used in `expr` |