--- name: dbt-testing description: dbt testing strategies using dbt_constraints for database-level enforcement, generic tests, and singular tests. Use this skill when implementing data quality checks, adding primary/foreign key constraints, creating custom tests, or establishing comprehensive testing frameworks across bronze/silver/gold layers. --- # dbt Testing ## Purpose Transform AI agents into experts on dbt testing strategies, providing guidance on implementing comprehensive data quality checks with database-enforced constraints, generic tests, and custom singular tests to ensure data integrity across all layers. ## When to Use This Skill Activate this skill when users ask about: - Implementing data quality tests - Adding primary key and foreign key constraints - Using dbt_constraints package for database-level enforcement - Creating generic (reusable) tests - Writing singular (one-off) tests - Testing strategies by layer (bronze/silver/gold) - Debugging test failures - Configuring test severity levels - Storing test failures for analysis **Official dbt Documentation**: [Testing](https://docs.getdbt.com/docs/build/tests) --- ## Testing Philosophy Implement tests in this order for maximum data quality: 1. **Primary Keys** - Every dimension must have one 2. **Foreign Keys** - All fact relationships 3. **Unique Keys** - Business key constraints 4. **Business Rules** - Domain-specific validations 5. **Data Quality** - Completeness, accuracy, consistency --- ## Why Use dbt_constraints? The `dbt_constraints` package provides database-level enforcement (not just dbt tests): ✅ **Database Enforcement** - Creates actual constraints in the data warehouse ✅ **Performance** - Database-level constraints improve query optimization ✅ **Data Integrity** - Prevents invalid data at all access points (not just dbt) ✅ **Documentation** - Constraints visible in database metadata and BI tools ✅ **Query Optimization** - Database can use constraints for better execution plans **Standard dbt tests** only validate during `dbt test` runs. **dbt_constraints** creates real database constraints that are enforced 24/7. **Official dbt_constraints Documentation**: [GitHub - Snowflake-Labs/dbt_constraints](https://github.com/Snowflake-Labs/dbt_constraints) --- ## Package Installation ```yaml # packages.yml packages: - package: Snowflake-Labs/dbt_constraints version: [">=0.8.0", "<1.0.0"] - package: dbt-labs/dbt_utils version: [">=1.0.0", "<2.0.0"] ``` Install packages: ```bash dbt deps ``` **Official dbt Docs**: [Package Management](https://docs.getdbt.com/docs/build/packages) --- ## Primary Key Testing ### Simple Primary Key (dbt_constraints) **Required for every dimension:** ```yaml # models/gold/_models.yml models: - name: dim_customers columns: - name: customer_id tests: - dbt_constraints.primary_key ``` ### Composite Primary Key **When primary key spans multiple columns:** ```yaml models: - name: fct_order_lines tests: - dbt_constraints.primary_key: column_names: - order_id - line_number ``` ### Alternative: Built-in dbt Tests **Not recommended - no database enforcement:** ```yaml columns: - name: product_id tests: - not_null - unique ``` **Limitation**: Only validates during `dbt test` runs, doesn't prevent bad data from other sources. --- ## Foreign Key Testing ### Simple Foreign Key (dbt_constraints) **Ensures referential integrity:** ```yaml models: - name: fct_orders columns: - name: customer_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('dim_customers') pk_column_name: customer_id ``` ### Multiple Foreign Keys **For facts with multiple dimension relationships:** ```yaml models: - name: fct_order_lines columns: - name: order_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('fct_orders') pk_column_name: order_id - name: product_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('dim_products') pk_column_name: product_id - name: customer_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('dim_customers') pk_column_name: customer_id ``` ### Alternative: Built-in dbt Relationships Test **Not recommended - no database enforcement:** ```yaml columns: - name: customer_id tests: - relationships: to: ref('dim_customers') field: customer_id ``` --- ## Unique Key Testing ### Simple Unique Key (dbt_constraints) **For business keys (non-primary keys that must be unique):** ```yaml columns: - name: customer_email tests: - dbt_constraints.unique_key ``` ### Composite Unique Key **When uniqueness spans multiple columns:** ```yaml models: - name: stg_orders tests: - dbt_constraints.unique_key: column_names: - order_number - order_source ``` --- ## Generic Tests (Reusable) ### Built-in dbt Tests ```yaml columns: - name: order_status tests: - not_null - accepted_values: values: ["pending", "processing", "shipped", "delivered", "cancelled"] - name: order_amount tests: - not_null ``` ### dbt_utils Tests **Powerful generic tests from dbt_utils package:** ```yaml columns: - name: customer_email tests: - dbt_utils.not_null_proportion: at_least: 0.95 # 95% of rows must have email - name: order_amount tests: - dbt_utils.accepted_range: min_value: 0 max_value: 1000000 - name: customer_status tests: - dbt_utils.not_empty_string ``` **Official dbt_utils Documentation**: [dbt_utils - Generic Tests](https://github.com/dbt-labs/dbt-utils#generic-tests) --- ### Custom Generic Tests **Create reusable test for common patterns:** ```sql -- tests/generic/test_positive_values.sql {% test positive_values(model, column_name) %} select count(*) from {{ model }} where {{ column_name }} <= 0 {% endtest %} ``` **Usage:** ```yaml columns: - name: order_total tests: - positive_values - name: quantity tests: - positive_values ``` **Another Example: Date Range Test** ```sql -- tests/generic/test_recent_data.sql {% test recent_data(model, column_name, days_ago=30) %} select count(*) from {{ model }} where {{ column_name }} < dateadd(day, -{{ days_ago }}, current_date()) {% endtest %} ``` **Usage:** ```yaml columns: - name: order_date tests: - recent_data: days_ago: 7 # Alert if no orders in last 7 days ``` --- ## Singular Tests (One-Off) **For complex business logic that doesn't fit generic tests:** ```sql -- tests/singular/test_order_dates_sequential.sql with date_validation as ( select o.order_id, o.order_date, c.signup_date from {{ ref('fct_orders') }} o join {{ ref('dim_customers') }} c on o.customer_id = c.customer_id where o.order_date < c.signup_date -- Order before signup = invalid ) select * from date_validation ``` **Test fails if ANY rows are returned.** ### More Singular Test Examples **Revenue Reconciliation:** ```sql -- tests/singular/test_revenue_reconciliation.sql -- Ensure fact table revenue matches source system with fact_revenue as ( select sum(order_amount) as total_revenue from {{ ref('fct_orders') }} where order_date = current_date() - 1 ), source_revenue as ( select sum(amount) as total_revenue from {{ source('erp', 'orders') }} where order_date = current_date() - 1 ), comparison as ( select f.total_revenue as fact_revenue, s.total_revenue as source_revenue, abs(f.total_revenue - s.total_revenue) as difference from fact_revenue f cross join source_revenue s ) select * from comparison where difference > 0.01 -- Tolerance of 1 cent ``` **Referential Integrity Check:** ```sql -- tests/singular/test_orphaned_orders.sql -- Find orders with invalid customer_id (not in dim_customers) select o.order_id, o.customer_id from {{ ref('fct_orders') }} o left join {{ ref('dim_customers') }} c on o.customer_id = c.customer_id where c.customer_id is null and o.customer_id != -1 -- Exclude ghost key ``` **Official dbt Documentation**: [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-tests) --- ## Testing by Layer ### Bronze Layer (Staging) **Focus**: Basic data quality at source ```yaml models: - name: stg_tpc_h__customers columns: - name: customer_id tests: - dbt_constraints.primary_key - name: customer_email tests: - not_null ``` **Keep it simple** - just verify source data integrity. --- ### Silver Layer (Intermediate) **Focus**: Business rule validation, calculated fields ```yaml models: - name: int_customers__with_orders columns: - name: customer_id tests: - dbt_constraints.primary_key - name: lifetime_orders tests: - not_null - dbt_utils.accepted_range: min_value: 0 - name: lifetime_value tests: - not_null - dbt_utils.accepted_range: min_value: 0 ``` **Add business logic validation** - ensure calculated fields make sense. --- ### Gold Layer (Marts) **Focus**: Comprehensive constraint enforcement with dbt_constraints ```yaml models: - name: dim_customers description: "Customer dimension with full history and metrics" columns: - name: customer_id description: "Unique customer identifier" tests: - dbt_constraints.primary_key - name: customer_tier description: "Customer value classification" tests: - accepted_values: values: ["bronze", "silver", "gold", "platinum"] - name: customer_email tests: - dbt_constraints.unique_key - name: fct_orders description: "Order transactions fact table" columns: - name: order_id tests: - dbt_constraints.primary_key - name: customer_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('dim_customers') pk_column_name: customer_id - name: product_id tests: - dbt_constraints.foreign_key: pk_table_name: ref('dim_products') pk_column_name: product_id - name: order_amount tests: - not_null - dbt_utils.accepted_range: min_value: 0 ``` **Maximum enforcement** - use all constraint types to ensure production data quality. --- ## Test Configuration ### Store Test Failures **Analyze failed test records:** ```bash dbt test --store-failures ``` ```yaml # dbt_project.yml tests: +store_failures: true +schema: dbt_test_failures ``` **Query failures:** ```sql select * from dbt_test_failures.not_null_dim_customers_customer_email ``` --- ### Test Severity Levels **Warn vs Error:** ```yaml columns: - name: customer_email tests: - dbt_constraints.unique_key: config: severity: warn # or 'error' (default) ``` **Severity Behavior:** - `error`: Test failure stops dbt execution (exit code 1) - `warn`: Test failure logs warning but continues (exit code 0) **Use `warn` for**: - Data quality checks that shouldn't block deployment - Known edge cases during migration - Monitoring tests --- ### Limit Test Execution **Test specific model:** ```bash dbt test --select dim_customers ``` **Test by type:** ```bash dbt test --select test_type:generic # All generic tests dbt test --select test_type:singular # All singular tests ``` **Test with dependencies:** ```bash dbt test --select +dim_customers+ # Test model and all dependencies ``` **Official dbt Documentation**: [Test Selection](https://docs.getdbt.com/reference/node-selection/test-selection-examples) --- ## Running Tests ```bash # Run all tests dbt test # Build models and test together (recommended) dbt build # Runs models, then tests # Test specific model dbt test --select dim_customers # Test specific column dbt test --select dim_customers,column:customer_id # Test by layer dbt test --select tag:gold # Test with failures stored dbt test --store-failures --select fct_orders ``` **Best Practice**: Use `dbt build` instead of `dbt run` + `dbt test` separately. --- ## Testing Best Practices ### 1. Test Early and Often Add tests as you build models, not after deployment. ### 2. Layer-Appropriate Testing - **Bronze**: Basic not_null and primary key tests - **Silver**: Business rule validation, range checks - **Gold**: Comprehensive constraint enforcement with dbt_constraints ### 3. Use dbt_constraints for Production Database-level constraints provide: - 24/7 enforcement (not just during dbt runs) - Performance optimization - Better integration with BI tools ### 4. Document Test Purpose ```yaml columns: - name: customer_tier description: "Customer segmentation based on lifetime value" tests: - accepted_values: values: ["bronze", "silver", "gold", "platinum"] config: severity: error ``` ### 5. Balance Coverage vs Performance - Don't over-test trivial columns - Focus on business-critical fields - Use sampling for very large tables if needed --- ## Testing Checklist Before moving to production: - [ ] All dimensions have primary key tests - [ ] All facts have foreign key tests to dimensions - [ ] Business rules are validated with tests - [ ] Data quality tests are in place (not_null, accepted_values) - [ ] Tests run successfully in CI/CD pipeline - [ ] dbt_constraints enabled for all production marts - [ ] Test failures configured to store in database - [ ] Singular tests created for complex business logic --- ## Helping Users with Testing ### Strategy for Assisting Users When users ask about testing: 1. **Identify model type**: Dimension? Fact? Intermediate? 2. **Recommend appropriate tests**: By layer and purpose 3. **Prioritize constraints**: Primary keys → Foreign keys → Business rules 4. **Provide complete examples**: Working YAML configurations 5. **Explain benefits**: Why dbt_constraints over standard tests 6. **Show how to run**: Commands and debugging approaches ### Common User Questions **"What tests should I add?"** - Start with dbt_constraints for primary/foreign keys - Add not_null for required fields - Use accepted_values for enums - Create singular tests for complex business logic **"Why use dbt_constraints instead of regular tests?"** - Database-level enforcement (24/7, not just during dbt runs) - Better query performance - Prevents bad data from any source - Visible in database metadata **"How do I debug test failures?"** - Use `--store-failures` to save failing records - Query the test failure table - Review actual data that failed the test - Add more specific tests to isolate issue --- ## Related Official Documentation - [dbt Docs: Tests](https://docs.getdbt.com/docs/build/tests) - [dbt_constraints Package](https://github.com/Snowflake-Labs/dbt_constraints) - [dbt_utils Generic Tests](https://github.com/dbt-labs/dbt-utils#generic-tests) - [dbt Docs: Test Selection](https://docs.getdbt.com/reference/node-selection/test-selection-examples) --- **Goal**: Transform AI agents into expert dbt testers who implement comprehensive, database-enforced data quality checks that protect data integrity across all layers and access patterns.