--- name: data-quality-test-generator description: Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests. --- # Data Quality Test Generator Generate comprehensive dbt test suites for Fantasy Football Analytics models following Kimball patterns, implementation requirements, and dbt 1.10+ syntax. ## When to Use This Skill Use this skill proactively when: - Creating tests for new dbt models (staging, facts, dimensions, marts) - User asks to "add tests for {model}" or "improve test coverage" - Implementing data quality gates per implementation requirements - Standardizing existing tests to dbt 1.10+ syntax - Creating tests alongside models (integration with dbt-model-builder skill) ## dbt 1.10+ Syntax Requirements **CRITICAL**: FF Analytics uses dbt 1.10+ which requires the new syntax: - Use `data_tests:` not `tests:` - Wrap test arguments in `arguments:` block - Use `config:` for test configuration (severity, where clauses) ### Correct vs Incorrect Syntax **✅ CORRECT (dbt 1.10+):** ```yaml data_tests: - accepted_values: arguments: values: ['value1', 'value2'] config: severity: error ``` **❌ INCORRECT (old syntax):** ```yaml tests: - accepted_values: values: ['value1', 'value2'] severity: error ``` ## Test Generation by Model Type ### Task 1: Staging Model Tests Staging models normalize raw provider data and require schema conformance tests. **Test Priorities:** 1. **not_null** on all primary key columns 2. **unique** on single-column natural keys 3. **accepted_values** on categorical/enum columns 4. Basic data type validation **Example Pattern:** ```yaml version: 2 models: - name: stg_nflverse__weekly description: "NFLverse weekly player stats" columns: - name: player_id description: "gsis_id from nflverse" data_tests: - not_null - name: season data_tests: - not_null - dbt_utils.expression_is_true: arguments: expression: ">= 2020" # Reasonable season bounds - name: week data_tests: - not_null - accepted_values: arguments: values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] - name: position data_tests: - not_null - accepted_values: arguments: values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF'] ``` **See**: `references/example_staging_tests.yml` ### Task 2: Fact Table Tests Fact tables capture measurable events and require grain + FK + measure validation. **Test Priorities:** 1. **Grain uniqueness** using `dbt_utils.unique_combination_of_columns` 2. **Relationship tests** for all foreign keys to dimensions 3. **not_null** on required measures 4. **accepted_values** on 2×2 model enums (measure_domain, stat_kind, horizon) 5. **Expression tests** for business logic (non-negative scores, etc.) **Example Pattern:** ```yaml version: 2 models: - name: fact_player_stats description: "Player statistics fact table (grain: player + game + stat)" # Grain uniqueness test data_tests: - dbt_utils.unique_combination_of_columns: arguments: combination_of_columns: - player_id - game_id - stat_name - measure_domain - stat_kind config: severity: error columns: # Foreign key to dim_player - name: player_id description: "Foreign key to dim_player (mfl_id)" data_tests: - not_null - relationships: arguments: to: ref('dim_player') field: player_id # Foreign key to dim_game - name: game_id description: "Foreign key to dim_game" data_tests: - not_null - relationships: arguments: to: ref('dim_game') field: game_id # 2x2 model enum: measure_domain - name: measure_domain description: "Real-world vs fantasy scoring dimension" data_tests: - not_null - accepted_values: arguments: values: ['real_world', 'fantasy'] # 2x2 model enum: stat_kind - name: stat_kind description: "Actuals vs projections dimension" data_tests: - not_null - accepted_values: arguments: values: ['actual', 'projection'] # Measure validation - name: stat_value description: "Value of the statistical measure" data_tests: - not_null ``` **See**: `references/example_fact_tests.yml` ### Task 3: Dimension Table Tests Dimensions provide descriptive context and require uniqueness + SCD validation. **Test Priorities:** 1. **unique** + **not_null** on surrogate key (dimension PK) 2. **not_null** on natural key columns 3. **SCD Type 2 tests** (if applicable): - `valid_from <= valid_to` (or valid_to IS NULL for current) - No overlapping date ranges for same natural key 4. **accepted_values** on categorical attributes **Example Pattern:** ```yaml version: 2 models: - name: dim_player description: "Player dimension with SCD Type 2 history" columns: # Surrogate key - name: player_sk description: "Surrogate key for SCD Type 2" data_tests: - unique - not_null # Natural key - name: player_id description: "Natural key (mfl_id canonical)" data_tests: - not_null # SCD Type 2 validity - name: valid_from description: "Start of validity period" data_tests: - not_null - name: valid_to description: "End of validity period (NULL = current)" # SCD Type 2 date logic test - name: _scd_validation description: "Ensure valid_from <= valid_to when valid_to is not null" data_tests: - dbt_utils.expression_is_true: arguments: expression: "valid_from <= COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')" # Categorical attributes - name: position description: "Player position" data_tests: - accepted_values: arguments: values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF'] - name: status description: "Active/Inactive/Retired" data_tests: - accepted_values: arguments: values: ['active', 'inactive', 'retired', 'unknown'] ``` **See**: `references/example_dimension_tests.yml` ### Task 4: Mart Tests Marts are analytics-ready views requiring business logic validation. **Test Priorities:** 1. **Row count thresholds** (dbt_utils.expression_is_true on table) 2. **Metric validation** (non-negative fantasy points, reasonable ranges) 3. **Partition completeness** (all weeks/seasons represented) 4. **Aggregate consistency** (totals match source facts) **Example Pattern:** ```yaml version: 2 models: - name: mart_fantasy_actuals_weekly description: "Weekly fantasy points by player (analytics-ready)" data_tests: # Minimum row count threshold - dbt_utils.expression_is_true: arguments: expression: "(SELECT COUNT(*) FROM {{ ref('mart_fantasy_actuals_weekly') }}) > 10000" config: severity: warn columns: - name: player_name data_tests: - not_null - name: fantasy_points description: "Total fantasy points for the week" data_tests: - not_null - dbt_utils.expression_is_true: arguments: expression: ">= 0" # Fantasy points can't be negative - name: games_played data_tests: - accepted_values: arguments: values: [0, 1] # Weekly grain, max 1 game per week ``` ### Task 5: Source Freshness Tests Implement freshness monitoring for critical data sources. **Example Freshness Policies** (always check latest documentation for the most up to date policies) - **NFLverse**: warn_after 24h, error_after 48h (during season) - **Commissioner Sheets**: warn_after 12h, error_after 24h - **KTC**: warn_after 48h, error_after 96h (less time-sensitive) **Example Pattern:** ```yaml version: 2 sources: - name: nflverse description: "NFLverse data provider" freshness: warn_after: {count: 24, period: hour} error_after: {count: 48, period: hour} tables: - name: weekly description: "Weekly player statistics" identifier: "dt=*/**" # Partitioned by date # Freshness test on max snapshot_date freshness: warn_after: {count: 24, period: hour} error_after: {count: 48, period: hour} ``` ## Data Quality Requirements The following tests are **required** in general: 1. **Accepted values on enums** - All categorical columns must have accepted_values tests 2. **Freshness tests** - All source tables must have warn/error thresholds 3. **Grain uniqueness** - All fact tables must test grain with unique_combination_of_columns 4. **FK relationships** - All foreign keys must have relationship tests to dimensions ## Best Practices ### Test Coverage Targets - **Staging models**: >80% column coverage (focus on PKs and enums) - **Fact tables**: 100% grain + FK coverage - **Dimensions**: 100% PK/NK coverage - **Marts**: Business logic validation (thresholds, non-negative, etc.) ### Test Naming Tests are auto-named by dbt. For custom tests, use descriptive names: ```yaml - name: _custom_business_rule description: "Validate that revenue >= cost for all transactions" ``` ### Severity Levels - **error**: Blocking failures (grain violations, missing FKs) - **warn**: Non-blocking issues (freshness delays, low row counts) ```yaml config: severity: warn # Use for non-critical tests ``` ### Where Clauses Use `where` clauses to filter test scope: ```yaml data_tests: - relationships: arguments: to: ref('dim_player') field: player_id config: where: "player_id IS NOT NULL AND season >= 2020" ``` ## Integration with Other Skills - **dbt-model-builder**: Create tests alongside model creation ## Resources ### assets/ - `test_suite_template.yml` - Base template with dbt 1.10+ syntax for common test patterns ### references/ - `example_fact_tests.yml` - Complete fact table test suite (fact_player_projections) - `example_staging_tests.yml` - Staging model test patterns - `example_dimension_tests.yml` - Dimension table test patterns with SCD Type 2 validation ## Common Test Patterns ### Testing 2×2 Model Enums The 2×2 stat model uses consistent enums across all facts: ```yaml # measure_domain: real_world vs fantasy - name: measure_domain data_tests: - accepted_values: arguments: values: ['real_world', 'fantasy'] # stat_kind: actual vs projection - name: stat_kind data_tests: - accepted_values: arguments: values: ['actual', 'projection'] # horizon: projection timeframe (only for stat_kind='projection') - name: horizon data_tests: - accepted_values: arguments: values: ['weekly', 'full_season', 'rest_of_season'] config: where: "stat_kind = 'projection'" ``` ### Testing Date Ranges ```yaml - name: game_date data_tests: - dbt_utils.expression_is_true: arguments: expression: ">= '2020-01-01'" # Reasonable lower bound - dbt_utils.expression_is_true: arguments: expression: "<= CURRENT_DATE + INTERVAL '1 year'" # No future dates beyond 1 year ``` ### Testing Aggregations ```yaml # Ensure weekly totals match season totals - name: _season_total_consistency data_tests: - dbt_utils.expression_is_true: arguments: expression: | ( SELECT SUM(weekly_points) FROM {{ ref('mart_weekly_points') }} WHERE season = 2024 ) = ( SELECT season_total FROM {{ ref('mart_season_totals') }} WHERE season = 2024 ) ```