---
name: aps-doc-staging
description: Expert documentation generation for staging transformation layers. Auto-detects SQL engine (Presto/Trino vs Hive), documents transformation rules, PII handling, deduplication strategies, and data quality rules. Use when documenting staging transformations.
---

# APS Staging Transformation Documentation Expert

Specialized skill for generating comprehensive documentation for staging transformation layers. Automatically detects SQL engines, extracts transformation rules, documents PII handling, and analyzes deduplication strategies.

## When to Use This Skill

Use this skill when:
- Documenting staging transformation workflows
- Creating documentation for data cleaning and standardization logic
- Documenting PII handling and security transformations
- Creating documentation for deduplication strategies
- Documenting data quality rules and validations
- Generating documentation for Presto/Trino or Hive transformations

**Example requests:**
```
"Document the staging transformation for customer events"
"Create staging layer documentation with transformation rules"
"Document PII handling in staging transformations"
"Generate staging documentation following this template: [Confluence URL]"
```

---

## 🚨 MANDATORY: Codebase Access Required

**WITHOUT codebase access = NO documentation. Period.**

**If no codebase access is provided:**
```
I cannot create technical documentation without codebase access.

Required:
- Directory path to staging workflows
- Access to .dig, .sql, .yml files

Without access, I cannot extract real transformation SQL, PII logic, or table names.

Provide path: "Code is in /path/to/staging/"
```

**Before proceeding:**
1. Ask for the codebase path if not provided
2. Use Glob to verify SQL files exist
3. STOP if files cannot be read

**Documentation MUST contain:**
- Real transformation SQL from .sql files
- Actual PII hashing/masking logic
- Real table/column names
- Working SQL examples from code

**NO generic placeholders. Only real, extracted data.**

## REQUIRED Documentation Template

**Follow this EXACT structure (analyzed from production examples):**

```markdown
# Staging Transformation - {Engine} Engine

## Overview

**Engine**: {Presto/Trino or Hive}
**Architecture**: {Loop-based / Other}
**Processing Mode**: {Incremental / Full}
**Location**: {directory path}

### Key Characteristics
{List key features from actual workflow}

---

## Architecture Overview

### Directory Structure
{Actual directory tree from codebase}

### Core Components

#### 1. Main Workflow File
{Name and purpose}

**Key Features:**
- {Feature from actual .dig file}
- {Feature from actual .dig file}

**Workflow Phases:**
{Extract from actual workflow}

#### 2. Configuration File
{Name and structure from actual codebase}

**Configuration Structure:**
{Real YAML structure}

**Table Configuration Fields:**
{Document actual fields used}

#### 3. SQL Transformation Files
{Types: init, incremental, upsert - from actual codebase}

---

## Processing Flow

### Initial Load (First Run)
{Step-by-step from actual workflow}

### Incremental Load (Subsequent Runs)
{Step-by-step from actual workflow}

---

## Data Transformation Rules
{Document ACTUAL transformation rules from codebase}

### 1. Date/Timestamp Processing
{Real SQL examples from transformation files}

### 2. String Standardization
{Real SQL examples}

### 3. JSON Extraction
{Real examples, if present}

### 4. Email Processing
{Real examples, if present}

### 5. Phone Number Processing
{Real examples, if present}

### 6. Deduplication Logic
{Real ROW_NUMBER() or DISTINCT logic}

### 7. Metadata Columns
{Real source_system, load_timestamp columns}

---

## Table-Specific Transformation Rules
{If a reference table such as staging_trnsfrm_rules is used:}

**Reference Table**: {database}.{table}
**Purpose**: {explain}
**Schema**: {real schema}
**How Used**: {explain how the workflow reads these rules}

---

## Current Implementation

**Configured Tables**: {List actual tables from config}

---

## How to Add New Source Tables
{Step-by-step with real examples}

---

## Monitoring & Troubleshooting

**Key Queries**: {Real SQL for checking status, data quality}
**Common Issues**: {Real issues and solutions}

---

## Best Practices
{List from actual production experience}

---

## Summary
{Brief recap of capabilities}
```

---

**Template Usage Notes:**
- Read actual workflows (.dig), configs (.yml), and SQL files
- Extract REAL transformation logic from SQL
- Document REAL deduplication strategies
- Use actual table/column names from codebase
- Include working SQL examples (the sketch below shows the style of SQL to capture)
- NO placeholders - only real extracted data
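To make that expectation concrete, the kind of transformation SQL this documentation should reproduce often resembles the following sketch. It is illustrative only: `raw_db.customer_events`, `stg_db.stg_customer_events`, and all column names are hypothetical, and Presto/Trino syntax is assumed. The generated documentation must quote the actual SQL found in the codebase, not this sketch.

```sql
-- Illustrative only: hypothetical Presto/Trino staging transformation showing the
-- patterns to document (timestamp normalization, string standardization, PII hashing,
-- deduplication, metadata columns). Real documentation uses SQL extracted from the codebase.
INSERT INTO stg_db.stg_customer_events
SELECT
    event_id,
    -- Date/timestamp processing: convert epoch milliseconds to a timestamp
    from_unixtime(event_time_ms / 1000)             AS event_ts,
    -- String standardization: trim and lower-case categorical fields
    lower(trim(event_type))                         AS event_type,
    -- PII handling: store a hash of the email instead of the raw value
    to_hex(sha256(to_utf8(lower(trim(email)))))     AS email_hash,
    -- Metadata columns
    'crm'                                           AS source_system,
    current_timestamp                               AS load_timestamp
FROM (
    SELECT
        *,
        -- Deduplication: keep only the latest record per business key
        row_number() OVER (
            PARTITION BY event_id
            ORDER BY event_time_ms DESC
        ) AS rn
    FROM raw_db.customer_events
) deduped
WHERE rn = 1;
```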
## Summary

This skill generates production-ready staging documentation by:
- Reading actual .dig workflows, .yml configs, and .sql files
- Following the exact template structure shown above
- Extracting real transformation rules from SQL
- Documenting actual deduplication logic
- Creating comprehensive documentation with working SQL examples

**Key capability:** Transforms a staging codebase into professional Confluence documentation with all transformation rules documented.
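As a further illustration of the "Key Queries" that the Monitoring & Troubleshooting section of the template asks for, a basic data-quality check might look like the sketch below. The table and column names are again hypothetical and Presto/Trino syntax is assumed; real documentation should use the queries the team actually runs.

```sql
-- Illustrative only: hypothetical data-quality check verifying that the staging
-- business key is unique and showing when the staging table was last loaded.
SELECT
    count(*)                               AS staged_rows,
    count(DISTINCT event_id)               AS distinct_keys,
    count(*) - count(DISTINCT event_id)    AS duplicate_keys,
    max(load_timestamp)                    AS last_load_at
FROM stg_db.stg_customer_events;
```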