--- name: data-cleaning description: Data cleaning, preprocessing, and quality assurance techniques version: "2.0.0" sasmp_version: "2.0.0" bonded_agent: 05-programming-expert bond_type: SECONDARY_BOND # Skill Configuration config: atomic: true retry_enabled: true max_retries: 3 backoff_strategy: exponential # Parameter Validation parameters: tool_preference: type: string required: true enum: [python, r, excel, sql] default: python data_size: type: string required: false enum: [small, medium, large] default: medium # Observability observability: logging_level: info metrics: [rows_cleaned, missing_handled, duplicates_removed] --- # Data Cleaning Skill ## Overview Master data cleaning and preprocessing techniques essential for reliable analytics. ## Topics Covered - Missing value handling (imputation, deletion) - Outlier detection and treatment - Data type conversion and validation - Duplicate identification and removal - String cleaning and normalization ## Learning Outcomes - Clean messy datasets - Handle missing data appropriately - Detect and treat outliers - Ensure data quality ## Error Handling | Error Type | Cause | Recovery | |------------|-------|----------| | Memory error | Dataset too large | Use chunking or sampling | | Type conversion failed | Invalid data format | Apply preprocessing first | | Encoding issues | Wrong character encoding | Detect and specify encoding | | Validation failure | Data doesn't meet schema | Review and adjust validation rules | ## Related Skills - programming (for automation) - foundations (for data quality concepts) - databases-sql (for SQL-based cleaning)