---
name: causal-inference-engine
description: Causal inference skill for estimating treatment effects and understanding causal relationships in business data
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Bash
metadata:
  specialization: decision-intelligence
  domain: business
  category: forecasting
  priority: medium
  shared-candidate: true
  tools-libraries:
    - econml
    - dowhy
    - causalml
    - statsmodels
---

# Causal Inference Engine

## Overview

The Causal Inference Engine skill provides sophisticated methods for estimating causal effects from observational data. It enables business analysts to move beyond correlation to understand true cause-and-effect relationships, supporting evidence-based decision-making for interventions, policy changes, and strategic initiatives.

## Capabilities

- Propensity score matching
- Inverse probability weighting
- Difference-in-differences
- Instrumental variables
- Regression discontinuity
- Synthetic control methods
- Causal forest implementation
- Sensitivity analysis to unobserved confounding

## Used By Processes

- A/B Testing and Experimentation Framework
- Predictive Analytics Implementation
- Win/Loss Analysis Program

## Usage

### Problem Definition

```python
# Define causal question
causal_problem = {
    "treatment": "marketing_campaign",
    "outcome": "purchase_conversion",
    "confounders": ["customer_segment", "prior_purchases", "channel", "region"],
    "instruments": ["random_assignment_probability"],  # if available
    "effect_type": "ATE",  # Average Treatment Effect
    "heterogeneity": ["customer_segment", "tenure"]  # for CATE
}
```

### Propensity Score Matching

```python
# Propensity score configuration
psm_config = {
    "method": "propensity_score_matching",
    "estimator": "logistic_regression",
    "matching": {
        "method": "nearest_neighbor",
        "caliper": 0.1,
        "replacement": False,
        "ratio": 1
    },
    "balance_check": True,
    "covariates": ["age", "income", "prior_purchases", "engagement_score"]
}
```

### Difference-in-Differences

```python
# DiD configuration
did_config = {
    "method": "difference_in_differences",
    "treatment_group": "stores_with_intervention",
    "control_group": "stores_without_intervention",
    "pre_period": ["2023-01", "2023-06"],
    "post_period": ["2023-07", "2023-12"],
    "parallel_trends_test": True,
    "fixed_effects": ["store_id", "month"]
}
```

### Causal Forest (Heterogeneous Effects)

```python
# Causal forest for CATE
causal_forest_config = {
    "method": "causal_forest",
    "n_trees": 1000,
    "honest": True,
    "effect_modifiers": ["customer_segment", "tenure", "region"],
    "output": {
        "individual_effects": True,
        "confidence_intervals": True,
        "variable_importance": True
    }
}
```

## Method Selection Guide

| Method | When to Use | Assumptions |
|--------|-------------|-------------|
| Propensity Score | Selection on observables | No unmeasured confounding |
| Difference-in-Differences | Pre/post with control group | Parallel trends |
| Regression Discontinuity | Threshold-based treatment | Continuity at threshold |
| Instrumental Variables | Unmeasured confounding exists | Valid instrument |
| Synthetic Control | Aggregate-level intervention | Pre-treatment fit |
| Causal Forest | Heterogeneous effects | Unconfoundedness |

## Input Schema

```json
{
  "causal_problem": {
    "treatment": "string",
    "outcome": "string",
    "confounders": ["string"],
    "effect_type": "ATE|ATT|CATE"
  },
  "data": "dataframe or path",
  "method_config": {
    "method": "string",
    "parameters": "object"
  },
  "validation": {
    "refutation_tests": ["placebo", "subset", "random_common_cause"],
    "sensitivity_analysis": "boolean"
  }
}
```

## Output Schema

```json
{
  "effect_estimate": {
    "point_estimate": "number",
    "confidence_interval": ["number", "number"],
    "p_value": "number",
    "standard_error": "number"
  },
  "heterogeneous_effects": {
    "subgroup": {
      "effect": "number",
      "ci": ["number", "number"]
    }
  },
  "diagnostics": {
    "balance_statistics": "object",
    "parallel_trends_test": "object",
    "first_stage_f_stat": "number (IV)"
  },
  "refutation_results": {
    "test_name": {
      "original_effect": "number",
      "refuted_effect": "number",
      "passed": "boolean"
    }
  },
  "sensitivity": {
    "robustness_value": "number",
    "interpretation": "string"
  }
}
```

## Best Practices

1. Clearly articulate the causal question before analysis
2. Draw a causal diagram (DAG) to identify confounders
3. Check covariate balance after matching/weighting
4. Perform sensitivity analysis to unmeasured confounding
5. Use multiple refutation tests to validate results
6. Report effect sizes with confidence intervals
7. Be transparent about assumptions and limitations

## Refutation Tests

| Test | What It Checks |
|------|----------------|
| Placebo Treatment | Effect should be zero with random treatment |
| Placebo Outcome | Effect should be zero with unrelated outcome |
| Subset Validation | Effect should hold in subsamples |
| Random Common Cause | Adding random confounder shouldn't change effect |

## Integration Points

- Feeds into Hypothesis Tracker for test results
- Connects with Experimentation Manager agent
- Supports Predictive Analyst for causal features
- Integrates with Bayesian Network Analyzer for causal graphs
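
## Worked Example

The Difference-in-Differences method and the Placebo Treatment refutation test described above can be illustrated with a minimal, dependency-free sketch. This is not the skill's implementation: the function names (`did_estimate`, `placebo_treatment_refutation`), the synthetic store data, and the 10% pass tolerance are all assumptions made for illustration. The returned dictionary mirrors one entry of `refutation_results` in the Output Schema.

```python
import random
from statistics import mean


def did_estimate(stores):
    """Two-group, two-period difference-in-differences.

    stores: list of (treated, pre_outcome, post_outcome) tuples, one per store.
    Returns (change in treated group) - (change in control group).
    """
    treated_change = mean(post - pre for treated, pre, post in stores if treated)
    control_change = mean(post - pre for treated, pre, post in stores if not treated)
    return treated_change - control_change


def placebo_treatment_refutation(stores, n_runs=500, tolerance=0.1, seed=0):
    """Re-estimate the effect after randomly shuffling treatment labels.

    With random labels the average estimate should be close to zero.
    `tolerance` is a hypothetical pass threshold, relative to the original effect.
    """
    rng = random.Random(seed)
    original = did_estimate(stores)
    labels = [treated for treated, _, _ in stores]

    placebo_effects = []
    for _ in range(n_runs):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # keeps group sizes, breaks the link to outcomes
        relabeled = [(t, pre, post) for t, (_, pre, post) in zip(shuffled, stores)]
        placebo_effects.append(did_estimate(relabeled))

    refuted = mean(placebo_effects)
    return {
        "original_effect": original,
        "refuted_effect": refuted,
        "passed": abs(refuted) < tolerance * abs(original),
    }


# Synthetic data (hypothetical): 20 stores, a common trend of +5 for everyone,
# and a true treatment effect of +3 for the treated half.
stores = []
for i in range(20):
    treated = i % 2 == 0
    pre = 100.0 + i
    post = pre + 5.0 + (3.0 if treated else 0.0)
    stores.append((treated, pre, post))

result = placebo_treatment_refutation(stores)
print(result)
```

By construction the original estimate recovers the +3 effect, because the common +5 trend cancels out under parallel trends, while the placebo average should sit near zero. On real data, a placebo estimate far from zero would suggest that the original effect reflects something other than the treatment.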