---
name: causal-inference
description: "Bengio's causal inference for AI: Interventional reasoning, counterfactuals, and System 2 deep learning. World models with causal structure."
trit: 0
polarity: ERGODIC
source: "Bengio et al. 2019: Towards Causal Representation Learning"
technologies: [Python, DoWhy, CausalML, Pyro]
---

# Causal Inference Skill

> *"Current deep learning is System 1: fast, intuitive, but easily fooled. We need System 2: slow, deliberate, causal."*
> — Yoshua Bengio

## Overview

**Causal inference** enables:
1. **Interventional reasoning**: What happens if I *do* X?
2. **Counterfactual reasoning**: What *would have* happened if...?
3. **Transfer**: Causal structure generalizes across domains
4. **Robustness**: Causal models resist distribution shift

## Pearl's Causal Hierarchy

```
Level 3: Counterfactual (Imagining)
  "What would have happened if I had done X?"
  P(y_x | x', y')
          ▲
Level 2: Intervention (Doing)
  "What happens if I do X?"
  P(y | do(X))
          ▲
Level 1: Association (Seeing)
  "What does X tell me about Y?"
  P(y | x)
```

## Structural Causal Models (SCM)

```python
class StructuralCausalModel:
    """
    SCM: Variables, causal graph, structural equations.
    """
    
    def __init__(self, variables: List[str], graph: DAG, equations: Dict):
        self.variables = variables
        self.graph = graph  # Directed Acyclic Graph
        self.equations = equations  # X_i = f_i(parents(X_i), U_i)
    
    def intervene(self, intervention: Dict[str, float]) -> "SCM":
        """
        do(X = x): Replace equation for X with constant.
        
        This breaks incoming edges to X.
        """
        new_equations = self.equations.copy()
        for var, value in intervention.items():
            new_equations[var] = lambda *_: value
        
        new_graph = self.graph.remove_edges_to(intervention.keys())
        
        return StructuralCausalModel(
            self.variables, new_graph, new_equations
        )
    
    def counterfactual(self, evidence: Dict, intervention: Dict) -> Dict:
        """
        Counterfactual: What would Y be if X had been x, given we observed evidence?
        
        Three steps:
        1. Abduction: Infer noise terms from evidence
        2. Action: Apply intervention
        3. Prediction: Compute counterfactual outcome
        """
        # Step 1: Abduction - infer noise terms U
        noise_terms = self.abduct_noise(evidence)
        
        # Step 2: Action - apply intervention
        intervened_scm = self.intervene(intervention)
        
        # Step 3: Prediction - forward propagate with inferred noise
        counterfactual_world = intervened_scm.forward(noise_terms)
        
        return counterfactual_world


class CausalDiscovery:
    """
    Learn causal structure from data.
    """
    
    def __init__(self, data: pd.DataFrame):
        self.data = data
        
    def pc_algorithm(self) -> DAG:
        """
        PC Algorithm: Constraint-based causal discovery.
        
        1. Start with complete undirected graph
        2. Remove edges based on conditional independence tests
        3. Orient edges using v-structures and rules
        """
        from causallearn.search.ConstraintBased.PC import pc
        
        result = pc(self.data.values)
        return result.G
    
    def gflownet_discovery(self) -> Distribution[DAG]:
        """
        Use GFlowNet to sample DAGs proportional to likelihood.
        
        This gives a DISTRIBUTION over causal graphs,
        properly accounting for uncertainty.
        """
        from gflownet import CausalDAGGFlowNet
        
        gfn = CausalDAGGFlowNet(n_variables=len(self.data.columns))
        gfn.train(reward=lambda g: self.bayesian_score(g))
        
        # Sample multiple DAGs
        dag_samples = [gfn.sample() for _ in range(1000)]
        return dag_samples
```

## System 2 Deep Learning

```python
class System2Network:
    """
    Bengio's vision: Combine System 1 (fast) with System 2 (slow).
    
    System 1: Neural pattern matching (current DL)
    System 2: Deliberate causal reasoning (compositional, symbolic)
    """
    
    def __init__(self):
        self.system1 = NeuralNetwork()  # Fast intuition
        self.system2 = CausalReasoner()  # Slow reasoning
        self.attention = DynamicAttention()  # Which to use when
    
    def forward(self, x: Tensor, requires_reasoning: bool = False) -> Tensor:
        """
        Hybrid forward pass.
        """
        # System 1: quick answer
        fast_answer = self.system1(x)
        
        if not requires_reasoning:
            return fast_answer
        
        # System 2: verify/refine via causal reasoning
        slow_answer = self.system2.reason(x, fast_answer)
        
        # Combine based on confidence
        confidence = self.attention(x, fast_answer, slow_answer)
        return confidence * fast_answer + (1 - confidence) * slow_answer
    
    def causal_attention(self, query: Tensor) -> Tensor:
        """
        Attention guided by causal relevance, not just correlation.
        
        Standard attention: A(Q, K, V) = softmax(QK^T/√d) V
        Causal attention: Weight by causal effect, not correlation
        """
        correlational_weights = self.compute_attention(query)
        causal_effects = self.system2.estimate_effects(query)
        
        # Reweight by causal importance
        causal_weights = correlational_weights * causal_effects
        return self.apply_attention(causal_weights)
```

## GF(3) Triads

```
# Causal-Categorical Triad
sheaf-cohomology (-1) ⊗ causal-inference (0) ⊗ gflownet (+1) = 0 ✓

# System 2 Triad
proofgeneral-narya (-1) ⊗ causal-inference (0) ⊗ forward-forward-learning (+1) = 0 ✓

# World Model Triad
persistent-homology (-1) ⊗ causal-inference (0) ⊗ self-evolving-agent (+1) = 0 ✓
```

## Integration with Interaction Entropy

```ruby
module CausalInference
  def self.intervene(interaction_sequence, intervention)
    # Build causal graph from interaction sequence
    graph = build_causal_graph(interaction_sequence)
    
    # Apply intervention
    intervened = graph.do(intervention)
    
    # Predict outcome
    outcome = intervened.forward_propagate
    
    {
      original_graph: graph,
      intervention: intervention,
      predicted_outcome: outcome,
      trit: 0  # Coordinator (bridges observational and interventional)
    }
  end
  
  def self.counterfactual(interaction, alternative_action)
    # What would have happened if we'd done alternative_action?
    noise = abduct_noise(interaction)
    intervened = apply_intervention(alternative_action)
    counterfactual_outcome = propagate_with_noise(intervened, noise)
    
    {
      actual_outcome: interaction[:outcome],
      counterfactual_outcome: counterfactual_outcome,
      difference: interaction[:outcome] - counterfactual_outcome
    }
  end
end
```

## Key Properties

1. **Invariance**: Causal mechanisms are stable across environments
2. **Modularity**: Can change one mechanism without affecting others
3. **Compositionality**: Complex models from simple causal primitives
4. **Identifiability**: Can (sometimes) learn causal structure from data

## References

1. Bengio, Y. et al. (2019). "A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms."
2. Schölkopf, B. et al. (2021). "Toward Causal Representation Learning."
3. Pearl, J. (2009). *Causality: Models, Reasoning, and Inference*.
4. Bengio, Y. (2017). "The Consciousness Prior."