---


name: identification-theory
description: DAG and potential outcomes frameworks for causal mediation identification


---

# Identification Theory

**Comprehensive framework for causal identification in statistical methodology**

Use this skill when working on: causal identification, mediation analysis identification, DAG-based reasoning, potential outcomes, identification assumptions, partial identification, sensitivity analysis, or deriving identification formulas.

---

## Core Concepts

### What is Identification?

A causal parameter $\psi$ is **identified** if it can be uniquely determined from the observed data distribution $P(O)$.

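Formally: $\psi$ is identified if, for any two causal models compatible with the stated assumptions, with observed-data distributions $P_1(O)$, $P_2(O)$ and parameter values $\psi_1$, $\psi_2$, we have $P_1(O) = P_2(O) \Rightarrow \psi_1 = \psi_2$.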
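
A failure of identification can be made concrete with a toy example. The sketch below (base R, with two hypothetical models invented purely for illustration) builds two data-generating processes that yield exactly the same observed data on $(A, Y)$ yet have different average treatment effects, so no estimator based on $P(O)$ alone could distinguish them.

```r
# Two hypothetical data-generating models for binary A and Y (illustrative only)
set.seed(1)
n <- 1000
u <- as.numeric(rbinom(n, 1, 0.5))

# Model 1: U drives both A and Y; A has no effect on Y
model1 <- data.frame(A = u, Y0 = u, Y1 = u)
# Model 2: A is independent of the potential outcomes and fully determines Y
model2 <- data.frame(A = u, Y0 = 0, Y1 = 1)

# Consistency: the observed outcome is the potential outcome under the received A
observe <- function(m) data.frame(A = m$A, Y = ifelse(m$A == 1, m$Y1, m$Y0))

obs1 <- observe(model1)
obs2 <- observe(model2)

# Same observed data, different causal parameter
same_observed_data <- isTRUE(all.equal(obs1, obs2))
ate1 <- mean(model1$Y1 - model1$Y0)
ate2 <- mean(model2$Y1 - model2$Y0)
```

Here `same_observed_data` is `TRUE` while `ate1` is 0 and `ate2` is 1. Only an added assumption, such as exchangeability, rules out one of the two models.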

### Why Identification Matters

```
Causal Question → Target Estimand → Identification → Estimation → Inference
     ↓                  ↓                ↓               ↓            ↓
  "Does A           E[Y(1)-Y(0)]     Express in      Statistical   Confidence
   cause Y?"                         terms of P(O)    methods      intervals
```

Without identification, no amount of data can answer causal questions.

---

## Two Frameworks

### 1. Potential Outcomes (Rubin/Neyman)

**Primitives**:
- $Y(a)$ = potential outcome under treatment $a$
- Only $Y = Y(A)$ is observed (consistency)
- Fundamental problem: never observe both $Y(0)$ and $Y(1)$ for same unit

**Advantages**:
- Clear definition of causal effects
- Natural for experimental reasoning
- Connects to missing data theory

### 2. Structural Causal Models (Pearl)

**Primitives**:
- Directed Acyclic Graph (DAG) encoding causal structure
- Structural equations: $Y := f_Y(PA_Y, U_Y)$
- Interventions via do-operator: $P(Y | do(A=a))$

**Advantages**:
- Visual representation of assumptions
- Systematic identification algorithms
- Clear separation of statistical and causal assumptions

---

## DAG Framework

### Directed Acyclic Graphs (DAGs)

A DAG $\mathcal{G} = (V, E)$ consists of:
- **Vertices** $V$: Random variables
- **Directed edges** $E$: Direct causal relationships
- **Acyclic**: No directed cycles

### Key DAG Terminology

| Term | Definition | Notation |
|------|------------|----------|
| Parents | Direct causes | $PA_Y$ |
| Children | Direct effects | $CH_Y$ |
| Ancestors | All causes | $AN_Y$ |
| Descendants | All effects | $DE_Y$ |
| Collider | Node with two incoming arrows | $A \to C \leftarrow B$ |
| Mediator | Node on causal path | $A \to M \to Y$ |
| Confounder | Common cause | $A \leftarrow C \to Y$ |

```r
# DAG specification and visualization using dagitty
library(dagitty)

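# Define mediation DAG: M is the mediator, X a measured confounder.
# Only exposure and outcome are tagged; the other roles follow from the edges.
mediation_dag <- dagitty('
  dag {
    A [exposure]
    M
    Y [outcome]
    X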

    X -> A
    X -> M
    X -> Y
    A -> M
    A -> Y
    M -> Y
  }
')

# Visualize
plot(mediation_dag)

# Find adjustment sets
adjustmentSets(mediation_dag, exposure = "A", outcome = "Y")

# Check implied conditional independencies
impliedConditionalIndependencies(mediation_dag)
```

---

## D-Separation

### The Core Concept

Two nodes $A$ and $B$ are **d-separated** by set $Z$ if every path between them is blocked.

### Path Blocking Rules

| Path Type | Path is blocked when... |
|-----------|-------------------------|
| Chain: $A \to M \to B$ | $M$ is conditioned on |
| Fork: $A \leftarrow C \to B$ | $C$ is conditioned on |
| Collider: $A \to C \leftarrow B$ | neither $C$ nor any descendant of $C$ is conditioned on (conditioning opens the path!) |

### D-separation Formula

$$A \perp\!\!\!\perp_{\mathcal{G}} B \mid Z \iff \text{every path } A \text{---} B \text{ is blocked by } Z$$
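
The three blocking rules can be checked numerically. The following is a minimal simulation sketch in base R, assuming linear Gaussian structural equations chosen only for illustration; `pcor` is a hypothetical helper that computes a partial correlation from regression residuals.

```r
set.seed(42)
n <- 20000

# Partial correlation of x and y given z, via regression residuals
pcor <- function(x, y, z) cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

# Chain: A -> M -> B (conditioning on M blocks)
a <- rnorm(n); m <- a + rnorm(n); b <- m + rnorm(n)
chain <- c(marginal = cor(a, b), conditional = pcor(a, b, m))

# Fork: A <- C -> B (conditioning on C blocks)
c0 <- rnorm(n); a <- c0 + rnorm(n); b <- c0 + rnorm(n)
fork <- c(marginal = cor(a, b), conditional = pcor(a, b, c0))

# Collider: A -> C <- B (conditioning on C opens)
a <- rnorm(n); b <- rnorm(n); c1 <- a + b + rnorm(n)
collider <- c(marginal = cor(a, b), conditional = pcor(a, b, c1))

round(rbind(chain, fork, collider), 2)
```

For the chain and the fork the marginal correlation is clearly nonzero and the conditional one is near zero. For the collider the pattern reverses: the marginal correlation is near zero and conditioning induces a negative association.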

```r
# Check d-separation using dagitty
check_dseparation <- function(dag, x, y, z = NULL) {
  if (is.null(z)) {
    dseparated(dag, x, y)
  } else {
    dseparated(dag, x, y, z)
  }
}

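# Find adjustment sets for the total effect of x on y.
# Note: these sets block only the non-causal (back-door) paths; they do not
# d-separate x and y whenever a causal path from x to y exists.
find_dsep_sets <- function(dag, x, y) {
  adjustmentSets(dag, exposure = x, outcome = y, effect = "total")
}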

# Verify conditional independence implications against data
verify_ci_implications <- function(dag, data) {
  # localTests() tests every conditional independence implied by the DAG;
  # type = "cis" uses partial correlations (assumes roughly linear relationships)
  results <- localTests(dag, data = data, type = "cis")

  # One row per implied independence statement
  data.frame(
    statement = rownames(results),
    estimate = results$estimate,
    p_value = results$p.value,
    row.names = NULL
  )
}
```

---

## Backdoor Criterion

### Definition

A set $Z$ satisfies the **backdoor criterion** relative to $(A, Y)$ if:
1. No node in $Z$ is a descendant of $A$
2. $Z$ blocks every path between $A$ and $Y$ that contains an arrow into $A$

### Backdoor Adjustment Formula

If $Z$ satisfies the backdoor criterion:
$$P(Y | do(A = a)) = \sum_z P(Y | A = a, Z = z) P(Z = z)$$

or equivalently:
$$E[Y(a)] = E_Z[E[Y | A = a, Z]]$$

### Front-Door Criterion

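When no measured set satisfies the backdoor criterion (unmeasured $A$-$Y$ confounding) but a mediator $M$ intercepts every directed path from $A$ to $Y$ and is itself unconfounded:
$$P(Y | do(A = a)) = \sum_m P(M = m | A = a) \sum_{a'} P(Y | M = m, A = a') P(A = a')$$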

```r
# Check whether a proposed set is a valid adjustment set
check_backdoor <- function(dag, exposure, outcome, adjustment_set) {
  # Minimal valid sets, reported for reference
  valid_sets <- adjustmentSets(dag, exposure = exposure,
                               outcome = outcome, type = "minimal")

  # A valid set need not be minimal, so test the proposed set directly
  # instead of comparing it against the minimal sets
  is_valid <- isAdjustmentSet(dag, adjustment_set,
                              exposure = exposure, outcome = outcome)

  list(
    is_valid = is_valid,
    minimal_sets = valid_sets,
    proposed = adjustment_set
  )
}

# Compute backdoor-adjusted estimate by standardization (g-computation).
# Assumes a binary 0/1 exposure and a correctly specified linear outcome model.
backdoor_adjustment <- function(data, outcome, exposure, adjustment,
                                n_boot = 200) {
  formula_obj <- reformulate(c(exposure, adjustment), response = outcome)

  estimate_ate <- function(d) {
    model <- lm(formula_obj, data = d)
    d1 <- d; d1[[exposure]] <- 1   # everyone treated
    d0 <- d; d0[[exposure]] <- 0   # everyone untreated
    mean(predict(model, newdata = d1) - predict(model, newdata = d0))
  }

  # Nonparametric bootstrap SE: the variance of unit-level predicted
  # differences would ignore the uncertainty in the fitted model
  boot <- replicate(n_boot,
    estimate_ate(data[sample(nrow(data), replace = TRUE), , drop = FALSE]))

  list(ate = estimate_ate(data), se = sd(boot))
}

# Full identification analysis
analyze_identification <- function(dag, exposure, outcome) {
  list(
    adjustment_sets = adjustmentSets(dag, exposure, outcome),
    instrumental_sets = instrumentalVariables(dag, exposure, outcome),
    direct_effects = adjustmentSets(dag, exposure, outcome, effect = "direct"),
    implied_independencies = impliedConditionalIndependencies(dag)
  )
}
```

### Framework Equivalence

For most problems, both frameworks give equivalent results:
$$E[Y(a)] = E[Y | do(A=a)]$$

Choose based on context and audience.

---

## Key Identification Assumptions

### For Treatment Effects

| Assumption | Formal Statement | Interpretation |
|------------|------------------|----------------|
| **Consistency** | $Y = Y(A)$ | Observed outcome equals potential outcome for received treatment |
| **Positivity** | $P(A=a \mid X=x) > 0$ for all $x$ with $P(X=x) > 0$ | Every covariate stratum has both treated and untreated |
| **Exchangeability** | $Y(a) \perp\!\!\!\perp A \mid X$ | No unmeasured confounding given $X$ |
| **SUTVA** | No interference, single version of treatment | Units don't affect each other |

### For Mediation Effects

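In addition to consistency, positivity, and no unmeasured confounding of the $A$-$Y$, $A$-$M$, and $M$-$Y$ relationships given $X$, natural effects require: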

| Assumption | Formal Statement | Interpretation |
|------------|------------------|----------------|
| **Cross-world exchangeability** | $Y(a,m) \perp\!\!\!\perp M(a^*) \mid X$ | Counterfactual mediator independent of counterfactual outcome |
| **No $A$-$M$ interaction** (optional) | $Y(a,m) - Y(a',m)$ constant in $m$ | Simplifies identification |
| **Compositional** | $Y(a) = Y(a, M(a))$ | Potential outcome composition |

---

## Standard Identification Results

### 1. Average Treatment Effect (ATE)

**Target**: $\psi = E[Y(1) - Y(0)]$

**Under exchangeability** (A1), **consistency** (A2), **positivity** (A3):

$$\psi = E\left[E[Y | A=1, X] - E[Y | A=0, X]\right]$$

**Proof sketch**:
$$
\begin{aligned}
E[Y(a)] &= E[E[Y(a) \mid X]] && \text{(iterated expectations)} \\
        &= E[E[Y(a) \mid A=a, X]] && \text{(A1: exchangeability)} \\
        &= E[E[Y \mid A=a, X]] && \text{(A2: consistency)}
\end{aligned}
$$
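
The identification result can be checked on simulated data where the truth is known. The sketch below (base R, with a single binary confounder and parameter values chosen only for illustration) compares the naive treated-versus-untreated contrast with the standardized estimate $E[E[Y \mid A=1, X] - E[Y \mid A=0, X]]$.

```r
set.seed(123)
n <- 50000
x <- rbinom(n, 1, 0.4)                        # measured confounder
a <- rbinom(n, 1, ifelse(x == 1, 0.7, 0.2))   # treatment depends on X; positivity holds
y <- 1 + 2 * a + 3 * x + rnorm(n)             # true ATE is 2 by construction

# Naive contrast, confounded by X
naive <- mean(y[a == 1]) - mean(y[a == 0])

# Standardization: stratum-specific contrasts averaged over the marginal P(X)
strata <- sort(unique(x))
contrast <- sapply(strata, function(s) {
  mean(y[a == 1 & x == s]) - mean(y[a == 0 & x == s])
})
p_x <- sapply(strata, function(s) mean(x == s))
ate_std <- sum(contrast * p_x)

c(naive = naive, standardized = ate_std)
```

The naive contrast absorbs the effect of $X$ and lands well above 2, while the standardized estimate should sit close to the true value.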

### 2. Average Treatment Effect on Treated (ATT)

**Target**: $\psi_{ATT} = E[Y(1) - Y(0) | A=1]$

**Under the weaker exchangeability condition** $Y(0) \perp\!\!\!\perp A \mid X$ (with consistency and $P(A=0 \mid X) > 0$ among the treated):

$$\psi_{ATT} = E\left[E[Y | A=1, X] - E[Y | A=0, X] \mid A=1\right]$$
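
The ATT formula differs from the ATE formula only in the covariate distribution used for averaging. A minimal sketch in base R, assuming a simulated model with effect heterogeneity in $X$ (all numbers are illustrative):

```r
set.seed(321)
n <- 50000
x <- rbinom(n, 1, 0.4)
a <- rbinom(n, 1, ifelse(x == 1, 0.7, 0.2))
# Treatment effect is 1 when X = 0 and 3 when X = 1
y <- 1 + a * (1 + 2 * x) + 3 * x + rnorm(n)

strata <- sort(unique(x))
contrast <- sapply(strata, function(s) {
  mean(y[a == 1 & x == s]) - mean(y[a == 0 & x == s])
})

# ATE averages over P(X); ATT averages over P(X | A = 1)
p_x_all <- sapply(strata, function(s) mean(x == s))
p_x_treated <- sapply(strata, function(s) mean(x[a == 1] == s))

ate_hat <- sum(contrast * p_x_all)
att_hat <- sum(contrast * p_x_treated)
```

In this setup the true ATE is $1 + 2 \cdot 0.4 = 1.8$ and the true ATT is $1 + 2 \cdot P(X=1 \mid A=1) = 2.4$, because treated units are more likely to have $X = 1$.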

### 3. Natural Direct and Indirect Effects (Mediation)

**Target**:
- NDE: $E[Y(1, M(0)) - Y(0, M(0))]$
- NIE: $E[Y(1, M(1)) - Y(1, M(0))]$

**Under mediation assumptions** (see VanderWeele, 2015):

$$NDE = \int\int \{E[Y|A=1,M=m,X=x] - E[Y|A=0,M=m,X=x]\} \, dP(m|A=0,X=x) \, dP(x)$$

$$NIE = \int\int E[Y|A=1,M=m,X=x] \{dP(m|A=1,X=x) - dP(m|A=0,X=x)\} \, dP(x)$$
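
With a binary mediator the integrals over $m$ reduce to two-term sums, which makes the formulas easy to evaluate by plug-in. The sketch below is one possible implementation in base R under a simulated model that satisfies the mediation assumptions by construction; the helper names (`mu`, `p_m1`, `mediation_mean`) and the parametric working models are assumptions made for this illustration.

```r
set.seed(2024)
n <- 50000
x <- rbinom(n, 1, 0.5)
a <- rbinom(n, 1, plogis(-0.5 + x))
m <- rbinom(n, 1, plogis(-1 + 1.5 * a + 0.5 * x))
y <- 1 + a + 2 * m + 0.5 * a * m + x + rnorm(n)
dat <- data.frame(x, a, m, y)

fit_y <- lm(y ~ a * m + x, data = dat)                   # model for E[Y | A, M, X]
fit_m <- glm(m ~ a + x, family = binomial, data = dat)   # model for P(M = 1 | A, X)

# Predictions evaluated at each unit's observed X
mu <- function(a_y, m_val) {
  predict(fit_y, newdata = data.frame(a = a_y, m = m_val, x = dat$x))
}
p_m1 <- function(a_m) {
  predict(fit_m, newdata = data.frame(a = a_m, x = dat$x), type = "response")
}

# E[Y(a_y, M(a_m))]: sum over m, then average over the empirical P(X)
mediation_mean <- function(a_y, a_m) {
  p1 <- p_m1(a_m)
  mean(mu(a_y, 1) * p1 + mu(a_y, 0) * (1 - p1))
}

nde_hat <- mediation_mean(1, 0) - mediation_mean(0, 0)
nie_hat <- mediation_mean(1, 1) - mediation_mean(1, 0)

# True values implied by the simulation design
em0 <- mean(plogis(-1 + 0.5 * c(0, 1)))    # E[M(0)]
em1 <- mean(plogis(0.5 + 0.5 * c(0, 1)))   # E[M(1)]
nde_true <- 1 + 0.5 * em0
nie_true <- 2.5 * (em1 - em0)
```

Because of the $A \times M$ interaction, the NDE depends on the mediator distribution under $A = 0$, which is why the formula averages over $dP(m \mid A=0, X=x)$ rather than fixing $m$.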

### 4. Controlled Direct Effect (CDE)

**Target**: $CDE(m) = E[Y(1,m) - Y(0,m)]$

**Simpler identification** (no cross-world assumption):

$$CDE(m) = E[E[Y|A=1,M=m,X] - E[Y|A=0,M=m,X]]$$

---

## DAG-Based Identification

### The Back-Door Criterion

A set $X$ satisfies the back-door criterion relative to $(A, Y)$ if:
1. No node in $X$ is a descendant of $A$
2. $X$ blocks every path between $A$ and $Y$ that contains an arrow into $A$

**If satisfied**:
$$P(Y | do(A=a)) = \sum_x P(Y | A=a, X=x) P(X=x)$$

### The Front-Door Criterion

When an unmeasured confounder $U$ affects both $A$ and $Y$, but $M$ mediates all of $A$'s effect on $Y$ and is not directly affected by $U$:

```
      U
    ↙   ↘
  A → M → Y
```

**Identification**:
$$P(Y | do(A=a)) = \sum_m P(M=m | A=a) \sum_{a'} P(Y | M=m, A=a') P(A=a')$$
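
The front-door formula can be evaluated with simple cell means when all variables are discrete. The following base R sketch simulates a model with an unmeasured $U$ (used to generate the data but never passed to the estimator); the function name `front_door_mean` and all parameter values are assumptions chosen for illustration.

```r
set.seed(2025)
n <- 100000
u <- rbinom(n, 1, 0.5)                 # unmeasured confounder of A and Y
a <- rbinom(n, 1, 0.2 + 0.6 * u)
m <- rbinom(n, 1, 0.1 + 0.7 * a)       # M depends on A only
y <- 2 * m + 3 * u + rnorm(n)          # A affects Y only through M

# Front-door estimate of E[Y | do(A = a_val)] using observed (A, M, Y) only
front_door_mean <- function(a_val) {
  total <- 0
  for (m_val in 0:1) {
    p_m <- mean(m[a == a_val] == m_val)            # P(M = m | A = a)
    inner <- 0
    for (a_prime in 0:1) {                         # sum over a' of E[Y | M, A = a'] P(A = a')
      inner <- inner + mean(y[m == m_val & a == a_prime]) * mean(a == a_prime)
    }
    total <- total + p_m * inner
  }
  total
}

fd_effect <- front_door_mean(1) - front_door_mean(0)
naive <- mean(y[a == 1]) - mean(y[a == 0])
```

The true effect in this design is $2 \times 0.7 = 1.4$. The naive contrast is inflated by $U$, whereas the front-door estimate should be close to 1.4 even though $U$ is never observed.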

### Instrumental Variables

When $Z$ affects $A$, affects $Y$ only through $A$, and is independent of the unmeasured confounder $U$ of $A$ and $Y$:

```
      U
     ↙ ↘
Z → A → Y
```

**Local ATE identification** (with monotonicity):
$$LATE = \frac{E[Y | Z=1] - E[Y | Z=0]}{E[A | Z=1] - E[A | Z=0]}$$
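
The ratio above is the Wald estimator. A minimal simulation sketch in base R, assuming a latent-threshold treatment model (which guarantees monotonicity) and heterogeneous effects invented for illustration:

```r
set.seed(7)
n <- 100000
u <- rnorm(n)                          # unmeasured confounder of A and Y
z <- rbinom(n, 1, 0.5)                 # randomized binary instrument
a <- as.numeric(z + u > 0.5)           # monotone in z, so there are no defiers

# Compliers take treatment only when z = 1: A(1) = 1 and A(0) = 0
complier <- (u > -0.5) & (u <= 0.5)
effect <- ifelse(complier, 2, 4)       # effect is 2 for compliers, 4 otherwise
y <- 1 + effect * a + u + rnorm(n)

# Wald estimator: intention-to-treat effect on Y over the effect on A
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(a[z == 1]) - mean(a[z == 0]))
naive <- mean(y[a == 1]) - mean(y[a == 0])
```

The Wald ratio targets the complier effect of 2, not the population average effect, and the naive contrast is biased upward by $U$.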

---

## Sequential Identification (Multiple Mediators)

### Sequential Mediation (A → M1 → M2 → Y)

**Product-of-three-paths** identification requires:

1. Standard confounding control for each arrow
2. No intermediate confounders affected by treatment
3. Sequential ignorability assumptions

**Path-specific effects**:
- Direct: $A \to Y$
- Through $M_1$ only: $A \to M_1 \to Y$
- Through $M_2$ only: $A \to M_2 \to Y$
- Through both: $A \to M_1 \to M_2 \to Y$

### Identification Formula (No Intermediate Confounding)

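For linear models without interactions, the effect along $A \to M_1 \to M_2 \to Y$ is the product of the three path derivatives:

$$\text{Effect through } M_1 \to M_2 = \frac{\partial E[M_1 \mid A, X]}{\partial a} \cdot \frac{\partial E[M_2 \mid A, M_1, X]}{\partial m_1} \cdot \frac{\partial E[Y \mid A, M_1, M_2, X]}{\partial m_2}$$

Expressed as a product of coefficients: $\hat{\alpha}_1 \cdot \hat{\beta}_1 \cdot \hat{\gamma}_2$, where $\alpha_1$ is the coefficient of $A$ in the $M_1$ model, $\beta_1$ the coefficient of $M_1$ in the $M_2$ model, and $\gamma_2$ the coefficient of $M_2$ in the $Y$ model.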
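
As an illustration of the product-of-coefficients form, the sketch below simulates a linear sequential mediation model in base R and multiplies the three fitted path coefficients. It assumes linear models with no interactions and no intermediate confounding; reading $\alpha_1$, $\beta_1$, $\gamma_2$ as the $A \to M_1$, $M_1 \to M_2$, and $M_2 \to Y$ coefficients is an interpretation adopted for this example.

```r
set.seed(99)
n <- 20000
x  <- rnorm(n)
a  <- rbinom(n, 1, plogis(0.5 * x))
m1 <- 0.5 * a + 0.3 * x + rnorm(n)                         # A -> M1 coefficient: 0.5
m2 <- 0.4 * m1 + 0.2 * a + 0.3 * x + rnorm(n)              # M1 -> M2 coefficient: 0.4
y  <- 0.6 * m2 + 0.3 * m1 + 0.1 * a + 0.3 * x + rnorm(n)   # M2 -> Y coefficient: 0.6

# Each model adjusts for everything upstream of its outcome
alpha1 <- coef(lm(m1 ~ a + x))["a"]
beta1  <- coef(lm(m2 ~ m1 + a + x))["m1"]
gamma2 <- coef(lm(y ~ m2 + m1 + a + x))["m2"]

# Effect along A -> M1 -> M2 -> Y
effect_a_m1_m2_y <- unname(alpha1 * beta1 * gamma2)
```

The true path effect in this design is $0.5 \times 0.4 \times 0.6 = 0.12$. With interactions or nonlinear links the simple product no longer equals the path-specific effect.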

---

## Partial Identification

When point identification fails, we can still bound the parameter.

### Manski Bounds (No Assumptions)

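For $E[Y(1)]$ with a bounded outcome $Y \in [y_{min}, y_{max}]$, the unobserved $Y(1)$ among the untreated is replaced by its extreme values:
$$E[Y(1)] \in [E[Y \cdot A] + y_{min}P(A=0),\; E[Y \cdot A] + y_{max}P(A=0)]$$

where $E[Y \cdot A] = E[Y \mid A=1]P(A=1)$. Bounds on the ATE follow by combining these with the analogous bounds for $E[Y(0)]$.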
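
A short sketch of the no-assumption bounds in base R. It uses the decomposition $E[Y(1)] = E[Y \mid A=1]P(A=1) + E[Y(1) \mid A=0]P(A=0)$, replacing the unobservable second term by its extremes; the function name `manski_bounds` and the simulated data are assumptions for illustration.

```r
manski_bounds <- function(y, a, y_min, y_max) {
  p1 <- mean(a == 1)
  p0 <- 1 - p1
  ey_a1 <- mean(y[a == 1])   # E[Y | A = 1]
  ey_a0 <- mean(y[a == 0])   # E[Y | A = 0]

  # Observed part plus worst and best case for the unobserved arm
  ey1 <- c(lower = ey_a1 * p1 + y_min * p0, upper = ey_a1 * p1 + y_max * p0)
  ey0 <- c(lower = ey_a0 * p0 + y_min * p1, upper = ey_a0 * p0 + y_max * p1)

  ate <- c(lower = ey1[["lower"]] - ey0[["upper"]],
           upper = ey1[["upper"]] - ey0[["lower"]])
  list(ey1 = ey1, ey0 = ey0, ate = ate)
}

# Example: binary outcome with confounded treatment
set.seed(5)
n <- 5000
u  <- rnorm(n)
a  <- rbinom(n, 1, plogis(u))
y1 <- rbinom(n, 1, plogis(u + 1))   # potential outcome under treatment
y0 <- rbinom(n, 1, plogis(u))       # potential outcome under control
y  <- ifelse(a == 1, y1, y0)

b <- manski_bounds(y, a, y_min = 0, y_max = 1)
true_ate <- mean(y1 - y0)
```

The ATE bounds always have width $y_{max} - y_{min}$, so for a binary outcome they span one unit and necessarily include zero. They are informative about magnitude but never about sign without further assumptions.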

### Sensitivity Analysis

When exchangeability is uncertain, parameterize violation:

**Unmeasured confounding parameter** $\Gamma$:
$$\frac{1}{\Gamma} \leq \frac{P(A=1|X,U=1)/P(A=0|X,U=1)}{P(A=1|X,U=0)/P(A=0|X,U=0)} \leq \Gamma$$

Compute bounds as function of $\Gamma$ (Rosenbaum bounds).

### E-Value

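Minimum strength of association (on the risk ratio scale) that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away an observed effect. For an observed $RR \geq 1$ (take the reciprocal first when $RR < 1$):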

$$E\text{-value} = RR + \sqrt{RR \times (RR-1)}$$
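
The formula is simple enough to wrap in a helper. This is a minimal sketch in base R; the function name `e_value` is hypothetical, and it applies the usual convention of inverting risk ratios below 1 before using the formula.

```r
e_value <- function(rr) {
  stopifnot(all(rr > 0))
  rr <- ifelse(rr < 1, 1 / rr, rr)   # protective effects: use the reciprocal
  rr + sqrt(rr * (rr - 1))
}

e_value(2)      # 2 + sqrt(2)
e_value(0.5)    # same as e_value(2)
e_value(1)      # 1: a null estimate needs no confounding to explain it
```

For an observed risk ratio of 2, an unmeasured confounder would need risk ratio associations of about 3.41 with both treatment and outcome to move the estimate to the null.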

---

## Identification Strategies by Design

### Randomized Controlled Trials (RCTs)

- Treatment assignment random → exchangeability holds by design
- Still need SUTVA, consistency
- For mediation: randomize $M$ as well, or use sequential ignorability

### Observational Studies

| Strategy | Key Assumption | Best For |
|----------|----------------|----------|
| Regression adjustment | All confounders measured | Rich covariate data |
| Propensity score | Correct PS model | High-dimensional confounders |
| Instrumental variables | Valid instrument exists | Unmeasured confounding |
| Regression discontinuity | Continuity at threshold | Sharp treatment rules |
| Difference-in-differences | Parallel trends | Panel data |

### Natural Experiments

- Exploit exogenous variation (policy changes, geographic variation)
- Requires careful argument for why variation is "as-if random"

---

## Identification in the MediationVerse

### medfit: Foundation
- Implements standard mediation identification
- VanderWeele regression-based approach
- Supports binary/continuous treatments and mediators

### probmed: Effect Size
- $P_M$ identification requires identified NDE/NIE
- Handles case when NDE and NIE have opposite signs

### RMediation: Confidence Intervals
- Takes identified effects as input
- Distribution of product of coefficients (PRODCLIN)
- Monte Carlo intervals

### medrobust: Sensitivity
- When identification assumptions are uncertain
- Bounds on effects under confounding
- E-values for unmeasured confounding

### medsim: Validation
- Simulate data where truth is known
- Verify identification formulas recover true effects
- Test estimator properties

---

## Identification Proof Template

```latex
\begin{theorem}[Identification of $\psi$]
Under Assumptions:
\begin{enumerate}[label=A\arabic*.]
\item (Consistency) $Y = Y(A)$, $M = M(A)$
\item (Positivity) $P(A=a|X) > \epsilon > 0$ for all $a \in \mathcal{A}$
\item (Exchangeability) $Y(a) \perp\!\!\!\perp A \mid X$
\end{enumerate}
the causal estimand $\psi = E[g(Y(a))]$ is identified by
\[
\psi = E_X\left[E[g(Y) \mid A=a, X]\right].
\]
\end{theorem}

\begin{proof}
\begin{align}
E[g(Y(a))] &= E\left[E[g(Y(a)) \mid X]\right]
    && \text{(law of total expectation)} \\
&= E\left[E[g(Y(a)) \mid A=a, X]\right]
    && \text{(by A3: exchangeability)} \\
&= E\left[E[g(Y) \mid A=a, X]\right]
    && \text{(by A1: consistency)}
\end{align}
The RHS depends only on the observed data distribution $P(Y,A,X)$.
\end{proof}
```

---

## Common Identification Pitfalls

### 1. Conditioning on Colliders

```
A → C ← Y
```

Conditioning on $C$ opens a path between $A$ and $Y$.

### 2. Conditioning on Mediators

```
A → M → Y
```

Conditioning on $M$ blocks the indirect effect; it does not control for confounding.

### 3. Overcontrol Bias

Conditioning on descendants of treatment (mediators or their consequences) removes part of the effect being estimated and can bias total-effect estimates.

### 4. M-Bias

```
U1 → X ← U2
↓         ↓
A ——————→ Y
```

Conditioning on $X$ opens path $A \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$.
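
A quick simulation makes the bias visible. The sketch below (base R, linear Gaussian model with unit coefficients chosen for illustration) generates data from the M-structure above, where the true effect of $A$ on $Y$ is 1 and there is no confounding, then compares regressions with and without $X$.

```r
set.seed(11)
n <- 50000
u1 <- rnorm(n)
u2 <- rnorm(n)
x <- u1 + u2 + rnorm(n)     # collider: common effect of U1 and U2
a <- u1 + rnorm(n)
y <- 1 * a + u2 + rnorm(n)  # true effect of A on Y is 1

unadjusted <- unname(coef(lm(y ~ a))["a"])
adjusted   <- unname(coef(lm(y ~ a + x))["a"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

The unadjusted estimate is unbiased here, while adjusting for $X$ pulls the coefficient toward 0.8 in this design. The pre-treatment status of a covariate is therefore not enough to justify adjusting for it.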

### 5. Table 2 Fallacy

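Interpreting the coefficients of adjustment covariates as causal effects. A model built to identify the effect of $A$ does not generally identify the effects of its covariates: their coefficients may be confounded, and when the model also contains variables on their causal pathways they reflect at most direct effects, not total effects.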

---

## Verification Questions

When reviewing identification arguments, ask:

1. **Is the target estimand clearly defined?**
2. **Are all assumptions explicitly stated?**
3. **Is each step in the derivation justified?**
4. **Are the assumptions plausible in this context?**
5. **What if an assumption is violated?**
6. **Is there a DAG that encodes the assumptions?**
7. **Are there alternative identification strategies?**

---

## Integration with Other Skills

This skill works with:
- **proof-architect** - For writing identification proofs
- **asymptotic-theory** - For inference after identification
- **methods-paper-writer** - For presenting identification in manuscripts
- **simulation-architect** - For validating identification

---

## Key References
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.)
- VanderWeele, T.J. (2015). Explanation in Causal Inference
- Hernán, M.A. & Robins, J.M. (2020). Causal Inference: What If
- Imbens, G.W. & Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction

---

**Version**: 1.0
**Created**: 2025-12-08
**Domain**: Causal Inference, Mediation Analysis