---
name: causal-inference
description: Add causal reasoning to agent actions. Trigger on ANY high-level action with observable outcomes - emails, messages, calendar changes, file operations, API calls, notifications, reminders, purchases, deployments. Use for planning interventions, debugging failures, predicting outcomes, backfilling historical data for analysis, or answering "what happens if I do X?" Also trigger when reviewing past actions to understand what worked/failed and why.
---

# Causal Inference

A lightweight causal layer for predicting action outcomes, not by pattern-matching correlations, but by modeling interventions and counterfactuals.

## Core Invariant

**Every action must be representable as an explicit intervention on a causal model, with predicted effects + uncertainty + a falsifiable audit trail.**

Plans must be *causally valid*, not just plausible.

## When to Trigger

**Trigger this skill on ANY high-level action**, including but not limited to:

| Domain | Actions to Log |
|--------|---------------|
| **Communication** | Send email, send message, reply, follow-up, notification, mention |
| **Calendar** | Create/move/cancel meeting, set reminder, RSVP |
| **Tasks** | Create/complete/defer task, set priority, assign |
| **Files** | Create/edit/share document, commit code, deploy |
| **Social** | Post, react, comment, share, DM |
| **Purchases** | Order, subscribe, cancel, refund |
| **System** | Config change, permission grant, integration setup |

Also trigger when:

- **Reviewing outcomes**: "Did that email get a reply?" → log outcome, update estimates
- **Debugging failures**: "Why didn't this work?" → trace causal graph
- **Backfilling history**: "Analyze my past emails/calendar" → parse logs, reconstruct actions
- **Planning**: "Should I send now or later?" → query causal model

## Backfill: Bootstrap from Historical Data

Don't start from zero. Parse existing logs to reconstruct past actions + outcomes.

### Email Backfill

```bash
# Extract sent emails with reply status
gog gmail list --sent --after 2024-01-01 --format json > /tmp/sent_emails.json

# For each sent email, check if reply exists
python3 scripts/backfill_email.py /tmp/sent_emails.json
```

### Calendar Backfill

```bash
# Extract past events with attendance
gog calendar list --after 2024-01-01 --format json > /tmp/events.json

# Reconstruct: did meeting happen? was it moved? attendee count?
python3 scripts/backfill_calendar.py /tmp/events.json
```

### Message Backfill (WhatsApp/Discord/Slack)

```bash
# Parse message history for send/reply patterns
wacli search --after 2024-01-01 --from me --format json > /tmp/wa_sent.json
python3 scripts/backfill_messages.py /tmp/wa_sent.json
```

### Generic Backfill Pattern

```python
# For any historical data source.
# infer_action_type, extract_context, reconstruct_pre_state, extract_post_state,
# determine_outcome, and append_to_log are source-specific helpers you supply.
for record in historical_data:
    action_event = {
        "action": infer_action_type(record),
        "context": extract_context(record),
        "time": record["timestamp"],
        "pre_state": reconstruct_pre_state(record),
        "post_state": extract_post_state(record),
        "outcome": determine_outcome(record),
        "backfilled": True,  # Mark as reconstructed
    }
    append_to_log(action_event)
```

## Architecture

### A. Action Log (required)

Every executed action emits a structured event:

```json
{
  "action": "send_followup",
  "domain": "email",
  "context": {"recipient_type": "warm_lead", "prior_touches": 2},
  "time": "2025-01-26T10:00:00Z",
  "pre_state": {"days_since_last_contact": 7},
  "post_state": {"reply_received": true, "reply_delay_hours": 4},
  "outcome": "positive_reply",
  "outcome_observed_at": "2025-01-26T14:00:00Z",
  "backfilled": false
}
```

Store events in `memory/causal/action_log.jsonl`, one JSON object per line.

### B. Causal Graphs (per domain)

Start with 10-30 observable variables per domain.

**Email domain:**

```
send_time → reply_prob
subject_style → open_rate
recipient_type → reply_prob
followup_count → reply_prob (diminishing)
time_since_last → reply_prob
```

**Calendar domain:**

```
meeting_time → attendance_rate
attendee_count → slip_risk
conflict_degree → reschedule_prob
buffer_time → focus_quality
```

**Messaging domain:**

```
response_delay → conversation_continuation
message_length → response_length
time_of_day → response_prob
platform → response_delay
```

**Task domain:**

```
due_date_proximity → completion_prob
priority_level → completion_speed
task_size → deferral_risk
context_switches → error_rate
```

Store graph definitions in `memory/causal/graphs/`.

### C. Estimation

For each "knob" (intervention variable), estimate treatment effects:

```python
# Pseudocode: naive effect of morning vs. evening sends.
# A raw difference in means is causal only if send_time is unconfounded;
# otherwise adjust for variables that affect both send_time and reply_prob.
effect = mean(reply_prob | send_time=morning) - mean(reply_prob | send_time=evening)
uncertainty = std_error(effect)
```

Use simple regression or propensity matching first. Graduate to do-calculus when graphs are explicit and identification is needed.

### D. Decision Policy

Before executing actions:

1. Identify intervention variable(s)
2. Query causal model for expected outcome distribution
3. Compute expected utility + uncertainty bounds
4. If uncertainty > threshold OR expected harm > threshold → refuse or escalate to user
5. Log prediction for later validation

## Workflow

### On Every Action

```
BEFORE executing:
1. Log pre_state
2. If enough historical data: query model for expected outcome
3. If high uncertainty or risk: confirm with user

AFTER executing:
1. Log action + context + time
2. Set reminder to check outcome (if not immediate)

WHEN outcome observed:
1. Update action log with post_state + outcome
2. Re-estimate treatment effects if enough new data
```

### Planning an Action

```
1. User request → identify candidate actions
2. For each action:
   a. Map to intervention(s) on causal graph
   b. Predict P(outcome | do(action))
   c. Estimate uncertainty
   d. Compute expected utility
3. Rank by expected utility, filter by safety
4. Execute best action, log prediction
5. Observe outcome, update model
```

### Debugging a Failure

```
1. Identify failed outcome
2. Trace back through causal graph
3. For each upstream node:
   a. Was the value as expected?
   b. Did the causal link hold?
4. Identify broken link(s)
5. Compute minimal intervention set that would have prevented failure
6. Log counterfactual for learning
```

## Quick Start: Bootstrap Today

```bash
# 1. Create the infrastructure
mkdir -p memory/causal/graphs memory/causal/estimates

# 2. Initialize config
cat > memory/causal/config.yaml << 'EOF'
domains:
  - email
  - calendar
  - messaging
  - tasks

thresholds:
  max_uncertainty: 0.3
  min_expected_utility: 0.1

protected_actions:
  - delete_email
  - cancel_meeting
  - send_to_new_contact
  - financial_transaction
EOF

# 3. Backfill one domain (start with email; export first, as in Email Backfill)
gog gmail list --sent --after 2024-01-01 --format json > /tmp/sent_emails.json
python3 scripts/backfill_email.py /tmp/sent_emails.json

# 4. Estimate initial effects
python3 scripts/estimate_effect.py --treatment send_time --outcome reply_received --values morning,evening
```

## Safety Constraints

Define "protected actions" that require explicit user approval, plus thresholds that gate autonomous execution:

```yaml
protected_actions:
  - delete_email
  - cancel_meeting
  - send_to_new_contact
  - financial_transaction

thresholds:
  max_uncertainty: 0.3       # don't act if uncertainty in the predicted P(outcome) exceeds 0.3
  min_expected_utility: 0.1  # don't act if the expected utility gain is below 0.1
```

## Files

- `memory/causal/action_log.jsonl`: all logged actions with outcomes
- `memory/causal/graphs/`: domain-specific causal graph definitions
- `memory/causal/estimates/`: learned treatment effects
- `memory/causal/config.yaml`: safety thresholds and protected actions

## References

- See `references/do-calculus.md` for formal intervention semantics
- See `references/estimation.md` for treatment effect estimation methods
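
## Appendix: Estimation and Decision Sketch

The estimation step (Architecture C) and the decision policy (Architecture D) can be sketched together in Python. This is a minimal sketch under stated assumptions, not the contents of `scripts/estimate_effect.py`: the function names `send_time_bucket`, `reply_rate`, `naive_effect`, and `decide` are hypothetical, the morning/evening split at 12:00 is an assumption, and treating `max_uncertainty` as a bound on the 95% interval half-width (1.96 times the standard error) is only one possible reading of that threshold. The estimate is a plain difference in reply rates with no confounder adjustment.

```python
import math
from datetime import datetime

# Values copied from the config shown in this document.
MAX_UNCERTAINTY = 0.3
MIN_EXPECTED_UTILITY = 0.1
PROTECTED = {"delete_email", "cancel_meeting", "send_to_new_contact", "financial_transaction"}


def send_time_bucket(iso_time):
    """Assumed bucketing: hours before 12:00 are 'morning', the rest 'evening'."""
    hour = datetime.fromisoformat(iso_time.replace("Z", "+00:00")).hour
    return "morning" if hour < 12 else "evening"


def reply_rate(events, bucket):
    """Return (reply rate, sample size) for logged events sent in the given bucket."""
    hits = [bool(e["post_state"]["reply_received"]) for e in events
            if send_time_bucket(e["time"]) == bucket]
    if not hits:
        raise ValueError(f"no events in bucket {bucket!r}")
    return sum(hits) / len(hits), len(hits)


def naive_effect(events, treated="morning", control="evening"):
    """Difference in reply rates and its standard error (no confounder adjustment)."""
    p1, n1 = reply_rate(events, treated)
    p0, n0 = reply_rate(events, control)
    effect = p1 - p0
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return effect, std_error


def decide(action, effect, std_error):
    """Return 'escalate' for protected, uncertain, or low-gain actions, else 'execute'."""
    if action in PROTECTED:
        return "escalate"
    if 1.96 * std_error > MAX_UNCERTAINTY or effect < MIN_EXPECTED_UTILITY:
        return "escalate"
    return "execute"
```

With 10 sends per bucket and reply rates of 0.8 (morning) and 0.4 (evening), the effect is 0.4 but the standard error is 0.2, so the half-width of 0.392 exceeds 0.3 and the sketch escalates. With 40 sends per bucket at the same rates the standard error drops to 0.1 and the same action would execute.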