# Council Demo Session Pack This pack gives copy/paste prompts and expected output shape so contributors can quickly sanity-check council behavior. ## Demo A — Full mode: Exploration profile (unknown unknown discovery) Goal: stress-test framing quality and epistemic diversity. Prompt: ```bash /council --profile exploration-orthogonal --triad unknowns Should we expand our AI devtool into enterprise compliance workflows this quarter? ``` Optional multi-provider routing: ```bash /council --profile exploration-orthogonal --triad unknowns --models configs/provider-model-slots.example.yaml Should we expand our AI devtool into enterprise compliance workflows this quarter? ``` What good output looks like: - at least 2 non-overlapping objections before consensus - one explicit counterfactual if early agreement forms - evidence labels across multiple categories (`empirical`, `mechanistic`, `strategic`, `ethical`, `heuristic`) - clear unknowns list with concrete data needed Expected verdict sections: - Problem, Council Composition, Model/Provider Routing, Consensus Position - Key Insights by Member, Points of Agreement/Disagreement - Minority Report, Unresolved Questions, Epistemic Diversity Scorecard, Recommended Next Steps ## Demo B — Full mode: Exploration profile (market-entry triad) Goal: validate adversarial and incentive-aware reasoning. Prompt: ```bash /council --profile exploration-orthogonal --triad market-entry Should we launch in Germany before France for our API platform? ``` What good output looks like: - explicit competitor/counterparty behavior assumptions - incentive map by actor class (buyers, legal, competitors, internal teams) - downside containment plan if launch assumptions fail ## Demo C — Full mode: Execution profile (ship-now triad) Goal: test speed-to-decision and ship-readiness. Prompt: ```bash /council --profile execution-lean --triad ship-now Should we ship release v0.9.4 today if one flaky test remains in CI? ``` What good output looks like: - binary recommendation with conditions (ship / block / canary) - explicit rollback triggers - owner + timeline in next steps ## Demo D — Full mode: Execution profile (stability triad) Goal: validate reliability-first reasoning. Prompt: ```bash /council --profile execution-lean --triad stability Our p95 latency regressed 18% after the new caching layer. Should we revert now or investigate first? ``` What good output looks like: - mechanism hypothesis list ranked by likelihood - immediate containment action - bounded investigation window and decision checkpoint ## Demo E — Quick mode Goal: test rapid 2-round deliberation with condensed output. Prompt: ```bash /council --quick Should we add a Redis cache in front of our Postgres auth queries? ``` What good output looks like: - 200-word max analyses from each member - 75-word final positions - Quick Verdict with: Panel, Positions, Consensus, Key Disagreement, Recommended Action - Total output significantly shorter than full mode - Clear, decisive recommendation ## Demo F — Duo mode Goal: test polarity-pair dialectic with focused tension. Prompt: ```bash /council --duo Should we rewrite the monolith into microservices? ``` Expected: auto-selects Aristotle vs Lao Tzu (architecture domain). What good output looks like: - Each member argues from their epistemic lens (categories vs emergence) - Round 2 directly engages the other's specific claims - Duo Verdict presents both sides without forcing consensus - The Core Tension section clearly names the irreducible disagreement - "What This Means for Your Decision" gives actionable framing Alternative duo test: ```bash /council --duo --members torvalds,musashi Should we ship the beta this week or wait for the security audit? ``` Expected: Torvalds (ship now) vs Musashi (strategic timing) — classic polarity tension. ## Demo G — Auto-triad selection Goal: test automatic triad selection from problem analysis. Prompt: ```bash /council What's the best pricing model for our developer API? ``` Expected: coordinator analyzes the problem, selects `product` triad (Torvalds + Machiavelli + Watts), states reasoning. What good output looks like: - Coordinator explicitly states which triad was selected and why - Selection rationale references problem keywords and triad domain match - Full 3-round deliberation follows ## Demo H — AI triad Goal: test AI-native reasoning with the new Karpathy + Sutskever + Ada triad. Prompt: ```bash /council --triad ai Should we fine-tune an open-source LLM or use a frontier API for our customer support agent? ``` Expected: Karpathy assesses ML capability and training dynamics, Sutskever evaluates scaling/safety implications, Ada provides formal analysis of the problem structure. What good output looks like: - Karpathy grounds the analysis in actual model capabilities and failure modes - Sutskever assesses what happens as the system scales (more queries, edge cases, adversarial users) - Ada identifies the formal properties the system must preserve (accuracy, consistency, safety constraints) - Productive tension between Karpathy (empirical, build-and-observe) and Ada (formal, prove-before-build) ## Demo I — AI duo Goal: test the Karpathy vs Sutskever polarity pair. Prompt: ```bash /council --duo Should we train our own foundation model or build on top of existing ones? ``` Expected: auto-selects Karpathy vs Sutskever (ai domain keyword). What good output looks like: - Karpathy argues from empirical ML experience (training dynamics, data requirements, what you actually learn) - Sutskever argues from scaling frontier perspective (where's the phase transition, what's the safety boundary) - Core Tension: build-and-learn vs. pause-and-research - Neither position forced to converge ## Demo J — Decision triad (Kahneman + Munger + Aurelius) Goal: test cognitive bias detection, inversion reasoning, and moral clarity. Prompt: ```bash /council --triad decision Should we acquire this competitor or build the feature ourselves? ``` What good output looks like: - Kahneman identifies specific biases (sunk cost, overconfidence, anchoring on acquisition price) - Munger inverts ("what would guarantee this acquisition destroys value?") and checks circle of competence - Aurelius draws the control boundary and identifies the duty regardless of difficulty - Productive tension between Kahneman's bias skepticism and Munger's multi-model confidence ## Demo K — Uncertainty duo (Taleb vs Karpathy) Goal: test the tail risk vs empirical scaling tension. Prompt: ```bash /council --duo --members taleb,karpathy Should we deploy this ML model to production with 99.2% accuracy? ``` What good output looks like: - Taleb classifies the domain (Mediocristan vs Extremistan) and assesses fragility of the 99.2% claim - Karpathy assesses what the model actually learned, where the 0.8% errors cluster, and failure modes - Core tension: Taleb warns about catastrophic tail scenarios hidden in the 0.8%; Karpathy argues empirical observation reveals more than theoretical tail analysis - Duo Verdict presents both without forcing consensus ## Demo L — Design triad (Rams + Torvalds + Watts) Goal: test user-centered design reasoning alongside engineering pragmatism and philosophical reframing. Prompt: ```bash /council --triad design Our onboarding flow has 8 steps and 40% drop-off. Should we simplify or add more guidance? ``` What good output looks like: - Rams evaluates from the user's perspective — which steps fail the honesty/clarity test - Torvalds asks what's the boring, maintainable solution - Watts questions whether the framing ("simplify vs add guidance") is a false dichotomy - Concrete recommendation with user evidence ## Fast scoring rubric (0-2 each, 10 max) 1. Perspective spread: distinct viewpoints, not paraphrases 2. Decision clarity: actionable recommendation with thresholds 3. Counterfactual depth: strongest alternative is seriously tested 4. Evidence discipline: claims tagged and justified 5. Execution quality: concrete owners, deadlines, rollback criteria Interpretation: - 9-10 strong - 7-8 usable - <=6 revise profile/triad/model routing