---
name: judgment-postmortem-calibration
description: "Build VC judgment faster through structured postmortems with quantified calibration: log initial takes, track prediction accuracy with Brier scores, and measure learning rate over time. Use after decisions, passes, and major diligence sprints."
license: Proprietary
compatibility: Works offline; improved with longitudinal tracking; optional Salesforce logging.
metadata:
  author: evalops
  version: "0.3"
---

# Judgment postmortem calibration

## When to use

Use this skill when you want to:

- Improve selection judgment (faster learning, fewer repeated mistakes)
- Capture why you said yes/no and how evidence changed your view
- **Measure your prediction accuracy and learning rate over time**
- Build an internal "decision log" that compounds
- Review investments or passes after outcomes are known

**Trigger points:**

- After every IC decision (invest or pass)
- After every competitive loss
- Quarterly: review passes that raised from others + calculate calibration metrics
- Annually: review portfolio outcomes vs initial thesis + measure learning rate

## Inputs you should request (only if missing)

- Deal name + date of first meeting
- Your initial take (reconstruct honestly if not documented)
- Outcome to date (funded by others? traction? pivot? shut down?)
- Original memo or notes (if available)
- **Your probability estimates at decision time (if recorded)**

## Outputs you must produce

1) **Decision log entry** (structured, one page)
2) **Calibration scorecard** (predictions vs reality with scores)
3) **Brier score calculation** (for probabilistic predictions)
4) **Updated heuristics** (2-5 actionable bullets)
5) **Pattern library update** (what archetype was this?)
6) **Learning rate metrics** (are you getting better?)
7) **Follow-up list** (who to ping, what to track)

Templates:

- assets/decision-log.md
- assets/calibration-tracker.csv (for longitudinal tracking)

## Core principle: Measure what you believe, then score it

The value of postmortems comes from:

1. **Honest recording** of what you believed at decision time
2. **Quantified predictions** (probabilities, not just "I thought X")
3. **Systematic scoring** against outcomes
4. **Tracking improvement** over time

## Procedure

### 1) Capture the timeline with predictions

| Date | Event | Your belief | Confidence (%) | Outcome |
|---|---|---|---|---|
| | First meeting | "This will be a $1B+ outcome" | 20% | |
| | First meeting | "Product-market fit within 12 months" | 60% | |
| | Diligence | "They'll close 3 enterprise deals in 6 months" | 40% | |
| | Decision | "Worth investing" | 70% | |
| | +12 months | | | Actual outcome |

### 2) Record the initial thesis with probabilities

**At first meeting, I believed:**

- What would make the company win:
- P(success | investment) estimate: ___%
- P(this raises next round) estimate: ___%
- P(achieves stated 12-month milestones) estimate: ___%
- Top risk and P(risk materializes): ___%
- My recommendation:

**At decision point, I believed:**

- P(success | investment): ___%
- P(raises next round): ___%
- Top risks with probabilities:
  1. Risk: ___ | P(materializes): ___%
  2. Risk: ___ | P(materializes): ___%
- Final recommendation:
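To make these estimates scoreable later, log each one at the moment you commit to it. A minimal Python sketch, assuming a simple column layout for assets/calibration-tracker.csv; the shipped template may use different columns, and `log_prediction` is an illustrative helper, not part of the skill:

```python
import csv
from datetime import date

# Assumed columns for assets/calibration-tracker.csv -- adjust to match
# the actual template. "outcome" stays blank until the prediction resolves.
FIELDS = ["deal", "logged_on", "prediction", "confidence", "resolves_by", "outcome"]

def log_prediction(deal, prediction, confidence, resolves_by,
                   path="assets/calibration-tracker.csv"):
    """Append one probabilistic prediction for later Brier scoring."""
    assert 0.0 < confidence < 1.0, "log a probability, not a certainty"
    with open(path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=FIELDS).writerow({
            "deal": deal,
            "logged_on": date.today().isoformat(),
            "prediction": prediction,
            "confidence": confidence,
            "resolves_by": resolves_by,
            "outcome": "",  # fill with 1 or 0 at the recheck date
        })

log_prediction("Acme Robotics", "raises Series A within 18 months", 0.70, "2026-12-31")
```

The key design choice is writing the row at decision time, not at review time: a prediction reconstructed after the outcome is known cannot be scored honestly.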
### 3) Document the decision

- **Decision:** Invest / Pass / Lost competitive
- **Stated rationale (at the time):**
- **Unstated factors (be honest):**
- **Confidence in decision:** ___%

### 4) Score predictions against reality

**Prediction scorecard:**

| Prediction | Your P(true) | Actual (1/0) | Brier contribution |
|---|---|---|---|
| "This will raise Series A" | 70% | 1 (yes) | (0.7-1)² = 0.09 |
| "Product-market fit in 12mo" | 60% | 0 (no) | (0.6-0)² = 0.36 |
| "3 enterprise deals in 6mo" | 40% | 0 (no) | (0.4-0)² = 0.16 |
| "Top risk materializes" | 30% | 1 (yes) | (0.3-1)² = 0.49 |

**Brier score for this deal:** (sum of contributions) / n. For the example above: (0.09 + 0.36 + 0.16 + 0.49) / 4 = 0.275.

- Perfect = 0.0; always guessing 50% = 0.25; always confidently wrong = 1.0
- Good forecaster: < 0.20
- Reasonable forecaster: 0.20 - 0.25

(A scoring sketch for this step and the next appears after step 7.)

### 5) Calculate calibration (are your probabilities accurate?)

Group your historical predictions by confidence level:

| Confidence bucket | Predictions | Outcomes (% true) | Calibration gap |
|---|---|---|---|
| 10-20% | 15 | 18% true | +3% (slightly under-confident) |
| 30-40% | 22 | 28% true | -7% (slightly over-confident) |
| 50-60% | 18 | 52% true | -3% (well calibrated) |
| 70-80% | 12 | 58% true | -17% (over-confident) |
| 90%+ | 5 | 80% true | -12% (over-confident) |

**Calibration insight:** "I tend to be over-confident in the 70-80% range. When I say 75%, things happen ~60% of the time."

### 6) Identify what you underweighted or overweighted

**For this deal:**

- Underweighted: ___
- Overweighted: ___
- Surprise factor: ___

**Pattern across deals (update quarterly):**

| Factor | Times underweighted | Times overweighted |
|---|---|---|
| Team learning rate | | |
| Distribution advantages | | |
| Timing/market readiness | | |
| Technical moat | | |
| Competition | | |
| Founder-market fit | | |

### 7) Extract heuristics (portable rules)

Good heuristics are:

- Specific enough to act on
- Falsifiable
- Tied to observed evidence
- **Attached to a base rate**

**Heuristic format:** "When [specific condition], [outcome] happens [X%] of the time in my experience."

Examples:

- "When a seed-stage founder can't name a specific buyer trigger event, they fail to hit enterprise sales targets 80% of the time."
- "When we invest in a second-time founder with prior distribution success, they raise Series A 90% of the time."

Write 2-5 heuristics from this postmortem:

1.
2.
3.
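The arithmetic in the step 4 scorecard is mechanical and worth automating. A minimal sketch of the Brier calculation, reproducing the worked example; the (probability, outcome) pairs come straight from the scorecard table:

```python
def brier_score(predictions):
    """Mean squared gap between stated probability and the binary outcome.
    predictions: list of (probability, outcome) pairs, outcome in {0, 1}.
    0.0 is perfect; always guessing 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

# The four predictions from the step 4 scorecard:
deal = [(0.70, 1), (0.60, 0), (0.40, 0), (0.30, 1)]
print(f"Deal Brier score: {brier_score(deal):.3f}")  # 0.275
```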
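Step 5's bucketing works the same way once enough predictions have resolved. A sketch, assuming `history` is a list of (probability, outcome) pairs loaded from the tracker; the 10-point bucket width is an assumption, so use whatever granularity your sample size supports:

```python
from collections import defaultdict

def calibration_table(history):
    """Bucket (probability, outcome) pairs into 10-point confidence bands
    and report the gap between observed frequency and stated confidence.
    Negative gap = over-confident; positive gap = under-confident."""
    buckets = defaultdict(list)
    for p, o in history:
        buckets[min(round(p * 100) // 10, 9)].append((p, o))  # 90%+ is one band
    for b in sorted(buckets):
        pairs = buckets[b]
        avg_p = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(o for _, o in pairs) / len(pairs)
        print(f"{b * 10}-{b * 10 + 10}%: n={len(pairs)}, "
              f"{observed:.0%} true, gap {observed - avg_p:+.0%}")

# e.g. loaded from assets/calibration-tracker.csv once outcomes are known
history = [(0.75, 1), (0.75, 0), (0.70, 0), (0.55, 1), (0.35, 0), (0.15, 0)]
calibration_table(history)
```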
### 8) Update your pattern library with base rates

**Archetype performance tracking:**

| Archetype | Deals | Success rate | Avg Brier | Notes |
|---|---|---|---|---|
| First-time founder, crowded market | 8 | 25% | 0.28 | Over-confident on differentiation |
| Second-time founder, distribution edge | 5 | 80% | 0.15 | Under-confident on execution |
| Technical founder, no GTM | 6 | 33% | 0.32 | Over-weight technical moat |

### 9) Measure learning rate (quarterly)

**Rolling Brier score by quarter:**

| Quarter | Deals scored | Avg Brier | Calibration gap | Trend |
|---|---|---|---|---|
| Q1 2025 | 12 | 0.28 | 15% over-confident | Baseline |
| Q2 2025 | 15 | 0.24 | 10% over-confident | Improving |
| Q3 2025 | 14 | 0.21 | 8% over-confident | Improving |
| Q4 2025 | 16 | 0.19 | 5% over-confident | Good |

**Learning rate = (Brier_{t-1} - Brier_t) / Brier_{t-1}**

Positive values mean you are improving, since lower Brier is better. For the example table, Q2's learning rate is (0.28 - 0.24) / 0.28 ≈ 14%. (A trend sketch appears at the end of this document.)

Target: 5-10% improvement per quarter until Brier < 0.20

### 10) Create follow-up list

| What to track | Signal | Recheck date | Prediction to score |
|---|---|---|---|
| | | | P(___) = ___% |
| | | | P(___) = ___% |

| Who to keep warm | Why | Next touch |
|---|---|---|
| | | |

## Quarterly calibration review

Every quarter:

1. Score all predictions that reached outcome date
2. Calculate Brier scores by deal and overall
3. Update calibration table (predictions vs outcomes by confidence bucket)
4. Identify systematic biases (over/under-confidence patterns)
5. Review passes that raised: were pass reasons validated?
6. Update heuristics with new base rates
7. Calculate learning rate vs previous quarter

**Quarterly output:**

- Brier score trend chart
- Calibration curve (predicted % vs actual %)
- Top 3 biases to correct
- Updated heuristic base rates

## Annual review: Portfolio outcomes vs initial thesis

For each portfolio company:

- What did we believe at investment?
- What's the current reality?
- Where were we right/wrong?
- Score the original predictions
- Update archetype base rates

**Annual output:**

- Portfolio Brier score
- Best/worst calibrated predictions
- Archetype performance update
- Learning rate over 4 quarters
- Heuristics validated or invalidated

## Salesforce logging (optional)

If Salesforce is your system of record:

- Add predictions as structured fields on the Opportunity (or in Notes)
- Record confidence levels at each stage
- Link a postmortem Note titled "Postmortem (YYYY-MM-DD)"
- Update outcome fields when known
- Tag with the heuristics extracted

## Edge cases

- If you have no outcome yet: run a "process postmortem" focused on what you learned and what evidence was missing. Record predictions for future scoring.
- If the outcome is ambiguous: define binary success criteria now, score later.
- If you can't remember your initial thesis: reconstruct as honestly as possible, and **start recording predictions with probabilities now**.
- If you have few deals: even 10-15 scored predictions start to show calibration patterns.
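Referenced from step 9: a minimal sketch of the quarterly learning-rate trend, using the example figures from the step 9 table (the quarter labels and Brier values are illustrative):

```python
def learning_rate(brier_prev, brier_curr):
    """Fractional Brier improvement vs the previous quarter (step 9).
    Positive = improving, since lower Brier is better."""
    return (brier_prev - brier_curr) / brier_prev

# Quarterly average Brier scores from the step 9 example table
quarters = [("Q1 2025", 0.28), ("Q2 2025", 0.24), ("Q3 2025", 0.21), ("Q4 2025", 0.19)]
for (_, prev), (q, curr) in zip(quarters, quarters[1:]):
    print(f"{q}: Brier {curr:.2f}, learning rate {learning_rate(prev, curr):+.0%}")
# Q2 2025: Brier 0.24, learning rate +14%
# Q3 2025: Brier 0.21, learning rate +12%
# Q4 2025: Brier 0.19, learning rate +10%
```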