---
name: metric-diagnosis
description: Use when investigating unexpected metric changes - systematically narrows root cause through 4D segmentation, intrinsic vs extrinsic factor analysis, hypothesis testing, and North Star impact assessment
---

# Metric Diagnosis Workflow

## Purpose

Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify the root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.

## When to Use This Workflow

Use this workflow when:

- Key metrics drop or spike unexpectedly
- Leadership asks "what happened to [metric]?"
- Post-mortem analysis is needed after an incident
- An A/B test shows unexpected patterns
- You need to validate a suspected root cause
- User behavior changed suddenly without an obvious reason

## Skills Sequence

This workflow applies the `root-cause-diagnosis` skill three times, then assesses North Star impact:

```
1. Root Cause Diagnosis (Phase 1)
   ↓ (Segment across 4 dimensions: People, Geography, Technology, Time)
2. Root Cause Diagnosis (Phase 2)
   ↓ (Distinguish intrinsic vs extrinsic factors)
3. Root Cause Diagnosis (Phase 3)
   ↓ (Build hypothesis table and test against data)
4. North Star Alignment
   ↓ (Assess impact on top-line metrics)

OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions
```

## Required Inputs

Gather this information before starting:

### Metric Change Information

- **The metric that changed**
  - Name and definition
  - Example: "Weekly Active Users (WAU)"
- **Magnitude of change**
  - Absolute and percentage
  - Example: "10,000 → 9,000 (10% drop)"
- **Timeframe**
  - When was it noticed?
  - Over what period did the change occur?
  - Example: "Noticed today, change happened over past week"

### Data Access

- **Segmented data availability**
  - Can you segment by user type, geography, platform?
  - Are raw logs accessible?
- **Comparison data**
  - Historical baselines
  - Year-over-year seasonality
  - Competitor benchmarks (if available)

### Recent Activity Log

- **Recent releases** (past 1-4 weeks)
  - Feature launches
  - Bug fixes
  - Infrastructure changes
- **Running experiments**
  - Active A/B tests
  - Configuration changes
- **External context**
  - Marketing campaigns
  - News events
  - Competitor activity

## Workflow Steps

### Phase 0: Data Quality Verification (5 minutes)

**Before investigating, confirm the data is real:**

1. **Check reporting systems**
   - Is tracking still functional?
   - Any pipeline issues?
   - Is data freshness correct?
2. **Cross-reference sources**
   - Do multiple data sources show the same pattern?
   - Are internal dashboards and external tools aligned?
3. **Validate sample sizes**
   - Is there sufficient data for statistical significance?
   - Are the time periods comparable?

**Red flags that indicate data quality issues:**

- Metric went to exactly zero
- Change happened at exactly midnight
- Only visible in one data source
- Coincides with a known tracking deployment

**If a data quality issue is found:** Fix tracking first, then re-investigate if the problem persists.

**If data is valid:** Proceed to Phase 1.

### Phase 1: Narrow the Scope (15-20 minutes)

**Use the `root-cause-diagnosis` skill - 4 Dimension Segmentation**

Ask clarifying questions across all four dimensions:

#### Dimension 1: People (User Segments)

Questions to ask:

- Does it affect all users equally?
- Specific demographics (age, gender, location)?
- New vs. returning users?
- Free vs. paid users?
- Power users vs. casual users?
- Specific cohorts or user types?

Request data segmentation by user attributes.

#### Dimension 2: Geography

Questions to ask:

- Specific countries or regions affected?
- Certain cities?
- Urban vs. rural?
- Climate zones?
- Time zones?

Request data segmentation by location.

#### Dimension 3: Technology (Platform)

Questions to ask:

- iOS vs. Android?
- Web vs. mobile app?
- Desktop vs. mobile web?
- Specific browser or OS versions?
- Device types?
- Network types (WiFi vs. cellular)?

Request data segmentation by platform/device.

#### Dimension 4: Time

Questions to ask:

- When exactly did it start?
- Gradual decline or sudden drop?
- Day-of-week patterns?
- Time-of-day patterns?
- Recurring or one-time?
- Seasonal effects?

Request time-series data with a granular breakdown.

**Output of Phase 1:** Narrow the problem statement from:

- ❌ "Usage is down"

To:

- ✓ "iOS usage among teens in the US dropped 10% starting Sept 1st"

**Document:**

```markdown
## Narrowed Scope

**Original statement:** [Broad problem]
**Narrowed statement:** [Specific problem]

**Affected segments:**
- People: [Which user groups]
- Geography: [Which locations]
- Technology: [Which platforms]
- Time: [When and pattern]

**Unaffected segments:**
- [What stayed stable - equally important]
```

### Phase 2: Generate Hypotheses (15-20 minutes)

**Use the `root-cause-diagnosis` skill - Intrinsic vs. Extrinsic Factor Analysis**

Brainstorm potential causes in both categories:

#### Intrinsic Factors (Internal Changes)

**Engineering & Product:**

- Recent code releases (check deploy logs)
- Bug introductions
- Feature launches by any team
- A/B experiments started or stopped
- Algorithm or ranking changes
- Performance degradations

**Check with:**

- Engineering team (deploys, errors, performance)
- Other product teams (their launches)
- Data team (tracking changes)

**Operations & Business:**

- Pricing changes
- Policy updates
- Support process changes
- Marketing campaign changes

**Check with:**

- Operations team
- Marketing team
- Customer success team

#### Extrinsic Factors (External Events)

**Competition:**

- Competitor product launches
- Competitor pricing changes
- Competitor marketing campaigns

**Market:**

- Economic shifts
- Industry trends
- Regulatory changes
- Platform policy changes (iOS, Android, etc.)
**Calendar/Seasonal:**

- Holidays
- School schedules
- Weather patterns
- Cultural events

**News/PR:**

- News coverage (positive or negative)
- Social media trends
- External events affecting user behavior

**Check with:**

- Sales team (competitive intelligence)
- Customer success (user feedback)
- Marketing (market trends)
- Industry news sources

**Output of Phase 2:** A list of 4-6 hypotheses (a mix of intrinsic and extrinsic):

```markdown
## Hypothesis List

**Intrinsic (Internal):**
1. [Hypothesis name]: [Description]
2. [Hypothesis name]: [Description]
3. [Hypothesis name]: [Description]

**Extrinsic (External):**
4. [Hypothesis name]: [Description]
5. [Hypothesis name]: [Description]
6. [Hypothesis name]: [Description]
```

### Phase 3: Test Hypotheses (20-30 minutes)

**Use the `root-cause-diagnosis` skill - Hypothesis Table Method**

#### Step 3.1: Create Hypothesis Table

**Columns:** Data points observed (from Phase 1)
**Rows:** Potential causes (from Phase 2)

**Fill each cell with the predicted impact:**

- ↓ = Would decrease this segment
- ↑ = Would increase this segment
- → = No effect on this segment
- ? = Uncertain

#### Step 3.2: Match Patterns

For each hypothesis:

- Do predictions match observations?
- If ALL predictions match → Possible cause
- If ANY prediction contradicts → Rule out

#### Step 3.3: Request Differentiating Data

If multiple hypotheses fit:

- Identify data points that would differ between them
- Request additional segmentation
- Narrow to the single most likely cause

**Example Table:** (✓ = prediction matches the observation, ✗ = contradicts it)

| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → |
|-------|-------|-----------|---------|----------|------|------|
| iOS bug in latest update | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| School started (US only) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor targeted teens | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |

**Pattern match:** "School started (US)" fits all observations (US teens skew heavily toward iOS, so the platform split matches too) → Most likely cause

**Output of Phase 3:**

```markdown
## Hypothesis Testing Results

**Ruled out:**
- [Hypothesis]: [Why ruled out]
- [Hypothesis]: [Why ruled out]

**Possible causes (remaining):**
1. [Most likely hypothesis]
   - Evidence supporting: [List]
   - Evidence against: [List if any]
   - Confidence: [High/Medium/Low]
2. [Second most likely] (if applicable)
   - Evidence supporting: [List]
   - Confidence: [High/Medium/Low]

**Additional data needed to confirm:**
- [Data point that would differentiate remaining hypotheses]
```

### Phase 4: Assess North Star Impact (10 minutes)

**Use the `north-star-alignment` skill**

Evaluate whether the identified root cause impacts company-level metrics:

**Questions:**

- Does this affect North Star metrics?
- Is the impact temporary or permanent?
- Does it require immediate action?
- What's the business impact (revenue, growth, retention)?
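These yes/no answers map naturally onto the severity levels defined below. A minimal sketch of such a decision rule; the parameter names and the exact mapping are illustrative assumptions, not part of the workflow:

```python
def classify_severity(affects_revenue: bool,
                      affects_north_star: bool,
                      permanent_if_unfixed: bool,
                      affects_user_experience: bool) -> str:
    """Map Phase 4 answers to a severity level (illustrative rule only)."""
    if affects_revenue or (affects_north_star and permanent_if_unfixed):
        return "Critical"    # act immediately
    if affects_user_experience or affects_north_star:
        return "Important"   # act soon
    return "Low Priority"    # monitor

# e.g. a seasonal dip: no revenue impact, no North Star impact, self-resolving
print(classify_severity(False, False, False, False))  # → Low Priority
```

Encoding the rule like this is mainly useful for keeping severity calls consistent across diagnoses; the actual judgment still comes from the questions above.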
**Severity classification:**

**Critical (Act immediately):**

- Impacts revenue directly
- Affects North Star metrics substantially
- Permanent degradation if not fixed
- Competitive disadvantage

**Important (Act soon):**

- Impacts user experience
- Affects intermediate metrics
- May worsen over time
- Needs monitoring

**Low Priority (Monitor):**

- Expected variation (seasonality)
- Affects non-critical segments
- Self-resolving
- No action needed

**Output of Phase 4:**

```markdown
## North Star Impact Assessment

**Affected North Star metrics:**
- [Metric]: [Impact description]
- [Metric]: [Impact description]

**Business impact:**
- Revenue: [Direct/Indirect/None]
- Growth: [Slowed/Accelerated/Neutral]
- Retention: [Harmed/Improved/Neutral]

**Severity:** [Critical/Important/Low Priority]
**Urgency:** [Immediate/This week/Monitor]
```

### Phase 5: Recommend Actions (10 minutes)

Based on the root cause and severity, recommend next steps:

**Action categories:**

1. **Fix (for bugs/issues)**
   - What needs to be fixed?
   - Who owns the fix?
   - Timeline for the fix?
2. **Mitigate (for external factors)**
   - What can reduce harm?
   - Are counter-measures available?
   - Timeline for mitigation?
3. **Monitor (for expected variations)**
   - What tracking is needed?
   - Alert thresholds?
   - Review cadence?
4. **No Action (for acceptable changes)**
   - Why is no action needed?
   - What validates this decision?
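The "Monitor" category implies concrete alert thresholds. One common choice is a z-score rule against a historical baseline; a minimal sketch, where the 3σ threshold and the sample WAU readings are illustrative assumptions:

```python
from statistics import mean, stdev

def should_alert(history: list[float], current: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag the current metric value if it deviates from the historical
    baseline by more than z_threshold standard deviations."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is notable
    return abs(current - mu) / sigma > z_threshold

# Weekly WAU readings, then a sudden 10% drop
baseline = [10_000, 10_050, 9_980, 10_020, 10_010]
print(should_alert(baseline, 9_000))  # → True
```

In practice the baseline window and threshold should come from the "normal" variation established in Phase 0, so that seasonal noise does not trigger false alarms.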
**Output document:**

```markdown
# Metric Diagnosis Report: [Metric Name]

## Executive Summary
[One paragraph: What happened, why, what to do]

## Problem Statement
- **Initial observation:** [Broad statement]
- **Narrowed scope:** [Specific statement with segments]
- **Magnitude:** [Absolute and % change]
- **Timeframe:** [When occurred]

## Root Cause Analysis

### Narrowed Scope (Phase 1)
- **Affected:** [User segments, geography, platforms, time pattern]
- **Unaffected:** [What stayed stable]

### Hypotheses Tested (Phases 2-3)
| Hypothesis | Category | Evidence | Conclusion |
|------------|----------|----------|------------|
| [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely |

### Identified Root Cause
**Most likely cause:** [Hypothesis name]

**Evidence supporting:**
1. [Data point matching prediction]
2. [Data point matching prediction]
3. [Stakeholder confirmation]

**Confidence level:** [High/Medium/Low]

### North Star Impact (Phase 4)
- **Affected metrics:** [List with impact]
- **Severity:** [Critical/Important/Low]
- **Urgency:** [Immediate/Soon/Monitor]

## Recommended Actions

### Primary action: **[Fix/Mitigate/Monitor/No Action]**
- What: [Specific action]
- Who: [Owner]
- When: [Timeline]
- Success criteria: [How to know it worked]

### Secondary actions:
- [Action 1]
- [Action 2]

### Monitoring plan:
- Metrics to track: [List]
- Review frequency: [Daily/Weekly]
- Alert thresholds: [When to escalate]

## Appendix
- Hypothesis table: [Detailed]
- Data sources: [Links]
- Stakeholders consulted: [Names/teams]
```

## Common Diagnostic Patterns

### Pattern 1: Seasonal Variation

- **Symptoms:** Recurring pattern (weekly, yearly)
- **Investigation:** Compare to the same period historically
- **Action:** Usually monitor; adjust baselines
- **Example:** School year start, holiday season

### Pattern 2: Competitor Impact

- **Symptoms:** Sudden shift, specific demographic
- **Investigation:** Check competitor news, market research
- **Action:** Counter-strategy, feature parity
- **Example:** Competing app launch with targeted marketing

### Pattern 3: Internal Bug

- **Symptoms:** Sudden drop, platform-specific
- **Investigation:** Check recent releases, error logs
- **Action:** Hotfix; roll back if needed
- **Example:** iOS update broke a key feature

### Pattern 4: Product Change Side Effect

- **Symptoms:** Gradual shift after a release
- **Investigation:** Analyze feature adoption patterns
- **Action:** Iterate on the feature, A/B test variations
- **Example:** Redesign changed user behavior

### Pattern 5: Tracking Issue (False Alarm)

- **Symptoms:** Exact numbers, midnight timing, single source
- **Investigation:** Verify the data pipeline
- **Action:** Fix tracking, dismiss the false alarm
- **Example:** Analytics tag removed in a deploy

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first |
| Only considering one type of factor | Brainstorm both intrinsic AND extrinsic |
| Accepting the first matching hypothesis | Test all hypotheses systematically |
| Skipping the data quality check | Always verify the data is valid first |
| Not consulting stakeholders | Talk to sales, CS, and engineering teams |
| Immediate rollback without diagnosis | Understand the cause before acting |

## Success Criteria

Metric diagnosis succeeds when:

- Data quality is verified
- Problem is narrowed to specific segments (4 dimensions)
- 4-6 hypotheses are brainstormed (intrinsic + extrinsic)
- Hypotheses are tested against data systematically
- Root cause is identified (or top 2-3 candidates with confidence levels)
- North Star impact is assessed
- Recommended actions are clear, with owners and timelines
- Diagnosis document is created and shared
- Stakeholders are aligned on next steps

## Real-World Example: Microsoft Teams Downloads Drop

### Initial Report

"New app downloads down 50% from last week"

### Phase 1: Narrow Scope (15 min)

**Questions asked:**

- Geography?
  → "Predominantly US and Europe"
- User type?
  → "Enterprise customers, not consumers"
- Platform?
  → "Web and desktop primarily, some mobile"
- Other metrics?
  → "Usage unchanged (existing users still active)"

**Narrowed statement:** "Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."

### Phase 2: Generate Hypotheses (15 min)

**Intrinsic:**

1. Bug in download flow
2. Recent release broke something
3. A/B test reducing visibility

**Extrinsic:**

4. Marketing promotion ended
5. Competitor launch
6. Economic downturn affecting enterprise spending

**Stakeholder consults:**

- Engineering: No recent releases, no bugs reported
- Marketing: "We were running 50% off for enterprises switching from Slack"
- Sales: No competitor news, pipeline healthy

### Phase 3: Test Hypotheses (20 min)

(✓ = prediction matches the observation, ✗ = contradicts it)

| Hypothesis | Enterprise | Consumer | US/EU | Usage Same | Downloads Only |
|------------|------------|----------|-------|------------|----------------|
| Bug in flow | ✗ Both | ✗ Both | ✗ All | ? | ✓ |
| Promo ended | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor | ✗ Both | ✗ Both | ✗ All | ? | ✓ |

**Pattern match:** "Promotion ended" fits all observations

**Confirmation:** Marketing confirms the promotion ended last weekend

### Phase 4: North Star Impact (5 min)

**Root cause:** End of promotional campaign

- Downloads during promo = artificially inflated baseline
- Current downloads = return to normal baseline

**North Star impact:**

- MAU: No impact (existing users unaffected)
- Revenue: No impact (existing customers retained)
- Severity: Low (false alarm, not an actual problem)

### Phase 5: Recommended Actions (5 min)

**Primary action: No action needed**

- Not an actual decline, just a return to baseline
- Adjust baselines for future comparisons

**Secondary actions:**

- Update dashboards to flag promotional periods
- Establish a "normal" baseline for future monitoring
- Document for future reference (prevent false alarms)

**Time to diagnosis: 60 minutes**

## Related Skills

This workflow orchestrates these skills:

- **root-cause-diagnosis** (Phases 1-3) - Core investigation method
- **north-star-alignment** (Phase 4) - Assess strategic impact

## Related Workflows

- **tradeoff-decision**: May use diagnosis to understand why metrics changed before deciding
- **metrics-definition**: Uses a similar systematic approach for defining vs. diagnosing

## Time Estimate

**Total: 60-90 minutes**

- Phase 0 (Data quality): 5 min
- Phase 1 (Narrow scope): 15-20 min
- Phase 2 (Hypotheses): 15-20 min
- Phase 3 (Test): 20-30 min
- Phase 4 (Impact): 10 min
- Phase 5 (Actions): 10 min

Complex issues may require multiple iterations or a longer investigation.