---
name: feature-metrics
description: Define success metrics using the STEDII framework for trustworthy experiment metrics.
disable-model-invocation: false
user-invocable: true
---

# /feature-metrics - Define Success Metrics

Select trustworthy metrics using the STEDII framework.

## Context Routing Logic (Internal - for Claude)

**Automatic Context Checks:**
When this skill is invoked, immediately check:

| Source          | Files/Folders                                          | Search Terms                          | What to Extract                            |
| --------------- | ------------------------------------------------------ | ------------------------------------- | ------------------------------------------ |
| Current PRD     | `thoughts/shared/pm/prds/*.md`                         | feature name from chat                | Hypothesis, problem statement, user impact |
| Business Info   | `thoughts/shared/pm/context/business-info-template.md` | business model, growth stage, metrics | Product strategy, current North Star       |
| Metrics Context | `thoughts/shared/pm/metrics/*.md`                      | baseline numbers, historical data     | Current metric baselines, ranges           |
| Strategy        | `thoughts/shared/pm/frameworks/*.md`                   | feature related to strategic pillar   | Strategic fit and expected outcomes        |
| Meetings        | `thoughts/shared/product/meeting-notes/*.md`           | feature name, "success metrics"       | Stakeholder expectations, past decisions   |

**Context Priority:**

1. Current PRD and feature context FIRST
2. Business model and strategy SECOND
3. Historical metrics and baselines THIRD
4. Stakeholder expectations FOURTH

**Cross-Skill Links:**

- If the feature is part of a larger product strategy → Link to `/write-prod-strategy`
- If testing this feature → Link to `/experiment-decision` and `/experiment-metrics`
- If the metric is North Star related → Link to `/define-north-star`
- If sizing impact → Link to `/impact-sizing` for usage estimates
- If tracking retention → Link to `/retention-analysis` for cohort analysis

---

## When to Use

- Defining success criteria for a new feature
- Setting up an A/B test
- Creating a PRD metrics section
- Validating existing metrics

---

## Step 0: Understanding Current State

Before we define metrics, let me check what context already exists...

**Checking:**

- `thoughts/shared/pm/prds/` for any existing PRD for this feature
- `thoughts/shared/pm/context/business-info-template.md` for your product model
- `thoughts/shared/pm/metrics/` for historical baseline data
- `thoughts/shared/pm/frameworks/` for strategic context
- `thoughts/shared/product/meeting-notes/` for stakeholder expectations

**[If feature PRD exists]:** "I found your [Feature Name] PRD from [date]. It mentions [hypothesis/goal]. Let me use that as context."

**[If metrics exist]:** "I found historical data: [Metric] baselines are currently [values]. I'll use this as reference."

**Based on what I find, I'll show you:**

### What We Know About This Feature

**Strategic Context:**

- [How this feature fits into your Q# strategy / roadmap]
- [Expected user impact: # of users affected]
- [Business outcome: revenue/retention/engagement impact]

**Current Baselines:**

- [Relevant historical metrics for comparison]
- [Product stage: early-stage feature / mature feature / existing metric improvement]

**Success Expectations:**

- [From stakeholder meetings: what they're expecting]
- [From user research: what users need]
- [From business model: what drives your North Star]

### Questions to Clarify Before Selecting Metrics

1. **Feature Scope:** Is this a small UX improvement, a new capability, or a major feature overhaul?
2. **User Segment:** Who is this feature for? All users, a specific segment, or internal teams?
3. **Impact Type:** Are we trying to drive growth, engagement, retention, monetization, or efficiency?
4. **Experiment Timeline:** How long can we run the test? (This affects which metrics we can use)
5. **Business Context:** What's more important right now - speed or certainty?

---

## STEDII Framework

Every good metric should pass these 6 criteria:

### S - Sensitive

Can the metric detect changes from your feature?

- Will it move meaningfully with expected impact?
- Is the sample size sufficient?

### T - Timely

How quickly does the metric respond?

- Can you measure it within your experiment window?
- Leading indicators > lagging indicators

### E - Easy to Understand

Can stakeholders interpret it?

- Avoid complex calculations
- Clear cause and effect

### D - Directional

Is improvement clear?

- Up = good or Down = good? Be explicit
- Avoid metrics where direction is ambiguous

### I - Implementable

Can you actually track it?

- Data exists or can be collected
- Engineering effort is reasonable

### I - Independent

Is it insulated from external factors?

- Seasonality effects?
- Other experiments running?

---

## Quick Start Prompt

When the PM types `/feature-metrics`, respond:

```
Let's define metrics for your feature. I'll use the STEDII framework.

Tell me:
1. What feature are we measuring?
2. What user behavior does it change?
3. What business outcome do we expect?

I'll help you select primary metrics, guardrails, and kill criteria.
```

---

## Metric Types

### Primary Metric

The one metric that defines success.

- Directly tied to feature goal
- Must pass all STEDII criteria
- Single source of truth for go/no-go

### Guardrail Metrics

Metrics that must NOT get worse.

- Protect against unintended harm
- Set acceptable ranges (not targets)
- Examples: page load time, error rate, support tickets

### Kill Criteria

When to stop the experiment early.

- Serious negative impact threshold
- Safety concerns
- Automatic rollback triggers

---

## Output Template

```markdown
# Feature Metrics: [Feature Name]

## Primary Metric

**Metric:** [Name]
**Definition:** [Exactly how it's calculated]
**Current baseline:** [X]
**Target:** [Y] ([+/- Z%])
**Timeline:** [When we expect to see impact]

**STEDII Check:**

- [x] Sensitive - [why]
- [x] Timely - [why]
- [x] Easy to understand - [why]
- [x] Directional - [up/down = good]
- [x] Implementable - [data source]
- [x] Independent - [controls for]

## Guardrail Metrics

| Metric     | Acceptable Range | Why It Matters     |
| ---------- | ---------------- | ------------------ |
| [Metric 1] | [range]          | [protects against] |
| [Metric 2] | [range]          | [protects against] |

## Kill Criteria

If any of these occur, roll back immediately:

- [Metric] drops below [threshold]
- [Metric] increases above [threshold]
- [Qualitative signal] occurs

## Measurement Plan

- **Data source:** [where data comes from]
- **Tracking:** [how it's implemented]
- **Dashboard:** [where to monitor]
- **Review cadence:** [how often to check]
```

---

## Common Metric Pairs

| Feature Type | Primary Metric      | Common Guardrails    |
| ------------ | ------------------- | -------------------- |
| Growth       | Signups, Activation | Retention, Quality   |
| Engagement   | DAU, Sessions       | Load time, Errors    |
| Revenue      | Conversion, ARPU    | Refunds, Churn       |
| Retention    | D7/D30 retention    | NPS, Support tickets |
| Efficiency   | Task completion     | Time on task, Errors |

---

## Output Integration

### Where Files Go

**Feature metrics definitions:**

- Active work: Add to the PRD in the `Strategic Fit` section
- When finalized: Reference in `/experiment-decision` for A/B testing approach
- Archive: Store final metrics in `thoughts/shared/pm/metrics/[feature-name]-baseline.md` for historical reference

### Link to Other Work

After defining metrics:

- **Reference in PRDs** - "Success is defined as [primary metric] reaching [target] based on STEDII framework"
- **Use in experiments** - The feature's primary metric becomes the primary metric in `/experiment-decision`
- **Track progress** - Monitor against baseline in weekly status updates
- **Feed retention analysis** - If tracking retention, pass metric definitions to `/retention-analysis`

### Cross-Skill Integration

**Feeds into:**

- `/experiment-decision` - Primary metric determines test design and duration
- `/feature-results` - Use these metrics to measure actual impact post-launch
- `/impact-sizing` - Use guardrails to validate usage estimates
- `/metrics-framework` - This metric may become a leading indicator for North Star

**Pulls from:**

- `/define-north-star` - Ensure primary metric ladders up to North Star
- `/impact-sizing` - Usage estimates inform which metrics can detect changes
- [[business-info-template]] - Company metrics and baselines

---

## Tips

- **One primary metric** - Multiple "primary" metrics = no primary metric
- **Guardrails are not goals** - You're not trying to improve them, just protect them
- **Leading > Lagging** - Measure what you can act on quickly
- **Avoid vanity metrics** - Page views don't matter if nobody converts
- **Baseline matters** - Know your current numbers before running an experiment
- **Time to signal** - Faster metrics (hours/days) beat slow metrics (months)

---

## Output Quality Self-Check

Before presenting output to the PM, verify:

- [ ] **File saved to correct location:** Output saved to `thoughts/shared/pm/metrics/feature-metrics-[feature-name]-[date].md`
- [ ] **Context routing table was checked:** Reviewed `thoughts/shared/pm/prds/` for feature context, `thoughts/shared/pm/context/business-info-template.md` for North Star metric, and `thoughts/shared/pm/metrics/` for existing dashboards and baselines
- [ ] **Metrics pass STEDII framework:** Each proposed metric is evaluated against all 6 STEDII dimensions (Sensitive, Timely, Easy to understand, Directional, Implementable, Independent) with pass/fail reasoning
- [ ] **Primary metric has baseline and target:** The primary metric includes a current baseline number and a specific target value with timeline (not "improve" or "increase")
- [ ] **Guardrail metrics defined:** At least 1 guardrail metric is specified with an acceptable range and explanation of what it protects against
- [ ] **Metrics ladder to North Star:** The output explicitly shows how the primary metric connects upward to the company's North Star metric from [[business-info-template]]
- [ ] **Data source identified for each metric:** Every metric names where the data comes from (e.g., "PostHog event: task_created" or "database query on users table")
- [ ] **Metric sensitivity estimated:** The output addresses whether the expected feature impact is large enough for the metric to detect, given current variance and traffic
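
The sensitivity check above (and the "S - Sensitive" criterion) can be made concrete with a rough sample-size estimate. The sketch below is one possible approach, not a method this skill prescribes: it assumes the primary metric is a conversion-style rate compared across two equally sized arms with a two-sided z-test, and it uses the conventional defaults of 5% significance and 80% power. The function names `required_sample_per_arm` and `days_to_signal` are hypothetical, and the numbers in the usage lines are illustrative only.

```python
import math
from statistics import NormalDist


def required_sample_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift in a rate.

    Assumes a two-sided z-test comparing two proportions with equal arm sizes.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # summed Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_signal(n_per_arm, daily_users, arms=2):
    """Days of traffic needed to fill every arm, assuming all traffic is enrolled."""
    return math.ceil(n_per_arm * arms / daily_users)


# Illustrative numbers only: 10% baseline conversion, hoping for a 10% relative lift.
n = required_sample_per_arm(baseline_rate=0.10, relative_lift=0.10)
print(f"Users per arm: {n}")
print(f"Days at 2,000 eligible users/day: {days_to_signal(n, daily_users=2000)}")
```

If the resulting duration exceeds the experiment window from the clarifying questions, the metric likely fails both the Sensitive and Timely checks, and a faster-moving leading indicator may be a better primary metric.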