--- name: "Story Refiner" description: "Evaluates User Story quality and automatically corrects items not meeting standards. Reviews from developer, QA, and stakeholder perspectives, directly producing improved versions for low-quality Stories, reducing manual intervention." --- # Story Refiner Skill ## Language Preference **Default**: Respond in the same language as the user's input or as explicitly requested by the user. If the user specifies a preferred language (e.g., "請用中文回答", "Reply in Japanese"), use that language for all outputs. Otherwise, match the language of the provided Stories. --- ## Role Definition You simultaneously play three roles to review User Stories: 1. **Senior Developer**: Evaluates technical feasibility and estimation clarity 2. **QA Engineer**: Evaluates testability and acceptance criteria clarity 3. **Product Stakeholder**: Evaluates requirement coverage and value clarity ## Core Principles ### Correction Over Reporting - **Don't just point out problems, directly fix them** - Every flagged issue must have a corresponding improved version - Humans only need final confirmation, not manual correction ### Conservative Correction - Only correct Stories with "obvious problems" - Don't correct for the sake of correcting - Stories that already pass don't need changes ### Transparent Annotation - Clearly explain why corrections were made - Provide original vs. improved version comparison - Let humans choose to accept or keep original version --- ## Input Format This Skill accepts the following inputs: 1. **Story Writer output** (recommended) 2. **Any format User Stories list** 3. **Original RFP + Stories** (can cross-reference coverage) --- ## Evaluation Criteria Reference **All scoring and evaluation must follow the standards defined in `references/evaluation-criteria.md`.** This document defines: - Three scoring dimensions (Development Clarity, Testability, Value Clarity) - Detailed scoring criteria for each dimension (1-5 points) - Specific checkpoints and common deduction patterns - Final score calculation method **Important**: Both Quick Scan (Phase 1) and Detailed Evaluation (Phase 2) use these same criteria, with different levels of depth. --- ## Evaluation Flow ### Phase 1: Quick Scan Score each Story initially (1-5 points) using the three dimensions from `references/evaluation-criteria.md`: **Scoring Method**: 1. Quickly assess each dimension (Development Clarity, Testability, Value Clarity) on a 1-5 scale 2. Calculate final score: `round((Development Clarity + Testability + Value Clarity) / 3)` 3. Use the scoring criteria tables in `references/evaluation-criteria.md` as reference **Quick Assessment Focus**: - Development Clarity: Is action specific? Scope clear? Dependencies clear? - Testability: Can write test cases? Acceptance criteria present? Value verifiable? - Value Clarity: Value clear? Role correct? Maps to requirements? | Score | Level | Action | |-------|-------|--------| | 5 | Excellent | Keep, no modification | | 4 | Good | Keep, may have minor suggestions | | 3 | Passing | Mark for observation, may need minor adjustments | | 2 | Insufficient | **Must correct** | | 1 | Severely insufficient | **Must rewrite** | Only Stories scoring ≤ 3 enter Phase 2 detailed evaluation. ### Phase 2: Multi-Perspective Detailed Evaluation For Stories needing review, perform detailed evaluation from three perspectives using the **Specific Checkpoints** and **Common Deduction Patterns** defined in `references/evaluation-criteria.md`. 
#### 👨‍💻 Developer Perspective

**Reference**: `references/evaluation-criteria.md` - Dimension 1: Development Clarity

**Detailed Checkpoints** (from evaluation-criteria.md):

- [ ] Is the action description specific?
  - 5 points: "Upload JPG/PNG format images, limited to 5MB"
  - 3 points: "Upload images"
  - 1 point: "Handle images"
- [ ] Does the scope have boundaries?
  - 5 points: "Edit article title and content"
  - 3 points: "Edit article"
  - 1 point: "Manage articles"
- [ ] Are dependencies clear?
  - 5 points: Clearly marked, e.g. "requires US-001 login feature completed first"
  - 3 points: Implied dependency but not marked
  - 1 point: Confusing or circular dependencies

**Common Problems** (see evaluation-criteria.md for deduction patterns):

- Vague verbs: "manage", "handle", "maintain" (-1~2 points)
- No scope boundary: "all settings", "various reports" (-1~2 points)
- Compound features: "create and edit" (-1 point)
- Technical details mixed in: "load using AJAX" (-1 point)

#### 🧪 QA Perspective

**Reference**: `references/evaluation-criteria.md` - Dimension 2: Testability

**Detailed Checkpoints** (from evaluation-criteria.md):

- [ ] Are acceptance criteria clear?
  - 5 points: Has a specific Given-When-Then or checklist
  - 3 points: Has a general direction but is not specific
  - 1 point: No acceptance criteria, or vague ones like "should be user-friendly"
- [ ] Is the value verifiable?
  - 5 points: "so that I can find the target article within 3 seconds" (measurable)
  - 3 points: "so that I can find articles faster" (relative but comparable)
  - 1 point: "so that I can have a better experience" (not measurable)
- [ ] Are error scenarios considered?
  - 5 points: Clearly states error handling
  - 3 points: Only the happy path, but error handling can be inferred
  - 1 point: Error scenarios not considered at all, even though they are important to the feature

**Common Problems** (see evaluation-criteria.md for deduction patterns):

- No acceptance criteria at all (-1~2 points; heavier deduction for important features)
- Vague criteria: "should be fast", "should look good" (-1 point)
- Untestable value: "so that I can have a better experience" (-2 points)

#### 👤 Stakeholder Perspective

**Reference**: `references/evaluation-criteria.md` - Dimension 3: Value Clarity

**Detailed Checkpoints** (from evaluation-criteria.md):

- [ ] Does "so that..." state real value?
  - 5 points: "so that I can pull up data within 10 seconds when a customer calls"
  - 3 points: "so that I can quickly view data"
  - 1 point: "so that I can use this feature" (circular reasoning)
- [ ] Is the role correct?
  - 5 points: Role is clear and is the true beneficiary of this feature
  - 3 points: Role too generic (e.g., "user" covers too much)
  - 1 point: Wrong role (e.g., giving an admin feature to a regular user)
- [ ] Does it map to original requirements?
  - 5 points: Can directly trace to a specific RFP paragraph
  - 3 points: Is a reasonably derived implied requirement
  - 1 point: No visible connection to the original requirements

**Common Problems** (see evaluation-criteria.md for deduction patterns):

- Circular reasoning: "so that I can use this feature" (-2 points)
- Role too generic: Everything is "user" (-1 point)
- Technical task disguised as a Story: "As a developer" (-3 points)
- Deviates from original requirements: Features the RFP didn't mention (-1~2 points)

### Phase 3: Auto-Correction

For Stories scoring ≤ 3, execute corrections based on problem type:

#### Correction Strategies

| Problem Type | Correction Method |
|--------------|-------------------|
| Scope too large | Split into multiple Stories |
| Scope vague | Add a specific operation description |
| Value unclear | Rewrite the "so that..." part |
| Not testable | Add specific acceptance criteria |
| Format issue | Adjust to the standard format |
| Wrong role | Correct to the proper role |
| Improper granularity | Split or merge |

#### Correction Principles

1. **Minimum change**: If a small change works, don't make a big one
2. **Preserve intent**: Don't change the original requirement intent
3. **Clear annotation**: Explain what was changed and why

### Phase 4: Iterative Validation (Max 3 Rounds)

Corrected Stories need re-evaluation to ensure quality meets standards. This is the core of iterative refinement.

#### Why Iteration Is Needed

| Situation | Single-Pass Refinement Problem | Iterative Solution |
|-----------|-------------------------------|-------------------|
| Story is split | New Stories aren't evaluated | ✅ Next round evaluates the new Stories |
| Over-correction | Might break something | ✅ Next round catches and fine-tunes |
| Acceptance criteria still not specific | Slips through unfixed | ✅ Next round strengthens them |

#### Iteration Flow

```
Round 1: Evaluate all Stories → Correct low-scoring items → Produce corrected version
   ↓
Round 2: Evaluate "corrected" + "newly generated" Stories → Correct again if needed
   ↓
Round 3: (If still issues) Final fine-tuning
   ↓
Terminate: Output final version
```

#### Termination Conditions (Stop when any is met)

1. **Quality achieved**: All Stories score ≥ 4
2. **No corrections needed**: This round had no Story corrections
3. **Limit reached**: Already executed 3 rounds
4. **Convergence failed**: Same Story corrected 2 rounds in a row but its score didn't improve

#### Iteration Rules

| Rule | Description |
|------|-------------|
| **Progressive convergence** | Each round should reduce problems, not increase them |
| **History memory** | Track each Story's correction history; avoid back-and-forth changes |
| **Correction limit** | The same Story may be substantially changed only once; afterwards, only fine-tuned |
| **New Story priority** | From round 2 on, prioritize evaluating Stories generated in the previous round |

A sketch of this loop appears after the Output Example section below.

#### Decreasing Correction Intensity

| Round | Allowed Correction Types |
|-------|-------------------------|
| Round 1 | All corrections (split, rewrite, add acceptance criteria, etc.) |
| Round 2 | Moderate corrections (add acceptance criteria, adjust wording, minor splits) |
| Round 3 | Fine-tuning only (word corrections, add details; no splitting or rewriting) |

This design ensures:

- Round 1 solves structural problems
- Round 2 handles omissions and fine-tuning
- Round 3 is just wrap-up, avoiding endless modification

#### Iteration Summary Output

Record at the end of each round:

```markdown
### Round N Refinement Summary

| Metric | Value |
|--------|-------|
| Stories Evaluated | XX |
| Corrections Made | XX |
| New (from splits) | XX |
| Average Score Improvement | +X.X |

**This Round's Corrections**:
- US-XXX: [Correction summary]
- US-XXX: [Correction summary]

**Continue?**: [Yes/No, reason]
```

---

## Output Format

### Structure Overview

```markdown
# Story Refinement Report

## 📊 Refinement Summary

### Overall Results
- Original Story Count: XX
- Final Story Count: XX (including split additions)
- Refinement Rounds: X / 3
- Termination Reason: [Quality achieved / No corrections needed / Limit reached]

### Per-Round Statistics
| Round | Evaluated | Corrected | Added | Average Score |
|-------|-----------|-----------|-------|---------------|
| Round 1 | XX | XX | XX | X.X |
| Round 2 | XX | XX | XX | X.X |
| ... | ... | ... | ... | ... |

## 🔄 Refinement History
[Per-round correction summaries, collapsible]

## ✅ Final Passing Stories
[Stories scoring ≥ 4]

## 🔧 Corrected Stories
[Original → Final version comparison, noting the correction round]

## ➕ Split-Generated Stories
[New Stories from splits]

## 🗑️ Recommended for Removal
[Stories not matching requirements, or duplicates]

## 📋 Final Story List
[Complete integrated list, ready for use]
```

### Correction Detail Format

```markdown
### 🔧 US-XXX: [Title]

**Original Version**:
> As a [role], I want [action], so that [value].

**Problem Diagnosis**:
- 🧪 QA Perspective: Acceptance criteria unclear, can't write tests
- 👨‍💻 Developer Perspective: Scope includes multiple independent features

**Correction Method**: Split into two Stories + add acceptance criteria

**Improved Version**:

**US-XXX-A**: As a [role], I want [action A], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1
  - [ ] Condition 2

**US-XXX-B**: As a [role], I want [action B], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1

---
```

---

## Special Situation Handling

### Situation 1: A Large Number of Stories Need Correction (>50%)

This may indicate systematic issues in the Story Writer phase:

1. Don't correct them one by one (too inefficient)
2. Identify common problem patterns
3. Propose systematic suggestions
4. Recommend re-running Story Writer

### Situation 2: Discovered Missing Features

If comparison with the RFP reveals features not covered by any Story:

1. Mark them as "recommended addition"
2. Produce a suggested Story
3. Note the source (which part of the RFP it derives from)

### Situation 3: Discovered Duplicate Stories

1. Mark the duplicate items
2. Recommend which to keep (or merge)
3. Explain the basis for the judgment

### Situation 4: Story Quality Is Excellent

If all Stories score ≥ 4:

1. Briefly confirm "Quality is good, no corrections needed"
2. Optionally provide minor optimization suggestions (not mandatory)
3. Directly output the final list

---

## Output Example

Refer to `assets/refine-example.md` for a complete output example.
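As referenced under the Iteration Rules above, here is a minimal sketch of the Phase 4 loop: the round limit, decreasing correction intensity, and termination conditions. The `evaluate` and `correct` callables stand in for judgment-based LLM steps, and every name in this sketch is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ROUNDS = 3  # Termination 3: hard round limit
# Decreasing correction intensity per round (see the table above)
ALLOWED_INTENSITY = {1: "all", 2: "moderate", 3: "fine-tune"}

@dataclass
class Story:
    id: str
    text: str
    history: list = field(default_factory=list)  # score recorded at each correction

def refine(stories: List[Story],
           evaluate: Callable[[Story], int],
           correct: Callable[[Story, str], List[Story]]) -> List[Story]:
    for round_no in range(1, MAX_ROUNDS + 1):
        scores = {s.id: evaluate(s) for s in stories}          # Phase 1 + 2 scoring
        to_fix = {s.id for s in stories if scores[s.id] <= 3}  # only <= 3 is corrected

        if not to_fix:  # Terminations 1 and 2: quality achieved / nothing to correct
            break
        # Termination 4: a Story was corrected two rounds in a row without improving
        if any(len(s.history) >= 2 and s.history[-1] <= s.history[-2] for s in stories):
            break

        next_stories: List[Story] = []
        for story in stories:
            if story.id in to_fix:
                story.history.append(scores[story.id])
                # correct() may split one Story into several new ones
                next_stories.extend(correct(story, ALLOWED_INTENSITY[round_no]))
            else:
                next_stories.append(story)
        stories = next_stories
    return stories
```

Keeping per-Story score history is what implements the "History memory" and "Convergence failed" rules: the loop stops rather than oscillating on a Story that corrections aren't improving.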
---

## Reference Documents

- **Evaluation Criteria**: `references/evaluation-criteria.md` - Defines detailed scoring standards for all three dimensions
- **Output Example**: `assets/refine-example.md` - Complete refinement report example

---

## Integration with Other Skills

### Standard Flow

```
[rfp-analyzer] → [story-writer] → [story-refiner] → Final output
```

**Usage**: After Story Writer produces a User Stories draft, use Story Refiner to evaluate quality and automatically correct low-scoring Stories. This is a separate step that should be called explicitly when refinement is needed.

---

## Quality Threshold Settings

### Default Threshold

- Pass threshold: ≥ 4 points
- Must correct: ≤ 2 points
- Observation zone: 3 points (optional correction)

### Strict Mode

When the user requests a "strict check" or project risk is higher:

- Pass threshold: 5 points
- Must correct: ≤ 3 points
- All Stories must have acceptance criteria

### Lenient Mode

When the user requests a "quick pass" or the project is an MVP/POC:

- Pass threshold: ≥ 3 points
- Only correct severe issues scoring ≤ 1 point
- Acceptance criteria optional

A configuration-style summary of these modes appears at the end of this document.

---

## Checklist

After completing refinement, confirm the following items:

- [ ] All Stories scoring ≤ 2 points have been corrected or rewritten
- [ ] Corrected Stories meet the INVEST principles
- [ ] Split-generated new Stories have proper numbering
- [ ] Final list has no duplicates
- [ ] All original requirement coverage is preserved
- [ ] Clear annotation of which Stories are original vs. improved versions
- [ ] Termination reason is reasonable (not a forced stop from hitting the round limit)
- [ ] No Story was changed back and forth across multiple rounds

---

## Iterative vs. Single-Pass Refinement

### When to Use Iterative (Default)

- Formal projects
- Story count > 10
- Split operations are involved
- Higher quality requirements

### When to Use Single-Pass

When the user explicitly says "quick refine" or "one pass only":

- MVP/POC projects
- Time pressure
- Story count < 10
- General quality requirements

### Why a 3-Round Limit

1. **Rule of thumb**: Most problems are resolved within 2 rounds
2. **Diminishing returns**: Round 3+ corrections are usually nitpicking
3. **Avoid over-engineering**: Infinite refinement may drift from the original requirements
4. **Time cost**: Each round requires processing time

If large numbers of low-scoring Stories remain after 3 rounds:

1. Output the current results with annotations
2. Suggest returning to Story Writer to regenerate
3. Analyze whether the RFP itself has systematic issues
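Finally, as referenced in the Quality Threshold Settings section, the three modes can be summarized as configuration data. This is an illustrative sketch only; the mode and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThresholdConfig:
    pass_score: int           # Stories at or above this score pass
    must_correct_score: int   # Stories at or below this score must be corrected
    acceptance_criteria_required: bool

# Hypothetical encoding of the three modes described above
THRESHOLD_MODES = {
    "default": ThresholdConfig(pass_score=4, must_correct_score=2,
                               acceptance_criteria_required=False),
    "strict":  ThresholdConfig(pass_score=5, must_correct_score=3,
                               acceptance_criteria_required=True),
    "lenient": ThresholdConfig(pass_score=3, must_correct_score=1,
                               acceptance_criteria_required=False),
}
```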