---
name: visual-verdict
description: Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.
---
# Visual Verdict — Screenshot QA Skill
Human eyes are not pixel-precise, so quick visual checks miss bugs. This skill provides a structured, scored visual review based on screenshots and returns actionable verdicts rather than vague impressions.
## When to Activate
- After a frontend-dev agent implements a UI component or page
- When cloning an external website (pairs with clone-website skill)
- After CSS changes to verify no visual regressions
- When a designer provides a Figma export or reference screenshot
- Before marking any UI task as complete
## How It Works
```
1. Take screenshot of current implementation
2. Load reference (design mockup, Figma export, or previous version)
3. Score against 6 dimensions (0-100 each)
4. Compute weighted total score
5. Issue verdict: PASS (90+) / REVISE (60-89) / FAIL (<60)
6. List concrete mismatches with file + line fix hints
7. Loop: frontend-dev fixes → new screenshot → rescore → repeat until PASS
```
## Prerequisites
- browser-use MCP installed and active (configured in `~/.mcp.json`)
- Reference image available (Figma PNG export, screenshot, or URL)
- Implementation running (local dev server or deployed URL)
## Scoring Dimensions
### Dimension Weights
| Dimension | Weight | Description |
|-----------|--------|-------------|
| Layout accuracy | 0.25 | Positioning, spacing, alignment of all elements |
| Typography | 0.15 | Font family, size, weight, line-height, letter-spacing |
| Color accuracy | 0.15 | Background, text, border, shadow, gradient colors |
| Responsive behavior | 0.15 | Breakpoint transitions, mobile/tablet rendering |
| Interactive states | 0.15 | Hover, focus, active, disabled, loading states |
| Content completeness | 0.15 | All text, images, icons, labels present |
### Weighted Score Formula
```
total = (layout * 0.25) + (typography * 0.15) + (color * 0.15)
+ (responsive * 0.15) + (interactive * 0.15) + (content * 0.15)
```
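
As a minimal sketch of the formula above, the weighted total can be computed as follows. The names `DIMENSION_WEIGHTS` and `weightedTotal` are hypothetical and only illustrate the arithmetic; the weights are copied from the Dimension Weights table.

```javascript
// Weights copied from the Dimension Weights table; they sum to 1.0.
const DIMENSION_WEIGHTS = {
  layout: 0.25,
  typography: 0.15,
  color: 0.15,
  responsive: 0.15,
  interactive: 0.15,
  content: 0.15,
};

// Hypothetical helper: returns the weighted total for a set of 0-100 scores.
function weightedTotal(scores) {
  let total = 0;
  for (const [dimension, weight] of Object.entries(DIMENSION_WEIGHTS)) {
    const score = scores[dimension];
    if (!Number.isFinite(score) || score < 0 || score > 100) {
      throw new RangeError(`Score for "${dimension}" must be a number in 0-100`);
    }
    total += score * weight;
  }
  return total;
}

// Illustrative dimension scores, not measured results.
const exampleTotal = weightedTotal({
  layout: 92,
  typography: 68,
  color: 96,
  responsive: 80,
  interactive: 82,
  content: 91,
});
console.log(exampleTotal.toFixed(2));
```

Because layout carries the largest weight, a 10-point layout drop costs 2.5 points of total, while the same drop in any other dimension costs 1.5.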
### Scoring Each Dimension (0-100)
**Layout accuracy**
- All elements exactly in position, spacing matches: 90-100
- Minor spacing deviation (< 4px): 75-89
- Noticeable misalignment (4-16px): 50-74
- Layout clearly wrong (columns swapped, elements missing): 0-49
**Typography**
- Font, size, weight, line-height all match: 90-100
- One attribute off (wrong weight only): 70-89
- Two attributes off: 50-69
- Wrong font family entirely: 0-49
**Color accuracy**
- All colors within 5% of reference: 90-100
- One element color clearly off: 65-89
- Multiple elements wrong color: 40-64
- Dark/light mode inverted: 0-39
**Responsive behavior**
- Tested at 3 breakpoints, all correct: 90-100
- One breakpoint breaks layout: 65-89
- Two breakpoints break: 40-64
- Not responsive at all: 0-39
**Interactive states**
- All states implemented and visible: 90-100
- Missing one minor state (loading): 70-89
- Missing hover or focus: 50-69
- No interactive states at all: 0-49
**Content completeness**
- All text, images, icons present: 90-100
- One non-critical item missing: 75-89
- One critical item missing (CTA, heading): 50-74
- Multiple items missing: 0-49
## Verdict Thresholds
| Verdict | Score | Meaning |
|---------|-------|---------|
| PASS | 90-100 | Acceptable for production |
| REVISE | 60-89 | Needs fixes, not shippable |
| FAIL | 0-59 | Major rework required |
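
The thresholds in the table can be expressed as a small lookup. This is a sketch under one assumption that the table does not settle: fractional totals are compared unrounded, so 89.6 is still REVISE. The function name `verdictFor` is hypothetical.

```javascript
// Hypothetical helper mapping a weighted total to a verdict.
// Assumption: the unrounded total is compared, so 89.6 is still REVISE.
function verdictFor(total) {
  if (!Number.isFinite(total) || total < 0 || total > 100) {
    throw new RangeError('total must be a number in 0-100');
  }
  if (total >= 90) return 'PASS';
  if (total >= 60) return 'REVISE';
  return 'FAIL';
}

console.log(verdictFor(85.6)); // REVISE
console.log(verdictFor(90));   // PASS
console.log(verdictFor(59.9)); // FAIL
```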
## Verdict Output Format
```markdown
## Visual Verdict: [Component/Page Name]
**Score: 85/100 — REVISE**
**Attempt: 2/5**
**Previous score: 72 (+13 this iteration)**
### Dimension Scores
| Dimension | Score | Weight | Weighted | Issues |
|-----------|-------|--------|----------|--------|
| Layout | 92 | 0.25 | 23.0 | Minor padding on mobile |
| Typography | 68 | 0.15 | 10.2 | Wrong font-weight on H2 |
| Color | 96 | 0.15 | 14.4 | - |
| Responsive | 80 | 0.15 | 12.0 | Card grid breaks at 768px |
| Interactive | 82 | 0.15 | 12.3 | Missing hover on CTA |
| Content | 91 | 0.15 | 13.7 | - |
| **TOTAL** | | | **85.6** | |
### Concrete Mismatches (prioritized by impact)
1. **[Typography / HIGH]** H2 hero heading
Expected: `font-weight: 700`
Actual: `font-weight: 400` (normal weight)
Fix: `src/components/Hero.tsx` → add `font-bold` class to `<h2>`
2. **[Responsive / MEDIUM]** Card grid layout
Expected: 3-column grid collapses at 1024px
Actual: Collapses at 768px — causes overflow between 768-1024
Fix: `src/components/CardGrid.tsx` → change `md:grid-cols-3` to `lg:grid-cols-3`
3. **[Interactive / MEDIUM]** Primary CTA button
Expected: Darker background on hover (`bg-blue-700`)
Actual: No hover state change
Fix: `src/components/Hero.tsx` → add `hover:bg-blue-700 transition-colors` to CTA
4. **[Layout / LOW]** Mobile padding
Expected: 16px horizontal padding on mobile
Actual: 12px
Fix: `src/components/Layout.tsx` → change `px-3` to `px-4`
### Next Steps
Fix issues 1-3 above, then call visual-verdict again.
Estimated score after fixes: ~94 (PASS)
```
## Screenshot Workflow with browser-use MCP
### Taking the Current Implementation Screenshot
```
Use browser-use MCP:
1. Navigate to implementation URL (e.g., http://localhost:3000/dashboard)
2. Wait for page fully loaded (no spinners)
3. Take full-page screenshot
4. Save to /tmp/verdict-current-{timestamp}.png
```
### Loading the Reference
Three reference types are accepted:
**Figma PNG export**: Designer exports component/page as PNG at 2x
```
Reference path: ~/Desktop/designs/dashboard-v2.png
```
**Previous production screenshot**: Regression check
```
Reference path: ~/.claude/benchmarks/screenshots/dashboard-prod.png
```
**Live URL comparison**: Compare two running versions
```
Reference URL A: http://localhost:3000/dashboard (current)
Reference URL B: https://staging.app.com/dashboard (baseline)
```
### Responsive Testing Breakpoints
Always test at these three widths unless told otherwise:
| Name | Width | Target |
|------|-------|--------|
| Mobile | 375px | iPhone SE |
| Tablet | 768px | iPad portrait |
| Desktop | 1440px | Standard laptop |
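
One way to organize the three captures is to build a list of screenshot jobs before driving the browser. The sketch below only plans the work and does not call browser-use MCP. The helper name `buildScreenshotPlan` and the per-breakpoint file suffix are assumptions; the skill itself only specifies `/tmp/verdict-current-{timestamp}.png` for a single screenshot.

```javascript
// Widths copied from the breakpoint table above.
const BREAKPOINTS = [
  { name: 'mobile', width: 375 },
  { name: 'tablet', width: 768 },
  { name: 'desktop', width: 1440 },
];

// Hypothetical helper: one screenshot job per breakpoint.
// Assumption: the breakpoint name is appended to the file name so the
// three captures do not overwrite each other.
function buildScreenshotPlan(url, timestamp) {
  return BREAKPOINTS.map(({ name, width }) => ({
    url,
    name,
    width,
    path: `/tmp/verdict-current-${timestamp}-${name}.png`,
  }));
}

const plan = buildScreenshotPlan('http://localhost:3000/dashboard', Date.now());
for (const job of plan) {
  console.log(`${job.name} @ ${job.width}px -> ${job.path}`);
}
```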
## Integration with frontend-dev Agent
The typical iteration loop:
```
frontend-dev implements component
→ visual-verdict takes screenshot + scores
→ REVISE: sends concrete fix list to frontend-dev
→ frontend-dev applies fixes
→ visual-verdict rescores
→ Repeat until PASS (max 5 iterations)
→ PASS: task marked complete
```
### Iteration Limit
Maximum 5 iterations per component. If PASS is not reached after 5 attempts:
```
ESCALATION: Visual QA failed after 5 iterations
Component: [name]
Best score achieved: [score]
Blocker: [specific issue that keeps failing]
Recommendation: [designer clarification needed / fundamentally wrong approach]
```
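
The loop and its escalation rule can be sketched as a small driver. Everything here is illustrative: `runVerdictLoop`, `scoreFn`, and `fixFn` are hypothetical stand-ins for a visual-verdict run and a frontend-dev fix pass, and the sketch is synchronous for brevity even though real runs would be asynchronous. It also assumes REVISE and FAIL are handled the same way inside the loop, which the skill does not state.

```javascript
const MAX_ITERATIONS = 5;
const PASS_THRESHOLD = 90;

// Hypothetical driver. scoreFn(attempt) must return { score, mismatches };
// fixFn(mismatches) stands in for frontend-dev applying the fix list.
function runVerdictLoop(component, scoreFn, fixFn) {
  let bestScore = -Infinity;
  let lastMismatches = [];
  for (let attempt = 1; attempt <= MAX_ITERATIONS; attempt++) {
    const { score, mismatches } = scoreFn(attempt);
    bestScore = Math.max(bestScore, score);
    lastMismatches = mismatches;
    if (score >= PASS_THRESHOLD) {
      return { status: 'PASS', component, attempts: attempt, score };
    }
    // Not a PASS: hand over the fix list unless this was the final attempt.
    if (attempt < MAX_ITERATIONS) fixFn(mismatches);
  }
  // Five attempts without a PASS: report what the escalation template needs.
  return {
    status: 'ESCALATION',
    component,
    attempts: MAX_ITERATIONS,
    bestScore,
    blocker: lastMismatches[0] ?? 'unknown',
  };
}

// Simulated run with made-up scores that reach PASS on the third attempt.
const simulatedScores = [72, 85.6, 94];
const loopResult = runVerdictLoop(
  'Hero',
  (attempt) => ({ score: simulatedScores[attempt - 1], mismatches: ['H2 font-weight'] }),
  () => {},
);
console.log(loopResult.status, loopResult.attempts); // PASS 3
```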
## Integration with clone-website Skill
When cloning an external site, visual-verdict is the quality gate:
```
1. clone-website skill produces initial implementation
2. visual-verdict screenshots both original and clone
3. Scores similarity across all 6 dimensions
4. REVISE verdict triggers targeted fixes
5. PASS verdict confirms clone is production-ready
```
## Color Comparison Method
Colors are compared by extracting the computed CSS values and calculating the perceptual difference (Delta E) between reference and actual:
```
Reference color: #1D4ED8 (rgb 29, 78, 216)
Actual color: #2563EB (rgb 37, 99, 235)
Delta E (perceived): 6.2 — noticeable, score -15
```
Delta E thresholds:
- Delta E < 2: Imperceptible — full score
- Delta E 2-5: Barely noticeable — minor deduction
- Delta E 5-10: Noticeable — moderate deduction
- Delta E > 10: Clearly wrong — major deduction
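
The skill does not say which Delta E formula it uses. As a sketch, the block below uses CIE76 (Euclidean distance in CIELAB), the simplest variant, so its absolute values can differ from the figure in the example above. The helper names are hypothetical, only 6-digit hex colors are handled, and the treatment of values that land exactly on a threshold is an assumption.

```javascript
// 6-digit sRGB hex -> CIELAB (D65 white point), standard conversion.
function hexToLab(hex) {
  const n = parseInt(hex.replace('#', ''), 16);
  const [r, g, b] = [(n >> 16) & 255, (n >> 8) & 255, n & 255].map((v) => {
    const c = v / 255;
    // Undo the sRGB transfer curve to get linear light.
    return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  // Linear RGB -> XYZ, normalised by the D65 reference white.
  const x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047;
  const y = 0.2126729 * r + 0.7151522 * g + 0.072175 * b;
  const z = (0.0193339 * r + 0.119192 * g + 0.9503041 * b) / 1.08883;
  const f = (t) => (t > 216 / 24389 ? Math.cbrt(t) : ((24389 / 27) * t + 16) / 116);
  const [fx, fy, fz] = [f(x), f(y), f(z)];
  return { L: 116 * fy - 16, a: 500 * (fx - fy), b: 200 * (fy - fz) };
}

// CIE76 Delta E: straight-line distance between two Lab colors.
function deltaE76(hexA, hexB) {
  const p = hexToLab(hexA);
  const q = hexToLab(hexB);
  return Math.hypot(p.L - q.L, p.a - q.a, p.b - q.b);
}

// Bands from the threshold list; exact boundary values fall in the lower band (assumption).
function classifyDeltaE(dE) {
  if (dE < 2) return 'imperceptible';
  if (dE <= 5) return 'barely noticeable';
  if (dE <= 10) return 'noticeable';
  return 'clearly wrong';
}

const dE = deltaE76('#1D4ED8', '#2563EB');
console.log(dE.toFixed(1), classifyDeltaE(dE));
```

CIEDE2000 tracks human perception more closely than CIE76, at the cost of a longer formula; the thresholds above may need recalibrating if the formula changes.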
## Typography Extraction
Typography is checked by reading computed styles, not pixels:
```javascript
// Extract from the running implementation
const element = document.querySelector('.hero-title')
const styles = window.getComputedStyle(element)
// getComputedStyle returns resolved values, so lineHeight and letterSpacing
// come back as "normal" or in px, not as unitless numbers or em
const typography = {
  fontFamily: styles.fontFamily,       // Compare against reference
  fontSize: styles.fontSize,           // "32px" vs expected "36px"
  fontWeight: styles.fontWeight,       // "400" vs expected "700"
  lineHeight: styles.lineHeight,       // "48px" (1.5 × 32px) vs expected "45px" (1.25 × 36px)
  letterSpacing: styles.letterSpacing  // "normal" vs expected "-0.72px" (-0.02em × 36px)
}
```
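
Once the computed styles are extracted, they can be compared against the expected values and mapped onto the Typography rubric. The sketch below is one possible approach: `diffTypography` and `typographyBand` are hypothetical helpers, the comparison is a plain string match, and treating three or more mismatched attributes like two is an assumption because the rubric only defines bands up to two.

```javascript
const TYPOGRAPHY_ATTRIBUTES = ['fontFamily', 'fontSize', 'fontWeight', 'lineHeight', 'letterSpacing'];

// Hypothetical helper: lists the attributes whose computed values differ.
function diffTypography(actual, expected) {
  return TYPOGRAPHY_ATTRIBUTES
    .filter((attribute) => actual[attribute] !== expected[attribute])
    .map((attribute) => ({
      attribute,
      expected: expected[attribute],
      actual: actual[attribute],
    }));
}

// Maps the mismatch list onto the Typography score bands.
function typographyBand(mismatches) {
  if (mismatches.some((m) => m.attribute === 'fontFamily')) return '0-49';
  if (mismatches.length === 0) return '90-100';
  if (mismatches.length === 1) return '70-89';
  return '50-69'; // Two or more attributes off (assumption beyond two).
}

// Illustrative values, as getComputedStyle would report them (px, not em).
const expectedStyles = {
  fontFamily: 'Inter, sans-serif',
  fontSize: '36px',
  fontWeight: '700',
  lineHeight: '45px',
  letterSpacing: '-0.72px',
};
const actualStyles = { ...expectedStyles, fontWeight: '400' };

const typographyMismatches = diffTypography(actualStyles, expectedStyles);
console.log(typographyMismatches, typographyBand(typographyMismatches));
```

A strict string match is deliberately conservative; browsers may serialize font families with different quoting, so a real check would likely normalize values first.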
## Common Visual Bugs Caught
- Font weight 400 instead of 700 (bold not applying)
- Tailwind class not included in safelist (purged in build)
- Media query breakpoint uses px when rem expected
- Hover state not added after copy-paste from design
- Icon imported but wrong size prop
- Color defined in CSS variable but variable not set in theme
- Padding/margin using `px-3` when design shows `px-4`
- Z-index issue hiding interactive element
- Border-radius value wrong (rounded vs rounded-lg vs rounded-full)
- Line-clamp not applied, text overflows on mobile
## Saving Screenshots for Regression History
After a PASS verdict, save the screenshot as the new baseline:
```bash
cp /tmp/verdict-current-{timestamp}.png \
~/.claude/benchmarks/screenshots/{component-name}-{date}.png
```
On the next change, this saved screenshot becomes the reference for regression comparison.
---
**Remember**: A visual that looks "pretty close" at a quick glance often has 3-5 compounding issues that add up to a broken experience. Score it. Trust the number, not the impression.