---
name: ai-generated-ut-code-review
description: Use when reviewing or scoring AI-generated unit tests/UT code, especially when coverage, assertion effectiveness, or test quality is in question and a numeric score, risk level, or must-fix checklist is needed
---

# AI UT Code Review

## Overview
Review AI-generated unit tests for effectiveness, coverage, assertions, negative cases, determinism, and maintainability. Output a 0-10 score, a risk level, and a must-fix checklist. Overall line coverage **must be >= 80%**; otherwise risk is at least High.

## When to Use
- AI-generated UT/test code review or quality evaluation
- Need scoring, risk level, or must-fix checklist
- Questions about coverage or assertion validity

## Workflow
1. Confirm tests target the intended business code and key paths.
2. Check overall line coverage (>= 80% required).
3. Inspect assertions for behavioral validity; flag missing/ineffective assertions.
4. Verify negative/edge cases and determinism (no env/time dependency).
5. Score by rubric, assign risk, list must-fix items with evidence.

## Scoring (0-10)
Each of the five dimensions scores 0-2 points; the total score is their sum.

| Dimension | 0 | 1 | 2 |
| --- | --- | --- | --- |
| Coverage | < 80% | 80%+ but shallow | 80%+ and meaningful |
| Assertion Quality | No/invalid assertions | Some weak assertions | Behavior-anchored assertions |
| Negative & Edge | Missing | Partial | Comprehensive |
| Data & Isolation | Flaky/env-dependent | Mixed | Deterministic, isolated |
| Maintainability | Hard to read/modify | Mixed quality | Clear structure & naming |
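
The rubric arithmetic can be sketched in a few lines. This is a minimal illustration in Python, not part of the skill itself: the names `RubricScores` and `coverage_score` are hypothetical, and whether coverage counts as "meaningful" remains a reviewer judgment passed in as a flag.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScores:
    """One 0-2 score per rubric dimension (field names are illustrative)."""
    coverage: int
    assertion_quality: int
    negative_and_edge: int
    data_and_isolation: int
    maintainability: int

    def __post_init__(self) -> None:
        # Reject anything outside the 0-2 range used by the table.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1, or 2, got {value!r}")

    @property
    def total(self) -> int:
        # The total is the plain sum of the five dimensions (maximum 10).
        return sum(getattr(self, f.name) for f in fields(self))


def coverage_score(line_coverage_pct: float, meaningful: bool) -> int:
    """Coverage row: < 80% -> 0, 80%+ but shallow -> 1, 80%+ and meaningful -> 2."""
    if line_coverage_pct < 80:
        return 0
    return 2 if meaningful else 1
```

For instance, a suite at 62% line coverage scores 0 on Coverage regardless of how good the covered tests are, which caps the total at 8.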

## Risk Levels
- **Blocker**: Either coverage < 80% with key paths untested, or tests have no meaningful assertions
- **High**: Coverage < 80% OR assertions largely ineffective
- **Medium**: Coverage OK but weak edge cases or fragile design
- **Low**: Minor improvements
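
One way to read the levels above is as an ordered decision list, checked from most to least severe. The sketch below assumes the Blocker rule means "(coverage < 80% and key paths untested) or no meaningful assertions"; the function name and boolean parameters are hypothetical, and each flag is a reviewer judgment rather than something computed automatically.

```python
def risk_level(
    line_coverage_pct: float,
    key_paths_untested: bool,
    no_meaningful_assertions: bool,
    assertions_largely_ineffective: bool,
    weak_edges_or_fragile_design: bool,
) -> str:
    """Return the first (most severe) risk level whose condition matches."""
    low_coverage = line_coverage_pct < 80
    if no_meaningful_assertions or (low_coverage and key_paths_untested):
        return "Blocker"
    if low_coverage or assertions_largely_ineffective:
        return "High"
    if weak_edges_or_fragile_design:
        return "Medium"
    return "Low"
```

Checking the most severe condition first guarantees that coverage below 80% can never yield less than High.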

## Must-Fix Checklist
- Overall line coverage >= 80%
- Each test has at least one behavior-relevant assertion
- Negative/exception cases exist for core logic
- Tests are deterministic and repeatable

## AI-Generated Test Pitfalls (Check Explicitly)
- No assertions or assertions unrelated to behavior (e.g., only not-null)
- Over-mocking hides real behavior
- Only happy-path coverage
- Tests depend on time/network/env
- Missing verification of side effects

## Output Format (Required, Semi-fixed)
- `Score`: x/10 (Coverage x, Assertion Quality x, Negative & Edge x, Data & Isolation x, Maintainability x)
- `Risk`: Low/Medium/High/Blocker (one-line reason)
- `Must-fix`:
  - [action + evidence]
  - [action + evidence]
- `Key Evidence`:
  - Cite specific test case names or a coverage report summary (1-2 items)
- `Notes`:
  - Minimal fix suggestion or alternative approach (1-2 lines)

**Rules:**
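- Coverage < 80%: risk is at least High, and the gap must be listed under `Must-fix`
- Missing or ineffective assertions raise the risk level directly and must be listed under `Must-fix`
- Provide at least 2 pieces of evidence; if evidence is insufficient, say so and lower the score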
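
A review can be sanity-checked against the coverage and evidence rules before it is sent. The sketch below is one possible self-check, with hypothetical names throughout; matching the word "coverage" in a must-fix item is a crude heuristic chosen for illustration, and the assertion rule is left out because it depends on reviewer judgment.

```python
RISK_ORDER = ["Low", "Medium", "High", "Blocker"]


def rule_violations(
    line_coverage_pct: float,
    risk: str,
    must_fix: list[str],
    evidence: list[str],
) -> list[str]:
    """Return human-readable rule violations; an empty list means the review passes."""
    problems = []
    if line_coverage_pct < 80:
        if RISK_ORDER.index(risk) < RISK_ORDER.index("High"):
            problems.append("coverage < 80% requires risk of at least High")
        # Heuristic: assume a coverage must-fix item mentions the word "coverage".
        if not any("coverage" in item.lower() for item in must_fix):
            problems.append("coverage < 80% must appear in Must-fix")
    if len(evidence) < 2:
        problems.append("fewer than 2 pieces of evidence; explain and lower the score")
    return problems
```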

## Common Mistakes
- Reporting coverage only, without evaluating assertion effectiveness
- Treating log output as an assertion
- Ignoring failure/exception paths
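
The last two mistakes are easy to spot once seen side by side. In this sketch, `parse_config` is a hypothetical Python stand-in for code under test, not an API from any real library; the first test shows logging mistaken for verification, and the remaining tests show what a reviewer should expect instead.

```python
import json
import unittest


def parse_config(text: str) -> dict:
    """Hypothetical code under test: accepts only a JSON object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    return data


class ParseConfigTest(unittest.TestCase):
    def test_logging_is_not_an_assertion(self):
        # Mistake: printing verifies nothing; this test fails only if parse_config raises.
        print(parse_config('{"port": 8080}'))

    def test_returns_parsed_values(self):
        self.assertEqual(parse_config('{"port": 8080}'), {"port": 8080})

    def test_rejects_malformed_json(self):
        # Failure path: json.JSONDecodeError is a subclass of ValueError.
        with self.assertRaises(ValueError):
            parse_config("{not json")

    def test_rejects_non_object(self):
        # Failure path: valid JSON that is not an object.
        with self.assertRaises(ValueError):
            parse_config("[1, 2, 3]")
```

A review of this file would flag the first test under Assertion Quality and credit the last two under Negative & Edge.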

## Example (Concise)
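- `Score`: 4/10 (Coverage 0, Assertion Quality 0, Negative & Edge 1, Data & Isolation 2, Maintainability 1)
- `Risk`: High (line coverage is 62% and the `parseConfig()` tests assert no behavior)
- `Must-fix`:
  - Raise overall line coverage to >= 80% (coverage report shows 62% of lines)
  - Add behavior assertions to the `parseConfig()` tests (they currently only log)
  - Add negative cases for malformed input
- `Key Evidence`:
  - `parseConfig()` tests only assert no crash
  - Coverage report shows 62% of lines covered
- `Notes`:
  - Add assertions on outputs and side effects; add invalid-input tests.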