--- name: adversarial-review description: Devil's Advocate stress-testing for code, architecture, PRs, and decisions. Surfaces hidden flaws through structured adversarial analysis with metacognitive depth. Use for high-stakes review, stress-testing choices, or when the user wants problems found deliberately. NOT for routine code review (use engineering:code-review). Triggers on "스트레스 테스트", "stress test", "devil's advocate", "반론", "이거 괜찮아", "문제 없을까", "깊은 리뷰", "critical review", "adversarial". argument-hint: "[code/decision/PR to stress-test]" allowed-tools: Read, Grep, Glob, Bash, Agent --- # Adversarial Review Structured Devil's Advocate analysis that surfaces hidden flaws, edge cases, and blind spots. ## Rules (Absolute) 1. **Default to finding problems.** Conduct rigorous analysis across all three vectors. Report every genuine issue found — do not downplay or omit real concerns. If thorough analysis yields fewer than 3 issues, that is a legitimate outcome indicating strong work. Never inflate minor observations to fill a quota, and never fabricate concerns. 2. **Attack the strongest points.** Don't waste time on trivial issues. Target the parts the author is most confident about — that's where hidden assumptions live. 3. **Separate severity levels.** Not all issues are equal. Clearly distinguish critical from minor. 4. **Propose alternatives.** Every criticism must include a concrete alternative or mitigation. 5. **Steel-man first.** Before attacking, state the strongest version of why the current approach was chosen. This prevents straw-man critiques. 6. **No ad hominem.** Critique the work, not the author. Be sharp but constructive. ## Ambiguous Input Handling If the subject under review is unclear or too broad, **ask one clarifying question before proceeding.** Do not review a vague target. Examples of ambiguous input that should trigger a clarification question: - "Review my project" (which aspect? architecture? security? specific files?) - "Is this okay?" with no context (what is "this"?) - A topic so broad that a meaningful adversarial review would be unfocused One question. Get the answer. Then proceed. ## Process ### Phase 1: Steel-Man Before any criticism, articulate: - **Why was this approach chosen?** (Best possible justification) - **What does it optimize for?** (Performance? Simplicity? Time-to-market?) - **Under what conditions is this the right choice?** This ensures the subsequent critique is intellectually honest, not reflexive opposition. ### Phase 2: Adversarial Attack (3 Vectors) Apply three independent attack vectors simultaneously: #### Vector A: Logical Soundness **Scope: Does the REASONING hold?** Examine premises, conclusions, logical flow. - Are there logical contradictions or circular reasoning? - Are conclusions actually supported by the stated premises? - What unstated assumptions does the reasoning depend on? - Is there confirmation bias in the evidence selection? Do NOT examine implementation structure — that's Vector C. #### Vector B: Edge Case Assault **Scope: Does it SURVIVE reality?** Test against real-world conditions. - What happens at boundaries? (empty input, max load, concurrent access, zero state) - What's the failure mode? (graceful degradation vs. catastrophic failure) - What happens in 6 months? (scaling, maintenance burden, team changes) - What would a malicious actor exploit? Test behavior and outcomes, not internal structure. #### Vector C: Structural Integrity **Scope: Is the STRUCTURE sound?** Examine architecture and design. - Does each component have a single, clear responsibility? - Where are the coupling points and dependency chains? - What is the weakest structural link? - Which component, if changed, would cause the most cascading failures? Examine architecture, not logical reasoning. ### Phase 3: Severity Classification Classify every finding: | Severity | Symbol | Meaning | Action Required | |----------|--------|---------|-----------------| | Critical | `🔴` | Will cause production issues, security vulnerabilities, or data loss | Must fix before merge/deploy | | Major | `🟠` | Significant risk, performance issue, or maintainability problem | Should fix, blocking for merge | | Minor | `🟡` | Code smell, style issue, or small optimization opportunity | Consider fixing, non-blocking | | Note | `💡` | Observation, alternative approach, or future consideration | Informational only | ### Phase 4: Counter-Proposal For each Critical and Major finding, provide: 1. **What's wrong** (1-2 sentences) 2. **Why it matters** (concrete impact) 3. **Suggested fix** (code snippet or approach) 4. **Trade-off of the fix** (nothing is free — what does the fix cost?) ## Output Format ```markdown ## Adversarial Review: [Subject] ### Steel-Man > [Why this approach makes sense — strongest justification] ### Findings #### 🔴 Critical: [Title] **Vector:** [Logical Soundness / Edge Case / Structural Integrity] **What:** [Description] **Impact:** [Concrete consequence] **Fix:** [Proposed solution] **Trade-off:** [Cost of the fix] #### 🟠 Major: [Title] ... #### 🟡 Minor: [Title] ... #### 💡 Note: [Title] ... ### Summary | Severity | Count | |----------|-------| | 🔴 Critical | N | | 🟠 Major | N | | 🟡 Minor | N | | 💡 Note | N | ### Verdict [PASS / PASS WITH CONDITIONS / FAIL] - [If PASS WITH CONDITIONS: list required changes] - [If FAIL: list blocking issues] ### Verdict Criteria - **FAIL**: Any Critical finding with no viable short-term mitigation, OR 3+ Major findings - **PASS WITH CONDITIONS**: Any Critical finding with viable mitigation, OR 1-2 Major findings - **PASS**: No Critical findings, no Major findings. Minor and Notes only. These thresholds ensure consistent verdicts across invocations. ### Hidden Assumptions Exposed - [Assumption 1 that the current approach relies on] - [Assumption 2 that could invalidate the approach if wrong] ``` ## Quality Calibration ### BAD Adversarial Review (Don't Do This) ``` ## Adversarial Review: User Auth Module ### Steel-Man > It works. ### Findings #### 🟡 Minor: Variable naming **Vector:** Structural Integrity **What:** Some variables could be named better. **Impact:** Readability. **Fix:** Rename them. **Trade-off:** Time. #### 🟡 Minor: Could add more comments **Vector:** Structural Integrity **What:** Code could use more comments. **Impact:** Future developers might be confused. **Fix:** Add comments. **Trade-off:** None. #### 🟡 Minor: Consider using TypeScript **Vector:** Logical Soundness **What:** TypeScript would catch type errors. **Impact:** Fewer runtime bugs. **Fix:** Migrate to TypeScript. **Trade-off:** Migration effort. ### Verdict: PASS ``` **Why this is bad:** - Steel-man is lazy — no genuine engagement with design intent - All findings are shallow nitpicks, not substantive concerns - No Critical or Major issues even considered — no real stress-testing happened - Vectors are misapplied ("variable naming" is not Structural Integrity) - Fixes are vague ("rename them", "add comments") with no specifics - "Consider using TypeScript" is a preference, not a flaw found through analysis ### GOOD Adversarial Review (Do This) ``` ## Adversarial Review: User Auth Module ### Steel-Man > JWT-based stateless auth was chosen to avoid session storage overhead and > enable horizontal scaling. The 15-minute access token + 7-day refresh token > split balances security against UX friction. Using bcrypt with cost factor 12 > is a well-established choice for password hashing. This design optimizes for > scalability and simplicity in a microservices context. ### Findings #### 🔴 Critical: No refresh token rotation enables silent session hijacking **Vector:** Edge Case **What:** Refresh tokens are long-lived (7 days) and not rotated on use. A stolen refresh token grants persistent access for the full 7-day window with no detection mechanism. **Impact:** An attacker who intercepts one refresh token (via XSS, network sniffing, or device access) maintains access even after the user changes their password, since token revocation is not implemented. **Fix:** Implement refresh token rotation: issue a new refresh token on each refresh, invalidate the previous one, and maintain a token family chain to detect reuse (which indicates theft). **Trade-off:** Requires server-side storage for the token family chain, partially negating the "stateless" benefit. Adds ~50ms per refresh request. #### 🟠 Major: Rate limiting uses in-memory store, lost on restart **Vector:** Structural Integrity **What:** Login rate limiting uses a Map() that resets on process restart. **Impact:** An attacker can bypass rate limiting by timing attempts around deploys or crashes. In a multi-instance deployment, each instance has its own counter, effectively multiplying the allowed attempts by instance count. **Fix:** Move rate limit state to Redis with TTL-based expiry. **Trade-off:** Adds Redis as an infrastructure dependency for the auth service. ~2ms latency per rate limit check. ### Verdict: PASS WITH CONDITIONS - Must implement refresh token rotation before production deploy - Should migrate rate limiting to shared store before scaling to >1 instance ``` **Why this is good:** - Steel-man genuinely engages with the design rationale and trade-offs - Findings target real security risks, not style preferences - Each finding has specific, concrete impact (not "readability" or "confusion") - Fixes include implementation direction AND quantified trade-offs - Vectors are correctly applied and don't overlap - Verdict follows directly from the severity thresholds ## Specialized Modes ### Code Review Mode When reviewing code (files, PRs, diffs): - Read all changed files with the Read tool - Check for OWASP Top 10 vulnerabilities - Verify error handling completeness - Assess test coverage of edge cases - Review naming, structure, and abstraction levels ### Architecture Decision Mode When reviewing architecture/design decisions: - Evaluate scalability assumptions - Test with 10x and 100x current load mentally - Check for single points of failure - Assess vendor lock-in risks - Consider team capability alignment ### PR Review Mode When reviewing pull requests: - Focus on behavioral changes, not style - Check for breaking changes to public APIs - Verify backward compatibility - Assess rollback strategy - Check migration paths for data changes ## When to Use - Before merging any significant PR - Before committing to an architecture decision - When evaluating third-party dependencies - When someone says "this should be fine" - When stakes are high and mistakes are expensive - After completing implementation, before calling it done ## When NOT to Use - Trivial changes (typos, formatting) - When exploration is needed first (use `cross-verified-research`) - When generating alternatives (use `creativity-sampler`) - When you need neutral, exhaustive analysis without a verdict (use `deep-dive-analyzer` — it understands; this skill *challenges*) - Personal preferences or subjective design choices ## Integration Notes - **With creativity-sampler:** After adversarial review reveals problems, use creativity-sampler to generate alternative approaches - **With cross-verified-research:** Use research to verify claims made during review (e.g., "is this really a security risk?"). For a full-rigor workflow: `cross-verified-research` → `adversarial-review` - **With deep-dive-analyzer:** For understanding before challenging: `deep-dive-analyzer` (understand) → `adversarial-review` (challenge). This skill focuses on finding flaws; deep-dive focuses on neutral exhaustive analysis. - **With orchestrator strategy team:** Complements the strategy team's Devil's Advocate agent with structured methodology