---
name: pentest-validation
description: "Orchestrate security finding validation through graduated exploitation. 4-phase pipeline: recon (SAST/DAST), analysis (code review), validation (exploit proof), report (No Exploit, No Report gate). Eliminates false positives by proving exploitability."
category: specialized-testing
priority: critical
tokenEstimate: 1500
agents: [qe-pentest-validator, qe-security-scanner, qe-security-reviewer, qe-security-auditor, qe-quality-gate]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2026-02-08
dependencies: [security-testing]
quick_reference_card: true
tags: [pentest, exploitation, security-validation, shannon, no-exploit-no-report, graduated-exploitation]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/pentest-validation.yaml
---

# Pentest Validation

When validating security findings:

1. REQUIRE explicit authorization for the target URL
2. SCAN with qe-security-scanner (SAST + dependency + secrets)
3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
6. UPDATE the exploit playbook with new patterns

**Quality Gates:**

- Authorization confirmed before ANY exploitation
- Target URL is staging/dev (NOT production)
- Budget cap enforced ($15 default)
- Time cap enforced (30 min default)
- All exploitation attempts logged

## Quick Reference Card

### The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| **1. Recon** | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| **2. Analysis** | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| **3. Validation** | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| **4. Report** | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

### Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| **1** | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| **2** | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| **3** | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |

### When to Use This Skill

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

---

## Configuration

```yaml
pentest:
  target_url: https://staging.app.com   # REQUIRED for Tier 2-3
  source_repo: ./src                    # REQUIRED for Tier 1+
  exploitation_tier: 2                  # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                           # Which pipelines to run
    - injection                         # SQL, NoSQL, command injection
    - xss                               # Reflected, stored, DOM XSS
    - auth                              # Auth bypass, session, JWT
    - ssrf                              # URL scheme abuse, metadata
  max_cost_usd: 15                      # Budget cap per run
  timeout_minutes: 30                   # Time cap per run
  require_authorization: true           # MUST confirm target ownership
  no_production: true                   # Block production URLs
  production_patterns:                  # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```

---

## Safeguards (Mandatory)

### Authorization Gate

Every pentest validation run MUST:

1. Display target URL and exploitation tier to user
2. Require explicit confirmation: "I own/authorized testing of this target"
3. Log authorization with timestamp
4. Block if target URL matches production patterns

### What This Skill Does NOT Do

- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery

---

## Validation Pipelines

### Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| SQL injection | String concat in query | `' OR '1'='1` response diff | UNION SELECT data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |

### XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Reflected XSS | No output encoding | Payload reflected unencoded in response | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |

### Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

### SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |

---

## Agent Coordination

### Orchestration Pattern

```typescript
// Task(...) is provided by the agent runtime. This sketch assumes each call
// resolves to that agent's results, with findings returned as arrays.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review); merge both agents' findings
const phase2Results = (await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
])).flat();

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```

### Finding Classification

| Status | Meaning | Action |
|--------|---------|--------|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |

---

## Exploit Playbook Memory

### Namespace Structure

```
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```

### Learning Loop

1. **Before validation**: Query playbook for known patterns matching findings
2. **During validation**: Try known payloads first (higher success rate)
3. **After validation**: Store new successful patterns with confidence scores
4. **Over time**: Agent converges on most effective payloads per tech stack

---

## Cost Optimization

### Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

The full pentest scenario can exceed the default caps ($15, 30 min); raise `max_cost_usd` and `timeout_minutes` before running it.

### Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|--------|---------|----------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

---

## Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

---

## Related Skills

- [security-testing](../security-testing/) - OWASP vulnerability scanning
- [qe-security-compliance](../qe-security-compliance/) - SAST/DAST automation
- [compliance-testing](../compliance-testing/) - Regulatory compliance
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [chaos-engineering-resilience](../chaos-engineering-resilience/) - Security under chaos

---

## Remember

**"No Exploit, No Report."** A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.

**Think proof, not prediction.** Don't report what MIGHT be vulnerable. Prove what IS vulnerable.
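
As a rough illustration of the two gates that bracket a run, the sketch below shows a pre-flight check against `production_patterns` and a "No Exploit, No Report" filter over classified findings. It is a minimal sketch under stated assumptions, not the agents' actual implementation: `Finding`, `globToRegExp`, `isProductionUrl`, and `applyReportGate` are hypothetical names, patterns are assumed to be matched against the hostname only, and a confirmed finding that lacks a PoC is assumed to fall back to manual review.

```typescript
// Illustrative sketch only: all names and types here are assumptions,
// not the qe-pentest-validator or qe-quality-gate API.

type FindingStatus =
  | "confirmed-exploitable"
  | "likely-exploitable"
  | "not-exploitable"
  | "inconclusive";

interface Finding {
  id: string;
  status: FindingStatus;
  poc?: string; // reference to stored proof-of-concept evidence
}

// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
// Every regex metacharacter except "*" is escaped; "*" becomes ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Pre-flight gate: true when the target's hostname matches a blocked pattern.
function isProductionUrl(targetUrl: string, patterns: string[]): boolean {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

// Report gate: route each finding by its classification status.
function applyReportGate(findings: Finding[]): {
  report: Finding[];
  reportWithCaveats: Finding[];
  manualReview: Finding[];
} {
  const report: Finding[] = [];
  const reportWithCaveats: Finding[] = [];
  const manualReview: Finding[] = [];
  for (const f of findings) {
    switch (f.status) {
      case "confirmed-exploitable":
        // Assumption: without a PoC the finding cannot be reported as confirmed.
        (f.poc ? report : manualReview).push(f);
        break;
      case "likely-exploitable":
        reportWithCaveats.push(f);
        break;
      case "inconclusive":
        manualReview.push(f);
        break;
      case "not-exploitable":
        break; // filtered from the report
    }
  }
  return { report, reportWithCaveats, manualReview };
}

const patterns = ["*.prod.*", "api.*", "www.*"];
console.log(isProductionUrl("https://staging.app.com", patterns)); // false
console.log(isProductionUrl("https://www.example.com", patterns)); // true

const gated = applyReportGate([
  { id: "F-1", status: "confirmed-exploitable", poc: "poc/F-1-poc" },
  { id: "F-2", status: "not-exploitable" },
  { id: "F-3", status: "inconclusive" },
]);
console.log(gated.report.map((f) => f.id)); // only F-1 is reported as confirmed
```

Matching on the hostname rather than the full URL keeps a pattern like `api.*` from being defeated or triggered by path or query text; a real gate would also need the explicit authorization step described under Safeguards.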