---
name: pentest-whitebox-code-review
description: Source code security audit using backward taint analysis, slot type classification, render context verification, and a 3-phase review with parallel analysis tracks that produces an exploitation queue.
---

# Pentest Whitebox Code Review

## Purpose
Perform a systematic white-box source code security audit using Shannon's backward taint analysis methodology. The audit traces from dangerous sinks back to user-controlled sources, classifies injection contexts by slot type, verifies XSS render contexts, and produces a prioritized exploitation queue for downstream proof-driven exploitation.

## Prerequisites

### Authorization Requirements
- **Written authorization** with explicit scope for source code review
- **Source code access** — full repository with version control history
- **Architecture documentation** if available (data flow diagrams, API specs)
- **Deployment configuration** access (environment variables, secrets management)

### Environment Setup
- semgrep with custom rules for taint analysis
- CodeQL database built for target language
- ripgrep for fast pattern searching
- jadx for Android APK decompilation (if applicable)
- Source map extraction tools for minified JavaScript
- AST parsing tools for target language (tree-sitter, babel, etc.)

## Core Workflow

### Phase 1: Discovery
1. **Architecture Mapping**: Identify application layers (routing, controllers, services, data access, templates). Map data flow from HTTP entry points through business logic to database/file/external sinks.
2. **Entry Point Enumeration**: Catalog all user-controlled input sources — HTTP parameters, headers, cookies, file uploads, WebSocket messages, environment variables, database reads of user-stored data.
3. **Security Pattern Inventory**: Identify existing security controls — input validation functions, output encoding helpers, parameterized query patterns, CSRF protections, authentication middleware, rate limiters.
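
The enumeration and inventory steps above can be approximated with a first-pass pattern scan before heavier tooling is run. The sketch below is a minimal illustration, assuming a Flask-style Python target; the pattern tables and the `inventory` helper are hypothetical and would need to be adapted to the actual framework. Regex hits are only leads for manual review, not confirmed sources or controls.

```python
import re
from collections import defaultdict

# Hypothetical patterns for a Flask-style codebase; adjust per target stack.
SOURCE_PATTERNS = {
    "http_param": re.compile(r"request\.(args|form|values|json)\b"),
    "header": re.compile(r"request\.headers\b"),
    "cookie": re.compile(r"request\.cookies\b"),
    "file_upload": re.compile(r"request\.files\b"),
}
CONTROL_PATTERNS = {
    "output_encoding": re.compile(r"\b(html\.)?escape\("),
    # Heuristic only: execute(<first arg>, (...)) or execute(<first arg>, [...])
    "parameterized_query": re.compile(r"\.execute\([^,)]+,\s*[\(\[]"),
}
ALL_PATTERNS = {**SOURCE_PATTERNS, **CONTROL_PATTERNS}


def inventory(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each pattern category to the (path, line_number) pairs where it matched."""
    hits = defaultdict(list)
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in ALL_PATTERNS.items():
                if pattern.search(line):
                    hits[name].append((path, lineno))
    return dict(hits)
```

Passing a mapping of file paths to file contents returns a per-category hit list that can seed the entry point catalog and the security control inventory.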

### Phase 2: Vulnerability Analysis (5 Parallel Tracks)
4. **Injection Sink Hunting**: Trace taint backward from SQL, command, file, and template sinks to their sources. Classify each sink by slot type: SQL-val, SQL-ident, CMD-argument, FILE-path, TEMPLATE-expr. Verify whether parameterization or sanitization breaks the taint chain.
5. **XSS Render Context Analysis**: Identify all dynamic output points in templates/responses. Classify each by render context: HTML_BODY, HTML_ATTRIBUTE, JAVASCRIPT_STRING, URL_PARAM, CSS_VALUE. Verify context-appropriate encoding is applied at each output point.
6. **Authentication Checklist (9-point)**: Transport security, rate limiting, session management, token properties, session fixation resistance, password policy enforcement, login response uniformity, account recovery security, SSO/OAuth implementation.
7. **Authorization Model Review (3-type)**: Horizontal (same-role cross-user access), vertical (privilege escalation across roles), context-workflow (state-dependent authorization bypass).
8. **SSRF Sink Hunting**: Identify all outbound request sinks. Classify by type: classic (direct URL), blind (no response), semi-blind (partial response), stored (deferred execution). Trace URL construction from user input to request dispatch.
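
The backward trace used in the sink hunting tracks above can be illustrated on a toy data-flow model. This is a minimal sketch, not the behavior of semgrep or CodeQL: it assumes each variable is assigned once, and the `Assign` record, the `backward_taint` function, and the variable names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assign:
    target: str                      # variable being defined
    inputs: tuple[str, ...]          # variables it is computed from
    sanitizer: Optional[str] = None  # sanitizer applied at this step, if any


def backward_taint(sink_var, assigns, sources):
    """Walk from a sink variable back to user-controlled sources.

    Returns one record per source-reaching path, listing the variables
    visited (sink first) and any sanitizers encountered along the way.
    """
    defs = {a.target: a for a in assigns}  # assumes single assignment
    paths = []
    stack = [(sink_var, (sink_var,), ())]
    while stack:
        var, path, sanitizers = stack.pop()
        if var in sources:
            paths.append({"path": list(path), "sanitizers": list(sanitizers)})
            continue
        node = defs.get(var)
        if node is None:
            continue  # constant or unknown origin: the chain ends here
        seen = sanitizers + ((node.sanitizer,) if node.sanitizer else ())
        for inp in node.inputs:
            if inp not in path:  # guard against cycles
                stack.append((inp, path + (inp,), seen))
    return paths
```

A path with an empty `sanitizers` list is a candidate finding; a path with sanitizers still needs a manual check that the sanitizer matches the sink's slot type or render context.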

### Phase 3: Synthesis
9. **Confidence Scoring & Exploitation Queue**: Score each finding by taint chain completeness, sanitization bypass likelihood, and impact severity. Generate exploitation queue JSON for downstream exploit validation.
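
One way the scoring and queue generation could look is sketched below. The field names, the severity weights, and the multiplicative formula are assumptions made for illustration; the source does not define a scoring formula or a queue schema.

```python
import json

# Hypothetical severity weights; tune to the engagement's risk model.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def score(finding):
    """Combine chain completeness, bypass likelihood, and severity into one number."""
    chain = 1.0 if finding["chain_complete"] else 0.5
    bypass = finding["bypass_likelihood"]  # reviewer estimate in [0.0, 1.0]
    return round(chain * bypass * SEVERITY[finding["severity"]], 2)


def build_queue(findings):
    """Return the findings as a JSON array, highest score first."""
    ranked = sorted(findings, key=score, reverse=True)
    return json.dumps(
        [{"rank": i, "score": score(f), **f} for i, f in enumerate(ranked, start=1)],
        indent=2,
    )
```

The output is a ranked JSON array that a downstream exploitation stage could consume in order, starting with the findings most likely to be provable.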

## Slot Type Classification

| Slot Type | Sink Pattern | Sanitization Required |
|-----------|-------------|----------------------|
| SQL-val | Query parameter value position | Parameterized query / prepared statement |
| SQL-ident | Table name, column name, ORDER BY | Allowlist validation |
| CMD-argument | Shell command argument | Argument-vector execution without a shell, or argument escaping + allowlist |
| FILE-path | File read/write path construction | Path canonicalization + allowlist |
| TEMPLATE-expr | Template engine expression | No user input in template source; pass data as context variables (sandboxed engine if unavoidable) |
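
The difference between the SQL-val and SQL-ident rows is easiest to see side by side. The sketch below uses Python's standard `sqlite3` module; the `users` table, the `find_users` function, and the allowlist contents are hypothetical.

```python
import sqlite3

# Hypothetical allowlist of columns a caller may sort by.
ALLOWED_SORT_COLUMNS = {"name", "created_at"}


def find_users(conn: sqlite3.Connection, name: str, sort_by: str):
    # SQL-ident slot: identifiers cannot be bound as parameters,
    # so the value is checked against a fixed allowlist instead.
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # SQL-val slot: the value is bound through a placeholder,
    # never concatenated into the query text.
    sql = f"SELECT name FROM users WHERE name = ? ORDER BY {sort_by}"
    return conn.execute(sql, (name,)).fetchall()
```

During review, a sink that interpolates an identifier without an allowlist check like this one, or that builds a value position by string concatenation, leaves the taint chain unbroken.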

## Render Context Classification

| Context | Output Location | Encoding Required |
|---------|----------------|-------------------|
| HTML_BODY | Between HTML tags | HTML entity encoding |
| HTML_ATTRIBUTE | Inside attribute values | Attribute encoding + quoting |
| JAVASCRIPT_STRING | Inside JS string literals | JavaScript Unicode escaping |
| URL_PARAM | URL query parameter values | URL percent encoding |
| CSS_VALUE | Inside CSS property values | CSS hex encoding |
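
The table above can be read as a dispatch from render context to encoder. The sketch below shows one possible mapping using only the Python standard library; `encode_for` is a hypothetical helper, and a production application would normally rely on its template engine's context-aware escaping rather than hand-rolled encoders.

```python
import html
import json
from urllib.parse import quote


def encode_for(context: str, value: str) -> str:
    """Encode value for the given render context (illustrative only)."""
    if context == "HTML_BODY":
        return html.escape(value, quote=False)
    if context == "HTML_ATTRIBUTE":
        # Also escapes " and '; the attribute itself must still be quoted.
        return html.escape(value, quote=True)
    if context == "JAVASCRIPT_STRING":
        # json.dumps escapes double quotes, backslashes, control characters,
        # and non-ASCII; strip the surrounding quotes it adds.
        body = json.dumps(value)[1:-1]
        # Escape characters that could close the string or the script block.
        for ch in "<>&'":
            body = body.replace(ch, f"\\u{ord(ch):04x}")
        return body
    if context == "URL_PARAM":
        return quote(value, safe="")
    if context == "CSS_VALUE":
        # Hex-escape everything except ASCII letters and digits; the trailing
        # space terminates each escape sequence.
        return "".join(
            ch if ch.isascii() and ch.isalnum() else f"\\{ord(ch):x} "
            for ch in value
        )
    raise ValueError(f"unknown render context: {context}")
```

When reviewing an output point, the question is whether the encoder actually applied corresponds to the context the value lands in; HTML entity encoding inside a JavaScript string literal, for example, is a mismatch worth queuing.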

## Tool Categories

| Category | Tools | Purpose |
|----------|-------|---------|
| Taint Analysis | semgrep, CodeQL | Automated taint tracking between sources and sinks |
| Pattern Search | ripgrep, ast-grep | Fast code pattern matching |
| Decompilation | jadx, sourcemap-extract | Recover source from compiled artifacts |
| AST Parsing | tree-sitter, babel | Language-aware code structure analysis |
| Dependency Audit | npm audit, pip-audit, snyk | Known vulnerability detection |

## References
- `references/tools.md` - Tool function signatures and parameters
- `references/workflows.md` - Taint analysis workflows and vulnerability patterns