--- name: deep-dive-analysis description: AI-powered systematic codebase analysis. Combines mechanical structure extraction with Claude's semantic understanding to produce documentation that captures not just WHAT code does, but WHY it exists and HOW it fits into the system. Includes pattern recognition, red flag detection, flow tracing, and quality assessment. Use for codebase analysis, documentation generation, architecture understanding, or code review. version: 3.1.0 dependencies: python>=3.13 --- # Deep Dive Analysis Skill ## Overview This skill combines **mechanical structure extraction** with **Claude's semantic understanding** to produce comprehensive codebase documentation. Unlike simple AST parsing, this skill captures: - **WHAT** the code does (structure, functions, classes) - **WHY** it exists (business purpose, design decisions) - **HOW** it integrates (dependencies, contracts, flows) - **CONSEQUENCES** of changes (side effects, failure modes) ### Capabilities **Mechanical Analysis (Scripts):** - Extract code structure (classes, functions, imports) - Map dependencies (internal/external) - Find symbol usages across the codebase - Track analysis progress - Classify files by criticality **Semantic Analysis (Claude AI):** - Recognize architectural and design patterns - Identify red flags and anti-patterns - Trace data and control flows - Document contracts and invariants - Assess quality and maintainability **Documentation Maintenance:** - Review and maintain documentation (Phase 8) - Fix broken links and update navigation indexes - Analyze and rewrite code comments (antirez standards) **Use this skill when:** - Analyzing a codebase you're unfamiliar with - Generating documentation that explains WHY, not just WHAT - Identifying architectural patterns and anti-patterns - Performing code review with semantic understanding - Onboarding to a new project - Creating documentation for new contributors ## Prerequisites 1. **analysis_progress.json** must exist in project root (created by DEEP_DIVE_PLAN setup) 2. **DEEP_DIVE_PLAN.md** should be reviewed to understand phase structure ## CRITICAL PRINCIPLE: ABSOLUTE SOURCE OF TRUTH > **THE DOCUMENTATION GENERATED BY THIS SKILL IS THE ABSOLUTE AND UNQUESTIONABLE SOURCE OF TRUTH FOR YOUR PROJECT.** > > **ANY INFORMATION NOT VERIFIED WITH IRREFUTABLE EVIDENCE FROM SOURCE CODE IS FALSE, UNRELIABLE, AND UNACCEPTABLE.** ### IMPORTANT LIMITATION: Verification is Multi-Layer ``` ╔══════════════════════════════════════════════════════════════════════════════╗ ║ VERIFICATION TRUST MODEL ║ ╠══════════════════════════════════════════════════════════════════════════════╣ ║ Layer 1: TOOL-VALIDATED ║ ║ └── Automated checks: file exists, line in range, AST symbol match ║ ║ └── Marker: [VALIDATED: file.py:123 @ 2025-12-20] ║ ║ ║ ║ Layer 2: HUMAN-VERIFIED ║ ║ └── Manual review: semantic correctness, behavior match ║ ║ └── Marker: [VERIFIED: file.py:123 by @reviewer @ 2025-12-20] ║ ║ ║ ║ Layer 3: RUNTIME-CONFIRMED ║ ║ └── Log/trace evidence of actual behavior ║ ║ └── Marker: [CONFIRMED: trace_id=abc123 @ 2025-12-20] ║ ╚══════════════════════════════════════════════════════════════════════════════╝ Tool validation catches STRUCTURAL issues (file moved, line shifted, symbol renamed). Human verification ensures SEMANTIC correctness (code does what doc says). Runtime confirmation proves BEHAVIORAL truth (system actually works this way). ALL THREE LAYERS are required for critical documentation. ``` ### The Iron Law of Documentation ``` ╔══════════════════════════════════════════════════════════════════════════════╗ ║ DOCUMENTATION = f(SOURCE_CODE) + VERIFICATION ║ ║ ║ ║ If NOT verified_against_code(statement) → statement is FALSE ║ ║ If NOT exists_in_codebase(reference) → reference is FABRICATED ║ ║ If NOT traceable_to_source(claim) → claim is SPECULATION ║ ╚══════════════════════════════════════════════════════════════════════════════╝ ``` ### Mandatory Rules (VIOLATION = FAILURE) 1. **NEVER** document anything without reading the actual source code first 2. **NEVER** assume any existing documentation, comment, or docstring is accurate 3. **NEVER** write documentation based on memory, inference, or "what should be" 4. **ALWAYS** derive truth EXCLUSIVELY from reading and tracing actual code 5. **ALWAYS** provide source file + line number for every technical claim 6. **ALWAYS** verify state machines, enums, constants against actual definitions 7. **TREAT** all pre-existing docs as unverified claims requiring validation 8. **MARK** any unverifiable statement as `[UNVERIFIED - REQUIRES CODE CHECK]` ### Verification Requirements | Documentation Type | Required Evidence | |-------------------|-------------------| | **Enum/State values** | Exact match with source code enum definition | | **Function behavior** | Code path tracing, actual implementation reading | | **Constants/Timeouts** | Variable definition in source with file:line | | **Message formats** | Message class definition, field validation | | **Architecture claims** | Import graph analysis, actual class relationships | | **Flow diagrams** | Verified against runtime logs OR code path analysis | ### Documentation Verification Status Every section of documentation MUST have one of these status markers: - `[VERIFIED: source_file.py:123]` - Confirmed against source code - `[VERIFIED: trace_id=xyz]` - Confirmed against runtime logs - `[UNVERIFIED]` - Requires verification before trusting - `[DEPRECATED]` - Code has changed, documentation outdated **UNVERIFIED documentation is UNTRUSTED documentation.** ### CRITICAL PRINCIPLE: NO HISTORICAL DEPTH > **DOCUMENTATION DESCRIBES ONLY THE CURRENT STATE OF THE ART.** > > **NO HISTORY. NO ARCHAEOLOGY. NO "WAS". ONLY "IS".** ``` ╔══════════════════════════════════════════════════════════════════════════════╗ ║ THE TEMPORAL PURITY PRINCIPLE ║ ╠══════════════════════════════════════════════════════════════════════════════╣ ║ Documentation = PRESENT_TENSE(current_implementation) ║ ║ ║ ║ FORBIDDEN: ║ ║ ✗ "was/were/previously/formerly/used to" ║ ║ ✗ "deprecated since version X" → just REMOVE it ║ ║ ✗ "changed from X to Y" → only describe Y ║ ║ ✗ "in the old system..." → irrelevant, delete ║ ║ ✗ inline changelogs → use CHANGELOG.md or git ║ ║ ║ ║ REQUIRED: ║ ║ ✓ Present tense: "The system uses..." not "The system used..." ║ ║ ✓ Current state only: Document what IS, not what WAS ║ ║ ✓ Git for archaeology: History lives in version control, not docs ║ ╚══════════════════════════════════════════════════════════════════════════════╝ ``` **The Rule:** > When you find documentation containing historical language, **DELETE IT**. > Git blame exists for archaeology. Documentation exists for the present. ## Available Commands ### 1. Analyze Single File Extract structure, dependencies, and usages for one file: ```bash python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \ --file src/utils/circuit_breaker.py \ --output-format markdown ``` **Parameters:** - `--file` / `-f`: Relative path to file to analyze - **REQUIRED** - `--output-format` / `-o`: Output format (json, markdown, summary) - default: summary - `--find-usages` / `-u`: Also find all usages of exported symbols - default: false - `--update-progress` / `-p`: Update analysis_progress.json - default: false **Output includes:** - File classification (Critical/High-Complexity/Standard/Utility) - Classes with methods and attributes - Functions with signatures - Internal imports (within project) - External imports (third-party) - External calls (database, network, filesystem, messaging, ipc) - State mutations identified - Error handling patterns ### 2. Check Progress View analysis progress by phase: ```bash python .claude/skills/deep-dive-analysis/scripts/check_progress.py \ --phase 1 \ --status pending ``` **Parameters:** - `--phase` / `-p`: Filter by phase number (1-7) - `--status` / `-s`: Filter by status (pending, analyzing, done, blocked) - `--classification` / `-c`: Filter by classification (critical, high-complexity, standard, utility) - `--verification-needed`: Show only files needing runtime verification ### 3. Find Usages Find all usages of a symbol across the codebase: ```bash python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \ --symbol CircuitBreaker \ --file src/utils/circuit_breaker.py ``` ### 4. Generate Phase Report Generate documentation for an entire phase: ```bash python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \ --phase 1 \ --output-format markdown \ --output-file docs/01_domains/COMMON_LIBRARY.md ``` --- ## Phase 8: Documentation Review Commands ### 5. Scan Documentation Health Discover all documentation files and generate health report: ```bash python .claude/skills/deep-dive-analysis/scripts/doc_review.py scan \ --path docs/ \ --output doc_health_report.json ``` **Output includes:** - Total file count per directory - Files with TODO/FIXME/TBD markers - Files missing last_updated metadata - Large files (>1500 lines) candidates for splitting ### 6. Validate Links Find all broken links in documentation: ```bash python .claude/skills/deep-dive-analysis/scripts/doc_review.py validate-links \ --path docs/ \ --fix # Optional: auto-remove broken links ``` **Actions:** - Extracts all relative markdown links `](../path/to/file.md)` - Verifies target files exist - Reports broken links with source file and line number - With `--fix`: removes or updates broken references ### 7. Verify Against Source Code Verify documentation accuracy against actual source code: ```bash python .claude/skills/deep-dive-analysis/scripts/doc_review.py verify \ --doc docs/agents/lifecycle.md \ --source src/agents/lifecycle.py ``` **Verification includes:** - Documented states vs actual enum values - Documented methods vs actual class methods - Documented constants vs actual values - Flags discrepancies as DRIFT ### 8. Update Navigation Indexes Refresh SEARCH_INDEX.md and BY_DOMAIN.md with current file counts: ```bash python .claude/skills/deep-dive-analysis/scripts/doc_review.py update-indexes \ --search-index docs/00_navigation/SEARCH_INDEX.md \ --by-domain docs/00_navigation/BY_DOMAIN.md ``` **Updates:** - Total file counts - Files per directory statistics - Version and last_updated timestamps - Removes references to deleted files ### 9. Full Documentation Maintenance Run complete Phase 8 workflow: ```bash python .claude/skills/deep-dive-analysis/scripts/doc_review.py full-maintenance \ --path docs/ \ --auto-fix \ --output doc_health_report.json ``` **Executes in order:** 1. Scan documentation health 2. Validate and fix broken links 3. Identify obsolete files (no inbound links, references deleted code) 4. Update navigation indexes 5. Generate final health report --- ## Comment Quality Commands (Antirez Standards) These commands analyze and rewrite code comments following the [antirez commenting standards](https://antirez.com/news/124). ### 10. Analyze Comment Quality Analyze comments in a single file: ```bash python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py analyze \ src/main.py \ --report ``` **Options:** - `--report` / `-r`: Generate detailed markdown report - `--json`: Output as JSON for programmatic use - `--issues-only` / `-i`: Show only problematic comments **Output includes:** - Comment classification (function, design, why, teacher, checklist, guide) - Issue detection (trivial, debt, backup comments) - Suggested rewrites for problematic comments - Statistics and ratios ### 11. Scan Directory for Comment Issues Analyze all Python files in a directory: ```bash python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py scan \ src/ \ --recursive \ --issues-only ``` **Options:** - `--recursive` / `-r`: Include subdirectories - `--issues-only` / `-i`: Show only files with issues - `--json`: Output as JSON ### 12. Generate Comment Health Report Create comprehensive markdown report for entire codebase: ```bash python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py report \ src/ \ --output comment_health.md ``` **Report includes:** - Executive summary with totals - Comment quality breakdown (keep/enhance/rewrite/delete) - Comment type distribution - Files needing attention (ranked by issue count) - Sample issues with file:line references - Actionable recommendations ### 13. Rewrite Comments Apply comment improvements to a file: ```bash # Dry run (preview changes) python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py rewrite \ src/main.py # Apply changes with backup python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py rewrite \ src/main.py \ --apply \ --backup ``` **Options:** - `--apply` / `-a`: Actually modify the file (default: dry run) - `--backup` / `-b`: Create .bak backup before modifying - `--output` / `-o`: Write to different file instead of in-place **Actions taken:** - DELETE: Remove trivial comments and backup (commented-out code) - REWRITE: Add suggested improvements for debt comments (TODO/FIXME) ### 14. View Standards Reference Display the antirez commenting standards: ```bash python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py standards ``` Shows the complete taxonomy of good vs bad comments with examples. --- ### Comment Type Classification | Type | Category | Description | Action | |------|----------|-------------|--------| | **function** | GOOD | API docs at function/class top | Keep/Enhance | | **design** | GOOD | File-level algorithm explanations | Keep | | **why** | GOOD | Explains reasoning behind code | Keep | | **teacher** | GOOD | Educates about domain concepts | Keep | | **checklist** | GOOD | Reminds of coordinated changes | Keep | | **guide** | GOOD | Section dividers, structure | Keep sparingly | | **trivial** | BAD | Restates what code says | Delete | | **debt** | BAD | TODO/FIXME without plan | Rewrite/Resolve | | **backup** | BAD | Commented-out code | Delete | ### Comment Quality Workflow ``` 1. SCAN ├── Run: rewrite_comments.py scan
├── Batch 3: Delete obsolete files
│ └── Manual review + deletion
├── Batch 4: Update navigation indexes
│ └── Run: doc_review.py update-indexes
└── Batch 5: Update timestamps
└── Set last_updated on verified files
3. VERIFICATION
├── Run: doc_review.py scan (confirm improvements)
├── Run: doc_review.py validate-links (confirm zero broken)
└── Generate final doc_health_report.json
```
## Resources
- **Scripts**: `scripts/` - Python analysis tools
- `analyze_file.py` - Source code analysis (Phases 1-7)
- `check_progress.py` - Progress tracking
- `doc_review.py` - Documentation maintenance (Phase 8)
- `comment_rewriter.py` - Comment analysis engine (antirez standards)
- `rewrite_comments.py` - Comment quality CLI tool
- **Templates**: `templates/` - Output templates
- `analysis_report.md` - Module-level report template
- `semantic_analysis.md` - AI-powered per-file analysis template
- **References**: `references/` - Analysis methodology docs
- `DEEP_DIVE_PLAN.md` - Master analysis plan with all phase definitions
- `ANTIREZ_COMMENTING_STANDARDS.md` - Complete antirez comment taxonomy
- `AI_ANALYSIS_METHODOLOGY.md` - AI semantic analysis methodology
- `SEMANTIC_PATTERNS.md` - Pattern recognition guide for Claude
- **analysis_progress.json**: Progress tracking state
- **doc_health_report.json**: Documentation health metrics (generated)
- **comment_health.md**: Comment quality report (generated)