---
name: document-project
description: Generate comprehensive architecture documentation automatically from existing codebase analysis. This skill should be used when working with brownfield projects or updating outdated documentation.
acceptance:
  - documentation_generated: "All three documentation files created (architecture.md, standards.md, patterns.md)"
  - confidence_sufficient: "Overall confidence score ≥70% across all analyzed sections"
  - review_checklist_created: "Human review checklist generated identifying low-confidence areas"
  - configuration_updated: "Project configuration updated with documentation paths and brownfield flag"
inputs:
  codebase_path:
    type: string
    required: true
    description: "Path to codebase to analyze (e.g., src/)"
  existing_docs_mode:
    type: enum
    values: ["merge", "replace", "supplement"]
    default: "merge"
    description: "How to handle existing documentation"
  include_tests:
    type: boolean
    default: true
    description: "Include test file analysis in documentation"
  max_files:
    type: number
    default: 1000
    description: "Maximum files to analyze (safety limit)"
outputs:
  documentation_files:
    type: object
    description: "Paths to generated documentation files"
  confidence_score:
    type: number
    description: "Overall confidence score (0-100%)"
  review_checklist:
    type: string
    description: "Path to human review checklist file"
  analysis_summary:
    type: object
    description: "Summary of analysis (files analyzed, patterns found, etc.)"
telemetry:
  emit: "skill.document-project.completed"
  track:
    - project_name
    - codebase_path
    - duration_ms
    - files_analyzed
    - lines_analyzed
    - confidence_score
    - patterns_identified
---

# Brownfield Project Documentation Generator

Generate comprehensive architecture documentation automatically by analyzing existing codebase structure, patterns, and conventions.

## Purpose

Analyze an existing codebase and generate three comprehensive documentation files:
1. **architecture.md** - Project structure, tech stack, data models, API specifications
2. **standards.md** - Coding standards, best practices, discovered conventions
3. **patterns.md** - Design patterns, architectural patterns, common conventions

This enables BMAD Enhanced to work with brownfield projects by reverse-engineering architecture from code.

## When to Use This Skill

This skill should be used when:
- Starting BMAD Enhanced with an existing project (brownfield onboarding)
- Architecture documentation is outdated or missing
- Implicit patterns and conventions need to be discovered
- Onboarding to an unfamiliar codebase

This skill should NOT be used when:
- The project is greenfield (write docs from scratch instead)
- The project already has current, comprehensive documentation
- The codebase exceeds 500K lines (too complex for automated analysis)
- The project has no clear structure

## Project Type Support

**Well-Supported Languages:**
- ✅ Node.js/TypeScript (excellent support)
- ✅ Python (good support)
- ✅ Go (good support)
- ✅ Java/Kotlin (good support)
- ✅ Rust (good support)

**Basic Support:**
- ⚠️ PHP, Ruby, C#/.NET (basic support)

**Optimal Codebase Size:** 10K-100K lines
- Smaller: May lack sufficient patterns to analyze
- Larger: Analysis may be too slow or complex

## Sequential Documentation Generation

Execute the steps in order; each builds on the analysis from the previous steps:

### Step 0: Configuration and Validation

**Purpose:** Verify project is suitable for automated documentation.

**Actions:**

1. **Load config** from `.claude/config.yaml` (codebasePath, existingDocs, includeTests, maxFiles)
2. **Validate structure:** Check path exists, identify languages, count files/lines, verify size (10K-100K recommended)
3. **Check existing docs:** Ask how to handle (merge/replace/supplement)
4. **Get confirmation:** Show summary (path, language, file count, lines, estimated time), ask to proceed

**See:** `references/templates.md#step-0-configuration-and-validation-output` for complete formats

**Halt if:**
- Codebase path not found
- No recognizable project structure
- Codebase too large (>500K lines)
- Unsupported language
- User declines to proceed

**Output:** Validation confirmation (project type, lines, existing docs mode, scope, ready status)

**Reference:** See [validation-criteria.md](references/validation-criteria.md) for detailed validation rules.
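
The size rules above can be sketched as a small check. This is a minimal illustration, assuming the line count has already been computed; the names `check_codebase_size` and `SizeCheck` are hypothetical, and only the thresholds (500K hard limit, 10K-100K optimal range) come from this document.

```python
from dataclasses import dataclass

# Thresholds taken from this document; all names below are illustrative.
HARD_LIMIT_LINES = 500_000
OPTIMAL_MIN_LINES = 10_000
OPTIMAL_MAX_LINES = 100_000


@dataclass
class SizeCheck:
    proceed: bool   # False means Step 0 should halt
    message: str    # human-readable reason shown in the confirmation summary


def check_codebase_size(total_lines: int) -> SizeCheck:
    """Apply the Step 0 size rules to a pre-computed line count."""
    if total_lines > HARD_LIMIT_LINES:
        return SizeCheck(False, "halt: codebase exceeds 500K lines")
    if total_lines < OPTIMAL_MIN_LINES:
        return SizeCheck(True, "warn: below 10K lines, may lack sufficient patterns")
    if total_lines > OPTIMAL_MAX_LINES:
        return SizeCheck(True, "warn: above 100K lines, analysis may be slow")
    return SizeCheck(True, "ok: within the optimal 10K-100K range")
```

Sizes outside the optimal range only produce a warning here, since the document lists them as recommendations rather than halt conditions.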

---

### Step 1: Analyze Project Structure

**Purpose:** Map file organization and module structure.

**Actions:**

1. **Scan directory structure:** Identify main dirs, detect patterns (feature vs type), note nesting
2. **Analyze file organization:** Files per dir, naming conventions, size distribution, co-location patterns
3. **Detect project type:** Backend API / Frontend App / Full-Stack / Library / Monorepo (based on directory structure)
4. **Map relationships:** Analyze imports, build dependency graph, identify core modules, detect circular dependencies

**Output:** Project structure analysis with type, organization, directory structure (dirs + file counts + line counts), key patterns

**See:** `references/templates.md#step-1-codebase-analysis-output` for complete format

**Reference:** See [analysis-techniques.md](references/analysis-techniques.md) for detailed analysis methods.
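
As an illustration of the circular-dependency check in action 4, the sketch below runs a depth-first search over an import graph. It assumes the graph has already been built as a mapping from module name to imported modules; the function name `find_cycles` and the module names are hypothetical, and the actual method in `analysis-techniques.md` may differ.

```python
from typing import Dict, List


def find_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return one cycle for each back edge found by depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []
    cycles: List[List[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so this edge closes a cycle.
                cycles.append(path[path.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        path.pop()
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles


imports = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(find_cycles(imports))  # [['a', 'b', 'c', 'a']]
```

The recursive form keeps the sketch short; for very deep import chains an explicit stack would avoid Python's recursion limit.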

---

### Step 2: Analyze Technology Stack

**Purpose:** Identify languages, frameworks, and dependencies.

**Actions:**

1. Read package configuration:
   - **Node.js:** package.json
   - **Python:** requirements.txt, pyproject.toml
   - **Go:** go.mod
   - **Java:** pom.xml, build.gradle
   - **Rust:** Cargo.toml

2. Extract dependencies and identify frameworks:
   - Backend: Express, Django, Spring Boot, etc.
   - Frontend: React, Vue, Angular, etc.
   - Database: Prisma, TypeORM, SQLAlchemy, etc.
   - Testing: Jest, Pytest, JUnit, etc.

3. Detect runtime/platform:
   - Node.js/Python/JDK version
   - Database type (PostgreSQL, MySQL, MongoDB)

**Output:** Tech stack summary (runtime, backend/frontend frameworks, key libraries, testing tools, versions, confidence score)

**See:** `references/templates.md#step-2-technology-stack-analysis` for complete format
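
For the Node.js case, framework identification might look like the sketch below: read `dependencies` and `devDependencies` from `package.json` and match them against a lookup table. The table `KNOWN_PACKAGES` and the function `detect_stack` are illustrative assumptions, and the table is deliberately not exhaustive.

```python
import json
from typing import Dict, List

# Illustrative mapping from npm package name to (category, label); not exhaustive.
KNOWN_PACKAGES = {
    "express": ("backend", "Express"),
    "react": ("frontend", "React"),
    "vue": ("frontend", "Vue"),
    "@angular/core": ("frontend", "Angular"),
    "@prisma/client": ("database", "Prisma"),
    "typeorm": ("database", "TypeORM"),
    "jest": ("testing", "Jest"),
}


def detect_stack(package_json_text: str) -> Dict[str, List[str]]:
    """Group recognized dependencies from a package.json document by category."""
    manifest = json.loads(package_json_text)
    declared = {}
    declared.update(manifest.get("dependencies", {}))
    declared.update(manifest.get("devDependencies", {}))

    stack: Dict[str, List[str]] = {}
    for name in sorted(declared):
        if name in KNOWN_PACKAGES:
            category, label = KNOWN_PACKAGES[name]
            stack.setdefault(category, []).append(label)
    return stack
```

Packages missing from the table are simply ignored, so a sparse result is a reason to lower the confidence score rather than evidence that no framework is in use.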

---

### Step 3: Extract Data Models and Schemas

**Purpose:** Document data structures and validation rules.

**Actions:**

1. Locate data model files:
   - Prisma schema: `prisma/schema.prisma`
   - TypeScript interfaces: `src/types/*.ts`, `src/models/*.ts`
   - Database migrations: `prisma/migrations/`, `migrations/`
   - Validation schemas: Zod, Yup, Joi schemas

2. Parse data models and extract validation rules

3. Analyze relationships:
   - One-to-many, many-to-many
   - Foreign keys and constraints

4. Detect data flow:
   - Request → Validation → Service → Repository → Database
   - Response transformation (DTOs)

**Output:** Data models summary (models with fields/types/constraints, validation rules, relationships, confidence score)

**See:** `references/templates.md#complete-architecture-md-template` for data model format

**Reference:** See [analysis-techniques.md](references/analysis-techniques.md) for model extraction patterns.
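
As a rough illustration of model parsing and relationship analysis for a Prisma schema, the sketch below pulls `model` blocks out with a regular expression and treats list-typed fields (such as `Post[]`) as the "many" side of a relation. The helper names and the sample schema are hypothetical, and a regex is only an approximation: a real implementation would need a proper parser to handle comments, enums, and composite types.

```python
import re
from typing import Dict, List, Tuple

# Matches "model Name { ... }"; assumes model bodies contain no nested braces.
MODEL_BLOCK = re.compile(r"\bmodel\s+(\w+)\s*\{([^}]*)\}")


def parse_models(schema: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each model name to its (field, type) pairs."""
    models: Dict[str, List[Tuple[str, str]]] = {}
    for name, body in MODEL_BLOCK.findall(schema):
        fields = []
        for line in body.splitlines():
            parts = line.split()
            # Skip blank lines, comments, and block attributes such as @@index.
            if len(parts) < 2 or parts[0].startswith(("//", "@@")):
                continue
            fields.append((parts[0], parts[1]))
        models[name] = fields
    return models


def list_relations(models: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Return (owner, field, target) for every list-typed field pointing at a model."""
    relations = []
    for owner, fields in models.items():
        for field, type_name in fields:
            if type_name.endswith("[]") and type_name[:-2] in models:
                relations.append((owner, field, type_name[:-2]))
    return relations


SCHEMA = """
model User {
  id    Int    @id
  email String @unique
  posts Post[]
}

model Post {
  id       Int  @id
  author   User @relation(fields: [authorId], references: [id])
  authorId Int
}
"""

print(list_relations(parse_models(SCHEMA)))  # [('User', 'posts', 'Post')]
```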

---

### Step 4: Analyze API Patterns

**Purpose:** Document API structure and conventions.

**Actions:**

1. Locate API definitions:
   - Express routes: `src/routes/**/*.ts`
   - Controllers: `src/controllers/**/*.ts`
   - OpenAPI/Swagger spec (if exists)
   - GraphQL schemas (if exists)

2. Extract endpoints and analyze request/response patterns:
   - Request validation (middleware)
   - Error response format
   - Success response format
   - Status codes used

3. Identify authentication:
   - JWT tokens, session-based, API keys, OAuth

4. Detect rate limiting and other middleware

**Output:** API specs summary (base URL, authentication type/headers/expiry, error/success response formats, confidence score)

**See:** `references/templates.md#complete-architecture-md-template` for API specification format
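
Endpoint extraction from Express route files could be approximated as below. This is a sketch only: it assumes routes are registered as `app.<method>(...)` or `router.<method>(...)` with a literal path string, and it will miss routes built dynamically or mounted under a prefix. The function name `extract_routes` and the sample handlers are hypothetical.

```python
import re
from typing import List, Tuple

# Captures the HTTP method, the opening quote character, and the literal path.
ROUTE_CALL = re.compile(
    r"\b(?:app|router)\.(get|post|put|patch|delete)\(\s*(['\"`])(.+?)\2"
)


def extract_routes(source: str) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs in the order they appear in the source."""
    return [(m.group(1).upper(), m.group(3)) for m in ROUTE_CALL.finditer(source)]


SAMPLE = '''
router.get('/users', listUsers);
router.post("/users", validate(schema), createUser);
app.delete(`/users/:id`, removeUser);
'''

for method, path in extract_routes(SAMPLE):
    print(method, path)
```

When an OpenAPI/Swagger spec exists, it is likely a more reliable source than pattern matching, and the two can be cross-checked to raise or lower the confidence score.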

---

### Step 5: Extract Coding Standards and Patterns

**Purpose:** Document implicit conventions and best practices.

**Actions:**

1. Analyze code style:
   - Read `.eslintrc`, `.prettierrc`, `tsconfig.json`
   - Detect: indentation, quotes, semicolons
   - Naming conventions: variables, functions, classes, files

2. Identify architectural patterns:
   - Design patterns: Repository, Factory, Strategy
   - Architectural style: Layered, Clean, Hexagonal

3. Extract error handling patterns:
   - Error classes
   - Centralized error handling
   - Logging patterns

4. Detect testing patterns:
   - Test organization
   - Mocking strategy
   - Test data management

**Output:**
```
Coding Standards:

Code Style:
- Indentation: 2 spaces
- Quotes: Single quotes
- Naming: camelCase (variables), PascalCase (classes), kebab-case (files)

Architectural Patterns:
1. Layered Architecture
   - Routes (presentation layer)
   - Services (business logic layer)
   - Repositories (data access layer)

2. Dependency Injection
3. Repository Pattern

Error Handling:
- Custom error classes (AppError, ValidationError)
- Centralized error handling middleware
- Never expose stack traces to clients

Confidence: High (90%)
```

**Reference:** See [pattern-detection.md](references/pattern-detection.md) for pattern identification techniques.
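
The naming-convention check in action 1 can be sketched with a few regular expressions. The patterns and the `classify_name` helper are assumptions made for illustration; `snake_case` is added here for Python codebases and does not appear in the example output above.

```python
import re

# Ordered list of (label, pattern); the first match wins.
CONVENTIONS = [
    ("camelCase", re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")),
    ("PascalCase", re.compile(r"^(?:[A-Z][a-z0-9]+)+$")),
    ("kebab-case", re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)+$")),
    ("snake_case", re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$")),
]


def classify_name(name: str) -> str:
    """Label an identifier or file stem with the convention it follows."""
    for label, pattern in CONVENTIONS:
        if pattern.match(name):
            return label
    # Single lowercase words ("user") fit several conventions, so they are
    # reported as "other" rather than counted toward any one of them.
    return "other"


for name in ["getUserById", "AppError", "user-service", "user_service", "user"]:
    print(name, "->", classify_name(name))
```

Treating single-word names as ambiguous keeps them from inflating the apparent consistency of whichever convention happens to be checked first.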

---

### Step 6: Generate Architecture Documentation

**Purpose:** Create comprehensive architecture.md from findings.

**Actions:**

1. Load architecture template structure

2. Populate sections with analyzed data from Steps 1-5:
   - Overview, tech stack, project structure
   - Data models, API specifications
   - Include confidence scores where applicable
   - Add source file references

3. Generate diagrams (optional):
   - Mermaid diagrams for architecture layers
   - Data model ERD
   - API endpoint map

4. Add human review notes for medium/low confidence sections

5. Write documentation file:
   - Create `docs/architecture.md`
   - Or merge with existing docs if "merge" mode is selected

**Output:**
```
✓ Architecture documentation generated
✓ File: docs/architecture.md (2,450 lines)
✓ Sections: 12
✓ Overall confidence: 85%
✓ Human review items: 5
```

**Reference:** See [documentation-templates.md](references/documentation-templates.md) for template structures.
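
For the optional layer diagram, a Mermaid flowchart can be generated as plain text from an ordered list of layers. The sketch below assumes layer names are simple identifiers without spaces; the function name `layers_to_mermaid` is hypothetical, and the layer names are the ones from the Step 5 example output.

```python
from typing import List


def layers_to_mermaid(layers: List[str]) -> str:
    """Render top-to-bottom layers as a Mermaid flowchart definition."""
    lines = ["graph TD"]
    # Connect each layer to the one directly beneath it.
    for upper, lower in zip(layers, layers[1:]):
        lines.append(f"    {upper} --> {lower}")
    return "\n".join(lines)


print(layers_to_mermaid(["Routes", "Services", "Repositories"]))
```

The resulting text can be placed inside a `mermaid` code fence in `docs/architecture.md`.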

---

### Step 7: Generate Standards Documentation

**Purpose:** Create standards.md from discovered patterns.

**Actions:**

1. Extract standards from code analysis:
   - Security standards (from security practices)
   - Testing standards (from test patterns)
   - Code quality standards (from ESLint rules)
   - Performance standards (from observed patterns)

2. Document best practices:
   - Observed consistently across codebase
   - Mark as "discovered" vs "recommended"

3. Create standards document with examples and consistency scores

**Output:**
```
✓ Standards documentation generated
✓ File: docs/standards.md (850 lines)
✓ Standards extracted: 18
✓ Consistency scores: 75-100%
```
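
One plausible way to compute a consistency score for a discovered standard is the share of observations that match the dominant value. This definition is an assumption, since the document does not specify the formula, and `consistency_score` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def consistency_score(observations: Sequence[Hashable]) -> Tuple[Optional[Hashable], float]:
    """Return the dominant value and the percentage of observations matching it."""
    if not observations:
        return None, 0.0
    value, count = Counter(observations).most_common(1)[0]
    return value, round(100 * count / len(observations), 1)


# Example: quote style observed in ten files.
quote_styles = ["single"] * 9 + ["double"]
print(consistency_score(quote_styles))  # ('single', 90.0)
```

A standard with a high score can be marked as "discovered", while a low score suggests conflicting conventions that belong on the review checklist.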

---

### Step 8: Generate Patterns Documentation

**Purpose:** Document discovered design patterns and conventions.

**Actions:**

1. Extract design patterns:
   - Repository, Factory, Strategy, Middleware patterns

2. Document usage examples:
   - Show code examples of each pattern
   - Explain when to use
   - Link to existing implementations

3. Identify anti-patterns:
   - Code smells detected
   - Inconsistencies
   - Technical debt

4. Create patterns document

**Output:**
```
✓ Patterns documentation generated
✓ File: docs/patterns.md (620 lines)
✓ Patterns identified: 8
✓ Anti-patterns noted: 3
```

**Reference:** See [pattern-detection.md](references/pattern-detection.md) for pattern documentation templates.

---

### Step 9: Validation and Confidence Scoring

**Purpose:** Score accuracy and identify areas needing human review.

**Actions:**

1. **Calculate confidence:** Score each section (tech stack, structure, data models, API, standards) and compute overall
2. **Identify low-confidence areas:** Find sections <70%, conflicting patterns, missing info
3. **Generate review checklist:** List high/medium priority items needing human verification
4. **Create validation report**

**Output:** Validation summary (overall confidence %, high/medium priority review item counts)

**See:** `references/templates.md#review-checklist-template` for the review checklist format

**Reference:** See [confidence-scoring.md](references/confidence-scoring.md) for scoring guidelines.
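
Actions 1 and 2 can be sketched as follows. The sketch assumes the overall score is an unweighted mean of the section scores, which may differ from the methodology in `confidence-scoring.md`; the 70% threshold comes from this document, while the function name and section keys are illustrative.

```python
from typing import Dict

REVIEW_THRESHOLD = 70  # sections below this need human review


def summarize_confidence(sections: Dict[str, float]) -> dict:
    """Compute the overall score and list sections that need review."""
    if not sections:
        raise ValueError("at least one section score is required")
    # Assumption: every section carries equal weight.
    overall = sum(sections.values()) / len(sections)
    needs_review = sorted(
        name for name, score in sections.items() if score < REVIEW_THRESHOLD
    )
    return {
        "overall": round(overall, 1),
        "needs_review": needs_review,
        "meets_acceptance": overall >= REVIEW_THRESHOLD,
    }


scores = {"tech_stack": 95, "structure": 90, "data_models": 80, "api": 65, "standards": 85}
print(summarize_confidence(scores))
```

In this example the overall score passes the acceptance criterion even though the API section does not, which is why low-confidence sections are tracked separately for the review checklist.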

---

### Step 10: Summary and Next Steps

**Purpose:** Provide user with clear summary and action items.

**Actions:**

1. **Generate summary report:** Project name, duration, files analyzed, confidence, generated docs (paths + line counts), key findings (type, arch, tech stack, test coverage), next steps
2. **Create review checklist file:** `docs/REVIEW_CHECKLIST.md`
3. **Update config:** Set brownfield flag, doc paths, documented=true
4. **Prompt next action:** Review items, create task spec, run index-docs, or exit

**Output:** Summary confirmation (report generated, checklist created, config updated, ready status)

**See:** `references/templates.md` for complete summary formats

---

## Confidence Scoring Guidelines

- **High (85-100%):** Explicit in code/config, consistent patterns, no conflicts
- **Medium (70-84%):** Inferred from patterns, some inconsistencies, needs validation
- **Low (<70%):** Missing/unclear info, conflicts, high uncertainty, MUST review

**See:** `references/confidence-scoring.md` for detailed scoring methodology
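
The bands above map directly onto a small lookup. The function name `confidence_band` is hypothetical; the boundaries are the ones stated in this section.

```python
def confidence_band(score: float) -> str:
    """Map a 0-100 confidence score to its band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "High"
    if score >= 70:
        return "Medium"
    return "Low"  # MUST be reviewed by a human
```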

---

## Limitations

**Cannot:** Understand business logic without context, document deployment infrastructure, capture tribal knowledge, understand legacy decisions, document external integrations perfectly

**Requires:** Readable structured codebase, standard organization, supported language/framework, 10K-100K lines (optimal)

## Best Practices

- Run periodically (every 3-6 months)
- Always review low-confidence sections
- Supplement with manual docs (business context, deployment)
- Use as a starting point, then enhance with team input

## Integration with Planning Workflow

- **Brownfield:** Generate docs from code, then refine: document-project → index-docs → create-task-spec (using the generated docs)
- **Greenfield:** Write docs manually first

## References

Detailed documentation in `references/`:

- **templates.md**: All output formats, complete architecture/standards/patterns templates, analysis summaries, review checklists, error templates
- **analysis-techniques.md**: Analysis methods for structure, tech stack, models, APIs, patterns
- **pattern-detection.md**: Pattern identification and documentation techniques
- **documentation-templates.md**: Templates for architecture, standards, patterns docs
- **confidence-scoring.md**: Confidence calculation methodology and validation criteria
- **validation-criteria.md**: Project validation rules and sizing guidelines