# OpenSpec Skill: spec-gen

> Reverse-engineer OpenSpec specifications from existing codebases

## Skill Metadata

```yaml
name: spec-gen
version: 1.0.0
description: Generate OpenSpec specifications by analyzing existing code
author: Clay Good
repository: https://github.com/clay-good/spec-gen
```

## Instructions

You are performing "code archaeology" — extracting the truth of what code does and documenting it as OpenSpec specifications.

### Philosophy

- **Archaeology over Creativity**: Document what the code ACTUALLY does, not what you imagine it should do
- **Evidence-based**: Every requirement and scenario must trace back to actual code
- **OpenSpec-native**: Output follows OpenSpec conventions exactly

### Process

#### Phase 1: Codebase Survey

First, understand the project structure:

1. **Identify project type** by checking for:
   - `package.json` → Node.js/TypeScript
   - `pyproject.toml` / `setup.py` → Python
   - `go.mod` → Go
   - `Cargo.toml` → Rust
   - `pom.xml` / `build.gradle` → Java
2. **Find high-value files** (prioritize these for analysis):
   - Schema/model files (entities, types, interfaces)
   - Service files (business logic)
   - Route/controller files (API surface)
   - Config files (settings, environment)
   - Entry points (main, index, app)
3. **Identify domains** by looking for:
   - Directory structure (`src/users/`, `src/orders/`, etc.)
   - File naming patterns (user-service, order-controller)
   - Import clusters (files that import each other heavily)
4. **Detect frameworks** from dependencies and patterns:
   - Express, NestJS, FastAPI, Django, etc.
   - Database: PostgreSQL, MongoDB, etc.
   - Auth: JWT, OAuth, etc.

#### Phase 2: Deep Analysis

For each identified domain, analyze the relevant files:

1. **Extract entities**:
   - What data structures exist?
   - What are their properties and types?
   - How do they relate to each other?
2. **Extract behaviors**:
   - What operations can be performed?
   - What are the business rules/validations?
   - What side effects occur (emails, payments, etc.)?
3. **Extract API surface** (if applicable):
   - What endpoints exist?
   - What are the request/response shapes?
   - What authentication is required?

#### Phase 3: Generate OpenSpec Specifications

Create the OpenSpec directory structure:

```
openspec/
├── config.yaml
└── specs/
    ├── overview/
    │   └── spec.md
    ├── {domain-1}/
    │   └── spec.md
    ├── {domain-2}/
    │   └── spec.md
    └── architecture/
        └── spec.md
```

##### Spec File Format

Each spec.md MUST follow this format:

```markdown
# {Domain} Specification

> Generated by spec-gen on {date}
> Source files: {list of files analyzed}

## Purpose

{2-3 sentences describing what this domain handles}

## Requirements

### Requirement: {RequirementName}

{The system SHALL/MUST/SHOULD do X...}

#### Scenario: {ScenarioName}

- **GIVEN** {precondition}
- **WHEN** {action}
- **THEN** {expected outcome}

## Technical Notes

- **Implementation**: `{file paths}`
- **Dependencies**: {related domains/services}
```

Use RFC 2119 keywords in requirement statements:

- **SHALL/MUST**: Required behavior
- **SHOULD**: Recommended behavior
- **MAY**: Optional behavior

##### Critical Formatting Rules

1. Requirements use RFC 2119 keywords (SHALL, MUST, SHOULD, MAY)
2. Scenarios use exactly 4 hashtags (`####`)
3. Scenarios follow Given/When/Then format with bold labels
4. No delta markers (ADDED, MODIFIED, REMOVED) — these are baseline specs

#### Phase 4: Update OpenSpec Config

If `openspec/config.yaml` exists, preserve existing content and add:

```yaml
# Existing content preserved...

# Auto-detected by spec-gen
spec-gen:
  generatedAt: "{timestamp}"
  domains:
    - {domain-1}
    - {domain-2}
```

If it doesn't exist, create a minimal config:

```yaml
schema: spec-driven
context: |
  {Brief project description based on analysis}
  Tech stack: {detected technologies}
  Architecture: {detected pattern}
```

#### Phase 5: Drift Detection

When specs already exist and code has changed, check for **spec drift** — divergence between the codebase and its specifications.
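As a minimal sketch of that comparison (the `declared_sources` and `detect_drift` helpers and their regex are illustrative assumptions, not part of the spec-gen CLI; only the `> Source files:` header format comes from the spec file template):

```python
import re
from pathlib import Path

def declared_sources(spec_path: Path) -> set[str]:
    """Pull the comma-separated file list out of a spec.md's `> Source files:` header."""
    for line in spec_path.read_text().splitlines():
        m = re.match(r">\s*Source files:\s*(.+)", line)
        if m:
            return {f.strip() for f in m.group(1).split(",")}
    return set()

def detect_drift(spec_path: Path, changed: set[str]) -> dict[str, set[str]]:
    """Compare a spec's declared sources against changed files.

    In practice `changed` would come from `git diff --name-only <base>`,
    filtered to exclude tests, lock files, and generated artifacts.
    """
    sources = declared_sources(spec_path)
    return {
        # Gap: code changed but the spec covering it may not have been updated
        "gap": sources & changed,
        # Orphaned: spec declares source files that no longer exist on disk
        "orphaned": {f for f in sources if not Path(f).exists()},
    }
```

A real implementation would also invert the mapping to find uncovered files (new source files no spec claims), but the set operations above are the core of the check.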
This keeps specs from going stale in multi-engineer environments.

**When to check:** Before committing code, when reviewing PRs, or when asked to validate specs are current.

**Process:**

1. **Identify what changed** — Use git to find added, modified, deleted, or renamed source files compared to the base branch. Filter out test files, generated files, lock files, static assets, and CI configs.
2. **Map changes to specs** — For each changed file, determine which spec domain covers it by checking:
   - `> Source files:` header in each `spec.md`
   - `**Implementation**:` references in Technical Notes
   - Directory structure inference (e.g., `src/auth/` → auth domain)
3. **Detect four categories of drift:**
   - **Gap**: Code changed but its spec was not updated
   - **Stale**: Spec references a deleted or renamed file
   - **Uncovered**: New source file has no matching spec domain
   - **Orphaned Spec**: Spec declares source files that no longer exist
4. **Report** issues with the affected file, domain, and a suggestion for resolution.

**CLI shorthand:** `spec-gen drift` runs this check. Use `spec-gen drift --install-hook` to add it as a git pre-commit hook.

### Output Checklist

Before finishing, verify:

- [ ] `openspec/specs/overview/spec.md` exists with system summary
- [ ] Each domain has `openspec/specs/{domain}/spec.md`
- [ ] `openspec/specs/architecture/spec.md` describes system structure
- [ ] All requirements use RFC 2119 keywords
- [ ] All scenarios use Given/When/Then format
- [ ] `openspec/config.yaml` is created or updated
- [ ] No spec drift — run `spec-gen drift` to verify specs match code

### Example Invocation

User: "Run spec-gen on this codebase"

You should:

1. Survey the codebase structure
2. Identify 3-6 core domains
3. Analyze high-value files in each domain
4. Generate spec.md files for each domain
5. Create overview and architecture specs
6. Update config.yaml
7. Run drift check to confirm specs are in sync

Report what you created and suggest next steps:

- `openspec validate --all` to check structure
- `spec-gen drift --install-hook` to catch future drift
- `openspec list --specs` to see generated specs
- Manual review and refinement of generated specs
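For reference, the Phase 3 directory layout and spec skeletons can be scaffolded mechanically before analysis fills them in. A minimal sketch (`scaffold_specs`, the template, and the placeholder values are illustrative assumptions, not the actual spec-gen implementation):

```python
from datetime import date
from pathlib import Path

# Skeleton matching the spec file format from Phase 3; placeholders are
# left for the analysis phases to fill in.
SPEC_TEMPLATE = """\
# {domain} Specification

> Generated by spec-gen on {day}
> Source files: {sources}

## Purpose

{purpose}

## Requirements
"""

def scaffold_specs(root: Path, domains: list[str]) -> list[Path]:
    """Create openspec/specs/{{overview, <domains>, architecture}}/spec.md skeletons."""
    written = []
    for domain in ["overview", *domains, "architecture"]:
        spec_dir = root / "openspec" / "specs" / domain
        spec_dir.mkdir(parents=True, exist_ok=True)
        spec = spec_dir / "spec.md"
        spec.write_text(SPEC_TEMPLATE.format(
            domain=domain.title(),
            day=date.today().isoformat(),
            sources="(pending analysis)",
            purpose="(pending analysis)",
        ))
        written.append(spec)
    return written
```

Keeping the overview and architecture specs in the same loop as the detected domains guarantees the checklist items above always have a file to check, even before deep analysis has run.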