---
name: file-categorization
description: Reusable logic for categorizing files as Command, Agent, Skill, or Documentation based on structure and content analysis
---

# File Categorization Skill

## When to Use This Skill

- Processing files in integration pipelines
- Scanning directories for file organization
- Auto-routing files to appropriate locations
- Generating file inventory reports
- Validating repository structure

## What This Skill Does

Analyzes file structure and content to accurately categorize files into:
- **Commands** - Slash command definitions
- **Agents** - Agent configuration files
- **Skills** - Reusable workflow automation
- **Documentation** - General markdown documentation
- **Other** - Uncategorized files requiring manual review

## Categorization Logic

### Step 1: Filename Pattern Matching

**Commands**:
- Filename matches `*-command.md` or `*command.md`
- Located in `.claude/commands/` directory
- Filename uses verb-noun pattern (e.g., `integration-scan.md`)

**Agents**:
- Filename matches `*-agent.md` or `*agent.md`
- Located in `agents-templates/` directory
- Contains role-based names (architect, builder, validator, etc.)

**Skills**:
- Filename is `SKILL.md` or `*-SKILL.md` or `*-skill.md`
- Located in `skills/*/` directories
- Contains workflow automation content

**Documentation**:
- Standard `.md` files
- Located in `docs/` directory
- Contains reference or tutorial content

### Step 2: Frontmatter Analysis

Read the YAML frontmatter (if present) to identify:

**Command Indicators**:
```yaml
---
description: "..."
allowed-tools: [...]
author: "..."
version: "X.Y"
---
```

**Skill Indicators**:
```yaml
---
name: skill-name
description: "..."
---
```

**Agent Indicators** (less structured, more prose):
```markdown
## Agent Identity
**Role**: [Agent Role]
**Version**: X.Y.Z
**Purpose**: [Purpose description]
```

### Step 3: Content Structure Analysis

**Commands have**:
- Workflow sections with numbered steps
- Bash command examples (prefixed with `!`)
- `allowed-tools` restrictions
- Usage examples

**Agents have**:
- Core Responsibilities section
- Allowed Tools and Permissions section
- Workflow Patterns section
- Context Management section

**Skills have**:
- "When to Use" section
- "What This Skill Does" section
- Step-by-step process descriptions
- Examples with real data

**Documentation has**:
- Standard markdown structure
- Tutorial or reference content
- No executable workflows
- Educational purpose

### Step 4: Keyword Detection

Scan content for category-specific keywords:

**Command Keywords**:
- `!bash`, `!git`, `!npm`, etc. (shell commands)
- "allowed-tools"
- "Usage:", "Workflow:", "Steps:"
- Command-line patterns

**Agent Keywords**:
- "Core Responsibilities"
- "Workflow Patterns"
- "Context Management"
- "Orchestrator", "Sub-Agent"
- "Handoff", "Delegation"

**Skill Keywords**:
- "When to Use"
- "What This Skill Does"
- "Skill" in self-references
- Reusable workflow language

**Documentation Keywords**:
- "Introduction", "Overview", "Guide"
- "Tutorial", "Reference", "Best Practices"
- Educational/explanatory language

## Categorization Algorithm

```
function categorizeFile(filePath, content):
  // Phase 1: Filename and location
  if filename matches command patterns OR in .claude/commands/:
    category = "Command"
    confidence = "High"

  else if filename == "SKILL.md" OR in skills/*/:
    category = "Skill"
    confidence = "High"

  else if in agents-templates/:
    category = "Agent"
    confidence = "High"

  else if in docs/:
    category = "Documentation"
    confidence = "Medium"

  // Phase 2: Frontmatter analysis (refine)
  frontmatter = extractYAML(content)
  if frontmatter contains "allowed-tools" AND "version":
    category = "Command"
    confidence = "High"

  else if frontmatter contains "name" (no allowed-tools):
    category = "Skill"
    confidence = "High"

  // Phase 3: Content structure (if still uncertain)
  if confidence != "High":
    if content contains "## Agent Identity":
      category = "Agent"
      confidence = "High"

    else if content contains "## When to Use":
      category = "Skill"
      confidence = "Medium"

    else if content contains "!bash" OR "!git":
      category = "Command"
      confidence = "Medium"

  // Phase 4: Fallback
  if category == null:
    category = "Other"
    confidence = "Low"
    reason = "Unable to determine category, manual review needed"

  return {category, confidence, reasoning}
```

## Output Format

For each categorized file, return:

```markdown
### [Filename]
- **Category**: [Command|Agent|Skill|Documentation|Other]
- **Confidence**: [High|Medium|Low]
- **Reasoning**: [Why this category was assigned]
- **Frontmatter**: [✅ Valid | ⚠️ Malformed | ❌ Missing]
- **Required Fields**: [List of found/missing fields]
- **Recommended Location**: [Target directory path]
```

## Example Usage

### Example 1: Categorizing Integration File

**Input**:
```
File: USING-GIT-WORKTREES-SKILL.md
Content:
---
name: using-git-worktrees
description: Creates isolated git worktrees...
---

# Using Git Worktrees

## When to Use
...
```

**Output**:
```markdown
### USING-GIT-WORKTREES-SKILL.md
- **Category**: Skill
- **Confidence**: High
- **Reasoning**: Filename matches skill pattern, frontmatter has 'name' field, content has "When to Use" section
- **Frontmatter**: ✅ Valid
- **Required Fields**: name ✅, description ✅
- **Recommended Location**: skills/using-git-worktrees/SKILL.md
```

### Example 2: Categorizing Command File

**Input**:
```
File: integration-scan.md
Content:
---
description: "Scan and categorize incoming files"
allowed-tools: ["Read", "Bash(find)"]
author: "Claude Command and Control"
version: "1.0"
---

# Integration Scan

## Purpose
...
```

**Output**:
```markdown
### integration-scan.md
- **Category**: Command
- **Confidence**: High
- **Reasoning**: Filename uses verb-noun pattern, frontmatter has 'allowed-tools' and 'version'
- **Frontmatter**: ✅ Valid
- **Required Fields**: description ✅, allowed-tools ✅, author ✅, version ✅
- **Recommended Location**: .claude/commands/integration-scan.md
```

### Example 3: Uncategorizable File

**Input**:
```
File: notes.md
Content:
# Random Notes

Some thoughts about the project...
```

**Output**:
```markdown
### notes.md
- **Category**: Other
- **Confidence**: Low
- **Reasoning**: No frontmatter, no structural indicators, generic content
- **Frontmatter**: ❌ Missing
- **Required Fields**: N/A
- **Recommended Location**: Manual review required
```

## Integration with Commands

### Used By

- `/integration-scan` - Primary categorization logic
- `/integration-process` - Determines target directory
- `/integration-validate` - Validates category-specific structure

### Usage Pattern

```markdown
# In integration-scan command

For each file in /INTEGRATION/incoming:
  1. Read file content
  2. Use file-categorization skill
  3. Extract category and confidence
  4. Include in scan report
  5. Mark for processing if High confidence
  6. Flag for review if Medium/Low confidence
```

## Category-Specific Validation Rules

### Commands
- ✅ MUST have: description, allowed-tools, author, version
- ✅ SHOULD have: workflow steps, usage examples
- ⚠️ Check: Tool permissions not overly broad

### Agents
- ✅ MUST have: Agent Identity, Core Responsibilities, Allowed Tools
- ✅ SHOULD have: Workflow Patterns, Context Management
- ⚠️ Check: Role clearly defined

### Skills
- ✅ MUST have: name, description, "When to Use"
- ✅ SHOULD have: Examples, step-by-step process
- ⚠️ Check: Examples use real data (not placeholders)

### Documentation
- ✅ MUST have: Clear title, structured content
- ✅ SHOULD have: Table of contents, cross-references
- ⚠️ Check: No executable workflows (should be in Command/Skill)

## Error Handling

### Malformed Frontmatter
```
Issue: YAML syntax error
Action: Note in categorization output
Category: "Other" with reason "Invalid frontmatter"
Recommendation: Fix YAML before processing
```

### Conflicting Indicators
```
Issue: Filename says "command" but structure says "skill"
Action: Confidence = "Medium"
Reasoning: "Filename and content indicators conflict"
Recommendation: Manual review
```

### Missing Content
```
Issue: File is empty or too short (<100 chars)
Action: Category = "Other"
Confidence: "Low"
Reasoning: "Insufficient content for categorization"
```

## Testing Recommendations

Test with:
1. **Typical files** - Standard commands, agents, skills
2. **Edge cases** - Mixed indicators, missing frontmatter
3. **Malformed files** - Syntax errors, incomplete content
4. **Ambiguous files** - Could fit multiple categories

Expected accuracy:
- **High confidence**: >95% correct
- **Medium confidence**: >80% correct
- **Low confidence**: Requires manual review

## Version History

**1.0** (2025-11-23)
- Initial file categorization skill
- Four-phase categorization algorithm
- Integration with scan/process commands
- Comprehensive validation rules

---

**Skill Status**: Production Ready
**Accuracy Target**: >95% for High confidence categorizations
**Dependencies**: None (standalone logic)