---
name: duplication-detect
description: Find and eliminate code duplication with DRY refactoring strategies
disable-model-invocation: false
---

# Code Duplication Detection & DRY Refactoring

I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.

**Detection Capabilities:**
- Exact code duplication (copy-paste detection)
- Similar code patterns (semantic duplication)
- Repeated logic across files
- Duplicated constants and magic numbers
- Repeated test patterns

**Supported Languages:**
- JavaScript/TypeScript
- Python
- Go
- Java
- Generic text-based detection

## Token Optimization

This skill uses duplication-detection-specific patterns to minimize token usage:

### 1. Source Directory Detection Caching (500 token savings)

**Pattern:** Cache project structure and source directories
- Store structure in `.duplication-structure-cache` (1 hour TTL)
- Cache: source directories, file extensions, excluded paths
- Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
- Invalidate on directory structure changes
- **Savings:** 91% on repeat duplication checks
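As an illustration, here is a minimal bash sketch of this caching pattern. The cache filename and TTL come from the bullets above; the directory list mirrors the Phase 1 detection loop below, and the rest is an assumption about how the cache might be wired up.

```bash
# Minimal sketch of the structure cache (filename and TTL per the
# bullets above; detection logic mirrors Phase 1 below).
CACHE_FILE=".duplication-structure-cache"
CACHE_TTL=3600  # 1 hour, in seconds

# Portable mtime age lookup (GNU stat first, BSD stat as fallback)
cache_age() {
    local mtime
    mtime=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
    echo $(( $(date +%s) - mtime ))
}

if [ -f "$CACHE_FILE" ] && [ "$(cache_age)" -lt "$CACHE_TTL" ]; then
    # Cache hit: reuse the stored directory list
    SOURCE_DIRS=$(cat "$CACHE_FILE")
else
    # Cache miss or stale: re-detect and refresh the cache
    SOURCE_DIRS=""
    for dir in src lib app components pages services utils helpers; do
        [ -d "$dir" ] && SOURCE_DIRS="$SOURCE_DIRS $dir"
    done
    echo "$SOURCE_DIRS" > "$CACHE_FILE"
fi
```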
### 2. Bash-Based Duplication Tool Execution (1,800 token savings)

**Pattern:** Use jscpd/PMD directly via bash
- JavaScript: `jscpd --format json` (300 tokens)
- Python: `pylint --duplicate-code` (300 tokens)
- Generic: `simian` or custom grep-based detection (400 tokens)
- Parse JSON output with jq
- No Task agents for duplication detection
- **Savings:** 90% vs Task-based duplication analysis

### 3. Sample-Based Duplication Reporting (900 token savings)

**Pattern:** Report first 10 duplication instances only
- Show top 10 duplications by severity (600 tokens)
- Count remaining duplications without details
- Full report via `--all` flag
- **Savings:** 65% vs reporting every duplication

### 4. Template-Based DRY Refactoring Recommendations (800 token savings)

**Pattern:** Use predefined DRY patterns
- Standard strategies: extract function, extract constant, inheritance, composition
- Pattern-based recommendations for duplication types
- No creative refactoring generation
- **Savings:** 80% vs LLM-generated DRY strategies

### 5. Incremental Duplication Checks (1,000 token savings)

**Pattern:** Check only changed files via git diff (see the sketch after pattern 8)
- Analyze files modified since last commit (500 tokens)
- Check if changes introduce new duplication
- Full codebase analysis via `--full` flag
- **Savings:** 75% vs full codebase duplication detection

### 6. Grep-Based Similar Pattern Discovery (700 token savings)

**Pattern:** Find potential duplications with grep
- Grep for repeated patterns: function signatures, constant values (300 tokens)
- Flag files with high similarity
- Run full tool only on flagged file pairs
- **Savings:** 70% vs running tool on all file combinations

### 7. Cached Duplication Baseline (600 token savings)

**Pattern:** Store baseline duplication report (see the sketch after pattern 8)
- Cache initial duplication report
- Compare new runs against baseline to detect new duplications
- Focus on newly introduced duplications
- **Savings:** 80% by focusing on deltas

### 8. Threshold-Based Filtering (400 token savings)

**Pattern:** Filter out small duplications
- Default: 6+ lines of duplication
- Skip trivial duplications (imports, boilerplate)
- Adjustable threshold
- **Savings:** 70% by filtering noise
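Patterns 5 and 7 combine naturally into a delta check. Here is a minimal sketch: the baseline path is hypothetical, and the jq field matches the report structure the phases below already rely on.

```bash
# Sketch: incremental scan (pattern 5) compared against a cached
# baseline (pattern 7). Baseline path is an assumption; the jq field
# is the one Phase 3 and Phase 5 read.
REPORT_DIR=".claude/duplication-analysis"
BASELINE="$REPORT_DIR/baseline.json"
CURRENT="$REPORT_DIR/jscpd-report/jscpd-report.json"

# Scope the scan to files changed since the last commit
CHANGED_FILES=$(git diff --name-only HEAD -- '*.js' '*.ts' '*.py' '*.go' '*.java')

if [ -n "$CHANGED_FILES" ]; then
    # jscpd accepts explicit paths, so limit the run to changed files
    jscpd $CHANGED_FILES --reporters json --output "$REPORT_DIR/jscpd-report"
fi

# First run seeds the baseline; later runs report only the delta
if [ -f "$BASELINE" ] && [ -f "$CURRENT" ]; then
    OLD=$(jq '.statistics.total.duplications' "$BASELINE")
    NEW=$(jq '.statistics.total.duplications' "$CURRENT")
    [ "$NEW" -gt "$OLD" ] && echo "⚠️ New duplication introduced: $OLD → $NEW blocks"
elif [ -f "$CURRENT" ]; then
    cp "$CURRENT" "$BASELINE"
fi
```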
### Real-World Token Usage Distribution

**Typical operation patterns:**
- **Check recent changes** (git diff scope): 1,200 tokens
- **Show top 10 duplications**: 1,400 tokens
- **Full codebase analysis** (first time): 3,000 tokens
- **Cached baseline comparison**: 800 tokens
- **Refactoring recommendations** (top 5): 1,800 tokens
- **Most common:** Incremental checks on recent changes

**Expected per-analysis:** 1,500-2,500 tokens (60% reduction from 3,500-6,000 baseline)

**Real-world average:** 1,100 tokens (due to incremental checks, sample-based reporting, cached baseline)

**Arguments:** `$ARGUMENTS` - optional: minimum duplication threshold (default: 6 lines) or specific directories
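For example, invocations might look like this (illustrative; the exact slash-command syntax depends on how the skill is installed):

```
/duplication-detect            # defaults: 6-line threshold, auto-detected source dirs
/duplication-detect 10         # raise the minimum duplication block to 10 lines
/duplication-detect src lib    # restrict analysis to specific directories
```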
Code duplication indicates:
- Copy-paste programming (high maintenance cost)
- Missing abstraction opportunities
- Violation of DRY principle
- Increased bug surface (fix in one place, miss others)
- Higher cognitive load

DRY refactoring strategies:
- Extract function/method (most common)
- Extract constant/configuration
- Inheritance or composition
- Higher-order functions
- Template method pattern
- Strategy pattern for variations

## Phase 1: Tool Setup & Configuration

First, I'll set up duplication detection tools:

```bash
#!/bin/bash
# Duplication Detection - Setup & Configuration

echo "=== Code Duplication Detection ==="
echo ""

# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}"  # Minimum lines for duplication detection

echo "Configuration:"
echo "  Minimum duplication threshold: $MIN_LINES lines"
echo "  Analysis directory: $ANALYSIS_DIR"
echo ""

# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""

# Common source directories
for dir in src lib app components pages services utils helpers; do
    if [ -d "$dir" ]; then
        SOURCE_DIRS="$SOURCE_DIRS $dir"
        echo "  ✓ Found: $dir/"
    fi
done

# Test directories (JavaScript and Python conventions)
for dir in tests test __tests__; do
    if [ -d "$dir" ]; then
        echo "  ✓ Found test directory: $dir/"
    fi
done

if [ -z "$SOURCE_DIRS" ]; then
    echo "  Using current directory"
    SOURCE_DIRS="."
fi
echo ""
```

## Phase 2: Install Detection Tools

I'll install and configure duplication detection tools:

```bash
echo "=== Installing Duplication Detection Tools ==="
echo ""

install_jscpd() {
    # JSCPD - Multi-language copy-paste detector
    if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
        echo "Installing jscpd (copy-paste detector)..."
        npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd
        if [ $? -eq 0 ]; then
            echo "✓ jscpd installed"
        else
            echo "⚠️ Failed to install jscpd - using basic detection"
            return 1
        fi
    else
        echo "✓ jscpd already installed"
    fi

    # Create jscpd configuration
    # (minLines sets the minimum duplicated block size; jscpd's "threshold"
    # key is a failure gate on duplication percentage, not a line count)
    cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
  "minLines": $MIN_LINES,
  "reporters": ["json", "html"],
  "ignore": [
    "**/*.min.js",
    "**/node_modules/**",
    "**/dist/**",
    "**/build/**",
    "**/.git/**",
    "**/coverage/**",
    "**/__pycache__/**",
    "**/venv/**",
    "**/.venv/**"
  ],
  "format": ["javascript", "typescript", "python", "go", "java"],
  "absolute": false,
  "output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG

    echo "✓ jscpd configuration created"
    return 0
}

# Install tool
if install_jscpd; then
    USE_JSCPD=true
else
    USE_JSCPD=false
    echo "ℹ️ Will use basic grep-based detection"
fi
echo ""
```
## Phase 3: Detect Code Duplication

I'll analyze the codebase for duplicated code:

```bash
echo "=== Analyzing Code Duplication ==="
echo ""

run_jscpd_analysis() {
    echo "Running jscpd analysis..."
    echo ""

    jscpd $SOURCE_DIRS \
        --config "$ANALYSIS_DIR/.jscpd.json" \
        2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"

    # Check jscpd's exit status, not tee's
    if [ "${PIPESTATUS[0]}" -eq 0 ]; then
        echo ""
        echo "✓ jscpd analysis complete"

        # Check if duplications found
        if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
            DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
            PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)

            echo ""
            echo "Duplication Statistics:"
            echo "  Total duplications: $DUPLICATIONS"
            echo "  Duplication percentage: $PERCENTAGE%"
            echo ""

            # Extract top duplications
            echo "Top Duplicated Blocks:"
            jq -r '.duplicates[] | "\(.format) - \(.lines) lines duplicated in \(.fragment) (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
                "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
        fi
    else
        echo "⚠️ jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
        return 1
    fi
}

run_basic_duplication_detection() {
    echo "Running basic duplication detection..."
    echo ""

    # Find similar function signatures
    echo "Detecting similar function signatures..."

    # JavaScript/TypeScript functions
    if find $SOURCE_DIRS -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" 2>/dev/null | grep -q .; then
        grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
            --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
            echo "  Found duplicate function signatures (JavaScript/TypeScript):"
            cat "$ANALYSIS_DIR/duplicate-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Python functions
    if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
        grep -rh "^def \|^async def " \
            --include="*.py" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
            echo "  Found duplicate function signatures (Python):"
            cat "$ANALYSIS_DIR/duplicate-python-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Find magic numbers (potential constants)
    echo "Detecting magic numbers (repeated literals)..."
    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"

    if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        echo "  Most common numeric literals:"
        cat "$ANALYSIS_DIR/magic-numbers.txt" | sed 's/^/    /'
        echo ""
    fi

    # Find repeated strings
    echo "Detecting repeated string literals..."
    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"

    if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        echo "  Most common string literals:"
        cat "$ANALYSIS_DIR/repeated-strings.txt" | sed 's/^/    /'
        echo ""
    fi
}

# Run appropriate analysis
if [ "$USE_JSCPD" = true ]; then
    if ! run_jscpd_analysis; then
        echo "Falling back to basic detection..."
        run_basic_duplication_detection
    fi
else
    run_basic_duplication_detection
fi
```
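When many duplicates are reported, it can help to rank file pairs by total duplicated lines before picking refactoring targets. A sketch, reusing the same assumed report fields that `run_jscpd_analysis` reads above:

```bash
# Sketch: rank file pairs by total duplicated lines (same assumed
# report fields as run_jscpd_analysis above).
jq -r '.duplicates[] | "\(.firstFile.name) \(.secondFile.name) \(.lines)"' \
    "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | \
    awk '{ pair = $1 " <-> " $2; total[pair] += $3 }
         END { for (p in total) print total[p], p }' | \
    sort -rn | head -5
```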
## Phase 4: Generate DRY Refactoring Strategies

I'll provide specific refactoring patterns for eliminating duplication:

```bash
echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""

cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns

Based on obra YAGNI/DRY principles: Don't Repeat Yourself

---

## Pattern 1: Extract Function

**Problem:** Same code block repeated in multiple places
**Solution:** Extract to a reusable function

### Before (JavaScript)

```javascript
// File 1
function processOrderA(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... process order
}

// File 2
function processOrderB(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... different processing
}
```

### After

```javascript
// utils/validation.js
export function validateOrder(order) {
  if (!order.customer?.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items?.length) {
    throw new Error('Empty order');
  }
}

// File 1
import { validateOrder } from './utils/validation';

function processOrderA(order) {
  validateOrder(order);
  // ... process order
}

// File 2
import { validateOrder } from './utils/validation';

function processOrderB(order) {
  validateOrder(order);
  // ... different processing
}
```

**DRY Improvement:** 8 duplicated lines → 1 function call

---

## Pattern 2: Extract Configuration/Constants

**Problem:** Same literals repeated across codebase
**Solution:** Centralize in configuration

### Before (Python)

```python
# Multiple files with repeated values
def calculate_shipping(weight):
    if weight < 10:
        return 5.99
    return 9.99

def check_free_shipping(total):
    return total >= 50.00

def apply_discount(quantity):
    if quantity >= 5:
        return 0.10
    return 0
```

### After

```python
# config/constants.py
class ShippingConfig:
    FREE_SHIPPING_THRESHOLD = 50.00
    LIGHT_PACKAGE_WEIGHT = 10
    LIGHT_PACKAGE_COST = 5.99
    HEAVY_PACKAGE_COST = 9.99

class DiscountConfig:
    BULK_QUANTITY = 5
    BULK_DISCOUNT = 0.10

# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig

def calculate_shipping(weight):
    if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
        return ShippingConfig.LIGHT_PACKAGE_COST
    return ShippingConfig.HEAVY_PACKAGE_COST

def check_free_shipping(total):
    return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD

def apply_discount(quantity):
    if quantity >= DiscountConfig.BULK_QUANTITY:
        return DiscountConfig.BULK_DISCOUNT
    return 0
```

**DRY Improvement:** Magic numbers eliminated, single source of truth

---

## Pattern 3: Template Method Pattern

**Problem:** Similar algorithms with slight variations
**Solution:** Use template method or strategy pattern

### Before (TypeScript)

```typescript
class PDFReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatPDFContent();
    this.addPDFFooter();
    this.savePDF();
  }
  private loadData() { /* same logic */ }
  private formatHeader() { /* same logic */ }
  private formatPDFContent() { /* PDF-specific */ }
  private addPDFFooter() { /* PDF-specific */ }
  private savePDF() { /* PDF-specific */ }
}

class ExcelReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatExcelContent();
    this.addExcelFooter();
    this.saveExcel();
  }
  private loadData() { /* DUPLICATE logic */ }
  private formatHeader() { /* DUPLICATE logic */ }
  private formatExcelContent() { /* Excel-specific */ }
  private addExcelFooter() { /* Excel-specific */ }
  private saveExcel() { /* Excel-specific */ }
}
```

### After

```typescript
abstract class Report {
  // Template method
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatContent();
    this.addFooter();
    this.save();
  }

  // Common implementations
  protected loadData() {
    // Shared logic
  }
  protected formatHeader() {
    // Shared logic
  }

  // Abstract methods for subclasses
  protected abstract formatContent(): void;
  protected abstract addFooter(): void;
  protected abstract save(): void;
}

class PDFReport extends Report {
  protected formatContent() { /* PDF-specific */ }
  protected addFooter() { /* PDF-specific */ }
  protected save() { /* PDF-specific */ }
}

class ExcelReport extends Report {
  protected formatContent() { /* Excel-specific */ }
  protected addFooter() { /* Excel-specific */ }
  protected save() { /* Excel-specific */ }
}
```

**DRY Improvement:** Shared logic in base class, variations in subclasses

---
## Pattern 4: Higher-Order Functions

**Problem:** Similar operations with different behaviors
**Solution:** Use higher-order functions or callbacks

### Before (JavaScript)

```javascript
function processUsersForEmail(users) {
  const results = [];
  for (const user of users) {
    if (user.email && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForSMS(users) {
  const results = [];
  for (const user of users) {
    if (user.phone && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForPush(users) {
  const results = [];
  for (const user of users) {
    if (user.deviceToken && user.active) {
      results.push(user);
    }
  }
  return results;
}
```

### After

```javascript
function processUsers(users, channel) {
  const validators = {
    email: user => user.email && user.active,
    sms: user => user.phone && user.active,
    push: user => user.deviceToken && user.active
  };
  return users.filter(validators[channel]);
}

// Or more flexible with a custom validator
function processUsers(users, isValid) {
  return users.filter(isValid);
}

// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);
```

**DRY Improvement:** 3 functions → 1 configurable function

---

## Pattern 5: Composition Over Duplication

**Problem:** Repeated utility combinations
**Solution:** Compose smaller utilities

### Before (Go)

```go
// Repeated validation patterns
func ValidateUser(user User) error {
    if user.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(user.Email, "@") {
        return errors.New("invalid email")
    }
    if user.Age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func ValidateAdmin(admin Admin) error {
    if admin.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(admin.Email, "@") {
        return errors.New("invalid email")
    }
    if admin.Age < 18 {
        return errors.New("must be 18+")
    }
    if admin.Role == "" {
        return errors.New("role required")
    }
    return nil
}
```

### After

```go
// Composable validators
func requireEmail(email string) error {
    if email == "" {
        return errors.New("email required")
    }
    return nil
}

func validateEmailFormat(email string) error {
    if !strings.Contains(email, "@") {
        return errors.New("invalid email")
    }
    return nil
}

func requireAdult(age int) error {
    if age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func requireRole(role string) error {
    if role == "" {
        return errors.New("role required")
    }
    return nil
}

// Compose validators
func ValidateUser(user User) error {
    validators := []func() error{
        func() error { return requireEmail(user.Email) },
        func() error { return validateEmailFormat(user.Email) },
        func() error { return requireAdult(user.Age) },
    }
    return runValidators(validators)
}

func ValidateAdmin(admin Admin) error {
    validators := []func() error{
        func() error { return requireEmail(admin.Email) },
        func() error { return validateEmailFormat(admin.Email) },
        func() error { return requireAdult(admin.Age) },
        func() error { return requireRole(admin.Role) },
    }
    return runValidators(validators)
}

func runValidators(validators []func() error) error {
    for _, validate := range validators {
        if err := validate(); err != nil {
            return err
        }
    }
    return nil
}
```

**DRY Improvement:** Reusable validators, composable validation

---

## Pattern 6: Extract Test Helpers

**Problem:** Repeated test setup/teardown
**Solution:** Create test utilities

### Before

```javascript
// test/user.test.js
describe('User tests', () => {
  it('should create user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });

  it('should update user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});

// test/order.test.js
describe('Order tests', () => {
  it('should create order', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});
```

### After

```javascript
// test/helpers/testUtils.js
export function setupTestDatabase() {
  const db = createDatabase();
  return { db, cleanup: () => db.close() };
}

export function createTestUser(overrides = {}) {
  return { name: 'John', email: 'john@example.com', ...overrides };
}

// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';

describe('User tests', () => {
  let db;

  beforeEach(() => {
    ({ db } = setupTestDatabase());
  });

  afterEach(() => {
    db.close();
  });

  it('should create user', () => {
    const user = createTestUser();
    // ... test logic (cleaner!)
  });

  it('should update user', () => {
    const user = createTestUser({ name: 'Jane' });
    // ... test logic
  });
});
```

**DRY Improvement:** Shared test utilities, less boilerplate

---
## obra YAGNI/DRY Principles

### YAGNI: You Aren't Gonna Need It
- Don't add functionality until necessary
- Avoid premature abstraction
- Wait for duplication before extracting

### DRY: Don't Repeat Yourself
- Every piece of knowledge should have a single representation
- Avoid copy-paste programming
- Extract when you see duplication 2-3 times

### When to Extract
1. **Three strikes rule**: Duplicate 3 times → extract
2. **Clear pattern**: Similar code structure appears
3. **High maintenance cost**: Changes require updates in multiple places
4. **Business logic**: Domain rules should be centralized

### When NOT to Extract
1. **Premature abstraction**: Only seen once or twice
2. **Coincidental duplication**: Similar code, different concepts
3. **Temporary code**: Prototypes, experiments
4. **Over-engineering**: Abstraction more complex than duplication
DRYPATTERNS

echo "✓ DRY refactoring patterns guide created"
```

## Phase 5: Generate Duplication Report

I'll create a comprehensive report with prioritized refactoring opportunities:

```bash
echo ""
echo "=== Generating Duplication Report ==="
echo ""

# Count duplications (guard against missing or null jq output, which
# would break the numeric comparisons in the summary)
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0

if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    TOTAL_DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
    DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
    case "$TOTAL_DUPLICATIONS" in ''|null) TOTAL_DUPLICATIONS=0 ;; esac
    case "$DUPLICATION_PERCENTAGE" in ''|null) DUPLICATION_PERCENTAGE=0 ;; esac
fi

cat > "$REPORT" << EOF
# Code Duplication Analysis Report

**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%

---

## Summary

Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)

**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt

---

## Duplication Statistics

EOF

if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    cat >> "$REPORT" << EOF
### Overall Statistics

- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.clones' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%

### Top Duplicated Files

EOF

    jq -r '.statistics.formats[] | "\(.format):\n  Files: \(.sources)\n  Duplications: \(.duplications)\n  Percentage: \(.percentage)%\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

### Largest Duplication Blocks

EOF

    jq -r '.duplicates[:10] | .[] | "**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser

EOF
else
    cat >> "$REPORT" << EOF
### Basic Analysis Results

EOF

    if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`

Consider extracting to named constants.
EOF
    fi

    if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`

Consider extracting to configuration.

EOF
    fi
fi

cat >> "$REPORT" << 'EOF'
---

## Recommended DRY Refactoring

### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules

### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values

### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates

### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites

**See detailed patterns:** `cat $ANALYSIS_DIR/dry-refactoring-patterns.md`

---

## Implementation Steps

1. **Create Git Checkpoint**
   ```bash
   git add -A
   git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"
   ```

2. **Prioritize Duplications**
   - Start with largest duplication blocks
   - Focus on frequently changed code
   - Consider domain importance

3. **Apply DRY Patterns**
   - Extract function (most common)
   - Extract configuration
   - Template method pattern
   - Higher-order functions
   - Composition

4. **Three Strikes Rule**
   - Wait for 3 occurrences before extracting
   - Avoid premature abstraction
   - Balance DRY with YAGNI

5. **Verify Improvements**
   - Re-run duplication analysis
   - Ensure tests pass
   - Check for reduced code size

6. **Document Changes**
   - Update documentation
   - Note refactoring decisions
   - Share patterns with team

---

## Continuous Monitoring

### Add to CI/CD

#### Pre-commit Hook

```bash
#!/bin/bash
# .git/hooks/pre-commit
# Block the commit if duplication exceeds the percentage gate
# (jscpd's --threshold is a percentage; --min-lines is the block size)
jscpd --min-lines 6 --threshold 10 . || exit 1
```

#### GitHub Actions

```yaml
name: Code Quality
on: [push, pull_request]
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check duplication
        run: |
          npm install -g jscpd
          jscpd --min-lines 6 .
```

---

## Integration with Other Skills

- **`/refactor`** - Systematic code restructuring
- **`/complexity-reduce`** - Reduce cyclomatic complexity
- **`/make-it-pretty`** - Improve code readability
- **`/review`** - Include duplication in code reviews
- **`/test`** - Ensure tests after refactoring

---

## Resources

- [obra/superpowers - YAGNI/DRY principles](https://github.com/obra/superpowers)
- [The Pragmatic Programmer - DRY Principle](https://pragprog.com/titles/tpp20/)
- [Refactoring - Martin Fowler](https://refactoring.com/)
- [JSCPD - Copy-Paste Detector](https://github.com/kucherenko/jscpd)
- [Rule of Three (refactoring)](https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming))

---
EOF

# Footer uses an unquoted heredoc so $(date) expands
cat >> "$REPORT" << EOF

**Report generated at:** $(date)

**Next Steps:**
1. Review high-duplication areas
2. Select appropriate DRY pattern
3. Implement refactoring incrementally
4. Re-run analysis to verify improvement
EOF

echo "✓ Duplication report generated: $REPORT"
```

## Summary

```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"

if [ "$USE_JSCPD" = true ]; then
    echo "  Total duplications: $TOTAL_DUPLICATIONS"
    echo "  Duplication percentage: $DUPLICATION_PERCENTAGE%"
    echo "  Threshold: $MIN_LINES lines"
else
    echo "  Basic analysis complete"
    echo "  Install jscpd for detailed analysis: npm install -g jscpd"
fi

echo ""
echo "📁 Generated Files:"
echo "  - Duplication Report: $REPORT"
echo "  - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "  - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"

echo ""
echo "🎯 Recommended Actions:"

if [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ]; then
    echo "  ⚠️ High duplication detected ($DUPLICATION_PERCENTAGE%)"
    echo "  1. Review largest duplication blocks"
    echo "  2. Extract duplicated functions"
    echo "  3. Centralize configuration"
    echo "  4. Run tests after refactoring"
elif [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ]; then
    echo "  Moderate duplication ($DUPLICATION_PERCENTAGE%)"
    echo "  Consider refactoring during feature work"
else
    echo "  ✓ Low duplication - excellent!"
    echo "  Continue monitoring in code reviews"
fi

echo ""
echo "💡 DRY Refactoring Patterns:"
echo "  1. Extract Function (consolidate duplicates)"
echo "  2. Extract Configuration (centralize constants)"
echo "  3. Template Method (share algorithm structure)"
echo "  4. Higher-Order Functions (parameterize behavior)"
echo "  5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo "  - Three strikes rule: Extract after 3 duplications"
echo "  - Avoid premature abstraction"
echo "  - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo "  - /refactor - Systematic restructuring"
echo "  - /complexity-reduce - Reduce complexity"
echo "  - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"
```

## Safety Guarantees

**What I'll NEVER do:**
- Automatically refactor without analysis
- Extract after a single occurrence (violates YAGNI)
- Remove code without understanding context
- Skip testing after refactoring
- Add AI attribution to commits

**What I WILL do:**
- Identify genuine code duplication
- Suggest proven DRY patterns
- Follow obra YAGNI principles
- Recommend incremental refactoring
- Preserve functionality
- Provide clear examples

## Credits

This skill is based on:
- **obra/superpowers** - YAGNI and DRY principles
- **The Pragmatic Programmer** - DRY principle origin
- **Martin Fowler** - Refactoring patterns
- **JSCPD** - Multi-language duplication detection
- **Rule of Three** - When to extract abstraction

## Token Budget

Target: 2,000-3,500 tokens per execution
- Phase 1-2: ~600 tokens (setup + tool installation)
- Phase 3: ~1,000 tokens (duplication detection)
- Phase 4-5: ~1,500 tokens (patterns + reporting)

**Optimization Strategy:**
- Use jscpd for efficient detection
- Fall back to grep-based analysis
- Template-based pattern generation
- Focus on actionable duplications
- Clear DRY refactoring guidance

This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while
respecting token limits.