--- name: map-reduce description: | Orchestration pattern for large-scale processing where work is distributed, processed independently, then combined. Map phase splits and processes, Reduce phase aggregates results. Use for codebase-wide analysis, bulk transformations, or any task where "do X to everything, then summarize." allowed-tools: | bash: ls, cat, grep, find file: read, write mcp: task --- # Map-Reduce Some tasks are too large for single-pass processing: analyze every file in a codebase, check all endpoints for issues, transform all components to new pattern. Map-Reduce handles this: split the work (map), process in parallel, combine results (reduce). It's batch processing for Claude. ## When To Activate Trigger when: - Task involves "all files" or "every X" - Processing a large number of similar items - Need aggregate statistics or summaries across many items - Transformation must be applied uniformly - Analysis must cover entire codebase/dataset Do NOT trigger for: - Small number of items (just do them directly) - Items requiring different processing logic - When intermediate results affect how later items are processed ## The Pattern ``` MAP PHASE REDUCE PHASE ┌────────────┐ │ Process A │──→ Result A ─┐ [All Items] ├────────────┤ │ ┌─────────────┐ │ │ Process B │──→ Result B ─┼──→ │ Aggregate │──→ [Final] │ ├────────────┤ │ └─────────────┘ └──Split─│ Process C │──→ Result C ─┘ ├────────────┤ │ Process D │──→ Result D ─┘ └────────────┘ ``` ## Instructions ### Step 1: Define the Job Specify what you're processing: ``` Map-Reduce Job: [Name] Input: [What collection of items] Map function: [What to do to each item] Reduce function: [How to combine results] Expected output: [What the final result looks like] ``` ### Step 2: Enumerate Items (Map Input) List everything to process: ``` Items to process: 1. [Item 1] - [brief description] 2. [Item 2] - [brief description] 3. [Item 3] - [brief description] ... Total: [N] items ``` For large sets, use patterns: ``` Items: All files matching src/**/*.ts Count: ~150 files Batching: Groups of 10 ``` ### Step 3: Execute Map Phase Process each item (parallel when possible): ``` ═══════════════════════════════════════ MAP PHASE: Processing [N] items ═══════════════════════════════════════ Batch 1 (items 1-10): [Processing...] - Item 1: [Result] - Item 2: [Result] ... Batch 2 (items 11-20): [Processing...] ... Map phase complete: [N] items processed - Succeeded: [X] - Failed: [Y] - Skipped: [Z] ``` ### Step 4: Collect Intermediate Results Gather all map outputs: ``` Intermediate results: ┌──────────┬─────────────────────────────┐ │ Item │ Map Result │ ├──────────┼─────────────────────────────┤ │ Item 1 │ [Result summary] │ │ Item 2 │ [Result summary] │ │ ... │ ... │ └──────────┴─────────────────────────────┘ ``` ### Step 5: Execute Reduce Phase Aggregate results: ``` ═══════════════════════════════════════ REDUCE PHASE: Aggregating results ═══════════════════════════════════════ Reduction strategy: [How combining] Aggregating... Categories identified: - Category A: [N] items - Category B: [M] items Statistics: - Total processed: [X] - Issues found: [Y] - Patterns detected: [Z] ``` ### Step 6: Produce Final Output Present combined results: ``` ═══════════════════════════════════════ MAP-REDUCE COMPLETE: [Job Name] ═══════════════════════════════════════ ## Summary [High-level findings] ## Statistics - Items processed: [N] - [Metric 1]: [Value] - [Metric 2]: [Value] ## Categories/Groups [Breakdown by category] ## Notable Items [Specific items worth highlighting] ## Recommendations [Actions based on findings] ``` ## Batching Strategies **By count:** ``` Items: 150 files Batch size: 10 Batches: 15 ``` **By directory:** ``` Items: All TypeScript files Batches: src/api/*, src/components/*, src/utils/* ``` **By type:** ``` Items: All source files Batches: *.ts, *.tsx, *.css ``` **Adaptive:** ``` Start with batch of 20 If too slow → reduce to 10 If fast → increase to 30 ``` ## Reduce Functions **Count:** Total items matching criteria **Sum:** Aggregate numeric values **Group:** Categorize by attribute **Filter:** Surface items matching condition **Top-N:** Highest/lowest by metric **Histogram:** Distribution across buckets **Merge:** Combine into single artifact ## NEVER - Process items that depend on other items' results (use pipeline) - Ignore failed items - always report what couldn't be processed - Reduce without the map phase (need individual results first) - Skip batching for large sets (will timeout/overwhelm) - Assume uniform results - expect variation ## ALWAYS - Define map and reduce functions before starting - Batch appropriately for the workload - Track success/failure per item - Produce actionable summary, not just raw data - Include "notable items" - outliers worth attention ## Examples ### Example 1: Security Audit User: "Check all API endpoints for authentication issues" ``` Map-Reduce Job: API Security Audit Input: All files in src/api/**/*.ts Map function: Check each file for auth patterns Reduce function: Group by issue severity ═══════════════════════════════════════ MAP PHASE: Processing 23 files ═══════════════════════════════════════ - users.ts: [Missing rate limit on /login] - orders.ts: [No auth on /history endpoint] - products.ts: [Clean] - admin.ts: [Deprecated auth method] ... ═══════════════════════════════════════ REDUCE PHASE: Aggregating results ═══════════════════════════════════════ By severity: - Critical: 2 files (orders.ts, payments.ts) - Warning: 5 files - Clean: 16 files ═══════════════════════════════════════ MAP-REDUCE COMPLETE: API Security Audit ═══════════════════════════════════════ Summary: 7 of 23 endpoints have auth issues Critical (fix immediately): 1. orders.ts:45 - GET /history has no auth check 2. payments.ts:23 - POST /refund missing admin check Recommendations: 1. Add auth middleware to orders router 2. Implement admin check on payments ``` ### Example 2: Codebase Metrics User: "How complex is our codebase? Get metrics on all files." ``` Map-Reduce Job: Codebase Complexity Analysis Input: All source files Map function: Count lines, functions, cyclomatic complexity Reduce function: Aggregate statistics, find outliers ═══════════════════════════════════════ MAP PHASE: Processing 234 files ═══════════════════════════════════════ [Batched processing of all files...] ═══════════════════════════════════════ REDUCE PHASE: Aggregating results ═══════════════════════════════════════ Totals: - Lines of code: 45,230 - Functions: 1,847 - Average complexity: 4.2 Distribution: - Low complexity (<5): 78% - Medium (5-10): 18% - High (>10): 4% ═══════════════════════════════════════ MAP-REDUCE COMPLETE ═══════════════════════════════════════ Notable outliers (high complexity): 1. src/utils/parser.ts - complexity 23 2. src/api/legacy/converter.ts - complexity 19 Recommendations: Consider refactoring top 5 complex files. ``` ### Example 3: Bulk Transformation User: "Convert all class components to functional components" ``` Map-Reduce Job: Class → Functional Conversion Input: All React component files Map function: Convert class to functional if applicable Reduce function: Track conversions, summarize changes ═══════════════════════════════════════ MAP PHASE ═══════════════════════════════════════ - Header.tsx: Converted (was class) - Button.tsx: Already functional (skipped) - Modal.tsx: Converted (was class) - LegacyForm.tsx: Cannot convert (uses getDerivedStateFromProps) ... ═══════════════════════════════════════ REDUCE PHASE ═══════════════════════════════════════ Summary: - Converted: 34 components - Already functional: 56 components - Cannot convert: 3 components - Failed: 1 component Changes made to 34 files. ``` What DOESN'T work: - No batching on 500+ files: Timeout, context overflow - Vague map function: "Check for issues" → inconsistent results - No reduce strategy: End up with 200 disconnected bullet points - Processing order-dependent items: Results inconsistent - Ignoring failures: Miss important edge cases ## Why This Elixir Exists "Check everything" is easy to say, hard to do well. Without structure, you get incomplete coverage, inconsistent analysis, and no useful summary. Map-Reduce brings discipline to bulk operations: every item processed uniformly, failures tracked, results aggregated meaningfully. It's the difference between "I looked at some files" and "I analyzed all 234 files, here's what I found." Scale requires structure. This is that structure.