---
name: maker-methodology
description: Apply MAKER (Massively Decomposed Agentic Processes) to solve long sequential tasks using task decomposition, multi-agent voting, and error correction. Use when facing complex multi-step problems, sequential planning, constraint satisfaction, or tasks requiring many consecutive decisions.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob
---

# MAKER Methodology

Solve million-step tasks with zero errors using Massively Decomposed Agentic Processes.

Based on: ["Solving a Million-Step LLM Task with Zero Errors"](https://arxiv.org/html/2511.09030v1)

## Core Principles

### 1. Maximal Agentic Decomposition (MAD)
Break complex tasks into **minimal single-step subtasks**, not monolithic solutions.

**Instead of**: "Generate entire solution"
**Do**: "Determine next single step" repeated N times

### 2. First-to-Ahead-by-k Voting
Use multiple independent agents voting on each step:
- Continue sampling until one option leads by **k votes**
- **k grows logarithmically** with the number of steps s: k = Θ(ln s)
- Prevents error propagation through consensus
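
To see why a logarithmic margin can suffice, consider a simplified two-option model (an assumption made here for illustration, not the paper's full analysis): every valid vote is independently correct with probability p > 0.5. The vote race is then a gambler's-ruin walk, and the correct option reaches a k-vote lead first with probability p^k / (p^k + (1-p)^k). The sketch below searches for the smallest k that keeps an entire s-step run above a target success rate; `step_success` and `min_k` are hypothetical helper names.

```python
def step_success(p: float, k: int) -> float:
    """P(correct option reaches a k-vote lead first) in a two-option race."""
    q = 1.0 - p
    return p**k / (p**k + q**k)


def min_k(p: float, steps: int, target: float = 0.95) -> int:
    """Smallest k for which all `steps` votes succeed with probability >= target."""
    if not 0.5 < p < 1.0:
        raise ValueError("p must be in (0.5, 1): voters must beat chance")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    k = 1
    while step_success(p, k) ** steps < target:
        k += 1
    return k


for s in (1_000, 1_000_000):
    print(s, min_k(0.9, s))
# 1000 5
# 1000000 8
```

Under these assumptions, a thousandfold increase in steps raises the required margin only from 5 to 8. Real agents can make correlated errors, so treat the result as a lower bound to validate empirically.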

### 3. Red-Flagging
Detect and discard unreliable responses:
- Check length (too short/long)
- Validate format
- Detect failure patterns
- Domain-specific validation

## When to Use MAKER

### ✅ Good Fit
- [ ] Task has **>10 sequential steps**
- [ ] Each step has **enumerable options**
- [ ] **State is trackable** between steps
- [ ] **Progress is measurable**
- [ ] Intermediate states are **verifiable**
- [ ] Single sophisticated approach struggles

### ❌ Poor Fit
- Creative/open-ended generation
- Requires holistic understanding
- Continuous optimization
- Tasks completing in <10 steps
- Highly parallel tasks (order doesn't matter)

## Task Types MAKER Excels At

1. **Constraint Satisfaction**: Sudoku, scheduling, resource allocation
2. **Sequential Planning**: Route planning, multi-step refactoring
3. **Code Generation**: Multi-file implementation, test generation
4. **Mathematical Reasoning**: Proof construction, equation solving
5. **Data Pipelines**: ETL workflows, data cleaning sequences

## Implementation Steps

### Step 1: Define Task Interface

Every MAKER task needs these components:

```python
from typing import Any, List

# Placeholder types; substitute your task's own state and action classes
State = Any
Action = Any


class YourTask:
    def get_current_state(self) -> State:
        """Return current task state."""
        raise NotImplementedError

    def get_possible_actions(self) -> List[Action]:
        """Return valid actions from current state."""
        raise NotImplementedError

    def apply_action(self, action: Action) -> bool:
        """Apply action and update state. Return success."""
        raise NotImplementedError

    def is_complete(self) -> bool:
        """Check if task is finished."""
        raise NotImplementedError

    def get_progress(self) -> float:
        """Return completion fraction (0.0 to 1.0)."""
        raise NotImplementedError

    def format_for_agent(self) -> str:
        """Format state for LLM consumption (minimal context)."""
        raise NotImplementedError
```

### Step 2: Compute Voting Margin

```python
import math


def compute_k(num_steps: int) -> int:
    """Heuristic voting margin that grows roughly logarithmically with step count."""
    if num_steps <= 10:
        return 2
    elif num_steps <= 100:
        return 3
    elif num_steps <= 1000:
        return 4
    else:
        # Natural log; e.g. 1,000,000 steps -> int(13.8) + 1 = 14
        return max(3, int(math.log(num_steps)) + 1)
```

### Step 3: Create Minimal Agent Prompts

**Key**: Each agent sees ONLY what's needed for the current step.

```
You are solving {task_name}. This is step {step_num}/{expected_steps}.

Current state:
{minimal_state_representation}

What is the next action? Respond ONLY with the action in format: {expected_format}
Do not explain. Just give the action.
```
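
The template above can be rendered with plain `str.format`. This is a minimal sketch: `build_prompt` is a hypothetical helper, `{minimal_state_representation}` is shortened to `state`, and the `X->Y` action format is only an example.

```python
PROMPT_TEMPLATE = """\
You are solving {task_name}. This is step {step_num}/{expected_steps}.

Current state:
{state}

What is the next action? Respond ONLY with the action in format: {expected_format}
Do not explain. Just give the action."""


def build_prompt(task_name: str, step_num: int, expected_steps: int,
                 state: str, expected_format: str) -> str:
    """Render the per-step prompt from the current state only (no history)."""
    return PROMPT_TEMPLATE.format(
        task_name=task_name,
        step_num=step_num,
        expected_steps=expected_steps,
        state=state,
        expected_format=expected_format,
    )


print(build_prompt("Towers of Hanoi", 3, 15, "A: [4, 3]  B: [1]  C: [2]", "X->Y"))
```

Because the function takes only the current state, earlier steps and earlier agent responses cannot leak into the prompt.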

### Step 4: Implement Voting

```python
from collections import Counter


def vote_on_next_action(state, k=3, max_agents=50):
    """First-to-ahead-by-k voting over independently sampled agent responses."""
    votes = Counter()

    for _ in range(max_agents):
        action = get_agent_vote(state)  # one independent LLM call (e.g. via LiteLLM)

        # Discard empty or red-flagged responses; they still count against max_agents
        if not action or should_red_flag(action, {"state": state}):
            continue

        votes[action] += 1

        # Stop as soon as the leader is k votes ahead of the runner-up
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= k:
            return ranked[0][0]  # Consensus!

    # Budget exhausted: fall back to the plurality winner, or None if every
    # response was discarded
    return votes.most_common(1)[0][0] if votes else None
```
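
The stopping rule is easier to test when the agent call is injected as a callable. The sketch below is one possible variant, not the reference implementation: `first_to_ahead_by_k` is a hypothetical name, scripted votes stand in for LLM calls, and, unlike the function above, it returns `None` rather than the plurality winner when the budget runs out.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Tuple


def first_to_ahead_by_k(
    sample: Callable[[], Optional[Hashable]],
    k: int,
    max_samples: int = 50,
) -> Tuple[Optional[Hashable], int]:
    """Draw votes until one option leads all others by k, or the budget runs out.

    Returns (winner, samples_used); winner is None if the margin was never reached.
    """
    votes: Counter = Counter()
    for used in range(1, max_samples + 1):
        vote = sample()
        if vote is None:  # None models a red-flagged, discarded response
            continue
        votes[vote] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= k:
            return ranked[0][0], used
    return None, max_samples


# Scripted votes make the run reproducible.
scripted = iter(["A", "B", "A", "A", "A"])
winner, used = first_to_ahead_by_k(lambda: next(scripted), k=2)
print(winner, used)  # A 4
```

Returning the sample count alongside the winner makes it straightforward to log the average votes per step, which feeds directly into cost estimates.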

### Step 5: Configure Red-Flagging

```python
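from typing import Optional


def should_red_flag(response: str, context: Optional[dict] = None) -> bool:
    """Return True if the response should be discarded before voting."""
    # Length checks (thresholds are task-specific; tune for your action format)
    if len(response) < 1 or len(response) > 200:
        return True

    # Failure patterns (substring match; drop any pattern that can occur in a valid action)
    lowered = response.lower()
    if any(pattern in lowered for pattern in
           ["i cannot", "i don't know", "error", "invalid"]):
        return True

    # Format validation (task-specific helper you supply)
    if not matches_expected_format(response):
        return True

    # Domain-specific checks (task-specific helper you supply)
    return not domain_validator(response, context or {})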
```
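
`should_red_flag` relies on two task-specific helpers, `matches_expected_format` and `domain_validator`, that are not defined here. A minimal sketch of what they might look like, assuming a hypothetical action format such as `A->C` (move between pegs A, B, C) and a `legal_actions` entry in the context dict:

```python
import re

# Hypothetical action format for illustration, e.g. "A->C"
MOVE_PATTERN = re.compile(r"[ABC]->[ABC]")


def matches_expected_format(response: str) -> bool:
    """True only if the whole response is a single well-formed action."""
    return MOVE_PATTERN.fullmatch(response.strip()) is not None


def domain_validator(response: str, context: dict) -> bool:
    """True only if the action is currently legal according to the task."""
    return response.strip() in context.get("legal_actions", ())


context = {"legal_actions": ["A->B", "A->C"]}
print(matches_expected_format("A->C"))          # True
print(matches_expected_format("I think A->C"))  # False
print(domain_validator("B->A", context))        # False
```

Using `fullmatch` rather than `search` rejects responses that wrap a valid action in extra explanation, which is the behavior the prompt asks agents to avoid.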

### Step 6: Execute MAKER Loop

```python
state = initialize_task()
k = compute_k(estimated_steps)

while not state.is_complete():
    # Vote on next action
    action = vote_on_next_action(state, k=k)

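    if action is None:
        # Every response was red-flagged, so there is nothing to apply.
        # The handler must change something (backtrack, adjust k, or abort);
        # otherwise this loop retries the same state forever.
        handle_voting_failure()
        continue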

    # Apply action
    success = state.apply_action(action)

    if not success:
        # Invalid action - this shouldn't happen with good voting
        handle_invalid_action()
        continue

# Verify final solution
verify_solution(state)
```
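
Both `continue` branches in the loop above retry the same state, so a step that keeps failing would never terminate. One way to bound this is a per-step failure budget. The sketch below is illustrative only: `CountdownTask`, `run_maker`, and `max_failures` are hypothetical names, and the voter is a stub rather than an LLM call.

```python
class CountdownTask:
    """Toy task: reach zero by repeatedly applying the action "dec"."""

    def __init__(self, start: int):
        self.value = start

    def is_complete(self) -> bool:
        return self.value == 0

    def apply_action(self, action: str) -> bool:
        if action != "dec":
            return False
        self.value -= 1
        return True


def run_maker(task, vote, max_failures: int = 3) -> int:
    """Run the step loop; raise instead of spinning when a step keeps failing."""
    steps = failures = 0
    while not task.is_complete():
        action = vote(task)
        if action is None or not task.apply_action(action):
            failures += 1
            if failures > max_failures:
                raise RuntimeError(f"step {steps + 1} failed {failures} times")
            continue
        steps += 1
        failures = 0  # reset the budget after every successful step
    return steps


print(run_maker(CountdownTask(5), vote=lambda task: "dec"))  # 5
```

Raising an exception keeps the failure visible to the caller, which can then decide whether to backtrack, raise k, or stop.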

## Adaptation Patterns

### Pattern A: Constraint Satisfaction

**Example**: Solving Sudoku

```python
class SudokuTask:
    def get_possible_actions(self):
        # Return valid numbers for next empty cell
        cell = self.next_empty_cell()
        return [num for num in range(1, 10)
                if self.is_valid(cell, num)]

    def format_for_agent(self):
        return f"""
Grid state: {self.grid}
Next cell to fill: {self.next_empty_cell()}
Valid options: {self.get_possible_actions()}
Constraints: Row/Column/Box must have 1-9 exactly once
"""
```

**Agent Prompt**:
```
You are solving Sudoku. This is step {step}/{total_empty_cells}.

Current grid:
{grid_visualization}

Which number should go in cell ({row}, {col})?
Valid options: {valid_numbers}

Respond ONLY with the number (1-9). No explanation.
```
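
`SudokuTask` calls an `is_valid` check that is not shown. A minimal sketch of the standard row, column, and box test follows, written as free functions over a 9x9 list of lists where 0 marks an empty cell (that representation and the name `valid_options` are assumptions):

```python
from typing import List, Tuple

Grid = List[List[int]]  # 9x9, 0 marks an empty cell


def is_valid(grid: Grid, cell: Tuple[int, int], num: int) -> bool:
    """True if `num` does not already appear in the cell's row, column, or 3x3 box."""
    row, col = cell
    if num in grid[row]:
        return False
    if any(grid[r][col] == num for r in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[r][c] != num
        for r in range(top, top + 3)
        for c in range(left, left + 3)
    )


def valid_options(grid: Grid, cell: Tuple[int, int]) -> List[int]:
    """Digits that pass the local check for `cell`."""
    return [n for n in range(1, 10) if is_valid(grid, cell, n)]


grid = [[0] * 9 for _ in range(9)]
grid[0][0] = 1  # same row as the target cell
grid[4][2] = 2  # same column
grid[1][1] = 3  # same 3x3 box
print(valid_options(grid, (0, 2)))  # [4, 5, 6, 7, 8, 9]
```

This check only rules out immediate conflicts. A digit that passes it can still lead to a dead end later, so it works as a red-flag filter on agent answers rather than as a solver.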

### Pattern B: Sequential Planning

**Example**: Multi-step code refactoring

```python
class CodeRefactorTask:
    def get_possible_actions(self):
        return [
            "rename_function(old_name, new_name)",
            "extract_method(lines, new_name)",
            "move_to_module(function, target)",
            "update_imports()"
        ]

    def format_for_agent(self):
        return f"""
Current file: {self.current_file}
Function to refactor: {self.target_function}
Available refactorings: {self.get_possible_actions()}
Tests passing: {self.test_status}
"""
```

**Agent Prompt**:
```
You are refactoring {project_name}. This is step {step}.

Current situation:
- File: {filename}
- Function: {function_name}
- Issue: {code_smell}

What refactoring should be applied next?
Options:
{numbered_options}

Respond ONLY with the option number. No explanation.
```

### Pattern C: Mathematical Reasoning

**Example**: Constructing a proof

```python
class ProofTask:
    def get_possible_actions(self):
        # Return applicable inference rules
        return [rule for rule in self.inference_rules
                if rule.can_apply(self.current_statement)]

    def format_for_agent(self):
        return f"""
Current statement: {self.current_statement}
Goal statement: {self.goal}
Available axioms: {self.axioms}
Available rules: {self.get_possible_actions()}
"""
```

### Pattern D: Data Processing Pipeline

**Example**: ETL workflow

```python
class ETLTask:
    def get_possible_actions(self):
        return [
            "remove_duplicates(column)",
            "fill_missing(column, strategy)",
            "normalize(column, method)",
            "merge_tables(table1, table2, key)"
        ]

    def format_for_agent(self):
        return f"""
Data shape: {self.df.shape}
Missing values: {self.missing_summary()}
Data quality score: {self.quality_score()}
Next transformation options: {self.get_possible_actions()}
"""
```

## Red-Flagging by Task Type

### For Code Generation
- Check syntax validity
- Ensure imports are defined
- Verify function signatures match
- Flag overly long responses (likely hallucination)
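
For Python output, the syntax and length checks can be sketched with the standard-library `ast` module. `red_flag_code` and the `max_chars` threshold are hypothetical and would need tuning per task; import and signature checks are not covered here.

```python
import ast


def red_flag_code(response: str, max_chars: int = 2000) -> bool:
    """Return True when a generated Python snippet should be discarded."""
    # Empty or overly long responses are rejected before parsing
    if not response.strip() or len(response) > max_chars:
        return True
    try:
        ast.parse(response)
    except (SyntaxError, ValueError):  # ValueError covers e.g. embedded null bytes
        return True
    return False


print(red_flag_code("x = 1\n"))           # False
print(red_flag_code("def f(:\n    pass")) # True
```

Parsing only proves the snippet is syntactically valid Python; it says nothing about whether the code does the right thing, so it complements voting rather than replacing it.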

### For Mathematical Reasoning
- Verify notation consistency
- Check logical structure
- Flag undefined symbols
- Ensure rule application is valid

### For Planning Tasks
- Verify preconditions are met
- Check action is in allowed set
- Flag circular dependencies
- Ensure resources are available

### For Constraint Satisfaction
- Verify constraints not violated
- Check value in domain
- Flag contradictions
- Ensure progress toward goal

## Cost Analysis

MAKER is cost-effective when:

```
(cheap_model_cost × avg_votes × num_steps) < (expensive_model_cost × num_steps)
```

**Key Insight**: Even at 10-50 votes per step, a cheap model (gpt-4o-mini) can cost less per step than a single call to an expensive model (gpt-4, o1).

Example (illustrative per-step costs):
- GPT-4, one call per step: $0.015/step
- MAKER (gpt-4o-mini, avg 5 votes): $0.00015/step
- Ratio: $0.015 / $0.00015 = **100× cheaper**
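
The same arithmetic as a small calculator. The per-call price of $0.00003 is derived from the example ($0.00015 per step divided by 5 votes); the function names are hypothetical and the figures are illustrative, not current pricing.

```python
def maker_cost_per_step(cost_per_call: float, avg_votes: float) -> float:
    """Per-step cost when each step takes `avg_votes` cheap-model calls."""
    return cost_per_call * avg_votes


def savings_factor(expensive_cost_per_step: float,
                   cost_per_call: float, avg_votes: float) -> float:
    """How many times cheaper MAKER is than one expensive call per step."""
    return expensive_cost_per_step / maker_cost_per_step(cost_per_call, avg_votes)


def break_even_votes(expensive_cost_per_step: float, cost_per_call: float) -> float:
    """Average votes per step at which both approaches cost the same."""
    return expensive_cost_per_step / cost_per_call


print(f"{savings_factor(0.015, 0.00003, 5):.0f}x cheaper")
print(f"break-even at {break_even_votes(0.015, 0.00003):.0f} votes per step")
```

With these figures the break-even point is about 500 votes per step, which is why even 10-50 votes leaves a wide margin. Red-flagged responses still cost money, so count them in `avg_votes`.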

## Implementation Checklist

When applying MAKER to your task:

- [ ] Define clear state representation
- [ ] Enumerate possible actions per state
- [ ] Create minimal agent prompts (only current step context)
- [ ] Implement state validation
- [ ] Configure red-flagging for your domain
- [ ] Compute appropriate k based on task length
- [ ] Set up progress tracking
- [ ] Implement final solution verification
- [ ] Estimate cost vs single-model approach
- [ ] Test with small instances first

## Debugging MAKER Implementations

### Issue: Agents don't converge (no consensus)

**Causes**:
- k too high for task complexity
- Ambiguous state representation
- Multiple valid solutions

**Solutions**:
- Reduce k or use adaptive k
- Add the missing context to agent prompts (only what disambiguates the current step)
- Add tie-breaking rules

### Issue: Agents converge to wrong answer

**Causes**:
- Insufficient red-flagging
- Misleading state representation
- Correlated errors (agents make same mistake)

**Solutions**:
- Tighten red-flagging criteria
- Clarify prompt formatting
- Increase temperature for diversity
- Add validation after each step

### Issue: Too slow / too expensive

**Causes**:
- k too high
- Too many agents per vote
- Expensive model selected

**Solutions**:
- Use cheaper model (gpt-4o-mini)
- Reduce k if possible
- Parallelize agent calls
- Cache repeated states
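
Agent calls are I/O-bound, so they can be issued in parallel with a thread pool. A minimal sketch, assuming `sample` is any zero-argument callable that returns one vote (a stub replaces the LLM call here, and `sample_batch` is a hypothetical name):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, List


def sample_batch(sample: Callable[[], Hashable], batch_size: int) -> List[Hashable]:
    """Issue `batch_size` independent agent calls concurrently."""
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        futures = [pool.submit(sample) for _ in range(batch_size)]
        return [future.result() for future in futures]


votes = Counter(sample_batch(lambda: "A->C", batch_size=4))
print(votes.most_common(1))  # [('A->C', 4)]
```

Batching trades cost for latency: the k-vote lead can only be checked after each batch returns, so a batch may include a few more calls than sequential sampling would have needed. A batch size close to k keeps that overshoot small.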

## Examples

### Example 1: Solving Towers of Hanoi (4 disks)

```python
from maker import MAKER, MAKERConfig
from towers_of_hanoi import GameState

# Configure
config = MAKERConfig(
    model="gpt-4o-mini",
    k=3,  # For 15 steps: k=3 is sufficient
    verbose=True
)

# Solve
maker = MAKER(config)
success, moves, stats = maker.solve_towers_of_hanoi(num_disks=4)

# Expected: 15 moves, zero errors
```
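
The "15 moves, zero errors" expectation can be checked independently of the solver by replaying the moves. The sketch below is a standalone verifier, not part of the `maker` module; the `(from_peg, to_peg)` move encoding and both function names are assumptions.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (from_peg, to_peg), pegs numbered 0..2


def verify_hanoi(moves: List[Move], num_disks: int) -> bool:
    """Replay moves from the start position; True if legal and all disks end on peg 2."""
    pegs = [list(range(num_disks, 0, -1)), [], []]  # largest disk at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(num_disks, 0, -1))


def optimal_moves(n: int, src: int = 0, dst: int = 2, via: int = 1) -> List[Move]:
    """Classic recursive solution, used here only to exercise the verifier."""
    if n == 0:
        return []
    return (optimal_moves(n - 1, src, via, dst)
            + [(src, dst)]
            + optimal_moves(n - 1, via, dst, src))


moves = optimal_moves(4)
print(len(moves), verify_hanoi(moves, 4))  # 15 True
```

The optimal solution for n disks has 2^n - 1 moves, which gives the 15 steps used to pick k in the example above.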

### Example 2: Code Refactoring

```python
class RefactorTask:
    def __init__(self, codebase, target_pattern):
        self.codebase = codebase
        self.target = target_pattern
        self.changes = []

    def get_possible_actions(self):
        # Find all instances needing refactoring
        instances = find_pattern(self.codebase, self.target)
        return [f"refactor_{i}" for i in instances]

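# `codebase` and `pattern` are supplied by the caller
task = RefactorTask(codebase, pattern)

config = MAKERConfig(
    model="gpt-4o-mini",
    k=compute_k(len(task.get_possible_actions())),  # one step per instance found
    task_type="code_refactoring"
)

maker = MAKER(config, task=task)
success, changes, stats = maker.solve()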
```

## Key Takeaways

1. **Decompose maximally**: Smallest possible steps
2. **Minimize context**: Each agent sees only current step
3. **Vote for consensus**: Prevents error propagation
4. **Red-flag aggressively**: Catch errors early
5. **Scale logarithmically**: k grows as Θ(ln s)
6. **Use cheap models**: Voting and red-flagging compensate for their higher per-call error rate

## Reference Implementation

See `MAKER_GENERALIZATION.md` for:
- Universal task interface
- Adaptation patterns for different domains
- Detailed cost analysis
- Real-world examples
- Troubleshooting guide

## Further Reading

- Paper: https://arxiv.org/html/2511.09030v1
- Implementation: See `maker.py` for working code
- Examples: See `test_maker.py` for different scenarios