--- name: maker-framework description: Orchestrate reliable multi-agent reasoning using MAKER (Maximal Agentic Knowledge Engine for Reasoning). Implements three-pillar architecture for transforming probabilistic LLM outputs into deterministic, verifiable results. Use when tasks require high reliability, parallel consensus voting, or systematic error detection. Triggers include reliability-critical tasks, multi-step reasoning chains, consensus-based verification, parallel agent execution, or explicit MAKER invocation. --- # MAKER Framework Skill Transform unreliable single-model inference into robust, verifiable reasoning through maximal decomposition, parallel consensus voting, and systematic error filtering. ## When to Use MAKER **High-value triggers:** - Tasks requiring >90% accuracy (medical, legal, financial) - Multi-step reasoning where errors compound (p^n problem) - Verification-critical outputs (code, calculations, facts) - Ambiguous tasks benefiting from diverse perspectives **Skip MAKER for:** - Single-fact retrieval (no decomposition benefit) - Creative tasks where diversity is desirable - Time-critical responses (voting adds latency) ## Core Architecture MAKER operates on three pillars applied sequentially: ``` Task → [Pillar 1: Decompose] → DAG of subtasks → [Pillar 2: Vote] → Parallel execution + consensus → [Pillar 3: Filter] → Red-flag invalid outputs → Validated Result ``` ### Pillar 1: Maximal Agentic Decomposition (MAD) Decompose complex tasks into atomic, independently-executable subtasks forming a DAG. **Decomposition principles:** - Each subtask has single, well-defined objective - Subtasks receive explicit input/output schemas - Dependencies form acyclic graph (no cycles) - Maximize width (parallelism) over depth (sequential) **Tool:** `maker_build_dag` ### Pillar 2: First-to-Ahead-by-k Voting Execute each subtask with m parallel agents; accept when one result leads by k votes. **Configuration by criticality:** | Level | m | k | Confidence | |----------|----|----|------------| | low | 3 | 1 | ~70% | | medium | 5 | 2 | ~85% | | high | 7 | 3 | ~95% | | critical | 11 | 5 | ~99% | **Tool:** `maker_vote`, `maker_get_config` ### Pillar 3: Red-Flagging System Discard outputs exhibiting error indicators before voting. **Red flag types:** - Length exceeded (verbose = uncertain) - Format violation (schema mismatch) - Placeholder detected ([TODO], [N/A]) - Uncertainty markers ("possibly", "might be") **Tool:** `maker_red_flag` ## Workflow ### Standard MAKER Pipeline ``` 1. Decompose task → maker_build_dag 2. For each subtask in topological order: a. Generate prompts → maker_generate_prompt (×m) b. Execute agents (parallel LLM calls) c. Validate outputs → maker_red_flag (each) d. Vote on valid outputs → maker_vote 3. Compose results → maker_compose_results ``` ### Example: Multi-Hop QA Task: "What is the capital of the country where the inventor of the telephone was born?" **Step 1: Decompose** ```json { "subtasks": [ {"id": "t1", "description": "Identify inventor of telephone", "dependencies": []}, {"id": "t2", "description": "Determine birthplace of {t1}", "dependencies": ["t1"]}, {"id": "t3", "description": "Identify capital of {t2}", "dependencies": ["t2"]} ] } ``` **Step 2: Execute with voting (m=5, k=2 for medium criticality)** t1 outputs: ["Alexander Graham Bell", "Alexander Graham Bell", "A.G. Bell", "Alexander Graham Bell", "Bell"] → Normalize → "alexander graham bell" wins with 4 votes t2 (with input "Alexander Graham Bell"): → "Edinburgh, Scotland" wins after red-flagging one verbose response t3 (with input "Scotland"): → "Edinburgh" wins unanimously **Step 3: Compose** Final answer: "Edinburgh" ## Integration with Reasoning Skills ### With hierarchical-reasoning MAKER complements hierarchical-reasoning by adding reliability to each reasoning level: ``` Strategic level → MAKER(criticality=high) for key decisions Tactical level → MAKER(criticality=medium) for approach validation Operational → Direct execution for atomic operations ``` ### With knowledge-graph Use MAKER voting on entity extraction to achieve higher-quality knowledge graphs: ``` Document → [MAKER: Extract entities (m=5)] → Validated entities → [MAKER: Extract relations (m=5)] → Validated relations → knowledge-graph merge ``` ## Tool Reference ### maker_build_dag Construct DAG from subtask definitions. Validates acyclicity and computes execution order. ### maker_red_flag Apply red-flag validation to agent output. Returns is_valid boolean and flag details. ### maker_vote Execute first-to-ahead-by-k voting. Returns consensus output with confidence score. ### maker_compute_reliability Calculate theoretical system reliability for given (m, k, n) configuration. ### maker_get_config Get recommended (m, k) configuration for criticality level. ### maker_compose_results Combine validated subtask outputs into final result. ### maker_generate_prompt Create optimized micro-agent prompt with constraints and schema. ## Configuration Guide ### Selecting m and k **Cost-accuracy tradeoff:** - Higher m → more reliable but costlier - Higher k → stronger consensus but slower termination - Early termination typically reduces cost by 30-50% **Decision framework:** 1. Start with criticality-based defaults via `maker_get_config` 2. Use `maker_compute_reliability` to validate configuration 3. Adjust based on empirical accuracy and cost metrics ### Output Schema Design Well-designed schemas enable format-based red-flagging: ```json { "type": "object", "properties": { "answer": {"type": "string"}, "confidence": {"type": "number", "minimum": 0, "maximum": 1} }, "required": ["answer"] } ``` ### Equivalence Methods - `exact`: String equality after trim (dates, numbers) - `normalized`: Lowercase + whitespace normalization (text) - `json`: Parse and re-serialize for canonical comparison (structured) ## Performance Characteristics **Reliability improvement (assuming 85% agent accuracy):** | Steps | Single Agent | MAKER (m=5, k=2) | |-------|-------------|------------------| | 1 | 85.0% | 97.1% | | 3 | 61.4% | 91.5% | | 5 | 44.4% | 86.2% | **Cost multiplier:** ~4-6× single agent (with early termination) **Latency:** ~2-4× single agent (parallelism offsets voting overhead) ## Error Handling **Insufficient valid outputs (red-flagging too aggressive):** 1. Retry with additional agents 2. Relax red-flag thresholds 3. Refine subtask prompt **No consensus (high disagreement):** 1. Further decompose the problematic subtask 2. Increase k threshold 3. Escalate to human review **Cycle detected in DAG:** 1. Review dependency structure 2. Break circular dependencies into sequential steps