--- name: multi-model-meta-analysis description: | Synthesize outputs from multiple AI models into a comprehensive, verified assessment. Use when: (1) User pastes feedback/analysis from multiple LLMs (Claude, GPT, Gemini, etc.) about code or a project, (2) User wants to consolidate model outputs into a single reliable document, (3) User needs conflicting model claims resolved against actual source code. This skill verifies model claims against the codebase, resolves contradictions with evidence, and produces a more reliable assessment than any single model. --- # Multi-Model Synthesis Combine outputs from multiple AI models into a verified, comprehensive assessment by cross-referencing claims against the actual codebase. ## Core Principle Models hallucinate and contradict each other. The source code is the source of truth. Every significant claim must be verified before inclusion in the final assessment. ## Process ### 1. Extract Claims Parse each model's output and extract discrete claims: - Factual assertions about the code ("function X does Y", "there's no error handling in Z") - Recommendations ("should add validation", "refactor this pattern") - Identified issues ("bug in line N", "security vulnerability") Tag each claim with its source model. ### 2. Deduplicate Group semantically equivalent claims: - "Lacks input validation" = "No sanitization" = "User input not checked" - "Should use async/await" = "Convert to promises" = "Make asynchronous" Create canonical phrasing. Track which models mentioned each. ### 3. Verify Against Source For each factual claim or identified issue: ``` CLAIM: "The auth middleware doesn't check token expiry" VERIFY: Read the auth middleware file FINDING: [Confirmed | Refuted | Partially true | Cannot verify] EVIDENCE: [Quote relevant code or explain why claim is wrong] ``` Use Grep, Glob, and Read tools to locate and examine relevant code. Do not trust model claims without verification. ### 4. Resolve Conflicts When models contradict each other: 1. Identify the specific disagreement 2. Examine the actual code 3. Determine which model (if any) is correct 4. Document the resolution with evidence ``` CONFLICT: Model A says "uses SHA-256", Model B says "uses MD5" INVESTIGATION: Read crypto.js lines 45-60 RESOLUTION: Model B is correct - line 52 shows MD5 usage EVIDENCE: `const hash = crypto.createHash('md5')` ``` ### 5. Synthesize Assessment Produce a final document that: - States verified facts (not model opinions) - Cites evidence for significant claims - Notes where verification wasn't possible - Preserves valuable insights that don't require verification (e.g., design suggestions) ## Output Format ```markdown # Synthesized Assessment: [Topic] ## Summary [2-3 sentences describing the verified findings] ## Verified Findings ### Confirmed Issues | Issue | Severity | Evidence | Models | |-------|----------|----------|--------| | [Issue] | High/Med/Low | [file:line or quote] | Claude, GPT | ### Refuted Claims | Claim | Source | Reality | |-------|--------|---------| | [What model said] | GPT-4 | [What code actually shows] | ### Unverifiable Claims | Claim | Source | Why Unverifiable | |-------|--------|------------------| | [Claim] | Claude | [Requires runtime testing / external system / etc.] | ## Consensus Recommendations [Items where 2+ models agree AND verification supports the suggestion] ## Unique Insights Worth Considering [Valuable suggestions from single models that weren't contradicted] ## Conflicts Resolved | Topic | Model A | Model B | Verdict | Evidence | |-------|---------|---------|---------|----------| | [Topic] | [Position] | [Position] | [Which is correct] | [Code reference] | ## Action Items ### Critical (Verified, High Impact) - [ ] [Item] — Evidence: [file:line] ### Important (Verified, Medium Impact) - [ ] [Item] — Evidence: [file:line] ### Suggested (Unverified but Reasonable) - [ ] [Item] — Source: [Models] ``` ## Verification Guidelines **Always verify:** - Bug reports and security issues - Claims about what code does or doesn't do - Assertions about missing functionality - Performance or complexity claims **Trust but note source:** - Style and readability suggestions - Architectural recommendations - Best practice suggestions **Mark as unverifiable:** - Runtime behavior claims (without tests) - Performance benchmarks (without profiling) - External API behavior - User experience claims ## Anti-Patterns - Blindly merging model outputs without checking code - Treating model consensus as proof (all models can be wrong) - Omitting refuted claims (document what was wrong - it's valuable) - Skipping verification because claims "sound right"