---
name: meta-cognitive-reasoning
description: Meta-cognitive reasoning specialist for evidence-based analysis, hypothesis testing, and cognitive failure prevention. Use when conducting reviews, making assessments, debugging complex issues, or any task requiring rigorous analytical reasoning. Prevents premature conclusions, assumption-based errors, and pattern matching without verification.
tags:
  - reasoning
  - analysis
  - review
  - debugging
  - assessment
  - decision-making
  - cognitive failure prevention
  - meta-cognitive reasoning
  - evidence-based reasoning
author: Joseph OBrien
status: unpublished
updated: '2025-12-23'
version: 1.0.1
tag: skill
type: skill
---

# Meta-Cognitive Reasoning

This skill provides disciplined reasoning frameworks for avoiding cognitive failures in analysis, reviews, and decision-making. It enforces evidence-based conclusions, multiple hypothesis generation, and systematic verification.

## When to Use This Skill

- Before making claims about code, systems, or versions
- When conducting code reviews or architectural assessments
- When debugging issues with multiple possible causes
- When encountering unfamiliar patterns or versions
- When making recommendations that could have significant impact
- When pattern matching triggers immediate conclusions
- When analyzing documentation or specifications
- During any task requiring rigorous analytical reasoning

## What This Skill Does

1. **Evidence-Based Reasoning**: Enforces showing evidence before interpretation
2. **Multiple Hypothesis Generation**: Prevents premature commitment to a single explanation
3. **Temporal Knowledge Verification**: Handles knowledge cutoff limitations
4. **Cognitive Failure Prevention**: Recognizes and prevents common reasoning errors
5. **Self-Correction Protocol**: Provides a framework for transparent error correction
6. **Scope Discipline**: Allocates cognitive effort appropriately

## Core Principles

### 1. Evidence-Based Reasoning Protocol

**Universal Rule: Never conclude without proof**

```
MANDATORY SEQUENCE:
1. Show tool output FIRST
2. Quote specific evidence
3. THEN interpret
```

**Forbidden Phrases:**

- "I assume"
- "typically means"
- "appears to"
- "Tests pass" (without output)
- "Meets standards" (without evidence)

**Required Phrases:**

- "Command shows: 'actual output' - interpretation"
- "Line N: 'code snippet' - meaning"
- "Let me verify..." -> tool output -> interpretation

### 2. Multiple Working Hypotheses

**When identical observations can arise from different mechanisms with opposite implications, investigate before concluding.**

**Three-Layer Reasoning Model:**

```
Layer 1: OBSERVATION (What do I see?)
Layer 2: MECHANISM (How/why does this exist?)
Layer 3: ASSESSMENT (Is this good/bad/critical?)

FAILURE: Jump from Layer 1 -> Layer 3 (skip mechanism)
CORRECT: Layer 1 -> Layer 2 (investigate) -> Layer 3 (assess with context)
```

**Decision Framework:**

1. **Recognize multiple hypotheses exist**
   - What mechanisms could produce this observation?
   - Which mechanisms have opposite implications?
2. **Generate competing hypotheses explicitly**
   - Hypothesis A: [mechanism] -> [implication]
   - Hypothesis B: [different mechanism] -> [opposite implication]
3. **Identify discriminating evidence**
   - What single observation would prove/disprove each?
4. **Gather discriminating evidence**
   - Run the specific test that distinguishes the hypotheses
5. **Assess with mechanism context**
   - Same observation + different mechanism = different assessment

### 3. Temporal Knowledge Currency

**Training data has a timestamp; absence of knowledge ≠ evidence of absence**

**Critical Context Check:**

```
Before making claims about what exists:
1. What is my knowledge cutoff date?
2. What is today's date?
3. How much time has elapsed?
4. Could versions/features beyond my training exist?
```

**High-Risk Domains (always verify):**

- Package versions (npm, pip, maven)
- Framework versions (React, Vue, Django)
- Language versions (Python, Node, Go)
- Cloud service features (AWS, GCP, Azure)
- API versions and tool versions

**Anti-Patterns:**

- "Version X doesn't exist" (without verification)
- "Latest is Y" (based on stale training data)
- "CRITICAL/BLOCKER" without evidence
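Below is a minimal sketch of what "always verify" looks like at the command line. The package name is hypothetical, and `pip index versions` assumes pip 21.2 or newer with network access; substitute the equivalent query for your own ecosystem:

```
$ pip index versions some-package     # ask PyPI which releases actually exist today
$ npm view some-package versions      # the same question, for the npm registry
$ grep -n "some-package" uv.lock      # what the resolver already pinned locally
```

Any one of these takes seconds; the overconfidence cascade in Pattern 6 below is what it prevents.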
### 4. Self-Correction Protocol

**When discovering errors in previous output:**

```
STEP 1: ACKNOWLEDGE EXPLICITLY
- Lead with "CRITICAL CORRECTION"
- Make it impossible to miss

STEP 2: STATE PREVIOUS CLAIM
- Quote exact wrong statement

STEP 3: PROVIDE EVIDENCE
- Show what proves the correction

STEP 4: EXPLAIN ERROR CAUSE
- Root cause: temporal gap? assumption?

STEP 5: CLEAR ACTION
- "NO CHANGE NEEDED" or "Revert suggestion"
```

### 5. Cognitive Resource Allocation

**Parsimony Principle:**

- Choose the simplest approach that satisfies requirements
- Simple verification first, complexity only when simple fails

**Scope Discipline:**

- Allocate resources to actual requirements, not hypothetical ones
- "Was this explicitly requested?"

**Information Economy:**

- Reuse established facts
- Re-verify when context changes

**Atomicity Principle:**

- Each action should have one clear purpose
- If a description requires "and" between distinct purposes, split it
- Benefits: clearer failure diagnosis, easier progress tracking, better evidence attribution

### 6. Systematic Completion Discipline

**Never declare success until ALL requirements are verified**

**High-Risk Scenarios for Premature Completion:**

- Multi-step tasks with many quality gates
- After successfully fixing major issues (cognitive reward triggers)
- When tools show many errors (avoidance temptation)
- Near the end of a session (completion pressure)

**Completion Protocol:**

1. Break requirements into explicit checkpoints
2. Complete each gate fully before proceeding
3. Show evidence at each checkpoint
4. Resist "good enough" shortcuts

**Warning Signs:**

- Thinking "good enough" instead of checking all requirements
- Applying blanket solutions without individual analysis
- Skipping systematic verification
- Declaring success while evidence shows otherwise

### 7. Individual Analysis Over Batch Processing

**Core Principle: Every item deserves individual attention**

**Apply to:**

- Error messages (read each one individually)
- Review items (analyze each line/file)
- Decisions (don't apply blanket rules)
- Suppressions (justify each one specifically)

**Anti-Patterns:**

- Bulk categorization without reading details
- Blanket solutions applied without context
- Batch processing of unique situations
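A short sketch of individual triage in practice. Ruff is an assumption here; any linter or compiler with line-oriented output fits the same pattern:

```
$ ruff check src/ > diagnostics.txt   # capture every diagnostic, not just the first screen
$ wc -l diagnostics.txt               # know the total before triaging anything
# Work through diagnostics.txt one line at a time, recording a decision per entry:
# fix, suppress with a written reason, or confirmed false positive
```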
### 8. Semantic vs Literal Analysis

**Look for conceptual overlap, not just text/pattern duplication**

**Key Questions:**

- What is the actual PURPOSE here?
- Does this serve a functional need or just match a pattern?
- What would be LOST if I removed/changed this?
- Is this the same CONCEPT expressed differently?

**Applications:**

- Documentation: Identify semantic duplication across hierarchy levels
- Code review: Understand intent before suggesting changes
- Optimization: Analyze actual necessity before improving

## How to Use

### Verify Before Claiming

```
Verify that package X version Y exists before recommending changes
```

```
Check if this file structure is symlinks or duplicates before recommending consolidation
```

### Generate Multiple Hypotheses

```
The tests are failing with timeout errors. What are the possible mechanisms?
```

```
These three files have identical content. What could explain this?
```

### Conduct Evidence-Based Review

```
Review this code and show evidence for every claim
```

## Reasoning Workflows

### Verification Workflow

When encountering unfamiliar versions/features:

1. **Recognize uncertainty**: "I don't recall X from training"
2. **Form hypotheses**: A) doesn't exist, B) exists but new, C) is current
3. **Verify before concluding**: Check authoritative source
4. **Show evidence, then interpret**: Command output -> conclusion

### Assessment Workflow

When analyzing code, architecture, or configurations:

1. **Observe**: What do I see?
2. **Investigate mechanism**: HOW does this exist?
3. **Then assess**: Based on mechanism, is this good/bad?

### Review Workflow

For code reviews, documentation reviews, or any analysis:

1. **Clarify scope**: Ask before assuming
2. **Show evidence for every claim**: File:line:code
3. **Generate hypotheses before concluding**
4. **Distinguish mechanism from observation**
5. **Reserve strong language for verified issues**

## Cognitive Failure Patterns

### Pattern 1: Scanning Instead of Reading

- Missing obvious issues while finding minor ones
- Prevention: Read every line/error individually

### Pattern 2: Pattern Matching Without Context

- Applying solutions without understanding problems
- Prevention: Analyze actual purpose before applying templates

### Pattern 3: Assumption-Based Conclusions

- Guessing instead of verifying
- Prevention: Evidence-based verification required

### Pattern 4: Premature Success Declaration

- "Task complete" ≠ "Requirements verified"
- Prevention: Show tool output proving completion

### Pattern 5: Temporal Knowledge Decay

- Confusing "I don't know" with "doesn't exist"
- Prevention: Verify version/feature currency

### Pattern 6: Overconfidence Cascade

```
False premise: "X doesn't exist" (unverified)
  ↓
Amplified: "This is CRITICAL/BLOCKER"
  ↓
Harmful: "Change X to older version Y"
  ↓
Impact: Downgrade from newer to older

BREAK THE CASCADE: Verify the premise first
```

## Examples

### Example 1: Version Verification

**Situation**: Reviewing code that uses `certifi==2025.10.5`

**Wrong Approach:**

```markdown
CRITICAL: Version 2025.10.5 doesn't exist!
Action: Change to 2024.x.x
```

**Correct Approach:**

```markdown
Let me verify this version exists...

$ grep -A 3 'name = "certifi"' uv.lock
name = "certifi"
version = "2025.10.5"
source = { registry = "https://pypi.org/simple" }

Evidence shows version EXISTS on PyPI.
Action: NO CHANGE NEEDED - version is valid.
```

### Example 2: File Structure Analysis

**Observation**: Three files with identical 1494-line content

- specs/AGENTS.md
- specs/CLAUDE.md
- specs/GEMINI.md

**Wrong Approach:**

```markdown
CRITICAL: File duplication - consolidate to single file
```

**Correct Approach:**

```markdown
Competing hypotheses:
A: Duplicated copies -> CRITICAL: Consolidate
B: Symlinks to single source -> EXCELLENT: Keep

Gathering discriminating evidence:

$ ls -la specs/
lrwxrwxrwx CLAUDE.md -> AGENTS.md
lrwxrwxrwx GEMINI.md -> AGENTS.md

Mechanism: Symlinks (Hypothesis B confirmed)
Assessment: EXCELLENT architecture - agent-specific entry points with single source of truth
Action: Keep as-is
```

### Example 3: Test Failure Analysis

**Observation**: 5 tests failing with "connection timeout"

**Hypotheses:**

- A: Single dependency down (fix one thing)
- B: Multiple independent timeouts (fix five things)
- C: Test infrastructure issue (fix setup)
- D: Environment config missing (fix config)

**Investigation:**

- Check test dependencies
- Check error timestamps (simultaneous vs sequential)
- Run tests in isolation

**Then conclude based on evidence.**
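A hedged sketch of gathering that discriminating evidence, assuming pytest and a captured log file; the test ID and paths are hypothetical:

```
# A vs B: do the five failures point at one shared endpoint or five different ones?
$ grep -h "connection timeout" test-output.log | sort | uniq -c

# C vs D: does a failing test still time out when run alone, outside the suite?
$ pytest tests/test_payments.py::test_charge -x --tb=short
```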
## Anti-Patterns

```
DO NOT:
- "File X doesn't exist" without: ls X
- "Function not used" without: grep -r "function_name"
- "Version invalid" without: checking registry/lockfile
- "Tests fail" without: running tests
- "CRITICAL/BLOCKER" without verification
- Use strong language without evidence
- Skip mechanism investigation
- Pattern match to first familiar case

DO:
- Show grep/ls/find output BEFORE claiming
- Quote actual lines: "file.py:123: 'code here' - issue"
- Check lockfiles for resolved versions
- Run available tools and show output
- Reserve strong language for evidence-proven issues
- "Let me verify..." -> tool output -> interpretation
- Generate multiple hypotheses before gathering evidence
- Distinguish observation from mechanism
```

## Clarifying Questions

**Before proceeding with complex tasks, ask:**

1. What is the primary goal/context?
2. What scope is expected (simple fix vs comprehensive)?
3. What are the success criteria?
4. What constraints exist?

**For reviews specifically:**

- Scope: All changed files or specific ones?
- Depth: Quick feedback or comprehensive analysis?
- Focus: Implementation quality, standards, or both?
- Output: List of issues or prioritized roadmap?

## Task Management Patterns

### Review Request Interpretation

**Universal Rule: ALL reviews are comprehensive unless explicitly scoped**

**Never assume limited scope based on:**

- Recent conversation topics
- Previously completed partial work
- Specific words that seem to narrow scope
- Apparent simplicity of the request

**Always include:**

- All applicable quality gates
- Evidence for every claim
- Complete verification of requirements
- Systematic coverage (not spot-checking)

### Context Analysis Decision Framework

**Universal Process:**

1. **Analyze actual purpose** (don't assume from patterns)
2. **Check consistency** with actual usage
3. **Verify with evidence** (read/test to confirm)
4. **Ask before acting** when uncertain

**Recognition Pattern:**

```
WRONG: "Other components do X, so this needs X"
RIGHT: "Let me analyze if this component actually needs X for its purpose"
```

A worked sketch of this pattern appears at the end of this document.

## Related Use Cases

- Code reviews requiring evidence-based claims
- Version verification before recommendations
- Architectural assessments
- Debugging with multiple possible causes
- Documentation analysis
- Security audits
- Performance investigations
- Any analysis requiring rigorous reasoning
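Finally, the promised sketch of the recognition pattern from the Context Analysis Decision Framework. The module name and the "retry decorator" rule are hypothetical; the point is gathering usage evidence before applying a blanket rule:

```
# Claim under test: "every service module needs the retry decorator"
$ grep -rn "requests\.\|httpx\." services/report_builder.py   # does this module make network calls at all?
# No matches -> the blanket rule does not apply here; no retry decorator needed
```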