# Human-in-the-Loop Co-Pilot Guide > **AutoResearchClaw v0.4.0** transforms the pipeline from purely autonomous to a human-AI collaborative research engine. This guide covers everything you need to know. --- ## Table of Contents 1. [Why Co-Pilot?](#1-why-co-pilot) 2. [Quick Start](#2-quick-start) 3. [Intervention Modes](#3-intervention-modes) 4. [The Co-Pilot Workflow](#4-the-co-pilot-workflow) 5. [CLI Commands](#5-cli-commands) 6. [Stage-by-Stage Intervention Guide](#6-stage-by-stage-intervention-guide) 7. [Workshops](#7-workshops) 8. [Detached Operation](#8-detached-operation) 9. [Safety & Guardrails](#9-safety--guardrails) 10. [Intelligence Layer](#10-intelligence-layer) 11. [Pipeline Branching](#11-pipeline-branching) 12. [Adapters (CLI / WebSocket / MCP)](#12-adapters) 13. [Configuration Reference](#13-configuration-reference) 14. [FAQ](#14-faq) --- ## 1. Why Co-Pilot? Fully autonomous research pipelines produce papers fast, but testing reveals consistent quality gaps: | Problem | Root Cause | |---------|-----------| | Weak research ideas | AI lacks taste for what's truly novel and impactful | | Missing baselines | AI doesn't know which comparisons reviewers expect | | Fragile experiment code | No human sanity check before execution | | Thin analysis | AI draws superficial conclusions from results | | Generic paper writing | AI produces correct-but-bland academic prose | The HITL Co-Pilot system solves this by letting you **intervene exactly where your expertise matters most**, while the AI handles the heavy lifting everywhere else. **The result**: papers that combine AI speed with human judgment. --- ## 2. Quick Start ### Option A: Co-Pilot Mode (Recommended) ```bash researchclaw run --topic "Your research idea" --mode co-pilot ``` The pipeline will run automatically and pause at key decision points for your input. At each pause, you'll see an interactive prompt with available actions. ### Option B: Express Mode (Minimal Interruption) ```bash researchclaw run --topic "Your research idea" --mode express ``` Only pauses at 3 critical gates: hypothesis approval (Stage 8), experiment design (Stage 9), and final quality check (Stage 20). ### Option C: Full Auto (Original Behavior) ```bash researchclaw run --topic "Your research idea" --auto-approve ``` No human intervention. Identical to pre-v0.4.0 behavior. --- ## 3. Intervention Modes | Mode | Flag | Pauses At | Best For | |------|------|-----------|----------| | **Full Auto** | `--auto-approve` | Never | Quick exploration, low-stakes experiments | | **Gate Only** | `--mode gate-only` | 3 gate stages (5, 9, 20) | Light oversight | | **Checkpoint** | `--mode checkpoint` | End of each phase (8 points) | Phase-level review | | **Co-Pilot** | `--mode co-pilot` | Critical stages + SmartPause triggers | **Recommended for production** | | **Step-by-Step** | `--mode step-by-step` | After every stage (23 pauses) | Learning the pipeline | | **Express** | `--mode express` | 3 most critical gates only | Experienced users | | **Custom** | `--mode custom` | User-defined per-stage policies | Advanced configuration | ### How to Choose - **First time using the pipeline?** Start with `step-by-step` to understand each stage. - **Publishing a real paper?** Use `co-pilot` for the best quality. - **Running overnight?** Use `gate-only` or `express` — fewer interruptions. - **Batch processing many topics?** Use `full-auto`. --- ## 4. The Co-Pilot Workflow When the pipeline pauses, you'll see an interactive panel: ``` ────────────────────────────────────────────────────────── HITL | Stage 08: HYPOTHESIS_GEN Post-stage review ────────────────────────────────────────────────────────── Stage 8 (HYPOTHESIS_GEN) — done Hypotheses generated. This is a CRITICAL decision point — review each hypothesis for novelty, feasibility, and significance. Outputs: hypotheses.md (1,247 bytes) → ## Hypothesis 1: Quantum gate noise as structured regularization novelty_report.json (892 bytes) Novelty score: 0.72 (moderate) Available actions: [a] Approve and continue [r] Reject and rollback [e] Edit stage output [c] Start collaborative chat [i] Inject guidance / direction [s] Skip this stage [q] Abort pipeline [v] View full stage output Action > ``` ### Available Actions at Every Pause | Key | Action | What Happens | |-----|--------|-------------| | `a` | **Approve** | Accept the output and continue to the next stage | | `r` | **Reject** | Reject the output; pipeline rolls back to an earlier stage | | `e` | **Edit** | Opens the output file in your `$EDITOR` (vim, nano, VS Code, etc.) | | `c` | **Collaborate** | Start a multi-turn chat with the AI to refine the output together | | `i` | **Inject Guidance** | Provide direction that will be incorporated into subsequent stages | | `s` | **Skip** | Skip this stage entirely (use with caution) | | `b` | **Rollback** | Jump back to a specific earlier stage | | `q` | **Abort** | Stop the pipeline entirely | | `v` | **View** | Display the full contents of output files | --- ## 5. CLI Commands ### Starting a Run ```bash # Co-Pilot mode researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot # With explicit config researchclaw run --config config.arc.yaml --topic "..." --mode co-pilot # Resume a previous run in co-pilot mode researchclaw run --config config.arc.yaml --resume --mode co-pilot ``` ### Detached Interaction These commands let you interact with a paused pipeline from a separate terminal: ```bash # Check status researchclaw status artifacts/rc-2026-0328-abc123 # Attach interactively (full TUI) researchclaw attach artifacts/rc-2026-0328-abc123 # Quick approve (non-interactive) researchclaw approve artifacts/rc-2026-0328-abc123 --message "Looks good" # Quick reject researchclaw reject artifacts/rc-2026-0328-abc123 --reason "Missing ResNet baseline" # Inject guidance for a specific stage researchclaw guide artifacts/rc-2026-0328-abc123 --stage 9 --message "Add Dropout as baseline" ``` --- ## 6. Stage-by-Stage Intervention Guide ### Where Your Input Matters Most | Stage | Name | Co-Pilot Behavior | Your Role | |-------|------|-------------------|-----------| | 1-2 | Scoping | Pause after | Confirm research direction and scope | | 3 | Search Strategy | Pause after | Add missing search terms or sources | | 5 | Literature Screen | **Approval required** | Verify important papers aren't filtered out | | 7 | Synthesis | Pause after | Check if the identified gaps match your understanding | | **8** | **Hypothesis Gen** | **Collaboration** | **Review, discuss, and refine the core research idea** | | **9** | **Experiment Design** | **Collaboration + Approval** | **Verify baselines, benchmarks, metrics, ablations** | | 10 | Code Generation | Pause after | Spot-check code quality | | 12 | Experiment Run | Stream output | Monitor training metrics in real-time | | 13 | Iterative Refine | Pause after | Decide if refinement should continue | | **15** | **Research Decision** | **Approval required** | **Choose PROCEED, PIVOT, or REFINE** | | 16 | Paper Outline | Pause after | Adjust section structure | | **17** | **Paper Draft** | **Collaboration** | **Co-write key sections** | | 18 | Peer Review | Pause after | Prioritize which review comments to address | | **20** | **Quality Gate** | **Approval required** | **Final publication decision** | | 23 | Citation Verify | Pause after | Review flagged citations | ### Guidance Injection You can inject guidance for **any stage at any time**, even before it runs: ```bash researchclaw guide artifacts/rc-xxx --stage 17 --message "Emphasize the theoretical contribution in the Introduction. Keep the Method section concise." ``` Guidance is automatically included in the LLM's context when that stage executes. --- ## 7. Workshops Workshops are specialized collaboration tools for the most critical stages. ### Idea Workshop (Stage 7-8) When you enter collaboration mode at Stage 8, the Idea Workshop activates: ``` You > c (start collaboration) AI > I generated 3 candidate hypotheses. Here's my evaluation: 1. "Quantum gate noise as structured regularization" — Novelty: 8/10, Feasibility: 6/10 2. "Entanglement-based feature selection" — Novelty: 7/10, Feasibility: 4/10 3. "Quantum sampling for data augmentation" — Novelty: 5/10, Feasibility: 8/10 Which direction interests you most? You > Hypothesis 1 is the most promising, but we need to be more specific. The noise should be modeled as a stochastic regularizer, and we need to compare against Dropout, Label Smoothing, and MixUp. AI > Updated hypothesis: "We theoretically characterize quantum gate noise as a structured stochastic regularizer and empirically demonstrate it outperforms Dropout, Label Smoothing, and MixUp on small-sample classification." Baselines updated: + Dropout, + Label Smoothing, + MixUp, + CutMix Shall I proceed with this direction? You > Yes, approve. ``` The workshop automatically: - Saves your refined hypothesis to `hypotheses.md` - Records the conversation in `hitl/chat_stage_08.jsonl` - Persists candidate evaluations in `hitl/idea_workshop.json` ### Baseline Navigator (Stage 9) The Baseline Navigator helps you build a rigorous experiment design: ``` AI > Here are the suggested baselines: [AI] ResNet-50 (standard image classification baseline) [AI] ViT-B/16 (transformer baseline) [AI] Dropout (regularization baseline) Experiment Design Checklist: [+] Baselines: 3 [ ] Benchmarks: CIFAR-10, CIFAR-100 [ ] Metrics: accuracy, F1 You > Add Label Smoothing and MixUp as baselines. Also add STL-10. AI > Updated. Current checklist: [+] Baselines: 5 (ResNet-50, ViT-B/16, Dropout, Label Smoothing, MixUp) [+] Benchmarks: 3 (CIFAR-10, CIFAR-100, STL-10) [+] Metrics: accuracy, F1 Review questions: [ ] Are all standard baselines for this domain included? ✓ [ ] Is there at least one ablation planned? — suggest: "noise magnitude sweep" ``` ### Paper Co-Writer (Stage 16-17) The Paper Co-Writer supports three workflows: 1. **AI-first** (default): AI writes the draft, you edit and refine 2. **Human-first**: You write key paragraphs, AI expands and polishes 3. **Interleaved**: You alternate — you write the Method, AI writes Related Work ``` AI > Paper draft completed (5,200 words, 7 sections). Which sections would you like to review or co-edit? You > The Introduction is too generic. Let me rewrite the first paragraph. AI > Opening your editor... (saves backup to hitl/snapshots/) [Editor opens with Introduction section] You > [Saves edited file] AI > Your edits look great. I'll polish the transitions and ensure the notation is consistent with the Method section. Updated Introduction: 420 words → 380 words (tighter, more specific). Change summary: +3 added, -5 deleted, ~8 changed, 22 unchanged ``` --- ## 8. Detached Operation Research runs can take hours. You don't need to sit and watch. ### How It Works 1. Pipeline pauses → writes `hitl/waiting.json` 2. Pipeline enters file-polling mode (checks every 2 seconds for `response.json`) 3. You respond whenever you're ready via `attach`, `approve`, or web dashboard 4. Pipeline picks up your response and resumes ### Scenario: Overnight Run ```bash # Start the run at 6 PM researchclaw run --topic "..." --mode co-pilot & # Pipeline runs Stages 1-7, pauses at Stage 8... # You go home # Next morning, check status researchclaw status artifacts/rc-2026-xxx # Output: "WAITING for input at Stage 8 — HYPOTHESIS_GEN (since 18:42)" # Review and approve researchclaw attach artifacts/rc-2026-xxx # Interactive review → approve → pipeline resumes ``` ### Timeout Behavior By default, the pipeline waits 24 hours for a response. You can configure this: ```yaml hitl: timeouts: default_human_timeout_sec: 86400 # 24h (default) auto_proceed_on_timeout: false # true = auto-approve after timeout ``` --- ## 9. Safety & Guardrails ### Cost Budget Set a spending limit to prevent runaway API costs: ```yaml hitl: cost_budget_usd: 50.0 # Pipeline pauses at 50%, 80%, and 100% of budget ``` When a threshold is breached, the pipeline pauses with a cost summary: ``` Cost budget alert: Cost: $42.50 / $50.00 [████████████████░░░░] 85% ``` ### Claim Verification The Claim Verifier automatically checks AI-generated text against your collected literature: - **Citation claims**: Are cited papers in your shortlist? Or fabricated? - **Numerical claims**: Do reported numbers match actual experiment data? - **Factual claims**: Are "it has been shown that..." statements grounded? Unverified claims are flagged in the review summary, letting you decide what to keep. ### SHA256 Artifact Checksums Every stage output gets a SHA256 manifest (`manifest.json`) for reproducibility. If an artifact is modified outside the pipeline, verification will detect it. ### Escalation Policy For team/production use, configure tiered notification escalation: ```yaml hitl: escalation: levels: - delay_sec: 0 # Immediate terminal notification channel: terminal - delay_sec: 1800 # After 30 min → Slack channel: slack message: "Pipeline needs attention" - delay_sec: 7200 # After 2h → email channel: email - delay_sec: 86400 # After 24h → auto-abort channel: terminal auto_action: abort ``` ### Extensible Hooks Run custom scripts before/after any stage: ```bash # Create a hook script cat > artifacts/rc-xxx/hooks/post_stage_10.sh << 'EOF' #!/bin/sh echo "Running linter on generated code..." cd $RC_RUN_DIR/stage-10/experiment && python -m py_compile main.py EOF chmod +x artifacts/rc-xxx/hooks/post_stage_10.sh ``` Hooks receive environment variables: `RC_STAGE_NUM`, `RC_STAGE_NAME`, `RC_RUN_DIR`, `RC_HOOK_NAME`. --- ## 10. Intelligence Layer ### SmartPause SmartPause goes beyond fixed gate stages. It dynamically decides whether to pause based on: - **Quality score** (from PRM or heuristics): Low quality → pause for review - **Stage criticality**: High-impact stages (hypotheses, experiment design) have lower thresholds - **Historical rejection rate**: Stages you frequently reject get paused more often - **Confidence**: When the AI is uncertain, it asks for help You don't need to configure SmartPause — it works automatically in co-pilot mode. ### Intervention Learning (ALHF) Every time you approve, reject, or edit, the system learns: - Stages you always approve → future runs auto-approve them - Stages you frequently reject → future runs pause more aggressively - Your edit patterns → inform SmartPause thresholds After 5+ runs, the system adapts to your review style. ### Quality Predictor At any pause point, the system estimates the final paper quality based on current artifacts: - Literature coverage (number and diversity of papers) - Hypothesis specificity and falsifiability - Experiment design completeness (baselines, ablations, metrics) - Result strength (improvement over baselines) - Draft quality (length, structure, section coverage) - Citation integrity Risk factors are highlighted so you know where to focus your attention. --- ## 11. Pipeline Branching When you're unsure which research direction to pursue, branch the pipeline: ``` # At Stage 8, you see 3 promising hypotheses Action > b (branch) # Fork to explore Hypothesis A researchclaw branch create --run-dir artifacts/rc-xxx --name "quantum-noise" --stage 8 # Fork to explore Hypothesis B researchclaw branch create --run-dir artifacts/rc-xxx --name "entanglement" --stage 8 ``` Each branch gets its own copy of the pipeline state. Run them independently, then compare: ```bash # Compare branches at Stage 14 (after experiments) researchclaw branch compare --run-dir artifacts/rc-xxx --stage 14 ``` ``` Branch Comparison — Stage 14: RESULT_ANALYSIS main: artifacts: 3, quality: 0.72 → Best accuracy: 78.3% quantum-noise: artifacts: 3, quality: 0.85 → Best accuracy: 82.1% entanglement: artifacts: 2, quality: 0.61 → Best accuracy: 74.5% ``` Merge the winner: ```bash researchclaw branch merge --run-dir artifacts/rc-xxx --branch "quantum-noise" --from-stage 9 ``` --- ## 12. Adapters The HITL system supports three interaction channels: ### CLI Adapter (Default) Terminal-based interaction with ANSI colors, `$EDITOR` integration, and multi-line input. Works over SSH. ### WebSocket Adapter For the web dashboard. Provides real-time updates via WebSocket: ``` Browser → WebSocket → ws_adapter.py → waiting.json / response.json → Pipeline ``` Message types: `get_status`, `approve`, `reject`, `edit`, `inject_guidance`, `chat_message`. ### MCP Adapter External AI agents (Claude, OpenClaw) can interact with the HITL system via MCP tool calls: - `hitl_get_status` — Check if the pipeline is waiting - `hitl_approve_stage` — Approve the current gate - `hitl_reject_stage` — Reject with reason - `hitl_inject_guidance` — Provide direction - `hitl_view_output` — Read stage artifacts This enables **agent-in-the-loop** workflows where another AI system reviews and approves the pipeline's work. --- ## 13. Configuration Reference ```yaml hitl: enabled: true # Master switch (default: false) mode: co-pilot # Intervention mode (see table above) cost_budget_usd: 0.0 # Cost limit in USD (0 = unlimited) notifications: on_pause: true # Notify on pipeline pause on_quality_drop: true # Notify on quality issues on_error: true # Notify on stage errors channels: ["terminal"] # terminal | slack | email | webhook collaboration: llm_model: "" # Model for chat (default: primary model) max_chat_turns: 50 # Max turns per collaboration session save_chat_history: true # Persist chat logs to hitl/ timeouts: default_human_timeout_sec: 86400 # Wait time for human input (24h) auto_proceed_on_timeout: false # Auto-approve on timeout # Per-stage policies (for 'custom' mode) stage_policies: 8: require_approval: true # Must approve before continuing enable_collaboration: true # Enable chat mode pause_before: false # Pause before execution pause_after: true # Pause after execution allow_edit_output: true # Allow editing output files allow_inject_prompt: true # Allow guidance injection stream_output: false # Stream LLM output in real-time min_quality_score: 0.0 # Pause if quality below threshold max_auto_retries: 2 # Auto-retry count before pausing human_timeout_sec: 86400 # Per-stage timeout override auto_proceed_on_timeout: false # Per-stage auto-proceed override ``` ### Environment Variables | Variable | Purpose | |----------|---------| | `EDITOR` | Editor for file editing (default: nano on Unix, notepad on Windows) | | `RESEARCHCLAW_SLACK_WEBHOOK` | Slack webhook URL for notifications | | `RESEARCHCLAW_WEBHOOK_URL` | Generic webhook URL for notifications | --- ## 14. FAQ ### Does HITL slow down the pipeline? Only at the stages where you choose to intervene. In co-pilot mode, ~15 of 23 stages run automatically. Typical human time is 30-60 minutes per run, compared to 2-4 hours of autonomous execution. ### Can I switch modes mid-run? Not currently, but you can resume a paused run with a different mode: ```bash researchclaw run --resume --output artifacts/rc-xxx --mode step-by-step ``` ### What if I'm not sure what to do at a pause? Press `v` to view the full output, then `c` to chat with the AI about it. The AI can explain what it did and why, and suggest what to focus on. ### Does HITL work with ACP/OpenClaw? Yes. The MCP adapter exposes HITL tools that any ACP-compatible agent can call. OpenClaw can automatically review and approve gates. ### What data does HITL store? Everything goes in `{run_dir}/hitl/`: - `session.json` — Session state - `interventions.jsonl` — All interventions (append log) - `chat_stage_NN.jsonl` — Chat histories - `snapshots/` — File backups before edits - `guidance/` — Injected guidance per stage - `notifications.jsonl` — Notification log ### Is it backward compatible? Yes. Without `hitl.enabled: true` or `--mode`, the pipeline behaves identically to v0.3.x. The `--auto-approve` flag still works and takes precedence over HITL settings.