# AutoResearchClaw Integration Guide

> **The simplest way to use AutoResearchClaw**: give the repo URL to [OpenClaw](https://github.com/openclaw/openclaw) and say *"Research [your topic]."* That's it: OpenClaw handles cloning, installing, configuring, and running the entire 23-stage pipeline for you. This guide is for humans who want to understand what's happening under the hood, or who prefer to set things up manually.

---

## Table of Contents

1. [The Easy Way: OpenClaw](#1-the-easy-way-openclaw)
2. [Manual Setup](#2-manual-setup)
3. [Configuration Walkthrough](#3-configuration-walkthrough)
4. [Running the Pipeline](#4-running-the-pipeline)
5. [Understanding the 23 Stages](#5-understanding-the-23-stages)
6. [Output Artifacts](#6-output-artifacts)
7. [Experiment Modes](#7-experiment-modes)
8. [Conference Templates](#8-conference-templates)
9. [OpenClaw Bridge (Advanced)](#9-openclaw-bridge-advanced)
10. [MetaClaw Integration (Cross-Run Learning)](#10-metaclaw-integration-cross-run-learning)
11. [Other AI Platforms](#11-other-ai-platforms)
12. [Python API](#12-python-api)
13. [Troubleshooting](#13-troubleshooting)
14. [FAQ](#14-faq)

---

## 1. The Easy Way: OpenClaw

If you use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant, you don't need to read the rest of this guide.

### Steps

1. Share the GitHub repo URL with OpenClaw:

   ```
   https://github.com/aiming-lab/AutoResearchClaw
   ```

2. OpenClaw reads `RESEARCHCLAW_AGENTS.md` and `README.md`, and from them learns how the entire system works.

   > **Note:** `RESEARCHCLAW_AGENTS.md` is generated locally and listed in `.gitignore`. If it doesn't exist, OpenClaw can bootstrap from `README.md` and the project structure.

3. Say something like:

   ```
   Research the application of graph neural networks in drug discovery
   ```

4. OpenClaw will:
   - Clone the repo
   - Create a virtual environment and install dependencies (`pip install -e .`)
   - Copy `config.researchclaw.example.yaml` → `config.yaml`
   - Ask you for an OpenAI API key (or use your environment variable)
   - Run the full 23-stage pipeline
   - Return the paper, experiment code, charts, and citations

**That's the whole process.** OpenClaw is designed to read agent definition files and bootstrap itself. AutoResearchClaw provides this context (`RESEARCHCLAW_AGENTS.md` when generated, otherwise `README.md`) specifically so that any OpenClaw-compatible AI assistant can pick it up and run.

### What if I want to tweak settings?

Tell OpenClaw in natural language:

- *"Use GPT-5.2 instead of GPT-4o"*
- *"Run experiments in sandbox mode, not simulated"*
- *"Target ICLR 2025 format instead of NeurIPS"*
- *"Skip the quality gate, just auto-approve everything"*

OpenClaw will modify `config.yaml` accordingly before running the pipeline.

---

## 2. Manual Setup

### Prerequisites

| Requirement | Details |
|-------------|---------|
| **Python** | 3.11 or newer |
| **LLM API** | Any OpenAI-compatible endpoint (OpenAI, Azure, local proxy, etc.) |
| **Disk space** | ~100 MB for the repo, plus artifacts for each run |
| **Network** | Required for LLM API calls and literature search (Semantic Scholar, arXiv) |

### Installation

```bash
# Clone the repository
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw

# Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate      # macOS/Linux
# .venv\Scripts\activate       # Windows

# Install
pip install -e .
```

### Verify Installation

```bash
# Check the CLI is available
researchclaw --help

# Validate your configuration (after creating config.yaml; see Section 3)
researchclaw validate --config config.yaml
```

---

## 3. Configuration Walkthrough

Start from the provided template:

```bash
cp config.researchclaw.example.yaml config.yaml
```

Open `config.yaml` in your editor. Here's what each section does:

### LLM Settings (Required)

This is the only section you **must** configure.
Everything else has sensible defaults.

```yaml
llm:
  base_url: "https://api.openai.com/v1"   # Your LLM API endpoint
  api_key_env: "OPENAI_API_KEY"           # Environment variable name...
  api_key: ""                             # ...or paste the key directly here
  primary_model: "gpt-4o"                 # Model to use (gpt-4o, gpt-5.2, etc.)
  fallback_models:                        # Tried in order if primary fails
    - "gpt-4.1"
    - "gpt-4o-mini"
  s2_api_key: ""                          # Optional: Semantic Scholar API key for higher rate limits
```

**Using an environment variable** (recommended for security):

```bash
export OPENAI_API_KEY="sk-..."
```

**Using a direct key** (simpler, less secure):

```yaml
llm:
  api_key: "sk-your-key-here"
```

**Using a proxy or alternative provider**:

```yaml
llm:
  base_url: "https://your-proxy.example.com/v1"
  api_key: "your-proxy-key"
  primary_model: "gpt-4o"   # Must be supported by your endpoint
```

### Research Settings

```yaml
research:
  topic: "Your research topic here"   # Can also be set via CLI --topic flag
  domains:
    - "machine-learning"              # Guides literature search scope
  daily_paper_count: 10               # Target papers to collect
  quality_threshold: 4.0              # Minimum paper quality score (1-5)
```

### Experiment Settings

```yaml
experiment:
  mode: "sandbox"                 # How experiments run (see Section 7)
  time_budget_sec: 300            # Max seconds per experiment run
  max_iterations: 10              # Max refinement loops in Stage 13
  metric_key: "primary_metric"    # What metric to optimize
  metric_direction: "minimize"    # "minimize" or "maximize"
  sandbox:
    python_path: ".venv/bin/python3"   # Python binary for sandbox execution
    gpu_required: false
    max_memory_mb: 4096
  code_agent:                     # CodeAgent v2 (multi-phase code generation)
    enabled: true                 # Architecture planning + sequential file gen + hard validation
  benchmark_agent:                # Automated dataset & baseline selection
    enabled: true                 # 4-agent pipeline: Surveyor→Selector→Acquirer→Validator
  figure_agent:                   # Academic figure generation
    enabled: true                 # 5-agent pipeline: Planner→CodeGen→Renderer→Critic→Integrator
  repair:                         # Anti-fabrication experiment repair
    enabled: true                 # Diagnose and fix failed experiments before paper writing
    max_cycles: 3                 # Repair retry loops
  opencode:                       # OpenCode Beast Mode (see README for details)
    enabled: true
```

### Export Settings

```yaml
export:
  target_conference: "neurips_2025"   # See Section 8 for all available templates
  authors: "Anonymous"                # Author line in the paper
  bib_file: "references"              # BibTeX file name (without .bib)
```

### Everything Else (Optional)

These have reasonable defaults. Change them only if you need to:

```yaml
project:
  name: "my-research"      # Just an identifier for your run
  mode: "full-auto"        # "docs-first", "semi-auto", or "full-auto"

runtime:
  timezone: "America/New_York"
  max_parallel_tasks: 3
  approval_timeout_hours: 12
  retry_limit: 2

security:
  hitl_required_stages: [5, 9, 20]   # Stages that pause for human approval
  allow_publish_without_approval: false

notifications:
  channel: "console"       # "console", "discord", or "slack"

knowledge_base:
  backend: "markdown"
  root: "docs/kb"
```

---

## 4. Running the Pipeline

### Basic Run

```bash
# Run with topic from config.yaml
researchclaw run --config config.yaml --auto-approve

# Override topic from command line
researchclaw run --config config.yaml --topic "Transformer attention for time series" --auto-approve
```

### CLI Commands

| Command | What It Does |
|---------|-------------|
| `researchclaw setup` | Interactive first-time setup (installs OpenCode Beast Mode, checks Docker/LaTeX) |
| `researchclaw init` | Interactive config creation (choose LLM provider, creates `config.arc.yaml`) |
| `researchclaw run` | Run the full 23-stage pipeline |
| `researchclaw validate` | Check your config file for errors |
| `researchclaw doctor` | Diagnose environment issues (Python, dependencies, API connectivity) |
| `researchclaw report --run-dir ` | Generate a human-readable summary of a completed run |

### Run Flags

| Flag | Effect |
|------|--------|
| `-t`, `--topic "..."` | Override the topic in config.yaml |
| `-c`, `--config path` | Config file path (default: `config.yaml`) |
| `--output path` | Output directory (default: `artifacts//`) |
| `--auto-approve` | Skip manual approval at gate stages (5, 9, 20) |
| `--from-stage STAGE_NAME` | Start from a specific stage (e.g., `PAPER_OUTLINE`) |
| `--resume` | Resume from the last checkpoint (auto-detects the most recent run matching your topic) |
| `--skip-preflight` | Skip the LLM connectivity check before starting |
| `--skip-noncritical-stage` | Skip non-critical stages on failure instead of aborting |
| `--no-graceful-degradation` | Fail the pipeline on quality gate failure instead of degrading gracefully |

### Examples

```bash
# Full autonomous run, no human intervention
researchclaw run -c config.yaml -t "Graph neural networks for protein folding" --auto-approve

# Resume a failed run from where it stopped
researchclaw run -c config.yaml --resume --auto-approve

# Re-run just the paper writing stages
researchclaw run -c config.yaml --from-stage PAPER_OUTLINE --auto-approve

# Check your setup before running
researchclaw doctor -c config.yaml
```

---

## 5. Understanding the 23 Stages

The pipeline runs in 8 phases. Each stage reads artifacts from previous stages and produces new ones.

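
Because each stage consumes the artifacts of the stages before it, starting mid-pipeline (for example with `--from-stage`) only makes sense if those earlier outputs exist. The sketch below is a hypothetical pre-flight check, not part of the `researchclaw` package: it assumes the `stage-N/` directory layout shown in Section 6, and the helper names `stage_number` and `missing_prerequisites` are invented for illustration.

```python
from pathlib import Path

# Stage names in pipeline order, copied from the phase tables in this section.
STAGES = [
    "TOPIC_INIT", "PROBLEM_DECOMPOSE", "SEARCH_STRATEGY", "LITERATURE_COLLECT",
    "LITERATURE_SCREEN", "KNOWLEDGE_EXTRACT", "SYNTHESIS", "HYPOTHESIS_GEN",
    "EXPERIMENT_DESIGN", "CODE_GENERATION", "RESOURCE_PLANNING", "EXPERIMENT_RUN",
    "ITERATIVE_REFINE", "RESULT_ANALYSIS", "RESEARCH_DECISION", "PAPER_OUTLINE",
    "PAPER_DRAFT", "PEER_REVIEW", "PAPER_REVISION", "QUALITY_GATE",
    "KNOWLEDGE_ARCHIVE", "EXPORT_PUBLISH", "CITATION_VERIFY",
]


def stage_number(name: str) -> int:
    """Map a stage name such as "PAPER_OUTLINE" to its 1-based stage number."""
    return STAGES.index(name) + 1


def missing_prerequisites(run_dir: Path, start_stage: str) -> list[int]:
    """List earlier stage numbers whose stage-N/ directory does not exist.

    Hypothetical helper: a directory's presence is treated as "stage done";
    the artifacts inside it are not inspected.
    """
    n = stage_number(start_stage)
    return [i for i in range(1, n) if not (run_dir / f"stage-{i}").is_dir()]


run_dir = Path("artifacts/rc-20260310-143200-a1b2c3")
gaps = missing_prerequisites(run_dir, "PAPER_OUTLINE")
if gaps:
    print(f"Stages {gaps} have no outputs yet; start from an earlier stage.")
```

Checking only for directories keeps the sketch cheap; a stricter check would look for the specific files each stage is listed as producing.
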
### Phase A: Research Scoping

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 1 | TOPIC_INIT | LLM formulates a SMART research goal; auto-detects GPU hardware (NVIDIA/MPS/CPU) | `goal.md`, `hardware_profile.json` |
| 2 | PROBLEM_DECOMPOSE | Breaks the goal into prioritized sub-questions | `problem_tree.md` |

### Phase B: Literature Discovery

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 3 | SEARCH_STRATEGY | Plans search queries and data sources | `search_plan.yaml`, `sources.json` |
| 4 | LITERATURE_COLLECT | Queries **real APIs** (arXiv-first, then Semantic Scholar) with expanded queries for broad coverage | `candidates.jsonl` |
| 5 | LITERATURE_SCREEN | **[Gate]** Filters by relevance and quality | `shortlist.jsonl` |
| 6 | KNOWLEDGE_EXTRACT | Extracts structured knowledge cards from each paper | `cards/` |

### Phase C: Knowledge Synthesis

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 7 | SYNTHESIS | Clusters findings, identifies research gaps | `synthesis.md` |
| 8 | HYPOTHESIS_GEN | Generates falsifiable hypotheses | `hypotheses.md` |

### Phase D: Experiment Design

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 9 | EXPERIMENT_DESIGN | **[Gate]** Designs experiment plan with baselines and metrics | `exp_plan.yaml` |
| 10 | CODE_GENERATION | LLM writes hardware-aware experiment code (adapts packages/constraints to GPU tier) | `experiment.py`, `experiment_spec.md` |
| 11 | RESOURCE_PLANNING | Estimates GPU/time requirements | `schedule.json` |

### Phase E: Experiment Execution

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 12 | EXPERIMENT_RUN | Runs the experiment code in the configured mode (see Section 7); injects an immutable harness for time guarding and metric validation; captures partial results on timeout | `runs/` |
| 13 | ITERATIVE_REFINE | LLM analyzes results, improves code, and re-runs (up to 10 iterations, per `experiment.max_iterations`); timeout-aware prompts; NaN/divergence fast-fail; stdout truncated for context efficiency | `refinement_log.json`, `experiment_final.py` |

### Phase F: Analysis & Decision

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 14 | RESULT_ANALYSIS | Statistical analysis of experiment results | `analysis.md` |
| 15 | RESEARCH_DECISION | PROCEED / PIVOT decision with evidence | `decision.md` |

### Phase G: Paper Writing

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 16 | PAPER_OUTLINE | Creates section-level paper outline | `outline.md` |
| 17 | PAPER_DRAFT | Writes the paper section by section (3 LLM calls, 5,000-6,500 words); **hard-blocked when there are no experiment metrics** (anti-fabrication); injects conference-grade title guidelines and abstract structure | `paper_draft.md` |
| 18 | PEER_REVIEW | Simulates 2+ reviewer perspectives with a NeurIPS/ICML rubric (1-10 scoring); checks baselines, ablations, and claims vs. evidence | `reviews.md` |
| 19 | PAPER_REVISION | Addresses review comments with a length guard (auto-retries if the revised paper is shorter than the draft) | `paper_revised.md` |

### Phase H: Finalization

| # | Stage | What Happens | Produces |
|---|-------|-------------|----------|
| 20 | QUALITY_GATE | **[Gate]** Checks paper quality score | `quality_report.json` |
| 21 | KNOWLEDGE_ARCHIVE | Saves retrospective + reproducibility bundle | `archive.md`, `bundle_index.json` |
| 22 | EXPORT_PUBLISH | Generates LaTeX, charts, and code package | `paper_final.md`, `paper.tex`, `code/` |
| 23 | CITATION_VERIFY | Fact-checks all references against real APIs | `verification_report.json`, `references_verified.bib` |

### Gate Stages

Three stages pause for human review (unless `--auto-approve` is set):

| Gate | What's Being Reviewed | On Reject, Rolls Back To |
|------|-----------------------|--------------------------|
| Stage 5 | Are the collected papers relevant and sufficient? | Stage 4 (re-collect literature) |
| Stage 9 | Is the experiment design sound? | Stage 8 (re-generate hypotheses) |
| Stage 20 | Does the paper meet quality standards? | Stage 16 (re-write from outline) |

For fully autonomous operation, always use `--auto-approve`.

---

## 6. Output Artifacts

Each run creates a timestamped directory under `artifacts/`:

```
artifacts/rc-20260310-143200-a1b2c3/
├── stage-1/goal.md                    # Research goal
├── stage-2/problem_tree.md            # Problem decomposition
├── stage-3/search_plan.yaml           # Search strategy
├── stage-4/candidates.jsonl           # Raw literature results
├── stage-5/shortlist.jsonl            # Screened papers
├── stage-6/cards/                     # Knowledge cards (one per paper)
├── stage-7/synthesis.md               # Research gap analysis
├── stage-8/hypotheses.md              # Research hypotheses
├── stage-9/exp_plan.yaml              # Experiment plan
├── stage-10/experiment.py             # Generated experiment code
├── stage-10/experiment_spec.md        # Experiment specification
├── stage-11/schedule.json             # Resource schedule
├── stage-12/runs/run-1.json           # Experiment results
├── stage-13/experiment_final.py       # Refined experiment code
├── stage-13/experiment_v1.py          # Iteration 1 snapshot
├── stage-13/refinement_log.json       # Refinement history
├── stage-14/analysis.md               # Statistical analysis
├── stage-14/experiment_summary.json   # Metrics summary
├── stage-15/decision.md               # Proceed/Pivot decision
├── stage-16/outline.md                # Paper outline
├── stage-17/paper_draft.md            # Full paper draft
├── stage-18/reviews.md                # Simulated peer reviews
├── stage-19/paper_revised.md          # Revised paper
├── stage-20/quality_report.json       # Quality assessment
├── stage-21/archive.md                # Knowledge retrospective
├── stage-22/
│   ├── paper_final.md                 # Final paper (Markdown)
│   ├── paper.tex                      # Conference-ready LaTeX
│   ├── references.bib                 # BibTeX references
│   ├── charts/                        # Result visualizations
│   └── code/                          # Open-source code package
│       ├── experiment.py
│       ├── requirements.txt
│       └── README.md
├── stage-23/
│   ├── verification_report.json       # Citation fact-check results
│   └── references_verified.bib        # Cleaned bibliography
└── pipeline_summary.json              # Overall execution summary
```

### Key Output Files

| File | What You'll Use It For |
|------|----------------------|
| `stage-22/paper.tex` | Submit to a conference (compile with `pdflatex` or `tectonic`) |
| `stage-22/paper_final.md` | Read or further edit the paper |
| `stage-22/references.bib` | Bibliography for LaTeX compilation |
| `stage-22/code/` | Share experiment code alongside the paper |
| `stage-23/verification_report.json` | Check which citations are real vs. hallucinated |
| `stage-13/experiment_final.py` | The best-performing experiment code |
| `stage-22/charts/` | Figures for the paper |

---

## 7. Experiment Modes

AutoResearchClaw supports four modes for running experiments:

### Simulated (Default)

```yaml
experiment:
  mode: "simulated"
```

The LLM **generates synthetic experiment results** without executing any code. This is fast and requires no special setup, but the results are not real.

**Best for**: Quick prototyping, testing the pipeline end-to-end, environments without Python scientific packages.

### Sandbox

```yaml
experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python3"
    gpu_required: false
    max_memory_mb: 4096
```

The pipeline **generates Python code and actually runs it** in a subprocess. The code is validated before execution (AST parsing, import whitelist, no file I/O outside the sandbox).

**Hardware-aware**: Stage 1 auto-detects your GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts the generated code accordingly: high-tier GPUs get full PyTorch code, limited GPUs get lightweight experiments, and CPU-only machines get NumPy/sklearn code only.

**Best for**: Real experiments on your local machine. Supports numpy and the standard library; deep learning frameworks (torch, tensorflow) are available if they are installed in your environment and a GPU is detected.

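
The pre-execution check described above (AST parsing plus an import whitelist) can be approximated with Python's standard `ast` module. This is a minimal sketch, not AutoResearchClaw's actual validator: the contents of `ALLOWED_IMPORTS` and `BLOCKED_CALLS` are illustrative assumptions, and it does not attempt the file-I/O restriction. Static checks like this are not a security boundary on their own, which is why they are paired with memory and time limits.

```python
import ast

# Illustrative policy only; the real sandbox whitelist is not documented here.
ALLOWED_IMPORTS = {"math", "random", "statistics", "json", "time", "numpy"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}


def validate_code(source: str) -> list[str]:
    """Statically check generated code; return a list of violations (empty = OK)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    for node in ast.walk(tree):
        # Collect module names from `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have module=None
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"line {node.lineno}: import '{module}' not allowed")

        # Flag direct calls to dangerous builtins such as eval() or exec().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"line {node.lineno}: call to {node.func.id}() blocked")
    return problems


print(validate_code("import numpy as np\nx = np.zeros(3)"))  # []
print(validate_code("import subprocess\neval('1 + 1')"))
```

Returning a list of messages rather than raising on the first problem mirrors the auto-repair loop: all violations can be handed back to the LLM in one pass.
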

**Safety features**:

- Code validation blocks dangerous operations (subprocess, eval, exec, network calls)
- Configurable memory limit and execution timeout
- Auto-repair: if generated code has validation errors, the LLM fixes them (up to 3 attempts)

### Docker

```yaml
experiment:
  mode: "docker"
  docker:
    image: "researchclaw/experiment:latest"
    gpu_enabled: true
    memory_limit_mb: 8192
    network_policy: "setup_only"   # none | setup_only | pip_only | full
    auto_install_deps: true
    shm_size_mb: 2048
```

The pipeline runs generated code inside a **Docker container** with GPU passthrough, dependency auto-installation, and network isolation. Execution follows a **three-phase model** within a single container:

1. **Phase 0 (pip install)**: Installs auto-detected dependencies from `requirements.txt` (network enabled under the default `setup_only` policy)
2. **Phase 1 (setup.py)**: Runs `setup.py` for dataset downloads and environment preparation (network enabled under the default `setup_only` policy)
3. **Phase 2 (experiment)**: Executes the experiment code (network disabled by default via iptables)

**Network policies**:

- `none`: No network at all (all phases offline). Requires all dependencies to be pre-installed in the image.
- `setup_only` (default): Network during Phases 0 and 1, disabled before Phase 2 via iptables (`--cap-add=NET_ADMIN`).
- `pip_only`: Network only during Phase 0 (pip install), disabled for Phases 1 and 2.
- `full`: Network available throughout all phases.

**Pre-cached datasets**: The Docker image includes CIFAR-10/100, MNIST, FashionMNIST, STL-10, and SVHN at `/opt/datasets`, mounted read-only as `/workspace/data`. No download is needed for these standard benchmarks.

**Best for**: Reproducible experiments with full dependency isolation. Supports GPU passthrough (NVIDIA) and configurable network policies.

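
The four network policies above boil down to a lookup from policy to the set of phases that may use the network. The following sketch encodes exactly that list; the names `NETWORK_BY_POLICY` and `network_enabled` are hypothetical and not part of AutoResearchClaw, which enforces the policy with iptables inside the container rather than in Python.

```python
# Phase numbers of the single-container Docker run described above.
PIP_INSTALL, SETUP, EXPERIMENT = 0, 1, 2

# Phases with network access under each `network_policy` value.
NETWORK_BY_POLICY = {
    "none": frozenset(),
    "setup_only": frozenset({PIP_INSTALL, SETUP}),  # the default
    "pip_only": frozenset({PIP_INSTALL}),
    "full": frozenset({PIP_INSTALL, SETUP, EXPERIMENT}),
}


def network_enabled(policy: str, phase: int) -> bool:
    """Return True if `phase` may use the network under `policy`."""
    if policy not in NETWORK_BY_POLICY:
        raise ValueError(f"unknown network_policy: {policy!r}")
    return phase in NETWORK_BY_POLICY[policy]


# Print a policy-by-phase matrix (columns: Phase 0, 1, 2).
for policy in NETWORK_BY_POLICY:
    cells = ["on " if network_enabled(policy, p) else "off" for p in (PIP_INSTALL, SETUP, EXPERIMENT)]
    print(f"{policy:<11}", *cells)
```

Reading the matrix makes the trade-off visible: only `full` exposes the experiment phase to the network, which is why `setup_only` is the safer default.
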

**Setup**: Build the image first:

```bash
docker build -t researchclaw/experiment:latest researchclaw/docker/
```

### SSH Remote

```yaml
experiment:
  mode: "ssh_remote"
  ssh_remote:
    host: "gpu-server.example.com"
    gpu_ids: [0, 1]
    remote_workdir: "/tmp/researchclaw_experiments"
```

The pipeline sends generated code to a remote GPU server for execution.

**Best for**: Experiments that require GPU hardware you don't have locally.

---

## 8. Conference Templates

AutoResearchClaw generates LaTeX files formatted for specific conferences:

```yaml
export:
  target_conference: "neurips_2025"
```

| Conference | Config Value | Layout |
|------------|-------------|--------|
| NeurIPS 2025 | `neurips_2025` (default) | Single-column, `neurips_2025` style |
| NeurIPS 2024 | `neurips_2024` | Single-column, `neurips_2024` style |
| ICLR 2026 | `iclr_2026` | Single-column, `iclr2026_conference` style |
| ICLR 2025 | `iclr_2025` | Single-column, `iclr2025_conference` style |
| ICML 2026 | `icml_2026` | Double-column, `icml2026` style |
| ICML 2025 | `icml_2025` | Double-column, `icml2025` style |

Short aliases are also accepted: `neurips` (→ 2025), `iclr` (→ 2026), `icml` (→ 2026).

The Markdown-to-LaTeX converter handles:

- Section headings (`#`, `##`, `###`)
- Inline and display math (`$...$`, `$$...$$`)
- Bold and italic text
- Ordered and unordered lists
- Tables
- Code blocks
- Citation references (`[cite_key]` → `\cite{cite_key}`)

### Compiling the LaTeX

```bash
# Using tectonic (recommended)
tectonic artifacts//stage-22/paper.tex

# Using pdflatex
cd artifacts//stage-22/
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex
```

---

## 9. OpenClaw Bridge (Advanced)

For deeper integration with OpenClaw, AutoResearchClaw includes a bridge adapter system.
Each flag in the config activates a typed protocol interface:

```yaml
openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # Live web search during literature review
  use_browser: false          # Browser-based paper collection
```

### What Each Adapter Does

| Adapter | Protocol | Use Case |
|---------|----------|----------|
| **Cron** | `CronAdapter.schedule_resume(run_id, stage_id, reason)` | Schedule pipeline resumption (e.g., daily re-runs) |
| **Message** | `MessageAdapter.notify(channel, subject, body)` | Send progress updates to chat platforms |
| **Memory** | `MemoryAdapter.append(namespace, content)` | Persist knowledge across sessions |
| **Sessions** | `SessionsAdapter.spawn(name, command)` | Run pipeline stages in parallel sub-sessions |
| **WebFetch** | `WebFetchAdapter.fetch(url)` | Fetch web pages during literature search |
| **Browser** | `BrowserAdapter.open(url)` | Open and interact with web pages |

When OpenClaw provides a capability (e.g., message sending), the adapter consumes it automatically. When running standalone, recording stubs capture all calls for debugging without side effects. This is an **extension point**: you don't need to configure it for basic usage.

---

## 10. MetaClaw Integration (Cross-Run Learning)

[MetaClaw](https://github.com/aiming-lab/MetaClaw) adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and converts them into reusable skills that improve subsequent runs.

### Architecture

```
┌──────────────────────────────────────────────────────┐
│              AutoResearchClaw Pipeline               │
│  Stage 1 → 2 → ... → 23                              │
│                                                      │
│  ┌─────────────┐    ┌──────────────────────────────┐ │
│  │  LLMClient  │───▶│  MetaClaw Integration Layer  │ │
│  │             │    │  (metaclaw_bridge module)    │ │
│  └─────────────┘    └──────────┬───────────────────┘ │
│                                │                     │
│  ┌─────────────┐    ┌──────────▼───────────────────┐ │
│  │  Evolution  │◀──▶│  Lesson ↔ Skill Bridge       │ │
│  │  Store      │    └──────────────────────────────┘ │
│  └─────────────┘                                     │
└──────────────────────────┬───────────────────────────┘
                           │
           ┌───────────────▼───────────────┐
           │     MetaClaw Proxy Server     │
           │     (optional, :30000)        │
           │ ┌───────────────────────────┐ │
           │ │ SkillManager (40+ skills) │ │
           │ │ + arc-* learned skills    │ │
           │ └───────────────────────────┘ │
           └───────────────────────────────┘
```

### How It Works

1. **Lesson Capture**: During each pipeline run, the `EvolutionStore` automatically records failures, warnings, and anomalies as structured lessons in `evolution/lessons.jsonl`.
2. **Lesson → Skill Conversion**: After a run completes, lessons at or above a configurable severity threshold are converted into `arc-*` skill files stored in `~/.metaclaw/skills/`. Each skill contains trigger conditions, the failure's root cause, and actionable guidance.
3. **Skill Injection**: On the next run, `build_overlay()` reads all `arc-*` skills and injects them into the LLM prompt for every stage via the `evolution_overlay` parameter. The LLM receives explicit instructions to avoid previously encountered pitfalls.
4. **Proxy Routing (Optional)**: When the MetaClaw proxy is running, LLM requests are routed through it for additional skill matching and session tracking. If the proxy is unavailable, requests automatically fall back to the direct LLM endpoint.

### Setup

#### Step 1: Install MetaClaw

```bash
pip install metaclaw

# Or clone from source:
git clone https://github.com/aiming-lab/MetaClaw.git
cd MetaClaw && pip install -e .
```

#### Step 2: Configure

Add the `metaclaw_bridge` section to your config file (`config.arc.yaml` if you created it with `researchclaw init`):

```yaml
metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000/v1"      # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"            # Skill storage directory
  fallback_url: "https://api.openai.com/v1"   # Direct LLM fallback
  fallback_api_key_env: "OPENAI_API_KEY"
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                   # Convert warnings + errors
    max_skills_per_run: 5                     # Max new skills per run
```

#### Step 3: Run

```bash
# First run: captures lessons, generates initial skills
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

# Check generated skills
ls ~/.metaclaw/skills/arc-*/SKILL.md

# Second run: skills from Run 1 are automatically injected
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
```

#### Optional: Start MetaClaw Proxy

For full skill matching and session tracking:

```bash
metaclaw start --mode skills_only --port 30000

# Or use the provided script:
bash scripts/metaclaw_start.sh
```

The proxy is optional. Without it, the pipeline still benefits from skill injection via `build_overlay()` and falls back to your configured LLM endpoint.

### Experiment Results

In controlled A/B experiments (same topic, same LLM, same configuration):

| Metric | Baseline | With MetaClaw | Relative Change |
|--------|----------|---------------|-----------------|
| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
| Refine cycle count | 2.0 | 1.2 | **-40.0%** |
| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |

> Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).

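
Step 2 of "How It Works" and the `lesson_to_skill` settings (`min_severity`, `max_skills_per_run`) describe a filter-then-cap step. Here is a minimal sketch of that step. It assumes each line of `evolution/lessons.jsonl` is a JSON object with a `severity` field and that severities rank `info < warning < error`; neither the field name nor the `info` level is specified in this guide, so treat both as hypothetical, along with the function name `select_lessons`.

```python
import json
from collections.abc import Iterable

# Assumed ordering; the config comments only mention "warning" and "error".
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def _rank(lesson: dict) -> int:
    # Unknown or missing severities rank lowest.
    return SEVERITY_RANK.get(lesson.get("severity"), 0)


def select_lessons(
    jsonl_lines: Iterable[str],
    min_severity: str = "warning",
    max_skills_per_run: int = 5,
) -> list[dict]:
    """Keep lessons at or above `min_severity`, most severe first, then cap."""
    threshold = SEVERITY_RANK[min_severity]
    kept = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        lesson = json.loads(line)
        if _rank(lesson) >= threshold:
            kept.append(lesson)
    # sorted() is stable, so equally severe lessons keep their file order.
    kept = sorted(kept, key=_rank, reverse=True)
    return kept[:max_skills_per_run]


sample = [
    '{"severity": "info", "message": "slow literature API"}',
    '{"severity": "error", "message": "NaN loss in stage 13"}',
    "",
    '{"severity": "warning", "message": "timeout in stage 12"}',
]
print([lesson["message"] for lesson in select_lessons(sample, max_skills_per_run=2)])
```

To run it against a real file, pass `Path("evolution/lessons.jsonl").read_text().splitlines()` as `jsonl_lines`. Sorting by severity before capping means that, when the cap is hit, errors are converted ahead of warnings.
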
### Key Files

| File | Purpose |
|------|---------|
| `researchclaw/metaclaw_bridge/` | Integration module (config, session, lesson_to_skill, prm_gate, skill_feedback) |
| `researchclaw/evolution.py` | `build_overlay()`: reads intra-run lessons + cross-run `arc-*` skills |
| `researchclaw/llm/client.py` | Proxy routing with automatic fallback |
| `~/.metaclaw/skills/arc-*/SKILL.md` | Learned skill files (auto-generated) |
| `scripts/metaclaw_start.sh` | Helper script to launch MetaClaw proxy |

### Backward Compatibility

- **Default: OFF.** Without `metaclaw_bridge.enabled: true`, the pipeline is completely unchanged.
- **No new required dependencies.** MetaClaw is optional.
- **All 1,823 existing tests pass** with the integration code.

---

## 11. Other AI Platforms

AutoResearchClaw works with any AI coding assistant that can read project context files.

### Claude Code

Claude Code automatically reads `RESEARCHCLAW_CLAUDE.md` (if present) when you open the project. It also loads the skill definition from `.claude/skills/researchclaw/SKILL.md`.

> **Note:** `RESEARCHCLAW_CLAUDE.md` is generated locally and listed in `.gitignore`. The `.claude/skills/researchclaw/SKILL.md` file is always available in the repo.

```
You: Research the impact of attention mechanisms on speech recognition
Claude: [Reads project context, runs the pipeline, returns results]
```

### Copilot CLI (GitHub)

GitHub Copilot can be used as an ACP agent via the `gh` command (GitHub CLI with the Copilot extension). Set the ACP agent to `gh` in your config:

```yaml
llm:
  provider: "acp"
  acp:
    agent: "gh"
    cwd: "."
```

Prerequisites:

1. Install [GitHub CLI](https://cli.github.com/) (`gh`)
2. Install the Copilot extension: `gh extension install github/gh-copilot`
3. Authenticate: `gh auth login`

### OpenCode

OpenCode loads skills from `.claude/skills/`. The `researchclaw` skill activates on research-related queries and guides the agent through the pipeline.

### Any AI CLI

Provide `RESEARCHCLAW_AGENTS.md` (if generated locally) or `README.md` as context to any AI assistant. `RESEARCHCLAW_AGENTS.md` contains:

- The agent role definition (research orchestrator)
- Quick setup instructions
- Pipeline stage reference
- Decision guide for common scenarios

The agent reads this file and knows how to install, configure, and run the pipeline. If the file is not present, `README.md` and `.claude/skills/researchclaw/SKILL.md` provide sufficient context for any AI assistant to operate the pipeline.

---

## 12. Python API

For programmatic use or custom integrations:

```python
from pathlib import Path

from researchclaw.adapters import AdapterBundle
from researchclaw.config import RCConfig
from researchclaw.pipeline.runner import execute_pipeline

# Load configuration
config = RCConfig.load("config.yaml", check_paths=False)

# Run the full pipeline
results = execute_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    auto_approve_gates=True,
)

# Check results
for result in results:
    print(f"Stage {result.stage.name}: {result.status.value}")
```

### Iterative Pipeline (Multiple Paper Revisions)

```python
from pathlib import Path

from researchclaw.adapters import AdapterBundle
from researchclaw.config import RCConfig
from researchclaw.pipeline.runner import execute_iterative_pipeline

config = RCConfig.load("config.yaml", check_paths=False)

results = execute_iterative_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    max_iterations=3,        # Re-run paper writing up to 3 times
    convergence_rounds=2,    # Stop if quality stabilizes for 2 rounds
)
```

### Literature Search Only

```python
from researchclaw.literature.search import search_papers

papers = search_papers("transformer attention mechanisms", limit=20)
for p in papers:
    print(f"{p.title} ({p.year}) — cited {p.citation_count}x")
    print(p.to_bibtex())
```

---

## 13. Troubleshooting

### Pre-Run Diagnostics

```bash
# Check everything: Python version, dependencies, API connectivity, config validity
researchclaw doctor --config config.yaml
```

### Common Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| `Missing required field: llm.base_url` | Config incomplete | Set `llm.base_url` and `llm.api_key` (or `api_key_env`) |
| `Config validation FAILED` | Invalid YAML or missing fields | Run `researchclaw validate -c config.yaml` for details |
| `Preflight check... FAILED` | LLM API unreachable | Check `base_url`, API key, and network connectivity |
| Sandbox execution fails | Python path wrong or missing packages | Verify `experiment.sandbox.python_path` exists; ensure numpy is installed |
| Code validation rejects all attempts | LLM generates unsafe code | Switch to `simulated` mode, or try a more capable model |
| Gate stage blocks pipeline | Manual approval required | Use `--auto-approve` for autonomous mode |
| Pipeline fails mid-run | Transient API error | Run with `--resume` to continue from the last checkpoint |
| Citations marked HALLUCINATED | LLM invented fake references | This is expected: Stage 23 catches these. Use `references_verified.bib` instead |
| LaTeX won't compile | Missing style packages | Install the conference style files, or use `tectonic`, which downloads them automatically |

### Resuming a Failed Run

```bash
# Resume from the exact point of failure
researchclaw run -c config.yaml --resume --auto-approve

# Or restart from a specific stage
researchclaw run -c config.yaml --from-stage EXPERIMENT_RUN --auto-approve --output artifacts/
```

### Reading a Run Report

```bash
researchclaw report --run-dir artifacts/rc-20260310-143200-a1b2c3
```

This prints a human-readable summary: which stages passed, which failed, key metrics, and paper quality scores.

---

## 14. FAQ

**Q: How much does a full pipeline run cost in API credits?**

A: It depends on your model and topic complexity.
A typical run with GPT-4o makes ~35-60 API calls across all 23 stages (paper drafting uses 3 sequential calls for section-by-section writing). Expect roughly $3-12 per run. Simulated mode uses slightly fewer tokens since it doesn't generate real experiment code.

**Q: Can I use a local LLM (Ollama, vLLM, etc.)?**

A: Yes. Any OpenAI-compatible endpoint works. Set `llm.base_url` to your local server (e.g., `http://localhost:11434/v1` for Ollama). Quality depends heavily on the model's capabilities.

**Q: Can I run only part of the pipeline?**

A: Yes. Use `--from-stage STAGE_NAME` to start from any stage. The stage reads its inputs from previously generated artifacts, so the earlier stages must have completed at least once.

**Q: Are the literature references real?**

A: Yes. Stage 4 uses a multi-source strategy (arXiv-first, then Semantic Scholar) with query expansion to find real papers with real titles, DOIs, and citation counts. The pipeline typically collects 100-200 candidates and aims for 30-60 references in the final paper. Stage 23 then verifies every reference to catch any that the LLM might have hallucinated during paper writing.

**Q: Can I use this for a real paper submission?**

A: AutoResearchClaw is a research tool, not a paper mill. The output is a strong first draft that should be reviewed, improved, and validated by a human researcher before submission. Think of it as an extremely thorough research assistant.

**Q: What happens if the LLM API goes down mid-run?**

A: The pipeline checkpoints after every stage. Use `--resume` to pick up where it left off. Failed stages are retried according to the `max_retries` setting in each stage's contract.

**Q: Can I change the research topic mid-run?**

A: Not recommended, because the pipeline builds on prior stages' outputs. Start a new run with the new topic instead.

---

*Last updated: March 2026 · AutoResearchClaw v0.3.1+*