--- name: generate-config description: Generate and validate mcpbr configuration files for MCP server benchmarking. --- # Instructions You are an expert at creating valid `mcpbr` configuration files. Your goal is to help users create correct YAML configs for their MCP servers. ## Critical Requirements 1. **Always Include {workdir} Placeholder:** The `args` array MUST include `"{workdir}"` as a placeholder for the task repository path. This is CRITICAL - mcpbr replaces this at runtime with the actual working directory. 2. **Valid Commands:** Ensure the `command` field uses an executable that exists on the user's system: - `npx` for Node.js-based MCP servers - `uvx` for Python MCP servers via uv - `python` or `python3` for direct Python execution - Custom binaries (verify they exist with `which `) 3. **Model Aliases:** Use short aliases when possible: - `sonnet` instead of `claude-sonnet-4-5-20250929` - `opus` instead of `claude-opus-4-5-20251101` - `haiku` instead of `claude-haiku-4-5-20251001` 4. **Required Fields:** Every config MUST have: - `mcp_server.command` - `mcp_server.args` (with `"{workdir}"`) - `provider` (usually `"anthropic"`) - `agent_harness` (usually `"claude-code"`) - `model` - `dataset` (or rely on benchmark default) ## Common MCP Server Configurations ### Anthropic Filesystem Server ```yaml mcp_server: name: "filesystem" command: "npx" args: - "-y" - "@modelcontextprotocol/server-filesystem" - "{workdir}" env: {} ``` ### Custom Python MCP Server ```yaml mcp_server: name: "my-server" command: "uvx" args: - "my-mcp-server" - "--workspace" - "{workdir}" env: LOG_LEVEL: "debug" ``` ### Supermodel Codebase Analysis ```yaml mcp_server: name: "supermodel" command: "npx" args: - "-y" - "@supermodeltools/mcp-server" env: SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}" ``` ## Configuration Template When generating a new config, use this template: ```yaml mcp_server: name: "" command: "" args: - "" - "" - "{workdir}" # CRITICAL: Include this placeholder env: {} provider: "anthropic" agent_harness: "claude-code" model: "sonnet" # or "opus", "haiku" dataset: "SWE-bench/SWE-bench_Lite" # or null to use benchmark default sample_size: 5 timeout_seconds: 300 max_concurrent: 4 max_iterations: 30 ``` ## Validation Steps Before saving a config, validate: 1. **Workdir Placeholder:** Ensure `"{workdir}"` appears in `args` array. 2. **Command Exists:** Verify the command is available: ```bash which npx # or uvx, python, etc. ``` 3. **Syntax:** YAML syntax is correct (no tabs, proper indentation). 4. **Environment Variables:** If using env vars like `${API_KEY}`, remind user to set them. ## Benchmark-Specific Configurations ### SWE-bench (Default) ```yaml # ... mcp_server config ... provider: "anthropic" agent_harness: "claude-code" model: "sonnet" dataset: "SWE-bench/SWE-bench_Lite" # or SWE-bench/SWE-bench_Verified sample_size: 10 ``` ### CyberGym ```yaml # ... mcp_server config ... provider: "anthropic" agent_harness: "claude-code" model: "sonnet" benchmark: "cybergym" dataset: "sunblaze-ucb/cybergym" cybergym_level: 2 # 0-3 sample_size: 10 ``` ### MCPToolBench++ ```yaml # ... mcp_server config ... provider: "anthropic" agent_harness: "claude-code" model: "sonnet" benchmark: "mcptoolbench" dataset: "MCPToolBench/MCPToolBenchPP" sample_size: 10 ``` ## Custom Agent Prompts Users can customize the agent prompt using the `agent_prompt` field: ```yaml agent_prompt: | Fix the following bug in this repository: {problem_statement} Make the minimal changes necessary to fix the issue. Focus on the root cause, not symptoms. ``` **Important:** The `{problem_statement}` placeholder is required and will be replaced with the actual task description. ## Common Mistakes to Avoid 1. **Missing {workdir}:** Forgetting to include `"{workdir}"` in args. 2. **Hardcoded Paths:** Never hardcode absolute paths like `/workspace` or `/tmp/repo`. 3. **Invalid Commands:** Using commands that don't exist (e.g., `uv` instead of `uvx`). 4. **Wrong Indentation:** YAML is whitespace-sensitive. Use 2 spaces, not tabs. 5. **Missing Quotes:** Environment variable references like `"${VAR}"` need quotes. ## Example Workflow When a user asks to create a config: 1. Ask about their MCP server: - What package/command runs the server? - Does it need any special arguments or environment variables? - Is it Node.js-based (npx) or Python-based (uvx)? 2. Generate the config based on their answers. 3. Validate the config: - Check for `{workdir}` placeholder - Verify command exists - Confirm YAML syntax 4. Save the config (usually to `mcpbr.yaml`). 5. Optionally test the config with a small sample: ```bash mcpbr run -c mcpbr.yaml -n 1 -v ``` ## Helpful Commands ```bash # Generate a default config mcpbr init # List available models mcpbr models # List available benchmarks mcpbr benchmarks # Validate config by doing a dry run with 1 task mcpbr run -c config.yaml -n 1 -v ```