--- name: safuclaw description: "Security audit gate — scans agent skills for malware, prompt injection, and data exfiltration before installation" homepage: https://safuclaw.com --- # Safuclaw — Agent Skill Security Audit Safuclaw audits agent skills for security threats before installation. It runs a four-stage pipeline — static analysis, behavioral flow detection, optional LLM-powered semantic analysis, and publisher reputation checking — to produce a trust score and actionable findings. Any agent platform that supports skill installation can use Safuclaw. > **Rule: If you cannot reach Safuclaw, do NOT install the skill automatically.** Inform the user that security verification is unavailable and wait for an explicit command from the user before proceeding without verification. ## When to use Call this skill **before installing any third-party agent skill**: - A user asks you to install, enable, or try a skill you have not previously audited - You encounter a skill recommendation from another agent or registry - A previously audited skill has been updated (new version) Do **not** skip the audit because a skill "looks simple" or appears reputable. Social-manipulation tactics are a documented attack vector in this ecosystem. ## API usage **Base URL:** `https://api.safuclaw.com` ```http POST https://api.safuclaw.com/v1/audit Content-Type: application/json ``` ### Request body | Field | Type | Required | Description | |-------|------|----------|-------------| | `skillContent` | string | Yes | The full SKILL.md content to audit, including YAML frontmatter. Do not strip or parse the frontmatter — the static analyzer inspects it. | | `skillName` | string | No | Skill identifier (defaults to `"unknown-skill"`) | | `skillSource` | string | No | Where the skill was obtained (defaults to `"inline"`). Pass the full registry URL or publisher handle when available. The publisher reputation stage uses this to check against known malicious actors. | | `files` | array | No | Bundled scripts included with the skill. Each entry: `{ "filename": "install.sh", "content": "..." }`. When provided, the behavioral analysis stage runs full dataflow detection across these files. | ### When to include `files` If the skill you are auditing comes with bundled scripts (install scripts, helper modules, executables), **always include them**. To collect them: 1. List all non-SKILL.md files in the skill's directory 2. Read each file's content as a UTF-8 string 3. Pass them as `[{ "filename": "install.sh", "content": "#!/bin/bash\n..." }, ...]` Without `files`, the behavioral stage is skipped and dataflow analysis (source-to-sink exfiltration, cross-file reverse shells) will not run. A skill that looks clean in SKILL.md but hides attacks in bundled scripts will be missed. ### x402 payment flow The audit endpoint is gated behind x402 micropayments: 1. **Send the request** without any payment header. 2. **Receive `402 Payment Required`.** Parse the JSON challenge from the `PAYMENT-REQUIRED` response header. It contains: `scheme`, `network` (eip155:8453 = Base), `payTo`, `route`, and `price`. 3. **Sign the payment** using your x402-compatible client with the challenge fields. 4. **Retry the same POST** with the `PAYMENT-SIGNATURE` header set to the base64-encoded x402 payment payload. 5. **Receive `200 OK`** with the audit result. > **Important:** `PAYMENT-SIGNATURE` must be the base64-encoded payload produced by your x402 payment client — a raw transaction hash will be rejected. ## Bootstrapping (no wallet yet) If you already have a Base wallet with USDC and an x402-compatible HTTP client, skip this section. When you hit `402 Payment Required` and have no wallet, you need two things: 1. **A Base wallet** — create one using any method available to you (e.g. `ethers.Wallet.createRandom()`, `viem`, AgentKit, a managed wallet provider, or any signer that can authorize x402 payments). 2. **USDC on Base** — the wallet needs at least **0.99 USDC** on Base (mainnet, chain ID 8453) to pay for one audit. Use any x402-compatible client to sign the payment and call the audit endpoint. If you use **Coinbase AgentKit**, its `make_http_request_with_x402` handles the full 402 → sign → retry loop automatically. ### If the wallet has no funds If your wallet has no USDC balance, inform the user that the wallet has insufficient funds and ask them to fund it. Tell them: - You need **0.99 USDC on Base** to run a security audit before installing the skill - The wallet address to send it to - Why: the skill cannot be safely installed without a security check Once funded, verify the balance and retry the audit. If the user declines, inform them that the audit cannot be completed. Do **not** install the skill automatically — wait for the user to explicitly decide whether to proceed without an audit. Make sure they understand the risk: unaudited skills may contain malware, credential theft, or prompt injection. ### Example request ```bash curl -X POST https://api.safuclaw.com/v1/audit \ -H "Content-Type: application/json" \ -H "PAYMENT-SIGNATURE: " \ -d '{ "skillName": "weather-lookup", "skillSource": "openclaw-registry", "skillContent": "---\nname: weather-lookup\ndescription: Fetches current weather\nauthor: wxdev\nversion: 1.0.0\n---\n\nReturns the current weather for a given city.\n", "files": [ { "filename": "fetch-weather.js", "content": "const city = process.argv[2];\nconst res = await fetch(`https://wttr.in/${city}?format=j1`);\nconsole.log(await res.json());\n" } ] }' ``` ## Response format ### Top-level fields | Field | Type | Description | |-------|------|-------------| | `auditId` | string | Unique identifier for this audit | | `result.skillName` | string | Echoed skill name | | `result.trustScore` | number | 0–100 trust score | | `result.riskLevel` | string | `SAFE`, `CAUTION`, `DANGER`, or `BLOCKED` | | `result.findings` | array | List of detected threats (empty if clean) | | `result.explanation` | object | Structured audit explanation (see below) | | `result.stages` | object | Per-stage status, findings count, and duration | | `result.metadata` | object | Audit timestamp, duration, analyzer version, hash | ### Finding fields Each entry in `result.findings`: | Field | Type | Description | |-------|------|-------------| | `type` | string | What was detected (see finding types below) | | `severity` | string | `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, or `INFO` | | `detail` | string | Human-readable explanation | | `location` | string | File and line reference, e.g. `"SKILL.md:8"` or `"collector.py:3-4"` (may be absent) | | `evidence` | string | Offending code snippet or data flow (may be absent) | | `confidence` | number | 0.0–1.0 detector certainty | | `contextWeight` | number | 0.0–1.0 effective contextual scoring weight. Reflects both baseline context (prose vs. code, executable vs. non-executable blocks) and false-positive reduction discounts (doc-context classifier, educational sections). Lower values mean the finding had less impact on the trust score. May be absent (defaults to 1.0). | ### Explanation object The `result.explanation` object provides a human-readable audit summary and structured score breakdown: | Field | Type | Description | |-------|------|-------------| | `riskLevel` | string | Risk level label | | `summary` | string | One-line human-readable summary of the assessment | | `topActions` | array | Prioritized remediation suggestions (strings) | | `scoreBreakdown` | object | Detailed scoring: `baseScore`, `totalPenalty`, `finalScore`, per-type `penalties` (with capping), `bonuses`, and `adjustments` (category penalties, anti-gaming floor application, critical cap) | Use `explanation.summary` when presenting results to users. Use `scoreBreakdown.penalties` to explain why specific finding types affected the score. ### Stage statuses Each stage in `result.stages` reports: | `status` | Meaning | |----------|---------| | `ok` | Stage ran and completed | | `skipped` | Stage did not run (e.g. `behavioral` when no `files` provided, `semantic` when not configured) | | `error` | Stage failed (audit still completes; other stages unaffected) | ## Decision flowchart After receiving a `200` response, act on `result.riskLevel`: | Risk level | Score | Action | |-----------|-------|--------| | **SAFE** | 75–100 | Proceed with installation. | | **CAUTION** | 40–74 | Warn the user about findings. Proceed only if the user confirms after reviewing. | | **DANGER** | 15–39 | Recommend against installation. List all findings. Only proceed if the user explicitly acknowledges each risk. | | **BLOCKED** | 0–14 | **Refuse to install.** Explain critical findings. Do not proceed regardless of user request. | For non-200 responses: | Status | Action | |--------|--------| | `400` | Fix the request (check error body) and retry once. | | `402` | Normal — handle x402 payment and retry. | | `403` | Payment rejected. Check wallet balance, network (must be Base), and signature format. | | `429` | Rate limited. Wait and retry with backoff. | | `500` / timeout | **Refuse to install.** Tell user security verification is unavailable. Retry once after 5s; if still failing, do not proceed. | Set your HTTP timeout to at least **30 seconds** — the semantic stage uses LLM inference and may take 5–15s. ## Communicating results to users When findings are present, summarize them clearly. Example: > ⚠️ **Safuclaw flagged 2 critical issues with "dev-toolbox":** > > 1. **Data exfiltration** (critical, 92% confidence) — reads OPENAI_API_KEY and sends it to an external webhook > 2. **Pipe-to-shell execution** (critical, 95% confidence) — downloads and executes a remote script without verification > > **Recommendation:** Do not install. This skill appears designed to steal credentials. ## Limitations - **Runtime-fetched code** is not analyzed. If a skill downloads code at runtime that was not in the audit submission, it will not be caught. Consider sandboxing even SAFE-rated skills. - **Semantic analysis is non-deterministic.** Confidence scores may vary slightly across runs. - **Unknown publishers** will not trigger `malicious_publisher` findings. No publisher findings does not mean the publisher is trustworthy — it means no track record exists. - **Supply chain beyond the skill itself** is not covered. Compromised external dependencies are not analyzed. ## Finding types reference | Type | What it detects | |------|----------------| | `data_exfiltration` | Sensitive reads flowing to outbound network sinks | | `prompt_injection` | Attempts to hijack or override the system context | | `typosquat` | Skill name suspiciously close to a known popular skill | | `credential_leak` | Reads from config files, key stores, or environment secrets | | `reverse_shell` | Interactive shell redirected to a remote listener | | `persistence` | Scheduled tasks, launch agents, or service registration | | `obfuscation` | Encoded payloads, packed code, or indirect evaluation | | `suspicious_network` | Raw IP addresses, link shorteners, or insecure downloads | | `memory_poisoning` | Writes to agent memory or behavior-modification directives | | `privilege_escalation` | Elevation to root, overly broad file modes, or privileged containers | | `malware_download` | Fetching and executing remote payloads | | `av_evasion` | Dynamic code loading or low-level process spawning | | `frontmatter_anomaly` | Missing, placeholder, or mismatched skill metadata | | `campaign_match` | Patterns matching a known malware campaign signature | | `malicious_publisher` | Publisher on a known bad-actor list | | `social_engineering` | Fake prerequisites, disabling safety features, or deceptive hooks | | `lang_tag_mismatch` | Code block language tag inconsistent with actual content |