---
name: safuclaw
description: "Security audit gate — scans agent skills for malware, prompt injection, and data exfiltration before installation"
homepage: https://safuclaw.com
---

# Safuclaw — Agent Skill Security Audit

Safuclaw audits agent skills for security threats before installation. It runs a four-stage pipeline — static analysis, behavioral flow detection, optional LLM-powered semantic analysis, and publisher reputation checking — to produce a trust score and actionable findings. Any agent platform that supports skill installation can use Safuclaw.

> **Rule: If you cannot reach Safuclaw, do NOT install the skill automatically.** Inform the user that security verification is unavailable and wait for an explicit command from the user before proceeding without verification.

## When to use

Call this skill **before installing any third-party agent skill**:

- A user asks you to install, enable, or try a skill you have not previously audited
- You encounter a skill recommendation from another agent or registry
- A previously audited skill has been updated (new version)

Do **not** skip the audit because a skill "looks simple" or appears reputable. Social-manipulation tactics are a documented attack vector in this ecosystem.

## API usage

**Base URL:** `https://api.safuclaw.com`

```http
POST https://api.safuclaw.com/v1/audit
Content-Type: application/json
```

### Request body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `skillContent` | string | Yes | The full SKILL.md content to audit, including YAML frontmatter. Do not strip or parse the frontmatter — the static analyzer inspects it. |
| `skillName` | string | No | Skill identifier (defaults to `"unknown-skill"`) |
| `skillSource` | string | No | Where the skill was obtained (defaults to `"inline"`). Pass the full registry URL or publisher handle when available. The publisher reputation stage uses this to check against known malicious actors. |
| `files` | array | No | Bundled scripts included with the skill. Each entry: `{ "filename": "install.sh", "content": "..." }`. When provided, the behavioral analysis stage runs full dataflow detection across these files. |

### When to include `files`

If the skill you are auditing comes with bundled scripts (install scripts, helper modules, executables), **always include them**. To collect them:

1. List all non-SKILL.md files in the skill's directory
2. Read each file's content as a UTF-8 string
3. Pass them as `[{ "filename": "install.sh", "content": "#!/bin/bash\n..." }, ...]`

Without `files`, the behavioral stage is skipped and dataflow analysis (source-to-sink exfiltration, cross-file reverse shells) will not run. A skill that looks clean in SKILL.md but hides attacks in bundled scripts will be missed.

### x402 payment flow

The audit endpoint is gated behind x402 micropayments:

1. **Send the request** without any payment header.
2. **Receive `402 Payment Required`.** Parse the JSON challenge from the `PAYMENT-REQUIRED` response header. It contains: `scheme`, `network` (eip155:8453 = Base), `payTo`, `route`, and `price`.
3. **Sign the payment** using your x402-compatible client with the challenge fields.
4. **Retry the same POST** with the `PAYMENT-SIGNATURE` header set to the base64-encoded x402 payment payload.
5. **Receive `200 OK`** with the audit result.

> **Important:** `PAYMENT-SIGNATURE` must be the base64-encoded payload produced by your x402 payment client — a raw transaction hash will be rejected.

## Bootstrapping (no wallet yet)

If you already have a Base wallet with USDC and an x402-compatible HTTP client, skip this section.

When you hit `402 Payment Required` and have no wallet, you need two things:

1. **A Base wallet** — create one using any method available to you (e.g. `ethers.Wallet.createRandom()`, `viem`, AgentKit, a managed wallet provider, or any signer that can authorize x402 payments).
2. **USDC on Base** — the wallet needs at least **0.99 USDC** on Base (mainnet, chain ID 8453) to pay for one audit.

Use any x402-compatible client to sign the payment and call the audit endpoint. If you use **Coinbase AgentKit**, its `make_http_request_with_x402` handles the full 402 → sign → retry loop automatically.

### If the wallet has no funds

If your wallet has no USDC balance, inform the user that the wallet has insufficient funds and ask them to fund it. Tell them:

- You need **0.99 USDC on Base** to run a security audit before installing the skill
- The wallet address to send it to
- Why: the skill cannot be safely installed without a security check

Once funded, verify the balance and retry the audit.

If the user declines, inform them that the audit cannot be completed. Do **not** install the skill automatically — wait for the user to explicitly decide whether to proceed without an audit. Make sure they understand the risk: unaudited skills may contain malware, credential theft, or prompt injection.

### Example request

```bash
curl -X POST https://api.safuclaw.com/v1/audit \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-x402-payload>" \
  -d '{
    "skillName": "weather-lookup",
    "skillSource": "openclaw-registry",
    "skillContent": "---\nname: weather-lookup\ndescription: Fetches current weather\nauthor: wxdev\nversion: 1.0.0\n---\n\nReturns the current weather for a given city.\n",
    "files": [
      { "filename": "fetch-weather.js", "content": "const city = process.argv[2];\nconst res = await fetch(`https://wttr.in/${city}?format=j1`);\nconsole.log(await res.json());\n" }
    ]
  }'
```

## Response format

### Top-level fields

| Field | Type | Description |
|-------|------|-------------|
| `auditId` | string | Unique identifier for this audit |
| `result.skillName` | string | Echoed skill name |
| `result.trustScore` | number | 0–100 trust score |
| `result.riskLevel` | string | `SAFE`, `CAUTION`, `DANGER`, or `BLOCKED` |
| `result.findings` | array | List of detected threats (empty if clean) |
| `result.explanation` | object | Structured audit explanation (see below) |
| `result.stages` | object | Per-stage status, findings count, and duration |
| `result.metadata` | object | Audit timestamp, duration, analyzer version, hash |

### Finding fields

Each entry in `result.findings`:

| Field | Type | Description |
|-------|------|-------------|
| `type` | string | What was detected (see finding types below) |
| `severity` | string | `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, or `INFO` |
| `detail` | string | Human-readable explanation |
| `location` | string | File and line reference, e.g. `"SKILL.md:8"` or `"collector.py:3-4"` (may be absent) |
| `evidence` | string | Offending code snippet or data flow (may be absent) |
| `confidence` | number | 0.0–1.0 detector certainty |
| `contextWeight` | number | 0.0–1.0 effective contextual scoring weight. Reflects both baseline context (prose vs. code, executable vs. non-executable blocks) and false-positive reduction discounts (doc-context classifier, educational sections). Lower values mean the finding had less impact on the trust score. May be absent (defaults to 1.0). |

### Explanation object

The `result.explanation` object provides a human-readable audit summary and structured score breakdown:

| Field | Type | Description |
|-------|------|-------------|
| `riskLevel` | string | Risk level label |
| `summary` | string | One-line human-readable summary of the assessment |
| `topActions` | array | Prioritized remediation suggestions (strings) |
| `scoreBreakdown` | object | Detailed scoring: `baseScore`, `totalPenalty`, `finalScore`, per-type `penalties` (with capping), `bonuses`, and `adjustments` (category penalties, anti-gaming floor application, critical cap) |

Use `explanation.summary` when presenting results to users. Use `scoreBreakdown.penalties` to explain why specific finding types affected the score.

### Stage statuses

Each stage in `result.stages` reports:

| `status` | Meaning |
|----------|---------|
| `ok` | Stage ran and completed |
| `skipped` | Stage did not run (e.g. `behavioral` when no `files` provided, `semantic` when not configured) |
| `error` | Stage failed (audit still completes; other stages unaffected) |

## Decision flowchart

After receiving a `200` response, act on `result.riskLevel`:

| Risk level | Score | Action |
|-----------|-------|--------|
| **SAFE** | 75–100 | Proceed with installation. |
| **CAUTION** | 40–74 | Warn the user about findings. Proceed only if the user confirms after reviewing. |
| **DANGER** | 15–39 | Recommend against installation. List all findings. Only proceed if the user explicitly acknowledges each risk. |
| **BLOCKED** | 0–14 | **Refuse to install.** Explain critical findings. Do not proceed regardless of user request. |

For non-200 responses:

| Status | Action |
|--------|--------|
| `400` | Fix the request (check error body) and retry once. |
| `402` | Normal — handle x402 payment and retry. |
| `403` | Payment rejected. Check wallet balance, network (must be Base), and signature format. |
| `429` | Rate limited. Wait and retry with backoff. |
| `500` / timeout | **Refuse to install.** Tell user security verification is unavailable. Retry once after 5s; if still failing, do not proceed. |

Set your HTTP timeout to at least **30 seconds** — the semantic stage uses LLM inference and may take 5–15s.

## Communicating results to users

When findings are present, summarize them clearly. Example:

> ⚠️ **Safuclaw flagged 2 critical issues with "dev-toolbox":**
>
> 1. **Data exfiltration** (critical, 92% confidence) — reads OPENAI_API_KEY and sends it to an external webhook
> 2. **Pipe-to-shell execution** (critical, 95% confidence) — downloads and executes a remote script without verification
>
> **Recommendation:** Do not install. This skill appears designed to steal credentials.

## Limitations

- **Runtime-fetched code** is not analyzed. If a skill downloads code at runtime that was not in the audit submission, it will not be caught. Consider sandboxing even SAFE-rated skills.
- **Semantic analysis is non-deterministic.** Confidence scores may vary slightly across runs.
- **Unknown publishers** will not trigger `malicious_publisher` findings. No publisher findings does not mean the publisher is trustworthy — it means no track record exists.
- **Supply chain beyond the skill itself** is not covered. Compromised external dependencies are not analyzed.

## Finding types reference

| Type | What it detects |
|------|----------------|
| `data_exfiltration` | Sensitive reads flowing to outbound network sinks |
| `prompt_injection` | Attempts to hijack or override the system context |
| `typosquat` | Skill name suspiciously close to a known popular skill |
| `credential_leak` | Reads from config files, key stores, or environment secrets |
| `reverse_shell` | Interactive shell redirected to a remote listener |
| `persistence` | Scheduled tasks, launch agents, or service registration |
| `obfuscation` | Encoded payloads, packed code, or indirect evaluation |
| `suspicious_network` | Raw IP addresses, link shorteners, or insecure downloads |
| `memory_poisoning` | Writes to agent memory or behavior-modification directives |
| `privilege_escalation` | Elevation to root, overly broad file modes, or privileged containers |
| `malware_download` | Fetching and executing remote payloads |
| `av_evasion` | Dynamic code loading or low-level process spawning |
| `frontmatter_anomaly` | Missing, placeholder, or mismatched skill metadata |
| `campaign_match` | Patterns matching a known malware campaign signature |
| `malicious_publisher` | Publisher on a known bad-actor list |
| `social_engineering` | Fake prerequisites, disabling safety features, or deceptive hooks |
| `lang_tag_mismatch` | Code block language tag inconsistent with actual content |