---
name: somnia-agents-llm-inference
description: Deep-dive reference for the LLM Inference agent on Somnia — invoke a deterministic on-chain LLM (Qwen3-30B) from smart contracts. Covers the 4 functions (inferString, inferNumber, inferChat, inferToolsChat), MCP tool calling, the on-chain tool yield/resume pattern, allowed-values constraints, and chain-of-thought. Use when building AI moderation, classification, summarization, sentiment scoring, or agentic DeFi bots that need an LLM to decide which on-chain calls to make.
---

# LLM Inference Agent

The LLM Inference agent (`llm-inference`) gives smart contracts access to an on-chain deterministic LLM — Qwen3-30B running with a fixed seed and `temperature = 0`, so every validator independently produces **byte-identical** output. That's what makes consensus on AI results possible.

> Read the master `somnia-agents` skill first for the request lifecycle, gas model, and callback pattern. This document only covers the agent-specific ABI and quirks.

## Identity

| Field | Value |
|---|---|
| `agentId` | `12847293847561029384` |
| Per-agent price | **`0.07`** (whole tokens — SOMI on Mainnet, STT on Testnet) |
| Default consensus | Majority — deterministic, byte-identical outputs |
| Source of truth | [`references/agents.json`](../../references/agents.json) |

## Methods

| Function | Purpose |
|---|---|
| `inferString(prompt, system, chainOfThought, allowedValues)` | Single-turn string inference, optionally constrained to a fixed set of values |
| `inferNumber(prompt, system, minValue, maxValue, chainOfThought)` | Single-turn integer inference, clamped to `[minValue, maxValue]` |
| `inferChat(roles, messages, chainOfThought)` | Multi-turn chat with full message history |
| `inferToolsChat(roles, messages, mcpServerUrls, onchainTools, maxIterations, chainOfThought)` | Multi-turn chat with **MCP tool calling** (auto-executed) and **on-chain tool calling** (yielded back to the caller as calldata) |

All four return their result via the standard request → callback flow. The full ABI is in [`references/agents.json`](../../references/agents.json) under `agents["llm-inference"].abi`.

---

## `inferString` — constrained classification

Best for: content moderation, sentiment labels, intent classification, any string output from a closed set.

```solidity
function inferString(
    string prompt,
    string system,
    bool chainOfThought,
    string[] allowedValues
) returns (string response);
```

- `system`: system prompt; pass `""` if you don't need one.
- `allowedValues`: pass an empty array for unconstrained text. When non-empty, the model is **forced** to pick one of the listed strings — this is the safest pattern for on-chain logic that branches on the result (see the branching sketch at the end of this section).
- `chainOfThought`: when `true`, the model is allowed to reason internally (visible in the receipt) before producing the final answer. Increases latency and token cost; helpful for harder classification.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferString.selector,
    'Is this review positive or negative? "Absolutely loved it, best purchase ever!"',
    "You are a sentiment classifier. Reply with one word.",
    false,
    _array("positive", "negative", "neutral")
);
```

The response is decoded as a single `string`:

```solidity
string memory label = abi.decode(responses[0].result, (string));
```
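Because a non-empty `allowedValues` guarantees the label is one of the listed strings, the callback can branch on it with a hash comparison (the same trick the `_streq` helper uses in the tool-calling sketch further down). A minimal sketch continuing from the decode above — the branch bodies are hypothetical:

```solidity
// Inside the handleResponse callback (see the master skill), after decoding `label`:
if (keccak256(bytes(label)) == keccak256(bytes("negative"))) {
    // hypothetical: flag the review for moderation
} else if (keccak256(bytes(label)) == keccak256(bytes("positive"))) {
    // hypothetical: credit the reviewer's reputation
}
// "neutral" falls through as a no-op — only values from allowedValues can appear.
```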
---

## `inferNumber` — bounded integer inference

Best for: rating / scoring, count extraction, confidence values.

```solidity
function inferNumber(
    string prompt,
    string system,
    int256 minValue,
    int256 maxValue,
    bool chainOfThought
) returns (int256 response);
```

The agent extracts the first integer from the model's response and **clamps** it to `[minValue, maxValue]`. Set `minValue = maxValue = 0` to disable clamping.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferNumber.selector,
    'Rate the sentiment of this review on a 1-10 scale: "..."',
    "You are a sentiment analyst. Reply with a single integer 1-10.",
    int256(1),
    int256(10),
    true // chain-of-thought helps with subjective scores
);
```

Decode as `int256`:

```solidity
int256 score = abi.decode(responses[0].result, (int256));
```

---

## `inferChat` — multi-turn conversation

Pass the full message history as parallel `roles[]` / `messages[]` arrays — same length, same order.

```solidity
function inferChat(
    string[] roles,    // "system" | "user" | "assistant"
    string[] messages,
    bool chainOfThought
) returns (string response);
```

```solidity
string[] memory roles = new string[](4);
string[] memory msgs  = new string[](4);
roles[0] = "system";    msgs[0] = "You are a helpful coding assistant.";
roles[1] = "user";      msgs[1] = "How do I reverse a string in JavaScript?";
roles[2] = "assistant"; msgs[2] = "str.split('').reverse().join('')";
roles[3] = "user";      msgs[3] = "Can you explain that step by step?";

bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferChat.selector,
    roles,
    msgs,
    false
);
```

Use this when the prompt naturally needs prior context (instructions earlier in the conversation, partial assistant outputs, few-shot examples).

---

## `inferToolsChat` — tool calling (MCP + on-chain)

The most powerful and most subtle of the four. The LLM can call:

- **MCP tools** — discovered automatically from the URLs you pass in `mcpServerUrls`. Executed **in-situ by the agent**: the LLM emits a tool call, the agent forwards it to the MCP server, feeds the result back to the LLM, and continues. The caller sees only the final answer.
- **On-chain tools** — declared as Solidity function signatures in `onchainTools`. The agent **does not** execute these; instead, when the LLM wants to call one, the agent **yields** the calldata back to the caller. The caller executes the call (against any contract, not just the requester) and **resumes** the conversation by passing the tool result back.

```solidity
function inferToolsChat(
    string[] roles,
    string[] messages,
    string[] mcpServerUrls,
    OnchainTool[] onchainTools,
    uint256 maxIterations,
    bool chainOfThought
) returns (
    string finishReason,
    string response,
    string[] updatedRoles,
    string[] updatedMessages,
    string[] pendingToolCallIds,
    bytes[] pendingToolCalls
);

struct OnchainTool {
    string signature;   // e.g. "swap(address token, uint256 amount)"
    string description; // human-readable description for the LLM
}
```

Supported types in tool signatures: `string`, `bool`, `address`, `uint*`, `int*`, `bytes`, and arrays of these.
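As an illustration of how a caller might build the `onchainTools` argument, here is a sketch for the agentic-swap example used later in this section — the swap signature and the wording of the description are hypothetical; the description is the only context the LLM has about the tool, so keep it precise:

```solidity
// Hypothetical tool surface for the agentic-swap example in this section.
ILLMAgent.OnchainTool[] memory tools = new ILLMAgent.OnchainTool[](1);
tools[0] = ILLMAgent.OnchainTool({
    signature: "swap(address token, uint256 amount)",
    description: "Swap `amount` of `token` into the pool's quote asset on the DEX."
});

// On-chain tools only in this example — no MCP servers.
string[] memory mcpServerUrls = new string[](0);
```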
### `finishReason` semantics

| Value | What happened | What `response` / `pending*` contain |
|---|---|---|
| `"stop"` | LLM finished — possibly after MCP tool calls (which were auto-executed). | `response`: final text. All other outputs empty. |
| `"tool_calls"` | LLM wants to call **on-chain** tool(s). | `response`: empty. `updatedRoles`/`updatedMessages`: full conversation incl. any MCP results. `pendingToolCallIds[i]` ↔ `pendingToolCalls[i]` are parallel arrays — calldata to execute. |
| `"max_iterations"` | Reached `maxIterations` LLM↔tool round-trips without finishing. | Treat as a soft failure — increase `maxIterations` or simplify the prompt. |

### MCP-only flow (auto-executed)

```
Caller ──inferToolsChat([..., mcpServerUrls=["http://weather:80/"], onchainTools=[]])──► Agent
                     │
Agent ─list_tools()─► MCP server ─tools─► Agent
Agent ─prompt + tools─► LLM
LLM ─tool_call: getWeather("Tokyo")─► Agent
Agent ─call_tool─► MCP server ─result─► Agent
Agent ─tool_result─► LLM
LLM ─final answer─► Agent
                     │
Caller ◄─finishReason="stop", response="Tokyo is 22°C and sunny", [], [], [], []
```

### On-chain tool yield/resume flow

```
Caller ──inferToolsChat([..., onchainTools=[swap(address,uint256)]])──► Agent
                     │
Agent ─prompt + tool defs─► LLM
LLM ─tool_call: swap(0xA0b8..., 1000)─► Agent
                     │
Caller ◄─finishReason="tool_calls", "", state, [callId], [calldata for swap(...)]
                     │
Caller executes calldata against the DEX, captures result
                     │
Caller ──inferToolsChat([state ++ {role:"tool", content:'{"tool_call_id":callId,"content":"success"}'}], ...)──► Agent
                     │
Agent ─continued conversation─► LLM
LLM ─final answer─► Agent
                     │
Caller ◄─finishReason="stop", response="Swapped 1000 USDC successfully", [], [], [], []
```

### Resume protocol

When `finishReason == "tool_calls"`:

1. Iterate `pendingToolCalls[i]` — each is calldata (selector + ABI-encoded args).
2. Execute the call against the appropriate target contract: `(bool ok, bytes memory result) = target.call(pendingToolCalls[i]);`
3. Append the result to the conversation: a new `(role: "tool", message: jsonOf({tool_call_id: pendingToolCallIds[i], content: resultString}))` per call.
4. Call `inferToolsChat` again with the extended `updatedRoles` + `updatedMessages`.

Repeat until `finishReason == "stop"`. Each round-trip is a new on-chain `createRequest` (with its own deposit + consensus cycle), so cap `maxIterations` and budget accordingly.

### Solidity sketch — agentic swap

```solidity
interface ILLMAgent {
    struct OnchainTool {
        string signature;
        string description;
    }

    function inferToolsChat(
        string[] calldata roles,
        string[] calldata messages,
        string[] calldata mcpServerUrls,
        OnchainTool[] calldata onchainTools,
        uint256 maxIterations,
        bool chainOfThought
    ) external returns (
        string memory finishReason,
        string memory response,
        string[] memory updatedRoles,
        string[] memory updatedMessages,
        string[] memory pendingToolCallIds,
        bytes[] memory pendingToolCalls
    );
}

contract AgenticSwapper is IAgentRequesterHandler {
    IAgentRequester public immutable platform;
    address public immutable dex;

    uint256 public constant LLM_AGENT_ID = 12847293847561029384;
    uint256 public constant SUBCOMMITTEE_SIZE = 3;
    uint256 public constant PRICE_PER_AGENT = 0.07 ether;

    // Tracks per-request state for resume
    mapping(uint256 => bytes) public requestState; // serialized roles+messages

    // ... (createRequest call with onchainTools = [swap(address,uint256)] omitted for brevity)

    function handleResponse(
        uint256 requestId,
        Response[] memory responses,
        ResponseStatus status,
        Request memory /* details */
    ) external override {
        require(msg.sender == address(platform), "Only platform");
        if (status != ResponseStatus.Success || responses.length == 0) return;

        (
            string memory finishReason,
            string memory response,
            string[] memory updatedRoles,
            string[] memory updatedMessages,
            string[] memory pendingToolCallIds,
            bytes[] memory pendingToolCalls
        ) = abi.decode(
            responses[0].result,
            (string, string, string[], string[], string[], bytes[])
        );

        if (_streq(finishReason, "stop")) {
            // Final answer in `response` — done.
            return;
        }

        if (_streq(finishReason, "tool_calls")) {
            // Execute each pending call and resume.
            for (uint256 i = 0; i < pendingToolCalls.length; i++) {
                (bool ok, bytes memory result) = dex.call(pendingToolCalls[i]);
                // append (role:"tool", json({"tool_call_id": pendingToolCallIds[i], "content": ...})) to state
                // ...
            }
            // Submit a new createRequest with the updated state. (Funding the chain of
            // requests is application-level — keep msg.value escrowed.)
        }
    }

    function _streq(string memory a, string memory b) internal pure returns (bool) {
        return keccak256(bytes(a)) == keccak256(bytes(b));
    }

    receive() external payable {}
}
```

The **chain of inference requests** (each with its own deposit, callback, and consensus) is what enables agentic behavior on-chain. Track the total budget across the chain; each round-trip costs `0.07 × subSize` per the LLM Inference price.

---

## TypeScript encoding

```typescript
import { encodeFunctionData, parseAbi } from 'viem';

const abi = parseAbi([
  'function inferString(string prompt, string system, bool chainOfThought, string[] allowedValues) returns (string)',
  'function inferNumber(string prompt, string system, int256 minValue, int256 maxValue, bool chainOfThought) returns (int256)',
  'function inferChat(string[] roles, string[] messages, bool chainOfThought) returns (string)',
  // inferToolsChat tuple is messy in parseAbi — use the JSON form from references/agents.json
]);

const payload = encodeFunctionData({
  abi,
  functionName: 'inferString',
  args: [
    'Classify: "Check out this amazing new product!"',
    'You are a content classifier. Reply with one word.',
    false,
    ['safe', 'unsafe', 'spam'],
  ],
});
```

For `inferToolsChat`, load the structured ABI from [`references/agents.json`](../../references/agents.json) — `agents["llm-inference"].abi` — and pass it to `encodeFunctionData` directly.

---

## Pitfalls specific to llm-inference

### Determinism is the whole point — preserve it

- **Don't include block-dependent data in the prompt** (block number, `block.timestamp`, recent block hashes). Two validators executing milliseconds apart can see different values, breaking byte-identical outputs and Majority consensus.
- **Don't rely on URLs that return time-sensitive data** when used inside MCP tools — same problem.
- **Don't introduce randomness** into prompt construction (random salts, etc.).

### `allowedValues` is a strong contract

When you pass a non-empty `allowedValues` to `inferString`, the model is constrained — but the constraint is enforced post-hoc by the agent, not via grammar-constrained decoding at the model level. Edge case: if the model produces text that doesn't match any allowed value, the response is `Failed`. Keep allowed values short and unambiguous (`"yes"` / `"no"` over `"definitely yes"` / `"absolutely not"`).

### `chainOfThought = true` is more expensive

Chain-of-thought multiplies the number of tokens generated per request. The runner's reported `executionCost` will be higher (still capped at `perAgentBudget`). For batch / high-volume use, leave it off unless you've measured a quality gain.

### Tool result formatting

When resuming after `finishReason == "tool_calls"`, the tool result message is a **JSON string**:

```json
{"tool_call_id": "", "content": ""}
```

Pass it as a plain string in the `messages` array with role `"tool"`. Malformed JSON here is the most common reason the resume call fails.
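A small helper keeps that JSON consistent — a sketch, assuming Solidity ≥ 0.8.12 for `string.concat` and tool results that contain no characters needing JSON escaping:

```solidity
// Hypothetical helper: builds the role:"tool" message body for the resume call.
function _toolResultJson(string memory toolCallId, string memory content)
    internal
    pure
    returns (string memory)
{
    // No escaping is done here — keep `content` to plain strings without quotes or backslashes.
    return string.concat(
        '{"tool_call_id": "', toolCallId,
        '", "content": "', content, '"}'
    );
}
```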
### Conversation length cost

Each resume round sends the **full conversation** back through `createRequest`. Long agentic loops can hit gas limits on the dApp side and increase per-request cost on the agent side. Keep system prompts compact and prune old turns when possible.

### MCP server reachability

MCP servers must be reachable from the agent's sandbox — public HTTPS endpoints, not localhost or VPN-only. If an MCP server is down or slow, the LLM stalls on its tool calls until the agent's timeout or `maxIterations` is hit.

### Why is my response `Failed`?

Check the receipt (see the master skill). Common LLM-specific causes:

1. **Non-deterministic prompt** — different validators saw slightly different inputs (e.g. trailing whitespace, encoding). Outputs diverged → no Majority. Inspect `prompt` in the `request_received` step across receipts.
2. **`allowedValues` miss** — the model's output didn't match any allowed string. Loosen the values or drop the constraint.
3. **`max_iterations` reached** — for `inferToolsChat`, the LLM kept emitting tool calls without converging. Increase `maxIterations` or simplify the tool surface.

---

## Cross-references

- `somnia-agents` — request lifecycle, deposit math, callback pattern
- `somnia-agents-invoke` — interactive CLI to fire `inferString` / `inferNumber` calls without writing a contract
- `somnia-agents-llm-parse-website` — when you specifically want extraction from a webpage rather than free-form inference