---
name: somnia-agents-llm-inference
description: Deep-dive reference for the LLM Inference agent on Somnia — invoke a deterministic on-chain LLM (Qwen3-30B) from smart contracts. Covers the 4 functions (inferString, inferNumber, inferChat, inferToolsChat), MCP tool calling, the on-chain tool yield/resume pattern, allowed-values constraints, and chain-of-thought. Use when building AI moderation, classification, summarization, sentiment scoring, or agentic DeFi bots that need an LLM to decide which on-chain calls to make.
---

# LLM Inference Agent

The LLM Inference agent (`llm-inference`) gives smart contracts access to an on-chain deterministic LLM — Qwen3-30B running with a fixed seed and `temperature = 0`, so every validator independently produces **byte-identical** output. That's what makes consensus on AI results possible.

> Read the master `somnia-agents` skill first for the request lifecycle, gas model, and callback pattern. This document only covers the agent-specific ABI and quirks.

## Identity

| Field | Value |
|---|---|
| `agentId` | `12847293847561029384` |
| Per-agent price | **`0.07`** (whole tokens — SOMI on Mainnet, STT on Testnet) |
| Default consensus | Majority — deterministic, byte-identical outputs |
| Source of truth | [`references/agents.json`](../../references/agents.json) |

## Methods

| Function | Purpose |
|---|---|
| `inferString(prompt, system, chainOfThought, allowedValues)` | Single-turn string inference, optionally constrained to a fixed set of values |
| `inferNumber(prompt, system, minValue, maxValue, chainOfThought)` | Single-turn integer inference, clamped to `[minValue, maxValue]` |
| `inferChat(roles, messages, chainOfThought)` | Multi-turn chat with full message history |
| `inferToolsChat(roles, messages, mcpServerUrls, onchainTools, maxIterations, chainOfThought)` | Multi-turn chat with **MCP tool calling** (auto-executed) and **on-chain tool calling** (yielded back to the caller as calldata) |

All four return their result via the standard request → callback flow. The full ABI is in [`references/agents.json`](../../references/agents.json) under `agents["llm-inference"].abi`.

---

## `inferString` — constrained classification

Best for: content moderation, sentiment labels, intent classification, any string output from a closed set.

```solidity
function inferString(
    string prompt,
    string system,
    bool chainOfThought,
    string[] allowedValues
) returns (string response);
```

- `system`: system prompt; pass `""` if you don't need one.
- `allowedValues`: pass an empty array for unconstrained text. When non-empty, the model is **forced** to pick one of the listed strings — this is the safest pattern for on-chain logic that branches on the result (see the branching sketch at the end of this section).
- `chainOfThought`: when `true`, the model is allowed to reason internally (visible in the receipt) before producing the final answer. Increases latency and token cost; helpful for harder classification.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferString.selector,
    'Is this review positive or negative? "Absolutely loved it, best purchase ever!"',
    "You are a sentiment classifier. Reply with one word.",
    false,
    _array("positive", "negative", "neutral")
);
```

The response is decoded as a single `string`:

```solidity
string memory label = abi.decode(responses[0].result, (string));
```
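Because a non-empty `allowedValues` guarantees the label is one of the listed strings, the callback can branch on it with a hash comparison (the same trick the `_streq` helper uses in the tool-calling sketch further down). A minimal sketch continuing from the decode above — the branch bodies are hypothetical:

```solidity
// Inside the handleResponse callback (see the master skill), after decoding `label`:
if (keccak256(bytes(label)) == keccak256(bytes("negative"))) {
    // hypothetical: flag the review for moderation
} else if (keccak256(bytes(label)) == keccak256(bytes("positive"))) {
    // hypothetical: credit the reviewer's reputation
}
// "neutral" falls through as a no-op — only values from allowedValues can appear.
```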
---

## `inferNumber` — bounded integer inference

Best for: rating / scoring, count extraction, confidence values.

```solidity
function inferNumber(
    string prompt,
    string system,
    int256 minValue,
    int256 maxValue,
    bool chainOfThought
) returns (int256 response);
```

The agent extracts the first integer from the model's response and **clamps** it to `[minValue, maxValue]`. Set `minValue = maxValue = 0` to disable clamping.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferNumber.selector,
    'Rate the sentiment of this review on a 1-10 scale: "..."',
    "You are a sentiment analyst. Reply with a single integer 1-10.",
    int256(1),
    int256(10),
    true // chain-of-thought helps with subjective scores
);
```

Decode as `int256`:

```solidity
int256 score = abi.decode(responses[0].result, (int256));
```

---

## `inferChat` — multi-turn conversation

Pass the full message history as parallel `roles[]` / `messages[]` arrays — same length, same order.

```solidity
function inferChat(
    string[] roles,    // "system" | "user" | "assistant"
    string[] messages,
    bool chainOfThought
) returns (string response);
```

```solidity
string[] memory roles = new string[](4);
string[] memory msgs  = new string[](4);
roles[0] = "system";    msgs[0] = "You are a helpful coding assistant.";
roles[1] = "user";      msgs[1] = "How do I reverse a string in JavaScript?";
roles[2] = "assistant"; msgs[2] = "str.split('').reverse().join('')";
roles[3] = "user";      msgs[3] = "Can you explain that step by step?";

bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferChat.selector,
    roles,
    msgs,
    false
);
```

Use this when the prompt naturally needs prior context (instructions earlier in the conversation, partial assistant outputs, few-shot examples).

---

## `inferToolsChat` — tool calling (MCP + on-chain)

The most powerful and most subtle of the four. The LLM can call:

- **MCP tools** — discovered automatically from the URLs you pass in `mcpServerUrls`. Executed **in-situ by the agent**: the LLM emits a tool call, the agent forwards it to the MCP server, feeds the result back to the LLM, and continues. The caller sees only the final answer.
- **On-chain tools** — declared as Solidity function signatures in `onchainTools`. The agent **does not** execute these; instead, when the LLM wants to call one, the agent **yields** the calldata back to the caller. The caller executes the call (against any contract, not just the requester) and **resumes** the conversation by passing the tool result back.

```solidity
function inferToolsChat(
    string[] roles,
    string[] messages,
    string[] mcpServerUrls,
    OnchainTool[] onchainTools,
    uint256 maxIterations,
    bool chainOfThought
) returns (
    string finishReason,
    string response,
    string[] updatedRoles,
    string[] updatedMessages,
    string[] pendingToolCallIds,
    bytes[] pendingToolCalls
);

struct OnchainTool {
    string signature;   // e.g. "swap(address token, uint256 amount)"
    string description; // human-readable description for the LLM
}
```

Supported types in tool signatures: `string`, `bool`, `address`, `uint*`, `int*`, `bytes`, and arrays of these.
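As an illustration of how a caller might build the `onchainTools` argument, here is a sketch for the agentic-swap example used later in this section — the swap signature and the wording of the description are hypothetical; the description is the only context the LLM has about the tool, so keep it precise:

```solidity
// Hypothetical tool surface for the agentic-swap example in this section.
ILLMAgent.OnchainTool[] memory tools = new ILLMAgent.OnchainTool[](1);
tools[0] = ILLMAgent.OnchainTool({
    signature: "swap(address token, uint256 amount)",
    description: "Swap `amount` of `token` into the pool's quote asset on the DEX."
});

// On-chain tools only in this example — no MCP servers.
string[] memory mcpServerUrls = new string[](0);
```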
### `finishReason` semantics

| Value | What happened | What `response` / `pending*` contain |
|---|---|---|
| `"stop"` | LLM finished — possibly after MCP tool calls (which were auto-executed). | `response`: final text. All other outputs empty. |
| `"tool_calls"` | LLM wants to call **on-chain** tool(s). | `response`: empty. `updatedRoles`/`updatedMessages`: full conversation incl. any MCP results. `pendingToolCallIds[i]` ↔ `pendingToolCalls[i]` are parallel arrays — calldata to execute. |
| `"max_iterations"` | Reached `maxIterations` LLM↔tool round-trips without finishing. | Treat as a soft failure — increase `maxIterations` or simplify the prompt. |

### MCP-only flow (auto-executed)

```
Caller ──inferToolsChat([..., mcpServerUrls=["http://weather:80/"], onchainTools=[]])──► Agent
                     │
Agent ─list_tools()─► MCP server ─tools─► Agent
Agent ─prompt + tools─► LLM
LLM ─tool_call: getWeather("Tokyo")─► Agent
Agent ─call_tool─► MCP server ─result─► Agent
Agent ─tool_result─► LLM
LLM ─final answer─► Agent
                     │
Caller ◄─finishReason="stop", response="Tokyo is 22°C and sunny", [], [], [], []
```

### On-chain tool yield/resume flow

```
Caller ──inferToolsChat([..., onchainTools=[swap(address,uint256)]])──► Agent
                     │
Agent ─prompt + tool defs─► LLM
LLM ─tool_call: swap(0xA0b8..., 1000)─► Agent
                     │
Caller ◄─finishReason="tool_calls", "", state, [callId], [calldata for swap(...)]
                     │
Caller executes calldata against the DEX, captures result
                     │
Caller ──inferToolsChat([state ++ {role:"tool", content:'{"tool_call_id":callId,"content":"success"}'}], ...)──► Agent
                     │
Agent ─continued conversation─► LLM
LLM ─final answer─► Agent
                     │
Caller ◄─finishReason="stop", response="Swapped 1000 USDC successfully", [], [], [], []
```

### Resume protocol

When `finishReason == "tool_calls"`:

1. Iterate `pendingToolCalls[i]` — each is calldata (selector + ABI-encoded args).
2. Execute the call against the appropriate target contract: `(bool ok, bytes memory result) = target.call(pendingToolCalls[i]);`
3. Append the result to the conversation: a new `(role: "tool", message: jsonOf({tool_call_id: pendingToolCallIds[i], content: resultString}))` per call.
4. Call `inferToolsChat` again with the extended `updatedRoles` + `updatedMessages`.

Repeat until `finishReason == "stop"`. Each round-trip is a new on-chain `createRequest` (with its own deposit + consensus cycle), so cap `maxIterations` and budget accordingly.

### Solidity sketch — agentic swap

```solidity
interface ILLMAgent {
    struct OnchainTool {
        string signature;
        string description;
    }

    function inferToolsChat(
        string[] calldata roles,
        string[] calldata messages,
        string[] calldata mcpServerUrls,
        OnchainTool[] calldata onchainTools,
        uint256 maxIterations,
        bool chainOfThought
    ) external returns (
        string memory finishReason,
        string memory response,
        string[] memory updatedRoles,
        string[] memory updatedMessages,
        string[] memory pendingToolCallIds,
        bytes[] memory pendingToolCalls
    );
}

contract AgenticSwapper is IAgentRequesterHandler {
    IAgentRequester public immutable platform;
    address public immutable dex;

    uint256 public constant LLM_AGENT_ID = 12847293847561029384;
    uint256 public constant SUBCOMMITTEE_SIZE = 3;
    uint256 public constant PRICE_PER_AGENT = 0.07 ether;

    // Tracks per-request state for resume
    mapping(uint256 => bytes) public requestState; // serialized roles+messages

    // ... (createRequest call with onchainTools = [swap(address,uint256)] omitted for brevity)

    function handleResponse(
        uint256 requestId,
        Response[] memory responses,
        ResponseStatus status,
        Request memory /* details */
    ) external override {
        require(msg.sender == address(platform), "Only platform");
        if (status != ResponseStatus.Success || responses.length == 0) return;

        (
            string memory finishReason,
            string memory response,
            string[] memory updatedRoles,
            string[] memory updatedMessages,
            string[] memory pendingToolCallIds,
            bytes[] memory pendingToolCalls
        ) = abi.decode(
            responses[0].result,
            (string, string, string[], string[], string[], bytes[])
        );

        if (_streq(finishReason, "stop")) {
            // Final answer in `response` — done.
            return;
        }

        if (_streq(finishReason, "tool_calls")) {
            // Execute each pending call and resume.
            for (uint256 i = 0; i < pendingToolCalls.length; i++) {
                (bool ok, bytes memory result) = dex.call(pendingToolCalls[i]);
                // append (role:"tool", json({"tool_call_id": pendingToolCallIds[i], "content": ...})) to state
                // ...
            }
            // Submit a new createRequest with the updated state. (Funding the chain of
            // requests is application-level — keep msg.value escrowed.)
        }
    }

    function _streq(string memory a, string memory b) internal pure returns (bool) {
        return keccak256(bytes(a)) == keccak256(bytes(b));
    }

    receive() external payable {}
}
```

The **chain of inference requests** (each with its own deposit, callback, and consensus) is what enables agentic behavior on-chain. Track the total budget across the chain; each round-trip costs `0.07 × subSize` per the LLM Inference price.

---

## TypeScript encoding

```typescript
import { encodeFunctionData, parseAbi } from 'viem';

const abi = parseAbi([
  'function inferString(string prompt, string system, bool chainOfThought, string[] allowedValues) returns (string)',
  'function inferNumber(string prompt, string system, int256 minValue, int256 maxValue, bool chainOfThought) returns (int256)',
  'function inferChat(string[] roles, string[] messages, bool chainOfThought) returns (string)',
  // inferToolsChat tuple is messy in parseAbi — use the JSON form from references/agents.json
]);

const payload = encodeFunctionData({
  abi,
  functionName: 'inferString',
  args: [
    'Classify: "Check out this amazing new product!"',
    'You are a content classifier. Reply with one word.',
    false,
    ['safe', 'unsafe', 'spam'],
  ],
});
```

For `inferToolsChat`, load the structured ABI from [`references/agents.json`](../../references/agents.json) — `agents["llm-inference"].abi` — and pass it to `encodeFunctionData` directly.

---

## Pitfalls specific to llm-inference

### Determinism is the whole point — preserve it

- **Don't include block-dependent data in the prompt** (block number, `block.timestamp`, recent block hashes). Two validators executing milliseconds apart can see different values, breaking byte-identical outputs and Majority consensus.
- **Don't rely on URLs that return time-sensitive data** when used inside MCP tools — same problem.
- **Don't introduce randomness** into prompt construction (random salts, etc.).

### `allowedValues` is a strong contract

When you pass a non-empty `allowedValues` to `inferString`, the model is constrained — but the constraint is enforced post-hoc by the agent, not via grammar-constrained decoding at the model level. Edge case: if the model produces text that doesn't match any allowed value, the response is `Failed`. Keep allowed values short and unambiguous (`"yes"` / `"no"` over `"definitely yes"` / `"absolutely not"`).

### `chainOfThought = true` is more expensive

Chain-of-thought multiplies the number of tokens generated per request. The runner's reported `executionCost` will be higher (still capped at `perAgentBudget`). For batch / high-volume use, leave it off unless you've measured a quality gain.

### Tool result formatting

When resuming after `finishReason == "tool_calls"`, the tool result message is a **JSON string**:

```json
{"tool_call_id": "", "content": ""}
```

Pass it as a plain string in the `messages` array with role `"tool"`. Malformed JSON here is the most common reason the resume call fails.
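A small helper keeps that JSON consistent — a sketch, assuming Solidity ≥ 0.8.12 for `string.concat` and tool results that contain no characters needing JSON escaping:

```solidity
// Hypothetical helper: builds the role:"tool" message body for the resume call.
function _toolResultJson(string memory toolCallId, string memory content)
    internal
    pure
    returns (string memory)
{
    // No escaping is done here — keep `content` to plain strings without quotes or backslashes.
    return string.concat(
        '{"tool_call_id": "', toolCallId,
        '", "content": "', content, '"}'
    );
}
```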
### Conversation length cost

Each resume round sends the **full conversation** back through `createRequest`. Long agentic loops can hit gas limits on the dApp side and increase per-request cost on the agent side. Keep system prompts compact and prune old turns when possible.

### MCP server reachability

MCP servers must be reachable from the agent's sandbox — public HTTPS endpoints, not localhost or VPN-only. If an MCP server is down or slow, the LLM stalls on its tool calls until the agent's timeout or `maxIterations` is hit.

### Why is my response `Failed`?

Check the receipt (see the master skill). Common LLM-specific causes:

1. **Non-deterministic prompt** — different validators saw slightly different inputs (e.g. trailing whitespace, encoding). Outputs diverged → no Majority. Inspect `prompt` in the `request_received` step across receipts.
2. **`allowedValues` miss** — the model's output didn't match any allowed string. Loosen the values or drop the constraint.
3. **`max_iterations` reached** — for `inferToolsChat`, the LLM kept emitting tool calls without converging. Increase `maxIterations` or simplify the tool surface.

---

## Cross-references

- `somnia-agents` — request lifecycle, deposit math, callback pattern
- `somnia-agents-invoke` — interactive CLI to fire `inferString` / `inferNumber` calls without writing a contract
- `somnia-agents-llm-parse-website` — when you specifically want extraction from a webpage rather than free-form inference