What are we testing?

Grab your system prompt

Grab the system prompt from whatever console you're on, or paste it.

What are you testing?

Target model

Output type

Don't have a prompt yet?

Build a system prompt

Go from an idea to a tested system prompt — litmus writes it, you run it.

Up to about 5 questions, or click ✦ Generate now at any time. Press ⌘/Ctrl + Enter to send.

Prompt analysis

The judges that score outputs

Evaluation prompts

litmus found these dimensions and wrote a rubric for each. Edit any, or add your own.

Dimensions

Rubric

Eval cases · remove any you don't want

What we'll test against

🔧 Tool tests (Beta)

Test whether the model calls the right tool with valid arguments. Define the tools available to it, then add a case that asserts the expected call.

Tool definitions · JSON array

Or add one manually

Expected tool


Forbidden tools · comma-separated

Required args · JSON (optional)
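To make the tool-test fields above concrete, here is a minimal sketch of how such an assertion could be checked. It assumes each model tool call arrives as a dict with a `name` and a JSON-string `arguments`; the function names `check_tool_call` and `parse_forbidden` are hypothetical and mirror the form labels, not litmus's actual schema or code.

```python
import json

# Hypothetical sketch: field names mirror the form labels (Expected tool,
# Forbidden tools, Required args); this is not litmus's actual schema.

def parse_forbidden(text):
    """Turn a comma-separated field like 'a, b' into ['a', 'b']."""
    return [s.strip() for s in text.split(",") if s.strip()]

def check_tool_call(calls, expected_tool, forbidden=(), required_args=None):
    """Return (passed, reason) for a list of tool calls made by the model.

    calls: list of dicts like {"name": str, "arguments": str (JSON object)}.
    """
    # Fail immediately if any forbidden tool was called.
    for call in calls:
        if call["name"] in forbidden:
            return False, f"called forbidden tool {call['name']!r}"
    # The expected tool must appear at least once.
    match = next((c for c in calls if c["name"] == expected_tool), None)
    if match is None:
        return False, f"expected tool {expected_tool!r} was not called"
    # Its arguments must be a valid JSON object.
    try:
        args = json.loads(match["arguments"])
    except json.JSONDecodeError:
        return False, "arguments are not valid JSON"
    if not isinstance(args, dict):
        return False, "arguments are not a JSON object"
    # Every required arg must be present with exactly the expected value.
    for key, value in (required_args or {}).items():
        if key not in args or args[key] != value:
            return False, f"argument {key!r} does not match {value!r}"
    return True, "ok"

calls = [{"name": "get_weather", "arguments": '{"city": "Paris", "unit": "C"}'}]
print(check_tool_call(calls, "get_weather",
                      parse_forbidden("delete_file, send_email"),
                      {"city": "Paris"}))  # → (True, 'ok')
```

Required args are treated as a subset match here, so extra arguments the model passes do not fail the case; that is a design choice of this sketch.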

🤖 Agent scenarios (Beta)

Test a multi-step agent: give it a goal and mock tools with scripted results (you can inject a failure to test recovery). litmus runs the model in a loop, feeding tool results back, and scores whether it reaches the goal. The tools are mocked, so nothing real is executed.

Scenario · JSON
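The agent loop described above can be sketched roughly as follows. This is an illustration under assumptions, not litmus's implementation: the scenario shape (a dict of scripted results per tool, a goal-check function) and the names `run_scenario` and `toy_policy` are hypothetical, and `toy_policy` stands in for the target model so the example runs offline.

```python
from collections import deque

# Hypothetical sketch of a mocked agent loop; the scenario shape is assumed.

def run_scenario(policy, scripted, goal_reached, max_steps=8):
    """Run `policy` against mocked tools; return (success, transcript).

    policy(history) -> {"tool": name, "args": dict} or {"final": str}
    scripted: {tool_name: [result, ...]}, consumed in order; nothing real runs.
    goal_reached(final_answer, history) -> bool
    """
    queues = {name: deque(results) for name, results in scripted.items()}
    history = []
    for _ in range(max_steps):
        action = policy(history)
        if "final" in action:
            # The agent stopped: score whether the goal was reached.
            return goal_reached(action["final"], history), history
        name = action["tool"]
        queue = queues.get(name)
        # Return the next scripted result, or an error if none is left.
        result = queue.popleft() if queue else {"error": f"no scripted result for {name!r}"}
        history.append({"tool": name, "args": action.get("args", {}), "result": result})
    return False, history  # ran out of steps without a final answer

def toy_policy(history):
    # Stand-in for the model: retry fetch_order until it succeeds, then answer.
    if history and "error" not in history[-1]["result"]:
        return {"final": f"Order status: {history[-1]['result']['status']}"}
    return {"tool": "fetch_order", "args": {"id": 42}}

# The first scripted result injects a failure to test recovery.
scripted = {"fetch_order": [{"error": "timeout"}, {"status": "shipped"}]}
ok, transcript = run_scenario(toy_policy, scripted, lambda final, _: "shipped" in final)
print(ok, len(transcript))  # → True 2
```

Capping the loop at `max_steps` keeps a model that never stops calling tools from running forever; such a run simply scores as a failure.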

Estimated cost

Running · on your API key

Generating & judging

Results

How this version scored

◷ Scores vary run-to-run — the target model is non-deterministic. A different score doesn't always mean a different prompt; re-run a version to see its spread.
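As a rough illustration of "spread", the sketch below summarizes repeated runs of one version and checks whether two versions' score ranges overlap. The scores are made-up placeholders, and range overlap is only a crude heuristic (not a significance test), assumed here for illustration.

```python
import statistics

def score_spread(scores):
    """Summarize repeated runs of one prompt version."""
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }

def ranges_overlap(a, b):
    """Crude heuristic: if the min-max ranges overlap, the difference may be noise."""
    return max(min(a), min(b)) <= min(max(a), max(b))

v1 = [7.0, 8.0, 6.0, 7.0]  # hypothetical scores from re-running version 1
v2 = [7.5, 8.5]            # hypothetical scores from re-running version 2
print(score_spread(v1))
print(ranges_overlap(v1, v2))
```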

Speed · measured this run

Cases

From the failing cases

Fix these, in order

Every pass is kept

Versions

MCP server

Connect, inspect & scan

Point litmus at a live MCP server to inspect what it exposes, call its tools, and run an adversarial security scan.

Name

Transport


Server URL

Authorization header · optional