---
name: langfuse
version: 1.0.2
description: Debug AI traces, find exceptions, analyze sessions, and manage prompts via Langfuse MCP. Also handles MCP setup and configuration.
metadata:
  short-description: Langfuse observability via MCP
  compatibility: claude-code, codex-cli
---

# Langfuse Skill

Debug your AI systems through Langfuse observability.

**Triggers:** langfuse, traces, debug AI, find exceptions, set up langfuse, what went wrong, why is it slow, datasets, evaluation sets

## Setup

**Step 1:** Get credentials from https://cloud.langfuse.com → Settings → API Keys

If self-hosted, use your instance URL for `LANGFUSE_HOST` and create keys there.

**Step 2:** Install MCP (pick one):

```bash
# Claude Code (project-scoped, shared via .mcp.json)
claude mcp add \
  --scope project \
  --env LANGFUSE_PUBLIC_KEY=pk-... \
  --env LANGFUSE_SECRET_KEY=sk-... \
  --env LANGFUSE_HOST=https://cloud.langfuse.com \
  langfuse -- uvx --python 3.11 langfuse-mcp

# Codex CLI (user-scoped, stored in ~/.codex/config.toml)
codex mcp add langfuse \
  --env LANGFUSE_PUBLIC_KEY=pk-... \
  --env LANGFUSE_SECRET_KEY=sk-... \
  --env LANGFUSE_HOST=https://cloud.langfuse.com \
  -- uvx --python 3.11 langfuse-mcp
```

**Step 3:** Restart the CLI, then verify with `/mcp` (Claude) or `codex mcp list` (Codex)

**Step 4:** Test: `fetch_traces(age=60)`

### Read-Only Mode

For safer observability without risk of modifying prompts or datasets, enable read-only mode:

```bash
# CLI flag
langfuse-mcp --read-only

# Or environment variable
LANGFUSE_MCP_READ_ONLY=true
```

This disables the write tools: `create_text_prompt`, `create_chat_prompt`, `update_prompt_labels`, `create_dataset`, `create_dataset_item`, `delete_dataset_item`.

For manual `.mcp.json` setup or troubleshooting, see `references/setup.md`.

---

## Playbooks

### "Where are the errors?"

```
find_exceptions(age=1440, group_by="file")
```
→ Shows error counts by file. Pick the worst offender.
```
find_exceptions_in_file(filepath="src/ai/chat.py", age=1440)
```
→ Lists specific exceptions. Grab a trace_id.

```
get_exception_details(trace_id="...")
```
→ Full stacktrace and context.

---

### "What happened in this interaction?"

```
fetch_traces(age=60, user_id="...")
```
→ Find the trace. Note the trace_id.

If you don't know the user_id, start with:

```
fetch_traces(age=60)
```

```
fetch_trace(trace_id="...", include_observations=true)
```
→ See all LLM calls in the trace.

```
fetch_observation(observation_id="...")
```
→ Inspect a specific generation's input/output.

---

### "Why is it slow?"

```
fetch_observations(age=60, type="GENERATION")
```
→ Find recent LLM calls. Look for high latency.

```
fetch_observation(observation_id="...")
```
→ Check token counts, model, timing.

---

### "What's this user experiencing?"

```
get_user_sessions(user_id="...", age=1440)
```
→ List their sessions.

```
get_session_details(session_id="...")
```
→ See all traces in the session.

---

### "Manage datasets"

```
list_datasets()
```
→ See all datasets.

```
get_dataset(name="evaluation-set-v1")
```
→ Get dataset details.

```
list_dataset_items(dataset_name="evaluation-set-v1", page=1, limit=10)
```
→ Browse items in the dataset.

```
create_dataset(name="qa-test-cases", description="QA evaluation set")
```
→ Create a new dataset.

```
create_dataset_item(
  dataset_name="qa-test-cases",
  input={"question": "What is 2+2?"},
  expected_output={"answer": "4"}
)
```
→ Add test cases.

```
create_dataset_item(
  dataset_name="qa-test-cases",
  item_id="item_123",
  input={"question": "What is 3+3?"},
  expected_output={"answer": "6"}
)
```
→ Upsert: updates the existing item by id, or creates it if missing.

---

### "Manage prompts"

```
list_prompts()
```
→ See all prompts with labels.

```
get_prompt(name="...", label="production")
```
→ Fetch the current production version.

```
create_text_prompt(name="...", prompt="...", labels=["staging"])
```
→ Create a new version in staging.
```
update_prompt_labels(name="...", version=N, labels=["production"])
```
→ Promote to production. (Rollback = re-apply the label to an older version.)

---

## Quick Reference

| Task | Tool |
|------|------|
| List traces | `fetch_traces(age=N)` |
| Get trace details | `fetch_trace(trace_id="...", include_observations=true)` |
| List LLM calls | `fetch_observations(age=N, type="GENERATION")` |
| Get observation | `fetch_observation(observation_id="...")` |
| Error count | `get_error_count(age=N)` |
| Find exceptions | `find_exceptions(age=N, group_by="file")` |
| List sessions | `fetch_sessions(age=N)` |
| User sessions | `get_user_sessions(user_id="...", age=N)` |
| List prompts | `list_prompts()` |
| Get prompt | `get_prompt(name="...", label="production")` |
| List datasets | `list_datasets()` |
| Get dataset | `get_dataset(name="...")` |
| List dataset items | `list_dataset_items(dataset_name="...", limit=N)` |
| Create/update dataset item | `create_dataset_item(dataset_name="...", item_id="...")` |

`age` = minutes to look back (max 10080 = 7 days)

---

## References

- `references/tool-reference.md` — Full parameter docs, filter semantics, response schemas
- `references/setup.md` — Manual setup, troubleshooting, advanced configuration