vocabulary:
  title: Traceloop API Vocabulary
  description: >-
    Canonical terms and definitions used across the Traceloop LLM observability
    platform and REST API.
  version: 1.0.0
  source: https://www.traceloop.com/docs/api-reference/introduction
  terms:
    - term: Auto Monitor Setup
      definition: >-
        A configured rule that automatically applies a set of evaluators to LLM
        spans matching a selector (e.g., project, environment). Enables
        continuous quality monitoring without manual triggering.
      properties:
        - external_id
        - selector
        - evaluators
        - status
    - term: Evaluator
      definition: >-
        A scoring function that assesses a specific quality dimension of an LLM
        interaction. Evaluators accept structured input (prompts, completions,
        context, ground truth) and return a numeric score, pass/fail flag, and
        reasoning. Traceloop provides 40+ built-in evaluators plus custom
        evaluator support.
      categories:
        - LLM-as-a-Judge
        - Agent
        - Safety
        - Technical Validation
        - Utility Metrics
    - term: Span
      definition: >-
        An OpenTelemetry trace span representing a single LLM call or pipeline
        step. Spans capture prompts, completions, model metadata, latency, and
        token usage. Traceloop indexes spans for search and evaluation.
    - term: Span Warehouse
      definition: >-
        Traceloop's queryable store of historical LLM spans. Supports filtering
        by project, environment, time range, model, and custom attributes for
        analysis and debugging.
    - term: Selector
      definition: >-
        A filter expression used in auto monitor setups to target which spans or
        traces should be evaluated. Selectors can match by project ID,
        environment, model, tags, or other span attributes.
    - term: Faithfulness
      definition: >-
        An LLM-as-a-judge evaluator that checks whether a model completion
        contains only information supported by the provided context. A
        faithfulness score of 1.0 means no hallucinated facts; 0.0 means the
        completion contradicts or fabricates content.
    - term: Answer Correctness
      definition: >-
        An evaluator that measures factual accuracy by comparing a model's
        answer against a ground truth reference. Uses semantic comparison to
        tolerate paraphrase while penalizing factual errors.
    - term: Answer Relevancy
      definition: >-
        An evaluator that checks whether the model's answer addresses the user's
        question. Detects off-topic, tangential, or evasive responses.
    - term: Answer Completeness
      definition: >-
        An evaluator that measures whether the model's answer includes all
        necessary information to fully address the question given the available
        context.
    - term: Context Relevance
      definition: >-
        An evaluator that checks whether retrieved context documents contain
        sufficient information to answer the query. Used in RAG pipeline quality
        assessment.
    - term: PII Detector
      definition: >-
        A safety evaluator that identifies personally identifiable information
        (names, emails, phone numbers, SSNs, addresses, etc.) in LLM input or
        output text. Returns detected PII categories and a pass/fail result.
    - term: Toxicity Detector
      definition: >-
        A safety evaluator that identifies harmful, hateful, threatening, or
        abusive language in LLM output. Returns a toxicity score and a pass/fail
        result.
    - term: Prompt Injection
      definition: >-
        A security evaluator that detects attempts to manipulate an LLM through
        adversarial instructions embedded in user input that try to override
        system prompts or safety measures.
    - term: Secrets Detector
      definition: >-
        A security evaluator that identifies API keys, passwords, tokens,
        private keys, and other credentials that may have been leaked into LLM
        input or output.
    - term: Agent Efficiency
      definition: >-
        An agent-specific evaluator that detects redundant tool calls,
        unnecessary follow-up actions, or inefficient reasoning paths in agentic
        LLM workflows.
    - term: Agent Tool Trajectory
      definition: >-
        An evaluator that compares an agent's actual sequence of tool calls
        against an expected reference trajectory, assessing both the order of
        the calls and the accuracy of their parameters.
    - term: LLM-as-a-Judge
      definition: >-
        An evaluation pattern where a capable LLM (e.g., GPT-4, Claude) scores
        the output of another LLM against defined criteria. Traceloop uses this
        pattern for subjective quality dimensions like instruction adherence,
        conversation quality, and topic adherence.
    - term: Instruction Adherence
      definition: >-
        An LLM-as-a-judge evaluator that measures how well a model's response
        follows explicit instructions provided in the system prompt or user
        turn.
    - term: Hallucination
      definition: >-
        A model output that contains factual claims not supported by the
        provided context or ground truth. Detected via the Faithfulness and
        Answer Correctness evaluators.
    - term: OpenLLMetry
      definition: >-
        Traceloop's open-source SDK (MIT licensed) that instruments LLM calls
        using OpenTelemetry conventions. Supports Python, TypeScript/JavaScript,
        Go, and Ruby with integrations for 20+ LLM providers and orchestration
        frameworks.
    - term: Environment
      definition: >-
        A logical deployment context within a Traceloop organization (e.g.,
        production, staging, development). Each environment has its own API key,
        and spans are scoped by environment.
    - term: Metrics High Water Mark (HWM)
      definition: >-
        A timestamp indicating the last successfully processed evaluation
        result. Used for incremental polling to detect new evaluation
        completions without re-scanning historical data.
    - term: Perplexity
      definition: >-
        A utility evaluator that measures text perplexity from token
        log-probabilities, indicating how predictable or surprising the model
        output is. High perplexity may signal incoherent or unusual output.
    - term: Semantic Similarity
      definition: >-
        A utility evaluator that calculates the semantic similarity between two
        texts using embedding-based comparison. Used to compare model
        completions against reference answers without requiring exact string
        matches.
    - term: Tone Detection
      definition: >-
        A utility evaluator that classifies the emotional tone of text (e.g.,
        formal, casual, empathetic, aggressive). Useful for ensuring brand voice
        consistency in customer-facing LLM applications.