{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "qsOnSYBqfoSL"
},
"source": [
"# Evaluating Agents with Langfuse\n",
"\n",
"In this cookbook, we will learn how to **monitor the internal steps (traces) of the [OpenAI agent SDK](https://github.com/openai/openai-agents-python)** and **evaluate its performance** using [Langfuse](https://langfuse.com/docs).\n",
"\n",
"This guide covers **online** and **offline evaluation** metrics used by teams to bring agents to production fast and reliably. To learn more about evaluation strategies, check out this [blog post](https://langfuse.com/blog/2025-03-04-llm-evaluation-101-best-practices-and-challenges).\n",
"\n",
"**Why AI agent Evaluation is important:**\n",
"- Debugging issues when tasks fail or produce suboptimal results\n",
"- Monitoring costs and performance in real-time\n",
"- Improving reliability and safety through continuous feedback\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "94-c-mbeVk4q"
},
"source": [
"