{ "cells": [ { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "ecb7ce06-97cb-432c-8fa7-f4cf709d50ff", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "# MLflow 3 and LangGraph: Develop and evaluate a *clickbait agent*\n", "\n", "This notebook demonstrates how to implement and evaluate a clickbait detector and rewriter agent using LangGraph and MLflow 3's GenAI tools. It was developed as part of the talk \"What Comes After Coding: Evaluating Agentic Behaviour\", presented at the Madrid Databricks User Group Meetup.\n", "\n", "The focus is on practical agent development and rigorous evaluation. Core topics include agent development with LangGraph, MLflow instrumentation, and the use of custom scorers and built-in judges to assess agentic behavior directly from execution traces." ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "a6aa73c6-4371-4c17-92bf-edf642820c56", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "## Environment Setup and Configuration\n", "\n", "Review and update the configuration settings in this section before proceeding.\n", "\n", "These values determine where resources such as models and volumes are stored, the tokens used to access external tools, and the paths of the volumes that contain the datasets." 
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "db253b14-0030-46bf-b7ca-626678ec5ccd", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "%pip install -U -qqqq databricks-langchain langgraph==0.3.4 databricks-agents\n", "%pip install --upgrade \"mlflow[databricks]>=3.1.0\" databricks-sdk\n", "%restart_python" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "500e555e-103e-414f-8387-a088f7185315", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Global variables\n", "\n", "This notebook uses Unity Catalog to register models and other resources, and relies on a model serving endpoint for inference. You need to specify:\n", "\n", "- `UC_CATALOG`: the Unity Catalog catalog where resources will be stored.\n", "- `UC_SCHEMA`: the schema within the catalog to use.\n", "- `model_serving_endpoint`: the name of the Databricks model serving endpoint that will handle inference requests. You can use one of the default endpoints or [create your own](https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints)." 
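, "\n", "\n", "The notebook combines these values into three-level Unity Catalog names (`catalog.schema.name`), for example when registering prompts. The sketch below illustrates that naming step; `uc_name` is a hypothetical helper, not a Databricks API, and it only rejects empty parts and parts that contain a dot.\n", "\n", "```python\n", "def uc_name(catalog: str, schema: str, name: str) -> str:\n", "    \"\"\"Build a three-level Unity Catalog name (hypothetical helper).\"\"\"\n", "    parts = [catalog, schema, name]\n", "    # Reject empty parts and parts that would add extra name levels\n", "    for part in parts:\n", "        if not part or '.' in part:\n", "            raise ValueError(f'Invalid Unity Catalog name part: {part!r}')\n", "    return '.'.join(parts)\n", "\n", "\n", "# Placeholder values, for illustration only\n", "print(uc_name('main', 'clickbait', 'title_extractor'))  # prints main.clickbait.title_extractor\n", "```"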
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "9086a8a6-37c8-4280-b231-e3f44c788dee", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "# Unity Catalog catalog and schema where resources will be stored\n", "UC_CATALOG = \"\"\n", "UC_SCHEMA = \"\"\n", "\n", "# Model serving endpoint used by the agent for inference\n", "AGENT_PARAMETERS = {\n", "    \"model_serving_endpoint\": \"\"\n", "}" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "66ac904e-fd4e-4420-abcf-8daf4e19c186", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "#### Jina AI token\n", "\n", "The agent has access to an external tool that fetches website content from URLs. This tool is implemented with the *Jina AI Reader API*, a service that extracts the content of web pages and converts it into clean, LLM-ready Markdown text.\n", "\n", "A free Jina AI API key can be obtained by following [this tutorial](https://www.youtube.com/watch?v=SLv6tSEKYOg).\n", "\n", "The token is managed using [Databricks secrets](https://docs.databricks.com/aws/en/security/secrets/). For testing purposes, you can replace this definition with the token string inlined (see the commented-out line in the next cell)." 
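, "\n", "\n", "As an alternative to inlining the token, the sketch below reads it from an environment variable first and otherwise decodes a base64-encoded secret value, which is the encoding the next cell assumes for the Databricks SDK. The helper name `load_token` and the environment variable are assumptions made for illustration; the secret lookup is passed in as a callable so that the helper does not depend on the SDK.\n", "\n", "```python\n", "import base64\n", "import os\n", "from typing import Callable, Optional\n", "\n", "\n", "def load_token(env_var: str, get_secret_b64: Callable[[], Optional[str]]) -> str:\n", "    \"\"\"Return a token from env_var, else decode it from a base64 secret (hypothetical helper).\"\"\"\n", "    token = os.environ.get(env_var)\n", "    if token:\n", "        return token\n", "    encoded = get_secret_b64()\n", "    if not encoded:\n", "        raise RuntimeError(f'No token found in {env_var} or in the secret store')\n", "    # Secret values are base64-encoded, so decode them to recover the token\n", "    return base64.b64decode(encoded).decode('utf-8')\n", "```\n", "\n", "On Databricks, `get_secret_b64` could be `lambda: WorkspaceClient().secrets.get_secret('credentials', 'jinaai').value`, mirroring the next cell."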
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "f16acaed-53ff-4f7b-aec1-6fa08c97505d", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "import base64\n", "from databricks.sdk import WorkspaceClient\n", "\n", "# Read the token from secret scope \"credentials\", key \"jinaai\" (adjust to your workspace).\n", "# The secret value is returned base64-encoded, so decode it to recover the token.\n", "JINA_AI_TOKEN = base64.b64decode(\n", "    WorkspaceClient().secrets.get_secret(\"credentials\", \"jinaai\").value\n", ").decode()\n", "\n", "# JINA_AI_TOKEN = \"YOUR_JINA_AI_TOKEN\"  # Replace with your Jina AI token (testing only)" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "f220c99a-4f53-495c-bb9b-11141e037499", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Clickbait dataset" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "d5ed767f-4cb4-4231-8920-c352f165c67c", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "A corpus of clickbait and non-clickbait headlines is useful for evaluating the agent. You can download headlines with binary labels (clickbait or not) from this [Kaggle dataset](https://www.kaggle.com/datasets/amananandrai/clickbait-dataset/data).\n", "\n", "The `CORPUS_FILE` variable stores the path to the dataset in CSV format. In this example, the dataset is stored in a Unity Catalog volume." 
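, "\n", "\n", "Before running the agent on the corpus, it can help to draw a small, label-balanced sample. The sketch below uses only the Python standard library and assumes the CSV has a `headline` column and a `clickbait` column with values `0` or `1`; those column names are an assumption here, so check the header of your copy of the dataset. The function name `load_balanced_sample` is hypothetical.\n", "\n", "```python\n", "import csv\n", "import random\n", "from typing import IO, Union\n", "\n", "\n", "def load_balanced_sample(source: Union[str, IO[str]], n_per_class: int, seed: int = 42) -> list[dict]:\n", "    \"\"\"Sample up to n_per_class headlines per label from the clickbait CSV.\"\"\"\n", "    if isinstance(source, str):\n", "        with open(source, newline='', encoding='utf-8') as f:\n", "            rows = list(csv.DictReader(f))\n", "    else:\n", "        rows = list(csv.DictReader(source))\n", "    rng = random.Random(seed)  # fixed seed for a reproducible sample\n", "    sample = []\n", "    for label in ('0', '1'):\n", "        group = [r for r in rows if r['clickbait'].strip() == label]\n", "        # Take at most n_per_class rows so small files do not raise\n", "        picked = rng.sample(group, min(n_per_class, len(group)))\n", "        sample += [{'headline': r['headline'], 'is_clickbait': label == '1'} for r in picked]\n", "    return sample\n", "```\n", "\n", "Unity Catalog volume paths (`/Volumes/<catalog>/<schema>/<volume>/<file>`) can typically be opened like local files on a Databricks cluster, so `load_balanced_sample(CORPUS_FILE, n_per_class=10)` would be a plausible call once `CORPUS_FILE` is set."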
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "f7e4a12f-ff0a-43ad-b39e-e0aef7fa93c8", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "CORPUS_FILE = \"\"" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "99df8387-5425-43ea-ab85-954de7a87b16", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "assert UC_CATALOG != \"\", \"Please set UC_CATALOG to your catalog name\"\n", "assert UC_SCHEMA != \"\", \"Please set UC_SCHEMA to your schema name\"\n", "assert \"model_serving_endpoint\" in AGENT_PARAMETERS and AGENT_PARAMETERS[\"model_serving_endpoint\"] != \"\", \"Please set AGENT_PARAMETERS['model_serving_endpoint'] to your model serving endpoint name\"\n", "assert JINA_AI_TOKEN != \"\", \"Please set JINA_AI_TOKEN to your Jina AI token\"\n", "assert CORPUS_FILE != \"\", \"Please set CORPUS_FILE to the path of your corpus file\"" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "762d05d9-226b-43b5-8995-bbdfd6db9f60", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "## Implement the agent" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "f53cd785-b7d7-4831-835a-0609fcf74b99", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Register prompts for our agent\n", "\n", "With MLflow we can manage prompts using the [*prompt 
registry*](https://mlflow.org/docs/latest/genai/prompt-version-mgmt/prompt-registry/). The prompt registry provides many other features, such as prompt version control, prompt aliasing, collaborative prompt engineering, and even an automatic [prompt optimizer](https://mlflow.org/docs/latest/genai/prompt-version-mgmt/prompt-registry/optimize-prompts).\n", "\n", "Here, it is only used to register the prompts and log them as model parameters." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "fb34df2d-3963-4cd9-bfb8-52b4771c4874", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "import mlflow\n", "\n", "# List the prompts already registered in the target catalog and schema\n", "registered_prompts = mlflow.genai.search_prompts(f\"catalog='{UC_CATALOG}' AND schema='{UC_SCHEMA}'\")\n", "\n", "# Utility function to register a prompt only if it is not already registered\n", "def register_initial_prompt(name: str, prompt: str):\n", "    uc_name = f\"{UC_CATALOG}.{UC_SCHEMA}.{name}\"\n", "    # Skip registration if the prompt already has at least one version\n", "    if any(\n", "        p.name == uc_name and int(p.tags.get(\"PromptVersionCount\", 0)) > 0\n", "        for p in registered_prompts\n", "    ):\n", "        return None\n", "    # Register the initial version of the prompt\n", "    return mlflow.genai.register_prompt(\n", "        name=uc_name,\n", "        template=prompt.strip(),\n", "        commit_message=f\"Initial version of {name} prompt\",\n", "        tags={\n", "            \"agent\": \"clickbait_langgraph\",\n", "            \"language\": \"en\",\n", "        },\n", "    )\n", "\n", "register_initial_prompt(\n", "    name=\"title_extractor\",\n", "    prompt=\"\"\"\n", "You are a title and content extractor agent.\n", "You will receive a message containing a title and content of an article or similar.\n", "You may also receive just a webpage URL.\n", "If you receive a URL, you will call the necessary tool to fetch the title and content.\n", 
"If you receive a prompt containing a title and content, you will extract the title and content from the message.\n", "Be careful not to use the fetch tool if there is no valid URL in the message (in that case you should always call the extract_title_and_content tool).\n", "    \"\"\"\n", ")\n", "\n", "register_initial_prompt(\n", "    name=\"clickbait_classifier\",\n", "    prompt=\"\"\"\n", "You are a clickbait detector intelligent agent.\n", "Given a headline, you will analyze whether it is clickbait or not.\n", "You will return a boolean clickbait value, and a list of reasons for your score.\n", "The reasons will be provided as bullet points (in markdown format, using '-' for each point).\n", "    \"\"\"\n", ")\n", "\n", "register_initial_prompt(\n", "    name=\"clickbait_rewriter\",\n", "    prompt=\"\"\"\n", "You are a clickbait headline rewriter. Given a headline that has been classified as clickbait, you will rewrite it so that it is not considered clickbait and is more informative and related to the actual content. You will return only the new headline. The new headline will be written in the original language.\n", "\n", "The headline: {{title}}\n", "\n", "The content: {{content}}\n", "    \"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "2c847800-fbf8-44fd-9c4c-abfdfab7b8fe", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Tool definitions\n", "\n", "The agent will have the following two tools available:\n", "\n", "- `extract_title_and_content`: an identity function used to extract structured data from the user prompt.\n", "- `fetch_url_title_and_content`: fetches the title and content of the given URL using the *Jina AI Reader API*.\n", "\n", "**Note:** Defining explicit extraction functions is not always necessary, as LangChain chat models also provide the `with_structured_output` method. 
However, this method cannot be combined with `bind_tools`, which is needed to fetch web content. More details about this limitation, and different strategies for overcoming it, can be found in [this guide](https://langchain-ai.github.io/langgraph/how-tos/react-agent-structured-output)." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "08e655ca-5e0e-4cc6-8674-cf8b4b2ff186", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "import requests\n", "from urllib.parse import urlparse\n", "from langchain_core.tools import tool\n", "\n", "@tool\n", "def extract_title_and_content(title: str, content: str) -> dict[str, str]:\n", " \"\"\"Structure the title and content from the provided text into a dict.\n", " \n", " Args:\n", " title: The title of the article\n", " content: The content/body of the article\n", " \n", " Returns:\n", " dict: Dictionary with title and content\n", " \"\"\"\n", " return { \"title\": title, \"content\": content }\n", "\n", "@tool\n", "def fetch_url_title_and_content(url: str) -> dict[str, str]:\n", " \"\"\"\n", " Gets the title and content from a URL\n", "\n", " Args:\n", " url: The URL to fetch data from\n", " \n", " Returns:\n", " dict: Dictionary with title and content\n", " \"\"\"\n", " \n", " # Prepend a scheme if it is missing, then check that the URL has a host\n", " parsed = urlparse(url)\n", " if parsed.scheme not in (\"http\", \"https\"):\n", "     url = f\"https://{url}\"\n", "     parsed = urlparse(url)\n", " if len(parsed.netloc) == 0:\n", "     raise ValueError(f\"Invalid URL: {url}\")\n", " \n", " # Fetch the page as JSON through the Jina AI Reader API\n", " response = requests.get(f\"https://r.jina.ai/{url}\", headers={\n", " \"Accept\": \"application/json\",\n", " \"Authorization\": f\"Bearer {JINA_AI_TOKEN}\",\n", " \"X-Md-Heading-Style\": \"setext\",\n", " \"X-Base\": \"final\",\n", " \"X-Retain-Images\": \"none\",\n", " \"X-Md-Link-Style\": \"discarded\",\n", " \"X-Timeout\": \"10\",\n", 
}).json()\n", "\n", " return {\n", " \"title\": response[\"data\"].get(\"title\", \"\"),\n", " \"content\": response[\"data\"].get(\"content\", \"\"),\n", " }\n", "\n", "tools = [extract_title_and_content, fetch_url_title_and_content]" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "66853e01-c64d-4c1d-9566-4f075b1afaa6", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Structured outputs and agent state\n", "\n", "- Sometimes it is useful to extract structured data from LLM calls. This can be done using Pydantic models and the `with_structured_output` method.\n", "- LangGraph agents also have a state: a data object that is passed between graph nodes." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "e61232d1-b9e6-4e72-9919-51e0e6345fbd", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "from pydantic import BaseModel\n", "from langchain_core.messages import BaseMessage\n", "\n", "class Article(BaseModel):\n", " title: str\n", " content: str\n", "\n", "class Classification(BaseModel):\n", " is_clickbait: bool\n", " classification_reason: str\n", "\n", "class ClickbaitAgentState(BaseModel):\n", " messages: list[BaseMessage] = []\n", " title: str | None = None\n", " content: str | None = None\n", " is_clickbait: bool | None = None\n", " classification_reason: str | None = None\n", " rewritten_title: str | None = None" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "1ce34236-f353-47a0-bbc2-2354c8b2aa62", "showTitle": false, "tableResultSettingsMap": {}, 
"title": "" } }, "source": [ "### LangGraph agent\n", "\n", "In the next code block, the LangGraph agent is implemented by defining nodes and the edges between them.\n", "- A set of functions defines the behavior of the nodes.\n", "  - Each node receives a `state` parameter containing the current graph state, processes the messages or the state, and returns only the state keys that changed.\n", "  - Most nodes are wrappers around a model call, possibly using `bind_tools` or `with_structured_output`.\n", "  - The `tool_call` node is a wrapper around `ToolNode`: it executes the requested tools and saves their outputs in the state.\n", "- A `StateGraph` is declared, based on the previously defined state model `ClickbaitAgentState`.\n", "- The functions are registered as graph nodes using the `RunnableLambda` utility.\n", "- The edges between nodes are defined with:\n", "  - `add_edge`, to declare static directional edges between nodes.\n", "  - `add_conditional_edges`, to route based on conditions.\n", "- The graph entry point is configured with `set_entry_point`. Execution stops when the special `END` node is reached." 
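, "\n", "As a rough mental model (a simplified sketch, not LangGraph's actual implementation), applying a node's partial update to a state whose keys have no reducer works like a dictionary merge in which the returned keys overwrite the current values. The helper `apply_update` and the example state are hypothetical; they only illustrate why most nodes in the next cell return `state.messages + [...]` rather than just the new message:\n", "\n", "```python\n", "def apply_update(state: dict, update: dict) -> dict:\n", "    # Keys returned by the node replace the current values (no reducer);\n", "    # keys the node does not return are left untouched.\n", "    return {**state, **update}\n", "\n", "state = {'messages': ['user question'], 'title': None}\n", "\n", "# Returning only the new message overwrites the message history...\n", "overwritten = apply_update(state, {'messages': ['ai answer']})\n", "\n", "# ...so the nodes concatenate the existing messages explicitly.\n", "appended = apply_update(state, {'messages': state['messages'] + ['ai answer']})\n", "\n", "print(overwritten['messages'])  # ['ai answer']\n", "print(appended['messages'])     # ['user question', 'ai answer']\n", "```"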
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "7bf2da9b-6f9c-487a-a743-5749d5392a16", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "from typing import Sequence, Union, TypedDict\n", "\n", "from databricks_langchain import ChatDatabricks\n", "\n", "import mlflow\n", "\n", "from langchain_core.language_models import LanguageModelLike\n", "from langchain_core.runnables import RunnableConfig, RunnableLambda\n", "from langchain_core.tools import BaseTool\n", "from langchain_core.messages import AIMessage, SystemMessage, HumanMessage\n", "\n", "from langgraph.graph import END, StateGraph\n", "from langgraph.graph.graph import CompiledGraph\n", "from langgraph.graph.state import CompiledStateGraph\n", "from langgraph.prebuilt import ToolNode\n", "\n", "def create_clickbait_agent(\n", " tools: Sequence[BaseTool],\n", " model_serving_endpoint: str,\n", ") -> CompiledGraph:\n", " \n", " model = ChatDatabricks(endpoint=model_serving_endpoint)\n", "\n", " # Load prompts from the registry\n", " title_extractor_prompt = mlflow.genai.load_prompt(f\"prompts:/{UC_CATALOG}.{UC_SCHEMA}.title_extractor/1\")\n", " clickbait_classifier_prompt = mlflow.genai.load_prompt(f\"prompts:/{UC_CATALOG}.{UC_SCHEMA}.clickbait_classifier/1\")\n", " clickbait_rewriter_prompt = mlflow.genai.load_prompt(f\"prompts:/{UC_CATALOG}.{UC_SCHEMA}.clickbait_rewriter/1\")\n", "\n", " def get_title_and_content(\n", " state: ClickbaitAgentState,\n", " config: RunnableConfig,\n", " ):\n", " chain = (\n", " RunnableLambda(lambda state: [\n", " SystemMessage(content=title_extractor_prompt.template)\n", " ] + state.messages)\n", " | model.bind_tools(tools)\n", " )\n", " response = chain.invoke(state, config)\n", " return {\"messages\": [response]}\n", " \n", " def tool_call(state: ClickbaitAgentState, config: 
RunnableConfig):\n", " result = ToolNode(tools).invoke(state, config)\n", " article = Article.model_validate_json(result[\"messages\"][-1].content)\n", " return {\n", " \"messages\": state.messages + result[\"messages\"],\n", " \"title\": article.title,\n", " \"content\": article.content\n", " }\n", " \n", " def classify(\n", " state: ClickbaitAgentState,\n", " config: RunnableConfig,\n", " ):\n", " chain = (\n", " RunnableLambda(lambda state: [\n", " SystemMessage(content=clickbait_classifier_prompt.template),\n", " HumanMessage(content=state.title)\n", " ])\n", " | model.with_structured_output(Classification)\n", " )\n", " classification = chain.invoke(state, config)\n", " return {\n", " \"messages\": state.messages\n", " + [AIMessage(content=f\"Classification result: {classification.model_dump()}\")],\n", " \"is_clickbait\": classification.is_clickbait,\n", " \"classification_reason\": classification.classification_reason,\n", " }\n", " \n", " def is_clickbait(\n", " state: ClickbaitAgentState,\n", " config: RunnableConfig,\n", " ):\n", " assert state.is_clickbait is not None\n", " return \"clickbait\" if state.is_clickbait else \"no_clickbait\"\n", " \n", " def clickbait_response(\n", " state: ClickbaitAgentState,\n", " config: RunnableConfig\n", " ):\n", " assert state.title is not None\n", " assert state.content is not None\n", " chain = (\n", " RunnableLambda(lambda state: [\n", " HumanMessage(content = clickbait_rewriter_prompt.format(\n", " title=state.title,\n", " content=state.content,\n", " ))\n", " ])\n", " | model\n", " )\n", " response = chain.invoke(state, config)\n", " return { \"messages\": state.messages + [response]}\n", " \n", " def no_clickbait_response(\n", " state: ClickbaitAgentState,\n", " config: RunnableConfig\n", " ):\n", " return { \"messages\": state.messages + [AIMessage(content=state.title)] }\n", " \n", " graph = StateGraph(ClickbaitAgentState)\n", "\n", " graph.add_node(\"get_title_and_content\", 
RunnableLambda(get_title_and_content))\n", " graph.add_node(\"tool_call\", RunnableLambda(tool_call))\n", " graph.add_node(\"classify\", RunnableLambda(classify))\n", " graph.add_node(\"clickbait_response\", RunnableLambda(clickbait_response))\n", " graph.add_node(\"no_clickbait_response\", RunnableLambda(no_clickbait_response))\n", "\n", " graph.set_entry_point(\"get_title_and_content\")\n", " graph.add_edge(\"get_title_and_content\", \"tool_call\")\n", " graph.add_edge(\"tool_call\", \"classify\")\n", " graph.add_conditional_edges(\n", " \"classify\",\n", " is_clickbait,\n", " {\n", " \"clickbait\": \"clickbait_response\",\n", " \"no_clickbait\": \"no_clickbait_response\",\n", " },\n", " )\n", " graph.add_edge(\"clickbait_response\", END)\n", " graph.add_edge(\"no_clickbait_response\", END)\n", " \n", " return graph.compile()" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "abf08bcd-5c45-4a2f-8a45-bb89a27b6e90", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Logging a LangGraph agent into MLflow\n", "\n", "- Logging the agent's executions, called *traces*, is enabled simply by calling `mlflow.langchain.autolog()`.\n", "- `set_active_model` sets the active model with the specified name; traces generated from then on are linked to this model. The model name can include a version. In this notebook, a Unix timestamp is appended to the name instead of a version from a versioning system. [This page](https://mlflow.org/docs/latest/genai/prompt-version-mgmt/version-tracking/track-application-versions-with-mlflow#step-3-link-traces-to-the-application-version) of the MLflow documentation gives some insights about agent versioning.\n", "- Model hyperparameters can be associated with the active model using `log_model_params`.\n", "- Finally, the agent graph is instantiated. 
It is important to keep this step last, after enabling logging and setting the active model. Once the agent graph has been created, it can be visualized with the `display` function." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "ac527b9f-947d-40a3-93ea-b680f0a3629f", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ "2025/06/24 11:03:03 INFO mlflow.tracking.fluent: LoggedModel with name 'clickbait-langgraph-1750762982' does not exist, creating one...\n2025/06/24 11:03:04 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-3505dfc179ef47daba099a06eae6cec6\n" ] }, { "output_type": "display_data", "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from time import time\n", "\n", "mlflow.langchain.autolog()\n", "active_model_info = mlflow.set_active_model(name=f\"clickbait-langgraph-{int(time())}\")\n", "mlflow.log_model_params(model_id=active_model_info.model_id, params=AGENT_PARAMETERS)\n", "agent = create_clickbait_agent(tools, **AGENT_PARAMETERS)\n", "display(agent)" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "5ab848b6-08be-452f-b2df-9561f5d321cd", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "## Invoke the agent\n", "\n", "With the agent created, it can now be called through its `invoke` method by passing a user message.\n", "\n", "We can confirm that autologging is working by checking the trace displayed in the cell output." 
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "1cc2fdb6-5945-44d9-9d63-a5a1ca2b0ff1", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "{'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'fetch_url_title_and_content', 'type': 'function', 'function': {'name': 'fetch_url_title_and_content', 'arguments': '{\"url\":\"https://www.lavanguardia.com/cribeo/fast-news/20250527/10723991/popular-nombre-masculino-espana-jonathan-coeficiente-intelectual-bajo-mmn.amp.html\"}'}}]}, response_metadata={'model': 'gemini-2.0-flash-lite', 'usage': {'prompt_tokens': 290, 'completion_tokens': 71, 'total_tokens': 361}, 'object': 'chat.completion', 'id': None, 'created': 1750762987, 'model_name': 'gemini-2.0-flash-lite'}, id='run--e5151a31-08eb-47d2-8c57-fec7e228e3d4-0', tool_calls=[{'name': 'fetch_url_title_and_content', 'args': {'url': 'https://www.lavanguardia.com/cribeo/fast-news/20250527/10723991/popular-nombre-masculino-espana-jonathan-coeficiente-intelectual-bajo-mmn.amp.html'}, 'id': 'fetch_url_title_and_content', 'type': 'tool_call'}]),\n", " ToolMessage(content='{\"title\": \"El popular nombre masculino en España que se asocia al coeficiente intelectual más bajo que hay\", \"content\": \"¿Puede el nombre que llevas influir en cómo te perciben los demás… o incluso en tu inteligencia? 
Según una reciente investigación realizada por un grupo de profesores de la Universidad de Stanford, existe una correlación estadística entre ciertos nombres y los resultados obtenidos en test de coeficiente intelectual.\\\\n\\\\nEl nombre masculino que, según los datos analizados, aparece con el CI medio más bajo es Jonathan, uno de los más comunes en países como España, Estados Unidos o México.\\\\n\\\\n### El nombre con peor puntuación en las pruebas de CI\\\\n\\\\nEl estudio, basado en una muestra de 70.000 personas, encontró que quienes se llamaban Jonathan registraban una puntuación media en los test de inteligencia de alrededor de 80 puntos, muy por debajo del promedio general establecido en 100. Esta cifra situaría a quienes llevan este nombre dentro del rango considerado de capacidad intelectual baja.\\\\n\\\\nEl calvario de David Castillo al encarnar a Jonathan en \\'Aída\\'\\\\n\\\\nCaptura Youtube/El sentido de la birra\\\\n\\\\nEste hallazgo resulta especialmente llamativo en España, donde más de 24.000 personas se llaman Jonathan, y existen además variantes como Jonatan (más de 19.000) y Yonatan (alrededor de 1.000), según los datos del INE.\\\\n\\\\n**Un vínculo cuestionado por expertos**. A pesar del impacto que puede tener un titular así, muchos especialistas en psicología y educación ponen en entredicho la validez de este tipo de investigaciones. ¿El motivo? Asociar directamente el nombre de una persona con su capacidad intelectual es, según señalan, metodológicamente problemático.\\\\n\\\\nEl nombre no es una variable intrínseca de la persona, sino que está profundamente ligado a factores culturales, sociales y económicos, que sí pueden tener una influencia real en el desarrollo cognitivo. 
Es decir, lo que puede existir no es una relación causal entre nombre e inteligencia, sino una correlación indirecta a través del entorno en el que nacen y crecen muchas personas con determinados nombres.\\\\n\\\\nCada vez tenemos puntuaciones de coeficiente intelectual más bajas\\\\n\\\\nOtras Fuentes\\\\n\\\\n### La inteligencia es más que un número\\\\n\\\\nOtro punto clave que critican los expertos es la reducción de la inteligencia al coeficiente intelectual. Aunque el CI puede ofrecer información sobre ciertas habilidades cognitivas como la memoria, la lógica o la capacidad matemática, no mide aspectos como la creatividad, la inteligencia emocional, la adaptabilidad, ni el pensamiento crítico, todos ellos esenciales para el desarrollo humano.\\\\n\\\\nPor tanto, este tipo de estudios, aunque llamativos, deben leerse con cautela. El riesgo de perpetuar estereotipos injustos y simplistas es elevado, y más aún si no se tiene en cuenta el contexto completo en el que vive cada persona.\"}', name='fetch_url_title_and_content', tool_call_id='fetch_url_title_and_content'),\n", " AIMessage(content=\"Classification result: {'is_clickbait': True, 'classification_reason': 'The headline uses a sensationalist claim about a popular male name in Spain and its association with the lowest IQ, which is likely to be an exaggeration or a generalization.'}\", additional_kwargs={}, response_metadata={}),\n", " AIMessage(content='Estudio encuentra correlación entre el nombre \"Jonathan\" y puntuaciones más bajas en pruebas de coeficiente intelectual, pero expertos cuestionan la validez.\\n', additional_kwargs={}, response_metadata={'model': 'gemini-2.0-flash-lite', 'usage': {'prompt_tokens': 670, 'completion_tokens': 31, 'total_tokens': 701}, 'object': 'chat.completion', 'id': None, 'created': 1750762993, 'model_name': 'gemini-2.0-flash-lite'}, id='run--d3dae01e-0492-4242-990c-1c2c8ad953a5-0')],\n", " 'title': 'El popular nombre masculino en España que se asocia al coeficiente 
intelectual más bajo que hay',\n", " 'content': \"¿Puede el nombre que llevas influir en cómo te perciben los demás… o incluso en tu inteligencia? Según una reciente investigación realizada por un grupo de profesores de la Universidad de Stanford, existe una correlación estadística entre ciertos nombres y los resultados obtenidos en test de coeficiente intelectual.\\n\\nEl nombre masculino que, según los datos analizados, aparece con el CI medio más bajo es Jonathan, uno de los más comunes en países como España, Estados Unidos o México.\\n\\n### El nombre con peor puntuación en las pruebas de CI\\n\\nEl estudio, basado en una muestra de 70.000 personas, encontró que quienes se llamaban Jonathan registraban una puntuación media en los test de inteligencia de alrededor de 80 puntos, muy por debajo del promedio general establecido en 100. Esta cifra situaría a quienes llevan este nombre dentro del rango considerado de capacidad intelectual baja.\\n\\nEl calvario de David Castillo al encarnar a Jonathan en 'Aída'\\n\\nCaptura Youtube/El sentido de la birra\\n\\nEste hallazgo resulta especialmente llamativo en España, donde más de 24.000 personas se llaman Jonathan, y existen además variantes como Jonatan (más de 19.000) y Yonatan (alrededor de 1.000), según los datos del INE.\\n\\n**Un vínculo cuestionado por expertos**. A pesar del impacto que puede tener un titular así, muchos especialistas en psicología y educación ponen en entredicho la validez de este tipo de investigaciones. ¿El motivo? Asociar directamente el nombre de una persona con su capacidad intelectual es, según señalan, metodológicamente problemático.\\n\\nEl nombre no es una variable intrínseca de la persona, sino que está profundamente ligado a factores culturales, sociales y económicos, que sí pueden tener una influencia real en el desarrollo cognitivo. 
Es decir, lo que puede existir no es una relación causal entre nombre e inteligencia, sino una correlación indirecta a través del entorno en el que nacen y crecen muchas personas con determinados nombres.\\n\\nCada vez tenemos puntuaciones de coeficiente intelectual más bajas\\n\\nOtras Fuentes\\n\\n### La inteligencia es más que un número\\n\\nOtro punto clave que critican los expertos es la reducción de la inteligencia al coeficiente intelectual. Aunque el CI puede ofrecer información sobre ciertas habilidades cognitivas como la memoria, la lógica o la capacidad matemática, no mide aspectos como la creatividad, la inteligencia emocional, la adaptabilidad, ni el pensamiento crítico, todos ellos esenciales para el desarrollo humano.\\n\\nPor tanto, este tipo de estudios, aunque llamativos, deben leerse con cautela. El riesgo de perpetuar estereotipos injustos y simplistas es elevado, y más aún si no se tiene en cuenta el contexto completo en el que vive cada persona.\",\n", " 'is_clickbait': True,\n", " 'classification_reason': 'The headline uses a sensationalist claim about a popular male name in Spain and its association with the lowest IQ, which is likely to be an exaggeration or a generalization.'}" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "application/databricks.mlflow.trace": "\"tr-70aaab048062ee7e5853c6a218d202b0\"", "text/plain": [ "Trace(trace_id=tr-70aaab048062ee7e5853c6a218d202b0)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "input_example = {\"messages\": [HumanMessage(content=\n", " \"is this article clickbait https://www.lavanguardia.com/cribeo/fast-news/20250527/10723991/popular-nombre-masculino-espana-jonathan-coeficiente-intelectual-bajo-mmn.amp.html ?\"\n", ")]}\n", "\n", "output = agent.invoke(input_example)\n", "\n", "display(output)" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, 
"rowLimit": 10000 }, "inputWidgets": {}, "nuid": "879508e6-953d-4b8c-922c-83917c55b383", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "## Agent evaluation\n", "\n", "Once an agent is implemented, it is essential to define a set of evaluation scores that enable both quantitative and qualitative assessments of its behavior. These scores provide meaningful insights into how the model, or specific components of it, is performing.\n", "\n", "MLflow 3's new GenAI API introduces the `evaluate` function, which runs scorer functions either on existing traces or on the outputs of a custom function passed via the `predict_fn` parameter. Because `predict_fn` can wrap either the entire agent or individual components (such as single nodes), it is possible to reproduce classic testing paradigms such as *integration tests* and *unit tests* within the agent evaluation workflow.\n", "\n", "Evaluation metrics are defined using the `@scorer` decorator. Custom scorer functions can access inputs, outputs, traces, and expectations, giving them full context to evaluate agent behavior.\n", "\n", "Custom scorers can typically be written in an implementation-agnostic way, so the agent logic can be replaced (potentially even with a different framework, such as DSPy) without changing the scorers themselves. The only adjustment likely to be needed when switching implementations is updating `predict_fn`, particularly if the new implementation has a different function signature.\n", "\n", "Inside a scorer, it is possible to make custom calls to language models, for example by passing them the output to assess specific properties. This approach resembles *property-based testing*, where evaluations are derived from general behavioral rules rather than hardcoded expected outputs. Scorers that rely on language models for evaluation are known as *judges*. 
Databricks provides several [built-in judges](https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/predefined-judge-scorers) for common evaluation scenarios, streamlining the process for standard use cases." ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "4e378244-a5ff-40c0-b23c-ad8ba1094336", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Useful datasets\n", "\n", "The following cell loads two datasets: a balanced sample of the *Kaggle clickbait dataset* and the articles stored in `urls_data.json`." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "1369c1bc-e919-4424-9388-349934512d58", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [], "source": [ "from mlflow.entities import Feedback, Trace\n", "from mlflow.genai.scorers import scorer\n", "import pandas as pd\n", "import random\n", "\n", "# Kaggle dataset: balanced sample of clickbait and non-clickbait headlines\n", "corpus = pd.read_csv(CORPUS_FILE)\n", "SAMPLE_SIZE = 100\n", "RANDOM_SEED = 42 # fix the random seed to get the same sample every time\n", "clickbait_sample = corpus[corpus[\"clickbait\"] == 1].sample(n=SAMPLE_SIZE//2, random_state=RANDOM_SEED)\n", "non_clickbait_sample = corpus[corpus[\"clickbait\"] == 0].sample(n=SAMPLE_SIZE//2, random_state=RANDOM_SEED)\n", "corpus_sample = pd.concat([clickbait_sample, non_clickbait_sample], ignore_index=True)\n", "\n", "# Dataset with URL, title and content (20 rows)\n", "url_articles = pd.read_json(\"./urls_data.json\")" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "adfd0a27-3f41-4e29-b6d9-c9e11aa9d4db", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, 
"source": [ "### Classification correctness\n", "\n", "The `check_clickbait` scorer evaluates whether the agent correctly classifies input titles as clickbait or not.\n", "- The dataset consists of `inputs` with a `title` field and `expectations` containing the expected clickbait classification label.\n", "- To isolate and evaluate just the classification logic, the `predict_fn` wraps the `invoke` method of the agent's `classify` node.\n", "- The scorer compares the agent's output to the expected label, marking incorrect predictions as either false positives or false negatives." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "6e342b43-032d-4765-b9c7-90c127fae6cb", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ "2025/06/24 11:03:15 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset.\n2025/06/24 11:03:21 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-3505dfc179ef47daba099a06eae6cec6\n2025/06/24 11:03:21 INFO mlflow.tracking.fluent: Use `mlflow.set_active_model` to set the active model to a different one if needed.\n2025/06/24 11:03:22 INFO mlflow.models.evaluation.utils.trace: Auto tracing is temporarily enabled during the model evaluation for computing some metrics and debugging. To disable tracing, call `mlflow.autolog(disable=True)`.\n" ] }, { "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a7d09c0a65294ed9996a9778610ee1e4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Evaluating: 0%| | 0/100 [Elapsed: 00:00, Remaining: ?] 
" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", " Evaluation output\n", " \n", " \n", " \n", "\n", "\n", "
\n", " \n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "application/databricks.mlflow.trace": "[\"tr-4fa9dffeb4cf7bbbfa846667f9aaadec\", \"tr-ccf8b80af7b7a0e7ce842e76a008741a\", \"tr-6097b19e20f7a542e2ba90c5d0b92316\", \"tr-15e402862170e0507f62e343148f6f96\", \"tr-3dcf5af1483ba1ff0a4a5173baa4f5af\", \"tr-1faadd7bff3a118ba43da5beee9ad1ea\", \"tr-c1a38835f3056a584d3fa206e33a92d3\", \"tr-9b524e1d84757b8321527cb86141e231\", \"tr-7acf1d30d0fce450411873cf6445fb37\", \"tr-9a18ef6f23a39461f287719ec5ad7877\"]", "text/plain": [ "[Trace(trace_id=tr-4fa9dffeb4cf7bbbfa846667f9aaadec), Trace(trace_id=tr-ccf8b80af7b7a0e7ce842e76a008741a), Trace(trace_id=tr-6097b19e20f7a542e2ba90c5d0b92316), Trace(trace_id=tr-15e402862170e0507f62e343148f6f96), Trace(trace_id=tr-3dcf5af1483ba1ff0a4a5173baa4f5af), Trace(trace_id=tr-1faadd7bff3a118ba43da5beee9ad1ea), Trace(trace_id=tr-c1a38835f3056a584d3fa206e33a92d3), Trace(trace_id=tr-9b524e1d84757b8321527cb86141e231), Trace(trace_id=tr-7acf1d30d0fce450411873cf6445fb37), Trace(trace_id=tr-9a18ef6f23a39461f287719ec5ad7877)]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "titles_dataset = (\n", " corpus_sample\n", " .apply(\n", " lambda row: {\n", " \"inputs\": {\"title\": row[\"headline\"]},\n", " \"expectations\": {\"is_clickbait\": (row[\"clickbait\"] == 1)},\n", " },\n", " axis=1,\n", " result_type=\"expand\",\n", " )\n", " .sample(frac=1, random_state=42)\n", ")\n", "\n", "def run_classify_node(title: str):\n", " return agent.nodes[\"classify\"].invoke(\n", " ClickbaitAgentState(title=title)\n", " )\n", "\n", "@scorer\n", "def check_clickbait(outputs, expectations):\n", "\n", " value = outputs[\"is_clickbait\"] == expectations[\"is_clickbait\"]\n", " rationale = (\n", " \"The model predicted the correct clickbait status\" if value else \"The model predicted the wrong clickbait status\"\n", " )\n", " \n", " feedback = [Feedback(\n", " name = \"success\",\n", " value = 
value,\n", " rationale=rationale\n", " )]\n", "\n", " if not value:\n", " if expectations[\"is_clickbait\"]:\n", " feedback.append(Feedback(\n", " name = \"false negative\",\n", " value = True))\n", " feedback.append(Feedback(\n", " name = \"false positive\",\n", " value = False))\n", " else:\n", " feedback.append(Feedback(\n", " name = \"false negative\",\n", " value = False))\n", " feedback.append(Feedback(\n", " name = \"false positive\",\n", " value = True))\n", "\n", " return feedback\n", "\n", "evaluation = mlflow.genai.evaluate(\n", " data=titles_dataset,\n", " scorers=[check_clickbait],\n", " predict_fn=run_classify_node\n", ")" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "d505cad3-dba7-4834-a9be-5e520a8de04d", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Clickbait response correctness\n", "\n", "The `check_clickbait_response` scorer evaluates whether the agent successfully rewrites titles classified as clickbait, removing the clickbait aspect from the revised version.\n", "\n", "- The dataset includes `title` and `content` fields as inputs.\n", "- To isolate and evaluate the rewriting logic, the `predict_fn` wraps the `invoke` method of the `clickbait_response` node.\n", "- The scorer then uses the `classify` node as a judge, applying it to the rewritten output to determine whether it would still be classified as clickbait." 
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "2fcf15a4-1cae-4911-a498-870a36239817", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ "2025/06/24 11:03:48 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset.\n2025/06/24 11:03:49 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-3505dfc179ef47daba099a06eae6cec6\n2025/06/24 11:03:49 INFO mlflow.tracking.fluent: Use `mlflow.set_active_model` to set the active model to a different one if needed.\n" ] }, { "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d18a7c9f87f94d31b49b6c56b079f51d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Evaluating: 0%| | 0/20 [Elapsed: 00:00, Remaining: ?] " ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", " Evaluation output\n", " \n", " \n", " \n", "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "application/databricks.mlflow.trace": "[\"tr-8f1ec97bce68ac772ca165640f81f01c\", \"tr-e0f860c862b29f920ce46b2e846f46fc\", \"tr-46cc4059d4974b4f9bc787fa926ce7b3\", \"tr-7881e9d0764cd2b3bbd8b2e41f0e5971\", \"tr-ad79b7e6bfb02c6bbc3b9de27238fdf3\", \"tr-b41e8b69bb5bf9f10c5599ae64e922aa\", \"tr-d261cd3b76be6d57d8d9ab2d2cd8ac76\", \"tr-d0bf3896118a42fd1acddb6ddbd119c0\", \"tr-f1305d5fcc3172f03948a6591e9714ec\", \"tr-65cb34b61daadeb5894f49942d30df15\"]", "text/plain": [ "[Trace(trace_id=tr-8f1ec97bce68ac772ca165640f81f01c), Trace(trace_id=tr-e0f860c862b29f920ce46b2e846f46fc), Trace(trace_id=tr-46cc4059d4974b4f9bc787fa926ce7b3), Trace(trace_id=tr-7881e9d0764cd2b3bbd8b2e41f0e5971), Trace(trace_id=tr-ad79b7e6bfb02c6bbc3b9de27238fdf3), Trace(trace_id=tr-b41e8b69bb5bf9f10c5599ae64e922aa), Trace(trace_id=tr-d261cd3b76be6d57d8d9ab2d2cd8ac76), Trace(trace_id=tr-d0bf3896118a42fd1acddb6ddbd119c0), Trace(trace_id=tr-f1305d5fcc3172f03948a6591e9714ec), Trace(trace_id=tr-65cb34b61daadeb5894f49942d30df15)]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "clickbait_titles_dataset = (\n", " url_articles\n", " .apply(\n", " lambda row: {\n", " \"inputs\": {\n", " \"title\": row[\"title\"],\n", " \"content\": row[\"content\"]\n", " }\n", " },\n", " axis=1,\n", " result_type=\"expand\",\n", " )\n", ")\n", "\n", "def run_clickbait_response_node(title: str, content: str):\n", " return agent.nodes[\"clickbait_response\"].invoke(\n", " ClickbaitAgentState(title=title, content=content)\n", " )\n", "\n", "@scorer\n", "def check_clickbait_response(inputs, outputs):\n", " output_title = outputs[\"messages\"][-1].content\n", " judge_output = agent.nodes[\"classify\"].invoke(\n", " ClickbaitAgentState(title=output_title)\n", " )\n", " return Feedback(\n", " name=\"response is no clickbait\",\n", " value=not judge_output[\"is_clickbait\"],\n", " 
rationale=judge_output[\"classification_reason\"],\n", " )\n", "\n", "evaluation = mlflow.genai.evaluate(\n", " data=clickbait_titles_dataset,\n", " scorers=[check_clickbait_response],\n", " predict_fn=run_clickbait_response_node\n", ")" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "22e6391a-ca71-48ed-a68e-63325522264c", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Tool call correctness\n", "\n", "The `check_tool_use` scorer verifies whether the agent invoked the expected tools during execution.\n", "\n", "- The dataset consists of user messages (in this case, URLs) as inputs, with expectations specifying that the agent should call the `fetch_url_title_and_content` tool.\n", "- The `predict_fn` wraps the full agent invocation, simulating a user message being processed end-to-end.\n", "- The scorer inspects the execution trace and collects all tool invocations by searching for spans of type `\"TOOL\"`. It stores the tool names in a set and compares that set with `expected_tools`, which is also converted into a set.\n", "\n", "Note: Tool names are stored in sets so that the order of invocation is ignored, which simplifies the comparison. Repeated invocations of the same tool (e.g., retries or multiple uses) remain visible, however: MLflow automatically renames repeated executions of the same tool by appending suffixes such as `_1`, `_2`, and so on, so each call keeps a distinct name in the set. One consequence is that an agent calling `fetch_url_title_and_content` more than once produces suffixed names that no longer match `expected_tools`, and the check fails."
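, "\n\nThe set comparison can be reproduced in plain Python. The sketch below is a simplified stand-in for `check_tool_use`, assuming span names are passed as a list instead of an MLflow trace; the names with `_1` and `_2` suffixes are hypothetical values written to match the renaming behavior described in the note. It also shows one possible alternative for cases where the number of calls matters: counting calls per tool after stripping a numeric suffix. That helper is an assumption for illustration, not part of the notebook or of MLflow.

```python
from collections import Counter


def compare_tool_calls(span_names: list[str], expected_tools: list[str]) -> tuple[bool, str]:
    # Order-insensitive check, mirroring the set comparison in check_tool_use.
    tool_calls = set(span_names)
    if not tool_calls:
        return False, 'No tool calls found'
    expected = set(expected_tools)
    if expected != tool_calls:
        return False, f'Expected {expected} but got {tool_calls}.'
    return True, 'Tool calls match expectations.'


def base_tool_name(span_name: str) -> str:
    # Hypothetical helper: drop a trailing _<digits> suffix added on repeated calls.
    # Caveat: a tool whose real name ends in _<digits> would also be truncated.
    head, sep, tail = span_name.rpartition('_')
    return head if sep and tail.isdigit() else span_name


def count_tool_calls(span_names: list[str]) -> Counter:
    # Counts calls per tool, e.g. to flag retries or unexpected repeated calls.
    return Counter(base_tool_name(name) for name in span_names)


single_call = ['fetch_url_title_and_content']
repeated_calls = ['fetch_url_title_and_content_1', 'fetch_url_title_and_content_2']
expected = ['fetch_url_title_and_content']

print(compare_tool_calls(single_call, expected))
print(compare_tool_calls(repeated_calls, expected))
print(count_tool_calls(repeated_calls))
```

Under the renaming described in the note, the second case fails the set check, which is usually the desired outcome if the agent is expected to fetch each URL exactly once."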
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "3128e56a-2376-49bd-bdc3-51d678640a70", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ "2025/06/24 11:03:58 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset.\n2025/06/24 11:04:13 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-3505dfc179ef47daba099a06eae6cec6\n2025/06/24 11:04:13 INFO mlflow.tracking.fluent: Use `mlflow.set_active_model` to set the active model to a different one if needed.\n" ] }, { "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2acd28b455dc4520915a16ea88f7952a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Evaluating: 0%| | 0/20 [Elapsed: 00:00, Remaining: ?] " ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", " Evaluation output\n", " \n", " \n", " \n", "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "application/databricks.mlflow.trace": "[\"tr-876c166d0f8f9758bb65b67f4dd938b9\", \"tr-61e3d40289989544614fd161d26629af\", \"tr-60211a55456b541afa70094026c1c8ee\", \"tr-cefbb0219a51562cbf6f2da864f09760\", \"tr-81224a14aea007c3466283ea3d50b7f7\", \"tr-346361759f634cc47e789290b8eedede\", \"tr-d1459550022334ebc78f4c73f1e29577\", \"tr-d71bc888967ec6b5a8fb5ba80cf9ae49\", \"tr-43550366f00d88b637841e01f8a17dc3\", \"tr-dc13d75f9f48c5a074888024def6b72a\"]", "text/plain": [ "[Trace(trace_id=tr-876c166d0f8f9758bb65b67f4dd938b9), Trace(trace_id=tr-61e3d40289989544614fd161d26629af), Trace(trace_id=tr-60211a55456b541afa70094026c1c8ee), Trace(trace_id=tr-cefbb0219a51562cbf6f2da864f09760), Trace(trace_id=tr-81224a14aea007c3466283ea3d50b7f7), Trace(trace_id=tr-346361759f634cc47e789290b8eedede), Trace(trace_id=tr-d1459550022334ebc78f4c73f1e29577), Trace(trace_id=tr-d71bc888967ec6b5a8fb5ba80cf9ae49), Trace(trace_id=tr-43550366f00d88b637841e01f8a17dc3), Trace(trace_id=tr-dc13d75f9f48c5a074888024def6b72a)]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tool_use_dataset = url_articles.apply(\n", " lambda row: {\n", " \"inputs\": {\"user_message\": row[\"url\"]},\n", " \"expectations\": {\"expected_tools\": [\"fetch_url_title_and_content\"]},\n", " },\n", " axis=1,\n", " result_type=\"expand\",\n", ")\n", "\n", "def invoke_agent(user_message: str):\n", " return agent.invoke(\n", " {\"messages\": [HumanMessage(content=user_message)]}\n", " )\n", "\n", "@scorer\n", "def check_tool_use(trace, expectations):\n", " tool_calls = {span.name for span in trace.search_spans(span_type=\"TOOL\")}\n", "\n", " if not tool_calls:\n", " return Feedback(value=False, rationale=\"No tool calls found\")\n", " \n", " expected_tools = set(expectations[\"expected_tools\"])\n", "\n", " if expected_tools != tool_calls:\n", " return Feedback(\n", " value=False,\n", " 
rationale=(\n", " \"Tool calls did not match expectations.\\n\"\n", " f\"Expected {expected_tools} but got {tool_calls}.\"\n", " )\n", " )\n", " return Feedback(value=True)\n", "\n", "check_tool_use_eval_result = mlflow.genai.evaluate(\n", " data=tool_use_dataset, scorers=[check_tool_use], predict_fn=invoke_agent\n", ")" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "8d6e9121-9a55-45ea-9c86-b8629b7c981e", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "source": [ "### Predefined judges\n", "\n", "MLflow GenAI provides several built-in judges that evaluate common behavioral expectations without custom logic. This section uses two of them: `Safety` and `Guidelines`.\n", "\n", "- The `Safety` scorer checks content, whether generated by the application or provided by a user, for harmful, unethical, or inappropriate material.\n", "- The `Guidelines` scorer evaluates outputs against natural-language rules framed as binary pass/fail conditions. These rules can be tailored to specific application constraints or behavioral policies.\n", "\n", "Unlike the earlier examples, which define a `predict_fn` to execute the agent during evaluation, this approach passes previously executed traces through the `data` parameter. This is useful when agent outputs have already been logged (for example, during experimentation or batch processing) and additional scoring needs to be applied without re-executing the agent."
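, "\n\nThe idea of scoring stored outputs without re-running the agent can be illustrated with a deterministic stand-in. In the sketch below, `logged_outputs` is a hypothetical list of responses that were already produced, and `no_question_rule` approximates the notebook's guideline (the response must not contain a question) with a simple check for a question mark. The real `Guidelines` judge evaluates the rule with an LLM, so this rule-based version is only an assumption for illustration and will miss questions phrased without a question mark.

```python
from typing import Callable

# Hypothetical outputs that were logged earlier; nothing is re-executed here.
logged_outputs = [
    {'trace_id': 'trace-a', 'response': 'The rewritten title is: Council approves new bike lanes.'},
    {'trace_id': 'trace-b', 'response': 'Is this the best phone ever made?'},
]


def no_question_rule(response: str) -> tuple[bool, str]:
    # Rule-based approximation of: the response must not contain a question.
    if '?' in response:
        return False, 'Response contains a question mark.'
    return True, 'No question mark found.'


def score_logged(outputs: list[dict], rule: Callable[[str], tuple[bool, str]]) -> dict[str, bool]:
    # Apply a pass/fail rule to each stored response, keyed by trace id.
    results = {}
    for record in outputs:
        passed, rationale = rule(record['response'])
        results[record['trace_id']] = passed
        print(record['trace_id'], 'pass' if passed else 'fail', '-', rationale)
    return results


results = score_logged(logged_outputs, no_question_rule)
pass_rate = sum(results.values()) / len(results)
print(f'pass rate: {pass_rate:.0%}')
```

Swapping `no_question_rule` for another function scores the same stored records against a different rule, much as additional scorers can be applied to existing traces."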
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "ee15813f-8539-4507-a781-473f608748b7", "showTitle": false, "tableResultSettingsMap": {}, "title": "" } }, "outputs": [ { "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ "2025/06/24 11:04:48 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-3505dfc179ef47daba099a06eae6cec6\n2025/06/24 11:04:48 INFO mlflow.tracking.fluent: Use `mlflow.set_active_model` to set the active model to a different one if needed.\n" ] }, { "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c4063d0b7f824b2aa540ac5126437cb7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Evaluating: 0%| | 0/20 [Elapsed: 00:00, Remaining: ?] " ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", " Evaluation output\n", " \n", " \n", " \n", "\n", "\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "output_type": "display_data", "data": { "application/databricks.mlflow.trace": "[\"tr-5f1c1f65a456a1ce427e946bfebd8441\", \"tr-dc13d75f9f48c5a074888024def6b72a\", \"tr-2e2ba56a964708470c413552e29752fd\", \"tr-d31fe933873ee75a035583319bc5fbd2\", \"tr-d6e9124245995f7cb8b8bed9ab29f22f\", \"tr-cc042bd8881398b87123a483698016ac\", \"tr-af28a1961642df91f100d885e7cb4aca\", \"tr-61e3d40289989544614fd161d26629af\", \"tr-81224a14aea007c3466283ea3d50b7f7\", \"tr-346361759f634cc47e789290b8eedede\"]", "text/plain": [ "[Trace(trace_id=tr-5f1c1f65a456a1ce427e946bfebd8441), Trace(trace_id=tr-dc13d75f9f48c5a074888024def6b72a), Trace(trace_id=tr-2e2ba56a964708470c413552e29752fd), Trace(trace_id=tr-d31fe933873ee75a035583319bc5fbd2), Trace(trace_id=tr-d6e9124245995f7cb8b8bed9ab29f22f), Trace(trace_id=tr-cc042bd8881398b87123a483698016ac), Trace(trace_id=tr-af28a1961642df91f100d885e7cb4aca), Trace(trace_id=tr-61e3d40289989544614fd161d26629af), Trace(trace_id=tr-81224a14aea007c3466283ea3d50b7f7), Trace(trace_id=tr-346361759f634cc47e789290b8eedede)]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from mlflow.genai.scorers import Safety, Guidelines\n", "\n", "traces = mlflow.search_traces(run_id=check_tool_use_eval_result.run_id)\n", "\n", "mlflow.genai.evaluate(\n", " data=traces,\n", " scorers=[\n", " Safety(),\n", " Guidelines(name=\"question\", guidelines=\"The response must not contain a question\")\n", " ],\n", ")" ] } ], "metadata": { "application/vnd.databricks.v1+notebook": { "computePreferences": null, "dashboards": [], "environmentMetadata": null, "inputWidgetPreferences": null, "language": "python", "notebookMetadata": { "pythonIndentUnit": 4 }, "notebookName": "Clickbait agent - LangGraph", "widgets": {} }, "language_info": { "name": "python" } 
}, "nbformat": 4, "nbformat_minor": 0 }