<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/callbacks/OpenInferenceCallback.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>


# OpenInference Callback Handler + Arize Phoenix

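[OpenInference](https://github.com/Arize-ai/open-inference-spec) is an open standard for capturing and storing AI model inferences. It lets production LLM application servers integrate with LLM observability solutions such as [Arize](https://arize.com/) and [Phoenix](https://github.com/Arize-ai/phoenix).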

The `OpenInferenceCallbackHandler` saves data from LLM applications for downstream analysis and debugging. In particular, it saves the following data in columnar format:

- query IDs
- query text
- query embeddings
- scores (e.g., cosine similarity)
- retrieved document IDs
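
To make "columnar format" concrete, here is a minimal sketch that uses only the standard library. The `QueryRecord` class and the `to_columns` helper are hypothetical names chosen for illustration and are not part of the handler's API. The fields mirror the list above, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class QueryRecord:
    """Hypothetical record holding the fields listed above for one query."""

    query_id: str
    query_text: str
    query_embedding: List[float]
    scores: List[float]
    retrieved_document_ids: List[str]


def to_columns(records: List[QueryRecord]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per field (columnar layout)."""
    return {
        field.name: [getattr(record, field.name) for record in records]
        for field in fields(QueryRecord)
    }


# Made-up sample values, for illustration only
records = [
    QueryRecord("q1", "What is Bel?", [0.1, 0.2], [0.83, 0.76], ["doc-a", "doc-b"]),
    QueryRecord("q2", "Who founded YC?", [0.3, 0.4], [0.81, 0.79], ["doc-c", "doc-d"]),
]
columns = to_columns(records)
print(columns["query_text"])
```

Each key of `columns` holds all values of one field, which is the layout a dataframe or a Parquet file stores.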

This tutorial demonstrates how to use the callback handler for both in-notebook experimentation and lightweight production logging.

⚠️ The `OpenInferenceCallbackHandler` is in beta, and its API is subject to change.

ℹ️ If you find that your particular query engine or use case is not supported, open an issue on [GitHub](https://github.com/Arize-ai/open-inference-spec/issues).


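Before using the OpenAI API, you need to configure your API key. You can sign up and obtain a key on the [OpenAI website](https://beta.openai.com/signup/). Once you have the key, make it available to your code so that API calls can be made; the cell below does this through the `OPENAI_API_KEY` environment variable.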


In [None]:
import os
from getpass import getpass

if os.getenv("OPENAI_API_KEY") is None:
    os.environ["OPENAI_API_KEY"] = getpass(
        "Paste your OpenAI key from:"
        " https://platform.openai.com/account/api-keys\n"
    )
assert os.getenv("OPENAI_API_KEY", "").startswith(
    "sk-"
), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

OpenAI API key configured


## Install Dependencies and Import Libraries

Install the notebook's dependencies.


In [None]:
%pip install -q html2text llama-index pandas pyarrow tqdm
%pip install -q llama-index-readers-web
%pip install -q llama-index-callbacks-openinference

Import libraries.


In [None]:
import hashlib
import json
from pathlib import Path
import os
import textwrap
from typing import List, Union

import llama_index.core
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.callbacks import CallbackManager
from llama_index.callbacks.openinference import OpenInferenceCallbackHandler
from llama_index.callbacks.openinference.base import (
    as_dataframe,
    QueryData,
    NodeData,
)
from llama_index.core.node_parser import SimpleNodeParser
import pandas as pd
from tqdm import tqdm

## Load and Parse Documents

Load documents from Paul Graham's essay "What I Worked On".


In [None]:
documents = SimpleWebPageReader().load_data(
    [
        "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
    ]
)
print(documents[0].text)



What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in t

Parse the document into nodes. Display the text of the first node.


In [None]:
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
print(nodes[0].text)

What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the

## Access Data as a Pandas Dataframe

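When experimenting with chatbots and LLM applications in a notebook, it is common to run your chatbot against a small set of user queries, then collect and analyze the data for iterative improvement. The `OpenInferenceCallbackHandler` stores your data in columnar format and provides convenient access to it as a pandas dataframe.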

Instantiate the OpenInference callback handler.


In [None]:
callback_handler = OpenInferenceCallbackHandler()
callback_manager = CallbackManager([callback_handler])
llama_index.core.Settings.callback_manager = callback_manager

Build the index and instantiate the query engine.


In [None]:
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

Run your query engine across a collection of queries.


In [None]:
max_characters_per_line = 80
queries = [
    "What did Paul Graham do growing up?",
    "When and how did Paul Graham's mother die?",
    "What, in Paul Graham's opinion, is the most distinctive thing about YC?",
    "When and how did Paul Graham meet Jessica Livingston?",
    "What is Bel, and when and where was it written?",
]
for query in queries:
    response = query_engine.query(query)
    print("Query")
    print("=====")
    print(textwrap.fill(query, max_characters_per_line))
    print()
    print("Response")
    print("========")
    print(textwrap.fill(str(response), max_characters_per_line))
    print()

Query
=====
What did Paul Graham do growing up?

Response
Paul Graham grew up writing short stories and programming. He started
programming on an IBM 1401 in 9th grade using an early version of Fortran.
Later, he transitioned to microcomputers and began programming on a TRS-80,
where he wrote simple games and a word processor.

Query
=====
When and how did Paul Graham's mother die?

Response
Paul Graham's mother died when he was 18 years old, from a brain tumor.

Query
=====
What, in Paul Graham's opinion, is the most distinctive thing about YC?

Response
The most distinctive thing about YC, according to Paul Graham, is that instead
of deciding for himself what to work on, the problems come to him. Every 6
months, a new batch of startups brings their problems, which then become the
problems of YC. This aspect of YC's work is engaging and allows for a diverse
range of challenges to be addressed, making it a unique and dynamic environment
for learning about startups.

Query
=====
When an

The data from your query engine runs can be accessed as a pandas dataframe for analysis and iterative improvement.


In [None]:
query_data_buffer = callback_handler.flush_query_data_buffer()
query_dataframe = as_dataframe(query_data_buffer)
query_dataframe

Unnamed: 0,:id.id:,:timestamp.iso_8601:,:feature.text:prompt,:feature.[float].embedding:prompt,:feature.text:llm_prompt,:feature.[[str]]:llm_messages,:prediction.text:response,:feature.[str].retrieved_document_ids:prompt,:feature.[float].retrieved_document_scores:prompt
0,c0ac90c2-706d-41f6-b840-cfff2d5406ce,2024-02-20T16:03:47.852685,What did Paul Graham do growing up?,"[0.00727177644148469, -0.009682492353022099, 0...",,"[(system, You are an expert Q&A system that is...",Paul Graham grew up writing short stories and ...,[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.808843861957992, 0.7996330023661674]"
1,234e0990-6ff8-4253-a361-c139871565f4,2024-02-20T16:03:51.083002,When and how did Paul Graham's mother die?,"[0.015593511052429676, 0.004450097680091858, -...",,"[(system, You are an expert Q&A system that is...",Paul Graham's mother died when he was 18 years...,[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.7773106216458116, 0.7698260744207294]"
2,4128c740-ba0d-4f44-8c11-4360d6e27137,2024-02-20T16:03:53.847162,"What, in Paul Graham's opinion, is the most di...","[0.0027695773169398308, 0.001457934849895537, ...",,"[(system, You are an expert Q&A system that is...","The most distinctive thing about YC, according...",[9aaeaab1077c1a7faf97ded69d2d037e18ed9d3cb0e63...,"[0.8299917416870926, 0.8223302097228329]"
3,2d78599f-af33-4d62-9c24-3f8f95821abf,2024-02-20T16:03:57.576285,When and how did Paul Graham meet Jessica Livi...,"[0.002315425779670477, -0.0024678888730704784,...",,"[(system, You are an expert Q&A system that is...",Paul Graham met Jessica Livingston at a big pa...,[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.8240769796488667, 0.8076047627810241]"
4,7b9abfb5-3687-48bf-9da7-3361e7b401be,2024-02-20T16:03:58.605381,"What is Bel, and when and where was it written?","[0.009047380648553371, -0.013641595840454102, ...",,"[(system, You are an expert Q&A system that is...",Bel is a new Lisp that was written in Arc. It ...,[a4569bc4d16179f406798904d9cc4bbd7d3c0caca3161...,"[0.8315868190669687, 0.7641419929089962]"
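
As one example of the analysis this enables, the sketch below flags queries whose best retrieval score falls under a threshold. It uses plain Python lists, with the query text and scores copied from the run above, so it runs without the dataframe. The `flag_weak_retrievals` helper and the 0.8 threshold are illustrative assumptions, not recommendations.

```python
from typing import List, Tuple

# (query text, retrieved document scores) copied from the run above
rows: List[Tuple[str, List[float]]] = [
    ("What did Paul Graham do growing up?",
     [0.808843861957992, 0.7996330023661674]),
    ("When and how did Paul Graham's mother die?",
     [0.7773106216458116, 0.7698260744207294]),
    ("What, in Paul Graham's opinion, is the most distinctive thing about YC?",
     [0.8299917416870926, 0.8223302097228329]),
    ("When and how did Paul Graham meet Jessica Livingston?",
     [0.8240769796488667, 0.8076047627810241]),
    ("What is Bel, and when and where was it written?",
     [0.8315868190669687, 0.7641419929089962]),
]


def flag_weak_retrievals(
    rows: List[Tuple[str, List[float]]], threshold: float
) -> List[str]:
    """Return queries whose highest retrieval score is below the threshold.

    A query with no scores at all is treated as a score of 0.0 and flagged.
    """
    return [
        query for query, scores in rows if max(scores, default=0.0) < threshold
    ]


weak = flag_weak_retrievals(rows, threshold=0.8)
print(weak)
```

With these values, only the query about Paul Graham's mother is flagged, because its best score is about 0.777.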


The dataframe's column names conform to the OpenInference specification, which specifies the category, data type, and intent of each column.
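
The pattern visible in the column names above can be unpacked with a small parser. This is a sketch inferred only from the names in this notebook's output, which appear to follow `:<category>.<data type>[.<intent>]:<name>`. It is not derived from the specification text, and `parse_column_name` and `ColumnSpec` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ColumnSpec(NamedTuple):
    category: str
    data_type: str
    intent: Optional[str]
    name: str


# Assumed layout: ":<category>.<data type>[.<intent>]:<name>"
_COLUMN_PATTERN = re.compile(r"^:(?P<spec>[^:]+):(?P<name>.*)$")


def parse_column_name(column: str) -> ColumnSpec:
    """Split a column name into category, data type, intent, and name."""
    match = _COLUMN_PATTERN.match(column)
    if match is None:
        raise ValueError(f"Unrecognized column name: {column!r}")
    parts = match.group("spec").split(".")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"Unexpected column specifier: {column!r}")
    intent = parts[2] if len(parts) == 3 else None
    return ColumnSpec(parts[0], parts[1], intent, match.group("name"))


for column in [
    ":id.id:",
    ":feature.text:prompt",
    ":feature.[float].embedding:prompt",
    ":feature.[str].retrieved_document_ids:prompt",
]:
    print(parse_column_name(column))
```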


## Log Production Data

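In a production setting, maintainers of a LlamaIndex application can log the data their system generates by implementing a custom `callback` and passing it to the `OpenInferenceCallbackHandler`. The callback is a callable that accepts the handler's buffers (a `List[QueryData]` and, as in the reference implementation below, a `List[NodeData]`), persists the data (for example, by uploading it to cloud storage or sending it to a data ingestion service), and clears the buffers once the data is persisted. The reference implementation below writes the data in OpenInference format to a local Parquet file each time the query buffer reaches a given size.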


In [None]:
class ParquetCallback:
    def __init__(
        self, data_path: Union[str, Path], max_buffer_length: int = 1000
    ):
        self._data_path = Path(data_path)
        # create the directory; raises an error if it already exists
        self._data_path.mkdir(parents=True, exist_ok=False)
        self._max_buffer_length = max_buffer_length
        self._batch_index = 0

    def __call__(
        self,
        query_data_buffer: List[QueryData],
        node_data_buffer: List[NodeData],
    ) -> None:
        # flush only once the query buffer has reached the maximum length
        if len(query_data_buffer) >= self._max_buffer_length:
            query_dataframe = as_dataframe(query_data_buffer)
            file_path = self._data_path / f"log-{self._batch_index}.parquet"
            query_dataframe.to_parquet(file_path)
            self._batch_index += 1
            # ⚠️ clear the buffer, otherwise it will keep growing forever!
            query_data_buffer.clear()
            # the node data buffer is not logged, but it still needs clearing
            node_data_buffer.clear()

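⚠️ In a production setting, it is important to clear the buffers. Otherwise, the callback handler will accumulate data in memory indefinitely and eventually crash the system.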
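
The same flush-and-clear pattern can be sketched without Parquet or LlamaIndex. The `JsonLinesCallback` class below is a hypothetical standard-library stand-in that writes plain dictionaries to JSON Lines files. It assumes, as the reference implementation does, that the callback receives the handler's own list objects, which is why it clears them in place rather than rebinding the names. Unlike `ParquetCallback`, it tolerates an existing output directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class JsonLinesCallback:
    """Hypothetical stand-in for ParquetCallback that writes JSON Lines."""

    def __init__(self, data_path: Path, max_buffer_length: int = 1000) -> None:
        self._data_path = Path(data_path)
        self._data_path.mkdir(parents=True, exist_ok=True)
        self._max_buffer_length = max_buffer_length
        self._batch_index = 0

    def __call__(
        self,
        query_data_buffer: List[Dict[str, Any]],
        node_data_buffer: List[Dict[str, Any]],
    ) -> None:
        if len(query_data_buffer) < self._max_buffer_length:
            return  # not enough data yet; keep accumulating
        file_path = self._data_path / f"log-{self._batch_index}.jsonl"
        with file_path.open("w", encoding="utf-8") as f:
            for record in query_data_buffer:
                f.write(json.dumps(record) + "\n")
        self._batch_index += 1
        # clear() empties the caller's lists in place; assigning a new list
        # here would leave the caller's buffers untouched and still growing
        query_data_buffer.clear()
        node_data_buffer.clear()


# Simulate a handler that owns the buffers and invokes the callback per query
log_dir = Path(tempfile.mkdtemp()) / "logs"
callback = JsonLinesCallback(log_dir, max_buffer_length=2)
query_buffer: List[Dict[str, Any]] = []
node_buffer: List[Dict[str, Any]] = []
for i in range(5):
    query_buffer.append({"id": i, "prompt": f"query {i}"})
    node_buffer.append({"id": i})
    callback(query_buffer, node_buffer)

print(sorted(path.name for path in log_dir.iterdir()))
print(len(query_buffer))
```

After five simulated queries with a buffer length of 2, two batch files exist and one record is still waiting in each buffer. A real deployment would also need to flush that remainder on shutdown.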


Attach the Parquet writer to your callback handler and re-run the query engine. The data will be saved to disk.


In [None]:
data_path = "data"
parquet_writer = ParquetCallback(
    data_path=data_path,
    # this parameter is set artificially low for demonstration purposes
    # to force a flush to disk; in practice it would be much larger
    max_buffer_length=1,
)
callback_handler = OpenInferenceCallbackHandler(callback=parquet_writer)
callback_manager = CallbackManager([callback_handler])
llama_index.core.Settings.callback_manager = callback_manager
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

for query in tqdm(queries):
    query_engine.query(query)

100%|██████████| 5/5 [00:13<00:00,  2.70s/it]


Load and display the Parquet data saved to disk to verify that the logger is working.


In [None]:
query_dataframes = []
for file_name in os.listdir(data_path):
    file_path = os.path.join(data_path, file_name)
    query_dataframes.append(pd.read_parquet(file_path))
query_dataframe = pd.concat(query_dataframes)
query_dataframe

Unnamed: 0,:id.id:,:timestamp.iso_8601:,:feature.text:prompt,:feature.[float].embedding:prompt,:feature.text:llm_prompt,:feature.[[str]]:llm_messages,:prediction.text:response,:feature.[str].retrieved_document_ids:prompt,:feature.[float].retrieved_document_scores:prompt
0,e6f01e8e-f774-45da-beff-4fcb6c85726a,2024-02-20T16:04:02.364860,What did Paul Graham do growing up?,"[0.00727177644148469, -0.009682492353022099, 0...",,"[[system, You are an expert Q&A system that is...","Growing up, Paul Graham worked on writing shor...",[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.808843861957992, 0.7996330023661674]"
0,e024727e-4264-4f2b-8ab2-147df11da38d,2024-02-20T16:04:05.025023,When and how did Paul Graham's mother die?,"[0.015593511052429676, 0.004450097680091858, -...",,"[[system, You are an expert Q&A system that is...",Paul Graham's mother died when he was 18 years...,[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.7773106216458116, 0.7698260744207294]"
0,75cd0ccd-4899-4055-a8b9-38b4300070e9,2024-02-20T16:04:08.291817,"What, in Paul Graham's opinion, is the most di...","[0.0027695773169398308, 0.001457934849895537, ...",,"[[system, You are an expert Q&A system that is...","The most distinctive thing about YC, according...",[9aaeaab1077c1a7faf97ded69d2d037e18ed9d3cb0e63...,"[0.8299917416870926, 0.8223302097228329]"
0,44a0e668-e790-4a4b-ac62-4ee6b545ca6a,2024-02-20T16:04:11.318325,When and how did Paul Graham meet Jessica Livi...,"[0.002315425779670477, -0.0024678888730704784,...",,"[[system, You are an expert Q&A system that is...",Paul Graham met Jessica Livingston at a big pa...,[6c2625857520cae8185c229df96c5c8324d998503c98a...,"[0.8240769796488667, 0.8076047627810241]"
0,c59da7f0-0f6c-43f1-9600-6ade61cfd658,2024-02-20T16:04:13.206596,"What is Bel, and when and where was it written?","[0.009047380648553371, -0.013641595840454102, ...",,"[[system, You are an expert Q&A system that is...",Bel is a new Lisp that was written in Arc. It ...,[a4569bc4d16179f406798904d9cc4bbd7d3c0caca3161...,"[0.8315868190669687, 0.7641419929089962]"
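
`os.listdir` returns entries in arbitrary order, and a plain string sort would place `log-10.parquet` before `log-2.parquet`. If batch order matters, the file names can be sorted by their numeric index first. The `sort_log_files` helper below is a hypothetical sketch that assumes the `log-<index>.parquet` naming used by `ParquetCallback` above.

```python
import re
from typing import List

_LOG_FILE_PATTERN = re.compile(r"^log-(\d+)\.parquet$")


def sort_log_files(file_names: List[str]) -> List[str]:
    """Keep only log-<index>.parquet names and order them by numeric index."""
    indexed = []
    for file_name in file_names:
        match = _LOG_FILE_PATTERN.match(file_name)
        if match is not None:
            indexed.append((int(match.group(1)), file_name))
    return [file_name for _, file_name in sorted(indexed)]


print(sort_log_files(["log-10.parquet", "log-2.parquet", "notes.txt", "log-0.parquet"]))
```

The sorted names can then be read with `pd.read_parquet` as in the cell above. Passing `ignore_index=True` to `pd.concat` would also replace the repeated `0` index seen in the output with a running one.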
