{"cells": [{"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["# 🚀 RAG/LLM Evaluators - DeepEval\n", "\n", "This code tutorial shows how you can easily integrate DeepEval with LlamaIndex. DeepEval makes it easy to unit-test your RAG/LLM applications.\n", "\n", "You can read more about the DeepEval framework here: https://docs.confident-ai.com/docs/getting-started\n", "\n", "Feel free to check out our repository here on GitHub: https://github.com/confident-ai/deepeval\n"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Set-up and Installation\n", "\n", "We recommend setting up and installing via pip!\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!pip install -q llama-index\n", "!pip install -U -q deepeval"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["This step is optional and only needed if you want a server-hosted dashboard! (Psst, we think you should!)\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!deepeval login"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Types of Metrics\n", "\n", "DeepEval presents an opinionated framework for unit testing RAG applications. It breaks down evaluations into test cases, and offers a range of evaluation metrics that you can freely evaluate each test case on, including:\n", "\n", "- G-Eval\n", "- Summarization\n", "- Answer Relevancy\n", "- Faithfulness\n", "- Contextual Recall\n", "- Contextual Precision\n", "- Contextual Relevancy\n", "- RAGAS\n", "- Hallucination\n", "- Bias\n", "- Toxicity\n", "\n", "[DeepEval](https://github.com/confident-ai/deepeval) incorporates the latest research into its evaluation metrics, which are then used to power LlamaIndex's evaluators. You can learn more about the full list of metrics and how they are calculated [here.](https://docs.confident-ai.com/docs/metrics-introduction)\n"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Step 1 - Setting Up Your LlamaIndex Application\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n", "\n", "# Read LlamaIndex's quickstart for more details; you will need to store your data in \"YOUR_DATA_DIRECTORY\" beforehand\n", "documents = SimpleDirectoryReader(\"YOUR_DATA_DIRECTORY\").load_data()\n", "index = VectorStoreIndex.from_documents(documents)\n", "rag_application = index.as_query_engine()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Step 2 - Using DeepEval's RAG/LLM evaluators\n"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["DeepEval offers 6 evaluators out of the box, some for RAG, some directly for LLM outputs (although also applicable to RAG). Let's try the faithfulness evaluator (which evaluates hallucination in RAG):\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from deepeval.integrations.llama_index import DeepEvalFaithfulnessEvaluator\n", "\n", "# An example input to your RAG application\n", "user_input = \"What is LlamaIndex?\"\n", "\n", "# LlamaIndex returns a response object that contains both the output string and the retrieved nodes\n", "response_object = rag_application.query(user_input)\n", "\n", "evaluator = DeepEvalFaithfulnessEvaluator()\n", "evaluation_result = evaluator.evaluate_response(\n", "    query=user_input, response=response_object\n", ")\n", "print(evaluation_result)"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Full List of Evaluators\n", "\n", "Here is the full list of all 6 evaluators you can import from `deepeval`:\n", "\n", "```python\n", "from deepeval.integrations.llama_index import (\n", "    DeepEvalAnswerRelevancyEvaluator,\n", "    DeepEvalFaithfulnessEvaluator,\n", "    DeepEvalContextualRelevancyEvaluator,\n", "    DeepEvalSummarizationEvaluator,\n", "    DeepEvalBiasEvaluator,\n", "    DeepEvalToxicityEvaluator,\n", ")\n", "```\n", "\n", "For all evaluator definitions and to understand how they integrate with DeepEval's testing suite, [click here.](https://docs.confident-ai.com/docs/integrations-llamaindex)\n", "\n", "## Useful Links\n", "\n", "- [DeepEval Quickstart](https://docs.confident-ai.com/docs/getting-started)\n", "- [Everything you need to know about LLM evaluation metrics](https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation)\n"]}], "metadata": {"colab": {"provenance": []}, "kernelspec": {"display_name": "Python 3", "name": "python3"}, "language_info": {"name": "python"}}, "nbformat": 4, "nbformat_minor": 0}