{"cells": [{"cell_type": "markdown", "id": "f2cf3ee9-d4dd-44d1-a529-6e92d9f8e023", "metadata": {}, "source": ["# LLM编译器代理食谱\n", "\n", "\"在Colab中打开\"/\n", "\n", "**注意**: 感谢[LLMCompiler源代码库](https://github.com/SqueezeAILab/LLMCompiler)。我们的许多实现都是从这个代码库中提取的(并且使用了LlamaIndex模块进行了适应)。\n", "\n", "在这本食谱中,我们将展示如何使用我们的LLMCompiler代理实现来处理各种设置。这包括使用一些简单的函数工具进行数学运算,还可以回答关于多个文档的更高级RAG用例的多部分查询。\n", "\n", "我们看到LLMCompilerAgent能够并行调用函数,比通过ReAct顺序执行能够更快地得到结果。\n"]}, {"cell_type": "code", "execution_count": null, "id": "47d9197e", "metadata": {}, "outputs": [], "source": ["%pip install llama-index-agent-llm-compiler\n", "%pip install llama-index-readers-wikipedia\n", "%pip install llama-index-llms-openai"]}, {"cell_type": "code", "execution_count": null, "id": "338345d0-0b03-44f0-b812-d98b530d047b", "metadata": {}, "outputs": [], "source": ["import nest_asyncio\n", "\n", "nest_asyncio.apply()"]}, {"cell_type": "code", "execution_count": null, "id": "653fd259-499a-4705-aff6-16890e3f5373", "metadata": {}, "outputs": [], "source": ["from llama_index.agent.llm_compiler import LLMCompilerAgentWorker"]}, {"cell_type": "markdown", "id": "b82925e9-0c6b-4e1a-82f7-5d238cfd4bc6", "metadata": {}, "source": ["## 使用简单函数测试LLMCompiler代理\n", "\n", "在这里,我们将使用简单的数学函数(加法、乘法)来测试LLMCompiler代理,以说明它的工作原理。\n"]}, {"cell_type": "code", "execution_count": null, "id": "e6fdf3ad-0ded-4165-bdc6-870f91dc723d", "metadata": {}, "outputs": [], "source": ["import json\n", "from typing import Sequence, List\n", "\n", "from llama_index.llms.openai import OpenAI\n", "from llama_index.core.llms import ChatMessage\n", "from llama_index.core.tools import BaseTool, FunctionTool"]}, {"cell_type": "markdown", "id": "cc38b021-97c8-4d00-8781-91ce081547c0", "metadata": {}, "source": ["### 定义函数\n"]}, {"cell_type": "code", "execution_count": null, "id": "c13da398-bd9e-4f2e-82d2-2dbb62a3ac52", "metadata": {}, "outputs": [], "source": ["def multiply(a: int, b: int) -> int:", " \"\"\"将两个整数相乘并返回结果整数\"\"\"", " return a * b", "", "", "multiply_tool = FunctionTool.from_defaults(fn=multiply)", "", "", "def add(a: int, b: int) -> int:", " \"\"\"将两个整数相加并返回结果整数\"\"\"", " return a + b", "", "", "add_tool = FunctionTool.from_defaults(fn=add)", "", "tools = [multiply_tool, add_tool]"]}, {"cell_type": "code", "execution_count": null, "id": "9a77611c-5df1-488e-8856-6bc5cdb4249e", "metadata": {}, "outputs": [{"data": {"text/plain": ["'{\"type\": \"object\", \"properties\": {\"a\": {\"title\": \"A\", \"type\": \"integer\"}, \"b\": {\"title\": \"B\", \"type\": \"integer\"}}, \"required\": [\"a\", \"b\"]}'"]}, "execution_count": null, "metadata": {}, "output_type": "execute_result"}], "source": ["multiply_tool.metadata.fn_schema_str"]}, {"cell_type": "markdown", "id": "b9ce60c7-9625-42a2-87d6-a8fe7854a4ca", "metadata": {}, "source": ["### 设置LLMCompiler代理\n", "\n", "我们导入`LLMCompilerAgentWorker`并将其与`AgentRunner`结合在一起。\n"]}, {"cell_type": "code", "execution_count": null, "id": "faf87b3e-a027-4da2-a402-a704cd03a728", "metadata": {}, "outputs": [], "source": ["from llama_index.core.agent import AgentRunner"]}, {"cell_type": "code", "execution_count": null, "id": "35b3fa8a-647f-4268-aac2-4816fcfbdecc", "metadata": {}, "outputs": [], "source": ["llm = OpenAI(model=\"gpt-4\")"]}, {"cell_type": "code", "execution_count": null, "id": "413a6043-4824-4a12-9de7-fbfe15c4e541", "metadata": {}, "outputs": [], "source": ["callback_manager = llm.callback_manager"]}, {"cell_type": "code", "execution_count": null, "id": "1c14e9bb-b401-414e-916e-4c575b26b5e9", "metadata": {}, "outputs": [], "source": ["agent_worker = LLMCompilerAgentWorker.from_tools(\n", " tools, llm=llm, verbose=True, callback_manager=callback_manager\n", ")\n", "agent = AgentRunner(agent_worker, callback_manager=callback_manager)"]}, {"cell_type": "markdown", "id": "c7f2da7b-e611-4441-ab0e-22b8748ec658", "metadata": {}, "source": ["### 测试一些查询\n"]}, {"cell_type": "code", "execution_count": null, "id": "a0980a07-d2f5-434a-b9e9-80deabc0d1a9", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["> Running step 1ecbfd84-5440-4390-b954-a6c83b2dcb9a for task 8ac1971d-e9ee-4bf3-9e42-b6b07ab7c2da.\n", "> Step count: 0\n", "\u001b[1;3;38;5;200m> Plan: 1. multiply(121, 3)\n", "2. add($1, 42)\n", "Thought: I can answer the question now.\n", "3. join()\n", "\u001b[0m\u001b[1;3;34mRan task: multiply. Observation: 363\n", "\u001b[0m\u001b[1;3;34mRan task: add. Observation: 405\n", "\u001b[0m\u001b[1;3;34mRan task: join. Observation: None\n", "\u001b[0m\u001b[1;3;38;5;200m> Thought: The result of the operation is 405.\n", "\u001b[0m\u001b[1;3;38;5;200m> Answer: 405\n", "\u001b[0m"]}], "source": ["response = agent.chat(\"What is (121 * 3) + 42?\")"]}, {"cell_type": "code", "execution_count": null, "id": "6ebe5e1f-4929-4f47-9af2-1da8faebe638", "metadata": {}, "outputs": [{"data": {"text/plain": ["AgentChatResponse(response='405', sources=[], source_nodes=[], is_dummy_stream=False)"]}, "execution_count": null, "metadata": {}, "output_type": "execute_result"}], "source": ["response"]}, {"cell_type": "code", "execution_count": null, "id": "a3dfd5c4-633c-4729-a337-a4fcf13d8a76", "metadata": {}, "outputs": [{"data": {"text/plain": ["[ChatMessage(role=, content='What is (121 * 3) + 42?', additional_kwargs={}),\n", " ChatMessage(role=, content='405', additional_kwargs={})]"]}, "execution_count": null, "metadata": {}, "output_type": "execute_result"}], "source": ["agent.memory.get_all()"]}, {"cell_type": "markdown", "id": "c1e836cb-f481-4033-afca-211a52ef0665", "metadata": {}, "source": ["## 尝试使用LLMCompiler进行RAG\n", "\n", "现在让我们尝试使用LLMCompiler进行RAG的用例。具体来说,我们将加载一个包含各种城市维基百科文章的数据集,并对其提出问题。\n"]}, {"cell_type": "markdown", "id": "975c5d36-6d80-4004-a565-15351b7ac702", "metadata": {}, "source": ["### 设置数据\n", "\n", "我们使用我们的 `WikipediaReader` 来加载各个城市的数据。\n"]}, {"cell_type": "code", "execution_count": null, "id": "c2032f42", "metadata": {}, "outputs": [], "source": ["%pip install llama-index-readers-wikipedia wikipedia"]}, {"cell_type": "code", "execution_count": null, "id": "a50dccdc-491e-4ff2-8657-5dfea6fab284", "metadata": {}, "outputs": [], "source": ["from llama_index.readers.wikipedia import WikipediaReader"]}, {"cell_type": "code", "execution_count": null, "id": "2647c3be-1637-4b88-ae76-bb513f7e9682", "metadata": {}, "outputs": [], "source": ["wiki_titles = [\"Toronto\", \"Seattle\", \"Chicago\", \"Boston\", \"Miami\"]"]}, {"cell_type": "code", "execution_count": null, "id": "8eeb7add-aa6c-4f47-8d8c-6a6d222adc55", "metadata": {}, "outputs": [], "source": ["city_docs = {}\n", "reader = WikipediaReader()\n", "for wiki_title in wiki_titles:\n", " docs = reader.load_data(pages=[wiki_title])\n", " city_docs[wiki_title] = docs"]}, {"cell_type": "markdown", "id": "4af02a14-0d58-4570-93c5-78e1c87c8c72", "metadata": {}, "source": ["### 设置LLM + 服务上下文\n"]}, {"cell_type": "code", "execution_count": null, "id": "6425ad0e-2f8e-4101-8730-387f4aab8bfa", "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["/var/folders/0g/wd11bmkd791fz7hvgy1kqyp00000gn/T/ipykernel_52010/432572999.py:6: DeprecationWarning: Call to deprecated class method from_defaults. (ServiceContext is deprecated, please use `llama_index.settings.Settings` instead.) -- Deprecated since version 0.10.0.\n", " service_context = ServiceContext.from_defaults(llm=llm)\n"]}], "source": ["from llama_index.core import ServiceContext\n", "from llama_index.llms.openai import OpenAI\n", "from llama_index.core.callbacks import CallbackManager\n", "\n", "llm = OpenAI(temperature=0, model=\"gpt-4\")\n", "service_context = ServiceContext.from_defaults(llm=llm)\n", "callback_manager = CallbackManager([])"]}, {"cell_type": "markdown", "id": "c3a52155-1790-44e6-b6be-14964b0b25f1", "metadata": {}, "source": ["### 定义工具集\n", "\n", "在这个项目中,我们将使用以下工具集:\n", "\n", "- Python 3:一种流行的编程语言,用于开发机器学习模型和处理数据。\n", "- NumPy:用于进行科学计算的Python库,特别适用于处理多维数组和矩阵运算。\n", "- Pandas:提供数据分析和处理工具的Python库,特别适用于处理结构化数据。\n", "- Matplotlib:用于创建可视化图表和图形的Python库。\n", "- Scikit-learn:用于机器学习建模和数据分析的Python库,包含了许多常用的机器学习算法和工具。\n", "\n", "确保在运行代码之前安装了这些工具集。\n"]}, {"cell_type": "code", "execution_count": null, "id": "91324b15-9f98-4f52-a832-2eadc76b73c9", "metadata": {}, "outputs": [], "source": ["from llama_index.core import load_index_from_storage, StorageContext", "from llama_index.core.node_parser import SentenceSplitter", "from llama_index.core.tools import QueryEngineTool, ToolMetadata", "from llama_index.core import VectorStoreIndex", "import os", "", "node_parser = SentenceSplitter()", "", "# 构建代理字典", "query_engine_tools = []", "", "for idx, wiki_title in enumerate(wiki_titles):", " nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])", "", " if not os.path.exists(f\"./data/{wiki_title}\"):", " # 构建向量索引", " vector_index = VectorStoreIndex(", " nodes,", " service_context=service_context,", " callback_manager=callback_manager,", " )", " vector_index.storage_context.persist(", " persist_dir=f\"./data/{wiki_title}\"", " )", " else:", " vector_index = load_index_from_storage(", " StorageContext.from_defaults(persist_dir=f\"./data/{wiki_title}\"),", " service_context=service_context,", " callback_manager=callback_manager,", " )", " # 定义查询引擎", " vector_query_engine = vector_index.as_query_engine()", "", " # 定义工具", " query_engine_tools.append(", " QueryEngineTool(", " query_engine=vector_query_engine,", " metadata=ToolMetadata(", " name=f\"vector_tool_{wiki_title}\",", " description=(", " \"用于与特定方面相关的问题(例如历史、艺术与文化、体育、人口统计等)\"", " f\" {wiki_title} 相关的问题。\"", " ),", " ),", " )", " )"]}, {"cell_type": "markdown", "id": "29750eb8-b117-4777-b228-32e8becf662c", "metadata": {}, "source": ["### 设置LLMCompilerAgent\n", "\n", "```python\n", "from llmcompileragent import LLMCompilerAgent\n", "\n", "# 创建LLMCompilerAgent对象\n", "agent = LLMCompilerAgent()\n", "```\n"]}, {"cell_type": "code", "execution_count": null, "id": "dde0c37b-5660-43a0-a22f-78d4521204a8", "metadata": {}, "outputs": [], "source": ["from llama_index.core.agent import AgentRunner\n", "from llama_index.llms.openai import OpenAI\n", "\n", "llm = OpenAI(model=\"gpt-4\")\n", "agent_worker = LLMCompilerAgentWorker.from_tools(\n", " query_engine_tools,\n", " llm=llm,\n", " verbose=True,\n", " callback_manager=callback_manager,\n", ")\n", "agent = AgentRunner(agent_worker, callback_manager=callback_manager)"]}, {"cell_type": "markdown", "id": "cf2e41c1-3c68-4cc0-85f1-94eb21746bac", "metadata": {}, "source": ["### 测试查询\n", "\n", "这里我们将测试一些查询。\n"]}, {"cell_type": "code", "execution_count": null, "id": "54484552-e53f-4548-8719-8c188179b596", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["> Running step 7010cd56-3e69-4e25-8875-1ae0ab644e7d for task 1ec47644-e7e5-459e-81b4-2cb2f8105912.\n", "> Step count: 0\n", "\u001b[1;3;38;5;200m> Plan: 1. vector_tool_Miami(\"demographics\")\n", "2. vector_tool_Chicago(\"demographics\")\n", "3. join()\n", "\u001b[0m\u001b[1;3;34mRan task: vector_tool_Chicago. Observation: Chicago's demographics are diverse and have changed significantly over time. The city's industrial working class was initially formed by various ethnic groups, with a significant influx of African Americans from the American South in the early 20th century. The city also has a notable Bosnian population that arrived mainly in the 1990s and 2000s. Most of the foreign-born population in Chicago were born in Mexico, Poland, and India. \n", "\n", "As of July 2019, the largest racial or ethnic group in Chicago is non-Hispanic White, making up 32.8% of the population, followed by Blacks at 30.1% and the Hispanic population at 29.0%. The city also has the third-largest LGBT population in the United States, with an estimated 7.5% of the adult population identifying as LGBTQ in 2018. \n", "\n", "In terms of economic demographics, the median income for a household in the city was $47,408, and the median income for a family was $54,188, according to the U.S. Census Bureau's American Community Survey data estimates for 2008–2012. About 18.3% of families and 22.1% of the population lived below the poverty line. \n", "\n", "Religion also plays a significant role in the city's demographics. Christianity is the most practiced religion in Chicago, with Roman Catholicism and Protestantism being the largest branches. The city also has a sizable non-Christian population, including those who are irreligious, Jewish, Muslim, Buddhist, and Hindu.\n", "\u001b[0m\u001b[1;3;34mRan task: vector_tool_Miami. Observation: Miami is the largest city in South Florida and the second-largest in Florida, with a population of 442,241 as of the 2020 census. The city is part of the Miami metropolitan area, which has over 6 million residents. Miami's population is diverse, with a significant Hispanic majority. In 1970, the population was reported as 45.3% Hispanic, 32.9% non-Hispanic white, and 22.7% black. As of 2010, 34.4% of city residents were of Cuban origin, and other significant populations included Central American, South American, and West Indian or Afro-Caribbean American origins. The non-Hispanic white population has been growing in the 21st century, making up 14.0% of the population by 2020. The city also has a small Asian population, accounting for 1.0% of the population in 2010. Christianity is the most practiced religion in Miami, followed by Judaism, Islam, Buddhism, Hinduism, and other religions. As of 2022, there were 3,440 homeless people in Miami-Dade County, with 591 unsheltered homeless people within the city limits of Miami.\n", "\u001b[0m\u001b[1;3;34mRan task: join. Observation: None\n", "\u001b[0m\u001b[1;3;38;5;200m> Thought: Miami and Chicago are both diverse cities with significant Hispanic populations. Miami has a larger Hispanic majority, with a significant Cuban population. Chicago, on the other hand, has a more evenly distributed demographic, with non-Hispanic Whites being the largest group, followed closely by Blacks and Hispanics. Both cities have a significant Christian population, but Chicago also has a sizable non-Christian population. In terms of economic demographics, Chicago has a higher median income and a higher poverty rate than Miami.\n", "\u001b[0m\u001b[1;3;38;5;200m> Answer: Miami and Chicago are both diverse cities with significant Hispanic populations. Miami has a larger Hispanic majority, with a significant Cuban population. Chicago, on the other hand, has a more evenly distributed demographic, with non-Hispanic Whites being the largest group, followed closely by Blacks and Hispanics. Both cities have a significant Christian population, but Chicago also has a sizable non-Christian population. In terms of economic demographics, Chicago has a higher median income and a higher poverty rate than Miami.\n", "\u001b[0mMiami and Chicago are both diverse cities with significant Hispanic populations. Miami has a larger Hispanic majority, with a significant Cuban population. Chicago, on the other hand, has a more evenly distributed demographic, with non-Hispanic Whites being the largest group, followed closely by Blacks and Hispanics. Both cities have a significant Christian population, but Chicago also has a sizable non-Christian population. In terms of economic demographics, Chicago has a higher median income and a higher poverty rate than Miami.\n"]}], "source": ["response = agent.chat(\n", " \"Tell me about the demographics of Miami, and compare that with the demographics of Chicago?\"\n", ")\n", "print(str(response))"]}, {"cell_type": "code", "execution_count": null, "id": "e1e0c219-ce9b-435c-86e1-85f5aaf3afe8", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["> Running step c5ce4ccc-80a7-453b-8ebc-b3aeedfdb338 for task 16e3d4ca-7cab-4d0c-bbfa-225a09680ccd.\n", "> Step count: 0\n", "\u001b[1;3;38;5;200m> Plan: 1. vector_tool_Chicago(\"climate during wintertime\")\n", "2. vector_tool_Seattle(\"climate during wintertime\")\n", "3. join()\n", "\u001b[0m\u001b[1;3;34mRan task: vector_tool_Seattle. Observation: During wintertime, Seattle experiences cool, wet weather. Extreme cold temperatures, below about 15 °F or -9 °C, are rare due to the moderating influence of the adjacent Puget Sound, the greater Pacific Ocean, and Lake Washington. The city is often cloudy due to frequent storms and lows moving in from the Pacific Ocean, and it has many \"rain days\". However, the precipitation is often a light drizzle rather than heavy rainfall.\n", "\u001b[0m\u001b[1;3;34mRan task: vector_tool_Chicago. Observation: During wintertime, the city experiences relatively cold and snowy conditions. Blizzards can occur, as they did in winter 2011. The normal winter high from December through March is about 36 °F (2 °C). January and February are the coldest months. A polar vortex in January 2019 nearly broke the city's cold record of −27 °F (−33 °C), which was set on January 20, 1985. Measurable snowfall can continue through the first or second week of April. The city's proximity to Lake Michigan tends to keep the lakefront somewhat cooler in summer and less brutally cold in winter than inland parts of the city and suburbs away from the lake. Northeast winds from wintertime cyclones departing south of the region sometimes bring the city lake-effect snow.\n", "\u001b[0m\u001b[1;3;34mRan task: join. Observation: None\n", "\u001b[0m\u001b[1;3;38;5;200m> Thought: Comparing the two climates, Seattle seems to have a milder winter climate than Chicago.\n", "\u001b[0m\u001b[1;3;38;5;200m> Answer: Seattle\n", "\u001b[0mSeattle\n"]}], "source": ["response = agent.chat(\n", " \"Is the climate of Chicago or Seattle better during the wintertime?\"\n", ")\n", "print(str(response))"]}], "metadata": {"kernelspec": {"display_name": "llm-compiler", "language": "python", "name": "llm-compiler"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3"}}, "nbformat": 4, "nbformat_minor": 5}