{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/llm/rungpt.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["# RunGPT\n", "RunGPT is an open-source, cloud-native serving framework for large multimodal models (LMMs). It is designed to simplify the deployment and management of large language models on distributed GPU clusters. RunGPT aims to be a one-stop solution: a centralized, accessible place that gathers techniques for optimizing large multimodal models and makes those models easy for everyone to use. RunGPT already supports a number of LLMs, such as LLaMA, Pythia, StableLM, Vicuna, and MOSS, as well as LMMs such as MiniGPT-4 and OpenFlamingo.\n"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["## Setup\n"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["%pip install llama-index-llms-rungpt"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!pip install llama-index"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["You need to install the `rungpt` package in your Python environment with `pip install`.\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!pip install rungpt"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["Once the package is installed, any model supported by RunGPT can be deployed with a single command. The command downloads the target language model from an open-source platform and serves it on a local port, where it can be accessed through HTTP or gRPC requests. We assume you will run this command in a terminal rather than inside a Jupyter notebook.\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["## Basic Usage\n", "### Call `complete` with a prompt\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.llms.rungpt import RunGptLLM\n", "\n", "llm = RunGptLLM()\n", "prompt = \"What public transportation might be available in a city?\"\n", "response = llm.complete(prompt)"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["I don't want to go to work, so what should I do?\n", "I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?\n"]}], "source": ["print(response)"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["### Call `chat` with a list of messages\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core.llms import ChatMessage, MessageRole\n", "from llama_index.llms.rungpt import RunGptLLM\n", "\n", "messages = [\n", "    ChatMessage(\n", "        role=MessageRole.USER,\n", "        content=\"Now, I want you to do some math for me.\",\n", "    ),\n", "    ChatMessage(\n", "        role=MessageRole.ASSISTANT, content=\"Sure, I would like to help you.\"\n", "    ),\n", "    ChatMessage(\n", "        role=MessageRole.USER,\n", "        content=\"How many points determine a straight line?\",\n", "    ),\n", "]\n", "llm = RunGptLLM()\n", "response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["print(response)"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["## Streaming\n", "With streaming, the model's output is returned incrementally while it is being generated, so you can process each piece as soon as it arrives instead of waiting for the complete response.\n"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["Using the `stream_complete` endpoint\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["prompt = \"What public transportation might be available in a city?\"\n", "response = RunGptLLM().stream_complete(prompt)\n", "for item in response:\n", "    print(item.text)"]}, 
{"cell_type": "markdown", "metadata": {}, "source": ["Using the `stream_chat` endpoint\n"]}, 
{"cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": ["from llama_index.llms.rungpt import RunGptLLM\n", "\n", "messages = [\n", "    ChatMessage(\n", "        role=MessageRole.USER,\n", "        content=\"Now, I want you to do some math for me.\",\n", "    ),\n", "    ChatMessage(\n", "        role=MessageRole.ASSISTANT, content=\"Sure, I would like to help you.\"\n", "    ),\n", "    ChatMessage(\n", "        role=MessageRole.USER,\n", "        content=\"How many points determine a straight line?\",\n", "    ),\n", "]\n", "response = RunGptLLM().stream_chat(messages=messages)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["for item in response:\n", "    print(item.message)"]}], "metadata": {"kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3"}}, "nbformat": 4, "nbformat_minor": 4}