{"cells":[{"cell_type":"markdown","id":"4b089493","metadata":{},"source":["# Simulated Environment: Gymnasium\n","\n","For many applications of LLM agents, the environment is real (internet, database, REPL, etc.). However, we can also define agents that interact in simulated environments, such as text-based games. This is an example of how to create a simple agent-environment interaction loop with [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly [OpenAI Gym](https://github.com/openai/gym))."]},{"cell_type":"code","execution_count":1,"id":"f36427cf","metadata":{},"outputs":[],"source":["# Install the Gymnasium library\n","!pip install gymnasium"]},{"cell_type":"code","execution_count":2,"id":"f9bd38b4","metadata":{},"outputs":[],"source":["# Import the required modules\n","import gymnasium as gym  # environment API used by gym.make below\n","import tenacity  # retry logic for replies that cannot be parsed\n","from langchain.chat_models import ChatOpenAI  # chat model that drives the agent\n","from langchain.output_parsers import RegexParser  # extracts the action from the model reply\n","from langchain.schema import (  # message types used in the chat history\n","    HumanMessage,\n","    SystemMessage,\n",")"]},{"cell_type":"markdown","id":"e222e811","metadata":{},"source":["## Define the agent"]},{"cell_type":"code","execution_count":3,"id":"870c24bc","metadata":{},"outputs":[],"source":["class GymnasiumAgent:\n","    @classmethod\n","    def get_docs(cls, env):\n","        return env.unwrapped.__doc__  # docstring of the underlying environment\n","\n","    def __init__(self, model, env):\n","        self.model = model  # chat model\n","        self.env = env  # environment\n","        self.docs = self.get_docs(env)  # environment docstring, given to the model as context\n","\n","        self.instructions = \"\"\"\n","Your goal is to maximize your return, i.e. the sum of the rewards you receive.\n","I will give you an observation, reward, termination flag, truncation flag, and the return so far, formatted as:\n","\n","Observation: <observation>\n","Reward: <reward>\n","Termination: <terminated>\n","Truncation: <truncated>\n","Return: <sum_of_rewards>\n","\n","You will respond with an action, formatted as:\n","\n","Action: <action>\n","\n","where you replace <action> with your actual action.\n","Do nothing else but return the action.\n","\"\"\"\n","        self.action_parser = RegexParser(\n","            regex=r\"Action: (.*)\", output_keys=[\"action\"], default_output_key=\"action\"\n","        )  # regex parser that extracts the action from the reply\n","\n","        self.message_history = []  # chat history\n","        self.ret = 0  # return (sum of rewards) so far\n","\n","    def random_action(self):\n","        action = self.env.action_space.sample()  # sample a random action\n","        return action\n","\n","    def reset(self):\n","        self.message_history = [\n","            SystemMessage(content=self.docs),  # environment docstring\n","            SystemMessage(content=self.instructions),  # task instructions\n","        ]\n","\n","    def observe(self, obs, rew=0, term=False, trunc=False, info=None):\n","        self.ret += rew  # update the return\n","\n","        obs_message = f\"\"\"\n","Observation: {obs}\n","Reward: {rew}\n","Termination: {term}\n","Truncation: {trunc}\n","Return: {self.ret}\n","        \"\"\"  # build the observation message\n","        self.message_history.append(HumanMessage(content=obs_message))  # record the observation\n","        return obs_message\n","\n","    def _act(self):\n","        act_message = self.model.invoke(self.message_history)  # ask the model for an action\n","        self.message_history.append(act_message)  # record the reply\n","        # Parse the action; a reply that does not match, or is not an integer, raises ValueError\n","        action = int(self.action_parser.parse(act_message.content)[\"action\"])\n","        return action\n","\n","    def act(self):\n","        try:\n","            for attempt in tenacity.Retrying(\n","                stop=tenacity.stop_after_attempt(2),  # at most 2 attempts in total\n","                wait=tenacity.wait_none(),  # no wait between attempts\n","                retry=tenacity.retry_if_exception_type(ValueError),  # retry only on ValueError\n","                before_sleep=lambda retry_state: print(\n","                    f\"ValueError occurred: {retry_state.outcome.exception()}, retrying...\"\n","                ),  # log the error before retrying\n","            ):\n","                with attempt:\n","                    action = self._act()  # ask the model for an action\n","        except tenacity.RetryError:\n","            action = self.random_action()  # fall back to a random action once the attempts are exhausted\n","        return action"]},{"cell_type":"markdown","id":"2e76d22c","metadata":{},"source":["## Initialize the simulated environment and agent"]},{"cell_type":"code","execution_count":4,"id":"9e902cfd","metadata":{},"outputs":[],"source":["# Create the Blackjack-v1 environment\n","env = gym.make(\"Blackjack-v1\")\n","# Create the agent from a ChatOpenAI model (temperature 0.2) and the environment\n","agent = GymnasiumAgent(model=ChatOpenAI(temperature=0.2), env=env)"]},{"cell_type":"markdown","id":"e2c12b15","metadata":{},"source":["## Main loop"]},{"cell_type":"code","execution_count":5,"id":"ad361210","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["\n","Observation: (15, 4, 0)\n","Reward: 0\n","Termination: False\n","Truncation: False\n","Return: 0\n","        \n","Action: 1\n","\n","Observation: (25, 4, 0)\n","Reward: -1.0\n","Termination: True\n","Truncation: False\n","Return: -1.0\n","        \n","break True False\n"]}],"source":["# Reset the environment and get the initial observation and info\n","observation, info = env.reset()\n","\n","# Reset the agent\n","agent.reset()\n","\n","# Let the agent observe the initial state\n","obs_message = agent.observe(observation)\n","print(obs_message)\n","\n","# Main loop\n","while True:\n","    # The agent chooses an action\n","    action = agent.act()\n","\n","    # Apply the action and get the new observation, reward, termination flag, truncation flag, and info\n","    observation, reward, termination, truncation, info = env.step(action)\n","\n","    # Let the agent observe the result\n","    obs_message = agent.observe(observation, reward, termination, truncation, info)\n","    print(f\"Action: {action}\")\n","    print(obs_message)\n","\n","    # Stop once the episode has terminated or been truncated\n","    if termination or truncation:\n","        print(\"break\", termination, truncation)\n","        break\n","\n","# Close the environment\n","env.close()"]},{"cell_type":"code","execution_count":null,"id":"58a13e9c","metadata":{},"outputs":[],"source":[]}],"metadata":{"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.16"}},"nbformat":4,"nbformat_minor":5}