{ "cells": [ { "cell_type": "markdown", "id": "500ad5dd", "metadata": {}, "source": [ "# Building a Coding Agent with GPT-5.1 and the OpenAI Agents SDK\n", "\n", "GPT-5.1 is exceptionally strong at coding, and with the new code-editing and command-execution tools available in the [Responses API](https://platform.openai.com/docs/api-reference/responses), it’s now easier than ever to build coding agents that can work across full codebases and iterate quickly.\n", "\n", "In this guide, we’ll use the [Agents SDK](https://openai.github.io/openai-agents-python/) to build a **coding agent that can scaffold a brand-new app from a prompt and refine it through user feedback**. Our agent will be equipped with the following tools:\n", "\n", "- **apply_patch** — to edit files\n", "- **shell** — to run shell commands\n", "- **web_search** — to pull fresh information from the web\n", "- **Context7 MCP** — to access up-to-date documentation\n", "\n", "We’ll begin by focusing on the `shell` and `web_search` tools to generate a new project with web-sourced context. Then we’ll add `apply_patch` so the agent can iterate on the codebase, and we’ll connect it to the [Context7 MCP server](https://context7.com/) so it can write code informed by the most recent docs." ] }, { "cell_type": "markdown", "id": "7d1bea10", "metadata": {}, "source": [ "## Set up the agent\n", "\n", "With the Agents SDK, defining an agent is as simple as providing instructions and a list of tools. In this example, we want to use the newest `gpt-5.1` model for its state-of-the-art coding abilities.\n", "\n", "We’ll start by enabling `web_search`, which lets the agent look up current information online, and `shell`, which lets the agent propose shell commands for tasks like scaffolding, installing dependencies, and running build steps.\n", "\n", "The shell tool works by letting the model propose commands it believes should be executed. 
Your environment is responsible for actually running those commands and returning the output.\n", "\n", "The Agents SDK automates most of this command-execution handshake for you. You only need to implement the shell executor, which defines the environment in which those commands run." ] }, { "cell_type": "code", "execution_count": null, "id": "e03e427a", "metadata": {}, "outputs": [], "source": [ "%pip install openai-agents openai" ] }, { "cell_type": "code", "execution_count": 1, "id": "4e7a48c0", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Make sure your OpenAI API key is defined (you can set it in your global environment, or export it manually)\n", "# export OPENAI_API_KEY=\"sk-...\"\n", "assert \"OPENAI_API_KEY\" in os.environ, \"Please set OPENAI_API_KEY first.\"" ] }, { "cell_type": "markdown", "id": "82ac5519", "metadata": {}, "source": [ "### Define a working environment and shell executor\n", "\n", "For simplicity, we'll run shell commands locally from a dedicated workspace directory. Commands start in that folder, so relative paths resolve inside it, but a working directory alone does not prevent a command from touching files elsewhere.\n", "\n", "**Note:** In production, **always execute shell commands in a sandboxed environment**. Arbitrary command execution is inherently risky and must be tightly controlled." 
] }, { "cell_type": "code", "execution_count": 2, "id": "42b89fc1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Workspace directory: /Users/katia/dev/openai-cookbook/examples/coding-agent-workspace\n" ] } ], "source": [ "# Create an isolated workspace for shell commands\n", "from pathlib import Path\n", "\n", "workspace_dir = Path(\"coding-agent-workspace\").resolve()\n", "workspace_dir.mkdir(exist_ok=True)\n", "\n", "print(f\"Workspace directory: {workspace_dir}\")" ] }, { "cell_type": "markdown", "id": "d8eca9ba", "metadata": {}, "source": [ "We’ll now define a small `ShellExecutor` class that:\n", "\n", "- Receives a `ShellCommandRequest` from the agent\n", "- Optionally asks for approval before running commands\n", "- Runs them using `asyncio.create_subprocess_shell`\n", "- Returns a `ShellResult` with the outputs\n", "\n", "All commands will run with `cwd=workspace_dir`, so relative paths resolve inside that subfolder." ] }, { "cell_type": "code", "execution_count": 3, "id": "e8bae5bb", "metadata": {}, "outputs": [], "source": [ "import asyncio\n", "import os\n", "from collections.abc import Sequence\n", "from pathlib import Path\n", "from typing import Literal\n", "\n", "from agents import (\n", "    ShellTool,\n", "    ShellCommandRequest,\n", "    ShellCommandOutput,\n", "    ShellCallOutcome,\n", "    ShellResult,\n", ")\n", "\n", "\n", "async def require_approval(commands: Sequence[str]) -> None:\n", "    \"\"\"\n", "    Ask for confirmation before running shell commands.\n", "\n", "    Set SHELL_AUTO_APPROVE=1 in your environment to skip this prompt\n", "    (useful when you're iterating a lot or running in CI).\n", "    \"\"\"\n", "    if os.environ.get(\"SHELL_AUTO_APPROVE\") == \"1\":\n", "        return\n", "\n", "    print(\"Shell command approval required:\")\n", "    for entry in commands:\n", "        print(\" \", entry)\n", "    response = input(\"Proceed? [y/N] \").strip().lower()\n", "    if response not in {\"y\", \"yes\"}:\n", "        raise RuntimeError(\"Shell command execution rejected by user.\")\n", "\n", "\n", "class ShellExecutor:\n", "    \"\"\"\n", "    Shell executor for the notebook cookbook.\n", "\n", "    - Runs all commands inside `workspace_dir`\n", "    - Captures stdout/stderr\n", "    - Enforces an optional timeout from `action.timeout_ms`\n", "    - Returns a ShellResult with ShellCommandOutput entries using ShellCallOutcome\n", "    \"\"\"\n", "\n", "    def __init__(self, cwd: Path):\n", "        self.cwd = cwd\n", "\n", "    async def __call__(self, request: ShellCommandRequest) -> ShellResult:\n", "        action = request.data.action\n", "        await require_approval(action.commands)\n", "\n", "        outputs: list[ShellCommandOutput] = []\n", "\n", "        for command in action.commands:\n", "            proc = await asyncio.create_subprocess_shell(\n", "                command,\n", "                cwd=self.cwd,\n", "                env=os.environ.copy(),\n", "                stdout=asyncio.subprocess.PIPE,\n", "                stderr=asyncio.subprocess.PIPE,\n", "            )\n", "\n", "            timed_out = False\n", "            try:\n", "                timeout = (action.timeout_ms or 0) / 1000 or None\n", "                stdout_bytes, stderr_bytes = await asyncio.wait_for(\n", "                    proc.communicate(),\n", "                    timeout=timeout,\n", "                )\n", "            except asyncio.TimeoutError:\n", "                proc.kill()\n", "                stdout_bytes, stderr_bytes = await proc.communicate()\n", "                timed_out = True\n", "\n", "            stdout = stdout_bytes.decode(\"utf-8\", errors=\"ignore\")\n", "            stderr = stderr_bytes.decode(\"utf-8\", errors=\"ignore\")\n", "\n", "            # Use ShellCallOutcome instead of exit_code/status fields directly\n", "            outcome = ShellCallOutcome(\n", "                type=\"timeout\" if timed_out else \"exit\",\n", "                exit_code=getattr(proc, \"returncode\", None),\n", "            )\n", "\n", "            outputs.append(\n", "                ShellCommandOutput(\n", "                    command=command,\n", "                    stdout=stdout,\n", "                    stderr=stderr,\n", "                    outcome=outcome,\n", "                )\n", "            )\n", "\n", "            if timed_out:\n", "                # Stop running further commands if this one timed out\n", "                break\n", "\n", "        return 
ShellResult(\n", "            output=outputs,\n", "            provider_data={\"working_directory\": str(self.cwd)},\n", "        )\n", "\n", "\n", "shell_tool = ShellTool(executor=ShellExecutor(cwd=workspace_dir))" ] }, { "cell_type": "markdown", "id": "9c9b2a74", "metadata": {}, "source": [ "### Define the agent" ] }, { "cell_type": "code", "execution_count": 4, "id": "81ab508a", "metadata": {}, "outputs": [], "source": [ "# Define the agent's instructions\n", "INSTRUCTIONS = '''\n", "You are a coding assistant. The user will explain what they want to build, and your goal is to run commands to generate a new app.\n", "You can search the web to find which command you should use based on the technical stack, and use commands to create code files. \n", "You should also install necessary dependencies for the project to work. \n", "'''" ] }, { "cell_type": "code", "execution_count": 5, "id": "a1d804d9", "metadata": {}, "outputs": [], "source": [ "from agents import Agent, Runner, ShellTool, WebSearchTool\n", "\n", "coding_agent = Agent(\n", "    name=\"Coding Agent\",\n", "    model=\"gpt-5.1\",\n", "    instructions=INSTRUCTIONS,\n", "    tools=[\n", "        WebSearchTool(),\n", "        shell_tool\n", "    ]\n", ")" ] }, { "cell_type": "markdown", "id": "e56a68b9", "metadata": {}, "source": [ "## Start a new project\n", "\n", "Let’s send a prompt to our coding agent and then inspect the files it created in the `workspace_dir`.\n", "In this example, we'll create a Next.js dashboard using the [shadcn](https://ui.shadcn.com/) library.\n", "\n", "**Note:** Sometimes you might run into a `MaxTurnsExceeded` error, or the generated project might have a dependency error. If that happens, run the agent loop again. In a production environment, you would implement an external loop or user input handling to iterate if the project creation fails." 
] }, { "cell_type": "code", "execution_count": 6, "id": "79aaeecd", "metadata": {}, "outputs": [], "source": [ "prompt = \"Create a new NextJS app that shows dashboard-01 from https://ui.shadcn.com/blocks on the home page\"" ] }, { "cell_type": "code", "execution_count": 7, "id": "ce5880c5", "metadata": {}, "outputs": [], "source": [ "import asyncio\n", "from agents import ItemHelpers, RunConfig\n", "\n", "async def run_coding_agent_with_logs(prompt: str):\n", "    \"\"\"\n", "    Run the coding agent and stream logs about what's happening\n", "    \"\"\"\n", "    print(\"=== Run starting ===\")\n", "    print(f\"[user] {prompt}\\n\")\n", "\n", "    result = Runner.run_streamed(\n", "        coding_agent,\n", "        input=prompt\n", "    )\n", "\n", "    async for event in result.stream_events():\n", "\n", "        # High-level items: messages, tool calls, tool outputs, MCP, etc.\n", "        if event.type == \"run_item_stream_event\":\n", "            item = event.item\n", "\n", "            # 1) Tool calls (function tools, web_search, shell, MCP, etc.)\n", "            if item.type == \"tool_call_item\":\n", "                raw = item.raw_item\n", "                raw_type_name = type(raw).__name__\n", "\n", "                # Special-case the ones we care most about in this cookbook\n", "                if raw_type_name == \"ResponseFunctionWebSearch\":\n", "                    print(\"[tool] web_search_call – agent is calling web search\")\n", "                elif raw_type_name == \"LocalShellCall\":\n", "                    # LocalShellCall.action.commands is where the commands live\n", "                    commands = getattr(getattr(raw, \"action\", None), \"commands\", None)\n", "                    if commands:\n", "                        print(f\"[tool] shell – running commands: {commands}\")\n", "                    else:\n", "                        print(\"[tool] shell – running command\")\n", "                else:\n", "                    # Generic fallback for other tools (MCP, function tools, etc.)\n", "                    print(f\"[tool] {raw_type_name} called\")\n", "\n", "            # 2) Tool call outputs\n", "            elif item.type == \"tool_call_output_item\":\n", "                # item.output is whatever your tool returned (could be structured)\n", "                output_preview = str(item.output)\n", "                if len(output_preview) > 400:\n", "                    output_preview = output_preview[:400] + \"…\"\n", "                print(f\"[tool output] {output_preview}\")\n", "\n", "            # 3) Normal assistant messages\n", "            elif item.type == \"message_output_item\":\n", "                text = ItemHelpers.text_message_output(item)\n", "                print(f\"[assistant]\\n{text}\\n\")\n", "\n", "            # 4) Other event types (reasoning, MCP list tools, etc.) – ignore\n", "            else:\n", "                pass\n", "\n", "    print(\"=== Run complete ===\\n\")\n", "\n", "    # Once streaming is done, result.final_output contains the final answer\n", "    print(\"Final answer:\\n\")\n", "    print(result.final_output)\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "4efc56ee", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Run starting ===\n", "[user] Create a new NextJS app that shows dashboard-01 from https://ui.shadcn.com/blocks on the home page\n", "\n", "Shell command approval required:\n", " npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", " cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", " cd shadcn-dashboard && npx shadcn-ui@latest init -y\n", "Proceed? 
[y/N] y\n", "[tool] ResponseOutputMessage called\n", "[tool output] $ npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", "\u001b[?25l\u001b[2K\u001b[1G\u001b[36m?\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\n", "\n", "$ cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", "stderr:\n", "/bin/sh: line 0: cd: shadcn-dashboard…\n", "Shell command approval required:\n", " yes \"No\" | npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", " cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", " cd shadcn-dashboard && npx shadcn@latest init -y\n", "Proceed? [y/N] y\n", "[tool] ResponseOutputMessage called\n", "[tool output] $ yes \"No\" | npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", "\u001b[?25l\u001b[2K\u001b[1G\u001b[36m?\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\u0007\u0007\u001b[2K\u001b[1G\u001b[2K\u001b[1G\u001b[32m✔\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m…\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\n", "\u001b[?2…\n", "Shell command approval required:\n", " cd shadcn-dashboard && yes \"\" | npx shadcn@latest init\n", " cd shadcn-dashboard && npx shadcn@latest add button card dropdown-menu input label progress select separator sheet sidebar skeleton tabs avatar\n", "Proceed? 
[y/N] y\n", "[tool] ResponseOutputMessage called\n", "[tool output] $ cd shadcn-dashboard && yes \"\" | npx shadcn@latest init\n", "\u001b[?25l\u001b[36m?\u001b[39m \u001b[1mWhich color would you like to use as the \u001b[36mbase color\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[90m- Use arrow-keys. Return to submit.\u001b[39m\n", "\u001b[36m❯\u001b[39m \u001b[36m\u001b[4mNeutral\u001b[39m\u001b[24m\u001b[90m\u001b[39m\n", " Gray\u001b[90m\u001b[39m\n", " Zinc\u001b[90m\u001b[39m\n", " Stone\u001b[90m\u001b[39m\n", " Slate\u001b[90m\u001b[39m\n", "\u001b[2K\u001b[1G\u001b[32m✔\u001b[39m \u001b[1mWhich color would you like to use as the \u001b…\n", "Shell command approval required:\n", " cd shadcn-dashboard && ls\n", " cd shadcn-dashboard && sed -n '1,200p' src/app/page.tsx\n", " cd shadcn-dashboard && sed -n '1,260p' src/app/layout.tsx\n", "Proceed? [y/N] y\n", "[tool] ResponseOutputMessage called\n", "[tool output] $ cd shadcn-dashboard && ls\n", "components.json\n", "eslint.config.mjs\n", "next-env.d.ts\n", "next.config.ts\n", "\u001b[1m\u001b[36mnode_modules\u001b[m\u001b[m\n", "package-lock.json\n", "package.json\n", "postcss.config.mjs\n", "\u001b[1m\u001b[36mpublic\u001b[m\u001b[m\n", "README.md\n", "\u001b[1m\u001b[36msrc\u001b[m\u001b[m\n", "tsconfig.json\n", "\n", "$ cd shadcn-dashboard && sed -n '1,200p' src/app/page.tsx\n", "import Image from \"next/image\";\n", "\n", "export default function Home() {\n", " return (\n", "