{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extract fields from LLM JSON responses\n", "\n", "Parse and access specific fields from structured JSON responses returned by language models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "LLM APIs return nested JSON responses with metadata you don't need. You want to extract just the text content or specific fields for downstream processing.\n", "\n", "```json\n", "{\n", " \"id\": \"chatcmpl-123\",\n", " \"choices\": [{\n", " \"message\": {\n", " \"content\": \"This is the actual response text\" // ← You want this\n", " }\n", " }],\n", " \"usage\": {\"tokens\": 50}\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "**What's in this recipe:**\n", "\n", "- Extract text content from chat completions\n", "- Access nested JSON fields\n", "- Create separate columns for different fields\n", "\n", "You use JSON path notation to extract specific fields from API responses and store them in computed columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable openai\n", "\n", "import getpass\n", "import os\n", "\n", "if 'OPENAI_API_KEY' not in os.environ:\n", " os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions import openai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create prompts table" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata\n", "Created directory 'json_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a fresh directory\n", "pxt.drop_dir('json_demo', force=True)\n", "pxt.create_dir('json_demo')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'prompts'.\n" ] } ], "source": [ "t = pxt.create_table('json_demo/prompts', {'prompt': pxt.String})" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `prompts`: 2 rows [00:00, 325.83 rows/s]\n", "Inserted 2 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows inserted, 2 values computed." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.insert(\n", " [\n", " {'prompt': 'What is the capital of France?'},\n", " {'prompt': 'Write a haiku about coding'},\n", " ]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get LLM responses" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 2 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows updated, 2 values computed." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add computed column for API response (returns full JSON)\n", "t.add_computed_column(\n", " response=openai.chat_completions(\n", " messages=[{'role': 'user', 'content': t.prompt}],\n", " model='gpt-4o-mini',\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract specific fields\n", "\n", "Use dot notation to access nested JSON fields:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 2 column values with 0 errors.\n", "Added 2 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows updated, 2 values computed." ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract just the text content\n", "t.add_computed_column(text=t.response.choices[0].message.content)\n", "\n", "# Extract token usage\n", "t.add_computed_column(tokens=t.response.usage.total_tokens)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
prompttexttokens
What is the capital of France?The capital of France is Paris.21
Write a haiku about codingFingers dance on keys, \n", "Logic flows like whispered winds, \n", "Worlds born line by line.34
" ], "text/plain": [ " prompt \\\n", "0 What is the capital of France? \n", "1 Write a haiku about coding \n", "\n", " text tokens \n", "0 The capital of France is Paris. 21 \n", "1 Fingers dance on keys, \\nLogic flows like whi... 34 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View clean results\n", "t.select(t.prompt, t.text, t.tokens).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**Common extraction patterns:**\n", "\n", "| API | Text content path |\n", "|-----|-------------------|\n", "| OpenAI | `response.choices[0].message.content` |\n", "| Anthropic | `response.content[0].text` |\n", "| OpenAI Whisper | `response.text` |\n", "\n", "**Accessing JSON fields:**\n", "\n", "- Use dot notation for object properties: `response.usage`\n", "- Use brackets for array elements: `choices[0]`\n", "- Chain them: `response.choices[0].message.content`\n", "\n", "**Extracted columns are computed:**\n", "\n", "Changes to the source data automatically update all extracted fields." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Configure API keys](https://docs.pixeltable.com/howto/cookbooks/core/workflow-api-keys)\n", "- [Extract structured data from images](https://docs.pixeltable.com/howto/cookbooks/images/vision-structured-output)" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }