{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extract structured data from images\n", "\n", "Use AI vision to extract JSON data from receipts, forms, documents, and other images." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "You have images containing structured information (receipts, forms, ID cards) and need to extract specific fields as JSON for downstream processing.\n", "\n", "| Image | Fields to extract |\n", "|-------|------------------|\n", "| receipt.jpg | vendor, total, date, items |\n", "| business_card.jpg | name, email, phone, company |\n", "| invoice.pdf | invoice_number, amount, due_date |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "**What's in this recipe:**\n", "\n", "- Extract structured JSON from images using GPT-4o-mini\n", "- Use `openai.vision()`, which accepts images directly\n", "- Access individual fields from the extracted data\n", "\n", "You use Pixeltable's `openai.vision()` function, which handles image encoding automatically, and you request JSON output by passing `response_format` in `model_kwargs`."
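, "\n", "\n", "Because `openai.vision()` returns the model's reply as a string, you parse it before working with individual fields. The minimal plain-Python sketch below assumes a hypothetical receipt schema. `RECEIPT_FIELDS`, `parse_fields`, and the sample response are illustrative and not part of Pixeltable. The function keeps only the fields you asked for and fills any the model omitted with `None`.\n", "\n", "```python\n", "import json\n", "\n", "# Hypothetical receipt schema (an assumption): use the fields your prompt requests.\n", "RECEIPT_FIELDS = ['vendor', 'total', 'date', 'items']\n", "\n", "\n", "def parse_fields(raw: str, fields: list[str]) -> dict:\n", "    # Parse the JSON text returned by the model, then keep only the expected keys;\n", "    # any field the model left out comes back as None, and extra keys are dropped.\n", "    data = json.loads(raw)\n", "    return {name: data.get(name) for name in fields}\n", "\n", "\n", "# Made-up model response for illustration ('cashier' is an extra, unrequested key).\n", "sample = json.dumps({'vendor': 'Acme Market', 'total': 12.5, 'cashier': 'Sam'})\n", "print(parse_fields(sample, RECEIPT_FIELDS))\n", "# {'vendor': 'Acme Market', 'total': 12.5, 'date': None, 'items': None}\n", "```\n", "\n", "Normalizing the keys up front means downstream code can rely on every field being present. Inside a table, you can wrap the same logic in a `@pxt.udf`, as the recipe does later with `parse_description`."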
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable openai\n", "\n", "import getpass\n", "import os\n", "\n", "if 'OPENAI_API_KEY' not in os.environ:\n", " os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions import openai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load images" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata\n", "Created directory 'extraction_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a fresh directory\n", "pxt.drop_dir('extraction_demo', force=True)\n", "pxt.create_dir('extraction_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'images'.\n" ] } ], "source": [ "t = pxt.create_table('extraction_demo/images', {'image': pxt.Image})" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `images`: 2 rows [00:00, 365.50 rows/s]\n", "Inserted 2 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows inserted, 4 values computed." 
] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample images\n", "t.insert(\n", " [\n", " {\n", " 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'\n", " },\n", " {\n", " 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'\n", " },\n", " ]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract structured data\n", "\n", "The sample images are everyday photos rather than receipts, so this prompt asks for descriptive fields. Use `openai.vision()` to analyze each image and return JSON. For receipts or forms, list those fields in the prompt instead:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 2 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows updated, 4 values computed." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add extraction column using openai.vision (handles images directly)\n", "PROMPT = \"\"\"Analyze this image and extract the following as JSON:\n", "- description: A brief description of the image\n", "- objects: List of objects visible in the image\n", "- dominant_colors: List of dominant colors\n", "- scene_type: Type of scene (indoor, outdoor, etc.)\"\"\"\n", "\n", "t.add_computed_column(\n", " data=openai.vision(\n", " prompt=PROMPT,\n", " image=t.image,\n", " model='gpt-4o-mini',\n", " model_kwargs={'response_format': {'type': 'json_object'}},\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
imagedata
\n", " \n", "
{\n", " "description": "A cheerful young woman holding a pink umbrella while wearing a colorful swimsuit, standing by a body of water.",\n", " "objects": [\n", " "woman",\n", " "umbrella",\n", " "swimsuit",\n", " "water",\n", " "lifeguard stand"\n", " ],\n", " "dominant_colors": [\n", " "pink",\n", " "blue",\n", " "green",\n", " "white"\n", " ],\n", " "scene_type": "outdoor"\n", "}
\n", " \n", "
{\n", " "description": "A peaceful rural landscape featuring a cow grazing on lush green grass near a tree.",\n", " "objects": [\n", " "cow",\n", " "tree",\n", " "grass",\n", " "house",\n", " "sky"\n", " ],\n", " "dominant_colors": [\n", " "green",\n", " "blue",\n", " "gray",\n", " "brown",\n", " "white"\n", " ],\n", " "scene_type": "outdoor"\n", "}
" ], "text/plain": [ " image \\\n", "0 \n", " \n", " \n", " image\n", " description\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", " A cheerful young woman holding a pink umbrella while wearing a colorful swimsuit, standing by a body of water.\n", " \n", " \n", "
\n", " \n", "
\n", " A peaceful rural landscape featuring a cow grazing on lush green grass near a tree.\n", " \n", " \n", "" ], "text/plain": [ " image \\\n", "0 str:\n", " return json.loads(data).get('description', '')\n", "\n", "\n", "t.select(t.image, description=parse_description(t.data)).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**Why use `openai.vision()`:**\n", "\n", "- Handles PIL Images directly (no manual base64 encoding or URL storage)\n", "- Simpler API than constructing chat messages manually\n", "- Returns the response content as a plain string\n", "\n", "**Getting JSON output:**\n", "\n", "Pass `model_kwargs={'response_format': {'type': 'json_object'}}` to get a JSON string. JSON mode guarantees syntactically valid JSON, not a particular schema, so name each field you want in the prompt. OpenAI also rejects JSON-mode requests whose messages never mention JSON, which is why `PROMPT` explicitly asks for JSON.\n", "\n", "**Other extraction use cases:**\n", "\n", "| Use case | Fields to extract |\n", "|----------|------------------|\n", "| Receipts | vendor, total, date, items, tax |\n", "| Business cards | name, title, company, email, phone |\n", "| Product photos | brand, model, condition, defects |\n", "| Documents | title, date, author, summary |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Analyze images in batch](https://docs.pixeltable.com/howto/cookbooks/images/vision-batch-analysis)\n", "- [Configure API keys](https://docs.pixeltable.com/howto/cookbooks/core/workflow-api-keys)" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }