{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Export data to SQL databases\n", "\n", "Send your Pixeltable data to PostgreSQL, SQLite, MySQL, TigerData, or Snowflake for use in external applications.\n", "\n", "**What's in this recipe:**\n", "\n", "- Export entire tables or filtered queries to any SQL database\n", "- Select specific columns for export\n", "- Handle existing tables with replace or append options\n", "- Connect to cloud PostgreSQL services (e.g. TigerData)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "You have processed data in your pipeline—cleaned text, generated embeddings, extracted metadata—and need to send it to a SQL database for use by other applications or teams.\n", "\n", "| Source | Destination | Use case |\n", "|--------|-------------|----------|\n", "| Product catalog | PostgreSQL | Web application backend |\n", "| ML predictions | SQLite | Local analytics |\n", "| Extracted entities | MySQL | Business intelligence |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "You use `export_sql()` to export tables or queries to any SQL database via database connection strings. The function automatically maps Pixeltable types to appropriate SQL types for each database dialect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable snowflake-sqlalchemy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pixeltable as pxt\n", "import tempfile\n", "from pathlib import Path\n", "from pixeltable.io.sql import export_sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create sample data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created directory 'sql_export_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a fresh directory\n", "pxt.drop_dir('sql_export_demo', force=True)\n", "pxt.create_dir('sql_export_demo')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'products'.\n" ] } ], "source": [ "# Create a table with product data\n", "products = pxt.create_table(\n", " 'sql_export_demo/products',\n", " {\n", " 'name': pxt.String,\n", " 'price': pxt.Float,\n", " 'in_stock': pxt.Bool,\n", " 'metadata': pxt.Json,\n", " },\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserted 5 rows with 0 errors in 0.01 s (566.35 rows/s)\n" ] }, { "data": { "text/plain": [ "5 rows inserted." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample products\n", "products.insert(\n", " [\n", " {\n", " 'name': 'Wireless Mouse',\n", " 'price': 29.99,\n", " 'in_stock': True,\n", " 'metadata': {'category': 'electronics', 'rating': 4.5},\n", " },\n", " {\n", " 'name': 'USB-C Hub',\n", " 'price': 49.99,\n", " 'in_stock': False,\n", " 'metadata': {'category': 'electronics', 'rating': 4.2},\n", " },\n", " {\n", " 'name': 'Mechanical Keyboard',\n", " 'price': 89.99,\n", " 'in_stock': True,\n", " 'metadata': {'category': 'electronics', 'rating': 4.8},\n", " },\n", " {\n", " 'name': 'Monitor Stand',\n", " 'price': 39.99,\n", " 'in_stock': True,\n", " 'metadata': {'category': 'accessories', 'rating': 4.0},\n", " },\n", " {\n", " 'name': 'Webcam',\n", " 'price': 59.99,\n", " 'in_stock': False,\n", " 'metadata': {'category': 'electronics', 'rating': 3.9},\n", " },\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepricein_stockmetadata
Wireless Mouse29.99True{"rating": 4.5, "category": "electronics"}
USB-C Hub49.99False{"rating": 4.2, "category": "electronics"}
Mechanical Keyboard89.99True{"rating": 4.8, "category": "electronics"}
Monitor Stand39.99True{"rating": 4., "category": "accessories"}
Webcam59.99False{"rating": 3.9, "category": "electronics"}
" ], "text/plain": [ " name price in_stock \\\n", "0 Wireless Mouse 29.99 True \n", "1 USB-C Hub 49.99 False \n", "2 Mechanical Keyboard 89.99 True \n", "3 Monitor Stand 39.99 True \n", "4 Webcam 59.99 False \n", "\n", " metadata \n", "0 {'rating': 4.5, 'category': 'electronics'} \n", "1 {'rating': 4.2, 'category': 'electronics'} \n", "2 {'rating': 4.8, 'category': 'electronics'} \n", "3 {'rating': 4.0, 'category': 'accessories'} \n", "4 {'rating': 3.9, 'category': 'electronics'} " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View the data\n", "products.collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export an entire table\n", "\n", "You pass a table and a SQLAlchemy connection string to export all rows and columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a SQLite database for this demo\n", "db_path = Path(tempfile.mkdtemp()) / 'products.db'\n", "connection_string = f'sqlite:///{db_path}'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export the full table\n", "export_sql(products, 'products', db_connect_str=connection_string)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Wireless Mouse', 29.99, 1, '{\"rating\": 4.5, \"category\": \"electronics\"}'),\n", " ('USB-C Hub', 49.99, 0, '{\"rating\": 4.2, \"category\": \"electronics\"}'),\n", " ('Mechanical Keyboard', 89.99, 1, '{\"rating\": 4.8, \"category\": \"electronics\"}'),\n", " ('Monitor Stand', 39.99, 1, '{\"rating\": 4.0, \"category\": \"accessories\"}'),\n", " ('Webcam', 59.99, 0, '{\"rating\": 3.9, \"category\": \"electronics\"}')]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verify the export with SQLAlchemy\n", "import sqlalchemy as sql\n", "\n", "engine = sql.create_engine(connection_string)\n", "with engine.connect() as conn:\n", " result = conn.execute(sql.text('SELECT * FROM products')).fetchall()\n", "\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export a filtered query\n", "\n", "You can export any query result—filter rows, select specific columns, or apply transformations before export." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export only in-stock products\n", "export_sql(\n", " products.where(products.in_stock == True),\n", " 'in_stock_products',\n", " db_connect_str=connection_string,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Wireless Mouse', 29.99),\n", " ('Mechanical Keyboard', 89.99),\n", " ('Monitor Stand', 39.99)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verify filtered export\n", "with engine.connect() as conn:\n", " result = conn.execute(\n", " sql.text('SELECT name, price FROM in_stock_products')\n", " ).fetchall()\n", "\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export specific columns\n", "\n", "You select only the columns you need before exporting. You can also rename columns in the output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export only name and price columns\n", "export_sql(\n", " products.select(products.name, products.price),\n", " 'price_list',\n", " db_connect_str=connection_string,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export with renamed columns\n", "export_sql(\n", " products.select(\n", " product_name=products.name, unit_price=products.price\n", " ),\n", " 'renamed_columns',\n", " db_connect_str=connection_string,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['name', 'price']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verify column selection\n", "inspector = sql.inspect(engine)\n", "columns = [col['name'] for col in inspector.get_columns('price_list')]\n", "columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Handle existing tables\n", "\n", "You control what happens when the target table already exists using the `if_exists` parameter:\n", "\n", "| Option | Behavior |\n", "|--------|----------|\n", "| `'error'` | Raise an error (default) |\n", "| `'replace'` | Drop the existing table and create a new one |\n", "| `'insert'` | Append new rows to the existing table |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Append new data to existing table\n", "export_sql(\n", " products.where(products.price > 50),\n", " 'products',\n", " db_connect_str=connection_string,\n", " if_exists='insert',\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Total rows after insert: 7'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check row count after insert\n", "with engine.connect() as conn:\n", " result = conn.execute(\n", " sql.text('SELECT COUNT(*) FROM products')\n", " ).fetchone()\n", "\n", "f'Total rows after insert: {result[0]}'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replace with fresh data\n", "export_sql(\n", " products.select(products.name, products.price),\n", " 'products',\n", " db_connect_str=connection_string,\n", " if_exists='replace',\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"Columns: ['name', 'price'], Row count: 5\"" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check that table was replaced\n", "inspector = sql.inspect(engine)\n", "columns = [col['name'] for col in inspector.get_columns('products')]\n", "\n", "with engine.connect() as conn:\n", " row_count = conn.execute(\n", " sql.text('SELECT COUNT(*) FROM products')\n", " ).fetchone()[0]\n", "\n", "f'Columns: {columns}, Row count: {row_count}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export to cloud PostgreSQL (TigerData)\n", "\n", "You can export directly to cloud-hosted PostgreSQL databases like [TigerData](https://www.timescale.com/cloud) (Timescale Cloud). Get your credentials from the TigerData dashboard after creating a service." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "# Skip interactive sections in CI environments\n", "SKIP_CLOUD_TESTS = os.environ.get('CI') or os.environ.get(\n", " 'GITHUB_ACTIONS'\n", ")\n", "\n", "if not SKIP_CLOUD_TESTS:\n", " # Enter your TigerData credentials interactively\n", " tigerdata_host = input(\n", " 'TigerData host (e.g., abc123.tsdb.cloud.timescale.com): '\n", " )\n", " tigerdata_port = input('TigerData port (e.g., 38963): ')\n", " tigerdata_user = input('TigerData username (e.g., tsdbadmin): ')\n", " tigerdata_password = getpass.getpass('TigerData password: ')\n", " tigerdata_dbname = input('TigerData database name (e.g., tsdb): ')\n", "\n", " # Build the connection string (use postgresql+psycopg:// for SQLAlchemy compatibility)\n", " tigerdata_connection = f'postgresql+psycopg://{tigerdata_user}:{tigerdata_password}@{tigerdata_host}:{tigerdata_port}/{tigerdata_dbname}?sslmode=require'\n", "else:\n", " print('Skipping TigerData section (running in CI)')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not SKIP_CLOUD_TESTS:\n", " # Export to TigerData\n", " export_sql(\n", " products,\n", " 'pixeltable_products',\n", " db_connect_str=tigerdata_connection,\n", " if_exists='replace',\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Wireless Mouse', 29.99, True, {'rating': 4.5, 'category': 'electronics'}),\n", " ('USB-C Hub', 49.99, False, {'rating': 4.2, 'category': 'electronics'}),\n", " ('Mechanical Keyboard', 89.99, True, {'rating': 4.8, 'category': 'electronics'}),\n", " ('Monitor Stand', 39.99, True, {'rating': 4.0, 'category': 'accessories'}),\n", " ('Webcam', 59.99, False, {'rating': 3.9, 'category': 'electronics'})]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if not SKIP_CLOUD_TESTS:\n", " # Verify the export in TigerData\n", " tigerdata_engine = sql.create_engine(tigerdata_connection)\n", " with tigerdata_engine.connect() as conn:\n", " result = conn.execute(\n", " sql.text('SELECT * FROM pixeltable_products')\n", " ).fetchall()\n", " result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export to Snowflake\n", "\n", "You can export directly to [Snowflake](https://www.snowflake.com/) data warehouses. Get your account identifier from the Snowflake web interface under **Admin → Accounts**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not SKIP_CLOUD_TESTS:\n", " # Enter your Snowflake credentials interactively\n", " snowflake_account = input(\n", " 'Snowflake account identifier (e.g., WEZMMGC-AIB20064): '\n", " )\n", " snowflake_user = input('Snowflake username: ')\n", " snowflake_password = getpass.getpass('Snowflake password: ')\n", " snowflake_warehouse = input(\n", " 'Snowflake warehouse (e.g., COMPUTE_WH): '\n", " )\n", " snowflake_database = input('Snowflake database: ')\n", " snowflake_schema = input('Snowflake schema (e.g., PUBLIC): ')\n", "\n", " # Build the connection string\n", " snowflake_connection = f'snowflake://{snowflake_user}:{snowflake_password}@{snowflake_account}/{snowflake_database}/{snowflake_schema}?warehouse={snowflake_warehouse}'\n", "else:\n", " print('Skipping Snowflake section (running in CI)')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not SKIP_CLOUD_TESTS:\n", " # Export to Snowflake (without JSON column)\n", " export_sql(\n", " products.select(products.name, products.price, products.in_stock),\n", " 'PIXELTABLE_PRODUCTS',\n", " db_connect_str=snowflake_connection,\n", " if_exists='replace',\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Wireless Mouse', 29.99, True, None),\n", " ('USB-C Hub', 49.99, False, None),\n", " ('Mechanical Keyboard', 89.99, True, None),\n", " ('Monitor Stand', 39.99, True, None),\n", " ('Webcam', 59.99, False, None)]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if not SKIP_CLOUD_TESTS:\n", " # Verify the export in Snowflake\n", " snowflake_engine = sql.create_engine(snowflake_connection)\n", " with snowflake_engine.connect() as conn:\n", " result = conn.execute(\n", " sql.text('SELECT * FROM PIXELTABLE_PRODUCTS')\n", " ).fetchall()\n", " result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting media data\n", "\n", "For tables containing media types (`pxt.Image`, `pxt.Video`, `pxt.Audio`), you have two options:\n", "\n", "1. **Extract metadata before export** - Select only the columns you need (paths, embeddings, extracted text, etc.) and export those to SQL.\n", "\n", "1. **Use Pixeltable destinations** - For syncing media files to cloud storage, use Pixeltable's built-in destination support with providers like [Tigris](https://docs.pixeltable.com/howto/providers/working-with-tigris).\n", "\n", "**Example: Export image metadata to SQL**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'images'.\n", "Added 0 column values with 0 errors in 0.01 s\n", "Added 0 column values with 0 errors in 0.01 s\n", "Added 0 column values with 0 errors in 0.01 s\n" ] }, { "data": { "text/plain": [ "No rows affected." ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a table with images\n", "images = pxt.create_table(\n", " 'sql_export_demo/images', {'image': pxt.Image, 'label': pxt.String}\n", ")\n", "\n", "# Add computed columns for metadata\n", "images.add_computed_column(width=images.image.width)\n", "images.add_computed_column(height=images.image.height)\n", "images.add_computed_column(mode=images.image.mode)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserted 2 rows with 0 errors in 0.03 s (63.85 rows/s)\n" ] }, { "data": { "text/plain": [ "2 rows inserted." ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample images\n", "base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'\n", "images.insert(\n", " [\n", " {'image': f'{base_url}/000000000036.jpg', 'label': 'cat'},\n", " {'image': f'{base_url}/000000000090.jpg', 'label': 'scene'},\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export metadata (not the image itself) to SQL\n", "export_sql(\n", " images.select(images.label, images.width, images.height, images.mode),\n", " 'image_metadata',\n", " db_connect_str=connection_string, # or tigerdata_connection for cloud\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('cat', 481, 640, 'RGB'), ('scene', 640, 429, 'RGB')]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verify the metadata export\n", "with engine.connect() as conn:\n", " result = conn.execute(\n", " sql.text('SELECT * FROM image_metadata')\n", " ).fetchall()\n", "\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**Connection strings:**\n", "\n", "The function uses SQLAlchemy connection strings. Common formats:\n", "\n", "| Database | Connection string format |\n", "|----------|-------------------------|\n", "| SQLite | `sqlite:///path/to/db.sqlite` |\n", "| PostgreSQL | `postgresql+psycopg://user:pass@host:port/dbname` |\n", "| TigerData | `postgresql+psycopg://user:pass@host:port/dbname?sslmode=require` |\n", "| MySQL | `mysql://user:pass@host:port/dbname` |\n", "| Snowflake | `snowflake://user:pass@account/database/schema?warehouse=WH` |\n", "\n", "**Type mapping:**\n", "\n", "Pixeltable types map to SQL types automatically:\n", "\n", "| Pixeltable type | SQL type |\n", "|-----------------|----------|\n", "| `pxt.Int` | INTEGER |\n", "| `pxt.Float` | FLOAT |\n", "| `pxt.String` | VARCHAR |\n", "| `pxt.Bool` | BOOLEAN |\n", "| `pxt.Json` | JSON (JSONB for PostgreSQL) |\n", "| `pxt.Timestamp` | TIMESTAMP |\n", "| `pxt.Date` | DATE |\n", "| `pxt.UUID` | UUID |\n", "| `pxt.Binary` | BLOB/BYTEA |\n", "\n", "**Unsupported types:**\n", "\n", "Media types like `pxt.Image`, `pxt.Video`, and `pxt.Audio` cannot be exported directly. Extract the data you need (paths, embeddings, metadata) before export." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Working with Tigris](https://docs.pixeltable.com/howto/providers/working-with-tigris) - Sync media files to cloud storage\n", "- [Cloud Storage Integration](https://docs.pixeltable.com/integrations/cloud-storage) - S3, GCS, and Azure Blob storage\n", "- [Export to PyTorch](./data-export-pytorch) - Export for ML training" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }