{ "cells": [ { "cell_type": "markdown", "id": "c6580207-e8eb-4730-9768-ff6a3b2f6004", "metadata": {}, "source": [ "# Computed Columns\n", "\n", "This guide introduces one of Pixeltable's most essential and powerful concepts: computed columns. You'll learn how to:\n", "\n", "- Add computed columns to a table\n", "- Use computed columns for complex operations such as image processing and model inference\n", "\n", "## Prerequisites\n", "\n", "This guide assumes you're familiar with:\n", "\n", "- Creating and managing tables\n", "- Inserting and querying data\n", "- Basic table operations\n", "\n", "If you're new to Pixeltable, start with the [Tables and Data Operations](https://docs.pixeltable.com/tutorials/tables-and-data-operations) guide.\n", "\n", "
\n", "If you are running this guide in Colab:\n", "In order to make it run a bit snappier, let's switch to a GPU-equipped instance for this Colab session. To do that, click on the Runtime -> Change runtime type menu item at the top, then select the GPU radio button and click on Save.\n", "
\n", "\n", "First, let's ensure the Pixeltable library is installed in your environment, along with the Huggingface `transformers` library." ] }, { "cell_type": "code", "execution_count": null, "id": "3b801006-8738-485d-97ca-58d698e02f7f", "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable torch transformers" ] }, { "cell_type": "markdown", "id": "50e33d13-f9ee-46de-b1b9-ef263c5d2f23", "metadata": {}, "source": [ "### Computed Columns" ] }, { "cell_type": "markdown", "id": "c0c8fa1b-9037-4f0f-8992-05eeada6e9fb", "metadata": {}, "source": [ "Let's start with a simple example that illustrates the basic concepts behind computed columns. We'll use a table of world population data for our example. Remember that you can import datasets into a Pixeltable table by using `pxt.create_table()` with the `source` parameter." ] }, { "cell_type": "code", "execution_count": 1, "id": "9ce15344-3990-4f43-9dc9-df3874cbb647", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata\n", "Created directory 'fundamentals'.\n", "Created table 'population'.\n", "Inserting rows into `population`: 234 rows [00:00, 6850.71 rows/s]\n", "Inserted 234 rows with 0 errors.\n" ] } ], "source": [ "import pixeltable as pxt\n", "\n", "pxt.create_dir('fundamentals', if_exists='ignore')\n", "pop_t = pxt.create_table(\n", " 'fundamentals/population',\n", " source='https://github.com/pixeltable/pixeltable/raw/release/docs/resources/world-population-data.csv',\n", " if_exists='replace',\n", ")" ] }, { "cell_type": "markdown", "id": "e1ceaafb-0173-4c98-aa1c-e5e4d1c31299", "metadata": {}, "source": [ "Also recall that `pop_t.head()` returns the first few rows of a table, and typing the table name `pop_t` by itself gives the schema." ] }, { "cell_type": "code", "execution_count": 2, "id": "f398a3d0-21e3-4b7f-850a-dfb2dd642c9c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cca3countrycontinentpop_2023pop_2022pop_2000area__km__
INDIndiaAsia1428627663141717317310596336753287590.
CHNChinaAsia1425671352142588733712640990699706961.
USAUnited StatesNorth America3399965633382898572823985549372610.
IDNIndonesiaAsia2775341222755013392140724211904569.
PAKPakistanAsia240485658235824862154369924881912.
" ], "text/plain": [ " cca3 country continent pop_2023 pop_2022 pop_2000 \\\n", "0 IND India Asia 1428627663 1417173173 1059633675 \n", "1 CHN China Asia 1425671352 1425887337 1264099069 \n", "2 USA United States North America 339996563 338289857 282398554 \n", "3 IDN Indonesia Asia 277534122 275501339 214072421 \n", "4 PAK Pakistan Asia 240485658 235824862 154369924 \n", "\n", " area__km__ \n", "0 3287590.0 \n", "1 9706961.0 \n", "2 9372610.0 \n", "3 1904569.0 \n", "4 881912.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.head(5)" ] }, { "cell_type": "code", "execution_count": 3, "id": "8e21cc05-cdbf-4e0c-be5d-baa723f39bb2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'fundamentals/population'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
cca3String
countryString
continentString
pop_2023Int
pop_2022Int
pop_2000Int
area__km__Float
\n" ], "text/plain": [ "table 'fundamentals/population'\n", "\n", " Column Name Type Computed With\n", " cca3 String \n", " country String \n", " continent String \n", " pop_2023 Int \n", " pop_2022 Int \n", " pop_2000 Int \n", " area__km__ Float " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t" ] }, { "cell_type": "markdown", "id": "512a5b14-998a-458f-8cd1-a8cec212c1b7", "metadata": {}, "source": [ "Now let's suppose we want to add a new column for the year-over-year population change from 2022 to 2023. You can `select()` such a quantity into a Pixeltable `Query`, giving it the name `yoy_change` (year-over-year change):" ] }, { "cell_type": "code", "execution_count": 4, "id": "cb90fa27-d2dc-4b5f-8e8c-3ab554a831cf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyoy_change
India11454490
China-215985
United States1706706
Indonesia2032783
Pakistan4660796
" ], "text/plain": [ " country yoy_change\n", "0 India 11454490\n", "1 China -215985\n", "2 United States 1706706\n", "3 Indonesia 2032783\n", "4 Pakistan 4660796" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.select(\n", " pop_t.country, yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)\n", ").head(5)" ] }, { "cell_type": "markdown", "id": "9fb3220e-7577-4aa8-a51a-6565fde17689", "metadata": {}, "source": [ "A __computed column__ is a way of turning such a selection into a new, permanent column of the table. Here's how it works:" ] }, { "cell_type": "code", "execution_count": 5, "id": "7e7b479e-221a-48ea-b334-e955645e1072", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 234 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "234 rows updated, 468 values computed." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.add_computed_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))" ] }, { "cell_type": "markdown", "id": "6e6ea02d-56e5-4407-a6fb-8ebf96e1dd61", "metadata": {}, "source": [ "As soon as the column is added, Pixeltable will (by default) automatically compute its value for all rows in the table, storing the results in the new column. If we now inspect the schema of `pop_t`, we see the new column and its definition." ] }, { "cell_type": "code", "execution_count": 6, "id": "ec010d9b-cd9e-4040-93ef-8e008fce53e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'fundamentals/population'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
cca3String
countryString
continentString
pop_2023Int
pop_2022Int
pop_2000Int
area__km__Float
yoy_changeIntpop_2023 - pop_2022
\n" ], "text/plain": [ "table 'fundamentals/population'\n", "\n", " Column Name Type Computed With\n", " cca3 String \n", " country String \n", " continent String \n", " pop_2023 Int \n", " pop_2022 Int \n", " pop_2000 Int \n", " area__km__ Float \n", " yoy_change Int pop_2023 - pop_2022" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t" ] }, { "cell_type": "markdown", "id": "5b70ee06-e946-402d-a4e9-357007ce90c5", "metadata": {}, "source": [ "The new column can be queried in the usual manner." ] }, { "cell_type": "code", "execution_count": 7, "id": "26882591-8101-4f10-bf2e-c5f0a1220d28", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyoy_change
India11454490
China-215985
United States1706706
Indonesia2032783
Pakistan4660796
" ], "text/plain": [ " country yoy_change\n", "0 India 11454490\n", "1 China -215985\n", "2 United States 1706706\n", "3 Indonesia 2032783\n", "4 Pakistan 4660796" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.select(pop_t.country, pop_t.yoy_change).head(5)" ] }, { "cell_type": "markdown", "id": "457ebd24-c2ea-46fd-aac9-d59457f75944", "metadata": {}, "source": [ "The output is identical to the previous example, but now we're retrieving the computed output from the database, instead of computing it on-the-fly.\n", "\n", "Computed columns can be \"chained\" with other computed columns. Here's an example that expresses population change as a percentage:" ] }, { "cell_type": "code", "execution_count": 8, "id": "671940f0-19e5-4e1f-8b60-32bf49e8e17c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 234 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "234 rows updated, 468 values computed." ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.add_computed_column(\n", " yoy_percent_change=(100 * pop_t.yoy_change / pop_t.pop_2022)\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "id": "dfc3aa93-3338-4c5c-9097-76916af56768", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'fundamentals/population'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
cca3String
countryString
continentString
pop_2023Int
pop_2022Int
pop_2000Int
area__km__Float
yoy_changeIntpop_2023 - pop_2022
yoy_percent_changeFloat(100 * yoy_change) / pop_2022
\n" ], "text/plain": [ "table 'fundamentals/population'\n", "\n", " Column Name Type Computed With\n", " cca3 String \n", " country String \n", " continent String \n", " pop_2023 Int \n", " pop_2022 Int \n", " pop_2000 Int \n", " area__km__ Float \n", " yoy_change Int pop_2023 - pop_2022\n", " yoy_percent_change Float (100 * yoy_change) / pop_2022" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t" ] }, { "cell_type": "code", "execution_count": 10, "id": "ed03276d-c987-4222-bbca-76a6be06a51c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyoy_changeyoy_percent_change
India114544900.808
China-215985-0.015
United States17067060.505
Indonesia20327830.738
Pakistan46607961.976
" ], "text/plain": [ " country yoy_change yoy_percent_change\n", "0 India 11454490 0.808263\n", "1 China -215985 -0.015147\n", "2 United States 1706706 0.504510\n", "3 Indonesia 2032783 0.737849\n", "4 Pakistan 4660796 1.976380" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.select(\n", " pop_t.country, pop_t.yoy_change, pop_t.yoy_percent_change\n", ").head(5)" ] }, { "cell_type": "markdown", "id": "e94950c8-739e-4a6e-a07b-34a168726685", "metadata": {}, "source": [ "Although computed columns appear superficially similar to Queries, there is a key difference. Because computed columns are a permanent part of the table, they will be automatically updated any time new data is added to the table. These updates will propagate through any other computed columns that are \"downstream\" of the new data, ensuring that the state of the entire data is kept up-to-date.\n", "\n", "
\n", "In traditional data workflows, it is commonplace to recompute entire pipelines when the input dataset is changed or enlarged. In Pixeltable, by contrast, all updates are applied incrementally. When new data appear in a table or existing data are altered, Pixeltable will recompute only those rows that are dependent on the changed data.\n", "
\n", "\n", "Let's see how this works in practice. For purposes of illustration, we'll add an entry for California to the table, as if it were a country." ] }, { "cell_type": "code", "execution_count": 11, "id": "309d240d-2a14-4983-963c-38de012671c6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `population`: 1 rows [00:00, 228.35 rows/s]\n", "Inserted 1 row with 0 errors.\n" ] }, { "data": { "text/plain": [ "1 row inserted, 5 values computed." ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.insert(country='California', pop_2023=39110000, pop_2022=39030000)" ] }, { "cell_type": "markdown", "id": "ab488573-099d-4100-b6c9-007b6cb52edd", "metadata": {}, "source": [ "Observe that the computed columns `yoy_growth` and `yoy_percent_growth` have been automatically updated in response to the new data." ] }, { "cell_type": "code", "execution_count": 12, "id": "46b1fa03-080e-4b99-8f55-d25c48c56afe", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cca3countrycontinentpop_2023pop_2022pop_2000area__km__yoy_changeyoy_percent_change
FLKFalkland IslandsSouth America379137803080.012173.110.291
NIUNiueOceania193519342074.0261.10.052
TKLTokelauOceania189318711666.012.221.176
VATVatican CityEurope518510651.00.4481.569
NoneCaliforniaNone3911000039030000NaNNaN800000.205
" ], "text/plain": [ " cca3 country continent pop_2023 pop_2022 pop_2000 \\\n", "0 FLK Falkland Islands South America 3791 3780 3080.0 \n", "1 NIU Niue Oceania 1935 1934 2074.0 \n", "2 TKL Tokelau Oceania 1893 1871 1666.0 \n", "3 VAT Vatican City Europe 518 510 651.0 \n", "4 None California None 39110000 39030000 NaN \n", "\n", " area__km__ yoy_change yoy_percent_change \n", "0 12173.00 11 0.291005 \n", "1 261.00 1 0.051706 \n", "2 12.00 22 1.175842 \n", "3 0.44 8 1.568627 \n", "4 NaN 80000 0.204971 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.tail(5)" ] }, { "cell_type": "markdown", "id": "69471c19-96c2-48dc-a317-02662da7aca9", "metadata": {}, "source": [ "
\n", "Remember that all tables in Pixeltable are persistent. This includes computed columns: when you create a computed column, its definition is stored in the database. You can think of computed columns as setting up a persistent compute workflow: if you close your notebook or restart your Python instance, computed columns (along with the relationships between them, and any data contained in them) will be preserved.\n", "
" ] }, { "cell_type": "markdown", "id": "79636135-7dff-4eae-b2ad-0f39e9055051", "metadata": {}, "source": [ "### Recomputing Columns\n", "\n", "From time to time you might need to recompute the data in an existing computed column. Perhaps the *code* for one of your UDFs has changed, and you want to recompute a column that uses that UDF in order to pick up the new logic. Or perhaps you want to re-run a nondeterministic computation such as model inference. The command to do this is `recompute_columns()`. It won't do much in the current example, because all our computations are simple and deterministic, but for demonstration purposes here's what it looks like:" ] }, { "cell_type": "code", "execution_count": 13, "id": "33bedfd4-27e5-4ed8-8bf3-f3afa48d2967", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `population`: 235 rows [00:00, 8795.92 rows/s]\n" ] }, { "data": { "text/plain": [ "235 rows updated, 940 values computed." ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.recompute_columns(pop_t.yoy_change, pop_t.yoy_percent_change)" ] }, { "cell_type": "code", "execution_count": 14, "id": "5614239d-69ff-4ff5-aea0-cfd153d769b7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cca3countrycontinentpop_2023pop_2022pop_2000area__km__yoy_changeyoy_percent_change
FLKFalkland IslandsSouth America379137803080.012173.110.291
NIUNiueOceania193519342074.0261.10.052
TKLTokelauOceania189318711666.012.221.176
VATVatican CityEurope518510651.00.4481.569
NoneCaliforniaNone3911000039030000NaNNaN800000.205
" ], "text/plain": [ " cca3 country continent pop_2023 pop_2022 pop_2000 \\\n", "0 FLK Falkland Islands South America 3791 3780 3080.0 \n", "1 NIU Niue Oceania 1935 1934 2074.0 \n", "2 TKL Tokelau Oceania 1893 1871 1666.0 \n", "3 VAT Vatican City Europe 518 510 651.0 \n", "4 None California None 39110000 39030000 NaN \n", "\n", " area__km__ yoy_change yoy_percent_change \n", "0 12173.00 11 0.291005 \n", "1 261.00 1 0.051706 \n", "2 12.00 22 1.175842 \n", "3 0.44 8 1.568627 \n", "4 NaN 80000 0.204971 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_t.tail(5)" ] }, { "cell_type": "markdown", "id": "c69d22c4-70e8-488e-9872-b09f4967c63d", "metadata": {}, "source": [ "As expected, it looks the same.\n", "\n", "
\n", "If you modify the data that a computed column depends on, Pixeltable will recompute automatically; so recompute_columns() is primarily useful when the input data remains the same, but your UDF business logic changes.\n", "
" ] }, { "cell_type": "markdown", "id": "eeb2adaa-52e5-4617-b5d3-f7cfcc873510", "metadata": {}, "source": [ "### A More Complex Example: Image Processing" ] }, { "cell_type": "markdown", "id": "dc590704-601d-4104-85a2-395a3df029ff", "metadata": {}, "source": [ "Pixeltable supports media data such as images alongside traditional structured data. Let's explore an example that uses computed columns for image processing operations.\n", "\n", "In this example, we'll create the table directly by providing a schema, rather than importing it from a CSV." ] }, { "cell_type": "code", "execution_count": 15, "id": "a107160d-3bcb-4600-9631-8ce2251916a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'image_ops'.\n" ] } ], "source": [ "t = pxt.create_table('fundamentals/image_ops', {'source': pxt.Image})" ] }, { "cell_type": "code", "execution_count": 16, "id": "c0de60be-43ba-4810-911c-0979e24454a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `image_ops`: 3 rows [00:00, 1133.39 rows/s]\n", "Inserted 3 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "3 rows inserted, 6 values computed." ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url_prefix = 'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/images'\n", "images = ['000000000139.jpg', '000000000632.jpg', '000000000872.jpg']\n", "t.insert({'source': f'{url_prefix}/{image}'} for image in images)" ] }, { "cell_type": "code", "execution_count": 17, "id": "1c043567-79be-42f5-84dc-00ca5293003b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
source
\n", " \n", "
\n", " \n", "
\n", " \n", "
" ], "text/plain": [ " source\n", "0 \n", "\"UDF\" is standard terminology in databases, meaning \"User-Defined Function\". Technically speaking, the get_metadata() function isn't user-defined, it's built in to the Pixeltable library. But we'll consistently refer to Pixeltable functions as \"UDFs\" in order to clearly distinguish them from ordinary Python functions. Later in this guide, we'll see how to turn (almost) any Python function into a Pixeltable UDF.\n", "" ] }, { "cell_type": "code", "execution_count": 18, "id": "b9b528f9-a078-4bd4-a7ba-ede104c87270", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 3 column values with 0 errors.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcemetadata
\n", " \n", "
{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null}
\n", " \n", "
{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null}
\n", " \n", "
{"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null}
" ], "text/plain": [ " source \\\n", "0 \n", " \n", " \n", " source\n", " metadata\n", " rotated\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null}\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null}\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null}\n", "
\n", " \n", "
\n", " \n", " \n", "" ], "text/plain": [ " source \\\n", "0 \n", " \n", " \n", " source\n", " metadata\n", " rotated\n", " rotated_transparent\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null}\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null}\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", " {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null}\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "" ], "text/plain": [ " source \\\n", "0 \n", "In addition to get_metadata(), convert(), and rotate(), Pixeltable has a sizable library of other common image operations that can be used as UDFs in computed columns. For the most part, the image UDFs are analogs of the operations provided by the Pillow library (in fact, Pixeltable is just using Pillow under the covers). You can read more about the provided image (and other) UDFs in the Pixeltable SDK Documentation.\n", "" ] }, { "cell_type": "markdown", "id": "78e21ea7-fae6-4dc0-84a8-0ded949e4ba8", "metadata": {}, "source": [ "Let's have a look at our table schema." ] }, { "cell_type": "code", "execution_count": 22, "id": "78d30cf2-c74d-4ac2-ad59-a692876b7d0c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'fundamentals/image_ops'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
sourceImage
metadataJsonsource.get_metadata()
rotatedImagesource.rotate(10)
rotated_transparentImage['RGBA']source.convert('RGBA').rotate(10)
\n" ], "text/plain": [ "table 'fundamentals/image_ops'\n", "\n", " Column Name Type Computed With\n", " source Image \n", " metadata Json source.get_metadata()\n", " rotated Image source.rotate(10)\n", " rotated_transparent Image['RGBA'] source.convert('RGBA').rotate(10)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t" ] }, { "cell_type": "markdown", "id": "285cdb28-1b39-4c75-90bd-1b796056e93e", "metadata": {}, "source": [ "### Image Detection" ] }, { "cell_type": "markdown", "id": "31c7d5e8-c0aa-463e-af58-6f6ee7863084", "metadata": {}, "source": [ "In addition to simple operations like `rotate()` and `convert()`, the Pixeltable API includes UDFs for various off-the-shelf image models. Let's look at one example: object detection using the ResNet-50 model. Model inference is a UDF too, and it can be inserted into a computed column like any other.\n", "\n", "This one may take a little more time to compute, since it involves first downloading the ResNet-50 model (if it isn't already cached), then running inference on the images in our table." ] }, { "cell_type": "code", "execution_count": 23, "id": "5f83dc5c-b712-4886-8c18-f2bae260ccdb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 3 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "3 rows updated, 3 values computed." ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pixeltable.functions.huggingface import detr_for_object_detection\n", "\n", "t.add_computed_column(\n", " detections=detr_for_object_detection(\n", " t.source, model_id='facebook/detr-resnet-50', threshold=0.8\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 24, "id": "f8742aa2-d204-4183-a6de-31a0f4011fd4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcedetections
\n", " \n", "
{"boxes": [[243.329, 198.938, 252.491, 212.817], [411.603, 219.568, 442.399, 305.185], [339.933, 202.026, 347.892, 217.15], [512.94, 200.648, 530.061, 221.701], [494.712, 175.403, 518.906, 285.738], [342.964, 178.523, 367.205, 226.858], ..., [445.77, 170.045, 508.718, 291.668], [359.977, 212.132, 372.942, 229.529], [363.898, 220.541, 417.606, 315.758], [448.891, 121.4, 462.657, 142.467], [421.982, 158.211, 465.436, 300.659], [383.675, 172.826, 403.362, 211.302]], "labels": [86, 62, 86, 78, 82, 64, ..., 82, 86, 62, 85, 1, 1], "scores": [0.979, 0.987, 0.908, 0.915, 0.884, 0.926, ..., 0.987, 0.99, 0.997, 0.972, 0.999, 0.882], "label_text": ["vase", "chair", "vase", "microwave", "refrigerator", "potted plant", ..., "refrigerator", "vase", "chair", "clock", "person", "person"]}
\n", " \n", "
{"boxes": [[0.677, 280.959, 401.834, 477.803], [488.731, 51.887, 497.612, 82.074], [472.22, 251.556, 486.093, 287.805], [490.944, 249.437, 514.341, 288.037], [513.687, 48.824, 522.899, 80.729], [479.89, 251.301, 497.199, 288.15], ..., [341.577, 219.091, 429.855, 352.103], [468.799, 251.786, 481.076, 288.294], [461.138, 295.626, 470.704, 337.119], [503.576, 247.766, 535.297, 288.59], [440.375, 250.286, 449.473, 287.611], [499.56, 51.345, 508.956, 81.398]], "labels": [65, 84, 84, 84, 84, 84, ..., 64, 84, 84, 84, 84, 84], "scores": [0.995, 0.808, 0.819, 0.847, 0.841, 0.822, ..., 0.999, 0.81, 0.831, 0.844, 0.818, 0.81], "label_text": ["bed", "book", "book", "book", "book", "book", ..., "potted plant", "book", "book", "book", "book", "book"]}
\n", " \n", "
{"boxes": [[270.334, 353.472, 307.723, 404.887], [155.908, 101.902, 434.426, 551.656], [380.517, 160.025, 430.297, 190.625], [154.665, 128.519, 425.751, 607.173], [289.045, 98.536, 437.856, 549.117]], "labels": [40, 1, 37, 1, 1], "scores": [0.821, 0.897, 0.972, 0.98, 0.909], "label_text": ["baseball glove", "person", "sports ball", "person", "person"]}
" ], "text/plain": [ " source \\\n", "0 \n", " \n", " \n", " source\n", " image_with_bb\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " \n", "" ], "text/plain": [ " source \\\n", "0 \n", " \n", " \n", " image_with_bb\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", " \n", " \n", "" ], "text/plain": [ " image_with_bb\n", "0 \n", "#T_0a06a_row0_col0 {\n", " white-space: pre-wrap;\n", " text-align: left;\n", " font-weight: bold;\n", "}\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'fundamentals/image_ops'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
sourceImage
metadataJsonsource.get_metadata()
rotatedImagesource.rotate(10)
rotated_transparentImage['RGBA']source.convert('RGBA').rotate(10)
detectionsJsondetr_for_object_detection(source, model_id='facebook/detr-resnet-50', threshold=0.8)
image_with_bbImagedraw_bounding_boxes(source, detections.boxes, detections.label_text, fill=True)
\n" ], "text/plain": [ "table 'fundamentals/image_ops'\n", "\n", " Column Name Type Computed With\n", " source Image \n", " metadata Json source.get_metadata()\n", " rotated Image source.rotate(10)\n", " rotated_transparent Image['RGBA'] source.convert('RGBA').rotate(10)\n", " detections Json detr_for_object_detection(source, model_id='fa...\n", " image_with_bb Image draw_bounding_boxes(source, detections.boxes, ..." ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t" ] }, { "cell_type": "markdown", "id": "11d89a01-8bb6-4e98-ad86-949748e48ad9", "metadata": {}, "source": [ "And as always, when we add new data to the table, its computed columns are updated automatically. Let's try this on a few more images." ] }, { "cell_type": "code", "execution_count": 28, "id": "5cc956a5-3062-4440-ae08-169df53a7172", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `image_ops`: 2 rows [00:00, 944.77 rows/s]\n", "Inserted 2 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows inserted, 14 values computed." ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "more_images = ['000000000108.jpg', '000000000885.jpg']\n", "t.insert({'source': f'{url_prefix}/{image}'} for image in more_images)" ] }, { "cell_type": "code", "execution_count": 29, "id": "5be3825d-e014-400b-8d42-b9666db465ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceimage_with_bbdetections_labeltextmetadata
\n", " \n", "
\n", " \n", "
["train"]{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null}
\n", " \n", "
\n", " \n", "
["person", "person", "tennis racket", "person", "person", "person", "person", "person", "person", "person", "person", "person", "person"]{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null}
" ], "text/plain": [ " source \\\n", "0 \n", "It bears repeating that Pixeltable is persistent! Anything you put into a table, including computed columns, will be saved in persistent storage. This includes inference outputs such as t.detections, as well as generated images such as t.image_with_bb. (Later we'll see how to tune this behavior in cases where it might be undesirable to store everything, but the default behavior is that computed column output is always persisted.)\n", "" ] }, { "cell_type": "markdown", "id": "47451ff9-97d4-454c-9361-88ee1a3c17a7", "metadata": {}, "source": [ "### Expressions\n", "\n", "Let's have a closer look at that call to `draw_bounding_boxes()` in the last example.\n", "\n", "```python\n", "draw_bounding_boxes(t.source, t.detections.boxes, labels=t.detections.label_text, fill=True)\n", "```\n", "\n", "There are a couple of things going on. `draw_bounding_boxes()` is, of course, a UDF, and its first argument is a column reference of the sort we've used many times now: `t.source`, the source image. The other two arguments are more than simple column references, though: they're compound expressions that include the column reference `t.detections` along with a suffix (`.boxes` or `.label_text`) that tells Pixeltable to look inside the dictionary stored in `t.detections`.\n", "\n", "These are all examples of Pixeltable expressions. In fact, we've seen other types of Pixeltable expressions as well, without explicitly calling them out:\n", "\n", "- Calls to a UDF are expressions, such as `t.source.rotate(10)`, or the `draw_bounding_boxes()` example above;\n", "- Arithmetic operations are expressions, such as year-over-year calculation in our first example: `100 * pop_t.yoy_change / pop_t.pop_2022`.\n", "\n", "## Next Steps\n", "\n", "Learn more about working with Pixeltable:\n", "\n", "- [Queries and Expressions](https://docs.pixeltable.com/tutorials/queries-and-expressions)\n", "- [Tables and Data Operations](https://docs.pixeltable.com/tutorials/tables-and-data-operations)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 5 }