{ "cells": [ { "cell_type": "markdown", "id": "c6580207-e8eb-4730-9768-ff6a3b2f6004", "metadata": {}, "source": [ "# Computed Columns\n", "\n", "This guide introduces one of Pixeltable's most essential and powerful concepts: computed columns. You'll learn how to:\n", "\n", "- Add computed columns to a table\n", "- Use computed columns for complex operations such as image processing and model inference\n", "\n", "## Prerequisites\n", "\n", "This guide assumes you're familiar with:\n", "\n", "- Creating and managing tables\n", "- Inserting and querying data\n", "- Basic table operations\n", "\n", "If you're new to Pixeltable, start with the [Tables and Data Operations](https://docs.pixeltable.com/tutorials/tables-and-data-operations) guide.\n", "\n", "
Runtime -> Change runtime type menu item at the top, then select the GPU radio button and click on Save.\n",
"| cca3 | \n", "country | \n", "continent | \n", "pop_2023 | \n", "pop_2022 | \n", "pop_2000 | \n", "area__km__ | \n", "
|---|---|---|---|---|---|---|
| IND | \n", "India | \n", "Asia | \n", "1428627663 | \n", "1417173173 | \n", "1059633675 | \n", "3287590. | \n", "
| CHN | \n", "China | \n", "Asia | \n", "1425671352 | \n", "1425887337 | \n", "1264099069 | \n", "9706961. | \n", "
| USA | \n", "United States | \n", "North America | \n", "339996563 | \n", "338289857 | \n", "282398554 | \n", "9372610. | \n", "
| IDN | \n", "Indonesia | \n", "Asia | \n", "277534122 | \n", "275501339 | \n", "214072421 | \n", "1904569. | \n", "
| PAK | \n", "Pakistan | \n", "Asia | \n", "240485658 | \n", "235824862 | \n", "154369924 | \n", "881912. | \n", "
| table 'fundamentals/population' | \n", "
| Column Name | \n", "Type | \n", "Computed With | \n", "
|---|---|---|
| cca3 | \n", "String | \n", "\n", " |
| country | \n", "String | \n", "\n", " |
| continent | \n", "String | \n", "\n", " |
| pop_2023 | \n", "Int | \n", "\n", " |
| pop_2022 | \n", "Int | \n", "\n", " |
| pop_2000 | \n", "Int | \n", "\n", " |
| area__km__ | \n", "Float | \n", "\n", " |
| country | \n", "yoy_change | \n", "
|---|---|
| India | \n", "11454490 | \n", "
| China | \n", "-215985 | \n", "
| United States | \n", "1706706 | \n", "
| Indonesia | \n", "2032783 | \n", "
| Pakistan | \n", "4660796 | \n", "
| table 'fundamentals/population' | \n", "
| Column Name | \n", "Type | \n", "Computed With | \n", "
|---|---|---|
| cca3 | \n", "String | \n", "\n", " |
| country | \n", "String | \n", "\n", " |
| continent | \n", "String | \n", "\n", " |
| pop_2023 | \n", "Int | \n", "\n", " |
| pop_2022 | \n", "Int | \n", "\n", " |
| pop_2000 | \n", "Int | \n", "\n", " |
| area__km__ | \n", "Float | \n", "\n", " |
| yoy_change | \n", "Int | \n", "pop_2023 - pop_2022 | \n", "
| country | \n", "yoy_change | \n", "
|---|---|
| India | \n", "11454490 | \n", "
| China | \n", "-215985 | \n", "
| United States | \n", "1706706 | \n", "
| Indonesia | \n", "2032783 | \n", "
| Pakistan | \n", "4660796 | \n", "
| table 'fundamentals/population' | \n", "
| Column Name | \n", "Type | \n", "Computed With | \n", "
|---|---|---|
| cca3 | \n", "String | \n", "\n", " |
| country | \n", "String | \n", "\n", " |
| continent | \n", "String | \n", "\n", " |
| pop_2023 | \n", "Int | \n", "\n", " |
| pop_2022 | \n", "Int | \n", "\n", " |
| pop_2000 | \n", "Int | \n", "\n", " |
| area__km__ | \n", "Float | \n", "\n", " |
| yoy_change | \n", "Int | \n", "pop_2023 - pop_2022 | \n", "
| yoy_percent_change | \n", "Float | \n", "(100 * yoy_change) / pop_2022 | \n", "
| country | \n", "yoy_change | \n", "yoy_percent_change | \n", "
|---|---|---|
| India | \n", "11454490 | \n", "0.808 | \n", "
| China | \n", "-215985 | \n", "-0.015 | \n", "
| United States | \n", "1706706 | \n", "0.505 | \n", "
| Indonesia | \n", "2032783 | \n", "0.738 | \n", "
| Pakistan | \n", "4660796 | \n", "1.976 | \n", "
| cca3 | \n", "country | \n", "continent | \n", "pop_2023 | \n", "pop_2022 | \n", "pop_2000 | \n", "area__km__ | \n", "yoy_change | \n", "yoy_percent_change | \n", "
|---|---|---|---|---|---|---|---|---|
| FLK | \n", "Falkland Islands | \n", "South America | \n", "3791 | \n", "3780 | \n", "3080.0 | \n", "12173. | \n", "11 | \n", "0.291 | \n", "
| NIU | \n", "Niue | \n", "Oceania | \n", "1935 | \n", "1934 | \n", "2074.0 | \n", "261. | \n", "1 | \n", "0.052 | \n", "
| TKL | \n", "Tokelau | \n", "Oceania | \n", "1893 | \n", "1871 | \n", "1666.0 | \n", "12. | \n", "22 | \n", "1.176 | \n", "
| VAT | \n", "Vatican City | \n", "Europe | \n", "518 | \n", "510 | \n", "651.0 | \n", "0.44 | \n", "8 | \n", "1.569 | \n", "
| None | \n", "California | \n", "None | \n", "39110000 | \n", "39030000 | \n", "NaN | \n", "NaN | \n", "80000 | \n", "0.205 | \n", "
| cca3 | \n", "country | \n", "continent | \n", "pop_2023 | \n", "pop_2022 | \n", "pop_2000 | \n", "area__km__ | \n", "yoy_change | \n", "yoy_percent_change | \n", "
|---|---|---|---|---|---|---|---|---|
| FLK | \n", "Falkland Islands | \n", "South America | \n", "3791 | \n", "3780 | \n", "3080.0 | \n", "12173. | \n", "11 | \n", "0.291 | \n", "
| NIU | \n", "Niue | \n", "Oceania | \n", "1935 | \n", "1934 | \n", "2074.0 | \n", "261. | \n", "1 | \n", "0.052 | \n", "
| TKL | \n", "Tokelau | \n", "Oceania | \n", "1893 | \n", "1871 | \n", "1666.0 | \n", "12. | \n", "22 | \n", "1.176 | \n", "
| VAT | \n", "Vatican City | \n", "Europe | \n", "518 | \n", "510 | \n", "651.0 | \n", "0.44 | \n", "8 | \n", "1.569 | \n", "
| None | \n", "California | \n", "None | \n", "39110000 | \n", "39030000 | \n", "NaN | \n", "NaN | \n", "80000 | \n", "0.205 | \n", "
recompute_columns() is primarily useful when the input data remains the same, but your UDF business logic changes.\n",
"| source | \n", "
|---|
\n",
" | \n",
"
\n",
" | \n",
"
\n",
" | \n",
"
get_metadata() function isn't user-defined, it's built in to the Pixeltable library. But we'll consistently refer to Pixeltable functions as \"UDFs\" in order to clearly distinguish them from ordinary Python functions. Later in this guide, we'll see how to turn (almost) any Python function into a Pixeltable UDF.\n",
""
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "b9b528f9-a078-4bd4-a7ba-ede104c87270",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Added 3 column values with 0 errors.\n"
]
},
{
"data": {
"text/html": [
"| source | \n", "metadata | \n", "
|---|---|
\n",
" | \n",
" {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null} | \n", "
\n",
" | \n",
" {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null} | \n", "
\n",
" | \n",
" {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null} | \n", "
get_metadata(), convert(), and rotate(), Pixeltable has a sizable library of other common image operations that can be used as UDFs in computed columns. For the most part, the image UDFs are analogs of the operations provided by the Pillow library (in fact, Pixeltable is just using Pillow under the covers). You can read more about the provided image (and other) UDFs in the Pixeltable SDK Documentation.\n",
""
]
},
{
"cell_type": "markdown",
"id": "78e21ea7-fae6-4dc0-84a8-0ded949e4ba8",
"metadata": {},
"source": [
"Let's have a look at our table schema."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "78d30cf2-c74d-4ac2-ad59-a692876b7d0c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"| table 'fundamentals/image_ops' | \n", "
| Column Name | \n", "Type | \n", "Computed With | \n", "
|---|---|---|
| source | \n", "Image | \n", "\n", " |
| metadata | \n", "Json | \n", "source.get_metadata() | \n", "
| rotated | \n", "Image | \n", "source.rotate(10) | \n", "
| rotated_transparent | \n", "Image['RGBA'] | \n", "source.convert('RGBA').rotate(10) | \n", "
| source | \n", "detections | \n", "
|---|---|
\n",
" | \n",
" {"boxes": [[243.329, 198.938, 252.491, 212.817], [411.603, 219.568, 442.399, 305.185], [339.933, 202.026, 347.892, 217.15], [512.94, 200.648, 530.061, 221.701], [494.712, 175.403, 518.906, 285.738], [342.964, 178.523, 367.205, 226.858], ..., [445.77, 170.045, 508.718, 291.668], [359.977, 212.132, 372.942, 229.529], [363.898, 220.541, 417.606, 315.758], [448.891, 121.4, 462.657, 142.467], [421.982, 158.211, 465.436, 300.659], [383.675, 172.826, 403.362, 211.302]], "labels": [86, 62, 86, 78, 82, 64, ..., 82, 86, 62, 85, 1, 1], "scores": [0.979, 0.987, 0.908, 0.915, 0.884, 0.926, ..., 0.987, 0.99, 0.997, 0.972, 0.999, 0.882], "label_text": ["vase", "chair", "vase", "microwave", "refrigerator", "potted plant", ..., "refrigerator", "vase", "chair", "clock", "person", "person"]} | \n", "
\n",
" | \n",
" {"boxes": [[0.677, 280.959, 401.834, 477.803], [488.731, 51.887, 497.612, 82.074], [472.22, 251.556, 486.093, 287.805], [490.944, 249.437, 514.341, 288.037], [513.687, 48.824, 522.899, 80.729], [479.89, 251.301, 497.199, 288.15], ..., [341.577, 219.091, 429.855, 352.103], [468.799, 251.786, 481.076, 288.294], [461.138, 295.626, 470.704, 337.119], [503.576, 247.766, 535.297, 288.59], [440.375, 250.286, 449.473, 287.611], [499.56, 51.345, 508.956, 81.398]], "labels": [65, 84, 84, 84, 84, 84, ..., 64, 84, 84, 84, 84, 84], "scores": [0.995, 0.808, 0.819, 0.847, 0.841, 0.822, ..., 0.999, 0.81, 0.831, 0.844, 0.818, 0.81], "label_text": ["bed", "book", "book", "book", "book", "book", ..., "potted plant", "book", "book", "book", "book", "book"]} | \n", "
\n",
" | \n",
" {"boxes": [[270.334, 353.472, 307.723, 404.887], [155.908, 101.902, 434.426, 551.656], [380.517, 160.025, 430.297, 190.625], [154.665, 128.519, 425.751, 607.173], [289.045, 98.536, 437.856, 549.117]], "labels": [40, 1, 37, 1, 1], "scores": [0.821, 0.897, 0.972, 0.98, 0.909], "label_text": ["baseball glove", "person", "sports ball", "person", "person"]} | \n", "
| table 'fundamentals/image_ops' | \n", "
| Column Name | \n", "Type | \n", "Computed With | \n", "
|---|---|---|
| source | \n", "Image | \n", "\n", " |
| metadata | \n", "Json | \n", "source.get_metadata() | \n", "
| rotated | \n", "Image | \n", "source.rotate(10) | \n", "
| rotated_transparent | \n", "Image['RGBA'] | \n", "source.convert('RGBA').rotate(10) | \n", "
| detections | \n", "Json | \n", "detr_for_object_detection(source, model_id='facebook/detr-resnet-50', threshold=0.8) | \n", "
| image_with_bb | \n", "Image | \n", "draw_bounding_boxes(source, detections.boxes, detections.label_text, fill=True) | \n", "
| source | \n", "image_with_bb | \n", "detections_labeltext | \n", "metadata | \n", "
|---|---|---|---|
\n",
" | \n",
" \n",
" | \n",
" ["train"] | \n", "{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null} | \n", "
\n",
" | \n",
" \n",
" | \n",
" ["person", "person", "tennis racket", "person", "person", "person", "person", "person", "person", "person", "person", "person", "person"] | \n", "{"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null} | \n", "
t.detections, as well as generated images such as t.image_with_bb. (Later we'll see how to tune this behavior in cases where it might be undesirable to store everything, but the default behavior is that computed column output is always persisted.)\n",
""
]
},
{
"cell_type": "markdown",
"id": "47451ff9-97d4-454c-9361-88ee1a3c17a7",
"metadata": {},
"source": [
"### Expressions\n",
"\n",
"Let's have a closer look at that call to `draw_bounding_boxes()` in the last example.\n",
"\n",
"```python\n",
"draw_bounding_boxes(t.source, t.detections.boxes, labels=t.detections.label_text, fill=True)\n",
"```\n",
"\n",
"There are a couple of things going on. `draw_bounding_boxes()` is, of course, a UDF, and its first argument is a column reference of the sort we've used many times now: `t.source`, the source image. The other two arguments are more than simple column references, though: they're compound expressions that include the column reference `t.detections` along with a suffix (`.boxes` or `.label_text`) that tells Pixeltable to look inside the dictionary stored in `t.detections`.\n",
"\n",
"These are all examples of Pixeltable expressions. In fact, we've seen other types of Pixeltable expressions as well, without explicitly calling them out:\n",
"\n",
"- Calls to a UDF are expressions, such as `t.source.rotate(10)`, or the `draw_bounding_boxes()` example above;\n",
"- Arithmetic operations are expressions, such as year-over-year calculation in our first example: `100 * pop_t.yoy_change / pop_t.pop_2022`.\n",
"\n",
"## Next Steps\n",
"\n",
"Learn more about working with Pixeltable:\n",
"\n",
"- [Queries and Expressions](https://docs.pixeltable.com/tutorials/queries-and-expressions)\n",
"- [Tables and Data Operations](https://docs.pixeltable.com/tutorials/tables-and-data-operations)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}