{ "cells": [ { "cell_type": "markdown", "id": "c0b21982", "metadata": {}, "source": [ "# Embedding Indices\n", "\n", "Main takeaways:\n", "\n", "- Indexing in Pixeltable is declarative\n", " - you create an index on a column and supply the embedding functions you want to use (for inserting data into the index as well as lookups)\n", " - Pixeltable maintains the index in response to any kind of update of the indexed table (i.e., `insert()`/`update()`/`delete()`)\n", "- Perform index lookups with the `similarity()` pseudo-function, in combination with the `order_by()` and `limit()` clauses" ] }, { "cell_type": "markdown", "id": "b0af852e", "metadata": {}, "source": [ "To make this concrete, let's create a table of images with the [`create_table()`](https://docs.pixeltable.com/sdk/latest/pixeltable#func-create_table) function.\n", "We're also going to add some columns to demonstrate combining similarity search with other predicates.\n", "\n", "
Runtime -> Change runtime type menu item at the top, then select the GPU radio button and click on Save.\n",
".using() as a partial function operator. It's a general operator that can be applied to any UDF (not just embedding functions), transforming a UDF with n parameters into one with k parameters by fixing the values of n-k of its arguments. Python has something similar in the functools package: the functools.partial() operator.\n",
"| id | \n", "img | \n", "similarity | \n", "
|---|---|---|
| 6 | \n", "\n",
" | \n",
" 1. | \n", "
| 3 | \n", "\n",
" | \n",
" 0.607 | \n", "

| text | similarity |
|---|---|
| Picasso's output, especially in his early career, is often periodized. | 0.699 |
| During the first decade of the 20th century, his style changed as he experimented with different theories, techniques, and ideas. | 0.697 |