{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Sharing\n", "\n", "Learn how to publish datasets to Pixeltable Cloud and replicate datasets from the cloud to your local environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "Pixeltable Cloud enables you to:\n", "\n", "- **Publish** your datasets for sharing with teams or the public\n", "- **Replicate** datasets from the cloud to your local environment\n", "- Share multimodal AI datasets (images, videos, audio, documents) without managing infrastructure\n", "\n", "This guide demonstrates both publishing and replicating datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Data sharing functionality requires Pixeltable version 0.4.24 or later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Replicating datasets\n", "\n", "You can replicate any public dataset from Pixeltable Cloud to your local environment without needing an account or API key.\n", "\n", "### Replicate a public dataset\n", "\n", "Let's replicate a mini-version of the COCO-2017 dataset from Pixeltable Cloud. 
You can find this dataset at [pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017](https://www.pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017), or browse for other [public datasets](https://www.pixeltable.com/data-products).\n", "\n", "When calling `replicate()`:\n", "\n", "- **`remote_uri`** (required): The URI of the cloud dataset you want to replicate\n", "- **`local_path`** (your choice): The local directory/table name where you want to store the replica\n", "- **Variable name** (your choice): The Python variable in your session/script to reference the table (e.g., `coco_copy`)\n", "\n", "See the [replicate() SDK reference](https://docs.pixeltable.com/sdk/latest/pixeltable#func-replicate) for full documentation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata\n", "Created directory 'sharing-demo'.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9c6419f86ffa4813b5c8491472495192", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting table data into: /Users/asiegel/.pixeltable/tmp/acad78b1-4a62-483e-a0b1-728ccb5603cf\n",
      "Created directory '_system'.\n",
      "Created local replica 'sharing-demo/coco-copy' from URI: pxt://pixeltable:fiftyone/coco_mini_2017\n"
     ]
    }
   ],
   "source": [
    "import pixeltable as pxt\n",
    "\n",
    "pxt.drop_dir('sharing-demo', force=True)\n",
    "pxt.create_dir('sharing-demo')\n",
    "\n",
    "# The remote_uri is the specific cloud dataset you want to replicate\n",
    "# The local_path and variable name are yours to choose\n",
    "coco_copy = pxt.replicate(\n",
    "    remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',\n",
    "    local_path='sharing-demo.coco-copy',\n",
    ")"
   ]
  },
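The `pxt://` URI in the call above and the public browse page linked earlier follow a visible pattern: the path after `pxt://` reappears after `pixeltable.com/t/`. As a purely illustrative helper (not part of the Pixeltable API), one could map a dataset URI to its browse URL like this:

```python
# Hypothetical helper, inferred only from the URL pattern shown in this guide:
# pxt://pixeltable:fiftyone/coco_mini_2017
#   -> https://www.pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017
def browse_url(remote_uri: str) -> str:
    prefix = 'pxt://'
    assert remote_uri.startswith(prefix), f'not a pxt:// URI: {remote_uri}'
    return 'https://www.pixeltable.com/t/' + remote_uri[len(prefix):]

print(browse_url('pxt://pixeltable:fiftyone/coco_mini_2017'))
# -> https://www.pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017
```

This is only a convenience for finding a dataset's public page; `replicate()` itself takes the `pxt://` URI directly.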
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check that the replica exists at the local path with `list_tables()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['sharing-demo/coco-copy']"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pxt.list_tables('sharing-demo')"
   ]
  },
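Note that `local_path` was written with a dot separator (`'sharing-demo.coco-copy'`), while `list_tables()` reports the slash form (`'sharing-demo/coco-copy'`). A minimal sketch of that correspondence, assuming a simple dot-to-slash mapping (this normalizer is illustrative only, not a Pixeltable function):

```python
# Illustrative only: convert the dotted path accepted by local_path into the
# slash-separated form that list_tables() returns, assuming a plain
# one-to-one dot/slash correspondence as seen in this guide's output.
def to_listing_path(dotted: str) -> str:
    return dotted.replace('.', '/')

print(to_listing_path('sharing-demo.coco-copy'))
# -> sharing-demo/coco-copy
```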
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the structure of the replicated table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "\n",
       "  \n",
       "  \n",
       "  \n",
       "    \n",
       "      \n",
       "    \n",
       "  \n",
       "
replica 'sharing-demo/coco-copy'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
imageImage
coco_idInt
num_detectionsInt
widthIntwidth(image)
heightIntheight(image)
captionStringvision(image=image, model='gpt-4o-mini', prompt='Describe this image in one sentence, focusing on the main objects, their actions, and the setting. Use clear, factual language similar to COCO dataset captions.')
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Index NameColumnMetricEmbedding
idx6imagecosineclip(image, model_id='openai/clip-vit-base-patch16')
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
COMMENT: Mini-version of COCO-2017 validation split with 50 images from the FiftyOne dataset zoo.
\n" ], "text/plain": [ "replica 'sharing-demo/coco-copy'\n", "\n", " Column Name Type Computed With\n", " image Image \n", " coco_id Int \n", " num_detections Int \n", " width Int width(image)\n", " height Int height(image)\n", " caption String vision(image=image, model='gpt-4o-mini', promp...\n", "\n", " Index Name Column Metric Embedding\n", " idx6 image cosine clip(image, model_id='openai/clip-vit-base-pat...\n", "\n", "COMMENT: Mini-version of COCO-2017 validation split with 50 images from the FiftyOne dataset zoo." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coco_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Working with replicas\n", "\n", "Replicated datasets are read-only locally, but you can query, explore, and use them in powerful ways:\n", "\n", "**1. Query and explore the data**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
[query results (image thumbnails omitted): columns image, coco_id, num_detections, width, height, caption]
- "A person wearing a helmet and protective gear rides a skateboard down a residential street, with houses and parked cars visible in the background."
- "A young man in a red shirt and black cap is cooking at a grill in a diner-like setting, while various condiments and kitchen utensils are visible on the counter and walls."
- "A brown eagle with a white head is flying low over a body of water, its wings spread wide against the dark, rippling surface."
" ], "text/plain": [ " image coco_id num_detections \\\n", "0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a sample image to search with\n", "sample_img = (\n", " coco_copy.select(coco_copy.image).limit(1).collect()[0]['image']\n", ")\n", "sample_img" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
[similarity search results (image thumbnails omitted): columns image, similarity]
similarity scores: 1.0, 0.708, 0.669, 0.607, 0.606
" ], "text/plain": [ " image similarity\n", "0 \n", " \n", " \n", " image\n", " similarity\n", " \n", " \n", " \n", " \n", "
\n", " \n", "
\n", " 0.268\n", " \n", " \n", "
\n", " \n", "
\n", " 0.262\n", " \n", " \n", "
\n", " \n", "
\n", " 0.234\n", " \n", " \n", "
\n", " \n", "
\n", " 0.22\n", " \n", " \n", "" ], "text/plain": [ " image similarity\n", "0