{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Sharing\n", "\n", "Learn how to publish datasets to Pixeltable Cloud and replicate datasets from the cloud to your local environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "Pixeltable Cloud enables you to:\n", "\n", "- **Publish** your datasets for sharing with teams or the public\n", "- **Replicate** datasets from the cloud to your local environment\n", "- Share multimodal AI datasets (images, videos, audio, documents) without managing infrastructure\n", "\n", "This guide demonstrates both publishing and replicating datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Data sharing functionality requires Pixeltable version 0.4.24 or later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Replicating datasets\n", "\n", "You can replicate any public dataset from Pixeltable Cloud to your local environment without needing an account or API key.\n", "\n", "### Replicate a public dataset\n", "\n", "Let's replicate a mini-version of the COCO-2017 dataset from Pixeltable Cloud. 
You can find this dataset at [pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017](https://www.pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017), or browse for other [public datasets](https://www.pixeltable.com/data-products).\n", "\n", "When calling `replicate()`:\n", "\n", "- **`remote_uri`** (required): The URI of the cloud dataset you want to replicate\n", "- **`local_path`** (your choice): The local directory/table name where you want to store the replica\n", "- **Variable name** (your choice): The Python variable in your session/script to reference the table (e.g., `coco_copy`)\n", "\n", "See the [replicate() SDK reference](https://docs.pixeltable.com/sdk/latest/pixeltable#func-replicate) for full documentation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata\n", "Created directory 'sharing-demo'.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9c6419f86ffa4813b5c8491472495192", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n" ], "text/plain": [] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Extracting table data into: /Users/asiegel/.pixeltable/tmp/acad78b1-4a62-483e-a0b1-728ccb5603cf\n", "Created directory '_system'.\n", "Created local replica 'sharing-demo/coco-copy' from URI: pxt://pixeltable:fiftyone/coco_mini_2017\n" ] } ], "source": [ "import pixeltable as pxt\n", "\n", "pxt.drop_dir('sharing-demo', force=True)\n", "pxt.create_dir('sharing-demo')\n", "\n", "# The remote_uri is the specific cloud dataset you want to replicate\n", "# The local_path and variable name are yours to choose\n", "coco_copy = pxt.replicate(\n", " remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',\n", " local_path='sharing-demo.coco-copy',\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can check that the replica exists at the local path with `list_tables()`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['sharing-demo/coco-copy']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pxt.list_tables('sharing-demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the structure of the replicated table:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "| replica 'sharing-demo/coco-copy' | \n", "
| Column Name | Type | Computed With |
|---|---|---|
| image | Image | |
| coco_id | Int | |
| num_detections | Int | |
| width | Int | width(image) |
| height | Int | height(image) |
| caption | String | vision(image=image, model='gpt-4o-mini', prompt='Describe this image in one sentence, focusing on the main objects, their actions, and the setting. Use clear, factual language similar to COCO dataset captions.') |

| Index Name | Column | Metric | Embedding |
|---|---|---|---|
| idx6 | image | cosine | clip(image, model_id='openai/clip-vit-base-patch16') |

COMMENT: Mini-version of COCO-2017 validation split with 50 images from the FiftyOne dataset zoo.

| image | coco_id | num_detections | width | height | caption |
|---|---|---|---|---|---|
| | 41 | 5 | 640 | 427 | A person wearing a helmet and protective gear rides a skateboard down a residential street, with houses and parked cars visible in the background. |
| | 47 | 9 | 426 | 640 | A young man in a red shirt and black cap is cooking at a grill in a diner-like setting, while various condiments and kitchen utensils are visible on the counter and walls. |
| | 44 | 1 | 640 | 427 | A brown eagle with a white head is flying low over a body of water, its wings spread wide against the dark, rippling surface. |

| image | similarity |
|---|---|
| | 1. |
| | 0.708 |
| | 0.669 |
| | 0.607 |
| | 0.606 |