{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert Dataset Formats\n", "\n", "This recipe demonstrates how to use FiftyOne to convert datasets on disk between common formats." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook contains bash commands. To run it as a notebook, you must install the [Jupyter bash kernel](https://github.com/takluyver/bash_kernel) via the command below.\n", "\n", "Alternatively, you can just copy + paste the code blocks into your shell." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "pip install bash_kernel\n", "python -m bash_kernel.install" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this recipe we'll use the [FiftyOne Dataset Zoo](https://docs.voxel51.com/dataset_zoo/index.html) to download some open source datasets to work with.\n", "\n", "Specifically, we'll need [TensorFlow](https://www.tensorflow.org/) and [TensorFlow Datasets](https://www.tensorflow.org/datasets) installed to [access the datasets](https://docs.voxel51.com/dataset_zoo/api.html#customizing-your-ml-backend):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "pip install tensorflow tensorflow-datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download datasets\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the test split of the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) from the [FiftyOne Dataset Zoo](https://docs.voxel51.com/dataset_zoo/index.html) using the command below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading split 'test' to '~/fiftyone/cifar10/test'\n", "Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ~/fiftyone/cifar10/tmp-download/cifar-10-python.tar.gz\n", "170500096it [00:04, 35887670.65it/s] \n", "Extracting ~/fiftyone/cifar10/tmp-download/cifar-10-python.tar.gz to ~/fiftyone/cifar10/tmp-download\n", " 100% |███| 10000/10000 [5.2s elapsed, 0s remaining, 1.8K samples/s] \n", "Dataset info written to '~/fiftyone/cifar10/info.json'\n" ] } ], "source": [ "# Download the test split of CIFAR-10\n", "fiftyone zoo datasets download cifar10 --split test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the validation split of the [KITTI dataset]( http://www.cvlibs.net/datasets/kitti) from the [FiftyOne Dataset Zoo](https://docs.voxel51.com/dataset_zoo/index.html) using the command below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'validation' already downloaded\n" ] } ], "source": [ "# Download the validation split of KITTI\n", "fiftyone zoo datasets download kitti --split validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The fiftyone convert command" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [FiftyOne CLI](https://voxel51.com/docs/fiftyone/cli/index.html) provides a number of utilities for importing and exporting datasets in a variety of common (or custom) formats.\n", "\n", "Specifically, the `fiftyone convert` command provides a convenient way to convert datasets on disk between formats by specifying the [fiftyone.types.Dataset](https://voxel51.com/docs/fiftyone/api/fiftyone.types.html#fiftyone.types.dataset_types.Dataset) type of the input and desired output.\n", "\n", "FiftyOne provides a collection of [builtin types](https://voxel51.com/docs/fiftyone/user_guide/import_datasets.html#supported-import-formats) that you can use to read/write datasets in common formats out-of-the-box:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "