{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Get your data ready for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class, that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.basic_data import * " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DataBunch[source]

\n", "\n", "> DataBunch(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensures they are on `device` and apply to them `tfms` as batch are drawn. `path` is used internally to store temporary files, `collate_fn` is passed to the pytorch `Dataloader` (replacing the one there) to explain how to collate the samples picked for a batch. By default, it applies data to the object sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important). \n", "\n", "An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

create[source]

\n", "\n", "> create(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`16`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.create, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and by using `num_workers`. `tfms` and `device` are passed to the init method." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

dl[source]

\n", "\n", "> dl(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=``) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)\n", "\n", "Returns appropriate [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) for validation, training, or test (`ds_type`). " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.dl)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source]

\n", "\n", "> add_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.add_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adds a transform to all dataloaders." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DeviceDataLoader[source]

\n", "\n", "> DeviceDataLoader(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\\[`Callable`\\]=`None`, `collate_fn`:`Callable`=`'data_collate'`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Factory method" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

create[source]

\n", "\n", "> create(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cuda')`, `tfms`:`Collection`\\[`Callable`\\]=`None`, `num_workers`:`int`=`16`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.create, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers`processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Methods" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hide_input": false }, "outputs": [ { "data": { "text/markdown": [ "

one_batch[source]

\n", "\n", "> one_batch() → `Collection`\\[`Tensor`\\]\n", "\n", "Get one batch from the data loader. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.one_batch)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source]

\n", "\n", "> add_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.add_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add a transform (i.e. same as `self.tfms.append(tfm)`)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

remove_tfm[source]

\n", "\n", "> remove_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.remove_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remove a transform." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generic classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first two last classes are just empty shell to be subclassed by one of the applications, the last one is there to create empty [`DataBunch`](/basic_data.html#DataBunch) (useful when we want a [`Learner`](/basic_train.html#Learner) on inference mode)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DatasetBase[source]

\n", "\n", "> DatasetBase(`c`:`int`) :: [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)\n", "\n", "Base class for all fastai datasets. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DatasetBase, title_level=3)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class LabelDataset[source]

\n", "\n", "> LabelDataset(`classes`:`Collection`, `class2idx`:`Dict`\\[`Any`, `int`\\]=`None`) :: [`DatasetBase`](/basic_data.html#DatasetBase)\n", "\n", "Base class for fastai datasets that do classification, mapped according to `classes`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(LabelDataset, title_level=3)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class SingleClassificationDataset[source]

\n", "\n", "> SingleClassificationDataset(`classes`:`StrList`) :: [`DatasetBase`](/basic_data.html#DatasetBase)\n", "\n", "A [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) that contains no data, only `classes`, mainly used for inference with `set_item` " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(SingleClassificationDataset, title_level=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Undocumented Methods - Methods moved below this line will intentionally be hidden" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

proc_batch[source]

\n", "\n", "> proc_batch(`b`:`Tensor`) → `Tensor`\n", "\n", "Proces batch `b` of `TensorImage`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.proc_batch)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

data_collate[source]

\n", "\n", "> data_collate(`batch`:`ItemsList`) → `Tensor`\n", "\n", "Convert `batch` items to tensor data. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.collate_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Methods - Please document or move to the undocumented section" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jekyll": { "keywords": "fastai", "summary": "Basic classes to contain the data for model training.", "title": "basic_data" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }