{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Get your data ready for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class, that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.basic_data import * " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DataBunch[source]

\n", "\n", "> DataBunch(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensures they are on `device` and apply to them `tfms` as batch are drawn. `path` is used internally to store temporary files, `collate_fn` is passed to the pytorch `Dataloader` (replacing the one there) to explain how to collate the samples picked for a batch. By default, it applies data to the object sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important). \n", "\n", "An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

create[source]

\n", "\n", "> create(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`4`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.create, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and by using `num_workers`. `tfms` and `device` are passed to the init method." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

show_batch[source]

\n", "\n", "> show_batch(`rows`:`int`=`5`, `ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=``, `kwargs`)\n", "\n", "Show a batch of data in `ds_type` on a few `rows`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.show_batch)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

dl[source]

\n", "\n", "> dl(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=``) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)\n", "\n", "Returns appropriate [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) for validation, training, or test (`ds_type`). " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

one_batch[source]

\n", "\n", "> one_batch(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=``, `detach`:`bool`=`True`, `denorm`:`bool`=`True`) → `Collection`\\[`Tensor`\\]\n", "\n", "Get one batch from the data loader of `ds_type`. Optionally `detach` and `denorm`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.one_batch)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

one_item[source]

\n", "\n", "> one_item(`item`, `detach`:`bool`=`False`, `denorm`:`bool`=`False`)\n", "\n", "Get ìtem` into a batch. Optionally `detach` and `denorm`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.one_item)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source]

\n", "\n", "> add_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.add_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adds a transform to all dataloaders." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DeviceDataLoader[source]

\n", "\n", "> DeviceDataLoader(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\\[`Callable`\\]=`None`, `collate_fn`:`Callable`=`'data_collate'`, `skip_size1`:`bool`=`False`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Factory method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

create[source]

\n", "\n", "> create(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cuda')`, `tfms`:`Collection`\\[`Callable`\\]=`None`, `num_workers`:`int`=`4`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.create, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers`processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Methods" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source]

\n", "\n", "> add_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.add_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add a transform (i.e. same as `self.tfms.append(tfm)`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

remove_tfm[source]

\n", "\n", "> remove_tfm(`tfm`:`Callable`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.remove_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remove a transform." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

`DatasetType`

\n", "\n", "> Enum = [Train, Valid, Test, Single]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DatasetType, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internal enumerator to name the training, validation and test dataset/dataloader." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Undocumented Methods - Methods moved below this line will intentionally be hidden" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

proc_batch[source]

\n", "\n", "> proc_batch(`b`:`Tensor`) → `Tensor`\n", "\n", "Proces batch `b` of `TensorImage`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.proc_batch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Methods - Please document or move to the undocumented section" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

export[source]

\n", "\n", "> export(`fname`:`str`=`'export.pkl'`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.export)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

_databunch_load_empty[source]

\n", "\n", "> _databunch_load_empty(`path`, `fname`:`str`=`'export.pkl'`, `tfms`:`Union`\\[`Callable`, `Collection`\\[`Callable`\\]\\]=`None`, `tfm_y`:`bool`=`False`, `kwargs`)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.load_empty)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

data_collate[source]

\n", "\n", "> data_collate(`batch`:`ItemsList`) → `Tensor`\n", "\n", "Convert `batch` items to tensor data. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.collate_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "jekyll": { "keywords": "fastai", "summary": "Basic classes to contain the data for model training.", "title": "basic_data" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }