# Get your data ready for training

This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. It is a generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). The data module of every application provides helper functions that create this [`DataBunch`](/basic_data.html#DataBunch) directly for you.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.basic_data import * 

In [None]:
show_doc(DataBunch, doc_string=False)

<h2 id="DataBunch"><code>class</code> <code>DataBunch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L59" class="source_link">[source]</a></h2>

> <code>DataBunch</code>(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)

Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files. `collate_fn` is passed to the pytorch `DataLoader` (replacing the one there) to specify how to collate the samples picked for a batch. By default, it grabs the `data` attribute of the objects sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important).

An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader).

In [None]:
show_doc(DataBunch.create, doc_string=False)

<h4 id="DataBunch.create"><code>create</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L81" class="source_link">[source]</a></h4>

> <code>create</code>(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`4`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`

Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and using `num_workers` worker processes. `tfms` and `device` are passed to the init method.
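
To make the role of `bs` concrete, here is a minimal pure-Python sketch of how a dataset can be cut into batches. `iter_batches` is a hypothetical helper, not part of fastai. In the library this work is delegated to the pytorch `DataLoader`, which also handles shuffling, collation and worker processes.

```python
from typing import Iterator, List, Sequence

def iter_batches(dataset: Sequence, bs: int = 64) -> Iterator[List]:
    "Yield consecutive chunks of at most `bs` items from `dataset` (hypothetical helper)."
    if bs < 1:
        raise ValueError("bs must be a positive integer")
    for start in range(0, len(dataset), bs):
        stop = min(start + bs, len(dataset))
        yield [dataset[i] for i in range(start, stop)]

train_ds = list(range(10))                    # toy stand-in for a Dataset
batches = list(iter_batches(train_ds, bs=4))  # the last batch is smaller
```

With 10 items and `bs=4`, the final batch holds only the 2 leftover items.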

In [None]:
show_doc(DataBunch.show_batch)

<h4 id="DataBunch.show_batch"><code>show_batch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L132" class="source_link">[source]</a></h4>

> <code>show_batch</code>(`rows`:`int`=`5`, `ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Train: 1>`, `kwargs`)

Show a batch of data in `ds_type` on a few `rows`.  

In [None]:
show_doc(DataBunch.dl)

<h4 id="DataBunch.dl"><code>dl</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L95" class="source_link">[source]</a></h4>

> <code>dl</code>(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Valid: 2>`) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)

Returns the appropriate [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) for validation, training, or test (`ds_type`).
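
The selection by `ds_type` can be pictured with the following sketch. `TinyBunch` is a hypothetical stand-in that only stores its loaders, and the enum is rebuilt here with the member names listed on this page. This is an illustration under those assumptions, not the library's code, and the `Single` member is not handled.

```python
from enum import Enum

# Same member names as on this page; the functional Enum API numbers them from 1.
DatasetType = Enum('DatasetType', 'Train Valid Test Single')

class TinyBunch:
    "Hypothetical stand-in for a DataBunch that only stores its loaders."
    def __init__(self, train_dl, valid_dl, test_dl=None):
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def dl(self, ds_type=DatasetType.Valid):
        "Return the loader matching `ds_type` (Single is left out of this sketch)."
        loaders = {DatasetType.Train: self.train_dl,
                   DatasetType.Valid: self.valid_dl,
                   DatasetType.Test: self.test_dl}
        return loaders[ds_type]

bunch = TinyBunch(train_dl=['train batch'], valid_dl=['valid batch'])
default_loader = bunch.dl()                  # validation loader by default
train_loader = bunch.dl(DatasetType.Train)
```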

In [None]:
show_doc(DataBunch.one_batch)

<h4 id="DataBunch.one_batch"><code>one_batch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L110" class="source_link">[source]</a></h4>

> <code>one_batch</code>(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Train: 1>`, `detach`:`bool`=`True`, `denorm`:`bool`=`True`) → `Collection`\[`Tensor`\]

Get one batch from the data loader of `ds_type`. Optionally `detach` and `denorm`.  
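
Conceptually, grabbing one batch amounts to taking the first element an iterable loader produces. The sketch below uses a plain list of `(inputs, targets)` pairs as a toy loader. The function here is a simplified illustration and does not model the `detach` or `denorm` options.

```python
def one_batch(dl):
    "Return the first batch produced by iterating over `dl` (simplified sketch)."
    return next(iter(dl))

# Toy loader: two batches, each a pair of (inputs, targets).
toy_dl = [([1, 2], [0, 1]), ([3, 4], [1, 0])]
x, y = one_batch(toy_dl)
```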

In [None]:
show_doc(DataBunch.one_item)

<h4 id="DataBunch.one_item"><code>one_item</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L124" class="source_link">[source]</a></h4>

> <code>one_item</code>(`item`, `detach`:`bool`=`False`, `denorm`:`bool`=`False`)

Get `item` into a batch. Optionally `detach` and `denorm`.

In [None]:
show_doc(DataBunch.add_tfm)

<h4 id="DataBunch.add_tfm"><code>add_tfm</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L107" class="source_link">[source]</a></h4>

> <code>add_tfm</code>(`tfm`:`Callable`)

Adds a transform to all dataloaders.

In [None]:
show_doc(DeviceDataLoader, doc_string=False)

<h2 id="DeviceDataLoader"><code>class</code> <code>DeviceDataLoader</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L12" class="source_link">[source]</a></h2>

> <code>DeviceDataLoader</code>(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\[`Callable`\]=`None`, `collate_fn`:`Callable`=`'data_collate'`, `skip_size1`:`bool`=`False`)

Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. 
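
The idea of applying `tfms` as batches are drawn can be sketched with a small wrapper. `TfmLoader` and `scale` are hypothetical names, plain lists stand in for tensors, and the step that moves a batch to `device` is left out entirely, so this is a simplified picture rather than the class documented above.

```python
class TfmLoader:
    "Hypothetical wrapper: iterate over `dl`, passing each batch through `tfms`."
    def __init__(self, dl, tfms=None):
        self.dl, self.tfms = dl, list(tfms or [])

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for b in self.dl:
            for f in self.tfms:
                b = f(b)          # transforms run in order on every batch
            yield b

def scale(batch):
    "Toy 'normalization': divide every value by 10."
    return [v / 10 for v in batch]

wrapped = TfmLoader([[10, 20], [30, 40]], tfms=[scale])
out = list(wrapped)               # transforms are applied lazily, per batch
```

Because the transforms run inside `__iter__`, the underlying loader is untouched and each pass over `wrapped` reapplies them.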

### Factory method

In [None]:
show_doc(DeviceDataLoader.create, doc_string=False)

<h4 id="DeviceDataLoader.create"><code>create</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L52" class="source_link">[source]</a></h4>

> <code>create</code>(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cuda')`, `tfms`:`Collection`\[`Callable`\]=`None`, `num_workers`:`int`=`4`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)

Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to `True`, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization.

### Methods

In [None]:
show_doc(DeviceDataLoader.add_tfm)

<h4 id="DeviceDataLoader.add_tfm"><code>add_tfm</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L36" class="source_link">[source]</a></h4>

> <code>add_tfm</code>(`tfm`:`Callable`)

Add a transform (i.e. same as `self.tfms.append(tfm)`).

In [None]:
show_doc(DeviceDataLoader.remove_tfm)

<h4 id="DeviceDataLoader.remove_tfm"><code>remove_tfm</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L37" class="source_link">[source]</a></h4>

> <code>remove_tfm</code>(`tfm`:`Callable`)

Remove a transform.
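
Since `add_tfm` is described as equivalent to `self.tfms.append(tfm)`, the transforms behave like an ordered list. The sketch below assumes `remove_tfm` mirrors this with `list.remove`. `apply_all`, `double` and `shift` are hypothetical helpers used only for illustration.

```python
def double(batch):
    return [2 * v for v in batch]

def shift(batch):
    return [v + 1 for v in batch]

def apply_all(batch, tfms):
    "Apply each transform in list order (hypothetical helper)."
    for f in tfms:
        batch = f(batch)
    return batch

tfms = []
tfms.append(double)                   # what add_tfm(double) is described as doing
tfms.append(shift)
both = apply_all([1, 2], tfms)        # double first, then shift

tfms.remove(double)                   # assumed behaviour of remove_tfm(double)
only_shift = apply_all([1, 2], tfms)
```

Order matters: appending `shift` before `double` would give a different result for the same batch.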

In [None]:
show_doc(DatasetType, doc_string=False)

<h2 id="DatasetType">`DatasetType`</h2>

> <code>Enum</code> = [Train, Valid, Test, Single]

Internal enumerator to name the training, validation and test dataset/dataloader.

## Undocumented Methods - Methods moved below this line will intentionally be hidden

In [None]:
show_doc(DeviceDataLoader.proc_batch)

<h4 id="DeviceDataLoader.proc_batch"><code>proc_batch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L39" class="source_link">[source]</a></h4>

> <code>proc_batch</code>(`b`:`Tensor`) → `Tensor`

Process batch `b` of `TensorImage`.

## New Methods - Please document or move to the undocumented section

In [None]:
show_doc(DataBunch.export)

<h4 id="DataBunch.export"><code>export</code><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L143" class="source_link">[source]</a></h4>

> <code>export</code>(`fname`:`str`=`'export.pkl'`)

In [None]:
show_doc(DataBunch.load_empty)

<h4 id="_databunch_load_empty"><code>_databunch_load_empty</code><a href="https://github.com/fastai/fastai/blob/master/fastai/data_block.py#L499" class="source_link">[source]</a></h4>

> <code>_databunch_load_empty</code>(`path`, `fname`:`str`=`'export.pkl'`, `tfms`:`Union`\[`Callable`, `Collection`\[`Callable`\]\]=`None`, `tfm_y`:`bool`=`False`, `kwargs`)

In [None]:
show_doc(DeviceDataLoader.collate_fn)

<h4 id="data_collate"><code>data_collate</code><a href="https://github.com/fastai/fastai/blob/master/fastai/torch_core.py#L95" class="source_link">[source]</a></h4>

> <code>data_collate</code>(`batch`:`ItemsList`) → `Tensor`

Convert `batch` items to tensor data.  
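
As a rough picture of what collating means here, the sketch below unwraps objects that carry a `data` attribute and then groups the i-th field of every sample together. `Item`, `to_data` and `collate` are written for this example with plain lists. The real function returns tensors and relies on pytorch's collation, which this sketch does not reproduce.

```python
class Item:
    "Hypothetical wrapper object holding its raw values in `.data`."
    def __init__(self, data):
        self.data = data

def to_data(b):
    "Recursively replace wrapper objects by their `.data` attribute."
    if isinstance(b, (list, tuple)):
        return [to_data(o) for o in b]
    return b.data if hasattr(b, 'data') else b

def collate(batch):
    "Unwrap each sample, then group the i-th field of every sample together."
    samples = to_data(batch)
    return [list(field) for field in zip(*samples)]

# Two samples, each an (input, target) pair; inputs are wrapped, targets are raw ints.
batch = [(Item([1, 2]), 0), (Item([3, 4]), 1)]
xs, ys = collate(batch)
```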