{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get your data ready for training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class, that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [],
   "source": [
    "from fastai.gen_doc.nbdoc import *\n",
    "from fastai.basic_data import * "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"DataBunch\"><code>class</code> <code>DataBunch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L59\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>DataBunch</code>(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensures they are on `device` and apply to them `tfms` as batch are drawn. `path` is used internally to store temporary files, `collate_fn` is passed to the pytorch `Dataloader` (replacing the one there) to explain how to collate the samples picked for a batch. By default, it applies data to the object sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important). \n",
    "\n",
    "An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.create\"><code>create</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L81\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>create</code>(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`4`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.create, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and by using `num_workers`. `tfms` and `device` are passed to the init method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.show_batch\"><code>show_batch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L132\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>show_batch</code>(`rows`:`int`=`5`, `ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Train: 1>`, `kwargs`)\n",
       "\n",
       "Show a batch of data in `ds_type` on a few `rows`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.show_batch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.dl\"><code>dl</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L95\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>dl</code>(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Valid: 2>`) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)\n",
       "\n",
       "Returns appropriate [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) for validation, training, or test (`ds_type`).  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.dl)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.one_batch\"><code>one_batch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L110\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>one_batch</code>(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Train: 1>`, `detach`:`bool`=`True`, `denorm`:`bool`=`True`) → `Collection`\\[`Tensor`\\]\n",
       "\n",
       "Get one batch from the data loader of `ds_type`. Optionally `detach` and `denorm`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.one_batch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.one_item\"><code>one_item</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L124\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>one_item</code>(`item`, `detach`:`bool`=`False`, `denorm`:`bool`=`False`)\n",
       "\n",
       "Get ìtem` into a batch. Optionally `detach` and `denorm`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.one_item)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.add_tfm\"><code>add_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L107\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>add_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.add_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adds a transform to all dataloaders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"DeviceDataLoader\"><code>class</code> <code>DeviceDataLoader</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L12\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>DeviceDataLoader</code>(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\\[`Callable`\\]=`None`, `collate_fn`:`Callable`=`'data_collate'`, `skip_size1`:`bool`=`False`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Factory method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.create\"><code>create</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L52\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>create</code>(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cuda')`, `tfms`:`Collection`\\[`Callable`\\]=`None`, `num_workers`:`int`=`4`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.create, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers`processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.add_tfm\"><code>add_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L36\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>add_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.add_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add a transform (i.e. same as `self.tfms.append(tfm)`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.remove_tfm\"><code>remove_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L37\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>remove_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.remove_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remove a transform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"DatasetType\">`DatasetType`</h2>\n",
       "\n",
       "> <code>Enum</code> = [Train, Valid, Test, Single]"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DatasetType, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Internal enumerator to name the training, validation and test dataset/dataloader."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Undocumented Methods - Methods moved below this line will intentionally be hidden"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.proc_batch\"><code>proc_batch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L39\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>proc_batch</code>(`b`:`Tensor`) → `Tensor`\n",
       "\n",
       "Proces batch `b` of `TensorImage`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.proc_batch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New Methods - Please document or move to the undocumented section"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.export\"><code>export</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L143\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>export</code>(`fname`:`str`=`'export.pkl'`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.export)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"_databunch_load_empty\"><code>_databunch_load_empty</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/data_block.py#L499\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>_databunch_load_empty</code>(`path`, `fname`:`str`=`'export.pkl'`, `tfms`:`Union`\\[`Callable`, `Collection`\\[`Callable`\\]\\]=`None`, `tfm_y`:`bool`=`False`, `kwargs`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.load_empty)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"data_collate\"><code>data_collate</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/torch_core.py#L95\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>data_collate</code>(`batch`:`ItemsList`) → `Tensor`\n",
       "\n",
       "Convert `batch` items to tensor data.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.collate_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "jekyll": {
   "keywords": "fastai",
   "summary": "Basic classes to contain the data for model training.",
   "title": "basic_data"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}