{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get your data ready for training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class, that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "hide_input": true
   },
   "outputs": [],
   "source": [
    "from fastai.gen_doc.nbdoc import *\n",
    "from fastai.basic_data import * "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"DataBunch\"><code>class</code> <code>DataBunch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L100\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>DataBunch</code>(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensures they are on `device` and apply to them `tfms` as batch are drawn. `path` is used internally to store temporary files, `collate_fn` is passed to the pytorch `Dataloader` (replacing the one there) to explain how to collate the samples picked for a batch. By default, it applies data to the object sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important). \n",
    "\n",
    "An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.create\"><code>create</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L114\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>create</code>(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`16`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.create, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and by using `num_workers`. `tfms` and `device` are passed to the init method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.dl\"><code>dl</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L126\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>dl</code>(`ds_type`:[`DatasetType`](/basic_data.html#DatasetType)=`<DatasetType.Valid: 2>`) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)\n",
       "\n",
       "Returns appropriate [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) for validation, training, or test (`ds_type`).  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.dl)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DataBunch.add_tfm\"><code>add_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L133\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>add_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DataBunch.add_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adds a transform to all dataloaders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"DeviceDataLoader\"><code>class</code> <code>DeviceDataLoader</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L49\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>DeviceDataLoader</code>(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\\[`Callable`\\]=`None`, `collate_fn`:`Callable`=`'data_collate'`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Factory method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.create\"><code>create</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L93\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>create</code>(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cuda')`, `tfms`:`Collection`\\[`Callable`\\]=`None`, `num_workers`:`int`=`16`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.create, doc_string=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers`processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "hide_input": false
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.one_batch\"><code>one_batch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L85\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>one_batch</code>() → `Collection`\\[`Tensor`\\]\n",
       "\n",
       "Get one batch from the data loader.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.one_batch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.add_tfm\"><code>add_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L72\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>add_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.add_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add a transform (i.e. same as `self.tfms.append(tfm)`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.remove_tfm\"><code>remove_tfm</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L73\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>remove_tfm</code>(`tfm`:`Callable`)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.remove_tfm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remove a transform."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generic classes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first two last classes are just empty shell to be subclassed by one of the applications, the last one is there to create empty [`DataBunch`](/basic_data.html#DataBunch) (useful when we want a [`Learner`](/basic_train.html#Learner) on inference mode)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h3 id=\"DatasetBase\"><code>class</code> <code>DatasetBase</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L7\" class=\"source_link\">[source]</a></h3>\n",
       "\n",
       "> <code>DatasetBase</code>(`c`:`int`) :: [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)\n",
       "\n",
       "Base class for all fastai datasets.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DatasetBase, title_level=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h3 id=\"LabelDataset\"><code>class</code> <code>LabelDataset</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L25\" class=\"source_link\">[source]</a></h3>\n",
       "\n",
       "> <code>LabelDataset</code>(`classes`:`Collection`, `class2idx`:`Dict`\\[`Any`, `int`\\]=`None`) :: [`DatasetBase`](/basic_data.html#DatasetBase)\n",
       "\n",
       "Base class for fastai datasets that do classification, mapped according to `classes`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(LabelDataset, title_level=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h3 id=\"SingleClassificationDataset\"><code>class</code> <code>SingleClassificationDataset</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L39\" class=\"source_link\">[source]</a></h3>\n",
       "\n",
       "> <code>SingleClassificationDataset</code>(`classes`:`StrList`) :: [`DatasetBase`](/basic_data.html#DatasetBase)\n",
       "\n",
       "A [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) that contains no data, only `classes`, mainly used for inference with `set_item`  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(SingleClassificationDataset, title_level=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Undocumented Methods - Methods moved below this line will intentionally be hidden"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"DeviceDataLoader.proc_batch\"><code>proc_batch</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L75\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>proc_batch</code>(`b`:`Tensor`) → `Tensor`\n",
       "\n",
       "Proces batch `b` of `TensorImage`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.proc_batch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"data_collate\"><code>data_collate</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/torch_core.py#L89\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>data_collate</code>(`batch`:`ItemsList`) → `Tensor`\n",
       "\n",
       "Convert `batch` items to tensor data.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(DeviceDataLoader.collate_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New Methods - Please document or move to the undocumented section"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jekyll": {
   "keywords": "fastai",
   "summary": "Basic classes to contain the data for model training.",
   "title": "basic_data"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}