{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Get your data ready for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.basics import * " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class DataBunch[source][test]

\n", "\n", "> DataBunch(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***)\n", "\n", "
×

Tests found for DataBunch:

Some other tests where DataBunch is used:

To run tests please refer to this guide.

\n", "\n", "Bind `train_dl`, `valid_dl` and `test_dl` in a data object. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It also ensures all the dataloaders are on `device` and applies `dl_tfms` to them as batches are drawn (normalization, for example). `path` is used internally to store temporary files; `collate_fn` is passed to the PyTorch `DataLoader` (replacing the one there) to specify how to collate the samples picked into a batch. By default, it collates the `data` attribute of the objects sent (see [`vision.image`](/vision.image.html#vision.image) or the [data block API](/data_block.html) for why this can be important). \n", "\n", "`train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Factory method" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
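{ "cell_type": "markdown", "metadata": {}, "source": [ "For example, two plain PyTorch `DataLoader` objects can be bound directly with the init above; both get wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader). This is only a minimal sketch with made-up tensors, not an example from the library:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch.utils.data import DataLoader, TensorDataset\n", "from fastai.basics import *\n", "\n", "# two tiny synthetic datasets wrapped in plain PyTorch DataLoaders\n", "x_trn, y_trn = torch.randn(64, 4), torch.randint(0, 2, (64,))\n", "x_val, y_val = torch.randn(16, 4), torch.randint(0, 2, (16,))\n", "train_dl = DataLoader(TensorDataset(x_trn, y_trn), batch_size=8, shuffle=True)\n", "valid_dl = DataLoader(TensorDataset(x_val, y_val), batch_size=8)\n", "\n", "data = DataBunch(train_dl, valid_dl)  # batches are moved to data.device as they are drawn" ] },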

create[source][test]

\n", "\n", "> create(**`train_ds`**:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), **`valid_ds`**:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), **`test_ds`**:`Optional`\\[[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`bs`**:`int`=***`64`***, **`val_bs`**:`int`=***`None`***, **`num_workers`**:`int`=***`8`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***, **\\*\\*`dl_kwargs`**) → `DataBunch`\n", "\n", "
×

Tests found for create:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_Create [source]
  • pytest -sv tests/test_basic_data.py::test_DataBunch_no_valid_dl [source]

Some other tests where create is used:

  • pytest -sv tests/test_basic_data.py::test_DeviceDataLoader_getitem [source]

To run tests please refer to this guide.

\n", "\n", "Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and maybe `test_ds` with a batch size of `bs`. Passes `**dl_kwargs` to `DataLoader()` " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.create)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`num_workers` is the number of CPU worker processes to use; `dl_tfms`, `device` and `collate_fn` are passed on to the init method. The toy `ArrayDataset` example further down this page shows this factory method in action." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
Warning: You can pass regular pytorch Dataset here, but they'll require more attributes than the basic ones to work with the library. See below for more details.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "jekyll_warn(\"You can pass regular pytorch Dataset here, but they'll require more attributes than the basic ones to work with the library. See below for more details.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualization" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

show_batch[source][test]

\n", "\n", "> show_batch(**`rows`**:`int`=***`5`***, **`ds_type`**:[`DatasetType`](/basic_data.html#DatasetType)=***``***, **`reverse`**:`bool`=***`False`***, **\\*\\*`kwargs`**)\n", "\n", "
×

Tests found for show_batch:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_show_batch [source]

To run tests please refer to this guide.

\n", "\n", "Show a batch of data in `ds_type` on a few `rows`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.show_batch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grabbing some data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

dl[source][test]

\n", "\n", "> dl(**`ds_type`**:[`DatasetType`](/basic_data.html#DatasetType)=***``***) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)\n", "\n", "
×

No tests found for dl. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Returns an appropriate [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) with a dataset for validation, training, or test (`ds_type`). " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.dl)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

one_batch[source][test]

\n", "\n", "> one_batch(**`ds_type`**:[`DatasetType`](/basic_data.html#DatasetType)=***``***, **`detach`**:`bool`=***`True`***, **`denorm`**:`bool`=***`True`***, **`cpu`**:`bool`=***`True`***) → `Collection`\\[`Tensor`\\]\n", "\n", "
×

Tests found for one_batch:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_onebatch [source]
  • pytest -sv tests/test_basic_data.py::test_DataBunch_save_load [source]
  • pytest -sv tests/test_text_data.py::test_backwards_cls_databunch [source]
  • pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_1 [source]
  • pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_2 [source]

To run tests please refer to this guide.

\n", "\n", "Get one batch from the data loader of `ds_type`. Optionally `detach` and `denorm`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.one_batch)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
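{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sketch (continuing the small `DataBunch` built from plain dataloaders earlier on this page), you can grab a single batch with [`DataBunch.one_batch`](/basic_data.html#DataBunch.one_batch) or get the wrapped dataloader itself with [`DataBunch.dl`](/basic_data.html#DataBunch.dl):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xb, yb = data.one_batch(ds_type=DatasetType.Valid)  # one validation batch, detached and on the CPU\n", "valid_dl = data.dl(DatasetType.Valid)               # the underlying DeviceDataLoader\n", "xb.shape, yb.shape" ] },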

one_item[source][test]

\n", "\n", "> one_item(**`item`**, **`detach`**:`bool`=***`False`***, **`denorm`**:`bool`=***`False`***, **`cpu`**:`bool`=***`False`***)\n", "\n", "
×

Tests found for one_item:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_oneitem [source]

To run tests please refer to this guide.

\n", "\n", "Get `item` into a batch. Optionally `detach` and `denorm`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.one_item)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

sanity_check[source][test]

\n", "\n", "> sanity_check()\n", "\n", "
×

No tests found for sanity_check. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Check the underlying data in the training set can be properly loaded. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.sanity_check)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load and save" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can save your [`DataBunch`](/basic_data.html#DataBunch) object for future use with this method." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

save[source][test]

\n", "\n", "> save(**`file`**:`PathLikeOrBinaryStream`=***`'data_save.pkl'`***)\n", "\n", "
×

Tests found for save:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_save_load [source]

To run tests please refer to this guide.

\n", "\n", "Save the [`DataBunch`](/basic_data.html#DataBunch) in `self.path/file`. `file` can be file-like (file or buffer) " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.save)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

load_data[source][test]

\n", "\n", "> load_data(**`path`**:`PathOrStr`, **`file`**:`PathLikeOrBinaryStream`=***`'data_save.pkl'`***, **`bs`**:`int`=***`64`***, **`val_bs`**:`int`=***`None`***, **`num_workers`**:`int`=***`8`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n", "\n", "
×

Tests found for load_data:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_save_load [source]
  • pytest -sv tests/test_text_data.py::test_load_and_save_test [source]

To run tests please refer to this guide.

\n", "\n", "Load a saved [`DataBunch`](/basic_data.html#DataBunch) from `path/file`. `file` can be file-like (file or buffer) " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(load_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
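{ "cell_type": "markdown", "metadata": {}, "source": [ "A typical save/load round trip looks like the sketch below. It uses the small MNIST sample from the vision application purely for illustration; any `DataBunch` built with the [data block API](/data_block.html) behaves the same way:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fastai.vision import *\n", "\n", "path = untar_data(URLs.MNIST_SAMPLE)     # small sample dataset used throughout the fastai docs\n", "data = ImageDataBunch.from_folder(path)\n", "data.save('tmp_data.pkl')                # written to path/'tmp_data.pkl'\n", "\n", "# arguments such as bs aren't stored, so pass them again when reloading\n", "data = load_data(path, 'tmp_data.pkl', bs=32)" ] },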
Important: The arguments you passed when you created your first `DataBunch` aren't saved, so you should pass them here if you don't want the default.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "jekyll_important(\"The arguments you passed when you created your first `DataBunch` aren't saved, so you should pass them here if you don't want the default.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
Note: Data cannot be serialized on Windows and then loaded on Linux or vice versa because `Path` object doesn't support this. We will find a workaround for that in v2.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "jekyll_note(\"Data cannot be serialized on Windows and then loaded on Linux or vice versa because `Path` object doesn't support this. We will find a workaround for that in v2.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is to allow you to easily create a new [`DataBunch`](/basic_data.html#DataBunch) with a different batch size for instance. You will also need to reapply any normalization (in vision) you might have done on your original [`DataBunch`](/basic_data.html#DataBunch)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Empty [`DataBunch`](/basic_data.html#DataBunch) for inference" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

export[source][test]

\n", "\n", "> export(**`file`**:`PathLikeOrBinaryStream`=***`'export.pkl'`***)\n", "\n", "
×

No tests found for export. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Export the minimal state of `self` for inference in `self.path/file`. `file` can be file-like (file or buffer) " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.export)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

load_empty[source][test]

\n", "\n", "> load_empty(**`path`**, **`fname`**:`str`=***`'export.pkl'`***)\n", "\n", "
×

No tests found for _databunch_load_empty. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Load an empty [`DataBunch`](/basic_data.html#DataBunch) from the exported file in `path/fname` with optional `tfms`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.load_empty, full_name='load_empty')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This method should be used to create a [`DataBunch`](/basic_data.html#DataBunch) at inference, see the corresponding [tutorial](/tutorial.inference.html)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
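{ "cell_type": "markdown", "metadata": {}, "source": [ "As a sketch (reusing the `data` object from the save/load example above), the export/load_empty round trip for inference looks like this:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.export('export.pkl')                     # minimal state written to data.path/'export.pkl'\n", "empty_data = DataBunch.load_empty(data.path)  # an empty DataBunch, ready to be used at inference" ] },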

add_test[source][test]

\n", "\n", "> add_test(**`items`**:`Iterator`\\[`T_co`\\], **`label`**:`Any`=***`None`***, **`tfms`**=***`None`***, **`tfm_y`**=***`None`***)\n", "\n", "
×

No tests found for add_test. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Add the `items` as a test set. Pass along `label` otherwise label them with [`EmptyLabel`](/core.html#EmptyLabel). " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.add_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dataloader transforms" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source][test]

\n", "\n", "> add_tfm(**`tfm`**:`Callable`)\n", "\n", "
×

No tests found for add_tfm. To contribute a test please refer to this guide and this discussion.

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DataBunch.add_tfm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adds a transform to all dataloaders." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a custom Dataset in fastai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to use your pytorch [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) in fastai, you may need to implement more attributes/methods if you want to use the full functionality of the library. Some functions can easily be used with your pytorch [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) if you just add an attribute, for others, the best would be to create your own [`ItemList`](/data_block.html#ItemList) by following [this tutorial](/tutorial.itemlist.html). Here is a full list of what the library will expect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First of all, you obviously need to implement the methods `__len__` and `__getitem__`, as indicated by the pytorch docs. Then the most needed things would be:\n", "- `c` attribute: it's used in most functions that directly create a [`Learner`](/basic_train.html#Learner) ([`tabular_learner`](/tabular.learner.html#tabular_learner), [`text_classifier_learner`](/text.learner.html#text_classifier_learner), [`unet_learner`](/vision.learner.html#unet_learner), [`cnn_learner`](/vision.learner.html#cnn_learner)) and represents the number of outputs of the final layer of your model (also the number of classes if applicable).\n", "- `classes` attribute: it's used by [`ClassificationInterpretation`](/train.html#ClassificationInterpretation) and also in [`collab_learner`](/collab.html#collab_learner) (best to use [`CollabDataBunch.from_df`](/collab.html#CollabDataBunch.from_df) than a pytorch [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)) and represents the unique tags that appear in your data.\n", "- maybe a `loss_func` attribute: that is going to be used by [`Learner`](/basic_train.html#Learner) as a default loss function, so if you know your custom [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) requires a particular loss, you can put it.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Toy example with image-like numpy arrays and binary label" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class ArrayDataset(Dataset):\n", " \"Sample numpy array dataset\"\n", " def __init__(self, x, y):\n", " self.x, self.y = x, y\n", " self.c = 2 # binary label\n", " \n", " def __len__(self):\n", " return len(self.x)\n", " \n", " def __getitem__(self, i):\n", " return self.x[i], self.y[i]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor([[[0.8053, 0.5914, 0.5369],\n", " [0.6880, 0.4680, 0.5457],\n", " [0.0051, 0.2096, 0.3469]],\n", " \n", " [[0.5170, 0.2542, 0.9869],\n", " [0.0176, 0.5049, 0.4417],\n", " [0.3495, 0.7276, 0.5426]]], dtype=torch.float64), tensor([[0.],\n", " [0.]], dtype=torch.float64))" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_x = np.random.rand(10, 3, 3) # 10 images (3x3)\n", "train_y = np.random.rand(10, 1).round() # binary label\n", "\n", "valid_x = 
np.random.rand(10, 3, 3)\n", "valid_y = np.random.rand(10, 1).round()\n", "\n", "train_ds, valid_ds = ArrayDataset(train_x, train_y), ArrayDataset(valid_x, valid_y)\n", "data = DataBunch.create(train_ds, valid_ds, bs=2, num_workers=1)\n", "data.one_batch()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For a specific application" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In text, your dataset will need to have a `vocab` attribute that should be an instance of [`Vocab`](/text.transform.html#Vocab). It's used by [`text_classifier_learner`](/text.learner.html#text_classifier_learner) and [`language_model_learner`](/text.learner.html#language_model_learner) when building the model.\n", "\n", "In tabular, your dataset will need to have a `cont_names` attribute (for the names of continuous variables) and a `get_emb_szs` method that returns a list of tuple `(n_classes, emb_sz)` representing, for each categorical variable, the number of different codes (don't forget to add 1 for nan) and the corresponding embedding size. Those two are used with the `c` attribute by [`tabular_learner`](/tabular.learner.html#tabular_learner). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions that really won't work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make those last functions work, you really need to use the [data block API](/data_block.html) and maybe write your own [custom ItemList](/tutorial.itemlist.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [`DataBunch.show_batch`](/basic_data.html#DataBunch.show_batch) (requires `.x.reconstruct`, `.y.reconstruct` and `.x.show_xys`)\n", "- [`Learner.predict`](/basic_train.html#Learner.predict) (requires `x.set_item`, `.y.analyze_pred`, `.y.reconstruct` and maybe `.x.reconstruct`)\n", "- [`Learner.show_results`](/basic_train.html#Learner.show_results) (requires `x.reconstruct`, `y.analyze_pred`, `y.reconstruct` and `x.show_xyzs`)\n", "- `DataBunch.set_item` (requires `x.set_item`)\n", "- [`Learner.backward`](/basic_train.html#Learner.backward) (uses `DataBunch.set_item`)\n", "- [`DataBunch.export`](/basic_data.html#DataBunch.export) (requires `export`)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
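{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the attributes listed above concrete, here is an illustrative sketch (not part of the library) of a classification variant of the toy dataset, exposing `c`, `classes` and `loss_func`; application-specific attributes such as `vocab` or `cont_names`/`get_emb_szs` would be added in the same way when needed:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "from torch.utils.data import Dataset\n", "\n", "class ArrayClassificationDataset(Dataset):\n", "    # like ArrayDataset above, plus the attributes fastai looks for\n", "    def __init__(self, x, y, classes):\n", "        self.x, self.y, self.classes = x, y, classes\n", "        self.c = len(classes)                   # number of outputs of the final layer\n", "        self.loss_func = nn.CrossEntropyLoss()  # default loss picked up by Learner\n", "    def __len__(self): return len(self.x)\n", "    def __getitem__(self, i): return torch.tensor(self.x[i]), torch.tensor(self.y[i])" ] },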

class DeviceDataLoader[source][test]

\n", "\n", "> DeviceDataLoader(**`dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), **`tfms`**:`List`\\[`Callable`\\]=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***)\n", "\n", "
×

Tests found for DeviceDataLoader:

Some other tests where DeviceDataLoader is used:

  • pytest -sv tests/test_basic_data.py::test_DeviceDataLoader_getitem [source]

To run tests please refer to this guide.

\n", "\n", "Bind a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) to a [`torch.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device). " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` replaces the collate function of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Factory method" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
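{ "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the factory method documented next, with made-up data and the device forced to the CPU so it runs anywhere:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch.utils.data import TensorDataset\n", "from fastai.basics import *\n", "\n", "ds = TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,)))\n", "ddl = DeviceDataLoader.create(ds, bs=8, shuffle=True, device=torch.device('cpu'), num_workers=0)\n", "xb, yb = next(iter(ddl))  # the batch comes back already on the requested device" ] },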

create[source][test]

\n", "\n", "> create(**`dataset`**:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), **`bs`**:`int`=***`64`***, **`shuffle`**:`bool`=***`False`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`device(type='cpu')`***, **`tfms`**:`Collection`\\[`Callable`\\]=***`None`***, **`num_workers`**:`int`=***`8`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **\\*\\*`kwargs`**:`Any`)\n", "\n", "
×

Tests found for create:

Some other tests where create is used:

  • pytest -sv tests/test_basic_data.py::test_DataBunch_Create [source]
  • pytest -sv tests/test_basic_data.py::test_DataBunch_no_valid_dl [source]
  • pytest -sv tests/test_basic_data.py::test_DeviceDataLoader_getitem [source]

To run tests please refer to this guide.

\n", "\n", "Create DeviceDataLoader from `dataset` with `bs` and `shuffle`: process using `num_workers`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.create)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The given `collate_fn` will be used to put the samples together in one batch (by default it grabs their `data` attribute). If `shuffle` is `True`, the dataloader draws the samples in a random order; otherwise it goes through them in order. `tfms` are passed to the init method. All `kwargs` are passed to the PyTorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Methods" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

add_tfm[source][test]

\n", "\n", "> add_tfm(**`tfm`**:`Callable`)\n", "\n", "
×

No tests found for add_tfm. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Add `tfm` to `self.tfms`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.add_tfm)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

remove_tfm[source][test]

\n", "\n", "> remove_tfm(**`tfm`**:`Callable`)\n", "\n", "
×

No tests found for remove_tfm. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Remove `tfm` from `self.tfms`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.remove_tfm)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
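{ "cell_type": "markdown", "metadata": {}, "source": [ "For example (a sketch continuing the `ddl` created above), a dataloader transform is just a callable that takes a batch and returns a batch:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def center_inputs(b):\n", "    # batch transform: subtract the batch mean from the inputs, leave the targets alone\n", "    x, y = b\n", "    return x - x.mean(), y\n", "\n", "ddl.add_tfm(center_inputs)     # applied to every batch drawn from now on\n", "xb, yb = next(iter(ddl))\n", "ddl.remove_tfm(center_inputs)  # and taken off again" ] },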

new[source][test]

\n", "\n", "> new(**\\*\\*`kwargs`**)\n", "\n", "
×

No tests found for new. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Create a new copy of `self` with `kwargs` replacing current values. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.new)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

proc_batch[source][test]

\n", "\n", "> proc_batch(**`b`**:`Tensor`) → `Tensor`\n", "\n", "
×

No tests found for proc_batch. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Process batch `b` of `TensorImage`. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.proc_batch)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

`DatasetType`[test]

\n", "\n", "> Enum = [Train, Valid, Test, Single, Fix]\n", "\n", "
×

No tests found for DatasetType. To contribute a test please refer to this guide and this discussion.

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DatasetType, doc_string=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internal enumerator to name the training, validation and test dataset/dataloader." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Undocumented Methods - Methods moved below this line will intentionally be hidden" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "

data_collate[source][test]

\n", "\n", "> data_collate(**`batch`**:`ItemsList`) → `Tensor`\n", "\n", "
×

No tests found for data_collate. To contribute a test please refer to this guide and this discussion.

\n", "\n", "Convert `batch` items to tensor data. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(DeviceDataLoader.collate_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Methods - Please document or move to the undocumented section" ] } ], "metadata": { "jekyll": { "keywords": "fastai", "summary": "Basic classes to contain the data for model training.", "title": "basic_data" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 2 }