## datasets

This module provides the functions needed to download several useful datasets for use in our models.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.datasets import * 
from fastai.datasets import Config
from pathlib import Path

In [None]:
show_doc(URLs)

<h2 id="URLs" class="doc_header"><code>class</code> <code>URLs</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L8" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#URLs-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>URLs</code>()

<div class="collapse" id="URLs-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#URLs-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>URLs</code>:</p><p>Some other tests where <code>URLs</code> is used:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L43" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Global constants for dataset and model URLs.  

This contains all the datasets' and models' URLs, and some classmethods to help use them - you don't create objects of this class. The supported datasets are (with their calling name): `S3_NLP`, `S3_COCO`, `MNIST_SAMPLE`, `MNIST_TINY`, `IMDB_SAMPLE`, `ADULT_SAMPLE`, `ML_SAMPLE`, `PLANET_SAMPLE`, `CIFAR`, `PETS`, `MNIST`. To get details on the datasets you can see the [fast.ai datasets webpage](http://course.fast.ai/datasets). Datasets with SAMPLE in their name are subsets of the original datasets. In the case of MNIST, we also have a TINY dataset which is even smaller than MNIST_SAMPLE.

Models are currently limited to `WT103`, but you can expect more in the future!

In [None]:
URLs.MNIST_SAMPLE

'http://files.fast.ai/data/examples/mnist_sample'

## Downloading Data

For the rest of the datasets you will need to download them with [`untar_data`](/datasets.html#untar_data) or [`download_data`](/datasets.html#download_data). [`untar_data`](/datasets.html#untar_data) downloads the data file and then decompresses it, while [`download_data`](/datasets.html#download_data) only downloads the compressed file and saves it in `.tgz` format.

The locations where the data and models are downloaded are set in `config.yml`, which by default is located in `~/.fastai`. This directory can be changed via the optional environment variable `FASTAI_HOME` (e.g. `FASTAI_HOME=/home/.fastai`).

If no `config.yml` is present in the specified directory, a default one will be created with `data_archive_path`, `data_path` and `models_path` entries. The `data_path` and `models_path` entries point to the `data` and `models` folders, respectively, in the same directory as `config.yml`. The `data_archive_path` entry lets you set a separate folder in which to keep compressed datasets for archiving purposes. It defaults to the same directory as `data_path`.

Configure those download locations by editing `data_archive_path`, `data_path` and `models_path` in `config.yml`.
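The lookup described above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the library's code: `config_dir` and `default_entries` are hypothetical helper names, and the environment is passed in as a plain mapping so the example does not depend on your real settings.

```python
import os
from pathlib import Path

def config_dir(env=None):
    """Return the directory expected to hold config.yml (hypothetical helper)."""
    env = os.environ if env is None else env
    # FASTAI_HOME is optional; fall back to ~/.fastai when it is unset.
    return Path(env.get('FASTAI_HOME', '~/.fastai')).expanduser()

def default_entries(cfg_dir):
    """Build the three default entries, all located next to config.yml."""
    data = cfg_dir / 'data'
    return {
        'data_archive_path': str(data),   # archives default to the data folder
        'data_path': str(data),
        'models_path': str(cfg_dir / 'models'),
    }

home = config_dir({'FASTAI_HOME': '/tmp/my_fastai'})
for key, value in default_entries(home).items():
    print(f'{key}: {value}')
```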

In [None]:
show_doc(untar_data)

<h4 id="untar_data" class="doc_header"><code>untar_data</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L224" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#untar_data-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>untar_data</code>(**`url`**:`str`, **`fname`**:`PathOrStr`=***`None`***, **`dest`**:`PathOrStr`=***`None`***, **`data`**=***`True`***, **`force_download`**=***`False`***) → `Path`

<div class="collapse" id="untar_data-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#untar_data-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>untar_data</code>:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_load_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L26" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L42" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_vision_data.py::test_trunc_download</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_vision_data.py#L170" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>untar_data</code> is used:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L43" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Download `url` to `fname` if `dest` doesn't exist, and un-tgz to folder `dest`.  

In general, [`untar_data`](/datasets.html#untar_data) uses a `url` to download a `tgz` file under `fname`, and then un-tgz `fname` into a folder under `dest`. 

If you have run [`untar_data`](/datasets.html#untar_data) before, then running `untar_data(URLs.something)` again will simply return `dest` without downloading again.

If you have run [`untar_data`](/datasets.html#untar_data) before, running it again with `force_download=True`, or when the tgz file under `fname` is corrupted, will remove the existing `fname` and `dest` and download again.

If you have run [`untar_data`](/datasets.html#untar_data) before but `dest` no longer exists (the folder may have been removed or renamed), then running `untar_data(URLs.something)` again will call [`download_data`](/datasets.html#download_data). If the tgz file under `fname` still exists, nothing is downloaded and `fname` is simply extracted into `dest`; if `fname` does not exist, the tgz file is downloaded first.

**Note**: the `url` you feed to [`untar_data`](/datasets.html#untar_data) must be one of `URLs.something`.
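The cases above can be summarised as a small control-flow sketch. It is not the library's implementation: `untar_sketch` and `fake_fetch` are hypothetical names, the download is replaced by a local stand-in so the example runs offline, and corruption is approximated here by "the file is not a readable tar archive", which is an assumption about what "corrupted" means.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

def untar_sketch(fname, dest, fetch, force_download=False):
    """Follow the documented cases; `fetch(fname)` stands in for the real download."""
    fname, dest = Path(fname), Path(dest)
    # Assumption: an archive that cannot be read as a tar file counts as corrupted.
    corrupted = fname.exists() and not tarfile.is_tarfile(str(fname))
    if force_download or corrupted:
        # Start over: drop both the archive and the extracted folder.
        if fname.exists():
            fname.unlink()
        shutil.rmtree(dest, ignore_errors=True)
    if not dest.exists():
        if not fname.exists():
            fetch(fname)                  # download only when the archive is missing
        with tarfile.open(fname, 'r:gz') as tar:
            tar.extractall(dest.parent)   # the archive contains a folder named like dest
    return dest

def fake_fetch(fname):
    """Hypothetical downloader: builds a tiny archive locally instead of using the network."""
    fake_fetch.calls += 1
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / 'sample'
        folder.mkdir()
        (folder / 'a.txt').write_text('hello')
        with tarfile.open(fname, 'w:gz') as tar:
            tar.add(folder, arcname='sample')
fake_fetch.calls = 0

root = Path(tempfile.mkdtemp())
dest = untar_sketch(root / 'sample.tgz', root / 'sample', fake_fetch)  # downloads and extracts
untar_sketch(root / 'sample.tgz', root / 'sample', fake_fetch)         # dest exists: returned as-is
shutil.rmtree(dest)
untar_sketch(root / 'sample.tgz', root / 'sample', fake_fetch)         # re-extracts, no new download
print(fake_fetch.calls)
```

Passing the downloader in as an argument keeps the sketch independent of any network access; the real function takes a `url` instead.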

In [None]:
untar_data(URLs.PLANET_SAMPLE)

PosixPath('/home/ubuntu/.fastai/data/planet_sample')

In [None]:
show_doc(download_data)

<h4 id="download_data" class="doc_header"><code>download_data</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L209" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#download_data-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>download_data</code>(**`url`**:`str`, **`fname`**:`PathOrStr`=***`None`***, **`data`**:`bool`=***`True`***, **`ext`**:`str`=***`'.tgz'`***) → `Path`

<div class="collapse" id="download_data-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#download_data-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>download_data</code>:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_load_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L26" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L42" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>download_data</code> is used:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L43" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Download `url` to destination `fname`.  

Note: If the data file already exists in a <code>data</code> directory next to the notebook, that file will be used instead of the one in the folder specified in `config.yml`. `config.yml` is located in the directory specified in the optional environment variable `FASTAI_HOME` (defaults to `~/.fastai/`). Paths are resolved by calling [`datapath4file`](/datasets.html#datapath4file), which first checks whether the data exists locally (`data/`) before falling back to the folder specified in `config.yml`.

Example:

In [None]:
download_data(URLs.PLANET_SAMPLE)

PosixPath('/home/ubuntu/.fastai/data/planet_sample.tgz')

In [None]:
show_doc(datapath4file)

<h4 id="datapath4file" class="doc_header"><code>datapath4file</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L202" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#datapath4file-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>datapath4file</code>(**`filename`**:`str`, **`ext`**:`str`=***`'.tgz'`***, **`archive`**=***`True`***)

<div class="collapse" id="datapath4file-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#datapath4file-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>datapath4file</code>:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_load_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L26" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L42" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Return data path to `filename`, checking locally first then in the config file.  

All the downloading functions use this to decide where to put the tgz and expanded folder. If `filename` already exists in a <code>data</code> directory in the same place as the calling notebook/script, that is used as the parent directory; otherwise, `config.yml` is read to see what path to use, which defaults to <code>~/.fastai/data</code>. To override this default, simply modify the value in your `config.yml`:

    data_archive_path: ~/.fastai/data
    data_path: ~/.fastai/data

`config.yml` is located in the directory specified in the optional environment variable `FASTAI_HOME` (defaults to `~/.fastai/`).
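As a rough illustration of the "local first, then config" rule, the sketch below takes both candidate folders as explicit arguments. `datapath_sketch` and its parameters are hypothetical; the real function reads the configured folder from `config.yml` rather than receiving it as an argument.

```python
import tempfile
from pathlib import Path

def datapath_sketch(filename, local_dir, config_data_dir, ext='.tgz'):
    """Prefer `local_dir` when it already holds the data, else use the configured folder."""
    local = Path(local_dir) / filename
    # Either the extracted folder or its archive may already be present locally.
    if local.exists() or local.with_name(local.name + ext).exists():
        return local
    return Path(config_data_dir) / filename

root = Path(tempfile.mkdtemp())
local_dir, config_dir = root / 'data', root / 'fastai_home' / 'data'
print(datapath_sketch('planet_sample', local_dir, config_dir))  # falls back to the configured folder
(local_dir / 'planet_sample').mkdir(parents=True)
print(datapath_sketch('planet_sample', local_dir, config_dir))  # now the local copy is used
```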

In [None]:
show_doc(url2path)

<h4 id="url2path" class="doc_header"><code>url2path</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L188" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#url2path-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>url2path</code>(**`url`**, **`data`**=***`True`***, **`ext`**:`str`=***`'.tgz'`***)

<div class="collapse" id="url2path-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#url2path-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>url2path</code>:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_load_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L26" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L42" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Change `url` to a path.  
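The outputs shown earlier in this page suggest that the dataset name is the last component of the URL, and that the archive path is that name plus the extension. The sketch below encodes that reading; `name_from_url`, `path_from_url` and the `data_dir` argument are hypothetical and only approximate what the library does.

```python
from pathlib import Path

def name_from_url(url):
    """Take the last path component of the URL as the dataset name."""
    return url.rstrip('/').split('/')[-1]

def path_from_url(url, data_dir, ext=''):
    """Join the dataset name (plus an optional extension) onto `data_dir`."""
    return Path(data_dir) / (name_from_url(url) + ext)

url = 'http://files.fast.ai/data/examples/mnist_sample'
print(name_from_url(url))                                        # mnist_sample
print(path_from_url(url, '/home/ubuntu/.fastai/data'))           # extracted folder
print(path_from_url(url, '/home/ubuntu/.fastai/data', '.tgz'))   # compressed archive
```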

In [None]:
show_doc(Config)

<h2 id="Config" class="doc_header"><code>class</code> <code>Config</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L131" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-pytest" style="float:right; padding-right:10px">[test]</a></h2>

> <code>Config</code>()

<div class="collapse" id="Config-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>Config</code>:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_creates_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L15" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_default_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L29" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_load_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L26" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L42" class="source_link" style="float:right">[source]</a></li></ul><p>Some other tests where <code>Config</code> is used:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_user_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L43" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Creates a default config file 'config.yml' in $FASTAI_HOME (default `~/.fastai/`)  

You probably won't need to use this yourself - it's used by [`datapath4file`](/datasets.html#datapath4file).

In [None]:
show_doc(Config.get_path)

<h4 id="Config.get_path" class="doc_header"><code>get_path</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L146" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-get_path-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>get_path</code>(**`path`**)

<div class="collapse" id="Config-get_path-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-get_path-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>get_path</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Get the `path` in the config file.  

Get the key corresponding to `path` in the [`Config`](/datasets.html#Config).
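Conceptually, this amounts to looking up an entry and returning it as a `Path`. A minimal sketch, assuming the config has already been loaded into a plain dictionary (`get_path_sketch` is a hypothetical name, and the real method reads `config.yml` itself):

```python
from pathlib import Path

def get_path_sketch(config, key):
    """Look up `key` in a config mapping and return it as an expanded Path."""
    return Path(config[key]).expanduser()

# Assumed contents, mirroring the default entries described above.
config = {'data_path': '~/.fastai/data', 'models_path': '~/.fastai/models'}
print(get_path_sketch(config, 'data_path'))
```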

In [None]:
show_doc(Config.data_path)

<h4 id="Config.data_path" class="doc_header"><code>data_path</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L151" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-data_path-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>data_path</code>()

<div class="collapse" id="Config-data_path-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-data_path-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>data_path</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Get the path to data in the config file.  

Get the `Path` where the data is stored.

In [None]:
show_doc(Config.model_path)

<h4 id="Config.model_path" class="doc_header"><code>model_path</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L161" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-model_path-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>model_path</code>()

<div class="collapse" id="Config-model_path-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-model_path-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>model_path</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Get the path to fastai pretrained models in the config file.  

## Undocumented Methods - Methods moved below this line will intentionally be hidden

In [None]:
show_doc(Config.create)

<h4 id="Config.create" class="doc_header"><code>create</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L174" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-create-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>create</code>(**`fpath`**)

<div class="collapse" id="Config-create-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-create-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>create</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Creates a [`Config`](/datasets.html#Config) from `fpath`.  

In [None]:
show_doc(url2name)

<h4 id="url2name" class="doc_header"><code>url2name</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L185" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#url2name-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>url2name</code>(**`url`**)

<div class="collapse" id="url2name-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#url2name-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>url2name</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

In [None]:
show_doc(Config.get_key)

<h4 id="Config.get_key" class="doc_header"><code>get_key</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L141" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-get_key-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>get_key</code>(**`key`**)

<div class="collapse" id="Config-get_key-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-get_key-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>No tests found for <code>get_key</code>. To contribute a test please refer to <a href="/dev/test.html">this guide</a> and <a href="https://forums.fast.ai/t/improving-expanding-functional-tests/32929">this discussion</a>.</p></div></div>

Get the path to `key` in the config file.  

In [None]:
show_doc(Config.get)

<h4 id="Config.get" class="doc_header"><code>get</code><a href="https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L166" class="source_link" style="float:right">[source]</a><a class="source_link" data-toggle="collapse" data-target="#Config-get-pytest" style="float:right; padding-right:10px">[test]</a></h4>

> <code>get</code>(**`fpath`**=***`None`***, **`create_missing`**=***`True`***)

<div class="collapse" id="Config-get-pytest"><div class="card card-body pytest_card"><a type="button" data-toggle="collapse" data-target="#Config-get-pytest" class="close" aria-label="Close"><span aria-hidden="true">&times;</span></a><p>Tests found for <code>get</code>:</p><p>Some other tests where <code>get</code> is used:</p><ul><li><code>pytest -sv tests/test_datasets.py::test_creates_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L15" class="source_link" style="float:right">[source]</a></li><li><code>pytest -sv tests/test_datasets.py::test_default_config</code> <a href="https://github.com/fastai/fastai/blob/master/tests/test_datasets.py#L29" class="source_link" style="float:right">[source]</a></li></ul><p>To run tests please refer to this <a href="/dev/test.html#quick-guide">guide</a>.</p></div></div>

Retrieve the [`Config`](/datasets.html#Config) in `fpath`.  
