{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module has the necessary functions to be able to download several useful datasets that we might be interested in using in our models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [],
   "source": [
    "from fastai.gen_doc.nbdoc import *\n",
    "from fastai.datasets import * \n",
    "from fastai.datasets import Config\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"URLs\"><code>class</code> <code>URLs</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L8\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>URLs</code>()\n",
       "\n",
       "Global constants for dataset and model URLs.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(URLs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This contains all the datasets' and models' URLs, and some classmethods to help use them - you don't create objects of this class. The supported datasets are (with their calling name): `S3_NLP`, `S3_COCO`, `MNIST_SAMPLE`, `MNIST_TINY`, `IMDB_SAMPLE`, `ADULT_SAMPLE`, `ML_SAMPLE`, `PLANET_SAMPLE`, `CIFAR`, `PETS`, `MNIST`. To get details on the datasets you can see the [fast.ai datasets webpage](http://course.fast.ai/datasets). Datasets with SAMPLE in their name are subsets of the original datasets. In the case of MNIST, we also have a TINY dataset which is even smaller than MNIST_SAMPLE.\n",
    "\n",
    "Models is now limited to `WT103` but you can expect more in the future!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'http://files.fast.ai/data/examples/mnist_sample'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URLs.MNIST_SAMPLE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the rest of the datasets you will need to download them with [`untar_data`](/datasets.html#untar_data) or [`download_data`](/datasets.html#download_data). [`untar_data`](/datasets.html#untar_data) will decompress the data file and download it while [`download_data`](/datasets.html#download_data) will just download and save the compressed file in `.tgz` format. \n",
    "\n",
    "By default, data will be downloaded to `~/.fastai/data` folder.  \n",
    "Configure the default `data_path` by editing `~/.fastai/config.yml`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"untar_data\"><code>untar_data</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L147\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>untar_data</code>(<b>`url`</b>:`str`, <b>`fname`</b>:`PathOrStr`=<b><i>`None`</i></b>, <b>`dest`</b>:`PathOrStr`=<b><i>`None`</i></b>, <b>`data`</b>=<b><i>`True`</i></b>, <b>`force_download`</b>=<b><i>`False`</i></b>) → `Path`\n",
       "\n",
       "Download `url` to `fname` if it doesn't exist, and un-tgz to folder `dest`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(untar_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PosixPath('/home/ubuntu/.fastai/data/planet_sample')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "untar_data(URLs.PLANET_SAMPLE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"download_data\"><code>download_data</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L132\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>download_data</code>(<b>`url`</b>:`str`, <b>`fname`</b>:`PathOrStr`=<b><i>`None`</i></b>, <b>`data`</b>:`bool`=<b><i>`True`</i></b>) → `Path`\n",
       "\n",
       "Download `url` to destination `fname`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(download_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: If the data file already exists in a <code>data</code> directory inside the notebook, that data file will be used instead of <code>~/.fasta/data</code>. Paths are resolved by calling the function [`datapath4file`](/datasets.html#datapath4file) - which checks if data exists locally (`data/`) first, before downloading to `~/.fastai/data` home directory.\n",
    "\n",
    "Example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PosixPath('/home/ubuntu/.fastai/data/planet_sample.tgz')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "download_data(URLs.PLANET_SAMPLE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"datapath4file\"><code>datapath4file</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L126\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>datapath4file</code>(<b>`filename`</b>)\n",
       "\n",
       "Return data path to `filename`, checking locally first then in the config file.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(datapath4file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the downloading functions use this to decide where to put the tgz and expanded folder. If `filename` already exists in a <code>data</code> directory in the same place as the calling notebook/script, that is used as the parent directly, otherwise `~/.fastai/config.yml` is read to see what path to use, which defaults to <code>~/.fastai/data</code> is used. To override this default, simply modify the value in your `~/.fastai/config.yml`:\n",
    "\n",
    "    data_path: ~/.fastai/data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"url2path\"><code>url2path</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L113\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>url2path</code>(<b>`url`</b>, <b>`data`</b>=<b><i>`True`</i></b>)\n",
       "\n",
       "Change `url` to a path.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(url2path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"Config\"><code>class</code> <code>Config</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L65\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>Config</code>()\n",
       "\n",
       "Creates a default config file at `~/.fastai/config.yml`  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You probably won't need to use this yourself - it's used by `URLs.datapath4file`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.get_path\"><code>get_path</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L78\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>get_path</code>(<b>`path`</b>)\n",
       "\n",
       "Get the `path` in the config file.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.get_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the key corresponding to `path` in the [`Config`](/datasets.html#Config)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.data_path\"><code>data_path</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L83\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>data_path</code>()\n",
       "\n",
       "Get the path to data in the config file.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.data_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the `Path` where the data is stored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.model_path\"><code>model_path</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L88\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>model_path</code>()\n",
       "\n",
       "Get the path to fastai pretrained models in the config file.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Undocumented Methods - Methods moved below this line will intentionally be hidden"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.create\"><code>create</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L101\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>create</code>(<b>`fpath`</b>)\n",
       "\n",
       "Creates a [`Config`](/datasets.html#Config) from `fpath`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.create)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"url2name\"><code>url2name</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L112\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>url2name</code>(<b>`url`</b>)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(url2name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.get_key\"><code>get_key</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L73\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>get_key</code>(<b>`key`</b>)\n",
       "\n",
       "Get the path to `key` in the config file.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.get_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"Config.get\"><code>get</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L93\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>get</code>(<b>`fpath`</b>=<b><i>`None`</i></b>, <b>`create_missing`</b>=<b><i>`True`</i></b>)\n",
       "\n",
       "Retrieve the [`Config`](/datasets.html#Config) in `fpath`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(Config.get)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New Methods - Please document or move to the undocumented section"
   ]
  }
 ],
 "metadata": {
  "jekyll": {
   "keywords": "fastai"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}