{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Caching Demo\n",
    "\n",
    "To run this demo make sure you have installed tqdm `conda install -c conda-forge tqdm` or `pip install tqdm` so that you will see the progress bar in the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import intake\n",
    "cat = intake.open_catalog('cache_demo.yml')\n",
    "list(cat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each entry in the catalog has a cache associated with it. When accessing the catalog metadata, the file does not get downloaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats = cat.demographic_stats()\n",
    "stats.cache[0].clear_all()\n",
    "stats._urlpath"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The download occurs when the data source is `read` for the first time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = stats.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Second read doesn't download\n",
    "\n",
    "When the source is read again, the new local version will be used. So the read will be much faster. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = stats.read()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can inspect the cache from the command line or using the python API. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].get_metadata(stats._urlpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache_dirs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use the `os` module to inspect the cache dir more directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's clear the cache and then redownload the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].clear_cache(stats._urlpath)\n",
    "stats.cache[0].get_metadata(stats._urlpath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or equivilently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake cache clear"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = stats.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cache directory is configurable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].clear_cache(stats._urlpath)  # clear default cache\n",
    "\n",
    "import os.path\n",
    "\n",
    "cat = intake.open_catalog('cache_demo.yml')\n",
    "stats = cat.demographic_stats()\n",
    "stats.set_cache_dir(os.path.join(os.getcwd(), 'test_cache_dir'))\n",
    "stats.cache_dirs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = stats.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "List the files in the default intake cahce dir to see that nothing is in there. Then inspect the dir defined above to see that there is a dir with a unique id. Alternately - use the CLI to access the cache info."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.listdir('./test_cache_dir')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].get_metadata(stats._urlpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].clear_all()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Disable Caching\n",
    "\n",
    "Caching can be globally disabled from the config using the python API or by editing the config file directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from intake.config import conf\n",
    "conf['cache_disabled'] = True\n",
    "\n",
    "cat = intake.open_catalog('cache_demo.yml')\n",
    "stats = cat.demographic_stats()\n",
    "df = stats.read()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!intake config info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache_dirs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.cache[0].get_metadata(stats._urlpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}