{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Caching Demo\n", "\n", "To run this demo, make sure tqdm is installed (`conda install -c conda-forge tqdm` or `pip install tqdm`) so that the download progress bar is displayed in the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import intake\n", "cat = intake.open_catalog('cache_demo.yml')\n", "list(cat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each entry in the catalog has a cache associated with it. Accessing the catalog metadata alone does not download the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats = cat.demographic_stats()\n", "stats.cache[0].clear_all()\n", "stats._urlpath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The download occurs when the data source is `read` for the first time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = stats.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Second read doesn't download\n", "\n", "When the source is read again, the cached local copy is used, so the read is much faster." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = stats.read()\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can inspect the cache from the command line or using the Python API.
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache[0].get_metadata(stats._urlpath)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache_dirs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use the `os` module to inspect the cache dir more directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's clear the cache and then re-download the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache[0].clear_cache(stats._urlpath)\n", "stats.cache[0].get_metadata(stats._urlpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or equivalently, from the command line:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake cache clear" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = stats.read()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cache directory is configurable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"stats.cache[0].clear_cache(stats._urlpath) # clear default cache\n", "\n", "import os.path\n", "\n", "cat = intake.open_catalog('cache_demo.yml')\n", "stats = cat.demographic_stats()\n", "stats.set_cache_dir(os.path.join(os.getcwd(), 'test_cache_dir'))\n", "stats.cache_dirs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = stats.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "List the files in the default intake cache dir to see that nothing is in there. Then inspect the dir defined above to see that it contains a dir with a unique id. Alternatively, use the CLI to access the cache info." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.listdir('./test_cache_dir')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake cache list-files https://s3.amazonaws.com/earth-data/Demographic_Statistics_By_Zip_Code.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache[0].get_metadata(stats._urlpath)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache[0].clear_all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Disable Caching\n", "\n", "Caching can be globally disabled from the config, either through the Python API or by editing the config file directly." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from intake.config import conf\n", "conf['cache_disabled'] = True\n", "\n", "cat = intake.open_catalog('cache_demo.yml')\n", "stats = cat.demographic_stats()\n", "df = stats.read()\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!intake config info" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache_dirs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.listdir(os.path.join(os.path.expanduser('~'), '.intake', 'cache'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.cache[0].get_metadata(stats._urlpath)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }