{ "cells": [ { "cell_type": "markdown", "id": "minor-throat", "metadata": {}, "source": [ "# Demo with esgf search for CMIP6 at DKRZ site (Files)\n", "\n", "ESGF Node at DKRZ: https://esgf-data.dkrz.de/search/cmip6-dkrz/\n", "\n", "**Use Case**: subset 3hr data ... use files as input for subset (**only** for demo)." ] }, { "cell_type": "markdown", "id": "becoming-domain", "metadata": {}, "source": [ "## Use esgf search at DKRZ ... no distributed search\n", "\n", "\n", "\n", "Using ``esgf-pyclient``: \n", "https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/search.html" ] }, { "cell_type": "code", "execution_count": null, "id": "polyphonic-black", "metadata": {}, "outputs": [], "source": [ "from pyesgf.search import SearchConnection\n", "conn = SearchConnection('http://esgf-data.dkrz.de/esg-search',\n", " distrib=False)" ] }, { "cell_type": "markdown", "id": "frequent-flexibility", "metadata": {}, "source": [ "**Search only CMIP6 files locally available at DKRZ**" ] }, { "cell_type": "code", "execution_count": null, "id": "protecting-mechanics", "metadata": {}, "outputs": [], "source": [ "ctx = conn.new_context(project='CMIP6', data_node='esgf3.dkrz.de', latest=True)\n", "ctx.hit_count" ] }, { "cell_type": "markdown", "id": "biblical-clinic", "metadata": {}, "source": [ "Select a dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "physical-linux", "metadata": {}, "outputs": [], "source": [ "results = ctx.search(\n", " institution_id='MPI-M',\n", " source_id='MPI-ESM1-2-HR',\n", " experiment_id='historical', \n", " variable='pr', \n", " frequency='3hr',\n", " variant_label='r1i1p1f1'\n", ")\n", "len(results)" ] }, { "cell_type": "code", "execution_count": null, "id": "virgin-engineer", "metadata": {}, "outputs": [], "source": [ "ds = results[0]\n", "ds.json" ] }, { "cell_type": "markdown", "id": "solved-setting", "metadata": {}, "source": [ "Get a dataset identifier used by rook" ] }, { "cell_type": "code", "execution_count": null, 
"id": "opened-bottom", "metadata": {}, "outputs": [], "source": [ "dataset_id = ds.json['instance_id']\n", "dataset_id" ] }, { "cell_type": "markdown", "id": "directed-vegetable", "metadata": {}, "source": [ "Time range" ] }, { "cell_type": "code", "execution_count": null, "id": "spare-jurisdiction", "metadata": {}, "outputs": [], "source": [ "f\"{ds.json['datetime_start']}/{ds.json['datetime_stop']})\"" ] }, { "cell_type": "markdown", "id": "nominated-blocking", "metadata": {}, "source": [ "Bounding Box: (West, Sout, East, North)" ] }, { "cell_type": "code", "execution_count": null, "id": "sporting-jason", "metadata": {}, "outputs": [], "source": [ "f\"({ds.json['west_degrees']}, {ds.json['south_degrees']},{ds.json['east_degrees']}, {ds.json['west_degrees']}, {ds.json['north_degrees']})\"\n" ] }, { "cell_type": "markdown", "id": "spoken-correction", "metadata": {}, "source": [ "Size in GB" ] }, { "cell_type": "code", "execution_count": null, "id": "consistent-intervention", "metadata": {}, "outputs": [], "source": [ "f\"{ds.json['size'] / 1024 / 1024 / 1024} GB\"" ] }, { "cell_type": "markdown", "id": "emotional-efficiency", "metadata": {}, "source": [ "Make a file search" ] }, { "cell_type": "code", "execution_count": null, "id": "south-dayton", "metadata": {}, "outputs": [], "source": [ "files = results[0].file_context().search()\n", "download_url = files[0].download_url\n", "download_url" ] }, { "cell_type": "markdown", "id": "palestinian-prison", "metadata": {}, "source": [ "Map to file path at DKRZ" ] }, { "cell_type": "code", "execution_count": null, "id": "amateur-investor", "metadata": {}, "outputs": [], "source": [ "file_url = download_url.replace(\n", " \"http://esgf3.dkrz.de/thredds/fileServer/cmip6/\",\n", " \"/mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/\"\n", ")\n", "file_url" ] }, { "cell_type": "markdown", "id": "continuous-turner", "metadata": {}, "source": [ "## Use Rook to run subset" ] }, { "cell_type": "code", "execution_count": null, "id": 
"vocal-serve", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'\n", "os.environ['ROOK_MODE'] = 'async'\n", "\n", "from rooki import operators as ops" ] }, { "cell_type": "markdown", "id": "virgin-objective", "metadata": {}, "source": [ "Run subset workflow\n", "\n", "http://bboxfinder.com/" ] }, { "cell_type": "code", "execution_count": null, "id": "eastern-parking", "metadata": {}, "outputs": [], "source": [ "bbox_africa = \"-23.906250,-35.746512,63.632813,37.996163\"\n", "\n", "wf = ops.Subset(\n", " ops.Input(\n", " 'tas', [file_url]\n", " ),\n", " time=\"1850-01-01/1850-12-31\",\n", " area=bbox_africa,\n", " \n", ")\n", "resp = wf.orchestrate()\n", "resp.ok" ] }, { "cell_type": "markdown", "id": "alone-copyright", "metadata": {}, "source": [ "### The outputs are available as a Metalink document\n", "https://github.com/metalink-dev" ] }, { "cell_type": "markdown", "id": "enabling-tracy", "metadata": {}, "source": [ "Metalink URL" ] }, { "cell_type": "code", "execution_count": null, "id": "alone-prototype", "metadata": {}, "outputs": [], "source": [ "resp.url" ] }, { "cell_type": "markdown", "id": "upper-addition", "metadata": {}, "source": [ "Number of files" ] }, { "cell_type": "code", "execution_count": null, "id": "duplicate-principal", "metadata": {}, "outputs": [], "source": [ "resp.num_files" ] }, { "cell_type": "markdown", "id": "inappropriate-invalid", "metadata": {}, "source": [ "Total size in MB" ] }, { "cell_type": "code", "execution_count": null, "id": "simplified-organizer", "metadata": {}, "outputs": [], "source": [ "resp.size_in_mb" ] }, { "cell_type": "markdown", "id": "minimal-spain", "metadata": {}, "source": [ "Download URLs" ] }, { "cell_type": "code", "execution_count": null, "id": "legendary-pacific", "metadata": {}, "outputs": [], "source": [ "resp.download_urls()" ] }, { "cell_type": "markdown", "id": "dedicated-brunei", "metadata": {}, "source": [ "Download and open with 
xarray" ] }, { "cell_type": "code", "execution_count": null, "id": "korean-playlist", "metadata": {}, "outputs": [], "source": [ "ds_0 = resp.datasets()[0]\n", "ds_0" ] }, { "cell_type": "markdown", "id": "moderate-begin", "metadata": {}, "source": [ "### Provenance\n", "\n", "Provenance information is given using the *PROV* standard.\n", "https://pypi.org/project/prov/" ] }, { "cell_type": "markdown", "id": "assigned-girlfriend", "metadata": {}, "source": [ "URL to the provenance JSON document" ] }, { "cell_type": "code", "execution_count": null, "id": "absolute-colon", "metadata": {}, "outputs": [], "source": [ "resp.provenance()" ] }, { "cell_type": "markdown", "id": "prepared-terrace", "metadata": {}, "source": [ "Provenance plot" ] }, { "cell_type": "code", "execution_count": null, "id": "specific-visibility", "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(resp.provenance_image())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }