# Demo: ESGF search for CMIP6 at the DKRZ site (files)

ESGF Node at DKRZ: https://esgf-data.dkrz.de/search/cmip6-dkrz/

**Use case**: subset 3-hourly data, passing file paths directly as the subset input (for demonstration **only**).

## Use the ESGF search at DKRZ (no distributed search)



Using ``esgf-pyclient``: 
https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/search.html

In [None]:
from pyesgf.search import SearchConnection
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search',
 distrib=False)

**Search only CMIP6 files locally available at DKRZ**

In [None]:
ctx = conn.new_context(project='CMIP6', data_node='esgf3.dkrz.de', latest=True)
ctx.hit_count

Select a dataset

In [None]:
results = ctx.search(
 institution_id='MPI-M',
 source_id='MPI-ESM1-2-HR',
 experiment_id='historical', 
 variable='pr', 
 frequency='3hr',
 variant_label='r1i1p1f1'
)
len(results)

In [None]:
ds = results[0]
ds.json

Get the dataset identifier used by rook

In [None]:
dataset_id = ds.json['instance_id']
dataset_id

Time range

In [None]:
f"{ds.json['datetime_start']}/{ds.json['datetime_stop']}"
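The time range above can be turned into the `start/stop` date interval that the subset call later in this notebook takes as its `time` argument. The following is a minimal sketch, assuming the metadata timestamps look like `1850-01-01T01:30:00Z`; the helper name `esgf_time_range` and the example values are hypothetical.

```python
def esgf_time_range(metadata):
    """Build a 'start/stop' date interval from ESGF dataset metadata.

    Assumes ISO-like timestamps such as '1850-01-01T01:30:00Z'. Only the
    date part is kept, by splitting the string rather than parsing it,
    because model calendars (for example 360-day) can contain dates that
    the standard `datetime` module rejects.
    """
    start = metadata["datetime_start"].split("T")[0]
    stop = metadata["datetime_stop"].split("T")[0]
    return f"{start}/{stop}"


# Hypothetical metadata values, for illustration only.
example = {
    "datetime_start": "1850-01-01T01:30:00Z",
    "datetime_stop": "2014-12-31T22:30:00Z",
}
print(esgf_time_range(example))
```

In the notebook itself the helper would be called as `esgf_time_range(ds.json)`.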

Bounding box: (West, South, East, North)

In [None]:
f"({ds.json['west_degrees']}, {ds.json['south_degrees']}, {ds.json['east_degrees']}, {ds.json['north_degrees']})"
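The four corner values can also be collected into a tuple and formatted as a comma-separated `west,south,east,north` string, the same layout used for the subset area later in this notebook. This is a small sketch; the helper names `esgf_bbox` and `bbox_to_area` and the example metadata are hypothetical.

```python
BBOX_KEYS = ("west_degrees", "south_degrees", "east_degrees", "north_degrees")


def esgf_bbox(metadata):
    """Return (west, south, east, north) as floats from ESGF metadata."""
    return tuple(float(metadata[key]) for key in BBOX_KEYS)


def bbox_to_area(bbox):
    """Format a bounding box as a 'west,south,east,north' string."""
    return ",".join(str(value) for value in bbox)


# Hypothetical metadata values, for illustration only.
example = {
    "west_degrees": 0.0,
    "south_degrees": -89.28,
    "east_degrees": 359.0625,
    "north_degrees": 89.28,
}
bbox = esgf_bbox(example)
print(bbox)
print(bbox_to_area(bbox))
```

Reading each key exactly once from a fixed tuple avoids accidentally repeating or swapping a corner when the string is assembled by hand.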


Size in GiB

In [None]:
f"{ds.json['size'] / 1024 / 1024 / 1024:.2f} GiB"

Search for the files of the dataset

In [None]:
files = results[0].file_context().search()
download_url = files[0].download_url
download_url

Map to file path at DKRZ

In [None]:
file_url = download_url.replace(
 "http://esgf3.dkrz.de/thredds/fileServer/cmip6/",
 "/mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/"
)
file_url
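`str.replace` returns the URL unchanged when the prefix does not match, so a download URL from another data node would pass through unnoticed. A slightly more defensive variant is sketched below. It reuses the two prefixes from the cell above; the helper name `to_local_path` and the example URL are hypothetical.

```python
THREDDS_PREFIX = "http://esgf3.dkrz.de/thredds/fileServer/cmip6/"
LOCAL_PREFIX = "/mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/"


def to_local_path(download_url, url_prefix=THREDDS_PREFIX, local_prefix=LOCAL_PREFIX):
    """Map a THREDDS download URL to a file path on the DKRZ file system.

    Raises ValueError if the URL does not start with the expected prefix,
    instead of silently returning the URL unchanged.
    """
    if not download_url.startswith(url_prefix):
        raise ValueError(f"unexpected download URL: {download_url}")
    return local_prefix + download_url[len(url_prefix):]


# Hypothetical URL below the known prefix, for illustration only.
example_url = THREDDS_PREFIX + "CMIP/MPI-M/example/pr_example.nc"
print(to_local_path(example_url))
```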

## Use Rook to run subset

In [None]:
import os
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'
os.environ['ROOK_MODE'] = 'async'

from rooki import operators as ops

Run the subset workflow

A bounding box can be looked up at http://bboxfinder.com/

In [None]:
bbox_africa = "-23.906250,-35.746512,63.632813,37.996163"

wf = ops.Subset(
    ops.Input(
        'pr', [file_url]  # variable name must match the dataset searched above
    ),
    time="1850-01-01/1850-12-31",
    area=bbox_africa,
)
resp = wf.orchestrate()
resp.ok
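Since the area is passed as a plain string, a typo in it only surfaces once the request has been submitted. The sketch below checks such a string locally first. It assumes the order `west,south,east,north` used above; `parse_area` is a hypothetical helper, not part of rooki.

```python
def parse_area(area):
    """Parse 'west,south,east,north' into a tuple of floats.

    Raises ValueError if there are not exactly four numbers or if the
    latitudes are out of range or in the wrong order.
    """
    parts = [float(part) for part in area.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}")
    west, south, east, north = parts
    if not (-90.0 <= south <= north <= 90.0):
        raise ValueError(f"invalid latitudes: south={south}, north={north}")
    return west, south, east, north


bbox_africa = "-23.906250,-35.746512,63.632813,37.996163"
print(parse_area(bbox_africa))
```

Longitudes are deliberately left unchecked here, because datasets may use either a -180..180 or a 0..360 convention.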

### The outputs are available as a Metalink document
https://github.com/metalink-dev

Metalink URL

In [None]:
resp.url

Number of files

In [None]:
resp.num_files

Total size in MB

In [None]:
resp.size_in_mb

Download URLs

In [None]:
resp.download_urls()
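To illustrate what such a Metalink document contains, the sketch below parses a small hand-written example with the standard library. It assumes the Metalink 4 namespace `urn:ietf:params:xml:ns:metalink`; the sample document, its URLs, and the helper `list_files` are hypothetical, and the document actually returned by rook may differ in detail.

```python
import xml.etree.ElementTree as ET

# Namespace of Metalink 4 documents (assumed version).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Hand-written sample, for illustration only.
SAMPLE = """
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="output_001.nc">
    <size>1048576</size>
    <metaurl mediatype="application/x-netcdf">http://example.org/outputs/output_001.nc</metaurl>
  </file>
  <file name="output_002.nc">
    <size>2097152</size>
    <url>http://example.org/outputs/output_002.nc</url>
  </file>
</metalink>
"""


def list_files(metalink_xml):
    """Return name, size in bytes, and URLs for each file entry."""
    root = ET.fromstring(metalink_xml.strip())
    files = []
    for node in root.findall("ml:file", NS):
        # Collect both <url> and <metaurl> children.
        links = node.findall("ml:url", NS) + node.findall("ml:metaurl", NS)
        size = node.findtext("ml:size", default=None, namespaces=NS)
        files.append({
            "name": node.get("name"),
            "size": int(size) if size else None,
            "urls": [(link.text or "").strip() for link in links],
        })
    return files


files = list_files(SAMPLE)
total_mb = sum(f["size"] for f in files) / 1024 / 1024
print(len(files), total_mb)
```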

Download and open with xarray

In [None]:
ds_0 = resp.datasets()[0]
ds_0

### Provenance

Provenance information is given using the *PROV* standard.
https://pypi.org/project/prov/

URL to the JSON document

In [None]:
resp.provenance()
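Once downloaded, a provenance document in the PROV-JSON serialization is a nested dictionary that can be inspected directly. The sketch below lists which entity was derived from which. It assumes the PROV-JSON layout with a top-level `wasDerivedFrom` section; the sample document and the helper `derivations` are hypothetical, and the document produced by rook may use different sections and identifiers.

```python
import json

# Hand-written PROV-JSON fragment, for illustration only.
SAMPLE = json.loads("""
{
  "prefix": {"ex": "http://example.org/"},
  "entity": {"ex:input.nc": {}, "ex:subset.nc": {}},
  "activity": {"ex:subset": {}},
  "wasDerivedFrom": {
    "_:id1": {
      "prov:generatedEntity": "ex:subset.nc",
      "prov:usedEntity": "ex:input.nc"
    }
  }
}
""")


def derivations(prov_doc):
    """Return sorted (used_entity, generated_entity) pairs."""
    relations = prov_doc.get("wasDerivedFrom", {})
    return sorted(
        (rel["prov:usedEntity"], rel["prov:generatedEntity"])
        for rel in relations.values()
    )


for used, generated in derivations(SAMPLE):
    print(f"{generated} was derived from {used}")
```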

Provenance Plot

In [None]:
from IPython.display import Image
Image(resp.provenance_image())