{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading ensemble output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set number of ensemble members" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "niter = 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set output path and ensemble name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, all my ensemble members are named \"hydro_ensemble_LHC_X\" where X is the member number" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "path = \"/glade/scratch/kdagon/archive/\"\n", "PPE = \"hydro_ensemble_LHC_\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set output variables of interest" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "var = ['FPSN', 'EFLX_LH_TOT']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a list of paths to each ensemble member output directory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is where we use Unix wildcards (e.g., *) to get all the files in each directory\\\n", "Note that not all wildcard functionality works the same way as it does on the command line" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['/glade/scratch/kdagon/archive/hydro_ensemble_LHC_1/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_2/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_3/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_4/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_5/lnd/hist/*',\n", " 
'/glade/scratch/kdagon/archive/hydro_ensemble_LHC_6/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_7/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_8/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_9/lnd/hist/*',\n", " '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_10/lnd/hist/*']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#full_paths = [path+PPE+str(i+1)+\"/lnd/hist/*{001[6-9],20-}*\" for i in range(niter)] # specific to years 16-20; NOTE: this wildcard doesn't work with open_mfdataset\n", "full_paths = [path+PPE+str(i+1)+\"/lnd/hist/*\" for i in range(niter)] # all history files in the folder\n", "full_paths[:10] # look at the first 10 paths" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a preprocess function that returns a specific variable or variables" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def preprocess(ds):\n", " return ds[var]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test opening the first ensemble member" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 9.21 s, sys: 51.6 ms, total: 9.26 s\n", "Wall time: 11.5 s\n" ] } ], "source": [ "%%time\n", "da_model = xr.open_mfdataset(full_paths[0], combine='by_coords', preprocess=preprocess)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "Dimensions: (lat: 46, lon: 72, time: 60)\n", "Coordinates:\n", " * lon (lon) float32 0.0 5.0 10.0 15.0 ... 340.0 345.0 350.0 355.0\n", " * lat (lat) float32 -90.0 -86.0 -82.0 -78.0 ... 78.0 82.0 86.0 90.0\n", " * time (time) object 0016-02-01 00:00:00 ... 
0021-01-01 00:00:00\n", "Data variables:\n", " FPSN (time, lat, lon) float32 dask.array\n", " EFLX_LH_TOT (time, lat, lon) float32 dask.array\n", "Attributes:\n", " title: CLM History file information\n", " comment: NOTE: None of the variables ar...\n", " Conventions: CF-1.0\n", " history: created on 05/28/18 20:36:54\n", " source: Community Land Model CLM4.0\n", " hostname: cheyenne\n", " username: kdagon\n", " version: unknown\n", " revision_id: $Id: histFileMod.F90 42903 201...\n", " case_title: UNSET\n", " case_id: hydro_ensemble_LHC_1\n", " Surface_dataset: surfdata_4x5_16pfts_Irrig_CMIP...\n", " Initial_conditions_dataset: finidat_interp_dest.nc\n", " PFT_physiological_constants_dataset: hydro_ensemble_LHC_1.nc\n", " ltype_vegetated_or_bare_soil: 1\n", " ltype_crop: 2\n", " ltype_UNUSED: 3\n", " ltype_landice_multiple_elevation_classes: 4\n", " ltype_deep_lake: 5\n", " ltype_wetland: 6\n", " ltype_urban_tbd: 7\n", " ltype_urban_hd: 8\n", " ltype_urban_md: 9\n", " ctype_vegetated_or_bare_soil: 1\n", " ctype_crop: 2\n", " ctype_crop_noncompete: 2*100+m, m=cft_lb,cft_ub\n", " ctype_landice: 3\n", " ctype_landice_multiple_elevation_classes: 4*100+m, m=1,glcnec\n", " ctype_deep_lake: 5\n", " ctype_wetland: 6\n", " ctype_urban_roof: 71\n", " ctype_urban_sunwall: 72\n", " ctype_urban_shadewall: 73\n", " ctype_urban_impervious_road: 74\n", " ctype_urban_pervious_road: 75\n", " cft_c3_crop: 1\n", " cft_c3_irrigated: 2\n", " time_period_freq: month_1\n", " Time_constant_3Dvars_filename: ./hydro_ensemble_LHC_1.clm2.h0...\n", " Time_constant_3Dvars: ZSOI:DZSOI:WATSAT:SUCSAT:BSW:H..." 
] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "da_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Open each ensemble member as a list of datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start with the first 10 ensemble members to test functionality" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 33s, sys: 1.17 s, total: 1min 34s\n", "Wall time: 1min 55s\n" ] } ], "source": [ "%%time\n", "da_model = [xr.open_mfdataset(p, combine='by_coords', preprocess=preprocess) for p in full_paths[:10]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: this takes about 2 min to read 10 ensemble members with 5 years of monthly history files for each (4x5 resolution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an ensemble member dimension to index on" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, start with the first 10 members" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n", "Dimensions without coordinates: ens" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ensdim = xr.DataArray(list(range(1,11)), dims='ens', name='ens') # or can use np.arange\n", "ensdim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Concatenate the ensemble members along the new ensemble dimension" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "da_model_concat = xr.concat(da_model, dim=ensdim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now you have a dataset indexed by ensemble member, with only the specified output variables you want" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": 
[ { "data": { "text/plain": [ "\n", "Dimensions: (ens: 10, lat: 46, lon: 72, time: 60)\n", "Coordinates:\n", " * lon (lon) float32 0.0 5.0 10.0 15.0 ... 340.0 345.0 350.0 355.0\n", " * lat (lat) float32 -90.0 -86.0 -82.0 -78.0 ... 78.0 82.0 86.0 90.0\n", " * time (time) object 0016-02-01 00:00:00 ... 0021-01-01 00:00:00\n", " * ens (ens) int64 1 2 3 4 5 6 7 8 9 10\n", "Data variables:\n", " FPSN (ens, time, lat, lon) float32 dask.array\n", " EFLX_LH_TOT (ens, time, lat, lon) float32 dask.array\n", "Attributes:\n", " title: CLM History file information\n", " comment: NOTE: None of the variables ar...\n", " Conventions: CF-1.0\n", " history: created on 05/28/18 20:36:54\n", " source: Community Land Model CLM4.0\n", " hostname: cheyenne\n", " username: kdagon\n", " version: unknown\n", " revision_id: $Id: histFileMod.F90 42903 201...\n", " case_title: UNSET\n", " case_id: hydro_ensemble_LHC_1\n", " Surface_dataset: surfdata_4x5_16pfts_Irrig_CMIP...\n", " Initial_conditions_dataset: finidat_interp_dest.nc\n", " PFT_physiological_constants_dataset: hydro_ensemble_LHC_1.nc\n", " ltype_vegetated_or_bare_soil: 1\n", " ltype_crop: 2\n", " ltype_UNUSED: 3\n", " ltype_landice_multiple_elevation_classes: 4\n", " ltype_deep_lake: 5\n", " ltype_wetland: 6\n", " ltype_urban_tbd: 7\n", " ltype_urban_hd: 8\n", " ltype_urban_md: 9\n", " ctype_vegetated_or_bare_soil: 1\n", " ctype_crop: 2\n", " ctype_crop_noncompete: 2*100+m, m=cft_lb,cft_ub\n", " ctype_landice: 3\n", " ctype_landice_multiple_elevation_classes: 4*100+m, m=1,glcnec\n", " ctype_deep_lake: 5\n", " ctype_wetland: 6\n", " ctype_urban_roof: 71\n", " ctype_urban_sunwall: 72\n", " ctype_urban_shadewall: 73\n", " ctype_urban_impervious_road: 74\n", " ctype_urban_pervious_road: 75\n", " cft_c3_crop: 1\n", " cft_c3_irrigated: 2\n", " time_period_freq: month_1\n", " Time_constant_3Dvars_filename: ./hydro_ensemble_LHC_1.clm2.h0...\n", " Time_constant_3Dvars: ZSOI:DZSOI:WATSAT:SUCSAT:BSW:H..." 
] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "da_model_concat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example of chunking when reading in a large file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the file path" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "file_path = \"/glade/scratch/nanr/forKatie/daily/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify the file name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is 1/4 degree atmosphere daily output, RCP8.5 from 2070-2100" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "file = \"b.e13.BRCP85C5CN.ne120_g16.001.cam.h1.PRECT.20700101-21001231.FV.nc\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Open with chunking along time, lat, lon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You kind of have to know your general dimension sizes to select chunk sizes\\\n", "For example, here I am splitting lat and lon into 2 chunks each (total lat = 768, total lon = 1152)\\\n", "And I am chunking time (the largest dimension) into chunks of size 100" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "ds = xr.open_dataset(file_path+file, chunks={'time': 100, 'lat': 384, 'lon': 576})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More about how to choose chunk sizes:\\\n", "https://docs.dask.org/en/latest/array-best-practices.html \\\n", "https://examples.dask.org/xarray.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read in a specific variable as a data array" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * lat (lat) float64 -90.0 -89.77 -89.53 -89.3 ... 
89.3 89.53 89.77 90.0\n", " * lon (lon) float64 0.0 0.3125 0.625 0.9375 ... 358.8 359.1 359.4 359.7\n", " * time (time) object 2070-01-02 00:00:00 ... 2101-01-01 00:00:00\n", "Attributes:\n", " units: m/s\n", " long_name: Total (convective and large-scale) precipitation rate (li...\n", " cell_methods: time: mean\n", " cell_measures: area: area" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PRECT = ds.PRECT\n", "PRECT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Look at the total array and chunk sizes" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n",
"<thead><tr><td></td><th>Array</th><th>Chunk</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>Bytes</th><td>40.04 GB</td><td>88.47 MB</td></tr>\n",
"<tr><th>Shape</th><td>(11315, 768, 1152)</td><td>(100, 384, 576)</td></tr>\n",
"<tr><th>Count</th><td>457 Tasks</td><td>456 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float32</td><td>numpy.ndarray</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>Chunk diagram axes: 1152 (lon), 768 (lat), 11315 (time)</p>\n
" ], "text/plain": [ "dask.array" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PRECT.data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python (conda-analysis)", "language": "python", "name": "analysis" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }