{ "cells": [ { "cell_type": "markdown", "id": "070604f6-cd48-4c29-bc15-72a50a4f9cbb", "metadata": {}, "source": [ "# WARNING : this notebook still needs modernising. See [iris_example_code#12](https://github.com/scitools-classroom/iris_example_code/issues/12) for more details" ] }, { "cell_type": "markdown", "id": "14d85a51", "metadata": {}, "source": [ "## Managing Sparse Data\n", "\n", "### Introduction\n", "\n", "The aim of this example is to demonstrate how to take data sampled at arbitrary random locations and put it onto a defined grid. \n", "It also contains some specific suggestions for processing EN3 ocean profiles data. \n", "\n", "We take our example data from the EN3 datasheet of vertical ocean profiles\n", "observations. These data are produced by mobile submerging ocean buoys.\n", "[General information](http://www.metoffice.gov.uk/hadobs/en3/) on this data\n", "and an account of the [file format](http://www.metoffice.gov.uk/hadobs/en3/en3_file_formats.html)\n", "is provided by the Met Office.\n", "\n", "From our point of view, there are two key aspects of this data that we need to deal with:\n", "\n", " 1. each measurement value can also have a related 'quality control' (\"QC\") indicator that needs to be taken into account.\n", " 2. measurements occur at arbitrary times and locations, so the time, depth, latitude and longitude are provided by additional _measured_ values for each data point, instead of following a planned sampling pattern.\n" ] }, { "cell_type": "markdown", "id": "1405ee99", "metadata": {}, "source": [ "#### Task summary \n", "\n", " 1. Read in potential temperature data from an EN3 observations file, taking the values from the POTM_CORRECTED (\"corrected pot. temp\") netCDF variable.\n", " 1. Also read the other information relating to these datapoints, including locations, depths and times.\n", " 1. Apply a quality threshold to the related quality control flag variable ('POTM_CORRECTED_QC'), \n", " such that only data values with a quality figure of less than 3 are included in\n", " the calculations.\n", " 1. Create a global 3d spatial grid with regular longitude, latitude and depth (down to 500m).\n", " 1. Collect the observations that fall within each grid cell.\n", " 1. Calculate mean, standard deviation and count (number) of datapoints within each gridcell: each observation point contained in a particular cell contributes to that cell's statistical result.\n", " 1. Implement a 'count' threshold on the data, so that each gridcell must contain >3 observation points to produce a result (otherwise it is set to 'missing data').\n", " 1. Preserve an indication of the original observation times in the gridded results.\n" ] }, { "cell_type": "markdown", "id": "78746f2d", "metadata": {}, "source": [ "### Import required modules, including Iris" ] }, { "cell_type": "code", "execution_count": 1, "id": "86907e6e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "import iris\n", "import iris.analysis\n", "import iris.quickplot as qplt" ] }, { "cell_type": "markdown", "id": "85f68ec7", "metadata": {}, "source": [ "Check the Iris version." ] }, { "cell_type": "code", "execution_count": 2, "id": "531736ac-8df1-4b3c-a448-542a39d59ae0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iris version: 3.10.0\n" ] } ], "source": [ "print(f'Iris version: {iris.__version__}')" ] }, { "cell_type": "markdown", "id": "d6506d2d", "metadata": {}, "source": [ "### Load data\n", "\n", "Define the data file." ] }, { "cell_type": "code", "execution_count": 3, "id": "7791d968", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "obs_filepath = './example_data/EN3_v2a_Profiles_195001.nc'" ] }, { "cell_type": "markdown", "id": "8898f5de", "metadata": {}, "source": [ "We first define some helper functions to load data from netCDf file variables and mask any points of poor\n", "quality.\n", "\n", "Each data variable has an associated variable containing its 'quality control' flags (\"QC\").\n", "On loading, we will check the related QC variable, where available, and mask individual datapoints according to a QC threshold value.\n", "\n", "In practical terms, the QC values are numeric but coded as string. Typical values are 1='good', 4='poor' and 0='missing QC data'. We also have to deal with possible missing datapoints in both the data\n", "variable and the QC variable: we mask any missing datapoints, but ignore\n", "missing QC data (as some QC variables in the file have _all_ values = 'missing').\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "e8895683", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "def ensure_full_mask(array):\n", " \"\"\"Return MaskedArray version of data with a full mask array.\"\"\"\n", " full_mask_array = np.ma.getmaskarray(array)\n", " return np.ma.MaskedArray(array, mask=full_mask_array)\n", "\n", "def load_en3_variable(var_name):\n", " \"\"\"\n", " Load a named data variable from the EN3 file as an Iris cube with a masked\n", " data array.\n", "\n", " \"\"\"\n", " # Load cube from nc variable\n", " cube = iris.load_cube(obs_filepath, var_name)\n", " # Force data to be a masked array, with a fully expanded mask array\n", " cube.data = ensure_full_mask(cube.data)\n", " # Also infer a data mask if there is a '_fillvalue' attribute. This\n", " # _ought_ to be automatic with netCDF4-python, but some EN3 files seem to\n", " # have a mis-spelling here (should be '_FillValue', with capitals).\n", " if '_fillvalue' in cube.attributes:\n", " mdi = cube.attributes['_fillvalue']\n", " cube.data.mask |= cube.data == mdi\n", " cube.data.set_fill_value(mdi)\n", " return cube\n", "\n", "def load_en3_with_quality_mask(main_var_name, qc_var_name, qc_max_valid=9):\n", " \"\"\"\n", " Load a named data variable from the EN3 file as an Iris cube.\n", "\n", " Generates a masked data cube, in which data is also masked where the\n", " related QC value exceeds a given threshold.\n", "\n", " Args:\n", "\n", " * main_var_name (string):\n", " the data variable long_name within the netCDF file.\n", " * qc_var_name (string):\n", " name of the related QC variable\n", "\n", " Kwargs:\n", " * qc_max_valid (int):\n", " Threshold value for QC data. QC values larger than this result in the\n", " datapoint being masked.\n", "\n", " \"\"\"\n", " # Load main and QC data values\n", " cube = load_en3_variable(main_var_name)\n", " qc_data = load_en3_variable(qc_var_name).data\n", " # Missing QC data equates to okay\n", " qc_data.set_fill_value(qc_max_valid)\n", " qc_data = qc_data.filled()\n", " # Convert to numbers\n", " qc_data = np.array(qc_data, dtype=int)\n", " # data QC figure means invalid if *too large*\n", " cube.data.mask |= qc_data > qc_max_valid\n", " return cube" ] }, { "cell_type": "markdown", "id": "070b5a0b", "metadata": {}, "source": [ "**NOTE:** This approach is still somewhat simplified, as the original files\n", "contain extra QC data that may need to be considered (see the file format specification, linked above). We are ignoring that here, for simplicity. " ] }, { "cell_type": "markdown", "id": "5a679bb0", "metadata": {}, "source": [ "#### Load the required data into cube variables." ] }, { "cell_type": "code", "execution_count": 5, "id": "7fdb7726", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "potm cube:\n", " corrected pot. temp / (degree_celsius) (-- : 2581; -- : 55)\n", " Attributes:\n", " _fillvalue 99999.0\n", " c_format '%9.3f'\n", " comment 'corrected value'\n", " fortran_format 'f9.3'\n", " resolution 0.001\n", " valid_max 40.0\n", " valid_min -3.0\n", "\n", "depth cube:\n", " corrected depth / (metre) (-- : 2581; -- : 55)\n", " Attributes:\n", " _fillvalue 99999.0\n", " c_format '%7.1f'\n", " fortran_format 'f7.1'\n", " resolution 0.1\n", " valid_max 15000.0\n", " valid_min 0.0\n", "\n", "longitude cube:\n", " longitude of the station, best estimated value / (degrees) (-- : 2581)\n", " Attributes:\n", " _fillvalue 99999.0\n", " valid_max 180.0\n", " valid_min -180.0\n", "\n", "time cube:\n", " julian day (utc) of the location relative to reference_date_time / (days since 1950-01-01 00:00:00) (-- : 2581)\n", " Attributes:\n", " _fillvalue 99999.0\n", " conventions 'relative julian days with decimal part (as parts of day)'\n" ] } ], "source": [ "# Get the main data (potential temperatures), applying quality levels\n", "potm = load_en3_with_quality_mask('corrected pot. temp',\n", " 'quality on pot. temperature',\n", " qc_max_valid=2)\n", "\n", "# Get depth and location info\n", "depth = load_en3_with_quality_mask('corrected depth', 'quality on depth')\n", "longitude = load_en3_with_quality_mask(\n", " 'longitude of the station, best estimated value',\n", " 'quality on position (latitude and longitude)',\n", " qc_max_valid=1)\n", "latitude = load_en3_with_quality_mask(\n", " 'latitude of the station, best estimated value',\n", " 'quality on position (latitude and longitude)',\n", " qc_max_valid=1)\n", "\n", "# Get time reference + convert to units string\n", "reftime = iris.load_cube(obs_filepath, 'date of reference for julian days')\n", "reftime_str = ''.join([i.decode() for i in reftime.data])\n", "assert len(reftime_str) == 14\n", "ref_unit_str = 'days since {:4s}-{:2s}-{:2s} {:2s}:{:2s}:{:2s}'.format(\n", " reftime_str[:4], reftime_str[4:6], reftime_str[6:8],\n", " reftime_str[8:10], reftime_str[10:12], reftime_str[12:14])\n", "\n", "# Get time data\n", "time = load_en3_with_quality_mask(\n", " 'julian day (utc) of the location relative to reference_date_time',\n", " 'quality on date and time')\n", "time.units = ref_unit_str\n", "\n", "# Show some results\n", "print('\\npotm cube:\\n', potm)\n", "print('\\ndepth cube:\\n', depth)\n", "print('\\nlongitude cube:\\n', longitude)\n", "print('\\ntime cube:\\n', time)" ] }, { "cell_type": "markdown", "id": "7e4955a8", "metadata": {}, "source": [ "**NOTE:**\n", "The data here is actually dimensioned by `[<\"profile id\">, <\"depth sample number\">]`. It turns out that these dimensions actually have no practical meaning for our purposes, except that position and time information only depend on the first of these. These dimensions will be lost when data is put onto the grid.\n", "\n", "**NOTE:**\n", "You would not normally see a '\\_fillvalue' attribute. This is due to a mis-spelling in the netCDF source datafiles -- see above." ] }, { "cell_type": "markdown", "id": "a631cd30", "metadata": {}, "source": [ "### Perform the regridding operation\n", "We want to place our data onto a regular grid.\n", "\n", "To do this, for each datapoint we work out which gridcell it belongs in, by comparing its measured latitude, longitude and depth values to the gridcell boundaries. \n", "We then calculate a result for each gridcell, which is a statistical combination of all the datapoints that fall within that cell. \n", "\n", "Generally there is less data than fills the whole grid (in this case, for instance, all gridcells over land will be empty). Thus, many cells will have no result, which is why we call this a 'sparse' arrangement.\n", "\n", "#### Define the parameters of the desired global grid" ] }, { "cell_type": "code", "execution_count": 6, "id": "59e67524", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "z_min, z_max, dz = 0.0, 500.0, 50.0\n", "y_min, y_max, dy = -90.0, 90.0, 2.0\n", "x_min, x_max, dx = -180.0, 180.0, 3.0\n", "nz = int((z_max - z_min) / dz)\n", "ny = int((y_max - y_min) / dy)\n", "nx = int((x_max - x_min) / dx)" ] }, { "cell_type": "markdown", "id": "268b1f22", "metadata": {}, "source": [ "#### Define a function for the main regridding operation\n", "\n", "Because of the sparse nature of the source data, we can use the \"cube.aggregated_by\" method. \n", "This collects data into categories based on attached categorical coordinate values (see Iris documentation for this method, under [Iris.cube.Cube](http://scitools.org.uk/iris/docs/latest/iris/iris/cube.html)).\n", "\n", "The basis of this operation is explained more fully in the \"Coordinate Categorisation\" example (see: \n", "[Coordinate Categorisation](http://nbviewer.ipython.org/urls/raw.github.com/SciTools/iris_example_code/master/coord_categorisation.ipynb)).\n", "\n", "In outline, we will do the following:\n", "\n", " * calculate gridcell indices for each data point from the associated values of depth, longitude and latitude\n", " * attach these index-value arrays to the data as auxiliary coordinates\n", " * categorise the data over the index coordinates, performing an aggregation (mean, std-dev or count) for each result \n", " * expand the results, which are still sparse, into a full cube on the required grid" ] }, { "cell_type": "code", "execution_count": 7, "id": "28db05fb", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "def aggregate_to_grid(phenom, aggregator=iris.analysis.MEAN, **agg_kwargs):\n", " # Regrid a data cube onto the grid previously specified (by 'nx', 'dx',\n", " # 'x_min', 'x_max' and the equivalents in y and z).\n", " #\n", " # For this, each datapoint is assigned to a gridcell 'box' according to its\n", " # corresponding coordinate values (depth, longitude, latitude).\n", " #\n", " # N.B. agg_kwargs is passed to the aggregator operation.\n", "\n", " # Make a 1-D 'flattened' version of the source data cube\n", " # (as 1D coordinates are required by the 'aggregate_by' operation)\n", " cube_flat = iris.cube.Cube(phenom.data.flat[...])\n", " cube_flat.metadata = phenom.metadata\n", "\n", " # Make coordinates containing gridcell indices for each grid coord value\n", " def add_index_coord(coord_points, coord_name, start, step):\n", " # Calculate gridcell indices of all the points.\n", " cell_indices = np.floor((coord_points - start) / step)\n", " # Make these all integers, for eventual use as array indices.\n", " cell_indices = np.array(cell_indices, dtype=int)\n", " # Add these as a \"categorical coord\" to aggregate by.\n", " cube_flat.add_aux_coord(\n", " iris.coords.AuxCoord(cell_indices, long_name=coord_name),\n", " 0)\n", "\n", " # Add a coordinate containing gridcell indices in the z-dimension (depth).\n", " add_index_coord(depth.data.flat[:], 'i_depth', z_min, dz)\n", " # Repeat for lats and lons -- except these need broadcasting to 2d first\n", " lons_2d, _ = np.broadcast_arrays(longitude.data[:, None], phenom.data)\n", " lats_2d, _ = np.broadcast_arrays(latitude.data[:, None], phenom.data)\n", " add_index_coord(lons_2d.flat[:], 'i_lon', x_min, dx)\n", " add_index_coord(lats_2d.flat[:], 'i_lat', y_min, dy)\n", "\n", " # Aggregate the data to get a statistical result for each 'inhabited' cell.\n", " result_cells = cube_flat.aggregated_by(('i_depth', 'i_lat', 'i_lon'),\n", " aggregator=aggregator,\n", " **agg_kwargs)\n", "\n", " # Make a full-grid result cube.\n", " # N.B. metadata comes from aggregation (includes units + cell_method)\n", " full_data_empty = np.ma.MaskedArray(np.zeros((nz, ny, nx),\n", " dtype=result_cells.data.dtype),\n", " mask=True)\n", " result_cube = iris.cube.Cube(full_data_empty)\n", " result_cube.metadata = result_cells.metadata\n", "\n", " # Assign aggregation results to the appropriate gridcells ...\n", " i_z, i_y, i_x = [result_cells.coord(coord_name).points\n", " for coord_name in ('i_depth', 'i_lat', 'i_lon')]\n", " # ... but discarding results that lie outside the target grid.\n", " i_ok = np.where((i_z >= 0) & (i_z < nz) &\n", " (i_y >= 0) & (i_y < ny) &\n", " (i_x >= 0) & (i_x < nx))\n", " i_z, i_y, i_x = i_z[i_ok], i_y[i_ok], i_x[i_ok]\n", " result_cube.data[[i_z, i_y, i_x]] = result_cells.data[i_ok]\n", "\n", " # Add DimCoords defining the grid.\n", " result_cube.add_dim_coord(iris.coords.DimCoord.from_regular(\n", " z_min - 0.5 * dz, dz, nz, with_bounds=True,\n", " standard_name='depth', units='metres'), 0)\n", " result_cube.add_dim_coord(iris.coords.DimCoord.from_regular(\n", " y_min - 0.5 * dy, dy, ny, with_bounds=True,\n", " standard_name='latitude', units='degrees'), 1)\n", " result_cube.add_dim_coord(iris.coords.DimCoord.from_regular(\n", " x_min - 0.5 * dx, dx, nx, with_bounds=True,\n", " standard_name='longitude', units='degrees'), 2)\n", "\n", " return result_cube" ] }, { "cell_type": "markdown", "id": "e1d73334", "metadata": {}, "source": [ "### Make and output the results\n", "\n", "First regrid the data to form the main required result cubes." ] }, { "cell_type": "code", "execution_count": 8, "id": "944fd534", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "ename": "ValueError", "evalue": "shape mismatch: value array of shape (1308,) could not be broadcast to indexing result of shape (3,1308,90,120)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[8], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# Regrid the main data to get a gridded mean, std-dev and count.\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m potm_mean \u001b[38;5;241m=\u001b[39m \u001b[43maggregate_to_grid\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpotm\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 3\u001b[0m potm_std_dev \u001b[38;5;241m=\u001b[39m aggregate_to_grid(potm, iris\u001b[38;5;241m.\u001b[39manalysis\u001b[38;5;241m.\u001b[39mSTD_DEV)\n\u001b[1;32m 4\u001b[0m potm_counts \u001b[38;5;241m=\u001b[39m aggregate_to_grid(potm,\n\u001b[1;32m 5\u001b[0m iris\u001b[38;5;241m.\u001b[39manalysis\u001b[38;5;241m.\u001b[39mCOUNT,\n\u001b[1;32m 6\u001b[0m function\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mlambda\u001b[39;00m points: np\u001b[38;5;241m.\u001b[39mones(points\u001b[38;5;241m.\u001b[39mshape))\n", "Cell \u001b[0;32mIn[7], line 55\u001b[0m, in \u001b[0;36maggregate_to_grid\u001b[0;34m(phenom, aggregator, **agg_kwargs)\u001b[0m\n\u001b[1;32m 51\u001b[0m i_ok \u001b[38;5;241m=\u001b[39m np\u001b[38;5;241m.\u001b[39mwhere((i_z \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m) \u001b[38;5;241m&\u001b[39m (i_z \u001b[38;5;241m<\u001b[39m nz) \u001b[38;5;241m&\u001b[39m\n\u001b[1;32m 52\u001b[0m (i_y \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m) \u001b[38;5;241m&\u001b[39m (i_y \u001b[38;5;241m<\u001b[39m ny) \u001b[38;5;241m&\u001b[39m\n\u001b[1;32m 53\u001b[0m (i_x \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m) \u001b[38;5;241m&\u001b[39m (i_x \u001b[38;5;241m<\u001b[39m nx))\n\u001b[1;32m 54\u001b[0m i_z, i_y, i_x \u001b[38;5;241m=\u001b[39m i_z[i_ok], i_y[i_ok], i_x[i_ok]\n\u001b[0;32m---> 55\u001b[0m \u001b[43mresult_cube\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdata\u001b[49m\u001b[43m[\u001b[49m\u001b[43m[\u001b[49m\u001b[43mi_z\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mi_y\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mi_x\u001b[49m\u001b[43m]\u001b[49m\u001b[43m]\u001b[49m \u001b[38;5;241m=\u001b[39m result_cells\u001b[38;5;241m.\u001b[39mdata[i_ok]\n\u001b[1;32m 57\u001b[0m \u001b[38;5;66;03m# Add DimCoords defining the grid.\u001b[39;00m\n\u001b[1;32m 58\u001b[0m result_cube\u001b[38;5;241m.\u001b[39madd_dim_coord(iris\u001b[38;5;241m.\u001b[39mcoords\u001b[38;5;241m.\u001b[39mDimCoord\u001b[38;5;241m.\u001b[39mfrom_regular(\n\u001b[1;32m 59\u001b[0m z_min \u001b[38;5;241m-\u001b[39m \u001b[38;5;241m0.5\u001b[39m \u001b[38;5;241m*\u001b[39m dz, dz, nz, with_bounds\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[1;32m 60\u001b[0m standard_name\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mdepth\u001b[39m\u001b[38;5;124m'\u001b[39m, units\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmetres\u001b[39m\u001b[38;5;124m'\u001b[39m), \u001b[38;5;241m0\u001b[39m)\n", "File \u001b[0;32m~/miniconda3/envs/examplecode/lib/python3.12/contextlib.py:81\u001b[0m, in \u001b[0;36mContextDecorator.__call__..inner\u001b[0;34m(*args, **kwds)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(func)\n\u001b[1;32m 79\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21minner\u001b[39m(\u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwds):\n\u001b[1;32m 80\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_recreate_cm():\n\u001b[0;32m---> 81\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/miniconda3/envs/examplecode/lib/python3.12/site-packages/numpy/ma/core.py:3397\u001b[0m, in \u001b[0;36mMaskedArray.__setitem__\u001b[0;34m(self, indx, value)\u001b[0m\n\u001b[1;32m 3395\u001b[0m _data[indx\u001b[38;5;241m.\u001b[39mdata] \u001b[38;5;241m=\u001b[39m dval\n\u001b[1;32m 3396\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m-> 3397\u001b[0m \u001b[43m_data\u001b[49m\u001b[43m[\u001b[49m\u001b[43mindx\u001b[49m\u001b[43m]\u001b[49m \u001b[38;5;241m=\u001b[39m dval\n\u001b[1;32m 3398\u001b[0m _mask[indx] \u001b[38;5;241m=\u001b[39m mval\n\u001b[1;32m 3399\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(indx, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mdtype\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m (indx\u001b[38;5;241m.\u001b[39mdtype \u001b[38;5;241m==\u001b[39m MaskType):\n", "\u001b[0;31mValueError\u001b[0m: shape mismatch: value array of shape (1308,) could not be broadcast to indexing result of shape (3,1308,90,120)" ] } ], "source": [ "# Regrid the main data to get a gridded mean, std-dev and count.\n", "potm_mean = aggregate_to_grid(potm)\n", "potm_std_dev = aggregate_to_grid(potm, iris.analysis.STD_DEV)\n", "potm_counts = aggregate_to_grid(potm,\n", " iris.analysis.COUNT,\n", " function=lambda points: np.ones(points.shape))\n", " # NOTE: 'function' arg looks a bit odd, as must return an *array* of bool" ] }, { "cell_type": "markdown", "id": "1bf8d6c7", "metadata": {}, "source": [ "Additionally, mask data according to the \"`count > 3`\" requirement (task point \\#7)." ] }, { "cell_type": "code", "execution_count": null, "id": "db938241", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Add occupancy 'count > 3' as an extra condition for valid data on the others\n", "invalid_counts = potm_counts.data <= 3\n", "potm_mean.data.mask |= invalid_counts\n", "potm_std_dev.data.mask |= invalid_counts" ] }, { "cell_type": "markdown", "id": "3f2638d3", "metadata": {}, "source": [ "Also add a bounded 'time' coord from the minimum and maximum times of the values contributing to each cell (task point \\#8).\n", "\n", "**NOTE:** In principle, the 'aggregate\\_by' method should be able to _automatically_ calculate these time bounds, if the time values were attached to the data as an auxiliary coordinate. However, at present it does not calculate these correctly." ] }, { "cell_type": "code", "execution_count": null, "id": "ec9aacc8", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Expand the times to 2d so we can use the same code to get min+max times.\n", "time_data2d, _ = np.broadcast_arrays(time.data[:, None], potm.data)\n", "time2d = iris.cube.Cube(time_data2d)\n", "time2d.metadata = time.metadata\n", "\n", "# Add a bounded time coordinate to all the output cubes.\n", "time_min = aggregate_to_grid(time2d, iris.analysis.MIN)\n", "time_max = aggregate_to_grid(time2d, iris.analysis.MAX)\n", "time_centres = 0.5 * (time_min.data + time_max.data)\n", "time_bounds = np.concatenate((time_min.data[..., None],\n", " time_max.data[..., None]),\n", " axis=-1)\n", "time_coord = iris.coords.AuxCoord(time_centres,\n", " bounds=time_bounds,\n", " units=time.units,\n", " standard_name='time')\n", "potm_mean.add_aux_coord(time_coord, (0, 1, 2))\n", "potm_std_dev.add_aux_coord(time_coord, (0, 1, 2))\n", "potm_counts.add_aux_coord(time_coord, (0, 1, 2))" ] }, { "cell_type": "markdown", "id": "be510142", "metadata": {}, "source": [ "Save results to a file." ] }, { "cell_type": "code", "execution_count": null, "id": "1ce978cf", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Save result cubes to netCDF\n", "iris.save((potm_mean, potm_std_dev, potm_counts), 'temp_sparse_regrid.nc')" ] }, { "cell_type": "markdown", "id": "f5bb7989", "metadata": {}, "source": [ "Plot, and print, a 2d slice of the cube to give an idea of the results." ] }, { "cell_type": "code", "execution_count": null, "id": "a47b65ec", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Plot the mean result at the top level\n", "plt.figure(figsize=(12, 8))\n", "qplt.pcolormesh(potm_mean[0])\n", "plt.gca().coastlines()\n", "plt.show()\n", "\n", "print (potm_mean[0])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 5 }