{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy and arrays\n", "We do not advise you to use Numpy directly. The Xarray and Pandas packages are now mature enough to be used instead. However, Numpy is the historical package for numerical calculations in Python. The other packages have been built upon it, i.e. they encapsulate Numpy arrays with additional information to simplify some analyses. \n", "\n", "Some Numpy functionality has not been ported to Xarray and has to be used directly.\n", "In addition, most examples you will find in the documentation of other packages (e.g. SciPy) or on StackOverflow use Numpy. It is worth knowing about this package and understanding how to adapt Numpy examples to Xarray." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an array\n", "There are plenty of ways to do so depending on what you want. The low-level constructor is `np.ndarray()`, which you probably won't use much. But the [webpage](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) for this function is interesting as it lists all attributes and methods associated with numpy arrays!" 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[2.00000000e+000, 2.00000000e+000, 9.38724727e-323],\n", " [0.00000000e+000, 0.00000000e+000, 5.02034658e+175],\n", " [1.21647076e-046, 4.95419814e-062, 3.27611965e+179],\n", " [1.51954764e-051, 5.82777938e-144, 1.12855837e+277]],\n", "\n", " [[6.32672840e+180, 2.66371964e+233, 5.04621361e+180],\n", " [8.37170571e-144, 1.36347154e+161, 4.50618609e-144],\n", " [7.79952704e-143, 8.45341552e+169, 1.68749783e+160],\n", " [1.16066661e-046, 1.23139611e+165, 1.41529403e+161]]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Low-level: the memory is not initialised, so the values are arbitrary\n", "rr = np.ndarray(shape=(2,4,3),dtype=float)\n", "rr" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[3. 4.]\n", " [5. 6.]]\n" ] } ], "source": [ "# From list or tuple\n", "rr = np.array([[3.,4.],[5.,6.]])\n", "print(rr)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['Claire' 'Paola']\n", " ['Scott' 'Danny']]\n" ] } ], "source": [ "rr1 = np.array([['Claire','Paola'],['Scott','Danny']]) # It doesn't have to be a numerical type \n", "print(rr1)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['Claire' '10']\n", " ['Paola' '6']]\n" ] } ], "source": [ "rr2 = np.array([['Claire',10], ['Paola', 6]]) # The input can mix types, but Numpy converts everything to one common type (here strings)\n", "print(rr2)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0.]\n", " [0. 0. 
0.]]\n", "[[1 1 1 1]\n", " [1 1 1 1]]\n" ] } ], "source": [ "# Initialise to 0 or 1.\n", "rr = np.zeros((2,3),dtype=float)\n", "print(rr)\n", "rr1 = np.ones((2,4),dtype=np.int32)\n", "print(rr1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, 0],\n", " [0, 0, 0, 0]], dtype=int32)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Same shape as an existing array. It's possible to choose the data-type with the dtype argument.\n", "rr2 = np.zeros_like(rr1)\n", "rr2" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37,\n", " 39, 41, 43])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Evenly spaced values\n", "rr2= np.arange(5,45,2)\n", "rr2" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 5, 7],\n", " [ 9, 11]],\n", "\n", " [[13, 15],\n", " [17, 19]],\n", "\n", " [[21, 23],\n", " [25, 27]],\n", "\n", " [[29, 31],\n", " [33, 35]],\n", "\n", " [[37, 39],\n", " [41, 43]]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reshaping an existing array\n", "rr2 = rr2.reshape((5,2,2))\n", "rr2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read data from file\n", "Do you remember the csv example from the last training? Here it is with numpy." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[50. 30. 40.]\n", " [70. 20. 30.]]\n", "[[50. 70.]\n", " [30. 20.]\n", " [40. 30.]]\n", "[50. 70.] [30. 20.] [40. 
30.]\n" ] } ], "source": [ "li = np.loadtxt('test.txt',delimiter=',',skiprows=2)\n", "print(li)\n", "# For the third format example, simply take the transpose\n", "print(li.T)\n", "# You want the columns in separate arrays?\n", "c1,c2,c3 = np.loadtxt('test.txt', delimiter=',',skiprows=2,unpack=True)\n", "print(c1,c2,c3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing\n", "It is the same as for lists etc, except for the multi-dimensional part:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First element 5\n", "\n", "First index of the second dimension \n", " [[ 5 7]\n", " [13 15]\n", " [21 23]\n", " [29 31]\n", " [37 39]]\n", "\n", "First 2 indexes along the 1st dimension and all other indexes along other dimensions\n", " [[[ 5 7]\n", " [ 9 11]]\n", "\n", " [[13 15]\n", " [17 19]]]\n", "\n", "Stride [ 7 23 39]\n", "\n" ] } ], "source": [ "print(f\"First element {rr2[0,0,0]}\\n\")\n", "print(f\"First index of the second dimension \\n {rr2[:,0,:]}\\n\")\n", "print(f\"First 2 indexes along the 1st dimension and all other indexes along other dimensions\\n {rr2[:2,:,:]}\\n\")\n", "print(f\"Stride {rr2[0:5:2,0,1]}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a generic form to say \"all other indexes along all other dimensions\", i.e. \"everything else\", without specifying the number of dimensions in your array. 
It is written `...` (an ellipsis) and can stand for all the dimensions before or after the specified slice:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Specify slices in all dimensions: \n", "[[[ 5 7]\n", " [ 9 11]]\n", "\n", " [[13 15]\n", " [17 19]]]\n", "\n", "Generic form:\n", "[[[ 5 7]\n", " [ 9 11]]\n", "\n", " [[13 15]\n", " [17 19]]]\n", "\n", "Any number of dimensions specified before:\n", "[[ 5 7]\n", " [13 15]]\n", "\n", "Works for the start of the array as well:\n", "[[ 5 9]\n", " [13 17]\n", " [21 25]\n", " [29 33]\n", " [37 41]]\n", "\n", "You can omit the dimensions, but it is confusing to read:\n", "[[[ 5 7]\n", " [ 9 11]]\n", "\n", " [[13 15]\n", " [17 19]]]\n", "\n" ] } ], "source": [ "print(f\"Specify slices in all dimensions: \\n{rr2[:2,:,:]}\\n\")\n", "print(f\"Generic form:\\n{rr2[:2,...]}\\n\")\n", "print(f\"Any number of dimensions specified before:\\n{rr2[:2,0,...]}\\n\")\n", "print(f\"Works for the start of the array as well:\\n{rr2[...,0]}\\n\")\n", "print(f\"You can omit the dimensions, but it is confusing to read:\\n{rr2[:2]}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matlab users\n", "In Matlab, arrays are matrices. That is not true in Python. This means that in Matlab `*` is matrix multiplication, whereas in Numpy `*` multiplies element by element; use the `@` operator for matrix multiplication.\n", "This [page](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes) provides a long table of equivalents between Matlab and Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations with arrays along some given axis\n", "`numpy` has a lot of handy functions for common operations. 
For example, if you want the mean of an array:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24.0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rr2.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's handy. Even handier is the possibility to calculate the mean over a given dimension only. For example, `rr2` is 3D. Let's say the dimensions are time, latitude and longitude respectively and you want to calculate the time average at each spatial point:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[21., 23.],\n", " [25., 27.]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rr2.mean(axis=0) # Remember indexes start at 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Working with time\n", "A lot of tools already exist for working with dates and times. You probably won't need to develop much yourself.\n", "\n", "Numpy has a date data type: `datetime64`. Do not confuse `datetime64` from Numpy with `datetime` from Python! They do not have the same methods or abilities. Both can be useful.\n", "\n", "`datetime64` is relatively simple and doesn't have a lot of built-in capabilities. For fancy date calculations in Python, the best option is probably `pandas`. `pandas` is built upon Numpy, so it can readily convert `datetime64` to its own date and time objects. \n", "\n", "Note that `xarray` and `pandas` are also very compatible with each other." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-04-15T05:00:00.000\n", "2020-04\n" ] } ], "source": [ "print(np.datetime64('2020-04-15T05:00','ms'))\n", "print(np.datetime64('2020-04-10','M'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can do simple calculations. 
For example, the number of days in February 2036:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29 days\n" ] } ], "source": [ "print(np.datetime64('2036-03','D')-np.datetime64('2036-02','D'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Be careful with the unit, since the result is expressed in the unit of the operands:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 months\n" ] } ], "source": [ "print(np.datetime64('2036-03','M')-np.datetime64('2036-02','M'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's possible to convert units. So if you want both the number of months and the number of days:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "22 months\n", "669 days\n" ] } ], "source": [ "timeA = np.datetime64('1988-05','M')\n", "timeB = np.datetime64('1990-03','M')\n", "delta = timeB - timeA\n", "print(delta)\n", "deltaD = timeB.astype('datetime64[D]') - timeA.astype('datetime64[D]')\n", "print(deltaD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `delta` and `deltaD` are not the strings printed out above. They are `numpy.timedelta64` objects. The `print()` function gives a readable output because `numpy.timedelta64` defines its own string representation." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.timedelta64(669,'D')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deltaD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Xarray \n", "Numpy is great but it is very generic, and it only gives you the raw data. The coder has to keep track of the additional information: is there a time dimension? What field does this data represent? 
etc.\n", "\n", "Xarray introduces labelled arrays, which means you get self-describing arrays: the name of\n", "the field, the names of the dimensions, the coordinates for the dimensions, etc.\n", "\n", "As such it works very well with the netCDF format since this is also a self-describing format." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading in a netCDF file" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (lat: 145, lon: 192, time: 408)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0\n", " * lon (lon) float32 0.0 1.875 3.75 5.625 ... 352.5 354.375 356.25 358.125\n", " * time (time) datetime64[ns] 1978-01-16T12:00:00 ... 2011-12-16T12:00:00\n", "Data variables:\n", " pr (time, lat, lon) float32 ...\n", " tas (time, lat, lon) float32 ...\n", "Attributes:\n", " input_file_format: UM ancillary\n", " input_uri: file:///short/dt6/rzl561/UM_ROUTDIR/rzl5...\n", " input_word_length: 8\n", " history: Wed Jun 4 11:09:52 2014: ncks -v tas,pr...\n", " input_byte_ordering: big_endian\n", " nco_openmp_thread_number: 1\n", " NCO: 4.3.8\n", " summary: ACCESS1.3b model output \\n from Land use...\n", " institution: ARC Centre of Excellence for Climate Sys...\n", " source: ACCESS1.3b (HadGEM3-CABLE2.0), AMIP, N96\\n\n", " contact: ruth.lorenz@env.ethz.ch\n", " title: ACCESS1.3b model output from the Amazoni...\n", " Conventions: CF-1.6, ACDD-1.3\n", " license: http://creativecommons.org/licenses/by-n...\n", " id: http://dx.doi.org/10.4225/41/563BEB78E3A93\n", " keywords: climatology\n", " date_created: 2014-06-04\n", " creator_name: Ruth Lorenz\n", " creator_email: ruth.lorenz@env.ethz.ch\n", " publisher_name: ARCCSS data manager\n", " publisher_email: paola.petrelli@utas.edu.au\n", " product_version: 1.0\n", " references: Lorenz, R., and A. J. 
Pitman (2014), Eff...\n", " time_coverage_start: 1978-01-01T00:00\n", " time_coverage_end: 2012-01-01T00:00\n", " geospatial_lat_min: -90.0\n", " geospatial_lat_max: 90.0\n", " geospatial_lon_min: 0.0\n", " geospatial_lon_max: 358.125\n", " DODS_EXTRA.Unlimited_Dimension: time" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# open netcdf file\n", "ds = xr.open_dataset(\"http://dapds00.nci.org.au/thredds/dodsC/ks32/ARCCSS_Data/ACCESS1-3b_AMZDEF/v1-0/001GPwc.tas_pr_monthly_TS.1978_2011.nc\")\n", "# see how all the info is there\n", "ds\n", "# print just a variable to see the variable level attributes." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "[11358720 values with dtype=float32]\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0\n", " * lon (lon) float32 0.0 1.875 3.75 5.625 ... 352.5 354.375 356.25 358.125\n", " * time (time) datetime64[ns] 1978-01-16T12:00:00 ... 2011-12-16T12:00:00\n", "Attributes:\n", " stash_item: 236\n", " stash_model: 1\n", " lookup_source: defaults (cdunifpp V0.13)\n", " long_name: air temperature at 1.5 m\n", " cell_methods: time: mean\n", " units: K\n", " stash_section: 3\n", " name: tas\n", " standard_name: air_temperature" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Variables are stored in the Dataset in a dictionary so you can refer to them by name\n", "tas = ds['tas']\n", "tas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculations using dimension names\n", "Just like numpy, arrays have common functions as methods. But unlike numpy, you can identify\n", "dimensions by name instead of index position.\n", "\n", "Xarray arrays work with most numpy functions. If not, you can access the underlying numpy array." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Global mean\n", " \n", "array(278.28284, dtype=float32) \n", "\n", "Latitudinal mean\n", " \n", "array([[278.467 , 278.4859 , 278.38632, ..., 278.32104, 278.27985,\n", " 278.3821 ],\n", " [278.21014, 278.16296, 278.00296, ..., 278.27924, 278.2395 ,\n", " 278.24582],\n", " [277.4103 , 277.37396, 277.2726 , ..., 277.37128, 277.35922,\n", " 277.40472],\n", " ...,\n", " [278.6603 , 278.71646, 278.6278 , ..., 278.4839 , 278.49448,\n", " 278.61685],\n", " [279.5895 , 279.64603, 279.59464, ..., 279.65546, 279.55872,\n", " 279.58517],\n", " [279.07565, 279.1083 , 279.01367, ..., 279.26144, 279.12497,\n", " 279.1093 ]], dtype=float32)\n", "Coordinates:\n", " * lon (lon) float32 0.0 1.875 3.75 5.625 ... 352.5 354.375 356.25 358.125\n", " * time (time) datetime64[ns] 1978-01-16T12:00:00 ... 2011-12-16T12:00:00\n" ] } ], "source": [ "print('Global mean\\n',tas.mean(),'\\n')\n", "print('Latitudinal mean\\n',tas.mean(dim='lat'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selection using coordinate values\n", "Remember how the variables have the dimension names and the coordinate arrays attached to them? This means you can select data using the coordinate names and values rather than indexes." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[245.47575, 245.20438, 244.97054, ..., 246.39372, 246.07375, 245.7676 ],\n", " [237.7508 , 237.50137, 237.2808 , ..., 238.60011, 238.29971, 238.01099],\n", " [223.22281, 222.64638, 222.1247 , ..., 225.09627, 224.4605 , 223.82877],\n", " ...,\n", " [225.40732, 225.0019 , 224.64072, ..., 226.89435, 226.37177, 225.8708 ],\n", " [243.86288, 243.60034, 243.36409, ..., 244.71707, 244.42044, 244.13028],\n", " [249.29846, 249.13568, 248.98572, ..., 249.84563, 249.65027, 249.47154]],\n", " dtype=float32)\n", "Coordinates:\n", " lat float32 -85.0\n", " * lon (lon) float32 0.0 1.875 3.75 5.625 ... 352.5 354.375 356.25 358.125\n", " * time (time) datetime64[ns] 1978-01-16T12:00:00 ... 2011-12-16T12:00:00\n", "Attributes:\n", " stash_item: 236\n", " stash_model: 1\n", " lookup_source: defaults (cdunifpp V0.13)\n", " long_name: air temperature at 1.5 m\n", " cell_methods: time: mean\n", " units: K\n", " stash_section: 3\n", " name: tas\n", " standard_name: air_temperature" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tas.sel(lat=-85)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Xarray can even interpolate for you. You don't have to know the exact values of the points that are in your array. For example you could ask for the nearest points to 100°E in longitude. I would not use this to actually interpolate a whole field! Just use it to save on typing or if you have a projected grid." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[244.72847, 244.6048 , 244.46696, ..., 245.10951, 244.98584, 244.85846],\n", " [237.15535, 237.08058, 237.00803, ..., 237.37787, 237.30826, 237.23053],\n", " [222.5073 , 222.38301, 222.26732, ..., 222.87436, 222.75264, 222.6297 ],\n", " ...,\n", " [225.26901, 225.17426, 225.08258, ..., 225.56303, 225.46518, 225.36763],\n", " [242.8312 , 242.72885, 242.62749, ..., 243.12952, 243.03493, 242.93588],\n", " [249.1559 , 249.08406, 249.00839, ..., 249.38219, 249.31053, 249.22963]],\n", " dtype=float32)\n", "Coordinates:\n", " lat float32 -87.5\n", " * lon (lon) float32 0.0 1.875 3.75 5.625 ... 352.5 354.375 356.25 358.125\n", " * time (time) datetime64[ns] 1978-01-16T12:00:00 ... 2011-12-16T12:00:00\n", "Attributes:\n", " stash_item: 236\n", " stash_model: 1\n", " lookup_source: defaults (cdunifpp V0.13)\n", " long_name: air temperature at 1.5 m\n", " cell_methods: time: mean\n", " units: K\n", " stash_section: 3\n", " name: tas\n", " standard_name: air_temperature" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tas.sel(lat=-87, method='nearest')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quick plotting\n", "It is very easy to do a quick plot of your data. It isn't a plot ready for publication but it can easily allow you to visualise your fields. And the nice touch is Xarray will automatically use the meta-data to add labelling to the plot." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tas.sel(time='1978-01').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note we haven't loaded matplotlib, but because xarray uses it, it loads it for us." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save data to file\n", "There is a very simple way to save data back to a netcdf file. It isn't necessarily the fastest way. But you should only have to write out some analysed fields using Python which means relatively small amounts of data.\n", "\n", "Note, netcdf support inline compression so you should ALWAYS save your data compressed. Inline compression means the file just looks the same, access is the same, you don't need to uncompress before being able to see the information from the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compression\n", "encod={}\n", "for var in ds.data_vars: # data_vars stores the names of the variables in a dataset, as strings\n", " encod[var]={'zlib':True}\n", "\n", "# Write to file\n", "ds.to_netcdf('test.nc',encoding=encod)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More info\n", "This was a very very quick presentation of `xarray`. We were simply trying to give you the very basics, especially highlights the philosophy of the package. We will run a `xarray` training soon. In the meantime if you want to know more, you can always run through [xarray quick overview](http://xarray.pydata.org/en/stable/quick-overview.html)\n", "\n", "In addition, the CMS team has a [blog](https://climate-cms.org/) with quite a few blogs using `xarray` and Python in general. Feel free to check those as well." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }