{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "name = '2017-10-16-masked-arrays'\n",
    "title = 'Masked arrays in NumPy'\n",
    "tags = 'numpy'\n",
    "author = 'Denis Sergeev'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nb_tools import connect_notebook_to_post\n",
    "from IPython.core.display import HTML\n",
    "\n",
    "html = connect_notebook_to_post(name, title, tags, author)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A masked array includes:\n",
    "- mask of bad values travels with the array.\n",
    "\n",
    "\n",
    "Those elements deemed bad are treated as if they did not exist. Operations using the array automatically use the mask of bad values.\n",
    "\n",
    "\n",
    "Typically bad values may represent something like a land mask (i.e. sea surface temperature only exists where there is ocean)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "38ba6171-e3af-4083-9c19-e6e3f38e4891"
    }
   },
   "source": [
    "![masked array schematic](http://nbviewer.jupyter.org/github/ueapy/enveast_python_course_materials/blob/master/figures/masked_array.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "f2e85099-ffb1-45a3-bb8d-cac1439c62d9"
    }
   },
   "source": [
    "All operations related to masked arrays live in `numpy.ma` submodule."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4, 5])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([1, 2, 3, 4, 5])\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "3f014dab-6e80-43d0-8c94-35bc123114f9"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The simplest example of manual creation of a masked array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "nbpresent": {
     "id": "7eba33b7-40c1-4195-893b-071a03b2b65f"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "masked_array(data = [-- 2 3 -- 5],\n",
       "             mask = [ True False False  True False],\n",
       "       fill_value = 999999)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx = np.ma.masked_array(data=x,\n",
    "                        mask=[True, False, False, True, False],\n",
    "#                        fill_value=-999\n",
    "                      )\n",
    "mx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check if an array contains any masked values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.ma.is_masked(mx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or we can check if a particular element is masked"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx[1] is np.ma.masked"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original data are not erased, they are still stored in the `data` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4, 5])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Mask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can be accessed directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ True, False, False,  True, False], dtype=bool)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx.mask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The masked entries can be filled with a given value to get an usual array back:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([999999,      2,      3, 999999,      5])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx.filled()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The mask can also be cleared:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "mx.mask = np.ma.nomask"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False, False], dtype=bool)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mx.mask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Domain-aware functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some functions handle masked values automatically, e.g. the `log` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906\n",
       " 1.6094379124341003],\n",
       "             mask = [False False False False False],\n",
       "       fill_value = 999999)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.log(mx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906\n",
       " 1.6094379124341003],\n",
       "             mask = [False False False False False],\n",
       "       fill_value = 999999)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.ma.log(mx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that result is the same."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Others don't see the mask, and so a relevant function from the `ma` submodule should be used instead (if it exists):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "55"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dot(mx, mx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "masked_array(data = 55,\n",
       "             mask = False,\n",
       "       fill_value = 999999)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.ma.dot(mx, mx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Array creation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "9175cb41-3123-45e0-904c-9fe219070fc5"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Often, a task is to mask array depending on a criterion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "nbpresent": {
     "id": "484c1561-5a07-425e-a56d-8af2ff9a7094"
    }
   },
   "outputs": [],
   "source": [
    "a = np.linspace(1, 15, 15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,\n",
       "        12.,  13.,  14.,  15.])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "nbpresent": {
     "id": "22202376-6934-490e-bd31-aaf7017cea05"
    }
   },
   "outputs": [],
   "source": [
    "masked_a = np.ma.masked_greater_equal(a, 11)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "nbpresent": {
     "id": "4c6a3958-60de-4b04-a7ab-cab1a5d7bf76"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "masked_array(data = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 -- -- -- -- --],\n",
       "             mask = [False False False False False False False False False False  True  True\n",
       "  True  True  True],\n",
       "       fill_value = 1e+20)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "masked_a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other simple examples can be found in the NumPy Docs:\n",
    "https://docs.scipy.org/doc/numpy-1.13.0/reference/maskedarray.generic.html#examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <small>\n",
       "    <p> This post was written as an IPython (Jupyter) notebook. You can view or download it using\n",
       "    <a href=\"http://nbviewer.ipython.org/github/ueapy/ueapy.github.io/blob/src/content/notebooks/2017-10-16-masked-arrays.ipynb\">nbviewer</a>.</p>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "HTML(html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}