{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "init_cell": true,
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<!-- The customized css for the slides -->\n",
       "<link rel=\"stylesheet\" type=\"text/css\" href=\"../../assets/styles/basic.css\"/>\n",
       "<link rel=\"stylesheet\" type=\"text/css\" href=\"../../assets/styles/python-programming-basic.css\"/>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<!-- The customized css for the slides -->\n",
    "<link rel=\"stylesheet\" type=\"text/css\" href=\"../../assets/styles/basic.css\"/>\n",
    "<link rel=\"stylesheet\" type=\"text/css\" href=\"../../assets/styles/python-programming-basic.css\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Install the necessary dependencies\n",
    "\n",
    "import os\n",
    "import sys\n",
    "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "source": [
    "---\n",
    "license:\n",
    "    code: MIT\n",
    "    content: CC-BY-4.0\n",
    "github: https://github.com/ocademy-ai/machine-learning\n",
    "venue: By Ocademy\n",
    "open_access: true\n",
    "bibliography:\n",
    "  - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n",
    "---"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "notebookRunGroups": {
     "groupValue": ""
    },
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# NumPy and Pandas"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 1. NumPy\n",
    "\n",
    "Numpy is a library for working with tensors, i.e. multi-dimensional arrays. Array has values of the same underlying type, and it is simpler than dataframe, but it offers more mathematical operations, and creates less overhead.\n",
    "\n",
    "<div class=\"alignRight\">\n",
    "  <img class=\"removeMargin\" height=\"100\" src=\"../images/numpy.png\"/>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### The basics of NumPy arrays\n",
    "\n",
    "Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array.\n",
    "\n",
    "We’ll cover a few categories of basic array manipulations here:\n",
    "\n",
    "- Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.\n",
    "- Indexing of arrays: Getting and setting the value of individual array elements.\n",
    "- Slicing of arrays: Getting and setting smaller subarrays within a larger array.\n",
    "- Reshaping of arrays: Changing the shape of a given array Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### NumPy array attributes\n",
    "\n",
    "First let’s discuss some useful array attributes. We’ll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "np.random.seed(0)  # seed for reproducibility\n",
    "\n",
    "x1 = np.random.randint(10, size=6)  # One-dimensional array\n",
    "x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array\n",
    "x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x3 ndim:  3\n",
      "x3 shape: (3, 4, 5)\n",
      "x3 size:  60\n"
     ]
    }
   ],
   "source": [
    "print(\"x3 ndim: \", x3.ndim)\n",
    "print(\"x3 shape:\", x3.shape)\n",
    "print(\"x3 size: \", x3.size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(\"dtype:\", x3.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "itemsize: 8 bytes\n",
      "nbytes: 480 bytes\n"
     ]
    }
   ],
   "source": [
    "print(\"itemsize:\", x3.itemsize, \"bytes\")\n",
    "print(\"nbytes:\", x3.nbytes, \"bytes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Computation on NumPy arrays: universal functions\n",
    "\n",
    "Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use vectorized operations, generally implemented through NumPy’s universal functions (ufuncs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "np.random.seed(0)\n",
    "\n",
    "\n",
    "def compute_reciprocals(values):\n",
    "    output = np.empty(len(values))\n",
    "    for i in range(len(values)):\n",
    "        output[i] = 1.0 / values[i]\n",
    "    return output\n",
    "\n",
    "\n",
    "values = np.random.randint(1, 10, size=5)\n",
    "compute_reciprocals(values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "cell_style": "split",
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.87 s ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "big_array = np.random.randint(1, 100, size=1000000)\n",
    "%timeit compute_reciprocals(big_array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.16666667 1.         0.25       0.25       0.125     ]\n",
      "[0.16666667 1.         0.25       0.25       0.125     ]\n"
     ]
    }
   ],
   "source": [
    "print(compute_reciprocals(values))\n",
    "print(1.0 / values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "606 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit (1.0 / big_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exploring NumPy’s ufuncs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x     = [0 1 2 3]\n",
      "x + 5 = [5 6 7 8]\n",
      "x - 5 = [-5 -4 -3 -2]\n",
      "x * 2 = [0 2 4 6]\n",
      "x / 2 = [0.  0.5 1.  1.5]\n",
      "x // 2 = [0 0 1 1]\n"
     ]
    }
   ],
   "source": [
    "x = np.arange(4)\n",
    "print(\"x     =\", x)\n",
    "print(\"x + 5 =\", x + 5)\n",
    "print(\"x - 5 =\", x - 5)\n",
    "print(\"x * 2 =\", x * 2)\n",
    "print(\"x / 2 =\", x / 2)\n",
    "print(\"x // 2 =\", x // 2)  # floor division"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Aggregations: min, max, and everything in between\n",
    "\n",
    "Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50.461758453195614"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "L = np.random.random(100)\n",
    "sum(L)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50.46175845319564"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.sum(L)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "74.9 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
      "193 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "big_array = np.random.rand(1000000)\n",
    "%timeit sum(big_array)\n",
    "%timeit np.sum(big_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Minimum and maximum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7.071203171893359e-07, 0.9999997207656334)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(big_array), max(big_array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7.071203171893359e-07, 0.9999997207656334)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.min(big_array), np.max(big_array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "54.5 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
      "175 µs ± 2.59 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit min(big_array)\n",
    "%timeit np.min(big_array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7.071203171893359e-07 0.9999997207656334 500216.8034810001\n"
     ]
    }
   ],
   "source": [
    "print(big_array.min(), big_array.max(), big_array.sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Computation on arrays: broadcasting\n",
    "\n",
    "Another means of vectorizing operations is to use NumPy’s broadcasting functionality. Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 6, 7])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = np.array([0, 1, 2])\n",
    "b = np.array([5, 5, 5])\n",
    "a + b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 6, 7])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a + 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 1., 1.],\n",
       "       [1., 1., 1.],\n",
       "       [1., 1., 1.]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M = np.ones((3, 3))\n",
    "M"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 2., 3.],\n",
       "       [1., 2., 3.],\n",
       "       [1., 2., 3.]])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M + a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Structured data: NumPy’s structured arrays\n",
    "\n",
    "While often our data can be well represented by a homogeneous array of values, sometimes this is not the case. This section demonstrates the use of NumPy’s structured arrays and record arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "name = ['Alice', 'Bob', 'Cathy', 'Doug']\n",
    "age = [25, 45, 37, 19]\n",
    "weight = [55.0, 85.5, 68.0, 61.5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]\n"
     ]
    }
   ],
   "source": [
    "# Use a compound data type for structured arrays\n",
    "data = np.zeros(4, dtype={\n",
    "    'names': ('name', 'age', 'weight'),\n",
    "    'formats': ('U10', 'i4', 'f8')\n",
    "})\n",
    "print(data.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )\n",
      " ('Doug', 19, 61.5)]\n"
     ]
    }
   ],
   "source": [
    "data['name'] = name\n",
    "data['age'] = age\n",
    "data['weight'] = weight\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get all names\n",
    "data['name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('Alice', 25, 55.)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get first row of data\n",
    "data[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Doug'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the name from the last row\n",
    "data[-1]['name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Alice', 'Doug'], dtype='<U10')"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get names where age is under 30\n",
    "data[data['age'] < 30]['name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This section on structured and record arrays is purposely at the end of this chapter, because it leads so well into the next package we will cover: Pandas."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 2. Pandas\n",
    "\n",
    "Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,\n",
    "built on top of the Python programming language.\n",
    "\n",
    "<div class=\"alignRight\">\n",
    "  <img class=\"removeMargin\" height=\"100\" src=\"../images/pandas.png\"/>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Introducing Pandas objects\n",
    "\n",
    "At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### The Pandas Series object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.25\n",
       "1    0.50\n",
       "2    0.75\n",
       "3    1.00\n",
       "dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = pd.Series([0.25, 0.5, 0.75, 1.0])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.25, 0.5 , 0.75, 1.  ])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=4, step=1)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    0.50\n",
       "2    0.75\n",
       "dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[1]\n",
    "data[1:3]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Data indexing and selection\n",
    "\n",
    "In Pandas, we looked in detail at methods and tools to access, set, and modify values in NumPy arrays. These included indexing (e.g., `arr[2, 1]`), slicing (e.g., `arr[:, 1:5]`), masking (e.g., `arr[arr > 0]`), fancy indexing (e.g., `arr[0, [1, 5]]`), and combinations thereof (e.g., `arr[:, [1, 5]]`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Data selection in Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0.25\n",
       "b    0.50\n",
       "c    0.75\n",
       "d    1.00\n",
       "dtype: float64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series(\n",
    "    [0.25, 0.5, 0.75, 1.0],\n",
    "    index=['a', 'b', 'c', 'd']\n",
    ")\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['b']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'a' in data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['a', 'b', 'c', 'd'], dtype='object')"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(data.items())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    a\n",
       "3    b\n",
       "5    c\n",
       "dtype: object"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3    b\n",
       "5    c\n",
       "dtype: object"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.iloc[1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a'"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.loc[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Data selection in DataFrame\n",
    "\n",
    "Recall that a `DataFrame` acts in many ways like a two-dimensional or structured array, and in other ways like a dictionary of `Series` structures sharing the same index. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>423967</td>\n",
       "      <td>38332521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>695662</td>\n",
       "      <td>26448193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>141297</td>\n",
       "      <td>19651127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>170312</td>\n",
       "      <td>19552860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>149995</td>\n",
       "      <td>12882135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              area       pop\n",
       "California  423967  38332521\n",
       "Texas       695662  26448193\n",
       "New York    141297  19651127\n",
       "Florida     170312  19552860\n",
       "Illinois    149995  12882135"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "area = pd.Series({\n",
    "    'California': 423967, 'Texas': 695662,\n",
    "    'New York': 141297, 'Florida': 170312,\n",
    "    'Illinois': 149995\n",
    "})\n",
    "pop = pd.Series({\n",
    "    'California': 38332521, 'Texas': 26448193,\n",
    "    'New York': 19651127, 'Florida': 19552860,\n",
    "    'Illinois': 12882135\n",
    "})\n",
    "data = pd.DataFrame({'area': area, 'pop': pop})\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    423967\n",
       "Texas         695662\n",
       "New York      141297\n",
       "Florida       170312\n",
       "Illinois      149995\n",
       "Name: area, dtype: int64"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['area']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    423967\n",
       "Texas         695662\n",
       "New York      141297\n",
       "Florida       170312\n",
       "Illinois      149995\n",
       "Name: area, dtype: int64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.area"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.area is data['area']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### DataFrame as two-dimensional array\n",
    "\n",
    "As mentioned previously, we can also view the `DataFrame` as an enhanced two-dimensional array. We can examine the raw underlying data array using the `values` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  423967, 38332521],\n",
       "       [  695662, 26448193],\n",
       "       [  141297, 19651127],\n",
       "       [  170312, 19552860],\n",
       "       [  149995, 12882135]])"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>California</th>\n",
       "      <th>Texas</th>\n",
       "      <th>New York</th>\n",
       "      <th>Florida</th>\n",
       "      <th>Illinois</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>area</th>\n",
       "      <td>423967</td>\n",
       "      <td>695662</td>\n",
       "      <td>141297</td>\n",
       "      <td>170312</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pop</th>\n",
       "      <td>38332521</td>\n",
       "      <td>26448193</td>\n",
       "      <td>19651127</td>\n",
       "      <td>19552860</td>\n",
       "      <td>12882135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      California     Texas  New York   Florida  Illinois\n",
       "area      423967    695662    141297    170312    149995\n",
       "pop     38332521  26448193  19651127  19552860  12882135"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>423967</td>\n",
       "      <td>38332521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>695662</td>\n",
       "      <td>26448193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>141297</td>\n",
       "      <td>19651127</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              area       pop\n",
       "California  423967  38332521\n",
       "Texas       695662  26448193\n",
       "New York    141297  19651127"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.iloc[:3, :2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>423967</td>\n",
       "      <td>38332521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>695662</td>\n",
       "      <td>26448193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>141297</td>\n",
       "      <td>19651127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>170312</td>\n",
       "      <td>19552860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>149995</td>\n",
       "      <td>12882135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              area       pop\n",
       "California  423967  38332521\n",
       "Texas       695662  26448193\n",
       "New York    141297  19651127\n",
       "Florida     170312  19552860\n",
       "Illinois    149995  12882135"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.loc[:'Illinois', :'pop']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Operating on data in Pandas\n",
    "\n",
    "Pandas inherits much of this functionality from NumPy, and the ufuncs are key to this."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Ufuncs: index preservation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    6\n",
       "1    3\n",
       "2    7\n",
       "3    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.RandomState(42)\n",
    "ser = pd.Series(rng.randint(0, 10, 4))\n",
    "ser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B  C  D\n",
       "0  6  9  2  6\n",
       "1  7  4  3  7\n",
       "2  7  2  5  4"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    rng.randint(0, 10, (3, 4)),\n",
    "    columns=['A', 'B', 'C', 'D']\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     403.428793\n",
       "1      20.085537\n",
       "2    1096.633158\n",
       "3      54.598150\n",
       "dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(ser)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Handling missing data\n",
    "\n",
    "Real-world data is rarely clean and homogeneous.\n",
    "\n",
    "Pandas chose to use sentinels for missing data, and further chose to use two already-existing Python null values: the special floating-point `NaN` value, and the Python `None` object."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### `None`: pythonic missing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, None, 3, 4], dtype=object)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vals1 = np.array([1, None, 3, 4])\n",
    "vals1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This `dtype=object` means that the best common type representation NumPy could infer for the contents of the array is that they are Python objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dtype = object\n",
      "50.6 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n",
      "\n",
      "dtype = int\n",
      "589 µs ± 5.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for dtype in ['object', 'int']:\n",
    "    print(\"dtype =\", dtype)\n",
    "    %timeit np.arange(1E6, dtype=dtype).sum()\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for +: 'int' and 'NoneType'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m/Users/zhangqi/ws/machine-learning/open-machine-learning-jupyter-book/slides/python-programming/python-in-data-science.ipynb Cell 79\u001b[0m in \u001b[0;36m<cell line: 1>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> <a href='vscode-notebook-cell:/Users/zhangqi/ws/machine-learning/open-machine-learning-jupyter-book/slides/python-programming/python-in-data-science.ipynb#Y212sZmlsZQ%3D%3D?line=0'>1</a>\u001b[0m vals1\u001b[39m.\u001b[39;49msum()\n",
      "File \u001b[0;32m/usr/local/lib/python3.9/site-packages/numpy/core/_methods.py:48\u001b[0m, in \u001b[0;36m_sum\u001b[0;34m(a, axis, dtype, out, keepdims, initial, where)\u001b[0m\n\u001b[1;32m     46\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_sum\u001b[39m(a, axis\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, dtype\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, out\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m, keepdims\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m,\n\u001b[1;32m     47\u001b[0m          initial\u001b[39m=\u001b[39m_NoValue, where\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m):\n\u001b[0;32m---> 48\u001b[0m     \u001b[39mreturn\u001b[39;00m umr_sum(a, axis, dtype, out, keepdims, initial, where)\n",
      "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'int' and 'NoneType'"
     ]
    }
   ],
   "source": [
    "vals1.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### NaN: missing numerical data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('float64')"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vals2 = np.array([1, np.nan, 3, 4]) \n",
    "vals2.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Notice that NumPy chose a native floating-point type for this array: this means that unlike the object array from before, this array supports fast operations pushed into compiled code. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "590 µs ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%timeit np.arange(1E6, dtype=dtype).sum()\n",
    "print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vals2.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### NaN and None in Pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    NaN\n",
       "2    2.0\n",
       "3    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series([1, np.nan, 2, None])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = pd.Series(range(2), dtype=int)\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    NaN\n",
       "1    1.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[0] = None\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Operating on null values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "data = pd.Series([1, np.nan, 'hello', None])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1     True\n",
       "2    False\n",
       "3     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        1\n",
       "2    hello\n",
       "dtype: object"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data.notnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        1\n",
       "2    hello\n",
       "dtype: object"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1.0\n",
       "b    NaN\n",
       "c    2.0\n",
       "d    NaN\n",
       "e    3.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))\n",
    "data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1.0\n",
       "b    0.0\n",
       "c    2.0\n",
       "d    0.0\n",
       "e    3.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = data.fillna(0)\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Combining datasets: concat and append\n",
    "\n",
    "Some of the most interesting studies of data come from combining different data sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>C1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C\n",
       "0  A0  B0  C0\n",
       "1  A1  B1  C1\n",
       "2  A2  B2  C2"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def make_df(cols, ind):\n",
    "    \"\"\"Quickly make a DataFrame\"\"\"\n",
    "    data = {c: [str(c) + str(i) for i in ind]\n",
    "            for c in cols}\n",
    "    return pd.DataFrame(data, ind)\n",
    "\n",
    "\n",
    "# example DataFrame\n",
    "make_df('ABC', range(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Simple concatenation with pd.concat\n",
    "\n",
    "Pandas has a function, `pd.concat()`, which has a similar syntax to `np.concatenate` but contains a number of options that we’ll discuss momentarily:\n",
    "\n",
    "```\n",
    "pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,\n",
    "          keys=None, levels=None, names=None, verify_integrity=False,\n",
    "          copy=True)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    A\n",
       "2    B\n",
       "3    C\n",
       "dtype: object"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])\n",
    "ser1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4    D\n",
       "5    E\n",
       "6    F\n",
       "dtype: object"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])\n",
    "ser2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    A\n",
       "2    B\n",
       "3    C\n",
       "4    D\n",
       "5    E\n",
       "6    F\n",
       "dtype: object"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([ser1, ser2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Combining datasets: merge and join\n",
    "\n",
    "One essential feature offered by Pandas is its high-performance, in-memory join and merge operations.\n",
    "\n",
    "The `pd.merge()` function implements a number of types of joins: the *one-to-one*, *many-to-one*, and *many-to-many* joins."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>employee</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Bob</td>\n",
       "      <td>Accounting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jake</td>\n",
       "      <td>Engineering</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Lisa</td>\n",
       "      <td>Engineering</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sue</td>\n",
       "      <td>HR</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  employee        group\n",
       "0      Bob   Accounting\n",
       "1     Jake  Engineering\n",
       "2     Lisa  Engineering\n",
       "3      Sue           HR"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.DataFrame({\n",
    "    'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],\n",
    "    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']\n",
    "})\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>employee</th>\n",
       "      <th>hire_date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Lisa</td>\n",
       "      <td>2004</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jake</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sue</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  employee  hire_date\n",
       "0     Lisa       2004\n",
       "1      Bob       2008\n",
       "2     Jake       2012\n",
       "3      Sue       2014"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame({\n",
    "    'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],\n",
    "    'hire_date': [2004, 2008, 2012, 2014]\n",
    "})\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>employee</th>\n",
       "      <th>group</th>\n",
       "      <th>hire_date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Bob</td>\n",
       "      <td>Accounting</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jake</td>\n",
       "      <td>Engineering</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Lisa</td>\n",
       "      <td>Engineering</td>\n",
       "      <td>2004</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sue</td>\n",
       "      <td>HR</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  employee        group  hire_date\n",
       "0      Bob   Accounting       2008\n",
       "1     Jake  Engineering       2012\n",
       "2     Lisa  Engineering       2004\n",
       "3      Sue           HR       2014"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = pd.merge(df1, df2)\n",
    "df3"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 3. Your turn! 🚀\n",
    "\n",
    "[Data processing in Python](../../data-science/working-with-data/numpy.html#your-turn)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 4. References\n",
    "\n",
    "1. [Working with data - Numpy](https://ocademy-ai.github.io/machine-learning/data-science/working-with-data/numpy.html)\n",
    "2. [Working with data - Pandas](https://ocademy-ai.github.io/machine-learning/data-science/working-with-data/pandas.html)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "init_cell": "run_on_kernel_ready",
  "jupytext": {
   "formats": "ipynb"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "rise": {
   "autolaunch": true,
   "chalkboard": {
    "color": [
     "rgb(250, 250, 250)",
     "rgb(250, 250, 250)"
    ]
   },
   "enable_chalkboard": true,
   "header": "<div class=\"logo header\"><img class=\"removeMargin\" src=\"../images/logo.png\" width=\"50\"/><b class=\"headerText\" style=\"margin-left:10px;\">Ocademy</b></div>",
   "scroll": true
  },
  "vscode": {
   "interpreter": {
    "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}