{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ede102be",
   "metadata": {},
   "source": [
    "# Introduction to Jupyter Notebooks and Pandas\n",
    "\n",
    "## What is a Jupyter Notebook? 📓\n",
    "\n",
    "- A [Jupyter](https://jupyter.org/) notebook is a document that can contain live code w/ results, visualizations, and rich text.\n",
    "- It is widely used in data science and analytics.\n",
    "- A Jupyter notebook has a `.ipynb` file extension (e.g., `my_notebook.ipynb`).\n",
    "- A Jupyter notebook is a list of cells.\n",
    "\n",
    "### How do you run a Jupyter notebook?\n",
    "\n",
    "A Jupyter notebook can be run in one of the following Jupyter environments.\n",
    "\n",
    "1. Jupyter Notebook - original web application for creating and sharing computational documents\n",
    "2. JupyterLab - a web-based development environment for notebooks (considered a newer version of the Jupyter Notebook)\n",
    "3. [Google Colab](https://colab.research.google.com/) - Google's cloud notebook platform built on top of [Jupyter](https://jupyter.org/) environment\n",
    "\n",
    "The first two environments require installations on your local machine or a server. We will use Google Colab as you can run it inside a cloud environment.\n",
    "\n",
    "### Types of cells\n",
    "\n",
    "Every cell in a Jupyter notebook is of a specific type. The list of supported types vary by Jupyter environment.\n",
    "\n",
    "Google Colab supports two types of cells.\n",
    "\n",
    "1. Code cell\n",
    "2. Text cell (also known as a Markdown cell)\n",
    "\n",
    "The cell below is a *code* cell. It contains a block of executable code.\n",
    "\n",
    "Run the code below by clicking on the cell below and clicking the \"Run\" icon (▶)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22a8ee16",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(10 + 20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36734cbb",
   "metadata": {
    "id": "DiznwS2xem7h"
   },
   "source": [
    "▶️ Run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9b06ef4c",
   "metadata": {
    "id": "xOFvIip0em7h"
   },
   "outputs": [],
   "source": [
    "import unittest\n",
    "tc = unittest.TestCase()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1613fdd6",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 1: Find the sum of a list\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Complete the code cell below to find the sum of all values in `my_list`.\n",
    "- ✔️ Store the result in a new variable named `result`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d097c610",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "478\n"
     ]
    }
   ],
   "source": [
    "my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]\n",
    "\n",
    "### BEGIN SOLUTION\n",
    "result = 0\n",
    "\n",
    "for num in my_list:\n",
    "    result = result + num\n",
    "### END SOLUTION\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2538ff9",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix any incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1b03a629",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Challenge 1 Autograder\n",
    "import unittest\n",
    "\n",
    "tc = unittest.TestCase()\n",
    "\n",
    "tc.assertEqual(result, 478)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d571480",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Introduction to Pandas\n",
    "\n",
    "Pandas is a Python *library* for data manipulation and analysis. Although it's used universally in data-related programming applications, it was initially developed for financial analysis by [AQR Capital Management](https://www.aqr.com/).\n",
    "\n",
    "Note: A *library* in the context of programming is a collection of functions (and other data) that others have already written for you.\n",
    "\n",
    "Pandas is popular for many reasons:\n",
    "\n",
    "1. 🏃🏿‍♀️ It's fast (for most cases where the dataset can be loaded to your memory).\n",
    "2. 🪒 It supports most of the features required for data manipulation.\n",
    "3. 💡 Write less code. Get more done."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ecc6d53",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 2: Import packages\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Import the following Python packages.\n",
    "    1. `pandas`: Use alias `pd`.\n",
    "    2. `numpy`: Use alias `np`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "26fb5cd6",
   "metadata": {
    "id": "jfnhv8_Yem7j"
   },
   "outputs": [],
   "source": [
    "### BEGIN SOLUTION\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1448b30e",
   "metadata": {
    "id": "WCOuwkrzem7j"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "74263691",
   "metadata": {
    "id": "WQ-COKL8em7k"
   },
   "outputs": [],
   "source": [
    "# Challenge 2 Autograder\n",
    "import sys\n",
    "tc.assertTrue(\"pd\" in globals(), \"Check whether you have correctly import Pandas with an alias.\")\n",
    "tc.assertTrue(\"np\" in globals(), \"Check whether you have correctly import NumPy with an alias.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39618ef0",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### It all starts with a `Series`...\n",
    "\n",
    "The basic building block of Pandas is a `Series`. A `Series` is like a list, but with many more features.\n",
    "\n",
    "You can create a `Series` by passing a list of values to `pd.Series()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "711ee851",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    3.0\n",
       "3    NaN\n",
       "4    5.0\n",
       "5    6.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, np.nan, 5, 6])\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa5ff66d",
   "metadata": {},
   "source": [
    "### Few things to note here\n",
    "\n",
    "1. These look similar to a Python `list`.\n",
    "2. The last line of the printed output tells us the data type of values in the `Series` (`dtype: float64`).\n",
    "- What the heck is `np.nan`?\n",
    "    - It is used to indicate a \"missing value\".\n",
    "    - `np.nan` is NOT the same as `0`.\n",
    "    \n",
    "### Differences between a list and a Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4f64e65c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 1, 2, 3, 4]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "my_list = [1, 2, 3, 4]\n",
    "\n",
    "print(type(my_list))\n",
    "display(my_list * 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3aa5a515",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.series.Series'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    4\n",
       "2    6\n",
       "3    8\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "my_series = pd.Series([1, 2, 3, 4])\n",
    "\n",
    "print(type(my_series))\n",
    "display(my_series * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85a52ee0",
   "metadata": {},
   "source": [
    "What happens when you multiply a Python `list` by number `2`? It repeats the elements.\n",
    "\n",
    "How about a `Series`? It multiples each element by `2`!"
   ]
  },
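  {
   "cell_type": "markdown",
   "id": "c7a91e04",
   "metadata": {},
   "source": [
    "To make the contrast concrete, here is a small sketch. Doubling each element of a plain `list` takes a loop or a list comprehension, while a `Series` does it in one expression. The names `plain_list`, `repeated`, `doubled_list`, and `doubled_series` are made up for this illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "plain_list = [1, 2, 3, 4]\n",
    "\n",
    "# Multiplying a list by 2 repeats its elements\n",
    "repeated = plain_list * 2\n",
    "\n",
    "# Doubling each element of a list takes a list comprehension\n",
    "doubled_list = [value * 2 for value in plain_list]\n",
    "\n",
    "# A Series applies the multiplication to every element\n",
    "doubled_series = pd.Series(plain_list) * 2\n",
    "\n",
    "print(repeated)\n",
    "print(doubled_list)\n",
    "print(doubled_series.tolist())\n",
    "```"
   ]
  },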
  {
   "cell_type": "markdown",
   "id": "68d7595e",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 3: Create new `Series`\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Create a new Pandas `Series` named `my_series` with the following three values: `10`, `20`, `30`.\n",
    "\n",
    "#### 🚀 Hint\n",
    "\n",
    "The code below creates a new Pandas `Series` with the values `1` and `2`.\n",
    "\n",
    "```python\n",
    "my_new_series = pd.Series([1, 2])\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b4d0611a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10\n",
       "1    20\n",
       "2    30\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "my_series = pd.Series([10, 20, 30])\n",
    "### END SOLUTION\n",
    "\n",
    "my_series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70ac9be0",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix any incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f16ea8b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Challenge 2 Autograder\n",
    "pd.testing.assert_series_equal(my_series, pd.Series([1, 2, 3]) * 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b84f0899",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### Using `Series` methods\n",
    "\n",
    "A pandas `Series` is similar to a Python `list`. However, a `Series` provides many methods (equivalent to functions) for you to use.\n",
    "\n",
    "As an example, `num_reviews.mean()` will return the average number of reviews."
   ]
  },
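  {
   "cell_type": "markdown",
   "id": "e52b7d19",
   "metadata": {},
   "source": [
    "A minimal sketch of a few `Series` methods, assuming `num_reviews` holds the review counts listed below. `mean()`, `max()`, `min()`, and `sum()` are standard `Series` methods; the result variable names are made up for this illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "num_reviews = pd.Series([12715, 2274, 2771, 3952, 528, 2766, 724])\n",
    "\n",
    "# Each method summarizes the whole Series in a single call\n",
    "average_reviews = num_reviews.mean()\n",
    "largest = num_reviews.max()\n",
    "smallest = num_reviews.min()\n",
    "total = num_reviews.sum()\n",
    "\n",
    "print(average_reviews, largest, smallest, total)\n",
    "```\n",
    "\n",
    "With a plain `list`, the average would have to be computed by hand, for example with `sum(values) / len(values)`."
   ]
  },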
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "44cda202",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    12715\n",
      "1     2274\n",
      "2     2771\n",
      "3     3952\n",
      "4      528\n",
      "5     2766\n",
      "6      724\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "reviews_count = [12715, 2274, 2771, 3952, 528, 2766, 724]\n",
    "num_reviews = pd.Series(reviews_count)\n",
    "\n",
    "print(num_reviews)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}