{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "abb58d0b",
   "metadata": {},
   "source": [
    "# Introduction to Jupyter Notebooks and Pandas\n",
    "\n",
    "## What is a Jupyter Notebook?\n",
    "\n",
    "A [Jupyter](https://jupyter.org/) notebook is a document that can contain live code w/ results, visualizations, and rich text. It is widely used in data science and analytics. The cell below is a *code* cell. It contains a block of executable code.\n",
    "\n",
    "Run the code below by clicking on the cell below and clicking the \"Run\" button on top (▶)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "7cccbac8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30\n"
     ]
    }
   ],
   "source": [
    "print(10 + 20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d085a2a1",
   "metadata": {
    "id": "DiznwS2xem7h"
   },
   "source": [
    "▶️ Run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7c2705fe",
   "metadata": {
    "id": "xOFvIip0em7h"
   },
   "outputs": [],
   "source": [
    "import unittest\n",
    "tc = unittest.TestCase()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f0a04f0",
   "metadata": {},
   "source": [
    "## Types of cells\n",
    "\n",
    "There are three different type of cells.\n",
    "\n",
    "1. Code cell\n",
    "2. Markdown cell\n",
    "3. Raw cell\n",
    "\n",
    "We will most frequently use the first two types of cells."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1de8ab20",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 1: Find the sum of a list\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Complete the code cell below to find the sum of all values in `my_list`.\n",
    "- ✔️ Store the result in a new variable named `result`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1b5e79e0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "478\n"
     ]
    }
   ],
   "source": [
    "my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]\n",
    "\n",
    "### BEGIN SOLUTION\n",
    "result = 0\n",
    "\n",
    "for num in my_list:\n",
    "    result = result + num\n",
    "### END SOLUTION\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdd9c853",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix any incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b2d60d4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import unittest\n",
    "\n",
    "tc = unittest.TestCase()\n",
    "\n",
    "tc.assertEqual(result, 478)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbb0c241",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Introduction to Pandas\n",
    "\n",
    "Pandas is a Python *library* for data manipulation and analysis. Although it's used universally in data-related programming applications, it was initially developed for financial analysis by [AQR Capital Management](https://www.aqr.com/).\n",
    "\n",
    "![Pandas logo](https://github.com/bdi475/notebooks/blob/main/images/pandas-logo.png?raw=true)\n",
    "\n",
    "Note: A *library* in the context of programming is a collection of functions (and other data) that others have already written for you.\n",
    "\n",
    "Pandas is popular for many reasons:\n",
    "\n",
    "1. 🏃🏿‍♀️ It's fast (for most cases where the dataset can be loaded to your memory).\n",
    "2. 🪒 It supports most of the features required for data manipulation.\n",
    "3. 💡 Write less code. Get more done."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2485fa9",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 2: Import packages\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Import the following Python packages.\n",
    "    1. `pandas`: Use alias `pd`.\n",
    "    2. `numpy`: Use alias `np`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1a931be7",
   "metadata": {
    "id": "jfnhv8_Yem7j"
   },
   "outputs": [],
   "source": [
    "### BEGIN SOLUTION\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "511e1493",
   "metadata": {
    "id": "WCOuwkrzem7j"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a8041cd8",
   "metadata": {
    "id": "WQ-COKL8em7k"
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "tc.assertTrue(\"pd\" in globals(), \"Check whether you have correctly import Pandas with an alias.\")\n",
    "tc.assertTrue(\"np\" in globals(), \"Check whether you have correctly import NumPy with an alias.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49f78e8f",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### It all starts with a `Series`...\n",
    "\n",
    "The basic building block of Pandas is a `Series`. A `Series` is like a list, but with many more features.\n",
    "\n",
    "You can create a `Series` by passing a list of values to `pd.Series()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "06b980fe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    3.0\n",
       "3    NaN\n",
       "4    5.0\n",
       "5    6.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, np.nan, 5, 6])\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50cc82bb",
   "metadata": {},
   "source": [
    "### Few things to note here\n",
    "\n",
    "1. These look similar to a Python `list`.\n",
    "2. The last line of the printed output tells us the data type of values in the `Series` (`dtype: float64`).\n",
    "- What the heck is `np.nan`?\n",
    "    - It is used to indicate a \"missing value\".\n",
    "    - `np.nan` is NOT the same as `0`.\n",
    "    \n",
    "### Differences between a list and a Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "372f1887",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 1, 2, 3, 4]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "my_list = [1, 2, 3, 4]\n",
    "\n",
    "print(type(my_list))\n",
    "display(my_list * 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0b35e389",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.series.Series'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    4\n",
       "2    6\n",
       "3    8\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "my_series = pd.Series([1, 2, 3, 4])\n",
    "\n",
    "print(type(my_series))\n",
    "display(my_series * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84ee75dd",
   "metadata": {},
   "source": [
    "What happens when you multiply a Python `list` by number `2`? It repeats the elements.\n",
    "\n",
    "How about a `Series`? It multiples each element by `2`!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff91c121",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 3: Create new `Series`\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Create a new Pandas `Series` named `my_series` with the following three values: `10`, `20`, `30`.\n",
    "\n",
    "#### 🚀 Hint\n",
    "\n",
    "The code below creates a new Pandas `Series` with the values `1` and `2`.\n",
    "\n",
    "```python\n",
    "my_new_series = pd.Series([1, 2])\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "aaa01f50",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10\n",
       "1    20\n",
       "2    30\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "my_series = pd.Series([10, 20, 30])\n",
    "### END SOLUTION\n",
    "\n",
    "my_series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed26cf6d",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix any incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "91ae33ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.testing.assert_series_equal(my_series, pd.Series([1, 2, 3]) * 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3870734e",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### Using `Series` methods\n",
    "\n",
    "A pandas `Series` is similar to a Python `list`. However, a `Series` provides many methods (equivalent to functions) for you to use.\n",
    "\n",
    "As an example, `num_reviews.mean()` will return the average number of reviews."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "1982fc6a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ellipsis"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_count = [12715, 2274, 2771, 3952, 528, 2766, 724]\n",
    "num_reviews = pd.Series(reviews_count)\n",
    "\n",
    "# YOUR CODE HERE\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "370ca8b3-368f-460a-80ff-f19a0a6be150",
   "metadata": {
    "id": "Dg-bBkF8fGaN"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 4: Create a Pandas DataFrame\n",
    "\n",
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ You are given two lists - `product_names` and `num_reviews` that contain the names of make-up products and the number of reviews on Sephora.com.\n",
    "- ✔️ Using the two lists, create a new Pandas `DataFrame` named `df_top_products` that has the following two columns:\n",
    "    1. `product_name`: Names of the products\n",
    "    2. `num_review`: Number of reviews\n",
    "- ✔️ Note that the column names are singular.\n",
    "\n",
    "#### 🚀 Hint\n",
    "\n",
    "The code below creates a new Pandas `DataFrame` from two series.\n",
    "\n",
    "```python\n",
    "my_new_dataframe = pd.DataFrame({\n",
    "    \"column_one\": my_series1,\n",
    "    \"column_two\": my_series2\n",
    "})\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "9371b5a5-da1d-4ca5-9343-aa2040c996d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_name</th>\n",
       "      <th>num_review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Laneige Lip Sleeping Mask</td>\n",
       "      <td>12715</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The Ordinary Hyaluronic Acid 2% + B5</td>\n",
       "      <td>2274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Laneige Lip Glowy Balm</td>\n",
       "      <td>2766</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chanel COCO MADEMOISELLE Eau de Parfum</td>\n",
       "      <td>724</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             product_name  num_review\n",
       "0               Laneige Lip Sleeping Mask       12715\n",
       "1    The Ordinary Hyaluronic Acid 2% + B5        2274\n",
       "2                  Laneige Lip Glowy Balm        2766\n",
       "3  Chanel COCO MADEMOISELLE Eau de Parfum         724"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "product_names = [\n",
    "    \"Laneige Lip Sleeping Mask\",\n",
    "    \"The Ordinary Hyaluronic Acid 2% + B5\",\n",
    "    \"Laneige Lip Glowy Balm\",\n",
    "    \"Chanel COCO MADEMOISELLE Eau de Parfum\"\n",
    "]\n",
    "\n",
    "num_reviews = [\n",
    "    12715,\n",
    "    2274,\n",
    "    2766,\n",
    "    724\n",
    "]\n",
    "\n",
    "### BEGIN SOLUTION\n",
    "df_top_products = pd.DataFrame({\n",
    "    \"product_name\": product_names,\n",
    "    \"num_review\": num_reviews\n",
    "})\n",
    "### END SOLUTION\n",
    "\n",
    "display(df_top_products)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5c7b7b4-b116-4e62-bcd8-c7d8b7334e77",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix any incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "dba4d57c-2908-42d6-8f2e-6f4e3a88de0d",
   "metadata": {
    "id": "DXDG3nzpem7r",
    "nbgrader": {
     "grade": true,
     "grade_id": "challenge-02",
     "locked": true,
     "points": "1",
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "pd.testing.assert_frame_equal(\n",
    "    df_top_products.reset_index(drop=True),\n",
    "    pd.DataFrame({\"product_name\": {0: \"Laneige Lip Sleeping Mask\",\n",
    "        1: \"The Ordinary Hyaluronic Acid 2% + B5\",\n",
    "        2: \"Laneige Lip Glowy Balm\",\n",
    "        3: \"Chanel COCO MADEMOISELLE Eau de Parfum\"},\n",
    "        \"num_review\": {0: 12715, 1: 2274, 2: 2766, 3: 724}})\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c59589d5-8fca-42ac-b6eb-992d27206421",
   "metadata": {
    "id": "126Uw95nem7k"
   },
   "source": [
    "---\n",
    "\n",
    "### 📌 Load data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f4c723f-5eec-4c05-a937-b7cd601c97c1",
   "metadata": {
    "id": "kH43B35Mem7l"
   },
   "source": [
    "The second part of today's lecture is all about **you**. 👻 Literally.\n",
    "\n",
    "▶️ Run the code cell below to create a new `DataFrame` named `df_you`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a2cdcd9a-b41e-42f8-ad46-94a48f3628d2",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "CXJECgQiem7l",
    "outputId": "59ba80a2-892b-4e64-bbe9-1b8fc3a2880c"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>major1</th>\n",
       "      <th>major2</th>\n",
       "      <th>city</th>\n",
       "      <th>country</th>\n",
       "      <th>fav_restaurant</th>\n",
       "      <th>fav_movie</th>\n",
       "      <th>has_iphone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Ahana Chakraborty</td>\n",
       "      <td>Statistics</td>\n",
       "      <td>Business &amp; Informatics</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>USA</td>\n",
       "      <td>Poke Lab</td>\n",
       "      <td>Shrek 2</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Andrew Rozmus</td>\n",
       "      <td>Psychology</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Elmhurst</td>\n",
       "      <td>USA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Anusha Adira</td>\n",
       "      <td>Computer Engineering</td>\n",
       "      <td>Business</td>\n",
       "      <td>Cupertino</td>\n",
       "      <td>USA</td>\n",
       "      <td>Bangkok Thai</td>\n",
       "      <td>Three Idiots</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Arthur Pyptyuk</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business</td>\n",
       "      <td>Hoffman Estates</td>\n",
       "      <td>USA</td>\n",
       "      <td>Sakanaya</td>\n",
       "      <td>Hereditary</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Aryajit Das</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business &amp; Global Markets plus Society</td>\n",
       "      <td>Streamwood</td>\n",
       "      <td>USA</td>\n",
       "      <td>Dubai Grill</td>\n",
       "      <td>Transformers: Age of Extinction</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                name                major1  \\\n",
       "0  Ahana Chakraborty            Statistics   \n",
       "1      Andrew Rozmus            Psychology   \n",
       "2       Anusha Adira  Computer Engineering   \n",
       "3     Arthur Pyptyuk             Economics   \n",
       "4        Aryajit Das             Economics   \n",
       "\n",
       "                                   major2             city country  \\\n",
       "0                  Business & Informatics          Chicago     USA   \n",
       "1                                     NaN         Elmhurst     USA   \n",
       "2                                Business        Cupertino     USA   \n",
       "3                                Business  Hoffman Estates     USA   \n",
       "4  Business & Global Markets plus Society       Streamwood     USA   \n",
       "\n",
       "  fav_restaurant                        fav_movie  has_iphone  \n",
       "0       Poke Lab                          Shrek 2        True  \n",
       "1            NaN                              NaN        True  \n",
       "2   Bangkok Thai                     Three Idiots        True  \n",
       "3       Sakanaya                       Hereditary        True  \n",
       "4    Dubai Grill  Transformers: Age of Extinction        True  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_you = pd.read_csv(\"https://github.com/bdi475/datasets/raw/main/about-you.csv\")\n",
    "\n",
    "# Used to keep a clean copy\n",
    "df_you_backup = df_you.copy()\n",
    "\n",
    "# head() displays the first 5 rows of a DataFrame\n",
    "df_you.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93cb3aa1-6901-45ac-9355-734ace9759cd",
   "metadata": {
    "id": "t6vVg2dJem7m"
   },
   "source": [
    "☝️ **Hold on.** Didn't we always create `DataFrame`s using `pd.DataFrame()`?\n",
    "\n",
    "Yes. But we can *import* existing data as a Pandas `DataFrame` using `pd.read_csv()`. There are many other similar import methods. For now, we'll mostly use `pd.read_csv()`.\n",
    "\n",
    "The table below explains each column in `df_you`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "babad61d-efee-49d2-b6b4-a521bb4f9ba4",
   "metadata": {
    "id": "LI33A8-jem7m"
   },
   "source": [
    "| Column Name             | Description                                               |\n",
    "|-------------------------|-----------------------------------------------------------|\n",
    "| name                    | First name                                                |\n",
    "| major1                  | Major                                                     |\n",
    "| major2                  | Second major OR minor (blank if no second major or minor) |\n",
    "| city                    | City the person is from                                   |\n",
    "| country                 | Country the person is from                                   |\n",
    "| fav_restaurant          | Favorite restaurant (blank if no restaurant was given)    |\n",
    "| fav_movie               | Favorite movie (blank if no movie was given)              |\n",
    "| has_iphone              | Whether the person use an iPhone                          |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "930451c6-a001-4d5f-8537-6d7ca5002235",
   "metadata": {
    "id": "d5hE9oVSem7n"
   },
   "source": [
    "---\n",
    "\n",
    "### 📌 Concise summary of a `DataFrame`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "664d9b88-6b5e-4d97-8848-9ab423dec6fb",
   "metadata": {
    "id": "lg5s3hW2em7n"
   },
   "source": [
    "👉 A common first step in working with a `DataFrame` is to use the `info()` method. `info()` prints a concise summary of a `DataFrame`.\n",
    "- Index data type\n",
    "- Column information: for each column, the following information is displayed:\n",
    "    - Number of non-missing values\n",
    "    - Data type of the column\n",
    "- Memory usage"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c17d9d82-2330-4966-81d4-2c50dc6980df",
   "metadata": {
    "id": "v4myKhF8em7o"
   },
   "source": [
    "▶️ Run `df_you.info()` below to see the `info()` method in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "3e566e47-1a11-4110-ac18-9ee1beab9590",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UfyZRM7rem7o",
    "outputId": "891d9445-721f-4c60-a345-36b679815cf0",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 45 entries, 0 to 44\n",
      "Data columns (total 8 columns):\n",
      " #   Column          Non-Null Count  Dtype \n",
      "---  ------          --------------  ----- \n",
      " 0   name            45 non-null     object\n",
      " 1   major1          44 non-null     object\n",
      " 2   major2          36 non-null     object\n",
      " 3   city            35 non-null     object\n",
      " 4   country         37 non-null     object\n",
      " 5   fav_restaurant  33 non-null     object\n",
      " 6   fav_movie       31 non-null     object\n",
      " 7   has_iphone      45 non-null     bool  \n",
      "dtypes: bool(1), object(7)\n",
      "memory usage: 2.6+ KB\n"
     ]
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "df_you.info()\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e8abd55-1c5e-48e4-8029-c61805535a5f",
   "metadata": {
    "id": "prZjAYOUem7o"
   },
   "source": [
    "👉 From the result of `df_you.info()`, we can understand a couple of things:\n",
    "\n",
    "- There are 8 columns.\n",
    "- 7 out of 8 columns have the `object` data type.\n",
    "    - In Pandas, a string data type is shown as `object`, not `str`.\n",
    "        - We will skip the technical discussion for now.\n",
    "- The second line of the output tells us the number of rows (i.e., observations).\n",
    "- Some columns contain one or more missing values.\n",
    "    - Missing values are displayed as `NaN`.\n",
    "    - To denote a missing value, use NumPy's `np.nan` (more on this later)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec0a09ad-6180-4ea9-a8df-20c3aaf8e7a4",
   "metadata": {
    "id": "OJ4qqvKIem7q"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 5: Display first/last/random rows"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caec58af-5dcc-42e5-932a-38703bc25a66",
   "metadata": {
    "id": "TwWyyaQPem7q"
   },
   "source": [
    "▶️ Run `df_you.head()` to print the first 5 rows of `df_you`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5f802a73-5323-4184-827c-0bc5c9aa1469",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OoYuVAyhem7q",
    "outputId": "8eea66d1-ee93-464c-9889-346fd512d07a",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>major1</th>\n",
       "      <th>major2</th>\n",
       "      <th>city</th>\n",
       "      <th>country</th>\n",
       "      <th>fav_restaurant</th>\n",
       "      <th>fav_movie</th>\n",
       "      <th>has_iphone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Ahana Chakraborty</td>\n",
       "      <td>Statistics</td>\n",
       "      <td>Business &amp; Informatics</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>USA</td>\n",
       "      <td>Poke Lab</td>\n",
       "      <td>Shrek 2</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Andrew Rozmus</td>\n",
       "      <td>Psychology</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Elmhurst</td>\n",
       "      <td>USA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Anusha Adira</td>\n",
       "      <td>Computer Engineering</td>\n",
       "      <td>Business</td>\n",
       "      <td>Cupertino</td>\n",
       "      <td>USA</td>\n",
       "      <td>Bangkok Thai</td>\n",
       "      <td>Three Idiots</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Arthur Pyptyuk</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business</td>\n",
       "      <td>Hoffman Estates</td>\n",
       "      <td>USA</td>\n",
       "      <td>Sakanaya</td>\n",
       "      <td>Hereditary</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Aryajit Das</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business &amp; Global Markets plus Society</td>\n",
       "      <td>Streamwood</td>\n",
       "      <td>USA</td>\n",
       "      <td>Dubai Grill</td>\n",
       "      <td>Transformers: Age of Extinction</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                name                major1  \\\n",
       "0  Ahana Chakraborty            Statistics   \n",
       "1      Andrew Rozmus            Psychology   \n",
       "2       Anusha Adira  Computer Engineering   \n",
       "3     Arthur Pyptyuk             Economics   \n",
       "4        Aryajit Das             Economics   \n",
       "\n",
       "                                   major2             city country  \\\n",
       "0                  Business & Informatics          Chicago     USA   \n",
       "1                                     NaN         Elmhurst     USA   \n",
       "2                                Business        Cupertino     USA   \n",
       "3                                Business  Hoffman Estates     USA   \n",
       "4  Business & Global Markets plus Society       Streamwood     USA   \n",
       "\n",
       "  fav_restaurant                        fav_movie  has_iphone  \n",
       "0       Poke Lab                          Shrek 2        True  \n",
       "1            NaN                              NaN        True  \n",
       "2   Bangkok Thai                     Three Idiots        True  \n",
       "3       Sakanaya                       Hereditary        True  \n",
       "4    Dubai Grill  Transformers: Age of Extinction        True  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "df_you.head()\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f572157-3d4d-46c6-b46e-89349c8f464a",
   "metadata": {
    "id": "TwWyyaQPem7q"
   },
   "source": [
    "▶️ Run `df_you.tail(4)` to print the last 4 rows of `df_you`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d7b529c3-0972-4fe8-9106-4e56cf6b8315",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OoYuVAyhem7q",
    "outputId": "8eea66d1-ee93-464c-9889-346fd512d07a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>major1</th>\n",
       "      <th>major2</th>\n",
       "      <th>city</th>\n",
       "      <th>country</th>\n",
       "      <th>fav_restaurant</th>\n",
       "      <th>fav_movie</th>\n",
       "      <th>has_iphone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>Twinkle Yeruva</td>\n",
       "      <td>Computer Science</td>\n",
       "      <td>Business</td>\n",
       "      <td>Schaumburg</td>\n",
       "      <td>USA</td>\n",
       "      <td>Sticky Rice</td>\n",
       "      <td>Maze Runner</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Valentina Flores</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business &amp; French</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>USA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Book of Life</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>Victoria Hernandez</td>\n",
       "      <td>Industrial Design</td>\n",
       "      <td>Spanish</td>\n",
       "      <td>East Moline</td>\n",
       "      <td>USA</td>\n",
       "      <td>Bangkok Thai</td>\n",
       "      <td>Mamma Mia or Shrek</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>Vikas Chavda</td>\n",
       "      <td>Economics</td>\n",
       "      <td>Business</td>\n",
       "      <td>Geneva</td>\n",
       "      <td>USA</td>\n",
       "      <td>Yogi</td>\n",
       "      <td>Kingsman: Secret Service</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  name             major1              major2         city  \\\n",
       "41      Twinkle Yeruva   Computer Science            Business   Schaumburg   \n",
       "42    Valentina Flores          Economics  Business & French       Chicago   \n",
       "43  Victoria Hernandez  Industrial Design             Spanish  East Moline   \n",
       "44        Vikas Chavda          Economics            Business       Geneva   \n",
       "\n",
       "   country fav_restaurant                 fav_movie  has_iphone  \n",
       "41     USA    Sticky Rice               Maze Runner        True  \n",
       "42     USA            NaN              Book of Life        True  \n",
       "43     USA   Bangkok Thai        Mamma Mia or Shrek        True  \n",
       "44     USA           Yogi  Kingsman: Secret Service        True  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "df_you.tail(4)\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ae9815a-a5b7-4747-9175-9535b133e614",
   "metadata": {
    "id": "TwWyyaQPem7q"
   },
   "source": [
    "▶️ Run `df_you.sample(3)` to print 3 randomly sampled rows from `df_you`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "ed993bdb-cb01-4a0e-8071-861107d96fda",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OoYuVAyhem7q",
    "outputId": "8eea66d1-ee93-464c-9889-346fd512d07a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>major1</th>\n",
       "      <th>major2</th>\n",
       "      <th>city</th>\n",
       "      <th>country</th>\n",
       "      <th>fav_restaurant</th>\n",
       "      <th>fav_movie</th>\n",
       "      <th>has_iphone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Cole Jordan</td>\n",
       "      <td>Computer Science</td>\n",
       "      <td>Business</td>\n",
       "      <td>NaN</td>\n",
       "      <td>USA</td>\n",
       "      <td>Chipotle</td>\n",
       "      <td>Ratatouille</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Julia Kevin</td>\n",
       "      <td>Bioengineering</td>\n",
       "      <td>Business</td>\n",
       "      <td>Elmhurst</td>\n",
       "      <td>USA</td>\n",
       "      <td>KoFusion</td>\n",
       "      <td>Set it Up</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>Spencer Sadler</td>\n",
       "      <td>Computer Science</td>\n",
       "      <td>Business</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>USA</td>\n",
       "      <td>Bangkok Thai</td>\n",
       "      <td>Ratatouille</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              name            major1    major2      city country  \\\n",
       "8      Cole Jordan  Computer Science  Business       NaN     USA   \n",
       "21     Julia Kevin    Bioengineering  Business  Elmhurst     USA   \n",
       "38  Spencer Sadler  Computer Science  Business   Chicago     USA   \n",
       "\n",
       "   fav_restaurant    fav_movie  has_iphone  \n",
       "8        Chipotle  Ratatouille        True  \n",
       "21       KoFusion    Set it Up        True  \n",
       "38   Bangkok Thai  Ratatouille        True  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "df_you.sample(3)\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "2ebbb254-bae3-462a-87a1-08b37c7e3441",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "challenge-03",
     "locked": true,
     "points": "1",
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "# Autograder"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aec9daa-8aea-4981-a3ae-38733d6e3841",
   "metadata": {
    "id": "hgU6chDUem7o"
   },
   "source": [
    "---\n",
    "\n",
    "### 📌 Number of rows and columns in a `DataFrame`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed9edce3-3563-47a9-9226-5d5a520b0e46",
   "metadata": {
    "id": "3w2j6dSFem7p"
   },
   "source": [
    "👉 How many rows and columns does `df_you` have?\n",
    "\n",
    "▶️ Run `df_you.shape` below to see the *shape* (number of rows and columns) of the database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "19fc92e6-1b9e-41eb-beeb-60a0a83804a8",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "go7OUMtTem7p",
    "outputId": "cfb21812-0eef-4bc8-e2a0-fc11756ce7a0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(45, 8)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "df_you.shape\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fc55735-9a55-4ea2-92fd-1940f15c9a0a",
   "metadata": {
    "id": "7J--UYFvem7p"
   },
   "source": [
    "👉 Can you store the number of rows and columns to variables?\n",
    "\n",
    "---\n",
    "\n",
    "- `df_you.shape` returns a `tuple` in `(num_rows, num_cols)` format. \n",
    "- What is a `tuple`? 🙀\n",
    "- A `tuple` is a `list` that cannot be modified once created.\n",
    "\n",
    "▶️ Run the code cell below to see how a `tuple` is nearly identical to a `list`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "59f5c6b5-3fb9-448b-9b96-1f6a0351ef89",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vAnEKnhUem7p",
    "outputId": "23c744fd-7028-496b-a243-a015ea486481"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my_list[1]=20\n",
      "my_tuple[1]=20\n"
     ]
    }
   ],
   "source": [
    "# These two are nearly identical,\n",
    "# The only difference is that my_tuple cannot be modified once created\n",
    "my_list = [10, 20]\n",
    "my_tuple = (10, 20)\n",
    "\n",
    "print(f\"my_list[1]={my_list[1]}\")    # prints 20\n",
    "print(f\"my_tuple[1]={my_tuple[1]}\")  # also prints 20"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68fa4f8f-b6a5-49ca-91a8-b08a35dff0af",
   "metadata": {
    "id": "OJ4qqvKIem7q"
   },
   "source": [
    "---\n",
    "\n",
    "### 🎯 Challenge 6: Find the number of rows and columns in a `DataFrame`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5a24363-6f2c-4573-8be3-576fafb7441f",
   "metadata": {
    "id": "TwWyyaQPem7q"
   },
   "source": [
    "#### 👇 Tasks\n",
    "\n",
    "- ✔️ Store the number of rows in `df_you` to a new variable named `num_rows`.\n",
    "- ✔️ Store the number of columns in `df_you` to a new variable named `num_cols`.\n",
    "- ✔️ Use `.shape`, not `len()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "43c16cb2-af4e-4c46-9c24-ba087434b1da",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OoYuVAyhem7q",
    "outputId": "8eea66d1-ee93-464c-9889-346fd512d07a",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "45\n",
      "8\n"
     ]
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "num_rows = df_you.shape[0]\n",
    "num_cols = df_you.shape[1]\n",
    "### END SOLUTION\n",
    "\n",
    "print(num_rows)\n",
    "print(num_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e863033-5619-47a1-9221-d8728e798c9c",
   "metadata": {
    "id": "EdrK-mBsem7r"
   },
   "source": [
    "#### 🧭 Check Your Work\n",
    "\n",
    "- Once you're done, run the code cell below to test correctness.\n",
    "- ✔️ If the code cell runs without an error, you're good to move on.\n",
    "- ❌ If the code cell throws an error, go back and fix incorrect parts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "42e11cbf-924e-4679-8c9b-239a3049f796",
   "metadata": {
    "id": "DXDG3nzpem7r",
    "nbgrader": {
     "grade": true,
     "grade_id": "challenge-04",
     "locked": true,
     "points": "1",
     "solution": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "tc.assertEqual(num_rows, len(df_you.index), f\"Number of rows should be {len(df_you.index)}\")\n",
    "tc.assertEqual(num_cols, len(df_you.columns), f\"Number of columns should be {len(df_you.columns)}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}