{ "cells": [ { "cell_type": "markdown", "id": "ede102be", "metadata": {}, "source": [ "# Introduction to Jupyter Notebooks and Pandas\n", "\n", "## What is a Jupyter Notebook? 📓\n", "\n", "- A [Jupyter](https://jupyter.org/) notebook is a document that can contain live code with results, visualizations, and rich text.\n", "- It is widely used in data science and analytics.\n", "- A Jupyter notebook has a `.ipynb` file extension (e.g., `my_notebook.ipynb`).\n", "- A Jupyter notebook is a list of cells.\n", "\n", "### How do you run a Jupyter notebook?\n", "\n", "A Jupyter notebook can be run in one of the following Jupyter environments.\n", "\n", "1. Jupyter Notebook - the original web application for creating and sharing computational documents\n", "2. JupyterLab - a web-based development environment for notebooks (considered a newer version of the Jupyter Notebook)\n", "3. [Google Colab](https://colab.research.google.com/) - Google's cloud notebook platform built on top of the [Jupyter](https://jupyter.org/) environment\n", "\n", "The first two environments require installation on your local machine or a server. We will use Google Colab because it runs in a cloud environment.\n", "\n", "### Types of cells\n", "\n", "Every cell in a Jupyter notebook is of a specific type. The list of supported types varies by Jupyter environment.\n", "\n", "Google Colab supports two types of cells.\n", "\n", "1. Code cell\n", "2. Text cell (also known as a Markdown cell)\n", "\n", "The cell below is a *code* cell. It contains a block of executable code.\n", "\n", "Run the code by clicking on the cell below and then clicking the \"Run\" icon (▶)." 
] }, { "cell_type": "code", "execution_count": null, "id": "22a8ee16", "metadata": {}, "outputs": [], "source": [ "print(10 + 20)" ] }, { "cell_type": "markdown", "id": "36734cbb", "metadata": { "id": "DiznwS2xem7h" }, "source": [ "▶️ Run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder." ] }, { "cell_type": "code", "execution_count": 2, "id": "9b06ef4c", "metadata": { "id": "xOFvIip0em7h" }, "outputs": [], "source": [ "import unittest\n", "tc = unittest.TestCase()" ] }, { "cell_type": "markdown", "id": "1613fdd6", "metadata": { "id": "Dg-bBkF8fGaN" }, "source": [ "---\n", "\n", "### 🎯 Challenge 1: Find the sum of a list\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Complete the code cell below to find the sum of all values in `my_list`.\n", "- ✔️ Store the result in a new variable named `result`." ] }, { "cell_type": "code", "execution_count": 3, "id": "d097c610", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "478\n" ] } ], "source": [ "my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]\n", "\n", "### BEGIN SOLUTION\n", "result = 0\n", "\n", "for num in my_list:\n", " result = result + num\n", "### END SOLUTION\n", "\n", "print(result)" ] }, { "cell_type": "markdown", "id": "d2538ff9", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." 
] }, { "cell_type": "code", "execution_count": 4, "id": "1b03a629", "metadata": {}, "outputs": [], "source": [ "# Challenge 1 Autograder\n", "import unittest\n", "\n", "tc = unittest.TestCase()\n", "\n", "tc.assertEqual(result, 478)" ] }, { "cell_type": "markdown", "id": "5d571480", "metadata": {}, "source": [ "---\n", "\n", "## Introduction to Pandas\n", "\n", "Pandas is a Python *library* for data manipulation and analysis. Although it is now widely used in data-related programming applications, it was initially developed for financial analysis by [AQR Capital Management](https://www.aqr.com/).\n", "\n", "Note: A *library* in the context of programming is a collection of functions (and other data) that others have already written for you.\n", "\n", "Pandas is popular for many reasons:\n", "\n", "1. 🏃🏿‍♀️ It's fast (in most cases where the dataset fits in your computer's memory).\n", "2. 🪒 It supports most of the features required for data manipulation.\n", "3. 💡 It lets you write less code and get more done." ] }, { "cell_type": "markdown", "id": "9ecc6d53", "metadata": { "id": "Dg-bBkF8fGaN" }, "source": [ "---\n", "\n", "### 🎯 Challenge 2: Import packages\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Import the following Python packages.\n", "    1. `pandas`: Use alias `pd`.\n", "    2. `numpy`: Use alias `np`." ] }, { "cell_type": "code", "execution_count": 5, "id": "26fb5cd6", "metadata": { "id": "jfnhv8_Yem7j" }, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "import pandas as pd\n", "import numpy as np\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "id": "1448b30e", "metadata": { "id": "WCOuwkrzem7j" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." 
] }, { "cell_type": "code", "execution_count": 6, "id": "74263691", "metadata": { "id": "WQ-COKL8em7k" }, "outputs": [], "source": [ "# Challenge 2 Autograder\n", "tc.assertTrue(\"pd\" in globals(), \"Check whether you have correctly imported Pandas with an alias.\")\n", "tc.assertTrue(\"np\" in globals(), \"Check whether you have correctly imported NumPy with an alias.\")" ] }, { "cell_type": "markdown", "id": "39618ef0", "metadata": {}, "source": [ "---\n", "\n", "### It all starts with a `Series`...\n", "\n", "The basic building block of Pandas is a `Series`. A `Series` is like a list, but with many more features.\n", "\n", "You can create a `Series` by passing a list of values to `pd.Series()`." ] }, { "cell_type": "code", "execution_count": 7, "id": "711ee851", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 3.0\n", "3 NaN\n", "4 5.0\n", "5 6.0\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series([1, 2, 3, np.nan, 5, 6])\n", "\n", "s" ] }, { "cell_type": "markdown", "id": "fa5ff66d", "metadata": {}, "source": [ "### A few things to note here\n", "\n", "1. A `Series` looks similar to a Python `list`.\n", "2. 
The last line of the printed output tells us the data type of the values in the `Series` (`dtype: float64`).\n", "- What the heck is `np.nan`?\n", "    - It is used to indicate a \"missing value\".\n", "    - `np.nan` is NOT the same as `0`.\n", "\n", "### Differences between a list and a Series" ] }, { "cell_type": "code", "execution_count": 8, "id": "4f64e65c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'list'>\n" ] }, { "data": { "text/plain": [ "[1, 2, 3, 4, 1, 2, 3, 4]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "my_list = [1, 2, 3, 4]\n", "\n", "print(type(my_list))\n", "display(my_list * 2)" ] }, { "cell_type": "code", "execution_count": 9, "id": "3aa5a515", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.series.Series'>\n" ] }, { "data": { "text/plain": [ "0 2\n", "1 4\n", "2 6\n", "3 8\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "my_series = pd.Series([1, 2, 3, 4])\n", "\n", "print(type(my_series))\n", "display(my_series * 2)" ] }, { "cell_type": "markdown", "id": "85a52ee0", "metadata": {}, "source": [ "What happens when you multiply a Python `list` by the number `2`? It repeats the elements.\n", "\n", "How about a `Series`? It multiplies each element by `2`!" 
] }, { "cell_type": "markdown", "id": "68d7595e", "metadata": { "id": "Dg-bBkF8fGaN" }, "source": [ "---\n", "\n", "### 🎯 Challenge 3: Create a new `Series`\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Create a new Pandas `Series` named `my_series` with the following three values: `10`, `20`, `30`.\n", "\n", "#### 🚀 Hint\n", "\n", "The code below creates a new Pandas `Series` with the values `1` and `2`.\n", "\n", "```python\n", "my_new_series = pd.Series([1, 2])\n", "```" ] }, { "cell_type": "code", "execution_count": 10, "id": "b4d0611a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10\n", "1 20\n", "2 30\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "my_series = pd.Series([10, 20, 30])\n", "### END SOLUTION\n", "\n", "my_series" ] }, { "cell_type": "markdown", "id": "70ac9be0", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 11, "id": "f16ea8b6", "metadata": {}, "outputs": [], "source": [ "# Challenge 3 Autograder\n", "pd.testing.assert_series_equal(my_series, pd.Series([1, 2, 3]) * 10)" ] }, { "cell_type": "markdown", "id": "b84f0899", "metadata": {}, "source": [ "---\n", "\n", "### Using `Series` methods\n", "\n", "A Pandas `Series` is similar to a Python `list`. However, a `Series` provides many methods (functions that belong to the object) for you to use.\n", "\n", "As an example, calling `num_reviews.mean()` on the `num_reviews` `Series` defined in the next cell returns the average number of reviews." 
] }, { "cell_type": "code", "execution_count": 12, "id": "44cda202", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 12715\n", "1 2274\n", "2 2771\n", "3 3952\n", "4 528\n", "5 2766\n", "6 724\n", "dtype: int64\n" ] } ], "source": [ "reviews_count = [12715, 2274, 2771, 3952, 528, 2766, 724]\n", "num_reviews = pd.Series(reviews_count)\n", "\n", "print(num_reviews)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 5 }