{ "cells": [ { "cell_type": "markdown", "id": "1bc72fe3", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "3d522b27", "metadata": {}, "source": [ "

Lecture 3.12 (Pandas-04)

" ] }, { "cell_type": "markdown", "id": "b852d0bd", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "7de11ce0", "metadata": {}, "source": [ "## _IO with CSV EXCEL and JSON Files_\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "7b992d9d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "441aebbc", "metadata": {}, "source": [ "#### Read Pandas Documentation:\n", "- General Info: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html\n", "\n", "\n", "- For `read_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv\n", "\n", "\n", "- For `read_excel`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html?highlight=read_excel#pandas.read_excel\n", "\n", "\n", "- For `read_json`:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.read_json.html?highlight=pandas%20read_json#pandas.io.json.read_json\n", "\n", "\n", "- For `to_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv\n", "\n", "\n", "\n", "- For `to_excel`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html?highlight=to_excel#pandas.DataFrame.to_excel\n", "\n", "\n", "- For `to_json`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html?highlight=to_json" ] }, { "cell_type": "markdown", "id": "22dcd278", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "[Pandas](https://pandas.pydata.org/) provides helper functions to read data from various file formats like CSV, EXCEL, JSON, HTML, SQL table, and many more.\n", "1. Reading data from a CSV/TSV File\n", "2. Reading a CSV file from a Remote System\n", "3. Writing Contents of Dataframe to a CSV File\n", "4. Reading data from an EXCEL File\n", "5. Writing Contents of Dataframe to an EXCEL File\n", "6. Reading data from a JSON File\n", "7. Writing Contents of Dataframe to a JSON File" ] }, { "cell_type": "code", "execution_count": null, "id": "d85d4c7a", "metadata": {}, "outputs": [], "source": [ "# To install this library in Jupyter notebook\n", "import sys\n", "!{sys.executable} -m pip install pandas --quiet" ] }, { "cell_type": "code", "execution_count": 1, "id": "b96c7461", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('1.3.4',\n", " ['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas'])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pd.__version__ , pd.__path__" ] }, { "cell_type": "code", "execution_count": null, "id": "2c74b937", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bb4e943a", "metadata": {}, "source": [ "## 1. Reading from CSV/TSV Files\n", ">**CSV**: A text file in which the values are separated by a comma or a tab character is called a CSV or a TSV file. Each line of the file is a data record and each record consists of one or more fields, separated by a specific character called separator. A CSV/TSV file is typically used to store tabular data (numbers and text), in which each line will have the same number of fields." ] }, { "cell_type": "markdown", "id": "586d81fe", "metadata": {}, "source": [ "### a. 
Reading a Simple CSV File\n", "The `pd.read_csv()` method is used to read a comma-separated file into a DataFrame.\n", "```\n", "pd.read_csv(fname, delimiter=None, header='infer', skiprows=None , nrows=None , usecols=None, footer='',...)\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "id": "36f4f1fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rollno,gender,group,age,math,english,urdu\r\n", "MS01,female,group B,28.0,72.0,72,74.0\r\n", "MS02,female,group C,33.0,69.0,90,88.0\r\n", "MS03,female,group B,21.0,,95,93.0\r\n", "MS04,male,group A,44.0,47.0,57,44.0\r\n", "MS05,male,group C,54.0,76.0,78,\r\n", "MS06,female,group B,,71.0,83,78.0\r\n", "MS07,female,group B,47.0,88.0,95,92.0\r\n", "MS08,male,group B,33.0,40.0,43,39.0\r\n", "MS09,male,group D,27.0,64.0,64,67.0\r\n", "MS10,female,group B,33.0,38.0,60,50.0\r\n", "MS11,male,group C,46.0,58.0,54,52.0\r\n", "MS12,male,group D,53.0,40.0,52,43.0\r\n", "MS13,female,group B,25.0,65.0,81,73.0\r\n", "MS14,male,group A,26.0,78.0,72,70.0\r\n", "MS15,female,group A,25.0,50.0,53,58.0\r\n", "MS16,female,group C,29.0,69.0,75,78.0\r\n", "MS17,male,group C,22.0,88.0,89,86.0\r\n", "MS18,female,group B,31.0,18.0,32,28.0\r\n", "MS19,male,group C,33.0,46.0,42,46.0\r\n", "MS20,female,group C,36.0,54.0,58,\r\n", "MS21,male,group D,34.0,66.0,69,63.0\r\n", "MS22,female,group B,32.0,65.0,75,70.0\r\n", "MS23,male,group D,31.0,44.0,54,53.0\r\n", "MS24,female,group C,37.0,,73,73.0\r\n", "MS25,male,group D,33.0,74.0,71,80.0\r\n", "MS26,male,group A,39.0,73.0,74,72.0\r\n", "MS27,male,group B,,69.0,54,55.0\r\n", "MS28,female,group C,47.0,67.0,69,75.0\r\n", "MS29,male,group C,44.0,70.0,70,65.0\r\n", "MS30,female,group D,42.0,62.0,70,75.0\r\n", "MS31,female,group D,41.0,69.0,74,74.0\r\n", "MS32,female,group B,,63.0,65,61.0\r\n", "MS33,female,group E,29.0,56.0,72,65.0\r\n", "MS34,male,group D,34.0,40.0,42,38.0\r\n", "MS35,male,group E,45.0,97.0,87,82.0\r\n", "MS36,male,group E,31.0,81.0,81,79.0\r\n", "MS37,female,group D,28.0,74.0,81,\r\n", "MS38,female,group D,26.0,50.0,64,59.0\r\n", "MS39,female,group D,36.0,75.0,90,88.0\r\n", "MS40,male,group B,33.0,57.0,56,57.0\r\n", "MS41,male,group C,35.0,55.0,61,54.0\r\n", "MS42,female,group C,22.0,58.0,73,68.0\r\n", "MS43,female,group B,38.0,53.0,58,65.0\r\n", "MS44,male,group B,31.0,59.0,65,66.0\r\n", "MS45,female,group E,28.0,,56,54.0\r\n", "MS46,male,group B,26.0,65.0,54,57.0\r\n", "MS47,female,group A,36.0,55.0,65,62.0\r\n", "MS48,female,group C,30.0,66.0,71,76.0\r\n", "MS49,female,group D,40.0,57.0,74,76.0\r\n", "MS50,male,group C,37.0,66.0,78,81.0\r\n" ] } ], "source": [ "! cat datasets/classmarks.csv" ] }, { "cell_type": "code", "execution_count": 4, "id": "fef96838", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50, 7)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 3, "id": "e5863995", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnogendergroupagemathenglishurdu
0MS01femalegroup B28.072.07274.0
1MS02femalegroup C33.069.09088.0
2MS03femalegroup B21.0NaN9593.0
3MS04malegroup A44.047.05744.0
4MS05malegroup C54.076.078NaN
5MS06femalegroup BNaN71.08378.0
6MS07femalegroup B47.088.09592.0
7MS08malegroup B33.040.04339.0
8MS09malegroup D27.064.06467.0
9MS10femalegroup B33.038.06050.0
10MS11malegroup C46.058.05452.0
11MS12malegroup D53.040.05243.0
12MS13femalegroup B25.065.08173.0
13MS14malegroup A26.078.07270.0
14MS15femalegroup A25.050.05358.0
15MS16femalegroup C29.069.07578.0
16MS17malegroup C22.088.08986.0
17MS18femalegroup B31.018.03228.0
18MS19malegroup C33.046.04246.0
19MS20femalegroup C36.054.058NaN
20MS21malegroup D34.066.06963.0
21MS22femalegroup B32.065.07570.0
22MS23malegroup D31.044.05453.0
23MS24femalegroup C37.0NaN7373.0
24MS25malegroup D33.074.07180.0
25MS26malegroup A39.073.07472.0
26MS27malegroup BNaN69.05455.0
27MS28femalegroup C47.067.06975.0
28MS29malegroup C44.070.07065.0
29MS30femalegroup D42.062.07075.0
30MS31femalegroup D41.069.07474.0
31MS32femalegroup BNaN63.06561.0
32MS33femalegroup E29.056.07265.0
33MS34malegroup D34.040.04238.0
34MS35malegroup E45.097.08782.0
35MS36malegroup E31.081.08179.0
36MS37femalegroup D28.074.081NaN
37MS38femalegroup D26.050.06459.0
38MS39femalegroup D36.075.09088.0
39MS40malegroup B33.057.05657.0
40MS41malegroup C35.055.06154.0
41MS42femalegroup C22.058.07368.0
42MS43femalegroup B38.053.05865.0
43MS44malegroup B31.059.06566.0
44MS45femalegroup E28.0NaN5654.0
45MS46malegroup B26.065.05457.0
46MS47femalegroup A36.055.06562.0
47MS48femalegroup C30.066.07176.0
48MS49femalegroup D40.057.07476.0
49MS50malegroup C37.066.07881.0
\n", "
" ], "text/plain": [ " rollno gender group age math english urdu\n", "0 MS01 female group B 28.0 72.0 72 74.0\n", "1 MS02 female group C 33.0 69.0 90 88.0\n", "2 MS03 female group B 21.0 NaN 95 93.0\n", "3 MS04 male group A 44.0 47.0 57 44.0\n", "4 MS05 male group C 54.0 76.0 78 NaN\n", "5 MS06 female group B NaN 71.0 83 78.0\n", "6 MS07 female group B 47.0 88.0 95 92.0\n", "7 MS08 male group B 33.0 40.0 43 39.0\n", "8 MS09 male group D 27.0 64.0 64 67.0\n", "9 MS10 female group B 33.0 38.0 60 50.0\n", "10 MS11 male group C 46.0 58.0 54 52.0\n", "11 MS12 male group D 53.0 40.0 52 43.0\n", "12 MS13 female group B 25.0 65.0 81 73.0\n", "13 MS14 male group A 26.0 78.0 72 70.0\n", "14 MS15 female group A 25.0 50.0 53 58.0\n", "15 MS16 female group C 29.0 69.0 75 78.0\n", "16 MS17 male group C 22.0 88.0 89 86.0\n", "17 MS18 female group B 31.0 18.0 32 28.0\n", "18 MS19 male group C 33.0 46.0 42 46.0\n", "19 MS20 female group C 36.0 54.0 58 NaN\n", "20 MS21 male group D 34.0 66.0 69 63.0\n", "21 MS22 female group B 32.0 65.0 75 70.0\n", "22 MS23 male group D 31.0 44.0 54 53.0\n", "23 MS24 female group C 37.0 NaN 73 73.0\n", "24 MS25 male group D 33.0 74.0 71 80.0\n", "25 MS26 male group A 39.0 73.0 74 72.0\n", "26 MS27 male group B NaN 69.0 54 55.0\n", "27 MS28 female group C 47.0 67.0 69 75.0\n", "28 MS29 male group C 44.0 70.0 70 65.0\n", "29 MS30 female group D 42.0 62.0 70 75.0\n", "30 MS31 female group D 41.0 69.0 74 74.0\n", "31 MS32 female group B NaN 63.0 65 61.0\n", "32 MS33 female group E 29.0 56.0 72 65.0\n", "33 MS34 male group D 34.0 40.0 42 38.0\n", "34 MS35 male group E 45.0 97.0 87 82.0\n", "35 MS36 male group E 31.0 81.0 81 79.0\n", "36 MS37 female group D 28.0 74.0 81 NaN\n", "37 MS38 female group D 26.0 50.0 64 59.0\n", "38 MS39 female group D 36.0 75.0 90 88.0\n", "39 MS40 male group B 33.0 57.0 56 57.0\n", "40 MS41 male group C 35.0 55.0 61 54.0\n", "41 MS42 female group C 22.0 58.0 73 68.0\n", "42 MS43 female group B 38.0 53.0 58 65.0\n", "43 MS44 male group B 31.0 59.0 65 66.0\n", "44 MS45 female group E 28.0 NaN 56 54.0\n", "45 MS46 male group B 26.0 65.0 54 57.0\n", "46 MS47 female group A 36.0 55.0 65 62.0\n", "47 MS48 female group C 30.0 66.0 71 76.0\n", "48 MS49 female group D 40.0 57.0 74 76.0\n", "49 MS50 male group C 37.0 66.0 78 81.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#The `read_csv`, by default assumes that the file contains comma separated values, \n", "# and the first row of the file conatins names of columns, which will be taken as column labels\n", "df = pd.read_csv('datasets/classmarks.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "8a0945b2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5a190256", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d8e68f05", "metadata": {}, "source": [ "**The `df.head(N)` method is used to select/display first `N` rows, based on `position`, i.e., the integer value corresponding to the position of the row (from 0 to n-1). 
The default value of `N` is 5.**" ] }, { "cell_type": "code", "execution_count": null, "id": "5ab86207", "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "b06b026d", "metadata": {}, "outputs": [], "source": [ "df.head(3)" ] }, { "cell_type": "code", "execution_count": null, "id": "544e5255", "metadata": {}, "outputs": [], "source": [ "# For negative values of n, this method returns all rows except the last `n` rows, equivalent to df[:-n].\n", "# The df has a total of 50 rows, so the following will return the first 2 rows\n", "df.head(-48)" ] }, { "cell_type": "code", "execution_count": null, "id": "e8592e87", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9a20bcc0", "metadata": {}, "source": [ "**The `df.tail(N)` method is used to select/display the last `N` rows, based on `position`, i.e., the integer value corresponding to the position of the row (from 0 to n-1). The default value of `N` is 5.**" ] }, { "cell_type": "code", "execution_count": null, "id": "b3ad4272", "metadata": {}, "outputs": [], "source": [ "# The tail() method is useful for quickly verifying data after sorting or appending rows.\n", "df.tail()" ] }, { "cell_type": "code", "execution_count": null, "id": "03574cd4", "metadata": {}, "outputs": [], "source": [ "df.tail(3)" ] }, { "cell_type": "code", "execution_count": null, "id": "494b01d9", "metadata": {}, "outputs": [], "source": [ "# For negative values of `n`, this function returns all rows except the first `n` rows, equivalent to df[n:]\n", "# The df has a total of 50 rows, so the following will return the last 4 rows\n", "df.tail(-46)" ] }, { "cell_type": "code", "execution_count": null, "id": "5d805f4a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "aa28bae8", "metadata": {}, "source": [ "### b. Reading a CSV File having a Delimiter other than Comma\n", "- By default, `read_csv()` expects a comma as the separator. If the CSV file uses some other separator or delimiter (such as a semicolon or a tab), the values will not be split into separate columns correctly and you may even get a parsing error.\n", "- To handle this, pass the appropriate character to the `delimiter` (or `sep`) argument of the `read_csv()` method." ] }, { "cell_type": "code", "execution_count": null, "id": "f5650685", "metadata": {}, "outputs": [], "source": [ "! cat datasets/classmarkswithtab.csv" ] }, { "cell_type": "code", "execution_count": 5, "id": "4e3d8e55", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollno\\tgender\\tgroup\\tage\\tmath\\tenglish\\turdu
0MS01\\tfemale\\tgroup B\\t28.0\\t72.0\\t72\\t74.0
1MS02\\tfemale\\tgroup C\\t33.0\\t69.0\\t90\\t88.0
2MS03\\tfemale\\tgroup B\\t21.0\\t\\t95\\t93.0
3MS04\\tmale\\tgroup A\\t44.0\\t47.0\\t57\\t44.0
4MS05\\tmale\\tgroup C\\t54.0\\t76.0\\t78\\t
\n", "
" ], "text/plain": [ " rollno\\tgender\\tgroup\\tage\\tmath\\tenglish\\turdu\n", "0 MS01\\tfemale\\tgroup B\\t28.0\\t72.0\\t72\\t74.0\n", "1 MS02\\tfemale\\tgroup C\\t33.0\\t69.0\\t90\\t88.0\n", "2 MS03\\tfemale\\tgroup B\\t21.0\\t\\t95\\t93.0\n", "3 MS04\\tmale\\tgroup A\\t44.0\\t47.0\\t57\\t44.0\n", "4 MS05\\tmale\\tgroup C\\t54.0\\t76.0\\t78\\t" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/classmarkswithtab.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 13, "id": "0b427ca0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456
0rollnogendergroupagemathenglishurdu
1MS01femalegroup B28.072.07274.0
2MS02femalegroup C33.069.09088.0
3MS03femalegroup B21.0NaN9593.0
4MS04malegroup A44.047.05744.0
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 rollno gender group age math english urdu\n", "1 MS01 female group B 28.0 72.0 72 74.0\n", "2 MS02 female group C 33.0 69.0 90 88.0\n", "3 MS03 female group B 21.0 NaN 95 93.0\n", "4 MS04 male group A 44.0 47.0 57 44.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/classmarkswithtab.csv', delimiter='\\t', header=None)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "b4d94b7e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d7228943", "metadata": {}, "source": [ "### c. Reading a CSV File not having Column Labels\n", "- By default the `read_csv()` method assume the first row of the file will contain column labels\n", "- If this is not the case, i.e., the file do not contain column labels rather data, it will be dealt as column label\n", "- Understand this in following example" ] }, { "cell_type": "code", "execution_count": null, "id": "92b3aaaf", "metadata": {}, "outputs": [], "source": [ "! cat datasets/classmarkswithoutcollabels.csv" ] }, { "cell_type": "code", "execution_count": 7, "id": "be736d7d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MS01femalegroup B287272.174
0MS02femalegroup C33.069.09088.0
1MS03femalegroup B21.0NaN9593.0
2MS04malegroup A44.047.05744.0
3MS05malegroup C54.076.078NaN
4MS06femalegroup BNaN71.08378.0
\n", "
" ], "text/plain": [ " MS01 female group B 28 72 72.1 74\n", "0 MS02 female group C 33.0 69.0 90 88.0\n", "1 MS03 female group B 21.0 NaN 95 93.0\n", "2 MS04 male group A 44.0 47.0 57 44.0\n", "3 MS05 male group C 54.0 76.0 78 NaN\n", "4 MS06 female group B NaN 71.0 83 78.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/classmarkswithoutcollabels.csv')\n", "df.head()" ] }, { "cell_type": "markdown", "id": "e4a78813", "metadata": {}, "source": [ "**To read such files, you have to pass the parameter `header=None` to the `read_csv()` method as shown below**" ] }, { "cell_type": "code", "execution_count": 8, "id": "8d228941", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456
0MS01femalegroup B28.072.07274.0
1MS02femalegroup C33.069.09088.0
2MS03femalegroup B21.0NaN9593.0
3MS04malegroup A44.047.05744.0
4MS05malegroup C54.076.078NaN
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 MS01 female group B 28.0 72.0 72 74.0\n", "1 MS02 female group C 33.0 69.0 90 88.0\n", "2 MS03 female group B 21.0 NaN 95 93.0\n", "3 MS04 male group A 44.0 47.0 57 44.0\n", "4 MS05 male group C 54.0 76.0 78 NaN" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/classmarkswithoutcollabels.csv', header=None)\n", "df.head()\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bb895f0e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "b9783830", "metadata": {}, "source": [ "**Now if you want to assign new column labels to make them more understandable, you can assign the list of column labels to the `columns` attribute of the dataframe object**" ] }, { "cell_type": "code", "execution_count": 9, "id": "404229bb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnogendergroupagemathenglishurdu
0MS01femalegroup B28.072.07274.0
1MS02femalegroup C33.069.09088.0
2MS03femalegroup B21.0NaN9593.0
3MS04malegroup A44.047.05744.0
4MS05malegroup C54.076.078NaN
\n", "
" ], "text/plain": [ " rollno gender group age math english urdu\n", "0 MS01 female group B 28.0 72.0 72 74.0\n", "1 MS02 female group C 33.0 69.0 90 88.0\n", "2 MS03 female group B 21.0 NaN 95 93.0\n", "3 MS04 male group A 44.0 47.0 57 44.0\n", "4 MS05 male group C 54.0 76.0 78 NaN" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col_names = ['rollno', 'gender', 'group', 'age', 'math', 'english', 'urdu']\n", "df.columns = col_names\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "045f907c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "eedf135e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "736ff36e", "metadata": {}, "source": [ "### d. Reading a CSV File having Comments in the beginning\n", "- You may get an error while reading a CSV file because someone may have added few comments on the top of the file. In pandas we can still read the data set by skipping few rows from the top.\n", "- To deal with the ParseError, open the csv file in the text editor and check if you have some comments on the top.\n", "- If yes, then count the number of rows to skip.\n", "- While reading file, pass the parameter **skiprows = n** (number of rows in the beginninghaving comments to skip)\n", "- While reading file, pass the parameter **skipfooter = n** (number of rows at the end having comments to skip)" ] }, { "cell_type": "code", "execution_count": 10, "id": "4ca9ed6b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "These are comments,,,,,,\r", "\r\n", "These are second line of comments,,,,,,\r", "\r\n", "These are third line of comments,,,,,,\r", "\r\n", "rollno,gender,group,age,math,english,urdu\r", "\r\n", "MS01,female,group B,28,72,72,74\r", "\r\n", "MS02,female,group C,33,69,90,88\r", "\r\n", "MS03,female,group B,21,,95,93\r", "\r\n", "MS04,male,group A,44,47,57,44\r", "\r\n", "MS05,male,group C,54,76,78,\r", "\r\n", "MS06,female,group B,,71,83,78\r", "\r\n", "MS07,female,group B,47,88,95,92\r", "\r\n", "MS08,male,group B,33,40,43,39\r", "\r\n", "MS09,male,group D,27,64,64,67\r", "\r\n", "MS10,female,group B,33,38,60,50\r", "\r\n", "MS11,male,group C,46,58,54,52\r", "\r\n", "MS12,male,group D,53,40,52,43\r", "\r\n", "MS13,female,group B,25,65,81,73\r", "\r\n", "MS14,male,group A,26,78,72,70\r", "\r\n", "MS15,female,group A,25,50,53,58\r", "\r\n", "MS16,female,group C,29,69,75,78\r", "\r\n", "MS17,male,group C,22,88,89,86\r", "\r\n", "MS18,female,group B,31,18,32,28\r", "\r\n", "MS19,male,group C,33,46,42,46\r", "\r\n", "MS20,female,group C,36,54,58,\r", "\r\n", "MS21,male,group D,34,66,69,63\r", "\r\n", "MS22,female,group B,32,65,75,70\r", "\r\n", "MS23,male,group D,31,44,54,53\r", "\r\n", "MS24,female,group C,37,,73,73\r", "\r\n", "MS25,male,group D,33,74,71,80\r", "\r\n", "MS26,male,group A,39,73,74,72\r", "\r\n", "MS27,male,group B,,69,54,55\r", "\r\n", "MS28,female,group C,47,67,69,75\r", "\r\n", "MS29,male,group C,44,70,70,65\r", "\r\n", "MS30,female,group D,42,62,70,75\r", "\r\n", "MS31,female,group D,41,69,74,74\r", "\r\n", "MS32,female,group B,,63,65,61\r", "\r\n", "MS33,female,group E,29,56,72,65\r", "\r\n", "MS34,male,group D,34,40,42,38\r", "\r\n", "MS35,male,group E,45,97,87,82\r", "\r\n", "MS36,male,group E,31,81,81,79\r", "\r\n", "MS37,female,group D,28,74,81,\r", "\r\n", "MS38,female,group D,26,50,64,59\r", "\r\n", "MS39,female,group D,36,75,90,88\r", "\r\n", "MS40,male,group 
B,33,57,56,57\r", "\r\n", "MS41,male,group C,35,55,61,54\r", "\r\n", "MS42,female,group C,22,58,73,68\r", "\r\n", "MS43,female,group B,38,53,58,65\r", "\r\n", "MS44,male,group B,31,59,65,66\r", "\r\n", "MS45,female,group E,28,,56,54\r", "\r\n", "MS46,male,group B,26,65,54,57\r", "\r\n", "MS47,female,group A,36,55,65,62\r", "\r\n", "MS48,female,group C,30,66,71,76\r", "\r\n", "MS49,female,group D,40,57,74,76\r", "\r\n", "MS50,male,group C,37,66,78,81" ] } ], "source": [ "! cat datasets/classmarkswithtopcomments.csv" ] }, { "cell_type": "code", "execution_count": 11, "id": "c60f6550", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
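{ "cell_type": "markdown", "id": "c3d40e51", "metadata": {}, "source": [ "**The `skipfooter` parameter listed above is not demonstrated on this dataset, whose comments sit only at the top. A hedged sketch, assuming a hypothetical file `classmarkswithbottomcomments.csv` that ends with 2 comment lines; note that `skipfooter` requires the slower Python parsing engine:**" ] }, { "cell_type": "code", "execution_count": null, "id": "c3d40e52", "metadata": {}, "outputs": [], "source": [ "# Sketch only: 'classmarkswithbottomcomments.csv' is a hypothetical file with 2 trailing comment lines\n", "df_footer = pd.read_csv('datasets/classmarkswithbottomcomments.csv', skipfooter=2, engine='python')\n", "df_footer.tail()" ] },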
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
These are commentsUnnamed: 1Unnamed: 2Unnamed: 3Unnamed: 4Unnamed: 5Unnamed: 6
0These are second line of commentsNaNNaNNaNNaNNaNNaN
1These are third line of commentsNaNNaNNaNNaNNaNNaN
2rollnogendergroupagemathenglishurdu
3MS01femalegroup B28727274
4MS02femalegroup C33699088
\n", "
" ], "text/plain": [ " These are comments Unnamed: 1 Unnamed: 2 Unnamed: 3 \\\n", "0 These are second line of comments NaN NaN NaN \n", "1 These are third line of comments NaN NaN NaN \n", "2 rollno gender group age \n", "3 MS01 female group B 28 \n", "4 MS02 female group C 33 \n", "\n", " Unnamed: 4 Unnamed: 5 Unnamed: 6 \n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 math english urdu \n", "3 72 72 74 \n", "4 69 90 88 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Try reading a csv file having 3 comments lines in the beginning.\n", "df = pd.read_csv('datasets/classmarkswithtopcomments.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "dc580e77", "metadata": {}, "outputs": [], "source": [ "# Try reading a csv file having 3 comments lines in the beginning.\n", "df = pd.read_csv('datasets/classmarkswithtopcomments.csv', skiprows=3)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "f6ad9283", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "255294ba", "metadata": {}, "source": [ "### e. Reading a portion of CSV File in a Dataframe\n", "- Suppose the dataset inside the csv file is too big and you don't want to spend that much time for reading that data\n", "- Or might be your system crashes, when you try to load that much data\n", "- Solution is read\n", " - Specific number of rows by passing `nrows` parameter to `read_csv()` method\n", " - Specific number of columns by passing `usecols` parameter to `read_csv()` method" ] }, { "cell_type": "code", "execution_count": 12, "id": "347ed4a8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 7)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read just 10 rows from the csv file by passing the number of rows to read to `nrows` argument\n", "df = pd.read_csv('datasets/classmarks.csv', nrows=10)\n", "df.shape\n", "#df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "ac7ad23b", "metadata": {}, "outputs": [], "source": [ "# Read specific columns from the csv file by passing a list of column names to `usecols` argument\n", "df = pd.read_csv('datasets/classmarks.csv', usecols= ['rollno', 'group','english'])\n", "df.shape\n", "#df.head()" ] }, { "cell_type": "code", "execution_count": 13, "id": "08ad1ccf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7, 3)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ofcourse you can use both the parameters at the same time\n", "df = pd.read_csv('datasets/classmarks.csv', nrows= 7, usecols= ['rollno', 'group','english'])\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 14, "id": "ab2470e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnogroupenglish
0MS01group B72
1MS02group C90
2MS03group B95
3MS04group A57
4MS05group C78
5MS06group B83
6MS07group B95
\n", "
" ], "text/plain": [ " rollno group english\n", "0 MS01 group B 72\n", "1 MS02 group C 90\n", "2 MS03 group B 95\n", "3 MS04 group A 57\n", "4 MS05 group C 78\n", "5 MS06 group B 83\n", "6 MS07 group B 95" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "aee98254", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "37a24ec5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "aff77179", "metadata": {}, "source": [ "## 2. Reading a CSV File from a Remote System" ] }, { "cell_type": "code", "execution_count": 15, "id": "5140b250", "metadata": {}, "outputs": [], "source": [ "# To avoid URLError: \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamesAgeAddr
0Arif50Lahore
1Rauf52Islamabad
2Maaz27Peshawer
3Hadeed22Islamabad
4Mujahid18Karachi
\n", "" ], "text/plain": [ " Names Age Addr\n", "0 Arif 50 Lahore\n", "1 Rauf 52 Islamabad\n", "2 Maaz 27 Peshawer\n", "3 Hadeed 22 Islamabad\n", "4 Mujahid 18 Karachi" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "myurl = 'https://bit.ly/31eYGTx'\n", "df = pd.read_csv(myurl)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "055a3906", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f312831d", "metadata": {}, "source": [ "### b. Reading a CSV file from a Google Docs" ] }, { "cell_type": "markdown", "id": "278b2d6e", "metadata": {}, "source": [ "- Google Sheet URL: https://docs.google.com/spreadsheets/d/1H9ZTGVRXN3zuyP3cbJQnbX7JlqhqFNMOE_WwDP-PRTE/edit#gid=2084742287\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "60c5a6ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnogendergroupagemathenglishurdu
0MS01femalegroup B28.072.07274.0
1MS02femalegroup C33.069.09088.0
2MS03femalegroup B21.0NaN9593.0
3MS04malegroup A44.047.05744.0
4MS05malegroup C54.076.078NaN
\n", "
" ], "text/plain": [ " rollno gender group age math english urdu\n", "0 MS01 female group B 28.0 72.0 72 74.0\n", "1 MS02 female group C 33.0 69.0 90 88.0\n", "2 MS03 female group B 21.0 NaN 95 93.0\n", "3 MS04 male group A 44.0 47.0 57 44.0\n", "4 MS05 male group C 54.0 76.0 78 NaN" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sheetID = '1H9ZTGVRXN3zuyP3cbJQnbX7JlqhqFNMOE_WwDP-PRTE'\n", "sheetName = 'sheet1'\n", "URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(sheetID, sheetName)\n", "\n", "df = pd.read_csv(URL)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 18, "id": "3043328a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://docs.google.com/spreadsheets/d/1H9ZTGVRXN3zuyP3cbJQnbX7JlqhqFNMOE_WwDP-PRTE/gviz/tq?tqx=out:csv&sheet=sheet1'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "URL" ] }, { "cell_type": "code", "execution_count": null, "id": "da021170", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3bec93fe", "metadata": {}, "source": [ "## 3. Writing Contents of Dataframe to a CSV File\n", "- The `pd.to_csv()` method is used to write the contents of a dataframe (with indices) to a CSV file.\n", "- The only required argument is the file path.\n", "- For details see help page or python documents (link given above)" ] }, { "cell_type": "code", "execution_count": null, "id": "87862682", "metadata": {}, "outputs": [], "source": [ "df_class = pd.read_csv('datasets/classmarks.csv')\n", "df_class.head(7)" ] }, { "cell_type": "markdown", "id": "176a8515", "metadata": {}, "source": [ ">- Let us create a new dataframe from above dataframe containing records of only group B" ] }, { "cell_type": "code", "execution_count": null, "id": "05d35cf1", "metadata": {}, "outputs": [], "source": [ "mask = (df_class['group'] == 'group B')\n", "mask.head(7)" ] }, { "cell_type": "code", "execution_count": null, "id": "519347c0", "metadata": {}, "outputs": [], "source": [ "df_class_groupB = df_class.loc[mask]\n", "df_class_groupB" ] }, { "cell_type": "code", "execution_count": null, "id": "9d92b6ba", "metadata": {}, "outputs": [], "source": [ "df_class_groupB.to_csv('datasets/classmarksgroupB.csv')" ] }, { "cell_type": "code", "execution_count": null, "id": "5a4da09b", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('datasets/classmarksgroupB.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "6c378d68", "metadata": {}, "source": [ ">To avoid writing the row indices column inside the file pass `index=False` argument to `to_csv()` method" ] }, { "cell_type": "code", "execution_count": null, "id": "99b286a7", "metadata": {}, "outputs": [], "source": [ "df_class_groupB.to_csv('datasets/classmarksgroupB.csv', index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "e6d69405", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('datasets/classmarksgroupB.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "7eb77253", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "dc33cd11", "metadata": {}, "source": [ "## 4. I/O with EXCEL Files\n", ">**XLSX**: XLSX is a Microsoft Excel Open XML file format. It also comes under the Spreadsheet file format. It is an XML-based file format created by Microsoft Excel. In XLSX data is organized under the cells and columns in a sheet. Each XLSX file may contain one or more sheets. 
So a workbook can contain multiple sheets" ] }, { "cell_type": "code", "execution_count": null, "id": "ae3b2500", "metadata": {}, "outputs": [], "source": [ "import sys\n", "!{sys.executable} -m pip install xlrd xlwt openpyxl" ] }, { "cell_type": "markdown", "id": "b3d1d092", "metadata": {}, "source": [ "### a. Reading a Simple Excel File" ] }, { "cell_type": "code", "execution_count": 19, "id": "3cf3d333", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnogendergroupagemathenglishurdu
0MS01femalegroup B28.072.07274.0
1MS02femalegroup C33.069.09088.0
2MS03femalegroup B21.0NaN9593.0
3MS04malegroup A44.047.05744.0
4MS05malegroup C54.076.078NaN
\n", "
" ], "text/plain": [ " rollno gender group age math english urdu\n", "0 MS01 female group B 28.0 72.0 72 74.0\n", "1 MS02 female group C 33.0 69.0 90 88.0\n", "2 MS03 female group B 21.0 NaN 95 93.0\n", "3 MS04 male group A 44.0 47.0 57 44.0\n", "4 MS05 male group C 54.0 76.0 78 NaN" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_excel(io='datasets/classmarks.xlsx')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "3430bd49", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a08102cc", "metadata": {}, "source": [ "### b. Reading an Excel File having Comments in the beginning\n", "- You may get an error while reading an Excel file because someone may have added few comments on the top of the file. In pandas we can still read the data set by skipping few rows from the top.\n", "- To deal with the ParseError, open the Excel file in MS EXCEL and check if you have some comments on the top.\n", "- If yes, then count the number of rows to skip.\n", "- While reading file, pass the parameter **skiprows = n** (number of rows in the beginning having comments to skip)\n", "- While reading file, pass the parameter **skipfooter = n** (number of rows at the end having comments to skip)" ] }, { "cell_type": "code", "execution_count": null, "id": "6cccdcd0", "metadata": {}, "outputs": [], "source": [ "# The following file has three lines of comments in the beginning of the file.\n", "df = pd.read_excel(io='datasets/classmarkswithcomments.xlsx',skiprows=3)\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "8edcc587", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1ff5b811", "metadata": {}, "source": [ "### c. Reading Excel Workbook with Multiple Sheets\n", "- By default `pd.read_excel()` function read only the first sheet.\n", "- What if we want to read an Excel file having multiple sheets.\n", "- The `big_mart_sales_with_multiple_sheets.xlsx` is a workbook that contains three sheets for different years data. 
The sheet names are 1985, 1987, and 1997" ] }, { "cell_type": "code", "execution_count": 20, "id": "7e857867", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1463, 12)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_excel('datasets/big_mart_sales_with_multiple_sheets.xlsx')\n", "# if you check/view the data you can see, it only contains the data of first excel sheet (for the year 1985)\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 21, "id": "9b7966ff", "metadata": {}, "outputs": [], "source": [ "df_1985 = pd.read_excel('datasets/big_mart_sales_with_multiple_sheets.xlsx',sheet_name='1985')\n", "df_1987 = pd.read_excel('datasets/big_mart_sales_with_multiple_sheets.xlsx',sheet_name='1987')\n", "df_1997 = pd.read_excel('datasets/big_mart_sales_with_multiple_sheets.xlsx',sheet_name='1997')" ] }, { "cell_type": "code", "execution_count": 22, "id": "47ce9c66", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1463, 12)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_1985.shape" ] }, { "cell_type": "code", "execution_count": 23, "id": "939cf540", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(932, 12)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_1987.shape" ] }, { "cell_type": "code", "execution_count": 24, "id": "687a6a4c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(930, 12)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_1997.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "bdcdad44", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "229e0221", "metadata": {}, "source": [ "## 5. Writing Contents of Dataframe to an EXCEL File\n", "- The `pd.to_excel()` method is used to write the contents of a dataframe (with indices) to an Excel file.\n", "- The only required argument is the file path.\n", "- For details see help page or python documents (link given above)" ] }, { "cell_type": "markdown", "id": "2b30e9d9", "metadata": {}, "source": [ ">- Let us create a new single dataframe after concatenating all the above three dataframes using `pd.concat()` method" ] }, { "cell_type": "code", "execution_count": 25, "id": "6a30fd89", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3325, 12)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_concatenated = pd.concat(objs=[df_1985, df_1987, df_1997])\n", "\n", "df_concatenated.shape" ] }, { "cell_type": "markdown", "id": "f28d1c9f", "metadata": {}, "source": [ "**Note the total number of rows in this dataframe equals to `1463+932+930 = 3325`**" ] }, { "cell_type": "code", "execution_count": null, "id": "1b40bb0f", "metadata": {}, "outputs": [], "source": [ "df_concatenated.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "a895a92a", "metadata": {}, "outputs": [], "source": [ "# you can store the concatenated data inside your dataframe into a single Excel file\n", "# You can mention the argument `index= false` for not storing row indices (0, 1,2,3,... 
in the Excel file.\n", "\n", "df_concatenated.to_excel(excel_writer='temp.xlsx', index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "ec95f340", "metadata": {}, "outputs": [], "source": [ "# Let us verify\n", "data = pd.read_excel(io='temp.xlsx')\n", "\n", "data.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "28a845bd", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7f84a0f6", "metadata": {}, "source": [ "## 6. I/O with JSON Files" ] }, { "cell_type": "markdown", "id": "6219e7d6", "metadata": {}, "source": [ ">**JSON**: JavaScript Object Notation is a text-based open standard file format that uses human-readable text consisting of attribute–value pairs and arrays. It is a data interchange format that is used to store and transfer the data via Internet, primarily between a web client and a server." ] }, { "cell_type": "markdown", "id": "bc7f1cf8", "metadata": {}, "source": [ "### a. Reading a Simple JSON File" ] }, { "cell_type": "code", "execution_count": null, "id": "d59f6291", "metadata": {}, "outputs": [], "source": [ "import sys\n", "!{sys.executable} -m pip install SQLAlchemy psycopg2-binary" ] }, { "cell_type": "code", "execution_count": 26, "id": "6900634f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{\"name\": \"Kamal\", \"age\": 12, \"grade\": \"A\"}, {\"name\": \"Hashim\", \"age\": 18, \"grade\": \"B\"}, {\"name\": \"Salman\", \"age\": 11, \"grade\": \"A\"}, {\"name\": \"Mazhar\", \"age\": 12, \"grade\": \"C\"}, {\"name\": \"Eisha\", \"age\": 13, \"grade\": \"B\"}, {\"name\": \"Farhan\", \"age\": 22, \"grade\": \"C\"}, {\"name\": \"Mohsin\", \"age\": 11, \"grade\": \"A\"}, {\"name\": \"Bilal\", \"age\": 19, \"grade\": \"A\"}, {\"name\": \"Ishaan\", \"age\": 10, \"grade\": \"D\"}, {\"name\": \"Zalaid\", \"age\": 9, \"grade\": \"B\"}]" ] } ], "source": [ "! cat datasets/simple.json" ] }, { "cell_type": "code", "execution_count": 27, "id": "f7bf34ec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegrade
0Kamal12A
1Hashim18B
2Salman11A
3Mazhar12C
4Eisha13B
5Farhan22C
6Mohsin11A
7Bilal19A
8Ishaan10D
9Zalaid9B
\n", "
" ], "text/plain": [ " name age grade\n", "0 Kamal 12 A\n", "1 Hashim 18 B\n", "2 Salman 11 A\n", "3 Mazhar 12 C\n", "4 Eisha 13 B\n", "5 Farhan 22 C\n", "6 Mohsin 11 A\n", "7 Bilal 19 A\n", "8 Ishaan 10 D\n", "9 Zalaid 9 B" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the json file using read_json method of pandas library\n", "df = pd.read_json('datasets/simple.json')\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "a5bc074c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "55cdd441", "metadata": {}, "source": [ "### b. Reading JSON File having each record in a separate line\n", "- Some of the json files are written as records i.e each json line is a separate json object. For example:\n", "```\n", "{ 'name' : 'Ahsan', 'roll_no' : '100' } # line 1\n", "{ 'name' : 'Ayesha' , 'roll_no' : '101' } # line 2\n", "```" ] }, { "cell_type": "code", "execution_count": 28, "id": "673fc206", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"name\": \"Kamal\", \"age\": 12, \"grade\": \"A\"}\r\n", "{\"name\": \"Hashim\", \"age\": 18, \"grade\": \"B\"}\r\n", "{\"name\": \"Salman\", \"age\": 11, \"grade\": \"A\"}\r\n", "{\"name\": \"Mazhar\", \"age\": 12, \"grade\": \"C\"}\r\n", "{\"name\": \"Eisha\", \"age\": 13, \"grade\": \"B\"}\r\n", "{\"name\": \"Farhan\", \"age\": 22, \"grade\": \"C\"}\r\n", "{\"name\": \"Mohsin\", \"age\": 11, \"grade\": \"A\"}\r\n", "{\"name\": \"Bilal\", \"age\": 19, \"grade\": \"A\"}\r\n", "{\"name\": \"Ishaan\", \"age\": 10, \"grade\": \"D\"}\r\n", "{\"name\": \"Zalaid\", \"age\": 9, \"grade\": \"B\"}\r\n" ] } ], "source": [ "! cat datasets/simple_records.json" ] }, { "cell_type": "code", "execution_count": 30, "id": "be13ceb0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegrade
0Kamal12A
1Hashim18B
2Salman11A
3Mazhar12C
4Eisha13B
5Farhan22C
6Mohsin11A
7Bilal19A
8Ishaan10D
9Zalaid9B
\n", "
" ], "text/plain": [ " name age grade\n", "0 Kamal 12 A\n", "1 Hashim 18 B\n", "2 Salman 11 A\n", "3 Mazhar 12 C\n", "4 Eisha 13 B\n", "5 Farhan 22 C\n", "6 Mohsin 11 A\n", "7 Bilal 19 A\n", "8 Ishaan 10 D\n", "9 Zalaid 9 B" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# To read such file you need to pass `lines=True` to the `read_json()` method of dataframe\n", "df = pd.read_json('datasets/simple_records.json',lines=True)\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "ff918dd0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4abc6869", "metadata": {}, "source": [ "## 7. Writing Contents of Dataframe to a JSON File" ] }, { "cell_type": "code", "execution_count": null, "id": "683f9b20", "metadata": {}, "outputs": [], "source": [ "df.to_json('datasets/temp.json')" ] }, { "cell_type": "code", "execution_count": null, "id": "3a566ec5", "metadata": {}, "outputs": [], "source": [ "df = pd.read_json('datasets/temp.json')\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "3d2aa6ae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }