{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<table>\n",
    "<tr>\n",
    "<td width=15%><img src=\"./img/UGA.png\"></img></td>\n",
    "<td><center><h1>Introduction to Python for Data Sciences</h1></center></td>\n",
    "<td width=15%><a href=\"http://www.iutzeler.org\" style=\"font-size: 16px; font-weight: bold\">Franck Iutzeler</a> </td>\n",
    "</tr>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<br/><br/>\n",
    "\n",
    "<center><a style=\"font-size: 40pt; font-weight: bold\">Chap. 3 - Data Handling with Pandas </a></center> \n",
    "\n",
    "<br/><br/>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1- Pandas\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "In a previous chapter, we explored some features of NumPy and notably its arrays. Here we will take a look at the data structures provided by the **Pandas** library.\n",
    "\n",
    "Pandas is a newer package built on top of NumPy which provides an efficient implementation of **DataFrames**. DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations.\n",
    "\n",
    "\n",
    "\n",
    "Just as we generally import NumPy under the alias ``np``, we will import Pandas under the alias ``pd``.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Pandas Series\n",
    "\n",
    "\n",
    "A Pandas `Series` is a one-dimensional array of indexed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.25\n",
       "1    0.50\n",
       "2    0.75\n",
       "3    1.00\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series([0.25, 0.5, 0.75, 1.0])\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The contents can be accessed in the same way as for NumPy arrays, to the difference that when more than one value is selected, the type remains a Pandas ``Series``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.25 <class 'numpy.float64'>\n"
     ]
    }
   ],
   "source": [
    "print(data[0],type(data[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2    0.75\n",
      "3    1.00\n",
      "dtype: float64 <class 'pandas.core.series.Series'>\n"
     ]
    }
   ],
   "source": [
    "print(data[2:],type(data[2:]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The type ``Series`` wraps both a sequence of values and a sequence of indices, which we can access with the <tt>values</tt> and <tt>index</tt> attributes.\n",
    "\n",
    "* ``values`` are the contents of the series as a NumPy array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.25 0.5  0.75 1.  ] <class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "print(data.values,type(data.values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* ``index`` are the indices of the series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RangeIndex(start=0, stop=4, step=1) <class 'pandas.core.indexes.range.RangeIndex'>\n"
     ]
    }
   ],
   "source": [
    "print(data.index,type(data.index))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series Indices\n",
    "\n",
    "The main difference between NumPy arrays and Pandas Series is the presence of this <tt>index</tt> field. By default, it is set (as in NumPy arrays) as <tt>0,1,..,size_of_the_series</tt> but a Series index can be explicitly defined. The indices may be numbers but also strings. Then, the contents of the series *have to* be accessed using these defined indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a    0.25\n",
      "b    0.50\n",
      "c    0.75\n",
      "d    1.00\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.75\n"
     ]
    }
   ],
   "source": [
    "print(data['c'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1    0.25\n",
      "3    0.50\n",
      "4    0.75\n",
      "2    1.00\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[1, 3, 4, 2])\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0\n"
     ]
    }
   ],
   "source": [
    "print(data[2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series and Python Dictionaries [\\*] \n",
    "\n",
    "Pandas Series and Python Dictionaries are close semantically: mappping keys to values. However, the implementation of Pandas series is usually more efficient than dictionaries in the context of data science. Naturally, Series can be contructed from dictionaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135} <class 'dict'>\n",
      "California    38332521\n",
      "Texas         26448193\n",
      "New York      19651127\n",
      "Florida       19552860\n",
      "Illinois      12882135\n",
      "dtype: int64 <class 'pandas.core.series.Series'>\n"
     ]
    }
   ],
   "source": [
    "population_dict = {'California': 38332521,\n",
    "                   'Texas': 26448193,\n",
    "                   'New York': 19651127,\n",
    "                   'Florida': 19552860,\n",
    "                   'Illinois': 12882135}\n",
    "population = pd.Series(population_dict)\n",
    "print(population_dict,type(population_dict))\n",
    "print(population,type(population))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "38332521"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "population['California']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    38332521\n",
       "Texas         26448193\n",
       "New York      19651127\n",
       "Florida       19552860\n",
       "Illinois      12882135\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "population['California':'Illinois']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Pandas DataFrames\n",
    "\n",
    "DataFrames is a fundamental object of Pandas that mimicks what can be found in `R` for instance. Dataframes can be seen as an array of Series: to each `index` (corresponding to an individual for instance or a line in a table), a Dataframe maps multiples values; these values corresponds to the `columns` of the DataFrame which each have a name (as a string).   \n",
    "\n",
    "\n",
    "In the following example, we will construct a Dataframe from two Series with common indices. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "area = pd.Series( {'California': 423967, 'Texas': 695662, 'New York': 141297, 'Florida': 170312, 'Illinois': 149995})\n",
    "population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            Population    Area\n",
      "California    38332521  423967\n",
      "Texas         26448193  695662\n",
      "New York      19651127  141297\n",
      "Florida       19552860  170312\n",
      "Illinois      12882135  149995 <class 'pandas.core.frame.DataFrame'>\n"
     ]
    }
   ],
   "source": [
    "states = pd.DataFrame({'Population': population, 'Area': area})\n",
    "print(states,type(states))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Jupyter notebooks, DataFrames are displayed in a fancier way when the name of the dataframe is typed (instead of using <tt>print</tt>)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrames have \n",
    "* <tt>index</tt> that are the defined indices as in Series\n",
    "* <tt>columns</tt> that are the columns names\n",
    "* <tt>values</tt> that return a (2D) NumPy array with the contents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')\n",
      "Index(['Population', 'Area'], dtype='object')\n",
      "[[38332521   423967]\n",
      " [26448193   695662]\n",
      " [19651127   141297]\n",
      " [19552860   170312]\n",
      " [12882135   149995]] <class 'numpy.ndarray'> (5, 2)\n"
     ]
    }
   ],
   "source": [
    "print(states.index)\n",
    "print(states.columns)\n",
    "print(states.values,type(states.values),states.values.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Warning:*  When accessing a Dataframe, `dataframe_name[column_name]` return the corresponding column as a Series. `dataframe_name[index_name]` returns an error! We will see later how to access a specific index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "California    423967\n",
      "Texas         695662\n",
      "New York      141297\n",
      "Florida       170312\n",
      "Illinois      149995\n",
      "Name: Area, dtype: int64 <class 'pandas.core.series.Series'>\n"
     ]
    }
   ],
   "source": [
    "print(states['Area'],type(states['Area']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KeyError:  'California'\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(states['California'])\n",
    "except KeyError as error: \n",
    "    print(\"KeyError: \",error)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataframe creation\n",
    "\n",
    "To create DataFrames, the main methods are:\n",
    "* from Series (as above)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "California    38332521\n",
      "Texas         26448193\n",
      "New York      19651127\n",
      "Florida       19552860\n",
      "Illinois      12882135\n",
      "dtype: int64 <class 'pandas.core.series.Series'>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(population,type(population))\n",
    "states = pd.DataFrame({'Population': population, 'Area': area})\n",
    "states"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* from NumPy arrays (the columns and indices are taken as the array's ones)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-1.47908983  0.55834675 -0.68109792]\n",
      " [ 1.18023681  1.82871481  0.0944462 ]\n",
      " [-0.22391784  0.26061809  0.68857944]\n",
      " [-1.75644104  0.74439857 -0.45926716]\n",
      " [-0.90534641 -1.57246221  2.28871663]] <class 'numpy.ndarray'>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-1.479090</td>\n",
       "      <td>0.558347</td>\n",
       "      <td>-0.681098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.180237</td>\n",
       "      <td>1.828715</td>\n",
       "      <td>0.094446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.223918</td>\n",
       "      <td>0.260618</td>\n",
       "      <td>0.688579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-1.756441</td>\n",
       "      <td>0.744399</td>\n",
       "      <td>-0.459267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.905346</td>\n",
       "      <td>-1.572462</td>\n",
       "      <td>2.288717</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -1.479090  0.558347 -0.681098\n",
       "1  1.180237  1.828715  0.094446\n",
       "2 -0.223918  0.260618  0.688579\n",
       "3 -1.756441  0.744399 -0.459267\n",
       "4 -0.905346 -1.572462  2.288717"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = np.random.randn(5,3)\n",
    "print(A,type(A))\n",
    "dfA = pd.DataFrame(A)\n",
    "dfA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* from a *list* of *dictionaries*. Be careful, each element of the list is an example (corresponding to an automatic index 0,1,...) while each key of the dictonary corresponds to a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}] <class 'list'>\n",
      "{'a': 0, 'b': 0} <class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "data = [{'a': i, 'b': 2 * i} for i in range(3)]\n",
    "print(data,type(data))\n",
    "print(data[0],type(data[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b\n",
       "0  0  0\n",
       "1  1  2\n",
       "2  2  4"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(data)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* from a *file* , typically a <tt>csv</tt> file (for comma separated values), eventually with the names of the columns as a first line.\n",
    "\n",
    "\n",
    "    col_1_name,col_2_name,col_3_name\n",
    "    col_1_v1,col_2_v1,col_3_v1\n",
    "    col_1_v2,col_2_v2,col_3_v2\n",
    "    ...\n",
    "    \n",
    "For other files types (MS Excel, libSVM, any other separator) see this [part of the doc](https://pandas.pydata.org/pandas-docs/stable/api.html#input-output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "order,name,height(cm)\r\n",
      "1,George Washington,189\r\n",
      "2,John Adams,170\r\n",
      "3,Thomas Jefferson,189\r\n"
     ]
    }
   ],
   "source": [
    "!head -4 data/president_heights.csv # Jupyter bash command to see the first 4 lines of the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>order</th>\n",
       "      <th>name</th>\n",
       "      <th>height(cm)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>John Adams</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Thomas Jefferson</td>\n",
       "      <td>189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>James Madison</td>\n",
       "      <td>163</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>James Monroe</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>John Quincy Adams</td>\n",
       "      <td>171</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Andrew Jackson</td>\n",
       "      <td>185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Martin Van Buren</td>\n",
       "      <td>168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>William Henry Harrison</td>\n",
       "      <td>173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>John Tyler</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>11</td>\n",
       "      <td>James K. Polk</td>\n",
       "      <td>173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12</td>\n",
       "      <td>Zachary Taylor</td>\n",
       "      <td>173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>Millard Fillmore</td>\n",
       "      <td>175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14</td>\n",
       "      <td>Franklin Pierce</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>15</td>\n",
       "      <td>James Buchanan</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>16</td>\n",
       "      <td>Abraham Lincoln</td>\n",
       "      <td>193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>17</td>\n",
       "      <td>Andrew Johnson</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>18</td>\n",
       "      <td>Ulysses S. Grant</td>\n",
       "      <td>173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>19</td>\n",
       "      <td>Rutherford B. Hayes</td>\n",
       "      <td>174</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>20</td>\n",
       "      <td>James A. Garfield</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>21</td>\n",
       "      <td>Chester A. Arthur</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>23</td>\n",
       "      <td>Benjamin Harrison</td>\n",
       "      <td>168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>25</td>\n",
       "      <td>William McKinley</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>26</td>\n",
       "      <td>Theodore Roosevelt</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>27</td>\n",
       "      <td>William Howard Taft</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>28</td>\n",
       "      <td>Woodrow Wilson</td>\n",
       "      <td>180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>29</td>\n",
       "      <td>Warren G. Harding</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>30</td>\n",
       "      <td>Calvin Coolidge</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>31</td>\n",
       "      <td>Herbert Hoover</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>32</td>\n",
       "      <td>Franklin D. Roosevelt</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>33</td>\n",
       "      <td>Harry S. Truman</td>\n",
       "      <td>175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>34</td>\n",
       "      <td>Dwight D. Eisenhower</td>\n",
       "      <td>179</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>35</td>\n",
       "      <td>John F. Kennedy</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>36</td>\n",
       "      <td>Lyndon B. Johnson</td>\n",
       "      <td>193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>37</td>\n",
       "      <td>Richard Nixon</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>38</td>\n",
       "      <td>Gerald Ford</td>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>39</td>\n",
       "      <td>Jimmy Carter</td>\n",
       "      <td>177</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>40</td>\n",
       "      <td>Ronald Reagan</td>\n",
       "      <td>185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>41</td>\n",
       "      <td>George H. W. Bush</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>42</td>\n",
       "      <td>Bill Clinton</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>43</td>\n",
       "      <td>George W. Bush</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>44</td>\n",
       "      <td>Barack Obama</td>\n",
       "      <td>185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>45</td>\n",
       "      <td>Donald Trump</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    order                    name  height(cm)\n",
       "0       1       George Washington         189\n",
       "1       2              John Adams         170\n",
       "2       3        Thomas Jefferson         189\n",
       "3       4           James Madison         163\n",
       "4       5            James Monroe         183\n",
       "5       6       John Quincy Adams         171\n",
       "6       7          Andrew Jackson         185\n",
       "7       8        Martin Van Buren         168\n",
       "8       9  William Henry Harrison         173\n",
       "9      10              John Tyler         183\n",
       "10     11           James K. Polk         173\n",
       "11     12          Zachary Taylor         173\n",
       "12     13        Millard Fillmore         175\n",
       "13     14         Franklin Pierce         178\n",
       "14     15          James Buchanan         183\n",
       "15     16         Abraham Lincoln         193\n",
       "16     17          Andrew Johnson         178\n",
       "17     18        Ulysses S. Grant         173\n",
       "18     19     Rutherford B. Hayes         174\n",
       "19     20       James A. Garfield         183\n",
       "20     21       Chester A. Arthur         183\n",
       "21     23       Benjamin Harrison         168\n",
       "22     25        William McKinley         170\n",
       "23     26      Theodore Roosevelt         178\n",
       "24     27     William Howard Taft         182\n",
       "25     28          Woodrow Wilson         180\n",
       "26     29       Warren G. Harding         183\n",
       "27     30         Calvin Coolidge         178\n",
       "28     31          Herbert Hoover         182\n",
       "29     32   Franklin D. Roosevelt         188\n",
       "30     33         Harry S. Truman         175\n",
       "31     34    Dwight D. Eisenhower         179\n",
       "32     35         John F. Kennedy         183\n",
       "33     36       Lyndon B. Johnson         193\n",
       "34     37           Richard Nixon         182\n",
       "35     38             Gerald Ford         183\n",
       "36     39            Jimmy Carter         177\n",
       "37     40           Ronald Reagan         185\n",
       "38     41       George H. W. Bush         188\n",
       "39     42            Bill Clinton         188\n",
       "40     43          George W. Bush         182\n",
       "41     44            Barack Obama         185\n",
       "42     45            Donald Trump         188"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('data/president_heights.csv')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Names and Values\n",
    "\n",
    "Notice there can be missing values in DataFrames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     a  b    c\n",
       "0  1.0  2  NaN\n",
       "1  NaN  3  4.0"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can set indices and columns names *a posteriori*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-1.479090</td>\n",
       "      <td>0.558347</td>\n",
       "      <td>-0.681098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.180237</td>\n",
       "      <td>1.828715</td>\n",
       "      <td>0.094446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-0.223918</td>\n",
       "      <td>0.260618</td>\n",
       "      <td>0.688579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>-1.756441</td>\n",
       "      <td>0.744399</td>\n",
       "      <td>-0.459267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>-0.905346</td>\n",
       "      <td>-1.572462</td>\n",
       "      <td>2.288717</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           a         b         c\n",
       "1  -1.479090  0.558347 -0.681098\n",
       "4   1.180237  1.828715  0.094446\n",
       "9  -0.223918  0.260618  0.688579\n",
       "16 -1.756441  0.744399 -0.459267\n",
       "25 -0.905346 -1.572462  2.288717"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfA.columns = ['a','b','c']\n",
    "dfA.index = [i**2 for i in range(1,6)  ]\n",
    "dfA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "area = pd.Series( {'California': 423967, 'Texas': 695662, 'New York': 141297, 'Florida': 170312, 'Illinois': 149995})\n",
    "population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135})\n",
    "states = pd.DataFrame({'Population': population, 'Area': area})\n",
    "states"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may access columns directly with names, *then* you can access individuals with their index. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    423967\n",
       "Texas         695662\n",
       "New York      141297\n",
       "Florida       170312\n",
       "Illinois      149995\n",
       "Name: Area, dtype: int64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states['Area']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "695662"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states['Area']['Texas']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To ease the access, Pandas offers dedicated methods:\n",
    "* <tt>iloc</tt> enables to access subparts of the dataframe as if it was a NumPy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states.iloc[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California    38332521\n",
       "Texas         26448193\n",
       "Name: Population, dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states.iloc[:2,0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* <tt>loc</tt> does the same but with the explicit names (the last one is included)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states.loc[:'New York']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "      <th>Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Population    Area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "states.loc[:,'Population':]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}