{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Indexing and selecting data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "try:\n", "    import seaborn\n", "except ImportError:\n", "    pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# redefining the example objects\n", "\n", "# series\n", "population = pd.Series({'Germany': 81.3, 'Belgium': 11.3, 'France': 64.3, \n", " 'United Kingdom': 64.9, 'Netherlands': 16.9})\n", "\n", "# dataframe\n", "data = {'country': ['Belgium', 'France', 'Germany', 'Netherlands', 'United Kingdom'],\n", " 'population': [11.3, 64.3, 81.3, 16.9, 64.9],\n", " 'area': [30510, 671308, 357050, 41526, 244820],\n", " 'capital': ['Brussels', 'Paris', 'Berlin', 'Amsterdam', 'London']}\n", "countries = pd.DataFrame(data)\n", "countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the index to the country names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "countries = countries.set_index('country')\n", "countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some notes on selecting data\n", "\n", "One of pandas' basic features is the labeling of rows and columns, but this also makes indexing a bit more complex than in NumPy. We now have to distinguish between:\n", "\n", "- selection by label\n", "- selection by position."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `data[]` provides some convenience shortcuts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a DataFrame, basic indexing selects the columns.\n", "\n", "Selecting a single column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "countries['area']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or multiple columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "countries[['area', 'population']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing, however, selects rows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "countries['France':'Netherlands']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "