{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Render our plots inline\n", "%matplotlib inline\n", "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Make the graphs a bit prettier, and bigger\n", "plt.style.use('ggplot')\n", "plt.rcParams['figure.figsize'] = (15, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.1 Reading data from a CSV file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can read data from a CSV file using the `read_csv` function. By default, it assumes that the fields are comma-separated.\n", "\n", "We're going to be looking at some cyclist data from Montréal. Here's the [original page](http://donnees.ville.montreal.qc.ca/dataset/velos-comptage) (in French), but the data is already included in this repository. We're using the data from 2012.\n", "\n", "This dataset lists how many people were on 7 different bike paths in Montréal each day." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "broken_df = pd.read_csv('./data/bikes.csv', encoding=\"ISO-8859-1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Look at the first 3 rows\n", "broken_df[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll notice that this is totally broken! `read_csv` has a bunch of options that will let us fix that, though. 
Here we'll\n", "\n", "* Change the column separator to a `;`\n", "* Set the encoding to `'latin1'` (the default is `'utf8'`)\n", "* Parse the dates in the 'Date' column\n", "* Tell it that our dates have the day first instead of the month first\n", "* Set the index to be the 'Date' column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fixed_df = pd.read_csv('./data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')\n", "fixed_df[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.2 Selecting a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you read a CSV, you get a kind of object called a `DataFrame`, which is made up of rows and columns. You get columns out of a DataFrame the same way you get elements out of a dictionary.\n", "\n", "Here's an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fixed_df['Berri 1']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.3 Plotting a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just add `.plot()` to the end! How could it be easier? =)\n", "\n", "We can see that, unsurprisingly, not many people are biking in January, February, and March." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fixed_df['Berri 1'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also plot all the columns just as easily. We'll make it a little bigger, too.\n", "You can see that it's more squished together, but all the bike paths behave basically the same -- if it's a bad day for cyclists, it's a bad day everywhere." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fixed_df.plot(figsize=(15, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.4 Putting all that together" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the code we needed to write to draw that graph, all together:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('./data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')\n", "df['Berri 1'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "