{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"from __future__ import division\n",
"\n",
"import numpy as np\n",
"import scipy as sp\n",
"import pandas as pd\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# IPython magic command for inline plotting\n",
"%matplotlib inline\n",
"# A wider default figure shape for the notebook\n",
"mpl.rcParams['figure.figsize'] = [15, 3]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Quick Overview of matplotlib"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Drop the first sample (x = 0) so that np.pi/x never divides by zero\n",
"x = np.linspace(0, 1, 10001)[1:]\n",
"y = np.cos(np.pi/x) * np.exp(-x**2)\n",
"\n",
"plt.plot(x, y)\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Plot the following functions over the domain $x \\in \\left[-1, 2\\right]$.\n",
" * $y = f(x) = x^2 \\exp(-x)$\n",
" * $y = f(x) = \\log x$\n",
" * $y = f(x) = 1 + x^x + 3 x^4$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we say pandas, we are not talking about..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(image omitted)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But about..."
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Data analysis: pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **pandas** module provides data structures and tools for data analysis. It focuses on data handling and manipulation as well as linear and panel regression. It is designed to let you carry out your entire data workflow in Python without having to switch to a domain-specific language such as R.\n",
"Although pandas is largely compatible with NumPy/SciPy, there are some important differences in indexing, data organization, and features. The basic pandas data types are not `ndarray` but **Series** and **DataFrame**. These allow you to index data and align axes efficiently."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Series"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `Series` object is a one-dimensional array that can hold any data type. Like a dictionary, it has a set of index labels for access (like keys); unlike a dictionary, it is ordered. Data alignment is intrinsic: the link between each label and its value is kept unless you break it explicitly. In other respects a `Series` is very similar to a NumPy `ndarray`.\n",
"The index can be any list of values, including a list of axis labels, so a `Series` can act something like a `dict`."
]
},
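{
"cell_type": "markdown",
"metadata": {},
"source": [
"The alignment behaviour described above can be sketched with two small Series. The names `a` and `b` and their values are made up for illustration. Arithmetic pairs values by index label rather than by position, and a label that appears in only one operand should come out as `NaN`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Two illustrative Series whose labels overlap only partly and in a different order\n",
"a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])\n",
"b = pd.Series([10.0, 20.0, 30.0], index=['z', 'y', 'w'])\n",
"\n",
"# Addition matches values by label; 'x' and 'w' have no partner, so they become NaN\n",
"print(a + b)"
],
"language": "python",
"metadata": {},
"outputs": []
},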
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = pd.Series([1,5,float('NaN'),7.5,2.1,3])\n",
"print(s)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dates = pd.date_range('20140201', periods=s.size)\n",
"s.index = dates\n",
"print(s)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"letters = ['A', 'B', 'Ch', '#', '#', '---']\n",
"s.index = letters\n",
"print(s)\n",
"print('\\nAccess is like a dictionary key:\\ns[\\'---\\'] = '+str(s['---']))\n",
"print('\\nRepeat labels are possible:\\ns[\\'#\\']=\\n'+str(s['#']))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy functions expecting an ndarray often do just fine with Series as well."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"t = np.exp(s)\n",
"print(t)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"String Methods"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.str.upper()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.str.lower()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.str.len()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])\n",
"print(s2)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s2.str.split('_')"
],
"language": "python",
"metadata": {},
"outputs": []
},
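{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each element of the split result is a list, so the `str` accessor can be applied a second time to pick out one piece. The cell below is a minimal sketch using `str.get`; the variable name `parts` is introduced here purely for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])\n",
"\n",
"# Splitting gives a Series of lists; the missing value stays missing\n",
"parts = s2.str.split('_')\n",
"\n",
"# Retrieve the element at position 1 from each list\n",
"print(parts.str.get(1))"
],
"language": "python",
"metadata": {},
"outputs": []
},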
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Method | Description |\n",
"|---|---|\n",
"| `cat` | Concatenate strings |\n",
"| `split` | Split strings on delimiter |\n",
"| `get` | Index into each element (retrieve i-th element) |\n",
"| `join` | Join strings in each element of the Series with passed separator |\n",
"| `contains` | Return boolean array if each string contains pattern/regex |\n",
"| `replace` | Replace occurrences of pattern/regex with some other string |\n",
"| `repeat` | Duplicate values (`s.str.repeat(3)` equivalent to `x * 3`) |\n",
"| `pad` | Add whitespace to left, right, or both sides of strings |\n",
"| `center` | Equivalent to `pad(side='both')` |\n",
"| `wrap` | Split long strings into lines with length less than a given width |\n",
"| `slice` | Slice each string in the Series |\n",
"| `slice_replace` | Replace slice in each string with passed value |\n",
"| `count` | Count occurrences of pattern |\n",
"| `startswith` | Equivalent to `str.startswith(pat)` for each element |\n",
"| `endswith` | Equivalent to `str.endswith(pat)` for each element |\n",
"| `findall` | Compute list of all occurrences of pattern/regex for each string |\n",
"| `match` | Call `re.match` on each element, returning matched groups as list |\n",
"| `extract` | Call `re.match` on each element, as `match` does, but return matched groups as strings for convenience |\n",
"| `len` | Compute string lengths |\n",
"| `strip` | Equivalent to `str.strip` |\n",
"| `rstrip` | Equivalent to `str.rstrip` |\n",
"| `lstrip` | Equivalent to `str.lstrip` |\n",
"| `lower` | Equivalent to `str.lower` |\n",
"| `upper` | Equivalent to `str.upper` |\n",
"