{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 11. Pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas is a Python package that provides __fast, flexible, and expressive__ data structures designed to make working with __\"relational\" or \"labeled\"__ data easy and intuitive. It is open source and one of the most widely used Python libraries for data work. Its goal is to be a high-level building block for practical, real-world __data analysis__.\n",
"\n",
"Pandas is well suited to many kinds of data:\n",
"\n",
"- Tabular data with heterogeneously typed columns, as in a SQL table or an Excel spreadsheet.\n",
"- Ordered and unordered time series data (not necessarily fixed-frequency).\n",
"- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.\n",
"- Any other form of observational or statistical data set. The data does not need to be labeled at all to be placed in a pandas data structure.\n",
"\n",
"Pandas offers the following data structures:\n",
"\n",
"* __Series__: one-dimensional labeled arrays (arrays with an index), similar in some ways to dictionaries. They can be created from dictionaries or lists.\n",
" \n",
"* __DataFrame__: two-dimensional tables, similar to the tables of a relational (SQL) database.\n",
" \n",
"* __Panel, Panel4D and PanelND__: structures for more than two dimensions, available only in older pandas versions. They have since been removed from the library (Panel was dropped in pandas 0.25), so this introduction does not cover them.\n",
"\n",
"Here are some of the things pandas does well:\n",
"\n",
"- Easy handling of __missing data__ (represented as __NaN__) in floating point as well as non-floating point data.\n",
"- Size mutability: columns can be __inserted and deleted__ from DataFrame objects.\n",
"- __Automatic and explicit data alignment__: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. align the data automatically in computations.\n",
"- Powerful, flexible __group by__ functionality to perform __split-apply-combine__ operations on data sets, for both aggregating and transforming data.\n",
"- __Easy conversion__ of ragged, differently indexed data in other Python and NumPy data structures into DataFrame objects.\n",
"- Intelligent label-based __slicing, fancy indexing, and subsetting__ of large data sets.\n",
"- Robust IO tools for loading data from __flat files__ (CSV and delimited), Excel files, and databases, and for saving and loading data in the fast __HDF5__ format.\n",
"- __Time series-specific functionality__: date range generation and frequency conversion, moving window statistics, date shifting and lagging, etc.\n",
"\n",
"For data scientists, working with data is typically divided into multiple stages: collecting and cleaning the data, analyzing or modeling it, and then organizing the results of the analysis into a form suitable for plotting or tabular display. Pandas is a good fit for all of these tasks.\n",
"\n",
"Other notes:\n",
"\n",
"- Pandas is __fast__. Many of the low-level algorithmic bits have been extensively tuned in Cython code. However, as with anything else, generalization usually sacrifices performance, so if you focus on a single feature for your application you may be able to create a faster specialized tool.\n",
"- Pandas is a dependency of __statsmodels__, which makes it an important part of the statistical computing ecosystem in Python.\n",
"- Pandas has been used extensively in production in __financial applications__."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by importing pandas:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Conventional way to import pandas:\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import matplotlib for plotting\n",
"%matplotlib inline\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rcParams['figure.figsize'] = (15, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Structures\n",
"------------------------------\n",
"\n",
"### Series\n",
"\n",
"The Series data structure in Pandas is a one-dimensional labeled array.\n",
"\n",
"- The data in the array can be of any type (integers, strings, floating point numbers, Python objects, etc.).\n",
"\n",
"- The data within a single Series is homogeneous: all values share one dtype.\n",
"\n",
"- A Series can be created from a list, an array, or a dictionary.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 33\n",
"1 19\n",
"2 15\n",
"3 89\n",
"4 11\n",
"5 -5\n",
"6 9\n",
"dtype: int64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Series constructor with the data given as a list of integers\n",
"s1 = pd.Series([33, 19, 15, 89, 11, -5, 9])\n",
"s1"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The object's type is a pandas Series\n",
"type(s1)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([33, 19, 15, 89, 11, -5, 9])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Retrieve the values of the Series\n",
"s1.values"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The underlying values are a NumPy ndarray\n",
"type(s1.values)"
]
},
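{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides a list, a Series can also be built from a dictionary or a NumPy array. The cell below is a minimal sketch of both cases; the names `temps_by_day`, `s_from_dict` and `s_from_array` and the sample values are illustrative assumptions, not part of the original material."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# From a dictionary: the keys become the index labels (hypothetical sample data)\n",
"temps_by_day = {'Mon': 33, 'Tue': 19, 'Wed': 15}\n",
"s_from_dict = pd.Series(temps_by_day)\n",
"\n",
"# From a NumPy array: pandas assigns the default integer index 0 ... n-1\n",
"s_from_array = pd.Series(np.array([33, 19, 15]))\n",
"\n",
"print(s_from_dict)\n",
"print(s_from_array)"
]
},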
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Alt text](../images/series.jpg \"Optional title\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Mon 33\n",
"Tue 19\n",
"Wed 15\n",
"Thu 89\n",
"Fri 11\n",
"Sat -5\n",
"Sun 9\n",
"dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define the data and the index as lists\n",
"data1 = [33, 19, 15, 89, 11, -5, 9]\n",
"index1 = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']\n",
"\n",
"# Create the Series\n",
"s2 = pd.Series(data1, index=index1)\n",
"\n",
"s2"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"![Alt text](../images/series2.jpg \"Optional title\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Weekday\n",
"Mon 33\n",
"Tue 19\n",
"Wed 15\n",
"Thu 89\n",
"Fri 11\n",
"Sat -5\n",
"Sun 9\n",
"Name: Daily Temperatures, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can also give meaningful names to the Series and to its index\n",
"\n",
"s2.name='Daily Temperatures'\n",
"s2.index.name='Weekday'\n",
"\n",
"s2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most general way to think of a Series is as an ordered key-value store.\n",
"\n",
"- The order is given by the offset (position) of each element.\n",
"- The key-value part is a mapping from index labels to the values in the data array.\n",
"- \"Index\" can therefore mean an \"offset\" or \"position\", or it can mean a \"label\" or \"key\".\n",
"\n",
"![Alt text](../images/series3.jpg \"Optional title\")"
]
},
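{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two meanings of \"index\" can be made explicit with `.iloc` (position) and `.loc` (label). The cell below is a small sketch that rebuilds a short Series so that it runs on its own; `week`, `first_by_position` and `first_by_label` are hypothetical names used only here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample Series with string labels\n",
"week = pd.Series([33, 19, 15], index=['Mon', 'Tue', 'Wed'])\n",
"\n",
"# Index as offset/position: .iloc takes integer positions\n",
"first_by_position = week.iloc[0]\n",
"\n",
"# Index as label/key: .loc takes index labels\n",
"first_by_label = week.loc['Mon']\n",
"\n",
"# Both expressions refer to the same element\n",
"print(first_by_position, first_by_label)"
]
},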
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a Series stores its values in a __NumPy ndarray__, we can apply the same operations we used with NumPy:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Weekday\n",
"Mon 66\n",
"Tue 38\n",
"Wed 30\n",
"Thu 178\n",
"Fri 22\n",
"Sat -10\n",
"Sun 18\n",
"Name: Daily Temperatures, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s2 * 2"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Weekday\n",
"Mon 33\n",
"Tue 19\n",
"Wed 15\n",
"Name: Daily Temperatures, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can slice by position\n",
"s2[0:3]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"Weekday\n",
"Mon 33\n",
"Tue 19\n",
"Wed 15\n",
"Name: Daily Temperatures, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can also slice by label; note that the end label is included\n",
"s2['Mon':'Wed']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Combining Series"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" bedrs bathrs price_sqr_meter\n",
"0 1 2 19510\n",
"1 2 1 15190\n",
"2 1 2 16107\n",
"3 1 1 22991\n",
"4 2 2 24508"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"# Three Series of 100 random integers each (the upper bound 'high' is exclusive)\n",
"s1 = pd.Series(np.random.randint(1, high=5, size=100, dtype='l'))\n",
"s2 = pd.Series(np.random.randint(1, high=4, size=100, dtype='l'))\n",
"s3 = pd.Series(np.random.randint(10000, high=30001, size=100, dtype='l'))\n",
"\n",
"# axis=1 places the Series side by side as the columns of a DataFrame\n",
"housemkt = pd.concat([s1, s2, s3], axis=1)\n",
"housemkt.rename(columns = {0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace=True)\n",
"\n",
"housemkt.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DataFrames\n",
"\n",
"The DataFrame data structure in Pandas is a two-dimensional labeled array.\n",
"\n",
"- The data can be of any type (integers, strings, floating point numbers, Python objects, etc.).\n",
"- The data within each column is homogeneous.\n",
"- By default, Pandas creates a numeric index for the rows in the sequence 0 ... n-1.\n",
"\n",
"![Alt text](../images/dataframe.jpg \"Optional title\")"
]
},
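{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of these properties, the cell below builds a tiny DataFrame and inspects its column dtypes and its default row index. The `people` DataFrame and its values are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample data: one text, one integer and one float column\n",
"people = pd.DataFrame({\n",
"    'name': ['Ana', 'Luis', 'Rosa'],\n",
"    'age': [31, 45, 27],\n",
"    'height_m': [1.62, 1.75, 1.68],\n",
"})\n",
"\n",
"# Each column has its own dtype; values within a column share it\n",
"print(people.dtypes)\n",
"\n",
"# No index was given, so pandas created a default integer index for the rows\n",
"print(people.index)"
]
},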
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import datetime\n",
"\n",
"# Build a list of dates from 12-01 to 12-09 (the end date, 12-10, is excluded by the loop)\n",
"dt = datetime.datetime(2016,12,1)\n",
"end = datetime.datetime(2016,12,10)\n",
"step = datetime.timedelta(days=1)\n",
"dates = []\n",
"\n",
"# Fill the list\n",
"while dt < end:\n",
" dates.append(dt.strftime('%m-%d'))\n",
" dt += step"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['12-01',\n",
" '12-02',\n",
" '12-03',\n",
" '12-04',\n",
" '12-05',\n",
" '12-06',\n",
" '12-07',\n",
" '12-08',\n",
" '12-09']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dates"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Arequipa Date Lima Puno\n",
"0 15 12-01 20 -2\n",
"1 19 12-02 18 0\n",
"2 15 12-03 23 2\n",
"3 11 12-04 19 5\n",
"4 9 12-05 25 7\n",
"5 8 12-06 27 -5\n",
"6 13 12-07 23 -3\n",
"7 14 12-08 29 4\n",
"8 16 12-09 30 7"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = {'Date': dates, 'Arequipa' : [15,19,15,11,9,8,13,14,16], 'Puno': [-2,0,2,5,7,-5,-3,4,7], 'Lima':[20,18,23,19,25,27,23,29,30]}\n",
"temps = pd.DataFrame(d)\n",
"temps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading data from a CSV file\n",
"\n",
"You can read data from a __CSV__ (comma-separated values) file with the read_csv function.\n",
"\n",
"Let's load some data on UFO sightings."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Read the dataset of UFO sighting reports into a DataFrame\n",
"ufo = pd.read_csv('../data/ufo.csv')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" City Colors Reported Shape Reported State Time\n",
"0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n",
"1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n",
"2 Holyoke NaN OVAL CO 2/15/1931 14:00\n",
"3 Abilene NaN DISK KS 6/1/1931 13:00\n",
"4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Examine the first 5 rows\n",
"ufo.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Documentation for [read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also read data from a URL. Here we read a __TSV__ (tab-separated values) file with __read_table__:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" order_id quantity item_name \\\n",
"4617 1833 1 Steak Burrito \n",
"4618 1833 1 Steak Burrito \n",
"4619 1834 1 Chicken Salad Bowl \n",
"4620 1834 1 Chicken Salad Bowl \n",
"4621 1834 1 Chicken Salad Bowl \n",
"\n",
" choice_description item_price \n",
"4617 [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... $11.75 \n",
"4618 [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... $11.75 \n",
"4619 [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... $11.25 \n",
"4620 [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... $8.75 \n",
"4621 [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... $8.75 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the Chipotle orders dataset from a URL and store the result in a DataFrame\n",
"orders = pd.read_table('http://bit.ly/chiporders')\n",
"\n",
"# Show the last rows\n",
"orders.tail()"
]
},
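{
"cell_type": "markdown",
"metadata": {},
"source": [
"`read_table` behaves like `read_csv` with a tab as the separator. The cell below sketches this on a tiny in-memory TSV, so it needs no file or network access; the sample text and the names `tsv_text`, `from_table` and `from_csv` are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical two-row TSV held in a string instead of a file\n",
"tsv_text = 'order_id\\titem_name\\n1\\tChips\\n2\\tSteak Burrito\\n'\n",
"\n",
"# read_table splits on tabs by default; read_csv needs sep given explicitly\n",
"from_table = pd.read_table(io.StringIO(tsv_text))\n",
"from_csv = pd.read_csv(io.StringIO(tsv_text), sep='\\t')\n",
"\n",
"# Compare the two resulting DataFrames\n",
"print(from_table.equals(from_csv))"
]
},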
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To select a column (a __\"Series\"__), use bracket notation []:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 Ithaca\n",
"1 Willingboro\n",
"2 Holyoke\n",
"3 Abilene\n",
"4 New York Worlds Fair\n",
"Name: City, dtype: object"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ufo['City'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use dot notation (__.__):"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Ithaca\n",
"1 Willingboro\n",
"2 Holyoke\n",
"3 Abilene\n",
"4 New York Worlds Fair\n",
"Name: City, dtype: object"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ufo.City.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bracket notation [] always works, whereas dot notation has limitations:\n",
"\n",
"- Dot notation does not work if the Series name contains spaces.\n",
"- Dot notation does not work if the Series has the same name as a DataFrame method or attribute (such as 'head' or 'shape').\n",
"- Dot notation cannot be used to define the name of a new Series (see below)."
]
},
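{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three limitations above can be seen on a small hand-made DataFrame. This is only a sketch: the `sightings` DataFrame, its column names and its values are hypothetical and chosen to trigger each case."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data: one column name has a space, another clashes with an attribute\n",
"sightings = pd.DataFrame({\n",
"    'City': ['Ithaca', 'Holyoke'],\n",
"    'Colors Reported': ['RED', 'BLUE'],\n",
"    'shape': ['TRIANGLE', 'OVAL'],\n",
"})\n",
"\n",
"# 1. A space in the name: only bracket notation can express it\n",
"print(sightings['Colors Reported'])\n",
"\n",
"# 2. A name that clashes with an attribute: dot notation returns the\n",
"#    DataFrame's (rows, columns) tuple, brackets return the column\n",
"print(sightings.shape)\n",
"print(sightings['shape'])\n",
"\n",
"# 3. Creating a new column requires bracket notation\n",
"sightings['Location'] = sightings['City'] + ', NY'\n",
"print(sightings.columns.tolist())"
]
},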
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" star_rating title content_rating genre duration \\\n",
"0 9.3 The Shawshank Redemption R Crime 142 \n",
"1 9.2 The Godfather R Crime 175 \n",
"2 9.1 The Godfather: Part II R Crime 200 \n",
"3 9.0 The Dark Knight PG-13 Action 152 \n",
"4 8.9 Pulp Fiction R Crime 154 \n",
"\n",
" actors_list \n",
"0 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... \n",
"1 [u'Marlon Brando', u'Al Pacino', u'James Caan'] \n",
"2 [u'Al Pacino', u'Robert De Niro', u'Robert Duv... \n",
"3 [u'Christian Bale', u'Heath Ledger', u'Aaron E... \n",
"4 [u'John Travolta', u'Uma Thurman', u'Samuel L.... "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read a dataset of top-rated IMDb movies into a DataFrame\n",
"\n",
"movies = pd.read_csv('../data/imdb.csv')\n",
"movies.head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" City Colors Reported Shape Reported State Time\n",
"0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n",
"1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n",
"2 Holyoke NaN OVAL CO 2/15/1931 14:00\n",
"3 Abilene NaN DISK KS 6/1/1931 13:00\n",
"4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Reload the UFO dataset and examine the first 5 rows\n",
"ufo = pd.read_csv('../data/ufo.csv')\n",
"ufo.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Column names should preferably not contain spaces. We can rename columns in several ways, for example:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" city colors_reported shape_reported state time\n",
"0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00\n",
"1 Willingboro NaN OTHER NJ 6/30/1930 20:00\n",
"2 Holyoke NaN OVAL CO 2/15/1931 14:00\n",
"3 Abilene NaN DISK KS 6/1/1931 13:00\n",
"4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace all column names by overwriting the 'columns' attribute\n",
"ufo_cols = ['city', 'colors_reported', 'shape_reported', 'state', 'time']\n",
"ufo.columns = ufo_cols\n",
"ufo.head()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" city Colors_Reported_test Shape_Reported_test state \\\n",
"0 Ithaca NaN TRIANGLE NY \n",
"1 Willingboro NaN OTHER NJ \n",
"2 Holyoke NaN OVAL CO \n",
"3 Abilene NaN DISK KS \n",
"4 New York Worlds Fair NaN LIGHT NY \n",
"\n",
" time \n",
"0 6/1/1930 22:00 \n",
"1 6/30/1930 20:00 \n",
"2 2/15/1931 14:00 \n",
"3 6/1/1931 13:00 \n",
"4 4/18/1933 19:00 "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rename two of the columns with the 'rename' method\n",
"ufo.rename(columns={'colors_reported':'Colors_Reported_test', 'shape_reported':'Shape_Reported_test'}, inplace=True)\n",
"ufo.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"To remove a column, use the __drop__ method:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" city Shape_Reported_test state time\n",
"0 Ithaca TRIANGLE NY 6/1/1930 22:00\n",
"1 Willingboro OTHER NJ 6/30/1930 20:00\n",
"2 Holyoke OVAL CO 2/15/1931 14:00\n",
"3 Abilene DISK KS 6/1/1931 13:00\n",
"4 New York Worlds Fair LIGHT NY 4/18/1933 19:00"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove a column (axis=1 refers to columns)\n",
"ufo.drop('Colors_Reported_test', axis=1, inplace=True)\n",
"ufo.head()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" country beer_servings spirit_servings wine_servings \\\n",
"0 Afghanistan 0 0 0 \n",
"1 Albania 89 132 54 \n",
"2 Algeria 25 0 14 \n",
"3 Andorra 245 138 312 \n",
"4 Angola 217 57 45 \n",
"\n",
" total_litres_of_pure_alcohol continent \n",
"0 0.0 Asia \n",
"1 4.9 Europe \n",
"2 0.7 Africa \n",
"3 12.4 Europe \n",
"4 5.9 Africa "
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the alcohol consumption dataset into a DataFrame\n",
"\n",
"drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n",
"drinks.head()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"61.471698113207545"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute the mean beer servings for countries in Africa only\n",
"\n",
"drinks[drinks.continent=='Africa'].beer_servings.mean()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"continent\n",
"Africa 61.471698\n",
"Asia 37.045455\n",
"Europe 193.777778\n",
"North America 145.434783\n",
"Oceania 89.687500\n",
"South America 175.083333\n",
"Name: beer_servings, dtype: float64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute the mean beer servings for each continent\n",
"\n",
"drinks.groupby('continent').beer_servings.mean()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" beer_servings spirit_servings wine_servings \\\n",
"continent \n",
"Africa 61.471698 16.339623 16.264151 \n",
"Asia 37.045455 60.840909 9.068182 \n",
"Europe 193.777778 132.555556 142.222222 \n",
"North America 145.434783 165.739130 24.521739 \n",
"Oceania 89.687500 58.437500 35.625000 \n",
"South America 175.083333 114.750000 62.416667 \n",
"\n",
" total_litres_of_pure_alcohol \n",
"continent \n",
"Africa 3.007547 \n",
"Asia 2.170455 \n",
"Europe 8.617778 \n",
"North America 5.995652 \n",
"Oceania 3.381250 \n",
"South America 6.308333 "
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
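"# Specifying a column for the aggregation is not required: this computes the mean of every numeric column\n",
"# (numeric_only=True skips text columns such as 'country', which recent pandas versions would otherwise reject)\n",
"\n",
"drinks.groupby('continent').mean(numeric_only=True)"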
]
},
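{
"cell_type": "markdown",
"metadata": {},
"source": [
"A group-by can also apply several aggregations at once with `agg`. The cell below is a sketch on a four-row hand-made DataFrame so that it runs without downloading anything; `sample` and `summary` are hypothetical names, and the rows are a small made-up subset in the same shape as the drinks data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical miniature version of the drinks dataset\n",
"sample = pd.DataFrame({\n",
"    'continent': ['Africa', 'Africa', 'Europe', 'Europe'],\n",
"    'beer_servings': [25, 217, 89, 245],\n",
"})\n",
"\n",
"# Split by continent, apply several aggregations to one column, combine into a table\n",
"summary = sample.groupby('continent')['beer_servings'].agg(['count', 'mean', 'min', 'max'])\n",
"print(summary)"
]
},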
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3EAAAF/CAYAAADn4UAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X+4bnVdJ/z3xwNqj0JKHhkeFY70oBOWHvVElmYGY5Ga\nv1IEf2TmRD3DKFpW0DTZ1Yzz8Fhqk41OlCiaP4AHNUrH9CEnzX7oQYkQYQSEEQbh+CMhNZTDZ/64\n1zluj/tw9jn73qx7nf16Xdd9nXt917r3/dZrudzvvdb6ruruAAAAMA13GTsAAAAAK6fEAQAATIgS\nBwAAMCFKHAAAwIQocQAAABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAATcsDYAZLkPve5T2/atGns\nGAAAAKO46KKLPt/dG1ey7R5LXFU9IMmbkxyapJOc2d3/uaoOSXJOkk1JrklyQnd/afjM6UlemGR7\nkhd395/f0Xds2rQpW7duXUleAACA/U5VXbvSbVdyOeVtSX6pu49O8qgkp1TV0UlOS3Jhdx+V5MJh\nOcO6E5M8JMnxSV5XVRv27j8CAAAAy9ljievuG7r748P7W5J8Ksn9kjwlydnDZmcneerw/ilJ3tHd\nt3b3Z5JcmeSYeQcHAABYj/ZqYpOq2pTk4Un+Lsmh3X3DsOpzmV1umcwK3meXfOy6YWzXn3VyVW2t\nqq3btm3by9gAAADr04pLXFXdM8n5SV7S3TcvXdfdndn9civW3Wd295bu3rJx44ru3wMAAFj3VlTi\nqurAzArcW7v7ncPwjVV12LD+sCQ3DePXJ3nAko/ffxgDAABglfZY4qqqkrwhyae6+9VLVl2Q5PnD\n++cn+ZMl4ydW1d2q6oFJjkry0flFBgAAWL9W8py4Ryd5XpJ/qKqLh7FfS3JGknOr6oVJrk1yQpJ0\n9yer6twkl2U2s+Up3b197skBAADWoT2WuO7+qyS1m9XH7eYzr0jyilXkAgAAYBl7NTslAAAA41Li\nAAAAJkSJAwAAmJCVTGwCwCptOu09Y0dYSNec8cSxIwDA5DgTBwAAMCFKHAAAwIQocQAAABOixAEA\nAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAATIgSBwAA\nMCFKHAAAwIQocQAAABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIXsscVV1\nVlXdVFWXLhk7p6ouHl7XVNXFw/imqvraknX/dS3DAwAArDcHrGCbNyX5/SRv3jHQ3c/a8b6qXpXk\ny0u2v6q7N88rIAAAAN+0xxLX3R+qqk3LrauqSnJCkmPnGwsAAIDlrORM3B354SQ3dvenl4w9cLi8\n8stJfr27P7zcB6vq5CQnJ8nhhx++yhgAAMAd2XTae8aOsHCuOeOJY0fYJ6ud2OSkJG9fsnxDksOH\nyyl/Mcnbqurg5T7Y3Wd295bu3rJx48ZVxgAAAFgf9rnEVdUBSZ6e5JwdY919a3d/YXh/UZKrkjxo\ntSEBAACYWc2ZuH+V5PLuvm7HQFVtrKoNw/sjkxyV5OrVRQQAAGCHlTxi4O1J/ibJg6vquqp64bDq\nxHzrpZRJ8tgklwz3xP1/SX6hu784z8AAAADr2UpmpzxpN+M/s8zY+UnOX30sAAAAlrPaiU0AAAC4\nEylxAAAAE6LEAQAATIgSBwAAMCFKHAAAwIQocQAAABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAAT\nosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAATIgSBwAAMCFKHAAAwIQocQAAABOixAEAAEyI\nEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIXsscVV1VlXdVFWXLhn7zaq6vqouHl5PWLLu\n9Kq6sqquqKofX6vgAAAA69FKzsS9Kcnxy4y/
prs3D6/3JklVHZ3kxCQPGT7zuqraMK+wAAAA690e\nS1x3fyjJF1f4856S5B3dfWt3fybJlUmOWUU+AAAAlljNPXEvqqpLhsst7z2M3S/JZ5dsc90w9m2q\n6uSq2lpVW7dt27aKGAAAAOvHvpa41yc5MsnmJDckedXe/oDuPrO7t3T3lo0bN+5jDAAAgPVln0pc\nd9/Y3du7+/Ykf5hvXjJ5fZIHLNn0/sMYAAAAc7BPJa6qDluy+LQkO2auvCDJiVV1t6p6YJKjknx0\ndREBAADY4YA9bVBVb0/yuCT3qarrkrw8yeOqanOSTnJNkp9Pku7+ZFWdm+SyJLclOaW7t69NdAAA\ngPVnjyWuu09aZvgNd7D9K5K8YjWhAAAAWN4eSxwAcOfadNp7xo6wcK4544ljRwBYGKt5xAAAAAB3\nMiUOAABgQpQ4AACACVHiAAAAJkSJAwAAmBAlDgAAYEKUOAAAgAlR4gAAACZEiQMAAJgQJQ4AAGBC\nlDgAAIAJUeIAAAAmRIkDAACYECUOAABgQpQ4AACACVHiAAAAJkSJAwAAmBAlDgAAYEKUOAAAgAlR\n4gAAACZEiQMAAJgQJQ4AAGBClDgAAIAJUeIAAAAmZI8lrqrOqqqbqurSJWO/XVWXV9UlVfWuqrrX\nML6pqr5WVRcPr/+6luEBAADWm5WciXtTkuN3GftAku/t7ocm+R9JTl+y7qru3jy8fmE+MQEAAEhW\nUOK6+0NJvrjL2Pu7+7Zh8W+T3H8NsgEAALCLedwT97NJ/tuS5QcOl1L+ZVX98O4+VFUnV9XWqtq6\nbdu2OcQAAADY/62qxFXVv0tyW5K3DkM3JDm8uzcn+cUkb6uqg5f7bHef2d1bunvLxo0bVxMDAABg\n3djnEldVP5PkSUme092dJN19a3d/YXh/UZKrkjxoDjkBAADIPpa4qjo+ya8keXJ3f3XJ+Maq2jC8\nPzLJUUmunkdQAAAAkgP2tEFVvT3J45Lcp6quS/LyzGajvFuSD1RVkvztMBPlY5P8VlV9I8ntSX6h\nu7+47A8GAABgr+2xxHX3ScsMv2E3256f5PzVhgIAAGB585idEgAAgDuJEgcAADAhShwAAMCEKHEA\nAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAATIgSBwAAMCFKHAAAwIQocQAAABOixAEA\nAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAATIgSBwAA\nMCFKHAAAwIQocQAAABOyxxJXVWdV1U1VdemSsUOq6gNV9enh33svWXd6VV1ZVVdU1Y+vVXAAAID1\naCVn4t6U5Phdxk5LcmF3H5XkwmE5VXV0khOTPGT4zOuqasPc0gIAAKxzeyxx3f2hJF/cZfgpSc4e\n3p+d5KlLxt/R3bd292eSXJnkmDllBQAAWPf29Z64Q7v7huH955IcOry/X5LPLtnuumHs21TVyVW1\ntaq2btu2bR9jAAAArC+rntikuztJ78PnzuzuLd29ZePGjauNAQAAsC7sa4m7saoOS5Lh35uG8euT\nPGDJdvcfxgAAAJiDfS1xFyR5/vD++Un+ZMn4iVV1t6p6YJKjknx0dREBAADY4YA9bVBVb0/yuCT3\nqarrkrw8yRlJzq2qFya5NskJSdLdn6yqc5NcluS2JKd09/Y1yg4AALDu7LHEdfdJu1l13G62f0WS\nV6wmFAAAAMtb9cQmAAAA3HmUOAAAgAlR4gAAACZEiQMAAJgQJQ4AAGBClDgAAIAJUeIAAAAmRIkD\nAACYkD0+7BsAgMW06bT3jB1hIV1zxhPHjgBrypk4AACACVHiAAAAJkSJAwAAmBAlDgAAYEKUOAAA\ngAlR4gAAACZEiQMAAJgQJQ4AAGBClDgAAIAJUeIAAAAmRIkDAACYECUOAABgQpQ4AACACVHiAAAA\nJkSJAwAA
mJAD9vWDVfXgJOcsGToyyW8kuVeSn0uybRj/te5+7z4nBAAAYKd9LnHdfUWSzUlSVRuS\nXJ/kXUlekOQ13f07c0kIAADATvO6nPK4JFd197Vz+nkAAAAsY14l7sQkb1+y/KKquqSqzqqqey/3\ngao6uaq2VtXWbdu2LbcJAAAAu1h1iauquyZ5cpLzhqHXZ3Z/3OYkNyR51XKf6+4zu3tLd2/ZuHHj\namMAAACsC/M4E/cTST7e3TcmSXff2N3bu/v2JH+Y5Jg5fAcAAACZT4k7KUsupayqw5ase1qSS+fw\nHQAAAGQVs1MmSVXdI8njk/z8kuFXVtXmJJ3kml3WAQAAsAqrKnHd/ZUk37XL2PNWlQgAAIDdmtfs\nlAAAANwJlDgAAIAJUeIAAAAmRIkDAACYECUOAABgQpQ4AACACVHiAAAAJkSJAwAAmBAlDgAAYEKU\nOAAAgAlR4gAAACZEiQMAAJiQA8YOsGg2nfaesSMspGvOeOLYEQAAgDgTBwAAMClKHAAAwIQocQAA\nABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAA\nTMgBq/lwVV2T5JYk25Pc1t1bquqQJOck2ZTkmiQndPeXVhcTAACAZD5n4n60uzd395Zh+bQkF3b3\nUUkuHJYBAACYg7W4nPIpSc4e3p+d5Klr8B0AAADr0mpLXCf5/6vqoqo6eRg7tLtvGN5/Lsmhy32w\nqk6uqq1VtXXbtm2rjAEAALA+rOqeuCSP6e7rq+q+ST5QVZcvXdndXVW93Ae7+8wkZybJli1blt0G\nAACAb7WqM3Hdff3w701J3pXkmCQ3VtVhSTL8e9NqQwIAADCzzyWuqu5RVQfteJ/kx5JcmuSCJM8f\nNnt+kj9ZbUgAAABmVnM55aFJ3lVVO37O27r7fVX1sSTnVtULk1yb5ITVxwQAACBZRYnr7quTPGyZ\n8S8kOW41oQAAAFjeWjxiAAAAgDWixAEAAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAw\nIUocAADAhChxAAAAE6LEAQAATIgSBwAAMCEHjB0ApmzTae8ZO8LCueaMJ44dAQBgv+ZMHAAAwIQo\ncQAAABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LE\nAQAATIgSBwAAMCH7XOKq6gFV9cGquqyqPllVpw7jv1lV11fVxcPrCfOLCwAAsL4dsIrP3pbkl7r7\n41V1UJKLquoDw7rXdPfvrD4eAAAAS+1zievuG5LcMLy/pao+leR+8woGAADAt5vLPXFVtSnJw5P8\n3TD0oqq6pKrOqqp77+YzJ1fV1qraum3btnnEAAAA2O+tusRV1T2TnJ/kJd19c5LXJzkyyebMztS9\narnPdfeZ3b2lu7ds3LhxtTEAAADWhVWVuKo6MLMC99bufmeSdPeN3b29u29P8odJjll9TAAAAJLV\nzU5ZSd6Q5FPd/eol44ct2expSS7d93gAAAAstZrZKR+d5HlJ/qGqLh7Gfi3JSVW1OUknuSbJz68q\nIQAAADutZnbKv0pSy6x6777HAQAA4I7MZXZKAAAA7hxKHAAAwIQocQAAABOixAEAAEyIEgcAADAh\nShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LEAQAATIgSBwAAMCFKHAAAwIQo\ncQAAABOixAEAAEyIEgcAADAhShwAAMCEKHEAAAATosQBAABMiBIHAAAwIUocAADAhChxAAAAE6LE\nAQAATMialbiqOr6qrqiqK6vqtLX6HgAAgPVkTUpcVW1I8l+S/ESSo5OcVFVHr8V3AQAArCdrdSbu\nmCRXdvfV3f31JO9I8pQ1+i4AAIB1Y61K3P2SfHbJ8nXDGAAAAKtQ3T3/H1
r1jCTHd/e/Hpafl+QH\nuvvfLtnm5CQnD4sPTnLF3INM332SfH7sEEyG/YWVsq+wN+wvrJR9hb1hf/l2R3T3xpVseMAaBbg+\nyQOWLN9/GNupu89McuYaff9+oaq2dveWsXMwDfYXVsq+wt6wv7BS9hX2hv1lddbqcsqPJTmqqh5Y\nVXdNcmKSC9bouwAAANaNNTkT1923VdW/TfLnSTYkOau7P7kW3wUAALCerNXllOnu9yZ571r9/HXC\n5absDfsLK2VfYW/YX1gp+wp7w/6yCmsysQkAAABrY63uiQMAAGANKHEAAAATosQBAABMyJpNbAIA\nTFNVfW+So5PcfcdYd795vETA/sCxZX5MbLJgqureSY7Kt+7cHxovEYvM/sJKVVUleU6SI7v7t6rq\n8CT/ors/OnI0FkxVvTzJ4zL7Reu9SX4iyV919zPGzMViqqpHJXltku9JctfMHi31le4+eNRgLBzH\nlvlS4hZIVf3rJKcmuX+Si5M8KsnfdPexowZjIdlf2BtV9foktyc5tru/Z/gDwPu7+/tHjsaCqap/\nSPKwJJ/o7odV1aFJ/ri7Hz9yNBZQVW1NcmKS85JsSfLTSR7U3aePGoyF49gyX+6JWyynJvn+JNd2\n948meXiSfxw3EgvM/sLe+IHuPiXJPydJd38ps7+aw66+1t23J7mtqg5OclOSB4yciQXW3Vcm2dDd\n27v7jUmOHzsTC8mxZY7cE7dY/rm7/7mqUlV36+7Lq+rBY4diYdlf2BvfqKoNSTpJqmpjZmfmYFdb\nq+peSf4wyUVJ/inJ34wbiQX21aq6a5KLq+qVSW6IkwQsz7FljlxOuUCq6l1JXpDkJUmOTfKlJAd2\n9xNGDcZCsr+wN6rqOUmeleSRSd6U5BlJfr27zxszF4utqjYlObi7Lxk5Cguqqo7I7IzKgUlemuQ7\nk7xuODsHy3JsWT0lbkFV1Y9kdiB8X3d/few8LDb7CytRVf8yyXHD4l9096fGzMNiqqqnZbZ/fHlY\nvleSx3X3u8dNBkyZY8t8KXELZJjh6ZPdfcuwfHCS7+nuvxs3GYukqg7u7pur6pDl1nf3F+/sTExD\nVT0iyWMyu6TyI9398ZEjsYCq6uLu3rzL2Ce6++FjZWLxVNW53X3CMFnFt/0y2d0PHSEWC8yxZb7c\nE7dYXp/kEUuW/2mZMXhbkidldj15J6kl6zrJkWOEYrFV1W8keWaS8zPbZ95YVed1938cNxkLaLn7\nmfy+wK5OHf590qgpmBLHljlyJm6B7OYvFJf4axawWlV1RZKHdfc/D8vfkeTi7jYZDt+iqs7KbKbb\n/zIMnZLkkO7+mdFCAZPn2DJfZg9aLFdX1Yur6sDhdWqSq8cOxWKqqkdX1T2G98+tqlcPD3CG5fyv\nLHkofJK7Jbl+pCwsthcl+XqSc4bXrZn9sgXfpqqeXlWfrqovV9XNVXVLVd08di4WkmPLHDkTt0Cq\n6r5Jfi+zmQY7yYVJXtLdN40ajIVUVZdk9tDMh2Y22+AfJTmhu39kzFwspqp6d2bPFfxAZseXxyf5\naJLrkqS7XzxeOmCqqurKJD9poiS4cylxMFFV9fHufsRwr9P13f2GHWNjZ2PxVNXz72h9d599Z2Vh\nMVXV73b3S6rqT7P8RBVPHiEWC66qPtLdjx47B4vLsWVtuJlwAVTVr3T3K6vqtVl+5/YXcpZzS1Wd\nnuR5SX64qu4S/5tmN7r77OGBvA8ahq7o7m+MmYmF85bh398ZNQVTs7Wqzkny7swuj0uSdPc7x4vE\ngnFsWQN+4VsMOy5B2DpqCqbmWUmeneQF3f25qnpsknuMnIkFVVWPS3J2kmsym53yAVX1/O7+0Ji5\nWBzdfVFVbUhycnc/Z+w8TMbBSb6a5MeWjHUSJY4kji1rRYlbAN39p8PO/X3d/bKx8zANQ3H7YJJn\nV9UfJ/lMkt8dORaL61VJfqy7r0iSqn
pQkrcneeSoqVgo3b29qo6oqrt299fHzsPi6+4XjJ2BxefY\nMn9K3IIYdm7XlLNHwy/fJw2vz2c2w1N194+OGoxFd+COApck3f0/qurAMQOxsK5O8pGquiDJV3YM\ndverx4vEoqqquyd5YZKHZMkMuN39s6OFYlE5tsyRErdYLh527PPyrTu3SxJY6vIkH07ypO6+Mkmq\n6qXjRmICtlbVHyX542H5OXEJN8u7anjdJclBI2dh8b0ls/9f+vEkv5XZscVMlSzHsWWOzE65QKrq\njcsMt79msVRVPTXJiUkeneR9Sd6R5I+6+4GjBmOhVdXdMnsez2OGoQ8neV1337r7T7GeVdX/0d1f\nHTsHi62qPtHdD6+qS7r7ocMZ/g9396PGzsZicmyZD2fiFkBV/b/d/atJ3tvd542dh8XW3e9O8u7h\nQd9PSfKSJPetqtcneVd3v3/UgCyc4Z7bs4Ybyl22wh2qqh9M8oYk90xyeFU9LMnPd/e/GTcZC2rH\nLLf/WFXfm+RzSe47Yh4WlGPLfN1l7AAkSZ5QVZXk9LGDMB3d/ZXuflt3/2SS+yf5RJJfHTkWC6i7\ntyc5YnjEAOzJ72Z2adwXkqS7/z7JY0dNxCI7s6runeTfJ7kgyWVJXjluJBaUY8scORO3GN6X5EtJ\n7llVN2c2/fcOt3f3d44Ti6no7i8lOXN4wXLcUM6KdfdnZ39b3Gn7WFlYbN39R8Pbv0xy5JhZWHyO\nLfPjTNwC6O5f7u57JXlPdx/c3Qd190FJnpDkrSPHA/YPVyX5s3zzhvIdL9jVZ6vqh5J0VR1YVS+L\niSrYjao6tKreUFX/bVg+uqpeOHYuFpJjyxyZ2GTBVNXDM5s6/oTMnvt1fnf//ripAFgvquo+Sf5z\nkn+V2ZUh709yand/YdRgLKShvL0xyb/r7odV1QFJPtHd3zdyNBaMY8t8KXELYDfP/XpZdx8xajBg\nvzE8GP7bDvjdfewIcYD9RFV9rLu/f8cslcPYxd29eexssD9zT9xi8NwvYK29bMn7uyf5qSS3jZSF\nBVZVD0zyoiSbsuT3hO5+8liZWGhfqarvyvBHoqp6VJIvjxuJReTYMl9K3GJ4embP/fpgVe147lfd\n8UcAVq67L9pl6CNV9dFRwrDo3p3ZNOB/muT2kbOw+H4xs1kpv7uqPpJkY5JnjBuJBeXYMkcup1wg\nS577dVKSY5O8OZ77BcxBVR2yZPEuSR6Z5Pe6+8EjRWJBVdXfdfcPjJ2D6Rjug3twZn+AvqK7v7GH\nj7AOObbMlxK3oIZnrjwzybO6+7ix8wDTVlWfyexyp8rsMsrPJPmt7v6rUYOxcKrq2UmOymzSgVt3\njHf3x0cLxcKqqlOSvLW7/3FYvneSk7r7deMmY9E4tsyXEgcA7FRV/0+S52X2WIodlzy1SXBYznKT\nmCyd5AR2cGyZL/fEAezHqupXuvuVw/tndvd5S9b9p+7+tfHSsaCemeTI7v762EGYhA1VVT2cFaiq\nDUnuOnImFpNjyxx52DfA/u3EJe9P32Xd8XdmECbj0iT3GjsEk/HnSc6pquOq6rjMJmd738iZWEyO\nLXPkTBzA/q128365ZUhmv2RdXlUfy7fet2IacJbz75P8XJJ/Myz/eWYzEMKuHFvmSIkD2L/1bt4v\ntwxJ8vKxA7D4hhkp/1OSFyT57DB8eJKrM7vSa/tI0Vhcji1zZGITgP1YVW1P8pXMzrp9R5Kv7liV\n5O7dfeBY2ZiGqnpMZrMNnjJ2FhZHVb0myUFJXtrdtwxjByV5VZKvdfepY+Zj8Tm2rI4SBwB8i6p6\neJJnZzYRwWeSnN/dvz9uKhZJVX06yYN6l18kh4lNLu/uo8ZJxiJzbJkfl1MCAKmqByU5aXh9Psk5\nmf2x90dHDcai6l0L3DC4vaqcIWAnx5a1YXZKACBJLk9ybJIndfdjuvu1cV8Tu3dZVf30roNV9dzM\n9i
XYwbFlDTgTBwAkydMzeyTFB6vqfZlNFW8GU3bnlCTvrKqfTXLRMLYls3tvnzZaKhaRY8sacE8c\nALBTVd0jyVMyu/Tp2CRvTvKu7n7/qMFYSFV1bJKHDIuXdfeFY+ZhcTm2zJcSBwAsq6rundkEBM/q\n7uPGzgPsHxxbVk+JAwAAmBATmwAAAEyIEgcAADAhShwAAMCEKHEAwE5V9fSq+nRVfbmqbq6qW6rq\n5rFzAdPm2DJfJjYBAHaqqiuT/GR3f2rsLMD+w7FlvpyJAwCWutEvWcAacGyZI2fiAIBU1dOHtz+S\n5F8keXeSW3es7+53jpELmDbHlrWhxAEAqao33sHq7u6fvdPCAPsNx5a1ocQBADtV1aO7+yN7GgPY\nG44t86XEAQA7VdXHu/sRexoD2BuOLfN1wNgBAIDxVdUPJvmhJBur6heXrDo4yYZxUgFT59iyNpQ4\nACBJ7prknpn9bnDQkvGbkzxjlETA/sCxZQ24nBIASJJU1YYk53b3T42dBdi/VNUR3X3t2Dn2F87E\nAQBJku7eXlX/59g5gP3Sm6rq284edfexY4SZOiUOAFjq4qq6IMl5Sb6yY9CznIBVetmS93dP8lNJ\nbhspy+S5nBIA2Gk3z3TyLCdg7qrqo919zNg5psiZOABgp+5+wdgZgP1PVR2yZPEuSR6Z5DtHijN5\nShwAsFNV3T/Ja5M8ehj6cJJTu/u68VIB+4GLknSSyuwyys8keeGoiSbM5ZQAwE5V9YEkb0vylmHo\nuUme092PHy8VAEspcQDATlV1cXdv3tMYwN6oqgOT/N9JHjsM/fckf9Dd3xgt1ITdZewAAMBC+UJV\nPbeqNgyv5yb5wtihgMl7fWb3wb1ueD1yGGMfOBMHAOxUVUdkdk/cD2Z2/8pfJ3lxd//PUYMBk1ZV\nf9/dD9vTGCtjYhMAYKfuvjbJk8fOAex3tlfVd3f3VUlSVUcm2T5ypslS4gCAVNVv3MHq7u7/cKeF\nAfZHv5zkg1V1dWYzVB6RxCNN9pHLKQGAVNUvLTN8j8ymAP+u7r7nnRwJ2M9U1d2SPHhYvKK7bx0z\nz5QpcQDAt6iqg5KcmlmBOzfJq7r7pnFTAVNUVd+f5LPd/blh+aeT/FSSa5P8Znd/ccx8U2V2SgAg\nSVJVh1TVf0xySWa3XDyiu39VgQNW4Q+SfD1JquqxSc5I8uYkX05y5oi5Js09cQBAquq3kzw9s1+q\nvq+7/2nkSMD+YcOSs23PSnJmd5+f5PyqunjEXJPmckoAIFV1e5Jbk9yW2aMFdq7KbGKTg0cJBkxa\nVV2aZHOEEOQdAAACsElEQVR331ZVlyc5ubs/tGNdd3/vuAmnyZk4ACDd7RYLYC28PclfVtXnk3wt\nyYeTpKr+r8wuqWQfOBMHAACsmap6VJLDkry/u78yjD0oyT27++OjhpsoJQ4AAGBCXDoBAAAwIUoc\nAADAhChxAKwrVbWpqp69ZHlLVf3eGnzPU6vq6Hn/XABQ4gBYbzYl2Vniuntrd794Db7nqUmUOADm\nTokDYFKq6qer6pKq+vuqestwZu0vhrELq+rwYbs3VdXvVdVfV9XVVfWM4UeckeSHq+riqnppVT2u\nqv5s+MxvVtVZVfXfh8+8eMn3PreqPjp87g+qasMw/k9V9Yohz99W1aFV9UNJnpzkt4ftv/vO/W8J\ngP2ZEgfAZFTVQ5L8epJju/thSU5N8tokZ3f3Q5O8NcnSSyMPS/KYJE/KrLwlyWlJPtzdm7v7Nct8\nzb9M8uNJjkny8qo6sKq+J8mzkjy6uzcn2Z7kOcP290jyt0OeDyX5ue7+6yQXJPnl4XuumtN/BQDg\nYd8ATMqxSc7r7s8nSXd/sap+MMnTh/VvSfLKJdu/u7tvT3JZVR26wu94T3ffmuTWqropyaFJjkvy\nyCQfq6ok+Y4kNw3bfz3Jnw3vL0ry+H36TwYAK6TEAbA/u3XJ+9qH
z2zP7P8rK7Ozfacvs/03+psP\nXd2xPQCsGZdTAjAlf5HkmVX1XUlSVYck+eskJw7rn5Pkw3v4GbckOWgvv/fCJM+oqvvu+N6qOmIN\nvgcA9kiJA2AyuvuTSV6R5C+r6u+TvDrJi5K8oKouSfK8zO6TuyOXJNk+TETy0hV+72WZ3Yv3/uF7\nPpDZ/XZ35B1JfrmqPmFiEwDmqb55BQgAAACLzpk4AACACVHiAAAAJkSJAwAAmBAlDgAAYEKUOAAA\ngAlR4gAAACZEiQMAAJiQ/w3G3Mq4qTvOqwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Diagrama de barras de lado a lado del DataFrame de arriba\n",
"\n",
"drinks.groupby('continent').beer_servings.mean().plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3EAAAFcCAYAAABm2TeNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYbFV57/HvD1Ccp3AkyOBBgxJwwHhEI5orzooGNEbA\niagJ3hsH0EQvGG80A7kkxtmrBhVERRGniGKMSJxR8YAogxKIYICg4BDFmeG9f+xdnDpNn6mrunev\n6u/nefrp6l3dp979nO6q+u211rtSVUiSJEmS2rDV0AVIkiRJkjafIU6SJEmSGmKIkyRJkqSGGOIk\nSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhhjiJEmSJKkhhjhJkiRJasg2QxcAsN1229Xq1auH\nLkOSJEmSBnHmmWd+v6pWbc73LosQt3r1atauXTt0GZIkSZI0iCTf2dzvdTqlJEmSJDXEECdJkiRJ\nDTHESZIkSVJDDHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ0xxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQ\nJ0mSJEkNMcRJkiRJUkO2GbqAhVp9xClL9liXHL3fkj2WJEmSJG2MI3GSJEmS1BBDnCRJkiQ1xBAn\nSZIkSQ0xxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmS\nJDXEECdJkiRJDTHESZIkSVJDDHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ0xxEmSJElSQwxxkiRJktQQ\nQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmSJDVkkyEuyc5JPp3k/CTnJTmsP/6K\nJJcnObv/eOzYzxyZ5KIkFyR51GKegCRJkiStJNtsxvdcC/xZVZ2V5NbAmUlO7e97TVX94/g3J9kD\nOAjYE7gT8Kkkd6uq66ZZuCRJkiStRJsciauqK6rqrP721cA3gR038iP7AydW1a+q6mLgImDvaRQr\nSZIkSSvdFq2JS7IauA/wlf7Q85N8I8mxSW7fH9sRuHTsxy5jntCX5NAka5Osveqqq7a4cEmSJEla\niTY7xCW5FfBB4PCq+gnwZuAuwF7AFcCrtuSBq+qYqlpTVWtWrVq1JT8qSZIkSSvWZoW4JDehC3An\nVNWHAKrqe1V1XVVdD7yVdVMmLwd2HvvxnfpjkiRJkqQJbU53ygBvB75ZVa8eO77D2Lc9ATi3v30y\ncFCSbZPsCuwGnDG9kiVJkiRp5dqc7pT7AE8Hzklydn/spcDBSfYCCrgEeA5AVZ2X5CTgfLrOls+1\nM6UkSZIkTccmQ1xVfQHIPHd9fCM/cxRw1AR1SZIkSZLmsUXdKSVJkiRJwzLESZIkSVJDDHGSJEmS\n1BBDnCRJkiQ1xBAnSZIkSQ0xxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkMM\ncZIkSZLUEEOcJEmSJDXEECdJkiRJDTHESZIkSVJDDHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ0xxEmS\nJElSQwxxkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmSJDXEECdJkiRJ\nDTHESZIkSVJDDHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ0xxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQ\nJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmSJDXEECdJkiRJDdlkiEuyc5JPJzk/yXlJDuuP3yHJ\nqUku7D/ffuxnjkxyUZILkjxqMU9AkiRJklaSzRmJuxb4s6raA3gA8NwkewBHAKdV1W7Aaf3X9Pcd\nBOwJPBp4U5KtF6N4SZIkSVppNhniquqKqjqrv3018E1gR2B/4Pj+244HDuhv7w+cWFW/qqqLgYuA\nvadduCRJkiStRFu0Ji7JauA+wFeA7avqiv6u
7wLb97d3BC4d+7HL+mNz/61Dk6xNsvaqq67awrIl\nSZIkaWXa7BCX5FbAB4HDq+on4/dVVQG1JQ9cVcdU1ZqqWrNq1aot+VFJkiRJWrE2K8QluQldgDuh\nqj7UH/5ekh36+3cAruyPXw7sPPbjO/XHJEmSJEkT2pzulAHeDnyzql49dtfJwCH97UOAj4wdPyjJ\ntkl2BXYDzpheyZIkSZK0cm2zGd+zD/B04JwkZ/fHXgocDZyU5NnAd4AnA1TVeUlOAs6n62z53Kq6\nbuqVS5IkSdIKtMkQV1VfALKBux+2gZ85CjhqgrokSZIkSfPYou6UkiRJkqRhGeIkSZIkqSGGOEmS\nJElqiCFOkiRJkhpiiJMkSZKkhhjiJEmSJKkhhjhJkiRJaoghTpIkSZIaYoiTJEmSpIYY4iRJkiSp\nIYY4SZIkSWqIIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhhji\nJEmSJKkhhjhJkiRJaoghTpIkSZIaYoiTJEmSpIYY4iRJkiSpIYY4SZIkSWqIIU6SJEmSGmKIkyRJ\nkqSGGOIkSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhhjiJEmSJKkhhjhJkiRJaoghTpIkSZIa\nYoiTJEmSpIYY4iRJkiSpIYY4SZIkSWrINkMXoHVWH3HKkjzOJUfvtySPI0mSJGn6HImTJEmSpIYY\n4iRJkiSpIZsMcUmOTXJlknPHjr0iyeVJzu4/Hjt235FJLkpyQZJHLVbhkiRJkrQSbc5I3DuAR89z\n/DVVtVf/8XGAJHsABwF79j/zpiRbT6tYSZIkSVrpNhniqupzwA8389/bHzixqn5VVRcDFwF7T1Cf\nJEmSJGnMJGvinp/kG/10y9v3x3YELh37nsv6YzeS5NAka5OsveqqqyYoQ5IkSZJWjoWGuDcDdwH2\nAq4AXrWl/0BVHVNVa6pqzapVqxZYhiRJkiStLAsKcVX1vaq6rqquB97KuimTlwM7j33rTv0xSZIk\nSdIULCjEJdlh7MsnAKPOlScDByXZNsmuwG7AGZOVKEmSJEka2WZT35DkvcBDgO2SXAa8HHhIkr2A\nAi4BngNQVeclOQk4H7gWeG5VXbc4pUuSJEnSyrPJEFdVB89z+O0b+f6jgKMmKUqSJEmSNL9JulNK\nkiRJkpaYIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhmxynzhp\noVYfccqSPM4lR++3JI8jSZIkLQeOxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJ\nUkMMcZIkSZLUEEOcJEmSJDXEECdJkiRJDTHESZIkSVJDDHGSJEmS1JBthi5AasXqI05Zkse55Oj9\nluRxJEmS1CZH4iRJkiSpIYY4SZIkSWqIIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElqiCFO\nkiRJkhpiiJMkSZKkhhjiJEmSJKkhhjhJkiRJaoghTpIkSZIaYoiTJEmSpIYY4iRJkiSpIYY4SZIk\nSWqIIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElqyCZDXJJjk1yZ5NyxY3dIcmqSC/vPtx+7\n78gkFyW5IMmjFqtwSZIkSVqJNmck7h3Ao+ccOwI4rap2A07rvybJHsBBwJ79z7wpydZTq1aSJEmS\nVrhNhriq+hzwwzmH9weO728fDxwwdvzEqvpVVV0MXATsPaVaJUmSJGnFW+iauO2r6or+9neB7fvb\nOwKXjn3fZf0xSZIkSdIUTNzYpKoKqC39uSSHJlmbZO1VV101aRmSJEmStCIsNMR9L8kOAP3nK/vj\nlwM7j33fTv2xG6mqY6pqTVWtWbVq1QLLkCRJkqSVZaEh7mTgkP72IcBHxo4flGTbJLsCuwFnTFai\nJEmSJGlk
m019Q5L3Ag8BtktyGfBy4GjgpCTPBr4DPBmgqs5LchJwPnAt8Nyqum6RapckSZKkFWeT\nIa6qDt7AXQ/bwPcfBRw1SVGSJEmSpPlN3NhEkiRJkrR0DHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ0x\nxEmSJElSQwxxkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmSJDXEECdJ\nkiRJDTHESZIkSVJDDHGSJEmS1BBDnCRJkiQ1xBAnSZIkSQ3ZZugCJA1n9RGnLMnjXHL0fkvyOJIk\nSSuBI3GSJEmS1BBH4iTNDEcWJUnSSuBInCRJkiQ1xBAnSZIkSQ0xxEmSJElSQwxxkiRJktQQQ5wk\nSZIkNcQQJ0mSJEkNMcRJkiRJUkMMcZIkSZLUEEOcJEmSJDXEECdJkiRJDTHESZIkSVJDthm6AEnS\n/FYfccqSPM4lR++3JI8jSZKmw5E4SZIkSWqIIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElq\niCFOkiRJkhpiiJMkSZKkhky0T1ySS4CrgeuAa6tqTZI7AO8DVgOXAE+uqh9NVqYkSZIkCaYzErdv\nVe1VVWv6r48ATquq3YDT+q8lSZIkSVOwGNMp9weO728fDxywCI8hSZIkSSvSpCGugE8lOTPJof2x\n7avqiv72d4Ht5/vBJIcmWZtk7VVXXTVhGZIkSZK0Mky0Jg54UFVdnuSOwKlJvjV+Z1VVkprvB6vq\nGOAYgDVr1sz7PZIkSZKk9U00EldVl/efrwQ+DOwNfC/JDgD95ysnLVKSJEmS1FlwiEtyyyS3Ht0G\nHgmcC5wMHNJ/2yHARyYtUpIkSZLUmWQ65fbAh5OM/p33VNUnknwVOCnJs4HvAE+evExJkiRJEkwQ\n4qrq28C95zn+A+BhkxQlSZIkSZrfYmwxIEmSJElaJIY4SZIkSWqIIU6SJEmSGmKIkyRJkqSGGOIk\nSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhhjiJEmSJKkhhjhJkiRJaoghTpIkSZIaYoiTJEmS\npIYY4iRJkiSpIYY4SZIkSWrINkMXIElaOVYfccqSPM4lR++3JI8jSdIQHImTJEmSpIYY4iRJkiSp\nIYY4SZIkSWqIIU6SJEmSGmKIkyRJkqSGGOIkSZIkqSGGOEmSJElqiCFOkiRJkhpiiJMkSZKkhhji\nJEmSJKkhhjhJkiRJasg2QxcgSVKrVh9xypI8ziVH77ckjyNJaoMjcZIkSZLUEEOcJEmSJDXEECdJ\nkiRJDTHESZIkSVJDDHGSJEmS1BC7U0qSJMBum5LUCkfiJEmSJKkhhjhJkiRJaoghTpIkSZIa4po4\nSZI0s2Ztnd+snY+khVm0kbgkj05yQZKLkhyxWI8jSZIkSSvJoozEJdka+H/AI4DLgK8mObmqzl+M\nx5MkSVJ7HFmUFmaxplPuDVxUVd8GSHIisD9giJMkSdJMWqpQCgbTlW6xQtyOwKVjX18G3H+RHkuS\nJEnSIpi10dJZOZ9U1fT/0eRJwKOr6o/7r58O3L+qnjf2PYcCh/Zf3h24YOqF3Nh2wPeX4HGW0qyd\nk+ez/M3aOXk+y9+snZPns/zN2jl5PsvfrJ2T57Mwd66qVZvzjYs1Enc5sPPY1zv1x25QVccAxyzS\n488rydqqWrOUj7nYZu2cPJ/lb9bOyfNZ/mbtnDyf5W/WzsnzWf5m7Zw8n8W3WN0pvwrslmTXJDcF\nDgJOXqTHkiRJkqQVY1FG4qrq2iTPA/4V2Bo4tqrOW4zHkiRJkqSVZNE2+66qjwMfX6x/f4GWdPrm\nEpm1c/J8lr9ZOyfPZ/mbtXPyfJa/WTsnz2f5m7Vz8nwW2aI0NpEkSZIkLY7FWhMnSZIkSVoEhjhJ\nkiRJaoghTpIkaUYk2SrJk4euQ9LiMsQ1Jsnzk9x+6Dq0aUluMXQN0+Lv3f
KWZOskd0qyy+hj6Jom\n0Z/Pp4euY5qSPD6Jr7ladFV1PfCSoeuYpj6YPnDoOrRxSX5j6BqmKcnWQ9ewMTP/gpJktyQfSHJ+\nkm+PPoauawLbA19NclKSRyfJ0AVNIsmqJC9NckySY0cfQ9c1iSQPTHI+8K3+63snedPAZU1qZn7v\n0nlakr/sv94lyd5D17VQSZ4PfA84FTil//jYoEVNqKquA65Pctuha5miA4ELk/xDkt2HLmZSSfZJ\ncmqSf+9fVy9u8bU1yRf6z1cn+cnYx9VJfjJ0fRP4VJI/T7JzkjuMPoYuaqH6YPr/hq5j2pJ8KMl+\nM3SB58tJ3p/ksS2/TxhzYZJXJtlj6ELmM/PdKfsn6JcDrwEeDzwT2Kqq/nLQwibQ/2E8ku5c1gAn\nAW+vqv8YtLAFSHI68HngTOC60fGq+uBgRU0oyVeAJwEnV9V9+mPnVtU9hq1sMrPye5fkzcD1wEOr\n6rf7EcZPVtX9Bi5tQZJcBNy/qn4wdC3TlOQjwH3owunPRser6gWDFTWhJLcBDqb7GyrgOOC9VXX1\noIUtQJJvAS/kxs/dM/V72KokF89zuKrqLktezJQk+UfgS8CHakbevCZ5ON3zwQOA9wPHVdUFw1a1\ncP37hIcDzwLuR/c+4R1V9e+DFrZASW4NHESfHYBjgROrallc4FkJIe7MqrpvknOq6p7jx4aubRJJ\n7k33S/Vo4NN0TwCnVlVTUyiSnF1Vew1dxzQl+UpV3T/J18ZC3Ner6t5D1zapWfi9S3JWVf3OrPz/\n9NMOH1FV1w5dyzQlOWS+41V1/FLXMk39dKOnA4cD3wR+C3h9Vb1h0MK20Oh5bug6JpXkiVX1of72\n7avqR0PXpPkluRq4JXAt8EsgdMH0NoMWNgX9rIODgb8ALgXeCry7qq4ZtLAJJNkXeDfd/9nXgSOq\n6kvDVrVwSf4H8B7gdsAHgL+pqouGrGnRNvteRn7VD1NfmOR5wOXArQauacGSHAY8A/g+8DbgxVV1\nzegcaW8e/MeSPLbfHH5WXNrP3a8kNwEOo3uz1qwZ+727pp/nXtBN6aUbmWvVt4HPJDkF+NXoYFW9\neriSJldVxye5KXC3/tAFjb+h2R/4I7rQ9k5g76q6Mt3a2fOBpkIc8OkkrwQ+xPq/d2cNV9KCvIzu\nHABOA35nwFqmpv+9ehGwS1UdmmQ34O5V1exU66q69dA1LIb+ws7T6C7ufA04AXgQcAjwkOEq23Jz\nzuV7wPOBk4G96EYadx2uui3Xv1fYj+7i9WrgVXT/Pw8GPs6616dBrIQQdxhwC+AFwN8AD6X7w2jV\nHYAnVtV3xg9W1fVJHjdQTZM4DHhpkl8DozdorV9Z+5/A64Ad6S4afBJ47qAVTW6Wfu9eD3wYuGOS\no+imvr5s2JIm8p/9x037j5mQ5CHA8cAldFfcd05ySFV9bsi6JvAE4DVz66+qnyd59kA1TWI0Crdm\n7FjRvca2JBu43brj6Ka6jpqBXE73Jrq5EJdk96r6VpJ5A3aDFw5ukOTDwN2BdwGPr6or+rvel2Tt\ncJUt2JfozuWAqrps7PjaJG8ZqKZJXEg36+iVVXX62PEPJPm9gWq6wcxPp5wVm1qQXFU/XKpatDL1\nV6S2Z+ziT1X953AVLVzfWOJhdG/aTquqpkdKAZLcCqCqfjp0LdOQ5EzgKaP1IUnuRrd+rLmp8P3f\nzqeqat+ha5mGfgT+SVV10tC1TKpf23cw3XqXdwNPYSzMtRoQkqytqjWzMG08yTH9aOJ8HWurqlq7\ncHCDJPtW1Ux04u2f5/6hqv5s6FqmJcmtlvNr6syPxCVZQzfH+M6s/+bzXoMVtTBn0l3lDLAL8KP+\n9u3orsI3NUQ9LsnvA6MrGp9peboHQJJd6aYQrGb937nfH6qmSfVTkV9BNz1iNPWwgKb+jvoXmfOq\nanf67qGtS3IPuiufd+i//j7wjKo6b9
DCJneT8QX+VfXv/fTk5lTVdUmuT3Lbqvrx0PVMqh+Bfwld\n04LWXQGMph5/d+w2tDmyOPLrJDdn3bTxuzI27bUlVXVo/3kmLoJAtxZzvtsjo3WaLemf52ZtG4i/\nTPK3wC+AT9C953lhVb172LI6Mx/i6Oauvhg4h4bXvVTVrgBJ3gp8eLSGLMljgAOGrG0SSY6m62B0\nQn/osCT7VNWRA5Y1qX8G3g58lIZ/5+Y4nG49RdOd5/oXmQuS7NLqKOI8jgFeNLqa209DfCvrplG1\nam2St9GNjgA8FWhxetHIT4FzksxKt81PJflz4H2sfz5NzQqZpWAwx8vp3nTunOQEYB+6NZnN6i/i\n/C/GLvoC/9ToWtnHb+S+Yt06zdacneRkuqm7488LrZ7PI6vqJUmeQDe1/4nA51j3ujSomZ9OmeQL\nVfWgoeuYlvEumxs71ook3wD26veAGY2UfK3BkdIbzErXtnGz1AExyefoWtefwfovMk2OlM43RarV\naVPjkmxLt5Z09Pz9eeBNVdXkaMKsdducxRb2I6Ppe0PXMam+ycQD6GbtfLmqvj9wSRPpL+rchG6t\nLHTNM66rqj8eriqNS3LcPIerqp615MVMQZLzqmrP/nfvA1X1ieX0+roSQtzD6Oa7n8b6HbSavCqQ\n5F/p3syMX53+vap61HBVLVwf4h4yunrbr/37TOMh7inAbnQNTVru2kaSF/U396RbfN18B8S+TfCN\nVNVnl7qWaegXxp9FN6USus5g962qJwxX1WT6iznvrKqnDl3LNPXT23aphveBWglG25AMXcdCbKj5\nx0iLr0Mjs3TBKsnTqurdY6+x62nxtXUW9bPFDqCbTrk33RKmjy2XC/UrYTrlM4Hd6a7ejK/laTLE\n0QXSl9N11yu6Yd2DB61oMv8X+Fo/0hO6aRJHDFvSxO5Jd4Xwoaz/O9fi2opRS+eZ6YDYaljbiGcB\nf8W657TP98ea1U97vXOSm1bVr4euZxqSPB74R7q/n12T7AX8dcMjwDPXwn7MlUMXMIFXbeS+Vl+H\nRq5Lcteq+g+AJHdhbKP5xtyy/zxT2yYk2Yluu5R9+kOfBw6b06myGVV1RJJ/AH7cvy79DNh/6LpG\nVsJI3AVVdfeh65i2JLesqp9t+juXvyQ70K2LAzijqr47ZD2TSnIRsMesvPkcl+Q2dFMjrh66loVK\nt2Hs6InvpnQXeH7W+LYWMyfJO4HfpttjaHzaa5NXqPtumw+lm2kw6hZ4blXdY9jKFibJ++gabj2j\nqu7Rh7rTq2qvgUvTjOpnVh1Htzdm6BrWPXNWujvOgn7N73tYf2bIU6vqEcNVNZm+edgewM1Gx6rq\nncNVtM5KGIk7PckeVXX+0IVMQ9/55210G5bvkuTewHOq6k+HrWzLzLPvy+gqzZ2S3KnlKR/AuXRD\n7i1fzV1P3+X1OPqrhkl+DDyrqs4ctLAFqLENY5OE7qraA4araGGSvLaqDk/yUdaF0hu0OsIz5j/6\nj62YjavV11TVj7tfuRu03PjorlV1YJKD4Yb97prbZ23W/o7m63Q4rtWlJABVddpoxLc/dEGra2RH\nZrCb9aqqGl8X944khw9WzYSSvJxuw/U96Db3fgzwBcAQt0QeQNct52K6tTyhG0lodc3Va4BH0V2d\npqq+nmWw4eACvAg4lPmnfrQ+5eN2wLeSfJX114+1+qQMcCzwp1X1eYAkD6ILda3+HQHdEwHwz/0T\ndWvTeEdXOv9x0CoWQb8m7tZV9edD1zJF5/XrZbfu34i+ADh9Ez+znM1KC/tZ+zuaua6HG3mPc/8k\nVNXnlrSg6Zq1btY/SPI04L391wcDLXe1fhJwb7qGe89Msj3LpDMlrIwQ9+ihC5i2qrp0zgXP5uaE\nj3X+ekxV/XL8viQ3m+dHWvLyoQtYBNeNAhxAVX0hSZOdKudcqd4KWAP8cgPfvmyNjYLuVVWvG78v\nyW
FAs2v/+rUH+2z6O5vyfLo9S39FN93oX4G/GbSiybyCG7ewf+agFS3A2N/RWuAXczolbztYYQtU\nVc39H2yGF89zbLRP6c7A1ktbzlT9sqpeP3QRU/QsujVxr6H7Pzqdtre2+EW/L+a1/XKSK+l+55aF\nmQ9xVfUdgCR3ZGw+a8Mu7adUVr9nymHANweuaRKnA3O7ac13rBkz2DgD4LNJ/onu6loBBwKfGU2H\nbWz66/iV6mvp9n5ZNguVF+AQ4HVzjv3RPMdaM2v7De1XVX9BF+QASPKHdOfXnKr6ZL/Ob9TC/rDG\nW9ifBjycbj8/gJvTdRhuar/FWex6WFXrjS72F3heRrc5+/MHKWp6XtfPBGm+m3Vvp7mzjvr/r0sH\nqmdSa5Pcjm7v1TPpnh++NGxJ68x8iEvy+3RT9u5El6DvTBd69hyyrgn8T7o3ZzsCl9P94T930IoW\nIMlv0p3DzZPch+5NAMBtgFsMVtgE0u9JOKdxBqybwtty44xRC+e5o4z3ob3pr2+rqi+OH+hfZJpa\nw9ivRXoKXafDk8fuujXQ1IbLG3Azumk4479bTU4H6x3JjQPbfMeakOS0qnoY3bYjc4+16GZVNQpw\nVNVP+2YtrZnJrodwQ2OT/0P3PPB3VXXqwCVNwyx1s4ZuFG7uRfj5jjVhrN/EW5J8ArhNVX1jyJrG\nzXyIo5uu8gDgU1V1nyT70nXLaVJ/pXMW9k56FN1owU50IXsU4n4CvHSgmiZS/aby440zZkVV7Tt0\nDVM0Ky8ypwNXANux/trSq4Fl8yKzULMyLSzJY4DHAjsmGZ82dRu6keCm9NPdbwFsl+T2rH8BbsfB\nCpvcz5L8zmgEpG/m9IuBa9piVfVP/VTQn1TVa4auZxqS7Ec3gv1j4GVV9YWBS5qmPwTu0no36yS/\nSzdqvWrOKPBtaHi66/iFqaq6ZO6xoa2EEHdNVf0gyVZJtqqqTyd57dBFLdSsdDKqquOB45P8QVV9\ncOh6pqV/8TyvqnYfupZp619I92T9Nrt/PVxFW2bWXmT6qeLfAX536FoWwwztN/RfdOutfp9uOs7I\n1cALB6loMs8BDqeb3XIm61+Ae+NQRU3B4cD7k/xX//UOdNPGm9OvKT2Ybl3SLPgoXQfrHwAvSfKS\n8Ttbe/8zx6x0s74pXdf0bVh/FPgndM1BmtLKxaqVEOL+O8mt6DbFPiHJlYytr2jQrHUyum9/VeO/\nAfo/lj+rqpcNXNeC9C+eFyTZpar+c+h6piXJW+ie0Pal2+LiScAZgxa15WbqRWYkyQPows5v053j\n1szGvnf8UBdsAAAOmklEQVTH0TUA+cP+66f1x5rab6iqvg58Pcl7quoauOF5bueq+tGw1W25vonO\n65I8v6reMHQ9k0pyP+DSqvpqkt3pQuoT6Zq2XDxocZP5YpI3Au9j/TWlLa61mqWZIHPNRDfrvhfA\nZ5O8Y6wXxVbArarqJ8NWtyBNXKxaCZt935JuSsRWdNMQbwucUFVNtjxN8pWquv/QdUxLkq+NNr4d\nO3ZWVbU2te0GST5Ht1bsDNZ/8WzqSXlckm9U1b3GPt8K+JeqevDQtW2pJHcevcjMgiRrgYPo1lat\nAZ4B3K2qjhy0sAklOXvuxtHzHWtFks/QjcZtQ/em4Eq6zbFbHI0bNWX5RFVdneRldNOR/7a1kJDk\nLODhVfXDvpX9iXSzXfYCfruqmrzAk2S+DbCrqlpda7We8amvLUvyP+Y73mqDtCTvoevdcB3wVbqR\nq9dV1SsHLWyBlvvFqpkeieuntn2sX89zPXD8wCVNw6x1Mto6ybajDTv7fYeaa+sMkOS3gO3pFl6P\nezDd2qWWjVrw/zzJneimtewwYD2T2DbJMdx4SnKzb26q6qIkW1fVdcBxSb5G1zSjZbO239Btq+on\nSf4YeGdVvTxJy2sX/09VvT/dnpEPB14JvBlo7SLj1lU1agR0IHBM
P8X/g0nOHrCuiczYOub5vI32\n1jHfSKthbSP26J/nngr8C93+q2fSPT80p6re0HeEX8367xfc7Hux9VPbrk9y26r68dD1TMmsdTI6\nATgtyXF0w9V/RLth+7XAkVV1zvjBJD8E/o5uGmyrPtq32X0lcBbd79xbhy1pwd4PvIXuTUBzeyzO\n4+dJbkrXkv8f6C4YbDVwTdMw335DLTc72SbJDsCTGdtmoGGjv5396ILPKUn+dsiCFmjrJNtU1bXA\nw4BDx+5r9j1Skm2BP+DGbz6bWce8Cdn0tyx/Mzgd/ib99lcHAG+sqmuSNDvlL8m7gLsCZ7PuOa8A\nQ9wS+SlwTpJTWX9q2wuGK2kiM9HJaKSq/j7J1+mu5BbdBrh3HraqBdt+boADqKpzkqxe+nKmo5/X\nPlq3+MEkH6Nrx93qhZFrq+rNQxcxRU+ne+F/Hl2jjJ3p3rw1rZ/y2uwU5Hn8Nd3z2xf69Vd3AS4c\nuKZJXJ5u78hHAH/fh4YWLx68l24tz/fpll58Hm6YWdHqcxzAR+jqP5OxWTsz5K+GLmBK3sg80+EH\nrWgy/0S39+rXgc8luTPdOrJWraEbXVyWQXQlrIk7ZL7jfXfE5iT5Z+DQqmq9k9EN+n3inkIXUC8G\nPlhVy2bh6OZKcmFV7baB+y6qqt9a6pqmZb61i61K8gq69UgfZv0pybOwt1rzkryB9fdZXE+rF+CS\n/Eara7Hn0++h9mjgnKq6sB9lvGdVfXLg0rZYPxqyA/DJqvpZf+xudE0ZmlyqkOTcqrrH0HVMW5J7\ncePRxVb3jiTJ2qpaM1pv3h+bmddbgLGR7uYkeT/wgqpalktiZn4krqqOT7Kqv33V0PVMwUx0Mupf\nIA/uP75P10Erjc/jX5vkT6pqvWmG/RqYMzfwM604LckfAB9arlektsDows6Lx44VcJcBaplYksfR\n7Yd5Z7rn9NY3l187dvuvuPEG8636cr/G6ji6pkBN/x1V1c/7bs8PohtRvJZGRxar6svzHPv3IWqZ\notOT3HO+2SGtSnIscC/gPNZfTtJsiGNGpsMneVpVvXvO9j3jXr2kBU3PdsD5Sc5gGb7nntmRuCSh\ne/F/Ht0fROheZN7Q8pzwWelklOR6umkrz66qi/pj366qJt9IAyTZnm5059esC21r6Oa5P6GqvjtU\nbZNKcjVwS7q/oV/SflCYGUkuomuJfk7rwWCuWboi3b8mPZxurd/9gJOAd7QaFvoGW2uAu1fV3fqG\nR++vqn028aNaREnOpQs42wC7Ad+me/M5es6+14DlTSTJ+VW1x9B1TFM/3fB7dO8TXkjXQf1No/dF\nrUjynOo2mp/3oltVNTn9dbm/557lEPci4DF0Uw8v7o/dha571ieqqtlNMPuwcL/+yzNanFqZ5AC6\neeD70O3HcyLwtqraddDCpiDJvsBoGst5VfVvQ9aj9fXTwF4E7FJVhybZje6N6McGLm1B+lbiD6uq\nWdg3cj2tbzeyIf1zxLvpLox8HTiiqr40bFVbph9VvA9w1ihoj08J0zCS/Ihue4R5tby9SpK3A6+q\nqvOHrmVSmbG9ZDWMWQ5xXwMeUVXfn3N8Fd289yav7iZ5Ml2HwM/QXVl7MPDiqvrAkHUtVLp9/Pan\nm1b5ULqOPx9ucV3FLEsy3xvpHwPfaW2ue5L30Y2UPqOq7tGHutOr3f3H7kc3nfKzrD/do9XpKzeY\npRCX5DfoNix/Ot2V97cDJ9O94X5/axewkpxRVXuP/o/65/IvGeKGNUt/M3P1oyInA9+l8dHF8f+n\nJB+sqqabUSV5/cbub3gt87LuHjrLa+JuMjfAQbcurm9/2qq/AO43Gn3rQ+mngCZDXL+I/D3Ae5Lc\nnq65yf+m2wdPy8eb6PbkGa2vuCdwLnDbJP+rsdB916o6MMnBcMPanpbbVR9F14X3ZnQvMk3rp+6O\nri7eIsmos1nrU3i/BLwLOKCq
Lhs7vjbJWwaqaRIn9d0pb5fkT+imiba67cgsueNG1iW1fnHn7XQX\nQc5h3Zq4Vo2/5jS7jGTM+Lr/WVrLvKy7h85yiNtYC/6W2/NvNWf65A9ocBHsfKrqR8Ax/YeWl/+i\nW794HkCSPehapr+EblF5SyHu1+k2lS+AJHel7Rbcd5qlLnRVdeuha1gkd9/QmsWq+vulLmahkhxO\nt2ffa4F96dqH3x34y6o6dcjaBHQjBbdiRvZRm+Oqqjp56CKmpDZwu0njHd+THN5qB/j5VNVFSbau\nquuA4/qZfkcOXRfMdoi799gV3HGhu2Ldqk8k+Ve6vW0ADgQ+PmA9WhnuNgpwAFV1fpLdq+rbDQ5i\nvYJuHebOSU6gW5f5R0MWNKGPJ3lkY6OhK0aSk8du3+j+5dLlbAvsRBfgdqcbEfkiXahrvQPvrLii\n5eZtm/C1JO8BPsr6U8db7E45eo8a4OYzNOMAZiCUjlnW3UNndk3crOk3Ht2+qr6Y5Il0bZ0B/hs4\noar+Y7jqNOv6dWQ/pGtAA93Fg+3oprZ8oarut6GfXY769UkPoHvB/PJ8U69bMdY59FfANczGm4CZ\nkeQq4FK6C29fYc4IyXLpcral+jc2a4AHAr/bf/z3rHUPbM0sdXSdK8lx8xyuqnrWkhejDZqldZnL\nvXuoIa4RST4GHDl3z5ck9wT+rqoeP0xlWgn66Yd/yrqLB1+kWyf3S+AWVfXToWrbUkk+SrcO8+R+\nTaa0aJJsDTyCrnnTvYBTgPeOj2y3KMlt6YLbPv3n29Ftc/HMQQtb4ZLcoap+OHQdWlnmrmUGfj66\niwYvKrbSPdQQ14gkX93QaEeSc6rqnktdk1aW/sr73emeqC+oqmsGLmlB+g5nBwL7AV+lG138WFX9\nctDCtlA/nfVbG+gcSlWdtdQ1aeOSbEsX5l4J/FVVvXHgkrZYkmOAPYGr6UYWv0w3mv2jQQvTzEry\nBjYyRa/VzodavlrpHjrLa+Jmze02ct/Nl6wKrUhJHgIcD1xCd2Vt5ySHVNXnhqxrIfrpa5/tR0ge\nCvwJcCzQ1JVCur3uDgVeNc99RXduWgb68LYfXYBbDbwe+PCQNU1gF2Bb4ELgcuAyumn90mJZO3QB\nWnGa6B7qSFwjkrwX+Leqeuuc439Mtx/egcNUppUgyZnAU6rqgv7ru9FNCbvvsJUtTD899PF0I3K/\nQzcS9/xhq9IsSvJO4B50DahOrKpzBy5pYv2WHHvSrYd7IN35/ZBun7hZaS0uaYWaMxK3bNf4GeIa\nkWR7uiu3v2ZdF7A1dIstn1BV3x2qNs2+JN+Yu6HqfMdakOQkYG+6DpXvAz5bVU3vOZTkgXQjPDfM\nrqiqdw5WkG6Q5HpgtPZy/AW3ybUi45LsRLcm7oHA44DfqKqNzRqRtliS11bV4f165hu9aW2ww6uW\nuSTX0T1vh26227Jc42eIa0ySfemuegKcV1X/NmQ9WhmSHEu3ueq7+0NPBbZusStYkkcBn+r3fGle\nkncBdwXOBkbnVK4T0WJI8gLWjcBdQ7e9wOjjnNYviGj5SXLfqjqzX898I612eJUmZYiTtEn9mp7n\nsq475efp2uw2s0l2vzXHBjW61xBJvgnssaGNpKVpSvJq+r3hquqKoevR7GulU6C01AxxklaEsT2G\n7kg3ijAaxd6X7g3p4wYpbEJJ3g+8wDfUkmZRK50CpaVmd0pJG5TkpKp6cpJzmH8tQjNr4kb7VyU5\nlW7k6or+6x2AdwxY2qS2A85Pcgbdht/QTafcf8CaJGlamugUKC01Q5ykjTms/9zkKNUG7DRn1Op7\nwJ2HKmYKXjF2O8CDgYOGKUWSpq42cFta0ZxOKWmLJNkO+EGra7CSvBHYDXhvf+hA4MKWG4EkuQ/w\nFOAPgYuBD1XVG4atSpIm10qnQGmpORInaYOSPAA4mm4PqL8B3kU3fW+rJM+oqk8MWd9CVNXzkj
wB\n+L3+0OnAbw5Y0oL0e/Ud3H98n267hFTVvoMWJklTVFVbD12DtBwZ4iRtzBuBlwK3pWsE8piq+nKS\n3elGspoLcb1L6JqbjEauPjhoNQvzLbouoY+rqosAkrxw2JIkSdJSMMRJ2phtquqTAEn+uqq+DFBV\n30qy8Z9cZmZw5OqJdGvfPp3kE8CJrN8AQJIkzaithi5A0rI2vnHvL+bc19qauG8BD6UbuXpQv2as\n2Q2/q+qfq+ogYHfg08DhwB2TvDnJI4etTpIkLSYbm0jaoE0sKL9ZVd1kqNq2VJID6Eau9qGbBnoi\n8Laq2nXQwqYoye3ppogeWFUPG7oeSZK0OAxxklaUJLcE9qebVvlQ4J3Ah0fTRiVJkpY7Q5ykFcuR\nK0mS1CJDnCRJkiQ1xMYmkiRJktQQQ5wkSZIkNcQQJ0mSJEkNMcRJkiRJUkP+PzqoF2UFKhhZAAAA\nAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# bar plot of the 'value_counts' for the 'genre' Series\n",
"movies.genre.value_counts().plot(kind='bar')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manejando Valores perdidos o Missing Values\n",
"\n",
"¿Qué significa \"NaN\"? \n",
"\n",
"- \"NaN\" no es una cadena, sino que es un valor especial: __numpy.nan__. \n",
"- Representa \"Not a number\" e indica un valor faltante. \n",
"- __read_csv__ detecta los valores perdidos (de forma predeterminada) al leer el archivo y los reemplaza con este valor especial."
]
},
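{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the default `read_csv` behaviour described above, the snippet below parses a small in-memory CSV instead of a file. The CSV text and the names `csv_text` and `sample` are assumptions made for illustration; they are not part of the UFO dataset.\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical CSV text: one empty field and one field holding the marker NA.\n",
"csv_text = '''City,Shape Reported\n",
"Ithaca,TRIANGLE\n",
",DISK\n",
"Holyoke,NA\n",
"'''\n",
"\n",
"# With default settings, read_csv treats empty fields and markers such as NA as missing.\n",
"sample = pd.read_csv(io.StringIO(csv_text))\n",
"\n",
"# isnull() marks every missing entry with True.\n",
"print(sample.isnull())\n",
"\n",
"# pd.isna tests a single value for missingness.\n",
"print(pd.isna(sample.loc[1, 'City']))\n",
"```\n",
"\n",
"Both the empty field and the `NA` marker should be flagged as missing, even though only one of them was literally empty in the text."
]
},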
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
"
\n",
"
\n",
"
City
\n",
"
Colors Reported
\n",
"
Shape Reported
\n",
"
State
\n",
"
Time
\n",
"
\n",
" \n",
" \n",
"
\n",
"
18236
\n",
"
Grant Park
\n",
"
NaN
\n",
"
TRIANGLE
\n",
"
IL
\n",
"
12/31/2000 23:00
\n",
"
\n",
"
\n",
"
18237
\n",
"
Spirit Lake
\n",
"
NaN
\n",
"
DISK
\n",
"
IA
\n",
"
12/31/2000 23:00
\n",
"
\n",
"
\n",
"
18238
\n",
"
Eagle River
\n",
"
NaN
\n",
"
NaN
\n",
"
WI
\n",
"
12/31/2000 23:45
\n",
"
\n",
"
\n",
"
18239
\n",
"
Eagle River
\n",
"
RED
\n",
"
LIGHT
\n",
"
WI
\n",
"
12/31/2000 23:45
\n",
"
\n",
"
\n",
"
18240
\n",
"
Ybor
\n",
"
NaN
\n",
"
OVAL
\n",
"
FL
\n",
"
12/31/2000 23:59
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" City Colors Reported Shape Reported State Time\n",
"18236 Grant Park NaN TRIANGLE IL 12/31/2000 23:00\n",
"18237 Spirit Lake NaN DISK IA 12/31/2000 23:00\n",
"18238 Eagle River NaN NaN WI 12/31/2000 23:45\n",
"18239 Eagle River RED LIGHT WI 12/31/2000 23:45\n",
"18240 Ybor NaN OVAL FL 12/31/2000 23:59"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Leyendo el dataset de reportes de avistamientos en un dataframe\n",
"ufo = pd.read_csv('../data/ufo.csv')\n",
"ufo.tail()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"18236 True\n",
"18237 True\n",
"18238 True\n",
"18239 False\n",
"18240 True\n",
"Name: Colors Reported, dtype: bool"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Si color reported es null retornara True\n",
"ufo['Colors Reported'].isnull().tail()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"18236 False\n",
"18237 False\n",
"18238 False\n",
"18239 True\n",
"18240 False\n",
"Name: Colors Reported, dtype: bool"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Caso contrario retornara False con notnull()\n",
"ufo['Colors Reported'].notnull().tail()"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
"
\n",
"
\n",
"
City
\n",
"
Colors Reported
\n",
"
Shape Reported
\n",
"
State
\n",
"
Time
\n",
"
\n",
" \n",
" \n",
"
\n",
"
21
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
LA
\n",
"
8/15/1943 0:00
\n",
"
\n",
"
\n",
"
22
\n",
"
NaN
\n",
"
NaN
\n",
"
LIGHT
\n",
"
LA
\n",
"
8/15/1943 0:00
\n",
"
\n",
"
\n",
"
204
\n",
"
NaN
\n",
"
NaN
\n",
"
DISK
\n",
"
CA
\n",
"
7/15/1952 12:30
\n",
"
\n",
"
\n",
"
241
\n",
"
NaN
\n",
"
BLUE
\n",
"
DISK
\n",
"
MT
\n",
"
7/4/1953 14:00
\n",
"
\n",
"
\n",
"
613
\n",
"
NaN
\n",
"
NaN
\n",
"
DISK
\n",
"
NV
\n",
"
7/1/1960 12:00
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" City Colors Reported Shape Reported State Time\n",
"21 NaN NaN NaN LA 8/15/1943 0:00\n",
"22 NaN NaN LIGHT LA 8/15/1943 0:00\n",
"204 NaN NaN DISK CA 7/15/1952 12:30\n",
"241 NaN BLUE DISK MT 7/4/1953 14:00\n",
"613 NaN NaN DISK NV 7/1/1960 12:00"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Nos devuelve el dataframe con las columnas vacias de City\n",
"ufo[ufo.City.isnull()].head()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18241, 5)"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Devuelve el numero de filas y columnas\n",
"ufo.shape"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2486, 5)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Si faltan 'algun (any)' valor en una fila entonces elimina esa fila esa fila\n",
"ufo.dropna(how='any').shape"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18241, 5)"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Si faltan todos(all) los valores en una fila, entonces elimina esa fila (no se eliminan en este caso)\n",
"ufo.dropna(how='all').shape"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(15576, 5)"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Si falta algun valor en una fila (teniendo en cuenta sólo 'City' y 'Shape Reported'), entonces se elimina esa fila\n",
"ufo.dropna(subset=['City', 'Shape Reported'], how='any').shape"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18237, 5)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Si \"all\" los valores estan faltantes en una filla (considerando solo 'City' y 'Shape Reported') entonces elimina esa fila\n",
"ufo.dropna(subset=['City', 'Shape Reported'], how='all').shape"
]
},
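{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `how` and `subset` is easier to check by hand on a tiny table. The following is a sketch on hypothetical data (the `sightings` rows are invented for illustration), not a rerun of the cells above.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: row 1 lacks a shape, row 2 lacks city and shape, row 3 lacks a city.\n",
"sightings = pd.DataFrame({\n",
"    'City': ['Ithaca', 'Holyoke', np.nan, np.nan],\n",
"    'Shape Reported': ['DISK', np.nan, np.nan, 'OVAL'],\n",
"    'State': ['NY', 'CO', 'KS', 'TX'],\n",
"})\n",
"\n",
"# how='any' keeps only the rows with no missing value at all.\n",
"any_missing = sightings.dropna(how='any')\n",
"# how='all' drops nothing here because every row has a State.\n",
"all_missing = sightings.dropna(how='all')\n",
"# subset restricts the check to the listed columns.\n",
"subset_any = sightings.dropna(subset=['City', 'Shape Reported'], how='any')\n",
"subset_all = sightings.dropna(subset=['City', 'Shape Reported'], how='all')\n",
"\n",
"print(any_missing.shape, all_missing.shape, subset_any.shape, subset_all.shape)\n",
"\n",
"# dropna returns a new DataFrame, so the original still has all four rows.\n",
"print(sightings.shape)\n",
"```\n",
"\n",
"Only the row that lacks both `City` and `Shape Reported` disappears with `how='all'` on the subset, which mirrors the small difference seen with the real data."
]
},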
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LIGHT 2803\n",
"DISK 2122\n",
"TRIANGLE 1889\n",
"OTHER 1402\n",
"CIRCLE 1365\n",
"Name: Shape Reported, dtype: int64"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 'value_counts' no incluye missing values por defecto\n",
"ufo['Shape Reported'].value_counts().head()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LIGHT 2803\n",
"NaN 2644\n",
"DISK 2122\n",
"TRIANGLE 1889\n",
"OTHER 1402\n",
"Name: Shape Reported, dtype: int64"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Incluye explícitamente los missing values\n",
"ufo['Shape Reported'].value_counts(dropna=False).head()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Rellenar los valores faltantes con un valor especificado\n",
"ufo['Shape Reported'].fillna(value='VARIOUS', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"VARIOUS 2977\n",
"LIGHT 2803\n",
"DISK 2122\n",
"TRIANGLE 1889\n",
"OTHER 1402\n",
"Name: Shape Reported, dtype: int64"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Confirm that the missing values were filled\n",
"ufo['Shape Reported'].value_counts().head()"
]
},
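{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how `how` and `subset` interact in `dropna`, the example below uses a small, made-up DataFrame (`toy` and its values are illustrative assumptions, not part of the UFO dataset) so the row counts can be checked by hand.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data; the column names only mirror the UFO columns\n",
"toy = pd.DataFrame({\n",
"    'City': ['Ithaca', np.nan, np.nan, 'Holyoke'],\n",
"    'Shape Reported': ['TRIANGLE', np.nan, 'DISK', np.nan],\n",
"    'State': ['NY', 'NJ', 'CO', np.nan],\n",
"})\n",
"\n",
"# Drop a row if ANY of its values is missing: only row 0 is complete\n",
"rows_any = toy.dropna(how='any').shape[0]  # 1\n",
"\n",
"# Drop a row only if ALL of its values are missing: no row qualifies\n",
"rows_all = toy.dropna(how='all').shape[0]  # 4\n",
"\n",
"# Same rules, but only 'City' and 'Shape Reported' are inspected\n",
"rows_subset_any = toy.dropna(subset=['City', 'Shape Reported'], how='any').shape[0]  # 1\n",
"rows_subset_all = toy.dropna(subset=['City', 'Shape Reported'], how='all').shape[0]  # 3\n",
"```\n",
"\n",
"With `subset`, row 1 is dropped under `how='all'` even though its 'State' value is present, because 'State' is not among the inspected columns."
]
},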
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Slicing\n",
"\n",
"The __loc__ indexer selects rows and columns by label. You can pass:\n",
"\n",
"- A single label\n",
"- A list of labels\n",
"- A slice of labels (both endpoints are included)\n",
"- A boolean Series\n",
"- A colon (which means \"all labels\")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Type of Crime Crime Year Count\n",
"0 Alabama Violent Crime Murder and nonnegligent Manslaughter 1960 406\n",
"1 Alabama Violent Crime Murder and nonnegligent Manslaughter 1961 427\n",
"2 Alabama Violent Crime Murder and nonnegligent Manslaughter 1962 316\n",
"3 Alabama Violent Crime Murder and nonnegligent Manslaughter 1963 340\n",
"4 Alabama Violent Crime Murder and nonnegligent Manslaughter 1964 316"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the US crime dataset\n",
"crime_data = pd.read_csv('../data/crime.csv')\n",
"crime_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"State Alabama\n",
"Type of Crime Violent Crime\n",
"Crime Murder and nonnegligent Manslaughter\n",
"Year 1960\n",
"Count 406\n",
"Name: 0, dtype: object"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Row 0, all columns\n",
"crime_data.loc[0, :]"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Type of Crime Crime Year Count\n",
"0 Alabama Violent Crime Murder and nonnegligent Manslaughter 1960 406\n",
"1 Alabama Violent Crime Murder and nonnegligent Manslaughter 1961 427\n",
"2 Alabama Violent Crime Murder and nonnegligent Manslaughter 1962 316"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows 0, 1 and 2, all columns\n",
"crime_data.loc[[0, 1, 2], :]"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Type of Crime Crime Year Count\n",
"0 Alabama Violent Crime Murder and nonnegligent Manslaughter 1960 406\n",
"1 Alabama Violent Crime Murder and nonnegligent Manslaughter 1961 427\n",
"2 Alabama Violent Crime Murder and nonnegligent Manslaughter 1962 316\n",
"3 Alabama Violent Crime Murder and nonnegligent Manslaughter 1963 340\n",
"4 Alabama Violent Crime Murder and nonnegligent Manslaughter 1964 316\n",
"5 Alabama Violent Crime Murder and nonnegligent Manslaughter 1965 395\n",
"6 Alabama Violent Crime Murder and nonnegligent Manslaughter 1966 384\n",
"7 Alabama Violent Crime Murder and nonnegligent Manslaughter 1967 415\n",
"8 Alabama Violent Crime Murder and nonnegligent Manslaughter 1968 421"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows 0 through 8 (inclusive), all columns\n",
"crime_data.loc[0:8 , :]"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Type of Crime Crime Year Count\n",
"0 Alabama Violent Crime Murder and nonnegligent Manslaughter 1960 406\n",
"1 Alabama Violent Crime Murder and nonnegligent Manslaughter 1961 427\n",
"2 Alabama Violent Crime Murder and nonnegligent Manslaughter 1962 316\n",
"3 Alabama Violent Crime Murder and nonnegligent Manslaughter 1963 340"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows 0 through 3 (inclusive), all columns\n",
"crime_data.loc[0:3]"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Count\n",
"0 Alabama 406\n",
"1 Alabama 427\n",
"2 Alabama 316\n",
"3 Alabama 340\n",
"4 Alabama 316"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows 0 through 4 (inclusive), columns 'State' and 'Count'\n",
"crime_data.loc[0:4, ['State', 'Count']]"
]
},
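{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above cover single labels, lists, and slices, but not the boolean Series option. A minimal sketch of that case follows, using a small made-up DataFrame (`toy`, its values, and the threshold of 100 are assumptions for illustration only).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data shaped like a few columns of the crime dataset\n",
"toy = pd.DataFrame({\n",
"    'State': ['Alabama', 'Alabama', 'Alaska', 'Alaska'],\n",
"    'Year': [1960, 1961, 1960, 1961],\n",
"    'Count': [406, 427, 10, 12],\n",
"})\n",
"\n",
"# A boolean Series aligned with the index selects the rows where it is True\n",
"mask = toy['Count'] > 100\n",
"high = toy.loc[mask, ['State', 'Count']]  # rows 0 and 1\n",
"\n",
"# A label slice includes both endpoints, so 0:2 returns three rows\n",
"first_three = toy.loc[0:2, 'Year']\n",
"```\n",
"\n",
"Passing the mask to `loc` also lets you pick columns in the same call, which plain `toy[mask]` does not."
]
},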
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The __iloc__ indexer selects rows and columns by integer position. You can pass:\n",
"\n",
"- A single integer position\n",
"- A list of integer positions\n",
"- A slice of integer positions (the end position is excluded)\n",
"- A colon (which means \"all integer positions\")"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Type of Crime Crime Year Count\n",
"0 Alabama Violent Crime Murder and nonnegligent Manslaughter 1960 406\n",
"1 Alabama Violent Crime Murder and nonnegligent Manslaughter 1961 427"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows at positions 0 up to 2 (exclusive), all columns\n",
"crime_data.iloc[0:2, :]"
]
},
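{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the remaining `iloc` forms concrete, and to contrast its exclusive slices with the inclusive slices of `loc`, here is a minimal sketch on a made-up DataFrame with a string index (`toy` and its labels are assumptions for illustration).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with string labels so positions and labels differ\n",
"toy = pd.DataFrame(\n",
"    {'a': [10, 20, 30, 40], 'b': [1, 2, 3, 4]},\n",
"    index=['w', 'x', 'y', 'z'],\n",
")\n",
"\n",
"single = toy.iloc[0, 1]          # one cell: first row, second column\n",
"picked = toy.iloc[[0, 2], :]     # a list of positions: rows 'w' and 'y'\n",
"by_pos = toy.iloc[0:2, :]        # position slice, end excluded: 'w', 'x'\n",
"by_label = toy.loc['w':'y', :]   # label slice, end included: 'w', 'x', 'y'\n",
"```\n",
"\n",
"The two slices look alike but return two and three rows respectively, which is the most common source of off-by-one mistakes when switching between the indexers."
]
},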
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The __ix__ indexer selects rows and columns by label or by integer position, and should only be used when you need to mix label-based and integer-based selection in the same call. Note that `ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so the cells that use it only run on older versions; in current pandas, use `loc` and `iloc` instead.\n",
"\n",
"Rules for using numbers with ix:\n",
"\n",
"- If the index contains strings, numbers are treated as integer positions, so slices are exclusive on the right.\n",
"- If the index contains integers, numbers are treated as labels, so slices are inclusive."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" beer_servings spirit_servings wine_servings \\\n",
"country \n",
"Afghanistan 0 0 0 \n",
"Albania 89 132 54 \n",
"Algeria 25 0 14 \n",
"Andorra 245 138 312 \n",
"Angola 217 57 45 \n",
"Antigua & Barbuda 102 128 45 \n",
"Argentina 193 25 221 \n",
"Armenia 21 179 11 \n",
"Australia 261 72 212 \n",
"Austria 279 75 191 \n",
"\n",
" total_litres_of_pure_alcohol continent \n",
"country \n",
"Afghanistan 0.0 Asia \n",
"Albania 4.9 Europe \n",
"Algeria 0.7 Africa \n",
"Andorra 12.4 Europe \n",
"Angola 5.9 Africa \n",
"Antigua & Barbuda 4.9 North America \n",
"Argentina 8.3 South America \n",
"Armenia 3.8 Europe \n",
"Australia 10.4 Oceania \n",
"Austria 9.7 Europe "
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the alcohol consumption dataset into a DataFrame and set 'country' as the index\n",
"drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country')\n",
"drinks.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"89"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Row with label 'Albania', column at position 0\n",
"drinks.ix['Albania', 0]"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"89"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Row at position 1, column with label 'beer_servings'\n",
"drinks.ix[1, 'beer_servings']"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" beer_servings spirit_servings\n",
"country \n",
"Albania 89 132\n",
"Algeria 25 0\n",
"Andorra 245 138"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows 'Albania' through 'Andorra' (inclusive), columns at positions 0 up to 2 (exclusive)\n",
"drinks.ix['Albania':'Andorra', 0:2]"
]
},
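{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three mixed selections above can be written without `ix`. The sketch below is one possible translation to `loc` and `iloc`, run on a small hand-typed subset of the drinks data (`toy` is an assumption standing in for `drinks`; the values are copied from the first rows shown earlier).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the first rows of the drinks DataFrame\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        'beer_servings': [0, 89, 25, 245],\n",
"        'spirit_servings': [0, 132, 0, 138],\n",
"        'wine_servings': [0, 54, 14, 312],\n",
"    },\n",
"    index=pd.Index(['Afghanistan', 'Albania', 'Algeria', 'Andorra'], name='country'),\n",
")\n",
"\n",
"# Label row + positional column: turn the position into a label\n",
"a = toy.loc['Albania', toy.columns[0]]\n",
"\n",
"# Positional row + label column: turn the label into a position\n",
"b = toy.iloc[1, toy.columns.get_loc('beer_servings')]\n",
"\n",
"# Label slice of rows + positional slice of columns\n",
"c = toy.loc['Albania':'Andorra', toy.columns[0:2]]\n",
"```\n",
"\n",
"The idea is to convert one side so that both axes use the same kind of selector: `columns[...]` maps positions to labels, and `columns.get_loc(...)` maps a label to its position."
]
},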
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__So... what are the differences between loc, iloc, and ix? Let's see...__\n",
"\n",
"Let's turn to the old reliable to answer this question. Searching online is good practice, since nobody can memorize this many methods... ;)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"http://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Apply\n",
"\n",
"We can apply a function to each element of a Series:"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def alcoholics(x):\n",
"    # Label a value: above 10 is 'alcoholics!', otherwise 'Sober people'\n",
"    if x > 10:\n",
"        return 'alcoholics!'\n",
"    else:\n",
"        return 'Sober people'"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" beer_servings spirit_servings wine_servings \\\n",
"country \n",
"Afghanistan 0 0 0 \n",
"Albania 89 132 54 \n",
"Algeria 25 0 14 \n",
"Andorra 245 138 312 \n",
"Angola 217 57 45 \n",
"Antigua & Barbuda 102 128 45 \n",
"Argentina 193 25 221 \n",
"Armenia 21 179 11 \n",
"Australia 261 72 212 \n",
"Austria 279 75 191 \n",
"\n",
" total_litres_of_pure_alcohol continent Kind of people \n",
"country \n",
"Afghanistan 0.0 Asia Sober people \n",
"Albania 4.9 Europe Sober people \n",
"Algeria 0.7 Africa Sober people \n",
"Andorra 12.4 Europe alcoholics! \n",
"Angola 5.9 Africa Sober people \n",
"Antigua & Barbuda 4.9 North America Sober people \n",
"Argentina 8.3 South America Sober people \n",
"Armenia 3.8 Europe Sober people \n",
"Australia 10.4 Oceania alcoholics! \n",
"Austria 9.7 Europe Sober people "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks['Kind of people'] = drinks['total_litres_of_pure_alcohol'].apply(alcoholics)\n",
"drinks.head(10)"
]
},
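{
"cell_type": "markdown",
"metadata": {},
"source": [
"`apply` calls the Python function once per element. For a simple threshold like this one, a vectorized alternative is `numpy.where`; this is a different technique from the one used above, shown only for comparison. The sketch below uses a made-up Series (`litres`) and a hypothetical helper name (`label_drinkers`) and checks that both approaches agree.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample of litres of pure alcohol per country\n",
"litres = pd.Series(\n",
"    [0.0, 4.9, 12.4, 10.4],\n",
"    index=['Afghanistan', 'Albania', 'Andorra', 'Australia'],\n",
")\n",
"\n",
"def label_drinkers(x):\n",
"    # Same rule as the function above: strictly greater than 10\n",
"    return 'alcoholics!' if x > 10 else 'Sober people'\n",
"\n",
"# Element-wise: the function runs once per value\n",
"via_apply = litres.apply(label_drinkers)\n",
"\n",
"# Vectorized: the comparison runs on the whole Series at once\n",
"via_where = pd.Series(\n",
"    np.where(litres > 10, 'alcoholics!', 'Sober people'),\n",
"    index=litres.index,\n",
")\n",
"```\n",
"\n",
"`apply` stays the more flexible choice when the rule does not reduce to a simple condition."
]
},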
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating Dummy Variables\n",
"\n",
"In general:\n",
"\n",
"- If a categorical feature has __\"K\" possible values__, you only need __\"K-1\" dummy variables__ to capture all the information about that feature.\n",
"- One convention is to __drop the first dummy variable__, which makes that level the \"baseline\"."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the Titanic passenger dataset\n",
"titanic = pd.read_csv('../data/titanic.csv')\n",
"titanic.head()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked Sex_male \n",
"0 0 A/5 21171 7.2500 NaN S 1 \n",
"1 0 PC 17599 71.2833 C85 C 0 \n",
"2 0 STON/O2. 3101282 7.9250 NaN S 0 \n",
"3 0 113803 53.1000 C123 S 0 \n",
"4 0 373450 8.0500 NaN S 1 "
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create the 'Sex_male' dummy variable using the 'map' method\n",
"titanic['Sex_male'] = titanic.Sex.map({'female':0, 'male':1})\n",
"titanic.head()"
]
},
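{
"cell_type": "markdown",
"metadata": {},
"source": [
"`map` works well for a two-level feature. For features with more levels, `pd.get_dummies` with `drop_first=True` applies the K-1 convention automatically. The sketch below is a minimal illustration on a made-up DataFrame (`toy` and its values are assumptions; only the column names mirror the Titanic data).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with a 2-level and a 3-level categorical feature\n",
"toy = pd.DataFrame({\n",
"    'Sex': ['male', 'female', 'female', 'male'],\n",
"    'Embarked': ['S', 'C', 'Q', 'S'],\n",
"})\n",
"\n",
"# K=2 levels -> 1 dummy column; 'female' (first alphabetically) is the baseline\n",
"sex_dummy = pd.get_dummies(toy['Sex'], prefix='Sex', drop_first=True, dtype=int)\n",
"\n",
"# K=3 levels -> 2 dummy columns; 'C' is dropped and becomes the baseline\n",
"embarked_dummies = pd.get_dummies(\n",
"    toy['Embarked'], prefix='Embarked', drop_first=True, dtype=int\n",
")\n",
"```\n",
"\n",
"Passing `dtype=int` keeps the result as 0/1 integers, matching the output of the `map` approach; without it, recent pandas versions return boolean columns."
]
},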
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"