{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mini tutorial de NumPy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Esta notebook fue creada originalmente como un blog post por [Raúl E. López Briega](http://relopezbriega.github.io) para el [sitio de capacitaciones de IAAR](https://iaarhub.github.io/). El contenido esta bajo la licencia BSD.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"[NumPy](http://www.numpy.org) es el principal paquete para la computación científica con [Python](https://www.python.org/)\n",
"\n",
"Este mini tutorial esta orientado a una introducción de Python para la Ciencia de Datos. ***Pueden contactarme [aquí](http://relopezbriega.github.io/) para cualquier tipo de sugerencias!***. Basado en la cheat sheet de [JulianGaal](https://github.com/juliangaal/python-cheat-sheet/blob/master/NumPy/NumPy.md)\n",
"\n",
"## Índice\n",
"1. [Numpy básico](#basics)\n",
" - [Placeholders](#place)\n",
" - [Ejemplos](#ex)\n",
"2. [Arrays](#arrays)\n",
" - [Propiedades](#props)\n",
" - [Copiando/Ordenando](#gops)\n",
" * [Ejemplos](#array-Ejemplo)\n",
" - [Manipulación de Arrays](#man)\n",
" * [Agregando/Quitando Elementos](#addrem)\n",
" * [Combinando Arrays](#comb)\n",
" * [Dividiendo Arrays](#split)\n",
" * [Mas](#more)\n",
"3. [Matemáticas](#maths)\n",
" - [Operaciones aritméticas](#ops)\n",
" * [Ejemplos](#operations-Ejemplos)\n",
" - [Comparaciones](#comparison)\n",
" * [Ejemplos](#comparison-Ejemplo)\n",
" - [Estadística básica](#stats)\n",
" - [Mas funciones](#more_func)\n",
"4. [Slicing y Subsetting](#ss)\n",
" - [Ejemplos](#exp)\n",
"5. [Trucos](#Trucos)\n",
"6. [Créditos](#creds)\n",
"\n",
"\n",
"## Numpy básico \n",
"\n",
"Una de las funciones más utilizadas de [NumPy](http://www.numpy.org) son los *arreglos o arrays*: La principal diferencia entre las *listas de python* y los *arrays de numpy* esta dada por la velocidad y las funcionalidades adicionales que poseen estas últimas. Las *listas* solo nos dan operaciones básicas, pero los *arrays de numpy* nos agregan FFTs, convoluciones, búsquedas rápidas, estadística, álgebra lineal, histogramas, entre muchas otras cosas.\n",
"La más importante ventaja que poseen los *arrays de numpy* para la ciencia de datos, es la habilidad de hacer cálculos a nivel de los elementos. \n",
"\n",
"`eje 0` siempre se refiere a una fila \n",
"\n",
"`eje 1` siempre se refiere a una columna\n",
"\n",
"| Operador | Descripción | Documentación |\n",
"| :------------- | :------------- | :--------|\n",
"|`np.array([1,2,3])`|1d array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html#numpy.array)|\n",
"|`np.array([(1,2,3),(4,5,6)])`|2d array|ver arriba|\n",
"|`np.arange(start,stop,step)`|array desde un rango|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html)|\n",
"\n",
"### Placeholders \n",
"| Operador | Descripción |Documentación|\n",
"| :------------- | :------------- |:---------- |\n",
"|`np.linspace(0,2,9)`|Agrega valores equidistantes desde el intervalo hasta el largo del array |[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)|\n",
"|`np.zeros((1,2))`|Crea un array de ceros|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html)|\n",
"|`np.ones((1,2))`|Crea un array de unos|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones)|\n",
"|`np.random.random((5,5))`|Crea un array de números aleatorios|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.random.html)|\n",
"|`np.empty((2,2))`|Crea un array vacío|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html)|\n",
"\n",
"### Ejemplos "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Convencion para importar numpy \n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([1, 2, 3]), array([[1, 2, 3],\n",
" [4, 5, 6]]))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 1 dimension\n",
"x = np.array([1,2,3])\n",
"# 2 dimensiones\n",
"y = np.array([(1,2,3),(4,5,6)])\n",
"x, y"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# con int\n",
"x = np.arange(3)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0., 1., 2.])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# con float\n",
"y = np.arange(3.0)\n",
"y"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 4, 5, 6])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# rango\n",
"x = np.arange(3,7)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 5])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# rango con intervalo\n",
"y = np.arange(3,7,2)\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Arrays \n",
"\n",
"### Propiedades de los Arrays \n",
"|Sintaxis|Descripción|Documentación|\n",
"|:-------------|:-------------|:-----------|\n",
"|`array.shape`|Dimensiones (Filas,Columnas)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html)|\n",
"|`len(array)`|Largo de Arrays|[link](https://docs.python.org/3.5/library/functions.html#len)|\n",
"|`array.ndim`|Numero de dimensiones de Array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ndim.html)|\n",
"|`array.size`|Números de Elementos de Array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html)|\n",
"|`array.dtype`|Tipo de Datos|[link](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)|\n",
"|`array.astype(type)`|Convertir tipo de datos|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html)|\n",
"|`type(array)`|Tipo de Array|[link](https://docs.scipy.org/doc/numpy/user/basics.types.html)|\n",
"\n",
"### Copiando/Ordenando \n",
"| Operador | Descripciones | Documentación |\n",
"| :------------- | :------------- | :----------- |\n",
"|`np.copy(array)`|Crea una copia del array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html)|\n",
"|`other = array.copy()`|Crea una copia profunda del array|ver arriba|\n",
"|`array.sort()`|Ordena un Array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html)|\n",
"|`array.sort(axis=0)`|Ordena los ejes del Array|ver arriba|\n",
"\n",
"#### Ejemplos "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Ordenar. Ordenar en orden ascendente\n",
"y = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])\n",
"y.sort()\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manipulación de Arrays\n",
"\n",
"### Agregando o quitando elementos \n",
"|Operador|Descripción|Documentación|\n",
"|:-----------|:--------|:---------|\n",
"|`np.append(a,b)`|Agrega items al array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html)|\n",
"|`np.insert(array, 1, 2, axis)`|Inserta items al arrays en ejes 0 o 1|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.insert.html)|\n",
"|`array.resize((2,4))`|Redimensiona el array a la forma(2,4)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.resize.html)|\n",
"|`np.delete(array,1,axis)`|Elimina items del array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html)|\n",
"\n",
"### Combinando Arrays \n",
"|Operador|Descripción|Documentación|\n",
"|:---------|:-------|:---------|\n",
"|`np.concatenate((a,b),axis=0)`|Concatena 2 arrays, agrega al final|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)|\n",
"|`np.vstack((a,b))`|Apila array a nivel filas|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html)|\n",
"|`np.hstack((a,b))`|Apila array a nivel columna|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html#numpy.hstack)|\n",
"\n",
"### Dividiendo Arrays \n",
"|Operador|Descripción|Documentación|\n",
"|:---------|:-------|:------|\n",
"|`numpy.split()`||[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html)|\n",
"|`np.array_split(array, 3)`|Divide un array en sub-arrays de (casi) idéntico tamaño|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html#numpy.array_split)|\n",
"|`numpy.hsplit(array, 3)`|Divide el array en forma horizontal en el 3er índice |[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.hsplit.html#numpy.hsplit)|\n",
"\n",
"\n",
"### Mas \n",
"|Operador|Descripción|Documentación|\n",
"|:--------|:--------|:--------|\n",
"|`other = ndarray.flatten()`|Aplana un array de 2d a una de 1d|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html)|\n",
"|`array = np.transpose(other)` `array.T` |Transpone el array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)|\n",
"\n",
"\n",
"## Matemáticas \n",
"\n",
"### Operaciones aritméticas \n",
"| Operador | Descripción |Documentación|\n",
"| :------------- | :------------- |:---------|\n",
"|`np.add(x,y)`|Adición|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.add.html)|\n",
"|`np.substract(x,y)`|Substracción|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.subtract.html#numpy.subtract)|\n",
"|`np.divide(x,y)`|División|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.divide.html#numpy.divide)|\n",
"|`np.multiply(x,y)`|Multiplicación|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html#numpy.multiply)|\n",
"|`np.sqrt(x)`|Raíz cuadrada|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sqrt.html#numpy.sqrt)|\n",
"|`np.sin(x)`|Seno a nivel elemento|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sin.html#numpy.sin)|\n",
"|`np.cos(x)`|Coseno a nivel elemento|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cos.html#numpy.cos)|\n",
"|`np.log(x)`|Logaritmo natural a nivel elementos|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html#numpy.log)|\n",
"|`np.dot(x,y)`|Producto escalar|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html)|\n",
"\n",
"**Recordar:** Las operaciones con arrays de NumPy funcionan a nivel elemento.\n",
"\n",
"#### Ejemplo "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[2 4 6]\n",
" [5 7 9]]\n"
]
}
],
"source": [
"# Si un array de 1d es sumada a otra de 2d, Numpy elije \n",
"# la array con dimensión más pequeña y la suma con la de\n",
"# dimensión más grande\n",
"a = np.array([1, 2, 3])\n",
"b = np.array([(1, 2, 3), (4, 5, 6)])\n",
"print(np.add(a, b))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Comparación \n",
"| Operador | Descripción | Documentación |\n",
"| :------------- | :------------- |:---------|\n",
"|`==`|Igual a|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`!=`|No igual a|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`<`|Menor que|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`>`|Mayor que|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`<=`|Menor o igual que|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`>=`|Mayor o igual que|[link](https://docs.python.org/2/library/stdtypes.html)|\n",
"|`np.array_equal(x,y)`|Comparación a nivel elemento|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html)|\n",
"\n",
"#### Ejemplo "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ True True True True True False False False False False]\n"
]
}
],
"source": [
"# Utilizando operadores de comparación creará un array de tipo booleano.\n",
"z = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n",
"c = z < 6\n",
"print(c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estadística básica \n",
"| Operador | Descripción | Documentación |\n",
"| :------------- | :------------- |:--------- |\n",
"|`array.mean()``np.mean(array)`|Media aritmética|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean)|\n",
"|`np.median(array)`|Mediana|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html#numpy.median)|\n",
"|`array.corrcoef()`|Coeficiente de correlación|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html#numpy.corrcoef)|\n",
"|`array.std(array)`|Desvío estándar|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html#numpy.std)|\n",
"\n",
"### Mas funciones\n",
"| Operador | Descripción | Documentación |\n",
"| :------------- | :------------- |:--------- |\n",
"|`array.sum()`|Suma a nivel elementos|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html)|\n",
"|`array.min()`|Minimiza a nivel elementos|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.min.html)|\n",
"|`array.max(axis=0)`|Máximo valor de un determinado eje|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html)|\n",
"|`array.cumsum(axis=0)`|Suma acumulada en un eje específico|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html)|\n",
"\n",
"\n",
"## Slicing y Subsetting \n",
"|Operador|Descripción|Documentación|\n",
"| :------------- | :------------- | :------------- |\n",
"|`array[i]`|array 1d al índice i|[link](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)|\n",
"|`array[i,j]`|array 2d al index[i][j]|ver arriba|\n",
"|`array[i<4]`|índice booleano, ver [Trucos](#Trucos)|ver arriba|\n",
"|`array[0:3]`|Selecciona items de indice 0, 1 y 2|ver arriba|\n",
"|`array[0:2,1]`|Selecciona items de filas 0 y 1 de la columna 1|ver arriba|\n",
"|`array[:1]`|Selecciona items de fila 0 (igual a array[0:1, :])|ver arriba|\n",
"|`array[1:2, :]`|Selecciona items de fila 1|ver arriba|\n",
"|`array[1,...]`|Igual a array[1,:,:]|ver arriba|\n",
"|`array[ : :-1]`|Reversa el `array`|ver arriba|\n",
"\n",
"\n",
"#### Ejemplos "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3]\n",
"[3 6]\n",
"[1 2 3]\n",
"[3]\n",
"[1 4]\n",
"[[4 5]]\n"
]
}
],
"source": [
"# Seleccionando elementos.\n",
"b = np.array([(1, 2, 3), (4, 5, 6)])\n",
"\n",
"# El índice *antes* de la coma refiere a filas,\n",
"# el índice *después* de la coma refiere a columnas.\n",
"print(b[0:1, 2])\n",
"print(b[:len(b), 2])\n",
"print(b[0, :])\n",
"print(b[0, 2:])\n",
"print(b[:, 0])\n",
"\n",
"c = np.array([(1, 2, 3), (4, 5, 6)])\n",
"d = c[1:2, 0:2]\n",
"print(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Trucos \n",
"\n",
"Esta es una lista de ejemplos en progreso. Si conocen un buen truco, no duden en comentar para que sea incluido en el tutorial. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[4]\n",
"[1 2 3 6 1 1]\n"
]
}
],
"source": [
"# Truco de indices cuando trabajamos con 2 arrays\n",
"a = np.array([1,2,3,6,1,4,1])\n",
"b = np.array([5,6,7,8,3,1,2])\n",
"\n",
"# Solo guardar a con indice dónde b == 1\n",
"other_a = a[b == 1]\n",
"print(other_a)\n",
"\n",
"# Guardar todos las las posiciones excepto aquella en que b==1\n",
"other_other_a = a[b != 1]\n",
"print(other_other_a)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[6 8 6 9]\n",
"[1 2 3 4 4]\n"
]
}
],
"source": [
"# Otra forma de trabajar con índices\n",
"x = np.array([4,6,8,1,2,6,9])\n",
"y = x > 5\n",
"print(x[y])\n",
"\n",
"# Más compacta\n",
"x = np.array([1, 2, 3, 4, 4, 35, 212, 5, 5, 6])\n",
"print(x[x < 5])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Créditos \n",
"[Datacamp](https://www.datacamp.com/home),\n",
"[Quandl](https://s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf) y [Documentación oficial](https://docs.scipy.org/doc/numpy/)\n",
"\n",
"*Este post fue escrito utilizando IPython notebook. Pueden descargar este [notebook](https://github.com/IAARhub/iaar_template/tree/master/content/notebooks/numpy_cheat_sheet.ipynb) o ver su version estática en [nbviewer](http://nbviewer.ipython.org/github/IAARhub/iaar_template/tree/master/content/notebooks/numpy_cheat_sheet.ipynb).*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}