{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy mini tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*This notebook was originally written as a blog post by [Raúl E. López Briega](http://relopezbriega.github.io) for the [IAAR training site](https://iaarhub.github.io/). The content is released under the BSD license.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[NumPy](http://www.numpy.org) is the main package for scientific computing with [Python](https://www.python.org/).\n", "\n", "This mini tutorial is meant as an introduction to Python for data science. ***You can contact me [here](http://relopezbriega.github.io/) with any suggestions!*** Based on the cheat sheet by [JulianGaal](https://github.com/juliangaal/python-cheat-sheet/blob/master/NumPy/NumPy.md).\n", "\n", "## Contents\n", "1. [NumPy basics](#basics)\n", " - [Placeholders](#place)\n", " - [Examples](#ex)\n", "2. [Arrays](#arrays)\n", " - [Properties](#props)\n", " - [Copying/Sorting](#gops)\n", " * [Examples](#array-Ejemplo)\n", " - [Array manipulation](#man)\n", " * [Adding/Removing elements](#addrem)\n", " * [Combining arrays](#comb)\n", " * [Splitting arrays](#split)\n", " * [More](#more)\n", "3. [Mathematics](#maths)\n", " - [Arithmetic operations](#ops)\n", " * [Examples](#operations-Ejemplos)\n", " - [Comparisons](#comparison)\n", " * [Examples](#comparison-Ejemplo)\n", " - [Basic statistics](#stats)\n", " - [More functions](#more_func)\n", "4. [Slicing and subsetting](#ss)\n", " - [Examples](#exp)\n", "5. [Tricks](#Trucos)\n", "6. 
[Credits](#creds)\n", "\n", "\n", "## NumPy basics \n", "\n", "The most widely used feature of [NumPy](http://www.numpy.org) is the *array*. The main differences between *Python lists* and *NumPy arrays* are the speed and the additional functionality that arrays offer. *Lists* only provide basic operations, whereas *NumPy arrays* add FFTs, convolutions, fast searching, statistics, linear algebra, and histograms, among many other things.
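The contrast between lists and arrays can be sketched as follows. This is a minimal illustration rather than part of the original tutorial: the variable names and sample values are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
prices = [10.0, 20.0, 30.0]

# With a plain list, scaling every element needs an explicit comprehension.
scaled_list = [p * 1.5 for p in prices]

# With a NumPy array, the same arithmetic is applied to every element at once.
scaled_array = np.array(prices) * 1.5

# Arrays also expose numerical helpers directly, for example the mean.
average = np.array(prices).mean()

print(scaled_list)
print(scaled_array)
print(average)
```

Both approaches give the same numbers here; the array version simply avoids the explicit loop.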
\n", "The most important advantage of *NumPy arrays* for data science is the ability to perform element-wise calculations. \n", "\n", "In a 2-D array, `axis 0` indexes the rows \n", "\n", "In a 2-D array, `axis 1` indexes the columns\n", "\n", "| Operator | Description | Documentation |\n", "| :------------- | :------------- | :--------|\n", "|`np.array([1,2,3])`|1-D array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html#numpy.array)|\n", "|`np.array([(1,2,3),(4,5,6)])`|2-D array|see above|\n", "|`np.arange(start,stop,step)`|Array built from a range (`stop` is excluded)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html)|\n", "\n", "### Placeholders \n", "| Operator | Description |Documentation|\n", "| :------------- | :------------- |:---------- |\n", "|`np.linspace(0,2,9)`|Creates 9 evenly spaced values from 0 to 2, both endpoints included|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)|\n", "|`np.zeros((1,2))`|Creates an array of zeros|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html)|\n", "|`np.ones((1,2))`|Creates an array of ones|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones)|\n", "|`np.random.random((5,5))`|Creates an array of random numbers in the interval [0, 1)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.random.html)|\n", "|`np.empty((2,2))`|Creates an uninitialized array (its values are arbitrary)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html)|\n", "\n", "### Examples " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Conventional way to import numpy \n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1, 2, 3]), array([[1, 2, 3],\n", " [4, 5, 6]]))" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1 
dimension\n", "x = np.array([1,2,3])\n", "# 2 dimensions\n", "y = np.array([(1,2,3),(4,5,6)])\n", "x, y" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# with an int\n", "x = np.arange(3)\n", "x" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 2.])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# with a float\n", "y = np.arange(3.0)\n", "y" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4, 5, 6])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# range\n", "x = np.arange(3,7)\n", "x" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 5])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# range with a step\n", "y = np.arange(3,7,2)\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arrays \n", "\n", "### Array properties \n", "|Syntax|Description|Documentation|\n", "|:-------------|:-------------|:-----------|\n", "|`array.shape`|Dimensions (rows, columns)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html)|\n", "|`len(array)`|Length of the array (size of its first axis)|[link](https://docs.python.org/3.5/library/functions.html#len)|\n", "|`array.ndim`|Number of dimensions of the array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ndim.html)|\n", "|`array.size`|Number of elements in the array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html)|\n", "|`array.dtype`|Data type|[link](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)|\n", 
"|`array.astype(type)`|Converts the data type|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html)|\n", "|`type(array)`|Type of the array object|[link](https://docs.scipy.org/doc/numpy/user/basics.types.html)|\n", "\n", "### Copying/Sorting \n", "| Operator | Description | Documentation |\n", "| :------------- | :------------- | :----------- |\n", "|`np.copy(array)`|Creates a copy of the array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html)|\n", "|`other = array.copy()`|Creates an independent copy of the array (method form)|see above|\n", "|`array.sort()`|Sorts the array in place|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html)|\n", "|`array.sort(axis=0)`|Sorts the array along the given axis|see above|\n", "\n", "#### Examples " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort. 
Sort in ascending order\n", "y = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])\n", "y.sort()\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array manipulation\n", "\n", "### Adding or removing elements \n", "|Operator|Description|Documentation|\n", "|:-----------|:--------|:---------|\n", "|`np.append(a,b)`|Appends items to the end of the array (returns a new array)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html)|\n", "|`np.insert(array, 1, 2, axis)`|Inserts the value 2 before index 1 along the given axis (0 or 1)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.insert.html)|\n", "|`array.resize((2,4))`|Resizes the array in place to shape (2,4)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.resize.html)|\n", "|`np.delete(array,1,axis)`|Deletes the items at index 1 along the given axis (returns a new array)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html)|\n", "\n", "### Combining arrays \n", "|Operator|Description|Documentation|\n", "|:---------|:-------|:---------|\n", "|`np.concatenate((a,b),axis=0)`|Joins 2 arrays along the given axis, placing `b` after `a`|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)|\n", "|`np.vstack((a,b))`|Stacks arrays vertically (row-wise)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html)|\n", "|`np.hstack((a,b))`|Stacks arrays horizontally (column-wise)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html#numpy.hstack)|\n", "\n", "### Splitting arrays \n", "|Operator|Description|Documentation|\n", "|:---------|:-------|:------|\n", "|`numpy.split()`|Splits an array into multiple sub-arrays|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html)|\n", "|`np.array_split(array, 3)`|Splits an array into 3 sub-arrays of (nearly) equal size|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html#numpy.array_split)|\n", "|`numpy.hsplit(array, 3)`|Splits the array horizontally (column-wise) into 3 equal sub-arrays 
|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.hsplit.html#numpy.hsplit)|\n", "\n", "\n", "### More \n", "|Operator|Description|Documentation|\n", "|:--------|:--------|:--------|\n", "|`other = ndarray.flatten()`|Flattens a 2-D array into a 1-D array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html)|\n", "|`array = np.transpose(other)`, `array.T`|Transposes the array|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)|\n", "\n", "\n", "## Mathematics \n", "\n", "### Arithmetic operations \n", "| Operator | Description |Documentation|\n", "| :------------- | :------------- |:---------|\n", "|`np.add(x,y)`|Addition|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.add.html)|\n", "|`np.subtract(x,y)`|Subtraction|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.subtract.html#numpy.subtract)|\n", "|`np.divide(x,y)`|Division|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.divide.html#numpy.divide)|\n", "|`np.multiply(x,y)`|Multiplication|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html#numpy.multiply)|\n", "|`np.sqrt(x)`|Square root|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sqrt.html#numpy.sqrt)|\n", "|`np.sin(x)`|Element-wise sine|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sin.html#numpy.sin)|\n", "|`np.cos(x)`|Element-wise cosine|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cos.html#numpy.cos)|\n", "|`np.log(x)`|Element-wise natural logarithm|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html#numpy.log)|\n", "|`np.dot(x,y)`|Dot product|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html)|\n", "\n", "**Remember:** Arithmetic on NumPy arrays is applied element-wise.\n", "\n", "#### Example " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[2 4 6]\n", " [5 7 9]]\n" ] } ], "source": [ "# When a 1-D array is added to a 2-D array, NumPy broadcasts \n", "# the lower-dimensional array across the higher-dimensional one,\n", "# so here a is added to every row of b\n", "a = np.array([1, 2, 3])\n", "b = np.array([(1, 2, 3), (4, 5, 6)])\n", "print(np.add(a, b))" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "### Comparison \n", "| Operator | Description | Documentation |\n", "| :------------- | :------------- |:---------|\n", "|`==`|Equal to|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`!=`|Not equal to|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`<`|Less than|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`>`|Greater than|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`<=`|Less than or equal to|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`>=`|Greater than or equal to|[link](https://docs.python.org/2/library/stdtypes.html)|\n", "|`np.array_equal(x,y)`|True if both arrays have the same shape and elements (returns a single boolean)|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html)|\n", "\n", "#### Example " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ True True True True True False False False False False]\n" ] } ], "source": [ "# Comparison operators return an array of booleans.\n", "z = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n", "c = z < 6\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic statistics \n", "| Operator | Description | Documentation |\n", "| :------------- | :------------- |:--------- |\n", "|`array.mean()`, `np.mean(array)`|Arithmetic mean|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean)|\n", "|`np.median(array)`|Median|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html#numpy.median)|\n", "|`np.corrcoef(array)`|Correlation coefficients|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html#numpy.corrcoef)|\n", "|`np.std(array)`|Standard deviation|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html#numpy.std)|\n", "\n", "### More functions\n", "| Operator | Description | Documentation |\n", "| :------------- | :------------- |:--------- |\n", "|`array.sum()`|Sum of all elements|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html)|\n", "|`array.min()`|Minimum value among the elements|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.min.html)|\n", "|`array.max(axis=0)`|Maximum value along a given axis|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html)|\n", "|`array.cumsum(axis=0)`|Cumulative sum along a given axis|[link](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html)|\n", "\n", "\n", "## Slicing and subsetting \n", "|Operator|Description|Documentation|\n", "| :------------- | :------------- | :------------- |\n", "|`array[i]`|Element at index i of a 1-D array|[link](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)|\n", "|`array[i,j]`|Element at index [i][j] of a 2-D array|see above|\n", "|`array[i<4]`|Boolean indexing, see [Tricks](#Trucos)|see above|\n", "|`array[0:3]`|Selects the items at indices 0, 1 and 2|see above|\n", "|`array[0:2,1]`|Selects the items in rows 0 and 1 of column 1|see above|\n", "|`array[:1]`|Selects the items in row 0 (same as array[0:1, :])|see above|\n", "|`array[1:2, :]`|Selects the items in row 1|see above|\n", "|`array[1,...]`|Same as array[1,:,:] for a 3-D array|see above|\n", "|`array[ : :-1]`|Reverses the `array`|see above|\n", "\n", "\n", "#### 
Examples " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3]\n", "[3 6]\n", "[1 2 3]\n", "[3]\n", "[1 4]\n", "[[4 5]]\n" ] } ], "source": [ "# Selecting elements.\n", "b = np.array([(1, 2, 3), (4, 5, 6)])\n", "\n", "# The index *before* the comma refers to rows,\n", "# the index *after* the comma refers to columns.\n", "print(b[0:1, 2])\n", "print(b[:len(b), 2])\n", "print(b[0, :])\n", "print(b[0, 2:])\n", "print(b[:, 0])\n", "\n", "c = np.array([(1, 2, 3), (4, 5, 6)])\n", "d = c[1:2, 0:2]\n", "print(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tricks \n", "\n", "This is a work-in-progress list of examples. If you know a good trick, feel free to leave a comment so it can be added to the tutorial. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4]\n", "[1 2 3 6 1 1]\n" ] } ], "source": [ "# Index trick when working with 2 arrays\n", "a = np.array([1,2,3,6,1,4,1])\n", "b = np.array([5,6,7,8,3,1,2])\n", "\n", "# Keep only the elements of a at the positions where b == 1\n", "other_a = a[b == 1]\n", "print(other_a)\n", "\n", "# Keep every position except the one where b == 1\n", "other_other_a = a[b != 1]\n", "print(other_other_a)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[6 8 6 9]\n", "[1 2 3 4 4]\n" ] } ], "source": [ "# Another way of working with indices\n", "x = np.array([4,6,8,1,2,6,9])\n", "y = x > 5\n", "print(x[y])\n", "\n", "# More compact\n", "x = np.array([1, 2, 3, 4, 4, 35, 212, 5, 5, 6])\n", "print(x[x < 5])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Credits \n", "[Datacamp](https://www.datacamp.com/home),\n", 
"[Quandl](https://s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf) and the [official documentation](https://docs.scipy.org/doc/numpy/)\n", "\n", "*This post was written using IPython notebook. You can download this [notebook](https://github.com/IAARhub/iaar_template/tree/master/content/notebooks/numpy_cheat_sheet.ipynb) or view a static version on [nbviewer](http://nbviewer.ipython.org/github/IAARhub/iaar_template/tree/master/content/notebooks/numpy_cheat_sheet.ipynb).*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }