{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cómo cargar datos en Python\n", "\n", "Es muy útil poder leer datos, pero es aún más útil poder modificarlos a nuestro antojo.\n", "\n", "Hay varias maneras de hacer esto,usando diferentes módulos de Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Usando Numpy\n", "\n", "Podemos generar datos dentro de Python usando Numpy y luego guardándolos en algún archivo, cuya extensión es ```.npy``` (Qué raro) para luego volver a cargarlo en otro Notebook o Script.\n", "\n", "**Ejemplo:**" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data =\n", "\n", " [[1 2 3]\n", " [4 5 6]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "\n", "data = np.array([[1, 2, 3], [4, 5, 6]])\n", "\n", "np.save('/home/moury/Documents/Data/test', data)\n", "\n", "x = np.load('/home/moury/Documents/Data/test.npy')\n", "\n", "print(\"Data =\\n\\n\",x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Juguemos** con los datos:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "La suma es: 21 \n", "\n", "El Producto es:\n", "\n", " [[ 2 4 6]\n", " [ 8 10 12]] \n", "\n", "La Potencia es:\n", "\n", " [[ 1 4 9]\n", " [16 25 36]] \n", "\n", "El promedio es: 3.5 \n", "\n", "La Desviación estándar es: 1.707825127659933\n" ] } ], "source": [ "sumita = np.sum(x)\n", "produc = 2*x\n", "potenc = x**2\n", "promed = np.average(x)\n", "stdevi = np.std(x)\n", "\n", "print(\"La suma es:\", sumita,\"\\n\")\n", "print(\"El Producto es:\\n\\n\", produc,\"\\n\")\n", "print(\"La Potencia es:\\n\\n\", potenc,\"\\n\")\n", "print(\"El promedio es:\", promed,\"\\n\")\n", "print(\"La Desviación estándar es:\", stdevi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "También podríamos haber usado el atributo ```loadtxt``` para leer archivos .txt:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data =\n", "\n", " [[4. 1. 5.]\n", " [4. 8. 3.]\n", " [1. 4. 5.]\n", " [9. 0. 6.]]\n" ] } ], "source": [ "x=np.loadtxt(\"/home/moury/Documents/Data/data.txt\")\n", "print(\"Data =\\n\\n\",x)\n", "#Notar los puntos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Usando CSV\n", "\n", "Numpy es un poco celoso para leer archivos de otros formatos diferentes de npy y .txt.\n", "\n", "En vez de eso, podemos usar el módulo **csv** (comma-separated values) para leer archivos.\n", "\n", "Esto es un poco más engorroso, pero sin lugar a dudas vale la pena.\n", "\n", "**Nota:** Solo podremos leer datos que contengan elementos numéricos." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4. 1. 5.]\n", " [4. 8. 3.]\n", " [1. 4. 5.]\n", " [9. 0. 6.]]\n" ] } ], "source": [ "import csv\n", "\n", "filen = '/home/moury/Documents/Data/data.txt'\n", "rawdata = open(filen,'r')\n", "reader = csv.reader(rawdata, delimiter=' ')\n", "x = list(reader)\n", "data = np.array(x).astype('float')\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cargar datos desde Internet\n", "\n", "Para esto, basta suministrar la dirección url donde se encuentran alojados los datos e importar ciertos atributos del módulo **urllib**:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 6. 148. 72. ... 0.627 50. 1. ]\n", " [ 1. 85. 66. ... 0.351 31. 0. ]\n", " [ 8. 183. 64. ... 0.672 32. 1. ]\n", " ...\n", " [ 5. 121. 72. ... 0.245 30. 0. ]\n", " [ 1. 126. 60. ... 0.349 47. 1. ]\n", " [ 1. 93. 70. ... 0.315 23. 0. ]]\n" ] } ], "source": [ "from numpy import loadtxt\n", "from urllib.request import urlopen\n", "\n", "url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'\n", "\n", "rawdata=urlopen(url)\n", "\n", "data2 = loadtxt(rawdata, delimiter=\",\")\n", "\n", "print(data2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tarea\n", "\n", "- Hacer el mismo proceso usando la librería pandas\n", "- ¿Por qué es tan famoso pandas?\n", "- Haga un análisis estadístico descriptivo de un conjunto grande datos" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 4 }