{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "FORMATO CSV\n", "==" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idNombreSexoSueldo
01Juanm10000
12Ramónm20000
23Annaf10500
34Julietaf13000
45Pablom30000
56Pedrom10500
67Gabyf11000
78Ceciliaf27000
\n", "
" ], "text/plain": [ " id Nombre Sexo Sueldo\n", "0 1 Juan m 10000\n", "1 2 Ramón m 20000\n", "2 3 Anna f 10500\n", "3 4 Julieta f 13000\n", "4 5 Pablo m 30000\n", "5 6 Pedro m 10500\n", "6 7 Gaby f 11000\n", "7 8 Cecilia f 27000" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd \n", "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\") \n", "dat_csv \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Si queremos que en lugar de todos los datos, se restrinja la salida a las primeras cinco filas, podemos utilizar el método head()." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idNombreSexoSueldo
01Juanm10000
12Ramónm20000
23Annaf10500
34Julietaf13000
45Pablom30000
\n", "
" ], "text/plain": [ " id Nombre Sexo Sueldo\n", "0 1 Juan m 10000\n", "1 2 Ramón m 20000\n", "2 3 Anna f 10500\n", "3 4 Julieta f 13000\n", "4 5 Pablo m 30000" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv.head() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">En los dos casos anteriores, aparecen los elementos de la primera fila como los nombres de las columnas de datos, si no quisiéramos este comportamiento podemos agregar el atributo header = None, de la siguiente manera:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123
0idNombreSexoSueldo
11Juanm10000
22Ramónm20000
33Annaf10500
44Julietaf13000
55Pablom30000
66Pedrom10500
77Gabyf11000
88Ceciliaf27000
\n", "
" ], "text/plain": [ " 0 1 2 3\n", "0 id Nombre Sexo Sueldo\n", "1 1 Juan m 10000\n", "2 2 Ramón m 20000\n", "3 3 Anna f 10500\n", "4 4 Julieta f 13000\n", "5 5 Pablo m 30000\n", "6 6 Pedro m 10500\n", "7 7 Gaby f 11000\n", "8 8 Cecilia f 27000" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", header = None)\n", "dat_csv " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Si quisiéramos que tomar una determinada línea como cabecera, debemos asignarle al parámetro header el número de fila correspondiente, en el siguiente ejemplo se toma la línea 3 como cabecera:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
3Annaf10500
04Julietaf13000
15Pablom30000
26Pedrom10500
37Gabyf11000
48Ceciliaf27000
\n", "
" ], "text/plain": [ " 3 Anna f 10500\n", "0 4 Julieta f 13000\n", "1 5 Pablo m 30000\n", "2 6 Pedro m 10500\n", "3 7 Gaby f 11000\n", "4 8 Cecilia f 27000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", header = 3)\n", "dat_csv " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Por defecto, read_csv asigna un índice numérico predeterminado que comienza con cero al leer los datos. Sin embargo, es posible alterar este comportamiento pasando el nombre de la columna que utilizaremos como índice. A continuación se dejara la columna id como índice de la tabla:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NombreSexoSueldo
id
1Juanm10000
2Ramónm20000
3Annaf10500
4Julietaf13000
5Pablom30000
6Pedrom10500
7Gabyf11000
8Ceciliaf27000
\n", "
" ], "text/plain": [ " Nombre Sexo Sueldo\n", "id \n", "1 Juan m 10000\n", "2 Ramón m 20000\n", "3 Anna f 10500\n", "4 Julieta f 13000\n", "5 Pablo m 30000\n", "6 Pedro m 10500\n", "7 Gaby f 11000\n", "8 Cecilia f 27000" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", index_col='id')\n", "dat_csv\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">En el caso de que queramos restringir la tabla a algunas columnas específicas, podemos realizarlo con el parámetro __usecols__ indicando en una lista las columnas seleccionadas. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NombreSueldo
0Juan10000
1Ramón20000
2Anna10500
3Julieta13000
4Pablo30000
\n", "
" ], "text/plain": [ " Nombre Sueldo\n", "0 Juan 10000\n", "1 Ramón 20000\n", "2 Anna 10500\n", "3 Julieta 13000\n", "4 Pablo 30000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", usecols=['Nombre', 'Sueldo'])\n", "dat_csv.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Si queremos eliminar una fila en particular, podemos utilizar el parámetro skiprows, en el siguiente caso se eliminan las filas 1 y 4:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idNombreSexoSueldo
02Ramónm20000
13Annaf10500
25Pablom30000
36Pedrom10500
47Gabyf11000
58Ceciliaf27000
\n", "
" ], "text/plain": [ " id Nombre Sexo Sueldo\n", "0 2 Ramón m 20000\n", "1 3 Anna f 10500\n", "2 5 Pablo m 30000\n", "3 6 Pedro m 10500\n", "4 7 Gaby f 11000\n", "5 8 Cecilia f 27000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", skiprows = [1,4])\n", "dat_csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Para presentar un número dado de filas desde el inicio, podemos utilizar el parámetro nrows, en el siguiente ejemplo se presentan las primeras tres filas de datos." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idNombreSexoSueldo
01Juanm10000
12Ramónm20000
23Annaf10500
\n", "
" ], "text/plain": [ " id Nombre Sexo Sueldo\n", "0 1 Juan m 10000\n", "1 2 Ramón m 20000\n", "2 3 Anna f 10500" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", nrows=3)\n", "dat_csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Para seleccionar un rango específico de filas podemos utilizar el método query especificando los límites aplicables sobre una determinada columna:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idNombreSexoSueldo
23Annaf10500
34Julietaf13000
45Pablom30000
\n", "
" ], "text/plain": [ " id Nombre Sexo Sueldo\n", "2 3 Anna f 10500\n", "3 4 Julieta f 13000\n", "4 5 Pablo m 30000" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\")\n", "dat_csv.query('2 < id < 6')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">Para tratar con los datos de una columna, podemos recuperarlos dentro de una lista mediante el uso de un bucle for, y luego si es necesario convertir la lista en una array como se muestra a continuación:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Juan', 'Pedro', 'Marcelo', 'Anna']\n", "['m', 'm', 'm', 'f']\n", "\n", "['Juan' 'Pedro' 'Marcelo' 'Anna']\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "dat_csv = pd.read_csv('datos.csv', encoding = \"ISO-8859-1\")\n", "datos_x = dat_csv.nombre\n", "datos_y = dat_csv.sexo\n", "x = []\n", "y = []\n", "for i in dat_csv.nombre:\n", " x.append(i)\n", "for j in dat_csv.sexo:\n", " y.append(j)\n", "print(x) \n", "print(y)\n", "\n", "x_array = np.array(x)\n", "print(type(x_array))\n", "print(x_array)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "FORMATO EXCEL\n", "==" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">En el caso de trabajar directamente con un archivo de Excel, podemos utilizar el método __read_excel()__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "FORMATO JSON\n", "==" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nombreedadestado_civilesposohijosautos
0Ana33TruePabloCecilia{'modelo': 'Ford', 'color': 'rojo'}
1Ana33TruePabloLuis{'modelo': 'Chevrolet', 'color': 'azul'}
\n", "
" ], "text/plain": [ " nombre edad estado_civil esposo hijos \\\n", "0 Ana 33 True Pablo Cecilia \n", "1 Ana 33 True Pablo Luis \n", "\n", " autos \n", "0 {'modelo': 'Ford', 'color': 'rojo'} \n", "1 {'modelo': 'Chevrolet', 'color': 'azul'} " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "movies_json = pd.read_json('json1.json')\n", "movies_json.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "FORMATO HTML\n", "==" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ 0 1 2 3\n", " 0 id Nombre Sexo Sueldo\n", " 1 1 Anna f 10000]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pd.read_html('index.html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BASE DE DATOS\n", "==" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnombresexo
01Pedrom
12Annaf
23Celestef
\n", "
" ], "text/plain": [ " id nombre sexo\n", "0 1 Pedro m\n", "1 2 Anna f\n", "2 3 Celeste f" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import sqlite3\n", "conn = sqlite3.connect(\"mibase.sqlite\")\n", "df = pd.read_sql_query(\"SELECT * FROM producto;\", conn)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "FORMATO XML\n", "==" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " nombre email telefono calle\n", "0 Juan juan@gmail.com 011-1111111111 Tacuarí 342\n", "1 Anna anna@gmail.com None None\n", "2 Pedro pedro@gmail.com 011-2222222222 Río de Janeiro 215\n", "3 Gabriela gabriela@gmail.com 011-3333333333 Rivadavia 8345\n" ] } ], "source": [ "import xml.etree.cElementTree as et\n", "import pandas as pd\n", " \n", "def obtenerValorDeNodo(node):\n", " return node.text if node is not None else None\n", " \n", " \n", "def main():\n", " parsed_xml = et.parse(\"datos.xml\")\n", " dfcols = ['nombre', 'email', 'telefono', 'calle']\n", " df_xml = pd.DataFrame(columns=dfcols)\n", " \n", " for node in parsed_xml.getroot():\n", " nombre = node.attrib.get('nombre')\n", " email = node.find('email')\n", " telefono = node.find('telefono')\n", " calle = node.find('direccion/calle')\n", " \n", " df_xml = df_xml.append( pd.Series([nombre, obtenerValorDeNodo(email), obtenerValorDeNodo(telefono), obtenerValorDeNodo(calle)], index=dfcols), ignore_index=True)\n", " print(df_xml) \n", " \n", "main()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }