{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FORMATO CSV\n",
"=="
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Juan | \n",
" m | \n",
" 10000 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
" 5 | \n",
" 6 | \n",
" Pedro | \n",
" m | \n",
" 10500 | \n",
"
\n",
" \n",
" 6 | \n",
" 7 | \n",
" Gaby | \n",
" f | \n",
" 11000 | \n",
"
\n",
" \n",
" 7 | \n",
" 8 | \n",
" Cecilia | \n",
" f | \n",
" 27000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id Nombre Sexo Sueldo\n",
"0 1 Juan m 10000\n",
"1 2 Ramón m 20000\n",
"2 3 Anna f 10500\n",
"3 4 Julieta f 13000\n",
"4 5 Pablo m 30000\n",
"5 6 Pedro m 10500\n",
"6 7 Gaby f 11000\n",
"7 8 Cecilia f 27000"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\") \n",
"dat_csv \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Si queremos que en lugar de todos los datos, se restrinja la salida a las primeras cinco filas, podemos utilizar el método head()."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Juan | \n",
" m | \n",
" 10000 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id Nombre Sexo Sueldo\n",
"0 1 Juan m 10000\n",
"1 2 Ramón m 20000\n",
"2 3 Anna f 10500\n",
"3 4 Julieta f 13000\n",
"4 5 Pablo m 30000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv.head() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">En los dos casos anteriores, aparecen los elementos de la primera fila como los nombres de las columnas de datos, si no quisiéramos este comportamiento podemos agregar el atributo header = None, de la siguiente manera:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" Juan | \n",
" m | \n",
" 10000 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" Pedro | \n",
" m | \n",
" 10500 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" Gaby | \n",
" f | \n",
" 11000 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" Cecilia | \n",
" f | \n",
" 27000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 id Nombre Sexo Sueldo\n",
"1 1 Juan m 10000\n",
"2 2 Ramón m 20000\n",
"3 3 Anna f 10500\n",
"4 4 Julieta f 13000\n",
"5 5 Pablo m 30000\n",
"6 6 Pedro m 10500\n",
"7 7 Gaby f 11000\n",
"8 8 Cecilia f 27000"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", header = None)\n",
"dat_csv "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Si quisiéramos que tomar una determinada línea como cabecera, debemos asignarle al parámetro header el número de fila correspondiente, en el siguiente ejemplo se toma la línea 3 como cabecera:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
" 2 | \n",
" 6 | \n",
" Pedro | \n",
" m | \n",
" 10500 | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" Gaby | \n",
" f | \n",
" 11000 | \n",
"
\n",
" \n",
" 4 | \n",
" 8 | \n",
" Cecilia | \n",
" f | \n",
" 27000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 3 Anna f 10500\n",
"0 4 Julieta f 13000\n",
"1 5 Pablo m 30000\n",
"2 6 Pedro m 10500\n",
"3 7 Gaby f 11000\n",
"4 8 Cecilia f 27000"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", header = 3)\n",
"dat_csv "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Por defecto, read_csv asigna un índice numérico predeterminado que comienza con cero al leer los datos. Sin embargo, es posible alterar este comportamiento pasando el nombre de la columna que utilizaremos como índice. A continuación se dejara la columna id como índice de la tabla:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" id | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" Juan | \n",
" m | \n",
" 10000 | \n",
"
\n",
" \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
" 6 | \n",
" Pedro | \n",
" m | \n",
" 10500 | \n",
"
\n",
" \n",
" 7 | \n",
" Gaby | \n",
" f | \n",
" 11000 | \n",
"
\n",
" \n",
" 8 | \n",
" Cecilia | \n",
" f | \n",
" 27000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Nombre Sexo Sueldo\n",
"id \n",
"1 Juan m 10000\n",
"2 Ramón m 20000\n",
"3 Anna f 10500\n",
"4 Julieta f 13000\n",
"5 Pablo m 30000\n",
"6 Pedro m 10500\n",
"7 Gaby f 11000\n",
"8 Cecilia f 27000"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", index_col='id')\n",
"dat_csv\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">En el caso de que queramos restringir la tabla a algunas columnas específicas, podemos realizarlo con el parámetro __usecols__ indicando en una lista las columnas seleccionadas. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Nombre | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Juan | \n",
" 10000 | \n",
"
\n",
" \n",
" 1 | \n",
" Ramón | \n",
" 20000 | \n",
"
\n",
" \n",
" 2 | \n",
" Anna | \n",
" 10500 | \n",
"
\n",
" \n",
" 3 | \n",
" Julieta | \n",
" 13000 | \n",
"
\n",
" \n",
" 4 | \n",
" Pablo | \n",
" 30000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Nombre Sueldo\n",
"0 Juan 10000\n",
"1 Ramón 20000\n",
"2 Anna 10500\n",
"3 Julieta 13000\n",
"4 Pablo 30000"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", usecols=['Nombre', 'Sueldo'])\n",
"dat_csv.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Si queremos eliminar una fila en particular, podemos utilizar el parámetro skiprows, en el siguiente caso se eliminan las filas 1 y 4:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
" 3 | \n",
" 6 | \n",
" Pedro | \n",
" m | \n",
" 10500 | \n",
"
\n",
" \n",
" 4 | \n",
" 7 | \n",
" Gaby | \n",
" f | \n",
" 11000 | \n",
"
\n",
" \n",
" 5 | \n",
" 8 | \n",
" Cecilia | \n",
" f | \n",
" 27000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id Nombre Sexo Sueldo\n",
"0 2 Ramón m 20000\n",
"1 3 Anna f 10500\n",
"2 5 Pablo m 30000\n",
"3 6 Pedro m 10500\n",
"4 7 Gaby f 11000\n",
"5 8 Cecilia f 27000"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", skiprows = [1,4])\n",
"dat_csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Para presentar un número dado de filas desde el inicio, podemos utilizar el parámetro nrows, en el siguiente ejemplo se presentan las primeras tres filas de datos."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Juan | \n",
" m | \n",
" 10000 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Ramón | \n",
" m | \n",
" 20000 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id Nombre Sexo Sueldo\n",
"0 1 Juan m 10000\n",
"1 2 Ramón m 20000\n",
"2 3 Anna f 10500"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\", nrows=3)\n",
"dat_csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Para seleccionar un rango específico de filas podemos utilizar el método query especificando los límites aplicables sobre una determinada columna:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" Nombre | \n",
" Sexo | \n",
" Sueldo | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 3 | \n",
" Anna | \n",
" f | \n",
" 10500 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Julieta | \n",
" f | \n",
" 13000 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Pablo | \n",
" m | \n",
" 30000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id Nombre Sexo Sueldo\n",
"2 3 Anna f 10500\n",
"3 4 Julieta f 13000\n",
"4 5 Pablo m 30000"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat_csv = pd.read_csv('empleados.csv', encoding = \"ISO-8859-1\")\n",
"dat_csv.query('2 < id < 6')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Para tratar con los datos de una columna, podemos recuperarlos dentro de una lista mediante el uso de un bucle for, y luego si es necesario convertir la lista en una array como se muestra a continuación:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Juan', 'Pedro', 'Marcelo', 'Anna']\n",
"['m', 'm', 'm', 'f']\n",
"\n",
"['Juan' 'Pedro' 'Marcelo' 'Anna']\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"dat_csv = pd.read_csv('datos.csv', encoding = \"ISO-8859-1\")\n",
"datos_x = dat_csv.nombre\n",
"datos_y = dat_csv.sexo\n",
"x = []\n",
"y = []\n",
"for i in dat_csv.nombre:\n",
" x.append(i)\n",
"for j in dat_csv.sexo:\n",
" y.append(j)\n",
"print(x) \n",
"print(y)\n",
"\n",
"x_array = np.array(x)\n",
"print(type(x_array))\n",
"print(x_array)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FORMATO EXCEL\n",
"=="
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">En el caso de trabajar directamente con un archivo de Excel, podemos utilizar el método __read_excel()__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FORMATO JSON\n",
"=="
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" nombre | \n",
" edad | \n",
" estado_civil | \n",
" esposo | \n",
" hijos | \n",
" autos | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Ana | \n",
" 33 | \n",
" True | \n",
" Pablo | \n",
" Cecilia | \n",
" {'modelo': 'Ford', 'color': 'rojo'} | \n",
"
\n",
" \n",
" 1 | \n",
" Ana | \n",
" 33 | \n",
" True | \n",
" Pablo | \n",
" Luis | \n",
" {'modelo': 'Chevrolet', 'color': 'azul'} | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" nombre edad estado_civil esposo hijos \\\n",
"0 Ana 33 True Pablo Cecilia \n",
"1 Ana 33 True Pablo Luis \n",
"\n",
" autos \n",
"0 {'modelo': 'Ford', 'color': 'rojo'} \n",
"1 {'modelo': 'Chevrolet', 'color': 'azul'} "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"movies_json = pd.read_json('json1.json')\n",
"movies_json.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FORMATO HTML\n",
"=="
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[ 0 1 2 3\n",
" 0 id Nombre Sexo Sueldo\n",
" 1 1 Anna f 10000]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"pd.read_html('index.html')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"BASE DE DATOS\n",
"=="
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" nombre | \n",
" sexo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Pedro | \n",
" m | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Anna | \n",
" f | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Celeste | \n",
" f | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id nombre sexo\n",
"0 1 Pedro m\n",
"1 2 Anna f\n",
"2 3 Celeste f"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import sqlite3\n",
"conn = sqlite3.connect(\"mibase.sqlite\")\n",
"df = pd.read_sql_query(\"SELECT * FROM producto;\", conn)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FORMATO XML\n",
"=="
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" nombre email telefono calle\n",
"0 Juan juan@gmail.com 011-1111111111 Tacuarí 342\n",
"1 Anna anna@gmail.com None None\n",
"2 Pedro pedro@gmail.com 011-2222222222 Río de Janeiro 215\n",
"3 Gabriela gabriela@gmail.com 011-3333333333 Rivadavia 8345\n"
]
}
],
"source": [
"import xml.etree.cElementTree as et\n",
"import pandas as pd\n",
" \n",
"def obtenerValorDeNodo(node):\n",
" return node.text if node is not None else None\n",
" \n",
" \n",
"def main():\n",
" parsed_xml = et.parse(\"datos.xml\")\n",
" dfcols = ['nombre', 'email', 'telefono', 'calle']\n",
" df_xml = pd.DataFrame(columns=dfcols)\n",
" \n",
" for node in parsed_xml.getroot():\n",
" nombre = node.attrib.get('nombre')\n",
" email = node.find('email')\n",
" telefono = node.find('telefono')\n",
" calle = node.find('direccion/calle')\n",
" \n",
" df_xml = df_xml.append( pd.Series([nombre, obtenerValorDeNodo(email), obtenerValorDeNodo(telefono), obtenerValorDeNodo(calle)], index=dfcols), ignore_index=True)\n",
" print(df_xml) \n",
" \n",
"main()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}