{"cells":[{"cell_type":"markdown","source":"# Measurement scales\n","metadata":{"id":"Gg_mKXTZ0oqx","cell_id":"1849a88712c04a9394679509f34c8eff","deepnote_cell_type":"markdown"}},{"cell_type":"markdown","source":"**Nominal:** Objects or phenomena are classified according to certain characteristics, types, or names and given a label or symbol, with no implied relationship of order, distance, or proportion between them.\n\n**Example:** A product is labeled according to whether it meets the design specifications, as \"conforming\" or \"nonconforming\", or as \"critical, major, and minor\". No numerical values are obtained, and the observations cannot be meaningfully ordered.\n\n**Ordinal:** Also called a rank-order scale. It establishes the relative positions of the objects or phenomena under study with respect to some characteristic of interest, without reflecting the distances between them. The objects in one category of the scale need not be entirely different from those in another category; rather, the categories are related to each other.\n\n**Example:** Suppose the customers of a store are asked a few questions to rate the quality of service, answering on the following scale: 1 (excellent), 2 (good), 3 (fair), 4 (poor), 5 (very poor).\n\n**Interval:** A mathematically more precise level of measurement than the previous ones: besides establishing an order in the relative positions of the objects or individuals, it also measures the distance between the different categories or classes.\n\n**Example:** Suppose we are interested in the temperature of molten steel. Five readings are taken, one every two hours: 2050, 2100, 2150, 2200, and 2250 °F. 
The data can obviously be ordered (as with ordinal data) by increasing temperature, from coldest to less cold and so on; in addition, the differences between readings are meaningful (each reading here is 50 °F above the previous one).\n\n**Ratio:** When a scale has all the characteristics of an interval scale and also a true zero point at its origin, it is called a ratio scale. Besides distinction, order, and distance, it makes it possible to establish by what proportion one category of the scale exceeds another. The absolute or natural zero represents the absence of the quantity being studied.\n\n**Example:** Suppose the weights of four metal castings are 2.0, 2.1, 2.3, and 2.5 kg. The order (ordinal) and the differences (interval) in the weights can be compared. Thus, the increase in weight from 2.0 to 2.1 kg is 0.1 kg, the same as the increase from 2.3 to 2.4 kg. Ratios are also meaningful: a 2.5 kg casting is 25% heavier than a 2.0 kg one.","metadata":{"id":"bndj1KVa0tME","cell_id":"f313c38466d548dbaf58977e557bc9d9","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"from google.colab import drive\nimport os\n\ndrive.mount('/content/gdrive')\n# Set the working directory to the Drive folder\nprint(os.getcwd())\nos.chdir(\"/content/gdrive/My Drive\")","metadata":{"id":"A2vwLVGR0cf_","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"056f15a9fbf04be4aae4e4f6d4baa4bc","outputId":"bfeeda6d-6f0e-4ad1-e054-9737ad3ba03a","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":2604,"user_tz":180,"timestamp":1647184350115},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount(\"/content/gdrive\", force_remount=True).\n/content/gdrive/My Drive\n"}],"execution_count":9},{"cell_type":"markdown","source":"# Histogram\n\nThe histogram is a graphical technique used to present 
large amounts of data. It is attributed to Karl Pearson in 1895. A histogram can show **absolute frequencies, relative frequencies, cumulative absolute frequencies, or cumulative relative frequencies**. Building a histogram requires a frequency distribution table, which is developed below.\n\nThe plot of the frequency distribution is called a histogram. A frequency histogram is a visual representation of the data that brings out three main characteristics: **shape, location (central tendency), and dispersion or variability**.\n\nThe (frequency) histogram itself is a sequence of rectangles built on a coordinate system as follows:\n\n1. The bases of the rectangles lie on the horizontal axis. The length of each base equals the width of the class interval.\n2. The heights of the rectangles are measured on the vertical axis and correspond to the frequencies of the intervals.\n3. 
The areas of the rectangles are proportional to the class frequencies.\n\nA common choice for the number of classes $k$ given $n$ observations is Sturges' rule:\n\n$$k= 1 + 3.3 \log_{10} (n)$$\n","metadata":{"id":"2kG9_qWu8v5w","cell_id":"68ea5cde436049b4ae8923a6040edee0","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import seaborn as sns\nimport matplotlib.pyplot as plt\ntips = sns.load_dataset('tips')\ntips.head()","metadata":{"id":"YVsxX5sj7s5R","colab":{"height":206,"base_uri":"https://localhost:8080/"},"cell_id":"2c21876bc3a7440f9d3cfc4acae98258","outputId":"1b7ab514-5dad-47ba-9b5c-138bf07883d8","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":21,"user_tz":180,"timestamp":1647184297830},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":" total_bill tip sex smoker day time size\n0 16.99 1.01 Female No Sun Dinner 2\n1 10.34 1.66 Male No Sun Dinner 3\n2 21.01 3.50 Male No Sun Dinner 3\n3 23.68 3.31 Male No Sun Dinner 2\n4 24.59 3.61 Female No Sun Dinner 4","text/html":"\n
 "},"metadata":{},"execution_count":4}],"execution_count":4},{"cell_type":"code","source":"# histplot replaces the deprecated distplot(..., kde=False)\nsns.histplot(tips['total_bill'])\nplt.show()","metadata":{"id":"Ed4SXTKO9mcZ","colab":{"height":338,"base_uri":"https://localhost:8080/"},"cell_id":"162556c826b9430eb0282ac75236d63a","outputId":"277f9f14-4c72-48ec-dc34-c10a163f4094","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":590,"user_tz":180,"timestamp":1647184298405},"deepnote_cell_type":"code"},"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":"iVBORw0KGgoAAAANSUhEUgAAAXAAAAEKCAYAAAALoA6YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAOp0lEQVR4nO3df6zddX3H8edrBaNTM0SuDaF0l02mIWbU5A5wsgVxOBaJsMyg1rkuIWmWuQSdxqH/GJctkSzxR+KWrBFjk1mEoAzinLOpMJ1bikVAQTQgA0ZTaEWIsEWX4nt/nG/D3eW297T3nHt59z4fyc35fj/f7/me9yc5vPrhc873c1JVSJL6+YXVLkCSdGwMcElqygCXpKYMcElqygCXpKYMcElq6oRxTkryIPAU8AxwsKrmkpwMXAfMAg8Cl1fVE9MpU5K00NGMwN9QVZuqam7YvwrYVVVnAruGfUnSCsk4N/IMI/C5qvrRvLYfABdU1b4kpwK3VtWrjnSdU045pWZnZ5dXsSStMbfffvuPqmpmYftYUyhAAV9NUsDfV9U2YH1V7RuOPwqsX+ois7Oz7NmzZ9yaJUlAkocWax83wM+vqr1JXgHsTPL9+QerqoZwX+yFtwJbATZu3HgUJUuSjmSsOfCq2js87gduBM4BHhumThge9x/muduqaq6q5mZmnvN/AJKkY7RkgCd5cZKXHtoG3gTcDdwMbBlO2wLcNK0iJUnPNc4UynrgxiSHzt9RVV9J8i3g+iRXAA8Bl0+vTEnSQksGeFU9AJy9SPvjwBunUZQkaWneiSlJTRngktSUAS5JTRngktTUuDfy6Hlgx+6Hp3Ldzed6g5XUkSNwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekplzMakqmtfCUJB3iCFySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJampsQM8ybokdyT50rB/RpLdSe5Pcl2SF0yvTEnSQkczAr8SuHfe/tXAx6vqlcATwBWTLEySdGRjBXiSDcCbgU8P+wEuBG4YTtkOXDaNAiVJixt3BP4J4APAz4f9lwNPVtXBYf8R4LQJ1yZJOoIlAzzJJcD+qrr9WF4gydYke5LsOXDgwLFcQpK0iHFG4K8H3pLkQeDzjKZOPgmclOTQr9pvAPYu9uSq2lZVc1U1NzMzM4GSJUkwRoBX1QerakNVzQJvB75WVe8EbgHeOpy2BbhpalVKkp5jOd8D/wvgz5Pcz2hO/JrJlCRJGscJS5/yrKq6Fbh12H4AOGfyJUmSxnFUAa7j047dD0/lupvP3TiV60oa8VZ6SWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWpqyQBP8sIktyW5K8k9ST4ytJ+RZHeS+5Ncl+QF0y9XknTIOCPwnwEXVtXZwCbg4iTnAVcDH6+qVwJPAFdMr0xJ0kJLBniNPD3snjj8FXAhcMPQvh24bCoVSpIWNdYceJJ1Se4E9gM7gR8CT1bVweGUR4DTplOiJGkxJ4xzUlU9A2xKchJwI/DqcV8gyVZgK8DGjRuPpUY1tWP3wxO/5uZzfQ9JhxzVt1Cq6kngFuB1wElJDv0DsAHYe5jnbKuquaqam5mZWVaxkqRnjfMtlJlh5E2SFwEXAfcyCvK3DqdtAW6aVpGSpOcaZwrlVGB7knWMAv/6qvpSku8Bn0/yV8AdwDVTrFOStMCSAV5V3wFeu0j7A8A50
yhKkrQ078SUpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElq6oTVLkA6Gjt2PzyV624+d+NUritN05Ij8CSnJ7klyfeS3JPkyqH95CQ7k9w3PL5s+uVKkg4ZZwrlIPC+qjoLOA94d5KzgKuAXVV1JrBr2JckrZAlA7yq9lXVt4ftp4B7gdOAS4Htw2nbgcumVaQk6bmO6kPMJLPAa4HdwPqq2jccehRYP9HKJElHNHaAJ3kJ8AXgPVX1k/nHqqqAOszztibZk2TPgQMHllWsJOlZYwV4khMZhffnquqLQ/NjSU4djp8K7F/suVW1rarmqmpuZmZmEjVLkhjvWygBrgHuraqPzTt0M7Bl2N4C3DT58iRJhzPO98BfD7wL+G6SO4e2DwEfBa5PcgXwEHD5dEqUJC1myQCvqn8DcpjDb5xsOZKkcXkrvSQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlMGuCQ1ZYBLUlPj/CKPdNzbsfvhqVx387kbp3JdCRyBS1JbBrgkNWWAS1JTa34OfFpzn5I0bY7AJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJampJQM8yWeS7E9y97y2k5PsTHLf8Piy6ZYpSVponBH4Z4GLF7RdBeyqqjOBXcO+JGkFLRngVfV14McLmi8Ftg/b24HLJlyXJGkJxzoHvr6q9g3bjwLrJ1SPJGlMy/4Qs6oKqMMdT7I1yZ4kew4cOLDcl5MkDY41wB9LcirA8Lj/cCdW1baqmququZmZmWN8OUnSQsca4DcDW4btLcBNkylHkjSucb5GeC3wH8CrkjyS5Argo8BFSe4DfmfYlyStoCV/lb6q3nGYQ2+ccC3ScWfH7oenct3N526cynXVi3diSlJTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNbXkjTySnn+mcYOQNwf14whckpoywCWpKQNckppyDlwS4MJbHTkCl6SmDHBJasoAl6SmDHBJasoAl6SmDHBJasoAl6SmDHBJasobeSRpnk4LhTkCl6SmDHBJasoAl6SmDHBJasoAl6SmDHBJasoAl6Sm2nwPfFqLzUuaLv/bnR5H4JLUlAEuSU0Z4JLUlAEuSU0tK8CTXJzkB0nuT3LVpIqSJC3tmAM8yTrgb4HfA84C3pHkrEkVJkk6suWMwM8B7q+qB6rqf4HPA5dOpixJ0lKWE+CnAf81b/+RoU2StAKmfiNPkq3A1mH36SQ/mPZrPk+cAvxotYtYRfZ/7fZ/LfcdFun/O5d/zV9erHE5Ab4XOH3e/oah7f+pqm3AtmW8TktJ9lTV3GrXsVrs/9rt/1ruO6xs/5czhfIt4MwkZyR5AfB24ObJlCVJWsoxj8Cr6mCSPwP+BVgHfKaq7plYZZKkI1rWHHhVfRn48oRqOd6suWmjBez/2rWW+w4r2P9U1Uq9liRpgryVXpKaMsAnIMlnkuxPcve8tpOT7Exy3/D4stWscVqSnJ7kliTfS3JPkiuH9rXS/xcmuS3JXUP/PzK0n5Fk97DMxHXDB/3HrSTrktyR5EvD/prpf5IHk3w3yZ1J9gxtK/L+N8An47PAxQvargJ2VdWZwK5h/3h0EHhfVZ0FnAe8e1hSYa30/2fAhVV1NrAJuDjJecDVwMer6pXAE8AVq1jjSrgSuHfe/lrr/xuqatO8rw+uyPvfAJ+Aqvo68OMFzZcC24ft7cBlK1rUCqmqfVX17WH7KUb/EZ/G2ul/VdXTw+6Jw18BF
wI3DO3Hbf8BkmwA3gx8etgPa6j/h7Ei738DfHrWV9W+YftRYP1qFrMSkswCrwV2s4b6P0wf3AnsB3YCPwSerKqDwynH+zITnwA+APx82H85a6v/BXw1ye3DneewQu//Nr+J2VlVVZLj+us+SV4CfAF4T1X9ZDQIGzne+19VzwCbkpwE3Ai8epVLWjFJLgH2V9XtSS5Y7XpWyflVtTfJK4CdSb4//+A03/+OwKfnsSSnAgyP+1e5nqlJciKj8P5cVX1xaF4z/T+kqp4EbgFeB5yU5NAAadFlJo4TrwfekuRBRiuSXgh8krXTf6pq7/C4n9E/4OewQu9/A3x6bga2DNtbgJtWsZapGeY7rwHuraqPzTu0Vvo/M4y8SfIi4CJGnwPcArx1OO247X9VfbCqNlTVLKPlNL5WVe9kjfQ/yYuTvPTQNvAm4G5W6P3vjTwTkORa4AJGq5A9BnwY+EfgemAj8BBweVUt/KCzvSTnA98Avsuzc6AfYjQPvhb6/+uMPqRax2hAdH1V/WWSX2E0Ij0ZuAP4w6r62epVOn3DFMr7q+qStdL/oZ83DrsnADuq6q+TvJwVeP8b4JLUlFMoktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngaifJSUn+dIlzZpNsHuNas/OXAV7k+B8n+dRhjv37wmskueDQkqrStBng6ugk4IgBDswCSwb4clTVb07z+tJSDHB19FHgV4cF9P9m+Lt7WFT/bfPO+a3hnPcOo+RvJPn28Hc04Xt6kluHxfk/fKgxydNHepI0ba5GqI6uAl5TVZuS/AHwJ8DZjJYy+FaSrw/nvL+qLgFI8ovARVX10yRnAtcCc4tf/jnOAV4D/M9w/X+qqj2T7ZJ09AxwdXc+cO2wpOtjSf4V+A3gJwvOOxH4VJJNwDPArx3Fa+ysqscBknxxeE0DXKvOANda8V5GC42dzWjq8KdH8dyFCwa5gJCeF5wDV0dPAS8dtr8BvG34VZwZ4LeB2xacA/BLwL6q+jnwLkarB47rouFHal/E6KexvrncDkiT4Ahc7VTV40m+OXx175+B7wB3MRoZf6CqHk3yOPBMkrsY/ej03wFfSPJHwFeA/z6Kl7yN0Q9WbAD+wflvPV+4nKwkNeUUiiQ15RSKBCT5XeDqBc3/WVW/vxr1SONwCkWSmnIKRZKaMsAlqSkDXJKaMsAlqSkDXJKa+j9goK0GGa3hkwAAAABJRU5ErkJggg==\n"},"metadata":{"needs_background":"light"}}],"execution_count":5},{"cell_type":"code","source":"sns.histplot(data=tips,x='total_bill',hue='sex') \nplt.title('Histograma de gasto por sexo')\nplt.show()","metadata":{"id":"5uuTFo6--STa","colab":{"height":296,"base_uri":"https://localhost:8080/"},"cell_id":"b9e247956be44f1aa68df2d43bc0dbe3","outputId":"87f67bf6-5300-4ae7-f911-2d3da536e375","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":15,"user_tz":180,"timestamp":1647184301613},"deepnote_cell_type":"code"},"outputs":[{"output_type":"display_data","data":{"text/plain":"
\n\n"},"metadata":{}}],"execution_count":7},{"cell_type":"markdown","source":"# Time series (line plot)","metadata":{"id":"z4wOGOrFEO5K","cell_id":"1d7533cab0bc4528b1eeacb18b8c91d8","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import pandas as pd\ndf = pd.read_csv('accidents.csv', delimiter=\";\")\ntype(df)","metadata":{"id":"o8XIFxAmERvU","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"801a80ab80d24535b2d800f30099f17e","outputId":"2902bd7d-eddf-427a-918a-3f1091dbe89b","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":1076,"user_tz":180,"timestamp":1647184351921},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"pandas.core.frame.DataFrame"},"metadata":{},"execution_count":10}],"execution_count":10},{"cell_type":"code","source":"print(df.shape)\ndf.head()","metadata":{"id":"XMx7qE2JEhtT","colab":{"height":439,"base_uri":"https://localhost:8080/"},"cell_id":"3cb013162f514bd0b215352c202b7ebd","outputId":"dc59361f-19e8-4b4c-89e0-1646de3669b6","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":418,"user_tz":180,"timestamp":1647184354674},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"(238522, 24)\n"},{"output_type":"execute_result","data":{"text/plain":" DATE TIME BOROUGH ZIP CODE LATITUDE LONGITUDE \\\n0 09/26/2018 12:12 BRONX 10454.0 40.808987 -73.911316 \n1 09/25/2018 16:30 BROOKLYN 11236.0 40.636005 -73.912510 \n2 08/22/2019 19:30 QUEENS 11101.0 40.755490 -73.939530 \n3 09/23/2018 13:10 QUEENS 11367.0 NaN NaN \n4 08/20/2019 22:40 BRONX 10468.0 40.868336 -73.901270 \n\n ON STREET NAME NUMBER OF PEDESTRIANS 
INJURED \\\n0 NaN 0 \n1 FLATLANDS AVENUE 1 \n2 NaN 0 \n3 MAIN STREET 0 \n4 NaN 0 \n\n NUMBER OF PEDESTRIANS KILLED NUMBER OF CYCLIST INJURED ... \\\n0 0 0 ... \n1 0 0 ... \n2 0 0 ... \n3 0 1 ... \n4 0 0 ... \n\n CONTRIBUTING FACTOR VEHICLE 2 CONTRIBUTING FACTOR VEHICLE 3 \\\n0 NaN NaN \n1 NaN NaN \n2 NaN NaN \n3 Unspecified NaN \n4 Unspecified NaN \n\n CONTRIBUTING FACTOR VEHICLE 4 CONTRIBUTING FACTOR VEHICLE 5 COLLISION_ID \\\n0 NaN NaN 3988123 \n1 NaN NaN 3987962 \n2 NaN NaN 4193132 \n3 NaN NaN 3985962 \n4 NaN NaN 4192111 \n\n VEHICLE TYPE CODE 1 VEHICLE TYPE CODE 2 \\\n0 Sedan NaN \n1 Sedan NaN \n2 Sedan NaN \n3 Bike Station Wagon/Sport Utility Vehicle \n4 Sedan Sedan \n\n VEHICLE TYPE CODE 3 VEHICLE TYPE CODE 4 VEHICLE TYPE CODE 5 \n0 NaN NaN NaN \n1 NaN NaN NaN \n2 NaN NaN NaN \n3 NaN NaN NaN \n4 NaN NaN NaN \n\n[5 rows x 24 columns]","text/html":"\n
 "},"metadata":{},"execution_count":11}],"execution_count":11},{"cell_type":"code","source":"# Group the available data by month and draw a line plot of accidents over time.\n# Has the number of accidents increased over the last year and a half?\ndf['DATE'] = pd.to_datetime(df['DATE'])\nmonthly_accidents = df.groupby(df['DATE'].dt.to_period('M')).size()\nmonthly_accidents.plot.line()","metadata":{"id":"sc-R1gyNEn4E","colab":{"height":308,"base_uri":"https://localhost:8080/"},"cell_id":"c5ced4b72cee40ceb8246a6df5538d24","outputId":"f701cd77-dda4-4b5e-9713-eed5d458ab80","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":585,"user_tz":180,"timestamp":1647184356536},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":""},"metadata":{},"execution_count":12},{"output_type":"display_data","data":{"text/plain":"
","image/png":"\n"},"metadata":{"needs_background":"light"}}],"execution_count":12},{"cell_type":"markdown","source":"# FacetGrid","metadata":{"id":"iS664EXkFZEM","cell_id":"baffba237a3f4c02bc7f87149e4eaeac","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"# Extract the hour of day from the TIME column\ndf['TIME'] = pd.to_datetime(df['TIME'])\ndf['HOUR'] = df['TIME'].dt.hour\n\n# Count accidents per borough and hour\ndf1 = df.groupby(['BOROUGH', 'HOUR']).size().reset_index(name='count')\n\n# One bar chart per borough; an explicit `order` keeps the hours aligned across panels\nchart = sns.FacetGrid(df1, col='BOROUGH', margin_titles=True, col_wrap=3, aspect=2)\nchart.map(sns.barplot, 'HOUR', 'count', order=sorted(df1['HOUR'].unique()))","metadata":{"id":"ko4TMNxfE0yg","colab":{"height":528,"base_uri":"https://localhost:8080/"},"cell_id":"d240c2a35f1d40c0a8fa35ea7612b009","outputId":"131957f0-dae5-4d4b-cd35-f80d99df3ac7","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":4239,"user_tz":180,"timestamp":1647184364615},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":""},"metadata":{},"execution_count":13},{"output_type":"display_data","data":{"text/plain":"
","image/png":"\n"},"metadata":{"needs_background":"light"}}],"execution_count":13},{"cell_type":"markdown","source":"# Other plots ","metadata":{"id":"s0_2Dc130Xnt","cell_id":"dd847845bfd24158a03ee5bff6a4432d","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"# Number of accidents per borough\ndf_prueba = df[['DATE', 'BOROUGH']]\npie_borough = df_prueba.groupby('BOROUGH').agg('count')\npie_borough = pie_borough.rename(columns={'DATE': 'Frecuencia'})\npie_borough","metadata":{"id":"iwWH-b2LLtxf","colab":{"height":238,"base_uri":"https://localhost:8080/"},"cell_id":"39c3f9b6bd9445b5b236f725272c0866","outputId":"bd7b9616-ef0f-479f-884a-749a1fb5d090","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":397,"user_tz":180,"timestamp":1647184366534},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":" Frecuencia\nBOROUGH \nBRONX 37709\nBROOKLYN 76253\nMANHATTAN 48749\nQUEENS 67120\nSTATEN ISLAND 8691","text/html":"\n
 "},"metadata":{},"execution_count":14}],"execution_count":14},{"cell_type":"code","source":"labels = pie_borough.index\nprint(labels)\nfig, ax = plt.subplots(figsize=[10, 6])\n# pie() expects 1-D data, so pass the 'Frecuencia' column rather than the whole DataFrame\nax.pie(x=pie_borough['Frecuencia'], autopct=\"%.1f%%\", labels=labels,\n       explode=[0.05] * 5, pctdistance=0.5)\nax.set_title(\"Distribution by borough\", fontsize=14);\n","metadata":{"id":"RzVAQnCZM30i","colab":{"height":461,"base_uri":"https://localhost:8080/"},"cell_id":"0d6f095fa6ea41c6b6c30383ba757bf0","outputId":"070f9776-6b35-48f0-f75b-b240e5b6d1ae","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":609,"user_tz":180,"timestamp":1647184368326},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"Index(['BRONX', 'BROOKLYN', 'MANHATTAN', 'QUEENS', 'STATEN ISLAND'], dtype='object', name='BOROUGH')\n"},{"output_type":"display_data","data":{"text/plain":"
","image/png":"\n"},"metadata":{}}],"execution_count":15},{"cell_type":"code","source":"pie_borough","metadata":{"id":"J8XlI61GTu_2","colab":{"height":238,"base_uri":"https://localhost:8080/"},"cell_id":"2654d3883dfd483c8096971be824ddd5","outputId":"8c9549e0-7bc5-4971-9df0-2018954e4b8a","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":14,"user_tz":180,"timestamp":1647184369666},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":" Frecuencia\nBOROUGH \nBRONX 37709\nBROOKLYN 76253\nMANHATTAN 48749\nQUEENS 67120\nSTATEN ISLAND 8691","text/html":"\n
\n "},"metadata":{},"execution_count":16}],"execution_count":16},{"cell_type":"code","source":"import plotly.express as px\nfig = px.pie(pie_borough, values='Frecuencia', \\\n names=pie_borough.index, title='Piechart Boroughs')\nfig.show()","metadata":{"id":"CxpXXYJjTn6C","colab":{"height":542,"base_uri":"https://localhost:8080/"},"cell_id":"0017ce0fb393483ba0e699109704b221","outputId":"652eb57f-49f1-478d-afd6-54747fe89cc0","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":12,"user_tz":180,"timestamp":1647184371104},"deepnote_cell_type":"code"},"outputs":[{"output_type":"display_data","data":{"text/html":"\n\n\n
\n
\n\n"},"metadata":{}}],"execution_count":17},{"cell_type":"code","source":"import plotly.express as px\nfig = px.pie(pie_borough, values='Frecuencia', \\\n names=pie_borough.index, title='Piechart Boroughs')\nfig.update_traces(textposition='inside', textinfo='percent+label')\nfig.show()","metadata":{"id":"DKPcfSo3TNPu","colab":{"height":542,"base_uri":"https://localhost:8080/"},"cell_id":"b33551a4603746d290b9210a0c1db7c8","outputId":"07278e34-1336-4dd5-9262-4237eed55db4","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":18,"user_tz":180,"timestamp":1647184372796},"deepnote_cell_type":"code"},"outputs":[{"output_type":"display_data","data":{"text/html":"\n\n\n
\n
\n\n"},"metadata":{}}],"execution_count":18},{"cell_type":"markdown","source":"# Measures of central tendency \n\n**Mean** \n$$\bar{x} =\frac{\sum_{i=1}^n x_i}{n}$$\n\nFor grouped discrete data:\n\n$$\bar{x} =\sum_j x_j fr(x_j)$$\nFor data grouped into classes, the mean is computed by assuming that all the observations in each class are equal to the class midpoint. Calling these midpoints $m_j$ and the relative frequency of class $j$ $fr(m_j)$, the formula reduces to:\n$$\bar{x} =\sum_j m_j fr(m_j)$$\n\n**Geometric mean**\nWidely used for interest rates and other financial quantities.\n\n$$B=\sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$\n\n**Harmonic mean**\nCommonly used to average quantities that relate space and time, such as speeds.\n\n$$C= \frac{n}{\sum_{i=1}^n \frac{1}{x_i}}$$\n\nFor positive data, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, with equality only when all the values are equal:\n\n$$C \leq B \leq \bar{x}$$\n\n**Trimmed mean:** The mean computed after removing a given percentage of the ordered data from both the lower and the upper end.\n\n**Median and mode**\nThe median is a value such that, with the data sorted by magnitude, 50% of the observations are below it and 50% are above it. 
Thus, for sorted ungrouped data, the median is the middle value if the number of observations is odd, or the mean of the two middle values if it is even.\n\nFor grouped discrete data, the median is taken to be the value $x_m$ such that\n$$fr(x\leq x_{m-1}) <0.5$$\n$$fr(x\leq x_m) >0.5$$\nwhere $fr(x\leq x_j)$ is the cumulative relative frequency up to $x_j$.\n\nThe mode is simply the most frequent value.\n\n\n\n","metadata":{"id":"5dKt5_iMUaoW","cell_id":"47f3942d7ccc48c581b4feabafcaba96","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import scipy.stats  # import the submodule explicitly so that scipy.stats is available\nscipy.stats.describe(monthly_accidents)","metadata":{"id":"Yj_tYHNVVXZd","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"b83ac25cf6fe46ceb056f9ac8c6dcaa7","outputId":"8a8c2895-0641-4cc4-ee37-88851523ecdd","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":275,"user_tz":180,"timestamp":1647184529757},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"DescribeResult(nobs=20, minmax=(8466, 13438), mean=11926.1, variance=1518605.3578947366, skewness=-1.160513548007565, kurtosis=1.140580420470969)"},"metadata":{},"execution_count":19}],"execution_count":19},{"cell_type":"code","source":"scipy.stats.gmean(monthly_accidents) # Geometric mean","metadata":{"id":"Ap-ftqQeYo6B","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"ce63eff9a700450798de251c7447e1a2","outputId":"ae27a4da-200b-4534-f4fb-f706448b0bbf","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos 
Usta"},"status":"ok","elapsed":306,"user_tz":180,"timestamp":1647184530470},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"11859.492451965642"},"metadata":{},"execution_count":20}],"execution_count":20},{"cell_type":"code","source":"scipy.stats.hmean(monthly_accidents) # Harmonic mean","metadata":{"id":"Zjbo6DwyYun2","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"7adfde0ca61640dda850463a313df8b4","outputId":"2ab88692-166e-4a78-eecd-90dad9f9b987","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":5,"user_tz":180,"timestamp":1647184531590},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"11785.837775632142"},"metadata":{},"execution_count":21}],"execution_count":21},{"cell_type":"code","source":"scipy.stats.trim_mean(monthly_accidents,0.1) # Trimmed mean (10% of the observations removed from each tail)","metadata":{"id":"bUFcmrl9ZfDk","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"868a3c2caaa74f699e999b264dfec449","outputId":"06201734-e0b5-400c-c5dc-a78442f92bc7","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":477,"user_tz":180,"timestamp":1647184532518},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"12060.75"},"metadata":{},"execution_count":22}],"execution_count":22},{"cell_type":"code","source":"scipy.stats.mode(monthly_accidents) # 
Mode","metadata":{"id":"urL1Q_BdgF3y","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"17bf158a9e6c410d990a6f1c451d1072","outputId":"e3e62d98-fc45-43ec-f02b-f2dfc678c486","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":5,"user_tz":180,"timestamp":1647184532785},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"ModeResult(mode=array([8466]), count=array([1]))"},"metadata":{},"execution_count":23}],"execution_count":23},{"cell_type":"markdown","source":"# Measures of dispersion\n\n**Standard deviation** \n\nSummarizes how far the data typically lie from the mean: it is the square root of the average squared deviation (with $n-1$ in the denominator for a sample).\n\n$$s=\sqrt{\frac{\sum_{i=1}^n (x_i -\bar{x})^2}{n-1}}$$\n\nFor grouped data:\n\n$$s=\sqrt{\sum_j (x_j -\bar{x})^2 fr(x_j)}$$\n\nThe joint information provided by the mean and the standard deviation can be made precise as follows: the interval extending $k$ standard deviations on either side of the mean contains at least\n\n$$100(1-\frac{1}{k^2})\%$$ of the observations.\n\nFor two standard deviations:\n\n$$100(1-\frac{1}{2^2})\% = 75\% $$\n\nThis is known as **Chebyshev's inequality**.\n\n\n**Coefficient of variation**\n\nThe coefficient of variation, $CV=\frac{s}{|\bar{x}|}$, is a relative measure of variability. In engineering, the inverse coefficient, $\frac{|\bar{x}|}{s}$, is widely used and is known as the signal-to-noise ratio.\n\nFor positive data from a homogeneous population, the coefficient of variation is typically less than one. 
If the coefficient is greater than 1.5, it is worth investigating possible sources of heterogeneity in the data (measurements taken with different instruments, on people of different sex, at different points in time, etc.).\n\n**Median absolute deviation**\nThe median absolute deviation (MAD) has the advantage, like the median, of not being affected by extreme values.\n\n$$MAD= median|x_i - median(x)| $$\n\n**Range:** The range of a variable is the difference between its maximum and minimum values.\n\n$$Range = Max(X)- Min(X)$$\n\n\nThe **percentile** $p$ is the smallest value that is greater than $p\%$ of the data. For example, if the number of observations is odd, the median is the 50th percentile. \n\nThe **quartiles** are the values that divide the distribution into four equal parts. The first quartile, $Q_1$, is by definition equal to the 25th percentile, the second is the median, and the third, $Q_3$, is the 75th percentile. Percentiles and quartiles are used to build measures of dispersion based on the ordered data, such as the **interquartile range (IQR)**, which is the difference between the 75th and 25th percentiles.\n\n$$IQR= P_{75} -P_{25}$$\n\n**Standard error**\nThe sample standard deviation divided by the square root of the sample size (assuming the sample values are statistically independent).\n\n$$SE= \frac{s}{\sqrt{n}}$$\n\n","metadata":{"id":"gjOb8LLNbc7u","cell_id":"5e187b97f33749fa9ce4bfe285d343e5","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"monthly_accidents","metadata":{"id":"3G9hl3WdaKu8","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"bd23a09ced7646509a00ff854a8d273a","outputId":"22bedc31-0fb5-48af-f4f4-b7e244da5b7d","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos 
Usta"},"status":"ok","elapsed":6,"user_tz":180,"timestamp":1647184535584},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"DATE\n2018-01 11735\n2018-02 10395\n2018-03 12519\n2018-04 11679\n2018-05 13438\n2018-06 13314\n2018-07 12787\n2018-08 12644\n2018-09 12425\n2018-10 13336\n2018-11 12447\n2018-12 12479\n2019-01 11000\n2019-02 10310\n2019-03 11482\n2019-04 10833\n2019-05 12642\n2019-06 12577\n2019-07 12014\n2019-08 8466\nFreq: M, dtype: int64"},"metadata":{},"execution_count":24}],"execution_count":24},{"cell_type":"code","source":"scipy.stats.describe(monthly_accidents) # Summary statistics: n, min/max, mean, variance, skewness, kurtosis","metadata":{"id":"ukBuN9BzgAOT","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"bc89342545cf48be874775fa237cc7a1","outputId":"4451e85d-6a69-43e6-f336-6a2a4c988c4d","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":436,"user_tz":180,"timestamp":1647184537150},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"DescribeResult(nobs=20, minmax=(8466, 13438), mean=11926.1, variance=1518605.3578947366, skewness=-1.160513548007565, kurtosis=1.140580420470969)"},"metadata":{},"execution_count":25}],"execution_count":25},{"cell_type":"code","source":"scipy.stats.variation(monthly_accidents) # Coefficient of variation (uses ddof=0 by default, i.e. divides by n rather than n-1)","metadata":{"id":"894Ei7kkfwdK","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"0a6153ce9a5443248bfe2a08c6d596e3","outputId":"fe997bc6-bf02-4d33-bf82-38bbb92bf3f0","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos 
Usta"},"status":"ok","elapsed":4,"user_tz":180,"timestamp":1647184537434},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"0.10071306660647113"},"metadata":{},"execution_count":26}],"execution_count":26},{"cell_type":"code","source":"scipy.stats.iqr(monthly_accidents) # Compute the IQR","metadata":{"id":"MRR6jFzUgOfb","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"b1b2c260f0ea4bb1a460262810764d0b","outputId":"66a68e7d-6455-49a2-db5b-7e9dda32ae38","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":4,"user_tz":180,"timestamp":1647184538420},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"1281.0"},"metadata":{},"execution_count":27}],"execution_count":27},{"cell_type":"code","source":"scipy.stats.sem(monthly_accidents) # Compute the standard error of the mean","metadata":{"id":"VS-XCLp8gVHH","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"15f39ebeebb949f592fef80a9abfef43","outputId":"43c5748d-8cf6-482c-a7f4-a4cc4bf250d8","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":6,"user_tz":180,"timestamp":1647184538970},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"275.5544735523937"},"metadata":{},"execution_count":28}],"execution_count":28},{"cell_type":"markdown","source":"# Measures of skewness and kurtosis\n\nThese measures describe two important aspects of the shape of the distribution: its degree of asymmetry and its degree of homogeneity. 
Being measures of shape, they do not depend on the units of measurement of the data\n\n**Skewness** In a data set that is symmetric about its mean $\\bar{x}$, the sum $\\sum (x-\\bar{x})^3$ is zero, whereas for asymmetric data the magnitude of this sum grows with the asymmetry.\nTo obtain a dimensionless measure, the coefficient of skewness is defined as:\n\n$$CA=\\frac{\\sum_{i=1}^n (x_i -\\bar{x})^3}{ns^3}$$\n\nwhere $s$ is the standard deviation.\n\nThe sign of the coefficient of skewness indicates the shape of the distribution. \n1. If the coefficient is negative, the distribution has a longer tail toward values below the mean\n\n2. If the coefficient is positive, the tail of the distribution extends toward values above the mean\n\n\n\n","metadata":{"id":"aZ2PBUf0gvzh","cell_id":"e1fb1ff6defd425a9b38dde2dac6ee2b","deepnote_cell_type":"markdown"}},{"cell_type":"markdown","source":"**Kurtosis** describes how the relative frequency is distributed between the center and the tails\n\n\n$$CA_p=\\frac{\\sum_{i=1}^n (x_i -\\bar{x})^4}{ns^4}$$\n\nThis coefficient is always greater than or equal to one. The coefficient of kurtosis is important because it tells us about the heterogeneity of the distribution.\n\n1. If it is very low (below 2), it indicates a mixture of distributions\n2. 
If it is very high (above 6), it indicates the presence of extreme outlying values.\n","metadata":{"id":"DBRyor7xh22P","cell_id":"09b32ea1e7e946c39632673a5c2437d1","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"scipy.stats.skew(monthly_accidents) # Compute the coefficient of skewness CA","metadata":{"id":"rUZG-NhZhXZk","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"1774a07e0db448bba8699b61f16ed1f4","outputId":"1a67997e-4961-44fd-cdee-e9a0974292a7","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":455,"user_tz":180,"timestamp":1647184542403},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"-1.160513548007565"},"metadata":{},"execution_count":29}],"execution_count":29},{"cell_type":"code","source":"scipy.stats.kurtosis(monthly_accidents) # Excess kurtosis, CA_p - 3 (fisher=True is the default); pass fisher=False to get CA_p","metadata":{"id":"e94m4q1zgvPG","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"96f17c5ff9714502a3916bba06759a97","outputId":"dcc43176-7983-4c9a-82d3-4e12823b16b8","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos 
Usta"},"status":"ok","elapsed":4,"user_tz":180,"timestamp":1647184542752},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"1.140580420470969"},"metadata":{},"execution_count":30}],"execution_count":30},{"cell_type":"code","source":"plt.hist(monthly_accidents)","metadata":{"id":"mhji3Yi4jWqJ","colab":{"height":334,"base_uri":"https://localhost:8080/"},"cell_id":"5cf9c44ec340426aa0d723c36ce0c687","outputId":"ed0bc530-d953-4ee5-a8e2-127ea3a0dea7","executionInfo":{"user":{"userId":"09471607480253994520","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjvGjd5VpSUEHTxlxXRYAinh8eCspL5nxvcW9wD=s64","displayName":"David Francisco Bustos Usta"},"status":"ok","elapsed":299,"user_tz":180,"timestamp":1647184543318},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"(array([1., 0., 0., 2., 1., 1., 3., 2., 7., 3.]),\n array([ 8466. , 8963.2, 9460.4, 9957.6, 10454.8, 10952. , 11449.2,\n 11946.4, 12443.6, 12940.8, 13438. ]),\n )"},"metadata":{},"execution_count":31},{"output_type":"display_data","data":{"text/plain":"