{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "DQ_birthdates_basic_stats_v1.1\n" ], "metadata": { "id": "wO2KsMTUPxWr" } }, { "cell_type": "markdown", "source": [ "**Analyzing the data quality of dates using basic statistical methods (with Python)**\n", "A user-friendly approach to detect data quality issues and outliers in dates" ], "metadata": { "id": "-QF90t-BeCk5" } }, { "cell_type": "markdown", "source": [ "\n", "The main problems that cause data quality defects are related to faulty data entry applications, incorrect migration of data from legacy systems, degradation of information systems, process integration problems, etc.\n", "\n", "All of them degrade the information, generate high management costs at the operational level, and have a negative impact on information analysis and decision making.\n", "\n", "To detect possible data quality problems in the imported dataset, we recommend the following steps:\n", "\n", "1. Define the use case to analyze\n", "2. Explore dataset\n", "3. Detection and cleansing of main technical errors\n", "4. Analysis of distribution and frequencies\n", "\n", "\n", "**1. Define the use case to analyze**\n", "It is important to keep in mind the use case behind the data, because the analysis must apply functional criteria as well as technical ones. An example is the analysis of the dates of birth of customers of an online store: customer ages should fall between 18 and 100 years, and any record outside this range is likely to be an error.\n", "\n", "\n", "**2. 
Explore dataset**\n", "Once the use case is defined, we import the dates-of-birth dataset from a public GitHub repository. After loading it, we print the dataset information and a preview of the first rows, which show that the dataset contains two columns: one with ids and the other with dates of birth in datetime format.\n", "\n", "Additionally, we obtain the minimum and maximum dates to get an idea of the range of dates we are working with." ], "metadata": { "id": "GkJfb_MXXZb3" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "\n", "# Import the birthdates dataset from GitHub\n", "url = \"https://raw.githubusercontent.com/mabrotons/datasets/master/birthdates.csv\"\n", "\n", "\n", "df = pd.read_csv(url, index_col=0, parse_dates=['birthdates'])\n", "\n", "print(\"Data frame info: \") \n", "print(df.info())\n", "\n", "print(\"\\nData frame head: \") \n", "print(df.head())\n", "\n", "dates = df['birthdates']\n", "print(\"Min date: \" + str(min(dates)))\n", "print(\"Max date: \" + str(max(dates)))" ], "metadata": { "id": "DutKSKQ9XISC", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "ed4f0f1b-862d-4022-e612-4eac2652ba60" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Data frame info: \n", "\n", "Int64Index: 2103 entries, 0 to 2102\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 ids 2103 non-null object \n", " 1 birthdates 2103 non-null datetime64[ns]\n", "dtypes: datetime64[ns](1), object(1)\n", "memory usage: 49.3+ KB\n", "None\n", "\n", "Data frame head: \n", " ids birthdates\n", "0 C0000000001 1948-01-16\n", "1 C0000000002 1988-03-11\n", "2 C0000000003 1999-05-11\n", "3 C0000000004 1997-06-01\n", "4 C0000000005 1967-12-10\n", "Min date: 1801-09-03 00:00:00\n", "Max date: 2029-10-15 00:00:00\n" ] } ] }, { "cell_type": "markdown", "source": [ "**3. 
Detection and cleansing of main technical errors**\n", "\n", "As a first step of the analysis, we plot a histogram of every date in the dataset to get a first view of its distribution and range, and to spot possible outliers." ], "metadata": { "id": "_TD1njAdXXrb" } }, { "cell_type": "code", "source": [ "import matplotlib.pyplot as plt\n", "\n", "f = plt.figure()\n", "f.set_figwidth(15)\n", "f.set_figheight(5)\n", "\n", "plt.hist(dates, bins=50, edgecolor='black')\n", "plt.xticks(rotation=30)\n", "plt.title(\"Dates\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 347 }, "id": "drxqdxn1vcfy", "outputId": "689967e8-ed14-400f-9e45-945272b8eea3" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3YAAAFKCAYAAABRis1yAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAf/ElEQVR4nO3deZhlV1kv4N9HOgQwQIA0oU2nE4YAAkqQJswyiQaMT+BeQEAhAhqQQfDCNYjXK4gBIjMyaBguoELMBZEQeYAIKDMkYUpC5BKB0Gm6STOFyUCG7/6xd0tZ6aRr7Opd9b7Pc546Z529qtbp/urU+e299trV3QEAAGC6rrHSAwAAAGBxBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wA2DyquqrVfUfVfX9qvpuVX2sqp5QVbv9O1dVh1VVV9W6PTFWAFgOgh0Aq8Wvd/d1kxya5AVJjk/y+pUdEgDsGYIdAKtKd1/c3acm+Y0kx1bV7arq16rqM1X1varaUlXPntHlQ+PX71bVD6rqrklSVY+tqvOq6jtV9d6qOnRsr6p6aVVdNH6/s6vqdnv0RQLALIIdAKtSd38qyYVJ7pnkh0keneSAJL+W5Peq6kHjpr80fj2gu/fv7o9X1TFJnpXkvyVZn+TDSd46bvcrY59bJrl+kocl+dbyvyIAuGqCHQCr2deT3LC7/6W7z+7uK7r78xlC2r2upt8Tkjy/u8/r7suSPC/JEeNRu0uTXDfJrZPUuM22ZX4dAHC1BDsAVrODk3y7qu5cVR+sqh1VdXGG4Hbg1fQ7NMnLx4VYvpvk20kqycHd/YEkr0zyqiQXVdVJVXW9ZX4dAHC1BDsAVqWqulOGYPeRJG9JcmqSQ7r7+kn+KkNQS5LeRfctSR7f3QfMuF27uz+WJN39iu6+Y5LbZJiS+T+X+eUAwNUS7ABYVarqelV1dJKTk/xtd5+dYerkt7v7kqo6MskjZ3TZkeSKJDeb0fZXSf6oqm47fs/rV9VDx/t3Go8A7pvh3L1Lxv4AsGJcsweA1eJdVXVZhpD1hSQvyRDQkuSJSV5cVa9M8q9JTsmwkEq6+0dVdUKSj45h7ajufkdV7Z/k5PG8uouTnJ7k/ya5XpKXZgiClyR5b5IX7qHXCAC7VN27moECAADAVJiKCQAAMHGCHQAAwMTtNthV1bWq6lNV9bmqOreqnjO237SqPllV51fV31fVNcf2/cbH54/PH7a8LwEAAGBtm8sRux8nuW933z7JEUmOqqq7JDkxyUu7+xZJvpPkceP2j0vynbH9peN2AAAALJPdBrse/GB8uO946yT3TfK2sf1NSR403j9mfJzx+ftV1c5rBQEAALDE5nS5g6raJ8lZSW6R5FVJ/j3Jd7v7snGTCzNcBDbj1y1J0t2XVdXFSW6U5JtX9f0PPPDAPuywwxYyfgAAgMk766yzvtnd6xfaf07BrrsvT3JEVR2Q5B1Jbr3QH7hTVR2X5Lgk2bRpU84888zFfksAAIBJqqoLFtN/Xqtidvd3k3wwyV2THFBVO4PhxiRbx/tbkxwyDm5dkusn+dYuvtdJ3b25uzevX7/gYAoAALDmzWVVzPXjkbpU1bWT3D/JeRkC3kPGzY5N8s7x/qnj44zPf6BdBR0AAGDZzGUq5oYkbxrPs7tGklO6+7Sq+kKSk6vqz5N8Jsnrx+1fn+Rvqur8JN9O8vBlGDcAAACj3Qa77v58kjvsov3LSY7cRfslSR66JKMDAABgt+Z1jh0AAAB7H8EOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAYAls2LgpVbWg24aNm1Z6+Ezcbi9QDgAA7N72rVty6PGnLajvBScevcSjYa1xxA4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBD
gAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYuN0Gu6o6pKo+WFVfqKpzq+qpY/uzq2prVX12vD1wRp8/qqrzq+qLVfWry/kCAAAA1rp1c9jmsiRP7+5PV9V1k5xVVaePz720u180c+Oquk2Shye5bZKfTfLPVXXL7r58KQcOAADAYLdH7Lp7W3d/erz//STnJTn4arock+Tk7v5xd38lyflJjlyKwQIAAHBl8zrHrqoOS3KHJJ8cm55cVZ+vqjdU1Q3GtoOTbJnR7cLsIghW1XFVdWZVnbljx455DxwAAIDBnINdVe2f5O1Jntbd30vymiQ3T3JEkm1JXjyfH9zdJ3X35u7evH79+vl0BQAAYIY5Bbuq2jdDqPu77v6HJOnub3T35d19RZLX5qfTLbcmOWRG941jGwAAAMtgLqtiVpLXJzmvu18yo33DjM0enOSc8f6pSR5eVftV1U2THJ7kU0s3ZAAAAGaay6qYd0/yqCRnV9Vnx7ZnJXlEVR2RpJN8Ncnjk6S7z62qU5J8IcOKmk+yIiYAAMDy2W2w6+6PJKldPPXuq+lzQpITFjEuAAAA5mheq2ICAACw9xHsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4nYb7KrqkKr6YFV9oarOraqnju03rKrTq+pL49cbjO1VVa+oqvOr6vNV9YvL/SIAAADWsrkcsbssydO7+zZJ7pLkSVV1myTPTPL+7j48yfvHx0nygCSHj7fjkrxmyUcNAADAf9ptsOvubd396fH+95Ocl+TgJMckedO42ZuSPGi8f0ySN/fgE0kOqKoNSz5yAAAAkszzHLuqOizJHZJ8MslB3b1tfGp7koPG+wcn2TKj24Vj2+zvdVxVnVlVZ+7YsWOewwYAAGCnOQe7qto/yduTPK27vzfzue7uJD2fH9zdJ3X35u7evH79+vl0BQAAYIY5Bbuq2jdDqPu77v6HsfkbO6dYjl8vGtu3JjlkRveNYxsAAADLYC6rYlaS1yc5r7tfMuOpU5McO94/Nsk7Z7Q/elwd8y5JLp4xZRMAAIAltm4O29w9yaOSnF1Vnx3bnpXkBUlOqarHJbkgycPG596d5IFJzk/yoySPWdIRAwAA8F/sNth190eS1FU8fb9dbN9JnrTIcQEAADBH81oVEwAAgL2PYAcAADBxgh0AAMDECXYAAAATJ9gBAABMnGAHAAAwcYIdAADAxAl2AAAAEyfYAQAATJxgBwAAMHGCHQAAwMQJdgAAABMn2AEAAEycYAcAADBxgh0AAMDECXYAAAATJ9gBAABMnGAHAAAwcYIdAADAxAl2AAAAEyfYAQAATJxgBwAAMHGCHQAAwMQJdgAAABMn2AEAAEycYAcAADBxgh0AAMDECXYAAAATJ9gBAABMnGAHAAAwcYIdAADAxAl2AAAAEyfYAQAATJxgBwAAMHGCHQAAwMQJdgAAABO322BXVW+oqouq6pwZbc+uqq1V9dnx9sAZz/1RVZ1fVV+sql9droEDAAAwmMsRuzcmOWoX7S/t7iPG27uTpKpuk+ThSW479nl1Ve2zVIMFAADgynYb7Lr7Q0m+Pcfvd0ySk7v7x939lSTnJzlyEeMDAABgNxZzjt2Tq
+rz41TNG4xtByfZMmObC8e2K6mq46rqzKo6c8eOHYsYBgAAwNq20GD3miQ3T3JEkm1JXjzfb9DdJ3X35u7evH79+gUOAwAAgAUFu+7+Rndf3t1XJHltfjrdcmuSQ2ZsunFsAwAAYJksKNhV1YYZDx+cZOeKmacmeXhV7VdVN01yeJJPLW6IAAAAXJ11u9ugqt6a5N5JDqyqC5P8aZJ7V9URSTrJV5M8Pkm6+9yqOiXJF5JcluRJ3X358gwdAACAZA7BrrsfsYvm11/N9ickOWExgwIAAGDuFrMqJgAAAHsBwQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAEiyYeOmVNWCb7CS1q30AAAAYKYNGzdl+9YtC+p7k4MPybYLv7agvtu3bsmhx5+2oL5JcsGJRy+4LyyWYAcAwF5lMQFLuGKtMhUTAIAlt5hpjcD8OWIHAMCSc9QN9ixH7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAFapxV5we91+17ayJUyEVTEBAFappbjgtpUtYRocsQMAmIPFHP3asHHTSg8fWOUcsQMAmAPXZZuIffY1FZQ1SbADAGD1uPxSAZw1yVRMAIC92GKmgAJrhyN2AMCasGHjpmzfumWlhzFvpoACcyHYAQB71GIC1k0OPiTbLvzagvouxQqRAHsrwQ4A2KMcgQJYes6xAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZut8Guqt5QVRdV1Tkz2m5YVadX1ZfGrzcY26uqXlFV51fV56vqF5dz8AAAAMztiN0bkxw1q+2ZSd7f3Ycnef/4OEkekOTw8XZcktcszTABAAC4KrsNdt39oSTfntV8TJI3jffflORBM9rf3INPJDmgqjYs1WABAAC4soWeY3dQd28b729PctB4/+AkM684euHYdiVVdVxVnVlVZ+7YsWOBwwAAAGDRi6d0dyfpBfQ7qbs3d/fm9evXL3YYAAAAa9a6Bfb7RlVt6O5t41TLi8b2rUkOmbHdxrENAGDx9tk3VbXSowDY6yw02J2a5NgkLxi/vnNG+5Or6uQkd05y8YwpmwAAi3P5pTn0+NMW1PWCE49e4sEA7D12G+yq6q1J7p3kwKq6MMmfZgh0p1TV45JckORh4+bvTvLAJOcn+VGSxyzDmAEAAJhht8Guux9xFU/dbxfbdpInLXZQAAAAzN2iF08BAABgZQl2AAAAEyfYAQAATJxgBwAAMHELvdwBAABz5fp7wDIT7AAAlpvr7wHLzFRMAACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwDmbcPGTamqBd0AWHrrVnoAAMD0bN+6JYcef9qC+l5w4tFLPBoAHLEDAACYOMEOAABg4gQ7AACAiRPsAAAAJk6wAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJg4wQ4AAGDiBDsAAICJE+wAAAAmTrADAACYOMEOAAAmbMPGTamqBd02bNy00sNniaxbTOeq+mqS7ye5PMll3b25qm6Y5O+THJbkq0ke1t3fWdwwAQCAXdm+dUsOPf60BfW94MSjl3g0rJSlOGJ3n+4+ors3j4+fmeT93X14kvePjwEAAFgmyzEV85gkbxrvvynJg5bhZwAAADBabLDrJO+rqrOq6rix7aDu3jbe357koF11rKrjqurMqjpzx44dixwGAADA2rWoc+yS3KO7t1bVjZOcXlX/NvPJ7u6q6l117O6TkpyUJJs3b97lNgAAAOzeoo7YdffW8etFSd6R5Mgk36iqDUkyfr1osYMEAADgqi042FXVz1TVdXfeT/IrSc5JcmqSY8fNjk3yzsUOEgAAgKu2mKmYByV5R
1Xt/D5v6e73VNUZSU6pqscluSDJwxY/TAAAAK7KgoNdd385ye130f6tJPdbzKAAAACYu+W43AEAAAB7kGAHAAAwcYIdAADAxAl2ALBIGzZuSlUt6LZh46aVHj4Aq8BiL1AOAGve9q1bcujxpy2o7wUnHr3EowFgLXLEDgAAYOIEOwAAgIkT7ABgJe2z74LPz1u337UX3Nf5fUCSRb0HeQ/ZuzjHDgBW0uWXLur8vIX23dkfWOMW+R7E3sMROwAAYEGsCrz3cMQOAABYEKsC7z0EOwBYq8ZzawCYPsEOANYq59YArBrOsQMAAPY8K3IuKUfsAACAPc+sgSUl2AEAwEpzziuLJNgBAMBKc/SKRXKOHQAAwMQJdgAAABMn2AEAAEycYAcAADBxgh0AAMDECXYAAAATJ9gBAABMnGAHAAAwcYIdAADAxAl2AAAAEyfYAQAATJxgBwAAMHGCHQAAwMQJdgAAABMn2AEAAEycYAcAADBxgh0AAMDECXZXY8PGTamqBd3W7XftBfetqmzYuGmlX/68Lebfa6qveTH8ewEAsFTWrfQA9mbbt27JoceftqC+F5x49IL7JskFL3pwqmpBffe55rVy+U8u2eN9kyzuNZ949IL7TtFi6itZe/9eAABctWULdlV1VJKXJ9knyeu6+wXL9bNWpcsvXZFQudi+i7LPvgsOszc5+JBsu/Bri/v5AAAwUcsS7KpqnySvSnL/JBcmOaOqTu3uLyzHz2OVWGSYXagNGzdl+9YtC+orUAIArAAHBK5kuY7YHZnk/O7+cpJU1clJjkki2LE8FvHLnSx8CulipswCALBAK3RAYG9W3b3037TqIUmO6u7fGR8/Ksmdu/vJM7Y5Lslx48NbJfnikg9kfg5M8s0VHgN7FzXBrqgLZlMTzKYm2BV1wWyza+LQ7l6/0G+2YoundPdJSU5aqZ8/W1Wd2d2bV3oc7D3UBLuiLphNTTCbmmBX1AWzLXVNLNflDrYmOWTG441jGwAAAEtsuYLdGUkOr6qbVtU1kzw8yanL9LMAAADWtGWZitndl1XVk5O8N8PlDt7Q3ecux89aQnvNtFD2GmqCXVEXzKYmmE1NsCvqgtmWtCaWZfEUAAAA9pzlmooJAADAHiLYAQAATJxgBwAAK6yqaqXHwN5nPnUh2F2FqrpZVW1c6XGwd1EXzKYm2B0f1phNTZAkVXWPqnpNVT0xSdrCF2RxdSHYzVJV16yqNyZ5T5K/qarHVtW1x+e8Ea9R6oLZ1ARXp6puW1X3TnxYY6AmmKmqfjHJa5KcleSBVfXSqjpihYfFCltsXQh2V3b7JPt39y2T/K8kv5TkUVW1rzfiNe2IqAv+KzXBlVTVNarq1UnenuRZVfXcqtq887mVHR0rQU1wFY5MckZ3vy7J7yT5UYYP8geu7LBYYXfKIurCG0qSqto4Yw/7PkluUVXV3R/NsDf+1knuuWIDZEVU1Z2q6kYzmtTFGldVP1dVG8aH14ia4MoOyBD4b53kN5N8K8nTq2r/7r5iZYfGCrlB1MSaV1UPq6r/UVV3G5s+nWT/qrpJd29P8oEk65PcY8UGyR5XVXerqndU1W3Hps9kEXWxpoNdVW2qqg8keUuSN1bVTZN8OcmHkhw1bva+JN9L8vNVtd/KjJQ9raruk+STSY6qqmsm2Z7kw0l+ddxEXawhVXWLqnpXktcmedf4BvzFJB+JmljzqurQqrrW+PBGSe5WVT/T3TsyHKX5TpInj9uaprsGVNVDdp4fk+R6URNrVlXtU1X/O8nxY9NfV9WvJ/lhkq8mudfY/q9Jvptk49hPXawNt09yuyR3qqrrZDhCd2GGWUDJPOtiTQe7JL+X5BPd/UsZPri/MMnPJNmW5I5VdWB3fzvJvye5e3f/2C/amnFYhr1pt0hy8wx7WLdn+MW7kbpYc56b5KzuvkeGgP+UDG++26Im1qyquk1V/WOSNyY5tapu1d1fSvKJJE8bN9uW4YP8EVW1wTTd1a2q9
q+qtyd5RpLvVNW67v5Kko9GTaxJ3X15klsleXp3vyTJczKE+nVJvp6hDm7T3Zdl2GH44LGfulgbbpjk3CSbM4S8c5NckCGHzLsu1lywq6qbVNW+M5q2J0l3H59k3yR3z/CB/noZpkwkyTuT3KiqrucXbXUa62Lm78PXk7wtyaFJ7tPdP0zywSTXTfKocRt1sYqNNbFuPPr2nSTnjU91hpOar53k3UmuHzWxZuwM7FV16wwnuH+wu++T5Owkrxw3e32Su1fVTcc/yt9IckmS66zAkFlms3biHJLkG919l+5+a5LLx/Y3ZqiJm6mJ1a+qHl1V96qqA8ambyS5wRj035ZhJ+D9M0yzuyTJn4/bHZzkjKpat8cHzbKbURfXHx/vk+Fo3PMz7Cw+MkPg//jYfsLYdc51sWaCXVXdr6o+nORVSV4xNn8/yeVVdb3x8auTPDLJ55L8Y5LfrarnZ/gH/mSGw+asIrPq4tUznrp7hj0kL8swte53k1wrybuSPE5drF4zauLVSV7R3T/O8Ef4gVV1dpL7Ztj7+oEkP8nwXqEm1o6dUy4vTvLM7n75+PjPklynqtYnOSPDDsK/SJLuPifDTqIf7+Gxsmdca8b9X8hPp0w9McmfVtU9MuyF/2iSFyVqYjWqwYaq+mCSYzMcHHhVVe2f5JtJfj7J/uPmr0jyWxl2AjwnyXer6p+SPDzJ68bwzyqwi7p4ZJJXV9X68WjuHTLsMH5ZhqNyZ2U4L/eFGY76z6su1kSwq6pbJnlekpdnmH55s6q6Q5JPJblfhj1s6e73Jrlmkod298eS/EaSLyX50+5+1vgfwCqxi7rYVFX3H58+N8Mb8Q+S/HKSlyZZ190fzvALpi5WoVk18YQkN6+qu3b3izO86X6hu4/o7mck+WySR3b3R6ImVr2qun9VnZ7khVX1sO7e1t0fn3G05ueTXNLdO7r7BxmC3sFV9ZdVdU6GqTUXm6K7esyoib+oqkeMzZ9Osq2q3pDkrhn2uv9xkgdl+DuyvqpeqSZWl6raZ5ylcd0kW7v7fhk+V3wvQ4h7dZK7JfmFqrpOd/9bkv+X4UN+kjw+yW939526+/w9/wpYDldRF09M8u0krxs3+1qSAzN87jg8w/vCOWOIm3ddrNpDvTun1Y0rTh2R5FPd/bbx6Nz3k1zU3Z+pql9J8pCquqK7z0tycoapVenuczN8wGeV2E1d/DA/nW53RJJnZphu988Zzr38SVWVulhd5vBesWWcvt1JvlxVNxzPp3t7koeqidWvqm6RYarU8zL8EX56Vd2iu5+X4e/opRmm1O18/0h3/6SqHpThPN3Tu/vUPT9ylssuauIZVfWzGT6c/SDDghh37e5Lq+pbSe7Z3SdV1X/PUBPvUxPTN06le26Sfarq3RlO47k8Gc6tq6onZzin8sUZFup7eJINSf4+w/vGx8ZtL02yY4+/AJbFHOriqRl2AN0uyX4ZTv3530memmEH0D2r6u0LqYtVecSuqh6TYUWZ545Nn89wEuJrM5wHceMkL66qV2b4h9w/yQuq6g/Gx5/b86Nmuc2hLg7KsDf+ZRl+sT6V4Y/x72VYLfWgJPasriJzfK/4iyR/maEG7pph2uUTM0yTeK9z6VanGq49tvNv5J0zLJ7zzu7+TIZpuH9YVTce//AmwxTdT4x9/6SqNnb3Rd39MR/gV4c51MSfZPg88c4MR2oeNm77uSQ3rqprqInVo6rulZ9Omzs/w9+RS5Pcp6qOTP5z4ZTnJHlhd785w+rJj66qz2TYKXT2Soyd5TPHurhibH9+d/9xkp/t7hd199cznLv9rhl/W+b381fbZ5JxLvPfZljo4tgkj+juL47nPfx2kh9092tqWJp6a5JfHo/cPSLDYfKTe7gmFavIAuriqO4+Y0b//cZzrVgl5lkTX88Q6q6X5IEZliZ+UXd/ckUGz7IaA/8JSf5Pd/9xVf1Ckn9Jcsfu/kpVPT7JcUnO7e5Hj1PpTs8wHfc2GXYWHDcuu
sQqMMeaeEKGCwsfV1XHZNhRfEqG95c3JHlJhkXtVtcHrzWqqu6Z5LDu/pvx8aszBLX/SPKU7r7juCPgxhkWVvqD7t5SVTdJcp3u/vJKjZ3lM8+6+Mskx3f3l6tq34WGuZlW3RG78dyG3x9PaH9fkmePT30rw8WDzxm3uyTDtMuDx8dv7e6nCHWr0zzr4i0Zjs7NnKYn1K0yC3ivuFl3n9Hdz+nuhwp1q9MY+I9JcmKSB1TVrbv780nenOR5VfXRDBeh/+0MK6AePHa9YYZFdZ7W3b8p1K0e86iJRyfZWMOFhd+ZIeh9N8njxr3xVwh1q8pZSU4Zp90lw+I4m7r7jRmm4D1lPDKzMcml3b0lSbp7u1C3qs23Lr6c/Od03EVbdcEuSbr7a+Pdl2VY/OAB4z/i+UlOqqpbVdWzMqx86LyYNWIedXHPjHUxPs8qNY+auFuSf1upcbLn7CLwP2d86ulJnpRh7+pvZfjAviPDJXP2S/I73X3fcVoeq8g8a+Ki8WvGHUF/3d0fX4Fhs8y6+0fd/eP+6WJZ989Pz4d6TJKfq6rTkrw1w6I6rAHzrIvPJEt7MfpVNxVztnF6xG919z3Hxy/KcOLqNZL84c49KKwt6oLZ1ASzjVOmTk3yJ9393nGFs8vH507IcA7F77elydeMOdbEU9rKuGvGeGSmk/xThv/788fFdb6ZYdr+V7p760qOkT1vpepiVQe78UTlK6rqbRn2ov0ow3z3s7v7P1Z2dKwUdcFsaoKrMgb+R3b3vcbHR2ZYvn7fJI/t7u0rOT72PDXBTOPRlmtmWL7+HUkem2FK/1O6+3srOTZWzkrVxaoOdklSVddJ8p4MJ7T/WXe/YjddWAPUBbOpCWabFfi3ZbiY9D8n+VJ3//vKjo6VoCbYlaq6S4ZLF3wswwI7r1/hIbEXWIm6WLXXsZvhiRnmNt/fAhjMoC6YTU3wX4wf4K+TYfWye2cI/O9Z2VGxktQEV+HCDEdtX+LvBzPs8bpYC0fsrmEBDGZTF8ymJtiVqnpGhtXLjveBjURNAHuvVR/sAGChBH5mUxPA3kqwAwAAmLhVeR07AACAtUSwAwAAmDjBDgAAYOIEOwAAgIkT7AAAACZOsAMAAJi4/w88JnyCUrgvmgAAAABJRU5ErkJggg==\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Once the global vision of the date ranges from year 1801 to 2029 is available, we can see:\n", "- some of years are out of trustly range\n", "- the high density of dates is distributed between the years 1940 to 2000\n", "- the graphs show a clear error close to the year 2000\n", "\n", "If we focus on the outliers of the year 2000, we can detect that there is a possible problem with the date 01/01/2000 (dd/mm/YYYY) might be caused by a technical error or by using date file as a dummy date." 
], "metadata": { "id": "aeV_LtnMPy2t" } }, { "cell_type": "markdown", "source": [ "To confirm this, we zoom in on the dates between 15/12/1999 and 15/01/2000 (both excluded) and plot them on their own." ], "metadata": { "id": "1BUlZp4vW5BN" } }, { "cell_type": "code", "source": [ "from datetime import date\n", "\n", "# Zoom in on the weeks around 2000-01-01; pd.Timestamp bounds avoid comparing Timestamps with datetime.date\n", "window_start = pd.Timestamp(1999, 12, 15)\n", "window_end = pd.Timestamp(2000, 1, 15)\n", "dates_outlier = [item for item in dates if window_start < item < window_end]\n", "f = plt.figure()\n", "f.set_figwidth(15)\n", "f.set_figheight(5)\n", "plt.xticks(rotation=30)\n", "plt.hist(dates_outlier, bins=30, edgecolor='black')\n", "plt.title(\"Outliers\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 363 }, "id": "6O7_Xw6mZPJB", "outputId": "99070213-cc80-4575-f75c-23388080e83c" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Once the suspicious 01/01/2000 records are removed, we can plot the histogram again without them, which makes the distribution easier to read." ], "metadata": { "id": "t-l99hZCZj9F" } }, { "cell_type": "code", "source": [ "# Keep every date except the suspected dummy value 2000-01-01\n", "dummy_date = pd.Timestamp(2000, 1, 1)\n", "dates_clean = [item for item in dates if item != dummy_date]\n", "f = plt.figure()\n", "f.set_figwidth(15)\n", "f.set_figheight(5)\n", "plt.xticks(rotation=30)\n", "plt.hist(dates_clean, bins=40, edgecolor='black')\n", "plt.title(\"Dates without outliers\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 347 }, "id": "lDIKMcvkZup2", "outputId": "19ea8dee-2a20-41ab-a871-727b600b2989" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3YAAAFKCAYAAABRis1yAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de7xcVX338c/XJKCIgELAwCEB5WLRKmpAFFEUrahpoc+jCN5QsfFKteJjRJ9WrUVF8QKi1igUaCvIo1WQWhXvF+QS8AKI1BSJSSAQBLyAcgm/54+9o+Ph5HZmTib7nM/79TqvM7P23mvWnKzMzHevtdekqpAkSZIkddd9ht0ASZIkSVJ/DHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJHVekiuTHLiW7d9M8vKN2KQJk6SS7Nbe/uckfz/sNkmShs9gJ0lTXJJrk/wuyW+S3JrkgiSvTLJe7xFJdmnDxvSJbuuaVNXDq+qbbXvenuTfNsbjJjkwybIJrH+tgbSqXllV75yox5ckdYfBTpIE8JdV9QBgDvAeYAFwynCbpIkyzBAuSZoYBjtJ0h9U1a+q6lzgecCRSR4BkOTZSX6Q5NdJliZ5e89h325/35rkt0ke3x7zsiRXJbklyZeTzGnLk+SDSW5s67t89eP0SvKUJJf33D8/ySU997+T5ND29rVJnpbkYOAtwPPatvyop8o5Sb7Xjkx+Jcl2PXX9VTud89Z2lOzPerb9Yepje/+0JP+U5P7AfwE7to/12yQ7jvE8tk5yRpKVSZYk+b+rR0NHjy72jn4mOQ44ADi5rfvkMeo+Lck/9dyfl+SHPSOvj+zZdm2SBUl+DNzWPsaCJMvbv8nVSQ4a/RiSpG4w2EmS7qWqLgaW0QQLgNuAFwPbAM8GXrU6VAFPan9vU1VbVtX3kxxCE7D+FzAT+A5wZrvfX7TH7AFsDRwG/HKMZlwI7J5kuyQzgEfShKgHJLkfMLett7fdXwLeBXy6bcujejY/H3gpsD2wGfBGgCR7tG17fdvWLwJfSLLZOv5GtwHPBK5rH2vLqrpujF0/3D7PhwBPpvk7vnRtdbf1v7V9fq9t637t2vZP8mjgVOAVwLbAx4Fzk2zes9sRNP9+2wAPBV4L7NOO1j4DuHZd7ZIkbZoMdpKkNbkOeBBAVX2zqi6vqnuq6sc0QejJazn2lcC7q+qqqrqbJmzt3Y7a3QU8AHgYkHaf60dXUFW/Ay6hCYGPBX4EfA/YH9gP+FlVjRUI1+Rfquq/23rPBvZuy58H/GdVnV9VdwEnAPcDnrABdY8pyTTgcODYqvpNVV0LvB94Ub91j2E+8PGquqiqVlXV6cAdNH+r1U6qqqXt32AVsDmwV5IZVXVtVf3PBLRLkrQRGOwkSWuyE3AzQJLHJflGO53wVzTBbbu1HDsHOLGdEnhrW0+Anarq68DJwEeAG5MsTLLVGur5FnAgTbj7FvBNmkD55Pb+hljRc/t2YMv29o7AktUbquoeYCnN8+/XdsCM3vrb24Ooe7Q5wDGr/+bt331nmue32tLVN6pqMc0o5dtp/h3OGmsqqSSpGwx2kqR7SbIPTfj4blv0KeBcYOeq2hr4Z5qgBlBjVLEUeEVVbdPzc7+qugCgqk6qqscCe9FMyfw/a2jK6GD3LdYd7MZqz9pcRxOKgOYaQJpAtLwtuh3Yomf/B2/AY91EM0I5p6dsdk/dt62l7vWpv9dS4LhRf/MtqurMnn3+pL6q+lRVPbFtXwHHb8DjSZI2IQY7SdIfJNkqyTzgLODfqmr14iUPAG6uqt8n2ZfmerXVVgL30FxDtto/A8cmeXhb79ZJntve3qcdAZxBE2x+3x4/lguAPYF9gYur6kqaEPI4/rhoy2g3ALtkPb+ugWZa5rOTHNS26RiaKYwXtNt/CDw/ybR2cZbeKag3ANsm2XqsiqtqV
Vv/ce21gXOANwCrF0z5IfCkJLPbOo4d47k8hPXzCeCV7d82Se6fZtGbB4y1c5I9kzy1vQbv98DvWPO/gyRpE2ewkyRBs1jIb2hGfd4KfIA/XeDj1cA/tvv8A01YAaCqbgeOA77XTgHcr6o+RzP6c1aSXwNX0Cw0ArAVTQi5hWZa4i+B943VqHaBksuAK6vqzrb4+8CSqrpxDc/l/7W/f5nksnU98aq6GnghzSInNwF/SfP1D6sf73Vt2a3AC4DP9xz7U5rrDa9pn/tYUxmPpgmw19CMgH6KZpETqup84NPAj4FLgfNGHXsi8Jw0K4uetI7nsQj4G5pprrcAi4GXrOWQzWm+2uImmmmq23PvYClJ6ohUbeiMFUmSJEnSpsQRO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI6bPuwGAGy33Xa1yy67DLsZkiRJkjQUl1566U1VNXO8x28SwW6XXXZh0aJFw26GJEmSJA1FkiX9HO9UTEmSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSpL7NGplNkoH8zBqZPeynI3XO9GE3QJIkSd23YvlS5iw4byB1LTl+3kDqkaaSdY7YJTk1yY1JrhhVfnSSnya5Msl7e8qPTbI4ydVJnjERjZYkSZIk/dH6jNidBpwMnLG6IMlTgEOAR1XVHUm2b8v3Ag4HHg7sCHw1yR5VtWrQDZckSZIkNdY5YldV3wZuHlX8KuA9VXVHu8+NbfkhwFlVdUdV/RxYDOw7wPZKkiRJkkYZ7+IpewAHJLkoybeS7NOW7wQs7dlvWVsmSZIkSZog4108ZTrwIGA/YB/g7CQP2ZAKkswH5gPMnu3KR5IkSZI0XuMdsVsG/Ec1LgbuAbYDlgM79+w30pbdS1UtrKq5VTV35syZ42yGJEmSJGm8we7zwFMAkuwBbAbcBJwLHJ5k8yS7ArsDFw+ioZIkSZKksa1zKmaSM4EDge2SLAPeBpwKnNp+BcKdwJFVVcCVSc4GfgLcDbzGFTElSZIkaWKtM9hV1RFr2PTCNex/HHBcP42SJEmSJK2/8U7FlCRJkjZ5s0Zmk2QgP7NGXPBPm67xroopSZIkbfJWLF/KnAXnDaSuJcfPG0g90kRwxE6SJEmSOs5gJ0mSJEkd51RMSZIkbVqmzSDJsFshdYrBTpIkSZuWVXd5XZy0gZyKKUmSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEnSFDVrZDZJBvIjabimD7sBkiRJGo4Vy5cyZ8F5A6lryfHzBlKPpPFxxE6SJGkMgxzNmjUye9hPR9Ikt84RuySnAvOAG6vqEaO2HQOcAMysqpvSjMOfCDwLuB14SVVdNvhmS5IkTSxHsyR1yfqM2J0GHDy6MMnOwF8Av+gpfiawe/szH/hY/02UJEmSJK3NOoNdVX0buHmMTR8E3gRUT9khwBnVuBDYJsmsgbRUkiRJkjSmcV1jl+QQYHlV/WjUpp2ApT33l7VlY9UxP8miJItWrlw5nmZIkiRJG8+0GV53qU3WBq+KmWQL4C000zDHraoWAgsB5s6dW+vYXZIkSRquVXd53aU2WeP5uoOHArsCP2q/s2QEuCzJvsByYOeefUfaMkmSJEnSBNngqZhVdXlVbV9Vu1TVLjTTLR9TVSuAc4EXp7Ef8Kuqun6wTZYkSZIk9VpnsEtyJvB9YM8ky5IctZbdvwhcAywGPgG8eiCtlCRJkiSt0TqnYlbVEevYvkvP7QJe03+zJEmSJEnra1yrYkqSJEmSNh0GO0mSNGnMGpk9sOXoJalLx
rMqpiRJ0iZpxfKlLkcvaUpyxE6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSOsSVPyWNxVUxJUmSOsSVPyWNxRE7SZIkSeo4g50kSZIkdZzBTpIkSZI6zmAnSZIkSR1nsJMkSZKkjjPYSZIkSVLHGewkSZIkqeMMdpIkSZLUcQY7SZI0VLNGZpNkID+SNFVNH3YDJEnS1LZi+VLmLDhvIHUtOX7eQOqRpK5xxE6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSx60z2CU5NcmNSa7oKXtfkp8m+XGSzyXZpmfbsUkWJ7k6yTMmquGSJEmdMW2GK39KmlDrsyrmacDJwBk9ZecDx1bV3UmOB44FFiTZCzgceDiwI/DVJHtU1arBNluSJKlDVt3lyp+SJtQ6R+yq6tvAzaPKvlJVd7d3LwRG2tuHAGdV1R1V9XNgMbDvANsrSZIkSRplENfYvQz4r/b2TsDSnm3L2rJ7STI/yaIki1auXDmAZkiSJEnS1NRXsEvyVuBu4N839NiqWlhVc6tq7syZM/tphiRJkiRNaetzjd2YkrwEmAccVFXVFi8Hdu7ZbaQtkyRJkiRNkHGN2CU5GHgT8FdVdXvPpnOBw5NsnmRXYHfg4v6bKUmSJElak3WO2CU5EzgQ2C7JMuBtNKtgbg6c3y67e2FVvbKqrkxyNvATmimar3FFTEmSJEmaWOsMdlV1xBjFp6xl/+OA4/pplCRJkiRp/Q1iVUxJkiRJ0hAZ7CRJkiSp4wx2kiRJktRxBjtJkiRJ6jiDnSRJkiR1nMFOkiRJkjrOYCdJkiRJHWewkyRJkqSOM9hJkiRJUscZ7CRJkiSp4wx2kiRJktRxBjtJkiRJ6jiDnSRJkiR1nMFOkiRJkjrOYCdJkiRJHWewkyRJkqSOM9hJkiRJUscZ7CRJkiSp4wx2kiRJktRxBjtJkiRJ6jiDnSRJkiR1nMFOkiRJkjpuncEuyalJbkxyRU/Zg5Kcn+Rn7e8HtuVJclKSxUl+nOQxE9l4SZI0HLNGZpNkID+SpP5NX499TgNOBs7oKXsz8LWqek+SN7f3FwDPBHZvfx4HfKz9LUmSJpEVy5cyZ8F5A6lryfHzBlKPJE1l6xyxq6pvAzePKj4EOL29fTpwaE/5GdW4ENgmyaxBNVaSJEmSdG/jvcZuh6q6vr29Atihvb0TsLRnv2VtmSRJkiRpgvS9eEpVFVAbelyS+UkWJVm0cuXKfpshSZIkSVPWeIPdDaunWLa/b2zLlwM79+w30pbdS1UtrKq5VTV35syZ42yGJEmSJGm8we5c4Mj29pHAOT3lL25Xx9wP+FXPlE1JkiRJ0gRY56qYSc4EDgS2S7IMeBvwHuDsJEcBS4DD2t2/CDwLWAzcDrx0AtosSZIkSeqxzmBXVUesYdNBY+xbwGv6bZQkSZIkaf31vXiKJEmSJGm4DHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI7rK9gl+bskVya5IsmZSe6bZNckFyVZnOTTSTYbVGMlSZIkSfc27mCXZCfgb4G5VfUIYBpwOHA88MGq2g24BThqEA2VJEmSJI2t36mY04H7JZkObAFcDzwV+Ey7/XTg0D4fQ5IkSZK0FuMOdlW1HDgB+AVNoPsVcClwa1Xd3e62DNhprOOTzE+yKMmilStXjrcZkiRJkjTl9TMV84HAIcCuwI7A/YGD1/f4qlpYVXOrau7MmTPH2wxJkiRJmvL6mYr5NODnVbWyqu4C/gPYH9imnZoJMAIs77ONkiRJkqS16CfY/QLYL8kWS
QIcBPwE+AbwnHafI4Fz+muiJEmSJGlt+rnG7iKaRVIuAy5v61oILADekGQxsC1wygDaKUmSJElag+nr3mXNquptwNtGFV8D7NtPvZIkSZKk9dfv1x1IkiRJkobMYCdJkiRJHWewkyRJkqSOM9hJkiRJUscZ7CRJkiSp4wx2kiRJktRxBjtJkiRJ6jiDnSRJkiR1nMFOkiRJkjrOYCdJkiRJHWewkyRJkqSOM9hJkiRJUscZ7CRJkiSp4wx2kiRJktRxBjtJkiRJ6jiDnSRJkiR1nMFOkiRJkjrOYCdJkiRJHWewkyRJkqSOM9hJkiRJUscZ7CRJkiSp4wx2kiRJUofNGplNkoH8zBqZPeyno3Ga3s/BSbYBPgk8AijgZcDVwKeBXYBrgcOq6pa+WilJkiRpTCuWL2XOgvMGUteS4+cNpB5tfP2O2J0IfKmqHgY8CrgKeDPwtaraHfhae1+SJEmSNEHGHeySbA08CTgFoKrurKpbgUOA09vdTgcO7beRkiRJkqQ162fEbldgJfAvSX6Q5JNJ7g/sUFXXt/usAHbot5GSJEmSpDXrJ9hNBx4DfKyqHg3cxqhpl1VVNNfe3UuS+UkWJVm0cuXKPpohSZIkSVNbP8FuGbCsqi5q73+GJujdkGQWQPv7xrEOrqqFVTW3qubOnDmzj2ZIkiRJ0tQ27mBXVSuApUn2bIsOAn4CnAsc2ZYdCZzTVwslSZIkSWvV19cdAEcD/55kM+Aa4KU0YfHsJEcBS4DD+nwMSZIkSdJa9BXsquqHwNwxNh3UT72SJEmSpPXX7/fYSZIkSZKGzGAnSZIkSR1nsJMkSZKkjjPYSZIkSVLHGewkSZIkqeMMdpIkSZLUcQY7SZIkSeo4g50kSZIkdZzBTpIkSZI6zmAnSZIkSR1nsJMkSZKkjjPYSZIkSVLHGewkSZIkqeMMdpIkSZLUcQY7SZIkSeo4g50kSZIkdZzBTpIkSZI6zmAnSZIkSR1nsJMkSZI0cLNGZpNkID+zRmYP++ls8qYPuwGSJEmSJp8Vy5cyZ8F5A6lryfHzBlLPZOaInSRJmzDPeEvaqKbNGNhrjjYuR+wkSdqEecZb0ka16i5fczrKYCdJ0lTRnomXJE0+fQe7JNOARcDyqpqXZFfgLGBb4FLgRVV1Z7+PI0mS+uSZeEmatAZxjd3rgKt67h8PfLCqdgNuAY4awGNIkiRJktagr2CXZAR4NvDJ9n6ApwKfaXc5HTi0n8eQJEmSJK1dvyN2HwLeBNzT3t8WuLWq7m7vLwN2GuvAJPOTLEqyaOXKlX02Q5KkTccgV7KUJGl9jPsauyTzgBur6tIkB27o8VW1EFgIMHfu3BpvOyRJ2tS4kqUkaWPrZ/GU/YG/SvIs4L7AVsCJwDZJprejdiPA8v6bKUmSJElak3FPxayqY6tqpKp2AQ4Hvl5VLwC+ATyn3e1I4Jy+WylJkiRJWqNBrIo52gLgDUkW01xzd8oEPIYkSZIkqTWQLyivqm8C32xvXwPsO4h6JUmSJEnrNhEjdpIkSZKkjchgJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjBrIqpiRJkqQNMG0GSYbdCk0iBjtJkiRpY1t1F3MWnDeQqpYcP28g9ajbnIopSZIkSR1nsJMkSZKkjjPYSZIkSVLHGewkSZIkqeMMdpIkSZLUcQY7SZIkSeo4g50kSZIkdZzBTpIkSZI6zmAnSZIkSR1nsJMkSZKkjjPYSZIkSVLHGewkSZIkqeMMdpIkSZLUcQY7SZIkSeo4g50kSZIkddy4g12SnZN8I8lPklyZ5HVt+YOSnJ/kZ+3vBw6uuZIkSZKk0foZsbsbOKaq9gL2A16TZC/gzcDXqmp34GvtfUmSJEnSBBl3sKuq66vqsvb2b4CrgJ2AQ4DT291OBw7tt5GSJEmSpDUbyDV2SXYBHg1cBOxQVde3m1YAOwziMSRJkiRJY+s72CXZEvgs8Pqq+nXvtqoqoNZw3Pwki
5IsWrlyZb/NkCRJkqQpq69gl2QGTaj796r6j7b4hiSz2u2zgBvHOraqFlbV3KqaO3PmzH6aIUmSJElTWj+rYgY4Bbiqqj7Qs+lc4Mj29pHAOeNvniRJkiRpXab3cez+wIuAy5P8sC17C/Ae4OwkRwFLgMP6a6IkSZIkaW3GHeyq6rtA1rD5oPHWK0mSJEnaMANZFVOSJEmSNDwGO0mSgFkjs0kykB9Jkja2fq6xkyRp0lixfClzFpw3kLqWHD9vIPVIkrS+HLGTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHaSJEmS1HEGO0mSJEnqOIOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSxxnsJEmSJKnjDHZrMWtkNkkG8jNrZPawn44kSZKkSWr6sBuwKVuxfClzFpw3kLqWnPDXJBlIXQ/eaWeuX/aLgdQ1a2Q2K5YvHUhdg2yXJEmSpPVnsNtYVt01uJB4/LyB1AMDDq8DbJckSZKk9edUTA3OtBkDm7o6ffP7TfppsE71lSRJ0qA4YqfBGfCo5GQfSXS0VJIkSYNisOuidmRMkiRJmhIG+Pl3sq4LYbDrok30ej1JkiRpQvj5d528xk6SJEmSOs5gJ22AQS54IkmSJA3KhE3FTHIwcCIwDfhkVb1noh5LWqsBX5PoNABJkiRtaiYk2CWZBnwEeDqwDLgkyblV9ZOJeDxprZyTLUmSpEluoqZi7gssrqprqupO4CzgkAl6LEmSJEma0lJVg680eQ5wcFW9vL3/IuBxVfXann3mA/Pbu3sCVw+8IRtmO+CmIbdBmxb7hMZiv9Bo9gmNZp/QWOwXGm10n5hTVTPHW9nQvu6gqhYCC4f1+KMlWVRVc4fdDm067BMai/1Co9knNJp9QmOxX2i0QfeJiZqKuRzYuef+SFsmSZIkSRqwiQp2lwC7J9k1yWbA4cC5E/RYkiRJkjSlTchUzKq6O8lrgS/TfN3BqVV15UQ81gBtMtNCtcmwT2gs9guNZp/QaPYJjcV+odEG2icmZPEUSZIkSdLGM1FTMSVJkiRJG4nBTpIkSZI6zmAnSZIkDVmSDLsN2vRsSL8w2K1BkockGRl2O7RpsV9oNPuE1sUPaxrNPiGAJE9M8rEkrwYoF74Q/fULg90oSTZLchrwJeBfk7wsyf3abb4QT1H2C41mn9DaJHl4kgPBD2tq2CfUK8ljgI8BlwLPSvLBJHsPuVkasn77hcHu3h4FbFlVewD/F3gS8KIkM3whntL2xn6hP2Wf0L0kuU+SjwKfBd6S5J1J5q7eNtzWaRjsE1qDfYFLquqTwMuB22k+yG833GZpyPahj37hCwqQZKTnDPs0YLckqarv0ZyNfxhwwNAaqKFIsk+SbXuK7BdTXJI/SzKrvXsf7BO6t21oAv/DgBcAvwSOSbJlVd0z3KZpSB6IfWLKS3JYkjckeUJbdBmwZZIHV9UK4OvATOCJQ2ukNrokT0jyuSQPb4t+QB/9YkoHuySzk3wd+BRwWpJdgWuAbwMHt7t9Bfg18OdJNh9OS7WxJXkKcBFwcJLNgBXAd4BntLvYL6aQJLsl+QLwCeAL7Qvw1cB3sU9MeUnmJLlve3db4AlJ7l9VK2lGaW4BXtvu6zTdKSDJc1ZfHwNshX1iykoyLck/AAvaoo8n+UvgNuBa4Mlt+beAW4GR9jj7xdTwKOARwD5JtqAZoVtGMwsINrBfTOlgB7wKuLCqnkTzwf19wP2B64HHJtmuqm4G/gfYv6ru8D/alLELzdm03YCH0pxhXUHzH29b+8WU807g0qp6Ik3AP5rmxfd67BNTVpK9knweOA04N8meVfUz4ELg9e1u19N8kN87ySyn6U5uSbZM8lngjcAtSaZX1c+B72GfmJKqahWwJ3BMVX0AeAdNqJ8OXEfTD/aqqrtpThj+dXuc/WJqeBBwJTCXJuRdCSyhySEb3C+mXLBL8uAkM3qKVgBU1QJgBrA/zQf6rWimTACcA
2ybZCv/o01Obb/o/f9wHfAZYA7wlKq6DfgG8ADgRe0+9otJrO0T09vRt1uAq9pNRXNR8/2ALwJbY5+YMlYH9iQPo7nA/RtV9RTgcuDkdrdTgP2T7Nq+Kd8A/B7YYghN1gQbdRJnZ+CGqtqvqs4EVrXlp9H0iYfYJya/JC9O8uQk27RFNwAPbIP+Z2hOAj6dZprd74F/avfbCbgkyfSN3mhNuJ5+sXV7fxrNaNy7aU4W70sT+L/flh/XHrre/WLKBLskByX5DvAR4KS2+DfAqiRbtfc/Cjwf+BHweeBvkryb5g98Ec2wuSaRUf3ioz2b9qc5Q/Ihmql1fwPcF/gCcJT9YvLq6RMfBU6qqjto3oSfleRy4Kk0Z1+/DtxJ81phn5g6Vk+5/BXw5qo6sb3/j8AWSWYCl9CcIHwvQFVdQXOS6I6N3FZtHPftuf1I/jhl6tXA25I8keYs/PeAE8A+MRmlMSvJN4AjaQYHPpJkS+Am4M+BLdvdTwJeSHMS4B3ArUn+Ezgc+GQb/jUJjNEvng98NMnMdjT30TQnjD9EMyp3Kc11ue+jGfXfoH4xJYJdkj2AdwEn0ky/fEiSRwMXAwfRnGGjqr4MbAY8t6ouAJ4H/Ax4W1W9pf0H0CQxRr+YneTp7eYraV6Ifws8DfggML2qvkPzH8x+MQmN6hOvBB6a5PFV9X6aF92fVNXeVfVG4IfA86vqu9gnJr0kT09yPvC+JIdV1fVV9f2e0Zo/B35fVSur6rc0QW+nJB9OcgXN1JpfOUV38ujpE+9NckRbfBlwfZJTgcfTnHV/K3AozfvIzCQn2ycmlyTT2lkaDwCWV9VBNJ8rfk0T4j4KPAF4ZJItquqnwH/TfMgHeAXwkqrap6oWb/xnoImwhn7xauBm4JPtbr8AtqP53LE7zevCFW2I2+B+MWmHeldPq2tXnNobuLiqPtOOzv0GuLGqfpDkL4DnJLmnqq4CzqKZWkVVXUnzAV+TxDr6xW38cbrd3sCbaabbfZXm2ss7k8R+Mbmsx2vF0nb6dgHXJHlQez3dZ4Hn2icmvyS70UyVehfNm/AxSXarqnfRvI/eRTOlbvXrB1V1Z5JDaa7TPb+qzt34LddEGaNPvDHJjjQfzn5LsyDG46vqrkkEwVoAAAUQSURBVCS/BA6oqoVJ/jdNn/iKfaL72ql07wSmJfkizWU8q6C5ti7Ja2muqXw/zUJ9hwOzgE/TvG5c0O57F7Byoz8BTYj16BevozkB9Ahgc5pLf/4BeB3NCaADknx2PP1iUo7YJXkpzYoy72yLfkxzEeInaK6D2B54f5KTaf6QWwLvSfJ37f0fbfxWa6KtR7/YgeZs/Ido/mNdTPNm/Cqa1VJ3ADyzOoms52vFe4EP0/SBx9NMu3w1zTSJL3st3eSU5rvHVr9HPo5m8ZxzquoHNNNw35Rk+/aNF5opuhe2x/59kpGqurGqLvAD/OSwHn3i72k+T5xDM1JzWLvvj4Dtk9zHPjF5JHkyf5w2t5jmfeQu4ClJ9oU/LJzyDuB9VXUGzerJL07yA5qTQpcPo+2aOOvZL+5py99dVW8FdqyqE6rqOpprt7/Q896yYY8/2T6TtHOZ/41moYsjgSOq6ur2uoeXAL+tqo+lWZp6OfC0duTuCJph8rOq+U4qTSLj6BcHV9UlPcdv3l5rpUliA/vEdTShbivgWTRLE59QVRcNpfGaUG3gPw74l6p6a5JHAt8EHltVP0/yCmA+cGVVvbidSnc+zXTcvWhOFsxvF13SJLCefeKVNF8sPD/JITQnis+meX05FfgAzaJ2k+uD1xSV5ABgl6r61/b+R2mC2u+Ao6vqse2JgO1pFlb6u6pamuTBwBZVdc2w2q6Js4H94sPAgqq6JsmM8Ya5XpNuxK69tuFv2wvavwK8vd30S5ovD76i3e/3NNMud2rvn1lVRxvqJqcN7Befohmd652mZ6ibZMbxWvGQqrqkqt5RVc811E1ObeA/BDgeeGaSh1XVj4Ezg
Hcl+R7Nl9C/hGYF1J3aQx9Es6jO66vqBYa6yWMD+sSLgZE0Xyx8Dk3QuxU4qj0bf4+hblK5FDi7nXYHzeI4s6vqNJopeEe3IzMjwF1VtRSgqlYY6ia1De0X18AfpuP2bdIFO4Cq+kV780M0ix88s/0jLgYWJtkzyVtoVj70upgpYgP6xQG0/aLdrklqA/rEE4CfDqud2njGCPzvaDcdA7yG5uzqC2k+sK+k+cqczYGXV9VT22l5mkQ2sE/c2P6mPRH08ar6/hCarQlWVbdX1R31x8Wyns4fr4d6KfBnSc4DzqRZVEdTwAb2ix/AYL+MftJNxRytnR7xwqo6oL1/As2Fq/cB3rT6DIqmFvuFRrNPaLR2ytS5wN9X1ZfbFc5WtduOo7mG4m/LpcmnjPXsE0eXK+NOGe3ITAH/SfNvv7hdXOcmmmn7P6+q5cNsoza+YfWLSR3s2guV70nyGZqzaLfTzHe/vKp+N9zWaVjsFxrNPqE1aQP/86vqye39fWmWr58BvKyqVgyzfdr47BPq1Y62bEazfP3ngJfRTOk/uqp+Pcy2aXiG1S8mdbADSLIF8CWaC9r/sapOWschmgLsFxrNPqHRRgX+62m+TPqrwM+q6n+G2zoNg31CY0myH81XF1xAs8DOKUNukjYBw+gXk/Z77Hq8mmZu89NdAEM97BcazT6hP9F+gN+CZvWyA2kC/5eG2yoNk31Ca7CMZtT2A75/qMdG7xdTYcTuPi6AodHsFxrNPqGxJHkjzeplC/zAJrBPSNp0TfpgJ0nSeBn4NZp9QtKmymAnSZIkSR03Kb/HTpIkSZKmEoOdJEmSJHWcwU6SJEmSOs5gJ0mSJEkdZ7CTJEmSpI4z2EmSJElSx/1/PLIVtN7fXU0AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "**4. Analysis of distribution and frequencies**\n", "\n", "The next step is to analyze the split of dates of the dataset, component by component. First of all, we can analyze dates (excluding previous detected outliers) from the point of view of the year, and it seems like the global view." ], "metadata": { "id": "jq8gXJUaaHbP" } }, { "cell_type": "code", "source": [ "dates_clean_year = [item.year for item in dates_clean]\n", "f = plt.figure()\n", "f.set_figwidth(15)\n", "f.set_figheight(5)\n", "\n", "plt.hist(dates_clean_year, bins=40, edgecolor='black')\n", "plt.title(\"Years\")\n", "plt.show()\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "id": "8dqvP2R3zMiG", "outputId": "a22aa9f0-3202-4eae-8e97-a2ed4e8ab539" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Now, we can do the same analysis using the month as frequency." ], "metadata": { "id": "CUOJzI3gRyZT" } }, { "cell_type": "code", "source": [ "dates_clean_month = [item.month for item in dates_clean]\n", "f = plt.figure()\n", "f.set_figwidth(10)\n", "f.set_figheight(5)\n", "plt.hist(dates_clean_month, bins=12, edgecolor='black')\n", "plt.title(\"Months\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "id": "zFRrZ-7MSUvQ", "outputId": "55b474d5-110e-47b8-e5a9-00b32017e75c" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAE/CAYAAACEto0QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAU+UlEQVR4nO3de6xlZ3kf4N+bGSctl4KpT93B4/EAMbRAiUkmhJZCKJfEUAtDRandlJhLMiDhlLSRuISoRK2oIIHQpKQgExybhBhTDMFCNMV1aQhSIBmD5RgMwTYePNOxPdwJpODL2z/OnrKZHnuOZ3/n7H3OeR5p66z1rtvrZWn75/WttXZ1dwAAmN0PzLsBAIDNQrACABhEsAIAGESwAgAYRLACABhEsAIAGESwAra0quqq+uF59wFsDoIVsDCq6qaq+m5VnXRU/VOTALR7xv3/r6r6uVn2AXBPBCtg0XwhyblHZqrqHyS5z/zaAVg9wQpYNL+X5Gen5s9L8s4jM1X1gKp6Z1Udrqr9VfUrVfUDk2UvqKqPVdUbq+qrVfWFqnrGZNnrkjwxyVuq6q+q6i1Tx3haVX2+qr5WVb9dVTXZ5oer6o+r6utV9aWqunSt/+GBjU2wAhbNx5P8rar6+1W1Lck5SX5/avl/TvKAJA9N8pNZDmEvnFr+E0k+l+SkJL+W5B1VVd39miR/kuT87r5fd58/tc1ZSX48yWOSPC/JT0/q/yHJh5OcmGTn5NgAd0uwAhbRkatWT09yXZKDk/qRoPXq7v5md9+U5E1Jnj+17f7ufnt335nk4iQ7kpx8jOO9vru/1t1fTPKRJGdM6rcnOS3Jg7v7/3T3x2b/RwM2M8EKWES/l+RfJnlBpoYBs3wV6oQk+6dq+5OcMjV/y5GJ7v72ZPJ+xzjeLVPT355a/xVJKsmfVdWnq+pFq+wf2KIEK2DhdPf+LN/E/swk75ta9KV87yrSEbvyvStax9z1vezjlu7++e5+cJKXJPkvXs0A3BPBClhUL07ylO7+1lTtziTvSfK6qrp/VZ2W5N/m++/Buie3ZvnerFWpqn9eVTsns1/NcjC7a7XbA1uPYAUspO6+obv3rbDoF5J8K8mNST6W5A+SXLjK3f5mkudOnhj8rVWs/+NJPlFVf5Xk8iQv7+4bV3ksYAuq7nt1ZRwAgLvhihUAwCCCFQDAIIIVAMAgghUAwCCCFQDAINvn3UCSnHTSSb179+55twEAcExXXXXVl7p7aaVlCxGsdu/enX37VnpdDQDAYqmq/Xe3zFAgAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDMyY6du1JVG+KzY+eueZ+uDWEhfoQZALaiWw7enNNe+cF5t7Eq+99w1rxb2BBcsQIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABjkmMGqqi6sqtuq6tqp2qVVdfXkc1NVXT2p766qv55a9ra1bB4AYJGs5idtLkryliTvPFLo7n9xZLqq3pTk61Pr39DdZ4xqEABgozhmsOruj1bV7pWWVVUleV6Sp4xtCwBg45n1HqsnJrm1uz8/VXtIVX2qqv64qp444/4BADaM1QwF3pNzk1wyNX8oya7u/nJV/ViSP6yqR3X3N47esKr2JtmbJLt27ZqxDQCA+TvuK1ZVtT3JP0ty6ZFad3+nu788mb4qyQ1JHr7S9t19QXfv6e49S0tLx9sGAMDCmGUo8GlJPtvdB44UqmqpqrZNph+a5PQkN87WIgDAxrCa1y1ckuRPkzyiqg5U1Ysni87J9w8DJsmTklwzef3Ce5O8tLu/MrJhAIBFtZqnAs+9m/oLVqhdluSy2dtiI9ixc1duOXjzvNtYlb97yqk5dOCL824DYOPadkKWXwaw2Ob9fT/rzetsYbccvDmnvfKD8
25jVfa/4ax5twCwsd15+4b4zp/3972ftAEAGESwAgAYRLACABhEsAIAGESwggWyY+euVNXCf3bs9GsJACvxVCAskI3ypOW8n7phfjbKa1bm/cg9W5dgBcCqCf9wzwwFAgAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAyypV4Q6o3BW9i2E1JV8+4CgE1uSwUrbwzewu683b97ANacoUAAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQbbUU4HAIBvk9RVeXQKsN8EKuPe8vgJgRYYCAQAGOWawqqoLq+q2qrp2qvarVXWwqq6efJ45tezVVXV9VX2uqn56rRoHOKbJkOVG+OzYuWveZwsYYDVDgRcleUuSdx5Vf3N3v3G6UFWPTHJOkkcleXCS/1FVD+/uOwf0CnDvbJAhy8SwJWwWx7xi1d0fTfKVVe7v7CTv7u7vdPcXklyf5HEz9AcAsGHMco/V+VV1zWSo8MRJ7ZQk079yfGBSAwDY9I43WL01ycOSnJHkUJI33dsdVNXeqtpXVfsOHz58nG0AACyO4wpW3X1rd9/Z3XcleXu+N9x3MMmpU6vunNRW2scF3b2nu/csLS0dTxsAAAvluIJVVe2Ymn1OkiNPDF6e5Jyq+qGqekiS05P82WwtAgBsDMd8KrCqLkny5CQnVdWBJK9N8uSqOiNJJ7kpyUuSpLs/XVXvSfKZJHckeZknAgGAreKYwaq7z12h/I57WP91SV43S1MAMJMN8rNLbD5+0gaAzWeDvMPM+8s2Hz9pAwAwiGAFADCIYAUAMIhgBQAwiGAFADCIYAUAMIhgBQAwiGAFADCIYAUAMIhgBQAwiGAFADCIYAUAMIgfYQZYBNtOSFXNuwtgRoIVwCK48/ac9soPzruLY9r/hrPm3QIsNEOBAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMcM1hV1YVVdVtVXTtV+/Wq+mxVXVNV76+qB07qu6vqr6vq6snnbWvZPADAIlnNFauLkpx5VO2KJI/u7sck+cskr55adkN3nzH5vHRMmwAAi2/7sVbo7o9W1e6jah+emv14kueObWuL23ZCqmreXQAA99Ixg9UqvCjJpVPzD6mqTyX5RpJf6e4/GXCMreXO23PaKz847y6Oaf8bzpp3CwCwUGYKVlX1miR3JHnXpHQoya7u/nJV/ViSP6yqR3X3N1bYdm+SvUmya9euWdoAAFgIx/1UYFW9IMlZSX6muztJuvs73f3lyfRVSW5I8vCVtu/uC7p7T3fvWVpaOt42AAAWxnEFq6o6M8krkjyru789VV+qqm2T6YcmOT3JjSMaBQBYdMccCqyqS5I8OclJVXUgyWuz/BTgDyW5YnKT9ccnTwA+Kcm/r6rbk9yV5KXd/ZU16h0AYKGs5qnAc1cov+Nu1r0syWWzNgUAsBF58zoAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIKsKVlV1YVXdVlXXTtUeVFVXVNXnJ39PnNSrqn6rqq6vqmuq6kfXqnkAgEWy2itWFyU586jaq5Jc2d2nJ7lyMp8kz0hy+uSzN8lbZ28TAGDxrSpYdfdHk3zlqPLZSS6eTF+c5NlT9Xf2so8neWBV7RjRLADAIpvlHquTu/vQZPqWJCdPpk9JcvPUegcmNQCATW3Izevd3Un63mxTVXural9V7Tt8+PCINgAA5mqWYHXrkSG+yd/bJvWDSU6dWm/npPZ9uvuC7t7T3XuWlpZmaAMAYDHMEqwuT3LeZPq8JB+Yqv/s5OnAxyf5+tSQIQDAprV9NStV1SVJnpzkpKo6kOS1SV6f5D1V9
eIk+5M8b7L6h5I8M8n1Sb6d5IWDewYAWEirClbdfe7dLHrqCut2kpfN0hQAwEbkzesAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAg2w/3g2r6hFJLp0qPTTJv0vywCQ/n+TwpP7L3f2h4+4QAGCDOO5g1d2fS3JGklTVtiQHk7w/yQuTvLm73zikQwCADWLUUOBTk9zQ3fsH7Q8AYMMZFazOSXLJ1Pz5VXVNVV1YVSeutEFV7a2qfVW17/DhwyutAgCwocwcrKrqB5M8K8l/nZTemuRhWR4mPJTkTStt190XdPee7t6ztLQ0axsAAHM34orVM5J8srtvTZLuvrW77+zuu5K8PcnjBhwDAGDhjQhW52ZqGLCqdkwte06SawccAwBg4R33U4FJUlX3TfL0JC+ZKv9aVZ2RpJPcdNQyAIBNa6Zg1d3fSvK3j6o9f6aOAAA2KG9eBwAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYZPusO6iqm5J8M8mdSe7o7j1V9aAklybZneSmJM/r7q/OeiwAgEU26orVP+nuM7p7z2T+VUmu7O7Tk1w5mQcA2NTWaijw7CQXT6YvTvLsNToOAMDCGBGsOsmHq+qqqto7qZ3c3Ycm07ckOXnAcQAAFtrM91gl+cfdfbCq/k6SK6rqs9MLu7urqo/eaBLC9ibJrl27BrQBADBfM1+x6u6Dk7+3JXl/ksclubWqdiTJ5O9tK2x3QXfv6e49S0tLs7YBADB3MwWrqrpvVd3/yHSSn0pybZLLk5w3We28JB+Y5TgAABvBrEOBJyd5f1Ud2dcfdPcfVdWfJ3lPVb04yf4kz5vxOAAAC2+mYNXdNyb5kRXqX07y1Fn2DQCw0XjzOgDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgxx2squrUqvpIVX2mqj5dVS+f1H+1qg5W1dWTzzPHtQsAsLi2z7DtHUl+qbs/WVX3T3JVVV0xWfbm7n7j7O0BAGwcxx2suvtQkkOT6W9W1XVJThnVGADARjPkHquq2p3ksUk+MSmdX1XXVNWFVXXiiGMAACy6mYNVVd0vyWVJfrG7v5HkrUkeluSMLF/RetPdbLe3qvZV1b7Dhw/P2gYAwNzNFKyq6oQsh6p3dff7kqS7b+3uO7v7riRvT/K4lbbt7gu6e09371laWpqlDQCAhTDLU4GV5B1Jruvu35iq75ha7TlJrj3+9gAANo5Zngp8QpLnJ/mLqrp6UvvlJOdW1RlJOslNSV4yU4cAABvELE8FfixJrbDoQ8ffDgDAxuXN6wAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDrFmwqqozq+pzVXV9Vb1qrY4DALAo1iRYVdW2JL+d5BlJHpnk3Kp65FocCwBgUazVFavHJbm+u2/s7u8meXeSs9foWAAAC2GtgtUpSW6emj8wqQEAbFrV3eN3WvXcJGd2989N5p+f5
Ce6+/ypdfYm2TuZfUSSzw1vZPM4KcmX5t3EFuJ8ry/ne3053+vPOV9f63G+T+vupZUWbF+jAx5McurU/M5J7f/p7guSXLBGx99Uqmpfd++Zdx9bhfO9vpzv9eV8rz/nfH3N+3yv1VDgnyc5vaoeUlU/mOScJJev0bEAABbCmlyx6u47qur8JP89ybYkF3b3p9fiWAAAi2KthgLT3R9K8qG12v8WY8h0fTnf68v5Xl/O9/pzztfXXM/3mty8DgCwFflJGwCAQQSrBVVVp1bVR6rqM1X16ap6+bx72gqqaltVfaqqPjjvXraCqnpgVb23qj5bVddV1T+cd0+bWVX9m8n3ybVVdUlV/Y1597TZVNWFVXVbVV07VXtQVV1RVZ+f/D1xnj1uJndzvn998p1yTVW9v6oeuJ49CVaL644kv9Tdj0zy+CQv87NA6+LlSa6bdxNbyG8m+aPu/ntJfiTO/ZqpqlOS/Oske7r70Vl+sOic+Xa1KV2U5Myjaq9KcmV3n57kysk8Y1yU//98X5Hk0d39mCR/meTV69mQYLWguvtQd39yMv3NLP8Hx9vr11BV7UzyT5P8zrx72Qqq6gFJnpTkHUnS3d/t7q/Nt6tNb3uSv1lV25PcJ8n/nnM/m053fzTJV44qn53k4sn0xUmeva5NbWIrne/u/nB33zGZ/XiW36W5bgSrDaCqdid5bJJPzLeTTe8/JXlFkrvm3cgW8ZAkh5P87mT49Xeq6r7zbmqz6u6DSd6Y5ItJDiX5end/eL5dbRknd/ehyfQtSU6eZzNbzIuS/Lf1PKBgteCq6n5JLkvyi939jXn3s1lV1VlJbuvuq+bdyxayPcmPJnlrdz82ybdiiGTNTO7rOTvLgfbBSe5bVf9qvl1tPb38KL7H8ddBVb0my7fVvGs9jytYLbCqOiHLoepd3f2+efezyT0hybOq6qYk707ylKr6/fm2tOkdSHKgu49ciX1vloMWa+NpSb7Q3Ye7+/Yk70vyj+bc01Zxa1XtSJLJ39vm3M+mV1UvSHJWkp/pdX6vlGC1oKqqsnzvyXXd/Rvz7mez6+5Xd/fO7t6d5Rt6/2d3+7/5NdTdtyS5uaoeMSk9Ncln5tjSZvfFJI+vqvtMvl+eGg8LrJfLk5w3mT4vyQfm2MumV1VnZvm2jmd197fX+/iC1eJ6QpLnZ/nKydWTzzPn3RQM9gtJ3lVV1yQ5I8l/nHM/m9bkyuB7k3wyyV9k+fvfG8EHq6pLkvxpkkdU1YGqenGS1yd5elV9PstXDl8/zx43k7s5329Jcv8kV0z+2/m2de3Jm9cBAMZwxQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgkP8L37DNRk6hpHQAAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Viewing dates through the day of the month, we can detect a new outlier candidate: a large number of dates with the first day of the month compared to all other dates." 
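, "\n", "\n", "As a rough sketch of how this spike could be flagged numerically rather than by eye, the snippet below compares each day's share of records with the 1/31 share that a uniform spread would give. The sample data, the variable names and the factor of 3 are illustrative assumptions, not part of the original analysis.\n", "\n", "```python\n", "import pandas as pd\n", "\n", "# Hypothetical sample: one record per day of 1990, plus 40 records that\n", "# defaulted to 1 January (standing in for the cleaned birthdates)\n", "regular = pd.Series(pd.date_range('1990-01-01', '1990-12-31', freq='D'))\n", "defaults = pd.Series(pd.to_datetime(['1990-01-01'] * 40))\n", "dates = pd.concat([regular, defaults], ignore_index=True)\n", "\n", "# Share of records falling on each day of the month\n", "day_share = dates.dt.day.value_counts(normalize=True)\n", "\n", "# A uniform spread would put roughly 1/31 of the records on each day;\n", "# flag days holding more than three times that share (arbitrary threshold)\n", "expected = 1 / 31\n", "suspicious = day_share[day_share > 3 * expected]\n", "print(suspicious)\n", "```\n", "\n", "In this toy sample only day 1 exceeds the threshold; on real data the factor would need tuning."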
], "metadata": { "id": "f3nxadnKSuMQ" } }, { "cell_type": "code", "source": [ "dates_clean_day = [item.day for item in dates_clean]\n", "f = plt.figure()\n", "f.set_figwidth(10)\n", "f.set_figheight(5)\n", "plt.hist(dates_clean_day, bins=31, edgecolor='black')\n", "plt.title(\"Days\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "id": "I9ynM5alTRBH", "outputId": "02988b3b-e542-4967-ff24-01390d3b78e3" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAE/CAYAAACEto0QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAVg0lEQVR4nO3dfYxlZ30f8O+va5O0BGJTT92V18sCMkRAk03YOK0K1A0NMYQGqCKHbQBDaRdaXBG1ak2IFCgSEiQYqiip0SIsTAPGbsyLS9wWi5JQ1PCyBsc2GIJN7drbtXexw1t4tf3rH3M2GTaz3vXcZzz3zn4+0tXe+5xz7v3No7Nzv/M856W6OwAAzO6vbXQBAACbhWAFADCIYAUAMIhgBQAwiGAFADCIYAUAMIhgBQAwiGAFzL2qurWqvl1V36iqr1bV/66qV1SV32HAXPFLCVgU/7i7H5Hk0UnemOTCJO/Y2JIAfpBgBSyU7v5ad1+V5JeTnF9VT66qX6iqz1bV16vq9qp63eH1q+oPqupfr3yPqrq+qp5fy95aVQenbW+oqic/xD8SsIkIVsBC6u5PJbkjydOS/HmSFyc5JckvJPmXVfW8adVLk7zw8HZV9RNJzkjyB0memeTpSR6f5EeTnJfk7ofoRwA2IcEKWGT/L8mjuvsPu/uG7r6/u69PclmSfzCtc1WSx1fVWdPrFyW5vLu/l+T7SR6R5MeSVHff1N0HHuKfAdhEBCtgkZ2R5J6q+pmq+mhVHaqqryV5RZLTkqS7v5Pk8iQvnA52353kP0/L/meS30nyu0kOVtXeqnrkRvwgwOYgWAELqap+OsvB6uNJ3pPlkakzu/tHk7wtSa1Y/dIkv5LkGUm+1d1/fHhBd/92dz8lyROzPCX47x6anwDYjAQrYKFU1SOr6jlJ3pvk97r7hixP593T3d+pqrOT/NOV20xB6v4kF2UarZre66en0a6Ts3yc1nem9QDWRLACFsV/rapvJLk9ya8neUuSl07L/lWS10/LfyPJFats/64kfyfJ761oe2SStyf5syS3ZfnA9d9al+qBE0J190bXALDuqurFSfZ091M3uhZg8zJiBWx6VfU3sjyqtXejawE2N8EK2NSq6ueTHEpyV5YPcgdYN6YCAQAGMWIFADCIYAUAMMhJG11Akpx22mm9Y8eOjS4DAOCYrr322q9099Jqy+YiWO3YsSP79u3b6DIAAI6pqm472jJTgQAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDnFDBauu27amqNT+2btu+0T8CADDH5uImzA+VO/ffnkdf+KE1b3/bm54zsBoAYLM5oUasAADWk2AFADCIYAUAMIhgBQAwiGAFADCIYAUAMIhgBQAwiGAFADCIYAUAMMgxg1VVXVJVB6vqxhVtl1fVddPj1qq6bmrfUVXfXrHsbetZPADAPDmeW9q8M8nvJHnX4Ybu/uXDz6vqoiRfW7H+Ld29c1SBAACL4pjBqrs/VlU7VltWVZXkvCQ/O7YsAIDFM+sxVk9Lcld3f2lF22Oq6rNV9UdV9bQZ3x8AYGEcz1TgA9md5LIVrw8k2d7dd1fVU5J8oKqe1N1fP3LDqtqTZE+SbN++fcYyAAA23ppHrKrqpCT/JMnlh9u6+7vdfff0/NoktyR5/Grbd/fe7t7V3buWlpbWWgYAwNyYZSrwHyX5Qnffcbihqpaqasv0/LFJzkry5dlKBABYDMdzuYXLkvxxkidU1R1V9bJp0Qvyg9OASfL0JNdPl1/4/SSv6O57RhYMADCvjueswN1HaX/JKm1XJrly9rIAABaPK68DAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEAD
CJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADHLMYFVVl1TVwaq6cUXb66pqf1VdNz2evWLZr1XVzVX1xar6+fUqHABg3hzPiNU7k5y7Svtbu3vn9Lg6SarqiUlekORJ0zb/qaq2jCoWAGCeHTNYdffHktxznO/33CTv7e7vdvf/SXJzkrNnqA8AYGHMcozVBVV1/TRVeOrUdkaS21esc8fUBgCw6a01WF2c5HFJdiY5kOSiB/sGVbWnqvZV1b5Dhw6tsQwAgPmxpmDV3Xd1933dfX+St+cvp/v2JzlzxarbprbV3mNvd+/q7l1LS0trKQMAYK6sKVhV1dYVL5+f5PAZg1cleUFV/VBVPSbJWUk+NVuJAACL4aRjrVBVlyU5J8lpVXVHktcmOaeqdibpJLcmeXmSdPfnquqKJJ9Pcm+SV3b3fetTOgDAfDlmsOru3as0v+MB1n9DkjfMUhQAwCJy5XUAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQY4ZrKrqkqo6WFU3rmj7rar6QlVdX1Xvr6pTpvYdVfXtqrpuerxtPYsHAJgnxzNi9c4k5x7Rdk2SJ3f3jyf50yS/tmLZLd29c3q8YkyZAADz75jBqrs/luSeI9o+3N33Ti8/kWTbOtQGALBQRhxj9c+S/LcVrx9TVZ+tqj+qqqcNeH8AgIVw0iwbV9WvJ7k3ybunpgNJtnf33VX1lCQfqKondffXV9l2T5I9SbJ9+/ZZygAAmAtrHrGqqpckeU6SX+nuTpLu/m533z09vzbJLUkev9r23b23u3d1966lpaW1lgEAMDfWFKyq6twk/z7JL3b3t1a0L1XVlun5Y5OcleTLIwoFAJh3x5wKrKrLkpyT5LSquiPJa7N8FuAPJbmmqpLkE9MZgE9P8vqq+n6S+5O8orvvWfWNAQA2mWMGq+7evUrzO46y7pVJrpy1KACAReTK6wAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWAACDHFewqqpLqupgVd24ou1RVXVNVX1p+vfUqb2q6rer6uaqur6qfmq9igcAmCfHO2L1ziTnHtH26iQf6e6zknxkep0kz0py1vTYk+Ti2csEAJh/xxWsuvtjSe45ovm5SS6dnl+a5Hkr2t/Vyz6R5JSq2jqiWACAeTbLMVand/eB6fmdSU6fnp+R5PYV690xtQEAbGpDDl7v7k7SD2abqtpTVfuqat+hQ4dGlAEAsKFmCVZ3HZ7im/49OLXvT3LmivW2TW0/oLv3dveu7t61tLQ0QxkAAPNhlmB1VZLzp+fnJ/ngivYXT2cH/t0kX1sxZQgAsGmddDwrVdVlSc5JclpV3ZHktUnemOSKqnpZktuSnDetfnWSZye5Ocm3krx0cM0AAHPpuIJVd+8+yqJnrLJuJ3nlLEUBACwiV14HABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGOSktW5YVU9IcvmKpscm+Y0kpyT5F0kOTe2v6e6r11whAMCCWHOw6u4vJtmZJFW1Jcn+JO9P8tIkb+3uNw+pEABgQYyaCnxGklu6+7ZB7wcAsHBGBasXJLlsx
esLqur6qrqkqk5dbYOq2lNV+6pq36FDh1ZbBQBgocwcrKrqYUl+Mcl/mZouTvK4LE8THkhy0Wrbdffe7t7V3buWlpZmLQMAYMONGLF6VpLPdPddSdLdd3X3fd19f5K3Jzl7wGcAc2brtu2pqpkeW7dt3+gfA2CoNR+8vsLurJgGrKqt3X1gevn8JDcO+AwmW7dtz537b1/z9n/7jDNz4I7/O7AiFtWs+1KSPPrCD820/W1ves5M2wPMm5mCVVU9PMnPJXn5iubfrKqdSTrJrUcsY0Z37r99pi8zX2QcZl8CGG+mYNXdf57kbx7R9qKZKgIAWFCuvM6DNuuxNY6rAY7k9wqbxYhjrDjBmEICRvN7hc3CiBUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVcEJzmj8wksstACc0p/mzmcx6q6otD/vh3Pe976x5e7dNE6wAYNMY8YeCPzRmYyoQNsCs00+moJaN6EeAkYxYwQaY9a/KxF+GiX4E5o8RKwCAQQSrE82Wk02dABzB9PwgA75jFr0fTQWeaO77vqkTgCOYVh7Ed4wRK4ATnWt5wThGrABOcK7lBeMYsQJYcLOOOAHjGLECWHBGnDaHWa+aznwQrB6M6WyHtXKpfwCOxgH0m4Ng9WDMeLaDHR4ANjfB6qE044gXbDr+TwCbjGD1UHJ9D/hBm2EU2CECwAqCFcAsNkM4BIZxuQUAgEFmHrGqqluTfCPJfUnu7e5dVfWoJJcn2ZHk1iTndfefzfpZAMwhx8rBXxg1FfgPu/srK16/OslHuvuNVfXq6fWFgz4LgHni+NFlAiZZv2OsnpvknOn5pUn+MIIVAJuZ4+3ImGOsOsmHq+raqtoztZ3e3Qem53cmOX3A5wBsPtMoh1vSwOYwYsTqqd29v6r+VpJrquoLKxd2d1dVH7nRFML2JMn27e6MzvEbcdsHp7gzN0yjwaYyc7Dq7v3Tvwer6v1Jzk5yV1Vt7e4DVbU1ycFVttubZG+S7Nq1668ELzgat30AYF7NNBVYVQ+vqkccfp7kmUluTHJVkvOn1c5P8sFZPgfmzdZt203dAPBXzDpidXqS909fFCcleU93//eq+nSSK6rqZUluS3LejJ8Dc2XWUTMjZgCb00zBqru/nOQnVmm/O8kzZnlvAIBF48rrAACDCFYAAIO4CTMPPVcnBkbze4U5IVjx0HN1YmA0v1eYE6YCAQAGEawAAAYRrAAABhGsAAAGcfA6JyZnEAGwDgQrTkzOIAJgHZgKBAAYxIgVLCrTmQBzR7CCRWU6E2DumAoEABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhkzcGqqs6sqo9W1eer6nNV9aqp/XVVtb+qrpsezx5XLgDA/Dpphm3vTfJvu/szVfWIJNdW1TXTsrd295tnLw8AYHGsOVh194EkB6bn36iqm5KcMaowAIBFM+QYq6rakeQnk3xyarqgqq6vqkuq6tQRnwEAMO9mDlZV9SNJrkzyq9399SQXJ3lckp1ZHtG66Cjb7amqfVW179ChQ7OWAQCw4WYKVlV1cpZD1bu7+31J0t13dfd93X1/krcnOXu1bbt7b3fv6u5dS0tLs5QBADAXZjkrsJK8I8lN3f2WFe1bV6z2/CQ3rr08AIDFMctZgX8/yYuS3FBV101tr0myu6p2JukktyZ5+UwVAgAsiFnOCvx4klpl0dVrLwcAYHG58joAwCCCFQDAIIIVAMAgghUAMD+2nJyqWvNj67btG1r+LGcFAgCMdd/38+gLP7TmzW9703MGFvPgGbECABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYZN2CVVWdW1VfrKqbq+rV6/U5A
ADzYl2CVVVtSfK7SZ6V5IlJdlfVE9fjswAA5sV6jVidneTm7v5yd38vyXuTPHedPgsAYC6sV7A6I8ntK17fMbUBAGxa1d3j37Tql5Kc293/fHr9oiQ/090XrFhnT5I908snJPniMd72tCRfGV7siUt/jqdPx9Kf4+nT8fTpWIvSn4/u7qXVFpy0Th+4P8mZK15vm9r+QnfvTbL3eN+wqvZ1964x5aE/x9OnY+nP8fTpePp0rM3Qn+s1FfjpJGdV1WOq6mFJXpDkqnX6LACAubAuI1bdfW9VXZDkfyTZkuSS7v7cenwWAMC8WK+pwHT31UmuHviWxz1tyHHRn+Pp07H053j6dDx9OtbC9+e6HLwOAHAicksbAIBB5j5YuTXOeFV1a1XdUFXXVdW+ja5nEVXVJVV1sKpuXNH2qKq6pqq+NP176kbWuEiO0p+vq6r90356XVU9eyNrXCRVdWZVfbSqPl9Vn6uqV03t9tE1eoA+tZ+uUVX9cFV9qqr+ZOrT/zC1P6aqPjl9718+nQS3MOZ6KnC6Nc6fJvm5LF9k9NNJdnf35ze0sAVXVbcm2dXdi3CtkLlUVU9P8s0k7+ruJ09tv5nknu5+4/RHwKndfeFG1rkojtKfr0vyze5+80bWtoiqamuSrd39map6RJJrkzwvyUtiH12TB+jT82I/XZOqqiQP7+5vVtXJST6e5FVJ/k2S93X3e6vqbUn+pLsv3shaH4x5H7FyaxzmUnd/LMk9RzQ/N8ml0/NLs/xLl+NwlP5kjbr7QHd/Znr+jSQ3ZfnuF/bRNXqAPmWNetk3p5cnT49O8rNJfn9qX7j9dN6DlVvjrI9O8uGquna6Aj5jnN7dB6bndyY5fSOL2SQuqKrrp6lC01ZrUFU7kvxkkk/GPjrEEX2a2E/XrKq2VNV1SQ4muSbJLUm+2t33Tqss3Pf+vAcr1sdTu/unkjwrySunaRgG6uU59vmdZ18MFyd5XJKdSQ4kuWhjy1k8VfUjSa5M8qvd/fWVy+yja7NKn9pPZ9Dd93X3zizfoeXsJD+2wSXNbN6D1TFvjcOD1937p38PJnl/lndmZnfXdBzG4eMxDm5wPQutu++afunen+TtsZ8+KNMxK1cmeXd3v29qto/OYLU+tZ+O0d1fTfLRJH8vySlVdfg6mwv3vT/vwcqtcQarqodPB16mqh6e5JlJbnzgrThOVyU5f3p+fpIPbmAtC+9wAJg8P/bT4zYdFPyOJDd191tWLLKPrtHR+tR+unZVtVRVp0zP/3qWT1S7KcsB65em1RZuP53rswKTZDp19T/mL2+N84YNLmmhVdVjszxKlSxfef89+vTBq6rLkpyT5Tux35XktUk+kOSKJNuT3JbkvO52QPZxOEp/npPl6ZVOcmuSl684PogHUFVPTfK/ktyQ5P6p+TVZPibIProGD9Cnu2M/XZOq+vEsH5y+JcsDPVd09+un76n3JnlUks8meWF3f3fjKn1w5j5YAQAsinmfCgQAWBiCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIP8f8PbcbGzirb8AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Finally, we can analyze from days of week as frequency." 
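, "\n", "\n", "Keep in mind that `weekday()` numbers the days from Monday (0) to Sunday (6). A minimal sketch of an alternative that avoids a hand-written lookup table is to let pandas return the day name directly; the sample dates and variable names below are hypothetical.\n", "\n", "```python\n", "import pandas as pd\n", "\n", "# Hypothetical sample; 2024-01-01 was a Monday\n", "dates = pd.Series(pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-07', '2024-01-08']))\n", "\n", "# Display order matching the Monday=0 ... Sunday=6 numbering\n", "order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n", "\n", "# day_name() returns the English day name for each date\n", "counts = dates.dt.day_name().value_counts().reindex(order, fill_value=0)\n", "print(counts)\n", "```\n", "\n", "Reindexing keeps the bars in calendar order and shows days with no records as zero."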
], "metadata": { "id": "8Om6hNk0Tq3q" } }, { "cell_type": "code", "source": [ "# weekday() returns 0 for Monday through 6 for Sunday, so the lookup tuple starts on Monday\n", "weekDays = (\"Monday\",\"Tuesday\",\"Wednesday\",\"Thursday\",\"Friday\",\"Saturday\",\"Sunday\")\n", "dates_clean_weekday = [weekDays[item.weekday()] for item in dates_clean]\n", "f = plt.figure()\n", "f.set_figwidth(10)\n", "f.set_figheight(5)\n", "plt.hist(dates_clean_weekday, bins=7, edgecolor='black')\n", "plt.title(\"WeekDays\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "id": "XRMaxn0iT1ec", "outputId": "d5f9003e-4773-4c64-a98d-f303459da929" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "In this use case, being a dataset of dates of birth, we can do the analysis as simple dates (as we have done) or transform them to ages." ], "metadata": { "id": "8ZzZ9cNwUTlY" } }, { "cell_type": "code", "source": [ "def calculate_age(birth_date):\n", " today = date.today()\n", " age = today.year - birth_date.year - ((today.month, today.day) < (birth_date.month, birth_date.day))\n", " \n", " return age\n", " \n", "\n", "ages = [calculate_age(item) for item in dates_clean]\n", "f = plt.figure()\n", "f.set_figwidth(10)\n", "f.set_figheight(5)\n", "\n", "plt.hist(ages, bins=40, edgecolor='black')\n", "plt.title(\"Ages\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "id": "4Jn56YFB50R6", "outputId": "28f7d403-fc9a-4201-e5a1-79e1004288ba" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAE/CAYAAACEto0QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAVqklEQVR4nO3df6xed30f8PdnThoxykporODGcRxQoA1b51IvMBVQOlqaRBGB/ciSoZAxNoOUSK3UaQEmFTStUrMRkKp2qYJIEyYIZIMUFmUtWdRBKy0UB6IQQlMSGjfx7MQQRlCpWOJ89sc9Xi/ONfa9z/f63uf69ZKO7nm+55zn+VwfP1dvfb/nfE91dwAAmN3fWOsCAAA2CsEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKmCtV9T+r6ttVdcpa1wJwOMEKmBtVtT3Ja5N0kjeuaTEASxCsgHny1iR3J7kpyZWHGqvqx6vqv1XVU1X1xar691X1x4u2/2RV3VlVT1bVg1V16aJtF1XVA1X13araW1X/+nj+QsDGctJaFwCwDG9N8oEkX0hyd1Wd3t2PJ/ntJH+Z5MVJtif5gyR7kqSqnp/kziS/luTCJH8nyZ1VdX93P5Dkw0ku7e4/qqpTk5x9fH8lYCPRYwXMhap6TZKzktza3fckeTjJP6uqTUn+UZL3dvf3prB086JDL07ySHf/bnc/091fTvLJJP9k2v50knOr6m9197e7+0vH7ZcCNhzBCpgXVyb5bHd/c3r9saltcxZ63x9dtO/i9bOSvKqq/s+hJclbstC7lSyEsouS7Kmqz1XV31/NXwLY2AwFAuteVT0vyaVJNlXV/qn5lCQvTHJ6kmeSbE3yZ9O2Mxcd/miSz3X3Ly713t39xSSXVNXJSa5OcuthxwMcMz1WwDx4U5KDSc5NsmNafirJH2XhuqtPJXlfVf3NqvrJqe2Q25O8rKquqKqTp+XvVdVPVdWPVNVbqurHuvvpJE8lefZ4/mLAxiJYAfPgyiS/291/0d37Dy1JfisLw3pXJ/mxJPuT/OcktyT5fpJ093eTvCHJZUn+97TPtVno8UqSK5I8UlVPJXnn9H4AK1LdvdY1AAxVVdcmeXF3X3nUnQEG0mMFzL1pnqqfrgXnJXl7ktvWui7gxOPidWAjeEEWhv9+IsnjSa5L8uk1rQg4IRkKBAAYxFAgAMAgghUAwCDr4hqr0047rbdv377WZQAAHNU999zzze7evNS2dRGstm/fnt27d691GQAAR1VVe460zVAgAMAgghUAwCCCFQDAIIIVAMAgRw1WVXVjVT1RVfcvavtEVd07LY9U1b1T+/aq+qtF235nNYsHAFhPjuWuwJuy8AT5jxxq6O5/emi9qq5L8p1F+z/c3TtGFQgAMC+OGqy6+/NVtX2pbVVVSS5N8g/GlgUAMH9mvcbqtUke7+6vL2o7u6q+XFWfq6rXzvj+AABzY9YJQi/PwhPlD9mXZFt3f6uqfjbJ71XVK7r7qcMPrKpdSXYlybZt22YsAwBg7a24x6qqTkryD5N84lBbd3+/u781rd+T5OEkL1vq+O6+obt3dvfOzZuXnBUeAGCuzDIU+AtJ/rS7HzvUUFWbq2rTtP6SJOck+cZsJQIAzIdjmW7hliT/K8nLq+qxqnr7tOmy/OAwYJK8Lsl90/QL/zXJO7v7yZEFs7Fs2botVbWsZctWQ8cArE/V3WtdQ3bu3Nkewnxiqqqcdc3tyzpmz7UXZz38vwXgxFRV93T3zqW2mXkdAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBimG2bN2WqlrWAgAbyUlrXQAbx/69j+asa25f1jF7rr14laoBgONPjxUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgg
hUAwCCCFQDAIIIVAMAgghUAwCCCFQDAIIIVAMAgghUAwCBHDVZVdWNVPVFV9y9qe19V7a2qe6flokXb3l1VD1XVg1X1S6tVOMuzZeu2VNWyli1bt6112XPFvzEAJx3DPjcl+a0kHzms/YPd/f7FDVV1bpLLkrwiyU8k+R9V9bLuPjigVmawf++jOeua25d1zJ5rL16lajYm/8YAHLXHqrs/n+TJY3y/S5J8vLu/391/nuShJOfNUB8AwNyY5Rqrq6vqvmmo8NSp7Ywkjy7a57Gp7TmqaldV7a6q3QcOHJihDACA9WGlwer6JC9NsiPJviTXLfcNuvuG7t7Z3Ts3b968wjIAANaPFQWr7n68uw9297NJPpS/Hu7bm+TMRbtundoAADa8FQWrqtqy6OWbkxy6Y/AzSS6rqlOq6uwk5yT5k9lKBACYD0e9K7CqbklyfpLTquqxJO9Ncn5V7UjSSR5J8o4k6e6vVtWtSR5I8kySq9wRCACcKI4arLr78iWaP/xD9v/1JL8+S1EAAPPIzOsAAIMIVgAAgwhWAACDCFYAAIMIVgAAgwhWnBC2bN2WqjrmZcvWbWtdMgBz6KjTLcBGsH/voznrmtuPef891168itUAsFHpsQIAGESwAgAYRLACABhEsAIAGMTF6xzZppNTVWtdBQDMDcGKIzv4tDvpAGAZDAUCAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYwRxZ7sOkPVAa4Pgy3QLMkeU+TDoxDQbA8aTHCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDTLcBSNp2cqlrrKgCYM4IVLOXg0+aLAmDZjjoUWFU3VtUTVXX/orb/WFV/WlX3VdVtVfXCqX17Vf1VVd07Lb+zmsUDAKwnx3KN1U1JLjis7c4kf7u7fzrJnyV596JtD3f3jml555gyAQDWv6MGq+7+fJInD2v7bHc/M728O8nWVagNAGCujLgr8F8k+e+LXp9dVV+uqs9V1WsHvD8AwFyY6eL1qvq3SZ5J8tGpaV+Sbd39rar62SS/V1Wv6O6nljh2V5JdSbJt27ZZygAAWBdW3GNVVf88ycVJ3tLdnSTd/f3u/ta0fk+Sh5O8bKnju/uG7t7Z3Ts3b9680jIAANaNFQWrqrogyb9J8sbu/t6i9s1VtWlaf0mSc5J8Y0ShAADr3VGHAqvqliTnJzmtqh5L8t4s3AV4SpI7p0kU757uAHxdkn9XVU8neTbJO7v7ySXfGFbK5J0ArFNHDVbdffkSzR8+wr6fTPLJWYuCH8rknQCsU54VCAAwiGAFADCIYAUAMIhgBQAwiGAFADCIYAUAMIhgBQAwyEzPCgRmZLJTgA1FsIK1tMzJTk10CrC+GQoEABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABjkmIJVVd1YVU9U1f2L2l5UVXdW1denn6dO7VVVv1lVD1XVfVX1ytUqHgBgPTnWHqubklxwWNu7ktzV3eckuWt6nSQXJjlnWnYluX72MgEA1r9jClbd/fkkTx7WfEmSm6f1m5O8aVH7R3rB3UleWFVbRhQLALCezXKN1endvW9a35/k9Gn9jCSPLtrvsakNAGBDG3Lxend3kl7OMVW1q6p2V9XuAwcOjCgDAGBNzRKsHj80xDf9fGJq35vkzEX7bZ3afkB339DdO7t75+bNm2coAwBgfZglWH0myZXT+pVJPr2o/a3T3YGvTvKdRUOGAAAb1knHslNV3ZLk/CSnVdVjSd6b5DeS3FpVb0+yJ8ml0+53JLkoyUNJvpfkbYNrBgBYl44pWHX35UfY9Pol9u0kV81SFADAPDLzOgDAIIIVAMAgghUAwCCCFWx0m05OVS1r2bJ121pXDTCXjunidWCOHXw6Z11z+7IO2XPtxatUDMDGpscKAGAQw
QoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgEMEKAGAQwQoAYBDBCgBgkJNWemBVvTzJJxY1vSTJryV5YZJ/leTA1P6e7r5jxRUCAMyJFQer7n4wyY4kqapNSfYmuS3J25J8sLvfP6RCAIA5MWoo8PVJHu7uPYPeDwBg7owKVpcluWXR66ur6r6qurGqTh30GQAA69rMwaqqfiTJG5P8l6np+iQvzcIw4b4k1x3huF1Vtbuqdh84cGCpXQAA5sqIHqsLk3ypux9Pku5+vLsPdvezST6U5LylDuruG7p7Z3fv3Lx584AyAADW1ohgdXkWDQNW1ZZF296c5P4BnwEAsO6t+K7AJKmq5yf5xSTvWNT8H6pqR5JO8shh2wAANqyZglV3/2WSHz+s7YqZKgIAmFNmXgcAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwWge2bN2WqjrmZcvWbWtdMgCwhJPWugCS/XsfzVnX3H7M+++59uJVrAYAWCk9VgAAgwhWAACDCFYAAIMIVgAAgwhWAACDCFYAAIMIVgAAg8w8j1VVPZLku0kOJnmmu3dW1YuSfCLJ9iSPJLm0u78962cBAKxno3qsfr67d3T3zun1u5Lc1d3nJLlreg0AsKGt1lDgJUluntZvTvKmVfocAIB1Y0Sw6iSfrap7qmrX1HZ6d++b1vcnOX3A5wAArGsjgtVruvuVSS5MclVVvW7xxu7uLISvH1BVu6pqd1XtPnDgwIAyxlvuw5E9IBkATmwzX7ze3Xunn09U1W1JzkvyeFVt6e59VbUlyRNLHHdDkhuSZOfOnc8JXuvBch+OnHhAMgCcyGbqsaqq51fVCw6tJ3lDkvuTfCbJldNuVyb59CyfAwAwD2btsTo9yW1Vdei9Ptbdv19VX0xya1W9PcmeJJfO+DkAAOveTMGqu7+R5O8u0f6tJK+f5b0BAOaNmdcBAAYRrAAABhGsgJmtZGqSk055nulMgA1n5ukWAFY6NYnpTICNRo8VAMAgeqxG23RypuknAIATjGA12sGnDW8AwAlKsAKeS88rwIoIVsBzLbPnVa8rwAIXrwMADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAMYoLQeWRWbABYlwSreeR5hACwLhkKBAAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYRLACABhEsAIAGESwAgAYZMXBqqrOrKo/rKoHquqrVfXLU/v7qmpvVd07LReNKxcAYP2a5VmBzyT51e7+UlW9IMk9VXXntO2D3f3+2csDAJgfKw5W3b0vyb5p/btV9bUkZ4wqDABg3gy5xqqqtif5mSRfmJqurqr7qurGqjr1CMfsqqrdVbX7wIEDI8oAAFhTMwerqvrRJJ9M8ivd/VSS65O8NMmOLPRoXbfUcd19Q3fv7O6dmzdvnrUMAIA1N1OwqqqTsxCqPtrdn0qS7n68uw9297NJPpTkvNnLBABY/2a5K7CSfDjJ17r7A4vatyza7c1J7l95eQAA82OWuwJ/LskVSb5SVfdObe9JcnlV7UjSSR5J8o6ZKgQAmBOz3BX4x0lqiU13rLwcAID5ZeZ1AIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEKwCAQQQrAIBBBCsAgEEEK2B+bDo5VXXMy5at29a6YuAEc9JaFwBwzA4+nbOuuf2Yd99z7cWrWAzAc+mxAgAYRLACAI6LLVu3bfjhfEOBAMBxsX/vo8sbzn//m
1NVy/qMF59xZvY99hfLLW0YwQoAWJ+WeV1lsvbXVhoKBIAsf5jqeA1VraSuk0553qruv9JjTgQnVI/Vlq3bsn/vo2tdBgDr0HKHqZLj0zuy0rqWewftan/GoWM2uhMqWC17bPcE+A8AAIxjKBAAVsqktRzmhOqxAoChTFrLYVatx6qqLqiqB6vqoap612p9DgDAerEqwaqqNiX57SQXJjk3yeVVde5qfBYAwHqxWj1W5yV5qLu/0d3/N8nHk1yySp8FAPNhmddknShTFGwkq3WN1RlJFs9r8FiSV63SZwHAfJjDCS9Znuru8W9a9Y+TXNDd/3J6fUWSV3X31Yv22ZVk1/Ty5UkeHF7IWKcl+eZaF8GKOX/zy7mbX87d/HLufrizunvzUhtWq8dqb5IzF73eOrX9f919Q5IbVunzh6uq3d29c63rYGWcv/nl3M0v525+OXcrt1rXWH0xyTlVdXZV/UiSy5J8ZpU+CwBgXViVHqvufqaqrk7yB0k2Jbmxu7+6Gp8FALBerNoEod19R5I7Vuv918DcDFuyJOdvfjl388u5m1/O3QqtysXrAAAnIs8KBAAYRLA6Bh7PM1+q6pGq+kpV3VtVu6e2F1XVnVX19ennqWtdJ0lV3VhVT1TV/YvaljxXteA3p+/hfVX1yrWrnOSI5+99VbV3+v7dW1UXLdr27un8PVhVv7Q2VZMkVXVmVf1hVT1QVV+tql+e2n3/ZiRYHYXH88ytn+/uHYtuF35Xkru6+5wkd02vWXs3JbngsLYjnasLk5wzLbuSXH+cauTIbspzz1+SfHD6/u2YrrfN9HfzsiSvmI75T9PfV9bGM0l+tbvPTfLqJFdN58j3b0aC1dF5PM/GcEmSm6f1m5O8aQ1rYdLdn0/y5GHNRzpXlyT5SC+4O8kLq2rL8amUpRzh/B3JJUk+3t3f7+6/SfJQFv6+sga6e193f2la/26Sr2XhqSm+fzMSrI5uqcfznLFGtXBsOslnq+qeaYb/JDm9u/dN6/uTnL42pXEMjnSufBfnx9XTcNGNi4bdnb91qqq2J/mZJF+I79/MBCs2otd09yuz0HV9VVW9bvHGXrgV1u2wc8C5mkvXJ3lpkh1J9iW5bm3L4Yepqh9N8skkv9LdTy3e5vu3MoLV0R318TysL929d/r5RJLbsjDc8Pihbuvp5xNrVyFHcaRz5bs4B7r78e4+2N3PJvlQ/nq4z/lbZ6rq5CyEqo9296emZt+/GQlWR+fxPHOkqp5fVS84tJ7kDUnuz8I5u3La7cokn16bCjkGRzpXn0ny1unupFcn+c6iIQvWicOuu3lzFr5/ycL5u6yqTqmqs7NwEfSfHO/6WFBVleTDSb7W3R9YtMn3b0arNvP6RuHxPHPn9CS3LfzNyElJPtbdv19VX0xya1W9PcmeJJeuYY1MquqWJOcnOa2qHkvy3iS/kaXP1R1JLsrCRc/fS/K2414wP+AI5+/8qtqRhSGkR5K8I0m6+6tVdWuSB7JwR9pV3X1wLeomSfJzSa5I8pWqundqe098/2Zm5nUAgEEMBQIADCJYAQAMIlgBAAwiWAEADCJYAQAMIlgBAAwiWAEADCJYAQAM8v8ABod0qxOO5NIAAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "Now, let’s calculate the basic statistics by age and visualize them with a boxplot, which shows:\n", "- min whisker (limit: Q1 - 1.5*IQR)\n", "- Q1 (25th percentile)\n", "- median\n", "- Q3 (75th percentile)\n", "- max whisker (limit: Q3 + 1.5*IQR)\n", "- outliers at both 
ends of the box figure" ], "metadata": { "id": "QHEgQoRZoRvU" } }, { "cell_type": "code", "source": [ "import statistics\n", "print(\"Min ages: \" + str(min(ages)))\n", "print(\"Max ages: \" + str(max(ages)))\n", "print(\"Mean ages: \" + str(statistics.mean(ages)))\n", "print(\"Median ages: \" + str(statistics.median(ages)))\n", "print(\"Standard deviation ages: \" + str(statistics.stdev(ages)) +\"\\n\")\n", "\n", "# Creating plot\n", "f = plt.figure()\n", "f.set_figwidth(10)\n", "f.set_figheight(5)\n", " \n", "red_circle = dict(markerfacecolor='red', marker='o')\n", "plt.boxplot(x=ages, vert=False, patch_artist=True, flierprops=red_circle, labels=['Ages']);\n", "plt.title(\"Ages\")\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 440 }, "id": "_CYBfiBhPmNW", "outputId": "e443b1ef-8376-48d2-8743-fac7bb91b8cd" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Min ages: -7\n", "Max ages: 221\n", "Mean ages: 53.85601681555439\n", "Median ages: 50\n", "Standard deviation ages: 33.66534871034922\n", "\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAE/CAYAAACTlB3ZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAATBUlEQVR4nO3df6zdd33f8df7XschiU1M4jSEYPtGGxX50TWjaVdUttFfFLJocQXLQjIIENWrDO4qtRWslgbS5mn5o12HRyzRlh/rTBBKILRRBE1RShtVtEkYtPlROtb6ENIklKSQX0oCzqd/nHO94+t7fJ3L/fj42o+HdHTv+Z7v93vf3+/Xx37qnHOTaq0FAIB+ZqY9AADA8U5wAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFrHpV9YdV9fdVdfK0ZwFYjOACVrWqmkvyz5O0JP96qsMATCC4gNXurUm+kOQjSa6ZX1hVZ1bV71XV41V1Z1X9l6q6Y+zxV1bVbVX1WFV9paquGHvs0qq6r6qeqKoHq+qXj+YBAcefNdMeAOB79NYkv57kT5N8oarObq09kuQDSZ5K8tIkc0k+m2SQJFV1WpLbkvynJG9I8gNJbquqe1pr9yX57SRXtNb+uKpekuS8o3tIwPHGK1zAqlVVr0myJcknWmt3J/l/Sa6qqtkkb0zy3tba06OI+ujYppcl2dda+3Br7buttf+T5KYk/2b0+HeSXFBVL26t/X1r7YtH7aCA45LgAlaza5L8fmvtm6P7HxstOyvDV/AfGFt3/PstSf5ZVX1r/pbk6gxfDUuGsXZpkkFVfb6qXt3zIIDjn7cUgVWpqk5JckWS2ap6eLT45CQbkpyd5LtJXp7kr0aPbRrb/IEkn2+t/fRi+26t3Znk8qo6Kcm7knxiwfYAL4hXuIDVamuS/UkuSHLx6HZ+kj/O8HNdn0zyvqo6tapeOVo275Yk319Vb6mqk0a3H66q86tqbVVdXVWnt9a+k+TxJM8fzQMDjj+CC1itrkny4dba11prD8/fkvzPDN8efFeS05M8nOR3ktyQ5Nkkaa09keR1Sa5M8rejda7L8BWyJHlLkn1V9XiSnx/tD2DZqrU27RkAuquq65K8tLV2zZIrA6wwr3ABx6XRf2frn9TQjyS5Nsmnpj0XcGLyoXngeLU+w7cRX5bkkSS/luTTU50IOGF5SxEAoDNvKQIAdCa4AAA6O6Y/w7Vx48Y2Nzc37TEAAJZ09913f7O1dtZijx3TwTU3N5e77rpr2mMAACypqgaTHvOWIgBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdC
S4AgM4EFwBAZ4ILAKAzwQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ4ILAKAzwUU357x8c6rqmL3lfadPfYYXejvn5ZunfVkBWIY10x6A49fDDz6QLe++ZdpjHMZVx/h8hxpcd9m0RwBgGbzCBQDQmeACAOhMcAEAdCa4AAA6E1wAAJ0JLgCAzgQXAEBnggsAoLMTPriqatojACvAcxk4lp3wwQUA0JvgAgDoTHABAHQmuAAAOhNcAACdHVFwVdXWqmpV9creAwEAHG+O9BWuNye5Y/QVAE44N+zdm4vm5jI7M5OXrV+fdTMzmanK6VUHvr7+p35q4vY7tm/PGWvWHLTN/P3zNm7Mpo0bMzszk4vm5nLD3r1HNMdFc3PZsX378P7Y/taffPKScx3Yz4JjOOjrzEzWLVh2xpo12bF9+yHH9KIF2//ghRcuOu8Ne/ceWHa48zdpnfGvG2Zn833r1h1yLhauO1u15Hntbcngqqp1SV6T5NokV46WzVTV9VX1l1V1W1XdWlVvGj32Q1X1+aq6u6o+W1XnjJb/QlXdV1V/XlUf73hMALCibti7Nzu3bcvuwSDPtJa9Tz6ZM1vLR5PcnGQuyY4kX/7c5xaNmx3bt+fGPXuyff/+zI22eTbJTaP7Vz/6aGYefTQfaS27B4Ps3LZt0ThYOMfWwSA37tkzvD/a3zlJ1jz33IGfcfMicx20n7Fj+IkkZ4/Nd3NreUmSsxbMfOOePfnBCy/MjXv25Kb9+/PzSV4yvl2Sb9x3X7ace+5B8+4eDPKLb3973vOOd2TrYHDQuRifc36+VwwGB88zmm/H6P6nnn8+pzz11IHzduOePTlltM2O0THdnOSZ5LDn9Wio1trhV6i6OslPtNaurao/yfAYzkvyjiSXJfm+JPcn+bkkn07y+SSXt9b+rqr+bZKfaa29o6r+Nsl5rbVnq2pDa+1bSw13ySWXtLvuuut7Ob4lVVWWOgcsT1Vly7tvmfYYE+170VWZe+Zj0x7jBRlcd5k/rxN4LtPTRXNz2T0Y5MfHlt2e4T+I94x9vzvJ1iTfXvBn8Yw1a3LT/v0H1llsP7sX7m/Lltyzb99h57howv5+LslXFywbn2vS8bwxyU0vYH83j9Y9Y8J24+vM+8dJfnPsmBfbZtOWLdk9GEycZ/48Lbw//jMnnutFzutKqaq7W2uXLPrYEQTXLUn+R2vttqr6hSSbk6xJ8uXW2odH63wyyceS/GWSP0ny16PNZ5M81Fp7XVV9JsmTGZ6Hm1trT074eduSbEuSzZs3/9BgMHhBB/tC+a9T9yW4VtbgusumPcIxTXDRy+zMTJ5pLSeNLftOkhcl2T/2/TNJTk7y/II/izNVeXZsncX288zC/VVl//PPH3aO2cPsb/+CZeNzTTqekzN85ehI9ze/7syE7Rbb3/zMk87FyRn+2/xMa0c0z8LrML/NxHO9yHldKYcLrsO+pVhVZ2T4CuNvVdW+JL+S5IokkyqlktzbWrt4dPuB1trrRo/9qyQfSPKqJHdW1ZrFdtBa+2Br7ZLW2iVnnXXWUse2Ilprbh1u9DHt63qs3qCn8zdvzh0Llt2R5PwF39+RZP0i22+YnT1oncX2c8j+Nm9eco5J+ztvkWXjc006ng0vcH/z607abv0iy89LDnsu1o/NN2m/50+4P/4zJ57rRc7r0bDUZ7jelOR3WmtbWmtzrbVNSf4myWNJ3jj6LNfZSV47Wv8rSc6qqlcnSVWdVFUXVtVMkk2ttduTvDvJ6UnWdTgeAFhxO3ftyrWnnprbM3yV5PYkb8vwH7TbM/yQ89YkVyV59U/+5CHbX71tW64arXPtaJvvLNj2oP2demp27tq15BzzP3N8f/8uyaMLli2ca7HjuTbJv1xkf29N8sQi+5u74IID6165yHZXJdnwspcd8nOeOOmkvG3t2kXPxfyc8/MtNs/8eVzsOlyV5PvH1jnkXE84r0fDoq8yjXlzkusWLLspw3D8e
pL7kjyQ5ItJvt1ae2704fn3V9Xpo/3/RpK/SvK/R8sqyfvbEXyGCwCOBW+++uokyY6dO3P/176Ws087LY8/9VSuaS3rMwyS3RnGwmf+4A8O2X739dcnSa7/4Afzrf37s3W0zYbZ2Xxr//7sPfPMPJ/kbY89lvM3b86uXbsO/MzDzXH+5s1506WXZsett+b+wSCnj/Z32tq12frcc3kiw1d8Fs510H4Gg6wbzfPo6Ov8fOursr+1PD22bMPsbK7eti27r78+O7ZvzxtHx7R2fLsMg+zL996bG/buPWje3xgFz66dO7NvMDhom4Vz7tq5M48sWGf+XP/XJC+emcnaU07J255++sC5uP3WW/PIYJDdY8fyZJLzt2yZeF6PhiU/wzVxw6p1rbUnq+rMJH+W5Mdaaw+v5HA+NL+6+dD8yvOh+ck8l4FpO9xnuJZ6hetwbqmqDUnWJvnPKx1bAADHi2UHV2vttSs4BwDAccv/SxEAoDPBBQDQmeACAOhMcAEAdHbCB5dfI4fjg+cycCw74YMLAKA3wQUA0JngAgDoTHABAHQmuAAAOhNcAACdCS4AgM4EFwBAZ2umPQDHr5eeuymD6y6b9hiTvffFx/Z8i3jpuZumPQIAyyC46Oahr39t2iMsqb1v2hMAcCLwliIAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGeCCwCgs2qtTXuGiarq75IMpj3HEjYm+ea0h2BZXLvVy7VbvVy71c31O7wtrbWzFnvgmA6u1aCq7mqtXTLtOXjhXLvVy7VbvVy71c31Wz5vKQIAdCa4AAA6E1zfuw9OewCWzbVbvVy71cu1W91cv2XyGS4AgM68wgUA0JngWqaqen1VfaWqvlpV75n2PCytqvZV1V9U1Zeq6q7RsjOq6raq+r+jry+Z9pwkVfWhqvpGVd0ztmzRa1VD7x89F/+8ql41vcmZcO3eV1UPjp57X6qqS8ce+4+ja/eVqvqZ6UxNklTVpqq6varuq6p7q+o/jJZ77q0AwbUMVTWb5ANJ3pDkgiRvrqoLpjsVR+jHW2sXj/1a83uSfK619ooknxvdZ/o+kuT1C5ZNulZvSPKK0W1bkj1HaUYW95Eceu2S5L+PnnsXt9ZuTZLR35tXJrlwtM31o79fmY7vJvml1toFSX40yTtH18hzbwUIruX5kSRfba39dWvtuSQfT3L5lGdieS5P8tHR9x9NsnWKszDSWvujJI8tWDzpWl2e5H+1oS8k2VBV5xydSVlowrWb5PIkH2+tPdta+5skX83w71emoLX2UGvti6Pvn0hyf
5Jz47m3IgTX8pyb5IGx+18fLePY1pL8flXdXVXbRsvObq09NPr+4SRnT2c0jsCka+X5uDq8a/S204fG3rp37Y5RVTWX5J8m+dN47q0IwcWJ5DWttVdl+DL4O6vqX4w/2Ia/suvXdlcB12rV2ZPkHyW5OMlDSX5tuuNwOFW1LslNSX6xtfb4+GOee8snuJbnwSSbxu6/fLSMY1hr7cHR128k+VSGb108Mv8S+OjrN6Y3IUuYdK08H49xrbVHWmv7W2vPJ/nN/P+3DV27Y0xVnZRhbO1trX1ytNhzbwUIruW5M8krquq8qlqb4Yc+f3fKM3EYVXVaVa2f/z7J65Lck+F1u2a02jVJPj2dCTkCk67V7yZ56+g3pn40ybfH3v7gGLDgcz0/m+FzLxleuyur6uSqOi/DD1//2dGej6GqqiS/neT+1tqvjz3kubcC1kx7gNWotfbdqnpXks8mmU3yodbavVMei8M7O8mnhn+fZE2Sj7XWPlNVdyb5RFVdm2SQ5IopzshIVd2Q5LVJNlbV15O8N8l/y+LX6tYkl2b4geunk7z9qA/MAROu3Wur6uIM34ral+TfJ0lr7d6q+kSS+zL8Dbl3ttb2T2NukiQ/luQtSf6iqr40Wvar8dxbEf5L8wAAnXlLEQCgM8EFANCZ4AIA6ExwAQB0JrgAADoTXAAAnQkuAIDOBBcAQGf/AJRn7PybjmjEAAAAAElFTkSuQmCC\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "**Conclusion**\n", "We can use basic statistical methods to perform a data quality analysis in order to detect potential errors and outliers. Coding with Python we can easily calculate the statistics and represent them graphically to help their interpretation.\n", "\n" ], "metadata": { "id": "vjXzriiGwwBz" } } ] }