{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "A6iFUUQLNDlE" }, "source": [ "\n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "rHEGnwRZTG6O" }, "source": [ "# Module 8: Histogram and CDF\n", "\n", "A deep dive into Histogram and boxplot." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2020-06-14T19:55:00.229Z", "iopub.status.busy": "2020-06-14T19:55:00.202Z", "iopub.status.idle": "2020-06-14T19:55:00.311Z", "shell.execute_reply": "2020-06-14T19:55:00.333Z" }, "executionInfo": { "elapsed": 184, "status": "ok", "timestamp": 1687818245973, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "S4vGQ3FkTG6R" }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import seaborn as sns\n", "import altair as alt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "id": "N6blkVDDTG6T" }, "source": [ "## The tricky histogram with pre-counted data" ] }, { "cell_type": "markdown", "metadata": { "id": "95_k7X_-TG6T" }, "source": [ "Let's revisit the table from the class\n", "\n", "| Hours | Frequency |\n", "|-------|-----------|\n", "| 0-1 | 4,300 |\n", "| 1-3 | 6,900 |\n", "| 3-5 | 4,900 |\n", "| 5-10 | 2,000 |\n", "| 10-24 | 2,100 |" ] }, { "cell_type": "markdown", "metadata": { "id": "CeO69PpmTG6U" }, "source": [ "You can draw a histogram by just providing bins and counts instead of a list of numbers. So, let's try that." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2020-06-14T19:55:09.164Z", "iopub.status.busy": "2020-06-14T19:55:09.141Z", "iopub.status.idle": "2020-06-14T19:55:09.196Z", "shell.execute_reply": "2020-06-14T19:55:09.215Z" }, "executionInfo": { "elapsed": 154, "status": "ok", "timestamp": 1687818249521, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "FZPPq6inTG6U" }, "outputs": [], "source": [ "bins = [0, 1, 3, 5, 10, 24]\n", "data = {0.5: 4300, 2: 6900, 4: 4900, 7: 2000, 15: 2100}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:55:11.108Z", "iopub.status.busy": "2020-06-14T19:55:11.089Z", "iopub.status.idle": "2020-06-14T19:55:11.146Z", "shell.execute_reply": "2020-06-14T19:55:11.165Z" }, "executionInfo": { "elapsed": 5, "status": "ok", "timestamp": 1687818250050, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "0kJUBT1hTG6U", "jupyter": { "outputs_hidden": false }, "outputId": "cf1ab5f7-e38b-4ddd-d37c-c993c7108c6a" }, "outputs": [ { "data": { "text/plain": [ "dict_keys([0.5, 2, 4, 7, 15])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.keys()" ] }, { "cell_type": "markdown", "metadata": { "id": "dLAjvNswTG6V" }, "source": [ "**Q: Draw a histogram using this data.** Useful query: [Google search: matplotlib histogram pre-counted](https://www.google.com/search?client=safari&rls=en&q=matplotlib+histogram+already+counted&ie=UTF-8&oe=UTF-8#q=matplotlib+histogram+pre-counted)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 467 }, "execution": { "iopub.execute_input": "2020-06-14T19:55:50.412Z", "iopub.status.busy": "2020-06-14T19:55:50.391Z", "iopub.status.idle": 
"2020-06-14T19:55:50.511Z", "shell.execute_reply": "2020-06-14T19:55:50.533Z" }, "executionInfo": { "elapsed": 624, "status": "ok", "timestamp": 1687818251274, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "wo4Z0fgRTG6V", "jupyter": { "outputs_hidden": false }, "outputId": "008af409-d043-4577-8c87-e513ddc1ab49" }, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Frequency')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "jdeSwaifTG6W" }, "source": [ "As you can see, the **default histogram does not normalize with binwidth and simply shows the counts**! This can be very misleading if you are working with variable bin width (e.g. logarithmic bins). So please be mindful about histograms when you work with variable bins.\n", "\n", "**Q: You can fix this by using the `density` option.**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 467 }, "execution": { "iopub.execute_input": "2020-06-14T19:55:58.905Z", "iopub.status.busy": "2020-06-14T19:55:58.882Z", "iopub.status.idle": "2020-06-14T19:55:58.991Z", "shell.execute_reply": "2020-06-14T19:55:59.009Z" }, "executionInfo": { "elapsed": 610, "status": "ok", "timestamp": 1687818252370, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "4ucfqdvvTG6W", "jupyter": { "outputs_hidden": false }, "outputId": "625b307a-c444-4fb5-f305-c04e22ae41b3" }, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Density')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "4roBIweOTG6W" }, "source": [ "## Let's use an actual dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2020-06-14T19:56:24.048Z", "iopub.status.busy": "2020-06-14T19:56:24.027Z", "iopub.status.idle": "2020-06-14T19:56:24.081Z", "shell.execute_reply": "2020-06-14T19:56:24.100Z" }, "executionInfo": { "elapsed": 151, "status": "ok", "timestamp": 1687818255570, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "xPTrYs55TG6X" }, "outputs": [], "source": [ "import vega_datasets" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 285 }, "execution": { "iopub.execute_input": "2020-06-14T19:56:25.250Z", "iopub.status.busy": "2020-06-14T19:56:25.234Z", "iopub.status.idle": "2020-06-14T19:56:25.670Z", "shell.execute_reply": "2020-06-14T19:56:25.727Z" }, "executionInfo": { "elapsed": 1381, "status": "ok", "timestamp": 1687818257413, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "a244eCSOTG6X", "jupyter": { "outputs_hidden": false }, "outputId": "57724aad-7da7-436c-f146-84e94932e933" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Title US_Gross Worldwide_Gross US_DVD_Sales \\\n", "0 The Land Girls 146083.0 146083.0 NaN \n", "1 First Love, Last Rites 10876.0 10876.0 NaN \n", "2 I Married a Strange Person 203134.0 203134.0 NaN \n", "3 Let's Talk About Sex 373615.0 373615.0 NaN \n", "4 Slam 1009819.0 1087521.0 NaN \n", "\n", " Production_Budget Release_Date MPAA_Rating Running_Time_min Distributor \\\n", "0 8000000.0 Jun 12 1998 R NaN Gramercy \n", "1 300000.0 Aug 07 1998 R NaN Strand \n", "2 250000.0 Aug 28 1998 None NaN Lionsgate \n", "3 300000.0 Sep 11 1998 None NaN Fine Line \n", "4 1000000.0 Oct 09 1998 R NaN Trimark \n", "\n", " Source Major_Genre Creative_Type Director \\\n", "0 None None None None \n", "1 None Drama None None \n", "2 None Comedy None None \n", "3 None Comedy None None \n", "4 Original Screenplay Drama Contemporary Fiction None \n", "\n", " Rotten_Tomatoes_Rating IMDB_Rating IMDB_Votes \n", "0 NaN 6.1 1071.0 \n", "1 NaN 6.9 207.0 \n", "2 NaN 6.8 865.0 \n", "3 13.0 NaN NaN \n", "4 62.0 3.4 165.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies = vega_datasets.data.movies()\n", "movies.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "EuyhowKmTG6X" }, "source": [ "Let's plot the histogram of IMDB ratings." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 484 }, "execution": { "iopub.execute_input": "2020-06-14T19:56:30.176Z", "iopub.status.busy": "2020-06-14T19:56:30.152Z", "iopub.status.idle": "2020-06-14T19:56:30.285Z", "shell.execute_reply": "2020-06-14T19:56:30.306Z" }, "executionInfo": { "elapsed": 393, "status": "ok", "timestamp": 1687818258217, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "lCzR16y5TG6X", "jupyter": { "outputs_hidden": false }, "outputId": "35899e50-84a1-4384-baf8-0c7a177b225f" }, "outputs": [ { "data": { "text/plain": [ "(array([ 9., 39., 76., 133., 293., 599., 784., 684., 323., 48.]),\n", " array([1.4 , 2.18, 2.96, 3.74, 4.52, 5.3 , 6.08, 6.86, 7.64, 8.42, 9.2 ]),\n", " )" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.hist(movies.IMDB_Rating)" ] }, { "cell_type": "markdown", "metadata": { "id": "az_zAN5wTG6Y" }, "source": [ "Did you get an error or a warning? What's going on?\n", "\n", "The problem is that the column contains `NaN` (Not a Number) values, which represent missing data points. The following command checks whether each value is `NaN` and returns the result as a boolean Series." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:56:48.429Z", "iopub.status.busy": "2020-06-14T19:56:48.411Z", "iopub.status.idle": "2020-06-14T19:56:48.468Z", "shell.execute_reply": "2020-06-14T19:56:48.486Z" }, "executionInfo": { "elapsed": 6, "status": "ok", "timestamp": 1687818258701, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "x3jwldMSTG6Y", "jupyter": { "outputs_hidden": false }, "outputId": "bca94b9a-d041-435a-fa39-dcff6d0cef18" }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 True\n", "4 False\n", " ... \n", "3196 False\n", "3197 True\n", "3198 False\n", "3199 False\n", "3200 False\n", "Name: IMDB_Rating, Length: 3201, dtype: bool" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.IMDB_Rating.isna()" ] }, { "cell_type": "markdown", "metadata": { "id": "OxBEfsovTG6Y" }, "source": [ "As you can see, a number of rows are missing a rating. You can count them." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:56:51.688Z", "iopub.status.busy": "2020-06-14T19:56:51.671Z", "iopub.status.idle": "2020-06-14T19:56:51.722Z", "shell.execute_reply": "2020-06-14T19:56:51.739Z" }, "executionInfo": { "elapsed": 148, "status": "ok", "timestamp": 1687818259537, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "3FfJ3a34TG6Y", "jupyter": { "outputs_hidden": false }, "outputId": "1b2524c6-6262-4cc2-b5c6-d16dafeb7c32" }, "outputs": [ { "data": { "text/plain": [ "213" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(movies.IMDB_Rating.isna())" ] }, { "cell_type": "markdown", "metadata": { "id": "i1BRz3o5TG6Z" }, "source": [ "or drop them." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:56:52.894Z", "iopub.status.busy": "2020-06-14T19:56:52.876Z", "iopub.status.idle": "2020-06-14T19:56:52.932Z", "shell.execute_reply": "2020-06-14T19:56:52.950Z" }, "executionInfo": { "elapsed": 173, "status": "ok", "timestamp": 1687818261726, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "vwa6CfbsTG6Z", "jupyter": { "outputs_hidden": false }, "outputId": "f32f7a7b-52f0-496f-95be-427c1968999c" }, "outputs": [ { "data": { "text/plain": [ "2988" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IMDB_ratings_nan_dropped = movies.IMDB_Rating.dropna()\n", "len(IMDB_ratings_nan_dropped)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:56:54.792Z", "iopub.status.busy": "2020-06-14T19:56:54.775Z", 
"iopub.status.idle": "2020-06-14T19:56:54.826Z", "shell.execute_reply": "2020-06-14T19:56:54.843Z" }, "executionInfo": { "elapsed": 156, "status": "ok", "timestamp": 1687818262363, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "vVZ3sm-ITG6Z", "jupyter": { "outputs_hidden": false }, "outputId": "2ad2aa9f-5d76-4e3e-ae66-cab435e19eb0" }, "outputs": [ { "data": { "text/plain": [ "3201" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "213 + 2988" ] }, { "cell_type": "markdown", "metadata": { "id": "v_MDtrt_TG6Z" }, "source": [ "`dropna` can be applied to a whole dataframe too.\n", "\n", "**Q: Drop the rows of the `movies` dataframe where either `IMDB_Rating` or `IMDB_Votes` is `NaN`.**" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2020-06-14T19:56:55.835Z", "iopub.status.busy": "2020-06-14T19:56:55.817Z", "iopub.status.idle": "2020-06-14T19:56:55.861Z", "shell.execute_reply": "2020-06-14T19:56:55.878Z" }, "executionInfo": { "elapsed": 150, "status": "ok", "timestamp": 1687818264693, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "eUxZKFWkTG6Z" }, "outputs": [], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:56:56.768Z", "iopub.status.busy": "2020-06-14T19:56:56.751Z", "iopub.status.idle": "2020-06-14T19:56:56.803Z", "shell.execute_reply": "2020-06-14T19:56:56.819Z" }, "executionInfo": { "elapsed": 145, "status": "ok", "timestamp": 1687818265230, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "c6jSjbxhTG6a", "jupyter": { "outputs_hidden": false }, "outputId": "69d58a1c-61d4-4721-c1bd-3e9b568d9acf" }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "0 0\n" ] } ], "source": [ "# Both should be zero.\n", "print(sum(movies.IMDB_Rating.isna()), sum(movies.IMDB_Votes.isna()))" ] }, { "cell_type": "markdown", "metadata": { "id": "4ZeoyhBJTG6a" }, "source": [ "How does `matplotlib` decide the bins? `matplotlib`'s `hist` function uses `numpy`'s `histogram` function under the hood." ] }, { "cell_type": "markdown", "metadata": { "id": "fOlXNjFyTG6a" }, "source": [ "**Q: Plot the histogram of movie ratings (`IMDB_Rating`) using the `plt.hist()` function.**" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 484 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:00.883Z", "iopub.status.busy": "2020-06-14T19:57:00.865Z", "iopub.status.idle": "2020-06-14T19:57:00.973Z", "shell.execute_reply": "2020-06-14T19:57:01.058Z" }, "executionInfo": { "elapsed": 1678, "status": "ok", "timestamp": 1687818275930, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "rn1cO2ByTG6a", "jupyter": { "outputs_hidden": false }, "outputId": "2cd84d13-3fec-4ea2-9948-c299038ba14f" }, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Frequency')" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "3BelR31kTG6a" }, "source": [ "Have you noticed that this function returns three objects? Take a look at the documentation [here](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist) to figure out what they are.\n", "\n", "To get the returned three objects:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 466 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:04.322Z", "iopub.status.busy": "2020-06-14T19:57:04.229Z", "iopub.status.idle": "2020-06-14T19:57:05.023Z", "shell.execute_reply": "2020-06-14T19:57:05.046Z" }, "executionInfo": { "elapsed": 1035, "status": "ok", "timestamp": 1687818276961, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "qrmt7ma_TG6a", "jupyter": { "outputs_hidden": false }, "outputId": "139797bc-c5b3-4b84-9f47-523354786db2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 9. 39. 76. 133. 293. 599. 784. 684. 323. 48.]\n", "[1.4 2.18 2.96 3.74 4.52 5.3 6.08 6.86 7.64 8.42 9.2 ]\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "n_raw, bins_raw, patches = plt.hist(movies.IMDB_Rating)\n", "print(n_raw)\n", "print(bins_raw)" ] }, { "cell_type": "markdown", "metadata": { "id": "1SUgGDBwTG6b" }, "source": [ "Here, `n_raw` contains the values of histograms, i.e., the number of movies in each of the 10 bins. Thus, the sum of the elements in `n_raw` should be equal to the total number of movies.\n", "\n", "**Q: Test whether the sum of values in `n_raw` is equal to the number of movies in the `movies` dataset**" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:07.793Z", "iopub.status.busy": "2020-06-14T19:57:07.778Z", "iopub.status.idle": "2020-06-14T19:57:07.833Z", "shell.execute_reply": "2020-06-14T19:57:07.849Z" }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1687818276962, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "iCGZ-jN2TG6b", "jupyter": { "outputs_hidden": false }, "outputId": "3fabb0e6-1b39-4bbc-ebd3-ddf2009c3851" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2988.0\n", "2988\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "c9gIcp06TG6b" }, "source": [ "The second returned object (`bins_raw`) is a list containing the edges of the 10 bins: the first bin is \\[1.4, 2.18\\], the second \\[2.18, 2.96\\], and so on. What's the width of the bins?" 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:10.040Z", "iopub.status.busy": "2020-06-14T19:57:10.025Z", "iopub.status.idle": "2020-06-14T19:57:10.081Z", "shell.execute_reply": "2020-06-14T19:57:10.097Z" }, "executionInfo": { "elapsed": 151, "status": "ok", "timestamp": 1687818311779, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "im5eQCroTG6b", "jupyter": { "outputs_hidden": false }, "outputId": "ca756ea4-bc06-404e-c650-0c2675d17369" }, "outputs": [ { "data": { "text/plain": [ "array([0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.diff(bins_raw)" ] }, { "cell_type": "markdown", "metadata": { "id": "C3TsmP_RTG6b" }, "source": [ "The width is the same as the maximum value minus the minimum value, divided by 10 (the number of bins)." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:11.016Z", "iopub.status.busy": "2020-06-14T19:57:10.997Z", "iopub.status.idle": "2020-06-14T19:57:11.052Z", "shell.execute_reply": "2020-06-14T19:57:11.068Z" }, "executionInfo": { "elapsed": 223, "status": "ok", "timestamp": 1687818312845, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "GxrmxezwTG6b", "jupyter": { "outputs_hidden": false }, "outputId": "e4ef2cd2-6171-4b28-d669-61ca6ed9d8b4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.4 9.2\n", "0.7799999999999999\n" ] } ], "source": [ "min_rating = min(movies.IMDB_Rating)\n", "max_rating = max(movies.IMDB_Rating)\n", "print(min_rating, max_rating)\n", "print( (max_rating-min_rating) / 10 )" ] }, { "cell_type": "markdown", "metadata": { "id": "JGxvd4CGTG6c" }, "source": [ "Now, let's plot a normalized (density) histogram." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 486 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:12.070Z", "iopub.status.busy": "2020-06-14T19:57:12.053Z", "iopub.status.idle": "2020-06-14T19:57:12.176Z", "shell.execute_reply": "2020-06-14T19:57:12.239Z" }, "executionInfo": { "elapsed": 432, "status": "ok", "timestamp": 1687818319541, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "OIrthY8ATG6c", "jupyter": { "outputs_hidden": false }, "outputId": "92cb8ce3-30af-45ce-ac89-3a276607c7b2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.0038616 0.0167336 0.03260907 0.05706587 0.12571654 0.25701095\n", " 0.33638829 0.29348162 0.13858854 0.0205952 ]\n", "[1.4 2.18 2.96 3.74 4.52 5.3 6.08 6.86 7.64 8.42 9.2 ]\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "n, bins, patches = plt.hist(movies.IMDB_Rating, density=True)\n", "print(n)\n", "print(bins)" ] }, { "cell_type": "markdown", "metadata": { "id": "-qKKEOXMTG6c" }, "source": [ "The ten bins do not change. But now `n` represents the density of the data inside each bin. In other words, the areas of the bars sum to 1.\n", "\n", "**Q: Can you verify this?**\n", "\n", "Hint: the area of each bar is calculated as height * width. You may get something like 0.99999999999999978 instead of 1." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:13.305Z", "iopub.status.busy": "2020-06-14T19:57:13.290Z", "iopub.status.idle": "2020-06-14T19:57:13.338Z", "shell.execute_reply": "2020-06-14T19:57:13.353Z" }, "executionInfo": { "elapsed": 154, "status": "ok", "timestamp": 1687818325739, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "HtDdUJUdTG6c", "jupyter": { "outputs_hidden": false }, "outputId": "b7d1e959-23db-4106-f296-d3190f318e81" }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "ugwtR9guTG6c" }, "source": [ "The values returned by the `hist` function are computed by `numpy`'s `histogram` function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html\n", "\n", "Note that `np.histogram()` returns the same counts and bin edges as `plt.hist()`." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:14.303Z", "iopub.status.busy": "2020-06-14T19:57:14.287Z", "iopub.status.idle": "2020-06-14T19:57:14.334Z", "shell.execute_reply": "2020-06-14T19:57:14.349Z" }, "executionInfo": { "elapsed": 201, "status": "ok", "timestamp": 1687818328266, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "xn-WpuKiTG6c", "jupyter": { "outputs_hidden": false }, "outputId": "05d56921-f033-4a88-dd19-eda81353c5ca" }, "outputs": [ { "data": { "text/plain": [ "(array([ 9, 39, 76, 133, 293, 599, 784, 684, 323, 48]),\n", " array([1.4 , 2.18, 2.96, 3.74, 4.52, 5.3 , 6.08, 6.86, 7.64, 8.42, 9.2 ]))" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.histogram(movies.IMDB_Rating)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 484 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:14.903Z", "iopub.status.busy": "2020-06-14T19:57:14.885Z", "iopub.status.idle": "2020-06-14T19:57:14.997Z", "shell.execute_reply": "2020-06-14T19:57:15.015Z" }, "executionInfo": { "elapsed": 319, "status": "ok", "timestamp": 1687818328581, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "a1DPjJI7TG6c", "jupyter": { "outputs_hidden": false }, "outputId": "f7213626-edd6-4584-f7ef-96070d480917" }, "outputs": [ { "data": { "text/plain": [ "(array([ 9., 39., 76., 133., 293., 599., 784., 684., 323., 48.]),\n", " array([1.4 , 2.18, 2.96, 3.74, 4.52, 5.3 , 6.08, 6.86, 7.64, 8.42, 9.2 ]),\n", " )" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.hist(movies.IMDB_Rating)" ] }, { "cell_type": "markdown", "metadata": { "id": "o4ArxCNETG6d" }, "source": [ "If you look at the documentation, you can see that `numpy` simply uses 10 as the default number of bins. You can also set the number manually or set it to `auto`, which is the \"Maximum of the `sturges` and `fd` estimators.\" Let's try this `auto` option." 
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "_ = plt.hist(movies.IMDB_Rating, bins='auto')" ] }, { "cell_type": "markdown", "metadata": { "id": "TM_jST11TG6d" }, "source": [ "## Consequences of the binning parameter\n", "\n", "Let's explore the effect of bin size using small multiples. In `matplotlib`, you can use [subplot](https://www.google.com/search?client=safari&rls=en&q=matplotlib+subplot&ie=UTF-8&oe=UTF-8) to put multiple plots into a single figure.\n", "\n", "For instance, you can do something like:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 463 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:17.609Z", "iopub.status.busy": "2020-06-14T19:57:17.591Z", "iopub.status.idle": "2020-06-14T19:57:17.858Z", "shell.execute_reply": "2020-06-14T19:57:17.878Z" }, "executionInfo": { "elapsed": 1004, "status": "ok", "timestamp": 1687818421168, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "4lliCeD6TG6d", "jupyter": { "outputs_hidden": false }, "outputId": "88567283-2f7e-470d-bccd-e19734730225" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA0cAAAGsCAYAAAAWptzrAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy80BEi2AAAACXBIWXMAAA9hAAAPYQGoP6dpAABHHUlEQVR4nO3df3RU9Z3/8dckTAaCTGKwyZAlYNa2QPghlChMtRyUkBBSKprdbmqKaZcDWzahhXQR4gEaiBqMLiKIUHos2COprbvF1kghA1SiJUCITYFgqVpbbHWS/RZhhByGSWa+f3hy6xiQkMxkfuT5OCdH7r2fO/d93w5z8+LOvdfk8/l8AgAAAIB+LibUBQAAAABAOCAcAQAAAIAIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgSRoQ6gKCxev16v3339eQIUNkMplCXQ4A9Bs+n08fffSRUlNTFRPDv8F9EscmAAiN7h6bojYcvf/++0pLSwt1GQDQb7333nsaPnx4qMsIKxybACC0rnVsitpwNGTIEEkfN8BqtYa4mq48Ho9qa2uVnZ0ts9kc6nIiGr0MDPoYGPRRcrlcSktLMz6H8Q8cm/oH+hgY9DFw6GX3j01RG446v65gtVrD9gAUHx8vq9Xab9+kgUIvA4M+BgZ9/Ae+NtYVx6b+gT4GBn0MHHr5D9c6NvFlcAAAAAAQ4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgA0M+sW7dOJpNJS5YsMeZdunRJxcXFGjp0qG644Qbl5+erpaXFb70zZ84oLy9P8fHxSk5O1rJly9Te3t7H1QMAgolwBADoNxoaGvTDH/5QEyZM8Ju/dOlSvfzyy3rxxRd18OBBvf/++7rvvvuM5R0dHcrLy9Ply5d16NAhPffcc9qxY4dWr17d17sAAAgiwhEAoF+4cOGCCgsL9aMf/Ug33nijMf/8+fN69tlntX79et19992aPHmytm/frkOHDunw4cOSpNraWp06dUrPP/+8Jk6cqNzcXFVUVGjz5s26fPlyqHYJABBgA0JdAAAAfaG4uFh5eXnKysrSww8/bMxvbGyUx+NRVlaWMW/06NEaMWKE6uvrNXXqVNXX12v8+PFKSUkxxuTk5GjRokVqbm7WpEmTrrhNt9stt9ttTLtcLkmSx+ORx+MJ9C72WmdN4VhbJKGPgUEfA4dedn/fCUcAgKj3wgsv6I033lBDQ0OXZU6nU3FxcUpMTPSbn5KSIqfTaYz5ZDDqXN657GoqKyu1Zs2aLvNra2sVHx9/vbvRZxwOR6hLiAr0MTDoY+D05162tbV1axzhCFFjXPleuTtMoS4jYllifaq6Pbh9/PO6vKC8LvBZ3nvvPX3ve9+Tw+HQwIED+3TbZWVlKi0tNaZdLpfS0tKUnZ0tq9Xap7V0h8fjkcPh0MyZM2U2m0NdTsSij4ERjD6OK9/bq/VPlucEpI6+xnvyH2fur4VwBACIao2NjWptbdWXvvQlY15HR4fq6ur09NNPa+/evbp8+bLOnTvnd/aopaVFNptNkmSz2XT06FG/1+28m13nmCuxWCyyWCxd5pvN5rD+BSXc64sU9DEwAtnH3v7jX6T//+zP78nu7jc3ZAAARLUZM2boxIkTampqMn4yMzNVWFho/NlsNmv//v3GOqdPn9aZM2dkt9slSXa7XSdOnFBra6sxxuFwyGq1KiMjo8/3CQAQHJw5AgBEtSFDhmjcuHF+8wYPHqyhQ4ca8+fPn6/S0lIlJSXJarVq8eLFstvtmjp1qiQpOztbGRkZmjdvnqqqquR0OrVy5UoVFxdf8cwQACAyEY4AAP3ek08+qZiYGOXn58vtdisnJ0fPPPOMsTw2NlY1NTVatGiR7Ha7Bg8erKKiIq1duzaEVQMAAo1wBADod1599VW/6YE
DB2rz5s3avHnzVdcZOXKkdu/eHeTKAAChxDVHAAAAACDCEQAAAABIIhwBAAAAgCTCEQAAAABIIhwBAAAAgCTCEQAAAABI4lbeAAAAiBA3r3gl1CUgynHmCAAAAADUg3BUV1enOXPmKDU1VSaTSS+99NJVx37nO9+RyWTShg0b/OafPXtWhYWFslqtSkxM1Pz583XhwgW/McePH9dXvvIVDRw4UGlpaaqqqrreUgEAAACg2647HF28eFG33nrrZz5FXJJ27dqlw4cPKzU1tcuywsJCNTc3y+FwqKamRnV1dVq4cKGx3OVyKTs7WyNHjlRjY6Mef/xxlZeXa9u2bddbLgAAAAB0y3Vfc5Sbm6vc3NzPHPO3v/1Nixcv1t69e5WXl+e37M0339SePXvU0NCgzMxMSdKmTZs0e/ZsPfHEE0pNTdXOnTt1+fJl/fjHP1ZcXJzGjh2rpqYmrV+/3i9EfZLb7Zbb7TamXS6XJMnj8cjj8VzvbgZdZ03hWFuk6eyhJcYX4koiW2f/gtnH/vB+5+92/953AEBkC/gNGbxer+bNm6dly5Zp7NixXZbX19crMTHRCEaSlJWVpZiYGB05ckT33nuv6uvrNW3aNMXFxRljcnJy9Nhjj+nDDz/UjTfe2OV1KysrtWbNmi7za2trFR8fH6C9CzyHwxHqEqJGRaY31CVEhWD2cffu3UF77XDTn/9ut7W1hboEAAB6JODh6LHHHtOAAQP03e9+94rLnU6nkpOT/YsYMEBJSUlyOp3GmPT0dL8xKSkpxrIrhaOysjKVlpYa0y6XS2lpacrOzpbVau3VPgWDx+ORw+HQzJkzZTabQ11OROvs5apjMXJ7TaEuJ2JZYnyqyPQGtY8ny3OC8rrhhL/b/zhzDwBApAloOGpsbNRTTz2lN954QyZT3/6SarFYZLFYusw3m81h/QtKuNcXSdxek9wdhKPeCmYf+9N7vT//3e6v+w0AiHwBvZX3a6+9ptbWVo0YMUIDBgzQgAED9Je//EXf//73dfPNN0uSbDabWltb/dZrb2/X2bNnZbPZjDEtLS1+YzqnO8cAAAAAQCAFNBzNmzdPx48fV1NTk/GTmpqqZcuWae/evZIku92uc+fOqbGx0VjvwIED8nq9mjJlijGmrq7O76Jeh8OhUaNGXfErdQAAAADQW9f9tboLFy7o7bffNqbfffddNTU1KSkpSSNGjNDQoUP9xpvNZtlsNo0aNUqSNGbMGM2aNUsLFizQ1q1b5fF4VFJSooKCAuO23/fff7/WrFmj+fPna/ny5Tp58qSeeuopPfnkk73ZVwAAAKDHbl7xSq/W//O6vGsPQkhddzg6duyY7rrrLmO68yYIRUVF2rFjR7deY+fOnSopKdGMGTMUExOj/Px8bdy40ViekJCg2tpaFRcXa/Lkybrpppu0evXqq97GGwAAAAB667rD0fTp0+Xzdf85KH/+85+7zEtKSlJ1dfVnrjdhwgS99tpr11seAAAAAPRIwG/lDQBX09uvI0QCS6xPVbdL48r3RvTdE/nqBwCgPwroDRkAAAAAIFIRjgAAAABAhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgD0A1u2bNGECRNktVpltVplt9v161//2lg+ffp0mUwmv5/vfOc7fq9x5swZ5eXlKT4+XsnJyVq2bJna29v7elcAAEE0INQFAAAQbMOHD9e6dev0hS98QT6fT88995zuuece/e53v9PYsWMlSQsWLNDatWuNdeLj440/d3R0KC8vTzabTYcOHdIHH3ygBx54QGazWY8++mif7w8AIDgIRwCAqDdnzhy/6UceeURbtmzR4cOHjXAUHx8vm812xfVra2t16tQp7du3TykpKZo4caIqKiq0fPlylZeXKy4uLuj7AAA
IPsIRAKBf6ejo0IsvvqiLFy/Kbrcb83fu3Knnn39eNptNc+bM0apVq4yzR/X19Ro/frxSUlKM8Tk5OVq0aJGam5s1adKkK27L7XbL7XYb0y6XS5Lk8Xjk8XiCsXu90llTONYWSehjYFypj5ZYX6jKCYhQvSd4T3Z/3wlHAIB+4cSJE7Lb7bp06ZJuuOEG7dq1SxkZGZKk+++/XyNHjlRqaqqOHz+u5cuX6/Tp0/rFL34hSXI6nX7BSJIx7XQ6r7rNyspKrVmzpsv82tpav6/thRuHwxHqEqICfQyMT/ax6vYQFhIAu3fvDun2+/N7sq2trVvjCEcAgH5h1KhRampq0vnz5/U///M/Kioq0sGDB5WRkaGFCxca48aPH69hw4ZpxowZeuedd3TLLbf0eJtlZWUqLS01pl0ul9LS0pSdnS2r1dqr/QkGj8cjh8OhmTNnymw2h7qciEUfA+NKfRxXvjfEVfXOyfKckGyX9+Q/ztxfC+EIANAvxMXF6fOf/7wkafLkyWpoaNBTTz2lH/7wh13GTpkyRZL09ttv65ZbbpHNZtPRo0f9xrS0tEjSVa9TkiSLxSKLxdJlvtlsDutfUMK9vkhBHwPjk310d5hCXE3vhPr90J/fk93db27lDQDol7xer9/1QJ/U1NQkSRo2bJgkyW6368SJE2ptbTXGOBwOWa1W46t5AIDIx5kjAEDUKysrU25urkaMGKGPPvpI1dXVevXVV7V371698847qq6u1uzZszV06FAdP35cS5cu1bRp0zRhwgRJUnZ2tjIyMjRv3jxVVVXJ6XRq5cqVKi4uvuKZIQBAZCIcAQCiXmtrqx544AF98MEHSkhI0IQJE7R3717NnDlT7733nvbt26cNGzbo4sWLSktLU35+vlauXGmsHxsbq5qaGi1atEh2u12DBw9WUVGR33ORAACRj3AEAIh6zz777FWXpaWl6eDBg9d8jZEjR4b8TlMAgODimiMAAAAAEOEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAkjTgeleoq6vT448/rsbGRn3wwQfatWuX5s6dK0nyeDxauXKldu/erT/96U9KSEhQVlaW1q1bp9TUVOM1zp49q8WLF+vll19WTEyM8vPz9dRTT+mGG24wxhw/flzFxcVqaGjQ5z73OS1evFgPPvhg7/cYAAAAIXHzile6PdYS61PV7dK48r1yd5iCWBXwD9d95ujixYu69dZbtXnz5i7L2tra9MYbb2jVqlV644039Itf/EKnT5/W1772Nb9xhYWFam5ulsPhUE1Njerq6rRw4UJjucvlUnZ2tkaOHKnGxkY9/vjjKi8v17Zt23qwiwAAAABwbdd95ig3N1e5ublXXJaQkCCHw+E37+mnn9btt9+uM2fOaMSIEXrzzTe1Z88eNTQ0KDMzU5K0adMmzZ49W0888YRSU1O1c+dOXb58WT/+8Y8VFxensWPHqqmpSevXr/cLUQAAAAAQKNcdjq7X+fPnZTKZlJiYKEmqr69XYmKiEYwkKSsrSzExMTpy5Ijuvfde1dfXa9q0aYqLizPG5OTk6LHHHtOHH36oG2+8sct23G633G63Me1yuSR9/FU/j8cTpL3ruc6awrG2SNPZQ0uML8SVRLbO/tHH3omWPvbms4nPNQBApApqOLp06ZKWL1+ub3zjG7JarZIkp9Op5ORk/yIGDFBSUpKcTqcxJj093W9MSkqKsexK4aiyslJr1qzpMr+2tlbx8fEB2Z9g+PSZNvRcRaY31CVEBfoYGJHex927d/d43ba2tgBWAgBA3wlaOPJ4PPr6178un8+nLVu2BGszhrKyMpWWlhrTLpdLaWlpys7ONoJZOPF4PHI4HJo5c6bMZnOoy4lonb1cdSxGbi8XbPaUJcanikwvfeylaOnjyfKcHq/beeYeAIBIE5Rw1BmM/vKXv+j
AgQN+4cRms6m1tdVvfHt7u86ePSubzWaMaWlp8RvTOd055tMsFossFkuX+WazOazDR7jXF0ncXhN3swkA+hgYkd7H3nwu8ZkGAIhUAX/OUWcweuutt7Rv3z4NHTrUb7ndbte5c+fU2NhozDtw4IC8Xq+mTJlijKmrq/P73rrD4dCoUaOu+JU6AAAAAOit6w5HFy5cUFNTk5qamiRJ7777rpqamnTmzBl5PB79y7/8i44dO6adO3eqo6NDTqdTTqdTly9fliSNGTNGs2bN0oIFC3T06FH99re/VUlJiQoKCoxnId1///2Ki4vT/Pnz1dzcrJ/97Gd66qmn/L42BwAAAACBdN1fqzt27JjuuusuY7ozsBQVFam8vFy/+tWvJEkTJ070W+83v/mNpk+fLknauXOnSkpKNGPGDOMhsBs3bjTGJiQkqLa2VsXFxZo8ebJuuukmrV69mtt4AwAAAAia6w5H06dPl8939VvUftayTklJSaqurv7MMRMmTNBrr712veUBAAAAQI8E/JojAAAAAIhEhCMAAAAAEOEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAUg8eAgsAAADg+t284pVerf/ndXkBqgRXw5kjAAAAABDhCAAAAAAkEY4AAAAAQBLhCAAAAAAkEY4AAP3Ali1bNGHCBFmtVlmtVtntdv361782ll+6dEnFxcUaOnSobrjhBuXn56ulpcXvNc6cOaO8vDzFx8crOTlZy5YtU3t7e1/vCgAgiAhHAICoN3z4cK1bt06NjY06duyY7r77bt1zzz1qbm6WJC1dulQvv/yyXnzxRR08eFDvv/++7rvvPmP9jo4O5eXl6fLlyzp06JCee+457dixQ6tXrw7VLgEAgoBbeQMAot6cOXP8ph955BFt2bJFhw8f1vDhw/Xss8+qurpad999tyRp+/btGjNmjA4fPqypU6eqtrZWp06d0r59+5SSkqKJEyeqoqJCy5cvV3l5ueLi4kKxWwCAACMcAQD6lY6ODr344ou6ePGi7Ha7Ghsb5fF4lJWVZYwZPXq0RowYofr6ek2dOlX19fUaP368UlJSjDE5OTlatGiRmpubNWnSpCtuy+12y+12G9Mul0uS5PF45PF4grSHPddZUzjWFkno49VZYn3dHxvj8/svev6e4j3Z/X0nHAEA+oUTJ07Ibrfr0qVLuuGGG7Rr1y5lZGSoqalJcXFxSkxM9BufkpIip9MpSXI6nX7BqHN557Krqays1Jo1a7rMr62tVXx8fC/3KHgcDkeoS4gK9LGrqtuvf52KTG/gC4lQu3fv7tX6/fk92dbW1q1xhCMAQL8watQoNTU16fz58/qf//kfFRUV6eDBg0HdZllZmUpLS41pl8ultLQ0ZWdny2q1BnXbPeHxeORwODRz5kyZzeZQlxOx6OPVjSvf2+2xlhifKjK9WnUsRm6vKYhVRY6T5Tk9Wo/35D/O3F8L4QgA0C/ExcXp85//vCRp8uTJamho0FNPPaV/+7d/0+XLl3Xu3Dm/s0ctLS2y2WySJJvNpqNHj/q9Xufd7DrHXInFYpHFYuky32w2h/UvKOFeX6Sgj125O64/5Li9ph6tF416+37qz+/J7u43d6sDAPRLXq9XbrdbkydPltls1v79+41lp0+f1pkzZ2S32yVJdrtdJ06cUGtrqzHG4XDIarUqIyOjz2sHAAQHZ44AAFGvrKxMubm5GjFihD766CNVV1fr1Vdf1d69e5WQkKD58+ertLRUSUlJslqtWrx4sex2u6ZOnSpJys7OVkZGhubNm6eqqio5nU6tXLlSxcXFVzwzBACITIQjAEDUa21t1QMPPKAPPvhACQkJmjBhgvbu3auZM2dKkp588knFxMQoPz9fbrdbOTk5euaZZ4z1Y2NjVVNTo0WLFslut2vw4MEqKirS2rVrQ7VLAIAgIBwBAKLes88++5nLBw4cqM2bN2vz5s1XHTNy5Mhe3yk
KABDeuOYIAAAAAEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkEQ4AgAAAABJhCMAAAAAkNSDcFRXV6c5c+YoNTVVJpNJL730kt9yn8+n1atXa9iwYRo0aJCysrL01ltv+Y05e/asCgsLZbValZiYqPnz5+vChQt+Y44fP66vfOUrGjhwoNLS0lRVVXX9ewcAAAAA3XTd4ejixYu69dZbtXnz5isur6qq0saNG7V161YdOXJEgwcPVk5Oji5dumSMKSwsVHNzsxwOh2pqalRXV6eFCxcay10ul7KzszVy5Eg1Njbq8ccfV3l5ubZt29aDXQQAAACAaxtwvSvk5uYqNzf3ist8Pp82bNiglStX6p577pEk/eQnP1FKSopeeuklFRQU6M0339SePXvU0NCgzMxMSdKmTZs0e/ZsPfHEE0pNTdXOnTt1+fJl/fjHP1ZcXJzGjh2rpqYmrV+/3i9EAQAAAECgXHc4+izvvvuunE6nsrKyjHkJCQmaMmWK6uvrVVBQoPr6eiUmJhrBSJKysrIUExOjI0eO6N5771V9fb2mTZumuLg4Y0xOTo4ee+wxffjhh7rxxhu7bNvtdsvtdhvTLpdLkuTxeOTxeAK5mwHRWVM41hZpOntoifGFuJLI1tk/+tg70dLH3nw28bkGAIhUAQ1HTqdTkpSSkuI3PyUlxVjmdDqVnJzsX8SAAUpKSvIbk56e3uU1OpddKRxVVlZqzZo1XebX1tYqPj6+h3sUfA6HI9QlRI2KTG+oS4gK9DEwIr2Pu3fv7vG6bW1tAawEAIC+E9BwFEplZWUqLS01pl0ul9LS0pSdnS2r1RrCyq7M4/HI4XBo5syZMpvNoS4nonX2ctWxGLm9plCXE7EsMT5VZHrpYy9FSx9Pluf0eN3OM/cAAESagIYjm80mSWppadGwYcOM+S0tLZo4caIxprW11W+99vZ2nT171ljfZrOppaXFb0zndOeYT7NYLLJYLF3mm83msA4f4V5fJHF7TXJ3RO4vo+GCPgZGpPexN59LfKYB0evmFa+EugQgqAL6nKP09HTZbDbt37/fmOdyuXTkyBHZ7XZJkt1u17lz59TY2GiMOXDggLxer6ZMmWKMqaur8/veusPh0KhRo674lToAAAAA6K3rDkcXLlxQU1OTmpqaJH18E4ampiadOXNGJpNJS5Ys0cMPP6xf/epXOnHihB544AGlpqZq7ty5kqQxY8Zo1qxZWrBggY4eParf/va3KikpUUFBgVJTUyVJ999/v+Li4jR//nw1NzfrZz/7mZ566im/r80BAAAAQCBd99fqjh07prvuusuY7gwsRUVF2rFjhx588EFdvHhRCxcu1Llz53TnnXdqz549GjhwoLHOzp07VVJSohkzZigmJkb5+fnauHGjsTwhIUG1tbUqLi7W5MmTddNNN2n16tXcxhsAAABA0Fx3OJo+fbp8vqvfotZkMmnt2rVau3btVcckJSWpurr6M7czYcIEvfbaa9dbHgAAAAD0SECvOQIAAACASEU4AgAAAAARjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIANAPVFZW6rbbbtOQIUOUnJysuXPn6vTp035jpk+fLpPJ5Pfzne98x2/MmTNnlJeXp/j4eCUnJ2vZsmVqb2/vy10BAATRgFAXAABAsB08eFDFxcW67bbb1N7eroceekjZ2dk6deqUBg8ebIxbsGCB1q5da0zHx8cbf+7o6FBeXp5sNpsOHTqkDz74QA888IDMZrMeffTRPt0fAEBwEI4AAFFvz549ftM7duxQcnKyGhsbNW3aNGN+fHy8bDbbFV+jtrZWp06d0r59+5SSkqKJEyeqoqJCy5cvV3l5ueLi4oK6DwCA4CMcAQD
6nfPnz0uSkpKS/Obv3LlTzz//vGw2m+bMmaNVq1YZZ4/q6+s1fvx4paSkGONzcnK0aNEiNTc3a9KkSV2243a75Xa7jWmXyyVJ8ng88ng8Ad+v3uqsKRxriyTR3EdLrK/vthXj8/svev6eiub3ZHd1d98JRwCAfsXr9WrJkiW64447NG7cOGP+/fffr5EjRyo1NVXHjx/X8uXLdfr0af3iF7+QJDmdTr9gJMmYdjqdV9xWZWWl1qxZ02V+bW2t31f2wo3D4Qh1CVEhGvtYdXvfb7Mi09v3Gw1Tu3fv7tX60fie7K62trZujSMcAQD6leLiYp08eVKvv/663/yFCxcafx4/fryGDRumGTNm6J133tEtt9zSo22VlZWptLTUmHa5XEpLS1N2drasVmvPdiCIPB6PHA6HZs6cKbPZHOpyIlY093Fc+d4+25YlxqeKTK9WHYuR22vqs+2Gs5PlOT1aL5rfk93Veeb+WghHAIB+o6SkRDU1Naqrq9Pw4cM/c+yUKVMkSW+//bZuueUW2Ww2HT161G9MS0uLJF31OiWLxSKLxdJlvtlsDutfUMK9vkgRjX10d/R9SHF7TSHZbjjq7fspGt+T3dXd/eZW3gCAqOfz+VRSUqJdu3bpwIEDSk9Pv+Y6TU1NkqRhw4ZJkux2u06cOKHW1lZjjMPhkNVqVUZGRlDqBgD0Lc4cAQCiXnFxsaqrq/XLX/5SQ4YMMa4RSkhI0KBBg/TOO++ourpas2fP1tChQ3X8+HEtXbpU06ZN04QJEyRJ2dnZysjI0Lx581RVVSWn06mVK1equLj4imeHAACRhzNHAICot2XLFp0/f17Tp0/XsGHDjJ+f/exnkqS4uDjt27dP2dnZGj16tL7//e8rPz9fL7/8svEasbGxqqmpUWxsrOx2u775zW/qgQce8HsuEgAgsnHmCAAQ9Xy+z74VcFpamg4ePHjN1xk5cmSv7xYFAAhfnDkCAAAAABGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJBGOAAAAAEAS4QgAAAAAJAUhHHV0dGjVqlVKT0/XoEGDdMstt6iiokI+n88Y4/P5tHr1ag0bNkyDBg1SVlaW3nrrLb/XOXv2rAoLC2W1WpWYmKj58+frwoULgS4XAAAAACQFIRw99thj2rJli55++mm9+eabeuyxx1RVVaVNmzYZY6qqqrRx40Zt3bpVR44c0eDBg5WTk6NLly4ZYwoLC9Xc3CyHw6GamhrV1dVp4cKFgS4XAAAAACRJAwL9gocOHdI999yjvLw8SdLNN9+sn/70pzp69Kikj88abdiwQStXrtQ999wjSfrJT36ilJQUvfTSSyooKNCbb76pPXv2qKGhQZmZmZKkTZs2afbs2XriiSeUmpoa6LIBAAAA9HMBD0df/vKXtW3bNv3xj3/UF7/4Rf3+97/X66+/rvXr10uS3n33XTmdTmVlZRnrJCQkaMqUKaqvr1dBQYHq6+uVmJhoBCNJysrKUkxMjI4cOaJ77723y3bdbrfcbrcx7XK5JEkej0cejyfQu9lrnTWFY22RprOHlhjfNUbis3T2jz72TrT0sTefTXyuAQAiVcDD0YoVK+RyuTR69GjFxsaqo6NDjzzyiAoLCyVJTqdTkpSSkuK3XkpKirHM6XQqOTnZv9ABA5SUlGSM+bTKykqtWbOmy/za2lrFx8f3er+CxeFwhLqEqFGR6Q11CVGBPgZGpPdx9+7dPV63ra0tgJUAANB3Ah6Ofv7zn2vnzp2qrq7W2LFj1dTUpCVLlig1NVVFRUWB3pyhrKxMpaWlxrTL5VJaWpqys7NltVqDtt2e8ng8cjgcmjlzpsxmc6jLiWidvVx1LEZurynU5UQsS4xPFZle+th
L0dLHk+U5PV6388w9AACRJuDhaNmyZVqxYoUKCgokSePHj9df/vIXVVZWqqioSDabTZLU0tKiYcOGGeu1tLRo4sSJkiSbzabW1la/121vb9fZs2eN9T/NYrHIYrF0mW82m8M6fIR7fZHE7TXJ3RG5v4yGC/oYGJHex958LvGZBgCIVAEPR21tbYqJ8b8JXmxsrLzej79ikp6eLpvNpv379xthyOVy6ciRI1q0aJEkyW6369y5c2psbNTkyZMlSQcOHJDX69WUKVMCXTIAAAAQ9m5e8UqP1rPE+lR1e4CLiVIBD0dz5szRI488ohEjRmjs2LH63e9+p/Xr1+vf//3fJUkmk0lLlizRww8/rC984QtKT0/XqlWrlJqaqrlz50qSxowZo1mzZmnBggXaunWrPB6PSkpKVFBQwJ3qAAAAAARFwMPRpk2btGrVKv3nf/6nWltblZqaqv/4j//Q6tWrjTEPPvigLl68qIULF+rcuXO68847tWfPHg0cONAYs3PnTpWUlGjGjBmKiYlRfn6+Nm7cGOhyAQAAAEBSEMLRkCFDtGHDBm3YsOGqY0wmk9auXau1a9dedUxSUpKqq6sDXR4AAAAAXFHMtYcAAAAAQPQjHAEAAACACEcAAAAAIIlwBAAAAACSCEcAAAAAIIlwBADoByorK3XbbbdpyJAhSk5O1ty5c3X69Gm/MZcuXVJxcbGGDh2qG264Qfn5+WppafEbc+bMGeXl5Sk+Pl7JyclatmyZ2tvb+3JXAABBRDgCAES9gwcPqri4WIcPH5bD4ZDH41F2drYuXrxojFm6dKlefvllvfjiizp48KDef/993Xfffcbyjo4O5eXl6fLlyzp06JCee+457dixw+85fgCAyBbw5xwBABBu9uzZ4ze9Y8cOJScnq7GxUdOmTdP58+f17LPPqrq6Wnfffbckafv27RozZowOHz6sqVOnqra2VqdOndK+ffuUkpKiiRMnqqKiQsuXL1d5ebni4uK6bNftdsvtdhvTLpdLkuTxeOTxeIK4xz3TWVM41hZJormPllhf320rxuf3X/RcZw+j8T3ZXd3dd8IRAKDfOX/+vKSPHzguSY2NjfJ4PMrKyjLGjB49WiNGjFB9fb2mTp2q+vp6jR8/XikpKcaYnJwcLVq0SM3NzZo0aVKX7VRWVmrNmjVd5tfW1io+Pj7QuxUwDocj1CVEhWjsY9Xtfb/Nikxv3280SkXje7K72traujWOcAQA6Fe8Xq+WLFmiO+64Q+PGjZMkOZ1OxcXFKTEx0W9sSkqKnE6nMeaTwahzeeeyKykrK1Npaakx7XK5lJaWpuzsbFmt1kDtUsB4PB45HA7NnDlTZrM51OVErGju47jyvX22LUuMTxWZXq06FiO319Rn241Gnb2Mxvdkd3Weub8WwhEAoF8pLi7WyZMn9frrrwd9WxaLRRaLpct8s9kc1r+ghHt9kSIa++ju6PuQ4vaaQrLdaBSN78nu6u5+c0MGAEC/UVJSopqaGv3mN7/R8OHDjfk2m02XL1/WuXPn/Ma3tLTIZrMZYz5997rO6c4xAIDIRjgCAEQ9n8+nkpIS7dq1SwcOHFB6errf8smTJ8tsNmv//v3GvNOnT+vMmTOy2+2SJLvdrhMnTqi1tdUY43A4ZLValZGR0Tc7AgAIKr5WBwCIesXFxaqurtYvf/lLDRkyxLhGKCEhQYMGDVJCQoLmz5+v0tJSJSUlyWq1avHixbLb7Zo6daokKTs7WxkZGZo3b56qqqrkdDq1cuVKFRcXX/GrcwCAyEM4AgBEvS1btkiSpk+f7jd/+/bt+ta3viVJevLJJxUTE6P8/Hy53W7l5OTomWeeMcbGxsaqpqZGixYtkt1u1+DBg1VUVKS1a9f21W4AAIKMcAQAiHo+37WfkzJw4EBt3rxZmzdvvuqYkSNHavfu3YEsDQAQRrjmCAAAAABEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQ
jAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJAUpHD0t7/9Td/85jc1dOhQDRo0SOPHj9exY8eM5T6fT6tXr9awYcM0aNAgZWVl6a233vJ7jbNnz6qwsFBWq1WJiYmaP3++Lly4EIxyAQAAACDw4ejDDz/UHXfcIbPZrF//+tc6deqU/vu//1s33nijMaaqqkobN27U1q1bdeTIEQ0ePFg5OTm6dOmSMaawsFDNzc1yOByqqalRXV2dFi5cGOhyAQAAAECSNCDQL/jYY48pLS1N27dvN+alp6cbf/b5fNqwYYNWrlype+65R5L0k5/8RCkpKXrppZdUUFCgN998U3v27FFDQ4MyMzMlSZs2bdLs2bP1xBNPKDU1NdBlAwAAAOjnAh6OfvWrXyknJ0f/+q//qoMHD+qf/umf9J//+Z9asGCBJOndd9+V0+lUVlaWsU5CQoKmTJmi+vp6FRQUqL6+XomJiUYwkqSsrCzFxMToyJEjuvfee7ts1+12y+12G9Mul0uS5PF45PF4Ar2bvdZZUzjWFmk6e2iJ8YW4ksjW2T/62DvR0sfefDbxuQYEz80rXunV+n9elxegSoDoFPBw9Kc//UlbtmxRaWmpHnroITU0NOi73/2u4uLiVFRUJKfTKUlKSUnxWy8lJcVY5nQ6lZyc7F/ogAFKSkoyxnxaZWWl1qxZ02V+bW2t4uPjA7FrQeFwOEJdQtSoyPSGuoSoQB8DI9L7uHv37h6v29bWFsBKAADoOwEPR16vV5mZmXr00UclSZMmTdLJkye1detWFRUVBXpzhrKyMpWWlhrTLpdLaWlpys7OltVqDdp2e8rj8cjhcGjmzJkym82hLieidfZy1bEYub2mUJcTsSwxPlVkeuljL0VLH0+W5/R43c4z9wAARJqAh6Nhw4YpIyPDb96YMWP0v//7v5Ikm80mSWppadGwYcOMMS0tLZo4caIxprW11e812tvbdfbsWWP9T7NYLLJYLF3mm83msA4f4V5fJHF7TXJ3RO4vo+GCPgZGpPexN59LfKYBACJVwO9Wd8cdd+j06dN+8/74xz9q5MiRkj6+OYPNZtP+/fuN5S6XS0eOHJHdbpck2e12nTt3To2NjcaYAwcOyOv1asqUKYEuGQAAAAACf+Zo6dKl+vKXv6xHH31UX//613X06FFt27ZN27ZtkySZTCYtWbJEDz/8sL7whS8oPT1dq1atUmpqqubOnSvp4zNNs2bN0oIFC7R161Z5PB6VlJSooKCAO9UBAAAACIqAh6PbbrtNu3btUllZmdauXav09HRt2LBBhYWFxpgHH3xQFy9e1MKFC3Xu3Dndeeed2rNnjwYOHGiM2blzp0pKSjRjxgzFxMQoPz9fGzduDHS5AAAAACApCOFIkr761a/qq1/96lWXm0wmrV27VmvXrr3qmKSkJFVXVwejPAAAAADoIuDXHAEAAABAJCIcAQCiXl1dnebMmaPU1FSZTCa99NJLfsu/9a1vyWQy+f3MmjXLb8zZs2dVWFgoq9WqxMREzZ8/XxcuXOjDvQAABBvhCAAQ9S5evKhbb71VmzdvvuqYWbNm6YMPPjB+fvrTn/otLywsVHNzsxwOh2pqalRXV6eFCxcGu3QAQB8KyjVHAACEk9zcXOXm5n7mGIvFctVn6b355pvas2ePGhoalJmZKUnatGmTZs+erSeeeOKqd1J1u91yu93GdOcDcj0ejzweT092Jag6awrH2iJJMPtoifX1av3e1tTb7V/XtmJ8fv9Fz3X2sD//3e7uvhOOAACQ9Oqrryo5OVk33nij7r77bj388MMaOnSoJKm+vl6JiYlGMJKkrKwsxcTE6MiRI7r33nuv+JqVlZVas2ZNl/m1tbW
Kj48Pzo4EgMPhCHUJUSEYfay6vXfr7969O6Tb74mKTG/fbzRK9ee/221tbd0aRzgCAPR7s2bN0n333af09HS98847euihh5Sbm6v6+nrFxsbK6XQqOTnZb50BAwYoKSlJTqfzqq9bVlam0tJSY9rlciktLU3Z2dmyWq1B25+e8ng8cjgcmjlzpsxmc6jLiVjB7OO48r29Wv9keU5It389LDE+VWR6tepYjNxeU59tNxp19rI//93uPHN/LYQjAEC/V1BQYPx5/PjxmjBhgm655Ra9+uqrmjFjRo9f12KxyGKxdJlvNpvD+heUcK8vUgSjj+6O3oWE3tbT2+33aJteU0i2G43689/t7u43N2QAAOBT/vmf/1k33XST3n77bUmSzWZTa2ur35j29nadPXv2qtcpAQAiD+EIAIBP+etf/6q///3vGjZsmCTJbrfr3LlzamxsNMYcOHBAXq9XU6ZMCVWZAIAA42t1AICod+HCBeMskCS9++67ampqUlJSkpKSkrRmzRrl5+fLZrPpnXfe0YMPPqjPf/7zysn5+PqMMWPGaNasWVqwYIG2bt0qj8ejkpISFRQUXPVOdQCAyMOZIwBA1Dt27JgmTZqkSZMmSZJKS0s1adIkrV69WrGxsTp+/Li+9rWv6Ytf/KLmz5+vyZMn67XXXvO7Xmjnzp0aPXq0ZsyYodmzZ+vOO+/Utm3bQrVLAIAg4MwRACDqTZ8+XT7f1Z+Vsnfvte/AlZSUpOrq6kCWBQAIM5w5AgAAAAARjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAkjQg1AUAAACgb9y84pVQlwCENc4cAQAAAIAIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgiXAEAAAAAJIIRwAAAAAgqQ/C0bp162QymbRkyRJj3qVLl1RcXKyhQ4fqhhtuUH5+vlpaWvzWO3PmjPLy8hQfH6/k5GQtW7ZM7e3twS4XAAAAQD8V1HDU0NCgH/7wh5owYYLf/KVLl+rll1/Wiy++qIMHD+r999/XfffdZyzv6OhQXl6eLl++rEOHDum5557Tjh07tHr16mCWCwAAAKAfC1o4unDhggoLC/WjH/1IN954ozH//PnzevbZZ7V+/Xrdfffdmjx5srZv365Dhw7p8OHDkqTa2lqdOnVKzz//vCZOnKjc3FxVVFRo8+bNunz5crBKBgAAANCPDQjWCxcXFysvL09ZWVl6+OGHjfmNjY3yeDzKysoy5o0ePVojRoxQfX29pk6dqvr6eo0fP14pKSnGmJycHC1atEjNzc2aNGlSl+253W653W5j2uVySZI8Ho88Hk8wdrFXOmsKx9oiTWcPLTG+EFcS2Tr7Rx97J1r62JvPJj7XACA8jSvfK3eHqcfr/3ldXgCrCU9BCUcvvPCC3njjDTU0NHRZ5nQ6FRcXp8TERL/5KSkpcjqdxphPBqPO5Z3LrqSyslJr1qzpMr+2tlbx8fE92Y0+4XA4Ql1C1KjI9Ia6hKhAHwMj0vu4e/fuHq/b1tYWwEoAAOg7AQ9H7733nr73ve/J4XBo4MCBgX75qyorK1Npaakx7XK5lJaWpuzsbFmt1j6ro7s8Ho8cDodmzpwps9kc6nIiWmcvVx2Lkdvb838N6e8sMT5VZHrpYy9FSx9Pluf0eN3OM/cAAESagIejxsZGtba26ktf+pIxr6OjQ3V1dXr66ae1d+9eXb58WefOnfM7e9TS0iKbzSZJstlsOnr0qN/rdt7NrnPMp1ksFlksli7zzWZzWIePcK8vkri9pl6dKsbH6GNgRHofe/O5xGcaACBSBfyGDDNmzNCJEyfU1NRk/GRmZqqwsND4s9ls1v79+41
1Tp8+rTNnzshut0uS7Ha7Tpw4odbWVmOMw+GQ1WpVRkZGoEsGAAAAgMCHoyFDhmjcuHF+P4MHD9bQoUM1btw4JSQkaP78+SotLdVvfvMbNTY26tvf/rbsdrumTp0qScrOzlZGRobmzZun3//+99q7d69Wrlyp4uLiK54dAgDgs9TV1WnOnDlKTU2VyWTSSy+95Lfc5/Np9erVGjZsmAYNGqSsrCy99dZbfmPOnj2rwsJCWa1WJSYmav78+bpw4UIf7gUAINiC/hDYK3nyySf11a9+Vfn5+Zo2bZpsNpt+8YtfGMtjY2NVU1Oj2NhY2e12ffOb39QDDzygtWvXhqJcAECEu3jxom699VZt3rz5isurqqq0ceNGbd26VUeOHNHgwYOVk5OjS5cuGWMKCwvV3Nwsh8Ohmpoa1dXVaeHChX21CwCAPhC0W3l/0quvvuo3PXDgQG3evPmqBylJGjlyZK/ulgQAQKfc3Fzl5uZecZnP59OGDRu0cuVK3XPPPZKkn/zkJ0pJSdFLL72kgoICvfnmm9qzZ48aGhqUmZkpSdq0aZNmz56tJ554QqmpqVd8bR4z0T8Fs4+W2Mh+TMD1iJZHI4SDQPUykj8bult7n4QjAADC1bvvviun0+n3/L2EhARNmTJF9fX1KigoUH19vRITE41gJElZWVmKiYnRkSNHdO+9917xtXnMRP8WjD5W3R7wlwx7kf5ohHDS215G8omL7j5mgnAEAOjXOp+fd6Xn633y+XvJycl+ywcMGKCkpKSrPn9P4jET/VUw+ziufG9AXy+cRcujEcJBoHrZm8c8hFp3HzNBOAIAIEh4zET/Fow+RvIjAnoq0h+NEE5628tI/lzobu0huSEDAADhovP5eZ3P0+v06efvffLxEpLU3t6us2fPXvX5ewCAyEM4AgD0a+np6bLZbH7P33O5XDpy5Ijf8/fOnTunxsZGY8yBAwfk9Xo1ZcqUPq8ZABAcfK0OABD1Lly4oLffftuYfvfdd9XU1KSkpCSNGDFCS5Ys0cMPP6wvfOELSk9P16pVq5Samqq5c+dKksaMGaNZs2ZpwYIF2rp1qzwej0pKSlRQUHDVO9UBACIP4QgAEPWOHTumu+66y5juvElCUVGRduzYoQcffFAXL17UwoULde7cOd15553as2ePBg4caKyzc+dOlZSUaMaMGYqJiVF+fr42btzY5/sCAAgewhEAIOpNnz5dPt/Vn+9hMpm0du3az3zYeFJSkqqrq4NRHgAgTHDNEQAAAACIcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACBJGhDqAgAAAPqLm1e8EuoSAHwGzhwBAAAAgAhHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkghHAAAAACCJcAQAAAAAkoIQjiorK3XbbbdpyJAhSk5O1ty5c3X69Gm/MZcuXVJxcbGGDh2qG264Qfn5+WppafEbc+bMGeXl5Sk+Pl7JyclatmyZ2tvbA10uAAAAAEgKQjg6ePCgiouLdfjwYTkcDnk8HmVnZ+vixYvGmKVLl+rll1/Wiy++qIMHD+r999/XfffdZyzv6OhQXl6eLl++rEOHDum5557Tjh07tHr16kCXCwAAAACSpAGBfsE9e/b4Te/YsUPJyclqbGzUtGnTdP78eT377LOqrq7W3XffLUnavn27xowZo8OHD2vq1Kmqra3VqVOntG/fPqWkpGjixImqqKjQ8uXLVV5erri4uC7bdbvdcrvdxrTL5ZIkeTweeTyeQO9mr3XWFI61RZrOHlpifCGuJLJ19o8+9k609LE3n018rgEAIlX
Aw9GnnT9/XpKUlJQkSWpsbJTH41FWVpYxZvTo0RoxYoTq6+s1depU1dfXa/z48UpJSTHG5OTkaNGiRWpubtakSZO6bKeyslJr1qzpMr+2tlbx8fGB3q2AcTgcoS4halRkekNdQlSgj4ER6X3cvXt3j9dta2sLYCVAeLl5xSvXHGOJ9anqdmlc+V65O0x9UBWAQAlqOPJ6vVqyZInuuOMOjRs3TpLkdDoVFxenxMREv7EpKSlyOp3GmE8Go87lncuupKysTKWlpca0y+VSWlqasrOzZbVaA7VLAePxeORwODRz5kyZzeZQlxPROnu56liM3F4OQj1lifGpItNLH3spWvp4sjynx+t2nrmPJOXl5V3+gW3UqFH6wx/+IOnja2W///3v64UXXpDb7VZOTo6eeeaZLscqAEBkC2o4Ki4u1smTJ/X6668HczOSJIvFIovF0mW+2WwO6/AR7vVFErfXxL/QBQB9DIxI72NvPpci9TNt7Nix2rdvnzE9YMA/DpFLly7VK6+8ohdffFEJCQkqKSnRfffdp9/+9rehKBUAECRBC0clJSWqqalRXV2dhg8fbsy32Wy6fPmyzp0753f2qKWlRTabzRhz9OhRv9frvJtd5xgAAAJpwIABVzzGdOda2avhetjoY4m99vWE0XLtYajRx8AJVC8j+bOhu7UHPBz5fD4tXrxYu3bt0quvvqr09HS/5ZMnT5bZbNb+/fuVn58vSTp9+rTOnDkju90uSbLb7XrkkUfU2tqq5ORkSR9fm2O1WpWRkRHokgEA0FtvvaXU1FQNHDhQdrtdlZWVGjFiRLeulb0aroeNPlW3d39spF97GC7oY+D0tpe9uR411Lp7PWzAw1FxcbGqq6v1y1/+UkOGDDGuEUpISNCgQYOUkJCg+fPnq7S0VElJSbJarVq8eLHsdrtxgMnOzlZGRobmzZunqqoqOZ1OrVy5UsXFxVf86hwAAL0xZcoU7dixQ6NGjdIHH3ygNWvW6Ctf+YpOnjzZrWtlr4brYcPPuPK9Qd9GtFx7GGr0MXAC1cveXI8aat29Hjbg4WjLli2SpOnTp/vN3759u771rW9Jkp588knFxMQoPz/f78LWTrGxsaqpqdGiRYtkt9s1ePBgFRUVae3atYEuFwAA5ebmGn+eMGGCpkyZopEjR+rnP/+5Bg0a1OPX5XrY8NOX1wJG+rWH4YI+Bk5vexnJnwvdrT0oX6u7loEDB2rz5s3avHnzVceMHDkyok/dAQAiV2Jior74xS/q7bff1syZM695rSwAIDrEhLoAAADCzYULF/TOO+9o2LBhftfKdvr0tbIAgOgQ9IfAAgAQ7v7rv/5Lc+bM0ciRI/X+++/rBz/4gWJjY/WNb3yjW9fKAgCiA+EIANDv/fWvf9U3vvEN/f3vf9fnPvc53XnnnTp8+LA+97nPSbr2tbIAgOhAOAIA9HsvvPDCZy7vzrWyAIDIxzVHAAAAACDOHH2mm1e8ErTXtsT6VHX7x89b4PaUvdPZSwAAAKA3OHMEAAAAAOLMEQAAAIBu6O23qv68Li9AlQQPZ44AAAAAQIQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASdKAUBcAAADQXTeveCXUJQCIYpw5AgAAAAARjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACRxtzoAANCHuNscgHDGmSMAAAAAEOEIAAAAACTxtToAAHAd+FocgGjGmSMAAAAAEOEIAAAAACQRjgAAAABAEuEIAAAAACQRjgAAAABAEuEIAAAAACRxK28AAAAAfaC3jwL487q8AFVydZw5AgAAAAARjgAAAABAEuEIAAAAACQRjgAAAABAEjdkAAAgokTCBc0AEKnCOhxt3rxZjz/+uJxOp2699VZt2rRJt99+e6jLAgD0Y5F+bPqscGWJ9anqdmlc+V65O0x9WBUAhIew/Vrdz372M5W
WluoHP/iB3njjDd16663KyclRa2trqEsDAPRTHJsAILqF7Zmj9evXa8GCBfr2t78tSdq6dateeeUV/fjHP9aKFSu6jHe73XK73cb0+fPnJUlnz56Vx+PpUQ0D2i/2aL1uvbbXp7Y2rwZ4YtTh5V/neoNeBgZ9DIxo6ePf//73Hq/70UcfSZJ8Pl+gygkbHJvQHfQxMOhj4ERLL/vk2OQLQ2632xcbG+vbtWuX3/wHHnjA97Wvfe2K6/zgBz/wSeKHH3744SdMft57770+OGL0HY5N/PDDDz+R/3OtY1NYnjn6f//v/6mjo0MpKSl+81NSUvSHP/zhiuuUlZWptLTUmPZ6vTp79qyGDh0qkyn8ErLL5VJaWpree+89Wa3WUJcT0ehlYNDHwKCPks/n00cffaTU1NRQlxJQHJvQXfQxMOhj4NDL7h+bwjIc9YTFYpHFYvGbl5iYGJpiroPVau23b9JAo5eBQR8Do7/3MSEhIdQlhAWOTf0bfQwM+hg4/b2X3Tk2heUNGW666SbFxsaqpaXFb35LS4tsNluIqgIA9GccmwAg+oVlOIqLi9PkyZO1f/9+Y57X69X+/ftlt9tDWBkAoL/i2AQA0S9sv1ZXWlqqoqIiZWZm6vbbb9eGDRt08eJF4w5Bkc5isegHP/hBl69b4PrRy8Cgj4FBH6MbxyZ0B30MDPoYOPSy+0w+X/jea/Xpp582HrQ3ceJEbdy4UVOmTAl1WQCAfoxjEwBEr7AORwAAAADQV8LymiMAAAAA6GuEIwAAAAAQ4QgAAAAAJBGOAAAAAEAS4ajPVVZW6rbbbtOQIUOUnJysuXPn6vTp06EuK+KtW7dOJpNJS5YsCXUpEedvf/ubvvnNb2ro0KEaNGiQxo8fr2PHjoW6rIjS0dGhVatWKT09XYMGDdItt9yiiooKcb8bRAqOTcHBsannODb1Hsemngnb5xxFq4MHD6q4uFi33Xab2tvb9dBDDyk7O1unTp3S4MGDQ11eRGpoaNAPf/hDTZgwIdSlRJwPP/xQd9xxh+666y79+te/1uc+9zm99dZbuvHGG0NdWkR57LHHtGXLFj333HMaO3asjh07pm9/+9tKSEjQd7/73VCXB1wTx6bA49jUcxybAoNjU89wK+8Q+7//+z8lJyfr4MGDmjZtWqjLiTgXLlzQl770JT3zzDN6+OGHNXHiRG3YsCHUZUWMFStW6Le//a1ee+21UJcS0b761a8qJSVFzz77rDEvPz9fgwYN0vPPPx/CyoCe4djUOxybeodjU2BwbOoZvlYXYufPn5ckJSUlhbiSyFRcXKy8vDxlZWWFupSI9Ktf/UqZmZn613/9VyUnJ2vSpEn60Y9+FOqyIs6Xv/xl7d+/X3/84x8lSb///e/1+uuvKzc3N8SVAT3Dsal3ODb1DsemwODY1DN8rS6EvF6vlixZojvuuEPjxo0LdTkR54UXXtAbb7yhhoaGUJcSsf70pz9py5YtKi0t1UMPPaSGhgZ997vfVVxcnIqKikJdXsRYsWKFXC6XRo8erdjYWHV0dOiRRx5RYWFhqEsDrhvHpt7h2NR7HJsCg2NTzxCOQqi4uFgnT57U66+/HupSIs57772n733ve3I4HBo4cGCoy4lYXq9XmZmZevTRRyVJkyZN0smTJ7V161YOQNfh5z//uXbu3Knq6mqNHTtWTU1NWrJkiVJTU+kjIg7Hpp7j2BQYHJsCg2NTD/kQEsXFxb7hw4f7/vSnP4W6lIi0a9cunyRfbGys8SPJZzKZfLGxsb729vZQlxgRRowY4Zs/f77fvGeeecaXmpoaoooi0/Dhw31PP/2037yKigrfqFGjQlQR0DMcm3qHY1NgcGwKDI5NPcOZoz7m8/m0ePFi7dq1S6+++qrS09NDXVJEmjFjhk6cOOE379vf/rZGjx6t5cuXKzY2NkSVRZY77rijy+16//jHP2rkyJEhqigytbW1KSbG/xLO2NhYeb3eEFUEXB+OTYHBsSkwODY
FBsemniEc9bHi4mJVV1frl7/8pYYMGSKn0ylJSkhI0KBBg0JcXeQYMmRIl+/CDx48WEOHDuU78tdh6dKl+vKXv6xHH31UX//613X06FFt27ZN27ZtC3VpEWXOnDl65JFHNGLECI0dO1a/+93vtH79ev37v/97qEsDuoVjU2BwbAoMjk2BwbGpZ7iVdx8zmUxXnL99+3Z961vf6ttiosz06dO5XWoP1NTUqKysTG+99ZbS09NVWlqqBQsWhLqsiPLRRx9p1apV2rVrl1pbW5WamqpvfOMbWr16teLi4kJdHnBNHJuCh2NTz3Bs6j2OTT1DOAIAAAAA8ZwjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJBEOAIAAAAASYQjAAAAAJAk/X9GNplQseVb1wAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10,5))\n", "plt.subplot(1,2,1)\n", "movies.IMDB_Rating.hist(bins=3)\n", "plt.subplot(1,2,2)\n", "movies.IMDB_Rating.hist(bins=20)" ] }, { "cell_type": "markdown", "metadata": { "id": "XXIbKfeWTG6d" }, "source": [ "What does the argument in `plt.subplot(1,2,1)` mean? If you're not sure, check out: http://stackoverflow.com/questions/3584805/in-matplotlib-what-does-the-argument-mean-in-fig-add-subplot111\n", "\n", "**Q: create 8 subplots (2 rows and 4 columns) with the following `binsizes`.**" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 872 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:19.049Z", "iopub.status.busy": "2020-06-14T19:57:19.029Z", "iopub.status.idle": "2020-06-14T19:57:20.201Z", "shell.execute_reply": "2020-06-14T19:57:20.255Z" }, "executionInfo": { "elapsed": 2347, "status": "ok", "timestamp": 1687818423649, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "jlYvPcu-TG6d", "jupyter": { "outputs_hidden": false }, "outputId": "cb6e99e3-6332-420f-e3d1-c0bec8422308" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nbins = [2, 3, 5, 10, 30, 40, 60, 100 ]\n", "figsize = (18, 10)\n", "\n", "# TODO\n", "\n", "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "EhkbFVmOTG6d" }, "source": [ "Do you see the issues with having too few bins or too many bins? In particular, do you notice weird patterns that emerge from `bins=30`?\n", "\n", "**Q: Can you guess why do you see such patterns? What are the peaks and what are the empty bars? What do they tell you about choosing the binsize in histograms?**" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:21.486Z", "iopub.status.busy": "2020-06-14T19:57:21.468Z", "iopub.status.idle": "2020-06-14T19:57:21.640Z", "shell.execute_reply": "2020-06-14T19:57:21.689Z" }, "executionInfo": { "elapsed": 637, "status": "ok", "timestamp": 1687818426810, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "_RYXN95UTG6e", "jupyter": { "outputs_hidden": false }, "outputId": "6079008d-ce20-459c-8381-b0ef63548fe4" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:22.246Z", "iopub.status.busy": "2020-06-14T19:57:22.230Z", "iopub.status.idle": "2020-06-14T19:57:22.280Z", "shell.execute_reply": "2020-06-14T19:57:22.297Z" }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1687818426810, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "okuF4FOzTG6e", "jupyter": { "outputs_hidden": false }, "outputId": "fad130a5-cc9d-4832-d9f2-b52d8d95313b" }, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "YhF5JL3mTG6e" }, "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "OVCQtwOLTG6e" }, "source": [ "## Formulae for choosing the number of bins.\n", "\n", "We can manually choose the number of bins based on those formulae." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 409 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:26.087Z", "iopub.status.busy": "2020-06-14T19:57:26.071Z", "iopub.status.idle": "2020-06-14T19:57:26.506Z", "shell.execute_reply": "2020-06-14T19:57:26.556Z" }, "executionInfo": { "elapsed": 1146, "status": "ok", "timestamp": 1687818431715, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "My4M97X8TG6f", "outputId": "17eb04e4-63fd-4e9e-d880-138c1a92f87a" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N = len(movies)\n", "\n", "plt.figure(figsize=(12,4))\n", "\n", "# Sqrt\n", "nbins = int(np.sqrt(N))\n", "\n", "plt.subplot(1,3,1)\n", "plt.title(\"SQRT, {} bins\".format(nbins))\n", "movies.IMDB_Rating.hist(bins=nbins)\n", "\n", "# Sturge's formula\n", "nbins = int(np.ceil(np.log2(N) + 1))\n", "\n", "plt.subplot(1,3,2)\n", "plt.title(\"Sturge, {} bins\".format(nbins))\n", "movies.IMDB_Rating.hist(bins=nbins)\n", "\n", "# Freedman-Diaconis\n", "iqr = np.percentile(movies.IMDB_Rating, 75) - np.percentile(movies.IMDB_Rating, 25)\n", "width = 2*iqr/np.power(N, 1/3)\n", "nbins = int((max(movies.IMDB_Rating) - min(movies.IMDB_Rating)) / width)\n", "\n", "plt.subplot(1,3,3)\n", "plt.title(\"F-D, {} bins\".format(nbins))\n", "movies.IMDB_Rating.hist(bins=nbins)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "8RKA0F83TG6f" }, "source": [ "But we can also use built-in formulae too. Let's try all of them." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 387 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:27.570Z", "iopub.status.busy": "2020-06-14T19:57:27.554Z", "iopub.status.idle": "2020-06-14T19:57:28.976Z", "shell.execute_reply": "2020-06-14T19:57:28.994Z" }, "executionInfo": { "elapsed": 1582, "status": "ok", "timestamp": 1687818433290, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "qoEJofYATG6f", "jupyter": { "outputs_hidden": false }, "outputId": "2c4f4f9c-1a22-422f-9865-2cae93d50bfc" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(20,4))\n", "\n", "plt.subplot(161)\n", "movies.IMDB_Rating.hist(bins='fd')\n", "\n", "plt.subplot(162)\n", "movies.IMDB_Rating.hist(bins='doane')\n", "\n", "plt.subplot(163)\n", "movies.IMDB_Rating.hist(bins='scott')\n", "\n", "plt.subplot(164)\n", "movies.IMDB_Rating.hist(bins='rice')\n", "\n", "plt.subplot(165)\n", "movies.IMDB_Rating.hist(bins='sturges')\n", "\n", "plt.subplot(166)\n", "movies.IMDB_Rating.hist(bins='sqrt')" ] }, { "cell_type": "markdown", "metadata": { "id": "33_5ZPO2TG6f" }, "source": [ "Some are decent, but several of them tend to overestimate the good number of bins. As you have more data points, some of the formulae may overestimate the necessary number of bins. Particularly in our case, because of the precision issue, we shouldn't increase the number of bins too much." ] }, { "cell_type": "markdown", "metadata": { "id": "oX4mgRmVTG6f" }, "source": [ "### Then, how should we choose the number of bins?" ] }, { "cell_type": "markdown", "metadata": { "id": "nwa4u4gjTG6f" }, "source": [ "So what's the conclusion? use Scott's rule or Sturges' formula?\n", "\n", "No, I think the take-away is that you **should understand how the inappropriate number of bins can mislead you** and you should **try multiple number of bins** to obtain the most accurate picture of the data. Although the 'default' may work in most cases, don't blindly trust it! Don't judge the distribution of a dataset based on a single histogram. Try multiple parameters to get the full picture!" ] }, { "cell_type": "markdown", "metadata": { "id": "Fevnnf2fTG6f" }, "source": [ "## CDF (Cumulative distribution function)\n", "\n", "Drawing a CDF is easy. Because it's very common data visualization, histogram has an option called `cumulative`." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 448 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:33.036Z", "iopub.status.busy": "2020-06-14T19:57:32.985Z", "iopub.status.idle": "2020-06-14T19:57:33.475Z", "shell.execute_reply": "2020-06-14T19:57:33.492Z" }, "executionInfo": { "elapsed": 590, "status": "ok", "timestamp": 1687818472742, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "tGx6OxKATG6g", "jupyter": { "outputs_hidden": false }, "outputId": "908dae60-f150-4f40-9149-8faa5670e7ca" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "movies.IMDB_Rating.hist(cumulative=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "ylVvfe-eTG6g" }, "source": [ "You can also combine with options such as `histtype` and `density`." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 448 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:34.566Z", "iopub.status.busy": "2020-06-14T19:57:34.550Z", "iopub.status.idle": "2020-06-14T19:57:34.671Z", "shell.execute_reply": "2020-06-14T19:57:34.687Z" }, "executionInfo": { "elapsed": 336, "status": "ok", "timestamp": 1687818473325, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "V-qGVKxVTG6g", "jupyter": { "outputs_hidden": false }, "outputId": "9fd0e1f5-e2fb-49dc-ebb8-9d7a17bc5e29" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "movies.IMDB_Rating.hist(histtype='step', cumulative=True, density=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "5LkuToB3TG6g" }, "source": [ "And increase the number of bins." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 448 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:35.726Z", "iopub.status.busy": "2020-06-14T19:57:35.709Z", "iopub.status.idle": "2020-06-14T19:57:37.266Z", "shell.execute_reply": "2020-06-14T19:57:37.319Z" }, "executionInfo": { "elapsed": 2968, "status": "ok", "timestamp": 1687818476463, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "d0fgONoxTG6g", "jupyter": { "outputs_hidden": false }, "outputId": "3f3cb362-f263-4565-f72a-33ea6be4e2cb" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "movies.IMDB_Rating.hist(cumulative=True, density=True, bins=1000)" ] }, { "cell_type": "markdown", "metadata": { "id": "JrytXZBQTG6g" }, "source": [ "This method works fine. By increasing the number of bins, you can get a CDF in the resolution that you want. But let's also try it manually to better understand what's going on. First, we should sort all the values." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:38.193Z", "iopub.status.busy": "2020-06-14T19:57:38.177Z", "iopub.status.idle": "2020-06-14T19:57:38.229Z", "shell.execute_reply": "2020-06-14T19:57:38.245Z" }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1687818476464, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "hQmzLYonTG6g", "jupyter": { "outputs_hidden": false }, "outputId": "e986fb7e-dd88-412e-8d7b-7652dee31d4a" }, "outputs": [ { "data": { "text/plain": [ "1247 1.4\n", "406 1.5\n", "1754 1.6\n", "1590 1.7\n", "1515 1.7\n", "Name: IMDB_Rating, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rating_sorted = movies.IMDB_Rating.sort_values()\n", "rating_sorted.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "GwGp6rXDTG6g" }, "source": [ "We need to know the number of data points," ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:39.585Z", "iopub.status.busy": "2020-06-14T19:57:39.569Z", "iopub.status.idle": "2020-06-14T19:57:39.617Z", "shell.execute_reply": "2020-06-14T19:57:39.635Z" }, "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1687818476465, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, 
"user_tz": 240 }, "id": "wzlbCrkcTG6h", "jupyter": { "outputs_hidden": false }, "outputId": "788c365c-1eb4-446c-9a6e-3627f7a758e4" }, "outputs": [ { "data": { "text/plain": [ "2988" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = len(rating_sorted)\n", "N" ] }, { "cell_type": "markdown", "metadata": { "id": "TB-YgQwDTG6h" }, "source": [ "And I think this may be useful for you." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:40.533Z", "iopub.status.busy": "2020-06-14T19:57:40.515Z", "iopub.status.idle": "2020-06-14T19:57:40.565Z", "shell.execute_reply": "2020-06-14T19:57:40.581Z" }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1687818476989, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "ta5S5g-OTG6h", "jupyter": { "outputs_hidden": false }, "outputId": "69f7a2c3-e913-4d5a-c10c-e66abf2dc7c0" }, "outputs": [ { "data": { "text/plain": [ "array([0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 , 0.22,\n", " 0.24, 0.26, 0.28, 0.3 , 0.32, 0.34, 0.36, 0.38, 0.4 , 0.42, 0.44,\n", " 0.46, 0.48, 0.5 , 0.52, 0.54, 0.56, 0.58, 0.6 , 0.62, 0.64, 0.66,\n", " 0.68, 0.7 , 0.72, 0.74, 0.76, 0.78, 0.8 , 0.82, 0.84, 0.86, 0.88,\n", " 0.9 , 0.92, 0.94, 0.96, 0.98, 1. ])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = 50\n", "np.linspace(1/n, 1.0, num=n)" ] }, { "cell_type": "markdown", "metadata": { "id": "xTkw7L1ETG6h" }, "source": [ "**Q: now you're ready to draw a proper CDF. 
Draw the CDF plot of this data.**" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 449 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:41.498Z", "iopub.status.busy": "2020-06-14T19:57:41.481Z", "iopub.status.idle": "2020-06-14T19:57:41.677Z", "shell.execute_reply": "2020-06-14T19:57:41.693Z" }, "executionInfo": { "elapsed": 345, "status": "ok", "timestamp": 1687818478045, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "mS2e8Ji4TG6h", "jupyter": { "outputs_hidden": false }, "outputId": "0e2fd594-524b-4d3a-df83-aad9d593a30e" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "_yRa45NKTG6h" }, "source": [ "## A bit more histogram with altair\n", "\n", "As you may remember, you can get a pandas dataframe from `vega_datasets` package and use it to create visualizations. But, if you use `altair`, you can simply pass the URL instead of the actual data." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:43.058Z", "iopub.status.busy": "2020-06-14T19:57:43.042Z", "iopub.status.idle": "2020-06-14T19:57:43.092Z", "shell.execute_reply": "2020-06-14T19:57:43.108Z" }, "executionInfo": { "elapsed": 138, "status": "ok", "timestamp": 1687818484772, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "i5tVwCoUTG6h", "jupyter": { "outputs_hidden": false }, "outputId": "f2d50db3-0a7e-4dbb-8e4c-73112fb813e2" }, "outputs": [ { "data": { "text/plain": [ "'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vega_datasets.data.movies.url" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2020-06-14T19:57:43.771Z", "iopub.status.busy": "2020-06-14T19:57:43.755Z", "iopub.status.idle": "2020-06-14T19:57:43.806Z", "shell.execute_reply": "2020-06-14T19:57:43.821Z" }, "executionInfo": { "elapsed": 12, "status": "ok", "timestamp": 1687818484928, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "ovK3CJblTG6i", "jupyter": { "outputs_hidden": false }, "outputId": "148eff64-dbe9-44d0-bcca-a4e775aac584" }, "outputs": [ { "data": { "text/plain": [ 
"RendererRegistry.enable('jupyterlab')" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Choose based on your environment\n", "alt.renderers.enable('jupyterlab')\n", "# alt.renderers.enable('notebook')" ] }, { "cell_type": "markdown", "metadata": { "id": "lwcVA0LxTG6i" }, "source": [ "As mentioned before, in `altair` histogram is not special. It is just a plot that use bars (`mark_bar()`) where X axis is defined by `IMDB_Rating` with bins (`bin=True`), and Y axis is defined by `count()` aggregation function." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:44.724Z", "iopub.status.busy": "2020-06-14T19:57:44.708Z", "iopub.status.idle": "2020-06-14T19:57:44.757Z", "shell.execute_reply": "2020-06-14T19:57:44.817Z" }, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1687818484929, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "h048hIgETG6i", "jupyter": { "outputs_hidden": false }, "outputId": "2729bc70-08cd-4c5b-e569-76c73bf8cbb6" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": true, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "mark": { "type": "bar" } }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X(\"IMDB_Rating:Q\", bin=True),\n", " alt.Y('count()')\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "KrhwH6I1TG6i" }, "source": [ "Have you noted that it is `IMDB_Rating:Q` not `IMDB_Rating`? This is a shorthand for" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:45.826Z", "iopub.status.busy": "2020-06-14T19:57:45.808Z", "iopub.status.idle": "2020-06-14T19:57:45.861Z", "shell.execute_reply": "2020-06-14T19:57:45.890Z" }, "executionInfo": { "elapsed": 24, "status": "ok", "timestamp": 1687818485080, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "Jna8_X-MTG6i", "jupyter": { "outputs_hidden": false }, "outputId": "11a8b01b-03f5-4d69-c124-cce4b583a2d9" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": true, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "mark": { "type": "bar" } }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X('IMDB_Rating', type='quantitative', bin=True),\n", " alt.Y(aggregate='count', type='quantitative')\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "PXWdvT7XTG6i" }, "source": [ "In altair, you want to specify the data types using one of the four categories: quantitative, ordinal, nominal, and temporal. https://altair-viz.github.io/user_guide/encoding.html#data-types" ] }, { "cell_type": "markdown", "metadata": { "id": "07Qhm4sZTG6i" }, "source": [ "Although you can adjust the bins in `altair`, it does not encourage you to set them directly. For instance, there is a `step` parameter that directly sets the bin size, but parameters such as `maxbins` (maximum number of bins), `minstep` (minimum allowable step size), and `nice` (attempts to make the bin boundaries more human-friendly) let you guide the binning without specifying the bins yourself." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:48.350Z", "iopub.status.busy": "2020-06-14T19:57:48.333Z", "iopub.status.idle": "2020-06-14T19:57:48.386Z", "shell.execute_reply": "2020-06-14T19:57:48.414Z" }, "executionInfo": { "elapsed": 21, "status": "ok", "timestamp": 1687818485080, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "6fg-viz1TG6i", "jupyter": { "outputs_hidden": false }, "outputId": "3c6ba6e4-afc4-4819-dc3f-ed276f5fbc59" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "step": 0.09 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "mark": { "type": "bar" } }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from altair import Bin\n", "\n", "alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X(\"IMDB_Rating:Q\", bin=Bin(step=0.09)),\n", " alt.Y('count()')\n", ")" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:49.339Z", "iopub.status.busy": "2020-06-14T19:57:49.320Z", "iopub.status.idle": "2020-06-14T19:57:49.373Z", "shell.execute_reply": "2020-06-14T19:57:49.401Z" }, "executionInfo": { "elapsed": 20, "status": "ok", "timestamp": 1687818485081, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "lhrilAaATG6j", "jupyter": { "outputs_hidden": false }, "outputId": "530289a0-fa70-4eda-a05e-90caa03bc646" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "maxbins": 20, "nice": true }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "mark": { "type": "bar" } }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X(\"IMDB_Rating:Q\", bin=Bin(nice=True, maxbins=20)),\n", " alt.Y('count()')\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "B6I7ZhwUTG6j" }, "source": [ "### Composing charts in altair\n", "\n", "`altair` has a very nice way to compose multiple plots. Two histograms side by side? Just do the following." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2020-06-14T19:57:50.814Z", "iopub.status.busy": "2020-06-14T19:57:50.798Z", "iopub.status.idle": "2020-06-14T19:57:50.842Z", "shell.execute_reply": "2020-06-14T19:57:50.859Z" }, "executionInfo": { "elapsed": 156, "status": "ok", "timestamp": 1687818529552, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "xPmv3BSrTG6j" }, "outputs": [], "source": [ "chart1 = alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X(\"IMDB_Rating:Q\", bin=Bin(step=0.1)),\n", " alt.Y('count()')\n", ").properties(\n", " width=300,\n", " height=150\n", ")\n", "chart2 = alt.Chart(vega_datasets.data.movies.url).mark_bar().encode(\n", " alt.X(\"IMDB_Rating:Q\", bin=Bin(nice=True, maxbins=20)),\n", " alt.Y('count()')\n", ").properties(\n", " width=300,\n", " height=150\n", ")" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:52.890Z", "iopub.status.busy": "2020-06-14T19:57:52.874Z", "iopub.status.idle": "2020-06-14T19:57:52.923Z", "shell.execute_reply": "2020-06-14T19:57:52.953Z" }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1687818530196, "user": { "displayName": "Vincent Wong", 
"userId": "06927694896148305320" }, "user_tz": 240 }, "id": "ysVDBmkHTG6j", "jupyter": { "outputs_hidden": false }, "outputId": "cb97ec72-7ae3-4472-e672-08aa74e3b5af" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "hconcat": [ { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "step": 0.1 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 }, { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "maxbins": 20, "nice": true }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 } ] }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chart1 | chart2" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:53.433Z", "iopub.status.busy": "2020-06-14T19:57:53.415Z", "iopub.status.idle": "2020-06-14T19:57:53.469Z", "shell.execute_reply": "2020-06-14T19:57:53.502Z" }, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1687818530827, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "4V9T406iTG6j", "jupyter": { "outputs_hidden": false }, "outputId": "e90872a0-239a-4dca-d2a3-87edae091c74" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "hconcat": [ { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "step": 0.1 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 }, { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "maxbins": 20, "nice": true }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 } ] }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.hconcat(chart1, chart2)" ] }, { "cell_type": "markdown", "metadata": { "id": "J2H-DmilTG6j" }, "source": [ "Vertical composition?" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:55.011Z", "iopub.status.busy": "2020-06-14T19:57:54.929Z", "iopub.status.idle": "2020-06-14T19:57:55.080Z", "shell.execute_reply": "2020-06-14T19:57:55.113Z" }, "executionInfo": { "elapsed": 447, "status": "ok", "timestamp": 1687818531762, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "XqiQlLJwTG6j", "jupyter": { "outputs_hidden": false }, "outputId": "b4f414f9-d5af-4630-801c-b1f0980d1338" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "vconcat": [ { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "step": 0.1 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 }, { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "maxbins": 20, "nice": true }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 } ] }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.vconcat(chart1, chart2)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:56.070Z", "iopub.status.busy": "2020-06-14T19:57:56.051Z", "iopub.status.idle": "2020-06-14T19:57:56.107Z", "shell.execute_reply": "2020-06-14T19:57:56.143Z" }, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1687818531763, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "lNNGnNF9TG6k", "jupyter": { "outputs_hidden": false }, "outputId": "c3de22f3-1815-45c9-85d8-279f43b6fb59" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "vconcat": [ { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "step": 0.1 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 }, { "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "encoding": { "x": { "bin": { "maxbins": 20, "nice": true }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 300 } ] }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. 
For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chart1 & chart2" ] }, { "cell_type": "markdown", "metadata": { "id": "jAIuSQfmTG6k" }, "source": [ "Shall we avoid some repetition? You can define an empty *base* chart first and then assign encodings later when you put multiple charts together. Here is an example: https://altair-viz.github.io/user_guide/compound_charts.html#repeated-charts\n", "\n", "**Q: Use the base chart approach to create a 2x2 chart where the top row shows two histograms of `IMDB_Rating` with `maxbins`=10 and 50 respectively, and the bottom row shows two histograms of `IMDB_Votes` with `maxbins`=10 and 50.**" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 108 }, "execution": { "iopub.execute_input": "2020-06-14T19:57:57.894Z", "iopub.status.busy": "2020-06-14T19:57:57.878Z", "iopub.status.idle": "2020-06-14T19:57:57.930Z", "shell.execute_reply": "2020-06-14T19:57:57.969Z" }, "executionInfo": { "elapsed": 13, "status": "ok", "timestamp": 1687818532588, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "Z8y3XrhmTG6k", "jupyter": { "outputs_hidden": false }, "outputId": "28095dd2-5f7b-4134-b4ef-9f5d92dd447b" }, "outputs": [ { "data": { "application/vnd.vegalite.v5+json": { "$schema": "https://vega.github.io/schema/vega-lite/v5.17.0.json", "config": { "view": { "continuousHeight": 300, "continuousWidth": 300 } }, "data": { "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json" }, "vconcat": [ { "hconcat": [ { "encoding": { "x": { "bin": { "maxbins": 10 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 200 }, { 
"encoding": { "x": { "bin": { "maxbins": 50 }, "field": "IMDB_Rating", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 200 } ] }, { "hconcat": [ { "encoding": { "x": { "bin": { "maxbins": 10 }, "field": "IMDB_Votes", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 200 }, { "encoding": { "x": { "bin": { "maxbins": 50 }, "field": "IMDB_Votes", "type": "quantitative" }, "y": { "aggregate": "count", "type": "quantitative" } }, "height": 150, "mark": { "type": "bar" }, "width": 200 } ] } ] }, "text/plain": [ "\n", "\n", "If you see this message, it means the renderer has not been properly enabled\n", "for the frontend that you are using. For more information, see\n", "https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\n" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE\n" ] } ], "metadata": { "anaconda-cloud": {}, "colab": { "provenance": [] }, "kernel_info": { "name": "dviz" }, "kernelspec": { "display_name": "Python 3.9.13 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" }, "nteract": { "version": "0.23.3" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "vscode": { "interpreter": { "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" } } }, "nbformat": 4, "nbformat_minor": 0 }