{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1.02: Indexing, Slicing, Splitting, and Iterating" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our client wants to prove that our dataset is nicely distributed around the mean value of 100. \n", "They asked us to run some tests on several subsections of it to make sure they won't get a non-descriptive section of our data.\n", "\n", "Look at the mean value of each subtask." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Loading the dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# importing the necessary dependencies\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# loading the Dataset\n", "dataset = np.genfromtxt('../../Datasets/normal_distribution_splittable.csv', delimiter=',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we need several rows of our dataset to complete the given task, we have to use indexing to get the right rows. 
\n", "To recap, index: \n", "- the second row \n", "- the last row\n", "- the first value of the first row\n", "- the last value of the second to the last row" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "96.90038836444445" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing the second row of the dataset (2nd row)\n", "second_row = dataset[1]\n", "\n", "np.mean(second_row)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100.18096645222221" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing the last element of the dataset (last row)\n", "last_row = dataset[-1]\n", "\n", "np.mean(last_row)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "99.14931546" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing the first value of the first row (1st row, 1st value)\n", "first_val_first_row = dataset[0][0]\n", "\n", "np.mean(first_val_first_row)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "101.2226037" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing the last value of the second to last row (we want to use the combined access syntax here) \n", "last_val_second_last_row = dataset[-2, -1]\n", "\n", "np.mean(last_val_second_last_row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slicing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other than the single rows and values we also need to get some subsets of the dataset. 
\n", "Use slicing for:\n", "- a 2x2 slice from the second element of the second row to the third element of the third row\n", "- every other element of the 5th row\n", "- the content of the last row in reversed order" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "95.63393608250001" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slicing a 2x2 block (4 elements) covering rows 2-3 and columns 2-3\n", "subsection_2x2 = dataset[1:3, 1:3]\n", "\n", "np.mean(subsection_2x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Why is it not a problem if the mean of such a small subsection deviates more from 100?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Several smaller values can cluster in such a small subsection, pulling its mean well below 100. \n", "If we make our subsection larger, we have a higher chance of getting a more representative view of our data." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "98.35235805800001" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# selecting every second element of the fifth row \n", "every_other_elem = dataset[4, ::2]\n", "\n", "np.mean(every_other_elem)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100.18096645222222" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reversing the entry order, selecting the last row in reversed order\n", "reversed_last_row = dataset[-1, ::-1]\n", "\n", "np.mean(reversed_last_row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Splitting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our client's team only wants to use a small subset of the given dataset. \n", "Therefore we need to first split it into 3 equal pieces and then give them the first half of the first split. 
\n", "They sent us this drawing to show us what they need:\n", "```\n", "1, 2, 3, 4, 5, 6 1, 2 3, 4 5, 6 1, 2 \n", "3, 2, 1, 5, 4, 6 => 3, 2 1, 5 4, 6 => 3, 2 => 1, 2\n", "5, 3, 1, 2, 4, 3 5, 3 1, 2 4, 3 3, 2\n", "1, 2, 2, 4, 1, 5 1, 2 2, 4 1, 5 5, 3\n", " 1, 2\n", "```\n", "\n", "> **Note:** \n", "We are using a very small dataset here, but imagine you have a huge amount of data and only want to look at a small subset of it to tweak your visualizations." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# splitting up our dataset horizontally into 3 equal parts (at one third and two thirds of the columns)\n", "hor_splits = np.hsplit(dataset, 3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# splitting the first horizontal split vertically into 2 equal halves\n", "ver_splits = np.vsplit(hor_splits[0], 2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset (24, 9)\n", "Subset (12, 3)\n" ] } ], "source": [ "# requested subsection of our dataset which has only half the amount of rows and only a third of the columns\n", "print(\"Dataset\", dataset.shape)\n", "print(\"Subset\", ver_splits[0].shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Iterating" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have sent over the dataset, they tell you that they also need a way to iterate over the whole dataset element by element, as if it were a one-dimensional list. \n", "However, they also want to know the position in the dataset itself.\n", "\n", "They send you this piece of code and tell you that it's not working as intended. \n", "Come up with the right solution for their needs using the `np.ndenumerate` function." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99.14931546 0\n", "104.03852715 1\n", "107.43534677 2\n", "97.85230675 3\n", "98.74986914 4\n", "98.80833412 5\n", "96.81964892 6\n", "98.56783189 7\n", "101.34745901 8\n", "92.02628776 9\n", "97.10439252 10\n", "99.32066924 11\n", "97.24584816 12\n", "92.9267508 13\n", "92.65657752 14\n", "105.7197853 15\n", "101.23162942 16\n", "93.87155456 17\n", "95.66253664 18\n", "95.17750125 19\n", "90.93318132 20\n", "110.18889465 21\n", "98.80084371 22\n", "105.95297652 23\n", "98.37481387 24\n", "106.54654286 25\n", "107.22482426 26\n", "91.37294597 27\n", "100.96781394 28\n", "100.40118279 29\n", "113.42090475 30\n", "105.48508838 31\n", "91.6604946 32\n", "106.1472841 33\n", "95.08715803 34\n", "103.40412146 35\n", "101.20862522 36\n", "103.5730309 37\n", "100.28690912 38\n", "105.85269352 39\n", "93.37126331 40\n", "108.57980357 41\n", "100.79478953 42\n", "94.20019732 43\n", "96.10020311 44\n", "102.80387079 45\n", "98.29687616 46\n", "93.24376389 47\n", "97.24130034 48\n", "89.03452725 49\n", "96.2832753 50\n", "104.60344836 51\n", "101.13442416 52\n", "97.62787811 53\n", "106.71751618 54\n", "102.97585605 55\n", "98.45723272 56\n", "100.72418901 57\n", "106.39798503 58\n", "95.46493436 59\n", "94.35373179 60\n", "106.83273763 61\n", "100.07721494 62\n", "96.02548256 63\n", "102.82360856 64\n", "106.47551845 65\n", "101.34745901 66\n", "102.45651798 67\n", "98.74767493 68\n", "97.57544275 69\n", "92.5748759 70\n", "91.37294597 71\n", "105.30350449 72\n", "92.87730812 73\n", "103.19258339 74\n", "104.40518318 75\n", "101.29326772 76\n", "100.85447132 77\n", "101.2226037 78\n", "106.03868807 79\n", "97.85230675 80\n", "110.44484313 81\n", "93.87155456 82\n", "101.5363647 83\n", "97.65393524 84\n", "92.75048583 85\n", "101.72074646 86\n", "96.96851209 87\n", "103.29147111 88\n", "99.14931546 89\n", "101.3514185 90\n", "100.37372248 
91\n", "106.6471081 92\n", "100.61742813 93\n", "105.0320535 94\n", "99.35999981 95\n", "98.87007532 96\n", "95.85284217 97\n", "93.97853495 98\n", "97.21315663 99\n", "107.02874163 100\n", "102.17642112 101\n", "96.74630281 102\n", "95.93799169 103\n", "102.62384733 104\n", "105.07475277 105\n", "97.59572169 106\n", "106.57364584 107\n", "95.65982034 108\n", "107.22482426 109\n", "107.19119932 110\n", "102.93039474 111\n", "85.98839623 112\n", "95.19184343 113\n", "91.32093303 114\n", "102.35313953 115\n", "100.39303522 116\n", "100.39303522 117\n", "92.0108226 118\n", "97.75887636 119\n", "93.18884302 120\n", "100.44940274 121\n", "108.09423367 122\n", "96.50342927 123\n", "99.58664719 124\n", "95.19184343 125\n", "103.1521596 126\n", "109.40523174 127\n", "93.83969256 128\n", "99.95827854 129\n", "101.83462816 130\n", "99.69982772 131\n", "103.05289628 132\n", "103.93383957 133\n", "104.15899829 134\n", "106.11454989 135\n", "88.80221141 136\n", "94.5081787 137\n", "94.59300658 138\n", "101.08830521 139\n", "96.34622848 140\n", "96.89244283 141\n", "98.07122664 142\n", "100.28690912 143\n", "96.78266211 144\n", "99.84251605 145\n", "104.03478031 146\n", "106.57052697 147\n", "105.13668343 148\n", "105.37011896 149\n", "99.07551254 150\n", "104.15899829 151\n", "98.75108352 152\n", "101.86186193 153\n", "103.61720152 154\n", "99.57859892 155\n", "99.4889538 156\n", "103.05541444 157\n", "98.65912661 158\n", "98.72774132 159\n", "104.70526438 160\n", "110.44484313 161\n", "97.49594839 162\n", "96.59385486 163\n", "104.63817694 164\n", "102.55198606 165\n", "105.86078488 166\n", "96.5937781 167\n", "93.04610867 168\n", "99.92159953 169\n", "100.96781394 170\n", "96.76814836 171\n", "91.6779221 172\n", "101.79132774 173\n", "101.20773355 174\n", "98.29243952 175\n", "101.83845792 176\n", "97.94046856 177\n", "102.20618501 178\n", "91.37294597 179\n", "106.89005002 180\n", "106.57364584 181\n", "102.26648279 182\n", "107.40064604 183\n", "99.94318168 184\n", 
"103.40412146 185\n", "106.38276709 186\n", "98.00253006 187\n", "97.10439252 188\n", "99.80873105 189\n", "101.63973121 190\n", "106.46476468 191\n", "110.43976681 192\n", "100.69156231 193\n", "99.99579473 194\n", "101.32113654 195\n", "94.76253572 196\n", "97.24130034 197\n", "96.10020311 198\n", "94.57421727 199\n", "100.80409326 200\n", "105.02389857 201\n", "98.61325194 202\n", "95.62359311 203\n", "97.99762409 204\n", "103.83852459 205\n", "101.2226037 206\n", "94.11176915 207\n", "99.62387832 208\n", "104.51786419 209\n", "97.62787811 210\n", "93.97853495 211\n", "98.75108352 212\n", "106.05042487 213\n", "100.07721494 214\n", "106.89005002 215\n" ] } ], "source": [ "# iterating over whole datagmaiset (each value in each row)\n", "curr_index = 0\n", "for x in np.nditer(dataset):\n", " print(x, curr_index)\n", " curr_index += 1" ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(0, 0) 99.14931546\n", "(0, 1) 104.03852715\n", "(0, 2) 107.43534677\n", "(0, 3) 97.85230675\n", "(0, 4) 98.74986914\n", "(0, 5) 98.80833412\n", "(0, 6) 96.81964892\n", "(0, 7) 98.56783189\n", "(0, 8) 101.34745901\n", "(1, 0) 92.02628776\n", "(1, 1) 97.10439252\n", "(1, 2) 99.32066924\n", "(1, 3) 97.24584816\n", "(1, 4) 92.9267508\n", "(1, 5) 92.65657752\n", "(1, 6) 105.7197853\n", "(1, 7) 101.23162942\n", "(1, 8) 93.87155456\n", "(2, 0) 95.66253664\n", "(2, 1) 95.17750125\n", "(2, 2) 90.93318132\n", "(2, 3) 110.18889465\n", "(2, 4) 98.80084371\n", "(2, 5) 105.95297652\n", "(2, 6) 98.37481387\n", "(2, 7) 106.54654286\n", "(2, 8) 107.22482426\n", "(3, 0) 91.37294597\n", "(3, 1) 100.96781394\n", "(3, 2) 100.40118279\n", "(3, 3) 113.42090475\n", "(3, 4) 105.48508838\n", "(3, 5) 91.6604946\n", "(3, 6) 106.1472841\n", "(3, 7) 95.08715803\n", "(3, 8) 103.40412146\n", "(4, 0) 101.20862522\n", "(4, 1) 103.5730309\n", "(4, 2) 100.28690912\n", "(4, 3) 105.85269352\n", "(4, 4) 93.37126331\n", "(4, 5) 
108.57980357\n", "(4, 6) 100.79478953\n", "(4, 7) 94.20019732\n", "(4, 8) 96.10020311\n", "(5, 0) 102.80387079\n", "(5, 1) 98.29687616\n", "(5, 2) 93.24376389\n", "(5, 3) 97.24130034\n", "(5, 4) 89.03452725\n", "(5, 5) 96.2832753\n", "(5, 6) 104.60344836\n", "(5, 7) 101.13442416\n", "(5, 8) 97.62787811\n", "(6, 0) 106.71751618\n", "(6, 1) 102.97585605\n", "(6, 2) 98.45723272\n", "(6, 3) 100.72418901\n", "(6, 4) 106.39798503\n", "(6, 5) 95.46493436\n", "(6, 6) 94.35373179\n", "(6, 7) 106.83273763\n", "(6, 8) 100.07721494\n", "(7, 0) 96.02548256\n", "(7, 1) 102.82360856\n", "(7, 2) 106.47551845\n", "(7, 3) 101.34745901\n", "(7, 4) 102.45651798\n", "(7, 5) 98.74767493\n", "(7, 6) 97.57544275\n", "(7, 7) 92.5748759\n", "(7, 8) 91.37294597\n", "(8, 0) 105.30350449\n", "(8, 1) 92.87730812\n", "(8, 2) 103.19258339\n", "(8, 3) 104.40518318\n", "(8, 4) 101.29326772\n", "(8, 5) 100.85447132\n", "(8, 6) 101.2226037\n", "(8, 7) 106.03868807\n", "(8, 8) 97.85230675\n", "(9, 0) 110.44484313\n", "(9, 1) 93.87155456\n", "(9, 2) 101.5363647\n", "(9, 3) 97.65393524\n", "(9, 4) 92.75048583\n", "(9, 5) 101.72074646\n", "(9, 6) 96.96851209\n", "(9, 7) 103.29147111\n", "(9, 8) 99.14931546\n", "(10, 0) 101.3514185\n", "(10, 1) 100.37372248\n", "(10, 2) 106.6471081\n", "(10, 3) 100.61742813\n", "(10, 4) 105.0320535\n", "(10, 5) 99.35999981\n", "(10, 6) 98.87007532\n", "(10, 7) 95.85284217\n", "(10, 8) 93.97853495\n", "(11, 0) 97.21315663\n", "(11, 1) 107.02874163\n", "(11, 2) 102.17642112\n", "(11, 3) 96.74630281\n", "(11, 4) 95.93799169\n", "(11, 5) 102.62384733\n", "(11, 6) 105.07475277\n", "(11, 7) 97.59572169\n", "(11, 8) 106.57364584\n", "(12, 0) 95.65982034\n", "(12, 1) 107.22482426\n", "(12, 2) 107.19119932\n", "(12, 3) 102.93039474\n", "(12, 4) 85.98839623\n", "(12, 5) 95.19184343\n", "(12, 6) 91.32093303\n", "(12, 7) 102.35313953\n", "(12, 8) 100.39303522\n", "(13, 0) 100.39303522\n", "(13, 1) 92.0108226\n", "(13, 2) 97.75887636\n", "(13, 3) 93.18884302\n", "(13, 4) 
100.44940274\n", "(13, 5) 108.09423367\n", "(13, 6) 96.50342927\n", "(13, 7) 99.58664719\n", "(13, 8) 95.19184343\n", "(14, 0) 103.1521596\n", "(14, 1) 109.40523174\n", "(14, 2) 93.83969256\n", "(14, 3) 99.95827854\n", "(14, 4) 101.83462816\n", "(14, 5) 99.69982772\n", "(14, 6) 103.05289628\n", "(14, 7) 103.93383957\n", "(14, 8) 104.15899829\n", "(15, 0) 106.11454989\n", "(15, 1) 88.80221141\n", "(15, 2) 94.5081787\n", "(15, 3) 94.59300658\n", "(15, 4) 101.08830521\n", "(15, 5) 96.34622848\n", "(15, 6) 96.89244283\n", "(15, 7) 98.07122664\n", "(15, 8) 100.28690912\n", "(16, 0) 96.78266211\n", "(16, 1) 99.84251605\n", "(16, 2) 104.03478031\n", "(16, 3) 106.57052697\n", "(16, 4) 105.13668343\n", "(16, 5) 105.37011896\n", "(16, 6) 99.07551254\n", "(16, 7) 104.15899829\n", "(16, 8) 98.75108352\n", "(17, 0) 101.86186193\n", "(17, 1) 103.61720152\n", "(17, 2) 99.57859892\n", "(17, 3) 99.4889538\n", "(17, 4) 103.05541444\n", "(17, 5) 98.65912661\n", "(17, 6) 98.72774132\n", "(17, 7) 104.70526438\n", "(17, 8) 110.44484313\n", "(18, 0) 97.49594839\n", "(18, 1) 96.59385486\n", "(18, 2) 104.63817694\n", "(18, 3) 102.55198606\n", "(18, 4) 105.86078488\n", "(18, 5) 96.5937781\n", "(18, 6) 93.04610867\n", "(18, 7) 99.92159953\n", "(18, 8) 100.96781394\n", "(19, 0) 96.76814836\n", "(19, 1) 91.6779221\n", "(19, 2) 101.79132774\n", "(19, 3) 101.20773355\n", "(19, 4) 98.29243952\n", "(19, 5) 101.83845792\n", "(19, 6) 97.94046856\n", "(19, 7) 102.20618501\n", "(19, 8) 91.37294597\n", "(20, 0) 106.89005002\n", "(20, 1) 106.57364584\n", "(20, 2) 102.26648279\n", "(20, 3) 107.40064604\n", "(20, 4) 99.94318168\n", "(20, 5) 103.40412146\n", "(20, 6) 106.38276709\n", "(20, 7) 98.00253006\n", "(20, 8) 97.10439252\n", "(21, 0) 99.80873105\n", "(21, 1) 101.63973121\n", "(21, 2) 106.46476468\n", "(21, 3) 110.43976681\n", "(21, 4) 100.69156231\n", "(21, 5) 99.99579473\n", "(21, 6) 101.32113654\n", "(21, 7) 94.76253572\n", "(21, 8) 97.24130034\n", "(22, 0) 96.10020311\n", "(22, 1) 
94.57421727\n", "(22, 2) 100.80409326\n", "(22, 3) 105.02389857\n", "(22, 4) 98.61325194\n", "(22, 5) 95.62359311\n", "(22, 6) 97.99762409\n", "(22, 7) 103.83852459\n", "(22, 8) 101.2226037\n", "(23, 0) 94.11176915\n", "(23, 1) 99.62387832\n", "(23, 2) 104.51786419\n", "(23, 3) 97.62787811\n", "(23, 4) 93.97853495\n", "(23, 5) 98.75108352\n", "(23, 6) 106.05042487\n", "(23, 7) 100.07721494\n", "(23, 8) 106.89005002\n" ] } ], "source": [ "# iterating over whole dataset with indices matching the position in the dataset\n", "for index, value in np.ndenumerate(dataset):\n", " print(index, value)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }