{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributions and Sampling with NumPy\n", "\n", "#### **EXERCISES**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "+ _1. When we print `my_list` and `my_array`, they appear to produce the same result. How would you check which data structure each one is?_\n", "\n", "```python\n", "my_list = [11, 12, 33]\n", "print(my_list)\n", "\n", "my_array = np.array([11, 12, 33])\n", "print(my_array)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[11, 12, 33] <class 'list'>\n", "[11 12 33] <class 'numpy.ndarray'>\n" ] } ], "source": [ "my_list = [11, 12, 33]\n", "print(my_list, type(my_list))\n", "\n", "my_array = np.array([11, 12, 33])\n", "print(my_array, type(my_array))\n", "\n", "# printing the type next to each value reveals which data structure it is" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "+ _2. Define two vectors $X$ and $Y$, each containing 5 numbers of your choice. Then, try to perform the operation $(X + Y) / 2$. Is it possible to perform this operation with both lists and arrays? 
If not, why?_" ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for /: 'list' and 'int'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mY\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m20\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m40\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m50\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mY\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'" ] } ], "source": [ "# Trying out with lists\n", "X = [11, 12, 12, 14, 15]\n", "Y = [10, 20, 30, 40, 50]\n", "\n", "(X + Y) /2" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10.5, 16. , 21. , 27. 
, 32.5])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Trying out with arrays\n", "X = np.array([11, 12, 12, 14, 15])\n", "Y = np.array([10, 20, 30, 40, 50])\n", "\n", "(X + Y) /2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It does not work with lists because, for Python lists, `+` concatenates `X` and `Y` into a single 10-element list instead of adding them element by element, and a list cannot be divided by a number, which raises the `TypeError` above. NumPy arrays, on the other hand, implement arithmetic operators element-wise, so they behave like mathematical vectors: `X + Y` adds matching elements and `/ 2` divides every element by 2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "+ _3. Suppose I want to import the whole `scipy.stats` module and then use the uniform distribution._\n", "\n", "```python\n", "import scipy.stats as stats\n", "```\n", "\n", "Why does the following call no longer work? Make the appropriate changes to fix the problem.\n", "\n", "```python\n", "uniform.rvs(size=100)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`import scipy.stats as stats` only binds the name `stats`, so `uniform` on its own is not defined and the call raises a `NameError`. The distribution must either be accessed through the module alias or imported by name. There are two ways to do this:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.86471247, 0.86690817, 0.05705536, 0.46569065, 0.29175586,\n", " 0.66459976, 0.79557259, 0.85127024, 0.02890321, 0.03092554])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# importing the whole module\n", "import scipy.stats as stats\n", "\n", "# prefixing the distribution with the module alias\n", "stats.uniform.rvs(size=10)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.28894512, 0.47649095, 0.21860038, 0.80341151, 0.32551943,\n", " 0.82727626, 0.48046056, 0.08225253, 0.66734134, 0.31230785])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 
importing only the distribution we need\n", "from scipy.stats import uniform\n", "\n", "# no prefix needed\n", "uniform.rvs(size=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "+ _4. All three of the following calls generate a sample of 20 random numbers from a normal distribution with `mean = 10` and `sd = 5`. \n", "They use the same parameters, but do they produce the same results? What are the differences and similarities among them?_\n", "\n", "```python\n", "norm.rvs(10, 5, 20)\n", "\n", "norm.rvs(loc=10, scale=5, size=20, random_state=2021)\n", "\n", "norm.rvs(random_state=2021, scale=5, loc=10, size=20)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- All three calls draw 20 values from a normal distribution with `loc=10` (the mean) and `scale=5` (the standard deviation).\n", "\n", "- When the arguments are passed without names, their order matters. For `norm`, the positional order is `loc`, `scale`, `size`, so the first call is equivalent to `norm.rvs(loc=10, scale=5, size=20)`:\n", "\n", "```python\n", "norm.rvs(10, 5, 20)\n", "```\n", "\n", "- When the arguments are named, they can be given in any order, so the second and third calls are equivalent:\n", "\n", "```python\n", "norm.rvs(loc=10, scale=5, size=20, random_state=2021)\n", "\n", "norm.rvs(random_state=2021, scale=5, loc=10, size=20)\n", "```\n", "\n", "- The results differ because of `random_state`. The first call sets no seed, so it returns a different sample every time it runs. The second and third calls both fix the seed at `2021`, so they return exactly the same 20 numbers as each other, and the same numbers on every run." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "+ _5. Given the following vector stored as a NumPy array, explore why it cannot be sampled. How do you fix this?_\n", "\n", "```python\n", "V = np.array([0, 1, 2, 3, 4, 5])\n", "random.sample(V, 2) \n", "```\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "Population must be a sequence or set. 
For dicts, use list(d).", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mV\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mV\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/opt/anaconda3/lib/python3.7/random.py\u001b[0m in \u001b[0;36msample\u001b[0;34m(self, population, k)\u001b[0m\n\u001b[1;32m 315\u001b[0m \u001b[0mpopulation\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 316\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_Sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 317\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Population must be a sequence or set. 
For dicts, use list(d).\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 318\u001b[0m \u001b[0mrandbelow\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_randbelow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 319\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: Population must be a sequence or set. For dicts, use list(d)." ] } ], "source": [ "V = np.array([0, 1, 2, 3, 4, 5])\n", "random.sample(V, 2) " ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can't be sampled because of 2 reasons. \n", "1. The library random need to be imported\n", "2. the array need to be converted to a list" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import random\n", "V = np.array([0, 1, 2, 3, 4, 5])\n", "V = list(V)\n", "random.sample(V, 2) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "+ _6. Define a vector with elements of your wish, you can use Lists or arrays to do this. 
Then write code that takes a random sample containing 20% of the total number of elements._\n", "\n", "Hint: you can use the `len()` function.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "msiqhz0XPT6D" }, "source": [ "\n", "**SOLUTION**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "def percentage_sample(vector, percentage):\n", "    \"\"\"Return a random sample, without replacement, containing a fraction `percentage` (e.g. 0.2 for 20%) of the elements of `vector`.\"\"\"\n", "    # number of elements to sample\n", "    to_sample = len(vector) * percentage\n", "    # int() truncates toward zero, so e.g. 2.4 becomes 2\n", "    to_sample = int(to_sample)\n", "    # list() also handles NumPy arrays, which random.sample rejects\n", "    sampled = random.sample(list(vector), to_sample)\n", "    return sampled" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Too Hot to Handle', 'Rick and Morty']" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "netflix = [\"Luis Miguel\", \"New Amsterdam\", \"Lupin\", \"Shtisel\", \"Taco Chronicles\", \"The Queen's Gambit\", \n", "           \"Too Hot to Handle\", \"The Crown\", \"Rick and Morty\", \"Anne+\", \"Selling Sunset\", \"Vikings\"]\n", "\n", "# Sampling 20% of the Netflix list\n", "percentage_sample(netflix, 0.20)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }