{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Distributions and Sampling with NumPy\n",
    "\n",
    "#### <i style=\"color:red\">**EXERCISES**</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "+ _1. When we print `my_list` and `my_array` it appears to produce the same result. How would you check the type of data structure for each one?_\n",
    "\n",
    "```python\n",
    "my_list = [11, 12, 33]\n",
    "print(my_list)\n",
    "\n",
    "my_array = np.array([11, 12, 33])\n",
    "print(my_array)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[11, 12, 33] <class 'list'>\n",
      "[11 12 33] <class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "my_list = [11, 12, 33]\n",
    "print(my_list, type(my_list))\n",
    "\n",
    "my_array = np.array([11, 12, 33])\n",
    "print(my_array, type(my_array))\n",
    "\n",
    "# printing also the type, would give us an indication of the data structure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "+ _2. Define two vectors $X$ and $Y$, with a set of 5 numbers of your choice. Secondly, try to perform the following operation $(X + Y) / 2$. Is it possible to make this operation with both Lists and Arrays? if not why?_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for /: 'list' and 'int'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-8-ee5320697cea>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mY\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m20\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m40\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m50\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mY\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'"
     ]
    }
   ],
   "source": [
    "#Trying out with list\n",
    "X = [11, 12, 12, 14, 15]\n",
    "Y = [10, 20, 30, 40, 50]\n",
    "\n",
    "(X + Y) /2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10.5, 16. , 21. , 27. , 32.5])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Trying out with array\n",
    "X = np.array([11, 12, 12, 14, 15])\n",
    "Y = np.array([10, 20, 30, 40, 50])\n",
    "\n",
    "(X + Y) /2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It does not work with lists simply because X and Y are Python objects, and one can't make mathematical operations with two objects just like that. On the other hands, Numpy makes arrays to be seen as a mathematical vector, which make it possible to perform operations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "+ _3. Let's say that I do want to import the whole stats library and then use the Uniform distribution_\n",
    "\n",
    "```python\n",
    "import scipy.stats as stats\n",
    "```\n",
    "\n",
    "Why is the following function not working anymore? Make the appropiate changes to fix the problem.\n",
    "\n",
    "```python\n",
    "uniform.rvs(size=100)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two ways to import libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.86471247, 0.86690817, 0.05705536, 0.46569065, 0.29175586,\n",
       "       0.66459976, 0.79557259, 0.85127024, 0.02890321, 0.03092554])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# importing the whole library\n",
    "import scipy.stats as stats\n",
    "\n",
    "# adding the alias at the begining\n",
    "stats.uniform.rvs(size=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.28894512, 0.47649095, 0.21860038, 0.80341151, 0.32551943,\n",
       "       0.82727626, 0.48046056, 0.08225253, 0.66734134, 0.31230785])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# importing only the fucntion needed\n",
    "from scipy.stats import uniform\n",
    "\n",
    "# no alias needed\n",
    "uniform.rvs(size=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "+ _4. All of the following 3 functions generate a sample of 20 random numbers from a Normal distribution with `mean = 10` and `sd = 5`. \n",
    "They have the same parameters, but do they produce the same results? What are the differences or similarities among them?_\n",
    "\n",
    "```python\n",
    "norm.rvs(10, 5, 20)\n",
    "\n",
    "norm.rvs(loc=10, scale=5, size=20, random_state=2021)\n",
    "\n",
    "norm.rvs(random_state=2021, scale=5, loc=10, size=20)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When the name of the arguments in the fucntion are not specified, then the order of the parameters matter\n",
    "\n",
    "```python\n",
    "norm.rvs(10, 5, 20)\n",
    "\n",
    "```\n",
    "\n",
    "- When the name of the arguments are specified, then the parameters can be put in any order, like so\n",
    "\n",
    "```python\n",
    "norm.rvs(loc=10, scale=5, size=20)\n",
    "\n",
    "norm.rvs(scale=5, loc=10, size=20)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "\n",
    "+ _5. Given the following vector in an array form. Explore why it cannot be sampled. How do you fix this?_\n",
    "\n",
    "```python\n",
    "V = np.array([0, 1, 2, 3, 4, 5])\n",
    "random.sample(V, 2) \n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "Population must be a sequence or set.  For dicts, use list(d).",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-12-a9019d9632b4>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0mV\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mV\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/opt/anaconda3/lib/python3.7/random.py\u001b[0m in \u001b[0;36msample\u001b[0;34m(self, population, k)\u001b[0m\n\u001b[1;32m    315\u001b[0m             \u001b[0mpopulation\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    316\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_Sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 317\u001b[0;31m             \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Population must be a sequence or set.  For dicts, use list(d).\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    318\u001b[0m         \u001b[0mrandbelow\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_randbelow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    319\u001b[0m         \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mTypeError\u001b[0m: Population must be a sequence or set.  For dicts, use list(d)."
     ]
    }
   ],
   "source": [
    "V = np.array([0, 1, 2, 3, 4, 5])\n",
    "random.sample(V, 2) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can't be sampled because of 2 reasons. \n",
    "1. The library random need to be imported\n",
    "2. the array need to be converted to a list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import random\n",
    "V = np.array([0, 1, 2, 3, 4, 5])\n",
    "V = list(V)\n",
    "random.sample(V, 2) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "+ _6. Define a vector with elements of your wish, you can use Lists or arrays to do this. Then write a code that will take a sample corresponding to the 20% total amount of elements\n",
    "\n",
    "Hint: you can use `len()` function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "msiqhz0XPT6D"
   },
   "source": [
    "\n",
    "**SOLUTION**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def percetage_sample(vector, percentage):\n",
    "    \"This function takes a vector and a desire percentage sample in decimal notation\"\n",
    "    # calculate elements to sample\n",
    "    to_sample = len(vector)*percentage\n",
    "    # convert to closest integer\n",
    "    to_sample = int(to_sample)\n",
    "    # do random sample\n",
    "    sampled = random.sample(list(vector), to_sample) \n",
    "    return sampled"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Too Hot to Handle', 'Rick and Morty']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "netflix = [\"Luis Miguel\", \"New Amsterdam\", \"Lupin\", \"Shtisel\", \"Taco Chronicles\", \"The Queen's Gambit\", \n",
    "           \"Too Hot to Handle\", \"The Crown\", \"Rick and Morty\", \"Anne+\", \"Selling Sunset\", \"Vikings\"] \n",
    "\n",
    "# Sampling 20% of the Netflix list\n",
    "percetage_sample(netflix, 0.20)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}