{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributions and Sampling with NumPy\n",
"\n",
"#### **EXERCISES**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"+ _1. When we print `my_list` and `my_array` it appears to produce the same result. How would you check the type of data structure for each one?_\n",
"\n",
"```python\n",
"my_list = [11, 12, 33]\n",
"print(my_list)\n",
"\n",
"my_array = np.array([11, 12, 33])\n",
"print(my_array)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[11, 12, 33] \n",
"[11 12 33] \n"
]
}
],
"source": [
"my_list = [11, 12, 33]\n",
"print(my_list, type(my_list))\n",
"\n",
"my_array = np.array([11, 12, 33])\n",
"print(my_array, type(my_array))\n",
"\n",
"# printing also the type, would give us an indication of the data structure"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"+ _2. Define two vectors $X$ and $Y$, with a set of 5 numbers of your choice. Secondly, try to perform the following operation $(X + Y) / 2$. Is it possible to make this operation with both Lists and Arrays? if not why?_"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "unsupported operand type(s) for /: 'list' and 'int'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mY\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m20\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m40\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m50\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mY\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'"
]
}
],
"source": [
"#Trying out with list\n",
"X = [11, 12, 12, 14, 15]\n",
"Y = [10, 20, 30, 40, 50]\n",
"\n",
"(X + Y) /2"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([10.5, 16. , 21. , 27. , 32.5])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Trying out with array\n",
"X = np.array([11, 12, 12, 14, 15])\n",
"Y = np.array([10, 20, 30, 40, 50])\n",
"\n",
"(X + Y) /2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It does not work with lists simply because X and Y are Python objects, and one can't make mathematical operations with two objects just like that. On the other hands, Numpy makes arrays to be seen as a mathematical vector, which make it possible to perform operations."
]
},
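{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the difference (the names `X_list`, `Y_list`, `concatenated`, `mean_list`, and `mean_array` are illustrative and not part of the exercise), the code below shows what `+` does to two lists and one way to compute the element-wise mean without NumPy:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"X_list = [11, 12, 12, 14, 15]\n",
"Y_list = [10, 20, 30, 40, 50]\n",
"\n",
"# With lists, + concatenates: the result has 10 elements, not 5\n",
"concatenated = X_list + Y_list\n",
"print(len(concatenated))\n",
"\n",
"# Element-wise mean in plain Python: pair the items up with zip\n",
"mean_list = [(x + y) / 2 for x, y in zip(X_list, Y_list)]\n",
"print(mean_list)\n",
"\n",
"# With arrays, the same arithmetic is applied element by element\n",
"mean_array = (np.array(X_list) + np.array(Y_list)) / 2\n",
"print(mean_array)\n",
"```\n",
"\n",
"The list comprehension and the array expression give the same five values; the array version is simply shorter and closer to the mathematical notation."
]
},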
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"+ _3. Let's say that I do want to import the whole stats library and then use the Uniform distribution_\n",
"\n",
"```python\n",
"import scipy.stats as stats\n",
"```\n",
"\n",
"Why is the following function not working anymore? Make the appropiate changes to fix the problem.\n",
"\n",
"```python\n",
"uniform.rvs(size=100)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two ways to import libraries:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.86471247, 0.86690817, 0.05705536, 0.46569065, 0.29175586,\n",
" 0.66459976, 0.79557259, 0.85127024, 0.02890321, 0.03092554])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# importing the whole library\n",
"import scipy.stats as stats\n",
"\n",
"# adding the alias at the begining\n",
"stats.uniform.rvs(size=10)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.28894512, 0.47649095, 0.21860038, 0.80341151, 0.32551943,\n",
" 0.82727626, 0.48046056, 0.08225253, 0.66734134, 0.31230785])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# importing only the fucntion needed\n",
"from scipy.stats import uniform\n",
"\n",
"# no alias needed\n",
"uniform.rvs(size=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"+ _4. All of the following 3 functions generate a sample of 20 random numbers from a Normal distribution with `mean = 10` and `sd = 5`. \n",
"They have the same parameters, but do they produce the same results? What are the differences or similarities among them?_\n",
"\n",
"```python\n",
"norm.rvs(10, 5, 20)\n",
"\n",
"norm.rvs(loc=10, scale=5, size=20, random_state=2021)\n",
"\n",
"norm.rvs(random_state=2021, scale=5, loc=10, size=20)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- When the name of the arguments in the fucntion are not specified, then the order of the parameters matter\n",
"\n",
"```python\n",
"norm.rvs(10, 5, 20)\n",
"\n",
"```\n",
"\n",
"- When the name of the arguments are specified, then the parameters can be put in any order, like so\n",
"\n",
"```python\n",
"norm.rvs(loc=10, scale=5, size=20)\n",
"\n",
"norm.rvs(scale=5, loc=10, size=20)\n",
"```"
]
},
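{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `random_state` can be checked directly. The following is a small sketch, not part of the original exercise; the variable names `unseeded_a`, `unseeded_b`, `seeded_a`, and `seeded_b` are made up for illustration:\n",
"\n",
"```python\n",
"from scipy.stats import norm\n",
"\n",
"# No random_state: each call draws a fresh sample\n",
"unseeded_a = norm.rvs(10, 5, 20)\n",
"unseeded_b = norm.rvs(10, 5, 20)\n",
"\n",
"# Same seed, keyword arguments given in a different order\n",
"seeded_a = norm.rvs(loc=10, scale=5, size=20, random_state=2021)\n",
"seeded_b = norm.rvs(random_state=2021, scale=5, loc=10, size=20)\n",
"\n",
"# True when every element of the two seeded samples matches\n",
"print((seeded_a == seeded_b).all())\n",
"\n",
"# The two unseeded samples are independent draws, so they should differ\n",
"print((unseeded_a == unseeded_b).all())\n",
"```\n",
"\n",
"Fixing the seed is what makes a random experiment reproducible; the order of the keyword arguments plays no role."
]
},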
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"+ _5. Given the following vector in an array form. Explore why it cannot be sampled. How do you fix this?_\n",
"\n",
"```python\n",
"V = np.array([0, 1, 2, 3, 4, 5])\n",
"random.sample(V, 2) \n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "Population must be a sequence or set. For dicts, use list(d).",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mV\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mV\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/opt/anaconda3/lib/python3.7/random.py\u001b[0m in \u001b[0;36msample\u001b[0;34m(self, population, k)\u001b[0m\n\u001b[1;32m 315\u001b[0m \u001b[0mpopulation\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 316\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_Sequence\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 317\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Population must be a sequence or set. For dicts, use list(d).\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 318\u001b[0m \u001b[0mrandbelow\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_randbelow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 319\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpopulation\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: Population must be a sequence or set. For dicts, use list(d)."
]
}
],
"source": [
"V = np.array([0, 1, 2, 3, 4, 5])\n",
"random.sample(V, 2) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This can't be sampled because of 2 reasons. \n",
"1. The library random need to be imported\n",
"2. the array need to be converted to a list"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import random\n",
"V = np.array([0, 1, 2, 3, 4, 5])\n",
"V = list(V)\n",
"random.sample(V, 2) "
]
},
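{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative that avoids the conversion is to sample with NumPy itself. The sketch below assumes NumPy 1.17 or newer (for `np.random.default_rng`); the seed `2021` and the names `rng` and `sampled` are arbitrary choices for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"V = np.array([0, 1, 2, 3, 4, 5])\n",
"\n",
"# A seeded generator makes the draw reproducible\n",
"rng = np.random.default_rng(2021)\n",
"\n",
"# replace=False draws without replacement, like random.sample\n",
"sampled = rng.choice(V, size=2, replace=False)\n",
"print(sampled)\n",
"```\n",
"\n",
"The result is a NumPy array rather than a list, which is convenient when the sample feeds into further array operations."
]
},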
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"+ _6. Define a vector with elements of your wish, you can use Lists or arrays to do this. Then write a code that will take a sample corresponding to the 20% total amount of elements\n",
"\n",
"Hint: you can use `len()` function\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "msiqhz0XPT6D"
},
"source": [
"\n",
"**SOLUTION**"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"def percetage_sample(vector, percentage):\n",
" \"This function takes a vector and a desire percentage sample in decimal notation\"\n",
" # calculate elements to sample\n",
" to_sample = len(vector)*percentage\n",
" # convert to closest integer\n",
" to_sample = int(to_sample)\n",
" # do random sample\n",
" sampled = random.sample(list(vector), to_sample) \n",
" return sampled"
]
},
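{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `int()` call in the function above truncates rather than rounds, so the sample can be one element smaller than the nearest whole number, and very short vectors can yield an empty sample. A quick sketch with made-up numbers (`n_items` and `fraction` are hypothetical values, not taken from the exercise):\n",
"\n",
"```python\n",
"n_items = 11     # hypothetical vector length\n",
"fraction = 0.25  # hypothetical sampling fraction\n",
"\n",
"exact = n_items * fraction  # 2.75 elements, not a whole number\n",
"truncated = int(exact)      # int() drops the fractional part\n",
"rounded = round(exact)      # round() picks the nearest integer\n",
"print(exact, truncated, rounded)\n",
"\n",
"# With only 4 elements, 20% truncates to an empty sample\n",
"tiny = int(4 * 0.20)\n",
"print(tiny)\n",
"```\n",
"\n",
"Whether to truncate or round is a design choice; swapping `int` for `round` in the function would be one way to get the nearest whole number instead."
]
},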
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Too Hot to Handle', 'Rick and Morty']"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"netflix = [\"Luis Miguel\", \"New Amsterdam\", \"Lupin\", \"Shtisel\", \"Taco Chronicles\", \"The Queen's Gambit\", \n",
" \"Too Hot to Handle\", \"The Crown\", \"Rick and Morty\", \"Anne+\", \"Selling Sunset\", \"Vikings\"] \n",
"\n",
"# Sampling 20% of the Netflix list\n",
"percetage_sample(netflix, 0.20)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}