# Distributions and Sampling with NumPy

#### **EXERCISES**

___
+ _1. When we print `my_list` and `my_array` it appears to produce the same result. How would you check the type of data structure for each one?_

```python
my_list = [11, 12, 33]
print(my_list)

my_array = np.array([11, 12, 33])
print(my_array)
```



**SOLUTION**

In [6]:
import numpy as np

In [7]:
my_list = [11, 12, 33]
print(my_list, type(my_list))

my_array = np.array([11, 12, 33])
print(my_array, type(my_array))

# printing the type next to the value shows which data structure each one is

[11, 12, 33] <class 'list'>
[11 12 33] <class 'numpy.ndarray'>
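
As a complement to `type()`, the check can also be written as a yes/no question with `isinstance()`. The snippet below is a minimal sketch; the variable names `list_is_list`, `array_is_ndarray` and `array_is_list` are introduced here only for illustration.

```python
import numpy as np

my_list = [11, 12, 33]
my_array = np.array([11, 12, 33])

# isinstance() returns True when the object is of the given type
list_is_list = isinstance(my_list, list)
array_is_ndarray = isinstance(my_array, np.ndarray)
array_is_list = isinstance(my_array, list)

print(list_is_list, array_is_ndarray, array_is_list)  # True True False

# An array also records the type of its elements in .dtype
print(my_array.dtype.kind)  # 'i' stands for a signed integer type
```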


___
+ _2. Define two vectors $X$ and $Y$, each with 5 numbers of your choice. Then try to compute $(X + Y) / 2$. Is this operation possible with both lists and arrays? If not, why?_


**SOLUTION**

In [8]:
# Trying it with lists
X = [11, 12, 12, 14, 15]
Y = [10, 20, 30, 40, 50]

(X + Y) / 2

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [9]:
# Trying it with arrays
X = np.array([11, 12, 12, 14, 15])
Y = np.array([10, 20, 30, 40, 50])

(X + Y) / 2

array([10.5, 16. , 21. , 27. , 32.5])

It does not work with lists because, for Python lists, `+` means concatenation: `X + Y` produces a single list of 10 elements, and dividing a list by an integer is not defined, hence the `TypeError`. NumPy arrays, on the other hand, behave like mathematical vectors: arithmetic operators are applied element by element, so `(X + Y) / 2` returns the element-wise mean of the two vectors.
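
To make the difference concrete, the sketch below shows what `+` actually does to two lists, and one possible way to get the element-wise result without NumPy by using `zip()`. The names `concatenated`, `list_mean` and `array_mean` are chosen here for illustration.

```python
import numpy as np

X = [11, 12, 12, 14, 15]
Y = [10, 20, 30, 40, 50]

# With lists, + joins the two lists instead of adding their elements
concatenated = X + Y
print(len(concatenated))  # 10

# Pure-Python element-wise mean: pair up the elements with zip()
list_mean = [(x + y) / 2 for x, y in zip(X, Y)]
print(list_mean)  # [10.5, 16.0, 21.0, 27.0, 32.5]

# NumPy applies the same arithmetic element by element
array_mean = (np.array(X) + np.array(Y)) / 2
print(array_mean.tolist() == list_mean)  # True
```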

___
+ _3. Suppose you import the whole `stats` module and then want to use the Uniform distribution:_

```python
import scipy.stats as stats
```

Why does the following call no longer work? Make the appropriate changes to fix the problem.

```python
uniform.rvs(size=100)
```



**SOLUTION**

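The call fails with a `NameError` because `import scipy.stats as stats` only binds the name `stats`; the name `uniform` is not defined on its own. There are two ways to fix this, depending on how the library is imported: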

In [10]:
# importing the whole library
import scipy.stats as stats

# prefix the call with the alias
stats.uniform.rvs(size=10)

array([0.86471247, 0.86690817, 0.05705536, 0.46569065, 0.29175586,
 0.66459976, 0.79557259, 0.85127024, 0.02890321, 0.03092554])

In [11]:
# importing only the distribution needed
from scipy.stats import uniform

# no alias needed
uniform.rvs(size=10)

array([0.28894512, 0.47649095, 0.21860038, 0.80341151, 0.32551943,
 0.82727626, 0.48046056, 0.08225253, 0.66734134, 0.31230785])

___
+ _4. All three of the following calls generate a sample of 20 random numbers from a Normal distribution with `mean = 10` and `sd = 5`.
They have the same parameters, but do they produce the same results? What are the differences and similarities among them?_

```python
norm.rvs(10, 5, 20)

norm.rvs(loc=10, scale=5, size=20, random_state=2021)

norm.rvs(random_state=2021, scale=5, loc=10, size=20)
```


**SOLUTION**

- When the argument names are not specified, the order of the arguments matters: they are read positionally as `loc`, `scale`, `size`

```python
norm.rvs(10, 5, 20)

```

- When the argument names are specified, the arguments can be given in any order, like so

```python
norm.rvs(loc=10, scale=5, size=20)

norm.rvs(scale=5, loc=10, size=20)
```
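
The question also asks whether the three calls return the same numbers. The sketch below checks this directly, relying on the `random_state` argument of `rvs` to fix the seed; the variable names `unseeded`, `seeded_a` and `seeded_b` are introduced here for illustration.

```python
from scipy.stats import norm

# No seed: a fresh sample is drawn on every call
unseeded = norm.rvs(10, 5, 20)

# Same seed, keywords in a different order: the samples are identical
seeded_a = norm.rvs(loc=10, scale=5, size=20, random_state=2021)
seeded_b = norm.rvs(random_state=2021, scale=5, loc=10, size=20)

print((seeded_a == seeded_b).all())  # True

# The unseeded sample is not expected to match the seeded ones
print((unseeded == seeded_a).all())
```

So the second and third calls are equivalent and reproducible, while the first one uses the same distribution parameters but returns different numbers each time it is run.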

___

+ _5. Given the following vector in array form, explore why it cannot be sampled. How do you fix this?_

```python
V = np.array([0, 1, 2, 3, 4, 5])
random.sample(V, 2) 
```


In [12]:
V = np.array([0, 1, 2, 3, 4, 5])
random.sample(V, 2) 

TypeError: Population must be a sequence or set. For dicts, use list(d).


**SOLUTION**

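This can't be sampled for two reasons:
1. The `random` library needs to be imported.
2. `random.sample` only accepts a sequence such as a list or a tuple, and a NumPy array does not count as one, so the array needs to be converted to a list.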

In [13]:
import random
V = np.array([0, 1, 2, 3, 4, 5])
V = list(V)
random.sample(V, 2) 

[1, 2]
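
As an alternative to converting the array, NumPy can sample from it directly. This is a minimal sketch using NumPy's `Generator.choice`, not part of the original solution; the seed `2021` and the names `rng` and `sampled` are arbitrary choices made here.

```python
import numpy as np

V = np.array([0, 1, 2, 3, 4, 5])

# A seeded Generator makes the draw reproducible (the seed is arbitrary)
rng = np.random.default_rng(2021)

# replace=False draws without replacement, like random.sample does
sampled = rng.choice(V, size=2, replace=False)
print(sampled)
```

The result stays a NumPy array, which is convenient if further vectorized operations follow.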

___
+ _6. Define a vector with elements of your choice, using either a list or an array. Then write code that takes a sample containing 20% of its elements._

Hint: you can use the `len()` function.



**SOLUTION**

In [20]:
import random

def percentage_sample(vector, percentage):
    """Return a random sample with the given fraction (in decimal notation) of the vector's elements."""
    # calculate the number of elements to sample
    to_sample = len(vector) * percentage
    # truncate to a whole number of elements
    to_sample = int(to_sample)
    # draw the random sample without replacement
    sampled = random.sample(list(vector), to_sample)
    return sampled

In [21]:
netflix = ["Luis Miguel", "New Amsterdam", "Lupin", "Shtisel", "Taco Chronicles", "The Queen's Gambit",
           "Too Hot to Handle", "The Crown", "Rick and Morty", "Anne+", "Selling Sunset", "Vikings"]

# Sampling 20% of the Netflix list
percentage_sample(netflix, 0.20)

['Too Hot to Handle', 'Rick and Morty']
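
The sample has two elements because 20% of 12 is 2.4 and `int()` drops the fractional part. The sketch below compares this with two other rounding choices one might consider; it is only an illustration, and the names `exact`, `truncated`, `rounded` and `rounded_up` are introduced here.

```python
import math

n_elements = 12  # length of the Netflix list above
fraction = 0.20

exact = n_elements * fraction   # about 2.4
truncated = int(exact)          # drops the fractional part
rounded = round(exact)          # nearest integer
rounded_up = math.ceil(exact)   # never fewer than the exact share

print(truncated, rounded, rounded_up)  # 2 2 3
```

Which rule is appropriate depends on whether the sample should never exceed, or never fall below, the requested percentage.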