# Lesson 9: Numpy

Version 1.0. Prepared by [Makzan](https://makzan.net). Updated in March 2021.

Numpy provides essential vector and matrix computation. It comes with its own array implementation, which is similar to a Python list. The difference is that every element in a Numpy array has the same type.

- [Numpy array creation](#Numpy-array-creation)
- [Zeros, ones, full, random](#Zeros,-ones,-full,-random)
- [random seed](#random-seed)
- [Creating array with linspace](#Creating-array-with-linspace)
- [Reshaping array](#reshaping-array)
- [Array operations](#Array-Operations)
- [Array broadcast](#Array-broadcast)
- [Querying Array](#Querying-the-array)
- [Array slicing](#Array-Slicing)
- [Reading CSV](#Reading-CSV)
- [Dot Product](#Extra:-Dot-Product)
- [Solving linear equations with numpy](#Extra:-Solving-linear-equations-with-Numpy)

In [10]:
import numpy as np

## Numpy array creation

In [11]:
arr1 = np.array([1,2,3,4,5])
print(arr1)

[1 2 3 4 5]


Creating an array from a `range`:

In [12]:
arr2 = np.array(range(10))
print(arr2)

[0 1 2 3 4 5 6 7 8 9]


Creating an array from a range with `arange`:

In [13]:
arr2b = np.arange(10)
print(arr2b)

[0 1 2 3 4 5 6 7 8 9]


In [14]:
arr2c = np.arange(10,20)
print(arr2c)

[10 11 12 13 14 15 16 17 18 19]


In [15]:
arr2d = np.arange(1,20,2)
print(arr2d)

[ 1  3  5  7  9 11 13 15 17 19]


We can specify the data type by using `dtype`.

In [16]:
arr3 = np.array(range(10), dtype='float')
print(arr3)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
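
Because a Numpy array holds a single type, mixed input is converted to one common type. The short sketch below illustrates this behaviour; the variable names `mixed` and `as_int` are only for illustration and are not part of the lesson.

```python
import numpy as np

# Mixing ints and a float: every element is upcast to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# astype() returns a new array converted to the requested type.
# Converting float to int drops the fractional part.
as_int = mixed.astype('int')
print(as_int)        # [1 2 3]
```

Note that `astype` does not change `mixed` itself; it returns a converted copy.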


### Exercise

Please generate an array of [10,20,30,40,50,60,70,80,90,100], in int type.

|Expected result|
|---|
|array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100])|

### Exercise

Please generate an array of [10,20,30,40,50,60,70,80,90,100], in float type

|Expected result|
|---|
|array([ 10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])|

## Zeros, ones, full, random

In [17]:
arr6 = np.zeros(10)
print(arr6)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [18]:
arr6b = np.zeros(10, dtype='int')
print(arr6b)

[0 0 0 0 0 0 0 0 0 0]


In [19]:
arr7 = np.ones(10, dtype='float')
print(arr7)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [20]:
arr8 = np.full(3, 3.14)
print(arr8)

[3.14 3.14 3.14]


In [21]:
arr9 = np.full( (3,5), 3.14)
print(arr9)

[[3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]]


In [22]:
arr10 = np.random.rand(100)
print(arr10)

[0.79172504 0.52889492 0.56804456 0.92559664 0.07103606 0.0871293
 0.0202184  0.83261985 0.77815675 0.87001215 0.97861834 0.79915856
 0.46147936 0.78052918 0.11827443 0.63992102 0.14335329 0.94466892
 0.52184832 0.41466194 0.26455561 0.77423369 0.45615033 0.56843395
 0.0187898  0.6176355  0.61209572 0.616934   0.94374808 0.6818203
 0.3595079  0.43703195 0.6976312  0.06022547 0.66676672 0.67063787
 0.21038256 0.1289263  0.31542835 0.36371077 0.57019677 0.43860151
 0.98837384 0.10204481 0.20887676 0.16130952 0.65310833 0.2532916
 0.46631077 0.24442559 0.15896958 0.11037514 0.65632959 0.13818295
 0.19658236 0.36872517 0.82099323 0.09710128 0.83794491 0.09609841
 0.97645947 0.4686512  0.97676109 0.60484552 0.73926358 0.03918779
 0.28280696 0.12019656 0.2961402  0.11872772 0.31798318 0.41426299
 0.0641475  0.69247212 0.56660145 0.26538949 0.52324805 0.09394051
 0.5759465  0.9292962  0.31856895 0.66741038 0.13179786 0.7163272
 0.28940609 0.18319136 0.58651293 0.02010755 0.82894003 0.00469548

In [23]:
arr10b = np.random.rand(3,3)
print(arr10b)

[[0.44712538 0.84640867 0.69947928]
 [0.29743695 0.81379782 0.39650574]
 [0.8811032  0.58127287 0.88173536]]


## random seed

In programming languages, random numbers are not truly random; we call them pseudorandom. Given the same seed, the generator always produces the same sequence of numbers.

In [24]:
np.random.seed(0)
arr11 = np.random.rand(10,1)
print(arr11)

[[0.5488135 ]
 [0.71518937]
 [0.60276338]
 [0.54488318]
 [0.4236548 ]
 [0.64589411]
 [0.43758721]
 [0.891773  ]
 [0.96366276]
 [0.38344152]]


If we keep executing the following random function, we keep getting new random numbers. However, they still follow the same sequence, which is determined by the seed.

Try re-running the previous cell that sets the seed, and we will get the same sequence again.

In [25]:
np.random.rand(10,1)

array([[0.79172504],
       [0.52889492],
       [0.56804456],
       [0.92559664],
       [0.07103606],
       [0.0871293 ],
       [0.0202184 ],
       [0.83261985],
       [0.77815675],
       [0.87001215]])
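
As a side note, newer Numpy versions also offer a generator object created with `np.random.default_rng`. This lesson does not use it, so treat the following as an optional sketch of the same idea: two generators created with the same seed produce the same sequence. The seed value 42 and the variable names are arbitrary choices, and the numbers differ from those produced with `np.random.seed`.

```python
import numpy as np

# Two independent generators created with the same seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(5)
second = rng_b.random(5)

# Same seed, same sequence.
print(np.array_equal(first, second))  # True

# Drawing again from rng_a continues the sequence with new numbers.
third = rng_a.random(5)
print(np.array_equal(first, third))   # False
```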

**Exercise**: 
Please try using the seed 540, and see if you can generate the following expected result.

In [26]:
np.random.seed(0)
arr = np.random.rand(10,1)
print(arr)

[[0.5488135 ]
 [0.71518937]
 [0.60276338]
 [0.54488318]
 [0.4236548 ]
 [0.64589411]
 [0.43758721]
 [0.891773  ]
 [0.96366276]
 [0.38344152]]


| Expected Result for seed(540) |
| --- |
| [[0.71688165] [0.50553693] [0.18142109] [0.70069925] [0.81784415] [0.28708016] [0.97490719] [0.09495503] [0.84069722] [0.06900928]] |

## Creating array with linspace

In [27]:
arr4 = np.linspace(0,10,3)
print(arr4)

[ 0.  5. 10.]


In [28]:
arr4b = np.linspace(0,100,5)
print(arr4b)

[  0.  25.  50.  75. 100.]


In [29]:
arr4c = np.linspace(0,1,4)
print(arr4c)

[0.         0.33333333 0.66666667 1.        ]
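
The examples above pass a start value, an end value and the number of points to `linspace`. The sketch below contrasts this with `arange`, which takes a step size instead. It is only an illustration; the values are chosen so the results are easy to check.

```python
import numpy as np

# linspace: give the number of points; the end value is included by default.
by_count = np.linspace(0, 10, 5)
print(by_count.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]

# arange: give the step size; the end value is excluded.
by_step = np.arange(0, 10, 2.5)
print(by_step.tolist())    # [0.0, 2.5, 5.0, 7.5]
```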


## reshaping array

In [30]:
arr5 = np.arange(1,13).reshape([3,4])
print(arr5)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [31]:
arr5.shape

(3, 4)
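
A reshape only works when the new shape has the same number of elements as the original array. As a small sketch (the names `flat` and `four_cols` are illustrative), one dimension can also be given as `-1` so that Numpy infers it from the total size.

```python
import numpy as np

flat = np.arange(1, 13)          # 12 elements

# -1 asks Numpy to infer that dimension from the total size.
four_cols = flat.reshape(-1, 4)
print(four_cols.shape)           # (3, 4)

# The element count must match, otherwise reshape raises ValueError.
try:
    flat.reshape(5, 3)           # 15 slots for 12 elements
except ValueError as err:
    print("cannot reshape:", err)
```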

### Exercise

Try to create a random array with shape (5,4)

In [32]:
np.random.seed(0) # Reset the seed, in order to re-create the same result.




|Expected result|
|---|
|array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318], [0.4236548 , 0.64589411, 0.43758721, 0.891773  ], [0.96366276, 0.38344152, 0.79172504, 0.52889492], [0.56804456, 0.92559664, 0.07103606, 0.0871293 ], [0.0202184 , 0.83261985, 0.77815675, 0.87001215]])|

### Exercise

Try to create an array of ones with shape (5,4).

|Expected result|
|---|
|array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])|

## Array Operations

In [33]:
grid = np.arange(1,10).reshape([3,3])
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


### Tile

In [34]:
grid2 = np.arange(1,4)
print(grid2)

[1 2 3]


In [35]:
grid2 = np.tile(grid2, (3,1))
print(grid2)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


In [36]:
print(grid+grid2)

[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]


In [37]:
print(grid-grid2)

[[0 0 0]
 [3 3 3]
 [6 6 6]]


In [38]:
print(grid*grid2)

[[ 1  4  9]
 [ 4 10 18]
 [ 7 16 27]]


In [39]:
print(grid/grid2)

[[1.  1.  1. ]
 [4.  2.5 2. ]
 [7.  4.  3. ]]


In [40]:
print(grid//grid2)

[[1 1 1]
 [4 2 2]
 [7 4 3]]


In [41]:
print(grid ** grid2)

[[  1   4  27]
 [  4  25 216]
 [  7  64 729]]


## Array broadcast

In [42]:
grid = np.arange(1,10).reshape([3,3])
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [43]:
grid.shape

(3, 3)

In [44]:
print(grid + 3)

[[ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [45]:
print(grid*3)

[[ 3  6  9]
 [12 15 18]
 [21 24 27]]


In [46]:
print(grid/10)

[[0.1 0.2 0.3]
 [0.4 0.5 0.6]
 [0.7 0.8 0.9]]


In [47]:
print(grid/3)

[[0.33333333 0.66666667 1.        ]
 [1.33333333 1.66666667 2.        ]
 [2.33333333 2.66666667 3.        ]]


In [48]:
print(grid//3)

[[0 0 1]
 [1 1 2]
 [2 2 3]]


In [49]:
print(grid+1)

[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]


In [50]:
grid2 = np.arange(1,4)
print(grid2)

[1 2 3]


In [51]:
grid2.shape

(3,)

In [52]:
grid = np.arange(1,10).reshape([3,3])
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [53]:
print(grid+grid2)

[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]


In [54]:
print(grid ** 2)

[[ 1  4  9]
 [16 25 36]
 [49 64 81]]


In [55]:
print(grid % 5)

[[1 2 3]
 [4 0 1]
 [2 3 4]]
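
When the shape-(3,) array is added to the 3x3 grid above, Numpy broadcasts it across every row, so there is no need to `tile` it first. The following sketch checks that both approaches agree and shows what happens with a column instead of a row; the names `row`, `tiled` and `col` are illustrative.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
row = np.arange(1, 4)              # shape (3,)

# Broadcasting stretches `row` across every row of `grid`,
# which gives the same result as tiling it by hand.
tiled = np.tile(row, (3, 1))
print(np.array_equal(grid + row, grid + tiled))   # True

# A column of shape (3, 1) is stretched across the columns instead.
col = row.reshape(3, 1)
print(grid + col)
# [[ 2  3  4]
#  [ 6  7  8]
#  [10 11 12]]
```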


## Querying the array

In [56]:
arr = np.random.random(10000)
print(arr)

[0.5488135  0.71518937 0.60276338 ... 0.75842952 0.02378743 0.81357508]


In [57]:
print(np.sum(arr))

4964.588916200894


In [58]:
print(np.max(arr))

0.9999779517807228


In [59]:
print(np.min(arr))

7.2449638492178e-05


In [60]:
print(np.mean(arr))

0.49645889162008944


In [61]:
print(np.median(arr))

0.49350103035904186


In [62]:
print(len(arr[arr<0.2]))

2060


In [63]:
print(len(arr[(arr>0.2) & (arr<0.3)]))

995
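
The expressions `arr<0.2` and `(arr>0.2) & (arr<0.3)` used above produce boolean arrays, often called masks. Here is a minimal sketch on a small hand-made array (the values are arbitrary) showing what a mask looks like and how it selects elements. When combining conditions with `&`, each condition needs its own parentheses.

```python
import numpy as np

values = np.array([0.05, 0.25, 0.5, 0.15, 0.9])

# Comparing an array with a scalar yields a boolean array (a mask).
mask = values < 0.2
print(mask)                    # [ True False False  True False]

# Indexing with the mask keeps only the elements where it is True.
print(values[mask])            # [0.05 0.15]

# Counting the True entries avoids building the filtered array.
print(np.count_nonzero(mask))  # 2
```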


**Exercise**: Given the following numpy array, please find all the records with negative values.

In [64]:
arr = np.array((-3, 10, 20, -5, -2, 50, 34, -12, 10))

In [65]:
arr

array([ -3,  10,  20,  -5,  -2,  50,  34, -12,  10])

|Expected Result|
|---|
|array([ -3,  -5,  -2, -12])|

## Array Slicing

**Slicing a NumPy array returns a view, NOT a copy.**

In [66]:
# [i, j]
# [i, :]
# [:, j]
# [i_start:i_end, j_start:j_end]

In [67]:
grid = np.arange(1,13).reshape([3,4])
print(grid)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [68]:
print(grid[0,:])

[1 2 3 4]


In [69]:
print(grid[:,0])

[1 5 9]


In [70]:
print(grid[:,1:3])

[[ 2  3]
 [ 6  7]
 [10 11]]


In [71]:
grid2 = grid[:,:]

Let's see if the slicing is a copy or not:

In [72]:
grid[0,0] = 100

print(grid)

print(grid2)

[[100   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
[[100   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]


In [73]:
grid[:,1:3] = 99

print(grid)

print(grid2)

[[100  99  99   4]
 [  5  99  99   8]
 [  9  99  99  12]]
[[100  99  99   4]
 [  5  99  99   8]
 [  9  99  99  12]]
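
If an independent copy is needed, one option is to call `.copy()` on the slice. The sketch below is an illustration with made-up variable names; it also uses `np.shares_memory` to check whether two arrays share the same underlying data.

```python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)

view = grid[:, 1:3]            # a view: shares memory with grid
copied = grid[:, 1:3].copy()   # an independent copy

grid[0, 1] = 99                # modify the original

print(view[0, 0])              # 99, the view sees the change
print(copied[0, 0])            # 2, the copy does not
print(np.shares_memory(grid, view))    # True
print(np.shares_memory(grid, copied))  # False
```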


## Reading CSV

In [74]:
data = np.genfromtxt('visitors.csv',delimiter=',', dtype='datetime64[D],uint8', skip_header=0, names=('date','visitors'))
print(data)

[('2018-12-18',  22) ('2018-12-17',   0) ('2018-12-16',   4)
 ('2018-12-15', 218) ('2018-12-14',  11) ('2018-12-13',  11)
 ('2018-12-12',  14) ('2018-12-11',   4) ('2018-12-10',   5)
 ('2018-12-09',  15) ('2018-12-08', 104) ('2018-12-07',  19)
 ('2018-12-06',   8) ('2018-12-05',   3) ('2018-12-04',  24)
 ('2018-12-03',  66) ('2018-12-02',  40) ('2018-12-01',  69)
 ('2018-11-30',   8) ('2018-11-29',  13) ('2018-11-28',  10)
 ('2018-11-27',  18) ('2018-11-26',  72) ('2018-11-25',  31)
 ('2018-11-24', 146) ('2018-11-23',  42) ('2018-11-22',  56)
 ('2018-11-21',  19) ('2018-11-20',  76) ('2018-11-19',  11)
 ('2018-11-18',   0) ('2018-11-17',   0) ('2018-11-16',   6)
 ('2018-11-15',   7) ('2018-11-14',  32) ('2018-11-13', 102)
 ('2018-11-12', 198) ('2018-11-11',  22) ('2018-11-10',  82)
 ('2018-11-09', 213) ('2018-11-08',  52) ('2018-11-07',  13)
 ('2018-11-06',   0) ('2018-11-05',   6) ('2018-11-04',   0)
 ('2018-11-03',   7) ('2018-11-02',  25) ('2018-11-01',  29)
 ('2018-10-31',   9) ('2

In [75]:
data['date']

array(['2018-12-18', '2018-12-17', '2018-12-16', '2018-12-15',
       '2018-12-14', '2018-12-13', '2018-12-12', '2018-12-11',
       '2018-12-10', '2018-12-09', '2018-12-08', '2018-12-07',
       '2018-12-06', '2018-12-05', '2018-12-04', '2018-12-03',
       '2018-12-02', '2018-12-01', '2018-11-30', '2018-11-29',
       '2018-11-28', '2018-11-27', '2018-11-26', '2018-11-25',
       '2018-11-24', '2018-11-23', '2018-11-22', '2018-11-21',
       '2018-11-20', '2018-11-19', '2018-11-18', '2018-11-17',
       '2018-11-16', '2018-11-15', '2018-11-14', '2018-11-13',
       '2018-11-12', '2018-11-11', '2018-11-10', '2018-11-09',
       '2018-11-08', '2018-11-07', '2018-11-06', '2018-11-05',
       '2018-11-04', '2018-11-03', '2018-11-02', '2018-11-01',
       '2018-10-31', '2018-10-30', '2018-10-29', '2018-10-28'],
      dtype='datetime64[D]')

In [76]:
data['visitors']

array([ 22,   0,   4, 218,  11,  11,  14,   4,   5,  15, 104,  19,   8,
         3,  24,  66,  40,  69,   8,  13,  10,  18,  72,  31, 146,  42,
        56,  19,  76,  11,   0,   0,   6,   7,  32, 102, 198,  22,  82,
       213,  52,  13,   0,   6,   0,   7,  25,  29,   9,  14,   4,   4],
      dtype=uint8)
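
The loaded `data` is a structured array: each record has named fields, and each field has its own type. To explore this without the CSV file, the sketch below builds a tiny structured array by hand from three rows of the output above. The names `sample` and `by_date` are illustrative.

```python
import numpy as np

# Three sample rows copied from the output above, built by hand
# so the example does not depend on visitors.csv.
sample = np.array(
    [('2018-12-18', 22), ('2018-12-17', 0), ('2018-12-16', 4)],
    dtype=[('date', 'datetime64[D]'), ('visitors', 'uint8')],
)

# Each field is an ordinary array with its own dtype.
print(sample['visitors'])        # [22  0  4]

# np.sort with `order` sorts whole records by the named field.
by_date = np.sort(sample, order='date')
print(by_date['date'][0])        # 2018-12-16
```

Keep in mind that `uint8` can only store values from 0 to 255, so it fits this data set but would not fit larger visitor counts.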

### Exercise

What is the shape of the loaded CSV `data`?

|Expected result|
|---|
|(52,)|

What are the last 3 records in the data?

|Expected result|
|---|
|array([('2018-10-30', 14), ('2018-10-29',  4), ('2018-10-28',  4)], dtype=[('date', '<M8[D]'), ('visitors', 'u1')])|

What is the maximum visitor count for a single day?

|Expected result|
|---|
|218|

What is the minimum visitor count for a single day?

In [77]:
np.min(data['visitors'])

0

|Expected result|
|---|
|0|

If we exclude the days with 0 visitors, what is the minimum number of visitors in a day?

First, try to create an array of visitors that excludes all 0 data.

|Expected result|
|---|
|array([ 22,   4, 218,  11,  11,  14,   4,   5,  15, 104,  19,   8,   3,  24,  66,  40,  69,   8,  13,  10,  18,  72,  31, 146,  42,  56,  19,  76,  11,   6,   7,  32, 102, 198,  22,  82, 213,  52,  13,   6,   7,  25,  29,   9,  14,   4,   4], dtype=uint8)|

Next, we find the minimum value.

|Expected result|
|---|
|3|

## Extra: Dot Product

In [78]:
v1 = [2,3]
v2 = [5,3]
np.dot(v1, v2)

19

\begin{gather}
\begin{pmatrix}
2 & -1 \\
0 & 3 \\
1 & 0
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 4 & -1\\
-2 & 0 & 0 & 2
\end{pmatrix}
\end{gather}

In [79]:
A=[[2,-1],
   [0,3],
   [1,0]]
B=[[0,1,4,-1],
   [-2,0,0,2]]

C = np.dot(A, B)
print(C)

[[ 2  2  8 -4]
 [-6  0  0  6]
 [ 0  1  4 -1]]
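
For 2-D arrays, the `@` operator computes the same matrix product as `np.dot`. The following sketch repeats the example above with `@`. Note that `@` needs Numpy arrays, so the nested lists are wrapped in `np.array` first.

```python
import numpy as np

A = np.array([[2, -1],
              [0, 3],
              [1, 0]])
B = np.array([[0, 1, 4, -1],
              [-2, 0, 0, 2]])

# For 2-D arrays, @ performs the same matrix product as np.dot.
print(np.array_equal(A @ B, np.dot(A, B)))   # True

# Shapes must be compatible: (3, 2) @ (2, 4) gives (3, 4).
print((A @ B).shape)                         # (3, 4)
```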


For instance, we can calculate the angle (in degrees) between two vectors by using the dot product and the norm.

$ a \cdot b=|a||b|\cos(\theta) $

In [80]:
def theta(v1, v2):
    dot_product = np.dot(v1,v2)
    norms = np.linalg.norm(v1)*np.linalg.norm(v2)
    rad = np.arccos(dot_product/norms)
    return np.rad2deg(rad)

In [81]:
v1 = [0,1]
v2 = [1,0]

theta(v1,v2)

90.0

In [82]:
v1 = [1,4,5]
v2 = [2,1,5]
v3 = [3,5,6]

Which two vectors are more "similar" to each other?

In [83]:
theta(v1,v2)

29.152519407030084

In [84]:
theta(v1,v3)

12.186074922100465
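
A smaller angle means the two vectors point in a more similar direction, so `v1` is closer to `v3` than to `v2`. One practical caveat: for (nearly) parallel vectors, floating-point rounding can push the cosine slightly outside the range [-1, 1], and `np.arccos` then returns `nan`. The sketch below is a hypothetical variant named `theta_safe` (not part of the lesson) that clips the cosine before calling `arccos`.

```python
import numpy as np

def theta_safe(v1, v2):
    """Angle in degrees between v1 and v2, with the cosine clipped to [-1, 1]."""
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Rounding can push the cosine slightly outside [-1, 1],
    # where arccos would return nan.
    cosine = np.clip(cosine, -1.0, 1.0)
    return np.rad2deg(np.arccos(cosine))

print(theta_safe([0, 1], [1, 0]))   # 90.0
```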

## Extra: Solving linear equations with Numpy

For instance, we can use Numpy to solve linear equations.

$ x+2y=7 $  
$ 3x+4y=15 $

We can express the equations in matrix form.

\begin{gather}
\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
7\\
15
\end{pmatrix}
\end{gather}

$ Ar=s $  
$ r = A^{-1}s $

In [85]:
A = [[1, 2],
     [3, 4]]
s = [7, 15]
Ainv = np.linalg.inv(A)
Ainv

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [86]:
np.dot(Ainv, s)

array([1., 3.])

Let's try to solve the equations directly by using `np.linalg.solve`.

In [87]:
np.linalg.solve(A, s)

array([1., 3.])
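
A quick way to check a solution is to substitute it back into the equations. The sketch below does this for the 2-variable system above, and also shows what happens when the matrix is singular, in which case there is no unique solution. The matrix named `singular` is a made-up example.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
s = np.array([7, 15])

r = np.linalg.solve(A, s)

# Substitute the solution back: A @ r should reproduce s.
print(np.allclose(A @ r, s))        # True

# A singular matrix (zero determinant) has no unique solution,
# and np.linalg.solve raises LinAlgError for it.
singular = np.array([[1, 2],
                     [2, 4]])
try:
    np.linalg.solve(singular, s)
except np.linalg.LinAlgError as err:
    print("no unique solution:", err)
```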

Another example: solving a system of equations with 3 variables by using Numpy.

$ 2x+y+3z=20 $  
$ x+2y+4z=21 $  
$ x+y+2z=13 $

We can express the equations in matrix form.

\begin{gather}
\begin{pmatrix}
2 & 1 & 3 \\
1 & 2 & 4 \\
1 & 1 & 2
\end{pmatrix}
\begin{pmatrix}
x\\
y\\
z
\end{pmatrix}
=
\begin{pmatrix}
20\\
21\\
13
\end{pmatrix}
\end{gather}

In [88]:
A = [[2, 1, 3],
     [1, 2, 4],
     [1, 1, 2]]

s = [20, 21, 13]

r = np.linalg.solve(A, s)
r

array([5., 4., 2.])

### Exercise

Given the following equations, please calculate the value of x, y and z

$ x+y+z=14 $  
$ 2x+y+2z=25 $  
$ 3y+z=16 $

|Expected result|
|---|
|array([4., 3., 7.])|

## Summary

In this lesson, we learned to express vectors and matrices by using Numpy. We also learned the essential operations and had a glimpse of how Numpy can help us with numerical computation.

In the next lesson, we will use Pandas to process our data into tabular data with `Series`.