{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " Numpy\n", "

\n", "\n", "

\n", "You can use the asterisk (\\*) to repeat the list multiple times.
\n", "For more details, check the Python list operations: [click me](https://www.tutorialspoint.com/python/python_lists.htm)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Abby', 'Ann', 'Cameron', 'Shubhom', 'Ethan', 'Dae Won', 'Jared']" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b_list=[\"Dae Won\", \"Jared\"]\n", "ab_list=a_list + b_list # Using the plus (+) sign I can put 2 lists together\n", "ab_list" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Dae Won', 'Jared', 'Dae Won', 'Jared', 'Dae Won', 'Jared']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b_list=b_list*3 # Using the asterisk (*) I can repeat the list.\n", "b_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "What is Numpy?\n", "

\n", "\n", "Numerical Python, or \"Numpy\" for short, is a foundational package on which many of the most common data science packages are built. Numpy provides us with multi-dimensional arrays, called ndarrays, which can be created as vectors or matrices. We can use numpy to manipulate datasets to make them easier to work with. Numpy also comes with a number of helpful statistical methods.\n", "\n", "The key features of numpy are:\n", "\n", "- **ndarrays** are fast, space-efficient n-dimensional arrays whose elements all share one data type. Many built-in ndarray methods (e.g., computing the mean) process data rapidly without explicit loops.\n", "- **Broadcasting** is the set of rules that determines how operations between arrays of different shapes are carried out.\n", "- **Vectorization** applies numeric operations to whole ndarrays at once, without explicit Python loops.\n", "- **Input/Output** simplifies reading data from and writing data to files.\n", "\n", "Additional Recommended Resources:
\n", "Numpy Documentation
\n", "Python for Data Analysis by Wes McKinney
\n", "Python Data Science Handbook by Jake VanderPlas\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Intro to ndarrays

\n", "\n", "**ndarrays** are time and space-efficient multidimensional arrays at the core of numpy. One important thing to note is that all elements in an ndarray must be of the same type. Let's get started by creating ndarrays using the numpy package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Creating and modifying Rank 1 ndarrays:\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"as\" keyword in the import statement gives the numpy package a local name, so that we can refer to it as \"np\" rather than \"numpy\" in subsequent code. In the following lines of code, we use one function and one attribute from the numpy package:\n", "\n", "**np.array([comma-separated elements here])** creates a rank 1 array (like a vector) with the elements specified between the brackets\n", "\n", "**nameOfArray.shape** is an attribute (not a method, so no parentheses) holding a tuple of integers that give the size of the array in each dimension" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "import numpy as np # bind the numpy package to the local name np\n", "\n", "arr = np.array([3, 2, 1]) # Create a rank 1 array\n", "\n", "print(type(arr)) # The type of an ndarray is \"<class 'numpy.ndarray'>\"" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3,)\n" ] } ], "source": [ "# the shape of arr\n", "print(arr.shape) " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 2 1\n" ] } ], "source": [ "# access each element in the array using its index\n", "print(arr[0], arr[1], arr[2]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ndarrays are **mutable**, which means that the contents of an array can be changed after it is created. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[100 2 1]\n" ] } ], "source": [ "arr[0] = 100 # change the first element of the array (the element at index 0)\n", "\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Creating a Rank 2 ndarray:

\n", "\n", "A rank 2 **ndarray** has two dimensions. Notice the format below of [ [row] , [row] ]. Two-dimensional arrays are great for representing matrices, which are often useful in data science. The same attributes and indexing syntax used above also apply to rank 2 arrays." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [6 5 4]]\n", "(2, 3)\n", "1 2 6\n" ] } ], "source": [ "arr2 = np.array([[1,2,3],[6,5,4]]) # Create a rank 2 array\n", "\n", "print(arr2) # print the array\n", "\n", "print(arr2.shape) # print (number of rows, number of columns)\n", "\n", "print(arr2[0, 0], arr2[0, 1], arr2[1, 0]) # print the elements at the specified indices [row, column]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Different ways to create ndarrays:\n", "

\n", "\n", "In the code below, we create several arrays with different shapes and values. Numpy has some built-in functions (listed below) which help us quickly and easily create multidimensional arrays with pre-filled values.\n", "\n", "**np.zeros((dimensions))** creates an array of zeros with the specified dimensions\n", "\n", "**np.full((dimensions), value)** creates an array with the specified dimensions where every element is the specified value\n", "\n", "**np.eye(n)** creates an n x n array where the elements on the diagonal are 1s and all other elements are 0 (an identity matrix)\n", "\n", "**np.ones(dimensions)** creates an array of ones\n", "\n", "**np.random.random(dimensions)** generates an array of random floating-point values in the half-open interval [0, 1)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0.]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# create a 3x4 array of zeros\n", "ex1 = np.zeros((3, 4)) \n", "print(ex1) " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 6. 6. 6.]\n", " [ 6. 6. 6.]]\n" ] } ], "source": [ "# create a 2x3 array filled with 6.0\n", "ex2 = np.full((2,3), 6.0) \n", "print(ex2) " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.33070061 0.42710688 0.7277987 ]\n", " [ 0.63052933 0.12942678 0.86988264]]\n" ] } ], "source": [ "# create a 2x3 array of random floating-point numbers between 0 and 1\n", "ex3 = np.random.random((2,3))\n", "print(ex3) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using Array Indexing\n", "

\n", "\n", "It's often more convenient to look only at specific sections of arrays. We can accomplish this using **array indexing**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Slice indexing (slicing):\n", "

\n", "\n", "We can use slice indexing to pull out sub-regions of ndarrays. The general syntax for this is array[start index:end index]. Note that the start index is included in the slice, while the end index is not. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 12 13 14]\n", " [21 22 23 24]\n", " [31 32 33 34]\n", " [41 42 43 44]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Rank 2 array of shape (4, 4)\n", "an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44]])\n", "print(an_array)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[13 14]\n", " [23 24]]\n" ] } ], "source": [ "array_slice = an_array[:2, 2:] # Use array slicing to get a subarray of the first 2 rows x the last 2 columns\n", "print(array_slice)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you modify a slice, you actually modify the underlying array. This is because array slicing does not create a new array; instead, it creates a view onto the selected section of the original array's data. Also, note that the element at a given index in a slice usually does not correspond to the element at that index in the original array, since a slice is only a section of the original array. For example:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initial element at 0, 2: 13\n", "After modification: 55\n" ] } ], "source": [ "print('Initial element at 0, 2:', an_array[0, 2]) # print the element at 0, 2\n", "array_slice[0, 0] = 55 # array_slice[0, 0] is the same piece of data as an_array[0, 2]\n", "print('After modification:', an_array[0, 2]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using integer indexing & slice indexing\n", "

\n", "\n", "Integer indexing, as the name implies, simply selects the elements of an array at the specified indices. We can combine integer indexing and slice indexing to create arrays of different shapes." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 12 13 14]\n", " [21 22 23 24]\n", " [31 32 33 34]\n", " [41 42 43 44]]\n" ] } ], "source": [ "# create the same 4x4 array as above\n", "an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44]])\n", "print(an_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When slicing, [:] with no start or end indices selects all the elements in that row or column. In the following example, the combination of integer and slice indexing selects all elements in the third row (index 2) of the original array. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[31 32 33 34]\n" ] } ], "source": [ "# Using integer indexing with slicing generates an array of lower rank\n", "row_rank1 = an_array[2, :]\n", "\n", "print(row_rank1) # notice the []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that when you select the same row using slicing alone, the resulting subarray has the same rank as the original array, even though one of its dimensions has length 1."
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[31 32 33 34]]\n" ] } ], "source": [ "# Using slicing alone generates an array of the same rank as the original array\n", "row_rank2 = an_array[2:3, :]\n", "\n", "print(row_rank2) # Notice the [[ ]]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[14 24 34 44]\n", "\n", "[[14]\n", " [24]\n", " [34]\n", " [44]]\n" ] } ], "source": [ "# We see the same thing when we work with the columns instead of the rows of the array:\n", "col_rank1 = an_array[:, 3] # integer index: rank 1 result (single brackets)\n", "col_rank2 = an_array[:, 3:4] # slice: rank 2 result (double brackets)\n", "\n", "print(col_rank1) # single brackets\n", "print()\n", "print(col_rank2) # double brackets\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using Array Indexing to change elements\n", "

\n", "Using Boolean Indexing\n", "\n", "

\n", "\n", "We can also use boolean indexing to filter out elements of an array based on whether or not they fulfill some condition. This is very useful when we only want to look at a specific portion of a dataset, ex. where some entries have a certain characteristic we want to explore. \n", "\n", "

\n", "\n", "

\n", "\n", "Array Operations\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Arithmetic Array Operations:\n", "\n", "

\n", "\n", "Intro to Statistical Methods, Sorting, and Set Operations\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Getting Started with Statistical Operations\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many useful statistical operations for numpy arrays, some of which are:\n", "\n", "**array.mean()** computes the mean of all elements in an array. **array.mean(axis = 1)** returns an array containing the mean of each row, while **array.mean(axis = 0)** returns an array containing the mean of each column.\n", "\n", "**array.sum()** returns the sum of all of the elements in the array.\n", "\n", "**np.median(array, axis)** computes the median of the elements in an array. As with the mean, the axis argument specifies whether the medians are computed by row (axis = 1) or by column (axis = 0); without it, the median of all elements is returned. \n", "\n", "There are many other statistical methods out there; check out the numpy reference below if you need a function that isn't listed here or if you're looking for more detailed information about the functions above. Numpy Reference
" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-14.23672556 -0.78790432 4.5445005 ]\n", " [ 8.69446704 -0.5144282 3.5245827 ]\n", " [ -2.45896255 1.9621798 -26.3136989 ]]\n", "\n", " -2.84288772214\n", "\n", " [-3.49337646 3.90154051 -8.93682722]\n", "\n", " [-2.66707369 0.21994909 -6.08153857]\n", "\n", " -25.5859894993\n", "\n", " -0.514428200974\n", "\n", " [-0.78790432 3.5245827 -2.45896255]\n" ] } ], "source": [ "# setup a random 3x3 matrix\n", "arr = 10 * np.random.randn(3,3)\n", "print(arr)\n", "\n", "# compute the mean for all elements in the array\n", "print('\\n',arr.mean())\n", "\n", "# set the axis value to 1 compute the means for each row\n", "print('\\n',arr.mean(axis = 1))\n", "\n", "# set the axis value to 0 compute the means for each column\n", "print('\\n',arr.mean(axis = 0))\n", "\n", "# sum all the elements in the array\n", "print('\\n',arr.sum())\n", "\n", "# compute the median for all elements in the array\n", "print('\\n',np.median(arr))\n", "\n", "# compute the medians for each row\n", "print('\\n',np.median(arr, axis = 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using the Unique method\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The NumPy method **unique** is very useful in data science. It allows us to pull out only the values that are unique in an array. Note that in the following example, the array has a number of duplicate 8s, 12s, and 13s. The output after calling **unique** on it is just 8, 12, and 13." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 8 12 13]\n" ] } ], "source": [ "an_array = np.array([8,12,13,13,12,8,13,12])\n", "\n", "print(np.unique(an_array))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Set Operations on ndarrays\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We can use set routines in numpy to perform operations on and compare two arrays. In the code below, we use the following set methods:\n", "\n", "**np.intersect1d(array1, array2)** returns an array with the values that array1 and array2 both have.\n", "\n", "**np.union1d(array1, array2)** returns an array with all the unique values from both array1 and array2.\n", "\n", "**np.setdiff1d(array1, array2)** returns an array with elements in array1 that are not in array2.\n", "\n", "**np.in1d(array1, array2)** returns a boolean array of whether each element of array1 is also present in array2.\n", "\n", "\\*Note that the two arrays can have different numbers of elements, but must both be rank 1 arrays.\n" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['dog' 'cat' 'bird' 'turtle'] ['cat' 'bird' 'horse']\n" ] } ], "source": [ "ar1 = np.array(['dog','cat','bird','turtle'])\n", "ar2 = np.array(['cat','bird','horse'])\n", "print(ar1, ar2)" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['bird' 'cat']\n" ] } ], "source": [ "print( np.intersect1d(ar1, ar2) ) " ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['bird' 'cat' 'dog' 'horse' 'turtle']\n" ] } ], "source": [ "print( np.union1d(ar1, ar2) )" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['dog' 'turtle']\n" ] } ], "source": [ "print( np.setdiff1d(ar1, ar2) )" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True True False]\n" ] } ], "source": [ "print( np.in1d(ar1, ar2) )" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "

\n", "\n", "Intro to Broadcasting:\n", "

\n", "

\n", "\n", "Other Common ndarray Operations\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, you'll find some other useful functions for ndarrays. There are a myriad of these, so we encourage you to go through these and explore the numpy documentation linked below.\n", "\n", "Numpy Documentation
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Dot Product and Inner Product\n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**array1.dot(array2)** or **np.dot(array1, array2)** returns the dot product of two arrays.\n", "\n", "\\*Note that if the two arrays are 2D (matrices), **dot** performs matrix multiplication, and if they are 1D (vectors), it returns their inner product, which is a single number. For a 2D array and a 1D array, it returns the matrix-vector product." ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4 4]\n", " [4 4]]\n", "\n", "[[4 4]\n", " [4 4]]\n" ] } ], "source": [ "# compute the matrix product of two 2D arrays\n", "arr1_2d = np.array([[2,2],[2,2]])\n", "arr2_2d = np.array([[1,1],[1,1]])\n", "\n", "print(arr1_2d.dot(arr2_2d))\n", "print()\n", "print(np.dot(arr1_2d, arr2_2d))" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "180\n", "\n", "180\n" ] } ], "source": [ "# compute the inner product of two vectors\n", "arr1_1d = np.array([9 , 9 ])\n", "arr2_1d = np.array([10, 10])\n", "\n", "print(arr1_1d.dot(arr2_1d))\n", "print()\n", "print(np.dot(arr1_1d, arr2_1d))" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[36 36]\n", "\n", "[36 36]\n" ] } ], "source": [ "# matrix-vector product of a 2D array and a 1D array\n", "print(arr1_2d.dot(arr1_1d))\n", "print()\n", "print(np.dot(arr1_2d, arr1_1d))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using sum():\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following code, we explore the various uses of the **sum()** method." ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "70\n" ] } ], "source": [ "# sum elements in the array\n", "arr1 = np.array([[10,15],[20,25]])\n", "\n", "print(np.sum(arr1)) # sum of all elements" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[30 40]\n" ] } ], "source": [ "print(np.sum(arr1, axis=0)) # sum of elements in each column" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[25 45]\n" ] } ], "source": [ "print(np.sum(arr1, axis=1)) # sum of elements in each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Element-wise Functions:

\n", "\n", "**np.maximum(array1, array2)** compares two arrays and returns a new array containing the element-wise maxima. For more element-wise functions, see the numpy documentation." ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.2464078 1.08978049]\n", " [-2.32407039 -1.26413564]\n", " [ 0.65640993 -0.08689985]]\n" ] } ], "source": [ "# create a random array\n", "a = np.random.randn(3,2)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.23497313 -0.65355001]\n", " [-1.44739306 -0.24026659]\n", " [ 0.50512202 -1.92570741]]\n" ] } ], "source": [ "# create another random array\n", "b = np.random.randn(3,2)\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.2464078 1.08978049]\n", " [-1.44739306 -0.24026659]\n", " [ 0.65640993 -0.08689985]]\n" ] } ], "source": [ "# return the element wise maxima between two arrays\n", "\n", "print(np.maximum(a, b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Reshaping arrays:\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**array.reshape(dimensions)** gives a new shape to an array without changing its data." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]\n" ] } ], "source": [ "# put values 0 through 14 in an array\n", "arr = np.arange(15)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2]\n", " [ 3 4 5]\n", " [ 6 7 8]\n", " [ 9 10 11]\n", " [12 13 14]]\n" ] } ], "source": [ "# reshape to be a 5 x 3 matrix\n", "new_arr = arr.reshape(5,3)\n", "print(new_arr)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2 3 4]\n", " [ 5 6 7 8 9]\n", " [10 11 12 13 14]]\n" ] } ], "source": [ "new_arr2=new_arr.reshape(3,5)\n", "print(new_arr2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using transpose():\n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**np.transpose(array)** returns the transpose of an array with its dimensions permuted." ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 21]\n", " [12 22]]\n" ] } ], "source": [ "# transpose\n", "arr = np.array([[11,12],[21,22]])\n", "\n", "new_arr1 = np.transpose(arr)\n", "print(new_arr1)" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 21]\n", " [12 22]]\n" ] } ], "source": [ "# another way to call the method\n", "new_arr2 = arr.T\n", "print(new_arr2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Indexing using where():

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**np.where(condition, array1, array2)** returns elements, either from array1 or array2, depending on the condition. The output array contains elements of array1 where the condition is True, and elements from array2 elsewhere." ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "collapsed": true }, "outputs": [], "source": [ "array1 = np.array([1,2,3,4,5])\n", "\n", "array2 = np.array([10,20,30,40,50])\n", "\n", "filter = np.array([True, False, True, False, True])" ] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1 20 3 40 5]\n" ] } ], "source": [ "out = np.where(filter, array1, array2)\n", "print(out)" ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.19950743 0.02384889 0.61389011]\n", " [ 0.17116865 0.7800457 0.19325871]\n", " [ 0.35606818 0.16259557 0.06632833]]\n" ] } ], "source": [ "ran_arr = np.random.rand(3,3)\n", "print(ran_arr)" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ -1 -1 1000]\n", " [ -1 1000 -1]\n", " [ -1 -1 -1]]\n" ] } ], "source": [ "new_arr = np.where( ran_arr > 0.5, 1000, -1)\n", "print(new_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Using any() and all()

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**np.any()** tests whether any element in an array evaluates to True.\n", "\n", "**np.all()** tests whether all elements in an array evaluate to True." ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_bools = np.array([ True, False, True, True, False ])" ] }, { "cell_type": "code", "execution_count": 177, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_bools.any()" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_bools.all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Random Number Generation:\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**np.random.normal(mean, standard deviation, dimensions)** draws random samples from a normal (Gaussian) distribution using information provided in parameters.\n", "\n", "**np.random.randint(low, high, dimensions)** returns an array with specified dimensions of random integers from low (inclusive) to high (exclusive).\n", "\n", "**np.random.permutation(array)** returns a new array with original array elements shuffled randomly.\n", "\n", "**np.random.uniform(low, high, dimensions)** draws samples from a uniform distribution using information provided in parameters." ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-2.71998444 -0.09117765 -0.42421087 2.65954844]\n" ] } ], "source": [ "arr1 = np.random.normal(size = (3,4))[0]\n", "print(arr1)" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[18 28 13 6 25]\n" ] } ], "source": [ "arr2 = np.random.randint(low=3,high=30,size=5)\n", "print(arr2)" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([28, 6, 25, 13, 18])" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.permutation(arr2) # reorder elements in arr2" ] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.31604841, 0.98754604, 0.73381728])" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.uniform(size=3) # uniform distribution" ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.9300743 , 1.11163927, 0.90843176])" ] }, "execution_count": 184, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"np.random.normal(size=3) # normal distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "Merging two data sets:\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**np.vstack((array1, array2))** takes a sequence of arrays and stacks them vertically to make a single array.\n", "\n", "**np.hstack((array1, array2))** takes a sequence of arrays and stacks them horizontally to make a single array.\n", "\n", "**np.concatenate((array1, array2), axis)** joins a sequence of arrays along the specified axis." ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 7 6 17]\n", " [ 7 21 9]\n", " [ 8 21 7]]\n", "\n", "[[ 6 19 13]\n", " [ 9 13 28]\n", " [28 7 24]]\n" ] } ], "source": [ "arr1 = np.random.randint(low=5,high=30,size=(3,3))\n", "print(arr1)\n", "\n", "print()\n", "arr2 = np.random.randint(low=5,high=30,size=(3,3))\n", "print(arr2)" ] }, { "cell_type": "code", "execution_count": 187, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 7 6 17]\n", " [ 7 21 9]\n", " [ 8 21 7]\n", " [ 6 19 13]\n", " [ 9 13 28]\n", " [28 7 24]]\n" ] } ], "source": [ "varr = np.vstack((arr1,arr2))\n", "print(varr)" ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 7 6 17 6 19 13]\n", " [ 7 21 9 9 13 28]\n", " [ 8 21 7 28 7 24]]\n" ] } ], "source": [ "harr = np.hstack((arr1,arr2))\n", "print(harr)" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 6, 17],\n", " [ 7, 21, 9],\n", " [ 8, 21, 7],\n", " [ 6, 19, 13],\n", " [ 9, 13, 28],\n", " [28, 7, 24]])" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([arr1, arr2], axis = 0)" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 6, 17, 6, 9, 28],\n", " [ 7, 21, 9, 19, 13, 7],\n", " [ 8, 21, 7, 13, 28, 24]])" ] }, "execution_count": 
190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([arr1, arr2.T], axis = 1)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }