# Introduction to Python for machine learning

Author: Brian Stucky

## 1. Introduction


## 2. Introducing Jupyter notebooks

 * Click in a cell to make it active.
 * Press `shift+enter` to run the code in the active cell.
 * Pressing `shift+enter` also opens a new cell below the active cell if there is not already a cell there.


## 3. Python basics

Writing literal values in Python: numbers are written as, e.g., `12` or `3.141592654`, and literal text values, called *strings*, are enclosed in single or double quotes, e.g., `'this is a string'` or `"this is a string"`.

The `print()` function writes output to the console.
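As a minimal illustration of literal values and `print()`, the cell below prints two numbers and two strings. The specific values are arbitrary examples.

```python
# Numeric literals: an integer and a floating-point number.
print(12)
print(3.141592654)

# String literals can use single or double quotes; both forms
# produce the same string.
print('this is a string')
print("this is a string")
```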

Python provides the basic arithmetic operators for working with numerical values: `+`, `-`, `*`, `/`, and `**` (exponentiation).
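A short sketch of the most common arithmetic operators follows. The operands `7` and `2` are arbitrary; note that `/` always produces a floating-point result in Python 3.

```python
print(7 + 2)   # addition
print(7 - 2)   # subtraction
print(7 * 2)   # multiplication
print(7 / 2)   # division: the result is the float 3.5
print(7 // 2)  # floor division: 3
print(7 % 2)   # remainder (modulo): 1
print(7 ** 2)  # exponentiation: 49
```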

The `=` operator is used to assign a value to a variable (and create the variable if it does not yet exist).
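For example, the sketch below creates a variable and then re-assigns it. The names `count` and `greeting` are illustrative only.

```python
count = 5            # creates the variable `count`
print(count)

count = count + 1    # re-assigns `count` using its current value
print(count)

greeting = 'hello'   # variables can hold strings, too
print(greeting)
```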

## 4. Conditional statements

Python provides an `if` statement that can be used to make a decision. `if` statements are often used with the comparison operators: `>` (greater than), `<` (less than), `>=` (greater than or equal to), `<=` (less than or equal to), `==` (equal to), and `!=` (not equal to).
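A minimal sketch of an `if` statement follows. The variable `temperature` and its value are hypothetical; the indented line runs only when the comparison is `True`.

```python
temperature = 31          # hypothetical example value
message = 'no message'

if temperature > 30:
    # This line runs only when the test above is True.
    message = 'It is hot today.'

print(message)
```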

If we'd like to also do something when the test is `False`, we can add an `else` clause to the `if` statement.
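One way this might look, using an arbitrary example value for `number`:

```python
number = 7  # arbitrary example value

if number % 2 == 0:
    parity = 'even'   # runs when the test is True
else:
    parity = 'odd'    # runs when the test is False

print(parity)
```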

### Exercise

Given a variable `someval` that can have any real number value, write code that ensures `someval` is in the range -10 to 10, inclusive, by replacing any value outside of that range with the nearest end of the range. E.g., if the starting value of `someval` is -23, the ending value of `someval` would be -10.

## 5. Lists and loops

A Python _list_ allows us to group multiple values together in a single data structure. We can define a list using brackets, `[` and `]`.

Elements of a list are accessed using *subscript notation*. The first element of a list is at index 0, the next is at index 1, and so on.
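The sketch below defines a list and reads and replaces elements by index. The list contents are arbitrary examples.

```python
colors = ['red', 'green', 'blue']

first = colors[0]     # index 0 is the first element
second = colors[1]    # index 1 is the second element
print(first, second)

# Subscript notation can also be used to replace an element.
colors[2] = 'yellow'
print(colors)
```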

Python's `for` loop provides a convenient way to sequentially access every item in a list.

The indented part of a `for` loop is called the loop's *body*, and it can contain multiple lines of code.
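As an illustration, the loop below has a two-line body that runs once for each item. The list `prices` and its values are made up for this example.

```python
prices = [1.5, 2.25, 4.0]  # made-up example values
total = 0

for price in prices:
    # Both indented lines belong to the loop body.
    total = total + price
    print('Running total:', total)

# This line is not indented, so it runs once, after the loop ends.
print('Final total:', total)
```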

The `len()` function returns the number of items in a list.
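A brief sketch, using a hypothetical list called `names`. Because indexing starts at 0, the last item is at index `len(names) - 1`.

```python
names = ['Ada', 'Grace', 'Alan']  # hypothetical example list

n = len(names)
print(n)

# The last valid index is one less than the length.
last = names[n - 1]
print(last)
```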

### Exercise

Given a non-empty list of non-negative numbers, called `num_list`, write code that uses a `for` loop to find the largest item in the list.

## 6. Working with Python packages, modules, and functions

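Python code is often organized into units called _packages_ and _modules_. A module is a single file of Python code, and a package is a collection of related modules. Below, the informal term _library_ refers to either.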

Use the `import` statement to tell Python that you want to load a library. Once a library is loaded, the dot operator, `.`, lets you access the objects contained in the library.
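For instance, the standard `math` library can be loaded and used as sketched here. The choice of `math.pi` and `math.factorial()` is only for illustration.

```python
import math

# The dot operator reaches objects inside the library.
print(math.pi)            # a constant defined in the math library
print(math.factorial(5))  # a function defined in the math library
```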

A *function* is a unit of code that accepts zero or more *arguments*, does some computations using the argument values, and then returns the result.

The result of a function call can be assigned to a variable, just like any other value.
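A small sketch of calling a function and saving its result follows. The variable names are illustrative.

```python
import math

# Call math.floor() with one argument and keep the returned value.
floor_value = math.floor(3.7)
print(floor_value)

# The saved result can be used like any other value.
doubled = floor_value * 2
print(doubled)
```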

Functions can take any number of arguments. Multiple arguments are separated by commas, `,`.
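Two examples of functions called with two arguments each are sketched below, one from the `math` library and one built in. The argument values are arbitrary.

```python
import math

# math.pow(x, y) returns x raised to the power y as a float.
power = math.pow(2, 10)
print(power)

# The built-in round(number, ndigits) rounds to a given number of decimals.
rounded = round(3.14159, 2)
print(rounded)
```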

Python allows us to assign a shortcut name (an *alias*) to a library as part of the `import` statement, using the `as` keyword.
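A minimal sketch, where the shortcut name `m` is an arbitrary choice made for this example:

```python
import math as m

# After the import, `m` refers to the math library.
print(m.pi)

ceiling = m.ceil(2.1)
print(ceiling)
```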

Sometimes, it is convenient to be able to access an object in a library directly without typing the library name every time. Python provides an alternative `import` syntax, `from ... import ...`, that makes this easy.
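This might look like the following sketch, which imports two names from `math` so they can be used without the `math.` prefix. The names chosen are only examples.

```python
from math import pi, floor

# No library name or dot operator is needed for the imported names.
print(pi)

whole_part = floor(2.7)
print(whole_part)
```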

### Exercise

With the help of the math library, write a short Python program to find the length of the hypotenuse of a right triangle given the lengths of the other two sides, represented by the variables `a` and `b`. Use the [documentation for the math library](https://docs.python.org/3/library/math.html) as needed.

## 7. Using NumPy

Introducing the [NumPy](https://www.numpy.org/) *multidimensional array*.
 1. By convention, the shortcut name `np` is used for `numpy`.
 2. Indexing of one-dimensional NumPy arrays works just as it does for Python lists, with the first element at index 0.
 3. Arrays can generally be used in the same ways you'd use lists, e.g., in `for` loops and with `len()`.
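The points above can be sketched as follows, assuming NumPy is installed. The array contents are arbitrary example values.

```python
import numpy as np

# Create a one-dimensional array from a Python list.
values = np.array([10, 20, 30, 40])

first = values[0]   # indexing starts at 0, as with lists
n = len(values)     # len() works on arrays, too
print(first, n)

# Arrays can be used in for loops, just like lists.
for value in values:
    print(value)
```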

Arithmetic operations on arrays are performed *element-wise*.
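A short sketch of element-wise arithmetic, with arbitrary example arrays. The last lines show how this differs from a plain Python list, where `*` repeats the list instead.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = a + b     # adds matching elements: 11, 22, 33
scaled = a * 2    # multiplies every element by 2: 2, 4, 6
product = a * b   # multiplies matching elements: 10, 40, 90
print(total, scaled, product)

# By contrast, multiplying a list repeats it.
repeated = [1, 2, 3] * 2
print(repeated)   # [1, 2, 3, 1, 2, 3]
```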

NumPy also provides many common mathematical functions that can be used with arrays. Most of these operate element-wise, but some calculate a single value from the contents of an array.
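To illustrate both kinds of function, the sketch below applies `np.sqrt()` (element-wise) and `np.sum()` and `np.mean()` (single value) to an arbitrary example array.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

# Element-wise: returns a new array with one result per element.
roots = np.sqrt(values)
print(roots)

# Single value: summarizes the whole array.
total = np.sum(values)
average = np.mean(values)
print(total, average)
```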

Common statistical summary functions, such as `min()` and `mean()`, are also available as methods of the array objects themselves, which is sometimes more convenient.
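For example, with an arbitrary array of integers, the two calling styles below give the same results:

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5])

# Called on the array object itself.
smallest = values.min()
average = values.mean()
print(smallest, average)

# The equivalent calls through the numpy library.
print(np.min(values), np.mean(values))
```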

### Exercise

Consider the code below:
```
import numpy as np

arr_1 = np.array([1, 2, 3, 4, 5, 6])
arr_2 = arr_1

arr_1[2] = 2
arr_2[3] = 5
```
What will be the final value of `arr_1`? What will be the final value of `arr_2`? Run the code and check your answers. Were you surprised by the results?

## 8. Using pandas

[*Pandas*](https://pandas.pydata.org/) provides a structure called `DataFrame` for working with tabular data. We'll work with the famous [iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), which is provided in `nb-datasets/iris_dataset.csv` in [*comma-separated values*](https://en.wikipedia.org/wiki/Comma-separated_values), or *CSV*, format.
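A hedged sketch of loading CSV data with `pd.read_csv()` follows. Because the contents of `nb-datasets/iris_dataset.csv` are not shown here, the example reads a tiny stand-in CSV from an in-memory string; the column names and rows are assumptions made for illustration and may not match the real file.

```python
import io
import pandas as pd  # `pd` is the conventional shortcut name for pandas

# Stand-in CSV text; the real file's column names may differ.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
"""

# read_csv() accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

In the notebook itself, the equivalent call would pass the file path instead: `pd.read_csv('nb-datasets/iris_dataset.csv')`.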

To inspect the contents of a DataFrame, we can use its `head()` or `tail()` methods, which return the first or last few rows.


The `len()` function returns the number of rows in a DataFrame.
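These inspection tools can be sketched on a small made-up table. The DataFrame below, including its column names `length` and `label`, is a hypothetical stand-in and not the iris data.

```python
import pandas as pd

# Hypothetical stand-in table; names and values are made up.
df = pd.DataFrame({
    'length': [5.0, 4.5, 6.0, 5.5, 7.0, 6.5],
    'label': ['a', 'a', 'b', 'b', 'c', 'c'],
})

first_rows = df.head()   # the first 5 rows by default
last_rows = df.tail(2)   # the last 2 rows
n_rows = len(df)         # the number of rows

print(first_rows)
print(last_rows)
print(n_rows)
```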

DataFrames include a method called `describe()` that provides a basic statistical overview of the numeric columns of a DataFrame.
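A minimal sketch of `describe()` on a hypothetical table follows; the column names and values are made up. With the default settings and mixed column types, the text column `label` is left out of the summary.

```python
import pandas as pd

# Hypothetical stand-in table; names and values are made up.
df = pd.DataFrame({
    'length': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'width': [2.0, 2.0, 4.0, 4.0, 6.0, 6.0],
    'label': ['a', 'a', 'b', 'b', 'c', 'c'],
})

# The result is itself a DataFrame, with one row per statistic
# (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```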

We can access individual columns of a DataFrame using a special form of subscript notation that uses the column name.

Each column in a pandas DataFrame is a pandas `Series`, which behaves much like a one-dimensional NumPy array with a label attached to each element.
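Column access might look like the sketch below. The table and its column names are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in table; names and values are made up.
df = pd.DataFrame({
    'length': [5.0, 4.5, 6.0],
    'width': [3.0, 2.5, 3.5],
})

# Subscript notation with a column name returns that column.
lengths = df['length']
print(type(lengths))

# Columns support element-wise arithmetic, like NumPy arrays.
doubled = lengths * 2
print(doubled)

# to_numpy() returns the column's values as a NumPy array.
as_array = lengths.to_numpy()
print(as_array)
```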

Basic statistical summary methods are defined for DataFrames, too, and they return the summary statistic for each column.
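As a sketch on a hypothetical table (made-up names and values), `mean()` and `max()` each return one value per column. Passing `numeric_only=True` skips text columns such as `label`, for which a mean is not defined.

```python
import pandas as pd

# Hypothetical stand-in table; names and values are made up.
df = pd.DataFrame({
    'length': [5.0, 4.5, 6.0, 5.5],
    'width': [1.0, 2.0, 3.0, 4.0],
    'label': ['a', 'a', 'b', 'b'],
})

# Each result has one entry per numeric column.
column_means = df.mean(numeric_only=True)
column_maxes = df.max(numeric_only=True)
print(column_means)
print(column_maxes)
```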

## 9. Conclusion