{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Python for machine learning\n", "\n", "Author: Brian Stucky" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 1. Introduction\n", "\n", "\n", "## 2. Introducing Jupyter notebooks\n", "\n", " * Click in a cell to make it active.\n", " * Type `shift+enter` to run the code in the cell.\n", " * Typing `shift+enter` will also open a new cell below the active cell if there is not already a cell there.\n", "\n", "\n", "## 3. Python basics\n", "\n", "Writing literal values in Python: Numbers are written as, e.g., `12` or `3.141592654`, and literal text values, called *strings*, are written as, e.g., `'this is a string'` or `\"this is a string\"`.\n", "\n", "The `print()` function writes output to the console." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python provides all of the basic arithmetic operators for working with numerical values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `=` operator is used to assign a value to a variable (and create the variable if it does not yet exist)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Conditional statements\n", "\n", "Python provides an `if` statement that can be used to make a decision. If statements are often used with the comparison operators: `>` (greater than), `<` (less than), `==` (equal to), or `!=` (not equal to)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we'd like to also do something when the test is `False`, we can add an `else` clause to the `if` statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Given a variable `someval` that can have any real number value, write code that ensures `someval` is in the range -10 to 10, inclusive, by truncating values outside of that range. E.g., if the starting value of `someval` is -23, the ending value of `someval` would be -10." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Lists and loops\n", "\n", "A Python _list_ allows us to group multiple values together in a single data structure. We can define a list using brackets, `[` and `]`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elements of a list are accessed using *subscript notation*. The first element of a list is at index 0, the next is at index 1, and so on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python's `for` loop provides a convenient way to sequentially access every item in a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The indented part of a `for` loop is called the loop's *body*, and it can contain multiple lines of code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `len()` function returns the number of items in a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Given a non-empty list of non-negative numbers, called `num_list`, write code that uses a `for` loop to find the largest item in the list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Working with Python packages, modules, and functions\n", "\n", "Python code is often organized into units called _packages_ and _modules_, which we will refer to collectively as _libraries_.\n", "\n", "Use the `import` statement to tell Python that you want to load a library. Once a library is loaded, the dot operator, `.`, lets you access the objects contained in the library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *function* is a unit of code that accepts zero or more *arguments*, performs some computation using the argument values, and then returns the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result of a function call can be assigned to a variable, just like any other value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions can take any number of arguments. Arguments are separated by commas, `,`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python allows us to assign a shortcut name for a library as part of the `import` statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes, it is convenient to be able to access an object in a library directly without typing the library name every time. Python provides an alternative `import` syntax that makes this easy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "With the help of the `math` library, write a short Python program to find the length of the hypotenuse of a right triangle given the lengths of the other two sides, represented by the variables `a` and `b`. Use the [documentation for the math library](https://docs.python.org/3/library/math.html) as needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Using NumPy\n", "\n", "Introducing the [NumPy](https://www.numpy.org/) *multidimensional array*.\n", " 1. By convention, the shortcut name `np` is used for `numpy`.\n", " 2. Indexing of NumPy arrays works just as it does for Python lists, with the first element at index 0.\n", " 3. Arrays can generally be used in the same ways you'd use lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arithmetic operations on arrays are performed *element-wise*." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also provides many common mathematical functions that can be used with arrays. Most of these operate element-wise, but some calculate a single value from the contents of an array." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Common statistical summary functions, such as `min()` and `mean()`, can also be called as methods of the array objects themselves, which is sometimes more convenient." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Consider the code below:\n", "```\n", "arr_1 = np.array([1, 2, 3, 4, 5, 6])\n", "arr_2 = arr_1\n", "\n", "arr_1[2] = 2\n", "arr_2[3] = 5\n", "```\n", "What will be the final value of `arr_1`? What will be the final value of `arr_2`? Run the code and check your answers. Were you surprised by the results?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Using pandas\n", "\n", "[*Pandas*](https://pandas.pydata.org/) provides a structure called `DataFrame` for working with tabular data. We'll work with the famous [iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), which is provided in `nb-datasets/iris_dataset.csv` in [*comma-separated values*](https://en.wikipedia.org/wiki/Comma-separated_values), or *CSV*, format." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To inspect the contents of a DataFrame, we can use the `head()` or `tail()` methods.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `len()` function returns the number of rows in a DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DataFrames include a method called `describe()` that provides a basic statistical overview of a DataFrame's numerical columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access individual columns of a DataFrame using a special form of subscript notation that uses the column name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each column in a pandas DataFrame is a `Series`, a labeled one-dimensional structure that typically stores its values in a NumPy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basic statistical summary methods are defined for DataFrames, too, and they return the summary statistic for each column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. 
Conclusion" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }