{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "What is machine learning?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section we will begin to explore the basic principles of machine learning.\n", "Machine Learning is about building programs with **tunable parameters** (typically an\n", "array of floating point values) that are adjusted automatically so as to improve\n", "their behavior by **adapting to previously seen data.**\n", "\n", "Machine Learning can be considered a subfield of **Artificial Intelligence** since those\n", "algorithms can be seen as building blocks to make computers learn to behave more\n", "intelligently by somehow **generalizing** rather that just storing and retrieving data items\n", "like a database system would do.\n", "\n", "We'll take a look at two very simple machine learning tasks here.\n", "The first is a **classification** task: the figure shows a\n", "collection of two-dimensional data, colored according to two different class\n", "labels. A classification algorithm may be used to draw a dividing boundary\n", "between the two clusters of points:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Start matplotlib inline mode, so figures will appear in the notebook\n", "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Import the example plot from the figures directory\n", "from figures import plot_sgd_separator\n", "plot_sgd_separator()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This may seem like a trivial task, but it is a simple version of a very important concept.\n", "By drawing this separating line, we have learned a model which can **generalize** to new\n", "data: if you were to drop another point onto the plane which is unlabeled, this algorithm\n", "could now **predict** whether it's a blue or a red point.\n", "\n", "If you'd like to see the source code used to generate this, you can either open the\n", "code in the `figures` directory, or you can load the code using the `%load` magic command:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%load figures/sgd_separator.py" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next simple task we'll look at is a **regression** task: a simple best-fit line\n", "to a set of data:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from figures import plot_linear_regression\n", "plot_linear_regression()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, this is an example of fitting a model to data, such that the model can make\n", "generalizations about new data. The model has been **learned** from the training\n", "data, and can be used to predict the result of test data:\n", "here, we might be given an x-value, and the model would\n", "allow us to predict the y value. Again, this might seem like a trivial problem,\n", "but it is a basic example of a type of operation that is fundamental to\n", "machine learning tasks." ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "An Overview of Scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Adapted from* [*http://scikit-learn.org/stable/tutorial/basic/tutorial.html*](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import numpy as np\n", "from matplotlib import pyplot as plt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Loading an Example Dataset" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import datasets\n", "digits = datasets.load_digits()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "digits.data" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "digits.target" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "digits.images[0]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Learning and Predicting" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import svm\n", "clf = svm.SVC(gamma=0.001, C=100.)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "clf.fit(digits.data[:-1], digits.target[:-1])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "clf.predict(digits.data[-1])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.figure(figsize=(2, 2))\n", "plt.imshow(digits.images[-1], interpolation='nearest', cmap=plt.cm.binary)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print(digits.target[-1])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }