{ "metadata": { "name": "", "signature": "sha256:a55628c0abb2e9280b1298cd8a5c88f52701f21aa005e40974533dc38547b65b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for PyCon 2014. Source and license info is on [GitHub](https://github.com/jakevdp/sklearn_pycon2014/)." ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "An Introduction to scikit-learn: Machine Learning in Python" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "http://github.com/jakevdp/sklearn_pycon2014" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Goals of this Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Introduce the basics of Machine Learning**, and some skills useful in practice.\n", "- **Introduce the syntax of scikit-learn**, so that you can make use of the rich toolset available." 
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Schedule:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Outline:\n", "\n", "**9:00 - 9:30** Preliminaries: Setup & introduction\n", "\n", "- Making sure your computer is set up\n", "- What is Machine Learning?\n", "- Quick review of NumPy and Matplotlib\n", "\n", "**9:30 - 10:15** Basic Principles of Machine Learning and the Scikit-learn Interface\n", "\n", "- Machine learning data layout\n", "- Supervised Learning\n", "  - Classification\n", "  - Regression\n", "  - Measuring performance\n", "- Unsupervised Learning\n", "  - Clustering\n", "  - Dimensionality Reduction\n", "- Evaluation of models\n", "- How to choose the right algorithm for your dataset\n", "\n", "**10:15 - 11:00** Supervised learning in depth\n", "\n", "- Two important algorithms: Support Vector Machines and Random Forests\n", "- Application: recognizing handwritten digits\n", "\n", "**11:00 - 11:45** Unsupervised learning in depth\n", "\n", "- Two important algorithms: PCA and K-Means\n", "- Application: image color compression\n", "\n", "**11:45 - 12:20** Validation and Model Selection\n", "\n", "- Overfitting, underfitting, bias, and variance\n", "- Improving your fit: validation curves and learning curves\n", "- Application: facial recognition" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Preliminaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial requires the following packages:\n", "\n", "- Python version 2.6-2.7\n", "- `numpy` version 1.5 or later: http://www.numpy.org/\n", "- `scipy` version 0.9 or later: http://www.scipy.org/\n", "- `matplotlib` version 1.0 or later: http://matplotlib.org/\n", "- `scikit-learn` version 0.12 or later: http://scikit-learn.org\n", "- `ipython` version 1.0 or later, with notebook support: http://ipython.org\n", "\n", "The easiest way to get these is to use an all-in-one installer such as 
[Anaconda](http://www.continuum.io/downloads) from Continuum. Installers are available for multiple architectures." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Anaconda Setup\n", "\n", "If you're using [Anaconda](https://store.continuum.io/cshop/anaconda/), simply type\n", "\n", "```\n", "conda install scikit-learn\n", "```\n", "\n", "Otherwise it's best to install from source (requires a C compiler):\n", "```\n", "git clone https://github.com/scikit-learn/scikit-learn.git\n", "cd scikit-learn\n", "python setup.py install\n", "```\n", "\n", "Scikit-learn requires ``NumPy`` and ``SciPy``, and examples require ``Matplotlib``.\n", "\n", "**Note**: some examples used in this tutorial require the scripts in the ``fig_code`` directory, which can be found within the ``notebooks`` subdirectory of the GitHub repository at [https://github.com/jakevdp/sklearn_pycon2014/](https://github.com/jakevdp/sklearn_pycon2014/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternatives\n", "\n", "- **Linux**: If you're on Linux, you can use your distribution's package manager (for example, by typing `apt-get install numpy` or `yum install numpy`).\n", "\n", "- **Mac**: If you're on OS X, there are similar tools such as MacPorts or Homebrew that provide pre-compiled versions of these packages.\n", "\n", "- **Windows**: Windows can be challenging: the best bet is probably to use a package installer such as Anaconda, above."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking your installation\n", "\n", "You can run the following code to check the versions of the packages on your system:\n", "\n", "(in IPython notebook, press `shift` and `return` together to execute the contents of a cell)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy\n", "print 'numpy:', numpy.__version__\n", "\n", "import scipy\n", "print 'scipy:', scipy.__version__\n", "\n", "import matplotlib\n", "print 'matplotlib:', matplotlib.__version__\n", "\n", "import sklearn\n", "print 'scikit-learn:', sklearn.__version__" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "numpy: 1.7.1\n", "scipy: 0.13.2\n", "matplotlib: 1.3.1\n", "scikit-learn: 0.14.1\n" ] } ], "prompt_number": 1 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Useful Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **scikit-learn:** http://scikit-learn.org (see especially the narrative documentation)\n", "- **matplotlib:** http://matplotlib.org (see especially the gallery section)\n", "- **IPython:** http://ipython.org (also check out http://nbviewer.ipython.org)\n", "- **astroML:** http://astroML.github.com (shameless plug: this is my project!)" ] } ], "metadata": {} } ] }