{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Machine Learning Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Preliminaries\n",
    "- Goal\n",
    "    - Top-level overview of machine learning\n",
    "- Materials\n",
    "    - Study Bishop pp. 1-4\n",
    "    - Study this notebook "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### What is Machine Learning?\n",
    "- Machine Learning relates to **building models from data**.\n",
    "  - Suppose we want to make a model for a complex process about which we have little knowledge (so hand-programming is not possible).\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "  - **Solution**: Get the computer to program itself by showing it examples of the behavior that we want.\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "  - Practically, we choose a library of models, and write a program that picks a model and tunes it to fit the data.\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "  - **Criterion**: a _good_ model generalizes well to unseen data from the same process.\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- This method is known in various scientific communities under different names such as machine learning, statistical inference, system identification, data mining, source coding, data compression, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Machine learning and the scientific inquiry loop.\n",
    "\n",
    "<img src=\"./figures/ml-and-scientific-loop.png\" width=\"450px\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Machine Learning is Difficult\n",
    "\n",
    "- Modeling (Learning) Problems\n",
    "  - Is there any regularity in the data anyway?\n",
    "  - What is our prior knowledge and how to express it mathematically?\n",
    "  - How to pick the model library?\n",
    "  - How to tune the models to the data?\n",
    "  - How to measure the generalization performance?\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- Quality of Observed Data\n",
    "  - Not enough data\n",
    "  - Too much data?\n",
    "  - Available data may be messy (measurement noise, missing data points, outliers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### A Machine Learning Taxonomy\n",
    "\n",
    "- **Supervised Learning**: Given examples of inputs and corresponding\n",
    "desired outputs, predict outputs on future inputs.\n",
    "  - Examples: classification, regression, time series prediction\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- **Unsupervised Learning**: (a.k.a. **density estimation**). Given only inputs, automatically discover representations, features, structure, etc.\n",
    "  - Examples: clustering, outlier detection, compression\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- **Reinforcement Learning**: Given sequences of inputs, actions from a\n",
    "fixed set, and scalar rewards/punishments, _learn_ to select action\n",
    "sequences in a way that maximizes expected reward, e.g. chess and robotics. (This is more akin to learning how to design good experiments and is not covered in this course.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- Other stuff, like **Preference Learning**, **learning to rank**, etc. (also not covered in this course). Note that many machine learning problems can be (re-)formulated as special cases of either a supervised or unsupervised problem, which are both covered in this class.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Supervised Learning\n",
    "\n",
    "- Given observations $D=\\{(x_1,y_1),\\dots,(x_N,y_N)\\}$, the goal is to estimate the conditional distribution $p(y|x)$.\n",
    "\n",
    "##### Classification \n",
    "\n",
    "<img src=\"./figures/Bishop-Figure4.5b.png\" width=\"300px\">\n",
    "\n",
    "- The target variable $y$ is a _discrete-valued_ vector representing class labels "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "##### Regression \n",
    "\n",
    "<img src=\"./figures/Bishop-Figure1.2.png\" width=\"300px\">\n",
    "\n",
    "- Same problem statement as classification but now the target variable is a _real-valued_ vector."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Unsupervised Learning\n",
    "\n",
    "Given data $D=\\{x_1,\\ldots,x_N\\}$, model the (unconditional) probability distribution $p(x)$ (a.k.a. **density estimation**).\n",
    "\n",
    "\n",
    "##### Clustering\n",
    "\n",
    "<img src=\"./figures/fig-Zoubin-clustering-example.png\" width=\"300px\">\n",
    "\n",
    "- Group data into clusters such that all data points in a cluster have similar properties."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "##### Compression / dimensionality reduction\n",
    "\n",
    "<img src=\"./figures/fig-compression-example.jpg\" width=\"300px\">\n",
    "\n",
    "- Output from coder is much smaller in size than original, but if coded signal if further processed by a decoder, then the result is very close (or exactly equal) to the original."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Some Machine Learning Applications\n",
    "\n",
    "- computer speech recognition, speaker recognition\n",
    "- face recognition, iris identification\n",
    "- printed and handwritten text parsing\n",
    "- financial prediction, outlier detection (credit-card fraud)\n",
    "- user preference modeling (amazon); modeling of human perception\n",
    "- modeling of the web (google)\n",
    "- machine translation\n",
    "- medical expert systems for disease diagnosis (e.g., mammogram)\n",
    "- strategic games (chess, go, backgammon)\n",
    "- **any 'knowledge-poor' but 'data-rich' problem**\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<!--\n",
       "This HTML file contains custom styles and some javascript.\n",
       "Include it a Jupyter notebook for improved rendering.\n",
       "-->\n",
       "\n",
       "<!-- Fonts -->\n",
       "<link href='http://fonts.googleapis.com/css?family=Alegreya+Sans:100,300,400,500,700,800,900,100italic,300italic,400italic,500italic,700italic,800italic,900italic' rel='stylesheet' type='text/css'>\n",
       "<link href='http://fonts.googleapis.com/css?family=Arvo:400,700,400italic' rel='stylesheet' type='text/css'>\n",
       "<link href='http://fonts.googleapis.com/css?family=PT+Mono' rel='stylesheet' type='text/css'>\n",
       "<link href='http://fonts.googleapis.com/css?family=Shadows+Into+Light' rel='stylesheet' type='text/css'>\n",
       "<link href='http://fonts.googleapis.com/css?family=Nixie+One' rel='stylesheet' type='text/css'>\n",
       "\n",
       "<!-- Custom style -->\n",
       "<style>\n",
       "\n",
       "@font-face {\n",
       "    font-family: \"Computer Modern\";\n",
       "    src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
       "}\n",
       "\n",
       "#notebook_panel { /* main background */\n",
       "    background: rgb(245,245,245);\n",
       "}\n",
       "\n",
       "div.container {\n",
       "    min-width: 960px;\n",
       "}\n",
       "\n",
       "div #notebook { /* centre the content */\n",
       "    background: #fff; /* white background for content */\n",
       "    margin: auto;\n",
       "    padding-left: 0em;\n",
       "}\n",
       "\n",
       "#notebook li { /* More space between bullet points */\n",
       "    margin-top:0.8em;\n",
       "}\n",
       "\n",
       "/* draw border around running cells */\n",
       "div.cell.border-box-sizing.code_cell.running {\n",
       "    border: 1px solid #111;\n",
       "}\n",
       "\n",
       "/* Put a solid color box around each cell and its output, visually linking them*/\n",
       "div.cell.code_cell {\n",
       "    background-color: rgb(256,256,256);\n",
       "    border-radius: 0px;\n",
       "    padding: 0.5em;\n",
       "    margin-left:1em;\n",
       "    margin-top: 1em;\n",
       "}\n",
       "\n",
       "div.text_cell_render{\n",
       "    font-family: 'Alegreya Sans' sans-serif;\n",
       "    line-height: 140%;\n",
       "    font-size: 125%;\n",
       "    font-weight: 400;\n",
       "    width:800px;\n",
       "    margin-left:auto;\n",
       "    margin-right:auto;\n",
       "}\n",
       "\n",
       "\n",
       "/* Formatting for header cells */\n",
       ".text_cell_render h1 {\n",
       "    font-family: 'Nixie One', serif;\n",
       "    font-style:regular;\n",
       "    font-weight: 400;\n",
       "    font-size: 45pt;\n",
       "    line-height: 100%;\n",
       "    color: rgb(0,51,102);\n",
       "    margin-bottom: 0.5em;\n",
       "    margin-top: 0.5em;\n",
       "    display: block;\n",
       "}\n",
       "\n",
       ".text_cell_render h2 {\n",
       "    font-family: 'Nixie One', serif;\n",
       "    font-weight: 400;\n",
       "    font-size: 30pt;\n",
       "    line-height: 100%;\n",
       "    color: rgb(0,51,102);\n",
       "    margin-bottom: 0.1em;\n",
       "    margin-top: 0.3em;\n",
       "    display: block;\n",
       "}\n",
       "\n",
       ".text_cell_render h3 {\n",
       "    font-family: 'Nixie One', serif;\n",
       "    margin-top:16px;\n",
       "    font-size: 22pt;\n",
       "    font-weight: 600;\n",
       "    margin-bottom: 3px;\n",
       "    font-style: regular;\n",
       "    color: rgb(102,102,0);\n",
       "}\n",
       "\n",
       ".text_cell_render h4 {    /*Use this for captions*/\n",
       "    font-family: 'Nixie One', serif;\n",
       "    font-size: 14pt;\n",
       "    text-align: center;\n",
       "    margin-top: 0em;\n",
       "    margin-bottom: 2em;\n",
       "    font-style: regular;\n",
       "}\n",
       "\n",
       ".text_cell_render h5 {  /*Use this for small titles*/\n",
       "    font-family: 'Nixie One', sans-serif;\n",
       "    font-weight: 400;\n",
       "    font-size: 16pt;\n",
       "    color: rgb(163,0,0);\n",
       "    font-style: italic;\n",
       "    margin-bottom: .1em;\n",
       "    margin-top: 0.8em;\n",
       "    display: block;\n",
       "}\n",
       "\n",
       ".text_cell_render h6 { /*use this for copyright note*/\n",
       "    font-family: 'PT Mono', sans-serif;\n",
       "    font-weight: 300;\n",
       "    font-size: 9pt;\n",
       "    line-height: 100%;\n",
       "    color: grey;\n",
       "    margin-bottom: 1px;\n",
       "    margin-top: 1px;\n",
       "}\n",
       "\n",
       ".CodeMirror{\n",
       "    font-family: \"PT Mono\";\n",
       "    font-size: 90%;\n",
       "}\n",
       "\n",
       ".boxed { /* draw a border around a piece of text */\n",
       "  border: 1px solid blue ;\n",
       "}\n",
       "\n",
       "h4#CODE-EXAMPLE,\n",
       "h4#END-OF-CODE-EXAMPLE {\n",
       "    margin: 10px 0;\n",
       "    padding: 10px;\n",
       "    background-color: #d0f9ca !important;\n",
       "    border-top: #849f81 1px solid;\n",
       "    border-bottom: #849f81 1px solid;\n",
       "}\n",
       "\n",
       ".emphasis {\n",
       "    color: red;\n",
       "}\n",
       "\n",
       ".exercise {\n",
       "    color: green;\n",
       "}\n",
       "\n",
       ".proof {\n",
       "    color: blue;\n",
       "}\n",
       "\n",
       "code {\n",
       "  padding: 2px 4px !important;\n",
       "  font-size: 90% !important;\n",
       "  color: #222 !important;\n",
       "  background-color: #efefef !important;\n",
       "  border-radius: 2px !important;\n",
       "}\n",
       "\n",
       "/* This removes the actual style cells from the notebooks, but no in print mode\n",
       "   as they will be removed through some other method */\n",
       "@media not print {\n",
       "  .cell:nth-last-child(-n+2) {\n",
       "    display: none;\n",
       "  }\n",
       "}\n",
       "\n",
       "</style>\n",
       "\n",
       "<!-- MathJax styling -->\n",
       "<script>\n",
       "    MathJax.Hub.Config({\n",
       "                        TeX: {\n",
       "                           extensions: [\"AMSmath.js\"],\n",
       "                           equationNumbers: { autoNumber: \"AMS\", useLabelIds: true}\n",
       "                           },\n",
       "                tex2jax: {\n",
       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
       "                },\n",
       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
       "                \"HTML-CSS\": {\n",
       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
       "                }\n",
       "        });\n",
       "</script>\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "open(\"../../styles/aipstyle.html\") do f display(\"text/html\", readstring(f)) end"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Julia 0.6.1",
   "language": "julia",
   "name": "julia-0.6"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}