{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Review of Naive Bayes\n", "-----------------------\n", "Predicting class of a text, say \"humor\": \n", "$P(C = \\textrm{humor} \\mid X_a = \\textrm{False}, etc) \\\\\n", " P(C \\mid X) = P(X \\mid C)P(C) / P(X)$ \n", " \n", "(aside: We have generally been doing supervised learning.)\n", "\n", "Parametric vs. Non-Paramentric:\n", "----------------------------\n", "\n", "- **K-NN**: not considered parametric, even though $K$ is a parameter, since you're not given a probability distribution\n", "- **Decision trees**: nonparametric\n", "- **Linear regression**: parametric (assumes the world is linear; if the world is sinusoidal, for instance, LR will provide a biased estimate\n", "- **Linear regression with complexity penalty**: parametric\n", "- **Naive Bayes**: parametric (form of the prior, probabilities of each word given each category, etc) \n", " * number of parameters is $k * v$, where $k$ == categories and $v$ == words\n", " \n", "Generative vs. Discriminative\n", "-----------------------------\n", "\n", "- Generative\n", " * learn $p(x)$ -- learn the probability distribution\n", " * or, equivalently, $p(y, x)$\n", "- Discriminative\n", " * learn $p(y\\mid x)$\n", " * or, minimize $L(y, f(x, \\Theta))$\n", "\n", "Given $p(y, x)$, you can estimate $p(y\\mid x)$. But you cannot estimate $p(y, x)$ given $p(y\\mid x)$\n", "\n", "- **Linear regression**: \n", " * $P(Y,X) = P(Y\\mid X)P(X)$\n", " * $Y_i \\sim N(wx_i, \\sigma^2)$\n", " * We don't make any assumptions about $P(X)$\n", " * Therefore, it is _discriminative_\n", "- **Naive Bayes**:\n", " * makes an assumption about how $X$ is distributed, so it is _generative_\n", " * assumes a very particular distribution of Xs\n", " \n", "In a discriminative model, I'm not estimating P(X). If P(X) is particularly messy, then it's easier to use discriminative models so we don't have to model the Xs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Logistic Regression\n", "=====================\n", "\n", "Imagine X vs Y: Y = 1 if true, 0 if false. Plotting gives you set of points at {0,1}. Fitting a linear regression gives you a bad probability estimator (can go above or below 0, 1). You can fit a smoothing function $f(x) = 1/(1 + e^{-x})$. This function starts at 0 and goes to 1; no matter what X is, $f(x)$ is between 0 and 1. \n", "\n", "This is the _logistic function_.\n", "\n", "$P(Y = 1 \\mid x,w) = \\frac{1}{1+\\exp(-\\sum_j w_j x_j)}\\frac{1}{1+\\exp(-w^\\intercal x)} = \\frac{1}{1+\\exp(-yw^\\intercal x)}$\n", "\n", "$\\log \\frac{P(Y=1\\mid x,w)}{P(Y=-1\\mid x,w)} = w^\\intercal x$\n", "\n", "In log. regression, each weight tells you how much you'll drive up the logistic function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding interactions: \n", "Can you add an interaction term and have it still be a logistic regression? Yes\n", "\n", "If you include transformations and interactions, LR can be extremely powerful\n", "\n", "Why would you use NB vs LR? 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Linear boundary for 2-class Gaussian Naive Bayes with shared variances\n", "------------------\n", "\n", "see notes on website" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intuition about linear boundaries:\n", "------------------\n", "**Geometric intuition:**\n", "\n", "Representing a line: $0 = x - y$\n", "\n", "- $0 = \\begin{bmatrix} 1 & -1 \\end{bmatrix} \\begin{bmatrix} x \\\\ y \\end{bmatrix}$\n", "- in general, a hyperplane is defined by $0 = \\vec{w} \\cdot \\vec{x}$\n", "    - the weight vector $\\vec{w}$ (equivalently $-\\vec{w}$) is orthogonal to the hyperplane it defines" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }