{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Chapter 11: Kernels and mixture models" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "# Exercises" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 1 (30 points)\n", "We want to use a kernel smoother to fit a curve to the data set `ratdiet_fields.dat`, fitting `trt` vs. `t`, which has $N$ data points. Specifically, we will use a nearest neighbour kernel and do local linear regression. The neighbours will be weighted with the tricubic kernel, $(1 - [(x_i-x)/L]^3)^3$, where $L$ is the distance to the furthest neighbour, and $(x_i-x)$ is the distance to the neighbour from the centre of the kernel. \n", "\n", "Note that this is the kernel used by the R function `loess`. (If you want to use a different package with a different weighting, that's fine. Just fully explain what you have used.)\n", "\n", "We will control the amount of smoothing via the number of nearest neighbours.\n", "\n", "* Calculate the smoothed function on a grid of at least 1000 points for $k=5, 15, 30$. Overplot each of these as a line on top of the data.\n", "\n", "* Calculate and plot the RMS of the full data set as a function of $k$, for $k=4,5,\\ldots N$, using the full data set each time to evaluate the function. Which value of $k$ minimizes the RMS?\n", "\n", "* Repeat the previous step, but now doing leave-one-out cross validation to calculate the RMS for each $k$. Which value of $k$ minimizes the RMS now?\n", "\n", "_Hint_: If you use `loess` in R, I suggest you always keep the two extreme points (lowest and highest) in every training data set when doing LOO-CV. This is because the default setting for `loess` does not permit extrapolation." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 2 (30 points)\n", "\n", "Look at the data `geyser2_MASS.dat` (the same as used in the lecture). Use a kernel density esimtate method to identify the locations of the maxima of the two apparent clusters and the height of each maximum (and report these). Using this plus inspection of the data, decide on a boundary value of `pduration` which best separates the two clusters. We want to approximate each cluster using a Gaussian. By allocating data to each cluster according to your boundary, calculate the covariance matrix of each Gaussian and report these. Overplot on the data for each cluster the centre of the Gaussian plus its elliptical contour corresponding to its covariance.\n", "If we wanted to actually model the data using these two Gaussians, suggest a mathematical expression for combining them into a single PDF for the entire data set." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "##Exercise 3 (40 points)\n", "\n", "We propose to analyse the dataset given in `line_outlier.dat`. You will fit a linear model to a dataset\n", "$$ \\hat{y}(x ~\\mid~\\alpha, \\beta) = \\alpha x + \\beta $$\n", "and assess the probability of each point to be an outlier to this model.\n", "\n", "1) plot the dataset with its uncertainties and assess how many outliers are possibly within this dataset. Explain your thoughts.\n", "\n", "2) Blind fit, i.e., no outlier model. 
\n", "\n", "2.a) Explicit the likelihood function of this dataset with a linear model.\n", "\n", "2.b) Obtain posterior samples of you model given this dataset with an MCMC method of your choice. You will give the algorithm that you choose, and justify the tuning parameters and initial guess of the fit. \n", "\n", "2.c) Plot the posterior distribution $p(\\alpha, \\beta)$ as well as your fit and uncertainties on top of the data. Give the values of the model parameters and comment on the quality of your fit.\n", "\n", "3) The Bayesian approach to accounting for outliers generally involves mixture models so that the initial model is combined with a complement model accounting for the outliers. \n", "\n", "3.1) Explain the modelling given the following likelihood function:\n", "$$\n", "\\begin{array}{ll}\n", "p(\\{x_i, y_i, \\sigma_i\\}~|~\\alpha, \\beta,\\{g_i\\},\\sigma_B) = & \\frac{g_i}{\\sqrt{2\\pi \\sigma_i^2}}\\exp\\left[\\frac{-\\left(\\hat{y}(x_i~|~\\alpha, \\beta) - y_i\\right)^2}{2\\sigma_i^2}\\right] \\\\\n", "&+ \\frac{1 - g_i}{\\sqrt{2\\pi \\sigma_B^2}}\\exp\\left[\\frac{-\\left(\\hat{y}(x_i~|~\\alpha, \\beta) - y_i\\right)^2}{2\\sigma_B^2}\\right]\n", "\\end{array}\n", "$$\n", "where $\\{g_i\\}$ is a series of weights which range from 0 to 1 and that encode for each point $i$ the probability that $(x_i, y_i)$ is not an outlier, and $\\sigma_B$ the scaling parameter of the second component.\n", "\n", "3.2) Explain what could be the the major drawback of this model? What would you change to produce a more interesting model?\n", "\n", "4) Based on the same formulation, we can consider one global $g$ instead individuals, which will characterize on the ensemble the probability of having an outlier. In other words, the fraction of outliers relative to the dataset. \n", "\n", "$$\n", "\\begin{array}{ll}\n", "p(\\{x_i, y_i, \\sigma_i\\}~|~\\theta,g,\\sigma_B) = & \\frac{g}{\\sqrt{2\\pi \\sigma_i^2}}\\exp\\left[\\frac{-\\left(\\hat{y}(x_i~|~\\theta) - y_i\\right)^2}{2\\sigma_i^2}\\right] \\\\\n", "&+ \\frac{1 - g}{\\sqrt{2\\pi \\sigma_B^2}}\\exp\\left[\\frac{-\\left(\\hat{y}(x_i~|~\\theta) - y_i\\right)^2}{2\\sigma_B^2}\\right]\n", "\\end{array}\n", "$$\n", "\n", "as before, $g$ ranges from 0 to 1, and indicates the fraction of outlier in the sample. Similarily Gaussian distribution of width $\\sigma_B$ is used to model the outliers in the computation of the likelihood.\n", "\n", "4.1) Explain briefly how this update affects the degree of complexity of the model in comparison to the previous one (defined question 3).\n", "\n", "4.2) Using this last model expression, give the expression of the odds that a point is an outlier of the affine distribution. (justify your expression).\n", "\n", "4.3) Fit the data using an MCMC method of your choice. Give the method and the details of the fit initialization and procedure. Caution: Be very careful to the parameters of your MCMC (burning, sampling, stepsize etc). Plot the pair-wise joint-distribution of $\\alpha$, $\\beta$, $g$ and $\\sigma_B$ and for each parameter give their maginalized median value and inter-quartile intervals.\n", "\n", "4.4) Plot the final fit and its uncertainties on top of the data and the \"blind-fit\" (found in question 2). Using the expression of the outlier probability (question 4.2) highlight all points with expected odds, $E[Odds] > 0.5$. Comment on your fit and the number of outliers you find.\n", "\n", "5) Take the outlier model from the lecture notes, i.e., a Cauchy distribution for the outliers, and redo the fit. 
], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }