{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accuracy / Optimality Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import dask.array as da\n",
    "import numpy as np\n",
    "\n",
    "from dask_glm.algorithms import (admm, gradient_descent, \n",
    "                                 newton, proximal_grad)\n",
    "from dask_glm.families import Logistic\n",
    "from dask_glm.utils import sigmoid, make_y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we will create some random data that fits nicely into the logistic family."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# turn off overflow warnings\n",
    "np.seterr(all='ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "N = 1e3\n",
    "p = 3\n",
    "nchunks = 5\n",
    "\n",
    "X = da.random.random((N, p), chunks=(N // nchunks, p))\n",
    "true_beta = np.random.random(p)\n",
    "y = make_y(X, beta=true_beta, chunks=(N // nchunks,))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# add an intercept\n",
    "o = da.ones((X.shape[0], 1), chunks=(X.chunks[0], (1,)))\n",
    "X_i = da.concatenate([X, o], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unregularized Problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Checking the gradient for optimality\n",
    "\n",
    "Recall that when we \"do logistic regression\" we are solving an optimization problem (maximizing the appropriate log-likelihood function).  Given input data $(X, y) \\in \\mathbb{R}^{n\\times p}\\times\\{0, 1\\}^n$, the gradient of our objective function at a point $\\beta \\in \\mathbb{R}^p$ is given by\n",
    "\n",
    "$$\n",
    "X^T(\\sigma(X\\beta) - y)\n",
    "$$\n",
    "\n",
    "where \n",
    "\n",
    "$$\n",
    "\\sigma(x) = 1 / (1 + \\exp(-x))\n",
    "$$\n",
    "\n",
    "is the *sigmoid* function.\n",
    "\n",
    "As our objective function is convex, we will *know* we have found the global solution if the gradient at the estimate is the 0 vector.  Let's check this condition for our unregularized algorithms: gradient descent and Newton's method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "newtons_beta = newton(X_i, y, tol=1e-8, family=Logistic)\n",
    "grad_beta = gradient_descent(X_i, y, tol=1e-8, family=Logistic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "newtons_grad, grad_grad = da.compute(Logistic.pointwise_gradient(newtons_beta, X_i, y), \n",
    "                                     Logistic.pointwise_gradient(grad_beta, X_i, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## check the gradient\n",
    "print('Size of gradient')\n",
    "print('='*30)\n",
    "print('Newton\\'s Method : {0:.2f}'.format(np.linalg.norm(newtons_grad)))\n",
    "print('Gradient Descent : {0:.2f}'.format(np.linalg.norm(grad_grad)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## check the gradient\n",
    "print('Size of gradient')\n",
    "print('='*30)\n",
    "print('Newton\\'s Method : {0:.2f}'.format(np.max(np.abs(newtons_grad))))\n",
    "print('Gradient Descent : {0:.2f}'.format(np.max(np.abs(grad_grad))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, Newton's Method succesfully finds a *true* optimizer, whereas gradient descent doesn't do as well."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### One implication of a non-zero gradient"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For problems with an intercept, notice that the first component of the gradient is:\n",
    "\n",
    "$$\n",
    "\\Sigma_{i=1}^n \\sigma(X\\beta)_i - y_i)\n",
    "$$\n",
    "\n",
    "which implies that the true solution $\\beta^*$ has the property that the *average* prediction is equal to the *average* rate of 1's in the training data.  This provides an easy high-level test for how well our algorithms are peforming; however, this test tends to fail for `gradient_descent`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# check aggregate predictions\n",
    "newton_preds = sigmoid(X_i.dot(newtons_beta))\n",
    "grad_preds = sigmoid(X_i.dot(grad_beta))\n",
    "\n",
    "print('Difference between aggregate predictions vs. aggregate level of 1\\'s')\n",
    "print('='*75)\n",
    "print('Newton\\'s Method : {:.2f}'.format((newton_preds - y).sum().compute()))\n",
    "print('Gradient Descent : {:.2f}'.format((grad_preds - y).sum().compute()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Checking the log-likelihood\n",
    "\n",
    "We can also compare the objective function directly for each of these estimates; recall that in practice we *minimize* the *negative* log-likelihood, so we are looking for smaller values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "newtons_loss, grad_loss = da.compute(Logistic.pointwise_loss(newtons_beta, X_i, y),\n",
    "                                     Logistic.pointwise_loss(grad_beta, X_i, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## check log-likelihood\n",
    "print('Negative Log-Likelihood')\n",
    "print('='*30)\n",
    "print('Newton\\'s Method : {0:.4f}'.format(newtons_loss))\n",
    "print('Gradient Descent : {0:.4f}'.format(grad_loss))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We do see that the function values are surprisingly close, but as the aggregate predictions check shows us, there is a material *model* difference between the estimates."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### $\\ell_1$ Regularized Problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let us consider problems where we modify the log-likelihood by adding a \"regularizer\"; in our particular case we are optimizing a modified function where $\\lambda \\sum_{i=1}^p \\left|\\beta_i\\right| =: \\lambda \\|\\beta\\|_1$ has been added to the likelihood function.  \n",
    "\n",
    "As above, we can perform a 0 gradient check to test for optimality, but our regularizer is *not differentiable at 0* so we have to be careful at any coefficient values that are 0.  For this test, we will also compare against `sklearn`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "lamduh = 4.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should see *two* convergence prints, one for `admm` and one for `proximal_grad`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "mod = LogisticRegression(penalty='l1', C = 1. / lamduh, fit_intercept=False, tol=1e-8).fit(X.compute(), y.compute())\n",
    "sk_beta = mod.coef_\n",
    "\n",
    "admm_beta = admm(X, y, lamduh=lamduh, max_iter=700, \n",
    "                 abstol=1e-8, reltol=1e-2, family=Logistic)\n",
    "prox_beta = proximal_grad(X, y, family=Logistic, regularizer='l1', tol=1e-8, lamduh=lamduh)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# optimality check\n",
    "\n",
    "def check_regularized_grad(beta, lamduh, tol=1e-6):\n",
    "    opt_grad = Logistic.pointwise_gradient(beta, X.compute(), y.compute())\n",
    "    for idx, b in enumerate(beta):\n",
    "        if b == 0:\n",
    "            try:\n",
    "                assert opt_grad[idx] - lamduh <= 0 <= opt_grad[idx] + lamduh\n",
    "            except AssertionError:\n",
    "                print('Optimality Fail')\n",
    "                break\n",
    "        else:\n",
    "            try:\n",
    "                assert np.abs(opt_grad[idx] + lamduh * np.sign(b)) < tol\n",
    "            except AssertionError:\n",
    "                print('Optimality Fail')\n",
    "                break\n",
    "    if b == beta[-1]:\n",
    "        print('Optimality Pass!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# tolerance for 0's\n",
    "tol = 1e-4\n",
    "\n",
    "print('scikit-learn')\n",
    "print('='*20)\n",
    "check_regularized_grad(sk_beta[0,:], lamduh=lamduh, tol=tol)\n",
    "\n",
    "print('\\nADMM')\n",
    "print('='*20)\n",
    "check_regularized_grad(admm_beta, lamduh=lamduh, tol=tol)\n",
    "\n",
    "print('\\nProximal Gradient')\n",
    "print('='*20)\n",
    "check_regularized_grad(prox_beta, lamduh=lamduh, tol=tol)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "print(prox_beta)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "print(admm_beta)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "print(sk_beta)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}