{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "View the assignment [here](http://www.cs.ubc.ca/~nando/540-2013/lectures/homework3.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\newcommand{\\vect}[1]{\\boldsymbol{\\mathbf{#1}}}$\n", "The loss function for logistic regression is the negative log-likelihood\n", "$$J\\left( \\vect \\theta \\right) = - \\log p\\left( \\vect y | \\vect X, \\vect \\theta \\right)=\n", "-\\sum _{i=1} ^n \\left[ y_i \\log \\pi_i + (1-y_i)\\log (1 - \\pi_i) \\right]$$\n", "where $\\pi_i=\\frac{1}{1+\\exp(-\\vect \\theta^T \\vect x_i)} = \\frac{\\exp(\\vect \\theta^T \\vect x_i)}{1+\\exp(\\vect \\theta^T \\vect x_i)}$, $\\vect y \\in \\mathbb{R}^{n \\times 1}$, $\\vect \\theta \\in \\mathbb{R}^{d \\times 1}$, $\\vect X \\in \\mathbb{R}^{n \\times d}$, and $\\vect x_i^T$ is the $i$-th row of $\\vect X$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Substituting $\\pi_i$ and simplifying, the loss can be written as\n", "$$J(\\vect \\theta) = \\sum _{i=1} ^n \\left[ \\log(\\exp(\\vect \\theta^T \\vect x_i) + 1) - y_i \\vect \\theta^T \\vect x_i \\right]$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Differentiating with respect to $\\vect \\theta$ gives the gradient\n", "$$\\vect g(\\vect \\theta) = \\frac{\\partial}{\\partial \\vect \\theta} J(\\vect \\theta) = \\sum _{i=1}^n\n", "\\left[ \\frac{1}{\\exp(\\vect \\theta^T \\vect x_i) + 1} \\exp(\\vect \\theta^T \\vect x_i) \\vect x_i - y_i\\vect x_i \\right]$$\n", "$$=\\sum_{i=1}^n \\vect x_i (\\pi_i - y_i) = \\vect X^T (\\vect \\pi - \\vect y)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Differentiating once more gives the Hessian\n", "$$\\frac{\\partial}{\\partial \\vect \\theta} \\vect g(\\vect \\theta)^T = \\frac{\\partial}{\\partial \\vect \\theta} \\sum _{i=1}^n \\left( \\frac{1}{1+\\exp(-\\vect \\theta^T \\vect x_i)} - y_i\\right) \\vect x_i^T$$\n", "$$= \\sum _{i=1}^n \\frac {\\exp(-\\vect \\theta^T \\vect x_i)\\vect x_i } {\\left( 1 + \\exp(-\\vect \\theta^T \\vect x_i) \\right)^2} \\vect x_i^T = \\sum _{i=1}^n \\pi_i (1 - \\pi_i)\\vect x_i \\vect x_i^T= \\vect X^T \\operatorname{diag}(\\pi_i(1-\\pi_i))\\vect X$$" ] } ], "metadata": {} } ] }