{ "cells": [ { "cell_type": "markdown", "id": "45e51d4b", "metadata": {}, "source": [ "# Generative Classification\n", "\n", "- **[1]** You have a machine that measures property $x$, the \"orangeness\" of liquids. You wish to discriminate between $C_1 = \\text{'Fanta'}$ and $C_2 = \\text{'Orangina'}$. It is known that\n", "\n", "\\begin{align*}\n", "p(x|C_1) &= \\begin{cases} 10 & 1.0 \\leq x \\leq 1.1\\\\\n", " 0 & \\text{otherwise}\n", " \\end{cases}\\\\\n", "p(x|C_2) &= \\begin{cases} 200(x - 1) & 1.0 \\leq x \\leq 1.1\\\\\n", "0 & \\text{otherwise}\n", "\\end{cases}\n", "\\end{align*}\n", "\n", "The prior probabilities $p(C_1) = 0.6$ and $p(C_2) = 0.4$ are also known from experience. \n", " (a) (##) A \"Bayes Classifier\" is given by\n", " \n", "$$\\text{Decision} = \\begin{cases} C_1 & \\text{if } p(C_1|x)>p(C_2|x) \\\\\n", " C_2 & \\text{otherwise}\n", " \\end{cases}\n", "$$\n", "\n", "Derive the optimal Bayes classifier. \n", " (b) (###) The probability of making the wrong decision, given $x$, is\n", " \n", "$$\n", "p(\\text{error}|x)= \\begin{cases} p(C_1|x) & \\text{if we decide } C_2\\\\\n", " p(C_2|x) & \\text{if we decide } C_1\n", "\\end{cases}\n", "$$\n", "\n", "Compute the **total** error probability $p(\\text{error})$ for the Bayes classifier in this example.\n", "\n", "\n", "- **[2]** (#) (see Bishop exercise 4.8): Using (4.57) and (4.58) (from Bishop's book), derive the result (4.65) for the posterior class probability in the two-class generative model with Gaussian densities, and verify the results (4.66) and (4.67) for the parameters $w$ and $w_0$. \n", "\n", "\n", "- **[3]** (###) (see Bishop exercise 4.9). \n", "\n", "\n", "- **[4]** (##) (see Bishop exercise 4.10). \n", "\n", "\n", "" ] }, { "cell_type": "markdown", "id": "0bf3a01b", "metadata": {}, "source": [ "# Discriminative Classification\n", "\n", "- **[1]** Consider a data set $D=\\{(x_1,y_1),\\ldots,(x_N,y_N)\\}$, where $x_n \\in \\mathbb{R}^M$ and $y_n \\in \\{0,1\\}$. 
The probabilistic classification method known as *logistic regression* attempts to model these data as\n", "$$p(y_n=1|x_n) = \\sigma(\\theta^T x_n + b)$$\n", "where $\\sigma(x) = 1/(1+e^{-x})$ is the *logistic function*. Let's introduce the shorthand notation $\\mu_n=\\sigma(\\theta^T x_n + b)$. So, for every input $x_n$, we have a model output $\\mu_n$ and an actual data output $y_n$. \n", " (a) Express $p(y_n|x_n)$ as a Bernoulli distribution in terms of $\\mu_n$ and $y_n$. \n", " (b) If it is furthermore given that the data set is IID, show that the log-likelihood is given by\n", "$$\n", "L(\\theta) \\triangleq \\log p(D|\\theta) = \\sum_{n=1}^N \\left\\{y_n \\log \\mu_n + (1-y_n)\\log(1-\\mu_n)\\right\\}\n", "$$ \n", " (c) Prove that the derivative of the logistic function is given by\n", "$$\n", "\\sigma^\\prime(\\xi) = \\sigma(\\xi)\\cdot\\left(1-\\sigma(\\xi)\\right)\n", "$$ \n", " (d) Show that the gradient of the log-likelihood with respect to $\\theta$ is\n", "$$\n", "\\nabla_\\theta L(\\theta) = \\sum_{n=1}^N \\left( y_n - \\sigma(\\theta^T x_n +b)\\right)x_n\n", "$$ \n", " (e) Design a gradient-ascent algorithm for maximizing $L(\\theta)$ with respect to $\\theta$. \n", "\n", "\n", "- **[2]** Briefly describe the similarities and differences between the discriminative and generative approaches to classification.\n", "\n", "\n", "- **[3]** (Bishop ex.4.7) (#) Show that the logistic sigmoid function $\\sigma(a) = \\frac{1}{1+\\exp(-a)}$ satisfies the property $\\sigma(-a) = 1-\\sigma(a)$ and that its inverse is given by $\\sigma^{-1}(y) = \\log\\{y/(1-y)\\}$.\n", "\n", " \n", "- **[4]** (Bishop ex.4.16) (###) Consider a binary classification problem in which each observation $x_n$ is known to belong to one of two classes, corresponding to $y_n = 0$ and $y_n = 1$. Suppose that the procedure for collecting training data is imperfect, so that training points are sometimes mislabelled. 
For every data point $x_n$, instead of having a value $y_n$ for the class label, we have a value $\\pi_n$ representing the probability that $y_n = 1$. Given a probabilistic model $p(y_n = 1|x_n,\\theta)$, write down the log-likelihood function appropriate to such a data set.\n", " \n", "\n", "- **[5]** (###) Let $X$ be a real-valued random variable with probability density\n", "$$\n", "p_X(x) = \\frac{e^{-x^2/2}}{\\sqrt{2\\pi}},\\quad\\text{for all } x.\n", "$$\n", "Also, let $Y$ be a real-valued random variable with conditional density\n", "$$\n", "p_{Y|X}(y|x) = \\frac{e^{-(y-x)^2/2}}{\\sqrt{2\\pi}},\\quad\\text{for all } x \\text{ and } y. \n", "$$\n", " (a) Give an (integral) expression for $p_Y(y)$. Do not try to evaluate the integral. \n", " (b) Approximate $p_Y(y)$ using the Laplace approximation.\n", " Give the detailed derivation, not just the answer.\n", "Hint: You may use the following results.\n", "Let \n", "$$g(x) = \\frac{e^{-x^2/2}}{\\sqrt{2\\pi}}$$\n", "and\n", "$$\n", "h(x) = \\frac{e^{-(y-x)^2/2}}{\\sqrt{2\\pi}}$$\n", "for some real value $y$. Then:\n", "\\begin{align*}\n", "\\frac{\\partial}{\\partial x} g(x) &= -xg(x) \\\\\n", "\\frac{\\partial^2}{\\partial x^2} g(x) &= (x^2-1)g(x) \\\\\n", "\\frac{\\partial}{\\partial x} h(x) &= (y-x)h(x) \\\\\n", "\\frac{\\partial^2}{\\partial x^2} h(x) &= ((y-x)^2-1)h(x) \n", "\\end{align*} \n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e455e95b", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.2", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.2" } }, "nbformat": 4, "nbformat_minor": 5 }