{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1 \n", "\n", " - We'll start with fitting data to a parametric model\n", " \n", " - So we need to create the log-likelihood, which we will optimize over. \n", " - Let's create some fake data first." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "\n", "\n", "T = (np.random.exponential(size=1000)/1.5) ** 2.3\n", "\n", "plt.hist(T, bins=50);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the Weibull parametric model: https://en.wikipedia.org/wiki/Weibull_distribution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pdf(params, t):\n", " lambda_, rho_ = params\n", " # I'm not going to make you type this out - it's annoying and error prone. \n", " return rho_ / lambda_ * (t / lambda_) ** (rho_ - 1) * np.exp(-(t/lambda_) ** rho_)\n", "\n", "# okay, but we actually need the _log_ of the pdf\n", "def log_pdf(params, t):\n", " lambda_, rho_ = params\n", " # I'm not going to make you type this out - it's annoying and error prone. \n", " return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_)) - (t/lambda_) ** rho_\n", "\n", "# now we can define the log likehood\n", "\n", "def log_likelihood(params, t):\n", " return np.sum(log_pdf(params, t)) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `scipy.optimize.minimize`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from scipy.optimize import minimize\n", "\n", "results = minimize(log_likelihood, \n", " x0 = # some initial guess of the parameters. \n", " method=None, \n", " args=(T, ))\n", "\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def log_likelihood(params, t):\n", " return # this should change\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = minimize(log_likelihood, \n", " x0 = np.array([1.0, 1.0]), # some initial guess of the parameters. \n", " method=None, \n", " args=(T, ))\n", "\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# weibull parameters must be greater than 0!\n", "# we can \"nudge\" the minimizer to understand this using the bounds argument\n", "results = minimize(log_likelihood, \n", " x0 = np.array([1.0, 1.0]),\n", " method=None, \n", " args=(T, ),\n", " bounds=#fill this in)\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Takeaway: `minimize` is a very flexible function for small-mid size parameter optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A few problems though:\n", "\n", "1. You are stuck with only using known parametric models that are easy to implement. \n", "2. Not very fast minimization routines\n", "\n", "Let's move to Part 2. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }