{ "metadata": { "name": "", "signature": "sha256:bc8496fcd06e13c318fd10ff6a6bf370240b4fbd4b15aefebc104f97bc7f14a6" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#[Estimating Mixture Models from Moments](https://twitter.com/justinvf/status/489556981019979776)\n", "##Justin Vincent ([justinvf](https://twitter.com/justinvf))\n", "###An algorithmshop.com production\n", "\n", "I was inspired by this [blog post](http://mrtz.org/blog/pearsons-polynomial/) about James Pearson to learn a bit about his method of solving mixture models. I also figured it would be a good way to learn sympy a little better.\n", "\n", "The basic gist of this notebook is that moments for a mixture of gaussians are nice polynomials in terms of means and variance. Thus if you calculate the moments in your data, you can use those polynomials to get the formulas for the parameters of the mixture model. But to have the numerical solver converge, it requires picking points somewhat near the real solution. Also, I don't know much about stats. Feel free to educate me.\n", "\n", "I'm going to start writing more [algorithmshop.com](http://algorithmshop.com) stuff just in IPython notebooks. This might make its way into a post someday, but I would rather just put things out there and talk to folks. I'm [justinvf](https://twitter.com/justinvf) on Twitter. If you live in the bay (or the internet) and want to talk math, please reach out! I think I made my site nicer than I want to commit to regularly. I would rather be learning stuff than making my site pretty." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sympy.stats import Normal, moment\n", "from sympy import Symbol, simplify, init_printing, symbols, pretty_print\n", "from sympy.solvers import nsolve\n", "init_printing()\n", "\n", "import numpy as np\n", "from numpy.linalg import norm\n", "np.random.seed(43)\n", "\n", "from matplotlib import pyplot as plt\n", "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# The moments to use in the calculations\n", "MOMENTS_USED = [1,2,3,4,5]\n", "\n", "# The parameters of the 2 gaussians, and the mixture param (mu_1, sigma_2, mu_2, sigma_2, r)\n", "REAL_PARAMS = (0, 2, 10, 3, .7)\n", "NUM_POINTS = 5000" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "\n", "mu = Symbol('mu')\n", "sigma = Symbol('sigma', positive=True)\n", "X = Normal('X', mu, sigma)\n", "\n", "moments = [simplify(moment(X,i)) for i in MOMENTS_USED]\n", "\n", "for (i, m) in zip(MOMENTS_USED, moments):\n", " print('\\nMoment {}'.format(i))\n", " pretty_print(m)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Moment 1\n", "\u03bc\n", "\n", "Moment 2\n", " 2 2\n", "\u03bc + \u03c3 \n", "\n", "Moment 3\n", " \u239b 2 2\u239e\n", "\u03bc\u22c5\u239d\u03bc + 3\u22c5\u03c3 \u23a0\n", "\n", "Moment 4\n", " 4 2 2 4\n", "\u03bc + 6\u22c5\u03bc \u22c5\u03c3 + 3\u22c5\u03c3 \n", "\n", "Moment 5\n", " \u239b 4 2 2 4\u239e\n", "\u03bc\u22c5\u239d\u03bc + 10\u22c5\u03bc \u22c5\u03c3 + 15\u22c5\u03c3 \u23a0\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's generate data from a gaussian mixture" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def generate_data(mu_1, sigma_1, mu_2, sigma_2, r, n):\n", " \"\"\"mu and sigmas are for the two gaussians.\n", " r is the probability of picking from from the first gaussian.\n", " n is the number of points to generate\n", " \"\"\"\n", " s = np.random.rand(n) < r\n", " return np.select([s, ~s],\n", " [np.random.normal(mu_1, sigma_1, n),\n", " np.random.normal(mu_2, sigma_2, n)])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "fake_data = generate_data(*REAL_PARAMS, n=NUM_POINTS)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "plt.hist(fake_data, bins=50);" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEACAYAAAC57G0KAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFBpJREFUeJzt3V+MXFd9wPHvJrYVgr04q1SO7aS1lcQKllCTSnGrQMUt\nFa6NqO08NAlVqQspiuQqRCBo4kjUAwgwUQN9qMhLAjKBmFpQLOeBECf4quGBWCA7/4wbO4olNo2d\nAgavFZTa2H04Zzyz45ndO//nnv1+pNHcOXNn5uyd2d+c+Z1zzwFJkiRJkiRJkiRJkiRJkqQ57VLg\nAPB4vF0BJmPZAWB93b5bgSPAYWDt4KooSerEJ4FvA3vi7W2xrNFq4CAwH1gBHAUuGUD9JEkNigTf\nq4EPAA8DY7FsrG673kZgJ3AGOEYI8Gu6rqUkqW1FAvxXgU8D5+rKzgN3A88BjwCLY/kyQuqmahJY\n3n01JUntmi3AfxB4g5Bnr2+xPwSsBG4EXgcenOE5zndTQUlSZ+bNcv8twAZCiuYyYBz4JvD3dfs8\nTK3z9TXgmrr7ro5l01x77bXnX3nllQ6rLElz1ivAdf144vdSC+RL68o/ATwWt6udrAsILfxXaJ6r\nP19m27ZtG3YVumL9h6fMdT9/3voPG21mRGZrwdcbq3vyB4A/jrdfBe6K5YeAXfH6LLCl3QpJknqj\nnQCfxwvAh2fY74vxIkkaIseodyDLsmFXoSvWf3jKXHew/mXTLD8+CDGdJEkqamxsDNqI27bgJSlR\nBnhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqUAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBngJSlR\nBnhJSlTRAH8pYeHt6pJ9E8Be4GXgSWBx3b5bgSPAYWBtb6qpXhsfn2BsbOyiy/j4xLCrJqlHigb4\newjL8FUncb+PEOBXAU/H2xDWZL09Xq8DvtbGa2iApqZOEt7O6ZdQLikFRYLv1cAHgIepTTS/AdgR\nt3cAm+L2RmAncAY4BhwF1vSorpKkNhQJ8F8FPg2cqytbApyI2yfibYBlwGTdfpPA8i7rKEnqwGyL\nbn8QeIOQf89a7FP9fd9K0/sqlcqF7SzL5txaiZI0mzzPyfO848fPtrbfF4EPA2eBy4Bx4D+BmwkB\n/ziwFNgH3EAtF789Xj8BbAOebXhe12QdsrC2Y7P3YAzfG2k09XpN1vuBa4CVwB3AjwgBfw+wOe6z\nGdgdt/fE/RbEx1wP7C9aGUlS78yWomlUbdptB3YBdxI6U2+L5Ydi+SFCq38LM6dvJEl9Urip32Om\naIbMFI1UPr1O0UiSSsoAL0mJMsBLUqIM8JKUKAO8JCXKAJ8QZ4iUVM9hkglpPfRxPuG0hEYOk5TK\npN1hku2e6KRSOsvFwXxY3+2SBsUUjSQlygAvSYkywEtSogzwkpQoA7wkJcoAL0mJMsBLUqIM8JKU\nqNkC/GWE9VQPElZp+lIsrwCThMW4DwDr6x6zFTgCHAbW9rCukqQ2FDmd8XLgTcJZrz8GPgX8JTAF\nfKVh39XAY4RFuZcDTwGrgHMN+zlVQR/MtEpT8zNZnapAKpN+rOj0ZrxeAFwKnKy+VpN9NwI7gTOE\ntVqPAmuKVkaS1DtFAvwlhBTNCWAf8FIsvxt4DngEWBzLlhFSN1WThJa8JGnAikw2dg64EXgH8EMg\nAx4CPhfv/zzwIHBni8c3/b1fqVQubGdZRpZlBaoiSXNHnufked7x49udUvAzwO+Af60rWwE8DrwL\nuC+WbY/XTwDbCB219czB94E5eCltvc7BX0kt/fI24P2EUTNX1e1zK/BC3N4D3EHI168Ergf2F62M\nJKl3ZkvRLAV2EL4ILgEeBZ4GvklI25wHXgXuivsfAnbF67PAFlqkaDSq5lVbCRcsWnQFp079ekj1\nkdQpV3RKSK9SNM329f2Shq8fwyQlSSVkgJekRBngJSlRBnhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqU\nAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBngJSlRBviSGh+fYGxsbNpFkurNFuAvI6ynepCw\nStOXYvkEsBd4GXiS2rJ+AFuBI8BhYG0vK6uaqamThIU56i+SVFOk2Xc58CZheb8fA58CNgC/BB4A\n7gWuICy4vRp4DLgZWA48BawCzjU8pys6dan56k2u6CSlrB8rOr0ZrxcAlwInCQF+RyzfAWyK2xuB\nncAZ4BhwFFhTtDKSpN4pEuAvIaRoTgD7gJeAJfE28XpJ3F4GTNY9dpLQkpckDdi8AvucA24E3gH8\nEPiLhvtnSwA3va9SqVzYzrKMLMsKVEWS5o48z8nzvOPHtzv04jPA74B/BDLgOLCU0LK/gZCHB9ge\nr58AthE6auuZg+/SYHPw84GzF+25aNEVnDr16yLVldQDvc7BX0lthMzbgPcDB4A9wOZYvhnYHbf3\nAHcQ8vUrgeuB/UUro1F1lotH7JyPI3kkjarZUjRLCZ2ol8TLo8DThCC/C7iT0Jl6W9z/UCw/RIgK\nW3D8niQNxbDOjjFF06VBD5Nsta/vozQ4/RgmKUkqIQO8JCXKAC9JiTLAS1KiDPCSlCgDvCQlygAv\nSYkywEtSogzwkpQoA/yIa7Y0n8vzSSrCqQpGXPMpCaCf0w84VYE0mpyqQJIEGOAlKVkGeElKlAFe\nkhJVJMBfQ22x7ReBj8fyCmFR7QPxsr7uMVuBI8BhYG2P6ipJakOR3tir4uUgsBD4GbCJsIrTFPCV\nhv1XA48BNwPLgaeAVYTFu6scRVOQo2gkVfVjFM1xQnAHOA38nBC4W73QRmAncIawnN9RYE3RCkmS\neqPdHPwK4CbgJ/H23cBzwCPUFudeRkjdVE1S+0KQJA1IOwF+IfBd4B5CS/4hYCVwI/A68OAMj/V3\nvCQN2LyC+80Hvgd8C9gdy96ou/9h4PG4/RqhY7bq6lg2TaVSubCdZRlZlhWsiiTNDXmek+d5x48v\nkqwfA3YAvwI+UVe+lNByJ5bfDPwttU7WNdQ6Wa9jeiveTtaC7GSVVNVuJ2uRFvy7gb8DnicMhwS4\nH/gQIT1zHngVuCvedwjYFa/PAlswRSNJA+dkYyPOFrykKicbkyQBBnhJSpYBXpISZYCXpEQZ4CUp\nUQZ4SUqUAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBng1YV5jI2NTbuMj08Mu1KSImeTHHGj\nPptks319b6X+cDZJSRJggJekZBUJ8NcA+4CXgBeBj8fyCWAv8DLwJLC47jFbgSPAYWBtryorSSqu\nSC7nqng5CCwEfgZsAj4C/BJ4ALgXuAK4j9qarDdTW5N1FXCu7jnNwRdkDl5SVT9y8McJwR3gNPBz\nQuDeQFiMm3i9KW5vBHYCZ4BjwFHCAtySpAFqNwe/ArgJeBZYApyI5SfibYBlwGTdYyYJXwiSpAFq\nJ8AvBL4H3ANMNdx3nua/4evvlyQN0LyC+80nBPdHgd2x7AQhN38cWAq8EctfI3TMVl0dy6apVCoX\ntrMsI8uy4rWWpDkgz3PyPO/48UWS9WOEHPuvgE/UlT8Qy75M6FxdzPRO1jXUOlmvY3or3k7Wguxk\nlVTVbidrkR3fA/wX8Dy1/+atwH5gF/CHhM7U24DfxPvvBz4KnCWkdH7Y8JwG+IIM8JKq+hHg+8EA\nX5ABXlKVUxVIkgADvCQlywAvSYkywEtSogzwI2R8fOKiBTQkqVOOohkhzUfMjMbIGEfRSMPnKBpJ\nEmCAV89dvE6ra7VKw1F0LhqpoLM0S+dMTdmfIA2aLXhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqUAV6S\nEmWAl6REFQnwXyesv/pCXVkFmAQOxMv6uvu2AkeAw8DantRSktS2IgH+G8C6hrLzwFeAm+LlB7F8\nNXB7vF4HfK3ga0iSeqxI8H0GONmkvNmpiRuBncAZwjqtRwmLb0uSBqyb1vXdwHPAI8DiWLaMkLqp\nmgSWd/EakqQOdToXzUPA5+L254EHgTtb7Nt07thKpXJhO8sysizrsCqSlKY8z8nzvOPHF50BagXw\nOPCuWe67L5Ztj9dPANuAZxse43zwTaQyH3yrfX3Ppe4Maj74pXXbt1IbYbMHuANYAKwErgf2d/ga\nkqQuFEnR7ATeC1wJ/ILQIs+AGwlNtVeBu+K+h4Bd8fossIUWKRpJUn+5ZN8IMUUjaSYu2SdJAgzw\nkpQsA7wkJcoALw3Q+PiEC5JrYOxkHSF2sqav1XvssVERdrJKkgADvCQlywAvSYkywEtSogzwUpcc\nGaNR5SiaEeIomnJqZ2SMo2jUDUfRSH3SrKUe/+GkkWSAlwqamjpJaH03Xro1r+kXh2kedavTFZ0k\n9cxZmn1RTE3560DdsQU/BP7UlzQItuCHoPZTv5FBXlLvFGnBfx04QW1ZPoAJYC/wMvAksLjuvq3A\nEeAwsLY31ZQGq9mvLKlsigT4bwDrGsruIwT4VcDT1BbbXg3cHq/XAV8r+BrSSGneoSqVS5Hg+wxw\nsqFsA7Ajbu8ANsXtjYQ1XM8Ax4CjwJquaymVTvORMdIgddq6XkJI2xCvl8TtZcBk3X6TwPIOX0NJ\nuTjgpT0MsDoyxl8BGp5edLLO9sltel+lUrmwnWUZWZb1oCoaXRcPBXQYoDSzPM/J87zjxxf9D1sB\nPA68K94+DGTAcWApsA+4gVoufnu8fgLYBjzb8HxzeqqC5qerw6hMKTDIfUf1czAq00aM6vHRcAxq\nqoI9wOa4vRnYXVd+B7AAWAlcD+zv8DUkSV0okqLZCbwXuBL4BfAvhBb6LuBOQmfqbXHfQ7H8EOE3\n+RZMPErSUDib5BCYoqmVjernwBSNRpGzSUrJmGsjj9RrTlUgjSxHHqk7tuAlKVEGeCkBLhuoZgzw\nGqLhL3SRytTNzebOCWWayxxF00fj4xMz/JMNfwTLKO87qM9HGUc0udbr3NXuKBo7WfvIed8lDZMB\nXiqVeaVMIWk4DPBSqTRfv9VfhWrGTlZJSpQBXpISZYDXnOE6q5prDPAqhVbj1dsZM+86q5pr7GRV\nKbQacurcLFJrtuAlKVEGeClZw58KQsPVbYrmGHAK+D1wBlgDTAD/AfwRtdWeftPl60hqW/Mx86a1\n5o5uW/DnCYtv30QI7hAW3t4LrAKeprYQtyRpgHqRomlsDmwAdsTtHcCmHrzGyHMInqRR04sW/FPA\nT4GPxbIlwIm4fSLeTp5D8FQeLgU4V3Sbg3838DrwB4S0zOGG+1tGukqlcmE7yzKyLOuyKkpHdxNq\nzTxNs1wKsDzyPCfP844f38t3dRtwmtCSz4DjwFJgH3BDw77JzQffaj7u4vN8typ339n2bfwslXGO\n91HYN7X/yRS1Ox98Nymay4FFcfvtwFrgBWAPsDmWbwZ2d/EakqQOdZOiWQJ8v+55vg08ScjH7wLu\npDZMUuoT50eXWnHJvh4xReO+Zd83tf/JFA0yRSNJGmEGeElKlAFekhJlgJeEE5OlyfngJeHEZGmy\nBS9pBk5rUGa24CXNwGkNyswWfJtarQ0qSaPGFnybWq0NOrxzxiSpOVvwkpQoA7wkJcoAL0mJMsBL\nalPzk6LGxhYULHOo5aDYySqpTc1PimpnRkuHWg6GLXhJSlS/Avw6wvqsR4B7+/QakqQZ9CPAXwr8\nOyHIrwY+BLyzD68zRPmwK9ClfNgV6FI+7ArMYXmPnmc4UyB0s4B1GfUjwK8BjhKW6zsDfAfY2IfX\n6ZlWZ6c26yAK8mFWtwfyYVegS/mwKzCH5T16nmoev3YJJxH211wL8P3oZF0O/KLu9iTwp314nZbe\neustbrvtH/jtb9+cVj42Bl/4wr3ccsst08pnPju1WaeRpN5rtb7ufEJbsWbRois4derXF+05Pj5x\n0RdFq33ngn4E+KEv7Hj69Gn27PlO0/ve974f8dZbpwdcI0mzKz46Z2pq/gxzQM2872c/+1lgbnxJ\n9KM5+mdAhZCDB9gKnAO+XLfPUeDaPry2JKXsFeC6YVZgXqzECmABcJDkOlklae5aD/w3oaW+dch1\nkSRJktSpvwFeAn4P/EnDfVsJJ0YdBtYOuF6dqBBGCB2Il3Uz7j0ayn4C2jHgecLx3j/cqhTydeAE\n8EJd2QSwF3gZeBJYPIR6FdWs/hXK8bm/BthHiDcvAh+P5WU5/q3qX2GEj/8NwCpCxesD/GpCrn4+\nIXd/lNGfRmEb8MlhV6INlxKO6wrCcS5j38irhH/Qsvhz4CamB8gHgH+O2/cC2wddqTY0q39ZPvdX\nATfG7YWElPE7Kc/xb1X/to7/oIPoYcI3Z6ONwE7CYNdjhEC0ZnDV6liZBsWX7gS0Fsp0zJ8BGs/e\n2QDsiNs7gE0DrVF7mtUfyvEeHCc0YgBOAz8nnKNTluPfqv7QxvEflVbyMsLPjqpJan/MKLsbeA54\nhNH9qVfV7AS0MhzjeueBp4CfAh8bcl06tYSQ9iBeLxliXTpVps89hF+tNwHPUs7jv4JQ/5/E24WP\nfz8C/F7CT7rGy1+3+TxDP2GK1n/LBuAhYCXhZ9TrwINDqmNRo3A8u/Vuwgd9PfBPhBRCmVXP0y+T\nsn3uFwLfA+4BphruK8PxXwh8l1D/07R5/PtxJuv7O3jMa4ROhaqrY9mwFf1bHgYe72dFeqDxGF/D\n9F9NZfB6vP5f4PuEtNMzw6tOR04Q8qvHgaXAG8OtTtvq6zvqn/v5hOD+KLA7lpXp+Ffr/y1q9W/r\n+A8zRVOfR9oD3EE4MWolcD2jP0piad32rUzviBpFPyUc1xWE43w74biXxeXAorj9dsJIq1E/5s3s\nATbH7c3U/nHLoiyf+zFCCuMQ8G915WU5/q3qP9LH/1ZCHvh3hG/QH9Tddz+hE/Aw8FeDr1rbvkkY\nsvcc4UNShlxemU9AW0nodDpIGDZWhvrvBP4H+D/C5/4jhFFATzH6w/Tg4vp/lPJ87t9DmCLlINOH\nFJbl+Der/3rKc/wlSZIkSZIkSZIkSZIkSZIkSZIkKfh/26TvI8xqY0YAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we assume our data is a mixture of two gaussians, N_1 and N_2, with r the mixture parameter, then we know what the moments should look like for the data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "mu_1, sigma_1, mu_2, sigma_2, r = symbols('mu_1 sigma_1 mu_2 sigma_2 r')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "def mixture_moment(m):\n", " return ( r * (m.subs({mu:mu_1, sigma: sigma_1}))\n", " + (1 - r) * (m.subs({mu:mu_2, sigma: sigma_2})))\n", "\n", "mixture_moments = [mixture_moment(m) for m in moments]\n", "\n", "for (i, m) in zip(MOMENTS_USED, mixture_moments):\n", " print('\\nMixture Moment {}'.format(i))\n", " pretty_print(m)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Mixture Moment 1\n", "\u03bc\u2081\u22c5r + \u03bc\u2082\u22c5(-r + 1)\n", "\n", "Mixture Moment 2\n", " \u239b 2 2\u239e \u239b 2 2\u239e \n", "r\u22c5\u239d\u03bc\u2081 + \u03c3\u2081 \u23a0 + \u239d\u03bc\u2082 + \u03c3\u2082 \u23a0\u22c5(-r + 1)\n", "\n", "Mixture Moment 3\n", " \u239b 2 2\u239e \u239b 2 2\u239e \n", "\u03bc\u2081\u22c5r\u22c5\u239d\u03bc\u2081 + 3\u22c5\u03c3\u2081 \u23a0 + \u03bc\u2082\u22c5\u239d\u03bc\u2082 + 3\u22c5\u03c3\u2082 \u23a0\u22c5(-r + 1)\n", "\n", "Mixture Moment 4\n", " \u239b 4 2 2 4\u239e \u239b 4 2 2 4\u239e\n", "r\u22c5\u239d\u03bc\u2081 + 6\u22c5\u03bc\u2081 \u22c5\u03c3\u2081 + 3\u22c5\u03c3\u2081 \u23a0 + (-r + 1)\u22c5\u239d\u03bc\u2082 + 6\u22c5\u03bc\u2082 \u22c5\u03c3\u2082 + 3\u22c5\u03c3\u2082 \u23a0\n", "\n", "Mixture Moment 5\n", " \u239b 4 2 2 4\u239e \u239b 4 2 2 4\u239e\n", "\u03bc\u2081\u22c5r\u22c5\u239d\u03bc\u2081 + 10\u22c5\u03bc\u2081 \u22c5\u03c3\u2081 + 15\u22c5\u03c3\u2081 \u23a0 + \u03bc\u2082\u22c5(-r + 1)\u22c5\u239d\u03bc\u2082 + 10\u22c5\u03bc\u2082 \u22c5\u03c3\u2082 + 15\u22c5\u03c3\u2082 \u23a0\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "def numerical_moment(a, n):\n", " return np.sum(a ** n) / len(a)\n", "\n", "actual_moments = [numerical_moment(fake_data, i) for i in MOMENTS_USED]\n", "\n", "for (i, m) in zip(MOMENTS_USED, actual_moments):\n", " print('\\nActual Moment {}: {}'.format(i, m))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Actual Moment 1: 3.033243007556774\n", "\n", "Actual Moment 2: 35.429816643767715\n", "\n", "Actual Moment 3: 378.3815897586502\n", "\n", "Actual Moment 4: 4697.59347129467\n", "\n", "Actual Moment 5: 60793.208071695706\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 5 moments, and 5 unknows (r and the 2 parameters for the 2 normal distributions). So now we should be able to use the numerical momemnts to solve for the parameters." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# These should all be solved for zero\n", "equations = [m - value for (m,value) in zip(mixture_moments, actual_moments)]\n", "for (i, m) in zip(MOMENTS_USED, equations):\n", " print('\\nEqn for moment {}'.format(i))\n", " pretty_print(m)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Eqn for moment 1\n", "\u03bc\u2081\u22c5r + \u03bc\u2082\u22c5(-r + 1) - 3.03324300755677\n", "\n", "Eqn for moment 2\n", " \u239b 2 2\u239e \u239b 2 2\u239e \n", "r\u22c5\u239d\u03bc\u2081 + \u03c3\u2081 \u23a0 + \u239d\u03bc\u2082 + \u03c3\u2082 \u23a0\u22c5(-r + 1) - 35.4298166437677\n", "\n", "Eqn for moment 3\n", " \u239b 2 2\u239e \u239b 2 2\u239e \n", "\u03bc\u2081\u22c5r\u22c5\u239d\u03bc\u2081 + 3\u22c5\u03c3\u2081 \u23a0 + \u03bc\u2082\u22c5\u239d\u03bc\u2082 + 3\u22c5\u03c3\u2082 \u23a0\u22c5(-r + 1) - 378.38158975865\n", "\n", "Eqn for moment 4\n", " \u239b 4 2 2 4\u239e \u239b 4 2 2 4\u239e \n", "r\u22c5\u239d\u03bc\u2081 + 6\u22c5\u03bc\u2081 \u22c5\u03c3\u2081 + 3\u22c5\u03c3\u2081 \u23a0 + (-r + 1)\u22c5\u239d\u03bc\u2082 + 6\u22c5\u03bc\u2082 \u22c5\u03c3\u2082 + 3\u22c5\u03c3\u2082 \u23a0 - 4697.593471\n", "\n", " \n", "29467\n", "\n", "Eqn for moment 5\n", " \u239b 4 2 2 4\u239e \u239b 4 2 2 4\u239e \n", "\u03bc\u2081\u22c5r\u22c5\u239d\u03bc\u2081 + 10\u22c5\u03bc\u2081 \u22c5\u03c3\u2081 + 15\u22c5\u03c3\u2081 \u23a0 + \u03bc\u2082\u22c5(-r + 1)\u22c5\u239d\u03bc\u2082 + 10\u22c5\u03bc\u2082 \u22c5\u03c3\u2082 + 15\u22c5\u03c3\u2082 \u23a0 - 6\n", "\n", " \n", "0793.2080716957\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "print(\"True Paramaters for mu_1, sigma_1, mu_2, sigma_2, r: {}\".format(REAL_PARAMS))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True Paramaters for mu_1, sigma_1, mu_2, sigma_2, r: (0, 2, 10, 3, 0.7)\n" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "def solve_numerically(equations, initial_guess):\n", " # If I start off near-ish to the point, then we can solve it numerically.\n", " solved = nsolve(equations, (mu_1, sigma_1, mu_2, sigma_2, r), list(initial_guess))\n", " print('Numeric solution')\n", " pretty_print(solved.T)\n", "\n", " # Blarg. I get arbitrary precesion stuff out and I don't need that\n", " to_float_array = lambda mfp_array: np.array(list(map(float, mfp_array)))\n", "\n", " distance_to_real_array = norm(np.array(REAL_PARAMS) - to_float_array(solved.T))\n", "\n", " # Not sure how I should really be talking about distance...\n", " print('L2 distance from true solution (after sampling {} sample points): {}'.format(\n", " NUM_POINTS, distance_to_real_array ))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we just solve numerically starting at the real parameters, then the error is just the sampling error" ] }, { "cell_type": "code", "collapsed": false, "input": [ "solve_numerically(equations, np.array(REAL_PARAMS))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Numeric solution\n", "[-0.0964001259983833 1.93947261516827 9.45613788320574 3.31424965505908 0.\n", "672375746577542]\n", "L2 distance from true solution (after sampling 5000 sample points): 0.6389510902822756\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we move the initial guess slightly we are fine" ] }, { "cell_type": "code", "collapsed": false, "input": [ "solve_numerically(equations, np.array(REAL_PARAMS) + np.array([.3, .2, .2, .2, 0]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Numeric solution\n", "[-0.0964001259983833 1.93947261516827 9.45613788320574 3.31424965505908 0.\n", "672375746577542]\n", "L2 distance from true solution (after sampling 5000 sample points): 0.6389510902822756\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But then if we get a little farther...." ] }, { "cell_type": "code", "collapsed": false, "input": [ "solve_numerically(equations, np.array(REAL_PARAMS) + np.array([2, 1, -1, 1, .2]))" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "Could not find root within given tolerance. (1278.68 > 2.1684e-19)\nTry another starting point or tweak arguments.", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0msolve_numerically\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mequations\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0marray\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mREAL_PARAMS\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0marray\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m-\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m.2\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m\u001b[0m in \u001b[0;36msolve_numerically\u001b[1;34m(equations, initial_guess)\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0msolve_numerically\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mequations\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minitial_guess\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# If I start off near-ish to the point, then we can solve it numerically.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0msolved\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnsolve\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mequations\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mmu_1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0msigma_1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmu_2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0msigma_2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mr\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minitial_guess\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Numeric solution'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[0mpretty_print\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0msolved\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/justinvf/anaconda3/lib/python3.4/site-packages/sympy/solvers/solvers.py\u001b[0m in \u001b[0;36mnsolve\u001b[1;34m(*args, **kwargs)\u001b[0m\n\u001b[0;32m 2473\u001b[0m \u001b[0mJ\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mlambdify\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfargs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mJ\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmodules\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2474\u001b[0m \u001b[1;31m# solve the system numerically\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2475\u001b[1;33m \u001b[0mx\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mfindroot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mf\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mx0\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mJ\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mJ\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2476\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mx\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2477\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/justinvf/anaconda3/lib/python3.4/site-packages/sympy/mpmath/calculus/optimization.py\u001b[0m in \u001b[0;36mfindroot\u001b[1;34m(ctx, f, x0, solver, tol, verbose, verify, **kwargs)\u001b[0m\n\u001b[0;32m 973\u001b[0m \u001b[1;34m'(%g > %g)\\n'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 974\u001b[0m \u001b[1;34m'Try another starting point or tweak arguments.'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 975\u001b[1;33m % (norm(f(*xl))**2, tol))\n\u001b[0m\u001b[0;32m 976\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mx\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 977\u001b[0m \u001b[1;32mfinally\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mValueError\u001b[0m: Could not find root within given tolerance. (1278.68 > 2.1684e-19)\nTry another starting point or tweak arguments." ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we looked at the data, we could guess some obvious initial guesses for the mus and sigmas. As we had in the histogram above:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.hist(fake_data, bins=50);" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEACAYAAAC57G0KAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFBpJREFUeJzt3V+MXFd9wPHvJrYVgr04q1SO7aS1lcQKllCTSnGrQMUt\nFa6NqO08NAlVqQspiuQqRCBo4kjUAwgwUQN9qMhLAjKBmFpQLOeBECf4quGBWCA7/4wbO4olNo2d\nAgavFZTa2H04Zzyz45ndO//nnv1+pNHcOXNn5uyd2d+c+Z1zzwFJkiRJkiRJkiRJkiRJkqQ57VLg\nAPB4vF0BJmPZAWB93b5bgSPAYWDt4KooSerEJ4FvA3vi7W2xrNFq4CAwH1gBHAUuGUD9JEkNigTf\nq4EPAA8DY7FsrG673kZgJ3AGOEYI8Gu6rqUkqW1FAvxXgU8D5+rKzgN3A88BjwCLY/kyQuqmahJY\n3n01JUntmi3AfxB4g5Bnr2+xPwSsBG4EXgcenOE5zndTQUlSZ+bNcv8twAZCiuYyYBz4JvD3dfs8\nTK3z9TXgmrr7ro5l01x77bXnX3nllQ6rLElz1ivAdf144vdSC+RL68o/ATwWt6udrAsILfxXaJ6r\nP19m27ZtG3YVumL9h6fMdT9/3voPG21mRGZrwdcbq3vyB4A/jrdfBe6K5YeAXfH6LLCl3QpJknqj\nnQCfxwvAh2fY74vxIkkaIseodyDLsmFXoSvWf3jKXHew/mXTLD8+CDGdJEkqamxsDNqI27bgJSlR\nBnhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqUAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBngJSlR\nBnhJSlTRAH8pYeHt6pJ9E8Be4GXgSWBx3b5bgSPAYWBtb6qpXhsfn2BsbOyiy/j4xLCrJqlHigb4\newjL8FUncb+PEOBXAU/H2xDWZL09Xq8DvtbGa2iApqZOEt7O6ZdQLikFRYLv1cAHgIepTTS/AdgR\nt3cAm+L2RmAncAY4BhwF1vSorpKkNhQJ8F8FPg2cqytbApyI2yfibYBlwGTdfpPA8i7rKEnqwGyL\nbn8QeIOQf89a7FP9fd9K0/sqlcqF7SzL5txaiZI0mzzPyfO848fPtrbfF4EPA2eBy4Bx4D+BmwkB\n/ziwFNgH3EAtF789Xj8BbAOebXhe12QdsrC2Y7P3YAzfG2k09XpN1vuBa4CVwB3AjwgBfw+wOe6z\nGdgdt/fE/RbEx1wP7C9aGUlS78yWomlUbdptB3YBdxI6U2+L5Ydi+SFCq38LM6dvJEl9Urip32Om\naIbMFI1UPr1O0UiSSsoAL0mJMsBLUqIM8JKUKAO8JCXKAJ8QZ4iUVM9hkglpPfRxPuG0hEYOk5TK\npN1hku2e6KRSOsvFwXxY3+2SBsUUjSQlygAvSYkywEtSogzwkpQoA7wkJcoAL0mJMsBLUqIM8JKU\nqNkC/GWE9VQPElZp+lIsrwCThMW4DwDr6x6zFTgCHAbW9rCukqQ2FDmd8XLgTcJZrz8GPgX8JTAF\nfKVh39XAY4RFuZcDTwGrgHMN+zlVQR/MtEpT8zNZnapAKpN+rOj0ZrxeAFwKnKy+VpN9NwI7gTOE\ntVqPAmuKVkaS1DtFAvwlhBTNCWAf8FIsvxt4DngEWBzLlhFSN1WThJa8JGnAikw2dg64EXgH8EMg\nAx4CPhfv/zzwIHBni8c3/b1fqVQubGdZRpZlBaoiSXNHnufked7x49udUvAzwO+Af60rWwE8DrwL\nuC+WbY/XTwDbCB219czB94E5eCltvc7BX0kt/fI24P2EUTNX1e1zK/BC3N4D3EHI168Ergf2F62M\nJKl3ZkvRLAV2EL4ILgEeBZ4GvklI25wHXgXuivsfAnbF67PAFlqkaDSq5lVbCRcsWnQFp079ekj1\nkdQpV3RKSK9SNM329f2Shq8fwyQlSSVkgJekRBngJSlRBnhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqU\nAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBngJSlRBviSGh+fYGxsbNpFkurNFuAvI6ynepCw\nStOXYvkEsBd4GXiS2rJ+AFuBI8BhYG0vK6uaqamThIU56i+SVFOk2Xc58CZheb8fA58CNgC/BB4A\n7gWuICy4vRp4DLgZWA48BawCzjU8pys6dan56k2u6CSlrB8rOr0ZrxcAlwInCQF+RyzfAWyK2xuB\nncAZ4BhwFFhTtDKSpN4pEuAvIaRoTgD7gJeAJfE28XpJ3F4GTNY9dpLQkpckDdi8AvucA24E3gH8\nEPiLhvtnSwA3va9SqVzYzrKMLMsKVEWS5o48z8nzvOPHtzv04jPA74B/BDLgOLCU0LK/gZCHB9ge\nr58AthE6auuZg+/SYHPw84GzF+25aNEVnDr16yLVldQDvc7BX0lthMzbgPcDB4A9wOZYvhnYHbf3\nAHcQ8vUrgeuB/UUro1F1lotH7JyPI3kkjarZUjRLCZ2ol8TLo8DThCC/C7iT0Jl6W9z/UCw/RIgK\nW3D8niQNxbDOjjFF06VBD5Nsta/vozQ4/RgmKUkqIQO8JCXKAC9JiTLAS1KiDPCSlCgDvCQlygAv\nSYkywEtSogzwkpQoA/yIa7Y0n8vzSSrCqQpGXPMpCaCf0w84VYE0mpyqQJIEGOAlKVkGeElKlAFe\nkhJVJMBfQ22x7ReBj8fyCmFR7QPxsr7uMVuBI8BhYG2P6ipJakOR3tir4uUgsBD4GbCJsIrTFPCV\nhv1XA48BNwPLgaeAVYTFu6scRVOQo2gkVfVjFM1xQnAHOA38nBC4W73QRmAncIawnN9RYE3RCkmS\neqPdHPwK4CbgJ/H23cBzwCPUFudeRkjdVE1S+0KQJA1IOwF+IfBd4B5CS/4hYCVwI/A68OAMj/V3\nvCQN2LyC+80Hvgd8C9gdy96ou/9h4PG4/RqhY7bq6lg2TaVSubCdZRlZlhWsiiTNDXmek+d5x48v\nkqwfA3YAvwI+UVe+lNByJ5bfDPwttU7WNdQ6Wa9jeiveTtaC7GSVVNVuJ2uRFvy7gb8DnicMhwS4\nH/gQIT1zHngVuCvedwjYFa/PAlswRSNJA+dkYyPOFrykKicbkyQBBnhJSpYBXpISZYCXpEQZ4CUp\nUQZ4SUqUAV6SEmWAl6REGeAlKVEGeElKlAFekhJlgJekRBng1YV5jI2NTbuMj08Mu1KSImeTHHGj\nPptks319b6X+cDZJSRJggJekZBUJ8NcA+4CXgBeBj8fyCWAv8DLwJLC47jFbgSPAYWBtryorSSqu\nSC7nqng5CCwEfgZsAj4C/BJ4ALgXuAK4j9qarDdTW5N1FXCu7jnNwRdkDl5SVT9y8McJwR3gNPBz\nQuDeQFiMm3i9KW5vBHYCZ4BjwFHCAtySpAFqNwe/ArgJeBZYApyI5SfibYBlwGTdYyYJXwiSpAFq\nJ8AvBL4H3ANMNdx3nua/4evvlyQN0LyC+80nBPdHgd2x7AQhN38cWAq8EctfI3TMVl0dy6apVCoX\ntrMsI8uy4rWWpDkgz3PyPO/48UWS9WOEHPuvgE/UlT8Qy75M6FxdzPRO1jXUOlmvY3or3k7Wguxk\nlVTVbidrkR3fA/wX8Dy1/+atwH5gF/CHhM7U24DfxPvvBz4KnCWkdH7Y8JwG+IIM8JKq+hHg+8EA\nX5ABXlKVUxVIkgADvCQlywAvSYkywEtSogzwI2R8fOKiBTQkqVOOohkhzUfMjMbIGEfRSMPnKBpJ\nEmCAV89dvE6ra7VKw1F0LhqpoLM0S+dMTdmfIA2aLXhJSpQBXpISZYCXpEQZ4CUpUQZ4SUqUAV6S\nEmWAl6REFQnwXyesv/pCXVkFmAQOxMv6uvu2AkeAw8DantRSktS2IgH+G8C6hrLzwFeAm+LlB7F8\nNXB7vF4HfK3ga0iSeqxI8H0GONmkvNmpiRuBncAZwjqtRwmLb0uSBqyb1vXdwHPAI8DiWLaMkLqp\nmgSWd/EakqQOdToXzUPA5+L254EHgTtb7Nt07thKpXJhO8sysizrsCqSlKY8z8nzvOPHF50BagXw\nOPCuWe67L5Ztj9dPANuAZxse43zwTaQyH3yrfX3Ppe4Maj74pXXbt1IbYbMHuANYAKwErgf2d/ga\nkqQuFEnR7ATeC1wJ/ILQIs+AGwlNtVeBu+K+h4Bd8fossIUWKRpJUn+5ZN8IMUUjaSYu2SdJAgzw\nkpQsA7wkJcoALw3Q+PiEC5JrYOxkHSF2sqav1XvssVERdrJKkgADvCQlywAvSYkywEtSogzwUpcc\nGaNR5SiaEeIomnJqZ2SMo2jUDUfRSH3SrKUe/+GkkWSAlwqamjpJaH03Xro1r+kXh2kedavTFZ0k\n9cxZmn1RTE3560DdsQU/BP7UlzQItuCHoPZTv5FBXlLvFGnBfx04QW1ZPoAJYC/wMvAksLjuvq3A\nEeAwsLY31ZQGq9mvLKlsigT4bwDrGsruIwT4VcDT1BbbXg3cHq/XAV8r+BrSSGneoSqVS5Hg+wxw\nsqFsA7Ajbu8ANsXtjYQ1XM8Ax4CjwJquaymVTvORMdIgddq6XkJI2xCvl8TtZcBk3X6TwPIOX0NJ\nuTjgpT0MsDoyxl8BGp5edLLO9sltel+lUrmwnWUZWZb1oCoaXRcPBXQYoDSzPM/J87zjxxf9D1sB\nPA68K94+DGTAcWApsA+4gVoufnu8fgLYBjzb8HxzeqqC5qerw6hMKTDIfUf1czAq00aM6vHRcAxq\nqoI9wOa4vRnYXVd+B7AAWAlcD+zv8DUkSV0okqLZCbwXuBL4BfAvhBb6LuBOQmfqbXHfQ7H8EOE3\n+RZMPErSUDib5BCYoqmVjernwBSNRpGzSUrJmGsjj9RrTlUgjSxHHqk7tuAlKVEGeCkBLhuoZgzw\nGqLhL3SRytTNzebOCWWayxxF00fj4xMz/JMNfwTLKO87qM9HGUc0udbr3NXuKBo7WfvIed8lDZMB\nXiqVeaVMIWk4DPBSqTRfv9VfhWrGTlZJSpQBXpISZYDXnOE6q5prDPAqhVbj1dsZM+86q5pr7GRV\nKbQacurcLFJrtuAlKVEGeClZw58KQsPVbYrmGHAK+D1wBlgDTAD/AfwRtdWeftPl60hqW/Mx86a1\n5o5uW/DnCYtv30QI7hAW3t4LrAKeprYQtyRpgHqRomlsDmwAdsTtHcCmHrzGyHMInqRR04sW/FPA\nT4GPxbIlwIm4fSLeTp5D8FQeLgU4V3Sbg3838DrwB4S0zOGG+1tGukqlcmE7yzKyLOuyKkpHdxNq\nzTxNs1wKsDzyPCfP844f38t3dRtwmtCSz4DjwFJgH3BDw77JzQffaj7u4vN8typ339n2bfwslXGO\n91HYN7X/yRS1Ox98Nymay4FFcfvtwFrgBWAPsDmWbwZ2d/EakqQOdZOiWQJ8v+55vg08ScjH7wLu\npDZMUuoT50eXWnHJvh4xReO+Zd83tf/JFA0yRSNJGmEGeElKlAFekhJlgJeEE5OlyfngJeHEZGmy\nBS9pBk5rUGa24CXNwGkNyswWfJtarQ0qSaPGFnybWq0NOrxzxiSpOVvwkpQoA7wkJcoAL0mJMsBL\nalPzk6LGxhYULHOo5aDYySqpTc1PimpnRkuHWg6GLXhJSlS/Avw6wvqsR4B7+/QakqQZ9CPAXwr8\nOyHIrwY+BLyzD68zRPmwK9ClfNgV6FI+7ArMYXmPnmc4UyB0s4B1GfUjwK8BjhKW6zsDfAfY2IfX\n6ZlWZ6c26yAK8mFWtwfyYVegS/mwKzCH5T16nmoev3YJJxH211wL8P3oZF0O/KLu9iTwp314nZbe\neustbrvtH/jtb9+cVj42Bl/4wr3ccsst08pnPju1WaeRpN5rtb7ufEJbsWbRois4derXF+05Pj5x\n0RdFq33ngn4E+KEv7Hj69Gn27PlO0/ve974f8dZbpwdcI0mzKz46Z2pq/gxzQM2872c/+1lgbnxJ\n9KM5+mdAhZCDB9gKnAO+XLfPUeDaPry2JKXsFeC6YVZgXqzECmABcJDkOlklae5aD/w3oaW+dch1\nkSRJktSpvwFeAn4P/EnDfVsJJ0YdBtYOuF6dqBBGCB2Il3Uz7j0ayn4C2jHgecLx3j/cqhTydeAE\n8EJd2QSwF3gZeBJYPIR6FdWs/hXK8bm/BthHiDcvAh+P5WU5/q3qX2GEj/8NwCpCxesD/GpCrn4+\nIXd/lNGfRmEb8MlhV6INlxKO6wrCcS5j38irhH/Qsvhz4CamB8gHgH+O2/cC2wddqTY0q39ZPvdX\nATfG7YWElPE7Kc/xb1X/to7/oIPoYcI3Z6ONwE7CYNdjhEC0ZnDV6liZBsWX7gS0Fsp0zJ8BGs/e\n2QDsiNs7gE0DrVF7mtUfyvEeHCc0YgBOAz8nnKNTluPfqv7QxvEflVbyMsLPjqpJan/MKLsbeA54\nhNH9qVfV7AS0MhzjeueBp4CfAh8bcl06tYSQ9iBeLxliXTpVps89hF+tNwHPUs7jv4JQ/5/E24WP\nfz8C/F7CT7rGy1+3+TxDP2GK1n/LBuAhYCXhZ9TrwINDqmNRo3A8u/Vuwgd9PfBPhBRCmVXP0y+T\nsn3uFwLfA+4BphruK8PxXwh8l1D/07R5/PtxJuv7O3jMa4ROhaqrY9mwFf1bHgYe72dFeqDxGF/D\n9F9NZfB6vP5f4PuEtNMzw6tOR04Q8qvHgaXAG8OtTtvq6zvqn/v5hOD+KLA7lpXp+Ffr/y1q9W/r\n+A8zRVOfR9oD3EE4MWolcD2jP0piad32rUzviBpFPyUc1xWE43w74biXxeXAorj9dsJIq1E/5s3s\nATbH7c3U/nHLoiyf+zFCCuMQ8G915WU5/q3qP9LH/1ZCHvh3hG/QH9Tddz+hE/Aw8FeDr1rbvkkY\nsvcc4UNShlxemU9AW0nodDpIGDZWhvrvBP4H+D/C5/4jhFFATzH6w/Tg4vp/lPJ87t9DmCLlINOH\nFJbl+Der/3rKc/wlSZIkSZIkSZIkSZIkSZIkSZIkKfh/26TvI8xqY0YAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The left hump looks a little move massive, so I would guess .7 for $r$. Then $\\mu_1$ looks to be around 0, $\\mu_2$ around 10. The I would guess a larger standard deviation for the second." ] }, { "cell_type": "code", "collapsed": false, "input": [ "solve_numerically(equations, np.array([0, 3, 10, 5, .7]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Numeric solution\n", "[-0.0964001259983833 1.93947261516827 9.45613788320574 3.31424965505908 0.\n", "672375746577542]\n", "L2 distance from true solution (after sampling 5000 sample points): 0.6389510902822756\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And that works... But it's not satisfying. I should read more papers about this :).\n", "\n", "My main questions left from this exercise (many probably probably obvious, I know like zero stats):\n", "\n", "1. What distance metric is appropriate when talking about the distance between distribution parameters.\n", "2. How does the number of gaussians increase the difficulty for the numerical solver.\n", "3. I could probably pick means by doing a sliding window over the points. As long as the gaussians have some distance it shouldn't be too hard to get good guesses for standard deviation either.\n", "4. I just picked the first 5 moments. I could have used later moments (the MOMENTS_USED constant). Using higher order moments increases the error on the parameter estimates though (4x for using 2-6 as opposed to 1-5). If this technique were extended to 3 gaussians though, it would necesitate using these higher order moments. That seems damming.\n", "5. How will NUM_POINTS affect the error?\n", "6. Could the central moments be of use? How do the central moments of the individual gaussians relate the central moments of the full distribution?" ] } ], "metadata": {} } ] }