{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solution-05\n", "\n", "This notebook contains the solution to the Bayesian model of 47 Ursae Majoris.\n", "\n", "Your task to find the orbital parameters of the first planet in this system, using data from table 1 of [Fischer et al 2002](http://iopscience.iop.org/article/10.1086/324336/meta).\n", "\n", "The following IPython magic command will create the data file ``47UrsaeMajoris.txt`` in the current directory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%file 47UrsaeMajoris.txt\n", "date rv rv_err\n", "6959.737 -60.48 14.00\n", "7194.912 -53.60 7.49\n", "7223.798 -38.36 6.14\n", "7964.893 0.60 8.19\n", "8017.730 -28.29 10.57\n", "8374.771 -40.25 9.37\n", "8647.897 42.37 11.41\n", "8648.910 32.64 11.02\n", "8670.878 55.45 11.45\n", "8745.691 51.78 8.76\n", "8992.061 4.49 11.21\n", "9067.771 -14.63 7.00\n", "9096.734 -26.06 6.79\n", "9122.691 -47.38 7.91\n", "9172.686 -38.22 10.55\n", "9349.912 -52.21 9.52\n", "9374.964 -48.69 8.67\n", "9411.839 -36.01 12.81\n", "9481.720 -52.46 13.40\n", "9767.918 38.58 5.48\n", "9768.908 36.68 5.02\n", "9802.789 37.93 3.85\n", "10058.079 15.82 3.45\n", "10068.980 15.46 4.63\n", "10072.012 21.20 4.09\n", "10088.994 1.30 4.25\n", "10089.947 6.12 3.70\n", "10091.900 0.00 4.16\n", "10120.918 4.07 4.16\n", "10124.905 0.29 3.74\n", "10125.823 -1.87 3.79\n", "10127.898 -0.68 4.10\n", "10144.877 -4.13 5.26\n", "10150.797 -8.14 4.18\n", "10172.829 -10.79 4.43\n", "10173.762 -9.33 5.43\n", "10181.742 -23.87 3.28\n", "10187.740 -16.70 4.67\n", "10199.730 -16.29 3.98\n", "10203.733 -21.84 4.92\n", "10214.731 -24.51 3.67\n", "10422.018 -56.63 4.23\n", "10438.001 -39.61 3.91\n", "10442.027 -44.62 4.05\n", "10502.853 -32.05 4.69\n", "10504.859 -39.08 4.65\n", "10536.845 -22.46 5.18\n", "10537.842 -22.83 4.16\n", "10563.673 -17.47 4.03\n", "10579.697 -11.01 3.84\n", "10610.719 -8.67 3.52\n", "10793.957 37.00 3.78\n", "10795.039 41.85 4.80\n", "10978.684 36.42 5.01\n", "11131.066 13.56 6.61\n", "11175.027 -3.74 8.17\n", "11242.842 -21.85 5.43\n", "11303.712 -48.75 4.63\n", "11508.070 -51.65 8.37\n", "11536.064 -72.44 4.73\n", "11540.999 -57.58 5.97\n", "11607.916 -43.94 4.94\n", "11626.771 -39.14 7.03\n", "11627.754 -50.88 6.21\n", "11628.727 -51.52 5.87\n", "11629.832 -51.86 4.60\n", "11700.693 -24.58 5.20\n", "11861.049 14.64 5.33\n", "11874.068 14.15 5.75\n", "11881.045 18.02 4.15\n", "11895.068 16.96 4.60\n", "11906.014 11.73 4.07\n", "11907.011 22.83 4.38\n", "11909.042 23.42 3.78\n", "11910.955 18.34 4.33\n", "11914.067 15.45 5.37\n", "11915.048 24.05 3.82\n", "11916.033 23.16 3.67\n", "11939.969 27.53 5.08\n", "11946.960 21.44 4.18\n", "11969.902 30.99 4.58\n", "11971.894 38.36 5.01\n", "11998.779 33.82 3.93\n", "11999.820 27.52 3.98\n", "12000.858 23.40 4.07\n", "12028.740 37.08 4.95\n", "12033.746 26.28 5.24\n", "12040.759 31.12 3.54\n", "12041.719 34.04 3.45\n", "12042.695 31.38 3.98\n", "12073.723 21.81 4.73" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But first, we'll copy some relevant material from the first notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn; seaborn.set() #nice plot formatting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Radial Velocity Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from collections import namedtuple\n", "params = namedtuple('params', ['T', 'e', 'K', 'V', 'omega', 'chi', 's'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from scipy import optimize\n", "\n", "@np.vectorize\n", "def compute_E(M, e):\n", " \"\"\"Solve Kepler's eqns for eccentric anomaly given mean anomaly\"\"\"\n", " f = lambda E, M=M, e=e: E - e * np.sin(E) - M\n", " return optimize.brentq(f, 0, 2 * np.pi)\n", "\n", "\n", "def radial_velocity(t, theta):\n", " \"\"\"Compute radial velocity given orbital parameters\"\"\"\n", " T, e, K, V, omega, chi = theta[:6]\n", " \n", " # compute mean anomaly (0 <= M < 2pi)\n", " M = 2 * np.pi * ((t / T + chi) % 1)\n", " \n", " # solve for eccentric anomaly\n", " E = compute_E(M, e)\n", " \n", " # compute true anomaly\n", " f = 2 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),\n", " np.sqrt(1 - e) * np.cos(E / 2))\n", " \n", " # compute radial velocity\n", " return V - K * (np.sin(f + omega) + e * np.sin(omega))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Prior, Likelihood, and Posterior" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "theta_lim = params(T=(0.2, 2000),\n", " e=(0, 1),\n", " K=(0.01, 2000),\n", " V=(-2000, 2000),\n", " omega=(0, 2 * np.pi),\n", " chi=(0, 1),\n", " s=(0.001, 100))\n", "theta_min, theta_max = map(np.array, zip(*theta_lim))\n", "\n", "def log_prior(theta):\n", " if np.any(theta < theta_min) or np.any(theta > theta_max):\n", " return -np.inf # log(0)\n", " \n", " # Jeffreys Prior on T, K, and s\n", " return -np.sum(np.log(theta[[0, 2, 6]]))\n", "\n", "def log_likelihood(theta, t, rv, rv_err):\n", " sq_err = rv_err ** 2 + theta[6] ** 2\n", " rv_model = radial_velocity(t, theta)\n", " return -0.5 * np.sum(np.log(sq_err) + (rv - rv_model) ** 2 / sq_err)\n", "\n", "def log_posterior(theta, t, rv, rv_err):\n", " ln_prior = log_prior(theta)\n", " if np.isinf(ln_prior):\n", " return ln_prior\n", " else:\n", " return ln_prior + log_likelihood(theta, t, rv, rv_err)\n", " \n", "def make_starting_guess(t, rv, rv_err):\n", " model = LombScargleFast()\n", " model.optimizer.set(period_range=theta_lim.T,\n", " quiet=True)\n", " model.fit(t, rv, rv_err)\n", "\n", " rv_range = 0.5 * (np.max(rv) - np.min(rv))\n", " rv_center = np.mean(rv)\n", " return params(T=model.best_period,\n", " e=0.1,\n", " K=rv_range,\n", " V=rv_center,\n", " omega=np.pi,\n", " chi=0.5,\n", " s=rv_err.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After running the above command, your data will be in a file called ``47UrsaeMajoris.txt``.\n", "\n", "An easy way to load data in this form is the [pandas](http://pandas.pydata.org) package, which implements a DataFrame object (basically, a labeled data table).\n", "Reading the CSV file is a one-line operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "data = pd.read_csv('47UrsaeMajoris.txt', delim_whitespace=True)\n", "t, rv, rv_err = data.values.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Start by Visualizing the Data\n", "plt.errorbar(t, rv, rv_err, fmt='.');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute the Periodogram" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Compute the periodogram to look for significant periodicity\n", "\n", "from gatspy.periodic import LombScargleFast\n", "model = LombScargleFast()\n", "\n", "model.fit(t, rv, rv_err)\n", "periods, power = model.periodogram_auto()\n", "plt.semilogx(periods, power)\n", "plt.xlim(0, 10000);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize and run the MCMC Chain" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "theta_guess = make_starting_guess(t, rv, rv_err)\n", "theta_guess" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import emcee\n", "\n", "ndim = len(theta_guess) # number of parameters in the model\n", "nwalkers = 50 # number of MCMC walkers\n", "\n", "# start with a tight distribution of theta around the initial guess\n", "rng = np.random.RandomState(42)\n", "starting_guesses = theta_guess * (1 + 0.1 * rng.randn(nwalkers, ndim))\n", "\n", "# create the sampler; fix the random state for replicability\n", "sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(t, rv, rv_err))\n", "sampler.random_state = rng\n", "\n", "# time and run the MCMC\n", "%time pos, prob, state = sampler.run_mcmc(starting_guesses, 1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the chains: have they stabilized?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def plot_chains(sampler):\n", " fig, ax = plt.subplots(7, figsize=(8, 10), sharex=True)\n", " for i in range(7):\n", " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);\n", " ax[i].set_ylabel(params._fields[i])\n", "\n", "plot_chains(sampler)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sampler.reset()\n", "%time pos, prob, state = sampler.run_mcmc(pos, 1000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plot_chains(sampler)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a Corner Plot for the Parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import corner\n", "corner.corner(sampler.flatchain[:, :2], \n", " labels=params._fields[:2]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the Model Fits over the Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t_fit = np.linspace(t.min(), t.max(), 1000)\n", "rv_fit = [radial_velocity(t_fit, sampler.flatchain[i])\n", " for i in rng.choice(sampler.flatchain.shape[0], 200)]\n", "\n", "plt.figure(figsize=(14, 6))\n", "plt.errorbar(t, rv, rv_err, fmt='.k')\n", "plt.plot(t_fit, np.transpose(rv_fit), '-k', alpha=0.01)\n", "plt.xlabel('time (days)')\n", "plt.ylabel('radial velocity (km/s)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Results: Report your (joint) uncertainties on period and eccentricity" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mean = sampler.flatchain.mean(0)\n", "std = sampler.flatchain.std(0)\n", "\n", "print(\"Period = {0:.0f} +/- {1:.0f} days\".format(mean[0], std[0]))\n", "print(\"eccentricity = {0:.2f} +/- {1:.2f}\".format(mean[1], std[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, a simple mean and standard deviation is probably not the best summary of the data, because the posterior has multiple modes. We could isolate the strongest mode and then get a better estimate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "chain = sampler.flatchain[sampler.flatchain[:, 0] > 1050]\n", "mean = chain.mean(0)\n", "std = chain.std(0)\n", "\n", "print(\"Period = {0:.0f} +/- {1:.0f} days\".format(mean[0], std[0]))\n", "print(\"eccentricity = {0:.2f} +/- {1:.2f}\".format(mean[1], std[1]))\n", "corner.corner(chain[:, :2], \n", " labels=params._fields[:2]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a better representation of the parameter estimates for the most prominant peak." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extra Credit\n", "\n", "If you finish early, try tackling this...\n", "\n", "One reason we see multiple modes in the posterior is that there are actually multiple planets in the system! For example, the source of the above data is a paper which actually reports *two* detected planets.\n", "\n", "Build a Bayesian model which models two planets simultaneously: can you find signals from both planets in the data?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }