{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "099b2332",
   "metadata": {},
   "source": [
    "(joint)=\n",
    "\n",
    "# Joint RV & transit fits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bce60221-8b9e-4049-8a2d-1a3b560d59af",
   "metadata": {},
   "outputs": [],
   "source": [
    "import exoplanet\n",
    "\n",
    "exoplanet.utils.docs_setup()\n",
    "print(f\"exoplanet.__version__ = '{exoplanet.__version__}'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a11901fe",
   "metadata": {},
   "source": [
    "In this tutorial, we will combine many of the previous tutorials to perform a fit of the K2-24 system using the K2 transit data and the RVs from [Petigura et al. (2016)](https://arxiv.org/abs/1511.04497).\n",
    "This is the same system that we fit in {ref}`rv` and we'll combine that model with the transit model from {ref}`transit` and the Gaussian Process noise model from {ref}`stellar-variability`.\n",
    "\n",
    "## Datasets and initializations\n",
    "\n",
    "To get started, let's download the relevant datasets.\n",
    "First, the transit light curve from [Everest](https://rodluger.github.io/everest/):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c497b4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from astropy.io import fits\n",
    "from scipy.signal import savgol_filter\n",
    "\n",
    "# Download the data\n",
    "lc_url = \"https://archive.stsci.edu/hlsps/everest/v2/c02/203700000/71098/hlsp_everest_k2_llc_203771098-c02_kepler_v2.0_lc.fits\"\n",
    "with fits.open(lc_url) as hdus:\n",
    "    lc = hdus[1].data\n",
    "    lc_hdr = hdus[1].header\n",
    "\n",
    "# Work out the exposure time\n",
    "texp = lc_hdr[\"FRAMETIM\"] * lc_hdr[\"NUM_FRM\"]\n",
    "texp /= 60.0 * 60.0 * 24.0\n",
    "\n",
    "# Mask bad data\n",
    "m = (\n",
    "    (np.arange(len(lc)) > 100)\n",
    "    & np.isfinite(lc[\"FLUX\"])\n",
    "    & np.isfinite(lc[\"TIME\"])\n",
    ")\n",
    "bad_bits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17]\n",
    "qual = lc[\"QUALITY\"]\n",
    "for b in bad_bits:\n",
    "    m &= qual & 2 ** (b - 1) == 0\n",
    "\n",
    "# Convert to parts per thousand\n",
    "x = lc[\"TIME\"][m]\n",
    "y = lc[\"FLUX\"][m]\n",
    "mu = np.median(y)\n",
    "y = (y / mu - 1) * 1e3\n",
    "\n",
    "# Identify outliers\n",
    "m = np.ones(len(y), dtype=bool)\n",
    "for i in range(10):\n",
    "    y_prime = np.interp(x, x[m], y[m])\n",
    "    smooth = savgol_filter(y_prime, 101, polyorder=3)\n",
    "    resid = y - smooth\n",
    "    sigma = np.sqrt(np.mean(resid**2))\n",
    "    m0 = np.abs(resid) < 3 * sigma\n",
    "    if m.sum() == m0.sum():\n",
    "        m = m0\n",
    "        break\n",
    "    m = m0\n",
    "\n",
    "# Only discard positive outliers\n",
    "m = resid < 3 * sigma\n",
    "\n",
    "# Shift the data so that the K2 data start at t=0. This tends to make the fit\n",
    "# better behaved since t0 covaries with period.\n",
    "x_ref = np.min(x[m])\n",
    "x -= x_ref\n",
    "\n",
    "# Plot the data\n",
    "plt.plot(x, y, \"k\", label=\"data\")\n",
    "plt.plot(x, smooth)\n",
    "plt.plot(x[~m], y[~m], \"xr\", label=\"outliers\")\n",
    "plt.legend(fontsize=12)\n",
    "plt.xlim(x.min(), x.max())\n",
    "plt.xlabel(\"time\")\n",
    "plt.ylabel(\"flux\")\n",
    "\n",
    "# Make sure that the data type is consistent\n",
    "x = np.ascontiguousarray(x[m], dtype=np.float64)\n",
    "y = np.ascontiguousarray(y[m], dtype=np.float64)\n",
    "smooth = np.ascontiguousarray(smooth[m], dtype=np.float64)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcc719d3",
   "metadata": {},
   "source": [
    "Then the RVs from [RadVel](https://radvel.readthedocs.io):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f71b0e0f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "url = \"https://raw.githubusercontent.com/California-Planet-Search/radvel/master/example_data/epic203771098.csv\"\n",
    "data = pd.read_csv(url, index_col=0)\n",
    "\n",
    "# Don't forget to remove the time offset from above!\n",
    "x_rv = np.array(data.t) - x_ref\n",
    "y_rv = np.array(data.vel)\n",
    "yerr_rv = np.array(data.errvel)\n",
    "\n",
    "plt.errorbar(x_rv, y_rv, yerr=yerr_rv, fmt=\".k\")\n",
    "plt.xlabel(\"time [days]\")\n",
    "_ = plt.ylabel(\"radial velocity [m/s]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3c2eb34",
   "metadata": {},
   "source": [
    "We can initialize the transit parameters using [the box least squares periodogram from AstroPy](http://docs.astropy.org/en/latest/timeseries/bls.html).\n",
    "(Note: you'll need AstroPy v3.1 or more recent to use this feature.)\n",
    "A full discussion of transit detection and vetting is beyond the scope of this tutorial so let's assume that we know that there are two periodic transiting planets in this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3721d715",
   "metadata": {},
   "outputs": [],
   "source": [
    "from astropy.timeseries import BoxLeastSquares\n",
    "\n",
    "m = np.zeros(len(x), dtype=bool)\n",
    "period_grid = np.exp(np.linspace(np.log(5), np.log(50), 50000))\n",
    "bls_results = []\n",
    "periods = []\n",
    "t0s = []\n",
    "depths = []\n",
    "\n",
    "# Compute the periodogram for each planet by iteratively masking out\n",
    "# transits from the higher signal to noise planets. Here we're assuming\n",
    "# that we know that there are exactly two planets.\n",
    "for i in range(2):\n",
    "    bls = BoxLeastSquares(x[~m], y[~m] - smooth[~m])\n",
    "    bls_power = bls.power(period_grid, 0.1, oversample=20)\n",
    "    bls_results.append(bls_power)\n",
    "\n",
    "    # Save the highest peak as the planet candidate\n",
    "    index = np.argmax(bls_power.power)\n",
    "    periods.append(bls_power.period[index])\n",
    "    t0s.append(bls_power.transit_time[index])\n",
    "    depths.append(bls_power.depth[index])\n",
    "\n",
    "    # Mask the data points that are in transit for this candidate\n",
    "    m |= bls.transit_mask(x, periods[-1], 0.5, t0s[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c82979e",
   "metadata": {},
   "source": [
    "Let's plot the initial transit estimates based on these periodograms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b1f176c",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(len(bls_results), 2, figsize=(15, 10))\n",
    "\n",
    "for i in range(len(bls_results)):\n",
    "    # Plot the periodogram\n",
    "    ax = axes[i, 0]\n",
    "    ax.axvline(np.log10(periods[i]), color=\"C1\", lw=5, alpha=0.8)\n",
    "    ax.plot(np.log10(bls_results[i].period), bls_results[i].power, \"k\")\n",
    "    ax.annotate(\n",
    "        \"period = {0:.4f} d\".format(periods[i]),\n",
    "        (0, 1),\n",
    "        xycoords=\"axes fraction\",\n",
    "        xytext=(5, -5),\n",
    "        textcoords=\"offset points\",\n",
    "        va=\"top\",\n",
    "        ha=\"left\",\n",
    "        fontsize=12,\n",
    "    )\n",
    "    ax.set_ylabel(\"bls power\")\n",
    "    ax.set_yticks([])\n",
    "    ax.set_xlim(np.log10(period_grid.min()), np.log10(period_grid.max()))\n",
    "    if i < len(bls_results) - 1:\n",
    "        ax.set_xticklabels([])\n",
    "    else:\n",
    "        ax.set_xlabel(\"log10(period)\")\n",
    "\n",
    "    # Plot the folded transit\n",
    "    ax = axes[i, 1]\n",
    "    p = periods[i]\n",
    "    x_fold = (x - t0s[i] + 0.5 * p) % p - 0.5 * p\n",
    "    m = np.abs(x_fold) < 0.4\n",
    "    ax.plot(x_fold[m], y[m] - smooth[m], \".k\")\n",
    "\n",
    "    # Overplot the phase binned light curve\n",
    "    bins = np.linspace(-0.41, 0.41, 32)\n",
    "    denom, _ = np.histogram(x_fold, bins)\n",
    "    num, _ = np.histogram(x_fold, bins, weights=y - smooth)\n",
    "    denom[num == 0] = 1.0\n",
    "    ax.plot(0.5 * (bins[1:] + bins[:-1]), num / denom, color=\"C1\")\n",
    "\n",
    "    ax.set_xlim(-0.4, 0.4)\n",
    "    ax.set_ylabel(\"relative flux [ppt]\")\n",
    "    if i < len(bls_results) - 1:\n",
    "        ax.set_xticklabels([])\n",
    "    else:\n",
    "        ax.set_xlabel(\"time since transit\")\n",
    "\n",
    "_ = fig.subplots_adjust(hspace=0.02)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af40df2b",
   "metadata": {},
   "source": [
    "The discovery paper for K2-24 ([Petigura et al. (2016)](https://arxiv.org/abs/1511.04497)) includes the following estimates of the stellar mass and radius in Solar units:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8890b6d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "M_star_petigura = 1.12, 0.05\n",
    "R_star_petigura = 1.21, 0.11"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e2904ad",
   "metadata": {},
   "source": [
    "Finally, using this stellar mass, we can also estimate the minimum masses of the planets given these transit parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ee60780",
   "metadata": {},
   "outputs": [],
   "source": [
    "import exoplanet as xo\n",
    "import astropy.units as u\n",
    "\n",
    "msini = xo.estimate_minimum_mass(\n",
    "    periods, x_rv, y_rv, yerr_rv, t0s=t0s, m_star=M_star_petigura[0]\n",
    ")\n",
    "msini = msini.to(u.M_earth)\n",
    "print(msini)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c376919",
   "metadata": {},
   "source": [
    "## A joint transit and radial velocity model in PyMC3\n",
    "\n",
    "Now, let's define our full model in *PyMC3*.\n",
    "There's a lot going on here, but I've tried to comment it and most of it should be familiar from the other tutorials and case studies.\n",
    "In this case, I've put the model inside a model \"factory\" function because we'll do some sigma clipping below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8000897c",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "import pymc3 as pm\n",
    "import aesara_theano_fallback.tensor as tt\n",
    "\n",
    "import pymc3_ext as pmx\n",
    "from celerite2.theano import terms, GaussianProcess\n",
    "\n",
    "# These arrays are used as the times/phases where the models are\n",
    "# evaluated at higher resolution for plotting purposes\n",
    "t_rv = np.linspace(x_rv.min() - 5, x_rv.max() + 5, 500)\n",
    "phase_lc = np.linspace(-0.3, 0.3, 100)\n",
    "\n",
    "\n",
    "def build_model(mask=None, start=None):\n",
    "    if mask is None:\n",
    "        mask = np.ones(len(x), dtype=bool)\n",
    "    with pm.Model() as model:\n",
    "        # Parameters for the stellar properties\n",
    "        mean_flux = pm.Normal(\"mean_flux\", mu=0.0, sd=10.0)\n",
    "        u_star = xo.QuadLimbDark(\"u_star\")\n",
    "        star = xo.LimbDarkLightCurve(u_star)\n",
    "        BoundedNormal = pm.Bound(pm.Normal, lower=0, upper=3)\n",
    "        m_star = BoundedNormal(\n",
    "            \"m_star\", mu=M_star_petigura[0], sd=M_star_petigura[1]\n",
    "        )\n",
    "        r_star = BoundedNormal(\n",
    "            \"r_star\", mu=R_star_petigura[0], sd=R_star_petigura[1]\n",
    "        )\n",
    "\n",
    "        # Orbital parameters for the planets\n",
    "        t0 = pm.Normal(\"t0\", mu=np.array(t0s), sd=1, shape=2)\n",
    "        log_m_pl = pm.Normal(\"log_m_pl\", mu=np.log(msini.value), sd=1, shape=2)\n",
    "        log_period = pm.Normal(\"log_period\", mu=np.log(periods), sd=1, shape=2)\n",
    "\n",
    "        # Fit in terms of transit depth (assuming b<1)\n",
    "        b = pm.Uniform(\"b\", lower=0, upper=1, shape=2)\n",
    "        log_depth = pm.Normal(\n",
    "            \"log_depth\", mu=np.log(depths), sigma=2.0, shape=2\n",
    "        )\n",
    "        ror = pm.Deterministic(\n",
    "            \"ror\",\n",
    "            star.get_ror_from_approx_transit_depth(\n",
    "                1e-3 * tt.exp(log_depth), b\n",
    "            ),\n",
    "        )\n",
    "        r_pl = pm.Deterministic(\"r_pl\", ror * r_star)\n",
    "\n",
    "        m_pl = pm.Deterministic(\"m_pl\", tt.exp(log_m_pl))\n",
    "        period = pm.Deterministic(\"period\", tt.exp(log_period))\n",
    "\n",
    "        ecs = pmx.UnitDisk(\"ecs\", shape=(2, 2), testval=0.01 * np.ones((2, 2)))\n",
    "        ecc = pm.Deterministic(\"ecc\", tt.sum(ecs**2, axis=0))\n",
    "        omega = pm.Deterministic(\"omega\", tt.arctan2(ecs[1], ecs[0]))\n",
    "        xo.eccentricity.vaneylen19(\n",
    "            \"ecc_prior\", multi=True, shape=2, fixed=True, observed=ecc\n",
    "        )\n",
    "\n",
    "        # RV jitter & a quadratic RV trend\n",
    "        log_sigma_rv = pm.Normal(\n",
    "            \"log_sigma_rv\", mu=np.log(np.median(yerr_rv)), sd=5\n",
    "        )\n",
    "        trend = pm.Normal(\n",
    "            \"trend\", mu=0, sd=10.0 ** -np.arange(3)[::-1], shape=3\n",
    "        )\n",
    "\n",
    "        # Transit jitter & GP parameters\n",
    "        log_sigma_lc = pm.Normal(\n",
    "            \"log_sigma_lc\", mu=np.log(np.std(y[mask])), sd=10\n",
    "        )\n",
    "        log_rho_gp = pm.Normal(\"log_rho_gp\", mu=0.0, sd=10)\n",
    "        log_sigma_gp = pm.Normal(\n",
    "            \"log_sigma_gp\", mu=np.log(np.std(y[mask])), sd=10\n",
    "        )\n",
    "\n",
    "        # Orbit models\n",
    "        orbit = xo.orbits.KeplerianOrbit(\n",
    "            r_star=r_star,\n",
    "            m_star=m_star,\n",
    "            period=period,\n",
    "            t0=t0,\n",
    "            b=b,\n",
    "            m_planet=xo.units.with_unit(m_pl, msini.unit),\n",
    "            ecc=ecc,\n",
    "            omega=omega,\n",
    "        )\n",
    "\n",
    "        # Compute the model light curve\n",
    "        light_curves = (\n",
    "            star.get_light_curve(orbit=orbit, r=r_pl, t=x[mask], texp=texp)\n",
    "            * 1e3\n",
    "        )\n",
    "        light_curve = pm.math.sum(light_curves, axis=-1) + mean_flux\n",
    "        resid = y[mask] - light_curve\n",
    "\n",
    "        # GP model for the light curve\n",
    "        kernel = terms.SHOTerm(\n",
    "            sigma=tt.exp(log_sigma_gp),\n",
    "            rho=tt.exp(log_rho_gp),\n",
    "            Q=1 / np.sqrt(2),\n",
    "        )\n",
    "        gp = GaussianProcess(kernel, t=x[mask], yerr=tt.exp(log_sigma_lc))\n",
    "        gp.marginal(\"transit_obs\", observed=resid)\n",
    "\n",
    "        # And then include the RVs as in the RV tutorial\n",
    "        x_rv_ref = 0.5 * (x_rv.min() + x_rv.max())\n",
    "\n",
    "        def get_rv_model(t, name=\"\"):\n",
    "            # First the RVs induced by the planets\n",
    "            vrad = orbit.get_radial_velocity(t)\n",
    "            pm.Deterministic(\"vrad\" + name, vrad)\n",
    "\n",
    "            # Define the background model\n",
    "            A = np.vander(t - x_rv_ref, 3)\n",
    "            bkg = pm.Deterministic(\"bkg\" + name, tt.dot(A, trend))\n",
    "\n",
    "            # Sum over planets and add the background to get the full model\n",
    "            return pm.Deterministic(\n",
    "                \"rv_model\" + name, tt.sum(vrad, axis=-1) + bkg\n",
    "            )\n",
    "\n",
    "        # Define the model\n",
    "        rv_model = get_rv_model(x_rv)\n",
    "        get_rv_model(t_rv, name=\"_pred\")\n",
    "\n",
    "        # The likelihood for the RVs\n",
    "        err = tt.sqrt(yerr_rv**2 + tt.exp(2 * log_sigma_rv))\n",
    "        pm.Normal(\"obs\", mu=rv_model, sd=err, observed=y_rv)\n",
    "\n",
    "        # Compute and save the phased light curve models\n",
    "        pm.Deterministic(\n",
    "            \"lc_pred\",\n",
    "            1e3\n",
    "            * tt.stack(\n",
    "                [\n",
    "                    star.get_light_curve(\n",
    "                        orbit=orbit, r=r_pl, t=t0[n] + phase_lc, texp=texp\n",
    "                    )[..., n]\n",
    "                    for n in range(2)\n",
    "                ],\n",
    "                axis=-1,\n",
    "            ),\n",
    "        )\n",
    "\n",
    "        # Fit for the maximum a posteriori parameters, I've found that I can get\n",
    "        # a better solution by trying different combinations of parameters in turn\n",
    "        if start is None:\n",
    "            start = model.test_point\n",
    "        map_soln = pmx.optimize(start=start, vars=[trend])\n",
    "        map_soln = pmx.optimize(start=map_soln, vars=[log_sigma_lc])\n",
    "        map_soln = pmx.optimize(start=map_soln, vars=[log_depth, b])\n",
    "        map_soln = pmx.optimize(start=map_soln, vars=[log_period, t0])\n",
    "        map_soln = pmx.optimize(\n",
    "            start=map_soln, vars=[log_sigma_lc, log_sigma_gp]\n",
    "        )\n",
    "        map_soln = pmx.optimize(start=map_soln, vars=[log_rho_gp])\n",
    "        map_soln = pmx.optimize(start=map_soln)\n",
    "\n",
    "        extras = dict(\n",
    "            zip(\n",
    "                [\"light_curves\", \"gp_pred\"],\n",
    "                pmx.eval_in_model([light_curves, gp.predict(resid)], map_soln),\n",
    "            )\n",
    "        )\n",
    "\n",
    "    return model, map_soln, extras\n",
    "\n",
    "\n",
    "model0, map_soln0, extras0 = build_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7d02ab9",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "Now let's plot the map radial velocity model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b78425b5",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "def plot_rv_curve(soln):\n",
    "    fig, axes = plt.subplots(2, 1, figsize=(10, 5), sharex=True)\n",
    "\n",
    "    ax = axes[0]\n",
    "    ax.errorbar(x_rv, y_rv, yerr=yerr_rv, fmt=\".k\")\n",
    "    ax.plot(t_rv, soln[\"vrad_pred\"], \"--k\", alpha=0.5)\n",
    "    ax.plot(t_rv, soln[\"bkg_pred\"], \":k\", alpha=0.5)\n",
    "    ax.plot(t_rv, soln[\"rv_model_pred\"], label=\"model\")\n",
    "    ax.legend(fontsize=10)\n",
    "    ax.set_ylabel(\"radial velocity [m/s]\")\n",
    "\n",
    "    ax = axes[1]\n",
    "    err = np.sqrt(yerr_rv**2 + np.exp(2 * soln[\"log_sigma_rv\"]))\n",
    "    ax.errorbar(x_rv, y_rv - soln[\"rv_model\"], yerr=err, fmt=\".k\")\n",
    "    ax.axhline(0, color=\"k\", lw=1)\n",
    "    ax.set_ylabel(\"residuals [m/s]\")\n",
    "    ax.set_xlim(t_rv.min(), t_rv.max())\n",
    "    ax.set_xlabel(\"time [days]\")\n",
    "\n",
    "\n",
    "_ = plot_rv_curve(map_soln0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ade1ac5b",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "That looks pretty similar to what we got in {ref}`rv`.\n",
    "Now let's also plot the transit model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58b3c7fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_light_curve(soln, extras, mask=None):\n",
    "    if mask is None:\n",
    "        mask = np.ones(len(x), dtype=bool)\n",
    "\n",
    "    fig, axes = plt.subplots(3, 1, figsize=(10, 7), sharex=True)\n",
    "\n",
    "    ax = axes[0]\n",
    "    ax.plot(x[mask], y[mask], \"k\", label=\"data\")\n",
    "    gp_mod = extras[\"gp_pred\"] + soln[\"mean_flux\"]\n",
    "    ax.plot(x[mask], gp_mod, color=\"C2\", label=\"gp model\")\n",
    "    ax.legend(fontsize=10)\n",
    "    ax.set_ylabel(\"relative flux [ppt]\")\n",
    "\n",
    "    ax = axes[1]\n",
    "    ax.plot(x[mask], y[mask] - gp_mod, \"k\", label=\"de-trended data\")\n",
    "    for i, l in enumerate(\"bc\"):\n",
    "        mod = extras[\"light_curves\"][:, i]\n",
    "        ax.plot(x[mask], mod, label=\"planet {0}\".format(l))\n",
    "    ax.legend(fontsize=10, loc=3)\n",
    "    ax.set_ylabel(\"de-trended flux [ppt]\")\n",
    "\n",
    "    ax = axes[2]\n",
    "    mod = gp_mod + np.sum(extras[\"light_curves\"], axis=-1)\n",
    "    ax.plot(x[mask], y[mask] - mod, \"k\")\n",
    "    ax.axhline(0, color=\"#aaaaaa\", lw=1)\n",
    "    ax.set_ylabel(\"residuals [ppt]\")\n",
    "    ax.set_xlim(x[mask].min(), x[mask].max())\n",
    "    ax.set_xlabel(\"time [days]\")\n",
    "\n",
    "    return fig\n",
    "\n",
    "\n",
    "_ = plot_light_curve(map_soln0, extras0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a81495b",
   "metadata": {},
   "source": [
    "There are still a few outliers in the light curve and it can be useful to remove those before doing the full fit because both the GP and transit parameters can be sensitive to this.\n",
    "\n",
    "## Sigma clipping\n",
    "\n",
    "To remove the outliers, we'll look at the empirical RMS of the residuals away from the GP + transit model and remove anything that is more than a 7-sigma outlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b96bbd27",
   "metadata": {},
   "outputs": [],
   "source": [
    "mod = (\n",
    "    extras0[\"gp_pred\"]\n",
    "    + map_soln0[\"mean_flux\"]\n",
    "    + np.sum(extras0[\"light_curves\"], axis=-1)\n",
    ")\n",
    "resid = y - mod\n",
    "rms = np.sqrt(np.median(resid**2))\n",
    "mask = np.abs(resid) < 7 * rms\n",
    "\n",
    "plt.plot(x, resid, \"k\", label=\"data\")\n",
    "plt.plot(x[~mask], resid[~mask], \"xr\", label=\"outliers\")\n",
    "plt.axhline(0, color=\"#aaaaaa\", lw=1)\n",
    "plt.ylabel(\"residuals [ppt]\")\n",
    "plt.xlabel(\"time [days]\")\n",
    "plt.legend(fontsize=12, loc=4)\n",
    "_ = plt.xlim(x.min(), x.max())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e3fa865",
   "metadata": {},
   "source": [
    "That looks better. Let's re-build our model with this sigma-clipped dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b97999f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "model, map_soln, extras = build_model(mask, map_soln0)\n",
    "_ = plot_light_curve(map_soln, extras, mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2f1a0a3",
   "metadata": {},
   "source": [
    "Great! Now we're ready to sample.\n",
    "\n",
    "## Sampling\n",
    "\n",
    "The sampling for this model is the same as for all the previous tutorials, but it takes a bit longer.\n",
    "This is partly because the model is more expensive to compute than the previous ones and partly because there are some non-affine degeneracies in the problem (for example between impact parameter, eccentricity, and radius/radius ratio).\n",
    "It might be worth thinking about reparameterizations (in terms of duration instead of eccentricity), but that's beyond the scope of this tutorial.\n",
    "Besides, using more traditional MCMC methods, this would have taken a lot longer to get thousands of effective samples!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3e54906",
   "metadata": {},
   "outputs": [],
   "source": [
    "import multiprocessing\n",
    "\n",
    "with model:\n",
    "    trace = pm.sample(\n",
    "        tune=1500,\n",
    "        draws=1000,\n",
    "        start=map_soln,\n",
    "        cores=2,\n",
    "        chains=2,\n",
    "        target_accept=0.95,\n",
    "        return_inferencedata=True,\n",
    "        random_seed=[203771098, 203775000],\n",
    "        mp_ctx=multiprocessing.get_context(\"fork\"),\n",
    "        init=\"adapt_full\",\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28ba74b7",
   "metadata": {},
   "source": [
    "Let's look at the convergence diagnostics for some of the key parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1060607",
   "metadata": {},
   "outputs": [],
   "source": [
    "import arviz as az\n",
    "\n",
    "az.summary(\n",
    "    trace,\n",
    "    var_names=[\n",
    "        \"period\",\n",
    "        \"r_pl\",\n",
    "        \"m_pl\",\n",
    "        \"ecc\",\n",
    "        \"omega\",\n",
    "        \"b\",\n",
    "        \"log_sigma_gp\",\n",
    "        \"log_rho_gp\",\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35477ada",
   "metadata": {},
   "source": [
    "As you see, the effective number of samples for the impact parameters and eccentricites are lower than for the other parameters.\n",
    "This is because of the correlations that I mentioned above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad4cba3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import corner\n",
    "\n",
    "_ = corner.corner(trace, var_names=[\"b\", \"ecc\", \"r_pl\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02b667de",
   "metadata": {},
   "source": [
    "## Phase plots\n",
    "\n",
    "Finally, we can make folded plots of the transits and the radial velocities and compare to the posterior model predictions. (Note: planets b and c in this tutorial are swapped compared to the labels from [Petigura et al. (2016)](https://arxiv.org/abs/1511.04497))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7c8beec",
   "metadata": {},
   "outputs": [],
   "source": [
    "flat_samps = trace.posterior.stack(sample=(\"chain\", \"draw\"))\n",
    "gp_mod = extras[\"gp_pred\"] + map_soln[\"mean_flux\"]\n",
    "\n",
    "for n, letter in enumerate(\"bc\"):\n",
    "    plt.figure()\n",
    "\n",
    "    # Get the posterior median orbital parameters\n",
    "    p = np.median(flat_samps[\"period\"][n])\n",
    "    t0 = np.median(flat_samps[\"t0\"][n])\n",
    "\n",
    "    # Plot the folded data\n",
    "    x_fold = (x[mask] - t0 + 0.5 * p) % p - 0.5 * p\n",
    "    m = np.abs(x_fold) < 0.3\n",
    "    plt.plot(\n",
    "        x_fold[m], y[mask][m] - gp_mod[m], \".k\", label=\"data\", zorder=-1000\n",
    "    )\n",
    "\n",
    "    # Plot the folded model\n",
    "    pred = np.percentile(flat_samps[\"lc_pred\"][:, n, :], [16, 50, 84], axis=-1)\n",
    "    plt.plot(phase_lc, pred[1], color=\"C1\", label=\"model\")\n",
    "    art = plt.fill_between(\n",
    "        phase_lc, pred[0], pred[2], color=\"C1\", alpha=0.5, zorder=1000\n",
    "    )\n",
    "    art.set_edgecolor(\"none\")\n",
    "\n",
    "    # Annotate the plot with the planet's period\n",
    "    txt = \"period = {0:.4f} +/- {1:.4f} d\".format(\n",
    "        np.mean(flat_samps[\"period\"][n].values),\n",
    "        np.std(flat_samps[\"period\"][n].values),\n",
    "    )\n",
    "    plt.annotate(\n",
    "        txt,\n",
    "        (0, 0),\n",
    "        xycoords=\"axes fraction\",\n",
    "        xytext=(5, 5),\n",
    "        textcoords=\"offset points\",\n",
    "        ha=\"left\",\n",
    "        va=\"bottom\",\n",
    "        fontsize=12,\n",
    "    )\n",
    "\n",
    "    plt.legend(fontsize=10, loc=4)\n",
    "    plt.xlabel(\"time since transit [days]\")\n",
    "    plt.ylabel(\"de-trended flux\")\n",
    "    plt.title(\"K2-24{0}\".format(letter))\n",
    "    plt.xlim(-0.3, 0.3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c13287ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "for n, letter in enumerate(\"bc\"):\n",
    "    plt.figure()\n",
    "\n",
    "    # Get the posterior median orbital parameters\n",
    "    p = np.median(flat_samps[\"period\"][n])\n",
    "    t0 = np.median(flat_samps[\"t0\"][n])\n",
    "\n",
    "    # Compute the median of posterior estimate of the background RV\n",
    "    # and the contribution from the other planet. Then we can remove\n",
    "    # this from the data to plot just the planet we care about.\n",
    "    other = np.median(flat_samps[\"vrad\"][:, (n + 1) % 2], axis=-1)\n",
    "    other += np.median(flat_samps[\"bkg\"], axis=-1)\n",
    "\n",
    "    # Plot the folded data\n",
    "    x_fold = (x_rv - t0 + 0.5 * p) % p - 0.5 * p\n",
    "    plt.errorbar(x_fold, y_rv - other, yerr=yerr_rv, fmt=\".k\", label=\"data\")\n",
    "\n",
    "    # Compute the posterior prediction for the folded RV model for this\n",
    "    # planet\n",
    "    t_fold = (t_rv - t0 + 0.5 * p) % p - 0.5 * p\n",
    "    inds = np.argsort(t_fold)\n",
    "    pred = np.percentile(\n",
    "        flat_samps[\"vrad_pred\"][inds, n], [16, 50, 84], axis=-1\n",
    "    )\n",
    "    plt.plot(t_fold[inds], pred[1], color=\"C1\", label=\"model\")\n",
    "    art = plt.fill_between(\n",
    "        t_fold[inds], pred[0], pred[2], color=\"C1\", alpha=0.3\n",
    "    )\n",
    "    art.set_edgecolor(\"none\")\n",
    "\n",
    "    plt.legend(fontsize=10)\n",
    "    plt.xlim(-0.5 * p, 0.5 * p)\n",
    "    plt.xlabel(\"phase [days]\")\n",
    "    plt.ylabel(\"radial velocity [m/s]\")\n",
    "    plt.title(\"K2-24{0}\".format(letter))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d102a073",
   "metadata": {},
   "source": [
    "We can also compute the posterior constraints on the planet densities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "745012cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "volume = 4 / 3 * np.pi * flat_samps[\"r_pl\"].values ** 3\n",
    "density = u.Quantity(\n",
    "    flat_samps[\"m_pl\"].values / volume, unit=u.M_earth / u.R_sun**3\n",
    ")\n",
    "density = density.to(u.g / u.cm**3).value\n",
    "\n",
    "bins = np.linspace(0, 1.1, 45)\n",
    "for n, letter in enumerate(\"bc\"):\n",
    "    plt.hist(\n",
    "        density[n],\n",
    "        bins,\n",
    "        histtype=\"step\",\n",
    "        lw=2,\n",
    "        label=\"K2-24{0}\".format(letter),\n",
    "        density=True,\n",
    "    )\n",
    "plt.yticks([])\n",
    "plt.legend(fontsize=12)\n",
    "plt.xlim(bins[0], bins[-1])\n",
    "plt.xlabel(\"density [g/cc]\")\n",
    "_ = plt.ylabel(\"posterior density\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92eac7cf",
   "metadata": {},
   "source": [
    "## Citations\n",
    "\n",
    "As described in the [citation tutorial](https://docs.exoplanet.codes/en/stable/tutorials/citation/), we can use [citations.get_citations_for_model](https://docs.exoplanet.codes/en/stable/user/api/#exoplanet.citations.get_citations_for_model) to construct an acknowledgement and BibTeX listing that includes the relevant citations for this model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "947ee22a",
   "metadata": {},
   "outputs": [],
   "source": [
    "with model:\n",
    "    txt, bib = xo.citations.get_citations_for_model()\n",
    "print(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72b23361",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(bib.split(\"\\n\\n\")[0] + \"\\n\\n...\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5e7dabe-1767-4f8a-971e-6847f2e7e8a9",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}