{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Black Hole Spectroscopy with PyCBC Inference\n",
    "\n",
    "This notebook provides an overview of how to do black hole spectroscopy with PyCBC Inference. We suggest that you go through the inference tutorial 0 for a better understanding of what's done here. For more details, see the separate tutorial notebooks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make sure the software is set up ###\n",
    "\n",
    "To do a ringdown analysis in PyCBC we need the [pykerr](https://github.com/cdcapano/pykerr) package installed. We'll also install `dynesty` for doing the sampling. These are both optional dependencies in the PyCBC repository, meaning that we need to install them separately if we are installing released versions of PyCBC."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install pykerr dynesty pycbc --no-cache-dir"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recap\n",
    "\n",
    "To do parameter estimation, we use the command-line script `pycbc_inference`. This pulls together the sampler, models, and distributions modules in the `pycbc.inference` package so that we may perform Bayesian inference. The standard command-line argument for PyCBC inference are:\n",
    "\n",
    "```\n",
    "pycbc_inference --verbose --seed ${SEED} --output-file inference.hdf --config-files config1.ini [config2.ini ...]\n",
    "```\n",
    "The key arguments that `pycbc_inference` takes are an output file and one or more `config-files` (run `pycbc_inference --help` to see all of the arguments it takes). The config files entirely specify the problem that you will be analyzing. In them, you specify the model to use, the variable parameters, and the prior on each of the parameters. If the model involves gravitational-wave data, you also specify what data to load and how to estimate the PSD. \n",
    "\n",
    "You use the same command-line program regardless of whether you are doing a standard CBC analysis, a BH ringdown analysis, or any other form of testing GR. The config file(s) you provide are what determines what type of analysis you are doing. Below, we will illustrated how to setup a config file to do a BH ringdown analysis on GW190521.\n",
    "\n",
    "*Note: in order to do everything from within the notebook, we'll create the config files by echoing a python string to  a file. In a normal situation, you would just use your favorite text editor to create the config files.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The model\n",
    "\n",
    "As illustrated in previous notebooks, there are many different models in PyCBC. The model you choose, combined with the waveform template you set (done via the `approximant` argument in the `[static_params]` section), determines the type of analysis you are doing. Let's recall what models are available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pycbc.inference import models\n",
    "\n",
    "for model_name in models.models:\n",
    "    print(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To do a BH ringdown analysis we need to use one of the gated Gaussian models. Note that there are two: the standard `gated_gaussian_noise` model, and a `gated_gaussian_margpol`. The latter does the same thing, it just marginalizes over the polarization angle numerically to speed up convergence. We'll use that model here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_config = \"\"\"\n",
    "[model]\n",
    "name = gated_gaussian_margpol\n",
    "low-frequency-cutoff = 15\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo '{model_config}' > model.ini\n",
    "!cat model.ini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The data\n",
    "\n",
    "Since this model uses data, we also need to create a data section that specifies what data to read for the analysis and the PSD estimation. See the [PyCBC Inference docs](https://pycbc.org/pycbc/latest/html/inference.html#setting-data) for more information about what options are available and what they mean.\n",
    "\n",
    "First we need to download the data from GWOSC (note: if you were doing this on the command line, you can just use `wget` or `curl`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from astropy.utils.data import download_file\n",
    "\n",
    "ifos = ['H1', 'L1', 'V1']\n",
    "data_filenames = {}\n",
    "for ifo in ifos:\n",
    "    print(\"Downloading {} data\".format(ifo))\n",
    "    \n",
    "    # Download the gravitational wave data for GW190521\n",
    "    url = \"https://www.gw-openscience.org/eventapi/html/O3_Discovery_Papers/GW190521/v2/{}-{}1_GWOSC_4KHZ_R2-1242440920-4096.gwf\"\n",
    "    fname = download_file(url.format(ifo[0], ifo[0]), cache=True)\n",
    "    data_filenames[ifo] = fname"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's set up our data section, pointing to the frame files we just downloaded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_config = \"\"\"\n",
    "[data]\n",
    "instruments = H1 L1 V1\n",
    "trigger-time = 1242442967.445\n",
    "analysis-start-time = -4\n",
    "analysis-end-time = 4\n",
    "data-conditioning-low-freq = H1:0 L1:0 V1:0\n",
    "psd-estimation = median-mean\n",
    "psd-start-time = -144\n",
    "psd-end-time = 144\n",
    "psd-inverse-length = 8\n",
    "psd-segment-length = 8\n",
    "psd-segment-stride = 4\n",
    "frame-files = H1:{h1file} L1:{l1file} V1:{v1file}\n",
    "channel-name = H1:GWOSC-4KHZ_R2_STRAIN L1:GWOSC-4KHZ_R2_STRAIN V1:GWOSC-4KHZ_R2_STRAIN\n",
    "sample-rate = 2048\n",
    "strain-high-pass = 10\n",
    "pad-data = 8\n",
    "\"\"\".format(h1file=data_filenames['H1'],\n",
    "           l1file=data_filenames['L1'],\n",
    "           v1file=data_filenames['V1'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo '{data_config}' > data.ini\n",
    "!cat data.ini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The prior\n",
    "Now let's specify the prior. We need to provide a section that lists the variable parameters and another that specifies the static parameters. For every variable parameter we have to provide one or more `prior` section(s) that specifies the prior distribution.\n",
    "\n",
    "For a ringdown, the key things to note are:\n",
    " * By setting the approximant to [TdQNMfromFinalMassSpin](http://pycbc.org/pycbc/latest/html/pycbc.waveform.html#pycbc.waveform.ringdown.get_td_from_final_mass_spin) we are using a QNM template from the Kerr solution. This means that the frequency and damping times of all the modes we will use are set by the final mass and spin (which we will vary). If we wanted to allow the frequencies and damping times to be completely independent, we would use the [TdQNMfromFreqTau](http://pycbc.org/pycbc/latest/html/pycbc.waveform.html#pycbc.waveform.ringdown.get_td_from_freqtau) approximant. (Note, however, that `TdQNMfromFinalMassspin` accepts `delta_flmn` and `delta_taulmn` parameters that can be used to make the frequency and damping time of an `lmn` mode to deviate from its GR value. See the documentation [TdQNMfromFinalMassSpin](http://pycbc.org/pycbc/latest/html/pycbc.waveform.html#pycbc.waveform.ringdown.get_td_from_final_mass_spin) for a full list of documented parameters.\n",
    "  * We set which harmonics to use via the `harmonic` argument; here, we are using the spheroidal harmonics (which come from `pykerr`).\n",
    "  * We set which modes are included by setting the `lmns` argument. **Note that the last digit in the `lmns` is not the overtone number, but the number of overtones to include for the given `lm` mode. In other words, `lmns = 222` means to include the 220 and 221 mode. If we wanted the 220 only, we'd set `lmns = 221`.**\n",
    "  * If we wanted to add other harmonics, we'd list them as a space-separated list, e.g., `221 331 321` would mean to include the 220, 330, and 320 modes.\n",
    "  * For each mode/overtone we specify with the `lmns` argument we need to provide an amplitude and phase for that mode (as well as any other optional argument, such as a frequency/damping time deviation and/or a different polarization; see the [TdQNMfromFinalMassSpin](http://pycbc.org/pycbc/latest/html/pycbc.waveform.html#pycbc.waveform.ringdown.get_td_from_final_mass_spin) documentation for details). These additional parameters can either be fixed (by setting it in the `static_params`) or variable.\n",
    "  * The amplitude of the dominant mode (`amp220`) is in terms of strain. The amplitudes of all other modes are specified with respect to the dominant mode. For example, `amp221` means that the amplitude of the 221 mode (in terms of strain) will be `amp221 * amp220`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prior_config = \"\"\"\n",
    "[static_params]\n",
    "approximant = TdQNMfromFinalMassSpin\n",
    "harmonics = spheroidal\n",
    "tref = ${data|trigger-time}\n",
    "ra = 3.5\n",
    "dec = 0.73\n",
    "toffset = 0.018\n",
    "lmns = 222\n",
    "t_final = 2\n",
    "\n",
    "[variable_params]\n",
    "final_mass = \n",
    "final_spin = \n",
    "inclination = \n",
    "logamp220 = \n",
    "phi220 =\n",
    "amp221 =\n",
    "phi221 =\n",
    "\n",
    "[waveform_transforms-t_gate_end]\n",
    "name = custom\n",
    "inputs = tref\n",
    "t_gate_end = tref + 0.0\n",
    "\n",
    "[waveform_transforms-t_gate_start]\n",
    "name = custom\n",
    "inputs = t_gate_end\n",
    "t_gate_start = t_gate_end - 2\n",
    "\n",
    "[waveform_transforms-tc]\n",
    "name = custom\n",
    "inputs = t_gate_end\n",
    "tc = t_gate_end - 0.001\n",
    "\n",
    "[prior-final_mass]\n",
    "name = uniform\n",
    "min-final_mass = 100\n",
    "max-final_mass = 500\n",
    "\n",
    "[prior-final_spin]\n",
    "name = uniform\n",
    "min-final_spin = -0.99\n",
    "max-final_spin = 0.99\n",
    "\n",
    "[prior-inclination]\n",
    "name = sin_angle\n",
    "\n",
    "[prior-logamp220]\n",
    "name = uniform\n",
    "min-logamp220 = -24\n",
    "max-logamp220 = -19\n",
    "\n",
    "[waveform_transforms-amp220]\n",
    "name = custom\n",
    "inputs = logamp220\n",
    "amp220 = 10**logamp220\n",
    "\n",
    "[prior-phi220]\n",
    "name = uniform_angle\n",
    "\n",
    "[prior-amp221]\n",
    "name = uniform\n",
    "min-amp221 = 0\n",
    "max-amp221 = 5\n",
    "\n",
    "[prior-phi221]\n",
    "name = uniform_angle\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo '{prior_config}' > prior.ini\n",
    "!cat prior.ini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The sampler\n",
    "\n",
    "Finally, we need to specify the sampler settings. To do that, we need to provide a `sampler` section. In this case we'll use the nested sampler `dynesty`, but you can use any of the samplers that PyCBC has support for. See the [PyCBC Inference docs](https://pycbc.org/pycbc/latest/html/inference.html#configuring-the-sampler) for a list of supported samplers and the optional available for them.\n",
    "\n",
    "**Note:** Below we set the number of livepoints to 100, the checkpoint interval to only 30s, and we set a `maxcall` to 100. We are only doing this so that we can trigger a checkpoint quickly. This is only for demonstration purposes. In a real analysis, you would want to use many more livepoints (4000 is a good number for BH ringdown)  and set the checkpoint interval to something ~1800s (or more). You also don't normally need to set `maxcall`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sampler_config = \"\"\"\n",
    "[sampler]\n",
    "name = dynesty\n",
    "dlogz = 0.1\n",
    "nlive = 100\n",
    "checkpoint_time_interval = 30\n",
    "maxcall = 100\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo '{sampler_config}' > sampler.ini\n",
    "!cat sampler.ini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run `pycbc_inference`\n",
    "\n",
    "Now that we've set up our config files, we can run `pycbc_inference`. The BH ringdown problem we set up above would take a long time (several hours) to run in a notebook like this. Instead, start this running, then interrupt the kernel after you see the output message say that it has checkpointed a couple times (should occur within a couple minutes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generate a random int for the seed\n",
    "import numpy\n",
    "seed = numpy.random.randint(1,int(2**31))\n",
    "print(\"Using seed: {}\".format(seed))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!timeout 300 \\\n",
    "    pycbc_inference --verbose \\\n",
    "    --config-files model.ini data.ini prior.ini sampler.ini \\\n",
    "    --output-file inference.hdf \\\n",
    "    --seed {seed} \\\n",
    "    --nprocesses 8 \\\n",
    "    --force    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that a checkpoint file exists in our directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -lh inference.hdf.checkpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting with `pycbc_inference_plot_posterior`\n",
    "\n",
    "The command-line tool `pycbc_inference_plot_posterior` is used to make corner and posterior plots of our results. We can give it either the direct output of `pycbc_inference` (here, `inference.hdf`) or a posterior file created using `pycbc_inference_extract_samples`. There are several plotting configuration options available (see `pycbc_inference_plot_posterior --help` for the list of options).\n",
    "\n",
    "We can also plot the current status of the sampler using the checkpoint file. This isn't too meaningful for nested samplers, but can give you an idea of how far along the sampler is. Let's take a look at the raw samples from our current checkpoint file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pycbc_inference_plot_posterior --verbose \\\n",
    "    --input-file inference.hdf.checkpoint \\\n",
    "    --output-file raw_samples_current.png \\\n",
    "    --plot-scatter --plot-marginal \\\n",
    "    --z-arg loglikelihood \\\n",
    "    --raw-samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image('raw_samples_current.png', height=480)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot a fully completed result\n",
    "\n",
    "The setup we've done above was used in Capano et al. [arXiv:2105.05238](https://arxiv.org/abs/2105.05238), the samples for which were released [here](https://github.com/gwastro/BH-Spectroscopy-GW190521). We'll download the completed posterior file of the 220+221 result at $t_{\\rm ref} - 5\\textrm{ms}$ from there to see what a completed result looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://github.com/gwastro/BH-Spectroscopy-GW190521/raw/v1/posteriors/kerr/220_221/KERR-220_221-M05MS.hdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pycbc_inference_plot_posterior --verbose \\\n",
    "    --input-file KERR-220_221-M05MS.hdf \\\n",
    "    --output-file completed_scatter.png \\\n",
    "    --plot-scatter --plot-marginal \\\n",
    "    --z-arg loglikelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Image('completed_scatter.png', height=480)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make a more interesting plot: we'll use these to make a density plot of the mass, spin, and amplitudes, and give them pretty labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pycbc_inference_plot_posterior --verbose \\\n",
    "    --input-file KERR-220_221-M05MS.hdf \\\n",
    "    --output-file completed_density-mass_spin_amps.png \\\n",
    "    --plot-density --plot-contours --plot-marginal \\\n",
    "    --density-cmap inferno --contour-colors white \\\n",
    "    --max-kde-samples 5000 \\\n",
    "    --parameters \\\n",
    "        'final_mass:$M_f (M_\\odot)$' \\\n",
    "        'final_spin:$\\chi_f$' \\\n",
    "        '10**logamp220*1e20:$A_{220} (\\times 10^20)$' \\\n",
    "        'amp221:$A_{221}/A_{220}$'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Image('completed_density-mass_spin_amps.png', height=480)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the model from the posterior file\n",
    "\n",
    "The inference file and/or posterior file contains all the information to reconstruct the model; i.e., the config file, data, and PSDs are all stored in the hdf file. We can use this to load the model in the file, and plot the max likelihood waveforms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pycbc.inference import io, models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the posterior file\n",
    "fp = io.loadfile('KERR-220_221-M05MS.hdf', 'r')\n",
    "# get the config, the data, and PSDs from the file\n",
    "# the config file:\n",
    "cp = fp.read_config_file()\n",
    "# the data\n",
    "data = fp.read_data()\n",
    "# the psds\n",
    "psds = fp.read_psds()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# now let's load the model\n",
    "model = models.read_from_config(cp, data=data, psds=psds)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's get the maximum likelihood point\n",
    "samples = fp.read_samples(list(fp['samples'].keys()))\n",
    "maxlidx = samples['loglikelihood'].argmax()\n",
    "maxlparams = {p: samples[p][maxlidx] for p in model.variable_params}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the loglikelihood of these points\n",
    "model.update(**maxlparams)\n",
    "model.loglikelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the matched-filter SNR\n",
    "print((2*model.loglr)**0.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the gated, whitened maxL waveform and plot it against\n",
    "# the whitened data\n",
    "gated_wfs = model.get_gated_waveforms()\n",
    "gated_data = model.get_gated_data()\n",
    "# whiten them\n",
    "gated_wfs = model.whiten(gated_wfs, 1)\n",
    "gated_data = model.whiten(gated_data, 1)\n",
    "# convert to the time domain\n",
    "gated_wfs = {ifo: d.to_timeseries() for ifo, d in gated_wfs.items()}\n",
    "gated_data = {ifo: d.to_timeseries() for ifo, d in gated_data.items()}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot\n",
    "%matplotlib notebook\n",
    "from matplotlib import pyplot\n",
    "fig, axes = pyplot.subplots(nrows=3, figsize=(8,8))\n",
    "for ii, ifo in enumerate(gated_wfs):\n",
    "    ax = axes[ii]\n",
    "    # we'll plot w.r.t. the geocentric end gate time\n",
    "    t_gate_end = model.current_params['t_gate_end']\n",
    "    d = gated_data[ifo].time_slice(t_gate_end-0.15, t_gate_end+0.15)\n",
    "    t = d.sample_times - t_gate_end\n",
    "    ax.plot(t, d, color='black', label='{} gated data'.format(ifo))\n",
    "    # plot the waveform\n",
    "    d = gated_wfs[ifo].time_slice(t_gate_end-0.15, t_gate_end+0.15)\n",
    "    t = d.sample_times - t_gate_end\n",
    "    ax.plot(t, d, color='C1', lw=3, label='maxL gated template')\n",
    "    ax.legend()\n",
    "    if ii == 2:\n",
    "        ax.set_xlabel(r'time - $t_{\\rm{gate\\,end}}$ (s)', fontsize=18)\n",
    "    if ii == 1:\n",
    "        ax.set_ylabel('whitened data', fontsize=18)\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}