{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Accessing the Coadd data using the Data Butler\n",
"Owner: **Jim Chiang** ([@jchiang87](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@jchiang87))\n",
"\n",
"Last Verified to Run: **2024-03-01** (by @fjaviersanchez)\n",
"\n",
"This notebook shows how to read in the coadd catalog data for an entire tract using the LSST Data Management (DM) data butler, and how to construct a Pandas DataFrame with the columns needed to apply the weak-lensing cuts used in the Hyper Suprime-Cam analysis of [Mandelbaum et al (2018)](https://arxiv.org/abs/1705.06745), following the selection given by [Francois in Slack](https://lsstc.slack.com/archives/C77DDKZHR/p1525197444000687).\n",
"\n",
"We have a [notebook that performs a similar analysis using GCR](https://github.com/LSSTDESC/DC2-analysis/blob/master/Notebooks/DC2%20Coadd%20Run1.1p%20GCR%20access%20--%20HSC%20selection.ipynb). See also [the more general script](https://github.com/LSSTDESC/DC2-analysis/blob/master/scripts/merge_tract_cat.py) for extracting the coadd data from a tract into an hdf5 file.\n",
"\n",
"### Learning Objectives\n",
"After completing and studying this Notebook, you should be able to\n",
"1. Read in the coadd catalog data using the DM Butler\n",
"2. Construct a Pandas data frame\n",
"3. Apply the HSC weak lensing cuts to a catalog of DM-processed data.\n",
"\n",
"## Set Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import lsst.daf.persistence as dp\n",
"\n",
"from desc_dc2_dm_data import REPOS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How To Read DM Science Pipelines Coadd Catalogs into a Pandas Dataframe\n",
"\n",
"Here's a function to do this: pass it a butler and some information about what you want, and it returns a pandas DataFrame. In this example, we'll get the forced photometry as well as the model measurements; these are stored in different tables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"REPOS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def read_tract(butler, tract, filter_='i', num_patches=None):\n",
" \"\"\"\n",
" Read in the coadd catalog data from the specified tract, looping over patches and concatenating\n",
" into a pandas data frame. \n",
" \n",
" Notes\n",
" -----\n",
" The `merged` coadd catalog is supplemented with the CModel fluxes from the `forced_src` \n",
" catalog for the desired filter.\n",
"\n",
" Parameters\n",
" ----------\n",
"    butler : lsst.daf.persistence.Butler\n",
"        Data butler pointing to the data repository of interest, e.g., the output repository\n",
"        for the Run1.1p data.\n",
"    tract : lsst.skymap.tractInfo.TractInfo\n",
"        Tract information as obtained from a skyMap object, e.g., via\n",
"        `tract = skymap[tractId]`, where `tractId` is the tract number.\n",
"    filter_ : str, optional\n",
"        Filter to use for extracting the `forced_src` fluxes. Default: 'i'.\n",
"    num_patches : int, optional\n",
"        Number of patches to consider. If None (default), use all available patches in the tract.\n",
"\n",
" Returns\n",
" -------\n",
"    pandas.DataFrame\n",
"        The merged coadd data with the band-specific columns added.\n",
" \"\"\"\n",
" if num_patches is None:\n",
" num_patches = len(tract)\n",
" \n",
" df_list = []\n",
" i = 0\n",
" print(\"Loop Patch Nobj\")\n",
" print(\"=================\")\n",
" for patch in tract:\n",
" patchId = '%d,%d' % patch.getIndex()\n",
" dataId = dict(filter=filter_, tract=tract.getId(), patch=patchId)\n",
"\n",
" # Patches for which there is no data will raise a `NoResults` exception when requested\n",
" # from the butler. We catch those exceptions, note the missing data, and \n",
" # continue looping over the patches.\n",
" try:\n",
" forced = butler.get('deepCoadd_forced_src', dataId)\n",
" calib = butler.get('deepCoadd_calexp_photoCalib', dataId)\n",
" merged = butler.get('deepCoadd_ref', dataId)\n",
" except dp.NoResults as eobj:\n",
" print(eobj)\n",
" continue\n",
"\n",
" # Convert the merged coadd catalog into a pandas DataFrame and add the band-specific\n",
" # quantities from the forced_src coadd catalog as new columns.\n",
" data = merged.asAstropy().to_pandas()\n",
" data[filter_ + '_modelfit_CModel_instFlux'] = forced['modelfit_CModel_instFlux']\n",
" data[filter_ + '_modelfit_CModel_instFluxErr'] = forced['modelfit_CModel_instFluxErr']\n",
"\n",
" # The calib object applies the zero point to get calibrated magnitudes.\n",
" \n",
" _aux = calib.instFluxToMagnitude(forced, 'modelfit_CModel')\n",
" data[filter_ + '_mag_CModel'] = _aux[:, 0]\n",
" data[filter_ + '_mag_err_CModel'] = _aux[:, 1]\n",
" data[filter_ + '_modelfit_CModel_SNR'] \\\n",
" = forced['modelfit_CModel_instFlux']/forced['modelfit_CModel_instFluxErr']\n",
" data['ext_shapeHSM_HsmShapeRegauss_abs_e'] \\\n",
" = np.hypot(data['ext_shapeHSM_HsmShapeRegauss_e1'],\n",
" data['ext_shapeHSM_HsmShapeRegauss_e2'])\n",
" df_list.append(data)\n",
"\n",
" # Print the current patch and number of objects added to the final DataFrame.\n",
" print(\" {} {} {}\".format(i, patchId, len(data)))\n",
" i += 1\n",
" if i == num_patches:\n",
" break\n",
" return pd.concat(df_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Accessing the DM Science Pipelines processed data\n",
"\n",
"In order to access the DM Science Pipelines processed outputs, including images and catalog data, we summon a data butler and point it at the data repository where the DM Science Pipelines tasks have written their outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"butler = dp.Butler(REPOS['2.2i_dr6'])"
]
},
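{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before reading any catalogs, it can help to confirm that the repository contains the expected coadd products. A minimal sanity check, using the Gen2 butler's `datasetExists` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Sanity check: confirm that the sky map dataset exists in this repository.\n",
"print(butler.datasetExists('deepCoadd_skyMap'))"
]
},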
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `skyMap` object stores the information defining how we divided up the sky into tracts and patches when we did the coaddition. Here we pick a tract near the center of the simulated region."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"skymap = butler.get('deepCoadd_skyMap')\n",
"tractId = 4851\n",
"filter_ = 'i'\n",
"num_patches = None\n",
"data = read_tract(butler, skymap[tractId], filter_=filter_, num_patches=num_patches)\n",
"print(\"Number of objects in the coadd catalog:\", len(data))"
]
},
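{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before applying any cuts, it can be useful to inspect the band-specific columns that `read_tract` added to the merged catalog:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# List the CModel-related columns added to the merged catalog.\n",
"print([col for col in data.columns if 'CModel' in col])"
]
},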
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making the Selection\n",
"\n",
"First off, we need to apply a baseline set of cuts to remove NaNs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"base_mask = ~(np.isnan(data['i_modelfit_CModel_instFlux'])\n",
" | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_resolution'])\n",
" | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_e1']))\n",
"data = data[base_mask]\n",
"print(\"Number of objects after baseline cuts:\", len(data))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we get to the weak lensing cuts. The `detect_isPrimary` flag identifies the primary object in the overlap regions between patches, i.e., applying it resolves duplicates from the different analyses of overlapping patches."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mask = data['detect_isPrimary']\n",
"\n",
"# Reject objects with deblending, pixel, or shape-measurement flags set.\n",
"mask &= ~data['deblend_skipped']\n",
"mask &= ~data['base_PixelFlags_flag_edge']\n",
"mask &= ~data['base_PixelFlags_flag_interpolatedCenter']\n",
"mask &= ~data['base_PixelFlags_flag_saturatedCenter']\n",
"mask &= ~data['base_PixelFlags_flag_crCenter']\n",
"mask &= ~data['base_PixelFlags_flag_bad']\n",
"mask &= ~data['base_PixelFlags_flag_suspectCenter']\n",
"mask &= ~data['base_PixelFlags_flag_clipped']\n",
"mask &= ~data['ext_shapeHSM_HsmShapeRegauss_flag']\n",
"\n",
"# Cut on measured object properties\n",
"mask &= data['i_modelfit_CModel_SNR'] >= 10\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_resolution'] >= 0.3\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_abs_e'] < 2\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_sigma'] <= 0.4\n",
"mask &= data['i_mag_CModel'] < 24.5   # Note: no extinction correction has been applied\n",
"mask &= data['base_Blendedness_abs'] < 10**(-0.375)"
]
},
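{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the selection: count how many objects survive all of the cuts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Summarize the weak lensing selection: number and fraction of objects passing.\n",
"n_total = len(mask)\n",
"n_pass = int(mask.sum())\n",
"print('Objects passing WL cuts: {} / {} ({:.1%})'.format(n_pass, n_total, n_pass / n_total))"
]
},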
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make some plots of galaxy properties with the above cuts applied. The figure to compare with is Fig. 22 of [Mandelbaum et al (2018)](https://arxiv.org/abs/1705.06745)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(10, 10))\n",
"hist_kwds = dict(bins=100, histtype='step', density=True, linewidth=2.0, color=\"black\")\n",
"p1 = fig.add_subplot(2, 2, 1)\n",
"plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'][mask], **hist_kwds);\n",
"plt.xlabel('{}-band HSM resolution'.format(filter_))\n",
"plt.xlim([0.2,1])\n",
"p2 = fig.add_subplot(2, 2, 2)\n",
"plt.hist(data[filter_ + '_modelfit_CModel_SNR'][mask], range=(0, 100), **hist_kwds);\n",
"plt.xlabel('{}-band CModel S/N'.format(filter_))\n",
"plt.xlim([5,100])\n",
"p3 = fig.add_subplot(2, 2, 3)\n",
"plt.hist(data[filter_ + '_mag_CModel'][mask], **hist_kwds);\n",
"plt.xlabel(' {}-band CModel mag'.format(filter_))\n",
"plt.xlim([20,25])\n",
"p4 = fig.add_subplot(2, 2, 4)\n",
"plt.hist(data['ext_shapeHSM_HsmShapeRegauss_abs_e'][mask], **hist_kwds);\n",
"plt.xlabel('HSM distortion magnitude |e|')\n",
"plt.xlim([0,2]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's the published figure, for comparison: \n",
"\n",
"![](https://user-images.githubusercontent.com/710903/42742471-a196979e-886f-11e8-81f6-a89e425f9be3.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "desc-stack",
"language": "python",
"name": "desc-stack"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}