{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DC2 Object Catalog Run2.2i GCR tutorial -- Part III: Guided Challenges\n", "\n", "Owners: **Javier Sanchez [@fjaviersanchez](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@fjaviersanchez), Francois Lanusse [@EiffL](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@EiffL)** \n", "Last Run: **2020-07-02** (by @jrbogart)\n", "\n", "This notebook is the third of four in the Run2.2i object catalog with GCR series ([Part I](object_gcr_1_intro.ipynb), [Part II](object_gcr_2_lensing_cuts.ipynb)). Here, we propose some challenges for you as science use cases of the object catalogs. We will provide a solution here but you are encouraged to create your own!\n", "\n", "\n", "__Logistics__: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter-dev.nersc.gov. To setup your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter-dev+at+NERSC" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Challenge 1: Galaxy counts-in-cells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Blending affects the accuracy of centroid and flux measurements. It can potentially generate a systematic effect in different measurements (for example 2-point statistics). \n", "\n", "The stack returns a very useful value to check (partially) for the presence of these kind of systematics, which is the `blendedness` parameter (more details on Section 4.9.11 of [Bosch et al. 2017](https://arxiv.org/pdf/1705.06766.pdf)\n", "\n", "* Q: Why only partially?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A very simple tool to measure the different statistical moments of galaxies is Counts-in-cells (CiC) [Peebles et al. 1980](https://press.princeton.edu/titles/724.html). Here, we are going to use a simplified version of CiC to check the possible systematic effects due to differences in the `blendedness` measurements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what's CiC?\n", "\n", "1) Count the number of galaxies in a cell of a given scale.\n", "\n", "2) Measure the density contrast distribution and its moments.\n", "\n", "3) Change the scale and repeat." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: We focus on galaxy-number CiC here, but the same approach can also be applied to shapes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from utils.cic import cic_analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import healpy as hp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GCRCatalogs\n", "from GCR import GCRQuery\n", "# Load the object catalog\n", "run = 'run2.2i_dr3'\n", "tract = 3081 # has good depth\n", "tract_s = str(tract)\n", "catalog = GCRCatalogs.load_catalog(f'dc2_object_{run}')\n", "#catalog = GCRCatalogs.load_catalog('dc2_object_run1.2i_all_columns')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's use almost the same cuts as for the WL sample\n", "\n", "cic_cuts_base = [\n", "    GCRQuery((np.isfinite, 'mag_i_cModel')),  # (from this and below) remove nan entries\n", "    GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_resolution')),\n", "    GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_e1')),\n", "    GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_e2')),\n", "    GCRQuery('good'),\n", "    GCRQuery('snr_i_cModel >= 10'),  # (from this and below) cut on object properties\n", "    GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'),\n", "]\n", "\n", "is_blended = GCRQuery('blendedness >= 10**(-0.375)')\n", "\n", "cic_cuts_nb = cic_cuts_base + [~is_blended]\n", "cic_cuts_b = cic_cuts_base + [is_blended]\n", "\n", "quantities = ['ra', 'dec']\n", "\n", "# was tract 4849; 3081 has greater depth for Run 2.2i DR3\n", "d_nb = catalog.get_quantities(quantities,\n", "                              filters=cic_cuts_nb,\n", "                              native_filters=[f'tract == {tract_s}'])\n", "d_b = catalog.get_quantities(quantities,\n", "                             filters=cic_cuts_b,\n", "                             native_filters=[f'tract == {tract_s}'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read in the depth map created in the previous tutorial\n", "import pickle\n", "import gzip\n", "with gzip.open(f'assets/m10map_{run}_{tract_s}.pklz', 'rb') as f:\n", "    m10map = pickle.load(f)\n", "mask = np.zeros_like(m10map)\n", "mask[m10map > 23.0] = 1.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hp.gnomview(mask, rot=(d_nb['ra'].mean(), d_nb['dec'].mean()), title='Run 2.2i depth mask ($10\sigma$ i-band depth > 23)', reso=0.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sigma_b, sigma_err_b, skw_b, skw_err_b, kurtosis_b, kurtosis_err_b, pixel_scale = cic_analysis(d_b, mask, nboot=100)\n", "sigma_nb, sigma_err_nb, skw_nb, skw_err_nb, kurtosis_nb, kurtosis_err_nb, _ = cic_analysis(d_nb, mask, nboot=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f, ax = plt.subplots(1, 1, figsize=(6,6))\n", "ax.errorbar(pixel_scale[2:], sigma_b[2:], sigma_err_b[2:], fmt='o', linestyle='none', label='High blendedness')\n", "ax.errorbar(pixel_scale[2:], sigma_nb[2:], sigma_err_nb[2:], fmt='x', linestyle='none', label='Low blendedness')\n", "ax.legend()\n", "ax.set_xlabel('Pixel scale [deg]')\n", "ax.set_ylabel(r'$\sigma$')\n", "f.draw(f.canvas.get_renderer())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's definitely something going on with the high-blendedness sources!"
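] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variance is only the second moment. Since `cic_analysis` also returned the skewness (and kurtosis) above, a sketch of the same comparison for the skewness, reusing those outputs, could look like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: same comparison as above, but for the skewness of the\n", "# density contrast, reusing the cic_analysis outputs computed above.\n", "f, ax = plt.subplots(1, 1, figsize=(6,6))\n", "ax.errorbar(pixel_scale[2:], skw_b[2:], skw_err_b[2:], fmt='o', linestyle='none', label='High blendedness')\n", "ax.errorbar(pixel_scale[2:], skw_nb[2:], skw_err_nb[2:], fmt='x', linestyle='none', label='Low blendedness')\n", "ax.legend()\n", "ax.set_xlabel('Pixel scale [deg]')\n", "ax.set_ylabel('Skewness')"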
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Challenge 2: Check if PSF residuals are within requirements\n", "\n", "In this section, we will try to apply all the tools we have covered during this tutorial to test the quality of the DM stack PSF model on Run 2.2i DR3.\n", "\n", "The challenge will be to select a clean sample of stars, compute their size and ellipticity using second moments, and compare those to the PSF model predicted by the DM stack. We will test the one-point and two-point functions of these residuals to make diagnostic plots that would go directly into a weak lensing shape catalog paper.\n", "\n", "See for example Section 3 of [Zuntz et al. 2017](https://arxiv.org/pdf/1708.01533.pdf) (in particular Section 3.2) for an introduction to, and a real-life example of, PSF systematics tests." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Select a clean sample of point sources\n", "\n", "We want to restrict our sample to objects that satisfy these constraints:\n", " - Point sources\n", " - Not corrupted or affected by any defects\n", " - Successful second-moment measurements\n", " - Sufficiently high signal-to-noise in the i-band, above 50\n", " \n", "Remember to use [SCHEMA.md](https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/SCHEMA.md#schema-for-dc2-coadd-catalogs) as a reference to build your cuts. One possible set of cuts is sketched below, but try your own first!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GCRCatalogs\n", "from GCR import GCRQuery\n", "\n", "catalog = GCRCatalogs.load_catalog(f'dc2_object_{run}')\n", "\n", "filters = [\n", "    # Add the required filters\n", "    # ...\n", "]"
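] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to attempt the challenge unaided, skip the next cell. Otherwise, here is one possible set of cuts, as a sketch: the quantity names (`clean`, `extendedness`, `Ixx`, `snr_i_cModel`) are taken from the schema linked above, and the exact choices are a starting point rather than a definitive selection." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# One possible set of cuts (a sketch -- adjust to taste):\n", "filters = [\n", "    GCRQuery('clean'),               # no flagged/corrupted pixels\n", "    GCRQuery('extendedness == 0'),   # point sources\n", "    GCRQuery((np.isfinite, 'Ixx')),  # successful second-moment measurements\n", "    GCRQuery((np.isfinite, 'Iyy')),\n", "    GCRQuery((np.isfinite, 'Ixy')),\n", "    GCRQuery('snr_i_cModel > 50'),   # high signal-to-noise in the i-band\n", "]"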
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Compute size and ellipticity from second moments\n", "\n", "We will use the following definitions: \n", "$g_1 = \frac{I_{xx} - I_{yy}}{I_{xx} + I_{yy}}$ \n", "$g_2 = \frac{2 I_{xy}}{I_{xx} + I_{yy}}$ \n", "$\sigma = ( I_{xx} I_{yy} - I_{xy}^2)^{1/4}$ \n", "\n", "Using the `add_derived_quantity` method of the GCR (documented [here](https://yymao.github.io/generic-catalog-reader/index.html#GCR.BaseGenericCatalog.add_derived_quantity)), add modifiers to compute these quantities for the sources and for the PSF model evaluated at the positions of the sources.\n", "Again, the schema is your friend ;-) A possible completion of the template is sketched after the next cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Here is a template to fill in for g1\n", "g1_modif = lambda ixx, iyy, ixy: (ixx - iyy) / (ixx + iyy)\n", "catalog.add_derived_quantity('g1', g1_modif, 'Ixx', 'Iyy', 'Ixy')\n", "\n", "# Define g2, sigma, psf_g1, psf_g2, and psf_sigma in the same way\n", "#..."
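] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, skip the next cell if you want to work this out yourself. Below is one possible completion of the template, assuming the PSF model second moments are exposed by the schema as `IxxPSF`, `IyyPSF`, and `IxyPSF` (check SCHEMA.md if these names differ):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: remaining modifiers, following the g1 template above.\n", "g2_modif = lambda ixx, iyy, ixy: 2.*ixy / (ixx + iyy)\n", "sigma_modif = lambda ixx, iyy, ixy: (ixx*iyy - ixy**2)**0.25\n", "\n", "catalog.add_derived_quantity('g2', g2_modif, 'Ixx', 'Iyy', 'Ixy')\n", "catalog.add_derived_quantity('sigma', sigma_modif, 'Ixx', 'Iyy', 'Ixy')\n", "\n", "# Same definitions applied to the PSF model moments\n", "catalog.add_derived_quantity('psf_g1', g1_modif, 'IxxPSF', 'IyyPSF', 'IxyPSF')\n", "catalog.add_derived_quantity('psf_g2', g2_modif, 'IxxPSF', 'IyyPSF', 'IxyPSF')\n", "catalog.add_derived_quantity('psf_sigma', sigma_modif, 'IxxPSF', 'IyyPSF', 'IxyPSF')"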
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Extract sample\n", "\n", "Now that we have all the pieces, let's extract the quantities specified below from the catalog:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "quantities = ['ra', 'dec',\n", "              'mag_i', 'snr_i_cModel', 'psf_fwhm_i',\n", "              'g1', 'g2', 'sigma',\n", "              'psf_g1', 'psf_g2', 'psf_sigma']\n", "\n", "# Extract these quantities from the catalog into data\n", "# data = ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Size and ellipticity residuals as a function of magnitude and seeing\n", "\n", "Try to reproduce plots similar to the ones you can find in the PSF section of your favorite experiment's shape catalog paper (for instance, Section 4 of [Mandelbaum et al. 2017](https://arxiv.org/pdf/1705.06745.pdf)).\n", "\n", "For instance, you can look at the fractional difference in size as a function of magnitude or seeing. You can also look at the distribution of ellipticity residuals, check that it is centered on 0, and again see whether you can spot a dependence on seeing or magnitude." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plt.figure(figsize=(10,5))\n", "\n", "# plt.subplot(121)\n", "# plt.hist2d(data['mag_i'], (data['sigma'] - data['psf_sigma'])/data['psf_sigma'], 100, range=[[15,23],[-0.02,0.02]]);\n", "# plt.xlabel('i mag')\n", "# plt.ylabel(r'$(\sigma - \sigma_{PSF})/\sigma_{PSF}$')\n", "# plt.colorbar(label='Number of objects')\n", "# plt.subplot(122)\n", "# plt.hist2d(data['psf_fwhm_i'], (data['sigma'] - data['psf_sigma'])/data['psf_sigma'], 100, range=[[0.4,1.0],[-0.02,0.02]]);\n", "# plt.xlabel('seeing FWHM (arcsec)')\n", "# plt.colorbar(label='Number of objects');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plt.figure(figsize=(15,10))\n", "\n", "# plt.subplot(221)\n", "# plt.hist2d(data['mag_i'], (data['g1'] - data['psf_g1']), 100, range=[[15,23],[-0.02,0.02]]);\n", "# plt.xlabel('i mag')\n", "# plt.ylabel(r'$g_1 - g_1^{PSF}$')\n", "# plt.colorbar(label='Number of objects')\n", "# plt.subplot(222)\n", "# plt.hist2d(data['psf_fwhm_i'], (data['g1'] - data['psf_g1']), 100, range=[[0.4,1.0],[-0.02,0.02]]);\n", "# plt.xlabel('seeing FWHM (arcsec)')\n", "# plt.ylabel(r'$g_1 - g_1^{PSF}$')\n", "# plt.colorbar(label='Number of objects')\n", "# plt.subplot(223)\n", "# plt.hist((data['g1'] - data['psf_g1']), 100, range=[-0.04,0.04]);\n", "# plt.xlabel(r'$g_1 - g_1^{PSF}$')\n", "# plt.axvline(0)\n", "# plt.subplot(224)\n", "# plt.hist((data['g2'] - data['psf_g2']), 100, range=[-0.04,0.04]);\n", "# plt.xlabel(r'$g_2 - g_2^{PSF}$')\n", "# plt.axvline(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Compute $\rho$-statistics in Stile\n", "\n", "No shear catalog paper would be complete without the so-called $\rho$-statistics ([Rowe 2010](https://academic.oup.com/mnras/article/404/1/350/3101560), [Jarvis et al. 2016](https://academic.oup.com/mnras/article-abstract/460/2/2245/2609178?redirectedFrom=fulltext)), which check the two-point correlations of the PSF residuals ($\rho_1$, for instance, is the autocorrelation of the PSF ellipticity residuals). See Section 3.4 of Jarvis et al. 2016 for the definitions of these statistics.\n", "\n", "\n", "We are going to use the [Stile](https://github.com/msimet/Stile) package developed by Melanie Simet [@msimet](https://github.com/msimet), Hironao Miyatake [@HironaoMiyatake](https://github.com/HironaoMiyatake), Rachel Mandelbaum [@rmandelb](https://github.com/rmandelb), and Song Huang [@dr-guangtou](https://github.com/dr-guangtou).\n", "\n", "This is an incredibly useful package that already implements a range of WL-related systematics tests, including the $\rho$-statistics. Check out the Stile documentation [here](http://stile.readthedocs.io/en/latest/index.html).\n", "\n", "Uncomment the following cells to try them out." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import pandas\n", "# import stile\n", "\n", "# # Stile expects numpy structured arrays; here is an easy way to get one:\n", "# d = pandas.DataFrame(data)\n", "# # We also add a weight column, set to 1\n", "# d['w'] = 1\n", "# d = d.to_records(index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# # Here is an example of how to compute the rho1 statistic\n", "# stile_args = {'ra_units': 'degrees', 'dec_units': 'degrees',\n", "#               'min_sep': 0.05, 'max_sep': 1, 'sep_units': 'degrees', 'nbins': 20}\n", "\n", "# rho1 = stile.CorrelationFunctionSysTest('Rho1')\n", "\n", "# r1 = rho1(d, config=stile_args)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# # Have a look at the content of the result array\n", "# r1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# # Stile offers utility functions to generate diagnostic plots of the results\n", "# f = rho1.plot(r1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have reached this point, well done! \n", "Feel free to explore the other functionalities of Stile, including histogram plots, whisker plots, and some fancy summary statistics (MAD, skewness, and kurtosis). Find Hironao or Rachel, or ping Melanie, if you are interested in knowing more.\n", "\n", "And if you have reached this point out of desperation and are looking for answers, [here](assets/object_gcr_tutorial_part3.py) they are :-)" ] } ], "metadata": { "kernelspec": { "display_name": "desc-python", "language": "python", "name": "desc-python" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }