{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Match truth and object catalogs for DC2 Run 2.2i\n",
    "Owner: Yao-Yuan Mao, Scott Daniel (with help from Anže Slosar, Bhairav Valera, HyeYun Park) <br>\n",
    "Updated by: Javier Sanchez, Yao-Yuan Mao, Patricia Larsen <br>\n",
    "Last Verified to Run: 2024-Mar-11 by Patricia Larsen\n",
    "\n",
    "**Notes:**\n",
    "- Follow this [step-by-step guide](https://confluence.slac.stanford.edu/x/Xgg4Dg) if you don't know how to run this notebook.\n",
    "- If you need more information about the Generic Catalog Reader (GCR), see [this diagram](https://github.com/yymao/generic-catalog-reader/blob/master/README.md#concept) and [more examples](https://github.com/LSSTDESC/gcr-catalogs/blob/master/examples/GCRCatalogs%20Demo.ipynb).\n",
    "\n",
    "## Learning objectives\n",
    "After completing and studying this Notebook, you should be able to:\n",
    "  1. Use GCR to load object catalog and truth catalog\n",
    "  2. Use `filters` and `native_filters` appropriately\n",
    "  3. Use `add_derived_quantity`\n",
    "  4. Use `FoFCatalogMatching` to do Friends-of-friends catalog matching\n",
    "  5. Learn some cool Numpy tricks for binning, masking, and reshaping [Advanced]\n",
    "  6. Learn use pandas to match truth catalog object id back to the galaxy id in extragalactic catalog [advanced]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import healpy as hp\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from astropy.coordinates import SkyCoord\n",
    "import pandas as pd\n",
    "\n",
    "import FoFCatalogMatching\n",
    "import GCRCatalogs\n",
    "from GCRCatalogs import GCRQuery\n",
    "from GCRCatalogs.dc2_truth_match import _flux_to_mag as flux_to_mag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# load object catalog (for a single tract)\n",
    "object_cat = GCRCatalogs.load_catalog('dc2_object_run2.2i_dr6')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Let's first visually inspect the footprint of a few tracts of the object catalog.\n",
    "# When `return_iterator` is turned on, the method `get_quantities` will return an \n",
    "# iterator, and each element in the iterator will be the quantities we requested in \n",
    "# different chunks of the dataset. \n",
    "\n",
    "# For object catalogs, the different chunks happen to be different tracts, \n",
    "# resulting in a different color for each tract in the scatter plot below.\n",
    "\n",
    "for object_data in object_cat.get_quantities(['ra', 'dec'], native_filters=['tract >= 3445', 'tract < 3448'], return_iterator=True):\n",
    "    plt.scatter(object_data['ra'], object_data['dec'], s=1, rasterized=True);\n",
    "\n",
    "plt.xlabel('RA');\n",
    "plt.ylabel('Dec');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Let's also define a magnitude cut\n",
    "mag_filters = [\n",
    "    (np.isfinite, 'mag_i'),\n",
    "    'mag_i < 24.5',\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# let's add total ellipticity for later use (not needed for now)\n",
    "object_cat.add_derived_quantity('shape_hsm_regauss_etot', np.hypot, \n",
    "                                'ext_shapeHSM_HsmShapeRegauss_e1', 'ext_shapeHSM_HsmShapeRegauss_e2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Load ra and dec from object, using both of the filters we just defined.\n",
    "object_data = object_cat.get_quantities(['ra', 'dec', 'mag_i_cModel', 'shape_hsm_regauss_etot',\n",
    "                                         'extendedness', 'blendedness'],\n",
    "                                        filters=(mag_filters), native_filters=['tract == 3447'])\n",
    "\n",
    "# Convert to pandas dataframe\n",
    "object_data = pd.DataFrame(object_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Let's now turn to the truth catalog. Here we just append galaxies and stars; \n",
    "# however, truth catalogs are also available in GCRCatalogs and PostgreSQL.\n",
    "truth_cat = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_image')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Load the stars\n",
    "stars_cat = GCRCatalogs.load_catalog(\"dc2_run2.2i_truth_star_summary\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "max_ra = np.nanmax(object_data['ra'])\n",
    "min_ra = np.nanmin(object_data['ra'])\n",
    "max_dec = np.nanmax(object_data['dec'])\n",
    "min_dec = np.nanmin(object_data['dec'])\n",
    "pos_filters = [f'ra >= {min_ra}',f'ra <= {max_ra}', f'dec >= {min_dec}', f'dec <= {max_dec}']\n",
    "\n",
    "vertices = hp.ang2vec(np.array([min_ra, max_ra, max_ra, min_ra]),\n",
    "                      np.array([min_dec, min_dec, max_dec, max_dec]), lonlat=True)\n",
    "ipix = hp.query_polygon(32, vertices, inclusive=True)\n",
    "healpix_filter = GCRQuery((lambda h: np.isin(h, ipix, True), \"healpix_pixel\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# get ra and dec from truth catalog\n",
    "# note that we add i < 25 to the native filter to speed up load time\n",
    "truth_mag_filters = ['mag_i < 24.7']\n",
    "\n",
    "quantities = ['galaxy_id', 'ra', 'dec', 'mag_i', 'redshift']\n",
    "truth_data = truth_cat.get_quantities(quantities, filters=truth_mag_filters+pos_filters, \n",
    "                                      native_filters=healpix_filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "if not stars_cat.has_quantity(\"mag_i\"):\n",
    "    stars_cat.add_derived_quantity(\"mag_i\", flux_to_mag, \"flux_i\")\n",
    "quantities = ['id', 'ra', 'dec', 'mag_i']\n",
    "stars_data = stars_cat.get_quantities(quantities, filters=pos_filters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Adjust the tables of galaxies and stars so that we can merge them into one table\n",
    "\n",
    "truth_data = pd.DataFrame(truth_data)\n",
    "truth_data[\"star\"] = False\n",
    "truth_data = truth_data.rename(columns={\"galaxy_id\": \"id\"})\n",
    "\n",
    "stars_data = pd.DataFrame(stars_data)\n",
    "stars_data[\"star\"] = True\n",
    "stars_data[\"id\"] = stars_data[\"id\"].astype(np.int64)\n",
    "\n",
    "truth_data_all = pd.concat([truth_data, stars_data], ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "plt.scatter(truth_data_all['ra'][::100], truth_data_all['dec'][::100], s=0.1)\n",
    "plt.scatter(object_data['ra'][::100], object_data['dec'][::100], s=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# now we can really do the matching!\n",
    "# FoFCatalogMatching.match takes a dictionary of catalogs to match, a friends-of-friends linking length. \n",
    "\n",
    "# This cell may take a few minutes to run.\n",
    "\n",
    "results = FoFCatalogMatching.match(\n",
    "    catalog_dict={'truth': truth_data_all, 'object': object_data},\n",
    "    linking_lengths=1.0, # Linking length of 1 arcsecond, you can play around with the values!\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# now we want to count the number of truth and object objects *for each group*\n",
    "# but instead of looping over groups, we can do this in a smart (and very fast) way\n",
    "\n",
    "# first we need to know which rows are from the truth catalog and which are from the object\n",
    "truth_mask = results['catalog_key'] == 'truth'\n",
    "object_mask = ~truth_mask\n",
    "\n",
    "# then np.bincount will give up the number of id occurrences (like historgram but with integer input)\n",
    "n_groups = results['group_id'].max() + 1\n",
    "n_truth = np.bincount(results['group_id'][truth_mask], minlength=n_groups)\n",
    "print(n_truth[n_truth>10])\n",
    "n_object = np.bincount(results['group_id'][object_mask], minlength=n_groups)\n",
    "\n",
    "# now n_truth and n_object are the number of truth/object objects in each group\n",
    "# we want to make a 2d histrogram of (n_truth, n_object). \n",
    "n_max = max(n_truth.max(), n_object.max()) + 1\n",
    "hist_2d = np.bincount(n_object * n_max + n_truth, minlength=n_max*n_max).reshape(n_max, n_max)\n",
    "\n",
    "plt.imshow(np.log10(hist_2d+1), extent=(-0.5, n_max-0.5, -0.5, n_max-0.5), origin='lower');\n",
    "plt.xlabel('Number of truth objects');\n",
    "plt.ylabel('Number of object objects');\n",
    "plt.colorbar(label=r'$\\log(N_{\\rm groups} \\, + \\, 1)$');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Let's further inspect the objects in the groups that have 1-to-1 truth/object match.\n",
    "\n",
    "# first, let's find our the IDs of the groups that have 1-to-1 truth/object match:\n",
    "one_to_one_group_mask = np.in1d(results['group_id'], np.flatnonzero((n_truth == 1) & (n_object == 1)))\n",
    "\n",
    "# and then we can find the row indices in the *original* truth/object catalogs for those 1-to-1 groups\n",
    "truth_idx = results['row_index'][one_to_one_group_mask & truth_mask]\n",
    "object_idx = results['row_index'][one_to_one_group_mask & object_mask]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "truth_sc = SkyCoord(truth_data_all['ra'][truth_idx], truth_data_all['dec'][truth_idx], unit=\"deg\")\n",
    "object_sc = SkyCoord(object_data['ra'][object_idx], object_data['dec'][object_idx], unit=\"deg\")\n",
    "\n",
    "delta_ra = (object_sc.ra.arcsec - truth_sc.ra.arcsec) * np.cos(np.deg2rad(0.5*(truth_sc.dec.deg + object_sc.dec.deg)))\n",
    "delta_dec = object_sc.dec.arcsec - truth_sc.dec.arcsec\n",
    "delta_arcsec = object_sc.separation(truth_sc).arcsec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(7.3, 6))  # Pick a figuresize that will result in a square equal-axis plus colorbar\n",
    "plt.hist2d(delta_ra, delta_dec, bins=40, range=((-0.5, +0.5), (-0.5, +0.5)), norm=mpl.colors.LogNorm(), cmap=\"gray_r\");\n",
    "plt.xlabel(r'$\\Delta$ RA [arcsec]');\n",
    "plt.ylabel(r'$\\Delta$ Dec [arcsec]');\n",
    "plt.colorbar();\n",
    "plt.xlim(-0.5, +0.5)\n",
    "plt.ylim(-0.5, +0.5)\n",
    "plt.axis('equal');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#Plotting Delta angle for the outputs\n",
    "plt.hist(delta_arcsec, bins=80);\n",
    "plt.xlim(0, 0.4);\n",
    "plt.xlabel(r'$\\Delta$ angle [arcsec]');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# merge the matched objects\n",
    "truth_matched = truth_data_all.iloc[truth_idx].reset_index(drop=True)\n",
    "object_matched = object_data.iloc[object_idx].reset_index(drop=True)\n",
    "matched = pd.merge(truth_matched, object_matched, left_index=True, right_index=True, suffixes=('_truth', '_object'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Select only those truth objects that are galaxies which were not sprinkled\n",
    "# (stars and sprinkled objects do not occur in the extragalactic catalog)\n",
    "matched_gals = matched.query('~star')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# load redshift and ellipticity from the extragalactic catalog, only for galaxies that are already in `matched_gals`\n",
    "extragalactic_data = truth_cat.get_quantities(\n",
    "    ['galaxy_id', 'mag_i_lsst', 'ellipticity_true'],\n",
    "    filters=[(lambda x: np.isin(x, matched_gals['id'].values, True), 'galaxy_id')]+truth_mag_filters+pos_filters,\n",
    "    native_filters=healpix_filter,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# merge extragalactic_data to matched_gals\n",
    "matched_gals = pd.merge(matched_gals, pd.DataFrame(extragalactic_data), 'left', left_on='id', right_on='galaxy_id')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# compare the magnitude\n",
    "plt.figure(figsize=(5,5));\n",
    "plt.scatter(matched_gals['mag_i_lsst'], matched_gals['mag_i_cModel'], s=0.1);\n",
    "lims = [14, 25]\n",
    "plt.plot(lims, lims, c='k', lw=0.5);\n",
    "plt.xlabel('extragalactic $i$-mag');\n",
    "plt.ylabel('object $i$-mag (cModel)');\n",
    "plt.xlim(lims);\n",
    "plt.ylim(lims);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# compare the ellipticity (naively -- see below for further discussion)\n",
    "plt.figure(figsize=(5,5));\n",
    "plt.scatter(matched_gals['ellipticity_true'], matched_gals['shape_hsm_regauss_etot'], s=0.1);\n",
    "lims = [0, 1]\n",
    "plt.plot(lims, lims, c='k', lw=0.5);\n",
    "plt.xlabel('extragalactic ellipticity');\n",
    "plt.ylabel('object ellipticity');\n",
    "plt.xlim(lims);\n",
    "plt.ylim(lims);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ellipticity comparison plot above is quite surprising. \n",
    "It seems that the ellipticities in the object catalog are generally higher (i.e., less round) than those in the extragalactic catalog. \n",
    "\n",
    "The quantity `shape_hsm_regauss_etot` that we used for the object catalog are the re-Gaussianization shapes, which are PSF corrected, and they could be either rounder (if the correction was an under-correction) or less round (if the correction was an over-correction). Hence, their value being systematically larger than the \"truth\" from extragalactic catalog seems problematic. \n",
    "\n",
    "Before we panic, we should, however, remind ourselves of the definition of ellipticities used in these catalogs. \n",
    "For the extragalactic catalog, ellipticity is defined as $(1-q)/(1+q)$, where $q$ is the minor-to-major axis ratio\n",
    "(see the [SCHEMA](https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/SCHEMA.md#schema-for-extragalatic-catalogs)). \n",
    "On the other hand, for the object catalog, the HSM re-Gaussianization ellipticity that we are using is defined as $(1-q^2)/(1+q^2)$\n",
    "(see e.g., Eq. 8 of [Mandelbaum et al. 2006](https://arxiv.org/abs/astro-ph/0511164)).\n",
    "\n",
    "Hence their definitions are in fact different, so we need to do a conversion before we compare them.\n",
    "With some math, we can find the conversion between the two definitions $e_{\\rm HSM~def} = \\frac{2e_{\\rm EGC~def}}{1+e_{\\rm EGC~def}^2}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# compare the ellipticity (smartly)\n",
    "ellipticity_conversion = lambda e: 2*e / (1.0+e*e)\n",
    "plt.figure(figsize=(5,5));\n",
    "plt.scatter(ellipticity_conversion(matched_gals['ellipticity_true']), matched_gals['shape_hsm_regauss_etot'], s=0.1);\n",
    "lims = [0, 1]\n",
    "plt.plot(lims, lims, c='k', lw=0.5);\n",
    "plt.xlabel('extragalactic ellipticity (in object def.)');\n",
    "plt.ylabel('object ellipticity');\n",
    "plt.xlim(lims);\n",
    "plt.ylim(lims);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This looks much better now! \n",
    "\n",
    "When you were checking the [SCHEMA](https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/SCHEMA.md#schema-for-extragalatic-catalogs)) file,\n",
    "you probably have also noticed that `ellipticity_true` is the ellipticity before the shear is applied (i.e., unlensed). \n",
    "Hence this comparison is still not an apples-to-apples comparison, as the ellipticity in the object catalog is, of course, lensed. \n",
    "\n",
    "According to the [SCHEMA](https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/SCHEMA.md#schema-for-extragalatic-catalogs)), we should have been using `ellipticity` from the extragalactic catalog.\n",
    "But unfortunately, this quantity is not directly available from the extragalactic catalog!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def calc_lensed_ellipticity(es1, es2, gamma1, gamma2, kappa):\n",
    "    gamma = gamma1 + gamma2*1j # shear (as a complex number)\n",
    "    es = es1 + es2*1j # intrinsic ellipticity (as a complex number)\n",
    "    g = gamma / (1.0 - kappa) # reduced shear\n",
    "    e = (es + g) / (1.0 + g.conjugate()*es) # lensed ellipticity\n",
    "    return np.absolute(e)\n",
    "    \n",
    "truth_cat.add_derived_quantity('ellipticity', calc_lensed_ellipticity, \n",
    "                                       'ellipticity_1_true', 'ellipticity_2_true', 'shear_1', 'shear_2', 'convergence')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Now let's get the newly defined ellipticity and add to our merged pandas data frame:\n",
    "extragalactic_data_more = truth_cat.get_quantities(\n",
    "    ['galaxy_id', 'ellipticity'],\n",
    "    filters=[(lambda x: np.isin(x, matched_gals['id'].values, True), 'galaxy_id')], native_filters=healpix_filter,\n",
    ")\n",
    "\n",
    "matched_gals = pd.merge(matched_gals, pd.DataFrame(extragalactic_data_more), 'left', left_on='id', right_on='galaxy_id')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Now we compare the ellipticity again (and don't forget the definition conversion!)\n",
    "ellipticity_conversion = lambda e: 2*e / (1.0+e*e)\n",
    "plt.figure(figsize=(5,5));\n",
    "plt.scatter(ellipticity_conversion(matched_gals['ellipticity']), matched_gals['shape_hsm_regauss_etot'], s=0.1);\n",
    "lims = [0, 1]\n",
    "plt.plot(lims, lims, c='k', lw=0.5);\n",
    "plt.xlabel('extragalactic ellipticity (lensed, in object def.)');\n",
    "plt.ylabel('object ellipticity');\n",
    "plt.xlim(lims);\n",
    "plt.ylim(lims);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course the lensing effect is very small, so the unlensed ellipticity (`ellipticity_true`) and the lensed one (`ellipticity`) do not differ much to eye."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "desc-python",
   "language": "python",
   "name": "desc-python"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}