{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DC2 Object Catalog Run2.2i GCR tutorial -- Part II: Lensing Cuts\n", "\n", "\n", "Owners: **Francois Lanusse [@EiffL](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@EiffL), Javier Sanchez [@fjaviersanchez](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@fjaviersanchez)** \n", "Last Verified to Run: **2020-07-02** (by @jrbogart)\n", "\n", "This notebook is the second part of the tutorial ([Part I here](object_gcr_1_intro.ipynb)). Here, we present more advanced features of the DPDD-like object catalog, which contains the detected objects at the coadd level, through the Generic Catalog Reader [GCR](https://github.com/yymao/generic-catalog-reader) and build a lensing sample which we compare to the published [HSC Y1 sample](https://hsc-release.mtk.nao.ac.jp/doc/). The final goal is to show how the different filtering mechanisms, features of `GCR`, and the samples presented here can be useful for your science.\n", "\n", "__Learning objectives__:\n", "\n", "After going through this notebook, you should be able to:\n", " 1. Select a clean sample of galaxies for weak lensing.\n", " 2. Compute a depth map for a given SNR threshold.\n", " 3. Load both the DC2 Run2.2i catalog and a selection from the HSC Public Data Release XMM field.\n", " 4. Directly access the coadd images to investigate suspicious objects.\n", "\n", "__Logistics__: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter-dev.nersc.gov. To setup your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter-dev+at+NERSC\n", "\n", "__Other notes__: This notebook uses the [LSST DM stack](https://pipelines.lsst.io/) (see on the top right corner that we are using the `desc-stack` kernel)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting a useful sample of galaxies: lensing cuts from HSC DR1\n", "\n", "In this notebook, we will build step by step a sample of galaxies from the DC2 run2.2i object catalog, and compare it to an equivalent sample built from the HSC DR1 catalog. See more info on [Aihara et al. 2018](http://adsabs.harvard.edu/abs/2018PASJ...70S...8A)\n", "\n", "### Sample selection\n", "\n", "We will start from a set of basic sanity cuts that will select extended objects and reject problematic sources, including those for which shape measurement has failed.\n", "\n", "One subtlety is that shape measurement is only run for the *reference band*, which is most of the time the i-band, but not always, we will further restrict the sample to objects for which we have i-band shapes using the `merge_measurement_i` flag." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GCRCatalogs\n", "from GCR import GCRQuery\n", "# Load the object catalog\n", "run = 'run2.2i_dr3'\n", "tract_s = '3081'\n", "catalog = GCRCatalogs.load_catalog(f'dc2_object_{run}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "basic_cuts = [\n", " GCRQuery('extendedness > 0'), # Extended objects\n", " GCRQuery((np.isfinite, 'mag_i')), # Select objects that have i-band magnitudes\n", " GCRQuery('clean'), # The source has no flagged pixels (interpolated, saturated, edge, clipped...) \n", " # and was not skipped by the deblender\n", " GCRQuery('xy_flag == 0'), # Flag for bad centroid measurement\n", " GCRQuery('ext_shapeHSM_HsmShapeRegauss_flag == 0'), # Error code returned by shape measurement code\n", " GCRQuery((np.isfinite, 'ext_shapeHSM_HsmShapeRegauss_sigma')), # Shape measurement uncertainty should not be NaN\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to these basic cuts, we will want to apply a set of cuts based on object properties, to ensure we are selecting well resolved and well measured galaxies. One of these properties is the measured total distortion, which is not directly defined in the schema, but can be derived from the measured $e1$, $e2$ distortion components according to $|e| = \\sqrt{e_1^2 + e_2^2 }$\n", "\n", "The GCR provides a convenience function, `add_quantity_modifier`, to add this quantity to the schema on the fly, so that we can use it afterwards to build our cuts:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adds the new derived column \n", "catalog.add_quantity_modifier('shape_hsm_regauss_etot', \n", " (np.hypot, 'ext_shapeHSM_HsmShapeRegauss_e1', 'ext_shapeHSM_HsmShapeRegauss_e2'), \n", " overwrite=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define lensing cuts on galaxy properties \n", "properties_cuts = [\n", " GCRQuery('snr_i_cModel > 10'), # SNR > 10\n", " GCRQuery('mag_i_cModel < 24.5'), # cModel imag brighter than 24.5\n", " GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF\n", " GCRQuery('shape_hsm_regauss_etot < 2'), # Total distortion in reasonable range\n", " GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'), # Shape measurement errors reasonable\n", "]\n", "\n", "# We can now extract our lensing sample \n", "quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']\n", "data_basic = catalog.get_quantities(quantities, \n", " filters=basic_cuts+properties_cuts, \n", " native_filters=[f'tract == {tract_s}'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10,10))\n", "plt.subplot(221)\n", "plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True);\n", "plt.xlabel('i-band resolution')\n", "plt.xlim()\n", "plt.subplot(222)\n", "plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True)\n", "plt.xlabel('i-band cmodel S/N')\n", "plt.subplot(223)\n", "plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True);\n", "plt.xlabel('i-band cmodel mag')\n", "plt.subplot(224)\n", "plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True);\n", "plt.xlabel('Ellipticity magniture |e|');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can compare this plot to Fig 12. in [Mandelbaum et al. 2018](http://adsabs.harvard.edu/abs/2018PASJ...70S..25M):\n", "\n", "\n", "A quick visual comparison will highlight two things:\n", "\n", " - We are missing a lot of galaxies between about 24.2 and 24.5 mag\n", " - We have a bump near resolution of 1\n", "\n", "We are going to investigate to understand these differences below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Looking at the depth of the survey" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One obvious reason why we would be missing some faint galaxies is if DC2 run2.2i is shallower than HSC. We will test this here by measuring the depths of both Run2.2i and the HSC DR1 XMM field. More generally, it is also a concern for most science analysis to have spatially uniform sampled data, which can be checked by looking at the depth of the sample. \n", "\n", "There are several ways to do this, in this case, we are going to check what's the magnitude which has a median SNR closest to 10, the SNR cut of our lensing sample.\n", "\n", "Code to do this calculation for the HSC DR1 XMM field is commented out below because that catalog is not currently available, but we know the answer from previous runs. That is printed out for comparison with the calculated number for DC2 run2.2i" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import an efficient alternative to binned_statistic_2d, defined in utils/cic.py\n", "from utils.cic import binned_statistic\n", " \n", "import healpy as hp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def depth_map_snr (ra, dec, mags, snr,snr_threshold=10,nside=2048):\n", " \"\"\"\n", " Constructs a depth map on a healpix grid for a given SNR threshold.\n", " \n", " Parameters\n", " ----------\n", " ra, dec: Array of coordinates on the sky (in deg.)\n", " mags, snr : measured magnitude and snr for the sample\n", " snr_threshold: SNR\n", " \"\"\"\n", " # Remove potentially problematic entries\n", " good = np.logical_or(np.logical_not(np.isnan(ra)),np.logical_not(np.isnan(dec)))\n", " # Create array of healpix pixel indices corresponding to coordinates \n", " pix_nums = hp.ang2pix(nside,np.pi/2.-dec[good]*np.pi/180,ra[good]*np.pi/180)\n", " \n", " # Create output map\n", " map_out = np.zeros(12*nside**2)\n", " \n", " # Bins in magnitudes\n", " bin_centers = np.linspace(22+6/30.,28-6/30.,30)\n", " \n", " # For each healpix pixel\n", " for px in np.unique(pix_nums):\n", " # Select all objects within this pixel\n", " mask = px==pix_nums\n", " if np.count_nonzero(mask)>0:\n", " # Compute median snr in bins of magnitude\n", " median_snr = binned_statistic(mags[mask],snr[mask],np.nanmedian,nbins=30,range=(22,28))\n", " mask2 = np.isnan(median_snr)==False\n", " # Find magnitude corresponding to snr threshold\n", " if np.count_nonzero(mask2)>0:\n", " depth = bin_centers[mask2][np.argmin(np.fabs(median_snr[mask2] - snr_threshold))]\n", " map_out[px]=depth\n", " else:\n", " map_out[px]=0\n", " else:\n", " map_out[px]=0.\n", " return map_out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "quantities = ['ra', 'dec', 'mag_i_cModel', 'snr_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution']\n", "\n", "# Data from DC2 run2.2i\n", "data_dc2 = catalog.get_quantities(quantities, \n", " filters=basic_cuts, # Note the only apply the basic_cuts\n", " native_filters=[f'tract == {tract_s}'])\n", "\n", "# Data from HSC DR1 XMM field \n", "# cat_hsc = GCRCatalogs.load_catalog('hsc-pdr1-xmm')\n", "# data_hsc = cat_hsc.get_quantities(quantities,\n", "# filters=basic_cuts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Note__: Runtime warning will arise due to objects with incorrect fluxes but the results are unaffected by them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute i-band depth maps for Run2.2i, print median\n", "m10map_dc2 = depth_map_snr(data_dc2['ra'], data_dc2['dec'], data_dc2['mag_i_cModel'], data_dc2['snr_i_cModel'])\n", "print(f\"{run} median i-band 10-sigma depth \", np.median(m10map_dc2[m10map_dc2 > 0.0]))\n", "\n", "# Print the corresponding value for the HSC data, computed previously\n", "print(\"HSC XMM median i-band 10-sigma depth \", 25.2896551722413795)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "import gzip\n", "def save_map(outfile, depth_map):\n", " f = gzip.open(outfile, 'wb')\n", " pickle.dump(depth_map, f)\n", " f.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Uncomment the following to save the depth map. It's used in the next tutorial\n", "# There is no need to save unless you've changed run or tract\n", "# map_dir = 'your_choice_goes_here'\n", "# save_map(f'{map_dir}/m10map_{run}_{tract_s}.pklz', m10map_dc2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we see the difference between the two samples, HSC is deeper. (In the case of Run 1.1p, this is partly due to the fact that it was interrupted mid-run so it doesn't reach the full depth of DC2. For Run2.2i_dr3, about 500 visits in the center of the field were omitted. If we had picked a tract in this deficit area the plots and the depth measurement would look considerably worse.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Impact of blendedness\n", " \n", "The second discrepancy between our sample and the Mandelbaum et al. plot is an excess of galaxies appearing very well resolved compared to the PSF (resolution > 0.9). To understand this difference, we are going to select a few of these objects and extract postage stamps from the DM stack for visual inspection.\n", "\n", "We begin by selecting galaxies in our lensing sample which are near perfectly resolved by adding a cut on resolution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_cut = basic_cuts + properties_cuts + ['ext_shapeHSM_HsmShapeRegauss_resolution >= 0.98']\n", "\n", "data = catalog.get_quantities(['ra', 'dec', 'mag_i_cModel', 'ext_shapeHSM_HsmShapeRegauss_resolution'], \n", " filters=sample_cut,\n", " native_filters=['tract == 3081'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will extract a few postage stamps at these coordinates, to do so we will reuse some of the code from the [DC2 Postage Stamps tutorial](/DC2%20Postage%20Stamps.ipynb). Please have a look at that tutorial to fully understand the function we will be using here, but in a nustshell we are going to query the DM data Butler to retrieve cutouts of the Deep Coadd exposures of these objects in the i-band." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from desc_dc2_dm_data import REPOS\n", "\n", "import lsst.daf.persistence as dafPersist\n", "import lsst.geom as afwGeom\n", "import lsst.afw.coord as afwCoord\n", "import lsst.afw.image as afwImage\n", "import lsst.afw.display as afwDisplay\n", "\n", "from astropy.table import Table\n", "from astropy.visualization import ZScaleInterval\n", "\n", "# Please check the DC2 Postage Stamps tutorial for all the details of how this works\n", "def cutout_coadd_ra_dec(butler, ra, dec, filt='i', datasetType='deepCoadd', \n", " skymap=None, cutoutSideLength=50, **kwargs):\n", " \"\"\"Produce a cutout from a coadd at the given ra,dec coordinates\n", " \n", "\n", " Parameters\n", " --\n", " butler - lsst.daf.persistence.Butler of the data repository\n", " ra, dec - coordinates of the center of the cutout (in degrees).\n", " filter - Filter of the image to load\n", " datasetType - 'deepCoadd' Which type of coadd to load. Doesn't support 'calexp'\n", " \n", " skymap - [optional] Pass in to avoid the Butler read. Useful if you have lots of them.\n", " cutoutSideLength - [optional] Side of the cutout region in pixels.\n", " \n", " Returns\n", " --\n", " MaskedImage\n", " \"\"\"\n", " # Create a lsst.afw.geom.SpherePoint coordinates object\n", " radec = afwGeom.SpherePoint(ra, dec, afwGeom.degrees)\n", " cutoutSize = afwGeom.ExtentI(cutoutSideLength, cutoutSideLength)\n", "\n", " if skymap is None:\n", " skymap = butler.get(\"deepCoadd_skyMap\")\n", " \n", " # Retrieves the tract, patch info for these coordinates from the skymap\n", " tractInfo = skymap.findTract(radec)\n", " patchInfo = tractInfo.findPatch(radec)\n", " \n", " # Get pixel coordinates on the tract\n", " xy = afwGeom.PointI(tractInfo.getWcs().skyToPixel(radec))\n", " bbox = afwGeom.BoxI(xy - cutoutSize//2, cutoutSize)\n", "\n", " coaddId = {'tract': tractInfo.getId(), 'patch': \"%d,%d\" % patchInfo.getIndex(), 'filter': filt}\n", " \n", " cutout_image = butler.get(datasetType+'_sub', bbox=bbox, immediate=True, dataId=coaddId)\n", " \n", " return cutout_image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this tool to extract cutouts in hand, let's have a look at a few examples in our sample:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create an instance of the data butler for our run data repository\n", "repo = REPOS[f'{run}'[3:]]\n", "butler = dafPersist.Butler(repo)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(4, 4, figsize=(15,15))\n", "\n", "for ra, dec, ax_this in zip(data['ra'], data['dec'], ax.flat):\n", " # Extract the cutout using the data butler\n", " cutout_image = cutout_coadd_ra_dec(butler, ra, dec, filt='i')\n", " \n", " # Plot the postage stamp on the same scales, with some arcsinh range compression \n", " ax_this.imshow(np.arcsinh(cutout_image.image.array), vmax=4, cmap='binary');\n", " \n", " # Let's add a crosshair to guide the eye\n", " ax_this.axhline(25, color='k', alpha=0.5);\n", " ax_this.axvline(25, color='k', alpha=0.5);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### So, one thing may jump to eye, these objects are supposed to be large, extremely well resolved, yet they often look fairly small... but they have a lot of sometimes very bright neighbours. This is very suspicious and may indicate a problem with the DM deblender.\n", "\n", "Looking at the [SCHEMA.md](https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/SCHEMA.md#schema-for-dc2-coadd-catalogs) one may notice a field named `blendedness`. This is a metric produced by the DM stack defined for deblended objects and \"measures the fraction of the total flux in the neighborhood of a source that belongs to its neighbors\" [(Bosch et. 2018)](http://adsabs.harvard.edu/abs/2018PASJ...70S...5B).\n", "\n", "Let's have a look at the `blendedness` values of the examples in our sample:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.update(catalog.get_quantities(['blendedness'], \n", " filters=sample_cut,\n", " native_filters=['tract == 3081']))\n", "\n", "data['blendedness'][:16]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As one would expect, they mostly have large values, close to 1, indicating that these objects belong to very blended neighborhoods. Because the deblender seems to be failing for these objects, we can use this metric to try to exclude them from our sample. \n", "\n", "As a matter of fact, this is what was done in the ([Mandelbaum et al., 2018](http://adsabs.harvard.edu/abs/2018PASJ...70S..25M)) paper where an additional cut on blendedness was introduced to guard against deblender failures. \n", "\n", "Let's try to rebuild our sample following the same approach:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "properties_cuts = [\n", " GCRQuery('snr_i_cModel > 10'), # SNR > 10\n", " GCRQuery('mag_i_cModel < 24.5'), # cModel imag brighter than 24.5\n", " GCRQuery('ext_shapeHSM_HsmShapeRegauss_resolution >= 0.3'), # Sufficiently resolved galaxies compared to PSF\n", " GCRQuery('shape_hsm_regauss_etot < 2'), # Total distortion in reasonable range\n", " GCRQuery('ext_shapeHSM_HsmShapeRegauss_sigma <= 0.4'), # Shape measurement errors reasonable\n", " # New cut on blendedness:\n", " GCRQuery('blendedness < 10**(-0.375)') # Avoid spurious detections and those contaminated by blends\n", "]\n", "\n", "quantities = ['mag_i_cModel', 'snr_i_cModel', 'shape_hsm_regauss_etot', 'ext_shapeHSM_HsmShapeRegauss_resolution']\n", "data = catalog.get_quantities(quantities, \n", " filters=basic_cuts + properties_cuts, \n", " native_filters=['tract == 3081'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10,10))\n", "plt.subplot(221)\n", "plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='New cuts');\n", "plt.xlabel('i-band resolution')\n", "plt.xlim()\n", "plt.subplot(222)\n", "plt.hist(data['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='New cuts')\n", "plt.xlabel('i-band cmodel S/N')\n", "plt.subplot(223)\n", "plt.hist(data['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='New cuts');\n", "plt.xlabel('i-band cmodel mag')\n", "plt.subplot(224)\n", "plt.hist(data['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='New cuts');\n", "plt.xlabel('Ellipticity magniture |e|');\n", "plt.subplot(221)\n", "plt.hist(data_basic['ext_shapeHSM_HsmShapeRegauss_resolution'], 100, range=[0,1], density=True, histtype='step', label='Original cuts');\n", "plt.xlabel('i-band resolution')\n", "plt.legend(loc='best');\n", "plt.xlim()\n", "plt.subplot(222)\n", "plt.hist(data_basic['snr_i_cModel'], 100, range=[0,100], density=True, histtype='step', label='Original cuts')\n", "plt.xlabel('i-band cmodel S/N')\n", "plt.subplot(223)\n", "plt.hist(data_basic['mag_i_cModel'], 100, range=[20,25], density=True, histtype='step', label='Original cuts');\n", "plt.xlabel('i-band cmodel mag')\n", "plt.subplot(224)\n", "plt.hist(data_basic['shape_hsm_regauss_etot'],100, density=True, histtype='step', label='Original cuts');\n", "plt.xlabel('Ellipticity magniture |e|');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\o/ The excess of high resolution objects has disappeared, mystery solved!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__If you want, you can check out [Part III: Challenges](object_gcr_3_challenges.ipynb)!__" ] } ], "metadata": { "kernelspec": { "display_name": "desc-stack", "language": "python", "name": "desc-stack" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 4 }