{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Accessing the Coadd data using the Data Butler\n",
"Owner: **Jim Chiang** ([@jchiang87](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@jchiang87))\n",
"\n",
"Last Verified to Run: **2024-03-01** (by @fjaviersanchez)\n",
"\n",
"This notebook shows how to read in the coadd catalog data for an entire tract using the LSST Data Management (DM) data butler, and how to construct a Pandas DataFrame with the columns needed to apply the weak-lensing cuts used in the Hyper Suprime-Cam analysis of [Mandelbaum et al (2018)](https://arxiv.org/abs/1705.06745), following the selection given by [Francois in Slack](https://lsstc.slack.com/archives/C77DDKZHR/p1525197444000687).\n",
"\n",
"We have a [notebook that performs a similar analysis using GCR](https://github.com/LSSTDESC/DC2-analysis/blob/master/Notebooks/DC2%20Coadd%20Run1.1p%20GCR%20access%20--%20HSC%20selection.ipynb). See also [the more general script](https://github.com/LSSTDESC/DC2-analysis/blob/master/scripts/merge_tract_cat.py) for extracting the coadd data from a tract into an hdf5 file.\n",
"\n",
"### Learning Objectives\n",
"After completing and studying this Notebook, you should be able to\n",
"1. Read in the coadd catalog data using the DM Butler\n",
"2. Construct a Pandas data frame\n",
"3. Apply the HSC weak lensing cuts to a catalog of DM-processed data.\n",
"\n",
"## Set Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import lsst.daf.persistence as dp\n",
"\n",
"from desc_dc2_dm_data import REPOS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How To Read DM Science Pipelines Coadd Catalogs into a Pandas Dataframe\n",
"\n",
"Here's a function to do this: pass it a butler and some information about what you want, and it returns a pandas DataFrame. In this example, we'll get the forced photometry as well as the model measurements; these are stored in different tables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"REPOS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def read_tract(butler, tract, filter_='i', num_patches=None):\n",
" \"\"\"\n",
" Read in the coadd catalog data from the specified tract, looping over patches and concatenating\n",
" into a pandas data frame. \n",
" \n",
" Notes\n",
" -----\n",
" The `merged` coadd catalog is supplemented with the CModel fluxes from the `forced_src` \n",
" catalog for the desired filter.\n",
"\n",
" Parameters\n",
" ----------\n",
"    butler : lsst.daf.persistence.Butler\n",
"        Data butler pointing to the data repository of interest, e.g., the output repository\n",
"        for the Run1.1p data.\n",
"    tract : lsst.skymap.tractInfo.TractInfo\n",
"        Tract information as obtained from a skyMap object, e.g., via\n",
"        `tract = skymap[tractId]`, where `tractId` is the tract number.\n",
"    filter_ : str, optional\n",
"        Filter to use for extracting the `forced_src` fluxes. Default: 'i'.\n",
"    num_patches : int, optional\n",
"        Number of patches to consider. If None (default), use all available patches in the tract.\n",
"\n",
" Returns\n",
" -------\n",
"    pandas.DataFrame\n",
"        The merged coadd data with the band-specific columns added.\n",
" \"\"\"\n",
" if num_patches is None:\n",
" num_patches = len(tract)\n",
" \n",
" df_list = []\n",
" i = 0\n",
" print(\"Loop Patch Nobj\")\n",
" print(\"=================\")\n",
" for patch in tract:\n",
" patchId = '%d,%d' % patch.getIndex()\n",
" dataId = dict(filter=filter_, tract=tract.getId(), patch=patchId)\n",
"\n",
" # Patches for which there is no data will raise a `NoResults` exception when requested\n",
" # from the butler. We catch those exceptions, note the missing data, and \n",
" # continue looping over the patches.\n",
" try:\n",
" forced = butler.get('deepCoadd_forced_src', dataId)\n",
" calib = butler.get('deepCoadd_calexp_photoCalib', dataId)\n",
" merged = butler.get('deepCoadd_ref', dataId)\n",
" except dp.NoResults as eobj:\n",
" print(eobj)\n",
" continue\n",
"\n",
" # Convert the merged coadd catalog into a pandas DataFrame and add the band-specific\n",
" # quantities from the forced_src coadd catalog as new columns.\n",
" data = merged.asAstropy().to_pandas()\n",
" data[filter_ + '_modelfit_CModel_instFlux'] = forced['modelfit_CModel_instFlux']\n",
" data[filter_ + '_modelfit_CModel_instFluxErr'] = forced['modelfit_CModel_instFluxErr']\n",
"\n",
" # The calib object applies the zero point to get calibrated magnitudes.\n",
" \n",
" _aux = calib.instFluxToMagnitude(forced, 'modelfit_CModel')\n",
" data[filter_ + '_mag_CModel'] = _aux[:, 0]\n",
" data[filter_ + '_mag_err_CModel'] = _aux[:, 1]\n",
" data[filter_ + '_modelfit_CModel_SNR'] \\\n",
" = forced['modelfit_CModel_instFlux']/forced['modelfit_CModel_instFluxErr']\n",
" data['ext_shapeHSM_HsmShapeRegauss_abs_e'] \\\n",
" = np.hypot(data['ext_shapeHSM_HsmShapeRegauss_e1'],\n",
" data['ext_shapeHSM_HsmShapeRegauss_e2'])\n",
" df_list.append(data)\n",
"\n",
" # Print the current patch and number of objects added to the final DataFrame.\n",
" print(\" {} {} {}\".format(i, patchId, len(data)))\n",
" i += 1\n",
" if i == num_patches:\n",
" break\n",
" return pd.concat(df_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Accessing the DM Science Pipelines processed data\n",
"\n",
"In order to access the DM Science Pipelines processed outputs, including images and catalog data, we summon a data butler and point it at the data repository where the DM Science Pipelines tasks have written their outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"butler = dp.Butler(REPOS['2.2i_dr6'])"
]
},
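{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before reading any catalogs, it can help to confirm that the repository contains the expected coadd products. A minimal sanity check, using the Gen2 butler's `datasetExists` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Sanity check: confirm that the sky map dataset exists in this repository.\n",
"print(butler.datasetExists('deepCoadd_skyMap'))"
]
},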
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `skyMap` object stores the information defining how we divided up the sky into tracts and patches when we did the coaddition. Here we pick a tract near the center of the simulated region."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"skymap = butler.get('deepCoadd_skyMap')\n",
"tractId = 4851\n",
"filter_ = 'i'\n",
"num_patches = None\n",
"data = read_tract(butler, skymap[tractId], filter_=filter_, num_patches=num_patches)\n",
"print(\"Number of objects in the coadd catalog:\", len(data))"
]
},
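{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before applying any cuts, it can be useful to inspect the band-specific columns that `read_tract` added to the merged catalog:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# List the CModel-related columns added to the merged catalog.\n",
"print([col for col in data.columns if 'CModel' in col])"
]
},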
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making the Selection\n",
"\n",
"First off, we need to apply a baseline set of cuts to remove NaNs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"base_mask = ~(np.isnan(data['i_modelfit_CModel_instFlux'])\n",
" | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_resolution'])\n",
" | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_e1']))\n",
"data = data[base_mask]\n",
"print(\"Number of objects after baseline cuts:\", len(data))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we get to the weak lensing cuts. The `detect_isPrimary` flag identifies the primary object in the overlap regions between patches, i.e., applying it resolves duplicates from the different analyses of overlapping patches."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mask = data['detect_isPrimary']\n",
"\n",
"# Reject objects with deblending, pixel, or shape-measurement flags set.\n",
"mask &= ~data['deblend_skipped']\n",
"mask &= ~data['base_PixelFlags_flag_edge']\n",
"mask &= ~data['base_PixelFlags_flag_interpolatedCenter']\n",
"mask &= ~data['base_PixelFlags_flag_saturatedCenter']\n",
"mask &= ~data['base_PixelFlags_flag_crCenter']\n",
"mask &= ~data['base_PixelFlags_flag_bad']\n",
"mask &= ~data['base_PixelFlags_flag_suspectCenter']\n",
"mask &= ~data['base_PixelFlags_flag_clipped']\n",
"mask &= ~data['ext_shapeHSM_HsmShapeRegauss_flag']\n",
"\n",
"# Cut on measured object properties\n",
"mask &= data['i_modelfit_CModel_SNR'] >= 10\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_resolution'] >= 0.3\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_abs_e'] < 2\n",
"mask &= data['ext_shapeHSM_HsmShapeRegauss_sigma'] <= 0.4\n",
"mask &= data['i_mag_CModel'] < 24.5   # Note: no extinction correction has been applied\n",
"mask &= data['base_Blendedness_abs'] < 10**(-0.375)"
]
},
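{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the selection: count how many objects survive all of the cuts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Summarize the weak lensing selection: number and fraction of objects passing.\n",
"n_total = len(mask)\n",
"n_pass = int(mask.sum())\n",
"print('Objects passing WL cuts: {} / {} ({:.1%})'.format(n_pass, n_total, n_pass / n_total))"
]
},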
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make some plots of galaxy properties with the above cuts applied. The figure to compare with is Fig. 22 of [Mandelbaum et al (2018)](https://arxiv.org/abs/1705.06745)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(10, 10))\n",
"hist_kwds = dict(bins=100, histtype='step', density=True, linewidth=2.0, color=\"black\")\n",
"p1 = fig.add_subplot(2, 2, 1)\n",
"plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'][mask], **hist_kwds);\n",
"plt.xlabel('{}-band HSM resolution'.format(filter_))\n",
"plt.xlim([0.2,1])\n",
"p2 = fig.add_subplot(2, 2, 2)\n",
"plt.hist(data[filter_ + '_modelfit_CModel_SNR'][mask], range=(0, 100), **hist_kwds);\n",
"plt.xlabel('{}-band CModel S/N'.format(filter_))\n",
"plt.xlim([5,100])\n",
"p3 = fig.add_subplot(2, 2, 3)\n",
"plt.hist(data[filter_ + '_mag_CModel'][mask], **hist_kwds);\n",
"plt.xlabel(' {}-band CModel mag'.format(filter_))\n",
"plt.xlim([20,25])\n",
"p4 = fig.add_subplot(2, 2, 4)\n",
"plt.hist(data['ext_shapeHSM_HsmShapeRegauss_abs_e'][mask], **hist_kwds);\n",
"plt.xlabel('HSM distortion magnitude |e|')\n",
"plt.xlim([0,2]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's the published figure, for comparison: \n",
"\n",
"![](https://user-images.githubusercontent.com/710903/42742471-a196979e-886f-11e8-81f6-a89e425f9be3.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "desc-stack",
"language": "python",
"name": "desc-stack"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}