{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using the LSST Stack tools to do positional matching on coadd and src catalogs\n",
"
Owner: **Jim Chiang** ([@jchiang87](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@jchiang87))\n",
"
Updated by: **Javier Sánchez** ([@fjaviersanchez](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@fjaviersanchez))\n",
"
Last Run: **2020-06-12**\n",
"\n",
"In this notebook, we use the data butler to retrieve catalogs from coadd and visit-level analyses of Run2.2i, and use the `lsst.afw.table.matchRaDec` function to do positional matching against galaxy truth info extracted from the cosmoDC2 v1.1.4 extragalactic catalog. To enable this, we show how to create a `SourceCatalog` object from the galaxy truth info provided by the GCR interface.\n",
"\n",
"## Set Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"from collections import namedtuple\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import lsst.afw.geom as afw_geom\n",
"import lsst.afw.table as afw_table\n",
"import lsst.daf.persistence as dp\n",
"import GCRCatalogs\n",
"import healpy as hp\n",
"import lsst.geom"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How To Create a Source Catalog\n",
"\n",
"In order to use the Stack's spatial matching code, we will need to reformat the extragalactic catalog galaxy position and magnitude information into an afw `SourceCatalog` object. These table objects are initialized by a \"schema\", which in turn is built from column definition \"Coldef\" objects. The `mag_cols` function below shows how these Coldefs can be created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_SourceCatalog(new_cols):\n",
" \"\"\"\n",
" Make a SourceCatalog to contain id and coordinates for each object, plus any new\n",
" columns.\n",
" \n",
" Parameters\n",
" ----------\n",
" new_cols: list of Coldefs\n",
" Column info for adding to an afw.table schema.\n",
"\n",
" Returns\n",
" -------\n",
" lsst.afw.table.SourceCatalog: An empty SourceCatalog with the desired schema.\n",
" \"\"\"\n",
" # The minimal schema just contains the `id`, `coord_ra`, and `coord_dec` fields.\n",
" schema = afw_table.SourceTable.makeMinimalSchema()\n",
" for coldef in new_cols:\n",
" schema.addField(coldef.name, type=coldef.type, doc=coldef.doc)\n",
" return afw_table.SourceCatalog(schema)\n",
"\n",
"\n",
"def mag_cols(bands):\n",
" \"\"\"Return column information for adding magnitude columns to an afw.table schema.\"\"\"\n",
" Coldef = namedtuple('Coldef', 'name type doc'.split())\n",
" return [Coldef(f'mag_{x}', float, f'{x}-magnitude')\n",
" for x in bands]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting Galaxies\n",
"\n",
"We are going to downselect cosmoDC2 to the sky region being considered. We'll make a general-purpose RegionSelector class, and then sub-class it for selecting objects in either CCDs or patches."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RegionSelector:\n",
" \"\"\"\n",
" Class to rotate the protoDC2 galaxies to the Run1.1p sky location and downselect those galaxies\n",
" based on a magnitude limit and on the coordinates of the subregion (i.e., patch or CCD) being\n",
" considered.\n",
" \"\"\"\n",
"\n",
" def __init__(self):\n",
" pass\n",
" \n",
" def _set_coord_range(self, bbox, wcs):\n",
" \"\"\"\n",
" Set the coordinate range of the region.\n",
" \n",
" Notes\n",
" -----\n",
" This method is used by the RegionSelector's subclasses.\n",
" \n",
" Parameters\n",
" ----------\n",
" bbox: Calexp.BBox\n",
" Defines corners of region's bounding box\n",
" wcs: Calexp.Wcs\n",
" Defines pixel to world (sky) coordinate transformation\n",
" \"\"\"\n",
" region_box = lsst.geom.Box2D(bbox)\n",
" corners = region_box.getCorners()\n",
" ra_values, dec_values = [], []\n",
" for corner in corners:\n",
" ra, dec = wcs.pixelToSky(corner)\n",
" ra_values.append(ra.asDegrees())\n",
" dec_values.append(dec.asDegrees())\n",
" self.ra_range = min(ra_values), max(ra_values)\n",
" self.dec_range = min(dec_values), max(dec_values)\n",
"\n",
" def __call__(self, gc, band, max_mag):\n",
" \"\"\"\n",
" Create a SourceCatalog object from the input galaxy catalog for the specified band, and\n",
" apply the region and magnitude cuts.\n",
"\n",
" Parameters\n",
" ----------\n",
" gc: GCRCatalogs GalaxyCatalog\n",
" The galaxy catalog obtained via GCR.\n",
" band: str\n",
" The band, e.g., 'i', to use for the magnitude comparison with the values measured\n",
" from the simulated data.\n",
" max_mag: float\n",
" The magnitude limit to apply.\n",
"\n",
" Returns\n",
" -------\n",
" lsst.afw.table.SourceCatalog\n",
" \"\"\"\n",
" # Retrieve the healpix pixels corresponding to the catalog so we don't query the full catalog\n",
" vertices = hp.ang2vec(np.array([self.ra_range[0], self.ra_range[1],\n",
" self.ra_range[1], self.ra_range[0]]),\n",
" np.array([self.dec_range[0], self.dec_range[0],\n",
" self.dec_range[1], self.dec_range[1]]), lonlat=True)\n",
" ipix = hp.query_polygon(32, vertices, inclusive=True)\n",
" # We are going to pass the healpixels that overlap with our catalog as native filters to speed up the process\n",
" native_filter = f'(healpix_pixel == {ipix[0]})'\n",
" for ipx in ipix:\n",
" native_filter=native_filter+f' | (healpix_pixel == {ipx})'\n",
" # Retrieve the desired columns and cut on the magnitude values.\n",
" bandname = f'mag_{band}'\n",
" filter_ = f'{bandname} < {max_mag}'\n",
" print(\"Applying magnitude filter:\", filter_)\n",
" gc_cols = gc.get_quantities(['galaxy_id', 'ra', 'dec',\n",
" bandname], filters=[filter_, \n",
" f'ra > {self.ra_range[0]}',\n",
" f'ra < {self.ra_range[1]}',\n",
" f'dec > {self.dec_range[0]}',\n",
" f'dec < {self.dec_range[1]}',\n",
" ],\n",
" native_filters = native_filter)\n",
" print(\"Number of galaxies within region:\", len(gc_cols['ra']))\n",
"\n",
" # Create a SourceCatalog with the galaxy_ids, coordinates, magnitudes\n",
" galaxy_catalog = make_SourceCatalog(mag_cols((band,)))\n",
" for id_, ra, dec, mag in zip(gc_cols['galaxy_id'], gc_cols['ra'], gc_cols['dec'], gc_cols[bandname]):\n",
" record = galaxy_catalog.addNew()\n",
" record.set('id', id_)\n",
" record.set('coord_ra', lsst.geom.Angle(ra, lsst.geom.degrees))\n",
" record.set('coord_dec', lsst.geom.Angle(dec, lsst.geom.degrees))\n",
" record.set(f'mag_{band}', mag)\n",
" return galaxy_catalog"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class CcdSelector(RegionSelector):\n",
" \"\"\"RegionSelector to use with visit-level calexps, i.e., single CCD exposures.\"\"\"\n",
" def __init__(self, butler, visit, raft, sensor):\n",
" super(CcdSelector, self).__init__()\n",
" # Get the CCD boundaries\n",
" dataId = dict(visit=visit, raft=raft, sensor=sensor)\n",
" calexp = butler.get('calexp', dataId=dataId)\n",
" self._set_coord_range(calexp.getBBox(), calexp.getWcs())\n",
"\n",
"\n",
"class PatchSelector(RegionSelector):\n",
" \"\"\"RegionSelector to use with skyMap patches, i.e., coadd data.\"\"\"\n",
" def __init__(self, butler, tract, patch):\n",
" super(PatchSelector, self).__init__()\n",
" # Get the patch boundaries.\n",
" skymap = butler.get('deepCoadd_skyMap')\n",
" tractInfo = skymap[tract]\n",
" patchInfo = tractInfo.getPatchInfo(eval(patch))\n",
" self._set_coord_range(patchInfo.getOuterBBox(), tractInfo.getWcs())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matching Experiments\n",
"\n",
"Now we have the tools we need, let's read in the Run 2.2i-DR3 coadd catalog data and match it to the extragalactic catalog input. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a data butler for the repo.\n",
"repo = '/global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-coadd-wfd-dr3-v1'\n",
"butler = dp.Butler(repo)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sources or Objects?\n",
"\n",
"We can consider visit-level src catalog data, in which case we would provide a `dataId` to the butler with (`visit`, `raft`, `sensor`) ids; or we can consider coadd object data, for which we would provid a `dataId` with (`filter`, `tract`, `patch`) ids.\n",
"\n",
"Somewhat different flux models are available in the Run2.2i data for src catalogs versus coadd catalogs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mag_max = 24.5\n",
"# we_are_matching = \"Sources\"\n",
"we_are_matching = \"Objects\"\n",
"\n",
"if we_are_matching == \"Sources\":\n",
" # Get the src catalog for a selected visit, raft, and sensor:\n",
" visit = 219976\n",
" raft = '2,2'\n",
" sensor = '1,1'\n",
" title = f'Run2.2i-DR3, visit={visit}, raft={raft}, sensor={sensor}'\n",
" dataId = dict(visit=visit, raft=raft, sensor=sensor)\n",
" catalog = butler.get('src', dataId=dataId)\n",
" calexp = butler.get('calexp', dataId=dataId)\n",
" filter_ = calexp.getInfo().getFilter().getFilter()\n",
" calib = calexp.getPhotoCalib()\n",
" #flux_model = 'ext_photometryKron_KronFlux'\n",
" flux_model = 'modelfit_CModel'\n",
" region_selector = CcdSelector(butler, visit, raft, sensor)\n",
"\n",
"else:\n",
" # Get the coadd catalog for a selected filter, tract, and patch:\n",
" filter_ = 'r'\n",
" tract = 4638\n",
" patch = '2,2'\n",
" title = f'Run2.2i-DR3, filter={filter_}, tract={tract}, patch={patch}'\n",
" dataId = dict(tract=tract, patch=patch, filter=filter_)\n",
" catalog = butler.get('deepCoadd_meas', dataId=dataId)\n",
" calexp = butler.get('deepCoadd', dataId=dataId)\n",
" calib = calexp.getPhotoCalib()\n",
" flux_model = 'modelfit_CModel'\n",
" region_selector = PatchSelector(butler, tract, patch)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Focusing on Well-measured Galaxies\n",
"\n",
"Galaxies can be selected as extended objects (or sources) using the `base_ClassificationExtendedness_value`. We use the model flag and flux to ensure that a flux value could be measured, and then apply a selection to ensure that we get deblended objects. Finally, we apply a relatively bright magnitude cut, to avoid confusion when performing the positional match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extract columns to use in the selection:\n",
"ext = catalog.get('base_ClassificationExtendedness_value')\n",
"model_flag = catalog.get(flux_model + '_flag')\n",
"model_flux = catalog.get(flux_model + '_instFlux')\n",
"num_children = catalog.get('deblend_nChild')\n",
"\n",
"# Apply the extendedness, flag, and blendedness cuts using the subset method:\n",
"cat_temp = catalog.subset((ext == 1) &\n",
" (model_flag == False) &\n",
" (model_flux > 0) &\n",
" (num_children == 0))\n",
"# Extract the magnitude and again use subset to apply the depth cut:\n",
"_mag = calib.instFluxToMagnitude(cat_temp, flux_model) \n",
"mag = _mag[:,0] #Magnitude\n",
"magerr = _mag[:,1] #Magnitude Error\n",
"#print(mag, len(mag))\n",
"cat_temp = cat_temp.subset(mag < mag_max)\n",
"\n",
"# Repackage everything in a more minimal SourceCatalog and add a magnitude column for comparing\n",
"# to the galaxy catalog truth values.\n",
"drp_catalog = make_SourceCatalog(mag_cols((filter_,)))\n",
"for record in cat_temp:\n",
" new_rec = drp_catalog.addNew()\n",
" for name in 'id coord_ra coord_dec parent'.split():\n",
" new_rec.set(name, record[name])\n",
" mag = calib.instFluxToMagnitude(record, flux_model)\n",
" new_rec.set(f'mag_{filter_}', mag.value)\n",
" \n",
"print(\"Number of observed objects in our DRP galaxy catalog\", len(drp_catalog))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extracting the Truth Info\n",
"\n",
"We can now use our region_selector object to process the protoDC2 extragalactic catalog. Note that while we instantiated it with a butler, so that it could work on selecting galaxy observations from either a tract or a CCD, we can _call_ it as a function (via its `__call__` method) which takes a GCR catalog object as its first argument. The result will be a DM Stack SourceCatalog object, that we can match to our observed DRP catalog."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read in the galaxy catalog data.\n",
"with warnings.catch_warnings():\n",
" warnings.filterwarnings('ignore')\n",
" gc = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_image')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a SourceCatalog from the gc data, applying the region and magnitude selections.\n",
"galaxy_catalog = region_selector(gc, band=filter_, max_mag=mag_max)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare the number of galaxies within our region with th enumber of observed galaxies in the DRP catalog. Is this what you would expect?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Doing the Matching\n",
"We can now carry out the spatial matching, and compute some quantities to plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Find positional matches within 100 milliarcseconds:\n",
"radius = lsst.geom.Angle(0.1, lsst.geom.arcseconds)\n",
"matches = afw_table.matchRaDec(drp_catalog, galaxy_catalog, radius)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`matches` is a list of `match` objects, each one containing an observed-true matchd galaxy pair. The code below shows how to work with these, looping over the matches and extracting information to plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare magnitudes for matched objects:\n",
"drp_mag = np.zeros(len(matches), dtype=np.float)\n",
"gc_mag = np.zeros(len(matches), dtype=np.float)\n",
"sep = np.zeros(len(matches), dtype=np.float)\n",
"# Arrays for a quiver plot.\n",
"u = np.zeros(len(matches), dtype=np.float)\n",
"v = np.zeros(len(matches), dtype=np.float)\n",
"for i, match in enumerate(matches):\n",
" drp_mag[i] = match.first[f'mag_{filter_}']\n",
" gc_mag[i] = match.second[f'mag_{filter_}']\n",
" sep[i] = np.degrees(match.distance)*3600.*1000.\n",
" u[i] = match.first['coord_ra'] - match.second['coord_ra']\n",
" v[i] = match.first['coord_dec'] - match.second['coord_dec']\n",
"print(\"Number of matches:\", len(matches))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start a 2x2 panel figure:\n",
"fig = plt.figure(figsize=(8, 8))\n",
"frame_axes = fig.add_subplot(111, frameon=False)\n",
"frame_axes.set_title(title)\n",
"frame_axes.get_xaxis().set_ticks([])\n",
"frame_axes.get_yaxis().set_ticks([])\n",
"\n",
"delta_mag = drp_mag - gc_mag # Observed - True\n",
"\n",
"# Upper Left: Histogram of match separations.\n",
"fig.add_subplot(2, 2, 1)\n",
"plt.hist(sep, range=(0, 100), histtype='step', bins=40)\n",
"plt.xlabel('separation (marcsec)')\n",
"plt.ylabel('entries / bin')\n",
"\n",
"# Upper Right: Quiver plot of (DRP - galaxy_catalog) positions on the sky.\n",
"fig.add_subplot(2, 2, 2)\n",
"plt.quiver(np.degrees(drp_catalog['coord_ra']),\n",
" np.degrees(drp_catalog['coord_dec']),\n",
" u, v)\n",
"plt.xlabel('RA (deg)')\n",
"plt.ylabel('Dec (deg)')\n",
"\n",
"# Lower left: Difference in magnitudes vs true magnitude (mag_gc).\n",
"fig.add_subplot(2, 2, 3)\n",
"plt.errorbar(gc_mag, delta_mag, fmt='.', alpha=0.1)\n",
"plt.xlabel(f'True mag {filter_}_gc'.format(filter_))\n",
"plt.ylabel(f'Mag difference ({filter_}_gc - {filter_}_drp)')\n",
"\n",
"# Difference in magnitudes vs separation.\n",
"fig.add_subplot(2, 2, 4)\n",
"plt.errorbar(sep, delta_mag, fmt='.', alpha=0.1)\n",
"plt.xlabel('separation (mas)')\n",
"plt.ylabel(f'Mag difference ({filter_}_gc - {filter_}_drp)')\n",
"\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Going Further\n",
"\n",
"The available columns in a `SourceCatalog` can be seen by printing the info from the schema that it carries around with it. The cells below show you what you have available. The drp_catalog and galaxy_catalog that we made to do the spatial matching only have positions and magnitudes in them - but the parent catalogs have many more quantities. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for item in drp_catalog.schema:\n",
" print(f\"{item.field.getName()}: {item.field.getDoc()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for item in galaxy_catalog.schema:\n",
" print(f\"{item.field.getName()}: {item.field.getDoc()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All the DRP measurements:\n",
"for item in catalog.schema:\n",
" print(f\"{item.field.getName()}: {item.field.getDoc()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All the cosmoDC2 parameters:\n",
"# help(gc)\n",
"gc.list_all_quantities()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "desc-stack",
"language": "python",
"name": "desc-stack"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}