{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Photometry\n",
    "\n",
    "This is the sixth in a series of notebooks related to astronomy data.\n",
    "\n",
    "As a continuing example, we will replicate part of the analysis in a recent paper, \"[Off the beaten path: Gaia reveals GD-1 stars outside of the main stream](https://arxiv.org/abs/1805.00425)\" by Adrian M. Price-Whelan and Ana Bonaca.\n",
    "\n",
    "In the previous lesson we downloaded photometry data from Pan-STARRS, which is available from the same server we've been using to get Gaia data. \n",
    "\n",
    "The next step in the analysis is to select candidate stars based on the photometry data.  \n",
    "The following figure from the paper is a color-magnitude diagram showing the stars we previously selected based on proper motion:\n",
    "\n",
    "<img width=\"300\" src=\"https://github.com/datacarpentry/astronomy-python/raw/gh-pages/fig/gd1-3.png\">\n",
    "\n",
    "In red is a theoretical isochrone, showing where we expect the stars in GD-1 to fall based on the metallicity and age of their original globular cluster. \n",
    "\n",
    "By selecting stars in the shaded area, we can further distinguish the main sequence of GD-1 from mostly younger background stars."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Outline\n",
    "\n",
    "Here are the steps in this notebook:\n",
    "\n",
    "1. We'll reload the data from the previous notebook and make a color-magnitude diagram.\n",
    "\n",
    "2. We'll use an isochrone computed by MIST to specify a polygonal region in the color-magnitude diagram and select the stars inside it.\n",
    "\n",
    "After completing this lesson, you should be able to\n",
    "\n",
    "* Use Matplotlib to specify a `Polygon` and determine which points fall inside it.\n",
    "\n",
    "* Use Pandas to merge data from multiple `DataFrames`, much like a database `JOIN` operation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reload the data\n",
    "\n",
    "You can [download the data from the previous lesson](https://github.com/AllenDowney/AstronomicalData/raw/main/data/gd1_data.hdf) or run the following cell, which downloads it if necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os.path import basename, exists\n",
    "\n",
    "def download(url):\n",
    "    filename = basename(url)\n",
    "    if not exists(filename):\n",
    "        from urllib.request import urlretrieve\n",
    "        local, _ = urlretrieve(url, filename)\n",
    "        print('Downloaded ' + local)\n",
    "\n",
    "download('https://github.com/AllenDowney/AstronomicalData/raw/main/' +\n",
    "         'data/gd1_data.hdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can reload `candidate_df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "filename = 'gd1_data.hdf'\n",
    "candidate_df = pd.read_hdf(filename, 'candidate_df')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting photometry data\n",
    "\n",
    "Now that we have photometry data from Pan-STARRS, we can replicate the [color-magnitude diagram](https://en.wikipedia.org/wiki/Galaxy_color%E2%80%93magnitude_diagram) from the original paper:\n",
    "\n",
    "<img width=\"300\" src=\"https://github.com/datacarpentry/astronomy-python/raw/gh-pages/fig/gd1-3.png\">\n",
    "\n",
    "The y-axis shows the apparent magnitude of each source with the [g filter](https://en.wikipedia.org/wiki/Photometric_system).\n",
    "\n",
    "The x-axis shows the difference in apparent magnitude between the g and i filters, which indicates color.\n",
    "\n",
    "Stars with lower values of (g-i) are brighter in g-band than in i-band, compared to other stars, which means they are bluer.\n",
    "\n",
    "Stars in the lower-left quadrant of this diagram are less bright than the others, and have lower metallicity, which means they are [likely to be older](http://spiff.rit.edu/classes/ladder/lectures/ordinary_stars/ordinary.html).\n",
    "\n",
    "Since we expect the stars in GD-1 to be older than the background stars, the stars in the lower-left are more likely to be in GD-1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following function takes a table containing photometry data and draws a color-magnitude diagram.\n",
    "The input can be an Astropy `Table` or Pandas `DataFrame`, as long as it has columns named `g_mean_psf_mag` and `i_mean_psf_mag`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_cmd(table):\n",
    "    \"\"\"Plot a color magnitude diagram.\n",
    "    \n",
    "    table: Table or DataFrame with photometry data\n",
    "    \"\"\"\n",
    "    y = table['g_mean_psf_mag']\n",
    "    x = table['g_mean_psf_mag'] - table['i_mean_psf_mag']\n",
    "\n",
    "    plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n",
    "\n",
    "    plt.xlim([0, 1.5])\n",
    "    plt.ylim([14, 22])\n",
    "    plt.gca().invert_yaxis()\n",
    "\n",
    "    plt.ylabel('$Magnitude (g)$')\n",
    "    plt.xlabel('$Color (g-i)$')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`plot_cmd` uses a new function, `invert_yaxis`, to invert the `y` axis, which is conventional when plotting magnitudes, since lower magnitude indicates higher brightness.\n",
    "\n",
    "`invert_yaxis` is a little different from the other functions we've used.  You can't call it like this:\n",
    "\n",
    "```\n",
    "plt.invert_yaxis()          # doesn't work\n",
    "```\n",
    "\n",
    "You have to call it like this:\n",
    "\n",
    "```\n",
    "plt.gca().invert_yaxis()          # works\n",
    "```\n",
    "\n",
    "`gca` stands for \"get current axis\".  It returns an object that represents the axes of the current figure, and that object provides `invert_yaxis`.\n",
    "\n",
    "**In case anyone asks:** The most likely reason for this inconsistency in the interface is that `invert_yaxis` is a lesser-used function, so it's not made available at the top level of the interface."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what the results look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our figure does not look exactly like the one in the paper because we are working with a smaller region of the sky, so we don't have as many stars.  But we can see an overdense region in the lower left that contains stars with the photometry we expect for GD-1.\n",
    "\n",
    "In the next section we'll use an isochrone to specify a polygon that contains this overdense regioin."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Isochrone\n",
    "\n",
    "Based on our best estimates for the ages of the stars in GD-1 and their metallicity, we can compute a [stellar isochrone](https://en.wikipedia.org/wiki/Stellar_isochrone) that predicts the relationship between their magnitude and color.\n",
    "\n",
    "In fact, we can use [MESA Isochrones & Stellar Tracks](http://waps.cfa.harvard.edu/MIST/) (MIST) to compute it for us.\n",
    "\n",
    "Using the [MIST Version 1.2 web interface](http://waps.cfa.harvard.edu/MIST/interp_isos.html), we computed an isochrone with the following parameters:\n",
    "    \n",
    "* Rotation initial v/v_crit = 0.4\n",
    "\n",
    "* Single age, linear scale = 12e9\n",
    "\n",
    "* Composition [Fe/H] = -1.35\n",
    "\n",
    "* Synthetic Photometry, PanStarrs\n",
    "\n",
    "* Extinction av = 0\n",
    "\n",
    "The following cell downloads the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "download('https://github.com/AllenDowney/AstronomicalData/raw/main/' +\n",
    "         'data/MIST_iso_5fd2532653c27.iso.cmd')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To read this file we'll download a Python module [from this repository](https://github.com/jieunchoi/MIST_codes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "download('https://github.com/jieunchoi/MIST_codes/raw/master/scripts/' +\n",
    "         'read_mist_models.py')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can read the file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "import read_mist_models\n",
    "\n",
    "filename = 'MIST_iso_5fd2532653c27.iso.cmd'\n",
    "iso = read_mist_models.ISOCMD(filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is an `ISOCMD` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(iso)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It contains a list of arrays, one for each isochrone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(iso.isocmds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We only got one isochrone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(iso.isocmds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we can select it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "iso_array = iso.isocmds[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's a NumPy array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(iso_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But it's an unusual NumPy array, because it contains names for the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "iso_array.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which means we can select columns using the bracket operator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "iso_array['phase']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `phase` to select the part of the isochrone for stars in the main sequence and red giant phases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "phase_mask = (iso_array['phase'] >= 0) & (iso_array['phase'] < 3)\n",
    "phase_mask.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_sequence = iso_array[phase_mask]\n",
    "len(main_sequence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other two columns we'll use are `PS_g` and `PS_i`, which contain simulated photometry data for stars with the given age and metallicity, based on a model of the Pan-STARRS sensors.\n",
    "\n",
    "We'll use these columns to superimpose the isochrone on the color-magnitude diagram, but first we have to use a [distance modulus](https://en.wikipedia.org/wiki/Distance_modulus) to scale the isochrone based on the estimated distance of GD-1.\n",
    "\n",
    "We can use the `Distance` object from Astropy to compute the distance modulus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "import astropy.coordinates as coord\n",
    "import astropy.units as u\n",
    "\n",
    "distance = 7.8 * u.kpc\n",
    "distmod = coord.Distance(distance).distmod.value\n",
    "distmod"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can compute the scaled magnitude and color of the isochrone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "mag_g = main_sequence['PS_g'] + distmod\n",
    "color_g_i = main_sequence['PS_g'] - main_sequence['PS_i']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can plot it on the color-magnitude diagram like this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)\n",
    "plt.plot(color_g_i, mag_g);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The theoretical isochrone passes through the overdense region where we expect to find stars in GD-1.\n",
    "\n",
    "Let's save this result so we can reload it later without repeating the steps in this section.\n",
    "\n",
    "So we can save the data in an HDF5 file, we'll put it in a Pandas `DataFrame` first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "iso_df = pd.DataFrame()\n",
    "iso_df['mag_g'] = mag_g\n",
    "iso_df['color_g_i'] = color_g_i\n",
    "\n",
    "iso_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then save it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'gd1_isochrone.hdf5'\n",
    "iso_df.to_hdf(filename, 'iso_df')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Making a polygon\n",
    "\n",
    "The following cell downloads the isochrone we made in the previous section, if necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "download('https://github.com/AllenDowney/AstronomicalData/raw/main/data/' +\n",
    "         'gd1_isochrone.hdf5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can read it back in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'gd1_isochrone.hdf5'\n",
    "iso_df = pd.read_hdf(filename, 'iso_df')\n",
    "iso_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what the isochrone looks like on the color-magnitude diagram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)\n",
    "plt.plot(iso_df['color_g_i'], iso_df['mag_g']);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the bottom half of the figure, the isochrone passes through the overdense region where the stars are likely to belong to GD-1.\n",
    "\n",
    "In the top half, the isochrone passes through other regions where the stars have higher magnitude and metallicity than we expect for stars in GD-1.\n",
    "\n",
    "So we'll select the part of the isochrone that lies in the overdense region.\n",
    "\n",
    "`g_mask` is a Boolean Series that is `True` where `g` is between 18.0 and 21.5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = iso_df['mag_g']\n",
    "\n",
    "g_mask = (g > 18.0) & (g < 21.5)\n",
    "g_mask.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use it to select the corresponding rows in `iso_df`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "iso_masked = iso_df[g_mask]\n",
    "iso_masked.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, to select the stars in the overdense region, we have to define a polygon that includes stars near the isochrone.\n",
    "\n",
    "The original paper uses the following formulas to define the left and right boundaries. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = iso_masked['mag_g']\n",
    "left_color = iso_masked['color_g_i'] - 0.4 * (g/28)**5\n",
    "right_color = iso_masked['color_g_i'] + 0.8 * (g/28)**5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intention is to define a polygon that gets wider as `g` increases, to reflect increasing uncertainty.\n",
    "\n",
    "But we can do about as well with a simpler formula:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = iso_masked['mag_g']\n",
    "left_color = iso_masked['color_g_i'] - 0.06\n",
    "right_color = iso_masked['color_g_i'] + 0.12"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what these boundaries look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)\n",
    "\n",
    "plt.plot(left_color, g, label='left color')\n",
    "plt.plot(right_color, g, label='right color')\n",
    "\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Which points are in the polygon?\n",
    "\n",
    "Matplotlib provides a `Polygon` object that we can use to check which points fall in the polygon we just constructed.\n",
    "\n",
    "To make a `Polygon`, we need to assemble `g`, `left_color`, and `right_color` into a loop, so the points in `left_color` are connected to the points of `right_color` in reverse order.\n",
    "\n",
    "We'll use the following function, which takes two arrays and joins them front-to-back:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def front_to_back(first, second):\n",
    "    \"\"\"Join two arrays front to back.\"\"\"\n",
    "    return np.append(first, second[::-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`front_to_back` uses a \"slice index\" to reverse the elements of `second`.\n",
    "\n",
    "As explained in the [NumPy documentation](https://numpy.org/doc/stable/reference/arrays.indexing.html), a slice index has three parts separated by colons:\n",
    "\n",
    "* `start`: The index of the element where the slice starts.\n",
    "\n",
    "* `stop`: The index of the element where the slice ends.\n",
    "\n",
    "* `step`: The step size between elements.\n",
    "\n",
    "In this example, `start` and `stop` are omitted, which means all elements are selected.\n",
    "\n",
    "And `step` is `-1`, which means the elements are in reverse order.\n",
    "\n",
    "We can use `front_to_back` to make a loop that includes the elements of `left_color` and `right_color`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "color_loop = front_to_back(left_color, right_color)\n",
    "color_loop.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And a corresponding loop with the elements of `g` in forward and reverse order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "mag_loop = front_to_back(g, g)\n",
    "mag_loop.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what the loop looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)\n",
    "plt.plot(color_loop, mag_loop);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make a `Polygon`, it will be convenient to put `color_loop` and `mag_loop` into a `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "loop_df = pd.DataFrame()\n",
    "loop_df['color_loop'] = color_loop\n",
    "loop_df['mag_loop'] = mag_loop\n",
    "loop_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can pass `loop_df` to `Polygon`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [],
   "source": [
    "from matplotlib.patches import Polygon\n",
    "\n",
    "polygon = Polygon(loop_df)\n",
    "polygon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is a `Polygon` object , which provides `contains_points`, which figures out which points are inside the polygon.\n",
    "\n",
    "To test it, we'll create a list with two points, one inside the polygon and one outside."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "points = [(0.4, 20), \n",
    "          (0.4, 16)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can make sure `contains_points` does what we expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "inside = polygon.contains_points(points)\n",
    "inside"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is an array of Boolean values.\n",
    "\n",
    "We are almost ready to select stars whose photometry data falls in this polygon.  But first we need to do some data cleaning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Save the polygon\n",
    "\n",
    "[Reproducibile research](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research) is \"the idea that ... the full computational environment used to produce the results in the paper such as the code, data, etc. can be used to reproduce the results and create new work based on the research.\"\n",
    "\n",
    "This Jupyter notebook is an example of reproducible research because it contains all of the code needed to reproduce the results, including the database queries that download the data and and analysis.\n",
    "\n",
    "In this lesson we used an isochrone to derive a polygon, which we used to select stars based on photometry. \n",
    "So it is important to record the polygon as part of the data analysis pipeline.\n",
    "\n",
    "Here's how we can save it in an HDF file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'gd1_data.hdf'\n",
    "loop_df.to_hdf(filename, 'loop_df')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selecting based on photometry\n",
    "\n",
    "Now let's see how many of the candidate stars are inside the polygon we chose.\n",
    "We'll put color and magnitude data from `candidate_df` into a new `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [],
   "source": [
    "points = pd.DataFrame()\n",
    "\n",
    "points['color'] = candidate_df['g_mean_psf_mag'] - candidate_df['i_mean_psf_mag']\n",
    "points['mag'] = candidate_df['g_mean_psf_mag']\n",
    "\n",
    "points.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which we can pass to `contains_points`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "inside = polygon.contains_points(points)\n",
    "inside"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is a Boolean array.  We can use `sum` to see how many stars fall in the polygon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "inside.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can use `inside` as a mask to select stars that fall inside the polygon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "winner_df = candidate_df[inside]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make a color-magnitude plot one more time, highlighting the selected stars with green markers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_cmd(candidate_df)\n",
    "plt.plot(color_g_i, mag_g)\n",
    "plt.plot(color_loop, mag_loop)\n",
    "\n",
    "x = winner_df['g_mean_psf_mag'] - winner_df['i_mean_psf_mag']\n",
    "y = winner_df['g_mean_psf_mag']\n",
    "plt.plot(x, y, 'go', markersize=0.5, alpha=0.5);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like the selected stars are, in fact, inside the polygon, which means they have photometry data consistent with GD-1.\n",
    "\n",
    "Finally, we can plot the coordinates of the selected stars:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,2.5))\n",
    "\n",
    "x = winner_df['phi1']\n",
    "y = winner_df['phi2']\n",
    "plt.plot(x, y, 'ko', markersize=0.7, alpha=0.9)\n",
    "\n",
    "plt.xlabel('ra (degree GD1)')\n",
    "plt.ylabel('dec (degree GD1)')\n",
    "\n",
    "plt.axis('equal');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example includes two new Matplotlib commands:\n",
    "\n",
    "* `figure` creates the figure.  In previous examples, we didn't have to use this function; the figure was created automatically.  But when we call it explicitly, we can provide arguments like `figsize`, which sets the size of the figure.\n",
    "\n",
    "* `axis` with the parameter `equal` sets up the axes so a unit is the same size along the `x` and `y` axes.\n",
    "\n",
    "In an example like this, where `x` and `y` represent coordinates in space, equal axes ensures that the distance between points is represented accurately.   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write the data\n",
    "\n",
    "Finally, let's write the selected stars to a file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'gd1_data.hdf'\n",
    "winner_df.to_hdf(filename, 'winner_df')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os.path import getsize\n",
    "\n",
    "MB = 1024 * 1024\n",
    "getsize(filename) / MB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this lesson, we used photometry data from Pan-STARRS to draw a color-magnitude diagram.\n",
    "We used an isochrone to define a polygon and select stars we think are likely to be in GD-1.  Plotting the results, we have a clearer picture of GD-1, similar to Figure 1 in the original paper."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Best practices\n",
    "\n",
    "* Matplotlib provides operations for working with points, polygons, and other geometric entities, so it's not just for making figures.\n",
    "\n",
    "* Use Matplotlib options to control the size and aspect ratio of figures to make them easier to interpret.  In this example, we scaled the axes so the size of a degree is equal along both axes.\n",
    "\n",
    "* Record every element of the data analysis pipeline that would be needed to replicate the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}