{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Spectroscopy Module\n",
    "\n",
    "**Lecturer:** Robert Quimby<br>\n",
    "**Jupyter Notebook Author:** Robert Quimby <br>\n",
    "**With Contributions from:** Cameron Hummels, Matt Hankins, & Leo Singer\n",
    "\n",
    "This is a Jupyter notebook lesson taken from the GROWTH Summer School 2020.  For other lessons and their accompanying lectures, please see: http://growth.caltech.edu/growth-school-2020.html\n",
    "\n",
    "## Objective\n",
    "\n",
    "Extract a 1-D spectrum from a 2-D image of a long-slit spectrum, determine wavelength solution, and then measure the redshift of the target.\n",
    "\n",
    "\n",
    "## Key steps\n",
    "- Define target and background apertures\n",
    "- Subtract the background contribution from the target aperture\n",
    "- Extract (sum) the counts in the background subtracted target aperture\n",
    "- Determine the mapping between spectral bin number and wavelength (in Angstroms)\n",
    "- Apply the wavelength solution to the target spectrum and find the observed wavelengths of emission lines\n",
    "- Use the observed wavelengths and the known rest wavelengths to measure the target redshift\n",
    "\n",
    "*Note: this notebook covers the steps necessary for a \"quick and dirty\" spectral extraction. For publication quality extractions see the included [Guide to Long-Slit Spectral Extractions](spectra_guide.ipynb)*\n",
    "\n",
    "## Required dependencies\n",
    "\n",
    "See GROWTH school webpage for detailed instructions on how to install these modules and packages.  Nominally, you should be able to install the python modules with `pip install <module>`.  The external astromatic packages are easiest installed using package managers (e.g., `rpm`, `apt-get`).\n",
    "\n",
    "### Python modules\n",
    "* python 3\n",
    "* astropy\n",
    "* numpy\n",
    "* scipy\n",
    "* matplotlib\n",
    "\n",
    "### External packages\n",
    "* None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# python modules that we will use\n",
    "import os\n",
    "import numpy as np\n",
    "from astropy.io import fits\n",
    "from astropy.stats import sigma_clip\n",
    "from scipy.optimize import curve_fit\n",
    "from scipy.signal import find_peaks\n",
    "\n",
    "%matplotlib inline\n",
    "import matplotlib.pylab as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from platform import python_version\n",
    "assert python_version() >= '3.6', \"Python version 3.6 or greater is required for f-strings\"\n",
    "\n",
    "import scipy\n",
    "assert scipy.__version__ >= '1.4', \"scipy version 1.4 or higher is required for this module\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# change plotting defaults\n",
    "plt.rc('axes', labelsize=14)\n",
    "plt.rc('axes', labelweight='bold')\n",
    "plt.rc('axes', titlesize=16)\n",
    "plt.rc('axes', titleweight='bold')\n",
    "plt.rc('font', family='sans-serif')\n",
    "plt.rcParams['figure.figsize'] = (17, 7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# where are the data located?\n",
    "cwd = os.getcwd()\n",
    "data_dir = os.path.join(cwd, 'data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the 2-D science Frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's begin by looking at the 2D image of the science spectrum, `spec_sci.fits`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the 2D science image data\n",
    "image = fits.getdata(os.path.join(data_dir,'spec_sci.fits'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this is a helper function to display images in this notebook\n",
    "def show_image(image, lower=-1, upper=3, extent=None):\n",
    "    sample = sigma_clip(image)\n",
    "    vmin = sample.mean() + lower * sample.std()\n",
    "    vmax = sample.mean() + upper * sample.std()\n",
    "    plt.figure(figsize=(15, 3))\n",
    "    plt.imshow(image, origin='lower', cmap='gray', aspect='auto', vmin=vmin, vmax=vmax, extent=extent)\n",
    "    plt.xlabel('Column Number')\n",
    "    plt.ylabel('Row Number');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# display the image\n",
    "show_image(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the science image above. The supplied `spec_sci.fits` image is the red channel of a Keck/LRIS spectrum taken in 2014. The image has already been process to remove bias, and the image has been trimmed and rotated (what we will call \"rows\" are actually columns on the CCD). This science image is actually a combination of two individual exposures that has been filtered to remove **most** (but not quite all) cosmic ray events. \n",
    "\n",
    "Notice the vertical bands of light. These are the sky lines. The spectrum runs from about 5000 Angstroms on the left to about 10000 Angstroms on the right (you will determine the exact range below!). Notice that there is a significantly higher density of sky lines in the near infrared. \n",
    "\n",
    "Notice the two horizontal bands of light. The lower band is the target (a small galaxy), and the upper band is a second object in the slit. If you look closely at the lower band (the target trace), you will see a number of emission lines. You will use these to determine the target redshift."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Determine which rows the target spans and which are background"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to know which rows in the 2D image contain light from the target and which rows we can use to set the sky background. Plot the average counts in each row and then use this plot to set `row1` and `row2` roughly to the rows where the target is significantly brighter than the background to define the target aperture. Then set `bgrow1` and `bgrow2` to define the aperture for the sky background."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot the average of each row vs. row number\n",
    "plt.plot(image.mean(axis=????))\n",
    "\n",
    "# define the target aperture range\n",
    "row1, row2 = ????, ????\n",
    "plt.axvspan(row1, row2, color='k', alpha=0.1, label='target aperture')\n",
    "\n",
    "# define the background aperture\n",
    "# (it is best to have two bands--one on each side of the target aperture)\n",
    "bg_apertures = []\n",
    "bg_apertures.append((????, ????))\n",
    "bg_apertures.append((????, ????))\n",
    "for bgrow1, bgrow2 in bg_apertures:\n",
    "    plt.axvspan(bgrow1, bgrow2, color='r', alpha=0.1, label='background aperture')\n",
    "\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick and dirty sky subtraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with photometry, the target aperture will contain light from both the target and the sky, so we must subtraction the contribution from the sky background to obtain our target spectrum. An (imperfect) way to do this is to simply find the average value in each column of the background aperture and to subtract this value from the respective columns on the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a mask to identify which pixels are *not* part of the background apertures\n",
    "mask = np.ones(image.shape, dtype=bool)\n",
    "for bgrow1, bgrow2 in bg_apertures:\n",
    "    mask[bgrow1:bgrow2, :] = False\n",
    "\n",
    "# use the mask to define a background image (as a numpy masked array)\n",
    "bgim = np.ma.array(image, mask=mask)\n",
    "\n",
    "# calculate the average value for each column in the background aperture\n",
    "bgspec = ???\n",
    "\n",
    "# show the background subtracted target aperture\n",
    "bgsubim = image - bgspec\n",
    "show_image(bgsubim[row1:row2, :], upper=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should now see the faint continuum of the target with a number of bright emission lines. You may also notice significant residuals from the bright sky lines. See the included [Guide to Long-Slit Spectral Extractions](spectra_guide.ipynb) for a better way to remove the sky background. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick and dirty spectral extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have a sky-subtracted image, we can take the sum of each column to extract our target spectrum."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spec = ???\n",
    "plt.plot(spec)\n",
    "plt.xlabel('Column Number')\n",
    "plt.ylabel('Counts');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see a strong emission line near the middle of the spectra, which is H-alpha, along with some weaker lines. Now we need to determine the observed wavelength of this line (this is often the hard part)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the Lamp Spectra"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To determine the mapping between column numbers and wavelengths (in Angstroms), we can make use of arc lamp spectra."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the 2D lamp image data\n",
    "lamp_image = fits.getdata(os.path.join(data_dir,'spec_lamp.fits'))\n",
    "show_image(lamp_image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see that the lamp spectra consist of emission lines from the excited gas in the arc lamps (in this case mostly a combination of Neon and Argon). The wavelengths of these lines have been precisely measured in laboratory experiments, and we can use these known wavelengths and the columns where they are found to construct our wavelength solution. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the comparison line list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the \"linelist\" with precise laboratory wavelengths for these lamp lines\n",
    "dtype = [('wav', float), ('id', 'U2')]\n",
    "linelist = np.genfromtxt(os.path.join(data_dir, 'line_list.dat'), dtype=dtype)\n",
    "linelist.sort(order='wav')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract the lamp spectra in the same aperture as the target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use the same rows as for the target and extract the column averages\n",
    "lamp_spec = ???\n",
    "plt.plot(lamp_spec, c='g')\n",
    "plt.xlabel('Column Number')\n",
    "plt.ylabel('Counts');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that many of the lines peak at about 65000 counts? This indicates that these lines are saturated (too bright) and will not be useful for determining the wavelength solution."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identify each column where the flux peaks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need the column value where each line peaks for our column to wavelength mapping. To start we can get approximate column values using `scipy.signal.find_peaks`. The keyword `prominence` is the desired minimum contrast (in counts) between the value at a peak and the value at the nearest troughs. Set this so that significant lines are kept but noise peaks are rejected. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# locate approximate line centers\n",
    "prominence = ???\n",
    "peak_cols, properties = find_peaks(lamp_spec, prominence=prominence)\n",
    "\n",
    "# plot the spectra\n",
    "plt.plot(lamp_spec, c='g')\n",
    "\n",
    "# mark the peaks\n",
    "plt.scatter(peak_cols, lamp_spec[peak_cols], marker='x', c='k')\n",
    "plt.xlabel('Column Number')\n",
    "plt.ylabel('Counts')\n",
    "plt.yscale('log')\n",
    "plt.grid();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `find_peaks` function merely returns integer indices into the `lamp_spec` array. We can fit Gaussian models to each one of these peaks to find the line centers to much better than 1 column. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use a Gaussian function to model spectral lines\n",
    "def gaussian(x, *params):\n",
    "    amp, x0, sigma = params\n",
    "    return amp * np.exp(-(x - x0)**2 / 2 / sigma**2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a function to determine the precise column centers for each lamp line\n",
    "def get_lamp_lines(lamp_spec, prominence=100):\n",
    "    peak_cols, properties = find_peaks(lamp_spec, prominence=prominence)\n",
    "\n",
    "    # record in a structured array for convenience \n",
    "    dtype = []\n",
    "    dtype.append( ('col', float) )\n",
    "    dtype.append( ('counts', float) )\n",
    "    dtype.append( ('x', float) )\n",
    "    dtype.append( ('y', float) )\n",
    "    dtype.append( ('sigma', float) )\n",
    "    dtype.append( ('id', 'U2') )\n",
    "    dtype.append( ('wav', float) )\n",
    "    dtype.append( ('wavres', float) )\n",
    "    dtype.append( ('used', bool) )\n",
    "    lamp_lines = np.zeros(peak_cols.size, dtype=dtype)\n",
    "    lamp_lines['col'] = peak_cols\n",
    "    lamp_lines['counts'] = lamp_spec[peak_cols]\n",
    "    lamp_lines['x'] = np.nan\n",
    "    lamp_lines['y'] = np.nan\n",
    "    lamp_lines['sigma'] = np.nan\n",
    "    lamp_lines['wav'] = np.nan\n",
    "    lamp_lines['wavres'] = np.nan\n",
    "\n",
    "    # fit each peak to determine precise center\n",
    "    cols = np.arange(lamp_spec.size)\n",
    "    sigma_guess = 2.5\n",
    "    for line in lamp_lines:\n",
    "        if line['counts'] > 60000:\n",
    "            # line is saturated...skip\n",
    "            continue\n",
    "\n",
    "        i0 = max([0, int(line['col'] - 5)])\n",
    "        i1 = min([lamp_spec.size - 1, int(line['col'] + 5)])\n",
    "        guess = (line['counts'], line['col'], sigma_guess)\n",
    "        bounds = ((0, line['col'] - 3, 0), (np.inf, line['col'] + 3, np.inf))\n",
    "        try:\n",
    "            popt, pcov = curve_fit(gaussian, cols[i0:i1], lamp_spec[i0:i1], p0=guess, bounds=bounds)\n",
    "        except RuntimeError:\n",
    "            # curve_fit failed to converge...skip\n",
    "            continue\n",
    "\n",
    "        line['x'] = popt[1]\n",
    "        line['y'] = gaussian(popt[1], *popt)\n",
    "        line['sigma'] = popt[2]\n",
    "\n",
    "    # filter lamp_lines to keep only lines that were fit\n",
    "    wasfit = np.isfinite(lamp_lines['x'])\n",
    "    lamp_lines = lamp_lines[wasfit]\n",
    "    print('found center pixel values for', lamp_lines.size, 'lines')\n",
    "    return lamp_lines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# helper function to mark lamp lines with and 'x'\n",
    "def mark_peaks(plt, lamp_lines, xtype='x', ytype='y', c='k'):\n",
    "    w = np.isfinite(lamp_lines['wav'])\n",
    "    if w.sum() > 0:\n",
    "        plt.scatter(lamp_lines[xtype][w], lamp_lines[ytype][w], c=np.abs(lamp_lines['wavres'][w]), zorder=10)\n",
    "    if (~w).sum() > 0:\n",
    "        plt.scatter(lamp_lines[xtype][~w], lamp_lines[ytype][~w], c=c, marker='x')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find the precise column centers for all lamp lines\n",
    "lamp_lines = get_lamp_lines(lamp_spec, prominence=prominence)\n",
    "\n",
    "# plot the lamp spectra\n",
    "plt.plot(lamp_spec, c='g')\n",
    "plt.xlabel('Column Number')\n",
    "plt.ylabel('Counts')\n",
    "plt.grid()\n",
    "\n",
    "# mark initial values from find_peaks in red\n",
    "mark_peaks(plt, lamp_lines, 'col', 'counts', c='r')\n",
    "\n",
    "# mark best-fit values\n",
    "mark_peaks(plt, lamp_lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load reference spectrum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have precise column centers for each line, we need to identify each line to set its wavelength. We can begin by looking for similar line features in a wavelength calibrated lamp spectrum atlas. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load and plot the provided spectral atlas\n",
    "lamp_ref = np.genfromtxt(os.path.join(data_dir, 'lamp_reference.dat'), names='wav, counts')\n",
    "plt.plot(lamp_ref['wav'], lamp_ref['counts'], c='r')\n",
    "plt.xlabel('Wavelength ($\\AA$)')\n",
    "plt.ylabel('Counts');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a rough mapping between column number and wavelength"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this function will be used below to match a given line to a given list\n",
    "def match_to_list(listing, values, plt=None, tol=None, revcoeff=None, c='k'):\n",
    "    matched = []\n",
    "    cols = []\n",
    "    for value in values:\n",
    "        absdiff = np.abs(value - listing)\n",
    "        ind = np.argmin(absdiff)\n",
    "        if tol is None:\n",
    "            bestmatch = listing[ind]\n",
    "        elif absdiff[ind] < tol:\n",
    "            bestmatch = listing[ind]\n",
    "        else:\n",
    "            bestmatch = np.nan\n",
    "        matched.append(bestmatch)\n",
    "\n",
    "        if plt is not None:\n",
    "            plt.axvline(bestmatch, ls='dotted', c=c)\n",
    "            \n",
    "        if revcoeff is not None:\n",
    "            col = np.polyval(revcoeff, bestmatch)\n",
    "            cols.append(col)\n",
    "            print(f\"{bestmatch:.1f} is expected near column {col:.0f}\")\n",
    "\n",
    "    if revcoeff is not None:\n",
    "        return np.array(matched), cols\n",
    "    \n",
    "    return np.array(matched)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Focus on a small wavelength range of the reference spectrum (between `wav1` and `wav2`). Pick a few good (non-blended) lines to start with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "wav1 = 5700\n",
    "wav2 = 7500\n",
    "\n",
    "# plot the reference spectrum in red\n",
    "plt.plot(lamp_ref['wav'], lamp_ref['counts'], label='reference', c='r')\n",
    "plt.xlim(wav1, wav2)\n",
    "plt.xlabel('Wavelength ($\\AA$)')\n",
    "plt.ylabel('Counts'); plt.grid();\n",
    "\n",
    "# mark all wavelengths available in linelist\n",
    "for row in linelist:\n",
    "    plt.axvline(row['wav'], ls='dashed', c='0.8')\n",
    "\n",
    "# pick a few lines in this plot\n",
    "rough_waves = [] # <---- add a few values here\n",
    "refwavs = match_to_list(linelist['wav'], rough_waves, plt=plt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now try to find a section of your `lamp_spec` (between columns `col1` and `col2`) that looks similar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find a section of lamp_spec that looks similar\n",
    "col1 = ???\n",
    "col2 = ???\n",
    "plt.plot(lamp_spec, c='g')\n",
    "plt.xlim(col1, col2)\n",
    "plt.xlabel('Column Number'); plt.ylabel('Counts'); plt.grid()\n",
    "\n",
    "# mark lines with Gaussian-fit centers\n",
    "mark_peaks(plt, lamp_lines)\n",
    "\n",
    "# record the rough column numbers of the same lines as above in the same order\n",
    "rough_cols = [] # <---- add rough column values here\n",
    "refcols = match_to_list(lamp_lines['x'], rough_cols, plt=plt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you have marked 3-4 matching lines, use the cells below to record the line wavelengths in `lamp_lines`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def set_line_identity(lamp_lines, linelist, x, wav):\n",
    "    # find the closest matching lamp_line\n",
    "    ilamp = np.argmin(np.abs(lamp_lines['x'] - x))\n",
    "    \n",
    "    # find the closest matching row in the linelist\n",
    "    ilist = np.argmin(np.abs(linelist['wav'] - wav))\n",
    "    \n",
    "    # reset values in lamp_lines\n",
    "    lamp_lines[ilamp]['id'] = linelist[ilist]['id']\n",
    "    lamp_lines[ilamp]['wav'] = linelist[ilist]['wav']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# record ids and wavelengths of matched lines\n",
    "for col, wav in zip(refcols, refwavs):\n",
    "    set_line_identity(lamp_lines, linelist, col, wav)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set the initial wavelength solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the matched column/wavelength pairs we can define an initial wavelength solution by assuming a polynomial (linear to start) relation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this routine finds the polynomial coefficients needed to transform between \n",
    "# column numbers and wavelengths (and vice versa). Outlier rejection is included.\n",
    "def get_wavelength_solution(lamp_lines, order=4):\n",
    "    wfit = np.isfinite(lamp_lines['wav'])\n",
    "    \n",
    "    # define the reverse mapping (wavelength to column)\n",
    "    revcoeff = np.polyfit(lamp_lines['wav'][wfit], lamp_lines['x'][wfit], order)\n",
    "\n",
    "    # define the forward mapping (column to wavelength)\n",
    "    coeff = np.polyfit(lamp_lines['x'][wfit], lamp_lines['wav'][wfit], order)\n",
    "    \n",
    "    # check the fit for outliers\n",
    "    fit_wav = np.polyval(coeff, lamp_lines['x'])\n",
    "    wavres = fit_wav - lamp_lines['wav']\n",
    "    lamp_lines['wavres'] = wavres\n",
    "    sample = wavres[wfit]\n",
    "    sample.sort()\n",
    "    sample = sample[int(0.1 * sample.size) : int(0.9 * sample.size + 0.5)]    \n",
    "    std = np.std(sample, ddof=1)\n",
    "    w = wfit \n",
    "    w[wfit] = (np.abs(lamp_lines['wavres'][wfit]) < (5 * std))\n",
    "    if w.sum() != lamp_lines.size:\n",
    "        # re-fit with outliers rejected\n",
    "        coeff, revcoeff = get_wavelength_solution(lamp_lines[w], order=order)\n",
    "        \n",
    "        # reset wavelength residuals using new coefficients\n",
    "        fit_wav = np.polyval(coeff, lamp_lines['x'])\n",
    "        wavres = fit_wav - lamp_lines['wav']\n",
    "        lamp_lines['wavres'] = wavres\n",
    "        \n",
    "    lamp_lines['used'] = w\n",
    "    return coeff, revcoeff\n",
    "        \n",
    "def check_wavelength_solution(lamp_spec, lamp_lines, coeff):    \n",
    "    wavs = col_to_wav(coeff, np.arange(lamp_spec.size))\n",
    "    plt.plot(wavs, lamp_spec, c='g', lw=2)\n",
    "    mark_peaks(plt, lamp_lines, 'wav')\n",
    "    plt.colorbar(label='Residual ($\\AA$)')\n",
    "    plt.xlabel('Wavelength ($\\AA$)')\n",
    "    plt.ylabel('Counts')\n",
    "    plt.grid()\n",
    "    \n",
    "def col_to_wav(coeff, cols):\n",
    "    return np.polyval(coeff, cols)\n",
    "\n",
    "def wav_to_col(revcoeff, wavs):\n",
    "    return np.polyval(revcoeff, wavs)\n",
    "\n",
    "def mark_matched(lamp_lines):\n",
    "    for line in lamp_lines:\n",
    "        plt.axvline(line['wav'], ls='dotted', c='k')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# estimate a linear relation between column number and wavlength\n",
    "coeff, revcoeff = get_wavelength_solution(lamp_lines, order=???)\n",
    "\n",
    "# apply the wavelength solution to the column numbers\n",
    "wavs = col_to_wav(???)\n",
    "\n",
    "# plot the initial wavelength calibrated spectrum\n",
    "plt.plot(wavs, lamp_spec, c='g', lw=2, label='lamps')\n",
    "plt.xlim(wav1, wav2)\n",
    "\n",
    "# plot the reference spectrum in red\n",
    "plt.plot(lamp_ref['wav'], lamp_ref['counts'], label='reference', c='r')\n",
    "plt.xlim(wav1, wav2)\n",
    "plt.xlabel('Wavelength ($\\AA$)'); plt.ylabel('Counts'); plt.grid();\n",
    "mark_matched(lamp_lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use the initial wavelength solution to match more lines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check for more matches in the range already fit\n",
    "def match_more(lamp_lines, linelist, order=4, tol=2):\n",
    "    coeff, revcoeff = get_wavelength_solution(lamp_lines, order=order)\n",
    "    wfit = np.isfinite(lamp_lines['wav'])\n",
    "    minwav = lamp_lines['wav'][wfit].min()\n",
    "    maxwav = lamp_lines['wav'][wfit].max()\n",
    "    \n",
    "    xmin = lamp_lines['x'][wfit].min()\n",
    "    xmax = lamp_lines['x'][wfit].max()\n",
    "    \n",
    "    w = (lamp_lines['x'] > xmin) & (lamp_lines['x'] < xmax)\n",
    "    for line in lamp_lines[w]:\n",
    "        rough_wav = col_to_wav(coeff, line['x'])\n",
    "        refwav = match_to_list(linelist['wav'], [rough_wav], tol=tol)\n",
    "        if np.isfinite(refwav):\n",
    "            #print(f'matched column {line[\"x\"]:.1f} to wavelength {refwav[0]}')\n",
    "            set_line_identity(lamp_lines, linelist, line['x'], refwav)\n",
    "            \n",
    "match_more(lamp_lines, linelist, order=???)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# re-fit with a higher order\n",
    "coeff, revcoeff = get_wavelength_solution(lamp_lines, order=???)\n",
    "check_wavelength_solution(lamp_spec, lamp_lines, coeff)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use the revised wavelength solution to extend the range of matches"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pick lines near the ends of the spectra to extend the solution across the full wavelength range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "wav1 = 5300\n",
    "wav2 = 10000\n",
    "\n",
    "# plot line list over lamp spectra\n",
    "wavs = col_to_wav(coeff, np.arange(lamp_spec.size))\n",
    "plt.plot(wavs, lamp_spec, c='g')\n",
    "plt.xlim(wav1, wav2)\n",
    "\n",
    "# show available lines in the line list with dashed grey lines\n",
    "for row in linelist:\n",
    "    plt.axvline(row['wav'], ls='dashed', c='0.8')\n",
    "\n",
    "# pick new lines to add\n",
    "rough_waves = [] # <--- add new lines here\n",
    "refwavs, rough_cols = match_to_list(linelist['wav'], rough_waves, plt=plt, revcoeff=revcoeff, c='r')\n",
    "refcols = match_to_list(lamp_lines['x'], rough_cols, plt=plt)\n",
    "\n",
    "# mark existing matches\n",
    "mark_peaks(plt, lamp_lines, 'wav')\n",
    "plt.xlabel('Wavelength ($\\AA$)'); plt.ylabel('Counts');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check the supposed matches\n",
    "col1, col2 = np.polyval(revcoeff, [wav1, wav2])\n",
    "refcols = match_to_list(lamp_lines['x'], rough_cols)\n",
    "\n",
    "plt.plot(lamp_spec, c='g')\n",
    "mark_peaks(plt, lamp_lines, 'x', 'y')\n",
    "\n",
    "plt.xlim(col1, col2)\n",
    "refcols = match_to_list(lamp_lines['x'], rough_cols, plt=plt)\n",
    "plt.xlabel('Column Number'); plt.ylabel('Counts');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add new lines to the matches\n",
    "for col, wav in zip(refcols, refwavs):\n",
    "    set_line_identity(lamp_lines, linelist, col, wav)\n",
    "    \n",
    "# auto-match more lines\n",
    "match_more(lamp_lines, linelist)\n",
    "\n",
    "# re-fit\n",
    "coeff, revcoeff = get_wavelength_solution(lamp_lines, order=???)\n",
    "check_wavelength_solution(lamp_spec, lamp_lines, coeff)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot wavelength residuals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure you have a good wavelength solution by checking the residuals. For the supplied Keck/LRIS data, the wavelength solution should be accurate to a fraction of an Angstrom."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "w = lamp_lines['used']\n",
    "std = np.std(lamp_lines['wavres'][w], ddof=1)\n",
    "print(f'STD of wavelength residual is {std:0.2} Angstrom')\n",
    "\n",
    "plt.scatter(lamp_lines['wav'][w], lamp_lines['wavres'][w])\n",
    "plt.scatter(lamp_lines['wav'][~w], lamp_lines['wavres'][~w], marker='x')\n",
    "plt.axhspan(-std, std, color='k', alpha=0.1)\n",
    "plt.xlabel('Wavelength ($\\AA$)'); plt.ylabel('Counts'); plt.grid();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot target spectra vs. wavelength"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can apply the wavelength solution from the arc lamps to the science spectra to produce a wavelength calibrated science spectrum. Note that it is possible for there to be a wavelength offset between the arc lamp and science spectra. See the included [Guide to Long-Slit Spectral Extractions](spectra_guide.ipynb) for a procedure to correct for this shift. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# calculate the wavelengths for each column number\n",
    "wavs = col_to_wav(coeff, np.arange(lamp_spec.size))\n",
    "\n",
    "# plot the spectra\n",
    "plt.plot(wavs, spec);\n",
    "\n",
    "# zoom in on H-alpha\n",
    "plt.xlim(???, ???)\n",
    "plt.grid()\n",
    "\n",
    "# make a guess at the central wavelength\n",
    "obswav_guess = ???\n",
    "plt.axvline(obswav_guess, ls='dashed', c='k');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Measure the observed wavelength of H$\\alpha$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can assume the same Gaussian line model used above to fit for the precise line center."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fit a Gaussian to find line center\n",
    "guess = ???\n",
    "bounds = ???\n",
    "popt, pcov = curve_fit(gaussian, wavs, spec, p0=guess, bounds=bounds)\n",
    "\n",
    "obswav = popt[1]\n",
    "obswav_error = np.sqrt(pcov[1, 1])\n",
    "print(f'Observed wavelength of H-alpha is {obswav:.3f} +/- {obswav_error:.3f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use the known laboratory wavelength to determine the redshift\n",
    "restwav = 6562.78\n",
    "redshift = ???\n",
    "redshift_error = ???\n",
    "print(f'The redshift is {redshift:.5f} +/- {redshift_error:.5f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Optional] use other lines in the spectrum to test redshift"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# [Optional] Going Further: Publication Quality Spectral Extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Following the steps in this notebook you should now have a measurement for the redshift of the target. For many applications this is all you will need, but there are a number of improvements that can be made as well as additional steps that can be followed to produce a fully reduced, publication quality spectrum. See the included [Guide to Long-Slit Spectral Extractions](spectra_guide.ipynb) for the complete guide."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}