{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image Reduction Module\n",
    "\n",
    "**Lecturer:** Dan Perley<br>\n",
    "**Jupyter Notebook Authors:** Dan Perley, Kishalay De & Cameron Hummels\n",
    "\n",
    "This is a Jupyter notebook lesson taken from the GROWTH Summer School 2020.  For other lessons and their accompanying lectures, please see: http://growth.caltech.edu/growth-school-2020.html\n",
    "\n",
    "## Objective\n",
    "Process raw images from a visible wavelength telescope and make them ready for photometric analysis\n",
    "\n",
    "## Key steps\n",
    "- Reduce raw images by applying biases & flats\n",
    "- Align and combine consecutive images\n",
    "\n",
    "## Required dependencies\n",
    "\n",
    "See GROWTH school webpage for detailed instructions on how to install these modules and packages.  Nominally, you should be able to install the python modules with `pip install <module>` (or `pip3 install <module>`).  The external astromatic packages are easiest installed using package managers (e.g., `rpm`, `apt-get`).\n",
    "\n",
    "### Python modules\n",
    "* python 3\n",
    "* astropy\n",
    "* numpy\n",
    "* matplotlib\n",
    "* photutils\n",
    "* pyregion\n",
    "\n",
    "### External packages\n",
    "* SExtractor https://www.astromatic.net/software\n",
    "* SWarp https://www.astromatic.net/software (optional)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## Setup\n",
    "\n",
    "### Import modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Import necessary modules \n",
    "\n",
    "from astropy.io import fits\n",
    "from astropy.wcs import WCS\n",
    "from astropy.stats import sigma_clipped_stats\n",
    "import glob\n",
    "import os\n",
    "import subprocess\n",
    "import warnings\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import photutils\n",
    "import pyregion\n",
    "\n",
    "# You can ignore any warnings that appear below, but if any modules can't be imported you need to install them"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test dependencies\n",
    "\n",
    "In order to complete the module some external software is required.  The following step checks that these are installed properly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_dependency(dep, alternate_name=None):\n",
    "    \"\"\"\n",
    "    Test external dependency by trying to run it as a subprocess\n",
    "    \"\"\"\n",
    "    try:\n",
    "        subprocess.check_output(dep, stderr=subprocess.PIPE, shell=True)\n",
    "        print(\"%s is installed properly as %s. OK\" % (dep, dep))\n",
    "        return 1\n",
    "    except subprocess.CalledProcessError:\n",
    "        try:\n",
    "            subprocess.check_output(alternate_name, stderr=subprocess.PIPE, shell=True)\n",
    "            print(\"%s is installed properly as %s. OK\" % (dep, alternate_name))\n",
    "            return 1\n",
    "        except subprocess.CalledProcessError:\n",
    "            print(\"===%s/%s IS NOT YET INSTALLED PROPERLY===\" % (dep, alternate_name))\n",
    "            return 0\n",
    "    \n",
    "dependencies = [('sextractor', 'sex'), ('SWarp', 'swarp')]\n",
    "i = 0\n",
    "for dep_name1, dep_name2 in dependencies:\n",
    "    i += test_dependency(dep_name1, dep_name2)\n",
    "print(\"%i out of %i external dependencies installed properly.\\n\" % (i, len(dependencies)))\n",
    "if i != len(dependencies):\n",
    "    print(\"Please correctly install these programs before continuing by following the instructions in README.md.\")\n",
    "else:\n",
    "    print(\"You are ready to continue.\")        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SWarp and SExtractor are needed only for the second half of the module.  If you are unable to install these packages, it is possible to follow most of the module without them.  If they are installed under a command other than \"sextractor\" and \"SWarp\", respectively, you will have to slightly modify some of the example code blocks below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define where data live\n",
    "\n",
    "The various directories where data live are defined here for use later in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "## Define data directories\n",
    "\n",
    "curpath = os.path.abspath('.')                    # top level directory\n",
    "dataFolder = os.path.join(curpath, 'data')        # data directory\n",
    "biasFolder = os.path.join(dataFolder, 'bias')     # bias frames subdirectory\n",
    "flatFolder = os.path.join(dataFolder, 'flat')     # flat fields subdirectory\n",
    "sciFolder = os.path.join(dataFolder, 'science')   # science data subdirectory\n",
    "procFolder = os.path.join(curpath, 'processing')  # processing directory\n",
    "if not os.path.isdir(procFolder): \n",
    "    os.mkdir(procFolder)\n",
    "else:\n",
    "    for f in os.listdir(procFolder):\n",
    "        try:\n",
    "            os.remove(os.path.join(procFolder,f)) # clear the processing folder from previous iterations\n",
    "        except:\n",
    "            print('Could not remove',f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we get lists of all the various file types, including lists and names of files we'll be writing out later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Get file lists\n",
    "\n",
    "os.chdir(sciFolder)\n",
    "fileList = sorted(glob.glob('*.fits'))\n",
    "os.chdir(curpath)\n",
    "biasList = sorted(glob.glob(os.path.join(biasFolder,'*.fits')))\n",
    "flatList = sorted(glob.glob(os.path.join(flatFolder,'*.fits')))\n",
    "sciList = [os.path.join(sciFolder, file) for file in fileList]\n",
    "procList = [os.path.join(procFolder, file).replace('.fits','.proc.fits') for file in fileList]\n",
    "combinedFile = os.path.join(procFolder, 'AT2018cow_combined.fits')\n",
    "weightFile = os.path.join(procFolder, 'AT2018cow_weight.fits')\n",
    "resampledFile = os.path.join(procFolder, 'AT2018cow_resampled.fits')\n",
    "\n",
    "print('Found',len(biasList),'bias files; ',len(flatList),'flat files; ',len(sciList),'science files')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this module we will turn off python warnings, since the properties of the files we'll be using tend to generate quite a few spurious warnings messages that would distract from the content of the module.  In general it's better to turn off warnings more selectively only for sections that require it, but hiding warnings globally makes the code a little simpler to read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## Preliminary Data Inspection and Exploration\n",
    "\n",
    "Like nearly all astronomical images, our data are stored in the form of FITS files.   FITS files contain a plaintext header storing the observational metadata (e.g. the name of the telescope) plus an array of pixel data.  FITS files can may be 'simple' (single header, single pixel image) or may contain extensions (additional headers and/or pixel images).  \n",
    "\n",
    "FITS files are loaded and examined using the astropy library in python.  Below we open up a FITS file and take a look at its structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Open an example FITS file\n",
    "\n",
    "exampleFile = sciList[0]         # The first science image\n",
    "HDUList = fits.open(exampleFile) # Open the file\n",
    "HDUList.info()                   # Print some information about the FITS file structure.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This reveals that the FITS header metadata is in extension 0 (the 'primary' header) and the pixel data is in extension 1.  The image dimensions are 2148x2501 pixels.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Investigate the header\n",
    "\n",
    "A good FITS header will contain almost everything we need to know about the observation.  Header data is stored as keyword + value pairs, optionally with a comment following the value.  Keywords are all-capitals and 8 characters or less; corresponding values can be much longer.  There is almost no standardization in the use of keywords at different telescopes, but OBJECT and EXPTIME are commonly used to store the name of the object being observed and the integration time in seconds, respectively.\n",
    "\n",
    "Below we will load the header and print out a limited set of the header lines in \"raw\" format (showing how the header metadata is stored in the file), and also look up a few specific bits of information using their keywords."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Print some header properties\n",
    "\n",
    "header = HDUList[0].header # Get the header out of the HDU list.\n",
    "print(repr(header[0:13]))  # Print the first 14 lines of the header (in raw format).\n",
    "print(repr(header[24:34])) # Print some additional lines\n",
    "print(repr(header[48:52])) \n",
    "print(repr(header[67:71])) \n",
    "print(repr(header[129:135])) \n",
    "print()\n",
    "print('The object is:', header['OBJECT'])   # Print the name of the object being observed\n",
    "print('The filters are:', header['ACAMFILT']) # Print the names of the optical filters in the light path.\n",
    "print('The exposure time is:', header['EXPTIME'])  # Print the exposure time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From this we can infer that this is a Sloan g-band image of AT2018cow and the exposure time was 90 seconds.  We can also see that this data was taken with the ACAM instrument on the WHT last July 14th (by your module leader).  This is only a select portion of the header; printing out the entire header would show other keywords relating to the detector settings and temperature and other properties of the instrument, telescope, and/or facililty."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Examine the data\n",
    "\n",
    "It's usually also a good idea to visually inspect the data to get a feel for its appearance and to make sure it doesn't have any obvious problems before we analyze it.  We will use the imshow command within pyplot, which requires us to provide some parameters defining the minimum and maximum range of the colorbar scale.  For this, we will use sigma-clipped image pixel statistics to come up with some reasonable values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Get and display the image data\n",
    "\n",
    "data = HDUList[1].data  # Get the data array (a simple numpy array) from the first extension.\n",
    "\n",
    "mean, median, std = sigma_clipped_stats(data) # get some image statistics\n",
    "plt.figure(figsize=(10,10))                     # set up the plot panel\n",
    "plt.imshow(data, vmin = median - 2*std, vmax = median + 20*std, origin='lower')\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This shows us that the field of view of the instrument is circular and centered at approximately x=1000, y=1100.  A significant fraction of the CCD is not exposed, which is important to make note of - the image statistics we calculated earlier include a very large number of blank pixels, a problem that not even sigma clipping can solve.  It would be better to use a subset of the image without any unexposed pixels to calculate the statistics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean, median, std = sigma_clipped_stats(data[500:1600,500:1600]) # get some image statistics\n",
    "plt.figure(figsize=(10,10))                     # set up the plot panel\n",
    "plt.imshow(data, vmin = median - 2*std, vmax = median + 20*std, origin='lower')\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this stretch some of the issues intrinsic to raw data become more obvious: the background seems to vary due to varying sensitivity across the field, there are some dark spots on the detector, and even the unexposed pixels have a quite large ADU count of around 2150.  Correcting or removing these effects to produce a \"pure\" image of the night sky is the role of data reduction.  There are two generic steps (bias correction and flat-fielding), which can be followed by a variety of possible high-level calibration and artifact-correction steps.  We will address these in the next section.  For now, we are done inspecting the science image so we will close it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HDUList.close()                  # Close the file (good practice, but not strictly necessary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## Creating a bias frame\n",
    "\n",
    "The detector image comes with an electronic offset introduced by the voltages applied to the amplifier.   This can be clearly seen in the science image above: even the unexposed parts of the detector have an apparent signal level of about 1900 counts.  We have to remove this offset to get real counts received by the detector. \n",
    "\n",
    "Different bias-removal methods are employed by different detectors.  In most cases, we record a set of images (\"bias frames\") with zero exposure time, so that the recorded image reflects *only* the intrinsic offsets in the detector (the detector does not receive any photons if the exposure time is zero).  To reduce detector noise and to remove cosmic ray contamination, we usually take many bias frames and average them together using median-combination.  (Even though the exposure time is zero, cosmic rays can strike the detector during readout.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Investigate the data\n",
    "\n",
    "As always, it's a good idea to inspect your images before you process them.  During the first <font color='red'>breakout segment</font>, open the first bias image and verify that it doesn't contain any obvious astronomical signals."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 1: Open and display the first bias image, and determine its median and clipped standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 1:  Display the first bias image and determine its clipped standard deviation.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see a largely empty image with a hint of structure (faint diagonal and horizontal lines, and lower counts towards the edges), with a blip in the lower right corner (most likely caused by a cosmic ray hitting the detector during readout).  <font color='red'>*You can return from the breakout room once you've produced this, or when time is up.*</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Median-combine the bias images\n",
    "\n",
    "This process could be repeated with the other four images (in Python or using external software like DS9), but we'll trust the data for now and proceed to the next step, which is to combine all five bias frames together.  To do this, we set up a 3D numpy array and put a bias frame in each layer, and then median-combine along the 3rd dimension to produce a single, median-combined image (our \"master bias\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Median-combine the bias frames to make a master bias\n",
    "\n",
    "# Create a 3D array to store all the bias files together.\n",
    "ny = 2501\n",
    "nx = 2148\n",
    "numBiasFiles = len(biasList)\n",
    "biasImages = np.zeros((ny, nx, numBiasFiles))\n",
    "\n",
    "# Add the files to the array\n",
    "for i in range(numBiasFiles):\n",
    "        HDUList = fits.open(biasList[i])     # Open the file\n",
    "        biasImages[:,:,i] = HDUList[1].data  # Load the data into the appropriate layer\n",
    "        HDUList.close()                      # Close the file\n",
    "\n",
    "# Create the master bias frame by doing a median combination for each pixel over all layers\n",
    "masterBias = np.median(biasImages, axis=2)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 2: Display the master bias image and check that the noise is reduced compared to a single bias frame by calculating and printing out its (sigma-clipped) standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 2:  Display the master bias image and calculate its median and clipped standard deviation.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It should look pretty similar to the individual bias frame, but with lower noise.  Read noise is generally negligible compared to the sky noise for broadband imaging observations such as this one, so this isn't a major consideration here.  However, for narrowband observations or for spectroscopy this may not be the case, so it's always a good idea to take many bias frames if possible."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## Creating a flat field\n",
    "\n",
    "The image recorded on the detector $I(x,y)$ is a product of the true brightness distribution on the sky $S(x,y)$ and the response of the telescope optics and detector in different parts of the image $F(x,y)$. with any bias signal added: i.e.: \n",
    "\n",
    "$I(x,y) = F(x,y) S(x,y) + B(x,y)$.  \n",
    "\n",
    "We already have $B(x,y)$ thanks to our bias frames, but we also need to determine $F(x,y)$ to reconstruct $S(x,y)$.\n",
    "\n",
    "To do this, we expose the detector to a uniform light source -- this can either be a uniformly illuminated part of the telescope dome or the twilight sky (the assumption is that the sky is uniformly bright over a small field of view)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Investigate the data\n",
    "\n",
    "As usual it's good to start by looking at the data.  During the second <font color='red'>breakout segment</font> you will visually examine the first image and print out a few basic properties for the remaining images."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 3: Open and display the first flat-field image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 3:  Display the first flat-field image.  \n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see the circular field of view of the instrument with no sources discernable within, but some apparent structure with significantly higher counts in the middle than towards the edges.  You will likely want to play around with the \"vmin\" and \"vmax\" values a bit to see this structure.  Once you see this, please continue to the next exercise."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's a good idea to scan through all the flat-field files to make sure that the configuration is what we expect (e.g. the filter didn't change) and that the frames aren't saturated.  In the next exercise you will write a loop to do this check."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 4: Verify that all five flat-field images are acceptable by looping over each file and printing out the filter (ACAMFILT keyword), the exposure time (EXPTIME keyword), and the median pixel value (calculated from the data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 4: Verify that all five flat-field images are acceptable by printing out some basic properties.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your output should confirm that all images are in the same filter and have median count values well below the saturation level of about 63500 (65535-2000, or 2^16 minus the bias value that was subtracted off earlier).  <font color='red'>*You can return from the breakout room once you've confirmed that this is the case.*</font>  Note that the exposure times differ - why might this be the case?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Normalize and combine flat-fields\n",
    "\n",
    "The master flat field is formed by median combination in a similar way to the master bias frame, above.  There are two big differences.  First, we have to remove the bias from each image.  Second, we have to adjust for the fact that the light level may be changing from image to image by measuring the average pixel level in each image and dividing the image by it (normalizing).\n",
    "\n",
    "It is important that this normalization level be calculated only with pixels for which there is actually some signal.  For this camera much of the CCD is unused and has no signal, so we will use a square subset of the image and calculate its median.  Then we divide by this factor and stick it into the appropriate layer of a 3D array.  The median combination at the end is left an an exercise for you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Load and normalize the flat fields in preparation for making a master flat field.\n",
    "\n",
    "# Set up the 3D array\n",
    "numFlatFiles = len(flatList)\n",
    "flatImages = np.zeros((ny, nx, numFlatFiles))\n",
    "\n",
    "# Load the files into the array, with bias subtraction and normalization\n",
    "for i in range(numFlatFiles):\n",
    "        # Load the data from the fits file\n",
    "        HDUList = fits.open(flatList[i])\n",
    "        data = HDUList[1].data * 1. # Convert to floating point\n",
    "        HDUList.close()\n",
    "\n",
    "        # Bias-subtract, normalize, and add to the array layer\n",
    "        data -= masterBias\n",
    "        normfactor = np.median(data[500:1600,500:1600])\n",
    "        print(normfactor)\n",
    "        flatImages[:,:,i] = data / normfactor\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 5: Perform the median combination to produce a 2D array called masterFlat, and display it. (Be smart with the scaling: use only the exposed area to calculate the stats for vmin/vmax.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 5: Median-combine the flat field layer array into a master flat and display it.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Flag unexposed pixels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that a large fraction of the above image has no signal at all, as you can easily check by changing the scaling above to e.g. vmin=0, vmax=1.1.  Dividing no-signal values by very low flat-field values will fill the unexposed part of the detector with garbage values, which isn't desirable.   Instead, we will set all the low pixel values to NaN (\"not a number\"), which signifies that these pixels have no scientific value.  These NaN values will propogate to the science images after flat-field division and flag the unwanted pixels in those as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Set the unexposed pixels to NaN, and display again\n",
    "\n",
    "masterFlatFixed = np.copy(masterFlat) # Use 'copy' to preserve the original masterFlat\n",
    "if np.any(masterFlat < 0.2):\n",
    "        # Set all flat-field values lower than 0.2 to NaN\n",
    "        masterFlatFixed[masterFlat < 0.2] = float('NaN') \n",
    "\n",
    "plt.figure(figsize=(10,10))\n",
    "#scale the image according to its statistics\n",
    "plt.imshow(masterFlatFixed, vmin = 0, vmax = median + 5*std, origin='lower')\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, we have the necessary calibration images to reduce science images and visualize the true brightness distribution on the sky.  Remember that the observed intensity on the detector is:\n",
    "\n",
    "$I(x,y) = F(x,y) \\,  S(x,y) + B(x,y)$\n",
    "\n",
    "We now have to retrieve $S(x,y)$ from $I(x,y)$, by inverting this equation, i.e.:\n",
    "\n",
    "$S(x,y) = \\frac{I(x,y)-B(x,y)}{F(x,y)}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-processing science frames\n",
    "\n",
    "Now we will apply the bias flat calibrations to the science frames and write processed science frames to disk for further processing.  To simplify matters, we will write these processed files in 'simple' FITS format, combining the primary header data with the (bias- and flat-corrected) image data in the FITS extension for each file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "## Bias subtract and flat-field all science frames and write out pre-processed files.\n",
    "\n",
    "numSciFiles = len(sciList)\n",
    "print('Found %d science files'%numSciFiles)\n",
    "\n",
    "for i in range(numSciFiles):\n",
    "        # Read in the FITS data.\n",
    "        HDUList = fits.open(sciList[i])\n",
    "        primaryHeader = HDUList[0].header\n",
    "        imageData = HDUList[1].data    \n",
    "        HDUList.close()\n",
    "        \n",
    "        # Correct for the bias and flats here\n",
    "        procData = (imageData - masterBias) / masterFlatFixed\n",
    "        \n",
    "        # Prepare the output FITS structure in simple format\n",
    "        procHDU = fits.PrimaryHDU(procData)  # Create a new HDU with the processed image data\n",
    "        procHDU.header = primaryHeader       # Copy over the header from the raw file\n",
    "        procHDU.header.add_history('Bias corrected and flat-fielded') # Add a note to the header\n",
    "\n",
    "        # Write the reduced frame to disk\n",
    "        print(sciList[i],'->',procList[i])\n",
    "        procHDU.writeto(procList[i], overwrite=True)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now loop through the processed science images we just wrote to the disk and plot them to see what they look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Inspect the processed science files\n",
    "\n",
    "plt.figure(figsize=(14,14))\n",
    "for i in range(numSciFiles):\n",
    "        procHDU = fits.open(procList[i])\n",
    "        procData = procHDU[0].data\n",
    "        procHDU.close()\n",
    "        \n",
    "        mean, median, std = sigma_clipped_stats(procData)\n",
    "        plt.subplot(2,2,1+i)\n",
    "        plt.imshow(procData, vmin = median - 2*std, vmax = median + 10*std, origin='lower')\n",
    "        plt.colorbar()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These images are suitable for some types of scientific analysis.  However, we would also like to combine all three together to remove cosmic rays and reduce the noise (go deeper).  This is not as simple as combining the biases or the flats: the images were dithered (the telescope was moved between each exposure) so the astronomical sources lie at different pixel positions in different images.  We will have to align the images relative to each other before stacking."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Relative astrometric aligment\n",
    "\n",
    "Aligning images is a form of astrometry (the measurement of precise positions of objects on the sky).  We will worry about absolute astrometry versus the all-sky RA, DEC reference system later, but we do need a way to measure and remove the relative offsets between images now.  If our image-to-image dithering motions are small, the telescope field of view does not contain significant distortions, and the instrument was not rotated, we can do this ourselves in Python by measuring the relative positions of a reference star from image to image and offsetting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Measure reference star locations\n",
    "\n",
    "The first step is to find a nice reference star near the center of the image.  By measuring its centroid as it shifts between all three images, we can calculate how much to offset each image to line them up with each other.  Fortunately there is an excellent, nonsaturated star just to the left of the bright galaxy that's well-suited for this purpose.  Its approximate positions (x,y) in the three science images, estimated by eye, are (998,1161), (998,1121), and (958,1121) respectively.  But it would be better to measure the position more precisely, so the below loop takes a little cutout around our rough position in each image, calculates the centroid, and stores that in a separate list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Measure the precise position of a star in each image\n",
    "\n",
    "# Set the approximate star coordinates\n",
    "estimated_star_x = [998,998,958]\n",
    "estimated_star_y = [1161,1121,1121]\n",
    "actual_star_x = [0,0,0] # dummy values\n",
    "actual_star_y = [0,0,0]\n",
    "\n",
    "for i in range(numSciFiles):\n",
    "        HDUList = fits.open(procList[i])\n",
    "        procData = HDUList[0].data\n",
    "        HDUList.close()\n",
    "        (x0, x1) = (estimated_star_x[i]-20, estimated_star_x[i]+20)\n",
    "        (y0, y1) = (estimated_star_y[i]-20, estimated_star_y[i]+20)\n",
    "        cutout = procData[y0:y1,x0:x1]\n",
    "        mean, median, std = sigma_clipped_stats(cutout)\n",
    "        cx, cy = photutils.centroid_com(cutout-median)\n",
    "        actual_star_x[i] = cx+x0\n",
    "        actual_star_y[i] = cy+y0\n",
    "        print('Image', i, ':', actual_star_x[i], actual_star_y[i])\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply offsets and crop/shift images\n",
    "\n",
    "Next we apply these offsets to the image.  In the general case (e.g. if rotations are involved or if the pixel offsets are not pure integers), this involves resampling the image along a new coordinate grid.  To keep things simple, we are just going to apply a simple integer offset by taking appropriately matching slices of each array.  The first image will be cropped from y=150:2150 and x=150:2015 (2000 pixels on each axis).  The other two images will be cropped similarly but with the x and y starting and ending values changed as needed based on their offset.\n",
    "\n",
    "While we're at it, we also subtract the sky background: because the sky level can vary, it's important to remove it prior to stacking the images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Offset each image by subsetting to line them up with each other, and subtract the sky\n",
    "\n",
    "# Define the default crop region and set up the 3D array\n",
    "xmin = 50\n",
    "xmax = 2050\n",
    "ymin = 150\n",
    "ymax = 2150       \n",
    "alignedImages = np.zeros((ymax-ymin,xmax-xmin, numSciFiles))\n",
    "\n",
    "# Calculate and apply offsets, and load the 3D array with appropriately shifted data\n",
    "for i in range(numSciFiles):\n",
    "        # Calculate the offset relative to the reference image\n",
    "        xoffset = int(round(actual_star_x[i]-actual_star_x[0]))\n",
    "        yoffset = int(round(actual_star_y[i]-actual_star_y[0]))\n",
    "        print('Image', i, 'offset:', xoffset, yoffset)\n",
    "\n",
    "        # Open the FITS file and get the data\n",
    "        HDUList = fits.open(procList[i])        \n",
    "        procData = HDUList[0].data\n",
    "        HDUList.close()\n",
    "        \n",
    "        # Crop the data appropriately to match up with the reference image\n",
    "        shiftData = procData[ymin+yoffset:ymax+yoffset, (xmin+xoffset):(xmax+xoffset)]\n",
    "        shiftData -= np.nanmedian(shiftData) # Subtract sky\n",
    "        alignedImages[:,:,i] = shiftData     # Add this layer to the 3D array\n",
    "        \n",
    "# Verify the images are aligned by plotting a little cutout around where the star should be\n",
    "for i in range(numSciFiles):\n",
    "        plt.subplot(1,3,1+i)\n",
    "        (starX, starY) = (int(round(actual_star_x[0]-xmin)), int(round(actual_star_y[0]-ymin)))\n",
    "        starCutout = alignedImages[starY-9:starY+9,starX-9:starX+9,i]\n",
    "        plt.imshow(starCutout, vmin = 0, vmax = 10000, origin='lower')        \n",
    "        \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The star is now in the same place (same pixels) on all three images, so that's good."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Median-combine science images\n",
    "\n",
    "During the third and last <font color='red'>breakout segment</font>, you will median-combine the aligned images together (in case you haven't noticed, optical astronomers are really into medians).  Note that while we have subtracted the sky, we have not scaled the data, so we are assuming all three images have the same exposure time and very similar sky conditions (no clouds).  In variable seeing or transparency a median combination will produce strange results!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### EXERCISE 6: Median-combine the shifted/cropped science images and save the result to disk.   For the output filename, use the string stored in combinedFile.  (Hint: use nanmedian for the stack, and re-use code from when the processed files were written to disk)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## EXERCISE 6: Median-combine the shifted/cropped science images and save the result to disk.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If this worked correctly and you wrote the file out to the path given in the combinedFile variable, the code below should display it for you:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Confirm the stack worked correctly by reloading the combined image from disk and displaying it.\n",
    "\n",
    "HDUList = fits.open(combinedFile) # Open the first science file in order to retrieve its header\n",
    "combinedImage = HDUList[0].data\n",
    "HDUList.close()\n",
    "\n",
    "plt.figure(figsize = (12,10))\n",
    "mean, median, std = sigma_clipped_stats(combinedImage)\n",
    "plt.imshow(combinedImage, vmin = median - 1*std, vmax = median + 100*std, origin='lower')\n",
    "plt.colorbar()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point you should see an image of the field similar to the one that we displayed during the \"Examine the Data\" step, but the pixel level of the background sky should be close to zero and you should not be able to discern any artifacts away from the edge of the field of view.  <font color='red'>*If this is the case, you can return from the breakout room.*</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Absolute astrometric calibration\n",
    "\n",
    "We now have a combined, science-ready image... sort of.  In reality, doing any sort of scientific analysis on this image as it stands will be a big pain, because it doesn't contain any positional information.  If we wanted to extract information about a particular star or galaxy, or (worse) a whole list of objects, this would be extremely frustrating: we would have to scan the image by eye to find each object, write down its pixel coordinates, and send them one-by-one to our photometry or other analysis code.\n",
    "\n",
    "Fortunately, the FITS format can encode astrometric (positional) information via the header, enabling us to transform (x,y) to (RA,Dec) and back again for any location in the image.  Basic header keywords encode the position, pixel scale, and rotation of the image (plus the spherical projection to be used); these are fully standardized and supported by nearly all astronomical software.  Additional keywords provide distortion terms to allow for deviations from perfectly rectilinear CCD/optics, although unfortunately there are several conflicting conventions for treating distortions and there is no guarantee that your software will support them correctly.\n",
    "\n",
    "Solving astrometry requires matching a group of stars detected in your image to the stars in the catalog, and solving (fitting) for the parameters in the system of astrometric equations that relate pixel coordinates (x,y) to world coordinates (RA,Dec).  Doing this by hand is painstaking so we will rely on a general-purpose python script that pattern-matches groups of stars in the image to stars in sky catalogs, solves for the astrometric parameters, and writes them to the header.  This process is entirely automated: all we have to give the code is the pixel scale (the projected \"size\" of each pixel in arcseconds), which for WHT/ACAM is 0.253.\n",
    "\n",
    "The code requires SourceExtractor (aka SExtractor), an external piece of software that automatically detects and measures the properties of all star-like objects in the image.  Note that you may have to modify the \"autoastrometry3.py\" file if this is called in an unusual way from your operating system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Astrometrically calibrate the reduced image using an external python script\n",
    "\n",
    "autoastrometry_script = os.path.join(dataFolder, 'autoastrometry3.py')\n",
    "os.chdir(procFolder)\n",
    "sextractorCommand = 'sextractor' # Change this if you get an error below\n",
    "\n",
    "try:\n",
    "    #Run the autoastrometry script using 2MASS as the reference catalog by specifying 'tmc' with the '-c' option\n",
    "    astromFile = combinedFile.replace('.fits','.astrom.fits')\n",
    "    command = 'python3 %s %s -c sdss -px 0.253 -inv -o %s -sex %s -1' % (autoastrometry_script, combinedFile, astromFile, sextractorCommand)\n",
    "    print('Executing command: %s' % command)\n",
    "    print('Processing...')\n",
    "    rval = subprocess.run(command.split(), check=True, capture_output=True)\n",
    "    print('Process completed.')\n",
    "    print(rval.stdout.decode())\n",
    "except subprocess.CalledProcessError as err:\n",
    "    print('Could not run autoastrometry with error %s. Check if file exists.'%err)\n",
    "    \n",
    "os.chdir(curpath) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If this worked you should see a list of successful matches followed by information about the applied rotation and spatial offset.  To make sure the astrometric solution is correct, we can load the star positions (in world coordinates, i.e. RA/Dec) on top of our image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "## Display astrometrically calibrated image with star catalog overlaid\n",
    "\n",
    "# Open the file and get the info\n",
    "HDUList = fits.open(astromFile)\n",
    "header = HDUList[0].header\n",
    "combinedImage = HDUList[0].data\n",
    "HDUList.close()\n",
    "\n",
    "# Display the image, associating the WCS information with the axes\n",
    "wcs = WCS(header)\n",
    "fig = plt.figure(figsize=(14,14))\n",
    "ax = plt.subplot(projection=wcs)\n",
    "plt.imshow(combinedImage, vmin=0, vmax=1000, origin=\"lower\")\n",
    "\n",
    "# Load star catalog region file\n",
    "regionFile = os.path.join(procFolder,'cat.wcs.reg')\n",
    "r = pyregion.open(regionFile).as_imagecoord(header=header)\n",
    "patch_list, artist_list = r.get_mpl_patches_texts()\n",
    "\n",
    "# Overlay the star catalog on the plot\n",
    "for p in patch_list:\n",
    "    ax.add_patch(p)\n",
    "for t in artist_list:\n",
    "    ax.add_artist(t)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of the catalog star positions should land on top of bright stars in the image, indicating that our image is correctly aligned to at least a reasonable approximation.  Small deviations may be due to proper motions (stars move!) and catalog errors, or due to distortions in our image field that we have not tried to solve for.\n",
    "\n",
    "Despite the lack of distortion solution, this should be good enough to find any star or object in the field automatically given its coordinates.  Also, once an image is approximately aligned applying further corrections (using more sophisticated software like scamp) is much easier.\n",
    "\n",
    "You can load this image in DS9 or other viewing software to interactively explore it in more detail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resampling\n",
    "\n",
    "The above image does not have North up and East to the left, the normal convention for displaying astronomical images.  The easiest way to project an image to the standard reference system is to resample it using SWarp, another piece of external code.  This can be done as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Resample the image to standard orientation using SWarp\n",
    "\n",
    "swarpConfigFile = os.path.join(dataFolder, 'stack.swarp')\n",
    "swarpCommand = 'SWarp' # Change this if you get an error below\n",
    "\n",
    "try:\n",
    "    #IMAGEOUT_NAME is the output stacked file, RESAMPLE_DIR is the directory where the resampled images are to be stored\n",
    "\n",
    "    command = '%s %s -c %s -IMAGEOUT_NAME %s -WEIGHTOUT_NAME %s -RESAMPLE_DIR %s' % (swarpCommand, astromFile, swarpConfigFile, resampledFile, weightFile, procFolder)\n",
    "    print(\"Executing command: %s\" % command)\n",
    "    print(\"Processing...\")\n",
    "    rval = subprocess.run(command.split(), check=True, capture_output=True)\n",
    "    print(\"Process completed.\")\n",
    "    print(rval.stdout.decode())\n",
    "except subprocess.CalledProcessError as err:\n",
    "    print('Could not run SWarp. Can you run it from the terminal?')\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can then be displayed as normal:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Display the resampled image\n",
    "\n",
    "HDUList = fits.open(resampledFile)\n",
    "wcs = WCS(HDUList[0].header)\n",
    "resampledData = HDUList[0].data\n",
    "HDUList.close()\n",
    "\n",
    "fig = plt.figure(figsize=(14,14))\n",
    "ax = plt.subplot(projection=wcs)\n",
    "plt.imshow(resampledData, vmin=0., vmax=1000, origin=\"lower\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that SWarp has set the NaN values to zero, which is somewhat problematic: we may want to change these in software in the future before doing any analysis.  The noise properties have also been changed by the resampling/interpolation.\n",
    "\n",
    "SWarp also supports resampling a series of (astrometrically-corrected) images onto a common grid and median-stacking them to produce a combined file: simply give it a list of files instead of a single file.  If you have extra time, you can try to produce a stacked image using an this method instead of our Python-based method: astrometrically calibrate the three individual images (with autoastrometry.py) and stack them (with SWarp).  This can be done entirely from the UNIX command line outside of the notebook if you prefer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concluding remarks\n",
    "\n",
    "At this point, we're done!  Either of the above images could be used for photometry or for other forms of image analysis that will you learn about in the next session.\n",
    "\n",
    "It is important to note that procedures may differ from telescope to telescope and depending on the type of analysis you're interested in.  Depending on your detector, filter passband, the number of exposures, the time-variability of your source, whether the moon is up or down, whether clouds were present, and the specifications of the type of science you need to do, you may need to carry out additional steps or different steps, such as:\n",
    "\n",
    "* Overscan subtraction\n",
    "* Dark current subtraction\n",
    "* Bad pixel masking\n",
    "* Source masking in sky flat construction\n",
    "* Super-sky flat construction and subtraction\n",
    "* Linearity correction\n",
    "* Fringe correction or subtraction\n",
    "* Median-filtered or model-based sky subtraction\n",
    "* Dithered-group sky subtraction\n",
    "* Cosmic ray filtering/masking\n",
    "* Photometric scaling of individual exposures\n",
    "* Astrometric distortion corrections\n",
    "\n",
    "### Cleanup\n",
    "\n",
    "To save disk space, you can execute the cell below to remove your processed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# WARNING: WILL REMOVE YOUR PROCESSED DATA FROM THIS NOTEBOOK\n",
    "os.chdir(curpath)\n",
    "for f in os.listdir(procFolder):\n",
    "    try:\n",
    "        os.remove(os.path.join(procFolder,f)) # clear the processing folder from previous iterations\n",
    "    except:\n",
    "        print('Could not remove',f)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}