{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Image combination\n", "\n", "Image combination serves several purposes. Combining images:\n", "\n", "+ reduces noise in images\n", "+ can remove transient artifacts like cosmic rays and satellite tracks\n", "+ can remove stars in flat images taken at twilight\n", "\n", "It's essential that several of each type of calibration image (bias, dark, flat)\n", "be taken. Combining them reduces the noise in the images by roughly a factor of\n", "$1/\\sqrt{N}$, where $N$ is the number of images being combined. As shown in the\n", "previous notebook, using a single calibration image actually *increases* the\n", "noise in your image.\n", "\n", "There are a few ways to combine images; if done properly, features that show up\n", "in only one of the images (like cosmic rays) are not present in the combination.\n", "If done incorrectly, those features show up in your combined images and then\n", "contaminate your calibrated science images too.\n", "\n", "## The bottom line: combine by averaging images, but clip extreme values\n", "\n", "The remainder of this notebook demonstrates this conclusion and explains how to\n", "do a combination by averaging images with [ccdproc](https://ccdproc.readthedocs.io/en/latest/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "from matplotlib import rc\n", "\n", "from astropy.visualization import hist\n", "from astropy.stats import mad_std" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use custom style for larger fonts and figures\n", "plt.style.use('guide.mplstyle')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set some default parameters for the plots below\n", "rc('font', size=20)\n", "rc('axes', grid=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set up the random number generator, allowing a seed to be set from the environment\n", "seed = os.getenv('GUIDE_RANDOM_SEED', None)\n", "\n", "if seed is not None:\n", " seed = int(seed)\n", " \n", "# This is the generator to use for any image component which changes in each image, e.g. read noise\n", "# or Poisson error\n", "noise_rng = np.random.default_rng(seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combination method: average or median?\n", "\n", "In this section we'll look at a simplified version of the challenges of\n", "combining images to reduce noise. It's fair to think of astronomical images\n", "(especially bias and dark images) as being a Gaussian distribution of pixel\n", "values around the bias level, and a width related to the read noise of the\n", "detector. To simplify what follows, we will work arrays of random numbers drawn\n", "from a Gaussian distribution instead of with astronomical images.\n", "\n", "In properly done flat images the noise is technically a Poisson distribution,\n", "but with a large enough number of counts, the distribution is indistinguishable\n", "from a Gaussian distribution whose width is related to the square root of the\n", "number of counts. 
While some regions of a science image are dominated by Poisson\n", "noise from sources in the image, most of the image will be dominated by Gaussian\n", "read noise from the detector or Poisson noise from the sky background.\n", "\n", "Instead of working with a combination of images, we'll create 100 Gaussian\n", "distributions with a mean of zero and a standard deviation of one, and combine\n", "them in two different ways: by finding the average and by finding the median.\n", "Each distribution has size $320^2$ so that we can view it as either a\n", "distribution of 102,400 values or as an image that is $320 \\times 320$.\n", "\n", "We can think of each of these 100 distributions as representing an image, like a\n", "bias or dark. To make the analogy to real images a little more direct, a \"bias\"\n", "of 1000 is added to each distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_distributions = 100\n", "bias_level = 1000\n", "n_side = 320\n", "bits = noise_rng.normal(size=(n_distributions, n_side**2)) + bias_level\n", "average = np.average(bits, axis=0)\n", "median = np.median(bits, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've created the distributions and combined them in two different\n", "ways, let's take a look at them. The [`hist` function from astropy.visualization](https://astropy.readthedocs.io/en/stable/visualization/histogram.html) is used\n", "below because it can figure out what bin size to use for your data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 2, sharey=True, tight_layout=True, figsize=(20, 10))\n", "\n", "hist(bits[0, :], bins='freedman', ax=ax[0]);\n", "ax[0].set_title('One sample distribution')\n", "ax[0].set_xlabel('Pixel value')\n", "ax[0].set_ylabel('Number of pixels')\n", "\n", "hist(average, bins='freedman', label='average', alpha=0.5, ax=ax[1]);\n", "hist(median, bins='freedman', label='median', alpha=0.5, ax=ax[1]);\n", "ax[1].set_title('{} distributions combined'.format(n_distributions))\n", "ax[1].set_xlabel('Pixel value')\n", "ax[1].legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Combining by averaging gives a narrower (i.e. less noisy) distribution than\n", "combining by median, though both substantially reduce the width of the\n", "distribution. The conclusion so far is that combining by averaging is mildly\n", "preferable to combining by median. Computationally, the mean is also faster to\n", "compute than the median." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image view of these distributions\n", "\n", "As suggested above, we could view each of these distributions as an image\n", "instead of a histogram. One takeaway from the images below is that in this\n", "case, the difference between mean and median is not apparent.\n", "\n", "In all cases, the extreme values of the image display are set to bracket the\n", "width of the initial distribution."
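, "\n", "The difference between the two combinations is hard to see in an image view\n", "partly because it is fairly small. As an illustrative aside (not part of the\n", "combination recipe), the cell below compares the measured widths with the\n", "expected values: $1/\\sqrt{N}$ for the average of $N$ Gaussian values of width\n", "one, and roughly $\\sqrt{\\pi/(2N)}$ for the median, which is the standard\n", "large-$N$ approximation and is about 25% larger." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative aside: compare the measured widths of the combined distributions\n", "# with the expected noise for the mean and (approximately) the median of N\n", "# Gaussian values that each have a standard deviation of one.\n", "print('Width of one distribution:    ', bits[0, :].std())\n", "print('Width of average combination: ', average.std())\n", "print('Expected width of average:    ', 1 / np.sqrt(n_distributions))\n", "print('Width of median combination:  ', median.std())\n", "print('Expected width of median:     ', np.sqrt(np.pi / (2 * n_distributions)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now for the image view itself."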
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(20, 10))\n", "data_source = [bits[0, :], average, median]\n", "titles = ['One distrbution', 'Average of {n}'.format(n=n_distributions), 'Median of {n}'.format(n=n_distributions)]\n", "\n", "for axis, data, title in zip(axes, data_source, titles):\n", " axis.imshow(data.reshape(n_side, n_side), vmin=bias_level - 3, vmax=bias_level + 3)\n", " axis.set_xticks([])\n", " axis.set_yticks([])\n", " axis.grid(False)\n", " axis.set_title(title)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The effect of outliers\n", "\n", "Suppose that, in just one of the 100 distributions we're combining, there are a\n", "small number of extreme values. In astronomical images these extremes happen\n", "very frequently because of cosmic ray hits on the detector that cause, in one\n", "small patch of a calibration image, much higher counts. Another case occurs when\n", "combining twilight flats, which often contain faint images of stars.\n", "\n", "In the example below, we set just 50 points out of the 102,400 in the first\n", "distribution to a somewhat higher value than the rest." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bits[0, 10000:10050] = 2 * bias_level" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember, we can think of the values in this distribution as an image, a view\n", "that will be particularly convenient in this case." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.imshow(bits[0, :].reshape(n_side, n_side), vmin=bias_level - 3, vmax=bias_level + 3)\n", "plt.xticks([])\n", "plt.yticks([])\n", "plt.title('One distribution with outliers')\n", "plt.grid(False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know what the outliers in this (and *only* this) distribution look\n", "like, we'll combine all of the distributions as we did above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "average = np.average(bits, axis=0)\n", "median = np.median(bits, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though only one out of the 100 \"images\" we're combining has these high\n", "pixel values, the distribution of pixels for the average is clearly affected\n", "(well, maybe not clearly, since seeing it requires a logarithmic $y$-axis). The\n", "distribution for the median looks much the same as above. Since median simply\n", "looks for the middle value, an extreme value doesn't affect the result too much." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10, 10))\n", "hist(average, bins='freedman', alpha=0.5, label='average');\n", "hist(median, bins='freedman', alpha=0.5, label='median');\n", "plt.legend()\n", "plt.xlabel('Counts')\n", "plt.ylabel('Number of pixels')\n", "plt.semilogy();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining using the average has a noticeable effect on the result; median\n", "removes the artifact\n", "\n", "The effect of the outlier is *much* clearer if the distributions are displayed\n", "as images. If the distributions we're combining were calibration images then the\n", "outliers that appear in one image (e.g. 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(20, 10))\n", "data_source = [bits[0, :], average, median]\n", "titles = ['One distribution with outliers', 'Average of {n}'.format(n=n_distributions), 'Median of {n}'.format(n=n_distributions)]\n", "\n", "for axis, data, title in zip(axes, data_source, titles):\n", "    axis.imshow(data.reshape(n_side, n_side), vmin=bias_level - 3, vmax=bias_level + 3)\n", "    axis.set_xticks([])\n", "    axis.set_yticks([])\n", "    axis.grid(False)\n", "    axis.set_title(title)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On one hand, the noise properties are better when you combine by taking the\n", "average. On the other hand, the median eliminates features that appear in only\n", "one image.\n", "\n", "Astronomical images will almost always contain such transient features. Even at\n", "an observatory near sea level, and even in a very short exposure, cosmic ray hits\n", "are common." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The solution: average combine, but clip the extreme values\n", "\n", "The answer here is to first clip extreme values from the distributions and then\n", "combine using the average. That rejects outlying values, as the median does,\n", "while keeping the modestly better noise properties of the average. A method\n", "called \"sigma clipping\" is used to remove the extreme values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Please do not use the code below for reducing your data...**\n", "\n", "...in the next set of notebooks we'll walk through the package\n", "[ccdproc](https://ccdproc.readthedocs.io), which automates much of what you see below.\n", "The section below demonstrates and explains some of what's happening behind the\n", "scenes in [ccdproc](https://ccdproc.readthedocs.io)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sigma clipping\n", "\n", "Sigma clipping means calculating how \"far\" each pixel to be combined is from the\n", "\"typical\" value and excluding values from the combination if they are \"too far\"\n", "from the \"typical\" value.\n", "\n", "To be clear, when evaluating which values to reject we're doing it for each of\n", "the 102,400 points in the distribution (or, if you prefer, each of the\n", "320$\\times$320 pixels in the image) we're going to combine. In other words, for\n", "each point (or pixel), we'll compute a \"typical\" value for the 100 distributions\n", "(images) we're combining and exclude any from the average that are \"too far\"\n", "from the \"typical value.\"\n", "\n", "What should be used as the \"typical\" value, how do we measure how \"far\" away a\n", "value is, and how far is \"too far\"?\n", "\n", "The last question is easiest to answer: it depends a bit on the noise level in\n", "your camera, but something like 5 times farther from the \"typical\" value than\n", "most of the pixels are.\n", "\n", "Using the average as the typical value and the standard deviation as a measure\n", "of how far a particular value is from the typical value is often not the best\n", "choice. The problem with this is that outlying values in a single distribution\n", "(or image) strongly bias the average and exaggerate the standard deviation. 
In\n", "this example, where we're combining 100 distributions (images), using the\n", "average and standard deviation might work since there are so many distributions.\n", "A more typical number of bias or dark images that one might combine is 10 or 20.\n", "In that case, an extreme value in one image strongly affects the mean and\n", "standard deviation.\n", "\n", "As an example, consider combining 10, 20, or 100 of our distributions, as shown\n", "in the cell below. Only in the case of 100 distributions would our extreme value\n", "of 2000 be excluded if we excluded values more than 5 times the standard\n", "deviation from the average." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Number combined\\t Average\\t Standard dev σ \\t 10σ ')\n", "\n", "for n_to_combine in [10, 20, n_distributions]:\n", " avg = np.mean(bits[:n_to_combine, 10000])\n", " std = np.std(bits[:n_to_combine, 10000])\n", " print('{n:10d}\\t{avg:10.2f}\\t{std:10.2f}\\t{ten_sig:10.2f}'.format(n=n_to_combine, \n", " avg=avg, \n", " std=std, ten_sig=10 * std))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A better choice is to use the median as the typical value and the *median\n", "absolute deviation* in place of the standard deviation as the measure of how far\n", "a value is from the typical value. The [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation), or MAD,\n", "of a set of points $x$ is defined by:\n", "$$\n", "\\text{MAD} = \\text{median}\\big( x_i - \\text{median}(x) \\big).\n", "$$\n", "This is a measure of the typical absolute distance from the median of the set of\n", "values. The MAD is not directly equivalent to the standard deviation. The\n", "relationship between the two depends on the distribution of values, but for a\n", "Gaussian distribution multiplying the MAD by 1.4826 does the trick. The\n", "[astropy function `mad_std`](http://docs.astropy.org/en/stable/api/astropy.stats.mad_std.html) will calculate the MAD and multiply by the\n", "appropriate factor for you.\n", "\n", "\n", "Repeating the calculation above but with median as the central value and the MAD\n", "in place of the standard deviation demonstrates that even for 10 distributions\n", "the extreme value will be excluded." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('{:^20}{:^20}{:^20}{:^20}'.format('Number combined', 'Median', 'MAD σ', '10σ'))\n", "\n", "for n_to_combine in [10, 20, n_distributions]:\n", " avg = np.median(bits[:n_to_combine, 10000])\n", " std = mad_std(bits[:n_to_combine, 10000])\n", " print('{n:^20d}{avg:^20.2f}{std:^20.2f}{ten_sig:^20.2f}'.format(n=n_to_combine, \n", " avg=avg, \n", " std=std, ten_sig=10 * std))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The downside to using the median and median absolute deviation? They can be slow\n", "to compute for large images or large stacks of images." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cells below perform the actual clipping; you should generally use the\n", "astropy function [`sigma_clip`](https://astropy.readthedocs.io/en/stable/stats/robust.html) to do this, but here we'll do\n", "it manually to illustrate the process.\n", "\n", "We begin by calculating the MAD standard deviation estimator for our data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mad_sigma = mad_std(bits, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The expression below is true for all of the points farther than $10\n", "\\sigma_{MAD}$ from the median of the distributions and false everywhere else.\n", "This array will be used to exclude the extreme points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "exclude = (bits - median) / mad_sigma > 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we calculate the average, excluding the points identified as \"too far\"\n", "from from the median. There are two approaches we can take here. One is to use\n", "numpy masked arrays; the other is to temporarily set the excluded values to the\n", "special value `np.nan` and use a numpy function that excludes `nan` from the\n", "calculation. The latter approach is often faster than the former.\n", "\n", "The best approach is really to use a higher-level function from astropy for\n", "ccdproc. Those will take care of the details of implementing the clipping for\n", "you." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "original_values = bits[exclude]\n", "bits[exclude] = np.nan\n", "\n", "clip_combine = np.nanmean(bits, axis=0)\n", "bits[exclude] = original_values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "Combine images by (1) excluding extreme values using sigma clipping, with the\n", "median as the typical value and the MAD estimator of the standard deviation, and\n", "then (2) averaging the remaining pixels across all of the images.\n", "\n", "Note that in the distribution below the clipped average is a narrower\n", "distribution (less noise) than the median but that it still excludes the extreme\n", "value that appeared in one image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10, 10))\n", "hist(clip_combine, bins='freedman', alpha=0.5, label='clipped average')\n", "hist(median, bins='freedman', alpha=0.5, label='median');\n", "plt.legend()\n", "plt.xlabel('Counts')\n", "plt.ylabel('Number of pixels')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }