{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![CTA first data challenge logo](images/cta-1dc.png)\n",
    "\n",
    "# CTA first data challenge (1DC) with Gammapy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "In 2017 and 2018, the [CTA observatory](https://www.cta-observatory.org/) did a first data challenge, called CTA DC-1, where CTA high-level data was simulated assuming a sky model and using CTA IRFs.\n",
    "\n",
    "The main page for CTA 1DC is here:\n",
    "https://forge.in2p3.fr/projects/data-challenge-1-dc-1/wiki (CTA internal)\n",
    "\n",
    "There you will find information on 1DC sky models, data access, data organisation, simulation and analysis tools, simulated observations, as well as contact information if you have any questions or comments.\n",
    "\n",
    "### This tutorial notebook\n",
    "\n",
    "This notebook shows you how to get started with CTA 1DC data and what it contains.\n",
    "\n",
    "You will learn how to use Astropy and Gammapy to:\n",
    "\n",
    "* access event data\n",
    "* access instrument response functions (IRFs)\n",
    "* use index files and the ``gammapy.data.DataStore`` to access all data\n",
    "* use the observation index file to select the observations you're interested in\n",
    "* read model XML files from Python (not integrated in Gammapy yet)\n",
    "\n",
    "This is to familiarise ourselves with the data files and to get an overview.\n",
    "\n",
    "### Further information\n",
    "\n",
    "How to analyse the CTA 1DC data with Gammapy (make an image and spectrum) is shown in a second notebook [cta_data_analysis.ipynb](cta_data_analysis.ipynb). If you prefer, you can of course just skim or skip this notebook and go straight to second one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook and Gammapy Setup\n",
    "\n",
    "Before we get started, please execcute the following code cells.\n",
    "\n",
    "The first one configures the notebooks so that plots are shown inline (if you don't do this, separate windows will pop up). The second cell imports and checks the version of the packages we will use below. If you're missing some packages, install them and then select \"Kernel -> Restart\" above to restart this notebook.\n",
    "\n",
    "In case you're new to Jupyter notebooks: to execute a cell, select it, then type \"SHIFT\" + \"ENTER\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import astropy\n",
    "import gammapy\n",
    "\n",
    "print(\"numpy:\", np.__version__)\n",
    "print(\"astropy:\", astropy.__version__)\n",
    "print(\"gammapy:\", gammapy.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DC-1 data\n",
    "\n",
    "In this and other Gammapy tutorials we will only access a few data files from CTA DC-1 from `$GAMMAPY_DATA/cta-1dc` that you will have if you followed the \"Getting started with Gammapy\" instructions and executed `gammapy download tutorials`.\n",
    "\n",
    "Information how to download more or all of the DC-1 data (in total 20 GB) is here:\n",
    "https://forge.in2p3.fr/projects/data-challenge-1-dc-1/wiki#Data-access\n",
    "\n",
    "Working with that data with Gammapy is identical to what we show here, except that the recommended way to do it is to point `DataStore.read` at `$CTADATA/index/gps` or whatever dataset or files you'd like to access there."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo $GAMMAPY_DATA/cta-1dc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls $GAMMAPY_DATA/cta-1dc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Let's have a look around at the directories and files in `$GAMMAPY_DATA`.\n",
    "\n",
    "We will look at the `data` folder with events, the `caldb` folder with the IRFs and the `index` folder with the index files. At the end, we will also mention what the `model` and `obs` folder contains, but they aren't used with Gammapy, at least not at the moment.\n",
    "\n",
    "## EVENT data\n",
    "\n",
    "First, the EVENT data (RA, DEC, ENERGY, TIME of each photon or hadronic background event) is in the `data` folder, with one observation per file. The \"baseline\" refers to the assumed CTA array that was used to simulate the observations. The number in the filename is the observation identifier `OBS_ID` of the observation. Observations are ~ 30 minutes, pointing at a fixed location on the sky."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -1 $GAMMAPY_DATA/cta-1dc/data/baseline/gps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's open up the first event list using the Gammapy `EventList` class, which contains the ``EVENTS`` table data via the ``table`` attribute as an Astropy `Table` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.data import EventList\n",
    "\n",
    "path = \"$GAMMAPY_DATA/cta-1dc/data/baseline/gps/gps_baseline_110380.fits\"\n",
    "events = EventList.read(path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(events)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(events.table)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First event (using [] for \"indexing\")\n",
    "events.table[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First few events (using [] for \"slicing\")\n",
    "events.table[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Event times can be accessed as Astropy Time objects\n",
    "type(events.time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "events.time[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert event time to more human-readable format\n",
    "print(events.time[:2].fits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Event positions can be accessed as Astropy SkyCoord objects\n",
    "print(type(events.radec))\n",
    "events.radec[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "events.galactic[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The event header information is stored\n",
    "# in the `events.table.meta` dictionary\n",
    "print(type(events.table.meta))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# E.g. to get the observation pointing position in degrees:\n",
    "events.table.meta[\"RA_PNT\"], events.table.meta[\"DEC_PNT\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EVENT analysis example\n",
    "\n",
    "As an example how to work with EVENT data, let's look at the spatial and energy distribution of the events for a single run.\n",
    "\n",
    "Note that EVENT data in Gammapy is just stored in an Astropy Table, which is basically a dictionary mapping column names to column data, where the column data is a Numpy array. This means you can efficienly process it from Python using any of the scientific Python tools you like (e.g. Numpy, Scipy, scikit-image, scikit-learn, ...)\n",
    "\n",
    "### EVENT spatial distribution\n",
    "\n",
    "To illustrate a bit how to work with EVENT table an header data,\n",
    "let's plot the event positions as well as the pointing position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Event positions\n",
    "pos = events.galactic[::300]  # sub-sample every 100th event\n",
    "plt.scatter(pos.l.wrap_at(\"180 deg\").deg, pos.b.deg, s=10)\n",
    "# Pointing position\n",
    "pos_pnt = events.pointing_radec.galactic\n",
    "plt.scatter(\n",
    "    pos_pnt.l.wrap_at(\"180 deg\").deg, pos_pnt.b.deg, marker=\"*\", s=400, c=\"red\"\n",
    ")\n",
    "plt.xlabel(\"Galactic longitude (deg)\")\n",
    "plt.ylabel(\"Galactic latitude (deg)\")\n",
    "pos_pnt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### EVENT energy distribution\n",
    "\n",
    "Let's have a look at the event energy distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "energy = events.table[\"ENERGY\"].data\n",
    "energy_bins = np.logspace(-2, 2, num=100)\n",
    "plt.hist(energy, bins=energy_bins)\n",
    "plt.semilogx()\n",
    "plt.xlabel(\"Event energy (TeV)\")\n",
    "plt.ylabel(\"Number of events\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A double-peak, at ~ 30 GeV and ~ 100 GeV? .... let's try to find out what's going on ...\n",
    "\n",
    "### EVENT MC_ID\n",
    "\n",
    "One idea could be to split the data into gamma-ray and hadronic background events.\n",
    "Now from looking at the FITS header, one can see that ``MC_ID == 1`` is the hadronic background, and the other values are for different gamma-ray emission components."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "is_gamma = events.table[\"MC_ID\"] != 1\n",
    "print(\"Number of events: \", len(events.table))\n",
    "print(\"Number of gammas: \", is_gamma.sum())\n",
    "print(\"Number of hadrons: \", len(events.table) - is_gamma.sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "energy = events.table[\"ENERGY\"].data\n",
    "energy_bins = np.logspace(-2, 2, num=100)\n",
    "opts = dict(bins=energy_bins, density=True, histtype=\"step\")\n",
    "plt.hist(energy[~is_gamma], label=\"hadron\", **opts)\n",
    "plt.hist(energy[is_gamma], label=\"gamma\", **opts)\n",
    "plt.loglog()\n",
    "plt.xlabel(\"Event energy (TeV)\")\n",
    "plt.ylabel(\"Number of events\")\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, the energy spectra are roughly power-laws. \n",
    "When plotting in log-log, one can see that the spectra are roughly power-laws (as expected), and the \"double peak\" we saw before looks more like a \"double energy threshold\" pattern.\n",
    "It's there for gammas and background, and below 100 GeV the signal to background ratio is lower.\n",
    "\n",
    "What we're seeing here is the result of a mixed-array in CTA with LSTs, MSTs and SSTs, which  have different energy thresholds:\n",
    "\n",
    "* ~ 30 GeV is the energy threshold of the LSTs\n",
    "* ~ 100 GeV is the energy threshold of the MSTs\n",
    "* the energy threshold of the SSTs is at a few TeV and doesn't show up as a clear feature in the gamma and background energy distribution (probably the rise in slope in gamma in the 1-10 TeV range is due to the SSTs).\n",
    "\n",
    "Let's look a little deeper and also check the event offset distribution in the field of view ...\n",
    "\n",
    "### EVENT FOV offset\n",
    "\n",
    "The event position and offset in the field of view (FOV) can be computed from the event RA, DEC position and the observation pointing RA, DEC position.\n",
    "\n",
    "But actually, the field of view position is stored as extra columns in the EVENT list: ``DETX`` and ``DETY`` (angles in degree, I think RA / DEC aligned field of view system).\n",
    "\n",
    "I presume (hope) this unnecessary information will be dropped from the CTA event lists in the future to save some space (make the CTA DL3 data ~25% smaller), but for now, let's use those columns to compute the field of view offset and look at the offset-energy distribution of the events."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "energy_bins = 10 ** np.linspace(-2, 2, 100)\n",
    "offset_bins = np.arange(0, 5.5, 0.1)\n",
    "\n",
    "t = events.table\n",
    "offset = np.sqrt(t[\"DETX\"] ** 2 + t[\"DETY\"] ** 2)\n",
    "hist = np.histogram2d(\n",
    "    x=t[\"ENERGY\"], y=offset, bins=(energy_bins, offset_bins)\n",
    ")[0].T\n",
    "\n",
    "from matplotlib.colors import LogNorm\n",
    "\n",
    "plt.pcolormesh(energy_bins, offset_bins, hist, norm=LogNorm())\n",
    "plt.semilogx()\n",
    "plt.colorbar()\n",
    "plt.xlabel(\"Energy (TeV)\")\n",
    "plt.ylabel(\"Offset (deg)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the CTA field of view increases with energy in steps. The energy distribution we saw before was the combination of the energy distribution at all offsets. Even at a single offset, the double energy-threshold at ~ 30 GeV and ~ 100 GeV is present.\n",
    "\n",
    "There is also a quick-look peek method you can use to get a peek at the contents of an event list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "events.peek()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stacking EVENTS tables\n",
    "\n",
    "As a final example of how to work with event lists, here's now to apply arbitrary event selections and how to stack events tables from several observations into a single event list. \n",
    "\n",
    "We will just use `astropy.table.Table` directly, not go via the `gammapy.data.EventList` class. Note that you can always make an `EventList` object from a `Table` object via `event_list = EventList(table)`. One point to keep in mind is that `Table.read` doesn't resolve environment variables in filenames, so we'll use the Python standard library `os` package to construct the filenames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from astropy.table import Table\n",
    "from astropy.table import vstack as table_vstack\n",
    "\n",
    "filename = os.path.join(\n",
    "    os.environ[\"GAMMAPY_DATA\"],\n",
    "    \"cta-1dc/data/baseline/gps/gps_baseline_110380.fits\",\n",
    ")\n",
    "t1 = Table.read(filename, hdu=\"EVENTS\")\n",
    "\n",
    "filename = os.path.join(\n",
    "    os.environ[\"GAMMAPY_DATA\"],\n",
    "    \"cta-1dc/data/baseline/gps/gps_baseline_111140.fits\",\n",
    ")\n",
    "t2 = Table.read(filename, hdu=\"EVENTS\")\n",
    "tables = [t1, t2]\n",
    "table = table_vstack(tables, metadata_conflicts=\"silent\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Number of events: \", len(table))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's select gamma rays with energy above 10 TeV\n",
    "mask_mc_id = table[\"MC_ID\"] != 1\n",
    "mask_energy = table[\"ENERGY\"] > 10\n",
    "mask = mask_mc_id & mask_energy\n",
    "table2 = table[mask]\n",
    "print(\"Number of events after selection:\", len(table2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When processing a lot or all of the 1DC events, you would write a for loop, and apply the event selection before putting the table in the list of tables, or you might run out of memory. An example is [here](https://github.com/gammasky/cta-dc/blob/master/data/cta_1dc_make_allsky_images.py).\n",
    "\n",
    "That's all for ``EVENTS``. You now know what every column in the event table contains, and how to work with event list tables using ``gammapy.data.EventList`` and ``astropy.table.Table``. \n",
    "\n",
    "Just in case that there is some observation parameter in the FITS EVENTS header that you're interested in, you can find the full description of the keys you can access via the ``events.table.meta`` dictionary [here](http://gamma-astro-data-formats.readthedocs.io/en/latest/events/events.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IRFs\n",
    "\n",
    "The CTA instrument reponse functions (IRFs) are given as FITS files in the `caldb` folder.\n",
    "\n",
    "Note that this is not realistic. Real CTA data at the DL3 level (what we have here, what users get) will mostly likely have per-observation or per-time interval IRFs, and the IRFs will not be stored in a separate CALDB folder, but distributed with the EVENTS (probably in the same file, or at least in the same folder, so that it's together).\n",
    "\n",
    "For now, the EVENT to IRF association (i.e. which IRF is the right one for given EVENTS) is done by index files. We will discuss those in the next section, but before we do, let's look at the CTA IRFs for one given configuration: `South_z20_50h`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!(cd $GAMMAPY_DATA/cta-1dc && tree caldb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's look at the content of one of the IRF FITS files.\n",
    "# IRFs are stored in `BinTable` HDUs in a weird format\n",
    "# that you don't need to care about because it's implemented in Gammapy\n",
    "irf_filename = os.path.join(\n",
    "    os.environ[\"GAMMAPY_DATA\"],\n",
    "    \"cta-1dc/caldb/data/cta/1dc/bcf/South_z20_50h/irf_file.fits\",\n",
    ")\n",
    "from astropy.io import fits\n",
    "\n",
    "hdu_list = fits.open(irf_filename)\n",
    "hdu_list.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Effective area\n",
    "\n",
    "The effective area is given as a 2-dim array with energy and field of view offset axes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.irf import EffectiveAreaTable2D\n",
    "\n",
    "aeff = EffectiveAreaTable2D.read(irf_filename, hdu=\"EFFECTIVE AREA\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(aeff)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(aeff.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(aeff.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aeff.peek()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# What is the on-axis effective area at 10 TeV?\n",
    "aeff.data.evaluate(energy=\"10 TeV\", offset=\"0 deg\").to(\"km2\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is how you slice out an `EffectiveAreaTable` object\n",
    "# at a given field of view offset for analysis\n",
    "aeff.to_effective_area_table(offset=\"1 deg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Energy dispersion\n",
    "\n",
    "Let's have a look at the CTA energy dispersion with three axes: true energy, fov offset and migra = e_reco / e_true and has dP / dmigra as value.\n",
    "\n",
    "Similar to the event energy distribution above, we can see the mixed-telescope array reflected in the EDISP.\n",
    "At low energies the events are only detected and reconstructed by the LSTs.\n",
    "At ~100 GeV, the MSTs take over and EDISP is chaotic in the ~ 50 GeV to 100 GeV energy range.\n",
    "So it can be useful to have quick access to IRFs like with Gammapy (e.g. for spectral line searches in this case), even if for 95% of science analyses users later won't have to look at the IRFs and just trust that everything is working."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.irf import EnergyDispersion2D\n",
    "\n",
    "edisp = EnergyDispersion2D.read(irf_filename, hdu=\"ENERGY DISPERSION\")\n",
    "print(type(edisp))\n",
    "print(type(edisp.data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(edisp.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "edisp.peek()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is how for analysis you could slice out an `EnergyDispersion`\n",
    "# object at a given offset:\n",
    "edisp.to_energy_dispersion(offset=\"0 deg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Point spread function\n",
    "\n",
    "The point spread function (PSF) in this case is given as an analytical Gaussian model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.irf import EnergyDependentMultiGaussPSF\n",
    "\n",
    "psf = EnergyDependentMultiGaussPSF.read(\n",
    "    irf_filename, hdu=\"POINT SPREAD FUNCTION\"\n",
    ")\n",
    "print(psf.info())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "psf.peek()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is how for analysis you could slice out the PSF\n",
    "# at a given field of view offset\n",
    "psf.to_energy_dependent_table_psf(\"1 deg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Background\n",
    "\n",
    "The hadronic background for CTA DC-1 is given as a template model with an absolute rate that depends on `energy`, `detx` and `dety`. The coordinates `detx` and `dety` are angles in the \"field of view\" coordinate frame.\n",
    "\n",
    "Note that really the background model for DC-1 and most CTA IRFs produced so far are radially symmetric, i.e. only depend on the FOV offset. The background model here was rotated to fill the FOV in a rotationally symmetric way, for no good reason."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.irf import Background3D\n",
    "\n",
    "bkg = Background3D.read(irf_filename, hdu=\"BACKGROUND\")\n",
    "print(bkg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: implement a peek method for Background3D\n",
    "# bkg.peek()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bkg.data.evaluate(energy=\"3 TeV\", fov_lon=\"1 deg\", fov_lat=\"0 deg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Index files and DataStore\n",
    "\n",
    "As we saw, you can access all of the CTA data using Astropy and Gammapy.\n",
    "\n",
    "But wouldn't it be nice if there were a better, simpler way?\n",
    "\n",
    "Imagine what life could be like if you had a butler that knows where all the files and HDUs are, and hands you the 1DC data on a silver platter, you just have to ask for it.\n",
    "\n",
    "![gammapy.data.DataStore - your butler for CTA data](images/gammapy_datastore_butler.png)\n",
    "\n",
    "Well, the butler exists. He's called `gammapy.data.DataStore` and he keeps track of the data using index files.\n",
    "\n",
    "### Index files\n",
    "\n",
    "The files with in the `index` folder with names `obs-index.fits.gz` and `hdu-index.fits.gz` are so called \"observation index files\" and \"HDU index files\".\n",
    "\n",
    "* The purpose of observation index files is to get a table of available observations, with the most relevant parameters commonly used for observation selection (e.g. pointing position or observation time). Their format is described in detail [here](http://gamma-astro-data-formats.readthedocs.io/en/latest/data_storage/obs_index/index.html).\n",
    "* The purpose of HDU index files is to locate all FITS header data units (HDUs) for a given observation. At the moment, for each observation, there are six relevant HDUs: EVENTS, GTI, AEFF, EDISP, PSF and BKG. The format is described in detail [here](http://gamma-astro-data-formats.readthedocs.io/en/latest/data_storage/hdu_index/index.html).\n",
    "\n",
    "For 1DC there is one set of index files per simulated dataset, as well as a set of index files listing all available data in the all directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!(cd $GAMMAPY_DATA/cta-1dc && tree index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gammapy DataStore\n",
    "\n",
    "If you want to access data and IRFs from the CTA 1DC GPS dataset, just create a `DataStore` by pointing at a folder where the index files are located."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gammapy.data import DataStore\n",
    "\n",
    "data_store = DataStore.from_dir(\"$GAMMAPY_DATA/cta-1dc/index/gps\")\n",
    "\n",
    "# If you want to access all CTA DC-1 data,\n",
    "# set the CTADATA env var and use this:\n",
    "# data_store = DataStore.from_dir(\"$CTADATA/index/gps\")\n",
    "# Or point at the directly with the index files directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print out some basic information about the available data:\n",
    "data_store.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The observation index is loaded as a table\n",
    "print(\"Number of observations: \", len(data_store.obs_table))\n",
    "print(data_store.obs_table.colnames)\n",
    "# Compute and print total obs time in hours\n",
    "print(data_store.obs_table[\"ONTIME\"].sum() / 3600)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The HDU index is loaded as a table\n",
    "print(len(data_store.hdu_table))\n",
    "print(data_store.hdu_table.colnames)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Of course, you can look at the tables if you like\n",
    "# data_store.obs_table[:10].show_in_browser(jsviewer=True)\n",
    "# data_store.hdu_table[:10].show_in_browser(jsviewer=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select observations\n",
    "\n",
    "With ``data_store.obs_table`` you have a table with the most common per-observation parameters that are used for observation selection. Using Python / Table methods it's easy to apply any selection you like, always with the goal of making a list or array of `OBS_ID`, which is then the input to analysis.\n",
    "\n",
    "For the current 1DC dataset it's pretty simple, because the only quantities useful for selection are:\n",
    "* pointing position\n",
    "* which irf (i.e. array  / zenith angle)\n",
    "\n",
    "With real data, there will be more parameters of interest, such as data quality, observation duration, zenith angle, time of observation, ...\n",
    "\n",
    "Let's look at one example: select observations that are at offset 1 to 2 deg from the Galactic center."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from astropy.coordinates import SkyCoord\n",
    "\n",
    "table = data_store.obs_table\n",
    "pos_obs = SkyCoord(\n",
    "    table[\"GLON_PNT\"], table[\"GLAT_PNT\"], frame=\"galactic\", unit=\"deg\"\n",
    ")\n",
    "pos_target = SkyCoord(0, 0, frame=\"galactic\", unit=\"deg\")\n",
    "offset = pos_target.separation(pos_obs).deg\n",
    "mask = (1 < offset) & (offset < 2)\n",
    "table = table[mask]\n",
    "print(\"Number of selected observations: \", len(table))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look at the first few\n",
    "table[[\"OBS_ID\", \"GLON_PNT\", \"GLAT_PNT\", \"IRF\"]][:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check which IRFs were used ... it's all south and 20 deg zenith angle\n",
    "set(table[\"IRF\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the pointing positions\n",
    "# The grid pointing positions at GLAT = +- 1.2 deg are visible\n",
    "from astropy.coordinates import Angle\n",
    "\n",
    "plt.scatter(\n",
    "    Angle(table[\"GLON_PNT\"], unit=\"deg\").wrap_at(\"180 deg\"), table[\"GLAT_PNT\"]\n",
    ")\n",
    "plt.xlabel(\"Galactic longitude (deg)\")\n",
    "plt.ylabel(\"Galactic latitude (deg)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data\n",
    "\n",
    "Once you have selected the observations of interest, use the `DataStore` to load the data and IRF for those observations. Let's say we're interested in `OBS_ID=110380`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs = data_store.obs(obs_id=110380)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(obs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.events"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.events.table[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.gti"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.aeff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.edisp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.psf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.bkg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an example how to loop over many observations and analyse them: [cta_1dc_make_allsky_images.py](https://github.com/gammasky/cta-dc/blob/master/data/cta_1dc_make_allsky_images.py)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model XML files\n",
    "\n",
    "The 1DC sky model is distributed as a set of XML files, which in turn link to a ton of other FITS and text files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: copy an example XML file for tutorials\n",
    "# This one is no good: $CTADATA/models/models_gps.xml\n",
    "# Too large, private in CTA, loads large diffuse model\n",
    "# We need to prepare something custom.\n",
    "# !ls $CTADATA/models/*.xml | xargs -n 1 basename"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is what the XML file looks like\n",
    "# !tail -n 20 $CTADATA/models/models_gps.xml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the moment, you cannot read and write these sky model XML files with Gammapy.\n",
    "\n",
    "There are multiple reasons why this XML serialisation format isn't implemented in Gammapy yet (all variants of \"it sucks\"):\n",
    "\n",
    "* XML is tedious to read and write for humans.\n",
    "* The format is too strict in places: there are many use cases where \"min\", \"max\", \"free\" and \"error\" aren't needed, so it should be possible to omit them (see e.g. dummy values above)\n",
    "* The parameter \"scale\" is an implementation detail that very few optimisers need. There's no reason to bother all gamma-ray astronomers with separating value and scale in model input and result files.\n",
    "* The \"unit\" is missing. Pretty important piece of information, no?\n",
    "  All people working on IACTs use \"TeV\", why should CTA be forced to use \"MeV\"?\n",
    "* Ad-hoc extensions that keep being added and many models can't be put in one file (see extra ASCII and FITS files with tables or other info, e.g. pulsar models added for CTA 1DC). Admittedly complex model serialisation is an intrinsically hard problem, there is not simple and flexible solution.\n",
    "\n",
    "Also, to be honest, I also want to say / admit:\n",
    "\n",
    "* A model serialisation format is useful, even if it will never cover all use cases (e.g. energy-dependent morphology or an AGN with a variable spectrum, or complex FOV background models).\n",
    "* The Gammapy model package isn't well-implemented or well-developed yet.\n",
    "* So far users were happy to write a few lines of Python to define a model instead of XML.\n",
    "* Clearly with CTA 1DC now using the XML format there's an important reason to support it, there is the legacy of Fermi-LAT using it and ctools using / extending it for CTA.\n",
    "\n",
    "So what now?\n",
    "\n",
    "* In Gammapy, we have started to implement a modeling package in `gammapy.utils.modeling`, with the goal of supporting this XML format as one of the serialisation formats. It's not very far along, will be a main focus for us, help is welcome.\n",
    "* In addition we would like to support a second more human-friendly model format that looks something like [this](https://github.com/gammapy/gamma-cat/blob/b651de8d1d793e924764ffb13c8ec189bce9ea7d/input/data/2006/2006A%2526A...457..899A/tev-000025.yaml#L11)\n",
    "* For now, you could use Gammalib to read the XML files, or you could read them directly with Python. The Python standard library contains [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) and there's [xmltodict](https://github.com/martinblech/xmltodict) which simply hands you back the XML file contents as a Python dictionary (containing a very nested hierarchical structure of Python dict and list objects and strings and numbers.\n",
    "\n",
    "As an example, here's how you can read an XML sky model and access the spectral parameters of one source (the last, \"Arp200\" visible above in the XML printout) and create a [gammapy.spectrum.models.PowerLaw](https://docs.gammapy.org/dev/api/gammapy.spectrum.models.PowerLaw.html) object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read XML file and access spectrum parameters\n",
    "# from gammapy.extern import xmltodict\n",
    "\n",
    "# filename = os.path.join(os.environ[\"CTADATA\"], \"models/models_gps.xml\")\n",
    "# data = xmltodict.parse(open(filename).read())\n",
    "# data = data[\"source_library\"][\"source\"][-1]\n",
    "# data = data[\"spectrum\"][\"parameter\"]\n",
    "# data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a spectral model the the right units\n",
    "# from astropy import units as u\n",
    "# from gammapy.spectrum.models import PowerLaw\n",
    "\n",
    "# par_to_val = lambda par: float(par[\"@value\"]) * float(par[\"@scale\"])\n",
    "# spec = PowerLaw(\n",
    "#     amplitude=par_to_val(data[0]) * u.Unit(\"cm-2 s-1 MeV-1\"),\n",
    "#     index=par_to_val(data[1]),\n",
    "#     reference=par_to_val(data[2]) * u.Unit(\"MeV\"),\n",
    "# )\n",
    "# print(spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "* Easy: Go over this notebook again, and change the data / code a little bit:\n",
    "    * Pick another run (any run you like)\n",
    "    * Plot the energy-offset histogram of the events separately for gammas and background\n",
    "\n",
    "* Medium difficulty: Find all runs within 1 deg of the Crab nebula.\n",
    "    * Load the \"all\" index file via the `DataStore`, then access the ``.obs_table``, then get an array-valued ``SkyCoord`` with all the observation pointing positions.\n",
    "    * You can get the Crab nebula position with `astropy.coordinates.SkyCoord` via ``SkyCoord.from_name('crab')`` \n",
    "    * Note that to compute the angular separation between two ``SkyCoord`` objects can be computed via ``separation = coord1.separation(coord2)``.\n",
    "\n",
    "* Hard: Find the PeVatrons in the 1DC data, i.e. the ~ 10 sources that are brightest at high energies (e.g. above 10 TeV).\n",
    "    * This is difficult, because it's note clear how to best do this, and also because it's time-consuming to crunch through all relevant data for any given method.\n",
    "    * One idea could be to go brute-force through **all** events, select the ones above 10 TeV and stack them all into one table. Then make an all-sky image and run a peak finder, or use an event cluster-finding method e.g. from scikit-learn.\n",
    "    * Another idea could be to first make a list of targets of interest, either from the CTA 1DC sky model or from gamma-cat, compute the model integral flux above 10 TeV and pick candidates that way, then run analyses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# start typing here ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What next?\n",
    "\n",
    "* This notebook gave you an overview of the 1DC files and showed you have to access and work with them using Gammapy.\n",
    "* To see how to do analysis, i.e. make a sky image and spectrum, see [cta_data_analysis.ipynb](cta_data_analysis.ipynb) next."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}