{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8a373e1e",
   "metadata": {},
   "source": [
    "# Explore the data from your own station"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7088879",
   "metadata": {},
   "outputs": [],
   "source": [
    "STATION = 15  # gvdveen"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e38b57fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sapphire\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import datetime as dt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b837f025",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = sapphire.quick_download(STATION)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b252974",
   "metadata": {},
   "source": [
    "After a short while, a line like this appears above:\n",
    "`100%|############################################################|Time: 0:00:06`\n",
    "\n",
    "Sometimes the download finishes so quickly that this line is not printed.\n",
    "\n",
    "The variable `data` now holds a set of measurements. This set can be printed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d2dec97",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download data from a different day\n",
    "# start = dt.datetime(2019, 3, 20)\n",
    "# end = dt.datetime(2019, 3, 21)\n",
    "# sapphire.download_data(data, '/s%d' % STATION, STATION, start, end)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d21fc9e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f7d5ff6",
   "metadata": {},
   "source": [
    "The `data` file has a hierarchical structure. `data` contains a\n",
    "RootGroup, which is accessible as `data.root`. It in turn contains a group\n",
    "named after the station, for example `s15` for station 15, accessible as `data.root.s15`. That group holds a table\n",
    "`events`.\n",
    "\n",
    "## Working with an events table\n",
    "For convenience we create a variable\n",
    "`event_tabel` that points to the events table of the station.\n",
    "\n",
    "The table has a\n",
    "fixed location in the HDF5 data file: `/s????/events`, where `????` stands for\n",
    "the station number. This location is called a `node`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c28dc0b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "node_naam = '/s%d/events' % STATION\n",
    "node_naam"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9dde9f5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "event_tabel = data.get_node(node_naam)\n",
    "event_tabel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b441d48c",
   "metadata": {},
   "source": [
    "This is a table with tens of thousands of rows. Each row is an event.\n",
    "\n",
    "The information of the first event can be retrieved with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f06c64a",
   "metadata": {},
   "outputs": [],
   "source": [
    "event_tabel[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6af9f284",
   "metadata": {},
   "source": [
    "The **second** event (note that Python counts from 0, not from 1):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb86060f",
   "metadata": {},
   "outputs": [],
   "source": [
    "event_tabel[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bb3032e",
   "metadata": {},
   "source": [
    "The information in an event consists of a list of numbers. These numbers have\n",
    "the following meaning:\n",
    "\n",
    "1. event_id: The unique number of the event within this dataset.\n",
    "1. timestamp: The time in whole seconds (GPS) at which the event trigger occurred.\n",
    "1. nanoseconds: The nanosecond part of the time at which the event trigger occurred.\n",
    "1. ext_timestamp: The two previous values combined into one large number: the trigger time in nanoseconds.\n",
    "1. pulseheights: An array of pulse heights; `-1` means that there was no detector.\n",
    "1. integrals: An array of pulse integrals (areas); here too, `-1` means that there was no detector.\n",
    "1. n1: The number of MIPs (Minimum Ionizing Particles) reconstructed in detector 1.\n",
    "1. n2: Same as n1, for detector 2.\n",
    "1. n3: Same as n1, for detector 3.\n",
    "1. n4: Same as n1, for detector 4.\n",
    "1. t1: The reconstructed arrival time in detector 1, measured from the start of the stored signal.\n",
    "1. t2: Same as t1, for detector 2.\n",
    "1. t3: Same as t1, for detector 3.\n",
    "1. t4: Same as t1, for detector 4.\n",
    "1. t_trigger: The moment of the GPS timestamp, measured from the start of the stored signal.\n",
    "\n",
    "The\n",
    "worksheet [https://docs.hisparc.nl/infopakket/pdf/traces.pdf](https://docs.hisparc.nl/infopakket/pdf/traces.pdf)\n",
    "describes the physical meaning of these\n",
    "numbers. The figures in that worksheet come from the\n",
    "interactive worksheet [https://data.hisparc.nl/media/jsparc/jsparc.html](https://data.hisparc.nl/media/jsparc/jsparc.html).\n",
    "Note that computers count from `0`, not from `1`, so the first item in this list is column 0."
   ]
  },
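  {
   "cell_type": "markdown",
   "id": "c4e1a7d2",
   "metadata": {},
   "source": [
    "The relation between `timestamp`, `nanoseconds` and `ext_timestamp` can be sketched as follows. This is a minimal illustration with made-up values, assuming that `ext_timestamp` equals `timestamp * 10**9 + nanoseconds`; the helper names `combine` and `split` are hypothetical and not part of SAPPHiRE.\n",
    "\n",
    "```python\n",
    "NS_PER_S = 10**9  # nanoseconds in one second\n",
    "\n",
    "\n",
    "def combine(timestamp, nanoseconds):\n",
    "    '''Combine whole seconds and nanoseconds into one integer.'''\n",
    "    return timestamp * NS_PER_S + nanoseconds\n",
    "\n",
    "\n",
    "def split(ext_timestamp):\n",
    "    '''Split an extended timestamp back into (seconds, nanoseconds).'''\n",
    "    return divmod(ext_timestamp, NS_PER_S)\n",
    "\n",
    "\n",
    "# Made-up example values, not taken from a real station.\n",
    "ext = combine(1553040000, 123456789)\n",
    "print(ext)         # 1553040000123456789\n",
    "print(split(ext))  # (1553040000, 123456789)\n",
    "```\n",
    "\n",
    "Plain Python integers are used here because a number of this size does not fit exactly in a floating-point value."
   ]
  },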
  {
   "cell_type": "markdown",
   "id": "e5824b55",
   "metadata": {},
   "source": [
    "### Working with column names\n",
    "\n",
    "A column such as `event_id`, `timestamp` or `t1` can be accessed by the\n",
    "index of the column (0, 1, 2, ...) or by the column name. Using\n",
    "the column name makes the code much more readable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21c5a036",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_event = event_tabel[0]\n",
    "first_event['timestamp']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a8e3a52",
   "metadata": {},
   "source": [
    "The number of reconstructed particles in detector 1 (the seventh value) for the\n",
    "first event can therefore be found with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d458a45b",
   "metadata": {},
   "outputs": [],
   "source": [
    "event_tabel[0][6]  # 7th column of the 1st row"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62aa0f28",
   "metadata": {},
   "source": [
    "or, by column name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf4a94a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_event = event_tabel[0]\n",
    "first_event['n1']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "767d90ed",
   "metadata": {},
   "source": [
    "The second version is longer, but much more readable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34265a55",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(first_event['n1'])\n",
    "print(event_tabel[0][6])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c1975e3",
   "metadata": {},
   "source": [
    "The array of pulse heights, in ADC values, can be found with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78abb6d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_event['pulseheights']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f78ad927",
   "metadata": {},
   "source": [
    "For a station with only two detectors, such as station 102, the pulse heights of\n",
    "detectors 3 and 4 have the value `-1`. The value `-1` means that the pulse height\n",
    "could not be determined.\n",
    "\n",
    "The first pulse height can be found with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ecdf9ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"pulse height detector 1: %d ADC (first event)\" % first_event['pulseheights'][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "662f001c",
   "metadata": {},
   "source": [
    "## Timestamps\n",
    "It is often easier to look at an entire *column*, for example `timestamp`, in one go.\n",
    "\n",
    "First we read the whole table into memory. The object `events` is the complete table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a46e5008",
   "metadata": {},
   "outputs": [],
   "source": [
    "events = event_tabel.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "647fbe5c",
   "metadata": {},
   "source": [
    "The variable `ts` refers to the `timestamp` column; we look at the first 30 rows (events):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b949c479",
   "metadata": {},
   "outputs": [],
   "source": [
    "ts = events['timestamp']\n",
    "ts[:30]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fbbd918",
   "metadata": {},
   "outputs": [],
   "source": [
    "ns = events['nanoseconds']\n",
    "ns[:30]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "351b6a8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,4))\n",
    "plt.hist(ns, histtype='step')\n",
    "plt.ylabel('count')\n",
    "plt.xlabel('nanoseconds part of the timestamp')\n",
    "plt.title('ESD events from a single day for station %d' % STATION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ebaa20a",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,4))\n",
    "plt.hist(ts, bins=24, histtype='step')\n",
    "plt.ylabel('count')\n",
    "plt.xlabel('timestamp [s]')\n",
    "plt.title('ESD events from a single day for station %d' % STATION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29b6c332",
   "metadata": {},
   "outputs": [],
   "source": [
    "eerste_ts = ts[0]\n",
    "eerste_ts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48ab2f49",
   "metadata": {},
   "outputs": [],
   "source": [
    "# left and right edges of 1-hour (3600 s) wide bins, starting at the first timestamp (1 day)\n",
    "bins = [eerste_ts + 3600 * h for h in range(25)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77465c09",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,4))\n",
    "plt.hist(ts, bins=bins, histtype='step')\n",
    "plt.ylabel('count')\n",
    "plt.xlabel('time since first event [h]')\n",
    "plt.xticks(bins, range(25))\n",
    "plt.title('ESD events from a single day for station %d' % STATION)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4559e69",
   "metadata": {},
   "source": [
    "## MIPs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b92020c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "n1 = event_tabel.col('n1')\n",
    "plt.figure(figsize=(10,4))\n",
    "plt.hist(n1, bins=np.arange(0.3, 5., .1), histtype='step')\n",
    "plt.title('Station %d: Number of particles in detector 1' % STATION)\n",
    "plt.xlabel('number of particles (MIP)')\n",
    "plt.ylabel('counts')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f943a402",
   "metadata": {},
   "source": [
    "## Pulse height\n",
    "\n",
    "Make a histogram of the pulse heights of detectors 1 and 2 of the\n",
    "station.\n",
    "\n",
    "An example can be seen here: https://data.hisparc.nl/show/stations/15"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fff43a6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "ph = event_tabel.col('pulseheights')\n",
    "ph1 = ph[:, 0]\n",
    "ph2 = ph[:, 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3643833c",
   "metadata": {},
   "source": [
    "`pulseheights` is a *matrix* with one row per event and one column per detector:\n",
    "- `[:, 0]` is the entire first column, i.e. the pulse height of detector 1 for every event\n",
    "- `[:, 1]` is the entire second column, i.e. the pulse height of detector 2 for every event"
   ]
  },
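  {
   "cell_type": "markdown",
   "id": "e7b2d9f4",
   "metadata": {},
   "source": [
    "The slicing of the `pulseheights` matrix can be tried out on a small hand-made array. This is a minimal sketch with made-up values, assuming the same layout as in the table: one row per event and one column per detector.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Made-up pulse heights: 3 events (rows) x 4 detectors (columns).\n",
    "# -1 marks a detector that is not present.\n",
    "ph = np.array([[310, 295, -1, -1],\n",
    "               [120, 410, -1, -1],\n",
    "               [505, 130, -1, -1]])\n",
    "\n",
    "ph1 = ph[:, 0]          # first column: detector 1, one value per event\n",
    "ph2 = ph[:, 1]          # second column: detector 2, one value per event\n",
    "first_event = ph[0, :]  # first row: all detectors of the first event\n",
    "\n",
    "print(ph.shape)              # (3, 4)\n",
    "print(ph1.tolist())          # [310, 120, 505]\n",
    "print(first_event.tolist())  # [310, 295, -1, -1]\n",
    "```\n",
    "\n",
    "The colon selects every row, so `[:, 0]` always has as many entries as there are events."
   ]
  },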
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "223f3260",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,4))\n",
    "plt.hist(ph1, bins=np.arange(0, 2000., 20.), histtype='step', log=True)\n",
    "plt.hist(ph2, bins=np.arange(0, 2000., 20.), histtype='step', log=True)\n",
    "plt.title('Station %d: Pulseheights' % STATION)\n",
    "plt.xlabel('Pulseheight (ADC)')\n",
    "plt.ylabel('counts')\n",
    "plt.legend(['detector 1', 'detector 2'])\n",
    "plt.ylim(10, 1e4)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}