{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 1: Basics\n",
    "\n",
    "This tutorial will talk about how to use this software from your own python project or Jupyter notebook.\n",
    "There is also a nice command line interface that enables you to do the same with just two lines in your command line.\n",
    "\n",
    "**NOTE FOR CONTRIBUTORS: Always clear all output before commiting (``Cell`` > ``All Output`` > ``Clear``)**!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Magic\n",
    "%matplotlib inline\n",
    "# Reload modules whenever they change\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "# Make clusterking package available even without installation\n",
    "import sys\n",
    "sys.path = [\"../../\"] + sys.path\n",
    "\n",
    "import clusterking as ck"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scanning\n",
    "\n",
    "### Setting it up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's set up a scanner object and configure it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = ck.scan.WilsonScanner()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we set up the function/distribution that we want to consider. Here we look into the branching ratio with respect to $q^2$ of $B\\to D \\,\\tau\\, \\bar\\nu_\\tau$. The function of the differential branching ration is taken from the flavio package (https://flav-io.github.io/). The $q^2$ binning is chosen to have 9 bins between $3.2 \\,\\text{GeV}^2$ and $11.6\\,\\text{GeV}^2$ and is implemented as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import flavio\n",
    "import numpy as np\n",
    "\n",
    "def dBrdq2(w, q):\n",
    "    return flavio.np_prediction(\"dBR/dq2(B+->Dtaunu)\", w, q)\n",
    "\n",
    "s.set_dfunction(\n",
    "    dBrdq2,\n",
    "    binning=np.linspace(3.2, 11.6, 10),\n",
    "    normalize=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's set up the Wilson coefficients that need to be sampled. The Wilson coefficients are implemented using the Wilson package (https://wilson-eft.github.io/), which allows to use a variety of bases, EFTs and matches them to user specified scales.\n",
    "Using the example of $B\\to D \\tau \\bar\\nu_\\tau$, we sample the coefficients ``CVL_bctaunutau``, ``CSL_bctaunutau`` and ``CT_bctaunutau`` from the ``flavio`` basis (https://wcxf.github.io/assets/pdf/WET.flavio.pdf) with 10 points between $-1$ and $1$ at the scale of 5 GeV:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.set_spoints_equidist(\n",
    "    {\n",
    "        \"CVL_bctaunutau\": (-1, 1, 10),\n",
    "        \"CSL_bctaunutau\": (-1, 1, 10),\n",
    "        \"CT_bctaunutau\": (-1, 1, 10)\n",
    "    },\n",
    "    scale=5,\n",
    "    eft='WET',\n",
    "    basis='flavio'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Running it"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now to compute the kinematical distributions from the Wilson coefficients sampled above we need a data instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = ck.Data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Computing the kinematical distributions is done using ``run()`` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.run(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results are saved in a dataframe, ``d.df``. Let's have a look:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "d.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clustering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's build a hierarchy cluster out of the data object we created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = ck.cluster.HierarchyCluster(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we have to specify the metric we want to use to measure the distance between different distributions. If no argument is specified, the common $\\chi^2$ metric from is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c.set_metric()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's build now the hierarchy cluster:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c.build_hierarchy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The maximal distance between the individual clusters ``max_d`` can be chosen as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "c.cluster(max_d=0.15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we add the information about the clusters to the dataframe created above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c.write()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look and notice the new column ``cluster`` at the end of the data frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selecting benchmark points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a similar way we can determine the benchmark points representing the individual clusters. Initializing a benchmark point object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = ck.Benchmark(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and again choosing a metric ($\\chi^2$ metric is default)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b.set_metric()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "the benchmark points can be computed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b.select_bpoints()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and written in the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b.write()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look and notice the new column ``bpoint`` at the end of the data frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preserving results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now it's time to write out the results for later use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d.write(\"output/cluster\", \"tutorial_basics\", overwrite=\"overwrite\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will not only write out the data itself, but also a lot of associated metadata that makes it easy to later reconstruct what the data actually represents. This was accumulated in the attribute ``d.md`` over all steps:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d.md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}