{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "209d2b58",
   "metadata": {},
   "source": [
    "## Part 5: Boosted Decision Trees\n",
    "\n",
    "The `conifer` package was created out of `hls4ml`, providing a similar set of features but specifically targeting inference of Boosted Decision Trees. In this notebook we will train a `GradientBoostingClassifier` with scikit-learn, using the same jet tagging dataset as in the other tutorial notebooks. Then we will convert the model using `conifer`, and run bit-accurate prediction and synthesis as we did with `hls4ml` before.\n",
    "\n",
    "`conifer` is available from GitHub [here](https://github.com/thesps/conifer), and we have a publication describing the inference implementation and performance in detail [here](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf).\n",
    "\n",
    "<img src=\"https://github.com/thesps/conifer/blob/master/conifer_v1.png?raw=true\" width=\"250\" alt=\"conifer\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eda9b784",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
    "from sklearn.metrics import accuracy_score\n",
    "import joblib\n",
    "import conifer\n",
    "import plotting\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']\n",
    "np.random.seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18354699",
   "metadata": {},
   "source": [
    "## Load the dataset\n",
    "Note you need to have gone through `part1_getting_started` to download the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1574ed18",
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_val = np.load('X_train_val.npy')\n",
    "X_test = np.load('X_test.npy')\n",
    "y_train_val = np.load('y_train_val.npy')\n",
    "y_test = np.load('y_test.npy', allow_pickle=True)\n",
    "classes = np.load('classes.npy', allow_pickle=True)"
   ]
  },
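  {
   "cell_type": "markdown",
   "id": "3c1a9f02",
   "metadata": {},
   "source": [
    "As a quick, optional sanity check (not part of the original tutorial flow), we can print the array shapes to confirm the data loaded as expected. The exact shapes depend on how the dataset was split in `part1_getting_started`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d2b0a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check: confirm the arrays loaded as expected\n",
    "print('X_train_val:', X_train_val.shape)\n",
    "print('X_test:     ', X_test.shape)\n",
    "print('y_train_val:', y_train_val.shape)\n",
    "print('y_test:     ', y_test.shape)\n",
    "print('classes:    ', classes)"
   ]
  },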
  {
   "cell_type": "markdown",
   "id": "24658fb4",
   "metadata": {},
   "source": [
    "We need to transform the test labels from the one-hot encoded values to labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00f304bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "le = LabelEncoder().fit(classes)\n",
    "ohe = OneHotEncoder().fit(le.transform(classes).reshape(-1,1))\n",
    "y_train_val = ohe.inverse_transform(y_train_val.astype(np.int))\n",
    "y_test = ohe.inverse_transform(y_test)"
   ]
  },
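  {
   "cell_type": "markdown",
   "id": "5e3c1b24",
   "metadata": {},
   "source": [
    "After the inverse transform the labels should be single-column arrays of integer class indices rather than one-hot vectors. A minimal, purely illustrative check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f4d2c35",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The labels should now have shape (n_samples, 1) and hold integer class indices\n",
    "print('y_train_val shape:', y_train_val.shape)\n",
    "print('unique labels:', np.unique(y_train_val))\n",
    "# Map the integer indices back to the class names via the LabelEncoder\n",
    "print('class names:', le.inverse_transform(np.unique(y_train_val).astype(int)))"
   ]
  },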
  {
   "cell_type": "markdown",
   "id": "8305e22c",
   "metadata": {},
   "source": [
    "## Train a `GradientBoostingClassifier`\n",
    "We will use 20 estimators with a maximum depth of 3. The number of decision trees will be `n_estimators * n_classes`, so 100 for this dataset. If you are returning to this notebook having already trained the BDT once, set `train = False` to load the model rather than retrain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5044231",
   "metadata": {},
   "outputs": [],
   "source": [
    "train = True\n",
    "if train:\n",
    "    clf = GradientBoostingClassifier(n_estimators=20, learning_rate=1.0,\n",
    "                                     max_depth=3, random_state=0, verbose=1).fit(X_train_val, y_train_val.ravel())\n",
    "    if not os.path.exists('model_5'):\n",
    "        os.makedirs('model_5')\n",
    "    joblib.dump(clf, 'model_5/bdt.joblib')\n",
    "else:\n",
    "    clf = joblib.load('model_5/bdt.joblib')"
   ]
  },
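  {
   "cell_type": "markdown",
   "id": "7a5e3d46",
   "metadata": {},
   "source": [
    "Before converting, it can be useful to confirm that the ensemble size matches the `n_estimators * n_classes` counting above. In scikit-learn the fitted trees are stored in `clf.estimators_`, with one column per class. This check is an optional addition, not part of the original tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b6f4e57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# clf.estimators_ is an array of regression trees with shape (n_estimators, n_classes),\n",
    "# so its size is the total number of decision trees (20 * 5 = 100 here)\n",
    "print('estimators_ shape:', clf.estimators_.shape)\n",
    "print('total decision trees:', clf.estimators_.size)\n",
    "print('training accuracy: {:.4f}'.format(clf.score(X_train_val, y_train_val.ravel())))"
   ]
  },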
  {
   "cell_type": "markdown",
   "id": "5e9857c2",
   "metadata": {},
   "source": [
    "## Create a conifer configuration\n",
    "\n",
    "Similarly to `hls4ml`, we can use a utility method to get a template for the configuration dictionary that we can modify."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bab868f",
   "metadata": {},
   "outputs": [],
   "source": [
    "cfg = conifer.backends.xilinxhls.auto_config()\n",
    "cfg['OutputDir'] = 'model_5/conifer_prj'\n",
    "cfg['XilinxPart'] = 'xcu250-figd2104-2L-e'\n",
    "plotting.print_dict(cfg)"
   ]
  },
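  {
   "cell_type": "markdown",
   "id": "9c7a5f68",
   "metadata": {},
   "source": [
    "The printed dictionary shows the knobs we can tune before conversion. As a purely illustrative sketch: if your `conifer` version exposes a `Precision` key (check the printout above; key names can differ between versions), the fixed-point type used for the thresholds and scores could be set as below. `'ap_fixed<18,8>'` (an 18-bit Vivado HLS fixed-point type with 8 integer bits) is just an example value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d8b6a79",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative only: tune the fixed-point precision if the config exposes it.\n",
    "# 'Precision' and the 'ap_fixed<18,8>' value are assumptions about this conifer\n",
    "# version's configuration; check the dictionary printed above.\n",
    "if 'Precision' in cfg:\n",
    "    cfg['Precision'] = 'ap_fixed<18,8>'\n",
    "plotting.print_dict(cfg)"
   ]
  },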
  {
   "cell_type": "markdown",
   "id": "9e3ca740",
   "metadata": {},
   "source": [
    "## Convert the model\n",
    "The syntax for model conversion with `conifer` is a little different to `hls4ml`. We construct a `conifer.model` object, providing the trained BDT, the converter corresponding to the library we used, the conifer 'backend' that we wish to target, and the configuration.\n",
    "\n",
    "`conifer` has converters for:\n",
    "- `sklearn`\n",
    "- `xgboost`\n",
    "- `tmva`\n",
    "\n",
    "And backends:\n",
    "- `vivadohls`\n",
    "- `vitishls`\n",
    "- `xilinxhls` (use whichever `vivado` or `vitis` is on the path\n",
    "- `vhdl`\n",
    "\n",
    "Here we will use the `sklearn` converter, since that's how we trained our model, and the `vivadohls` backend. For larger BDTs with many more trees or depth, it may be preferable to generate VHDL directly using the `vhdl` backend to get best performance. See [our paper](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf) for the performance comparison between those backends."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ebf5b06",
   "metadata": {},
   "outputs": [],
   "source": [
    "cnf = conifer.model(clf, conifer.converters.sklearn, conifer.backends.vivadohls, cfg)\n",
    "cnf.compile()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc5e487b",
   "metadata": {},
   "source": [
    "## profile\n",
    "Similarly to hls4ml, we can visualize the distribution of the parameters of the BDT to guide the choice of precision"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "993fef56",
   "metadata": {},
   "outputs": [],
   "source": [
    "cnf.profile()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c840ca4",
   "metadata": {},
   "source": [
    "## Run inference\n",
    "Now we can execute the BDT inference with `sklearn`, and also the bit exact simulation using Vivado HLS. The output that the `conifer` BDT produces is equivalent to the `decision_function` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9fd0fee",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_skl = clf.decision_function(X_test)\n",
    "y_cnf = cnf.decision_function(X_test)"
   ]
  },
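  {
   "cell_type": "markdown",
   "id": "1e9c7b80",
   "metadata": {},
   "source": [
    "Before looking at accuracy, a quick way to see the effect of the fixed-point emulation is to compare the two score arrays directly. This is an optional check, not part of the original flow:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f0d8c91",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare the floating-point sklearn scores with the bit-accurate conifer emulation\n",
    "print('score array shapes:', y_skl.shape, y_cnf.shape)\n",
    "print('max |difference|:  {:.5f}'.format(np.max(np.abs(y_skl - y_cnf))))\n",
    "print('mean |difference|: {:.5f}'.format(np.mean(np.abs(y_skl - y_cnf))))"
   ]
  },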
  {
   "cell_type": "markdown",
   "id": "c486535e",
   "metadata": {},
   "source": [
    "## Check performance\n",
    "\n",
    "Print the accuracy from `sklearn` and `conifer` evaluations, and plot the ROC curves. We should see that we can get quite close to the accuracy of the Neural Networks from parts 1-4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a87c1b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "yt = ohe.transform(y_test).toarray().astype(np.int)\n",
    "print(\"Accuracy sklearn: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_skl, axis=1))))\n",
    "print(\"Accuracy conifer: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_cnf, axis=1))))\n",
    "fig, ax = plt.subplots(figsize=(9, 9))\n",
    "_ = plotting.makeRoc(yt, y_skl, classes)\n",
    "plt.gca().set_prop_cycle(None) # reset the colors\n",
    "_ = plotting.makeRoc(yt, y_cnf, classes, linestyle='--')"
   ]
  },
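  {
   "cell_type": "markdown",
   "id": "3a1e9d02",
   "metadata": {},
   "source": [
    "To quantify how closely the dashed (`conifer`) curves track the solid (`sklearn`) ones, we can also compute per-class AUCs with `sklearn.metrics.roc_auc_score`. This is an optional addition to the tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b2f0e13",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "# Per-class AUC for the floating-point and fixed-point (emulated) scores\n",
    "for i, c in enumerate(classes):\n",
    "    auc_skl = roc_auc_score(yt[:, i], y_skl[:, i])\n",
    "    auc_cnf = roc_auc_score(yt[:, i], y_cnf[:, i])\n",
    "    print('{}: AUC sklearn = {:.4f}, conifer = {:.4f}'.format(c, auc_skl, auc_cnf))"
   ]
  },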
  {
   "cell_type": "markdown",
   "id": "70c43d82",
   "metadata": {},
   "source": [
    "## Synthesize\n",
    "Now run the Vivado HLS C Synthesis step to produce an IP that we can use, and inspect the estimate resources and latency.\n",
    "You can see some live output while the synthesis is running by opening a terminal from the Jupyter home page and executing:\n",
    "`tail -f model_5/conifer_prj/vivado_hls.log`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "721814ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "cnf.build()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad1efe07",
   "metadata": {},
   "source": [
    "## Read report\n",
    "We can use an hls4ml utility to read the Vivado report"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "578a62c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hls4ml\n",
    "hls4ml.report.read_vivado_report('model_5/conifer_prj/')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}