{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "209d2b58", "metadata": {}, "source": [ "## Part 5: Boosted Decision Trees\n", "\n", "The `conifer` package was created out of `hls4ml`, providing a similar set of features but specifically targeting inference of Boosted Decision Trees. In this notebook we will train a `GradientBoostingClassifier` with scikit-learn, using the same jet tagging dataset as in the other tutorial notebooks. Then we will convert the model using `conifer`, and run bit-accurate prediction and synthesis as we did with `hls4ml` before.\n", "\n", "`conifer` is available from GitHub [here](https://github.com/thesps/conifer), and we have a publication describing the inference implementation and performance in detail [here](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf).\n", "\n", "<img src=\"https://github.com/thesps/conifer/blob/master/conifer_v1.png?raw=true\" width=\"250\" alt=\"conifer\">" ] }, { "cell_type": "code", "execution_count": null, "id": "eda9b784", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.ensemble import GradientBoostingClassifier\n", "from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n", "from sklearn.metrics import accuracy_score\n", "import joblib\n", "import conifer\n", "import plotting\n", "import matplotlib.pyplot as plt\n", "import os\n", "os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']\n", "np.random.seed(0)" ] }, { "cell_type": "markdown", "id": "18354699", "metadata": {}, "source": [ "## Load the dataset\n", "Note you need to have gone through `part1_getting_started` to download the data." ] }, { "cell_type": "code", "execution_count": null, "id": "1574ed18", "metadata": {}, "outputs": [], "source": [ "X_train_val = np.load('X_train_val.npy')\n", "X_test = np.load('X_test.npy')\n", "y_train_val = np.load('y_train_val.npy')\n", "y_test = np.load('y_test.npy', allow_pickle=True)\n", "classes = np.load('classes.npy', allow_pickle=True)" ] }, { "cell_type": "markdown", "id": "24658fb4", "metadata": {}, "source": [ "We need to transform the test labels from the one-hot encoded values to labels" ] }, { "cell_type": "code", "execution_count": null, "id": "00f304bd", "metadata": {}, "outputs": [], "source": [ "le = LabelEncoder().fit(classes)\n", "ohe = OneHotEncoder().fit(le.transform(classes).reshape(-1,1))\n", "y_train_val = ohe.inverse_transform(y_train_val.astype(np.int))\n", "y_test = ohe.inverse_transform(y_test)" ] }, { "cell_type": "markdown", "id": "8305e22c", "metadata": {}, "source": [ "## Train a `GradientBoostingClassifier`\n", "We will use 20 estimators with a maximum depth of 3. The number of decision trees will be `n_estimators * n_classes`, so 100 for this dataset. If you are returning to this notebook having already trained the BDT once, set `train = False` to load the model rather than retrain." 
] }, { "cell_type": "code", "execution_count": null, "id": "f5044231", "metadata": {}, "outputs": [], "source": [ "train = True\n", "if train:\n", " clf = GradientBoostingClassifier(n_estimators=20, learning_rate=1.0,\n", " max_depth=3, random_state=0, verbose=1).fit(X_train_val, y_train_val.ravel())\n", " if not os.path.exists('model_5'):\n", " os.makedirs('model_5')\n", " joblib.dump(clf, 'model_5/bdt.joblib')\n", "else:\n", " clf = joblib.load('model_5/bdt.joblib')" ] }, { "cell_type": "markdown", "id": "5e9857c2", "metadata": {}, "source": [ "## Create a conifer configuration\n", "\n", "Similarly to `hls4ml`, we can use a utility method to get a template for the configuration dictionary that we can modify." ] }, { "cell_type": "code", "execution_count": null, "id": "5bab868f", "metadata": {}, "outputs": [], "source": [ "cfg = conifer.backends.xilinxhls.auto_config()\n", "cfg['OutputDir'] = 'model_5/conifer_prj'\n", "cfg['XilinxPart'] = 'xcu250-figd2104-2L-e'\n", "plotting.print_dict(cfg)" ] }, { "cell_type": "markdown", "id": "9e3ca740", "metadata": {}, "source": [ "## Convert the model\n", "The syntax for model conversion with `conifer` is a little different to `hls4ml`. We construct a `conifer.model` object, providing the trained BDT, the converter corresponding to the library we used, the conifer 'backend' that we wish to target, and the configuration.\n", "\n", "`conifer` has converters for:\n", "- `sklearn`\n", "- `xgboost`\n", "- `tmva`\n", "\n", "And backends:\n", "- `vivadohls`\n", "- `vitishls`\n", "- `xilinxhls` (use whichever `vivado` or `vitis` is on the path\n", "- `vhdl`\n", "\n", "Here we will use the `sklearn` converter, since that's how we trained our model, and the `vivadohls` backend. For larger BDTs with many more trees or depth, it may be preferable to generate VHDL directly using the `vhdl` backend to get best performance. See [our paper](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf) for the performance comparison between those backends." ] }, { "cell_type": "code", "execution_count": null, "id": "7ebf5b06", "metadata": {}, "outputs": [], "source": [ "cnf = conifer.model(clf, conifer.converters.sklearn, conifer.backends.vivadohls, cfg)\n", "cnf.compile()" ] }, { "cell_type": "markdown", "id": "dc5e487b", "metadata": {}, "source": [ "## profile\n", "Similarly to hls4ml, we can visualize the distribution of the parameters of the BDT to guide the choice of precision" ] }, { "cell_type": "code", "execution_count": null, "id": "993fef56", "metadata": {}, "outputs": [], "source": [ "cnf.profile()" ] }, { "cell_type": "markdown", "id": "9c840ca4", "metadata": {}, "source": [ "## Run inference\n", "Now we can execute the BDT inference with `sklearn`, and also the bit exact simulation using Vivado HLS. The output that the `conifer` BDT produces is equivalent to the `decision_function` method." ] }, { "cell_type": "code", "execution_count": null, "id": "b9fd0fee", "metadata": {}, "outputs": [], "source": [ "y_skl = clf.decision_function(X_test)\n", "y_cnf = cnf.decision_function(X_test)" ] }, { "cell_type": "markdown", "id": "c486535e", "metadata": {}, "source": [ "## Check performance\n", "\n", "Print the accuracy from `sklearn` and `conifer` evaluations, and plot the ROC curves. We should see that we can get quite close to the accuracy of the Neural Networks from parts 1-4." 
] }, { "cell_type": "code", "execution_count": null, "id": "3a87c1b8", "metadata": {}, "outputs": [], "source": [ "yt = ohe.transform(y_test).toarray().astype(np.int)\n", "print(\"Accuracy sklearn: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_skl, axis=1))))\n", "print(\"Accuracy conifer: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_cnf, axis=1))))\n", "fig, ax = plt.subplots(figsize=(9, 9))\n", "_ = plotting.makeRoc(yt, y_skl, classes)\n", "plt.gca().set_prop_cycle(None) # reset the colors\n", "_ = plotting.makeRoc(yt, y_cnf, classes, linestyle='--')" ] }, { "cell_type": "markdown", "id": "70c43d82", "metadata": {}, "source": [ "## Synthesize\n", "Now run the Vivado HLS C Synthesis step to produce an IP that we can use, and inspect the estimate resources and latency.\n", "You can see some live output while the synthesis is running by opening a terminal from the Jupyter home page and executing:\n", "`tail -f model_5/conifer_prj/vivado_hls.log`" ] }, { "cell_type": "code", "execution_count": null, "id": "721814ef", "metadata": {}, "outputs": [], "source": [ "cnf.build()" ] }, { "cell_type": "markdown", "id": "ad1efe07", "metadata": {}, "source": [ "## Read report\n", "We can use an hls4ml utility to read the Vivado report" ] }, { "cell_type": "code", "execution_count": null, "id": "578a62c3", "metadata": {}, "outputs": [], "source": [ "import hls4ml\n", "hls4ml.report.read_vivado_report('model_5/conifer_prj/')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }