{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Before You Start\n", "\n", "The current set of notebooks is under constant development.\n", "\n", "## Update Tutorial Repository\n", "\n", "If you have previously cloned the tutorial repository, you may need to get the latest versions of the notebooks.\n", "\n", "First check the status of your repository:\n", "```\n", "cd hls4ml-tutorial\n", "make clean\n", "git status \n", "```\n", "\n", "You may have some _modified_ notebooks. For example:\n", "\n", "```\n", "# On branch csee-e6868-spring2022\n", "# Changes not staged for commit:\n", "# (use \"git add ...\" to update what will be committed)\n", "# (use \"git checkout -- ...\" to discard changes in working directory)\n", "#\n", "#\tmodified: part1_getting_started.ipynb\n", "#\tmodified: part2_advanced_config.ipynb\n", "#\n", "no changes added to commit (use \"git add\" and/or \"git commit -a\")\n", "```\n", "\n", "If you made significant changes to those notebooks, make a copy of them first; otherwise, the easiest thing to do is to discard the changes.\n", "\n", "**ATTENTION** You will lose your local changes!\n", "\n", "```\n", "git checkout *.ipynb\n", "```\n", "\n", "At this point, you can update your copy of the repository:\n", "```\n", "git pull\n", "```\n", "\n", "# Part 2: Advanced Configuration\n", "\n", "In this notebook, we will learn more about design-space exploration with hls4ml. We will focus on post-training quantization (introduced in [Part 1](part1_getting_started.ipynb)) and reuse factor.\n", "\n", "## Setup\n", "\n", "As we did in [Part 1](part1_getting_started.ipynb), let's import the libraries, call the magic functions, and set up the environment variables." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.utils import to_categorical\n", "from sklearn.datasets import fetch_openml\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import LabelEncoder, StandardScaler\n", "from sklearn.metrics import accuracy_score\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import os\n", "os.environ['PATH'] = '/opt/xilinx/Vivado/2019.1/bin:' + os.environ['PATH']\n", "def is_tool(name):\n", " from distutils.spawn import find_executable\n", " return find_executable(name) is not None\n", "\n", "print('-----------------------------------')\n", "if not is_tool('vivado_hls'):\n", " print('Xilinx Vivado HLS is NOT in the PATH')\n", "else:\n", " print('Xilinx Vivado HLS is in the PATH')\n", "print('-----------------------------------')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the dataset\n", "\n", "In [Part 1](part1_getting_started.ipynb), we saved the preprocessed dataset to files. Let's load them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train_val = np.load('X_train_val.npy')\n", "X_test = np.load('X_test.npy')\n", "y_train_val = np.load('y_train_val.npy')\n", "y_test = np.load('y_test.npy', allow_pickle=True)\n", "classes = np.load('classes.npy', allow_pickle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the model\n", "\n", "In [Part 1](part1_getting_started.ipynb), we saved the trained model to file. 
Let's load it as well.\n", "\n", "**Make sure you've run through that walkthrough first!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.models import load_model\n", "model = load_model('model_1/KERAS_check_best_model.h5')\n", "y_keras = model.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an hls4ml configuration & model\n", "\n", "This time, we'll create an hls4ml configuration dictionary with _finer granularity_.\n", "\n", "When we used `granularity='model'`, the generated configuration was:\n", "```\n", "-----------------------------------\n", "Model\n", " Precision: ap_fixed<16,6>\n", " ReuseFactor: 1\n", " Strategy: Latency\n", "-----------------------------------\n", "```\n", "\n", "Now we use `granularity='name'` instead. When we print the configuration, you'll notice that an entry is created for each named layer of the model. For example, for the first layer:\n", "```\n", "-----------------------------------\n", "[...]\n", "LayerName:\n", " fc1:\n", " Precision:\n", " weight: ap_fixed<16,6>\n", " bias: ap_fixed<16,6>\n", " result: ap_fixed<16,6>\n", " ReuseFactor: 1\n", "[...]\n", "-----------------------------------\n", "```\n", "Taken _out of the box_, this configuration sets all the parameters to the same values as in [Part 1](part1_getting_started.ipynb), but we can use it as a template to start modifying things. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import hls4ml\n", "import plotting\n", "config = hls4ml.utils.config_from_keras_model(model, granularity='name')\n", "print(\"-----------------------------------\")\n", "plotting.print_dict(config)\n", "print(\"-----------------------------------\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the remaining of the notebook we will investigate how to leverage the `Precision` and `ReuseFactor` knobs to run design-space exploration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Design Space Exploration for Post-Training Quantization\n", "\n", "### Profile\n", "\n", "With this new dictionary, we can choose the precision of _everything_ in our neural network. This is a powerful way to tune the performance, but the _fine tuning_ of all of these parameters is also complicated.\n", "\n", "The tools in `hls4ml.model.profiling` can help you choose the right precision for your model. (That said, training your model with quantization built in can get around this problem, and that is introduced in Part 4. So, don't go too far down the rabbit hole of tuning your data types without first trying out quantization aware training with [QKeras](https://github.com/google/qkeras).)\n", "\n", "The first thing to try is to numerically profile your model. This method plots the distribution of the weights (and biases) as a [box and whisker plot](https://en.wikipedia.org/wiki/Box_plot#:~:text=In%20descriptive%20statistics%2C%20a%20box,whisker%20plot%20and%20box%2Dand%2D). The grey boxes show the values which can be represented with the data types used in the `hls_model`. 
Generally,\n", "- you need the box to overlap completely with the whisker *to the right* (large values), otherwise you'll get saturation & wrap-around issues;\n", "- it can be okay for the box not to overlap completely *to the left* (small values), but finding how small you can go is a matter of trial-and-error.\n", "\n", "Providing data, in this case just using the first 1000 examples for speed, will show the same distributions captured at the output of each layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Enable tracing for all of the layers\n", "for layer in config['LayerName'].keys():\n", " config['LayerName'][layer]['Trace'] = True\n", "\n", "%matplotlib inline\n", "# Create an HLS model from the Keras model and hls4ml configuration dictionary\n", "hls_model = hls4ml.converters.convert_from_keras_model(model,\n", " hls_config=config,\n", " output_dir='model_1/hls4ml_prj_2',\n", " #part='xczu7ev-ffvc1156-2-e') # ZCU106\n", " part='xczu3eg-sbva484-1-e') # Ultra96\n", " #part='xc7z020clg400-1') # Pynq-Z1\n", " #part='xc7z007sclg225-1') # MiniZed\n", "# Run profiling\n", "_ = hls4ml.model.profiling.numerical(model=model, hls_model=hls_model, X=X_test[:1000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Customize\n", "\n", "Let's try setting the precision of the weights in the first layer to something narrower than 16 bits. Using fewer bits can save resources in the FPGA. After inspecting the profiling plot above, let's try 8 bits with 1 integer bit (plus the sign bit, i.e. `ap_fixed<8,2>`).\n", "\n", "Then create a new `hls_model`, and display the profiling with the new config. This time, just display the weight profile by not providing any data '`X`'. Then create the `HLSModel` and display the architecture. Notice that the box around the weights of the first layer reflects the different precision." 
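, "\n", "\n",
"To make the `ap_fixed<width,integer>` notation concrete, here is a minimal sketch of the range and step size of a signed fixed-point type. It assumes the usual convention in which the integer width includes the sign bit; `ap_fixed_range` is a hypothetical helper written for this illustration, not an hls4ml function.\n",
"\n",
"```python\n",
"def ap_fixed_range(width, integer):\n",
"    # Range and step of a signed ap_fixed<width,integer>,\n",
"    # assuming `integer` counts the sign bit (hypothetical helper)\n",
"    frac_bits = width - integer\n",
"    step = 2.0 ** -frac_bits           # smallest representable increment\n",
"    lowest = -(2.0 ** (integer - 1))   # most negative value\n",
"    highest = 2.0 ** (integer - 1) - step\n",
"    return lowest, highest, step\n",
"\n",
"for width, integer in [(16, 6), (8, 2)]:\n",
"    print((width, integer), ap_fixed_range(width, integer))\n",
"```\n",
"\n",
"Under that assumption, `ap_fixed<8,2>` spans -2 to 1.984375 in steps of 0.015625, so any weight outside that interval would saturate or wrap around."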
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fine tune the precision of the weights in the first layer\n", "config['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,2>'\n", "\n", "# Create an HLS model from the Keras model and the updated hls4ml configuration dictionary\n", "hls_model = hls4ml.converters.convert_from_keras_model(model,\n", " hls_config=config,\n", " output_dir='model_1/hls4ml_prj_2',\n", " #part='xczu7ev-ffvc1156-2-e') # ZCU106\n", " part='xczu3eg-sbva484-1-e') # Ultra96\n", " #part='xc7z020clg400-1') # Pynq-Z1\n", " #part='xc7z007sclg225-1') # MiniZed\n", "# Run profiling (weights only, no X is provided)\n", "_ = hls4ml.model.profiling.numerical(model=model, hls_model=hls_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we did in [Part 1](part1_getting_started.ipynb), let's visualise the HLS model that we just created." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hls4ml.utils.plot_model(hls_model, show_shapes=True, show_precision=True, to_file=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Enable Tracing\n", "When we start using customised precision throughout the model, it can be useful to collect the output from each layer to find out when things have gone wrong. We enable this trace collection by setting `Trace = True` for each layer whose output we want to collect." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Enable tracing for all of the layers\n", "for layer in config['LayerName'].keys():\n", " config['LayerName'][layer]['Trace'] = True\n", "\n", "# Create an HLS model from the Keras model and the updated hls4ml configuration dictionary\n", "hls_model = hls4ml.converters.convert_from_keras_model(model,\n", " hls_config=config,\n", " output_dir='model_1/hls4ml_prj_2',\n", " #part='xczu7ev-ffvc1156-2-e') # ZCU106\n", " part='xczu3eg-sbva484-1-e') # Ultra96\n", " #part='xc7z020clg400-1') # Pynq-Z1\n", " #part='xc7z007sclg225-1') # MiniZed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compile, trace, predict\n", "\n", "Now we need to check that this model's performance is still good after reducing the precision.\n", "\n", "We compile the `hls_model`, and now use the `hls_model.trace` method to collect the model output, and also the output for all the layers we enabled tracing for. The trace is a dictionary with keys corresponding to the layer names of the model; stored at that key is the array of values output by that layer, sampled from the provided data.\n", "\n", "A helper function `get_ymodel_keras` returns the same dictionary for the Keras model, which uses floating-point precision.\n", "\n", "We'll just run the `trace` for the first 1000 examples, since it takes a bit longer and uses more memory than just running `predict`. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%time\n", "# Recompile the hls model\n", "hls_model.compile()\n", "\n", "# Run tracing on a portion of the test set for the hls model (fixed-point precision) \n", "hls4ml_pred, hls4ml_trace = hls_model.trace(np.ascontiguousarray(X_test[:1000]))\n", "\n", "# Run tracing on a portion of the test set for the Keras model (floating-point precision)\n", "keras_trace = hls4ml.model.profiling.get_ymodel_keras(model, X_test[:1000])\n", "\n", "# Run prediction on all of the test set for the hls model (fixed-point precision)\n", "y_hls = hls_model.predict(np.ascontiguousarray(X_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manual Inspection\n", "Now we can print out, make plots, or do any other more detailed analysis on the output of each layer to make sure we haven't made the performance worse. And if we have, we can quickly find out where. Let's just print the output of the first layer, for the first sample, for both the Keras and hls4ml models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('-----------------------------------')\n", "print(\"Keras layer 'fc1', first sample:\")\n", "print(keras_trace['fc1'][0])\n", "print('-----------------------------------')\n", "print(\"hls4ml layer 'fc1', first sample:\")\n", "print(hls4ml_trace['fc1'][0])\n", "print('-----------------------------------')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare\n", "Let's see if we lost performance by using 8 bits for the weights of the first layer by inspecting the accuracy and ROC curve." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('-----------------------------------')\n", "print(\"Keras Accuracy: {}\".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_keras, axis=1))))\n", "print(\"hls4ml Accuracy: {}\".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hls, axis=1))))\n", "print('-----------------------------------')\n", "\n", "# Enable logarithmic scale on TPR and FPR axes \n", "logscale_tpr = False # Y axis\n", "logscale_fpr = False # X axis\n", "\n", "fig, ax = plt.subplots(figsize=(9, 9))\n", "_ = plotting.plotMultiClassRoc(y_test, y_keras, classes, logscale_tpr=logscale_tpr, logscale_fpr=logscale_fpr)\n", "plt.gca().set_prop_cycle(None) # reset the colors\n", "_ = plotting.plotMultiClassRoc(y_test, y_hls, classes, logscale_tpr=logscale_tpr, logscale_fpr=logscale_fpr, linestyle='--')\n", "\n", "from matplotlib.lines import Line2D\n", "lines = [Line2D([0], [0], ls='-'),\n", " Line2D([0], [0], ls='--')]\n", "from matplotlib.legend import Legend\n", "leg = Legend(ax, lines, labels=['keras', 'hls4ml'],\n", " loc='lower right', frameon=False)\n", "_ = ax.add_artist(leg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The AUC results for the Keras and hls4ml implementations are again really close, but you can notice a little more difference with respect to the plot in [Part 1](part1_getting_started.ipynb). Apply a logarithmic scale on the FPR axis (`logscale_fpr=True`) to better appreciate the differences." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PTQ Summary\n", "\n", "We lost a small amount of accuracy compared to when we used `ap_fixed<16,6>`, but in many cases this difference will be small enough to be worth the resource saving. 
You can choose how aggressively to quantize, but it's always sensible to make the profiling plots even with the default configuration.\n", "\n", "Layer-level `trace` is very useful for finding when you reduced the bitwidth too far, or when the default configuration is no good for your model.\n", "\n", "In this model, a width of around 8 bits generally seems to be the limit to how low _post-training quantization_ can go before suffering significant performance loss. In Part 4, we'll look at using _quantization-aware training_ with QKeras to go much lower without losing much performance.\n", "\n", "## Design Space Exploration for Reuse Factor\n", "\n", "Now let's look at the other configuration parameter: `ReuseFactor`.\n", "Recall that `ReuseFactor` is our mechanism for tuning the parallelism:" ] }, { "attachments": { "reuse.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "![reuse.png](attachment:reuse.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now let's make a new configuration for this model, and set the `ReuseFactor` to `2` for every layer:\n", "we'll compile the model, then evaluate its performance. 
(Note, by creating a new config with `granularity='Model'`, we're implicitly resetting the precision to `ap_fixed<16,6>` throughout.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate an hls4ml configuration dictionary from the Keras model\n", "config = hls4ml.utils.config_from_keras_model(model, granularity='Model')\n", "\n", "# Set the ReuseFactor to 2 for all of the layers of the model\n", "config['Model']['ReuseFactor'] = 2\n", "\n", "print('-----------------------------------')\n", "# Show the generated configuration dictionary for hls4ml\n", "plotting.print_dict(config)\n", "print('-----------------------------------')\n", "\n", "# Create an HLS model from the Keras model and the updated hls4ml configuration dictionary\n", "hls_model = hls4ml.converters.convert_from_keras_model(model,\n", " hls_config=config,\n", " output_dir='model_1/hls4ml_prj_2',\n", " #part='xczu7ev-ffvc1156-2-e') # ZCU106\n", " part='xczu3eg-sbva484-1-e') # Ultra96\n", " #part='xc7z020clg400-1') # Pynq-Z1\n", " #part='xc7z007sclg225-1') # MiniZed\n", "_ = hls_model.compile()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Changing the `ReuseFactor` should not change the classification results, but let's just verify that by inspecting the accuracy and ROC curve again!\n", "Then we'll build the model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run prediction on all of the test set for the hls model (fixed-point precision)\n", "y_hls = hls_model.predict(np.ascontiguousarray(X_test))\n", "\n", "print('-----------------------------------')\n", "print(\"Keras Accuracy: {}\".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_keras, axis=1))))\n", "print(\"hls4ml Accuracy: {}\".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hls, axis=1))))\n", "print('-----------------------------------')\n", "\n", "# Enable logarithmic scale on TPR and FPR axes \n", "logscale_tpr = False # Y axis\n", "logscale_fpr = False # X axis\n", "\n", "fig, ax = plt.subplots(figsize=(9, 9))\n", "_ = plotting.plotMultiClassRoc(y_test, y_keras, classes, logscale_tpr=logscale_tpr, logscale_fpr=logscale_fpr)\n", "plt.gca().set_prop_cycle(None) # reset the colors\n", "_ = plotting.plotMultiClassRoc(y_test, y_hls, classes, logscale_tpr=logscale_tpr, logscale_fpr=logscale_fpr, linestyle='--')\n", "\n", "from matplotlib.lines import Line2D\n", "lines = [Line2D([0], [0], ls='-'),\n", " Line2D([0], [0], ls='--')]\n", "from matplotlib.legend import Legend\n", "leg = Legend(ax, lines, labels=['keras', 'hls4ml'],\n", " loc='lower right', frameon=False)\n", "_ = ax.add_artist(leg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run Vivado HLS to synthesize the model (_C-Synthesis_).\n", "\n", "**This takes approx. 
15 minutes on Columbia servers.**\n", "\n", "While the C-Synthesis is running, we can monitor its progress by opening a terminal from the notebook home and following the log file:\n", "\n", "`tail -f model_1/hls4ml_prj_2/vivado_hls.log`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "hls_results = hls_model.build(csim=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now print the report and compare it to the report from Exercise 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hls4ml.report.read_vivado_report('model_1/hls4ml_prj_2')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hls4ml.report.read_vivado_report('model_1/hls4ml_prj')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the new design, we expect half the DSP usage and an _Initiation Interval_ equal to 2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise\n", "1. Recall the outcome of the exercise of Part 1 where we estimated how many DSPs our network should use. How does this change now that we've used `ReuseFactor = 2` for the network? Does the expectation match the report this time?\n", "2. By leveraging `Precision` and `ReuseFactor`, try to create HLS models that fit at least the ZCU106 and Ultra96 boards." 
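, "\n", "\n",
"For the first question, a back-of-the-envelope estimate can be sketched as follows. It assumes one multiplier per weight in a fully parallel dense layer, and that `ReuseFactor = R` divides the number of multipliers by `R`. The helper name and the layer shapes are hypothetical placeholders, so substitute the shapes of your own model. The synthesis report may still differ, since the HLS tool can implement some multiplications without DSPs.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def estimate_multipliers(layer_shapes, reuse_factor=1):\n",
"    # One multiplication per weight; each multiplier is time-shared\n",
"    # across `reuse_factor` multiplications (hypothetical helper)\n",
"    return sum(math.ceil(n_in * n_out / reuse_factor)\n",
"               for n_in, n_out in layer_shapes)\n",
"\n",
"# Hypothetical dense-layer shapes (n_in, n_out); replace with your model's layers\n",
"example_shapes = [(16, 64), (64, 32), (32, 32), (32, 5)]\n",
"\n",
"for rf in (1, 2):\n",
"    print('ReuseFactor', rf, '->', estimate_multipliers(example_shapes, rf), 'multipliers')\n",
"```\n",
"\n",
"With these placeholder shapes the estimate halves when going from `ReuseFactor = 1` to `ReuseFactor = 2`, which is the trend to look for in the two reports."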
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }