{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Network Topology Selection: A Model Validation Perspective\n", "
(C) Nikolai Nowaczyk, Jörg Kienitz, Sarp Kaya Acar, Qian Liang 2019
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Choosing a good topology for the neural network is key in order to achieve a successful training as the number of layers and the numbers of unit in each layer directly determine the number of trainable weights. But how to chose the number of layers and units? This question is of particular importance in cases where neural networks are used for financial modelling as these models are subject to model governance as well as internal and external model validation. Thus, the network topology has to be chosen in a way that is compatible with model governance.\n", "\n", "Typical approaches to chose a network topology (in financial and any other conext) are:\n", "* **Arbitrary choice:** The model developer simply makes a choice and plays around with it until the results are satisfactory. Alternative choices are not documented or systematically evaluated.\n", "* **Automatic Machine Learning:** There are attempts to automate the process of chosing a network topology, dubbed *AutoML*.\n", "* **Functional Analysis:** From a mathematical perspective, a neural network is a method to approximate a non-linear function. All possible choices of network topologies constitute a space of approximation functions and thus existing theory from that area can be utilized.\n", "\n", "The first method is of course straight-forward and quite often successfully used in practice. However, it is not always successful. For financial models, this approach is particularly problematic as it is not compatible with any model governance framework.\n", "\n", "The second method can also work well for many practical use cases, but the problem from a model validation perspective here is that one tries to shed light into a blackbox with another blackbox. AutoML can only be used to validate a bank's machine learning solution after it has sucessfully completed a model validation process itself. 
That might be possible, but requires a lot of additional resources - potentially more than validating the neural network in question directly. Also, it is doubtful whether a regulator would sign such a blank cheque.\n", "\n", "The third method has the advantage that a lot of literature and research already exists in that area - most famously the [universal approximation theorem for neural networks](https://en.wikipedia.org/wiki/Universal_approximation_theorem). This literature is very helpful in providing sound methodological justifications and theoretical background. However, in practice, asymptotic convergence results will often not be enough to uniquely determine the concrete parameters of the network for the problem at hand.\n", "\n", "We will therefore propose an intermediate solution that improves the first method with some of the techniques used in the second, in particular grid search. We will formulate this proposal in a language that takes the more classical perspective of degrees of freedom, which is very common in model validation. The aim is to have a framework in place that can be used in practice to get a production-level machine learning application in quantitative finance through a model validation process." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from keras.models import Sequential\n", "from keras.layers import Dense\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import seaborn as sns\n", "%matplotlib notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multilayer Perceptron (MLP)\n", "Let's first consider the case of MLPs. For MLPs, one has to choose the number $n_L$ of layers and, for each hidden layer $i$, the number $n_{u_i}$ of units. 
For any choice of these parameters, the degrees of freedom in the resulting MLP are precisely the number $n_w$ of trainable weights, which we compute in the following.\n", "\n", "For a primer on MLPs and the notation convention used in the following, see [here](https://nbviewer.jupyter.org/github/niknow/machine-learning-examples/blob/master/neural_network_intro/neural_network_intro_model_setup.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Only 1 Layer\n", "In the easiest case, one only has $n_L = 1$ layer. In that case, the number $n_w$ of trainable weights is given by\n", "$$ n_w = n_o n_i + n_o,$$\n", "where\n", "* $n_i$ is the number of inputs,\n", "* $n_o$ is the number of outputs.\n", "\n", "This is because we need a matrix in $\\mathbb{R}^{n_o \\times n_i}$ and a vector in $\\mathbb{R}^{n_o}$ for the bias (we always assume that we train the bias).\n", "\n", "In particular, for a network with only $n_L = 1$ layer, the topology is completely determined by the input and the output dimensions and there is no choice to be made. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_1 (Dense) (None, 3) 9 \n", "=================================================================\n", "Total params: 9\n", "Trainable params: 9\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "# in keras, this setup corresponds to\n", "n_o = 3\n", "n_i = 2\n", "model = Sequential([\n", " Dense(units=n_o, input_shape=(n_i,)),\n", "])\n", "print(n_i*n_o + n_o)\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Layers\n", "If we allow $n_L=2$ layers, then we have to choose the number $n_u$ of units in the first layer. The resulting number of weights will be\n", "$$ n_w = n_u n_i + n_u + n_o n_u + n_o = n_u(n_i + n_o + 1) + n_o, $$\n", "because for the first layer, we need one matrix of shape $n_u x n_i$ and a vector of shape $n_u$, and for the second layer we need a matrix of shape $n_o x n_u$ and a vector of shape $n_o$. Thus, the only parameter we need to choose is $n_u$. 
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "63\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_2 (Dense) (None, 10) 30 \n", "_________________________________________________________________\n", "dense_3 (Dense) (None, 3) 33 \n", "=================================================================\n", "Total params: 63\n", "Trainable params: 63\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "# in keras, this setup corresponds to\n", "n_o = 3\n", "n_i = 2\n", "n_u = 10\n", "model = Sequential([\n", " Dense(units=n_u, input_shape=(n_i,)),\n", " Dense(units=n_o),\n", "])\n", "print(n_u * (n_i + n_o + 1) + n_o)\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arbitrary many hidden layers\n", "In case the number of layers is $n_L \\geq 3$, we have to choose the number of units $n_u$ for all but the last layers separately. If we enumerate the layers from input to output by $1, \\ldots, n_L$ and denote by $n_{u_i}$ the number of units in each layer, then the total number of trainable weights is given by\n", "$$ n_w = n_i n_{u_1} + n_{u_1} + n_{u_L} n_o + n_o + \\sum_{i=2}^{n_L - 1}{n_{u_{i-1}} n_{u_{i}} +n_{u_i}},$$\n", "which in theory leaves us with many choices. \n", "\n", "If we assume that all but the output layer have the same number of units, i.e. $n_{u_i}=n_u$ for $1 \\leq i < N_L$, then we only have to choose one number $n_u$ and the resulting number of trainable weights simplifies to:\n", "$$ n_w = (n_L - 2) n_u^2 + (n_i + n_o + n_L - 1)n_u + n_o ,$$\n", "which requires $2$ choices, namely $n_L$, the number of layers, and $n_u$, the number of units per layer." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "393\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_4 (Dense) (None, 10) 30 \n", "_________________________________________________________________\n", "dense_5 (Dense) (None, 10) 110 \n", "_________________________________________________________________\n", "dense_6 (Dense) (None, 10) 110 \n", "_________________________________________________________________\n", "dense_7 (Dense) (None, 10) 110 \n", "_________________________________________________________________\n", "dense_8 (Dense) (None, 3) 33 \n", "=================================================================\n", "Total params: 393\n", "Trainable params: 393\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "# in keras, an example is given by\n", "n_o = 3\n", "n_i = 2\n", "n_u = 10\n", "n_L = 5\n", "model = Sequential([\n", " Dense(units=n_u, input_shape=(n_i,)),\n", " Dense(units=n_u),\n", " Dense(units=n_u),\n", " Dense(units=n_u),\n", " Dense(units=n_o),\n", "])\n", "print( (n_L - 2) * n_u**2 + (n_i + n_o + n_L - 1) * n_u + n_o)\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Balancing Layers, Units and Weights\n", "\n", "We have seen that under the assumption that every layer has the same number of units (except the output layer), the number $n_w$ of trainable weights is a function $n_w = n_w(n_L, w_u)$ of the number $n_L$ of layers and $n_u$ of units per layer.\n", "\n", "In order to find a good network topology for a given problem, one could of course simply try out a grid of networks parametrised by $n_L$ and $n_u$. 
However, if we study this function using some simple examples, we see that this might not be the best choice: for deep networks, $n_w$ grows quadratically in $n_u$, so the models on such a grid would have vastly different numbers of trainable weights. Instead, we fix the number of weights of the shallowest networks and, for the deeper networks, invert $n_w(n_L, \\cdot)$ to find the number of units that keeps the total number of weights roughly constant." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# implement $n_w$ function and its inverse in $n_u$\n", "def num_weights(n_i, n_o, n_L, n_u):\n", " \"\"\"\n", " Computes the total number of parameters in an MLP assuming all layers have the same\n", " number of units (except the output layer).\n", "\n", " param n_i: number of inputs\n", " param n_o: number of outputs\n", " param n_L: number of layers\n", " param n_u: number of units per layer\n", "\n", " returns: total number of trainable weights\n", " \"\"\"\n", " if n_L == 2:\n", " return n_u * (n_i + n_o + 1) + n_o\n", " else:\n", " return (n_L - 2) * n_u**2 + (n_i + n_o + n_L - 1) * n_u + n_o\n", "\n", "\n", "def num_units(n_i, n_o, n_L, n_w):\n", " \"\"\"\n", " Inverts num_weights in n_u: computes the number of units per layer such that an MLP\n", " with n_L layers has approximately n_w trainable weights (the positive root of the\n", " quadratic is rounded to the nearest integer).\n", " \"\"\"\n", " if n_L == 2:\n", " return int(round((n_w - n_o) / (n_i + n_o + 1)))\n", " a = n_L - 2\n", " b = n_i + n_o + n_L - 1\n", " c = n_o - n_w\n", " return int(round((-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# create example\n", "n_o = 3\n", "n_i = 2\n", "layers = np.arange(2, 6)\n", "units = np.arange(20, 120, 20)\n", "weights = {u: num_weights(n_i, n_o, layers[0], u) for u in units}\n", "xx, yy = np.meshgrid(units, layers)\n", "x, y = xx.ravel(), yy.ravel()\n", "z = np.array([[num_weights(n_i, n_o, l, num_units(n_i, n_o, l, weights[u])) for u in units] for l in layers]).ravel()\n", "bottom = np.zeros_like(z)\n", "\n", "# plot example\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection='3d')\n", "width = 5\n", "depth = 0.25\n", "ax.bar3d(x, y, bottom, width, depth, z, shade=True)\n", "ax.set_title('Trainable weights')\n", "ax.set_xlabel('original number of units')\n", "ax.set_ylabel('number of layers')\n", "ax.set_zlabel('number of weights')\n", "ax.zaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))\n", "ax.yaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))\n", "ax.set_yticks(layers)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the number of weights now increases linearly with the number of units, but stays roughly constant with the 
number of layers (because the number of units is decreased). It does not stay exactly constant due to rounding the solution of the quadratic equation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Network Topology Selection Algorithm\n", "\n", "This leaves us with the following method of determining a good network topology for any given problem:\n", "\n", "**Input:**\n", "* An artificial neural network $\\operatorname{NN}$.\n", "* A range of numbers of units $\\mathcal{N}_u = (n_{u_1}, \\ldots, n_{u_m})$.\n", "* A range of numbers of layers $\\mathcal{N}_L = (n_{L_1}, \\ldots, n_{L_k})$.\n", "* A labeled dataset $(x,y)$ together with a train/test split (e.g. $80\\%, 20\\%$).\n", "* A bias threshold $t_b$ and a variance threshold $t_v$ together with metrics for both (e.g. MSE).\n", "* A maximal number $e_{\\max}$ of epochs.\n", "\n", "\n", "**Steps:**\n", "* For each $n_u \\in \\mathcal{N}_u$ and each $n_L \\in \\mathcal{N}_L$, train the network $\\operatorname{NN}$ with $(x,y)$ until the bias and the variance are below the thresholds $t_b$ and $t_v$ (or the maximum number of epochs $e_{\\max}$ is reached). This results in a grid of trained models parametrized by $\\mathcal{N}_u \\times \\mathcal{N}_L$.\n", "* Cross out all networks on the grid for which the bias and the variance are not below the given thresholds. (In case all models are crossed out, the number of units or layers or the number of training samples or the number of epochs needs to be increased to yield a meaningful result.)\n", "* Amongst the remaining, find the smallest number $n_L$ of layers for which there exists a number of units $n_u$ such that the model $(n_u, n_L)$ has not been crossed out. Amongst those, choose the one with the smallest number $n_u$ of units. 
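The selection logic of these steps can be sketched as follows. This is an illustrative sketch, not the original implementation: `select_topology`, `train_and_evaluate` and `fake_eval` are hypothetical names, and the training stub stands in for the actual Keras fitting loop, which would return the measured bias and variance per grid point.

```python
# Sketch of the topology selection grid search (illustrative only).
# `train_and_evaluate(n_u, n_L)` stands in for the real Keras training and
# returns a (bias, variance) pair for the trained model.

def select_topology(units_range, layers_range, train_and_evaluate, t_b, t_v):
    """Return the (n_u, n_L) with the fewest layers (ties broken by fewest
    units) whose bias and variance fall below the thresholds t_b, t_v."""
    grid = {}
    for n_L in layers_range:
        for n_u in units_range:
            grid[(n_u, n_L)] = train_and_evaluate(n_u, n_L)
    # cross out all models violating the thresholds
    admissible = [(n_u, n_L) for (n_u, n_L), (b, v) in grid.items()
                  if b < t_b and v < t_v]
    if not admissible:
        return None  # enlarge the grid, the data set or the epoch budget
    # smallest number of layers first, then smallest number of units
    return min(admissible, key=lambda p: (p[1], p[0]))

# toy stand-in: pretend bias and variance shrink as the network grows
def fake_eval(n_u, n_L):
    return 1.0 / (n_u * n_L), 1.0 / (n_u + n_L)

print(select_topology([10, 20, 40], [2, 3, 4], fake_eval, t_b=0.01, t_v=0.05))
# -> (40, 3)
```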
\n", "\n", "**Output:** A number $(n_u, n_L)$ of units and layers for the network $\\operatorname{NN}$ such that the bias and the variance of the network on $(x,y)$ are within the threshold and the numbers $(n_u, n_L)$ are optimal within the given range.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advantages of the Algorithm\n", "\n", "* **Documented Choice:** The resulting grid of trained models yields transparent evidence on why a network topology has been chosen.\n", "* **Prevention of Overfitting:** Because both, the number of layers an units is increased from below, this method prevents overfitting the model.\n", "* **Topology Type Comparison (optional):** If one does not only want to select a certain number of layers and units for a given topology type like MLP, but wants to compare MLP vs LSTM, this comparison is more meaningful as the number of trainable weights is more comparable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LSTM: Long-Term-Short-Term-Memory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one wants to compare the MLP with an [Long-Term-Short-Term-Memory network (LSTM)](https://nbviewer.jupyter.org/github/niknow/machine-learning-examples/blob/master/lstm_intro/lstm_intro.ipynb), one has to determine the number of weights in an LSTM as well. 
\n", "\n", "The number of weights in a single LSTM layer with $k$ features and $m$ units is given by \n", "\\begin{align*}\n", "\t\t4m^2 + 4(k+1)m.\n", "\\end{align*}\n", "\n", "For an LSTM with $n_L-1$ layers with $n_u$ units, $n_i$ inputs followed by a single dense output layer of dimension $n_o$, we obtain\n", "\\begin{align*}\n", " n_w & = 4 (2 n_L -3) n_u^2 + (4 n_i + n_o + 4 n_L - 4 )n_u + n_o.\n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Applications\n", "* [How deep are financial models?](https://nbviewer.jupyter.org/github/niknow/machine-learning-examples/blob/master/network_topology_selection/how_deep_are_financial_models.ipynb): Learning the pricing function of a European Call option in Black-Scholes and Heston model" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }