{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Numba for Performance Gains - A Finance Example\n", "\n", "This post looks at how Numba can be used to speed up a finance example with minimal effort. The example looks at an application of Monte Carlo simulation for calculating a simplified version of Credit Valuation Adjustment on a portfolio. The first half explains the finance and setup and the second half show how we can speed it up and benefit from that speedup.\n", "\n", "[Credit Valuation Adjustment](https://en.wikipedia.org/wiki/Credit_valuation_adjustment) is a measure of the credit risk a bank has between its [counterparties](https://en.wikipedia.org/wiki/Counterparty). Specifically it is the adjustment to the fair value price of derivative instruments to account for [counterparty credit risk](https://en.wikipedia.org/wiki/Credit_risk#Counterparty_risk). This is an interesting problem to look at as a bank can have millions of credit instruments (referenced as deals in the code) outstanding with thousands of counterparties. There are many types of deals, each type having its own complexity in calculating price. Similarly, each counterparty has its own risks, default risk being the main concern with CVA. [Monte Carlo](https://en.wikipedia.org/wiki/Monte_Carlo_method) is applied to estimate the future values of these instruments along with the probability of default from the counterparty.\n", "\n", "The formula for CVA is as such,\n", "\n", "$$CVA = (1-R)\\sum_{i=0}^{n}q_i p_i$$\n", "\n", "where $q$ is the default probability, $R$ the [recovery rate](http://www.investopedia.com/terms/r/recovery-rate.asp) and $p$ is the price of the instrument. To simplify the calculation, we can ignore the recovery rate and assume the default probability of the counterparty is limited to a hazard rate such that:\n", "\n", "$$q_i = \\int_{0}^{i} q(\\lambda, t)dt$$\n", "\n", "Only two types of deals are looked at, [foreign exchange (forex) futures](https://en.wikipedia.org/wiki/Currency_future) and swaps. The swaps are based on the forex price. Assuming a fixed interest rate, the forex and swaps are easy to price. In the first cell below, we generate a random sample of deals, assigning values and hazard rates based on the weights below.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating our simulated bank portfolio\n", "\n", "We generate deals for each type of instrument and convert it to a numpy array. Note, the type and currency have been enumerated. This is done since numba does not support strings as are commonly used in code like this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inputs\n", "\n", "In this section we define our inputs. Most of these inputs are used to generate our deals. These include the number of deals, the ratio of counterparty risk, maturity, and value. We also define some of the inputs for the Monte Carlo, like the number of simulations and periods per year. These are defined here since we are pregenerating the normals at the moment. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import math\n", "import numpy as np\n", "import numpy.random as rand\n", "\n", "\n", "# Setup Configuration\n", "sims = 1000 # number of simulations\n", "n_forex = 1000 # number of forex deals\n", "n_swaps = 1000 # number of swap deals\n", "ppy = 12 # periods per yer\n", "min_maturity = 0.25 # minumum maturity (years)\n", "max_maturity = 10 # maximum maturity (years)\n", "min_value = 1000 # minimum dollar value of product\n", "max_value = 100000000 # maximum dollar value of product\n", "min_fixed = 0.01 # minimum fixed interest rate\n", "max_fixed = 0.15 # maximum fixed interest rate\n", "periods = max_maturity * ppy # total number of periods\n", "\n", "ratings = {0: 0.5, # 'A' counterparty ratings\n", " 1: 0.2, # 'B'\n", " 2: 0.1, # 'C'\n", " 3: 0.1, # 'D'\n", " 4: 0.1} # 'E'\n", "hazard = {0: 0.1, # hazard rates\n", " 1: 0.2,\n", " 2: 0.3,\n", " 3: 0.4,\n", " 4: 0.5}\n", "\n", "# percent swap deals in USD\n", "pct_swap_cur = {0: 0.6, # 'USD'\n", " 1: 0.4} # 'EUR'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, instead of using letter ratings for the keys, we have enumerated using integers. Numba does not currently support string operations or dictionaries. This is one important consideration when working with Numba." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating our portfolio\n", "\n", "First lets make some functions to randomly assign a rating and a currency. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# make functions to sample from settings\n", "get_rating = lambda: rand.choice(list(ratings.keys()), 1, p=list(ratings.values()))[0]\n", "get_swap_cur = lambda: rand.choice(list(pct_swap_cur.keys()), 1, p=list(pct_swap_cur.values()))[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we create the deals using the inputs we defined above." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 8.91008600e+07, 1.00000000e+00,\n", " 1.38570132e-02, 4.00000000e+00, 5.00000000e-01,\n", " -1.00000000e+00])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# function to generate a forex deal\n", "def forex(i):\n", " r = get_rating()\n", " return (0, rand.randint(min_value, max_value), rand.randint(min_maturity*ppy, max_maturity*ppy)/ppy,\n", " rand.uniform(min_fixed, max_fixed), r, hazard[r], -1)\n", "\n", "# function to generate a swap deal\n", "def swap(i):\n", " r = get_rating()\n", " return (1, rand.randint(min_value, max_value), rand.randint(min_maturity*ppy, max_maturity*ppy)/ppy,\n", " rand.uniform(min_fixed, max_fixed), r, hazard[r], get_swap_cur())\n", "\n", "# vectorize to get functions to work with fromfunction\n", "vforex = np.vectorize(forex)\n", "vswap = np.vectorize(swap)\n", "\n", "# generate forex and swaps\n", "forexs = np.asarray(np.fromfunction(vforex, (n_forex,))).T\n", "swaps = np.asarray(np.fromfunction(vswap, (n_swaps,))).T\n", "\n", "# combine into one deal array\n", "deals = np.concatenate([forexs, swaps]) \n", "\n", "# print first row to show an example of one deal\n", "deals[0] # [type, value, maturity, fixed, rating, hazard, currency]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "# Writing the CVA function for Numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate standard normals and forex curves\n", "\n", "Here we pregenerate our standard normals and forex curves used for each time step in the simulation." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.08367271, 0.11661154],\n", " [ 0.05881636, 0.10628529],\n", " [ 0.08903457, 0.15040312],\n", " [ 0.06683134, 0.12186166],\n", " [ 0.1404406 , 0.04686952]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate standard normals\n", "znorms = rand.randn(sims, periods) # (simulations, max # of periods)\n", "\n", "# Generate forex curves\n", "def gen_curve(periods, xbar, alpha, vol, dt):\n", " curve = []\n", " sqrtdt = math.sqrt(dt)\n", " for i in range(periods):\n", " Z = rand.randn()\n", " cpast = 1 if i==0 else curve[i-1]\n", " curve.append(abs(alpha*xbar*cpast + vol*sqrtdt*(xbar+Z)))\n", " return curve\n", "\n", "# Generate forex curve for USD and EUR\n", "forex_curves = np.asarray(list(zip(gen_curve(periods, 0.1, 0.1, 0.3, 1/ppy), \n", " gen_curve(periods, 0.1, 0.1, 0.3, 1/ppy))))\n", "forex_curves[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CVA function\n", "\n", "This function will take the generated deals and curves and apply Monte Carlo to estimate CVA. It is pretty straight forward in that it loops through each deal, aggregating the individual CVA for each. Here, the returned value is a percentage of CVA relative to the total value of the deals. This provides a better way to look at than an arbitrary large number. With this value, we can say x% of our credit portfolio will likely be lost due to counterparty default. \n", "\n", "In this case, I use Numba's [`guvectorize`](http://numba.pydata.org/numba-doc/0.22.1/reference/jit-compilation.html#numba.guvectorize) to compile the function. The CVA function's arguments are the list of deals, parameters for the MC, and a scalar to store the result. `guvectorize` requires that the result be returned by modifying an input since it does not allow the function to return anything." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Vendor: Continuum Analytics, Inc.\n", "Package: numbapro\n", "Message: trial mode expires in 29 days\n" ] } ], "source": [ "from numbapro import guvectorize, float64\n", "\n", "\n", "def calc_CVA(deals, args, res):\n", " fxprice, vol, int_rate, recov_rate, sims = args # parse parameters\n", " \n", " # initialize variables\n", " CVA = 0.0\n", " tot_value = 0.0\n", " dt = 1/ppy\n", " sqrtdt = math.sqrt(dt)\n", " \n", " # calc CVA for each deal\n", " for k in range(len(deals)):\n", " deal_type, value, maturity, fixed, rating, hazard, currency = deals[k]\n", " tot_value += value\n", " \n", " # for each monte carlo simulation\n", " for i in range(int(sims)):\n", " xold = fxprice\n", " xnew = 0\n", " cva = 0.0\n", " \n", " # for each period\n", " exp_h_0 = 1 # j-1 exponential for q, pull out to optimize\n", " for j in range(1, int(maturity*ppy)-1): # skip t=0\n", " xnew = xold * (1 + vol * sqrtdt * znorms[i, j]) # fxprice at t=j\n", " exp_h_1 = math.exp(-hazard*j*dt)\n", " q = exp_h_0 - exp_h_1 # prob of default\n", " exp_h_0 = exp_h_1\n", " if deal_type == 0: # forex\n", " v = value * xnew * math.exp(-int_rate * j * dt)\n", " elif deal_type == 1: # swap\n", " v = value * (fixed - forex_curves[j, int(currency)]) * dt\n", " v = v if currency == 0 else v/xnew # convert to USD if EUR swap\n", " else:\n", " raise ValueError(\"Unknown deal type!\")\n", "\n", " cva += v*q\n", " CVA += cva/sims\n", " res[0] = (1-recov_rate)*CVA/tot_value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding numba\n", "\n", "Now we call the numba decorator to compile the cva function. Note, I call the decorator explicitly instead of using `@guvectorize` so I can preserve the pure python version to compare to the numba compiled versions. Also, note that I set `nopython=True` in the `cpu_calc` since the cpu target will fallback to python mode if there are compilation errors. The parallel and gpu target do not have this issue since they require compilation to work." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cpu_calc = guvectorize([(float64[:,:], float64[:], float64[:])], '(m,n),(k)->()', target='cpu', nopython=True)(calc_CVA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How is adding one line of code going to help? Let's find out..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Performance\n", "\n", "Below we see examples of this being ran with regular Python and numba targeting the cpu. In the 1000 simulation / 2000 deals scenario Python takes about 4 minutes compute in Python and about 2 seconds in numba. \n", "\n", "***That's a 200X speedup!!!***\n", "\n", "That is a remarkable difference. I didn't really have to change much either in order to get the function to compile with numba. The main challenge I encountered was setting the data types of the inputs correctly, namely enumerating string values.\n", "\n", "When I expand this to a 2,000,000 deal scenario, numba computes the CVA in about 4 minutes on my Macbook Air. There is still the opportunity to target the gpu and see how this would perform on a CUDA-capable machine." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "args = np.asarray([1.0, 0.3, 0.01, 0.9, sims]) # starting fxprice, vol, int_rate, simulations\n", "res = np.zeros(1)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 4min 9s, sys: 494 ms, total: 4min 9s\n", "Wall time: 4min 10s\n" ] }, { "data": { "text/plain": [ "0.024708592505027108" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time calc_CVA(deals, args, res)\n", "res[0]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.83 s, sys: 15.2 ms, total: 1.85 s\n", "Wall time: 1.85 s\n" ] }, { "data": { "text/plain": [ "0.024708592505027108" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time cpu_calc(deals, args, res)\n", "res[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Analysis\n", "\n", "What does the extra performance buy us?\n", "\n", "Since we can now calculate CVA 150 times faster, we can more easily compute CVA under different scenarios. For example, a risk officer wants to get an idea of CVA for a range of volatilities. Or a trader wants to rebalance the type of counterparty risk. All of these can be determined on the fly (relatively)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, what happens with different levels of the interest rate? As shown below, there is a clear behavior, with a slight convexity in the level of rates." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ " \n", "\n", "\n", " \n", "\n", "
\n", " \n", " BokehJS successfully loaded.\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "from bokeh.charts import Line, Scatter, Bar, show, output_notebook\n", "output_notebook()\n", "\n", "int_cva = []\n", "ints = np.arange(0, 0.10, 0.01)\n", "for i in ints:\n", " args = np.asarray([1.0, 0.3, i, 0.9, sims]) # starting fxprice, vol, int_rate, recov_rate, simulations\n", " res = np.zeros(1)\n", " cpu_calc(deals, args, res)\n", " int_cva.append(res[0])\n", " \n", "data = pd.DataFrame({'CVA': int_cva, 'int_rate': ints})\n", "line = Line(data, x='int_rate', y='CVA', title='CVA vs Interest Rate')\n", "show(line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Conclusion\n", "\n", "There are really two types of performance here. One is the faster calculation time. The other is the speed in writing this tool.\n", "\n", "Speed of implementation is one of the highlights of Python. Numba offers the ability to achieve C-level performance with minimal code. Traders can be impatient people. Taking advantage of numba, ipython, bokeh, and ipython widgets, an interactive tool can be created in hours. No C, no java, and very minimal html/js." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "hide_input": true }, "outputs": [ { "data": { "application/javascript": [ "$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%javascript\n", "$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')" ] } ], "metadata": { "hide_input": true, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }