{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Demo for integrating robust GD into neural net backprop\n", "\n", "This is the main notebook for the companion repository to our tutorial paper *Robust gradient descent via back-propagation: A Chainer-based tutorial*.\n", "\n", "#### Contents:\n", "\n", "1. Implementing our first example\n", "\n", "1. A complete demo: Iris data with noisy inputs\n", "\n", "    1. Prepare the data set\n", "    \n", "    1. Try out the learning algorithm(s)\n", "    \n", "    1. Visualization of learning efficiency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Implementing our first example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import chainer.computational_graph as cg\n", "import chainer.functions as cfn\n", "import chainer as ch\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Step by step forward computations.\n", "x1 = ch.Variable(np.array([1.5], dtype=np.float32))\n", "x2 = ch.Variable(np.array([3.0], dtype=np.float32))\n", "u1 = x1**3\n", "u2 = x1*x2\n", "u3 = u1/x2\n", "u4 = cfn.sin(u2)\n", "u5 = u3+u4" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "g = cg.build_computational_graph([u5])\n", "with open(\"simple_ex_cpgraph.dot\", \"w\") as o:\n", "    o.write(g.dump())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "! dot -Tpdf simple_ex_cpgraph.dot -o simple_ex_cpgraph.pdf\n", "! 
dot -Tpng simple_ex_cpgraph.dot -o simple_ex_cpgraph.png" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the computational graph below (refresh browser if required):\n", "\n", "![computational graph](simple_ex_cpgraph.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute $\\bar{u}_{i}$ and $\\bar{x}_{i}$ quantities manually:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Reverse mode computations (by hand)\n", "ub5 = ch.Variable(np.array([1.0], dtype=np.float32))\n", "ub4 = ub5\n", "ub3 = ub5\n", "ub2 = ub4*cfn.cos(u2)\n", "ub1 = ub3/x2\n", "xb2 = -ub3*u1/x2**2 + ub2*x1\n", "xb1 = ub2*x2 + ub1*3*x1**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now use Chainer to compute the same quantities automatically:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Reverse mode computations (by Chainer)\n", "u5.backward(retain_grad=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, compare the outputs:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ub5 = 1.0000 vs. u5.grad = 1.0000\n", "ub4 = 1.0000 vs. u4.grad = 1.0000\n", "ub3 = 1.0000 vs. u3.grad = 1.0000\n", "ub2 = -0.2108 vs. u2.grad = -0.2108\n", "ub1 = 0.3333 vs. u1.grad = 0.3333\n", "xb2 = -0.6912 vs. x2.grad = -0.6912\n", "xb1 = 1.6176 vs. x1.grad = 1.6176\n" ] } ], "source": [ "print(\"ub5 = {:5.4f} vs. u5.grad = {:5.4f}\".format(ub5.data[0],u5.grad[0]))\n", "print(\"ub4 = {:5.4f} vs. u4.grad = {:5.4f}\".format(ub4.data[0],u4.grad[0]))\n", "print(\"ub3 = {:5.4f} vs. u3.grad = {:5.4f}\".format(ub3.data[0],u3.grad[0]))\n", "print(\"ub2 = {:5.4f} vs. u2.grad = {:5.4f}\".format(ub2.data[0],u2.grad[0]))\n", "print(\"ub1 = {:5.4f} vs. u1.grad = {:5.4f}\".format(ub1.data[0],u1.grad[0]))\n", "print(\"xb2 = {:5.4f} vs. 
x2.grad = {:5.4f}\".format(xb2.data[0],x2.grad[0]))\n", "print(\"xb1 = {:5.4f} vs. x1.grad = {:5.4f}\".format(xb1.data[0],x1.grad[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## A complete demo: Iris data with noisy inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Prepare the data set\n", "\n", "First we acquire the famous Iris Data Set due to R.A. Fisher, included in the UCI Machine Learning Repository. We store the data in the directory `data/iris`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "! mkdir -p data\n", "! mkdir -p data/iris" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-04-09 09:33:08-- https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\n", "Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252\n", "Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 4551 (4.4K) [application/x-httpd-php]\n", "Saving to: ‘data/iris/iris.data’\n", "\n", "iris.data 100%[===================>] 4.44K --.-KB/s in 0s \n", "\n", "2019-04-09 09:33:08 (48.4 MB/s) - ‘data/iris/iris.data’ saved [4551/4551]\n", "\n", "--2019-04-09 09:33:08-- https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.names\n", "Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252\n", "Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 2998 (2.9K) [application/x-httpd-php]\n", "Saving to: ‘data/iris/iris.names’\n", "\n", "iris.names 100%[===================>] 2.93K --.-KB/s in 0s \n", "\n", "2019-04-09 09:33:09 (56.0 MB/s) - ‘data/iris/iris.names’ saved [2998/2998]\n", "\n" ] } ], "source": [ "! wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -P data/iris/\n", "! wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.names -P data/iris/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we organize a few basic facts about the dataset, chiefly for clerical purposes." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Clerical stuff.\n", "_n = 150\n", "_n_tr = 100\n", "_n_te = _n - _n_tr\n", "_nf = 4 # number of features to start.\n", "_nc = 3 # number of classes.\n", "_label_dict = {\"Iris-setosa\": 0,\n", " \"Iris-versicolor\": 1,\n", " \"Iris-virginica\": 2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Beware that this `iris.data` file contains one more line than is required (since there should be only 150 points). The 151st line is blank, and so we can just skip it." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "151 data/iris/iris.data\r\n" ] } ], "source": [ "! wc -l data/iris/iris.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we use the `csv` module to read the text file and convert the data into a `numpy` array." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import os\n", "import csv\n", "import numpy as np\n", "from sklearn.decomposition import PCA\n", "\n", "toread = os.path.join(\"data\", \"iris\", \"iris.data\")\n", "\n", "X = np.zeros((_n,_nf), dtype=np.float32)\n", "y = np.zeros((_n,1), dtype=np.int8)\n", "\n", "with open(toread, newline=\"\") as f_table:\n", "    \n", "    f_reader = csv.reader(f_table, delimiter=\",\")\n", "    \n", "    i = 0\n", "    for line in f_reader:\n", "        X[i,:] = np.array(line[0:-1],\n", "                          dtype=X.dtype)\n", "        y[i,:] = np.array(_label_dict[line[-1]],\n", "                          dtype=y.dtype)\n", "        i += 1\n", "        \n", "        if i >= _n: # to skip the final blank line.\n", "            break\n", " " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Shuffle up the data set (+ PCA dim reduction).\n", "my_pca = PCA(n_components=2, whiten=True) # PCA setup.\n", "shufidx = np.random.choice(X.shape[0], size=X.shape[0], replace=False)\n", "X = my_pca.fit_transform(X[shufidx,:]) # PCA mapping.\n", "y = y[shufidx,:]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Normalize the inputs in a per-feature manner (as max/min are vecs).\n", "maxvec = np.max(X, axis=0)\n", "minvec = np.min(X, axis=0)\n", "X = X-minvec\n", "with np.errstate(divide=\"ignore\", invalid=\"ignore\"):\n", "    X = X / (maxvec-minvec)\n", "    X[X == np.inf] = 0\n", "    X = np.nan_to_num(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This ends our preparation of the dimension-reduced dataset; random perturbations to the inputs will be done at runtime." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Try out the learning algorithm(s)\n", "\n", "Let us call the demonstration experiment here `iris_noisyinputs`. We make a directory for saving results." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import math\n", "import chainer as ch\n", "import get_model as gm\n", "import helpers as hlp\n", "import robustify as rob\n", "\n", "# Prepare a results directory.\n", "_task_name = \"iris_noisyinputs\"\n", "towrite = os.path.join(\"results\", _task_name)\n", "hlp.makedir_safe(dirname=towrite)\n", "\n", "# Data modification parameters.\n", "_rad_factor = 1e+4 # coefficient controlling how far to push data.\n", "_frac_to_move = 0.02 # the fraction of the training data to perturb." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Model/algorithm parameters.\n", "_num_trials = 20\n", "_t_max = 1000\n", "_num_records = 1000+1 # number of times to record performance.\n", "_lr = 0.5 # learning rate for steepest descent type of update.\n", "_num_units = 10 # number of units in each hidden layer.\n", "_robfn = rob.softmean # specification of robustifier.\n", "_todiv = _t_max // (_num_records-1)\n", "_mod_names = [\"deep\", \"deep-rob\"]\n", "_paras = {\"num_units\": _num_units,\n", "          \"robfn\": _robfn}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Working on deep...\n", "Trial number 0.\n", "Trial number 10.\n", "Done.\n", "Working on deep-rob...\n", "Trial number 0.\n", "Trial number 10.\n", "Done.\n" ] } ], "source": [ "# Main loop over methods.\n", "\n", "for mod_name in _mod_names:\n", "    \n", "    print(\"Working on {}...\".format(mod_name))\n", "    \n", "    # Prepare storage for performance metrics.\n", "    loss_tr = np.zeros((_num_trials,_num_records), dtype=np.float32)\n", "    loss_te = np.zeros((_num_trials,_num_records), dtype=np.float32)\n", "    costs = np.zeros((_num_records,), dtype=np.uint32)\n", "    \n", "    for tri in range(_num_trials):\n", "        \n", "        if tri % 10 == 0:\n", "            print(\"Trial number {}.\".format(tri))\n", "        \n", "        cost_counter = 
0\n", "        \n", "        shufidx = np.random.choice(_n, size=_n, replace=False)\n", "        X_tr = X[shufidx[0:_n_tr],:]\n", "        y_tr = y[shufidx[0:_n_tr],:]\n", "        X_te = X[shufidx[_n_tr:],:]\n", "        y_te = y[shufidx[_n_tr:],:]\n", "        \n", "        # Add input noise randomly.\n", "        x_center = np.mean(X_tr, axis=0)\n", "        x_sd = np.std(\n", "            np.sqrt(np.sum((X_tr-x_center)**2, axis=1))\n", "        )\n", "        idx_to_move = np.random.choice(_n_tr,\n", "                                       size=math.ceil(_frac_to_move*_n_tr),\n", "                                       replace=False)\n", "        r_noise = _rad_factor * x_sd\n", "        for i in idx_to_move:\n", "            x_pre = X_tr[i,:]\n", "            r_ratio = r_noise / np.sqrt(np.sum((x_pre-x_center)**2))\n", "            X_tr[i,:] = -x_pre*r_ratio + (1+r_ratio)*x_center\n", "        \n", "        # Preparation of model and optimizer objects.\n", "        mod = gm.get_model(mod_name=mod_name,\n", "                           nf=X.shape[1], nc=_nc,\n", "                           paras=_paras)\n", "        opt = ch.optimizers.SGD(lr=_lr)\n", "        opt.setup(mod)\n", "        \n", "        # Finally, run the learning algorithm for the initialized models.\n", "        tval = 0\n", "        idx = 0\n", "        while tval < _t_max:\n", "\n", "            # Evaluate current state.\n", "            loss = ch.functions.softmax_cross_entropy(\n", "                x=mod(X_tr),\n", "                t=y_tr.flatten(),\n", "                normalize=True,\n", "                reduce=\"mean\"\n", "            )\n", "\n", "            # Get gradients.\n", "            mod.cleargrads()\n", "            loss.backward()\n", "\n", "            # Parameter update.\n", "            opt.update()\n", "\n", "            # Cost update (assumes gradient descent using full batch).\n", "            if tri == 0:\n", "                cost_counter += X_tr.shape[0]\n", "\n", "            # Record performance for the given iteration.\n", "            if tval % _todiv == 0:\n", "                \n", "                if tri == 0:\n", "                    costs[idx] = cost_counter\n", "                \n", "                acc_tr = ch.functions.accuracy(y=mod(X_tr),\n", "                                               t=y_tr.flatten()).data\n", "                acc_te = ch.functions.accuracy(y=mod(X_te),\n", "                                               t=y_te.flatten()).data\n", "                loss_tr[tri,idx] = 1.0-acc_tr\n", "                loss_te[tri,idx] = 1.0-acc_te\n", "                \n", "                idx += 1\n", "\n", "            tval += 1\n", "        \n", "        # Record final performance if not done so already.\n", "        if idx < _num_records:\n", "            \n", "            if tri 
== 0:\n", "                costs[idx] = cost_counter\n", "            \n", "            acc_tr = ch.functions.accuracy(y=mod(X_tr),\n", "                                           t=y_tr.flatten()).data\n", "            acc_te = ch.functions.accuracy(y=mod(X_te),\n", "                                           t=y_te.flatten()).data\n", "            loss_tr[tri,idx] = 1.0-acc_tr\n", "            loss_te[tri,idx] = 1.0-acc_te\n", "            \n", "            idx += 1\n", "    \n", "    # After running all trials, take stats and arrange into perf mtx.\n", "    ave_loss_tr = np.mean(loss_tr, axis=0)\n", "    ave_loss_te = np.mean(loss_te, axis=0)\n", "    ave_loss = np.vstack((ave_loss_tr,ave_loss_te)).T # transpose it.\n", "\n", "    # Write to disk.\n", "    np.savetxt(fname=os.path.join(towrite, (mod_name+\".ave\")),\n", "               X=ave_loss, fmt=\"%.7e\", delimiter=\",\")\n", "    np.savetxt(fname=os.path.join(towrite, (mod_name+\".cost\")),\n", "               X=costs, fmt=\"%d\", delimiter=\",\")\n", "    \n", "    del mod, opt\n", "    \n", "    print(\"Done.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Visualization of learning efficiency" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "import matplotlib.pyplot as plt\n", "\n", "_fontsize = \"xx-large\"\n", "_task_name = \"iris_noisyinputs\"\n", "_mth_names = [\"deep\", \"deep-rob\"]\n", "_mth_colors = [\"black\", \"blue\"]\n", "_mth_markers = [\"-\", \"-\"]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "myfig = plt.figure(figsize=(12,6))\n", "\n", "perf_idx = 0\n", "ax1 = myfig.add_subplot(1,2,1)\n", "for i in range(len(_mth_names)):\n", "    \n", "    mth_name = _mth_names[i]\n", "    mth_color = _mth_colors[i]\n", "    \n", "    toread = os.path.join(\"results\", _task_name)\n", "    \n", "    # Read in all relevant performance.\n", "    with open(os.path.join(toread, (mth_name+\".ave\")), mode=\"r\", encoding=\"ascii\") as f:\n", "        tmp_ave = np.loadtxt(fname=f, dtype=float, usecols=perf_idx, delimiter=\",\")\n", "    \n", "    with open(os.path.join(toread, (mth_name+\".cost\")), mode=\"r\", encoding=\"ascii\") as f:\n", "        tmp_cost = np.loadtxt(fname=f, dtype=np.uint, delimiter=\",\")\n", "    \n", "    ax1.plot(tmp_cost, tmp_ave, _mth_markers[i],\n", "             color=mth_color,\n", "             label=mth_name) # option: semilogy with basey=10.\n", "    ax1.tick_params(labelsize=_fontsize)\n", "\n", "ax1.legend(loc=\"best\",ncol=1, fontsize=_fontsize)\n", "plt.title(\"Misclass rate (train)\", size=_fontsize)\n", "plt.xlabel(\"Cost in gradients\", size=_fontsize)\n", "\n", "perf_idx = 1\n", "ax2 = myfig.add_subplot(1,2,2, sharey=ax1)\n", "for i in range(len(_mth_names)):\n", "    \n", "    mth_name = _mth_names[i]\n", "    mth_color = _mth_colors[i]\n", "    \n", "    #print(\"Performance evaluation:\", mth_name)\n", "    \n", "    toread = os.path.join(\"results\", _task_name)\n", "    \n", "    # Read in all relevant performance.\n", "    with open(os.path.join(toread, (mth_name+\".ave\")), mode=\"r\", encoding=\"ascii\") as f:\n", "        tmp_ave = np.loadtxt(fname=f, dtype=float, usecols=perf_idx, delimiter=\",\")\n", "    \n", "    with open(os.path.join(toread, (mth_name+\".cost\")), mode=\"r\", encoding=\"ascii\") as f:\n", "        tmp_cost = np.loadtxt(fname=f, dtype=np.uint, delimiter=\",\")\n", "    \n", "    ax2.plot(tmp_cost, tmp_ave, _mth_markers[i],\n", "             color=mth_color,\n", "             label=mth_name)# option: semilogy with basey=10.\n", "    
ax2.tick_params(labelsize=_fontsize)\n", "\n", "plt.title(\"Misclass rate (test)\", size=_fontsize)\n", "plt.xlabel(\"Cost in gradients\", size=_fontsize)\n", "\n", "plt.savefig(fname=\"results_iris_noisyinputs.pdf\", bbox_inches=\"tight\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }