{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook focuses on the basics of using [MXNet](http://mxnet.io/) in [Julia](http://julialang.org/) to create a simple Multilayer Perceptron (MLP). This basic neural network building block is described more fully on [Wikipedia](https://en.wikipedia.org/wiki/Multilayer_perceptron). \n", "\n", "We will be making predictions on the famous MNIST data set, which is a labeled set of individual handwritten digits from zero to nine. A brief description is available on [Wikipedia](https://en.wikipedia.org/wiki/MNIST_database). The data is available at http://yann.lecun.com/exdb/mnist/, where there are also descriptions of the effectiveness of the different approaches applied to this data and references to related papers. Instead of this source, however, I will be using the data from [Kaggle](http://kaggle.com/) as I have learned a lot from the competitions there and like to use it as an example.\n", "\n", "I also intend to use this as a brief tutorial on looking at data using the [Julia](http://julialang.org/) language. It assumes some basic knowledge of Julia, although someone who knows Python or R could probably figure it all out." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Get data\n", "\n", "You can get the data from https://www.kaggle.com/c/digit-recognizer/data. The Kaggle website describes the files as follows:\n", "\n", ">The data files `train.csv` and `test.csv` contain gray-scale images of hand-drawn digits, from zero through nine.\n", "\n", ">Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. 
This pixel-value is an integer between 0 and 255, inclusive.\n", "\n", ">The training data set, `train.csv`, has 785 columns. The first column, called \"label\", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.\n", "\n", ">Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix, (indexing by zero).\n", "\n", ">For example, pixel31 indicates the pixel that is in the fourth column from the left, and the second row from the top, as in the ascii-diagram below.\n", "\n", ">Visually, if we omit the \"pixel\" prefix, the pixels make up the image like this:\n", "```\n", "000 001 002 003 ... 026 027\n", "028 029 030 031 ... 054 055\n", "056 057 058 059 ... 082 083\n", " | | | | ... | |\n", "728 729 730 731 ... 754 755\n", "756 757 758 759 ... 782 783 \n", "```\n", "The test data set, `test.csv`, is the same as the training set, except that it does not contain the \"label\" column.\n", "\n", ">Your submission file should be in the following format: For each of the 28000 images in the test set, output a single line with the digit you predict. For example, if you predict that the first image is of a 3, the second image is of a 7, and the third image is of a 8, then your submission file would look like:\n", "\n", ">3\n", "\n", ">7\n", "\n", ">8\n", "\n", ">(27997 more lines)\n", "\n", "\n", "Of course, we can learn a lot of this by just looking at the data. The code below assumes you downloaded the csv files and put them in a folder named `data`." 
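,
"\n",
"\n",
"The index rule quoted above can be checked with a minimal sketch. `pixel_position` is a hypothetical helper introduced here for illustration only (it is not part of the Kaggle files or of any package), and it assumes the column names follow the `pixelx` pattern described above.\n",
"\n",
"```julia\n",
"# hypothetical helper: map a column name such as \"pixel31\" to zero-based (row, column)\n",
"function pixel_position(name::AbstractString; width::Int = 28)\n",
"    x = parse(Int, replace(name, \"pixel\" => \"\"))  # strip the prefix, keep the integer x\n",
"    return divrem(x, width)                           # (i, j) with x == i * width + j\n",
"end\n",
"\n",
"pixel_position(\"pixel31\")  # (1, 3): second row from the top, fourth column from the left\n",
"```"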
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "using DataFrames, CSV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few comments for those new to Julia. Most people start working with the REPL, where any line you type is immediately evaluated and its result returned. Working with Jupyter notebooks using IJulia allows such results to be depicted graphically, and in the case of `DataFrames`, they are rendered in a nice tabular format. In this case, by ending the line with a semicolon, I suppress the normal output, as I don't want to clutter the notebook with a table that has 42000 rows in it. I encourage you to download this notebook and experiment with it, removing the semicolon and seeing what the raw data looks like. Of course, you can use functions like `first()` and `last()` to view just a few rows, as demonstrated below and in the [Stats](#Stats) section.\n", "\n", "I am also using the `@time` macro here, as I like to track how long different processes take and how much memory they use. You can delete the leading `@time` from any command line if you just want to execute the operation." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 6.867796 seconds (15.87 M allocations: 751.361 MiB, 6.01% gc time)\n" ] } ], "source": [ "@time train = CSV.read(\"data/train.csv\");" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

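<p>6 rows × 785 columns (omitted printing of 774 columns)</p>
<table>
<thead>
<tr><th></th><th>label</th><th>pixel0</th><th>pixel1</th><th>pixel2</th><th>pixel3</th><th>pixel4</th><th>pixel5</th><th>pixel6</th><th>pixel7</th><th>pixel8</th><th>pixel9</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>2</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>3</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>4</th><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>5</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>6</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
</tbody>
</table>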
" ], "text/latex": [ "\\begin{tabular}{r|cccccccccccc}\n", "\t& label & pixel0 & pixel1 & pixel2 & pixel3 & pixel4 & pixel5 & pixel6 & pixel7 & pixel8 & pixel9 & \\\\\n", "\t\\hline\n", "\t& Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & Int64 & \\\\\n", "\t\\hline\n", "\t1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\t2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\t3 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\t4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\t5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\t6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & $\\dots$ \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "6×785 DataFrame. Omitted printing of 777 columns\n", "│ Row │ label │ pixel0 │ pixel1 │ pixel2 │ pixel3 │ pixel4 │ pixel5 │ pixel6 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤\n", "│ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "│ 2 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "│ 3 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "│ 4 │ 4 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "│ 5 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "│ 6 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first(train, 6)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(42000, 785)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to get the size/shape of the DataFrame\n", "size(train)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, 
"jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "28×28 Array{Int64,2}:\n", " 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 15 94 89 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 89 220 253 251 214 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 240 253 253 253 218 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 … 253 253 253 250 95 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 195 80 94 131 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 25 0 0 0 0 0 0 0 0\n", " ⋮ ⋮ ⋱ ⋮ ⋮ \n", " 0 0 0 0 0 0 0 0 29 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 80 207 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 123 247 253 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 191 248 253 235 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 188 250 253 208 77 … 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 255 253 167 13 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 94 93 10 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# first element is label, rest is 28x28=784 pixel values, so to look at first row\n", "reshape([train[1,col] for col in 2:785], 28, 28)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several plotting libraries available for Julia. 
[Plots.jl](https://juliaplots.github.io/) is one of my favorites, as it is developing an excellent ecosystem that supports multiple backends, including [PyPlot](https://github.com/JuliaPy/PyPlot.jl), which is a wrapper around [Matplotlib](http://matplotlib.org/), and [Plotly](https://github.com/sglyon/PlotlyJS.jl), which is an interface to the [plotly.js](https://plot.ly/javascript) visualization library. Use StatsPlots if you want to plot a DataFrame directly." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Recompiling stale cache file /home/milton/.julia/compiled/v1.1/PyPlot/oatAj.ji for PyPlot [d330b81b-6aea-500a-939a-2ce795aea3ee]\n", "└ @ Base loading.jl:1184\n" ] }, { "data": { "text/plain": [ "Plots.PyPlotBackend()" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Plots\n", "# I like Plotly for interactivity. 
\n", "plotly(legend=false)\n", "# to use it on GitHub, you have to setup Plotly Online and inject their code into the notebook\n", "# so just switch to PyPlot\n", "pyplot()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 12.920942 seconds (27.09 M allocations: 1.324 GiB, 7.05% gc time)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAGQCAYAAAByNR6YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFPxJREFUeJzt3X+M1/Wd4PHXF0ZADhvuBmXPwPjdiUxJC9uhcxrOQ5DEZimFDVk0JouENiXTpvRY7zzdXvZue79ktcfBSTuu9rYxFbNTrJiz3a4RGpNSco21/Vos0drpDshMt/wod2RdBGSYz/3hOlEZTL/fvuDzneHx+Au+8Mr7la9fw5P3l5lvpSiKIgAASDOh7AUAAMYbgQUAkExgjVNvvPFG1Gq1eOONN8peBQAuOy1lL8DF8bOf/Sy6uroiYmJEVMpeB4CLpCjOlr0Co3CDBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1hN6PTp07Fq1aro6OiIzs7OWLZsWRw8eDAiIm655ZZob2+Pzs7O6OzsjK1bt5a7LABwnpayF2B03d3d8fGPfzwqlUp85Stfie7u7ti1a1dERGzbti1WrFhR8oYAwIW4wWpCU6ZMieXLl0elUomIiIULF0Z/f3/JWwEAvymBNQZs27YtVq5cOfLze+65J+bPnx933HGH8AKAJiSwmtymTZuir68v7rvvvoiI2L59e7zyyivx0ksvxc033+ytQgBoQpWiKIqyl2B0mzdvjm984xvx3e9+N6ZPnz7q75kyZUr88pe/jNbW1nc9XqvVoqurKyImRkTl4i8LQCmK4mzZKzAKN1hNasuWLdHb2xu7d+8eiauhoaE4cuTIyO/ZuXNnzJw587y4AgDK5asIm9Dg4GDcfffd0d7eHkuXLo2IiMmTJ8dzzz0Xn/jEJ+LMmTMxYcKEmDFjRnzrW98qeVsA4L0EVhOaNWtWXOid2x/96EeXeBsAoF7eIgQASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgWUvZC0C9/vx3u+ueGY5KQ2f9p4En6545O3S8obMAGD/cYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJGspewGo1z39N9U9M3zuTENndX/2ubpn2h6b3NBZp978u4bmAGg+brAAAJIJLACAZAILACCZw
AIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIVimKoih7CfLVarXo6uqKiIkRUSl7nVSn/3xq3TPD/+a/NnRWS8u0umcm/O9/19BZf7xhbd0zPUceaugsYPwoirNlr8Ao3GABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACSrFEVRlL0E+Wq1WnR1dUXExIiolL1O6c5sntzQ3NnP/VndM5MmzWjorDcHn6175oHF1zR01v88urvumddP/21DZwEXV1GcLXsFRuEGCwAgmcBqQqdPn45Vq1ZFR0dHdHZ2xrJly+LgwYMREXH06NFYtmxZzJkzJ+bNmxd79+4td1kA4DwCq0l1d3fHq6++Gj/5yU9ixYoV0d3dHRERX/jCF2LhwoXR19cXjz76aKxZsyaGhoZK3hYAeCeB1YSmTJkSy5cvj0rlrX87tXDhwujv74+IiCeeeCI2bNgQERE33HBDzJw50y0WADQZgTUGbNu2LVauXBnHjx+P4eHhuPrqq0d+rVqtxqFDh0rcDgB4r5ayF+D9bdq0Kfr6+uLhhx+OU6dOjdxqvc0XgQJA83GD1cQ2b94cTz31VDzzzDMxderUaG1tjYiIY8eOjfye1157Ldra2spaEQAYhcBqUlu2bIne3t7YvXt3TJ8+feTx22+/PXp6eiIi4oUXXojDhw/HokWLyloTABiFtwib0ODgYNx9993R3t4eS5cujYiIyZMnx/PPPx8PPPBArF27NubMmROTJk2K7du3R0uL/4wA0Ez8ydyEZs2adcF/WzVz5szYtWvXJd4IAKiHtwgBAJIJLACAZD7seZzyYc85/rrr9rpnluyZ19BZkyf/TkNzjXj9X99f90zrXxy4CJsAvy0f9tyc3GABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACRrKXsBaGYrfvzNumfuuv7qhs7a/OB/r3tm+A//R0NnTfqP/6rumYH/94GGztq6p/6ztvzdQw2dBdAs3GABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQrFIURVH2EuSr1WrR1dUVERMjolL2OvwGZk1bXPfMwcf2NXTW8B98qaG5Rpw+/kLdMzNn/01DZ7157vWG5s41OAfNoCjOlr0Co3CDBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQrKXsBYC3DP7DnrpnPrG2u6Gz/vrvJzY014grZyyse+bvT9U/ExERf/nZhsamfO6qumfOnXu9obOAy4MbLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZC1lLwA07tmTX21o7kvXV+qe+ZOtTzR01vCK+xuaa0Rl/cMNzZ0a+mzdM3f9l881dNZDRx5qaA4YW9xgAQAkE1hNaOPGjVGtVqNSqcT+/ftHHq9WqzF37tzo7OyMzs7O2LFjR4lbAgAX4i3CJnTbbbfFvffeG4sWLTrv15588smYN29eCVsBAL8pgdWEFi9eXPYKAMBvwVuEY8yaNWti/vz5sX79+jh27FjZ6wAAoxBYY8iePXti3759UavVorW1NdatW1f2SgDAKLxFOIa0tbVFRMQVV1wRd911V3R0dJS8EQAwGjdYY8TJkyfjxIkTIz/v7e2NBQsWlLgRAHAhbrCa0IYNG+Lpp5+Ow4cPx6233hrTpk2LXbt2xerVq+PcuXNRFEW0t7fHY489VvaqAMAoBFYT6unpiZ6envMef/HFF0vYBgCol7cIAQCSCSwAgGSVoiiKspcgX61Wi66uroiYGBH1f7AvvNeMq
Z0NzR1+/EDdM8N/8KWGzqpUJjY0VxTn6p55881fN3TWgx8+VPfMnx54pKGzuDwUxdmyV2AUbrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJJViqIoyl6CfLVaLbq6uiJiYkRUyl6Hy9i0yb9b98xVV/xOQ2cN/OW+huaGb9va0FwjTp88WPfM5P+1paGzZn/x9+qeOXryhYbOojxFcbbsFRiFGywAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACS+bDnccqHPXM5mn7lhxqa61v7Zt0z/3TRaw2dNfxHX2loriF/saHukXs3dTd01MwpQ3XP/Mt//quGzrrl/3y7obnxyoc9Nyc3WAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAySpFURRlL0G+Wq0WXV1dETExIiplrwPjzvQrP9TQ3OE/O1j3zIR7tjZ0VrMbGvqHhua+/KFX6575k/6vNnTWWFAUZ8tegVG4wQIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZD3sep3zYMzSnfzb19+qe+W+z/0VDZ925p/6/Q09pvaGhsy6pvQ/UPTLplv6LsEhz8GHPzckNFgBAMoEFAJBMYDWhjRs3RrVajUqlEvv37x95vK+vL2666abo6OiIG2+8MV5++eUStwQALkRgNaHbbrst9u7dG9ddd927Hv/MZz4T3d3d8fOf/zzuvffe+PSnP13ShgDA+xFYTWjx4sUxa9asdz129OjRqNVqceedd0ZExOrVq+PAgQNx8ODBEjYEAN6PwBojBgYG4tprr42WlpaIiKhUKtHW1haHDh0qeTMA4L0E1hhSqbz72y34DhsA0JwE1hgxe/bsGBwcjKGhoYh4K64GBgaira2t5M0AgPcSWGPENddcEwsWLIjHH388IiJ27twZ1Wo1qtVquYsBAOcRWE1ow4YNMWvWrBgcHIxbb701rr/++oiIeOSRR+KRRx6Jjo6OuP/+++NrX/tayZsCAKNpKXsBztfT0xM9PT3nPf7BD34wfvCDH5SwEQBQDzdYAADJBBYAQDJvEQJcQv/3jZfqnvncq/XPREQcueGzdc/86V9taeisWPhv6x4ZHj7d0FETT59qaA4uJTdYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJKsURVGUvQT5arVadHV1RcTEiKiUvQ4wRvyTydc1NPcfrv39umdOn2vs7/j/+dDDDc2NV0VxtuwVGIUbLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZC1lLwBA8zh55rWG5v79ga8mbwJjmxssAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJrDGoWq3G3Llzo7OzMzo7O2PHjh1lrwQAvENL2QvQmCeffDLmzZtX9hoAwCjcYAEAJBNYY9SaNWti/vz5sX79+jh27FjZ6wAA7yCwxqA9e/bEvn37olarRWtra6xbt67slQCAd6gURVGUvQSN+9WvfhUdHR3x+uuvv+vxWq0WXV1dETExIiql7AbAxVcUZ8tegVG4wRpjTp48GSdOnBj5eW9vbyxYsKDEjQCA9/JVhGPMkSNHYvXq1XHu3LkoiiLa29vjscceK3stAOAdBNYY0
97eHi+++GLZawAA78NbhAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJCspewFuDhOnTr1jz8qSt0DgIurVqvF3LlzY+rUqWWvwjsIrHHq4MGD//ij4TLXAOAi6+rqih//+Mfx0Y9+tOxVeIdKURSuOMahX//61/Hss89GtVqNK6+8sux1ALiI3GA1H4EFAJDMP3IHAEgmsAAAkgksxoVqtRpz586Nzs7O6OzsjB07dpS90iW1cePGqFarUalUYv/+/SOP9/X1xU033RQdHR1x4403xssvv1zilpfGhZ6Ly/U1cvr06Vi1alV0dHREZ2dnLFu2bOSLYI4ePRrLli2LOXPmxLx582Lv3r3lLnsJvN/zccstt0R7e/vIa2Tr1q3lLsvYVsA4cN111xU//elPy16jNN/73veKgYGB856HpUuXFo8++mhRFEXxzW9+s1i4cGFJG146F3ouLtfXyKlTp4rvfOc7xfDwcFEURfHlL3+5+NjHPlYURVF86lOfKr74xS8WRVEUP/zhD4u2trbi7NmzZa16Sbzf87FkyZLi29/+dpnrMY64wYJxYPHixTFr1qx3PXb06NGo1Wpx5513RkTE6tWr48CBA+/4Fh7j02jPxeVsypQpsXz58qhUKhERsXDhwujv74+IiCeeeCI2bNgQERE33HBDzJw5c9zfYr3f8wGZBBbjxpo1a2L+/Pmxfv36OHbsWNnrlG5gYCCuvfbaaGl569vdVSqVaGtri0OHDpW8WXm8RiK2bdsWK1eujOPHj8fw8HBcffXVI79WrVYvu9fH28/H2+65556YP39+3HHHHcKL34rAYlzYs2dP7Nu3L2q1WrS2tsa6devKXqkpvP239LcVl/F3ZfEaidi0aVP09fXFfffdFxFeH+99PrZv3x6vvPJKvPTSS3HzzTfHihUrSt6QsUxgMS60tbVFRMQVV1wRd911V3z/+98veaPyzZ49OwYHB2NoaCgi3vrDc2BgYOS5utxc7q+RzZs3x1NPPRXPPPNMTJ06NVpbWyMi3nWT99prr102r4/3Ph8Rb/0/E/FWeH7+85+P/v7+OH78eJlrMoYJLMa8kydPxokTJ0Z+3tvbGwsWLChxo+ZwzTXXxIIFC+Lxxx+PiIidO3dGtVqNarVa7mIluNxfI1u2bIne3t7YvXt3TJ8+feTx22+/PXp6eiIi4oUXXojDhw/HokWLylrzkhnt+RgaGoojR46M/J6dO3fGzJkzR0IU6uU7uTPm9ff3x+rVq+PcuXNRFEW0t7fHgw8+eFmFxIYNG+Lpp5+Ow4cPx4wZM2LatGnxi1/8Il599dX45Cc/GcePH48PfOAD8fWvfz0+/OEPl73uRTXac7Fr167L9jUyODgYs2fPjvb29rjqqqsiImLy5Mnx/PPPx5EjR2Lt2rVx4MCBmDRpUjz00EOxZMmSkje+uC70fDz33HOxZMmSOHPmTEyYMCFmzJgRW7ZsiY985CMlb8xYJbAAAJJ5ixAAIJnAAgBIJrAAAJL9f8nsY6ICWfsMAAAAAElFTkSuQmCC" }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# heatmap() plots a 2D array\n", "@time 
heatmap(reshape([train[1,col] for col in 2:785], 28, 28), aspect_ratio=:equal)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.071421 seconds (110.27 k allocations: 5.517 MiB)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAGQCAYAAAByNR6YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFEJJREFUeJzt3X+M1PeZ2PFn8PLzcERvsbexYD23Cht0gWbI1hbnYjCSoyMETkjYsnTYIrmgTXQbIXSuiatr66pX06Sl0JDgi9OLrIDVDY6x6uRybiByFcI1cpxsgoPsOGsBhs0dP8KJqsWA2d1v//B5he0FZSYPfGeX1+sv78Cjz6PxWH7zmWWnUhRFEQAApJlQ9gIAAOONwAIASCawxqk33ngj+vr64o033ih7FQC47rSUvQBXxy9+8Yvo6uqKiBsiolL2OgBcJUVxsewVGIUbLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcBqQufPn49Vq1ZFZ2dn1Gq1WLZsWRw5ciQiIu66667o6OiIWq0WtVottm7dWu6yAMB7tJS9AKPr7u6Oj33sY1GpVOLLX/5ydHd3x549eyIiYtu2bbFixYqSNwQALscNVhOaMmVKLF++PCqVSkRELFy4MA4dOlTyVgDAb0pgjQHbtm2LlStXjnz90EMPxfz58+O+++4TXgDQhARWk9u0aVP09/fHo48+GhERO3fujFdeeSVeeumluPPOO71VCABNqFIURVH2Eoxu8+bN8Y1vfCO+973vxYwZM0b9PVOmTIlf/epX0dra+o7H+/r6oqurKyJuiIjK1V8WgFIUxcWyV2AUbrCa1JYtW6K3tzf27t07EleDg4Nx4sSJkd+ze/fuaGtre09cAQDl8rcIm9DAwEA8+OCD0dHREUuXLo2IiMmTJ8fzzz8fH//4x+PChQsxYcKEmDlzZnzrW98qeVsA4N0EVhOaNWtWXO6d2x//+MfXeBsAoF7eIgQASCawAACSCSwAgGS+BwuAEef/47SG5loefqzumf/c8b8bOutfHf5qQ3NwLbnBAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJkPewYYpy5snlz3zNBn/6Khs4aHLtQ/E5WGzoKxwA0WAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAECylrIXAODK/rrr3obmLv5pre6ZSS3TGzrr/3xmc90z/+7YPzR0FowFbrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBI5sOeAa6hDe//07pnlu
xra+isSZNm1j0z4X/8y4bOat9xY90zFwdPN3QWjAVusAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkrWUvQDAWDRr+uKG5jZ/cUfdM8OT/0tDZ7058N26Zz7X80BDZ51787GG5mC8coMFAJBMYDWh8+fPx6pVq6KzszNqtVosW7Ysjhw5EhERJ0+ejGXLlsWcOXNi3rx5sX///nKXBQDeQ2A1qe7u7nj11VfjZz/7WaxYsSK6u7sjIuLhhx+OhQsXRn9/fzzxxBOxZs2aGBwcLHlbAOBSAqsJTZkyJZYvXx6VSiUiIhYuXBiHDh2KiIinnnoqenp6IiLitttui7a2NrdYANBkBNYYsG3btli5cmWcPn06hoeH46abbhr5tWq1GkePHi1xOwDg3fwtwia3adOm6O/vj6985Stx7ty5kVuttxVFUdJmAMDluMFqYps3b45nnnkmnnvuuZg2bVq0trZGRMSpU6dGfs/rr78e7e3tZa0IAIxCYDWpLVu2RG9vb+zduzdmzJgx8vi9994b27dvj4iIF198MY4fPx6LFi0qa00AYBTeImxCAwMD8eCDD0ZHR0csXbo0IiImT54cL7zwQnzhC1+IBx54IObMmROTJk2KnTt3RkuLf40A0Ez8n7kJzZo167LfW9XW1hZ79uy5xhsBAPXwFiEAQDKBBQCQzFuEwHXvD3+nu+6Z7+zY1dBZw3/U2Ac3N+ILi2+ue2b7CR/aDBncYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJGspewGALI/+3qcbmtv42m11zwwXXQ2ddeHkD+qeefMv/rahs/7rSX+GhrL4rw8AIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkPuwZaDozp9Uamvvc1qcamiui/g97btSZP/t53TOzv3H4KmwCXE1usAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkrWUvQAwvk2f/Ht1zxx/8nBDZw2v+E8NzVUamDl/+sWGztq67180MHWgobOA8rjBAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBI1lL2AsD4duPEf1r3zPAfPXwVNsnVNvtvGpo79+bfJW8CNCM3WAAAyQRWE1q/fn1Uq9WoVCpx8ODBkcer1WrMnTs3arVa1Gq12LVrV4lbAgCX4y3CJnTPPffExo0bY9GiRe/5taeffjrmzZtXwlYAwG9KYDWhxYsXl70CAPBb8BbhGLNmzZqYP39+rFu3Lk6dOlX2OgDAKATWGLJv3744cOBA9PX1RWtra6xdu7bslQCAUXiLcAxpb2+PiIiJEyfGhg0borOzs+SNAIDRuMEaI86ePRtnzpwZ+bq3tzcWLFhQ4kYAwOW4wWpCPT098eyzz8bx48fj7rvvjunTp8eePXti9erVMTQ0FEVRREdHR+zYsaPsVQGAUVSKoijKXoJ8fX190dXVFRE3RESl7HW4jr1/+h/UPfP6mU/kL3IFlcoNdc+8b+q/begsP8mdbEVxsewVGIW3CAEAkgksAIBkvgcL+I3MmPr7Dc0d+6sDdc8UDbxl91v5q8/UPfLm0NSrsAgwXrjBAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBI1lL2AsDY0P/Amw3NDd+ztf6hYqihs+LxnobGpq6fWvfM0ND/begs4PrgBgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkg
ksAIBkPuwZrkMzpv5+3TP/ZFF/Q2cNNzDz5pu/buisjf++u6G5oaHHGpoDuBw3WAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyVrKXgBo3O9O+2cNzf3dv3mt7pnhP/5yQ2edP3uk7pkv1f6hobMeO/F4Q3MA2dxgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkMyHPcMY9h9m//OG5iY81JO8yeVN/m9b6p7588MXrsImANeOGywAgGQCCwAgmcBqQuvXr49qtRqVSiUOHjw48nh/f3/ccccd0dnZGbfffnu8/PLLJW4JAFyOwGpC99xzT+zfvz9uvfXWdzz+6U9/Orq7u+OXv/xlbNy4MT71qU+VtCEAcCUCqwktXrw4Zs2a9Y7HTp48GX19fXH//fdHRMTq1avj8OHDceTIkRI2BACuRGCNEceOHYtbbrklWlre+ouflUol2tvb4+jRoyVvBgC8m8AaQyqVyju+LoqipE0AgCsRWGPE7NmzY2BgIAYHByPirbg6duxYtLe3l7wZAPBuAmuMuPnmm2PBggXx5JNPRkTE7t27o1qtRrVaLXcxAOA9BFYT6unpiVmzZsXAwEDcfffd8YEPfCAiIh5//PF4/PHHo7OzMz7/+c/H1772tZI3BQBG46NymtD27dtj+/bt73n8gx/8YPzwhz8sYSMAoB5usAAAkgksAIBk3iKEJvFI+2fqnrl/3+BV2OQy/rKnobHZjyxoYOrFhs4CaBZusAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEjmw54h2e9MvrWhuT//7/+r/qHWP2vorEZs3NTd0NzJs48lbwLQ/NxgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkayl7ARhv/vUtf9jY4MKu3EWStU0ZLHsFgDHDDRYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJfNgzJDs/1NifW4aHz9c9M2HClIbOGhz8f3XP/MH7/76hs+JwY2MAY5kbLACAZAILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZJWiKIqylyBfX19fdHV1RcQNEVEpex1+Axf/5/vrnimmTG3orK1/cnfdM5879NWGzgKurqK4WPYKjMINFgBAMoE1BlWr1Zg7d27UarWo1Wqxa9euslcCAC7RUvYCNObpp5+OefPmlb0GADAKN1gAAMkE1hi1Zs2amD9/fqxbty5OnTpV9joAwCUE1hi0b9++OHDgQPT19UVra2usXbu27JUAgEv4HqwxqL29PSIiJk6cGBs2bIjOzs6SNwIALuUGa4w5e/ZsnDlzZuTr3t7eWLBgQYkbAQDv5gZrjDlx4kSsXr06hoaGoiiK6OjoiB07dpS9FgBwCYE1xnR0dMRPf/rTstcAAK7AW4QAAMkEFgBAMoEFAJDM92BBk5i47O+v4WlfvYZnAVx/3GABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWA
AAyQQWAECylrIX4Oo4d+7cP/5TUeoeAFxdfX19MXfu3Jg2bVrZq3AJgTVOHTly5B//abjMNQC4yrq6uuInP/lJfOQjHyl7FS5RKYrCFcc49Otf/zq++93vRrVajalTp5a9DgBXkRus5iOwAACS+SZ3AIBkAgsAIJnAYlyoVqsxd+7cqNVqUavVYteuXWWvdE2tX78+qtVqVCqVOHjw4Mjj/f39cccdd0RnZ2fcfvvt8fLLL5e45bVxuefien2NnD9/PlatWhWdnZ1Rq9Vi2bJlI38J5uTJk7Fs2bKYM2dOzJs3L/bv31/ustfAlZ6Pu+66Kzo6OkZeI1u3bi13Wca2AsaBW2+9tfj5z39e9hql+f73v18cO3bsPc/D0qVLiyeeeKIoiqL45je/WSxcuLCkDa+dyz0X1+tr5Ny5c8V3vvOdYnh4uCiKovjSl75UfPSjHy2Koig++clPFo888khRFEXxox/9qGhvby8uXrxY1qrXxJWejyVLlhTf/va3y1yPccQNFowDixcvjlmzZr3jsZMnT0ZfX1/cf//9ERGxevXqOHz48CU/wmN8Gu25uJ5NmTIlli9fHpVKJSIiFi5cGIcOHYqIiKeeeip6enoiIuK2226Ltra2cX+LdaXnAzIJLMaNNWvWxPz582PdunVx6tSpstcp3bFjx+KWW26Jlpa3ftxdpVKJ9vb2OHr0aMmblcdrJGLbtm2xcuXKOH36dAwPD8dNN9008mvVavW6e328/Xy87aGHHor58+fHfffdJ7z4rQgsxoV9+/bFgQMHoq+vL1pbW2Pt2rVlr9QU3v5T+tuK6/insniNRGzatCn6+/vj0UcfjQivj3c/Hzt37oxXXnklXnrppbjzzjtjxYoVJW/IWCawGBfa29sjImLixImxYcOG+MEPflDyRuWbPXt2DAwMxODgYES89T/PY8eOjTxX15vr/TWyefPmeOaZZ+K5556LadOmRWtra0TEO27yXn/99evm9fHu5yPirf9mIt4Kz89+9rNx6NChOH36dJlrMoYJLMa8s2fPxpkzZ0a+7u3tjQULFpS4UXO4+eabY8GCBfHkk09GRMTu3bujWq1GtVotd7ESXO+vkS1btkRvb2/s3bs3ZsyYMfL4vffeG9u3b4+IiBdffDGOHz8eixYtKmvNa2a052NwcDBOnDgx8nt2794dbW1tIyEK9fKT3BnzDh06FKtXr46hoaEoiiI6Ojrii1/84nUVEj09PfHss8/G8ePHY+bMmTF9+vR47bXX4tVXX41PfOITcfr06Xjf+94XX//61+NDH/pQ2eteVaM9F3v27LluXyMDAwMxe/bs6OjoiBtvvDEiIiZPnhwvvPBCnDhxIh544IE4fPhwTJo0KR577LFYsmRJyRtfXZd7Pp5//vlYsmRJXLhwISZMmBAzZ86MLVu2xIc//OGSN2asElgAAMm8RQgAkExgAQAkE1gAAMn+P8Rf9OpKjHrnAAAAAElFTkSuQmCC" }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to see it upright, use the rotl90() function to rotate left 90 degrees\n", "# there are also rotr90() and rot180() functions\n", "@time heatmap(rotl90(reshape([train[1,col] for col in 2:785], 28, 28)), aspect_ratio=:equal)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "" }, 
"execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can use an array comprehension to plot the first sixteen rows\n", "# the Plots library automatically resizes everything for you and adds the axes\n", "plot([heatmap(rotl90(reshape([train[i,col] for col in 2:785], 28, 28)), aspect_ratio=:equal) for i=1:16]...)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have confirmed that the data really are handwritten digits arranged as described in the data description, we will take a quick look at some summary statistics to make sure there is nothing strange." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "\"42000×785 DataFrame\"" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note that summary in Julia just gives basic information as it is supposed to work on ANY datatype\n", "summary(train)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "

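<p>785 rows × 8 columns</p>
<table>
<thead>
<tr><th></th><th>variable</th><th>mean</th><th>min</th><th>median</th><th>max</th><th>nunique</th><th>nmissing</th><th>eltype</th></tr>
<tr><th></th><th>Symbol</th><th>Float64</th><th>Int64</th><th>Float64</th><th>Int64</th><th>Nothing</th><th>Nothing</th><th>DataType</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>label</td><td>4.45664</td><td>0</td><td>4.0</td><td>9</td><td></td><td></td><td>Int64</td></tr>
<tr><th>2</th><td>pixel0</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>3</th><td>pixel1</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>4</th><td>pixel2</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>5</th><td>pixel3</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>6</th><td>pixel4</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>7</th><td>pixel5</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>8</th><td>pixel6</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>9</th><td>pixel7</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>10</th><td>pixel8</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>11</th><td>pixel9</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>12</th><td>pixel10</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>13</th><td>pixel11</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>14</th><td>pixel12</td><td>0.003</td><td>0</td><td>0.0</td><td>116</td><td></td><td></td><td>Int64</td></tr>
<tr><th>15</th><td>pixel13</td><td>0.0111905</td><td>0</td><td>0.0</td><td>254</td><td></td><td></td><td>Int64</td></tr>
<tr><th>16</th><td>pixel14</td><td>0.00514286</td><td>0</td><td>0.0</td><td>216</td><td></td><td></td><td>Int64</td></tr>
<tr><th>17</th><td>pixel15</td><td>0.000214286</td><td>0</td><td>0.0</td><td>9</td><td></td><td></td><td>Int64</td></tr>
<tr><th>18</th><td>pixel16</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>19</th><td>pixel17</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>20</th><td>pixel18</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>21</th><td>pixel19</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>22</th><td>pixel20</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>23</th><td>pixel21</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>24</th><td>pixel22</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>25</th><td>pixel23</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>26</th><td>pixel24</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>27</th><td>pixel25</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>28</th><td>pixel26</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>29</th><td>pixel27</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
<tr><th>30</th><td>pixel28</td><td>0.0</td><td>0</td><td>0.0</td><td>0</td><td></td><td></td><td>Int64</td></tr>
</tbody>
</table>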
" ], "text/latex": [ "\\begin{tabular}{r|cccccccc}\n", "\t& variable & mean & min & median & max & nunique & nmissing & eltype\\\\\n", "\t\\hline\n", "\t& Symbol & Float64 & Int64 & Float64 & Int64 & Nothing & Nothing & DataType\\\\\n", "\t\\hline\n", "\t1 & label & 4.45664 & 0 & 4.0 & 9 & & & Int64 \\\\\n", "\t2 & pixel0 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t3 & pixel1 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t4 & pixel2 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t5 & pixel3 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t6 & pixel4 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t7 & pixel5 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t8 & pixel6 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t9 & pixel7 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t10 & pixel8 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t11 & pixel9 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t12 & pixel10 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t13 & pixel11 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t14 & pixel12 & 0.003 & 0 & 0.0 & 116 & & & Int64 \\\\\n", "\t15 & pixel13 & 0.0111905 & 0 & 0.0 & 254 & & & Int64 \\\\\n", "\t16 & pixel14 & 0.00514286 & 0 & 0.0 & 216 & & & Int64 \\\\\n", "\t17 & pixel15 & 0.000214286 & 0 & 0.0 & 9 & & & Int64 \\\\\n", "\t18 & pixel16 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t19 & pixel17 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t20 & pixel18 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t21 & pixel19 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t22 & pixel20 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t23 & pixel21 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t24 & pixel22 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t25 & pixel23 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t26 & pixel24 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t27 & pixel25 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t28 & pixel26 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t29 & pixel27 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t30 & pixel28 & 0.0 & 0 & 0.0 & 0 & & & Int64 \\\\\n", "\t$\\dots$ & $\\dots$ & 
$\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "785×8 DataFrame. Omitted printing of 1 columns\n", "│ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │\n", "│ │ \u001b[90mSymbol\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mNothing\u001b[39m │ \u001b[90mNothing\u001b[39m │\n", "├─────┼──────────┼────────────┼───────┼─────────┼───────┼─────────┼──────────┤\n", "│ 1 │ label │ 4.45664 │ 0 │ 4.0 │ 9 │ │ │\n", "│ 2 │ pixel0 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 3 │ pixel1 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 4 │ pixel2 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 5 │ pixel3 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 6 │ pixel4 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 7 │ pixel5 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 8 │ pixel6 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 9 │ pixel7 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 10 │ pixel8 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "⋮\n", "│ 775 │ pixel773 │ 0.340214 │ 0 │ 0.0 │ 255 │ │ │\n", "│ 776 │ pixel774 │ 0.219286 │ 0 │ 0.0 │ 254 │ │ │\n", "│ 777 │ pixel775 │ 0.117095 │ 0 │ 0.0 │ 254 │ │ │\n", "│ 778 │ pixel776 │ 0.0590238 │ 0 │ 0.0 │ 253 │ │ │\n", "│ 779 │ pixel777 │ 0.0201905 │ 0 │ 0.0 │ 253 │ │ │\n", "│ 780 │ pixel778 │ 0.0172381 │ 0 │ 0.0 │ 254 │ │ │\n", "│ 781 │ pixel779 │ 0.00285714 │ 0 │ 0.0 │ 62 │ │ │\n", "│ 782 │ pixel780 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 783 │ pixel781 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 784 │ pixel782 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │\n", "│ 785 │ pixel783 │ 0.0 │ 0 │ 0.0 │ 0 │ │ │" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note that summary in Julia just gives basic information as it is supposed to work on ANY datatype\n", "# those who are used to R may be looking for the describe() function which is for DataFrames\n", "describe(train)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "jupyter": { 
"outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Summary Stats:\n", "Length: 42000\n", "Missing Count: 0\n", "Mean: 97.163476\n", "Minimum: 0.000000\n", "1st Quartile: 0.000000\n", "Median: 35.000000\n", "3rd Quartile: 231.000000\n", "Maximum: 255.000000\n", "Type: Int64\n" ] } ], "source": [ "# the describe() function gives basic stats\n", "# you could describe(train) to do the entire DataFrame, but that would take a lot of space here\n", "# column 406 holds pixel404, because column 1 is the label and column 2 is pixel0\n", "describe(train[:,406])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Summary Stats:\n", "Length: 42000\n", "Missing Count: 0\n", "Mean: 97.163476\n", "Minimum: 0.000000\n", "1st Quartile: 0.000000\n", "Median: 35.000000\n", "3rd Quartile: 231.000000\n", "Maximum: 255.000000\n", "Type: Int64\n" ] } ], "source": [ "# can also specify a column in a DataFrame with a Symbol\n", "# Symbols in Julia are written with a leading colon (:pixel404) and are interned when created\n", "# I just picked 404 because it is near the center of the 28x28 image\n", "describe(train[!, :pixel404])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "" }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# see the distribution of values in that column\n", "histogram(train[!, :pixel404], bins=20)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "" }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a point a little more to the edge has a lot more zeros\n", "histogram(train[!, :pixel397], bins=20)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { 
"collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAGQCAYAAAByNR6YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAEpJJREFUeJzt3W9snXXZwPGr7cbGMgdZKwQcbSFSB5XR0mkmEQx/AwQSdb4wEYE4YoyoEQ1/XoCJcQ4N0RdACJBnLhDIgoQsJEBYFJ3GBA3QMBE2MsY6R3YKq4Mtg23P1t7Piz00VEDmep1z99z9fBKS08J2Xxdjh++5z29tS1EURQAAkKa17AEAAKpGYAEAJDuiwHr33XdjcHAw3n333ex5AACa3hEF1saNG2NgYCA2btyYPc9h27VrV2nXrqeq7hVht2ZU1b0iqrtbVfeKqO5uVd0rotq7fZymfYtwdHS07BHqoqp7RditGVV1r4jq7lbVvSKqu1tV94qo9m4fp2kDCwBgqhJYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkGxG2QMw0fDwcAwNDTXseh0dHdHZ2dmw6wHAdCCwppBarRa9vb0Nvebso+fEKxs3iCwASCSwppBarXbowTfviejsr/8FhzfEvpXXxMjIiMACgEQCayrq7I/oakBgAQB14ZA7AEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAshllDwAAh2N4eDiGhoYacq2Ojo7o7OxsyLWoJoEFwJRXq9Wit7e3YdebffSceGXjBpHFERNYAEx5tVrt0INv3hPR2V/fiw1viH0rr4mRkRGBxRETWAA0j87+iK46BxYkEFiQwNkQAN5PYMEkORsCwL8TWDBJzoYA8O8EFmRxNgSA/+cLjQIAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAk85XcaRjfEBlgepnOz/sCi4bwDZEBppfp/rwvsGgI3xAZYHqZ7s/7AovG8g2Roa6m81syTFHT9Hm/aQPLkwjARNP9LRmYSpoysDyJAHzQdH9LBqaSpg2siPAkAvBhpulbMjCVNGVgjfMkAgBMQc0dWEBdOesIcGQEFvChnHWExmjkC5kIL2YaRWABH8pZR6i/Rr+QifBiplEEFvCfOesIddPQFzIRXsw0kMACgLJ5IVM5rWUPAABQNQILACCZwAIASCawAACSCSwAgGQCCwAgmcACAEgmsAAAkgksAIBkAgsAIJnAAgBIJrAAAJIJLACAZAILACCZwAIASCawAACSCSwAgGQzyh4AoAzDw8MxNDTUkGt1dHREZ2dnQ64FTA0CC5h2arVa9Pb2Nux6s4+eE69s3CCyYBoRWMC0U6vVDj345j0Rnf31vdjwhti38poYGRkRWDCNCCxg+ursj+iqc2AB05JD7gAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGA
BACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAMoEFAJBMYAEAJBNYAADJBBYAQDKBBQCQTGABACQTWAAAyQQWAEAygQUAkExgAQAkE1gAAMkEFgBAsgmBtWzZsmhpafnAXzt37ixrPgCApjMhsC677LJDn2xtjbPPPjtaWw/97ZkzZzZ+MgCAJjUhsNra2sYf33rrrVEURURE/P73v2/sVAAATWzG+z94763AsbGxuPTSS8c//9JLL8VXvvKVD/zgBx98MNatWzf+cV9fX5x//vl1GvVDDG9o2DXWrFkTg4ODdb3U1q1bJ1yz7qq6WwP3iqjublXdK6K6u1V1r4jq7uZ5P0mj/v39F1qK925TxaG7VsuXL4+2traYOXNm7Nu3LyIifvSjH8WvfvWr8R80ODgYAwMDH/oT3nTTTXUeOWL37t1x3/+sjNED/1v3azVcS2tEMVb2FPVht+ZT1b0iqrtbVfeKqO5uVd0roqG7tc08Kr597bKYN29eQ673737xi19M+HjCHazNmzdHRMTo6GiMjo6Of/7FF1/80J/s+uuvjwULFox/3Mg7WN/5znfi4MGDDblWrVaLWq3WkGvNnTs3enp6GnKtiOru1si9Iqq7W1X3iqjublXdK6K6u3nez9HR0RGdnZ0NudbhmHAH6+9//3uceeaZERFxxhlnxEsvvRRjY2Px3HPPTbhj9d4
drOeffz7OOuusxk8dh97OnD9/finXrqeq7hVht2ZU1b0iqrtbVfeKqO5uVd0rotq7fZwJd7BuvPHG8cfv3bWaM2fOR74dCADAB00IrKeeeqqsOQAAKsNXcgcASNaUgbV///745S9/Gfv37y97lFRV3SvCbs2oqntFVHe3qu4VUd3dqrpXRLV3OxwTDrkfrrIPue/evTuOOeaY2LVrV2l/HLMeqrpXhN2aUVX3iqjublXdK6K6u1V1r4hq73Y4mvIOFgDAVCawAACSzfj4f+SD9u7dGxERGzaU86Xp9+zZExERL7zwQsydO7eUGeqhqntF2K0ZVXWviOruVtW9Iqq7W1X3iqj2bh9l4cKFMWfOnIg4wjNYDz30UFx55ZXpgwEANKv3n00/osAaGRmJtWvXRnd3dxx99NHpAwIANJtJ38ECAOCjOeQOAJBMYAEAJGu6wNq0aVOcffbZ0dPTE5///Ofj5ZdfLnukFD/4wQ+iu7s7Wlpa4h//+EfZ46TZt29ffPnLX46enp7o6+uLSy65JIaGhsoeK83FF18cixYtir6+vjjnnHPihRdeKHukVD/96U8r999kd3d3LFy4MPr6+qKvry8efvjhskdKsX///vje974Xp556avT29lbmDyK9/fbb479WfX190dPTEzNmzIidO3eWPdqkrV27NgYGBqK/vz8++9nPxv3331/2SGmeeuqpWLx4cSxatCiWLFkS69evL3ukxiuazHnnnVesWrWqKIqieOSRR4olS5aUO1CSP/3pT8W2bduKrq6u4sUXXyx7nDR79+4tnnjiiWJsbKwoiqK48847i4suuqjkqfK89dZb44/XrFlT9Pf3lzhNrueff7645JJLis7Ozkr9N1m132Pv+eEPf1h8//vfH/+9tn379pInqo/bb7+9uPzyy8seY9LGxsaK+fPnF+vXry+Koii2bNlSzJo1q9i9e3fJk03ezp07i/b29uLll18uiqIo1q1bV/T29pY8VeM11R2sN998MwYHB8dfmS1dujS2bNlSiTsi5557bixYsKDsMdLNnj07LrvssmhpaYmIiCVLlsRrr71W8lR5jj322PHHu3btitbWpvot9ZH2798f1113Xdx9993jv3ZMXe+8806sWrUqVqxYMf7rdcIJJ5Q8VX2sWrUqli1bVvYYad5+++2IOPRtZdrb22PWrFklTzR5mzdvjuOOOy5OO+20iIj40pe+FFu3bo3BwcGSJ2uspvq/wbZt2+LEE0+MGTMOfX3UlpaW6OzsjH/+858lT8bhuuOOO+KKK64oe4xUV111VZx00klxyy23VOYW/09+8pO48sor4+STTy57lLr4xje+EWeccUZce+21sWPHjrLHmbTNmzdHe3t7LF++PBYvXhznnHNOPP3002WPle6ZZ56Jf/3rX3H55ZeXPcqktbS0xG9/+9v46le/Gl1dXfHFL34x7r///jjqqKPKHm3STj311NixY0f89a9/jYiINWvWxJ49eypxM+S/0VSBFREfeDVd+CoTTWPFihWxadOm+PnPf172KKkeeOCB2LZtWyxfvjxuuOGGsseZtGeeeSaeffbZ+O53v1v2KHXx5z//OdavXx+Dg4PR3t4eV199ddkjTdqBAwfitddei9NPPz2ee+65uOuuu+LrX/96JeLx/X7zm9/EVVddNf4iu5kdPHgwbrvttnjsscdi69at8fTTT8fVV19dibNlxxxzTDz66KNx8803x8DAQKxbty5OP/30mDlzZtmjNVbZ71H+N954441i3rx5xYEDB4qiOPQe9vHHH19s2bKl3MESVfV8yO23314MDAxMOLNURbNnzy5GRkbKHmNSbrvttuKEE04ourq6iq6urqKtra048cQTiyeffLLs0dJt3769mDt3btljTNqOHTuK1tbW4uDBg+Of+9znPlf88Y9/LG+oZHv27Ck+8YlPFBs2bCh7lBTPPvtscdppp0343OLFi4s//OEPJU1UP/v27SuOPfbYYtO
mTWWP0lBNdQfruOOOi/7+/njwwQcjIuLRRx+N7u7u6O7uLncw/qNf//rXsXr16vjd73434cxSs9u9e3ds3759/OM1a9ZEe3t7zJ8/v8SpJu/mm2+O7du3x9DQUAwNDcWCBQti7dq1cemll5Y92qS9884742deIiJWr14d/f39JU6Uo6OjIy644IJYu3ZtRERs3bo1tmzZEp/5zGdKnizPI488EosWLYqFCxeWPUqKk046KV5//fV45ZVXIiLi1Vdfjc2bN0dPT0/Jk+Wo1Wrjj3/2s5/F+eefH5/+9KdLnKjxmu4+67333hvXXHNNrFixIubNm1eZMy/XXXddPPbYYzE8PBwXXnhhzJ07N1599dWyx5q0119/PX784x/HKaecEuedd15ERMyaNSv+9re/lTzZ5O3atSuWLl0ae/fujdbW1vjkJz8Zjz/+uEPhU9gbb7wRS5cujdHR0SiKIk455ZR44IEHyh4rxT333BPf+ta34qabboq2tra47777KnXQfeXKlZU63H788cfHvffeG1/72teitbU1iqKIu+++Oz71qU+VPVqKW2+9Nf7yl7/EwYMH4wtf+EKsXLmy7JEazrfKAQBI1lRvEQIANAOBBQCQTGABACT7P8NKEf9tCA34AAAAAElFTkSuQmCC" }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# distribution of labels - seems fairly even\n", "histogram(train[!, :label], ticks=collect(0:9))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "" }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can plot different subsets to eyeball variation\n", "histogram([train[1:8400,:label] train[8401:16800,:label] train[16801:25200,:label] train[25201:33600,:label] train[33601:end,:label]], ticks=collect(0:9), legend=true, \n", "label=[\"first fifth\" \"second fifth\" \"third fifth\" \"fourth fifth\" \"last fifth\"], alpha=0.4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you really want to check stats, you might want to do a chi-square or Kolmogorov–Smirnov test to really see if the distribution is uniform. You can do these with the [HypothesisTests](https://github.com/JuliaStats/HypothesisTests.jl) package in Julia. Documentation is at https://juliastats.org/HypothesisTests.jl/stable/. The real focus of this notebook is MLP, so let us move on and start setting it up." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Preprocess data" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "search: \u001b[0m\u001b[1mp\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1mr\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1mu\u001b[22m\u001b[0m\u001b[1mt\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1md\u001b[22m\u001b[0m\u001b[1mi\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1ms\u001b[22m \u001b[0m\u001b[1mp\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1mr\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1mu\u001b[22m\u001b[0m\u001b[1mt\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1md\u001b[22m\u001b[0m\u001b[1mi\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1ms\u001b[22m! \u001b[0m\u001b[1mP\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1mr\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1mu\u001b[22m\u001b[0m\u001b[1mt\u001b[22m\u001b[0m\u001b[1me\u001b[22m\u001b[0m\u001b[1md\u001b[22mD\u001b[0m\u001b[1mi\u001b[22m\u001b[0m\u001b[1mm\u001b[22m\u001b[0m\u001b[1ms\u001b[22mArray\n", "\n" ] }, { "data": { "text/latex": [ "\\begin{verbatim}\n", "permutedims(A::AbstractArray, perm)\n", "\\end{verbatim}\n", "Permute the dimensions of array \\texttt{A}. 
\\texttt{perm} is a vector specifying a permutation of length \\texttt{ndims(A)}.\n", "\n", "See also: \\href{@ref}{\\texttt{PermutedDimsArray}}.\n", "\n", "\\section{Examples}\n", "\\begin{verbatim}\n", "julia> A = reshape(Vector(1:8), (2,2,2))\n", "2×2×2 Array{Int64,3}:\n", "[:, :, 1] =\n", " 1 3\n", " 2 4\n", "\n", "[:, :, 2] =\n", " 5 7\n", " 6 8\n", "\n", "julia> permutedims(A, [3, 2, 1])\n", "2×2×2 Array{Int64,3}:\n", "[:, :, 1] =\n", " 1 3\n", " 5 7\n", "\n", "[:, :, 2] =\n", " 2 4\n", " 6 8\n", "\\end{verbatim}\n", "\\rule{\\textwidth}{1pt}\n", "\\begin{verbatim}\n", "permutedims(m::AbstractMatrix)\n", "\\end{verbatim}\n", "Permute the dimensions of the matrix \\texttt{m}, by flipping the elements across the diagonal of the matrix. Differs from \\texttt{LinearAlgebra}'s \\href{@ref}{\\texttt{transpose}} in that the operation is not recursive.\n", "\n", "\\section{Examples}\n", "\\begin{verbatim}\n", "julia> a = [1 2; 3 4];\n", "\n", "julia> b = [5 6; 7 8];\n", "\n", "julia> c = [9 10; 11 12];\n", "\n", "julia> d = [13 14; 15 16];\n", "\n", "julia> X = [[a] [b]; [c] [d]]\n", "2×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [5 6; 7 8]\n", " [9 10; 11 12] [13 14; 15 16]\n", "\n", "julia> permutedims(X)\n", "2×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [9 10; 11 12]\n", " [5 6; 7 8] [13 14; 15 16]\n", "\n", "julia> transpose(X)\n", "2×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},2}}:\n", " [1 3; 2 4] [9 11; 10 12]\n", " [5 7; 6 8] [13 15; 14 16]\n", "\\end{verbatim}\n", "\\rule{\\textwidth}{1pt}\n", "\\begin{verbatim}\n", "permutedims(v::AbstractVector)\n", "\\end{verbatim}\n", "Reshape vector \\texttt{v} into a \\texttt{1 × length(v)} row matrix. 
Differs from \\texttt{LinearAlgebra}'s \\href{@ref}{\\texttt{transpose}} in that the operation is not recursive.\n", "\n", "\\section{Examples}\n", "\\begin{verbatim}\n", "julia> permutedims([1, 2, 3, 4])\n", "1×4 Array{Int64,2}:\n", " 1 2 3 4\n", "\n", "julia> V = [[[1 2; 3 4]]; [[5 6; 7 8]]]\n", "2-element Array{Array{Int64,2},1}:\n", " [1 2; 3 4]\n", " [5 6; 7 8]\n", "\n", "julia> permutedims(V)\n", "1×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [5 6; 7 8]\n", "\n", "julia> transpose(V)\n", "1×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},1}}:\n", " [1 3; 2 4] [5 7; 6 8]\n", "\\end{verbatim}\n" ], "text/markdown": [ "```\n", "permutedims(A::AbstractArray, perm)\n", "```\n", "\n", "Permute the dimensions of array `A`. `perm` is a vector specifying a permutation of length `ndims(A)`.\n", "\n", "See also: [`PermutedDimsArray`](@ref).\n", "\n", "# Examples\n", "\n", "```jldoctest\n", "julia> A = reshape(Vector(1:8), (2,2,2))\n", "2×2×2 Array{Int64,3}:\n", "[:, :, 1] =\n", " 1 3\n", " 2 4\n", "\n", "[:, :, 2] =\n", " 5 7\n", " 6 8\n", "\n", "julia> permutedims(A, [3, 2, 1])\n", "2×2×2 Array{Int64,3}:\n", "[:, :, 1] =\n", " 1 3\n", " 5 7\n", "\n", "[:, :, 2] =\n", " 2 4\n", " 6 8\n", "```\n", "\n", "---\n", "\n", "```\n", "permutedims(m::AbstractMatrix)\n", "```\n", "\n", "Permute the dimensions of the matrix `m`, by flipping the elements across the diagonal of the matrix. 
Differs from `LinearAlgebra`'s [`transpose`](@ref) in that the operation is not recursive.\n", "\n", "# Examples\n", "\n", "```jldoctest; setup = :(using LinearAlgebra)\n", "julia> a = [1 2; 3 4];\n", "\n", "julia> b = [5 6; 7 8];\n", "\n", "julia> c = [9 10; 11 12];\n", "\n", "julia> d = [13 14; 15 16];\n", "\n", "julia> X = [[a] [b]; [c] [d]]\n", "2×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [5 6; 7 8]\n", " [9 10; 11 12] [13 14; 15 16]\n", "\n", "julia> permutedims(X)\n", "2×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [9 10; 11 12]\n", " [5 6; 7 8] [13 14; 15 16]\n", "\n", "julia> transpose(X)\n", "2×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},2}}:\n", " [1 3; 2 4] [9 11; 10 12]\n", " [5 7; 6 8] [13 15; 14 16]\n", "```\n", "\n", "---\n", "\n", "```\n", "permutedims(v::AbstractVector)\n", "```\n", "\n", "Reshape vector `v` into a `1 × length(v)` row matrix. Differs from `LinearAlgebra`'s [`transpose`](@ref) in that the operation is not recursive.\n", "\n", "# Examples\n", "\n", "```jldoctest; setup = :(using LinearAlgebra)\n", "julia> permutedims([1, 2, 3, 4])\n", "1×4 Array{Int64,2}:\n", " 1 2 3 4\n", "\n", "julia> V = [[[1 2; 3 4]]; [[5 6; 7 8]]]\n", "2-element Array{Array{Int64,2},1}:\n", " [1 2; 3 4]\n", " [5 6; 7 8]\n", "\n", "julia> permutedims(V)\n", "1×2 Array{Array{Int64,2},2}:\n", " [1 2; 3 4] [5 6; 7 8]\n", "\n", "julia> transpose(V)\n", "1×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},1}}:\n", " [1 3; 2 4] [5 7; 6 8]\n", "```\n" ], "text/plain": [ "\u001b[36m permutedims(A::AbstractArray, perm)\u001b[39m\n", "\n", " Permute the dimensions of array \u001b[36mA\u001b[39m. 
\u001b[36mperm\u001b[39m is a vector specifying a permutation\n", " of length \u001b[36mndims(A)\u001b[39m.\n", "\n", " See also: \u001b[36mPermutedDimsArray\u001b[39m.\n", "\n", "\u001b[1m Examples\u001b[22m\n", "\u001b[1m ≡≡≡≡≡≡≡≡≡≡\u001b[22m\n", "\n", "\u001b[36m julia> A = reshape(Vector(1:8), (2,2,2))\u001b[39m\n", "\u001b[36m 2×2×2 Array{Int64,3}:\u001b[39m\n", "\u001b[36m [:, :, 1] =\u001b[39m\n", "\u001b[36m 1 3\u001b[39m\n", "\u001b[36m 2 4\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m [:, :, 2] =\u001b[39m\n", "\u001b[36m 5 7\u001b[39m\n", "\u001b[36m 6 8\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> permutedims(A, [3, 2, 1])\u001b[39m\n", "\u001b[36m 2×2×2 Array{Int64,3}:\u001b[39m\n", "\u001b[36m [:, :, 1] =\u001b[39m\n", "\u001b[36m 1 3\u001b[39m\n", "\u001b[36m 5 7\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m [:, :, 2] =\u001b[39m\n", "\u001b[36m 2 4\u001b[39m\n", "\u001b[36m 6 8\u001b[39m\n", "\n", " ────────────────────────────────────────────────────────────────────────────\n", "\n", "\u001b[36m permutedims(m::AbstractMatrix)\u001b[39m\n", "\n", " Permute the dimensions of the matrix \u001b[36mm\u001b[39m, by flipping the elements across the\n", " diagonal of the matrix. 
Differs from \u001b[36mLinearAlgebra\u001b[39m's \u001b[36mtranspose\u001b[39m in that the\n", " operation is not recursive.\n", "\n", "\u001b[1m Examples\u001b[22m\n", "\u001b[1m ≡≡≡≡≡≡≡≡≡≡\u001b[22m\n", "\n", "\u001b[36m julia> a = [1 2; 3 4];\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> b = [5 6; 7 8];\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> c = [9 10; 11 12];\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> d = [13 14; 15 16];\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> X = [[a] [b]; [c] [d]]\u001b[39m\n", "\u001b[36m 2×2 Array{Array{Int64,2},2}:\u001b[39m\n", "\u001b[36m [1 2; 3 4] [5 6; 7 8]\u001b[39m\n", "\u001b[36m [9 10; 11 12] [13 14; 15 16]\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> permutedims(X)\u001b[39m\n", "\u001b[36m 2×2 Array{Array{Int64,2},2}:\u001b[39m\n", "\u001b[36m [1 2; 3 4] [9 10; 11 12]\u001b[39m\n", "\u001b[36m [5 6; 7 8] [13 14; 15 16]\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> transpose(X)\u001b[39m\n", "\u001b[36m 2×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},2}}:\u001b[39m\n", "\u001b[36m [1 3; 2 4] [9 11; 10 12]\u001b[39m\n", "\u001b[36m [5 7; 6 8] [13 15; 14 16]\u001b[39m\n", "\n", " ────────────────────────────────────────────────────────────────────────────\n", "\n", "\u001b[36m permutedims(v::AbstractVector)\u001b[39m\n", "\n", " Reshape vector \u001b[36mv\u001b[39m into a \u001b[36m1 × length(v)\u001b[39m row matrix. 
Differs from\n", " \u001b[36mLinearAlgebra\u001b[39m's \u001b[36mtranspose\u001b[39m in that the operation is not recursive.\n", "\n", "\u001b[1m Examples\u001b[22m\n", "\u001b[1m ≡≡≡≡≡≡≡≡≡≡\u001b[22m\n", "\n", "\u001b[36m julia> permutedims([1, 2, 3, 4])\u001b[39m\n", "\u001b[36m 1×4 Array{Int64,2}:\u001b[39m\n", "\u001b[36m 1 2 3 4\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> V = [[[1 2; 3 4]]; [[5 6; 7 8]]]\u001b[39m\n", "\u001b[36m 2-element Array{Array{Int64,2},1}:\u001b[39m\n", "\u001b[36m [1 2; 3 4]\u001b[39m\n", "\u001b[36m [5 6; 7 8]\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> permutedims(V)\u001b[39m\n", "\u001b[36m 1×2 Array{Array{Int64,2},2}:\u001b[39m\n", "\u001b[36m [1 2; 3 4] [5 6; 7 8]\u001b[39m\n", "\u001b[36m \u001b[39m\n", "\u001b[36m julia> transpose(V)\u001b[39m\n", "\u001b[36m 1×2 Transpose{Transpose{Int64,Array{Int64,2}},Array{Array{Int64,2},1}}:\u001b[39m\n", "\u001b[36m [1 3; 2 4] [5 7; 6 8]\u001b[39m" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "? 
permutedims" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.337898 seconds (6.89 k allocations: 502.799 MiB, 54.34% gc time)\n" ] }, { "data": { "text/plain": [ "42000×784 Array{Int64,2}:\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ \n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@time a = convert(Matrix, train[:,2:end])" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.134896 
seconds (7.85 k allocations: 453.027 KiB)\n" ] }, { "data": { "text/plain": [ "784×42000 LinearAlgebra.Transpose{Int64,Array{Int64,2}}:\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ \n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@time transpose(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "transpose(a) == permutedims(a)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.419683 seconds (6 allocations: 251.221 MiB, 1.94% 
gc time)\n" ] }, { "data": { "text/plain": [ "784×42000 Array{Int64,2}:\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ \n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@time permutedims(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# need to split train into training and eval sets\n", "# convert DataFrame to an Array, remembering Julia is column-major like R and Matlab (unlike Python)\n", "# Array(train[:,2:end]) raises a MethodError with this DataFrames version, so use convert(Matrix, ...) as in the timing cell above\n", "# permutedims gives a plain 784×42000 Array, whereas transpose returns a lazy wrapper (see the outputs above)\n", "X = permutedims(convert(Matrix, train[:,2:end]))\n", "y = Array(train[:,1])" ] }, 
{ "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(784,42000)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "size(X)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "42000" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = size(X)[2]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": 
false } }, "outputs": [ { "data": { "text/plain": [ "42000" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "length(y)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(0,255)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extrema(X)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "33.408911169825075" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean(X)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# scale X to range 0-1\n", "X = X./255;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Split data" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "784×8400 Array{Float64,2}:\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " ⋮ ⋮ ⋱ ⋮ \n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we need to split the train data into a training set (cv_X) and an eval set (eval_X)\n", "split = 0.8\n", "cv_X = X[:,1:floor(Int,split*N)]\n", "eval_X = X[:,floor(Int,split*N)+1:N]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "8400-element Array{Int64,1}:\n", " 0\n", " 7\n", " 7\n", " 2\n", " 2\n", " 6\n", " 5\n", " 7\n", " 8\n", " 5\n", " 3\n", " 0\n", " 2\n", " ⋮\n", " 0\n", " 5\n", " 3\n", " 1\n", " 9\n", " 6\n", " 4\n", " 0\n", " 1\n", " 7\n", " 6\n", " 9" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cv_y = y[1:floor(Int,split*N)]\n", "eval_y = y[floor(Int,split*N)+1:N]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup providers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we get to the heart of this notebook: setting up the problem in MXNet. 
Full documentation for the Julia `MXNet.jl` library is available at http://dmlc.ml/MXNet.jl/latest/. Here is the big picture:\n", "\n", "Neural networks in MXNet are set up as Models. Presently, the only model type is Feedforward, which you define with the [Symbolic API](http://dmlc.ml/MXNet.jl/latest/api/symbolic-node/). The model is fed data through a Data Provider. The `fit()` or `train()` function trains a Model on a Data Provider using a chosen Optimizer, EvalMetric, and Initializer. \n", "\n", "KVStore is a system that synchronizes data across different devices, such as different CPUs or GPUs on the same or different machines. It automatically attempts to parallelize any operations that can be split for faster performance. This happens under the hood, so you usually don't have to worry about it unless you have a complicated setup.\n", "\n", "Let's see how this all works in a basic MLP." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "ename": "ArgumentError", "evalue": "ArgumentError: Package MXNet not found in current path:\n- Run `import Pkg; Pkg.add(\"MXNet\")` to install the MXNet package.\n", "output_type": "error", "traceback": [ "ArgumentError: Package MXNet not found in current path:\n- Run `import Pkg; Pkg.add(\"MXNet\")` to install the MXNet package.\n", "", "Stacktrace:", " [1] require(::Module, ::Symbol) at ./loading.jl:823", " [2] top-level scope at In[39]:1" ] } ], "source": [ "using MXNet" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "1000" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch_size = 1000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MXNet sets up an `AbstractDataProvider` class that helps abstract the data retrieval for the Executor from the 
KVStore. (I hope that is not too abstract.) There are a few predefined concrete classes including one that specifically provides the MNIST dataset, appropriately named [`MNISTIter()`](http://dmlc.ml/MXNet.jl/latest/api/io/#MXNet.mx.MNISTIter-Tuple{}) or `MNISTProvider`. I want to demonstrate a more general case, so I will be using the `ArrayDataProvider` with an Array filled from the Kaggle file above. Documentation is at http://dmlc.ml/MXNet.jl/latest/api/io/#MXNet.mx.ArrayDataProvider, but briefly, the parameters are:\n", "\n", "`ArrayDataProvider(data[, label]; batch_size, shuffle)`\n", "\n", "`shuffle` is a boolean that controls whether the data is shuffled at each epoch of training.\n", "\n", "Note that although the Executor can process in parallel, a DataProvider can **NOT** be accessed in parallel. _Don't_ call a DataProvider from more than one place. " ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.ArrayDataProvider(Array{Float32,N}[\n", "Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]],Symbol[:data],Array{Float32,N}[\n", "Float32[1.0 0.0 … 2.0 2.0]],Symbol[:softmax_label],1000,33600,true,0.0f0,0.0f0,MXNet.mx.NDArray[mx.NDArray{Float32}(784,1000)],MXNet.mx.NDArray[mx.NDArray{Float32}(1000,)])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# since this is a training set, we specify shuffle to help randomize the training\n", "train_provider = mx.ArrayDataProvider(cv_X, cv_y, batch_size=batch_size, shuffle=true)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.ArrayDataProvider(Array{Float32,N}[\n", "Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]],Symbol[:data],Array{Float32,N}[\n", "Float32[0.0 7.0 … 6.0 
9.0]],Symbol[:softmax_label],1000,8400,false,0.0f0,0.0f0,MXNet.mx.NDArray[mx.NDArray{Float32}(784,1000)],MXNet.mx.NDArray[mx.NDArray{Float32}(1000,)])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we don't want to randomize the eval set during training, so shuffle is false\n", "eval_provider = mx.ArrayDataProvider(eval_X, eval_y, batch_size=batch_size, shuffle=false)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MLP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we want to define the model by specifying the number of nodes in each layer and how they are connected to each other. A useful reference is the paper LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. The article is available on ResearchGate at https://www.researchgate.net/profile/Yann_Lecun/publication/2985446_Gradient-based_learning_applied_to_document_recognition/links/0deec519dfa1983fc2000000/Gradient-based-learning-applied-to-document-recognition.pdf. The authors compared one layer nets (28x28-300-10 and 28x28-1000-10), which had error rates of 4.7% and 4.5% on MNIST, to a two layer net (28x28-300-100-10) with an error rate of 3.05% and to their preference, a convolutional net with an error rate of 0.95%. The advantage of the ConvNet is that it captures more spatial relations without having to make the network 5 layers deep. I look at the [ConvNet](mnistLenet.ipynb) in a separate notebook. For now, let's try creating the 300-10 and 300-100-10 fully-connected nets.\n", "\n", "For this network, we need to create a `FullyConnected` SymbolicNode for our hidden layer/s which will set up all the weights and a layer bias if desired. We will also need an `Activation` SymbolicNode so that each node applies a nonlinearity to its weighted input. The `Activation()` function can specify an act_type = ’relu’, ’sigmoid’, ’softrelu’, or ’tanh’. 
There are also specific `SoftmaxActivation()` and `LeakyReLU()` functions, the latter of which has options for act_type = ’elu’, ’leaky’, ’prelu’, or ’rrelu’. These are all described in the Wikipedia article on [Activation Functions](https://en.wikipedia.org/wiki/Activation_function). \n", "\n", "Besides the hidden layer/s, we need input and output. The input is labeled with the `Variable()` function and will be taken from the `DataProvider` specified in the `fit()` instruction. We also need to create a `SymbolicNode` for the output layer. The MXNet API provides several choices including `LinearRegressionOutput`, `LogisticRegressionOutput`, `MAERegressionOutput` (mean absolute error), and `SoftmaxOutput`. The layers are connected by specifying the appropriate input data for each function." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000001962b220))" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = mx.Variable(:data)\n", "fc1 = mx.FullyConnected(data, name=:fc1, num_hidden=300)\n", "act1 = mx.Activation(fc1, name=:tanh1, act_type=:tanh)\n", "fc2 = mx.FullyConnected(act1, name=:fc2, num_hidden=10)\n", "mlp1 = mx.SoftmaxOutput(fc2, name=:softmax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MXNet.jl` has a `to_graphviz()` function to produce a `.dot` file from a model. Unfortunately, I don't have a way to directly display `dot` output in Jupyter, so we will save as a `dot` file, convert to `png` and insert that into the notebook. Of course, if you are playing with this yourself, you can use a `dot` file viewer directly."
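, "\n", "\n",
"One possible way to show the graph inline is sketched below. It is not part of the workflow used in this notebook, and it rests on assumptions: the Graphviz `dot` executable is on your PATH, the front end can render SVG (IJulia can), and you are on Julia 0.7 or later (older versions used `readstring` instead of `read(cmd, String)`). The `dot_source` string and the `example.dot` file name are made up for illustration and stand in for the text that `to_graphviz()` returns.\n",
"\n",
"```julia\n",
"# Assumption: the Graphviz `dot` executable is installed and on the PATH.\n",
"# `dot_source` is a hypothetical stand-in for the text returned by to_graphviz().\n",
"dot_source = \"digraph G { data -> fc1 -> tanh1 -> fc2 -> softmax }\"\n",
"write(\"example.dot\", dot_source)\n",
"\n",
"# Run `dot` and capture the SVG it writes to stdout as a String.\n",
"svg = read(`dot -Tsvg example.dot`, String)\n",
"\n",
"# Ask the front end to render the raw string as SVG. This works in IJulia;\n",
"# a plain terminal REPL has no display for this MIME type and will throw an error.\n",
"display(\"image/svg+xml\", svg)\n",
"```\n",
"\n",
"If this works in your setup, the `png` conversion becomes optional, but saving a file as done in the cells that follow has the advantage of not depending on the front end."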
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# save file - note that the `do` block automatically closes the filestream\n", "open(\"mlp1graph.dot\", \"w\") do fs\n", " print(fs, mx.to_graphviz(mlp1))\n", "end" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# we can run the `dot` program to convert to png if it is installed on your computer\n", "run(pipeline(`dot -Tpng mlp1graph.dot`, stdout=\"mlp1graph.png\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display this file:\n", "![mlp1graph.png](mlp1graph.png)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.FeedForward(MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000001962b220)),MXNet.mx.Context[CPU0],#undef,#undef,#undef)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# change context to gpu(number) if you have a gpu and want to use that for processing\n", "model = mx.FeedForward(mlp1, context=mx.cpu())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have our model and data, we need to choose an Optimizer to do backprop. For more background on how [backpropagation](https://en.wikipedia.org/wiki/Backpropagation) uses an error function to adjust network weights, check out the Wikipedia article. To learn about optimization, read the Wikipedia article on [Stochastic Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). It describes several variants, including AdaGrad, RMSProp, and Adam.\n", "\n", "A lot of different optimizer options are available in MXNet. 
These include `SGD` (Stochastic Gradient Descent), `AdaGrad` (Adaptive Gradient), `RMSProp` (Root Mean Square Propagation), and `ADAM` (Adaptive Moment Estimation) with its variants `AdaMax` and `Nadam` (Nesterov Adam). \n", "\n", "Documentation is available at http://dmlc.ml/MXNet.jl/latest/api/optimizer/#built-in-optimizers, and it includes references to the source papers." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.SGD(MXNet.mx.SGDOptions(0.1,0.9,0,1.0e-5,MXNet.mx.LearningRate.Fixed(0.1),MXNet.mx.Momentum.Fixed(0.9)),#undef)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we are going to use the basic Stochastic Gradient Descent optimizer with a fixed learning rate and momentum\n", "optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine initial state" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we start training with the data providers, model, and optimizer set up above using the `fit()` function. We can specify the number of epochs to run. For now, we will start with just one epoch and see the output that is automatically generated. The first `fit()` will set up all the memory structures for the model, initialize the weights, create the KVStore, and start training. This first `fit()` will also take longer than subsequent ones as Julia does Just In Time compilation of code using LLVM. Subsequent calls will not have to be compiled.\n", "\n", "Note that `fit()` also allows the choice of other evaluation metrics, other initializers, and setting up callback routines between epochs. Full documentation is available at http://dmlc.ml/MXNet.jl/latest/api/model/#MXNet.mx.fit-Tuple{MXNet.mx.FeedForward,MXNet.mx.AbstractOptimizer,MXNet.mx.AbstractDataProvider}. 
Here, I am just using the default metric (`Accuracy()`) and default initializer (`UniformInitializer(0.01)`). The latter will initialize the weights of the network to uniformly distributed random numbers less than 0.01 in magnitude. " ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: Start training on MXNet.mx.Context[CPU0]\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Initializing parameters...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Creating KVStore...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: TempSpace: Total 4 MB allocated on CPU0\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Start training...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 001/001 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.6336\n", "\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 32.429736 seconds (7.59 M allocations: 540.899 MB, 2.80% gc time)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: time = 11.8315 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.7843\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Finish training on MXNet.mx.Context[CPU0]\n", "\u001b[0m" ] } ], "source": [ "@time mx.fit(model, optimizer, train_provider, eval_data=eval_provider, n_epoch=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I am taking a break in training here so we can take a quick peek at the initial internals of the network. Note that all the parameters of the model are stored as `MXNet.NDArray`s, which need to be copied (with `copy`) into an in-memory Array before we can play with them. 
Although the model is defined earlier, no memory is assigned until the `fit()` function is executed, so if you try to access the `arg_params` of the model before that, you will get `UndefRefError: access to undefined reference`." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "Dict{Symbol,MXNet.mx.NDArray} with 4 entries:\n", " :fc1_weight => mx.NDArray{Float32}(784,300)\n", " :fc1_bias => mx.NDArray{Float32}(300,)\n", " :fc2_weight => mx.NDArray{Float32}(300,10)\n", " :fc2_bias => mx.NDArray{Float32}(10,)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note the input layer has 784*300 = 235,200 trainable weights stored in a NDArray. The output layer has 3000 weights.\n", "model.arg_params" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "300×10 Array{Float32,2}:\n", " -0.127887 0.0771416 0.0249448 … 0.145933 0.010249 \n", " -0.0196595 -0.0296244 0.0209594 -0.0811309 -0.0449037\n", " 0.0956391 -0.128065 -0.103133 0.0358045 -0.0394614\n", " -0.107717 0.0242749 0.0551586 0.0597582 -0.0298745\n", " -0.0984035 0.0960803 0.115726 -0.0061346 0.0479453\n", " -0.0183465 -0.207955 -0.000346427 … 0.0321886 0.0209672\n", " 0.120342 -0.122031 -0.024811 0.0252119 -0.0228585\n", " -0.0212993 0.0206527 0.0558286 -0.0572732 -0.0384446\n", " 0.0798484 -0.0635753 0.0840668 0.0380003 -0.0590102\n", " 0.0820934 0.0308938 0.12895 0.0494668 -0.120226 \n", " 0.145466 -0.128332 -0.110208 … 0.0314653 0.0496928\n", " 0.0384468 0.168292 -0.0586571 -0.0639272 -0.044879 \n", " 0.0877501 -0.0616897 -0.141263 0.0441554 0.0759257\n", " ⋮ ⋱ \n", " 0.0791338 -0.226709 -0.0443711 -0.0835878 0.0361731\n", " 0.0163245 0.045172 0.0819094 0.0264293 -0.0304361\n", " -0.0624429 -0.0678274 -0.0740325 … -0.0847413 
0.0844292\n", " 0.0406324 0.0564422 -0.106171 0.0146266 0.0337845\n", " -0.0579353 0.0977567 -0.0983308 -0.112232 0.0105776\n", " -0.0233625 -0.0358019 -0.0986435 -0.0249763 0.0135266\n", " 0.0388482 0.0118654 -0.16399 0.0670357 0.120794 \n", " -0.0115069 0.0122509 -0.0674744 … 0.0703299 0.119037 \n", " 0.191457 -0.0999065 0.00194451 -0.0827886 -0.032837 \n", " 0.0187783 -0.0221313 -0.0729263 0.008342 0.0182222\n", " 0.00468942 -0.175613 0.0360089 0.0626512 0.0410399\n", " 0.124774 -0.122346 0.0201343 0.0144864 -0.0635198" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's examine the weights in the hidden layer\n", "w2 = copy(model.arg_params[:fc2_weight])" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "10×3 Array{Any,2}:\n", " (-0.280105,0.208179) 0.0080839 0.0957205\n", " (-0.267891,0.284985) 0.000389835 0.103438 \n", " (-0.202828,0.246249) -0.00487307 0.0891263\n", " (-0.212816,0.190532) 0.003161 0.0817025\n", " (-0.22252,0.225273) -0.00620983 0.0842009\n", " (-0.202098,0.172084) 0.00473183 0.0636334\n", " (-0.238122,0.210844) -0.00532095 0.0923733\n", " (-0.254803,0.263656) 0.00346573 0.0984543\n", " (-0.180616,0.145933) -0.00335081 0.0618578\n", " (-0.194464,0.198503) -0.000115971 0.0770083" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's get the extrema mean and standard deviation for each of the 10 groups\n", "[ [extrema(w2[:,i]) for i in 1:10] [mean(w2[:,i]) for i in 1:10] [std(w2[:,i]) for i in 1:10] ]" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# plot mean and std\n", "plot([ [mean(w2[:,i]) for i in 1:10] 
[std(w2[:,i]) for i in 1:10] ], legend=true, label=[\"mean\" \"std\"])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# plot them as 30x10 arrays\n", "heatmap(reshape(w2[:,1], 10,30), aspect_ratio=:equal)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "Plots.PyPlotBackend()" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# switch to PyPlot because Plotly doesn't support clims attribute and I will want to compare later\n", "pyplot()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot([heatmap(reshape(w2[:,i], 10,30), aspect_ratio=:equal, clims=(-0.6,0.6)) for i=1:10]...)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train model 10 epochs " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok. Let's run it for a few epochs and watch what happens. Note that every time we call `fit()`, the default is to continue training the same model without resetting the weights. 
" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: Start training on MXNet.mx.Context[CPU0]\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Initializing parameters...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Creating KVStore...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: TempSpace: Total 4 MB allocated on CPU0\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Start training...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 001/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8643\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 9.1088 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8391\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 002/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9066\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.3779 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9173\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 003/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9159\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.7162 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9248\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 004/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9223\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.3265 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9298\n", 
"\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 005/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9274\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.3956 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9319\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 006/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9330\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.5606 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9370\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 007/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9363\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 9.2693 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9400\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 008/009 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9426\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 9.4478 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9428\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 009/009 ==========\n", "\u001b[0m" ] } ], "source": [ "@time mx.fit(model, optimizer, train_provider, eval_data=eval_provider, n_epoch=9)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "300×10 Array{Float32,2}:\n", " -0.18999 0.0523741 0.0521278 … 0.256539 0.0830147 \n", " -0.0566305 -0.0292824 0.000472053 -0.176414 -0.137913 \n", " -0.0510069 -0.15316 -0.15406 -0.00278085 -0.175738 
\n", " -0.160078 0.0158039 0.0645762 0.101266 0.0279298 \n", " -0.121867 0.126162 0.203941 -0.0970455 0.1576 \n", " -0.27373 -0.234331 0.03053 … -0.02137 0.0339543 \n", " 0.165604 -0.171726 -0.0604989 0.0526133 -0.0206003 \n", " -0.0281766 0.0293302 0.0430013 -0.0955367 -0.137775 \n", " 0.126382 -0.076785 0.10942 0.0555131 -0.0920628 \n", " 0.128483 0.0501477 0.136502 0.0789185 -0.133262 \n", " 0.211869 -0.153832 -0.119128 … 0.0677559 0.0706252 \n", " 0.0836029 0.188536 -0.144794 -0.05817 -0.049355 \n", " 0.156777 -0.0915638 -0.21475 0.0954725 0.209062 \n", " ⋮ ⋱ \n", " 0.085546 -0.250519 -0.0668047 -0.137696 -0.00329412\n", " 0.00653774 0.0642993 0.191455 -0.0250502 0.0283506 \n", " -0.0594797 -0.109547 -0.072032 … -0.111813 0.00390982\n", " 0.0777404 0.0444848 -0.131087 0.0178672 0.0707252 \n", " -0.118424 0.150786 -0.147908 -0.172449 -0.0152548 \n", " -0.0545296 -0.045317 -0.174648 -0.0304535 -0.048231 \n", " 0.066315 0.0347018 -0.228092 0.080211 0.342091 \n", " 0.0115476 0.0107955 -0.0769882 … 0.116553 0.283723 \n", " 0.257533 -0.133712 0.104216 -0.208887 -0.0636919 \n", " 0.0174476 -0.00290674 -0.132057 0.01859 -0.0688223 \n", " -0.0502207 -0.193162 0.0843327 0.113111 0.0176484 \n", " 0.11381 -0.142134 0.0347138 0.00476562 -0.157754 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's examine how the weights in the hidden layer have changed after 9 more epochs of training (10 in total)\n", "w211 = copy(model.arg_params[:fc2_weight])" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "10×3 Array{Any,2}:\n", " (-0.357168,0.36403) 0.00906962 0.144172\n", " (-0.291686,0.346591) 0.000505016 0.124114\n", " (-0.35688,0.567968) -0.00167273 0.145727\n", " (-0.354625,0.29979) 0.0025032 0.126793\n", " (-0.335417,0.335986) -0.0121043 0.133443\n", " (-0.61578,0.406012) 0.0105811 0.144681\n", " (-0.34189,0.273747) 
-0.00809373 0.127449\n", " (-0.443309,0.390159) 0.00113069 0.145592\n", " (-0.423108,0.589517) -0.00721869 0.125767\n", " (-0.339918,0.404306) 0.00526154 0.132497" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# extrema mean and standard deviation for each of the 10 groups\n", "[ [extrema(w211[:,i]) for i in 1:10] [mean(w211[:,i]) for i in 1:10] [std(w211[:,i]) for i in 1:10] ]" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# plot comparison to previous \n", "plot([ [mean(w2[:,i]) for i in 1:10] [std(w2[:,i]) for i in 1:10] [mean(w211[:,i]) for i in 1:10] [std(w211[:,i]) for i in 1:10] ], \n", "label=[\"mean 1\" \"std 1\" \"mean 10\" \"std 10\"], legend=true)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot([heatmap(reshape(w211[:,i], 10,30), aspect_ratio=:equal, clims=(-0.6,0.6)) for i=1:10]...)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: TempSpace: Total 1 MB allocated on CPU0\n", "\u001b[0m" ] } ], "source": [ "# test on eval set\n", "preds = mx.predict(model, eval_provider)\n", "correct = 0\n", "for i = 1:size(preds)[2]\n", " if indmax(preds[:,i]) == eval_y[i]+1\n", " correct += 1\n", " end\n", "end\n", "correct/size(preds)[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The accuracy already seems to be stabilizing around 95%, but we will run a few more epochs to see what happens" ] }, { 
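"cell_type": "markdown", "metadata": {}, "source": [
"The eval-set check in the code cell above can also be written without an explicit loop. The following is only a sketch on a tiny made-up probability matrix: `preds` and `labels` here are hypothetical stand-ins for the output of `mx.predict` and for `eval_y`. It uses `argmax`, which replaced `indmax` in Julia 0.7 and later, so it assumes a newer Julia than the one that produced the outputs in this notebook.\n",
"\n",
"```julia\n",
"# Hypothetical stand-in for the 10×N matrix returned by mx.predict:\n",
"# each column holds the class probabilities for one image (3 classes, 3 images here).\n",
"preds = Float32[0.1 0.7 0.2;\n",
"                0.8 0.2 0.1;\n",
"                0.1 0.1 0.7]\n",
"labels = [1, 0, 2]   # zero-based labels, like eval_y\n",
"\n",
"# argmax returns a 1-based row index, so subtract 1 to recover the zero-based label\n",
"predicted = [argmax(preds[:, i]) - 1 for i in 1:size(preds, 2)]\n",
"\n",
"# fraction of columns whose most probable class matches the label\n",
"accuracy = count(predicted .== labels) / length(labels)\n",
"```\n",
"\n",
"The subtraction of 1 mirrors the `eval_y[i]+1` comparison in the loop version: Julia indexes from 1, while the digit labels start at 0."
] }, {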
"cell_type": "markdown", "metadata": {}, "source": [ "## Train model 20 epochs " ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: Start training on MXNet.mx.Context[CPU0]\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Initializing parameters...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Creating KVStore...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: TempSpace: Total 4 MB allocated on CPU0\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Start training...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 001/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9494\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.3934 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9477\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 002/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9522\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.7644 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9524\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 003/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9553\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.9129 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9527\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 004/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9586\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.4194 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## 
Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9558\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 005/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9613\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.2062 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9571\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 006/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9637\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 7.6110 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9586\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 007/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9666\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.5968 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9579\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 008/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9676\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.1803 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9600\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 009/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9694\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.5670 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9596\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 010/010 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## 
Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9716\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 8.7106 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 93.436206 seconds (3.92 M allocations: 2.322 GB, 6.35% gc time)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: accuracy = 0.9629\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Finish training on MXNet.mx.Context[CPU0]\n", "\u001b[0m" ] } ], "source": [ "@time mx.fit(model, optimizer, train_provider, eval_data=eval_provider, n_epoch=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the accuracy might have improved to about 96% (97% on the training set), but that is on the eval set. Let's go ahead and make predictions with the model we have trained using the Kaggle test set and create a file that can be submitted on the website. Note that running a prediction is a lot faster than training." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run on test set" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 26" ] } ], "source": [ "@time test = readtable(\"data/test.csv\");" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(28000,784)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "size(test)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "784×28000 Array{Int64,2}:\n", " 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ \n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_X = transpose(Array(test))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot([heatmap(rotl90(reshape(Array(test_X[1:end,i]), 28, 28)), aspect_ratio=:equal) for i=1:16]...)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(0,255)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extrema(test_X)" ] }, { 
"cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "33.3515450983965" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean(test_X)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# scale test_X to range 0-1\n", "test_X = test_X./255;" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.ArrayDataProvider(Array{Float32,N}[\n", "Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]],Symbol[:data],Array{Float32,N}[],Symbol[],1000,28000,false,0.0f0,0.0f0,MXNet.mx.NDArray[mx.NDArray{Float32}(784,1000)],MXNet.mx.NDArray[])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_provider = mx.ArrayDataProvider(test_X, batch_size=batch_size, shuffle=false)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: TempSpace: Total 1 MB allocated on CPU0\n", "\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 4.253657 seconds (20.44 k allocations: 86.709 MB, 10.64% gc time)\n" ] }, { "data": { "text/plain": [ "10×28000 Array{Float32,2}:\n", " 3.32992f-5 0.998368 1.43607f-5 … 1.062f-6 2.4091f-5 \n", " 1.16891f-10 3.90927f-8 0.000117007 1.37874f-8 2.84631f-9 \n", " 0.999818 0.000398155 7.91866f-5 5.73282f-7 0.999624 \n", " 0.00013095 1.55586f-5 0.000858953 0.00192545 0.000157291\n", " 7.27301f-7 1.95835f-9 0.0574737 0.0149685 6.15969f-5 \n", " 2.27206f-7 0.00120688 0.00746393 … 0.000177767 1.21758f-5 \n", " 2.64876f-7 7.05455f-6 
1.17546f-5 1.13237f-7 5.25664f-6 \n", " 8.56755f-7 3.20001f-6 0.00172127 0.000133627 7.38207f-8 \n", " 1.45219f-5 1.30237f-6 0.0118794 7.05102f-5 0.000103447\n", " 1.1988f-6 2.87172f-7 0.92038 0.982722 1.20239f-5 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this uses the previously trained model and the DataProvider specified above\n", "@time tpreds = mx.predict(model, test_provider)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# create submission\n", "open(\"MLP1submission.csv\", \"w\") do f\n", " write(f, \"ImageId,Label\\n\")\n", " for i = 1:size(tpreds)[2]\n", " write(f, string(i),\",\",string(indmax(tpreds[:,i])-1),\"\\n\")\n", " end\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When I submit the resulting file to https://kaggle.com/c/digit-recognizer/submit, I get a score of 0.95200, which is pretty close to the error rate of 4.7% reported above for this network. The validation accuracy was just beginning to plateau, so it isn't clear whether we could get a little more out of training, but I'm more interested here in demonstrating how to create the two-layer network. We will duplicate the 300-100 net that got an error rate of 4.5%." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MLP - two layer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we define a new two-layer network with two hidden FullyConnected layers (300 and 100 units), a 10-unit output layer, and tanh activations. 
We could use code similar to last time:\n", "```\n", "data = mx.Variable(:data)\n", "fc1 = mx.FullyConnected(data, name=:fc1, num_hidden=300)\n", "act1 = mx.Activation(fc1, name=:tanh1, act_type=:tanh)\n", "fc2 = mx.FullyConnected(act1, name=:fc2, num_hidden=100)\n", "act2 = mx.Activation(fc2, name=:tanh2, act_type=:tanh)\n", "fc3 = mx.FullyConnected(act2, name=:fc3, num_hidden=10)\n", "mlp2 = mx.SoftmaxOutput(fc3, name=:softmax)\n", "```\n", "but, to demonstrate a shortcut, we will use the MXNet chain macro to create this net using `=>` connections." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000001cfa0a10))" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# define new network\n", "mlp2 = @mx.chain mx.Variable(:data) =>\n", "    mx.FullyConnected(name=:fc1, num_hidden=300) =>\n", "    mx.Activation(name=:tanh1, act_type=:tanh) =>\n", "    mx.FullyConnected(name=:fc2, num_hidden=100) =>\n", "    mx.Activation(name=:tanh2, act_type=:tanh) =>\n", "    mx.FullyConnected(name=:fc3, num_hidden=10) =>\n", "    mx.SoftmaxOutput(name=:softmax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display model" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# save file - note that the `do` block automatically closes the filestream\n", "open(\"mlp2graph.dot\", \"w\") do fs\n", " print(fs, mx.to_graphviz(mlp2))\n", "end" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# we can run the `dot` program to convert to png if it is installed on your computer\n", "run(pipeline(`dot -Tpng mlp2graph.dot`, stdout=\"mlp2graph.png\"))" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "We can display this file:\n", "![mlp2graph.png](mlp2graph.png)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "MXNet.mx.FeedForward(MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000001cfa0a10)),MXNet.mx.Context[CPU0],#undef,#undef,#undef)" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# change context to gpu(number) if you have a gpu\n", "model = mx.FeedForward(mlp2, context=mx.cpu())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train model" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: Start training on MXNet.mx.Context[CPU0]\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Initializing parameters...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Creating KVStore...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: TempSpace: Total 4 MB allocated on CPU0\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Start training...\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 001/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.1096\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 11.8936 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.1056\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 002/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.3144\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.0328 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.5613\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 003/020 
==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.7609\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.0750 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8576\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 004/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8721\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 11.7286 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8989\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 005/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.8987\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.1276 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9143\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 006/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9137\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.9629 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9227\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 007/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9231\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.6956 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9293\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 008/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9311\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.9586 
seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9364\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 009/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9381\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 14.6798 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9411\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 010/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9443\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.7957 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9454\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 011/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9491\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.6748 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9521\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 012/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9544\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.5925 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9556\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 013/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9591\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.5508 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9570\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 014/020 
==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9620\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.6269 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9554\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 015/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9640\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 12.5352 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9579\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 016/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9683\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 11.9030 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9607\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 017/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9711\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.4604 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9604\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 018/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9729\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 11.2488 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9611\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 019/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9736\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.9827 
seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9648\n", "\u001b[0m\u001b[1m\u001b[34mINFO: == Epoch 020/020 ==========\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Training summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9768\n", "\u001b[0m\u001b[1m\u001b[34mINFO: time = 13.0553 seconds\n", "\u001b[0m\u001b[1m\u001b[34mINFO: ## Validation summary\n", "\u001b[0m\u001b[1m\u001b[34mINFO: accuracy = 0.9644\n", "\u001b[0m\u001b[1m\u001b[34mINFO: Finish training on MXNet.mx.Context[CPU0]\n", "\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ "282.506040 seconds (8.59 M allocations: 4.680 GB, 4.63% gc time)\n" ] } ], "source": [ "@time mx.fit(model, optimizer, train_provider, eval_data=eval_provider, n_epoch=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After 20 epochs of training, we seem to be hitting a ceiling on validation accuracy. Let's use this newly trained model to predict on the test set and submit to Kaggle." 
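, "\n", "\n", "The prediction matrix has one column per image and one row per digit, so the submission step below picks the row with the largest probability in each column and subtracts 1 (row 1 corresponds to digit 0). Here is a minimal sketch of that step on a made-up 10×3 matrix, assuming Julia 1.x, where `argmax` is the name for what older versions call `indmax`; `probs` and its values are hypothetical stand-ins for `tpreds`:\n", "```\n", "# Toy stand-in for `tpreds`: 10 rows (digits 0-9) by 3 columns (images).\n", "probs = zeros(Float32, 10, 3)\n", "probs[3, 1] = 0.9f0   # image 1: largest value in row 3, i.e. digit 2\n", "probs[1, 2] = 0.8f0   # image 2: largest value in row 1, i.e. digit 0\n", "probs[10, 3] = 0.7f0  # image 3: largest value in row 10, i.e. digit 9\n", "\n", "# Row r holds the probability of digit r - 1, so subtract 1 from the winning row.\n", "labels = [argmax(probs[:, i]) - 1 for i in 1:size(probs, 2)]\n", "println(labels)\n", "```\n", "This should print `[2, 0, 9]`."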
] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[34mINFO: TempSpace: Total 1 MB allocated on CPU0\n", "\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 3.578575 seconds (5.62 k allocations: 86.121 MB, 0.41% gc time)\n" ] }, { "data": { "text/plain": [ "10×28000 Array{Float32,2}:\n", " 3.17956f-5 0.999099 1.12871f-6 … 1.32057f-6 6.29556f-6 \n", " 5.82953f-8 1.96553f-8 0.000100165 3.53726f-8 1.18404f-7 \n", " 0.99966 0.000367172 1.80263f-6 1.26685f-6 0.999141 \n", " 0.000230099 1.24621f-6 8.82856f-5 0.0032008 0.000576255\n", " 5.89889f-6 1.49464f-7 0.0208702 0.0349224 5.21555f-5 \n", " 1.43514f-7 0.000451465 0.000577244 … 0.000289193 7.39706f-7 \n", " 9.90732f-7 7.13181f-5 6.17309f-7 2.8054f-7 1.98969f-6 \n", " 1.8722f-5 4.91723f-6 0.000499786 0.00010417 4.51312f-6 \n", " 5.00769f-5 1.48886f-6 0.00483644 0.000480367 0.00021646 \n", " 1.78278f-6 3.23831f-6 0.973024 0.961 1.00423f-6 " ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this uses the recently trained model and the same DataProvider on the test set \n", "@time tpreds = mx.predict(model, test_provider)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# create submission\n", "open(\"MLP2submission.csv\", \"w\") do f\n", " write(f, \"ImageId,Label\\n\")\n", " for i = 1:size(tpreds)[2]\n", " write(f, string(i),\",\",string(indmax(tpreds[:,i])-1),\"\\n\")\n", " end\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When I submit this new file to Kaggle, I get a score of 0.95657, which is similar to the error of 4.5% from the LeCun paper. Creating MLPs is very easy with MXNet and there is even an `MLP()` constructor. 
The equivalent code for this is:\n", "```\n", "mlp2 = @mx.chain mx.Variable(:data) =>\n", " mx.MLP([300, 100, 10]) =>\n", " mx.SoftmaxOutput(name=:softmax)\n", "```\n", "Running this model for 20 epochs and submitting to Kaggle got me a better score of 0.96543. I'm not sure why yet, although the defaults for this constructor are different. For example, `MLP()` defaults to using `:relu` activations, which have become more standard as they are faster than the original `:tanh` activations I used. ReLU units do have drawbacks: the function is not differentiable at zero and its gradient is zero for negative inputs, so other variants like LeakyReLU have been developed, as mentioned above.\n", "\n", "I hope this notebook has been helpful. Please leave a comment if you have suggestions for improvement." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.1.1", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.1" }, "toc": { "nav_menu": { "height": "150px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }