{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## A Machine Learning Example where we compare Gaussian Elimination with the commonly used method of today\n", "\n", "We show that a simple linear neuron can be \"learned\" with Gaussian elimination, and indeed is much\n", "faster and more accurate upon doing so. (Much of machine learning is non-linear.)\n", "\n", "Our model of the universe is that we have an unknow 3-vector\n", "\n", "$w = \\left[ \\begin{array}{c} w_1 \\\\ w_2 \\\\ w_3 \\end{array} \\right]$\n", "\n", "that we wish to learn. We have three 3-vectors $x_1,x_2,x_3$ and the corresponding scalar values\n", "$y_1 = w \\cdot x_1$, $\\ y_2 = w \\cdot x_2$, $\\ y_3 = w \\cdot x_3$. (Caution: The $x_i$ are 3-vectors,\n", "not components.) We will show that Gauassian elimination learns $w$ very quickly, while standard deep learning\n", "approaches (which use a version of gradient descent currently considered the best known as [ADAM](https://arxiv.org/abs/1412.6980) can require many steps, may be inaccurate, and inconsistent.\n", "\n", "One of the issues is how to organize the \"x\" data and the \"y\" data. The \"x\"s can be the columns or rows of a matrix, or can be a vector of vectors. Many applications prefer the matrix approach. The \"y\"s can be bundled into a vector similarly." ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.982331\n", " 0.1774 \n", " 0.212845" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w = rand(3) ## We are setting up a w. We will know it, but the learning algorithm will only have X and y data below." ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.881336\n", " 1.0557 \n", " 0.485883" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Here is the data. Each \"x\" is a 3-vector. 
Each \"y\" is a number.\n", "n = 3\n", "x1 = rand(3); y1=w ⋅ x1 # We are using the dot product (type \\cdot+tab)\n", "x2 = rand(3); y2=w ⋅ x2\n", "x3 = rand(3); y3=w ⋅ x3\n", "# Gather the \"x\" data into the rows of a matrix and \"y\" into a vector\n", "X=[x1 x2 x3]'\n", "y=[y1; y2; y3]" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.0\n", " 0.0\n", " 0.0" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We check that the linear system for the \"unknown\" w is X*w = y\n", "X*w-y" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.982331\n", " 0.1774 \n", " 0.212845" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Recover w with Gaussian Elimination\n", "X\\y" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.982331\n", " 0.1774 \n", " 0.212845" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Recover w with a machine learning package -- 18.06 students might just want to execute as a black box\n", "using Flux" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "We show how the same problem is commonly done with machine learning. Many learning cycles seem to be needed." ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.982331 0.1774 0.212845] : <== estimate after training\n" ] } ], "source": [ "# t ... 
 "t = Dense(3,1)\n", "loss(x,y) = Flux.mse(t(x),y)\n", "opt = ADAM(Flux.params(t)[1:1])\n", "Flux.train!(loss, Iterators.repeated( (X',y'), 20000), opt) # 20000 steps of training\n", "println((t.W).data, \" : <== estimate after training\") # .data strips the tracking wrapper from the learned weights" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Adding more data does not help a whole lot" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.948837 0.17883 0.218774] : <== estimate after training\n" ] } ], "source": [ "n = 3000 # this time 3000 random data points instead of 3\n", "X = randn(n,3)\n", "y = X*w\n", "t = Dense(3,1)\n", "loss(x,y) = Flux.mse(t(x),y)\n", "opt = ADAM(Flux.params(t)[1:1])\n", "Flux.train!(loss, Iterators.repeated( (X',y'), 2000), opt) # 2000 steps of training\n", "println((t.W).data, \" : <== estimate after training\")" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Julia 0.6.0", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }