{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training\n", "\n", "We now fit a Random Forest classifier to our training set, described earlier, and derive a model along with some measures of how good that model is. " ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%pylab inline\n", "# We pull in the training, validation and test sets created according to the scheme described\n", "# in the data exploration lesson.\n", "\n", "import pandas as pd\n", "\n", "samtrain = pd.read_csv('../datasets/samsung/samtrain.csv')\n", "samval = pd.read_csv('../datasets/samsung/samval.csv')\n", "samtest = pd.read_csv('../datasets/samsung/samtest.csv')\n", "\n", "# We use the Random Forest implementation from scikit-learn (imported as sklearn). \n", "# The class is called sklearn.ensemble.RandomForestClassifier\n", "\n", "# For this we need to convert the target column ('activity') to integer values \n", "# because the Python RandomForest package requires that. 
\n", "# In R it would have been a \"factor\" type and R would have used that for classification.\n", "\n", "# We map activity to an integer according to\n", "# laying = 1, sitting = 2, standing = 3, walk = 4, walkup = 5, walkdown = 6\n", "# Code is in the supporting library randomforests.py\n", "\n", "import randomforests as rf\n", "samtrain = rf.remap_col(samtrain,'activity')\n", "samval = rf.remap_col(samval,'activity')\n", "samtest = rf.remap_col(samtest,'activity')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import sklearn.ensemble as sk\n", "# feature_importances_ is available after fit() without extra flags; the old compute_importances\n", "# argument was removed from scikit-learn, so it is not passed here\n", "rfc = sk.RandomForestClassifier(n_estimators=500, oob_score=True)\n", "train_data = samtrain[samtrain.columns[1:-2]]\n", "train_truth = samtrain['activity']\n", "model = rfc.fit(train_data, train_truth)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.97946768060836498" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the OOB (out-of-bag) score, which is an estimate of the accuracy of our model.\n", "rfc.oob_score_" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(0.052194982088894198, 'tAccMean'),\n", " (0.046418448022626055, 'tAccStd'),\n", " (0.043291948466911298, 'tJerkMean'),\n", " (0.053130159100753124, 'tGyroJerkMagSD'),\n", " (0.059232069484007693, 'fAccMean'),\n", " (0.048256742613275803, 'fJerkSD'),\n", " (0.13799007369608407, 'angleGyroJerkGravity'),\n", " (0.17036595812582825, 'angleXGravity'),\n", " (0.044817236984266123, 'angleYGravity')]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### TRY THIS\n", "# use \"feature importance\" scores to see which features are the most important\n", "fi = enumerate(rfc.feature_importances_)\n", "# take the names from train_data, the columns the model was fit on, so that index i lines up\n", "cols = train_data.columns\n", 
"[(value,cols[i]) for (i,value) in fi if value > 0.04]\n", "## Change the value 0.04 which we picked empirically to give us 10 variables\n", "## try running this code after changing the value up and down so you get more or less variables\n", "## do you see how this might be useful in refining the model?\n", "## Here is the code in case you mess up the line above\n", "## [(value,cols[i]) for (i,value) in fi if value > 0.04]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the predict() function using our model on our validation set and our test set and get the following results from our analysis of errors in the predictions." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# pandas data frame adds a spurious unknown column in 0 position hence starting at col 1\n", "# not using subject column, activity ie target is in last columns hence -2 i.e dropping last 2 cols\n", "\n", "val_data = samval[samval.columns[1:-2]]\n", "val_truth = samval['activity']\n", "val_pred = rfc.predict(val_data)\n", "\n", "test_data = samtest[samtest.columns[1:-2]]\n", "test_truth = samtest['activity']\n", "test_pred = rfc.predict(test_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####Prediction Errors and Computed Error Measures " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean accuracy score for validation set = 0.846477\n", "mean accuracy score for test set = 0.895623\n" ] } ], "source": [ "print(\"mean accuracy score for validation set = %f\" %(rfc.score(val_data, val_truth)))\n", "print(\"mean accuracy score for test set = %f\" %(rfc.score(test_data, test_truth)))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# use the confusion matrix to see how observations were misclassified as other activities\n", "# 
See [5]\n", "import sklearn.metrics as skm\n", "test_cm = skm.confusion_matrix(test_truth,test_pred)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# visualize the confusion matrix" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPYAAADzCAYAAAC13+t7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVHX+B/D3AGPIReQiwwgoGLLDKMokgrmSmAKZYpqm\n4o1HtLayHhG7WFqKqWilrubPNt1ycyvEag1XgUfdHDLblQwp10uYMYaEKCAkchkYvr8/WE8MMMzh\nMJfD8Hk9z3keZubM+XzH8TPf7/dcPkfCGGMghNgUO2s3gBBiepTYhNggSmxCbBAlNiE2iBKbEBtE\niU2IDRJ1YtfV1SE+Ph79+/fHnDlzBG/n448/RlxcnAlbZj2nTp2CQqEQ9N4ff/wRYWFh6NevH3bt\n2mXillmWRqOBnZ0dmpubrd0UcWIm8PHHH7NRo0YxFxcXJpfL2eTJk9nXX3/d7e3u37+fRUREMJ1O\nZ4JWip9EImFXr1412/aTkpJYSkqKyba3du1atmDBApNsq6ufvaioiEkkEl7/N06ePMn8/Py607we\np9s99rZt27BixQqsWbMGN2/eRHFxMZYtW4bDhw93+0fn2rVrCA4Ohp2dqAcWJsU6OV+oqampW9u+\ndu0alEqloPfqdLpuxeajs88uNk4SCSQ8Fw8PD8s3sDu/ClVVVczFxYV99tlnBtepr69ny5cvZwMH\nDmQDBw5kycnJrKGhgTHW8kvq6+vLtm7dyry9vZlcLmf79u1jjDH2+uuvsz59+jCpVMpcXFzY+++/\n366HaPurvW/fPjZkyBDm6urKAgMD2ccff8w9P27cOO59p0+fZuHh4czNzY2NHj2affPNN9xr48eP\nZ6+99hr74x//yFxdXVlsbCwrLy/v8LPda/+bb77JBgwYwORyOTt06BA7evQoGzp0KPPw8GBpaWnc\n+mfOnGFjxoxh/fv3Z3K5nD333HNMq9UyxhiLiopiEomEOTs7MxcXF3bw4EFu+1u2bGE+Pj5s0aJF\ner3PTz/9xDw8PFh+fj5jjLGSkhLm5eXFcnNz27V1woQJzN7enjk6OjJXV1d25coVVlVVxRYuXMgG\nDBjABg8ezDZs2MCam5u5f7OxY8eyFStWME9PT/baa6/pbS87O1vv+wkLC+P+TyQlJTG5XM58fX3Z\nmjVruO/nypUr7KGHHmJubm7My8uLzZ071+Bnb0un07GVK1cyLy8vNmTIELZr1y697/6DDz5gISEh\nzNXVlQ0ZMoS99957jDHGampqmKOjI7Ozs2MuLi7M1dWVlZaWdvpd8AGAbeC5dDPNBOlWxOzsbObg\n4NDpcOi1115jDz74ILt16xa7desWGzt2LPef5OTJk8zBwYGtXbuWNTU1saysLObk5MSqqqoYY4yt\nW7eOLVy4kNvWunXrDCZ2TU0N69evHyssLGSMMXbjxg124cIFxph+YldUVLD+/fuzjz76iOl0Opae\nns7c3d1ZZWUlY6wlsYOCgtiVK1dYXV0di46OZqtWrerws91r/xtvvMGamprY3r17maenJ5s3bx6r\nqalhFy5cYH379mUajYYxxth3333Hzpw5w3Q6HdNoNCwkJIT9+c9/5rbXdjh6b/ur
Vq1iWq2W1dXV\ntRtW7t27lymVSlZbW8tiY2PZiy++aPC7iI6OZu+//z73eOHChWz69OmspqaGaTQaFhwczL2+b98+\n5uDgwHbt2sV0Oh2rq6trt7223w9jjE2fPp09/fTTrLa2lt28eZNFRERwSTZ37ly2adMmxhhjDQ0N\n7PTp0wY/e1vvvvsuUygU7Pr166yyspJFR0czOzs77v/e0aNH2c8//8wYYyw3N5c5OTlxP3hqtbrd\nUNzYd2EMALaF52KNxO7WGLeiogJeXl6dDpU/+eQTvP766/Dy8oKXlxfWrl2Lv//979zrUqkUr7/+\nOuzt7TF58mS4uLjgxx9/vDea0BueMSNDNTs7O5w/fx51dXWQyWQdDjuPHj2KP/zhD5g/fz7s7Oww\nd+5cKBQKbuogkUiwePFiBAUFwdHREbNnz0ZBQYHBmFKpFKtXr4a9vT3mzJmDyspKJCcnw9nZGUql\nEkqlknv/Aw88gIiICNjZ2WHw4MF46qmnkJuba/QzpaamQiqVwtHRsd3rS5cuRVBQECIiIlBWVoaN\nGzd2ur17/4Y6nQ4ZGRlIS0uDs7MzBg8ejJUrV+p9NwMHDsSyZctgZ2fXYey2309ZWRmys7Oxfft2\n9O3bFwMGDEBycjIOHDgAAOjTpw80Gg1KSkrQp08fjB07ttO2tnbw4EGsWLECvr6+cHd3x6uvvqoX\n+9FHH0VgYCAA4KGHHkJsbCxOnTql95lbE/JdtOXAc7GGbiW2p6cnysvLO90z+euvv2Lw4MHc40GD\nBuHXX3/V20brHwYnJyfU1NR0uS3Ozs7IyMjAX/7yFwwcOBBTp07lfiDatmfQoEF6zw0ePFivTT4+\nPtzfffv27bQ9np6ekEgk3LoAIJPJ9N5/9+5dAEBhYSGmTp0KuVwONzc3rF69GhUVFZ1+rgEDBqBP\nnz6drrN06VJcuHABzz//PKRSaafr3mtreXk5Ghsb2303JSUl3GN/f/9Ot9XWtWvX0NjYCLlcDnd3\nd7i7u+Ppp5/GrVu3AABvvvkmGGOIiIjA8OHDsW/fPt7bLi0t1WtP2+8wOzsbY8aMgaenJ9zd3ZGV\nldXpv62Q76KtvjwXa+hWYj/44IO47777cOjQIYPrDBw4EBqNhnv8yy+/YODAgYLiubi4oLa2lnt8\n48YNvddjY2Nx7Ngx3LhxAwqFAk8++WS7bfj6+uLatWt6z127dg2+vr6C2tQVzzzzDJRKJX766SdU\nV1dj48aNRg/X3EtEQ2pqapCcnIylS5di7dq1uH37Nq+2eHl5QSqVtvtu/Pz8eMduO1Lz9/fHfffd\nh4qKCty+fRu3b99GdXU1zp8/D6DlB2/Pnj0oKSnBe++9h2effRY///wzr/bK5XL88ssvem29p6Gh\nATNnzsRLL72Emzdv4vbt23j00Ue5nrqjzyHku2hLynOxhm4ltpubG9avX49ly5YhMzMTtbW1aGxs\nRHZ2Nl5++WUAQEJCAjZs2IDy8nKUl5dj/fr1WLhwoaB4YWFh+Oqrr1BcXIzq6mqkpaVxr928eROZ\nmZm4e/cupFIpnJ2dYW9v324bkydPRmFhIdLT09HU1ISMjAxcvnwZU6dO5dYxNuQXqqamBq6urnBy\ncsLly5fx7rvv6r0uk8lw9erVLm1z+fLliIiIwJ49ezBlyhQ8/fTTna5/77PZ29tj9uzZWL16NWpq\nanDt2jVs374dCxYs4B1bJpNBo9Fw25TL5YiNjUVKSgru3LmD5uZmXL16FV999RUA4NNPP8X169cB\nAP3794dEIuF+HIx99tmzZ2Pnzp0oKSnB7du3sXnzZu41rVYLrVbLTQuzs7Nx7NgxvXZWVFTgt99+\n454z9l3wYbNDcQBISUnBtm3bsGHDBnh7e2PQoEHYvXs3ZsyYAQBYs2YNwsPDMWLECIwYMQLh4eFY\ns2YN9/7OeoV7hwvumTRpEubMmYMRI0Zg9OjR
iI+P515vbm7G9u3b4evrC09PT5w6dYr7slpvx9PT\nE0eOHMHWrVvh5eWFt99+G0eOHNE7JNE6Zts2dNTGzh639vbbb+OTTz5Bv3798NRTT2Hu3Ll6669b\ntw6JiYlwd3fHZ599ZjD2vecyMzNx7Ngx7nNu27YN+fn5SE9P59Xed955B87OzhgyZAiioqIwf/58\nLF68mNfnBoAnnngCQMu/aXh4OABg//790Gq1UCqV8PDwwBNPPMGNrM6ePYsxY8bA1dUVjz32GHbu\n3ImAgIAOP3tbTz75JOLi4jBy5EiEh4dj5syZXPtcXV2xc+dOzJ49Gx4eHkhPT8djjz3GvVehUCAh\nIQFDhgyBh4cHbty4YfS74EPMPbaEmat7IsSGSSQSHOC57lxY/hi9tUYKhPR41uqN+bDaKV05OTlQ\nKBQYOnQotmzZYpGYSUlJkMlkCA0NtUi8e4qLizFhwgQMGzYMw4cPx86dO80es76+HpGRkQgLC4NS\nqcQrr7xi9pit6XQ6qFQqxMfHWyxmQEAARowYAZVKhYiICLPHE/NQ3PJHzhljTU1N7P7772dFRUVM\nq9WykSNHsosXL5o97ldffcXy8/PZ8OHDzR6rtdLSUnbu3DnGGGN37txhwcHBFvm8d+/eZYwx1tjY\nyCIjI9mpU6fMHvOerVu3snnz5rH4+HiLxQwICGAVFRUWiQWAHee5WCPNrNJj5+XlISgoCAEBAZBK\npZg7dy4yMzPNHjcqKgru7u5mj9OWj48PwsLCALQcsgsJCdE7bm4uTk5OAFr2Gut0Oouds3z9+nVk\nZWVh6dKlFp9bWjKeTe8VF6KkpETvZAM/Pz+9EyNsmUajwblz5xAZGWn2WM3NzQgLC4NMJsOECRME\nXwDSVStWrMBbb71l8Yt3JBIJJk2ahPDwcOzdu9fs8YQOxQ1NzebMmQOVSgWVSoXAwECoVCruPWlp\naRg6dCgUCoXeoTxDrPKD0tXDCraipqYGs2bNwo4dO+Di4mL2eHZ2digoKEB1dTXi4uKgVqsRHR1t\n1phHjhyBt7c3VCoV1Gq1WWO1dfr0acjlcty6dQsxMTFQKBSIiooyWzyhySOVSrF9+3aEhYWhpqYG\no0aNQkxMDDIyMrh1XnjhBfTv3x8AcPHiRWRkZODixYsoKSnBpEmTUFhY2OkPp1V6bF9fXxQXF3OP\ni4uL9c54skWNjY2YOXMmFixYgOnTp1s0tpubG6ZMmYKzZ8+aPdY333yDw4cPIzAwEAkJCfjyyy+x\naNEis8cFWk6QAVpOw50xYwby8vLMGk9oj21sasYYw8GDB5GQkACg5XyFhIQESKVSBAQEICgoyOhn\ns0pih4eH48qVK9BoNNBqtcjIyMC0adOs0RSLYIxhyZIlUCqVSE5OtkjM8vJyVFVVAWipRHP8+HG9\noZ25bNq0CcXFxSgqKsKBAwfw8MMPY//+/WaPW1tbizt37gAA7t69i2PHjpn96Icp5tgdTc1OnToF\nmUyG+++/H0DL9Q2tOz4+U1erDMUdHBywa9cuxMXFQafTYcmSJQgJCTF73ISEBOTm5qKiogL+/v5Y\nv349d6aVOZ0+fRofffQRdygGaJkzPfLII2aLWVpaisTERDQ3N6O5uRkLFy7ExIkTzRbPEEtNu8rK\nyrizHZuamjB//nzExsaaNaahQ1l5AL7l8X5DU7P09HTMmzev0/ca+3elM88IEUAikaCI57qBaL+3\nvrGxEVOnTsXkyZP1RnFNTU3w8/NDfn4+d7HUvfPiV61aBQB45JFHkJqa2ukO2N5Tc4gQExM6x+5s\nanbixAmEhIToXQE5bdo0HDhwAFqtFkVFRbhy5YrRE3DolFJCBBKaPJ1NzTIyMridZvcolUrMnj0b\nSqUSDg4O2L17Nw3FCTEHiUSCCp6Z7dnUAy8C6a3HpIlt6koCOvDNnu4VlxXEJEPxtQLfpwYQLfC9\nqYKjdjdy
d/SmuNaI2d24qV1aW9q+jodo0BybEIF499hWIOKmESJu0vus3QLDrJrYAb0ucm+Ka42Y\nFo4r4m6REpvi2lBMC8elxCbEBok4e0TcNEJEjvaKE2KDRJw9Im4aISJHe8UJsUEizh6jV3dZo0ww\nIT2CiKsZdprYOp0Ozz33HHJycnDx4kWkp6fj0qVLlmobIeJmz3Oxgk4T21plggnpEXpqj92bywQT\nYpTAxDZ2Z5itW7fCzs4OlZWV3HMmLT/M95JMdau/A2DNM8oI6QrN/xaBBPbGhsoPh4SEoLi4GMeP\nH8fgwYO59U1efphvmeDoVktAlz4iIdYUAP3/vV10H8+ljc7KD6ekpODNN9/UW9/k5Yd7W5lgQrrE\nBHPs1uWHMzMz4efnhxEjRuitY/Lyw9YqE0xIj2Bgj7f6JqC+ZfztrcsP29nZYdOmTTh+/Dj3emfV\nXIxNk43OEiZPnozJkycbbyUhvY2B7Ike2LLck3qx/Tpt7wxz/vx5aDQajBw5EkDLjQ1HjRqFM2fO\ntJsSX79+Hb6+vp02jcoPEyKUwKF4R+WHQ0NDUVZWhqKiIhQVFXG1xWUyGZUfJsSiBJ580lH54U2b\nNumNjFsPta1SflgikXSrrKBQ3StmSEhHUnlXKZVIJGBL+W1V8tceWH6YkF7L0doNMIwSmxChqNAC\nITZIxNkj4qYRInIizh4RN40QkaOhOCE2SMTZY5KmWePQE9vStfssmYpkrZVuTlq/zjpxiWG2ntiE\n9EpUzJAQGyTi7BFx0wgRORFnj4ibRojI0V5xQmyQiLNHxE0jROREnD0ibhohIifioTgVWiBEKEee\nSxuGyg9/+umnGDZsGOzt7ZGfn6/3HpOWHyaEdMLE5YdDQ0Nx6NAh/OlPf9Jb3+TlhwkhnRB4ix9D\n5YcVCgWCg4PbrW/y8sOEkE6YuPywISYvPwwASUlJOHr0KLy9vXH+/HljqxPSexjIHvUPgJpHqrQu\nP+zi4tKl0MZqnhntsRcvXoycnJwuBSWkVzAw9I5WAesW/b50pG354c6YpfxwVFQU3N3dja1GSO8j\ncK94R+WHO1rnHio/TIglmbj8cENDA55//nmUl5djypQpUKlUyM7OFlR+2ESJrW71dwDo1nykZ9DA\nGnfbHDduHJqbmzt8zdCw/NVXX8Wrr77KO4aJEjvaNJshxKICoN8J5Xbt7SIe74q4aYSInIizx+jO\ns4SEBIwdOxaFhYXw9/fHvn37LNEuQsRP4AkqlmD0Nyc9Pd0S7SCk5xFxjy3iphEiclTzjBAbJOLs\nEXHTCBE5EWePiJtGiMiJOHtE3DRCxI2JuIIKJTYhAulEnD0ibhoh4kaJTYgNarivD881tWZtR0co\nsQkRSGcv3kl2j01syct3rRKXKTq/XM5cJJctf0dT0jmdiOsPU80zQgRqgj2vpa2kpCTIZDKEhoZy\nz+Xl5SEiIgIqlQqjR4/Gt99+y73W1dLDACU2IYLp4MBraaujcmMvvfQS3njjDZw7dw7r16/HSy+9\nBEC/9HBOTg6effZZg9dyt0aJTYhAOtjzWtrqqNyYXC5HdXU1AKCqqoqraSak9DDQg+fYhFibKefY\nmzdvxrhx4/DCCy+gubkZ//73vwG0lB4eM2YMtx6f0sMAJTYhgjWg48Ndeep65Knru7StJUuWYOfO\nnZgxYwY+/fRTJCUl4fjx4x2ua6zeGUCJTYhgHc2fAWBUtAtGRf9eJ/z/UquNbisvLw8nTpwAAMya\nNQtLly4FIKz0MEBzbEIEEzrH7khQUBByc1tqrn355ZfcrX6ElB4GqMcmRDChc+yEhATk5uaivLwc\n/v7+WL9+Pfbs2YNly5ahoaEBffv2xZ49ewBAUOlhAJCw1pXJBWgJYo2TJ16yQkyAKZytEpdOULGE\nVPBNB4lEgjw2nNe6EZL/8t6uqVCPTYhAhubYYiDelhEicj36lNLi4mJMmD
ABw4YNw/Dhw7Fz505L\ntIsQ0dOiD6/FGoz22FKpFNu3b0dYWBhqamowatQoxMTEICQkxBLtI0S0OjoPXCyMJraPjw98fHwA\nAC4uLggJCcGvv/5KiU16PZuZY2s0Gpw7dw6RkZFtXlG3+jsAdFM+0jNo0J2b8ol5js07sWtqajBr\n1izs2LEDLi4ubV6NNm2rCLGIAHTnpnw9PrEbGxsxc+ZMLFiwwOBtPgnpbXr0HJsxhiVLlkCpVCI5\nOdkSbSKkR9CK+B4/Rg93nT59Gh999BFOnjwJlUoFlUrV7iJxQnojU54rbmpGe+xx48bxqthASG/T\no4fihJCO2czhLkLI73r8XnFCSHtiTmwqtECIQEJ3nnVUfnjdunXw8/PjdlBnZ2dzrwkpP0w9NiEC\nNQg83LV48WI8//zzWLRoEfecRCJBSkoKUlJS9NZtXX64pKQEkyZNQmFhIezsOu+TqccmRCBTlh8G\n0GExBqHlhymxCRHI1Mex33nnHYwcORJLlixBVVUVgJbyw35+ftw6fMsPU2ITIpChW/pcUf+KY+vO\ncAsfzzzzDIqKilBQUAC5XI6VK1caXJfKDxNiRoaOYw+KDsSg6EDu8cnUfxvdlre3N/f30qVLER8f\nD0B4+eEenNhvWiWqtYoKMnWqVeJKok9aJS7wg5Xi8mfKw12lpaWQy+UAgEOHDnF7zKdNm4Z58+Yh\nJSUFJSUlVH6YEHMzVfnh1NRUqNVqFBQUQCKRIDAwEO+99x6AXll+uHehHtsSlnep/PBK9gavdbdK\nXqPyw4T0FHSuOCE2SMynlFJiEyIQJTYhNoiuxybEBtEcmxAbRENxQmyQtW7fwwclNiEC9eg5dn19\nPcaPH4+GhgZotVo89thjSEtLs0TbCBG1Hj3HdnR0xMmTJ+Hk5ISmpiaMGzcOX3/9NcaNG2eJ9hEi\nWj1+ju3k5AQA0Gq10Ol08PDwMGujCOkJxJzYvK7Hbm5uRlhYGGQyGSZMmAClUmnudhEieoaux267\nWAOvHtvOzg4FBQWorq5GXFwc1Go1oqOjW62hbvV3AOhum6RnuALgJ8Hv7tFz7Nbc3NwwZcoUnD17\ntk1iRxt4ByFiNvR/yz1du3WVmA93GR2Kl5eXc/WX6urqcPz4cahUKrM3jBCxEzoU76j88IsvvoiQ\nkBCMHDkSjz/+OKqrq7nXhJQfNprYpaWlePjhhxEWFobIyEjEx8dj4sSJvDZOiC3TwYHX0tbixYvb\n3dgyNjYWFy5cwPfff4/g4GDukHLr8sM5OTl49tlned1Lz+hQPDQ0FPn5+Xw/KyG9htC94lFRUdBo\nNHrPxcTEcH9HRkbi888/B2C4/PCYMWM6jSHe2T8hImcose+oz+GO+pzg7X7wwQdISEgA0FJ+uHUS\n8y0/TIlNiECGEtspOhxO0eHc49LUD3hvc+PGjejTpw/mzZtncB0qP0yIGQm9xY8hf/vb35CVlYV/\n/etf3HNCyw/TDQMIEciUdwLJycnBW2+9hczMTDg6OnLPT5s2DQcOHIBWq0VRURGVHybE3ExZfjgt\nLQ1arZbbifbggw9i9+7dVH7Y1lH5YUvoWvnhwewSr3WvSUKo/DAhPYXNnFJKCPmdmK/uosQmRCBK\nbJsSbZWoEuuERR573ipxIyT/sELU5V1au0Er3otAKLEJEUjXJN70EW/LCBE5XRMNxQmxOZTYhNig\npkZKbEJsTrNOvOkj3pYRInY0FCfEBtWLN33E2zJCxK7J2g0wjBKbEKFEnNh0PTYhQjXxXDqwY8cO\nhIaGYvjw4dixYwcAoLKyEjExMQgODkZsbCxXHVgISmxChGrkubTx3//+F3/961/x7bff4vvvv8eR\nI0dw9epVbN68GTExMSgsLMTEiROxefNmwU3jldg6nQ4qlQrx8fGCAxFic3Q8lzYuX76MyMhIODo6\nwt7eHuPHj8fnn3+Ow4cPIzExEQCQmJ
iIL774QnDTeCX2jh07oFQqeVVuIKTXEDgUHz58OE6dOoXK\nykrU1tYiKysL169fR1lZGWQyGQBAJpOhrKxMcNOM7jy7fv06srKysHr1amzbtk1wIEJsTr2B579X\nAz+oDb5NoVDg5ZdfRmxsLJydnREWFgZ7e/1j4hKJpFsdqdEee8WKFXjrrbdgZ0fTcUL0GOqhh0UD\nCet+XzqQlJSEs2fPIjc3F+7u7ggODoZMJsONGzcAtNyBx9vbW3DTOu2xjxw5Am9vb6hUKqjV6k7W\nbP1aAOhum6RnOAMgT/jbu3G46+bNm/D29sYvv/yCf/zjH/jPf/6DoqIifPjhh3j55Zfx4YcfYvr0\n6YK332lif/PNNzh8+DCysrJQX1+P3377DYsWLcL+/fvbrBktuAGEWE/k/5Z73una27uR2LNmzUJF\nRQWkUil2794NNzc3rFq1CrNnz8b777+PgIAAHDx4UPD2eVcpzc3Nxdtvv41//vOf+hvodVVKo60U\nV22VqHnsc6vEtU4FleAuVSnFAZ6VR+dKxF2llPaKE9JKB4eyxIJ3Yo8fPx7jx483Z1sI6VlEfEop\nnStOiFCGDneJACU2IUJRj02IDaLEJsQGUWITYoM6uHJLLCixCRHKFg53EULaoL3ihNggmmMTYoNo\njm1L1FaK29cqUSMk71ol7iUWY/GYIV09Y5rm2ITYIBqKE2KDRJzYVBaFEKEEVikFgKqqKsyaNQsh\nISFQKpU4c+YMlR8mRBQaeC4dWL58OR599FFcunQJP/zwAxQKhUnLD/MutGBwA72u0IK1WGfnGfBH\nq0S9xBZYPGaI5FrXCi0k8EyddP1CC9XV1VCpVPj555/1VlMoFMjNzeVqn0VHR+Py5cu8298a9diE\nCCVwKF5UVIQBAwZg8eLFeOCBB/Dkk0/i7t27li0/TAgxwNDhrltqoFxt8G1NTU3Iz8/Hrl27MHr0\naCQnJ7cbdpu9/DAhxABD5Yfdo4Gh635f2vDz84Ofnx9Gjx4NoKWwYX5+Pnx8fExWfpgSmxChBN4J\nxMfHB/7+/igsLAQAnDhxAsOGDUN8fDw+/PBDADBv+WFCSCe6cUrpO++8g/nz50Or1eL+++/Hvn37\noNPpTFZ+mBKbEKEMHMriY+TIkfj222/bPX/ixIluNOh3vBI7ICAA/fr1g729PaRSKfLyunH3BEJs\nhYjPPOOV2BKJBGq1Gh4eHuZuDyE9hy1c3WXpOxkQInoivrqL115xiUSCSZMmITw8HHv37jV3mwjp\nGQTuFbcEXj326dOnIZfLcevWLcTExEChUCAqKqrVGupWfweA7rZJeoI8dT3y1N2obyTiOXaXzxVP\nTU2Fi4sLVq5c2bIBOlfcQuhccXPr8rniQTxT5yfL35TP6FC8trYWd+7cAQDcvXsXx44dQ2hoqNkb\nRojodePqLnMzOhQvKyvDjBkzALSc4zp//nzExsaavWGEiJ6Ih+JGEzswMBAFBQWWaAshPYstHO4i\nhLQh4sNdlNiECNWTh+KEEAMosQmxQTTHJsQGibjHtnKhBQ3FtYifja9icuesEBPdO5PMQurr6xEZ\nGYmwsDAolUq88sorAGBL5Yc1FNciKLHFxNHRESdPnkRBQQF++OEHnDx5El9//bVJyw9TaSRCrMDJ\nyQkAoNWqdTXdAAACM0lEQVRqodPp4O7ujsOHDyMxMREAkJiYiC+++ELw9imxCRFM+K1AmpubERYW\nBplMhgkTJmDYsGEmLT9sohsGEGIbunQRCGoNvPoVgFOtHm80uN3q6mrExcUhLS0Njz/+OG7fvs29\n5uHhgcrKSl7taavbe8WpAAPpvQwd73rwf8s9Gw1uwc3NDVOmTMF3333H3QHEx8eHyg8TYj11PBd9\n5eXl3B7vuro6HD9+HCqVCtOmTaPyw4RYn7AzVEpLS5GYmIjm5mY0Nzdj4cKFmDhxIlQqlcnKD3d7\njk
1Ib9Qyxy7iuXagxaes1GMTIph4zymlxCZEMPGeU0qJTYhg1GMTYoPa7/EWC0psQgSjoTghNoiG\n4oTYIOqxCbFB1GMTYoOoxybEBlGPTYgNosNdhNgg6rEJsUE0xybEBom3x6ZCC4QI1sRzaS8nJwcK\nhQJDhw7Fli1bTN4ySmxCBBNWzFCn0+G5555DTk4OLl68iPT0dFy6dMmkLaPEJkQwYT12Xl4egoKC\nEBAQAKlUirlz5yIzM9OkLaPEJkQwYTXPSkpK4O/vzz328/NDSUmJSVtGO88IEWwdr7VcXFz0Hlui\nZDclNiECdKeGma+vL4qLi7nHxcXF8PPzM0WzODQUJ8TCwsPDceXKFWg0Gmi1WmRkZGDatGkmjUE9\nNiEW5uDggF27diEuLg46nQ5LlixBSEiISWNQ+WFCbBANxQmxQZTYhNggSmxCbBAlNiE2iBKbEBtE\niU2IDfp/RUzyzXdlcM0AAAAASUVORK5CYII=\n" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pylab as pl\n", "pl.matshow(test_cm)\n", "pl.title('Confusion matrix for test data')\n", "pl.colorbar()\n", "pl.show()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# compute a number of other common measures of prediction goodness" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now compute some commonly used measures of prediction \"goodness\". 
\n", "For more detail on these measures see\n", "[6],[7],[8],[9]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy = 0.895623\n" ] } ], "source": [ "# Accuracy\n", "print(\"Accuracy = %f\" %(skm.accuracy_score(test_truth,test_pred)))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Precision = 0.897903\n" ] } ], "source": [ "# Precision, averaged over the activity classes and weighted by class support\n", "# (the target has more than two classes, so the averaging method is stated explicitly)\n", "print(\"Precision = %f\" %(skm.precision_score(test_truth,test_pred,average='weighted')))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Recall = 0.895623\n" ] } ], "source": [ "# Recall, averaged over the activity classes and weighted by class support\n", "print(\"Recall = %f\" %(skm.recall_score(test_truth,test_pred,average='weighted')))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "F1 score = 0.896047\n" ] } ], "source": [ "# F1 Score, averaged over the activity classes and weighted by class support\n", "print(\"F1 score = %f\" %(skm.f1_score(test_truth,test_pred,average='weighted')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Instead of using domain knowledge to reduce the number of variables, use Random Forests directly on the full set of columns. Then compute the variable importances and sort the variables by importance. \n", "\n", "Compare the model you get with the model you got from using domain knowledge. \n", "You can also short-circuit the data cleanup process by simply renaming the variables x1, x2, ..., xn, y, where y is 'activity', the dependent variable. \n", "\n", "Now look at the new Random Forest model you get. It is likely to be more accurate at prediction than the one we have above. It is a black box model, in which no meaning is attached to the variables. \n", " \n", "* What insights does it give you? \n", "* Which model do you prefer? \n", "* Why? 
\n", "* Is this an absolute preference or might it change? \n", "* What might cause it to change? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "[1] Original dataset as R data https://spark-public.s3.amazonaws.com/dataanalysis/samsungData.rda \n", "[2] Human Activity Recognition Using Smartphones http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones \n", "[3] Android Developer Reference http://developer.android.com/reference/android/hardware/Sensor.html \n", "[4] Random Forests http://en.wikipedia.org/wiki/Random_forest \n", "[5] Confusion matrix http://en.wikipedia.org/wiki/Confusion_matrix\n", "[6] Mean Accuracy http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1054102&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1054102\n", "\n", "[7] Precision http://en.wikipedia.org/wiki/Precision_and_recall\n", "[8] Recall http://en.wikipedia.org/wiki/Precision_and_recall\n", "[9] F Measure http://en.wikipedia.org/wiki/Precision_and_recall" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.8" } }, "nbformat": 4, "nbformat_minor": 0 }