{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Comet + R with nnet\n",
    "\n",
    "This notebook is based on:\n",
    "\n",
    "* https://www.rdocumentation.org/packages/nnet/versions/7.3-13/topics/nnet\n",
    "\n",
    "It attempts to learn to identify species of [Iris flowers](https://en.wikipedia.org/wiki/Iris_flower_data_set) based on some of their characteristics."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started with Comet\n",
    "\n",
    "To get started with Comet and R, please see:\n",
    "https://www.comet.ml/docs/r-sdk/getting-started/\n",
    "\n",
    "Specifically, you need to create a .comet.yml file\n",
    "or add your Comet API key to create_experiment(). In this example, I've created a ~/.comet.yml file with these contents (replace items for your use):\n",
    "\n",
    "```yaml\n",
    "COMET_WORKSPACE: YOUR-COMET-USERNAME\n",
    "COMET_PROJECT_NAME: PROJECT-NAME\n",
    "COMET_API_KEY: YOUR-API-KEY\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning the Iris Dataset\n",
    "\n",
    "R libraries needed for this notebook:\n",
    "\n",
    "```r\n",
    "install.packages(\"cometr\")\n",
    "install.packages(\"nnet\")\n",
    "install.packages(\"stringr\")\n",
    "install.packages(\"IRdisplay\")\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, now we are ready to machine learn. First we import the needed libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "library(cometr)\n",
    "library(nnet)\n",
    "library(stringr)\n",
    "library(IRdisplay)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we create a Comet experiment marking what we would like to log to the server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Experiment created: https://www.comet.ml/dsblank/cometr/ae316b3138be4dfdb02b305b1fc438c4 \n"
     ]
    }
   ],
   "source": [
    "exp <- create_experiment(\n",
    "  keep_active = TRUE,\n",
    "  log_output = FALSE,\n",
    "  log_error = FALSE,\n",
    "  log_code = TRUE,\n",
    "  log_system_details = TRUE,\n",
    "  log_git_info = TRUE\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: the notebook source isn't logged through the experiment, but we'll log the entire notebook at the end. \n",
    "\n",
    "Let's tag the experiment, so that the experiment will be easy to select in the Comet UI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$add_tags(c(\"made with nnet\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iris Dataset\n",
    "\n",
    "Next, for this example, we sample the iris data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "sample_size <- 25 # of each iris type\n",
    "total_size <- 50"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We note the sample_size as a hyperparameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$log_parameter(\"sample_size\", sample_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And actually sample the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])\n",
    "targets <- class.ind(c(\n",
    "  rep(\"s\", total_size),\n",
    "  rep(\"c\", total_size),\n",
    "  rep(\"v\", total_size))\n",
    ")\n",
    "samp <- c(\n",
    "  sample(1:total_size, sample_size),\n",
    "  sample((total_size + 1):(total_size * 2), sample_size),\n",
    "  sample(((total_size * 2) + 1):(total_size * 3), sample_size)\n",
    ")"
   ]
  },
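  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three `sample()` calls above each draw `sample_size` row indices from one block of `total_size` rows. As a rough sketch of the same idea, the indices can be built from block offsets and then checked. The names `offsets`, `samp_sketch`, `block`, and `test_rows` are made up for illustration, and `set.seed(42)` is an assumption added only so the sketch is repeatable:\n",
    "\n",
    "```r\n",
    "set.seed(42)       # assumption: fixed seed, only for repeatability\n",
    "total_size <- 50   # rows per species\n",
    "sample_size <- 25  # training rows per species\n",
    "\n",
    "# Offsets 0, 50, 100 mark where each species block starts.\n",
    "offsets <- (0:2) * total_size\n",
    "samp_sketch <- unlist(lapply(offsets, function(off) {\n",
    "  off + sample(total_size, sample_size)\n",
    "}))\n",
    "\n",
    "# Block id (1, 2, or 3) of every sampled row index.\n",
    "block <- (samp_sketch - 1) %/% total_size + 1\n",
    "table(block)       # 25 indices in each block\n",
    "\n",
    "# Rows that were not sampled form the held-out test set.\n",
    "test_rows <- setdiff(seq_len(3 * total_size), samp_sketch)\n",
    "length(test_rows)  # 75\n",
    "```\n",
    "\n",
    "Sampling within each block keeps the training set balanced across the three species, which a single `sample()` over all 150 rows would not guarantee."
   ]
  },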
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at one of the samples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".dl-inline {width: auto; margin:0; padding: 0}\n",
       ".dl-inline>dt, .dl-inline>dd {float: none; width: auto; display: inline-block}\n",
       ".dl-inline>dt::after {content: \":\\0020\"; padding-right: .5ex}\n",
       ".dl-inline>dt:not(:first-of-type) {padding-left: .5ex}\n",
       "</style><dl class=dl-inline><dt>Sepal L.</dt><dd>5.1</dd><dt>Sepal W.</dt><dd>3.5</dd><dt>Petal L.</dt><dd>1.4</dd><dt>Petal W.</dt><dd>0.2</dd></dl>\n"
      ],
      "text/latex": [
       "\\begin{description*}\n",
       "\\item[Sepal L.] 5.1\n",
       "\\item[Sepal W.] 3.5\n",
       "\\item[Petal L.] 1.4\n",
       "\\item[Petal W.] 0.2\n",
       "\\end{description*}\n"
      ],
      "text/markdown": [
       "Sepal L.\n",
       ":   5.1Sepal W.\n",
       ":   3.5Petal L.\n",
       ":   1.4Petal W.\n",
       ":   0.2\n",
       "\n"
      ],
      "text/plain": [
       "Sepal L. Sepal W. Petal L. Petal W. \n",
       "     5.1      3.5      1.4      0.2 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "ir[1,]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the sepal length, sepal width, petal length, and the petal width. What type of an Iris is it?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".dl-inline {width: auto; margin:0; padding: 0}\n",
       ".dl-inline>dt, .dl-inline>dd {float: none; width: auto; display: inline-block}\n",
       ".dl-inline>dt::after {content: \":\\0020\"; padding-right: .5ex}\n",
       ".dl-inline>dt:not(:first-of-type) {padding-left: .5ex}\n",
       "</style><dl class=dl-inline><dt>c</dt><dd>0</dd><dt>s</dt><dd>1</dd><dt>v</dt><dd>0</dd></dl>\n"
      ],
      "text/latex": [
       "\\begin{description*}\n",
       "\\item[c] 0\n",
       "\\item[s] 1\n",
       "\\item[v] 0\n",
       "\\end{description*}\n"
      ],
      "text/markdown": [
       "c\n",
       ":   0s\n",
       ":   1v\n",
       ":   0\n",
       "\n"
      ],
      "text/plain": [
       "c s v \n",
       "0 1 0 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "targets[1,]"
   ]
  },
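  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The column order of `targets` matters later, when we read the confusion matrix. As a minimal sketch (the name `onehot_sketch` and the three-element input are made up for illustration), `class.ind` creates one column per class and orders the columns by sorted class label, not by order of appearance:\n",
    "\n",
    "```r\n",
    "library(nnet)  # class.ind() is provided by nnet\n",
    "\n",
    "# One label per class, deliberately given in the order s, c, v.\n",
    "onehot_sketch <- class.ind(c(\"s\", \"c\", \"v\"))\n",
    "\n",
    "colnames(onehot_sketch)  # \"c\" \"s\" \"v\": sorted, not in input order\n",
    "max.col(onehot_sketch)   # 2 1 3: column index of the 1 in each row\n",
    "```\n",
    "\n",
    "So column 1 is **c**, column 2 is **s**, and column 3 is **v**, regardless of the order in which the labels were passed in."
   ]
  },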
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are three types (**c**, **s**, and **v**) and this is type **s**. These stand for versicolor, setosa, and virginica, respectively.\n",
    "\n",
    "## Train the Network\n",
    "\n",
    "Now, the hyperparameters for the actual experiment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "weight_decay <- 5e-4\n",
    "epochs <- 200\n",
    "hidden_layer_size <- 2\n",
    "initial_random_weight_range <- 0.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And log them as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$log_parameter(\"weight_decay\", weight_decay)\n",
    "exp$log_parameter(\"epochs\", epochs)\n",
    "exp$log_parameter(\"hidden_layer_size\", hidden_layer_size)\n",
    "exp$log_parameter(\"initial_random_weight_range\", initial_random_weight_range)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ideally, all we need to do next is:\n",
    "\n",
    "```r\n",
    "nnet(\n",
    "    ir[samp,],\n",
    "    targets[samp,],\n",
    "    size = hidden_layer_size,\n",
    "    rang = initial_random_weight_range,\n",
    "    decay = weight_decay,\n",
    "    maxit = epochs\n",
    ")\n",
    "```\n",
    "\n",
    "However, we wish to log the \"loss\" values from the training. Unfortunately, the loop that does the processing is in C. But we can grab the output, parse it, and then log it. So, a bit of code to do that.\n",
    "\n",
    "Now, we attempt to learn the categories using the `train` function, logging the metric \"loss\" (i.e., \"error\"):"
   ]
  },
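  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the parsing step on its own, here is a small sketch that uses only base R. The input lines are copied from a training run of this notebook; the names `lines_sketch`, `pattern`, `hits`, `steps`, and `losses` are made up for illustration. The notebook itself uses `stringr::str_match`, so treat this as an equivalent illustration rather than the code that runs below:\n",
    "\n",
    "```r\n",
    "# Sample lines as printed by nnet() during training.\n",
    "lines_sketch <- c(\n",
    "  \"# weights:  19\",\n",
    "  \"initial  value 55.212711 \",\n",
    "  \"iter  10 value 32.054406\",\n",
    "  \"iter  20 value 25.084979\",\n",
    "  \"final  value 16.960180 \",\n",
    "  \"converged\"\n",
    ")\n",
    "\n",
    "# Group 1 is the iteration number, group 2 is the loss value.\n",
    "pattern <- \"^iter[[:space:]]+([0-9]+)[[:space:]]+value[[:space:]]+([-+]?[0-9]*[.]?[0-9]+)\"\n",
    "\n",
    "# regexec() returns the full match plus both groups for matching lines.\n",
    "hits <- regmatches(lines_sketch, regexec(pattern, lines_sketch))\n",
    "hits <- hits[lengths(hits) == 3]  # drop lines that did not match\n",
    "\n",
    "steps <- as.integer(vapply(hits, `[`, character(1), 2))\n",
    "losses <- as.numeric(vapply(hits, `[`, character(1), 3))\n",
    "data.frame(step = steps, loss = losses)\n",
    "```\n",
    "\n",
    "Only the two `iter` lines survive the filter. Each row of the resulting data frame corresponds to one `exp$log_metric(\"loss\", ...)` call."
   ]
  },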
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# weights:  19\n",
      "initial  value 55.212711 \n",
      "iter  10 value 32.054406\n",
      "iter  20 value 25.084979\n",
      "iter  30 value 24.785996\n",
      "iter  40 value 18.337328\n",
      "iter  50 value 17.420288\n",
      "iter  60 value 17.196332\n",
      "iter  70 value 17.071140\n",
      "iter  80 value 17.025227\n",
      "iter  90 value 16.961082\n",
      "iter 100 value 16.960243\n",
      "final  value 16.960180 \n",
      "converged\n",
      "a 4-2-3 network with 19 weights\n",
      "options were - decay=5e-04\n"
     ]
    }
   ],
   "source": [
    "ir1 <- NULL\n",
    "\n",
    "train <- function() {\n",
    "  ir1 <<- nnet(\n",
    "    ir[samp,],\n",
    "    targets[samp,],\n",
    "    size = hidden_layer_size,\n",
    "    rang = initial_random_weight_range,\n",
    "    decay = weight_decay,\n",
    "    maxit = epochs)\n",
    "    ir1\n",
    "}\n",
    "\n",
    "output <- capture.output(train(), split = TRUE)\n",
    "output <- strsplit(output, \"\\n\")\n",
    "\n",
    "# \"initial  value 57.703088 \"\n",
    "for (match in str_match(output, \"^initial\\\\s+value\\\\s+([-+]?[0-9]*\\\\.?[0-9]+)\")[,2]) {\n",
    "  if (!is.na(match)) {\n",
    "     exp$log_metric(\"loss\", match, step=0)\n",
    "  }\n",
    "}\n",
    "\n",
    "# \"iter  10 value 46.803951\"\n",
    "matrix = str_match(output, \"^iter\\\\s+(\\\\d+)\\\\s+value\\\\s+([-+]?[0-9]*\\\\.?[0-9]+)\")\n",
    "for (i in 1:nrow(matrix)) {\n",
    "  match = matrix[i,]\n",
    "  if (!is.na(match[2])) {\n",
    "     exp$log_metric(\"loss\", match[3], step=match[2])\n",
    "  }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing the Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll test the trained model by creating a confusion matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "test.cl <- function(true, pred) {\n",
    "    true <- max.col(true)\n",
    "    cres <- max.col(pred)\n",
    "    table(true, cres)\n",
    "}\n",
    "cm <- test.cl(targets[-samp,], predict(ir1, ir[-samp,]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "    cres\n",
       "true  1  2  3\n",
       "   1 19  0  6\n",
       "   2  0 25  0\n",
       "   3  0  0 25"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "cm"
   ]
  },
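  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The confusion matrix can be reduced to summary numbers. The sketch below hard-codes the counts printed above, so a fresh run with a different random sample or different initial weights will give different values. The names `cm_sketch`, `accuracy`, and `per_class_recall` are made up for illustration:\n",
    "\n",
    "```r\n",
    "# Counts copied from the confusion matrix printed above\n",
    "# (rows = true class, columns = predicted class).\n",
    "cm_sketch <- matrix(c(19,  0,  6,\n",
    "                       0, 25,  0,\n",
    "                       0,  0, 25), nrow = 3, byrow = TRUE)\n",
    "\n",
    "# Correct predictions sit on the diagonal: (19 + 25 + 25) / 75.\n",
    "accuracy <- sum(diag(cm_sketch)) / sum(cm_sketch)\n",
    "\n",
    "# Fraction of each true class that was predicted correctly.\n",
    "per_class_recall <- diag(cm_sketch) / rowSums(cm_sketch)\n",
    "\n",
    "accuracy          # 0.92\n",
    "per_class_recall  # 0.76 1.00 1.00\n",
    "```\n",
    "\n",
    "A value like `accuracy` could be logged with `exp$log_metric`, in the same way as the loss values."
   ]
  },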
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make a slightly better visualization for this confusion matrix.\n",
    "\n",
    "First we pull out the matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "'[[19,0,6],[0,25,0],[0,0,25]]'"
      ],
      "text/latex": [
       "'{[}{[}19,0,6{]},{[}0,25,0{]},{[}0,0,25{]}{]}'"
      ],
      "text/markdown": [
       "'[[19,0,6],[0,25,0],[0,0,25]]'"
      ],
      "text/plain": [
       "[1] \"[[19,0,6],[0,25,0],[0,0,25]]\""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "matrix <- sprintf(\"[%s,%s,%s]\", \n",
    "                  sprintf(\"[%s]\", paste(cm[1,], collapse=\",\")),\n",
    "                  sprintf(\"[%s]\", paste(cm[2,], collapse=\",\")),\n",
    "                  sprintf(\"[%s]\", paste(cm[3,], collapse=\",\")))\n",
    "matrix"
   ]
  },
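  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above writes out one `sprintf` per row, which is fine for three classes. As a hedged sketch of how the same string could be built for any number of classes, `apply` can format each row in turn. The counts are hard-coded from the run shown in this notebook, and the names `cm_sketch`, `rows`, and `matrix_json` are made up for illustration:\n",
    "\n",
    "```r\n",
    "# Counts copied from the confusion matrix shown earlier.\n",
    "cm_sketch <- matrix(c(19,  0,  6,\n",
    "                       0, 25,  0,\n",
    "                       0,  0, 25), nrow = 3, byrow = TRUE)\n",
    "\n",
    "# Format every row as \"[a,b,c]\", whatever the number of columns.\n",
    "rows <- apply(cm_sketch, 1, function(r) {\n",
    "  sprintf(\"[%s]\", paste(r, collapse = \",\"))\n",
    "})\n",
    "\n",
    "# Join the rows into one JSON array of arrays.\n",
    "matrix_json <- sprintf(\"[%s]\", paste(rows, collapse = \",\"))\n",
    "matrix_json  # \"[[19,0,6],[0,25,0],[0,0,25]]\"\n",
    "```\n",
    "\n",
    "The result is the same string as the one produced above."
   ]
  },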
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And set some labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "title <- \"Iris Confusion Matrix\"\n",
    "labels <- sprintf('[\"%s\",\"%s\",\"%s\"]', \"Setosa\",\"Versicolor\",\"Virginica\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We put those together in a template of the JSON format for the Comet confusion matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "template <- '{\"version\":1,\"title\":\"%s\",\"labels\":%s,\"matrix\":%s,\"rowLabel\":\"Actual Category\",\"columnLabel\":\"Predicted Category\",\"maxSamplesPerCell\":25,\"sampleMatrix\":[],\"type\":\"integer\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We log the confusion matrix to a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "fp <- file(\"confusion_matrix.json\")\n",
    "writeLines(c(sprintf(template, title, labels, matrix)), fp)\n",
    "close(fp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$upload_asset(\"confusion_matrix.json\", type = \"confusion-matrix\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the Comet UI, you should see something like this on the Confusion Matrix tab:\n",
    "\n",
    "<img src=\"confusion-matrix.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And log some additional notes to the HTML tab on the Comet UI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$log_html(\"\n",
    "<h1>Comet nnet Example</h1>\n",
    "\n",
    "<p>This example demonstrates using the nnet library on the iris dataset.</p>\n",
    "\n",
    "<p>See the Output tab for confusion matrix.</p>\n",
    "\n",
    "<ul>\n",
    "<li><a href=https://github.com/comet-ml/cometr/blob/master/inst/train-examples/nnet-example.R>github.com/comet-ml/cometr/inst/train-example/nnet-example.R</a></li>\n",
    "</ul>\n",
    "\n",
    "<p>For help on the Comet R SDK, please see: <a href=https://www.comet.com/docs/r-sdk/getting-started/>www.comet.com/docs/r-sdk/getting-started/</a></p>\n",
    "\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mark the experiment as created by R:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$log_other(key = \"Created by\", value = \"cometr\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we show how you can display this experiment in the notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "url <- exp$get_url()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe width=900 height=900 src=https://www.comet.ml/dsblank/cometr/ae316b3138be4dfdb02b305b1fc438c4></iframe>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_html(sprintf(\"<iframe width=900 height=900 src=%s></iframe>\", url))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, save your notebook. We'll then upload it as an asset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "exp$upload_asset(\n",
    "  \"Comet-R-nnet.ipynb\",\n",
    "  type = \"notebook\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Comet experiment https://www.comet.ml/dsblank/cometr/ae316b3138be4dfdb02b305b1fc438c4 \n"
     ]
    }
   ],
   "source": [
    "exp$print()\n",
    "exp$stop()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}