{
  "cells": [
    {
      "metadata": {
        "toc": true
      },
      "cell_type": "markdown",
      "source": "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Factors-in-R\" data-toc-modified-id=\"Factors-in-R-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Factors in R</a></span><ul class=\"toc-item\"><li><span><a href=\"#Creating-or-Converting\" data-toc-modified-id=\"Creating-or-Converting-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Creating or Converting</a></span></li><li><span><a href=\"#Types-of-Categorical-variables\" data-toc-modified-id=\"Types-of-Categorical-variables-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Types of Categorical variables</a></span></li><li><span><a href=\"#Factor-levels:\" data-toc-modified-id=\"Factor-levels:-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Factor levels:</a></span></li><li><span><a href=\"#Ordered-factor\" data-toc-modified-id=\"Ordered-factor-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Ordered factor</a></span></li><li><span><a href=\"#labels:\" data-toc-modified-id=\"labels:-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>labels:</a></span></li><li><span><a href=\"#Generating-Factor-Levels(using-gl()-function)\" data-toc-modified-id=\"Generating-Factor-Levels(using-gl()-function)-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>Generating Factor Levels(using <code>gl()</code> function)</a></span></li><li><span><a href=\"#droplevels()\" data-toc-modified-id=\"droplevels()-1.7\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span><code>droplevels()</code></a></span></li><li><span><a href=\"#working-with-is.factor(),-is.ordered(),-as.factor()-and-as.ordered()\" data-toc-modified-id=\"working-with-is.factor(),-is.ordered(),-as.factor()-and-as.ordered()-1.8\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>working with <code>is.factor()</code>, <code>is.ordered()</code>, <code>as.factor()</code> and <code>as.ordered()</code></a></span></li></ul></li><li><span><a href=\"#Date-and-Time\" data-toc-modified-id=\"Date-and-Time-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Date and Time</a></span></li></ul></div>"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Factors in R"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* List of all possible values of a variable in string format (special string type)\n* Limited number of different values(categories) \n* Stored as a vector of integer values \n* factor's levels will always be character values\n* `factor()` function is used to create a factor\n* `levels()` function can be useful to check the levels of factor variable "
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Creating or Converting"
    },
    {
      "metadata": {
        "scrolled": true,
        "trusted": false
      },
      "cell_type": "code",
      "source": "# To create factors in R, you make use of the function factor()\n# Sex vector\nsex_vect <- c(\"Male\", \"Female\", \"Female\", \"Male\", \"Male\", \"Female\")\n# Convert vector into a factor\nfactor_sex_vect <- factor(sex_vect)\nprint(factor_sex_vect)",
      "execution_count": 42,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "[1] Male   Female Female Male   Male   Female\nLevels: Female Male\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Types of Categorical variables"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "<img src='https://i.imgur.com/BoFjjmM.jpg' width=\"850\" height=\"650\"><br><br>"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Factor levels:\n* Assign the levels(using argument) while creating factor or change the names of these levels using `levels()` function. \n* Post check the levels using `levels()` function\n* There is also another function `nlevels()` to check the number of levels"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# Animals without order\nanimals_vector <- c(\"Elephant\", \"Dog\", \"Donkey\", \"Horse\")\nfactor_animals_vector <- factor(animals_vector)\nprint(factor_animals_vector)\nprint(levels(factor_animals_vector))\nnlevels(factor_animals_vector) # to check the number of levels\n# Temperature with order\ntemperature_vector <- c(\"High\", \"Low\", \"High\",\"Low\", \"Medium\")\nfactor_temperature_vector <- factor(temperature_vector, order = TRUE, levels = c(\"Low\", \"Medium\", \"High\"))\nnlevels(factor_temperature_vector)",
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": "[1] Elephant Dog      Donkey   Horse   \nLevels: Dog Donkey Elephant Horse\n[1] \"Dog\"      \"Donkey\"   \"Elephant\" \"Horse\"   \n",
          "name": "stdout"
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "[1] 4",
            "text/latex": "4",
            "text/markdown": "4",
            "text/html": "4"
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": "[1] 3",
            "text/latex": "3",
            "text/markdown": "3",
            "text/html": "3"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Ordered factor\n* An \"ordered\" factor is a factor whose levels have a particular order\n* Create ordered factors with the `ordered()` command, or by using `factor()` with the `ordered=TRUE` argument"
    },
    {
      "metadata": {
        "scrolled": true,
        "trusted": false
      },
      "cell_type": "code",
      "source": "# Temperature with order\ntemperature_vector <- c(\"High\", \"Low\", \"High\",\"Low\", \"Medium\")\nfactor_temperature_vector <- factor(temperature_vector, order = TRUE, levels = c(\"Low\", \"Medium\", \"High\"))\nprint(factor_temperature_vector)\n# levels(factor_vector) <- c(\"name1\", \"name2\",...)  # syntax to assign the levels\nprint(levels(factor_temperature_vector))",
      "execution_count": 44,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "[1] High   Low    High   Low    Medium\nLevels: Low < Medium < High\n[1] \"Low\"    \"Medium\" \"High\"  \n"
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "# High\nHigh <- factor_temperature_vector[1]\n\n# Low\nLow <- factor_temperature_vector[2]\n\n# check whether High greater than Low? similarly we can use for heavier, larger, faster, strongly agree depends on the context of the data \nHigh > Low",
      "execution_count": 45,
      "outputs": [
        {
          "data": {
            "text/html": "TRUE",
            "text/latex": "TRUE",
            "text/markdown": "TRUE",
            "text/plain": "[1] TRUE"
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## labels:\n* Assign the labels through the labels= argument\n* They are better than using simple integer labels because factors are self describing: `\"low\", \"medium\",` and `\"high\"` is more descriptive than `1, 2, 3`"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "data = c(1,2,2,3,1,2,3,3,1,2,3,3,1)\nfdata = factor(data)\nprint(fdata) # without labels \nrdata = factor(data,labels=c(\"Low\",\"Medium\",\"High\"))\nprint(rdata) # with labels",
      "execution_count": 46,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": " [1] 1 2 2 3 1 2 3 3 1 2 3 3 1\nLevels: 1 2 3\n [1] Low    Medium Medium High   Low    Medium High   High   Low    Medium\n[11] High   High   Low   \nLevels: Low Medium High\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "##  Generating Factor Levels(using `gl()` function)"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "`gl()` function generates factors by specifying the pattern of their levels.  \n  \n`gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)`\n  \n`n`: number of levels  \n`k`: number of replications  \n`length`: length of the result  \n`labels`: labels for the resulting factor levels  \n`ordered`: whether the result sould be ordered or not  \n  \n  \n`gl(3,2,labels = c(\"green\",\"red\",\"yellow\"))`  \n[1] green  green  red    red    yellow yellow  \nLevels: green red yellow  "
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "# usage of gl function in data frame \nclinical.trial <-\n    data.frame(patient = 1:100,\n               age = rnorm(100, mean = 60, sd = 6),\n               treatment = gl(2, 50,\n                 labels = c(\"Treatment\", \"Control\")),\n               center = sample(paste(\"Center\", LETTERS[1:5]), 100, replace = TRUE)) \nprint(head(clinical.trial,20))",
      "execution_count": 47,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "   patient      age treatment   center\n1        1 56.37911 Treatment Center B\n2        2 72.05500 Treatment Center D\n3        3 54.26399 Treatment Center D\n4        4 59.67424 Treatment Center A\n5        5 58.36585 Treatment Center C\n6        6 50.42562 Treatment Center C\n7        7 61.84052 Treatment Center C\n8        8 57.49042 Treatment Center C\n9        9 59.41043 Treatment Center C\n10      10 57.12441 Treatment Center D\n11      11 58.12204 Treatment Center E\n12      12 59.25759 Treatment Center C\n13      13 51.23221 Treatment Center E\n14      14 56.24048 Treatment Center E\n15      15 55.28236 Treatment Center D\n16      16 62.27550 Treatment Center A\n17      17 64.87295 Treatment Center C\n18      18 55.22675 Treatment Center C\n19      19 56.19110 Treatment Center C\n20      20 67.15331 Treatment Center C\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## `droplevels()`\n* we can drop the Unused Levels from the Factors using 'droplevels()' function, since the factor list sticks around even if you remove some data such that no examples of a particular level still exist (see below)"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))\nprint(levels(aq$Month))\naq <- subset(aq, Month != \"Jul\")\nprint(levels(aq$Month)) # still the same levels\ntable(aq$Month) # even though one level has 0 entries!\ntable(droplevels(aq)$Month)",
      "execution_count": 48,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "[1] \"May\" \"Jun\" \"Jul\" \"Aug\" \"Sep\"\n[1] \"May\" \"Jun\" \"Jul\" \"Aug\" \"Sep\"\n"
        },
        {
          "data": {
            "text/plain": "\nMay Jun Jul Aug Sep \n 31  30   0  31  30 "
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": "\nMay Jun Aug Sep \n 31  30  31  30 "
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## working with `is.factor()`, `is.ordered()`, `as.factor()` and `as.ordered()` \n* to check whethr it is factor or ordered factor  \n* to convert the vectors into factor vector or ordered factors   "
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "is.factor(temperature_vector)\nis.ordered(temperature_vector)\nis.factor(factor_temperature_vector)\nis.ordered(factor_temperature_vector)",
      "execution_count": 49,
      "outputs": [
        {
          "data": {
            "text/html": "FALSE",
            "text/latex": "FALSE",
            "text/markdown": "FALSE",
            "text/plain": "[1] FALSE"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": "FALSE",
            "text/latex": "FALSE",
            "text/markdown": "FALSE",
            "text/plain": "[1] FALSE"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": "TRUE",
            "text/latex": "TRUE",
            "text/markdown": "TRUE",
            "text/plain": "[1] TRUE"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": "TRUE",
            "text/latex": "TRUE",
            "text/markdown": "TRUE",
            "text/plain": "[1] TRUE"
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Date and Time"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "![Imgur](https://i.imgur.com/2RMPVpT.png)\n<sub>source: <a href=\"https://tbrieder.org/epidata/course_reading/e_aragon.pdf\" target=\"_blank\">https://tbrieder.org/epidata/course_reading/e_aragon.pdf</a></sub>  \n<sub>source: <a href=\"http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/vars.pdf\" target=\"_blank\">http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/vars.pdf</a></sub>  "
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* `strptime()`\n*  `POSIXct()` and `POSIXlt()`\n* `as.Date()`"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "help(strptime) #- to get conversion formats see .",
      "execution_count": 5,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "myDays <- c(\"10/11/1945\", \"8/19/2003\", \"5/15/1964\")\nmyDates<- as.Date(myDays, format = \"%m/%d/%Y\")\nmyDates",
      "execution_count": 4,
      "outputs": [
        {
          "data": {
            "text/html": "<ol class=list-inline>\n\t<li><time datetime=\"1945-10-11\">1945-10-11</time></li>\n\t<li><time datetime=\"2003-08-19\">2003-08-19</time></li>\n\t<li><time datetime=\"1964-05-15\">1964-05-15</time></li>\n</ol>\n",
            "text/latex": "\\begin{enumerate*}\n\\item 1945-10-11\n\\item 2003-08-19\n\\item 1964-05-15\n\\end{enumerate*}\n",
            "text/markdown": "1. 1945-10-11\n2. 2003-08-19\n3. 1964-05-15\n\n\n",
            "text/plain": "[1] \"1945-10-11\" \"2003-08-19\" \"1964-05-15\""
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "<span style=\"color:red; font-family:Comic Sans MS\">References</span>  \n<a href=\"https://blogisticreflections.wordpress.com/2009/09/21/r-function-of-the-day-table/\" target=\"_blank\">https://blogisticreflections.wordpress.com/2009/09/21/r-function-of-the-day-table/</a>  \n<a href=\"http://www.endmemo.com/program/R/gl.php\" target=\"_blank\">http://www.endmemo.com/program/R/gl.php</a>  "
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "https://data-flair.training/blogs/r-factor-functions/?fbclid=IwAR2VrzHLFSR3JYjmCCMR9KqE-Yz71KiOE6C_5QVhKXWfJcl8cI7rAxwuQro"
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "r",
      "display_name": "R",
      "language": "R"
    },
    "language_info": {
      "mimetype": "text/x-r-source",
      "name": "R",
      "pygments_lexer": "r",
      "version": "3.4.1",
      "file_extension": ".r",
      "codemirror_mode": "r"
    },
    "toc": {
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": false,
      "base_numbering": 1,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": true,
      "toc_position": {},
      "toc_section_display": "block",
      "toc_window_display": false
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}