<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Factors-in-R" data-toc-modified-id="Factors-in-R-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Factors in R</a></span><ul class="toc-item"><li><span><a href="#Creating-or-Converting" data-toc-modified-id="Creating-or-Converting-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Creating or Converting</a></span></li><li><span><a href="#Types-of-Categorical-variables" data-toc-modified-id="Types-of-Categorical-variables-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Types of Categorical variables</a></span></li><li><span><a href="#Factor-levels:" data-toc-modified-id="Factor-levels:-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Factor levels:</a></span></li><li><span><a href="#Ordered-factor" data-toc-modified-id="Ordered-factor-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Ordered factor</a></span></li><li><span><a href="#labels:" data-toc-modified-id="labels:-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>labels:</a></span></li><li><span><a href="#Generating-Factor-Levels(using-gl()-function)" data-toc-modified-id="Generating-Factor-Levels(using-gl()-function)-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Generating Factor Levels(using <code>gl()</code> function)</a></span></li><li><span><a href="#droplevels()" data-toc-modified-id="droplevels()-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span><code>droplevels()</code></a></span></li><li><span><a href="#working-with-is.factor(),-is.ordered(),-as.factor()-and-as.ordered()" data-toc-modified-id="working-with-is.factor(),-is.ordered(),-as.factor()-and-as.ordered()-1.8"><span class="toc-item-num">1.8&nbsp;&nbsp;</span>working with <code>is.factor()</code>, <code>is.ordered()</code>, <code>as.factor()</code> and <code>as.ordered()</code></a></span></li></ul></li><li><span><a href="#Date-and-Time" data-toc-modified-id="Date-and-Time-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Date and 
Time</a></span></li></ul></div>

# Factors in R

* A factor represents a categorical variable: one that takes a limited number of distinct values (categories)
* It is stored as a vector of integer codes, together with a `levels` attribute that maps each code to a label
* A factor's levels are always character values, even when the original data are numbers
* The `factor()` function creates a factor
* The `levels()` function returns the levels of a factor
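The points about integer storage and character levels can be checked directly. The sketch below uses a hypothetical `sizes` vector and a small numeric example. It also shows the usual pitfall when a factor is built from numbers.

```r
# Hypothetical example: a factor of T-shirt sizes
sizes <- factor(c("small", "large", "medium", "small"))
typeof(sizes)       # "integer": values are stored as integer codes
class(sizes)        # "factor"
levels(sizes)       # "large" "medium" "small" (sorted alphabetically by default)
as.integer(sizes)   # 3 1 2 3: each code is an index into levels(sizes)

# Levels are character even when the input is numeric
f <- factor(c(10, 5, 10))
levels(f)                    # "5" "10"
as.numeric(f)                # 2 1 2: the integer codes, NOT the original numbers
as.numeric(as.character(f))  # 10 5 10: go through the labels to recover the values
```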

## Creating or Converting

In [42]:
# To create factors in R, you make use of the function factor()
# Sex vector
sex_vect <- c("Male", "Female", "Female", "Male", "Male", "Female")
# Convert vector into a factor
factor_sex_vect <- factor(sex_vect)
print(factor_sex_vect)

[1] Male   Female Female Male   Male   Female
Levels: Female Male
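By default, `factor()` sorts the levels alphabetically. The sketch below passes `levels` explicitly to choose the order. It also shows a side effect of that choice: any value not listed in `levels` becomes `NA`. The misspelled `"female"` entry is a hypothetical example.

```r
sex_vect <- c("Male", "Female", "Female", "Male", "Male", "Female")
# Put "Male" first instead of the default alphabetical order
factor_sex_vect2 <- factor(sex_vect, levels = c("Male", "Female"))
levels(factor_sex_vect2)   # "Male" "Female"
table(factor_sex_vect2)    # Male: 3, Female: 3

# Values that are not listed in `levels` become NA
typo_vect <- c("Male", "Female", "female")
typo_factor <- factor(typo_vect, levels = c("Male", "Female"))
typo_factor                # the third value ("female") is NA
```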


## Types of Categorical variables

<img src='https://i.imgur.com/BoFjjmM.jpg' width="850" height="650"><br><br>

## Factor levels:
* Assign the levels(using argument) while creating factor or change the names of these levels using `levels()` function. 
* Post check the levels using `levels()` function
* There is also another function `nlevels()` to check the number of levels

In [2]:
# Animals without order
animals_vector <- c("Elephant", "Dog", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
print(factor_animals_vector)
print(levels(factor_animals_vector))
nlevels(factor_animals_vector) # to check the number of levels
# Temperature with order
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
nlevels(factor_temperature_vector)

[1] Elephant Dog      Donkey   Horse   
Levels: Dog Donkey Elephant Horse
[1] "Dog"      "Donkey"   "Elephant" "Horse"   
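The cell above only reads the levels. Here is a short sketch of renaming them by assigning to `levels()`. The new names replace the old levels by position, in the order returned by `levels()`, not in the order the values appear in the data.

```r
animals <- factor(c("Elephant", "Dog", "Donkey", "Horse"))
levels(animals)   # "Dog" "Donkey" "Elephant" "Horse"

# Rename: the i-th new name replaces the i-th existing level
levels(animals) <- c("dog", "donkey", "elephant", "horse")
animals           # elephant dog donkey horse
nlevels(animals)  # 4
```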


## Ordered factor
* An "ordered" factor is a factor whose levels have a meaningful order, so comparison operators such as `<` and `>` can be used on its values
* Create ordered factors with the `ordered()` function, or with `factor()` and the argument `ordered = TRUE`

In [44]:
# Temperature with order
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
print(factor_temperature_vector)
# levels(factor_vector) <- c("name1", "name2",...)  # syntax to rename the levels
print(levels(factor_temperature_vector))

[1] High   Low    High   Low    Medium
Levels: Low < Medium < High
[1] "Low"    "Medium" "High"  
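The same factor can be built with `ordered()`, which is shorthand for `factor(..., ordered = TRUE)`. Because the levels are ordered, sorting, `max()` and comparisons against a level name all follow the Low < Medium < High order rather than the alphabetical one.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
temps <- ordered(temperature_vector, levels = c("Low", "Medium", "High"))

# Same result as factor(..., ordered = TRUE)
identical(temps, factor(temperature_vector,
                        levels = c("Low", "Medium", "High"), ordered = TRUE))  # TRUE

sort(temps)         # Low Low Medium High High
max(temps)          # High
temps >= "Medium"   # TRUE FALSE TRUE FALSE TRUE
```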


In [45]:
# High
High <- factor_temperature_vector[1]

# Low
Low <- factor_temperature_vector[2]

# Is High greater than Low? The same comparison makes sense for other ordered scales, such as light < heavy or agree < strongly agree, depending on the data
High > Low
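For contrast, order comparisons are not defined for an unordered factor. In this sketch, which uses a hypothetical two-element factor, R issues a warning and returns `NA` for `>`. Equality tests still work.

```r
animals <- factor(c("Elephant", "Dog"))   # unordered factor

# R warns that '>' is not meaningful for factors and returns NA
result <- animals[1] > animals[2]
is.na(result)            # TRUE

# Equality comparisons are fine on unordered factors
animals[1] == "Elephant" # TRUE
```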

## labels:
* Assign display names to the levels with the `labels` argument; the labels are matched to the levels in order
* Descriptive labels are better than bare integer codes because the factor becomes self-describing: `"low", "medium",` and `"high"` say more than `1, 2, 3`

In [46]:
data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)
fdata <- factor(data)
print(fdata) # without labels 
rdata <- factor(data, labels = c("Low","Medium","High"))
print(rdata) # with labels

 [1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
 [1] Low    Medium Medium High   Low    Medium High   High   Low    Medium
[11] High   High   Low   
Levels: Low Medium High
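Because labels are matched to the levels by position, the default alphabetical ordering of levels can silently mislabel data. In this sketch the hypothetical codes `"L"`, `"M"` and `"H"` sort as `"H" "L" "M"`, so the labels only line up once `levels` is given explicitly.

```r
codes <- c("L", "H", "M", "L")

# Default levels are sorted: "H" "L" "M", so "H" receives the first label, "Low"
wrong <- factor(codes, labels = c("Low", "Medium", "High"))
as.character(wrong)   # "Medium" "Low" "High" "Medium"

# Give the levels explicitly so each label matches the intended code
right <- factor(codes, levels = c("L", "M", "H"),
                labels = c("Low", "Medium", "High"))
as.character(right)   # "Low" "High" "Medium" "Low"
levels(right)         # "Low" "Medium" "High"
```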


##  Generating Factor Levels(using `gl()` function)

The `gl()` function generates a factor from a regular pattern of levels.  
  
`gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)`
  
`n`: number of levels  
`k`: number of consecutive repetitions of each level  
`length`: length of the result (the pattern is recycled if needed)  
`labels`: labels for the resulting factor levels  
`ordered`: whether the result should be an ordered factor  
  
  
`gl(3,2,labels = c("green","red","yellow"))`  
[1] green  green  red    red    yellow yellow  
Levels: green red yellow  
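A few more calls show how `k`, `length` and `ordered` change the output. With `k = 1` and a larger `length`, the sequence of levels is recycled until the requested length is reached.

```r
g1 <- gl(2, 3)                                    # 1 1 1 2 2 2 (default labels 1:n)
g2 <- gl(2, 1, length = 6, labels = c("A", "B"))  # A B A B A B: pattern recycled
g3 <- gl(3, 1, labels = c("Low", "Medium", "High"), ordered = TRUE)
g3                                                # Levels: Low < Medium < High
```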

In [47]:
# Using gl() in a data frame: the first 50 patients get "Treatment" and the remaining 50 get "Control"
clinical.trial <-
    data.frame(patient = 1:100,
               age = rnorm(100, mean = 60, sd = 6),
               treatment = gl(2, 50,
                 labels = c("Treatment", "Control")),
               center = sample(paste("Center", LETTERS[1:5]), 100, replace = TRUE)) 
print(head(clinical.trial,20))

   patient      age treatment   center
1        1 56.37911 Treatment Center B
2        2 72.05500 Treatment Center D
3        3 54.26399 Treatment Center D
4        4 59.67424 Treatment Center A
5        5 58.36585 Treatment Center C
6        6 50.42562 Treatment Center C
7        7 61.84052 Treatment Center C
8        8 57.49042 Treatment Center C
9        9 59.41043 Treatment Center C
10      10 57.12441 Treatment Center D
11      11 58.12204 Treatment Center E
12      12 59.25759 Treatment Center C
13      13 51.23221 Treatment Center E
14      14 56.24048 Treatment Center E
15      15 55.28236 Treatment Center D
16      16 62.27550 Treatment Center A
17      17 64.87295 Treatment Center C
18      18 55.22675 Treatment Center C
19      19 56.19110 Treatment Center C
20      20 67.15331 Treatment Center C


## `droplevels()`
* Removing observations does not remove their level: a factor keeps its full set of levels even when no observation of a particular level remains. `droplevels()` removes these unused levels, either from a single factor or from every factor column of a data frame (see below)

In [48]:
aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
print(levels(aq$Month))
aq <- subset(aq, Month != "Jul")
print(levels(aq$Month)) # still the same levels
table(aq$Month) # even though one level has 0 entries!
table(droplevels(aq)$Month)

[1] "May" "Jun" "Jul" "Aug" "Sep"
[1] "May" "Jun" "Jul" "Aug" "Sep"



May Jun Jul Aug Sep 
 31  30   0  31  30 


May Jun Aug Sep 
 31  30  31  30 
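The same behaviour applies to a single factor vector. This sketch shows three ways to get rid of an unused level: `droplevels()`, calling `factor()` again, and indexing with `drop = TRUE`.

```r
full <- factor(c("a", "b", "c"))
f <- full[1:2]                # subsetting keeps level "c"
levels(f)                     # "a" "b" "c"
table(f)                      # "c" is listed with a count of 0

levels(droplevels(f))         # "a" "b"
levels(factor(f))             # "a" "b": re-running factor() also drops unused levels
levels(full[1:2, drop = TRUE])  # "a" "b": drop unused levels while indexing
```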

## working with `is.factor()`, `is.ordered()`, `as.factor()` and `as.ordered()` 
* `is.factor()` and `is.ordered()` check whether an object is a factor or an ordered factor
* `as.factor()` and `as.ordered()` convert a vector into a factor or an ordered factor

In [49]:
is.factor(temperature_vector)          # FALSE: a plain character vector
is.ordered(temperature_vector)         # FALSE
is.factor(factor_temperature_vector)   # TRUE
is.ordered(factor_temperature_vector)  # TRUE
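The cell above only uses the `is.*()` checks. Here is a sketch of the two conversion functions. An ordered factor also counts as a factor. `as.ordered()` on a character vector orders the levels alphabetically, which is rarely the intended order, so `factor(..., levels = , ordered = TRUE)` is the safer choice when the order matters.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

f <- as.factor(temperature_vector)
is.factor(f)    # TRUE
is.ordered(f)   # FALSE

o <- as.ordered(temperature_vector)
is.ordered(o)   # TRUE
is.factor(o)    # TRUE: an ordered factor is also a factor
levels(o)       # "High" "Low" "Medium": alphabetical, probably not the intended order

# To control the order, pass the levels explicitly
o2 <- factor(temperature_vector, levels = c("Low", "Medium", "High"), ordered = TRUE)
levels(o2)      # "Low" "Medium" "High"
```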

# Date and Time

![Imgur](https://i.imgur.com/2RMPVpT.png)
<sub>source: <a href="https://tbrieder.org/epidata/course_reading/e_aragon.pdf" target="_blank">https://tbrieder.org/epidata/course_reading/e_aragon.pdf</a></sub>  
<sub>source: <a href="http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/vars.pdf" target="_blank">http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/vars.pdf</a></sub>  

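* `strptime()`: parses character strings into `POSIXlt` date-times using a format specification such as `"%Y-%m-%d %H:%M:%S"`
* `as.POSIXct()` and `as.POSIXlt()`: convert to the `POSIXct` (seconds since 1970-01-01 UTC) and `POSIXlt` (a list of components such as year, month and hour) date-time classes
* `as.Date()`: converts to the `Date` class, which stores calendar dates without a time of day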
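A minimal sketch of the date-time functions listed above, using a hypothetical timestamp and an explicit `tz = "UTC"` so the result does not depend on the local time zone. Note that `POSIXlt` counts years from 1900 and months from 0.

```r
# Parse a date-time string into a POSIXlt object (a list of components)
lt <- strptime("2003-08-19 14:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(lt)          # "POSIXlt" "POSIXt"
lt$hour            # 14
lt$year            # 103: years since 1900
lt$mon             # 7: months are counted from 0, so 7 is August

# POSIXct stores the same instant as seconds since 1970-01-01 UTC
ct <- as.POSIXct(lt)
class(ct)          # "POSIXct" "POSIXt"
format(ct, "%H:%M")  # "14:30"
as.Date(ct)        # "2003-08-19"
```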

In [5]:
help(strptime)  # lists the conversion specifications (%Y, %m, %d, %H, ...) used in format strings

In [4]:
myDays <- c("10/11/1945", "8/19/2003", "5/15/1964")
myDates <- as.Date(myDays, format = "%m/%d/%Y")
myDates  # "1945-10-11" "2003-08-19" "1964-05-15"
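Once the strings are `Date` objects, they can be sorted, shifted and subtracted directly. This sketch reuses the same three dates. Adding a number adds days, and subtracting two dates gives a `difftime` in days.

```r
myDates <- as.Date(c("10/11/1945", "8/19/2003", "5/15/1964"), format = "%m/%d/%Y")

sort(myDates)              # "1945-10-11" "1964-05-15" "2003-08-19"
myDates[1] + 30            # "1945-11-10": adding a number adds days
format(myDates, "%Y")      # "1945" "2003" "1964"

gap <- as.Date("2003-08-19") - as.Date("2003-08-01")
gap                        # Time difference of 18 days
as.numeric(gap)            # 18

as.numeric(as.Date("1970-01-02"))  # 1: Dates are stored as days since 1970-01-01
```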

<span style="color:red; font-family:Comic Sans MS">References</span>  
<a href="https://blogisticreflections.wordpress.com/2009/09/21/r-function-of-the-day-table/" target="_blank">https://blogisticreflections.wordpress.com/2009/09/21/r-function-of-the-day-table/</a>  
<a href="http://www.endmemo.com/program/R/gl.php" target="_blank">http://www.endmemo.com/program/R/gl.php</a>  

<a href="https://data-flair.training/blogs/r-factor-functions/" target="_blank">https://data-flair.training/blogs/r-factor-functions/</a>