---
title: "Lecture 2 – Importing and handling data I"
author: "Dr Nicolaas Puts and Dr Lucas França"
date: "10 October 2022"
output:
  html_document:
    df_print: paged
  pdf_document: default
---

# Data types

The R data types or classes are

- logical (or boolean)
- integer
- numeric
- complex
- character
- raw (we are not going to use this one in the course)

## 1. Logical

```{r}
logic1 <- TRUE
logic2 <- FALSE  # named logic2 rather than log2, which is also a base R function
```

## 2. Integer

```{r}
n <- 24L  # the L suffix makes this an integer; a plain 24 is stored as numeric
```

## 3. Numeric

```{r}
dist <- 3.5
```

## 4. Complex

```{r}
cmp <- 4+5i
```

## 5. Character

```{r}
fruit <- "Apple"
```

## 6. Raw

```{r}
raw_fruit <- charToRaw("Apple")
```

# Data structures

## String

```{r}
fruits <- 'Apple and Banana'
fruits2 <- 'and Grapes'
nchar(fruits)           # number of characters in the string
paste(fruits, fruits2)  # join the two strings, separated by a space
```

```{r}
fruits == fruits2
```

```{r}
fruits == fruits
```

## Vectors

```{r}
fruits_vec <- c('Apple', 'Banana', 'Grape')
fruits_vec[1]              # first element (R indexing starts at 1)
fruits_vec[1] <- 'Orange'  # replace the first element
```

## Matrix

```{r}
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
matrix1[1,2]  # element in row 1, column 2
matrix1[,2]   # every row of column 2
```

## Arrays

```{r}
array1 <- array(c(1:12), dim = c(2,3,2))
```

## Dataframe

```{r}
dataframe1 <- data.frame(
  Name = c("Anne", "Matthew", "Simon"),
  Age = c(21, 18, 19),
  Vote = c(TRUE, FALSE, TRUE)
)
```

## Tibbles

```{r}
#install.packages("tidyverse")
library(tidyverse)
```

```{r}
tibble1 <- tibble(
  Name = c("Anne", "Matthew", "Simon"),
  Age = c(21, 18, 19),
  Vote = c(TRUE, FALSE, TRUE)
)
```

```{r}
tibble1$Name
```

```{r}
# convert an existing data frame to a tibble
as_tibble(dataframe1)
```

# Loading data

## Loading csv

```{r}
# store the URL of a csv file called TutorialData in urlfile
urlfile <- 'https://raw.githubusercontent.com/HeJasonL/BIG-TAC/main/TutorialData.csv'
# read the csv file at that URL into a new data frame called tutorial_data
tutorial_data <- read.csv(urlfile)
```

## Loading SPSS

```{r}
library(foreign)  # provides read.spss(); not attached by library(tidyverse)
# store the URL of an SPSS file called researchers_sample in urlfile
urlfile <- 'https://github.com/fans-stats/intro_to_R_stats/blob/main/datasets/researchers_sample.SAV?raw=true'
# read the SPSS file at that URL into a data frame called tutorial_data
tutorial_data <- read.spss(urlfile, to.data.frame = TRUE)
# convert to a tibble (as_tibble() replaces the deprecated as.tibble())
tutorial_data <- as_tibble(tutorial_data)
```

# Handling missing data

```{r}
# mark the first value of the id column as missing
tutorial_data$id[1] <- NA
```

# A simple task

A sample of 25 people has a mean height of 180 cm with an SD of 30 cm. Use R to calculate the 95% CI of the mean and put the CI into a variable.

```{r}
# mean ± 1.96 standard errors, where the standard error is SD / sqrt(n)
CI <- c((180 - 1.96 * (30/sqrt(25))), (180 + 1.96 * (30/sqrt(25))))
```
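
The same calculation can be sketched with named variables, which makes it easier to reuse with other samples. The variable names below (`n_people`, `mean_height`, `sd_height`, and so on) are illustrative choices, not names used elsewhere in the lecture. The t-based interval at the end is an addition for comparison: with only 25 people it is slightly wider than the interval based on the normal critical value.

```{r}
# Sample summary taken from the task description
n_people    <- 25   # sample size
mean_height <- 180  # sample mean, in cm
sd_height   <- 30   # sample standard deviation, in cm

# Standard error of the mean: SD / sqrt(n)
se_height <- sd_height / sqrt(n_people)

# Normal critical value for a 95% interval (about 1.96)
z_crit <- qnorm(0.975)
ci_z <- c(mean_height - z_crit * se_height,
          mean_height + z_crit * se_height)
ci_z  # roughly 168.2 to 191.8

# For comparison: the t critical value with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n_people - 1)
ci_t <- c(mean_height - t_crit * se_height,
          mean_height + t_crit * se_height)
ci_t  # roughly 167.6 to 192.4
```

Changing `n_people`, `mean_height` or `sd_height` at the top is enough to recompute both intervals for a different sample.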