--- title: "BiCG R Notebook" output: html_notebook --- ### Intro to R R is used to manipulate various kinds of data. Data objects can be given names, called variables: ```{r} 5 + 6 number1 <- 5 number2 <- 6 number1 + number2 number.sum <- number1 + number2 # print the content of a data object or variable to screen print(number.sum) # typing just the name of the object also prints it to screen number.sum ``` Data in R are manipulated primarily with functions. Functions are called by typing the name of the function followed by round brackets. Most functions require arguments, or options, to be specified in the brackets. For `print(number)` used above, `print` is the name of the function, and `number.sum` is the variable option that we want to print. If you are not sure what a specific function does, or if you need a reminder on what the arguments for a funtion are, you can view the help page by typing `?function.name` without the round brackets: ```{r} ?print ``` Note: The help page will open in the "Help" tab in the Files/Plots/Packages/Help/Viewer pane. ### Getting Around #### The Hard Way ```{r} # get the current working directory: getwd() # set a new working directory: setwd("~/workspace/Review_Session") # to set a working directory on your own computer, you can do one of the following: # (these commands are commented out so that these code chunks will run on AWS RStudio) # setwd("C:/myPATH") # on Windows # setwd("~/myPATH") # on Mac # setwd("/Users/david/myPATH") # on Mac # list the files in the current working directory: list.files() # list objects in the current R session: ls() ``` #### The Easy Way In RStudio we can use "Session" > "Set Working Directory" > "Choose Directory". ### Data Types #### Vectors Vectors contain multiple pieces of data. The elements of a vector must all be of the same type (numeric, logical, character). The `c()` function combines the items inside the round brackets and can be used to create a new vector: ```{r} numeric.vector <- c(1,2,3,4,5,6,2,1) numeric.vector character.vector <- c("Fred", "Barney", "Wilma", "Betty") character.vector logical.vector <- c(TRUE, TRUE, FALSE, TRUE) logical.vector ``` To refer to specific elements in the vector, use square brackets. Because vectors are one dimensional (unlike a two-dimensional matrix), only a single number can be specified in the brackets. If you want more than one element, you can specify a range (with `:`) or a vector (with `c()`): ```{r} character.vector character.vector[2] character.vector[2:3] character.vector[c(2,4)] ``` #### Matrices You can create a 3x4 numeric matrix with: ```{r} matrix.example <- matrix(1:12, nrow = 3, ncol=4, byrow = FALSE) matrix.example matrix.example <- matrix(1:12, nrow = 3, ncol=4, byrow = TRUE) matrix.example ``` Alternatively, you can create a matrix by combining vectors: ```{r} dataset.a <- c(1,22,3,4,5) dataset.b <- c(10,11,13,14,15) dataset.a dataset.b rbind.together <- rbind(dataset.a, dataset.b) rbind.together cbind.together <- cbind(dataset.a, dataset.b) cbind.together ``` To get elements of the matrix, square brackets are used again, but two dimensions must be specified. To do this, a comma is used, where the number before the comma is the row number and the number after the comma is the column number. If you want to specify all of the rows, leave the space before the comma blank. If you want to specify all of the columns, leave the space after the comma blank: ```{r} matrix.example[2,4] matrix.example[2,] matrix.example[,4] ``` You can add column and row names to the matrix and use the new names to get the elements of the matrix: ```{r} colnames(matrix.example) <- c("Sample1","Sample2","Sample3","Sample4") rownames(matrix.example) <- paste("gene",1:3,sep="_") matrix.example matrix.example[,"Sample2"] matrix.example[1,"Sample2"] matrix.example["gene_1","Sample2"] ``` Note that all columns in a matrix must have the same type(numeric, character, etc.) and the same length. #### Dataframes Dataframes are similar to matrices, but different columns can have different types (numeric, character, factor, etc.): ```{r} people.summary <- data.frame( age = c(30,29,25,25), names = c("Fred", "Barney", "Wilma", "Betty"), gender = c("m", "m", "f", "f") ) people.summary ``` Getting elements of a dataframe is similar to getting elements of a matrix: ```{r} people.summary[2,1] people.summary[2,] people.summary[,1] ``` An easier way to specify a dataframe column by name is by using `$`: ```{r} people.summary$age ``` #### Lists Lists gather together a collection of objects under one name: ```{r} together.list <- list( vector.example = dataset.a, matrix.example = matrix.example, data.frame.example = people.summary ) together.list ``` There are several ways to get elements of a list: ```{r} together.list$matrix.example together.list$matrix.example[,3] together.list["matrix.example"] together.list[["matrix.example"]] together.list[["matrix.example"]][,2] ``` ### Reading Data In We use `read.data` or `read.csv` to read in data: ```{r} gene_example <- read.csv("Gene_R_Example.csv") ``` In RStudio, we can use the "File" navigation window instead. Navigate to the directory containing the Gene_R_example.csv that we downloaded previously. Click on the file name then click "Import Dataset." A new window appears allowing you to modify attributes of your file. Rename the file to the object "gene_example". Commands like `head` and `tail` also work in R. `View` will open the dataframe in a new window: ```{r} head(gene_example) View(gene_example) ``` ### Basic Plotting A very basic plot: ```{r} plot(x=gene_example$Control, y=gene_example$Treated) ``` You can specify how you want your plot to look, in terms of color, shape, labels, etc. A nicer plot: ```{r} plot(x=gene_example$Control, y=gene_example$Treated, xlab = "Control", ylab = "Treated", cex.lab = 1.5, main = "A nice scatter plot", pch = 16, bty = "n", col = "dark blue", las = 1 ) # las # To change the axes label style, use the graphics option las (label style). This changes the orientation angle of the labels: # 0: The default, parallel to the axis # 1: Always horizontal # 2: Perpendicular to the axis # 3: Always vertical # bty # To change the type of box round the plot area, use the option bty (box type): # "o": The default value draws a complete rectangle around the plot. # "n": Draws nothing around the plot. ``` Connecting the dots: ```{r} plot(x=gene_example$Control, y=gene_example$Treated, xlab = "Control", ylab = "Treated", cex.lab = 1.5, main = "A nice scatter plot", pch = 16, bty = "n", col = "dark blue", type = "b", las = 1 ) # type # The 'type' option changes what the data points are plotted with. # "p": points # "l": lines # "b": both (points and lines) # for more options, see the plot() help page ``` #### Histograms ```{r} hist(gene_example$Control) hist(gene_example$Control, xlab = "Expression", ylab = "Number of Genes", cex.lab = 1.5, main = "A nice histogram", col = "cyan", breaks = 10, las = 1 ) ``` #### Boxplots ```{r} boxplot(gene_example[,2:3]) ``` ```{r} boxplot(gene_example[,2:3], width = c(3,1), col = "red", border = "dark blue", names = c("Control", "Treatment"), main = "My boxplot", notch = TRUE, horizontal = TRUE ) ``` #### Saving your plots as PDFs ```{r} pdf("myfigure.pdf", height=10, width=6) par(mfrow=c(2,1)) plot(x=gene_example$Control, y=gene_example$Treated, xlab = "Control", ylab = "Treated", cex.lab = 1.5, main = "A nice scatter plot", pch = 16, bty = "n", col = "dark blue", type = "b", las = 1 ) boxplot(gene_example[,2:3], width = c(3,1), col = "red", border = "dark blue", names = c("Control", "Treatment"), main = "My boxplot", notch = TRUE, horizontal = TRUE ) dev.off() ``` If you don't specify a path in the pdf() function like we did above, the pdf will be saved in your working directory. In a new internet tab, go to http://##.oicrcbw.ca/Review_Session/myfigure.pdf to view the saved image. To save your RStudio notebook, go to "File" > "Save As..." and enter a name.