---
title: 'ExercisesRmd'
output: html_document
date: '2023-04-11'
---

# Load data

Load the package:

```{r}
library(palmerpenguins)
```

Load the data frame:

```{r}
# Copy the dataset from the package into the workspace
penguins <- penguins
```

## rows

Check the first 5 rows:

```{r}
head(penguins, 5)
```

Check the last 5 rows:

```{r}
tail(penguins, 5)
```

To inspect specific rows, put the row number in the **first** position inside the square brackets, `[row, column]`:

```{r}
penguins[1, ]
```

Check the first 3 rows. The **:** means "from A to B", so `1:3` selects rows 1, 2 and 3.

```{r}
penguins[1:3, ]
```

## columns

Columns go in the **second** position:

```{r}
head(penguins[, 1])
```

Another way is to use `$` followed by the column name. Note that `penguins[, 1]` returns a one-column tibble, whereas `penguins$species` returns the column itself (here a factor).

```{r}
head(penguins$species)
```

## column and row

Combine both positions to get a single value, for example row 1, column 1 or row 3, column 2:

```{r}
penguins[1, 1]
```

```{r}
penguins[3, 2]
```

# Tidyverse

Load the package:

```{r}
library(tidyverse)
```

## count

What is the sample size?

```{r}
penguins %>%
  count()
```

What is the sample size per species?

```{r}
penguins %>%
  count(species)
```

Per island and per species?

```{r}
penguins %>%
  count(island, species)
```

## distinct

Using base R:

```{r}
unique(penguins$species)
```

Using the tidyverse:

```{r}
penguins %>%
  distinct(species)
```

`unique()` returns the values themselves, while `distinct()` returns a data frame.

## select

Select one column:

```{r}
penguins %>%
  select(species)
```

Remove one column using **-**:

```{r}
penguins %>%
  select(-sex)
```

Select all columns except this one using **!**:

```{r}
penguins %>%
  select(!sex)
```

Select a range of consecutive columns using **:**:

```{r}
penguins %>%
  select(bill_length_mm:body_mass_g)
```

Select columns whose names end with a given string:

```{r}
penguins %>%
  select(ends_with("mm"))
```

Select columns whose names start with a given string:

```{r}
penguins %>%
  select(starts_with("bill"))
```

## filter

`filter()` keeps only the rows that meet a condition:

```{r, eval=FALSE}
penguins %>%
  filter(sex == 'female')
```

This only prints the result; `penguins` itself does not change. To keep the filtered rows, assign them to a new data frame.
```{r}
female_penguins <- penguins %>%
  filter(sex == 'female')
```

- The symbol **<=** means 'less than or equal to'

```{r, eval=FALSE}
penguins %>%
  filter(bill_length_mm <= 39.1)
```

- The symbol **>=** means 'greater than or equal to'

```{r, eval=FALSE}
penguins %>%
  filter(bill_length_mm >= 39.1)
```

- The symbol **&** means 'and' (both conditions must be true)

```{r, eval=FALSE}
penguins %>%
  filter(island == 'Biscoe' & species == 'Adelie')
```

## mutate

`mutate()` adds new columns (or changes existing ones). Here body mass is converted from grams to kilograms:

```{r}
penguins <- penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000)
```

## group_by

`group_by()` splits the data into groups and `summarise()` computes one value per group. Because `bill_length_mm` contains missing values (NA), `mean()` returns NA for any year that has one.

```{r}
penguins %>%
  group_by(year) %>%
  summarise(mean_bill_length = mean(bill_length_mm))
```

## drop_na

This function removes the rows that have NA in the given column(s):

```{r, eval=FALSE}
penguins %>%
  drop_na(bill_length_mm)
```

## group_by and drop_na

Removing the NAs before summarising gives a mean for every year:

```{r}
penguins %>%
  group_by(year) %>%
  drop_na(bill_length_mm) %>%
  summarise(mean_bill_length = mean(bill_length_mm))
```

### fill in the blanks

Replace the blanks and remove the `#` to run each chunk.

- Create a new data frame with the different penguin species

```{r}
#______ <- penguins %>%
#  distinct(_____)
```

- Create a new object with only two columns: sex and year

```{r}
#_____ <- penguins %>%
#  select(____, _____)
```

- Create a new data frame with only the male data

```{r}
#______ <- penguins %>%
#  filter(_____ == '______')
```

- Create a new data frame with only the females that weigh at least 3800 g

```{r}
#______ <- penguins %>%
#  filter(_____ == '______') %>%
#  filter(______ >= ______)
```

- Check the results from this operation

```{r}
penguins_means <- penguins %>%
  group_by(species, sex) %>%
  drop_na(body_mass_g, sex) %>%
  summarise(mean_body_mass_g = mean(body_mass_g),
            n = n()) %>%
  mutate(mean_body_mass_kg = mean_body_mass_g / 1000)

# Print the result
penguins_means
```

# Tables

```{r}
bird_id <- c("ID01","ID02","ID03","ID04","ID05",
             "ID06","ID07","ID08","ID09","ID10")
bird_mass <- c(1.5,2.0,3.5,4.1,2.6,3.7,8.9,2.5,6.3,1.0)
bird_gps <- c(50010,50020,50035,50001,50006,50003,50008,50002,50003,50001)
```

We might have two data sets. On one hand, the measurement data...

```{r}
bird_measurements <- data.frame(bird_id, bird_mass)
head(bird_measurements)
```

...
on the other hand, the field data.

```{r}
bird_tracking <- data.frame(bird_id, bird_gps)
head(bird_tracking)
```

## left_join()

To join them we can use the function **left_join()**. Both tables must share a **key**, a column that identifies the same observation in each table (here `bird_id`). `left_join()` keeps all rows of the first (left) table and adds the matching columns from the second one.

```{r}
bird_joined <- left_join(bird_measurements, bird_tracking, by = "bird_id")
bird_joined
```

## pivot_longer

Let's imagine we have data from ten birds and their number of locations (nlocs) in three different years.

```{r}
bird_id <- c("ID01","ID02","ID03","ID04","ID05",
             "ID06","ID07","ID08","ID09","ID10")
year_2010 <- c(5,4,5,6,7,3,2,1,9,10)
year_2011 <- c(3,2,1,9,4,5,6,7,3,2)
year_2012 <- c(6,2,3,7,8,2,1,9,4,5)
```

Create a new data frame:

```{r}
bird_nlocs <- data.frame(bird_id, year_2010, year_2011, year_2012)
```

```{r}
head(bird_nlocs, 5)
```

`pivot_longer()` "lengthens" data, increasing the number of rows and decreasing the number of columns. Here the three year columns become two: `year`, holding the former column names, and `nlocs`, holding their values.

```{r}
bird_long <- bird_nlocs %>%
  pivot_longer(c(year_2010, year_2011, year_2012),
               names_to = "year",
               values_to = "nlocs")
```

```{r}
head(bird_long, 5)
```

## pivot_wider

The opposite operation spreads one column back into several. `pivot_wider()` "widens" data, increasing the number of columns and decreasing the number of rows. The most important arguments are **names_from**, the column whose values become the names of the new columns (often a column with categories or factors), and **values_from**, the column holding the values that fill them (often a numeric column).

```{r}
bird_wide <- bird_long %>%
  pivot_wider(names_from = year, values_from = nlocs)
bird_wide
```

## paste or unite

The base R functions **paste()** and **paste0()** combine the values of several columns into one string. `paste0()` adds no separator, so the `'_'` is included explicitly here:

```{r}
bird_long$unique_id <- paste0(bird_long$bird_id, '_', bird_long$year)
```

The tidyr function **unite()** does the same inside a pipe: it pastes several columns together into one new column, using `sep` as the separator.
```{r}
bird_long <- bird_long %>%
  unite(col = unique_id2, c("bird_id", "year"), sep = "_", remove = FALSE)
bird_long
```

**Note:** by default `unite()` removes the original columns; `remove = FALSE` keeps them, as in the chunk above.

## separate

`separate()` does the opposite of `unite()`: it splits one column into several, cutting at `sep`. Here `"ID01_year_2010"` becomes `"ID01"`, `"year"` and `"2010"`.

```{r, eval=FALSE}
bird_long %>%
  separate(col = unique_id, into = c("id", "text", "year"), sep = "_")
```

```{r}
head(bird_long, 5)
```

**Note:** `separate()` also removes the original column by default, so if you want to keep it add **remove = FALSE**.

```{r}
bird_long <- bird_long %>%
  separate(col = unique_id, into = c("id", "text", "year"), sep = "_", remove = FALSE)
```

## rename

The function **rename()** changes the name of one or several columns. The new name comes first, followed by the old one (`new_name = old_name`). An example changing the name of one column:

```{r, eval=FALSE}
bird_long %>%
  rename(unique_identifier = unique_id2)
```

## relocate

The function **relocate()** changes the order of your columns: the columns you name are moved to the front, and all other columns are kept after them.

```{r, eval=FALSE}
bird_long %>%
  relocate(bird_id, year, nlocs)
```

To keep only the columns of interest, combine it with **select()**. Since `select()` already returns the columns in the order you list them, `relocate()` here only makes that order explicit.

```{r, eval=FALSE}
bird_long %>%
  select(bird_id, year, nlocs) %>%
  relocate(bird_id, year, nlocs)
```

# Export

`here()` returns the path to the project's root folder, so the results are saved there.

```{r, message=FALSE}
library(here)
ResultsFolder <- here()
```

```{r, eval=FALSE}
write_csv(bird_joined,
          file = paste0(ResultsFolder, '/bird_joined.csv'))
```
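It can be worth checking that an exported file reads back as expected. The chunk below is a minimal sketch of that check, assuming readr's `read_csv()` (loaded with the tidyverse). It writes to a temporary file from `tempfile()` instead of `ResultsFolder`, so the project folder is left untouched, and it rebuilds `bird_joined` from the vectors in the Tables section so it runs on its own. `tmp_file` and `bird_check` are hypothetical names used only here.

```{r}
library(tidyverse)

# Rebuild bird_joined from the Tables section so this chunk is self-contained
bird_measurements <- data.frame(
  bird_id = c("ID01", "ID02", "ID03", "ID04", "ID05",
              "ID06", "ID07", "ID08", "ID09", "ID10"),
  bird_mass = c(1.5, 2.0, 3.5, 4.1, 2.6, 3.7, 8.9, 2.5, 6.3, 1.0)
)
bird_tracking <- data.frame(
  bird_id = bird_measurements$bird_id,
  bird_gps = c(50010, 50020, 50035, 50001, 50006,
               50003, 50008, 50002, 50003, 50001)
)
bird_joined <- left_join(bird_measurements, bird_tracking, by = "bird_id")

# Assumption: a temporary file stands in for paste0(ResultsFolder, '/bird_joined.csv')
tmp_file <- tempfile(fileext = ".csv")
write_csv(bird_joined, file = tmp_file)

# Read the file back; show_col_types = FALSE hides the column-type message
bird_check <- read_csv(tmp_file, show_col_types = FALSE)

# Compare the values column by column, ignoring attributes such as the tibble class
all.equal(bird_joined, as.data.frame(bird_check), check.attributes = FALSE)
```

If the round trip kept the data intact, `all.equal()` returns `TRUE`; otherwise it returns a description of the differences.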