---
title: 'ExercisesRmd'
output: html_document
date: '2023-04-11'
---
# Load data
Load package
```{r}
library(palmerpenguins)
```
Load data frame
```{r}
penguins<-penguins
```
## rows
Check first 5 rows
```{r}
head(penguins,5)
```
Check last 5 rows
```{r}
tail(penguins,5)
```
To inspect specific rows, put the row number in the **first** position inside the square brackets, before the comma.
```{r}
(penguins[1,])
```
Check the first 3 rows.
The **:** means "from A to B", so 1:3 selects rows 1, 2 and 3.
```{r}
(penguins[1:3,])
```
## columns
The columns go in the **second** position, after the comma.
```{r}
head(penguins[,1])
```
Another way to do it is with the $ and the column name.
```{r}
head(penguins$species)
```
## column and row
```{r}
(penguins[1,1])
```
```{r}
(penguins[3,2])
```
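The same bracket notation also accepts column names and vectors of positions. A minimal sketch on a small made-up data frame (the object `toy` and its values are illustrative and not part of the penguins data):

```{r}
# Small made-up data frame, used only for illustration
toy <- data.frame(id = c("a", "b", "c"),
                  mass = c(10, 20, 30),
                  wing = c(1.1, 2.2, 3.3))
# Rows 1 to 2, all columns
toy[1:2, ]
# All rows, columns selected by name
toy[, c("id", "wing")]
# Row 3, column "mass": a single value for a base data frame
toy[3, "mass"]
```

With a tibble such as `penguins`, the last call returns a 1 x 1 tibble instead of a bare value.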
# Tidyverse
Load package
```{r}
library(tidyverse)
```
## count
Sample size?
```{r}
penguins %>%
count()
```
Sample size per species?
```{r}
penguins %>%
count(species)
```
Per island and per species?
```{r}
penguins %>%
count(island,species)
```
## distinct
Using base R
```{r}
unique(penguins$species)
```
Using tidyverse
```{r}
penguins %>%
distinct(species)
```
## select
Select one column
```{r}
penguins %>%
select(species)
```
Remove one column using the **-**
```{r}
penguins %>%
select(-sex)
```
Select all columns except this one using **!**
```{r}
penguins %>%
select(!sex)
```
Select columns in between using **:**
```{r}
penguins %>%
select(bill_length_mm:body_mass_g)
```
Select columns based on characters (strings).
```{r}
penguins %>%
select(ends_with("mm"))
```
Using the first letters of the string
```{r}
penguins %>%
select(starts_with("bill"))
```
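Other selection helpers follow the same pattern. A short sketch, assuming a made-up tibble (`toy_cols` is hypothetical), that uses `contains()` to match part of a name and `where(is.numeric)` to pick columns by type:

```{r}
library(dplyr)
# Made-up tibble, used only for illustration
toy_cols <- tibble(bill_length_mm = c(39.1, 46.5),
                   bill_depth_mm = c(18.7, 17.9),
                   body_mass_g = c(3750, 4500),
                   sex = c("male", "female"))
# Columns whose name contains "mass"
toy_cols %>%
  select(contains("mass"))
# Columns that hold numbers
toy_cols %>%
  select(where(is.numeric))
```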
## filter
```{r, eval=FALSE}
penguins %>%
filter(sex == 'female')
```
filter() does not change the original object. To keep the filtered rows, assign the result to a new data frame.
```{r}
female_penguins<-penguins %>%
filter(sex == 'female')
```
- The symbol **<=** means 'less than or equal to'
```{r, eval=FALSE}
penguins %>%
filter(bill_length_mm <= 39.1)
```
- The symbol **>=** means 'greater than or equal to'
```{r, eval=FALSE}
penguins %>%
filter(bill_length_mm >= 39.1)
```
- The symbol **&** means 'and'
```{r, eval=FALSE}
penguins %>%
filter(island == 'Biscoe' & species =='Adelie')
```
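Conditions can also be combined with **|** ('or') or matched against several values with **%in%**. A small sketch on made-up data (the `toy_birds` tibble is hypothetical):

```{r}
library(dplyr)
# Made-up tibble, used only for illustration
toy_birds <- tibble(species = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
                    island = c("Biscoe", "Biscoe", "Dream", "Dream"))
# Rows where at least one of the two conditions holds
toy_or <- toy_birds %>%
  filter(species == "Gentoo" | island == "Dream")
toy_or
# Rows whose species is one of several values
toy_in <- toy_birds %>%
  filter(species %in% c("Adelie", "Chinstrap"))
toy_in
```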
## mutate
```{r}
penguins<-penguins %>%
mutate(body_mass_kg = body_mass_g / 1000)
```
## group_by
```{r}
penguins %>%
group_by(year) %>%
summarise(mean_bill_length=mean(bill_length_mm))
```
## drop_na
This function removes the rows that contain NA in the columns you name.
```{r, eval=FALSE}
penguins %>%
drop_na(bill_length_mm)
```
## group_by and drop_na
```{r}
penguins %>%
group_by(year) %>%
drop_na(bill_length_mm) %>%
summarise(mean_bill_length=mean(bill_length_mm))
```
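Without `drop_na()`, a single NA in a group makes `mean()` return NA for that group. An alternative sketch, using a made-up tibble (`toy_bills` is hypothetical), passes `na.rm = TRUE` to `mean()` instead of dropping rows first:

```{r}
library(dplyr)
# Made-up tibble with one missing bill length in 2007
toy_bills <- tibble(year = c(2007, 2007, 2008, 2008),
                    bill_length_mm = c(40, NA, 42, 44))
toy_means <- toy_bills %>%
  group_by(year) %>%
  summarise(mean_plain = mean(bill_length_mm),                # NA if any value is NA
            mean_na_rm = mean(bill_length_mm, na.rm = TRUE))  # ignores NA values
toy_means
```

Which approach fits better depends on whether the rows with missing values are needed later in the pipeline.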
### fill out
- Create a new data frame with different penguin species
```{r}
#______ <-penguins %>%
# distinct(_____)
```
- Create a new object with only two columns: sex and year
```{r}
#_____ <-penguins %>%
# select(____,_____)
```
- Create a new data frame only with male data
```{r}
#______ <-penguins %>%
# filter(_____ == '______')
```
- Create a new data frame using only females that weigh 3800 g or more
```{r}
#______ <-penguins %>%
# filter(_____ == '______')%>%
# filter(______ >= ______)
```
- Check the results from this operation
```{r}
penguins_means<-penguins %>%
group_by(species,sex) %>%
drop_na(body_mass_g,sex)%>%
summarise(mean_body_mass_g = mean(body_mass_g), n = n())%>%
mutate(mean_body_mass_kg = mean_body_mass_g / 1000)
```
# Tables
```{r}
bird_id<-c("ID01","ID02","ID03","ID04","ID05",
"ID06","ID07","ID08","ID09","ID10")
bird_mass<-c(1.5,2.0,3.5,4.1,2.6,3.7,8.9,2.5,6.3,1.0)
bird_gps<-c(50010,50020,50035,50001,50006,50003,50008,50002,50003,50001)
```
We might have two data sets.
On one hand, the measurement data...
```{r}
bird_measurements<-data.frame(bird_id,bird_mass)
head(bird_measurements)
```
... on the other, field data.
```{r}
bird_tracking <- data.frame(bird_id,bird_gps)
head(bird_tracking)
```
## left_join()
To join them we can use the function **left_join()**.
However, both tables need a shared **key** column (here bird_id) to match the observations.
```{r}
bird_joined<-left_join(bird_measurements,
bird_tracking,
by = "bird_id")
bird_joined
```
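When the keys do not match perfectly, `left_join()` keeps every row of the first table and fills the missing values with NA. A minimal sketch with made-up tables (`toy_mass`, `toy_gps` and the ID "ID99" are hypothetical):

```{r}
library(dplyr)
# Made-up tables whose keys only partly overlap
toy_mass <- data.frame(bird_id = c("ID01", "ID02", "ID03"),
                       bird_mass = c(1.5, 2.0, 3.5))
toy_gps <- data.frame(bird_id = c("ID01", "ID03", "ID99"),
                      bird_gps = c(50010, 50035, 50001))
# Every row of toy_mass is kept; ID02 has no match, so its bird_gps is NA.
# ID99 exists only in toy_gps and does not appear in the result.
toy_joined <- left_join(toy_mass, toy_gps, by = "bird_id")
toy_joined
```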
## pivot_longer
Let's imagine we have data from ten birds and their number of locations (nlocs) in three different years.
```{r}
bird_id<-c("ID01","ID02","ID03","ID04","ID05",
"ID06","ID07","ID08","ID09","ID10")
year_2010<-c(5,4,5,6,7,3,2,1,9,10)
year_2011<-c(3,2,1,9,4,5,6,7,3,2)
year_2012<-c(6,2,3,7,8,2,1,9,4,5)
```
New data frame
```{r}
bird_nlocs<-data.frame(bird_id,year_2010,year_2011,year_2012)
```
```{r}
head(bird_nlocs,5)
```
## pivot_longer
pivot_longer "lengthens" data, increasing the number of rows and decreasing the number of columns.
```{r}
bird_long <- bird_nlocs %>%
pivot_longer(c(year_2010,year_2011,year_2012),
names_to = "year",
values_to = "nlocs" )
```
```{r}
head(bird_long,5)
```
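In the long table the year column holds text such as "year_2010". A possible refinement, sketched on a made-up two-bird table (`toy_nlocs` is hypothetical), strips the prefix with `names_prefix` and converts the result to integers with `names_transform`. Both are arguments of `pivot_longer()`; `names_transform` assumes tidyr 1.1.0 or later.

```{r}
library(dplyr)
library(tidyr)
# Made-up wide table, used only for illustration
toy_nlocs <- data.frame(bird_id = c("ID01", "ID02"),
                        year_2010 = c(5, 4),
                        year_2011 = c(3, 2))
toy_long <- toy_nlocs %>%
  pivot_longer(starts_with("year_"),
               names_to = "year",
               names_prefix = "year_",                     # drop the "year_" prefix
               names_transform = list(year = as.integer),  # "2010" becomes 2010L
               values_to = "nlocs")
toy_long
```

The rest of this document keeps the "year_2010" form, so this sketch uses its own object names.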
## pivot_wider
The opposite operation spreads the values back into separate columns.
pivot_wider() "widens" data, increasing the number of columns and decreasing the number of rows.
The most important arguments are **names_from**, the column whose values become the names of the new columns (often the column with factors), and **values_from**, the column that holds the values to fill them with (often the column with numbers).
```{r}
bird_wide<-bird_long %>%
pivot_wider(names_from = year,
values_from = nlocs)
bird_wide
```
## paste or unite
The functions **paste** and **paste0** from base R let you paste together multiple columns.
```{r}
bird_long$unique_id<-paste0(bird_long$bird_id,'_',bird_long$year)
```
The function **unite** is similar: it pastes multiple columns together into one.
```{r}
bird_long<-bird_long %>%
unite(col = unique_id2,
c("bird_id", "year"),
sep = "_",
remove=FALSE)
bird_long
```
**Note** by default unite removes the original columns, so if you want to keep them add **remove = FALSE.**
## separate
```{r, eval=FALSE}
bird_long %>%
separate(col = unique_id,
into = c("id", "text","year"),
sep = "_")
```
```{r}
head(bird_long,5)
```
**Note** separate also removes the original column by default, so if you want to keep it add **remove = FALSE.**
```{r}
bird_long<-bird_long %>%
separate(col = unique_id,
into = c("id", "text","year"),
sep = "_",
remove = FALSE)
```
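Recent tidyr versions mark `separate()` as superseded. A hedged sketch of the same split with `separate_wider_delim()`, assuming tidyr 1.3.0 or later and a made-up tibble (`toy_ids` is hypothetical):

```{r}
library(dplyr)
library(tidyr)
# Made-up identifiers in the same "ID_year_YYYY" form used above
toy_ids <- tibble(unique_id = c("ID01_year_2010", "ID02_year_2011"))
toy_sep <- toy_ids %>%
  separate_wider_delim(cols = unique_id,
                       delim = "_",
                       names = c("id", "text", "year"))
toy_sep
```

Here the original column is dropped by default; `cols_remove = FALSE` keeps it.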
## rename
The function **rename** lets you change the name of one or several columns.
The new name is written first and the old name comes after.
An example changing the name of one column:
```{r, eval=FALSE}
bird_long %>%
rename(unique_identifier = unique_id2)
```
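Several columns can be renamed in one call, one `new_name = old_name` pair per column. A minimal sketch on a made-up tibble (`toy_names` and the new column names are hypothetical):

```{r}
library(dplyr)
# Made-up tibble, used only for illustration
toy_names <- tibble(bird_id = c("ID01", "ID02"),
                    nlocs = c(5, 4))
# new_name = old_name, one pair per column
toy_renamed <- toy_names %>%
  rename(id = bird_id,
         n_locations = nlocs)
names(toy_renamed)
```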
## relocate
The function **relocate** lets you reorder your columns. The columns you list move to the front, and all other columns are kept after them.
```{r, eval=FALSE}
bird_long %>%
relocate(bird_id,year,nlocs)
```
Using this function together with select you can keep only the columns of interest, in the order you want.
```{r, eval=FALSE}
bird_long %>%
select(bird_id,year,nlocs)%>%
relocate(bird_id,year,nlocs)
```
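`relocate()` also takes `.before` and `.after` to place a column next to another one instead of at the front. A short sketch on a made-up one-row tibble (`toy_order` is hypothetical):

```{r}
library(dplyr)
# Made-up tibble, used only for illustration
toy_order <- tibble(bird_id = "ID01", year = "2010",
                    nlocs = 5, unique_id = "ID01_2010")
# The listed column moves to the front; the rest keep their order
front <- toy_order %>% relocate(nlocs)
names(front)
# .after places the column right after another one
after_id <- toy_order %>% relocate(unique_id, .after = bird_id)
names(after_id)
```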
# Export
```{r, message=FALSE}
library(here)
ResultsFolder<-here()
```
```{r, eval=FALSE}
write_csv(
bird_joined,
file =paste0(ResultsFolder,'/bird_joined.csv'))
```
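One way to check an export is to read the file back. A minimal sketch, assuming a made-up data frame and a temporary file (`toy_export` and `toy_path` are hypothetical, and `show_col_types` assumes readr 2.0 or later):

```{r}
library(readr)
# Made-up data frame, used only for illustration
toy_export <- data.frame(bird_id = c("ID01", "ID02"),
                         bird_mass = c(1.5, 2.0))
# Temporary location, so nothing is left in the project folder
toy_path <- file.path(tempdir(), "toy_export.csv")
write_csv(toy_export, file = toy_path)
# Read the file back to confirm that rows and columns survived the round trip
toy_check <- read_csv(toy_path, show_col_types = FALSE)
toy_check
```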