=======================================================================================================
Welcome to the Software Carpentry Etherpad for the November 5-6 workshop at The Nature Conservancy!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org/).
Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
We will use this Etherpad during the workshop for chatting, asking questions, taking notes collaboratively, and sharing URLs or bits of code.

Website: http://tiny.cc/tnc-swc  -  (https://mickley.github.io/2019-11-05-nature_conservancy/)

Socrative Login: https://b.socrative.com/login/student/
Room: TNCSWC

=======================================================================================================


Instructors:

Helpers:

Attendees: (Put your name here):
=======================================================================================================

Setup:
    1. Sign in up front: Get a name tag and a sticky note of each color
    2. Download and install software from our course website: https://mickley.github.io/2019-11-05-nature_conservancy/
    3. Put your name under Attendees above (You can get here from the Etherpad link on the course website)
    4. Fill out the pre-workshop survey if you haven't already: https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id=2019-11-05-nature_conservancy
    5. Enter your name at the top of the chat column of this etherpad (top right corner)
    6. Download the datasets from https://www.dropbox.com/sh/huimlzu3ezomugt/AABZu3wxkhkmqKEDHF7b2QFla?dl=0 and save them to your Desktop
    7. In a web browser sign in to your GitHub.com account. Sign up for one if you don't have an account.
    8. In RStudio, click Tools > Global Options... > Git/SVN. Make sure that the grey box under "SSH RSA Key" shows a file name, that a "View public key" link appears next to it, and that clicking "View public key" opens a window that is not blank. If the "SSH RSA Key" box is blank, click "Create RSA key...", leave the optional passphrase blank, and click "Create".


***Please put up a blue sticky on your laptop if you have signed up for GitHub and have all three programs installed (Git, R and RStudio)***

===============================

Link to lessons: http://swcarpentry.github.io/r-novice-gapminder/

DAY 1 - Introduction to R and RStudio

RStudio

R Coding

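Good practice is to use meaningful variable names. Some common conventions for names made of multiple words are: periods.between.words, underscores_between_words, and camelCaseToSeparateWords. Whichever you choose, be consistent.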

Cutting and pasting code/data
Vectorization

Importance of maintaining environment

Packages

Projects in RStudio

Getting help:

Getting data into and out of R:

Exploring data in R


===============================
DAY 1 - Programming in R

TODO:



In addition to logical comparisons (==, !=, >, <, >=, <=), we can use the logical operators | (Or) and & (And), as well as ! (Not).
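A small sketch of how these operators work element by element on a vector. `ages` is a made-up example vector, not workshop data.

```r
# Hypothetical example vector (made-up values)
ages <- c(5, 12, 18, 30, 67)

# & (AND): TRUE only where both comparisons are TRUE
teenager <- ages >= 13 & ages <= 19
teenager         # FALSE FALSE TRUE FALSE FALSE

# | (OR): TRUE where at least one comparison is TRUE
child_or_senior <- ages < 13 | ages >= 65
child_or_senior  # TRUE TRUE FALSE FALSE TRUE

# ! (NOT) flips TRUE and FALSE
adult <- !(ages < 18)
adult            # FALSE FALSE TRUE TRUE TRUE

# A logical vector can be used to pick out elements
ages[child_or_senior]  # 5 12 67
```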

A for loop iterates over a vector and performs some task for each value of the vector.
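A minimal sketch of a for loop, using a hypothetical vector of genus names. The second loop shows one common pattern for keeping results: create the output vector first, then fill it by position.

```r
genera <- c("ursus", "felis", "vulpes")

# The loop variable `genus` takes each value of the vector in turn
for (genus in genera) {
  print(paste(genus, "has", nchar(genus), "letters"))
}

# To keep results, create an output vector first and fill it by position
name_lengths <- numeric(length(genera))
for (i in seq_along(genera)) {
  name_lengths[i] <- nchar(genera[i])
}
name_lengths  # 5 5 6

# nchar() is vectorized, so this gives the same answer without a loop
nchar(genera)
```

Because many R functions are vectorized, a loop is often unnecessary for simple element-by-element work like this.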

Note: A tibble is just a data frame that has some special properties. If you print a large tibble, it will only show the first 10 rows, and only as many columns as will fit nicely on your screen.
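A quick way to see that a tibble is still a data frame is to convert one and check its class. This sketch uses the tibble package and a hypothetical 25-row data frame.

```r
library(tibble)

# Hypothetical 25-row data frame
df <- data.frame(id = 1:25, value = (1:25) * 2)
tb <- as_tibble(df)

class(tb)          # "tbl_df" "tbl" "data.frame"
is.data.frame(tb)  # TRUE: a tibble is still a data frame
tb                 # prints a header saying it is a tibble, then only the first 10 rows
```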

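The general form of a function is:

    function_name <- function(argument1, argument2, ...) {
        # stuff you want to do with the arguments, for example:
        output <- argument1 + argument2
        return(output)
    }

    It is good practice to end a function with a return() statement so its output is obvious; without one, R returns the value of the last expression evaluated.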
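A concrete sketch using temperature conversion. The first function ends with an explicit `return()`; the second relies on R returning the value of its last expression.

```r
# Convert Fahrenheit to Kelvin, ending with an explicit return()
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# No return(): the value of the last expression is returned automatically
kelvin_to_celsius <- function(temp) {
  temp - 273.15
}

fahr_to_kelvin(32)                      # 273.15 (freezing point of water)
kelvin_to_celsius(fahr_to_kelvin(212))  # 100 (boiling point of water)
```

Functions can be nested like this because each one just returns a value that the next one takes as its argument.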

Variables created inside a function exist ONLY inside that function; outside of the function they don't exist. The function also works with a copy of the variables/data you pass in, so changing them inside the function doesn't change the originals.
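A small sketch of both points. `double_it()` and `inside_only` are hypothetical names made up for this example.

```r
double_it <- function(x) {
  inside_only <- "this variable exists only while double_it() runs"
  x <- x * 2   # changes the function's own copy of x, not the caller's vector
  return(x)
}

values  <- c(1, 2, 3)
doubled <- double_it(values)

doubled                # 2 4 6
values                 # still 1 2 3: the original was not modified
exists("inside_only")  # FALSE: the variable is gone once the function finishes
```

To keep a result from a function, assign its return value to a variable, as with `doubled` here.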



Graphing with ggplot


If you want to use a mathematical expression in a ggplot axis label, you can use the expression() function.
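A minimal sketch, assuming ggplot2 is installed. The data frame `plots` and its values are made up. Inside `expression()`, R's plotmath syntax applies: `~` inserts a space and `^` makes a superscript.

```r
library(ggplot2)

# Hypothetical data (made-up values)
plots <- data.frame(area = c(1, 2, 3, 4), density = c(10, 15, 12, 18))

# plotmath labels: ~ adds a space, ^ makes a superscript
x_lab <- expression(Plot~area~(km^2))
y_lab <- expression(Density~(individuals~km^-2))

p <- ggplot(plots, aes(x = area, y = density)) +
  geom_point() +
  labs(x = x_lab, y = y_lab)

p  # printing the plot object draws it with the formatted axis labels
```

See `?plotmath` in R for the full list of supported symbols (subscripts with `[]`, Greek letters, etc.).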

ggplot help:


Some examples of useful things I (Mike T) have put together for sharing with folks:
* https://tnc-ny-science.github.io/nyc_trees/ownership_exploration_summary_20190807/canopy_ownership_summary_20190807.html
* https://mltconsecol.github.io/misc/commdists_quickAnalytics_interactive_minimal.nb.html


Reports:
    Check out Help > Markdown Quick Reference in RStudio
    Check out rmarkdown.rstudio.com for lots of galleries, etc.
    Difference between the R Notebook (.Rmd) format and the HTML export: people you share the HTML with can't run the code chunks from within the HTML
    When rendering, R first runs the code chunks and turns them and their results into Markdown
When a notebook is knitted, it runs in a different environment from the one in your main RStudio session. You have to define variables, functions, data, etc. in the notebook and run them in order for later commands to work; you can't just call things you've defined only in your RStudio session.
===============================
DAY 2 - Data Wrangling


TODO:


select(.data, ...) allows you to choose what columns you want to work with in your dataframe
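A minimal sketch of select() on a tiny made-up data frame with gapminder-style column names (the values are invented for illustration).

```r
library(dplyr)

# Hypothetical gapminder-style data (made-up values)
gap <- data.frame(
  country   = c("Country A", "Country B", "Country C"),
  continent = c("Africa", "Europe", "Europe"),
  year      = c(2007, 2007, 2002),
  pop       = c(1.0e6, 2.0e6, 3.0e6),
  gdpPercap = c(1000, 2500, 4000)
)

# Keep only the named columns, in the order given
year_country <- select(gap, year, country)

# A minus sign drops a column instead of keeping it
no_pop <- select(gap, -pop)
```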

Note: when printed, a large tibble shows the first 10 rows and only as many columns as fit nicely on your screen, and it tells you it is a tibble at the top.

filter(.data, ...) allows you to get rows from the dataframe that match a given condition or set of conditions
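A minimal sketch of filter() on the same kind of made-up gapminder-style data. Conditions separated by commas must all be TRUE (AND); `|` gives OR.

```r
library(dplyr)

# Hypothetical gapminder-style data (made-up values)
gap <- data.frame(
  country   = c("Country A", "Country B", "Country C"),
  continent = c("Africa", "Europe", "Europe"),
  year      = c(2007, 2007, 2002)
)

# Rows where both conditions hold
europe_2007 <- filter(gap, continent == "Europe", year == 2007)

# Rows where either condition holds
europe_or_africa <- filter(gap, continent == "Europe" | continent == "Africa")
```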

%>% is called a pipe

1. The traditional way with nested functions
go_to_work(get_breakfast(wake_up()))
2. Using the pipe
wake_up() %>% get_breakfast() %>% go_to_work()
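The same idea with real base R functions. In this sketch both versions compute the same value; the piped one reads left to right, with each result passed as the first argument of the next call.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

values <- c(4, 9, 16)

# 1. Nested: read from the inside out
nested <- round(sqrt(sum(values)), 1)

# 2. Piped: read left to right
piped <- values %>% sum() %>% sqrt() %>% round(1)

nested  # 5.4
piped   # 5.4
```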

Note: if you are going to comment out lines of a pipe chain for special cases or particular runs, or just to check whether something is working, consider adding I() as the last step of the chain. It returns its input as-is, so commenting out the line above it doesn't leave a dangling %>% that makes R look for something that isn't there or wait for another command from you. (identity() does the same without marking the result with the "AsIs" class.)
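A sketch of the trick using identity() as the harmless last step. The `surveys` data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical data (made-up values)
surveys <- data.frame(
  species = c("a", "b", "a", "c"),
  weight  = c(10, 20, 30, 40)
)

heavy <- surveys %>%
  filter(weight > 15) %>%
  # select(species) %>%   # temporarily commented out for this run
  identity()              # harmless last step: returns its input unchanged
```

Without the final identity() line, commenting out `select(species) %>%` would leave `filter(weight > 15) %>%` as the last line, and R would wait for the rest of the expression.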

group_by(.data, ...) tells R that you want to do operations to subsets of the data based on groups defined in the columns you give to the function

summarize(.data, ...) allows you to compute summary values (for example, a mean) for each group

The n() function counts the number of rows (observations) in each group
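A sketch combining group_by(), summarize() and n() on made-up gapminder-style numbers: the mean GDP per capita and the number of rows for each continent.

```r
library(dplyr)

# Hypothetical data (made-up values)
gap <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe", "Europe"),
  gdpPercap = c(1000, 3000, 20000, 30000, 40000)
)

gdp_by_continent <- gap %>%
  group_by(continent) %>%
  summarize(mean_gdpPercap = mean(gdpPercap),
            n_countries    = n())

gdp_by_continent
# Africa: mean_gdpPercap 2000,  n_countries 2
# Europe: mean_gdpPercap 30000, n_countries 3
```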

mutate(.data, ...) allows you to alter the dataframe by changing a column or adding a new column
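A minimal sketch of mutate() adding one column and overwriting another, on made-up data. mutate() evaluates its arguments in order, so `gdp_billion` is computed from the original `pop` before `pop` is rescaled.

```r
library(dplyr)

# Hypothetical gapminder-style data (made-up values)
gap <- data.frame(
  country   = c("Country A", "Country B"),
  pop       = c(1e6, 2e6),
  gdpPercap = c(1000, 2500)
)

gap_gdp <- gap %>%
  mutate(gdp_billion = gdpPercap * pop / 1e9,  # new column
         pop         = pop / 1e6)              # existing column overwritten (now in millions)

gap_gdp$gdp_billion  # 1 5
gap_gdp$pop          # 1 2
```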

Wide format example:
    genus
    ursus
    felis

    Long format is the one R prefers (one column per variable, one row per observation)
Note: An alternative to read.csv() is the read_csv() function from the readr package. read_csv() works much like read.csv(), but it imports the data as a tibble (which is also a data frame), and it never converts strings to factors, i.e. it behaves as if stringsAsFactors = FALSE.
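A sketch comparing the two readers on a small temporary CSV file written just for this example. It assumes readr 2.0 or later (for the `show_col_types` argument). Since R 4.0.0, read.csv() also defaults to stringsAsFactors = FALSE, so the main visible difference here is the tibble class.

```r
library(readr)

# Write a small hypothetical CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("genus,count", "ursus,2", "felis,4"), tmp)

base_df <- read.csv(tmp, stringsAsFactors = FALSE)  # plain data frame
tidy_df <- read_csv(tmp, show_col_types = FALSE)    # tibble

class(base_df)         # "data.frame"
class(tidy_df)         # includes "tbl_df" and "data.frame"
is.character(tidy_df$genus)  # TRUE: strings stay as character
```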

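gather() keeps any columns that aren't involved in the reshaping (their values are repeated on each new row), but it removes the columns it gathers: their names become the values of the new key column and their contents go into the new value column.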
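A sketch of going from wide to long and back with tidyr, using made-up counts with one column per genus. `site` is not gathered, so it is kept; `ursus` and `felis` are removed as columns and turned into values of `genus`. (In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), but still work.)

```r
library(tidyr)

# Hypothetical wide data: one column per genus (made-up counts)
counts_wide <- data.frame(
  site  = c("A", "B"),
  ursus = c(2, 0),
  felis = c(1, 4)
)

# Wide -> long: columns ursus and felis become rows of a "genus" column
counts_long <- gather(counts_wide, key = "genus", value = "count", ursus, felis)
# 4 rows: A/ursus/2, B/ursus/0, A/felis/1, B/felis/4

# Long -> wide again; spread() creates the new columns in alphabetical order
counts_back <- spread(counts_long, key = genus, value = count)
```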

Cartoon of spread/gather: https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/tidyr_spread_gather.png
Other fun R stats illustrations: https://github.com/allisonhorst/stats-illustrations/tree/master/rstats-artwork

Explainer for joining two dataframes together with dplyr (with animations): https://www.garrickadenbuie.com/project/tidyexplain/

Data resources

There is a community of R users who do #TidyTuesday on Twitter: each week a new dataset is posted for people to plot.
You can learn more here: https://github.com/rfordatascience/tidytuesday
Here is one very good R user who does screencasts of how he works through the data: https://www.youtube.com/watch?v=6GV9sAD6Pi0&feature=youtu.be

If you are on Slack, there is an online learning community, R for Data Science: https://medium.com/@kierisi/join-the-r-for-data-science-online-learning-community-842527222ab3

At TNC we have access to LinkedIn Learning, which has a decent number of R courses: https://www.linkedin.com/learning/topics/r?u=2186177

===============================
DAY 2 - Version Control with Git

Working in the terminal (git bash)

Working with Git

Always pull (git pull) before you start work, so you begin from the latest version
A resource I found helpful: https://happygitwithr.com/
Some more git help: https://github.com/k88hudson/git-flight-rules