--- title: "Introduction to R 8-20-2020" author: "Rob Wells" date: "8/20/2020" output: html_document --- **Get Organized** Create a folder on your Desktop: name it DataFall2020 Store this file and the Census data in this folder ** You only run this once for the first setup ```{r} #Example is below #Put in your own directory #setwd(/Users/rswells/Dropbox/Classes/Data Fall 2020/CovidFall2020) ``` ```{r} #This aligns the file with a directory and saves a huge hassle install.packages("here") library(here) ``` --- ### Orientation about R Studio **There are four main windows** Script writing, R Markdown, Table Viewer: Upper Left Environment - data loaded in R: Upper Right Console - write commands in R: Lower Left File Manager and Html table viewer: Bottom Right **Basic R skills** Loading software. Tidyverse Rio Run demos from Ch. 3 **Install software to grab data** tidyverse installs 8 separate software packages to perform data import, tidying, manipulation, visualisation, and programming **Packages vs. Libraries** 1) You install the software packages. Usually only once 2) You call up the software libraries. Do that every time ```{r} #Installing packages: tidyverse, rio, janitor #YOU ONLY DO THIS ONCE PER COMPUTER install.packages("tidyverse") install.packages("rio") install.packages("janitor") ``` I like the rio package for its easy importing features and janitor for data cleaning rio handles more than two dozen formats including tab-separated data (with the extension .tsv), After you install a package on your hard drive, you can call it back up by summoning a library Libraries are bits of software you will have to load each time into R to make things run. ```{r} library(tidyverse) library(rio) library(janitor) ``` **Now we can use the damned thing** **Load Data** ```{r} ArkRace <- rio::import("https://raw.githubusercontent.com/profrobwells/CovidFall2020/master/Data/ArkRace2018.csv") ``` ## Basic Commands Four Corners Test! Number Columns Number Rows Text, Numeric Data How many rows? nrow(yourdataset) ```{r} nrow(ArkRace) ``` How many columns? ncol(yourdataset) ```{r} ncol(ArkRace) ``` What is in the first six rows? head(yourdataset) ```{r} head(ArkRace) ``` Check data types ```{r} glimpse(ArkRace) ``` **Data Types** chr = character dbl = numeric (double-precision floating point number. it has decimals) --- Rename columns ```{r} #Rename several columns at once colnames(ArkRace)[2:3] <- c("County", "Total_Pop") ``` Check the head of the data table ```{r} head(ArkRace) #Names of your columns ``` ```{r} colnames(ArkRace) #Another method #names(ArkRace) ``` ```{r} ArkRace <- janitor::clean_names(ArkRace) ``` ```{r} #colnames(ArkRace) #Another method names(ArkRace) ``` Rename columns ```{r} #Rename several columns at once colnames(ArkRace)[6] <- c("black") colnames(ArkRace)[8] <- c("am_indian") colnames(ArkRace)[12] <- c("pac_island") colnames(ArkRace)[14] <- c("other_race") colnames(ArkRace)[16] <- c("hispanic") names(ArkRace) ``` **Software: How to get details and help** help(package="dplyr") browseVignettes("NameOfPackage”) help("NameOfFunction”) ??median Check to see what's installed by clicking on "Packages" tab in File Manager, lower right pane **Summary:** Here is a quick way to view the range of your data ```{r} summary(ArkRace) ``` Here is a summary just for one column. --DATA FRAME$COLUMN NAME --ArkRace$White ```{r} summary(ArkRace$white) ``` ### Exercise #1: Summary 1: Run summary on Black 2: Run summary on American Indian ```{r} TYPE YOUR COMMAND here ``` **Build a simple summary table by County** ```{r} BlackWhite <- ArkRace %>% select(county, white, black) %>% group_by(county) %>% arrange(desc(black)) BlackWhite ``` **Commands Used** select: Select columns by exact name group_by: Apply a function to a data frame by group arrange: Sort a data frame by one column, descending values ## Filtering Focus on White, Black, Hispanic Population in Washington County ```{r} ArkRace %>% select(county, white, black, hispanic) %>% filter(county =="Washington County, Arkansas") ``` **Shortcut Commands** Tab - Autocomplete In Console Window (lower left) Control (or Command) + UP arrow - last lines run Control (or Command) + Enter - Runs current or selected lines of code in the top left box of RStudio Shift + Control (or Command) +P - Reruns previous region code ### Exercise #2: Filtering Question White, Black, Hispanic Population for Benton County, Arkansas White, Black, Hispanic Population for Pulaski County, Arkansas White, Black, Hispanic Population for Arkansas ```{r} TYPE YOUR COMMAND here ``` **Filter Statewide Population by White Median Population** **Build a table with all counties above that median population** **See Line 178: Median = 15805 ** ```{r} MostWhites <- ArkRace %>% select(county, white) %>% filter(white > 15805) %>% arrange(desc(white)) MostWhites ``` **Exercise #3: Filtering by Medians** **Determine Median for Black and Hispanics Statewide** Table of Above Median Hispanic Population Table of Above Median Black Population **Basics of R** In the Console window, type: ```{r} demo() ``` ```{r} help() ``` ```{r} help.start() ``` ### Dplyr select Choose which columns to include. filter Filter the data. arrange Sort the data, by size for continuous variables, by date, or alphabetically. group_by Group the data by a categorical variable. #Question: ```{r} AboveIncome <- ArkRace %>% select(County, White) %>% filter(White > 42336) %>% arrange(desc(White)) AboveIncome ``` ### Exercise #4: Filtering. Counties with the lowest quartile in income ```{r} TYPE YOUR COMMAND here ``` **Question: Which county has the lowest percentage of population in this low income bracket?** Determine the cutoff for the low income distribution ```{r} summary(ArkRace$Pct_Income_to_25k) ``` ```{r} ArkRace %>% select(County, Pct_Income_to_25k) %>% filter(Pct_Income_to_25k==8.80) %>% arrange(desc(Pct_Income_to_25k)) ``` ### Exercise #5: Which county has the highest percentage of population in this low income bracket? ```{r} TYPE YOUR COMMAND here ``` ## Charts **Build a basic chart in ggplot using our data** ```{r} ggplot(AboveIncome, aes(x = County, y = White)) + geom_bar(stat = "identity") + coord_flip() + #this makes it a horizontal bar chart instead of vertical labs(title = "Communities with high median income", subtitle = "2016", caption = "Graphic by Wells", y="Place", x="Median Income") ``` **Basic chart, adding color** **fill=White** ```{r} ggplot(AboveIncome, aes(x = County, y = White, fill=White)) + geom_bar(stat = "identity", show.legend = FALSE) + coord_flip() + #this makes it a horizontal bar chart instead of vertical labs(title = "Communities with high median income", subtitle = "2016", caption = "Graphic by Wells", y="Place", x="Median Income") ``` **Basic chart, sorting** sorting = aes(x = reorder(Geography, White), y = White, fill=White)) + ```{r} ggplot(AboveIncome, aes(x = reorder(County, White), y = White, fill=White)) + geom_bar(stat = "identity", show.legend = FALSE) + coord_flip() + #this makes it a horizontal bar chart instead of vertical labs(title = "Communities with high median income", subtitle = "2016", caption = "Graphic by Wells", y="Place", x="Median Income") ``` **Basic chart, add labels** labels = geom_text(aes(label = White), hjust = -.1, size = 2.2) + **Use ggsave for high resolution image** ggsave("/Users/rswells/Dropbox/Classes/Data Fall 2020/CovidFall2020/Xx.png",device = "png",width=9,height=6, dpi=800) ```{r} ggplot(AboveIncome, aes(x = reorder(County, White), y = White, fill=White)) + geom_bar(stat = "identity", show.legend = FALSE) + geom_text(aes(label = White), hjust = -.1, size = 2.2) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + coord_flip() + #this makes it a horizontal bar chart instead of vertical labs(title = "Communities with high median income", subtitle = "2016", caption = "Graphic by Wells", y="Place", x="Median Income") ggsave("/Users/rswells/Dropbox/Classes/Data Fall 2020/CovidFall2020/Xx.png",device = "png",width=9,height=6, dpi=800) ``` ### Exercise #6: Build a chart with top 10 counties with lowest median incomes ```{r} TYPE YOUR COMMAND here ``` ### Exercise #7: Index to commands Check the following website and run a command on the ArkRace data https://smach.github.io/R4JournalismBook/HowDoI.html ```{r} TYPE YOUR COMMAND here ``` #--30--