--- title: "Introduction to `rtweet` package" author: date: urlcolor: blue output: html_document: toc: true toc_depth: 2 toc_float: true number_sections: true fig_caption: true highlight: tango theme: default df_print: tibble --- ```{r, echo=FALSE, include=FALSE} knitr::opts_chunk$set(comment = '', highlight = TRUE) ``` > All you need is a Twitter account (user name and password) and you can be up in running in minutes! _Source: [`rtweet` README](https://cran.r-project.org/web/packages/rtweet/readme/README.html)_ # Introduction Load packages: ```{r, message=FALSE} # install.packages('rtweet') library(rtweet) library(tidyverse) ``` Documentation and resources: - https://cran.r-project.org/web/packages/rtweet/readme/README.html - https://cloud.r-project.org/web/packages/rtweet/rtweet.pdf - https://cran.csiro.au/web/packages/rtweet/vignettes/intro.html # `rtweet` package
**What is the `rtweet` package?** - R package for interacting with Twitter's API - An API ([application programming interface](https://en.wikipedia.org/wiki/API)) is a computing interface which defines interactions between multiple software intermediaries (e.g., the kinds of calls or requests that can be made) - We can use it to make requests to Twitter's API and request data
**How to get started**: - Create a [Twitter account](https://twitter.com/) if you don't have one already - Load the `rtweet` package - In your console (i.e., an interactive session of R), make a request to Twitter's API using one of `rtweet`'s functions (e.g., `search_tweets()`) - A browser window should pop up, prompting you to log in your Twitter account and authorize the `rstats2twitter` app

```r > tweets <- search_tweets('#ucla') ``` ``` Requesting token on behalf of user... Waiting for authentication in browser... Press Esc/Ctrl + C to abort ```
Twitter authentication
# `search_tweets()` function
**Description**: `search_tweets()` is a function from the `rtweet` package that returns Twitter statuses matching a user-provided search query.
**Usage**: ```{r, eval=FALSE} ?search_tweets # SYNTAX AND DEFAULT VALUES search_tweets( q, n = 100, type = "recent", include_rts = TRUE, geocode = NULL, max_id = NULL, parse = TRUE, token = NULL, retryonratelimit = FALSE, verbose = TRUE, ... ) ```
**Arguments**: - `q`: query to be searched (same as if you searched [here](https://twitter.com/explore)) - Use `OR` to search for tweets containing either search terms (e.g., `data OR science`) - Use `from:` to search for tweets from certain handles (e.g., `from:UCLA`) - Use `filter:` to only include certain kinds of tweets (e.g., `filter:verified`) - Use `-filter:` to exclude certain kinds of tweets (e.g., `-filter:replies`) - `n`: total number of tweets to return - `include_rts`: whether or not to include retweets in search results - etc. # Example
**Request data**: We will use `rtweet` to pull Twitter data from the PAC-12 universities. We will use the university admissions Twitter handle if there is one, or the main Twitter handle for the university if there isn't one: ```{r} # PAC-12 university handles p12 <- c('uaadmissions', 'FutureSunDevils', 'caladmissions', 'UCLAAdmission', 'futurebuffs', 'uoregon', 'BeaverVIP', 'USCAdmission', 'engagestanford', 'UtahAdmissions', 'UW', 'WSUPullman') # Use `OR` to search for tweets containing any of the handles my_query <- paste0('from:', p12, collapse = ' OR ') my_query ``` ```{r, echo=FALSE} # Load saved Twitter data p12_df <- readRDS(url('https://github.com/anyone-can-cook/rclass2/raw/master/data/p12_dataset.RDS')) ``` ```{r, eval=FALSE} # Request Twitter data p12_df_2 <- search_tweets(my_query, n = 500) ```
**Analyze data**: `search_tweets()` conveniently returns the data in a dataframe, so it's ready for you to perform data manipulations! ```{r} # Inspect dataframe str(p12_df, max.level = 1, strict.width = 'cut') # Select certain variables df <- p12_df %>% select('user_id', 'created_at', 'screen_name', 'text', 'location') head(df) # Filter and sort variables df %>% filter(screen_name == 'UCLAAdmission') %>% arrange(created_at) # Aggregate data df %>% group_by(screen_name) %>% count() %>% arrange(desc(n)) ```