# Impressions on Video Game Developers from Online Forums

## Reasoning:
Knowledge of webscraping gives access to the largest bank of data available. Almost any website can become a source of data. Its use can range from analyzing competitors to learning more about a user base.

It would make sense to scrape through comments of each post too, but that would take far too long. GameFaqs is at least very casual so posts there will be as opinionated as comments.


## Objective:
The slightly long-winded title explains most of what this notebook is about. My goal is to scrape the post titles from Reddit and Gamefaqs to perform sentiment analysis on them on top of a some data exploration.


## Methods:
I will be using selenium to scroll through all reddit posts and to do some other automation used for clicking buttons. BeautifulSoup will be used to scrape and retrieve the actual data.

I will need to scrape a list of current common user agents and lists of free, recent proxies to rotate through for GameFaqs. They use pages and not infinite scrolling so too many http requests will result in a ban. To play it safe and uninterrupted a few measures will be taken. These IPs and user agents could be used for scraping any other website as well.

I will be obtaining all posts from the past year and filter out the ones which don't mention any that don't mention a developer.

## Featured:
- Webscraping with BeautifulSoup and Selenium
- Advanced knowledge on how to rotate proxies and user agents
- Working with Pandas DataFrames
- Data analysis and visualization


In [77]:
import requests
import random
import time
import pandas as pd
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

## Scraping reddit

<br>
Here I create a function which uses a web driver to simulate scrolling and load in all results of the page (reddit limits to around 1000 posts). If I wanted to access the entire archive, there are websites which store all of Reddit's data. However, since I am also getting plenty of data from other forums there is not much need to scrape the archives for the purpose of this notebook.

After the entire page is loaded, we can scrape all the text of each post title.

In [65]:
def reddit_scraper(url):
    '''
    Webscrapes all reddit posts from the given link by scrolling through the "infinite scrolling"
    
    Args:
        url: The url of the subreddit or other reddit page you'd like to scrape
    
    Returns:
        A list of all post titles on that page
    '''
    
    driver = webdriver.Chrome(ChromeDriverManager().install())

    driver.get(url)

    for n in range(600): 
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

        time.sleep(0.5)

        
    page_html = soup(driver.page_source, 'lxml')

    driver.close()
    
    containers = page_html.findAll("a", {'data-click-id' : 'body'})

    post_titles = []
    for container in containers:

        titles = container.find_all("h2", recursive=True)

        for title_tag in titles:
            post_titles.append(title_tag.text)

            
    return post_titles

### Scraping r/games

In [280]:
r_games_url = 'https://www.reddit.com/r/games/top/?t=year'
r_games_posts = reddit_scraper(r_games_url)


Checking for mac64 chromedriver:2.46 in cache
Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver


### Scraping r/gaming

In [281]:
r_gaming_url = 'https://www.reddit.com/r/gaming/top/?t=year'
r_gaming_posts = reddit_scraper(r_gaming_url)


Checking for mac64 chromedriver:2.46 in cache
Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver


### Scraping r/truegaming

In [282]:
r_truegaming_url = 'https://www.reddit.com/r/truegaming/top/?t=year'
r_truegaming_posts = reddit_scraper(r_truegaming_url)


Checking for mac64 chromedriver:2.46 in cache
Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver


### Let's combine all the results from each subreddit

In [289]:
# r/games
r_games_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/games']], columns=['Post', 'Website', 'Board']) 
                           for title in r_games_posts], 
                           ignore_index=True)

# r/gaming
r_gaming_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/gaming']], columns=['Post', 'Website', 'Board']) 
                            for title in r_gaming_posts], 
                            ignore_index=True)

# r/truegaming
r_truegaming_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/truegaming']], columns=['Post', 'Website', 'Board']) 
                                for title in r_truegaming_posts], 
                                ignore_index=True)

# Join all to post_titles
post_titles = pd.DataFrame(columns=['Post', 'Website', 'Board'])
post_titles = post_titles.append([r_games_forums, r_gaming_forums, r_truegaming_forums], ignore_index=True)

### Progress Report
Let's check out the DataFrame so far

In [290]:
print('Shape: {}'.format(post_titles.shape))
post_titles.head(5)

Shape: (3007, 3)


Unnamed: 0,Post,Website,Board
0,"John @Totalbiscuit Bain July 8, 1984 - May 24,...",Reddit,r/games
1,Bungie Splits With Activision,Reddit,r/games
2,"Totalbiscuit hospitalized, his cancer is sprea...",Reddit,r/games
3,[E3 2018] Cyberpunk 2077,Reddit,r/games
4,Sony faces growing Fortnite backlash at E3,Reddit,r/games


## Scraping GameFaqs

GameFaqs will be a different challenge. Rather than infinite scrolling, this website uses pages. This means many http requests will need to be made, so in order to avoid an ip ban and not strain their servers a few things must be done:

1. Rotate user agents
2. Rotate ip addresses
3. Sleep on each request

### 1. Obtaining User Agents to Rotate Through

To accomplish this we have to make use of whatismybrowser.com's list of common current user agents, which means webscraping the page to stay up to date.

In [224]:
def get_agents(browser, num_agents=10, offset=0):
    '''
    Webscrapes whatismybrowser.com for new user agents
    
    Args:
        browser: the browser you want the agent from
        num_agents: number of agents to return
        offset: get agents starting from offset num. on page
    
    Returns:
        A list of user agents from the given browser
    '''
    
    if offset + num_agents > 50:
        return []
    
    try:
        chrome_url = requests.get('https://developers.whatismybrowser.com/useragents/explore/software_name/' \
                                  + browser)
    except:
        print('Browser does not exist. Try lower case')
        return
        

    chrome_html = soup(chrome_url.content)

    chrome_containers = chrome_html.findAll('td', {'class' : 'useragent'})

    user_agents = []
    for i in range(num_agents):

        chrome_agent = chrome_containers[i + offset].a.text

        user_agents.append(chrome_agent)
    
    return user_agents

Get 10 user agents for chrome and 10 for firefox

In [175]:
user_agents = []
user_agents.extend(get_agents('chrome'))
user_agents.extend(get_agents('firefox'))

### 2. Obtaining IP Addresses to Rotate Through

We need to create a similar function for retrieving new proxies. This function is more important to call frequently as IPs should be updated frequently.

In [93]:
def get_ips(num_addresses=20):
    '''
    Webscrapes free-proxy-list.net for new free proxies. This is important because these proxies
    could go bad after just a couple hours.
    
    Args:
        num_addresses: The number of IPs you want returned. If fewer than requested are available,
        return the available amount
    
    Returns:
        A list of new proxies
    '''
        
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get('https://free-proxy-list.net/')
    
    page_html = soup(driver.page_source, 'lxml')
    containers = page_html.findAll('tr', {'role' : 'row'})
    
    ips = []
    ip_num = 0
    page_num = 1
    next_set_btn = driver.find_element_by_xpath('//*[@id="proxylisttable_next"]/a')
    while len(ips) < num_addresses:
                    
        ip_num += 1
        
        # Click next button to get more ips if the current page doesn't have enough
        if ((ip_num % 20) - 1 == 0) and ip_num != 1:
            
            # If reached the last page, return what we have
            if page_num >= 15:
                driver.close()
                return ips
            
            next_set_btn.click()
            next_set_btn = driver.find_element_by_xpath('//*[@id="proxylisttable_next"]/a')
            
            ip_num = 1
            page_html = soup(driver.page_source, 'lxml')
            containers = page_html.findAll('tr', {'role' : 'row'})
            
            page_num += 1
        
        row = containers[ip_num].find_all('td')
        
        ip = row[0].text
        port = row[1].text
        
        if row[6].text == 'yes':
            ips.append(':'.join([ip, port]))

    driver.close()
    
    return ips

In [94]:
ips = get_ips(20)


Checking for mac64 chromedriver:2.46 in cache
Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver


In [95]:
# Print out the proxies to see what they look like
ips

['51.68.112.254:3128',
 '45.32.42.234:8080',
 '178.128.54.73:8080',
 '104.248.16.45:8080',
 '177.38.66.255:45235',
 '95.47.180.171:53484',
 '138.186.23.9:40340',
 '182.160.119.254:56229',
 '103.250.157.43:38641',
 '115.127.39.66:55474',
 '88.210.71.234:46626',
 '177.94.206.67:60666',
 '1.10.186.157:55129',
 '176.197.103.210:53281',
 '109.201.97.235:39125',
 '31.43.143.15:8181',
 '193.213.89.72:51024',
 '183.82.118.87:8080',
 '41.84.131.78:53281',
 '93.77.78.123:42803']

### 3. GameFaqs Scraping Function with Pauses

In [249]:
def gamefaqs_scraper(url, num_pages, ips, user_agents, offset=0, start_page=0):
    '''
    Scrape GameFaqs forums for post titles
    
    Args:
        url: The url to the first page of GameFaqs
        num_pages: Number of pages to scrape
        
    Returns:
        A list of post titles
    '''
    
    rot_list = []
    for ip in ips:
        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])
        
        
    req = ''
    i = 0
    while req == '':
        try:
            agent_proxy_pair = random.choice(rot_list)
            proxy = agent_proxy_pair[0]
            headers = agent_proxy_pair[1]
            
            if start_page == 0:
                req = requests.get(url, headers=headers, proxies=proxy, timeout=10)
            else:
                req = requests.get(url + '?page=' + str(start_page+1), headers=headers, proxies=proxy, timeout=10)
                
            print('Success with IP, ' + proxy['https'])
            
            page_html = soup(req.content)

            containers = page_html.findAll('td', {'class' : 'topic'})
            
            if not containers:
                print('Agent may be banned, removing agent and trying a new one...')
                print(page_html, user_agent)
                try:
                    user_agents.remove(headers['User-Agent'])    
                except:
                    pass
                
                req = ''

        except Exception as e:
            i += 1
            print('Error with IP, ' + proxy['https'] + ' requesting a new one...')
        
            if i % 20 == 0:
                ips = get_ips(20)
                
                rot_list = []
                for ip in ips:
                    rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])
                 
    
    post_titles = []
    for page in range(start_page, num_pages):
        
        for container in containers:
            title = container.a.text
                 
            post_titles.append(title)
        
        
        time.sleep(3)
        
        req = ''
        i = 0
        while req == '':
            try:        
                agent_proxy_pair = random.choice(rot_list)
                proxy = agent_proxy_pair[0]
                headers = agent_proxy_pair[1]

                req = requests.get(url + '?page=' + str(page + 1), headers=headers, proxies=proxy, timeout=10)
                
                page_html = soup(req.content)

                containers = page_html.findAll('td', {'class' : 'topic'})

                if not containers:
                    print('Agent may be banned, removing agent and trying a new one...')
                    
                    try:
                        rot_list.remove(agent_proxy_pair)
                        if not rot_list:
                            print('Loading in new IPs...')
                            ips = get_ips(20)

                            rot_list = []
                            for ip in ips:
                                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])

                            
                        user_agents.remove(headers['User-Agent'])    
                    except:
                        pass
                    
                    req = ''
                    time.sleep(2)
                    
                    if len(user_agents) == 0:
                        print('No more agents, ended at page,', page+1)
                        return post_titles
                else:
                    print('Success with IP ' + proxy['https'] + ', now onto page ', page + 2)
            
            except Exception as e:
                i += 1
                time.sleep(2)
                print('Error with IP ' + proxy['https'] + ', requesting a new one...')
    
                if i % 20 == 0:
                    print('Loading in new IPs...')
                    ips = get_ips(20)
                    
                    rot_list = []
                    for ip in ips:
                        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])
                    
        if page % 100 == 0:
            print('Loading in new IPs...')
            ips = get_ips(20)
            
            rot_list = []
            for ip in ips:
                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])
            
            
    return post_titles

Note: I am simply printing the last 10 outputs for each forum webscraped after realizing the output couldn't be shrinked when uploaded.

### Nintendo Switch forums

In [3]:
switch_url = 'https://gamefaqs.gamespot.com/boards/189706-nintendo-switch'
switch_posts = gamefaqs_scraper(switch_url, num_pages=1700, ips=ips, user_agents=user_agents)

Success with IP 118.174.233.33:54705, now onto page  1692 
Success with IP 203.128.94.102:60152, now onto page  1693 
Success with IP 182.52.238.111:45639, now onto page  1694 
Success with IP 116.203.1.177:1994, now onto page  1695 
Success with IP 217.17.38.245:41506, now onto page  1696 
Success with IP 203.128.94.102:60152, now onto page  1697 
Success with IP 180.180.156.35:49510, now onto page  1698 
Success with IP 1.20.97.4:46965, now onto page  1699 
Success with IP 203.128.94.102:60152, now onto page  1700 
Success with IP 180.180.156.45:32355, now onto page  1701


### Playstation forums

In [225]:
# Use new agents to avoid a temporary ban
user_agents = []
user_agents.extend(get_agents('chrome/2', num_agents=50))
user_agents.extend(get_agents('firefox/2', num_agents=50))
user_agents.extend(get_agents('safari/2', num_agents=50))

In [5]:
ips = get_ips(20)
playstation_url = 'https://gamefaqs.gamespot.com/boards/691087-playstation-4'
playstation_posts = gamefaqs_scraper(playstation_url, num_pages=1935, ips=ips, user_agents=user_agents)

Success with IP 119.192.179.46:55012, now onto page  1932 
Success with IP 1.20.101.150:41904, now onto page  1933  
Error with IP 103.194.192.29:49202, requesting a new one... 
Agent may be banned, removing agent and trying a new one... 
Error with IP 213.14.32.75:47442, requesting a new one... 
Error with IP 103.220.28.180:51493, requesting a new one... 
Error with IP 103.220.28.180:51493, requesting a new one... 
Success with IP 111.91.225.2:8080, now onto page  1934 
Success with IP 119.192.179.46:55012, now onto page  1935 
Success with IP 1.20.101.150:41904, now onto page  1936


### PC forums

In [250]:
# Use new agents to avoid a temporary ban
user_agents = []
user_agents.extend(get_agents('chrome/3', num_agents=50))
user_agents.extend(get_agents('firefox/3', num_agents=50))
user_agents.extend(get_agents('safari/3', num_agents=50))

In [6]:
ips = get_ips(20)
pc_url = 'https://gamefaqs.gamespot.com/boards/916373-pc'
pc_posts = gamefaqs_scraper(pc_url, num_pages=1065, ips=ips, user_agents=user_agents)

Success with IP 176.98.95.247:31955, now onto page  1061 
Error with IP 45.6.100.250:48214, requesting a new one... 
Error with IP 75.98.119.13:57859, requesting a new one... 
Error with IP 45.6.100.250:48214, requesting a new one... 
Success with IP 41.215.81.170:59959, now onto page  1062 
Success with IP 41.215.81.170:59959, now onto page  1063 
Error with IP 45.6.100.250:48214, requesting a new one... 
Success with IP 87.26.3.40:8080, now onto page  1064 
Success with IP 203.205.29.106:39191, now onto page  1065 
Success with IP 87.26.3.40:8080, now onto page  1066


### Xbox One forums

We will just reuse the same user agents here

In [7]:
ips = get_ips(20)
xbox_url = 'https://gamefaqs.gamespot.com/boards/691088-xbox-one'
xbox_posts = gamefaqs_scraper(xbox_url, num_pages=710, ips=ips, user_agents=user_agents)

Error with IP 210.11.181.221:55331, requesting a new one... 
Error with IP 178.128.217.99:8080, requesting a new one... 
Error with IP 31.209.110.159:39494, requesting a new one... 
Error with IP 210.11.181.221:55331, requesting a new one... 
Error with IP 202.91.92.21:43576, requesting a new one... 
Error with IP 5.2.200.145:44508, requesting a new one... 
Success with IP 109.201.142.14:3128, now onto page  710 
Agent may be banned, removing agent and trying a new one... 
Error with IP 124.41.240.191:38167, requesting a new one... 
Success with IP 109.201.142.14:3128, now onto page  711


### Removing Duplicates and Combining All the Results

In [257]:
switch_posts = list(set(switch_posts))
playstation_posts = list(set(playstation_posts))
pc_posts = list(set(pc_posts))
xbox_posts = list(set(xbox_posts))

In [291]:
# Switch Boards
switch_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Switch']], columns=['Post', 'Website', 'Board']) 
                           for title in switch_posts], 
                           ignore_index=True)

# PS4 Boards
playstation_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Playstation 4']], columns=['Post', 'Website', 'Board']) 
                            for title in playstation_posts], 
                            ignore_index=True)

# Xbox One Boards
xbox_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Xbox One']], columns=['Post', 'Website', 'Board']) 
                                for title in xbox_posts], 
                                ignore_index=True)

# PC Boards
pc_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'PC']], columns=['Post', 'Website', 'Board']) 
                                for title in pc_posts], 
                                ignore_index=True)

# Join all to post_titles
post_titles = pd.concat([post_titles, switch_forums, playstation_forums, xbox_forums, pc_forums], ignore_index=True)

We now have all the posts we want and could display the final results

In [292]:
post_titles

Unnamed: 0,Post,Website,Board
0,"John @Totalbiscuit Bain July 8, 1984 - May 24,...",Reddit,r/games
1,Bungie Splits With Activision,Reddit,r/games
2,"Totalbiscuit hospitalized, his cancer is sprea...",Reddit,r/games
3,[E3 2018] Cyberpunk 2077,Reddit,r/games
4,Sony faces growing Fortnite backlash at E3,Reddit,r/games
5,John “TotalBiscuit” Bain to be inducted into E...,Reddit,r/games
6,"Later today, Red Dead 2 gets a new trailer. Be...",Reddit,r/games
7,List of Video Games where you can pet the dogs,Reddit,r/games
8,It's time video game makers unionize.,Reddit,r/games
9,Bethesda Support Leaks Fallout 76 Customer Nam...,Reddit,r/games


## Extracting Titles which Mention Large Game Companies

First I have to make a list of relevant developers and their different nicknames.

In [313]:
# The full name is only listed in cases like 'Activision Blizzard' together with 'Activision' and 'Blizzard'
# in order to label each post in the next step
developers = [['Tencent'], ['Rockstar'], ['Valve'], ['Sony'], ['Microsoft'], ['Nintendo'], ['Bungie'],
              ['Activision Blizzard', 'Activision', 'Activi$ion', 'Blizzard'], ['Electronic Arts', 'EA'],
              ['Bandai Namco', 'Bandai', 'Namco'], ['Ubisoft'], ['Nexon'], ['Telltale'], 
              ['Epic Games', 'Epic'], ['BioWare'], ['Naughty Dog'], ['Square Enix', 'Square'], 
              ['Bunjie'], ['Insomniac'], ['Bethesda'], ['Capcom'], ['Take-Two', 'Take Two', 'Take 2', 'Take2'], 
              ['Sega'], ['Devolver Digital', 'Devolver'], ['Konami'], ['Apple']]

In [335]:
import re

In [376]:
dev_posts = pd.DataFrame(columns=['Post', 'Website', 'Board', 'Developer'])
index = 0
post_dict = {}
for i in range(len(post_titles)):

    all_developers = []
    for dev in developers:
        for nickname in dev:
            
            # Special case for EA. Common nickname but could also be mixed with common words like "each".
            match = False
            if nickname == 'EA':                
                post_title = post_titles['Post'].loc[i]
                
                # Regex to match EA outside of other words
                if re.match(r'([^a-zA-Z]|^)EA([^a-zA-Z]|$)', post_title):
                    all_developers += [dev[0]]
                    match = True
                    
            else:
                post_title = post_titles['Post'].loc[i].lower()
                
                if nickname.lower() in post_title:
                    all_developers += [dev[0]]
                    match = True

            if match:
                if post_dict.get(dev[0]):
                    post_dict[dev[0]].append(post_titles['Post'].loc[i])
                else:
                    post_dict[dev[0]] = [post_titles['Post'].loc[i]]
                break
                
    if all_developers:            
        row = post_titles.loc[i].values.tolist() + [', '.join(all_developers)]
        dev_posts.loc[index] = row
        index += 1

In [337]:
print('Shape, {}'.format(dev_posts.shape))
dev_posts.head()

Shape, (8493, 4)


Unnamed: 0,Post,Website,Board,Developer
0,Bungie Splits With Activision,Reddit,r/games,"Bungie, Activision Blizzard"
1,Sony faces growing Fortnite backlash at E3,Reddit,r/games,Sony
2,"Later today, Red Dead 2 gets a new trailer. Be...",Reddit,r/games,"Rockstar, Take-Two"
3,Bethesda Support Leaks Fallout 76 Customer Nam...,Reddit,r/games,Bethesda
4,"Ubisoft will now ban players for racist, homop...",Reddit,r/games,Ubisoft


## Sentiment Analysis

### Public Impressions of Developers

Now we can finally analyze our data and figure out how well public opinion is in each of these developer's favor.
<br>
First we want to do a simple comparison based on sentiment, this will be a 3 step process:

1. Gather all titles associated with each developer
2. Perform sentiment analysis on each title
3. Calculate the mean of the results for each developer

In [340]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/adrianherrmann/nltk_data...


True

We want to judge sentiments based on the compound score, which is the sum of all lexicon ratings standarized to be within the range from -1 to 1

In [359]:
dev_sentiments = pd.DataFrame(columns=['Mean Sentiment', 'Developer', 
                                       'Most Negative Sentence', 'Most Positive Sentence',
                                       'Most Negative Score', 'Most Positive Score',
                                       'Number of Posts'])
index = 0
sid = SentimentIntensityAnalyzer()
for dev in developers:
    titles = dev_posts[dev_posts['Developer'].str.contains(dev[0])]
    
    if not titles.values.tolist():
        continue
    
    tot_sentiment = 0
    most_neg_sent = ''
    most_pos_sent = ''
    most_neg_score = 1
    most_pos_score = -1
    for title in titles['Post'].values:
        sentiment = sid.polarity_scores(title)['compound']   
        tot_sentiment += sentiment
        
        if sentiment < most_neg_score:
            most_neg_score = sentiment
            most_neg_sent = title
            
        if sentiment > most_pos_score:
            most_pos_score = sentiment
            most_pos_sent = title
    
    mean_sentiment = tot_sentiment / len(titles)
    
    dev_sentiments.loc[index] = [mean_sentiment, dev[0],
                                 most_neg_sent, most_pos_sent,
                                 most_neg_score, most_pos_score,
                                 len(titles)]
    index += 1
    
    
    

Now that the sentiments are analyzed we can view the important details.
<br>
#### NOTE:
It's expected that some of these posts will be wrongfully rated. For example if a game has a name with a negative word and it is mentioned within the same sentence as a developer (think Resident Evil), then the title's score will negatively affect the rating. At least for now, on a grand scheme, these analyses will average out and lean toward how they are truly perceived (given the sample size is large enough).

It is important to dive deeper so that you can apply even more specific filtering and sentiment analysis when analyzing one company, which I will be doing a bit of. The post_dict created earlier will help.

In [361]:
dev_sentiments

Unnamed: 0,Mean Sentiment,Developer,Most Negative Sentence,Most Positive Sentence,Most Negative Score,Most Positive Score,Number of Posts
0,0.138989,Tencent,Does anyone actually play the crappy F2P games...,Should superior Chinese companies like Tencent...,-0.296,0.8176,9
1,-0.043847,Rockstar,Rockstar Lies & Red Dead Online Economy Is A G...,Ubisoft is a BETTER company than Rockstar! LOL...,-0.8625,0.8419,167
2,-0.018581,Valve,Dead before it even released? A valve game?? A...,"Artifact is so good, Kotaku writer wants to re...",-0.7041,0.8147,94
3,0.03159,Sony,Sony's Devil May Cry has arrived. Lost Souls A...,Sony wins best Float at PRIDE 2018,-0.8689,0.9008,1627
4,0.054865,Microsoft,NO! BAD MICROSOFT! I'm so ashamed of you!,Amazing show Microsoft!! My brother even said ...,-0.9191,0.8798,708
5,0.072509,Nintendo,"Resident Evil, Resident Evil 0, and Resident E...",Discovered a Nintendo office close to where I ...,-0.9349,0.9273,3969
6,0.009765,Bungie,Activision currently under investigation for f...,LMAO Anthem is the exact same hustle Bungie us...,-0.5859,0.6841,37
7,0.006517,Activision Blizzard,"Heroes of the Storm pros vent sadness, anger a...",Thank you Activision for CoD Black Ops 4 Black...,-0.7783,0.8555,183
8,-0.105926,Electronic Arts,EA Head Fired For Gross Misconduct,EA are an excellent company that provides chea...,-0.7717,0.8316,130
9,0.03839,Bandai Namco,WTF were Namco Bandai thinking?,Bandai Namco proves to be the best third party...,-0.6739,0.7845,73


### EA and Nintendo

Now let's take a deeper look at a couple of companies with scores on two opposite ends of the spectrum, Electronic Arts and Nintendo. These two have the second worst and second best scores respectively, but they also have plenty of posts, which the developers with the worst and best scores (Take-Two, 20 posts and Tencent, 9 posts) don't have.

#### Nintendo:

In [364]:
dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']

Unnamed: 0,Mean Sentiment,Developer,Most Negative Sentence,Most Positive Sentence,Most Negative Score,Most Positive Score,Number of Posts
5,0.072509,Nintendo,"Resident Evil, Resident Evil 0, and Resident E...",Discovered a Nintendo office close to where I ...,-0.9349,0.9273,3969


In [367]:
print('Nintendo\'s most negative sentence:\n' +
      dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Negative Sentence'].values[0] + '\n')
print('Nintendo\'s most positive sentence:\n' +
      dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Positive Sentence'].values[0] + '\n')

Nintendo's most negative sentence:
Resident Evil, Resident Evil 0, and Resident Evil 4 coming to Nintendo Switch in 2019

Nintendo's most positive sentence:
Discovered a Nintendo office close to where I live and asked if they had any kind of tour or something. Lady told me they hadn’t but she handed me a bag full of cool souvenirs. This coin is definitely the best of all!



For Nintendo it looks like the worst post, which is in fact the most negatively rated post of all threads across all developers, is rated so because it mentions the game "Resident Evil" multiple times. This only testifies for their high overall score.
<br><br>
Nintendo being so well liked comes to no surprise. They without a doubt have the most devout following of any modern gaming company. So many people grew up on Nintendo as children and continue to play their games as adults, many even strictly stick to Nintendo.
<br><br>
Let's get the word frequencies from Nintendo posts.

In [380]:
from collections import Counter

In [414]:
neg_titles = []
for title in post_dict['Nintendo']:
    sentiment = sid.polarity_scores(title)['compound']
    
    if sentiment <= -0.5:
        neg_titles.append(title)


nintendo_words = ' '.join(neg_titles).split(' ')

neu_words=[]

for word in nintendo_words:
    sentiment = sid.polarity_scores(word)['compound']
    
    if (sentiment >= -0.4 and sentiment <= 0.4):
        neu_words.append(word.lower())

neu_freq = Counter(neu_words)

print('Most common neutral words in negative titles: ', neu_freq.most_common(50))

Most common neutral words in negative titles:  [('nintendo', 187), ('the', 71), ('to', 52), ('is', 46), ('switch', 35), ('a', 32), ('of', 29), ('and', 26), ('for', 26), ('why', 26), ('in', 19), ("nintendo's", 19), ('on', 19), ('online', 19), ('you', 19), ('it', 16), ('has', 15), ('have', 14), ('do', 14), ('that', 13), ('so', 13), ('i', 12), ('are', 12), ('what', 11), ('-', 10), ('up', 10), ('games', 10), ('with', 10), ('this', 10), ('will', 9), ('sony', 9), ('if', 9), ('console', 9), ('does', 8), ('e3', 8), ('be', 8), ('an', 7), ('about', 7), ('at', 7), ('get', 7), ('nintendo?', 7), ('game', 7), ('think', 6), ('was', 6), ('or', 6), ('would', 6), ('not', 6), ('how', 6), ('most', 6), ('did', 6)]


Going down the list we see some common words, but then notice one which should definitely not be common:
<br>
'Online'

Here are some sentences containing online in the titles with negative sentiments, there are 19 posts total.

In [416]:
index = 0
bad_count = 0
while bad_count < 10:
    if 'online' in neg_titles[index].lower():
        print(neg_titles[index])
        bad_count += 1
    index += 1

Jim Sterling: The Online system makes nintendo look weak and stupid
Nintendo Switch Paid Online Still a Disaster? - Nintendo Direct Review
So nintendo online was a scam
Everytime I finish a mission in Resident Evil a Nintendo Online message appears
Would it of killed Nintendo to add promotion SNES titles to new Online Subs?
Nintendo's paid online is bad.  FACT.
scumbag nintendo wont let me try the darksouls demo without online.
Will Nintendo Switch Online kill multiplayer lobbies?
Nintendo would be dumb to not have an online paywall TBH.
What the hell does Nintendo online even include?


From this it's obvious that one very critical complaint of Nintendo is the Online system they have in place. If there was one thing they could do to please their base, it would be to address the paywall and offer more with their online subscription (i.e. it's lackluster). We can figure this all out just based on these posts.

#### Electronic Arts:

In [363]:
dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']

Unnamed: 0,Mean Sentiment,Developer,Most Negative Sentence,Most Positive Sentence,Most Negative Score,Most Positive Score,Number of Posts
8,-0.105926,Electronic Arts,EA Head Fired For Gross Misconduct,EA are an excellent company that provides chea...,-0.7717,0.8316,130


In [369]:
print('Electronic Arts\'s most negative sentence:\n' +
      dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Negative Sentence'].values[0] + '\n')
print('Electronic Arts\'s most positive sentence:\n' +
      dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Positive Sentence'].values[0] + '\n')

Electronic Arts's most negative sentence:
EA Head Fired For Gross Misconduct

Electronic Arts's most positive sentence:
EA are an excellent company that provides cheap access to a lot of great games



Unlike Nintendo, many people online really hate EA. In fact, they have the most downvoted comment of any post in Reddit history, which should be a testament to how negatively they are seen. Still, they continue to be pretty successful. Apex Legends, a new game they recently released, seems to be gaining rapid popularity. Public opinion on the way they monetize their games seems to be changing, which may be a good indicator that people will once again have a positive attitude towards the company.
<br><br>
Similarly I am going to check EA's sentiment frequency.

In [418]:
neg_titles = []
for title in post_dict['Electronic Arts']:
    sentiment = sid.polarity_scores(title)['compound']
    
    if sentiment <= -0.5:
        neg_titles.append(title)


ea_words = ' '.join(neg_titles).split(' ')

neu_words=[]

for word in ea_words:
    sentiment = sid.polarity_scores(word)['compound']
    
    if (sentiment >= -0.4 and sentiment <= 0.4):
        neu_words.append(word.lower())

neu_freq = Counter(neu_words)

print('Most common neutral words in negative titles: ', neu_freq.most_common(75))

Most common neutral words in negative titles:  [('ea', 21), ('for', 9), ('star', 6), ('game', 6), ('is', 6), ('of', 4), ('open', 4), ('world', 4), ('hiring', 3), ('under', 3), ('investigation', 3), ('to', 3), ('the', 3), ('cancels', 2), ('open-world', 2), ('an', 2), ('anthem', 2), ('video', 2), ('removed', 2), ('another', 2), ('sell', 2), ('lootboxes', 2), ('in', 2), ('belgium', 2), ('games', 2), ('are', 2), ('stocks', 2), ('by', 2), ('bfv', 2), ('sales', 2), ('people', 2), ('ea:', 1), ('youtube', 1), ("creator's", 1), ('disclosure', 1), ('not', 1), ('content', 1), ('head', 1), ('misconduct', 1), ('zelda', 1), ('botw', 1), ('mass', 1), ("effect's", 1), ('franchise', 1), ('continuing', 1), ('automatically', 1), ('loses.', 1), ('conference.', 1), ('and', 1), ('got', 1), ('downgraded', 1), ('says', 1), ('singleplayer', 1), ('god', 1), ('goty.', 1), ('massive', 1), ('blow', 1), ('&', 1), ('plummet!', 1), ('should', 1), ('space', 1), ('ip', 1), ('falling', 1), ('apart.', 1), ('we', 1), ('pl

Because EA has a smaller sample size we should look at multiple words to get some more intuition.

In [421]:
index = 0
bad_count = 0
while bad_count < 10:
    if 'star' in neg_titles[index].lower() \
    or 'anthem' in neg_titles[index].lower() \
    or 'lootboxes' in neg_titles[index].lower():
        
        print(neg_titles[index])
        bad_count += 1
    index += 1

EA Cancels Open-World Star Wars Game
EA Is Hiring For An Open-World Star Wars Game
EA: YouTube creator's Anthem video removed for disclosure failure, not content
EA is under criminal investigation for continuing to sell Lootboxes in Belgium
EA Automatically loses. Horrible Conference. And Anthem got DOWNGRADED
EA's Open World Star Wars Game Cancelled
EA Vancouver hiring for open world Star Wars game
EA is hiring people for an open world star wars game...
EA cancels open world Star Wars game
EA is under criminal investigation by the Belgium government for FIFA lootboxes


The most critical complaints are on anthem and lootboxes while the negative sentiment with Star Wars seems to be more of a disappointment that a game was cancelled given the several posts on the topic. Again, the frequency size is small because the number of posts wasn't much but we can still extract a good amount of information.

## Final Remarks

There is plenty more that can be done, like getting more data from the comments. This would give much more input and allow us to view even more opinionated posts, meaning a better consensus of how people feel about different companies. This is at least a taste of what can be done