{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Impressions on Video Game Developers from Online Forums\n",
"\n",
"## Reasoning:\n",
"Web scraping opens up one of the largest pools of data available: almost any website can become a data source. Its uses range from analyzing competitors to learning more about a user base.\n",
"\n",
"It would make sense to scrape the comments on each post too, but that would take far too long. GameFaqs is casual enough that its post titles tend to be as opinionated as comments.\n",
"\n",
"\n",
"## Objective:\n",
"The slightly long-winded title explains most of what this notebook is about. My goal is to scrape post titles from Reddit and GameFaqs, run sentiment analysis on them, and do some data exploration along the way.\n",
"\n",
"\n",
"## Methods:\n",
"I will use Selenium to scroll through the Reddit posts and to automate button clicks. BeautifulSoup will parse the pages and retrieve the actual data.\n",
"\n",
"For GameFaqs, I will need to scrape a list of current, common user agents and lists of recent free proxies to rotate through. GameFaqs uses pages rather than infinite scrolling, so too many HTTP requests can result in a ban. A few measures will be taken to play it safe and keep the scrape uninterrupted. The same IPs and user agents could be used for scraping other websites as well.\n",
"\n",
"I will obtain all posts from the past year and filter out the ones that don't mention a developer.\n",
"\n",
"## Featured:\n",
"- Webscraping with BeautifulSoup and Selenium\n",
"- Advanced knowledge on how to rotate proxies and user agents\n",
"- Working with Pandas DataFrames\n",
"- Data analysis and visualization\n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import random\n",
"import time\n",
"import pandas as pd\n",
"from bs4 import BeautifulSoup as soup\n",
"from selenium import webdriver\n",
"from webdriver_manager.chrome import ChromeDriverManager"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scraping reddit\n",
"\n",
"Here I create a function that uses a web driver to simulate scrolling and load all results on the page (Reddit limits a listing to around 1,000 posts). There are websites that archive all of Reddit's data, should the full history be needed. Since I am also getting plenty of data from other forums, there is no need to scrape the archives for this notebook.\n",
"\n",
"After the entire page is loaded, we can scrape all the text of each post title."
]
},
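{
"cell_type": "markdown",
"metadata": {},
"source": [
"A fixed number of scrolls (as in `reddit_scraper`) can stop too early or keep scrolling long after the feed has ended. As a minimal sketch, and not the approach used in this notebook, one alternative is to compare `document.body.scrollHeight` before and after each scroll and stop once it no longer grows. The helper name `scroll_until_stable` and its parameters are assumptions made for illustration; it relies only on the driver's `execute_script` method.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"\n",
"def scroll_until_stable(driver, pause=0.5, max_scrolls=600):\n",
"    # Hypothetical helper: scrolls until the page height stops growing\n",
"    # or max_scrolls is reached, and returns the number of scrolls made\n",
"    last_height = driver.execute_script('return document.body.scrollHeight')\n",
"    scrolls = 0\n",
"    while scrolls < max_scrolls:\n",
"        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')\n",
"        scrolls += 1\n",
"        # Give the page time to load the next batch of posts\n",
"        time.sleep(pause)\n",
"        new_height = driver.execute_script('return document.body.scrollHeight')\n",
"        if new_height == last_height:\n",
"            break\n",
"        last_height = new_height\n",
"    return scrolls\n",
"```\n",
"\n",
"On a slow connection the height may not change within one pause, so a longer `pause` is the safer choice if this sketch is tried."
]
},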
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"def reddit_scraper(url):\n",
"    '''\n",
"    Scrapes all post titles from a reddit page by scrolling through the \"infinite scrolling\" feed\n",
"    \n",
"    Args:\n",
"        url: The url of the subreddit or other reddit page you'd like to scrape\n",
"    \n",
"    Returns:\n",
"        A list of all post titles on that page\n",
"    '''\n",
"    \n",
"    # Selenium 3 style call; Selenium 4 expects a Service object instead\n",
"    driver = webdriver.Chrome(ChromeDriverManager().install())\n",
"\n",
"    driver.get(url)\n",
"\n",
"    # Scroll to the bottom repeatedly so reddit keeps loading more posts\n",
"    for n in range(600):\n",
"        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')\n",
"\n",
"        time.sleep(0.5)\n",
"\n",
"    page_html = soup(driver.page_source, 'lxml')\n",
"\n",
"    # quit() ends the whole browser session, not just the current window\n",
"    driver.quit()\n",
"\n",
"    containers = page_html.find_all(\"a\", {'data-click-id' : 'body'})\n",
"\n",
"    post_titles = []\n",
"    for container in containers:\n",
"\n",
"        titles = container.find_all(\"h2\", recursive=True)\n",
"\n",
"        for title_tag in titles:\n",
"            post_titles.append(title_tag.text)\n",
"\n",
"    return post_titles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scraping r/games"
]
},
{
"cell_type": "code",
"execution_count": 280,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Checking for mac64 chromedriver:2.46 in cache\n",
"Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n"
]
}
],
"source": [
"r_games_url = 'https://www.reddit.com/r/games/top/?t=year'\n",
"r_games_posts = reddit_scraper(r_games_url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scraping r/gaming"
]
},
{
"cell_type": "code",
"execution_count": 281,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Checking for mac64 chromedriver:2.46 in cache\n",
"Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n"
]
}
],
"source": [
"r_gaming_url = 'https://www.reddit.com/r/gaming/top/?t=year'\n",
"r_gaming_posts = reddit_scraper(r_gaming_url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scraping r/truegaming"
]
},
{
"cell_type": "code",
"execution_count": 282,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Checking for mac64 chromedriver:2.46 in cache\n",
"Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n"
]
}
],
"source": [
"r_truegaming_url = 'https://www.reddit.com/r/truegaming/top/?t=year'\n",
"r_truegaming_posts = reddit_scraper(r_truegaming_url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's combine all the results from each subreddit"
]
},
{
"cell_type": "code",
"execution_count": 289,
"metadata": {},
"outputs": [],
"source": [
"# r/games\n",
"r_games_forums = pd.DataFrame({'Post': r_games_posts, 'Website': 'Reddit', 'Board': 'r/games'})\n",
"\n",
"# r/gaming\n",
"r_gaming_forums = pd.DataFrame({'Post': r_gaming_posts, 'Website': 'Reddit', 'Board': 'r/gaming'})\n",
"\n",
"# r/truegaming\n",
"r_truegaming_forums = pd.DataFrame({'Post': r_truegaming_posts, 'Website': 'Reddit', 'Board': 'r/truegaming'})\n",
"\n",
"# Join all to post_titles (DataFrame.append was removed in pandas 2.0, so use pd.concat)\n",
"post_titles = pd.concat([r_games_forums, r_gaming_forums, r_truegaming_forums], ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Progress Report\n",
"Let's check out the DataFrame so far"
]
},
{
"cell_type": "code",
"execution_count": 290,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape: (3007, 3)\n"
]
},
{
"data": {
"text/plain": [
" Post Website Board\n",
"0 John @Totalbiscuit Bain July 8, 1984 - May 24,... Reddit r/games\n",
"1 Bungie Splits With Activision Reddit r/games\n",
"2 Totalbiscuit hospitalized, his cancer is sprea... Reddit r/games\n",
"3 [E3 2018] Cyberpunk 2077 Reddit r/games\n",
"4 Sony faces growing Fortnite backlash at E3 Reddit r/games"
]
},
"execution_count": 290,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Shape: {}'.format(post_titles.shape))\n",
"post_titles.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scraping GameFaqs\n",
"\n",
"GameFaqs will be a different challenge. Rather than infinite scrolling, this website uses pages. This means many HTTP requests will need to be made, so to avoid an IP ban and not strain their servers, a few things must be done:\n",
"\n",
"1. Rotate user agents\n",
"2. Rotate IP addresses\n",
"3. Sleep on each request"
]
},
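{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the full functions, here is a minimal sketch of the rotation idea behind the first two measures. The helper names `build_rotation` and `drop_pair` are assumptions for illustration, and the proxies and agents are placeholder values (addresses from a documentation-only range), not working ones. Each proxy is paired with a random user agent, one pair is drawn per request, and a pair that appears banned is dropped.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"def build_rotation(ips, user_agents):\n",
"    # Hypothetical helper: pairs every proxy with a randomly chosen user agent,\n",
"    # as [proxies, headers] dicts in the shape requests.get accepts\n",
"    return [[{'https': ip}, {'User-Agent': random.choice(user_agents)}] for ip in ips]\n",
"\n",
"\n",
"def drop_pair(rotation, pair):\n",
"    # Removes a pair that looks banned and reports whether any pairs remain\n",
"    if pair in rotation:\n",
"        rotation.remove(pair)\n",
"    return len(rotation) > 0\n",
"\n",
"\n",
"# Placeholder values only\n",
"rotation = build_rotation(['203.0.113.10:8080', '203.0.113.11:3128'], ['agent-a', 'agent-b'])\n",
"proxies, headers = random.choice(rotation)\n",
"```\n",
"\n",
"The third measure is a `time.sleep` call between requests, which keeps the request rate low regardless of which pair is drawn."
]
},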
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Obtaining User Agents to Rotate Through\n",
"\n",
"To accomplish this, we use whatismybrowser.com's list of common current user agents. Scraping that page each time keeps the list up to date."
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"def get_agents(browser, num_agents=10, offset=0):\n",
"    '''\n",
"    Webscrapes whatismybrowser.com for new user agents\n",
"    \n",
"    Args:\n",
"        browser: the browser you want the agent from\n",
"        num_agents: number of agents to return\n",
"        offset: get agents starting from offset num. on page\n",
"    \n",
"    Returns:\n",
"        A list of user agents from the given browser\n",
"    '''\n",
"    \n",
"    # A results page is assumed to list at most 50 agents\n",
"    if offset + num_agents > 50:\n",
"        return []\n",
"    \n",
"    try:\n",
"        response = requests.get('https://developers.whatismybrowser.com/useragents/explore/software_name/'\n",
"                                + browser, timeout=10)\n",
"        # requests does not raise on a 404 by itself, so check the status explicitly\n",
"        response.raise_for_status()\n",
"    except requests.RequestException:\n",
"        print('Could not load agents for this browser. Try lower case')\n",
"        # Return an empty list so callers can still use extend()\n",
"        return []\n",
"\n",
"    agents_html = soup(response.content, 'lxml')\n",
"\n",
"    agent_cells = agents_html.find_all('td', {'class' : 'useragent'})\n",
"\n",
"    user_agents = []\n",
"    # Slicing avoids an IndexError when the page lists fewer agents than requested\n",
"    for cell in agent_cells[offset:offset + num_agents]:\n",
"        user_agents.append(cell.a.text)\n",
"    \n",
"    return user_agents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get 10 user agents for chrome and 10 for firefox"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [],
"source": [
"user_agents = []\n",
"user_agents.extend(get_agents('chrome'))\n",
"user_agents.extend(get_agents('firefox'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Obtaining IP Addresses to Rotate Through\n",
"\n",
"We need a similar function for retrieving new proxies. This one should be called more often, since free proxies go stale quickly."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"def get_ips(num_addresses=20):\n",
"    '''\n",
"    Webscrapes free-proxy-list.net for new free proxies. This is important because these proxies\n",
"    could go bad after just a couple hours.\n",
"    \n",
"    Args:\n",
"        num_addresses: The number of IPs you want returned. If fewer than requested are available,\n",
"                       return the available amount\n",
"    \n",
"    Returns:\n",
"        A list of new proxies\n",
"    '''\n",
"    \n",
"    driver = webdriver.Chrome(ChromeDriverManager().install())\n",
"    driver.get('https://free-proxy-list.net/')\n",
"    \n",
"    page_html = soup(driver.page_source, 'lxml')\n",
"    containers = page_html.find_all('tr', {'role' : 'row'})\n",
"    \n",
"    # find_element(by, value) replaces find_element_by_xpath, which newer Selenium releases removed\n",
"    next_btn_xpath = '//*[@id=\"proxylisttable_next\"]/a'\n",
"    \n",
"    ips = []\n",
"    ip_num = 0\n",
"    page_num = 1\n",
"    next_set_btn = driver.find_element('xpath', next_btn_xpath)\n",
"    while len(ips) < num_addresses:\n",
"        \n",
"        ip_num += 1\n",
"        \n",
"        # Click next button to get more ips if the current page doesn't have enough\n",
"        if ((ip_num % 20) - 1 == 0) and ip_num != 1:\n",
"            \n",
"            # If reached the last page, return what we have\n",
"            if page_num >= 15:\n",
"                driver.quit()\n",
"                return ips\n",
"            \n",
"            next_set_btn.click()\n",
"            next_set_btn = driver.find_element('xpath', next_btn_xpath)\n",
"            \n",
"            ip_num = 1\n",
"            page_html = soup(driver.page_source, 'lxml')\n",
"            containers = page_html.find_all('tr', {'role' : 'row'})\n",
"            \n",
"            page_num += 1\n",
"        \n",
"        row = containers[ip_num].find_all('td')\n",
"        \n",
"        ip = row[0].text\n",
"        port = row[1].text\n",
"        \n",
"        # Column 6 is expected to be the Https flag; keep only proxies that support it\n",
"        if row[6].text == 'yes':\n",
"            ips.append(':'.join([ip, port]))\n",
"\n",
"    driver.quit()\n",
"    \n",
"    return ips"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Checking for mac64 chromedriver:2.46 in cache\n",
"Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n"
]
}
],
"source": [
"ips = get_ips(20)"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['51.68.112.254:3128',\n",
" '45.32.42.234:8080',\n",
" '178.128.54.73:8080',\n",
" '104.248.16.45:8080',\n",
" '177.38.66.255:45235',\n",
" '95.47.180.171:53484',\n",
" '138.186.23.9:40340',\n",
" '182.160.119.254:56229',\n",
" '103.250.157.43:38641',\n",
" '115.127.39.66:55474',\n",
" '88.210.71.234:46626',\n",
" '177.94.206.67:60666',\n",
" '1.10.186.157:55129',\n",
" '176.197.103.210:53281',\n",
" '109.201.97.235:39125',\n",
" '31.43.143.15:8181',\n",
" '193.213.89.72:51024',\n",
" '183.82.118.87:8080',\n",
" '41.84.131.78:53281',\n",
" '93.77.78.123:42803']"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print out the proxies to see what they look like\n",
"ips"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. GameFaqs Scraping Function with Pauses"
]
},
{
"cell_type": "code",
"execution_count": 249,
"metadata": {},
"outputs": [],
"source": [
"def gamefaqs_scraper(url, num_pages, ips, user_agents, offset=0, start_page=0):\n",
"    '''\n",
"    Scrape GameFaqs forums for post titles\n",
"    \n",
"    Args:\n",
"        url: The url to the first page of a GameFaqs board\n",
"        num_pages: Number of pages to scrape\n",
"        ips: List of 'ip:port' proxies to rotate through\n",
"        user_agents: List of user agent strings to rotate through\n",
"        offset: Currently unused\n",
"        start_page: Page to start from (0 is the first page)\n",
"    \n",
"    Returns:\n",
"        A list of post titles\n",
"    '''\n",
"    \n",
"    # Pair every proxy with a random user agent\n",
"    rot_list = []\n",
"    for ip in ips:\n",
"        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"    \n",
"    # Keep retrying the first page until a proxy/agent pair returns topics\n",
"    req = ''\n",
"    i = 0\n",
"    while req == '':\n",
"        try:\n",
"            agent_proxy_pair = random.choice(rot_list)\n",
"            proxy = agent_proxy_pair[0]\n",
"            headers = agent_proxy_pair[1]\n",
"            \n",
"            if start_page == 0:\n",
"                req = requests.get(url, headers=headers, proxies=proxy, timeout=10)\n",
"            else:\n",
"                req = requests.get(url + '?page=' + str(start_page+1), headers=headers, proxies=proxy, timeout=10)\n",
"            \n",
"            print('Success with IP, ' + proxy['https'])\n",
"            \n",
"            page_html = soup(req.content, 'lxml')\n",
"\n",
"            containers = page_html.find_all('td', {'class' : 'topic'})\n",
"            \n",
"            if not containers:\n",
"                print('Agent may be banned, removing agent and trying a new one...')\n",
"                # The original printed an undefined name here, which raised a NameError\n",
"                print(page_html, headers['User-Agent'])\n",
"                try:\n",
"                    user_agents.remove(headers['User-Agent'])\n",
"                except ValueError:\n",
"                    pass\n",
"                \n",
"                req = ''\n",
"\n",
"        except Exception:\n",
"            i += 1\n",
"            print('Error with IP, ' + proxy['https'] + ' requesting a new one...')\n",
"            \n",
"            # After 20 failures, assume the proxies went stale and fetch new ones\n",
"            if i % 20 == 0:\n",
"                ips = get_ips(20)\n",
"                \n",
"                rot_list = []\n",
"                for ip in ips:\n",
"                    rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"    \n",
"    \n",
"    post_titles = []\n",
"    for page in range(start_page, num_pages):\n",
"        \n",
"        # Collect the titles from the page fetched last\n",
"        for container in containers:\n",
"            title = container.a.text\n",
"            \n",
"            post_titles.append(title)\n",
"        \n",
"        # Pause between pages to avoid straining the server\n",
"        time.sleep(3)\n",
"        \n",
"        req = ''\n",
"        i = 0\n",
"        while req == '':\n",
"            try:\n",
"                agent_proxy_pair = random.choice(rot_list)\n",
"                proxy = agent_proxy_pair[0]\n",
"                headers = agent_proxy_pair[1]\n",
"\n",
"                req = requests.get(url + '?page=' + str(page + 1), headers=headers, proxies=proxy, timeout=10)\n",
"                \n",
"                page_html = soup(req.content, 'lxml')\n",
"\n",
"                containers = page_html.find_all('td', {'class' : 'topic'})\n",
"\n",
"                if not containers:\n",
"                    print('Agent may be banned, removing agent and trying a new one...')\n",
"                    \n",
"                    try:\n",
"                        rot_list.remove(agent_proxy_pair)\n",
"                        if not rot_list:\n",
"                            print('Loading in new IPs...')\n",
"                            ips = get_ips(20)\n",
"\n",
"                            rot_list = []\n",
"                            for ip in ips:\n",
"                                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"\n",
"                        user_agents.remove(headers['User-Agent'])\n",
"                    except (ValueError, IndexError):\n",
"                        pass\n",
"                    \n",
"                    req = ''\n",
"                    time.sleep(2)\n",
"                    \n",
"                    if len(user_agents) == 0:\n",
"                        print('No more agents, ended at page,', page+1)\n",
"                        return post_titles\n",
"                else:\n",
"                    print('Success with IP ' + proxy['https'] + ', now onto page ', page + 2)\n",
"                \n",
"            except Exception:\n",
"                i += 1\n",
"                time.sleep(2)\n",
"                print('Error with IP ' + proxy['https'] + ', requesting a new one...')\n",
"                \n",
"                if i % 20 == 0:\n",
"                    print('Loading in new IPs...')\n",
"                    ips = get_ips(20)\n",
"                    \n",
"                    rot_list = []\n",
"                    for ip in ips:\n",
"                        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"        \n",
"        # Refresh the proxies every 100 pages, since free proxies go bad quickly\n",
"        if page % 100 == 0:\n",
"            print('Loading in new IPs...')\n",
"            ips = get_ips(20)\n",
"            \n",
"            rot_list = []\n",
"            for ip in ips:\n",
"                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"    \n",
"    \n",
"    return post_titles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: I am only showing the last 10 lines of output for each forum scraped, since the full output could not be collapsed when uploaded."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Nintendo Switch forums"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Success with IP 118.174.233.33:54705, now onto page 1692 \n",
"Success with IP 203.128.94.102:60152, now onto page 1693 \n",
"Success with IP 182.52.238.111:45639, now onto page 1694 \n",
"Success with IP 116.203.1.177:1994, now onto page 1695 \n",
"Success with IP 217.17.38.245:41506, now onto page 1696 \n",
"Success with IP 203.128.94.102:60152, now onto page 1697 \n",
"Success with IP 180.180.156.35:49510, now onto page 1698 \n",
"Success with IP 1.20.97.4:46965, now onto page 1699 \n",
"Success with IP 203.128.94.102:60152, now onto page 1700 \n",
"Success with IP 180.180.156.45:32355, now onto page 1701\n"
]
}
],
"source": [
"switch_url = 'https://gamefaqs.gamespot.com/boards/189706-nintendo-switch'\n",
"switch_posts = gamefaqs_scraper(switch_url, num_pages=1700, ips=ips, user_agents=user_agents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Playstation forums"
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [],
"source": [
"# Use new agents to avoid a temporary ban\n",
"user_agents = []\n",
"user_agents.extend(get_agents('chrome/2', num_agents=50))\n",
"user_agents.extend(get_agents('firefox/2', num_agents=50))\n",
"user_agents.extend(get_agents('safari/2', num_agents=50))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Success with IP 119.192.179.46:55012, now onto page 1932 \n",
"Success with IP 1.20.101.150:41904, now onto page 1933 \n",
"Error with IP 103.194.192.29:49202, requesting a new one... \n",
"Agent may be banned, removing agent and trying a new one... \n",
"Error with IP 213.14.32.75:47442, requesting a new one... \n",
"Error with IP 103.220.28.180:51493, requesting a new one... \n",
"Error with IP 103.220.28.180:51493, requesting a new one... \n",
"Success with IP 111.91.225.2:8080, now onto page 1934 \n",
"Success with IP 119.192.179.46:55012, now onto page 1935 \n",
"Success with IP 1.20.101.150:41904, now onto page 1936\n"
]
}
],
"source": [
"ips = get_ips(20)\n",
"playstation_url = 'https://gamefaqs.gamespot.com/boards/691087-playstation-4'\n",
"playstation_posts = gamefaqs_scraper(playstation_url, num_pages=1935, ips=ips, user_agents=user_agents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PC forums"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {},
"outputs": [],
"source": [
"# Use new agents to avoid a temporary ban\n",
"user_agents = []\n",
"user_agents.extend(get_agents('chrome/3', num_agents=50))\n",
"user_agents.extend(get_agents('firefox/3', num_agents=50))\n",
"user_agents.extend(get_agents('safari/3', num_agents=50))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Success with IP 176.98.95.247:31955, now onto page 1061 \n",
"Error with IP 45.6.100.250:48214, requesting a new one... \n",
"Error with IP 75.98.119.13:57859, requesting a new one... \n",
"Error with IP 45.6.100.250:48214, requesting a new one... \n",
"Success with IP 41.215.81.170:59959, now onto page 1062 \n",
"Success with IP 41.215.81.170:59959, now onto page 1063 \n",
"Error with IP 45.6.100.250:48214, requesting a new one... \n",
"Success with IP 87.26.3.40:8080, now onto page 1064 \n",
"Success with IP 203.205.29.106:39191, now onto page 1065 \n",
"Success with IP 87.26.3.40:8080, now onto page 1066\n"
]
}
],
"source": [
"ips = get_ips(20)\n",
"pc_url = 'https://gamefaqs.gamespot.com/boards/916373-pc'\n",
"pc_posts = gamefaqs_scraper(pc_url, num_pages=1065, ips=ips, user_agents=user_agents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Xbox One forums\n",
"\n",
"We will just reuse the same user agents here"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Error with IP 210.11.181.221:55331, requesting a new one... \n",
"Error with IP 178.128.217.99:8080, requesting a new one... \n",
"Error with IP 31.209.110.159:39494, requesting a new one... \n",
"Error with IP 210.11.181.221:55331, requesting a new one... \n",
"Error with IP 202.91.92.21:43576, requesting a new one... \n",
"Error with IP 5.2.200.145:44508, requesting a new one... \n",
"Success with IP 109.201.142.14:3128, now onto page 710 \n",
"Agent may be banned, removing agent and trying a new one... \n",
"Error with IP 124.41.240.191:38167, requesting a new one... \n",
"Success with IP 109.201.142.14:3128, now onto page 711\n"
]
}
],
"source": [
"ips = get_ips(20)\n",
"xbox_url = 'https://gamefaqs.gamespot.com/boards/691088-xbox-one'\n",
"xbox_posts = gamefaqs_scraper(xbox_url, num_pages=710, ips=ips, user_agents=user_agents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing Duplicates and Combining All the Results"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [],
"source": [
"switch_posts = list(set(switch_posts))\n",
"playstation_posts = list(set(playstation_posts))\n",
"pc_posts = list(set(pc_posts))\n",
"xbox_posts = list(set(xbox_posts))"
]
},
{
"cell_type": "code",
"execution_count": 291,
"metadata": {},
"outputs": [],
"source": [
"# Switch Boards\n",
"switch_forums = pd.DataFrame({'Post': switch_posts, 'Website': 'GameFaqs', 'Board': 'Switch'})\n",
"\n",
"# PS4 Boards\n",
"playstation_forums = pd.DataFrame({'Post': playstation_posts, 'Website': 'GameFaqs', 'Board': 'Playstation 4'})\n",
"\n",
"# Xbox One Boards\n",
"xbox_forums = pd.DataFrame({'Post': xbox_posts, 'Website': 'GameFaqs', 'Board': 'Xbox One'})\n",
"\n",
"# PC Boards\n",
"pc_forums = pd.DataFrame({'Post': pc_posts, 'Website': 'GameFaqs', 'Board': 'PC'})\n",
"\n",
"# Join all to post_titles\n",
"post_titles = pd.concat([post_titles, switch_forums, playstation_forums, xbox_forums, pc_forums], ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have all the posts we want and can display the final results"
]
},
{
"cell_type": "code",
"execution_count": 292,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Post Website Board\n",
"0 John @Totalbiscuit Bain July 8, 1984 - May 24,... Reddit r/games\n",
"1 Bungie Splits With Activision Reddit r/games\n",
"2 Totalbiscuit hospitalized, his cancer is sprea... Reddit r/games\n",
"3 [E3 2018] Cyberpunk 2077 Reddit r/games\n",
"4 Sony faces growing Fortnite backlash at E3 Reddit r/games\n",
"5 John “TotalBiscuit” Bain to be inducted into E... Reddit r/games\n",
"6 Later today, Red Dead 2 gets a new trailer. Be... Reddit r/games\n",
"7 List of Video Games where you can pet the dogs Reddit r/games\n",
"8 It's time video game makers unionize. Reddit r/games\n",
"9 Bethesda Support Leaks Fallout 76 Customer Nam... Reddit r/games\n",
"10 Ubisoft will now ban players for racist, homop... Reddit r/games\n",
"11 Fallout 76 – Official Teaser Trailer Reddit r/games\n",
"12 Nintendo of America’s Reggie Fils-Aime to Reti... Reddit r/games\n",
"13 Obsidian's The Outer Worlds blends Firefly and... Reddit r/games\n",
"14 Bethesda offering 500 atoms ($5 ingame store c... Reddit r/games\n",
"15 [E3 2018] The Elder Scrolls VI Reddit r/games\n",
"16 Giantbomb Unlikely to Review Fallout 76. Gerst... Reddit r/games\n",
"17 Report: The Walking Dead developer Telltale Ga... Reddit r/games\n",
"18 Introducing the Xbox Adaptive Controller Reddit r/games\n",
"19 Game dev: Linux users were only 0.1% of sales ... Reddit r/games\n",
"20 Sony's Stubborn Stance on Cross-Play Is Embarr... Reddit r/games\n",
"21 Metro dev: 'if at all all the PC players annou... Reddit r/games\n",
"22 Blizzard Says It Wasn't Expecting Fans To Be T... Reddit r/games\n",
"23 Black Ops 4 adds microtransactions, requiring ... Reddit r/games\n",
"24 This takes it to the next level, Are we really... Reddit r/games\n",
"25 EA Cancels Open-World Star Wars Game Reddit r/games\n",
"26 In the crazy economy of Red Dead Online, baked... Reddit r/games\n",
"27 PlayStation Skipping E3 For First Time in Show... Reddit r/games\n",
"28 Cyberpunk 2077 is a First-Person RPG Reddit r/games\n",
"29 Belgian government opens criminal investigatio... Reddit r/games\n",
"... ... ... ...\n",
"107471 Skyrim vs Kingdom Come Deliverance, which game... GameFaqs PC\n",
"107472 MH World PC port planned for autumn 2018 GameFaqs PC\n",
"107473 Best place to buy cheap Steam key? GameFaqs PC\n",
"107474 Looking to get my A+ certification. GameFaqs PC\n",
"107475 Gears of war 4 help plz GameFaqs PC\n",
"107476 WD Easystore 4TB External Drive $120 At Best Buy GameFaqs PC\n",
"107477 With Injustice 1 and MKXL being the highest se... GameFaqs PC\n",
"107478 Best sandbox building game? Preferably with mu... GameFaqs PC\n",
"107479 Switch to a 2.4ghz connection instead of 5ghz ... GameFaqs PC\n",
"107480 Over-Ear Headphones under $150? GameFaqs PC\n",
"107481 What are your opinions on possible FO3 remaster? GameFaqs PC\n",
"107482 Most enjoyable fighter on pc so far? GameFaqs PC\n",
"107483 Monitor recommendations? GameFaqs PC\n",
"107484 What E3 games will YOU be buying? GameFaqs PC\n",
"107485 Your five most played Steam games? GameFaqs PC\n",
"107486 New Fire Pro Wrestling game coming to Steam GameFaqs PC\n",
"107487 Oculus Go is trash right now. GameFaqs PC\n",
"107488 Need help getting Sonic Heroes to work GameFaqs PC\n",
"107489 Anyone ever bypass Rockstar Social s*** on Steam? GameFaqs PC\n",
"107490 So does Win10 still have that mandatory update... GameFaqs PC\n",
"107491 Windows 10 update broke my graphics driver. AM... GameFaqs PC\n",
"107492 There is a lot of fear mongering by net neutra... GameFaqs PC\n",
"107493 Fallout 76 rust clone? GameFaqs PC\n",
"107494 Please help me out here. GameFaqs PC\n",
"107495 Climbed from bronze 4 to gold 5 GameFaqs PC\n",
"107496 Fellow 2500K users, when are you upgrading/hav... GameFaqs PC\n",
"107497 About to grab the Sennheiser 598, couple last ... GameFaqs PC\n",
"107498 So is it possible to do full body tracking wit... GameFaqs PC\n",
"107499 Modular PSU GameFaqs PC\n",
"107500 Steam dropped Bitcoin payments GameFaqs PC\n",
"\n",
"[107501 rows x 3 columns]"
]
},
"execution_count": 292,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"post_titles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting Titles which Mention Large Game Companies\n",
"\n",
"First I have to make a list of relevant developers and their different nicknames."
]
},
{
"cell_type": "code",
"execution_count": 313,
"metadata": {},
"outputs": [],
"source": [
"# Each inner list is one developer: the first entry is the label used in the next step,\n",
"# and any further entries are nicknames or spellings (e.g. 'Activision Blizzard' with 'Activision' and 'Blizzard')\n",
"developers = [['Tencent'], ['Rockstar'], ['Valve'], ['Sony'], ['Microsoft'], ['Nintendo'], ['Bungie', 'Bunjie'],\n",
"              ['Activision Blizzard', 'Activision', 'Activi$ion', 'Blizzard'], ['Electronic Arts', 'EA'],\n",
"              ['Bandai Namco', 'Bandai', 'Namco'], ['Ubisoft'], ['Nexon'], ['Telltale'],\n",
"              ['Epic Games', 'Epic'], ['BioWare'], ['Naughty Dog'], ['Square Enix', 'Square'],\n",
"              ['Insomniac'], ['Bethesda'], ['Capcom'], ['Take-Two', 'Take Two', 'Take 2', 'Take2'],\n",
"              ['Sega'], ['Devolver Digital', 'Devolver'], ['Konami'], ['Apple']]"
]
},
{
"cell_type": "code",
"execution_count": 335,
"metadata": {},
"outputs": [],
"source": [
"import re"
]
},
{
"cell_type": "code",
"execution_count": 376,
"metadata": {},
"outputs": [],
"source": [
"dev_posts = pd.DataFrame(columns=['Post', 'Website', 'Board', 'Developer'])\n",
"index = 0\n",
"post_dict = {}\n",
"for i in range(len(post_titles)):\n",
"\n",
"    all_developers = []\n",
"    for dev in developers:\n",
"        for nickname in dev:\n",
"\n",
"            # Special case for EA. Common nickname but could also be mixed with common words like \"each\".\n",
"            match = False\n",
"            if nickname == 'EA':\n",
"                post_title = post_titles['Post'].loc[i]\n",
"\n",
"                # Regex to match EA outside of other words.\n",
"                # re.search scans the whole title; re.match would only test the start of it.\n",
"                if re.search(r'([^a-zA-Z]|^)EA([^a-zA-Z]|$)', post_title):\n",
"                    all_developers += [dev[0]]\n",
"                    match = True\n",
"\n",
"            else:\n",
"                post_title = post_titles['Post'].loc[i].lower()\n",
"\n",
"                if nickname.lower() in post_title:\n",
"                    all_developers += [dev[0]]\n",
"                    match = True\n",
"\n",
"            if match:\n",
"                # Record the title under the developer label, then stop checking its other nicknames\n",
"                if post_dict.get(dev[0]):\n",
"                    post_dict[dev[0]].append(post_titles['Post'].loc[i])\n",
"                else:\n",
"                    post_dict[dev[0]] = [post_titles['Post'].loc[i]]\n",
"                break\n",
"\n",
"    if all_developers:\n",
"        row = post_titles.loc[i].values.tolist() + [', '.join(all_developers)]\n",
"        dev_posts.loc[index] = row\n",
"        index += 1"
]
},
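{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plain substring test used for most nicknames also fires when a nickname sits inside a longer word (for example 'apple' inside 'pineapple'). The cell below is a minimal sketch of a stricter matcher, assuming it is acceptable to require that no letter touches a nickname on either side. The names `sample_developers`, `build_patterns` and `find_developers` are hypothetical and not part of this notebook, and the list is a shortened stand-in for `developers`. It does not help with standalone words such as the adjective 'epic'."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Hypothetical, shortened stand-in for the developers list; the first entry of each group is the label\n",
"sample_developers = [['Electronic Arts', 'EA'], ['Epic Games', 'Epic'], ['Apple']]\n",
"\n",
"def build_patterns(developers):\n",
"    compiled = []\n",
"    for aliases in developers:\n",
"        for alias in aliases:\n",
"            # All-caps nicknames such as 'EA' stay case-sensitive; everything else ignores case\n",
"            flags = 0 if alias.isupper() else re.IGNORECASE\n",
"            # The lookarounds reject a match when a letter sits directly before or after the nickname\n",
"            regex = '(?<![a-zA-Z])' + re.escape(alias) + '(?![a-zA-Z])'\n",
"            compiled.append((aliases[0], re.compile(regex, flags)))\n",
"    return compiled\n",
"\n",
"def find_developers(title, compiled):\n",
"    found = []\n",
"    for label, pattern in compiled:\n",
"        # Each developer label is reported once, in list order\n",
"        if pattern.search(title) and label not in found:\n",
"            found.append(label)\n",
"    return found\n",
"\n",
"patterns = build_patterns(sample_developers)\n",
"print(find_developers('Did EA ruin Bioware?', patterns))\n",
"print(find_developers('Each pineapple at the epicenter', patterns))"
]
},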
{
"cell_type": "code",
"execution_count": 337,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape, (8493, 4)\n"
]
},
{
"data": {
"text/plain": [
" Post Website Board \\\n",
"0 Bungie Splits With Activision Reddit r/games \n",
"1 Sony faces growing Fortnite backlash at E3 Reddit r/games \n",
"2 Later today, Red Dead 2 gets a new trailer. Be... Reddit r/games \n",
"3 Bethesda Support Leaks Fallout 76 Customer Nam... Reddit r/games \n",
"4 Ubisoft will now ban players for racist, homop... Reddit r/games \n",
"\n",
" Developer \n",
"0 Bungie, Activision Blizzard \n",
"1 Sony \n",
"2 Rockstar, Take-Two \n",
"3 Bethesda \n",
"4 Ubisoft "
]
},
"execution_count": 337,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Shape, {}'.format(dev_posts.shape))\n",
"dev_posts.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sentiment Analysis\n",
"\n",
"### Public Impressions of Developers\n",
"\n",
"Now we can finally analyze our data and find out how favorable public opinion is toward each of these developers.\n",
"\n",
"First we want to do a simple comparison based on sentiment. This will be a three-step process:\n",
"\n",
"1. Gather all titles associated with each developer\n",
"2. Perform sentiment analysis on each title\n",
"3. Calculate the mean of the results for each developer"
]
},
{
"cell_type": "code",
"execution_count": 340,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package vader_lexicon to\n",
"[nltk_data] /Users/adrianherrmann/nltk_data...\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 340,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import nltk\n",
"from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
"nltk.download('vader_lexicon')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We judge sentiment by the compound score, which is the sum of all lexicon ratings, normalized to fall between -1 and 1."
]
},
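{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the normalization concrete: as far as I understand the VADER implementation shipped with NLTK, the summed valence x is squashed with x / sqrt(x^2 + alpha), where alpha defaults to 15. The cell below is a sketch of that last step only. The helper name `normalize_compound` is hypothetical, and the rule-based adjustments VADER applies before summing are left out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def normalize_compound(valence_sum, alpha=15):\n",
"    # Squash an unbounded sum of word valences into the open interval (-1, 1)\n",
"    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)\n",
"\n",
"# Larger sums move toward -1 or 1 but never reach them\n",
"for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):\n",
"    print(raw, round(normalize_compound(raw), 4))"
]
},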
{
"cell_type": "code",
"execution_count": 359,
"metadata": {},
"outputs": [],
"source": [
"dev_sentiments = pd.DataFrame(columns=['Mean Sentiment', 'Developer',\n",
"                                       'Most Negative Sentence', 'Most Positive Sentence',\n",
"                                       'Most Negative Score', 'Most Positive Score',\n",
"                                       'Number of Posts'])\n",
"index = 0\n",
"sid = SentimentIntensityAnalyzer()\n",
"for dev in developers:\n",
"    titles = dev_posts[dev_posts['Developer'].str.contains(dev[0])]\n",
"\n",
"    if not titles.values.tolist():\n",
"        continue\n",
"\n",
"    tot_sentiment = 0\n",
"    most_neg_sent = ''\n",
"    most_pos_sent = ''\n",
"    most_neg_score = 1\n",
"    most_pos_score = -1\n",
"    for title in titles['Post'].values:\n",
"        sentiment = sid.polarity_scores(title)['compound']\n",
"        tot_sentiment += sentiment\n",
"\n",
"        if sentiment < most_neg_score:\n",
"            most_neg_score = sentiment\n",
"            most_neg_sent = title\n",
"\n",
"        if sentiment > most_pos_score:\n",
"            most_pos_score = sentiment\n",
"            most_pos_sent = title\n",
"\n",
"    mean_sentiment = tot_sentiment / len(titles)\n",
"\n",
"    dev_sentiments.loc[index] = [mean_sentiment, dev[0],\n",
"                                 most_neg_sent, most_pos_sent,\n",
"                                 most_neg_score, most_pos_score,\n",
"                                 len(titles)]\n",
"    index += 1"
]
},
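{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-developer bookkeeping above (running total, most negative and most positive title) can also be sketched with built-ins. This is only an illustration under stated assumptions: `summarize` is a hypothetical helper, and `toy_scores` holds made-up numbers standing in for `sid.polarity_scores(title)['compound']` so that the cell runs without the VADER lexicon."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from statistics import mean\n",
"\n",
"def summarize(titles, score):\n",
"    # Score every title once, then reuse the (score, title) pairs\n",
"    scored = [(score(title), title) for title in titles]\n",
"    # min and max return the first extreme they meet, like the strict comparisons in the loop above\n",
"    neg_score, neg_title = min(scored, key=lambda pair: pair[0])\n",
"    pos_score, pos_title = max(scored, key=lambda pair: pair[0])\n",
"    return {\n",
"        'Mean Sentiment': mean(value for value, _ in scored),\n",
"        'Most Negative Sentence': neg_title,\n",
"        'Most Positive Sentence': pos_title,\n",
"        'Most Negative Score': neg_score,\n",
"        'Most Positive Score': pos_score,\n",
"        'Number of Posts': len(scored),\n",
"    }\n",
"\n",
"# Made-up scores, used here instead of a real sentiment analyzer\n",
"toy_scores = {'great game': 0.6, 'bad port': -0.5, 'new trailer': 0.0}\n",
"print(summarize(list(toy_scores), toy_scores.get))"
]
},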
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that the sentiments are analyzed we can view the important details.\n",
"\n",
"#### NOTE:\n",
"Some of these posts will inevitably be rated incorrectly. For example, if a game's name contains a negative word and it appears in the same title as a developer (think Resident Evil), the title's score will drag that developer's rating down. For now, across the whole dataset, these errors should largely average out so that the scores lean toward how each developer is actually perceived, provided the sample size is large enough.\n",
"\n",
"When analyzing a single company it is worth digging deeper with more specific filtering and sentiment analysis, which I will do a little of below. The post_dict created earlier will help."
]
},
{
"cell_type": "code",
"execution_count": 361,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Mean Sentiment Developer \\\n",
"0 0.138989 Tencent \n",
"1 -0.043847 Rockstar \n",
"2 -0.018581 Valve \n",
"3 0.031590 Sony \n",
"4 0.054865 Microsoft \n",
"5 0.072509 Nintendo \n",
"6 0.009765 Bungie \n",
"7 0.006517 Activision Blizzard \n",
"8 -0.105926 Electronic Arts \n",
"9 0.038390 Bandai Namco \n",
"10 0.057659 Ubisoft \n",
"11 0.000000 Nexon \n",
"12 -0.043905 Telltale \n",
"13 0.028201 Epic Games \n",
"14 -0.061895 BioWare \n",
"15 0.051743 Naughty Dog \n",
"16 0.037446 Square Enix \n",
"17 0.064921 Insomniac \n",
"18 0.034053 Bethesda \n",
"19 0.068994 Capcom \n",
"20 -0.113245 Take-Two \n",
"21 0.014805 Sega \n",
"22 0.016383 Devolver Digital \n",
"23 0.020019 Konami \n",
"24 0.036826 Apple \n",
"\n",
" Most Negative Sentence \\\n",
"0 Does anyone actually play the crappy F2P games... \n",
"1 Rockstar Lies & Red Dead Online Economy Is A G... \n",
"2 Dead before it even released? A valve game?? A... \n",
"3 Sony's Devil May Cry has arrived. Lost Souls A... \n",
"4 NO! BAD MICROSOFT! I'm so ashamed of you! \n",
"5 Resident Evil, Resident Evil 0, and Resident E... \n",
"6 Activision currently under investigation for f... \n",
"7 Heroes of the Storm pros vent sadness, anger a... \n",
"8 EA Head Fired For Gross Misconduct \n",
"9 WTF were Namco Bandai thinking? \n",
"10 Ubisoft will now ban players for racist, homop... \n",
"11 What Nexon games use NX? \n",
"12 No wonder Telltale Games died a slow painful d... \n",
"13 God of War Has An Epic Avengers Infinity War R... \n",
"14 Did EA ruin Bioware or did Bioware ruin itself? \n",
"15 Naughty Dog's lead animator explains in-depth ... \n",
"16 kingdom hearts 3 deluxe edition....what the HE... \n",
"17 Santa Monica, Gorilla, Insomniac, Sucker Punch... \n",
"18 Bethesda worst dev/pub of all time,nothing was... \n",
"19 As a Devil My Cry fan, I'm jealous of the way ... \n",
"20 Take two/Rockstar will be VERY hard to stop wi... \n",
"21 Damn sega!!!! Killing it this week!! \n",
"22 Not a Hero hitting Switch Aug 2nd. 12 more Dev... \n",
"23 Death Stranding will prove that Konami was rig... \n",
"24 Went shopping for apple products, it's a horri... \n",
"\n",
" Most Positive Sentence Most Negative Score \\\n",
"0 Should superior Chinese companies like Tencent... -0.2960 \n",
"1 Ubisoft is a BETTER company than Rockstar! LOL... -0.8625 \n",
"2 Artifact is so good, Kotaku writer wants to re... -0.7041 \n",
"3 Sony wins best Float at PRIDE 2018 -0.8689 \n",
"4 Amazing show Microsoft!! My brother even said ... -0.9191 \n",
"5 Discovered a Nintendo office close to where I ... -0.9349 \n",
"6 LMAO Anthem is the exact same hustle Bungie us... -0.5859 \n",
"7 Thank you Activision for CoD Black Ops 4 Black... -0.7783 \n",
"8 EA are an excellent company that provides chea... -0.7717 \n",
"9 Bandai Namco proves to be the best third party... -0.6739 \n",
"10 Ubisoft is a BETTER company than Rockstar! LOL... -0.8225 \n",
"11 What Nexon games use NX? 0.0000 \n",
"12 Would Telltale Be The Best Developer If They H... -0.9118 \n",
"13 The Epic Games store is now live - giving away... -0.7717 \n",
"14 Former Bioware legend Mike Laidlaw praises wit... -0.8225 \n",
"15 I for one am proud Naughty Dog is displaying E... -0.6696 \n",
"16 Did Square Enix ever state why they skipped ou... -0.8793 \n",
"17 Ratchet and Clank PS4 is Insomniacs BEST SELLI... -0.5267 \n",
"18 Call me crazy, but Fallout 4's the best Bethes... -0.8271 \n",
"19 Operation Make DMC Great again is a success! T... -0.9274 \n",
"20 Take2 CEO on Epic Store: Competition is a good... -0.6756 \n",
"21 Virtua Fighter 5 Final Showdown is free with G... -0.8507 \n",
"22 So who is Devolver Digital and should I care a... -0.4449 \n",
"23 What is the best Konami game you've ever played? -0.7430 \n",
"24 Apple Finally Caves, promises to support Steam... -0.5423 \n",
"\n",
" Most Positive Score Number of Posts \n",
"0 0.8176 9 \n",
"1 0.8419 167 \n",
"2 0.8147 94 \n",
"3 0.9008 1627 \n",
"4 0.8798 708 \n",
"5 0.9273 3969 \n",
"6 0.6841 37 \n",
"7 0.8555 183 \n",
"8 0.8316 130 \n",
"9 0.7845 73 \n",
"10 0.8419 201 \n",
"11 0.0000 1 \n",
"12 0.7964 109 \n",
"13 0.8625 189 \n",
"14 0.8122 60 \n",
"15 0.7712 72 \n",
"16 0.8788 286 \n",
"17 0.7125 33 \n",
"18 0.8481 247 \n",
"19 0.8999 386 \n",
"20 0.7003 20 \n",
"21 0.8932 220 \n",
"22 0.4939 12 \n",
"23 0.7650 134 \n",
"24 0.6486 31 "
]
},
"execution_count": 361,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dev_sentiments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### EA and Nintendo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take a deeper look at two companies at opposite ends of the spectrum, Electronic Arts and Nintendo. They have the second worst and second best mean scores respectively, but unlike the developers with the worst and best scores (Take-Two with 20 posts and Tencent with 9 posts), they also have plenty of posts to work with."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Nintendo:"
]
},
{
"cell_type": "code",
"execution_count": 364,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Mean Sentiment Developer \\\n",
"5 0.072509 Nintendo \n",
"\n",
" Most Negative Sentence \\\n",
"5 Resident Evil, Resident Evil 0, and Resident E... \n",
"\n",
" Most Positive Sentence Most Negative Score \\\n",
"5 Discovered a Nintendo office close to where I ... -0.9349 \n",
"\n",
" Most Positive Score Number of Posts \n",
"5 0.9273 3969 "
]
},
"execution_count": 364,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']"
]
},
{
"cell_type": "code",
"execution_count": 367,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Nintendo's most negative sentence:\n",
"Resident Evil, Resident Evil 0, and Resident Evil 4 coming to Nintendo Switch in 2019\n",
"\n",
"Nintendo's most positive sentence:\n",
"Discovered a Nintendo office close to where I live and asked if they had any kind of tour or something. Lady told me they hadn’t but she handed me a bag full of cool souvenirs. This coin is definitely the best of all!\n",
"\n"
]
}
],
"source": [
"print('Nintendo\\'s most negative sentence:\\n' +\n",
" dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Negative Sentence'].values[0] + '\\n')\n",
"print('Nintendo\\'s most positive sentence:\\n' +\n",
" dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Positive Sentence'].values[0] + '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nintendo's worst post, which is in fact the most negatively rated title across all developers, is rated that way only because it mentions the game \"Resident Evil\" several times. If anything, this supports their high overall score: even their lowest-rated title is not actually negative.\n",
"\n",
"Nintendo being so well liked comes as no surprise. They arguably have the most devoted following of any modern gaming company. Many people grew up on Nintendo as children and continue to play their games as adults, and some stick strictly to Nintendo.\n",
"\n",
"Let's get the word frequencies from Nintendo posts."
]
},
{
"cell_type": "code",
"execution_count": 380,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter"
]
},
{
"cell_type": "code",
"execution_count": 414,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Most common neutral words in negative titles: [('nintendo', 187), ('the', 71), ('to', 52), ('is', 46), ('switch', 35), ('a', 32), ('of', 29), ('and', 26), ('for', 26), ('why', 26), ('in', 19), (\"nintendo's\", 19), ('on', 19), ('online', 19), ('you', 19), ('it', 16), ('has', 15), ('have', 14), ('do', 14), ('that', 13), ('so', 13), ('i', 12), ('are', 12), ('what', 11), ('-', 10), ('up', 10), ('games', 10), ('with', 10), ('this', 10), ('will', 9), ('sony', 9), ('if', 9), ('console', 9), ('does', 8), ('e3', 8), ('be', 8), ('an', 7), ('about', 7), ('at', 7), ('get', 7), ('nintendo?', 7), ('game', 7), ('think', 6), ('was', 6), ('or', 6), ('would', 6), ('not', 6), ('how', 6), ('most', 6), ('did', 6)]\n"
]
}
],
"source": [
"neg_titles = []\n",
"for title in post_dict['Nintendo']:\n",
"    sentiment = sid.polarity_scores(title)['compound']\n",
"\n",
"    if sentiment <= -0.5:\n",
"        neg_titles.append(title)\n",
"\n",
"\n",
"nintendo_words = ' '.join(neg_titles).split(' ')\n",
"\n",
"neu_words = []\n",
"\n",
"for word in nintendo_words:\n",
"    sentiment = sid.polarity_scores(word)['compound']\n",
"\n",
"    if -0.4 <= sentiment <= 0.4:\n",
"        neu_words.append(word.lower())\n",
"\n",
"neu_freq = Counter(neu_words)\n",
"\n",
"print('Most common neutral words in negative titles: ', neu_freq.most_common(50))"
]
},
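{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list above is dominated by filler words, and punctuation splits one word into several entries ('nintendo' and 'nintendo?'). The cell below is a rough sketch of one way to tidy the counts, assuming a small hand-written stop-word set is good enough. `STOPWORDS` and `keyword_counts` are hypothetical names, and the two sample titles are taken from the negative Nintendo titles in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import string\n",
"from collections import Counter\n",
"\n",
"# Hypothetical, deliberately tiny stop-word set; a real analysis would need a fuller list\n",
"STOPWORDS = {'the', 'to', 'is', 'a', 'of', 'and', 'for', 'in', 'on', 'it', 'so', 'was', 'what', 'does'}\n",
"\n",
"def keyword_counts(titles, stopwords=STOPWORDS):\n",
"    counts = Counter()\n",
"    for title in titles:\n",
"        for token in title.lower().split():\n",
"            # Strip punctuation from both ends so 'include?' and 'include' count as one word\n",
"            word = token.strip(string.punctuation)\n",
"            if word and word not in stopwords:\n",
"                counts[word] += 1\n",
"    return counts\n",
"\n",
"sample_titles = ['So nintendo online was a scam',\n",
"                 'What the hell does Nintendo online even include?']\n",
"print(keyword_counts(sample_titles).most_common(3))"
]
},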
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Going down the list we mostly see filler words, but one stands out as a word that should not be this frequent:\n",
"\n",
"'Online'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are ten of the negative-sentiment titles that contain 'online'. The counter above found 19 occurrences of the word in total."
]
},
{
"cell_type": "code",
"execution_count": 416,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jim Sterling: The Online system makes nintendo look weak and stupid\n",
"Nintendo Switch Paid Online Still a Disaster? - Nintendo Direct Review\n",
"So nintendo online was a scam\n",
"Everytime I finish a mission in Resident Evil a Nintendo Online message appears\n",
"Would it of killed Nintendo to add promotion SNES titles to new Online Subs?\n",
"Nintendo's paid online is bad. FACT.\n",
"scumbag nintendo wont let me try the darksouls demo without online.\n",
"Will Nintendo Switch Online kill multiplayer lobbies?\n",
"Nintendo would be dumb to not have an online paywall TBH.\n",
"What the hell does Nintendo online even include?\n"
]
}
],
"source": [
"# Print up to ten negative titles that mention 'online'; the for loop cannot run past the end of the list\n",
"shown = 0\n",
"for title in neg_titles:\n",
"    if 'online' in title.lower():\n",
"        print(title)\n",
"        shown += 1\n",
"        if shown == 10:\n",
"            break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From this it is clear that one major complaint about Nintendo is the online system they have in place. If there were one thing they could do to please their base, it would be to address the paywall and offer more with their online subscription, which users find lackluster. We can work all of this out from the post titles alone."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Electronic Arts:"
]
},
{
"cell_type": "code",
"execution_count": 363,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Mean Sentiment Developer Most Negative Sentence \\\n",
"8 -0.105926 Electronic Arts EA Head Fired For Gross Misconduct \n",
"\n",
" Most Positive Sentence Most Negative Score \\\n",
"8 EA are an excellent company that provides chea... -0.7717 \n",
"\n",
" Most Positive Score Number of Posts \n",
"8 0.8316 130 "
]
},
"execution_count": 363,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']"
]
},
{
"cell_type": "code",
"execution_count": 369,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Electronic Arts's most negative sentence:\n",
"EA Head Fired For Gross Misconduct\n",
"\n",
"Electronic Arts's most positive sentence:\n",
"EA are an excellent company that provides cheap access to a lot of great games\n",
"\n"
]
}
],
"source": [
"print('Electronic Arts\\'s most negative sentence:\\n' +\n",
" dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Negative Sentence'].values[0] + '\\n')\n",
"print('Electronic Arts\\'s most positive sentence:\\n' +\n",
" dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Positive Sentence'].values[0] + '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike Nintendo, EA is strongly disliked by many people online. In fact, they posted the most downvoted comment in Reddit history, which is a testament to how negatively they are seen. Still, they continue to be quite successful. Apex Legends, a game they recently released, seems to be gaining popularity rapidly. Public opinion on the way they monetize their games seems to be changing, which may be a sign that people will once again have a positive attitude towards the company.\n",
"\n",
"As with Nintendo, I am going to check the word frequencies in EA's negative titles."
]
},
{
"cell_type": "code",
"execution_count": 418,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Most common neutral words in negative titles: [('ea', 21), ('for', 9), ('star', 6), ('game', 6), ('is', 6), ('of', 4), ('open', 4), ('world', 4), ('hiring', 3), ('under', 3), ('investigation', 3), ('to', 3), ('the', 3), ('cancels', 2), ('open-world', 2), ('an', 2), ('anthem', 2), ('video', 2), ('removed', 2), ('another', 2), ('sell', 2), ('lootboxes', 2), ('in', 2), ('belgium', 2), ('games', 2), ('are', 2), ('stocks', 2), ('by', 2), ('bfv', 2), ('sales', 2), ('people', 2), ('ea:', 1), ('youtube', 1), (\"creator's\", 1), ('disclosure', 1), ('not', 1), ('content', 1), ('head', 1), ('misconduct', 1), ('zelda', 1), ('botw', 1), ('mass', 1), (\"effect's\", 1), ('franchise', 1), ('continuing', 1), ('automatically', 1), ('loses.', 1), ('conference.', 1), ('and', 1), ('got', 1), ('downgraded', 1), ('says', 1), ('singleplayer', 1), ('god', 1), ('goty.', 1), ('massive', 1), ('blow', 1), ('&', 1), ('plummet!', 1), ('should', 1), ('space', 1), ('ip', 1), ('falling', 1), ('apart.', 1), ('we', 1), ('plummet', 1), ('$21', 1), ('billion.', 1), ('blame', 1), (\"ea's\", 1), ('cancelled', 1), ('vancouver', 1), ('execs', 1), ('dump', 1), ('millions', 1)]\n"
]
}
],
"source": [
"neg_titles = []\n",
"for title in post_dict['Electronic Arts']:\n",
"    sentiment = sid.polarity_scores(title)['compound']\n",
"\n",
"    if sentiment <= -0.5:\n",
"        neg_titles.append(title)\n",
"\n",
"\n",
"ea_words = ' '.join(neg_titles).split(' ')\n",
"\n",
"neu_words = []\n",
"\n",
"for word in ea_words:\n",
"    sentiment = sid.polarity_scores(word)['compound']\n",
"\n",
"    if -0.4 <= sentiment <= 0.4:\n",
"        neu_words.append(word.lower())\n",
"\n",
"neu_freq = Counter(neu_words)\n",
"\n",
"print('Most common neutral words in negative titles: ', neu_freq.most_common(75))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because EA has a smaller sample size we should look at multiple words to get some more intuition."
]
},
{
"cell_type": "code",
"execution_count": 421,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"EA Cancels Open-World Star Wars Game\n",
"EA Is Hiring For An Open-World Star Wars Game\n",
"EA: YouTube creator's Anthem video removed for disclosure failure, not content\n",
"EA is under criminal investigation for continuing to sell Lootboxes in Belgium\n",
"EA Automatically loses. Horrible Conference. And Anthem got DOWNGRADED\n",
"EA's Open World Star Wars Game Cancelled\n",
"EA Vancouver hiring for open world Star Wars game\n",
"EA is hiring people for an open world star wars game...\n",
"EA cancels open world Star Wars game\n",
"EA is under criminal investigation by the Belgium government for FIFA lootboxes\n"
]
}
],
"source": [
"keywords = ('star', 'anthem', 'lootboxes')\n",
"shown = 0\n",
"for title in neg_titles:\n",
"    # Keep titles that mention at least one of the keywords\n",
"    if any(keyword in title.lower() for keyword in keywords):\n",
"        print(title)\n",
"        shown += 1\n",
"        if shown == 10:\n",
"            break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most critical complaints concern Anthem and lootboxes, while the negative sentiment around Star Wars reads more like disappointment that a game was cancelled, given how many posts cover the topic. Again, the word counts are small because EA has relatively few posts, but we can still extract a good amount of information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Final Remarks\n",
"\n",
"There is plenty more that can be done, such as collecting data from the comments. That would provide much more input and surface even more opinionated text, giving a better picture of how people feel about different companies. This is at least a taste of what can be done."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}