{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Impressions on Video Game Developers from Online Forums\n", "\n", "## Reasoning:\n", "Web scraping gives access to the largest bank of data available: almost any website can become a data source. Its uses range from analyzing competitors to learning more about a user base.\n", "\n", "It would make sense to scrape the comments of each post too, but that would take far too long. GameFaqs is at least very casual, so posts there should be about as opinionated as comments.\n", "\n", "\n", "## Objective:\n", "The slightly long-winded title explains most of what this notebook is about. My goal is to scrape post titles from Reddit and GameFaqs and perform sentiment analysis on them, on top of some data exploration.\n", "\n", "\n", "## Methods:\n", "I will use Selenium to scroll through all Reddit posts and to automate clicking buttons. BeautifulSoup will be used to parse the pages and retrieve the actual data.\n", "\n", "I will need to scrape a list of current common user agents and lists of free, recent proxies to rotate through for GameFaqs. GameFaqs uses pages rather than infinite scrolling, so too many HTTP requests from one client can result in a ban. To play it safe and keep the scrape uninterrupted, a few measures will be taken. 
These IPs and user agents could be used for scraping any other website as well.\n", "\n", "I will obtain all posts from the past year and filter out the ones that don't mention a developer.\n", "\n", "## Featured:\n", "- Webscraping with BeautifulSoup and Selenium\n", "- Advanced knowledge on how to rotate proxies and user agents\n", "- Working with Pandas DataFrames\n", "- Data analysis and visualization\n" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import random\n", "import time\n", "import pandas as pd\n", "from bs4 import BeautifulSoup as soup\n", "from selenium import webdriver\n", "from webdriver_manager.chrome import ChromeDriverManager" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scraping Reddit\n", "\n", "
\n", "Here I create a function which uses a web driver to simulate scrolling and load in all results of the page (Reddit limits this to around 1000 posts). If I wanted to access the entire archive, there are websites which store all of Reddit's data. However, since I am also getting plenty of data from other forums, there is not much need to scrape the archives for the purpose of this notebook.\n", "\n", "After the entire page is loaded, we can scrape the text of each post title." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [
"def reddit_scraper(url):\n",
"    '''\n",
"    Webscrapes all reddit posts from the given link by scrolling through the \"infinite scrolling\"\n",
"    \n",
"    Args:\n",
"        url: The url of the subreddit or other reddit page you'd like to scrape\n",
"    \n",
"    Returns:\n",
"        A list of all post titles on that page\n",
"    '''\n",
"    \n",
"    driver = webdriver.Chrome(ChromeDriverManager().install())\n",
"\n",
"    driver.get(url)\n",
"\n",
"    # Scroll to the bottom repeatedly so the infinite-scrolling page keeps loading posts\n",
"    for n in range(600): \n",
"        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')\n",
"\n",
"        time.sleep(0.5)\n",
"\n",
"    \n",
"    page_html = soup(driver.page_source, 'lxml')\n",
"\n",
"    driver.close()\n",
"    \n",
"    # Each post body is an <a data-click-id=\"body\"> element wrapping an <h2> title\n",
"    containers = page_html.findAll(\"a\", {'data-click-id' : 'body'})\n",
"\n",
"    post_titles = []\n",
"    for container in containers:\n",
"\n",
"        titles = container.find_all(\"h2\", recursive=True)\n",
"\n",
"        for title_tag in titles:\n",
"            post_titles.append(title_tag.text)\n",
"\n",
"    \n",
"    return post_titles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scraping r/games" ] }, { "cell_type": "code", "execution_count": 280, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Checking for mac64 chromedriver:2.46 in cache\n", "Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n" ] } ], "source": [ "r_games_url = 'https://www.reddit.com/r/games/top/?t=year'\n", 
"r_games_posts = reddit_scraper(r_games_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scraping r/gaming" ] }, { "cell_type": "code", "execution_count": 281, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Checking for mac64 chromedriver:2.46 in cache\n", "Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n" ] } ], "source": [ "r_gaming_url = 'https://www.reddit.com/r/gaming/top/?t=year'\n", "r_gaming_posts = reddit_scraper(r_gaming_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scraping r/truegaming" ] }, { "cell_type": "code", "execution_count": 282, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Checking for mac64 chromedriver:2.46 in cache\n", "Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n" ] } ], "source": [ "r_truegaming_url = 'https://www.reddit.com/r/truegaming/top/?t=year'\n", "r_truegaming_posts = reddit_scraper(r_truegaming_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's combine all the results from each subreddit" ] }, { "cell_type": "code", "execution_count": 289, "metadata": {}, "outputs": [], "source": [ "# r/games\n", "r_games_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/games']], columns=['Post', 'Website', 'Board']) \n", " for title in r_games_posts], \n", " ignore_index=True)\n", "\n", "# r/gaming\n", "r_gaming_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/gaming']], columns=['Post', 'Website', 'Board']) \n", " for title in r_gaming_posts], \n", " ignore_index=True)\n", "\n", "# r/truegaming\n", "r_truegaming_forums = pd.concat([pd.DataFrame([[title, 'Reddit', 'r/truegaming']], columns=['Post', 'Website', 'Board']) \n", " for title in r_truegaming_posts], \n", " ignore_index=True)\n", "\n", "# Join all to post_titles\n", "post_titles = pd.DataFrame(columns=['Post', 'Website', 'Board'])\n", "post_titles 
= pd.concat([r_games_forums, r_gaming_forums, r_truegaming_forums], ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Progress Report\n", "Let's check out the DataFrame so far" ] }, { "cell_type": "code", "execution_count": 290, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape: (3007, 3)\n" ] }, { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Post</th><th>Website</th><th>Board</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>John @Totalbiscuit Bain July 8, 1984 - May 24,...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>1</th><td>Bungie Splits With Activision</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>2</th><td>Totalbiscuit hospitalized, his cancer is sprea...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>3</th><td>[E3 2018] Cyberpunk 2077</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>4</th><td>Sony faces growing Fortnite backlash at E3</td><td>Reddit</td><td>r/games</td></tr>\n",
"  </tbody>\n",
"</table>\n"
 ], "text/plain": [ " Post Website Board\n", "0 John @Totalbiscuit Bain July 8, 1984 - May 24,... Reddit r/games\n", "1 Bungie Splits With Activision Reddit r/games\n", "2 Totalbiscuit hospitalized, his cancer is sprea... Reddit r/games\n", "3 [E3 2018] Cyberpunk 2077 Reddit r/games\n", "4 Sony faces growing Fortnite backlash at E3 Reddit r/games" ] }, "execution_count": 290, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Shape: {}'.format(post_titles.shape))\n", "post_titles.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scraping GameFaqs\n", "\n", "GameFaqs will be a different challenge. Rather than infinite scrolling, this website uses pages. This means many HTTP requests will need to be made, so in order to avoid an IP ban and not strain their servers, a few things must be done:\n", "\n", "1. Rotate user agents\n", "2. Rotate IP addresses\n", "3. Sleep on each request" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Obtaining User Agents to Rotate Through\n", "\n", "To accomplish this we make use of whatismybrowser.com's list of common current user agents, scraping the page each time so the list stays up to date." ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [], "source": [
"def get_agents(browser, num_agents=10, offset=0):\n",
"    '''\n",
"    Webscrapes whatismybrowser.com for new user agents\n",
"    \n",
"    Args:\n",
"        browser: the browser you want the agent from\n",
"        num_agents: number of agents to return\n",
"        offset: get agents starting from offset num. on page\n",
"    \n",
"    Returns:\n",
"        A list of user agents from the given browser\n",
"    '''\n",
"    \n",
"    # The code assumes at most 50 agents per page\n",
"    if offset + num_agents > 50:\n",
"        return []\n",
"    \n",
"    try:\n",
"        chrome_url = requests.get('https://developers.whatismybrowser.com/useragents/explore/software_name/' \\\n",
"                                  + browser)\n",
"    except requests.exceptions.RequestException:\n",
"        print('Browser does not exist. Try lower case')\n",
"        # Return an empty list so callers can still extend() with the result\n",
"        return []\n",
"    \n",
"\n",
"    chrome_html = soup(chrome_url.content, 'lxml')\n",
"\n",
"    chrome_containers = chrome_html.findAll('td', {'class' : 'useragent'})\n",
"\n",
"    user_agents = []\n",
"    for i in range(num_agents):\n",
"\n",
"        chrome_agent = chrome_containers[i + offset].a.text\n",
"\n",
"        user_agents.append(chrome_agent)\n",
"    \n",
"    return user_agents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get 10 user agents for chrome and 10 for firefox" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [], "source": [ "user_agents = []\n", "user_agents.extend(get_agents('chrome'))\n", "user_agents.extend(get_agents('firefox'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Obtaining IP Addresses to Rotate Through\n", "\n", "We need to create a similar function for retrieving new proxies. It is more important to call this function frequently, since free proxies go stale quickly." ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [
"def get_ips(num_addresses=20):\n",
"    '''\n",
"    Webscrapes free-proxy-list.net for new free proxies. This is important because these proxies\n",
"    could go bad after just a couple hours.\n",
"    \n",
"    Args:\n",
"        num_addresses: The number of IPs you want returned. If fewer than requested are available,\n",
"                       return the available amount\n",
"    \n",
"    Returns:\n",
"        A list of new proxies\n",
"    '''\n",
"    \n",
"    driver = webdriver.Chrome(ChromeDriverManager().install())\n",
"    driver.get('https://free-proxy-list.net/')\n",
"    \n",
"    page_html = soup(driver.page_source, 'lxml')\n",
"    containers = page_html.findAll('tr', {'role' : 'row'})\n",
"    \n",
"    ips = []\n",
"    ip_num = 0\n",
"    page_num = 1\n",
"    next_set_btn = driver.find_element_by_xpath('//*[@id=\"proxylisttable_next\"]/a')\n",
"    while len(ips) < num_addresses:\n",
"        \n",
"        ip_num += 1\n",
"        \n",
"        # Click next button to get more ips if the current page doesn't have enough\n",
"        if ((ip_num % 20) - 1 == 0) and ip_num != 1:\n",
"            \n",
"            # If reached the last page, return what we have\n",
"            if page_num >= 15:\n",
"                driver.close()\n",
"                return ips\n",
"            \n",
"            next_set_btn.click()\n",
"            next_set_btn = driver.find_element_by_xpath('//*[@id=\"proxylisttable_next\"]/a')\n",
"            \n",
"            ip_num = 1\n",
"            page_html = soup(driver.page_source, 'lxml')\n",
"            containers = page_html.findAll('tr', {'role' : 'row'})\n",
"            \n",
"            page_num += 1\n",
"        \n",
"        row = containers[ip_num].find_all('td')\n",
"        \n",
"        ip = row[0].text\n",
"        port = row[1].text\n",
"        \n",
"        # Column 6 flags HTTPS support; keep only proxies that have it\n",
"        if row[6].text == 'yes':\n",
"            ips.append(':'.join([ip, port]))\n",
"\n",
"    driver.close()\n",
"    \n",
"    return ips" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Checking for mac64 chromedriver:2.46 in cache\n", "Driver found in /Users/adrianherrmann/.wdm/chromedriver/2.46/mac64/chromedriver\n" ] } ], "source": [ "ips = get_ips(20)" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['51.68.112.254:3128',\n", " '45.32.42.234:8080',\n", " '178.128.54.73:8080',\n", " '104.248.16.45:8080',\n", " '177.38.66.255:45235',\n", " '95.47.180.171:53484',\n", " '138.186.23.9:40340',\n", 
'182.160.119.254:56229',\n", " '103.250.157.43:38641',\n", " '115.127.39.66:55474',\n", " '88.210.71.234:46626',\n", " '177.94.206.67:60666',\n", " '1.10.186.157:55129',\n", " '176.197.103.210:53281',\n", " '109.201.97.235:39125',\n", " '31.43.143.15:8181',\n", " '193.213.89.72:51024',\n", " '183.82.118.87:8080',\n", " '41.84.131.78:53281',\n", " '93.77.78.123:42803']" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print out the proxies to see what they look like\n", "ips" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. GameFaqs Scraping Function with Pauses" ] }, { "cell_type": "code", "execution_count": 249, "metadata": {}, "outputs": [], "source": [
"def gamefaqs_scraper(url, num_pages, ips, user_agents, offset=0, start_page=0):\n",
"    '''\n",
"    Scrape GameFaqs forums for post titles\n",
"    \n",
"    Args:\n",
"        url: The url to the first page of GameFaqs\n",
"        num_pages: Number of pages to scrape\n",
"        ips: List of 'ip:port' proxies to rotate through\n",
"        user_agents: List of user agent strings to rotate through\n",
"        offset: Currently unused\n",
"        start_page: Zero-based page index to start scraping from\n",
"    \n",
"    Returns:\n",
"        A list of post titles\n",
"    '''\n",
"    \n",
"    # Pair every proxy with a randomly chosen user agent\n",
"    rot_list = []\n",
"    for ip in ips:\n",
"        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"    \n",
"    \n",
"    # Keep retrying the first page until a proxy/agent pair returns topics\n",
"    req = ''\n",
"    i = 0\n",
"    while req == '':\n",
"        try:\n",
"            agent_proxy_pair = random.choice(rot_list)\n",
"            proxy = agent_proxy_pair[0]\n",
"            headers = agent_proxy_pair[1]\n",
"            \n",
"            if start_page == 0:\n",
"                req = requests.get(url, headers=headers, proxies=proxy, timeout=10)\n",
"            else:\n",
"                req = requests.get(url + '?page=' + str(start_page+1), headers=headers, proxies=proxy, timeout=10)\n",
"            \n",
"            print('Success with IP, ' + proxy['https'])\n",
"            \n",
"            page_html = soup(req.content, 'lxml')\n",
"\n",
"            containers = page_html.findAll('td', {'class' : 'topic'})\n",
"            \n",
"            if not containers:\n",
"                print('Agent may be banned, removing agent and trying a new one...')\n",
"                print(page_html, headers['User-Agent'])\n",
"                try:\n",
"                    user_agents.remove(headers['User-Agent']) \n",
"                except:\n",
"                    pass\n",
"                \n",
"                req = ''\n",
"\n",
"        except Exception as e:\n",
"            i += 1\n",
"            print('Error with IP, ' + proxy['https'] + ' requesting a new one...')\n",
"            \n",
"            # After 20 failures, fetch a fresh batch of proxies\n",
"            if i % 20 == 0:\n",
"                ips = get_ips(20)\n",
"                \n",
"                rot_list = []\n",
"                for ip in ips:\n",
"                    rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"    \n",
"    \n",
"    post_titles = []\n",
"    for page in range(start_page, num_pages):\n",
"        \n",
"        # Collect the titles from the page fetched on the previous iteration\n",
"        for container in containers:\n",
"            title = container.a.text\n",
"            \n",
"            post_titles.append(title)\n",
"        \n",
"        \n",
"        # Pause between pages to avoid straining the server\n",
"        time.sleep(3)\n",
"        \n",
"        req = ''\n",
"        i = 0\n",
"        while req == '':\n",
"            try: \n",
"                agent_proxy_pair = random.choice(rot_list)\n",
"                proxy = agent_proxy_pair[0]\n",
"                headers = agent_proxy_pair[1]\n",
"\n",
"                req = requests.get(url + '?page=' + str(page + 1), headers=headers, proxies=proxy, timeout=10)\n",
"                \n",
"                page_html = soup(req.content, 'lxml')\n",
"\n",
"                containers = page_html.findAll('td', {'class' : 'topic'})\n",
"\n",
"                if not containers:\n",
"                    print('Agent may be banned, removing agent and trying a new one...')\n",
"                    \n",
"                    try:\n",
"                        rot_list.remove(agent_proxy_pair)\n",
"                        if not rot_list:\n",
"                            print('Loading in new IPs...')\n",
"                            ips = get_ips(20)\n",
"\n",
"                            rot_list = []\n",
"                            for ip in ips:\n",
"                                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"\n",
"                        \n",
"                        user_agents.remove(headers['User-Agent']) \n",
"                    except:\n",
"                        pass\n",
"                    \n",
"                    req = ''\n",
"                    time.sleep(2)\n",
"                    \n",
"                    if len(user_agents) == 0:\n",
"                        print('No more agents, ended at page,', page+1)\n",
"                        return post_titles\n",
"                else:\n",
"                    print('Success with IP ' + proxy['https'] + ', now onto page ', page + 2)\n",
"                \n",
"            except Exception as e:\n",
"                i += 1\n",
"                time.sleep(2)\n",
"                print('Error with IP ' + proxy['https'] + ', requesting a new one...')\n",
"                \n",
"                if i % 20 == 0:\n",
"                    print('Loading in new IPs...')\n",
"                    ips = get_ips(20)\n",
"                    \n",
"                    rot_list = []\n",
"                    for ip in ips:\n",
"                        rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"        \n",
"        # Refresh the proxy list every 100 pages, since free proxies go bad quickly\n",
"        if page % 100 == 0:\n",
"            print('Loading in new IPs...')\n",
"            ips = get_ips(20)\n",
"            \n",
"            rot_list = []\n",
"            for ip in ips:\n",
"                rot_list.append([{'https' : ip}, {'User-Agent' : random.choice(user_agents)}])\n",
"        \n",
"    \n",
"    return post_titles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: Only the last 10 lines of output are shown for each forum scraped, because the full output could not be collapsed when the notebook was uploaded." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Nintendo Switch forums" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success with IP 118.174.233.33:54705, now onto page 1692 \n", "Success with IP 203.128.94.102:60152, now onto page 1693 \n", "Success with IP 182.52.238.111:45639, now onto page 1694 \n", "Success with IP 116.203.1.177:1994, now onto page 1695 \n", "Success with IP 217.17.38.245:41506, now onto page 1696 \n", "Success with IP 203.128.94.102:60152, now onto page 1697 \n", "Success with IP 180.180.156.35:49510, now onto page 1698 \n", "Success with IP 1.20.97.4:46965, now onto page 1699 \n", "Success with IP 203.128.94.102:60152, now onto page 1700 \n", "Success with IP 180.180.156.45:32355, now onto page 1701\n" ] } ], "source": [ "switch_url = 'https://gamefaqs.gamespot.com/boards/189706-nintendo-switch'\n", "switch_posts = gamefaqs_scraper(switch_url, num_pages=1700, ips=ips, user_agents=user_agents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Playstation forums" ] }, { "cell_type": "code", "execution_count": 225, "metadata": {}, "outputs": [], "source": [ "# Use new agents to avoid a temporary ban\n", "user_agents = []\n", "user_agents.extend(get_agents('chrome/2', num_agents=50))\n", "user_agents.extend(get_agents('firefox/2', num_agents=50))\n", "user_agents.extend(get_agents('safari/2', num_agents=50))" ] }, { 
"cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success with IP 119.192.179.46:55012, now onto page 1932 \n", "Success with IP 1.20.101.150:41904, now onto page 1933 \n", "Error with IP 103.194.192.29:49202, requesting a new one... \n", "Agent may be banned, removing agent and trying a new one... \n", "Error with IP 213.14.32.75:47442, requesting a new one... \n", "Error with IP 103.220.28.180:51493, requesting a new one... \n", "Error with IP 103.220.28.180:51493, requesting a new one... \n", "Success with IP 111.91.225.2:8080, now onto page 1934 \n", "Success with IP 119.192.179.46:55012, now onto page 1935 \n", "Success with IP 1.20.101.150:41904, now onto page 1936\n" ] } ], "source": [ "ips = get_ips(20)\n", "playstation_url = 'https://gamefaqs.gamespot.com/boards/691087-playstation-4'\n", "playstation_posts = gamefaqs_scraper(playstation_url, num_pages=1935, ips=ips, user_agents=user_agents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PC forums" ] }, { "cell_type": "code", "execution_count": 250, "metadata": {}, "outputs": [], "source": [ "# Use new agents to avoid a temporary ban\n", "user_agents = []\n", "user_agents.extend(get_agents('chrome/3', num_agents=50))\n", "user_agents.extend(get_agents('firefox/3', num_agents=50))\n", "user_agents.extend(get_agents('safari/3', num_agents=50))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success with IP 176.98.95.247:31955, now onto page 1061 \n", "Error with IP 45.6.100.250:48214, requesting a new one... \n", "Error with IP 75.98.119.13:57859, requesting a new one... \n", "Error with IP 45.6.100.250:48214, requesting a new one... 
\n", "Success with IP 41.215.81.170:59959, now onto page 1062 \n", "Success with IP 41.215.81.170:59959, now onto page 1063 \n", "Error with IP 45.6.100.250:48214, requesting a new one... \n", "Success with IP 87.26.3.40:8080, now onto page 1064 \n", "Success with IP 203.205.29.106:39191, now onto page 1065 \n", "Success with IP 87.26.3.40:8080, now onto page 1066\n" ] } ], "source": [ "ips = get_ips(20)\n", "pc_url = 'https://gamefaqs.gamespot.com/boards/916373-pc'\n", "pc_posts = gamefaqs_scraper(pc_url, num_pages=1065, ips=ips, user_agents=user_agents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Xbox One forums\n", "\n", "We will just reuse the same user agents here" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Error with IP 210.11.181.221:55331, requesting a new one... \n", "Error with IP 178.128.217.99:8080, requesting a new one... \n", "Error with IP 31.209.110.159:39494, requesting a new one... \n", "Error with IP 210.11.181.221:55331, requesting a new one... \n", "Error with IP 202.91.92.21:43576, requesting a new one... \n", "Error with IP 5.2.200.145:44508, requesting a new one... \n", "Success with IP 109.201.142.14:3128, now onto page 710 \n", "Agent may be banned, removing agent and trying a new one... \n", "Error with IP 124.41.240.191:38167, requesting a new one... 
\n", "Success with IP 109.201.142.14:3128, now onto page 711\n" ] } ], "source": [ "ips = get_ips(20)\n", "xbox_url = 'https://gamefaqs.gamespot.com/boards/691088-xbox-one'\n", "xbox_posts = gamefaqs_scraper(xbox_url, num_pages=710, ips=ips, user_agents=user_agents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing Duplicates and Combining All the Results" ] }, { "cell_type": "code", "execution_count": 257, "metadata": {}, "outputs": [], "source": [ "switch_posts = list(set(switch_posts))\n", "playstation_posts = list(set(playstation_posts))\n", "pc_posts = list(set(pc_posts))\n", "xbox_posts = list(set(xbox_posts))" ] }, { "cell_type": "code", "execution_count": 291, "metadata": {}, "outputs": [], "source": [ "# Switch Boards\n", "switch_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Switch']], columns=['Post', 'Website', 'Board']) \n", " for title in switch_posts], \n", " ignore_index=True)\n", "\n", "# PS4 Boards\n", "playstation_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Playstation 4']], columns=['Post', 'Website', 'Board']) \n", " for title in playstation_posts], \n", " ignore_index=True)\n", "\n", "# Xbox One Boards\n", "xbox_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'Xbox One']], columns=['Post', 'Website', 'Board']) \n", " for title in xbox_posts], \n", " ignore_index=True)\n", "\n", "# PC Boards\n", "pc_forums = pd.concat([pd.DataFrame([[title, 'GameFaqs', 'PC']], columns=['Post', 'Website', 'Board']) \n", " for title in pc_posts], \n", " ignore_index=True)\n", "\n", "# Join all to post_titles\n", "post_titles = pd.concat([post_titles, switch_forums, playstation_forums, xbox_forums, pc_forums], ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all the posts we want and could display the final results" ] }, { "cell_type": "code", "execution_count": 292, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Post</th><th>Website</th><th>Board</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>John @Totalbiscuit Bain July 8, 1984 - May 24,...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>1</th><td>Bungie Splits With Activision</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>2</th><td>Totalbiscuit hospitalized, his cancer is sprea...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>3</th><td>[E3 2018] Cyberpunk 2077</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>4</th><td>Sony faces growing Fortnite backlash at E3</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>5</th><td>John “TotalBiscuit” Bain to be inducted into E...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>6</th><td>Later today, Red Dead 2 gets a new trailer. Be...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>7</th><td>List of Video Games where you can pet the dogs</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>8</th><td>It's time video game makers unionize.</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>9</th><td>Bethesda Support Leaks Fallout 76 Customer Nam...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>10</th><td>Ubisoft will now ban players for racist, homop...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>11</th><td>Fallout 76 – Official Teaser Trailer</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>12</th><td>Nintendo of America’s Reggie Fils-Aime to Reti...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>13</th><td>Obsidian's The Outer Worlds blends Firefly and...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>14</th><td>Bethesda offering 500 atoms ($5 ingame store c...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>15</th><td>[E3 2018] The Elder Scrolls VI</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>16</th><td>Giantbomb Unlikely to Review Fallout 76. Gerst...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>17</th><td>Report: The Walking Dead developer Telltale Ga...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>18</th><td>Introducing the Xbox Adaptive Controller</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>19</th><td>Game dev: Linux users were only 0.1% of sales ...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>20</th><td>Sony's Stubborn Stance on Cross-Play Is Embarr...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>21</th><td>Metro dev: 'if at all all the PC players annou...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>22</th><td>Blizzard Says It Wasn't Expecting Fans To Be T...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>23</th><td>Black Ops 4 adds microtransactions, requiring ...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>24</th><td>This takes it to the next level, Are we really...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>25</th><td>EA Cancels Open-World Star Wars Game</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>26</th><td>In the crazy economy of Red Dead Online, baked...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>27</th><td>PlayStation Skipping E3 For First Time in Show...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>28</th><td>Cyberpunk 2077 is a First-Person RPG</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>29</th><td>Belgian government opens criminal investigatio...</td><td>Reddit</td><td>r/games</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>107471</th><td>Skyrim vs Kingdom Come Deliverance, which game...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107472</th><td>MH World PC port planned for autumn 2018</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107473</th><td>Best place to buy cheap Steam key?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107474</th><td>Looking to get my A+ certification.</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107475</th><td>Gears of war 4 help plz</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107476</th><td>WD Easystore 4TB External Drive $120 At Best Buy</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107477</th><td>With Injustice 1 and MKXL being the highest se...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107478</th><td>Best sandbox building game? Preferably with mu...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107479</th><td>Switch to a 2.4ghz connection instead of 5ghz ...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107480</th><td>Over-Ear Headphones under $150?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107481</th><td>What are your opinions on possible FO3 remaster?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107482</th><td>Most enjoyable fighter on pc so far?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107483</th><td>Monitor recommendations?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107484</th><td>What E3 games will YOU be buying?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107485</th><td>Your five most played Steam games?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107486</th><td>New Fire Pro Wrestling game coming to Steam</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107487</th><td>Oculus Go is trash right now.</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107488</th><td>Need help getting Sonic Heroes to work</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107489</th><td>Anyone ever bypass Rockstar Social s*** on Steam?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107490</th><td>So does Win10 still have that mandatory update...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107491</th><td>Windows 10 update broke my graphics driver. AM...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107492</th><td>There is a lot of fear mongering by net neutra...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107493</th><td>Fallout 76 rust clone?</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107494</th><td>Please help me out here.</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107495</th><td>Climbed from bronze 4 to gold 5</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107496</th><td>Fellow 2500K users, when are you upgrading/hav...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107497</th><td>About to grab the Sennheiser 598, couple last ...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107498</th><td>So is it possible to do full body tracking wit...</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107499</th><td>Modular PSU</td><td>GameFaqs</td><td>PC</td></tr>\n",
"    <tr><th>107500</th><td>Steam dropped Bitcoin payments</td><td>GameFaqs</td><td>PC</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>107501 rows × 3 columns</p>\n",
"
" ], "text/plain": [ " Post Website Board\n", "0 John @Totalbiscuit Bain July 8, 1984 - May 24,... Reddit r/games\n", "1 Bungie Splits With Activision Reddit r/games\n", "2 Totalbiscuit hospitalized, his cancer is sprea... Reddit r/games\n", "3 [E3 2018] Cyberpunk 2077 Reddit r/games\n", "4 Sony faces growing Fortnite backlash at E3 Reddit r/games\n", "5 John “TotalBiscuit” Bain to be inducted into E... Reddit r/games\n", "6 Later today, Red Dead 2 gets a new trailer. Be... Reddit r/games\n", "7 List of Video Games where you can pet the dogs Reddit r/games\n", "8 It's time video game makers unionize. Reddit r/games\n", "9 Bethesda Support Leaks Fallout 76 Customer Nam... Reddit r/games\n", "10 Ubisoft will now ban players for racist, homop... Reddit r/games\n", "11 Fallout 76 – Official Teaser Trailer Reddit r/games\n", "12 Nintendo of America’s Reggie Fils-Aime to Reti... Reddit r/games\n", "13 Obsidian's The Outer Worlds blends Firefly and... Reddit r/games\n", "14 Bethesda offering 500 atoms ($5 ingame store c... Reddit r/games\n", "15 [E3 2018] The Elder Scrolls VI Reddit r/games\n", "16 Giantbomb Unlikely to Review Fallout 76. Gerst... Reddit r/games\n", "17 Report: The Walking Dead developer Telltale Ga... Reddit r/games\n", "18 Introducing the Xbox Adaptive Controller Reddit r/games\n", "19 Game dev: Linux users were only 0.1% of sales ... Reddit r/games\n", "20 Sony's Stubborn Stance on Cross-Play Is Embarr... Reddit r/games\n", "21 Metro dev: 'if at all all the PC players annou... Reddit r/games\n", "22 Blizzard Says It Wasn't Expecting Fans To Be T... Reddit r/games\n", "23 Black Ops 4 adds microtransactions, requiring ... Reddit r/games\n", "24 This takes it to the next level, Are we really... Reddit r/games\n", "25 EA Cancels Open-World Star Wars Game Reddit r/games\n", "26 In the crazy economy of Red Dead Online, baked... Reddit r/games\n", "27 PlayStation Skipping E3 For First Time in Show... 
Reddit r/games\n", "28 Cyberpunk 2077 is a First-Person RPG Reddit r/games\n", "29 Belgian government opens criminal investigatio... Reddit r/games\n", "... ... ... ...\n", "107471 Skyrim vs Kingdom Come Deliverance, which game... GameFaqs PC\n", "107472 MH World PC port planned for autumn 2018 GameFaqs PC\n", "107473 Best place to buy cheap Steam key? GameFaqs PC\n", "107474 Looking to get my A+ certification. GameFaqs PC\n", "107475 Gears of war 4 help plz GameFaqs PC\n", "107476 WD Easystore 4TB External Drive $120 At Best Buy GameFaqs PC\n", "107477 With Injustice 1 and MKXL being the highest se... GameFaqs PC\n", "107478 Best sandbox building game? Preferably with mu... GameFaqs PC\n", "107479 Switch to a 2.4ghz connection instead of 5ghz ... GameFaqs PC\n", "107480 Over-Ear Headphones under $150? GameFaqs PC\n", "107481 What are your opinions on possible FO3 remaster? GameFaqs PC\n", "107482 Most enjoyable fighter on pc so far? GameFaqs PC\n", "107483 Monitor recommendations? GameFaqs PC\n", "107484 What E3 games will YOU be buying? GameFaqs PC\n", "107485 Your five most played Steam games? GameFaqs PC\n", "107486 New Fire Pro Wrestling game coming to Steam GameFaqs PC\n", "107487 Oculus Go is trash right now. GameFaqs PC\n", "107488 Need help getting Sonic Heroes to work GameFaqs PC\n", "107489 Anyone ever bypass Rockstar Social s*** on Steam? GameFaqs PC\n", "107490 So does Win10 still have that mandatory update... GameFaqs PC\n", "107491 Windows 10 update broke my graphics driver. AM... GameFaqs PC\n", "107492 There is a lot of fear mongering by net neutra... GameFaqs PC\n", "107493 Fallout 76 rust clone? GameFaqs PC\n", "107494 Please help me out here. GameFaqs PC\n", "107495 Climbed from bronze 4 to gold 5 GameFaqs PC\n", "107496 Fellow 2500K users, when are you upgrading/hav... GameFaqs PC\n", "107497 About to grab the Sennheiser 598, couple last ... GameFaqs PC\n", "107498 So is it possible to do full body tracking wit... 
GameFaqs PC\n", "107499 Modular PSU GameFaqs PC\n", "107500 Steam dropped Bitcoin payments GameFaqs PC\n", "\n", "[107501 rows x 3 columns]" ] }, "execution_count": 292, "metadata": {}, "output_type": "execute_result" } ], "source": [ "post_titles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting Titles which Mention Large Game Companies\n", "\n", "First I have to make a list of relevant developers and their different nicknames." ] }, { "cell_type": "code", "execution_count": 313, "metadata": {}, "outputs": [], "source": [ "# The full name is only listed in cases like 'Activision Blizzard' together with 'Activision' and 'Blizzard'\n", "# in order to label each post in the next step\n", "developers = [['Tencent'], ['Rockstar'], ['Valve'], ['Sony'], ['Microsoft'], ['Nintendo'], ['Bungie'],\n", " ['Activision Blizzard', 'Activision', 'Activi$ion', 'Blizzard'], ['Electronic Arts', 'EA'],\n", " ['Bandai Namco', 'Bandai', 'Namco'], ['Ubisoft'], ['Nexon'], ['Telltale'], \n", " ['Epic Games', 'Epic'], ['BioWare'], ['Naughty Dog'], ['Square Enix', 'Square'], \n", " ['Bunjie'], ['Insomniac'], ['Bethesda'], ['Capcom'], ['Take-Two', 'Take Two', 'Take 2', 'Take2'], \n", " ['Sega'], ['Devolver Digital', 'Devolver'], ['Konami'], ['Apple']]" ] }, { "cell_type": "code", "execution_count": 335, "metadata": {}, "outputs": [], "source": [ "import re" ] }, { "cell_type": "code", "execution_count": 376, "metadata": {}, "outputs": [], "source": [ "dev_posts = pd.DataFrame(columns=['Post', 'Website', 'Board', 'Developer'])\n", "index = 0\n", "post_dict = {}\n", "for i in range(len(post_titles)):\n", "\n", " all_developers = []\n", " for dev in developers:\n", " for nickname in dev:\n", " \n", " # Special case for EA. 
Common nickname but could also be mixed with common words like \"each\".\n", " match = False\n", " if nickname == 'EA': \n", " post_title = post_titles['Post'].loc[i]\n", " \n", " # Regex to match EA outside of other words.\n", " # re.search scans the whole title; re.match only tests the start of the string.\n", " if re.search(r'([^a-zA-Z]|^)EA([^a-zA-Z]|$)', post_title):\n", " all_developers += [dev[0]]\n", " match = True\n", " \n", " else:\n", " post_title = post_titles['Post'].loc[i].lower()\n", " \n", " if nickname.lower() in post_title:\n", " all_developers += [dev[0]]\n", " match = True\n", "\n", " if match:\n", " if post_dict.get(dev[0]):\n", " post_dict[dev[0]].append(post_titles['Post'].loc[i])\n", " else:\n", " post_dict[dev[0]] = [post_titles['Post'].loc[i]]\n", " break\n", " \n", " if all_developers: \n", " row = post_titles.loc[i].values.tolist() + [', '.join(all_developers)]\n", " dev_posts.loc[index] = row\n", " index += 1" ] }, { "cell_type": "code", "execution_count": 337, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape, (8493, 4)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PostWebsiteBoardDeveloper
0Bungie Splits With ActivisionRedditr/gamesBungie, Activision Blizzard
1Sony faces growing Fortnite backlash at E3Redditr/gamesSony
2Later today, Red Dead 2 gets a new trailer. Be...Redditr/gamesRockstar, Take-Two
3Bethesda Support Leaks Fallout 76 Customer Nam...Redditr/gamesBethesda
4Ubisoft will now ban players for racist, homop...Redditr/gamesUbisoft
\n", "
" ], "text/plain": [ " Post Website Board \\\n", "0 Bungie Splits With Activision Reddit r/games \n", "1 Sony faces growing Fortnite backlash at E3 Reddit r/games \n", "2 Later today, Red Dead 2 gets a new trailer. Be... Reddit r/games \n", "3 Bethesda Support Leaks Fallout 76 Customer Nam... Reddit r/games \n", "4 Ubisoft will now ban players for racist, homop... Reddit r/games \n", "\n", " Developer \n", "0 Bungie, Activision Blizzard \n", "1 Sony \n", "2 Rockstar, Take-Two \n", "3 Bethesda \n", "4 Ubisoft " ] }, "execution_count": 337, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Shape, {}'.format(dev_posts.shape))\n", "dev_posts.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sentiment Analysis\n", "\n", "### Public Impressions of Developers\n", "\n", "Now we can finally analyze our data and see how favorably the public views each of these developers.\n", "
\n", "First we want a simple comparison based on sentiment. This will be a three-step process:\n", "\n", "1. Gather all titles associated with each developer\n", "2. Perform sentiment analysis on each title\n", "3. Calculate the mean of the results for each developer" ] }, { "cell_type": "code", "execution_count": 340, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package vader_lexicon to\n", "[nltk_data] /Users/adrianherrmann/nltk_data...\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 340, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import nltk\n", "from nltk.sentiment.vader import SentimentIntensityAnalyzer\n", "nltk.download('vader_lexicon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to judge sentiment based on the compound score, which is the sum of all lexicon ratings, standardized to fall within the range from -1 to 1." ] }, { "cell_type": "code", "execution_count": 359, "metadata": {}, "outputs": [], "source": [ "dev_sentiments = pd.DataFrame(columns=['Mean Sentiment', 'Developer', \n", " 'Most Negative Sentence', 'Most Positive Sentence',\n", " 'Most Negative Score', 'Most Positive Score',\n", " 'Number of Posts'])\n", "index = 0\n", "sid = SentimentIntensityAnalyzer()\n", "for dev in developers:\n", " titles = dev_posts[dev_posts['Developer'].str.contains(dev[0])]\n", " \n", " if not titles.values.tolist():\n", " continue\n", " \n", " tot_sentiment = 0\n", " most_neg_sent = ''\n", " most_pos_sent = ''\n", " most_neg_score = 1\n", " most_pos_score = -1\n", " for title in titles['Post'].values:\n", " sentiment = sid.polarity_scores(title)['compound'] \n", " tot_sentiment += sentiment\n", " \n", " if sentiment < most_neg_score:\n", " most_neg_score = sentiment\n", " most_neg_sent = title\n", " \n", " if sentiment > most_pos_score:\n", " most_pos_score = sentiment\n", " most_pos_sent = title\n", " \n", " mean_sentiment = 
tot_sentiment / len(titles)\n", " \n", " dev_sentiments.loc[index] = [mean_sentiment, dev[0],\n", " most_neg_sent, most_pos_sent,\n", " most_neg_score, most_pos_score,\n", " len(titles)]\n", " index += 1\n", " \n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the sentiments are analyzed we can view the important details.\n", "
\n", "#### NOTE:\n", "It's expected that some of these posts will be rated incorrectly. For example, if a game's name contains a negative word and it is mentioned in the same title as a developer (think Resident Evil), the title's score will drag that developer's rating down. For now, in aggregate, these errors should average out and the means should lean toward how each developer is truly perceived (provided the sample size is large enough).\n", "\n", "It is worth digging deeper and applying more specific filtering and sentiment analysis when looking at a single company, which I will do a bit of below. The post_dict created earlier will help." ] }, { "cell_type": "code", "execution_count": 361, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean SentimentDeveloperMost Negative SentenceMost Positive SentenceMost Negative ScoreMost Positive ScoreNumber of Posts
00.138989TencentDoes anyone actually play the crappy F2P games...Should superior Chinese companies like Tencent...-0.29600.81769
1-0.043847RockstarRockstar Lies & Red Dead Online Economy Is A G...Ubisoft is a BETTER company than Rockstar! LOL...-0.86250.8419167
2-0.018581ValveDead before it even released? A valve game?? A...Artifact is so good, Kotaku writer wants to re...-0.70410.814794
30.031590SonySony's Devil May Cry has arrived. Lost Souls A...Sony wins best Float at PRIDE 2018-0.86890.90081627
40.054865MicrosoftNO! BAD MICROSOFT! I'm so ashamed of you!Amazing show Microsoft!! My brother even said ...-0.91910.8798708
50.072509NintendoResident Evil, Resident Evil 0, and Resident E...Discovered a Nintendo office close to where I ...-0.93490.92733969
60.009765BungieActivision currently under investigation for f...LMAO Anthem is the exact same hustle Bungie us...-0.58590.684137
70.006517Activision BlizzardHeroes of the Storm pros vent sadness, anger a...Thank you Activision for CoD Black Ops 4 Black...-0.77830.8555183
8-0.105926Electronic ArtsEA Head Fired For Gross MisconductEA are an excellent company that provides chea...-0.77170.8316130
90.038390Bandai NamcoWTF were Namco Bandai thinking?Bandai Namco proves to be the best third party...-0.67390.784573
100.057659UbisoftUbisoft will now ban players for racist, homop...Ubisoft is a BETTER company than Rockstar! LOL...-0.82250.8419201
110.000000NexonWhat Nexon games use NX?What Nexon games use NX?0.00000.00001
12-0.043905TelltaleNo wonder Telltale Games died a slow painful d...Would Telltale Be The Best Developer If They H...-0.91180.7964109
130.028201Epic GamesGod of War Has An Epic Avengers Infinity War R...The Epic Games store is now live - giving away...-0.77170.8625189
14-0.061895BioWareDid EA ruin Bioware or did Bioware ruin itself?Former Bioware legend Mike Laidlaw praises wit...-0.82250.812260
150.051743Naughty DogNaughty Dog's lead animator explains in-depth ...I for one am proud Naughty Dog is displaying E...-0.66960.771272
160.037446Square Enixkingdom hearts 3 deluxe edition....what the HE...Did Square Enix ever state why they skipped ou...-0.87930.8788286
170.064921InsomniacSanta Monica, Gorilla, Insomniac, Sucker Punch...Ratchet and Clank PS4 is Insomniacs BEST SELLI...-0.52670.712533
180.034053BethesdaBethesda worst dev/pub of all time,nothing was...Call me crazy, but Fallout 4's the best Bethes...-0.82710.8481247
190.068994CapcomAs a Devil My Cry fan, I'm jealous of the way ...Operation Make DMC Great again is a success! T...-0.92740.8999386
20-0.113245Take-TwoTake two/Rockstar will be VERY hard to stop wi...Take2 CEO on Epic Store: Competition is a good...-0.67560.700320
210.014805SegaDamn sega!!!! Killing it this week!!Virtua Fighter 5 Final Showdown is free with G...-0.85070.8932220
220.016383Devolver DigitalNot a Hero hitting Switch Aug 2nd. 12 more Dev...So who is Devolver Digital and should I care a...-0.44490.493912
230.020019KonamiDeath Stranding will prove that Konami was rig...What is the best Konami game you've ever played?-0.74300.7650134
240.036826AppleWent shopping for apple products, it's a horri...Apple Finally Caves, promises to support Steam...-0.54230.648631
\n", "
" ], "text/plain": [ " Mean Sentiment Developer \\\n", "0 0.138989 Tencent \n", "1 -0.043847 Rockstar \n", "2 -0.018581 Valve \n", "3 0.031590 Sony \n", "4 0.054865 Microsoft \n", "5 0.072509 Nintendo \n", "6 0.009765 Bungie \n", "7 0.006517 Activision Blizzard \n", "8 -0.105926 Electronic Arts \n", "9 0.038390 Bandai Namco \n", "10 0.057659 Ubisoft \n", "11 0.000000 Nexon \n", "12 -0.043905 Telltale \n", "13 0.028201 Epic Games \n", "14 -0.061895 BioWare \n", "15 0.051743 Naughty Dog \n", "16 0.037446 Square Enix \n", "17 0.064921 Insomniac \n", "18 0.034053 Bethesda \n", "19 0.068994 Capcom \n", "20 -0.113245 Take-Two \n", "21 0.014805 Sega \n", "22 0.016383 Devolver Digital \n", "23 0.020019 Konami \n", "24 0.036826 Apple \n", "\n", " Most Negative Sentence \\\n", "0 Does anyone actually play the crappy F2P games... \n", "1 Rockstar Lies & Red Dead Online Economy Is A G... \n", "2 Dead before it even released? A valve game?? A... \n", "3 Sony's Devil May Cry has arrived. Lost Souls A... \n", "4 NO! BAD MICROSOFT! I'm so ashamed of you! \n", "5 Resident Evil, Resident Evil 0, and Resident E... \n", "6 Activision currently under investigation for f... \n", "7 Heroes of the Storm pros vent sadness, anger a... \n", "8 EA Head Fired For Gross Misconduct \n", "9 WTF were Namco Bandai thinking? \n", "10 Ubisoft will now ban players for racist, homop... \n", "11 What Nexon games use NX? \n", "12 No wonder Telltale Games died a slow painful d... \n", "13 God of War Has An Epic Avengers Infinity War R... \n", "14 Did EA ruin Bioware or did Bioware ruin itself? \n", "15 Naughty Dog's lead animator explains in-depth ... \n", "16 kingdom hearts 3 deluxe edition....what the HE... \n", "17 Santa Monica, Gorilla, Insomniac, Sucker Punch... \n", "18 Bethesda worst dev/pub of all time,nothing was... \n", "19 As a Devil My Cry fan, I'm jealous of the way ... \n", "20 Take two/Rockstar will be VERY hard to stop wi... \n", "21 Damn sega!!!! Killing it this week!! 
\n", "22 Not a Hero hitting Switch Aug 2nd. 12 more Dev... \n", "23 Death Stranding will prove that Konami was rig... \n", "24 Went shopping for apple products, it's a horri... \n", "\n", " Most Positive Sentence Most Negative Score \\\n", "0 Should superior Chinese companies like Tencent... -0.2960 \n", "1 Ubisoft is a BETTER company than Rockstar! LOL... -0.8625 \n", "2 Artifact is so good, Kotaku writer wants to re... -0.7041 \n", "3 Sony wins best Float at PRIDE 2018 -0.8689 \n", "4 Amazing show Microsoft!! My brother even said ... -0.9191 \n", "5 Discovered a Nintendo office close to where I ... -0.9349 \n", "6 LMAO Anthem is the exact same hustle Bungie us... -0.5859 \n", "7 Thank you Activision for CoD Black Ops 4 Black... -0.7783 \n", "8 EA are an excellent company that provides chea... -0.7717 \n", "9 Bandai Namco proves to be the best third party... -0.6739 \n", "10 Ubisoft is a BETTER company than Rockstar! LOL... -0.8225 \n", "11 What Nexon games use NX? 0.0000 \n", "12 Would Telltale Be The Best Developer If They H... -0.9118 \n", "13 The Epic Games store is now live - giving away... -0.7717 \n", "14 Former Bioware legend Mike Laidlaw praises wit... -0.8225 \n", "15 I for one am proud Naughty Dog is displaying E... -0.6696 \n", "16 Did Square Enix ever state why they skipped ou... -0.8793 \n", "17 Ratchet and Clank PS4 is Insomniacs BEST SELLI... -0.5267 \n", "18 Call me crazy, but Fallout 4's the best Bethes... -0.8271 \n", "19 Operation Make DMC Great again is a success! T... -0.9274 \n", "20 Take2 CEO on Epic Store: Competition is a good... -0.6756 \n", "21 Virtua Fighter 5 Final Showdown is free with G... -0.8507 \n", "22 So who is Devolver Digital and should I care a... -0.4449 \n", "23 What is the best Konami game you've ever played? -0.7430 \n", "24 Apple Finally Caves, promises to support Steam... 
-0.5423 \n", "\n", " Most Positive Score Number of Posts \n", "0 0.8176 9 \n", "1 0.8419 167 \n", "2 0.8147 94 \n", "3 0.9008 1627 \n", "4 0.8798 708 \n", "5 0.9273 3969 \n", "6 0.6841 37 \n", "7 0.8555 183 \n", "8 0.8316 130 \n", "9 0.7845 73 \n", "10 0.8419 201 \n", "11 0.0000 1 \n", "12 0.7964 109 \n", "13 0.8625 189 \n", "14 0.8122 60 \n", "15 0.7712 72 \n", "16 0.8788 286 \n", "17 0.7125 33 \n", "18 0.8481 247 \n", "19 0.8999 386 \n", "20 0.7003 20 \n", "21 0.8932 220 \n", "22 0.4939 12 \n", "23 0.7650 134 \n", "24 0.6486 31 " ] }, "execution_count": 361, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dev_sentiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EA and Nintendo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's take a deeper look at a couple of companies with scores on two opposite ends of the spectrum, Electronic Arts and Nintendo. These two have the second worst and second best scores respectively, but they also have plenty of posts, which the developers with the worst and best scores (Take-Two, 20 posts and Tencent, 9 posts) don't have." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Nintendo:" ] }, { "cell_type": "code", "execution_count": 364, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean SentimentDeveloperMost Negative SentenceMost Positive SentenceMost Negative ScoreMost Positive ScoreNumber of Posts
50.072509NintendoResident Evil, Resident Evil 0, and Resident E...Discovered a Nintendo office close to where I ...-0.93490.92733969
\n", "
" ], "text/plain": [ " Mean Sentiment Developer \\\n", "5 0.072509 Nintendo \n", "\n", " Most Negative Sentence \\\n", "5 Resident Evil, Resident Evil 0, and Resident E... \n", "\n", " Most Positive Sentence Most Negative Score \\\n", "5 Discovered a Nintendo office close to where I ... -0.9349 \n", "\n", " Most Positive Score Number of Posts \n", "5 0.9273 3969 " ] }, "execution_count": 364, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']" ] }, { "cell_type": "code", "execution_count": 367, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nintendo's most negative sentence:\n", "Resident Evil, Resident Evil 0, and Resident Evil 4 coming to Nintendo Switch in 2019\n", "\n", "Nintendo's most positive sentence:\n", "Discovered a Nintendo office close to where I live and asked if they had any kind of tour or something. Lady told me they hadn’t but she handed me a bag full of cool souvenirs. This coin is definitely the best of all!\n", "\n" ] } ], "source": [ "print('Nintendo\\'s most negative sentence:\\n' +\n", " dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Negative Sentence'].values[0] + '\\n')\n", "print('Nintendo\\'s most positive sentence:\\n' +\n", " dev_sentiments[dev_sentiments['Developer'] == 'Nintendo']['Most Positive Sentence'].values[0] + '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For Nintendo, the worst post, which is in fact the most negatively rated title across all developers, scores so low only because it mentions the game \"Resident Evil\" multiple times. It is a neutral announcement rather than a complaint, which only reinforces their high overall score.\n", "

\n", "Nintendo being so well liked comes as no surprise. They arguably have the most devout following of any modern gaming company. So many people grew up on Nintendo as children and continue to play their games as adults, and many even stick strictly to Nintendo.\n", "

\n", "Let's get the word frequencies from Nintendo posts." ] }, { "cell_type": "code", "execution_count": 380, "metadata": {}, "outputs": [], "source": [ "from collections import Counter" ] }, { "cell_type": "code", "execution_count": 414, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Most common neutral words in negative titles: [('nintendo', 187), ('the', 71), ('to', 52), ('is', 46), ('switch', 35), ('a', 32), ('of', 29), ('and', 26), ('for', 26), ('why', 26), ('in', 19), (\"nintendo's\", 19), ('on', 19), ('online', 19), ('you', 19), ('it', 16), ('has', 15), ('have', 14), ('do', 14), ('that', 13), ('so', 13), ('i', 12), ('are', 12), ('what', 11), ('-', 10), ('up', 10), ('games', 10), ('with', 10), ('this', 10), ('will', 9), ('sony', 9), ('if', 9), ('console', 9), ('does', 8), ('e3', 8), ('be', 8), ('an', 7), ('about', 7), ('at', 7), ('get', 7), ('nintendo?', 7), ('game', 7), ('think', 6), ('was', 6), ('or', 6), ('would', 6), ('not', 6), ('how', 6), ('most', 6), ('did', 6)]\n" ] } ], "source": [ "neg_titles = []\n", "for title in post_dict['Nintendo']:\n", " sentiment = sid.polarity_scores(title)['compound']\n", " \n", " if sentiment <= -0.5:\n", " neg_titles.append(title)\n", "\n", "\n", "nintendo_words = ' '.join(neg_titles).split(' ')\n", "\n", "neu_words=[]\n", "\n", "for word in nintendo_words:\n", " sentiment = sid.polarity_scores(word)['compound']\n", " \n", " if (sentiment >= -0.4 and sentiment <= 0.4):\n", " neu_words.append(word.lower())\n", "\n", "neu_freq = Counter(neu_words)\n", "\n", "print('Most common neutral words in negative titles: ', neu_freq.most_common(50))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Going down the list we see some common words, but then notice one which should definitely not be common:\n", "
\n", "'Online'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are ten of the negative-sentiment titles that mention 'online'. The word was counted 19 times in total." ] }, { "cell_type": "code", "execution_count": 416, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Jim Sterling: The Online system makes nintendo look weak and stupid\n", "Nintendo Switch Paid Online Still a Disaster? - Nintendo Direct Review\n", "So nintendo online was a scam\n", "Everytime I finish a mission in Resident Evil a Nintendo Online message appears\n", "Would it of killed Nintendo to add promotion SNES titles to new Online Subs?\n", "Nintendo's paid online is bad. FACT.\n", "scumbag nintendo wont let me try the darksouls demo without online.\n", "Will Nintendo Switch Online kill multiplayer lobbies?\n", "Nintendo would be dumb to not have an online paywall TBH.\n", "What the hell does Nintendo online even include?\n" ] } ], "source": [ "index = 0\n", "bad_count = 0\n", "# Stop after 10 matches or when the titles run out\n", "while bad_count < 10 and index < len(neg_titles):\n", " if 'online' in neg_titles[index].lower():\n", " print(neg_titles[index])\n", " bad_count += 1\n", " index += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this it's clear that one major complaint about Nintendo is the online service they have in place. If there were one thing they could do to please their base, it would be to address the paywall and offer more with their online subscription, which these posters find lackluster. We can figure all of this out from post titles alone." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Electronic Arts:" ] }, { "cell_type": "code", "execution_count": 363, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Mean SentimentDeveloperMost Negative SentenceMost Positive SentenceMost Negative ScoreMost Positive ScoreNumber of Posts
8-0.105926Electronic ArtsEA Head Fired For Gross MisconductEA are an excellent company that provides chea...-0.77170.8316130
\n", "
" ], "text/plain": [ " Mean Sentiment Developer Most Negative Sentence \\\n", "8 -0.105926 Electronic Arts EA Head Fired For Gross Misconduct \n", "\n", " Most Positive Sentence Most Negative Score \\\n", "8 EA are an excellent company that provides chea... -0.7717 \n", "\n", " Most Positive Score Number of Posts \n", "8 0.8316 130 " ] }, "execution_count": 363, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']" ] }, { "cell_type": "code", "execution_count": 369, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Electronic Arts's most negative sentence:\n", "EA Head Fired For Gross Misconduct\n", "\n", "Electronic Arts's most positive sentence:\n", "EA are an excellent company that provides cheap access to a lot of great games\n", "\n" ] } ], "source": [ "print('Electronic Arts\\'s most negative sentence:\\n' +\n", " dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Negative Sentence'].values[0] + '\\n')\n", "print('Electronic Arts\\'s most positive sentence:\\n' +\n", " dev_sentiments[dev_sentiments['Developer'] == 'Electronic Arts']['Most Positive Sentence'].values[0] + '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike Nintendo, EA is strongly disliked by many people online. In fact, they wrote the most downvoted comment in Reddit history, which is a testament to how negatively they are seen. Still, they continue to be quite successful. Apex Legends, a game they recently released, seems to be gaining popularity rapidly. Public opinion on the way they monetize their games seems to be changing, which may indicate that people will once again view the company positively.\n", "

\n", "Similarly I am going to check EA's sentiment frequency." ] }, { "cell_type": "code", "execution_count": 418, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Most common neutral words in negative titles: [('ea', 21), ('for', 9), ('star', 6), ('game', 6), ('is', 6), ('of', 4), ('open', 4), ('world', 4), ('hiring', 3), ('under', 3), ('investigation', 3), ('to', 3), ('the', 3), ('cancels', 2), ('open-world', 2), ('an', 2), ('anthem', 2), ('video', 2), ('removed', 2), ('another', 2), ('sell', 2), ('lootboxes', 2), ('in', 2), ('belgium', 2), ('games', 2), ('are', 2), ('stocks', 2), ('by', 2), ('bfv', 2), ('sales', 2), ('people', 2), ('ea:', 1), ('youtube', 1), (\"creator's\", 1), ('disclosure', 1), ('not', 1), ('content', 1), ('head', 1), ('misconduct', 1), ('zelda', 1), ('botw', 1), ('mass', 1), (\"effect's\", 1), ('franchise', 1), ('continuing', 1), ('automatically', 1), ('loses.', 1), ('conference.', 1), ('and', 1), ('got', 1), ('downgraded', 1), ('says', 1), ('singleplayer', 1), ('god', 1), ('goty.', 1), ('massive', 1), ('blow', 1), ('&', 1), ('plummet!', 1), ('should', 1), ('space', 1), ('ip', 1), ('falling', 1), ('apart.', 1), ('we', 1), ('plummet', 1), ('$21', 1), ('billion.', 1), ('blame', 1), (\"ea's\", 1), ('cancelled', 1), ('vancouver', 1), ('execs', 1), ('dump', 1), ('millions', 1)]\n" ] } ], "source": [ "neg_titles = []\n", "for title in post_dict['Electronic Arts']:\n", " sentiment = sid.polarity_scores(title)['compound']\n", " \n", " if sentiment <= -0.5:\n", " neg_titles.append(title)\n", "\n", "\n", "ea_words = ' '.join(neg_titles).split(' ')\n", "\n", "neu_words=[]\n", "\n", "for word in ea_words:\n", " sentiment = sid.polarity_scores(word)['compound']\n", " \n", " if (sentiment >= -0.4 and sentiment <= 0.4):\n", " neu_words.append(word.lower())\n", "\n", "neu_freq = Counter(neu_words)\n", "\n", "print('Most common neutral words in negative titles: ', neu_freq.most_common(75))" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Because EA has a smaller sample, we should look at several words to build more intuition." ] }, { "cell_type": "code", "execution_count": 421, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "EA Cancels Open-World Star Wars Game\n", "EA Is Hiring For An Open-World Star Wars Game\n", "EA: YouTube creator's Anthem video removed for disclosure failure, not content\n", "EA is under criminal investigation for continuing to sell Lootboxes in Belgium\n", "EA Automatically loses. Horrible Conference. And Anthem got DOWNGRADED\n", "EA's Open World Star Wars Game Cancelled\n", "EA Vancouver hiring for open world Star Wars game\n", "EA is hiring people for an open world star wars game...\n", "EA cancels open world Star Wars game\n", "EA is under criminal investigation by the Belgium government for FIFA lootboxes\n" ] } ], "source": [ "index = 0\n", "bad_count = 0\n", "# Stop after 10 matches or when the titles run out\n", "while bad_count < 10 and index < len(neg_titles):\n", " if 'star' in neg_titles[index].lower() \\\n", " or 'anthem' in neg_titles[index].lower() \\\n", " or 'lootboxes' in neg_titles[index].lower():\n", " \n", " print(neg_titles[index])\n", " bad_count += 1\n", " index += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most critical complaints concern Anthem and lootboxes, while the negative sentiment around Star Wars seems to be mostly disappointment that a game was cancelled, judging by the several posts on the topic. Again, the word counts are small because there weren't many posts, but we can still extract a good amount of information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final Remarks\n", "\n", "There is plenty more that can be done, like gathering data from the comments. This would provide much more input and expose even more opinionated text, giving a better picture of how people feel about different companies. 
This is at least a taste of what can be done." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }