{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Comparing Post Response Volumes on Hacker News" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This project looks at roughly 20,000 randomly selected posts on Hacker News. Of those, posts in either the \"Ask HN\" or the \"Show HN\" category are analyzed to see which group generates more comments. For Ask HN posts, we also look at how the average number of comments varies with the hour at which a post is published.\n", "The only selection criterion was that each post must have received at least one comment." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']\n", "[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', \"Florida DJs May Face Felony for April Fools' Water Joke\", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]\n" ] } ], "source": [ "# Read the dataset into a header row and a list of data rows\n", "from csv import reader\n", "\n", "with open('hacker_news.csv') as opened_file:\n", "    data_file = list(reader(opened_file))\n", "\n", "hn_header = data_file[0]\n", "hn = 
data_file[1:]\n", "print(hn_header)\n", "print(hn[:5])" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1744\n", "1162\n", "17194\n" ] } ], "source": [ "# Separate the posts into three mutually exclusive groups\n", "ask_posts = []\n", "show_posts = []\n", "other_posts = []\n", "\n", "for row in hn:\n", "    title = row[1].lower()\n", "    if title.startswith('ask hn'):\n", "        ask_posts.append(row)\n", "    elif title.startswith('show hn'):  # elif keeps Ask HN posts out of other_posts\n", "        show_posts.append(row)\n", "    else:\n", "        other_posts.append(row)\n", "\n", "print(len(ask_posts))\n", "print(len(show_posts))\n", "print(len(other_posts))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14.038417431192661\n", "10.31669535283993\n" ] } ], "source": [ "# Average number of comments per post in the \"Ask HN\" category\n", "total_ask_comments = 0\n", "for row in ask_posts:\n", "    total_ask_comments += int(row[4])\n", "avg_ask_comments = total_ask_comments / len(ask_posts)\n", "print(avg_ask_comments)\n", "\n", "# Average number of comments per post in the \"Show HN\" category\n", "total_show_comments = 0\n", "for row in show_posts:\n", "    total_show_comments += int(row[4])\n", "avg_show_comments = total_show_comments / len(show_posts)\n", "print(avg_show_comments)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison of Ask HN and Show HN\n", "* Ask HN posts receive an average of about 14.0 comments per post, compared with about 10.3 for Show HN posts.\n", "* That is roughly 36% more comments per post for Ask HN.\n", "* This may not be too surprising, since asking a question inherently elicits a response." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import datetime as dt\n", "\n", "# Collect [created_at, num_comments] for every Ask HN post\n", "result_list = []\n", "for row in ask_posts:\n", "    created_at = row[6]  # only the hour is used below\n", "    num_comments = int(row[4])\n", "    result_list.append([created_at, num_comments])\n", "\n", "counts_by_hour = {}    # number of posts created in each hour\n", "comments_by_hour = {}  # total comments on posts created in each hour\n", "\n", "for row in result_list:\n", "    post_time_str = row[0]\n", "    comment_num = row[1]\n", "    post_time_dt = dt.datetime.strptime(post_time_str, \"%m/%d/%Y %H:%M\")\n", "    post_time_hour = post_time_dt.hour\n", "    if post_time_hour not in counts_by_hour:\n", "        counts_by_hour[post_time_hour] = 1\n", "        comments_by_hour[post_time_hour] = comment_num\n", "    else:\n", "        counts_by_hour[post_time_hour] += 1\n", "        comments_by_hour[post_time_hour] += comment_num" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Average comments per post for each hour, stored as [hour, average] pairs\n", "avg_by_hour = []\n", "\n", "for hour in counts_by_hour:\n", "    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 5 Hours for Ask Post Comments\n", "Adjusted to Pacific Time\n", "12:00: 38.59 average comments per post.\n", "23:00: 23.81 average comments per post.\n", "17:00: 21.52 average comments per post.\n", "13:00: 16.80 average comments per post.\n", "18:00: 16.01 average comments per post.\n" ] } ], "source": [ "# Swap each pair to [average, hour] so the list can be sorted by average\n", "swap_avg_by_hour = []\n", "\n", "for row in avg_by_hour:\n", "    hour = row[0]\n", "    avg = row[1]\n", "    swap_avg_by_hour.append([avg, hour])\n", "\n", "sorted_swap = sorted(swap_avg_by_hour, key=lambda x: x[0], reverse=True)\n", "print(\"Top 5 Hours for Ask Post Comments\")\n", "print(\"Adjusted to Pacific Time\")\n", "for row in sorted_swap[:5]:\n", " hour = 
str(row[1])\n", " comments = row[0]\n", " hour_dt = dt.datetime.strptime(hour, \"%H\")\n", " # Shift back 3 hours to Pacific Time; this assumes created_at is in US Eastern Time\n", " hour_dt = hour_dt - dt.timedelta(hours=3)\n", " hour_dt = hour_dt.strftime(\"%H:%M\")\n", " print(\"{}: {:.2f} average comments per post.\".format(hour_dt, comments))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "* Noon and 11 p.m. (Pacific Time) are the hours with the highest average number of comments per post in the Ask HN category.\n", "* The 5 p.m. and 6 p.m. slots may be worth considering as well. Both have relatively high averages, and the wider two-hour window may suit some posting schedules." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }