{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ranking Subreddits by Comments, Authors and Comment/Author Ratios\n", "\n", "A month ago, Redditor [/u/Stuck_In_the_Matrix](https://www.reddit.com/user/Stuck_In_the_Matrix) released a [huge dataset of Reddit comments](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/) with more than 1.7 billion records. Redditor [/u/fhoffa](https://www.reddit.com/user/fhoffa) ([Felipe Hoffa](https://twitter.com/felipehoffa)) made this dataset [available via Google BigQuery](https://www.reddit.com/r/bigquery/comments/3cej2b/17_billion_reddit_comments_loaded_on_bigquery/).\n", "\n", "Felipe provided several query examples and also created a table of [Subreddit ranks for May 2015](https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.subr_rank_201505). This table includes comment and author counts aggregated from more than 54 million comments posted in that month. You need a Google account with billing enabled to download this dataset.\n", "\n", "In this notebook we'll take a quick look at this table and create a few charts ranking Subreddits by number of comments, number of authors and comments per author. This table will also be used in [future notebooks](https://ramiro.org/notebook/rss.xml) to calculate values relative to the total number of comments or authors within a Subreddit." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
| subreddit | comments | rank_comments | authors | rank_authors |
|---|---|---|---|---|
| AskReddit | 3883198 | 1 | 570722 | 1 |
| leagueoflegends | 1148203 | 2 | 119316 | 7 |
| nba | 704755 | 3 | 45029 | 22 |
| funny | 690898 | 4 | 224069 | 2 |
| pics | 564366 | 5 | 205298 | 3 |