{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is the data munging part of the visualization of the interconnectedness of my top 30\\* most-edited articles on Wikipedia (I go by [Resident Mario](https://en.wikipedia.org/wiki/User:Resident_Mario) on the encyclopedia), as reported by IBM Watson's [Concept Insights](http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/concept-insights.html) API. The data is scraped from the [Supercount Wikimedia Lab tool](https://tools.wmflabs.org/supercount/) with `requests` and `beautifulsoup`, interwoven using `watsongraph`, and visualized using `d3.js`.\n", "\n", "The same techniques could easily be applied to any editor! A [widget](https://github.com/jdfreder/ipython-d3networkx/blob/master/examples/demo%20simple.ipynb) for visualizing any editor's top articles is planned once `watsongraph` reaches its `0.3.0` release.\n", "\n", "\\* The cutoff is due to a [technical limitation](https://github.com/ResidentMario/watsongraph/issues/8)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from watsongraph.conceptmodel import ConceptModel\n", "from watsongraph.node import conceptualize\n", "import json\n", "import requests\n", "import bs4\n", "\n", "def get_top_thirty_articles(username):\n", " \"\"\"\n", " Performs a raw call to the Supercount edit counter and parses the response to extract the list of links\n", " on that page.\n", " Output looks like this:\n", " [