{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Which city makes the most hip hop, relative to population?\n",
"\n",
"First, load up the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import sqlite3\n",
"import pandas as pd\n",
"from plotly import tools\n",
"from plotly.offline import init_notebook_mode, iplot\n",
"from plotly.graph_objs import Bar, Scatter, Figure, Layout\n",
"init_notebook_mode()\n",
"\n",
"# start connection to sqlite data, grab data, then close connection\n",
"con = sqlite3.connect('data.db')\n",
"tags = pd.read_sql_query('SELECT * from tags;', con)\n",
"cities = pd.read_sql_query('SELECT * from cities;', con)\n",
"con.close()\n",
"\n",
"# first, get the overall frequency of each tag\n",
"tagfrequency = tags.tag.value_counts()\n",
"\n",
"# make tag frequency a dataframe, clean it up\n",
"tagfrequency = tagfrequency.to_frame(name = 'frequency')\n",
"tagfrequency['city'] = tagfrequency.index\n",
"tagfrequency.reset_index(inplace = True, drop = True)\n",
"\n",
"# second, merge the tag frequency with each city in the database\n",
"merged = pd.merge(cities, tagfrequency, on = 'city')\n",
"# normalize: tags per 100,000 residents\n",
"merged['ratio'] = merged.frequency / (merged.population / 100000.0)"
]
},
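{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counting and normalization steps above can be checked on a tiny, hypothetical dataset. The city names and populations below are made up for illustration, and the `rename_axis` / `reset_index(name=...)` chain is just one alternative to the `to_frame` + `reset_index` cleanup used above. It should yield the same `city` and `frequency` columns, with `ratio` again meaning tags per 100,000 residents. The `toy_` prefixes keep this cell from overwriting the real `tags`, `cities` and `merged` frames."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy stand-ins for the `tags` and `cities` tables\n",
"toy_tags = pd.DataFrame({'tag': ['City A', 'City A', 'City A', 'City B', 'City B']})\n",
"toy_cities = pd.DataFrame({'city': ['City A', 'City B'], 'population': [300000, 2000000]})\n",
"\n",
"# count tags per city and turn the counts into a two-column frame (city, frequency)\n",
"toy_freq = (toy_tags.tag.value_counts()\n",
"            .rename_axis('city')\n",
"            .reset_index(name='frequency'))\n",
"\n",
"# inner join keeps only cities present in both tables\n",
"toy_merged = pd.merge(toy_cities, toy_freq, on='city')\n",
"\n",
"# tags per 100,000 residents: City A -> 3 / 3.0 = 1.0, City B -> 2 / 20.0 = 0.1\n",
"toy_merged['ratio'] = toy_merged.frequency / (toy_merged.population / 100000.0)\n",
"toy_merged"
]
},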
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tag Frequency vs. 2010 Population"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sh = Scatter(x = merged.population, \n",
" y = merged.frequency, \n",
" mode = 'markers', \n",
" text = merged.city, \n",
" hoverinfo = 'text',\n",
" marker = dict(\n",
" size = 12,\n",
" color = 'rgba(0, 0, 152, .5)',\n",
" )\n",
")\n",
"\n",
"layout = Layout(\n",
" xaxis = dict(\n",
" type='log',\n",
" title = 'Population (Log Scale)'\n",
" ),\n",
" yaxis = dict(title = 'What.CD Tag Frequency'),\n",
" hovermode = 'closest',\n",
" showlegend=False\n",
")\n",
"\n",
"fh = Figure(data=[sh], layout = layout)\n",
"iplot(fh)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Most Productive Cities\n",
"\n",
"Unsurprisingly, the biggest cities are also the most frequently tagged (though Los Angeles has a particularly poor showing: Houston, a city 1/3 its size, is tagged considerably more often). The more interesting question is which cities make a surprising amount of hip hop given their population:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is the format of your plot grid:\n",
"[ (1,1) x1,y1 ] [ (1,2) x2,y1 ]\n",
"\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"merged = merged.sort_values(by = 'ratio', ascending = False)\n",
"\n",
"# plot scatter of ratio against population\n",
"plotdata = merged.iloc[0:50,:]\n",
"sh = Scatter(\n",
" x = plotdata.population, \n",
" y = plotdata.ratio, \n",
" mode = 'markers', \n",
" text = plotdata.city, \n",
" hoverinfo = 'text',\n",
" marker = dict(\n",
" size = 12, color = 'rgba(0, 0, 152, .5)',\n",
" )\n",
")\n",
"\n",
"\n",
"# plot bars for top 15 cities\n",
"plotdata = merged.iloc[0:15,:]\n",
"bh = Bar(\n",
" x = plotdata.city, \n",
" y = plotdata.ratio, \n",
" hoverinfo = 'x+y'\n",
")\n",
"\n",
"\n",
"# layout\n",
"fh = tools.make_subplots(rows = 1, cols = 2, \n",
" shared_yaxes = True,\n",
" horizontal_spacing = 0.02,\n",
" subplot_titles = ['Top 50 Cities','Top 15 Cities'])\n",
"fh['layout']['showlegend'] = False\n",
"fh['layout']['hovermode'] = 'closest'\n",
"fh['layout']['xaxis1'].update(title='2010 Population (Log Scale)', type='log')\n",
"fh['layout']['yaxis'].update(title='Tags Per 100k People')\n",
"\n",
"fh.append_trace(sh,1,1)\n",
"fh.append_trace(bh,1,2)\n",
"\n",
"iplot(fh)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Cash Money Records](http://www.cashmoney-records.com/) has been super prolific! Per person, New Orleans is the leader in hip hop, with Atlanta not far behind. Overall, the ranking is dominated by big (but not *huge*) cities."
]
}
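,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The subplot code above uses `tools.make_subplots` and `append_trace` from older plotly releases. Newer releases (4.0 onward, as far as I know) provide `plotly.subplots.make_subplots` and prefer `add_trace(..., row=, col=)`. Below is a minimal sketch of the same two-panel layout under that assumption; `demo` is a hypothetical three-row frame with made-up values standing in for `merged`, and the figure can be displayed with `fig.show()` or `iplot(fig)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import plotly.graph_objects as go\n",
"from plotly.subplots import make_subplots\n",
"\n",
"# hypothetical toy data with the same columns as `merged`\n",
"demo = pd.DataFrame({\n",
"    'city': ['A', 'B', 'C'],\n",
"    'population': [300000, 1200000, 5000000],\n",
"    'ratio': [4.0, 2.5, 0.8],\n",
"}).sort_values(by='ratio', ascending=False)\n",
"\n",
"fig = make_subplots(rows=1, cols=2, shared_yaxes=True,\n",
"                    horizontal_spacing=0.02,\n",
"                    subplot_titles=['Top 50 Cities', 'Top 15 Cities'])\n",
"\n",
"# left panel: ratio against population, one marker per city\n",
"fig.add_trace(go.Scatter(x=demo.population, y=demo.ratio, mode='markers',\n",
"                         text=demo.city, hoverinfo='text'), row=1, col=1)\n",
"# right panel: the same ratios as bars\n",
"fig.add_trace(go.Bar(x=demo.city, y=demo.ratio, hoverinfo='x+y'), row=1, col=2)\n",
"\n",
"# axis settings target only the left subplot; the y axis is shared\n",
"fig.update_xaxes(type='log', title_text='2010 Population (Log Scale)', row=1, col=1)\n",
"fig.update_yaxes(title_text='Tags Per 100k People', row=1, col=1)\n",
"fig.update_layout(showlegend=False, hovermode='closest')"
]
}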
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}