{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Which city makes the most hip hop, relative to population?\n", "\n", "First, load up the data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import sqlite3\n", "import pandas as pd\n", "from plotly import tools\n", "from plotly.offline import init_notebook_mode, iplot\n", "from plotly.graph_objs import Bar, Scatter, Figure, Layout\n", "init_notebook_mode()\n", "\n", "# start connection to sqlite data, grab data, then close connection\n", "con = sqlite3.connect('data.db')\n", "tags = pd.read_sql_query('SELECT * from tags;', con)\n", "cities = pd.read_sql_query('SELECT * from cities;', con)\n", "con.close()\n", "\n", "# first, get the overall frequency of each tag\n", "tagfrequency = tags.tag.value_counts()\n", "\n", "# make tag frequency a dataframe, clean it up\n", "tagfrequency = tagfrequency.to_frame(name = 'frequency')\n", "tagfrequency['city'] = tagfrequency.index\n", "tagfrequency.reset_index(inplace = True, drop = True)\n", "\n", "# second, merge the tag frequency with each city in the database\n", "merged = pd.merge(cities, tagfrequency, on = 'city')\n", "# ratio = tags per 100,000 residents\n", "merged['ratio'] = merged.frequency / (merged.population / 100000.0) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tag Frequency vs. 2010 Population" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sh = Scatter(x = merged.population, \n", " y = merged.frequency, \n", " mode = 'markers', \n", " text = merged.city, \n", " hoverinfo = 'text',\n", " marker = dict(\n", " size = 12,\n", " color = 'rgba(0, 0, 152, .5)',\n", " )\n", ")\n", "\n", "layout = Layout(\n", " xaxis = dict(\n", " type='log',\n", " title = 'Population (Log Scale)'\n", " ),\n", " yaxis = dict(title = 'What.CD Tag Frequency'),\n", " hovermode = 'closest',\n", " showlegend=False\n", ")\n", "\n", "fh = Figure(data=[sh], layout = layout)\n", "iplot(fh)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Most Productive Cities\n", "\n", "Unsurprisingly, the biggest cities are also the ones that are most frequently tagged (though note Los Angeles has a particularly poor showing: Houston, a city 1/3 its size, is making considerably more hip hop). The real question is about which cities make a surprising amount of hip hop, given their population:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ] [ (1,2) x2,y1 ]\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "merged = merged.sort_values(by = 'ratio', ascending = False)\n", "\n", "# plot scatter of ratio against population\n", "plotdata = merged.iloc[0:50,:]\n", "sh = Scatter(\n", " x = plotdata.population, \n", " y = plotdata.ratio, \n", " mode = 'markers', \n", " text = plotdata.city, \n", " hoverinfo = 'text',\n", " marker = dict(\n", " size = 12, color = 'rgba(0, 0, 152, .5)',\n", " )\n", ")\n", "\n", "\n", "# plot bars for top 15 cities\n", "plotdata = merged.iloc[0:15,:]\n", "bh = Bar(\n", " x = plotdata.city, \n", " y = plotdata.ratio, \n", " hoverinfo = 'x+y'\n", ")\n", "\n", "\n", "# layout\n", "fh = tools.make_subplots(rows = 1, cols = 2, \n", " shared_yaxes = True,\n", " horizontal_spacing = 0.02,\n", " subplot_titles = ['Top 50 Cities','Top 10 Cities'])\n", "fh['layout']['showlegend'] = False\n", "fh['layout']['hovermode'] = 'closest'\n", "fh['layout']['xaxis1'].update(title='2010 Population (Log Scale)', type='log')\n", "fh['layout']['yaxis'].update(title='Tags Per 100k People')\n", "\n", "fh.append_trace(sh,1,1)\n", "fh.append_trace(bh,1,2)\n", "\n", "iplot(fh)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Cash Money Records](http://www.cashmoney-records.com/) has been super prolific! Per person, New Orleans is leader in hip hop, with Atlanta not too far behind. Overall, the ranking is dominated by big (but not *huge*) cities." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 1 }