{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring the Top Incomes Database with Pandas and Matplotlib\n", "\n", "**Author:** [Ramiro Gómez](https://ramiro.org/)\n", "\n", "The [World Top Incomes Database](http://topincomes.g-mond.parisschoolofeconomics.eu/) grew out of Thomas Piketty's 2001 research on the distribution of top incomes in France. It has since gathered information for more than 20 countries, producing a large volume of data intended as a resource for further analysis and research. The database is compiled and maintained by Facundo Alvaredo, Tony Atkinson, Thomas Piketty and Emmanuel Saez.\n", "\n", "The income data explored in this notebook was downloaded on July 25, 2015." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%load_ext signature\n", "%matplotlib inline\n", "\n", "import itertools\n", "import math\n", "\n", "import pandas as pd\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "\n", "plt.style.use('ramiro')\n", "\n", "df = pd.read_excel('csv/top-incomes.xlsx', 1, skiprows=1)\n", "\n", "chartinfo = 'Author: Ramiro Gómez - ramiro.org • Data: World Top Incomes Database - parisschoolofeconomics.eu'\n", "infosize = 13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
| | Country | Year | Top 10% income share | Top 10% income share-LAD | Top 10% income share-married couples & single adults | Top 10% income share-adults | Top 10% income share-tax data | Top 10% income share-IDS | Top 5% income share | Top 5% income share-LAD | ... | Top 0.5-0.1% average income-including capital gains | Top 0.1-0.01% average income-including capital gains | P90 income threshold-including capital gains | P95 income threshold-including capital gains | P99 income threshold-including capital gains | P99.5 income threshold-including capital gains | P99.9 income threshold-including capital gains | P99.99 income threshold-including capital gains | Pareto-Lorenz coefficient | Inverted Pareto-Lorenz coefficient |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 1932 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.659 | 2.517 |
| 1 | Argentina | 1933 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.674 | 2.484 |
| 2 | Argentina | 1934 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.652 | 2.534 |
| 3 | Argentina | 1935 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.655 | 2.526 |
| 4 | Argentina | 1936 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.723 | 2.382 |

5 rows × 412 columns
\n", "