{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# HIDDEN\n", "import numpy as np\n", "from datascience import *\n", "np.set_printoptions(threshold=50)\n", "\n", "url = 'http://www.census.gov/popest/data/national/asrh/2014/files/NC-EST2014-AGESEX-RES.csv'\n", "full = Table.read_table(url)\n", "partial = full.select(['SEX', 'AGE', 4, 8])\n", "simple = partial.relabeled(2, '2010').relabeled(3, '2014')\n", "census = simple.with_columns([\n", " 'Change', simple.column('2014')-simple.column('2010'),\n", " 'Growth', (simple.column('2014')/simple.column('2010')) ** (1/4) - 1\n", " ])\n", "census.set_format('Growth', PercentFormatter)\n", "census.set_format([2, 3, 4], NumberFormatter)\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we will continue to use the US Census dataset. We will focus only on the 2014 population estimate." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SEX AGE 2014
0 0 3,948,350
0 1 3,962,123
0 2 3,957,772
0 3 4,005,190
0 4 4,003,448
0 5 4,004,858
0 6 4,134,352
0 7 4,154,000
0 8 4,119,524
0 9 4,106,832
\n", "

... (293 rows omitted)\n", " \n", " \n", " Male/Female Age Group Population\n", " \n", " \n", " \n", " \n", " Total Baby 3948350 \n", " \n", " \n", " \n", " Total Baby 3962123 \n", " \n", " \n", " \n", " Total Child 3957772 \n", " \n", " \n", " \n", " Total Child 4005190 \n", " \n", " \n", " \n", " Total Child 4003448 \n", " \n", " \n", " \n", " Total Child 4004858 \n", " \n", " \n", " \n", " Total Child 4134352 \n", " \n", " \n", " \n", " Total Child 4154000 \n", " \n", " \n", " \n", " Total Child 4119524 \n", " \n", " \n", " \n", " Total Child 4106832 \n", " \n", " \n", "\n", "

... (293 rows omitted)\n", " \n", " \n", " Age Group count\n", " \n", " \n", " \n", " \n", " Adult 243 \n", " \n", " \n", " \n", " Baby 6 \n", " \n", " \n", " \n", " Child 33 \n", " \n", " \n", " \n", " Teen 21 \n", " \n", " \n", "" ], "text/plain": [ "Age Group | count\n", "Adult | 243\n", "Baby | 6\n", "Child | 33\n", "Teen | 21" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population.group('Age Group')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The optional second argument names the function that will be used to aggregate values in other columns for all of those rows. For instance, `sum` will sum up the populations in all rows that match each category. This result also contains one row per unique value in the grouped column, but it has the same number of columns as the original table." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Age Group Population
Baby 3948350
Baby 3962123
Child 3957772
Child 4005190
Child 4003448
Child 4004858
Child 4134352
Child 4154000
Child 4119524
Child 4106832
\n", "

... (91 rows omitted)\n", " \n", " \n", " Age Group Population sum\n", " \n", " \n", " \n", " \n", " Adult 236721454 \n", " \n", " \n", " \n", " Baby 7910473 \n", " \n", " \n", " \n", " Child 44755656 \n", " \n", " \n", " \n", " Teen 29469473 \n", " \n", " \n", "" ], "text/plain": [ "Age Group | Population sum\n", "Adult | 236721454\n", "Baby | 7910473\n", "Child | 44755656\n", "Teen | 29469473" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "totals.group('Age Group', sum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `groups` method behaves in the same way, but accepts a list of columns as its first argument. The resulting table has one row for every *unique combination* of values that appear together in the grouped columns. Again, a single argument (a list, in this case) gives row counts." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Male/Female Age Group count
Female Adult 81
Female Baby 2
Female Child 11
Female Teen 7
Male Adult 81
Male Baby 2
Male Child 11
Male Teen 7
Total Adult 81
Total Baby 2
\n", "

... (2 rows omitted)\n", " \n", " \n", " Male/Female Age Group Population sum\n", " \n", " \n", " \n", " \n", " Female Adult 121754366 \n", " \n", " \n", " \n", " Female Baby 3869363 \n", " \n", " \n", " \n", " Female Child 21903805 \n", " \n", " \n", " \n", " Female Teen 14393035 \n", " \n", " \n", " \n", " Male Adult 114967088 \n", " \n", " \n", " \n", " Male Baby 4041110 \n", " \n", " \n", " \n", " Male Child 22851851 \n", " \n", " \n", " \n", " Male Teen 15076438 \n", " \n", " \n", " \n", " Total Adult 236721454 \n", " \n", " \n", " \n", " Total Baby 7910473 \n", " \n", " \n", "\n", "

... (2 rows omitted)\n", " \n", " \n", " Age Group Female Male Total\n", " \n", " \n", " \n", " \n", " Adult 81 81 81 \n", " \n", " \n", " \n", " Baby 2 2 2 \n", " \n", " \n", " \n", " Child 11 11 11 \n", " \n", " \n", " \n", " Teen 7 7 7 \n", " \n", " \n", "" ], "text/plain": [ "Age Group | Female | Male | Total\n", "Adult | 81 | 81 | 81\n", "Baby | 2 | 2 | 2\n", "Child | 11 | 11 | 11\n", "Teen | 7 | 7 | 7" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population.pivot('Male/Female', 'Age Group')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An optional third argument indicates a column of values that will replace the counts in each cell of the grid. The fourth argument indicates how to aggregate all of the values that match the combination of column and row values." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Age Group Female Male Total
Adult 121754366 114967088 236721454
Baby 3869363 4041110 7910473
Child 21903805 22851851 44755656
Teen 14393035 15076438 29469473
" ], "text/plain": [ "Age Group | Female | Male | Total\n", "Adult | 121754366 | 114967088 | 236721454\n", "Baby | 3869363 | 4041110 | 7910473\n", "Child | 21903805 | 22851851 | 44755656\n", "Teen | 14393035 | 15076438 | 29469473" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pivoted = population.pivot('Male/Female', 'Age Group', 'Population', sum)\n", "pivoted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The advantage of pivot is that it places grouped values into adjacent columns, so that they can be combined. For instance, this pivoted table allows us to compute the proportion of each age group that is male. We find the surprising result that younger age groups are predominantly male, but among adults there are substantially more females." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Age Group Female Male Total Male Percentage
Adult 121754366 114967088 236721454 48.57%
Baby 3869363 4041110 7910473 51.09%
Child 21903805 22851851 44755656 51.06%
Teen 14393035 15076438 29469473 51.16%
" ], "text/plain": [ "Age Group | Female | Male | Total | Male Percentage\n", "Adult | 121754366 | 114967088 | 236721454 | 48.57%\n", "Baby | 3869363 | 4041110 | 7910473 | 51.09%\n", "Child | 21903805 | 22851851 | 44755656 | 51.06%\n", "Teen | 14393035 | 15076438 | 29469473 | 51.16%" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pivoted.with_column('Male Percentage', pivoted.column('Male')/pivoted.column('Total')).set_format(4, PercentFormatter)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }