{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Foundations of Computational Economics #15\n", "\n", "by Fedor Iskhakov, ANU\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## Introduction to Data Manipulation in Python (Pandas)\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "[https://youtu.be/61pHVbZubmo](https://youtu.be/61pHVbZubmo)\n", "\n", "Description: Introduction to DataFrames, grouping and data merging." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Scientific stack in Python\n", "\n", "Collection of modules (libraries) used in scientific numerical computations:\n", "\n", "- **``NumPy``** is a widely used scientific computing package that implements fast array processing (vectorization) \n", "- **``SciPy``** is a collection of functions that perform common scientific operations (optimization, root finding, interpolation, numerical integration, etc.) \n", "- **``Pandas``** is a data manipulation package with special data types and methods \n", "- **``Numba``** is a just-in-time (JIT) compiler for a subset of Python and NumPy functions \n", "- **``Matplotlib``** is used for making figures and plots " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### What is Pandas?\n", "\n", "- Pandas provides classes for working with data (`Series`, `DataFrame`) \n", "- Data objects have **methods** for manipulating data, e.g. indexing, sorting, grouping, filling in missing data \n", "- Pandas does not provide modeling tools, e.g.
regression, prediction \n", "- These tools are found in packages such as `scikit-learn` and `statsmodels`, which work with pandas data structures " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### DataFrames\n", "\n", "A `DataFrame` combines multiple ‘columns’ of data into a\n", "two-dimensional object, similar to a spreadsheet\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Create and explore the dataframe object" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import pandas as pd\n", "data = pd.read_csv('./_static/data/recent-grads.csv')\n", "# help(data) # more help on dataframe object and its methods" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major_code Major Total \\\n", "0 1 2419 PETROLEUM ENGINEERING 2339.0 \n", "1 2 2416 MINING AND MINERAL ENGINEERING 756.0 \n", "2 3 2415 METALLURGICAL ENGINEERING 856.0 \n", "3 4 2417 NAVAL ARCHITECTURE AND MARINE ENGINEERING 1258.0 \n", "4 5 2405 CHEMICAL ENGINEERING 32260.0 \n", "5 6 2418 NUCLEAR ENGINEERING 2573.0 \n", "6 7 6202 ACTUARIAL SCIENCE 3777.0 \n", "7 8 5001 ASTRONOMY AND ASTROPHYSICS 1792.0 \n", "8 9 2414 MECHANICAL ENGINEERING 91227.0 \n", "9 10 2408 ELECTRICAL ENGINEERING 81527.0 \n", "\n", " Men Women Major_category ShareWomen Sample_size Employed \\\n", "0 2057.0 282.0 Engineering 0.120564 36 1976 \n", "1 679.0 77.0 Engineering 0.101852 7 640 \n", "2 725.0 131.0 Engineering 0.153037 3 648 \n", "3 1123.0 135.0 Engineering 0.107313 16 758 \n", "4 21239.0 11021.0 Engineering 0.341631 289 25694 \n", "5 2200.0 373.0 Engineering 0.144967 17 1857 \n", "6 2110.0 1667.0 Business 0.441356 51 2912 \n", "7 832.0 960.0 Physical Sciences 0.535714 10 1526 \n", "8 80320.0 10907.0 Engineering 0.119559 1029 76442 \n", "9 65511.0 16016.0 Engineering 0.196450 631 61928 \n", "\n", " ... Part_time Full_time_year_round Unemployed Unemployment_rate \\\n", "0 ... 270 1207 37 0.018381 \n", "1 ... 170 388 85 0.117241 \n", "2 ... 133 340 16 0.024096 \n", "3 ... 150 692 40 0.050125 \n", "4 ... 5180 16697 1672 0.061098 \n", "5 ... 264 1449 400 0.177226 \n", "6 ... 296 2482 308 0.095652 \n", "7 ... 553 827 33 0.021167 \n", "8 ... 13101 54639 4650 0.057342 \n", "9 ... 
12695 41413 3895 0.059174 \n", "\n", " Median P25th P75th College_jobs Non_college_jobs Low_wage_jobs \n", "0 110000 95000 125000 1534 364 193 \n", "1 75000 55000 90000 350 257 50 \n", "2 73000 50000 105000 456 176 0 \n", "3 70000 43000 80000 529 102 0 \n", "4 65000 50000 75000 18314 4440 972 \n", "5 65000 50000 102000 1142 657 244 \n", "6 62000 53000 72000 1768 314 259 \n", "7 62000 31500 109000 972 500 220 \n", "8 60000 48000 70000 52844 16384 3253 \n", "9 60000 45000 72000 45829 10874 3170 \n", "\n", "[10 rows x 21 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Preview of the dataset\n", "data.head(n=10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Description of the data\n", "\n", "**The Economic Guide To Picking A College Major**\n", "\n", "[https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/)\n", "\n", "Data dictionary available at\n", "\n", "[https://github.com/fivethirtyeight/data/tree/master/college-majors](https://github.com/fivethirtyeight/data/tree/master/college-majors)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 173 entries, 0 to 172\n", "Data columns (total 21 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Rank 173 non-null int64 \n", " 1 Major_code 173 non-null int64 \n", " 2 Major 173 non-null object \n", " 3 Total 172 non-null float64\n", " 4 Men 172 non-null float64\n", " 5 Women 172 non-null float64\n", " 6 Major_category 173 non-null object \n", " 7 ShareWomen 172 non-null float64\n", " 8 Sample_size 173 non-null int64 \n", " 9 Employed 173 non-null int64 \n", " 10 Full_time 173 non-null int64 \n", 
" 11 Part_time 173 non-null int64 \n", " 12 Full_time_year_round 173 non-null int64 \n", " 13 Unemployed 173 non-null int64 \n", " 14 Unemployment_rate 173 non-null float64\n", " 15 Median 173 non-null int64 \n", " 16 P25th 173 non-null int64 \n", " 17 P75th 173 non-null int64 \n", " 18 College_jobs 173 non-null int64 \n", " 19 Non_college_jobs 173 non-null int64 \n", " 20 Low_wage_jobs 173 non-null int64 \n", "dtypes: float64(5), int64(14), object(2)\n", "memory usage: 28.5+ KB\n" ] } ], "source": [ "# Info on the dataset\n", "data.info()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Access individual columns of data\n", "\n", "This returns a `Series` object" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of data1 is \n" ] }, { "data": { "text/plain": [ "0 PETROLEUM ENGINEERING\n", "1 MINING AND MINERAL ENGINEERING\n", "2 METALLURGICAL ENGINEERING\n", "3 NAVAL ARCHITECTURE AND MARINE ENGINEERING\n", "4 CHEMICAL ENGINEERING\n", "Name: Major, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data1 = data['Major']\n", "print('Type of data1 is ',type(data1))\n", "data1.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Access multiple columns of data\n", "\n", "This returns a `DataFrame` object again" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of data2 is \n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Major ShareWomen\n", "0 PETROLEUM ENGINEERING 0.120564\n", "1 MINING AND MINERAL ENGINEERING 0.101852\n", "2 METALLURGICAL ENGINEERING 0.153037\n", "3 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.107313\n", "4 CHEMICAL ENGINEERING 0.341631" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = data[['Major', 'ShareWomen']]\n", "print('Type of data2 is ',type(data2))\n", "data2.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Add a new column Stata style" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Total Employed Employment rate\n", "0 2339.0 1976 0.844805\n", "1 756.0 640 0.846561\n", "2 856.0 648 0.757009\n", "3 1258.0 758 0.602544\n", "4 32260.0 25694 0.796466" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['Employment rate'] = data['Employed'] / data['Total']\n", "data[['Total', 'Employed', 'Employment rate']].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Average unemployment rate…" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "6.819083091329481" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['Unemployment_rate'].mean() * 100" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Simple summary statistics\n", "\n", "`.describe()` returns useful summary statistics" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "count 173.000000\n", "mean 0.068191\n", "std 0.030331\n", "min 0.000000\n", "25% 0.050306\n", "50% 0.067961\n", "75% 0.087557\n", "max 0.177226\n", "Name: Unemployment_rate, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['Unemployment_rate'].describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Simple plots\n", "\n", "Pandas also provides a simple way to generate matplotlib plots" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Median salary')" ] }, "execution_count": 9, "metadata": {}, 
"output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "data.plot(x='ShareWomen', y='Median', kind='scatter', figsize=(10, 8), color='red')\n", "plt.xlabel('Share of women')\n", "plt.ylabel('Median salary')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Selecting and filtering\n", "\n", "We can use integer slicing to select rows as follows" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major_code Major Total Men Women \\\n", "0 1 2419 PETROLEUM ENGINEERING 2339.0 2057.0 282.0 \n", "1 2 2416 MINING AND MINERAL ENGINEERING 756.0 679.0 77.0 \n", "2 3 2415 METALLURGICAL ENGINEERING 856.0 725.0 131.0 \n", "\n", " Major_category ShareWomen Sample_size Employed ... \\\n", "0 Engineering 0.120564 36 1976 ... \n", "1 Engineering 0.101852 7 640 ... \n", "2 Engineering 0.153037 3 648 ... \n", "\n", " Full_time_year_round Unemployed Unemployment_rate Median P25th P75th \\\n", "0 1207 37 0.018381 110000 95000 125000 \n", "1 388 85 0.117241 75000 55000 90000 \n", "2 340 16 0.024096 73000 50000 105000 \n", "\n", " College_jobs Non_college_jobs Low_wage_jobs Employment rate \n", "0 1534 364 193 0.844805 \n", "1 350 257 50 0.846561 \n", "2 456 176 0 0.757009 \n", "\n", "[3 rows x 22 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[:3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Majors with the highest share of women\n", "\n", "First we will sort our values by a column in the dataframe" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major_code Major Total \\\n", "164 165 2307 EARLY CHILDHOOD EDUCATION 37589.0 \n", "163 164 6102 COMMUNICATION DISORDERS SCIENCES AND SERVICES 38279.0 \n", "51 52 6104 MEDICAL ASSISTING SERVICES 11123.0 \n", "\n", " Men Women Major_category ShareWomen Sample_size Employed ... \\\n", "164 1167.0 36422.0 Education 0.968954 342 32551 ... \n", "163 1225.0 37054.0 Health 0.967998 95 29763 ... \n", "51 803.0 10320.0 Health 0.927807 67 9168 ... \n", "\n", " Full_time_year_round Unemployed Unemployment_rate Median P25th \\\n", "164 20748 1360 0.040105 28000 21000 \n", "163 14460 1487 0.047584 28000 20000 \n", "51 4290 407 0.042507 42000 30000 \n", "\n", " P75th College_jobs Non_college_jobs Low_wage_jobs Employment rate \n", "164 35000 23515 7705 2868 0.865971 \n", "163 40000 19957 9404 5125 0.777528 \n", "51 65000 2091 6948 1270 0.824238 \n", "\n", "[3 rows x 22 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by='ShareWomen', ascending=False)[:3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Using row index\n", "\n", "Another way to select rows is to use row labels, i.e. set a row index\n", "\n", "Similar to the column labels, we can add row labels (the index)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major Total Men \\\n", "Major_code \n", "2419 1 PETROLEUM ENGINEERING 2339.0 2057.0 \n", "2416 2 MINING AND MINERAL ENGINEERING 756.0 679.0 \n", "2415 3 METALLURGICAL ENGINEERING 856.0 725.0 \n", "2417 4 NAVAL ARCHITECTURE AND MARINE ENGINEERING 1258.0 1123.0 \n", "2405 5 CHEMICAL ENGINEERING 32260.0 21239.0 \n", "\n", " Women Major_category ShareWomen Sample_size Employed \\\n", "Major_code \n", "2419 282.0 Engineering 0.120564 36 1976 \n", "2416 77.0 Engineering 0.101852 7 640 \n", "2415 131.0 Engineering 0.153037 3 648 \n", "2417 135.0 Engineering 0.107313 16 758 \n", "2405 11021.0 Engineering 0.341631 289 25694 \n", "\n", " Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "2419 1849 ... 1207 37 \n", "2416 556 ... 388 85 \n", "2415 558 ... 340 16 \n", "2417 1069 ... 692 40 \n", "2405 23170 ... 16697 1672 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "2419 0.018381 110000 95000 125000 1534 \n", "2416 0.117241 75000 55000 90000 350 \n", "2415 0.024096 73000 50000 105000 456 \n", "2417 0.050125 70000 43000 80000 529 \n", "2405 0.061098 65000 50000 75000 18314 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "2419 364 193 0.844805 \n", "2416 257 50 0.846561 \n", "2415 176 0 0.757009 \n", "2417 102 0 0.602544 \n", "2405 4440 972 0.796466 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.set_index('Major_code').head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Need to overwrite dataframe\n", "\n", "Note: we haven’t actually changed the DataFrame `data`\n", "\n", "Need to overwrite `data` with the new copy" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major_code Major Total \\\n", "0 1 2419 PETROLEUM ENGINEERING 2339.0 \n", "1 2 2416 MINING AND MINERAL ENGINEERING 756.0 \n", "2 3 2415 METALLURGICAL ENGINEERING 856.0 \n", "3 4 2417 NAVAL ARCHITECTURE AND MARINE ENGINEERING 1258.0 \n", "4 5 2405 CHEMICAL ENGINEERING 32260.0 \n", "\n", " Men Women Major_category ShareWomen Sample_size Employed ... \\\n", "0 2057.0 282.0 Engineering 0.120564 36 1976 ... \n", "1 679.0 77.0 Engineering 0.101852 7 640 ... \n", "2 725.0 131.0 Engineering 0.153037 3 648 ... \n", "3 1123.0 135.0 Engineering 0.107313 16 758 ... \n", "4 21239.0 11021.0 Engineering 0.341631 289 25694 ... \n", "\n", " Full_time_year_round Unemployed Unemployment_rate Median P25th P75th \\\n", "0 1207 37 0.018381 110000 95000 125000 \n", "1 388 85 0.117241 75000 55000 90000 \n", "2 340 16 0.024096 73000 50000 105000 \n", "3 692 40 0.050125 70000 43000 80000 \n", "4 16697 1672 0.061098 65000 50000 75000 \n", "\n", " College_jobs Non_college_jobs Low_wage_jobs Employment rate \n", "0 1534 364 193 0.844805 \n", "1 350 257 50 0.846561 \n", "2 456 176 0 0.757009 \n", "3 529 102 0 0.602544 \n", "4 18314 4440 972 0.796466 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Rank Major Total Men \\\n", "Major_code \n", "2419 1 PETROLEUM ENGINEERING 2339.0 2057.0 \n", "2416 2 MINING AND MINERAL ENGINEERING 756.0 679.0 \n", "2415 3 METALLURGICAL ENGINEERING 856.0 725.0 \n", "2417 4 NAVAL ARCHITECTURE AND MARINE ENGINEERING 1258.0 1123.0 \n", "2405 5 CHEMICAL ENGINEERING 32260.0 21239.0 \n", "\n", " Women Major_category ShareWomen Sample_size Employed \\\n", "Major_code \n", "2419 282.0 Engineering 0.120564 36 1976 \n", "2416 77.0 Engineering 0.101852 7 640 \n", "2415 131.0 Engineering 0.153037 3 648 \n", "2417 135.0 Engineering 0.107313 16 758 \n", "2405 11021.0 Engineering 0.341631 289 25694 \n", "\n", " Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "2419 1849 ... 1207 37 \n", "2416 556 ... 388 85 \n", "2415 558 ... 340 16 \n", "2417 1069 ... 692 40 \n", "2405 23170 ... 16697 1672 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "2419 0.018381 110000 95000 125000 1534 \n", "2416 0.117241 75000 55000 90000 350 \n", "2415 0.024096 73000 50000 105000 456 \n", "2417 0.050125 70000 43000 80000 529 \n", "2405 0.061098 65000 50000 75000 18314 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "2419 364 193 0.844805 \n", "2416 257 50 0.846561 \n", "2415 176 0 0.757009 \n", "2417 102 0 0.602544 \n", "2405 4440 972 0.796466 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = data.set_index('Major_code')\n", "# Could also use data.set_index('Major_code', inplace=True)\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### After index is set, we can access with `loc`\n", "\n", "Using `Major_code` variable values as labels for rows" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { 
"data": { "text/plain": [ "Rank 5\n", "Major CHEMICAL ENGINEERING\n", "Total 32260\n", "Men 21239\n", "Women 11021\n", "Major_category Engineering\n", "ShareWomen 0.341631\n", "Sample_size 289\n", "Employed 25694\n", "Full_time 23170\n", "Part_time 5180\n", "Full_time_year_round 16697\n", "Unemployed 1672\n", "Unemployment_rate 0.0610977\n", "Median 65000\n", "P25th 50000\n", "P75th 75000\n", "College_jobs 18314\n", "Non_college_jobs 4440\n", "Low_wage_jobs 972\n", "Employment rate 0.796466\n", "Name: 2405, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.loc[2405]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankMajorTotalMenWomenMajor_categoryShareWomenSample_sizeEmployedFull_time...Full_time_year_roundUnemployedUnemployment_rateMedianP25thP75thCollege_jobsNon_college_jobsLow_wage_jobsEmployment rate
Major_code
6102164COMMUNICATION DISORDERS SCIENCES AND SERVICES38279.01225.037054.0Health0.967998952976319975...1446014870.04758428000200004000019957940451250.777528
50018ASTRONOMY AND ASTROPHYSICS1792.0832.0960.0Physical Sciences0.5357141015261085...827330.02116762000315001090009725002200.851562
\n", "

2 rows × 21 columns

\n", "
" ], "text/plain": [ " Rank Major Total \\\n", "Major_code \n", "6102 164 COMMUNICATION DISORDERS SCIENCES AND SERVICES 38279.0 \n", "5001 8 ASTRONOMY AND ASTROPHYSICS 1792.0 \n", "\n", " Men Women Major_category ShareWomen Sample_size \\\n", "Major_code \n", "6102 1225.0 37054.0 Health 0.967998 95 \n", "5001 832.0 960.0 Physical Sciences 0.535714 10 \n", "\n", " Employed Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "6102 29763 19975 ... 14460 1487 \n", "5001 1526 1085 ... 827 33 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "6102 0.047584 28000 20000 40000 19957 \n", "5001 0.021167 62000 31500 109000 972 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "6102 9404 5125 0.777528 \n", "5001 500 220 0.851562 \n", "\n", "[2 rows x 21 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "code_list = [6102, 5001]\n", "\n", "data.loc[code_list]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Sorting index\n", "\n", "Recommended for efficient selecting and filtering" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankMajorTotalMenWomenMajor_categoryShareWomenSample_sizeEmployedFull_time...Full_time_year_roundUnemployedUnemployment_rateMedianP25thP75thCollege_jobsNon_college_jobsLow_wage_jobsEmployment rate
Major_code
110065GENERAL AGRICULTURE10399.06053.04346.0Agriculture & Natural Resources0.41792515888847589...58881780.019642400003000050000241847178390.854313
110164AGRICULTURE PRODUCTION AND MANAGEMENT14240.09658.04582.0Agriculture & Natural Resources0.3217702731232311119...90936490.0500314000025000500001925622113620.865379
110272AGRICULTURAL ECONOMICS2439.01749.0690.0Agriculture & Natural Resources0.2829034421741819...15281820.077250400002700054000535893940.891349
1103153ANIMAL SCIENCES21573.05347.016226.0Agriculture & Natural Resources0.7521442551711214479...108249170.0508623000022000400005443957121250.793214
110422FOOD SCIENCENaNNaNNaNAgriculture & Natural ResourcesNaN3631492558...17353380.09693153000320007000011831274485NaN
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " Rank Major Total Men \\\n", "Major_code \n", "1100 65 GENERAL AGRICULTURE 10399.0 6053.0 \n", "1101 64 AGRICULTURE PRODUCTION AND MANAGEMENT 14240.0 9658.0 \n", "1102 72 AGRICULTURAL ECONOMICS 2439.0 1749.0 \n", "1103 153 ANIMAL SCIENCES 21573.0 5347.0 \n", "1104 22 FOOD SCIENCE NaN NaN \n", "\n", " Women Major_category ShareWomen Sample_size \\\n", "Major_code \n", "1100 4346.0 Agriculture & Natural Resources 0.417925 158 \n", "1101 4582.0 Agriculture & Natural Resources 0.321770 273 \n", "1102 690.0 Agriculture & Natural Resources 0.282903 44 \n", "1103 16226.0 Agriculture & Natural Resources 0.752144 255 \n", "1104 NaN Agriculture & Natural Resources NaN 36 \n", "\n", " Employed Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "1100 8884 7589 ... 5888 178 \n", "1101 12323 11119 ... 9093 649 \n", "1102 2174 1819 ... 1528 182 \n", "1103 17112 14479 ... 10824 917 \n", "1104 3149 2558 ... 1735 338 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "1100 0.019642 40000 30000 50000 2418 \n", "1101 0.050031 40000 25000 50000 1925 \n", "1102 0.077250 40000 27000 54000 535 \n", "1103 0.050862 30000 22000 40000 5443 \n", "1104 0.096931 53000 32000 70000 1183 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "1100 4717 839 0.854313 \n", "1101 6221 1362 0.865379 \n", "1102 893 94 0.891349 \n", "1103 9571 2125 0.793214 \n", "1104 1274 485 NaN \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_index(inplace=True)\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Selecting rows\n", "\n", "Alternatively, we can filter our dataframe (select rows) using *boolean conditions*" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, 
"outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankMajorTotalMenWomenMajor_categoryShareWomenSample_sizeEmployedFull_time...Full_time_year_roundUnemployedUnemployment_rateMedianP25thP75thCollege_jobsNon_college_jobsLow_wage_jobsEmployment rate
Major_code
6000150FINE ARTS74440.024786.049654.0Arts0.6670346235967942764...3187754860.0841863050021000410002079232725118800.801706
6001167DRAMA AND THEATER ARTS43249.014440.028809.0Arts0.6661193573616525147...1689130400.077541270001920035000699425313110680.836204
6002147MUSIC60633.029909.030724.0Arts0.5067214194766229010...2142539180.075960310002230042000137522878692860.786074
6003154VISUAL AND PERFORMING ARTS16250.04133.012117.0Arts0.745662132128708447...632214650.1021973000022000400003849763528400.792000
600496COMMERCIAL ART AND GRAPHIC DESIGN103480.032041.071439.0Arts0.69036511868348367448...5224389470.0967983500025000450003738938119148390.806755
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " Rank Major Total Men \\\n", "Major_code \n", "6000 150 FINE ARTS 74440.0 24786.0 \n", "6001 167 DRAMA AND THEATER ARTS 43249.0 14440.0 \n", "6002 147 MUSIC 60633.0 29909.0 \n", "6003 154 VISUAL AND PERFORMING ARTS 16250.0 4133.0 \n", "6004 96 COMMERCIAL ART AND GRAPHIC DESIGN 103480.0 32041.0 \n", "\n", " Women Major_category ShareWomen Sample_size Employed \\\n", "Major_code \n", "6000 49654.0 Arts 0.667034 623 59679 \n", "6001 28809.0 Arts 0.666119 357 36165 \n", "6002 30724.0 Arts 0.506721 419 47662 \n", "6003 12117.0 Arts 0.745662 132 12870 \n", "6004 71439.0 Arts 0.690365 1186 83483 \n", "\n", " Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "6000 42764 ... 31877 5486 \n", "6001 25147 ... 16891 3040 \n", "6002 29010 ... 21425 3918 \n", "6003 8447 ... 6322 1465 \n", "6004 67448 ... 52243 8947 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "6000 0.084186 30500 21000 41000 20792 \n", "6001 0.077541 27000 19200 35000 6994 \n", "6002 0.075960 31000 22300 42000 13752 \n", "6003 0.102197 30000 22000 40000 3849 \n", "6004 0.096798 35000 25000 45000 37389 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "6000 32725 11880 0.801706 \n", "6001 25313 11068 0.836204 \n", "6002 28786 9286 0.786074 \n", "6003 7635 2840 0.792000 \n", "6004 38119 14839 0.806755 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selection = data['Major_category'] == 'Arts'\n", "data[selection].head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankMajorTotalMenWomenMajor_categoryShareWomenSample_sizeEmployedFull_time...Full_time_year_roundUnemployedUnemployment_rateMedianP25thP75thCollege_jobsNon_college_jobsLow_wage_jobsEmployment rate
Major_code
600496COMMERCIAL ART AND GRAPHIC DESIGN103480.032041.071439.0Arts0.69036511868348367448...5224389470.0967983500025000450003738938119148390.806755
\n", "

1 rows × 21 columns

\n", "
" ], "text/plain": [ " Rank Major Total Men \\\n", "Major_code \n", "6004 96 COMMERCIAL ART AND GRAPHIC DESIGN 103480.0 32041.0 \n", "\n", " Women Major_category ShareWomen Sample_size Employed \\\n", "Major_code \n", "6004 71439.0 Arts 0.690365 1186 83483 \n", "\n", " Full_time ... Full_time_year_round Unemployed \\\n", "Major_code ... \n", "6004 67448 ... 52243 8947 \n", "\n", " Unemployment_rate Median P25th P75th College_jobs \\\n", "Major_code \n", "6004 0.096798 35000 25000 45000 37389 \n", "\n", " Non_college_jobs Low_wage_jobs Employment rate \n", "Major_code \n", "6004 38119 14839 0.806755 \n", "\n", "[1 rows x 21 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selection = data['Major_category'] == 'Arts'\n", "data[selection & (data['Total'] > 100000)].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Grouping and aggregating data\n", "\n", "We might want to summarize our data by grouping it by major categories\n", "\n", "To do this, we will use the `.groupby()` function" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped = data.groupby('Major_category')\n", "grouped" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n" ] }, { "data": { "text/plain": [ "{'Agriculture & Natural Resources': Int64Index([1100, 1101, 1102, 1103, 1104, 1105, 1106, 1199, 1302, 1303], dtype='int64', name='Major_code'),\n", " 'Arts': Int64Index([6000, 6001, 6002, 6003, 6004, 6005, 6007, 6099], dtype='int64', name='Major_code'),\n", " 'Biology & Life Science': Int64Index([1301, 3600, 3601, 
3602, 3603, 3604, 3605, 3606, 3607, 3608, 3609,\n", " 3611, 3699, 4006],\n", " dtype='int64', name='Major_code'),\n", " 'Business': Int64Index([6200, 6201, 6202, 6203, 6204, 6205, 6206, 6207, 6209, 6210, 6211,\n", " 6212, 6299],\n", " dtype='int64', name='Major_code'),\n", " 'Communications & Journalism': Int64Index([1901, 1902, 1903, 1904], dtype='int64', name='Major_code'),\n", " 'Computers & Mathematics': Int64Index([2001, 2100, 2101, 2102, 2105, 2106, 2107, 3700, 3701, 3702, 4005], dtype='int64', name='Major_code'),\n", " 'Education': Int64Index([2300, 2301, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310, 2311,\n", " 2312, 2313, 2314, 2399, 3501],\n", " dtype='int64', name='Major_code'),\n", " 'Engineering': Int64Index([1401, 2400, 2401, 2402, 2403, 2404, 2405, 2406, 2407, 2408, 2409,\n", " 2410, 2411, 2412, 2413, 2414, 2415, 2416, 2417, 2418, 2419, 2499,\n", " 2500, 2501, 2502, 2503, 2504, 2599, 5008],\n", " dtype='int64', name='Major_code'),\n", " 'Health': Int64Index([4002, 6100, 6102, 6103, 6104, 6105, 6106, 6107, 6108, 6109, 6110,\n", " 6199],\n", " dtype='int64', name='Major_code'),\n", " 'Humanities & Liberal Arts': Int64Index([1501, 2601, 2602, 2603, 3301, 3302, 3401, 3402, 4001, 4801, 4901,\n", " 5502, 6006, 6402, 6403],\n", " dtype='int64', name='Major_code'),\n", " 'Industrial Arts & Consumer Services': Int64Index([2201, 2901, 3801, 4101, 5601, 5701, 5901], dtype='int64', name='Major_code'),\n", " 'Interdisciplinary': Int64Index([4000], dtype='int64', name='Major_code'),\n", " 'Law & Public Policy': Int64Index([3201, 3202, 5301, 5401, 5402], dtype='int64', name='Major_code'),\n", " 'Physical Sciences': Int64Index([5000, 5001, 5002, 5003, 5004, 5005, 5006, 5007, 5098, 5102], dtype='int64', name='Major_code'),\n", " 'Psychology & Social Work': Int64Index([5200, 5201, 5202, 5203, 5205, 5206, 5299, 5403, 5404], dtype='int64', name='Major_code'),\n", " 'Social Science': Int64Index([4007, 5500, 5501, 5503, 5504, 5505, 5506, 5507, 5599], dtype='int64', 
name='Major_code')}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(type(grouped))\n", "print(type(grouped.groups))\n", "grouped.groups" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Reduce functions for the grouped data\n", "\n", "To return an *aggregated* dataframe, we need to specify the function we\n", "would like pandas to use to aggregate our groups:\n", "\n", "- Mean \n", "- Sum \n", "- Count \n", "\n", "\n", "For a list of built-in aggregation functions, see [https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankTotalMenWomenShareWomenSample_sizeEmployedFull_timePart_timeFull_time_year_roundUnemployedUnemployment_rateMedianP25thP75thCollege_jobsNon_college_jobsLow_wage_jobsEmployment rate
Major_category
Agriculture & Natural Resources101.5000008402.2222224484.1111113918.1111110.405267110.4000006694.3000005814.3000001659.1000004362.600000382.4000000.05632836900.00000025400.00000048010.0000001986.0000003449.100000789.9000000.860603
Arts131.12500044641.25000016798.75000027842.5000000.603658407.50000036014.25000025971.62500014348.87500019138.8750003528.5000000.09017333062.50000021962.50000043662.50000011848.12500020465.0000007514.5000000.815670
Biology & Life Science95.35714332418.71428613208.50000019210.2142860.587193165.50000021628.35714317169.7857148338.28571411843.0000001632.4285710.06091836421.42857126614.28571446085.71428610802.3571439084.4285713053.0000000.700543
Business55.846154100182.76923151373.23076948809.5384620.4831981192.69230883749.38461576066.92307715148.92307760801.9230776144.3846150.07106443538.46153833461.53846254846.15384611426.00000038197.6923089752.9230770.829638
Communications & Journalism104.00000098150.25000032980.25000065170.0000000.6583841127.00000082665.00000068332.50000022454.25000053557.0000006713.0000000.07553834500.00000026250.00000044975.00000021639.00000043248.00000012398.7500000.844534
Computers & Mathematics57.00000027182.54545518975.0000008207.5454550.311772260.00000021626.72727318867.7272734842.72727314468.7272731670.2727270.08425642745.45454529290.90909158090.90909112532.6363646769.3636361466.9090910.799617
Education130.37500034945.5625006470.37500028475.1875000.748507296.37500029989.93750024878.6875007537.06250018001.9375001560.5625000.05170232350.00000026590.62500038562.50000021169.5625007610.0625002554.3750000.856697
Engineering22.62069018537.34482814079.5517244457.7931030.238889169.86206914495.58620713167.8275862935.7241389963.8620691028.1724140.06333457382.75862141555.17241470448.2758629302.3103453530.448276864.7931030.787311
Health96.50000038602.5000006293.08333332309.4166670.795152326.16666731012.25000024568.2500009549.33333319034.8333331851.0833330.06592036825.00000026166.66666750250.00000020453.4166679208.0000002605.8333330.758078
Humanities & Liberal Arts135.06666747564.53333318189.73333329374.8000000.631790356.00000036274.53333327795.93333314268.66666719704.0666673406.7333330.08100831913.33333323493.33333342073.33333312843.33333318435.4666676282.6666670.767983
Industrial Arts & Consumer Services105.14285732827.42857114825.85714318001.5714290.349523309.28571427006.14285721626.1428578731.71428616311.2857141646.5714290.04807136342.85714326771.42857145142.8571438171.42857114945.7142863798.5714290.715442
Interdisciplinary110.00000012296.0000002817.0000009479.0000000.770901128.0000009821.0000008032.0000003173.0000006234.000000749.0000000.07086135000.00000025000.00000044000.0000005176.0000003903.0000001061.0000000.798715
Law & Public Policy64.60000035821.40000018225.80000017595.6000000.483649387.00000028958.00000025388.0000007642.60000020090.8000002699.0000000.09080542200.00000032640.00000055000.0000005844.20000020004.8000004144.0000000.770304
Physical Sciences67.60000018547.9000009539.0000009008.9000000.508683113.70000013923.10000011285.2000004344.4000008563.500000788.0000000.04651141890.00000028350.00000057290.0000007655.2000004946.9000001407.8000000.776080
Psychology & Social Work143.00000053445.22222210901.66666742543.5555560.794397353.33333342260.44444432111.11111115332.44444424233.8888893699.1111110.07206530100.00000025333.33333338777.77777818256.11111118818.4444446249.5555560.795633
Social Science91.66666758885.11111128537.11111130348.0000000.553962509.00000044610.33333338571.22222213507.66666728357.6666674775.0000000.09572937344.44444428355.55555650111.11111112662.22222221138.4444446020.0000000.770296
\n", "
" ], "text/plain": [ " Rank Total Men \\\n", "Major_category \n", "Agriculture & Natural Resources 101.500000 8402.222222 4484.111111 \n", "Arts 131.125000 44641.250000 16798.750000 \n", "Biology & Life Science 95.357143 32418.714286 13208.500000 \n", "Business 55.846154 100182.769231 51373.230769 \n", "Communications & Journalism 104.000000 98150.250000 32980.250000 \n", "Computers & Mathematics 57.000000 27182.545455 18975.000000 \n", "Education 130.375000 34945.562500 6470.375000 \n", "Engineering 22.620690 18537.344828 14079.551724 \n", "Health 96.500000 38602.500000 6293.083333 \n", "Humanities & Liberal Arts 135.066667 47564.533333 18189.733333 \n", "Industrial Arts & Consumer Services 105.142857 32827.428571 14825.857143 \n", "Interdisciplinary 110.000000 12296.000000 2817.000000 \n", "Law & Public Policy 64.600000 35821.400000 18225.800000 \n", "Physical Sciences 67.600000 18547.900000 9539.000000 \n", "Psychology & Social Work 143.000000 53445.222222 10901.666667 \n", "Social Science 91.666667 58885.111111 28537.111111 \n", "\n", " Women ShareWomen Sample_size \\\n", "Major_category \n", "Agriculture & Natural Resources 3918.111111 0.405267 110.400000 \n", "Arts 27842.500000 0.603658 407.500000 \n", "Biology & Life Science 19210.214286 0.587193 165.500000 \n", "Business 48809.538462 0.483198 1192.692308 \n", "Communications & Journalism 65170.000000 0.658384 1127.000000 \n", "Computers & Mathematics 8207.545455 0.311772 260.000000 \n", "Education 28475.187500 0.748507 296.375000 \n", "Engineering 4457.793103 0.238889 169.862069 \n", "Health 32309.416667 0.795152 326.166667 \n", "Humanities & Liberal Arts 29374.800000 0.631790 356.000000 \n", "Industrial Arts & Consumer Services 18001.571429 0.349523 309.285714 \n", "Interdisciplinary 9479.000000 0.770901 128.000000 \n", "Law & Public Policy 17595.600000 0.483649 387.000000 \n", "Physical Sciences 9008.900000 0.508683 113.700000 \n", "Psychology & Social Work 42543.555556 0.794397 353.333333 \n", "Social 
Science 30348.000000 0.553962 509.000000 \n", "\n", " Employed Full_time Part_time \\\n", "Major_category \n", "Agriculture & Natural Resources 6694.300000 5814.300000 1659.100000 \n", "Arts 36014.250000 25971.625000 14348.875000 \n", "Biology & Life Science 21628.357143 17169.785714 8338.285714 \n", "Business 83749.384615 76066.923077 15148.923077 \n", "Communications & Journalism 82665.000000 68332.500000 22454.250000 \n", "Computers & Mathematics 21626.727273 18867.727273 4842.727273 \n", "Education 29989.937500 24878.687500 7537.062500 \n", "Engineering 14495.586207 13167.827586 2935.724138 \n", "Health 31012.250000 24568.250000 9549.333333 \n", "Humanities & Liberal Arts 36274.533333 27795.933333 14268.666667 \n", "Industrial Arts & Consumer Services 27006.142857 21626.142857 8731.714286 \n", "Interdisciplinary 9821.000000 8032.000000 3173.000000 \n", "Law & Public Policy 28958.000000 25388.000000 7642.600000 \n", "Physical Sciences 13923.100000 11285.200000 4344.400000 \n", "Psychology & Social Work 42260.444444 32111.111111 15332.444444 \n", "Social Science 44610.333333 38571.222222 13507.666667 \n", "\n", " Full_time_year_round Unemployed \\\n", "Major_category \n", "Agriculture & Natural Resources 4362.600000 382.400000 \n", "Arts 19138.875000 3528.500000 \n", "Biology & Life Science 11843.000000 1632.428571 \n", "Business 60801.923077 6144.384615 \n", "Communications & Journalism 53557.000000 6713.000000 \n", "Computers & Mathematics 14468.727273 1670.272727 \n", "Education 18001.937500 1560.562500 \n", "Engineering 9963.862069 1028.172414 \n", "Health 19034.833333 1851.083333 \n", "Humanities & Liberal Arts 19704.066667 3406.733333 \n", "Industrial Arts & Consumer Services 16311.285714 1646.571429 \n", "Interdisciplinary 6234.000000 749.000000 \n", "Law & Public Policy 20090.800000 2699.000000 \n", "Physical Sciences 8563.500000 788.000000 \n", "Psychology & Social Work 24233.888889 3699.111111 \n", "Social Science 28357.666667 4775.000000 \n", "\n", " 
Unemployment_rate Median \\\n", "Major_category \n", "Agriculture & Natural Resources 0.056328 36900.000000 \n", "Arts 0.090173 33062.500000 \n", "Biology & Life Science 0.060918 36421.428571 \n", "Business 0.071064 43538.461538 \n", "Communications & Journalism 0.075538 34500.000000 \n", "Computers & Mathematics 0.084256 42745.454545 \n", "Education 0.051702 32350.000000 \n", "Engineering 0.063334 57382.758621 \n", "Health 0.065920 36825.000000 \n", "Humanities & Liberal Arts 0.081008 31913.333333 \n", "Industrial Arts & Consumer Services 0.048071 36342.857143 \n", "Interdisciplinary 0.070861 35000.000000 \n", "Law & Public Policy 0.090805 42200.000000 \n", "Physical Sciences 0.046511 41890.000000 \n", "Psychology & Social Work 0.072065 30100.000000 \n", "Social Science 0.095729 37344.444444 \n", "\n", " P25th P75th College_jobs \\\n", "Major_category \n", "Agriculture & Natural Resources 25400.000000 48010.000000 1986.000000 \n", "Arts 21962.500000 43662.500000 11848.125000 \n", "Biology & Life Science 26614.285714 46085.714286 10802.357143 \n", "Business 33461.538462 54846.153846 11426.000000 \n", "Communications & Journalism 26250.000000 44975.000000 21639.000000 \n", "Computers & Mathematics 29290.909091 58090.909091 12532.636364 \n", "Education 26590.625000 38562.500000 21169.562500 \n", "Engineering 41555.172414 70448.275862 9302.310345 \n", "Health 26166.666667 50250.000000 20453.416667 \n", "Humanities & Liberal Arts 23493.333333 42073.333333 12843.333333 \n", "Industrial Arts & Consumer Services 26771.428571 45142.857143 8171.428571 \n", "Interdisciplinary 25000.000000 44000.000000 5176.000000 \n", "Law & Public Policy 32640.000000 55000.000000 5844.200000 \n", "Physical Sciences 28350.000000 57290.000000 7655.200000 \n", "Psychology & Social Work 25333.333333 38777.777778 18256.111111 \n", "Social Science 28355.555556 50111.111111 12662.222222 \n", "\n", " Non_college_jobs Low_wage_jobs \\\n", "Major_category \n", "Agriculture & Natural Resources 
3449.100000 789.900000 \n", "Arts 20465.000000 7514.500000 \n", "Biology & Life Science 9084.428571 3053.000000 \n", "Business 38197.692308 9752.923077 \n", "Communications & Journalism 43248.000000 12398.750000 \n", "Computers & Mathematics 6769.363636 1466.909091 \n", "Education 7610.062500 2554.375000 \n", "Engineering 3530.448276 864.793103 \n", "Health 9208.000000 2605.833333 \n", "Humanities & Liberal Arts 18435.466667 6282.666667 \n", "Industrial Arts & Consumer Services 14945.714286 3798.571429 \n", "Interdisciplinary 3903.000000 1061.000000 \n", "Law & Public Policy 20004.800000 4144.000000 \n", "Physical Sciences 4946.900000 1407.800000 \n", "Psychology & Social Work 18818.444444 6249.555556 \n", "Social Science 21138.444444 6020.000000 \n", "\n", " Employment rate \n", "Major_category \n", "Agriculture & Natural Resources 0.860603 \n", "Arts 0.815670 \n", "Biology & Life Science 0.700543 \n", "Business 0.829638 \n", "Communications & Journalism 0.844534 \n", "Computers & Mathematics 0.799617 \n", "Education 0.856697 \n", "Engineering 0.787311 \n", "Health 0.758078 \n", "Humanities & Liberal Arts 0.767983 \n", "Industrial Arts & Consumer Services 0.715442 \n", "Interdisciplinary 0.798715 \n", "Law & Public Policy 0.770304 \n", "Physical Sciences 0.776080 \n", "Psychology & Social Work 0.795633 \n", "Social Science 0.770296 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.mean()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "Major_category\n", "Agriculture & Natural Resources 36900.000000\n", "Arts 33062.500000\n", "Biology & Life Science 36421.428571\n", "Business 43538.461538\n", "Communications & Journalism 34500.000000\n", "Computers & Mathematics 42745.454545\n", "Education 32350.000000\n", "Engineering 57382.758621\n", "Health 36825.000000\n", "Humanities & Liberal Arts 
31913.333333\n", "Industrial Arts & Consumer Services 36342.857143\n", "Interdisciplinary 35000.000000\n", "Law & Public Policy 42200.000000\n", "Physical Sciences 41890.000000\n", "Psychology & Social Work 30100.000000\n", "Social Science 37344.444444\n", "Name: Median, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped['Median'].mean()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanmedianstd
Major_category
Agriculture & Natural Resources36900.000000350006935.416354
Arts33062.500000307507223.164621
Biology & Life Science36421.428571363004528.912006
Business43538.461538400007774.052832
Communications & Journalism34500.000000350001000.000000
Computers & Mathematics42745.454545450005108.691346
Education32350.000000327503892.728263
Engineering57382.7586215700013626.079747
Health36825.000000350005776.460854
Humanities & Liberal Arts31913.333333320003393.032076
Industrial Arts & Consumer Services36342.857143350007290.829204
Interdisciplinary35000.00000035000NaN
Law & Public Policy42200.000000360009066.421565
Physical Sciences41890.000000395008251.659766
Psychology & Social Work30100.000000300005381.914158
Social Science37344.444444380004750.555523
\n", "
" ], "text/plain": [ " mean median std\n", "Major_category \n", "Agriculture & Natural Resources 36900.000000 35000 6935.416354\n", "Arts 33062.500000 30750 7223.164621\n", "Biology & Life Science 36421.428571 36300 4528.912006\n", "Business 43538.461538 40000 7774.052832\n", "Communications & Journalism 34500.000000 35000 1000.000000\n", "Computers & Mathematics 42745.454545 45000 5108.691346\n", "Education 32350.000000 32750 3892.728263\n", "Engineering 57382.758621 57000 13626.079747\n", "Health 36825.000000 35000 5776.460854\n", "Humanities & Liberal Arts 31913.333333 32000 3393.032076\n", "Industrial Arts & Consumer Services 36342.857143 35000 7290.829204\n", "Interdisciplinary 35000.000000 35000 NaN\n", "Law & Public Policy 42200.000000 36000 9066.421565\n", "Physical Sciences 41890.000000 39500 8251.659766\n", "Psychology & Social Work 30100.000000 30000 5381.914158\n", "Social Science 37344.444444 38000 4750.555523" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped['Median'].agg(['mean', 'median', 'std'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Plot from GroupBy objects" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "grouped['Median'].mean().plot(kind='bar', figsize=(10, 8))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Merging and appending data\n", "\n", "Simple example with fictitious data" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hide-output": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "raw_data_1 = {'subject_id': ['1', '2', '3', '4', '5'],\n", " 'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],\n", " 'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}\n", "\n", "raw_data_2 = {'subject_id': ['4', '5', '6', '7', '8'],\n", " 'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],\n", " 'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}\n", "\n", "raw_data_3 = {'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],\n", " 'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Assign each to a dataframe called data1, data2, data3\n", "data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])\n", "data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])\n", "data3 = pd.DataFrame(raw_data_3, columns = ['subject_id','test_id'])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches
\n", "
" ], "text/plain": [ " subject_id first_name last_name\n", "0 1 Alex Anderson\n", "1 2 Amy Ackerman\n", "2 3 Allen Ali\n", "3 4 Alice Aoni\n", "4 5 Ayoung Atiches" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print data 1\n", "data1" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_namelast_name
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan
\n", "
" ], "text/plain": [ " subject_id first_name last_name\n", "0 4 Billy Bonder\n", "1 5 Brian Black\n", "2 6 Bran Balwner\n", "3 7 Bryce Brice\n", "4 8 Betty Btisan" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print data 2\n", "data2" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idtest_id
0151
1215
2315
3461
4516
5714
6815
791
81061
91116
\n", "
" ], "text/plain": [ " subject_id test_id\n", "0 1 51\n", "1 2 15\n", "2 3 15\n", "3 4 61\n", "4 5 16\n", "5 7 14\n", "6 8 15\n", "7 9 1\n", "8 10 61\n", "9 11 16" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print data 3\n", "data3" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan
\n", "
" ], "text/plain": [ " subject_id first_name last_name\n", "0 1 Alex Anderson\n", "1 2 Amy Ackerman\n", "2 3 Allen Ali\n", "3 4 Alice Aoni\n", "4 5 Ayoung Atiches\n", "0 4 Billy Bonder\n", "1 5 Brian Black\n", "2 6 Bran Balwner\n", "3 7 Bryce Brice\n", "4 8 Betty Btisan" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Join the two dataframes along rows\n", "data_all_rows = pd.concat([data1, data2])\n", "data_all_rows" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_namelast_namesubject_idfirst_namelast_name
01AlexAnderson4BillyBonder
12AmyAckerman5BrianBlack
23AllenAli6BranBalwner
34AliceAoni7BryceBrice
45AyoungAtiches8BettyBtisan
\n", "
" ], "text/plain": [ " subject_id first_name last_name subject_id first_name last_name\n", "0 1 Alex Anderson 4 Billy Bonder\n", "1 2 Amy Ackerman 5 Brian Black\n", "2 3 Allen Ali 6 Bran Balwner\n", "3 4 Alice Aoni 7 Bryce Brice\n", "4 5 Ayoung Atiches 8 Betty Btisan" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Join the two dataframes along columns\n", "data_all_col = pd.concat([data1, data2], axis = 1)\n", "data_all_col" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_namelast_nametest_id
01AlexAnderson51
12AmyAckerman15
23AllenAli15
34AliceAoni61
44BillyBonder61
55AyoungAtiches16
65BrianBlack16
77BryceBrice14
88BettyBtisan15
\n", "
" ], "text/plain": [ " subject_id first_name last_name test_id\n", "0 1 Alex Anderson 51\n", "1 2 Amy Ackerman 15\n", "2 3 Allen Ali 15\n", "3 4 Alice Aoni 61\n", "4 4 Billy Bonder 61\n", "5 5 Ayoung Atiches 16\n", "6 5 Brian Black 16\n", "7 7 Bryce Brice 14\n", "8 8 Betty Btisan 15" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge data_all_rows and data3 along the subject_id value\n", "pd.merge(data_all_rows, data3, on='subject_id')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
04AliceAoniBillyBonder
15AyoungAtichesBrianBlack
\n", "
" ], "text/plain": [ " subject_id first_name_x last_name_x first_name_y last_name_y\n", "0 4 Alice Aoni Billy Bonder\n", "1 5 Ayoung Atiches Brian Black" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inner merge\n", "pd.merge(data1, data2, on='subject_id', how='inner')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
01AlexAndersonNaNNaN
12AmyAckermanNaNNaN
23AllenAliNaNNaN
34AliceAoniBillyBonder
45AyoungAtichesBrianBlack
56NaNNaNBranBalwner
67NaNNaNBryceBrice
78NaNNaNBettyBtisan
\n", "
" ], "text/plain": [ " subject_id first_name_x last_name_x first_name_y last_name_y\n", "0 1 Alex Anderson NaN NaN\n", "1 2 Amy Ackerman NaN NaN\n", "2 3 Allen Ali NaN NaN\n", "3 4 Alice Aoni Billy Bonder\n", "4 5 Ayoung Atiches Brian Black\n", "5 6 NaN NaN Bran Balwner\n", "6 7 NaN NaN Bryce Brice\n", "7 8 NaN NaN Betty Btisan" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Outer merge\n", "pd.merge(data1, data2, on='subject_id', how='outer')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Further learning resources\n", "\n", "- Reference manual for Pandas\n", " [https://pandas.pydata.org/pandas-docs/stable/getting_started/](https://pandas.pydata.org/pandas-docs/stable/getting_started/) \n", "- Pandas at QuantEcon lectures\n", " [https://lectures.quantecon.org/py/pandas.html](https://lectures.quantecon.org/py/pandas.html)\n", " [https://lectures.quantecon.org/py/pandas_panel.html](https://lectures.quantecon.org/py/pandas_panel.html) \n", "- Pandas at QuantEcon DataScience\n", " [https://datascience.quantecon.org/pandas/](https://datascience.quantecon.org/pandas/) \n", "- QuantEcon [Stata-R-Pandas\n", " cheatsheet](https://cheatsheets.quantecon.org/stats-cheatsheet.html) \n", "- 📖 Kevin Sheppard “Introduction to Python for Econometrics, Statistics\n", " and Data Analysis.” *Chapter: 9, 16* " ] } ], "metadata": { "celltoolbar": "Slideshow", "date": 1612589584.85308, "download_nb": false, "filename": "15_pandas.rst", "filename_with_path": "15_pandas", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "title": "Foundations of Computational Economics #15" }, "nbformat": 4, "nbformat_minor": 4 }