{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Module 3\n",
"\n",
"## Video 15: Filtering and Modifying DataFrames\n",
"**Python for the Energy Industry**\n",
"\n",
"## Filtering DataFrames\n",
"\n",
"A DataFrame can be filtered on a condition in a similar way to filtering numpy arrays. We will use the OPEC data example again:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"OPEC_df = pd.DataFrame({\n",
" 'Country': ['Algeria','Angola','Equatorial Guinea','Gabon','Iran','Iraq','Kuwait','Libya','Nigeria','Republic of the Congo','Saudi Arabia','UAE','Venezuela'],\n",
" 'Region': ['North Africa','Southern Africa','Central Africa','Central Africa','Middle East','Middle East','Middle East','North Africa','West Africa','Central Africa','Middle East','Middle East','South America'],\n",
" 'Population': [42228408,30809787,1308975,2119275,81800188,38433600,4137312,6678559,195874685,5125821,33702756,9630959,28887118],\n",
" 'Oil Production': [1348361,1769615,np.nan,210820,3990956,4451516,2923825,384686,1999885,260000,10460710,3106077,2276967],\n",
" 'Proven Reserves': [12.2e9,8.423e9,np.nan,2e9,157.53e9,143.069e9,101.5e9,48.363e9,37.07e9,1.6e9,266.578e9,97.8e9,299.953e9]\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can do boolean operations on a column of a DataFrame to obtain an array of boolean values. This boolean array can be used to filter a DataFrame - all entries corresponding to a False value are dropped from the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
"
\n",
" \n",
" \n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production Proven Reserves\n",
"4 Iran Middle East 81800188 3990956.0 1.575300e+11\n",
"5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n",
"6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n",
"10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n",
"11 UAE Middle East 9630959 3106077.0 9.780000e+10"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get Middle Eastern countries data only\n",
"is_ME = (OPEC_df['Region'] == \"Middle East\")\n",
"OPEC_df[is_ME]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say we want to filter using some other condition (e.g. countries with production over 2M bbl/day):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
"
\n",
" \n",
" \n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
"
\n",
" \n",
" 12 | \n",
" Venezuela | \n",
" South America | \n",
" 28887118 | \n",
" 2276967.0 | \n",
" 2.999530e+11 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production Proven Reserves\n",
"4 Iran Middle East 81800188 3990956.0 1.575300e+11\n",
"5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n",
"6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n",
"10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n",
"11 UAE Middle East 9630959 3106077.0 9.780000e+10\n",
"12 Venezuela South America 28887118 2276967.0 2.999530e+11"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prod_over_2M = (OPEC_df[\"Oil Production\"] > 2e6)\n",
"OPEC_df[prod_over_2M]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The logical_and function we saw before can be used to filter on multiple conditions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
"
\n",
" \n",
" \n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production Proven Reserves\n",
"4 Iran Middle East 81800188 3990956.0 1.575300e+11\n",
"5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n",
"6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n",
"10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n",
"11 UAE Middle East 9630959 3106077.0 9.780000e+10"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df[np.logical_and(is_ME,prod_over_2M)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Modifying DataFrames\n",
"\n",
"### Adding to DataFrames\n",
"\n",
"We can use the 'append' function to add a new entry to DataFrame, by appending a dictionary with keys corresponding to the column names."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Algeria | \n",
" North Africa | \n",
" 42228408 | \n",
" 1348361.0 | \n",
" 1.220000e+10 | \n",
"
\n",
" \n",
" 1 | \n",
" Angola | \n",
" Southern Africa | \n",
" 30809787 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
"
\n",
" \n",
" 2 | \n",
" Equatorial Guinea | \n",
" Central Africa | \n",
" 1308975 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" Gabon | \n",
" Central Africa | \n",
" 2119275 | \n",
" 210820.0 | \n",
" 2.000000e+09 | \n",
"
\n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
"
\n",
" \n",
" 7 | \n",
" Libya | \n",
" North Africa | \n",
" 6678559 | \n",
" 384686.0 | \n",
" 4.836300e+10 | \n",
"
\n",
" \n",
" 8 | \n",
" Nigeria | \n",
" West Africa | \n",
" 195874685 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
"
\n",
" \n",
" 9 | \n",
" Republic of the Congo | \n",
" Central Africa | \n",
" 5125821 | \n",
" 260000.0 | \n",
" 1.600000e+09 | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
"
\n",
" \n",
" 12 | \n",
" Venezuela | \n",
" South America | \n",
" 28887118 | \n",
" 2276967.0 | \n",
" 2.999530e+11 | \n",
"
\n",
" \n",
" 13 | \n",
" Ecuador | \n",
" South America | \n",
" 17084358 | \n",
" 548421.0 | \n",
" 8.273000e+09 | \n",
"
\n",
" \n",
" 14 | \n",
" Indonesia | \n",
" Southeast Asia | \n",
" 267670543 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
"
\n",
" \n",
" 15 | \n",
" Qatar | \n",
" Middle East | \n",
" 2781682 | \n",
" 1522902.0 | \n",
" 2.524400e+10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production \\\n",
"0 Algeria North Africa 42228408 1348361.0 \n",
"1 Angola Southern Africa 30809787 1769615.0 \n",
"2 Equatorial Guinea Central Africa 1308975 NaN \n",
"3 Gabon Central Africa 2119275 210820.0 \n",
"4 Iran Middle East 81800188 3990956.0 \n",
"5 Iraq Middle East 38433600 4451516.0 \n",
"6 Kuwait Middle East 4137312 2923825.0 \n",
"7 Libya North Africa 6678559 384686.0 \n",
"8 Nigeria West Africa 195874685 1999885.0 \n",
"9 Republic of the Congo Central Africa 5125821 260000.0 \n",
"10 Saudi Arabia Middle East 33702756 10460710.0 \n",
"11 UAE Middle East 9630959 3106077.0 \n",
"12 Venezuela South America 28887118 2276967.0 \n",
"13 Ecuador South America 17084358 548421.0 \n",
"14 Indonesia Southeast Asia 267670543 833667.0 \n",
"15 Qatar Middle East 2781682 1522902.0 \n",
"\n",
" Proven Reserves \n",
"0 1.220000e+10 \n",
"1 8.423000e+09 \n",
"2 NaN \n",
"3 2.000000e+09 \n",
"4 1.575300e+11 \n",
"5 1.430690e+11 \n",
"6 1.015000e+11 \n",
"7 4.836300e+10 \n",
"8 3.707000e+10 \n",
"9 1.600000e+09 \n",
"10 2.665780e+11 \n",
"11 9.780000e+10 \n",
"12 2.999530e+11 \n",
"13 8.273000e+09 \n",
"14 3.692500e+09 \n",
"15 2.524400e+10 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Include countries with lapsed OPEC membership\n",
"\n",
"OPEC_df = OPEC_df.append({'Country':'Ecuador',\n",
" 'Region':'South America',\n",
" 'Population': 17084358,\n",
" 'Oil Production': 548421,\n",
" 'Proven Reserves': 8.273e9},\n",
" ignore_index=True)\n",
"\n",
"OPEC_df = OPEC_df.append({'Country':'Indonesia',\n",
" 'Region':'Southeast Asia',\n",
" 'Population': 267670543,\n",
" 'Oil Production': 833667,\n",
" 'Proven Reserves': 3.6925e9},\n",
" ignore_index=True)\n",
"\n",
"OPEC_df = OPEC_df.append({'Country':'Qatar',\n",
" 'Region':'Middle East',\n",
" 'Population': 2781682,\n",
" 'Oil Production': 1522902,\n",
" 'Proven Reserves': 25.244e9},\n",
" ignore_index=True)\n",
"\n",
"OPEC_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"New columns in a DataFrame can be created by setting a column of that name to a list/array of the right length:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
" Membership Active | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Algeria | \n",
" North Africa | \n",
" 42228408 | \n",
" 1348361.0 | \n",
" 1.220000e+10 | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" Angola | \n",
" Southern Africa | \n",
" 30809787 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
" True | \n",
"
\n",
" \n",
" 2 | \n",
" Equatorial Guinea | \n",
" Central Africa | \n",
" 1308975 | \n",
" NaN | \n",
" NaN | \n",
" True | \n",
"
\n",
" \n",
" 3 | \n",
" Gabon | \n",
" Central Africa | \n",
" 2119275 | \n",
" 210820.0 | \n",
" 2.000000e+09 | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
" True | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
" True | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
" True | \n",
"
\n",
" \n",
" 7 | \n",
" Libya | \n",
" North Africa | \n",
" 6678559 | \n",
" 384686.0 | \n",
" 4.836300e+10 | \n",
" True | \n",
"
\n",
" \n",
" 8 | \n",
" Nigeria | \n",
" West Africa | \n",
" 195874685 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
" True | \n",
"
\n",
" \n",
" 9 | \n",
" Republic of the Congo | \n",
" Central Africa | \n",
" 5125821 | \n",
" 260000.0 | \n",
" 1.600000e+09 | \n",
" True | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
" True | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
" True | \n",
"
\n",
" \n",
" 12 | \n",
" Venezuela | \n",
" South America | \n",
" 28887118 | \n",
" 2276967.0 | \n",
" 2.999530e+11 | \n",
" True | \n",
"
\n",
" \n",
" 13 | \n",
" Ecuador | \n",
" South America | \n",
" 17084358 | \n",
" 548421.0 | \n",
" 8.273000e+09 | \n",
" False | \n",
"
\n",
" \n",
" 14 | \n",
" Indonesia | \n",
" Southeast Asia | \n",
" 267670543 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
" False | \n",
"
\n",
" \n",
" 15 | \n",
" Qatar | \n",
" Middle East | \n",
" 2781682 | \n",
" 1522902.0 | \n",
" 2.524400e+10 | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production \\\n",
"0 Algeria North Africa 42228408 1348361.0 \n",
"1 Angola Southern Africa 30809787 1769615.0 \n",
"2 Equatorial Guinea Central Africa 1308975 NaN \n",
"3 Gabon Central Africa 2119275 210820.0 \n",
"4 Iran Middle East 81800188 3990956.0 \n",
"5 Iraq Middle East 38433600 4451516.0 \n",
"6 Kuwait Middle East 4137312 2923825.0 \n",
"7 Libya North Africa 6678559 384686.0 \n",
"8 Nigeria West Africa 195874685 1999885.0 \n",
"9 Republic of the Congo Central Africa 5125821 260000.0 \n",
"10 Saudi Arabia Middle East 33702756 10460710.0 \n",
"11 UAE Middle East 9630959 3106077.0 \n",
"12 Venezuela South America 28887118 2276967.0 \n",
"13 Ecuador South America 17084358 548421.0 \n",
"14 Indonesia Southeast Asia 267670543 833667.0 \n",
"15 Qatar Middle East 2781682 1522902.0 \n",
"\n",
" Proven Reserves Membership Active \n",
"0 1.220000e+10 True \n",
"1 8.423000e+09 True \n",
"2 NaN True \n",
"3 2.000000e+09 True \n",
"4 1.575300e+11 True \n",
"5 1.430690e+11 True \n",
"6 1.015000e+11 True \n",
"7 4.836300e+10 True \n",
"8 3.707000e+10 True \n",
"9 1.600000e+09 True \n",
"10 2.665780e+11 True \n",
"11 9.780000e+10 True \n",
"12 2.999530e+11 True \n",
"13 8.273000e+09 False \n",
"14 3.692500e+09 False \n",
"15 2.524400e+10 False "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df['Membership Active'] = [True]*13 + [False]*3\n",
"OPEC_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"New columns can also be created from operations on existing columns:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
" Membership Active | \n",
" Production per cap | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Algeria | \n",
" North Africa | \n",
" 42228408 | \n",
" 1348361.0 | \n",
" 1.220000e+10 | \n",
" True | \n",
" 0.031930 | \n",
"
\n",
" \n",
" 1 | \n",
" Angola | \n",
" Southern Africa | \n",
" 30809787 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
" True | \n",
" 0.057437 | \n",
"
\n",
" \n",
" 2 | \n",
" Equatorial Guinea | \n",
" Central Africa | \n",
" 1308975 | \n",
" NaN | \n",
" NaN | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" Gabon | \n",
" Central Africa | \n",
" 2119275 | \n",
" 210820.0 | \n",
" 2.000000e+09 | \n",
" True | \n",
" 0.099477 | \n",
"
\n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
" True | \n",
" 0.048789 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
" True | \n",
" 0.115824 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
" True | \n",
" 0.706697 | \n",
"
\n",
" \n",
" 7 | \n",
" Libya | \n",
" North Africa | \n",
" 6678559 | \n",
" 384686.0 | \n",
" 4.836300e+10 | \n",
" True | \n",
" 0.057600 | \n",
"
\n",
" \n",
" 8 | \n",
" Nigeria | \n",
" West Africa | \n",
" 195874685 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
" True | \n",
" 0.010210 | \n",
"
\n",
" \n",
" 9 | \n",
" Republic of the Congo | \n",
" Central Africa | \n",
" 5125821 | \n",
" 260000.0 | \n",
" 1.600000e+09 | \n",
" True | \n",
" 0.050724 | \n",
"
\n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
" True | \n",
" 0.310381 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
" True | \n",
" 0.322510 | \n",
"
\n",
" \n",
" 12 | \n",
" Venezuela | \n",
" South America | \n",
" 28887118 | \n",
" 2276967.0 | \n",
" 2.999530e+11 | \n",
" True | \n",
" 0.078823 | \n",
"
\n",
" \n",
" 13 | \n",
" Ecuador | \n",
" South America | \n",
" 17084358 | \n",
" 548421.0 | \n",
" 8.273000e+09 | \n",
" False | \n",
" 0.032101 | \n",
"
\n",
" \n",
" 14 | \n",
" Indonesia | \n",
" Southeast Asia | \n",
" 267670543 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
" False | \n",
" 0.003115 | \n",
"
\n",
" \n",
" 15 | \n",
" Qatar | \n",
" Middle East | \n",
" 2781682 | \n",
" 1522902.0 | \n",
" 2.524400e+10 | \n",
" False | \n",
" 0.547475 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production \\\n",
"0 Algeria North Africa 42228408 1348361.0 \n",
"1 Angola Southern Africa 30809787 1769615.0 \n",
"2 Equatorial Guinea Central Africa 1308975 NaN \n",
"3 Gabon Central Africa 2119275 210820.0 \n",
"4 Iran Middle East 81800188 3990956.0 \n",
"5 Iraq Middle East 38433600 4451516.0 \n",
"6 Kuwait Middle East 4137312 2923825.0 \n",
"7 Libya North Africa 6678559 384686.0 \n",
"8 Nigeria West Africa 195874685 1999885.0 \n",
"9 Republic of the Congo Central Africa 5125821 260000.0 \n",
"10 Saudi Arabia Middle East 33702756 10460710.0 \n",
"11 UAE Middle East 9630959 3106077.0 \n",
"12 Venezuela South America 28887118 2276967.0 \n",
"13 Ecuador South America 17084358 548421.0 \n",
"14 Indonesia Southeast Asia 267670543 833667.0 \n",
"15 Qatar Middle East 2781682 1522902.0 \n",
"\n",
" Proven Reserves Membership Active Production per cap \n",
"0 1.220000e+10 True 0.031930 \n",
"1 8.423000e+09 True 0.057437 \n",
"2 NaN True NaN \n",
"3 2.000000e+09 True 0.099477 \n",
"4 1.575300e+11 True 0.048789 \n",
"5 1.430690e+11 True 0.115824 \n",
"6 1.015000e+11 True 0.706697 \n",
"7 4.836300e+10 True 0.057600 \n",
"8 3.707000e+10 True 0.010210 \n",
"9 1.600000e+09 True 0.050724 \n",
"10 2.665780e+11 True 0.310381 \n",
"11 9.780000e+10 True 0.322510 \n",
"12 2.999530e+11 True 0.078823 \n",
"13 8.273000e+09 False 0.032101 \n",
"14 3.692500e+09 False 0.003115 \n",
"15 2.524400e+10 False 0.547475 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df['Production per cap'] = OPEC_df['Oil Production'] / OPEC_df['Population']\n",
"OPEC_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sorting DataFrames\n",
"\n",
"We can sort a DataFrame by the values in a given column:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" Region | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
" Membership Active | \n",
" Production per cap | \n",
"
\n",
" \n",
" \n",
" \n",
" 10 | \n",
" Saudi Arabia | \n",
" Middle East | \n",
" 33702756 | \n",
" 10460710.0 | \n",
" 2.665780e+11 | \n",
" True | \n",
" 0.310381 | \n",
"
\n",
" \n",
" 5 | \n",
" Iraq | \n",
" Middle East | \n",
" 38433600 | \n",
" 4451516.0 | \n",
" 1.430690e+11 | \n",
" True | \n",
" 0.115824 | \n",
"
\n",
" \n",
" 4 | \n",
" Iran | \n",
" Middle East | \n",
" 81800188 | \n",
" 3990956.0 | \n",
" 1.575300e+11 | \n",
" True | \n",
" 0.048789 | \n",
"
\n",
" \n",
" 11 | \n",
" UAE | \n",
" Middle East | \n",
" 9630959 | \n",
" 3106077.0 | \n",
" 9.780000e+10 | \n",
" True | \n",
" 0.322510 | \n",
"
\n",
" \n",
" 6 | \n",
" Kuwait | \n",
" Middle East | \n",
" 4137312 | \n",
" 2923825.0 | \n",
" 1.015000e+11 | \n",
" True | \n",
" 0.706697 | \n",
"
\n",
" \n",
" 12 | \n",
" Venezuela | \n",
" South America | \n",
" 28887118 | \n",
" 2276967.0 | \n",
" 2.999530e+11 | \n",
" True | \n",
" 0.078823 | \n",
"
\n",
" \n",
" 8 | \n",
" Nigeria | \n",
" West Africa | \n",
" 195874685 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
" True | \n",
" 0.010210 | \n",
"
\n",
" \n",
" 1 | \n",
" Angola | \n",
" Southern Africa | \n",
" 30809787 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
" True | \n",
" 0.057437 | \n",
"
\n",
" \n",
" 15 | \n",
" Qatar | \n",
" Middle East | \n",
" 2781682 | \n",
" 1522902.0 | \n",
" 2.524400e+10 | \n",
" False | \n",
" 0.547475 | \n",
"
\n",
" \n",
" 0 | \n",
" Algeria | \n",
" North Africa | \n",
" 42228408 | \n",
" 1348361.0 | \n",
" 1.220000e+10 | \n",
" True | \n",
" 0.031930 | \n",
"
\n",
" \n",
" 14 | \n",
" Indonesia | \n",
" Southeast Asia | \n",
" 267670543 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
" False | \n",
" 0.003115 | \n",
"
\n",
" \n",
" 13 | \n",
" Ecuador | \n",
" South America | \n",
" 17084358 | \n",
" 548421.0 | \n",
" 8.273000e+09 | \n",
" False | \n",
" 0.032101 | \n",
"
\n",
" \n",
" 7 | \n",
" Libya | \n",
" North Africa | \n",
" 6678559 | \n",
" 384686.0 | \n",
" 4.836300e+10 | \n",
" True | \n",
" 0.057600 | \n",
"
\n",
" \n",
" 9 | \n",
" Republic of the Congo | \n",
" Central Africa | \n",
" 5125821 | \n",
" 260000.0 | \n",
" 1.600000e+09 | \n",
" True | \n",
" 0.050724 | \n",
"
\n",
" \n",
" 3 | \n",
" Gabon | \n",
" Central Africa | \n",
" 2119275 | \n",
" 210820.0 | \n",
" 2.000000e+09 | \n",
" True | \n",
" 0.099477 | \n",
"
\n",
" \n",
" 2 | \n",
" Equatorial Guinea | \n",
" Central Africa | \n",
" 1308975 | \n",
" NaN | \n",
" NaN | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Region Population Oil Production \\\n",
"10 Saudi Arabia Middle East 33702756 10460710.0 \n",
"5 Iraq Middle East 38433600 4451516.0 \n",
"4 Iran Middle East 81800188 3990956.0 \n",
"11 UAE Middle East 9630959 3106077.0 \n",
"6 Kuwait Middle East 4137312 2923825.0 \n",
"12 Venezuela South America 28887118 2276967.0 \n",
"8 Nigeria West Africa 195874685 1999885.0 \n",
"1 Angola Southern Africa 30809787 1769615.0 \n",
"15 Qatar Middle East 2781682 1522902.0 \n",
"0 Algeria North Africa 42228408 1348361.0 \n",
"14 Indonesia Southeast Asia 267670543 833667.0 \n",
"13 Ecuador South America 17084358 548421.0 \n",
"7 Libya North Africa 6678559 384686.0 \n",
"9 Republic of the Congo Central Africa 5125821 260000.0 \n",
"3 Gabon Central Africa 2119275 210820.0 \n",
"2 Equatorial Guinea Central Africa 1308975 NaN \n",
"\n",
" Proven Reserves Membership Active Production per cap \n",
"10 2.665780e+11 True 0.310381 \n",
"5 1.430690e+11 True 0.115824 \n",
"4 1.575300e+11 True 0.048789 \n",
"11 9.780000e+10 True 0.322510 \n",
"6 1.015000e+11 True 0.706697 \n",
"12 2.999530e+11 True 0.078823 \n",
"8 3.707000e+10 True 0.010210 \n",
"1 8.423000e+09 True 0.057437 \n",
"15 2.524400e+10 False 0.547475 \n",
"0 1.220000e+10 True 0.031930 \n",
"14 3.692500e+09 False 0.003115 \n",
"13 8.273000e+09 False 0.032101 \n",
"7 4.836300e+10 True 0.057600 \n",
"9 1.600000e+09 True 0.050724 \n",
"3 2.000000e+09 True 0.099477 \n",
"2 NaN True NaN "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df.sort_values(by='Oil Production', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Grouping\n",
"\n",
"We can also group entries by a common property. In this dataset, Region is the natural grouping. We have to specify what operation is applied to numeric data, e.g. adding or averaging the values for a given group."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
" Membership Active | \n",
" Production per cap | \n",
"
\n",
" \n",
" Region | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Central Africa | \n",
" 8554071 | \n",
" 470820.0 | \n",
" 3.600000e+09 | \n",
" 3 | \n",
" 0.150201 | \n",
"
\n",
" \n",
" Middle East | \n",
" 170486497 | \n",
" 26455986.0 | \n",
" 7.917210e+11 | \n",
" 5 | \n",
" 2.051676 | \n",
"
\n",
" \n",
" North Africa | \n",
" 48906967 | \n",
" 1733047.0 | \n",
" 6.056300e+10 | \n",
" 2 | \n",
" 0.089530 | \n",
"
\n",
" \n",
" South America | \n",
" 45971476 | \n",
" 2825388.0 | \n",
" 3.082260e+11 | \n",
" 1 | \n",
" 0.110924 | \n",
"
\n",
" \n",
" Southeast Asia | \n",
" 267670543 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
" 0 | \n",
" 0.003115 | \n",
"
\n",
" \n",
" Southern Africa | \n",
" 30809787 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
" 1 | \n",
" 0.057437 | \n",
"
\n",
" \n",
" West Africa | \n",
" 195874685 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
" 1 | \n",
" 0.010210 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Population Oil Production Proven Reserves \\\n",
"Region \n",
"Central Africa 8554071 470820.0 3.600000e+09 \n",
"Middle East 170486497 26455986.0 7.917210e+11 \n",
"North Africa 48906967 1733047.0 6.056300e+10 \n",
"South America 45971476 2825388.0 3.082260e+11 \n",
"Southeast Asia 267670543 833667.0 3.692500e+09 \n",
"Southern Africa 30809787 1769615.0 8.423000e+09 \n",
"West Africa 195874685 1999885.0 3.707000e+10 \n",
"\n",
" Membership Active Production per cap \n",
"Region \n",
"Central Africa 3 0.150201 \n",
"Middle East 5 2.051676 \n",
"North Africa 2 0.089530 \n",
"South America 1 0.110924 \n",
"Southeast Asia 0 0.003115 \n",
"Southern Africa 1 0.057437 \n",
"West Africa 1 0.010210 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df.groupby('Region').sum()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Population | \n",
" Oil Production | \n",
" Proven Reserves | \n",
" Membership Active | \n",
" Production per cap | \n",
"
\n",
" \n",
" Region | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Central Africa | \n",
" 2.851357e+06 | \n",
" 235410.0 | \n",
" 1.800000e+09 | \n",
" 1.000000 | \n",
" 0.075100 | \n",
"
\n",
" \n",
" Middle East | \n",
" 2.841442e+07 | \n",
" 4409331.0 | \n",
" 1.319535e+11 | \n",
" 0.833333 | \n",
" 0.341946 | \n",
"
\n",
" \n",
" North Africa | \n",
" 2.445348e+07 | \n",
" 866523.5 | \n",
" 3.028150e+10 | \n",
" 1.000000 | \n",
" 0.044765 | \n",
"
\n",
" \n",
" South America | \n",
" 2.298574e+07 | \n",
" 1412694.0 | \n",
" 1.541130e+11 | \n",
" 0.500000 | \n",
" 0.055462 | \n",
"
\n",
" \n",
" Southeast Asia | \n",
" 2.676705e+08 | \n",
" 833667.0 | \n",
" 3.692500e+09 | \n",
" 0.000000 | \n",
" 0.003115 | \n",
"
\n",
" \n",
" Southern Africa | \n",
" 3.080979e+07 | \n",
" 1769615.0 | \n",
" 8.423000e+09 | \n",
" 1.000000 | \n",
" 0.057437 | \n",
"
\n",
" \n",
" West Africa | \n",
" 1.958747e+08 | \n",
" 1999885.0 | \n",
" 3.707000e+10 | \n",
" 1.000000 | \n",
" 0.010210 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Population Oil Production Proven Reserves \\\n",
"Region \n",
"Central Africa 2.851357e+06 235410.0 1.800000e+09 \n",
"Middle East 2.841442e+07 4409331.0 1.319535e+11 \n",
"North Africa 2.445348e+07 866523.5 3.028150e+10 \n",
"South America 2.298574e+07 1412694.0 1.541130e+11 \n",
"Southeast Asia 2.676705e+08 833667.0 3.692500e+09 \n",
"Southern Africa 3.080979e+07 1769615.0 8.423000e+09 \n",
"West Africa 1.958747e+08 1999885.0 3.707000e+10 \n",
"\n",
" Membership Active Production per cap \n",
"Region \n",
"Central Africa 1.000000 0.075100 \n",
"Middle East 0.833333 0.341946 \n",
"North Africa 1.000000 0.044765 \n",
"South America 0.500000 0.055462 \n",
"Southeast Asia 0.000000 0.003115 \n",
"Southern Africa 1.000000 0.057437 \n",
"West Africa 1.000000 0.010210 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OPEC_df.groupby('Region').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise\n",
"\n",
"Add to the DataFrame a column for proven reserves per capita. Output:\n",
"- The DataFrame in descending order by proven reserves per capita\n",
"- The average value of proven reserves per capita for each region"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}