{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 3\n", "\n", "## Video 15: Filtering and Modifying DataFrames\n", "**Python for the Energy Industry**\n", "\n", "## Filtering DataFrames\n", "\n", "A DataFrame can be filtered on a condition in a similar way to filtering numpy arrays. We will use the OPEC data example again:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "OPEC_df = pd.DataFrame({\n", " 'Country': ['Algeria','Angola','Equatorial Guinea','Gabon','Iran','Iraq','Kuwait','Libya','Nigeria','Republic of the Congo','Saudi Arabia','UAE','Venezuela'],\n", " 'Region': ['North Africa','Southern Africa','Central Africa','Central Africa','Middle East','Middle East','Middle East','North Africa','West Africa','Central Africa','Middle East','Middle East','South America'],\n", " 'Population': [42228408,30809787,1308975,2119275,81800188,38433600,4137312,6678559,195874685,5125821,33702756,9630959,28887118],\n", " 'Oil Production': [1348361,1769615,np.nan,210820,3990956,4451516,2923825,384686,1999885,260000,10460710,3106077,2276967],\n", " 'Proven Reserves': [12.2e9,8.423e9,np.nan,2e9,157.53e9,143.069e9,101.5e9,48.363e9,37.07e9,1.6e9,266.578e9,97.8e9,299.953e9]\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do boolean operations on a column of a DataFrame to obtain an array of boolean values. This boolean array can be used to filter a DataFrame - all entries corresponding to a False value are dropped from the DataFrame." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven Reserves
4IranMiddle East818001883990956.01.575300e+11
5IraqMiddle East384336004451516.01.430690e+11
6KuwaitMiddle East41373122923825.01.015000e+11
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11
11UAEMiddle East96309593106077.09.780000e+10
\n", "
" ], "text/plain": [ " Country Region Population Oil Production Proven Reserves\n", "4 Iran Middle East 81800188 3990956.0 1.575300e+11\n", "5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n", "6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n", "10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n", "11 UAE Middle East 9630959 3106077.0 9.780000e+10" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get Middle Eastern countries data only\n", "is_ME = (OPEC_df['Region'] == \"Middle East\")\n", "OPEC_df[is_ME]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we want to filter using some other condition (e.g. countries with production over 2M bbl/day):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven Reserves
4IranMiddle East818001883990956.01.575300e+11
5IraqMiddle East384336004451516.01.430690e+11
6KuwaitMiddle East41373122923825.01.015000e+11
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11
11UAEMiddle East96309593106077.09.780000e+10
12VenezuelaSouth America288871182276967.02.999530e+11
\n", "
" ], "text/plain": [ " Country Region Population Oil Production Proven Reserves\n", "4 Iran Middle East 81800188 3990956.0 1.575300e+11\n", "5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n", "6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n", "10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n", "11 UAE Middle East 9630959 3106077.0 9.780000e+10\n", "12 Venezuela South America 28887118 2276967.0 2.999530e+11" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prod_over_2M = (OPEC_df[\"Oil Production\"] > 2e6)\n", "OPEC_df[prod_over_2M]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The logical_and function we saw before can be used to filter on multiple conditions." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven Reserves
4IranMiddle East818001883990956.01.575300e+11
5IraqMiddle East384336004451516.01.430690e+11
6KuwaitMiddle East41373122923825.01.015000e+11
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11
11UAEMiddle East96309593106077.09.780000e+10
\n", "
" ], "text/plain": [ " Country Region Population Oil Production Proven Reserves\n", "4 Iran Middle East 81800188 3990956.0 1.575300e+11\n", "5 Iraq Middle East 38433600 4451516.0 1.430690e+11\n", "6 Kuwait Middle East 4137312 2923825.0 1.015000e+11\n", "10 Saudi Arabia Middle East 33702756 10460710.0 2.665780e+11\n", "11 UAE Middle East 9630959 3106077.0 9.780000e+10" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df[np.logical_and(is_ME,prod_over_2M)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modifying DataFrames\n", "\n", "### Adding to DataFrames\n", "\n", "We can use the 'append' function to add a new entry to DataFrame, by appending a dictionary with keys corresponding to the column names." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven Reserves
0AlgeriaNorth Africa422284081348361.01.220000e+10
1AngolaSouthern Africa308097871769615.08.423000e+09
2Equatorial GuineaCentral Africa1308975NaNNaN
3GabonCentral Africa2119275210820.02.000000e+09
4IranMiddle East818001883990956.01.575300e+11
5IraqMiddle East384336004451516.01.430690e+11
6KuwaitMiddle East41373122923825.01.015000e+11
7LibyaNorth Africa6678559384686.04.836300e+10
8NigeriaWest Africa1958746851999885.03.707000e+10
9Republic of the CongoCentral Africa5125821260000.01.600000e+09
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11
11UAEMiddle East96309593106077.09.780000e+10
12VenezuelaSouth America288871182276967.02.999530e+11
13EcuadorSouth America17084358548421.08.273000e+09
14IndonesiaSoutheast Asia267670543833667.03.692500e+09
15QatarMiddle East27816821522902.02.524400e+10
\n", "
" ], "text/plain": [ " Country Region Population Oil Production \\\n", "0 Algeria North Africa 42228408 1348361.0 \n", "1 Angola Southern Africa 30809787 1769615.0 \n", "2 Equatorial Guinea Central Africa 1308975 NaN \n", "3 Gabon Central Africa 2119275 210820.0 \n", "4 Iran Middle East 81800188 3990956.0 \n", "5 Iraq Middle East 38433600 4451516.0 \n", "6 Kuwait Middle East 4137312 2923825.0 \n", "7 Libya North Africa 6678559 384686.0 \n", "8 Nigeria West Africa 195874685 1999885.0 \n", "9 Republic of the Congo Central Africa 5125821 260000.0 \n", "10 Saudi Arabia Middle East 33702756 10460710.0 \n", "11 UAE Middle East 9630959 3106077.0 \n", "12 Venezuela South America 28887118 2276967.0 \n", "13 Ecuador South America 17084358 548421.0 \n", "14 Indonesia Southeast Asia 267670543 833667.0 \n", "15 Qatar Middle East 2781682 1522902.0 \n", "\n", " Proven Reserves \n", "0 1.220000e+10 \n", "1 8.423000e+09 \n", "2 NaN \n", "3 2.000000e+09 \n", "4 1.575300e+11 \n", "5 1.430690e+11 \n", "6 1.015000e+11 \n", "7 4.836300e+10 \n", "8 3.707000e+10 \n", "9 1.600000e+09 \n", "10 2.665780e+11 \n", "11 9.780000e+10 \n", "12 2.999530e+11 \n", "13 8.273000e+09 \n", "14 3.692500e+09 \n", "15 2.524400e+10 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Include countries with lapsed OPEC membership\n", "\n", "OPEC_df = OPEC_df.append({'Country':'Ecuador',\n", " 'Region':'South America',\n", " 'Population': 17084358,\n", " 'Oil Production': 548421,\n", " 'Proven Reserves': 8.273e9},\n", " ignore_index=True)\n", "\n", "OPEC_df = OPEC_df.append({'Country':'Indonesia',\n", " 'Region':'Southeast Asia',\n", " 'Population': 267670543,\n", " 'Oil Production': 833667,\n", " 'Proven Reserves': 3.6925e9},\n", " ignore_index=True)\n", "\n", "OPEC_df = OPEC_df.append({'Country':'Qatar',\n", " 'Region':'Middle East',\n", " 'Population': 2781682,\n", " 'Oil Production': 1522902,\n", " 'Proven Reserves': 25.244e9},\n", " ignore_index=True)\n", "\n", "OPEC_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New columns in a DataFrame can be created by setting a column of that name to a list/array of the right length:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven ReservesMembership Active
0AlgeriaNorth Africa422284081348361.01.220000e+10True
1AngolaSouthern Africa308097871769615.08.423000e+09True
2Equatorial GuineaCentral Africa1308975NaNNaNTrue
3GabonCentral Africa2119275210820.02.000000e+09True
4IranMiddle East818001883990956.01.575300e+11True
5IraqMiddle East384336004451516.01.430690e+11True
6KuwaitMiddle East41373122923825.01.015000e+11True
7LibyaNorth Africa6678559384686.04.836300e+10True
8NigeriaWest Africa1958746851999885.03.707000e+10True
9Republic of the CongoCentral Africa5125821260000.01.600000e+09True
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11True
11UAEMiddle East96309593106077.09.780000e+10True
12VenezuelaSouth America288871182276967.02.999530e+11True
13EcuadorSouth America17084358548421.08.273000e+09False
14IndonesiaSoutheast Asia267670543833667.03.692500e+09False
15QatarMiddle East27816821522902.02.524400e+10False
\n", "
" ], "text/plain": [ " Country Region Population Oil Production \\\n", "0 Algeria North Africa 42228408 1348361.0 \n", "1 Angola Southern Africa 30809787 1769615.0 \n", "2 Equatorial Guinea Central Africa 1308975 NaN \n", "3 Gabon Central Africa 2119275 210820.0 \n", "4 Iran Middle East 81800188 3990956.0 \n", "5 Iraq Middle East 38433600 4451516.0 \n", "6 Kuwait Middle East 4137312 2923825.0 \n", "7 Libya North Africa 6678559 384686.0 \n", "8 Nigeria West Africa 195874685 1999885.0 \n", "9 Republic of the Congo Central Africa 5125821 260000.0 \n", "10 Saudi Arabia Middle East 33702756 10460710.0 \n", "11 UAE Middle East 9630959 3106077.0 \n", "12 Venezuela South America 28887118 2276967.0 \n", "13 Ecuador South America 17084358 548421.0 \n", "14 Indonesia Southeast Asia 267670543 833667.0 \n", "15 Qatar Middle East 2781682 1522902.0 \n", "\n", " Proven Reserves Membership Active \n", "0 1.220000e+10 True \n", "1 8.423000e+09 True \n", "2 NaN True \n", "3 2.000000e+09 True \n", "4 1.575300e+11 True \n", "5 1.430690e+11 True \n", "6 1.015000e+11 True \n", "7 4.836300e+10 True \n", "8 3.707000e+10 True \n", "9 1.600000e+09 True \n", "10 2.665780e+11 True \n", "11 9.780000e+10 True \n", "12 2.999530e+11 True \n", "13 8.273000e+09 False \n", "14 3.692500e+09 False \n", "15 2.524400e+10 False " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df['Membership Active'] = [True]*13 + [False]*3\n", "OPEC_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New columns can also be created from operations on existing columns:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven ReservesMembership ActiveProduction per cap
0AlgeriaNorth Africa422284081348361.01.220000e+10True0.031930
1AngolaSouthern Africa308097871769615.08.423000e+09True0.057437
2Equatorial GuineaCentral Africa1308975NaNNaNTrueNaN
3GabonCentral Africa2119275210820.02.000000e+09True0.099477
4IranMiddle East818001883990956.01.575300e+11True0.048789
5IraqMiddle East384336004451516.01.430690e+11True0.115824
6KuwaitMiddle East41373122923825.01.015000e+11True0.706697
7LibyaNorth Africa6678559384686.04.836300e+10True0.057600
8NigeriaWest Africa1958746851999885.03.707000e+10True0.010210
9Republic of the CongoCentral Africa5125821260000.01.600000e+09True0.050724
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11True0.310381
11UAEMiddle East96309593106077.09.780000e+10True0.322510
12VenezuelaSouth America288871182276967.02.999530e+11True0.078823
13EcuadorSouth America17084358548421.08.273000e+09False0.032101
14IndonesiaSoutheast Asia267670543833667.03.692500e+09False0.003115
15QatarMiddle East27816821522902.02.524400e+10False0.547475
\n", "
" ], "text/plain": [ " Country Region Population Oil Production \\\n", "0 Algeria North Africa 42228408 1348361.0 \n", "1 Angola Southern Africa 30809787 1769615.0 \n", "2 Equatorial Guinea Central Africa 1308975 NaN \n", "3 Gabon Central Africa 2119275 210820.0 \n", "4 Iran Middle East 81800188 3990956.0 \n", "5 Iraq Middle East 38433600 4451516.0 \n", "6 Kuwait Middle East 4137312 2923825.0 \n", "7 Libya North Africa 6678559 384686.0 \n", "8 Nigeria West Africa 195874685 1999885.0 \n", "9 Republic of the Congo Central Africa 5125821 260000.0 \n", "10 Saudi Arabia Middle East 33702756 10460710.0 \n", "11 UAE Middle East 9630959 3106077.0 \n", "12 Venezuela South America 28887118 2276967.0 \n", "13 Ecuador South America 17084358 548421.0 \n", "14 Indonesia Southeast Asia 267670543 833667.0 \n", "15 Qatar Middle East 2781682 1522902.0 \n", "\n", " Proven Reserves Membership Active Production per cap \n", "0 1.220000e+10 True 0.031930 \n", "1 8.423000e+09 True 0.057437 \n", "2 NaN True NaN \n", "3 2.000000e+09 True 0.099477 \n", "4 1.575300e+11 True 0.048789 \n", "5 1.430690e+11 True 0.115824 \n", "6 1.015000e+11 True 0.706697 \n", "7 4.836300e+10 True 0.057600 \n", "8 3.707000e+10 True 0.010210 \n", "9 1.600000e+09 True 0.050724 \n", "10 2.665780e+11 True 0.310381 \n", "11 9.780000e+10 True 0.322510 \n", "12 2.999530e+11 True 0.078823 \n", "13 8.273000e+09 False 0.032101 \n", "14 3.692500e+09 False 0.003115 \n", "15 2.524400e+10 False 0.547475 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df['Production per cap'] = OPEC_df['Oil Production'] / OPEC_df['Population']\n", "OPEC_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting DataFrames\n", "\n", "We can sort a DataFrame by the values in a given column:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryRegionPopulationOil ProductionProven ReservesMembership ActiveProduction per cap
10Saudi ArabiaMiddle East3370275610460710.02.665780e+11True0.310381
5IraqMiddle East384336004451516.01.430690e+11True0.115824
4IranMiddle East818001883990956.01.575300e+11True0.048789
11UAEMiddle East96309593106077.09.780000e+10True0.322510
6KuwaitMiddle East41373122923825.01.015000e+11True0.706697
12VenezuelaSouth America288871182276967.02.999530e+11True0.078823
8NigeriaWest Africa1958746851999885.03.707000e+10True0.010210
1AngolaSouthern Africa308097871769615.08.423000e+09True0.057437
15QatarMiddle East27816821522902.02.524400e+10False0.547475
0AlgeriaNorth Africa422284081348361.01.220000e+10True0.031930
14IndonesiaSoutheast Asia267670543833667.03.692500e+09False0.003115
13EcuadorSouth America17084358548421.08.273000e+09False0.032101
7LibyaNorth Africa6678559384686.04.836300e+10True0.057600
9Republic of the CongoCentral Africa5125821260000.01.600000e+09True0.050724
3GabonCentral Africa2119275210820.02.000000e+09True0.099477
2Equatorial GuineaCentral Africa1308975NaNNaNTrueNaN
\n", "
" ], "text/plain": [ " Country Region Population Oil Production \\\n", "10 Saudi Arabia Middle East 33702756 10460710.0 \n", "5 Iraq Middle East 38433600 4451516.0 \n", "4 Iran Middle East 81800188 3990956.0 \n", "11 UAE Middle East 9630959 3106077.0 \n", "6 Kuwait Middle East 4137312 2923825.0 \n", "12 Venezuela South America 28887118 2276967.0 \n", "8 Nigeria West Africa 195874685 1999885.0 \n", "1 Angola Southern Africa 30809787 1769615.0 \n", "15 Qatar Middle East 2781682 1522902.0 \n", "0 Algeria North Africa 42228408 1348361.0 \n", "14 Indonesia Southeast Asia 267670543 833667.0 \n", "13 Ecuador South America 17084358 548421.0 \n", "7 Libya North Africa 6678559 384686.0 \n", "9 Republic of the Congo Central Africa 5125821 260000.0 \n", "3 Gabon Central Africa 2119275 210820.0 \n", "2 Equatorial Guinea Central Africa 1308975 NaN \n", "\n", " Proven Reserves Membership Active Production per cap \n", "10 2.665780e+11 True 0.310381 \n", "5 1.430690e+11 True 0.115824 \n", "4 1.575300e+11 True 0.048789 \n", "11 9.780000e+10 True 0.322510 \n", "6 1.015000e+11 True 0.706697 \n", "12 2.999530e+11 True 0.078823 \n", "8 3.707000e+10 True 0.010210 \n", "1 8.423000e+09 True 0.057437 \n", "15 2.524400e+10 False 0.547475 \n", "0 1.220000e+10 True 0.031930 \n", "14 3.692500e+09 False 0.003115 \n", "13 8.273000e+09 False 0.032101 \n", "7 4.836300e+10 True 0.057600 \n", "9 1.600000e+09 True 0.050724 \n", "3 2.000000e+09 True 0.099477 \n", "2 NaN True NaN " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df.sort_values(by='Oil Production', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grouping\n", "\n", "We can also group entries by a common property. In this dataset, Region is the natural grouping. We have to specify what operation is applied to numeric data, e.g. adding or averaging the values for a given group." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PopulationOil ProductionProven ReservesMembership ActiveProduction per cap
Region
Central Africa8554071470820.03.600000e+0930.150201
Middle East17048649726455986.07.917210e+1152.051676
North Africa489069671733047.06.056300e+1020.089530
South America459714762825388.03.082260e+1110.110924
Southeast Asia267670543833667.03.692500e+0900.003115
Southern Africa308097871769615.08.423000e+0910.057437
West Africa1958746851999885.03.707000e+1010.010210
\n", "
" ], "text/plain": [ " Population Oil Production Proven Reserves \\\n", "Region \n", "Central Africa 8554071 470820.0 3.600000e+09 \n", "Middle East 170486497 26455986.0 7.917210e+11 \n", "North Africa 48906967 1733047.0 6.056300e+10 \n", "South America 45971476 2825388.0 3.082260e+11 \n", "Southeast Asia 267670543 833667.0 3.692500e+09 \n", "Southern Africa 30809787 1769615.0 8.423000e+09 \n", "West Africa 195874685 1999885.0 3.707000e+10 \n", "\n", " Membership Active Production per cap \n", "Region \n", "Central Africa 3 0.150201 \n", "Middle East 5 2.051676 \n", "North Africa 2 0.089530 \n", "South America 1 0.110924 \n", "Southeast Asia 0 0.003115 \n", "Southern Africa 1 0.057437 \n", "West Africa 1 0.010210 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df.groupby('Region').sum()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PopulationOil ProductionProven ReservesMembership ActiveProduction per cap
Region
Central Africa2.851357e+06235410.01.800000e+091.0000000.075100
Middle East2.841442e+074409331.01.319535e+110.8333330.341946
North Africa2.445348e+07866523.53.028150e+101.0000000.044765
South America2.298574e+071412694.01.541130e+110.5000000.055462
Southeast Asia2.676705e+08833667.03.692500e+090.0000000.003115
Southern Africa3.080979e+071769615.08.423000e+091.0000000.057437
West Africa1.958747e+081999885.03.707000e+101.0000000.010210
\n", "
" ], "text/plain": [ " Population Oil Production Proven Reserves \\\n", "Region \n", "Central Africa 2.851357e+06 235410.0 1.800000e+09 \n", "Middle East 2.841442e+07 4409331.0 1.319535e+11 \n", "North Africa 2.445348e+07 866523.5 3.028150e+10 \n", "South America 2.298574e+07 1412694.0 1.541130e+11 \n", "Southeast Asia 2.676705e+08 833667.0 3.692500e+09 \n", "Southern Africa 3.080979e+07 1769615.0 8.423000e+09 \n", "West Africa 1.958747e+08 1999885.0 3.707000e+10 \n", "\n", " Membership Active Production per cap \n", "Region \n", "Central Africa 1.000000 0.075100 \n", "Middle East 0.833333 0.341946 \n", "North Africa 1.000000 0.044765 \n", "South America 0.500000 0.055462 \n", "Southeast Asia 0.000000 0.003115 \n", "Southern Africa 1.000000 0.057437 \n", "West Africa 1.000000 0.010210 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "OPEC_df.groupby('Region').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Add to the DataFrame a column for proven reserves per capita. Output:\n", "- The DataFrame in descending order by proven reserves per capita\n", "- The average value of proven reserves per capita for each region" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }