{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Atmospheric Carbon Dioxide Analysis\n", "The carbon dioxide record from [Mauna Loa Observatory](https://en.wikipedia.org/wiki/Mauna_Loa_Observatory), known as the “Keeling Curve,” is the world’s longest unbroken record of atmospheric carbon dioxide concentrations. Scientists make atmospheric measurements in remote locations to sample air that is representative of a large volume of Earth’s atmosphere and relatively free from local influences.\n", "\n", "The data in this notebook were collected at the Mauna Loa Observatory (MLO) and come from two sources: NOAA and UC San Diego. The NOAA dataset goes back only to 1974, while the UCSD dataset goes back to 1958, when continuous CO2 measurements began at the observatory. This notebook combines the two datasets and takes a look at the trends over the years.\n", "\n", "##### Links\n", "- Datasets:\n", "    - Kaggle: https://www.kaggle.com/ucsandiego/carbon-dioxide\n", "    - MLO: [https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html)\n", "- Online Notebook: [Jupyter nbviewer](https://nbviewer.jupyter.org/github/kylepollina/Atmospheric_CO2_Analysis/blob/master/Atmospheric%20Carbon%20Dioxide%20Analysis.ipynb)\n", "- Source: https://github.com/kylepollina/Atmospheric_CO2_Analysis\n", "- Author: https://github.com/kylepollina\n", "\n", "-------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### History of atmospheric carbon dioxide from 800,000 years ago until January 2019. 
\n", "https://www.youtube.com/watch?v=1ZQG59_z83I" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%HTML\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NASA | A Year in the Life of Earth's CO2\n", "https://www.youtube.com/watch?v=x1SgmFa0r04" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "jupyter": { "source_hidden": true } }, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%HTML\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------\n", "## The data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "import altair as alt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### University of California San Diego Dataset\n", "source: [https://www.kaggle.com/ucsandiego/carbon-dioxide](https://www.kaggle.com/ucsandiego/carbon-dioxide)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Month Decimal Date CO2 (ppm) Seasonally Adjusted CO2 (ppm) \\\n", "0 1958 1 1958.0411 NaN NaN \n", "1 1958 2 1958.1260 NaN NaN \n", "2 1958 3 1958.2027 315.69 314.42 \n", "3 1958 4 1958.2877 317.45 315.15 \n", "4 1958 5 1958.3699 317.50 314.73 \n", ".. ... ... ... ... ... \n", "715 2017 8 2017.6219 NaN NaN \n", "716 2017 9 2017.7068 NaN NaN \n", "717 2017 10 2017.7890 NaN NaN \n", "718 2017 11 2017.8740 NaN NaN \n", "719 2017 12 2017.9562 NaN NaN \n", "\n", " Carbon Dioxide Fit (ppm) Seasonally Adjusted CO2 Fit (ppm) Date \n", "0 NaN NaN 1958-01-01 \n", "1 NaN NaN 1958-02-01 \n", "2 316.18 314.89 1958-03-01 \n", "3 317.30 314.98 1958-04-01 \n", "4 317.83 315.06 1958-05-01 \n", ".. ... ... ... \n", "715 NaN NaN 2017-08-01 \n", "716 NaN NaN 2017-09-01 \n", "717 NaN NaN 2017-10-01 \n", "718 NaN NaN 2017-11-01 \n", "719 NaN NaN 2017-12-01 \n", "\n", "[720 rows x 8 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ucsd_co2_data = pd.read_csv('data_acda/ucsd_co2_data.csv').rename(columns={'Carbon Dioxide (ppm)': 'CO2 (ppm)'})\n", "ucsd_co2_data['Date'] = pd.to_datetime(ucsd_co2_data['Year'].astype(str) + ' ' + ucsd_co2_data['Month'].astype(str))\n", "ucsd_co2_data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(ucsd_co2_data).mark_line().encode(\n", "    x=alt.X('Date', type='temporal'),\n", "    y=alt.Y('CO2 (ppm)', type='quantitative', scale=alt.Scale(zero=False)),\n", "    color=alt.value('blue')\n", ").properties(title=\"MLO Carbon Dioxide in PPM over Time (UCSD Data)\", width=700).interactive()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NOAA Datasets\n", "- Mauna Loa Observatory data source: [https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html)\n", "- Global Trends data source: [https://www.esrl.noaa.gov/gmd/ccgg/trends/gl_data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/gl_data.html)\n" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "The NOAA datasets are plain-text files that need to be parsed into DataFrames.\n", "Here is an excerpt from the weekly Mauna Loa file:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Start of week CO2 molfrac (-999.99 = no data) increase\n", "# (yr, mon, day, decimal) (ppm) #days 1 yr ago 10 yr ago since 1800\n", " 1974 5 19 1974.3795 333.34 6 -999.99 -999.99 50.36\n", " 1974 5 26 1974.3986 332.95 6 -999.99 -999.99 50.06\n", " 1974 6 2 1974.4178 332.32 5 -999.99 -999.99 49.57\n" ] } ], "source": [ "%%bash\n", "tail -n +48 data_acda/co2_weekly_mlo.txt | head -n 5" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " date year month day decimal date CO2 (ppm) #days\n", "0 1974-05-26 1974 5 26 1974.3986 332.95 6\n", "1 1974-06-02 1974 6 2 1974.4178 332.32 5\n", "2 1974-06-09 1974 6 9 1974.4370 332.18 7\n", "3 1974-06-16 1974 6 16 1974.4562 332.37 7\n", "4 1974-06-23 1974 6 23 1974.4753 331.59 6\n", "... ... ... ... .. ... ... ...\n", "2405 2020-11-15 2020 11 15 2020.8730 412.53 6\n", "2406 2020-11-22 2020 11 22 2020.8921 413.84 6\n", "2407 2020-11-29 2020 11 29 2020.9112 413.76 7\n", "2408 2020-12-06 2020 12 6 2020.9303 413.39 7\n", "2409 2020-12-13 2020 12 13 2020.9495 413.92 7\n", "\n", "[2410 rows x 7 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Column lists that are filled row by row and then turned into a DataFrame.\n", "mlo_co2_data = {\n", "    'date': [], 'year': [], 'month': [], 'day': [],\n", "    'decimal date': [], 'CO2 (ppm)': [], '#days': []\n", "}\n", "with open('data_acda/co2_weekly_mlo.txt', 'r') as file:\n", "    # Start parsing at line 51 of the file, past the commented header;\n", "    # the first row kept is 1974-05-26.\n", "    raw_data = file.readlines()[50:]\n", "\n", "    for row in raw_data:\n", "        data = row.split()\n", "        # NOAA marks missing weekly values with -999.99; skip those rows.\n", "        if data[4] == '-999.99':\n", "            continue\n", "\n", "        # Convert each field to a numeric type instead of storing raw strings.\n", "        mlo_co2_data['year'].append(int(data[0]))\n", "        mlo_co2_data['month'].append(int(data[1]))\n", "        mlo_co2_data['day'].append(int(data[2]))\n", "        mlo_co2_data['decimal date'].append(float(data[3]))\n", "        mlo_co2_data['CO2 (ppm)'].append(float(data[4]))\n", "        mlo_co2_data['#days'].append(int(data[5]))\n", "        date = datetime(year=int(data[0]), month=int(data[1]), day=int(data[2]))\n", "        mlo_co2_data['date'].append(date)\n", "\n", "mlo_co2_data = pd.DataFrame(mlo_co2_data)\n", "mlo_co2_data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(mlo_co2_data).mark_line().encode(\n", "    x=alt.X('date', type='temporal'),\n", "    y=alt.Y('CO2 (ppm)', type='quantitative', scale=alt.Scale(zero=False)),\n", "    color=alt.value('green')\n", ").properties(title='MLO Carbon Dioxide in PPM over Time (NOAA Data)', width=700).interactive()\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " date year month day cycle trend\n", "0 2010-01-01 00:00:00 2010 1 1 388.28 387.23\n", "1 2010-01-02 00:00:00 2010 1 2 388.30 387.24\n", "2 2010-01-03 00:00:00 2010 1 3 388.32 387.25\n", "3 2010-01-04 00:00:00 2010 1 4 388.34 387.25\n", "4 2010-01-05 00:00:00 2010 1 5 388.36 387.26\n", "... ... ... ... .. ... ...\n", "3685 2020-02-03 00:00:00 2020 2 3 413.52 411.77\n", "3686 2020-02-04 00:00:00 2020 2 4 413.54 411.78\n", "3687 2020-02-05 00:00:00 2020 2 5 413.55 411.78\n", "3688 2020-02-06 00:00:00 2020 2 6 413.57 411.79\n", "3689 2020-02-07 00:00:00 2020 2 7 413.58 411.80\n", "\n", "[3690 rows x 6 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Column lists that are filled row by row and then turned into a DataFrame.\n", "global_co2_data = {'date': [], 'year': [], 'month': [], 'day': [], 'cycle': [], 'trend': []}\n", "with open('data_acda/co2_trend_gl.txt', 'r') as file:\n", "    # Skip the first 60 lines of the file before parsing.\n", "    raw_data = file.readlines()[60:]\n", "\n", "    for row in raw_data:\n", "        data = row.split()\n", "        # Convert each field to a numeric type instead of storing raw strings.\n", "        year = int(data[0])\n", "        month = int(data[1])\n", "        day = int(data[2])\n", "        cycle = float(data[3])\n", "        trend = float(data[4])\n", "\n", "        global_co2_data['year'].append(year)\n", "        global_co2_data['month'].append(month)\n", "        global_co2_data['day'].append(day)\n", "        global_co2_data['cycle'].append(cycle)\n", "        global_co2_data['trend'].append(trend)\n", "\n", "        # The date is stored as a string; the chart encodes it with type='temporal'.\n", "        date = datetime(year=year, month=month, day=day)\n", "        global_co2_data['date'].append(str(date))\n", "\n", "global_co2_data = pd.DataFrame(global_co2_data)\n", "global_co2_data" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(global_co2_data).mark_line().encode(\n", "    x=alt.X('date', type='temporal'),\n", "    y=alt.Y('cycle', type='quantitative', scale=alt.Scale(zero=False)),\n", "    color=alt.value('red')\n", ").properties(title=\"Global Carbon Dioxide Trends in PPM over Time (NOAA Data)\", width=700).interactive()" ] } ], "metadata": { "kernelspec": { "display_name": "PyCharm (notebooks)", "language": "python", "name": "pycharm-77f5979f" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 4 }