# Atmospheric Carbon Dioxide Analysis
The carbon dioxide record from [Mauna Loa Observatory](https://en.wikipedia.org/wiki/Mauna_Loa_Observatory), known as the “Keeling Curve,” is the world’s longest unbroken record of atmospheric carbon dioxide concentrations. Scientists make atmospheric measurements in remote locations to sample air that is representative of a large volume of Earth’s atmosphere and relatively free from local influences.

The data in this notebook were collected at the Mauna Loa Observatory (MLO) and come from two sources: NOAA and UC San Diego. The NOAA dataset only goes back to 1974, while the UCSD dataset goes back to 1958, when continuous CO2 measurements began at the observatory. This notebook combines the two datasets and looks at the trends over the years.

##### Links
- Datasets: 
    - Kaggle: https://www.kaggle.com/ucsandiego/carbon-dioxide
    - MLO: [https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html)
- Online Notebook: [Jupyter nbviewer](https://nbviewer.jupyter.org/github/kylepollina/Atmospheric_CO2_Analysis/blob/master/Atmospheric%20Carbon%20Dioxide%20Analysis.ipynb)
- Source: https://github.com/kylepollina/Atmospheric_CO2_Analysis
- Author: https://github.com/kylepollina

-------

## Preface

### History of atmospheric carbon dioxide from 800,000 years ago until January, 2019. 
https://www.youtube.com/watch?v=1ZQG59_z83I

In [1]:
%%HTML
<iframe width="640" height="360" src="https://www.youtube.com/embed/1ZQG59_z83I" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### NASA | A Year in the Life of Earth's CO2
https://www.youtube.com/watch?v=x1SgmFa0r04

In [2]:
%%HTML
<iframe width="640" height="360" src="https://www.youtube.com/embed/x1SgmFa0r04" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

------
## The data

In [3]:
from datetime import datetime
import altair as alt
import pandas as pd

### University of California San Diego Dataset
source: [https://www.kaggle.com/ucsandiego/carbon-dioxide](https://www.kaggle.com/ucsandiego/carbon-dioxide)

In [4]:
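# Load the UCSD monthly record and shorten the CO2 column name
ucsd_co2_data = pd.read_csv('data_acda/ucsd_co2_data.csv').rename(columns={'Carbon Dioxide (ppm)': 'CO2 (ppm)'})
# Build a first-of-month timestamp from the Year and Month columns (explicit format avoids ambiguous parsing)
ucsd_co2_data['Date'] = pd.to_datetime(ucsd_co2_data['Year'].astype(str) + ' ' + ucsd_co2_data['Month'].astype(str), format='%Y %m')
ucsd_co2_data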

Unnamed: 0,Year,Month,Decimal Date,CO2 (ppm),Seasonally Adjusted CO2 (ppm),Carbon Dioxide Fit (ppm),Seasonally Adjusted CO2 Fit (ppm),Date
0,1958,1,1958.0411,,,,,1958-01-01
1,1958,2,1958.1260,,,,,1958-02-01
2,1958,3,1958.2027,315.69,314.42,316.18,314.89,1958-03-01
3,1958,4,1958.2877,317.45,315.15,317.30,314.98,1958-04-01
4,1958,5,1958.3699,317.50,314.73,317.83,315.06,1958-05-01
...,...,...,...,...,...,...,...,...
715,2017,8,2017.6219,,,,,2017-08-01
716,2017,9,2017.7068,,,,,2017-09-01
717,2017,10,2017.7890,,,,,2017-10-01
718,2017,11,2017.8740,,,,,2017-11-01


In [5]:
alt.Chart(ucsd_co2_data).mark_line().encode(
    x = alt.X('Date', type='temporal'),
    y = alt.Y('CO2 (ppm)', type='quantitative', scale=alt.Scale(zero=False)),
    color = alt.value('blue')
).properties(title="MLO Carbon Dioxide in PPM over Time (UCSD Data)", width=700).interactive()

### NOAA Datasets
- Mauna Loa Observatory data source: [https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html)
- Global Trends data source: [https://www.esrl.noaa.gov/gmd/ccgg/trends/gl_data.html](https://www.esrl.noaa.gov/gmd/ccgg/trends/gl_data.html)


The NOAA datasets are plain-text files that need to be parsed into DataFrames.
Here is an excerpt from the weekly Mauna Loa file:

In [6]:
%%bash
tail -n +48 data_acda/co2_weekly_mlo.txt | head -n 5

#      Start of week      CO2 molfrac           (-999.99 = no data)  increase
# (yr, mon, day, decimal)    (ppm)  #days       1 yr ago  10 yr ago  since 1800
  1974   5  19  1974.3795    333.34  6          -999.99   -999.99     50.36
  1974   5  26  1974.3986    332.95  6          -999.99   -999.99     50.06
  1974   6   2  1974.4178    332.32  5          -999.99   -999.99     49.57


In [7]:
mlo_co2_data = {
    'date': [], 'year': [], 'month': [], 'day': [],
    'decimal date': [], 'CO2 (ppm)': [], '#days': []
}
with open('data_acda/co2_weekly_mlo.txt', 'r') as file:
    # Skip the 49 header lines; per the excerpt above, data rows start on line 50 of the file
    raw_data = file.readlines()[49:]

for row in raw_data:
    data = row.split()
    # -999.99 marks a week with no CO2 measurement, so skip it
    if data[4] == '-999.99':
        continue

    # Store numbers as numbers (not strings) so they sort and plot correctly
    mlo_co2_data['year'].append(int(data[0]))
    mlo_co2_data['month'].append(int(data[1]))
    mlo_co2_data['day'].append(int(data[2]))
    mlo_co2_data['decimal date'].append(float(data[3]))
    mlo_co2_data['CO2 (ppm)'].append(float(data[4]))
    mlo_co2_data['#days'].append(int(data[5]))
    date = datetime(year=int(data[0]), month=int(data[1]), day=int(data[2]))
    mlo_co2_data['date'].append(date)

# Missing weeks were already skipped in the loop, so no further filtering is needed
mlo_co2_data = pd.DataFrame(mlo_co2_data)
mlo_co2_data

Unnamed: 0,date,year,month,day,decimal date,CO2 (ppm),#days
0,1974-05-19,1974,5,19,1974.3795,333.34,6
1,1974-05-26,1974,5,26,1974.3986,332.95,6
2,1974-06-02,1974,6,2,1974.4178,332.32,5
3,1974-06-09,1974,6,9,1974.4370,332.18,7
4,1974-06-16,1974,6,16,1974.4562,332.37,7
...,...,...,...,...,...,...,...
2406,2020-11-15,2020,11,15,2020.8730,412.53,6
2407,2020-11-22,2020,11,22,2020.8921,413.84,6
2408,2020-11-29,2020,11,29,2020.9112,413.76,7
2409,2020-12-06,2020,12,6,2020.9303,413.39,7
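
The line-by-line parsing above could probably also be done with `pd.read_csv`, which can skip `#` comment lines, split on whitespace, and turn the `-999.99` sentinel into `NaN` in one call. The sketch below is an alternative under stated assumptions, not the notebook's method: it reads the three data rows copied from the excerpt above through `io.StringIO` instead of the real file, and the names `sample`, `columns`, and `weekly`, as well as the labels for the last three columns, are illustrative choices based on the file header.

```python
import io

import pandas as pd

# Header lines and data rows copied from the excerpt of co2_weekly_mlo.txt shown above
sample = """\
#      Start of week      CO2 molfrac           (-999.99 = no data)  increase
# (yr, mon, day, decimal)    (ppm)  #days       1 yr ago  10 yr ago  since 1800
  1974   5  19  1974.3795    333.34  6          -999.99   -999.99     50.36
  1974   5  26  1974.3986    332.95  6          -999.99   -999.99     50.06
  1974   6   2  1974.4178    332.32  5          -999.99   -999.99     49.57
"""

# Column labels are assumptions derived from the two header lines
columns = ['year', 'month', 'day', 'decimal date', 'CO2 (ppm)', '#days',
           '1 yr ago', '10 yr ago', 'since 1800']

weekly = pd.read_csv(
    io.StringIO(sample),      # swap in 'data_acda/co2_weekly_mlo.txt' for the real file
    sep=r'\s+',               # columns are separated by runs of whitespace
    comment='#',              # header lines start with '#', so they are skipped
    header=None,
    names=columns,
    na_values=['-999.99'],    # the file's "no data" sentinel becomes NaN
)

# Assemble a datetime column from the year/month/day columns
weekly['date'] = pd.to_datetime(weekly[['year', 'month', 'day']])

# Drop weeks without a CO2 measurement
weekly = weekly.dropna(subset=['CO2 (ppm)'])
```

Because the header is detected by its `#` prefix rather than by a hard-coded line count, this approach should keep working if the length of the file's header changes.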


In [8]:
alt.Chart(mlo_co2_data).mark_line().encode(
    x = alt.X('date', type='temporal'),
    y = alt.Y('CO2 (ppm)', type='quantitative', scale=alt.Scale(zero=False)),
    color = alt.value('green')
).properties(title='MLO Carbon Dioxide in PPM over Time (NOAA Data)', width=700).interactive()


In [9]:
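global_co2_data = {'date': [], 'year': [], 'month': [], 'day': [], 'cycle': [], 'trend': []}
with open('data_acda/co2_trend_gl.txt', 'r') as file:
    # Skip the first 60 lines of the file; the remaining rows are whitespace-separated data
    raw_data = file.readlines()[60:]

for row in raw_data:
    data = row.split()
    # Columns: year, month, day, then the "cycle" and "trend" CO2 values
    # Convert to numbers so the columns are numeric rather than strings
    year = int(data[0])
    month = int(data[1])
    day = int(data[2])
    cycle = float(data[3])
    trend = float(data[4])

    global_co2_data['year'].append(year)
    global_co2_data['month'].append(month)
    global_co2_data['day'].append(day)
    global_co2_data['cycle'].append(cycle)
    global_co2_data['trend'].append(trend)

    # Dates are kept as strings here; Altair parses them when the field is typed as temporal
    date = datetime(year=year, month=month, day=day)
    global_co2_data['date'].append(str(date))

global_co2_data = pd.DataFrame(global_co2_data)
global_co2_data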

Unnamed: 0,date,year,month,day,cycle,trend
0,2010-01-01 00:00:00,2010,1,1,388.28,387.23
1,2010-01-02 00:00:00,2010,1,2,388.30,387.24
2,2010-01-03 00:00:00,2010,1,3,388.32,387.25
3,2010-01-04 00:00:00,2010,1,4,388.34,387.25
4,2010-01-05 00:00:00,2010,1,5,388.36,387.26
...,...,...,...,...,...,...
3685,2020-02-03 00:00:00,2020,2,3,413.52,411.77
3686,2020-02-04 00:00:00,2020,2,4,413.54,411.78
3687,2020-02-05 00:00:00,2020,2,5,413.55,411.78
3688,2020-02-06 00:00:00,2020,2,6,413.57,411.79


In [10]:
alt.Chart(global_co2_data).mark_line().encode(
    x = alt.X('date', type='temporal'),
    y = alt.Y('cycle', type='quantitative', scale=alt.Scale(zero=False)),
    color = alt.value('red')
).properties(title="Global Carbon Dioxide Trends in PPM over Time (NOAA Data)", width=700).interactive()
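
The introduction mentions combining the UCSD and NOAA records. Since the UCSD record is monthly and the NOAA Mauna Loa record is weekly, one possible way to put them on a common footing is to average the weekly values per month and use them to fill gaps in the monthly series. The sketch below is only an illustration under stated assumptions: the UCSD values are made-up placeholders, the NOAA rows are the first four weeks from the excerpt and table above, and the names `ucsd`, `noaa_weekly`, `noaa_monthly`, and `combined` are hypothetical.

```python
import pandas as pd

# Placeholder UCSD-style monthly data (values are made up for illustration)
ucsd = pd.DataFrame({
    'Date': pd.to_datetime(['1974-04-01', '1974-05-01', '1974-06-01']),
    'CO2 (ppm)': [333.0, 333.5, None],   # None stands in for a missing month
})

# NOAA-style weekly data (first four weeks of the MLO record shown above)
noaa_weekly = pd.DataFrame({
    'date': pd.to_datetime(['1974-05-19', '1974-05-26', '1974-06-02', '1974-06-09']),
    'CO2 (ppm)': [333.34, 332.95, 332.32, 332.18],
})

# Average the weekly values per calendar month, labeled by month start ('MS')
# so the index lines up with the first-of-month UCSD dates
noaa_monthly = noaa_weekly.set_index('date')['CO2 (ppm)'].resample('MS').mean()

ucsd_monthly = ucsd.set_index('Date')['CO2 (ppm)']

# Keep the UCSD value where one exists; otherwise fall back to the NOAA monthly mean
combined = ucsd_monthly.combine_first(noaa_monthly)
```

Preferring one source and filling from the other is just one design choice; averaging the two where they overlap, or keeping both in a long-format frame with a source column for plotting, would be equally reasonable.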