{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Earth Surface Temperature Analysis\n", "----------\n", "In this analysis I look at the dataset _[Climate Change: Earth Surface Temperature Data](https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data)_ by Berkeley Earth. The dataset consists of temperature readings from around the world that go back to at least the 1760s in some areas and run through 2013.\n", "\n", "*Note*: This analysis does not include the 95% confidence interval for temperatures.\n", "\n", "This notebook is best viewed with [Jupyter nbviewer](https://nbviewer.jupyter.org/github/kylepollina/Surface_Temperature_Analysis/blob/master/Earth%20Surface%20Temperature%20Analysis.ipynb). The *Contents* links will not work on GitHub.\n", "\n", "##### Contents\n", "1. [Preface](#Preface)\n", "2. [Global Land Temperatures By Country](#Global-Land-Temperatures-By-Country)\n", "3. [Global Land Temperatures By Major City](#Global-Land-Temperatures-By-Major-City)\n", "4. [Global Land Temperatures By State](#Global-Land-Temperatures-By-State)\n", "\n", " \n", "\n", "##### Links\n", "- Dataset: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data\n", "- Notebook: [Jupyter nbviewer](https://nbviewer.jupyter.org/github/kylepollina/Surface_Temperature_Analysis/blob/master/Earth%20Surface%20Temperature%20Analysis.ipynb)\n", "- Source: https://github.com/circulEarth/SurfaceTemperatures\n", "- Author: https://github.com/kylepollina" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preface\n", "----------\n", "\n", "1. [The data](#The-data)\n", "2. 
[Helper functions](#Helper-functions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Packages\n", "- [pandas](https://pandas.pydata.org/): Used for reading data into [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) objects\n", "- [altair](https://altair-viz.github.io/): Used for plotting and interactive visualization\n", "- [os](https://docs.python.org/3/library/os.html): Used for operating system interfaces, specifically [walking](https://docs.python.org/3/library/os.html?highlight=os%20walk#os.walk) through the filesystem\n", "- [datetime.date](https://docs.python.org/2/library/datetime.html#date-objects): Used for parsing dates from strings with [datetime.date.fromisoformat()](https://docs.python.org/3/library/datetime.html#datetime.date.fromisoformat)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "import altair as alt\n", "import os\n", "from datetime import date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The data\n", "This dataset consists of five CSV files. I have downloaded them into a directory named `data_sta`. The `datafiles` list below holds the path of each of the five .csv files." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['data_sta/GlobalLandTemperaturesByCountry.csv', 'data_sta/GlobalLandTemperaturesByMajorCity.csv', 'data_sta/GlobalLandTemperaturesByState.csv', 'data_sta/GlobalTemperatures.csv', 'data_sta/GlobalLandTemperaturesByCity.csv']\n" ] } ], "source": [ "datafiles = []\n", "\n", "# Walk the data directory and collect the path of every file in it\n", "for directory, _, files in os.walk('data_sta'):\n", "    for file in files:\n", "        filename = os.path.join(directory, file)\n", "        datafiles.append(filename)\n", "\n", "print(datafiles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`print_datafiles()` - Print each data file together with its index in the `datafiles` list" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0. data_sta/GlobalLandTemperaturesByCountry.csv\n", "1. data_sta/GlobalLandTemperaturesByMajorCity.csv\n", "2. data_sta/GlobalLandTemperaturesByState.csv\n", "3. data_sta/GlobalTemperatures.csv\n", "4. data_sta/GlobalLandTemperaturesByCity.csv\n" ] } ], "source": [ "def print_datafiles():\n", "    # One line per file, formatted as '<index>. <path>'\n", "    for index, filename in enumerate(datafiles):\n", "        print(str(index) + \". \" + filename)\n", "\n", "print_datafiles()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Temperatures by Country" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|        | dt         | AverageTemperature | AverageTemperatureUncertainty | Country  |\n",
"|--------|------------|--------------------|-------------------------------|----------|\n",
"| 0      | 1743-11-01 | 4.384              | 2.294                         | Åland    |\n",
"| 5      | 1744-04-01 | 1.530              | 4.680                         | Åland    |\n",
"| 6      | 1744-05-01 | 6.702              | 1.789                         | Åland    |\n",
"| 7      | 1744-06-01 | 11.609             | 1.577                         | Åland    |\n",
"| 8      | 1744-07-01 | 15.342             | 1.410                         | Åland    |\n",
"| ...    | ...        | ...                | ...                           | ...      |\n",
"| 577456 | 2013-04-01 | 21.142             | 0.495                         | Zimbabwe |\n",
"| 577457 | 2013-05-01 | 19.059             | 1.022                         | Zimbabwe |\n",
"| 577458 | 2013-06-01 | 17.613             | 0.473                         | Zimbabwe |\n",
"| 577459 | 2013-07-01 | 17.000             | 0.453                         | Zimbabwe |\n",
"| 577460 | 2013-08-01 | 19.759             | 0.717                         | Zimbabwe |\n",
"
544811 rows × 4 columns
\n", "\n",
"|        | dt         | AverageTemperature | AverageTemperatureUncertainty | State    | Country |\n",
"|--------|------------|--------------------|-------------------------------|----------|---------|\n",
"| 0      | 1855-05-01 | 25.544             | 1.171                         | Acre     | Brazil  |\n",
"| 1      | 1855-06-01 | 24.228             | 1.103                         | Acre     | Brazil  |\n",
"| 2      | 1855-07-01 | 24.371             | 1.044                         | Acre     | Brazil  |\n",
"| 3      | 1855-08-01 | 25.427             | 1.073                         | Acre     | Brazil  |\n",
"| 4      | 1855-09-01 | 25.675             | 1.014                         | Acre     | Brazil  |\n",
"| ...    | ...        | ...                | ...                           | ...      | ...     |\n",
"| 645669 | 2013-04-01 | 15.710             | 0.461                         | Zhejiang | China   |\n",
"| 645670 | 2013-05-01 | 21.634             | 0.578                         | Zhejiang | China   |\n",
"| 645671 | 2013-06-01 | 24.679             | 0.596                         | Zhejiang | China   |\n",
"| 645672 | 2013-07-01 | 29.272             | 1.340                         | Zhejiang | China   |\n",
"| 645673 | 2013-08-01 | 29.202             | 0.869                         | Zhejiang | China   |\n",
"
620027 rows × 5 columns
\n", "\n",
"|        | dt         | AverageTemperature | AverageTemperatureUncertainty | City    | Country       | Latitude | Longitude |\n",
"|--------|------------|--------------------|-------------------------------|---------|---------------|----------|-----------|\n",
"| 0      | 1849-01-01 | 26.704             | 1.435                         | Abidjan | Côte D'Ivoire | 5.63N    | 3.23W     |\n",
"| 1      | 1849-02-01 | 27.434             | 1.362                         | Abidjan | Côte D'Ivoire | 5.63N    | 3.23W     |\n",
"| 2      | 1849-03-01 | 28.101             | 1.612                         | Abidjan | Côte D'Ivoire | 5.63N    | 3.23W     |\n",
"| 3      | 1849-04-01 | 26.140             | 1.387                         | Abidjan | Côte D'Ivoire | 5.63N    | 3.23W     |\n",
"| 4      | 1849-05-01 | 25.427             | 1.200                         | Abidjan | Côte D'Ivoire | 5.63N    | 3.23W     |\n",
"| ...    | ...        | ...                | ...                           | ...     | ...           | ...      | ...       |\n",
"| 239171 | 2013-04-01 | 12.563             | 1.823                         | Xian    | China         | 34.56N   | 108.97E   |\n",
"| 239172 | 2013-05-01 | 18.979             | 0.807                         | Xian    | China         | 34.56N   | 108.97E   |\n",
"| 239173 | 2013-06-01 | 23.522             | 0.647                         | Xian    | China         | 34.56N   | 108.97E   |\n",
"| 239174 | 2013-07-01 | 25.251             | 1.042                         | Xian    | China         | 34.56N   | 108.97E   |\n",
"| 239175 | 2013-08-01 | 24.528             | 0.840                         | Xian    | China         | 34.56N   | 108.97E   |\n",
"
228175 rows × 7 columns
\n", "