{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of contents\n",
    "1. [Settings](#settings)\n",
    "2. [Download](#download)\n",
    "3. [Preprocessing](#pre)\n",
    "4. [Heat demand time series](#demand)\n",
    "5. [COP time series](#cop)\n",
    "6. [Writing](#write)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=settings></a>\n",
    "# 1. Settings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "I recommend running this notebook in a [conda environment, which can be created from the environment.yml file](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) provided with this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Python libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Python modules\n",
    "import os\n",
    "import shutil\n",
    "import pandas as pd\n",
    "from time import time\n",
    "from datetime import date\n",
    "\n",
    "# Custom scripts\n",
    "import scripts.download as download \n",
    "import scripts.read as read\n",
    "import scripts.preprocess as preprocess\n",
    "import scripts.demand as demand\n",
    "import scripts.cop as cop\n",
    "import scripts.write as write\n",
    "import scripts.metadata as metadata\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Version and changes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "version = '2019-08-06'\n",
    "changes = 'Minor revisions'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make directories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "home_path = os.path.realpath('.')\n",
    "\n",
    "input_path = os.path.join(home_path, 'input')\n",
    "interim_path = os.path.join(home_path, 'interim')\n",
    "output_path = os.path.join(home_path, 'output', version)\n",
    "\n",
    "for path in [input_path, interim_path, output_path]:\n",
    "    os.makedirs(path, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select geographical and temporal scope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_countries = ['AT', 'BE', 'BG', 'CZ', 'DE', 'FR', 'GB', 'HR', \n",
    "                 'HU', 'IE', 'LU', 'NL', 'PL', 'RO', 'SI', 'SK'] # available\n",
    "countries = all_countries  # selected for calculation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "year_start = 2008\n",
    "year_end = 2018"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set ECMWF access key\n",
    "In the following, this notebook downloads weather data from the ECMWF server. For accessing this server, follow the steps below:\n",
    "1.  Register at https://apps.ecmwf.int/registration/.\n",
    "2.  Login at https://apps.ecmwf.int/auth/login/.\n",
    "3.  Retrieve your key at https://api.ecmwf.int/v1/key/.\n",
    "4.  Enter your key and your e-mail below.\n",
    "\n",
    "If you have already [installed](https://confluence.ecmwf.int/display/WEBAPI/Access+ECMWF+Public+Datasets#AccessECMWFPublicDatasets-key) your ECMWF KEY, this step is skipped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.isfile(os.path.join(os.environ['USERPROFILE'], \".ecmwfapirc\")):\n",
    "    os.environ[\"ECMWF_API_URL\"] = \"https://api.ecmwf.int/v1\"\n",
    "    os.environ[\"ECMWF_API_KEY\"] = \"XXXXXXXXXXXXXXXXXXXXXX\"\n",
    "    os.environ[\"ECMWF_API_EMAIL\"] = \"john.smith@example.com\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=download></a>\n",
    "# 2. Download\n",
    "In the following, weather and population data is downloaded from the respective sources. For all years and countries, this takes around 45 minutes to run.\n",
    "\n",
    "Note that standard load profile parameters from [BGW](http://www.gwb-netz.de/wa_files/05_bgw_leitfaden_lastprofile_56550.pdf)/[BDEW](https://www.enwg-veroeffentlichungen.de/badtoelz/Netze/Gasnetz/Netzbeschreibung/LF-Abwicklung-von-Standardlastprofilen-Gas-20110630-final.pdf) and energy statistics from the [EU Builidng Database](http://ec.europa.eu/energy/en/eu-buildings-database) are already provided with this notebook in the input directory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Weather data\n",
    "As mentioned above, weather data is downloaded from ECMWF, more specifically form the [ERA-Interim](https://www.ecmwf.int/en/research/climate-reanalysis/era-interim) archive. The following data is retrieved:\n",
    "* Wind: wind speed at 10 m above ground for heating seasons (October-April) in 1979-2016 in monthly resolution \n",
    "* Temperature: ambient air temperature at 2 m above ground for the selected years in six-hourly resolution \n",
    "\n",
    "All data is downloaded for the whole of Europe. If some data already exists on your computer, this data will be skipped in the download process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "download.wind(input_path)\n",
    "download.temperatures(input_path, year_start, year_end)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Population data\n",
    "As mentioned above, population data is downloaded from [EUROSTAT](http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "download.population(input_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=pre></a>\n",
    "# 3. Preprocessing\n",
    "Population and weather data is preprocessed. This takes around 10 minutes to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Re-mapping population data\n",
    "The population data from Eurostat features a 1 km² grid, which country-by-country transformed to the 0.75 x 0.75° grid of the weather data in the following. Interim results are saved/loaded from disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "mapped_population = preprocess.map_population(input_path, countries, interim_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mapped_population['LU']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing weather data\n",
    "\n",
    "The temporal resolution of the weather data is changed as follows:\n",
    "* Temperatures (air and soil): from six-hours to one hour\n",
    "* Wind: from monthly to the average of all heating periods from 1979 to 2016\n",
    "\n",
    "To speed up the calculation, all weather data is filtered by the selected countries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "wind = preprocess.wind(input_path, mapped_population)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "temperature = preprocess.temperature(input_path, year_start, year_end, mapped_population)"
   ]
  },
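  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the temperature step, the following sketch upsamples a six-hourly series to hourly resolution with pandas. The toy values and the use of linear interpolation are assumptions made for this example; `preprocess.temperature` may proceed differently.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy six-hourly air temperatures in degrees Celsius (illustrative values)\n",
    "index = pd.date_range('2018-01-01', periods=5, freq=pd.Timedelta(hours=6))\n",
    "six_hourly = pd.Series([0.0, 3.0, 6.0, 3.0, 0.0], index=index)\n",
    "\n",
    "# Upsample to hourly steps and fill the new steps by linear interpolation\n",
    "hourly = six_hourly.resample(pd.Timedelta(hours=1)).interpolate(method='linear')\n",
    "\n",
    "print(hourly.head(7))\n",
    "```\n",
    "\n",
    "Linear interpolation keeps the original six-hourly values unchanged and only fills the hours in between."
   ]
  },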
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=demand></a>\n",
    "# 4. Heat demand time series\n",
    "For all years and countries, the calculation of heat demand time series takes around 20 minutes to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference temperature\n",
    "\n",
    "To capture the thermal inertia of buildings, the daily reference temperature is calculated as the weighted mean of the ambient air temperature of the actual and the three preceding days. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reference_temperature = demand.reference_temperature(temperature['air'])"
   ]
  },
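  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The weighted mean described above can be sketched as follows. The weights (1 for the current day, halving for each preceding day) are an assumption for this example, as are the toy temperatures; `demand.reference_temperature` may use different weights.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy daily mean air temperatures in degrees Celsius (illustrative values)\n",
    "days = pd.date_range('2018-01-01', periods=6, freq='D')\n",
    "daily_mean = pd.Series([2.0, 4.0, 0.0, -2.0, 1.0, 3.0], index=days)\n",
    "\n",
    "# Assumed weights: 1 for the current day, halving for each preceding day\n",
    "weights = [1.0, 0.5, 0.25, 0.125]\n",
    "\n",
    "# Weighted sum over the current day and the three preceding days\n",
    "weighted_sum = sum(w * daily_mean.shift(lag) for lag, w in enumerate(weights))\n",
    "reference = weighted_sum / sum(weights)\n",
    "\n",
    "# The first three days lack a full history and are therefore NaN\n",
    "print(reference)\n",
    "```\n",
    "\n",
    "Because the weights decay, a cold day still lowers the reference temperature of the following days, but with decreasing influence."
   ]
  },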
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Daily demand\n",
    "\n",
    "Daily demand factors are derived from the reference temperatures using profile functions as described in [BDEW](https://www.enwg-veroeffentlichungen.de/badtoelz/Netze/Gasnetz/Netzbeschreibung/LF-Abwicklung-von-Standardlastprofilen-Gas-20110630-final.pdf)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "daily_parameters = read.daily_parameters(input_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "daily_heat = demand.daily_heat(reference_temperature, \n",
    "                               wind, \n",
    "                               daily_parameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "daily_water = demand.daily_water(reference_temperature,\n",
    "                                 wind,\n",
    "                                 daily_parameters)"
   ]
  },
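  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give an idea of what such a profile function can look like, the sketch below evaluates a sigmoid function of the reference temperature. The functional form, the name `sigmoid_demand` and all parameter values are assumptions for illustration; the parameters actually used are read from the input directory, and the dependence on wind is not covered here.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid_demand(t_ref, a, b, c, d, t_0=40.0):\n",
    "    '''Daily demand factor as a sigmoid function of the reference temperature.'''\n",
    "    return a / (1.0 + (b / (t_ref - t_0)) ** c) + d\n",
    "\n",
    "# Hypothetical parameters, chosen for illustration only\n",
    "params = {'a': 3.0, 'b': -37.0, 'c': 5.7, 'd': 0.1}\n",
    "\n",
    "# Reference temperatures in degrees Celsius (must stay below t_0)\n",
    "t_ref = np.array([-10.0, 0.0, 10.0, 20.0])\n",
    "factors = sigmoid_demand(t_ref, **params)\n",
    "\n",
    "# Colder days give larger demand factors\n",
    "print(factors)\n",
    "```\n",
    "\n",
    "With these example parameters, the factor flattens out at high temperatures, where only a temperature-independent base demand `d` remains."
   ]
  },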
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hourly demand\n",
    "\n",
    "Hourly damand factors are calculated from the daily demand based on hourly factors from [BGW](http://www.gwb-netz.de/wa_files/05_bgw_leitfaden_lastprofile_56550.pdf)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hourly_parameters = read.hourly_parameters(input_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hourly_heat = demand.hourly_heat(daily_heat,\n",
    "                                 reference_temperature, \n",
    "                                 hourly_parameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hourly_water = demand.hourly_water(daily_water,\n",
    "                                   reference_temperature, \n",
    "                                   hourly_parameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hourly_space = (hourly_heat - hourly_water).clip(lower=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Weight and scale\n",
    "The spatial time series are weighted with the population and normalized to 1 TWh yearly demand each. Years included in the building database are scaled accordingly. The time series not spatially aggregated yet because spatial time series are needed for COP calculation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "building_database = read.building_database(input_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_space = demand.finishing(hourly_space, mapped_population, building_database['space'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_water = demand.finishing(hourly_water, mapped_population, building_database['water'])"
   ]
  },
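  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The weighting and scaling can be pictured with the following minimal sketch for a single country with two grid cells. All names and values are hypothetical; `demand.finishing` may be implemented differently.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy hourly demand factors for two grid cells (illustrative values)\n",
    "hours = pd.date_range('2018-01-01', periods=4, freq=pd.Timedelta(hours=1))\n",
    "factors = pd.DataFrame({'cell_1': [1.0, 2.0, 2.0, 1.0],\n",
    "                        'cell_2': [2.0, 4.0, 4.0, 2.0]}, index=hours)\n",
    "\n",
    "# Hypothetical population per grid cell\n",
    "population = pd.Series({'cell_1': 3000.0, 'cell_2': 1000.0})\n",
    "\n",
    "# Weight each grid cell with its population (aligned on column names)\n",
    "weighted = factors * population\n",
    "\n",
    "# Scale so that the sum over all cells and hours equals the target demand\n",
    "target = 1.0  # e.g. 1 TWh for the period covered\n",
    "scaled = weighted * target / weighted.sum().sum()\n",
    "\n",
    "print(scaled.sum().sum())\n",
    "```\n",
    "\n",
    "The result keeps one column per grid cell, which matches the remark above that spatial aggregation is postponed until after the COP calculation."
   ]
  },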
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Safepoint\n",
    "\n",
    "The following cells can be used to save and reload the spatial hourly time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_space.to_pickle(os.path.join(interim_path, 'spatial_space'))\n",
    "spatial_water.to_pickle(os.path.join(interim_path, 'spatial_water'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_space = pd.read_pickle(os.path.join(interim_path, 'spatial_space'))[countries]\n",
    "spatial_water = pd.read_pickle(os.path.join(interim_path, 'spatial_water'))[countries]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregate and combine\n",
    "All heat demand time series are aggregated country-wise and combined into one data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "final_heat = demand.combine(spatial_space, spatial_water)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=cop></a>\n",
    "# 5. COP time series\n",
    "For all years and countries, the calculation of the coefficient of performance (COP) of heat pumps takes around 5 minutes to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Source temperature \n",
    "For air-sourced, ground-sources and groundwater-sourced heat pumps (ASHP, GSHP and WSHP), the relevant heat source temperatures are calculated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "source_temperature = cop.source_temperature(temperature)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sink temperatures\n",
    "Heat sink temperatures, i.e. the temperature level at which the heat pumps have to provide heat, are calculated for floor heating, radiator heating and warm water."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sink_temperature = cop.sink_temperature(temperature)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## COP\n",
    "The COP is derived from the temperature difference between heat sources and sinks using COP curves."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cop_parameters = read.cop_parameters(input_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_cop = cop.spatial_cop(source_temperature, sink_temperature, cop_parameters)"
   ]
  },
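  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of a COP curve, the sketch below evaluates a quadratic function of the temperature difference between sink and source. The quadratic form, the name `cop_curve` and the coefficients are assumptions for this example; the curves actually used are defined by the parameters read from the input directory.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def cop_curve(delta_t, a, b, c):\n",
    "    '''Quadratic COP curve in the sink-source temperature difference.'''\n",
    "    return a + b * delta_t + c * delta_t ** 2\n",
    "\n",
    "# Hypothetical coefficients, for illustration only\n",
    "coefficients = {'a': 6.0, 'b': -0.09, 'c': 0.0005}\n",
    "\n",
    "sink = 40.0                                 # e.g. radiator supply temperature\n",
    "source = np.array([-5.0, 0.0, 5.0, 10.0])   # e.g. ambient air temperature\n",
    "delta_t = sink - source\n",
    "\n",
    "# A smaller temperature difference yields a higher COP\n",
    "print(cop_curve(delta_t, **coefficients))\n",
    "```\n",
    "\n",
    "The same function can be evaluated for every combination of heat source and heat sink, each with its own set of coefficients."
   ]
  },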
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Safepoint\n",
    "\n",
    "The following cells can be used to save and reload the spatial hourly time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_cop.to_pickle(os.path.join(interim_path, 'spatial_cop'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spatial_cop = pd.read_pickle(os.path.join(interim_path, 'spatial_cop'))[countries]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregating and correction\n",
    "The spatial COP time series are weighted with the spatial heat demand and aggregated into national time series. The national time series are corrected for part-load losses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "final_cop = cop.finishing(spatial_cop, spatial_space, spatial_water)"
   ]
  },
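  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the demand-weighted aggregation is shown below for two grid cells. The toy values and the constant correction factor are hypothetical; `cop.finishing` may weight and correct differently.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "hours = pd.date_range('2018-01-01', periods=2, freq=pd.Timedelta(hours=1))\n",
    "\n",
    "# Toy spatial COP and heat demand per grid cell (illustrative values)\n",
    "cop_cells = pd.DataFrame({'cell_1': [3.0, 2.5], 'cell_2': [4.0, 3.5]}, index=hours)\n",
    "heat_cells = pd.DataFrame({'cell_1': [1.0, 3.0], 'cell_2': [3.0, 1.0]}, index=hours)\n",
    "\n",
    "# Demand-weighted mean across grid cells for every hour\n",
    "national = (cop_cells * heat_cells).sum(axis=1) / heat_cells.sum(axis=1)\n",
    "\n",
    "# Assumed constant factor standing in for part-load losses (hypothetical value)\n",
    "correction = 0.85\n",
    "print(national * correction)\n",
    "```\n",
    "\n",
    "Weighting with the heat demand ensures that grid cells with little demand have little influence on the national COP."
   ]
  },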
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## COP averages\n",
    "COP averages (performance factors) are calculated and saved to disk for validation purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cop.validation(final_cop, final_heat, interim_path, 'corrected')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cop.validation(cop.finishing(spatial_cop, spatial_space, spatial_water, correction=1),\n",
    "               final_heat, interim_path, \"uncorrected\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=write></a>\n",
    "# 6. Writing\n",
    "For data and metadata, this takes around 5 minutes to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "As for the OPSD \"Time Series\" package, data are provided in three different \"shapes\":\n",
    "\n",
    "* SingleIndex (easy to read for humans, compatible with datapackage standard, small file size)\n",
    "  * Fileformat: CSV, SQLite\n",
    "* MultiIndex (easy to read into GAMS, not compatible with datapackage standard, small file size)\n",
    "  * Fileformat: CSV, Excel\n",
    "* Stacked (compatible with data package standard, large file size, many rows, too many for Excel)\n",
    "  * Fileformat: CSV"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The different shapes are created before they are saved to files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shaped_dfs = write.shaping(final_heat, final_cop)"
   ]
  },
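  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three shapes can be illustrated with a toy data frame. The column names, the index name `utc_timestamp` and the values are hypothetical and need not match what `write.shaping` produces.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# MultiIndex: columns are (country, attribute) pairs\n",
    "hours = pd.date_range('2018-01-01', periods=2, freq=pd.Timedelta(hours=1),\n",
    "                      name='utc_timestamp')\n",
    "columns = pd.MultiIndex.from_product([['AT', 'DE'], ['heat_demand', 'cop']],\n",
    "                                     names=['country', 'attribute'])\n",
    "multi = pd.DataFrame([[1.0, 3.1, 9.0, 3.0],\n",
    "                      [1.2, 3.0, 9.5, 2.9]], index=hours, columns=columns)\n",
    "\n",
    "# SingleIndex: flatten the column levels into one string per column\n",
    "single = multi.copy()\n",
    "single.columns = ['_'.join(col) for col in multi.columns]\n",
    "\n",
    "# Stacked: one row per timestamp, country and attribute\n",
    "stacked = multi.stack(['country', 'attribute']).rename('value').reset_index()\n",
    "\n",
    "print(single.columns.tolist())\n",
    "print(stacked.shape)\n",
    "```\n",
    "\n",
    "The stacked shape has one row per value, which explains why it becomes too long for Excel when all years and countries are included."
   ]
  },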
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write data to an SQL-database, ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "write.to_sql(shaped_dfs, output_path, home_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and to CSV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "write.to_csv(shaped_dfs, output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Writing to Excel takes extremely long. As a workaround, a copy of the multi-indexed data is writtten to CSV and manually converted to Excel."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Metadata\n",
    "The metadata is reported in a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata.make_json(shaped_dfs, version, changes, year_start, year_end, output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Copy input data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.copytree(input_path, os.path.join(output_path, 'original_data'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checksums"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata.checksums(output_path, home_path)"
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}