{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"width:100%; background-color: #D9EDF7; border: 1px solid #CFCFCF; text-align: left; padding: 10px;\">\n",
    "      <b>Conventional Power Plants: Power Plants in Europe</b>\n",
    "      <ul>\n",
    "        <li><a href=\"main.ipynb\">Main Notebook</a></li>\n",
    "        <li><a href=\"download_and_process_DE.ipynb\">Processing notebook for German power plant</a></li>\n",
    "        <li>Processing notebook for European power plants</li>\n",
    "      </ul>\n",
    "      <br>This notebook is part of the <a href=\"http://data.open-power-system-data.org/DATA PACKAGE NAME HERE\"> Data package name here Data Package</a> of <a href=\"http://open-power-system-data.org\">Open Power System Data</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "* [1. Script setup](#1.-Script-setup)\n",
    "* [2. Data import](#2.-Settings)\n",
    "\t* [2.1 Data sources](#2.1-Data-sources)\n",
    "\t* [2.2 Functions](#2.2-Functions)\n",
    "\t* [2.3 Definition of harmonized output scheme](#2.3-Definition-of-harmonized-output-scheme)\n",
    "* [3. Data processing per country](#3.-Data-processing-per-country)\n",
    "\t* [3.1 Belgium BE](#3.1-Belgium-BE)\n",
    "\t* [3.2 The Netherlands NL](#3.2-The-Netherlands-NL)\n",
    "\t* [3.3 France FR](#3.3-France-FR)\n",
    "\t* [3.4 Poland PL](#3.4-Poland-PL)\n",
    "\t* [3.5 Czech Republic CZ](#3.5-Czech-Republic-CZ)\n",
    "\t* [3.6 Switzerland CH](#3.6-Switzerland-CH)\n",
    "\t* [3.7 Italy IT](#3.7-Italy-IT)\n",
    "\t* [3.8 Finland FI](#3.8-Finland-FI)\n",
    "\t* [3.9 Spain ES](#3.9-Spain-ES)\n",
    "\t* [3.10 United Kingdom UK](#3.10-United-Kingdom-UK)\n",
    "\t* [3.11 Norway NO](#3.11-Norway-NO)\n",
    "\t* [3.12 Sweden SE](#3.12-Sweden-SE)\n",
    "\t* [3.13 Slovakia SK](#3.13-Slovakia-SK)\n",
    "\t* [3.14 Slovenia SI](#3.14-Slovenia-SI)\n",
    "\t* [3.15 Austria AT](#3.15-Austria-AT)\n",
    "\t* [3.16 Denmark DK](#3.16-Denmark-DK)\n",
    "* [4. Consolidation of processed country data](#4.-Consolidation-of-processed-country-data)\n",
    "\t* [4.1 Implementation of energy source levels](#4.1-Implementation-of-energy-source-levels)\n",
    "\t* [4.2 Definition of structure and data types](#4.2-Definition-of-structure-and-data-types)\n",
    "* [5. Result export](#6.-Result-export)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Script setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import of Python modules needed to process the data and creation of required output folders. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os\n",
    "import yaml\n",
    "import json\n",
    "import sqlite3\n",
    "import hashlib\n",
    "from download_and_process_functions import get_sha_hash\n",
    "from download_and_process_functions import add_location_and_EIC\n",
    "\n",
    "\n",
    "# create output folder if they do not exist\n",
    "os.makedirs(os.path.join('output'), exist_ok=True)\n",
    "\n",
    "# set data & input directory\n",
    "data_directory = os.path.join('input','data')\n",
    "locations_directory = os.path.join('input', 'locations')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Data sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike the previous releases of this package, where the data was partially downloaded within this script, all data in the current release is pre-downloaded and provided. The following states all relevant data sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "meta_data = \"\"\"\n",
    "\n",
    "    BE:\n",
    "        filename: ProductionParkOverview.xls\n",
    "        source: https://www.elia.be/en/grid-data/power-generation/generating-facilities#\n",
    "        source_file: https://griddata.elia.be/eliabecontrols.prod/interface/fdn/download/generatingfacilities/xls\n",
    "        filetype: xls\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: Elia\n",
    "        institution: TSO\n",
    "        \n",
    "    NL:\n",
    "        filename: export.csv\n",
    "        source: https://www.tennet.org/english/operational_management/export_data.aspx\n",
    "        source_file: www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2019-4&submit=3\n",
    "        filetype: csv\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: Tennet\n",
    "        institution: TSO\n",
    "\n",
    "    FR:\n",
    "        filename: Production_Capacities.csv\n",
    "        source: https://www.services-rte.com/en/view-data-published-by-rte/production-installed-capacity.html\n",
    "        source_file: NA\n",
    "        filetype: csv\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: RTE\n",
    "        institution: TSO\n",
    "        \n",
    "    PL:\n",
    "        filename: units_list_2019_11_29_PL.csv\n",
    "        source: http://gpi.tge.pl/en/wykaz-jednostek\n",
    "        source_file: http://gpi.tge.pl/en/wykaz-jednostek\n",
    "        date_of_access: Dec 2019\n",
    "        manually_assembled: no\n",
    "        provider: GPI Power Market Data\n",
    "        institution: Information platform\n",
    "        \n",
    "    CZ:\n",
    "        filename: 21915_2019.pdf\n",
    "        source: https://www.ceps.cz/cs/priprava-provozu\n",
    "        source_file: https://www.ceps.cz/cs/priprava-provozu\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: Ceps\n",
    "        institution: TSO\n",
    "        \n",
    "    CH:\n",
    "        filename: 2018  Statistik der Wasserkraftanlagen der Schweiz 31.12.2018.csv\n",
    "        source: https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html\n",
    "        source_file: https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html\n",
    "        date_of access: Dec 2019\n",
    "        manually_assempled: no\n",
    "        provider: Swiss Federal office of energy\n",
    "        institution: Federal Administration\n",
    "        \n",
    "    IT:\n",
    "        filename: 18.xlsx\n",
    "        source: http://www2018.terna.it/it-it/sistemaelettrico/transparencyreport/generation/installedgenerationcapacity.aspx\n",
    "        source_file: http://download.terna.it/terna/0000/0216/18.XLSX\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: Terna\n",
    "        institution: TSO\n",
    "\n",
    "    FI:\n",
    "        filename: Energiaviraston voimalaitosrekisteri.csv\n",
    "        source: https://energiavirasto.fi/toimitusvarmuus\n",
    "        source_file: https://energiavirasto.fi/toimitusvarmuus\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: energiavirasto\n",
    "        institution: energy agency\n",
    "        \n",
    "    ES:\n",
    "        filename: Registro_16_12_2019.csv \n",
    "        source: https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica\n",
    "        source_file: https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica\n",
    "        date_of_access: Dec 2019 \n",
    "        manually_assembled: no \n",
    "        provider: \n",
    "        institution: \n",
    "        \n",
    "    UK:\n",
    "        filename: DUKES_5.11_UK.csv\n",
    "        source: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content\n",
    "        source_file: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content\n",
    "        date_of_access: Dec 2019\n",
    "        manually_assembled: no\n",
    "        provider: UK gov\n",
    "        institution: UK statistics\n",
    "\n",
    "    NO:\n",
    "        filename_thermal: termiske-kraftverk-i-norge-2019.xlsx\n",
    "        filename_hydro: Vannkraftverk.csv\n",
    "        source: https://www.nve.no/\n",
    "        source_file_thermal: https://www.nve.no/media/8967/termiske-kraftverk-i-norge-2019.xlsx\n",
    "        source_file_hydro: https://www.nve.no/energiforsyning/kraftproduksjon/vannkraft/vannkraftdatabase/#\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: no\n",
    "        provider: Norwegian Water Resources and Energy Directorate\n",
    "        institution: Ministry\n",
    "\n",
    "    SE:\n",
    "        filename: input_plant-list_SE.csv\n",
    "        source: https://www.nordpoolgroup.com/\n",
    "        source_file: NA\n",
    "        date_of_access: 2014\n",
    "        manually_assembled: no\n",
    "        provider: Nordpool Group\n",
    "        institution: Market operator\n",
    "\n",
    "    SK:\n",
    "        filename: input_plant-list_SK.csv\n",
    "        source: https://www.seas.sk/thermal-power-plants\n",
    "        source_file: https://www.seas.sk/thermal-power-plants\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: yes \n",
    "        provider: Slovenské elektrárne\n",
    "        institution: joint-stock company\n",
    "                      \n",
    "    SI:\n",
    "        filename: input_plant-list_SI.csv\n",
    "        source: multiples (in document)\n",
    "        source_file: \n",
    "        date_of_access: Dec 2019\n",
    "        manually_assembled: yes\n",
    "        provider: multiples (in document)\n",
    "        institution: Private company\n",
    "                              \n",
    "    AT:\n",
    "        filename_thermal: input_plant-list_AT_thermal.csv\n",
    "        filename_hydro: input_plant-list_AT_hydro.csv\n",
    "        source: multiples (in document)\n",
    "        source_file:\n",
    "        date_of_access: Feb 2020\n",
    "        manually_assembled: yes\n",
    "        provider: multiples (in document)\n",
    "        institution: multiples (in document)                                   \n",
    "                                              \n",
    "    DK:\n",
    "        filename: input_plant-list_DK.csv \n",
    "        source: multiples (in document)\n",
    "        source_file: \n",
    "        date_of_access: Jan 2020\n",
    "        manually_assembled: yes\n",
    "        provider: multiples (in document)\n",
    "        institution: multiples (in document)\n",
    "        \n",
    "\"\"\"\n",
    "\n",
    "# Conversion to JSON (if needed)\n",
    "# meta_data = yaml.load(meta_data, Loader=yaml.BaseLoader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Definition of harmonized output scheme"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To provide a standardizes set of power plant information among all national data sources, a set of required columns is defined which is subsequently filled with available data. The following columns and their structure are the basis for all national data sources. \n",
    "\n",
    "**Note:** If information for specific columns are not available, the data entry is empty. On the other hand, if the national data sources provides other information than required by the scheme, these information are not processed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "columns_sorted = ['name',\n",
    "                  'company',\n",
    "                  'street',\n",
    "                  'postcode',\n",
    "                  'city',\n",
    "                  'country',\n",
    "                  'capacity',\n",
    "                  'energy_source',\n",
    "                  'technology',\n",
    "                  'chp',\n",
    "                  'commissioned',\n",
    "                  'type',\n",
    "                  'lat',\n",
    "                  'lon',\n",
    "                  'eic_code',\n",
    "                  'additional_info',\n",
    "                  'comment',\n",
    "                  'source']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Data processing per country"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 Belgium BE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Belgian transmission network operator [**ELIA**](http://publications.elia.be/upload/ProductionParkOverview.xls?TS=20120416193815). It encompasses a detailed list of Belgian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_BE = os.path.join(data_directory, 'BE','ProductionParkOverview.xls')\n",
    "data_BE = pd.read_excel(filepath_BE,\n",
    "                       sheet_name='ProductionParkOverview',\n",
    "                       skiprows=1)\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_BE = {'ARP': 'company',\n",
    "                   'Generation plant': 'name',\n",
    "                   'Plant Type': 'technology',\n",
    "                   'Technical Nominal Power (MW)': 'capacity',\n",
    "                   'Remarks': 'comment',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "\n",
    "\n",
    "# Apply general template of columns\n",
    "data_BE = data_BE.rename(columns=dict_columns_BE).reindex(columns=columns_sorted)\n",
    "\n",
    "# Drop rows without capacity entries, so that row with \n",
    "# \"Unit connected to Distribution Grid\" is dropped\n",
    "data_BE.dropna(subset=['capacity'], inplace=True)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_BE.capacity = data_BE.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.3 Definition of generation type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The generation type provides information on the 'usage' of the power plants (beside electricity generaiton), thus if the plant is an industrial power plant or provides thermal heat for district heating. \n",
    "\n",
    "The Belgian data source provides only a general information on the heat supply (here: WKK). Thus, due to these general informaiton, we classify corresponding plants as both, industrial or combined heat power plant, and cannot distringuish both types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate entries in column \"type\" according to technology \"WKK\"\n",
    "data_BE.loc[data_BE['technology'] == 'WKK', 'type'] = 'CHP/IPP'\n",
    "data_BE.loc[data_BE['name'].str.contains('WKK'), 'type'] = 'CHP/IPP'\n",
    "\n",
    "# Generate entries in column \"CHP\" according to column \"type\"\n",
    "data_BE.loc[(data_BE['type'] == 'CHP') |\n",
    "            (data_BE['type'] == 'IPP') |\n",
    "            (data_BE['type'] == 'CHP/IPP'), 'chp'] = 'Yes'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate technologies\n",
    "dict_technology_BE = {'GT': 'Gas turbine',\n",
    "                      'BG': np.nan,\n",
    "                      'CL': 'Steam turbine',\n",
    "                      'WKK': np.nan,\n",
    "                      'CCGT': 'Combined cycle',\n",
    "                      'D': np.nan,\n",
    "                      'HU': np.nan,\n",
    "                      'IS': np.nan,\n",
    "                      'NU': 'Steam turbine',\n",
    "                      'TJ': 'Gas turbine',\n",
    "                      'WT': np.nan,\n",
    "                      ' ': np.nan,\n",
    "                      'nan': np.nan,\n",
    "                     }\n",
    "data_BE[\"technology\"].replace(dict_technology_BE, inplace=True)\n",
    "\n",
    "# add technology parameter for steam and gas turbines\n",
    "data_BE.loc[data_BE['name'].str.contains('ST') &\n",
    "            data_BE['technology'].isna(), 'technology'] = 'Steam turbine'\n",
    "\n",
    "data_BE.loc[data_BE['name'].str.contains('GT') &\n",
    "            data_BE['technology'].isna(), 'technology'] = 'Gas turbine'\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.5 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check. Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate energy sources\n",
    "dict_energysources_BE = {'BIO': 'Biomass and biogas',\n",
    "                         'BF': 'Other fossil fuels',\n",
    "                         'CL': 'Lignite',\n",
    "                         'CP': 'Hard coal',\n",
    "                         'CG': 'Other fossil fuels',\n",
    "                         'GO': 'Oil',\n",
    "                         'LF': 'Oil',\n",
    "                         'LV': 'Oil',\n",
    "                         'CP/BF': 'Mixed fossil fuels',\n",
    "                         'CP/CG': 'Mixed fossil fuels',\n",
    "                         'FA/BF': 'Mixed fossil fuels',\n",
    "                         'NG/BF': 'Mixed fossil fuels',\n",
    "                         'NG': 'Natural gas',\n",
    "                         'NU': 'Nuclear',\n",
    "                         'WR': 'Non-renewable waste',\n",
    "                         'WA': 'Hydro',\n",
    "                         'WI': 'Wind',\n",
    "                         'WP': 'Biomass and biogas'}\n",
    "data_BE[\"energy_source\"].replace(dict_energysources_BE, inplace=True)\n",
    "data_BE[\"energy_source\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "\n",
    "# Delete unwanted energy source\n",
    "data_BE = data_BE[data_BE.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "data_BE = add_location_and_EIC('BE', data_BE)\n",
    "\n",
    "# add source\n",
    "data_BE[\"source\"] = \"https://www.elia.be/en/grid-data/power-generation/generating-facilities#\"\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 The Netherlands NL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.1 Data import and merger"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Dutch transmission network operator [**TenneT**](http://www.tennet.org/english/operational_management/export_data.aspx). It encompasses the daily available generation capacity, thus a list of Dutch generation units being operational on a specific day. The data is downloaded for all four quarters in 2018."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Import of quartely data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_NL = os.path.join(data_directory, 'NL', 'export.csv')\n",
    "data_NL = pd.read_csv(filepath_NL, encoding='utf-8')\n",
    "\n",
    "data_NL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge columns \"street\" and \"Number\" to one column called \"Street\"\n",
    "data_NL['street'] = data_NL[['street', 'Number']].apply(\n",
    "    lambda x: '{} {}'.format(x[0], x[1]), axis=1)\n",
    "\n",
    "# Drop columns not needed anymore\n",
    "colsToDrop = ['Location', 'Date', 'Number']\n",
    "data_NL = data_NL.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_NL = {'Connected body': 'company',\n",
    "                   'Entity': 'name',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Capacity': 'capacity',\n",
    "                   'zipcode': 'postcode',\n",
    "                   'place-name': 'city'}\n",
    "data_NL.rename(columns=dict_columns_NL, inplace=True)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_NL.capacity = data_NL.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\". Generation of entries for the column \"technology\" according to information given in the column \"energy_source\" by [TenneT](http://www.tennet.org/english/operational_management/system_data_preparation/Reported_production_capacity/Installed_capacity.aspx)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename types of energy sources\n",
    "dict_energysources_NL = {'E01': 'Solar',\n",
    "                         'E02': 'Wind',\n",
    "                         'E03': 'Hydro',\n",
    "                         'E04': 'Biomass and biogas',\n",
    "                         'E05': 'Hard coal',\n",
    "                         'E06': 'Natural gas',\n",
    "                         'E07': 'Oil',\n",
    "                         'E08': 'Nuclear',\n",
    "                         'E09': 'Other or unspecified energy sources'}\n",
    "\n",
    "data_NL[\"energy_source\"].replace(dict_energysources_NL, inplace=True)\n",
    "\n",
    "# Generate technology entry according to energy source\n",
    "data_NL.loc[data_NL['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_NL.loc[data_NL['energy_source'] == 'Hard coal',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_NL = data_NL[data_NL.energy_source != 'Solar']\n",
    "data_NL = data_NL[data_NL.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.4 Select daily entry with highest available capacity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We estimate the installed capacity by the highest available daily capacity for each unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Filter rows by considering \"name\" and maximum \"capacity\n",
    "data_NL = data_NL.sort_values(\n",
    "    'capacity', ascending=False).groupby('name', as_index=False).first()\n",
    "\n",
    "# Apply general template of columns\n",
    "data_NL = data_NL.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.5 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_NL = add_location_and_EIC('NL', data_NL)\n",
    "\n",
    "# add source\n",
    "data_NL[\"source\"] = \"https://www.tennet.org/english/operational_management/export_data.aspx\"\n",
    "\n",
    "data_NL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.3 France FR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the French transmission network operator [**RTE**](https://www.services-rte.com/en/view-data-published-by-rte/production-installed-capacity.html). It encompasses a detailed list of French generation units with a capacity of more than 100 MW."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_FR = os.path.join(data_directory, 'FR', 'Production_Capacities.csv')\n",
    "data_FR = pd.read_csv(filepath_FR)\n",
    "\n",
    "data_FR.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_FR = {'Type': 'energy_source',\n",
    "                   'Name': 'name',\n",
    "                   'Installed capacity (MW)': 'capacity',\n",
    "                   'Start date of the current version': 'commissioned',\n",
    "                   'Location': 'country'\n",
    "                   }\n",
    "data_FR.rename(columns=dict_columns_FR, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_FR = data_FR.reindex(columns=columns_sorted)\n",
    "\n",
    "# Delete place holder datetime\n",
    "data_FR[\"commissioned\"].replace('01/01/2000', np.nan, inplace=True)\n",
    "\n",
    "# Map commissioned year to Timestamp col\n",
    "data_FR['commissioned_year'] = pd.to_datetime(data_FR['commissioned'], format='%d/%m/%Y')\n",
    "# Reassing commissioned col with year only\n",
    "mask = data_FR['commissioned_year'].notna()\n",
    "data_FR.loc[mask, 'commissioned'] = data_FR.loc[mask].commissioned_year.apply(lambda x: x.year)\n",
    "# Drop not needed col\n",
    "data_FR.drop('commissioned_year', axis=1, inplace=True)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_FR.capacity = data_FR.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3.4 Definition of energy sources and generation of technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for technologies. Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate technology entries according to energy sources\n",
    "data_FR.loc[data_FR['energy_source'] == 'Pumping',\n",
    "            'technology'] = 'Pumped storage'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Hydraulic over water / guided through',\n",
    "            'technology'] = 'Run-of-river'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Hydraulic lakes',\n",
    "            'technology'] = 'Reservoir'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Hard coal',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Translate types of energy sources\n",
    "dict_energysources_FR = {'Other': 'Other or unspecified energy sources',\n",
    "                         'Gas': 'Natural gas',\n",
    "                         'Pumping': 'Hydro',\n",
    "                         'Hydraulic over water / guided through': 'Hydro',\n",
    "                         'Hydraulic lakes': 'Hydro',\n",
    "                         'Biomass': 'Biomass and biogas'}\n",
    "data_FR[\"energy_source\"].replace(dict_energysources_FR, inplace=True)\n",
    "\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_FR = data_FR[data_FR.energy_source != 'Wind']\n",
    "data_FR = data_FR[data_FR.energy_source != 'Solar']\n",
    "data_FR = data_FR[data_FR.energy_source != 'Marine']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3.5 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_FR = add_location_and_EIC('FR', data_FR)\n",
    "\n",
    "# add source\n",
    "data_FR[\"source\"] = \"https://www.services-rte.com/en/view-data-published-by-rte/production-installed-capacity.html\"\n",
    "\n",
    "data_FR.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.4 Poland PL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Polish Power Exchange [**GPI**](http://gpi.tge.pl/en/wykaz-jednostek?p_p_id=powerunits_WAR_powerunitsportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_cacheability=cacheLevelPage&p_p_col_id=column-1&p_p_col_count=1\n",
    "). It encompasses a detailed list of large Polish generation units with information on the single power plant blocks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "filepath_PL = os.path.join(data_directory, 'PL', 'units_list_2019_11_29_PL.csv')\n",
    "data_PL = pd.read_csv(filepath_PL, sep=';')\n",
    "\n",
    "data_PL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename columns\n",
    "dict_columns_PL = {'Generating unit name': 'name',\n",
    "                   'Comments': 'comment',\n",
    "                   'Available capacity [MW]': 'capacity',\n",
    "                   'Basic fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename first column\n",
    "data_PL.columns.values[0] = 'company'\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_PL = {'Generating unit name': 'name',\n",
    "                   'Comments': 'comment',\n",
    "                   'Available capacity [MW]': 'capacity',\n",
    "                   'Basic fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_PL = data_PL.rename(columns=dict_columns_PL)\n",
    "\n",
    "# Fill columns \"energy_source\" and \"company\" with the belonging entries\n",
    "cols = ['energy_source', 'company']\n",
    "data_PL[cols] = data_PL[cols].ffill()\n",
    "\n",
    "# Delete empty and therefore unwanted rows by referring to column \"Generating unit code\"\n",
    "data_PL = data_PL.dropna(subset=['Generating unit code'])\n",
    "\n",
    "# Apply general template of columns\n",
    "data_PL = data_PL.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_PL.capacity = data_PL.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename energy sources types\n",
    "dict_energysources_PL = {'Brown coal': 'Lignite',\n",
    "                         'Black coal': 'Hard coal',\n",
    "                         'Water': 'Hydro',\n",
    "                         'Natural gas': 'Natural gas',\n",
    "                        }\n",
    "data_PL[\"energy_source\"].replace(dict_energysources_PL, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to energy source \"hydro\"\n",
    "data_PL.loc[data_PL['energy_source'] == 'Hydro', 'technology'] = 'Pumped storage'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.5 Additional information on further power plants, geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_PL = add_location_and_EIC('PL', data_PL)\n",
    "\n",
    "# add source\n",
    "data_PL[\"source\"] = \"http://gpi.tge.pl/en/wykaz-jednostek\"\n",
    "\n",
    "data_PL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.5 Czech Republic CZ"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Czech transmission network operator [**CEPS**](https://www.ceps.cz/cs/priprava-provozu). It encompasses the daily available capacity reported by the transmission system operator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_CZ = os.path.join(data_directory, 'CZ', '21915_2019.csv')\n",
    "data_CZ = pd.read_csv(filepath_CZ, encoding='utf-8')\n",
    "\n",
    "data_CZ.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Remove white space from names\n",
    "data_CZ['název elektrárny'] = data_CZ['název elektrárny'].str.strip()\n",
    "data_CZ['oznacení bloku'] = data_CZ['oznacení bloku'].str.strip()\n",
    "data_CZ['Typ'] = data_CZ['Typ'].str.strip()\n",
    "data_CZ['Palivo'] = data_CZ['Palivo'].str.strip()\n",
    "\n",
    "# Insert dummy G1 where plant block is NA\n",
    "data_CZ.loc[data_CZ['oznacení bloku'].isna(), 'oznacení bloku'] = 'G1'\n",
    "\n",
    "# Merge name and block to one column called \"name\"\n",
    "data_CZ['name'] = data_CZ['název elektrárny'].map(str) + ' ' + data_CZ['oznacení bloku']\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_CZ = {'Typ': 'technology',\n",
    "                   'Palivo': 'energy_source',\n",
    "                   'výkon instalovaný (MW)': 'capacity'\n",
    "                   }\n",
    "data_CZ.rename(columns=dict_columns_CZ, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_CZ = data_CZ.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_CZ.capacity = data_CZ.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.3 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate energy source\n",
    "dict_energy_source_CZ = {'VODA': 'Hydro',\n",
    "                        'PLYN': 'Natural gas',\n",
    "                        'OLEJ': 'Oil',\n",
    "                        'URAN': 'Nuclear',\n",
    "                        'HU': 'Lignite',\n",
    "                        'CU': 'Hard coal',\n",
    "                        'BIO': 'Bioenergy'}\n",
    "data_CZ[\"energy_source\"].replace(dict_energy_source_CZ, inplace=True)\n",
    "\n",
    "# Translate technologies\n",
    "dict_technologies_CZ = {'PE': 'Steam turbine',\n",
    "                        'PPE': 'Combined cycle',\n",
    "                        'PSE': 'Combined cycle',\n",
    "                        'JE': 'Steam turbine',\n",
    "                        'VE': np.nan,\n",
    "                        'PVE': 'Pumped storage'}\n",
    "data_CZ[\"technology\"].replace(dict_technologies_CZ, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.4 Additional information on further power plants, geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_CZ = add_location_and_EIC('CZ', data_CZ)\n",
    "\n",
    "# add source\n",
    "data_CZ[\"source\"] = \"https://www.ceps.cz/cs/priprava-provozu\"\n",
    "    \n",
    "data_CZ.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.6 Switzerland CH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.6.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Swiss Ministry of Energy [**BFE**](https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html). It encompasses a detailed list of Swiss hydro generation units with comprehensive information on technical specifications. The list of nuclear generators is manually assembled and provided separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_CH_hydro = os.path.join(data_directory, 'CH', '2018  Statistik der Wasserkraftanlagen der Schweiz 31.12.2018.csv')\n",
    "data_CH = pd.read_csv(filepath_CH_hydro, error_bad_lines=False, sep=';',decimal=',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_CH_nuclear = os.path.join(data_directory, 'CH', 'input_plant-list_CH_nuclear.csv')\n",
    "data_nuclear_CH = pd.read_csv(filepath_CH_nuclear, encoding='utf-8', header=0, index_col=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 3.6.2 Processing of Hydro generator list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, the imported generator list of hydro generators is standardized."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Consolidation of columns**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge columns \"ZE-Erste Inbetriebnahme\" and \"ZE-Letzte Inbetriebnahme\" to one column called \"Commissioned\"\n",
    "data_CH['commissioned'] = data_CH[\n",
    "    ['ZE-Erste-Inbetriebnahme', 'ZE-Letzte-Inbetriebnahme']].apply(\n",
    "        lambda x: max(x[0], x[1]), axis=1)\n",
    "\n",
    "# Merge columns \"Bemerkung (1) - (10)\" to one column \"Comment\"\n",
    "data_CH['comment'] = data_CH[['Bemerkung (1)',\n",
    "                              'Bemerkung (2)',\n",
    "                              'Bemerkung (3)',\n",
    "                              'Bemerkung (4)',\n",
    "                              'Bemerkung (5)',\n",
    "                              'Bemerkung (6)',\n",
    "                              'Bemerkung (7)',\n",
    "                              'Bemerkung (8)',\n",
    "                              'Bemerkung (9)',\n",
    "                              'Bemerkung (10)']].apply(\n",
    "                                  lambda x:\n",
    "                                      '{}; {}; {}; {}; {}; {}; {}; {}; {}; {}'.format(\n",
    "                                          x[0],\n",
    "                                          x[1],\n",
    "                                          x[2],\n",
    "                                          x[3],\n",
    "                                          x[4],\n",
    "                                          x[5],\n",
    "                                          x[6],\n",
    "                                          x[7],\n",
    "                                          x[8],\n",
    "                                          x[9]), axis=1)\n",
    "\n",
    "data_CH['comment'] = data_CH['comment'].str.replace('nan; ', '')\n",
    "data_CH['comment'] = data_CH['comment'].str.replace('nan', '')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Translation and harmonization of columns**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_CH = {'WKA-Name': 'name',\n",
    "                   'ZE-Standort': 'city',\n",
    "                   'WKA-Typ': 'technology',\n",
    "                   'ZE-Status': 'availability',\n",
    "                   'Inst. Turbinenleistung': 'capacity'}\n",
    "data_CH.rename(columns=dict_columns_CH, inplace=True)\n",
    "\n",
    "# Adjust type of entries in column \"Capacity\"\n",
    "data_CH.capacity = data_CH.capacity.astype(float)\n",
    "\n",
    "# Adjust availabilities\n",
    "dict_availabilities_CH = {'im Normalbetrieb': '1',\n",
    "                          'im Bau': '0',\n",
    "                          'im Umbau': '0',\n",
    "                          'stillgelegt': '0'}\n",
    "data_CH[\"availability\"].replace(dict_availabilities_CH, inplace=True)\n",
    "\n",
    "# List only operating plants\n",
    "data_CH = data_CH[data_CH.availability != '0']\n",
    "\n",
    "# Apply general template of columns\n",
    "data_CH = data_CH.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Definition of generation technology types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set energy source to \"hydro\"\n",
    "data_CH['energy_source'] = 'Hydro'\n",
    "\n",
    "# Adjust technologies\n",
    "dict_technologies_CH = {'L': 'Run-of-river',\n",
    "                        'S': 'Reservoir',\n",
    "                        'P': 'Pumped storage with natural inflow',\n",
    "                        'U': 'Pumped storage'}\n",
    "data_CH[\"technology\"].replace(dict_technologies_CH, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.6.3 Merge hydro and nuclear power plant data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add source for hydro\n",
    "data_CH[\"source\"] = \"https://www.bfe.admin.ch/bfe/de/home/versorgung/statistik-und-geodaten/geoinformation/geodaten/wasser/statistik-der-wasserkraftanlagen.html\"\n",
    "\n",
    "# Concat dataframes\n",
    "data_CH = pd.concat([data_CH, data_nuclear_CH], ignore_index=True, sort=False)\n",
    "\n",
    "data_CH.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.6.4 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual hydro power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_CH = add_location_and_EIC('CH', data_CH)\n",
    "\n",
    "data_CH.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.7 Italy IT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.7.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Italian transmission network operator [**TERNA**](http://download.terna.it/terna/0000/0216/18.XLSX). It encompasses a detailed list of Italian generation units of more than 100 MW."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_IT = os.path.join(data_directory, 'IT', '18.XLSX')\n",
    "data_IT = pd.read_excel(filepath_IT, sheet_name='UPR PmaxOver 100MW') \n",
    "\n",
    "data_IT.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.7.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_IT = {'Descrizione Impianto': 'name',\n",
    "                   'TIPOLOGIA': 'energy_source',\n",
    "                   'Comune': 'city',\n",
    "                   'PMAX [MW]': 'capacity',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source',\n",
    "                   'Zona': 'additional_info'}\n",
    "data_IT.rename(columns=dict_columns_IT, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_IT = data_IT.reindex(columns=columns_sorted)\n",
    "\n",
    "# Consider of geographical information in column \"additional_info\"\n",
    "data_IT['additional_info'] = data_IT[['additional_info']].apply(\n",
    "    lambda x: 'Zone: {}'.format(x[0]), axis=1)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_IT.capacity = data_IT.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.7.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy source types mentioned in the column \"energy_sources\". Deletion of rows containing \"wind\" and \"geothermal_power\"as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate types of energy sources\n",
    "dict_energysources_IT = {'GEOTERMICO': 'Geothermal',\n",
    "                         'TERMOELETTRICO': 'Mixed fossil fuels',\n",
    "                         'IDROELETTRICO': 'Hydro',\n",
    "                         'EOLICO': 'Wind'}\n",
    "data_IT[\"energy_source\"].replace(dict_energysources_IT, inplace=True)\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_IT = data_IT[data_IT.energy_source != 'Wind']\n",
    "data_IT = data_IT[data_IT.energy_source != 'Geothermal']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.7.4 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_IT = add_location_and_EIC('IT', data_IT)\n",
    "\n",
    "# add source\n",
    "data_IT[\"source\"] = \"http://www2018.terna.it/it-it/sistemaelettrico/transparencyreport/generation/installedgenerationcapacity.aspx\"\n",
    "\n",
    "data_IT.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.8 Finland FI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Finnish [**Energy Authority**](https://energiavirasto.fi/toimitusvarmuus). It encompasses a detailed list of Finnish generation units of at least one megavolt ampere [1 MVA]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_FI = os.path.join(data_directory, 'FI', 'Energiaviraston voimalaitosrekisteri.csv')\n",
    "data_FI = pd.read_csv(filepath_FI, sep=';')  \n",
    "\n",
    "data_FI.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate entries in column \"CHP\"\n",
    "data_FI.loc[data_FI[\n",
    "    'Combined Heat and Power Production, Industry,Maximum, Total, MW'] > 0,\n",
    "    'chp'] = 'Yes'\n",
    "data_FI.loc[data_FI[\n",
    "    'Combined Heat and Power Production, District Heating, Total, MW'] > 0,\n",
    "    'chp'] = 'Yes'\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_FI = {'Name': 'name',\n",
    "                   'Company': 'company',\n",
    "                   'Type': 'type',\n",
    "                   'Address': 'street',\n",
    "                   'Town': 'city',\n",
    "                   'Postal code': 'postcode',\n",
    "                   'Maximum, total, MW': 'capacity',\n",
    "                   'Main fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_FI.rename(columns=dict_columns_FI, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_FI = data_FI.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_FI.capacity = data_FI.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\". Generation of entries for the column \"energy_scoures\" according to information given in the column \"type\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename types of energy sources\n",
    "dict_energysources_FI = {'Biogas': 'Biomass and biogas',\n",
    "                         'Black liquor and concentrated liquors': 'Biomass and biogas',\n",
    "                         'Blast furnace gas': 'Other fossil fuels',\n",
    "                         'By-products from wood processing industry': 'Biomass and biogas',\n",
    "                         'Exothermic heat from industry': 'Other or unspecified energy sources',\n",
    "                         'Forest fuelwood': 'Biomass and biogas',\n",
    "                         'Gasified waste': 'Non-renewable waste',\n",
    "                         'Hard coal and anthracite': 'Hard coal',\n",
    "                         'Heavy distillates': 'Oil',\n",
    "                         'Industrial wood residues': 'Biomass and biogas',\n",
    "                         'Light distillates': 'Oil',\n",
    "                         'Medium heavy distillates': 'Oil',\n",
    "                         'Mixed fuels': 'Mixed fossil fuels',\n",
    "                         'Natural gas': 'Natural gas',\n",
    "                         'Nuclear energy': 'Nuclear',\n",
    "                         'Other by-products and wastes used as fuel': 'Other fossil fuels',\n",
    "                         'Other non-specified energy sources': 'Other or unspecified energy sources',\n",
    "                         'Peat': 'Biomass and biogas',\n",
    "                         ' ': 'Other or unspecified energy sources',\n",
    "                         np.nan: 'Other or unspecified energy sources'}\n",
    "data_FI[\"energy_source\"].replace(dict_energysources_FI, inplace=True)\n",
    "data_FI[\"energy_source\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# Generate entries in column \"energy_sources\" for hydro and wind stations according to column \"type\"\n",
    "data_FI.loc[data_FI['type'] == 'Hydro power', 'energy_source'] = 'Hydro'\n",
    "data_FI.loc[data_FI['type'] == 'Wind power', 'energy_source'] = 'Wind'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\". Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to column \"energy_source\"\n",
    "data_FI.loc[data_FI['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_FI.loc[data_FI['energy_source'] == 'Hard coal',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Delete unwanted energy source (wind) in column \"energy_source\"\n",
    "data_FI = data_FI[data_FI.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.5 Definition of generation type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all types mentioned in the column \"type\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename types\n",
    "dict_types_FI = {'District heating CHP': 'CHP',\n",
    "                 'Hydro power': 'NaN',\n",
    "                 'Industry CHP': 'IPP',\n",
    "                 'Nuclear energy': 'NaN',\n",
    "                 'Separate electricity production': 'NaN',\n",
    "                 'Wind power': 'NaN'}\n",
    "data_FI[\"type\"].replace(dict_types_FI, inplace=True)\n",
    "data_FI[\"type\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# drop solar generator (redundant)\n",
    "data_FI = data_FI[data_FI.type != 'Solar']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.8.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "data_FI = add_location_and_EIC('FI', data_FI)\n",
    "\n",
    "# add source\n",
    "data_FI[\"source\"] = \"https://energiavirasto.fi/toimitusvarmuus\"\n",
    "\n",
    "data_FI.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.9 Spain ES"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Spanish [**SEDE - Ministry of Industry, Energy and Tourism**](https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica). It encompasses a detailed list of Spanish generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_ES = os.path.join(data_directory, 'ES', 'Registro_16_12_2019.csv')\n",
    "data_ES = pd.read_csv(filepath_ES, error_bad_lines=False, sep=';',decimal=',')  \n",
    "\n",
    "data_ES.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall adjustment of all columns within the dataframe. Translation, addition, deletion, sorting of columns as well as adjustment of the column entries' types. Adjustment of the entries' units from kW to MW in the columns \"Capacity\" (corresponding to the net capacity in the original data set)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Select value of 'Potencia Instalada en MW' if gross capacity is empty\n",
    "data_ES['Potencia Bruta Total en MW'] = np.where(data_ES['Potencia Bruta Total en MW'].isnull(),data_ES['Potencia Instalada en MW'],data_ES['Potencia Bruta Total en MW'])\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_ES = {'Nombre del Titular de la Unidad de Producción': 'company',\n",
    "                   'Nombre de la Unidad de Producción': 'name',\n",
    "                   'Municipio de la Unidad de Producción': 'city',\n",
    "                   'CPostal del Titular': 'postcode',\n",
    "                   'Tecnología de la Unidad de Producción': 'technology',\n",
    "                   'Comment': 'comment',\n",
    "                   'Potencia Bruta Total en MW': 'capacity',\n",
    "                   'Tipo de Unidad de Producción': 'energy_source',\n",
    "                   'Fecha de la puesta en servicio de la Unidad de Producción': 'commissioned',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_ES.rename(columns=dict_columns_ES, inplace=True)\n",
    "\n",
    "#Fix capacity entries to float\n",
    "numeric_capacity = []\n",
    "for cap in data_ES.capacity:\n",
    "    if isinstance(cap, float):\n",
    "        numeric_capacity.append(cap)\n",
    "    else:\n",
    "        split_cap = cap.split(',')\n",
    "        if len(split_cap) == 1:\n",
    "            numeric_capacity.append(int(split_cap[0]))\n",
    "        elif len(split_cap) == 2:\n",
    "            numeric_capacity.append(int(split_cap[0]) + float('.' + split_cap[1]))\n",
    "        elif len(split_cap) == 3:\n",
    "            numeric_capacity.append(int(split_cap[0])*1000 + int(split_cap[1]) + float('.' + split_cap[2]))\n",
    "        else:\n",
    "            numeric_capacity.append(np.nan)\n",
    "   \n",
    "data_ES[\"capacity\"] = numeric_capacity  \n",
    "\n",
    "# Apply general template of columns\n",
    "data_ES = data_ES.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dict_energysources_ES = {'Biocombustibles liquidos': 'Biomass and biogas',\n",
    "                         'Biogas': 'Biomass and biogas',\n",
    "                         'Biogas de digestion': 'Biomass and biogas',\n",
    "                         'Biogas de vertedero': 'Biomass and biogas',\n",
    "                         'Biomasa industrial agricola': 'Biomass and biogas',\n",
    "                         'Biomasa industrial forestal': 'Biomass and biogas',\n",
    "                         'Biomasa primaria': 'Biomass and biogas',\n",
    "                         'Calor residual': 'Other or unspecified energy sources',\n",
    "                         'Carbon': 'Hard coal',\n",
    "                         'CARBON IMPORTADO': 'Hard coal',\n",
    "                         'Cultivos energeticos agricolas o forestales': 'Biomass and biogas',\n",
    "                         'DIESEL': 'Oil',\n",
    "                         'Energias residuales': 'Non-renewable waste',\n",
    "                         'Fuel': 'Oil',\n",
    "                         'FUEL-OIL 0,3': 'Oil',\n",
    "                         'FUELOLEO': 'Oil',\n",
    "                         'GAS DE REFINERIA': 'Natural gas',\n",
    "                         'Gas natural': 'Natural gas',\n",
    "                         'GAS NATURAL': 'Natural gas',\n",
    "                         'Gas residual': 'Natural gas',\n",
    "                         'Gasoleo': 'Oil',\n",
    "                         'GASOLEO': 'Oil',\n",
    "                         'HULLA+ANTRACITA': 'Hard coal',\n",
    "                         'Licores negros': 'Biomass and biogas',\n",
    "                         'LIGNITO NEGRO': 'Lignite',\n",
    "                         'LIGNITO PARDO': 'Lignite',\n",
    "                         'NUCLEAR': 'Nuclear',\n",
    "                         'Propano': 'Natural gas',\n",
    "                         'Residuo aprovechamiento forestal o selvicola': 'Other bioenergy and renewable waste',\n",
    "                         'Residuos': 'Non-renewable waste',\n",
    "                         'Residuos actividad agricolas o jardineria': 'Other bioenergy and renewable waste',\n",
    "                         'Residuos industriales': 'Non-renewable waste',\n",
    "                         'Residuos solidos urbanos': 'Non-renewable waste',\n",
    "                         'RESIDUOS SOLIDOS URBANOS': 'Non-renewable waste',\n",
    "                         ' ': 'Other or unspecified energy sources',\n",
    "                         np.nan: 'Other or unspecified energy sources',\n",
    "                        'HIDRÁULICA': 'Hydro',\n",
    "                        'TERMONUCLEAR': 'Nuclear',\n",
    "                        'TÉRMICA': 'Hard coal',\n",
    "                        'TÉRMICA CLÁSICA': 'Hard coal'}\n",
    "\n",
    "data_ES[\"energy_source\"].replace(dict_energysources_ES, inplace=True)\n",
    "data_ES[\"energy_source\"].replace('NaN', np.nan, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\". "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate  technologies\n",
    "data_ES.loc[data_ES.technology == \"COGENERACIÓN\", \"chp\"] = \"yes\"\n",
    "\n",
    "data_ES[\"technology\"].replace('COGENERACIÓN','chp', inplace=True)\n",
    "\n",
    "dict_technologies_ES = {'FLUYENTE': 'Run-of-river',\n",
    "                        'EMBALSE': 'Reservoir',\n",
    "                        'BOMBEO MIXTO': 'Pumped storage with natural inflow',\n",
    "                        'CT CARBÓN': '',\n",
    "                        'CN PWR': '',\n",
    "                        'CN BWR': '',\n",
    "                        'COGENERACIÓN': 'chp',\n",
    "                        'Turbinas de Vapor de Fuel': '',\n",
    "                        'CICLO COMBINADO': 'Combined cycle',\n",
    "                        'Ciclo combinado configuración 2x1': 'Combined cycle',\n",
    "                        'RESÍDUOS SÓLIDOS URBANOS': '',\n",
    "                        'Turbinas de vapor de Carbón': 'Steam turbine',\n",
    "                        'Ciclo combinado configuración 3x1': 'Combined cycle',\n",
    "                        'Turbinas de gas aeroderivadas': 'Gas turbine',\n",
    "                        'TURBINA DE GAS Y DE VAPOR': 'Gas turbine',\n",
    "                        'BOMBEO PURO': 'Pumped storage',\n",
    "                        'CT FUELOLEO': '',\n",
    "                        'Grupos Diésel - 4T': '',\n",
    "                        'MOTORES DIESEL': '',\n",
    "                        'Turbinas de gas heavy duty': 'Gas turbine',\n",
    "                        'Grupos Diésel - 2T': '',\n",
    "                        'BOMBEO+ EOLICA': '',\n",
    "                        'TURBINA DE GAS': 'Gas turbine',\n",
    "                        'CT FUEL-GAS': '',\n",
    "                        }\n",
    "data_ES.loc[:, \"technology\"] = data_ES[\"technology\"].replace(dict_technologies_ES)\n",
    "\n",
    "data_ES.loc[data_ES.technology == \"chp\", \"chp\"] = \"yes\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.5 Delete unwanted energy sources/power stations with no names and adjust commissioning year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exclude renewable energy sources, delete power stations with no names and capacities, and adjust the format of commissioning year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Delete unwanted energy source in column \"energy_source\" & \"technology\"\n",
    "data_ES = data_ES[data_ES.energy_source != 'TERMOELÉCTRICA']\n",
    "data_ES = data_ES[data_ES.energy_source != 'COGENERACIÓN']\n",
    "data_ES = data_ES[data_ES.energy_source != 'EXPERIMENTAL']\n",
    "data_ES = data_ES[data_ES.name != '']\n",
    "\n",
    "# Delete power stations with no name and no capacities\n",
    "data_ES = data_ES[data_ES.name.notna()]\n",
    "\n",
    "# Map commissioned year to Timestamp col\n",
    "data_ES['commissioned_year'] = pd.to_datetime(data_ES['commissioned'], format='%d.%m.%Y %H:%M')\n",
    "# Reassing commissioned col with year only\n",
    "mask = data_ES['commissioned_year'].notna()\n",
    "data_ES.loc[mask, 'commissioned'] = data_ES.loc[mask].commissioned_year.apply(lambda x: x.year)\n",
    "# Drop not needed col\n",
    "data_ES.drop('commissioned_year', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.9.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_ES = add_location_and_EIC('ES', data_ES)\n",
    "\n",
    "# add source\n",
    "data_ES[\"source\"] = \"https://sede.minetur.gob.es/en-US/datosabiertos/catalogo/registro-productores-electrica\"\n",
    "\n",
    "data_ES.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.10 United Kingdom UK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.10.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the British government's [**Statistical Office**](https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/446457/dukes5_10.xls). It encompasses a detailed list of British generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_UK = os.path.join(data_directory, 'UK', 'DUKES_5.11_UK.csv')\n",
    "data_UK = pd.read_csv(filepath_UK, sep=';') \n",
    "\n",
    "data_UK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.10.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename sixth column\n",
    "data_UK.columns.values[6] = 'Location'\n",
    "\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_UK = {'Company Name': 'company',\n",
    "                   'Station Name': 'name',\n",
    "                   'Installed Capacity (MW)': 'capacity',\n",
    "                   'Country': 'country',\n",
    "                   'Location': 'location',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Year of commission or year generation began': 'commissioned',\n",
    "                   'Source': 'source'}\n",
    "data_UK.rename(columns=dict_columns_UK, inplace=True)\n",
    "\n",
    "        \n",
    "dict_regions_UK = {'East': 'England',\n",
    "                   'East Midlands': 'England',\n",
    "                   'London': 'England',\n",
    "                   'North East': 'England',\n",
    "                   'North West': 'England',\n",
    "                   'South East': 'England',\n",
    "                   'South West': 'England',\n",
    "                   'West Midlands': 'England',\n",
    "                   'Yorkshire and the Humber': 'England',\n",
    "                   'N Ireland': 'Northern Ireland'}\n",
    "data_UK[\"location\"]=data_UK[\"location\"].replace(dict_regions_UK, inplace=True)\n",
    "\n",
    "# Merge columns \"Country\" and \"Location\" to one column called \"Country\"\n",
    "data_UK['additional_info'] = data_UK[['location']].apply(\n",
    "    lambda x: 'Region: {}'.format(x[0]), axis=1)\n",
    "\n",
    "# Drop column \"Location\" after merger\n",
    "colsToDrop = ['location']\n",
    "data_UK = data_UK.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_UK = data_UK.reindex(columns=columns_sorted)\n",
    "\n",
    "#Set specific territory to \"additional_info\"\n",
    "data_UK['additional_info'] = data_UK['country']\n",
    "\n",
    "# Solve comma problem in capacity column and convert to float\n",
    "data_UK.capacity = data_UK.capacity.str.replace(',', '').astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.10.3 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to column \"energy_source\"\n",
    "data_UK.loc[data_UK['energy_source'] == 'Hydro / pumped storage', \n",
    "            'technology'] = 'Pumped storage'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Pumped storage',\n",
    "            'technology'] = 'Pumped storage'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Wind',\n",
    "            'technology'] = 'Onshore'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Wind (offshore)',\n",
    "            'technology'] = 'Offshore'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_UK.loc[data_UK['energy_source'] == 'CCGT',\n",
    "            'technology'] = 'Combined cycle'\n",
    "data_UK.loc[data_UK['energy_source'] == 'OCGT',\n",
    "            'technology'] = 'Gas turbine'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.10.4 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_source\" and subsequent translation check. Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dict_energysources_UK = {'Biomass': 'Biomass and biogas',\n",
    "                         'Biomass / gas / waste derived fuel': 'Mixed fossil fuels',\n",
    "                         'Natural Gas': 'Natural gas',\n",
    "                         'CCGT': 'Natural gas',\n",
    "                         'Sour gas': 'Natural gas',\n",
    "                         'Coal': 'Hard coal',\n",
    "                         'Coal / biomass': 'Mixed fossil fuels',\n",
    "                         'Coal / biomass / gas / waste derived fuel': 'Mixed fossil fuels',\n",
    "                         'Coal / oil': 'Mixed fossil fuels',\n",
    "                         'Coal/oil': 'Mixed fossil fuels',\n",
    "                         'Diesel': 'Oil',\n",
    "                         'Gas': 'Natural gas',\n",
    "                         'Gas / oil': 'Mixed fossil fuels',\n",
    "                         'Diesel/gas oil': 'Mixed fossil fuels',\n",
    "                         'Gas oil': 'Oil',\n",
    "                         'Gas oil / kerosene': 'Oil',\n",
    "                         'Hydro': 'Hydro',\n",
    "                         'Hydro / pumped storage': 'Hydro',\n",
    "                         'Pumped Storage': 'Hydro',\n",
    "                         'Light oil': 'Oil',\n",
    "                         'Meat & bone meal': 'Other bioenergy and renewable waste',\n",
    "                         'Nuclear': 'Nuclear',\n",
    "                         'OCGT': 'Natural gas',\n",
    "                         'Oil': 'Oil',\n",
    "                         'Light oil ': 'Oil',\n",
    "                         'Pumped storage': 'Hydro',\n",
    "                         'Straw': 'Biomass and biogas',\n",
    "                         'Biomass (wood pellets, sunflower/oat husk pellets)': 'Biomass and biogas',\n",
    "                         'Biomass (woodchip)': 'Biomass and biogas',\n",
    "                         'Biomass (litter, woodchip)': 'Biomass and biogas',\n",
    "                         'Biomass (meat and bone meal)': 'Biomass and biogas',\n",
    "                         'Biomass (poultry litter, waste wood)': 'Biomass and biogas',\n",
    "                         'Biomass (straw)': 'Biomass and biogas',\n",
    "                         'Biomass (recycled wood)': 'Biomass and biogas',\n",
    "                         'Biomass (poultry litter, woodchip)': 'Biomass and biogas',\n",
    "                         'Biomass (wood pellets)': 'Biomass and biogas',\n",
    "                         'Waste (municipal solid waste)': 'Non-renewable waste',\n",
    "                         'Biomass (recycled wood, virgin wood)': 'Biomass and biogas',\n",
    "                         'Biomass (virgin wood)': 'Biomass and biogas',\n",
    "                         'Waste': 'Non-renewable waste',\n",
    "                         'Waste (anaerobic digestion)': 'Non-renewable waste',\n",
    "                         'Wind': 'Wind',\n",
    "                         'Wind (offshore)': 'Wind',\n",
    "                         'Wind (onshore)': 'Wind',\n",
    "                         'Solar': 'Solar'}\n",
    "data_UK[\"energy_source\"].replace(dict_energysources_UK, inplace=True)\n",
    "\n",
    "# Delete unwanted energy sources\n",
    "data_UK = data_UK[data_UK.energy_source != 'Wind']\n",
    "data_UK = data_UK[data_UK.energy_source != 'Solar']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.10.5 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_UK = add_location_and_EIC('UK', data_UK)\n",
    "\n",
    "# add source\n",
    "data_UK[\"source\"] = \"https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes#content\"\n",
    "\n",
    "data_UK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.11 Norway NO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the [**Norwegian Water Resources and Energy Directorate**](https://www.nve.no/english/). It encompasses a database on the installed capacity of [**thermal generators**](https://www.nve.no/energiforsyning/kraftproduksjon/termisk-kraft/?ref=mainmenu) as well as on [**hydro generators**](https://www.nve.no/energiforsyning/kraftproduksjon/vannkraft/vannkraftdatabase/#)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.11.1 Data import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_NO_hydro = os.path.join(data_directory, 'NO', 'Vannkraftverk.csv')\n",
    "data_NO_hydro = pd.read_csv(filepath_NO_hydro, \n",
    "                skiprows=2,\n",
    "                sep=\";\",\n",
    "                decimal=\",\",\n",
    "                header=0,\n",
    "                index_col=False,\n",
    "                encoding='latin-1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_NO_thermal = os.path.join(data_directory, 'NO', 'termiske-kraftverk-i-norge-2019.xlsx')\n",
    "data_NO_thermal = pd.read_excel(filepath_NO_thermal,\n",
    "                sheet_name='Ark1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.11.2 Hydro generators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_NO_hydro = {'Navn': 'name',\n",
    "                         'Type': 'technology',\n",
    "                         'Kommune': 'city',\n",
    "                         'Kommunenr.': 'postcode',\n",
    "                         'Maks ytelse [MW]': 'capacity',\n",
    "                         'Hovedeier': 'company',\n",
    "                         'Dato for første utnyttelse av fallet': 'commissioned',\n",
    "                         'Elspotområde': 'additional_info'}\n",
    "data_NO_hydro.rename(columns=dict_columns_NO_hydro, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_NO_hydro = data_NO_hydro.reindex(columns=columns_sorted)\n",
    "\n",
    "# Fill with general information\n",
    "data_NO_hydro['country'] = 'NO'\n",
    "data_NO_hydro['energy_source'] = 'Hydro'\n",
    "data_NO_hydro['additional_info'] = 'Zone: NO' + data_NO_hydro['additional_info'].astype(str)\n",
    "# Change comissioning date to year only\n",
    "data_NO_hydro['commissioned'] = data_NO_hydro['commissioned'].apply(lambda x: x[0:4]).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add comment for plants with pump only (later categorized as pumped storage)\n",
    "data_NO_hydro.loc[data_NO_hydro['technology'] == 'Pumpe', 'comment'] = 'Pump only'\n",
    "# Take absolute of negative capacity of plants with pump only\n",
    "mask = data_NO_hydro['technology'] == 'Pumpe'\n",
    "data_NO_hydro.loc[mask, 'capacity'] = data_NO_hydro.loc[mask].capacity.apply(lambda x: abs(x))\n",
    "\n",
    "# Translate technologies\n",
    "dict_technologies_NO_hydro = {\n",
    "    'Kraftverk': 'Reservoir',\n",
    "    'Pumpekraftverk': 'Pumped storage',\n",
    "    'Pumpe': 'Pumped storage'\n",
    "    }\n",
    "data_NO_hydro['technology'].replace(dict_technologies_NO_hydro, inplace=True)\n",
    "\n",
    "data_NO_hydro.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.11.3 Thermal generators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_NO_thermal = {'Kraftverk': 'name',\n",
    "                         'Kommentar': 'comment',\n",
    "                         'Brensel': 'energy_source',\n",
    "                         'Kommune': 'city',\n",
    "                         'Kommunenr.': 'postcode',\n",
    "                         'Installert effekt [MW]': 'capacity',\n",
    "                         'Idriftsettelsesår': 'commissioned',\n",
    "                         'Elspotområde': 'additional_info'}\n",
    "data_NO_thermal.rename(columns=dict_columns_NO_thermal, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_NO_thermal = data_NO_thermal.reindex(columns=columns_sorted)\n",
    "\n",
    "# Fill with general information\n",
    "data_NO_thermal['country'] = 'NO'\n",
    "data_NO_thermal['additional_info'] = 'Zone: ' + data_NO_thermal['additional_info'].astype(str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translate energy sources\n",
    "dict_energy_sources_NO_thermal = {\n",
    "    'Avfallsforbrenning': 'Non-renewable waste',\n",
    "    'Varmegjenvinning': 'Other fossil fuels',\n",
    "    'Naturgass': 'Natural gas',\n",
    "    'Biogass fra avfall': 'Biomass and biogas',\n",
    "    'Ukjent': np.nan,\n",
    "    'Bark, returfiberavfall, slam, rivningsvirke og olje ': 'Mixed fossil fuels',\n",
    "    'Flis fra impregnert tre, avfallsforbrenning': 'Other bioenergy and renewable waste',\n",
    "    'Biogass': 'Biomass and biogas',\n",
    "    'CO gass': 'Other fossil fuels'\n",
    "    }\n",
    "data_NO_thermal['energy_source'].replace(dict_energy_sources_NO_thermal,\n",
    "                                         inplace=True)\n",
    "\n",
    "# Delete unwanted energy sources\n",
    "data_NO_thermal = data_NO_thermal[data_NO_thermal.energy_source != 'Biomass and biogas']\n",
    "data_NO_thermal = data_NO_thermal[data_NO_thermal.energy_source != 'Other bioenergy and renewable waste']\n",
    "\n",
    "data_NO_thermal.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.11.4 Combine hydro and thermal data frames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add sources\n",
    "data_NO_hydro[\"source\"] = \"https://www.nve.no/energiforsyning/kraftproduksjon/vannkraft/vannkraftdatabase/#\"\n",
    "data_NO_thermal[\"source\"] = \"https://www.nve.no/media/8967/termiske-kraftverk-i-norge-2019.xlsx\"\n",
    "\n",
    "data_NO = pd.concat([data_NO_hydro, data_NO_thermal], ignore_index=True)\n",
    "\n",
    "data_NO.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.12 Sweden SE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the power exchange [**Nordpool**](https://www.nordpoolgroup.com/). It encompasses a detailed list of Swedish generation units with a capacity of more than 100 MW for 2014. Since there was no new data on the Swedish generators found, the list from 2014 is still used in this release."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_SE = os.path.join(data_directory, 'SE', 'input_plant-list_SE.csv')\n",
    "data_SE = pd.read_csv(filepath_SE, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "data_SE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.13 Slovakia SK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Slovakian utility [**Slovenské elektrárne a.s. (SEAS)**](https://www.seas.sk/power-plants). It encompasses a detailed list of Slovak generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_SK = os.path.join(data_directory, 'SK', 'input_plant-list_SK.csv')\n",
    "data_SK = pd.read_csv(filepath_SK, encoding='utf-8', header=0, index_col=None) \n",
    "\n",
    "data_SK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.14 Slovenia SI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by several Slovenian utilities. The respective data links are given in the column \"source\". This list encompasses Slovenian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_SI = os.path.join(data_directory, 'SI', 'input_plant-list_SI.csv')\n",
    "data_SI = pd.read_csv(filepath_SI, encoding='utf-8')\n",
    "\n",
    "data_SI.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.15 Austria AT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data for **conventional power plants** is provided by several Austrian utilities. The respective data links are given in the column \"source\". The specifications of Austrian **hydro power plants**, however, solely are based on [**Verbund AG**](https://www.verbund.com/de-at/ueber-verbund/kraftwerke/unsere-kraftwerke). The resulting list encompasses Austrian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.15.1 Data import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_AT_hydro = os.path.join(data_directory, 'AT', 'input_plant-list_AT_hydro.csv')\n",
    "data_AT_hydro = pd.read_csv(filepath_AT_hydro, encoding=\"latin1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_AT_thermal = os.path.join(data_directory, 'AT', 'input_plant-list_AT_thermal.csv')\n",
    "data_AT_thermal = pd.read_csv(filepath_AT_thermal, encoding=\"latin1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.15.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Delete MW in capacity column\n",
    "data_AT_hydro.capacity = data_AT_hydro.capacity.apply(lambda x: x.replace('MW',''))\n",
    "\n",
    "#Apply general template of columns\n",
    "data_AT_hydro = data_AT_hydro.reindex(columns=columns_sorted)\n",
    "data_AT_thermal = data_AT_thermal.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.15.3 Combine hydro and thermal data frames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_AT = pd.concat([data_AT_hydro, data_AT_thermal], ignore_index=True)\n",
    "\n",
    "data_AT.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.16 Denmark DK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is assembled using the information of several websites. The sources can be found within the document. It encompasses a detailed list of Danish generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.16.1 Data import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath_DK = os.path.join(data_directory, 'DK', 'input_plant-list_DK.csv')\n",
    "data_DK = pd.read_csv(filepath_DK, encoding='utf-8', header=0, index_col=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.17.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All generators, which are not or only partly available, are dropped. The imported data is then standardized with respect to the columns as defined in section [2.3](#2.3-Definition-of-harmonized-output-scheme)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List only operating plants\n",
    "data_DK = data_DK[data_DK.availability != '0']\n",
    "data_DK = data_DK[data_DK.availability != 'partly']\n",
    "\n",
    "#Drop unwanted columns\n",
    "data_DK = data_DK.drop('availability', axis=1)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_DK=data_DK.reindex(columns=columns_sorted)\n",
    "\n",
    "data_DK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Consolidation of processed country data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following, the national datasets are consolidated to a single European dataset. Unfortunately, the Belgian dataset cannot be integrated due to the copyright by the data owner ELIA."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataframes = [data_BE,\n",
    "              data_NL,\n",
    "              data_FR,\n",
    "              data_PL,\n",
    "              data_CZ,\n",
    "              data_CH,\n",
    "              data_IT,\n",
    "              data_FI,\n",
    "              data_ES,\n",
    "              data_UK,\n",
    "              data_NO,\n",
    "              data_SE,\n",
    "              data_SK,\n",
    "              data_SI,\n",
    "              data_AT,\n",
    "              data_DK]\n",
    "\n",
    "data_EU = pd.concat(dataframes, sort=False)\n",
    "\n",
    "data_EU.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 Implementation of energy source levels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import energy source level definition\n",
    "energy_source_levels = pd.read_csv(\n",
    "    os.path.join('input', 'energy_source_levels.csv'), index_col=None, header=0)\n",
    "\n",
    "# Merge energy source levels to data set\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_1').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_2').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_3').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "# Combine different energy source levels created by merge\n",
    "data_EU['energy_source_level_1'] = data_EU[\n",
    "    ['energy_source_level_1',\n",
    "     'energy_source_level_1_x',\n",
    "     'energy_source_level_1_y']].fillna('').sum(axis=1)\n",
    "\n",
    "data_EU['energy_source_level_2'] = data_EU[\n",
    "    ['energy_source_level_2',\n",
    "     'energy_source_level_2_y']].fillna('').sum(axis=1)\n",
    "\n",
    "data_EU['energy_source_level_3'] = data_EU[\n",
    "    ['energy_source_level_3']].fillna('').sum(axis=1)\n",
    "\n",
    "# Drop auxiliary columns due to merge\n",
    "colsToDrop = ['energy_source_level_1_y',\n",
    "              'energy_source_level_2_y',\n",
    "              'energy_source_level_3_y',\n",
    "              'energy_source_level_1_x',\n",
    "              'energy_source_level_2_x',\n",
    "              'energy_source_level_3_x']\n",
    "data_EU = data_EU.drop(colsToDrop, axis=1)\n",
    "\n",
    "# replace false energy source levels for plants without energy source in original data\n",
    "index_with_NAN = data_EU[data_EU.energy_source.isna()].index\n",
    "\n",
    "data_EU.loc[index_with_NAN,['energy_source_level_1', \n",
    "                            'energy_source_level_2', \n",
    "                            'energy_source_level_3']] = np.NaN\n",
    "\n",
    "data_EU.loc[index_with_NAN, 'comment'] = 'Energy source not in original data'\n",
    "\n",
    "\n",
    "data_EU.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 Definition of structure and data types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we define the ordering of the columns. Secondly, the data types are redefined. At the moment, this has the drawback that empty columns are redefined as `float` instead of `object`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "columns_sorted_output = ['name',\n",
    "                         'company',\n",
    "                         'street',\n",
    "                         'postcode',\n",
    "                         'city',\n",
    "                         'country',\n",
    "                         'capacity',\n",
    "                         'energy_source',\n",
    "                         'technology',\n",
    "                         'chp',\n",
    "                         'commissioned',\n",
    "                         'type',\n",
    "                         'lat',\n",
    "                         'lon',\n",
    "                         'eic_code',\n",
    "                         'energy_source_level_1',\n",
    "                         'energy_source_level_2',\n",
    "                         'energy_source_level_3',\n",
    "                         'additional_info',\n",
    "                         'comment',\n",
    "                         'source']\n",
    "\n",
    "# Set ordering of columns\n",
    "data_EU = data_EU.reset_index()\n",
    "data_EU = data_EU.reindex(columns=columns_sorted_output)\n",
    "\n",
    "# Set data types for columns\n",
    "data_EU = data_EU.astype(str)\n",
    "data_EU[['capacity', 'commissioned', 'lat', 'lon']] = data_EU[\n",
    "    ['capacity', 'commissioned', 'lat', 'lon']].astype(float)\n",
    "\n",
    "data_EU.replace('nan', np.nan, inplace=True)\n",
    "\n",
    "# data_EU.dtypes\n",
    "\n",
    "# Set index\n",
    "data_EU = data_EU.set_index('name')\n",
    "\n",
    "data_EU.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Result export"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Obtain the DE list and concatinate. Note, the stand alone DE list contains more information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# call DE script if not already executed\n",
    "# %run ./download_and_process_DE.ipynb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_DE = pd.read_csv('output/conventional_power_plants_DE.csv', index_col=0)\n",
    "\n",
    "data_DE = data_DE.rename(columns={'eic_code_plant': 'eic_code',\n",
    "                                  'capacity_net_bnetza': 'capacity'})\n",
    "data_DE[\"source\"] = \"BNetzA/UBA\"\n",
    "data_DE[\"additional_info\"] = \"\"\n",
    "\n",
    "data_EU = pd.concat([data_EU, data_DE.loc[:, data_EU.columns]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the final list to file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_path = 'output'\n",
    "\n",
    "data_EU.to_csv(os.path.join(\n",
    "    output_path, 'conventional_power_plants_EU.csv'),\n",
    "    encoding='utf-8',\n",
    "    index_label='name')\n",
    "\n",
    "data_EU.to_excel(\n",
    "    os.path.join(output_path, 'conventional_power_plants_EU.xlsx'),\n",
    "    sheet_name='plants',\n",
    "    index_label='name')\n",
    "\n",
    "data_EU.to_sql(\n",
    "    'conventional_power_plants_EU',\n",
    "    sqlite3.connect(os.path.join(output_path, 'conventional_power_plants.sqlite')),\n",
    "    if_exists=\"replace\",\n",
    "    index_label='name')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "End of script."
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}