{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"width:100%; background-color: #D9EDF7; border: 1px solid #CFCFCF; text-align: left; padding: 10px;\">\n",
    "      <b>Conventional Power Plants: Power Plants in Europe</b>\n",
    "      <ul>\n",
    "        <li><a href=\"main.ipynb\">Main Notebook</a></li>\n",
    "        <li><a href=\"download_and_process_DE.ipynb\">Processing notebook for German power plant</a></li>\n",
    "        <li>Processing notebook for European power plants</li>\n",
    "      </ul>\n",
    "      <br>This notebook is part of the <a href=\"http://data.open-power-system-data.org/DATA PACKAGE NAME HERE\"> Data package name here Data Package</a> of <a href=\"http://open-power-system-data.org\">Open Power System Data</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "* [1. Script setup](#1.-Script-setup)\n",
    "* [2. Settings](#2.-Settings)\n",
    "\t* [2.1 Choose download option](#2.1-Choose-download-option)\n",
    "\t* [2.2 Definition of national data sources](#2.2-Definition-of-national-data-sources)\n",
    "\t* [2.3 Download function](#2.3-Download-function)\n",
    "\t* [2.4 Import function](#2.4-Import-function)\n",
    "\t* [2.5 Other functions](#2.5-Other-functions)\n",
    "\t* [2.6 Definition of harmonized output scheme](#2.6-Definition-of-harmonized-output-scheme)\n",
    "* [3. Download of data](#3.-Download-of-data)\n",
    "* [4. Data processing per country](#4.-Data-processing-per-country)\n",
    "\t* [4.1 Belgium BE](#4.1-Belgium-BE)\n",
    "\t\t* [4.1.1 Data import](#4.1.1-Data-import)\n",
    "\t\t* [4.1.2 Translation and harmonization of columns](#4.1.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.1.3 Definition of generation type](#4.1.3-Definition-of-generation-type)\n",
    "\t\t* [4.1.4 Definition of generation technology types](#4.1.4-Definition-of-generation-technology-types)\n",
    "\t\t* [4.1.5 Definition of energy sources](#4.1.5-Definition-of-energy-sources)\n",
    "\t\t* [4.1.6 Additional information on geographic coordinates and EIC codes](#4.1.6-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.2 The Netherlands NL](#4.2-The-Netherlands-NL)\n",
    "\t\t* [4.2.1 Data import and merger](#4.2.1-Data-import-and-merger)\n",
    "\t\t* [4.2.2 Translation and harmonization of columns](#4.2.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.2.3 Definition of energy sources](#4.2.3-Definition-of-energy-sources)\n",
    "\t\t* [4.2.4 Adjustment of capacity for plant Rijnmond II (version '2016-10-27')](#4.2.4-Adjustment-of-capacity-for-plant-Rijnmond-II-%28version-'2016-10-27'%29)\n",
    "\t\t* [4.2.5 Select daily entry with highest available capacity](#4.2.5-Select-daily-entry-with-highest-available-capacity)\n",
    "\t\t* [4.2.6 Additional information on geographic coordinates and EIC codes](#4.2.6-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.3 France FR](#4.3-France-FR)\n",
    "\t\t* [4.3.1 Data import](#4.3.1-Data-import)\n",
    "\t\t* [4.3.2 Translation and harmonization of columns](#4.3.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.3.3 Treatment of duplicate data entries](#4.3.3-Treatment-of-duplicate-data-entries)\n",
    "\t\t* [4.3.4 Definition of energy sources and generation of technology types](#4.3.4-Definition-of-energy-sources-and-generation-of-technology-types)\n",
    "\t\t* [4.3.5 Additional information on geographic coordinates and EIC codes](#4.3.5-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.4 Poland PL](#4.4-Poland-PL)\n",
    "\t\t* [4.4.1 Data import](#4.4.1-Data-import)\n",
    "\t\t* [4.4.2 Translation and harmonization of columns](#4.4.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.4.3 Definition of energy sources](#4.4.3-Definition-of-energy-sources)\n",
    "\t\t* [4.4.4 Definition of generation technology types](#4.4.4-Definition-of-generation-technology-types)\n",
    "\t\t* [4.4.5 Additional information on further power plants, geographic coordinates and EIC codes](#4.4.5-Additional-information-on-further-power-plants,-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.5 Czech Republic CZ](#4.5-Czech-Republic-CZ)\n",
    "\t\t* [4.5.1 Data import](#4.5.1-Data-import)\n",
    "\t\t* [4.5.2 Translation and harmonization of columns](#4.5.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.5.3 Definition of generation technology types](#4.5.3-Definition-of-generation-technology-types)\n",
    "\t\t* [4.5.4 Additional information on further power plants, geographic coordinates and EIC codes](#4.5.4-Additional-information-on-further-power-plants,-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.6 Switzerland CH](#4.6-Switzerland-CH)\n",
    "\t\t* [4.6.1 Data import](#4.6.1-Data-import)\n",
    "\t\t* [4.6.2 Consolidation, translation and harmonization of columns](#4.6.2-Consolidation,-translation-and-harmonization-of-columns)\n",
    "\t\t* [4.6.3 Definition of generation technology types](#4.6.3-Definition-of-generation-technology-types)\n",
    "\t\t* [4.6.4 Additional information on geographic coordinates and EIC codes](#4.6.4-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t\t* [4.6.5 Merge hydro and nuclear power plant data](#4.6.5-Merge-hydro-and-nuclear-power-plant-data)\n",
    "\t* [4.7 Italy IT](#4.7-Italy-IT)\n",
    "\t\t* [4.7.1 Data import](#4.7.1-Data-import)\n",
    "\t\t* [4.7.2 Translation and harmonization of columns](#4.7.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.7.3 Definition of energy sources](#4.7.3-Definition-of-energy-sources)\n",
    "\t\t* [4.7.4 Additional information on geographic coordinates and EIC codes](#4.7.4-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.8 Finland FI](#4.8-Finland-FI)\n",
    "\t\t* [4.8.1 Data import](#4.8.1-Data-import)\n",
    "\t\t* [4.8.2 Translation and harmonization of columns](#4.8.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.8.3 Definition of energy sources](#4.8.3-Definition-of-energy-sources)\n",
    "\t\t* [4.8.4 Definition of generation technology types](#4.8.4-Definition-of-generation-technology-types)\n",
    "\t\t* [4.8.5 Definition of generation type](#4.8.5-Definition-of-generation-type)\n",
    "\t\t* [4.8.6 Additional information on geographic coordinates and EIC codes](#4.8.6-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.9 Spain ES](#4.9-Spain-ES)\n",
    "\t\t* [4.9.1 Data import](#4.9.1-Data-import)\n",
    "\t\t* [4.9.2 Translation and harmonization of columns](#4.9.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.9.3 Definition of energy sources](#4.9.3-Definition-of-energy-sources)\n",
    "\t\t* [4.9.4 Definition of generation technology types](#4.9.4-Definition-of-generation-technology-types)\n",
    "\t\t* [4.9.5 Definition of energy sources](#4.9.5-Definition-of-energy-sources)\n",
    "\t\t* [4.9.6 Additional information on geographic coordinates and EIC codes](#4.9.6-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.10 United Kingdom UK](#4.10-United-Kingdom-UK)\n",
    "\t\t* [4.10.1 Data import](#4.10.1-Data-import)\n",
    "\t\t* [4.10.2 Translation and harmonization of columns](#4.10.2-Translation-and-harmonization-of-columns)\n",
    "\t\t* [4.10.3 Definition of generation technology types](#4.10.3-Definition-of-generation-technology-types)\n",
    "\t\t* [4.10.4 Definition of energy sources](#4.10.4-Definition-of-energy-sources)\n",
    "\t\t* [4.10.5 Additional information on geographic coordinates and EIC codes](#4.10.5-Additional-information-on-geographic-coordinates-and-EIC-codes)\n",
    "\t* [4.11 Norway NO](#4.11-Norway-NO)\n",
    "\t* [4.12 Sweden SE](#4.12-Sweden-SE)\n",
    "\t* [4.13 Slovakia SK](#4.13-Slovakia-SK)\n",
    "\t* [4.14 Slovenia SI](#4.14-Slovenia-SI)\n",
    "\t* [4.15 Austria AT](#4.15-Austria-AT)\n",
    "\t* [4.16 Denmark DK](#4.16-Denmark-DK)\n",
    "* [5. Consolidation of processed country data](#5.-Consolidation-of-processed-country-data)\n",
    "\t* [5.1 Implementation of energy source levels](#5.1-Implementation-of-energy-source-levels)\n",
    "\t* [5.2 Definition of structure and data types](#5.2-Definition-of-structure-and-data-types)\n",
    "* [6. Documentation of the data package](#6.-Documentation-of-the-data-package)\n",
    "* [7. Result export](#7.-Result-export)\n",
    "\t* [7.1 Write results to file](#7.1-Write-results-to-file)\n",
    "\t* [7.2 Write checksums](#7.2-Write-checksums)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Script setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import of Python modules needed to process the data and creation of required output folders. **Note:** The package `pyproj` needs to installed as they are not part of the standard Anaconda installation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import datetime\n",
    "import json\n",
    "import logging\n",
    "import os\n",
    "import urllib.parse\n",
    "import urllib.request\n",
    "import sqlite3\n",
    "import zipfile\n",
    "import io\n",
    "import requests\n",
    "import csv\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import pyproj  # required for transforming coordinates\n",
    "import yaml\n",
    "import hashlib\n",
    "\n",
    "logging.basicConfig(\n",
    "    level=logging.INFO,\n",
    "    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',\n",
    "    datefmt='%d %b %Y %H:%M:%S'\n",
    "    )\n",
    "\n",
    "logger = logging.getLogger()\n",
    "\n",
    "# create download and output folder if they do not exist\n",
    "os.makedirs(os.path.join('download'), exist_ok=True)\n",
    "os.makedirs(os.path.join('output'), exist_ok=True)\n",
    "os.makedirs(os.path.join('output', 'original_data'), exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Settings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Choose download option"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original data can either be downloaded from the original data sources as specified below or from the OPSD-Server. Default option is to download from the original sources as the aim of the project is to stay as close to original sources as possible. However, if problems with downloads e.g. due to changing URLs occur, you can still run the script with the original data from the OPSD-server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Activate or uncomment to download from original data sources\n",
    "download_from = 'original_sources'\n",
    "\n",
    "# Activate or uncomment to download from existing OPSD versions\n",
    "# download_from = 'opsd_server'\n",
    "# version = '2016-10-27'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Definition of national data sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The meta data of the national data sources, which is required for downloading and importing the data, are laid down in the following section. The following meta data information are required, which partly differ for each data source depending on the filetype for import.\n",
    "\n",
    "**CSV format**\n",
    "```\n",
    "Country: Country ISO code\n",
    "    Name of source: Short name of source  \n",
    "        url_initial: Initial URL on data source webpage\n",
    "        url_template: Direct URL to data file\n",
    "        filename: Name of data file without filetype extension\n",
    "        filename_opsd: Name of data file for OPSD download\n",
    "            '2016-10-27': Name of data file in previous OPSD version\n",
    "        filetype: Name of filetype (e.g. csv)\n",
    "        sep: Column separator (e.g. ',')\n",
    "        skiprows: Number of first rows to be skipped during import\n",
    "        decimal: Decimal separator (e.g. '.')\n",
    "        encoding: Type of encoding (e.g. 'utf-8')      \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Excel format (xls, xlsx)**\n",
    "\n",
    "```\n",
    "Country: Country ISO code\n",
    "    Name of source: Short name of source \n",
    "        url_initial: Initial URL on data source webpage\n",
    "        url_template: Direct URL to data file\n",
    "        filename: Name of data file without filetype extension\n",
    "        filename_opsd: Name of data file for OPSD download\n",
    "            '2016-10-27': Name of data file in previous OPSD version        \n",
    "        filetype: Name of filetype (e.g. xlsx)\n",
    "        sheetname: Name of sheet in workbook\n",
    "        skiprows: Number of first rows to be skipped during import\n",
    "        \n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "conf = \"\"\"\n",
    "    BE: \n",
    "        Elia:\n",
    "            url_initial: http://www.elia.be/en/grid-data/power-generation/generating-facilities\n",
    "            url_template: http://publications.elia.be/upload/ProductionParkOverview.xls?TS=20120416193815\n",
    "            filename: ProductionParkOverview\n",
    "            filename_opsd:\n",
    "                '2016-10-27': ProductionParkOverview\n",
    "                '2017-03-03': ProductionParkOverview\n",
    "                '2017-07-03': ProductionParkOverview\n",
    "                '2018-02-27': ProductionParkOverview\n",
    "                '2018-12-20': ProductionParkOverview\n",
    "            filetype: xls\n",
    "            sheetname: 'ProductionParkOverview'\n",
    "            skiprows: 1\n",
    "    \n",
    "    NL: \n",
    "        Tennet_Q1:\n",
    "            url_initial: http://www.tennet.org/english/operational_management/export_data.aspx\n",
    "            url_template: http://www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2018-1&submit=3\n",
    "            filename: export_Q12018\n",
    "            filename_opsd:\n",
    "                '2016-10-27': export_Q12015\n",
    "                '2017-03-03': export_Q12016\n",
    "                '2017-07-03': export_Q12016\n",
    "                '2018-02-27': export_Q12017\n",
    "                '2018-12-20': export_Q12018\n",
    "            filetype: csv\n",
    "            sep: ','\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "        Tennet_Q2:\n",
    "            url_initial: http://www.tennet.org/english/operational_management/export_data.aspx\n",
    "            url_template: http://www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2018-2&submit=3\n",
    "            filename: export_Q22018\n",
    "            filename_opsd:\n",
    "                '2016-10-27': export_Q22015\n",
    "                '2017-03-03': export_Q22016\n",
    "                '2017-07-03': export_Q22016\n",
    "                '2018-02-27': export_Q22017\n",
    "                '2018-12-20': export_Q22018\n",
    "            filetype: csv\n",
    "            sep: ','\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "        Tennet_Q3:\n",
    "            url_initial: http://www.tennet.org/english/operational_management/export_data.aspx\n",
    "            url_template: http://www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2018-3&submit=3\n",
    "            filename: export_Q32018\n",
    "            filename_opsd:\n",
    "                '2016-10-27': export_Q32015\n",
    "                '2017-03-03': export_Q32016\n",
    "                '2017-07-03': export_Q32016\n",
    "                '2018-02-27': export_Q32017\n",
    "                '2018-12-20': export_Q32018\n",
    "            filetype: csv\n",
    "            sep: ','\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "        Tennet_Q4:\n",
    "            url_initial: http://www.tennet.org/english/operational_management/export_data.aspx\n",
    "            url_template: http://www.tennet.org/english/operational_management/export_data.aspx?exporttype=installedcapacity&format=csv&quarter=2018-4&submit=3\n",
    "            filename: export_Q42018\n",
    "            filename_opsd:\n",
    "                '2016-10-27': export_Q42015\n",
    "                '2017-03-03': export_Q42016\n",
    "                '2017-07-03': export_Q42016\n",
    "                '2018-02-27': export_Q42017\n",
    "                '2018-12-20': export_Q42018\n",
    "            filetype: csv\n",
    "            sep: ','\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "    \n",
    "    IT:\n",
    "        Terna: \n",
    "            url_initial: http://www.terna.it/it-it/sistemaelettrico/transparencyreport/generation/installedgenerationcapacity.aspx\n",
    "            url_template: http://download.terna.it/terna/0000/0216/16.XLSX \n",
    "            filename: 16\n",
    "            filename_opsd:\n",
    "                '2016-10-27': 16\n",
    "                '2017-03-03': 16\n",
    "                '2017-07-03': 16\n",
    "                '2018-02-27': 16\n",
    "                '2018-12-20': 16\n",
    "            filetype: xls\n",
    "            sheetname: 'UPR PmaxOver 100MW'\n",
    "            skiprows: 0\n",
    "\n",
    "    FR: \n",
    "        RTE:\n",
    "            url_initial: https://clients.rte-france.com/lang/an/visiteurs/vie/prod/parc_reference.jsp\n",
    "            url_template: http://clients.rte-france.com/servlets/CodesEICServlet\n",
    "            filename: Centrales_production_reference\n",
    "            filename_opsd:\n",
    "                '2016-10-27': Centrales_production_reference\n",
    "                '2017-03-03': Centrales_production_reference\n",
    "                '2017-07-03': Centrales_production_reference\n",
    "                '2018-02-27': Centrales_production_reference\n",
    "                '2018-12-20': Centrales_production_reference\n",
    "            filetype: zip\n",
    "            sep: '\\t'\n",
    "            skiprows: 2\n",
    "            decimal: ','\n",
    "            encoding: 'cp1252'\n",
    "            \n",
    "    ES: \n",
    "        SEDE: \n",
    "            url_initial:  https://sedeaplicaciones.minetur.gob.es/electra/BuscarDatos.aspx\n",
    "            url_template: http://www6.mityc.es/aplicaciones/electra/ElectraExp.csv.zip\n",
    "            filename: ElectraExp\n",
    "            filename_opsd:\n",
    "                '2016-10-27': ElectraExp\n",
    "                '2017-03-03': ElectraExp\n",
    "                '2017-07-03': ElectraExp\n",
    "                '2018-02-27': ElectraExp\n",
    "                '2018-12-20': ElectraExp\n",
    "            filetype: zip\n",
    "            sep: ';'\n",
    "            skiprows: 0\n",
    "            decimal: ','\n",
    "            encoding: 'utf-8'\n",
    "\n",
    "    FI: \n",
    "        EnergyAuthority:\n",
    "            url_initial:  https://www.energiavirasto.fi/en/web/energy-authority/power-plant-register\n",
    "            url_template: https://www.energiavirasto.fi/documents/10191/0/Energiaviraston+voimalaitosrekisteri+29102018.xlsx/4b4b87f5-9eee-409d-836f-5b0040dba7de\n",
    "            filename: Energiaviraston+voimalaitosrekisteri+29102018\n",
    "            filename_opsd:\n",
    "                '2016-10-27': Energiaviraston+Voimalaitosrekisteri+040316\n",
    "                '2017-03-03': Energiaviraston+Voimalaitosrekisteri+010117\n",
    "                '2017-07-03': Energiaviraston+voimalaitosrekisteri+06072017\n",
    "                '2018-02-27': Energiaviraston+voimalaitosrekisteri+01022018\n",
    "                '2018-12-20': Energiaviraston+voimalaitosrekisteri+29102018\n",
    "            filetype: xlsx\n",
    "            sheetname: 'English'\n",
    "            skiprows: 1\n",
    "\n",
    "    PL: \n",
    "        GPI:\n",
    "            url_initial:  http://gpi.tge.pl/en/wykaz-jednostek;jsessionid=C2472043DF326CED2F9C0840B503F5B0.gpi-app1\n",
    "            url_template: http://gpi.tge.pl/en/wykaz-jednostek?p_p_id=powerunits_WAR_powerunitsportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_cacheability=cacheLevelPage&p_p_col_id=column-1&p_p_col_count=1\n",
    "            filename: units_list\n",
    "            filename_opsd:\n",
    "                '2016-10-27': units_list\n",
    "                '2017-03-03': units_list\n",
    "                '2017-07-03': units_list\n",
    "                '2018-02-27': units_list\n",
    "                '2018-12-20': units_list\n",
    "            filetype: csv            \n",
    "            sep: ';'\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "\n",
    "    UK: \n",
    "        GOV:\n",
    "            url_initial:  https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes\n",
    "            url_template: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/558040/DUKES_5.10.xls\n",
    "            filename: DUKES_5.10\n",
    "            filename_opsd:\n",
    "                '2016-10-27': dukes5_10\n",
    "                '2017-03-03': dukes5_10\n",
    "                '2017-07-03': DUKES_5.10\n",
    "                '2018-02-27': DUKES_5.10\n",
    "            filetype: xls\n",
    "            sheetname: 'Database'\n",
    "            skiprows: 3\n",
    "            \n",
    "    CZ:\n",
    "        CEPS:\n",
    "            url_initial:  http://www.ceps.cz/en/all-data#AvailableCapacity\n",
    "            url_template: http://www.ceps.cz/en/all-data#AvailableCapacity\n",
    "            url_request1: http://www.ceps.cz/en/all-data?do=loadGraphData&method=AvailableCapacity&filter_data={\"dateFrom\":\"2018-01-01 00:00:00\",\"dateTo\":\"2018-12-20 23:59:59\",\"version\":\"YF\",\"agregation\":\"MI\",\"function\":\"AVG\"}&graph_id=1026&move_graph=year&download=xls\n",
    "            url_request2: http://www.ceps.cz/download-data/?format=txt\n",
    "            params:\n",
    "                format: txt\n",
    "                do: loadGraphData\n",
    "                method: AvailableCapacity\n",
    "                filter_data: {\"dateFrom\":\"2018-01-01 00:00:00\",\"dateTo\":\"2018-12-20 23:59:59\",\"version\":\"YF\",\"agregation\":\"MI\",\"function\":\"AVG\"}\n",
    "                download: txt\n",
    "                graph_id: 1026\n",
    "                move_graph: year\n",
    "            headers:\n",
    "                Host: www.ceps.cz\n",
    "                User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0\n",
    "                Accept: application/json, text/javascript, */*; q=0.01\n",
    "                Accept-Language: de,en-US;q=0.7,en;q=0.3\n",
    "                Accept-Encoding: gzip, deflate\n",
    "                Referer: http://www.ceps.cz/en/all-data\n",
    "                X-Requested-With: XMLHttpRequest\n",
    "                Cookie: BIGipServer~produkce~WEB2017-80=rd1o00000000000000000000ffffac180a87o80\n",
    "                Connection: keep-alive\n",
    "            filename: data\n",
    "            filename_opsd:\n",
    "                '2016-10-27': Data\n",
    "                '2017-03-03': Data\n",
    "                '2017-07-03': Data\n",
    "                '2018-02-27': data\n",
    "                '2018-12-20': data\n",
    "            filetype: csv\n",
    "            sep: ','\n",
    "            skiprows: 0\n",
    "            decimal: '.'\n",
    "            encoding: 'utf-8'\n",
    "\n",
    "    CH:\n",
    "        BFE:\n",
    "            url_initial:  http://www.bfe.admin.ch/themen/00490/00491/index.html?lang=de&dossier_id=01049\n",
    "            url_template: http://www.bfe.admin.ch/php/modules/publikationen/stream.php?extlang=de&name=de_496108515.zip&endung=Statistik%20der%20Wasserkraftanlagen%20der%20Schweiz\n",
    "            filename: 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2018'\n",
    "            filename_opsd:\n",
    "                '2016-10-27': 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2016'\n",
    "                '2017-03-03': 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2016'\n",
    "                '2017-07-03': 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2017'\n",
    "                '2018-02-27': 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2017'\n",
    "                '2018-12-20': 'Statistik der Wasserkraftanlagen der Schweiz 1.1.2018'\n",
    "            filetype: zip\n",
    "            sheetname: 'Vorschau'\n",
    "            skiprows: 0\n",
    "            \n",
    "\"\"\"\n",
    "conf = yaml.load(conf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 Download function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The download function standardizes the download process of the different national data sources. The required input parameter are the URL, the filename, and the filetype. The function downloads the corresponding data source and saves the file in the folder `download`. Additionally, the file is saved in the folder `output/original_data` together with the results of this processing script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def downloadandcache(url, filename, filetype):\n",
    "    \"\"\"This function downloads a file into a folder called\n",
    "    downloads and returns the local filepath.\"\"\"\n",
    "    filename = str(filename)\n",
    "    filetype = filetype\n",
    "\n",
    "    now = datetime.datetime.now()\n",
    "    datestring = \"\"\n",
    "    datestring = str(now.year)+\"-\"+str(now.month)+\"-\"+str(now.day)\n",
    "\n",
    "    filepath = os.path.join(\n",
    "        'download', datestring+\"-\"+filename+\".\"+filetype)\n",
    "    filepath_original_data = os.path.join(\n",
    "        'output', 'original_data', filename+\".\"+filetype)\n",
    "\n",
    "    # Check if file exists, otherwise download it\n",
    "    if not os.path.exists(filepath):\n",
    "        print(\"Downloading file\", filename+\".\"+filetype)\n",
    "        urllib.request.urlretrieve(url, filepath)\n",
    "        urllib.request.urlretrieve(url, filepath_original_data)\n",
    "    else:\n",
    "        print(\"Using local file from\", filepath)\n",
    "    filepath = './'+filepath\n",
    "\n",
    "    return filepath"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 Import function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The import function reads the downloaded data files and imports it into a data frame. The function uses the meta data defined in [2.1](#2.2-Definition-of-national-data-sources). Depending on the filetype, pandas-specific import function are used to read the files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def importdata(country, tso):\n",
    "    \"\"\"This function imports the downloaded data\n",
    "    and returns a data frame for further processing.\"\"\"\n",
    "    now = datetime.datetime.now()\n",
    "    datestring = \"\"\n",
    "    datestring = str(now.year)+\"-\"+str(now.month)+\"-\"+str(now.day)\n",
    "\n",
    "    param = conf[country][tso]\n",
    "    filepath = os.path.join('download',\n",
    "                            datestring+\"-\"+str(\n",
    "                                param['filename'])+\".\"+param['filetype'])\n",
    "\n",
    "    if param['filetype'] == 'csv':\n",
    "        data_import = pd.read_csv(filepath,\n",
    "                                  sep=param['sep'],\n",
    "                                  skiprows=param['skiprows'],\n",
    "                                  decimal=param['decimal'],\n",
    "                                  encoding=param['encoding'],\n",
    "                                  low_memory=False)\n",
    "    elif param['filetype'] == 'dbf':\n",
    "        dbf = Dbf5(filepath, codec=param['encoding'])\n",
    "        data_import = dbf.to_dataframe()\n",
    "    else:\n",
    "        data_import = pd.read_excel(filepath,\n",
    "                                    sheetname=param['sheetname'],\n",
    "                                    skiprows=param['skiprows'])\n",
    "\n",
    "    data_import['country'] = str(country)\n",
    "    data_import['source'] = str(tso)\n",
    "\n",
    "    return data_import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 Other functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Hash function\n",
    "def get_sha_hash(path, blocksize=65536):\n",
    "    sha_hasher = hashlib.sha256()\n",
    "    with open(path, 'rb') as f:\n",
    "        buffer = f.read(blocksize)\n",
    "        while len(buffer) > 0:\n",
    "            sha_hasher.update(buffer)\n",
    "            buffer = f.read(blocksize)\n",
    "        return sha_hasher.hexdigest()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.6 Definition of harmonized output scheme"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To provide a standardizes set of power plant information among all national data sources, a set of required columns is defined which is subsequently filled with available data. The following columns and their structure are the basis for all national data sources. \n",
    "\n",
    "**Note:** If information for specific columns are not available, the data entry is empty. On the other hand, if the national data sources provides other information than required by the scheme, these information are not processed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "columns_sorted = ['name',\n",
    "                  'company',\n",
    "                  'street',\n",
    "                  'postcode',\n",
    "                  'city',\n",
    "                  'country',\n",
    "                  'capacity',\n",
    "                  'energy_source',\n",
    "                  'technology',\n",
    "                  'chp',\n",
    "                  'commissioned',\n",
    "                  'type',\n",
    "                  'lat',\n",
    "                  'lon',\n",
    "                  'eic_code',\n",
    "                  'additional_info',\n",
    "                  'comment',\n",
    "                  'source']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Download of data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The subsequent download loops of the different entries, countries and data sources, of the meta data file (section [2.1](#2.2-Definition-of-national-data-sources)) and calls the previously defined download function which is feeded with the data source specific meta data. After downloading the data files, they are further handled to recieve files which are ready for a direct import through pandas. In most cases, readable filetypes are provided (e.g. csv, xls, xlsx). However, the following exceptions need special attention:\n",
    "- **FR** (RTE) provides a compressed zip-file with a corrupted Excel-file. Renaming the Excel- to a CSV-file, solves the import problem.\n",
    "- **ES** (SEDE) delivers a compressed zip-file which needs to be extracted before importing. Files are extracted in the folder `download`.\n",
    "- **CH** (BFE) delivers a compressed zip-file which needs to be extracted before importing. Files are extracted in the folder `download`.\n",
    "- **CZ** (CEPS) requires a special way of scraping the data from their website."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "for country, tso in conf.items():\n",
    "    for tso, param in tso.items():\n",
    "\n",
    "        # Read and define input values for download function\n",
    "        if download_from == 'original_sources':\n",
    "            url_data = param['url_template']\n",
    "            filename_data = param['filename']\n",
    "            filetype_data = param['filetype']\n",
    "        if download_from == 'opsd_server':\n",
    "            url_data = 'http://data.open-power-system-data.org/conventional_power_plants/'\n",
    "            + version + '/original_data/'\n",
    "            filename_data = param['filename_opsd'][version]\n",
    "            filetype_data = param['filetype']\n",
    "            url_data = url_data + str(filename_data).replace(\" \", \"%20\")\n",
    "            + '.' + str(filetype_data)\n",
    "\n",
    "        print(url_data)\n",
    "\n",
    "# Call of download function\n",
    "        if download_from == 'original_sources' and not tso == 'CEPS':\n",
    "            downloadandcache(url_data, filename_data, filetype_data)\n",
    "            \n",
    "        # Special case for CZ\n",
    "        if download_from == 'original_sources' and tso == 'CEPS':\n",
    "            now = datetime.datetime.now()\n",
    "            datestring = \"\"\n",
    "            datestring = str(now.year)+\"-\"+str(now.month)+\"-\"+str(now.day)\n",
    "\n",
    "            filepath = os.path.join(\n",
    "                'download', datestring+\"-\"+filename_data+\".\"+filetype_data)\n",
    "            filepath_original_data = os.path.join(\n",
    "                'output', 'original_data', filename_data+\".\"+filetype_data)\n",
    "\n",
    "            # Check if file exists, otherwise download/create it\n",
    "            if not os.path.exists(filepath):\n",
    "                print(\"Downloading file\", filename_data+\".\"+filetype_data)\n",
    "                res = requests.get(param['url_request1'], params=param['params'], headers=param['headers'])\n",
    "                res = requests.get(param['url_request2'], params=param['params'], headers=param['headers'])\n",
    "                fix = res.text\n",
    "                df = pd.read_csv(io.StringIO(fix), sep=\";\", skiprows=2)\n",
    "                df = df.drop(['Unnamed: 5', 'Date'], axis=1)\n",
    "                df.to_csv(filepath,index=False,encoding='utf-8')\n",
    "                df.to_csv(filepath_original_data,index=False,encoding='utf-8')\n",
    "            else:\n",
    "                print(\"Using local file from\", filepath)\n",
    "                \n",
    "        if download_from == 'opsd_server':\n",
    "            downloadandcache(url_data, filename_data, filetype_data)\n",
    "\n",
    "# Treatment of special cases for FR, ES, and CH\n",
    "        now = datetime.datetime.now()\n",
    "        datestring = str(now.year) + \"-\" + str(now.month) + \"-\" + str(now.day)\n",
    "        filepath = os.path.join(\n",
    "            \"download\", datestring + \"-\" + str(filename_data))\n",
    "\n",
    "# Special case FR (RTE): ZIP-file with corrupted xls-file is provided, which needs to be renamed to csv\n",
    "        if country == 'FR':\n",
    "            with zipfile.ZipFile(filepath + \".zip\", \"r\") as O:\n",
    "                O.extractall(\"download\")\n",
    "            if not os.path.exists(filepath + \".csv\"):\n",
    "                os.rename(os.path.join(\"download\", filename_data + \".xls\"),\n",
    "                          filepath + \".csv\")\n",
    "            # change filetype from zip to csv\n",
    "            conf[country][tso]['filetype'] = \"csv\"\n",
    "\n",
    "# Special case ES (SEDE): ZIP-file with csv file\n",
    "        if country == 'ES':\n",
    "            with zipfile.ZipFile(filepath + \".zip\", \"r\") as O:\n",
    "                O.extractall(\"download\")\n",
    "            if not os.path.exists(filepath + \".csv\"):\n",
    "                os.rename(os.path.join(\"download\", filename_data + \".csv\"),\n",
    "                          filepath + \".csv\")\n",
    "\n",
    "            # change filetype from zip to csv\n",
    "            conf[country][tso]['filetype'] = \"csv\"\n",
    "\n",
    "# Special case CH (BFE): ZIP-file with xlsx file\n",
    "        if country == 'CH':\n",
    "            with zipfile.ZipFile(filepath + \".zip\", \"r\") as O:\n",
    "                O.extractall(\"download\")\n",
    "            if not os.path.exists(filepath + \".xlsx\"):\n",
    "                os.rename(os.path.join(\"download\", filename_data + \".xlsx\"),\n",
    "                          filepath + \".xlsx\")\n",
    "\n",
    "            # change filetype from zip to csv\n",
    "            conf[country][tso]['filetype'] = \"xlsx\"\n",
    "\n",
    "# print(conf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Data processing per country"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 Belgium BE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Belgian transmission network operator [**ELIA**](http://publications.elia.be/upload/ProductionParkOverview.xls?TS=20120416193815). It encompasses a detailed list of Belgian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_BE = importdata('BE', 'Elia')\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_BE = {'ARP': 'company',\n",
    "                   'Generation plant': 'name',\n",
    "                   'Plant Type': 'technology',\n",
    "                   'Technical Nominal Power (MW)': 'capacity',\n",
    "                   'Remarks': 'comment',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_BE.rename(columns=dict_columns_BE, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_BE = data_BE.reindex(columns=columns_sorted)\n",
    "\n",
    "# Drop rows without capacity entries, so that row with \n",
    "# \"Unit connected to Distribution Grid\" is dropped\n",
    "data_BE = data_BE.dropna(subset=['capacity'])\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_BE.capacity = data_BE.capacity.astype(float)\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.3 Definition of generation type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The generation type provides information on the 'usage' of the power plants (beside electricity generaiton), thus if the plant is an industrial power plant or provides thermal heat for district heating. \n",
    "\n",
    "The Belgian data source provides only a general information on the heat supply (here: WKK). Thus, due to these general informaiton, we classify corresponding plants as both, industrial or combined heat power plant, and cannot distringuish both types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"type\" according to technology \"WKK\"\n",
    "data_BE.loc[data_BE['technology'] == 'WKK', 'type'] = 'CHP/IPP'\n",
    "data_BE.loc[data_BE['name'].str.contains('WKK'), 'type'] = 'CHP/IPP'\n",
    "\n",
    "# Generate entries in column \"CHP\" according to column \"type\"\n",
    "data_BE.loc[(data_BE['type'] == 'CHP') |\n",
    "            (data_BE['type'] == 'IPP') |\n",
    "            (data_BE['type'] == 'CHP/IPP'), 'chp'] = 'Yes'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate technologies\n",
    "dict_technology_BE = {'GT': 'Gas turbine',\n",
    "                      'BG': 'NaN',\n",
    "                      'CL': 'Steam turbine',\n",
    "                      'WKK': 'NaN',\n",
    "                      'CCGT': 'Combined cycle',\n",
    "                      'D': 'NaN',\n",
    "                      'HU': 'NaN',\n",
    "                      'IS': 'NaN',\n",
    "                      'NU': 'Steam turbine',\n",
    "                      'TJ': 'Gas turbine',\n",
    "                      'WT': 'NaN',\n",
    "                      ' ': 'NaN',\n",
    "                      'nan': 'NaN',\n",
    "                     }\n",
    "data_BE[\"technology\"].replace(dict_technology_BE, inplace=True)\n",
    "data_BE[\"technology\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "data_BE.loc[(data_BE['name'].str.contains(' ST') |\n",
    "             data_BE['name'].str.contains(' ST ')) &\n",
    "            ((data_BE['technology'] == 'NaN') |\n",
    "             data_BE['technology'].isnull()), 'technology'] = 'Steam turbine'\n",
    "data_BE.loc[(data_BE['name'].str.contains(' GT') |\n",
    "             data_BE['name'].str.contains(' GT ')) &\n",
    "            ((data_BE['technology'] == 'NaN') |\n",
    "             data_BE['technology'].isnull()), 'technology'] = 'Gas turbine'\n",
    "\n",
    "# Check if all technologies have been translated\n",
    "for technology in data_BE[\"technology\"].unique():\n",
    "    if (technology not in dict_technology_BE.values()) & (str(technology) != 'NaN'):\n",
    "        logger.error(\"Untranslated technology: \" + str(technology))\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.5 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check. Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate energy sources\n",
    "dict_energysources_BE = {'BIO': 'Biomass and biogas',\n",
    "                         'BF': 'Other fossil fuels',\n",
    "                         'CL': 'Lignite',\n",
    "                         'CP': 'Hard coal',\n",
    "                         'CG': 'Other fossil fuels',\n",
    "                         'GO': 'Oil',\n",
    "                         'LF': 'Oil',\n",
    "                         'LV': 'Oil',\n",
    "                         'CP/BF': 'Mixed fossil fuels',\n",
    "                         'CP/CG': 'Mixed fossil fuels',\n",
    "                         'FA/BF': 'Mixed fossil fuels',\n",
    "                         'NG/BF': 'Mixed fossil fuels',\n",
    "                         'NG': 'Natural gas',\n",
    "                         'NU': 'Nuclear',\n",
    "                         'WR': 'Non-renewable waste',\n",
    "                         'WA': 'Hydro',\n",
    "                         'WI': 'Wind',\n",
    "                         'WP': 'Biomass and biogas'}\n",
    "data_BE[\"energy_source\"].replace(dict_energysources_BE, inplace=True)\n",
    "data_BE[\"energy_source\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_BE[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_BE.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Untranslated energy source: \" + str(energysource))\n",
    "\n",
    "# Delete unwanted energy source\n",
    "data_BE = data_BE[data_BE.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_BE.csv')\n",
    "additional_data_BE = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate matching\n",
    "data_BE = data_BE.merge(additional_data_BE,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_BE = data_BE.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_BE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 The Netherlands NL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.1 Data import and merger"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Dutch transmission network operator [**TenneT**](http://www.tennet.org/english/operational_management/export_data.aspx). It encompasses the daily available generation capacity, thus a list of Dutch generation units being operational on a specific day. The data is downloaded for all four quarter in 2015."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Import of quartely data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_NL_Q1 = importdata('NL', 'Tennet_Q1')\n",
    "data_NL_Q2 = importdata('NL', 'Tennet_Q2')\n",
    "data_NL_Q3 = importdata('NL', 'Tennet_Q3')\n",
    "data_NL_Q4 = importdata('NL', 'Tennet_Q4')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Merge quartely data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dataframes = [data_NL_Q1, data_NL_Q2, data_NL_Q3, data_NL_Q4]\n",
    "data_NL = pd.concat(dataframes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Merge columns \"street\" and \"Number\" to one column called \"Street\"\n",
    "data_NL['street'] = data_NL[['street', 'Number']].apply(\n",
    "    lambda x: '{} {}'.format(x[0], x[1]), axis=1)\n",
    "\n",
    "# Drop columns not needed anymore\n",
    "colsToDrop = ['Location', 'Date', 'Number']\n",
    "data_NL = data_NL.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_NL = {'Connected body': 'company',\n",
    "                   'Entity': 'name',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Capacity': 'capacity',\n",
    "                   'zipcode': 'postcode',\n",
    "                   'place-name': 'city'}\n",
    "data_NL.rename(columns=dict_columns_NL, inplace=True)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_NL.capacity = data_NL.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check. Generation of entries for the column \"technology\" according to information given in the column \"energy_source\" by [TenneT](http://www.tennet.org/english/operational_management/system_data_preparation/Reported_production_capacity/Installed_capacity.aspx)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename types of energy sources\n",
    "dict_energysources_NL = {'E01': 'Solar',\n",
    "                         'E02': 'Wind',\n",
    "                         'E03': 'Hydro',\n",
    "                         'E04': 'Biomass and biogas',\n",
    "                         'E05': 'Hard coal',\n",
    "                         'E06': 'Natural gas',\n",
    "                         'E07': 'Oil',\n",
    "                         'E08': 'Nuclear',\n",
    "                         'E09': 'Other or unspecified energy sources'}\n",
    "\n",
    "data_NL[\"energy_source\"].replace(dict_energysources_NL, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_NL[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_NL.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Not renamed energy source: \" + str(energysource))\n",
    "\n",
    "# Generate technology entry according to energy source\n",
    "data_NL.loc[data_NL['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_NL.loc[data_NL['energy_source'] == 'Hard coal',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_NL = data_NL[data_NL.energy_source != 'Solar']\n",
    "data_NL = data_NL[data_NL.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4 Adjustment of capacity for plant Rijnmond II (version '2016-10-27')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adjustment of the capacity entry for the row relating to the power plant named \"Rijnmond II\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Show data entries for Rijnmond II\n",
    "# data_NL.loc[data_NL['name'] == 'Rijnmond II','capacity']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Data for power plant 'Rijnmond II' are daily total capacity\n",
    "if download_from == 'opsd_server':\n",
    "    if version == '2016-10-27':\n",
    "        data_NL['capacity_new'] = (data_NL['capacity']/24).where(\n",
    "            data_NL.name == 'Rijnmond II')\n",
    "        data_NL.loc[data_NL[\n",
    "            'name'] == 'Rijnmond II', 'capacity'] = data_NL.loc[data_NL[\n",
    "                'name'] == 'Rijnmond II', 'capacity_new']\n",
    "        data_NL = data_NL.drop(['capacity_new'], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.5 Select daily entry with highest available capacity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We estimate the installed capacity by the highest available daily capacity for each unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Filter rows by considering \"name\" and maximum \"capacity\n",
    "data_NL = data_NL.sort_values(\n",
    "    'capacity', ascending=False).groupby('name', as_index=False).first()\n",
    "\n",
    "# Apply general template of columns\n",
    "data_NL = data_NL.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_NL.csv')\n",
    "additional_data_NL = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate matching\n",
    "data_NL = data_NL.merge(additional_data_NL,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_NL = data_NL.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_NL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.3 France FR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the French transmission network operator [**RTE**](http://clients.rte-france.com/servlets/CodesEICServlet). It encompasses a detailed list of French generation units with a capacity of more than 100 MW."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_FR = importdata('FR', 'RTE')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_FR = {'Type': 'energy_source',\n",
    "                   'Nom de la centrale de production': 'name',\n",
    "                   'Capacité de production Installée (MW)': 'capacity',\n",
    "                   'Date de création': 'commissioned',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_FR.rename(columns=dict_columns_FR, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_FR = data_FR.reindex(columns=columns_sorted)\n",
    "\n",
    "# Delete unwanted row by referring to column \"Name\"\n",
    "data_FR = data_FR.dropna(subset=['name'])\n",
    "\n",
    "# Delete place holder datetime\n",
    "data_FR[\"commissioned\"].replace('01/01/2000', np.nan, inplace=True)\n",
    "\n",
    "# Define commissioning year\n",
    "data_FR['commissioned'] = pd.to_datetime(\n",
    "    data_FR['commissioned'], format='%d/%m/%Y')\n",
    "data_FR['commissioned'] = pd.DatetimeIndex(\n",
    "    data_FR['commissioned']).year\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_FR.capacity = data_FR.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3.3 Treatment of duplicate data entries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The list comprises duplicate entries for specific power plants. If they are commissioned, decommissioned, or retrofitted, the power plant is listed twice with differences in the installed capacity and the commissioning years. In the following, the first data entry is choosen as it seems to be most recent one. \n",
    "\n",
    "**Note**: The commissioning or decommissioning year is currently not adjusted. A comment is set in the corresponding column 'comment'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Show duplicate data entries\n",
    "data_FR[data_FR.duplicated(subset='name', keep=False)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Set comment for duplicate entries\n",
    "data_FR.loc[data_FR.duplicated(\n",
    "    subset='name',\n",
    "    keep=False),\n",
    "    'commissioned'] = np.nan\n",
    "data_FR.loc[data_FR.duplicated(\n",
    "    subset='name',\n",
    "    keep=False),\n",
    "    'comment'] = 'Commissioning year not reported due to duplicate entries'\n",
    "\n",
    "# Drop last duplicate\n",
    "data_FR = data_FR.drop_duplicates(subset='name', keep='first')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3.4 Definition of energy sources and generation of technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for technologies. Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate technology entries according to energy sources\n",
    "data_FR.loc[data_FR['energy_source'] == 'Hydraulique STEP',\n",
    "            'technology'] = 'Pumped storage'\n",
    "data_FR.loc[data_FR['energy_source'] == '''Hydraulique fil de l'eau / éclusée''',\n",
    "            'technology'] = 'Run-of-river'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Hydraulique lacs',\n",
    "            'technology'] = 'Reservoir'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Nucléaire',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_FR.loc[data_FR['energy_source'] == 'Charbon',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Translate types of energy sources\n",
    "dict_energysources_FR = {'Autre': 'Other or unspecified energy sources',\n",
    "                         'Charbon': 'Hard coal',\n",
    "                         'Fioul': 'Oil',\n",
    "                         'Gaz': 'Natural gas',\n",
    "                         'Hydraulique STEP': 'Hydro',\n",
    "                         '''Hydraulique fil de l'eau / éclusée''': 'Hydro',\n",
    "                         'Hydraulique lacs': 'Hydro',\n",
    "                         'Marin': 'Marine',\n",
    "                         'Nucléaire': 'Nuclear',\n",
    "                         'Biomasse': 'Biomass and biogas'}\n",
    "data_FR[\"energy_source\"].replace(dict_energysources_FR, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_FR[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_FR.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Untranslated energy source: \" + str(energysource))\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_FR = data_FR[data_FR.energy_source != 'Wind']\n",
    "data_FR = data_FR[data_FR.energy_source != 'Solar']\n",
    "data_FR = data_FR[data_FR.energy_source != 'Marine']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3.5 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_FR.csv')\n",
    "additional_data_FR = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate matching\n",
    "data_FR = data_FR.merge(additional_data_FR,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_FR = data_FR.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_FR.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.4 Poland PL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Polish Power Exchange [**GPI**](http://gpi.tge.pl/en/wykaz-jednostek?p_p_id=powerunits_WAR_powerunitsportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_cacheability=cacheLevelPage&p_p_col_id=column-1&p_p_col_count=1\n",
    "). It encompasses a detailed list of large Polish generation units with information on the single power plant blocks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_PL = importdata('PL', 'GPI')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename first column\n",
    "data_PL.columns.values[0] = 'company'\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_PL = {'Generating unit name': 'name',\n",
    "                   'Comments': 'comment',\n",
    "                   'Available capacity [MW]': 'capacity',\n",
    "                   'Basic fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_PL.rename(columns=dict_columns_PL, inplace=True)\n",
    "\n",
    "# Fill columns \"energy_source\" and \"company\" with the belonging entries\n",
    "cols = ['energy_source', 'company']\n",
    "data_PL[cols] = data_PL[cols].ffill()\n",
    "\n",
    "# Delete empty and therefore unwanted rows by referring to column \"Generating unit code\"\n",
    "data_PL = data_PL.dropna(subset=['Generating unit code'])\n",
    "\n",
    "# Apply general template of columns\n",
    "data_PL = data_PL.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_PL.capacity = data_PL.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename energy sources types\n",
    "dict_energysources_PL = {'Brown coal': 'Lignite',\n",
    "                         'Black coal': 'Hard coal',\n",
    "                         'Water': 'Hydro',\n",
    "                         'Natural gas': 'Natural gas',\n",
    "                        }\n",
    "data_PL[\"energy_source\"].replace(dict_energysources_PL, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_PL[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_PL.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Not renamed energy source: \" + str(energysource))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to energy source \"hydro\"\n",
    "data_PL.loc[data_PL['energy_source'] == 'Hydro', 'technology'] = 'Pumped storage'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4.5 Additional information on further power plants, geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list\n",
    "filepath = os.path.join('input', 'input_plant_locations_PL.csv')\n",
    "additional_data_PL = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_PL = data_PL.merge(additional_data_PL,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_PL = data_PL.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_PL.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.5 Czech Republic CZ"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Czech transmission network operator [**CEPS**](http://www.ceps.cz/_layouts/15/Ceps/_Pages/GraphData.aspx?mode=xlsx&from=1/1/2010%2012:00:00%20AM&to=12/31/2015%2011:59:59%20PM&hasinterval=False&sol=9&lang=ENG&ver=YF&\n",
    "). It encompasses the daily available capacity reported by the transmission system operator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_CZ = importdata('CZ', 'CEPS')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Merge columns \"Power plant\" and \"Generating unit\" to one column called \"Name\"\n",
    "data_CZ['name'] = data_CZ[['Power plant', 'Generating unit']].apply(\n",
    "    lambda x: '{} {}'.format(x[0], x[1]), axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_CZ = {'Available capacity [MW]': 'capacity',\n",
    "                   'Type of source': 'technology',\n",
    "                   'Country': 'country',\n",
    "                   }\n",
    "data_CZ.rename(columns=dict_columns_CZ, inplace=True)\n",
    "\n",
    "# Filter rows by considering \"name\" and maximum \"capacity\"\n",
    "data_CZ = data_CZ.sort_values(\n",
    "    'capacity', ascending=False).groupby('name', as_index=False).first()\n",
    "\n",
    "# Apply general template of columns\n",
    "data_CZ = data_CZ.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_CZ.capacity = data_CZ.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.3 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"energy_source\" according to column \"technology\"\n",
    "data_CZ.loc[data_CZ['technology'] == 'Jaderná elektrárna',\n",
    "            'energy_source'] = 'Nuclear'\n",
    "data_CZ.loc[data_CZ['technology'] == 'Přečerpávací vodní elektrárna',\n",
    "            'energy_source'] = 'Hydro'\n",
    "data_CZ.loc[data_CZ['technology'] == 'Parní elektrárna',\n",
    "            'energy_source'] = 'Other or unspecified energy sources'\n",
    "data_CZ.loc[data_CZ['technology'] == 'Paroplynová elektrárna',\n",
    "            'energy_source'] = 'Other or unspecified energy sources'\n",
    "\n",
    "# Translate technologies\n",
    "dict_technologies_CZ = {'Přečerpávací vodní elektrárna': 'Pumped storage',\n",
    "                        'Parní elektrárna': 'Steam turbine',\n",
    "                        'Jaderná elektrárna': 'Steam turbine',\n",
    "                        'Paroplynová elektrárna': 'Combined cycle'}\n",
    "data_CZ[\"technology\"].replace(dict_technologies_CZ, inplace=True)\n",
    "\n",
    "# Check if all technologies have been translated\n",
    "for technology in data_CZ[\"technology\"].unique():\n",
    "    if (technology not in dict_technologies_CZ.values()) & (str(technology) != \"NaN\"):\n",
    "        logger.error(\"Untranslated technology: \" + str(technology))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5.4 Additional information on further power plants, geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_CZ.csv')\n",
    "additional_data_CZ = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_CZ = data_CZ.merge(additional_data_CZ,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_CZ = data_CZ.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_CZ.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.6 Switzerland CH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.6.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Swiss Ministry of Energy [**BFE**](http://www.bfe.admin.ch/php/modules/publikationen/stream.php?extlang=de&name=de_416798061.zip&endung=Statistik%20der%20Wasserkraftanlagen%20der%20Schweiz\n",
    "). It encompasses a detailed list of Swiss hydro generation units with comprehensive information on technical specifications."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hydro power**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_CH = importdata('CH', 'BFE')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Nuclear power plants**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "filepath = os.path.join('input', 'input_plant-list_CH_conventional.csv')\n",
    "data_nuclear_CH = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 4.6.2 Consolidation, translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Consolidation of columns**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Merge columns \"ZE-Erste Inbetriebnahme\" and \"ZE-Letzte Inbetriebnahme\" to one column called \"Commissioned\"\n",
    "data_CH['commissioned'] = data_CH[\n",
    "    ['ZE-Erste Inbetriebnahme', 'ZE-Letzte Inbetriebnahme']].apply(\n",
    "        lambda x: max(x[0], x[1]), axis=1)\n",
    "\n",
    "# Merge columns \"Bemerkung (1) - (10)\" to one column \"Comment\"\n",
    "data_CH['comment'] = data_CH[['Bemerkung (1)',\n",
    "                              'Bemerkung (2)',\n",
    "                              'Bemerkung (3)',\n",
    "                              'Bemerkung (4)',\n",
    "                              'Bemerkung (5)',\n",
    "                              'Bemerkung (6)',\n",
    "                              'Bemerkung (7)',\n",
    "                              'Bemerkung (8)',\n",
    "                              'Bemerkung (9)',\n",
    "                              'Bemerkung (10)']].apply(\n",
    "                                  lambda x:\n",
    "                                      '{}; {}; {}; {}; {}; {}; {}; {}; {}; {}'.format(\n",
    "                                          x[0],\n",
    "                                          x[1],\n",
    "                                          x[2],\n",
    "                                          x[3],\n",
    "                                          x[4],\n",
    "                                          x[5],\n",
    "                                          x[6],\n",
    "                                          x[7],\n",
    "                                          x[8],\n",
    "                                          x[9]), axis=1)\n",
    "\n",
    "data_CH['comment'] = data_CH['comment'].str.replace('nan; ', '')\n",
    "data_CH['comment'] = data_CH['comment'].str.replace('nan', '')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Transformation of geographical hydro power data to WGS84 projection**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LatLon with WGS84 datum used by GPS units and Google Earth\n",
    "wgs84 = pyproj.Proj(\"+init=EPSG:4326\")\n",
    "\n",
    "# CH1903 projection used in BFE-Data\n",
    "ch1903 = pyproj.Proj(\"+init=EPSG:21781\")\n",
    "\n",
    "data_CH[['lon', 'lat']] = data_CH[\n",
    "    ['ZE-Koordinaten unscharf (Nord)',\n",
    "     'ZE-Koordinaten unscharf (Ost)']].apply(\n",
    "    lambda row: pyproj.transform(\n",
    "        ch1903, wgs84, row[1], row[0]), axis=1).apply(pd.Series)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Translation and harmonization of columns**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_CH = {'WKA-Name': 'name',\n",
    "                   'ZE-Standort': 'city',\n",
    "                   'WKA-Typ': 'technology',\n",
    "                   'ZE-Status': 'availability',\n",
    "                   'Inst. Turbinenleistung': 'capacity'}\n",
    "data_CH.rename(columns=dict_columns_CH, inplace=True)\n",
    "\n",
    "# Adjust type of entries in column \"Capacity\"\n",
    "data_CH.capacity = data_CH.capacity.astype(float)\n",
    "\n",
    "# Adjust availabilities\n",
    "dict_availabilities_CH = {'im Normalbetrieb': '1',\n",
    "                          'im Bau': '0',\n",
    "                          'im Umbau': '0',\n",
    "                          'stillgelegt': '0'}\n",
    "data_CH[\"availability\"].replace(dict_availabilities_CH, inplace=True)\n",
    "\n",
    "# List only operating plants\n",
    "data_CH = data_CH[data_CH.availability != '0']\n",
    "\n",
    "# Apply general template of columns\n",
    "data_CH = data_CH.reindex(columns=columns_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.6.3 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Set energy source to \"hydro\"\n",
    "data_CH['energy_source'] = 'Hydro'\n",
    "\n",
    "# Adjust technologies\n",
    "dict_technologies_CH = {'L': 'Run-of-river',\n",
    "                        'S': 'Reservoir',\n",
    "                        'P': 'Pumped storage with natural inflow',\n",
    "                        'U': 'Pumped storage'}\n",
    "data_CH[\"technology\"].replace(dict_technologies_CH, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.6.4 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual hydro power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_CH.csv')\n",
    "additional_data_CH = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_CH = data_CH.merge(additional_data_CH,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_CH = data_CH.drop(colsToDrop, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.6.5 Merge hydro and nuclear power plant data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_CH = data_CH.append(data_nuclear_CH)\n",
    "\n",
    "data_CH.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.7 Italy IT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.7.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Italian transmission network operator [**TERNA**](http://download.terna.it/terna/0000/0216/16.XLSX). It encompasses a detailed list of Italian generation units of more than 100 MW."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_IT = importdata('IT', 'Terna')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.7.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns\n",
    "dict_columns_IT = {'Descrizione Impianto': 'name',\n",
    "                   'TIPOLOGIA': 'energy_source',\n",
    "                   'Comune': 'city',\n",
    "                   'PMAX [MW]': 'capacity',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source',\n",
    "                   'Zona': 'additional_info'}\n",
    "data_IT.rename(columns=dict_columns_IT, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_IT = data_IT.reindex(columns=columns_sorted)\n",
    "\n",
    "# Consider of geographical information in column \"additional_info\"\n",
    "data_IT['additional_info'] = data_IT[['additional_info']].apply(\n",
    "    lambda x: 'Zone: {}'.format(x[0]), axis=1)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_IT.capacity = data_IT.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.7.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy source types mentioned in the column \"energy_sources\" and subsequent translation check. Deletion of rows containing \"wind\" and \"geothermal_power\"as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate types of energy sources\n",
    "dict_energysources_IT = {'GEOTERMICO': 'Geothermal',\n",
    "                         'TERMOELETTRICO': 'Fossil fuels',\n",
    "                         'IDROELETTRICO': 'Hydro',\n",
    "                         'EOLICO': 'Wind'}\n",
    "data_IT[\"energy_source\"].replace(dict_energysources_IT, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_IT[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_IT.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Untranslated energy source: \" + str(energysource))\n",
    "\n",
    "# Delete unwanted energy sources in column \"energy_source\"\n",
    "data_IT = data_IT[data_IT.energy_source != 'Wind']\n",
    "data_IT = data_IT[data_IT.energy_source != 'Geothermal']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.7.4 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_IT.csv')\n",
    "additional_data_IT = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_IT = data_IT.merge(additional_data_IT,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_IT = data_IT.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_IT.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.8 Finland FI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Finnish [**Energy Authority**](http://www.energiavirasto.fi/documents/10191/0/Energiaviraston+Voimalaitosrekisteri+010117.xlsx). It encompasses a detailed list of Finnish generation units of at least one megavolt ampere [1 MVA]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_FI = importdata('FI', 'EnergyAuthority')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"CHP\"\n",
    "data_FI.loc[data_FI[\n",
    "    'Combined Heat and Power Production, Industry,Maximum, Total, MW'] > 0,\n",
    "    'chp'] = 'Yes'\n",
    "data_FI.loc[data_FI[\n",
    "    'Combined Heat and Power Production, District Heating, Total, MW'] > 0,\n",
    "    'chp'] = 'Yes'\n",
    "\n",
    "# Drop columns not needed anymore\n",
    "colsToDrop = ['Business ID',\n",
    "              'Location',\n",
    "              'Separate power production, Maximum, Hour, MW',\n",
    "              'Separate power production, Decomissioned, Hour, MW',\n",
    "              'Combined Heat and Power Production, Industry,Maximum, Total, MW',\n",
    "              'Combined Heat and Power Production, Industry,Hour, Total, MW',\n",
    "              'Combined Heat and Power Production, Industry, Decomissioned, Total, MW',\n",
    "              'Combined Heat and Power Production, District Heating, Total, MW',\n",
    "              'Combined Heat and Power Production, District Heating, Hour, MW',\n",
    "              'Combined Heat and Power Production, District Heating, Decomissioned, Total, MW',\n",
    "              'Separate power production, Maximum, Total, MW',\n",
    "              'Hour, total, MW',\n",
    "              'Decomissioned, Total, MW',\n",
    "              'Standby fuel ',\n",
    "              'Standby fuel']\n",
    "data_FI = data_FI.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_FI = {'Name': 'name',\n",
    "                   'Company': 'company',\n",
    "                   'Type': 'type',\n",
    "                   'Address': 'street',\n",
    "                   'Town': 'city',\n",
    "                   'Postal code': 'postcode',\n",
    "                   'Maximum, total, MW': 'capacity',\n",
    "                   'Main fuel': 'energy_source',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_FI.rename(columns=dict_columns_FI, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_FI = data_FI.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_FI.capacity = data_FI.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check. Generation of entries for the column \"energy_scoures\" according to information given in the column \"type\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename types of energy sources\n",
    "dict_energysources_FI = {'Biogas': 'Biomass and biogas',\n",
    "                         'Black liquor and concentrated liquors': 'Biomass and biogas',\n",
    "                         'Blast furnace gas': 'Other fossil fuels',\n",
    "                         'By-products from wood processing industry': 'Biomass and biogas',\n",
    "                         'Exothermic heat from industry': 'Other or unspecified energy sources',\n",
    "                         'Forest fuelwood': 'Biomass and biogas',\n",
    "                         'Gasified waste': 'Non-renewable waste',\n",
    "                         'Hard coal and anthracite': 'Hard coal',\n",
    "                         'Heavy distillates': 'Oil',\n",
    "                         'Industrial wood residues': 'Biomass and biogas',\n",
    "                         'Light distillates': 'Oil',\n",
    "                         'Medium heavy distillates': 'Oil',\n",
    "                         'Mixed fuels': 'Mixed fossil fuels',\n",
    "                         'Natural gas': 'Natural gas',\n",
    "                         'Nuclear energy': 'Nuclear',\n",
    "                         'Other by-products and wastes used as fuel': 'Other fossil fuels',\n",
    "                         'Other non-specified energy sources': 'Other or unspecified energy sources',\n",
    "                         'Peat': 'Biomass and biogas',\n",
    "                         ' ': 'Other or unspecified energy sources',\n",
    "                         np.nan: 'Other or unspecified energy sources'}\n",
    "data_FI[\"energy_source\"].replace(dict_energysources_FI, inplace=True)\n",
    "data_FI[\"energy_source\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_FI[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_FI.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Untranslated energy source: \" + str(energysource))\n",
    "\n",
    "# Generate entries in column \"energy_sources\" for hydro and wind stations according to column \"type\"\n",
    "data_FI.loc[data_FI['type'] == 'Hydro power', 'energy_source'] = 'Hydro'\n",
    "data_FI.loc[data_FI['type'] == 'Wind power', 'energy_source'] = 'Wind'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\". Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to column \"energy_source\"\n",
    "data_FI.loc[data_FI['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_FI.loc[data_FI['energy_source'] == 'Hard coal',\n",
    "            'technology'] = 'Steam turbine'\n",
    "\n",
    "# Delete unwanted energy source (wind) in column \"energy_source\"\n",
    "data_FI = data_FI[data_FI.energy_source != 'Wind']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.5 Definition of generation type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all types mentioned in the column \"type\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename types\n",
    "dict_types_FI = {'District heating CHP': 'CHP',\n",
    "                 'Hydro power': 'NaN',\n",
    "                 'Industry CHP': 'IPP',\n",
    "                 'Nuclear energy': 'NaN',\n",
    "                 'Separate electricity production': 'NaN',\n",
    "                 'Wind power': 'NaN'}\n",
    "data_FI[\"type\"].replace(dict_types_FI, inplace=True)\n",
    "data_FI[\"type\"].replace('NaN', np.nan, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.8.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_FI.csv')\n",
    "additional_data_FI = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_FI = data_FI.merge(additional_data_FI,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_FI = data_FI.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_FI.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.9 Spain ES"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Spanish [**SEDE - Ministry of Industry, Energy and Tourism**](http://www6.mityc.es/aplicaciones/electra/ElectraExp.csv.zip). It encompasses a detailed list of Spanish generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_ES = importdata('ES', 'SEDE')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall adjustment of all columns within the dataframe. Translation, addition, deletion, sorting of columns as well as adjustment of the column entries' types. Adjustment of the entries' units from kW to MW in the columns \"Capacity\" (corresponding to the net capacity in the original data set)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Delete unwanted regions\n",
    "data_ES = data_ES[data_ES.Autonomia != 'Ceuta']\n",
    "data_ES = data_ES[data_ES.Autonomia != 'Melilla']\n",
    "data_ES = data_ES[data_ES.Autonomia != 'Canarias']\n",
    "data_ES = data_ES[data_ES.Autonomia != 'Baleares']\n",
    "\n",
    "# Delete unwanted energy source\n",
    "data_ES = data_ES[data_ES.Tecnologia != 'Fotovoltaica']\n",
    "\n",
    "# Drop columns not needed anymore\n",
    "colsToDrop = ['N. Orden',\n",
    "              'Tipo Regimen',\n",
    "              'Autonomia',\n",
    "              'Provincia',\n",
    "              'F. Alta',\n",
    "              'F. Baja',\n",
    "              'F. Alta Provicional',\n",
    "              'Alta Registro']\n",
    "data_ES = data_ES.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_ES = {'Titular': 'company',\n",
    "                   'Nombre de la instalacion': 'name',\n",
    "                   'Municipio': 'city',\n",
    "                   'Tecnologia': 'technology',\n",
    "                   'Comment': 'comment',\n",
    "                   'Potencia Neta': 'capacity',\n",
    "                   'Potencia Bruta': 'gross_capacity',\n",
    "                   'Combustible': 'energy_source',\n",
    "                   'F. Puesta En Servicio': 'commissioned',\n",
    "                   'Country': 'country',\n",
    "                   'Source': 'source'}\n",
    "data_ES.rename(columns=dict_columns_ES, inplace=True)\n",
    "\n",
    "# Check if all columns have been renamed\n",
    "for columnnames in data_ES.columns:\n",
    "    if columnnames not in dict_columns_ES.values():\n",
    "        logger.error(\"Not renamed column: \" + columnnames)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_ES = data_ES.reindex(columns=columns_sorted)\n",
    "\n",
    "# Change unit of column 'capacity' from kW to MW\n",
    "data_ES.capacity = data_ES.capacity.astype(float)\n",
    "data_ES['capacity'] = (data_ES['capacity'] / 1000)\n",
    "\n",
    "# Define commissioning year\n",
    "data_ES['commissioned'] = pd.to_datetime(\n",
    "    data_ES['commissioned'], format='%d/%m/%Y')\n",
    "data_ES['commissioned'] = pd.DatetimeIndex(\n",
    "    data_ES['commissioned']).year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.3 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_sources\" and subsequent translation check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dict_energysources_ES = {'Biocombustibles liquidos': 'Biomass and biogas',\n",
    "                         'Biogas': 'Biomass and biogas',\n",
    "                         'Biogas de digestion': 'Biomass and biogas',\n",
    "                         'Biogas de vertedero': 'Biomass and biogas',\n",
    "                         'Biomasa industrial agricola': 'Biomass and biogas',\n",
    "                         'Biomasa industrial forestal': 'Biomass and biogas',\n",
    "                         'Biomasa primaria': 'Biomass and biogas',\n",
    "                         'Calor residual': 'Other or unspecified energy sources',\n",
    "                         'Carbon': 'Hard coal',\n",
    "                         'CARBON IMPORTADO': 'Hard coal',\n",
    "                         'Cultivos energeticos agricolas o forestales': 'Biomass and biogas',\n",
    "                         'DIESEL': 'Oil',\n",
    "                         'Energias residuales': 'Non-renewable waste',\n",
    "                         'Fuel': 'Oil',\n",
    "                         'FUEL-OIL 0,3': 'Oil',\n",
    "                         'FUELOLEO': 'Oil',\n",
    "                         'GAS DE REFINERIA': 'Natural gas',\n",
    "                         'Gas natural': 'Natural gas',\n",
    "                         'GAS NATURAL': 'Natural gas',\n",
    "                         'Gas residual': 'Natural gas',\n",
    "                         'Gasoleo': 'Oil',\n",
    "                         'GASOLEO': 'Oil',\n",
    "                         'HULLA+ANTRACITA': 'Hard coal',\n",
    "                         'Licores negros': 'Biomass and biogas',\n",
    "                         'LIGNITO NEGRO': 'Lignite',\n",
    "                         'LIGNITO PARDO': 'Lignite',\n",
    "                         'NUCLEAR': 'Nuclear',\n",
    "                         'Propano': 'Natural gas',\n",
    "                         'Residuo aprovechamiento forestal o selvicola': 'Other bioenergy and renewable waste',\n",
    "                         'Residuos': 'Non-renewable waste',\n",
    "                         'Residuos actividad agricolas o jardineria': 'Other bioenergy and renewable waste',\n",
    "                         'Residuos industriales': 'Non-renewable waste',\n",
    "                         'Residuos solidos urbanos': 'Non-renewable waste',\n",
    "                         'RESIDUOS SOLIDOS URBANOS': 'Non-renewable waste',\n",
    "                         ' ': 'Other or unspecified energy sources',\n",
    "                         np.nan: 'Other or unspecified energy sources'}\n",
    "\n",
    "data_ES[\"energy_source\"].replace(dict_energysources_ES, inplace=True)\n",
    "data_ES[\"energy_source\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_ES[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_ES.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Not renamed energy source: \" + str(energysource))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.4 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all technology types mentioned in the column \"technology\" and subsequent translation check. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dict_technologies_ES = {'Aprovechamiento de energias residuales': 'NaN',\n",
    "                        'Biogas': 'NaN',\n",
    "                        'Biomasa': 'NaN',\n",
    "                        'Cogeneracion': 'NaN',\n",
    "                        'Eolica terrestre': 'Onshore',\n",
    "                        'Fotovoltaica': 'Photovoltaics',\n",
    "                        'Hidraulica fluyente': 'Run-of-river',\n",
    "                        'Hidraulica': 'Differently categorized hydro',\n",
    "                        'Motor': 'Combustion engine',\n",
    "                        'Otras': 'NaN',\n",
    "                        'Termica': 'NaN',\n",
    "                        'Termica clasica': 'NaN',\n",
    "                        'Termonuclear': 'Steam turbine',\n",
    "                        'Tratamiento de residuos': 'NaN',\n",
    "                        'Turbina': 'NaN',\n",
    "                        'Turbina de gas': 'Gas turbine',\n",
    "                        'Residuos': 'NaN',\n",
    "                        'Solar Termoelectrica': 'Concentrated solar power',\n",
    "                        ' ': 'NaN',\n",
    "                        'nan': 'NaN',\n",
    "                        np.nan: 'NaN',\n",
    "                       }\n",
    "\n",
    "data_ES[\"technology\"].replace(dict_technologies_ES, inplace=True)\n",
    "data_ES[\"technology\"].replace('NaN', np.nan, inplace=True)\n",
    "\n",
    "# Check if all technologies have been translated\n",
    "for technology in data_ES[\"technology\"].unique():\n",
    "    if (technology not in dict_technologies_ES.values()) & (str(technology) != \"NaN\"):\n",
    "        logger.error(\"Untranslated technology: \" + str(technology))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.5 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"energy_source\" according to information given in the column \"technology\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"energy_source\" according to technologies\n",
    "data_ES.loc[data_ES['technology'] == 'Differently categorized hydro',\n",
    "            'energy_source'] = 'Hydro'\n",
    "data_ES[\"technology\"].replace('Differently categorized hydro', np.nan, inplace=True)\n",
    "\n",
    "data_ES.loc[data_ES['technology'] == 'Run-of-river',\n",
    "            'energy_source'] = 'Hydro'\n",
    "data_ES.loc[data_ES['technology'] == 'Onshore',\n",
    "            'energy_source'] = 'Wind'\n",
    "data_ES.loc[data_ES['technology'] == 'Photovoltaics',\n",
    "            'energy_source'] = 'Solar'\n",
    "data_ES.loc[data_ES['technology'] == 'Concentrated solar power',\n",
    "            'energy_source'] = 'Solar'\n",
    "\n",
    "# Delete unwanted energy sources\n",
    "data_ES = data_ES[data_ES.energy_source != 'Wind']\n",
    "data_ES = data_ES[data_ES.energy_source != 'Solar']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.9.6 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants. **Please note**: The naming of power plants in the Spanish dataset is not unique creating duplicated entries during merge with geographical data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_ES.csv')\n",
    "additional_data_ES = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Merge plant data with manually compiled geo-coordinates (currently duplicate entries due to merge based on name)\n",
    "data_ES = data_ES.merge(additional_data_ES,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop duplicate entries\n",
    "data_ES = data_ES.drop_duplicates(subset=['company', 'name', 'city'], keep='first')\n",
    "\n",
    "# Drop columns after merge\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_ES = data_ES.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_ES.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.10 United Kingdom UK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.10.1 Data import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the British government's [**Statistical Office**](https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/446457/dukes5_10.xls). It encompasses a detailed list of British generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_UK = importdata('UK', 'GOV')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.10.2 Translation and harmonization of columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The imported data is standardized with respect to the columns as defined in section [2.4](#2.6-Definition-of-harmonized-output-scheme). In a first step, existing and output-relevant columns are translated and remaining columns are deleted in a second step. Columns which are not exist in the data set, but required for the output are additionally added in this process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Rename sixth column\n",
    "data_UK.columns.values[5] = 'Location'\n",
    "\n",
    "# Drop rows without station names, so that the footnotes at the end of the list are deleted\n",
    "data_UK = data_UK.dropna(subset=['Station Name'])\n",
    "\n",
    "# Drop columns not needed anymore\n",
    "colsToDrop = ['Footnotes']\n",
    "data_UK = data_UK.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "dict_columns_UK = {'Company Name': 'company',\n",
    "                   'Station Name': 'name',\n",
    "                   'Installed Capacity (MW)': 'capacity',\n",
    "                   'Country': 'country',\n",
    "                   'Location': 'location',\n",
    "                   'Fuel': 'energy_source',\n",
    "                   'Year of commission or year generation began': 'commissioned',\n",
    "                   'Source': 'source'}\n",
    "data_UK.rename(columns=dict_columns_UK, inplace=True)\n",
    "\n",
    "# Check if all columns have been renamed\n",
    "for columnnames in data_UK.columns:\n",
    "    if columnnames not in dict_columns_UK.values():\n",
    "        logger.error(\"Not renamed column: \" + columnnames)\n",
    "\n",
    "# Adjust names of region\n",
    "dict_regions_UK = {'East': 'England',\n",
    "                   'East Midlands': 'England',\n",
    "                   'London': 'England',\n",
    "                   'North East': 'England',\n",
    "                   'North West': 'England',\n",
    "                   'South East': 'England',\n",
    "                   'South West': 'England',\n",
    "                   'West Midlands': 'England',\n",
    "                   'Yorkshire and the Humber': 'England',\n",
    "                   'N Ireland': 'Northern Ireland'}\n",
    "data_UK[\"location\"].replace(dict_regions_UK, inplace=True)\n",
    "\n",
    "# Merge columns \"Country\" and \"Location\" to one column called \"Country\"\n",
    "data_UK['additional_info'] = data_UK[['location']].apply(\n",
    "    lambda x: 'Region: {}'.format(x[0]), axis=1)\n",
    "\n",
    "# Drop column \"Location\" after merger\n",
    "colsToDrop = ['location']\n",
    "data_UK = data_UK.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Adjust commissioning years\n",
    "dict_commissioning_UK = {'1926/2002': '2002'}\n",
    "data_UK[\"commissioned\"].replace(dict_commissioning_UK, inplace=True)\n",
    "\n",
    "# Apply general template of columns\n",
    "data_UK = data_UK.reindex(columns=columns_sorted)\n",
    "\n",
    "# Adjust types of entries in all columns\n",
    "data_UK.capacity = data_UK.capacity.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.10.3 Definition of generation technology types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generation of entries for the column \"technology\" according to information given in the column \"energy_source\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate entries in column \"technology\" according to column \"energy_source\"\n",
    "data_UK.loc[data_UK['energy_source'] == 'Hydro / pumped storage', \n",
    "            'technology'] = 'Pumped storage'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Pumped storage',\n",
    "            'technology'] = 'Pumped storage'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Wind',\n",
    "            'technology'] = 'Onshore'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Wind (offshore)',\n",
    "            'technology'] = 'Offshore'\n",
    "data_UK.loc[data_UK['energy_source'] == 'Nuclear',\n",
    "            'technology'] = 'Steam turbine'\n",
    "data_UK.loc[data_UK['energy_source'] == 'CCGT',\n",
    "            'technology'] = 'Combined cycle'\n",
    "data_UK.loc[data_UK['energy_source'] == 'OCGT',\n",
    "            'technology'] = 'Gas turbine'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.10.4 Definition of energy sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall translation of all energy sources types mentioned in the column \"energy_source\" and subsequent translation check. Deletion of rows containing \"wind\" as energy source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dict_energysources_UK = {'Biomass': 'Biomass and biogas',\n",
    "                         'Biomass / gas / waste derived fuel': 'Mixed fossil fuels',\n",
    "                         'CCGT': 'Natural gas',\n",
    "                         'Coal': 'Hard coal',\n",
    "                         'Coal / biomass': 'Mixed fossil fuels',\n",
    "                         'Coal / biomass / gas / waste derived fuel': 'Mixed fossil fuels',\n",
    "                         'Coal / oil': 'Mixed fossil fuels',\n",
    "                         'Coal/oil': 'Mixed fossil fuels',\n",
    "                         'Diesel': 'Oil',\n",
    "                         'Gas': 'Natural gas',\n",
    "                         'Gas / oil': 'Mixed fossil fuels',\n",
    "                         'Gas oil': 'Oil',\n",
    "                         'Gas oil / kerosene': 'Oil',\n",
    "                         'Hydro': 'Hydro',\n",
    "                         'Hydro / pumped storage': 'Hydro',\n",
    "                         'Light oil': 'Oil',\n",
    "                         'Meat & bone meal': 'Other bioenergy and renewable waste',\n",
    "                         'Nuclear': 'Nuclear',\n",
    "                         'OCGT': 'Natural gas',\n",
    "                         'Oil': 'Oil',\n",
    "                         'Light oil ': 'Oil',\n",
    "                         'Pumped storage': 'Hydro',\n",
    "                         'Straw': 'Biomass and biogas',\n",
    "                         'Waste': 'Non-renewable waste',\n",
    "                         'Wind': 'Wind',\n",
    "                         'Wind (offshore)': 'Wind',\n",
    "                         'Solar': 'Solar'}\n",
    "data_UK[\"energy_source\"].replace(dict_energysources_UK, inplace=True)\n",
    "\n",
    "# Check if all energy sources have been translated\n",
    "for energysource in data_UK[\"energy_source\"].unique():\n",
    "    if (energysource not in dict_energysources_UK.values()) & (str(energysource) != \"NaN\"):\n",
    "        logger.error(\"Not renamed energy source: \" + str(energysource))\n",
    "\n",
    "# Delete unwanted energy sources\n",
    "data_UK = data_UK[data_UK.energy_source != 'Wind']\n",
    "data_UK = data_UK[data_UK.energy_source != 'Solar']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.10.5 Additional information on geographic coordinates and EIC codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a manually compiled list is used to define the geographic coordinates of indivdual power plants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access the second list with EIC-Codes and geographic coordinates\n",
    "filepath = os.path.join('input', 'input_plant_locations_UK.csv')\n",
    "additional_data_UK = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Initiate merger\n",
    "data_UK = data_UK.merge(additional_data_UK,\n",
    "                        left_on='name',\n",
    "                        right_on='name',\n",
    "                        how='left',\n",
    "                        suffixes=('_x', ''))\n",
    "\n",
    "# Drop columns after merger\n",
    "colsToDrop = ['eic_code_x', 'lat_x', 'lon_x']\n",
    "data_UK = data_UK.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_UK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.11 Norway NO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the power exchange [**Nordpool**](http://www.nordpoolspot.com/globalassets/download-center/tso/generation-capacity_norway_valid-from-2-december-2013_larger-than-100mw.pdf). It encompasses a detailed list of Norwegian generation units with a capacity of more than 100 MW for 2013."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file\n",
    "filepath = os.path.join('input', 'input_plant-list_NO.csv')\n",
    "data_NO = pd.read_csv(filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "data_NO.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.12 Sweden SE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the power exchange [**Nordpool**](http://www.nordpoolspot.com/globalassets/download-center/tso/generation-capacity_sweden_larger-than-100mw-per-unit_17122014.pdf). It encompasses a detailed list of Swedish generation units with a capacity of more than 100 MW for 2014."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file\n",
    "filepath = os.path.join('input', 'input_plant-list_SE.csv')\n",
    "data_SE = pd.read_csv(filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "data_SE.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.13 Slovakia SK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Slovakian utility [**Slovenské elektrárne a.s. (SEAS)**](https://www.seas.sk/power-plants). It encompasses a detailed list of Slovak generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file\n",
    "filepath = os.path.join('input', 'input_plant-list_SK.csv')\n",
    "data_SK = pd.read_csv(filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Delete unwanted energy source\n",
    "data_SK = data_SK[data_SK.energy_source != 'Solar']\n",
    "\n",
    "data_SK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.14 Slovenia SI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by several Slovenian utilities. The respective data links are given in the column \"source\". This list encompasses Slovenian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file\n",
    "filepath = os.path.join('input', 'input_plant-list_SI.csv')\n",
    "data_SI = pd.read_csv(filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "data_SI.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.15 Austria AT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data for **conventional power plants** is provided by several Austrian utilities. The respective data links are given in the column \"source\". The specifications of Austrian **hydro power plants**, however, solely are based on [**Verbund AG**](https://www.verbund.com/de-at/ueber-verbund/kraftwerke/unsere-kraftwerke). The resulting list encompasses Austrian generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file for conventional plants\n",
    "filepath_conventional = os.path.join(\n",
    "    'input', 'input_plant-list_AT_conventional.csv')\n",
    "data_AT_conventional = pd.read_csv(\n",
    "    filepath_conventional, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Access local file for hydro plants\n",
    "filepath_hydro = os.path.join('input', 'input_plant-list_AT_hydro.csv')\n",
    "data_AT_hydro = pd.read_csv(\n",
    "    filepath_hydro, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# Merge the lists\n",
    "data_AT = data_AT_conventional.append(data_AT_hydro, ignore_index=True)\n",
    "\n",
    "data_AT.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.16 Denmark DK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is provided by the Danish transmission network operator [**Energinet.dk**](https://www.energinet.dk/SiteCollectionDocuments/Engelske%20dokumenter/El/Energinet%20dk%27s%20assumptions%20for%20analysis%202014-2035,%20September%202014.xlsm). It encompasses a detailed list of Danish generation units with comprehensive information on technologies and energy fuels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Access local file for conventional plants\n",
    "filepath = os.path.join('input', 'input_plant-list_DK.csv')\n",
    "data_DK = pd.read_csv(\n",
    "    filepath, encoding='utf-8', header=0, index_col=None)\n",
    "\n",
    "# List only operating plants\n",
    "data_DK = data_DK[data_DK.availability != '0']\n",
    "data_DK = data_DK[data_DK.availability != 'partly']\n",
    "\n",
    "data_DK.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Consolidation of processed country data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following, the national datasets are consolidated to a single European dataset. Unfortunately, the Belgian dataset cannot be integrated due to the copyright by the data owner ELIA."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dataframes = [#data_BE,\n",
    "              data_NL,\n",
    "              data_FR,\n",
    "              data_PL,\n",
    "              data_CZ,\n",
    "              data_CH,\n",
    "              data_IT,\n",
    "              data_FI,\n",
    "              data_ES,\n",
    "              data_UK,\n",
    "              data_NO,\n",
    "              data_SE,\n",
    "              data_SK,\n",
    "              data_SI,\n",
    "              data_AT,\n",
    "              data_DK]\n",
    "\n",
    "data_EU = pd.concat(dataframes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 Implementation of energy source levels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Import energy source level definition\n",
    "energy_source_levels = pd.read_csv(\n",
    "    os.path.join('input', 'energy_source_levels.csv'), index_col=None, header=0)\n",
    "\n",
    "# Merge energy source levels to data set\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_1').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_2').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "data_EU = data_EU.reset_index().merge(\n",
    "    energy_source_levels,\n",
    "    how='left',\n",
    "    left_on='energy_source',\n",
    "    right_on='energy_source_level_3').drop_duplicates(\n",
    "        subset=['name',\n",
    "                'city',\n",
    "                'country',\n",
    "                'capacity'], keep='first').set_index('name')\n",
    "\n",
    "# Combine different energy source levels created by merge\n",
    "data_EU['energy_source_level_1'] = data_EU[\n",
    "    ['energy_source_level_1',\n",
    "     'energy_source_level_1_x',\n",
    "     'energy_source_level_1_y']].fillna('').sum(axis=1)\n",
    "\n",
    "data_EU['energy_source_level_2'] = data_EU[\n",
    "    ['energy_source_level_2',\n",
    "     'energy_source_level_2_y']].fillna('').sum(axis=1)\n",
    "\n",
    "data_EU['energy_source_level_3'] = data_EU[\n",
    "    ['energy_source_level_3']].fillna('').sum(axis=1)\n",
    "\n",
    "# Drop auxiliary columns due to merge\n",
    "colsToDrop = ['energy_source_level_1_y',\n",
    "              'energy_source_level_2_y',\n",
    "              'energy_source_level_3_y',\n",
    "              'energy_source_level_1_x',\n",
    "              'energy_source_level_2_x',\n",
    "              'energy_source_level_3_x']\n",
    "data_EU = data_EU.drop(colsToDrop, axis=1)\n",
    "\n",
    "data_EU.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.2 Definition of structure and data types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we define the ordering of the columns. Secondly, the data types are redefined. At the moment, this has the drawback that empty columns are redefined as `float` instead of `object`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "columns_sorted_output = ['name',\n",
    "                         'company',\n",
    "                         'street',\n",
    "                         'postcode',\n",
    "                         'city',\n",
    "                         'country',\n",
    "                         'capacity',\n",
    "                         'energy_source',\n",
    "                         'technology',\n",
    "                         'chp',\n",
    "                         'commissioned',\n",
    "                         'type',\n",
    "                         'lat',\n",
    "                         'lon',\n",
    "                         'eic_code',\n",
    "                         'energy_source_level_1',\n",
    "                         'energy_source_level_2',\n",
    "                         'energy_source_level_3',\n",
    "                         'additional_info',\n",
    "                         'comment',\n",
    "                         'source']\n",
    "\n",
    "# Set ordering of columns\n",
    "data_EU = data_EU.reset_index()\n",
    "data_EU = data_EU.reindex(columns=columns_sorted_output)\n",
    "\n",
    "# Set data types for columns\n",
    "data_EU = data_EU.astype(str)\n",
    "data_EU[['capacity', 'commissioned', 'lat', 'lon']] = data_EU[\n",
    "    ['capacity', 'commissioned', 'lat', 'lon']].astype(float)\n",
    "\n",
    "data_EU.replace('nan', np.nan, inplace=True)\n",
    "\n",
    "# data_EU.dtypes\n",
    "\n",
    "# Set index\n",
    "data_EU = data_EU.set_index('name')\n",
    "\n",
    "data_EU.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Documentation of the data package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We document the data packages meta data in the specific format JSON as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.\n",
    "\n",
    "In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with open(os.path.join('input', 'metadata.yml'), 'r') as f:\n",
    "    metadata = yaml.load(f.read())\n",
    " \n",
    "datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Result export"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1 Write results to file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "output_path = 'output'\n",
    "\n",
    "data_EU.to_csv(os.path.join(\n",
    "    output_path, 'conventional_power_plants_EU.csv'),\n",
    "    encoding='utf-8',\n",
    "    index_label='name')\n",
    "\n",
    "data_EU.to_excel(\n",
    "    os.path.join(output_path, 'conventional_power_plants_EU.xlsx'),\n",
    "    sheet_name='plants',\n",
    "    index_label='name')\n",
    "\n",
    "data_EU.to_sql(\n",
    "    'conventional_power_plants_EU',\n",
    "    sqlite3.connect(os.path.join(output_path, 'conventional_power_plants.sqlite')),\n",
    "    if_exists=\"replace\",\n",
    "    index_label='name')\n",
    "\n",
    "with open(os.path.join(output_path, 'datapackage.json'), 'w') as f:\n",
    "    f.write(datapackage_json)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 Write checksums"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "files = [\n",
    "    'conventional_power_plants_DE.csv', 'conventional_power_plants_DE.xlsx',\n",
    "    'conventional_power_plants_EU.csv', 'conventional_power_plants_EU.xlsx',\n",
    "    'conventional_power_plants.sqlite']\n",
    "\n",
    "with open('checksums.txt', 'w') as f:\n",
    "    for file_name in files:\n",
    "        file_hash = get_sha_hash(os.path.join(output_path, file_name))\n",
    "        f.write('{},{}\\n'.format(file_name, file_hash))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}