{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"width:100%; background-color: #D9EDF7; border: 1px solid #CFCFCF; text-align: left; padding: 10px;\">\n",
    "      <b>Conventional Power Plants: Power Plants in Germany</b>\n",
    "      <ul>\n",
    "        <li><a href=\"main.ipynb\">Main Notebook</a></li>\n",
    "        <li>Processing notebook for German power plant</li>\n",
    "        <li><a href=\"download_and_process_EU.ipynb\">Processing notebook for European power plants</a></li>\n",
    "      </ul>\n",
    "      <br>This notebook is part of the <a href=\"http://data.open-power-system-data.org/DATA PACKAGE NAME HERE\"> Data package name here Data Package</a> of <a href=\"http://open-power-system-data.org\">Open Power System Data</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Power Plants in Germany"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This file covers german power plants. It downloads the power plant list from the German Federal Network Agency (BNetzA) and augments it with more information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents\n",
    "* [1. Power Plants in Germany](#1.-Power-Plants-in-Germany)\n",
    "* [2. Prepare the environment](#2.-Prepare-the-environment)\n",
    "* [3. Download settings](#3.-Download-settings)\n",
    "\t* [3.1 Choose download location](#3.1-Choose-download-location)\n",
    "* [4. Define functions](#4.-Define-functions)\n",
    "* [5. Downloads](#5.-Downloads)\n",
    "\t* [5.1 Download the BNetzA power plant list](#5.1-Download-the-BNetzA-power-plant-list)\n",
    "\t* [5.2 Download the Uba Plant list](#5.2-Download-the-Uba-Plant-list)\n",
    "* [6. Translate contents](#6.-Translate-contents)\n",
    "\t* [6.1 Columns](#6.1-Columns)\n",
    "\t* [6.2 Fuel types](#6.2-Fuel-types)\n",
    "\t* [6.3 Power plant status](#6.3-Power-plant-status)\n",
    "\t* [6.4 CHP Capability](#6.4-CHP-Capability)\n",
    "\t* [6.5 EEG](#6.5-EEG)\n",
    "\t* [6.6 UBA Columns](#6.6-UBA-Columns)\n",
    "* [7. Process data](#7.-Process-data)\n",
    "\t* [7.1 Set index to the BNetzA power plant ID](#7.1-Set-index-to-the-BNetzA-power-plant-ID)\n",
    "\t* [7.2 Merge data from UBA List](#7.2-Merge-data-from-UBA-List)\n",
    "\t\t* [7.2.1 case 1-1](#7.2.1-case-1-1)\n",
    "\t\t* [7.2.2 case n-1](#7.2.2-case-n-1)\n",
    "\t\t* [7.2.3 case 1-n](#7.2.3-case-1-n)\n",
    "\t\t* [7.2.4 Merge into plantlist](#7.2.4-Merge-into-plantlist)\n",
    "\t* [7.3 Delete fuels not in focus](#7.3-Delete-fuels-not-in-focus)\n",
    "\t* [7.4 Add Columns for shutdown and retrofit](#7.4-Add-Columns-for-shutdown-and-retrofit)\n",
    "\t* [7.5 Convert input colums to usable data types](#7.5-Convert-input-colums-to-usable-data-types)\n",
    "\t* [7.6 Identify generation technology](#7.6-Identify-generation-technology)\n",
    "\t\t* [7.6.1 Process technology information from UBA list](#7.6.1-Process-technology-information-from-UBA-list)\n",
    "\t\t* [7.6.2 Identify generation technology based on BNetzA information](#7.6.2-Identify-generation-technology-based-on-BNetzA-information)\n",
    "\t* [7.7 Add country code](#7.7-Add-country-code)\n",
    "\t* [7.8 Add efficiency data](#7.8-Add-efficiency-data)\n",
    "\t\t* [7.8.1 Efficiencies from research](#7.8.1-Efficiencies-from-research)\n",
    "\t\t\t* [7.8.1.1 Import data](#7.8.1.1-Import-data)\n",
    "\t\t\t* [7.8.1.2 Plot efficiencies by year of commissioning](#7.8.1.2-Plot-efficiencies-by-year-of-commissioning)\n",
    "\t\t\t* [7.8.1.3 Determine least-squares approximation based on researched data (planned)](#7.8.1.3-Determine-least-squares-approximation-based-on-researched-data-%28planned%29)\n",
    "\t\t\t* [7.8.1.4 Apply efficiency approximation from least squares approximation (planned)](#7.8.1.4-Apply-efficiency-approximation-from-least-squares-approximation-%28planned%29)\n",
    "\t\t* [7.8.2 Efficiencies from literature](#7.8.2-Efficiencies-from-literature)\n",
    "\t\t\t* [7.8.2.1 Import data](#7.8.2.1-Import-data)\n",
    "\t\t\t* [7.8.2.2 Apply efficiency approximation from literature](#7.8.2.2-Apply-efficiency-approximation-from-literature)\n",
    "\t* [7.9 Add geodata](#7.9-Add-geodata)\n",
    "\t* [7.10 Allocate energy source levels](#7.10-Allocate-energy-source-levels)\n",
    "* [8. Define final output](#8.-Define-final-output)\n",
    "\t* [8.1 Round values](#8.1-Round-values)\n",
    "\t* [8.2 Verification](#8.2-Verification)\n",
    "\t\t* [8.2.1 Capacities by plant status](#8.2.1-Capacities-by-plant-status)\n",
    "\t\t* [8.2.2 Power plant age](#8.2.2-Power-plant-age)\n",
    "\t\t* [8.2.3 Block size vs year of commissioning](#8.2.3-Block-size-vs-year-of-commissioning)\n",
    "\t* [8.3 Logical checks](#8.3-Logical-checks)\n",
    "\t\t* [8.3.1 Every power plant needs a capacity](#8.3.1-Every-power-plant-needs-a-capacity)\n",
    "\t\t* [8.3.2 Commissioning Dates](#8.3.2-Commissioning-Dates)\n",
    "\t\t* [8.3.3 Compare UBA and BNetzA data](#8.3.3-Compare-UBA-and-BNetzA-data)\n",
    "\t\t\t* [8.3.3.1 Postcodes of BNetzA and UBA lists should match](#8.3.3.1-Postcodes-of-BNetzA-and-UBA-lists-should-match)\n",
    "\t\t\t* [8.3.3.2 Compare Installed capacities](#8.3.3.2-Compare-Installed-capacities)\n",
    "* [9. Documenting the data package (meta data)](#9.-Documenting-the-data-package-%28meta-data%29)\n",
    "* [10. Result export](#10.-Result-export)\n",
    "\t* [10.1 Write the results to file](#10.1-Write-the-results-to-file)\n",
    "\t* [10.2 Write checksums](#10.2-Write-checksums)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Prepare the environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Import all functions from external file\n",
    "from download_and_process_DE_functions import *\n",
    "\n",
    "# Jupyter functions\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Download settings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 Choose download location"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original data can either be downloaded from the original data sources as specified below or from the opsd-Server. Default option is to download from the original sources as the aim of the project is to stay as close to original sources as possible. However, if problems with downloads e.g. due to changing urls occur, you can still run the script with the original data from the opsd_server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "download_from = 'original_sources'\n",
    "#download_from = 'opsd_server'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "if download_from == 'original_sources':\n",
    "    # BNetzA Power plant list\n",
    "    url_bnetza = ('http://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/'\n",
    "              'Sachgebiete/Energie/Unternehmen_Institutionen/Versorgungssicherheit/'\n",
    "              'Erzeugungskapazitaeten/Kraftwerksliste/Kraftwerksliste_CSV.csv'\n",
    "              '?__blob=publicationFile&v=16')\n",
    "\n",
    "    # UBA Power plant list\n",
    "    url_uba = ('https://www.umweltbundesamt.de/sites/default/files/medien/372/dokumente/kraftwerke_de_ab_100_mw_0.xls')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "if download_from == 'opsd_server':\n",
    "    # Specify direction to original_data folder on the opsd data server\n",
    "    # BNetzA Power plant list\n",
    "    url_bnetza = 'http://data.open-power-system-data.org/conventional_power_plants/'\n",
    "    url_bnetza = url_bnetza + '2018-12-20'\n",
    "    url_bnetza = url_bnetza +'/original_data/Kraftwerksliste_CSV.csv'\n",
    "    \n",
    "    # UBA Power plant list\n",
    "    url_uba = 'http://data.open-power-system-data.org/conventional_power_plants/'\n",
    "    url_uba = url_uba + '2018-12-20'\n",
    "    url_uba = url_uba +'/original_data/kraftwerke_de_ab_100_mw_0.xls'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Define functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Functions used multiple times within this script are now located in a separate file called download_and_process_DE_functions.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Downloads"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 Download the BNetzA power plant list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section downloads the BNetzA power plant list and converts it to a pandas data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist = getbnetzalist(url_bnetza)\n",
    "\n",
    "# clean unnamed columns\n",
    "plantlist.drop([c for c in plantlist.columns if 'Unnamed:' in c], axis=1, inplace=True)\n",
    "\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.2 Download the UBA Plant list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section downloads the power plant list from the German Federal Environment Agency (UBA) and converts it to a pandas data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist_uba = getubalist(url_uba)\n",
    "plantlist_uba.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Translate contents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.1 Columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dictionary with the original column names to the new column names is created. This dictionary is used to translate the column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "dict_columns = {\n",
    "    'Kraftwerksnummer Bundesnetzagentur':\n",
    "        'id',\n",
    "    'Unternehmen':\n",
    "        'company',\n",
    "    'Kraftwerksname':\n",
    "        'name',\n",
    "    'PLZ\\n(Standort Kraftwerk)':\n",
    "        'postcode',\n",
    "    'Ort\\n(Standort Kraftwerk)':\n",
    "        'city',\n",
    "    'Straße und Hausnummer (Standort Kraftwerk)':\n",
    "        'street',\n",
    "    'Bundesland':\n",
    "        'state',\n",
    "    'Blockname':\n",
    "        'block',\n",
    "    ('Aufnahme der kommerziellen Stromerzeugung der derzeit in Betrieb '\n",
    "     'befindlichen Erzeugungseinheit\\n(Datum/Jahr)'):\n",
    "        'commissioned',\n",
    "    ('Kraftwerksstatus \\n(in Betrieb/\\nvorläufig stillgelegt/\\nsaisonale '\n",
    "     'Konservierung\\nReservekraftwerk/\\nSonderfall)'):\n",
    "        'status',\n",
    "    ('Kraftwerksstatus \\n(in Betrieb/\\nvorläufig stillgelegt/\\nsaisonale '\n",
    "     'Konservierung\\nGesetzlich an Stilllegung gehindert/\\nSonderfall)'):\n",
    "        'status',\n",
    "    ('Kraftwerksstatus \\n(in Betrieb/\\nvorläufig stillgelegt/\\nsaisonale '\n",
    "     'Konservierung\\nNetzreserve/ Sicherheitsbereitschaft/\\nSonderfall)'):\n",
    "        'status',\n",
    "    'Energieträger':\n",
    "        'fuel_basis',\n",
    "    ('Spezifizierung \"Mehrere Energieträger\" und \"Sonstige Energieträger\" - '\n",
    "     'Hauptbrennstoff'): 'fuel_multiple1',\n",
    "    'Spezifizierung \"Mehrere Energieträger\" - Zusatz- / Ersatzbrennstoffe':\n",
    "        'fuel_multiple2',\n",
    "    ('Auswertung\\nEnergieträger (Zuordnung zu einem Hauptenergieträger bei '\n",
    "     'Mehreren Energieträgern)'):\n",
    "        'fuel',\n",
    "    'Vergütungsfähig nach EEG\\n(ja/nein)':\n",
    "        'eeg',\n",
    "    'Wärmeauskopplung (KWK)\\n(ja/nein)':\n",
    "        'chp',\n",
    "    'Netto-Nennleistung (elektrische Wirkleistung) in MW':\n",
    "        'capacity',\n",
    "    ('Bezeichnung Verknüpfungspunkt (Schaltanlage) mit dem Stromnetz der '\n",
    "     'Allgemeinen Versorgung gemäß Netzbetreiber'):\n",
    "        'network_node',\n",
    "    'Netz- oder Umspannebene des Anschlusses in kV':\n",
    "        'voltage',\n",
    "    'Name Stromnetzbetreiber':\n",
    "        'network_operator',\n",
    "    'Kraftwerksname / Standort':\n",
    "        'uba_name',\n",
    "    'Betreiber ':\n",
    "        'uba_company',\n",
    "    'Standort-PLZ':\n",
    "        'uba_postcode',\n",
    "    'Kraftwerksstandort':\n",
    "        'uba_city',\n",
    "    'Elektrische Bruttoleistung (MW)':\n",
    "        'uba_capacity',\n",
    "    'Fernwärme-leistung (MW)':\n",
    "        'uba_chp_capacity',\n",
    "    'Inbetriebnahme  (ggf. Ertüchtigung)':\n",
    "        'uba_commissioned',\n",
    "    'Anlagenart':\n",
    "        'uba_technology',\n",
    "    'Primärenergieträger':\n",
    "        'uba_fuel',\n",
    "}\n",
    "plantlist.rename(columns=dict_columns, inplace=True)\n",
    "\n",
    "# Check if all columns have been translated\n",
    "for columnnames in plantlist.columns:\n",
    "    # if columnnames not in dict_columns.values():\n",
    "    if columnnames not in dict_columns.values():\n",
    "        logger.error(\"Untranslated column: \"+ columnnames)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.2 Fuel types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# first remove line breaks\n",
    "plantlist['fuel'] = plantlist['fuel'].str.replace('\\n', ' ')\n",
    "\n",
    "# Delete entries without fuel and name\n",
    "plantlist = plantlist.dropna(subset = ['fuel','name'])\n",
    "\n",
    "dict_fuels = {\n",
    "    'Steinkohle': 'Hard coal',\n",
    "    'Erdgas': 'Natural gas',\n",
    "    'Braunkohle': 'Lignite',\n",
    "    'Kernenergie': 'Nuclear',\n",
    "    'Pumpspeicher': 'Hydro PSP',\n",
    "    'Biomasse': 'Biomass and biogas',\n",
    "    'Mineralölprodukte': 'Oil',\n",
    "    'Laufwasser': 'Hydro',\n",
    "    'Sonstige Energieträger (nicht erneuerbar) ': 'Other fuels',\n",
    "    'Abfall': 'Waste',\n",
    "    'Speicherwasser (ohne Pumpspeicher)': 'Hydro reservoir',\n",
    "    'Unbekannter Energieträger (nicht erneuerbar)': 'Other fuels',\n",
    "    'Sonstige Energieträger (nicht erneuerbar)': 'Other fuels',\n",
    "    'Mehrere Energieträger (nicht erneuerbar)': 'Mixed fossil fuels',\n",
    "    'Deponiegas': 'Sewage and landfill gas',\n",
    "    'Windenergie (Onshore-Anlage)': 'Onshore',\n",
    "    'Windenergie (Onshore-Anlage)neu': 'Onshore',\n",
    "    'Windenergie (Offshore-Anlage)': 'Offshore',\n",
    "    'Solare Strahlungsenergie': 'Solar',\n",
    "    'Klärgas': 'Sewage and landfill gas',\n",
    "    'Geothermie': 'Geothermal',\n",
    "    'Grubengas': 'Other fossil fuels',\n",
    "    'Sonstige Speichertechnologien': 'Storage Technologies'\n",
    "}\n",
    "plantlist[\"fuel\"].replace(dict_fuels, inplace=True)\n",
    "\n",
    "# Check if all fuels have been translated\n",
    "for fuelnames in plantlist[\"fuel\"].unique():\n",
    "    if fuelnames not in dict_fuels.values():\n",
    "        print(dict_fuels.values(), fuelnames)\n",
    "        logger.error(\"Untranslated fuel: \" + fuelnames)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.3 Power plant status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "dict_plantstatus = {\n",
    "    'in Betrieb': 'operating',\n",
    "    'In Betrieb': 'operating',\n",
    "    'vorläufig stillgelegt': 'shutdown_temporary',\n",
    "    'Vorläufig stillgelegt': 'shutdown_temporary',\n",
    "    'Vorläufig Stillgelegt': 'shutdown_temporary',    \n",
    "    'Sonderfall': 'special_case',\n",
    "    'saisonale Konservierung': 'seasonal_conservation',\n",
    "    'Saisonale Konservierung': 'seasonal_conservation',    \n",
    "    'Reservekraftwerk':'reserve',\n",
    "    'Endgültig Stillgelegt 2011': 'shutdown_2011',\n",
    "    'Endgültig Stillgelegt 2012': 'shutdown_2012',\n",
    "    'Endgültig Stillgelegt 2013': 'shutdown_2013',\n",
    "    'Endgültig Stillgelegt 2014': 'shutdown_2014',\n",
    "    'Endgültig Stillgelegt 2015': 'shutdown_2015',\n",
    "    'Endgültig stillgelegt 2015': 'shutdown_2015',\n",
    "    'Endgültig Stillgelegt 2016': 'shutdown_2016',\n",
    "    'Gesetzlich an Stilllegung gehindert': 'operating',    \n",
    "    'Endgültig Stillgelegt 2011 (ohne StA)': 'shutdown_2011',\n",
    "    'Endgültig Stillgelegt 2012 (ohne StA)': 'shutdown_2012',\n",
    "    'Endgültig Stillgelegt 2013 (mit StA)': 'shutdown_2013',\n",
    "    'Endgültig Stillgelegt 2013 (ohne StA)': 'shutdown_2013',\n",
    "    'Endgültig Stillgelegt 2014 (mit StA)': 'shutdown_2014',\n",
    "    'Endgültig Stillgelegt 2014 (ohne StA)': 'shutdown_2014',\n",
    "    'Endgültig Stillgelegt 2015 (mit StA)': 'shutdown_2015',\n",
    "    'Endgültig Stillgelegt 2015 (ohne StA)': 'shutdown_2015',\n",
    "    'Endgültig Stillgelegt 2016 (mit StA)': 'shutdown_2016',\n",
    "    'Sicherheitsbereitschaft': 'reserve',\n",
    "    'Vorläufig Stillgelegt (mit StA)': 'shutdown_temporary',\n",
    "    'Vorläufig Stillgelegt (ohne StA)': 'shutdown_temporary',\n",
    "    'Endgültig Stillgelegt 2016 (ohne StA)': 'shutdown_2016',\n",
    "    'Endgültig Stillgelegt 2017 (mit StA)' : 'shutdown_2017',\n",
    "    'Endgültig Stillgelegt 2017 (ohne StA)': 'shutdown_2017',\n",
    "    'Endgültig Stillgelegt 2018 (mit StA)' : 'shutdown_2018',\n",
    "    'Endgültig Stillgelegt 2018 (ohne StA)': 'shutdown_2018',\n",
    "    'gesetzlich an Stilllegung gehindert' : 'operating',\n",
    "    'Netzreserve' : 'reserve',\n",
    "}  \n",
    "plantlist['status'].replace(dict_plantstatus, inplace=True) \n",
    "\n",
    "# Check if all fuels have been translated\n",
    "for statusnames in plantlist['status'].unique():\n",
    "    if statusnames not in dict_plantstatus.values():\n",
    "        logger.error('Untranslated plant status: '+ statusnames)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.4 CHP Capability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "dict_yesno ={\n",
    "    'Nein': 'no',\n",
    "    'nein': 'no',\n",
    "    'Ja': 'yes',\n",
    "    'ja': 'yes',    \n",
    "}\n",
    "plantlist['chp'].replace(dict_yesno, inplace=True)\n",
    "\n",
    "# Check if all fuels have been translated\n",
    "for chpnames in plantlist['chp'].unique():\n",
    "    if (chpnames not in dict_yesno.values()) & (str(chpnames) != 'nan'):\n",
    "        logger.error('Untranslated chp capability: ' + str(chpnames))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.5 EEG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plantlist['eeg'].replace(dict_yesno, inplace=True)\n",
    "\n",
    "# Check if all fuels have been translated\n",
    "for eegnames in plantlist['eeg'].unique():\n",
    "    if (eegnames not in dict_yesno.values()) & (str(eegnames) != 'nan'):\n",
    "        logger.error('Untranslated EEG type: ' + str(eegnames))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.6 UBA Columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Translate the UBA Column names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "dict_uba_columns = {\n",
    "    'Kraftwerksname / Standort': 'uba_name',\n",
    "    'Betreiber ': 'uba_company',\n",
    "    'Standort-PLZ': 'uba_postcode',\n",
    "    'Kraftwerksstandort': 'uba_city',\n",
    "    'Elektrische Bruttoleistung (MW)': 'uba_capacity',\n",
    "    'Fernwärme-leistung (MW)': 'uba_chp_capacity',\n",
    "    'Inbetriebnahme  (ggf. Ertüchtigung)': 'uba_commissioned',\n",
    "    'Anlagenart': 'uba_technology',\n",
    "    'Primärenergieträger': 'uba_fuel',\n",
    "    'Bundesland':'uba_state',\n",
    "}\n",
    "plantlist_uba.rename(columns=dict_uba_columns, inplace=True)\n",
    "\n",
    "# Check if all columns have been translated\n",
    "for columnnames in plantlist_uba.columns:\n",
    "    if columnnames not in dict_uba_columns.values():\n",
    "        logger.error('Untranslated column: ' + columnnames)\n",
    "        \n",
    "# Prepare for matching\n",
    "plantlist_uba['uba_id_string'] = (plantlist_uba['uba_name'] \n",
    "                                  + '_' \n",
    "                                  + plantlist_uba['uba_fuel'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Process data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1 Set index to the BNetzA power plant ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set Index of BNetzA power plant list to Kraftwerksnummer_Bundesnetzagentur\n",
    "plantlist['bnetza_id'] = plantlist['id']\n",
    "plantlist = plantlist.set_index('id')\n",
    "\n",
    "# remove line breaks in some columns\n",
    "plantlist['network_node'] = plantlist['network_node'].str.replace('\\n', ' ')\n",
    "plantlist['company'] = plantlist['company'].str.replace('\\n', ' ')\n",
    "plantlist['name'] = plantlist['name'].str.replace('\\n', ' ')\n",
    "plantlist['fuel'] = plantlist['fuel'].str.replace('\\n', ' ')\n",
    "plantlist['block'] = plantlist['block'].str.replace('\\n', ' ')\n",
    "plantlist['network_operator'] = plantlist['network_operator'].str.replace('\\n', ' ')\n",
    "plantlist['street'] = plantlist['street'].str.replace('\\n', ' ')\n",
    "plantlist['commissioned'] = plantlist['commissioned'].str.replace('\\n', ' ')\n",
    " \n",
    "plantlist.head() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Manual adjustments:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plantlist.loc[plantlist['bnetza_id'] == 'BNA0834', 'fuel'] = 'Natural gas'\n",
    "plantlist.loc[plantlist['bnetza_id'] == 'BNA0662a', 'fuel'] = 'Hard coal'\n",
    "plantlist.loc[plantlist['bnetza_id'] == 'BNA0662b', 'fuel'] = 'Hard coal'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 Merge data from UBA List"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section a hand-researched list is used to match the power plants from the UBA list to the BNetzA list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read matching list\n",
    "matchinglist = getmatchinglist()\n",
    "matchinglist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.1 case 1-1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Matching: 1-1 One BNetzA ID to one UBA-ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "match1t1 = matchinglist[\n",
    "    (matchinglist.duplicated(subset=['uba_id_string'], keep=False) == False) \n",
    "    & (matchinglist.duplicated(subset=['ID BNetzA'], keep=False) == False)]\n",
    "match1t1 = pd.merge(match1t1, plantlist_uba,\n",
    "                    left_on='uba_id_string', \n",
    "                    right_on='uba_id_string',\n",
    "                    how='left')\n",
    "match1t1 = match1t1.set_index('ID BNetzA')\n",
    "\n",
    "#Add comment\n",
    "match1t1['merge_comment'] = ('List matching type: Single UBA power plant '\n",
    "                             'assigned to single BNetzA power plant')\n",
    "\n",
    "match1t1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.2 case n-1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Match multiple BNetza IDs to one UBA ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Matching structure (example): \n",
    "# bnetza_id uba_id\n",
    "# 1         1\n",
    "# 2         1\n",
    "# 3         1\n",
    "# 4         2\n",
    "# 5         2\n",
    "\n",
    "# Get relevant entries from the matchinglist and merge the corresponding \n",
    "# UBA Data to the list.\n",
    "matchnt1= matchinglist[\n",
    "    (matchinglist.duplicated(subset=['uba_id_string'], keep=False) == True)\n",
    "    & (matchinglist.duplicated(subset=['ID BNetzA'], keep=False)== False)]\n",
    "matchnt1 = pd.merge(matchnt1, plantlist_uba,\n",
    "                    left_on='uba_id_string', right_on='uba_id_string', how='left')\n",
    "matchnt1 = matchnt1.set_index('ID BNetzA')\n",
    "\n",
    "# Import BNetzA Capacities and CHP criterion into matchnt1 dataframe\n",
    "plantlist_capacities = pd.DataFrame(plantlist[['capacity', 'chp']]).rename(\n",
    "    columns={'capacity': 'capacity_bnetza', 'chp': 'chp_bnetza'})\n",
    "matchnt1 = pd.merge(matchnt1, plantlist_capacities,\n",
    "                    left_index=True, right_index=True, how='left')\n",
    "\n",
    "# Get sum of BNetzA Capacitites for each UBA Index and merge into matchnt1 dataframe\n",
    "plantlist_uba_capacitysum = pd.DataFrame(\n",
    "    matchnt1.groupby('uba_id_string').sum()['capacity_bnetza']).rename(\n",
    "        columns={'capacity_bnetza': 'capacity_bnetza_aggregate'})\n",
    "matchnt1 = pd.merge(matchnt1, plantlist_uba_capacitysum,\n",
    "                    left_on='uba_id_string', right_index=True, how='left')\n",
    "\n",
    "# Scale UBA Capacities based BNetzA Data\n",
    "matchnt1['uba_capacity_scaled'] = (matchnt1['uba_capacity']\n",
    "                                   * matchnt1['capacity_bnetza']\n",
    "                                   / matchnt1['capacity_bnetza_aggregate'])\n",
    "\n",
    "# determine sum of capacities with chp capability and add to matchnt1\n",
    "plantlist_uba_chp_capacities = matchnt1[(matchnt1['chp_bnetza'] == 'yes')]\n",
    "plantlist_uba_chp_capacitysum = pd.DataFrame(\n",
    "    plantlist_uba_chp_capacities.groupby('uba_id_string')\n",
    "    .sum()['capacity_bnetza']) \n",
    "plantlist_uba_chp_capacitysum = plantlist_uba_chp_capacitysum.rename(\n",
    "    columns={'capacity_bnetza': 'capacity_bnetza_with_chp'})\n",
    "\n",
    "matchnt1 = pd.merge(matchnt1, plantlist_uba_chp_capacitysum,\n",
    "                    left_on='uba_id_string', right_index=True, how='left',)\n",
    "\n",
    "matchnt1['uba_chp_capacity'] = pd.to_numeric(matchnt1['uba_chp_capacity'], errors='coerce')\n",
    "\n",
    "\n",
    "matchnt1['uba_chp_capacity_scaled'] = (matchnt1['uba_chp_capacity']\n",
    "                                       * matchnt1['capacity_bnetza']\n",
    "                                       / matchnt1['capacity_bnetza_with_chp'])\n",
    "\n",
    "# Change column names for merge later on\n",
    "matchnt1['uba_chp_capacity_original'] = matchnt1['uba_chp_capacity']\n",
    "matchnt1['uba_chp_capacity'] = matchnt1['uba_chp_capacity_scaled']\n",
    "matchnt1['uba_capacity_original'] = matchnt1['uba_capacity']\n",
    "matchnt1['uba_capacity'] = matchnt1['uba_capacity_scaled']\n",
    "\n",
    "#Add comment\n",
    "matchnt1['merge_comment'] = ('List matching type: UBA capacity distributed '\n",
    "                             'proportionally to multiple BNetzA power plants')\n",
    "\n",
    "matchnt1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.3 case 1-n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1-n Case here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The resulting DataFrame should be called \"match1tn\"\n",
    "# Matching structure: \n",
    "# bnetza_id uba_id\n",
    "# 1         1\n",
    "# 1         2\n",
    "# 1         3\n",
    "# 2         4\n",
    "# 2         5\n",
    "\n",
    "# Get relevant entries from the matchinglist and merge the corresponding UBA Data to the list.\n",
    "match1tn= matchinglist[\n",
    "    (matchinglist.duplicated(subset=['ID BNetzA'], keep=False) == True) & \n",
    "    (matchinglist.duplicated(subset=['uba_id_string'], keep=False)== False)]\n",
    "match1tn = pd.merge(match1tn, plantlist_uba,\n",
    "                    left_on='uba_id_string', right_on='uba_id_string', how='left')\n",
    "match1tn = match1tn.set_index('ID BNetzA')\n",
    "match1tn.head()\n",
    "\n",
    "# Import BNetzA Capacities and CHP criterion into match1tn dataframe\n",
    "plantlist_capacities = pd.DataFrame(plantlist[['capacity','chp']]).rename(\n",
    "    columns = {'capacity': 'capacity_bnetza', 'chp': 'chp_bnetza'})\n",
    "match1tn = pd.merge(match1tn, plantlist_capacities,\n",
    "                    left_index=True, right_index=True, how='left')\n",
    "match1tn.index.names=['ID BNetzA']\n",
    "match1tn.head()\n",
    "\n",
    "# Get sum of UBA Capacitites per BNetzA Index and merge to match1tn dataframe\n",
    "plantlist_bnetza_capacitysum = pd.DataFrame(\n",
    "    match1tn.groupby(match1tn.index).sum()['uba_capacity'])\n",
    "plantlist_bnetza_capacitysum = plantlist_bnetza_capacitysum.rename(\n",
    "    columns={'uba_capacity':'uba_capacity_aggregate'})\n",
    "match1tn = pd.merge(match1tn, plantlist_bnetza_capacitysum,\n",
    "                    left_index=True, right_index=True, how='left')\n",
    "\n",
    "match1tn['uba_chp_capacity'] = pd.to_numeric(match1tn['uba_chp_capacity'], errors='coerce')\n",
    "match1tn\n",
    "\n",
    "\n",
    "# Get sum of UBA CHP Capacities per BNetzA Index and merge to match1tn dataframe\n",
    "plantlist_bnetza_chp_capacitysum = pd.DataFrame(\n",
    "    match1tn.groupby(match1tn.index).sum()['uba_chp_capacity'])\n",
    "plantlist_bnetza_chp_capacitysum = plantlist_bnetza_chp_capacitysum.rename(\n",
    "    columns={'uba_chp_capacity': 'uba_chp_capacity_aggregate'})\n",
    "match1tn = pd.merge(match1tn, plantlist_bnetza_chp_capacitysum,\n",
    "                    left_index=True, right_index=True, how='left')\n",
    "\n",
    "# Get UBA Technology for each BNetzA Index and merge into match1tn dataframe \n",
    "## Option 1: Take all technologies and merge them\n",
    "#match1tn['uba_technology_aggregate'] = pd.DataFrame(\n",
    "#    match1tn.groupby(match1tn.index)\n",
    "#    .transform(lambda x: ', '.join(x))['uba_technology'])\n",
    "## Option 2 (currently preferred): Take technology with highest occurence\n",
    "match1tn['uba_technology_aggregate'] = pd.DataFrame(\n",
    "    match1tn.groupby(match1tn.index)['uba_technology']\n",
    "    .agg(lambda x: x.value_counts().index[0]))\n",
    "\n",
    "# Get UBA Plant name\n",
    "match1tn['uba_name_aggregate'] = pd.DataFrame(\n",
    "    match1tn.groupby(match1tn.index).transform(lambda x: ', '.join(x))['uba_name'])\n",
    "\n",
    "# Get UBA company name\n",
    "match1tn['uba_company_aggregate'] = pd.DataFrame(\n",
    "    match1tn.groupby(match1tn.index)['uba_company']\n",
    "    .agg(lambda x:x.value_counts().index[0]))\n",
    "\n",
    "# Change column names for merge later on\n",
    "match1tn = match1tn.rename(\n",
    "    columns={'uba_chp_capacity': 'uba_chp_capacity_original',\n",
    "             'uba_capacity': 'uba_capacity_original',\n",
    "             'uba_chp_capacity_aggregate': 'uba_chp_capacity',\n",
    "             'uba_capacity_aggregate': 'uba_capacity'})\n",
    "\n",
    "#Add comment\n",
    "match1tn['merge_comment'] = ('List matching type: Multiple UBA capacities '\n",
    "                             'aggregated to single BNetzA power plant')\n",
    "\n",
    "# Drop duplicate rows and keep first entry\n",
    "match1tn = match1tn.reset_index().drop_duplicates(subset='ID BNetzA',keep='first').set_index('ID BNetzA')\n",
    "\n",
    "match1tn.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.4 Merge into plantlist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge the UBA DataFrames\n",
    "# Merge first two dataframes\n",
    "plantlist_uba_for_merge = match1t1.append(matchnt1)\n",
    "\n",
    "# Add third dataframe\n",
    "plantlist_uba_for_merge = plantlist_uba_for_merge.append(match1tn)\n",
    "\n",
    "# Merge plantlist_uba_for_merge into the plantlist\n",
    "plantlist = pd.merge(plantlist, plantlist_uba_for_merge,\n",
    "                     left_index=True, right_index=True, how='left')\n",
    "\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.3 Delete fuels not in focus"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, solar, wind onshore. and wind offshore technologies are deleted from the list, as they are handled by another datapackage. Furthermore, aggregate values are excluded as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Delete solar, wind onshore, and wind offshore\n",
    "plantlist = plantlist[(plantlist['fuel'] != 'Solar') \n",
    "                       & (plantlist['fuel'] != 'Onshore') \n",
    "                       & (plantlist['fuel'] != 'Offshore')]\n",
    "\n",
    "# Delete aggregate values\n",
    "plantlist = plantlist[(plantlist['company'] != 'EEG-Anlagen < 10 MW') \n",
    "                       & (plantlist['company'] != 'Nicht-EEG-Anlagen < 10 MW')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.4 Add Columns for shutdown and retrofit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract the year when plants were shutdown or retrofit, using regular expressions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add columns with empty data\n",
    "plantlist['shutdown'] = 'NaN'\n",
    "\n",
    "plantlist['shutdown'] = pd.to_numeric(\n",
    "    plantlist['status'].str.extract('[\\w].+(\\d\\d\\d\\d)', expand=False), \n",
    "    errors='coerce')\n",
    "plantlist.loc[plantlist['shutdown'] > 0, 'status'] = 'shutdown'\n",
    "\n",
    "# Fill retrofit data column\n",
    "# Identify restrofit dates in UBA list\n",
    "plantlist['retrofit'] = pd.to_numeric(\n",
    "    plantlist['uba_commissioned'].str.extract('[(.+](\\d\\d\\d\\d)', expand=False), \n",
    "    errors='coerce')\n",
    "\n",
    "# Split multiple commissioning dates as listed in UBA\n",
    "plantlist['uba_commissioned_1'] = pd.to_numeric(\n",
    "    plantlist['uba_commissioned'].str.extract('(\\d\\d\\d\\d)', expand=False), \n",
    "    errors='coerce')\n",
    "plantlist.loc[plantlist['uba_commissioned_1'].isnull(), 'uba_commissioned_1'] = pd.to_numeric(\n",
    "    plantlist['uba_commissioned'].str.extract('(\\d\\d\\d\\d).+[\\w]', expand=False), \n",
    "    errors='coerce')\n",
    "plantlist['uba_commissioned_2'] = pd.to_numeric(\n",
    "    plantlist['uba_commissioned'].str.extract('[\\w].+(\\d\\d\\d\\d).+[\\w]', expand=False),\n",
    "    errors='coerce')\n",
    "plantlist['uba_commissioned_3'] = pd.to_numeric(\n",
    "    plantlist['uba_commissioned'].str.extract('[\\w].+(\\d\\d\\d\\d)', expand=False),\n",
    "    errors='coerce')\n",
    "\n",
    "plantlist.loc[plantlist['retrofit'] == plantlist['uba_commissioned_1'], 'uba_commissioned_1'] = ''\n",
    "plantlist.loc[plantlist['retrofit'] == plantlist['uba_commissioned_2'], 'uba_commissioned_2'] = ''\n",
    "plantlist.loc[plantlist['retrofit'] == plantlist['uba_commissioned_3'], 'uba_commissioned_3'] = ''\n",
    "\n",
    "# Split multiple commissioning dates as listed in BNetzA\n",
    "plantlist['commissioned_1'] = pd.to_numeric(\n",
    "    plantlist['commissioned'].str.extract('(\\d\\d\\d\\d)', expand=False), \n",
    "    errors='coerce')\n",
    "plantlist.loc[plantlist['commissioned_1'].isnull(), 'commissioned_1'] = pd.to_numeric(\n",
    "    plantlist['commissioned'].str.extract('(\\d\\d\\d\\d).+[\\w]', expand=False),\n",
    "    errors='coerce')\n",
    "plantlist['commissioned_2'] = pd.to_numeric(\n",
    "    plantlist['commissioned'].str.extract('[\\w].+(\\d\\d\\d\\d).+[\\w]', expand=False),\n",
    "    errors='coerce')\n",
    "plantlist['commissioned_3'] = pd.to_numeric(\n",
    "    plantlist['commissioned'].str.extract('[\\w].+(\\d\\d\\d\\d)', expand=False),\n",
    "    errors='coerce')\n",
    "\n",
    "# Show plantlist\n",
    "plantlist[plantlist['status'] == 'shutdown']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.5 Convert input colums to usable data types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist['capacity_float'] = pd.to_numeric(\n",
    "    plantlist['capacity'], \n",
    "    errors='coerce')\n",
    "plantlist['commissioned_float'] = pd.to_numeric(\n",
    "    plantlist[['commissioned','commissioned_1','commissioned_2','commissioned_3']].max(axis=1),\n",
    "    errors='coerce')\n",
    "plantlist['retrofit_float'] = pd.to_numeric(\n",
    "    plantlist['retrofit'],\n",
    "    errors='coerce')\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.6 Identify generation technology"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.6.1 Process technology information from UBA list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Technologies describes the turbine specification etc., and \"type\" determines how the plant is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Split uba_technology information into technology (GT, CC,...) and type (HKW, IKW, ...)\n",
    "plantlist['technology'] = plantlist['uba_technology']\n",
    "plantlist['type'] = plantlist['uba_technology']\n",
    "\n",
    "dict_technology = {\n",
    "    'GT': 'Gas turbine',\n",
    "    'DT': 'Steam turbine',\n",
    "    'GT / DT': 'Combined cycle',\n",
    "    'GuD': 'Combined cycle',\n",
    "    'DKW': 'Steam turbine',\n",
    "    'LWK': 'Run-of-river',\n",
    "    'PSW': 'Pumped storage',\n",
    "    'DWR': 'Steam turbine',  #Pressurized water reactor\n",
    "    'G/AK': 'Gas turbine',  #GT with heat recovery\n",
    "    'SWR': 'Steam turbine',  #boiling water reactor\n",
    "    'SWK': 'Reservoir',  #storage power plant\n",
    "    'SSA': '',  #bus bar\n",
    "    'HKW (DT)': 'Steam turbine',\n",
    "    'HKW / GuD': 'Combined cycle',\n",
    "    'GuD / HKW': 'Combined cycle',\n",
    "    'IKW / GuD': 'Combined cycle',\n",
    "    'IKW /GuD': 'Combined cycle',\n",
    "    'HKW / SSA': '',\n",
    "    'IKW / SSA': '',\n",
    "    'HKW': '',\n",
    "    'IKW': '',\n",
    "    'IKW / HKW': '',\n",
    "    'WEA': 'WT',\n",
    "    'IKW / HKW / GuD' : 'Combined cycle',\n",
    "}\n",
    "plantlist['technology'].replace(dict_technology, inplace=True)\n",
    "plantlist['technology'].unique()\n",
    "\n",
    "# Check if all technologies have been translated\n",
    "for technology in plantlist['technology'].unique():\n",
    "    if (technology not in dict_technology.values()) & (str(technology) != 'nan'):\n",
    "        logger.error('Untranslated technology: ' + str(technology))\n",
    "\n",
    "# Translate types\n",
    "dict_type = {\n",
    "    'HKW': 'CHP',  #thermal power plant,\n",
    "    'HKW (DT)': 'CHP',\n",
    "    'IKW': 'IPP',  #industrial power plant         \n",
    "    'HKW / GuD': 'CHP',\n",
    "    'GuD / HKW': 'CHP',\n",
    "    'IKW / GuD': 'IPP',\n",
    "    'IKW /GuD': 'IPP',\n",
    "    'IKW / SSA': 'IPP',\n",
    "    'HKW / SSA': 'CHP',\n",
    "    'IKW / HKW': 'CHP',\n",
    "    'GT': '',\n",
    "    'DT': '',\n",
    "    'GT / DT': '',\n",
    "    'GuD': '',\n",
    "    'DKW': '',\n",
    "    'LWK': '',\n",
    "    'PSW': '',\n",
    "    'DWR': '',  #Pressurized water reactor\n",
    "    'G/AK': 'CHP',  #GT with heat recovery\n",
    "    'SWR': '',  #boiling water reactor\n",
    "    'SWK': '',  #storage power plant\n",
    "    'SSA': '', \n",
    "    'WEA': '',\n",
    "    'IKW / HKW / GuD' : 'IPP',\n",
    "}\n",
    "plantlist['type'].replace(dict_type, inplace=True)\n",
    "plantlist['type'].unique()\n",
    "\n",
    "# Check if all types have been translated\n",
    "for type in plantlist['type'].unique():\n",
    "    if (type not in dict_type.values()) & (str(type) != 'nan'):\n",
    "        logger.error('Untranslated type: ' + str(type))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.6.2 Identify generation technology based on BNetzA information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Set technology based on fuels\n",
    "\n",
    "plantlist.loc[(plantlist['fuel'] == 'Nuclear') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Lignite') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hard Coal') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hard coal') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hydro') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Run-of-river'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hydro PSP') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'Pumped storage'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hydro PSP'), 'fuel'] = 'Hydro'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hydro reservoir') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'RES'\n",
    "plantlist.loc[(plantlist['fuel'] == 'Hydro reservoir'), 'fuel'] = 'Hydro'\n",
    "plantlist.loc[(plantlist['fuel'] == 'reservoir') & ((plantlist['technology'] == '') | (\n",
    "    plantlist['technology'].isnull())), 'technology'] = 'RES'\n",
    "\n",
    "# Set technology based on name and block information combined with fuels (e.g. combined-cycle, gas turbine)\n",
    "# Define technology CC as combination of GT and DT\n",
    "plantlist.loc[((plantlist['name'].str.contains(\"GT\")) | (plantlist['block'].str.contains(\"GT\")))\n",
    "              & ((plantlist['name'].str.contains(\"DT\")) | (plantlist['block'].str.contains(\"DT\")))\n",
    "              & ((plantlist['technology'] == '') | (plantlist['technology'].isnull())), 'technology'] = 'Combined cycle'\n",
    "# Define technology CC if specified as GuD\n",
    "plantlist.loc[((plantlist['name'].str.contains(\"GuD\")) | (plantlist['block'].str.contains(\"GuD\"))\n",
    "               | (plantlist['name'].str.contains(\"GUD\")) | (plantlist['name'].str.contains(\"GUD\")))\n",
    "              & ((plantlist['technology'] == '') | (plantlist['technology'].isnull())), 'technology'] = 'Combined cycle'\n",
    "# Define technology GT\n",
    "plantlist.loc[((plantlist['name'].str.contains(\"GT\"))\n",
    "               | (plantlist['block'].str.contains(\"GT\"))\n",
    "               | (plantlist['name'].str.contains(\"Gasturbine\"))\n",
    "               | (plantlist['block'].str.contains(\"Gasturbine\")))\n",
    "              & ((plantlist['technology'] == '') | (plantlist['technology'].isnull())), 'technology'] = 'Gas turbine'\n",
    "# Define technology ST\n",
    "plantlist.loc[((plantlist['name'].str.contains(\"DT\"))\n",
    "               | (plantlist['block'].str.contains(\"DT\"))\n",
    "               | (plantlist['name'].str.contains(\"Dampfturbine\"))\n",
    "               | (plantlist['block'].str.contains(\"Dampfturbine\"))\n",
    "               | (plantlist['name'].str.contains(\"Dampfkraftwerk\"))\n",
    "               | (plantlist['block'].str.contains(\"Dampfkraftwerk\"))\n",
    "               | (plantlist['name'].str.contains(\"DKW\"))\n",
    "               | (plantlist['block'].str.contains(\"DKW\")))\n",
    "              & ((plantlist['technology'] == '') | (plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'\n",
    "# Define technology CB\n",
    "plantlist.loc[((plantlist['name'].str.contains(\"motor\"))\n",
    "               | (plantlist['block'].str.contains(\"motor\"))\n",
    "               | (plantlist['name'].str.contains(\"Motor\"))\n",
    "               | (plantlist['block'].str.contains(\"Motor\")))\n",
    "              & ((plantlist['technology'] == '') | (plantlist['technology'].isnull())), 'technology'] = 'Combustion Engine'\n",
    "\n",
    "# Identify stroage technologies\n",
    "\n",
    "plantlist.loc[(plantlist['fuel'] == 'Other fuels') & ((plantlist[\n",
    "    'fuel_basis'] == 'Sonstige Speichertechnologien') & (plantlist['technology'].isnull())), 'technology'] = 'Storage technologies'\n",
    "\n",
    "\n",
    "# Set technology ST for all technologies which could not be identified\n",
    "plantlist.loc[((plantlist['technology'] == '')\n",
    "               | (plantlist['technology'].isnull())), 'technology'] = 'Steam turbine'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.7 Add country code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some power plants are in Austria, Switzerland, or Luxembourg. As they are sometimes part of the German electricity system, they are included here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Add country Code\n",
    "plantlist['country_code'] = plantlist['state']\n",
    "dict_state_country = {\n",
    "    'Brandenburg': 'DE',\n",
    "    'Baden-Württemberg': 'DE',\n",
    "    'Niedersachsen': 'DE',\n",
    "    'Bayern': 'DE',\n",
    "    'Mecklenburg-Vorpommern': 'DE',\n",
    "    'Sachsen-Anhalt': 'DE',\n",
    "    'Hessen': 'DE',\n",
    "    'Nordrhein-Westfalen': 'DE',\n",
    "    'Berlin': 'DE',\n",
    "    'Saarland': 'DE',\n",
    "    'Thüringen': 'DE',\n",
    "    'Sachsen': 'DE',\n",
    "    'Bremen': 'DE',\n",
    "    'Schleswig-Holstein': 'DE',\n",
    "    'Hamburg': 'DE',\n",
    "    'Rheinland-Pfalz': 'DE',\n",
    "    'Österreich': 'AT',\n",
    "    'Luxemburg': 'LU',\n",
    "    'Schweiz': 'CH',\n",
    "}\n",
    "plantlist['country_code'].replace(dict_state_country, inplace=True)\n",
    "\n",
    "# Check if all types have been translated\n",
    "for plant_type in plantlist['country_code'].unique():\n",
    "    if (plant_type not in dict_state_country.values()) & (str(plant_type) != 'nan'):\n",
    "        logger.error('Untranslated type: ' + str(plant_type))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.8 Add efficiency data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.8.1 Efficiencies from research"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This sections adds efficiency data. These values have been researched by hand.  \n",
    "\n",
    "The source of each value is given in the column \"efficiency_source\". \n",
    "\n",
    "Additionally, a rating of the source has been done starting from A (e.g. website of the power plants operator) to C (e.g. Article in local newspaper)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.8.1.1 Import data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Efficiencies\n",
    "data_efficiencies_bnetza = pd.read_csv(os.path.join('input', 'input_efficiency_de.csv'),\n",
    "                                     sep=',',  # CSV field separator, default is ','\n",
    "                                     decimal='.',  # Decimal separator, default is '.')\n",
    "                                     index_col='id',\n",
    "                                     encoding='utf8')\n",
    "data_efficiencies_bnetza['efficiency_net'] = pd.to_numeric(\n",
    "    data_efficiencies_bnetza['efficiency_net'],\n",
    "    errors='coerce') \n",
    "\n",
    "data_efficiencies_bnetza = data_efficiencies_bnetza.dropna(subset=['efficiency_net'])\n",
    "\n",
    "plantlist = pd.merge(\n",
    "    plantlist, \n",
    "    data_efficiencies_bnetza, \n",
    "    left_index=True, \n",
    "    right_index=True, \n",
    "    how='left')\n",
    "\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.8.1.2 Plot efficiencies by year of commissioning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist.iloc[:,6:-1].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plantlist_for_efficiency_analysis = plantlist\n",
    "plantlist_for_efficiency_analysis = plantlist_for_efficiency_analysis.dropna(subset=['efficiency_net'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot efficiencies for lignite, coal, oil, and natural gas\n",
    "\n",
    "fuel_for_plot = ['Lignite', 'Hard coal', 'Oil', 'Natural gas']\n",
    "col_dict = {'Lignite': 'brown', 'Hard coal': 'grey', 'Oil': 'k', 'Natural gas': 'orange'}\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(16,8))\n",
    "\n",
    "for fuels in fuel_for_plot:\n",
    "     \n",
    "    sub_df = plantlist_for_efficiency_analysis[plantlist_for_efficiency_analysis.fuel == fuels]\n",
    "    \n",
    "    if len(sub_df['efficiency_net']) > 10:\n",
    "        x = np.array(sub_df['commissioned_float'].astype(int))\n",
    "        fit = np.polyfit(x, sub_df['efficiency_net'], deg=1)\n",
    "        ax.plot(x, fit[0]*x + fit[1],  color=col_dict[fuels])\n",
    "    \n",
    "    sub_df.plot(ax=ax,\n",
    "                kind='scatter',\n",
    "                x='commissioned_float',\n",
    "                y='efficiency_net',\n",
    "                c=col_dict[fuels],\n",
    "                label=fuels)                                                      "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.8.2 Efficiencies from literature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Jonas Egerer, Clemens Gerbaulet, Richard Ihlenburg, Friedrich Kunz, Benjamin Reinhard, Christian von Hirschhausen, Alexander Weber, Jens Weibezahn (2014): **Electricity Sector Data for Policy-Relevant Modeling: Data Documentation and Applications to the German and European Electricity Markets**. DIW Data Documentation 72, Berlin, Germany."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.8.2.1 Import data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each energy source - technology combination two values are read, to be applied as a linear approximation based on the year of commissioning. Therefore, the efficiency is made up of the efficiency_intercept (the efficiency at \"year zero\") plus the efficiency_slope multiplied by the year of commissioning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_efficiencies_literature = pd.read_csv(os.path.join('input','input_efficiency_literature_by_fuel_technology.csv'),\n",
    "                                     sep=',',  # CSV field separator, default is ','\n",
    "                                     decimal='.',  # Decimal separator, default is '.')  \n",
    "                                     encoding='utf8')\n",
    "#data_efficiencies_literature['technology'] = data_efficiencies_literature['technology'].str.upper()\n",
    "data_efficiencies_literature = data_efficiencies_literature.set_index(['fuel','technology'])\n",
    "data_efficiencies_literature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.8.2.2 Apply efficiency approximation from literature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist = plantlist.join(data_efficiencies_literature,on=['fuel','technology'])\n",
    "plantlist['efficiency_literature'] = plantlist['efficiency_intercept'] + plantlist['efficiency_slope']*plantlist[['commissioned_float','retrofit_float']].max(axis=1)\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.9 Add geodata and EIC Codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The locations of power plants have been researched manually, these are now added to the output. Checking was done visually using satellite imagery and other mapping material."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_plant_locations = pd.read_csv(os.path.join('input','input_plant_locations_de.csv'),\n",
    "                                     sep=',',  # CSV field separator, default is ','\n",
    "                                     decimal='.',  # Decimal separator, default is '.')  \n",
    "                                     encoding='utf8')\n",
    "\n",
    "data_plant_locations = data_plant_locations.set_index('id')\n",
    "\n",
    "data_plant_locations['lat'] = pd.to_numeric(data_plant_locations['lat'], \n",
    "                                            errors='coerce')\n",
    "data_plant_locations['lon'] = pd.to_numeric(data_plant_locations['lon'], \n",
    "                                            errors='coerce')\n",
    "\n",
    "plantlist = pd.merge(plantlist, \n",
    "                     data_plant_locations, \n",
    "                     left_index=True, \n",
    "                     right_index=True, \n",
    "                     how='left')\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist[plantlist.lat.isnull()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.10 Allocate energy source levels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To enable a more readable output, the columns 'fuel' is augmented with additional information called 'energy source level'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read energy source level allocation table\n",
    "energy_source_level_allocator = pd.read_csv(os.path.join('input', 'energy_source_level_allocator.csv'),\n",
    "                                     sep=',',  # CSV field separator, default is ','\n",
    "                                     decimal='.',  # Decimal separator, default is '.')\n",
    "                                     index_col='fuel',\n",
    "                                     encoding='utf8')\n",
    "plantlist = pd.merge(energy_source_level_allocator, plantlist,\n",
    "                    left_index = True,\n",
    "                    right_on='fuel',\n",
    "                    how='outer')\n",
    "\n",
    "plantlist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. Define final output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge uba_name_aggregate and uba_name\n",
    "plantlist.loc[plantlist['uba_name_aggregate'].isnull(), 'uba_name_aggregate'] = plantlist['uba_name']\n",
    "\n",
    "# Drop columns not relevant for output\n",
    "colsToDrop = ['bnetza_id',\n",
    "              'capacity',\n",
    "              'uba_name',\n",
    "              'uba_capacity_original',\n",
    "              'uba_chp_capacity_original',\n",
    "              'uba_city', \n",
    "              'uba_commissioned', \n",
    "              'uba_company', \n",
    "              'uba_company_aggregate', \n",
    "              'uba_fuel', \n",
    "              'uba_postcode', \n",
    "              'uba_state', \n",
    "              'uba_technology', \n",
    "              'uba_technology_aggregate', \n",
    "              'retrofit',\n",
    "              'uba_commissioned_1', \n",
    "              'uba_commissioned_2', \n",
    "              'uba_commissioned_3', \n",
    "              'commissioned_1', \n",
    "              'commissioned_2', \n",
    "              'commissioned_3', \n",
    "              'fuel_basis', \n",
    "              'fuel_multiple1', \n",
    "              'fuel_multiple2',\n",
    "              'efficiency_gross',\n",
    "              'efficiency_intercept',\n",
    "              'efficiency_slope',\n",
    "              'source_type',\n",
    "              'date'\n",
    "             ]\n",
    "plantlist = plantlist.drop(colsToDrop, axis=1)\n",
    "\n",
    "# Rename columns\n",
    "plantlist = plantlist.rename(columns={'commissioned': 'commissioned_original', \n",
    "                                      'commissioned_float': 'commissioned', \n",
    "                                      'retrofit_float': 'retrofit', \n",
    "                                      'capacity_float': 'capacity_net_bnetza',\n",
    "                                      'uba_capacity': 'capacity_gross_uba', \n",
    "                                      'uba_chp_capacity': 'chp_capacity_uba', \n",
    "                                      'efficiency_net': 'efficiency_data', \n",
    "                                      'efficiency_literature': 'efficiency_estimate', \n",
    "                                      'uba_name_aggregate': 'name_uba',\n",
    "                                      'name': 'name_bnetza',\n",
    "                                      'block': 'block_bnetza',\n",
    "                                     })\n",
    "\n",
    "# Sort columns\n",
    "columns_sorted = [\n",
    "                 'name_bnetza',\n",
    "                 'block_bnetza',\n",
    "                 'name_uba',\n",
    "                 'company',\n",
    "                 'street',\n",
    "                 'postcode',\n",
    "                 'city',\n",
    "                 'state',\n",
    "                 'country_code',\n",
    "                 'capacity_net_bnetza',\n",
    "                 'capacity_gross_uba',\n",
    "                 'fuel',\n",
    "                 'technology',\n",
    "                 'chp',\n",
    "                 'chp_capacity_uba',\n",
    "                 'commissioned',\n",
    "                 'commissioned_original',\n",
    "                 'retrofit',\n",
    "                 'shutdown',\n",
    "                 'status',\n",
    "                 'type',\n",
    "                 'lat',\n",
    "                 'lon',\n",
    "                 'eic_code_plant',\n",
    "                 'eic_code_block',    \n",
    "                 'efficiency_data',\n",
    "                 'efficiency_source',\n",
    "                 'efficiency_estimate',\n",
    "                 'energy_source_level_1',\n",
    "                 'energy_source_level_2',\n",
    "                 'energy_source_level_3',\n",
    "                 'eeg',\n",
    "                 'network_node',\n",
    "                 'voltage',\n",
    "                 'network_operator',\n",
    "                 'merge_comment',\n",
    "                 'comment']\n",
    "plantlist = plantlist.reindex(columns=columns_sorted)\n",
    "\n",
    "plantlist.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.1 Round values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Round capacity values as well as the efficiency estimate to five decimals-\n",
    "plantlist.capacity_net_bnetza = plantlist.capacity_net_bnetza.round(decimals=5)\n",
    "plantlist.capacity_gross_uba = plantlist.capacity_gross_uba.round(decimals=5)  \n",
    "plantlist.efficiency_estimate = plantlist.efficiency_estimate.round(decimals=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 8.2 Verification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.2.1 Capacities by plant status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pivot_status_capacity = pd.pivot_table(\n",
    "                        plantlist, \n",
    "                        values='capacity_net_bnetza',\n",
    "                        columns='status',\n",
    "                        index='fuel', \n",
    "                        aggfunc=np.sum\n",
    "                        )\n",
    "pivot_status_capacity.sort_values(by='operating', inplace=True, ascending=0)\n",
    "pivot_status_capacity_plot=pivot_status_capacity.plot(kind='barh', \n",
    "                                                      stacked=True,\n",
    "                                                      legend=True, \n",
    "                                                      figsize=(12, 6))\n",
    "pivot_status_capacity_plot.set_xlabel(\"MW\")\n",
    "pivot_status_capacity_plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.2.2 Power plant age"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist_filtered = plantlist  \n",
    "pivot_age_capacity = pd.pivot_table(\n",
    "                        plantlist_filtered, \n",
    "                        values='capacity_net_bnetza',\n",
    "                        columns='fuel',\n",
    "                        index='commissioned', \n",
    "                        aggfunc=np.sum,\n",
    "                        dropna=True\n",
    "                        )\n",
    "\n",
    "pivot_age_capacity_plot=pivot_age_capacity.plot(kind='bar', \n",
    "                                                stacked=True,\n",
    "                                                legend=True, \n",
    "                                                figsize=(17, 10))\n",
    "pivot_age_capacity_plot.set_ylabel(\"MW\")\n",
    "\n",
    "xaxis_labels = pivot_age_capacity.index.astype(int)\n",
    "pivot_age_capacity_plot.set_xticklabels(xaxis_labels)\n",
    "\n",
    "\n",
    "pivot_age_capacity_plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.2.3 Block size vs year of commissioning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This chart is suitable to check outliers of commissioning years and block sizes. \n",
    "In theory, there should be no unexpected values, e.g. all commissioning years should be greater than 1900. \n",
    "Block sizes above 2000 MW are also unlikely."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist_for_plot = plantlist.copy(deep=True)\n",
    "plantlist_for_plot['capacity_float'] = pd.to_numeric(plantlist_for_plot['capacity_net_bnetza'], \n",
    "                                                     errors='coerce')\n",
    "plantlist_for_plot['commissioned_float'] = pd.to_numeric(plantlist_for_plot['commissioned'], \n",
    "                                                         errors='coerce')\n",
    "#plot_blocksize_year = Scatter(plantlist_for_plot,  \n",
    "#                              x='commissioned_float', \n",
    "#                              y='capacity_float',\n",
    "#                              color='fuel', \n",
    "#                              title='Block-Size vs Year of Commissioning', \n",
    "#                              xlabel='Year', \n",
    "#                              ylabel='MW',\n",
    "#                              legend=\"top_left\",\n",
    "#                              height=500,\n",
    "#                              width=700)\n",
    "#show(plot_blocksize_year)\n",
    "plantlist_for_plot.plot(kind='scatter', x='commissioned_float', y='capacity_float')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.3 Logical checks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.3.1 Every power plant needs a capacity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "List all entries with zero capacity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plantlist[plantlist.capacity_net_bnetza == 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.3.2 Commissioning Dates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Show all Plants with commisioning dates below 1900 \n",
    "plantlist[plantlist['commissioned'] <= 1900]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show all Plants with invalid commisioning dates\n",
    "plantlist[plantlist['commissioned'].isnull()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.3.3 Compare UBA and BNetzA data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 8.3.3.1 Postcodes of BNetzA and UBA lists should match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# TODO: When implemented write a marker in the comment column\n",
    "\n",
    "# List all entries with diverging postcodes (if a postcode is given)\n",
    "# plantlist[(plantlist['uba_postcode'].notnull() == True) & (pd.to_numeric(plantlist.postcode, errors='coerce') != pd.to_numeric(plantlist.uba_postcode, errors='coerce'))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 8.3.3.2 Compare Installed capacities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: improve this comparison, it creates many false positives\n",
    "\n",
    "capacitycomparison = pd.DataFrame(plantlist.capacity_net_bnetza / plantlist.capacity_gross_uba)\n",
    "capacitycomparison['Name'] = plantlist.name_bnetza\n",
    "capacitycomparison['Block'] = plantlist.block_bnetza\n",
    "capacitycomparison['BnetzaCapacity'] = plantlist.capacity_net_bnetza\n",
    "capacitycomparison['UBACapacity'] = plantlist.capacity_gross_uba\n",
    "capacitycomparison.dropna(inplace=True)\n",
    "capacitycomparison.sort_values(by=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# 9. Documenting the data package (meta data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We document the data packages meta data in the specific format JSON as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.\n",
    "\n",
    "In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "with open(os.path.join('input', 'metadata.yml'), 'r') as f:\n",
    "    metadata = yaml.load(f.read())\n",
    "    \n",
    "datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10. Result export"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10.1 Write the results to file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "output_path = 'output'\n",
    "\n",
    "plantlist.to_csv(\n",
    "    os.path.join(output_path, 'conventional_power_plants_DE.csv'),\n",
    "    encoding='utf-8', index_label='id'\n",
    ")\n",
    "\n",
    "plantlist.to_excel(\n",
    "    os.path.join(output_path, 'conventional_power_plants_DE.xlsx'),\n",
    "    sheet_name='plants', index_label='id'\n",
    ")\n",
    "\n",
    "plantlist.to_sql(\n",
    "    'conventional_power_plants_DE',\n",
    "    sqlite3.connect(os.path.join(output_path ,'conventional_power_plants.sqlite')),\n",
    "    if_exists=\"replace\", index_label='id'\n",
    ")\n",
    "\n",
    "with open(os.path.join(output_path, 'datapackage.json'), 'w') as f:\n",
    "    f.write(datapackage_json)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10.2 Write checksums"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "files = [\n",
    "    'conventional_power_plants_DE.csv', 'conventional_power_plants_DE.xlsx',\n",
    "    'conventional_power_plants_EU.csv', 'conventional_power_plants_EU.xlsx',\n",
    "    'conventional_power_plants.sqlite'\n",
    "]\n",
    "\n",
    "with open('checksums.txt', 'w') as f:\n",
    "    for file_name in files:\n",
    "        file_hash = get_sha_hash(os.path.join(output_path, file_name))\n",
    "        f.write('{},{}\\n'.format(file_name, file_hash))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "toc": {
   "nav_menu": {
    "height": "512px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": false,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": true,
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}