{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table style=\"width:100%; background-color: #D9EDF7\">\n",
    "  <tr>\n",
    "    <td style=\"border: 1px solid #CFCFCF\">\n",
    "      <b>Renewable power plants: Validation and output Notebook</b>\n",
    "      <ul>\n",
    "        <li><a href=\"main.ipynb\">Main Notebook</a></li>\n",
    "        <li><a href=\"download_and_process.ipynb\">Download and process Notebook</a></li>\n",
    "        <li>Validation and output Notebook</li>\n",
    "      </ul>\n",
    "      <br>This Notebook is part of the <a href=\"http://data.open-power-system-data.org/renewable_power_plants\"> Renewable power plants Data Package</a> of <a href=\"http://open-power-system-data.org\">Open Power System Data</a>.\n",
    "    </td>\n",
    "  </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part 1 of the script (<a href=\"download_and_process.ipynb\">Download and process Notebook</a>) has downloaded and merged the original data. This Notebook subsequently checks, validates the list of renewable power plants and creates CSV/XLSX/SQLite files. It also generates a daily time series of cumulated installed capacities by energy source.\n",
    "\n",
    "*(Before running this script make sure you ran Part 1, so that the renewables.pickle files for each country exist in the same folder as the scripts)*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of contents \n",
    "\n",
    "* [Part 1: Download and process](download_and_process.ipynb)\n",
    "* [Script setup](#Script-setup)\n",
    "* [4. Load data](#4.-Load-data)\n",
    "* [5. Validation](#5.-Validation)\n",
    "    * [5.1 Germany DE](#5.1-Germany-DE)\n",
    "    * [5.2 France FR](#5.2-France-FR)\n",
    "* [6. Capacity time series](#6.-Capacity-time-series)\n",
    "* [7. Plots](#7.-Plots)\n",
    "    * [7.1 Cumulated capacity validation](#7.1-Cumulated-capacity-validation)\n",
    "        * [7.1.1 Download and read data for validaion](#7.1.1-Download-and-read-data-for-validation)\n",
    "        * [7.1.2 Prepare cumulated capacity for comparison](#7.1.2-Prepare-cumulated-capacity-for-comparison)\n",
    "        * [7.1.3 Plot cumulated capacity](#7.1.3-Plot-cumulated-capacity)\n",
    "    * [7.2 Capacity time series DE-validation](#7.2-Capacity-time-series-DE---validation)\n",
    "* [8. Harmonize format for output files](#8.-Harmonize-format-for-output-files)\n",
    "    * [8.1 Columns](#8.1-Columns)\n",
    "    * [8.2 Accuracy of numbers](#8.2-Accuracy-of-numbers)\n",
    "    * [8.3 Sort](#8.3-Sort)\n",
    "    * [8.4 Leave unspecified cells blank](#8.4-Leave-unspecified-cells-blank)\n",
    "* [9. Output](#9.-Output)\n",
    "    * [9.1 Settings](#9.1-Settings)\n",
    "    * [9.2 Write power plant lists](#9.2-Write-power-plant-list)\n",
    "    * [9.3 Write time series](#9.3-Write-time-series)\n",
    "    * [9.4 Write meta data](#9.4-Write-meta-data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Script setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# importing all necessary Python libraries for this Script\n",
    "\n",
    "import json\n",
    "import yaml\n",
    "import os\n",
    "import pandas as pd\n",
    "import sqlite3\n",
    "import logging\n",
    "import requests\n",
    "import posixpath\n",
    "import urllib\n",
    "from bokeh.charts import Scatter, Line, Bar, show, output_file\n",
    "from bokeh.io import output_notebook\n",
    "import sqlalchemy\n",
    "\n",
    "output_notebook()\n",
    "\n",
    "# Set up a log\n",
    "logger = logging.getLogger('notebook')\n",
    "logger.setLevel('INFO')\n",
    "nb_root_logger = logging.getLogger()\n",
    "formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s'\\\n",
    "                              '- %(message)s', datefmt='%d %b %Y %H:%M:%S')\n",
    "\n",
    "# Create input and output folders if they don't exist\n",
    "os.makedirs('input/original_data', exist_ok=True)\n",
    "\n",
    "os.makedirs('output', exist_ok=True)\n",
    "os.makedirs('output/renewable_power_plants', exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Load data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Read data from script Part 1 download_and_process\n",
    "DE_re = pd.read_pickle('DE_renewables.pickle')\n",
    "DK_re = pd.read_pickle('DK_renewables.pickle')\n",
    "FR_re = pd.read_pickle('FR_renewables.pickle')\n",
    "PL_re = pd.read_pickle('PL_renewables.pickle')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# 5. Validation \n",
    "\n",
    "-- *Work in progress - all information on suspect data is welcome!* --\n",
    "\n",
    "This section checks the DataFrame for a set of pre-defined criteria and adds markers to the entries in an additional column. The marked data will be included in the output files, but marked, so that they can be easiliy filtered out. For creating the validation plots and the time series, suspect data is skipped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Read csv of validation marker description an show them\n",
    "validation = pd.read_csv('input/validation_marker.csv',\n",
    "                         sep=',', header=0)\n",
    "validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 Germany DE\n",
    "**Main issue:** Due to the shortened zip-Code (5 to 3 digits) and eeg_id (33 to 15 characters) in the data from Netztransparenz (until end of 2015), we are not able to identify which of the power plants are also in the BNetzA-data. Thus the duplicate entries from August 2014 - end 2015 are not filtered, but all BNetzA-entries in this timespan marked."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Add marker to data according to criteria (see validation_marker above)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Create empty marker column named comment\n",
    "DE_re['comment'] = \"\"\n",
    "\n",
    "# Validation criteria (R_1) for data source BNetzA\n",
    "idx_date = DE_re[(DE_re['commissioning_date'] <= '2015-12-31') & (\n",
    "                  DE_re['data_source'] == 'BNetzA')].index\n",
    "\n",
    "# this can be replaced by renewables.loc[idx_date,'comment'] += \"R_1, \".\n",
    "# The same goes for the expressions below\n",
    "DE_re.loc[idx_date, 'comment'] = DE_re.loc[idx_date, 'comment'] + \"R_1;\"\n",
    "\n",
    "# Validation criteria (R_1) for source BNetzA_PV\n",
    "idx_date_pv = DE_re[(DE_re['commissioning_date'] <= '2015-12-31') & (\n",
    "              DE_re['data_source'] == 'BNetzA_PV')].index\n",
    "DE_re.loc[idx_date_pv, 'comment'] += \"R_1;\"\n",
    "\n",
    "# Validation criteria (R_2)\n",
    "idx_not_inst = DE_re[(DE_re['notification_reason'] != 'Inbetriebnahme') & (\n",
    "                      DE_re['data_source'] == 'BNetzA')].index\n",
    "DE_re.loc[idx_not_inst, 'comment'] += \"R_2;\"\n",
    "\n",
    "# Validation criteria (R_3)\n",
    "idx_date_null = DE_re[(DE_re['commissioning_date'].isnull())].index\n",
    "DE_re.loc[idx_date_null, 'comment'] += \"R_3;\"\n",
    "\n",
    "# Validation criteria (R_4)\n",
    "idx_capacity = DE_re[DE_re.electrical_capacity <= 0.0].index\n",
    "DE_re.loc[idx_capacity, 'comment'] += \"R_4;\"\n",
    "\n",
    "# Validation criteria (R_5)\n",
    "# Find all power plants which are decommissioned from the grid. \n",
    "# These are probably commissioned to another grid and thus doubled.\n",
    "# Just the entry which is not double should be kept, thus the other one is marked\n",
    "idx_grid_decomm = DE_re[DE_re['grid_decommissioning_date'].isnull() == False].index\n",
    "DE_re.loc[idx_grid_decomm, 'comment'] += \"R_5;\"\n",
    "\n",
    "# Validation criteria (R_6)\n",
    "idx_decomm = DE_re[DE_re['decommissioning_date'].isnull() == False].index\n",
    "DE_re.loc[idx_decomm, 'comment'] += \"R_6\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Create cleaned DataFrame**\n",
    "\n",
    "All marked entries are deleted for the cleaned version of the DataFrame that is utilized for creating time series of installation and for the validation plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Locate suspect entires\n",
    "idx_suspect = DE_re[DE_re.comment.str.len() > 1].index\n",
    "\n",
    "# create new DataFrame without suspect entries\n",
    "DE_re_clean = DE_re.drop(idx_suspect)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Overview suspect entries - cleaned data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Count marked entries\n",
    "DE_re.groupby(['comment','data_source'])['comment'].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize electrical capacity per energy source of suspect data\n",
    "DE_re.groupby(['comment', 'energy_source_level_2'])['electrical_capacity'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize electrical capacity per energy source level 2 of cleaned data (MW)\n",
    "DE_re_clean.groupby(['energy_source_level_2'])['electrical_capacity'].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.2 France FR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Create empty marker column\n",
    "FR_re['comment'] = \"\"\n",
    "\n",
    "# Validation criteria (R_7)\n",
    "idx_not_Europe = FR_re[(FR_re['lat'] < 41) | (\n",
    "                        FR_re['lon'] < -6) | (\n",
    "                        FR_re['lon'] > 10)].index\n",
    "\n",
    "FR_re.loc[idx_not_Europe, 'comment'] += \"R_7\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Show entries in the French list not located on the European continent\n",
    "FR_re.loc[idx_not_Europe]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Capacity time series\n",
    "\n",
    "This section creates a daily and yearly time series of the cumulated installed capacity by energy source. This data will be part of the output and will be compared in a plot for validation in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Additional column for chosing energy sources for time series\n",
    "DE_re_clean['temp_energy_source'] = DE_re_clean['energy_source_level_2']\n",
    "\n",
    "# Time series for on- and offshore wind should be separated, for hydro subtype\n",
    "# should be used because all is run-of-river\n",
    "idx_subtype = DE_re_clean[(DE_re_clean.energy_source_level_2 == 'Wind') |\n",
    "                          (DE_re_clean.energy_source_level_2 == 'Hydro')].index\n",
    "\n",
    "DE_re_clean.loc[idx_subtype, 'temp_energy_source'] = DE_re_clean.loc[\n",
    "                                                     idx_subtype, 'technology']\n",
    "\n",
    "# Set energy source for which time series should be generated\n",
    "energy_sources = ['Solar', 'Onshore', 'Offshore', 'Bioenergy',\n",
    "                  'Geothermal', 'Run-of-river']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Set date range for which the time series should be generated\n",
    "range_yearly = pd.date_range(start='1990-01-01', end='2016-01-01', freq='A')\n",
    "range_daily = pd.date_range(start='2005-01-01', end='2016-05-31', freq='D')\n",
    "\n",
    "# Set range of time series as index\n",
    "timeseries_yearly = pd.DataFrame(index=range_yearly)\n",
    "timeseries_daily = pd.DataFrame(index=range_daily)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Create cumulated time series per energy source for both yearly and daily time series\n",
    "for gtype in energy_sources:\n",
    "    temp = (DE_re_clean[['commissioning_date', 'electrical_capacity']]\n",
    "            .loc[DE_re_clean['temp_energy_source'].isin([gtype])])\n",
    "    \n",
    "    temp_timeseries = temp.set_index('commissioning_date')\n",
    "    \n",
    "    # Create cumulated time series per energy_source and year\n",
    "    timeseries_yearly['{0}'.format(gtype)] = temp_timeseries.resample(\n",
    "        'A').sum().cumsum().fillna(method='ffill')\n",
    "\n",
    "    # Create cumulated time series per energy_source and day\n",
    "    timeseries_daily['{0}'.format(gtype)] = temp_timeseries.resample(\n",
    "        'D').sum().cumsum().fillna(method='ffill')\n",
    "    \n",
    "# Filling the empty cells in the gethermal column since this did not work in the loop\n",
    "timeseries_daily.Geothermal = timeseries_daily.Geothermal.fillna(method='ffill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Shorten timestamp to year for the yearly time series    \n",
    "timeseries_yearly.index = pd.to_datetime(timeseries_yearly.index, format=\"%Y\").year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Show yearly timeseries of installed capacity in MW per energy source level 2\n",
    "timeseries_yearly"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1 Cumulated capacity validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare cumulated capacity per country and energy source level 2 of the power plant list to data from the International Renewable Energy Agency (IRENA), ENTSO-E and Eurostat, already compiled in OPSD Data Package <a href = \"http://data.open-power-system-data.org/national_generation_capacity\">national generation capacity</a>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1.1 Download and read data for validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### IRENA\n",
    "If automatic Download of <a href = \"http://public.tableau.com/profile/irena.resource#!/vizhome/ExportIRENAData/Europe.csv\">IRENA</a> data does not work, download the file Europe.csv manually and put the file into the folder _input_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def download_and_cache_Irena_Europe(url, session=None):\n",
    "    \"\"\"This function downloads a file into a folder called \n",
    "    original_data and returns the local filepath.\"\"\"\n",
    "    path = urllib.parse.urlsplit(url).path\n",
    "    filename = 'Europe.csv'\n",
    "    base_filepath = \"input/\"\n",
    "    filepath = base_filepath + filename\n",
    "\n",
    "    # check if file exists, if not download it\n",
    "    filepath = base_filepath + filename\n",
    "    if not os.path.exists(filepath):\n",
    "        if not session:\n",
    "            session = requests.session()\n",
    "        \n",
    "        print(\"Downloading file: \", filename)\n",
    "        r = session.get(url, stream=True)\n",
    "\n",
    "        chuncksize = 1024\n",
    "        with open(filepath, 'wb') as file:\n",
    "            for chunck in r.iter_content(chuncksize):\n",
    "                file.write(chunck)\n",
    "    else:\n",
    "        print(\"Using local file from\", filepath)\n",
    "    filepath = '' + filepath\n",
    "    return filepath"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "url = 'http://public.tableau.com/views/ExportIRENAData/Europe.csv?:embed=y&:showVizHome=no&:display_count=y&:display_static_image=y&:bootstrapWhenNotified=true'\n",
    "download_and_cache_Irena_Europe(url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "irena_df = pd.read_csv('input/Europe.csv',\n",
    "                       sep = \",\",\n",
    "                       decimal = \".\",\n",
    "                       thousands = \",\",\n",
    "                       encoding = 'UTF8',\n",
    "                       header = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Select just the required values for capacity\n",
    "idx_irena = irena_df[(irena_df['Flow'] == 'Total capacity (MWe)')].index\n",
    "irena_df = irena_df.loc[idx_irena,('Country / Area','Sub Technology','Year','Value')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Read column and value translation list\n",
    "columnnames = pd.read_csv('input/column_translation_list.csv')\n",
    "valuenames = pd.read_csv('input/value_translation_list.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Create dictionary for column translation\n",
    "idx_Europe = columnnames[columnnames['country'] == 'Europe'].index\n",
    "column_dict = columnnames.loc[idx_Europe].set_index('original_name')[\n",
    "                                                           'opsd_name'].to_dict()\n",
    "\n",
    "# Create dictionnary in order to adapt energy_source_subtype names\n",
    "idx_Europe = valuenames[valuenames['data_source'] == 'IRENA'].index\n",
    "value_dict = valuenames.loc[idx_Europe].set_index('original_name')[\n",
    "                                                  'opsd_name'].to_dict()\n",
    "\n",
    "# Create dictionnary in order to assign energy_source to its subtype\n",
    "energy_source_dict = valuenames.loc[idx_Europe].set_index('opsd_name')[\n",
    "                                                          'energy_source_level_2'].to_dict()\n",
    "energy_source_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns by list \n",
    "irena_df.rename(columns = column_dict, inplace = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Replace energy_source names\n",
    "irena_df.replace(value_dict, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Create new column for energy_source\n",
    "irena_df['energy_source_level_2'] = irena_df['technology']\n",
    "\n",
    "# Fill this with the energy source instead of subtype information\n",
    "irena_df.energy_source_level_2.replace(energy_source_dict, inplace=True)\n",
    "\n",
    "# Group and summarize DataFrame by generation type and installed capacity\n",
    "irena_sum_df = irena_df.groupby(['country','energy_source_level_2','year'])\n",
    "\n",
    "# Group and summarize DataFrame by generation type and installed capacity\n",
    "irena_sum_df = irena_df.groupby(['country','energy_source_level_2','year'],\n",
    "                                as_index = False)['electrical_capacity'].sum()\n",
    "\n",
    "# Add column to keep track of the source\n",
    "irena_sum_df['data_source'] = 'IRENA'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "#### Aggregated capacity from OPSD Data Package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "opsd_df = pd.read_csv('input/aggregated_capacity.csv',\n",
    "                      sep = \",\",\n",
    "                      decimal = \".\",\n",
    "                      encoding = 'UTF8',\n",
    "                      header = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Translate columns by list \n",
    "opsd_df.rename(columns = column_dict, inplace = True)\n",
    "\n",
    "# Choose the required translation terms for this data source\n",
    "idx_opsd = valuenames[valuenames['data_source'] == 'OPSD'].index\n",
    "\n",
    "# Create dictionnary in order to change install_type names \n",
    "value_dict = valuenames.loc[idx_opsd].set_index('original_name')['opsd_name'].to_dict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Replace energy_source names\n",
    "opsd_df.replace(value_dict, inplace = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Group and summarize DataFrame by generation type and installed capacity\n",
    "e_source = pd.Series(['Bioenergy','Wind','Solar','Geothermal','Hydro','Marine'])\n",
    "recent_years = (2014,2015)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "idx_opsd = opsd_df[(opsd_df['year'].isin(recent_years))\n",
    "                    & (opsd_df['energy_source_level_2'].isin(e_source))].index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# only required columns are chosen, and only rows of recent years\n",
    "# Unnamed:0 gives an error\n",
    "# opsd_df = opsd_df.loc[idx_opsd].drop(['Unnamed: 0','source_type',\n",
    "#                       'type','technology_level_0','technology_level_1',\n",
    "#                       'technology_level_2','technology_level_3'], axis = 1)\n",
    "\n",
    "opsd_df = opsd_df.loc[idx_opsd].drop(['source_type',\n",
    "                      'type','technology_level_0','technology_level_1',\n",
    "                      'technology_level_2','technology_level_3'], axis = 1)\n",
    "\n",
    "\n",
    "# index is not required any more\n",
    "opsd_df = opsd_df.reset_index(drop = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1.2 Prepare cumulated capacity for comparison\n",
    "Installed capacity is summed per country and energy source to compare it to the other sources for cumulated capacity. This gives an impression how complete the country power plant list datasets are."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Germany DE**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize per energy_source until end of 2015 without marked entries\n",
    "DE_re_sum = DE_re_clean[(DE_re_clean['commissioning_date'] <= '2016-12-31')].groupby([\n",
    "        'energy_source_level_2'], as_index = False)['electrical_capacity'].sum()\n",
    "\n",
    "# Add information until which year (including) the capacity is summarized\n",
    "DE_re_sum['year'] = 2015\n",
    "DE_re_sum['country'] = 'DE'\n",
    "\n",
    "DE_re_sum['data_source'] = 'German TSOs'\n",
    "\n",
    "DE_re_sum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Denmark DK**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize per energy_source until end of 2015\n",
    "DK_re_sum = DK_re[(DK_re['commissioning_date'] <= '2015-12-31')].groupby([\n",
    "                   'energy_source_level_2','data_source'], as_index = False)[\n",
    "                  'electrical_capacity'].sum()\n",
    "\n",
    "# Add information until which year (including) the capacity is summarized\n",
    "DK_re_sum['year'] = 2015\n",
    "DK_re_sum['country'] = 'DK'\n",
    "\n",
    "DK_re_sum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "**France FR**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize per energy_source. Data until 31.12.2014 is provided by the data source\n",
    "FR_re_sum = FR_re.groupby(['energy_source_level_2','data_source'], as_index = False)[\n",
    "                           'electrical_capacity'].sum()\n",
    "\n",
    "# Add information until which year (including) the capacity is summarized\n",
    "FR_re_sum['year'] = 2014\n",
    "FR_re_sum['country'] = 'FR'\n",
    "\n",
    "FR_re_sum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Poland PL**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Summarize per energy_source. Data until 31.12.2015 is provided by the data source\n",
    "PL_re_sum = PL_re.groupby(['energy_source_level_2','data_source'], as_index = False)[\n",
    "                           'electrical_capacity'].sum()\n",
    "\n",
    "# Add information until which year (including) the capacity is summarized\n",
    "PL_re_sum['year'] = 2015\n",
    "PL_re_sum['country'] = 'PL'\n",
    "\n",
    "PL_re_sum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Merge DataFrames\n",
    "capacities = pd.concat([irena_sum_df, opsd_df,\n",
    "                        DE_re_sum, DK_re_sum, FR_re_sum, PL_re_sum])\n",
    "\n",
    "# Get rid von all indices\n",
    "capacities = capacities.reset_index(drop = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1.3 Plot cumulated capacity\n",
    "One comparison plot for each country: Cumulated capacity in MW per energy source from different sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plot_country = 'DE'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# chose capacities of respective country and recent years\n",
    "idx_country = capacities[(capacities['country'] == plot_country) \n",
    "                        & (capacities['year'].isin(recent_years))].index\n",
    "\n",
    "# prepare DataFrame for plotting\n",
    "cap_country = capacities.loc[idx_country].drop(['country',\n",
    "                                                'capacity_definition'],\n",
    "                                               axis = 1)\n",
    "\n",
    "cap_country = cap_country.reset_index(drop = True)\n",
    "\n",
    "cap_country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "p = Bar(cap_country, label='energy_source_level_2', values='electrical_capacity',\n",
    "        group=['data_source','year'], legend='top_left', \n",
    "        title=plot_country)\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plot_country = 'DK'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# chose capacities of respective country and recent years\n",
    "idx_country = capacities[(capacities['country'] == plot_country) \n",
    "                        & (capacities['year'].isin(recent_years))].index\n",
    "\n",
    "# prepare DataFrame for plotting\n",
    "cap_country = capacities.loc[idx_country].drop(['country',\n",
    "                                                'capacity_definition'],\n",
    "                                               axis = 1)\n",
    "\n",
    "cap_country = cap_country.reset_index(drop = True)\n",
    "\n",
    "cap_country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "p = Bar(cap_country, label='energy_source_level_2', values='electrical_capacity',\n",
    "        group=['data_source','year'], legend='top_left', \n",
    "        title=plot_country)\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plot_country = 'FR'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# chose capacities of respective country and recent years\n",
    "idx_country = capacities[(capacities['country'] == plot_country) \n",
    "                        & (capacities['year'].isin(recent_years))].index\n",
    "\n",
    "# prepare DataFrame for plotting\n",
    "cap_country = capacities.loc[idx_country].drop(['country',\n",
    "                                                'capacity_definition'],\n",
    "                                               axis = 1)\n",
    "\n",
    "cap_country = cap_country.reset_index(drop = True)\n",
    "\n",
    "cap_country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "p = Bar(cap_country, label='energy_source_level_2', values='electrical_capacity',\n",
    "        group=['data_source','year'], legend='top_left', \n",
    "        title=plot_country)\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plot_country = 'PL'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# chose capacities of respective country and recent years\n",
    "idx_country = capacities[(capacities['country'] == plot_country) \n",
    "                        & (capacities['year'].isin(recent_years))].index\n",
    "\n",
    "# prepare data fraem for plotting\n",
    "cap_country = capacities.loc[idx_country].drop(['country',\n",
    "                                                'capacity_definition'],\n",
    "                                               axis = 1)\n",
    "\n",
    "cap_country = cap_country.reset_index(drop = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "p = Bar(cap_country, label='energy_source_level_2', values='electrical_capacity',\n",
    "        group=['data_source','year'], legend='top_left', \n",
    "        title=plot_country)\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 Capacity time series DE - validation\n",
    "The yearly capacity time series by energy source derived from the power plant list is validated plotting it in comparison to a time series published by the German Federal Ministry for Economic Affairs and Energy (BMWi, Bundesministeriums für Wirtschaft und Energie) [Time series for the development of renewable energies in Germany](http://www.erneuerbare-energien.de/EE/Navigation/DE/Service/Erneuerbare_Energien_in_Zahlen/Zeitreihen/zeitreihen.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Download BMWi time series** *(If an error message is shown, check if the url_bmwi_stat is still correct.)*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Defining URL\n",
    "url_bmwi_stat = 'http://www.erneuerbare-energien.de/EE/Redaktion/DE/'\\\n",
    "                'Downloads/zeitreihen-zur-entwicklung-der-erneuerbaren-'\\\n",
    "                'energien-in-deutschland-1990-2015-excel.xlsx;jsessionid='\\\n",
    "                'FFE958ADA709DCBFDD437C8A8FF7D90B?__blob=publicationFile&v=6'\n",
    "\n",
    "# Reading BMWi data\n",
    "bmwi_stat = pd.read_excel(url_bmwi_stat,\n",
    "                          sheetname='4', \n",
    "                          header=7,\n",
    "                          skip_footer=8,\n",
    "                          index_col=0)\n",
    "\n",
    "# Transpose DataFrame and set column names\n",
    "bmwi_df = bmwi_stat.T\n",
    "bmwi_df.columns = ['bmwi_hydro', 'bmwi_wind_onshore', 'bmwi_wind_offshore',\n",
    "                   'bmwi_solar', 'bmwi_biomass', 'bmwi_biomass_liquid',\n",
    "                   'bmwi_biomass_gas', 'bmwi_sewage_gas', 'bmwi_landfill_gas',\n",
    "                   'bmwi_geothermal', 'bmwi_total']\n",
    "\n",
    "# Set year as index\n",
    "bmwi_df.index = pd.to_datetime(bmwi_df.index, format=\"%Y\").year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "** Calculate the absolute deviation between BMWi and OPSD time series**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Merge BMWi and OPSD DataFrames\n",
    "valuation = pd.concat([bmwi_df, timeseries_yearly], axis=1)\n",
    "valuation = valuation.fillna(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Calculate absolute deviation for each year and energy source\n",
    "valuation['absolute_wind_onshore'] = (valuation['Onshore']\n",
    "                                      - valuation['bmwi_wind_onshore']).fillna(0)\n",
    "\n",
    "valuation['absolute_wind_offshore'] = (valuation['Offshore']\n",
    "                                       - valuation['bmwi_wind_offshore']).fillna(0)\n",
    "\n",
    "valuation['absolute_solar'] = (valuation['Solar']\n",
    "                               - valuation['bmwi_solar']).fillna(0)\n",
    "\n",
    "valuation['absolute_hydro'] = (valuation['Run-of-river']\n",
    "                               - valuation['bmwi_hydro']).fillna(0)\n",
    "\n",
    "valuation['absolute_geothermal'] = (valuation['Geothermal']\n",
    "                                    - valuation['bmwi_geothermal']).fillna(0)\n",
    "\n",
    "valuation['absolute_biomass'] = (valuation['Bioenergy']\n",
    "                                 - (valuation['bmwi_biomass']\n",
    "                                    + valuation['bmwi_biomass_liquid']\n",
    "                                    + valuation['bmwi_biomass_gas']\n",
    "                                    + valuation['bmwi_sewage_gas']\n",
    "                                    + valuation['bmwi_landfill_gas'])).fillna(0)\n",
    "\n",
    "valuation['absolute_total'] = ((valuation['Bioenergy']\n",
    "                                + valuation['Onshore']\n",
    "                                + valuation['Offshore']\n",
    "                                + valuation['Solar']\n",
    "                                + valuation['Geothermal']\n",
    "                                + valuation['Run-of-river']\n",
    "                                ) - (valuation['bmwi_total'])).fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Plotting the absolute deviation**\n",
    "\n",
    "The graph shows the cumulative absolute deviation of the electrical capacity in MW between the OPSD data set of renewable power plants and the BMWI statistic from 1990 until 2015.<br>\n",
    "*A positive deviation means the OPSD data set has more capacity than the BMWI statistics.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Plot settings for absolute deviation\n",
    "deviation_columns = ['absolute_wind_onshore','absolute_wind_offshore',\n",
    "                     'absolute_solar','absolute_hydro','absolute_biomass',\n",
    "                     'absolute_geothermal','absolute_total']\n",
    "\n",
    "dataplot = valuation[deviation_columns]\n",
    "\n",
    "deviation = Line(dataplot,\n",
    "                 y=deviation_columns,\n",
    "                 dash=deviation_columns,\n",
    "                 color=deviation_columns,\n",
    "                 title=\"Deviation between data set and BMWI statistic (negative => BMWI value higher)\",\n",
    "                 ylabel='Deviation in MW',\n",
    "                 xlabel='From 1990 till 2015',\n",
    "                 legend=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Show Plot for absolute deviation\n",
    "show(deviation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "** Calculate the relative deviation between both time series**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Relative deviation\n",
    "valuation['relative_wind_onshore'] = (valuation['absolute_wind_onshore'] / valuation['bmwi_wind_onshore']).fillna(0)\n",
    "\n",
    "valuation['relative_wind_offshore'] = (valuation['absolute_wind_offshore'] / valuation['bmwi_wind_offshore']).fillna(0)\n",
    "\n",
    "valuation['relative_solar'] = (valuation['absolute_solar'] / (valuation['bmwi_solar'])).fillna(0)\n",
    "\n",
    "valuation['relative_hydro'] = (valuation['absolute_hydro'] / (valuation['bmwi_hydro'])).fillna(0)\n",
    "\n",
    "valuation['relative_geothermal'] = (valuation['absolute_geothermal'] / (valuation['bmwi_geothermal'])).fillna(0)\n",
    "\n",
    "valuation['relative_biomass'] = (valuation['absolute_biomass'] / (valuation['bmwi_biomass'])).fillna(0)\n",
    "\n",
    "valuation['relative_total'] = (valuation['absolute_total'] / (valuation['bmwi_total'])).fillna(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot settings relative deviation\n",
    "relative_column = ['relative_wind_onshore','relative_wind_offshore',\n",
    "                   'relative_solar','relative_hydro','relative_biomass',\n",
    "                   'relative_total']\n",
    "\n",
    "dataplot2 = valuation[relative_column]\n",
    "\n",
    "relative = Line(dataplot2 * 100,\n",
    "                y=relative_column,\n",
    "                dash=relative_column,\n",
    "                color=relative_column,\n",
    "                title=\"Deviation between data set and BMWI statistic (negative => BMWI value higher)\",\n",
    "                ylabel='Relative difference in percent',\n",
    "                xlabel='From 1990 till 2015',\n",
    "                legend=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Plotting the relative deviation**\n",
    "\n",
    "The graph shows the relative difference of the electrical capacity between the OPSD data set of renewable power plants and the BMWI statistic from 1990 until 2015.<br>\n",
    "*A positive value means the OPSD data set has more capacity than the BMWI statistics.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Show Plot for relative deviation\n",
    "show(relative)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Saving deviation results in XLSX-file**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# write results as Excel file\n",
    "valuation.to_excel('validation_report.xlsx', sheet_name='Capacities_1990_2015', float_format= '%20.5f')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. Harmonize format for output files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.1 Columns\n",
    "A similar order of columns is pursued for all country lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DE_re = DE_re.loc[:,('commissioning_date','decommissioning_date',\n",
    "                     'energy_source_level_1','energy_source_level_2',\n",
    "                     'energy_source_level_3','technology',\n",
    "                     'electrical_capacity','thermal_capacity',\n",
    "                     'voltage_level','tso','dso','dso_id', 'eeg_id','bnetza_id',\n",
    "                     'federal_state','postcode','municipality_code','municipality',\n",
    "                     'address','address_number',\n",
    "                     'utm_zone','utm_east','utm_north','lat','lon',\n",
    "                     'data_source','comment')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DK_re = DK_re.loc[:,('commissioning_date',\n",
    "                     'energy_source_level_1','energy_source_level_2',\n",
    "                     'technology',\n",
    "                     'electrical_capacity',\n",
    "                     'dso','gsrn_id',\n",
    "                     'postcode','municipality_code','municipality',\n",
    "                     'address','address_number',\n",
    "                     'utm_east','utm_north','lat','lon',\n",
    "                     'hub_height','rotor_diameter','manufacturer','model',\n",
    "                     'data_source')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "FR_re = FR_re.loc[:,('municipality_code','municipality',\n",
    "                     'energy_source_level_1','energy_source_level_2',\n",
    "                     'energy_source_level_3','technology',\n",
    "                     'electrical_capacity','number_of_installations',\n",
    "                     'lat','lon',\n",
    "                     'data_source','comment')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "PL_re = PL_re.loc[:,('district',\n",
    "                     'energy_source_level_1','energy_source_level_2',\n",
    "                     'energy_source_level_3','technology',\n",
    "                     'electrical_capacity','number_of_installations',\n",
    "                     'lat','lon',\n",
    "                     'data_source')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.2 Accuracy of numbers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Five digits behind the decimal seperator for electrical capacity in MW.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DE_re['electrical_capacity'] = DE_re['electrical_capacity'\n",
    "                                       ].map(lambda x: '%.5f' % x)\n",
    "DE_re['thermal_capacity'] = DE_re['thermal_capacity'\n",
    "                                    ].map(lambda x: '%.5f' % x)\n",
    "DK_re['electrical_capacity'] = DK_re['electrical_capacity'\n",
    "                                       ].map(lambda x: '%.5f' % x)\n",
    "FR_re['electrical_capacity'] = FR_re['electrical_capacity'\n",
    "                                       ].map(lambda x: '%.5f' % x)\n",
    "PL_re['electrical_capacity'] = PL_re['electrical_capacity'\n",
    "                                       ].map(lambda x: '%.5f' % x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Five digits behind the decimal seperator for coordinates.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DE_re['lat'] = DE_re['lat'].map(lambda x: '%.5f' % x)\n",
    "DE_re['lon'] = DE_re['lon'].map(lambda x: '%.5f' % x)\n",
    "DE_re['utm_east'] = DE_re['utm_east'].map(lambda x: '%.5f' % x)\n",
    "DE_re['utm_north'] = DE_re['utm_north'].map(lambda x: '%.5f' % x)\n",
    "\n",
    "# Zone value has to be an integer, thus no digits after the comma\n",
    "DE_re['utm_zone'] = pd.to_numeric(DE_re['utm_zone'], errors='coerce')\n",
    "DE_re['utm_zone'] = DE_re['utm_zone'].map(lambda x: '%.0f' % x)\n",
    "\n",
    "DK_re['lat'] = DK_re['lat'].map(lambda x: '%.5f' % x)\n",
    "DK_re['lon'] = DK_re['lon'].map(lambda x: '%.5f' % x)\n",
    "DK_re['utm_east'] = DK_re['utm_east'].map(lambda x: '%.5f' % x)\n",
    "DK_re['utm_north'] = DK_re['utm_north'].map(lambda x: '%.5f' % x)\n",
    "\n",
    "FR_re['lon'] = FR_re['lon'].map(lambda x: '%.5f' % x)\n",
    "FR_re['lat'] = FR_re['lat'].map(lambda x: '%.5f' % x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Dates in the output should be without a timestamp.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DE_re['commissioning_date'] = DE_re['commissioning_date'].apply(lambda x: x.date())\n",
    "DE_re['decommissioning_date'] = DE_re['decommissioning_date'].apply(lambda x: x.date())\n",
    "DK_re['commissioning_date'] = DK_re['commissioning_date'].apply(lambda x: x.date())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Reset index of timeseries.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Time index is not required any more\n",
    "timeseries_yearly = timeseries_yearly.reset_index()\n",
    "timeseries_daily = timeseries_daily.reset_index()\n",
    "\n",
    "# Set index name\n",
    "timeseries_yearly.rename(columns={'index': 'year'}, inplace=True)\n",
    "timeseries_daily.rename(columns={'index': 'day'}, inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.3 Sort"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Sort German DataFrame by commissioning date\n",
    "DE_re = DE_re.ix[DE_re.commissioning_date.sort_values().index]\n",
    "\n",
    "# Sort Danish DataFrame by commissioning date\n",
    "DK_re = DK_re.ix[DK_re.commissioning_date.sort_values().index]\n",
    "\n",
    "# Sort French DataFrame by municipality_code\n",
    "FR_re = FR_re.ix[FR_re.municipality_code.sort_values().index]\n",
    "\n",
    "# Sort Polish DataFrame by district\n",
    "PL_re = PL_re.ix[PL_re.district.sort_values().index]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.4 Leave unspecified cells blank"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# fillna leaves NaN (Not-a-Number) cells blank and replace cells with 'nan'\n",
    "DE_re.fillna('', inplace=True)\n",
    "DE_re.replace('nan','', inplace=True)\n",
    "DK_re.fillna('', inplace=True)\n",
    "DK_re.replace('nan','', inplace=True)\n",
    "FR_re.fillna('', inplace=True)\n",
    "FR_re.replace('nan','', inplace=True)\n",
    "PL_re.fillna('', inplace=True)\n",
    "PL_re.replace('nan','', inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 9. Output\n",
    "This section finally writes the Data Package:\n",
    "* CSV + XLSX + SQLite of the power plant lists\n",
    "* CSV of the time series\n",
    "* Meta data (JSON)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9.1 Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "path_package = 'output/renewable_power_plants'\n",
    "os.makedirs(path_package, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9.2 Write power plant lists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "** Write CSV-files**\n",
    "\n",
    "This process will take some time depending on you hardware.\n",
    "\n",
    "One csv-file for each country. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Germany\n",
    "DE_re.to_csv(path_package + '/renewable_power_plants_DE.csv',\n",
    "             sep=',',\n",
    "             decimal='.',\n",
    "             date_format='%Y-%m-%d',\n",
    "             encoding='utf-8',\n",
    "             index=False,\n",
    "             if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Denmark\n",
    "DK_re.to_csv(path_package + '/renewable_power_plants_DK.csv',\n",
    "             sep=',',\n",
    "             decimal='.',\n",
    "             date_format='%Y-%m-%d',\n",
    "             encoding='utf-8',\n",
    "             index=False,\n",
    "             if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# France\n",
    "FR_re.to_csv(path_package + '/renewable_power_plants_FR.csv',\n",
    "             sep=',',\n",
    "             decimal='.',\n",
    "             date_format='%Y-%m-%d',\n",
    "             encoding='utf-8',\n",
    "             index=False,\n",
    "             if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Denmark\n",
    "PL_re.to_csv(path_package + '/renewable_power_plants_PL.csv',\n",
    "             sep=',',\n",
    "             decimal='.',\n",
    "             date_format='%Y-%m-%d',\n",
    "             encoding='utf-8',\n",
    "             index=False,\n",
    "             if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "** Write XLSX-file**\n",
    "\n",
    "This process will take some time depending on your hardware.\n",
    "\n",
    "All country power plant list will be written in one xlsx-file. Each country power plant list is written in a separate sheet. As the German power plant list has to many entries for one sheet, it will be split in two. An additional sheet includes the explanations of the marker."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# In case there is a memory error, the output data can be saved in a pickle file,\n",
    "# the kernel stopped and all variables deleted. Then just the necessary data can be\n",
    "# read and then there should be no memory error for the output.\n",
    "# DE_re.to_pickle('DE_re_before_output.pickle')\n",
    "# DK_re.to_pickle('DK_re_before_output.pickle')\n",
    "# FR_re.to_pickle('FR_re_before_output.pickle')\n",
    "# PL_re.to_pickle('PL_re_before_output.pickle')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# In case the date has to be read in again, the first cell of the notebook has to\n",
    "# be executed and then the data for the excel-output can be read in again.\n",
    "# DE_re = pd.read_pickle('DE_re_before_output.pickle')\n",
    "# DK_re = pd.read_pickle('DK_re_before_output.pickle')\n",
    "# FR_re = pd.read_pickle('FR_re_before_output.pickle')\n",
    "# PL_re = pd.read_pickle('PL_re_before_output.pickle')\n",
    "# path_package = 'output/renewable_power_plants'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Read csv of Marker Explanations\n",
    "validation = pd.read_csv('input/validation_marker.csv',\n",
    "                         sep=',', header=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# print('Preparing writer')\n",
    "# # Write the results as xlsx file\n",
    "%time writer = pd.ExcelWriter(path_package + '/renewable_power_plants.xlsx',engine='xlsxwriter',date_format='yyyy-mm-dd')\n",
    "\n",
    "print('Writing DK')\n",
    "%time DK_re.to_excel(writer, index=False, sheet_name='DK')\n",
    "\n",
    "print('Writing FR')\n",
    "%time FR_re.to_excel(writer, index=False, sheet_name='FR')\n",
    "\n",
    "print('Writing PL')\n",
    "%time PL_re.to_excel(writer, index=False, sheet_name='PL')\n",
    "\n",
    "print('Writing DE part 1')\n",
    "%time DE_re[:1000000].to_excel(writer, index=False, sheet_name='DE part-1')\n",
    "\n",
    "print('Writing DE part 2')\n",
    "%time DE_re[1000000:].to_excel(writer, index=False, sheet_name='DE part-2')\n",
    "\n",
    "print('Writing validation')\n",
    "# The explanation of validation markers is added as a sheet\n",
    "%time validation.to_excel(writer, index=False, sheet_name='validation_marker')\n",
    "\n",
    "print('Saving...')\n",
    "# Close the Pandas Excel writer and output the Excel file.\n",
    "%time writer.save()\n",
    "print('...done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Write SQLite**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# The decommissioning_date column is giving the engine some trouble. That is\n",
    "# why it is casted to the str type.\n",
    "DE_re.decommissioning_date = DE_re.decommissioning_date.astype(str)\n",
    "# Write the results to sqlite database. Using the chunksize parameter makes\n",
    "# this cell not use so much memory. If the parameter is not set, the to_sql\n",
    "# function will try to write all rows at the same time. This uses too much\n",
    "# memory. If you have a lot of memory, you can remove the parameter or increase \n",
    "# it to speed this process up. If you have memory problemes, try decreasing the \n",
    "# chunksize.\n",
    "engine = sqlalchemy.create_engine(\n",
    "    'sqlite:///'+ path_package +'/renewable_power_plants.sqlite') \n",
    "DE_re.to_sql('renewable_power_plants_DE',\n",
    "             engine,\n",
    "             if_exists=\"replace\",\n",
    "             chunksize=100000,\n",
    "             index=False\n",
    "            )\n",
    "DK_re.to_sql('renewable_power_plants_DK',\n",
    "             engine,\n",
    "             if_exists='replace',\n",
    "             chunksize=100000,\n",
    "             index=False\n",
    "            )\n",
    "FR_re.to_sql('renewable_power_plants_FR',\n",
    "             engine,\n",
    "             if_exists=\"replace\",\n",
    "             chunksize=100000,\n",
    "             index=False\n",
    "            )\n",
    "validation.to_sql('validation_marker',\n",
    "             engine,\n",
    "             if_exists=\"replace\",\n",
    "             chunksize=100000,\n",
    "             index=False\n",
    "            )\n",
    "timeseries_daily.to_sql('renewable_capacity_timeseries_DE',\n",
    "             engine,\n",
    "             if_exists=\"replace\",\n",
    "             chunksize=100000,\n",
    "             index=False\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9.3 Write time series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "\n",
    "# Write daily cumulated time series as csv\n",
    "timeseries_daily.to_csv(path_package + '/renewable_capacity_timeseries_DE.csv',\n",
    "                        sep=',',\n",
    "                        float_format='%.3f',\n",
    "                        decimal='.',\n",
    "                        date_format='%Y-%m-%d',\n",
    "                        encoding='utf-8',\n",
    "                        if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 9.4 Write meta data\n",
    "\n",
    "The Data Packages meta data are created in the specific JSON format as proposed by the Open Knowledge Foundation. Please see the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.\n",
    "\n",
    "In order to keep the Jupyter Notebook more readable the metadata is written in the human-readable YAML format using a multi-line string and then parse the string into a Python dictionary and save it as a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "metadata = \"\"\"\n",
    "name: opsd-renewable-power-plants\n",
    "title: Renewable power plants\n",
    "description: List of renewable energy power stations\n",
    "long_description: >-\n",
    "    This Data Package contains a list of renewable energy power plants in lists of \n",
    "    renewable energy-based power plants of Germany, Denmark, France and Poland. \n",
    "    Germany: More than 1.7 million renewable power plant entries, eligible under the \n",
    "    renewable support scheme (EEG). \n",
    "    Denmark: Wind and phovoltaic power plants with a high level of detail. \n",
    "    France: Aggregated capacity and number of installations per energy source per \n",
    "    municipality (Commune). \n",
    "    Poland: Summed capacity and number of installations per energy source \n",
    "    per municipality (Powiat). \n",
    "    Due to different data availability, the power plant lists are of different \n",
    "    accurancy and partly provide different power plant parameter. Due to that, the \n",
    "    lists are provided as seperate csv-files per country and as separate sheets in the\n",
    "    excel file. Suspect data or entries with high probability of duplication are marked\n",
    "    in the column 'comment'. Theses validation markers are explained in the file\n",
    "    validation_marker.csv. Filtering all entries with comments out results in the recommended\n",
    "    data set.\n",
    "    Additionally, the Data Package includes a daily time series of cumulated\n",
    "    installed capacity per energy source type for Germany. All data processing is \n",
    "    conducted in Python and pandas and has been documented in the Jupyter Notebooks \n",
    "    linked below. \n",
    "version: \"2016-10-21\"\n",
    "keywords: [master data register,power plants,renewables,germany,denmark,france,poland,open power system data]\n",
    "geographical-scope: Germany, Denmark, France, Poland\n",
    "resources:\n",
    "    - path: renewable_power_plants_DE.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      missingValue: \"\"\n",
    "      schema:         \n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              type: date\n",
    "              format: YYYY-MM-DD\n",
    "              description: Date of commissioning of specific unit\n",
    "            - name: decommissioning_date\n",
    "              type: date\n",
    "              format: YYYY-MM-DD\n",
    "              description: Date of decommissioning of specific unit\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsd-contentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: thermal_capacity\n",
    "              description: Installed thermal capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: voltage_level\n",
    "              description: Voltage level of grid connection\n",
    "              type: string\n",
    "            - name: tso\n",
    "              description: Name of transmission system operator of the area the plant is located\n",
    "              type: string\n",
    "            - name: dso\n",
    "              description: Name of distribution system operator of the region the plant is located in\n",
    "              type: string\n",
    "            - name: dso_id\n",
    "              description: Company number of German distribution grid operator\n",
    "              type: string\n",
    "            - name: eeg_id\n",
    "              description: Power plant EEG (German feed-in tariff law) remuneration number\n",
    "              type: string\n",
    "            - name: bnetza_id\n",
    "              description: Power plant identification number by BNetzA\n",
    "              type: string\n",
    "            - name: federal_state\n",
    "              description: Name of German administrative level 'Bundesland'\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: German zip-code\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: German Gemeindenummer (municipalitiy number)\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Name of German Gemeinde (municipality)\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Street name or name of land parcel\n",
    "              type: string\n",
    "            - name: address_number\n",
    "              description: House number or number of land parcel\n",
    "              type: string\n",
    "            - name: utm_zone\n",
    "              description: Universal Transverse Mercator zone value\n",
    "              type: \n",
    "            - name: utm_east\n",
    "              description: Coordinate in Universal Transverse Mercator (east)\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: utm_north\n",
    "              description: Coordinate in Universal Transverse Mercator (north)\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: geopoint\n",
    "              format: lat\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: geopoint\n",
    "              format: lon\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: comment\n",
    "              description: Shortcodes for comments related to this entry, explanation can be looked up in validation_marker.csv\n",
    "              type: string\n",
    "    - path: renewable_power_plants_DK.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      missingValue: \"\"\n",
    "      schema:\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              type: date\n",
    "              format: YYYY-MM-DD\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsd-contentfilter: \"true\"\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "            - name: dso\n",
    "              description: Name of distribution system operator of the region the plant is located in\n",
    "              type: string\n",
    "            - name: gsrn_id\n",
    "              description: Danish wind turbine identifier number (GSRN)\n",
    "              type: number\n",
    "              format: integer\n",
    "            - name: postcode\n",
    "              description: Danish zip-code\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: Danish 3-digit Kommune-Nr\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Name of Danish Kommune\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Street name or name of land parcel\n",
    "              type: string\n",
    "            - name: address_number\n",
    "              description: House number or number of land parcel\n",
    "              type: string\n",
    "            - name: utm_east\n",
    "              description: Coordinate in Universal Transverse Mercator (east)\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: utm_north\n",
    "              description: Coordinate in Universal Transverse Mercator (north)\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: geopoint\n",
    "              format: lat\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: geopoint\n",
    "              format: lon\n",
    "            - name: hub_height\n",
    "              description: Wind turbine hub heigth in m\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: rotor_diameter\n",
    "              description: Wind turbine rotor diameter in m\n",
    "              type: numeric\n",
    "              format: float\n",
    "            - name: manufacturer\n",
    "              description: Company that has built the wind turbine\n",
    "              type: string\n",
    "            - name: model\n",
    "              description: Wind turbind model type\n",
    "              type: string\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - path: renewable_power_plants_FR.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      missingValue: \"\"\n",
    "      schema:\n",
    "          fields:\n",
    "            - name: municipality_code\n",
    "              description: French 5-digit INSEE code for Communes\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Name of French Commune\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsd-contentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "            - name: number_of_installations\n",
    "              description: Number of installations of the energy source subtype in the municipality\n",
    "              type: number\n",
    "              format: integer\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: geopoint\n",
    "              format: lat\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: geopoint\n",
    "              format: lon\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - path: renewable_power_plants_PL.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      missingValue: \"\"\n",
    "      schema:\n",
    "          fields:\n",
    "            - name: district\n",
    "              description: Name of the Polish powiat\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsd-contentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "            - name: number_of_installations\n",
    "              description: Number of installations of the energy source subtype in the district\n",
    "              type: number\n",
    "              format: integer\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - path: renewable_power_plants.xlsx\n",
    "      format: xlsx\n",
    "    - path: validation_marker.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      mediatype: text/csv\n",
    "      missingValue: \"\"\n",
    "      schema:         \n",
    "          fields:\n",
    "            - name: Validation_Marker\n",
    "              description: Name of validation marker utilized in column comment in the renewable_power_plant_germany.csv\n",
    "              type: string\n",
    "            - name: Explanation\n",
    "              description: Comment explaining meaning of validation marker\n",
    "              type: string\n",
    "    - path: renewable_capacity_timeseries_DE.csv\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      mediatype: text/csv\n",
    "      missingValue: \"\"\n",
    "      schema:         \n",
    "          fields:\n",
    "            - name:\n",
    "              description: Day\n",
    "              type: datetime\n",
    "              format: YYYY-MM-DD\n",
    "            - name: Solar\n",
    "              description: Cumulated electrical solar capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: Onshore\n",
    "              description: Cumulated electrical wind onshore capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: Offshore\n",
    "              description: Cumulated electrical wind offshore capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: Bioenergy and renewable waste\n",
    "              description: Cumulated electrical bioenergy and renewable waste capacity in MW\n",
    "              type: number\n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: Geothermal\n",
    "              description: Cumulated electrical geothermal capacity in MW\n",
    "              type: number  \n",
    "              format: float\n",
    "              unit: MW\n",
    "            - name: Run-of-river\n",
    "              description: Cumulated electrical run-of-river capacity in MW\n",
    "              type: number \n",
    "              format: float\n",
    "              unit: MW\n",
    "licenses:\n",
    "    - type: MIT license\n",
    "      url: http://www.opensource.org/licenses/MIT\n",
    "sources:\n",
    "    - name: BNetzA\n",
    "      web: http://www.bundesnetzagentur.de/cln_1422/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Anlagenregister/Anlagenregister_Veroeffentlichung/Anlagenregister_Veroeffentlichungen_node.html\n",
    "      source: Bundesnetzagentur register of renewable power plants (excl. PV)\n",
    "    - name: BNetzA_PV\n",
    "      web: http://www.bundesnetzagentur.de/cln_1431/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/DatenMeldgn_EEG-VergSaetze/DatenMeldgn_EEG-VergSaetze_node.html    \n",
    "      source: Bundesnetzagentur register of PV power plants\n",
    "    - name: TransnetBW, TenneT, Amprion, 50Hertz, Netztransparenz.de\n",
    "      web: https://www.netztransparenz.de/de/Anlagenstammdaten.htm\n",
    "      source: Netztransparenz.de - information platform of German TSOs (register of renewable power plants in their control area)\n",
    "    - name: Postleitzahlen Deutschland\n",
    "      web: http://www.suche-postleitzahl.org/downloads\n",
    "      source: Zip codes of Germany linked to geo-information\n",
    "    - name: Energinet.dk\n",
    "      web: http://www.energinet.dk/SiteCollectionDocuments/Danske%20dokumenter/El/SolcelleGraf.xlsx\n",
    "      source: register of Danish wind power plants\n",
    "    - name: Energistyrelsen\n",
    "      web: http://www.ens.dk/sites/ens.dk/files/info/tal-kort/statistik-noegletal/oversigt-energisektoren/stamdataregister-vindmoeller/anlaegprodtilnettet.xls    \n",
    "      source: ens.dk - register of Danish PV power plants\n",
    "    - name: GeoNames\n",
    "      web: http://download.geonames.org/export/zip/\n",
    "      source: geonames.org\n",
    "    - name: French Ministery of the Environment, Energy and the Sea\n",
    "      web: http://www.statistiques.developpement-durable.gouv.fr/energie-climat/r/energies-renouvelables.html?tx_ttnews[tt_news]=20647\n",
    "      source: register of renewable power plants by generation type and municipality\n",
    "    - name: OpenDataSoft\n",
    "      web: http://public.opendatasoft.com/explore/dataset/correspondance-code-insee-code-postal/download/'\\\n",
    "           '?format=csv&refine.statut=Commune%20simple&timezone=Europe/Berlin&use_labels_for_header=true\n",
    "      source: Code Postal - Code INSEE\n",
    "    - name: Urzad Regulacji Energetyki (URE)\n",
    "      web: http://www.ure.gov.pl/uremapoze/mapa.html\n",
    "      source: Energy Regulatory Office of Poland\n",
    "contributors:\n",
    "    - name: Kristian Reincke\n",
    "      email: kristian.reincke@uni-flensburg.de\n",
    "      web: http://open-power-system-data.org/\n",
    "views: True\n",
    "openpowersystemdata-enable-listing: True\n",
    "documentation: https://github.com/Open-Power-System-Data/renewable_power_plants/blob/2016-10-21/main.ipynb\n",
    "last_changes: Included Danish, French and Polish data, Updated German input data\n",
    "\"\"\"\n",
    "\n",
    "metadata = yaml.load(metadata)\n",
    "\n",
    "datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))\n",
    "\n",
    "# Write the information of the metadata\n",
    "with open(os.path.join(path_package, 'datapackage.json'), 'w') as f:\n",
    "    f.write(datapackage_json)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda env:renewable_power_plants]",
   "language": "python",
   "name": "conda-env-renewable_power_plants-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "latex_envs": {
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 0
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}