{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"width:100%; background-color: #D9EDF7; border: 1px solid #CFCFCF; text-align: left; padding: 10px;\">\n",
    "      <b>Renewable power plants: Validation and output notebook</b>\n",
    "      <ul>\n",
    "        <li><a href=\"main.ipynb\">Main notebook</a></li>\n",
    "        <li><a href=\"download_and_process.ipynb\">Download and process notebook</a></li>\n",
    "        <li>Validation and output notebook</li>\n",
    "      </ul>\n",
    "      <br>This notebook is part of the <a href=\"http://data.open-power-system-data.org/renewable_power_plants\"> Renewable power plants Data Package</a> of <a href=\"http://open-power-system-data.org\">Open Power System Data</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part 1 of the script (<a href=\"download_and_process.ipynb\">Download and process Notebook</a>) has downloaded and merged the original data. This Notebook subsequently checks, validates the list of renewable power plants and creates CSV/XLSX/SQLite files. It also generates a daily time series of cumulated installed capacities by energy source.\n",
    "\n",
    "*(Before running this script make sure you ran Part 1, so that the renewables.pickle files for each country exist in the same folder as the scripts)*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Initialization\" data-toc-modified-id=\"Initialization-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Initialization</a></span><ul class=\"toc-item\"><li><span><a href=\"#Script-setup\" data-toc-modified-id=\"Script-setup-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Script setup</a></span><ul class=\"toc-item\"><li><span><a href=\"#Load-the-list-of-sources\" data-toc-modified-id=\"Load-the-list-of-sources-1.1.1\"><span class=\"toc-item-num\">1.1.1&nbsp;&nbsp;</span>Load the list of sources</a></span></li></ul></li><li><span><a href=\"#Load-data\" data-toc-modified-id=\"Load-data-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Load data</a></span></li><li><span><a href=\"#Download-coastline-data\" data-toc-modified-id=\"Download-coastline-data-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Download coastline data</a></span></li></ul></li><li><span><a href=\"#Validation-Markers\" data-toc-modified-id=\"Validation-Markers-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Validation Markers</a></span><ul class=\"toc-item\"><li><span><a href=\"#Define-the-Markers-for-Germany\" data-toc-modified-id=\"Define-the-Markers-for-Germany-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Define the Markers for Germany</a></span></li><li><span><a href=\"#Define-the-Markers-for-France\" data-toc-modified-id=\"Define-the-Markers-for-France-2.2\"><span class=\"toc-item-num\">2.2&nbsp;&nbsp;</span>Define the Markers for France</a></span></li><li><span><a href=\"#Define-the-Markers-for-the-United-Kingdom\" data-toc-modified-id=\"Define-the-Markers-for-the-United-Kingdom-2.3\"><span class=\"toc-item-num\">2.3&nbsp;&nbsp;</span>Define the Markers for the United Kingdom</a></span></li><li><span><a href=\"#Mark-the-data\" data-toc-modified-id=\"Mark-the-data-2.4\"><span class=\"toc-item-num\">2.4&nbsp;&nbsp;</span>Mark the data</a></span></li></ul></li><li><span><a 
href=\"#Harmonization\" data-toc-modified-id=\"Harmonization-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Harmonization</a></span><ul class=\"toc-item\"><li><span><a href=\"#Harmonizing-column-order\" data-toc-modified-id=\"Harmonizing-column-order-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>Harmonizing column order</a></span></li><li><span><a href=\"#Cleaning-columns\" data-toc-modified-id=\"Cleaning-columns-3.2\"><span class=\"toc-item-num\">3.2&nbsp;&nbsp;</span>Cleaning columns</a></span></li><li><span><a href=\"#Sort\" data-toc-modified-id=\"Sort-3.3\"><span class=\"toc-item-num\">3.3&nbsp;&nbsp;</span>Sort</a></span></li><li><span><a href=\"#Leave-unspecified-cells-blank\" data-toc-modified-id=\"Leave-unspecified-cells-blank-3.4\"><span class=\"toc-item-num\">3.4&nbsp;&nbsp;</span>Leave unspecified cells blank</a></span></li><li><span><a href=\"#Separate-dirty-from-clean\" data-toc-modified-id=\"Separate-dirty-from-clean-3.5\"><span class=\"toc-item-num\">3.5&nbsp;&nbsp;</span>Separate dirty from clean</a></span></li></ul></li><li><span><a href=\"#Drop-duplicates\" data-toc-modified-id=\"Drop-duplicates-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Drop duplicates</a></span></li><li><span><a href=\"#Capacity-time-series\" data-toc-modified-id=\"Capacity-time-series-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Capacity time series</a></span><ul class=\"toc-item\"><li><span><a href=\"#Make-separate-series-for-Great-Britain-and-Northern-Ireland\" data-toc-modified-id=\"Make-separate-series-for-Great-Britain-and-Northern-Ireland-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Make separate series for Great Britain and Northern Ireland</a></span></li><li><span><a href=\"#Create-total-wind-columns\" data-toc-modified-id=\"Create-total-wind-columns-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>Create total wind columns</a></span></li><li><span><a href=\"#Create-one-time-series-file-containing-al-countries\" 
data-toc-modified-id=\"Create-one-time-series-file-containing-al-countries-5.3\"><span class=\"toc-item-num\">5.3&nbsp;&nbsp;</span>Create one time series file containing al countries</a></span></li></ul></li><li><span><a href=\"#Make-the-normalized-dataframe-for-all-the-countries\" data-toc-modified-id=\"Make-the-normalized-dataframe-for-all-the-countries-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Make the normalized dataframe for all the countries</a></span></li><li><span><a href=\"#Output\" data-toc-modified-id=\"Output-7\"><span class=\"toc-item-num\">7&nbsp;&nbsp;</span>Output</a></span><ul class=\"toc-item\"><li><span><a href=\"#Write-data-files\" data-toc-modified-id=\"Write-data-files-7.1\"><span class=\"toc-item-num\">7.1&nbsp;&nbsp;</span>Write data files</a></span><ul class=\"toc-item\"><li><span><a href=\"#Write-CSV-files\" data-toc-modified-id=\"Write-CSV-files-7.1.1\"><span class=\"toc-item-num\">7.1.1&nbsp;&nbsp;</span>Write CSV-files</a></span></li><li><span><a href=\"#Write-XLSX-files\" data-toc-modified-id=\"Write-XLSX-files-7.1.2\"><span class=\"toc-item-num\">7.1.2&nbsp;&nbsp;</span>Write XLSX-files</a></span></li><li><span><a href=\"#Write-SQLite\" data-toc-modified-id=\"Write-SQLite-7.1.3\"><span class=\"toc-item-num\">7.1.3&nbsp;&nbsp;</span>Write SQLite</a></span></li></ul></li><li><span><a href=\"#Write-meta-data\" data-toc-modified-id=\"Write-meta-data-7.2\"><span class=\"toc-item-num\">7.2&nbsp;&nbsp;</span>Write meta data</a></span></li><li><span><a href=\"#Generate-checksums\" data-toc-modified-id=\"Generate-checksums-7.3\"><span class=\"toc-item-num\">7.3&nbsp;&nbsp;</span>Generate checksums</a></span></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:31.407554Z",
     "start_time": "2020-08-21T01:29:31.402967Z"
    }
   },
   "outputs": [],
   "source": [
    "settings = {\n",
    "    'version': '2020-08-25',\n",
    "    'changes': 'Updated all countries with new data available (DE, FR, PL, CH, DK, UK), added data for CZ and SE.'\n",
    "}\n",
    "\n",
    "settings['referenceDate'] = settings['version']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Script setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:36.956012Z",
     "start_time": "2020-08-21T01:29:32.227259Z"
    },
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "import json\n",
    "import logging\n",
    "import os\n",
    "import urllib.parse\n",
    "import re\n",
    "import zipfile\n",
    "import gc\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import requests\n",
    "import sqlalchemy\n",
    "import yaml\n",
    "import hashlib\n",
    "import os\n",
    "import fiona\n",
    "import cartopy.io.shapereader as shpreader\n",
    "import shapely.geometry as sgeom\n",
    "from shapely.prepared import prep\n",
    "from shapely.ops import unary_union\n",
    "import fake_useragent\n",
    "import datetime\n",
    "import xlsxwriter\n",
    "from IPython.display import Markdown\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "# Option to make pandas display 40 columns max per dataframe (default is 20)\n",
    "pd.options.display.max_columns = 40\n",
    "\n",
    "# Create input and output folders if they don't exist\n",
    "os.makedirs(os.path.join('input', 'original_data'), exist_ok=True)\n",
    "\n",
    "os.makedirs('output', exist_ok=True)\n",
    "os.makedirs(os.path.join('output', 'renewable_power_plants'), exist_ok=True)\n",
    "package_path = os.path.join('output', 'renewable_power_plants',settings['version'])\n",
    "os.makedirs(package_path, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the list of sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:37.286266Z",
     "start_time": "2020-08-21T01:29:36.958463Z"
    }
   },
   "outputs": [],
   "source": [
    "source_df = pd.read_csv(os.path.join('input', 'sources.csv'))\n",
    "source_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, select the countries you want to validate. Fill in the array `countries` with their codes.\n",
    "\n",
    "Run the following cell to see which countries are available in this version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:37.299457Z",
     "start_time": "2020-08-21T01:29:37.288117Z"
    }
   },
   "outputs": [],
   "source": [
    "set(source_df['country'].unique().tolist()) - set(['EU'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:38.385196Z",
     "start_time": "2020-08-21T01:29:38.381027Z"
    }
   },
   "outputs": [],
   "source": [
    "# Fill in this array with the codes of the countries you want to validate.\n",
    "# E.g. countries = ['CH', 'CZ', 'DE', 'DK', 'FR', 'PL', 'SE', 'UK']\n",
    "countries = ['CH', 'CZ', 'DE', 'DK', 'FR', 'PL', 'SE', 'UK']\n",
    "#countries = ['FR']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, load the data on the selected countries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_countries = set(source_df['country'].unique().tolist()) - set(['EU'])\n",
    "all_countries_non_DE = all_countries - set(['DE'])\n",
    "#all_countries_dirty = set(['DE_dirty', 'FR_dirty'])\n",
    "all_countries_including_dirty = all_countries | set(['DE_dirty', 'FR_dirty'])\n",
    "\n",
    "# Read data from script Part 1 download_and_process\n",
    "dfs = {}\n",
    "for country in countries:\n",
    "    print('Loading', country)\n",
    "    path = os.path.join('intermediate', country+'_renewables.pickle')\n",
    "    if os.path.exists(path):\n",
    "        dfs[country] = pd.read_pickle(path)\n",
    "        print('\\tDone!')\n",
    "    else:\n",
    "        print('\\tThe file', path, 'does not exist.')\n",
    "    # Calling garbage collector may speed up the process on some machines\n",
    "    gc.collect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download coastline data\n",
    "\n",
    "The coastline shapefile is needed to check if the geocoordinates of the land powerplants point to a land location, and conversely, if the geocoordinates of the onshore facilities point to a location not on land."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "coastline_url = 'https://www.ngdc.noaa.gov/mgg/shorelines/data/gshhg/latest/gshhg-shp-2.3.7.zip'\n",
    "\n",
    "user_agent = fake_useragent.UserAgent()\n",
    "\n",
    "directory_path = os.path.join('input', 'maps', 'coastline')\n",
    "os.makedirs(directory_path, exist_ok=True)\n",
    "filepath = os.path.join(directory_path, 'gshhg-shp-2.3.7.zip')\n",
    "\n",
    "# check if the file exists; if not, download it\n",
    "if not os.path.exists(filepath):\n",
    "    session = requests.session()\n",
    "    print(coastline_url)\n",
    "    print('Downloading...')\n",
    "    headers = {'User-Agent' : user_agent.random}\n",
    "    r = session.get(coastline_url, headers=headers, stream=True)\n",
    "    total_size = r.headers.get('content-length')\n",
    "    total_size = int(total_size)\n",
    "    chuncksize = 4096\n",
    "    with open(filepath, 'wb') as file:\n",
    "        downloaded = 0\n",
    "        for chunck in r.iter_content(chuncksize):\n",
    "            file.write(chunck)\n",
    "            downloaded += chuncksize\n",
    "            print('\\rProgress: {:.2f}%'.format(100 * downloaded / float(total_size)), end='')\n",
    "    print(' Done.')\n",
    "    zip_ref = zipfile.ZipFile(filepath, 'r')\n",
    "    zip_ref.extractall(directory_path)\n",
    "    zip_ref.close()\n",
    "else:\n",
    "    print('The file is already there:', filepath)\n",
    "    filepath = '' + filepath\n",
    "\n",
    "coastline_shapefile_path = os.path.join('input', 'maps', 'coastline', 'GSHHS_shp', 'f', 'GSHHS_f_L1.shp')\n",
    "print(\"Shapefile path: \", coastline_shapefile_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Validation Markers\n",
    "\n",
    "This section checks the DataFrame for a set of pre-defined criteria and adds markers to the entries in an additional column. The marked data will be included in the output files, but marked, so that they can be easiliy filtered out. For creating the validation plots and the time series, suspect data is skipped.\n",
    "\n",
    "Each marker will be represented by its code, the country code it applies to, short and long descriptions, as well as a `python` function that returns a boolean mask whose `True` values denote the marked rows.\n",
    "\n",
    "We proceed by definining the markers for Germany, France and the United Kingdom, and storing them into a dictionary. We also define a routine for extracting country-specific markers from it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a dictionary to store markers\n",
    "validation_markers = {}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a function for extracting country-specific markers from the dictionary\n",
    "def get_markers(validation_markers, country):\n",
    "    markers = {}\n",
    "    for key in validation_markers:\n",
    "        if validation_markers[key]['Country'] == country:\n",
    "            short_explanation = validation_markers[key]['Short explanation']\n",
    "            long_explanation = validation_markers[key]['Long explanation']\n",
    "            marker_function = validation_markers[key]['function']\n",
    "            \n",
    "            markers[key] = {'Short explanation' : short_explanation,\n",
    "                            'Long explanation' : long_explanation,\n",
    "                            'function' : marker_function}\n",
    "    return markers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the Markers for Germany"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "key = 'R_1'\n",
    "cutoff_date_bnetza = '2017-12-31'\n",
    "cutoff_date_bnetza = pd.Timestamp(2017, 12, 31)\n",
    "\n",
    "R1_DE_marker_function = lambda df: (df['commissioning_date'] <= cutoff_date_bnetza) &\\\n",
    "                        (df['data_source'].isin(['BNetzA', 'BNetzA_PV', 'BNetzA_PV_historic']))\n",
    "\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R1_DE_marker_function,\n",
    "    \"Short explanation\": \"data_source = BNetzA and commissioning_date < \" + str(cutoff_date_bnetza.date()),\n",
    "    \"Long explanation\": \"This powerplant is probably also represented by an entry from the TSO data and should therefore be filtered out.\"\n",
    "}\n",
    "\n",
    "key = 'R_2'\n",
    "R2_DE_marker_function = lambda df: (df['notification_reason'] != 'Inbetriebnahme') & \\\n",
    "                                   (df['data_source'] == 'BNetzA')\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R2_DE_marker_function,\n",
    "    \"Short explanation\": \"notification_reason other than commissioning (Inbetriebnahme)\",\n",
    "    \"Long explanation\": \"This powerplant is probably represented by an earlier entry already (possibly also from the TSO data) and should therefore be filtered out.\"\n",
    "}\n",
    "\n",
    "key = 'R_3'\n",
    "R3_DE_marker_function = lambda df : df['commissioning_date'].isnull()\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R3_DE_marker_function,\n",
    "    \"Short explanation\": \"commissioning_date not specified\",\n",
    "    \"Long explanation\": \"\"\n",
    "}\n",
    "\n",
    "key = 'R_4'\n",
    "R4_DE_marker_function = lambda df : df['electrical_capacity'] <= 0.0\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R4_DE_marker_function,\n",
    "    \"Short explanation\": \"electrical_capacity not specified\",\n",
    "    \"Long explanation\": \"\"\n",
    "}\n",
    "\n",
    "key = 'R_5'\n",
    "# Just the entry which is not double should be kept, thus the other one is marked\n",
    "R5_DE_marker_function = lambda df : df['grid_decommissioning_date'].isnull() == False \n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R5_DE_marker_function,\n",
    "    \"Short explanation\": \"decommissioned from the grid\",\n",
    "    \"Long explanation\": \"This powerplant is probably commissioned again to the grid of another grid operator and therefore this doubled entry should be filtered out.\"\n",
    "}\n",
    "\n",
    "key = 'R_6'\n",
    "R6_DE_marker_function = lambda df: df['decommissioning_date'].isnull() == False\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R6_DE_marker_function,\n",
    "    \"Short explanation\": \"decommissioned\",\n",
    "    \"Long explanation\": \"This powerplant is completely decommissioned.\"\n",
    "}\n",
    "\n",
    "# Note that we skip R7 here as R7 is used for \n",
    "# Frech oversees power plants below (we never change meanings of R markers, so R7 stays reserved for that)\n",
    "key = 'R_8' \n",
    "# note that this depends on BNetzA items to be last in list, because we want to keep the TSO items\n",
    "R8_DE_marker_function = lambda df: (df.duplicated(['eeg_id'],keep='first')) & \\\n",
    "                                  (df['eeg_id'].isnull() == False)\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"DE\",\n",
    "    \"function\" : R8_DE_marker_function,\n",
    "    \"Short explanation\": \"duplicate_eeg_id\",\n",
    "    \"Long explanation\": \"This power plant is twice in the data (e.g. through BNetzA and TSOs).\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the Markers for France"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "key = 'R_7'\n",
    "R7_FR_marker_function = lambda df: (df['lat'] < 41) | (df['lon'] < -6) | (df['lon'] > 10)\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"FR\",\n",
    "    \"function\" : R7_FR_marker_function,\n",
    "    \"Short explanation\": \"not connected to the European grid\",\n",
    "    \"Long explanation\": \"This powerplant is located in regions belonging to France but not located in Europe (e.g. Guadeloupe).\"\n",
    "}\n",
    "\n",
    "key = 'R_10'\n",
    "R10_FR_marker_function = lambda df: ((df['data_source'] == 'OPEN DATA RESEAUX ENERGIES') &\n",
    "                                     (\n",
    "                                      (df['commissioning_date'].isnull()) |\n",
    "                                      (df['disconnection_date'].isnull() == False)\n",
    "                                     )\n",
    "                                    )\n",
    "\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"FR\",\n",
    "    \"function\" : R10_FR_marker_function,\n",
    "    \"Short explanation\": \"inactive\",\n",
    "    \"Long explanation\": \"This powerplant is inactive: not commissioned or is disconnected from the grid.\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the Markers for the United Kingdom"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a function to check if an offshore powerplant is not wind\n",
    "geoms = fiona.open(coastline_shapefile_path)\n",
    "land_geom = sgeom.MultiPolygon([sgeom.shape(geom['geometry']) for geom in geoms])\n",
    "land = prep(land_geom)\n",
    "\n",
    "def not_on_land_but_should_be(powerplant_data):\n",
    "    longitude = powerplant_data['lon']\n",
    "    latitude = powerplant_data['lat']\n",
    "    if pd.isnull(longitude) or pd.isnull(latitude):\n",
    "        return False\n",
    "    not_on_land = not land.contains(sgeom.Point(longitude, latitude))\n",
    "    offshore_ok =  'Offshore' in [powerplant_data['region'], powerplant_data['municipality']] or \\\n",
    "                  (powerplant_data['energy_source_level_2'] in ['Wind', 'Marine'])\n",
    "    return not_on_land and not offshore_ok\n",
    "    \n",
    "\n",
    "key = 'R_9'\n",
    "R9_UK_marker_function = lambda df: df.apply(not_on_land_but_should_be, axis=1)\n",
    "validation_markers[key] = {\n",
    "    \"Country\" : \"UK\",\n",
    "    \"function\" : R9_UK_marker_function,\n",
    "    \"Short explanation\": \"Not on land, but should be.\",\n",
    "    \"Long explanation\": \"The geocoordinates of this powerplant indicate that it is not on the UK mainland, but the facility is not an offshore wind farm.\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mark the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for country in countries:\n",
    "    print(country)\n",
    "    \n",
    "    # Get markers for the country in question\n",
    "    markers = get_markers(validation_markers, country)\n",
    "    \n",
    "    if len(markers) > 0:\n",
    "        # Create an empty marker column\n",
    "        dfs[country]['comment'] = \"\"\n",
    "        \n",
    "        for key in markers:\n",
    "            # Extract the marker function\n",
    "            print('\\t', key)\n",
    "            marker_function =  markers[key]['function']\n",
    "        \n",
    "            # Mark the data\n",
    "            marked_mask = marker_function(dfs[country])\n",
    "            #marked_indices = (marked_mask[marked_mask == True]).index\n",
    "        \n",
    "            # Add the marker key to the comment column of the marked rows\n",
    "            dfs[country].loc[marked_mask, 'comment'] += (key + '|')\n",
    "        \n",
    "            # Remove unnecessary variables which may be taking up a lot of memory\n",
    "            del marked_mask\n",
    "        print('\\tDone!')\n",
    "    else:\n",
    "        print('\\tNo markers for this country.')\n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each country, show the total capacity aggregated by  (1) the comment and data source and (2) the comment and energy level 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for country in countries:\n",
    "    if 'comment' in dfs[country].columns:\n",
    "        print(country)\n",
    "        # Summarize the capacity of data by comment and data_source\n",
    "        summary = dfs[country].groupby(['comment', 'data_source'])['electrical_capacity'].sum().to_frame()\n",
    "        display(summary)\n",
    "\n",
    "        # Summarize the capacity of data by comment and energy type\n",
    "        summary = (dfs[country].groupby(['comment', 'energy_source_level_2'])['electrical_capacity']\n",
    "                    .sum().to_frame())\n",
    "        display(summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Harmonization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Harmonizing column order\n",
    "\n",
    "Here, we define the order of the columns for each country.\n",
    "\n",
    "In order to be consistent, we adopt the following approach. We define the following groups of columns:\n",
    "1. **Energy columns**: those that show what type of energy is being produced at the power plant and what's the plant electrical capacity.\n",
    "2. **Source columns**: those that describe the source which provided the data.\n",
    "3. **Location columns**: those that provide us information on where the plant is (e.g. address, geographical latitude and longitude, and so on).\n",
    "4. **Temporal columns**: those that represent the dates that are important to know about the facility (such as the commissioning date).\n",
    "5. **Owner columns**: those that describe the plant's owner.\n",
    "6. **Name and id columns**: those that show the plant's name and id in the original data provided by the source.\n",
    "7. **Technical columns**: those that show us some technical aspects of the plants, such as the number of installations or the rotor's diameter.\n",
    "8. **Other columns**: those that do not belong to any group previously defined.\n",
    "\n",
    "Each column that appears in any dataset is classified as belonging to one and only one group. For each group, we define the inner order of the columns that belong to it.\n",
    "\n",
    "We will enforce the following rules for all the countries:\n",
    "1. If columns `A` and `B` are in the same group, then `A` appears before `B` if and only if it precedes `B` in the group's inner order, and vice versa.\n",
    "2. If column `A` belongs to group `i` and column `B` to group `j`, then `A` appears before `B` if and only if `i<j`, and vice versa.\n",
    "\n",
    "This means that all energy columns present in the data appear before any source column present in the set, that all location columns in the dataset appear before any temporal column, and so on. The columns in the same group follow the group's inner order. If we were to order the union of column names of all the datasets according to these rules, we would get an array of columns that we'll call **the default order**.\n",
    "\n",
    "**Note 1**: The datasets of different countries have different columns. For example, not all location columns are  present in all the datasets because not all sources describe their plants with the same level of precision.\n",
    "\n",
    "**Note 2**: All the datasets have all the columns in the **energy** and **source** columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The columns in this group show what type of energy is being produced\n",
    "# at the facility in question and what is its capacity.\n",
    "energy_columns = [\n",
    "    'electrical_capacity', 'energy_source_level_1', 'energy_source_level_2',\n",
    "    'energy_source_level_3', 'technology'\n",
    "]\n",
    "\n",
    "# The columns in this group describe the data source.\n",
    "source_columns = [\n",
    "    'data_source'\n",
    "]\n",
    "\n",
    "# The columns in this group bear information on the power plant's location\n",
    "# such as NUTS codes, latitude and longitude, municipality etc.\n",
    "location_columns = [\n",
    "    'nuts_1_region', 'nuts_2_region', 'nuts_3_region', \n",
    "    'lon', 'lat', 'municipality', 'municipality_code', \n",
    "    'postcode', 'address', 'region', 'region_code', \n",
    "    'municipality_group', 'municipality_group_code', \n",
    "    'departement', 'departement_code','county', 'locality',\n",
    "    'country', 'district', 'canton', 'federal_state'\n",
    "]\n",
    "\n",
    "# The columns in this group refer to the significant dates related\n",
    "# to the power plants, such as its commissioning date.\n",
    "temporal_columns = [\n",
    "    'commissioning_date', 'decommissioning_date', 'connection_date',\n",
    "    'disconnection_date', 'contract_period_end'\n",
    "]\n",
    "\n",
    "# The columns in this groupe provide the information on the plant's owner.\n",
    "owner_columns = [\n",
    "    'owner', 'company'\n",
    "]\n",
    "\n",
    "# The columns in this group provide the plant's name and its id in the\n",
    "# original data supplied by the data source.\n",
    "name_and_id_columns = [\n",
    "    'site_name', 'IRIS_code', 'URE_id', 'eeg_id', 'EIC_code',\n",
    "    'uk_beis_id', 'se_vindbrukskollen_id'\n",
    "]\n",
    "\n",
    "# These columns describe some technical aspects of the plant in question.\n",
    "technical_columns = [\n",
    "    'solar_mounting_type', 'chp', 'hub_height', 'rotor_diameter',\n",
    "    'model', 'capacity_individual_turbine', 'gsrn_id', 'number_of_turbines',\n",
    "    'voltage_level', 'number_of_installations'\n",
    "]\n",
    "\n",
    "# The columns in this groups are those that do not belong to any group\n",
    "# previously defined.\n",
    "other_columns = [\n",
    "    'tariff', 'project_name', 'dso', 'dso_id', 'tso', 'operator',\n",
    "    'manufacturer', 'production','as_of_year', 'comment', 'geographical_resolution'\n",
    "]\n",
    "\n",
    "# Now, we define the order in which the columns may appear in a dataframe for a country.\n",
    "# This means that, e.g., all energy columns in a country's dataframe must appear before any source column,\n",
    "# that all location columns must appear before any temporal column, and so on.\n",
    "ordered_groups = [\n",
    "    energy_columns, source_columns, location_columns, temporal_columns, \n",
    "    owner_columns, technical_columns, name_and_id_columns, other_columns\n",
    "]\n",
    "\n",
    "# Merge or the columns\n",
    "default_order = []\n",
    "for group in ordered_groups:\n",
    "    default_order.extend(group)\n",
    "\n",
    "# Uncomment the following line to show the default order\n",
    "#default_order"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us know apply those rules and define the order for each country."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up the variable to contain the text for the markdown displaying the ordered columns for all the countries\n",
    "markdown_text = '| Country | Order of Columns |\\n|:---|:---|\\n'\n",
    "\n",
    "# and a dictionary to contain the order for each country.\n",
    "column_lists = {}\n",
    "\n",
    "# Fill the dictionary and the text.\n",
    "for country in dfs:\n",
    "    # Make sure that all countries have all the energy columns \n",
    "    # even if it means that some are going to be empty\n",
    "    country_columns = list(dfs[country].columns)\n",
    "    country_columns = list(set(country_columns + energy_columns))\n",
    "    \n",
    "    # Determine the order.\n",
    "    order_for_the_country = [column for column in default_order if column in country_columns]\n",
    "    \n",
    "    # Remember it.\n",
    "    column_lists[country] = order_for_the_country\n",
    "    \n",
    "    # Prepare the markdown to display\n",
    "    columns_as_text = ', '.join(order_for_the_country)\n",
    "    markdown_text += '| {} | {} |\\n'.format(country, columns_as_text)\n",
    "\n",
    "# Display the order of columns for each country\n",
    "markdown = Markdown(markdown_text)\n",
    "display(markdown)"
   ]
  },
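  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of the ordering rule (a sketch on made-up data, not part of the processing itself), the snippet below orders the columns of a hypothetical country. The names prefixed with `toy_` and all column lists are assumptions for this example only.\n",
    "\n",
    "```python\n",
    "# Hypothetical column groups, listed in the order in which they must appear\n",
    "toy_energy_columns = ['electrical_capacity', 'technology']\n",
    "toy_location_columns = ['municipality', 'lat', 'lon']\n",
    "toy_other_columns = ['comment']\n",
    "\n",
    "# Merge the groups into one default order\n",
    "toy_default_order = []\n",
    "for group in [toy_energy_columns, toy_location_columns, toy_other_columns]:\n",
    "    toy_default_order.extend(group)\n",
    "\n",
    "# Columns of a hypothetical country, in arbitrary order\n",
    "toy_country_columns = ['lon', 'comment', 'electrical_capacity', 'lat']\n",
    "\n",
    "# Energy columns are always included, even if the country lacks them\n",
    "toy_available = set(toy_country_columns) | set(toy_energy_columns)\n",
    "\n",
    "# Keep the default order, restricted to the available columns\n",
    "toy_order = [column for column in toy_default_order if column in toy_available]\n",
    "print(toy_order)\n",
    "```\n",
    "\n",
    "Here `technology` ends up in the order although the toy country does not provide it, because all energy columns are always included; such a column would simply stay empty."
   ]
  },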
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's sort the columns according to the orders we defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for country in column_lists:\n",
    "    if country in countries and country in dfs:\n",
    "        print('Harmonizing the order for', country)\n",
    "        for column in column_lists[country]:\n",
    "            if column not in dfs[country].columns:\n",
    "                print(country, 'has no column named:', column, '. It will be empty. Check if that\\'s ok.')\n",
    "        dfs[country] = dfs[country].loc[:, column_lists[country]]\n",
    "        print('\\tDone!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning columns\n",
    "\n",
    "Here, we clean the columns to make sure that integer columns really contain integers, to enforce the format for dates (YYYY-MM-DD), to control the precision for decimal columns, and to make all strings one-line and without trailing white spaces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cleaning_specs = {\n",
    "    'decimal' : {\n",
    "        'DE': ['electrical_capacity','lat','lon'],\n",
    "        'DK': ['electrical_capacity','lat','lon'],\n",
    "        'CH': ['electrical_capacity','lat','lon'],\n",
    "        'FR': ['electrical_capacity','lat','lon'],\n",
    "        'PL': ['electrical_capacity'],\n",
    "        'UK': ['electrical_capacity', 'lat', 'lon'],\n",
    "        'SE': ['electrical_capacity', 'lat', 'lon'],\n",
    "        'CZ': ['electrical_capacity', 'lat', 'lon']\n",
    "    },\n",
    "    'integer': {\n",
    "        'DE': ['municipality_code'],\n",
    "        'UK': ['uk_beis_id', 'number_of_turbines'],\n",
    "        'FR': ['IRIS_code', 'departement_code', 'municipality_code', 'municipality_group_code',\n",
    "              'region_code', 'as_of_year'],\n",
    "        'PL': ['URE_id', 'as_of_year'],\n",
    "        'DK': ['gsrn_id']\n",
    "    },\n",
    "    'date': {\n",
    "        'DE': ['commissioning_date', 'decommissioning_date'],\n",
    "        'DK': ['commissioning_date'],\n",
    "        'FR': ['commissioning_date', 'connection_date', 'disconnection_date'],\n",
    "        'CH': ['commissioning_date'],\n",
    "        'UK': ['commissioning_date'],\n",
    "        'SE': ['commissioning_date']\n",
    "    },\n",
    "    'one-line string': {\n",
    "        'DE' : ['federal_state', 'municipality', 'address'],\n",
    "        'DK' : ['municipality', 'address', 'manufacturer', 'model'],\n",
    "        'FR' : ['municipality', 'EIC_code', 'site_name', 'departement', 'municipality_group', 'region'],\n",
    "        'PL' : ['district', 'region'],\n",
    "        'CH' : ['municipality', 'project_name', 'canton', 'address', 'company'],\n",
    "        'UK' : ['address', 'municipality', 'site_name', 'region'],\n",
    "        'SE' : ['municipality', 'county', 'manufacturer', 'se_vindbrukskollen_id', 'site_name'],\n",
    "        'CZ' : ['site_name', 'owner', 'region', 'municipality', 'locality']\n",
    "    }\n",
    "}\n",
    "\n",
    "def to_1_line(string):\n",
    "    if pd.isnull(string) or not isinstance(string, str):\n",
    "        return string\n",
    "    return string.replace('\\r', '').replace('\\n', '')\n",
    "\n",
    "for cleaning_type, cleaning_spec in cleaning_specs.items():\n",
    "    for country, fields in cleaning_spec.items():\n",
    "        if country not in countries or country not in dfs:\n",
    "            continue\n",
    "        for field in fields:\n",
    "            print('Cleaning ' + country + '.' + field +' to ' + cleaning_type + '.')\n",
    "            if cleaning_type == 'decimal':\n",
    "                dfs[country][field] = dfs[country][field].map(lambda x: round(x, 14))\n",
    "            elif cleaning_type == 'integer':\n",
    "                dfs[country][field] = pd.to_numeric(dfs[country][field], errors='coerce')\n",
    "                dfs[country][field] = dfs[country][field].map(lambda x: '%.0f' % x)  \n",
    "            elif cleaning_type == 'date':\n",
    "                dfs[country][field] = dfs[country][field].apply(lambda x: x.date())\n",
    "            elif cleaning_type == 'one-line string':\n",
    "                dfs[country][field] = dfs[country][field].apply(lambda x: to_1_line(x))\n",
    "\n",
    "print('Done!')"
   ]
  },
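  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the four cleaning rules do, the following sketch applies them to a toy dataframe. It is only an illustration under assumed data: the frame `toy` and all of its values are made up, and the snippet is not part of the processing pipeline.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy data; all values are made up for illustration\n",
    "toy = pd.DataFrame({\n",
    "    'electrical_capacity': [0.1 + 0.2, 1.5],\n",
    "    'municipality_code': ['1001', 'unknown'],\n",
    "    'commissioning_date': pd.to_datetime(['2015-03-01 12:30', '2016-07-15']),\n",
    "    'address': ['Main St.\\r\\n12', None],\n",
    "})\n",
    "\n",
    "# decimal: limit the precision to 14 decimal places\n",
    "toy['electrical_capacity'] = toy['electrical_capacity'].map(lambda x: round(x, 14))\n",
    "\n",
    "# integer: coerce to numbers, then format without decimals\n",
    "# (values that cannot be parsed become the string 'nan')\n",
    "codes = pd.to_numeric(toy['municipality_code'], errors='coerce')\n",
    "toy['municipality_code'] = codes.map(lambda x: '%.0f' % x)\n",
    "\n",
    "# date: keep only the date part of each timestamp\n",
    "toy['commissioning_date'] = toy['commissioning_date'].apply(lambda x: x.date())\n",
    "\n",
    "# one-line string: remove line breaks, leave missing values untouched\n",
    "toy['address'] = toy['address'].map(\n",
    "    lambda s: s.replace('\\r', '').replace('\\n', '') if isinstance(s, str) else s\n",
    ")\n",
    "\n",
    "print(toy)\n",
    "```\n",
    "\n",
    "The `'nan'` strings produced by the integer rule are blanked out later, in the step that leaves unspecified cells blank."
   ]
  },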
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sort \n",
    "\n",
    "Now, let us sort the plants in each dataset by the commissioning date, or the most precise location column if the source does not provide the date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sort_by = {\n",
    "    'DE': 'commissioning_date',\n",
    "    'DK': 'commissioning_date',\n",
    "    'CH': 'commissioning_date',\n",
    "    'FR': 'commissioning_date',\n",
    "    'PL': 'district',\n",
    "    'UK': 'commissioning_date',\n",
    "    'SE': 'commissioning_date'\n",
    "}\n",
    "\n",
    "for country, sort_by in sort_by.items():\n",
    "    if country not in countries or country not in dfs:\n",
    "        continue\n",
    "    print('Sorting', country)\n",
    "    dfs[country] = dfs[country].iloc[dfs[country][sort_by].sort_values().index]\n",
    "    dfs[country].reset_index(drop=True, inplace=True)\n",
    "    print('\\tDone!')\n",
    "    \n",
    "print('Done!')\n",
    "del sort_by"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Leave unspecified cells blank\n",
    "\n",
    "This step may take some time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for country in dfs:\n",
    "    print(country)\n",
    "    dfs[country].replace('nan', '', inplace=True)\n",
    "    dfs[country].fillna('', inplace=True)\n",
    "    print('\\tDone!')\n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Separate dirty from clean\n",
    "\n",
    "We separate all plants in the FR and DE datasets which have a validation marker in the comments column into a separate DataFrame and eventually also in a separate CSV file, so the main country files only contain \"clean\" plants, i.e. those without any special comment. This is useful since all our comments denote that most people would probably not like to include them in their calculations.\n",
    "\n",
    "The dirty power plants in the UK are not separated from the clean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "dirty_keys = {\n",
    "    'DE' : 'DE_dirty',\n",
    "    'FR' : 'FR_dirty'\n",
    "}\n",
    "\n",
    "dirty_countries = [key for key in dirty_keys if key in countries]\n",
    "\n",
    "for country in dirty_countries:\n",
    "    idx_dirty = dfs[country][dfs[country].comment.str.len() > 1].index\n",
    "    \n",
    "    dirty_key = dirty_keys[country]\n",
    "    \n",
    "    dfs[dirty_key] = dfs[country].loc[idx_dirty]\n",
    "    dfs[dirty_key].drop_duplicates(inplace=True)\n",
    "    dfs[dirty_key].reset_index(drop=True, inplace=True)\n",
    "    \n",
    "    dfs[country] = dfs[country].drop(idx_dirty, axis='index')\n",
    "    dfs[country] = dfs[country].drop('comment', axis='columns')\n",
    "    dfs[country].reset_index(drop=True, inplace=True)\n",
    "    \n",
    "    del idx_dirty\n",
    "    gc.collect()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the order for the outvalidated plants in Germany and France\n",
    "column_lists['DE_dirty'] = list(column_lists['DE'])\n",
    "column_lists['FR_dirty'] = list(column_lists['FR']) \n",
    "\n",
    "# Remove the comment from the column orders for France and Germany\n",
    "for country in dirty_keys:\n",
    "    column_lists[country] = [column for column in column_lists[country] if column != 'comment']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Drop duplicates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for country in dfs:\n",
    "    print(country)\n",
    "    before = dfs[country].shape[0]\n",
    "    dfs[country].drop_duplicates(inplace=True)\n",
    "    dfs[country].reset_index(drop=True, inplace=True)\n",
    "    after = dfs[country].shape[0]\n",
    "    print('\\tDone! Dropped {} out of {} plants.'.format(before - after, before))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Capacity time series\n",
    "\n",
    "This section creates a daily and yearly time series of the cumulated installed capacity by energy source for the United Kingdom, Germany, Denmark, Switzerland, France and Sweden (if selected at the beginning of the notebook). Three time series will be created for the UK: one for the whole country (GB-UKM), one for Northern Ireland (GB-NIR), and one for the Great Britain (GB-GBN). The data will be a part of the output and will be compared in a plot for validation in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "daily_timeseries = {}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def to_new_level(row):\n",
    "    if(row['energy_source_level_2'] == 'Wind'):\n",
    "        energy_type_label = (row['energy_source_level_2']+'_'+row['technology']).lower()\n",
    "    else:\n",
    "        energy_type_label = row['energy_source_level_2'].lower()\n",
    "\n",
    "    return energy_type_label\n",
    "\n",
    "def min_date(series):\n",
    "    mask = series.apply(lambda x: type(x) == datetime.date)\n",
    "    return series[mask].min()\n",
    "\n",
    "def max_date(series):\n",
    "    mask = series.apply(lambda x: type(x) == datetime.date)\n",
    "    return series[mask].max()\n",
    "\n",
    "def to_daily_timeseries(df, start_date, end_date): \n",
    "    # Filter out missing dates\n",
    "    invalid_date_mask = df['commissioning_date'].apply(lambda x: pd.isna(x) or (isinstance(x, str) and len(x) == 0))\n",
    "    df = df.loc[~invalid_date_mask, :]\n",
    "    \n",
    "    # Combine energy levels to new standardized values\n",
    "    energy_type = df[['energy_source_level_2', 'energy_source_level_3', 'technology']].apply(to_new_level, axis=1)\n",
    "    df['energy_type'] = energy_type\n",
    "    \n",
    "    # Set range of time series as index\n",
    "    daily_timeseries = pd.DataFrame(index=pd.date_range(start=start_date, end=end_date, freq='D'))\n",
    "    \n",
    "    # Create cumulated time series per energy source for both yearly and daily time series\n",
    "    for energy_type in df['energy_type'].unique():\n",
    "        temp = (df[['commissioning_date', 'electrical_capacity']]\n",
    "            .loc[df['energy_type'] == energy_type])\n",
    "        temp_timeseries = temp.set_index('commissioning_date')\n",
    "        temp_timeseries.index = pd.DatetimeIndex(temp_timeseries.index)\n",
    "\n",
    "        # Create cumulated time series per energy_source and day\n",
    "        resampled = temp_timeseries.resample('D')\n",
    "        summed_by_day = resampled.sum()\n",
    "        cumulative_sums = summed_by_day.cumsum()\n",
    "        \n",
    "        daily_timeseries[energy_type] = cumulative_sums.fillna(method='ffill') # fill missing values\n",
    "        \n",
    "        # Make sure that the columns are properly filled\n",
    "        daily_timeseries[energy_type]= daily_timeseries[energy_type].fillna(method='ffill').fillna(value=0)\n",
    "    \n",
    "    # Reset the time index\n",
    "    daily_timeseries.reset_index(inplace=True)\n",
    "\n",
    "    # Set the index name\n",
    "    daily_timeseries.rename(columns={'index': 'day'}, inplace=True)\n",
    "    \n",
    "    # Drop the temporary column \"energy_type\"\n",
    "    df.drop('energy_type', axis=1, inplace=True)\n",
    "    return daily_timeseries\n",
    "\n",
    "eligible_for_timeseries = [country for country in countries if 'commissioning_date' in dfs[country].columns]\n",
    "#eligible_for_timeseries = ['CH', 'UK', 'DK', 'DE', 'SE', 'FR'] #\n",
    "possible_start_dates = [min_date(dfs[country]['commissioning_date']) for country in eligible_for_timeseries]\n",
    "possible_end_dates = [max_date(dfs[country]['commissioning_date']) for country in eligible_for_timeseries]\n",
    "\n",
    "#print(\"Possible start and end dates:\")\n",
    "#for country in eligible_for_timeseries:\n",
    "#    print(country, min_date(dfs[country]['commissioning_date']), max_date(dfs[country]['commissioning_date']))\n",
    "\n",
    "if len(possible_start_dates) == 0:\n",
    "    print('We cannot create timeseries from this data. Please, skip the cells which deal with timeseries.')\n",
    "\n",
    "start_date = min(possible_start_dates)\n",
    "end_date = max(possible_end_dates)\n",
    "\n",
    "for country in eligible_for_timeseries:\n",
    "    print(\"Timeseries for\", country)\n",
    "    daily_timeseries[country] = to_daily_timeseries(dfs[country], start_date, end_date)\n",
    "print('Done!')"
   ]
  },
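  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the construction concrete, here is a minimal sketch of the same resample-and-cumulate idea on three made-up plants of a single energy type. The names `plants`, `per_day`, `cumulated` and `series`, the dates and the capacities are assumptions for this example only. The sketch uses `reindex` to make the alignment with the full date range explicit, which is a simplification of what the function above does.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Three made-up plants of the same energy type (capacities in MW)\n",
    "plants = pd.DataFrame({\n",
    "    'commissioning_date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-04']),\n",
    "    'electrical_capacity': [1.0, 2.0, 4.0],\n",
    "})\n",
    "\n",
    "# Sum the capacity commissioned on each day, then accumulate over time\n",
    "per_day = plants.set_index('commissioning_date')['electrical_capacity'].resample('D').sum()\n",
    "cumulated = per_day.cumsum()\n",
    "\n",
    "# Align with the full output range: days before the first plant get 0,\n",
    "# days after the last commissioning keep the last known value\n",
    "full_range = pd.date_range(start='2019-12-31', end='2020-01-05', freq='D')\n",
    "series = cumulated.reindex(full_range).ffill().fillna(0)\n",
    "print(series)\n",
    "```\n",
    "\n",
    "Two plants commissioned on the same day are summed before accumulating, so the series steps from 0 to 3 MW on 2020-01-01 and to 7 MW on 2020-01-04."
   ]
  },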
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make separate series for Great Britain and Northern Ireland"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if 'UK' in dfs:\n",
    "    # Create the mask for Northern Ireland\n",
    "    ni_mask = dfs['UK']['country'] == 'Northern Ireland'\n",
    "\n",
    "    # Split the UK data\n",
    "    ni_df = dfs['UK'][ni_mask].copy()\n",
    "    gb_df = dfs['UK'][~ni_mask].copy()\n",
    "\n",
    "    # Make the timeseries for Northern Ireland\n",
    "    daily_timeseries['GB-NIR'] = to_daily_timeseries(ni_df, start_date, end_date)\n",
    "\n",
    "    # Make the timeseries for Great Britain (England, Wales, Scotland)\n",
    "    daily_timeseries['GB-GBN'] = to_daily_timeseries(gb_df, start_date, end_date)\n",
    "\n",
    "    # Renaming the entry for UK to conform to the ISO codes\n",
    "    daily_timeseries['GB-UKM'] = daily_timeseries.pop('UK')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create total wind columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the column \"wind\" as a sum of more specific wind technologies (onshore, offshore, other or unspecified)\n",
    "# if not only one is present in the series.\n",
    "for country in daily_timeseries:\n",
    "    columns = daily_timeseries[country].columns\n",
    "    wind_columns = ['wind_onshore', 'wind_offshore', 'wind_other or unspecified technology']\n",
    "    flags = {wind_column: wind_column in columns for wind_column in wind_columns}\n",
    "    present_technologies = [wind_column for wind_column in flags if flags[wind_column] == True]\n",
    "    if len(present_technologies) > 1:\n",
    "        print('Adding', ' and '.join(present_technologies) , 'for', country )\n",
    "        daily_timeseries[country]['wind'] = 0\n",
    "        for wind_column in flags:\n",
    "            if flags[wind_column]:\n",
    "                daily_timeseries[country]['wind'] += daily_timeseries[country][wind_column]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create one time series file containing al countries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "unified_daily_timeseries = pd.DataFrame(index=pd.date_range(start=start_date, end=end_date, freq='D'))\n",
    "# Append the country name to capacity columns' names\n",
    "for c in daily_timeseries:\n",
    "    new_columns = [c + \"_\" + col + \"_capacity\" if col != 'day' else 'day' for col in daily_timeseries[c].columns]\n",
    "    daily_timeseries[c].columns = new_columns\n",
    "    \n",
    "# Unify separate series\n",
    "unified_daily_timeseries = pd.concat(daily_timeseries.values(), axis=1, sort=False)\n",
    "\n",
    "# Make sure the day column appears only once\n",
    "days = unified_daily_timeseries['day']\n",
    "if not isinstance(days, pd.core.series.Series):\n",
    "    unified_daily_timeseries.drop('day', axis=1, inplace=True)\n",
    "    unified_daily_timeseries['day'] = days.iloc[:, 0]\n",
    "unified_daily_timeseries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sort columns alphabetically\n",
    "unified_daily_timeseries = unified_daily_timeseries.reindex(sorted(unified_daily_timeseries.columns), axis=1)\n",
    "unified_daily_timeseries = unified_daily_timeseries.set_index('day').reset_index() # move day column to first position\n",
    "\n",
    "# drop column DE_hydro because it is not all of hydro but only subsidised hydro, which could be misleading\n",
    "if 'DE' in countries and 'DE_hydro_capacity' in unified_daily_timeseries.columns:\n",
    "    unified_daily_timeseries.drop(columns='DE_hydro_capacity', inplace=True)\n",
    "# do the same for CH_hydro for the same reason\n",
    "if 'CH' in countries and 'CH_hydro_capacity' in unified_daily_timeseries.columns:\n",
    "    unified_daily_timeseries.drop(columns='CH_hydro_capacity', inplace=True)\n",
    "# Show some rows\n",
    "unified_daily_timeseries.tail(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Make the normalized dataframe for all the countries\n",
    "\n",
    "Here, we create a dataframe containing the following data for all the countries:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geographical_resolution = {\n",
    "    'PL' : 'power plant',\n",
    "    'FR' : lambda x: 'power plant' if x['data_source'] == 'OPEN DATA RESEAUX ENERGIES' else 'municipality',\n",
    "    'CH' : 'municipality',\n",
    "    'DE' : 'power plant',\n",
    "    'DK' : 'power plant',\n",
    "    'UK' : 'power plant',\n",
    "    'SE' : 'power plant',\n",
    "    'CZ' : 'power plant'\n",
    "}\n",
    "\n",
    "dfs_to_concat = []\n",
    "\n",
    "columns = [ 'electrical_capacity', 'energy_source_level_1', 'energy_source_level_2', 'energy_source_level_3',\n",
    "           'technology', 'data_source', 'nuts_1_region', 'nuts_2_region', 'nuts_3_region',\n",
    "           'lon', 'lat', 'municipality', 'country', 'commissioning_date', 'as_of_year', 'geographical_resolution'\n",
    "          ]\n",
    "\n",
    "for country in countries:\n",
    "    if country not in dfs:\n",
    "        continue\n",
    "        \n",
    "    #for column in columns:\n",
    "    #    if column not in dfs[country].columns:\n",
    "    #        print(column, 'not in the df of ', country, '-- check if that\\'s ok.')\n",
    "            \n",
    "    country_df = dfs[country].loc[:, columns].copy()\n",
    "    country_df['country'] = country\n",
    "    resolution = geographical_resolution[country]\n",
    "    \n",
    "    if callable(resolution):\n",
    "        country_df['geographical_resolution'] = country_df.apply(resolution, axis=1)\n",
    "    else:\n",
    "        country_df['geographical_resolution'] = geographical_resolution[country]\n",
    "    \n",
    "    dfs_to_concat.append(country_df)\n",
    "\n",
    "european_df = pd.concat(dfs_to_concat)\n",
    "european_df.reset_index(inplace=True, drop=True)\n",
    "\n",
    "european_df.drop_duplicates(inplace=True)\n",
    "european_df.reset_index(inplace=True, drop=True)\n",
    "\n",
    "european_df.sample(n=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Output\n",
    "This section finally writes the Data Package:\n",
    "* CSV + XLSX + SQLite\n",
    "* Meta data (JSON)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.makedirs(package_path, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make sure the daily timeseries has only the date part, not the full datetime with time information\n",
    "unified_daily_timeseries['day'] = unified_daily_timeseries['day'].dt.date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write data files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write CSV-files\n",
    "\n",
    "One csv-file for each country. This process will take some time depending on your hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write each country's dataset as a separate csv file\n",
    "table_names = {}\n",
    "dirty_explanations = {\n",
    "    'DE_dirty' : 'outvalidated_plants',\n",
    "    'FR_dirty' : 'outvalidated_plants'\n",
    "}\n",
    "\n",
    "for country in dfs:\n",
    "    print(country)\n",
    "    \n",
    "    if '_dirty' in country:\n",
    "        table_names[country] = 'res_plants_separated_' + country[:-6] + '_' + dirty_explanations.get(country, '')\n",
    "    else:\n",
    "        table_names[country] = 'renewable_power_plants_' + country\n",
    "    \n",
    "    dfs[country].to_csv(os.path.join(package_path, table_names[country]+'.csv'),\n",
    "            sep=',',\n",
    "            decimal='.',\n",
    "            date_format='%Y-%m-%d',\n",
    "            line_terminator='\\n',\n",
    "            encoding='utf-8',\n",
    "            index=False)\n",
    "    \n",
    "    print('\\tDone!')\n",
    "    \n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write daily cumulated time series as csv\n",
    "unified_daily_timeseries.to_csv(os.path.join(package_path, 'renewable_capacity_timeseries.csv'),\n",
    "        sep=',',\n",
    "        float_format='%.3f',\n",
    "        decimal='.',\n",
    "        date_format='%Y-%m-%d',\n",
    "        encoding='utf-8',\n",
    "        index=False)\n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "european_df.to_csv(os.path.join(package_path, 'renewable_power_plants_EU.csv'),\n",
    "            sep=',',\n",
    "            decimal='.',\n",
    "            date_format='%Y-%m-%d',\n",
    "            line_terminator='\\n',\n",
    "            encoding='utf-8',\n",
    "            index=False)\n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Remove functions from the markers dictionary\n",
    "validation_markers_new = {}\n",
    "for marker_key in validation_markers:\n",
    "    validation_markers_new[marker_key] = {}\n",
    "    for description_key in validation_markers[key]:\n",
    "        description = validation_markers[marker_key][description_key]\n",
    "        if type(description) == str:\n",
    "            validation_markers_new[marker_key][description_key] = description\n",
    "    \n",
    "validation_markers = validation_markers_new\n",
    "\n",
    "# Write csv of Marker Explanations\n",
    "validation_marker_df = pd.DataFrame(validation_markers).transpose()\n",
    "validation_marker_df = validation_marker_df.iloc[:, ::-1] # Reverse column order\n",
    "validation_marker_df.index.name = 'Validation marker'\n",
    "validation_marker_df.reset_index(inplace=True)\n",
    "validation_marker_df.to_csv(os.path.join(package_path, 'validation_marker.csv'), \n",
    "        sep=',',\n",
    "        decimal='.',\n",
    "        date_format='%Y-%m-%d',\n",
    "        line_terminator='\\n',\n",
    "        encoding='utf-8',\n",
    "        index=False\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write XLSX-files\n",
    "\n",
    "All country power plant list will be written in one xlsx-file. Each country power plant list is written in a separate sheet. If a country's dataset is too large to fit into a single sheet, we split it into several separate sheets. Each sheet is limited to 1,000,000 power plants at most.\n",
    "\n",
    "The timeseries is placed in a sheet of its own.\n",
    "\n",
    "An additional sheet includes the explanations of the markers.\n",
    "\n",
    "*Note*: This process may take some time depending on your hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the function which writes a row of data to a row in an Excel sheet\n",
    "def write_row(row, sheet, data_index, offset=0):\n",
    "    sheet_row_index = data_index + 1 - offset\n",
    "    for j, field in enumerate(row):\n",
    "        if pd.isna(field):\n",
    "            field = ''\n",
    "        sheet.write(sheet_row_index, j, field)\n",
    "    #sheet.write_row(sheet_row_index, 0, row)\n",
    "\n",
    "# Define the function for converting column number in Excel names\n",
    "# Note: the column numbering is assumed to start from zero.\n",
    "def excel_column_name(number):\n",
    "    if number < 26:\n",
    "        return chr(ord('A') + number)\n",
    "    else:\n",
    "        return excel_column_name((number // 26) - 1) + excel_column_name(number % 26)\n",
    "\n",
    "# Define the function for creating a named sheet in the given xlsxwriter Workbook, writing its header\n",
    "# and formatting its columns according to a given dictionary of pairs {data_column: excel_format}.\n",
    "def create_sheet(book, sheet_name, header, formats={}):\n",
    "    # Create the sheet\n",
    "    sheet = book.add_worksheet(name=sheet_name)\n",
    "\n",
    "    # Write the header\n",
    "    sheet.write_row(0, 0, header)\n",
    "            \n",
    "    # Set the format of date columns\n",
    "    for j in range(len(header)):\n",
    "        column_format = formats.get(header[j], None)\n",
    "        if column_format is not None:\n",
    "            excel_name = excel_column_name(j)\n",
    "            sheet.set_column('{}:{}'.format(excel_name, excel_name), None, column_format)\n",
    "    \n",
    "    return sheet\n",
    "\n",
    "# Define the function for writing a dataframe to an Excel file efficiently.\n",
    "# The goal is not just to split a df across several sheets if it's too large,\n",
    "# but also to reduce the RAM usage and make the process run faster.\n",
    "# Make sure that book is an xlsxwriter.Workbook with constant memory set to True.\n",
    "# That way, the data written to its sheets will be flushed after each row, so RAM usage\n",
    "# will be kept at a constant level. Without constant_memory set to True, the same data would be copied\n",
    "# to the sheets, which are kept in RAM, so the total amount of RAM used to store the data would at least double\n",
    "# which could make everything slower if you don't have enough RAM.\n",
    "# Note: pandas does provide method to_excel which receives an Excel writer, but\n",
    "# even if you set its parameter constant_memory to True, it won't work because to_excel writes the data\n",
    "# column by column, whereas constant_memory requires the data to be written row after row. \n",
    "def write_df_to_excel(df, book, max_sheet_size, formats={}, name=None):\n",
    "    # Get the number of rows in the df\n",
    "    number_of_rows = df.shape[0]\n",
    "\n",
    "    # Get the df's header \n",
    "    header = df.columns\n",
    "\n",
    "    # Check if the df is too large to fit in to one sheet.\n",
    "    if number_of_rows > max_sheet_size:\n",
    "        # If so, define the indices which will separate the sheets.\n",
    "        boundaries = list(range(0, number_of_rows, max_sheet_size)) + [number_of_rows]\n",
    "\n",
    "        # Include the final index if it is not already there\n",
    "        if boundaries[-1] != number_of_rows:\n",
    "            boundaries += [number_of_rows]\n",
    "\n",
    "        number_of_sheets = len(boundaries) - 1\n",
    "        \n",
    "        # Split the data across the sheets\n",
    "        # so that the i-th sheet contains the data whose indices are in the range [splitters[i], splitters[i+1])\n",
    "        print('\\tSplitting the data into {} sheets'.format(number_of_sheets))\n",
    "        \n",
    "        for i in range(number_of_sheets):\n",
    "            # Define the sheet's name.\n",
    "            if name is not None:\n",
    "                sheet_name_format = name + ' part-{}'\n",
    "            else:\n",
    "                sheet_name_format = 'part-{}'\n",
    "            sheet_name = sheet_name_format.format(i + 1) # i + 1 because i is a zero-based index\n",
    "                                                         # and 1-based indices are more readable\n",
    "            \n",
    "            # Create the sheet\n",
    "            sheet = create_sheet(book, sheet_name, header, formats)\n",
    "                    \n",
    "            # Get the sheet's boundary indices.\n",
    "            start = boundaries[i]\n",
    "            end = boundaries[i + 1]\n",
    "            \n",
    "            # Calculate the offset. It is 0 for the first sheet. \n",
    "            # For all the other sheets, it is equal to the total number of rows written before the sheet at hand.\n",
    "            if i == 0:\n",
    "                offset = 0\n",
    "            else:\n",
    "                offset = i * max_sheet_size\n",
    "            \n",
    "            # Write the data.\n",
    "            print('\\t\\tWriting [{}:{}] into the sheet number {}'.format(start, end, i + 1))\n",
    "            for data_int_index in range(number_of_rows):\n",
    "                row = df.iloc[data_int_index, :]\n",
    "                write_row(row, sheet, data_int_index, offset)\n",
    "    else:\n",
    "        # Create the sheet for the df\n",
    "        sheet = create_sheet(book, name, header, formats)\n",
    "\n",
    "        # Write the data\n",
    "        for data_int_index in range(number_of_rows):\n",
    "            row = df.iloc[data_int_index, :]\n",
    "            write_row(row, sheet, data_int_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an empty xlsx file\n",
    "xlsx_path = os.path.join(package_path, 'renewable_power_plants.xlsx')\n",
    "book = xlsxwriter.Workbook(xlsx_path, {'constant_memory': True})\n",
    "\n",
    "# Set the max size of a sheet to 1,000,000 power plants.\n",
    "max_sheet_size = 10**6\n",
    "\n",
    "# Define the format for the dates\n",
    "date_format = book.add_format({'num_format':'yyyy-mm-dd'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the datasets of the countries (dirty as well as clean).\n",
    "for country in dfs:\n",
    "    print(country)\n",
    "    \n",
    "    # Specify the Excel formatting rules\n",
    "    column_formats = {column: date_format for column in dfs[country] if 'date' in column}\n",
    "    \n",
    "    # Write the country data to the Excel file\n",
    "    write_df_to_excel(dfs[country], book, max_sheet_size, formats=column_formats, name=country)\n",
    "    print('\\tDone!')\n",
    "\n",
    "# Write the markers' explanations\n",
    "print('Write the validation markers sheet')\n",
    "write_df_to_excel(validation_marker_df, book, max_sheet_size, name='validation_marker')\n",
    "print('\\tDone!')\n",
    "\n",
    "# Write the timeseries \n",
    "print('Write the timeseries')\n",
    "# Make sure that `day` gets formatted as a date yyyy-mm-dd\n",
    "column_formats = {'day' : date_format}\n",
    "# Write the series\n",
    "write_df_to_excel(unified_daily_timeseries, book, max_sheet_size, formats=column_formats,\n",
    "                  name='capacity_timeseries')\n",
    "print('\\tDone!')\n",
    "\n",
    "print('Save the Excel file.')\n",
    "book.close()   \n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write SQLite"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine = sqlalchemy.create_engine('sqlite:///' + package_path + '/renewable_power_plants.sqlite')  \n",
    "\n",
    "for country in all_countries_including_dirty:\n",
    "    if country in dfs:\n",
    "        print(country)\n",
    "        # Parameter chunksize is for lower-memory computers. Removing it might speed things up.\n",
    "        dfs[country].to_sql(table_names[country], engine, if_exists=\"replace\", chunksize=100000, index=False)\n",
    "        print('\\tDone!')\n",
    "\n",
    "print('Validation markers')\n",
    "validation_marker_df.to_sql('validation_marker', engine, if_exists=\"replace\", chunksize=100000, index=False)\n",
    "print('Done!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the european df as sqlite\n",
    "european_df.to_sql('renewable_power_plants_EU', engine, if_exists=\"replace\", chunksize=100000, index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save timeseries as sqlite\n",
    "unified_daily_timeseries.to_sql('renewable_capacity_timeseries', engine, if_exists=\"replace\", chunksize=100000, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Write meta data\n",
    "\n",
    "The Data Packages meta data are created in the specific JSON format as proposed by the Open Knowledge Foundation. Please see the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.\n",
    "\n",
    "In order to keep the Jupyter Notebook more readable the metadata is written in the human-readable YAML format using a multi-line string and then parse the string into a Python dictionary and save it as a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:53.267728Z",
     "start_time": "2020-08-21T01:29:52.153125Z"
    }
   },
   "outputs": [],
   "source": [
    "# Automatically generate some metadata strings such as the list of countries covered, the list of sources etc.\n",
    "country_list_filepath = os.path.join('input', 'countries.csv')\n",
    "countries_df = pd.read_csv(country_list_filepath)\n",
    "countries_df.set_index('full_name', inplace=True)\n",
    "countries_dict = countries_df.to_dict(orient='index')\n",
    "covered_countries = \"\"\n",
    "\n",
    "countries = sorted(countries_dict.keys())\n",
    "\n",
    "for country in countries:\n",
    "    description = countries_dict[country]['data_description']\n",
    "    covered_countries += '    {}: {}\\n'.format(country, description)\n",
    "\n",
    "list_of_countries = ', '.join(countries[:-1]) + ' and ' + countries[-1]\n",
    "list_of_countries_noand = ', '.join(countries)\n",
    "list_of_countries_keywords = ','.join(countries).lower()\n",
    "\n",
    "source_list_filepath = os.path.join('input', 'sources.csv')\n",
    "sources_df = pd.read_csv(source_list_filepath)\n",
    "\n",
    "sources_metadata = \"\"\"    - title: Postleitzahlen Deutschland\n",
    "      path: http://www.suche-postleitzahl.org/downloads\n",
    "      description: Zip codes of Germany linked to geo-information\n",
    "    - title: GeoNames\n",
    "      path: http://download.geonames.org/export/zip/\n",
    "      description: The GeoNames geographical database which covers all countries and contains over eleven million placenames that are available for download free of charge.\n",
    "    - title: Eurostat (NUTS tables)\n",
    "      path: https://ec.europa.eu/eurostat/home?\n",
    "      description: The data for mapping coordinates, postcodes, municipality names and codes to NUTS region codes.\n",
    "\"\"\"\n",
    "source_format = \"\"\"    - title: {source}\n",
    "      path: {url}\n",
    "      description: {short_description}\n",
    "\"\"\"\n",
    "\n",
    "geonames_eurostat_mask = sources_df['source'].isin(['Geonames', 'Eurostat'])\n",
    "sources_df = sources_df[~geonames_eurostat_mask]\n",
    "#display(sources_df)\n",
    "\n",
    "sources_dict = sources_df[[\"full_name\", \"url\", \"short_description\"]]\\\n",
    "    .set_index(\"full_name\" )\\\n",
    "    .to_dict(orient=\"index\")\n",
    "\n",
    "for source in sources_dict:\n",
    "    source_metadata = {'source' : source}\n",
    "    source_metadata.update(sources_dict[source])\n",
    "    sources_metadata += source_format.format(**source_metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:29:53.590389Z",
     "start_time": "2020-08-21T01:29:53.269779Z"
    }
   },
   "outputs": [],
   "source": [
    "metadata = \"\"\"\n",
    "hide: yes\n",
    "profile: data-package\n",
    "_metadataVersion: 1.2\n",
    "name: opsd_renewable_power_plants\n",
    "title: Renewable power plants\n",
    "description: List of renewable energy power stations\n",
    "longDescription: >-\n",
    "    This Data Package contains a list of renewable energy power plants in lists of \n",
    "    renewable energy-based power plants of {list_of_countries}. \n",
    "{covered_countries}\n",
    "    Due to different data availability, the power plant lists are of different \n",
    "    accurancy and partly provide different power plant parameter. Due to that, the \n",
    "    lists are provided as seperate csv-files per country and as separate sheets in the\n",
    "    excel file. Suspect data or entries with high probability of duplication are marked\n",
    "    in the column 'comment'. Theses validation markers are explained in the file\n",
    "    validation_marker.csv.\n",
    "    Additionally, the Data Package includes daily time series of cumulated\n",
    "    installed capacity per energy source type for Germany, Denmark, Switzerland, the United Kingdom and Sweden. All data processing is \n",
    "    conducted in Python and pandas and has been documented in the Jupyter Notebooks linked below. \n",
    "keywords: [master data register,power plants,renewables,{list_of_countries_keywords},open power system data]\n",
    "spatial: \n",
    "    location: {list_of_countries_noand}\n",
    "    resolution: Power plants, municipalities\n",
    "resources:\n",
    "    - name: renewable_power_plants_de\n",
    "      profile: tabular-data-resource\n",
    "      path: renewable_power_plants_DE.csv\n",
    "      title: Renewable power plants in Germany\n",
    "      format: csv\n",
    "      mediatype: text/csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          primaryKey: eeg_id\n",
    "          fields:\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              unit: MW\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. DE1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. DE11).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. DE111).\n",
    "              type: string\n",
    "            - name: lon\n",
    "              description: Longitude coordinates\n",
    "              type: number\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: municipality\n",
    "              description: Name of German Gemeinde (municipality)\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: German Gemeindenummer (municipalitiy number)\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: German zip-code\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Street name or name of land parcel\n",
    "              type: string\n",
    "            - name: federal_state\n",
    "              description: Name of German administrative level 'Bundesland'\n",
    "              type: string\n",
    "            - name: commissioning_date\n",
    "              description: Date of commissioning of specific unit\n",
    "              type: date\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: decommissioning_date\n",
    "              description: Date of decommissioning of specific unit\n",
    "              type: date\n",
    "            - name: voltage_level\n",
    "              description: Voltage level of grid connection\n",
    "              type: string\n",
    "            - name: eeg_id\n",
    "              description: Power plant EEG (German feed-in tariff law) remuneration number\n",
    "              type: string\n",
    "            - name: dso\n",
    "              description: Name of distribution system operator of the region the plant is located in\n",
    "              type: string\n",
    "            - name: dso_id\n",
    "              description: Company number of German distribution grid operator\n",
    "              type: string\n",
    "            - name: tso\n",
    "              description: Name of transmission system operator of the area the plant is located\n",
    "              type: string\n",
    "    - name: renewable_power_plants_dk\n",
    "      path: renewable_power_plants_DK.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              description: date of the plant's commissioning\n",
    "              type: date\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source.\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "            - name: dso\n",
    "              description: Name of distribution system operator of the region the plant is located in\n",
    "              type: string\n",
    "            - name: gsrn_id\n",
    "              description: Danish wind turbine identifier number (GSRN)\n",
    "              type: integer\n",
    "            - name: postcode\n",
    "              description: Danish zip-code\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: Danish 3-digit Kommune-Nr\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Name of Danish Kommune\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. DK0).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. DK01).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. DK013).\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Street name or name of land parcel\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: number\n",
    "            - name: hub_height\n",
    "              description: Wind turbine hub heigth in m\n",
    "              type: number\n",
    "            - name: rotor_diameter\n",
    "              description: Wind turbine rotor diameter in m\n",
    "              type: number\n",
    "            - name: manufacturer\n",
    "              description: Company that has built the wind turbine\n",
    "              type: string\n",
    "            - name: model\n",
    "              description: Wind turbine model type\n",
    "              type: string\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - name: renewable_power_plants_fr\n",
    "      path: renewable_power_plants_FR.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: site_name\n",
    "              description: The power plant's name.\n",
    "              type: string\n",
    "            - name: EIC_code\n",
    "              description: Energy Identification Code - the plant's unique identifier in the French grid\n",
    "              type: string\n",
    "            - name: IRIS_code\n",
    "              description: IRIS code\n",
    "              type: string\n",
    "            - name: commissioning_date\n",
    "              description: The date of the plant's commissioning\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: date\n",
    "            - name: connection_date\n",
    "              description: The data when the plant was connected to the French grid\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: date\n",
    "            - name: disconnection_date\n",
    "              description: The date that the plant was disconnected from the French grid\n",
    "              opsdContentFilter: \"true\"\n",
    "              type: date\n",
    "            - name: departement\n",
    "              description: The name of the French departement\n",
    "              type: string\n",
    "            - name: departement_code\n",
    "              description: The number of the French departement \n",
    "              type: integer\n",
    "            - name: municipality\n",
    "              description: Name of French Commune\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: French 5-digit INSEE code for Communes\n",
    "              type: integer\n",
    "            - name: municipality_group\n",
    "              description: Name of the group of municipalities the plant is located in.\n",
    "              type: string\n",
    "            - name: municipality_group_code\n",
    "              description: Code of the group of municipalities the plant is located in.\n",
    "              type: integer\n",
    "            - name: region\n",
    "              description: Name of the French region\n",
    "              type: string\n",
    "            - name: region_code\n",
    "              description: Code of the French region\n",
    "              type: integer\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. FR1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. FR10).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. FR101).\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "            - name: number_of_installations\n",
    "              description: Number of installations of the energy source subtype in the municipality. Due to confidentiality reasons, the values smaller than 3 are published as ''<3'' (as in the source).\n",
    "              type: integer\n",
    "              bareNumber: false\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: as_of_year\n",
    "              description: Year for which the data source compiled the original dataset.\n",
    "              type: integer\n",
    "    - name: renewable_power_plants_pl\n",
    "      path: renewable_power_plants_PL.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          primaryKey: URE_id\n",
    "          fields:\n",
    "            - name: URE_id\n",
    "              type: integer\n",
    "              description: The URE id of the plant.\n",
    "            - name: region\n",
    "              type: string\n",
    "              description: The name of the Polish voivodeship.\n",
    "            - name: district\n",
    "              description: The name of the Polish powiat.\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. PL1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. PL11).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. PL113).\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: string\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: as_of_year\n",
    "              description: Year for which the data source compiled the original dataset.\n",
    "              type: integer\n",
    "    - name: renewable_power_plants_uk\n",
    "      path: renewable_power_plants_UK.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          primaryKey: uk_beis_id\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              description: Date of commissioning of specific unit\n",
    "              type: date\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: uk_beis_id\n",
    "              description: ID for the plant as assigned by UK BEIS.\n",
    "              type: integer\n",
    "            - name: site_name\n",
    "              description: Name of site\n",
    "              type: string\n",
    "            - name: operator\n",
    "              description: Name of operator\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: string\n",
    "            - name: energy_source_level_3\n",
    "              description: Type of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              unit: MW\n",
    "              type: number\n",
    "            - name: chp\n",
    "              description: Is the project capable of combined heat and power output\n",
    "              type: string\n",
    "            - name: capacity_individual_turbine\n",
    "              description: For windfarms, the individual capacity of each wind turbine in megawatts (MW)\n",
    "              type: number\n",
    "            - name: number_of_turbines\n",
    "              description: For windfarms, the number of wind turbines located on the site\n",
    "              type: integer\n",
    "            - name: solar_mounting_type\n",
    "              description: For solar PV developments, whether the PV panels are ground or roof mounted\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Address\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Municipality\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. UKD).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. UKD1).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. UKC12).\n",
    "              type: string\n",
    "            - name: region\n",
    "              description: Region\n",
    "              type: string\n",
    "            - name: country\n",
    "              description: The UK's constituent country in which the facility is located.\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: Postcode\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: string\n",
    "            - name: lon\n",
    "              description: Longitude coordinates\n",
    "              type: string\n",
    "            - name: data_source\n",
    "              description: The source of database entries\n",
    "              type: string\n",
    "            - name: comment\n",
    "              description: Shortcodes for comments related to this entry, explanation can be looked up in validation_marker.csv\n",
    "              type: string\n",
    "    - name: renewable_power_plants_ch\n",
    "      path: renewable_power_plants_CH.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              description: Commissioning date\n",
    "              type: date\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: municipality\n",
    "              description: Municipality\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. CH0).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. CH03).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. CH031).\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Type of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "            - name: municipality_code\n",
    "              description: Municipality code\n",
    "              type: integer\n",
    "            - name: project_name\n",
    "              description: Name of the project\n",
    "              type: string\n",
    "            - name: production\n",
    "              description: Yearly production in MWh\n",
    "              type: number\n",
    "            - name: tariff\n",
    "              description: Tariff in CHF for 2016\n",
    "              type: number\n",
    "            - name: contract_period_end\n",
    "              description: End of subsidy contract\n",
    "              type: date\n",
    "            - name: address\n",
    "              description: Street name\n",
    "              type: string\n",
    "            - name: canton\n",
    "              description: Name of the cantones/ member states of the Swiss confederation\n",
    "              type: string\n",
    "            - name: company\n",
    "              description: Name of the company\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinate\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinate \n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: Swiss zip code\n",
    "              type: string\n",
    "    - name: renewable_power_plants_se\n",
    "      path: renewable_power_plants_SE.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              description: Commissioning date\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: date\n",
    "            - name: se_vindbrukskollen_id\n",
    "              description: The id in the vindbrukskollen data\n",
    "              type: string\n",
    "            - name: site_name \n",
    "              description: Name of site\n",
    "              type: string\n",
    "            - name: manufacturer\n",
    "              description: Manufacturer\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Municipality\n",
    "              type: string\n",
    "            - name: county\n",
    "              description: County\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. SE0).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. SE02).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. SE021).\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates\n",
    "              type: number\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "            - name: energy_source_level_3\n",
    "              description: Type of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW.\n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - name: renewable_power_plants_cz\n",
    "      path: renewable_power_plants_CZ.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: site_name \n",
    "              description: Name of site\n",
    "              type: string\n",
    "            - name: owner\n",
    "              description: Owner\n",
    "              type: string\n",
    "            - name: region\n",
    "              description: Region\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Municipality\n",
    "              type: string\n",
    "            - name: locality\n",
    "              description: Town or village\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: Postcode\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates\n",
    "              type: number\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. CZ0).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. CZ08).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. CZ08-).\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "            - name: energy_source_level_3\n",
    "              description: Type of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW.\n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "    - name: res_plants_separated_de_outvalidated_plants\n",
    "      path: res_plants_separated_DE_outvalidated_plants.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:  \n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: commissioning_date\n",
    "              type: date\n",
    "              description: Date of commissioning of specific unit\n",
    "            - name: decommissioning_date\n",
    "              type: date\n",
    "              description: Date of decommissioning of specific unit\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "              unit: MW\n",
    "            - name: voltage_level\n",
    "              description: Voltage level of grid connection\n",
    "              type: string\n",
    "            - name: tso\n",
    "              description: Name of transmission system operator of the area the plant is located\n",
    "              type: string\n",
    "            - name: dso\n",
    "              description: Name of distribution system operator of the region the plant is located in\n",
    "              type: string\n",
    "            - name: dso_id\n",
    "              description: Company number of German distribution grid operator\n",
    "              type: string\n",
    "            - name: eeg_id\n",
    "              description: Power plant EEG (German feed-in tariff law) remuneration number\n",
    "              type: string\n",
    "            - name: federal_state\n",
    "              description: Name of German administrative level 'Bundesland'\n",
    "              type: string\n",
    "            - name: postcode\n",
    "              description: German zip-code\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: German Gemeindenummer (municipalitiy number)\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: Name of German Gemeinde (municipality)\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. DE1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. DE11).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. DE111).\n",
    "              type: string\n",
    "            - name: address\n",
    "              description: Street name or name of land parcel\n",
    "              type: string\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: comment\n",
    "              description: Shortcodes for comments related to this entry, explanation can be looked up in validation_marker.csv\n",
    "              type: string\n",
    "    - name: res_plants_separated_fr_outvalidated_plants\n",
    "      path: res_plants_separated_FR_outvalidated_plants.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: site_name\n",
    "              description: The power plant's name.\n",
    "              type: string\n",
    "            - name: EIC_code\n",
    "              description: Energy Identification Code - the plant's unique identifier in the French grid\n",
    "              type: string\n",
    "            - name: IRIS_code\n",
    "              description: IRIS code\n",
    "              type: string\n",
    "            - name: commissioning_date\n",
    "              description: The date of the plant's commissioning\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: date\n",
    "            - name: connection_date\n",
    "              description: The data when the plant was connected to the French grid\n",
    "              opsdContentfilter: \"true\"\n",
    "              type: date\n",
    "            - name: disconnection_date\n",
    "              description: The date that the plant was disconnected from the French grid\n",
    "              opsdContentFilter: \"true\"\n",
    "              type: date\n",
    "            - name: departement\n",
    "              description: The name of the French departement\n",
    "              type: string\n",
    "            - name: departement_code\n",
    "              description: The number of the French departement \n",
    "              type: integer\n",
    "            - name: municipality\n",
    "              description: Name of French Commune\n",
    "              type: string\n",
    "            - name: municipality_code\n",
    "              description: French 5-digit INSEE code for Communes\n",
    "              type: integer\n",
    "            - name: municipality_group\n",
    "              description: Name of the group of municipalities the plant is located in.\n",
    "              type: string\n",
    "            - name: municipality_group_code\n",
    "              description: Code of the group of municipalities the plant is located in.\n",
    "              type: integer\n",
    "            - name: region\n",
    "              description: Name of the French region\n",
    "              type: string\n",
    "            - name: region_code\n",
    "              description: Code of the French region\n",
    "              type: integer\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. FR1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. FR10).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. FR101).\n",
    "              type: string\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Subtype of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: technology\n",
    "              description: Technology to harvest energy source (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              unit: MW\n",
    "              description: Installed electrical capacity in MW\n",
    "              type: number\n",
    "            - name: number_of_installations\n",
    "              description: Number of installations of the energy source subtype in the municipality. Due to confidentiality reasons, the values smaller than 3 are published as ''<3'' (as in the source).\n",
    "              type: integer\n",
    "              bareNumber: false\n",
    "            - name: lat\n",
    "              description: Latitude coordinates\n",
    "              type: number\n",
    "            - name: lon\n",
    "              description: Longitude coordinates \n",
    "              type: number\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: as_of_year\n",
    "              description: Year for which the data source compiled the original dataset.\n",
    "              type: integer\n",
    "            - name: comment\n",
    "              description: Shortcodes for comments related to this entry, explanation can be looked up in validation_marker.csv\n",
    "              type: string\n",
    "    - name: validation_marker\n",
    "      path: validation_marker.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      mediatype: text/csv\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          primaryKey: Validation marker\n",
    "          fields:\n",
    "            - name: Validation marker\n",
    "              description: Name of validation marker utilized in column comment in the renewable_power_plant_germany.csv\n",
    "              type: string\n",
    "            - name: Long explanation\n",
    "              description: Explanation of the validation marker\n",
    "              type: string\n",
    "            - name: Short explanation\n",
    "              description: Short comment on the meaning of the marker\n",
    "              type: string\n",
    "            - name: Country\n",
    "              description: The country for which the marker is defined.\n",
    "              type: string\n",
    "    - name: renewable_power_plants_eu\n",
    "      path: renewable_power_plants_EU.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      mediatype: text/csv\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "          fields:\n",
    "            - name: energy_source_level_1\n",
    "              description: Type of energy source (e.g. Renewable energy)\n",
    "              type: string\n",
    "            - name: energy_source_level_2\n",
    "              description: Type of energy source (e.g. Wind, Solar)\n",
    "              type: string\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: energy_source_level_3\n",
    "              description: Type of energy source (e.g. Biomass and biogas)\n",
    "              type: string\n",
    "            - name: electrical_capacity\n",
    "              description: Installed electrical capacity in MW\n",
    "              unit: MW\n",
    "              type: number\n",
    "            - name: technology\n",
    "              description: Technology to harvest energt (e.g. Onshore, Photovoltaics)\n",
    "              type: string\n",
    "            - name: data_source\n",
    "              description: Source of database entry\n",
    "              type: string\n",
    "            - name: municipality\n",
    "              description: The name of the municipality in which the facility is located.\n",
    "              type: string\n",
    "            - name: nuts_1_region\n",
    "              description: The code of the NUTS 1 region the facility is in (e.g. NL1).\n",
    "              type: string\n",
    "            - name: nuts_2_region\n",
    "              description: The code of the NUTS 2 region the facility is in (e.g. NL11).\n",
    "              type: string\n",
    "            - name: nuts_3_region\n",
    "              description: The code of the NUTS 3 region the facility is in (e.g. NL112).\n",
    "              type: string\n",
    "            - name: lon\n",
    "              description: Geographical longitude\n",
    "              type: number\n",
    "            - name: lat\n",
    "              description: Geographical latitude\n",
    "              type: number\n",
    "            - name: commissioning_date\n",
    "              type: date\n",
    "              description: Date of commissioning of specific unit\n",
    "            - name: geographical_resolution\n",
    "              description: Precision of geographical information (exact power plant location, municipality, district)\n",
    "              type: string\n",
    "            - name: as_of_year\n",
    "              description: Year for which the data source compiled the corresponding dataset\n",
    "              type: integer\n",
    "            - name: country\n",
    "              description: The country in which the facility is located\n",
    "              type: string\n",
    "    - name: renewable_power_plants_xlsx\n",
    "      profile: data-resource\n",
    "      path: renewable_power_plants.xlsx\n",
    "      title: The whole package\n",
    "      description: The whole package as an Excel file with each country written in a sheet except for Germany which is spread across two, alongside with timeseries and validation markers.\n",
    "      format: xlsx\n",
    "      mediatype: application/vnd.ms-excel\n",
    "      encoding: UTF-8\n",
    "      schema:\n",
    "          missingValues: [\"\"]\n",
    "    - name: renewable_capacity_timeseries\n",
    "      path: renewable_capacity_timeseries.csv\n",
    "      profile: tabular-data-resource\n",
    "      format: csv\n",
    "      encoding: UTF-8\n",
    "      mediatype: text/csv\n",
    "      schema: \n",
    "          missingValues: [\"\"]\n",
    "          primaryKey: day\n",
    "          fields:\n",
    "            - name: day\n",
    "              type: date\n",
    "              description: The day of the timeseries entry\n",
    "              opsdContentfilter: \"true\"\n",
    "            - name: CH_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for Switzerland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Switzerland\n",
    "                Variable: Bioenergy\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Swiss Federal Office of Energy\n",
    "                path: input/original_data/CH/BFE/9669-Liste aller KEV-Bezüger im Jahr 2018.xlsx\n",
    "            - name: CH_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for Switzerland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Switzerland\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Swiss Federal Office of Energy\n",
    "                path: input/original_data/CH/BFE/9669-Liste aller KEV-Bezüger im Jahr 2018.xlsx\n",
    "            - name: CH_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Switzerland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Switzerland\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Swiss Federal Office of Energy\n",
    "                path: input/original_data/CH/BFE/9669-Liste aller KEV-Bezüger im Jahr 2018.xlsx\n",
    "            - name: DE_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Bioenergy\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DE_geothermal_capacity\n",
    "              description: Cumulative geothermal electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Geothermal\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DE_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DE_wind_capacity \n",
    "              ription: Cumulative total wind electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Wind\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DE_wind_offshore_capacity\n",
    "              description: Cumulative offshore wind electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Wind offshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DE_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Germany in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Germany\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BNetzA and Netztransparenz.de\n",
    "            - name: DK_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for Denmark in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Denmark\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Energinet.dk\n",
    "                path: input/original_data/DK/Energinet/SolcellerGraf-2016-11.xlsx\n",
    "            - name: DK_wind_capacity \n",
    "              description: Cumulative total wind electrical capacity for Denmark in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Denmark\n",
    "                Variable: Wind\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Danish Energy Agency\n",
    "                path: input/original_data/DK/Energistyrelsen/anlaegprodtilnettet.xls\n",
    "            - name: DK_wind_offshore_capacity\n",
    "              description: Cumulative offshore wind electrical capacity for Denmark in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Denmark\n",
    "                Variable: Wind offshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Danish Energy Agency\n",
    "                path: input/original_data/DK/Energistyrelsen/anlaegprodtilnettet.xls\n",
    "            - name: DK_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Denmark in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Denmark\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Danish Energy Agency\n",
    "                path: input/original_data/DK/Energistyrelsen/anlaegprodtilnettet.xls\n",
    "            - name: FR_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable:\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire\n",
    "            - name: FR_geothermal_capacity\n",
    "              description: Cumulative geothermal electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable: Geothermal\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire\n",
    "            - name: FR_hydro_capacity\n",
    "              description: Cumulative hydroenergy electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable: Hydro\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire\n",
    "            - name: FR_marine_capacity\n",
    "              description: Cumulative marine electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable: Marine\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire.\n",
    "            - name: FR_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire.\n",
    "            - name: FR_wind_onshore_capacity\n",
    "              description: Cumulative wind onshore electrical capacity for France in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: France\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from ODRE and Ministère de la Transition écologique et solidaire.\n",
    "            - name: GB-GBN_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Bioenergy\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_hydro_capacity\n",
    "              description: Cumulative hydro electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Hydro\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_marine_capacity\n",
    "              description: Cumulative marine electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Marine\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_wind_capacity\n",
    "              description: Cumulative total wind electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Wind\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_wind_offshore_capacity\n",
    "              description: Cumulative offshore wind electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Wind offshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-GBN_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Great Britain (England, Scotland, Wales) in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Great Britain (England, Scotland, Wales)\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-NIR_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Northern Ireland\n",
    "                Variable: Bioenergy\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-NIR_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Northern Ireland\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-NIR_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Northern Ireland\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_bioenergy_capacity\n",
    "              description: Cumulative bioenergy electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Bioenergy\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_hydro_capacity\n",
    "              description: Cumulative hydro electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Hydro\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_marine_capacity\n",
    "              description: Cumulative marine electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Marine\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_solar_capacity\n",
    "              description: Cumulative solar electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Solar\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_wind_capacity\n",
    "              description: Cumulative total wind electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Wind\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_wind_offshore_capacity\n",
    "              description: Cumulative offshore wind electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Wind offshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: GB-UKM_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for the United Kingdom of Great Britain and Northern Ireland in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: United Kingdom of Great Britain and Northern Ireland\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from BEIS\n",
    "                path: input/original_data/UK/BEIS/renewable-energy-planning-database-march-2020-update.csv\n",
    "            - name: SE_wind_capacity\n",
    "              description: Cumulative total wind electrical capacity for Sweden in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Sweden\n",
    "                Variable: Wind\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Vindbrukskollen\n",
    "                path: input/original_data/SE/Vindbrukskollen/VBK_export_allman_prod.xlsx\n",
    "            - name: SE_wind_offshore_capacity\n",
    "              description: Cumulative offshore wind electrical capacity for Sweden in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Sweden\n",
    "                Variable: Wind offshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Vindbrukskollen\n",
    "                path: input/original_data/SE/Vindbrukskollen/VBK_export_allman_prod.xlsx\n",
    "            - name: SE_wind_onshore_capacity\n",
    "              description: Cumulative onshore wind electrical capacity for Sweden in MW\n",
    "              unit: MW\n",
    "              opsdProperties:\n",
    "                Region: Sweden\n",
    "                Variable: Wind onshore\n",
    "              type: number\n",
    "              source:\n",
    "                title: Own calculation based on plant-level data from Vindbrukskollen\n",
    "                path: input/original_data/SE/Vindbrukskollen/VBK_export_allman_prod.xlsx\n",
    "sources:\n",
    "{sources_metadata}\n",
    "contributors:\n",
    "    - title: Ingmar Schlecht\n",
    "      role: maintainer\n",
    "      organization: Neon GmbH\n",
    "      email: schlecht@neon-energie.de\n",
    "    - title: Milos Simic\n",
    "      role: contributor\n",
    "      email: milos.simic.ms@gmail.com\n",
    "\"\"\".format(**{\n",
    "    'list_of_countries' : list_of_countries,\n",
    "    'list_of_countries_noand' : list_of_countries_noand,\n",
    "    'list_of_countries_keywords' : list_of_countries_keywords,\n",
    "    'covered_countries' : covered_countries,\n",
    "    'sources_metadata' : sources_metadata\n",
    "})\n",
    "\n",
    "metadata = yaml.load(metadata)\n",
    "\n",
    "metadata['homepage'] = 'https://data.open-power-system-data.org/renewable_power_plants/' + settings['version']\n",
    "metadata['id'] = 'https://doi.org/10.25832/renewable_power_plants/' + settings['version']\n",
    "metadata['last_changes'] = settings['changes']\n",
    "metadata['version'] = settings['version']\n",
    "metadata['publicationDate'] = settings['version']\n",
    "metadata['_external'] = 'false'\n",
    "metadata['attribution'] = 'Open Power System Data. 2020. Data Package Renewable power plants.' +\\\n",
    "                          ' Version 2020-05-20. https://doi.org/10.25832/renewable_power_plants/2020-05-20.' +\\\n",
    "                          ' (Primary data from various sources, for a complete list see URL)'\n",
    "metadata['_metadataVersion'] = '1.2'\n",
    "lastYear = int(settings['version'][0:4])-1\n",
    "\n",
    "metadata['temporal'] = {\n",
    "    'referenceDate': settings['referenceDate']\n",
    "}\n",
    "\n",
    "metadata['documentation'] = 'https://github.com/Open-Power-System-Data/renewable_power_plants/blob/'+settings['version']+'/main.ipynb'\n",
    "\n",
    "datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '), ensure_ascii=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-08-21T01:30:07.752025Z",
     "start_time": "2020-08-21T01:29:58.513008Z"
    }
   },
   "outputs": [],
   "source": [
    "# Add metadata fields to conform to the OPDF Metadata version 1.2\n",
    "def get_sha_hash(path, blocksize=65536):\n",
    "    sha_hasher = hashlib.sha256()\n",
    "    with open(path, 'rb') as f:\n",
    "        buffer = f.read(blocksize)\n",
    "        while len(buffer) > 0:\n",
    "            sha_hasher.update(buffer)\n",
    "            buffer = f.read(blocksize)\n",
    "        return sha_hasher.hexdigest()\n",
    "\n",
    "for resource in metadata['resources']:\n",
    "    if resource['format'] == 'csv':\n",
    "        resource['dialect'] = {\n",
    "            'delimiter' : ',',\n",
    "            'decimalChar' : '.',\n",
    "            'lineTerminator' : '\\\\n',\n",
    "            'header' : 'true'\n",
    "        }\n",
    "    file_name = resource['path']\n",
    "    file_path = os.path.join(package_path, file_name)\n",
    "    resource['hash'] = get_sha_hash(file_path)\n",
    "    resource['size'] = os.stat(file_path).st_size\n",
    "    resource['profile'] = 'tabular-data-resource'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort the fields for each resource according to the default order\n",
    "# (except for the timeseries and markers).\n",
    "for i, resource in enumerate(metadata['resources']):\n",
    "    if resource['name'] not in ['validation_marker', 'renewable_capacity_timeseries']:\n",
    "        print(resource['name'])\n",
    "        fields = resource['schema']['fields']\n",
    "        sorted_fields = sorted(fields, key = lambda field: default_order.index(field['name']))\n",
    "        resource['schema']['fields'] = sorted_fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create JSON\n",
    "datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '), ensure_ascii=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the JSON metadata\n",
    "with open(os.path.join(package_path, 'datapackage.json'), 'w', encoding='utf-8') as f:\n",
    "    f.write(datapackage_json)\n",
    "    f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate checksums\n",
    "\n",
    "Generates checksums.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "files = [\n",
    "    'validation_marker.csv', \n",
    "    'renewable_power_plants.sqlite',\n",
    "    'renewable_power_plants.xlsx',\n",
    "]\n",
    "\n",
    "for country in all_countries_including_dirty:\n",
    "    if country in table_names:\n",
    "        files.append(table_names[country]+'.csv')\n",
    "\n",
    "files.append('renewable_capacity_timeseries.csv')\n",
    "\n",
    "files.append('renewable_power_plants_EU.csv')\n",
    "    \n",
    "with open('checksums.txt', 'w') as f:\n",
    "    for file_name in sorted(files):\n",
    "        print(file_name)\n",
    "        file_hash = get_sha_hash(os.path.join(package_path, file_name))\n",
    "        f.write('{},{}\\n'.format(file_name, file_hash))\n",
    "        print('\\tDone!')\n",
    "    print('Done!')"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "p33",
   "language": "python",
   "name": "p33"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "latex_envs": {
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 0
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "716px",
    "left": "104px",
    "top": "280px",
    "width": "231px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "position": {
    "height": "531px",
    "left": "1530px",
    "right": "40px",
    "top": "273px",
    "width": "350px"
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}