{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Forecast run parsing\n", "\n", "The Solar Forecast Arbiter is designed to analyze *non-overlapping* forecasts. The [documentation](https://solarforecastarbiter.org/definitions/) details the motivation for this choice. Briefly, this forces forecast users and providers to think carefully about what kinds of forecasts they want to analyze. Ideally, these choices will be informed by one or more decision-making processes. \n", "\n", "Forecast providers often design systems that create many overlapping forecasts. For example, new forecasts may be issued once per hour and extend for 48 hours. In some systems, the raw forecasts may also have a higher temporal resolution than the end-user application requires. Providers or end users need to parse these forecasts by parameters such as lead time before they can be analyzed. Similarly, these forecast runs must be parsed into one or more non-overlapping \"forecast evaluation time series\" before they can be uploaded to the Solar Forecast Arbiter. The figure below illustrates this situation. It shows three forecast runs (green) issued 1 hour apart, each with 15 minute intervals and a length of 3 hours. These forecast runs are sliced, resampled, and concatenated into two different forecast evaluation time series: a 1 hour ahead, 15 minute interval forecast (blue); and a 2 hour ahead, 1 hour interval forecast (red).\n", "\n", "![timeline](https://solarforecastarbiter.org/images/timeline_merged.svg)\n", "\n", "This tutorial with demonstrate how to prepare many forecast runs for analysis in the Solar Forecast Arbiter." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "from pathlib import Path\n", "import os\n", "import shutil\n", "\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate forecast data from a simulated operational system. The system will create a new forecast every hour, saving each forecast to a csv file. The forecast run attributes will be similar to those shown in the figure above:\n", "\n", "* 1 hour issue frequency\n", "* 3 hour run length\n", "* 0 lead time\n", "* 15 minute interval length\n", "\n", "The forecast values will be equal to the initialization hour + the forecast hour." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# make a new directory to store the output. 
\n", "# a cell at the end of the notebook will delete this directory using shutil.rmtree(fx_path)\n", "fx_path = Path('generated_forecasts')\n", "fx_path.mkdir()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "issue_start = pd.Timestamp('2020-01-01', tz='UTC')\n", "issue_end = issue_start + pd.Timedelta('12h')\n", "issue_freq = pd.Timedelta('1h')\n", "closed = 'left' # consistent with interval averages labeled by the beginning of the interval\n", "issue_times = pd.date_range(start=issue_start, end=issue_end, freq=issue_freq, closed=closed)\n", "\n", "lead_time = pd.Timedelta('0min')\n", "run_length = pd.Timedelta('3h')\n", "interval_length = pd.Timedelta('15min')\n", "\n", "time_format = '%Y%m%dT%H%M%SZ'\n", "\n", "for issue_time in issue_times:\n", " fx_start = issue_time + lead_time\n", " fx_end = fx_start + run_length \n", " fx_index = pd.date_range(start=fx_start, end=fx_end, freq=interval_length, closed=closed, name='timestamp')\n", " fx_values = issue_time.hour + fx_index.hour\n", " fx = pd.Series(fx_values, index=fx_index, name='value')\n", " issue_time_str = issue_time.strftime(time_format)\n", " header = f'# forecast issued at {issue_time_str}\\n'\n", " file_name = f'fx_{issue_time_str}.csv'\n", " with open(fx_path / file_name, 'w') as f:\n", " f.write(header)\n", " fx.to_csv(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a list of all of the files we created" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[PosixPath('generated_forecasts/fx_20200101T000000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T010000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T020000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T030000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T040000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T050000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T060000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T070000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T080000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T090000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T100000Z.csv'),\n", " PosixPath('generated_forecasts/fx_20200101T110000Z.csv')]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fx_files = sorted(fx_path.iterdir())\n", "fx_files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now inspect a couple of the files to confirm the data is as we expected." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# forecast issued at 20200101T000000Z\n", "timestamp,value\n", "2020-01-01 00:00:00+00:00,0\n", "2020-01-01 00:15:00+00:00,0\n", "2020-01-01 00:30:00+00:00,0\n", "2020-01-01 00:45:00+00:00,0\n", "2020-01-01 01:00:00+00:00,1\n", "2020-01-01 01:15:00+00:00,1\n", "2020-01-01 01:30:00+00:00,1\n", "2020-01-01 01:45:00+00:00,1\n", "2020-01-01 02:00:00+00:00,2\n", "2020-01-01 02:15:00+00:00,2\n", "2020-01-01 02:30:00+00:00,2\n", "2020-01-01 02:45:00+00:00,2\n", "\n" ] } ], "source": [ "with open(fx_files[0]) as f:\n", " print(f.read())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# forecast issued at 20200101T010000Z\n", "timestamp,value\n", "2020-01-01 01:00:00+00:00,2\n", "2020-01-01 01:15:00+00:00,2\n", "2020-01-01 01:30:00+00:00,2\n", "2020-01-01 01:45:00+00:00,2\n", "2020-01-01 02:00:00+00:00,3\n", "2020-01-01 02:15:00+00:00,3\n", "2020-01-01 02:30:00+00:00,3\n", "2020-01-01 02:45:00+00:00,3\n", "2020-01-01 03:00:00+00:00,4\n", "2020-01-01 03:15:00+00:00,4\n", "2020-01-01 03:30:00+00:00,4\n", "2020-01-01 03:45:00+00:00,4\n", "\n" ] } ], "source": [ "with open(fx_files[1]) as f:\n", " print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so we have a series of overlapping forecasts and we need to slice and reassemble them into something the Solar Forecast Arbiter can use. The for loop below accomplishes this for the hour ahead, 15 minute interval forecast shown in blue in the figure." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "lead_time = pd.Timedelta('1h')\n", "run_length = pd.Timedelta('1h')\n", "\n", "fx_parsed = []\n", "for fx_run in fx_files:\n", " fx = pd.read_csv(fx_run, comment='#', index_col=0, parse_dates=True)\n", " # get issue time from filename. remove .csv suffix, then remove fx_ prefix\n", " issue_time = fx_run.name.split('.')[0].split('_')[1]\n", " issue_time = pd.Timestamp(issue_time)\n", " fx_start = issue_time + lead_time\n", " # -1ns to account for interval_label=beginning when slicing\n", " fx_end = fx_start + run_length - pd.Timedelta('1ns') \n", " fx_sliced = fx.loc[fx_start:fx_end]\n", " fx_parsed.append(fx_sliced)\n", " \n", "blue_fx_concat = pd.concat(fx_parsed)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
timestamp
2020-01-01 01:00:00+00:001
2020-01-01 01:15:00+00:001
2020-01-01 01:30:00+00:001
2020-01-01 01:45:00+00:001
2020-01-01 02:00:00+00:003
2020-01-01 02:15:00+00:003
2020-01-01 02:30:00+00:003
2020-01-01 02:45:00+00:003
2020-01-01 03:00:00+00:005
2020-01-01 03:15:00+00:005
2020-01-01 03:30:00+00:005
2020-01-01 03:45:00+00:005
2020-01-01 04:00:00+00:007
2020-01-01 04:15:00+00:007
2020-01-01 04:30:00+00:007
2020-01-01 04:45:00+00:007
2020-01-01 05:00:00+00:009
2020-01-01 05:15:00+00:009
2020-01-01 05:30:00+00:009
2020-01-01 05:45:00+00:009
2020-01-01 06:00:00+00:0011
2020-01-01 06:15:00+00:0011
2020-01-01 06:30:00+00:0011
2020-01-01 06:45:00+00:0011
2020-01-01 07:00:00+00:0013
2020-01-01 07:15:00+00:0013
2020-01-01 07:30:00+00:0013
2020-01-01 07:45:00+00:0013
2020-01-01 08:00:00+00:0015
2020-01-01 08:15:00+00:0015
2020-01-01 08:30:00+00:0015
2020-01-01 08:45:00+00:0015
2020-01-01 09:00:00+00:0017
2020-01-01 09:15:00+00:0017
2020-01-01 09:30:00+00:0017
2020-01-01 09:45:00+00:0017
2020-01-01 10:00:00+00:0019
2020-01-01 10:15:00+00:0019
2020-01-01 10:30:00+00:0019
2020-01-01 10:45:00+00:0019
2020-01-01 11:00:00+00:0021
2020-01-01 11:15:00+00:0021
2020-01-01 11:30:00+00:0021
2020-01-01 11:45:00+00:0021
2020-01-01 12:00:00+00:0023
2020-01-01 12:15:00+00:0023
2020-01-01 12:30:00+00:0023
2020-01-01 12:45:00+00:0023
\n", "
" ], "text/plain": [ " value\n", "timestamp \n", "2020-01-01 01:00:00+00:00 1\n", "2020-01-01 01:15:00+00:00 1\n", "2020-01-01 01:30:00+00:00 1\n", "2020-01-01 01:45:00+00:00 1\n", "2020-01-01 02:00:00+00:00 3\n", "2020-01-01 02:15:00+00:00 3\n", "2020-01-01 02:30:00+00:00 3\n", "2020-01-01 02:45:00+00:00 3\n", "2020-01-01 03:00:00+00:00 5\n", "2020-01-01 03:15:00+00:00 5\n", "2020-01-01 03:30:00+00:00 5\n", "2020-01-01 03:45:00+00:00 5\n", "2020-01-01 04:00:00+00:00 7\n", "2020-01-01 04:15:00+00:00 7\n", "2020-01-01 04:30:00+00:00 7\n", "2020-01-01 04:45:00+00:00 7\n", "2020-01-01 05:00:00+00:00 9\n", "2020-01-01 05:15:00+00:00 9\n", "2020-01-01 05:30:00+00:00 9\n", "2020-01-01 05:45:00+00:00 9\n", "2020-01-01 06:00:00+00:00 11\n", "2020-01-01 06:15:00+00:00 11\n", "2020-01-01 06:30:00+00:00 11\n", "2020-01-01 06:45:00+00:00 11\n", "2020-01-01 07:00:00+00:00 13\n", "2020-01-01 07:15:00+00:00 13\n", "2020-01-01 07:30:00+00:00 13\n", "2020-01-01 07:45:00+00:00 13\n", "2020-01-01 08:00:00+00:00 15\n", "2020-01-01 08:15:00+00:00 15\n", "2020-01-01 08:30:00+00:00 15\n", "2020-01-01 08:45:00+00:00 15\n", "2020-01-01 09:00:00+00:00 17\n", "2020-01-01 09:15:00+00:00 17\n", "2020-01-01 09:30:00+00:00 17\n", "2020-01-01 09:45:00+00:00 17\n", "2020-01-01 10:00:00+00:00 19\n", "2020-01-01 10:15:00+00:00 19\n", "2020-01-01 10:30:00+00:00 19\n", "2020-01-01 10:45:00+00:00 19\n", "2020-01-01 11:00:00+00:00 21\n", "2020-01-01 11:15:00+00:00 21\n", "2020-01-01 11:30:00+00:00 21\n", "2020-01-01 11:45:00+00:00 21\n", "2020-01-01 12:00:00+00:00 23\n", "2020-01-01 12:15:00+00:00 23\n", "2020-01-01 12:30:00+00:00 23\n", "2020-01-01 12:45:00+00:00 23" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "blue_fx_concat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we repeat the process for the 2 hour ahead, 1 hour interval red forecast." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
timestamp
2020-01-01 02:00:00+00:002
2020-01-01 03:00:00+00:004
2020-01-01 04:00:00+00:006
2020-01-01 05:00:00+00:008
2020-01-01 06:00:00+00:0010
2020-01-01 07:00:00+00:0012
2020-01-01 08:00:00+00:0014
2020-01-01 09:00:00+00:0016
2020-01-01 10:00:00+00:0018
2020-01-01 11:00:00+00:0020
2020-01-01 12:00:00+00:0022
2020-01-01 13:00:00+00:0024
\n", "
" ], "text/plain": [ " value\n", "timestamp \n", "2020-01-01 02:00:00+00:00 2\n", "2020-01-01 03:00:00+00:00 4\n", "2020-01-01 04:00:00+00:00 6\n", "2020-01-01 05:00:00+00:00 8\n", "2020-01-01 06:00:00+00:00 10\n", "2020-01-01 07:00:00+00:00 12\n", "2020-01-01 08:00:00+00:00 14\n", "2020-01-01 09:00:00+00:00 16\n", "2020-01-01 10:00:00+00:00 18\n", "2020-01-01 11:00:00+00:00 20\n", "2020-01-01 12:00:00+00:00 22\n", "2020-01-01 13:00:00+00:00 24" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lead_time = pd.Timedelta('2h')\n", "interval_length = pd.Timedelta('1h')\n", "run_length = pd.Timedelta('1h')\n", "\n", "fx_parsed = []\n", "for fx_run in fx_files:\n", " fx = pd.read_csv(fx_run, comment='#', index_col=0, parse_dates=True)\n", " # get issue time from filename. remove .csv suffix, then remove fx_ prefix\n", " issue_time = fx_run.name.split('.')[0].split('_')[1]\n", " issue_time = pd.Timestamp(issue_time)\n", " fx_start = issue_time + lead_time\n", " # -1ns to account for interval_label=beginning when slicing\n", " fx_end = fx_start + run_length - pd.Timedelta('1ns') \n", " fx_sliced = fx.loc[fx_start:fx_end].resample(interval_length).mean()\n", " fx_parsed.append(fx_sliced)\n", " \n", "red_fx_concat = pd.concat(fx_parsed)\n", "red_fx_concat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These forecasts could be uploaded to the Solar Forecast Arbiter if the corresponding metadata exists. Users may choose to create the metadata using the [Dashboard](https://dashboard.solarforecastarbiter.org) or programmatically. See the [Data Upload and Download in Python](data_upload_download.ipynb) tutorial for an example of how to programmatically create the metadata.\n", "\n", "A more sophisticated workflow might start by formally defining the metadata in the Solar Forecast Arbiter and then use that metadata to parse the forecast runs. This is left as an exercise for the reader!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's be tidy and delete the directory and files that we created." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "shutil.rmtree(fx_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }