{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Creating a `System` representing the entire U.S."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "**Originally Contributed by**: Clayton Barrows"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Introduction"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "This example demonstrates how to assemble a `System` representing the entire U.S. using\n",
    "[PowerSystems.jl](https://github.com/NREL-SIIP/powersystems.jl) and the data assembled by\n",
    "[Xu, et. al.](https://arxiv.org/abs/2002.06155). We'll use the same tabular data parsing\n",
    "capability [demonstrated on the RTS-GMLC dataset](https://nbviewer.jupyter.org/github/NREL-SIIP/SIIPExamples.jl/blob/master/notebook/2_PowerSystems_examples/04_parse_tabulardata.ipynb)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Dependencies"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using SIIPExamples\n",
    "using PowerSystems\n",
    "using TimeSeries\n",
    "using Dates\n",
    "using TimeZones\n",
    "using DataFrames\n",
    "using CSV"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Fetch Data\n",
    "PowerSystems.jl links to some test data that is suitable for this example.\n",
    "Let's download the test data"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "println(\"downloading data...\")\n",
    "datadir = joinpath(dirname(dirname(pathof(SIIPExamples))), \"US-System\")\n",
    "siip_data = joinpath(datadir, \"SIIP\")\n",
    "if !isdir(datadir)\n",
    "    mkdir(datadir)\n",
    "    mkdir(siip_data)\n",
    "    tempfilename =\n",
    "        download(\"https://zenodo.org/record/3753177/files/USATestSystem.zip?download=1\")\n",
    "    SIIPExamples.unzip(SIIPExamples.os, tempfilename, datadir)\n",
    "end\n",
    "\n",
    "config_dir = joinpath(\n",
    "    dirname(dirname(pathof(SIIPExamples))),\n",
    "    \"script\",\n",
    "    \"2_PowerSystems_examples\",\n",
    "    \"US_config\",\n",
    ")"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Data Formatting\n",
    "This is a big dataset. Typically one would only want to include one of the interconnects\n",
    "available. Lets use Texas to start. You can set `interconnect = nothing` if you want everything."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "interconnect = \"Texas\"\n",
    "timezone = FixedTimeZone(\"UTC-6\")\n",
    "initial_time = ZonedDateTime(DateTime(\"2016-01-01T00:00:00\"), timezone)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "There are a few minor incompatibilities between the data and the supported tabular data\n",
    "format. We can resolve those here.\n",
    "\n",
    "First, PowerSystems.jl only supports parsing piecewise linear generator costs from tabular\n",
    "data. So, we can sample the quadratic polynomial cost curves and provide PWL points."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "println(\"formatting data ...\")\n",
    "!isnothing(interconnect) && println(\"filtering data to include $interconnect ...\")\n",
    "gen = DataFrame(CSV.File(joinpath(datadir, \"plant.csv\")))\n",
    "filter!(row -> row[:interconnect] == interconnect, gen)\n",
    "gencost = DataFrame(CSV.File(joinpath(datadir, \"gencost.csv\")))\n",
    "gen = innerjoin(gen, gencost, on = :plant_id, makeunique = true, validate = (false, false))\n",
    "\n",
    "function make_pwl(gen::DataFrame, traunches = 2)\n",
    "    output_pct_cols = [\"output_point_\" * string(i) for i in 0:traunches]\n",
    "    hr_cols = [\"heat_rate_incr_\" * string(i) for i in 1:traunches]\n",
    "    pushfirst!(hr_cols, \"heat_rate_avg_0\")\n",
    "    pwl = DataFrame(repeat([Float64], 6), Symbol.(vcat(output_pct_cols, hr_cols)))\n",
    "    for row in eachrow(gen)\n",
    "        traunch_len = (1.0 - row.Pmin / row.Pmax) / traunches\n",
    "        pct = [row.Pmin / row.Pmax + i * traunch_len for i in 0:traunches]\n",
    "        #c(pct) = pct * row.Pmax * (row.GenIOB + row.GenIOC^2 + row.GenIOD^3)\n",
    "        c(pct) = pct * row.Pmax * (row.c1 + row.c2^2) + row.c0 #this formats the \"c\" columns to hack the heat rate parser in PSY\n",
    "        hr = [c(pct[1])]\n",
    "        [push!(hr, c(pct[i + 1]) - hr[i]) for i in 1:traunches]\n",
    "        push!(pwl, vcat(pct, hr))\n",
    "    end\n",
    "    return hcat(gen, pwl)\n",
    "end\n",
    "\n",
    "gen = make_pwl(gen);\n",
    "\n",
    "gen[!, \"fuel_price\"] .= 1000.0;  #this formats the \"c\" columns to hack the heat rate parser in PSY"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "There are some incomplete aspects of this dataset. Here, I've assigned some approximate\n",
    "minimum up/down times, and some minor adjustments to categories. There are better\n",
    "ways to do this, but this works for this script..."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "gen[:, :unit_type] .= \"OT\"\n",
    "gen[:, :min_up_time] .= 0.0\n",
    "gen[:, :min_down_time] .= 0.0\n",
    "gen[:, :ramp_30] .= gen[:, :ramp_30] ./ 30.0 # we need ramp rates in MW/min\n",
    "[\n",
    "    gen[gen.type .== \"wind\", col] .= [\"Wind\", 0.0, 0.0][ix] for\n",
    "    (ix, col) in enumerate([:unit_type, :min_up_time, :min_down_time])\n",
    "]\n",
    "[\n",
    "    gen[gen.type .== \"solar\", col] .= [\"PV\", 0.0, 0.0][ix] for\n",
    "    (ix, col) in enumerate([:unit_type, :min_up_time, :min_down_time])\n",
    "]\n",
    "[\n",
    "    gen[gen.type .== \"hydro\", col] .= [\"HY\", 0.0, 0.0][ix] for\n",
    "    (ix, col) in enumerate([:unit_type, :min_up_time, :min_down_time])\n",
    "]\n",
    "[\n",
    "    gen[gen.type .== \"ng\", col] .= [4.5, 8][ix] for\n",
    "    (ix, col) in enumerate([:min_up_time, :min_down_time])\n",
    "]\n",
    "[\n",
    "    gen[gen.type .== \"coal\", col] .= [24, 48][ix] for\n",
    "    (ix, col) in enumerate([:min_up_time, :min_down_time])\n",
    "]\n",
    "[\n",
    "    gen[gen.type .== \"nuclear\", col] .= [72, 72][ix] for\n",
    "    (ix, col) in enumerate([:min_up_time, :min_down_time])\n",
    "]"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "At the moment, PowerSimulations can't do unit commitment with generators that have Pmin = 0.0"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "idx_zero_pmin = [\n",
    "    g.type in [\"ng\", \"coal\", \"hydro\", \"nuclear\"] && g.Pmin <= 0 for\n",
    "    g in eachrow(gen[:, [:type, :Pmin]])\n",
    "]\n",
    "gen[idx_zero_pmin, :Pmin] = gen[idx_zero_pmin, :Pmax] .* 0.05\n",
    "\n",
    "gen[:, :name] = \"gen\" .* string.(gen.plant_id)\n",
    "CSV.write(joinpath(siip_data, \"gen.csv\"), gen)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Let's also merge the zone.csv with the bus.csv and identify bus types"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "bus = DataFrame(CSV.File(joinpath(datadir, \"bus.csv\")))\n",
    "!isnothing(interconnect) && filter!(row -> row[:interconnect] == interconnect, bus)\n",
    "zone = DataFrame(CSV.File(joinpath(datadir, \"zone.csv\")))\n",
    "bus = leftjoin(bus, zone, on = :zone_id)\n",
    "bustypes = Dict(1 => \"PV\", 2 => \"PQ\", 3 => \"REF\", 4 => \"ISOLATED\")\n",
    "bus.bustype = [bustypes[b] for b in bus.type]\n",
    "filter!(row -> row[:bustype] != PowerSystems.BusTypes.ISOLATED, bus)\n",
    "bus.name = \"bus\" .* string.(bus.bus_id)\n",
    "CSV.write(joinpath(siip_data, \"bus.csv\"), bus)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "We need branch names as strings"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "branch = DataFrame(CSV.File(joinpath(datadir, \"branch.csv\")))\n",
    "branch = leftjoin(\n",
    "    branch,\n",
    "    DataFrames.rename!(bus[:, [:bus_id, :baseKV]], [:from_bus_id, :from_baseKV]),\n",
    "    on = :from_bus_id,\n",
    ")\n",
    "branch = leftjoin(\n",
    "    branch,\n",
    "    DataFrames.rename!(bus[:, [:bus_id, :baseKV]], [:to_bus_id, :to_baseKV]),\n",
    "    on = :to_bus_id,\n",
    ")\n",
    "!isnothing(interconnect) && filter!(row -> row[:interconnect] == interconnect, branch)\n",
    "branch.name = \"branch\" .* string.(branch.branch_id)\n",
    "branch.tr_ratio = branch.from_baseKV ./ branch.to_baseKV\n",
    "CSV.write(joinpath(siip_data, \"branch.csv\"), branch)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "The PowerSystems parser expects the files to be named a certain way.\n",
    "And, we need a \"control_mode\" column in dc-line data"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "dcbranch = DataFrame(CSV.File(joinpath(datadir, \"dcline.csv\")))\n",
    "!isnothing(interconnect) && filter!(row -> row[:from_bus_id] in bus.bus_id, dcbranch)\n",
    "!isnothing(interconnect) && filter!(row -> row[:to_bus_id] in bus.bus_id, dcbranch)\n",
    "dcbranch.name = \"dcbranch\" .* string.(dcbranch.dcline_id)\n",
    "dcbranch[:, :control_mode] .= \"Power\"\n",
    "CSV.write(joinpath(siip_data, \"dc_branch.csv\"), dcbranch)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "### We need to create a reference for where to get timeseries data for each component."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "timeseries = []\n",
    "ts_csv = [\"wind\", \"solar\", \"hydro\", \"demand\"]\n",
    "plant_ids = Symbol.(string.(gen.plant_id))\n",
    "for f in ts_csv\n",
    "    println(\"formatting $f.csv ...\")\n",
    "    csvpath = joinpath(siip_data, f * \".csv\")\n",
    "    csv = DataFrame(CSV.File(joinpath(datadir, f * \".csv\")))\n",
    "    (category, name_prefix, label) =\n",
    "        f == \"demand\" ? (\"Area\", \"\", \"max_active_power\") :\n",
    "        (\"Generator\", \"gen\", \"max_active_power\")\n",
    "    if !(:DateTime in names(csv))\n",
    "        DataFrames.rename!(\n",
    "            csv,\n",
    "            (names(csv)[occursin.(\"UTC\", String.(names(csv)))][1] => :DateTime),\n",
    "        )\n",
    "        #The timeseries data is in UTC, this converts it to a fixed UTC offset\n",
    "        csv.DateTime =\n",
    "            ZonedDateTime.(\n",
    "                DateTime.(csv.DateTime, \"yyyy-mm-dd HH:MM:SS\"),\n",
    "                timezone,\n",
    "                from_utc = true,\n",
    "            )\n",
    "        delete!(csv, csv.DateTime .< initial_time)\n",
    "        csv.DateTime = Dates.format.(csv.DateTime, \"yyyy-mm-ddTHH:MM:SS\")\n",
    "    end\n",
    "    device_names = f == \"demand\" ? unique(bus.zone_name) : gen.name\n",
    "    for id in names(csv)\n",
    "        colname = id\n",
    "        if f == \"demand\"\n",
    "            if Symbol(id) in Symbol.(zone.zone_id)\n",
    "                colname = Symbol(zone[Symbol.(zone.zone_id) .== Symbol(id), :zone_name][1])\n",
    "                DataFrames.rename!(csv, (id => colname))\n",
    "            end\n",
    "            sf = sum(bus[string.(bus.zone_id) .== id, :Pd])\n",
    "        else\n",
    "            if Symbol(id) in plant_ids\n",
    "                colname = Symbol(gen[Symbol.(gen.plant_id) .== Symbol(id), :name][1])\n",
    "                DataFrames.rename!(csv, (id => colname))\n",
    "            end\n",
    "            sf = maximum(csv[:, colname]) == 0.0 ? 1.0 : \"Max\"\n",
    "        end\n",
    "        if String(colname) in device_names\n",
    "            push!(\n",
    "                timeseries,\n",
    "                Dict(\n",
    "                    \"simulation\" => \"DA\",\n",
    "                    \"category\" => category,\n",
    "                    \"module\" => \"InfrastructureSystems\",\n",
    "                    \"type\" => \"SingleTimeSeries\",\n",
    "                    \"component_name\" => String(colname),\n",
    "                    \"name\" => label,\n",
    "                    \"resolution\" => 3600,\n",
    "                    \"scaling_factor_multiplier\" => \"get_max_active_power\",\n",
    "                    \"scaling_factor_multiplier_module\" => \"PowerSystems\",\n",
    "                    \"normalization_factor\" => sf,\n",
    "                    \"data_file\" => csvpath,\n",
    "                ),\n",
    "            )\n",
    "        end\n",
    "    end\n",
    "    CSV.write(csvpath, csv)\n",
    "end\n",
    "\n",
    "timeseries_pointers = joinpath(siip_data, \"timeseries_pointers.json\")\n",
    "open(timeseries_pointers, \"w\") do io\n",
    "    PowerSystems.InfrastructureSystems.JSON3.write(io, timeseries)\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "### The tabular data format relies on a folder containing `*.csv` files and `.yaml` files\n",
    "describing the column names of each file in PowerSystems terms, and the PowerSystems\n",
    "data type that should be created for each generator type. The respective \"us_decriptors.yaml\"\n",
    "and \"US_generator_mapping.yaml\" files have already been tailored to this dataset."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "println(\"parsing csv files...\")\n",
    "rawsys = PowerSystems.PowerSystemTableData(\n",
    "    siip_data,\n",
    "    100.0,\n",
    "    joinpath(config_dir, \"us_descriptors.yaml\"),\n",
    "    generator_mapping_file = joinpath(config_dir, \"US_generator_mapping.yaml\"),\n",
    ")"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Create a `System`\n",
    "Next, we'll create a `System` from the `rawsys` data. Since a `System` is predicated on a\n",
    "time series resolution and the `rawsys` data includes both 5-minute and 1-hour resolution\n",
    "time series, we also need to specify which time series we want to include in the `System`.\n",
    "The `time_series_resolution` kwarg filters to only include time series with a matching resolution."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "println(\"creating System\")\n",
    "sys = System(rawsys; config_path = joinpath(config_dir, \"us_system_validation.json\"));\n",
    "sys"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "This all took reasonably long, so we can save our `System` using the serialization\n",
    "capability included with PowerSystems.jl:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "to_json(sys, joinpath(siip_data, \"sys.json\"), force = true)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "---\n",
    "\n",
    "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"
   ],
   "metadata": {}
  }
 ],
 "nbformat_minor": 3,
 "metadata": {
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.5.4"
  },
  "kernelspec": {
   "name": "julia-1.5",
   "display_name": "Julia 1.5.4",
   "language": "julia"
  }
 },
 "nbformat": 4
}