# Key IO Data Structures in SuPy

## Introduction

The cell below demonstrates a minimal SuPy simulation that involves all key IO data structures:

In [1]:
import supy as sp
df_state_init, df_forcing = sp.load_SampleData()
df_output, df_state_final = sp.run_supy(df_forcing, df_state_init)

* Input:
    SuPy requires two `DataFrame`s to perform a simulation, which are:
    * `df_state_init`: model initial states;
    * `df_forcing`: forcing data.
    
    These input data can be loaded either by calling [load_SampleData()](../auto-gen/supy.load_forcing_grid.rst#supy.load_forcing_grid) as shown above or by using [init_supy](../auto-gen/supy.init_supy.rst#supy.init_supy). Alternatively, you can modify the loaded sample `DataFrame`s to create new `DataFrame`s for your specific needs.


* Output:
    The output of SuPy consists of two `DataFrame`s:
    * `df_output`: model output results; this is usually the basis for scientific analysis.
    * `df_state_final`: model final states; any of its entries can be used as a `df_state_init` to start another SuPy simulation.


## Input

### `df_state_init`: model initial states

In [2]:
df_state_init.head()

var,ah_min,ah_min,ah_slope_cooling,ah_slope_cooling,ah_slope_heating,ah_slope_heating,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,...,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,tair24hr,numcapita,gridiv
ind_dim,"(0,)","(1,)","(0,)","(1,)","(0,)","(1,)","(0, 0)","(0, 1)","(1, 0)","(1, 1)","(2, 0)","(2, 1)","(3, 0)","(3, 1)","(4, 0)",...,"(275,)","(276,)","(277,)","(278,)","(279,)","(280,)","(281,)","(282,)","(283,)","(284,)","(285,)","(286,)","(287,)",0,0
grid
98,15.0,15.0,2.7,2.7,2.7,2.7,0.57,0.65,0.45,0.49,0.43,0.46,0.4,0.47,0.4,...,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,273.15,204.58,98


`df_state_init` is organised with ***grids*** in __rows__ and ***their states*** in __columns__. The details of all state variables can be found in [the description page](../data-structure/df_state.rst#df-state-variables).

Please note that the properties are stored as *flattened values* to fit the tabular format of `DataFrame`, though they may actually be of higher dimension (e.g. [ahprof_24hr](../data-structure/df_state.rst#cmdoption-arg-ahprof-24hr) with the dimension {24, 2}). To indicate the dimensionality of these properties, SuPy uses the `ind_dim` level in columns to hold the indices of values:

* `0` for scalars;
* `(ind_dim1, ind_dim2, ...)` for arrays (for a generic sense, vectors are 1D arrays).

Take `ohm_coef` below as an example: it has a dimension of {8, 4, 3} according to [the description](../data-structure/df_state.rst#cmdoption-arg-ohm-coef), which implies that the values used by SuPy in simulations are passed in as an array of the dimension {8, 4, 3}. To get proper values passed in, users should therefore follow this dimensionality requirement when preparing or modifying `df_state_init`.

In [3]:
df_state_init.loc[:,'ohm_coef']

ind_dim,"(0, 0, 0)","(0, 0, 1)","(0, 0, 2)","(0, 1, 0)","(0, 1, 1)","(0, 1, 2)","(0, 2, 0)","(0, 2, 1)","(0, 2, 2)","(0, 3, 0)","(0, 3, 1)","(0, 3, 2)","(1, 0, 0)","(1, 0, 1)","(1, 0, 2)",...,"(6, 3, 0)","(6, 3, 1)","(6, 3, 2)","(7, 0, 0)","(7, 0, 1)","(7, 0, 2)","(7, 1, 0)","(7, 1, 1)","(7, 1, 2)","(7, 2, 0)","(7, 2, 1)","(7, 2, 2)","(7, 3, 0)","(7, 3, 1)","(7, 3, 2)"
grid
98,0.719,0.194,-36.6,0.719,0.194,-36.6,0.719,0.194,-36.6,0.719,0.194,-36.6,0.238,0.427,-16.7,...,0.5,0.21,-39.1,0.25,0.6,-30.0,0.25,0.6,-30.0,0.25,0.6,-30.0,0.25,0.6,-30.0
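To make the flattened layout concrete, the sketch below rebuilds an array from `ind_dim` labels. It uses a toy frame rather than SuPy itself: `df_toy`, `toy_var` and the helper `unflatten` are hypothetical names, and the sketch assumes the `ind_dim` labels are strings such as `'(0, 1)'`, as the printed output above suggests. Scalar entries labelled `0` are not handled here.

```python
import ast

import numpy as np
import pandas as pd

# Toy stand-in for df_state_init: one grid (98) and one variable of shape (2, 3).
# Assumption: ind_dim labels are strings like '(0, 1)'.
shape = (2, 3)
labels = [str(idx) for idx in np.ndindex(shape)]
cols = pd.MultiIndex.from_product([["toy_var"], labels], names=["var", "ind_dim"])
df_toy = pd.DataFrame(
    [np.arange(6.0)], index=pd.Index([98], name="grid"), columns=cols
)


def unflatten(df, var, grid):
    """Rebuild the array of `var` for one grid from its flattened columns."""
    ser = df[var].loc[grid]  # Series indexed by the ind_dim labels
    idx = [ast.literal_eval(label) for label in ser.index]
    arr_shape = tuple(int(n) + 1 for n in np.max(idx, axis=0))
    arr = np.empty(arr_shape)
    for pos, val in zip(idx, ser.to_numpy()):
        arr[pos] = val
    return arr


arr = unflatten(df_toy, "toy_var", 98)
print(arr.shape)
```

The same idea would apply to a variable such as `ohm_coef`, where the labels run from `(0, 0, 0)` to `(7, 3, 2)`.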


### `df_forcing`: forcing data

`df_forcing` is organised with ***temporal records*** in __rows__ and ***forcing variables*** in __columns__. The details of all forcing variables can be found in [the description page](../data-structure/df_forcing.rst#df-forcing-variables). 


Missing values can be specified with `-999`, which is the default missing-value marker accepted by SuPy and its backend SUEWS.

In [4]:
df_forcing.head()

,iy,id,it,imin,qn,qh,qe,qs,qf,U,RH,Tair,pres,rain,kdown,snow,ldown,fcld,Wuh,xsmd,lai,kdiff,kdir,wdir,isec
2012-01-01 00:05:00,2012,1,0,5,-999.0,-999.0,-999.0,-999.0,-999.0,4.515,85.463333,11.77375,1001.5125,0.0,0.153333,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0
2012-01-01 00:10:00,2012,1,0,10,-999.0,-999.0,-999.0,-999.0,-999.0,4.515,85.463333,11.77375,1001.5125,0.0,0.153333,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0
2012-01-01 00:15:00,2012,1,0,15,-999.0,-999.0,-999.0,-999.0,-999.0,4.515,85.463333,11.77375,1001.5125,0.0,0.153333,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0
2012-01-01 00:20:00,2012,1,0,20,-999.0,-999.0,-999.0,-999.0,-999.0,4.515,85.463333,11.77375,1001.5125,0.0,0.153333,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0
2012-01-01 00:25:00,2012,1,0,25,-999.0,-999.0,-999.0,-999.0,-999.0,4.515,85.463333,11.77375,1001.5125,0.0,0.153333,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0
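For analysis with pandas it is often convenient to treat the `-999` markers as `NaN` and to restore them before passing the data back to a model. The sketch below shows this round trip on a toy frame; `df_toy` and its two columns are illustrative stand-ins, not the full forcing table.

```python
import numpy as np
import pandas as pd

# Toy forcing-like frame: Tair is observed, qn is missing (-999).
times = pd.date_range("2012-01-01 00:05", periods=3, freq="5min")
df_toy = pd.DataFrame(
    {"Tair": [11.8, 11.7, 11.6], "qn": [-999.0, -999.0, -999.0]},
    index=times,
)

# Replace the marker with NaN so that statistics ignore missing records.
df_nan = df_toy.replace(-999.0, np.nan)
print(df_nan.isna().sum())

# Restore the -999 marker before using the frame as model input again.
df_back = df_nan.fillna(-999.0)
```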


<div class="alert alert-info">
 
**Note:**

The index of `df_forcing` **SHOULD BE** strictly of `DatetimeIndex` type if you want to create a `df_forcing` for SuPy simulation.
SuPy derives its runtime time-step size from the index information of `df_forcing`.

</div>


The information below indicates that SuPy will run at a 5 min (i.e. 300 s) time-step if driven by this specific `df_forcing`:

In [5]:
freq_forcing = df_forcing.index.freq
freq_forcing

<300 * Seconds>
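When a forcing table is assembled by hand, its index may hold valid timestamps yet carry no frequency information. The sketch below, on a hypothetical `df_toy` with a single `Tair` column, shows one way to obtain a regular `DatetimeIndex` with pandas and to check the resulting step in seconds. Whether SuPy needs `index.freq` to be set in every case is not stated here, so treat this as a precaution rather than a documented requirement.

```python
import pandas as pd

# Timestamps parsed from text: a DatetimeIndex, but without a frequency.
stamps = pd.to_datetime(
    ["2012-01-01 00:05", "2012-01-01 00:10", "2012-01-01 00:15"]
)
df_toy = pd.DataFrame({"Tair": [11.8, 11.7, 11.6]}, index=stamps)
print(df_toy.index.freq)  # no frequency attached yet

# asfreq conforms the index to a regular 5 min grid and sets index.freq.
df_toy = df_toy.asfreq("5min")
step_s = (df_toy.index[1] - df_toy.index[0]).total_seconds()
print(df_toy.index.freq, step_s)
```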

## Output

### `df_output`: model output results

`df_output` is organised with ***temporal records of grids*** in __rows__ and ***output variables of different groups*** in __columns__. The details of all output variables can be found in [the description page](../data-structure/df_output.rst#df-output-variables). 

In [6]:
df_output.head()

,group,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,SUEWS,...,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState,DailyState
,var,Kdown,Kup,Ldown,Lup,Tsurf,QN,QF,QS,QH,QE,QHlumps,QElumps,QHresis,Rain,Irr,...,WU_Grass2,WU_Grass3,deltaLAI,LAIlumps,AlbSnow,DensSnow_Paved,DensSnow_Bldgs,DensSnow_EveTr,DensSnow_DecTr,DensSnow_Grass,DensSnow_BSoil,DensSnow_Water,a1,a2,a3
grid,datetime
98,2012-01-01 00:05:00,0.153333,0.018279,344.310184,371.986259,11.775615,-27.541021,40.574001,-46.53243,62.420064,3.576493,49.732605,9.832804,0.042327,0.0,0.0,...,,,,,,,,,,,,,,,
98,2012-01-01 00:10:00,0.153333,0.018279,344.310184,371.986259,11.775615,-27.541021,39.724283,-46.53243,61.654096,3.492744,48.98036,9.735333,0.042294,0.0,0.0,...,,,,,,,,,,,,,,,
98,2012-01-01 00:15:00,0.153333,0.018279,344.310184,371.986259,11.775615,-27.541021,38.874566,-46.53243,60.885968,3.411154,48.228114,9.637861,0.04226,0.0,0.0,...,,,,,,,,,,,,,,,
98,2012-01-01 00:20:00,0.153333,0.018279,344.310184,371.986259,11.775615,-27.541021,38.024849,-46.53243,60.115745,3.33166,47.475869,9.540389,0.042226,0.0,0.0,...,,,,,,,,,,,,,,,
98,2012-01-01 00:25:00,0.153333,0.018279,344.310184,371.986259,11.775615,-27.541021,37.175131,-46.53243,59.343488,3.2542,46.723623,9.442917,0.042192,0.0,0.0,...,,,,,,,,,,,,,,,


`df_output` is recorded at the same temporal resolution as `df_forcing`:

In [7]:
freq_out = df_output.index.levels[1].freq
(freq_out, freq_out == freq_forcing)

(<300 * Seconds>, True)
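Because `df_output` has two levels on both axes, a common first step in analysis is to pick one group and one grid, which leaves a plain time-indexed table. The sketch below reproduces that layout with a toy frame (`df_toy`, filled with made-up numbers) and then averages the 5 min records into 60 min bins; the labelling of the bins follows the pandas defaults, which is an assumption and not necessarily the SUEWS convention.

```python
import numpy as np
import pandas as pd

# Toy frame with the same layout as df_output:
# rows are (grid, datetime), columns are (group, var).
times = pd.date_range("2012-01-01 00:05", periods=24, freq="5min")
index = pd.MultiIndex.from_product([[98], times], names=["grid", "datetime"])
columns = pd.MultiIndex.from_tuples(
    [("SUEWS", "QH"), ("SUEWS", "QE"), ("DailyState", "a1")],
    names=["group", "var"],
)
df_toy = pd.DataFrame(
    np.arange(72.0).reshape(24, 3), index=index, columns=columns
)

# Select the SUEWS group for grid 98: a table indexed by datetime only.
df_suews = df_toy["SUEWS"].loc[98]

# Aggregate the 5 min records into 60 min means (pandas default bin labels).
df_hourly = df_suews.resample("60min").mean()
print(df_hourly)
```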

### `df_state_final`: model final states

`df_state_final` has the same data structure as `df_state_init` except for an extra `datetime` level in the index, which stores the time associated with each set of model states. This structure makes it easy to reuse `df_state_final` as initial model states for other simulations (e.g., for diagnostics of runtime model states with `save_state=True` set in `run_supy`, or simply as the initial conditions for future simulations starting at the end time of a previous run).

The meanings of state variables in `df_state_final` can be found in [the description page](../data-structure/df_state.rst#df-state-variables).

In [8]:
df_state_final.head()

,var,aerodynamicresistancemethod,ah_min,ah_min,ah_slope_cooling,ah_slope_cooling,ah_slope_heating,ah_slope_heating,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,ahprof_24hr,...,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,wuprofm_24hr,z,z0m_in,zdm_in
,ind_dim,0,"(0,)","(1,)","(0,)","(1,)","(0,)","(1,)","(0, 0)","(0, 1)","(1, 0)","(1, 1)","(2, 0)","(2, 1)","(3, 0)","(3, 1)",...,"(18, 0)","(18, 1)","(19, 0)","(19, 1)","(20, 0)","(20, 1)","(21, 0)","(21, 1)","(22, 0)","(22, 1)","(23, 0)","(23, 1)",0,0,0
datetime,grid
2012-01-01 00:05:00,98,2,15.0,15.0,2.7,2.7,2.7,2.7,0.57,0.65,0.45,0.49,0.43,0.46,0.4,0.47,...,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,49.6,1.9,14.2
2013-01-01 00:05:00,98,2,15.0,15.0,2.7,2.7,2.7,2.7,0.57,0.65,0.45,0.49,0.43,0.46,0.4,0.47,...,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,49.6,1.9,14.2
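One way to turn the last record of such a table into a frame shaped like `df_state_init` is to select the final timestamp and drop the `datetime` level. The sketch below does this on a toy frame; `df_toy_final` and its three columns are hypothetical, and whether `run_supy` also accepts the two-level index directly is not covered here.

```python
import pandas as pd

# Toy frame with the same layout as df_state_final:
# rows are (datetime, grid), columns are (var, ind_dim).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2012-01-01 00:05", "2013-01-01 00:05"]), [98]],
    names=["datetime", "grid"],
)
columns = pd.MultiIndex.from_tuples(
    [("ah_min", "(0,)"), ("ah_min", "(1,)"), ("z", "0")],
    names=["var", "ind_dim"],
)
df_toy_final = pd.DataFrame(
    [[15.0, 15.0, 49.6], [15.0, 15.0, 49.6]], index=index, columns=columns
)

# Keep the states at the last timestamp; xs drops the datetime level,
# leaving grids in rows as in df_state_init.
t_last = df_toy_final.index.get_level_values("datetime").max()
df_toy_init = df_toy_final.xs(t_last, level="datetime")
print(df_toy_init.index.name, df_toy_init.shape)
```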
