{ "cells": [ { "cell_type": "markdown", "id": "adjacent-petersburg", "metadata": {}, "source": [ "# How to Set-up Your Simulation Code for Use With EasyVVUQ" ] }, { "cell_type": "markdown", "id": "mechanical-control", "metadata": {}, "source": [ "**Author**: Vytautas Jancauskas, LRZ (jancauskas@lrz.de)" ] }, { "cell_type": "markdown", "id": "lightweight-actress", "metadata": {}, "source": [ "In this tutorial we will describe what is needed in order to use EasyVVUQ with your existing simulation code. Some other steps might need to be taken depending on the method of execution chose. For example you might need to also create a Docker container for your simulation. This is discussed in other tutorials. Note that this tutorial is mostly read-only due to the fact that the encoder/decoder code in isolation does not really do anything. To this end cell cells contain the special ```%%script false``` tag." ] }, { "cell_type": "markdown", "id": "laughing-bridal", "metadata": {}, "source": [ "The two components that will generally require the most customisation are Encoder and Decoder. An Encoder is responsible for preparing input files and directory structures needed to run the simulation. A Decoder is responsible for parsing the output of the simulation. We will see how to prepare these components for your simulation." ] }, { "cell_type": "code", "execution_count": 1, "id": "royal-office", "metadata": {}, "outputs": [], "source": [ "import easyvvuq as uq\n", "import chaospy as cp" ] }, { "cell_type": "markdown", "id": "cleared-header", "metadata": {}, "source": [ "# Encoders" ] }, { "cell_type": "markdown", "id": "bizarre-idaho", "metadata": {}, "source": [ "As far as encoders are concerned the existing classes should be enough (you will not need to create new classes by inheritance). The only issue should be which one to choose." ] }, { "cell_type": "markdown", "id": "painful-outdoors", "metadata": {}, "source": [ "## GenericEncoder" ] }, { "cell_type": "markdown", "id": "electoral-flavor", "metadata": {}, "source": [ "The simplest type of encoder is the ```GenericEncoder``` which is a simple template based encoder. To use it you need to prepare a template input file. This file should replace all numerical values you want to vary during sampling with strings of the form ```$key``` where ```$``` can be literally the dollar sign or can be replaced by some other delimiter when instantiating the encoder. The ```key``` part is to be replaced by the name of the variable in the ```vary``` dictionary." ] }, { "cell_type": "markdown", "id": "outer-movement", "metadata": {}, "source": [ "Suppose your simulation takes files of the following format (the file being stored under ```input.template```):\n", "\n", "```\n", "[inputs]\n", "variable1=$var1\n", "variable2=0.5\n", "variable3=$var2\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "id": "descending-spotlight", "metadata": {}, "outputs": [], "source": [ "generic_encoder = uq.encoders.GenericEncoder(\n", " 'input.template', \n", " delimiter='$', \n", " target_filename='input.cfg')" ] }, { "cell_type": "markdown", "id": "excited-tuning", "metadata": {}, "source": [ "Supposing we have an input parameter distribution dictionary as shown below." ] }, { "cell_type": "code", "execution_count": 3, "id": "imperial-recommendation", "metadata": {}, "outputs": [], "source": [ "vary = {\n", " 'var1': cp.Uniform(0.0, 1.0),\n", " 'var2': cp.Normal(0.0, 1.0)\n", "}" ] }, { "cell_type": "markdown", "id": "hybrid-score", "metadata": {}, "source": [ "When sampling for a specific analysis method EasyVVUQ will then produce ```input.cfg``` files where ```$var1``` will be replaced with values from a Uniform distribution with parameters zero and one and likewise ```$var2``` will be replaced with values from a Normal distribution with zero mean and standard deviation of one. These input files will then be used to run the simulation. " ] }, { "cell_type": "markdown", "id": "meaningful-petite", "metadata": {}, "source": [ "## JinjaEncoder" ] }, { "cell_type": "markdown", "id": "later-clark", "metadata": {}, "source": [ "While the basic encoder above will probably be enough for most cases EasyVVUQ also supports the Jinja2 templating language for more complicated cases." ] }, { "cell_type": "code", "execution_count": 4, "id": "packed-graduation", "metadata": {}, "outputs": [], "source": [ "jinja_encoder = uq.encoders.JinjaEncoder('input.jinja2', target_filename='input.cfg')" ] }, { "cell_type": "markdown", "id": "enormous-carnival", "metadata": {}, "source": [ "The JinjaEncoder works very much like the GenericEncoder, with the difference that the template can contain more complicated expressions, and even Python code.\n", "\n", "Variables and expressions are written inside double braces: `{{var1}}`\n", "Jinja has the concept of filters, used with a `|` (pipe) charachter. For instance, `{{var1 | int}}` converts var1 to an integer.\n", "\n", "You can also write Python expressions inside the braces. For example, you can format floating point variables to a certain precision using Python string formatting:\n", "`'%0.3f' % thevariable`\n", "Logical expression producing different strings depending on a logical variable:\n", "\n", "`l_sb = {{'.true.' if l_sb else '.false.'}}`\n", "\n", "The full documentation of the template language is here: https://jinja.palletsprojects.com/en/3.1.x/templates/\n", "\n", "**Sidebar example: scaling multiple input parameters using one sampled value**\n", "\n", "The key method to have multiple input variables be multiplied by the same randomly generated sample value is to have the base values defined as constant parameters, while the sample value is a separate parameter that generated from a distribution.\n", "\n", "For instance, if we have a random number `rn`, and we would like to parameters `r1` and `r2` to be scaled with `rn`, then we can define the parameters as follows:\n", "\n", "```\n", "params = {\n", " \"rn\": {\"type\": \"float\", \"default\": 0.5},\n", " \"r1_base\": {\"type\": \"float\", \"default\": 0.2},\n", " \"r2_base\": {\"type\": \"float\", \"default\": 2.0},\n", "}\n", "```\n", "\n", "And define a vary as follows:\n", "\n", "```\n", "vary = {\n", " 'rn': cp.Uniform(0.0, 1.0),\n", "}\n", "```\n", "\n", "Then, in the template we would write for example something like:\n", "\n", "```\n", "[inputs]\n", "r1={{r1_base*rn}}\n", "r2={{r2_base*rn}}\n", "\n", "```\n", "\n", "**End of sidebar**\n", "\n", "Now for the simple case we had previously the template file would look as follows." ] }, { "cell_type": "markdown", "id": "joined-demographic", "metadata": {}, "source": [ "```\n", "[inputs]\n", "variable1={{ var1 }}\n", "variable2=0.5\n", "variable3={{ var2 }}\n", "```" ] }, { "cell_type": "markdown", "id": "balanced-correlation", "metadata": {}, "source": [ "## CopyEncoder and DirectoryBuilder" ] }, { "cell_type": "markdown", "id": "offensive-vitamin", "metadata": {}, "source": [ "CopyEncoder is meant to simply copy a file from a local directory to the simulation directory. This is needed when the simulation software depends on a file being in the run directory along with the input files but you don't want to vary anything in that file for analysis. This can be certain databases that a simulation needs, static input or other files." ] }, { "cell_type": "code", "execution_count": 5, "id": "musical-garage", "metadata": {}, "outputs": [], "source": [ "%%script false --no-raise-error\n", "copy_encoder = uq.encoders.CopyEncoder(\"source.txt\", \"root/folder1/target.txt\")" ] }, { "cell_type": "markdown", "id": "vocational-refund", "metadata": {}, "source": [ "DirectoryBuilder is used to create a directory structure inside the run directory. Again, some simulations need this. You want to use this encoder with other encoders and you want to specify it as the first encodder. Combining encoders is discussed in the next section." ] }, { "cell_type": "code", "execution_count": 6, "id": "controlling-cleaning", "metadata": {}, "outputs": [], "source": [ "directory_builder = uq.encoders.DirectoryBuilder(\n", " {\"root\": {\"folder1\": None, \"folder2\": {\"leaf1\": None, \"leaf2\": None}}})" ] }, { "cell_type": "markdown", "id": "caroline-veteran", "metadata": {}, "source": [ "This encoder would create a directory structure that looks like this:\n", "\n", "```\n", "root\\\n", " folder1\\\n", " folder2\\\n", " leaf1\\\n", " leaf2\\\n", "```" ] }, { "cell_type": "markdown", "id": "statewide-fashion", "metadata": {}, "source": [ "## Combining Encoders with MultiEncoder" ] }, { "cell_type": "markdown", "id": "green-notion", "metadata": {}, "source": [ "When a simulation requires a complicated directory structure and/or multiple input files to be present in each run directory you can combine multiple encoders with a MultiEncoder. Please have in mind that the order in which you specify them will matter. So, for example, you will generally want to put DirectoryBuilder first so that the directory structure is created before copying the files over. It could look a bit like the example below." ] }, { "cell_type": "code", "execution_count": 7, "id": "congressional-survival", "metadata": {}, "outputs": [], "source": [ "%%script false --no-raise-error\n", "multi_encoder = uq.encoders.MultiEncoder(\n", " directory_builder, copy_encoder, \n", " generic_encoder, jinja_encoder)" ] }, { "cell_type": "markdown", "id": "functioning-landing", "metadata": {}, "source": [ "# Decoders" ] }, { "cell_type": "markdown", "id": "demographic-sacrifice", "metadata": {}, "source": [ "Decoders will tend to vary more since the parsing of simulation output is a more complex topic. We provide several options that should cover at least some common formats. The most common one (probably) being CSV (comma separated values) file type. If these are not enough it isn't difficult to write your own provided that the simulation output format is somewhat sensible. If it isn't it should still be generally possible." ] }, { "cell_type": "markdown", "id": "compatible-bonus", "metadata": {}, "source": [ "## JSONDecoder and YAMLDecoder" ] }, { "cell_type": "markdown", "id": "separated-strike", "metadata": {}, "source": [ "If your simulation outputs a JSON or YAML file you can use these decoders." ] }, { "cell_type": "code", "execution_count": 8, "id": "private-glory", "metadata": {}, "outputs": [], "source": [ "decoder = uq.decoders.JSONDecoder('output.json', output_columns=['x', 'y'])" ] }, { "cell_type": "markdown", "id": "indoor-novel", "metadata": {}, "source": [ "Or, almost identically." ] }, { "cell_type": "code", "execution_count": 9, "id": "suited-assist", "metadata": {}, "outputs": [], "source": [ "decoder = uq.decoders.YAMLDecoder('output.yaml', output_columns=['x', 'y'])" ] }, { "cell_type": "markdown", "id": "toxic-congo", "metadata": {}, "source": [ "Here the first argument is the filename of the file that the simulation will produce in the run directory. Output columns are variables that we are interested in. These do not have to be top-level - you just need to specify a list if they aren't. For example `['a', 'b', 'c']` will mean that it is located in a JSON structure of the form `{'a' : {'b' : {'c' : value},...},...}`." ] }, { "cell_type": "code", "execution_count": 10, "id": "stretch-kingston", "metadata": {}, "outputs": [], "source": [ "decoder = uq.decoders.YAMLDecoder('output.yaml', output_columns=[['a', 'b', 'c'], 'y'])" ] }, { "cell_type": "markdown", "id": "educational-pendant", "metadata": {}, "source": [ "## CSVDecoder" ] }, { "cell_type": "markdown", "id": "smaller-airfare", "metadata": {}, "source": [ "This decoder, predictably, can be used to parse CSV files. The only notable difference between this and previous decoders is that usually the values will be vectors (columns of the CSV files). So you would use this in situations where you want to work with vector quantities of interest." ] }, { "cell_type": "code", "execution_count": 11, "id": "effective-tuesday", "metadata": {}, "outputs": [], "source": [ "%%script false --no-raise-error\n", "decoder = uq.decoders.CSVDecoder('output.csv', output_columns=['x', 'y'])" ] }, { "cell_type": "markdown", "id": "authentic-tactics", "metadata": {}, "source": [ "## Custom Decoders" ] }, { "cell_type": "markdown", "id": "coupled-travel", "metadata": {}, "source": [ "If none of the above, you can easily create a custom encoder." ] }, { "cell_type": "code", "execution_count": 12, "id": "hundred-narrow", "metadata": {}, "outputs": [], "source": [ "class CustomDecoder:\n", " def __init__(self, target_filename=None, output_columns=None):\n", " pass\n", " \n", " def parse_sim_output(self, run_info={}):\n", " return {}" ] }, { "cell_type": "markdown", "id": "found-requirement", "metadata": {}, "source": [ "The methods defined above are the ones you must define for you decoder. The arguments to `__init__` should be of the same form as the arguments in the pre-defined decoders. The `run_info` has several different fields, but the relevant one is `run_dir` that contains the absolute path to the run directory (and where you will find the simulation output usually). " ] }, { "cell_type": "markdown", "id": "friendly-negative", "metadata": {}, "source": [ "The method `parse_sim_output` is responsible for parsing the simulation output and returning the data in an EasyVVUQ compatible format. This format is very simple - it is a single level deep dictionary where keys are variable names and values are either scalar values or lists if they are vectors. " ] }, { "cell_type": "markdown", "id": "fluid-panic", "metadata": {}, "source": [ "For example:" ] }, { "cell_type": "markdown", "id": "purple-reservation", "metadata": {}, "source": [ "```\n", "{'a' : 0.01, 'b' : [1, 2, 3], 'c' : ...}\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 5 }