{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Restart Simulation from Log Files\n", "Author(s): Paul Miles | Date Created: May 5, 2020\n", "\n", "Many models are time consuming to evaluate. Because MCMC simulations require many model evaluations, it is often worthwhile to periodically save the chain elements to a file. This is useful for several reasons:\n", "\n", "- The chain can be visualized while the simulation continues to run.\n", "- The chain is preserved if the simulation ends prematurely.\n", "\n", "This is especially important when working on remote systems where you may have limited computation time. This tutorial demonstrates the following:\n", "\n", "- How to specify a log file directory\n", "- Which format to save log files in (binary or text)\n", "- How to use these logs to start a new simulation\n", "\n", "Related topics are also discussed in the tutorial [Chain Log Files](Chain_Log_Files.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Run Simulation & Export to Log Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import required packages." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.9.0\n" ] } ], "source": [ "import numpy as np\n", "from pymcmcstat.MCMC import MCMC\n", "from datetime import datetime\n", "import pymcmcstat\n", "print(pymcmcstat.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a simple model and sum-of-squares function." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# define test model function (straight line: y = m*x + b)\n", "def test_modelfun(xdata, theta):\n", "    m = theta[0]\n", "    b = theta[1]\n", "    nrow, ncol = xdata.shape\n", "    y = np.zeros([nrow,1])\n", "    y[:,0] = m*xdata.reshape(nrow,) + b\n", "    return y\n", "\n", "# define sum-of-squares function\n", "def test_ssfun(theta, data):\n", "    xdata = data.xdata[0]\n", "    ydata = data.ydata[0]\n", "    # eval model\n", "    ymodel = test_modelfun(xdata, theta)\n", "    # calc sos\n", "    ss = sum((ymodel[:, 0] - ydata[:, 0])**2)\n", "    return ss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize MCMC object:\n", "- Add data\n", "- Define model settings\n", "- Define model parameters" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Initialize MCMC object\n", "mcset = MCMC()\n", "# Add data\n", "nds = 100\n", "x = np.linspace(2, 3, num=nds)\n", "y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)\n", "mcset.data.add_data_set(x, y)\n", "# update model settings\n", "mcset.model_settings.define_model_settings(sos_function=test_ssfun)\n", "\n", "mcset.parameters.add_model_parameter(\n", "    name='m',\n", "    theta0=2.,\n", "    minimum=-10,\n", "    maximum=np.inf,\n", "    sample=True)\n", "mcset.parameters.add_model_parameter(\n", "    name='b',\n", "    theta0=-5.,\n", "    minimum=-10,\n", "    maximum=100,\n", "    sample=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define log file directory and turn on flags in simulation options\n", "The following keyword arguments of the simulation options allow you to set up the log files.\n", "\n", "- `savedir`: Directory in which to store log files. If it is not specified but log files are turned on, the files are saved to a directory named with the convention 'YYYYMMDD_hhmmss_chain_log'.\n", "- `save_to_bin`: Save log files in binary format. Uses `h5py` package for binary read/write.\n", "- `save_to_txt`: Save log files in text format. 
Uses `numpy` package for text read/write.\n", "\n", "By default, both `save_to_bin` and `save_to_txt` are set to `False`. You can save to either format or to both. Regardless of which format is used to save the chain, a text log file is also written; it records a date/time stamp with the corresponding chain indices. This will be explained in more detail later.\n", "\n", "We save to the `resources` directory in text format only (`save_to_txt=True`). To support restarting, it is also important to set `save_to_json=True`. By default, the .json file will contain all information, including the chains. However, since we are saving the chains to log files, we include the argument `save_lightly=True`. As a result, only the required metadata is saved to the .json file instead of the potentially large chains." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " [-----------------100%-----------------] 5000 of 5000 complete in 5.4 sec" ] } ], "source": [ "import os\n", "datestr = datetime.now().strftime('%Y%m%d_%H%M%S')\n", "savedir = 'resources' + os.sep + str('{}_{}'.format(datestr, 'demo_restart'))\n", "mcset.simulation_options.define_simulation_options(\n", "    nsimu=int(5e3), updatesigma=1, method='dram',\n", "    savesize=1000, save_to_json=True,\n", "    verbosity=0, waitbar=True, save_to_txt=True,\n", "    save_lightly=True, savedir=savedir)\n", "\n", "mcset.run_simulation()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To verify the restart procedure, we display the final chain values for our two parameters." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final chain values: [1.95678129 3.13920213]\n" ] } ], "source": [ "results = mcset.simulation_results.results\n", "chain = results['chain']\n", "print('Final chain values: {}'.format(chain[-1, :]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Start a New Simulation from the Original Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe that the folder `20200505_213622_demo_restart` follows the naming pattern we specified for `savedir` (a date/time stamp followed by `demo_restart`), and we display its contents." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20200505_213622_mcmc_simulation.json covchainfile.txt sschainfile.txt\r\n", "chainfile.txt s2chainfile.txt txtlogfile.txt\r\n" ] } ], "source": [ "ls resources/20200505_213622_demo_restart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the log files are saved in text (.txt) format. There is also a .json file that contains all the metadata needed to restart the simulation. Note that if you run this simulation on your machine, the name of the results folder will differ because of the date/time stamp." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run New Simulation" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "del mcset\n", "mcset = MCMC()\n", "# Add data\n", "nds = 100\n", "x = np.linspace(2, 3, num=nds)\n", "y = 2.*x + 3. 
+ 0.1*np.random.standard_normal(x.shape)\n", "mcset.data.add_data_set(x, y)\n", "# update model settings\n", "mcset.model_settings.define_model_settings(sos_function=test_ssfun)\n", "\n", "mcset.parameters.add_model_parameter(\n", "    name='m',\n", "    theta0=2.,\n", "    minimum=-10,\n", "    maximum=np.inf,\n", "    sample=True)\n", "mcset.parameters.add_model_parameter(\n", "    name='b',\n", "    theta0=-5.,\n", "    minimum=-10,\n", "    maximum=100,\n", "    sample=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access the previously run simulation, we must define the location where it was saved and the name of the .json restart file." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " [-----------------100%-----------------] 5000 of 5000 complete in 4.4 sec" ] } ], "source": [ "import os\n", "savedir = 'resources' + os.sep + '20200505_213622_demo_restart'\n", "restart_file = savedir + os.sep + '20200505_213622_mcmc_simulation.json'\n", "mcset.simulation_options.define_simulation_options(\n", "    nsimu=int(5e3), updatesigma=1, method='dram',\n", "    savesize=1000, save_to_json=True,\n", "    verbosity=0, waitbar=True, save_to_txt=True,\n", "    save_lightly=True,\n", "    json_restart_file=restart_file)\n", "mcset.run_simulation()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simulation ran, but did it actually use the old information? We can display the first chain element to check whether it matches the final element from the first simulation." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.95678129 3.13920213]\n" ] } ], "source": [ "results = mcset.simulation_results.results\n", "chain = results['chain']\n", "print('{}'.format(chain[0, :]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although this is not the most streamlined approach to restarting simulations, it works.\n", "\n", "**Caveats:**\n", "This approach does not use the final element of the `s2chain` when initializing the error variance for the restarted simulation; this is an artifact of the source code the Python package is based on. You could add this behavior yourself by reading the `s2chainfile` and extracting the final row. You can then include it in the simulation by specifying `sigma2` when defining the `model_settings`.\n", "\n", "Another current limitation is that each simulation is exported to its own folder and files. The `pymcmcstat` package does not currently have a simple procedure for splicing these result sets together, but it is possible to merge the chain results using other Python methods." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": true, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }