{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Save Chain to Log File\n", "Author(s): Paul Miles | Date Created: July 19, 2019\n", "\n", "Many models are time consuming to evaluate. Because MCMC simulations require many model evaluations, it can be useful to periodically save the chain elements to a file. Doing so has several benefits:\n", "\n", "- The chain can be visualized while the simulation continues to run.\n", "- The chain is preserved if the simulation ends prematurely.\n", "\n", "This is especially important when working on remote systems where you may have limited computation time. This tutorial demonstrates the following:\n", "\n", "- How to specify a log file directory\n", "- How to choose the log file format (binary or text)\n", "- How to read in log files for analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Run Simulation & Export to Log Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import required packages." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.9.0\n" ] } ], "source": [ "import numpy as np\n", "from pymcmcstat.MCMC import MCMC\n", "from datetime import datetime\n", "import pymcmcstat\n", "print(pymcmcstat.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a simple model and sum-of-squares function."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# linear test model: y = m*x + b\n", "def test_modelfun(xdata, theta):\n", "    m = theta[0]\n", "    b = theta[1]\n", "    nrow, ncol = xdata.shape\n", "    y = np.zeros([nrow, 1])\n", "    y[:, 0] = m*xdata.reshape(nrow,) + b\n", "    return y\n", "\n", "def test_ssfun(theta, data):\n", "    xdata = data.xdata[0]\n", "    ydata = data.ydata[0]\n", "    # evaluate the model\n", "    ymodel = test_modelfun(xdata, theta)\n", "    # calculate the sum-of-squares error\n", "    ss = sum((ymodel[:, 0] - ydata[:, 0])**2)\n", "    return ss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize MCMC object:\n", "- Add data\n", "- Define model settings\n", "- Define model parameters" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Initialize MCMC object\n", "mcset = MCMC()\n", "# Add data\n", "nds = 100\n", "x = np.linspace(2, 3, num=nds)\n", "y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)\n", "mcset.data.add_data_set(x, y)\n", "# update model settings\n", "mcset.model_settings.define_model_settings(sos_function=test_ssfun)\n", "# define model parameters\n", "mcset.parameters.add_model_parameter(\n", "    name='m',\n", "    theta0=2.,\n", "    minimum=-10,\n", "    maximum=np.inf,\n", "    sample=True)\n", "mcset.parameters.add_model_parameter(\n", "    name='b',\n", "    theta0=-5.,\n", "    minimum=-10,\n", "    maximum=100,\n", "    sample=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define log file directory and turn on flags in simulation options\n", "The following keyword arguments of the simulation options allow you to set up the log files.\n", "\n", "- `savedir`: Directory in which to store log files. If it is not specified but log files are turned on, files are saved to a directory with the naming convention 'YYYYMMDD_hhmmss_chain_log'.\n", "- `save_to_bin`: Save log files in binary format. Uses the `h5py` package for binary read/write.\n", "- `save_to_txt`: Save log files in text format. Uses the `numpy` package for text read/write.\n", "\n", "By default, both `save_to_bin` and `save_to_txt` are set to `False`. You can save to either format or to both. Regardless of which format is used to save the chain, a text log file is included that records a date/time stamp and the corresponding chain indices each time a chain segment is appended. This will be explained in more detail later.\n", "\n", "To generate a set of results and save them to a specific directory, the following code can be executed:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import os\n", "# build a time-stamped directory name inside 'resources'\n", "datestr = datetime.now().strftime('%Y%m%d_%H%M%S')\n", "savedir = os.path.join('resources', '{}_{}'.format(datestr, 'serial_chain'))\n", "# turn on both text and binary log files, written to savedir\n", "mcset.simulation_options.define_simulation_options(\n", "    nsimu=int(5e4), updatesigma=1, method='dram',\n", "    savedir=savedir, savesize=1000, save_to_json=True,\n", "    verbosity=0, waitbar=False, save_to_txt=True, save_to_bin=True)\n", "\n", "mcset.run_simulation()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Process Simulation\n", "At this point, the simulation is either still running or has completed. You will observe a folder in the working directory that matches the input argument for `savedir`. In this case, we are going to reference a saved solution from the `resources` directory."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe that the folder `resources/20190517_073038_serial_chain` matches the pattern that was specified for `savedir`, and we display its contents." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20190517_073038_mcmc_simulation.json s2chainfile.h5\r\n", "binlogfile.txt s2chainfile.txt\r\n", "chainfile.h5 sschainfile.h5\r\n", "chainfile.txt sschainfile.txt\r\n", "covchainfile.h5 txtlogfile.txt\r\n", "covchainfile.txt\r\n" ] } ], "source": [ "ls resources/20190517_073038_serial_chain/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, there are log files saved in both binary (h5) and text (txt) format. Note that if you run this simulation on your machine, the name of the results folder (`savedir`) will differ because of the date/time stamp." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing the log files\n", "We start by importing several modules from the [pymcmcstat](https://prmiles.wordpress.ncsu.edu/codes/python-packages/pymcmcstat/) package. Note that this step would normally be run from a separate script file, and possibly on a separate computer. For example, if running a long simulation on a remote server, you can periodically copy the log files from the remote server and analyze the chains on your local machine." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import os\n", "from pymcmcstat.chain import ChainProcessing as CP\n", "from pymcmcstat.chain.ChainStatistics import chainstats\n", "from pymcmcstat import mcmcplot as mcp\n", "import time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define the directory in which to find the log files. If you want to look at results you generated, then change the value of `savedir` accordingly."
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# define directory where log files are saved\n", "savedir = 'resources' + os.sep + '20190517_073038_serial_chain'\n", "# For testing purposes we read in the data repeatedly to compare how quickly binary and text files are processed.\n", "ns = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in the binary data files and print the average time per read. The same timing is then repeated for the text files." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Binary: 0.05322208404541016 sec\n", "\n" ] } ], "source": [ "start = time.time()\n", "for ii in range(ns):\n", "    results = CP.read_in_savedir_files(savedir, extension='h5')\n", "end = time.time()\n", "binary_time = end - start\n", "print('Binary: {} sec\\n'.format(binary_time/ns))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text: 0.4920275926589966 sec\n", "\n" ] } ], "source": [ "start = time.time()\n", "for ii in range(ns):\n", "    results = CP.read_in_savedir_files(savedir, extension='txt')\n", "end = time.time()\n", "text_time = end - start\n", "print('Text: {} sec\\n'.format(text_time/ns))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The binary files are processed considerably faster (about 0.05 s versus 0.49 s per read in this example). In either case, the results extracted from the log files are identical, and we can proceed with the analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis\n", "We extract the following from the results dictionary:\n", "- `chain`: Sampling chain for model parameters\n", "- `s2chain`: Observation error chain\n", "- `sschain`: Sum-of-squares error corresponding to each row of `chain`."
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "chain = results['chain']\n", "s2chain = results['s2chain']\n", "sschain = results['sschain']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define the burn-in period as the first half of the chain and display statistics for the remaining portion." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "------------------------------\n", "name : mean std MC_err tau geweke\n", "$p_{0}$ : 2.1135 0.0391 0.0017 38.1465 0.9989\n", "$p_{1}$ : 2.7027 0.0984 0.0044 37.5644 0.9975\n", "------------------------------\n", "==============================\n", "Acceptance rate information\n", "Chain provided:\n", "Net : 20.32% -> 5079/25000\n", "------------------------------\n" ] } ], "source": [ "# define burnin\n", "nsimu = chain.shape[0]\n", "burnin = int(nsimu/2)\n", "# display chain statistics\n", "stats = chainstats(chain[burnin:,:], returnstats=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot Chain\n", "- Chain panel\n", "- Pairwise correlation" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "settings = dict(fig=dict(figsize=(3, 3)))\n", "mcp.plot_chain_panel(chain[burnin:, :], settings=settings)\n", "mcp.plot_pairwise_correlation_panel(chain[burnin:, :], settings=settings);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Print log files\n", "Log files record a date/time stamp for each chain segment appended to the corresponding log file, together with the start and end indices of that segment. The first few entries look like this:\n", "\n", "| Date | Time | Start | End |\n", "|--------------|---------|---------|---------|\n", "| 2019-05-17 | 07:30:40 | 0 | 999 |\n", "| 2019-05-17 | 07:30:40 | 1000 | 1999 |\n", "| 2019-05-17 | 07:30:41 | 2000 | 2999 |\n", "| 2019-05-17 | 07:30:41 | 3000 | 3999 |\n", "| 2019-05-17 | 07:30:41 | 4000 | 4999 |\n", "| 2019-05-17 | 07:30:41 | 5000 | 5999 |\n", "\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------------------\n", "Display log file: resources/20190517_073038_serial_chain/binlogfile.txt\n", "2019-05-17 07:30:40\t0\t999\n", "2019-05-17 07:30:40\t1000\t1999\n", "2019-05-17 07:30:41\t2000\t2999\n", "2019-05-17 07:30:41\t3000\t3999\n", "2019-05-17 07:30:41\t4000\t4999\n", "2019-05-17 07:30:41\t5000\t5999\n", "2019-05-17 07:30:42\t6000\t6999\n", "2019-05-17 07:30:42\t7000\t7999\n", "2019-05-17 07:30:42\t8000\t8999\n", "2019-05-17 07:30:43\t9000\t9999\n", "2019-05-17 07:30:43\t10000\t10999\n", "2019-05-17 07:30:43\t11000\t11999\n", "2019-05-17 07:30:43\t12000\t12999\n", "2019-05-17 07:30:44\t13000\t13999\n", "2019-05-17 07:30:44\t14000\t14999\n", "2019-05-17 07:30:44\t15000\t15999\n", "2019-05-17 07:30:45\t16000\t16999\n", "2019-05-17 07:30:45\t17000\t17999\n", "2019-05-17 07:30:45\t18000\t18999\n", "2019-05-17 07:30:45\t19000\t19999\n", "2019-05-17 07:30:46\t20000\t20999\n", "2019-05-17 07:30:46\t21000\t21999\n", "2019-05-17 07:30:46\t22000\t22999\n", "2019-05-17 
07:30:47\t23000\t23999\n", "2019-05-17 07:30:47\t24000\t24999\n", "2019-05-17 07:30:47\t25000\t25999\n", "2019-05-17 07:30:48\t26000\t26999\n", "2019-05-17 07:30:48\t27000\t27999\n", "2019-05-17 07:30:48\t28000\t28999\n", "2019-05-17 07:30:48\t29000\t29999\n", "2019-05-17 07:30:49\t30000\t30999\n", "2019-05-17 07:30:49\t31000\t31999\n", "2019-05-17 07:30:49\t32000\t32999\n", "2019-05-17 07:30:50\t33000\t33999\n", "2019-05-17 07:30:50\t34000\t34999\n", "2019-05-17 07:30:50\t35000\t35999\n", "2019-05-17 07:30:50\t36000\t36999\n", "2019-05-17 07:30:51\t37000\t37999\n", "2019-05-17 07:30:51\t38000\t38999\n", "2019-05-17 07:30:51\t39000\t39999\n", "2019-05-17 07:30:52\t40000\t40999\n", "2019-05-17 07:30:52\t41000\t41999\n", "2019-05-17 07:30:52\t42000\t42999\n", "2019-05-17 07:30:52\t43000\t43999\n", "2019-05-17 07:30:53\t44000\t44999\n", "2019-05-17 07:30:53\t45000\t45999\n", "2019-05-17 07:30:53\t46000\t46999\n", "2019-05-17 07:30:54\t47000\t47999\n", "2019-05-17 07:30:54\t48000\t48999\n", "2019-05-17 07:30:54\t49000\t49999\n", "\n", "--------------------------\n", "\n", "--------------------------\n", "Display log file: resources/20190517_073038_serial_chain/txtlogfile.txt\n", "2019-05-17 07:30:40\t0\t999\n", "2019-05-17 07:30:40\t1000\t1999\n", "2019-05-17 07:30:41\t2000\t2999\n", "2019-05-17 07:30:41\t3000\t3999\n", "2019-05-17 07:30:41\t4000\t4999\n", "2019-05-17 07:30:41\t5000\t5999\n", "2019-05-17 07:30:42\t6000\t6999\n", "2019-05-17 07:30:42\t7000\t7999\n", "2019-05-17 07:30:42\t8000\t8999\n", "2019-05-17 07:30:43\t9000\t9999\n", "2019-05-17 07:30:43\t10000\t10999\n", "2019-05-17 07:30:43\t11000\t11999\n", "2019-05-17 07:30:43\t12000\t12999\n", "2019-05-17 07:30:44\t13000\t13999\n", "2019-05-17 07:30:44\t14000\t14999\n", "2019-05-17 07:30:44\t15000\t15999\n", "2019-05-17 07:30:45\t16000\t16999\n", "2019-05-17 07:30:45\t17000\t17999\n", "2019-05-17 07:30:45\t18000\t18999\n", "2019-05-17 07:30:45\t19000\t19999\n", "2019-05-17 07:30:46\t20000\t20999\n", 
"2019-05-17 07:30:46\t21000\t21999\n", "2019-05-17 07:30:46\t22000\t22999\n", "2019-05-17 07:30:47\t23000\t23999\n", "2019-05-17 07:30:47\t24000\t24999\n", "2019-05-17 07:30:47\t25000\t25999\n", "2019-05-17 07:30:48\t26000\t26999\n", "2019-05-17 07:30:48\t27000\t27999\n", "2019-05-17 07:30:48\t28000\t28999\n", "2019-05-17 07:30:48\t29000\t29999\n", "2019-05-17 07:30:49\t30000\t30999\n", "2019-05-17 07:30:49\t31000\t31999\n", "2019-05-17 07:30:49\t32000\t32999\n", "2019-05-17 07:30:50\t33000\t33999\n", "2019-05-17 07:30:50\t34000\t34999\n", "2019-05-17 07:30:50\t35000\t35999\n", "2019-05-17 07:30:50\t36000\t36999\n", "2019-05-17 07:30:51\t37000\t37999\n", "2019-05-17 07:30:51\t38000\t38999\n", "2019-05-17 07:30:51\t39000\t39999\n", "2019-05-17 07:30:52\t40000\t40999\n", "2019-05-17 07:30:52\t41000\t41999\n", "2019-05-17 07:30:52\t42000\t42999\n", "2019-05-17 07:30:52\t43000\t43999\n", "2019-05-17 07:30:53\t44000\t44999\n", "2019-05-17 07:30:53\t45000\t45999\n", "2019-05-17 07:30:53\t46000\t46999\n", "2019-05-17 07:30:54\t47000\t47999\n", "2019-05-17 07:30:54\t48000\t48999\n", "2019-05-17 07:30:54\t49000\t49999\n", "\n", "--------------------------\n", "\n" ] } ], "source": [ "CP.print_log_files(savedir)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": true, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, 
"number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }