{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parallel GST using MPI\n",
    "This tutorial demonstrates how to compute GST estimates in parallel (using multiple CPUs, or \"processors\").  pyGSTi's core computational routines can use multiple processors via the MPI communication framework, so to run pyGSTi calculations in parallel you must have an MPI implementation and the `mpi4py` Python package installed.\n",
    "\n",
    "Because `mpi4py` doesn't work well inside Jupyter notebooks, this tutorial is a bit clunkier than the others: we write a standalone Python script that imports `mpi4py`, and then execute that script.\n",
    "\n",
    "As an example we use the same \"standard\" single-qubit gate set as in the first tutorial.  We first create a dataset, and then write a script, to be run in parallel, that loads the data.  The simulated data are created in the same way as in the first tutorial.  Because *random* numbers are generated and used as simulated counts within the call to `generate_fake_data`, it is important that this step is *not* run in a parallel environment; otherwise different CPUs may end up with different data sets.  (This isn't an issue in the typical situation where the data are obtained experimentally.)"
   ]
  },
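  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before going further, it can help to confirm that `mpi4py` is importable from the Python environment this notebook uses.  The cell below is a small convenience check built only on the standard library (`importlib.util.find_spec`); it is not part of pyGSTi."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import importlib.util\n",
    "\n",
    "#find_spec returns None when a top-level package cannot be found\n",
    "has_mpi4py = importlib.util.find_spec('mpi4py') is not None\n",
    "print('mpi4py installed:', has_mpi4py)"
   ]
  },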
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Import pyGSTi and the \"standard\" 1-qubit quantities for a gateset with X(pi/2), Y(pi/2), and idle gates\n",
    "import os\n",
    "import pygsti\n",
    "from pygsti.construction import std1Q_XYI\n",
    "\n",
    "#Target gate set, fiducials, germs, and maximum sequence lengths\n",
    "gs_target = std1Q_XYI.gs_target\n",
    "fiducials = std1Q_XYI.fiducials\n",
    "germs = std1Q_XYI.germs\n",
    "maxLengths = [1,2,4,8,16,32]\n",
    "\n",
    "#Create a simulated data set (serially, outside of any MPI job)\n",
    "gs_datagen = gs_target.depolarize(gate_noise=0.1, spam_noise=0.001)\n",
    "listOfExperiments = pygsti.construction.make_lsgst_experiment_list(gs_target.gates.keys(), fiducials, fiducials, germs, maxLengths)\n",
    "ds = pygsti.construction.generate_fake_data(gs_datagen, listOfExperiments, nSamples=1000,\n",
    "                                            sampleError=\"binomial\", seed=1234)\n",
    "\n",
    "#Write the data set to disk so the MPI script can load it\n",
    "os.makedirs(\"example_files\", exist_ok=True)\n",
    "pygsti.io.write_dataset(\"example_files/mpi_example_dataset.txt\", ds)"
   ]
  },
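  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the data-generation step above belongs outside the parallel script, here is a toy, standard-library-only sketch.  The `fake_counts` helper and the outcome probabilities are hypothetical stand-ins, not pyGSTi's sampler.  If each process seeds its random generator differently (mimicked here by `seed=rank`), each one ends up with its own data set; only a shared seed reproduces identical counts.  Generating the data once, serially, and writing it to a file sidesteps the issue entirely."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "#Hypothetical toy sampler: number of '0' outcomes for each sequence\n",
    "def fake_counts(probs, n_samples, seed):\n",
    "    rng = random.Random(seed)\n",
    "    return [sum(rng.random() < p for _ in range(n_samples)) for p in probs]\n",
    "\n",
    "probs = [0.5, 0.9, 0.1]  # made-up outcome probabilities for three sequences\n",
    "\n",
    "#Three 'ranks' sharing one seed produce identical data sets\n",
    "shared = [fake_counts(probs, 1000, seed=1234) for rank in range(3)]\n",
    "print(all(counts == shared[0] for counts in shared))\n",
    "\n",
    "#Three 'ranks' seeded by their rank number end up with their own data sets\n",
    "per_rank = [fake_counts(probs, 1000, seed=rank) for rank in range(3)]\n",
    "print(per_rank)"
   ]
  },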
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we write a Python script that loads the just-created `DataSet`, runs GST on it, and writes the results to a file.  The only major difference from previous examples is that the script imports `mpi4py` and passes an MPI communicator object (`comm`) to the `do_long_sequence_gst` function.  Since parallel computing is most useful for computationally intensive GST calculations, we also show how to set a per-processor memory limit, which tells pyGSTi to partition its computations so that each processor stays under that limit.  Lastly, note the `gaugeOptParams` argument of `do_long_sequence_gst`, which can be used to weight different gate set members differently during gauge optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "mpiScript = \"\"\"\n",
    "import time\n",
    "import pygsti\n",
    "from pygsti.construction import std1Q_XYI\n",
    "\n",
    "#get MPI comm\n",
    "from mpi4py import MPI\n",
    "comm = MPI.COMM_WORLD\n",
    "\n",
    "print(\"Rank %d started\" % comm.Get_rank())\n",
    "\n",
    "#define target gateset, fiducials, and germs as before\n",
    "gs_target = std1Q_XYI.gs_target\n",
    "fiducials = std1Q_XYI.fiducials\n",
    "germs = std1Q_XYI.germs\n",
    "maxLengths = [1,2,4,8,16,32]\n",
    "\n",
    "#tell gauge optimization to weight the gate matrix\n",
    "# elements 100x more heavily than the SPAM vector elements, and\n",
    "# to specifically weight the Gx gate twice as heavily as the other\n",
    "# gates.\n",
    "goParams = {'itemWeights':{'spam': 0.01, 'gates': 1.0, 'Gx': 2.0} }\n",
    "\n",
    "#Specify a per-core memory limit (useful for larger GST calculations)\n",
    "memLim = 2.1*(1024)**3  # 2.1 GB\n",
    "\n",
    "#Perform TP-constrained GST\n",
    "gs_target.set_all_parameterizations(\"TP\")\n",
    "\n",
    "#load the dataset\n",
    "ds = pygsti.io.load_dataset(\"example_files/mpi_example_dataset.txt\")\n",
    "\n",
    "start = time.time()\n",
    "results = pygsti.do_long_sequence_gst(ds, gs_target, fiducials, fiducials,\n",
    "                                      germs, maxLengths, memLimit=memLim,\n",
    "                                      gaugeOptParams=goParams, comm=comm,\n",
    "                                      verbosity=2)\n",
    "end = time.time()\n",
    "print(\"Rank %d finished in %.1fs\" % (comm.Get_rank(), end-start))\n",
    "\n",
    "#only rank 0 writes the results, so that a single process writes the file\n",
    "if comm.Get_rank() == 0:\n",
    "    import pickle\n",
    "    with open(\"example_files/mpi_example_results.pkl\", \"wb\") as f:\n",
    "        pickle.dump(results, f)\n",
    "\"\"\"\n",
    "with open(\"example_files/mpi_example_script.py\",\"w\") as f:\n",
    "    f.write(mpiScript)"
   ]
  },
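  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `itemWeights` dictionary in the script above assigns default weights to all gates (`'gates'`) and all SPAM vectors (`'spam'`), while label-specific entries such as `'Gx'` override the default.  The toy function below is a hypothetical, pure-Python sketch of how such weights could scale a squared-difference objective between an estimate and its target; it is *not* pyGSTi's gauge-optimization code, and the short element lists are made-up stand-ins for real gate matrices and SPAM vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Hypothetical weighted squared distance in the spirit of itemWeights (not pyGSTi's implementation)\n",
    "def weighted_sq_distance(gates, target_gates, spam, target_spam, item_weights):\n",
    "    #Default weights for any gate / SPAM label without its own entry\n",
    "    gate_default = item_weights.get('gates', 1.0)\n",
    "    spam_default = item_weights.get('spam', 1.0)\n",
    "    total = 0.0\n",
    "    for label, elems in gates.items():\n",
    "        w = item_weights.get(label, gate_default)  # e.g. 'Gx' -> 2.0\n",
    "        total += w * sum((a - b) ** 2 for a, b in zip(elems, target_gates[label]))\n",
    "    for label, elems in spam.items():\n",
    "        w = item_weights.get(label, spam_default)\n",
    "        total += w * sum((a - b) ** 2 for a, b in zip(elems, target_spam[label]))\n",
    "    return total\n",
    "\n",
    "goParams = {'itemWeights': {'spam': 0.01, 'gates': 1.0, 'Gx': 2.0}}\n",
    "#Made-up elements: every gate and SPAM vector is off by 0.1 in one entry\n",
    "gates = {'Gx': [1.0, 0.1], 'Gy': [1.0, 0.1]}\n",
    "target_gates = {'Gx': [1.0, 0.0], 'Gy': [1.0, 0.0]}\n",
    "spam = {'rho0': [0.5, 0.1]}\n",
    "target_spam = {'rho0': [0.5, 0.0]}\n",
    "#Gx contributes 0.02, Gy 0.01, and the same-sized SPAM error only 0.0001\n",
    "d = weighted_sq_distance(gates, target_gates, spam, target_spam, goParams['itemWeights'])\n",
    "print(round(d, 6))"
   ]
  },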
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we run the script on 3 processors using `mpiexec`.  The `mpiexec` executable should have been installed with your MPI distribution; if it doesn't exist, try replacing `mpiexec` with `mpirun`."
   ]
  },
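  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are unsure which launcher your MPI distribution provides, the following standard-library sketch looks for `mpiexec` and then `mpirun` on your `PATH` and prints an equivalent command line.  Using `sys.executable` in place of `python3` is our own suggestion, not something pyGSTi requires: it makes the launched processes use the same Python interpreter (and hence the same pyGSTi installation) as this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shlex\n",
    "import shutil\n",
    "import sys\n",
    "\n",
    "#Prefer mpiexec, fall back to mpirun; None means neither is on the PATH\n",
    "launcher = shutil.which('mpiexec') or shutil.which('mpirun')\n",
    "if launcher is None:\n",
    "    print('No mpiexec or mpirun found on PATH; check your MPI installation.')\n",
    "cmd = [launcher or 'mpiexec', '-n', '3', sys.executable, 'example_files/mpi_example_script.py']\n",
    "print(' '.join(shlex.quote(part) for part in cmd))"
   ]
  },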
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Rank 1 started\n",
      "Rank 0 started\n",
      "Rank 2 started\n",
      "--- Gate Sequence Creation ---\n",
      "   1702 sequences created\n",
      "   Dataset has 1702 entries: 1702 utilized, 0 requested sequences were missing\n",
      "--- LGST ---\n",
      "  Singular values of I_tilde (truncating to first 4 of 6) = \n",
      "  4.244089943192679\n",
      "  1.1594632778409208\n",
      "  0.9651516670737965\n",
      "  0.9297628363691268\n",
      "  0.049256811347238104\n",
      "  0.025150658372136828\n",
      "  \n",
      "  Singular values of target I_tilde (truncating to first 4 of 6) = \n",
      "  4.242640687119286\n",
      "  1.414213562373096\n",
      "  1.4142135623730956\n",
      "  1.4142135623730954\n",
      "  2.5038933168948026e-16\n",
      "  2.023452063009528e-16\n",
      "  \n",
      "--- Iterative MLGST: Iter 1 of 6  92 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 95.7297 (91 data params - 43 model params = expected mean of 48; p-value = 5.13941e-05)\n",
      "  Completed in 0.6s\n",
      "  2*Delta(log(L)) = 96.1163\n",
      "  Iteration 1 took 0.6s\n",
      "  \n",
      "--- Iterative MLGST: Iter 2 of 6  168 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 162.192 (167 data params - 43 model params = expected mean of 124; p-value = 0.0120809)\n",
      "  Completed in 0.4s\n",
      "  2*Delta(log(L)) = 162.56\n",
      "  Iteration 2 took 0.4s\n",
      "  \n",
      "--- Iterative MLGST: Iter 3 of 6  450 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 484.676 (449 data params - 43 model params = expected mean of 406; p-value = 0.00435191)\n",
      "  Completed in 0.9s\n",
      "  2*Delta(log(L)) = 485.572\n",
      "  Iteration 3 took 1.0s\n",
      "  \n",
      "--- Iterative MLGST: Iter 4 of 6  862 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 895.303 (861 data params - 43 model params = expected mean of 818; p-value = 0.0307097)\n",
      "  Completed in 2.0s\n",
      "  2*Delta(log(L)) = 896.23\n",
      "  Iteration 4 took 2.0s\n",
      "  \n",
      "--- Iterative MLGST: Iter 5 of 6  1282 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.14, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 1350.86 (1281 data params - 43 model params = expected mean of 1238; p-value = 0.0133489)\n",
      "  Completed in 2.9s\n",
      "  2*Delta(log(L)) = 1351.9\n",
      "  Iteration 5 took 3.0s\n",
      "  \n",
      "--- Iterative MLGST: Iter 6 of 6  1702 gate strings ---: \n",
      "  --- Minimum Chi^2 GST ---\n",
      "  Memory limit = 2.10GB\n",
      "  Cur, Persist, Gather = 0.14, 0.00, 0.21 GB\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "  Sum of Chi^2 = 1800.55 (1701 data params - 43 model params = expected mean of 1658; p-value = 0.00777432)\n",
      "  Completed in 3.5s\n",
      "  2*Delta(log(L)) = 1801.81\n",
      "  Iteration 6 took 3.6s\n",
      "  \n",
      "  Switching to ML objective (last iteration)\n",
      "  --- MLGST ---\n",
      "  Memory: limit = 2.10GB(cur, persist, gthr = 0.14, 0.00, 0.21 GB)\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "    Maximum log(L) = 900.852 below upper bound of -2.84686e+06\n",
      "      2*Delta(log(L)) = 1801.7 (1701 data params - 43 model params = expected mean of 1658; p-value = 0.00737681)\n",
      "    Completed in 1.2s\n",
      "  2*Delta(log(L)) = 1801.7\n",
      "  Final MLGST took 1.2s\n",
      "  \n",
      "Iterative MLGST Total Time: 11.8s\n",
      "  -- Adding Gauge Optimized (go0) --\n",
      "--- Re-optimizing logl after robust data scaling ---\n",
      "  --- MLGST ---\n",
      "  Memory: limit = 2.10GB(cur, persist, gthr = 0.14, 0.00, 0.21 GB)\n",
      "  Finding num_nongauge_params is too expensive: using total params.\n",
      "    Maximum log(L) = 900.852 below upper bound of -2.84686e+06\n",
      "      2*Delta(log(L)) = 1801.7 (1701 data params - 43 model params = expected mean of 1658; p-value = 0.00737681)\n",
      "    Completed in 1.0s\n",
      "  -- Adding Gauge Optimized (go0) --\n",
      "Rank 1 finished in 15.9s\n",
      "Rank 0 finished in 15.8s\n",
      "Rank 2 finished in 15.7s\n"
     ]
    }
   ],
   "source": [
    "! mpiexec -n 3 python3 \"example_files/mpi_example_script.py\""
   ]
  },
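  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pyGSTi decides internally how to divide its work among the ranks, and the details depend on the calculation.  Purely as a hypothetical illustration of the idea, the `split_evenly` helper below (our own, not a pyGSTi function) splits the 1702 gate sequences from this run as evenly as possible across 3 ranks; pyGSTi does not necessarily partition its computations this way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Hypothetical helper: contiguous (start, stop) index ranges, one per rank\n",
    "def split_evenly(n_items, n_procs):\n",
    "    #Each rank gets n_items // n_procs items; the first (n_items % n_procs) ranks get one extra\n",
    "    base, extra = divmod(n_items, n_procs)\n",
    "    ranges, start = [], 0\n",
    "    for rank in range(n_procs):\n",
    "        stop = start + base + (1 if rank < extra else 0)\n",
    "        ranges.append((start, stop))\n",
    "        start = stop\n",
    "    return ranges\n",
    "\n",
    "for rank, (start, stop) in enumerate(split_evenly(1702, 3)):\n",
    "    print('rank %d: sequences %d-%d (%d total)' % (rank, start, stop - 1, stop - start))"
   ]
  },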
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the output from within `do_long_sequence_gst` above is not duplicated: only the first processor (rank 0) writes to stdout, so the output looks identical to that of a single-processor run.  (The `Rank N started` and `Rank N finished` lines come from the script's own `print` calls, which every rank executes.)  Finally, we read the pickled `Results` object from file and proceed with any post-processing analysis; here we just create a report."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "*** Creating workspace ***\n",
      "*** Generating switchboard ***\n",
      "Found standard clifford compilation from std1Q_XYI\n",
      "*** Generating tables ***\n",
      "  targetSpamBriefTable                          took 1.042009 seconds\n",
      "  targetGatesBoxTable                           took 0.26143 seconds\n",
      "  datasetOverviewTable                          took 0.202052 seconds\n",
      "  bestGatesetSpamParametersTable                took 0.000872 seconds\n",
      "  bestGatesetSpamBriefTable                     took 0.299435 seconds\n",
      "  bestGatesetSpamVsTargetTable                  took 0.131787 seconds\n",
      "  bestGatesetGaugeOptParamsTable                took 0.000601 seconds\n",
      "  bestGatesetGatesBoxTable                      took 0.645652 seconds\n",
      "  bestGatesetChoiEvalTable                      took 0.632017 seconds\n",
      "  bestGatesetDecompTable                        took 0.431345 seconds\n",
      "  bestGatesetEvalTable                          took 0.004359 seconds\n",
      "  bestGermsEvalTable                            took 0.020595 seconds\n",
      "  bestGatesetVsTargetTable                      took 0.27236 seconds\n",
      "  bestGatesVsTargetTable_gv                     took 0.370436 seconds\n",
      "  bestGatesVsTargetTable_gvgerms                took 0.139976 seconds\n",
      "  bestGatesVsTargetTable_gi                     took 0.02013 seconds\n",
      "  bestGatesVsTargetTable_gigerms                took 0.028079 seconds\n",
      "  bestGatesVsTargetTable_sum                    took 0.336994 seconds\n",
      "  bestGatesetErrGenBoxTable                     took 1.197842 seconds\n",
      "  metadataTable                                 took 0.16385 seconds\n",
      "  stdoutBlock                                   took 0.001845 seconds\n",
      "  profilerTable                                 took 0.177892 seconds\n",
      "  softwareEnvTable                              took 0.048463 seconds\n",
      "  exampleTable                                  took 0.09586 seconds\n",
      "  singleMetricTable_gv                          took 0.359545 seconds\n",
      "  singleMetricTable_gi                          took 0.022656 seconds\n",
      "  fiducialListTable                             took 0.00107 seconds\n",
      "  prepStrListTable                              took 0.000269 seconds\n",
      "  effectStrListTable                            took 0.000293 seconds\n",
      "  colorBoxPlotKeyPlot                           took 0.101153 seconds\n",
      "  germList2ColTable                             took 0.000431 seconds\n",
      "  progressTable                                 took 4.845024 seconds\n",
      "*** Generating plots ***\n",
      "  gramBarPlot                                   took 0.122269 seconds\n",
      "  progressBarPlot                               took 0.090196 seconds\n",
      "  progressBarPlot_sum                           took 0.000843 seconds\n",
      "  finalFitComparePlot                           took 0.07746 seconds\n",
      "  bestEstimateColorBoxPlot                      took 18.20526 seconds\n",
      "  bestEstimateTVDColorBoxPlot                   took 17.833853 seconds\n",
      "  bestEstimateColorScatterPlot                  took 22.53869 seconds\n",
      "  bestEstimateColorHistogram                    took 18.968191 seconds\n",
      "  progressTable_scl                             took 5.112971 seconds\n",
      "  progressBarPlot_scl                           took 0.094536 seconds\n",
      "  bestEstimateColorBoxPlot_scl                  took 16.824006 seconds\n",
      "  bestEstimateColorScatterPlot_scl              took 20.652359 seconds\n",
      "  bestEstimateColorHistogram_scl                took 17.336409 seconds\n",
      "  dataScalingColorBoxPlot                       took 0.373911 seconds\n",
      "*** Merging into template file ***\n",
      "  Rendering topSwitchboard                      took 0.000197 seconds\n",
      "  Rendering maxLSwitchboard1                    took 0.000475 seconds\n",
      "  Rendering targetSpamBriefTable                took 0.020996 seconds\n",
      "  Rendering targetGatesBoxTable                 took 0.015463 seconds\n",
      "  Rendering datasetOverviewTable                took 0.00091 seconds\n",
      "  Rendering bestGatesetSpamParametersTable      took 0.003097 seconds\n",
      "  Rendering bestGatesetSpamBriefTable           took 0.037325 seconds\n",
      "  Rendering bestGatesetSpamVsTargetTable        took 0.003419 seconds\n",
      "  Rendering bestGatesetGaugeOptParamsTable      took 0.003759 seconds\n",
      "  Rendering bestGatesetGatesBoxTable            took 0.021816 seconds\n",
      "  Rendering bestGatesetChoiEvalTable            took 0.023831 seconds\n",
      "  Rendering bestGatesetDecompTable              took 0.022681 seconds\n",
      "  Rendering bestGatesetEvalTable                took 0.037761 seconds\n",
      "  Rendering bestGermsEvalTable                  took 0.144064 seconds\n",
      "  Rendering bestGatesetVsTargetTable            took 0.001227 seconds\n",
      "  Rendering bestGatesVsTargetTable_gv           took 0.00864 seconds\n",
      "  Rendering bestGatesVsTargetTable_gvgerms      took 0.020981 seconds\n",
      "  Rendering bestGatesVsTargetTable_gi           took 0.006548 seconds\n",
      "  Rendering bestGatesVsTargetTable_gigerms      took 0.006077 seconds\n",
      "  Rendering bestGatesVsTargetTable_sum          took 0.00747 seconds\n",
      "  Rendering bestGatesetErrGenBoxTable           took 0.03881 seconds\n",
      "  Rendering metadataTable                       took 0.006903 seconds\n",
      "  Rendering stdoutBlock                         took 0.001828 seconds\n",
      "  Rendering profilerTable                       took 0.004451 seconds\n",
      "  Rendering softwareEnvTable                    took 0.005541 seconds\n",
      "  Rendering exampleTable                        took 0.004608 seconds\n",
      "  Rendering metricSwitchboard_gv                took 6.3e-05 seconds\n",
      "  Rendering metricSwitchboard_gi                took 6.2e-05 seconds\n",
      "  Rendering singleMetricTable_gv                took 0.010169 seconds\n",
      "  Rendering singleMetricTable_gi                took 0.008515 seconds\n",
      "  Rendering fiducialListTable                   took 0.003872 seconds\n",
      "  Rendering prepStrListTable                    took 0.00436 seconds\n",
      "  Rendering effectStrListTable                  took 0.004452 seconds\n",
      "  Rendering colorBoxPlotKeyPlot                 took 0.004817 seconds\n",
      "  Rendering germList2ColTable                   took 0.005587 seconds\n",
      "  Rendering progressTable                       took 0.009789 seconds\n",
      "  Rendering gramBarPlot                         took 0.004575 seconds\n",
      "  Rendering progressBarPlot                     took 0.003685 seconds\n",
      "  Rendering progressBarPlot_sum                 took 0.002923 seconds\n",
      "  Rendering finalFitComparePlot                 took 0.002802 seconds\n",
      "  Rendering bestEstimateColorBoxPlot            took 0.133472 seconds\n",
      "  Rendering bestEstimateTVDColorBoxPlot         took 0.124616 seconds\n",
      "  Rendering bestEstimateColorScatterPlot        took 0.163898 seconds\n",
      "  Rendering bestEstimateColorHistogram          took 0.100233 seconds\n",
      "  Rendering progressTable_scl                   took 0.010059 seconds\n",
      "  Rendering progressBarPlot_scl                 took 0.005599 seconds\n",
      "  Rendering bestEstimateColorBoxPlot_scl        took 0.135641 seconds\n",
      "  Rendering bestEstimateColorScatterPlot_scl    took 0.194431 seconds\n",
      "  Rendering bestEstimateColorHistogram_scl      took 0.115876 seconds\n",
      "  Rendering dataScalingColorBoxPlot             took 0.040465 seconds\n",
      "Output written to example_files/mpi_example_brief directory\n",
      "Opening example_files/mpi_example_brief/main.html...\n",
      "*** Report Generation Complete!  Total time 152.65s ***\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<pygsti.report.workspace.Workspace at 0x1064d70b8>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pickle\n",
    "with open(\"example_files/mpi_example_results.pkl\", \"rb\") as f:\n",
    "    results = pickle.load(f)\n",
    "pygsti.report.create_standard_report(results, \"example_files/mpi_example_brief\",\n",
    "                                    title=\"MPI Example Report\", verbosity=2, auto_open=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}