{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# File Input and Output\n", "\n", "This tutorial describes how different pyGSTi objects can be saved to and read back from files. There are four main ways of doing this, depending on the type of object you want to save.\n", "\n", "1. **Custom text formats** A small number of pyGSTi objects can be converted to and from a pyGSTi-specific text format that can be written to and read from files. Currently `DataSet`, `MultiDataSet`, `Circuit`, and `Label` are the only such objects. `Circuit` and `Label` objects are usually not read or written individually; usually it is a whole *list* of circuits that is read or written. \n", "\n", "2. **JSON format** Many of pyGSTi's objects can be converted to and from the JavaScript Object Notation (JSON) format. This is a standard text format that is widely supported and portable, making it the preferred choice for reading and writing just about any pyGSTi object except a data set or circuit list (which have their own text formats).\n", "\n", "3. **Experiment design directory** Several high-level objects can be read from and written to *directories* rather than individual files. These include `ExperimentDesign`, `Protocol`, `ProtocolData`, `ProtocolResults`, and `ProtocolResultsDir` objects. Because these object types hold many other objects, storing them in directories makes it possible to look through an object's contents without even using Python, since different parts of the object are stored in different files. This is further facilitated by saving the parts of the high-level object in the custom text or JSON formats described above - the resulting directory contains portable, human-readable files. \n", "   Additionally, this directory format allows multiple object types to effectively *share* a single directory. This is achieved by placing an object's data in a subdirectory, named according to the object type, beneath the given directory. For instance, writing an `ExperimentDesign` to *path/to/root_directory* places the object's contents within the *path/to/root_directory/edesign* directory. A `ProtocolData` object, which contains an `ExperimentDesign`, when written to the same *path/to/root_directory*, places its non-experiment-design contents within the *path/to/root_directory/data* directory and re-uses the serialized experiment design in the `edesign` sub-directory. When experiment designs contain other (nested) experiment designs, this leads to an efficient tree structure organizing circuit lists, experimental data, and analysis results. \n", "\n", "4. **Pickle, basic JSON and MSGPACK formats** Finally, most pyGSTi objects can be \"pickled\" using Python's standard `pickle` module. The codec that allows for this can also write [JSON](https://www.json.org) and [MessagePack](https://msgpack.org/index.html) formats via functions in `pygsti.serialization.json` and `pygsti.serialization.msgpack`. These routines create an alternative to the \"nice\" JSON format described above: they produce a format that is fragile, non-portable, and not human readable. As such, using `pickle`, `pygsti.serialization.json` or `pygsti.serialization.msgpack` should be a last resort, used only for temporary storage.\n", "\n", "\n", "## Text formats\n", "\n", "All text-based input and output is done via the `pygsti.io` sub-package. Below we give examples of how to read and write data sets and circuit objects. 
(Note: Data set objects also have `read_binary` and `write_binary` methods, which read and write *binary* formats -- different from the text formats used by the `pygsti.io` routines -- but these binary formats are less portable and not human readable, making the text formats preferable)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pygsti\n", "\n", "#DataSets ------------------------------------------------------------\n", "dataset_txt = \\\n", "\"\"\"## Columns = 0 count, 1 count\n", "{} 0 100\n", "Gx 10 80\n", "GxGy 40 20\n", "Gx^4 20 70\n", "\"\"\"\n", "with open(\"../tutorial_files/TestDataSet.txt\",\"w\") as f:\n", "    f.write(dataset_txt)\n", "\n", "ds = pygsti.io.read_dataset(\"../tutorial_files/TestDataSet.txt\")\n", "pygsti.io.write_dataset(\"../tutorial_files/TestDataSet.txt\", ds)\n", "\n", "\n", "#MultiDataSets ------------------------------------------------------------\n", "multidataset_txt = \\\n", "\"\"\"## Columns = DS0 0 count, DS0 1 count, DS1 0 frequency, DS1 count total \n", "{} 0 100 0 100 \n", "Gx 10 90 0.1 100 \n", "GxGy 40 60 0.4 100 \n", "Gx^4 20 80 0.2 100 \n", "\"\"\"\n", "\n", "with open(\"../tutorial_files/TestMultiDataSet.txt\",\"w\") as f:\n", "    f.write(multidataset_txt)\n", "\n", "multiDS = pygsti.io.read_multidataset(\"../tutorial_files/TestMultiDataSet.txt\", cache=True)\n", "pygsti.io.write_multidataset(\"../tutorial_files/TestMultiDataSet.txt\", multiDS)\n", "\n", "\n", "#DataSets w/timestamped data --------------------------------------------\n", "# Note: in the header lines below, left of the equals sign is the character used in the time-series strings; right is the outcome (SPAM) label\n", "tddataset_txt = \\\n", "\"\"\"## 0 = 0\n", "## 1 = 1\n", "{} 011001\n", "Gx 111000111\n", "Gy 11001100\n", "\"\"\"\n", "with open(\"../tutorial_files/TestTDDataset.txt\",\"w\") as f:\n", "    f.write(tddataset_txt)\n", "\n", "tdds_fromfile = pygsti.io.read_time_dependent_dataset(\"../tutorial_files/TestTDDataset.txt\")\n", "#NOTE: there is currently no way to *write* a DataSet with timestamped data back to a text file.\n", "\n", "\n", "#Circuits ------------------------------------------------------------\n", "from pygsti.modelpacks import smq1Q_XY\n", "cList = pygsti.circuits.create_lsgst_circuits(\n", "    [('Gxpi2',0), ('Gypi2',0)], smq1Q_XY.prep_fiducials(), smq1Q_XY.meas_fiducials(),\n", "    smq1Q_XY.germs(), [1,2,4,8])\n", "pygsti.io.write_circuit_list(\"../tutorial_files/TestCircuitList.txt\",cList,\"#Test Circuit List\")\n", "pygsti.io.write_empty_dataset(\"../tutorial_files/TestEmptyDataset.txt\",cList)\n", "    #additionally creates columns of zeros where data should go...\n", "cList2 = pygsti.io.read_circuit_list(\"../tutorial_files/TestCircuitList.txt\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## JSON format\n", "Any object that derives from `pygsti.baseobjs.NicelySerializable` can be written to or read from a \"nice\" JSON format using its `write` method and the class-level `read` method. This is the best way to serialize `Model` objects, which can be particularly complex. We give an example of this below."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pspec = pygsti.processors.QubitProcessorSpec(2, ['Gxpi2', 'Gypi2', 'Gcnot'], geometry='line')\n", "mdl = pygsti.models.modelconstruction.create_crosstalk_free_model(pspec)\n", "mdl.write(\"../tutorial_files/serialized_model.json\")\n", "\n", "#A base class can read in files written by derived classes\n", "loaded_mdl = pygsti.models.Model.read(\"../tutorial_files/serialized_model.json\")\n", "print(type(loaded_mdl))\n", "\n", "import numpy as np\n", "assert(np.allclose(mdl.to_vector(), loaded_mdl.to_vector()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also serialize a `QubitProcessorSpec` this way:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pspec.write(\"../tutorial_files/serialized_pspec.json\")\n", "print(\"wrote a\", str(type(pspec)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But we'll get an error if we try to read in the processor specification using `pygsti.models.Model`, since a class's `.read` ensures that it constructs an object of that type or a subclass of it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    pygsti.models.Model.read(\"../tutorial_files/serialized_pspec.json\")\n", "except ValueError as v:\n", "    print(\"ERROR: \", str(v))\n", "\n", "p1 = pygsti.processors.QubitProcessorSpec.read(\"../tutorial_files/serialized_pspec.json\") # ok - reads object of same type\n", "p2 = pygsti.processors.ProcessorSpec.read(\"../tutorial_files/serialized_pspec.json\") # ok - ProcessorSpec is a base class of QubitProcessorSpec\n", "p3 = pygsti.baseobjs.NicelySerializable.read(\"../tutorial_files/serialized_pspec.json\") # also ok - NicelySerializable is a base class too\n", "\n", "assert(type(p1) == pygsti.processors.QubitProcessorSpec)\n", "assert(type(p2) == pygsti.processors.QubitProcessorSpec)\n", "assert(type(p3) == pygsti.processors.QubitProcessorSpec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment design directory tree\n", "`ExperimentDesign`, `Protocol`, `ProtocolData`, `ProtocolResults`, and `ProtocolResultsDir` objects can be written to a *directory* using their `write` methods. These objects place their data in subdirectories beneath the given directory as follows:\n", "\n", "- `ExperimentDesign` places its contents in an `edesign` subdirectory.\n", "- `ProtocolData` places its contained experiment design in an `edesign` subdirectory (it writes its contained `ExperimentDesign` to the same root directory) and the rest of its contents in a `data` subdirectory.\n", "- `ProtocolResults` writes its contained `ProtocolData` object into the same root directory (this uses the `edesign` and `data` subdirectories) and the rest of its contents into the `results/<name>` subdirectory, where `<name>` is given by the `name` attribute of the `ProtocolResults` object. \n", "- `ProtocolResultsDir` writes its contained `ProtocolData` object into the same root directory (this uses the `edesign` and `data` subdirectories) and the rest of its contents, which consist of multiple `ProtocolResults` objects, beneath a `results` subdirectory (i.e. it writes all of its contained `ProtocolResults` objects to the same root directory).\n", "- `Protocol` doesn't conform to the same structure and can be written independently to a given directory. 
Contents are written directly into this directory (no subdirectories), and so the destination directory is unrelated to the common \"root directory\" used by the preceding classes.\n", "\n", "This results in a directory structure where common information (e.g. an experiment design) is only written to disk in one location (the `edesign` subdirectory) and shared by all the objects that need it. Additional subdirectories may exist beneath the root directory to support *nested* experiment designs and their related data and results. These subdirectory names are listed in the `edesign/subdirs.json` file. Additional subdirectories not listed there are ignored by pyGSTi and can be used however you please. To read these objects back from disk, you use the following `pygsti.io` functions:\n", "\n", "- `pygsti.io.read_edesign_from_dir(root_directory)` reads the experiment design saved beneath the given root directory, i.e., from `root_directory/edesign`. This also loads nested experiment designs.\n", "- `pygsti.io.read_data_from_dir(root_directory)` reads a `ProtocolData` from the given root directory, including nested data objects.\n", "- `pygsti.io.read_results_from_dir(root_directory)` reads a `ProtocolResultsDir` from the given root directory, including nested result directory objects.\n", "- `pygsti.io.read_protocol_from_dir(directory)` reads a `Protocol` object from the given directory. This directory holds only the protocol's contents, i.e. it is *not* like the `root_directory` arguments to the previous functions.\n", "\n", "We demonstrate some of this functionality with a simple example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    import shutil\n", "    shutil.rmtree(\"../tutorial_files/example_root_directory\") # start with a clean slate, in case you rerun this tutorial\n", "except FileNotFoundError:\n", "    pass\n", "from pygsti.modelpacks import smq1Q_XYI\n", "edesign = smq1Q_XYI.create_gst_experiment_design(1)\n", "edesign.write(\"../tutorial_files/example_root_directory\") # creates .../example_root_directory/edesign" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load back in the experiment design (just to demo this function):\n", "loaded_edesign = pygsti.io.read_edesign_from_dir(\"../tutorial_files/example_root_directory\")\n", "\n", "#Create & write an empty ProtocolData object to the same root directory\n", "# (this is often useful because it creates an empty data set file in .../example_root_directory/data\n", "# that can be used as a template for the real experimental data)\n", "pygsti.io.write_empty_protocol_data(\"../tutorial_files/example_root_directory\", loaded_edesign, clobber_ok=True)\n", "\n", "#Alternatively, we can create an empty ProtocolData object and then write it:\n", "data = pygsti.protocols.ProtocolData(loaded_edesign)\n", "data.write() # same as .write(\"../tutorial_files/example_root_directory\") because loaded_edesign remembers\n", " # where it was loaded from and this is the default destination (see loaded_edesign._loaded_from)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Next, let's create a protocol to run on the data.\n", "protocol = pygsti.protocols.GateSetTomography(smq1Q_XYI.target_model(\"full TP\"), name=\"TP_GST_Analysis\")\n", "\n", "#We can read/write this separately, though we don't really need to since it will get saved as a part\n", "# of the results later on. 
Still, to demonstrate this, here's how to do it:\n", "protocol.write(\"../tutorial_files/example_gst_protocol\")\n", "loaded_protocol = pygsti.io.read_protocol_from_dir(\"../tutorial_files/example_gst_protocol\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Now, generate some simulated data and run the protocol on it to produce some results\n", "\n", "# Mimic filling in or replacing the empty \"template\" dataset file (dataset.txt) with actual data\n", "datagen_model = smq1Q_XYI.target_model().depolarize(op_noise=0.05, spam_noise=0.2)\n", "pygsti.io.fill_in_empty_dataset_with_fake_data(\"../tutorial_files/example_root_directory/data/dataset.txt\",\n", "                                               datagen_model, num_samples=1000, seed=2021)\n", "\n", "# Load the just-simulated data as a ProtocolData object and run our GST protocol on it:\n", "loaded_data = pygsti.io.read_data_from_dir(\"../tutorial_files/example_root_directory\")\n", "results = protocol.run(loaded_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save the results:\n", "results.write() # same as .write(\"../tutorial_files/example_root_directory\") because the results' edesign remembers\n", " # the root directory it was loaded from and this is the default destination." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point we have a directory containing an experiment design, a corresponding set of data, and some results from analyzing that data. It may be interesting to poke around in this [example_root_directory](../tutorial_files/example_root_directory) and take note of where things are. Notice how the subdirectory under `results` is the name we gave to the protocol above: `TP_GST_Analysis`.\n", "\n", "We can load the results back in, again using the *same* root directory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loaded_results_dir = pygsti.io.read_results_from_dir(\"../tutorial_files/example_root_directory\")\n", "\n", "print(\"Available results = \", list(loaded_results_dir.for_protocol.keys()))\n", "gst_results = loaded_results_dir.for_protocol['TP_GST_Analysis']\n", "\n", "gst_estimate = gst_results.estimates['TP_GST_Analysis'] # protocol name is also used as a default estimate name\n", "gst_model = gst_estimate.models['stdgaugeopt']\n", "dataset = gst_results.data.dataset # ProtocolResults -> ProtocolData -> DataSet\n", "\n", "nSigma = pygsti.tools.two_delta_logl_nsigma(gst_model, dataset)\n", "print(\"Model violation (goodness of fit) = %g sigma\" % nSigma)\n", "if nSigma < 2:\n", "    print(\" The GST model gives a very good fit to the data!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Something we want to emphasize is that all of the I/O function calls (except the simulated data generation and the optional separate saving of the protocol) are given the same, single root directory (`\"../tutorial_files/example_root_directory\"`). This is a key feature and the intended design of pyGSTi's I/O framework: a given experiment design, data, and analysis all live together in the same \"node\" (corresponding to a root directory) in a potential tree of experiments. In the example above, we just have a single node. By using nested experiment designs (e.g. `CombinedExperimentDesign` and `SimultaneousExperimentDesign` objects) a tree of such \"root directories\" (perhaps better called \"node directories\") can be built.
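\n", "\n", "For example, nesting two experiment designs under a single root might look like the following sketch (not run in this notebook; it assumes `CombinedExperimentDesign` accepts a dictionary of named sub-designs, and the names and paths are purely illustrative):\n", "\n", "```python\n", "# Hypothetical sketch: combine two GST experiment designs under one root directory\n", "edesign_short = smq1Q_XYI.create_gst_experiment_design(1)\n", "edesign_long = smq1Q_XYI.create_gst_experiment_design(2)\n", "nested = pygsti.protocols.CombinedExperimentDesign({'shortGST': edesign_short, 'longGST': edesign_long})\n", "nested.write(\"../tutorial_files/example_nested_root\")  # writes edesign/ plus shortGST/ and longGST/ subdirectories\n", "```\n", "\n", "Each sub-design name becomes one of the nested subdirectories described above (listed in `edesign/subdirs.json`), holding that sub-design's own circuits, data, and results. 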
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Serialization to pickle, JSON and MSGPACK formats\n", "\n", "**DEPRECATED: The following serialization functionality is deprecated and may not be supported/break in newer pysgti releases. Please migrate to updated json serialization functionality described above.*** \n", "\n", "PyGSTi contains support for reading and writing most (if not all) of its objects from and to the JSON and MessagePack formats using a pickle-like format. The modules `pygsti.io.json` and `pygsti.io.msgpack` mimic the more general Python `json` and `msgpack` packages (`json` is a standard package, `msgpack` is a separate package, and must be installed if you wish to use pyGSTi's MessagePack functionality). These, in turn, mimic the load/dump interface of the standard `pickle` module, so it's very easy to serialize data using any of these formats. Here's a brief summary of the mais advantages and disadvantages of each format:\n", "\n", "- pickle\n", " - **Advantages**: a standard Python package; very easy to use; can serialize almost anything.\n", " - **Disadvantages**: incompatibility between python2 and python3 pickles; can be large on disk (inefficient); not web safe.\n", "- json\n", " - **Advantages**: a standard Python package; web-safe character set.; *should* be the same on python2 or python3\n", " - **Disadvantages**: large on disk (inefficient)\n", "- msgpack\n", " - **Advantages**: *should* be the same on python2 or python3; efficient binary format (small on disk)\n", " - **Disadvantages**: needs external `msgpack` package; binary non-web-safe format.\n", " \n", "As stated above, because this format is fragile (will completely break when pyGSTi internals change) and not human readable, this should be used as a last resort or just as temporary storage. Below we demonstrate how to use the `io.json` and `io.msgpack` modules. Using `pickle` is essentially the same, as pyGSTi objects support being pickled too." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "import pygsti.serialization.json as json\n", "import pygsti.serialization.msgpack as msgpack\n", "\n", "#Models\n", "from pygsti.modelpacks import smq1Q_XYI\n", "target_model = smq1Q_XYI.target_model()\n", "json.dump(target_model, open(\"../tutorial_files/TestModel.json\",'w'))\n", "target_model_from_json = json.load(open(\"../tutorial_files/TestModel.json\"))\n", "\n", "msgpack.dump(target_model, open(\"../tutorial_files/TestModel.mp\",'wb'))\n", "target_model_from_msgpack = msgpack.load(open(\"../tutorial_files/TestModel.mp\", 'rb'))\n", "\n", "#DataSets\n", "json.dump(ds, open(\"../tutorial_files/TestDataSet.json\",'w'))\n", "ds_from_json = json.load(open(\"../tutorial_files/TestDataSet.json\"))\n", "\n", "msgpack.dump(ds, open(\"../tutorial_files/TestDataSet.mp\",'wb'))\n", "ds_from_msgpack = msgpack.load(open(\"../tutorial_files/TestDataSet.mp\",'rb'))\n", "\n", "#MultiDataSets\n", "json.dump(multiDS, open(\"../tutorial_files/TestMultiDataSet.json\",'w'))\n", "multiDS_from_json = json.load(open(\"../tutorial_files/TestMultiDataSet.json\"))\n", "\n", "msgpack.dump(multiDS, open(\"../tutorial_files/TestMultiDataSet.mp\",'wb'))\n", "multiDS_from_msgpack = msgpack.load(open(\"../tutorial_files/TestMultiDataSet.mp\",'rb'))\n", "\n", "# Timestamped-data DataSets\n", "json.dump(tdds_fromfile, open(\"../tutorial_files/TestTDDataset.json\",'w'))\n", "tdds_from_json = json.load(open(\"../tutorial_files/TestTDDataset.json\"))\n", "\n", "msgpack.dump(tdds_fromfile, open(\"../tutorial_files/TestTDDataset.mp\",'wb'))\n", "tdds_from_msgpack = msgpack.load(open(\"../tutorial_files/TestTDDataset.mp\",'rb'))\n", "\n", "#Circuit Lists\n", "json.dump(cList, open(\"../tutorial_files/TestCircuitList.json\",'w'))\n", "cList_from_json = json.load(open(\"../tutorial_files/TestCircuitList.json\"))\n", "\n", "msgpack.dump(cList, open(\"../tutorial_files/TestCircuitList.mp\",'wb'))\n", "cList_from_msgpack = msgpack.load(open(\"../tutorial_files/TestCircuitList.mp\",'rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }