{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Sets Tutorial\n", "This tutorial demonstrates how to create and use `DataSet` objects. At its core Gate Set Tomography finds a gate set which best fits some experimental data, and in pyGSTi a `DataSet` is used to hold that data. When a `DataSet` is used to hold time-independent data, it essentially looks like a nested dictionary which associates gate strings with dictionaries of (outcome-label,count) pairs so that `dataset[gateString][outcomeLabel]` can be used to read & write the number of `outcomeLabel` outcomes of the experiment given by the sequence `gateString`.\n", "\n", "There are a few important differences between a `DataSet` and a dictionary-of-dictionaries:\n", "- `DataSet` objects can be in one of two modes: *static* or *non-static*. When in *non-static* mode, data can be freely modified within the set, making this mode to use during the data-entry. In the *static* mode, data cannot be modified and the `DataSet` is essentially read-only. The `done_adding_data` method of a `DataSet` switches from non-static to static mode, and should be called, as the name implies, once all desired data has been added (or modified). Once a `DataSet` is static, it is read-only for the rest of its life; to modify its data the best one can do is make a non-static *copy* via the `copy_nonstatic` member and modify the copy.\n", "\n", "- Because `DataSet`s may contain time-dependent data, the dictionary-access syntax for a single outcome label (i.e. `dataset[gateString][outcomeLabel]`) *cannot* be used to write counts for new `gateString` keys; One should instead use the `add_`*xxx* methods of the `DataSet` object.\n", "\n", "Once a `DataSet` is constructed, filled with data, and made *static*, it is typically passed as a parameter to one of pyGSTi's algorithm or driver routines to find a `GateSet` estimate based on the data. This tutorial focuses on how to construct a `DataSet` and modify its data. Later tutorials will demonstrate the different GST algorithms." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import pygsti" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a `DataSet`\n", "There three basic ways to create `DataSet` objects in `pygsti`:\n", "* By creating an empty `DataSet` object and manually adding counts corresponding to gate strings. Remember that the `add_`*xxx* methods must be used to add data for gate strings not yet in the `DataSet`. Once the data is added, be sure to call `done_adding_data`, as this restructures the internal storage of the `DataSet` to optimize the access operations used by algorithms.\n", "* By loading from a text-format dataset file via `pygsti.io.load_dataset`. The result is a ready-to-use-in-algorithms *static* `DataSet`, so there's no need to call `done_adding_data` this time.\n", "* By using a `GateSet` to generate \"fake\" data via `generate_fake_data`. This can be useful for doing simulations of GST, and comparing to your experimental results.\n", "\n", "We do each of these in turn in the cells below." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#1) Creating a data set from scratch\n", "# Note that tuples may be used in lieu of GateString objects\n", "ds1 = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n", "ds1.add_count_dict( ('Gx',), {'0': 10, '1': 90} )\n", "ds1.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "ds1[('Gy',)] = {'0': 10, '1': 90} # dictionary assignment\n", "\n", "#Modify existing data using dictionary-like access\n", "ds1[('Gx',)]['0'] = 15\n", "ds1[('Gx',)]['1'] = 85\n", "\n", "#GateString objects can be used.\n", "gs = pygsti.objects.GateString( ('Gx','Gy'))\n", "ds1[gs]['0'] = 45\n", "ds1[gs]['1'] = 55\n", "\n", "ds1.done_adding_data()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading tutorial_files/Example_TinyDataset.txt: 100%\n" ] } ], "source": [ "#2) By creating and loading a text-format dataset file. The first\n", "# row is a directive which specifies what the columns (after the\n", "# first one) holds. Other allowed values are \"0 frequency\", \n", "# \"1 count\", etc. Note that \"0\" and \"1\" in are the \n", "# SPAM labels and must match those of any GateSet used in \n", "# conjuction with this DataSet.\n", "dataset_txt = \\\n", "\"\"\"## Columns = plus count, count total\n", "{} 0 100\n", "Gx 10 90\n", "GxGy 40 60\n", "Gx^4 20 90\n", "\"\"\"\n", "with open(\"tutorial_files/Example_TinyDataset.txt\",\"w\") as tinydataset:\n", " tinydataset.write(dataset_txt)\n", "ds2 = pygsti.io.load_dataset(\"tutorial_files/Example_TinyDataset.txt\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#3) By generating fake data (using the std1Q_XYI standard gate set module)\n", "from pygsti.construction import std1Q_XYI\n", "\n", "#Depolarize the perfect X,Y,I gate set\n", "depol_gateset = std1Q_XYI.gs_target.depolarize(gate_noise=0.1)\n", "\n", "#Compute the sequences needed to perform Long Sequence GST on \n", "# this GateSet with sequences up to lenth 512\n", "gatestring_list = pygsti.construction.make_lsgst_experiment_list(\n", " std1Q_XYI.gs_target, std1Q_XYI.prepStrs, std1Q_XYI.effectStrs,\n", " std1Q_XYI.germs, [1,2,4,8,16,32,64,128,256,512])\n", "\n", "#Generate fake data (Tutorial 00)\n", "ds3 = pygsti.construction.generate_fake_data(depol_gateset, gatestring_list, nSamples=1000,\n", " sampleError='binomial', seed=100)\n", "ds3b = pygsti.construction.generate_fake_data(depol_gateset, gatestring_list, nSamples=50,\n", " sampleError='binomial', seed=100)\n", "\n", "#Write the ds3 and ds3b datasets to a file for later tutorials\n", "pygsti.io.write_dataset(\"tutorial_files/Example_Dataset.txt\", ds3, outcomeLabelOrder=['0','1']) \n", "pygsti.io.write_dataset(\"tutorial_files/Example_Dataset_LowCnts.txt\", ds3b) " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Viewing `DataSets`" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset1:\n", " Gx : {('0',): 15.0, ('1',): 85.0}\n", "GxGy : {('0',): 45.0, ('1',): 55.0}\n", "Gy : {('0',): 10.0, ('1',): 90.0}\n", "\n", "\n", "Dataset2:\n", " Gx : {('plus',): 10.0}\n", "GxGy : {('plus',): 40.0}\n", "Gx^4 : {('plus',): 20.0}\n", "\n", "\n", "Dataset3 is too big to print, so here it is truncated to Dataset2's strings\n", " Gx : {('0',): 501.0, ('1',): 499.0}\n", "GxGy : {('0',): 504.0, ('1',): 496.0}\n", "Gx^4 : 
{('0',): 829.0, ('1',): 171.0}\n", "\n", "\n" ] } ], "source": [ "#It's easy to just print them:\n", "print(\"Dataset1:\\n\",ds1)\n", "print(\"Dataset2:\\n\",ds2)\n", "print(\"Dataset3 is too big to print, so here it is truncated to Dataset2's strings\\n\", ds3.truncate(ds2.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the outcome labels `'0'` and `'1'` appear as `('0',)` and `('1',)`**. This is because outcome labels in pyGSTi are tuples of time-ordered instrument-element labels (allowing for intermediate measurements) followed by a final POVM effect label. In the special but common case when there are no intermediate measurements, the outcome label is a 1-tuple of just the final POVM effect label. In this case, one may use the effect label itself (e.g. `'0'` or `'1'`) in place of the 1-tuple in almost all contexts, as it is automatically converted to the 1-tuple (e.g. `('0',)` or `('1',)`) internally. When printing, however, the 1-tuple is still displayed to remind the user of the more general structure contained in the `DataSet`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iteration over data sets" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[GateString(Gx), GateString(GxGy), GateString(Gy)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A DataSet's keys() method returns a list of GateString objects\n", "ds1.keys()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gatestring = Gx , SPAM label = ('0',), count = 15\n", "Gatestring = Gx , SPAM label = ('1',), count = 85\n", "Gatestring = GxGy , SPAM label = ('0',), count = 45\n", "Gatestring = GxGy , SPAM label = ('1',), count = 55\n", "Gatestring = Gy , SPAM label = ('0',), count = 10\n", "Gatestring = Gy , SPAM label = ('1',), count = 90\n" ] } ], "source": [ "# There are many ways to iterate over a DataSet. Here's one:\n", "for gatestring in ds1.keys():\n", " dsRow = ds1[gatestring]\n", " for spamlabel in dsRow.counts.keys():\n", " print(\"Gatestring = %s, SPAM label = %s, count = %d\" % \\\n", " (str(gatestring).ljust(5), str(spamlabel).ljust(6), dsRow[spamlabel]))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Advanced features of data sets\n", "\n", "### `collisionAction` argument\n", "When creating a `DataSet` one may specify the `collisionAction` argument as either `\"aggregate\"` (the default) or `\"keepseparate\"`. The former instructs the `DataSet` to simply add the counts of like outcomes when counts are added for an already existing gate sequence. `\"keepseparate\"`, on the other hand, causes the `DataSet` to tag added count data by appending a fictitious `\"#<n>\"` gate label to a gate sequence that already exists, where `<n>` is an integer. When retrieving the keys of a `keepseparate` data set, the `stripOccuranceTags` argument to `keys()` determines whether the `\"#<n>\"` labels are included in the output (if they're not - the default - duplicate keys may be returned). Access to the different occurrences of the same gate sequence is provided via the `occurrance` argument of the `get_row` and `set_row` functions, which should be used instead of the usual bracket indexing."
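] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before comparing the two collision modes in the next cell, here is a self-contained sketch of occurrence-based access using the `get_row` function named above. The occurrence index is passed positionally because the exact keyword spelling is an assumption here; treat this as illustrative rather than definitive." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Sketch of occurrence-based access for a \"keepseparate\" DataSet.\n", "# get_row/set_row and their occurrence argument are named in the text above;\n", "# the exact call signature used here is an assumption.\n", "ds_demo = pygsti.objects.DataSet(outcomeLabels=['0','1'], collisionAction=\"keepseparate\")\n", "ds_demo.add_count_dict( ('Gx','Gy'), {'0': 10, '1': 90} )\n", "ds_demo.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} ) # collides -> stored as \"GxGy#1\"\n", "row0 = ds_demo.get_row( ('Gx','Gy') )     # first occurrence (the default)\n", "row1 = ds_demo.get_row( ('Gx','Gy'), 1 )  # second occurrence\n", "print(row0.counts, row1.counts)"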
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Aggregate-mode Dataset:\n", " GxGy : {('0',): 40.0, ('1',): 60.0}\n", "\n", "\n", "Keepseparate-mode Dataset:\n", " GxGy : {('0',): 10.0, ('1',): 90.0}\n", "GxGy#1 : {('0',): 40.0, ('1',): 60.0}\n", "\n", "\n" ] } ], "source": [ "ds_agg = pygsti.objects.DataSet(outcomeLabels=['0','1'], collisionAction=\"aggregate\") #the default\n", "ds_agg.add_count_dict( ('Gx','Gy'), {'0': 10, '1': 90} )\n", "ds_agg.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "print(\"Aggregate-mode Dataset:\\n\",ds_agg)\n", "\n", "ds_sep = pygsti.objects.DataSet(outcomeLabels=['0','1'], collisionAction=\"keepseparate\")\n", "ds_sep.add_count_dict( ('Gx','Gy'), {'0': 10, '1': 90} )\n", "ds_sep.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "print(\"Keepseparate-mode Dataset:\\n\",ds_sep)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time-dependent data\n", "When your data is time-stamped, either for each individual count or by groups of counts, there are additional (richer) options for analysis. The `DataSet` class is also capable of storing time-dependent data by holding *series* of count data rather than binned numbers-of-counts, which are added via its `add_series_data` method. Outcome counts are input by giving at least two parallel arrays of 1) outcome labels and 2) time stamps. Optionally, one can provide a third array of repetitions, specifying how many times the corresponding outcome occurred at the time stamp. While in reality no two outcomes are taken at exactly the same time, a `TDDataSet` allows for arbitrarily *coarse-grained* time-dependent data in which multiple outcomes are all tagged with the *same* time stamp. In fact, the \"time-independent\" case considered in this tutorial so far is actually a special case in which the all data is stamped at *time=0*.\n", "\n", "Below we demonstrate how to create and initialize a `DataSet` using time series data." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "#Create an empty dataset \n", "tdds = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n", "\n", "#Add a \"single-shot\" series of outcomes, where each spam label (outcome) has a separate time stamp\n", "tdds.add_raw_series_data( ('Gx',), #gate sequence \n", " ['0','0','1','0','1','0','1','1','1','0'], #spam labels \n", " [0.0, 0.2, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.35, 1.5]) #time stamps \n", "\n", "#When adding outcome-counts in \"chunks\" where the counts of each\n", "# chunk occur at nominally the same time, use 'add_raw_series_data' to\n", "# add a list of count dictionaries with a timestamp given for each dict:\n", "tdds.add_series_data( ('Gx','Gx'), #gate sequence \n", " [{'0':10, '1':90}, {'0':30, '1':70}], #count dicts \n", " [0.0, 1.0]) #time stamps - one per dictionary \n", "\n", "#For even more control, you can specify the timestamp of each count\n", "# event or group of identical outcomes that occur at the same time:\n", "#Add 3 'plus' outcomes at time 0.0, followed by 2 'minus' outcomes at time 1.0\n", "tdds.add_raw_series_data( ('Gy',), #gate sequence \n", " ['0','1'], #spam labels \n", " [0.0, 1.0], #time stamps \n", " [3,2]) #repeats \n", "\n", "#The above coarse-grained addition is logically identical to:\n", "# tdds.add_raw_series_data( ('Gy',), #gate sequence \n", "# ['0','0','0','1','1'], #spam labels \n", "# [0.0, 0.0, 0.0, 1.0, 1.0]) #time stamps \n", "# (However, the DataSet will store the coase-grained addition more efficiently.) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When one is done populating the `DataSet` with data, one should still call `done_adding_data`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "tdds.done_adding_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access to the underlying time series data is done by indexing on the gate sequence (to get a `DataSetRow` object, just as in the time-independent case) which has various methods for retrieving its underlying data: " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO for Gx string:\n", "\n", "Outcome Label Indices = [0 0 1 0 1 0 1 1 1 0]\n", "Time stamps = [0. 0.2 0.5 0.6 0.7 0.9 1.1 1.3 1.35 1.5 ]\n", "Repetitions = [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n", "\n", "Raw outcome label indices: [0 0 1 0 1 0 1 1 1 0]\n", "Raw time stamps: [0. 0.2 0.5 0.6 0.7 0.9 1.1 1.3 1.35 1.5 ]\n", "Raw repetitions: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n", "Number of entries in raw arrays: 10\n", "Outcome Labels: [('0',), ('0',), ('1',), ('0',), ('1',), ('0',), ('1',), ('1',), ('1',), ('0',)]\n", "Repetition-expanded outcome labels: [('0',), ('0',), ('1',), ('0',), ('1',), ('0',), ('1',), ('1',), ('1',), ('0',)]\n", "Repetition-expanded outcome label indices: [0 0 1 0 1 0 1 1 1 0]\n", "Repetition-expanded time stamps: [0. 0.2 0.5 0.6 0.7 0.9 1.1 1.3 1.35 1.5 ]\n", "Time-independent-like counts per spam label: OutcomeLabelDict([(('0',), 5.0), (('1',), 5.0)])\n", "Time-independent-like total counts: 10.0\n", "Time-independent-like spam label fraction: OrderedDict([(('0',), 0.5), (('1',), 0.5)])\n", "\n", "\n", "INFO for Gy string:\n", "\n", "Outcome Label Indices = [0 1]\n", "Time stamps = [0. 1.]\n", "Repetitions = [3. 2.]\n", "\n", "Raw outcome label indices: [0 1]\n", "Raw time stamps: [0. 1.]\n", "Raw repetitions: [3. 
2.]\n", "Number of entries in raw arrays: 2\n", "Spam Labels: [('0',), ('1',)]\n", "Repetition-expanded outcome labels: [('0',), ('0',), ('0',), ('1',), ('1',)]\n", "Repetition-expanded outcome label indices: [0 0 0 1 1]\n", "Repetition-expanded time stamps: [0. 0. 0. 1. 1.]\n", "Time-independent-like counts per spam label: OutcomeLabelDict([(('0',), 3.0), (('1',), 2.0)])\n", "Time-independent-like total counts: 5.0\n", "Time-independent-like spam label fraction: OrderedDict([(('0',), 0.6), (('1',), 0.4)])\n" ] } ], "source": [ "tdds_row = tdds[('Gx',)]\n", "print(\"INFO for Gx string:\\n\")\n", "print( tdds_row )\n", " \n", "print( \"Raw outcome label indices:\", tdds_row.oli )\n", "print( \"Raw time stamps:\", tdds_row.time )\n", "print( \"Raw repetitions:\", tdds_row.reps )\n", "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n", "\n", "print( \"Outcome Labels:\", tdds_row.outcomes )\n", "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n", "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n", "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n", "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n", "print( \"Time-independent-like total counts:\", tdds_row.total )\n", "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n", "\n", "print(\"\\n\")\n", "\n", "tdds_row = tdds[('Gy',)]\n", "print(\"INFO for Gy string:\\n\")\n", "print( tdds_row )\n", " \n", "print( \"Raw outcome label indices:\", tdds_row.oli )\n", "print( \"Raw time stamps:\", tdds_row.time )\n", "print( \"Raw repetitions:\", tdds_row.reps )\n", "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n", "\n", "print( \"Spam Labels:\", tdds_row.outcomes )\n", "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n", "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n", "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n", "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n", "print( \"Time-independent-like total counts:\", tdds_row.total )\n", "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, it is possible to read text-formatted time-dependent data in the special case when\n", "1. the outcomes are all single-shot \n", "2. the time stamps of the outcomes are the integers (starting at zero) for *all* of the gate sequences.\n", "This corresponds to the case when each sequence is performed and measured simultaneously at equally spaced intervals. We realize this is a bit fictitous and more text-format input options will be created in the future." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `MultiDataSet` object: a dictionary of `DataSet`s\n", "Sometimes it is useful to deal with several sets of data all of which hold counts for the *same* set of gate sequences. For example, colleting data to perform GST on Monday and then again on Tuesday, or making an adjustment to an experimental system and re-taking data, could create two separate data sets with the same sequences. PyGSTi has a separate data type, `pygsti.objects.MultiDataSet`, for this purpose. 
A `MultiDataSet` looks and acts like a simple dictionary of `DataSet` objects, but underneath implements certain optimizations that reduce the amount of space and memory required to store the data. Primarily, it holds just a *single* list of the gate sequences - as opposed to an actual dictionary of `DataSet`s, in which each `DataSet` would contain its own copy of the gate sequences. In addition to being more space efficient, a `MultiDataSet` is able to aggregate all of its data into a single \"summed\" `DataSet` via `get_datasets_aggregate(...)`, which can be useful for combining several \"passes\" of experimental data. \n", "\n", "Several remarks regarding a `MultiDataSet` are worth mentioning:\n", "- you add `DataSet`s to a `MultiDataSet` using the `add_dataset` method. However, only *static* `DataSet` objects can be added. This is because the `MultiDataSet` must keep all of its `DataSet`s locked to the same set of sequences, and a non-static `DataSet` would allow sequences to be added or removed. (If the `DataSet` you want to add isn't in static mode, call its `done_adding_data` method.)\n", "- square-bracket indexing accesses the `MultiDataSet` as if it were a dictionary of `DataSet`s.\n", "- `MultiDataSet`s can be loaded from and saved to a single text-format file with columns for each contained `DataSet` - see `pygsti.io.load_multidataset`.\n", "\n", "Here's a brief example of using a `MultiDataSet`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MultiDataSet has 2 DataSets, with labels ['myDS', 'myDS2']\n", "Empty string data for myDS = {('0',): 10.0, ('1',): 90.0}\n", "Empty string data for myDS2 = {('0',): 15.0, ('1',): 85.0}\n", "Gx string data (no label) = {('0',): 10.0, ('1',): 90.0}\n", "Gx string data (no label) = {('0',): 5.0, ('1',): 95.0}\n", "GxGy string data for myDS = {('0',): 20.0, ('1',): 80.0}\n", "GxGy string data for myDS2 = {('0',): 30.0, ('1',): 70.0}\n", "\n", "Summed data:\n", "{} : {('0',): 25.0, ('1',): 175.0}\n", "Gx : {('0',): 15.0, ('1',): 185.0}\n", "GxGy : {('0',): 50.0, ('1',): 150.0}\n", "GxGxGxGx : {('0',): 60.0, ('1',): 140.0}\n", "\n", "\n" ] } ], "source": [ "multiDS = pygsti.objects.MultiDataSet()\n", "\n", "#Create some datasets \n", "ds = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n", "ds.add_count_dict( (), {'0': 10, '1': 90} )\n", "ds.add_count_dict( ('Gx',), {'0': 10, '1': 90} )\n", "ds.add_count_dict( ('Gx','Gy'), {'0': 20, '1': 80} )\n", "ds.add_count_dict( ('Gx','Gx','Gx','Gx'), {'0': 20, '1': 80} )\n", "ds.done_adding_data()\n", "\n", "ds2 = pygsti.objects.DataSet(outcomeLabels=['0','1']) \n", "ds2.add_count_dict( (), {'0': 15, '1': 85} )\n", "ds2.add_count_dict( ('Gx',), {'0': 5, '1': 95} )\n", "ds2.add_count_dict( ('Gx','Gy'), {'0': 30, '1': 70} )\n", "ds2.add_count_dict( ('Gx','Gx','Gx','Gx'), {'0': 40, '1': 60} )\n", "ds2.done_adding_data()\n", "\n", "multiDS['myDS'] = ds\n", "multiDS['myDS2'] = ds2\n", "\n", "nDatasets = len(multiDS)\n", "dslabels = list(multiDS.keys())\n", "print(\"MultiDataSet has %d DataSets, with labels %s\" % (nDatasets, dslabels))\n", " \n", "for dslabel in multiDS:\n", " ds = multiDS[dslabel]\n", " print(\"Empty string data for %s = \" % dslabel, ds[()]) \n", "\n", "for ds in multiDS.values():\n", " print(\"Gx string data (no label) =\", ds[('Gx',)]) \n", "\n", "for dslabel,ds in multiDS.items():\n", " print(\"GxGy string data for %s 
=\" % dslabel, ds[('Gx','Gy')]) \n", "\n", "dsSum = multiDS.get_datasets_aggregate('myDS','myDS2')\n", "print(\"\\nSummed data:\")\n", "print(dsSum)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading tutorial_files/TinyMultiDataset.txt: 100%\n", "\n", "Loaded from file:\n", "\n", "MultiDataSet containing: 2 datasets, each with 4 strings\n", " Dataset names = DS0, DS1\n", " Outcome labels = ('0',), ('1',)\n", "Gate strings: \n", "{}\n", "Gx\n", "GxGy\n", "Gx^4\n", "\n" ] } ], "source": [ "multi_dataset_txt = \\\n", "\"\"\"## Columns = DS0 0 count, DS0 1 count, DS1 0 frequency, DS1 count total \n", "{} 0 100 0 100 \n", "Gx 10 90 0.1 100 \n", "GxGy 40 60 0.4 100 \n", "Gx^4 20 80 0.2 100 \n", "\"\"\"\n", "\n", "with open(\"tutorial_files/TinyMultiDataset.txt\",\"w\") as output:\n", " output.write(multi_dataset_txt)\n", "multiDS_fromFile = pygsti.io.load_multidataset(\"tutorial_files/TinyMultiDataset.txt\", cache=False)\n", "\n", "print(\"\\nLoaded from file:\\n\")\n", "print(multiDS_fromFile)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }