{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Sets Tutorial\n",
    "This tutorial demonstrates how to create and use `DataSet` objects.  At its core Gate Set Tomography finds a gate set which best fits some experimental data, and in pyGSTi a `DataSet` is used to hold that data.  `DataSet`s are essentially nested dictionaries which associate a count (a number, typically an integer) with (gate string, SPAM label) pairs so that `dataset[gateString][spamLabel]` can be used to read & write the number of `spamLabel` outcomes of the experiment given by the sequence `gateString`.\n",
    "\n",
    "There are a few important differences between a `DataSet` and a dictionary-of-dictionaries:\n",
    "- `DataSet` objects can be in one of two modes: *static* or *non-static*.  When in *non-static* mode, data can be freely modified within the set, making this mode to use during the data-entry.  In the *static* mode, data cannot be modified and the `DataSet` is essentially read-only.  The `done_adding_data` method of a `DataSet` switches from non-static to static mode, and should be called, as the name implies, once all desired data has been added (or modified).  Once a `DataSet` is static, it is read-only for the rest of its life; to modify its data the best one can do is make a non-static *copy* via the `copy_nonstatic` member and modify the copy.\n",
    "\n",
    "- When data for a gate string is present in a `DataSet`, counts must exist for *all* SPAM labels.  That is, for a given gate string, you cannot store counts for only a subset of the SPAM labels.  Because of this condition, dictionary-access syntax of the SPAM label (i.e. `dataset[gateString][spamLabel]`) *cannot* be used to write counts for new `gateString` keys; One must either assign an entire dictionary of SPAM label-count pairs to `dataset[gateString]` or use the `add_`*xxx* methods (these methods add data for *all* SPAM labels at once). \n",
    "\n",
    "Once a `DataSet` is constructed, filled with data, and made *static*, it is typically passed as a parameter to one of pyGSTi's algorithm or driver routines to find a `GateSet` estimate based on the data.  This tutorial focuses on how to construct a `DataSet` and modify its data.  Later tutorials will demonstrate the different GST algorithms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pygsti"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Creating a `DataSet`\n",
    "There three basic ways to create `DataSet` objects in `pygsti`:\n",
    "* By creating an empty `DataSet` object and manually adding counts corresponding to gate strings.  Remember that the `add_`*xxx* methods must be used to add data for gate strings not yet in the `DataSet`.  Once the data is added, be sure to call `done_adding_data`, as this restructures the internal storage of the `DataSet` to optimize the access operations used by algorithms.\n",
    "* By loading from a text-format dataset file via `pygsti.io.load_dataset`.  The result is a ready-to-use-in-algorithms *static* `DataSet`, so there's no need to call `done_adding_data` this time.\n",
    "* By using a `GateSet` to generate \"fake\" data via `generate_fake_data`. This can be useful for doing simulations of GST, and comparing to your experimental results.\n",
    "\n",
    "We do each of these in turn in the cells below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#1) Creating a data set from scratch\n",
    "#    Note that tuples may be used in lieu of GateString objects\n",
    "ds1 = pygsti.objects.DataSet(spamLabels=['plus','minus'])\n",
    "ds1.add_count_dict( ('Gx',), {'plus': 10, 'minus': 90} )\n",
    "ds1.add_count_dict( ('Gx','Gy'), {'plus': 40, 'minus': 60} )\n",
    "ds1[('Gy',)] = {'plus': 10, 'minus': 90} # dictionary assignment\n",
    "\n",
    "#Modify existing data using dictionary-like access\n",
    "ds1[('Gx',)]['plus'] = 15\n",
    "ds1[('Gx',)]['minus'] = 85\n",
    "\n",
    "#GateString objects can be used.\n",
    "gs = pygsti.objects.GateString( ('Gx','Gy'))\n",
    "ds1[gs]['plus'] = 45\n",
    "ds1[gs]['minus'] = 55\n",
    "\n",
    "ds1.done_adding_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading tutorial_files/Example_TinyDataset.txt: 100%\n"
     ]
    }
   ],
   "source": [
    "#2) By creating and loading a text-format dataset file.  The first\n",
    "#    row is a directive which specifies what the columns (after the\n",
    "#    first one) holds.  Other allowed values are \"plus frequency\", \n",
    "#    \"minus count\", etc.  Note that \"plus\" and \"minus\" in are the \n",
    "#    SPAM labels and must match those of any GateSet used in \n",
    "#    conjuction with this DataSet.\n",
    "dataset_txt = \\\n",
    "\"\"\"## Columns = plus count, count total\n",
    "{} 0 100\n",
    "Gx 10 90\n",
    "GxGy 40 60\n",
    "Gx^4 20 90\n",
    "\"\"\"\n",
    "with open(\"tutorial_files/Example_TinyDataset.txt\",\"w\") as tinydataset:\n",
    "    tinydataset.write(dataset_txt)\n",
    "ds2 = pygsti.io.load_dataset(\"tutorial_files/Example_TinyDataset.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#3) By generating fake data (using our example gate list string from the previous tutorial)\n",
    "\n",
    "#Load the example gate set from Tutorial 01\n",
    "gateset = pygsti.io.load_gateset(\"tutorial_files/Example_Gateset.txt\")\n",
    "\n",
    "#Depolarize it (Tutorial 01)\n",
    "depol_gateset = gateset.depolarize(gate_noise=0.1)\n",
    "\n",
    "#Load the example gatestring list from Tutorial 02\n",
    "gatestring_list = pygsti.io.load_gatestring_list(\"tutorial_files/Example_GatestringList.txt\")\n",
    "\n",
    "#Generate fake data (Tutorial 00)\n",
    "ds3 = pygsti.construction.generate_fake_data(depol_gateset, gatestring_list, nSamples=1000,\n",
    "                                             sampleError='binomial', seed=100)\n",
    "ds3b = pygsti.construction.generate_fake_data(depol_gateset, gatestring_list, nSamples=50,\n",
    "                                              sampleError='binomial', seed=100)\n",
    "\n",
    "#Write the ds3 and ds3b datasets to a file for later tutorials\n",
    "pygsti.io.write_dataset(\"tutorial_files/Example_Dataset.txt\", ds3, spamLabelOrder=['plus','minus']) \n",
    "pygsti.io.write_dataset(\"tutorial_files/Example_Dataset_LowCnts.txt\", ds3b) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Viewing `DataSets`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset1:\n",
      " Gx  :  {'plus': 15.0, 'minus': 85.0}\n",
      "GxGy  :  {'plus': 45.0, 'minus': 55.0}\n",
      "Gy  :  {'plus': 10.0, 'minus': 90.0}\n",
      "\n",
      "\n",
      "Dataset2:\n",
      " {}  :  {'plus': 0.0, 'minus': 100.0}\n",
      "Gx  :  {'plus': 10.0, 'minus': 80.0}\n",
      "GxGy  :  {'plus': 40.0, 'minus': 20.0}\n",
      "GxGxGxGx  :  {'plus': 20.0, 'minus': 70.0}\n",
      "\n",
      "\n",
      "Dataset3 is too big to print, so here it is truncated to Dataset2's strings\n",
      " {}  :  {'plus': 0.0, 'minus': 1000.0}\n",
      "Gx  :  {'plus': 499.0, 'minus': 501.0}\n",
      "GxGy  :  {'plus': 496.0, 'minus': 504.0}\n",
      "GxGxGxGx  :  {'plus': 171.0, 'minus': 829.0}\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#It's easy to just print them:\n",
    "print(\"Dataset1:\\n\", ds1)\n",
    "print(\"Dataset2:\\n\", ds2)\n",
    "print(\"Dataset3 is too big to print, so here it is truncated to Dataset2's strings\\n\", ds3.truncate(ds2.keys()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iteration over data sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[GateString(Gx), GateString(GxGy), GateString(Gy)]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A DataSet's keys() method returns a list of GateString objects\n",
    "ds1.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gatestring = Gx   , SPAM label = plus  , count = 15\n",
      "Gatestring = Gx   , SPAM label = minus , count = 85\n",
      "Gatestring = GxGy , SPAM label = plus  , count = 45\n",
      "Gatestring = GxGy , SPAM label = minus , count = 55\n",
      "Gatestring = Gy   , SPAM label = plus  , count = 10\n",
      "Gatestring = Gy   , SPAM label = minus , count = 90\n"
     ]
    }
   ],
   "source": [
    "# There are many ways to iterate over a DataSet.  Here's one:\n",
    "for gatestring in ds1.keys():\n",
    "    dsRow = ds1[gatestring]\n",
    "    for spamlabel in dsRow.keys():\n",
    "        print(\"Gatestring = %s, SPAM label = %s, count = %d\" % \\\n",
    "            (str(gatestring).ljust(5), str(spamlabel).ljust(6), dsRow[spamlabel]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}