{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time-dependent data in Data Sets\n",
    "The [DataSet tutorial](../DataSet.ipynb) covered the basics of using `DataSet` objects with time-independent counts. When your data is time-stamped, either for each individual count or for groups of counts, richer options for analysis become available.  The `DataSet` class can store such time-dependent data by holding *series* of count data rather than binned numbers of counts; these series are added via its `add_series_data` and `add_raw_series_data` methods.  With `add_raw_series_data`, outcome counts are input by giving at least two parallel arrays of 1) outcome labels and 2) time stamps.  Optionally, one can provide a third array of repetitions, specifying how many times the corresponding outcome occurred at its time stamp.  While in reality no two outcomes are taken at exactly the same time, a `DataSet` allows for arbitrarily *coarse-grained* time-dependent data in which multiple outcomes are all tagged with the *same* time stamp.  In fact, the \"time-independent\" case considered in the aforementioned tutorial is just a special case in which all of the data is stamped at *time=0*.\n",
    "\n",
    "Below we demonstrate how to create and initialize a `DataSet` using time series data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pygsti\n",
    "\n",
    "#Create an empty dataset                                                                       \n",
    "tdds = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n",
    "\n",
    "#Add a \"single-shot\" series of outcomes, where each spam label (outcome) has a separate time stamp\n",
    "tdds.add_raw_series_data( ('Gx',), #gate sequence                                                                 \n",
    "            ['0','0','1','0','1','0','1','1','1','0'], #spam labels                                                                                                                 \n",
    "            [0.0, 0.2, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.35, 1.5]) #time stamps                                                                                              \n",
    "\n",
    "#When adding outcome-counts in \"chunks\" where the counts of each\n",
    "# chunk occur at nominally the same time, use 'add_series_data' to\n",
    "# add a list of count dictionaries with a timestamp given for each dict:\n",
    "tdds.add_series_data( ('Gx','Gx'),  #gate sequence                                                               \n",
    "                      [{'0':10, '1':90}, {'0':30, '1':70}], #count dicts                                                         \n",
    "                      [0.0, 1.0]) #time stamps - one per dictionary                                                               \n",
    "\n",
    "#For even more control, you can specify the timestamp of each count\n",
    "# event or group of identical outcomes that occur at the same time:\n",
    "#Add 3 '0' outcomes at time 0.0, followed by 2 '1' outcomes at time 1.0\n",
    "tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               \n",
    "                      ['0','1'], #spam labels                                                         \n",
    "                      [0.0, 1.0], #time stamps                                                               \n",
    "                      [3,2]) #repeats  \n",
    "\n",
    "#The above coarse-grained addition is logically identical to:\n",
    "# tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               \n",
    "#                       ['0','0','0','1','1'], #spam labels                                                         \n",
    "#                       [0.0, 0.0, 0.0, 1.0, 1.0]) #time stamps                                                               \n",
    "# (However, the DataSet will store the coarse-grained addition more efficiently.)"
   ]
  },
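  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The equivalence between the coarse-grained and single-shot forms can be illustrated without pyGSTi. The cell below is a minimal sketch in plain Python; the helper names `flatten_count_dicts` and `expand_repetitions` are hypothetical and are not part of the pyGSTi API, and this is not meant to describe how a `DataSet` stores data internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative helpers only (hypothetical names, not pyGSTi functions)\n",
    "def flatten_count_dicts(count_dicts, time_stamps):\n",
    "    # One (label, time, repetitions) entry per dictionary item\n",
    "    labels, times, reps = [], [], []\n",
    "    for counts, t in zip(count_dicts, time_stamps):\n",
    "        for label, n in counts.items():\n",
    "            labels.append(label)\n",
    "            times.append(t)\n",
    "            reps.append(n)\n",
    "    return labels, times, reps\n",
    "\n",
    "def expand_repetitions(labels, times, reps):\n",
    "    # Repeat each label and time stamp according to its repetition count\n",
    "    expanded_labels, expanded_times = [], []\n",
    "    for label, t, n in zip(labels, times, reps):\n",
    "        expanded_labels.extend([label] * n)\n",
    "        expanded_times.extend([t] * n)\n",
    "    return expanded_labels, expanded_times\n",
    "\n",
    "# The coarse-grained ('Gy',) data from above, expanded to single shots\n",
    "print(expand_repetitions(['0', '1'], [0.0, 1.0], [3, 2]))\n",
    "\n",
    "# The ('Gx','Gx') count dictionaries written as parallel arrays\n",
    "print(flatten_count_dicts([{'0': 10, '1': 90}, {'0': 30, '1': 70}], [0.0, 1.0]))"
   ]
  },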
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the `DataSet` has been populated with data, one should still call `done_adding_data`, just as in the time-independent case:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "tdds.done_adding_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The underlying time series data is accessed by indexing on the gate sequence. Just as in the time-independent case, this returns a `DataSetRow` object, which has various attributes and methods for retrieving its data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO for Gx string:\n",
      "\n",
      "Outcome Label Indices = [0 0 1 0 1 0 1 1 1 0]\n",
      "Time stamps = [0.   0.2  0.5  0.6  0.7  0.9  1.1  1.3  1.35 1.5 ]\n",
      "Repetitions = [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
      "\n",
      "Raw outcome label indices: [0 0 1 0 1 0 1 1 1 0]\n",
      "Raw time stamps: [0.   0.2  0.5  0.6  0.7  0.9  1.1  1.3  1.35 1.5 ]\n",
      "Raw repetitions: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
      "Number of entries in raw arrays: 10\n",
      "Outcome Labels: [('0',), ('0',), ('1',), ('0',), ('1',), ('0',), ('1',), ('1',), ('1',), ('0',)]\n",
      "Repetition-expanded outcome labels: [('0',), ('0',), ('1',), ('0',), ('1',), ('0',), ('1',), ('1',), ('1',), ('0',)]\n",
      "Repetition-expanded outcome label indices: [0 0 1 0 1 0 1 1 1 0]\n",
      "Repetition-expanded time stamps: [0.   0.2  0.5  0.6  0.7  0.9  1.1  1.3  1.35 1.5 ]\n",
      "Time-independent-like counts per spam label: OutcomeLabelDict([(('0',), 5.0), (('1',), 5.0)])\n",
      "Time-independent-like total counts: 10.0\n",
      "Time-independent-like spam label fraction: OrderedDict([(('0',), 1.0)])\n",
      "\n",
      "\n",
      "INFO for Gy string:\n",
      "\n",
      "Outcome Label Indices = [0 1]\n",
      "Time stamps = [0. 1.]\n",
      "Repetitions = [3. 2.]\n",
      "\n",
      "Raw outcome label indices: [0 1]\n",
      "Raw time stamps: [0. 1.]\n",
      "Raw repetitions: [3. 2.]\n",
      "Number of entries in raw arrays: 2\n",
      "Spam Labels: [('0',), ('1',)]\n",
      "Repetition-expanded outcome labels: [('0',), ('0',), ('0',), ('1',), ('1',)]\n",
      "Repetition-expanded outcome label indices: [0 0 0 1 1]\n",
      "Repetition-expanded time stamps: [0. 0. 0. 1. 1.]\n",
      "Time-independent-like counts per spam label: OutcomeLabelDict([(('0',), 3.0), (('1',), 2.0)])\n",
      "Time-independent-like total counts: 5.0\n",
      "Time-independent-like spam label fraction: OrderedDict([(('0',), 1.0)])\n"
     ]
    }
   ],
   "source": [
    "tdds_row = tdds[('Gx',)]\n",
    "print(\"INFO for Gx string:\\n\")\n",
    "print( tdds_row )\n",
    "      \n",
    "print( \"Raw outcome label indices:\", tdds_row.oli )\n",
    "print( \"Raw time stamps:\", tdds_row.time )\n",
    "print( \"Raw repetitions:\", tdds_row.reps )\n",
    "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n",
    "\n",
    "print( \"Outcome Labels:\", tdds_row.outcomes )\n",
    "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n",
    "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n",
    "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n",
    "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n",
    "print( \"Time-independent-like total counts:\", tdds_row.total )\n",
    "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n",
    "\n",
    "print(\"\\n\")\n",
    "\n",
    "tdds_row = tdds[('Gy',)]\n",
    "print(\"INFO for Gy string:\\n\")\n",
    "print( tdds_row )\n",
    "      \n",
    "print( \"Raw outcome label indices:\", tdds_row.oli )\n",
    "print( \"Raw time stamps:\", tdds_row.time )\n",
    "print( \"Raw repetitions:\", tdds_row.reps )\n",
    "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n",
    "\n",
    "print( \"Spam Labels:\", tdds_row.outcomes )\n",
    "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n",
    "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n",
    "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n",
    "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n",
    "print( \"Time-independent-like total counts:\", tdds_row.total )\n",
    "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n",
    "\n"
   ]
  },
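  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"time-independent-like\" counts and total printed above are aggregates over a row's series. As a rough illustration in plain Python (the helper name `aggregate_series` is hypothetical and not part of pyGSTi), they can be thought of as sums of repetitions per outcome label:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "# Illustrative helper only (hypothetical name, not a pyGSTi function)\n",
    "def aggregate_series(labels, reps):\n",
    "    # Sum the repetitions of each outcome label, ignoring time stamps\n",
    "    counts = Counter()\n",
    "    for label, n in zip(labels, reps):\n",
    "        counts[label] += n\n",
    "    total = sum(counts.values())\n",
    "    return dict(counts), total\n",
    "\n",
    "# The ('Gy',) series: 3 '0' outcomes and 2 '1' outcomes\n",
    "counts, total = aggregate_series(['0', '1'], [3, 2])\n",
    "print(counts, total)\n",
    "print(counts['1'] / total)  # fraction of '1' outcomes"
   ]
  },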
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, it is possible to read text-formatted time-dependent data in the special case when\n",
    "\n",
    "1. the outcomes are all single-shot\n",
    "2. the time stamps of the outcomes are the integers (starting at zero) for *all* of the operation sequences.\n",
    "\n",
    "This corresponds to the case when each sequence is performed and measured simultaneously at equally spaced intervals.  We realize this is a bit fictitious, and more text-format input options will be created in the future."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading ../../tutorial_files/TDDataset.txt: 100%\n",
      "Dataset outcomes: OrderedDict([(('0',), 0), (('1',), 1)])\n",
      "{} :\n",
      "Outcome Label Indices = [0 1 1 0 0 1]\n",
      "Time stamps = [0. 1. 2. 3. 4. 5.]\n",
      "( no repetitions )\n",
      "\n",
      "Gx :\n",
      "Outcome Label Indices = [1 1 1 0 0 0 1 1 1]\n",
      "Time stamps = [0. 1. 2. 3. 4. 5. 6. 7. 8.]\n",
      "( no repetitions )\n",
      "\n",
      "Gy :\n",
      "Outcome Label Indices = [1 1 0 0 1 1 0 0]\n",
      "Time stamps = [0. 1. 2. 3. 4. 5. 6. 7.]\n",
      "( no repetitions )\n",
      "\n",
      "\n",
      "\n",
      "Some tests:\n",
      "0.5\n",
      "0.5\n",
      "9.0\n"
     ]
    }
   ],
   "source": [
    "tddataset_txt = \\\n",
    "\"\"\"## 0 = 0                                                                                                                   \n",
    "## 1 = 1                                                                                                                      \n",
    "{} 011001                                                                                                                     \n",
    "Gx 111000111                                                                                                                  \n",
    "Gy 11001100                                                                                                                   \n",
    "\"\"\"\n",
    "with open(\"../../tutorial_files/TDDataset.txt\",\"w\") as output:\n",
    "    output.write(tddataset_txt)\n",
    "tdds_fromfile = pygsti.io.load_tddataset(\"../../tutorial_files/TDDataset.txt\")\n",
    "print(tdds_fromfile)\n",
    "\n",
    "print(\"Some tests:\")\n",
    "print(tdds_fromfile[()].fraction('1'))\n",
    "print(tdds_fromfile[('Gy',)].fraction('1'))\n",
    "print(tdds_fromfile[('Gx',)].total)"
   ]
  },
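  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two conditions above concrete, here is a minimal sketch of how one data line of this text format maps onto outcome labels and time stamps. It is written in plain Python with a hypothetical helper, `parse_single_shot_line`; it is not pyGSTi's parser, it ignores the `##` header lines, and it assumes each character is one outcome label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative helper only (hypothetical name, not pyGSTi's loader)\n",
    "def parse_single_shot_line(line):\n",
    "    sequence, outcomes = line.split()\n",
    "    # One single-shot outcome per character\n",
    "    labels = list(outcomes)\n",
    "    # Time stamps are the integers 0, 1, 2, ... in order of appearance\n",
    "    times = [float(i) for i in range(len(labels))]\n",
    "    return sequence, labels, times\n",
    "\n",
    "print(parse_single_shot_line('Gx 111000111'))"
   ]
  },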
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}