{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time-dependent data in Data Sets\n",
    "The [DataSet tutorial](../DataSet.ipynb) covered the basics of how to use `DataSet` objects with time-independent counts. When your data is time-stamped, either for each individual count or by groups of counts, there are additional (richer) options for analysis.  The `DataSet` class is also capable of storing time-dependent data by holding *series* of count data rather than binned numbers-of-counts, which are added via its `add_series_data` method.  Outcome counts are input by giving at least two parallel arrays of 1) outcome labels and 2) time stamps.  Optionally, one can provide a third array of repetitions, specifying how many times the corresponding outcome occurred at the time stamp.  While in reality no two outcomes are taken at exactly the same time, a `DataSet` allows for arbitrarily *coarse-grained* time-dependent data in which multiple outcomes are all tagged with the *same* time stamp.  In fact, the \"time-independent\" case considered in the aforementioned tutorial is actually just a special case in which the all data is stamped at *time=0*.\n",
    "\n",
    "Below we demonstrate how to create and initialize a `DataSet` using time series data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pygsti\n",
    "\n",
    "#Create an empty dataset                                                                       \n",
    "tdds = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n",
    "\n",
    "#Add a \"single-shot\" series of outcomes, where each spam label (outcome) has a separate time stamp\n",
    "tdds.add_raw_series_data( ('Gx',), #gate sequence                                                                 \n",
    "            ['0','0','1','0','1','0','1','1','1','0'], #spam labels                                                                                                                 \n",
    "            [0.0, 0.2, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.35, 1.5]) #time stamps                                                                                              \n",
    "\n",
    "#When adding outcome-counts in \"chunks\" where the counts of each\n",
    "# chunk occur at nominally the same time, use 'add_raw_series_data' to\n",
    "# add a list of count dictionaries with a timestamp given for each dict:\n",
    "tdds.add_series_data( ('Gx','Gx'),  #gate sequence                                                               \n",
    "                      [{'0':10, '1':90}, {'0':30, '1':70}], #count dicts                                                         \n",
    "                      [0.0, 1.0]) #time stamps - one per dictionary                                                               \n",
    "\n",
    "#For even more control, you can specify the timestamp of each count\n",
    "# event or group of identical outcomes that occur at the same time:\n",
    "#Add 3 'plus' outcomes at time 0.0, followed by 2 'minus' outcomes at time 1.0\n",
    "tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               \n",
    "                      ['0','1'], #spam labels                                                         \n",
    "                      [0.0, 1.0], #time stamps                                                               \n",
    "                      [3,2]) #repeats  \n",
    "\n",
    "#The above coarse-grained addition is logically identical to:\n",
    "# tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               \n",
    "#                       ['0','0','0','1','1'], #spam labels                                                         \n",
    "#                       [0.0, 0.0, 0.0, 1.0, 1.0]) #time stamps                                                               \n",
    "# (However, the DataSet will store the coase-grained addition more efficiently.) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When one is done populating the `DataSet` with data, one should still call `done_adding_data`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "tdds.done_adding_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Access to the underlying time series data is done by indexing on the gate sequence (to get a `DataSetRow` object, just as in the time-independent case) which has various methods for retrieving its underlying data: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "tdds_row = tdds[('Gx',)]\n",
    "print(\"INFO for Gx string:\\n\")\n",
    "print( tdds_row )\n",
    "      \n",
    "print( \"Raw outcome label indices:\", tdds_row.oli )\n",
    "print( \"Raw time stamps:\", tdds_row.time )\n",
    "print( \"Raw repetitions:\", tdds_row.reps )\n",
    "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n",
    "\n",
    "print( \"Outcome Labels:\", tdds_row.outcomes )\n",
    "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n",
    "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n",
    "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n",
    "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n",
    "print( \"Time-independent-like total counts:\", tdds_row.total )\n",
    "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n",
    "\n",
    "print(\"\\n\")\n",
    "\n",
    "tdds_row = tdds[('Gy',)]\n",
    "print(\"INFO for Gy string:\\n\")\n",
    "print( tdds_row )\n",
    "      \n",
    "print( \"Raw outcome label indices:\", tdds_row.oli )\n",
    "print( \"Raw time stamps:\", tdds_row.time )\n",
    "print( \"Raw repetitions:\", tdds_row.reps )\n",
    "print( \"Number of entries in raw arrays:\", len(tdds_row) )\n",
    "\n",
    "print( \"Spam Labels:\", tdds_row.outcomes )\n",
    "print( \"Repetition-expanded outcome labels:\", tdds_row.get_expanded_ol() )\n",
    "print( \"Repetition-expanded outcome label indices:\", tdds_row.get_expanded_oli() )\n",
    "print( \"Repetition-expanded time stamps:\", tdds_row.get_expanded_times() )\n",
    "print( \"Time-independent-like counts per spam label:\", tdds_row.counts )\n",
    "print( \"Time-independent-like total counts:\", tdds_row.total )\n",
    "print( \"Time-independent-like spam label fraction:\", tdds_row.fractions )\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Text data-file formats\n",
    "\n",
    "It is possible to read text-formatted time-dependent data in two ways.\n",
    "\n",
    "The first way is for the special case when\n",
    "1. the outcomes are all single-shot \n",
    "2. the time stamps of the outcomes are the integers (starting at zero) for *all* of the operation sequences.\n",
    "This corresponds to the case when each sequence is performed and measured simultaneously at equally spaced intervals. This is a bit fictitous, but it allows for the compact format given below.  Currently, the only way to read in this format is using the separate `load_tddataset` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "tddataset_txt = \\\n",
    "\"\"\"## 0 = 0                                                                                                                   \n",
    "## 1 = 1                                                                                                                      \n",
    "{} 011001                                                                                                                     \n",
    "Gx 111000111                                                                                                                  \n",
    "Gy 11001100                                                                                                                   \n",
    "\"\"\"\n",
    "with open(\"../../tutorial_files/TDDataset.txt\",\"w\") as output:\n",
    "    output.write(tddataset_txt)\n",
    "tdds_fromfile = pygsti.io.load_tddataset(\"../../tutorial_files/TDDataset.txt\")\n",
    "print(tdds_fromfile)\n",
    "\n",
    "print(\"Some tests:\")\n",
    "print(tdds_fromfile[()].fraction('1'))\n",
    "print(tdds_fromfile[('Gy',)].fraction('1'))\n",
    "print(tdds_fromfile[('Gx',)].total)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second way can describe arbitrary timstamped data, and uses a more general format where each circuit is on a line by itself, followed by two or three subsequent lines giving the timestamps, the outcome labels, and (optionally) the repetition counts for that circuit.  If the repetition counts are not given, they are all assumed to equal 1.  This is the format that is needed to interact with nicely with `ProtocolData` objects, e.g. for use with `load_data_from_dir`.  Here's an example that creates the same `DataSet` as the one loaded in above, and then loads it in using the usual `load_dataset` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "general_tddataset_txt = \\\n",
    "\"\"\"{}\n",
    "times: 0  1  2  3  4  5\n",
    "outcomes: 0  1  1  0  0  1\n",
    "\n",
    "Gx\n",
    "times: 0  1  2  3  4  5  6  7  8\n",
    "outcomes: 1  1  1  0  0  0  1  1  1\n",
    "\n",
    "Gy\n",
    "times: 0  1  2  3  4  5  6  7\n",
    "outcomes: 1  1  0  0  1  1  0  0\n",
    "\n",
    "\"\"\"\n",
    "with open(\"../../tutorial_files/DatasetWithTimestamps.txt\",\"w\") as output:\n",
    "    output.write(general_tddataset_txt)\n",
    "\n",
    "#This format can be read in using the usual 'load'\n",
    "general_tdds_fromfile = pygsti.io.load_dataset(\"../../tutorial_files/DatasetWithTimestamps.txt\")\n",
    "print(general_tdds_fromfile)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `DatasetWithTimestamps.txt` file could also have been created by specifying `fixedColumnMode=False` to the usual `write_dataset` function, that is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "pygsti.io.write_dataset(\"../../tutorial_files/DatasetWithTimestamps.txt\", tdds_fromfile, fixedColumnMode=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're recording several passes through a set of circuits, and all the data on each pass is considered to occur at the same time (i.e. a course-graining of the time-stamped data), then it may be useful to specify the repetition counts.  For example, the following data file describes data that was taken in two passes (at time 1.0 and 2.0) of 100 circuit repetitions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "general_tddataset_txt = \\\n",
    "\"\"\"{}\n",
    "times:        1  1  2  2\n",
    "outcomes:     0  1  0  1\n",
    "repetitions: 20 80 25 75\n",
    "\n",
    "Gx\n",
    "times:        1  1  2  2\n",
    "outcomes:     0  1  0  1\n",
    "repetitions: 50 50 55 45\n",
    "\n",
    "Gy\n",
    "times:        1  1  2  2\n",
    "outcomes:     0  1  0  1\n",
    "repetitions: 63 37 52 48\n",
    "\n",
    "\"\"\"\n",
    "with open(\"../../tutorial_files/DatasetWith2Passes.txt\",\"w\") as output:\n",
    "    output.write(general_tddataset_txt)\n",
    "\n",
    "#This format can be read in using the usual 'load'\n",
    "twopass_ds = pygsti.io.load_dataset(\"../../tutorial_files/DatasetWith2Passes.txt\")\n",
    "print(twopass_ds)\n",
    "\n",
    "print(\"Some tests:\")\n",
    "print(twopass_ds[()].counts)  #total counts (aggregates over time)\n",
    "print(twopass_ds[()][1.0])    # counts at time=0.0 -- Note this must be a *float* or it's interpeted as in index\n",
    "print(twopass_ds[()][2.0])    # counts at time=1.0\n",
    "\n",
    "#fraction and total function act like they would if all the data was aggregated.\n",
    "print(twopass_ds[()].fraction('1'))\n",
    "print(twopass_ds[('Gy',)].fraction('1'))\n",
    "print(twopass_ds[('Gx',)].total)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}