{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Sets Tutorial\n", "This tutorial demonstrates how to create and use `DataSet` objects. PyGSTi uses `DataSet` objects to hold experimental or simulated data in the form of outcome counts. When a `DataSet` is used to hold time-independent data (the typical case, and all we'll look at in this tutorial), it essentially looks like a nested dictionary which associates operation sequences with dictionaries of (outcome-label,count) pairs so that `dataset[circuit][outcomeLabel]` can be used to read & write the number of `outcomeLabel` outcomes of the experiment given by the circuit `circuit`.\n", "\n", "There are a few important differences between a `DataSet` and a dictionary-of-dictionaries:\n", "- `DataSet` objects can be in one of two modes: *static* or *non-static*. When in *non-static* mode, data can be freely modified within the set, making this the mode to use during data entry. In the *static* mode, data cannot be modified and the `DataSet` is essentially read-only. The `done_adding_data` method of a `DataSet` switches from non-static to static mode, and should be called, as the name implies, once all desired data has been added (or modified). Once a `DataSet` is static, it is read-only for the rest of its life; to modify its data the best one can do is make a non-static *copy* via the `copy_nonstatic` member and modify the copy.\n", "\n", "- Because `DataSet`s may contain time-dependent data, the dictionary-access syntax for a single outcome label (i.e. `dataset[circuit][outcomeLabel]`) *cannot* be used to write counts for new `circuit` keys; one should instead use the `add_`*xxx* methods of the `DataSet` object.\n", "\n", "Once a `DataSet` is constructed, filled with data, and made *static*, it is typically passed as a parameter to one of pyGSTi's algorithm or driver routines to find a `Model` estimate based on the data. 
This tutorial focuses on how to construct a `DataSet` and modify its data. Later tutorials will demonstrate the different GST algorithms." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import pygsti" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a `DataSet`\n", "There are three basic ways to create `DataSet` objects in `pygsti`:\n", "* By creating an empty `DataSet` object and manually adding counts corresponding to operation sequences. Remember that the `add_`*xxx* methods must be used to add data for operation sequences not yet in the `DataSet`. Once the data is added, be sure to call `done_adding_data`, as this restructures the internal storage of the `DataSet` to optimize the access operations used by algorithms.\n", "* By loading from a text-format dataset file via `pygsti.io.load_dataset`. The result is a ready-to-use-in-algorithms *static* `DataSet`, so there's no need to call `done_adding_data` this time.\n", "* By using a `Model` to generate \"fake\" data via `generate_fake_data`. This can be useful for doing simulations of GST, and comparing to your experimental results.\n", "\n", "We do each of these in turn in the cells below." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#1) Creating a data set from scratch\n", "# Note that tuples may be used in lieu of Circuit objects\n", "ds1 = pygsti.objects.DataSet(outcomeLabels=['0','1'])\n", "ds1.add_count_dict( ('Gx',), {'0': 10, '1': 90} )\n", "ds1.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "ds1[('Gy',)] = {'0': 10, '1': 90} # dictionary assignment\n", "\n", "#Modify existing data using dictionary-like access\n", "ds1[('Gx',)]['0'] = 15\n", "ds1[('Gx',)]['1'] = 85\n", "\n", "#Circuit objects can also be used as keys.\n", "circuit = pygsti.objects.Circuit( ('Gx','Gy'))\n", "ds1[circuit]['0'] = 45\n", "ds1[circuit]['1'] = 55\n", "\n", "ds1.done_adding_data()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading ../tutorial_files/Example_TinyDataset.txt: 100%\n" ] } ], "source": [ "#2) By creating and loading a text-format dataset file. The first\n", "# row is a directive which specifies what the columns (after the\n", "# first one) hold. 
Note that \"0\" and \"1\" are the \n", "# SPAM labels and must match those of any Model used in \n", "# conjunction with this DataSet.\n", "dataset_txt = \\\n", "\"\"\"## Columns = 0 count, 1 count\n", "{} 0 100\n", "Gx 10 90\n", "GxGy 40 60\n", "Gx^4 20 90\n", "\"\"\"\n", "with open(\"../tutorial_files/Example_TinyDataset.txt\",\"w\") as tinydataset:\n", "    tinydataset.write(dataset_txt)\n", "ds2 = pygsti.io.load_dataset(\"../tutorial_files/Example_TinyDataset.txt\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#3) By generating fake data (using the std1Q_XYI standard model module)\n", "from pygsti.construction import std1Q_XYI\n", "\n", "#Depolarize the perfect X,Y,I model\n", "depol_gateset = std1Q_XYI.target_model().depolarize(op_noise=0.1)\n", "\n", "#Compute the sequences needed to perform Long Sequence GST on \n", "# this Model with sequences up to length 512\n", "circuit_list = pygsti.construction.make_lsgst_experiment_list(\n", " std1Q_XYI.target_model(), std1Q_XYI.prepStrs, std1Q_XYI.effectStrs,\n", " std1Q_XYI.germs, [1,2,4,8,16,32,64,128,256,512])\n", "\n", "#Generate fake data (Tutorial 00)\n", "ds3 = pygsti.construction.generate_fake_data(depol_gateset, circuit_list, nSamples=1000,\n", " sampleError='binomial', seed=100)\n", "ds3b = pygsti.construction.generate_fake_data(depol_gateset, circuit_list, nSamples=50,\n", " sampleError='binomial', seed=100)\n", "\n", "#Write the ds3 and ds3b datasets to a file for later tutorials\n", "pygsti.io.write_dataset(\"../tutorial_files/Example_Dataset.txt\", ds3, outcomeLabelOrder=['0','1']) \n", "pygsti.io.write_dataset(\"../tutorial_files/Example_Dataset_LowCnts.txt\", ds3b) " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Viewing `DataSets`" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset1:\n", " Gx : {('0',): 15.0, ('1',): 85.0}\n", 
"GxGy : {('0',): 45.0, ('1',): 55.0}\n", "Gy : {('0',): 10.0, ('1',): 90.0}\n", "\n", "\n", "Dataset2:\n", " {} : {('0',): 0.0, ('1',): 100.0}\n", "Gx : {('0',): 10.0, ('1',): 90.0}\n", "GxGy : {('0',): 40.0, ('1',): 60.0}\n", "Gx^4 : {('0',): 20.0, ('1',): 90.0}\n", "\n", "\n", "Dataset3 is too big to print, so here it is truncated to Dataset2's strings\n", " {} : {('0',): 1000.0, ('1',): 0.0}\n", "Gx : {('0',): 501.0, ('1',): 499.0}\n", "GxGy : {('0',): 496.0, ('1',): 504.0}\n", "Gx^4 : {('0',): 829.0, ('1',): 171.0}\n", "\n", "\n" ] } ], "source": [ "#It's easy to just print them:\n", "print(\"Dataset1:\\n\",ds1)\n", "print(\"Dataset2:\\n\",ds2)\n", "print(\"Dataset3 is too big to print, so here it is truncated to Dataset2's strings\\n\", ds3.truncate(ds2.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the outcome labels `'0'` and `'1'` appear as `('0',)` and `('1',)`**. This is because outcome labels in pyGSTi are tuples of time-ordered instrument element (see the [intermediate measurements tutorial](advanced/Instruments.ipynb)) and POVM effect labels. In the special but common case when there are no intermediate measurements, the outcome label is a 1-tuple of just the final POVM effect label. In this case, one may use the effect label itself (e.g. `'0'` or `'1'`) in place of the 1-tuple in almost all contexts, as it is automatically converted to the 1-tuple (e.g. `('0',)` or `('1',)`) internally. When printing, however, the 1-tuple is still displayed to remind the user of the more general structure contained in the `DataSet`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iteration over data sets" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Circuit(Gx), Circuit(GxGy), Circuit(Gy)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A DataSet's keys() method returns a list of Circuit objects\n", "ds1.keys()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Circuit = Circuit(Gx) , SPAM label = ('0',) , count = 15\n", "Circuit = Circuit(Gx) , SPAM label = ('1',) , count = 85\n", "Circuit = Circuit(GxGy), SPAM label = ('0',) , count = 45\n", "Circuit = Circuit(GxGy), SPAM label = ('1',) , count = 55\n", "Circuit = Circuit(Gy) , SPAM label = ('0',) , count = 10\n", "Circuit = Circuit(Gy) , SPAM label = ('1',) , count = 90\n" ] } ], "source": [ "# There are many ways to iterate over a DataSet. Here's one:\n", "for circuit in ds1.keys():\n", "    dsRow = ds1[circuit]\n", "    for spamlabel in dsRow.counts.keys():\n", "        print(\"Circuit = %s, SPAM label = %s, count = %d\" % \\\n", "            (repr(circuit).ljust(13), str(spamlabel).ljust(7), dsRow[spamlabel]))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Advanced features of data sets\n", "\n", "### `collisionAction` argument\n", "When creating a `DataSet` one may specify the `collisionAction` argument as either `\"aggregate\"` (the default) or `\"keepseparate\"`. The former instructs the `DataSet` to simply add the counts of like outcomes when counts are added for an already existing gate sequence. `\"keepseparate\"`, on the other hand, causes the `DataSet` to tag added count data by appending a fictitious `\"#<number>\"` operation label to a gate sequence that already exists, where `<number>` is an integer. 
When retrieving the keys of a `keepseparate` data set, the `stripOccuranceTags` argument to `keys()` determines whether the `\"#\"` labels are included in the output (if they are not, which is the default, duplicate keys may be returned). Access to different occurrences of the same data is provided via the `occurrance` argument of the `get_row` and `set_row` functions, which should be used instead of the usual bracket indexing." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Aggregate-mode Dataset:\n", " GxGy : {('0',): 40.0, ('1',): 60.0}\n", "\n", "\n", "Keepseparate-mode Dataset:\n", " GxGy : {('0',): 10.0, ('1',): 90.0}\n", "GxGy#1 : {('0',): 40.0, ('1',): 60.0}\n", "\n", "\n" ] } ], "source": [ "ds_agg = pygsti.objects.DataSet(outcomeLabels=['0','1'], collisionAction=\"aggregate\") #the default\n", "ds_agg.add_count_dict( ('Gx','Gy'), {'0': 10, '1': 90} )\n", "ds_agg.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "print(\"Aggregate-mode Dataset:\\n\",ds_agg)\n", "\n", "ds_sep = pygsti.objects.DataSet(outcomeLabels=['0','1'], collisionAction=\"keepseparate\")\n", "ds_sep.add_count_dict( ('Gx','Gy'), {'0': 10, '1': 90} )\n", "ds_sep.add_count_dict( ('Gx','Gy'), {'0': 40, '1': 60} )\n", "print(\"Keepseparate-mode Dataset:\\n\",ds_sep)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Related topics\n", "This concludes our overview of `DataSet` objects. Here are a couple of related topics we didn't touch on that might be of interest:\n", "- the ability of a `DataSet` to store **time-dependent data** (in addition to the count data described above) is covered in the [timestamped data tutorial](advanced/TimestampedDataSets.ipynb).\n", "- the `MultiDataSet` object, which is similar to a dictionary of `DataSet`s, is described in the [MultiDataSet tutorial](advanced/MultiDataSet.ipynb). 
`MultiDataSet` objects are useful when you have several data sets containing outcomes for the *same* set of circuits, like when you do multiple passes of the same experimental procedure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 1 }