{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Access Saved Data\n", "\n", "**This tutorial shows Databroker 2.0.0, which is in prerelease. It is currently being evaluated and tested at some NSLS-II beamlines.**\n", "\n", "**To run this, first we need to point Jupyter at a separate Python environment. Execute the cell below, then click \"Python 3\" in the top-right corner of this notebook, and choose \"Python 3 (preview)\" from the options that appear.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!/srv/conda/envs/preview/bin/python -m ipykernel install --user --name preview --display-name \"Python 3 (preview)\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we will acquire data with the Bluesky RunEngine, persist it in a database (MongoDB), and then use the Databroker/Tiled Python client to access it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bluesky_tutorial_utils import setup_data_saving_future_version\n", "from bluesky import RunEngine\n", "\n", "RE = RunEngine()\n", "catalog = setup_data_saving_future_version(RE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bluesky.plans import scan, count\n", "from ophyd.sim import det, motor, motor1, motor2\n", "\n", "from bluesky.preprocessors import SupplementalData\n", "\n", "# Record positions of motor1 and motor2 at the beginning and end of\n", "# every run in the \"baseline\" stream.\n", "sd = SupplementalData(baseline=[motor1, motor2])\n", "RE.preprocessors.append(sd)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "RE(count([det], 3), purpose=\"calibration\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "RE(scan([det], motor, -1, 1, 5), mood=\"optimistic\", sample={\"color\": \"red\", 
\"composition\": \"Ni\"})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "RE(scan([det], motor, -1, 1, 5), mood=\"skeptical\", sample={\"color\": \"red\", \"composition\": \"Ni\"})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(uid,) = RE(scan([det], motor, -1, 1, 5), mood=\"optimistic\", sample={\"color\": \"blue\", \"composition\": \"Cu\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What can you do with a Catalog?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look up by recency." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look up by `scan_id`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look up by (partial) universally unique ID." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "uid" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog[uid]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "uid[:8]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog[uid[:8]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate over entries like a dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for uid, run in catalog.items():\n", " print(f\"{uid[:8]}: {run}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or do anything you can do with a (read-only) `dict`. 
This shows that `catalog` implements Python's standard \"mapping\" interface." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import collections.abc\n", "\n", "isinstance(catalog, collections.abc.Mapping)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In summary:\n", "```python\n", "catalog[-1] # the most recent Run\n", "catalog[-5] # the fifth-most-recent Run\n", "catalog[3] # 'scan_id' == 3 (if ambiguous, returns the most recent match)\n", "catalog[\"6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced\"] # a full globally unique ID...\n", "catalog[\"6f3ee9\"] # ...or just enough characters to uniquely identify it (6-8 usually suffices)\n", "```\n", "\n", "The globally unique ID is best for use in scripts, but the others are nice for interactive use. All of these incantations return a `BlueskyRun`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run = catalog[-1]\n", "run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Catalogs also support search." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from databroker.queries import FullText, TimeRange, RawMongo # more to come..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(RawMongo(start={\"plan_name\": \"count\"}))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(FullText(\"optimistic\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(TimeRange(since=\"2020\", until=\"2020-03-01\", timezone=\"Canada/Central\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(TimeRange(since=\"2020\", timezone=\"Canada/Central\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you search on a Catalog, you get another Catalog with a subset of the entries. You can search on this in turn, progressively narrowing the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(RawMongo(start={\"sample.color\": \"red\"}))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog.search(RawMongo(start={\"sample.color\": \"red\"})).search(FullText(\"optimistic\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Try various searches." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What can you do with a BlueskyRun?\n", "\n", "A `BlueskyRun` bundles together some metadata and several logical tables (\"streams\") of data. First, the metadata. It always comes in two sections, `\"start\"` and `\"stop\"`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run.start # Everything we know before the measurements start." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above contains a mixture of things that bluesky automatically recorded (e.g. the time), things the bluesky plan reported (e.g. which motor(s) are scanned), and things the user told us (e.g. the name of the operator)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run.stop # Everything we only know after the measurements stop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can dig into the contents in the usual way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run.start[\"num_points\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run.stop[\"exit_status\"] == \"success\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we said, a Run bundles together any number of \"streams\" of data. Picture these as tables or spreadsheets. The stream names are shown when we print `run`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also list them programmatically." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(run)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access a particular stream like `run[\"primary\"].read()`. Dot access also works — `run.primary.read()` — if the stream name is a valid Python identifier and does not collide with any other attributes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = run[\"primary\"].read()\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an `xarray.Dataset`. At this point Bluesky and Databroker have served their purpose and handed us a useful, general-purpose scientific Python data structure with our data in it.\n", "\n", "## What can you do with an `xarray.Dataset`?\n", "\n", "We can easily generate scatter plots of one dimension vs another." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.plot.scatter(x=\"time\", y=\"det\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can pull out specific columns. (Each column in an `xarray.Dataset` is called an `xarray.DataArray`.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "motor = ds[\"motor\"]\n", "motor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inside this `xarray.DataArray` is a plain old numpy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(motor.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The extra context provided by xarray is very useful. Notice that the dimensions have names, so we can perform aggregations over named axes without remembering the _order_ of the dimensions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `plot` method on `xarray.DataArray` often just \"does the right thing\" based on the dimensionality of the data. It even labels our axes for us!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "motor.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a quick overview of xarray see [the xarray documentation](https://xarray.pydata.org/en/stable/quick-overview.html)." 
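] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sketch of the named-dimension point above (this assumes the `motor` DataArray pulled out of the \"primary\" stream earlier), aggregations can name the dimension they reduce over, so there is no need to remember its positional axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Aggregate over the \"time\" dimension by name -- no axis order needed.\n", "motor.mean(dim=\"time\")"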
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Coming back to our `run`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "read the \"baseline\" stream. The baseline stream conventionally includes readings taken just before and after a scan to record all potentially relevant positions and temperatures and to note whether they have drifted." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Try your solution here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load solutions/access_baseline_data.py" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (preview)", "language": "python", "name": "preview" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }