{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# WrightTools for Numpy Users\n", "As scientists transitioned to the Python Scientific Library during its rise in popularity, [new users needed translations to their familiar software, such as Matlab](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html). In the same way, it's important to compare Numpy strategies for organizing data with the framework of WrightTools. This notebook attempts to show how common tools of Numpy (especially advanced indexing) can translate to the WrightTools framework relatively easily.\n", "\n", "These examples are concerned with the `Data` objects that retain the rectangular shapes of ndarrays. The benefits of numpy arrays can generally be accessed within WrightTools. It is a subset of the more general `Data` object representations that can be used (via [making axes that are linear combinations of variables](http://wright.tools/en/stable/auto_examples/fringes_transform.html?highlight=transform)). \n", "\n", "Don't forget to also consult [the WrightTools documentation](https://wright.tools) or examine [WrightTools test scripts](https://github.com/wright-group/WrightTools/tree/master/tests) for examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## numpy ndarrays --> `data` object" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import WrightTools as wt\n", "print(np.__version__) # tested on 1.18.1\n", "print(wt.__version__) # tested on 3.3.1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "x = np.linspace(0, 1, 5) # Hz\n", "y = np.linspace(500, 700, 3) # nm\n", "z = np.exp(-x[:, None]) * np.sqrt(y - 500)[None, :]\n", "\n", "data = wt.Data()\n", "data.create_channel(name=\"z\", values=z)\n", "# BE SURE TO BROADCAST REDUCED DIM VARIABLES--this is how wt.Data knows dimensionality\n", "data.create_variable(name=\"x\", values=x[:, None], units=\"Hz\") \n", "data.create_variable(name=\"y\", values=y[None, :], units=\"nm\")\n", "data.transform(\"x\", \"y\") # create axes\n", "\n", "data.print_tree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we need to broadcast the variables into the data object. The reason is similar to why we need to broadcast `x` and `y` arrays when defining the `z` array: otherwise it is not clear that the data is multidimensional. Failing to broadcast is a \"gotcha\", because WrightTools will still create `data` (just as with the z-array, you _can_ make a 1D array from `x` and `y`); you will only run into problems once you try to work with the `data`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**WARNING**: Array index order does not correspond to axes numbers!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print([ax.natural_name for ax in data.axes])\n", "print(f\"data.z.shape= {data.z.shape}\")\n", "\n", "data.transform(\"y\", \"x\")\n", "print([ax.natural_name for ax in data.axes]) # order of axes switches\n", "print(f\"data.z.shape = {data.z.shape}\") # shape of channel does not" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing, advanced indexing" ] }, { "source": [ "Under the hood, WrightTools objects, such as `Data`, `Channel`, and `Variable`, are a type of `hdf5` object, _not_ numpy arrays (for more info, read about the [wt5 format](http://wright.tools/en/stable/wt5.html)). As such, slice commands _emulate_ ndarray behavior, and [only support a subset of all numpy array indexing tricks](https://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing). " ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Access datasets as numpy arrays using slice commands:\n", "All regular slice commands work. Using a null slice returns a full view of the channel or variable as a numpy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print(type(data.z))\n", "print(type(data.z[:])) # a view of the z values as a numpy array\n", "print(data.z[:5, 2:3]) # typical slicing arguments works here as well" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**REMINDER**: the relationship between axis number and channel indices is not fixed (cf. `data.transform`, above), and can be difficult to discern. For this reason, index slicing can get confusing quickly, especially if several dimensions have the same size. For a versatile option that leverages the strengths of WrightTools, use the **`split`** and **`chop`** methods to slice and iterate along dimensions, respectively. Examples using both methods are shown further below." ] }, { "source": [ "### Do not use `newaxis` or `None` indexing to expand dimensionality" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "try: # raises TypeError\n", " data.z[..., np.newaxis] # or, equivalently, data.z[..., None]\n", "except TypeError:\n", " print(\"didn't work!\")" ] }, { "source": [ "A quick workaround, of course, is to work directly with the ndarray view (i.e. `data.z[:]`), which is a numpy array and accepts all regular numpy indexing tricks. For example, `newaxis` works here, as do other advanced indexing methods:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "data.z[:][..., np.newaxis] # no error\n", "data.z[:][np.isnan(data.z[:])] # no error" ] }, { "source": [ "Be careful, though! Applying this indexing will not give you write capabilities:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp = data.copy(verbose=False)\n", "temp.z[:][..., np.newaxis] *= -1 # no error, but z does not change because boolean indexing copies the view\n", "\n", "print(np.all(temp.z[:] == data.z[:])) # temp.z is unchanged!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatives to expanding dimensionality include making a new data object with expanded ndarrays, or using `wt.data.join`. We show how to expand dimensionality using `wt.data.join` further below." ] }, { "source": [ "### Do not use [boolean array indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html#boolean-array-indexing) on channels and variables - consider `split`" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "positive = z > 0 # first column is False\n", "z_advind = z[positive] # traditional boolean array indexing with a numpy array\n", "\n", "try:\n", " data.z[positive] # doesn't work\n", "except TypeError:\n", " print(\"Boolean indexing of channel did not work!\")" ] }, { "source": [ "`data.z` cannot be indexed with a boolean array: instead, the `split` method provides the analogous indexing tricks. To use `split`, we first establish the boolean logic as an `expression`, and then use `split` to parse that expression.\n", "\n", "For this example, we can pass the boolean logic it as a `Variable`, and then split based on the variable value:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "temp = data.copy()\n", "temp.create_variable(name=\"zvar\", values=positive)\n", "zpositive = temp.split(\"zvar\", [True])[1]\n", "print(data.z[:], '\\n')\n", "print(positive, '\\n')\n", "print(zpositive.z[:], '\\n')" ] }, { "source": [ "Here's [a few more good examples of using `split`](http://wright.tools/en/stable/auto_examples/split.html)" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing (Keep Dimensionality) --> `data.split`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "z_subset = z[x >= 0.5]\n", "data_subset = data.split(\"x\", [0.5], units=\"Hz\", verbose=False)[1]\n", "\n", "print(\"ndim:\", data_subset.ndim)\n", "print(np.all(data_subset.z[:] == z_subset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing (Reduce Dimensionality) --> `data.chop`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "z_subset = z[2] # z_subset = z[x==x[2]] is equivalent\n", "data_subset = data.chop(\"y\", at={\"x\": [data.x[2], \"Hz\"]}, verbose=False)[0]\n", "\n", "print(\"ndim:\", data_subset.ndim)\n", "print(np.all(data_subset.z[:] == z_subset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use chop to loop through reduced dimensions of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# option 1: iterate through collection\n", "chop = data.chop(\"y\")\n", "for di in chop.values():\n", " print(di.constants)\n", "print(\"\\r\")\n", "# option 2: iterate through points, use \"at\" kwarg\n", "for xi in data.x.points:\n", " di = data.chop(\"y\", at={\"x\": [xi, data.x.units]}, verbose=False)[0]\n", " print(di.constants)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For very large datasets, the second option is useful because you never deal with the whole collection. Thus you can loop through individual chop elements and close them after each iteration, saving memory. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `np.newaxis` (or `arr[:, None]`) --> `create_variable` \n", "For a Data object to understand another dimension, create a new variable for the dataset (and transform to the new variable). Since `np.newaxis` makes an orthogonal array dimension, the new variable will be a constant over the data currently spanned: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "z_na = z[..., None]\n", "\n", "new_data = data.copy()\n", "new_data.create_variable(name=\"Temperature\", values=np.ones((1, 1)))\n", "# note the variable shape--variable is broadcast to all elements of the data\n", "# optional: declare Temperature a constant via `create constant`\n", "# new_data.create_constant(\"Temperature\")\n", "new_data.transform(\"x\", \"y\", \"Temperature\")\n", "\n", "print(\"z_na.shape: \", z_na.shape, f\" ({z_na.ndim}D)\")\n", "print(\"new_data.shape: \", new_data.shape, f\" ({new_data.ndim}D)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have added a new variable, but `new_data` does not increase dimensionality. Dimensionality corresponds to the array shapes, not the number of variables (the dimensionality would still be 2 even if `Temperature` changed for each `x` and `y` coordinate).\n", "\n", "Even though the dimensionality has not changed, `new_data` now understands another axis is in play. The above procedure allows us to expand the dimensionality via `join`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `np.concatenate`/`tile`/`stack`/`block`, etc --> `wt.data.join`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Case I: increase dimensionality (stack, block)\n", "If we have two datasets with a trivial dimension of different values, we can combine them to achieve a higher dimensionality data object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "new_data2 = data.copy(verbose=False)\n", "new_data2.create_variable(name=\"Temperature\", values=np.ones((1, 1))*2)\n", "# new_data2.create_constant(\"Temperature\")\n", "new_data2.transform(\"x\", \"y\", \"Temperature\")\n", "\n", "data_with_temps = wt.data.join([new_data, new_data2])\n", "data_with_temps.print_tree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this strategy can be used to undo the `chop` operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "chopped = data.chop(\"y\", verbose=False) # data objects in `chopped` have the same axes and points, but differing \"y\" values\n", "# pre-condition data as higher dimensionality\n", "for di in chopped.values():\n", " di.transform(\"x\", \"y\")\n", "stacked = wt.data.join(chopped.values(), name=\"stacked\", verbose=False)\n", "stacked.print_tree()\n", "\n", "print(np.all(stacked.z[:] == data.z[:]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Case II: same dimensionality/mixed dimensionality (concatenate)\n", "\n", "This problem is equivalent to inverting `split`. Note that this rectification will only recompose the array shape to within a transpose." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "splitted = data.split(\"x\", [0.5], units=\"Hz\") # data objects with the same axes, but different points\n", "concatenated = wt.data.join(splitted.values(), name=\"concatenated\", verbose=False) \n", "\n", "print(data.shape, concatenated.shape) # note: different shapes can arise!\n", "print(np.all(data.z[:].T == concatenated.z[:]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Channel Array Math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binary (channel + constant) operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "z **= 2\n", "data.z **= 2 # works for +, -, /, *, **\n", "\n", "print(np.all(data.z[:] == z))\n", "\n", "data.z **= 0.5\n", "z **= 0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binary (channel + channel) Operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# within the same data object:\n", "data.create_channel(name=\"zed\", values=-data.z[:])\n", "data.zed += data.z\n", "\n", "print(data.zed[:])\n", "data.remove_channel(\"zed\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# between two data objects\n", "data2 = data.copy(verbose=False)\n", "data2.z += data.z\n", "\n", "print(np.all(data2.z[:] == 2 * data.z[:]))\n", "data2.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable Math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables require tricky syntax to change (the above channel math will _not_ work). But the following will work:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "change_x = data.copy(verbose=False)\n", "x = change_x[\"x\"] # the x reference is necessary to use setitem (*=, +=, etc.) syntax on a variable\n", "x **=2\n", "print(np.all(data[\"x\"][:]**2 == change_x[\"x\"][:]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, do you really want to change your variables, or just use different units? It's often the latter. In that case, apply `convert` to the axes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "units_data = data.copy()\n", "units_data.convert(\"wn\") # all axes with frequency/wavelength units will be converted to wavenumbers\n", "units_data.print_tree()\n", "units_data.x.convert(\"Hz\") # apply conversion only to x axis\n", "units_data.print_tree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maybe you do need to do math that is not a unit conversion (for instance, shift delay values). If needed, you can overwrite the old variable by removing it and renaming the new variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print(*data.x[:])\n", "# define replacement variable\n", "data.create_variable(name=\"_x\", values=np.linspace(0, 2, data.x.size).reshape(data[\"x\"].shape), units=data.x.units)\n", "# remove target variable\n", "data.transform(\"y\")\n", "data.remove_variable(\"x\")\n", "# replace target variable\n", "data.rename_variables(_x=\"x\")\n", "data.transform(\"x\", \"y\")\n", "data.print_tree()\n", "print(*data.x[:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Axes Math - `transform`\n", "Axes are just expressions of variables, so the scope of `Axis` math is the linear combinations of variables (plus offsets). Keep in mind that calling linear combinations of variables will force the Data object to rectify the units of all variables involved." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "data = data.copy(verbose=False)\n", "data.transform(\"2*x\", \"3*y\") # do not use spaces when defining axes\n", "print(*data.axes)\n", "data.transform(\"2*x-y\", \"2*y\") # note that axis[0] is now 2D\n", "print(*data.axes)\n", "data.transform(\"x-2\", \"y\") # constant 2 is interpreted in units of `data.x.units`\n", "print(*data.axes)" ] }, { "source": [ "## Summary\n", "\n", "We've gone over many of the common array manipulations that make numpy so powerful, and shown how similar manipulations can be done within the WrightTools framework. Hopefully these examples will empower you explore your datasets with ease!\n", "\n", "Need more examples? Are there some numpy manipulations you are still curious about? Give us feedback by making an issue at our [GitHub page](https://github.com/wright-group/WrightTools/issues)!" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4-final" } }, "nbformat": 4, "nbformat_minor": 4 }