{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## unyt_dask arrays\n", "\n", "This notebook demonstrates the latest version of the `unyt_dask_array` implementation at https://github.com/chrishavlin/unyt/tree/dask_unyt \n", "\n", "This implementation adds dask as an optional dependency to `unyt`, and subclasses `dask.array.core.Array` to create a unyt array with dask abilities. \n", "\n", "The main access point is through the `unyt_from_dask` function, which takes a dask array and user-specified units information to create a `unyt_dask_array` object:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>800.00 MB</td><td>8.00 MB</td></tr>\n",
"<tr><th>Shape</th><td>(10000, 10000)</td><td>(1000, 1000)</td></tr>\n",
"<tr><th>Count</th><td>100 Tasks</td><td>100 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>m</td><td>m</td></tr>\n",
"</table>
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from unyt.dask_array import unyt_from_dask\n", "from dask import array as dask_array\n", "\n", "x = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'm')\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This array behaves like a dask array, so that when operations are applied, we initially only build the dask execution graph:\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
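<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>8 B</td><td>8 B</td></tr>\n",
"<tr><th>Shape</th><td>()</td><td>()</td></tr>\n",
"<tr><th>Count</th><td>339 Tasks</td><td>1 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>m</td><td>m</td></tr>\n",
"</table>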
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = (x * 2).mean()\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we execute that graph, we get back a base `unyt_quantity` or `unyt_array`, depending on the number of elements returned:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(1.00004815, 'm')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.compute()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_array([0.99678626, 0.99558319, 1.00601329, ..., 1.00921666,\n", " 0.99226075, 1.00205869], 'm')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = (x * 2).mean(1)\n", "result.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Addition and subtraction follow unyt behavior: both operands must have units. 
If adding a constant, it must be a `unyt_quantity`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "ename": "UnitOperationError", "evalue": "The operator for unyt_arrays with units \"m\" (dimensions \"(length)\") and \"dimensionless\" (dimensions \"1\") is not well defined.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mUnitOperationError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# this will error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/src/yt_general/unyt/unyt/dask_array.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 190\u001b[0m \u001b[0mfuncname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mthe_func\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 191\u001b[0m \u001b[0mufunc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mua\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munyt_quantity\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfuncname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 192\u001b[0;31m \u001b[0mnewargs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0munyt_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_prep_ufunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mufunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mextract_dask\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 193\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 194\u001b[0m \u001b[0mdasksuperfunk\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mArray\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfuncname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/src/yt_general/unyt/unyt/dask_array.py\u001b[0m in \u001b[0;36m_prep_ufunc\u001b[0;34m(ufunc, extract_dask, *input, **kwargs)\u001b[0m\n\u001b[1;32m 166\u001b[0m \u001b[0;31m# apply the operation to the hidden unyt_quantities\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 167\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0munyt_inputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_sanitize_unit_args\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0minput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 168\u001b[0;31m \u001b[0munyt_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mufunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0munyt_inputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 169\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 170\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mextract_dask\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/src/yt_general/unyt/unyt/array.py\u001b[0m in \u001b[0;36m__array_ufunc__\u001b[0;34m(self, ufunc, method, *inputs, **kwargs)\u001b[0m\n\u001b[1;32m 1769\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mUnitOperationError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mufunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mu0\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mu1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1770\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1771\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mUnitOperationError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mufunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mu0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mu1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1772\u001b[0m \u001b[0mconv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moffset\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mu1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_conversion_factor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mu0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minp1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1773\u001b[0m \u001b[0mnew_dtype\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"f\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minp1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitemsize\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mUnitOperationError\u001b[0m: The operator for unyt_arrays with units \"m\" (dimensions \"(length)\") and \"dimensionless\" (dimensions \"1\") is not well defined." ] } ], "source": [ "# this will error\n", "result = x + 2" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>800.00 MB</td><td>8.00 MB</td></tr>\n",
"<tr><th>Shape</th><td>(10000, 10000)</td><td>(1000, 1000)</td></tr>\n",
"<tr><th>Count</th><td>200 Tasks</td><td>100 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>m</td><td>m</td></tr>\n",
"</table>
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from unyt import unyt_quantity\n", "\n", "result = x + unyt_quantity(10, 'm')\n", "result" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(10.50002407, 'm')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.mean().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `unyt_dask_array` class converts units of the same dimension before calculation, following normal unyt behavior:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(10.50002407, 'm')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = x + unyt_quantity(1000, 'cm')\n", "result.mean().compute()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(10.50002407, 'm')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = x.to('km') + unyt_quantity(1000, 'cm')\n", "result.mean().compute().to('m')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same conversion applies when combining multiple `unyt_dask_array` objects:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>800.00 MB</td><td>8.00 MB</td></tr>\n",
"<tr><th>Shape</th><td>(10000, 10000)</td><td>(1000, 1000)</td></tr>\n",
"<tr><th>Count</th><td>1000 Tasks</td><td>100 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>km</td><td>km</td></tr>\n",
"</table>
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'm')\n", "x2 = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'cm')\n", "x3 = unyt_from_dask(dask_array.random.random((10000,10000), chunks=(1000,1000)), 'km')\n", "\n", "x4 = (x1 * x2 + x3 * x2) / x1\n", "x4" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(5.565572e-05, 'km')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x4.mean().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a dask client is active, then execution is managed by the client:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from dask.distributed import Client\n", "client = Client(threads_per_worker=4, n_workers=1)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<h3>Client</h3>\n",
"<h3>Cluster</h3>\n",
"<ul>\n",
"<li>Workers: 1</li>\n",
"<li>Cores: 4</li>\n",
"<li>Memory: 33.51 GB</li>\n",
"</ul>
" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "x_da = unyt_from_dask(dask_array.random.random((10000, 10000), chunks=(1000, 1000)), 'm')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
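<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>8 B</td><td>8 B</td></tr>\n",
"<tr><th>Shape</th><td>()</td><td>()</td></tr>\n",
"<tr><th>Count</th><td>239 Tasks</td><td>1 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>m</td><td>m</td></tr>\n",
"</table>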
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x_da.min()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n",
"<tr><td></td><th>Array</th><th>Chunk</th></tr>\n",
"<tr><th>Bytes</th><td>800.00 MB</td><td>8.00 MB</td></tr>\n",
"<tr><th>Shape</th><td>(10000, 10000)</td><td>(1000, 1000)</td></tr>\n",
"<tr><th>Count</th><td>200 Tasks</td><td>100 Chunks</td></tr>\n",
"<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n",
"<tr><th>Units</th><td>km</td><td>km</td></tr>\n",
"</table>
" ], "text/plain": [ "unyt_dask_array" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x_da.to('km')" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(0.001, 'km')" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x_da.to('km').max().compute()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }