{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TileDB Quickstart Delayed Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook will walk through usage of TileDB Cloud Delayed APIs\n", "\n", "We will start off using the high level delayed API and finish with a quick look at the lower level DAG functionality.\n", "\n", "First let's import the necessary packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import base packages\n", "import tiledb\n", "import tiledb.cloud\n", "from tiledb.cloud.compute import Delayed, DelayedSQL, DelayedArrayUDF\n", "import numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction To Delayed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generic Functions\n", "\n", "Any python function can be wrapped in a Delayed object making the function executable as a future" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "x = Delayed(numpy.median)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can see the type is now `Delayed`\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function can be called with parameters to store it and lazily executed" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x([1,2,3,4,5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To force an execution call `compute()`" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SQL and Arrays\n", "\n", "Besides arbitrary python functions, serverless sql queries and array based UDFs can also be called with the delayed API" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AVG(`a`)
02
\n", "
" ], "text/plain": [ " AVG(`a`)\n", "0 2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# SQL\n", "y = DelayedSQL(\"select AVG(`a`) FROM `tiledb://TileDB-Inc/quickstart_sparse`\")\n", "\n", "# Run query\n", "y.compute()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Array\n", "z = DelayedArrayUDF(\"tiledb://TileDB-Inc/quickstart_sparse\", lambda x: numpy.average(x[\"a\"]))([(1, 4), (1, 4)])\n", "\n", "# Run the udf on the array\n", "z.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Local Functions\n", "\n", "Lastly it is also possible to include a generic python function as delayed but have it run locally instead of serverlessly. This is useful for testing or for saving finalized results to your local machine, i.e. saving a image.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "local = Delayed(numpy.median, local=True)([1,2,3])\n", "local.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task Graphs\n", "\n", "Delayed objects can be combined into a task graph. Output from one function or query can be passed into another, and dependencies are automatically determined." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Build several delayed objects to build in a graph\n", "local = Delayed(lambda x: x * 2, local=True)(100)\n", "array_apply = DelayedArrayUDF(\"tiledb://TileDB-Inc/quickstart_sparse\", lambda x: numpy.sum(x[\"a\"]), name=\"array_apply\")([(1, 4), (1, 4)])\n", "sql = DelayedSQL(\"select SUM(`a`) as a from `{}`\".format(\"tiledb://TileDB-Inc/quickstart_dense\"), name=\"sql\")\n", "\n", "# Custom function to use to average all the results we are passing in\n", "def mean(local, array_apply, sql):\n", " import numpy\n", " return numpy.mean([local, array_apply, sql.iloc(0)[0]])\n", "\n", "res = Delayed(func_exec=mean, name=\"node_exec\")(local, array_apply, sql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A live graph can show the status of the Task Graph" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b0d8ebc128144b84bdaf4f32aef90cb0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FigureWidget({\n", " 'data': [{'hoverinfo': 'none',\n", " 'line': {'color': '#888', 'width': 0.5},\n", " …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "res.visualize(force_plotly=True)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "114.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Delayed Usage\n", "\n", "There are several functionalities which are exposed to allow for complex use cases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manually specifying dependencies\n", "\n", "There might be cases were a function relies on another function but does not take it's arguments. An example would be if a function manipulated data on S3 but did not return anything." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# A few base functions:\n", "import random\n", "node_1 = Delayed(numpy.median, local=True, name=\"node_1\")([1, 2, 3])\n", "node_2 = Delayed(lambda x: x * 2, local=True, name=\"node_2\")(node_1)\n", "node_3 = Delayed(lambda x: x * 2, local=True, name=\"node_3\")(node_2)\n", "\n", "nodes_by_name= {'node_1': node_1, 'node_2': node_2, 'node_3': node_3}\n", "#Function which sleeps for some time so we can see the graph in different states\n", "def f():\n", " import time\n", " import random\n", " x = random.randrange(0, 30)\n", " time.sleep(x)\n", " return x\n", "\n", "# Randomly add 96 other nodes to the graph. All of these will use the sleep function\n", "for i in range(4, 100):\n", " name = \"node_{}\".format(i)\n", " node = Delayed(f, local=True, name=name)()\n", " \n", " dep = random.randrange(1, i-1)\n", " # Randomly set dependency on one other node\n", " node_dep = nodes_by_name[\"node_{}\".format(dep)]\n", " # Force the dependency to be set\n", " node.depends_on(node_dep)\n", " \n", " nodes_by_name[name] = node\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "30e18e102c934783a911215cd88078cb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Visualize(value='{\"nodes\": [\"node_1\", \"node_2\", \"node_4\", \"node_11\", \"node_20\", \"node_61\", \"node_3\", \"node_5\",…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "node_1.visualize()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "node_99 = nodes_by_name[\"node_99\"]\n", "node_99.compute()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }