{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Numba 0.51 CUDA Release demo\n",
    "\n",
    "Key changes to the CUDA target include:\n",
    "\n",
    "* Support for CUDA Toolkit 11, Ampere, and Compute Capability 8.0\n",
    "* Stream callbacks ([demo](#Stream-callbacks))\n",
    "* Async awaitable streams ([demo](#Async-awaitable-streams))\n",
    "* Printing of SASS code for kernels ([demo](#Inspecting-SASS-code))\n",
    "* Atomic ``nanmin`` and ``nanmax`` ([demo](#Atomic-nanmin-and-nanmax))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A couple of useful imports\n",
    "from numba import cuda\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stream callbacks\n",
    "\n",
    "Adding a callback to a stream allows a function on the host to be called when all the items presently enqueued on the stream have completed. This can be useful for notifying the host code that a certain sequence of events has completed.\n",
    "\n",
    "The CUDA Toolkit Samples include an example of using stream callbacks in a multithreaded application in the `0_Simple/simpleCallback` directory - for this notebook, we demonstrate the CUDA Python API only.\n",
    "\n",
    "First let's define a kernel and a callback function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A simple kernel to add two arrays\n",
    "@cuda.jit\n",
    "def add_kernel(r, x, y):\n",
    "    i = cuda.grid(1)\n",
    "    \n",
    "    if i < len(r):\n",
    "        r[i] = x[i] + y[i]\n",
    "\n",
    "# Python function for callback\n",
    "# `data` is the data value set when the callback was added (see below)\n",
    "def print_callback(stream, status, data):\n",
    "    print(f'Callback function reached! Data is {data}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll create a stream, then use it to transfer data and launch a kernel before enqueueing a callback:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Callback function reached! Data is HELLO WORLD\n"
     ]
    }
   ],
   "source": [
    "# Create a stream\n",
    "s1 = cuda.stream()\n",
    "\n",
    "# Transfer the inputs to the device and allocate the output, all on the stream\n",
    "n_elements = 256\n",
    "x = cuda.to_device(np.random.random(n_elements), stream=s1)\n",
    "y = cuda.to_device(np.random.random(n_elements), stream=s1)\n",
    "r = cuda.device_array_like(x, stream=s1)\n",
    "\n",
    "# Launch the kernel on the stream\n",
    "add_kernel[1, n_elements, s1](r, x, y)\n",
    "\n",
    "# Add a callback that will be called on the host once the work enqueued above has completed.\n",
    "# The first parameter is the function to call.\n",
    "# The second parameter is passed into the `data` (3rd) argument of the callback.\n",
    "s1.add_callback(print_callback, 'HELLO WORLD')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Callbacks can also be used to set `threading.Event` objects, so that host threads can wait for work on a stream to finish. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for event\n",
      "Waiting returned: True\n"
     ]
    }
   ],
   "source": [
    "import threading\n",
    "\n",
    "# Create a stream and an event that will be notified\n",
    "s2 = cuda.stream()\n",
    "callback_event = threading.Event()\n",
    "\n",
    "# Define a callback function to notify the event\n",
    "def notify_callback(stream, status, event):\n",
    "    event.set()\n",
    "\n",
    "# Add the callback to the stream:\n",
    "s2.add_callback(notify_callback, callback_event)\n",
    "\n",
    "# Wait for the event\n",
    "print(\"Waiting for event\")\n",
    "ret = callback_event.wait(1.0)\n",
    "print(f\"Waiting returned: {ret}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Async awaitable streams\n",
    "\n",
    "`Stream.async_done()` returns an awaitable that resolves once all preceding stream operations are complete:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "\n",
    "# Define a new stream\n",
    "s3 = cuda.stream()\n",
    "\n",
    "# Directly awaiting on async_done works in the notebook as it already\n",
    "# has a running event loop\n",
    "await s3.async_done()"
   ]
  },
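  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Outside a notebook there is usually no running event loop, so the `await` has to live inside a coroutine. A minimal sketch for a standalone script, assuming a CUDA-capable GPU is available (the coroutine name `wait_for_stream` is only illustrative and not part of Numba):\n",
    "\n",
    "```python\n",
    "import asyncio\n",
    "from numba import cuda\n",
    "\n",
    "# Hypothetical helper: create a stream and wait for it inside a coroutine,\n",
    "# where an event loop is running\n",
    "async def wait_for_stream():\n",
    "    stream = cuda.stream()\n",
    "    await stream.async_done()\n",
    "\n",
    "# asyncio.run() starts an event loop and runs the coroutine to completion\n",
    "asyncio.run(wait_for_stream())\n",
    "```\n",
    "\n",
    "This is meant for scripts only: calling `asyncio.run()` from a notebook cell typically raises a `RuntimeError`, because the notebook's event loop is already running."
   ]
  },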
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example creates several tasks, each using its own stream, and gathers their results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1.0, 2.0, 3.0, 4.0]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "async def async_cuda_fn(value_in):\n",
    "    stream = cuda.stream()\n",
    "    h_src, h_dst = cuda.pinned_array(8), cuda.pinned_array(8)\n",
    "    h_src[:] = value_in\n",
    "    d_ary = cuda.to_device(h_src, stream=stream)\n",
    "    d_ary.copy_to_host(h_dst, stream=stream)\n",
    "    await stream.async_done()\n",
    "    return h_dst.mean()\n",
    "\n",
    "values_in = [1, 2, 3, 4]\n",
    "tasks = [asyncio.create_task(async_cuda_fn(v)) for v in values_in]\n",
    "values_out = await asyncio.gather(*tasks)\n",
    "values_out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting SASS code\n",
    "\n",
    "Let's define a simple kernel, and compile it eagerly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "@cuda.jit('void(float32[::1], float32[::1], float32[::1])')\n",
    "def add(r, x, y):\n",
    "    i = cuda.grid(1)\n",
    "    \n",
    "    if i < len(r):\n",
    "        r[i] = x[i] + y[i]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can print the SASS code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.headerflags\t@\"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM75 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM75)\"\n",
      "\t.elftype\t@\"ET_EXEC\"\n",
      "\n",
      "\n",
      "//--------------------- .debug_frame              --------------------------\n",
      "\t.section\t.debug_frame,\"\",@progbits\n",
      ".debug_frame:\n",
      "        /*0000*/ \t.byte\t0xff, 0xff, 0xff, 0xff, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff\n",
      "        /*0010*/ \t.byte\t0xff, 0xff, 0xff, 0xff, 0x03, 0x00, 0x04, 0x7c, 0xff, 0xff, 0xff, 0xff, 0x0f, 0x0c, 0x81, 0x80\n",
      "        /*0020*/ \t.byte\t0x80, 0x28, 0x00, 0x08, 0xff, 0x81, 0x80, 0x28, 0x08, 0x81, 0x80, 0x80, 0x28, 0x00, 0x00, 0x00\n",
      "        /*0030*/ \t.byte\t0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00\n",
      "        /*0040*/ \t.byte\t0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00\n",
      "        /*0048*/ \t.dword\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE\n",
      "        /*0050*/ \t.byte\t0x70, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x02, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00\n",
      "        /*0060*/ \t.byte\t0x00, 0x00, 0x0c, 0x81, 0x80, 0x80, 0x28, 0x00, 0x04, 0x7a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00\n",
      "\n",
      "\n",
      "//--------------------- .nv.info                  --------------------------\n",
      "\t.section\t.nv.info,\"\",@\"SHT_CUDA_INFO\"\n",
      "\t.align\t4\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_FRAME_SIZE\n",
      "\t.align\t\t4\n",
      "        /*0000*/ \t.byte\t0x04, 0x11\n",
      "        /*0002*/ \t.short\t(.L_11 - .L_10)\n",
      "\t.align\t\t4\n",
      ".L_10:\n",
      "        /*0004*/ \t.word\tindex@(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "        /*0008*/ \t.word\t0x00000000\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_REGCOUNT\n",
      "\t.align\t\t4\n",
      ".L_11:\n",
      "        /*000c*/ \t.byte\t0x04, 0x2f\n",
      "        /*000e*/ \t.short\t(.L_13 - .L_12)\n",
      "\t.align\t\t4\n",
      ".L_12:\n",
      "        /*0010*/ \t.word\tindex@(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "        /*0014*/ \t.word\t0x0000000d\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_MIN_STACK_SIZE\n",
      "\t.align\t\t4\n",
      ".L_13:\n",
      "        /*0018*/ \t.byte\t0x04, 0x12\n",
      "        /*001a*/ \t.short\t(.L_15 - .L_14)\n",
      "\t.align\t\t4\n",
      ".L_14:\n",
      "        /*001c*/ \t.word\tindex@(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "        /*0020*/ \t.word\t0x00000000\n",
      ".L_15:\n",
      "\n",
      "\n",
      "//--------------------- .nv.info._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE --------------------------\n",
      "\t.section\t.nv.info._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,\"\",@\"SHT_CUDA_INFO\"\n",
      "\t.align\t4\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_EXIT_INSTR_OFFSETS\n",
      "\t.align\t\t4\n",
      "        /*0000*/ \t.byte\t0x04, 0x1c\n",
      "        /*0002*/ \t.short\t(.L_17 - .L_16)\n",
      "\n",
      "\n",
      "\t//   ....[0]....\n",
      ".L_16:\n",
      "        /*0004*/ \t.word\t0x00000070\n",
      "\n",
      "\n",
      "\t//   ....[1]....\n",
      "        /*0008*/ \t.word\t0x000001f0\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_MAXREG_COUNT\n",
      "\t.align\t\t4\n",
      ".L_17:\n",
      "        /*000c*/ \t.byte\t0x03, 0x1b\n",
      "        /*000e*/ \t.short\t0x00ff\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      "        /*0010*/ \t.byte\t0x04, 0x17\n",
      "        /*0012*/ \t.short\t(.L_19 - .L_18)\n",
      ".L_18:\n",
      "        /*0014*/ \t.word\t0x00000000\n",
      "        /*0018*/ \t.short\t0x0000\n",
      "        /*001a*/ \t.short\t0x0000\n",
      "        /*001c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_19:\n",
      "        /*0020*/ \t.byte\t0x04, 0x17\n",
      "        /*0022*/ \t.short\t(.L_21 - .L_20)\n",
      ".L_20:\n",
      "        /*0024*/ \t.word\t0x00000000\n",
      "        /*0028*/ \t.short\t0x0001\n",
      "        /*002a*/ \t.short\t0x0008\n",
      "        /*002c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_21:\n",
      "        /*0030*/ \t.byte\t0x04, 0x17\n",
      "        /*0032*/ \t.short\t(.L_23 - .L_22)\n",
      ".L_22:\n",
      "        /*0034*/ \t.word\t0x00000000\n",
      "        /*0038*/ \t.short\t0x0002\n",
      "        /*003a*/ \t.short\t0x0010\n",
      "        /*003c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_23:\n",
      "        /*0040*/ \t.byte\t0x04, 0x17\n",
      "        /*0042*/ \t.short\t(.L_25 - .L_24)\n",
      ".L_24:\n",
      "        /*0044*/ \t.word\t0x00000000\n",
      "        /*0048*/ \t.short\t0x0003\n",
      "        /*004a*/ \t.short\t0x0018\n",
      "        /*004c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_25:\n",
      "        /*0050*/ \t.byte\t0x04, 0x17\n",
      "        /*0052*/ \t.short\t(.L_27 - .L_26)\n",
      ".L_26:\n",
      "        /*0054*/ \t.word\t0x00000000\n",
      "        /*0058*/ \t.short\t0x0004\n",
      "        /*005a*/ \t.short\t0x0020\n",
      "        /*005c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_27:\n",
      "        /*0060*/ \t.byte\t0x04, 0x17\n",
      "        /*0062*/ \t.short\t(.L_29 - .L_28)\n",
      ".L_28:\n",
      "        /*0064*/ \t.word\t0x00000000\n",
      "        /*0068*/ \t.short\t0x0005\n",
      "        /*006a*/ \t.short\t0x0028\n",
      "        /*006c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_29:\n",
      "        /*0070*/ \t.byte\t0x04, 0x17\n",
      "        /*0072*/ \t.short\t(.L_31 - .L_30)\n",
      ".L_30:\n",
      "        /*0074*/ \t.word\t0x00000000\n",
      "        /*0078*/ \t.short\t0x0006\n",
      "        /*007a*/ \t.short\t0x0030\n",
      "        /*007c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_31:\n",
      "        /*0080*/ \t.byte\t0x04, 0x17\n",
      "        /*0082*/ \t.short\t(.L_33 - .L_32)\n",
      ".L_32:\n",
      "        /*0084*/ \t.word\t0x00000000\n",
      "        /*0088*/ \t.short\t0x0007\n",
      "        /*008a*/ \t.short\t0x0038\n",
      "        /*008c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_33:\n",
      "        /*0090*/ \t.byte\t0x04, 0x17\n",
      "        /*0092*/ \t.short\t(.L_35 - .L_34)\n",
      ".L_34:\n",
      "        /*0094*/ \t.word\t0x00000000\n",
      "        /*0098*/ \t.short\t0x0008\n",
      "        /*009a*/ \t.short\t0x0040\n",
      "        /*009c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_35:\n",
      "        /*00a0*/ \t.byte\t0x04, 0x17\n",
      "        /*00a2*/ \t.short\t(.L_37 - .L_36)\n",
      ".L_36:\n",
      "        /*00a4*/ \t.word\t0x00000000\n",
      "        /*00a8*/ \t.short\t0x0009\n",
      "        /*00aa*/ \t.short\t0x0048\n",
      "        /*00ac*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_37:\n",
      "        /*00b0*/ \t.byte\t0x04, 0x17\n",
      "        /*00b2*/ \t.short\t(.L_39 - .L_38)\n",
      ".L_38:\n",
      "        /*00b4*/ \t.word\t0x00000000\n",
      "        /*00b8*/ \t.short\t0x000a\n",
      "        /*00ba*/ \t.short\t0x0050\n",
      "        /*00bc*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_39:\n",
      "        /*00c0*/ \t.byte\t0x04, 0x17\n",
      "        /*00c2*/ \t.short\t(.L_41 - .L_40)\n",
      ".L_40:\n",
      "        /*00c4*/ \t.word\t0x00000000\n",
      "        /*00c8*/ \t.short\t0x000b\n",
      "        /*00ca*/ \t.short\t0x0058\n",
      "        /*00cc*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_41:\n",
      "        /*00d0*/ \t.byte\t0x04, 0x17\n",
      "        /*00d2*/ \t.short\t(.L_43 - .L_42)\n",
      ".L_42:\n",
      "        /*00d4*/ \t.word\t0x00000000\n",
      "        /*00d8*/ \t.short\t0x000c\n",
      "        /*00da*/ \t.short\t0x0060\n",
      "        /*00dc*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_43:\n",
      "        /*00e0*/ \t.byte\t0x04, 0x17\n",
      "        /*00e2*/ \t.short\t(.L_45 - .L_44)\n",
      ".L_44:\n",
      "        /*00e4*/ \t.word\t0x00000000\n",
      "        /*00e8*/ \t.short\t0x000d\n",
      "        /*00ea*/ \t.short\t0x0068\n",
      "        /*00ec*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_45:\n",
      "        /*00f0*/ \t.byte\t0x04, 0x17\n",
      "        /*00f2*/ \t.short\t(.L_47 - .L_46)\n",
      ".L_46:\n",
      "        /*00f4*/ \t.word\t0x00000000\n",
      "        /*00f8*/ \t.short\t0x000e\n",
      "        /*00fa*/ \t.short\t0x0070\n",
      "        /*00fc*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_47:\n",
      "        /*0100*/ \t.byte\t0x04, 0x17\n",
      "        /*0102*/ \t.short\t(.L_49 - .L_48)\n",
      ".L_48:\n",
      "        /*0104*/ \t.word\t0x00000000\n",
      "        /*0108*/ \t.short\t0x000f\n",
      "        /*010a*/ \t.short\t0x0078\n",
      "        /*010c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_49:\n",
      "        /*0110*/ \t.byte\t0x04, 0x17\n",
      "        /*0112*/ \t.short\t(.L_51 - .L_50)\n",
      ".L_50:\n",
      "        /*0114*/ \t.word\t0x00000000\n",
      "        /*0118*/ \t.short\t0x0010\n",
      "        /*011a*/ \t.short\t0x0080\n",
      "        /*011c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_51:\n",
      "        /*0120*/ \t.byte\t0x04, 0x17\n",
      "        /*0122*/ \t.short\t(.L_53 - .L_52)\n",
      ".L_52:\n",
      "        /*0124*/ \t.word\t0x00000000\n",
      "        /*0128*/ \t.short\t0x0011\n",
      "        /*012a*/ \t.short\t0x0088\n",
      "        /*012c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_53:\n",
      "        /*0130*/ \t.byte\t0x04, 0x17\n",
      "        /*0132*/ \t.short\t(.L_55 - .L_54)\n",
      ".L_54:\n",
      "        /*0134*/ \t.word\t0x00000000\n",
      "        /*0138*/ \t.short\t0x0012\n",
      "        /*013a*/ \t.short\t0x0090\n",
      "        /*013c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_55:\n",
      "        /*0140*/ \t.byte\t0x04, 0x17\n",
      "        /*0142*/ \t.short\t(.L_57 - .L_56)\n",
      ".L_56:\n",
      "        /*0144*/ \t.word\t0x00000000\n",
      "        /*0148*/ \t.short\t0x0013\n",
      "        /*014a*/ \t.short\t0x0098\n",
      "        /*014c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_KPARAM_INFO\n",
      "\t.align\t\t4\n",
      ".L_57:\n",
      "        /*0150*/ \t.byte\t0x04, 0x17\n",
      "        /*0152*/ \t.short\t(.L_59 - .L_58)\n",
      ".L_58:\n",
      "        /*0154*/ \t.word\t0x00000000\n",
      "        /*0158*/ \t.short\t0x0014\n",
      "        /*015a*/ \t.short\t0x00a0\n",
      "        /*015c*/ \t.byte\t0x00, 0xf0, 0x21, 0x00\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_CBANK_PARAM_SIZE\n",
      "\t.align\t\t4\n",
      ".L_59:\n",
      "        /*0160*/ \t.byte\t0x03, 0x19\n",
      "        /*0162*/ \t.short\t0x00a8\n",
      "\n",
      "\n",
      "\t//----- nvinfo : EIATTR_PARAM_CBANK\n",
      "\t.align\t\t4\n",
      "        /*0164*/ \t.byte\t0x04, 0x0a\n",
      "        /*0166*/ \t.short\t(.L_61 - .L_60)\n",
      "\t.align\t\t4\n",
      ".L_60:\n",
      "        /*0168*/ \t.word\tindex@(.nv.constant0._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "        /*016c*/ \t.short\t0x0160\n",
      "        /*016e*/ \t.short\t0x00a8\n",
      ".L_61:\n",
      "\n",
      "\n",
      "//--------------------- .nv.constant0._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE --------------------------\n",
      "\t.section\t.nv.constant0._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,\"a\",@progbits\n",
      "\t.align\t4\n",
      ".nv.constant0._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE:\n",
      "\t.zero\t\t520\n",
      "\n",
      "\n",
      "//--------------------- .text._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE --------------------------\n",
      "\t.section\t.text._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,\"ax\",@progbits\n",
      "\t.sectioninfo\t@\"SHI_REGISTERS=13\"\n",
      "\t.align\t128\n",
      "        .global         _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE\n",
      "        .type           _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,@function\n",
      "        .size           _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,(.L_62 - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "        .other          _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,@\"STO_CUDA_ENTRY STV_DEFAULT\"\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE:\n",
      ".text._ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE:\n",
      "        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;\n",
      "        /*0010*/                   S2R R0, SR_TID.X ;\n",
      "        /*0020*/                   S2R R3, SR_CTAID.X ;\n",
      "        /*0030*/                   IMAD R0, R3, c[0x0][0x0], R0 ;\n",
      "        /*0040*/                   ISETP.GE.U32.AND P0, PT, R0, c[0x0][0x188], PT ;\n",
      "        /*0050*/                   SHF.R.S32.HI R6, RZ, 0x1f, R0 ;\n",
      "        /*0060*/                   ISETP.GE.AND.EX P0, PT, R6, c[0x0][0x18c], PT, P0 ;\n",
      "        /*0070*/               @P0 EXIT ;\n",
      "        /*0080*/                   ISETP.GE.AND P0, PT, R0, RZ, PT ;\n",
      "        /*0090*/                   SEL R7, RZ, c[0x0][0x1c0], P0 ;\n",
      "        /*00a0*/                   SEL R3, RZ, c[0x0][0x1f8], P0 ;\n",
      "        /*00b0*/                   SEL R9, RZ, c[0x0][0x1c4], P0 ;\n",
      "        /*00c0*/                   SEL R5, RZ, c[0x0][0x1fc], P0 ;\n",
      "        /*00d0*/                   IADD3 R7, P1, R0.reuse, R7, RZ ;\n",
      "        /*00e0*/                   IADD3 R3, P2, R0, R3, RZ ;\n",
      "        /*00f0*/                   IMAD.X R10, R6.reuse, 0x1, R9, P1 ;\n",
      "        /*0100*/                   LEA R4, P1, R7, c[0x0][0x1b8], 0x2 ;\n",
      "        /*0110*/                   IMAD.X R8, R6, 0x1, R5, P2 ;\n",
      "        /*0120*/                   LEA R2, P2, R3, c[0x0][0x1f0], 0x2 ;\n",
      "        /*0130*/                   LEA.HI.X R5, R7, c[0x0][0x1bc], R10, 0x2, P1 ;\n",
      "        /*0140*/                   LEA.HI.X R3, R3, c[0x0][0x1f4], R8, 0x2, P2 ;\n",
      "        /*0150*/                   LDG.E.SYS R5, [R4] ;\n",
      "        /*0160*/                   LDG.E.SYS R2, [R2] ;\n",
      "        /*0170*/                   SEL R7, RZ, c[0x0][0x188], P0 ;\n",
      "        /*0180*/                   IADD3 R0, P1, R0, R7, RZ ;\n",
      "        /*0190*/                   SEL R7, RZ, c[0x0][0x18c], P0 ;\n",
      "        /*01a0*/                   IMAD.X R7, R6, 0x1, R7, P1 ;\n",
      "        /*01b0*/                   LEA R6, P0, R0, c[0x0][0x180], 0x2 ;\n",
      "        /*01c0*/                   LEA.HI.X R7, R0, c[0x0][0x184], R7, 0x2, P0 ;\n",
      "        /*01d0*/                   FADD R9, R2, R5 ;\n",
      "        /*01e0*/                   STG.E.SYS [R6], R9 ;\n",
      "        /*01f0*/                   EXIT ;\n",
      ".L_9:\n",
      "        /*0200*/                   BRA `(.L_9);\n",
      ".L_62:\n",
      "\n",
      "\n",
      "//--------------------- .nv.global                --------------------------\n",
      "\t.section\t.nv.global,\"aw\",@nobits\n",
      "\t.align\t8\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidx__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidx__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidx__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidx__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidx__:\n",
      ".nv.global:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidz__:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__errcode__:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__ctaidy__:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidx__:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__,(_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__ - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidz__:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__\n",
      "\t.type\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__,@object\n",
      "\t.size\t\t_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__,(.L_4 - _ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__)\n",
      "_ZN6cudapy8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE__tidy__:\n",
      "\t.zero\t\t4\n",
      ".L_4:\n",
      "\t.zero\t\t4\n",
      "\t.global\t\t_ZN08NumbaEnv8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE\n",
      "\t.type\t\t_ZN08NumbaEnv8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,@object\n",
      "\t.size\t\t_ZN08NumbaEnv8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE,(.L_8 - _ZN08NumbaEnv8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE)\n",
      "_ZN08NumbaEnv8__main__7add$242E5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE5ArrayIfLi1E1C7mutable7alignedE:\n",
      "\t.zero\t\t8\n",
      ".L_8:\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(add.inspect_sass())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that SASS inspection uses `nvdisasm` from the [CUDA Binary Utilities](https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html), so the full CUDA toolkit must be installed - the conda `cudatoolkit` package does not include `nvdisasm`.\n",
    "\n",
    "# Atomic `nanmin` and `nanmax`\n",
    "\n",
    "Numba 0.50 changed the semantics of `cuda.atomic.min` and `cuda.atomic.max` to more closely match that of `math.max` and `numpy.max`, which always return the first argument if one of the operands is a NaN. The old behaviour, treating NaN values as missing data is now implemented as `cuda.atomic.nanmin` and `cuda.atomic.nanmax` (c.f. [`nanmin`](https://numpy.org/doc/stable/reference/generated/numpy.nanmin.html) and [`nanmax`](https://numpy.org/doc/stable/reference/generated/numpy.nanmax.html) in NumPy).\n",
    "\n",
    "To demonstrate the difference, we'll define two kernels, one using `max`, and the other using `nanmax`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "@cuda.jit\n",
    "def atomic_max_array(val, arr):\n",
    "    i = cuda.grid(1)\n",
    "    cuda.atomic.max(arr, i, val[0])\n",
    "\n",
    "@cuda.jit\n",
    "def atomic_nanmax_array(val, arr):\n",
    "    i = cuda.grid(1)\n",
    "    cuda.atomic.nanmax(arr, i, val[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define input arrays - we'll set the value of one element to NaN:    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.,  1.,  2.,  3.,  4., nan,  6.,  7.], dtype=float32)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n_elements = 8\n",
    "x_max = np.arange(n_elements, dtype=np.float32)\n",
    "x_max[5] = np.nan\n",
    "\n",
    "# A copy of the input for the other kernel\n",
    "x_nanmax = x_max.copy()\n",
    "\n",
    "# Print an array to show the NaN in position:\n",
    "x_max"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll launch the kernels on our arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A value to compare the array values with\n",
    "val = np.ones(1, dtype=np.float32) + 2\n",
    "\n",
    "# Launch kernels\n",
    "atomic_max_array[1, n_elements](val, x_max)\n",
    "atomic_nanmax_array[1, n_elements](val, x_nanmax)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at the output from `cuda.atomic.max`, we see that it didn't treat the NaN value as missing data, but instead returned it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3.,  3.,  3.,  3.,  4., nan,  6.,  7.], dtype=float32)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_max"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whereas `cuda.atomic.nanmax` treated the NaN as missing data and replaced it with the comparison value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3., 3., 3., 3., 4., 3., 6., 7.], dtype=float32)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_nanmax"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}