{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# dask-image: A library for distributed image processing\n", "\n", "John Kirkham ([@jakirkham]( https://github.com/jakirkham ))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Typical image processing use cases\n", "\n", "\n", "![]( https://cloud.githubusercontent.com/assets/896692/23625282/7f2d79dc-025d-11e7-8728-d8924596f8fa.png )\n", "\n", "https://github.com/ageitgey/face_recognition" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Typical image processing use cases\n", "\n", "* Commodity cameras\n", "* Color images\n", "* Fit in-memory\n", "* Generic images of recognizable scenes\n", "* Various successful algorithms" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Large image processing use cases\n", "\n", "[![AOLLSM and ExLLSM]( http://img.youtube.com/vi/ma4fbBLKUEE/0.jpg )]( https://www.youtube.com/watch?v=ma4fbBLKUEE )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Large image processing use cases\n", "\n", "* Specialized instruments\n", "* Monochrome to multispectral\n", "* Does not fit in-memory\n", "* Domain specialists understand data\n", "* Complex pipelines needed for analysis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Working with large image data is hard\n", "\n", "* Data size limits scientists\n", "* Domain knowledge limits technologists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows\n", "\n", "* Batch Processing\n", "* Large field of view" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows - Batch Processing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "for each_fn in myfiles:\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk)\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows - Large image" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "# Repeated for each op\n", "for each_slice in regions:\n", " larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)\n", " a_larger = load(larger_slice)\n", " a_large_cleaned = cleanup(a_larger)\n", " a = a_large_cleaned[cropped_slice]\n", " save(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What are the challenges with these?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk)\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned) # <--- Not swappable\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned) # <--- Not swappable\n", " a_labeled = label(a_mask)\n", " save(a_labeled) # <--- Not interactive\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "# Repeated for each op # <--- Higher overhead for complex ops\n", "for each_slc in regions:\n", " larger_slice, cropped_slice = get_cleanup_overlap(each_slice)\n", " a_larger = load(larger_slice)\n", " a_large_cleaned = cleanup(a_larger)\n", " a = a_large_cleaned[cropped_slice]\n", " save(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# This workflow presents challenges" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Fixing each step increases complexity" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Challenging to maintain" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Hard to learn" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Not very reusable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## We want to maintain our existing workflow\n", "\n", "
\n", "\n", "## But operate at scale" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\"Dask\"\n", " \n", "1. Dask Array with `map_blocks` and `map_overlap`\n", "2. Dask Image (new!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Loading image data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "```python\n", "import dask.array as da\n", "from dask_image.imread import imread\n", "\n", "a = da.block([\n", " [imread(\"images/fn00.tiff\"), imread(\"images/fn01.tiff\")],\n", " [imread(\"images/fn10.tiff\"), imread(\"images/fn11.tiff\")],\n", "])\n", "```\n", "\n", "
\n", "\n", "Read more here: https://blog.dask.org/2019/06/20/load-image-data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Batch Processing (Revisited)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "```python\n", "a_cleaned = a.map_blocks(cleanup)\n", "a_mask = a_cleaned.map_blocks(threshold)\n", "a_labeled = a_mask.map_blocks(label)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles:\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk)\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Large Image (Revisited)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "```python\n", "a_cleaned = a.map_overlap(cleanup, cleanup_overlap)\n", "a_mask = a_cleaned.map_overlap(threshold, threshold_overlap)\n", "a_labeled = a_mask.map_overlap(label, label_overlap)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "# Repeated for each op\n", "for each_slice in regions:\n", " larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)\n", " a_larger = load(larger_slice)\n", " a_large_cleaned = cleanup(a_larger)\n", " a = a_large_cleaned[cropped_slice]\n", " save(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How can we improve this workflow?\n", "\n", "* `.map_blocks` for batch\n", "* `.map_overlap` for large images" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Are we done?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* What about reusability?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* How do we engage domain specialists?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* By making a library using common API :)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Smoothing Use Case" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![Checkerboard]( images/checkerboard.png )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Smoothed Checkerboard]( images/checkerboard_smoothed.png )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Smoothing Use Case" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "from scipy.ndimage import uniform_filter\n", "\n", "uniform_filter(a, 10)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Smoothing Use Case" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "def uniform_filter(input,\n", " size=3,\n", " mode='reflect',\n", " cval=0.0,\n", " origin=0):\n", " size = _utils._get_size(input.ndim, size)\n", " depth = _utils._get_depth(size, origin)\n", "\n", " depth, boundary = _utils._get_depth_boundary(input.ndim, depth, \"none\")\n", "\n", " result = input.map_overlap(\n", " scipy.ndimage.filters.uniform_filter,\n", " depth=depth,\n", " boundary=boundary,\n", " dtype=input.dtype,\n", " size=size,\n", " mode=mode,\n", " cval=cval,\n", " origin=origin\n", " )\n", "\n", " return result\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# scipy.ndimage coverage" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "| Function name | SciPy ndimage | dask-image |\n", "|-------------------------------|---------------|------------|\n", "| `affine_transform` | X | |\n", "| `binary_closing` | X | X |\n", "| `binary_dilation` | X | X |\n", "| `binary_erosion` | X | X |\n", "| `binary_fill_holes` | X | |\n", "| `binary_hit_or_miss` | X | |\n", "| `binary_opening` | X | X |\n", "| `binary_propagation` | X | |\n", "| `black_tophat` | X | |\n", "| `center_of_mass` | X | X |\n", "| `convolve` | X | X |\n", "| `convolve1d` | X | |\n", "| `correlate` | X | X |\n", "| `correlate1d` | X | |\n", "| `distance_transform_bf` | X | |\n", "| `distance_transform_cdt` | X | |\n", "| `distance_transform_edt` | X | |\n", "| `extrema` | X | X |\n", "| `find_objects` | X | |\n", "| `fourier_ellipsoid` | X | |\n", "| `fourier_gaussian` | X | X |\n", "| `fourier_shift` | X | X |\n", "| `fourier_uniform` | X | X |\n", "| `gaussian_filter` | X | X |\n", "| `gaussian_filter1d` | X | |\n", "| `gaussian_gradient_magnitude` | X | X |\n", "| `gaussian_laplace` | X | X |\n", "| `generate_binary_structure` | X | |\n", "| `generic_filter` | X | X |\n", "| `generic_filter1d` | X | |\n", "| `generic_gradient_magnitude` | X | |\n", "| `generic_laplace` | X | |\n", "| `geometric_transform` | X | |\n", "| `grey_closing` | X | |\n", "| `grey_dilation` | X | |\n", "| `grey_erosion` | X | |\n", "| `grey_opening` | X | |\n", "| `histogram` | X | X |\n", "| `imread` | X | X |\n", "| `iterate_structure` | X | |\n", "| `label` | X | X |\n", "| `labeled_comprehension` | X | X |\n", "| `laplace` | X | X |\n", "| `map_coordinates` | X | |\n", "| `maximum` | X | X |\n", "| `maximum_filter` | X | X |\n", "| `maximum_filter1d` | X | |\n", "| `maximum_position` | X | X |\n", "| `mean` | X | X |\n", "| `median` | X | X |\n", "| `median_filter` | X | X |\n", "| `minimum` | X | X |\n", "| `minimum_filter` | X | X |\n", "| `minimum_filter1d` | X | |\n", "| `minimum_position` | X | X |\n", "| `morphological_gradient` | X | |\n", "| `morphological_laplace` | X | |\n", "| `percentile_filter` | X | X |\n", "| `prewitt` | X | X |\n", "| `rank_filter` | X | X |\n", "| `rotate` | X | |\n", "| `shift` | X | |\n", "| `sobel` | X | X |\n", "| `spline_filter` | X | |\n", "| `spline_filter1d` | X | |\n", "| `standard_deviation` | X | X |\n", "| `sum` | X | X |\n", "| `uniform_filter` | X | X |\n", "| `uniform_filter1d` | X | |\n", "| `variance` | X | X |\n", "| `watershed_ift` | X | |\n", "| `white_tophat` | X | |\n", "| `zoom` | X | |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Future Work" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "* Adding needed functions to the API\n", "* Working closely with the community\n", "* Handling generalized arrays\n", "* Exploring GPUs for similar operations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Getting started\n", "\n", "### Conda\n", "\n", "```\n", "conda install -c conda-forge dask-image\n", "```\n", "\n", "### Pip\n", "\n", "```\n", "pip install dask-image\n", "```" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }