{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# dask-image: A library for distributed image processing\n", "\n", "John Kirkham ([@jakirkham]( https://github.com/jakirkham ))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Typical image processing use cases\n", "\n", "\n", "![]( https://cloud.githubusercontent.com/assets/896692/23625282/7f2d79dc-025d-11e7-8728-d8924596f8fa.png )\n", "\n", "https://github.com/ageitgey/face_recognition" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Typical image processing use cases\n", "\n", "* Commodity cameras\n", "* Color images\n", "* Fit in memory\n", "* Generic images of recognizable scenes\n", "* Various successful algorithms" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Large image processing use cases\n", "\n", "[![AOLLSM and ExLLSM]( http://img.youtube.com/vi/ma4fbBLKUEE/0.jpg )]( https://www.youtube.com/watch?v=ma4fbBLKUEE )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Large image processing use cases\n", "\n", "* Specialized instruments\n", "* Monochrome to multispectral\n", "* Does not fit in memory\n", "* Domain specialists understand data\n", "* Complex pipelines needed for analysis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Working with large image data is hard\n", "\n", "* Data size limits scientists\n", "* Domain knowledge limits technologists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows\n", "\n", "* Batch processing\n", "* Large field of view" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows - Batch processing" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "for each_fn in myfiles:\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk)\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Common workflows - Large image" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "# Repeated for each op\n", "for each_slice in regions:\n", " larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)\n", " a_larger = load(larger_slice)\n", " a_large_cleaned = cleanup(a_larger)\n", " a = a_large_cleaned[cropped_slice]\n", " save(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What are the challenges with these?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk)\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned)\n", " a_labeled = label(a_mask)\n", " save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned) # <--- Not swappable\n", " a_labeled = label(a_mask)\n", " 
save(a_labeled)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "for each_fn in myfiles: # <--- Not parallel\n", " a_chunk = load(each_fn)\n", " a_cleaned = cleanup(a_chunk) # <--- Not inspectable\n", " a_mask = threshold(a_cleaned) # <--- Not swappable\n", " a_labeled = label(a_mask)\n", " save(a_labeled) # <--- Not interactive\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "```python\n", "# Repeated for each op # <--- Higher overhead for complex ops\n", "for each_slice in regions:\n", " larger_slice, cropped_slice = add_overlap(each_slice, cleanup_overlap)\n", " a_larger = load(larger_slice)\n", " a_large_cleaned = cleanup(a_larger)\n", " a = a_large_cleaned[cropped_slice]\n", " save(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# This workflow presents challenges" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Fixing each step increases complexity" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Challenging to maintain" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Hard to learn" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Not very reusable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## We want to maintain our existing workflow\n", "\n", 