{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Image Processing\n", "================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the quickstart guide for dask-image." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. [Setting up your environment](#Setting-up-your-environment)\n", "1. [Importing dask-image](#Importing-dask-image)\n", "1. [Getting the example data](#Getting-the-example-data)\n", "1. [Reading in image data](#Reading-in-image-data)\n", " 1. [Reading a single image](#Reading-a-single-image)\n", " 1. [Reading multiple images](#Reading-multiple-images)\n", "\n", "1. [Applying your own custom function to images](#Applying-your-own-custom-function-to-images)\n", " 1. [Embarrassingly parallel problems](#Embarrassingly-parallel-problems)\n", "\n", "1. [Joining partial images together](#Joining-partial-images-together)\n", "\n", "1. [A segmentation analysis pipeline](#A-segmentation-analysis-pipeline)\n", " 1. [Filtering](#Filtering)\n", " 1. [Segmenting](#Segmenting)\n", " 1. [Analyzing](#Analyzing)\n", "1. [Next steps](#Next-steps)\n", "1. [Cleaning up temporary directories and files](#Cleaning-up-temporary-directories-and-files)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Setting up your environment](#Setting-up-your-environment)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install Extra Dependencies\n", "\n", "We first install the library scikit-image for easier access to the example image data there." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are running this notebook on your own computer and not in the mybinder environment, you'll additionally need to ensure your Python environment contains:\n", "* dask\n", "* dask-image\n", "* python-graphviz\n", "* scikit-image\n", "* matplotlib\n", "* numpy\n", "\n", "You can refer to the full list of dependencies used for the `dask-examples` repository, available in the [`binder/environment.yml` file here](https://github.com/dask/dask-examples/blob/main/binder/environment.yml) (note that the `nomkl` package is not available for Windows users).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Importing dask-image](#Importing-dask-image)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you import dask-image, be sure to use an underscore instead of a dash between the two words." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import dask_image.imread\n", "import dask_image.ndfilters\n", "import dask_image.ndmeasure\n", "import dask.array as da" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also use matplotlib to display image results in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Getting the example data](#Getting-the-example-data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use some example image data from the scikit-image library in this tutorial. These images are very small, but will allow us to demonstrate the functionality of dask-image. \n", "\n", "Let's download and save a public domain image of the astronaut Eileen Collins to a temporary directory. 
This image was originally downloaded from the NASA Great Images database, but we'll access it with scikit-image's `data.astronaut()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir temp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from skimage import data, io\n", "\n", "output_filename = os.path.join('temp', 'astronaut.png')\n", "io.imsave(output_filename, data.astronaut())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Really large datasets often can't fit all of the data into a single file, so we'll chop this image into four parts and save the image tiles to a second temporary directory. This will give you a better idea of how you can use dask-image on a real dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir temp-tiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "io.imsave(os.path.join('temp-tiles', 'image-00.png'), data.astronaut()[:256, :256, :]) # top left\n", "io.imsave(os.path.join('temp-tiles', 'image-01.png'), data.astronaut()[:256, 256:, :]) # top right\n", "io.imsave(os.path.join('temp-tiles', 'image-10.png'), data.astronaut()[256:, :256, :]) # bottom left\n", "io.imsave(os.path.join('temp-tiles', 'image-11.png'), data.astronaut()[256:, 256:, :]) # bottom right" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have some data saved, let's practise reading in files with dask-image and processing our images." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Reading in image data](#Reading-in-image-data)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Reading a single image](#Reading-a-single-image)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load a public domain image of the astronaut Eileen Collins with dask-image [imread()](http://image.dask.org/en/latest/dask_image.imread.html). This image was originally downloaded from the NASA Great Images database." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "filename = os.path.join('temp', 'astronaut.png')\n", "print(filename)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "astronaut = dask_image.imread.imread(filename)\n", "print(astronaut)\n", "plt.imshow(astronaut[0, ...]) # display the first (and only) frame of the image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This has created a dask array with `shape=(1, 512, 512, 3)`. This means it contains one image frame with 512 rows, 512 columns, and 3 color channels. \n", "\n", "Since the image is relatively small, it fits entirely within one dask-image chunk, with `chunksize=(1, 512, 512, 3)`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Reading multiple images](#Reading-multiple-images)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In many cases, you may have multiple images stored on disk, for example:\n", "`image-00.png`, `image-01.png`, ... `image-NN.png`. These can be read into a dask array as multiple image frames." 
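, "\n", "\n", "If you want to check which files a wildcard pattern would pick up before reading them, you can expand the pattern yourself. The cell below is a minimal sketch using only the Python standard library: the file names are hypothetical and are created empty in a throwaway directory, and sorting the matches is our own choice so that the order is predictable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import glob\n",
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical file names, created empty in a throwaway directory for illustration.\n",
"with tempfile.TemporaryDirectory() as demo_dir:\n",
"    for name in ['image-00.png', 'image-01.png', 'image-10.png', 'notes.txt']:\n",
"        open(os.path.join(demo_dir, name), 'w').close()\n",
"    # Expand the wildcard and sort the matches for a stable order.\n",
"    matches = sorted(glob.glob(os.path.join(demo_dir, 'image-*.png')))\n",
"    matched_names = [os.path.basename(path) for path in matches]\n",
"\n",
"print(matched_names)"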
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we have the astronaut image split into four non-overlapping tiles:\n", "* `image-00.png` = top left image (index 0,0)\n", "* `image-01.png` = top right image (index 0,1)\n", "* `image-10.png` = bottom left image (index 1,0)\n", "* `image-11.png` = bottom right image (index 1,1)\n", "\n", "These filenames can be matched with a glob (wildcard) pattern: `image-*.png`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls temp-tiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "filename_pattern = os.path.join('temp-tiles', 'image-*.png')\n", "tiled_astronaut_images = dask_image.imread.imread(filename_pattern)\n", "print(tiled_astronaut_images)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This has created a dask array with `shape=(4, 256, 256, 3)`. This means it contains four image frames, each with 256 rows, 256 columns, and 3 color channels. \n", "\n", "There are four chunks in this particular case. Each image frame here is a separate chunk with `chunksize=(1, 256, 256, 3)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=2, ncols=2)\n", "ax[0,0].imshow(tiled_astronaut_images[0])\n", "ax[0,1].imshow(tiled_astronaut_images[1])\n", "ax[1,0].imshow(tiled_astronaut_images[2])\n", "ax[1,1].imshow(tiled_astronaut_images[3])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Applying your own custom function to images](#Applying-your-own-custom-function-to-images)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next you'll want to do some image processing, and apply a function to your images.\n", "\n", "We'll use a very simple example: converting an RGB image to grayscale. But you can also use this method to apply arbitrary functions to dask images. 
To convert our image to grayscale, we'll use the equation to calculate luminance ([reference pdf](http://www.poynton.com/PDFs/ColorFAQ.pdf)): \n", "\n", "`Y = 0.2125 R + 0.7154 G + 0.0721 B` \n", "\n", "We'll write the function for this equation as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def grayscale(rgb):\n", "    result = ((rgb[..., 0] * 0.2125) +\n", "              (rgb[..., 1] * 0.7154) +\n", "              (rgb[..., 2] * 0.0721))\n", "    return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's apply this function to the astronaut image we read in as a single file and visualize the computation graph. \n", "\n", "(Visualizing the computation graph isn't necessary most of the time but it's helpful to know what dask is doing under the hood, and it can also be very useful for debugging problems.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "single_image_result = grayscale(astronaut)\n", "print(single_image_result)\n", "single_image_result.visualize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also see that there are no longer three color channels in the shape of the result, and that the output image is as expected." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Original image dimensions: \", astronaut.shape)\n", "print(\"Processed image dimensions:\", single_image_result.shape)\n", "\n", "fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2)\n", "ax0.imshow(astronaut[0, ...]) # display the first (and only) frame of the image\n", "ax1.imshow(single_image_result[0, ...], cmap='gray') # display the first (and only) frame of the image\n", "\n", "# Subplot headings\n", "ax0.set_title('Original image')\n", "ax1.set_title('Processed image')\n", "\n", "# Don't display axes\n", "ax0.axis('off')\n", "ax1.axis('off')\n", "\n", "# Display images\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Embarrassingly parallel problems](#Embarrassingly-parallel-problems)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same syntax applies a function to multiple images or dask chunks. This is an example of an embarrassingly parallel problem, and we see that dask automatically creates a computation graph for each chunk.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = grayscale(tiled_astronaut_images)\n", "print(result)\n", "result.visualize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ((ax0, ax1), (ax2, ax3)) = plt.subplots(nrows=2, ncols=2)\n", "ax0.imshow(result[0, ...], cmap='gray')\n", "ax1.imshow(result[1, ...], cmap='gray')\n", "ax2.imshow(result[2, ...], cmap='gray')\n", "ax3.imshow(result[3, ...], cmap='gray')\n", "\n", "# Subplot headings\n", "ax0.set_title('First chunk')\n", "ax1.set_title('Second chunk')\n", "ax2.set_title('Third chunk')\n", "ax3.set_title('Fourth chunk')\n", "\n", "# Don't display axes\n", "ax0.axis('off')\n", "ax1.axis('off')\n", "ax2.axis('off')\n", "ax3.axis('off')\n", "\n", "# Display images\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Joining partial images together](#Joining-partial-images-together)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, things are looking pretty good! But how can we join these image chunks together?\n", "\n", "So far, we haven't needed any information from neighboring pixels to do our calculations. But there are lots of functions (like those in [dask-image ndfilters](https://dask-image.readthedocs.io/en/latest/dask_image.ndfilters.html)) that *do* need this for accurate results. You could end up with unwanted edge effects if you don't tell dask how your images should be joined.\n", "\n", "Dask has several ways to join chunks together: [Stack, Concatenate, and Block](http://docs.dask.org/en/latest/array-stack.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Block is very versatile, so we'll use that in this next example. You simply pass in a list (or list of lists) to tell dask the spatial relationship between image chunks." 
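, "\n", "\n", "As a minimal sketch of how the nesting maps to position, the toy cell below (separate from the astronaut data, with hypothetical array names) fills four small blocks with the values 0 to 3 and joins them. The outer list runs over rows of blocks and each inner list runs over the columns within a row, so the joined array should have twice as many rows and columns as a single block." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import dask.array as da\n",
"\n",
"# Four hypothetical 2x2 blocks, each filled with a different constant.\n",
"top_left = da.full((2, 2), 0)\n",
"top_right = da.full((2, 2), 1)\n",
"bottom_left = da.full((2, 2), 2)\n",
"bottom_right = da.full((2, 2), 3)\n",
"\n",
"# Outer list = rows of blocks, inner lists = columns within each row.\n",
"toy = da.block([[top_left, top_right],\n",
"                [bottom_left, bottom_right]])\n",
"print(toy.shape)\n",
"print(toy.compute())"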
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Named image_blocks so we don't shadow the skimage `data` module imported earlier.\n", "image_blocks = [[result[0, ...], result[1, ...]],\n", "                [result[2, ...], result[3, ...]]]\n", "combined_image = da.block(image_blocks)\n", "print(combined_image.shape)\n", "plt.imshow(combined_image, cmap='gray')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [A segmentation analysis pipeline](#A-segmentation-analysis-pipeline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll walk through a simple image segmentation and analysis pipeline with three steps:\n", "1. [Filtering](#Filtering)\n", "1. [Segmenting](#Segmenting)\n", "1. [Analyzing](#Analyzing)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Filtering](#Filtering)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most analysis pipelines require some degree of image preprocessing. dask-image has a number of inbuilt filters available via [dask-image ndfilters](https://dask-image.readthedocs.io/en/latest/dask_image.ndfilters.html).\n", "\n", "Commonly, a Gaussian filter may be used to smooth the image before segmentation. This causes some loss of sharpness in the image, but can improve segmentation quality for methods that rely on image thresholding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "smoothed_image = dask_image.ndfilters.gaussian_filter(combined_image, sigma=[1, 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see a small amount of blur in the smoothed image." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2)\n", "ax0.imshow(smoothed_image, cmap='gray')\n", "ax1.imshow(smoothed_image - combined_image, cmap='gray')\n", "\n", "# Subplot headings\n", "ax0.set_title('Smoothed image')\n", "ax1.set_title('Difference from original')\n", "\n", "# Don't display axes\n", "ax0.axis('off')\n", "ax1.axis('off')\n", "\n", "# Display images\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the Gaussian filter uses information from neighboring pixels, the computational graph looks more complicated than the ones we looked at earlier. This is no longer embarrassingly parallel. Where possible, dask keeps the computations for each of the four image chunks separate, but it must combine information from different chunks near the edges.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "smoothed_image.visualize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Segmenting](#Segmenting)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the image preprocessing, we segment regions of interest from the data. We'll use a simple arbitrary threshold as the cutoff, at 75% of the maximum intensity of the smoothed image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "threshold_value = 0.75 * da.max(smoothed_image).compute()\n", "print(threshold_value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "threshold_image = smoothed_image > threshold_value\n", "plt.imshow(threshold_image, cmap='gray')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we label each region of connected pixels above the threshold value. 
For this we use the `label` function from [dask-image ndmeasure](https://dask-image.readthedocs.io/en/latest/dask_image.ndmeasure.html). This will return both the label image and the number of labels." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "label_image, num_labels = dask_image.ndmeasure.label(threshold_image)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Number of labels:\", int(num_labels))\n", "plt.imshow(label_image, cmap='viridis')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Analyzing](#Analyzing)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a number of inbuilt functions in [dask-image ndmeasure](https://dask-image.readthedocs.io/en/latest/dask_image.ndmeasure.html) useful for quantitative analysis.\n", "\n", "We'll use the `dask_image.ndmeasure.mean()` and `dask_image.ndmeasure.standard_deviation()` functions, and apply them to each label region with `dask_image.ndmeasure.labeled_comprehension()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "index = list(range(int(num_labels) + 1)) # Labels run from 0 (background) to num_labels, so we include both ends.\n", "out_dtype = float # The data type we want to use for our results.\n", "default = None # The value to return if an element of index does not exist in the label image.\n", "mean_values = dask_image.ndmeasure.labeled_comprehension(combined_image, label_image, index, dask_image.ndmeasure.mean, out_dtype, default, pass_positions=False)\n", "print(mean_values.compute())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we're including label 0 in our index, it's not surprising that the first mean value is so much lower than the others - it's the background region below our cutoff threshold for segmentation." 
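, "\n", "\n", "To make explicit what a per-label statistic means, here is a small NumPy-only sketch. The toy arrays and names are hypothetical, and this is not how dask-image computes the result internally: for each label value we simply select the pixels carrying that label and average them. Label 0 plays the role of the background." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical 3x3 intensity image and a matching label image.\n",
"toy_image = np.array([[1.0, 1.0, 9.0],\n",
"                      [1.0, 8.0, 9.0],\n",
"                      [5.0, 1.0, 1.0]])\n",
"toy_labels = np.array([[0, 0, 1],\n",
"                       [0, 1, 1],\n",
"                       [2, 0, 0]])\n",
"\n",
"# Mean intensity of the pixels belonging to each label (0 = background).\n",
"toy_means = [float(toy_image[toy_labels == lab].mean()) for lab in range(3)]\n",
"print(toy_means)"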
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also calculate the standard deviation of the pixel values in our grayscale image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stdev_values = dask_image.ndmeasure.labeled_comprehension(combined_image, label_image, index, dask_image.ndmeasure.standard_deviation, out_dtype, default, pass_positions=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's load our analysis results into a pandas table and then save it as a CSV file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame()\n", "df['label'] = index\n", "df['mean'] = mean_values.compute()\n", "df['standard_deviation'] = stdev_values.compute()\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.to_csv('example_analysis_results.csv')\n", "print('Saved example_analysis_results.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Next steps](#Next-steps)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I hope this guide has helped you to get started with dask-image. \n", "\n", "**Documentation**\n", "\n", "You can read more about dask-image in the [dask-image documentation](https://dask-image.readthedocs.io/en/latest/api.html)\n", "and [API reference](https://dask-image.readthedocs.io/en/latest/api.html). Documentation for [dask is here](http://docs.dask.org/en/latest/).\n", "\n", "The dask-examples repository has a number of other example notebooks: https://github.com/dask/dask-examples\n", "\n", "**Scaling up with dask distributed**\n", "\n", "If you want to send dask jobs to a computing cluster for distributed processing, you should take a look at [dask distributed](https://distributed.dask.org/en/latest/). 
There is also a [quickstart guide available](https://distributed.dask.org/en/latest/quickstart.html).\n", "\n", "**Saving image data with zarr**\n", "\n", "In some cases it may be necessary to save large data after image processing. [zarr](https://zarr.readthedocs.io/en/stable/) is a Python library that you may find useful for this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Cleaning up temporary directories and files](#Cleaning-up-temporary-directories-and-files)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that we saved some example data to the directories `temp/` and `temp-tiles/`. To delete them and their contents, run the following commands:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!rm -r temp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!rm -r temp-tiles" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }