{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Image Segmentation using the idr0021 dataset\n", "\n", "This notebook uses the idr0021 dataset ([idr0021-lawo-pericentriolarmaterial/experimentA](http://idr.openmicroscopy.org/webclient/?show=project-51)) and tries to reproduce some of the analysis published in ['Subdiffraction imaging of centrosomes reveals higher-order organizational features of pericentriolar material'](https://doi.org/10.1038/ncb2591); in particular to create a figure similar to [Figure 1](https://www.nature.com/articles/ncb2591/figures/1) of the article.\n", "\n", "## Tasks\n", "\n", "- Connect to an OMERO.server\n", "- Load data (images, datasets, projects, pixel data)\n", "- Image segmentation using EBImage\n", "- Create ROIs\n", "- Store segmentation results on the OMERO.server\n", "- Plot the results\n", "- Calculate basic statistics from the segmentation data\n", "\n", "**Exercise:** Go to https://workshop.openmicroscopy.org , log in as your given user, and find your dataset called 'R-dataset'. Leave the window open, as you will use OMERO.web to check the results as we're going along." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to use this notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The grey blocks with the square brackets to the left are code blocks. Click on a block to select it and click the play button in the toolbar to execute the block of code (alternatively hold SHIFT key and hit ENTER). The bracket will show a star to indicate that this code is currently running. The output is displayed below the block. Wait for it to finish. When the execution is finished the star will be replaced by a number. \n", "\n", "As we go through the notebook you are expected to run the code blocks in order.\n", "\n", "Here's an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "message(\"Just wait for a bit...\")\n", "Sys.sleep(3)\n", "message(\"Done.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can change the code and run the block again." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's get started..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the necessary libraries:\n", "- [romero-gateway](https://github.com/ome/rOMERO-gateway) for the OMERO <-> R communication\n", "- [EBImage](https://bioconductor.org/packages/release/bioc/html/EBImage.html) to perform the image thresholding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the libraries\n", "library(romero.gateway)\n", "library(EBImage, warn.conflicts = FALSE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Log in to the OMERO server" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "user_name = readline('Username: ')\n", "user_password <- getPass::getPass('OMERO password: ')\n", "\n", "server <- OMEROServer(host = 'wss://workshop.openmicroscopy.org/omero-ws', port = 443L, username=user_name, password=user_password)\n", "server <- connect(server)\n", "paste('Successfully logged in as', server@user$getUserName())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load and display an image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Go to the image 'siControl_N20_Cep215_I_20110411_Mon-1509_0_SIR_PRJ.dv' in your 'R-dataset', find the 'Image ID', copy it and paste it below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imageId <- REPLACE_WITH_IMAGE_ID \n", "image <- loadObject(server, \"ImageData\", imageId)\n", "paste(\"Image\", imageId, \"loaded.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the pixel values and display the image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# There is just one plane, so z = 1 and t = 1\n", "z <- 1\n", "t <- 1\n", "\n", "# Load the second channel\n", "channelIndex <- 2\n", "\n", "pixels <- getPixelValues(image, z, t, channelIndex)\n", "\n", "ebimage <- EBImage::Image(data = pixels, colormode = 'Grayscale')\n", "ebimage <- normalize(ebimage)\n", "EBImage::display(ebimage)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Image segmentation\n", "\n", "### Visual\n", "\n", "We are using four steps for the segmentation:\n", "- Thresholding using the [thresh](https://www.rdocumentation.org/packages/EBImage/versions/4.14.2/topics/thresh) function to get a binary image.\n", "- Filtering to remove noise, using [medianFilter](https://www.rdocumentation.org/packages/EBImage/versions/4.14.2/topics/medianFilter).\n", "- [fillHull](https://www.rdocumentation.org/packages/EBImage/versions/4.14.2/topics/fillHull) to close holes in the segmented objects.\n", "\n", "Note: Up till now we still had a binary image (the pixel array only consists of 0's (background) and 1's (objects)), therefore we need:\n", "\n", "- [bwlabel](https://www.rdocumentation.org/packages/EBImage/versions/4.14.2/topics/bwlabel) to distinguish the single objects from each other. It assigns a different number to each object (first object will get pixel values '1', the second pixel values '2', etc.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Threshold\n", "threshImg <- thresh(ebimage, w=10, h=10, offset=0.01)\n", "\n", "# Remove noise\n", "threshImg <- medianFilter(threshImg, size=3)\n", "\n", "# Fill holes\n", "threshImg <- fillHull(threshImg) \n", "\n", "# Distinguish objects\n", "threshImg <- bwlabel(threshImg)\n", "\n", "# Show the segmented image\n", "EBImage::display(colorLabels(threshImg))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Modify the parameters of the thresh function w, h and offset to get a better segmentation. Goal: The two centrioles should be detected as two separate objects. We don't have to get rid off all the noise, but the two centrioles should be the largest objects detected. (Hint: w=15, h=15, offset=0.1 seems to work pretty well)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute the features (measurements)\n", "\n", "Use the [computeFeatures](https://www.rdocumentation.org/packages/EBImage/versions/4.14.2/topics/computeFeatures) methods to calculate some properties of the objects." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shapes = computeFeatures.shape(threshImg)\n", "moments = computeFeatures.moment(threshImg)\n", "\n", "# merge the two dataframes together into one 'features' dataframe\n", "features <- merge(shapes, moments, by=0, all=TRUE)\n", "features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For further analysis we need the object sizes: 's.area', 's.perimeter', especially 's.radius.mean' (~ diameter).\n", "\n", "For the visualisation we can use the location (m.cx, m.mcy), the major radius (m.majoraxis), the minor radius (can be calculated from m.eccentricity) and the rotation angle (m.theta) to draw an Ellipse ROI around each object." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save the results back to OMERO\n", "\n", "### Create the ROIs from the features table\n", "\n", "In order to be able to re-use this code later, we define a function for it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#' Create ROIs from a features table\n", "#'\n", "#' @param features The shape and moments generated by EBImage computeFeatures\n", "#' @return A dataframe specifying the x, y, rx, ry, w, h, theta and text parameters of the ROIs\n", "createROIs <- function(features) {\n", " rois <- data.frame(x = c(0), y = c(0), rx = c(0), ry = c(0), w = c(0), h = c(0), theta = c(0), text = c('remove'), stringsAsFactors = FALSE)\n", " for (index in 1:length(features[,1])) {\n", " x <- features[index,8]\n", " y <- features[index,9]\n", " r1 <- features[index,10]\n", " ecc <- features[index,11]\n", " r2 <- sqrt(r1^2 * (1 - ecc^2))\n", " theta <- features[index,12]\n", " rois <- rbind(rois, c(x, y, r1, r2, NA, NA, -theta, as.character(index)))\n", " }\n", " rois <- rois[-1,]\n", " rownames(rois) <- c()\n", " return(rois)\n", "}\n", "print(\"Function 'createROIs' defined.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rois <- createROIs(features)\n", "rois" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save the ROIs back to OMERO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# addROIs creates an ROI for each entry in the dataframe specified by the 'coords' parameter\n", "addROIs(image, coords = rois)\n", "print(\"ROIs successfully created.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Open the image in the full viewer in OMERO.web and check the ROIs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Store the features table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Attach the whole 'features' dataframe to the image:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'attachDataframe' directly attaches an R dataframe to an OMERO image (project, dataset, etc.)\n", "# ('invisible' not strictly necessary, just suppresses some unnecessary output)\n", "invisible(attachDataframe(image, features, \"ROI features\"))\n", "print(\"Data frame successfully attached.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataframe is now attached to the image as an [HDF](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) table with file name 'ROI features'. You can download and open it with software like HDF Compass, load it into python scripts using 'h5py' library, etc. or simply load it directly as R dataframe again." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively attach it as CSV file:\n", "\n", "- Use R's built-in write.csv method to create a csv file\n", "- And attach that to the image using attachFile function from the romero.gateway" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "csvFile <- \"/tmp/ROI_features.csv\"\n", "write.csv(features, file = csvFile)\n", "fileannotation <- attachFile(image, csvFile)\n", "print(\"CSV file successfully attached.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tasks:\n", "\n", "- Segment each image of a dataset.\n", "- For visualisation create an ROI for each of the objects detected.\n", "- For further analysis only take the largest object per image into account (assumption: this is a centriole). Parameters to evaluate for this object are area, perimeter and diameter\n", "\n", "Note: We need to be able to specify the channel. Only one channel in each image is relevant for our analysis. The relevant channel is identified by its name.\n", "\n", "We put all the pieces together and wrap them up in a function called 'analyzeImage':\n", "\n", "- Determine the relevant channel\n", "- Load the pixel values\n", "- Perform the segmentation\n", "- Calculate the features of the detected objects\n", "- Create ROIs to visualise the segmentation results (adding the ROI 'index' as text/comment so that we can identify the ROI in the features table)\n", "- Extract the valuable information for further statistical analysis (area, perimeter and diameter of the largest object) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#' Performs an image segmentation to find the largest ROI of the image\n", "#' and returns some features of interest (area, perimeter and diameter).\n", "#' Optionally: Creates an ROI for each object detected by the segmentation.\n", "#'\n", "#' @param image The image to segment\n", "#' @param channelName The channel to be taken into account\n", "#' @param df The dataframe to add the features to (channelName, imageName, ImageID, ROIIndex, area, perimeter, diameter)\n", "#' @param saveROIs Flag if ROIs should be created and attached to the image (default: FALSE)\n", "#' @return The dataframe with the features\n", "analyzeImage <- function(image, channelName, df, saveROIs = FALSE) {\n", " # Find the channel index\n", " chnames <- getChannelNames(image)\n", " chindex <- match(channelName, chnames, nomatch = 0)\n", " if (chindex == 0) {\n", " message (paste(\"Could not resolve channel name, skipping \", image@dataobject$getId()))\n", " return(df)\n", " }\n", " \n", " # Load the pixels\n", " pixels <- getPixelValues(image, 1, 1, chindex)\n", " ebi <- EBImage::Image(data = pixels, colormode = 'Grayscale')\n", " ebi <- normalize(ebi)\n", " \n", " # this is our segmentation workflow from above\n", " threshImg <- thresh(ebi, w=15, h=15, offset=0.1)\n", " threshImg <- medianFilter(threshImg, size=3)\n", " threshImg <- fillHull(threshImg) \n", " threshImg <- bwlabel(threshImg)\n", " \n", " # Calculate the features\n", " shapes = suppressMessages(computeFeatures.shape(threshImg))\n", " moments = suppressMessages(computeFeatures.moment(threshImg))\n", " features <- merge(shapes, moments, by=0, all=TRUE)\n", " \n", " if (length(features[,1])>1) {\n", " # Add the ROIs to the image\n", " if (saveROIs) {\n", " rois <- createROIs(features)\n", " addROIs(image, coords = rois)\n", " }\n", " \n", " # Add the interesting properties (area, perimeter and diameter)\n", " # of the largest object together with channel name, image name, image id \n", " # and roi index to the dataframe\n", " features <- features[order(-features[,2]),]\n", " diameter <- features[1,4]*2\n", " df <- rbind(df, c(channelName, image@dataobject$getName(), image@dataobject$getId(), features[1,1], features[1,2], features[1,3], diameter))\n", " }\n", " return(df)\n", "}\n", "print(\"Function 'analyzeImage' defined.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate over the dataset and call the analyze function for each of the images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datasetId <- REPLACE_WITH_DATASET_ID\n", "\n", "channelName <- 'CDK5RAP2-C'\n", "\n", "dataset <- loadObject(server, \"DatasetData\", datasetId)\n", "\n", "# Keep the channel name, image name, image id, area, perimeter, and diameter of the largest ROIs\n", "result <- data.frame(Channel = c('remove'), ImageName = c('remove'), Image = c(0), ROIIndex = c(0), Area = c(0), Perimeter = c(0), Diameter = c(0), stringsAsFactors = FALSE)\n", "\n", "images <- getImages(dataset)\n", "for (image in images) {\n", " result <- tryCatch({\n", " analyzeImage(image, channelName, result, saveROIs = TRUE)\n", " }, warning = function(war) {\n", " message(paste(\"WARNING: \", image@dataobject$getId(),war))\n", " return(result)\n", " }, error = function(err) {\n", " message(paste(\"ERROR: \", image@dataobject$getId() ,err))\n", " return(result)\n", " }, finally = {\n", " })\n", "}\n", "\n", "result <- result[-1,]\n", "rownames(result) <- c()\n", "\n", "# set the correct datatypes\n", "result$Channel <- as.factor(result$Channel)\n", "result$Area <- as.numeric(result$Area)\n", "result$Perimeter <- as.numeric(result$Perimeter)\n", "result$Diameter <- as.numeric(result$Diameter)\n", "\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Open some of the other images of the dataset in the full viewer to check the ROIs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Optional Exercise:** The measurements (Area, Perimeter and Diameter) are currently showing the number of pixels. Find out the pixel size in nanometers and apply it accordingly. Hint: The X and Y pixel sizes are the same for all images. Select one image in OMERO.web and check its metadata (right hand side panel). Solution: See bottom of this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pxSizeInNM <- NA # REPLACE WITH PIXEL SIZE\n", "if (!is.na(pxSizeInNM)) {\n", " # Replace result$Area, Perimeter, Diameter with the\n", " # size in nanometers, using the variable 'pxSizeInNM'\n", " # declared above.\n", " \n", " # Enter code here...\n", " \n", " # Just print out the data frame again to check result:\n", " result\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical analysis\n", "\n", "The image segmentation has already been run over the whole idr0021 project and the results attached to the project as table 'Summary from R'. \n", "We are going to load this table from OMERO as dataframe \"centrioles\". In this manner, we will quickly obtain a sizeable segmentation dataset to perform a statistical analysis on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise**: Go to trainer-1's idr0021 project, copy the ID and paste it below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "projectId <- REPLACE_WITH_PROJECT_ID \n", "project <- loadObject(server, \"ProjectData\", projectId)\n", "dataframes <- availableDataframes(project)\n", "print(dataframes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the dataframe\n", "dfID <- dataframes$ID[1]\n", "centrioles <- loadDataframe(project, dfID)\n", "\n", "# Make sure the data types are correct\n", "centrioles$Dataset <- as.factor(centrioles$Dataset)\n", "centrioles$Diameter <- as.numeric(centrioles$Diameter)\n", "centrioles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Order the datasets ascending by mean diameter \n", "ag <-aggregate(centrioles$Diameter ~ centrioles$Dataset, centrioles, median)\n", "ordered <- factor(centrioles$Dataset, levels=ag[order(ag$`centrioles$Diameter`), 'centrioles$Dataset'])\n", "\n", "# Draw the plot \n", "plot(centrioles$Diameter ~ ordered, ylab='Diameter (nm)', xlab=\"Protein\", cex.axis=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interpretation\n", "\n", "#### One-way analysis of variance\n", "\n", "Is there a significant difference between the proteins?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fit <- aov(centrioles$Diameter ~ centrioles$Dataset)\n", "summary(fit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Two-sample Wilcoxon test of all pairwise combinations\n", "\n", "Are all proteins significantly different from each other?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Two-sample Wilcoxon test ('Mann-Whitney') of all pairwise combinations:\n", "combins <- combn(levels(centrioles$Dataset), 2)\n", "params_list <- split(as.vector(combins), rep(1:ncol(combins), each = nrow(combins)))\n", "testResults <- data.frame()\n", "for (param in params_list) {\n", " testdf <- subset(centrioles, centrioles$Dataset %in% param)\n", " pval <- wilcox.test(formula = Diameter ~ Dataset, data = testdf)$p.value\n", " testResults <- rbind(testResults, data.frame(Protein_1=param[1], Protein_2=param[2], p_value=pval))\n", "}\n", "testResults" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally close the connection to the OMERO.server again:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "disconnect(server)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Further exercise:** Take a look at the plot again. Find the outliers in OMERO.web (using 'parade') and check what might have gone wrong there." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution for the optional exercise ('set pixel size in nanometer'):\n", "\n", "pxSizeInNM <- 40\n", "result$Area <- result$Area * pxSizeInNM ^ 2\n", "result$Perimeter <- result$Perimeter * pxSizeInNM\n", "result$Diameter <- result$Diameter * pxSizeInNM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### License (BSD 2-Clause)\n", "\n", "Copyright (c) 2021, University of Dundee All rights reserved.\n", "\n", "Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:\n", "\n", "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.\n", "Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.\n", "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. " ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }