{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Big Data Analytics for 4D Scanning Transmission Electron Microscopy\n",
    "\n",
    "**S. Jesse, M. Chi, A. Belianinov, C. Beekman, S. V. Kalinin, A. Y. Borisevich & A. R. Lupini**\n",
    "\n",
    "Supporting material for paper published in:<br>\n",
    "**Scientific Reports** -  https://www.nature.com/articles/srep26348\n",
    "\n",
    "Here, we will be working with four dimensional datasets acquired using a scanning transmission electron microscope (STEM). These datsets have four dimensions - two (x, y) dimensions from the position of the electron beam and each spatial pixel contains a two dimensional (u, v) image, called a **ronchigram**, recorded by the detector. Though the ronchigrams are typically averaged to two values (bright field, dark field), retaining the raw ronchigrams enables deeper investigation of data to reveal the existence of different phases in the material and other patterns that would typically not be visible in the averaged data\n",
    "\n",
    "### References:\n",
    "This Jupyter notebook uses [pycroscopy](https://pycroscopy.github.io/pycroscopy/about.html) to analyze Band Excitation data. We request you to reference the following papers if you use this notebook for your research: \n",
    "\n",
    "* [Arxiv paper](https://arxiv.org/abs/1903.09515) titled \"*USID and Pycroscopy - Open frameworks for storing and analyzing spectroscopic and imaging data*\"\n",
    "* [Scientific Reports](https://www.nature.com/articles/srep26348) titled \"*Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography*\"\n",
    "* Dataset used here is available via [OSTI / OLCF](https://www.osti.gov/dataexplorer/biblio/1463599)\n",
    "\n",
    "### Information about this notebook:\n",
    "Notebook written by:<br>\n",
    "**Suhas Somnath, and Chris R. Smith**<br>\n",
    "The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials <br>\n",
    "Oak Ridge National Laboratory<br>\n",
    "1/19/2017\n",
    "\n",
    "This is a Jupyter Notebook - it contains text and executable code `cells`. To learn more about how to use it, please see [this video](https://www.youtube.com/watch?v=jZ952vChhuI). Please see the image below for some basic tips on using this notebook.\n",
    "\n",
    "![notebook_rules.png](https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/jupyter_notebooks/notebook_rules.png)\n",
    "\n",
    "Image courtesy of Jean Bilheux from the [neutron imaging](https://github.com/neutronimaging/python_notebooks) GitHub repository.\n",
    "\n",
    "### A note about software versions:\n",
    "**Note: This notebook was written for the pyUSID and pycroscopy versions listed below and is not guaranteed to work on past or future versions of the packages**\n",
    "\n",
    "`pyUSID` version: `0.0.4`<br>\n",
    "`pycroscopy` version: `0.60.1`\n",
    "\n",
    "If you have a different version of the packages installed, you may consider using the notebook as is and accept the possibility of errors. The cell below will attempt to install the correct versions of the packages. However, if you experience trouble, you can uninstall the existing version of the conflicting packages and install the required versions above by executing the following commands in a terminal (Linux / MacOS) / Anaconda prompt (Windows): \n",
    "\n",
    "```bash\n",
    "pip uninstall pyUSID\n",
    "pip install -I pyUSID==0.0.4\n",
    "pip uninstall pycroscopy\n",
    "pip install -I pycroscopy==0.60.1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure the notebook first"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Make sure needed packages are installed and up-to-date\n",
    "import sys\n",
    "!conda install --yes --prefix {sys.prefix} numpy scipy matplotlib scikit-learn Ipython ipywidgets h5py\n",
    "!{sys.executable} -m pip install -U --no-deps pyUSID==0.0.4\n",
    "!{sys.executable} -m pip install -U --no-deps pycroscopy==0.60.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Ensure python 3 compatibility\n",
    "from __future__ import division, print_function, absolute_import\n",
    "\n",
    "import os\n",
    "\n",
    "# Import necessary libraries:\n",
    "import h5py\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as patches\n",
    "from IPython.display import display, HTML\n",
    "import ipywidgets as widgets\n",
    "from sklearn.cluster import KMeans\n",
    "\n",
    "sys.path.append('..')\n",
    "import pyUSID as usid\n",
    "import pycroscopy as px\n",
    "\n",
    "# Make Notebook take up most of page width\n",
    "display(HTML(data=\"\"\"\n",
    "<style>\n",
    "    div#notebook-container    { width: 95%; }\n",
    "    div#menubar-container     { width: 65%; }\n",
    "    div#maintoolbar-container { width: 99%; }\n",
    "</style>\n",
    "\"\"\"))\n",
    "\n",
    "# set up notebook to show plots within the notebook\n",
    "% matplotlib notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load pycroscopy compatible 4D STEM dataset\n",
    "\n",
    "For simplicity we will use a dataset that has already been translated form its original data format into a **Univeral Spectroscopy and Imaging Data (USID)** hierarchical data format (HDF5 or H5) file. For more information regarding USID, HDF5, etc. please see the documentation on our github projects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Select a file to work on:\n",
    "h5_path = px.io_utils.file_dialog('*.h5', '4D STEM dataset formatted according to USID')\n",
    "print('Working on:\\n' + h5_path)\n",
    "# Open the file\n",
    "h5_file = h5py.File(h5_path, mode='r+')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get reference to Raw measurement"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Select the dataset containing the raw data to start working with:\n",
    "h5_main = usid.hdf_utils.find_dataset(h5_file, 'Raw_Data')[-1]\n",
    "\n",
    "# Upgrade this object from a regular HDF5 dataset to a USIDataset:\n",
    "h5_main = usid.USIDataset(h5_main)\n",
    "\n",
    "# Read some necessary parameters:\n",
    "h5_pos_inds = h5_main.h5_pos_inds\n",
    "num_rows, num_cols = h5_main.pos_dim_sizes\n",
    "h5_spec_inds = h5_main.h5_spec_inds\n",
    "num_sensor_rows, num_sensor_cols = h5_main.spec_dim_sizes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the Raw Ronchigrams"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "coarse_row = int(0.5*num_rows)\n",
    "coarse_col = int(0.5*num_cols)\n",
    "coarse_pos = coarse_row * num_rows + coarse_col\n",
    "\n",
    "current_ronch = np.reshape(h5_main[coarse_pos], (num_sensor_rows, num_sensor_cols))\n",
    "\n",
    "fig, axes = plt.subplots(ncols=2, figsize=(14,7))\n",
    "axes[0].hold(True)\n",
    "axes[0].set_title('Mean Response')\n",
    "main_map = axes[0].imshow(np.reshape(h5_main.parent['Spectroscopic_Mean'], (num_rows, num_cols)), \n",
    "                          cmap=px.plot_utils.cmap_jet_white_center(), origin='lower')\n",
    "main_vert_line = axes[0].axvline(x=coarse_col, color='k')\n",
    "main_hor_line = axes[0].axhline(y=coarse_row, color='k')\n",
    "axes[1].set_title('Ronchigram at current pixel')\n",
    "img_zoom = axes[1].imshow(current_ronch,cmap=px.plot_utils.cmap_jet_white_center(), origin='lower')\n",
    "\n",
    "def move_zoom_box(event):\n",
    "    if not main_map.axes.in_axes(event):\n",
    "        return\n",
    "    \n",
    "    coarse_col = int(round(event.xdata))\n",
    "    coarse_row = int(round(event.ydata))\n",
    "    main_vert_line.set_xdata(coarse_col)\n",
    "    main_hor_line.set_ydata(coarse_row)\n",
    "    \n",
    "    coarse_pos = coarse_row * num_rows + coarse_col\n",
    "    current_ronch = np.reshape(h5_main[coarse_pos], (num_sensor_rows, num_sensor_cols))\n",
    "\n",
    "    img_zoom.set_data(current_ronch)\n",
    "    #img_zoom.set_clim(vmax=ronch_max, vmin=ronch_min)\n",
    "    fig.canvas.draw()\n",
    "    \n",
    "\n",
    "cid = main_map.figure.canvas.mpl_connect('button_press_event', move_zoom_box)\n",
    "# widgets.interact(move_zoom_box, coarse_row=(0, num_rows, 1), \n",
    "#                  coarse_col=(0, num_cols, 1));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performing Singular Value Decompostion (SVD)\n",
    "SVD decomposes data (arranged as position x value) into a sequence of orthogonal components arranged in descending order of variance. The first component contains the most significant trend in the data. The second component contains the next most significant trend orthogonal to all previous components (just the first component). Each component consists of the trend itself (eigenvector), the spatial variaion of this trend (eigenvalues), and the variance (statistical importance) of the component.\n",
    "\n",
    "Here, SVD essentially compares every single ronchigram with every other ronchigram to find statistically significant trends in the dataset. Such correlation would be infeasible if the ronchigrams were averaged to bright-field and dark-field scalar values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Choose how many components you want\n",
    "num_svd_comps = 256\n",
    "\n",
    "proc = px.processing.SVD(h5_main, num_components=num_svd_comps)\n",
    "\n",
    "h5_svd_group = proc.compute()\n",
    "    \n",
    "h5_u = h5_svd_group['U']\n",
    "h5_v = h5_svd_group['V']\n",
    "h5_s = h5_svd_group['S']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the SVD results\n",
    "\n",
    "##### S (variance):\n",
    "The plot below shows the variance or statistical significance of the SVD components. The first few components contain the most significant information while the last few components mainly contain noise. \n",
    "\n",
    "Note also that the plot below is a log-log plot. The importance of each subsequent component drops exponentially."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Choose how many components of U and V to display\n",
    "num_plot_comps = 16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Visualize variance of the principal components\n",
    "fig, axes = usid.plot_utils.plot_scree(h5_s, title='Variance')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### U (Eigenvalues or Abundance maps):\n",
    "The plot below shows the spatial distribution of each SVD component"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Visualize the abundance maps from SVD:\n",
    "loadings = np.reshape(h5_u[:, :num_plot_comps], (num_rows, num_cols, -1)).transpose([2, 0, 1])\n",
    "fig, axes = usid.plot_utils.plot_map_stack(loadings, num_comps=num_plot_comps, title='Abundance Maps',\n",
    "                                         cmap=px.plot_utils.cmap_jet_white_center())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### V (Endmembers or Eigenvectors)\n",
    "The V dataset contains the endmembers for each component"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Visualize the Endmembers from SVD:\n",
    "eigenvectors = np.reshape(h5_v[:num_plot_comps], (-1, num_sensor_rows, num_sensor_cols))\n",
    "fig, axes = usid.plot_utils.plot_map_stack(eigenvectors, num_comps=num_plot_comps, title='Endmembers',\n",
    "                                         cmap=px.plot_utils.cmap_jet_white_center())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clustering\n",
    "Clustering divides data into k clusters such that the variance within each cluster is minimized. \n",
    "\n",
    "In principle, clustering can be perfomed on any dataset that has some spectral values (eg. - ronchgirams in the case of the raw dataset or an array of values for each SVD component) for each position. However, the computation time increases with the size of the dataset.\n",
    "\n",
    "Here, we will be performing k-means clustering on the U matrix from SVD. \n",
    "We want a large enough number of clusters so that K-means identifies fine nuances in the data. At the same time, we want to minimize computational time by reducing the number of clusters. We recommend 32 clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Choose how many SVD components to use in clustering\n",
    "spectral_components = 128\n",
    "# Choose how many clusters to use\n",
    "num_clusters = 32"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "estimator = KMeans(n_clusters=num_clusters)\n",
    "\n",
    "proc = px.processing.Cluster(h5_u, estimator, num_comps=spectral_components)\n",
    "\n",
    "h5_kmeans_group = proc.compute()\n",
    "    \n",
    "h5_labels = h5_kmeans_group['Labels']\n",
    "h5_centroids = h5_kmeans_group['Mean_Response']\n",
    "\n",
    "# In case we take existing results, we need to get these parameters\n",
    "num_comps_for_clustering = h5_centroids.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Visualize k-means results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "label_mat = np.reshape(h5_labels, (num_rows, num_cols))\n",
    "fig, axes = px.viz.cluster_utils.plot_cluster_labels(label_mat, num_clusters=num_clusters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Visualize the hierarchical clustering\n",
    "The vertical length of the branches indicates the relative separation between neighboring clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "e_vals = np.reshape(h5_u[:, :spectral_components], \n",
    "                    (num_rows, num_cols, -1))\n",
    "fig = px.viz.cluster_utils.plot_cluster_dendrogram(label_mat, e_vals, \n",
    "                                                   num_comps_for_clustering, \n",
    "                                                   num_clusters, \n",
    "                                                   last=num_clusters);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save and close\n",
    "* Save the .h5 file that we are working on by closing it. <br>\n",
    "* Also, consider exporting this notebook as a notebook or an html file. <br> To do this, go to File >> Download as >> HTML\n",
    "* Finally consider saving this notebook if necessary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "h5_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  },
  "widgets": {
   "state": {
    "74a037e0ed7f4854a0c8e337ac2d6798": {
     "views": [
      {
       "cell_index": 10
      }
     ]
    }
   },
   "version": "1.2.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}