{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "oJj7NuZlyZGJ" }, "source": [ "

Data and Scripts
for Hydrological Streamline Detection Using a U-net Model with Attention Module

\n", "

\n", "$Zewei$ $Xu^{1,2}$; $Shaowen$ $Wang^{1,2}$; $Lawrence V.$ $Stanislawski^{3}$; $Zhe$ $Jiang^{4}$; $Nattapon$ $Jaroenchai^{1,2}$; $Arpan Man$ $Sainju^{4}$; $Ethan$ $Shavers^{3}$; $E. Lynn$ $Usery^{3}$; $Li$ $Chen^{2,5}$; $Zhiyu$ $Li^{1,2}$; $Bin$ $Su^{1,2}$\n", "

\n", "

\n", "$^{1}$$Department$ $of$ $Geography$ $and$ $Geographic$ $Information$ $Science$, $University$ $of$ $Illinois$ $at$ $Urbana-Champaign$, $Urbana$, $IL$, $USA$
\n", "$^{2}$$CyberGIS$ $Center$ $for$ $Advanced$ $Digital$ $and$ $Spatial$ $Studies$, $University$ $of$ $Illinois$ $at$ $Urbana-Champaign$, $Urbana$, $IL$, $USA$
\n", "$^{3}$$U.S.$ $Geological$ $Survey$, $Center$ $of$ $Excellence$ $for$ $Geospatial$ $Information$ $Science$, $Rolla$, $MO$, $USA$
\n", "$^{4}$$Department$ $of$ $Computer$ $Science$, $University$ $of$ $Alabama$, $Tuscaloosa$, $AL$, $USA$
\n", "$^{5}$$School$ $of$ $Geosciences$ $and$ $Info-Physics$, $Central$ $South$ $University$, $Changsha$, $Hunan$, $China$
\n", "$Corresponding$ $Author:$ $nj7@illinois.edu$\n", "

\n", "\n", "---\n", " \n", "**Notebook Structure:**\n", "- [Introduction](1_Introduction.ipynb)\n", "- Codes\n", " - [Data Preprocessing](2.1_Code_Data_Preprocessing.ipynb)\n", " - [Model Training](2.2_Code_Model_Training.ipynb)\n", " - [Interpret the Result](2.3_Code_Interpret_the_Result%20.ipynb) " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "lRgnHKrhyZGM" }, "source": [ "---\n", "\n", "### Data Preprocessing \n", "\n", "This part of code will generate the samples for training and validation process. \n", "\n", "In this research, the training and validatin are generated in 4 secenarios base on the parameter m in the block below,\n", "\n", "1. scenario 1 (m=\"n\"): the upper half is used to generate trainging and validation samples.\n", "2. scenario 2 (m=\"n2\"): the lower half is used to generate trainging and validation samples.\n", "3. scenario 3 (m=\"v\"): the left half is used to generate trainging and validation samples.\n", "4. scenario 4 (m=\"v2\"): the right half is used to generate trainging and validation samples.\n", "\n", "The code will generate 2000 samples then devide the sample into training (2/3) and valiation (1/3) sample sets. Then the data and label of the data will be save to .npy files. \n", "\n", "---\n", "\n", "### Run the block below to install required libraries\n", "\n", "You have to restart the kernel after running the code block below. \n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "q3yCBlsYYN5O", "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: imgaug in /opt/conda/lib/python3.7/site-packages (0.4.0)\n", "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from imgaug) (1.14.0)\n", "Requirement already satisfied: opencv-python in /opt/conda/lib/python3.7/site-packages (from imgaug) (4.3.0.36)\n", "Requirement already satisfied: Shapely in /opt/conda/lib/python3.7/site-packages (from imgaug) (1.7.0)\n", "Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from imgaug) (1.4.1)\n", "Requirement already satisfied: matplotlib in /opt/conda/lib/python3.7/site-packages (from imgaug) (3.1.3)\n", "Requirement already satisfied: imageio in /opt/conda/lib/python3.7/site-packages (from imgaug) (2.8.0)\n", "Requirement already satisfied: Pillow in /opt/conda/lib/python3.7/site-packages (from imgaug) (7.0.0)\n", "Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.7/site-packages (from imgaug) (1.18.1)\n", "Requirement already satisfied: scikit-image>=0.14.2 in /opt/conda/lib/python3.7/site-packages (from imgaug) (0.16.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->imgaug) (1.1.0)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->imgaug) (2.4.6)\n", "Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->imgaug) (0.10.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->imgaug) (2.8.1)\n", "Requirement already satisfied: networkx>=2.0 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.14.2->imgaug) (2.4)\n", "Requirement already satisfied: PyWavelets>=0.4.0 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.14.2->imgaug) (1.1.1)\n", 
"Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->imgaug) (45.2.0.post20200209)\n", "Requirement already satisfied: decorator>=4.3.0 in /opt/conda/lib/python3.7/site-packages (from networkx>=2.0->scikit-image>=0.14.2->imgaug) (4.4.2)\n", "Collecting intervaltree\n", " Downloading intervaltree-3.1.0.tar.gz (32 kB)\n", "Requirement already satisfied: sortedcontainers<3.0,>=2.0 in /opt/conda/lib/python3.7/site-packages (from intervaltree) (2.1.0)\n", "Building wheels for collected packages: intervaltree\n", " Building wheel for intervaltree (setup.py) ... \u001b[?25ldone\n", "\u001b[?25h Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26102 sha256=a0f53d6eb87da0930d8910f41eb79651df1547958aa09ca36ff4b2d025b6fad4\n", " Stored in directory: /home/jovyan/.cache/pip/wheels/16/85/bd/1001cbb46dcfb71c2001cd7401c6fb250392f22a81ce3722f7\n", "Successfully built intervaltree\n", "Installing collected packages: intervaltree\n", "Successfully installed intervaltree-3.1.0\n", "Collecting numpy\n", " Downloading numpy-1.20.2-cp37-cp37m-manylinux2010_x86_64.whl (15.3 MB)\n", "\u001b[K |████████████████████████████████| 15.3 MB 13.3 MB/s eta 0:00:01█▊ | 3.2 MB 13.3 MB/s eta 0:00:01��█████ | 5.2 MB 13.3 MB/s eta 0:00:01 | 7.2 MB 13.3 MB/s eta 0:00:01 |██████████████████▍ | 8.8 MB 13.3 MB/s eta 0:00:01B 13.3 MB/s eta 0:00:01\n", "\u001b[31mERROR: libpysal 4.2.2 requires bs4, which is not installed.\u001b[0m\n", "\u001b[?25hInstalling collected packages: numpy\n", "\u001b[33m WARNING: The scripts f2py, f2py3 and f2py3.7 are installed in '/home/jovyan/.local/bin' which is not on PATH.\n", " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\n", "Successfully installed numpy-1.20.2\n" ] } ], "source": [ "# install required libraries for preprocessing.\n", "!pip install --user imgaug=0.4.0\n", "!pip install --user intervaltree\n", "!pip install --user numpy --upgrade\n", "\n", "# Please restart the kernel after this block finished running." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### Samples generation\n", "\n", "First we extract the training and validation samples from the reference dataset. We can only sample small sample size due to the size of the dataset. Therefore, we must augment the trainging samples to increase the training sample size and generalize the smaples.\n", "\n", "The code below shows the augmentation steps usign the pre-sampling patches in the \"train_patches_top-left_\\*.npy\". 
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### Samples generation\n", "\n", "First, we extract the training and validation samples from the reference dataset. Because the dataset is limited in size, only a small number of patches can be sampled directly. Therefore, we augment the training samples to increase the training sample size and improve generalization.\n", "\n", "The code below shows the augmentation steps using the pre-sampled patches in \"train_patches_top-left\\*.npy\". The results are saved to train_data_aug\\*.npy and train_label_aug\\*.npy.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "colab_type": "code", "executionInfo": { "elapsed": 184209, "status": "ok", "timestamp": 1583959869676, "user": { "displayName": "nattapon jaroenchai", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14Gj9i_TMlYzv27Vkas3v5qWkW7ZdTBQme3RVKqYp7c4=s64", "userId": "17092454241854925654" }, "user_tz": 300 }, "id": "gBoLpQE3dGe_", "outputId": "a725a0de-6d38-40af-dbbd-e5725e353efb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Load data complete!\n", "augmentation saved!\n", "Training data augmentation complete!\n" ] } ], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "\n", "root=\"/home/jovyan/shared_data/data/unet_streamline_detection/\"\n", "save_root=\"/home/jovyan/work/unet_streamline_detection/\"\n", "m = 'v2'\n", "# Training data augmentation\n", "import numpy as np\n", "import os\n", "import matplotlib.pyplot as plt\n", "from imgaug import augmenters as iaa\n", "# reference array\n", "reference = np.load(root+'data/reference.npy')\n", "# total prediction feature maps\n", "total = np.load(root+'data/total.npy')\n", "# Top-left coordinates of training patches\n", "trainlo = np.load(root+'data/train_patches_top-left'+m+'.npy')\n", "\n", "print(\"Load data complete!\")\n", "\n", "# Add a 200-pixel buffer around each patch\n", "pad = 200\n", "trainlo[:,0] += pad\n", "trainlo[:,1] += pad\n", "depth = total.shape[2]\n", "reference = np.pad(reference,(pad,pad),'symmetric')\n", "for i in range(depth):\n", "    temp = np.pad(total[:,:,i],(pad,pad),'symmetric')[:,:,np.newaxis]\n", "    if i == 0:\n", "        totaln = temp\n", "    else:\n", "        totaln = np.concatenate((totaln,temp),axis = 2)\n", "\n", "def process(lpathc,lref):\n", "    # Rotate by a random degree between -50 and 130\n", "    lpathc = np.concatenate((lpathc,lref[:,:,np.newaxis]),axis = 2)\n", "    rotate = iaa.Affine(rotate=(-50, 130))\n", "    image1 = rotate.augment_image(lpathc)\n", "    # Rotate by a random degree between 230 and 310\n", "    rotate = iaa.Affine(rotate=(230, 310))\n", "    image2 = rotate.augment_image(lpathc)\n", "\n", "    # Scale by a random ratio between 0.3 and 0.6\n", "    scale = iaa.Affine(scale={\"x\": (0.3, 0.6), \"y\": (0.3, 0.6)})\n", "    image3 = scale.augment_image(lpathc)\n", "\n", "    # Scale by a random ratio between 1.5 and 2.0\n", "    scale = iaa.Affine(scale={\"x\": (1.5, 2.0), \"y\": (1.5, 2.0)})\n", "    image4 = scale.augment_image(lpathc)\n", "\n", "    # Shear by a random degree between -30 and 30\n", "    shear = iaa.Affine(shear=(-30, 30))\n", "    image5 = shear.augment_image(lpathc)\n", "\n", "    # Flip horizontally\n", "    flip = iaa.Fliplr(1.0)\n", "    image6 = flip.augment_image(lpathc)\n", "\n", "    # Add Gaussian noise (disabled)\n", "    #gua = iaa.AdditiveGaussianNoise(scale=(10, 20))\n", "    #image6 = gua.augment_image(lpathc)\n", "    #ref6 = gua.augment_image(lref)\n", "    oii = []\n", "    orr = []\n", "    for i in [image1,image2,image3,image4,image5,image6]:\n", "        oii.append(i[pad:(pad+224),pad:(pad+224),:-1])\n", "        orr.append(i[pad:(pad+224),pad:(pad+224),-1])\n", "    return [oii,orr]\n", "\n", "# Concatenate augmented training data based on different types of augmentations\n", "pc = 0\n", "train_data_aug = []\n", "for i in range(len(trainlo)):\n", "    lo = trainlo[i]\n", "    lpatch = totaln[(lo[0]-pad):(lo[0]+224+pad),(lo[1]-pad):(lo[1]+224+pad),:]\n", "    lref = reference[(lo[0]-pad):(lo[0]+224+pad),(lo[1]-pad):(lo[1]+224+pad)]\n", "    if 
len(train_data_aug) == 0:\n", "        train_data_aug = lpatch[pad:(-pad),pad:(-pad),:][np.newaxis,:,:,:]\n", "        train_label_aug = lref[pad:(-pad),pad:(-pad)][np.newaxis,:,:]\n", "    else:\n", "        train_data_aug = np.concatenate((train_data_aug,lpatch[pad:(-pad),pad:(-pad),:][np.newaxis,:,:,:]),axis = 0)\n", "        train_label_aug = np.concatenate((train_label_aug,lref[pad:(-pad),pad:(-pad)][np.newaxis,:,:]),axis = 0)\n", "    [reim,rere] = process(lpatch,lref)\n", "    for j in range(6):\n", "        train_data_aug = np.concatenate((train_data_aug,reim[j][np.newaxis,:,:,:]),axis = 0)\n", "        train_label_aug = np.concatenate((train_label_aug,rere[j][np.newaxis,:,:]),axis = 0)\n", "    if i%30 == 0:\n", "        np.save(save_root+'data/gen/train_data_augP'+str(pc)+'.npy',train_data_aug)\n", "        np.save(save_root+'data/gen/train_label_augP'+str(pc)+'.npy',train_label_aug)\n", "        train_data_aug = []\n", "        train_label_aug = []\n", "        pc+=1\n", "\n", "# Store the remaining training data after the different types of augmentations\n", "np.save(save_root+'data/gen/train_data_augP'+str(pc)+'.npy',train_data_aug)\n", "np.save(save_root+'data/gen/train_label_augP'+str(pc)+'.npy',train_label_aug)\n", "\n", "print(\"augmentation saved!\")\n", "\n", "# Concatenate the training data of the different augmentations\n", "for i in range(pc+1):\n", "    temp = np.load(save_root+'data/gen/train_data_augP'+str(i)+'.npy')\n", "    templ = np.load(save_root+'data/gen/train_label_augP'+str(i)+'.npy')\n", "    if i == 0:\n", "        fdata = temp\n", "        fl = templ\n", "    else:\n", "        fdata = np.concatenate((fdata,temp),axis = 0)\n", "        fl = np.concatenate((fl,templ),axis = 0)\n", "\n", "# Remove unnecessary intermediate files\n", "#os.system('rm /content/drive/My Drive/USGS/Notebooks/data/train_*_augP*.npy')\n", "# Shuffle the finalized arrays and save them as .npy files\n", "rand = np.arange(len(fdata))\n", "np.random.shuffle(rand)\n", "train_data_aug = fdata[rand]\n", "train_label_aug = fl[rand]\n", "np.save(save_root+'data/gen/train_data_aug'+m+'.npy',train_data_aug)\n", "np.save(save_root+'data/gen/train_label_aug'+m+'.npy',train_label_aug[:,:,:,np.newaxis])\n", "\n", "print(\"Training data augmentation complete!\")\n", "###### visualization ########\n", "# import pdb\n", "# pdb.set_trace()\n", "# plt.imshow(lref[pad:(-pad),pad:(-pad)])\n", "# plt.show()\n", "# plt.imshow(lpatch[:,:,0][pad:(-pad),pad:(-pad)])\n", "# plt.show()\n", "# aug = ['rotate1','rotate2','scale1','scale2','shear','flip']\n", "# for i in range(6):\n", "#     print(aug[i])\n", "#     plt.imshow(rere[i])\n", "#     plt.show()\n", "#     plt.imshow(reim[i][:,:,0])\n", "#     plt.show()" ] },
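{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Optional check: inspect the saved augmented samples\n", "\n", "The cell below is a minimal, optional sanity check rather than part of the preprocessing workflow. Assuming the augmentation cell above has been run (it reuses np, plt, save_root, and m from that cell), it reloads the saved train_data_aug\\*.npy and train_label_aug\\*.npy files, prints their shapes, and displays the first channel of one augmented patch next to its reference labels.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check: reload the augmented arrays saved above and confirm\n", "# that the data have shape (N, 224, 224, channels) and the labels (N, 224, 224, 1).\n", "check_data = np.load(save_root+'data/gen/train_data_aug'+m+'.npy')\n", "check_label = np.load(save_root+'data/gen/train_label_aug'+m+'.npy')\n", "print('augmented data shape: ', check_data.shape)\n", "print('augmented label shape:', check_label.shape)\n", "\n", "# Show the first channel of one augmented patch and its reference labels.\n", "plt.subplot(1, 2, 1)\n", "plt.imshow(check_data[0, :, :, 0])\n", "plt.subplot(1, 2, 2)\n", "plt.imshow(check_label[0, :, :, 0])\n", "plt.show()" ] },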
\n", "\n", "*Note: running this part may consume large memory*" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 87 }, "colab_type": "code", "executionInfo": { "elapsed": 56150, "status": "ok", "timestamp": 1583959199784, "user": { "displayName": "nattapon jaroenchai", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14Gj9i_TMlYzv27Vkas3v5qWkW7ZdTBQme3RVKqYp7c4=s64", "userId": "17092454241854925654" }, "user_tz": 300 }, "id": "DgHs_6qvIv2p", "outputId": "a384fa2b-9af2-47cb-afa9-41a6031f0c7d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Completed: Data Loading!\n", "rows:24\n", "columns:16\n", "Testing moving window is generate!\n" ] } ], "source": [ "import os\n", "import numpy as np\n", "import copy\n", "import random\n", "# stream/non-stream sample size\n", "\n", "size = 2000 # number of samples \n", "patch_size = 224 #patch size of each sample\n", "\n", "#Total data dimension: 3981*2640\n", "mask = np.load(root+'data/mask.npy')\n", "totaldata = np.load(root+'data/total.npy')\n", "totaldata = np.concatenate((totaldata,mask[:,:,np.newaxis]),axis = 2)\n", "label = np.load(root+'data/reference.npy')\n", "\n", "print('Completed: Data Loading!')\n", "# buffer size\n", "buf = 30\n", "it = 'full'\n", "# Image dimension\n", "IMG_WIDTH = 224\n", "IMG_HEIGHT = 224\n", "# moving window size = image_dimension - 2*buffer_size\n", "mw = IMG_WIDTH - buf*2\n", "\n", "# Number of trainig channels\n", "# Adding padding for moving window\n", "totalnew = np.pad(totaldata, (buf, buf), 'symmetric')\n", "totalnew = totalnew[:,:,buf:(buf+5)]\n", "dim = totaldata.shape[:2]\n", "\n", "# number of patch rows\n", "numr = dim[0]//(IMG_WIDTH - buf*2)#224\n", "# number of patch columns\n", "print('rows:'+str(numr))\n", "numc = dim[1]//(IMG_WIDTH - buf*2)#224\n", "print('columns:'+str(numc))\n", "\n", "# Splitting the total data into patches \n", "count = 0\n", "for i in range(numr):\n", "\tfor j in range(numc):\n", "\t\tcount += 1\n", "\t\ttemp = totalnew[i*mw:(i*mw+224),j*mw:(j*mw+224),:][np.newaxis,:,:,:]\n", "\t\tif count == 1:\n", "\t\t\ttotal = temp#[:,:,:,:-1]\n", "\t\telse:\n", "\t\t\ttotal = np.concatenate((total, temp),axis = 0)\n", "# Save the total dataset\n", "np.save(save_root+'data/gen/prediction_data.npy',total)\n", "print(\"Testing moving window is generate!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "collapsed_sections": [], "machine_shape": "hm", "name": "2.1_Code_Data_Preprocessing.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }