{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.8/site-packages/geopandas/_compat.py:84: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import os\n",
    "import subprocess\n",
    "\n",
    "import earthpy as et\n",
    "import earthpy.plot as ep\n",
    "import earthpy.spatial as es\n",
    "import matplotlib.pyplot as plt\n",
    "import nivapy3 as nivapy\n",
    "import numpy as np\n",
    "import rasterio\n",
    "from rasterio.enums import Resampling\n",
    "\n",
    "plt.style.use(\"ggplot\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Frisk Oslofjord: raster data processing\n",
    "\n",
    "The raw RGB dataset is supplied as a single, 4-band (RGBA) GeoTiff with 2.2 cm cell resolution and 8-bit `uint` data type. It is located here:\n",
    "\n",
    "     K:\\Avdeling\\Mar\\HAN\\2_Prosjekter\\2019c_DRONING2019\\FriskOslofjord\\For ML analysis_James-Harry\\2019-08-27_Flight9_RGB\n",
    "     \n",
    "The alpha band in this dataset provides the \"no data\" mask, with values of 0 corresponding to NoData and values of 255 for valid data.\n",
    "\n",
    "The multispectral dataset is stored as separate bands here:\n",
    "\n",
    "    K:\\Avdeling\\Mar\\HAN\\2_Prosjekter\\2019c_DRONING2019\\FriskOslofjord\\For ML analysis_James-Harry\\2019-08-27_Flight6_MS\n",
    "    \n",
    "These bands have a 9.3 cm cell resolution and values are stored as 32-bit `float` data type. Instead of using an \"alpha mask\" to represent no data, these bands use a NoData value of -10000.\n",
    "\n",
    "Note that due to the difference in data types, values in the RGB dataset range from 0 to 255, whereas those in the multispectral data are in the range from 0 to 1. The aim of this notebook is to create a single, consistent dataset containing all the bands of interest.\n",
    "\n",
    "## 1. Align and resample\n",
    "\n",
    "Using a 5 cm cell resolution seems like a reasonable compromise between the two datasets. I have manually created a 5 cm \"snap raster\" named `bounding_box.tif`, based on a shapefile with the same name. To code below resamples and aligns all the images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_blue.tif\n",
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_green.tif\n",
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_red.tif\n",
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_nir.tif\n",
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_rededge.tif\n",
      "Processing: /home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_all.tif\n"
     ]
    }
   ],
   "source": [
    "# Snap raster\n",
    "bounding_box = \"/home/jovyan/shared/drones/frisk_oslofjord/raster/bounding_box.tif\"\n",
    "\n",
    "# Multi-spectral images\n",
    "flist = [\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/2019-08-27_Flight6_MS_Reprocess_transparent_reflectance_blue.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/2019-08-27_Flight6_MS_Reprocess_transparent_reflectance_green.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/2019-08-27_Flight6_MS_Reprocess_transparent_reflectance_red.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/2019-08-27_Flight6_MS_Reprocess_transparent_reflectance_nir.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/2019-08-27_Flight6_MS_Reprocess_transparent_reflectance_rededge.tif\",\n",
    "]\n",
    "\n",
    "# Loop over datasets\n",
    "for fpath in flist:\n",
    "    name = os.path.split(fpath)[1].split(\"_\")[-1]\n",
    "    out_path = (\n",
    "        \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_\"\n",
    "        + name\n",
    "    )\n",
    "    print(\"Processing:\", out_path)\n",
    "\n",
    "    # Use rio warp. See https://rasterio.readthedocs.io/en/stable/cli.html#warp\n",
    "    # Syntax: rio warp input.tif output.tif --like template.tif\n",
    "    args = [\n",
    "        \"rio\",\n",
    "        \"warp\",\n",
    "        fpath,\n",
    "        out_path,\n",
    "        \"--like\",\n",
    "        bounding_box,\n",
    "        \"--overwrite\",\n",
    "    ]\n",
    "\n",
    "    subprocess.check_call(args)\n",
    "\n",
    "# RGB images\n",
    "fpath = \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/2019-08-27_Flight9_RGB_transparent_mosaic_group1.tif\"\n",
    "out_path = \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_all.tif\"\n",
    "print(\"Processing:\", out_path)\n",
    "\n",
    "# Use rio warp. See https://rasterio.readthedocs.io/en/stable/cli.html#warp\n",
    "# Syntax: rio warp input.tif output.tif --like template.tif\n",
    "args = [\n",
    "    \"rio\",\n",
    "    \"warp\",\n",
    "    fpath,\n",
    "    out_path,\n",
    "    \"--like\",\n",
    "    bounding_box,\n",
    "    \"--overwrite\",\n",
    "]\n",
    "\n",
    "res = subprocess.check_call(args)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Extract RGB to single bands\n",
    "\n",
    "The code here extracts each of the 4 RGB bands as a single band raster for further processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open multiband raster\n",
    "with rasterio.open(\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_all.tif\"\n",
    ") as src:\n",
    "    meta = src.meta\n",
    "    nbands = src.count\n",
    "\n",
    "    # Write outputs with single band\n",
    "    meta.update(count=1)\n",
    "\n",
    "    # Bands ordered as RGBA\n",
    "    names = [\"red\", \"green\", \"blue\", \"alpha\"]\n",
    "\n",
    "    # Loop over bands\n",
    "    for idx in range(nbands):\n",
    "        print(\"Processing:\", names[idx], \"band\")\n",
    "\n",
    "        out_tif = f\"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_{names[idx]}.tif\"\n",
    "        with rasterio.open(out_tif, \"w\", **meta) as dst:\n",
    "            dst.write_band(1, src.read(idx + 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Bit-depth conversion and NoData handling\n",
    "\n",
    "To keep file sizes manageable, it seems best to work with 8-bit `unit` images (like the RGB data), rather 32-bit `float` images (like the multispectral dataset), which are four times larger. Converting from 32-bit to 8-bit involves some loss of information, which may lead to issues such as lack of \"dynamic range\" in the output images. However, I think 8-bit images should provide sufficient range for the ML algorithms and the savings in data size will be worth it.\n",
    "\n",
    "Converting from 32-bit to 8-bit is simple enough, **except** we need to choose an appropriate NoData value in the range between 0 and 255 (note that we can't use an alpha band here, as the NoData masks are different between the RGB and multispectral datasets). The code below performs a direct conversion from 32-bit to 8-bit, by multiplying the float values by 255 and rounding to the nearest integer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List of 5 cm multispec images\n",
    "flist = [\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_blue.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_green.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_red.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_nir.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_rededge.tif\",\n",
    "]\n",
    "\n",
    "# Loop over images\n",
    "for fpath in flist:\n",
    "    ds = rasterio.open(fpath)\n",
    "    data = ds.read(1)\n",
    "\n",
    "    # Set no data and rescale to range [0, 255]\n",
    "    data[data == -10000] = np.nan\n",
    "    data = np.round(data * 255, decimals=0)\n",
    "    print(np.nanmin(data), np.nanmax(data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on this, it looks possible to use the value `255` to indicate NoData, since it's not needed in the pixel values themselves. Does this also work for the RGB bands?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List of 5 cm rgb images\n",
    "flist = [\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_blue.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_green.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_red.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_alpha.tif\",\n",
    "]\n",
    "\n",
    "# Loop over images\n",
    "for fpath in flist:\n",
    "    ds = rasterio.open(fpath)\n",
    "    data = ds.read(1)\n",
    "    print(np.nanmin(data), np.nanmax(data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pixel values in the RGB data do seem to include both 0 and 255, which may indicate lack of dynamic range. The code below takes a closer look at one of these images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show RGB red band\n",
    "with rasterio.open(\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_red.tif\"\n",
    ") as src:\n",
    "    data = src.read(1)\n",
    "    plt.imshow(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setup plot\n",
    "fig, axes = plt.subplots(ncols=4, nrows=1, figsize=(15, 6))\n",
    "\n",
    "# Loop over images\n",
    "for idx, fpath in enumerate(flist):\n",
    "    name = os.path.split(fpath)[1].split(\"_\")[-1][:-4]\n",
    "    ds = rasterio.open(fpath)\n",
    "    data = ds.read(1)\n",
    "\n",
    "    if name == \"alpha\":\n",
    "        axes[idx].imshow(data)\n",
    "    else:\n",
    "        axes[idx].imshow(data != 0)\n",
    "\n",
    "    axes[idx].set_title(name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code above checks zeros in the three RGB colour bands against the alpha mask (which shows the true location of NoData pixels). It looks as though the value 0 could be used as NoData in these bands without any major issues, since there are only a few cells with values of zero *within* the alpha mask. \n",
    "\n",
    "Based on this, I suggest the following strategy for creating a combined dataset:\n",
    "\n",
    " * Discard the alpha mask from the RGB dataset. Use values of zero in these bands to indicate NoData\n",
    " \n",
    " * For the multispectral data, convert from 32-bit to 8-bit range by multiplying by 255 and rounding to the nearest integer. **Values of zero should be artificially set to one, and zero can then be used as the NoData value here as well**. This is a slight adjustment/fudge of the raw pixel values, but any effects on the subsequent inference process should be negligible"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Build combined dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List of all 5 cm images\n",
    "flist = [\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_red.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_green.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/rgb/flight9_rgb_blue.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_red.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_green.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_blue.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_nir.tif\",\n",
    "    \"/home/jovyan/shared/drones/frisk_oslofjord/raster/multi_spec/flight6_multispec_rededge.tif\",\n",
    "]\n",
    "\n",
    "# Output\n",
    "out_tif = \"/home/jovyan/shared/drones/frisk_oslofjord/raster/ne_akeroya_5cm_all.tif\"\n",
    "\n",
    "# Read metadata of first file\n",
    "with rasterio.open(flist[0]) as src:\n",
    "    meta = src.meta\n",
    "\n",
    "# Update meta to reflect the number of layers in output and NoData value\n",
    "meta.update(count=len(flist), nodata=0)\n",
    "\n",
    "# Read each layer and write it to stack\n",
    "with rasterio.open(out_tif, \"w\", **meta) as dst:\n",
    "    for idx, fpath in enumerate(flist, start=1):\n",
    "        # Get attributes from file name\n",
    "        spec, name = os.path.split(fpath)[1].split(\"_\")[1:]\n",
    "        name = name[:-4]\n",
    "        print(\"Processing:\", spec, name)\n",
    "\n",
    "        with rasterio.open(fpath) as src:\n",
    "            # Read array\n",
    "            data = src.read(1)\n",
    "\n",
    "            if spec == \"multispec\":  # RGB data can be written directly\n",
    "                # Set NoData and round\n",
    "                data[data == -10000] = np.nan\n",
    "                data = np.round(data * 255, decimals=0)\n",
    "\n",
    "                # Artificially adjust genuine zeros to ones\n",
    "                data[data == 0] = 1\n",
    "\n",
    "                # Set NoData as zero\n",
    "                data[np.isnan(data)] = 0\n",
    "\n",
    "                # Convert to uint8\n",
    "                data = data.astype(np.uint8)\n",
    "\n",
    "            # Save to output\n",
    "            dst.write_band(idx, data)\n",
    "\n",
    "            # Add band description\n",
    "            dst.set_band_description(idx, f\"{spec}_{name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Explore final dataset\n",
    "\n",
    "The code below explores and checks the integrated dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Output path\n",
    "out_tif = \"/home/jovyan/shared/drones/frisk_oslofjord/raster/ne_akeroya_5cm_all.tif\"\n",
    "\n",
    "# Read downsampled version for better rendering speed\n",
    "resamp_fac = 4\n",
    "\n",
    "# Print dataset properties\n",
    "with rasterio.open(out_tif) as src:\n",
    "    band_titles = src.descriptions\n",
    "    data = src.read(\n",
    "        out_shape=(src.count, src.height // resamp_fac, src.width // resamp_fac),\n",
    "        resampling=Resampling.average,\n",
    "    )\n",
    "\n",
    "    meta = src.meta\n",
    "    for key in meta.keys():\n",
    "        print(f\"{key:10}:{meta[key]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot all bands using earthpy\n",
    "ep.plot_bands(data, title=band_titles, cbar=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multispectral data looks very dark compared to the RGB data. Let's take a look at histograms for each band."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Histograms of band values\n",
    "ep.hist(data, cols=3, title=band_titles)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is clear from the histograms above that the multispectral data is using only a small portion of the available range compared to the RGB data. Applying a \"linear stretch\" to these data should therefore give a better aesthetic appearance. **This might also help the ML algorithm - consider incorporating this change permanently?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot composite images\n",
    "fig, axes = plt.subplots(ncols=2, nrows=1, figsize=(15, 10))\n",
    "\n",
    "ep.plot_rgb(data, rgb=[0, 1, 2], ax=axes[0], title=\"RGB composite\")\n",
    "\n",
    "ep.plot_rgb(\n",
    "    data,\n",
    "    rgb=[3, 4, 5],\n",
    "    ax=axes[1],\n",
    "    stretch=True,  # Only needed for multispec\n",
    "    title=\"RGB composite from multispectral data\",\n",
    ")\n",
    "\n",
    "plt.tight_layout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}