{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Raster (Grid) Handling\n",
    "\n",
    "This section introduces geospatial analysis of raster (gridded) data with `gdal` and `rasterstats`. Download sample raster datasets from [River Architect](https://github.com/RiverArchitect/SampleData/archive/master.zip). In particular, this section uses GeoTIFF rasters located in [`RiverArchitect/SampleData/01_Conditions/2100_sample/`](https://github.com/RiverArchitect/SampleData/tree/master/01_Conditions/2100_sample).\n",
    "\n",
    "\n",
    "> **Requirements:**\n",
    "> * Make sure to understand [gridded raster data](https://hydro-informatics.com/geopy/geospatial-data.html#raster), in particular, the `.tif` ([`GeoTIFF`](https://hydro-informatics.com/documentation/glossary.html#term-GeoTIFF)) format\n",
    "> * Accomplish the [QGIS tutorial](https://hydro-informatics.com/geopy/use-qgis.html)\n",
    "\n",
    "\n",
    "> **Tips:**\n",
    "> * While OSGeo's `ogr` module is useful for shapefile handling, raster data are best handled by `gdal`.  \n",
    "> * The functions featured in this section are introduced are partially also implemented in [flusstools](https://flusstools.readthedocs.io/).\n",
    "> * To use those functions, make sure flusstools is installed and import it as follows: `from flusstools import geotools`. Some of the functions shown in this tutorial can then be used with `geotools.function_name()`.\n",
    "\n",
    "\n",
    "## Load raster\n",
    "\n",
    "### Open Existing Raster Data \n",
    "\n",
    "Raster data can be opened in Python code as an instance of `gdal.Open(\"FILENAME\")`. The following code block provides a function to open any raster specified with the `file_name` input argument. One of the most important elements when dealing with raster data is the raster band, which takes on a similar data carrier role as `GetLayer` in shapefile handling. To access the raster band, the below-shown `open_raster` function:\n",
    "\n",
    "1. Enables error and warning feedback with `gdal.UseExceptions()` (this step is vital when using gdal).\n",
    "1. Opens the provided raster `file_name` embraced by `try` - `except` statements to inform if and why a potential error occurred while opening the raster.\n",
    "1. Opens the raster band number stated in the optional `band_number` keyword argument with `raster_band = raster.GetRasterBand(band_number)` (the default value is `1`).\n",
    "1. Returns the raster and raster band objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from osgeo import gdal\n",
    "\n",
    "\n",
    "def open_raster(file_name, band_number=1):\n",
    "    \"\"\"\n",
    "    Open a raster file and access its bands\n",
    "    :param file_name: STR of a raster file directory and name\n",
    "    :param band_number: INT of the raster band number to open (default: 1)\n",
    "    :output: osgeo.gdal.Dataset, osgeo.gdal.Band objects\n",
    "    \"\"\"\n",
    "    gdal.UseExceptions()\n",
    "    # open raster file or return None if not accessible\n",
    "    try:\n",
    "        raster = gdal.Open(file_name)\n",
    "    except RuntimeError as e:\n",
    "        print(\"ERROR: Cannot open raster.\")\n",
    "        print(e)\n",
    "        return None\n",
    "    # open raster band or return None if corrupted\n",
    "    try:\n",
    "        raster_band = raster.GetRasterBand(band_number)\n",
    "    except RuntimeError as e:\n",
    "        print(\"ERROR: Cannot access raster band.\")\n",
    "        print(e)\n",
    "        return None\n",
    "    return raster, raster_band"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Tip:** This function is available in flusstools. For instance, use it with `from flusstools import geotools` and `geotools.open_raster(\"path/to/raster.tif\")`.\n",
    "\n",
    "To use the `open_raster` function, call it with a file name as shown in the following code block with the `h001000.tif` raster from the [*River Architect* sample data](https://github.com/RiverArchitect/SampleData/archive/master.zip). The script immediately closes the raster again by overwriting the variable with a `None`-instance to avoid the GeoTIFF file being locked afterward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "file_name = r\"\" + os.getcwd() + \"/geodata/rasters/h001000.tif\"\n",
    "src, depth = open_raster(file_name)\n",
    "print(src)\n",
    "print(depth)\n",
    "depth = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Raster Band Statistics and Toolbox Scripts\n",
    "\n",
    "Once the raster and its band are loaded with the above `open_raster` function, we can access statistical information (e.g., the minimum or the maximum), identify the *no-data* value (i.e., a pre-defined value that is assigned to pixels without value), or the type of units used.\n",
    "\n",
    "Python scripts for processing geospatial data can also be embedded as plugins in GIS desktop applications (e.g., as plugins in QGIS or Toolbox in ArcGIS Pro). To run a Python script in a GIS desktop application, it should be written as a standalone script that can receive input arguments. The interested reader can learn more about implementing plugins in QGIS in the [QGIS docs](https://docs.qgis.org/3.22/en/docs/pyqgis_developer_cookbook/plugins/index.html).\n",
    "\n",
    "> **Note:** QGIS wraps many external algorithm types, which are available through the **QGIS Processing Toolbox**. These algorithms belong, for example, to *SAGA* or *GRASS GIS*.\n",
    "\n",
    "Here we will only write the next code block so that it can be run in a console/terminal application as a standalone script (recall the [instructions for writing standalone script](https://hydro-informatics.com/jupyter/pypckg.html#standalone))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "# make sure to use exceptions\n",
    "gdal.UseExceptions()\n",
    "\n",
    "def how2use():\n",
    "    # provide usage instructions for the script\n",
    "    print(\"\"\"\n",
    "    $ raster_band_info.py [ band number ] input-raster\n",
    "    \"\"\")\n",
    "    # exit program if wrong input arguments provided\n",
    "    sys.exit(1)\n",
    "    \n",
    "\n",
    "def get_color_bands(raster_band):\n",
    "    \"\"\"\n",
    "    :param raster_band: osgeo.gdal.Band object\n",
    "    :output: list of color bands used in raster_band\n",
    "    \"\"\" \n",
    "    \n",
    "    # get ColorTable and return False if None\n",
    "    color_table = raster_band.GetColorTable()\n",
    "    if color_table is None:\n",
    "        print(\"Band has no ColorTable.\")\n",
    "        return None\n",
    "    else:\n",
    "        print(\"Found %i color definitions.\" % int(color_table.GetCount()))\n",
    "\n",
    "    # iterate through color_table and append objects found to color_bands list\n",
    "    color_bands = []\n",
    "    for c in range(0, color_table.GetCount() ):\n",
    "        entry = color_table.GetColorEntry(c)\n",
    "        if not entry:\n",
    "            continue\n",
    "        color_bands.append(str(color_table.GetColorEntryAsRGB(c, entry)))\n",
    "    return color_bands\n",
    "\n",
    "def main(band_number, input_file):\n",
    "    src, band = open_raster(input_file)\n",
    "    print(\"Band minimum: \", band.GetMinimum())\n",
    "    print(\"Band maximum: \", band.GetMaximum())\n",
    "    print(\"No-data value: \", band.GetNoDataValue())\n",
    "    print(\"Band unit type: \", band.GetUnitType())    \n",
    "\n",
    "    try:\n",
    "        print(\", \".join(get_color_bands(band)))\n",
    "    except TypeError:\n",
    "        print(\"ColorTable: None\")\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    # make standalone\n",
    "    if len( sys.argv ) < 3:\n",
    "        print(\"\"\"\n",
    "        ERROR: Provide two arguments:\n",
    "        1) the band number (int) and 2) input raster directory (str)\n",
    "        \"\"\")\n",
    "        how2use()\n",
    "\n",
    "    main(int(sys.argv[1]), str(sys.argv[2]))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run this script, save it as [raster_band_info.py](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/geodata/raster_band_info.py) (e.g., in `C:\\temp` in Windows or `~/temp/` in Linux) and navigate to the script directory in a terminal interface (e.g., in PyCharm's Terminal, Anaconda Prompt, or the Linux Terminal) using the `cd` command. Then run the script to get statistics of the water depth raster `h001000.tif` with (in Windows):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "C:\\temp\\ python raster_band_info.py 1 \"C:\\temp\\geodata\\rasters\\h001000.tif\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On Linux and with the [vflussenv activated](https://hydro-informatics.com/python-basics/pyinstall.html#pip-quick) tap, for example:\n",
    "\n",
    "```\n",
    "python raster_band_info.py 1 \"~/temp/geodata/rasters/h001000.tif\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "Band minimum:  0.0\n",
    "Band maximum:  7.0613012313843\n",
    "No-data value:  -3.4028234663852886e+38\n",
    "Band unit type:\n",
    "Band has no ColorTable.\n",
    "ColorTable: None\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create and Save a Raster (from Array)\n",
    "\n",
    "### Raster Drivers\n",
    "\n",
    "Just like for shapefile files, the appropriate `gdal` driver (analogous to `ogr` drivers) must be loaded to save a raster. To get a full list of `gdal` raster drivers run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver_list = [str(gdal.GetDriver(i).GetDescription()) for i in range(gdal.GetDriverCount())]\n",
    "driver_list.sort()\n",
    "print(\", \".join(driver_list[:]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Raster Data Types \n",
    "\n",
    "The output raster pixels can be one of the following data types (source: [gdal.org/doxygen/](https://gdal.org/doxygen/classGDALDataset.html)):\n",
    "\n",
    "* `GDT_Unknown` Unknown or unspecified type\n",
    "* `GDT_Byte` 8 bit unsigned integer\n",
    "* `GDT_UInt16` 16 bit unsigned integer\n",
    "* `GDT_Int16` 16 bit signed integer\n",
    "* `GDT_UInt32` 32 bit unsigned integer\n",
    "* `GDT_Int32` 32 bit signed integer\n",
    "* `GDT_Float32` 32 bit floating point\n",
    "* `GDT_Float64` 64 bit floating point\n",
    "* `GDT_CInt16` Complex Int16\n",
    "* `GDT_CInt32` Complex Int32\n",
    "* `GDT_CFloat32` Complex Float32\n",
    "* `GDT_CFloat64` Complex Float64 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Raster (Array to Raster) \n",
    "\n",
    "Knowing the basics of raster handling, data types, and Python, we can create a raster from a numeric array. Since a raster is basically a georeferenced array, it is convenient to convert a [numpy array](https://hydro-informatics.com/python-basics/pynum.html#array-matrix-operations) into a raster (band). The function blocks feature the conversion of a *numpy* array into a GeoTIFF raster according to the following workflow:\n",
    "\n",
    "1. Check out the GeoTIFF driver (`driver = gdal.GetDriverByName('GTiff')`).\n",
    "1. Retrieve the array size and (number of rows `rows` and columns `cols`).\n",
    "1. Create a new GeoTIFF raster (`new_raster = driver.Create(file_name, cols, rows, 1, eType=rdtype)`), where \n",
    "    - `file_name` is the directory and name of the new raster file ending on `.tif` (e.g., `\"C:\\\\temp\\\\rasters\\\\new.tif\"`).\n",
    "    - `cols`, `rows` represent the array shape, and `eType` is the geospatial data type (see above list)\n",
    "1. Set the geographic origin, which is stored in the `origin` (*tuple*) parameter and defines the `pixel_width` and `pixel_height` (pixel units defined with `srs` - see below).\n",
    "1. Replace `np.nan` values in the numpy array with `nan_value`.\n",
    "1. Instantiate a `band` object, set the `NoDataValue` to `nan_value`, and write the array to the `band`.\n",
    "1. Create a spatial reference system object  (`srs`) as a function of the `epsg` input parameter and export it to WKT format.\n",
    "1. Release the raster (flush from cache).\n",
    "\n",
    "> **Note:** The units defined with the `epsg` number drive the pixel size, where `pixel_width` and `pixel_height` are multipliers of that unit. In the case of `epsg=3857`, the unit is `meters` and `pixel_width=10` combined with `pixel_height=20` creates 10-m wide and 20-m high pixels. In the case of `epsg=4326`, the unit is (geographic) `degrees` and 1 degree by 1-degree pixels can have the size of a count(r)y."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from osgeo import osr\n",
    "\n",
    "\n",
    "def create_raster(file_name, raster_array, origin=None, epsg=4326, pixel_width=10, pixel_height=10,\n",
    "                  nan_value=-9999.0, rdtype=gdal.GDT_Float32, geo_info=False):\n",
    "    \"\"\"\n",
    "    Convert a numpy.array to a GeoTIFF raster with the following parameters\n",
    "    :param file_name: STR of target file name, including directory; must end on \".tif\"\n",
    "    :param raster_array: np.array of values to rasterize\n",
    "    :param origin: TUPLE of (x, y) origin coordinates\n",
    "    :param epsg: INT of EPSG:XXXX projection to use - default=4326\n",
    "    :param pixel_height: INT of pixel height (multiple of unit defined with the EPSG number) - default=10m\n",
    "    :param pixel_width: INT of pixel width (multiple of unit defined with the EPSG number) - default=10m\n",
    "    :param nan_value: INT/FLOAT no-data value to be used in the raster (replaces non-numeric and np.nan in array)\n",
    "                        default=-9999.0\n",
    "    :param rdtype: gdal.GDALDataType raster data type - default=gdal.GDT_Float32 (32 bit floating point)\n",
    "    :param geo_info: TUPLE defining a gdal.DataSet.GetGeoTransform object (supersedes origin, pixel_width, pixel_height)\n",
    "                        default=False\n",
    "    \"\"\"\n",
    "    # check out driver\n",
    "    driver = gdal.GetDriverByName('GTiff')\n",
    "\n",
    "    # create raster dataset with number of cols and rows of the input array\n",
    "    cols = raster_array.shape[1]\n",
    "    rows = raster_array.shape[0]\n",
    "    new_raster = driver.Create(file_name, cols, rows, 1, eType=rdtype)    \n",
    "\n",
    "    # apply geo-origin and pixel dimensions\n",
    "    if not geo_info:\n",
    "        origin_x = origin[0]\n",
    "        origin_y = origin[1]\n",
    "        new_raster.SetGeoTransform((origin_x, pixel_width, 0, origin_y, 0, pixel_height))\n",
    "    else:\n",
    "        new_raster.SetGeoTransform(geo_info)\n",
    "    \n",
    "    # replace np.nan values\n",
    "    raster_array[np.isnan(raster_array)] = nan_value\n",
    "\n",
    "    # retrieve band number 1\n",
    "    band = new_raster.GetRasterBand(1)\n",
    "    band.SetNoDataValue(nan_value)\n",
    "    band.WriteArray(raster_array)\n",
    "    band.SetScale(1.0)\n",
    "\n",
    "    # create projection and assign to raster\n",
    "    srs = osr.SpatialReference()\n",
    "    srs.ImportFromEPSG(epsg)\n",
    "    new_raster.SetProjection(srs.ExportToWkt())\n",
    "\n",
    "    # release raster band\n",
    "    band.FlushCache()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To call the function for writing a random numpy array, we can now use the `create_raster()` function (also available from `geotools.create_raster()`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "# set the name of the output GeoTIFF raster\n",
    "raster_name = r\"\" + os.getcwd() + \"/geodata/rasters/random_unis_dem.tif\"\n",
    "# create a random numpy array (DEM-like values) - can be replaced with any other numpy.array\n",
    "unis_dem = np.random.rand(300, 300) + 455.0\n",
    "# overwrite one pixel with np.nan\n",
    "unis_dem[5, 7] = np.nan\n",
    "# define a raster origin in EPSG:3857\n",
    "raster_origin = (1013428.396233, 6231555.006177)\n",
    "# call create_raster to create a 1-m-resolution raster in EPSG:4326 projection\n",
    "create_raster(raster_name, unis_dem, raster_origin,  pixel_width=1,  pixel_height=1, epsg=3857) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-ras-unis.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Raster Calculus (Raster / Band to Array) \n",
    "\n",
    "The procedure described with the `create_raster()` function can be used vice-versa, to create [numpy array](https://hydro-informatics.com/python-basics/pynum.html#array-matrix-operations) from raster bands. The raster converted to a numpy array enables algebraic or other logical operations to be applied to existing raster data.\n",
    "\n",
    "Need an example? In the *RiverArchitect SampleData*, the units of the water depth raster `h001000.tif` are in U.S. customary feet and the units of the flow velocity raster `u001000.tif` are in feet per second. However, to calculate the [Froude number](https://hydro-informatics.com/documentation/glossary.html#term-Froude-number) (involves the gravity constant) for every pixel based on the two rasters (water depth and flow velocity), it is convenient to convert both rasters into m and m/s, respectively. For this purpose, the following code block features another re-usable, custom function that loads a raster as an array and overwrites `NoDataValues` with `np.nan` (`raster` and `band` can be instantiated with the above `open_raster` function):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def raster2array(file_name, band_number=1):\n",
    "    \"\"\"\n",
    "    :param file_name: STR of target file name, including directory; must end on \".tif\"\n",
    "    :param band_number: INT of the raster band number to open (default: 1)\n",
    "    :output: (1) ndarray() of the indicated raster band, where no-data values are replaced with np.nan\n",
    "             (2) the GeoTransformation used in the original raster\n",
    "    \"\"\"\n",
    "    # open the raster and band (see above)\n",
    "    raster, band = open_raster(file_name, band_number=band_number)\n",
    "    # read array data from band\n",
    "    band_array = band.ReadAsArray()\n",
    "    # overwrite NoDataValues with np.nan\n",
    "    band_array = np.where(band_array == band.GetNoDataValue(), np.nan, band_array)\n",
    "    # return the array and GeoTransformation used in the original raster\n",
    "    return raster, band_array, raster.GetGeoTransform()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `raster2array()` function is also included in flusstools: `geotools.raster2array()`\n",
    "\n",
    "> **Challenge:** The `raster2array` function returns a tuple, where `output[0]` corresponds to the array and `output[1]` is the geo-transformation. Can you optimize the way how this information is returned?\n",
    "\n",
    "Finally, to create a Froude number GeoTIFF raster, the following code block makes use of the `raster2array` function for converting the water depth and flow velocity GeoTIFF rasters into a numpy array, performing simple algebraic calculations to convert the rasters to m and m/s, respectively, and save the resulting GeoTIFF file. In detail, the workflow involves to: \n",
    "\n",
    "* Define the input raster file names with directories (`h_file` and `u_file`),\n",
    "* Load original rasters as `ndarray` with the `raster2array()` function and get the original `GeoTransform` description,\n",
    "* Convert all values from U.S. customary feet to S.I. metric units (recall the [`feet_to_meter()`](https://hydro-informatics.com/jupyter/pyfun.html#kwargs) function from the Python basics), and\n",
    "* Save a new copy of the raster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "h_file = r\"\" + os.getcwd() + \"/geodata/rasters/h001000.tif\"\n",
    "u_file = r\"\" + os.getcwd() + \"/geodata/rasters/u001000.tif\"\n",
    "\n",
    "# load both rasters as arrays\n",
    "h_ras, h, h_geo_info = raster2array(h_file)\n",
    "u_ras, u, u_geo_info = raster2array(u_file)\n",
    "\n",
    "#convert to metric system\n",
    "h *= 0.3048\n",
    "u *= 0.3048\n",
    "\n",
    "# calculate the Froude number as array and avoid zero-division warning messages\n",
    "with np.errstate(divide=\"ignore\", invalid=\"ignore\"):\n",
    "    Froude = u / np.sqrt(h * 9.81)\n",
    "\n",
    "# create Froude raster from array\n",
    "create_raster(file_name= r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/Fr1000cfs.tif\",\n",
    "              raster_array=Froude, epsg=6418, geo_info=h_geo_info)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-py-fr.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reproject a Raster \n",
    "\n",
    "The transformation (and reprojection) of a raster into another coordinate system involves rotating, shifting, and shearing pixels. If one of these operations is skipped, the reprojected raster may be squeezed, twisted, or placed anywhere in the world but not where it should be placed. Therefore, the approach for the reprojection of a raster into another coordinate reference system implies the following steps:\n",
    "\n",
    "1. Retrieve the source and target spatial reference systems (e.g., derive from a `gdal.Dataset` or an `EPSG` authority code).\n",
    "1. Read the geo transformation of the source dataset ( `gdal.Dataset.GetGeoTransform()`).\n",
    "1. Derive the number of pixels and the spacing between pixels in the new (reprojected) dataset.\n",
    "1. Instantiate the new (reprojected) dataset.\n",
    "1. Project an image of the source dataset onto the new (reprojected) dataset (`gdal.ReprojectImage()`).\n",
    "\n",
    "The spatial reference system can be derived from a dataset with the explanations in the [shapefile](https://hydro-informatics.com/jupyter/geo-shp.html#reproject-shp) section by writing a `get_srs()` function. The following code block shows the `get_srs()` function (uses the `osr` library from `osgeo` / `gdal` ), which is also integrated in [flusstools](https://flusstools.readthedocs.io/en/latest/geotools.html#module-flusstools.geotools.srs_mgmt) (`geotools.get_srs()`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_srs(dataset):\n",
    "    \"\"\"\n",
    "    Get the spatial reference of any gdal.Dataset\n",
    "    :param dataset: osgeo.gdal.Dataset (raster)\n",
    "    :output: osr.SpatialReference\n",
    "    \"\"\"\n",
    "    sr = osr.SpatialReference()\n",
    "    sr.ImportFromWkt(dataset.GetProjection())\n",
    "    # auto-detect epsg\n",
    "    auto_detect = sr.AutoIdentifyEPSG()\n",
    "    if auto_detect != 0:\n",
    "        sr = sr.FindMatches()[0][0]  # Find matches returns list of tuple of SpatialReferences\n",
    "        sr.AutoIdentifyEPSG()\n",
    "    # assign input SpatialReference\n",
    "    sr.ImportFromEPSG(int(sr.GetAuthorityCode(None)))\n",
    "    return sr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the `open_raster()` and `get_srs()` functions, we have all necessary ingredients to accomplish the raster reprojection workflow in another funcation called `reproject_raster()`. An additional feature of the function is that it ensures the correct use of `osr.CoordinateTransformation`, which behaves differently under `gdal` 3.0 compared with older `gdal` versions ([read on OSGeo's GitHub page](https://github.com/OSGeo/gdal/issues/1546))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reproject_raster(source_dataset, source_srs, target_srs):\n",
    "    \"\"\"\n",
    "    Reproject a raster dataset (preferably use through reproject function)\n",
    "    :param source_dataset: osgeo.ogr.DataSource (instantiate with ogr.Open(SHP-FILE))\n",
    "    :param source_srs: osgeo.osr.SpatialReference (instantiate with get_srs(source_dataset))\n",
    "    :param target_srs: osgeo.osr.SpatialReference (instantiate with get_srs(DATASET-WITH-TARGET-PROJECTION))\n",
    "    \"\"\"\n",
    "    # READ THE SOURCE GEO TRANSFORMATION (ORIGIN_X, PIXEL_WIDTH, 0, ORIGIN_Y, 0, PIXEL_HEIGHT)\n",
    "    src_geo_transform = source_dataset.GetGeoTransform()\n",
    "    \n",
    "    # DERIVE PIXEL AND RASTER SIZE\n",
    "    pixel_width = src_geo_transform[1]\n",
    "    x_size = source_dataset.RasterXSize\n",
    "    y_size = source_dataset.RasterYSize\n",
    "\n",
    "    # ensure that TransformPoint (later) uses (x, y) instead of (y, x) with gdal version >= 3.0\n",
    "    source_srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)\n",
    "    target_srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)\n",
    "\n",
    "    # get CoordinateTransformation\n",
    "    coord_trans = osr.CoordinateTransformation(source_srs, target_srs)\n",
    "\n",
    "    # get boundaries of reprojected (new) dataset\n",
    "    (org_x, org_y, org_z) = coord_trans.TransformPoint(src_geo_transform[0], src_geo_transform[3])\n",
    "    (max_x, min_y, new_z) = coord_trans.TransformPoint(src_geo_transform[0] + src_geo_transform[1] * x_size,\n",
    "                                                       src_geo_transform[3] + src_geo_transform[5] * y_size,)\n",
    "\n",
    "    # INSTANTIATE NEW (REPROJECTED) IN-MEMORY DATASET AS A FUNCTION OF THE RASTER SIZE\n",
    "    mem_driver = gdal.GetDriverByName('MEM')\n",
    "    tar_dataset = mem_driver.Create(\"\",\n",
    "                                    int((max_x - org_x) / pixel_width),\n",
    "                                    int((org_y - min_y) / pixel_width),\n",
    "                                    1, gdal.GDT_Float32)\n",
    "    # create new GeoTransformation\n",
    "    new_geo_transformation = (org_x, pixel_width, src_geo_transform[2],\n",
    "                              org_y, src_geo_transform[4], -pixel_width)\n",
    "\n",
    "    # assign the new GeoTransformation to the target dataset\n",
    "    tar_dataset.SetGeoTransform(new_geo_transformation)\n",
    "    tar_dataset.SetProjection(target_srs.ExportToWkt())\n",
    "\n",
    "    # PROJECT THE SOURCE RASTER ONTO THE NEW REPROJECTED RASTER\n",
    "    rep = gdal.ReprojectImage(source_dataset, tar_dataset,\n",
    "                              source_srs.ExportToWkt(), target_srs.ExportToWkt(),\n",
    "                              gdal.GRA_Bilinear)\n",
    "    return tar_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the `reproject_raster` function in a Python script requires a source dataset and another (orientation) dataset with the new coordinate system into which the source dataset will be projected. The following example shows how to re-project the above-created Froude-number raster into the `EPSG=3857` coordinate system for viewing it in QGIS on a *Google Satellite* basemap ([recall basemaps in QGIS tutorial](https://hydro-informatics.com/geopy/use-qgis.html#basemap)). As orientation data set, use [web_frame.tif](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/geodata/rasters/web_frame.tif), which was created on the *Google Satellite* basemap.\n",
    "\n",
    "With the `get_srs()` function that automatically detects the raster projection and spatial reference system, we can use the `create_raster()` function to reproject the above-created `Fr1000cfs.tif` raster (e.g., to `epsg=4326`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load original and orientation rasters\n",
    "source_file_name = r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/Fr1000cfs.tif\"\n",
    "orientation_file_name = r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/web_frame.tif\"\n",
    "\n",
    "src_dataset, src_band = open_raster(source_file_name)\n",
    "ort_dataset, ort_band = open_raster(orientation_file_name)\n",
    "\n",
    "src_srs = get_srs(src_dataset)\n",
    "new_srs = get_srs(ort_dataset)\n",
    "\n",
    "print(\"Source EPSG: \" + str(src_srs.GetAuthorityCode(None)))\n",
    "print(\"Target EPSG: \" + str(new_srs.GetAuthorityCode(None)))\n",
    "\n",
    "# flush orientation dataset\n",
    "ort_dataset, ort_band = None, None\n",
    "\n",
    "# create re-projected raster and save as GeoTIFF\n",
    "reproj_dataset = reproject_raster(src_dataset, src_srs, new_srs)\n",
    "reproj_file_name = r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/Fr1000cfs_reproj.tif\"\n",
    "array_data = reproj_dataset.ReadAsArray()\n",
    "new_epsg = int(new_srs.GetAuthorityCode(None))\n",
    "geo_transformation = reproj_dataset.GetGeoTransform()\n",
    "create_raster(reproj_file_name, raster_array=array_data, epsg=new_epsg, geo_info=geo_transformation)\n",
    "reproj_dataset = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plotted in QGIS, the reprojected Froude-number raster looks like this:\n",
    "\n",
    "![img](https://hydro-informatics.com/_images/qgis-reproj-Froude.png)\n",
    "\n",
    "The `reproject_raster` function is also available (slightly modified) in [flusstools](https://flusstools.readthedocs.io/en/latest/geotools.html#module-flusstools.geotools.srs_mgmt) (`geotools.reproject_raster()`), where saving the new reprojected raster is embedded in the function (automatically appends the syllable `\"_epsg[NO]\"` to the original file name).\n",
    "\n",
    "> **Note:** To display multiple rasters with different coordinate systems on the same map in QGIS, the coordinate systems must be harmonized in most cases. While QGIS will automatically propose to convert a raster with a non-project-compliant CRS to the project CRS, it also has a dedicated function for adjusting raster coordinate systems: In QGIS, click on the **Raster** menu > **Projections** > **Warp (Reproject)...**. Select the raster(s) to reproject (i.e., the raster(s) to harmonize with the project coordinate system). However, *Warp* may not perform all reprojection steps as desired and lead to wrong placements of the new raster. The *Warp* method is also available in Python through `gdal.Warp` ([read the docs](https://gdal.org/tutorials/warp_tut.html)): <br><br>`kwargs={'format': 'GTiff', 'geoloc': True}`<br>`gdal.Warp(TARGET_GEO_TIFF_FILE_NAME, SOURCE_GEO_TIFF_FILE_NAME, **kwargs)`\n",
    "\n",
    "\n",
    "> **The coordinate transformation fails** when no transformation between the indicated source and target spatial reference system can be established (i.e., `gdal` does not know the transformation). This problem occurs often when old, regional coordinate systems are transformed into coordinate systems for web applications (e.g., `EPSG=3857`). Read more in the [gdal docs](https://gdal.org/tutorials/osr_api_tut.html#coordinate-transformation).`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Zonal Statistics for Morphological Units\n",
    "\n",
    "In hydraulic and geospatial analyses, the question of statistical values of certain areas of one or more rasters often arises. For instance, we may be interested in mean values and standard deviations in specific zones of a water body. For this purpose, *Zonal statistics* enable the delineation of an area of a raster by using a polygon shapefile.\n",
    "\n",
    "The *River Architect* dataset includes a slackwater zone and zonal statistics help to identify the mean water depth and flow velocity of slackwaters, which are a so-called morphological unit ([recall the create-shapefile section](https://hydro-informatics.com/jupyter/geo-shp.html#create-shp)).\n",
    "\n",
    "> **Background:** Instream morphological units aid in describing the geospatial organization of fluvial landforms, which play an important role in ecohydraulic analyses and river restoration. For instance, *pool* units describe deep water zones with low flow velocity, *riffle* units are typically characterized by shallow water depths and high velocity, and *slackwater* units are shallow flow zones with low flow velocity (many juvenile fish love slackwaters). [Wyrick and Pasternack (2014)](https://www.sciencedirect.com/science/article/pii/S0169555X14000099) introduce the delineation of morphological units and an open-access summary can be found in the [Appendix Sect. 5 in Schwindt et al. (2020)](https://ars.els-cdn.com/content/image/1-s2.0-S235271101930281X-mmc1.pdf).\n",
    "\n",
    "To analyze a visually apparent slackwater unit, we can draw a polygon in a new shapefile that delineates morphological units. The following figures guide through the creation of a polygon shapefile and the delineation of the riffle with QGIS. Start with opening QGIS and create a new project. Import the water depth and flow velocity rasters showing the slow and shallow water zone. Then follow the workflow in the illustrations below.\n",
    "\n",
    "![img](https://hydro-informatics.com/_images/qgis-create-shp.png)\n",
    "![img](https://hydro-informatics.com/_images/qgis-new-shp.png)\n",
    "![img](https://hydro-informatics.com/_images/qgis-toggle-editing.png)\n",
    "![img](https://hydro-informatics.com/_images/qgis-draw-polygon.png)\n",
    "\n",
    "Finalize the drawing by clicking on the **Save Edits** disk button (between **Toggle Editing** and **Add Polygon**). Just in case, the slackwater delineation polygon shapefile is also available at [the jupyter-python repository](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/geodata/shapefiles/slackwater-poly.zip) of this eBook. \n",
    "\n",
    "\n",
    "> **Recall:** The new polygon is not saved as long as the edits are not saved. That means: Regularly saving edits when drawing features in QGIS.\n",
    "\n",
    "\n",
    "Zonal statistics can be calculated using the `gdal` and `ogr` libraries, but this is a little cumbersome. The [rasterio](https://rasterio.readthedocs.io/en/latest/) (`conda install -c conda-forge rasterio`) library provides a much more convenient method to calculate zonal statistics with its `rasterstats.zonal_stats(SHP-FILE, RASTER, STATSTICS-TYPES)` method. With `zonal_stats`, we can easily obtain statistics of the water depth and flow velocity rasters in the limits of the new slackwater polygon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import rasterstats as rs\n",
    "# make file names\n",
    "h_file = r\"\" + os.getcwd() + \"/geodata/rasters/h001000.tif\"\n",
    "u_file = r\"\" + os.getcwd() + \"/geodata/rasters/u001000.tif\"\n",
    "zone = r\"\" + os.getcwd() + \"/geodata/shapefiles/slackw-poly.shp\"\n",
    "\n",
    "# get water depth stats in zone\n",
    "h_stats = rs.zonal_stats(zone, h_file, stats=[\"min\", \"max\", \"median\", \"majority\", \"sum\"])\n",
    "# get flow velocity stats in zone - note the different stats assignment\n",
    "u_stats = rs.zonal_stats(zone, u_file, stats=\"min max median majority sum\")\n",
    "\n",
    "print(h_stats)\n",
    "print(u_stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recall that both rasters are in the U.S. customary unit system (i.e., feet and feet per second). More statistics can be calculated with `zonal_stats`: <br>\n",
    "`min`, `max`, `mean`, `count`, `sum`, `std`, `median`, `majority`, `minority`, `unique`, `range`, `nodata`, `percentile_<q>` (where `<q>` can be any float number between 0 and 100).\n",
    "\n",
    "In addition, user-defined statistics can be added, where the [numpy.ma](https://numpy.org/doc/stable/reference/routines.ma.html#masked-arrays-arithmetics) module is particularly useful with its array handling capacities, such as for transposing or specifying statistics along an axis. For instance, we can define a specific function to calculate the standard deviation as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    " def raster_std(raster_array):\n",
    "        return np.ma.std(raster_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use the `raster_std` function in `zonal_stats`, write something similar to this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u_stats = rs.zonal_stats(\n",
    "    zone, u_file,\n",
    "    stats=\"min max\",\n",
    "    add_stats={\"stdev\": raster_std}\n",
    ")\n",
    "print(u_stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clip a Raster\n",
    "The above-introduced `rasterstats.zonal_stats` method works with *\"Mini-Rasters\"*, which represent clips of the input raster to the user-defined polygon shapefile. Moreover, a mini-raster itself can be obtained by defining the optional keyword argument `raster_out=True`. In the case that we want to get the original raster clipped without and statistical operation, we can use a little trick by defining an additional statistics function that returns the original array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    " def original(raster_array):\n",
    "        return raster_array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `raster_out=True` and the `original()` function, we can retrieve the clipped raster in the following array types:\n",
    "\n",
    "* `mini_raster_array` - a clipped and masked numpy array,\n",
    "* `mini_raster_affine` - a transformation as an `Affine` object (i.e., with [affine 6-parameter transformation](https://gdal.org/user/raster_data_model.html#affine-geotransform)), and\n",
    "* `mini_raster_nodata` - `NoData` values.\n",
    "\n",
    "The following code block illustrates the usage for retrieving a numpy array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import rasterstats as rs\n",
    "\n",
    "h_file = r\"\" + os.getcwd() + \"/geodata/rasters/h001000.tif\"\n",
    "h_stats = rs.zonal_stats(zone, h_file, stats=\"count\",\n",
    "                         add_stats={\"original\": original},\n",
    "                         raster_out=True)\n",
    "print(h_stats[0].keys())\n",
    "print(h_stats[0][\"mini_raster_array\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Tip:** Use the above-shown methods to assign a projection and save the clipped array as a GeoTIFF raster. The functions are also implemented in [flusstools] (more precisely, in the script [`flusstools.geotools.raster_mgmt.create_raster`](https://flusstools.readthedocs.io/en/latest/geotools.html#module-flusstools.geotools.raster_mgmt, which can be used as `geotools.raster_mgmt()`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Slope / Aspect Maps and Built-in Command Line Scripts\n",
    "\n",
    "Hillslope maps are an important parameter in hydraulics, hydrology, and ecology. For instance, the slope determines the flow direction of water and it is also a criterion for delineating the habitat of many species. To calculate hill slope gradients and directions, `gdal` has a command-line tool called `gdaldem` , which requires a DEM (Digital Elevation Model) raster. The general use of `gdaldem` in the command line is (arguments in brackets are optional): "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "gdaldem slope input_dem output_slope_map  [-p use percent slope (default=degrees)] [-s scale* (default=1)] [-alg ZevenbergenThorne] [-compute_edges] [-b Band (default=1)] [-of format] [-co \"NAME=VALUE\"]* [-q]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To call the command-line tool, we can either use a terminal/command prompt or Python's standard library `subprocess`. The following code block illustrates the usage of the `gdaldem` command line tool through [`subprocess.call()`](https://docs.python.org/3/library/subprocess.html) to create a slope raster (in percent) from the *River Architect* sample data's [`dem.tif`](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/geodata/rasters/dem.tif). Note that `subprocess.call()` returns `0` if the command execution was successful and any other return value indicates an error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess, os\n",
    "\n",
    "cmd_create_slope = \"gdaldem slope {0}/geodata/rasters/dem.tif {0}/geodata/rasters/slope-percent.tif -p\".format(os.path.abspath(\"\"))\n",
    "subprocess.call(cmd_create_slope, shell=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-slope.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to the absolute slope (i.e., inclination in degrees), or instead of the slope, it can be important to know the slope direction (e.g., inclination toward south, west, east, or north). A raster indicating the slope direction is called **aspect raster**, where south corresponds to 0° (and 360°), west to 90°, north to 180°, and east to 270°. An aspect raster can also be created with `gdaldem`:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "gdaldem aspect input_dem output_aspect_map [-trigonometric] [-zero_for_flat] [-alg ZevenbergenThorne] [-compute_edges] [-b Band (default=1)] [-of format] [-co \"NAME=VALUE\"]* [-q]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create an aspect raster of the *River Architect* sample data DEM run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmd_create_aspect = \"gdaldem aspect {0}/geodata/rasters/dem.tif {0}/geodata/rasters/slope-aspect.tif\".format(os.path.abspath(\"\"))\n",
    "subprocess.call(cmd_create_aspect, shell=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-aspect.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Least Cost Path between Pixels (and another Way of Reprojection)\n",
    "\n",
    "### Ecohydraulic Background\n",
    "Least cost paths are important to plan efficient routes for navigation (e.g., in a car) and they can also be helpful in ecohydraulics. Let's take for a moment the position of a fish that, after a flood with decreasing discharge, wants to swim as fast as possible from the floodplain back into the main channel where there is enough water. In the figure below, point 1 shows the starting point on the floodplain and point 2 the destination in the main channel. The reddish background represents the above-produced slope raster (*slope-percent.tif*) and the water depth at average annual discharge is colored in blue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-slope-pts.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Naturally, the path of least costs corresponds to the path of the steepest, monotonously downward-facing slope and we will assume that a fish is able to find it.\n",
    "\n",
    "> **A note on hydro-peaking:** In many rivers, fish face the daily challenge of escaping from the deadly trap of lateral depressions. The reason for this is that many hydroelectric power plants cause abrupt fluctuations in discharge due to fluctuations in energy demand and production (so-called **hydro-peaking**). As a result, there is a high stranding risk for fish in many regulated rivers that are subjected to hydro-peaking. The following figure illustrates stranding risk zones as a function of discharge (in cubic feet per second) at the lower Yuba River (California, USA).\n",
    "\n",
    "![img](https://hydro-informatics.com/_images/ra-stranding.png)\n",
    "\n",
    "*Image source: [The River Architect Wiki / Kenneth Larrieu](https://riverarchitect.github.io/RA_wiki/StrandingRisk)*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Functions and Libraries Involved \n",
    "\n",
    "The `skimage` (`scikit-image`) library (see [Other packages in the Open source libraries section](https://hydro-informatics.com/geopy/geo-pckg.html#other-geo-pckgs)) provides with [`skimage.graph.route_through_array`](https://scikit-image.org/docs/0.19.x/api/skimage.graph.html) a smart method to calculate the least cost path by summing up pixel-wise connections from point 1 to point 2.\n",
    "\n",
    "Here is how it works: Assume a numpy array (e.g., with random slope values) that looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "slope_image = np.random.randint(100, size=(3, 5))\n",
    "slope_image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find the fastest way from array index `[0][0]` (top left  `point_1 = (0, 0)`) to array index `[2][4]` (bottom right point `point_2 = (2, 4)`), we can use `route_through_array()` to get a *list* (`least_cost_path_indices`) with the array coordinates of the path to go, and the costs (`weight`) involved (sum of all pixels of the least cost path):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from skimage.graph import route_through_array\n",
    "\n",
    "point_1 = (0, 0)\n",
    "point_2 = (2, 4)\n",
    "least_cost_path_indices, weight = route_through_array(slope_image, point_1, point_2)\n",
    "least_cost_path_indices, weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To integrate the least cost path list into an array that we can *rasterize* (`geotools.create_raster()`), we can paste `least_cost_path_indices` into a numpy-zeros array of the original slope raster (image) as a transposed list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "least_cost_path_indices = np.array(least_cost_path_indices).T\n",
    "least_cost_path_array = np.zeros_like(slope_image)\n",
    "least_cost_path_array[least_cost_path_indices[0], least_cost_path_indices[1]] = 1\n",
    "least_cost_path_array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In practice, the slope raster is georeferenced, and therefore, we have to use pixel coordinates relative to the coordinate system origin. For this purpose, we need two more functions:\n",
    "\n",
    "* One function to calculate the pixel-index related offset that we will name `coords2offset`: The `coords2offset()` function will return the x-y shift in the form of \"number of pixels\" (two *integer*s, one for *x* and one for *y* shift). \n",
    "* The [above-defined `get_srs()`](https://hydro-informatics.com/jupyter/geo-raster.html#reproject-raster) function (i.e., `geotools.get_srs()`).\n",
    "\n",
    "The `coords2offset()` function looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def coords2offset(geo_transform, x_coord, y_coord):\n",
    "    \"\"\"\n",
    "    Returns x-y pixel offset\n",
    "    :param geo_transform: osgeo.gdal.Dataset.GetGeoTransform() object\n",
    "    :param x_coord: FLOAT of x-coordinate\n",
    "    :param y_coord: FLOAT of y-coordinate\n",
    "    :return: offset_x, offset_y (both integer of pixel numbers)\n",
    "    \"\"\"\n",
    "    origin_x = geo_transform[0]\n",
    "    origin_y = geo_transform[3]\n",
    "    pixel_width = geo_transform[1]\n",
    "    pixel_height = geo_transform[5]\n",
    "    offset_x = int((x_coord - origin_x) / pixel_width)\n",
    "    offset_y = int((y_coord - origin_y) / pixel_height)\n",
    "    return offset_x, offset_y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Tip:** The `coords2offset()` function is also available in flusstools with more robust raise-exception notation: use with `geotools.coords2offset()` and have a look into the script located in [`flusstools.geotools.dataset_mgmt`](https://raw.githubusercontent.com/Ecohydraulics/flusstools-pckg/main/flusstools/geotools/dataset_mgmt.py)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `coords2offset()` function converts a raster array (e.g., produced with the above-defined `geotools.raster2array()` function) into an array that can be used with `route_through_array()` with the following workflow:\n",
    "\n",
    "1. Use the raster's `geo_transform` (`gdal.Dataset.GetGeoTransform = (origin_x, pixel_width, 0, origin_y, 0, pixel_height)`) and the start and end point coordinates (i.e, `start_coord` of point 1 and `stop_coord` of point 2) in `coords2offset()` to get their pixel indices (`start_index_x`, `start_index_y`, `stop_index_x`, and `stop_index_y`) in the raster array.\n",
    "1. Replace `np.nan` values in the raster array with values that are higher than the maximum value of the array. Do not use zeros, because we want to exclude the `np.nan` pixels from the least cost path later by overwriting `np.nan` with very high pixel costs.\n",
    "1. Use `route_through_array()` as above explained with the optional arguments `geometric=True` (use the [*MCP_Geometric* class](https://scikit-image.org/docs/0.19.x/api/skimage.graph.html#skimage.graph.MCP_Geometric) rather than [*MCP_base*](https://scikit-image.org/docs/0.19.x/api/skimage.graph.html#skimage.graph.MCP) to calculate costs) and `fully_connected=True` (enables using diagonal pixels as direct neighbors).\n",
    "1. Integrate the least cost path list (`index_path`) into a numpy-zeros array (child of `raster_array`, as explained above) and return the `path_array`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_path_array(raster_array, geo_transform, start_coord, stop_coord):\n",
    "    # transform coordinates to array index\n",
    "    start_index_x, start_index_y = coords2offset(geo_transform, start_coord[0], start_coord[1])\n",
    "    stop_index_x, stop_index_y = coords2offset(geo_transform, stop_coord[0], stop_coord[1])\n",
    "\n",
    "    # replace np.nan with max raised by an order of magnitude to exclude pixels from least cost\n",
    "    raster_array[np.isnan(raster_array)] = np.nanmax(raster_array) * 10\n",
    "\n",
    "    # create path and costs\n",
    "    index_path, cost = route_through_array(raster_array, (start_index_y, start_index_x),\n",
    "                                               (stop_index_y, stop_index_x),\n",
    "                                               geometric=True, fully_connected=True)\n",
    "\n",
    "\n",
    "    index_path = np.array(index_path).T\n",
    "    path_array = np.zeros_like(raster_array)\n",
    "    path_array[index_path[0], index_path[1]] = 1\n",
    "    return path_array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Application\n",
    "\n",
    "Recall, we defined the following functions (all are available in [flusstools](https://flusstools.readthedocs.io) through `from flusstools import geotools`) that we can use for the calculation of the least cost path to get from point 1 to point 2 in the *slope-percent.tif* raster:\n",
    "\n",
    "* `geotools.raster2array()` \n",
    "* `geotools.create_path_array()`\n",
    "* `geotools.get_srs()`\n",
    "* `geotools.create_raster()`\n",
    "\n",
    "The below code block uses these functions as follows:\n",
    "\n",
    "1. Define input (*slope-percent.tif*) and output (*least_cost.tif*) raster names (with directories).\n",
    "1. Define the coordinates of points 1 and 2 as *tuple*s (x, y) in the *EPSG:6418* projection.\n",
    "1. Load the input raster(`src_raster`), its band as array (`raster_array`), and geotransformation (`geo_transform`) with the `raster2array()` function.\n",
    "1. Get the least cost path indicated with ones in an array of zeros ( i.e., an on-off `path_array`) with the `create_path_array()` function.\n",
    "1. Get the `osgeo.osr.SpatialReference` of the input raster (`src_raster = osgeo.gdal.Dataset(\"slope-percent.tif\")`).\n",
    "1. Create the least cost path GeoTIFF raster with the `create_raster()` function as a `gdal.GDT_Byte` band."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from skimage.graph import route_through_array\n",
    "\n",
    "# define raster input and out names\n",
    "in_raster_name = r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/slope-percent.tif\"\n",
    "out_raster_name = r\"\" + os.path.abspath(\"\") + \"/geodata/rasters/least-cost.tif\"\n",
    "# define coordinates of points 1 and 2 (in EPSG:6418)\n",
    "point_1_coord = (6749261.94092826917767525, 2206970.35179582564160228)  \n",
    "point_2_coord = (6749016.82820663042366505, 2207050.61491037486121058)\n",
    "\n",
    "# get source raster (osgeo.gdal.Dataset), the raster as nd.array, and the geotransformation tuple\n",
    "src_raster, raster_array, geo_transform = raster2array(in_raster_name)\n",
    "# get the zeros-like array with least cost pixels = 1\n",
    "path_array = create_path_array(raster_array, geo_transform, point_1_coord, point_2_coord)\n",
    "# get the spatial reference system of the input raster (slope-percent.tif)\n",
    "src_srs = get_srs(src_raster)\n",
    "# project the least cost path_array into a Byte (only zeros and ones) raster\n",
    "create_raster(out_raster_name, path_array, epsg=int(src_srs.GetAuthorityCode(None)),\n",
    "              rdtype=gdal.GDT_Byte, geo_info=geo_transform)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](https://hydro-informatics.com/_images/qgis-least-cost.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Legitimately, you may wonder whether it was better to represent the least cost path as a line. Of course, that is correct. However, this operation is a conversion of a raster into a line shapefile, which is explained in the next section on [geodata conversion](https://hydro-informatics.com/jupyter/geo-convert.html#raster2line). Curious readers can also directly use the `raster2line()` function of flusstools (or have a look at it in the  [flusstools.geotools/dataset_mgmt.py](https://raw.githubusercontent.com/Ecohydraulics/flusstools-pckg/main/flusstools/geotools/dataset_mgmt.py) script)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Exercise:** Familiarize with raster handling in the [geospatial ecohydraulics](https://hydro-informatics.com/exercises/ex-geco) exercise."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "fluss_kernel",
   "language": "python",
   "name": "fluss_kernel"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}