{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "a1d6ca95-688c-4db8-93b4-dad6bcff1e23", "metadata": {}, "outputs": [], "source": [ "import glob\n", "import os\n", "import shutil\n", "\n", "import contextily as cx\n", "import geopandas as gpd\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from config import SETTINGS\n", "\n", "import seabeepy as sb\n", "\n", "plt.style.use(\"ggplot\")" ] }, { "cell_type": "code", "execution_count": null, "id": "5f03cc41-8d6b-471f-b1ad-f41bcd97cac1", "metadata": {}, "outputs": [], "source": [ "# Login to MinIO\n", "minio_client = sb.storage.minio_login(\n", " user=SETTINGS.MINIO_ACCESS_ID, password=SETTINGS.MINIO_SECRET_KEY\n", ")" ] }, { "cell_type": "markdown", "id": "ccfd5b99-053e-4156-96d3-169efad02b58", "metadata": {}, "source": [ "# SeaBee annotation workflow\n", "\n", "This notebook includes useful functions to streamline the SeaBee annotation workflow for **habitat classification**. See the documentation [here](https://seabee-no.github.io/documentation/annotation.html#sec-packaging) for details.\n", "\n", "## 1. User input" ] }, { "cell_type": "code", "execution_count": null, "id": "ab0fd924-8b06-49c8-abae-1bec32d568e5", "metadata": {}, "outputs": [], "source": [ "# Version of the Excel class definitions to use\n", "class_version = \"1-0\"\n", "\n", "# Campaign folder. Should be a folder within '/niva/tidy/annotation'\n", "camp_fold = r\"/home/notebook/shared-seabee-ns9879k/niva-tidy/annotation/olberg_2023\"\n", "\n", "# Field/column identifying unique subareas in 'subarea_boundaries' shapefile\n", "subarea_id_field = \"id\"\n", "\n", "# Output CRS for geopackage. Should match the CRS of the drone images\n", "crs = \"epsg:25832\"\n", "\n", "# Folder for saving intermediates\n", "temp_fold = r\"/home/notebook/annotation_temp\"" ] }, { "cell_type": "markdown", "id": "f816af6d-aeed-484b-9892-42eb219f3f7e", "metadata": {}, "source": [ "## 2. Create class definition file\n", "\n", "Create a class definition file compatible with ArcGIS Pro from an Excel table. The Excel file **must** be structured as illustrated below:\n", "\n", "| **A** | **B** | **C** | **D** | **E** | **F** | **G** |\n", "|:-------------:|:-------------:|:-------------:|:----------------------------------------------:|:------------:|:------------:|:------------:|\n", "| **lev1_name** | **lev2_name** | **lev3_name** | **lev3_desc** | **lev1_hex** | **lev2_hex** | **lev3_hex** |\n", "| ALGAE | RED | VERLA | Vertebrata lanosa (grisetangdokke/trøffeltang) | #996633 | #CF4D50 | #47223E |\n", "| ALGAE | TURF | TURF | Unspecified turf (lurv) | #996633 | #779A7F | #779A7F |\n", "| ANGIO | ANGIO | ZOSMA | Zostera marina (ålegras) | #A7FD67 | #A7FD67 | #A7FD67 |\n", "| MAERL | MAERL | MAERL | Maerl (løstliggende rugl) | #CAA2A9 | #CAA2A9 | #CAA2A9 |\n", "| URCHIN | URCHIN | ECHES | Echinus esculentus (rød kråkebolle) | #868509 | #868509 | #ED7D31 |\n", "| URCHIN | URCHIN | GRAAC | Gracilechinus acutus (langpiggsjøpiggsvin) | #868509 | #868509 | #C98C58 |\n", "| STARFISH | STARFISH | OPHNI | Ophiocomina nigra (svart slangestjerne) | #80258F | #80258F | #80258F |\n", "| STARFISH | STARFISH | ASTRU | Asterias rubens (vanlig korstroll) | #80258F | #80258F | #BE4BD1 |\n", "\n", "The output is an `.ecs` file (JSON) that can be loaded into the Training Samples Manager in ArcGIS Pro's Image Analyst extension. Both this file and the original Excel file should be pushed to the `annotation` repository on GitHub." ] }, { "cell_type": "code", "execution_count": null, "id": "3de4eab7-d779-450f-a5b2-fedcc3d44bc5", "metadata": {}, "outputs": [], "source": [ "# Read template data from Excel\n", "class_version = class_version.replace(\".\", \"-\")\n", "major_ver_no = int(class_version.split(\"-\")[0])\n", "xl_path = f\"../class_definitions/seabee_habitat_classes_v{class_version}.xlsx\"\n", "if major_ver_no < 1:\n", " # Use old template structure\n", " df = pd.read_excel(xl_path, usecols=\"A:D\")\n", "else:\n", " # Use updated template\n", " df = pd.read_excel(xl_path, usecols=\"A:G\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "c85d6284-a3e8-4f45-b58e-e0e43907a861", "metadata": {}, "outputs": [], "source": [ "# Convert to .ecs\n", "name = f\"seabee_class_definitions_v{class_version}\"\n", "desc = f\"v{class_version} of the SeaBee class definition file.\"\n", "out_fold = r\"../class_definitions\"\n", "\n", "if major_ver_no < 1:\n", " sb.anno.class_definition_from_df_deprecated(\n", " df,\n", " name,\n", " out_fold=out_fold,\n", " version=1,\n", " org=\"NIVA\",\n", " desc=desc,\n", " )\n", "else:\n", " sb.anno.class_definition_from_df(\n", " df,\n", " name,\n", " out_fold=out_fold,\n", " version=1,\n", " org=\"NIVA\",\n", " desc=desc,\n", " )" ] }, { "cell_type": "markdown", "id": "711178b7-bf13-4af8-bb34-23ea39e04394", "metadata": {}, "source": [ "## 3. Process annotation\n", "\n", "### 3.1. Check data structure" ] }, { "cell_type": "code", "execution_count": null, "id": "4badff60-7211-4747-813d-aa57f7a79414", "metadata": {}, "outputs": [], "source": [ "# Check data structure within 'camp_fold'\n", "def check_single_shapefile(folder_path):\n", " search_path = os.path.join(folder_path, \"*.shp\")\n", " flist = list(glob.glob(search_path))\n", " assert (\n", " len(flist) == 1\n", " ), f\"Folder '{os.path.basename(folder_path)}' should contain a single shapefile.\"\n", " return flist[0]\n", "\n", "\n", "subfolds = [\"annotation_by_subarea\", \"region_of_interest\", \"subarea_boundaries\"]\n", "for subfold in subfolds:\n", " assert os.path.isdir(\n", " os.path.join(camp_fold, subfold)\n", " ), f\"Subfolder '{subfold}' does not exist in 'camp_fold'.\"\n", "\n", "# Check 'subarea_boundaries' and 'region_of_interest' contain a single shapefile\n", "subarea_shp_path = check_single_shapefile(os.path.join(camp_fold, \"subarea_boundaries\"))\n", "roi_shp_path = check_single_shapefile(os.path.join(camp_fold, \"region_of_interest\"))" ] }, { "cell_type": "markdown", "id": "5e8ab6db-021f-4644-99bc-0248a4638a2a", "metadata": {}, "source": [ "### 3.2. Merge shapefiles\n", "\n", "During annotation, **all users should work with the same class definition file** (i.e. the `.ecs` created above). Annotations from all users for the same area can then be exported as shapefiles and added to a single folder, named `annotation_by_subarea`. As long as the same class definition file has been used by everyone, the shapefiles will have a consistent structure with the same classes. These can therefore be merged and \"dissolved\" to create a single annotation dataset for the whole area." ] }, { "cell_type": "code", "execution_count": null, "id": "8891bd57-97cb-46e2-a48e-f3bd7ee616ce", "metadata": {}, "outputs": [], "source": [ "# Merge and dissolve annotation shapefiles for each subarea\n", "# Make sure 'shp_fold' ONLY contains annotation shapefiles!\n", "shp_fold = os.path.join(camp_fold, \"annotation_by_subarea\")\n", "gdf = sb.anno.merge_shapefiles(shp_fold)\n", "gdf.head()" ] }, { "cell_type": "markdown", "id": "677b11ae-45d6-4c4f-9bf4-5e18772bc674", "metadata": {}, "source": [ "### 3.4. Assign subareas\n", "\n", "The merged shapefile above is intersected with the subarea polygons to create a set of training samples split by subarea." ] }, { "cell_type": "code", "execution_count": null, "id": "23d70834-2db0-4aa2-801b-22fa8cc9dd82", "metadata": {}, "outputs": [], "source": [ "# Assign subarea IDs\n", "subarea_gdf = gpd.read_file(subarea_shp_path)\n", "subarea_gdf.rename({subarea_id_field: \"subarea_id\"}, axis=\"columns\", inplace=True)\n", "subarea_gdf = subarea_gdf[[\"subarea_id\", \"geometry\"]]\n", "assert subarea_gdf.crs == gdf.crs\n", "gdf = gdf.overlay(subarea_gdf, how=\"intersection\")\n", "gdf.head()" ] }, { "cell_type": "markdown", "id": "4f3fc443-2ff9-4916-9001-93032153911c", "metadata": {}, "source": [ "### 3.5. Rebuild the class hierarchy\n", "\n", "ArcGIS Pro stores all annotation in a single field (`Classname`), regardless of the level in the `.ecs` hierarchy. For machine learning, it is better to have one column of class labels per level, as this makes it easier to generate raster training datasets using labels for any level. To get around this limitation, `class_definition_from_df` embeds the hierarchy in the `Classcode`, so that it can be reconstructed afterwards. This is done by the function below." ] }, { "cell_type": "code", "execution_count": null, "id": "1c5ddb49-587d-40fd-80d6-6b3a2590f222", "metadata": {}, "outputs": [], "source": [ "# Extract annotation levels to separate columns\n", "gdf = sb.anno.rebuild_class_hierarchy(gdf, class_version)\n", "\n", "# Save\n", "merged_anno_dir = os.path.join(temp_fold, \"merged_annotation\")\n", "os.makedirs(merged_anno_dir, exist_ok=True)\n", "anno_shp_path = os.path.join(merged_anno_dir, f\"merged_annotation_v{class_version}.shp\")\n", "gdf.to_file(anno_shp_path, index=False)\n", "gdf.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "dcf72bcc-d719-4097-95d4-f0c287f5e7a0", "metadata": {}, "outputs": [], "source": [ "# Plot\n", "ax = gdf.plot(\n", " column=\"lev2_name\", cmap=\"Set3\", categorical=True, legend=True, figsize=(10, 10)\n", ")\n", "cx.add_basemap(\n", " ax,\n", " crs=gdf.crs.to_string(),\n", " attribution=False,\n", " # source=\"https://opencache.statkart.no/gatekeeper/gk/gk.open_gmaps?layers=topo4&zoom={z}&x={x}&y={y}\",\n", " source=\"https://cache.kartverket.no/v1/wmts/1.0.0/topo/default/webmercator/{z}/{y}/{x}.png\",\n", ")\n", "leg = ax.get_legend()\n", "leg.set_bbox_to_anchor((1, 0.0, 0.4, 1))\n", "ax.axis(\"off\");" ] }, { "cell_type": "markdown", "id": "330a213d-7015-41ac-97f1-f594c9db126b", "metadata": {}, "source": [ "### 3.6. Create geopackage" ] }, { "cell_type": "code", "execution_count": null, "id": "4381b5ab-95ef-4f58-ae4c-17495d50fc63", "metadata": {}, "outputs": [], "source": [ "# Save annotation, ROI and subareas to a geopackage\n", "camp_name = os.path.basename(camp_fold)\n", "shp_path_dict = {\n", " f\"{camp_name}_merged_annotation_v{class_version}\": anno_shp_path,\n", " f\"{camp_name}_region_of_interest\": roi_shp_path,\n", " f\"{camp_name}_subareas\": subarea_shp_path,\n", "}\n", "gpkg_name = f\"{camp_name}_annotation.gpkg\"\n", "gpkg_path = os.path.join(temp_fold, gpkg_name)\n", "for layer, shp_path in shp_path_dict.items():\n", " gdf = gpd.read_file(shp_path).to_crs(crs)\n", " gdf.to_file(gpkg_path, layer=layer, driver=\"GPKG\", index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "a5896388-0308-4442-b8c8-075e22886d6c", "metadata": {}, "outputs": [], "source": [ "# Copy to MinIO and tidy\n", "sb.storage.copy_folder(\n", " merged_anno_dir,\n", " camp_fold,\n", " minio_client,\n", " overwrite=False,\n", " clean_dst=False,\n", " containing_folder=True,\n", ")\n", "sb.storage.copy_file(\n", " gpkg_path,\n", " os.path.join(camp_fold, gpkg_name),\n", " minio_client,\n", " overwrite=False,\n", ")\n", "shutil.rmtree(merged_anno_dir)\n", "os.remove(gpkg_path)" ] }, { "cell_type": "markdown", "id": "f6195afb-e6fc-47d9-933f-73b8500ae54d", "metadata": {}, "source": [ "## 4. Publish annotation" ] }, { "cell_type": "code", "execution_count": null, "id": "f3fc31ca-aa58-43a0-8834-d53267c64d69", "metadata": {}, "outputs": [], "source": [ "# Define layer styling using SeaBee standard SLDs.\n", "# 'style_level' is the level in the hierarchy used for colouring polygons\n", "style_level = 2\n", "style_dict = {\n", " f\"{camp_name}_merged_annotation_v{class_version}\": f\"annotation_classes_v{class_version}_level{style_level}.sld\",\n", " f\"{camp_name}_region_of_interest\": \"black_outline.sld\",\n", " f\"{camp_name}_subareas\": \"red_outline.sld\",\n", "}\n", "\n", "# Identify geopackage\n", "camp_name = os.path.basename(camp_fold)\n", "gpkg_name = f\"{camp_name}_annotation.gpkg\"\n", "gpkg_path = os.path.join(camp_fold, gpkg_name)\n", "layers = list(style_dict.keys())\n", "\n", "# Upload layers to GeoServer\n", "store_name = sb.geo.upload_geopackage_layers_to_geoserver(\n", " gpkg_path,\n", " layers,\n", " SETTINGS.GEOSERVER_USER,\n", " SETTINGS.GEOSERVER_PASSWORD,\n", " workspace=\"geonode\",\n", " style_dict=style_dict,\n", ")\n", "\n", "# Publish to GeoNode\n", "for layer in layers:\n", " sb.geo.publish_to_geonode(\n", " layer,\n", " SETTINGS.GEONODE_USER,\n", " SETTINGS.GEONODE_PASSWORD,\n", " store_name=store_name,\n", " workspace=\"geonode\",\n", " )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }