{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\"Open\n", "\n", "Uncomment the following line to install [geemap](https://geemap.org) if needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pip install geemap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning with Earth Engine - Accuracy Assessment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supervised classification algorithms available in Earth Engine\n", "\n", "Source: https://developers.google.com/earth-engine/classification\n", "\n", "The `Classifier` package handles supervised classification by traditional ML algorithms running in Earth Engine. These classifiers include CART, RandomForest, NaiveBayes and SVM. The general workflow for classification is:\n", "\n", "1. Collect training data. Assemble features which have a property that stores the known class label and properties storing numeric values for the predictors.\n", "2. Instantiate a classifier. Set its parameters if necessary.\n", "3. Train the classifier using the training data.\n", "4. Classify an image or feature collection.\n", "5. Estimate classification error with independent validation data.\n", "\n", "The training data is a `FeatureCollection` with a property storing the class label and properties storing predictor variables. Class labels should be consecutive, integers starting from 0. If necessary, use remap() to convert class values to consecutive integers. The predictors should be numeric.\n", "\n", "To assess the accuracy of a classifier, use a `ConfusionMatrix`. The `sample()` method generates two random samples from the input data: one for training and one for validation. The training sample is used to train the classifier. You can get resubstitution accuracy on the training data from `classifier.confusionMatrix()`. To get validation accuracy, classify the validation data. 
This adds a `classification` property to the validation `FeatureCollection`. Call `errorMatrix()` on the classified `FeatureCollection` to get a confusion matrix representing validation (expected) accuracy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](https://i.imgur.com/vROsEiq.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step-by-step tutorial\n", "\n", "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ee\n", "import geemap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an interactive map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map = geemap.Map()\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add data to the map\n", "\n", "Let's add the [USGS National Land Cover Database](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD), which can be used to create training data with class labels. \n", "\n", "![](https://i.imgur.com/7QoRXxu.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NLCD2016 = ee.Image('USGS/NLCD/NLCD2016').select('landcover')\n", "Map.addLayer(NLCD2016, {}, 'NLCD 2016')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the NLCD metadata to find out the Landsat image IDs used to generate the land cover data. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NLCD_metadata = ee.FeatureCollection(\"users/giswqs/landcover/NLCD2016_metadata\")\n", "Map.addLayer(NLCD_metadata, {}, 'NLCD Metadata')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# point = ee.Geometry.Point([-122.4439, 37.7538]) # Sanfrancisco, CA\n", "# point = ee.Geometry.Point([-83.9293, 36.0526]) # Knoxville, TN\n", "point = ee.Geometry.Point([-88.3070, 41.7471]) # Chicago, IL" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata = NLCD_metadata.filterBounds(point).first()\n", "region = metadata.geometry()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata.get('2016on_bas').getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "doy = metadata.get('2016on_bas').getInfo().replace('LC08_', '')\n", "doy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ee.Date.parse('YYYYDDD', doy).format('YYYY-MM-dd').getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "start_date = ee.Date.parse('YYYYDDD', doy)\n", "end_date = start_date.advance(1, 'day')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') \\\n", " .filterBounds(point) \\\n", " .filterDate(start_date, end_date) \\\n", " .first() \\\n", " .select('B[1-7]') \\\n", " .clip(region)\n", "\n", "vis_params = {\n", " 'min': 0,\n", " 'max': 3000,\n", " 'bands': ['B5', 'B4', 'B3']\n", "}\n", "\n", "Map.centerObject(point, 8)\n", "Map.addLayer(image, vis_params, \"Landsat-8\")\n", "Map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nlcd_raw = 
NLCD2016.clip(region)\n", "Map.addLayer(nlcd_raw, {}, 'NLCD')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare for consecutive class labels\n", "\n", "In this example, we are going to use the [USGS National Land Cover Database (NLCD)](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD) to create a label dataset for training.\n", "\n", "First, we need to use the `remap()` function to turn class labels into consecutive integers. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "raw_class_values = nlcd_raw.get('landcover_class_values').getInfo()\n", "print(raw_class_values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_classes = len(raw_class_values)\n", "new_class_values = list(range(0, n_classes))\n", "new_class_values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class_palette = nlcd_raw.get('landcover_class_palette').getInfo()\n", "print(class_palette)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nlcd = nlcd_raw.remap(raw_class_values, new_class_values).select(['remapped'], ['landcover'])\n", "nlcd = nlcd.set('landcover_class_values', new_class_values)\n", "nlcd = nlcd.set('landcover_class_palette', class_palette)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.addLayer(nlcd, {}, 'NLCD')\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make the training dataset.\n", "points = nlcd.sample(**{\n", "    'region': region,\n", "    'scale': 30,\n", "    'numPixels': 5000,\n", "    'seed': 0,\n", "    'geometries': True # Set this to False to ignore geometries\n", "})\n", "\n", "Map.addLayer(points, {}, 'training', False)" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(points.size().getInfo())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(points.first().getInfo())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split training and testing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use these bands for prediction.\n", "bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']\n", "\n", "# This property of the table stores the land cover labels.\n", "label = 'landcover'\n", "\n", "# Overlay the points on the imagery to get training.\n", "sample = image.select(bands).sampleRegions(**{\n", " 'collection': points,\n", " 'properties': [label],\n", " 'scale': 30\n", "})\n", "\n", "# Adds a column of deterministic pseudorandom numbers. \n", "sample = sample.randomColumn()\n", "\n", "split = 0.7\n", "\n", "training = sample.filter(ee.Filter.lt('random', split))\n", "validation = sample.filter(ee.Filter.gte('random', split))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "training.first().getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "validation.first().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train the classifier\n", "\n", "In this examples, we will use random forest classification." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier = ee.Classifier.smileRandomForest(10).train(training, label, bands)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classify the image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Classify the image with the same bands used for training.\n", "result = image.select(bands).classify(classifier)\n", "\n", "# Display the classes with random colors.\n", "Map.addLayer(result.randomVisualizer(), {}, 'classified')\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Render categorical map\n", "\n", "To render a categorical map, we can set two image properties: `classification_class_values` and `classification_class_palette`. We can use the same style as the NLCD so that it is easy to compare the two maps. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class_values = nlcd.get('landcover_class_values').getInfo()\n", "print(class_values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class_palette = nlcd.get('landcover_class_palette').getInfo()\n", "print(class_palette)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "landcover = result.set('classification_class_values', class_values)\n", "landcover = landcover.set('classification_class_palette', class_palette)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.addLayer(landcover, {}, 'Land cover')\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Change layer opacity:')\n", "cluster_layer = Map.layers[-1]\n", "cluster_layer.interact(opacity=(0, 1, 0.1))" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Add a legend to the map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.add_legend(builtin_legend='NLCD')\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accuracy assessment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Training dataset\n", "\n", "`confusionMatrix()` computes a 2D confusion matrix for a classifier based on its training data (ie: resubstitution error). Axis 0 of the matrix correspond to the input classes (i.e., reference data), and axis 1 to the output classes (i.e., classification data). The rows and columns start at class 0 and increase sequentially up to the maximum class value" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy = classifier.confusionMatrix()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy.getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Overall Accuracy essentially tells us out of all of the reference sites what proportion were mapped correctly. The overall accuracy is usually expressed as a percent, with 100% accuracy being a perfect classification where all reference site were classified correctly. Overall accuracy is the easiest to calculate and understand but ultimately only provides the map user and producer with basic accuracy information. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy.accuracy().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Kappa Coefficient is generated from a statistical test to evaluate the accuracy of a classification. Kappa essentially evaluates how well the classification performed as compared to just randomly assigning values, i.e. did the classification do better than random. 
The Kappa Coefficient can range from -1 to 1. A value of 0 indicates that the classification is no better than a random classification. A negative number indicates the classification is significantly worse than random. A value close to 1 indicates that the classification is significantly better than random. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy.kappa().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Producer's Accuracy is the map accuracy from the point of view of the map maker (the producer). It describes how often real features on the ground are correctly shown on the classified map, or the probability that a certain land cover of an area on the ground is classified as such. The Producer's Accuracy is the complement of the Omission Error: Producer's Accuracy = 100% - Omission Error. It is also the number of reference sites classified accurately divided by the total number of reference sites for that class. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy.producersAccuracy().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Consumer's Accuracy (also called the User's Accuracy) is the accuracy from the point of view of a map user, not the map maker. It tells us how often the class on the map will actually be present on the ground. This is referred to as reliability. The User's Accuracy is the complement of the Commission Error: User's Accuracy = 100% - Commission Error. It is calculated by taking the total number of correct classifications for a particular class and dividing it by the total number of sites that were classified as that class (the column total in the matrix layout described above, where axis 1 holds the classification data). 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_accuracy.consumersAccuracy().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Validation dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "validated = validation.classify(classifier)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "validated.first().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`errorMatrix` computes a 2D error matrix for a collection by comparing two columns of a collection: one containing the actual values, and one containing predicted values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy = validated.errorMatrix('landcover', 'classification')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy.getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy.accuracy().getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy.kappa().getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy.producersAccuracy().getInfo()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_accuracy.consumersAccuracy().getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download confusion matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "import os\n", "\n", "out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')\n", "training_csv = os.path.join(out_dir, 'train_accuracy.csv')\n", "testing_csv = os.path.join(out_dir, 'test_accuracy.csv')\n", "\n", "with 
open(training_csv, \"w\", newline=\"\") as f:\n", " writer = csv.writer(f)\n", " writer.writerows(train_accuracy.getInfo())\n", " \n", "with open(testing_csv, \"w\", newline=\"\") as f:\n", " writer = csv.writer(f)\n", " writer.writerows(test_accuracy.getInfo())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reclassify land cover map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "landcover = landcover.remap(new_class_values, raw_class_values).select(['remapped'], ['classification'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "landcover = landcover.set('classification_class_values', raw_class_values)\n", "landcover = landcover.set('classification_class_palette', class_palette)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.addLayer(landcover, {}, 'Final land cover')\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export the result\n", "\n", "Export the result directly to your computer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')\n", "out_file = os.path.join(out_dir, 'landcover.tif')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geemap.ee_export_image(landcover, filename=out_file, scale=900)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Export the result to Google Drive:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geemap.ee_export_image_to_drive(landcover, description='landcover', folder='export', scale=900)" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Table of Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "384px" }, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }