{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "Uncomment the following line to install [geemap](https://geemap.org) if needed." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "# !pip install geemap" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## local_rf_training\n", "\n", "This notebook illustrates how to train a random forest (or any other ensemble tree estimator) locally using scikit-learn, convert the estimator into a string representation that Earth Engine can interpret, and apply the resulting model with EE." ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "# import packages\n", "import ee\n", "import geemap\n", "from geemap import ml  # note the new ml module within geemap\n", "\n", "import pandas as pd\n", "from sklearn import ensemble" ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "geemap.ee_initialize()" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "### Training a model locally and using it with EE" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "# read the feature table used to train our RandomForest model\n", "# data taken from ee.FeatureCollection('GOOGLE/EE/DEMOS/demo_landcover_labels')\n", "\n", "df = pd.read_csv(\"../data/rf_example.csv\")" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "# specify the names of the features (i.e. band names) and the label\n", "# these names are used to extract the feature columns and to select the image bands\n", "\n", "feature_names = [\"B2\", \"B3\", \"B4\", \"B5\", \"B6\", \"B7\"]\n", "label = \"landcover\"" ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "# get the features and labels into separate variables\n", "X = df[feature_names]\n", "y = df[label]" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "# create a classifier and fit it to the training data\n", "n_trees = 10\n", "rf = ensemble.RandomForestClassifier(n_trees).fit(X, y)" ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "# convert the estimator into a list of strings\n", "# this function also works with the ensemble.ExtraTreesClassifier estimator\n", "trees = ml.rf_to_strings(rf, feature_names)" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "# print the first tree to see the result\n", "print(trees[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "# the number of trees we converted should equal the number we defined for the model\n", "len(trees) == n_trees" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "At this point you can take the list of strings and save them locally to avoid training again. However, since we want to use the model with EE, we need to create an ee.Classifier and, for best results, persist the data on EE."
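] }, { "cell_type": "markdown", "id": "13a", "metadata": {}, "source": [ "For example, the tree strings can be saved locally with plain Python (a minimal sketch; the `trees.json` file name is arbitrary):" ] }, { "cell_type": "code", "execution_count": null, "id": "13b", "metadata": {}, "outputs": [], "source": [ "# optionally save the tree strings locally so the model does not need retraining\n", "# (illustrative snippet; any file path works)\n", "import json\n", "\n", "with open(\"trees.json\", \"w\") as f:\n", "    json.dump(trees, f)\n", "\n", "# later, reload with: trees = json.load(open(\"trees.json\"))"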
] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "# create an ee.Classifier from the tree strings to use with ee objects\n", "ee_classifier = ml.strings_to_classifier(trees)" ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "# Make a cloud-free Landsat 8 TOA composite (from raw imagery).\n", "l8 = ee.ImageCollection(\"LANDSAT/LC08/C01/T1\")\n", "\n", "image = ee.Algorithms.Landsat.simpleComposite(\n", "    collection=l8.filterDate(\"2018-01-01\", \"2018-12-31\"), asFloat=True\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "# classify the image using the classifier we trained locally\n", "# note: we select feature_names from the image so the classifier knows which bands to use\n", "classified = image.select(feature_names).classify(ee_classifier)" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "# display results\n", "Map = geemap.Map(center=(37.75, -122.25), zoom=11)\n", "\n", "Map.addLayer(\n", "    image,\n", "    {\"bands\": [\"B7\", \"B5\", \"B3\"], \"min\": 0.05, \"max\": 0.55, \"gamma\": 1.5},\n", "    \"image\",\n", ")\n", "Map.addLayer(\n", "    classified,\n", "    {\"min\": 0, \"max\": 2, \"palette\": [\"red\", \"green\", \"blue\"]},\n", "    \"classification\",\n", ")\n", "\n", "Map" ] }, { "cell_type": "markdown", "id": "18", "metadata": {}, "source": [ "Yay!! 🎉 Looks like our example works. Don't party too much, though, because there is a catch...\n", "\n", "This workflow has several limitations, particularly around how much data you can pass from the client to the server and how large a model EE can actually handle. EE can only handle 40MB of data passed to the server, so if you have many large decision tree strings this will not work. Creating a classifier from strings also has limitations (see this ee-forum discussion: https://groups.google.com/g/google-earth-engine-developers/c/lFFU1GBPzi8/m/6MewQk1FBwAJ); string length again becomes the limit when EE creates its computation graph.\n", "\n", "So, you can use this workflow, but know that you will probably run into errors when training large models." ] }, { "cell_type": "markdown", "id": "19", "metadata": {}, "source": [ "### Saving trees to ee.FeatureCollection\n", "\n", "Now that we have the strings in a format that EE can use, we want to save them for later use. There is a function to export a list of tree strings to a feature collection; the feature collection stores the tree strings as a property on its features." ] }, { "cell_type": "code", "execution_count": null, "id": "20", "metadata": {}, "outputs": [], "source": [ "# specify the asset id where the trees will be saved\n", "# be sure to change this to your ee user name\n", "asset_id = \"users//random_forest_strings_test\"" ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "# kick off an export process so the trees are saved to the ee asset\n", "ml.export_trees_to_fc(trees, asset_id)\n", "\n", "# this starts an export task, so wait a few minutes before moving on" ] }, { "cell_type": "code", "execution_count": null, "id": "22", "metadata": {}, "outputs": [], "source": [ "# read the exported tree feature collection\n", "rf_fc = ee.FeatureCollection(asset_id)\n", "\n", "# convert it to a classifier, very similar to the `ml.strings_to_classifier` function\n", "another_classifier = ml.fc_to_classifier(rf_fc)\n", "\n", "# classify the image again, but with the classifier from the persisted trees\n", "classified = image.select(feature_names).classify(another_classifier)" ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "# display results\n", "# we should get the exact same results as before\n", "Map = geemap.Map(center=(37.75, -122.25), zoom=11)\n", "\n",
"Map.addLayer(\n", "    image,\n", "    {\"bands\": [\"B7\", \"B5\", \"B3\"], \"min\": 0.05, \"max\": 0.55, \"gamma\": 1.5},\n", "    \"image\",\n", ")\n", "Map.addLayer(\n", "    classified,\n", "    {\"min\": 0, \"max\": 2, \"palette\": [\"red\", \"green\", \"blue\"]},\n", "    \"classification\",\n", ")\n", "\n", "Map" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }