{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## How to export thousands of image chips from Earth Engine in a few minutes\n", "\n", "The source code of this notebook was adapted from the Medium post [Fast(er) Downloads](https://gorelick.medium.com/fast-er-downloads-a2abd512aa26) by Noel Gorelick. Credits to Noel. \n", "\n", "Due to a [limitation](https://docs.python.org/3/library/multiprocessing.html) of the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) package, the code in this notebook can only be run at the top level. Therefore, it could not be implemented as a function in geemap. \n", "\n", "### Install packages\n", "\n", "Uncomment the following line to install the required packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pip install geemap retry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ee\n", "import geemap\n", "import logging\n", "import multiprocessing\n", "import os\n", "import requests\n", "import shutil\n", "from retry import retry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize GEE to use the high-volume endpoint\n", "\n", "- [high-volume endpoint](https://developers.google.com/earth-engine/cloud/highvolume)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an interactive map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map = geemap.Map()\n", "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the Region of Interest (ROI)\n", "\n", "You can use the drawing tools on the map to draw an 
ROI, then use `Map.user_roi` to retrieve the geometry. Alternatively, you can define the ROI as an `ee.Geometry`, as shown below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# region = Map.user_roi\n", "region = ee.Geometry.Polygon(\n", "    [\n", "        [\n", "            [-122.513695, 37.707998],\n", "            [-122.513695, 37.804359],\n", "            [-122.371902, 37.804359],\n", "            [-122.371902, 37.707998],\n", "            [-122.513695, 37.707998],\n", "        ]\n", "    ],\n", "    None,\n", "    False,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the image source\n", "\n", "Using the 1-m [NAIP imagery](https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image = (\n", "    ee.ImageCollection('USDA/NAIP/DOQQ')\n", "    .filterBounds(region)\n", "    .filterDate('2020', '2021')\n", "    .mosaic()\n", "    .clip(region)\n", "    .select('N', 'R', 'G')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the 10-m [Sentinel-2 imagery](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR#bands)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# image = ee.ImageCollection('COPERNICUS/S2_SR') \\\n", "#     .filterBounds(region) \\\n", "#     .filterDate('2021', '2022') \\\n", "#     .select('B8', 'B4', 'B3') \\\n", "#     .median() \\\n", "#     .visualize(min=0, max=4000) \\\n", "#     .clip(region)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set parameters\n", "\n", "If you want the exported images to have a coordinate system, change `format` to `GEO_TIFF`. Otherwise, you can use the `png` or `jpg` format." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params = {\n", "    'count': 100,  # How many image chips to export\n", "    'buffer': 127,  # The buffer distance (m) around each point\n", "    'scale': 100,  # The scale (m) at which to do stratified sampling\n", "    'seed': 1,  # A randomization seed to use for subsampling\n", "    'dimensions': '256x256',  # The dimensions of each image chip\n", "    'format': \"png\",  # The output image format: png, jpg, ZIPPED_GEO_TIFF, GEO_TIFF, or NPY\n", "    'prefix': 'tile_',  # The filename prefix\n", "    'processes': 25,  # How many processes to use for parallel processing\n", "    'out_dir': '.',  # The output directory. Defaults to the current working directory\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add layers to map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.addLayer(image, {}, \"Image\")\n", "Map.addLayer(region, {}, \"ROI\", False)\n", "Map.setCenter(-122.4415, 37.7555, 12)\n", "Map" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Generate a list of work items\n", "\n", "In this example, we generate sample points (100, as set by `count` in `params`) using stratified random sampling, which requires a `classBand`: the name of the band containing the classes to use for stratification. If unspecified, the first band of the input image is used. Therefore, we have to add a new band with a constant value (e.g., 1) to the image. The `getRequests()` function returns a list of dictionaries, one per point, each holding the point's coordinates." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def getRequests():\n", "    # Prepend a constant band so stratifiedSample has a class band to stratify on\n", "    img = ee.Image(1).rename(\"Class\").addBands(image)\n", "    points = img.stratifiedSample(\n", "        numPoints=params['count'],\n", "        region=region,\n", "        scale=params['scale'],\n", "        seed=params['seed'],\n", "        geometries=True,\n", "    )\n", "    # Keep the sampled points on the map object so they can be displayed later\n", "    Map.data = points\n", "    return points.aggregate_array('.geo').getInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a function for downloading image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `getResult()` function then takes one of those points and generates an image centered on that location, which is then downloaded and saved to a file (a PNG with the default settings). This function uses `image.getThumbURL()` to select the pixels for the `png` and `jpg` formats; for output in GeoTIFF or NumPy format, it uses `image.getDownloadURL()` instead ([source](https://gorelick.medium.com/fast-er-downloads-a2abd512aa26))." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@retry(tries=10, delay=1, backoff=2)\n", "def getResult(index, point):\n", "    # Build a square chip region around the sample point\n", "    point = ee.Geometry.Point(point['coordinates'])\n", "    region = point.buffer(params['buffer']).bounds()\n", "\n", "    # png/jpg chips come from getThumbURL; other formats use getDownloadURL\n", "    if params['format'] in ['png', 'jpg']:\n", "        url = image.getThumbURL(\n", "            {\n", "                'region': region,\n", "                'dimensions': params['dimensions'],\n", "                'format': params['format'],\n", "            }\n", "        )\n", "    else:\n", "        url = image.getDownloadURL(\n", "            {\n", "                'region': region,\n", "                'dimensions': params['dimensions'],\n", "                'format': params['format'],\n", "            }\n", "        )\n", "\n", "    if params['format'] == \"GEO_TIFF\":\n", "        ext = 'tif'\n", "    else:\n", "        ext = params['format']\n", "\n", "    # Raise on HTTP errors so that @retry can try again\n", "    r = requests.get(url, stream=True)\n", "    if r.status_code != 200:\n", "        r.raise_for_status()\n", "\n", "    # Stream the response body to a zero-padded, prefixed filename\n", "    out_dir = os.path.abspath(params['out_dir'])\n", "    basename = str(index).zfill(len(str(params['count'])))\n", "    filename = f\"{out_dir}/{params['prefix']}{basename}.{ext}\"\n", "    with open(filename, 'wb') as out_file:\n", "        shutil.copyfileobj(r.raw, out_file)\n", "    print(\"Done: \", basename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download images" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "logging.basicConfig()\n", "items = getRequests()\n", "\n", "pool = multiprocessing.Pool(params['processes'])\n", "pool.starmap(getResult, enumerate(items))\n", "\n", "pool.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve sample points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Map.addLayer(Map.data, {}, \"Sample points\")\n", "Map" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }