{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Landsat 8 NDVI Analysis on the Cloud\n", "\n", "This notebook demonstrates a \"Cloud-native\" analysis of [Normalized Difference Vegetation Index (NDVI)](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) using Landsat 8 data. \n", "\n", "**What is unique about this workflow is that no data is downloaded to our local computer! All calculations are performed in memory across many distributed machines on Google Cloud.** \n", "\n", "This workflow is possible because the Landsat 8 data is stored in [Cloud-Optimized GeoTIFF](http://www.cogeo.org) format, which can be accessed remotely via the [xarray](http://xarray.pydata.org/en/stable/) and [rasterio](https://rasterio.readthedocs.io/en/latest/) Python libraries. Distributed computing is enabled through a [Pangeo](http://pangeo-data.org) JupyterHub deployment with [Dask Kubernetes](https://github.com/dask/dask-kubernetes).\n", "\n", "About Landsat 8:\n", "https://landsat.usgs.gov/landsat-8 \n", "\n", "About the Landsat archive:\n", "https://cloud.google.com/storage/docs/public-datasets/landsat\n", "\n", "Date: August 30, 2018\n", "\n", "Created by:\n", "Scott Henderson (scottyh@uw.edu), Daniel Rothenberg" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import os\n", "import pandas as pd\n", "import rasterio\n", "import xarray as xr\n", "import requests\n", "\n", "from dask_kubernetes import KubeCluster\n", "from dask.distributed import Client, wait, progress\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Xarray version: 0.10.8\n", "Rasterio version: 1.0.3\n" ] } ], "source": [ "# Print package versions\n", "print('Xarray version: ', xr.__version__)\n", "print('Rasterio 
version: ', rasterio.__version__)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set environment variables for efficient remote reads of Cloud-Optimized GeoTIFFs\n", "os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'YES'\n", "os.environ['CPL_VSIL_CURL_ALLOWED_EXTENSIONS'] = 'TIF'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use NASA Common Metadata Repository (CMR) to get Landsat 8 images\n", "\n", "[NASA CMR](https://earthdata.nasa.gov/about/science-system-description/eosdis-components/common-metadata-repository) is a new unified way to search for remote sensing assets across many archive centers. If you prefer a graphical user interface, NASA [Earthdata Search](https://search.earthdata.nasa.gov/search) is built on top of CMR. CMR returns download links through the USGS (https://earthexplorer.usgs.gov), but the same archive is mirrored as a [Google Public Dataset](https://cloud.google.com/storage/docs/public-datasets/landsat), so we'll make a function that queries CMR and returns URLs to the imagery stored on Google Cloud." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def query_cmr_landsat(collection='Landsat_8_OLI_TIRS_C1', tier='T1', path=47, row=27):\n", " \"\"\"Query NASA CMR for Collection1, Tier1 Landsat scenes from a specific path and row.\"\"\"\n", " \n", " data = [f'short_name={collection}',\n", " 'page_size=2000',\n", " f'attribute[]=string,CollectionCategory,{tier}',\n", " f'attribute[]=int,WRSPath,{path}',\n", " f'attribute[]=int,WRSRow,{row}',\n", " ]\n", "\n", " query = 'https://cmr.earthdata.nasa.gov/search/granules.json?' 
+ '&'.join(data)\n", "\n", " r = requests.get(query, timeout=100)\n", " print(r.url)\n", " \n", " df = pd.DataFrame(r.json()['feed']['entry'])\n", " \n", " # Save results to a file\n", " #print('Saved results to cmr-result.json')\n", " #with open('cmr-result.json', 'w') as j:\n", " # j.write(r.text)\n", " \n", " return df" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def make_google_archive(pids, bands):\n", " \"\"\"Turn list of product_ids into pandas dataframe for NDVI analysis.\"\"\"\n", " \n", " path = pids[0].split('_')[2][1:3]\n", " row = pids[0].split('_')[2][-2:]\n", " baseurl = f'https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/0{path}/0{row}'\n", " \n", " dates = [pd.to_datetime(x.split('_')[3]) for x in pids]\n", " df = pd.DataFrame(dict(product_id=pids, date=dates))\n", " \n", " for band in bands:\n", " df[band] = [f'{baseurl}/{x}/{x}_{band}.TIF' for x in pids]\n", " \n", " return df" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://cmr.earthdata.nasa.gov/search/granules.json?short_name=Landsat_8_OLI_TIRS_C1&page_size=2000&attribute[]=string,CollectionCategory,T1&attribute[]=int,WRSPath,47&attribute[]=int,WRSRow,27\n" ] } ], "source": [ "df = query_cmr_landsat()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "pids = df.title.tolist()\n", "df = make_google_archive(pids, ['B4', 'B5'])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>product_id</th>\n", "      <th>date</th>\n", "      <th>B4</th>\n", "      <th>B5</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>LC08_L1TP_047027_20130421_20170310_01_T1</td>\n", "      <td>2013-04-21</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>LC08_L1TP_047027_20130523_20170310_01_T1</td>\n", "      <td>2013-05-23</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>LC08_L1TP_047027_20130608_20170310_01_T1</td>\n", "      <td>2013-06-08</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>LC08_L1TP_047027_20130624_20170309_01_T1</td>\n", "      <td>2013-06-24</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>LC08_L1TP_047027_20130710_20180201_01_T1</td>\n", "      <td>2013-07-10</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "      <td>https://storage.googleapis.com/gcp-public-data...</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " product_id date \\\n", "0 LC08_L1TP_047027_20130421_20170310_01_T1 2013-04-21 \n", "1 LC08_L1TP_047027_20130523_20170310_01_T1 2013-05-23 \n", "2 LC08_L1TP_047027_20130608_20170310_01_T1 2013-06-08 \n", "3 LC08_L1TP_047027_20130624_20170309_01_T1 2013-06-24 \n", "4 LC08_L1TP_047027_20130710_20180201_01_T1 2013-07-10 \n", "\n", " B4 \\\n", "0 https://storage.googleapis.com/gcp-public-data... \n", "1 https://storage.googleapis.com/gcp-public-data... \n", "2 https://storage.googleapis.com/gcp-public-data... \n", "3 https://storage.googleapis.com/gcp-public-data... \n", "4 https://storage.googleapis.com/gcp-public-data... \n", "\n", " B5 \n", "0 https://storage.googleapis.com/gcp-public-data... \n", "1 https://storage.googleapis.com/gcp-public-data... \n", "2 https://storage.googleapis.com/gcp-public-data... \n", "3 https://storage.googleapis.com/gcp-public-data... \n", "4 https://storage.googleapis.com/gcp-public-data... " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch Dask Kubernetes Cluster\n", "\n", "This will allow us to distribute our analysis across many machines. In the default configuration for Pangeo Binder, each worker has 2 vCPUs and 7 GB of RAM. It may take several minutes to initialize these workers and make them available to Dask." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d3039133d0284773834984eeebe4e2d1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='
<h2>KubeCluster</h2>
'), HBox(children=(HTML(value='\\n
\\n