{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started with RasterFrames Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup Spark Environment" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pyrasterframes\n", "from pyrasterframes.utils import create_rf_spark_session\n", "import pyrasterframes.rf_ipython # enables nicer visualizations of pandas DF\n", "from pyrasterframes.rasterfunctions import *\n", "import pyspark.sql.functions as F" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [], "source": [ "spark = create_rf_spark_session()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get a PySpark DataFrame from [open data](https://docs.opendata.aws/modis-pds/readme.html)\n", "\n", "Read a single \"granule\" or scene of MODIS surface reflectance data. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059' \\\n", " '/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF'\n", "df = spark.read.raster(uri)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "root\n", " |-- proj_raster_path: string (nullable = false)\n", " |-- proj_raster: struct (nullable = true)\n", " | |-- tile_context: struct (nullable = false)\n", " | | |-- extent: struct (nullable = false)\n", " | | | |-- xmin: double (nullable = false)\n", " | | | |-- ymin: double (nullable = false)\n", " | | | |-- xmax: double (nullable = false)\n", " | | | |-- ymax: double (nullable = false)\n", " | | |-- crs: struct (nullable = false)\n", " | | | |-- crsProj4: string (nullable = false)\n", " | |-- tile: tile (nullable = false)\n", "\n" ] } ], "source": [ "df.printSchema()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do some work with 
the raster data; add 3 element-wise to the pixel/cell values and show some rows of the DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
" ], "text/markdown": [ "\n", "_Showing only top 5 rows_.\n", "\n", "| rf_local_add(proj_raster, 3) |\n", "|---|\n", "| |\n", "| |\n", "| |\n", "| |\n", "| |" ], "text/plain": [ "DataFrame[rf_local_add(proj_raster, 3): struct<tile_context:struct<extent:struct<xmin:double,ymin:double,xmax:double,ymax:double>,crs:struct<crsProj4:string>>,tile:udt>]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.select(rf_local_add(df.proj_raster, F.lit(3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The extent struct tells us which area the tile data covers, expressed in the tile's [CRS](https://spatialreference.org/ref/sr-org/6842/). The granule is split into arbitrarily sized chunks, and each row is a different chunk. Let's see how many there are.\n", "\n", "Side note: you can configure the size of these chunks, which are called Tiles, by passing a tuple of desired columns and rows as: `raster(uri, tile_dimensions=(96, 96))`. The default is `(256, 256)`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What area does the DataFrame cover?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
" ], "text/markdown": [ "\n", "_Showing only top 5 rows_.\n", "\n", "| proj_raster_path | footprint |\n", "|---|---|\n", "| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-70.85954815687087 8.933333332533772, -71.07986282542622 9.999999999104968, -69.99674110618135 9.999999999104968, -69.77978361352781 8.933333332533772, -70.85954815687087 8.933333332533772)) |\n", "| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-69.77978361352781 8.933333332533772, -69.99674110618135 9.999999999104968, -68.91361938693649 9.999999999104968, -68.70001907018472 8.933333332533772, -69.77978361352781 8.933333332533772)) |\n", "| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-68.70001907018474 8.933333332533772, -68.9136193869365 9.999999999104968, -67.83049766769162 9.999999999104968, -67.62025452684165 8.933333332533772, -68.70001907018474 8.933333332533772)) |\n", "| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-67.62025452684165 8.933333332533772, -67.83049766769162 9.999999999104968, -66.74737594844675 9.999999999104968, -66.54048998349857 8.933333332533772, -67.62025452684165 8.933333332533772)) |\n", "| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-66.54048998349859 8.933333332533772, -66.74737594844676 9.999999999104968, -65.66425422920187 9.999999999104968, -65.4607254401555 8.933333332533772, -66.54048998349859 8.933333332533772)) |" ], "text/plain": [ "DataFrame[proj_raster_path: string, footprint: udt]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "crs = df.agg(F.first(rf_crs(df.proj_raster)).crsProj4.alias('crs')).first()['crs']\n", "print(crs)\n", 
"coverage_area = df.select(\n", "    df.proj_raster_path,\n", "    st_reproject(\n", "        st_geometry(rf_extent(df.proj_raster)),\n", "        rf_mk_crs(crs),\n", "        rf_mk_crs('EPSG:4326')).alias('footprint')\n", "    )\n", "coverage_area" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So where in the world is that? We'll generate a little visualization with Leaflet in the notebook using Folium." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import geopandas\n", "import folium" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "gdf = geopandas.GeoDataFrame(\n", "    coverage_area.select('footprint').toPandas(),\n", "    geometry='footprint', crs={'init':'EPSG:4326'})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "folium.Map((5, -65), zoom_start=6) \\\n", "    .add_child(folium.GeoJson(gdf.__geo_interface__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at a sample of the data. You may find it useful to double-click the tile image column to see a larger or smaller rendering of the image." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>\n",
"<thead><tr><th></th><th>proj_raster_path</th><th>extent</th><th>geo</th><th>tile</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7783653.637667, 993342.4642358534, -7665045.582235853, 1111950.519667)</td><td>POLYGON ((-7783653.637667 993342.4642358534, -7...</td><td></td></tr>\n",
"<tr><th>1</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7665045.582235853, 993342.4642358534, -7546437.526804706, 1111950.519667)</td><td>POLYGON ((-7665045.582235853 993342.4642358534,...</td><td></td></tr>\n",
"<tr><th>2</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7546437.526804707, 993342.4642358534, -7427829.47137356, 1111950.519667)</td><td>POLYGON ((-7546437.526804707 993342.4642358534,...</td><td></td></tr>\n",
"<tr><th>3</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7427829.47137356, 993342.4642358534, -7309221.415942413, 1111950.519667)</td><td>POLYGON ((-7427829.47137356 993342.4642358534, ...</td><td></td></tr>\n",
"<tr><th>4</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7309221.415942414, 993342.4642358534, -7190613.360511267, 1111950.519667)</td><td>POLYGON ((-7309221.415942414 993342.4642358534,...</td><td></td></tr>\n",
"</tbody></table>
\n", "
" ], "text/plain": [ " proj_raster_path \\\n", "0 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "1 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "2 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "3 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "4 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "\n", " extent \\\n", "0 (-7783653.637667, 993342.4642358534, -7665045.... \n", "1 (-7665045.582235853, 993342.4642358534, -75464... \n", "2 (-7546437.526804707, 993342.4642358534, -74278... \n", "3 (-7427829.47137356, 993342.4642358534, -730922... \n", "4 (-7309221.415942414, 993342.4642358534, -71906... \n", "\n", " geo \\\n", "0 POLYGON ((-7783653.637667 993342.4642358534, -... \n", "1 POLYGON ((-7665045.582235853 993342.4642358534... \n", "2 POLYGON ((-7546437.526804707 993342.4642358534... \n", "3 POLYGON ((-7427829.47137356 993342.4642358534,... \n", "4 POLYGON ((-7309221.415942414 993342.4642358534... \n", "\n", " tile \n", "0 Tile(dimensions=[256, 256], cell_type=CellType... \n", "1 Tile(dimensions=[256, 256], cell_type=CellType... \n", "2 Tile(dimensions=[256, 256], cell_type=CellType... \n", "3 Tile(dimensions=[256, 256], cell_type=CellType... \n", "4 Tile(dimensions=[256, 256], cell_type=CellType... 
" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at a sample of the first five tiles\n", "pandas_df = df.select(\n", "    df.proj_raster_path,\n", "    rf_extent(df.proj_raster).alias('extent'),\n", "    rf_geometry(df.proj_raster).alias('geo'),\n", "    rf_tile(df.proj_raster).alias('tile'),\n", ").limit(5).toPandas()\n", "pandas_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }