{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started with RasterFrames Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set Up the Spark Environment" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pyrasterframes\n", "import pyrasterframes.rf_ipython # enables richer rendering of pandas DataFrames in the notebook\n", "from pyrasterframes.rasterfunctions import (rf_local_add, rf_dimensions, rf_extent, rf_crs, rf_mk_crs,\n", " st_geometry, st_reproject, rf_tile)\n", "import pyspark.sql.functions as F" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [], "source": [ "spark = pyrasterframes.get_spark_session()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get a PySpark DataFrame from [open data](https://docs.opendata.aws/modis-pds/readme.html)\n", "\n", "Read a single \"granule\", or scene, of MODIS surface reflectance data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059' \\\n", " '/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF'\n", "df = spark.read.raster(uri)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "root\n", " |-- proj_raster_path: string (nullable = false)\n", " |-- proj_raster: struct (nullable = true)\n", " | |-- tile_context: struct (nullable = false)\n", " | | |-- extent: struct (nullable = false)\n", " | | | |-- xmin: double (nullable = false)\n", " | | | |-- ymin: double (nullable = false)\n", " | | | |-- xmax: double (nullable = false)\n", " | | | |-- ymax: double (nullable = false)\n", " | | |-- crs: struct (nullable = false)\n", " | | | |-- crsProj4: string (nullable = false)\n", " | |-- tile: tile (nullable = false)\n", "\n" ] } ], "source": [ "df.printSchema()" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Do some work with the raster data; add 3 element-wise to the pixel/cell values and show some rows of the DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "|rf_local_add(proj_raster, 3) |\n", "+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "|[[[-7783653.637667, 993342.4642358534, -7665045.582235852, 1111950.519667], [+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ]], [int16ud32767, (256,256), [3408,3471,3110,2875,2798,2973,3255,3169,-2147483648,3217,...,-2147483648,-2147483648,-2147483648,-2147483648,-2147483648,-2147483648,-2147483648,2519,3036,-2147483648]]]|\n", "|[[[-7665045.582235853, 993342.4642358534, -7546437.526804706, 1111950.519667], [+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ]], [int16ud32767, (256,256), [2337,2346,2581,2751,2575,2364,2223,2384,2618,2296,...,-2147483648,-2147483648,2688,2702,2967,3200,3257,3052,2914,2534]]] |\n", "|[[[-7546437.526804707, 993342.4642358534, -7427829.471373559, 1111950.519667], [+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ]], [int16ud32767, (257,256), 
[2602,2728,2784,2781,2567,2539,2254,2327,2436,2888,...,2726,2695,2788,2898,3139,3121,2939,2778,2859,2728]]] |\n", "|[[[-7427829.47137356, 993342.4642358534, -7309221.415942413, 1111950.519667], [+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ]], [int16ud32767, (256,256), [3058,3163,3036,3228,2877,3310,2885,2932,2931,2940,...,2299,2190,2180,2441,2563,2431,2347,2414,2525,2778]]] |\n", "|[[[-7309221.415942414, 993342.4642358534, -7190613.360511266, 1111950.519667], [+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m ]], [int16ud32767, (257,256), [3238,3355,3502,3055,3343,3334,-2147483648,-2147483648,-2147483648,-2147483648,...,2629,2841,2983,2839,3107,2762,2524,3175,3190,3181]]] |\n", "+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "df.select(rf_local_add(df.proj_raster, F.lit(3))).show(5, False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The extent struct tells us where in the [CRS](https://spatialreference.org/ref/sr-org/6842/) the tile data covers. The granule is split into arbitrary sized chunks. Each row is a different chunk. Let's see how many.\n", "\n", "Side note: you can configure the default size of these chunks, which are called Tiles, by passing a tuple of desired columns and rows as: `raster(uri, tile_dimensions=(96, 96))`. 
The default is `(256, 256)`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What area does the DataFrame cover?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+proj=sinu +lon_0=0.0 +x_0=0.0 +y_0=0.0 +a=6371007.181 +b=6371007.181 +units=m \n", "+--------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "|proj_raster_path |footprint |\n", "+--------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-70.85954815687087 8.933333332533772, -71.07986282542622 9.999999999104968, -69.99674110618135 9.999999999104968, -69.7797836135278 8.933333332533772, -70.85954815687087 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-69.77978361352781 8.933333332533772, -69.99674110618135 9.999999999104968, -68.91361938693649 9.999999999104968, -68.70001907018472 8.933333332533772, -69.77978361352781 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON 
((-68.70001907018474 8.933333332533772, -68.9136193869365 9.999999999104968, -67.8304976676916 9.999999999104968, -67.62025452684163 8.933333332533772, -68.70001907018474 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-67.62025452684165 8.933333332533772, -67.83049766769162 9.999999999104968, -66.74737594844675 9.999999999104968, -66.54048998349857 8.933333332533772, -67.62025452684165 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-66.54048998349859 8.933333332533772, -66.74737594844676 9.999999999104968, -65.66425422920187 9.999999999104968, -65.4607254401555 8.933333332533772, -66.54048998349859 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-65.4607254401555 8.933333332533772, -65.66425422920187 9.999999999104968, -64.58113250995702 9.999999999104968, -64.38096089681244 8.933333332533772, -65.4607254401555 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-64.38096089681244 8.933333332533772, -64.58113250995702 9.999999999104968, -63.498010790712144 9.999999999104968, -63.30119635346936 8.933333332533772, -64.38096089681244 8.933333332533772))|\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-63.30119635346937 8.933333332533772, -63.49801079071215 9.999999999104968, -62.41488907146726 9.999999999104968, -62.221431810126276 8.933333332533772, -63.30119635346937 8.933333332533772))|\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-62.22143181012629 8.933333332533772, -62.41488907146727 9.999999999104968, 
-61.33176735222239 9.999999999104968, -61.14166726678321 8.933333332533772, -62.22143181012629 8.933333332533772)) |\n", "|https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF|POLYGON ((-61.14166726678322 8.933333332533772, -61.3317673522224 9.999999999104968, -60.92559670750556 9.999999999104968, -60.736755563029554 8.933333332533772, -61.14166726678322 8.933333332533772)) |\n", "+--------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "only showing top 10 rows\n", "\n" ] } ], "source": [ "crs = df.agg(F.first(rf_crs(df.proj_raster)).crsProj4.alias('crs')).first()['crs']\n", "print(crs)\n", "coverage_area = df.select(\n", " df.proj_raster_path,\n", " st_reproject(\n", " st_geometry(rf_extent(df.proj_raster)), \n", " rf_mk_crs(crs), \n", " rf_mk_crs('EPSG:4326')).alias('footprint')\n", " )\n", "coverage_area.show(10, False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So where in the world is that? We'll generate a little visualization with Leaflet in the notebook using Folium." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import geopandas\n", "import folium" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "gdf = geopandas.GeoDataFrame(\n", " coverage_area.select('footprint').toPandas(), \n", " geometry='footprint', crs={'init':'EPSG:4326'}) " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "folium.Map((5, -65), zoom_start=6) \\\n", " .add_child(folium.GeoJson(gdf.__geo_interface__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at a sample of the data. You may find it useful to double-click the tile image column to see larger or smaller rendering of the image." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>proj_raster_path</th><th>extent</th><th>tile</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7783653.637667, 993342.4642358534, -7665045.582235852, 1111950.519667)</td><td>[tile image]</td></tr>\n",
"    <tr><th>1</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7665045.582235853, 993342.4642358534, -7546437.526804706, 1111950.519667)</td><td>[tile image]</td></tr>\n",
"    <tr><th>2</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7546437.526804707, 993342.4642358534, -7427829.471373559, 1111950.519667)</td><td>[tile image]</td></tr>\n",
"    <tr><th>3</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7427829.47137356, 993342.4642358534, -7309221.415942413, 1111950.519667)</td><td>[tile image]</td></tr>\n",
"    <tr><th>4</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7309221.415942414, 993342.4642358534, -7190613.360511266, 1111950.519667)</td><td>[tile image]</td></tr>\n",
"  </tbody>\n",
"</table>
" ], "text/plain": [ " proj_raster_path \\\n", "0 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "1 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "2 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "3 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "4 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n", "\n", " extent \\\n", "0 (-7783653.637667, 993342.4642358534, -7665045.... \n", "1 (-7665045.582235853, 993342.4642358534, -75464... \n", "2 (-7546437.526804707, 993342.4642358534, -74278... \n", "3 (-7427829.47137356, 993342.4642358534, -730922... \n", "4 (-7309221.415942414, 993342.4642358534, -71906... \n", "\n", " tile \n", "0 Tile(dimensions=[256, 256], cell_type=CellType... \n", "1 Tile(dimensions=[256, 256], cell_type=CellType... \n", "2 Tile(dimensions=[257, 256], cell_type=CellType... \n", "3 Tile(dimensions=[256, 256], cell_type=CellType... \n", "4 Tile(dimensions=[257, 256], cell_type=CellType... " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Look at a sample\n", "pandas_df = df.select(\n", " df.proj_raster_path,\n", " rf_extent(df.proj_raster).alias('extent'),\n", " rf_tile(df.proj_raster).alias('tile'),\n", ").limit(5).toPandas()\n", "pandas_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }