{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with RasterFrames Notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Spark Environment"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pyrasterframes\n",
"from pyrasterframes.utils import create_rf_spark_session\n",
"import pyrasterframes.rf_ipython  # registers rich notebook display of DataFrames and tiles\n",
"from pyrasterframes.rasterfunctions import *\n",
"import pyspark.sql.functions as F"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"spark = create_rf_spark_session()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get a PySpark DataFrame from [open data](https://docs.opendata.aws/modis-pds/readme.html)\n",
"\n",
"Read a single \"granule\" (scene) of MODIS surface reflectance data. This file holds band 2 (`B02`) of the `MCD43A4` product."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059' \\\n",
" '/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF'\n",
"df = spark.read.raster(uri)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"root\n",
" |-- proj_raster_path: string (nullable = false)\n",
" |-- proj_raster: struct (nullable = true)\n",
" | |-- tile_context: struct (nullable = false)\n",
" | | |-- extent: struct (nullable = false)\n",
" | | | |-- xmin: double (nullable = false)\n",
" | | | |-- ymin: double (nullable = false)\n",
" | | | |-- xmax: double (nullable = false)\n",
" | | | |-- ymax: double (nullable = false)\n",
" | | |-- crs: struct (nullable = false)\n",
" | | | |-- crsProj4: string (nullable = false)\n",
" | |-- tile: tile (nullable = false)\n",
"\n"
]
}
],
"source": [
"df.printSchema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do some work with the raster data: add 3 element-wise to the pixel (cell) values and show a few rows of the resulting DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<p><em>Showing only top 5 rows</em>.</p>\n",
"<table>\n",
"<tr><th>rf_local_add(proj_raster, 3)</th></tr>\n",
"<tr><td></td></tr>\n",
"<tr><td></td></tr>\n",
"<tr><td></td></tr>\n",
"<tr><td></td></tr>\n",
"<tr><td></td></tr>\n",
"</table>"
],
"text/markdown": [
"\n",
"_Showing only top 5 rows_.\n",
"\n",
"| rf_local_add(proj_raster, 3) |\n",
"|---|\n",
"| |\n",
"| |\n",
"| |\n",
"| |\n",
"| |"
],
"text/plain": [
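"DataFrame[rf_local_add(proj_raster, 3): struct<tile_context:struct<extent:struct<xmin:double,ymin:double,xmax:double,ymax:double>,crs:struct<crsProj4:string>>,tile:udt>]"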
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.select(rf_local_add(df.proj_raster, F.lit(3)))"
]
},
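{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, a \"local\" operation such as `rf_local_add` works cell by cell: the scalar is added to every cell of every tile. The analogy below uses a small NumPy array as a stand-in for a single tile; it is not how RasterFrames implements the operation, and it does not model NoData cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical 3 x 3 'tile' of integer cell values (a stand-in, not real MODIS data)\n",
"tile = np.array([[1, 2, 3],\n",
"                 [4, 5, 6],\n",
"                 [7, 8, 9]], dtype=np.int16)\n",
"\n",
"# Element-wise addition: the scalar 3 is broadcast to every cell\n",
"result = tile + 3\n",
"print(result)"
]
},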
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The extent struct tells us which area of the [CRS](https://spatialreference.org/ref/sr-org/6842/) the tile data covers. The granule is split into chunks of a configurable size, and each row holds a different chunk. Let's see how many there are.\n",
"\n",
"Side note: you can configure the size of these chunks, which are called tiles, by passing a tuple of the desired columns and rows: `spark.read.raster(uri, tile_dimensions=(96, 96))`. The default is `(256, 256)`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.count()"
]
},
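{
"cell_type": "markdown",
"metadata": {},
"source": [
"The count of 100 is consistent with simple arithmetic, assuming the granule is 2400 x 2400 cells (the usual size of a 500 m MODIS sinusoidal tile; this is an assumption, not a value read from the file above). Cutting each axis into 256-cell chunks takes `ceil(2400 / 256) = 10` chunks, the last of which covers only the remaining 96 cells, so there are 10 x 10 = 100 tiles. The hypothetical helper below repeats the calculation for other tile dimensions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def expected_tile_count(raster_cols, raster_rows, tile_cols=256, tile_rows=256):\n",
"    # Hypothetical helper: chunks needed to cover the raster,\n",
"    # counting the partial chunks along the right and bottom edges\n",
"    return math.ceil(raster_cols / tile_cols) * math.ceil(raster_rows / tile_rows)\n",
"\n",
"# Assumed granule size: 2400 x 2400 cells\n",
"print(expected_tile_count(2400, 2400))          # default (256, 256): 100\n",
"print(expected_tile_count(2400, 2400, 96, 96))  # tile_dimensions=(96, 96): 625"
]
},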
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What area does the DataFrame cover?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs \n"
]
},
{
"data": {
"text/html": [
"<p><em>Showing only top 5 rows</em>.</p>\n",
"<table>\n",
"<tr><th>proj_raster_path</th><th>footprint</th></tr>\n",
"<tr><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>POLYGON ((-70.85954815687087 8.933333332533772, -71.07986282542622 9.999999999104968, -69.99674110618135 9.999999999104968, -69.77978361352781 8.933333332533772, -70.85954815687087 8.933333332533772))</td></tr>\n",
"<tr><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>POLYGON ((-69.77978361352781 8.933333332533772, -69.99674110618135 9.999999999104968, -68.91361938693649 9.999999999104968, -68.70001907018472 8.933333332533772, -69.77978361352781 8.933333332533772))</td></tr>\n",
"<tr><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>POLYGON ((-68.70001907018474 8.933333332533772, -68.9136193869365 9.999999999104968, -67.83049766769162 9.999999999104968, -67.62025452684165 8.933333332533772, -68.70001907018474 8.933333332533772))</td></tr>\n",
"<tr><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>POLYGON ((-67.62025452684165 8.933333332533772, -67.83049766769162 9.999999999104968, -66.74737594844675 9.999999999104968, -66.54048998349857 8.933333332533772, -67.62025452684165 8.933333332533772))</td></tr>\n",
"<tr><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>POLYGON ((-66.54048998349859 8.933333332533772, -66.74737594844676 9.999999999104968, -65.66425422920187 9.999999999104968, -65.4607254401555 8.933333332533772, -66.54048998349859 8.933333332533772))</td></tr>\n",
"</table>"
],
"text/markdown": [
"\n",
"_Showing only top 5 rows_.\n",
"\n",
"| proj_raster_path | footprint |\n",
"|---|---|\n",
"| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-70.85954815687087 8.933333332533772, -71.07986282542622 9.999999999104968, -69.99674110618135 9.999999999104968, -69.77978361352781 8.933333332533772, -70.85954815687087 8.933333332533772)) |\n",
"| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-69.77978361352781 8.933333332533772, -69.99674110618135 9.999999999104968, -68.91361938693649 9.999999999104968, -68.70001907018472 8.933333332533772, -69.77978361352781 8.933333332533772)) |\n",
"| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-68.70001907018474 8.933333332533772, -68.9136193869365 9.999999999104968, -67.83049766769162 9.999999999104968, -67.62025452684165 8.933333332533772, -68.70001907018474 8.933333332533772)) |\n",
"| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-67.62025452684165 8.933333332533772, -67.83049766769162 9.999999999104968, -66.74737594844675 9.999999999104968, -66.54048998349857 8.933333332533772, -67.62025452684165 8.933333332533772)) |\n",
"| https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF | POLYGON ((-66.54048998349859 8.933333332533772, -66.74737594844676 9.999999999104968, -65.66425422920187 9.999999999104968, -65.4607254401555 8.933333332533772, -66.54048998349859 8.933333332533772)) |"
],
"text/plain": [
"DataFrame[proj_raster_path: string, footprint: udt]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
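"# All tiles come from one granule, so the first row's CRS (a proj4 string) applies to every row\n",
"crs = df.agg(F.first(rf_crs(df.proj_raster)).crsProj4.alias('crs')).first()['crs']\n",
"print(crs)\n",
"# Reproject each tile's extent from the sinusoidal CRS to longitude/latitude (EPSG:4326)\n",
"coverage_area = df.select(\n",
"    df.proj_raster_path,\n",
"    st_reproject(\n",
"        st_geometry(rf_extent(df.proj_raster)),\n",
"        rf_mk_crs(crs),\n",
"        rf_mk_crs('EPSG:4326')).alias('footprint')\n",
")\n",
"coverage_area"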
]
},
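{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reprojecting the footprints amounts to converting each extent corner from the sinusoidal CRS printed above to longitude and latitude. For a spherical sinusoidal projection with `lon_0=0`, the inverse is short: latitude is `y / R` and longitude is `x / (R cos(latitude))`, both in radians, where `R` is the sphere radius given by `+a`. The sketch below is a hand-rolled illustration of that formula, not what `st_reproject` calls internally. Applied to the lower-left corner of the first tile, `(-7783653.637667, 993342.4642358534)`, it should closely match the first vertex of the first footprint polygon shown above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Sphere radius from the +a/+b parameters of the proj4 string printed above (meters)\n",
"R = 6371007.181\n",
"\n",
"def sinusoidal_to_lonlat(x, y, lon_0=0.0):\n",
"    # Inverse spherical sinusoidal projection; returns (lon, lat) in degrees\n",
"    lat = y / R\n",
"    lon = math.radians(lon_0) + x / (R * math.cos(lat))\n",
"    return math.degrees(lon), math.degrees(lat)\n",
"\n",
"# Lower-left corner (xmin, ymin) of the first tile's extent, in meters\n",
"lon, lat = sinusoidal_to_lonlat(-7783653.637667, 993342.4642358534)\n",
"print(lon, lat)  # approximately -70.8595 8.9333"
]
},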
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So where in the world is that? We'll generate a little visualization with Leaflet in the notebook using Folium."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import geopandas\n",
"import folium"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"gdf = geopandas.GeoDataFrame(\n",
"    coverage_area.select('footprint').toPandas(),\n",
"    geometry='footprint', crs='EPSG:4326')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"folium.Map((5, -65), zoom_start=6) \\\n",
" .add_child(folium.GeoJson(gdf.__geo_interface__))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look at a sample of the data. You may find it useful to double-click the tile image column to see a larger or smaller rendering of each image."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>proj_raster_path</th><th>extent</th><th>geo</th><th>tile</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7783653.637667, 993342.4642358534, -7665045.582235853, 1111950.519667)</td><td>POLYGON ((-7783653.637667 993342.4642358534, -7...</td><td></td></tr>\n",
"<tr><th>1</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7665045.582235853, 993342.4642358534, -7546437.526804706, 1111950.519667)</td><td>POLYGON ((-7665045.582235853 993342.4642358534,...</td><td></td></tr>\n",
"<tr><th>2</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7546437.526804707, 993342.4642358534, -7427829.47137356, 1111950.519667)</td><td>POLYGON ((-7546437.526804707 993342.4642358534,...</td><td></td></tr>\n",
"<tr><th>3</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7427829.47137356, 993342.4642358534, -7309221.415942413, 1111950.519667)</td><td>POLYGON ((-7427829.47137356 993342.4642358534, ...</td><td></td></tr>\n",
"<tr><th>4</th><td>https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF</td><td>(-7309221.415942414, 993342.4642358534, -7190613.360511267, 1111950.519667)</td><td>POLYGON ((-7309221.415942414 993342.4642358534,...</td><td></td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" proj_raster_path \\\n",
"0 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n",
"1 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n",
"2 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n",
"3 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n",
"4 https://modis-pds.s3.amazonaws.com/MCD43A4.006... \n",
"\n",
" extent \\\n",
"0 (-7783653.637667, 993342.4642358534, -7665045.... \n",
"1 (-7665045.582235853, 993342.4642358534, -75464... \n",
"2 (-7546437.526804707, 993342.4642358534, -74278... \n",
"3 (-7427829.47137356, 993342.4642358534, -730922... \n",
"4 (-7309221.415942414, 993342.4642358534, -71906... \n",
"\n",
" geo \\\n",
"0 POLYGON ((-7783653.637667 993342.4642358534, -... \n",
"1 POLYGON ((-7665045.582235853 993342.4642358534... \n",
"2 POLYGON ((-7546437.526804707 993342.4642358534... \n",
"3 POLYGON ((-7427829.47137356 993342.4642358534,... \n",
"4 POLYGON ((-7309221.415942414 993342.4642358534... \n",
"\n",
" tile \n",
"0 Tile(dimensions=[256, 256], cell_type=CellType... \n",
"1 Tile(dimensions=[256, 256], cell_type=CellType... \n",
"2 Tile(dimensions=[256, 256], cell_type=CellType... \n",
"3 Tile(dimensions=[256, 256], cell_type=CellType... \n",
"4 Tile(dimensions=[256, 256], cell_type=CellType... "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Look at a sample of five rows as a pandas DataFrame\n",
"pandas_df = df.select(\n",
"    df.proj_raster_path,\n",
"    rf_extent(df.proj_raster).alias('extent'),\n",
"    rf_geometry(df.proj_raster).alias('geo'),\n",
"    rf_tile(df.proj_raster).alias('tile'),\n",
").limit(5).toPandas()\n",
"pandas_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}