{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Geopandas Tutorial\n", "\n", "\n", "
\n", "\n", "### Overview\n", " \n", "* **teaching:** 30 minutes\n", "* **exercises:** 0\n", "* **questions:**\n", " * How can I analyze and visualize vector data in Python with geopandas?\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Table of contents\n", "1. [**Pandas and Geopandas**](#Pandas-and-Geopandas)\n", "1. [**Tabular data with Pandas**](#Tabular-data-with-Pandas)\n", "1. [**Vector data with Geopandas**](#Vector-data-with-Geopandas)\n", "1. [**Visualization with holoviz**](#Visualization-with-holoviz)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas and Geopandas primer\n", "\n", "[Pandas](https://pandas.pydata.org) is a core scientific Python library to work with \"Panel Data\" (PanDas). Basically if you have a spreadsheet or database you should be using Pandas. Pandas has many input/output (I/O) functions, and two core data structures - the \"Series\" and \"DataFrame\". [Geopandas](http://geopandas.org) extends Pandas to work efficently with collections of geographic Vector data - geometric shapes that are georeferenced to a position on Earth's surface. Geopandas data objects are, you might have guessed, called \"GeoSeries\" and \"GeoDataFrame\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#These libraries are mature, but constantly improving, so it's always good to keep track of the version:\n", "import pandas as pd\n", "import geopandas as gpd\n", "print('Pandas version: ', pd.__version__)\n", "print('Geopandas version: ', gpd.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tabular data with Pandas\n", "\n", "We'll use the [Smithsonian Global Volcanism database](https://volcano.si.edu). This could be a local csv, excel file, sql database etc... or remote data or results from a server (https://volcano.si.edu/database/webservices.cfm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load csv results from server into a Pandas DataFrame\n", "server = 'https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?'\n", "query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=csv'\n", "df = pd.read_csv(server+query)\n", "print(type(df))\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the dataframe indexing to extract subsets\n", "df.iloc[2:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Query a column for a value of interest\n", "df.query('Volcano_Name == \"Shasta\"')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pandas is all about efficient data access and visualization\n", "# Here are just a few examples\n", "df.Last_Eruption_Year.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.Region.unique()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby('Region').Last_Eruption_Year.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save the results of your analysis\n", "results = df.groupby('Region').Last_Eruption_Year.describe()\n", "results.to_csv('last_eruption_year_stats.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.Elevation.plot.hist()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby('Region').Volcano_Name.count().sort_values().plot.barh()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercises:\n", "\n", "- Make a new plot!\n", "- Change the query to get eruption information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vector data with Geopandas\n", "\n", "Since the Volcano database has geolocation information we should consider visualizing information on a map!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now load query results as json directly in geopandas\n", "query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=json'\n", "gf = gpd.read_file(server+query)\n", "print(type(gf))\n", "gf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# NOTE this looks the same as the dataframe from before, \n", "# but it is actual a 'geodataframe' with a specified coordinate reference system (crs)\n", "print(type(gf))\n", "print(gf.crs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The same indexing and operations work with geodataframes\n", "gf.iloc[2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# But now we have a variety of spatial operations at our disposal\n", "# Subsetting is very easy in Geopandas. Often we only want points in a certain bounding box\n", "ymin, ymax, xmin, xmax = [45, 49, -120, -124]\n", "subset = gf.cx[xmin:xmax, ymin:ymax]\n", "subset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Geopandas by default plots latitude and longitude of each entry (row) in a table\n", "subset.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Maybe we want to get a polygon that encloses all those points\n", "# Geopandas uses shapely under the surface\n", "import shapely\n", "point_collection = shapely.geometry.MultiPoint(subset.geometry.tolist())\n", "polygon = point_collection.convex_hull\n", "polygon" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can convert that polygon to a new CRS easily with geopandas\n", "# For example, convert to UTM to get area in units of square meters\n", "# https://spatialreference.org/ref/epsg/wgs-84-utm-zone-10n/ \n", "# EPSG:32610\n", "gfShape = gpd.GeoDataFrame(geometry=[polygon], crs = {'init': 'epsg:4326'})\n", "gfShape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Polygon area km^2')\n", "area = gfShape.to_crs(epsg=32610).area * 1e-6\n", "area" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save shape as geospatial vector format for GIS software\n", "myshape = gfShape.to_crs(epsg=32610)\n", "myshape.to_file('myshape.gpkg', driver='GPKG')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Finally, let's say you have a different polygon and want to extract all the volcanoes in it\n", "# This is referred to a 'spatial join' http://geopandas.org/mergingdata.html\n", "# gpd has some built-in datasets from the natural earth project https://www.naturalearthdata.com\n", "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n", "world" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get volcanoes of Colombia\n", "colombia = world.query('name == \"Colombia\"')\n", "colombia" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "colombian_volcanoes = gpd.sjoin(gf, colombia, how=\"inner\", op='within')\n", "colombian_volcanoes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualization with holoviz\n", "\n", "For geographic data on a map [holoviz](http://holoviz.org) libraries are fantastic!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geoviews as gv\n", "import hvplot.pandas\n", "\n", "print('Geoviews version: ', gv.__version__)\n", "print('hvplot version: ', hvplot.__version__)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Geoviews offers many basemaps\n", "tiles = gv.tile_sources.StamenTerrain()\n", "tiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# hvplot makes it easy to plot dataframes or geodataframes\n", "volcano_names = gf.loc[:,['Volcano_Name','geometry']]\n", "points = volcano_names.hvplot(geo=True, hover_cols=['Volcano_Name'], frame_width=600)\n", "points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Combining data in geoviews is done like so:\n", "tiles * points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Excercises:\n", "\n", "- Recreate bar and histogram plots with hvplot!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }