\n",
"\n",
"### Overview\n",
" \n",
"* **teaching:** 30 minutes\n",
"* **exercises:** 0\n",
"* **questions:**\n",
" * How can I analyze and visualize vector data in Python with geopandas?\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Table of contents\n",
"1. [**Pandas and Geopandas**](#Pandas-and-Geopandas)\n",
"1. [**Tabular data with Pandas**](#Tabular-data-with-Pandas)\n",
"1. [**Vector data with Geopandas**](#Vector-data-with-Geopandas)\n",
"1. [**Visualization with holoviz**](#Visualization-with-holoviz)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas and Geopandas primer\n",
"\n",
"[Pandas](https://pandas.pydata.org) is a core scientific Python library to work with \"Panel Data\" (PanDas). Basically if you have a spreadsheet or database you should be using Pandas. Pandas has many input/output (I/O) functions, and two core data structures - the \"Series\" and \"DataFrame\". [Geopandas](http://geopandas.org) extends Pandas to work efficently with collections of geographic Vector data - geometric shapes that are georeferenced to a position on Earth's surface. Geopandas data objects are, you might have guessed, called \"GeoSeries\" and \"GeoDataFrame\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#These libraries are mature, but constantly improving, so it's always good to keep track of the version:\n",
"import pandas as pd\n",
"import geopandas as gpd\n",
"print('Pandas version: ', pd.__version__)\n",
"print('Geopandas version: ', gpd.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tabular data with Pandas\n",
"\n",
"We'll use the [Smithsonian Global Volcanism database](https://volcano.si.edu). This could be a local csv, excel file, sql database etc... or remote data or results from a server (https://volcano.si.edu/database/webservices.cfm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load csv results from server into a Pandas DataFrame\n",
"server = 'https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?'\n",
"query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=csv'\n",
"df = pd.read_csv(server+query)\n",
"print(type(df))\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the dataframe indexing to extract subsets\n",
"df.iloc[2:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Query a column for a value of interest\n",
"df.query('Volcano_Name == \"Shasta\"')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pandas is all about efficient data access and visualization\n",
"# Here are just a few examples\n",
"df.Last_Eruption_Year.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.Region.unique()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.groupby('Region').Last_Eruption_Year.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save the results of your analysis\n",
"results = df.groupby('Region').Last_Eruption_Year.describe()\n",
"results.to_csv('last_eruption_year_stats.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.Elevation.plot.hist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.groupby('Region').Volcano_Name.count().sort_values().plot.barh()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Exercises:\n",
"\n",
"- Make a new plot!\n",
"- Change the query to get eruption information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vector data with Geopandas\n",
"\n",
"Since the Volcano database has geolocation information we should consider visualizing information on a map!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Now load query results as json directly in geopandas\n",
"query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=json'\n",
"gf = gpd.read_file(server+query)\n",
"print(type(gf))\n",
"gf.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# NOTE this looks the same as the dataframe from before, \n",
"# but it is actual a 'geodataframe' with a specified coordinate reference system (crs)\n",
"print(type(gf))\n",
"print(gf.crs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The same indexing and operations work with geodataframes\n",
"gf.iloc[2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# But now we have a variety of spatial operations at our disposal\n",
"# Subsetting is very easy in Geopandas. Often we only want points in a certain bounding box\n",
"ymin, ymax, xmin, xmax = [45, 49, -120, -124]\n",
"subset = gf.cx[xmin:xmax, ymin:ymax]\n",
"subset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Geopandas by default plots latitude and longitude of each entry (row) in a table\n",
"subset.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Maybe we want to get a polygon that encloses all those points\n",
"# Geopandas uses shapely under the surface\n",
"import shapely\n",
"point_collection = shapely.geometry.MultiPoint(subset.geometry.tolist())\n",
"polygon = point_collection.convex_hull\n",
"polygon"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can convert that polygon to a new CRS easily with geopandas\n",
"# For example, convert to UTM to get area in units of square meters\n",
"# https://spatialreference.org/ref/epsg/wgs-84-utm-zone-10n/ \n",
"# EPSG:32610\n",
"gfShape = gpd.GeoDataFrame(geometry=[polygon], crs = {'init': 'epsg:4326'})\n",
"gfShape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Polygon area km^2')\n",
"area = gfShape.to_crs(epsg=32610).area * 1e-6\n",
"area"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save shape as geospatial vector format for GIS software\n",
"myshape = gfShape.to_crs(epsg=32610)\n",
"myshape.to_file('myshape.gpkg', driver='GPKG')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Finally, let's say you have a different polygon and want to extract all the volcanoes in it\n",
"# This is referred to a 'spatial join' http://geopandas.org/mergingdata.html\n",
"# gpd has some built-in datasets from the natural earth project https://www.naturalearthdata.com\n",
"world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n",
"world"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get volcanoes of Colombia\n",
"colombia = world.query('name == \"Colombia\"')\n",
"colombia"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"colombian_volcanoes = gpd.sjoin(gf, colombia, how=\"inner\", op='within')\n",
"colombian_volcanoes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualization with holoviz\n",
"\n",
"For geographic data on a map [holoviz](http://holoviz.org) libraries are fantastic!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import geoviews as gv\n",
"import hvplot.pandas\n",
"\n",
"print('Geoviews version: ', gv.__version__)\n",
"print('hvplot version: ', hvplot.__version__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Geoviews offers many basemaps\n",
"tiles = gv.tile_sources.StamenTerrain()\n",
"tiles"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hvplot makes it easy to plot dataframes or geodataframes\n",
"volcano_names = gf.loc[:,['Volcano_Name','geometry']]\n",
"points = volcano_names.hvplot(geo=True, hover_cols=['Volcano_Name'], frame_width=600)\n",
"points"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Combining data in geoviews is done like so:\n",
"tiles * points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Excercises:\n",
"\n",
"- Recreate bar and histogram plots with hvplot!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}