geoplot.aggplot(df, projection=None, hue=None, by=None, geometry=None, nmax=None, nmin=None, nsig=0, agg=<function mean>, cmap='viridis', vmin=None, vmax=None, legend=True, legend_kwargs=None, extent=None, figsize=(8, 6), ax=None, **kwargs)
A minimum-expectations summary plot type which handles mixes of geometry types and missing aggregate geometry data.
Returns: The axis object with the plot on it.
Return type: AxesSubplot or GeoAxesSubplot instance
Examples
This plot type accepts any geometry, including mixtures of polygons and points, averages the value of a certain data parameter at their centroids, and plots the result, using a colormap as the visual variable.
For the purposes of comparison, this library’s choropleth function takes some sort of data as input, along with polygons as geospatial context, and combines them into a colorful map. This is useful if, for example, you have data on the number of crimes committed per neighborhood, and you want to plot that.
But suppose your original dataset came in terms of individual observations: instead of “n collisions happened in this neighborhood”, you have “one collision occurred at this specific coordinate at this specific date”. This is obviously more useful data, since it can be made to do more things, but in order to generate the same map, you will first have to do all of the work of geolocating your points to neighborhoods (not trivial), then aggregating them (by, in this case, taking a count).
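The manual work described above (locating each point in a region, then counting) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `point_in_polygon` and `count_by_region`, the two square "neighborhoods", and the sample points are all hypothetical, and a ray-casting test stands in for whatever spatial join a real workflow would use.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon.

    `polygon` is a list of (x, y) vertices. Points lying exactly on an
    edge are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_region(points, regions):
    """Assign each point to the first region containing it, then count."""
    counts = Counter()
    for x, y in points:
        for name, polygon in regions.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical data: two square "neighborhoods" and four collision points,
# one of which lies outside both regions and is therefore not counted.
regions = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.2, 0.5), (0.7, 0.1), (1.5, 0.5), (3.0, 3.0)]
counts = count_by_region(points, regions)
```

Even this toy version needs a geometry test for every point and region pair, which is the effort the paragraph above calls "not trivial".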
aggplot handles this work for you. It takes input in the form of observations, and outputs as useful a visualization as possible of their “regional” statistics. What a “region” corresponds to depends on how much geospatial information you can provide.
If you can’t provide any geospatial context, aggplot will output what’s known as a quadtree: it will break your data down into recursive squares, and use them to aggregate the data. This is a very experimental format, is very fiddly to make, and has not yet been optimized for speed; but it provides a useful baseline which requires no additional work and can be used to expose interesting geospatial correlations right away. And, if you have enough observations, it can be a pretty good approximation (collisions in New York City pictured).
Our first few examples are of just such figures. A simple aggplot quadtree can be generated with just a dataset, a data column of interest, and, optionally, a projection.
import geoplot as gplt
import geoplot.crs as gcrs
# `collisions` holds the point observations (New York City collisions) to aggregate.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='LATDEP')
To get the best output, you often need to tweak the nmin and nmax parameters yourself; these control the minimum and maximum number of observations per box, respectively. In this case we’ll also choose a different matplotlib colormap, using the cmap parameter.
aggplot will satisfy the nmax parameter before trying to satisfy nmin, so you may end up with spaces without observations, or ones lacking a statistically significant number of observations. This is necessary in order to break up “spaces” that the algorithm would otherwise end on. You can control the maximum number of observations in the blank spaces using the nsig parameter.
gplt.aggplot(collisions, nmin=20, nmax=500, nsig=5, projection=gcrs.PlateCarree(), hue='LATDEP', cmap='Reds')
You’ll have to play around with these parameters to get the clearest picture.
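The interplay of nmin, nmax, and nsig can be sketched in plain Python. This is a simplified illustration, not geoplot's implementation: the helper names (`split`, `contained`, `partition`), the `max_depth` safety guard, and the exact comparison rules are assumptions made for the example. The sketch always splits a box holding more than nmax points, otherwise splits only if every quadrant keeps at least nmin points, and treats boxes with at most nsig points as blank.

```python
def split(box):
    """Split an (xmin, ymin, xmax, ymax) box into four equal quadrants."""
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax),
    ]


def contained(points, box):
    """Points inside the half-open box [xmin, xmax) x [ymin, ymax)."""
    xmin, ymin, xmax, ymax = box
    return [(x, y) for x, y in points if xmin <= x < xmax and ymin <= y < ymax]


def partition(points, box, nmin, nmax, max_depth=12):
    """Return the quadtree leaves as a list of (box, points) pairs."""
    children = [(q, contained(points, q)) for q in split(box)]
    too_full = len(points) > nmax
    children_ok = all(len(pts) >= nmin for _, pts in children)
    # nmax wins: an over-full box is always split, even if that leaves
    # some quadrants with fewer than nmin points (possibly none at all).
    if max_depth > 0 and (too_full or children_ok):
        leaves = []
        for q, pts in children:
            leaves.extend(partition(pts, q, nmin, nmax, max_depth - 1))
        return leaves
    return [(box, points)]


# Hypothetical data: a dense 4x4 cluster in one corner plus one outlier.
points = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)] + [(6.5, 6.5)]
leaves = partition(points, (0, 0, 8, 8), nmin=1, nmax=4)

# Assumed nsig rule: boxes with at most nsig observations are left blank.
nsig = 0
blank = [box for box, pts in leaves if len(pts) <= nsig]
```

In this toy dataset the root box is over-full, so it is split even though two of its quadrants end up empty; those empty quadrants are the "blank spaces" that nsig governs.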
Usually, however, observations with a geospatial component will be provided with some form of spatial categorization. In the case of our collisions example, this comes in the form of a postal zip code or, as in the example that follows, a borough name. With the simple addition of this data column via the by parameter, our output changes radically, taking advantage of the additional context we now have to sort and aggregate our observations by (hopefully) geospatially meaningful, if still crude, grouped convex hulls.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             by='BOROUGH')
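To make "grouped convex hulls" concrete, here is a minimal sketch that groups points by a category and computes one hull per group. It uses Andrew's monotone chain algorithm as a stand-in; how aggplot builds its hulls internally is not stated here, and the function names and sample coordinates are hypothetical.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull via Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical observations: (x, y, category).
observations = [
    (0, 0, "BROOKLYN"), (2, 0, "BROOKLYN"), (2, 2, "BROOKLYN"),
    (0, 2, "BROOKLYN"), (1, 1, "BROOKLYN"),
    (5, 5, "QUEENS"), (7, 5, "QUEENS"), (6, 8, "QUEENS"),
]

# Group the coordinates by category, then build one hull per group.
groups = defaultdict(list)
for x, y, category in observations:
    groups[category].append((x, y))
hulls = {name: convex_hull(pts) for name, pts in groups.items()}
```

Each hull covers only the area spanned by its own observations, which is why the resulting regions are crude compared with real administrative boundaries.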
Finally, suppose you actually know exactly the geometries that you would like to aggregate by. Provide these to the geometry parameter in the form of a geopandas GeoSeries, one whose index matches the values in your by column (so BROOKLYN matches BROOKLYN, for example). Your output will now be an ordinary choropleth.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             by='BOROUGH', geometry=boroughs)
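Because the geometries are matched to the by column by exact value, it can be worth checking for mismatches (for instance in capitalization) before plotting. The helper below is a hypothetical sketch over plain Python lists; with pandas objects, something like `collisions['BOROUGH'].unique()` and `boroughs.index` could presumably be passed in instead.

```python
def unmatched_keys(by_values, geometry_index):
    """Return the values in the `by` column that have no matching geometry."""
    return sorted(set(by_values) - set(geometry_index))


# Hypothetical values: "Bronx" differs in case from the geometry index.
by_values = ["BROOKLYN", "QUEENS", "Bronx", "BROOKLYN"]
geometry_index = ["BROOKLYN", "QUEENS", "BRONX"]
missing = unmatched_keys(by_values, geometry_index)
```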
Observations will be aggregated by average, by default. In our example case, our plot shows that accidents in Manhattan tend to result in significantly fewer injuries than accidents occurring in other boroughs.
Choose which aggregation to use by passing a function to the agg parameter.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len)
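The effect of swapping the aggregation function can be illustrated without any plotting. In this sketch the `aggregate` helper and the three sample rows are hypothetical; it simply groups the hue values by the by column and applies whichever function is passed, so `len` turns the average into a count of observations.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, by, hue, agg=mean):
    """Group the `hue` values by the `by` column and reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[hue])
    return {key: agg(values) for key, values in groups.items()}


# Hypothetical rows standing in for the collisions dataset.
rows = [
    {"BOROUGH": "BROOKLYN", "NUMBER OF PERSONS INJURED": 0},
    {"BOROUGH": "BROOKLYN", "NUMBER OF PERSONS INJURED": 3},
    {"BOROUGH": "MANHATTAN", "NUMBER OF PERSONS INJURED": 1},
]

averages = aggregate(rows, "BOROUGH", "NUMBER OF PERSONS INJURED")
counts = aggregate(rows, "BOROUGH", "NUMBER OF PERSONS INJURED", agg=len)
totals = aggregate(rows, "BOROUGH", "NUMBER OF PERSONS INJURED", agg=sum)
```

Note that with `agg=len` the values in the hue column no longer matter; only the number of rows in each group does.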
legend toggles the legend.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len, legend=False)
Additional keyword arguments are passed to the underlying matplotlib.patches.Polygon instances.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len, linewidth=0)
Additional keyword arguments for styling the colorbar legend are passed using legend_kwargs.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len, linewidth=0,
             legend_kwargs={'orientation': 'horizontal'})