geoplot.sankey(*args, projection=None, start=None, end=None, path=None, hue=None, categorical=False, scheme=None, k=5, cmap='viridis', vmin=None, vmax=None, legend=False, legend_kwargs=None, legend_labels=None, legend_values=None, legend_var=None, extent=None, figsize=(8, 6), ax=None, scale=None, limits=(1, 5), scale_func=None, **kwargs)
A geospatial Sankey diagram (flow map).
Parameters: |
|
---|---|
Returns: | The axis object with the plot on it. |
Return type: | AxesSubplot or GeoAxesSubplot instance |
Examples
A Sankey diagram is a type of plot useful for visualizing flow through a network. Minard’s diagram of Napoleon’s ill-fated invasion of Russia is a classical example. Sankey diagrams suit any data that describes movement within a network (a graph): traffic load on a road network, for example, or typical airport traffic patterns.
This plot type is unusual amongst geoplot types in that it is meant for two columns of geography, resulting in a slightly different API. A basic sankey specifies data, start points, end points, and, optionally, a projection.
import geoplot as gplt
import geoplot.crs as gcrs
gplt.sankey(mock_data, start='origin', end='destination', projection=gcrs.PlateCarree())
However, Sankey diagrams need additional geospatial context to be interpretable. In this case (and for the remainder of the examples) we will provide this by overlaying world geometry.
ax = gplt.sankey(mock_data, start='origin', end='destination', projection=gcrs.PlateCarree())
ax.coastlines()
This function is very seaborn-like in that the usual df argument is optional. If geometries are provided as independent iterables it can be dropped.
ax = gplt.sankey(projection=gcrs.PlateCarree(), start=network['from'], end=network['to'])
ax.set_global()
ax.coastlines()
You may be wondering why the lines are curved. By default, each path is the shortest route between its two endpoints in the spherical sense, known as a great circle path. We can see this clearly in an orthographic projection.
ax = gplt.sankey(projection=gcrs.Orthographic(), start=network['from'], end=network['to'],
extent=(-180, 180, -90, 90))
ax.set_global()
ax.coastlines()
ax.outline_patch.set_visible(True)
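To make the great circle idea concrete, here is a minimal standalone sketch that interpolates points along the shortest spherical path between two (longitude, latitude) pairs. It is not how geoplot or cartopy draws the paths; the function names are hypothetical and the Earth is assumed to be a perfect sphere.

```python
import math

def to_unit_vector(lon_deg, lat_deg):
    """Convert a (longitude, latitude) pair in degrees to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def great_circle_points(start, end, n=5):
    """Return n (lon, lat) points evenly spaced along the great circle
    from start to end. Hypothetical helper; antipodal endpoints are not
    handled because the path between them is not unique."""
    if n < 2:
        raise ValueError("n must be at least 2")
    a, b = to_unit_vector(*start), to_unit_vector(*end)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # central angle between the two endpoints
    if omega == 0.0:
        return [start] * n
    points = []
    for i in range(n):
        t = i / (n - 1)
        # Spherical linear interpolation weights.
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        x, y, z = (wa * p + wb * q for p, q in zip(a, b))
        lon = math.degrees(math.atan2(y, x))
        lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
        points.append((lon, lat))
    return points

# Two endpoints on the 45th parallel, 120 degrees of longitude apart.
path = great_circle_points((-60.0, 45.0), (60.0, 45.0), n=3)
midpoint_lon, midpoint_lat = path[1]
```

The midpoint sits at roughly 63.4 degrees latitude, well north of the 45th parallel that both endpoints share, which is why the paths bow toward the poles on a flat projection.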
To plot using a different distance metric, pass it as an argument to the path parameter. Awkwardly, cartopy crs objects (not geoplot ones) are required.
import cartopy.crs as ccrs
ax = gplt.sankey(projection=gcrs.PlateCarree(), start=network['from'], end=network['to'],
path=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
One of the most powerful sankey features is that if your data has custom paths, you can use those instead with the path parameter.
gplt.sankey(dc, path=dc.geometry, projection=gcrs.AlbersEqualArea(), scale='aadt',
limits=(0.1, 10))
The hue parameter colorizes paths based on data.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to', path=ccrs.PlateCarree(),
hue='mock_variable')
ax.set_global()
ax.coastlines()
cmap changes the colormap.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu')
ax.set_global()
ax.coastlines()
legend adds a legend.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True)
ax.set_global()
ax.coastlines()
Pass keyword arguments to the legend with legend_kwargs. This is often necessary for positioning.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.4, 1.0)})
ax.set_global()
ax.coastlines()
Specify custom legend labels with legend_labels.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)},
legend_labels=['Very Low', 'Low', 'Average', 'High', 'Very High'])
ax.set_global()
ax.coastlines()
Change the number of bins with k.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)},
k=3)
ax.set_global()
ax.coastlines()
Change the binning scheme with scheme.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)},
k=3, scheme='equal_interval')
ax.set_global()
ax.coastlines()
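The effect of k and scheme is easier to see on a small list of numbers. The following sketch is not geoplot's implementation; the helper names are hypothetical, and it only contrasts equal-interval bins (equal width) with quantile bins (roughly equal counts) on skewed data.

```python
import math

def equal_interval_edges(values, k):
    """Upper bin edges that split [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]

def quantile_edges(values, k):
    """Upper bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Index of the last value that falls into each of the k bins.
    return [ordered[min(n - 1, math.ceil(n * (i + 1) / k) - 1)] for i in range(k)]

def bin_counts(values, edges):
    """Count how many values fall at or below each successive upper edge."""
    counts = [0] * len(edges)
    for value in values:
        for i, edge in enumerate(edges):
            if value <= edge:
                counts[i] += 1
                break
    return counts

data = [1, 2, 3, 4, 5, 100]  # one large outlier
equal_counts = bin_counts(data, equal_interval_edges(data, 3))
quantile_counts = bin_counts(data, quantile_edges(data, 3))
```

With the outlier present, the equal-interval bins hold 5, 0, and 1 values, while the quantile bins hold 2 values each. Which scheme reads better on a map depends on how skewed the variable is.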
If your variable of interest is already categorical, specify categorical=True to use the labels in your dataset directly.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='above_meridian', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.2, 1.0)},
categorical=True)
ax.set_global()
ax.coastlines()
scale can be used to enable linewidth as a visual variable.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
scale='mock_data',
legend=True, legend_kwargs={'bbox_to_anchor': (1.2, 1.0)},
color='lightblue')
ax.set_global()
ax.coastlines()
By default, the line widths are scaled according to the data such that the line for the minimum value is 0.2 times as wide as the line for the largest value (the default limits are (1, 5)). Adjust this using the limits parameter.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
scale='mock_data', limits=(1, 3),
legend=True, legend_kwargs={'bbox_to_anchor': (1.2, 1.0)},
color='lightblue')
ax.set_global()
ax.coastlines()
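As a rough illustration of what limits controls, the sketch below maps data values linearly onto a (minimum width, maximum width) range. It assumes the default scaling sends the smallest value to limits[0] and the largest to limits[1]; the function name linear_scale is hypothetical and this is not geoplot's source code.

```python
def linear_scale(minval, maxval, limits=(1, 5)):
    """Return a function mapping [minval, maxval] linearly onto limits.

    Assumption for illustration: minval maps to limits[0] and maxval
    maps to limits[1].
    """
    lo, hi = limits
    span = maxval - minval

    def scalar(val):
        if span == 0:
            # All data values are equal, so use the lower width.
            return lo
        return lo + (hi - lo) * (val - minval) / span

    return scalar

default_width = linear_scale(10, 110)                # limits=(1, 5)
narrow_width = linear_scale(10, 110, limits=(1, 3))  # as in the example above
widths = [default_width(v) for v in (10, 60, 110)]
```

Under this assumption the default limits give widths of 1, 3, and 5 for the values 10, 60, and 110, while limits=(1, 3) compresses the same values into the range 1 to 3.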
The default scaling function is a linear one. You can change the scaling function to whatever you want by specifying a scale_func input. This should be a factory function of two variables which, when given the minimum and maximum of the dataset, returns a scaling function which will be applied to the rest of the data.
def trivial_scale(minval, maxval):
    def scalar(val):
        return 2
    return scalar
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
scale='mock_data', scale_func=trivial_scale,
legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)},
color='lightblue')
ax.set_global()
ax.coastlines()
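A less trivial factory follows the same contract: it receives the dataset minimum and maximum and returns a function of a single value. The sketch below uses a square-root curve so that small values stay visible next to large ones. The name sqrt_scale and the hard-coded widths of 1 and 5 are assumptions made for illustration.

```python
import math

def sqrt_scale(minval, maxval):
    """Factory returning a square-root scaling function.

    The output widths 1.0 and 5.0 are hard-coded assumptions for this sketch.
    """
    lo_width, hi_width = 1.0, 5.0
    span = maxval - minval

    def scalar(val):
        if span == 0:
            # Degenerate dataset: every value gets the smallest width.
            return lo_width
        fraction = math.sqrt((val - minval) / span)
        return lo_width + (hi_width - lo_width) * fraction

    return scalar

scale_value = sqrt_scale(0, 100)
sample_widths = [scale_value(v) for v in (0, 25, 100)]
```

It would be passed the same way as trivial_scale, that is, scale_func=sqrt_scale. A value a quarter of the way through the data range already reaches the middle width, which spreads out the low end of a skewed variable.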
In case more than one visual variable is used, control which one appears in the legend using legend_var.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
scale='mock_data',
legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)},
hue='mock_data', legend_var="hue")
ax.set_global()
ax.coastlines()