"
]
},
{
"cell_type": "markdown",
"id": "293467b6",
"metadata": {},
"source": [
"In this exercise we will link plots generated with `hvplot` from the earthquake data using HoloViews linked selections.\n",
"\n",
"#### Loading the data as before"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ea4d657",
"metadata": {},
"outputs": [],
"source": [
"import pathlib\n",
"import pandas as pd\n",
"import holoviews as hv # noqa\n",
"\n",
"import hvplot.pandas # noqa: adds hvplot method to pandas objects\n",
"import hvplot.xarray # noqa: adds hvplot method to xarray objects"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "677f5904",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_parquet(pathlib.Path('../../data/earthquakes-projected.parq'))\n",
"df.index = df.index.tz_localize(None) # to prevent error in comparison\n",
"df = df.reset_index() # treat time like any other column to allow selections on any dimension\n",
"most_severe = df[df.mag >= 7]"
]
},
{
"cell_type": "markdown",
"id": "78539656",
"metadata": {},
"source": [
"#### The distribution of earthquakes over the Earth's surface \n",
"\n",
"##### Over latitude\n",
"\n",
"So far we have seen linked histograms, but the same approach generalizes to any other collection of plot types. This time we shall use `kind='points'` to generate points plots instead of histograms.\n",
"\n",
"First create a points plot called `lat_points` that plots the latitudes of the `most_severe` earthquakes over time. Customize it by making the points red plus-sign markers (`+`) and the plot responsive with height of `300` pixels. At the end of the cell, display this object.\n",
"\n",
" Hint \n",
"\n",
"The time of the earthquakes are in the `'time'` column while the latitudes are in the `'latitude'` column of `most_severe`. The points point color is controlled by the `color` keyword argument and `'red'` is a valid color specification. The height is controlled by a keyword of the same name and `marker='+'` will use that marker style.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7703556b",
"metadata": {},
"outputs": [],
"source": [
"lat_points = ... # Use hvplot here to visualize a latitude points from most_severe\n",
"lat_points # Display it"
]
},
{
"cell_type": "markdown",
"id": "f9de1ef9",
"metadata": {},
"source": [
"Solution \n",
"\n",
"```python\n",
"lat_points = most_severe.hvplot(\n",
" x='time', y='latitude', kind='points', color='red', marker='+', responsive=True, height=300)\n",
"lat_points \n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "94b45e6c",
"metadata": {},
"source": [
"##### Combined with a longitude points plot\n",
"\n",
"Now make a corresponding points plot over longitude called `lon_points` that plots the longitudes of the `most_severe` earthquakes over time. Customize it by making the points out of blue cross markers (`x`) with the same height of `300` pixels as before. It should be responsive and at the end of the cell, display this object in a layout with the previous `lat_points` plot. The longitude plot should be on the left and the latitude plot should be on the right."
]
},
{
"cell_type": "markdown",
"id": "5c006fb5",
"metadata": {},
"source": [
" Hint \n",
"\n",
"This plot is identical to the previous one except the name of the handle and the fact that the `'longitude'` column is now used and the points are colored blue. To combine the plots together, use the previous handle of `lat_points` to create a layout with `lon_points` and the HoloViews `+` operator.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d194a398",
"metadata": {},
"outputs": [],
"source": [
"lon_points = ... # Use hvplot here to visualize longitude points from most_severe\n",
"# Display it to the left of lat_points"
]
},
{
"cell_type": "markdown",
"id": "b97235be",
"metadata": {},
"source": [
"Solution \n",
"\n",
"```python\n",
"lon_points = most_severe.hvplot(\n",
" x='time', y='longitude', kind='points', color='blue', marker='x', responsive=True, height=300)\n",
"lon_points + lat_points\n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "34dd264e",
"metadata": {},
"source": [
"Now we have two points plots derived from the `most_severe` `DataFrame` but right now they are not linked.\n",
"\n",
"#### Linking the points plots\n",
"\n",
"Now we can use `hv.link_selections` to link these two points plots together. Create the same layout as before with the longitude points on the left and the latitude points on the right, but this time link them together.\n",
"\n",
"\n",
" Hint \n",
"\n",
"You will need to make a linked selection instance with `hv.link_selections.instance()` and pass the points layout to that instance in order to link the plots.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "075fb0ce",
"metadata": {},
"outputs": [],
"source": [
"# Display a longitude points plot on the left that is linked to a latitude points plot on the right"
]
},
{
"cell_type": "markdown",
"id": "9184083d",
"metadata": {},
"source": [
"Solution \n",
" \n",
"Combining the plots as well as the linked selection:\n",
"\n",
"```python\n",
"ls = hv.link_selections.instance()\n",
"\n",
"lat_points = most_severe.hvplot(\n",
" x='time', y='latitude', kind='points', color='red', marker='+', responsive=True, height=300)\n",
"\n",
"lon_points = most_severe.hvplot(\n",
" x='time', y='longitude', kind='points', color='blue', marker='x', responsive=True, height=300)\n",
"\n",
"ls(lon_points + lat_points)\n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "ef32c8ca",
"metadata": {},
"source": [
"Use the box select tool to confirm that these plots are now linked. Note that you can reset the selection with the reset button in the Bokeh toolbar.\n",
"\n",
"#### Analysing the filtered selection\n",
"\n",
"Now that we have linked plots, we can interactively select points in the visualization and then use that selection to filter the original `DataFrame`. After making a selection in the plot above, show a statistical summary of the points that have been selected.\n",
"\n",
" Hint \n",
"\n",
"The linked selection object has a `.filter` method that can filter your original `DataFrame` (`most_severe`). To compute statistics of a `DataFrame`, you can use the pandas `.describe()` method.\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94e54444",
"metadata": {},
"outputs": [],
"source": [
"# Display a summary of a linked selection in the plot above using the pandas .describe()"
]
},
{
"cell_type": "markdown",
"id": "abecfa31",
"metadata": {},
"source": [
"Solution \n",
"\n",
"Assume the handle to the linked selection is called `ls`:\n",
"\n",
"```python\n",
"ls.filter(most_severe).describe()\n",
"```\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "79130bbc",
"metadata": {},
"source": [
"#### Extra credit: Points vs. Scatter\n",
"\n",
"If you change `kind='points'` to `kind='scatter'` on one or both plots above, you should see what looks like precisely the same plots, so you may wonder why there are two different but so similar plots available. But if you then make a selection, you should be able to see the difference (try it both ways!). We used a points plot above, because the latitude and longitude do not have a dependent relationship with time, and a scatter plot requires and conveys such a relationship. Here if you make a box selection on a scatter plot, you'll see that the selected \"box\" is actually a region of the x axis, because the selection is applied only to the key dimensions (independent variables), not to any dependent variables. Whereas the same selection on a points plot will select on both latitude/longitude and time, resulting in a box-shaped selection rather than a horizontal span. Differences in behavior like this are why it is important to be explicit about what you are assuming about how your data dimensions are related. \n",
"\n",
"When in doubt, use a points plot unless you really do want to claim that (or investigate whether) `y` depends on `x`, as a scatter plot assumes. E.g. could you imagine swapping `x` and `y` and still have a meaningful plot? If not, it's a candidate for a scatter plot; otherwise it's a points plot."
]
}
],
"metadata": {
"language_info": {
"name": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}