{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 0: Getting started with MovingPandas\n", "\n", "\n", "\n", "MovingPandas provides a trajectory datatype based on GeoPandas.\n", "The project home is at https://github.com/anitagraser/movingpandas\n", "\n", "This tutorial presents some of the trajectory manipulation and visualization functions implemented in MovingPandas.\n", "\n", "After following this tutorial, you will have a basic understanding of what MovingPandas is and what it can be used for. You'll be ready to dive into application examples presented in the follow-up tutorials:\n", "* [Tutorial 1: Ship data analysis](1_ship_data_analysis.ipynb)\n", "* [Tutorial 2: Bird migration analysis](2_bird_migration_analysis.ipynb)\n", "* [Tutorial 3: Horse collar data exploration](3_horse_collar.ipynb)\n", "* [Tutorial 4: Trajectory aggregation (flow maps)](4_generalization_and_aggregation.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "MovingPandas follows the **trajectories = timeseries with geometries** approach of modeling movement data.\n", "\n", "A MovingPandas trajectory can be interpreted as either a time series of points or a time series of line segments.\n", "The line-based approach has many advantages for trajectory analysis and visualization. (For more detail, see e.g. Westermeier (2018).)\n", "\n", "![alt text](./data/trajectory_length.PNG \"Trajectory length\")\n", "![alt text](./data/trajectory_context.PNG \"Trajectory context\")\n", "![alt text](./data/trajectory_distance.PNG \"Trajectory distance\")\n", "\n", "\n", "### References\n", "\n", "* Graser, A. (2019). MovingPandas: Efficient Structures for Movement Data in Python. GI_Forum ‒ Journal of Geographic Information Science 2019, 1-2019, 54-68. doi:10.1553/giscience2019_01_s54. URL: https://www.austriaca.at/rootcollection?arp=0x003aba2b\n", "* Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. 
Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jupyter notebook setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "display(HTML(\"\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import urllib\n", "import os\n", "import pandas as pd\n", "import geopandas as gpd\n", "from geopandas import GeoDataFrame, read_file\n", "from shapely.geometry import Point, LineString, Polygon\n", "from fiona.crs import from_epsg\n", "from datetime import datetime, timedelta\n", "from matplotlib import pyplot as plt\n", "\n", "import sys\n", "sys.path.append(\"..\")\n", "import movingpandas as mpd\n", "print(mpd.__version__)\n", "\n", "import warnings\n", "warnings.simplefilter(\"ignore\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CRS_METRIC = from_epsg(31256)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a trajectory from scratch\n", "\n", "Trajectory objects consist of a trajectory ID and a GeoPandas GeoDataFrame with a DatetimeIndex. 
The data frame therefore represents the trajectory data as a Pandas time series with associated point locations (and optional further attributes).\n", "\n", "Let's create a small toy trajectory to see how this works:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame([\n", " {'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,0)},\n", " {'geometry':Point(6,0), 't':datetime(2018,1,1,12,6,0)},\n", " {'geometry':Point(6,6), 't':datetime(2018,1,1,12,10,0)},\n", " {'geometry':Point(9,9), 't':datetime(2018,1,1,12,15,0)}\n", "]).set_index('t')\n", "geo_df = GeoDataFrame(df, crs=CRS_METRIC)\n", "toy_traj = mpd.Trajectory(geo_df, 1)\n", "toy_traj.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access **key information** about our trajectory by looking at the print output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(toy_traj)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access the trajectory's GeoDataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing trajectories\n", "\n", "To **visualize the trajectory**, we can turn it into a linestring.\n", "\n", "(The notebook environment automatically plots Shapely geometry objects like the LineString returned by to_linestring().)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.to_linestring()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can **compute the speed** of movement along the trajectory (between consecutive points). 
The values are in meters per second:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.add_speed(overwrite=True)\n", "toy_traj.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also visualize the speed values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.plot(column=\"speed\", linewidth=5, capstyle='round', legend=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In contrast to the earlier example where we visualized the whole trajectory as one linestring, the trajectory plot() function draws each line segment individually and thus each can have a different color." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing trajectories" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MovingPandas provides many functions for trajectory analysis. \n", "\n", "To see all available functions of the MovingPandas.Trajectory class use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dir(mpd.Trajectory)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions that start with an underscore (e.g. ```__str__```) should not be called directly. All other functions are free to use." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting a moving object's position at a certain time\n", "\n", "For example, let's have a look at the get_position_at() function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "help(mpd.Trajectory.get_position_at)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we call this method, the resulting point is directly rendered:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method=\"nearest\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see its coordinates, we can look at the print output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method=\"nearest\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method parameter describes what the function should do if there is no entry in the trajectory GeoDataFrame for the specified timestamp. \n", "\n", "For example, there is no entry at 2018-01-01 12:07:00:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "toy_traj.df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method=\"nearest\"))\n", "print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method=\"interpolated\"))\n", "print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method=\"ffill\")) # from the previous row\n", "print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method=\"bfill\")) # from the following row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting trajectory segments based on time or geometry (i.e. 
clipping)\n", "\n", "First, let's extract the trajectory segment for a certain time period:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "segment = toy_traj.get_segment_between(datetime(2018,1,1,12,6,0),datetime(2018,1,1,12,12,0))\n", "print(segment)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's extract the trajectory segment that intersects with a given polygon:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xmin, xmax, ymin, ymax = 2, 8, -10, 5\n", "polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])\n", "polygon" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "intersections = toy_traj.clip(polygon)\n", "print(intersections[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "intersections[0].plot(linewidth=5, capstyle='round')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Beyond toy trajectories: Loading trajectory data from GeoPackage\n", "\n", "The MovingPandas repository contains a demo GeoPackage file that can be loaded as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "df = read_file('data/demodata_geolife.gpkg')\n", "df['t'] = pd.to_datetime(df['t'])\n", "df = df.set_index('t').tz_localize(None)\n", "print(\"Finished reading {} rows\".format(len(df)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After reading the trajectory point data from file, we want to construct the trajectories.\n", "\n", "There are two options:\n", "\n", "1. Manually calling the Trajectory constructor\n", "2. 
Using TrajectoryCollection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Option 1: Creating trajectories manually\n", "\n", "Pandas makes it straightforward to group trajectory points by trajectory id. After the grouping step, we can call the Trajectory constructor: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "trajectories = []\n", "for key, values in df.groupby('trajectory_id'):\n", " trajectory = mpd.Trajectory(values, key)\n", " print(trajectory)\n", " trajectories.append(trajectory)\n", "\n", "print(\"Finished creating {} trajectories\".format(len(trajectories)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Option 2: Creating trajectories with TrajectoryCollection\n", "\n", "TrajectoryCollection is a convenience class that takes care of creating trajectories from a GeoDataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id')\n", "print(traj_collection)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's look at one of those trajectories:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_traj = traj_collection.trajectories[1]\n", "print(my_traj)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True, vmax=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize trajectories in their geographical context, we can also create interactive plots with basemaps:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"my_traj.hvplot(c='speed', width=700, height=400, line_width=7.0, tiles='StamenTonerBackground', cmap='Viridis', colorbar=True, clim=(0,20))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "( my_traj.hvplot(c='speed', width=700, height=400, line_width=7.0, tiles='StamenTonerBackground', cmap='Viridis', colorbar=True, clim=(0,20)) * \n", " gpd.GeoDataFrame([my_traj.get_row_at(datetime(2009,6,29,8,0,0))]).hvplot(geo=True, size=200, color='red') )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trajectory manipulation and handling\n", "\n", "### Finding intersections with a Shapely polygon\n", "\n", "The clip function can be used to extract trajectory segments that are located within an area of interest polygon.\n", "\n", "This is how to use clip on a list of Trajectory objects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xmin, xmax, ymin, ymax = 116.3685035,116.3702945,39.904675,39.907728\n", "polygon = Polygon([(xmin,ymin), (xmin,ymax), (xmax,ymax), (xmax,ymin), (xmin,ymin)])\n", "\n", "intersections = []\n", "for traj in trajectories:\n", " for intersection in traj.clip(polygon):\n", " intersections.append(intersection)\n", "print(\"Found {} intersections\".format(len(intersections)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "intersections[2].plot(linewidth=5.0, capstyle='round')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, using **TrajectoryCollection**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clipped = traj_collection.clip(polygon)\n", "clipped.trajectories[2].plot(linewidth=5.0, capstyle='round')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Splitting trajectories\n", "\n", "Gaps are quite common in trajectories. 
For example, GPS tracks may contain gaps if moving objects enter tunnels where GPS reception is lost. In other use cases, moving objects may leave the observation area for a longer time before returning and continuing their recorded track.\n", "\n", "Depending on the use case, we therefore might want to split trajectories at observation gaps that exceed a certain minimum duration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_traj = trajectories[1]\n", "print(my_traj)\n", "my_traj.plot(linewidth=5.0, capstyle='round')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "split = mpd.ObservationGapSplitter(my_traj).split(gap=timedelta(minutes=5))\n", "for traj in split:\n", " print(traj)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(19,4))\n", "for i, traj in enumerate(split):\n", " traj.plot(ax=axes[i], linewidth=5.0, capstyle='round')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generalizing trajectories\n", "\n", "To reduce the size of trajectory objects, we can generalize them, for example, using the Douglas-Peucker algorithm:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "original_traj = trajectories[1]\n", "print(original_traj)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "original_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try different tolerance settings and observe the results in line geometry and therefore also length:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "generalized_traj = mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=0.001)\n", 
"generalized_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Original length: %s'%(original_traj.get_length()))\n", "print('Generalized length: %s'%(generalized_traj.get_length()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternative generalization method is to down-sample the trajectory to ensure a certain time delta between records:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time_generalized = mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))\n", "time_generalized.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time_generalized.df.head(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "original_traj.df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Continue exploring MovingPandas\n", "\n", "* [Tutorial 1: Ship data analysis](1_ship_data_analysis.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }