{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tour du Mont-Blanc 2016 Analysis\n", "\n", "### By Jeremy Tuloup - [@jtpio](https://twitter.com/jtpio)\n", "\n", "Blog post: [https://jtp.io/2017/02/12/tmb-2016-analysis.html](https://jtp.io/2017/02/12/tmb-2016-analysis.html)\n", "\n", "\n", "Facing new challenges is always a good thing. And they can be of any type.\n", "\n", "In August 2016, I decided to go hiking around Mont Blanc (the highest mountain in Western Europe).\n", "\n", "**9 days. 3 people. Carrying a tent and using it every day. ~160 km in total.**\n", "\n", "I had several goals in mind:\n", "\n", "1. For personal reasons, this is something I had wanted to do for about a year\n", "2. Record GPS data and write some kind of report with numbers and graphs\n", "3. Face a new challenge to push my physical limits and motivation further\n", "\n", "While I was preparing for the Tour du Mont-Blanc (commonly called TMB), it was quite difficult to find good information about the individual days. There are **plenty** of blog posts from other trekkers, books and guides with useful advice. That's true. But what I really wanted was a good online resource with the number of kilometers as well as an estimate of the elevation and times for all the different parts of the track. And I couldn't find any!\n", "\n", "The purpose of this notebook is to **provide this \"stats-oriented guide\"**, and show how convenient it can be to mix programming and data to tell a story." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gear\n", "\n", "The GPS data was recorded with a standard Android phone (Nexus 5).\n", "\n", "Every morning, we would simply start the recording, let the device do its job and stop the recording at the end of the day. Sometimes (although not systematically), I would place a mark to indicate longer breaks (e.g. lunch).\n", "\n", "Looking at the graphs below, you will notice irregularities in the data, caused by the quality of the sampling. 
They can be quite annoying, especially when computing summary statistics related to the elevation. But data analysis is also about this: dealing with data that is not necessarily clean and well formatted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the data\n", "\n", "There are several websites that make it possible to visualize GPS data.\n", "\n", "A good one is [maplorer.com](http://maplorer.com/view_gpx.html). Upload a GPX file to visualize both the trace map and the elevation graph. Quite useful for single tracks, but not very practical in our case when the goal is to combine several days on the same page.\n", "\n", "Thankfully, the Jupyter Notebook makes it quite easy to write text and show photos while plotting data. Super convenient to tell a story!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "This analysis was based on a fresh Anaconda environment." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:53:06) \n", "[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n" ] } ], "source": [ "import sys\n", "print(sys.version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define some tools and libraries needed to create the statistics and graphs. 
We rely on some useful packages:\n", "\n", "`pip install seaborn folium gpxpy piexif srtm.py`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import datetime\n", "import folium\n", "import glob\n", "import gpxpy\n", "import matplotlib.pyplot as plt\n", "import os\n", "import pandas as pd\n", "import piexif\n", "import seaborn as sns\n", "import srtm\n", "from collections import defaultdict\n", "from IPython.display import display, HTML\n", "from pytz import timezone\n", "from statistics import mean\n", "\n", "%matplotlib inline\n", "plt.rcdefaults()\n", "sns.set_style('darkgrid')\n", "sns.set_palette('deep', desat=.6)\n", "sns.set_context(rc={\"figure.figsize\": (16, 9)})\n", "\n", "# tweak table display\n", "display(HTML(''))\n", "\n", "france = timezone('Europe/Paris')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "folium will display the maps, and gpxpy will load and manipulate the GPX data.\n", "\n", "srtm will help fix the elevation data recorded by the Android device. The raw data is indeed not that good and shows lots of jumps in the elevation." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "DATA_FOLDER = './gps_data/'\n", "data_files = [os.path.join(DATA_FOLDER, f) for f in os.listdir(DATA_FOLDER)]\n", "\n", "# sort by day\n", "data_files.sort()\n", "\n", "# retrieve the elevation data to fix the raw (recorded) elevation\n", "elevation_data = srtm.get_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's build a mini framework that will be reused for each day. Doing so, we can minimize the number of lines of code and focus on the storytelling.\n", "\n", "To make it simpler, let's load all the data at once." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "day_points = []\n", "stats = defaultdict(list)\n", "\n", "for file in data_files:\n", "    with open(file, 'r') as f:\n", "        gpx = gpxpy.parse(f)\n", "\n", "    elevation_data.add_elevations(gpx, smooth=True)\n", "    points = gpx.get_points_data()\n", "    day_points.append({\n", "        'points': points,\n", "        'waypoints': gpx.waypoints\n", "    })\n", "\n", "    lowest, highest = gpx.get_elevation_extremes()\n", "    uphill, downhill = gpx.get_uphill_downhill()\n", "    stats['Date'].append(gpx.get_time_bounds().start_time)\n", "    stats['Distance'].append(round(points[-1].distance_from_start / 1000, 2))\n", "    stats['Duration'].append(str(datetime.timedelta(seconds=gpx.get_duration())))\n", "    stats['Lowest'].append(int(lowest))\n", "    stats['Highest'].append(int(highest))\n", "    stats['Uphill'].append(int(uphill))\n", "    stats['Downhill'].append(int(downhill))\n", "\n", "df = pd.DataFrame(\n", "    stats,\n", "    columns=['Date', 'Distance', 'Duration', 'Lowest', 'Highest', 'Uphill', 'Downhill']\n", ")\n", "\n", "df.columns = ['Date', 'Distance (km)', 'Duration', 'Lowest (m)', 'Highest (m)', 'Uphill (m)', 'Downhill (m)']\n", "# make the dates timezone-aware (Europe/Paris)\n", "df['Date'] = df['Date'].map(lambda dt: dt.replace(tzinfo=france))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is another interesting thing to do with an interactive map: place markers along the trail and show a photo taken at that position (similar to the way Google Maps does it).\n", "\n", "This requires having the GPS coordinates for each photo. While most smartphones add them automatically to the metadata, this is not the case for my old camera :(\n", "\n", "However, **the camera saves the time at which the photo was taken**. To associate a photo with a location, we can go through the list of recorded locations and pick the one whose timestamp is closest to the photo's." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# offset between the camera and the phone (still winter time)\n", "CLOCK_OFFSET = datetime.timedelta(hours=2, minutes=0)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Serve the images from the web so they are correctly displayed in the folium iframe\n", "BASE_PHOTO_URL = 'https://raw.githubusercontent.com/jtpio/data-playground/master/tmb/'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def plot_photos(m, points, day):\n", "    for img_file in glob.iglob('./photos/{}/*.JPG'.format(day), recursive=True):\n", "        exif_dict = piexif.load(img_file)\n", "        raw_date = exif_dict['0th'][piexif.ImageIFD.DateTime].decode('utf-8')\n", "        d = datetime.datetime.strptime(raw_date, '%Y:%m:%d %H:%M:%S') - CLOCK_OFFSET\n", "        closest_point = min(points, key=lambda p: abs(p.point.time - d)).point\n", "\n", "        # normalize the picture size\n", "        width = 600\n", "        resolution_x = exif_dict['Exif'][piexif.ExifIFD.PixelXDimension]\n", "        resolution_y = exif_dict['Exif'][piexif.ExifIFD.PixelYDimension]\n", "        ratio = resolution_x / resolution_y\n", "        height = width / ratio\n", "\n", "        lat, lng = closest_point.latitude, closest_point.longitude\n", "        img_link = os.path.join(BASE_PHOTO_URL, img_file)\n", "        img_html = ''.format(img_link, width)\n", "        img_frame = folium.element.IFrame(html=img_html, width=width + 20, height=height + 20)\n", "        popup = folium.Popup(img_frame, max_width=width + 20)\n", "        marker = folium.Marker((lat, lng), popup=popup,\n", "                               icon=folium.Icon(color='green', icon='picture'))\n", "        marker.add_to(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, given a set of points, plot them on a map." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def plot_track(m, points, waypoints, zoom):\n", "    points = [p.point for p in points]\n", "    mean_lat = mean(p.latitude for p in points)\n", "    mean_lng = mean(p.longitude for p in points)\n", "\n", "    # center the map on the track and set the zoom level\n", "    m.location = [mean_lat, mean_lng]\n", "    m.zoom_start = zoom\n", "\n", "    pts = [(p.latitude, p.longitude) for p in points]\n", "    folium.PolyLine(pts, color='red', weight=2.5, opacity=1).add_to(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a set of points, plot the elevation over the distance." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def plot_elevation(points, waypoints):\n", "    px = [p.distance_from_start / 1000 for p in points]\n", "    py = [p.point.elevation for p in points]\n", "    plt.plot(px, py)\n", "    plt.xlabel('Distance (km)')\n", "    plt.ylabel('Elevation (m)')\n", "    plt.xlim(0, px[-1])\n", "    plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a day, show its statistics." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def plot_day(day, zoom=13):\n", "    m = folium.Map()\n", "    points = day_points[day - 1]['points']\n", "    wps = day_points[day - 1]['waypoints']\n", "\n", "    display(HTML(df[day-1:day].to_html(index=False)))\n", "    plot_track(m, points, wps, zoom)\n", "    plot_photos(m, points, day)\n", "    display(m)\n", "\n", "    plot_elevation(points, wps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot all the days." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def plot_all_days(zoom=13):\n", "    points = [pt for pts in day_points for pt in pts['points']]\n", "    m = folium.Map()\n", "    plot_track(m, points, [], zoom)\n", "    display(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All set. Time for the trip!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Day 1: Les Houches - Chalets de Miage" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
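<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-01 08:11:29+02:00</td><td>15.29</td><td>5:53:51</td><td>988</td><td>2114</td><td>1356</td><td>776</td></tr></tbody></table>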
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
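<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-02 06:50:55+02:00</td><td>20.03</td><td>8:19:05</td><td>1146</td><td>2485</td><td>1567</td><td>683</td></tr></tbody></table>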
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
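<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-03 06:48:53+02:00</td><td>21.38</td><td>12:00:35</td><td>1783</td><td>2662</td><td>1085</td><td>1511</td></tr></tbody></table>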
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
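<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-04 08:15:36+02:00</td><td>15.39</td><td>9:45:39</td><td>1207</td><td>1999</td><td>177</td><td>959</td></tr></tbody></table>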
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
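<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-05 08:13:52+02:00</td><td>17.38</td><td>8:05:56</td><td>1262</td><td>2043</td><td>1285</td><td>793</td></tr></tbody></table>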
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
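<table><thead><tr><th>Date</th><th>Distance (km)</th><th>Duration</th><th>Lowest (m)</th><th>Highest (m)</th><th>Uphill (m)</th><th>Downhill (m)</th></tr></thead>
<tbody><tr><td>2016-08-06 08:22:57+02:00</td><td>14.99</td><td>6:38:24</td><td>1440</td><td>2534</td><td>683</td><td>1018</td></tr></tbody></table>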
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "