{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig, Oct 2017
converted to pandas Aug 2020
Data updated monthly
\n",
  "\n",
  "# Bike Stats Code\n",
  "\n",
  "Code to support the analysis in the notebook [Bike-Stats.ipynb](Bike-Stats.ipynb)." ] },
{ "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [
  "from IPython.display import HTML\n",
  "from typing import Iterator, Iterable, Tuple, List, Dict\n",
  "from collections import namedtuple\n",
  "import matplotlib\n",
  "import matplotlib.pyplot as plt\n",
  "import numpy as np\n",
  "import pandas as pd\n",
  "import re" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
  "# Reading Data: `rides`, `yearly`, and `daily`\n",
  "\n",
  "I saved a bunch of my recorded [Strava](https://www.strava.com/athletes/575579) rides, most of them longer than 25 miles, as [`bikerides.tsv`](bikerides.tsv). The tab-separated columns are: the date; a title; the elapsed time of the ride; the length of the ride in miles; and the total climbing in feet, e.g.:\n",
  "\n",
  "    Mon, 10/5/2020\tHalf way around the bay on bay trail\t6:26:35\t80.05\t541\n",
  "\n",
  "I parse the file into the pandas dataframe `rides`, adding derived columns for the year, miles per hour, vertical meters climbed per hour (VAM), grade in feet per mile, grade in percent, kilometers ridden, and meters climbed:" ] },
{ "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [], "source": [
  "def parse_rides(lines):\n",
  "    \"\"\"Parse a bikerides.tsv file.\"\"\"\n",
  "    return drop_index(add_ride_columns(pd.read_table(lines, comment='#',\n",
  "                      converters=dict(hours=parse_hours, feet=parse_int))))\n",
  "\n",
  "def parse_hours(time: str) -> float:\n",
  "    \"\"\"Parse '4:30:00' => 4.5 hours.\"\"\"\n",
  "    hrs = sum(int(x) * 60 ** (i - 2)\n",
  "              for i, x in enumerate(reversed(time.split(':'))))\n",
  "    return round(hrs, 2)\n",
  "\n",
  "def parse_int(field: str) -> int: return int(field.replace(',', '').replace('ft', '').replace('mi', ''))\n",
  "\n",
  "def add_ride_columns(rides) -> pd.DataFrame:\n",
  "    \"\"\"Compute new columns from existing ones.\"\"\"\n",
  "    mi, hr, ft = rides['miles'], rides['hours'], rides['feet']\n",
  "    if 'date' in rides and 'year' not in rides:\n",
  "        rides.insert(1, \"year\", [int(str(d).split('/')[-1]) for d in rides['date'].tolist()])\n",
  "    return rides.assign(\n",
  "        mph=round(mi / hr, 2),\n",
  "        vam=round(ft / hr / 3.28084),\n",
  "        fpmi=round(ft / mi),\n",
  "        pct=round(ft / mi * 100 / 5280, 2),\n",
  "        kms=round(mi * 1.609, 2),\n",
  "        meters=round(ft * 0.3048))\n",
  "\n",
  "def drop_index(frame) -> pd.DataFrame:\n",
  "    \"\"\"Replace the index labels with blanks (for cleaner display).\"\"\"\n",
  "    frame.index = [''] * len(frame)\n",
  "    return frame" ] },
{ "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [
  "rides  = parse_rides(open('bikerides.tsv'))\n",
  "\n",
  "yearly = parse_rides(open('bikeyears.tsv')).drop(columns='date')\n",
  "\n",
  "daily  = yearly.copy()   # daily averages: yearly totals divided by (6 days/week * 52 weeks)\n",
  "for name in 'hours miles feet kms meters'.split():\n",
  "    daily[name] = round(daily[name].map(lambda x: x / (6 * 52)), 1)" ] },
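{ "cell_type": "markdown", "metadata": {}, "source": [
  "As a quick sanity check, the cell below applies the parsing helpers to the example row quoted above. The expected values are hand-computed from the definitions of `parse_hours` and `parse_int`; the `1,255 ft` input is just an illustrative string, not a value from the data files:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
  "# Sanity check: parse the example bikerides.tsv row from the text above.\n",
  "row = 'Mon, 10/5/2020\\tHalf way around the bay on bay trail\\t6:26:35\\t80.05\\t541'\n",
  "date, title, time, miles, feet = row.split('\\t')\n",
  "assert parse_hours(time) == 6.44                  # 6 + 26/60 + 35/3600 hours, rounded to 2 places\n",
  "assert parse_int('1,255 ft') == 1255              # commas and unit suffixes are stripped\n",
  "assert round(float(miles) / parse_hours(time), 2) == 12.43   # about 12.4 mph for this ride" ] },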
{ "cell_type": "markdown", "metadata": {}, "source": [
  "# Reading Data: `segments`, `places`, and `tiles`\n",
  "\n",
  "I picked some representative climbing segments ([`bikesegments.csv`](bikesegments.csv)) with the segment length in miles and climb in feet, along with several of my times on the segment. A line like\n",
  "\n",
  "    Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44\n",
  "\n",
  "means that this segment of Old La Honda Rd is 2.98 miles long with 1255 feet of climbing, and I've selected three times for my rides on that segment: the fastest, middle, and slowest of the times that Strava shows. (However, I ended up dropping the slowest time in the charts to make them less busy.)\n",
  "\n",
  "I keep track of the percentage of roads ridden in various places in [`bikeplaceshort.csv`](bikeplaceshort.csv); the data comes from [wandrer.earth](https://wandrer.earth)." ] },
{ "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [
  "def parse_segments(lines) -> pd.DataFrame:\n",
  "    \"\"\"Parse segments into rides. Each ride is a tuple of:\n",
  "    (segment_title, time, miles, feet_climb).\"\"\"\n",
  "    records = []\n",
  "    for segment in lines:\n",
  "        title, mi, ft, *times = segment.split(',')[:5]   # keep at most the two fastest times\n",
  "        for time in times:\n",
  "            records.append((title, parse_hours(time), float(mi), parse_int(ft)))\n",
  "    return add_ride_columns(pd.DataFrame(records, columns=('title', 'hours', 'miles', 'feet')))" ] },
{ "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [
  "def make_clickable(comment) -> str:\n",
  "    \"\"\"Make the comment into a clickable link for a pandas dataframe: text before '!'\n",
  "    is the anchor text; the number after '!' is assumed to be a Strava activity id.\"\"\"\n",
  "    if '!' not in comment:\n",
  "        return comment\n",
  "    anchor, number = comment.split('!')\n",
  "    return f'<a href=\"https://www.strava.com/activities/{number}\">{anchor}</a>'\n",
  "\n",
  "def link_date(date) -> str:\n",
  "    \"\"\"Make the date into a clickable link.\"\"\"\n",
  "    m, d, y = date.split('/')\n",
  "    return f'{date}'" ] },
{ "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [
  "segments = parse_segments(open('bikesegments.csv'))\n",
  "\n",
  "places = drop_index(pd.read_table(open('bikeplaceshort.csv'), sep=',', comment='#'))\n",
  "\n",
  "tiles = drop_index(pd.DataFrame(columns='date square cluster total comment'.split(), data=[\n",
  "    ('02/25/2024', 14, 1196, 3279, 'Expanding through Santa Cruz and to the South!10838162005'),\n",
  "    ('01/01/2024', 14, 1056, 3105, 'Start of this year'),\n",
  "    ('12/08/2023', 14, 1042, 3084, 'Benicia ride connects East Bay and Napa clusters!10350071201'),\n",
  "    ('11/05/2023', 14,  932, 2914, 'Alum Rock ride gets 14x14 max square!8850905872'),\n",
  "    ('06/30/2023', 13,  689, 2640, 'Rides in East Bay fill in holes!9298603815'),\n",
  "    ('04/14/2023', 13,  630, 2595, 'Black Sands Beach low-tide hike connects Marin to max cluster!8891171008'),\n",
  "    ('03/04/2023', 13,  583, 2574, 'Almaden ride connects Gilroy to max cluster!8654437264'),\n",
  "    ('10/22/2022', 13,  396, 2495, 'Alviso levees to get to 13x13 max square!8003921626'),\n",
  "    ('10/16/2022', 12,  393, 2492, 'Milpitas ride connects East Bay to max cluster!7974994605'),\n",
  "    ('09/08/2022', 11,  300, 2487, 'First started tracking tiles')])\n",
  "    ).style.format({'comment': make_clickable, 'date': link_date})" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting and Curve-Fitting" ] },
{ "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [
  "plt.rcParams[\"figure.figsize\"] = (12, 6)\n",
  "\n",
  "def show(X, Y, data, title='', degrees=(2, 3)):\n",
  "    \"\"\"Plot X versus Y and best-fit curves of the given degrees, with some bells and whistles.\"\"\"\n",
  "    grid(); plt.ylabel(Y); plt.xlabel(X); plt.title(title)\n",
  "    plt.scatter(X, Y, data=data, c='grey', marker='+')\n",
  "    X1 = np.linspace(min(data[X]), max(data[X]), 100)\n",
  "    for degree in degrees:\n",
  "        F = poly_fit(data[X], data[Y], degree)\n",
  "        plt.plot(X1, [F(x) for x in X1], '-')\n",
  "\n",
  "def grid(axis='both'):\n",
  "    \"Turn on the grid.\"\n",
  "    plt.minorticks_on()\n",
  "    plt.grid(which='major', ls='-', alpha=3/4, axis=axis)\n",
  "    plt.grid(which='minor', ls=':', alpha=1/2, axis=axis)\n",
  "\n",
  "def poly_fit(X, Y, degree: int) -> callable:\n",
  "    \"\"\"The polynomial function of the given degree that best fits the X,Y vectors.\"\"\"\n",
  "    coeffs = np.polyfit(X, Y, degree)[::-1]\n",
  "    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))\n",
  "\n",
  "# Fit speed (mph) as a function of grade (feet of climb per mile) to my ride data:\n",
  "estimator = poly_fit(rides['feet'] / rides['miles'],\n",
  "                     rides['miles'] / rides['hours'], 2)\n",
  "\n",
  "def estimate(miles, feet, estimator=estimator) -> int:\n",
  "    \"\"\"Given a ride distance in miles and total climb in feet, estimate time in minutes.\"\"\"\n",
  "    return round(60 * miles / estimator(feet / miles))\n",
  "\n",
  "def top(frame, field, n=20): return drop_index(frame.sort_values(field, ascending=False).head(n))" ] },
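{ "cell_type": "markdown", "metadata": {}, "source": [
  "As a usage sketch, `estimate` can turn a planned distance and climb into a rough time budget. The two examples below are the ride and the segment quoted earlier; the printed numbers depend on whatever rides the estimator was fitted to:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
  "# Illustrative use of the fitted estimator; results depend on the data in bikerides.tsv.\n",
  "for miles, feet in [(80.05, 541), (2.98, 1255)]:   # the example ride above, and Old La Honda\n",
  "    print(f'{miles} miles with {feet} ft of climbing: estimated {estimate(miles, feet)} minutes')" ] },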
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Wandrer Places" ] },
{ "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [
  "def mapl(f, *values): return list(map(f, *values))\n",
  "\n",
  "def wandering(places=places, by=['pct']):\n",
  "    \"All those who wander are not lost.\"  # Also try by=['cat', 'pct']\n",
  "    F = drop_index(places.sort_values(by=by, ascending=('pct' not in by)))\n",
  "    pd.set_option('display.max_rows', None)\n",
  "    return pd.DataFrame(\n",
  "        {'pct': [f'{p:.1f}%' if (p > 1) else f'{p:.3f}%' for p in F['pct']],\n",
  "         'county': F['county'],\n",
  "         'name': F['name'],\n",
  "         'total': F['miles'],\n",
  "         'done': mapl(rounded, F['miles'] * F['pct'] / 100),\n",
  "         'to next badge': mapl(to_go, F['pct'], F['miles'])})\n",
  "\n",
  "def to_go(pct, miles, targets=(0.02, 0.1, 0.2, 1, 2, 25, 50, 90, 99)):\n",
  "    \"\"\"Describe the next percentage target to hit to get a badge.\"\"\"\n",
  "    done = pct * miles / 100\n",
  "    return next((f'{rounded(target / 100 * miles - done):>5} mi to {target}%'\n",
  "                 for target in targets\n",
  "                 if done < target / 100 * miles),\n",
  "                '')\n",
  "\n",
  "def rounded(x: float) -> str:\n",
  "    \"\"\"Round x to about 3 characters wide (if possible).\"\"\"\n",
  "    return (rounded(x/1e6) + 'M' if x > 1e6\n",
  "            else f'{x/1e6:4.2f}M' if x > 1e5\n",
  "            else f'{round(x):,d}' if x > 10\n",
  "            else f'{x:.1f}')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Pareto Front" ] },
{ "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [
  "def make_leaders(data):\n",
  "    \"\"\"Make a dataframe of wandrer.earth leaders in two counties.\n",
  "    (2814 and 7569 are the total road miles counted for San Mateo County (SMC)\n",
  "    and Santa Clara County (SCC), respectively.)\"\"\"\n",
  "    leaders = pd.DataFrame(data, columns=['Name', 'Initials', 'SMC %', 'SCC %'])\n",
  "    leaders['SMC miles'] = [round(2814 * d[2] / 100) for d in data]\n",
  "    leaders['SCC miles'] = [round(7569 * d[3] / 100) for d in data]\n",
  "    leaders['Total miles'] = leaders['SMC miles'] + leaders['SCC miles']\n",
  "    leaders['Avg %'] = (leaders['SMC %'] + leaders['SCC %']) / 2\n",
  "    return drop_index(leaders.sort_values('Avg %', ascending=False))\n",
  "\n",
  "leaders = make_leaders([ # Data as of Jan 3, 2024: (Name, Initials, SMC %, SCC %)\n",
  "    ('Megan Gardner',  'MG', 99.01, 13.6),\n",
  "    ('Barry Mann',     'BM', 77.41, 30.38),\n",
  "    ('Peter Norvig',   'PN', 63.5,  33.0),\n",
  "    ('Brian Feinberg', 'BF', 32.5,  43.9),\n",
  "    ('Jason Molenda',  'JM', 7.56,  56.25)\n",
  "    ])\n",
  "\n",
  "def pareto_front(leaders):\n",
  "    \"\"\"Scatter-plot the leaders, draw a dotted line through the Pareto front,\n",
  "    and label each point with initials; return the dataframe.\"\"\"\n",
  "    ax = leaders.plot('SMC %', 'SCC %', kind='scatter')\n",
  "    front = sorted((x, y) for i, (_, _, x, y, *_) in leaders.iterrows())\n",
  "    ax.plot(*zip(*front), ':'); ax.axis('square'); grid()\n",
  "    ax.set_xlabel('San Mateo County %')\n",
  "    ax.set_ylabel('Santa Clara County %')\n",
  "    for i, (name, initials, x, y, *_) in leaders.iterrows():\n",
  "        ax.text(x - 2, y + 2, initials)\n",
  "    return leaders" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Eddington Number" ] },
{ "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [
  "def Ed_number(rides, units) -> int:\n",
  "    \"\"\"Eddington number: the maximum integer e such that you have bicycled\n",
  "    a distance of at least e on at least e days.\"\"\"\n",
  "    distances = sorted(rides[units], reverse=True)\n",
  "    return max(e for e, d in enumerate(distances, 1) if d >= e)\n",
  "\n",
  "def Ed_gap(distances, target) -> int:\n",
  "    \"\"\"The number of additional rides of at least `target` distance needed to\n",
  "    reach an Eddington number of `target`.\"\"\"\n",
  "    return target - sum(distances >= target)\n",
  "\n",
  "def Ed_gaps(rides, E_km=103, E_mi=69, N=9) -> pd.DataFrame:\n",
  "    \"\"\"A table of gaps to the next N Eddington number targets, in kms and miles.\"\"\"\n",
  "    data = [(E_km + d, Ed_gap(rides.kms, E_km + d), E_mi + d, Ed_gap(rides.miles, E_mi + d))\n",
  "            for d in range(N)]\n",
  "    df = pd.DataFrame(data, columns=['kms', 'kms gap', 'miles', 'miles gap'])\n",
  "    return drop_index(df)\n",
  "\n",
  "def Ed_progress(rides, years=range(2024, 2013, -1)) -> pd.DataFrame:\n",
  "    \"\"\"A table of cumulative Eddington numbers (kms and miles) by year.\"\"\"\n",
  "    def Ed(year, unit): return Ed_number(rides[rides['year'] <= year], unit)\n",
  "    data = [(y, Ed(y, 'kms'), Ed(y, 'miles')) for y in years]\n",
  "    df = pd.DataFrame(data, columns=['year', 'Ed_km', 'Ed_mi'])\n",
  "    return drop_index(df)" ] },
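{ "cell_type": "markdown", "metadata": {}, "source": [
  "To make the definitions concrete, the cell below checks `Ed_number` and `Ed_gap` on a small made-up set of ride distances (illustrative only; the real analysis applies these functions to the `rides` dataframe):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
  "# Illustrative check of the Eddington-number helpers on made-up distances.\n",
  "toy = pd.DataFrame({'miles': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 60, 80]})\n",
  "assert Ed_number(toy, 'miles') == 6   # at least 6 rides of >= 6 miles, but not 7 rides of >= 7 miles\n",
  "assert Ed_gap(toy['miles'], 8) == 3   # five rides are >= 8 miles, so 3 more such rides are needed" ] }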
], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "toc-autonumbering": true }, "nbformat": 4, "nbformat_minor": 4 }