{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Parse map coordinates from metadata\n", "\n", "The harvest of digitised maps metadata includes a `coordinates` column that provides a string representation of either a point or a bounding box. This notebook attempts to parse the coordinate string and convert the values to decimals. It then uses the decimal values to explore the geographical context of Trove's digitised map collection.\n", "\n", "The coordinate strings are either:\n", "\n", "* Points in the format `(Longitude/Latitude)`, for example: '(E 145°33ʹ/S 37°42ʹ)'.\n", "* Bounding boxes in the format `(W--E/N--S)`, for example: '(E 114°00ʹ00ʺ--E 130°00ʹ00ʺ/S 14°00ʹ00ʺ--S 34°00ʹ00ʺ)'.\n", "\n", "I'm using [lat_lon_parser](https://github.com/NOAA-ORR-ERD/lat_lon_parser) to convert degrees/minutes/seconds to decimal values." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import logging\n", "from operator import itemgetter\n", "\n", "import altair as alt\n", "import folium\n", "import ipywidgets as widgets\n", "import pandas as pd\n", "from folium.plugins import FastMarkerCluster\n", "from ipyleaflet import ImageOverlay, Map, WidgetControl\n", "from lat_lon_parser import parse\n", "from vega_datasets import data as vega_data" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Save the parsing errors in a log file\n", "logging.basicConfig(\n", " filename=\"parse_errors.log\", level=logging.DEBUG, format=\"%(message)s\"\n", ")\n", "\n", "\n", "def check_coord(value, lat_lon):\n", " \"\"\"\n", " Make sure that lat/longs are within expected range.\n", " Drop values if outside range.\n", " \"\"\"\n", " if lat_lon == \"lat\" and abs(value) <= 90:\n", " return value\n", " elif lat_lon == \"lon\" and abs(value) <= 180:\n", " return value\n", " else:\n", " raise ValueError\n", " return None\n", "\n", "\n", "def get_center(parsed):\n", " \"\"\"\n", " Get the 
centre of a bounding box.\n", " Returns point coords.\n", "\n", " See: https://gis.stackexchange.com/a/394860\n", " \"\"\"\n", " e, w, n, s = itemgetter(\"east\", \"west\", \"north\", \"south\")(parsed)\n", " width = max(w, e) - min(w, e)\n", " # get the box height\n", " height = max(s, n) - min(s, n)\n", " # compute the center\n", " center = check_coord(round(min(s, n) + height / 2, 4), \"lat\"), check_coord(\n", " round(min(w, e) + width / 2, 4), \"lon\"\n", " )\n", " return center\n", "\n", "\n", "def parse_value(value):\n", " \"\"\"\n", " Parse latitude or longitude values.\n", " \"\"\"\n", " values = value.split(\"--\")\n", " # Sometimes single hyphens are used\n", " if len(values) == 1:\n", " values = value.split(\"-\")\n", " coords = [parse(v) for v in values]\n", " return sorted(coords)\n", "\n", "\n", "def parse_coords(coords):\n", " \"\"\"\n", " Parses a coordinate string, converting values to decimal.\n", "\n", " For points -- returns latitude and longitude.\n", " For boxes -- returns centre of box as latitude, longitude, and bounds as east, west, north, and south.\n", " \"\"\"\n", " parsed = {}\n", " # Default values\n", " for c in [\"east\", \"west\", \"north\", \"south\", \"latitude\", \"longitude\"]:\n", " parsed[c] = None\n", " try:\n", " # Split string into lat and long using /\n", " long, lat = coords.split(\"/\")\n", " if long.startswith(\"N\"):\n", " long, lat = lat, long\n", " longs = parse_value(long)\n", " lats = parse_value(lat)\n", " except (ValueError, TypeError):\n", " logging.error(coords)\n", " else:\n", " try:\n", " # Bounding box\n", " if len(longs) == 2 and len(lats) == 2:\n", " parsed[\"east\"] = check_coord(longs[-1], \"lon\")\n", " parsed[\"west\"] = check_coord(longs[0], \"lon\")\n", " parsed[\"north\"] = check_coord(lats[-1], \"lat\")\n", " parsed[\"south\"] = check_coord(lats[0], \"lat\")\n", " # Get centre of bounding box\n", " latitude, longitude = get_center(parsed)\n", " parsed[\"latitude\"] = latitude\n", " 
parsed[\"longitude\"] = longitude\n", " # Point\n", " elif len(longs) == 1 and len(lats) == 1:\n", " parsed[\"latitude\"] = check_coord(lats[0], \"lat\")\n", " parsed[\"longitude\"] = check_coord(longs[0], \"lon\")\n", " except ValueError:\n", " logging.error(coords)\n", " return parsed\n", "\n", "\n", "def get_coords(row):\n", " \"\"\"\n", " Process a row of the dataset, converting coordinate string into decimal values.\n", " \"\"\"\n", " coords = (\n", " str(row[\"coordinates\"]).strip(\".\").strip(\"(\").strip(\")\").strip(\"[\").strip(\"]\")\n", " )\n", " return parse_coords(coords)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the harvested data." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\n", " \"https://raw.githubusercontent.com/GLAM-Workbench/trove-maps-data/main/single_maps_20230131.csv\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many digitised maps have coordinate values?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "27158" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df[\"coordinates\"].notnull()].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parse the coordinate strings." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Extract a subset of the harvested data\n", "df_coords = df.loc[df[\"coordinates\"].notnull()][[\"title\", \"url\", \"coordinates\"]].copy()\n", "\n", "# Parse the coordinate values and save the results to new columns\n", "df_coords[[\"east\", \"west\", \"north\", \"south\", \"latitude\", \"longitude\"]] = df_coords.loc[\n", " df_coords[\"coordinates\"].notnull()\n", "].apply(get_coords, axis=1, result_type=\"expand\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a peek at the parsed data." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | title | url | coordinates | east | west | north | south | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ayers, from 5 to 20 m. S.E. b. S. of Fort Poin... | http://nla.gov.au/nla.obj-232162256 | (E 130⁰50'--E 131⁰00'/S 12⁰30'--S 12⁰40') | 131.000000 | 130.833333 | -12.500000 | -12.666667 | -12.5833 | 130.9167 |
| 3 | Bagot, from Fort Point to 20 m. E.N.E. drawn b... | http://nla.gov.au/nla.obj-232162365 | (E 130⁰40'--E 131⁰05'/S 12⁰20'--S 12⁰30') | 131.083333 | 130.666667 | -12.333333 | -12.500000 | -12.4167 | 130.8750 |
| 5 | Bundey, N.T. Frazer S. Crawford, photo-lithogr... | http://nla.gov.au/nla.obj-232164150 | (E 131⁰20'--E 131⁰25'/S 12⁰30'--S 12⁰40') | 131.416667 | 131.333333 | -12.500000 | -12.666667 | -12.5833 | 131.3750 |
| 7 | Cavenagh, from 18 to 32 m. S.S.E. of Fort Poin... | http://nla.gov.au/nla.obj-232162631 | (E 130⁰50'--E 131⁰05'/S 12⁰40'--S 12⁰55') | 131.083333 | 130.833333 | -12.666667 | -12.916667 | -12.7917 | 130.9583 |
| 8 | Colton, from 25 to 40 m. S.E. of Fort Point / ... | http://nla.gov.au/nla.obj-232162851 | (E 131⁰05'--E 131⁰20'/S 12⁰40'--S 12⁰55') | 131.333333 | 131.083333 | -12.666667 | -12.916667 | -12.7917 | 131.2083 |
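
The first row of the parsed data can be checked by hand. The sketch below is an illustration only: `dms_to_decimal` and `box_from_string` are hypothetical helpers written with the standard library as a stand-in for `lat_lon_parser.parse`, and they assume a well-formed bounding box that uses `--` as the range separator. They do not cover points, single hyphens, or malformed strings, which the notebook's own `parse_coords` function handles.

```python
import re


def dms_to_decimal(value):
    """Convert a string such as "E 130⁰50'" to decimal degrees (hypothetical helper)."""
    hemisphere = value.strip()[0].upper()
    # Collect degrees, minutes, seconds; pad missing parts with zeros.
    # [0-9] is used deliberately so the superscript-zero degree marker is ignored.
    numbers = re.findall(r"[0-9]+(?:\.[0-9]+)?", value) + ["0", "0"]
    degrees, minutes, seconds = (float(n) for n in numbers[:3])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative
    return -decimal if hemisphere in "SW" else decimal


def box_from_string(coords):
    """Parse a "(W--E/N--S)" bounding box string (hypothetical helper)."""
    lon_part, lat_part = coords.strip("()").split("/")
    west, east = sorted(dms_to_decimal(v) for v in lon_part.split("--"))
    south, north = sorted(dms_to_decimal(v) for v in lat_part.split("--"))
    return {
        "east": east,
        "west": west,
        "north": north,
        "south": south,
        # Centre of the box, rounded to 4 places as in the table
        "latitude": round((south + north) / 2, 4),
        "longitude": round((west + east) / 2, 4),
    }


box = box_from_string("(E 130⁰50'--E 131⁰00'/S 12⁰30'--S 12⁰40')")
print(box["latitude"], box["longitude"])  # -12.5833 130.9167
```

The centre values agree with the `latitude` and `longitude` columns for row 1. Taking the midpoint of the bounds this way assumes the box does not cross the antimeridian.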