{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Parse map coordinates from metadata\n", "\n", "The harvest of digitised maps metadata includes a `coordinates` column that provides a string representation of either a point or a bounding box. This notebook attempts to parse the coordinate string and convert the values to decimals. It then uses the decimal values to explore the geographical context of Trove's digitised map collection.\n", "\n", "The coordinate strings are either:\n", "\n", "* Points in the format `(Longitude/Latitude)`, for example: '(E 145°33ʹ/S 37°42ʹ)'.\n", "* Bounding boxes in the format `(W--E/N--S)`, for example: '(E 114°00ʹ00ʺ--E 130°00ʹ00ʺ/S 14°00ʹ00ʺ--S 34°00ʹ00ʺ)'.\n", "\n", "I'm using [lat_lon_parser](https://github.com/NOAA-ORR-ERD/lat_lon_parser) to convert degrees/minutes/seconds to decimal values." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import logging\n", "from operator import itemgetter\n", "\n", "import altair as alt\n", "import folium\n", "import ipywidgets as widgets\n", "import pandas as pd\n", "from folium.plugins import FastMarkerCluster\n", "from ipyleaflet import ImageOverlay, Map, WidgetControl\n", "from lat_lon_parser import parse\n", "from vega_datasets import data as vega_data" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Save the parsing errors in a log file\n", "logging.basicConfig(\n", " filename=\"parse_errors.log\", level=logging.DEBUG, format=\"%(message)s\"\n", ")\n", "\n", "\n", "def check_coord(value, lat_lon):\n", " \"\"\"\n", " Make sure that lat/longs are within expected range.\n", " Drop values if outside range.\n", " \"\"\"\n", " if lat_lon == \"lat\" and abs(value) <= 90:\n", " return value\n", " elif lat_lon == \"lon\" and abs(value) <= 180:\n", " return value\n", " else:\n", " raise ValueError\n", " return None\n", "\n", "\n", "def get_center(parsed):\n", " \"\"\"\n", " Get the centre of a bounding box.\n", " Returns point coords.\n", "\n", " See: https://gis.stackexchange.com/a/394860\n", " \"\"\"\n", " e, w, n, s = itemgetter(\"east\", \"west\", \"north\", \"south\")(parsed)\n", " width = max(w, e) - min(w, e)\n", " # get the box height\n", " height = max(s, n) - min(s, n)\n", " # compute the center\n", " center = check_coord(round(min(s, n) + height / 2, 4), \"lat\"), check_coord(\n", " round(min(w, e) + width / 2, 4), \"lon\"\n", " )\n", " return center\n", "\n", "\n", "def parse_value(value):\n", " \"\"\"\n", " Parse latitude or longitude values.\n", " \"\"\"\n", " values = value.split(\"--\")\n", " # Sometimes single hyphens are used\n", " if len(values) == 1:\n", " values = value.split(\"-\")\n", " coords = [parse(v) for v in values]\n", " return sorted(coords)\n", "\n", "\n", "def parse_coords(coords):\n", " \"\"\"\n", " Parses a coordinate string, converting values to decimal.\n", "\n", " For points -- returns latitude and longitude.\n", " For boxes -- returns centre of box as latitude, longitude, and bounds as east, west, north, and south.\n", " \"\"\"\n", " parsed = {}\n", " # Default values\n", " for c in [\"east\", \"west\", \"north\", \"south\", \"latitude\", \"longitude\"]:\n", " parsed[c] = None\n", " try:\n", " # Split string into lat and long using /\n", " long, lat = coords.split(\"/\")\n", " if long.startswith(\"N\"):\n", " long, lat = lat, long\n", " longs = parse_value(long)\n", " lats = parse_value(lat)\n", " except (ValueError, TypeError):\n", " logging.error(coords)\n", " else:\n", " try:\n", " # Bounding box\n", " if len(longs) == 2 and len(lats) == 2:\n", " parsed[\"east\"] = check_coord(longs[-1], \"lon\")\n", " parsed[\"west\"] = check_coord(longs[0], \"lon\")\n", " parsed[\"north\"] = check_coord(lats[-1], \"lat\")\n", " parsed[\"south\"] = check_coord(lats[0], \"lat\")\n", " # Get centre of bounding box\n", " latitude, longitude = get_center(parsed)\n", " parsed[\"latitude\"] = latitude\n", " parsed[\"longitude\"] = longitude\n", " # Point\n", " elif len(longs) == 1 and len(lats) == 1:\n", " parsed[\"latitude\"] = check_coord(lats[0], \"lat\")\n", " parsed[\"longitude\"] = check_coord(longs[0], \"lon\")\n", " except ValueError:\n", " logging.error(coords)\n", " return parsed\n", "\n", "\n", "def get_coords(row):\n", " \"\"\"\n", " Process a row of the dataset, converting coordinate string into decimal values.\n", " \"\"\"\n", " coords = (\n", " str(row[\"coordinates\"]).strip(\".\").strip(\"(\").strip(\")\").strip(\"[\").strip(\"]\")\n", " )\n", " return parse_coords(coords)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the harvested data." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\n", " \"https://raw.githubusercontent.com/GLAM-Workbench/trove-maps-data/main/single_maps_20230131.csv\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many digitised maps have coordinate values?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "27158" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df[\"coordinates\"].notnull()].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parse the coordinate strings." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Extract a subset of the harvested data\n", "df_coords = df.loc[df[\"coordinates\"].notnull()][[\"title\", \"url\", \"coordinates\"]].copy()\n", "\n", "# Parse the coordinate values and save the results to new columns\n", "df_coords[[\"east\", \"west\", \"north\", \"south\", \"latitude\", \"longitude\"]] = df_coords.loc[\n", " df_coords[\"coordinates\"].notnull()\n", "].apply(get_coords, axis=1, result_type=\"expand\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a peek at the parsed data." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleurlcoordinateseastwestnorthsouthlatitudelongitude
1Ayers, from 5 to 20 m. S.E. b. S. of Fort Poin...http://nla.gov.au/nla.obj-232162256(E 130⁰50'--E 131⁰00'/S 12⁰30'--S 12⁰40')131.000000130.833333-12.500000-12.666667-12.5833130.9167
3Bagot, from Fort Point to 20 m. E.N.E. drawn b...http://nla.gov.au/nla.obj-232162365(E 130⁰40'--E 131⁰05'/S 12⁰20'--S 12⁰30')131.083333130.666667-12.333333-12.500000-12.4167130.8750
5Bundey, N.T. Frazer S. Crawford, photo-lithogr...http://nla.gov.au/nla.obj-232164150(E 131⁰20'--E 131⁰25'/S 12⁰30'--S 12⁰40')131.416667131.333333-12.500000-12.666667-12.5833131.3750
7Cavenagh, from 18 to 32 m. S.S.E. of Fort Poin...http://nla.gov.au/nla.obj-232162631(E 130⁰50'--E 131⁰05'/S 12⁰40'--S 12⁰55')131.083333130.833333-12.666667-12.916667-12.7917130.9583
8Colton, from 25 to 40 m. S.E. of Fort Point / ...http://nla.gov.au/nla.obj-232162851(E 131⁰05'--E 131⁰20'/S 12⁰40'--S 12⁰55')131.333333131.083333-12.666667-12.916667-12.7917131.2083
\n", "
" ], "text/plain": [ " title \\\n", "1 Ayers, from 5 to 20 m. S.E. b. S. of Fort Poin... \n", "3 Bagot, from Fort Point to 20 m. E.N.E. drawn b... \n", "5 Bundey, N.T. Frazer S. Crawford, photo-lithogr... \n", "7 Cavenagh, from 18 to 32 m. S.S.E. of Fort Poin... \n", "8 Colton, from 25 to 40 m. S.E. of Fort Point / ... \n", "\n", " url \\\n", "1 http://nla.gov.au/nla.obj-232162256 \n", "3 http://nla.gov.au/nla.obj-232162365 \n", "5 http://nla.gov.au/nla.obj-232164150 \n", "7 http://nla.gov.au/nla.obj-232162631 \n", "8 http://nla.gov.au/nla.obj-232162851 \n", "\n", " coordinates east west \\\n", "1 (E 130⁰50'--E 131⁰00'/S 12⁰30'--S 12⁰40') 131.000000 130.833333 \n", "3 (E 130⁰40'--E 131⁰05'/S 12⁰20'--S 12⁰30') 131.083333 130.666667 \n", "5 (E 131⁰20'--E 131⁰25'/S 12⁰30'--S 12⁰40') 131.416667 131.333333 \n", "7 (E 130⁰50'--E 131⁰05'/S 12⁰40'--S 12⁰55') 131.083333 130.833333 \n", "8 (E 131⁰05'--E 131⁰20'/S 12⁰40'--S 12⁰55') 131.333333 131.083333 \n", "\n", " north south latitude longitude \n", "1 -12.500000 -12.666667 -12.5833 130.9167 \n", "3 -12.333333 -12.500000 -12.4167 130.8750 \n", "5 -12.500000 -12.666667 -12.5833 131.3750 \n", "7 -12.666667 -12.916667 -12.7917 130.9583 \n", "8 -12.666667 -12.916667 -12.7917 131.2083 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_coords.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many of the coordinate strings could be successfuly parsed as decimal values?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26591" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_coords.loc[\n", " (df_coords[\"latitude\"].notnull()) & (df_coords[\"longitude\"].notnull())\n", "].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save the parsed coordinates." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "df_coords.to_csv(\"single_maps_20230131_coordinates.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display the geographical distribution of the digitised maps\n", "\n", "To visualise the locations of the maps we can plot the centre points using Altair. Mouseover the points to view the map titles." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This loads the country boundaries data\n", "countries = alt.topo_feature(vega_data.world_110m.url, feature=\"countries\")\n", "\n", "# First we'll create the world map using the boundaries\n", "background = (\n", " alt.Chart(countries)\n", " .mark_geoshape(fill=\"lightgray\", stroke=\"white\")\n", " .project(\"equirectangular\")\n", " .properties(width=900, height=500)\n", ")\n", "\n", "# Then we'll plot the positions of places using circles\n", "points = (\n", " # By loading the data from url we stop the notebook from getting bloated\n", " alt.Chart(\n", " \"https://raw.githubusercontent.com/GLAM-Workbench/trove-maps-data/main/single_maps_20230131_coordinates.csv\"\n", " )\n", " .mark_circle(\n", " # Style the circles\n", " size=10,\n", " color=\"steelblue\",\n", " opacity=0.4,\n", " )\n", " .encode(\n", " # Provide the coordinates\n", " longitude=\"longitude:Q\",\n", " latitude=\"latitude:Q\",\n", " # More info on hover\n", " tooltip=[\"title:N\", \"url:N\"],\n", " )\n", " .properties(width=900, height=500)\n", ")\n", "\n", "# Finally we layer the plotted points on top of the backgroup map\n", "alt.layer(background, points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The obvious 'crosshairs' at the 0,0 point suggest that some of the coordinates might be missing data. If you explore the points you'll probably find some that seem to be in the wrong position, perhaps because the latitudes or longitudes have been flipped. More checking would be needed if you were using this dataset for detailed analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use Folium to plot the map locations, creating a marker for each point and grouping them into clusters. Folium's FastMarkerCluster plugin lets us add thousands of markers without slowing things down. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = folium.Map(location=(0, 0), zoom_start=2)\n", "\n", "callback = (\n", " \"function (row) {\"\n", " 'var marker = L.marker(new L.LatLng(row[0], row[1]), {color: \"blue\", tooltip: row[2]});'\n", " \"var popup = L.popup({maxWidth: '300'});\"\n", " \"var icon = L.AwesomeMarkers.icon({\"\n", " \"icon: 'globe',\"\n", " \"iconColor: 'white',\"\n", " \"markerColor: 'blue',\"\n", " \"prefix: 'glyphicon',\"\n", " \"extraClasses: 'fa-rotate-0'\"\n", " \"});\"\n", " \"marker.setIcon(icon);\"\n", " \"const title = row[2];\"\n", " \"const url = row[3];\"\n", " \"var mytext = $(`

${title}

${url}
`)[0];\"\n", " \"popup.setContent(mytext);\"\n", " \"marker.bindPopup(popup);\"\n", " \"return marker};\"\n", ")\n", "\n", "m.add_child(\n", " FastMarkerCluster(\n", " df_coords.loc[\n", " (df_coords[\"latitude\"].notnull()) & (df_coords[\"longitude\"].notnull())\n", " ][[\"latitude\", \"longitude\", \"title\", \"url\"]],\n", " callback=callback,\n", " )\n", ")\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I've also saved the Folium map as [an HTML file](https://glam-workbench.net/trove-maps/trove-map-clusters.html) for easy browsing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display map image on a modern basemap\n", "\n", "To view a single map in context, we can use IPyLeaflet to layer the map image on top of a modern basemap, using the bounding box data to position the image. We'll also add a slider to change the opacity of the map image, so we can compare it to the basemap underneath.\n", "\n", "\n", "\n", "The results will, of course, depend on both the accuracy of the coordinates and the nature of the map. For more accurate results we'd want to use something like [MapWarper](https://mapwarper.net/) to georectify the map image. Note too that if a map image has a lot of white space or text around the map itself, the proportions of the image might not match the bounds of the map. This will result in the image being distorted." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select a random record with a bounding box\n", "random = df_coords.loc[df_coords[\"east\"].notnull()].sample(n=1).iloc[0]\n", "\n", "default_layout = widgets.Layout(width=\"900px\", height=\"540px\")\n", "\n", "# Create the basemap and centre it on the map location\n", "m = Map(\n", " center=(random[\"latitude\"], random[\"longitude\"]),\n", " zoom=8,\n", " scroll_wheel_zoom=True,\n", " layout=default_layout,\n", ")\n", "\n", "# Add the map image as an overlay\n", "image = ImageOverlay(\n", " # Image url\n", " url=f\"{random['url']}/image\",\n", " # Postion the image using the bounding box data\n", " bounds=((random[\"south\"], random[\"west\"]), (random[\"north\"], random[\"east\"])),\n", ")\n", "\n", "# Add a slider to control the opacity of the image\n", "slider = widgets.FloatSlider(\n", " min=0,\n", " max=1,\n", " value=1, # Opacity is valid in [0,1] range\n", " orientation=\"vertical\", # Vertical slider is what we want\n", " readout=False, # No need to show exact value\n", " layout=widgets.Layout(width=\"2em\"),\n", ")\n", "\n", "# Connect slider value to opacity property of the Image Layer\n", "widgets.jslink((slider, \"value\"), (image, \"opacity\"))\n", "\n", "m.add_control(WidgetControl(widget=slider))\n", "\n", "m.add_layer(image)\n", "print(random[\"title\"])\n", "print(random[\"url\"])\n", "\n", "# Set the map zoom so you can see all of the image.\n", "m.fit_bounds([[random[\"south\"], random[\"west\"]], [random[\"north\"], random[\"east\"]]])\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "vscode": { "interpreter": { "hash": "f54aba2de7a75230217f549a064c6555500d2132634fbcab9606dbfda34a2a1b" } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }