{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring digitised maps in Trove\n", "\n", "If you've ever poked around in Trove's 'map' zone, you might have noticed the beautiful deep-zoomable images available for many of the NLA's digitised maps. Even better, in many cases the high-resolution TIFF versions of the digitised maps are available for download.\n", "\n", "I knew there were lots of great maps you could download from Trove, but how many? And how big were the files? I thought I'd try to quantify this a bit by harvesting and analysing the metadata.\n", "\n", "The sizes of the downloadable files (both in bytes and pixels) are [embedded within the landing pages](https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-books/blob/master/Metadata-for-Trove-digitised-works.ipynb) for the digitised maps, so harvesting the metadata involves a number of steps:\n", "\n", "* Use the Trove API to search for maps that include the phrase \"nla.obj\" – this filters the results to maps that have been digitised and are available through Trove.\n", "* Work through the results, checking whether each record includes a link to a digital copy.\n", "* If there is a digital copy, extract the embedded work data from the landing page.\n", "* Scrape the copyright status from the page.\n", "\n", "**2023 update!** It turns out that the embedded work data includes MARC descriptions containing some metadata that's not available through the API, including the map scale and coordinates. The coordinates can be either a point or a bounding box. I've saved these values as well, and explored some ways of parsing and visualising the coordinates in this notebook.\n", "\n",
"The fields in the harvested dataset are:\n", "\n", "* `title` – title of the map\n", "* `url` – url to the map in the digitised file viewer\n", "* `work_url` – url to the work in the Trove map category\n", "* `identifier` – NLA identifier\n", "* `date` – date published or created\n", "* `creators` – creators of the map\n", "* `publication` – publication place, publisher, and publication date (if available)\n", "* `extent` – physical description of the map\n", "* `copyright_status` – copyright status based on available metadata (scraped from web page)\n", "* `scale` – map scale\n", "* `coordinates` – map coordinates, either a point or a bounding box (format is `W--E/N--S`, eg: `E 130°50'--E 131°00'/S 12°30'--S 12°40'`)\n", "* `filesize_string` – human-readable file size (eg '3.38GB')\n", "* `filesize` – size of TIFF file in bytes\n", "* `width` – width of TIFF in pixels\n", "* `height` – height of TIFF in pixels\n", "* `copy_role` – I'm not sure what the values in this field signify, but as described below, you can use them to download high-res TIFF images\n", "\n",
"## Getting map images\n", "\n", "There are a few undocumented tricks that make it easy to programmatically download images of the maps.\n", "\n", "* To view the JPG version, just add `/image` to the map url. For example: http://nla.gov.au/nla.obj-232162256/image\n", "* The JPG image will be at the highest available resolution, but you can request smaller versions using the `wid` parameter to specify a pixel width. For example: http://nla.gov.au/nla.obj-232162256/image?wid=400\n", "* There seems to be an upper limit on the resolution of the JPG versions; higher resolutions might be available via the TIFF file, which you can download by adding the `copy_role` value to the url. For example, if the `copy_role` is 'm', this url will download the TIFF: http://nla.gov.au/nla.obj-232162256/m (note that some of these files are very, very large – you might want to check the `filesize` before downloading)\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "import json\n", "import os\n", "import re\n", "import time\n", "import warnings\n", "\n", "warnings.simplefilter(action=\"ignore\", category=FutureWarning)\n", "\n", "import altair as alt\n", "import pandas as pd\n", "import requests_cache\n", "from bs4 import BeautifulSoup\n", "from IPython.display import FileLink, display\n", "from requests.adapters import HTTPAdapter\n", "from requests.packages.urllib3.util.retry import Retry\n", "from tqdm.auto import tqdm\n", "\n", "s = requests_cache.CachedSession()\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n", "s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n", "s.mount(\"http://\", HTTPAdapter(max_retries=retries))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Load variables from the .env file if it exists\n", "# Use %%capture to suppress messages\n", "%load_ext dotenv\n", "%dotenv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## You'll need a Trove API key to harvest the data." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your API key is: gq29l1g1h75pimh4\n" ] } ], "source": [ "# This creates a variable called 'api_key', paste your key between the quotes\n", "api_key = \"\"\n", "\n", "# Use an api key value from environment variables if it is available (useful for testing)\n", "if os.getenv(\"TROVE_API_KEY\"):\n", "    api_key = os.getenv(\"TROVE_API_KEY\")\n", "\n", "# This displays a message with your key\n", "print(\"Your API key is: {}\".format(api_key))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define some functions to do the work" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [
"def get_total_results(params):\n", "    \"\"\"\n", "    Get the total number of results for a search.\n", "    \"\"\"\n", "    these_params = params.copy()\n", "    these_params[\"n\"] = 0\n", "    response = s.get(\"https://api.trove.nla.gov.au/v2/result\", params=these_params)\n", "    data = response.json()\n", "    return int(data[\"response\"][\"zone\"][0][\"records\"][\"total\"])\n", "\n", "\n",
"def get_fulltext_url(links):\n", "    \"\"\"\n", "    Loop through the identifiers to find a link to the digital version of the map.\n", "    \"\"\"\n", "    url = None\n", "    for link in links:\n", "        if link[\"linktype\"] == \"fulltext\" and \"nla.obj\" in link[\"value\"]:\n", "            url = link[\"value\"]\n", "            break\n", "    return url\n", "\n", "\n",
"def get_copyright_status(response=None, url=None):\n", "    \"\"\"\n", "    Scrape copyright information from a digital work page.\n", "    \"\"\"\n", "    if url and not response:\n", "        response = s.get(url)\n", "    if response:\n", "        soup = BeautifulSoup(response.text, \"lxml\")\n", "        try:\n", "            copyright_status = str(\n", "                soup.find(\"div\", id=\"tab-access\").find(\"p\", class_=\"decorative\").string\n", "            )\n", "            return copyright_status\n", "        # No access tab\n", "        except AttributeError:\n", "            pass\n", "    return None\n", "\n", "\n",
"def get_work_data(url):\n", "    \"\"\"\n", "    Extract the work data embedded as a JSON string in the work's HTML page.\n", "    \"\"\"\n", "    response = s.get(url)\n", "    try:\n", "        work_data = json.loads(\n", "            re.search(\n", "                r\"var work = JSON\\.parse\\(JSON\\.stringify\\((\\{.*\\})\", response.text\n", "            ).group(1)\n", "        )\n", "    except (AttributeError, TypeError):\n", "        work_data = {}\n", "    # else:\n", "    # If there's no copyright info in the work data, then scrape it\n", "    # if \"copyrightPolicy\" not in work_data:\n", "    #     work_data[\"copyrightPolicy\"] = get_copyright_status(response)\n", "    if not response.from_cache:\n", "        time.sleep(0.2)\n", "    return work_data\n", "\n", "\n",
"def find_field_content(record, tag, subfield):\n", "    \"\"\"\n", "    Loop through a MARC record looking for tag/subfield.\n", "    If found, return the subfield value.\n", "    \"\"\"\n", "    try:\n", "        for field in record[\"datafield\"]:\n", "            if field[\"tag\"] == tag:\n", "                if isinstance(field[\"subfield\"], list):\n", "                    for sfield in field[\"subfield\"]:\n", "                        if sfield[\"code\"] == subfield:\n", "                            return sfield[\"content\"]\n", "                else:\n", "                    if field[\"subfield\"][\"code\"] == subfield:\n", "                        return field[\"subfield\"][\"content\"]\n", "    except (KeyError, TypeError):\n", "        pass\n", "    return None\n", "\n", "\n",
"def get_marc_field(work_data, tag, subfield):\n", "    \"\"\"\n", "    Loop through all the MARC records in work metadata looking for a tag/subfield.\n", "    If found, return the subfield value.\n", "    \"\"\"\n", "    if \"marcData\" in work_data and work_data[\"marcData\"]:\n", "        for record in work_data[\"marcData\"][\"record\"]:\n", "            content = find_field_content(record, tag, subfield)\n", "            if content:\n", "                return content\n", "    return None\n", "\n", "\n",
"def format_bytes(size):\n", "    \"\"\"\n", "    Convert a number of bytes to a (size, unit) tuple for human-readable display.\n", "    \"\"\"\n", "    # 2**10 = 1024\n", "    power = 2**10\n", "    n = 0\n", "    power_labels = {0: \"\", 1: \"K\", 2: \"M\", 3: \"G\", 4: \"T\"}\n", "    # Keep dividing by 1024 until the value is below 1024 (stopping at TB)\n", "    while size >= power and n < len(power_labels) - 1:\n", "        size /= power\n", "        n += 1\n", "    return size, power_labels[n] + \"B\"\n", "\n", "\n",
"def get_publication_details(work_data):\n", "    \"\"\"\n", "    Get MARC values for publication details and combine into a single string.\n", "    \"\"\"\n", "    parts = []\n", "    for code in [\"a\", \"b\", \"c\"]:\n", "        value = get_marc_field(work_data, 260, code)\n", "        if value:\n", "            parts.append(str(value))\n", "    return \" \".join(parts)\n", "\n", "\n",
"def get_map_data(work_data):\n", "    \"\"\"\n", "    Look for file size and pixel dimensions in the embedded work data.\n", "    \"\"\"\n", "    map_data = {}\n", "    width = None\n", "    height = None\n", "    num_bytes = None\n", "    copy_role = None\n", "    try:\n", "        # Make sure there's a downloadable version\n", "        if (\n", "            work_data.get(\"accessConditions\") == \"Unrestricted\"\n", "            and \"copies\" in work_data\n", "        ):\n", "            for copy in work_data[\"copies\"]:\n", "                # Get the pixel dimensions\n", "                if \"technicalmetadata\" in copy:\n", "                    width = copy[\"technicalmetadata\"].get(\"width\")\n", "                    height = copy[\"technicalmetadata\"].get(\"height\")\n", "                # Get filesize in bytes\n", "                elif (\n", "                    copy[\"copyrole\"] in [\"m\", \"o\", \"i\", \"fd\"]\n", "                    and copy[\"access\"] == \"true\"\n", "                ):\n", "                    num_bytes = copy.get(\"filesize\")\n", "                    copy_role = copy[\"copyrole\"]\n", "            if width and height and num_bytes:\n", "                # Convert bytes to something human friendly\n", "                size, unit = format_bytes(num_bytes)\n", "                map_data[\"filesize_string\"] = \"{:.2f}{}\".format(size, unit)\n", "                map_data[\"filesize\"] = num_bytes\n", "                map_data[\"width\"] = width\n", "                map_data[\"height\"] = height\n", "                map_data[\"copy_role\"] = copy_role\n", "\n", "    except AttributeError:\n", "        pass\n", "    return map_data\n", "\n", "\n",
"def get_maps():\n", "    \"\"\"\n", "    Harvest metadata about maps.\n", "    \"\"\"\n", "    url = \"https://api.trove.nla.gov.au/v2/result\"\n", "    maps = []\n", "    params = {\n", "        \"q\": '\"nla.obj-\"',\n", "        \"zone\": \"map\",\n", "        \"l-availability\": \"y\",\n", "        \"l-format\": \"Map/Single map\",\n", "        \"bulkHarvest\": \"true\",  # Needed to maintain a consistent order across requests\n", "        \"key\": api_key,\n", "        \"n\": 100,\n", "        \"encoding\": \"json\",\n", "    }\n", "    start = \"*\"\n", "    total = get_total_results(params)\n", "    with tqdm(total=total) as pbar:\n", "        while start:\n", "            params[\"s\"] = start\n", "            response = s.get(url, params=params)\n", "            data = response.json()\n", "            # If there's a nextStart value then we use it to request the next page of results\n", "            try:\n", "                start = data[\"response\"][\"zone\"][0][\"records\"][\"nextStart\"]\n", "            except KeyError:\n", "                start = None\n", "            for work in tqdm(\n", "                data[\"response\"][\"zone\"][0][\"records\"][\"work\"], leave=False\n", "            ):\n", "                # Check to see if there's a link to a digital version\n", "                try:\n", "                    fulltext_url = get_fulltext_url(work[\"identifier\"])\n", "                except KeyError:\n", "                    pass\n", "                else:\n", "                    if fulltext_url:\n", "                        work_data = get_work_data(fulltext_url)\n", "                        map_data = get_map_data(work_data)\n", "                        obj_id = re.search(r\"(nla\\.obj\\-\\d+)\", fulltext_url).group(1)\n", "                        try:\n", "                            contributors = \"|\".join(work.get(\"contributor\"))\n", "                        except TypeError:\n", "                            contributors = work.get(\"contributor\")\n", "                        # Get basic metadata\n", "                        # You could add more work data here\n", "                        # Check the Trove API docs for work record structure\n", "                        map_data[\"title\"] = work[\"title\"]\n", "                        map_data[\"url\"] = fulltext_url\n", "                        map_data[\"work_url\"] = work.get(\"troveUrl\")\n", "                        map_data[\"identifier\"] = obj_id\n", "                        map_data[\"date\"] = work.get(\"issued\")\n", "                        map_data[\"creators\"] = contributors\n", "                        map_data[\"publication\"] = get_publication_details(work_data)\n", "                        map_data[\"extent\"] = work_data.get(\"extent\")\n", "                        # I think the copyright status scraped from the page (below) is more likely to be accurate\n", "                        # map_data[\"copyright_policy\"] = work_data.get(\"copyrightPolicy\")\n", "                        map_data[\"copyright_status\"] = get_copyright_status(\n", "                            url=fulltext_url\n", "                        )\n", "                        map_data[\"scale\"] = get_marc_field(work_data, 255, \"a\")\n", "                        map_data[\"coordinates\"] = get_marc_field(work_data, 255, \"c\")\n", "                        maps.append(map_data)\n", "                        # print(map_data)\n", "            if not response.from_cache:\n", "                time.sleep(0.2)\n", "            pbar.update(100)\n", "    return maps" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Download map data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "maps = get_maps()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert to dataframe and save to CSV" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "# Convert to dataframe\n", "# convert_dtypes() stores whole numbers as (nullable) integers rather than floats\n", "df = pd.DataFrame(maps).convert_dtypes()\n", "\n", "# Reorder columns\n", "df = df[\n", "    [\n", "        \"identifier\",\n", "        \"title\",\n", "        \"url\",\n", "        \"work_url\",\n", "        \"date\",\n", "        \"creators\",\n", "        \"publication\",\n", "        \"extent\",\n", "        \"copyright_status\",\n", "        \"scale\",\n", "        \"coordinates\",\n", "        \"filesize_string\",\n", "        \"filesize\",\n", "        \"width\",\n", "        \"height\",\n", "        \"copy_role\",\n", "    ]\n", "]\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "# Save to CSV\n", "csv_file = f\"single_maps_{datetime.datetime.now().strftime('%Y%m%d')}.csv\"\n", "df.to_csv(csv_file, index=False)\n", "display(FileLink(csv_file))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's explore the results" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Reload data from CSV if necessary\n", "df = pd.read_csv(\n", "    \"https://raw.githubusercontent.com/GLAM-Workbench/trove-maps-data/main/single_maps_20230131.csv\"\n", ")" ] }, { 
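"cell_type": "markdown", "metadata": {}, "source": [ "Once the data is loaded, the url tricks described in *Getting map images* can be applied to the `identifier` and `copy_role` columns. The cell below is a minimal sketch, not part of the original harvesting code: `get_image_urls()` is a hypothetical helper that assumes the undocumented `/image`, `?wid=`, and `/{copy_role}` url patterns keep working. It only builds urls and doesn't download anything, so you can check the `filesize` column before fetching a (possibly multi-gigabyte) TIFF." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_image_urls(identifier, copy_role=None, width=None):\n", "    \"\"\"\n", "    Hypothetical helper: build JPG and TIFF urls for a digitised map.\n", "    Assumes the undocumented url patterns described above still work.\n", "    \"\"\"\n", "    base_url = f\"http://nla.gov.au/{identifier}\"\n", "    jpg_url = f\"{base_url}/image\"\n", "    if width:\n", "        # Ask for a smaller JPG by specifying a pixel width\n", "        jpg_url = f\"{jpg_url}?wid={width}\"\n", "    # Only build a TIFF url if there's a copy_role value (missing values load from CSV as NaN)\n", "    tiff_url = f\"{base_url}/{copy_role}\" if isinstance(copy_role, str) else None\n", "    return jpg_url, tiff_url\n", "\n", "\n", "# Build urls for the largest file in the dataset (check its filesize before downloading the TIFF!)\n", "biggest = df.loc[df[\"filesize\"].idxmax()]\n", "get_image_urls(biggest[\"identifier\"], biggest[\"copy_role\"], width=400)" ] }, { 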
"cell_type": "markdown", "metadata": {}, "source": [ "How many digitised maps are available?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "33,161 maps\n" ] } ], "source": [ "print(\"{:,} maps\".format(df.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many of the maps have high-resolution downloads?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(29190, 16)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df[\"filesize\"].notnull()].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are the `copy_role` values?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "m 28809\n", "i 355\n", "o 26\n", "Name: copy_role, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"copy_role\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How much map data is available for download?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13.29TB\n" ] } ], "source": [ "size, unit = format_bytes(df[\"filesize\"].sum())\n", "print(\"{:.2f}{}\".format(size, unit))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the copyright status of the maps?" 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Out of Copyright 24490\n", "In Copyright 7573\n", "Edition Out of Copyright 625\n", "Copyright Undetermined 305\n", "Copyright Uncertain 111\n", "Unknown 17\n", "Edition In Copyright 4\n", "Name: copyright_status, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"copyright_status\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's show the copyright status as a chart..." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts = df[\"copyright_status\"].value_counts().to_frame().reset_index()\n", "counts.columns = [\"status\", \"count\"]\n", "alt.Chart(counts).mark_bar().encode(\n", " y=\"status:N\", x=\"count\", tooltip=\"count\"\n", ").properties(height=200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the sizes of the download files. To make this easier we'll divide the filesizes into ranges (bins) and count the number of files in each range." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " mb count\n", "0 (0, 500] 16007\n", "1 (500, 1000] 10208\n", "2 (1000, 1500] 2574\n", "3 (1500, 2000] 312\n", "4 (2000, 3000] 78\n", "5 (3000, 3500] 11" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert bytes to MB\n", "df[\"mb\"] = df[\"filesize\"] / 2**10 / 2**10\n", "# Divide the file sizes into ranges (bins) and count the number of files in each bin\n", "# Note that the bins are 500MB wide, except for 2000-3000MB\n", "sizes = (\n", "    pd.cut(df[\"mb\"], bins=[0, 500, 1000, 1500, 2000, 3000, 3500])\n", "    .value_counts()\n", "    .to_frame()\n", "    .reset_index()\n", ")\n", "sizes.columns = [\"mb\", \"count\"]\n", "# Convert intervals to strings for display in chart\n", "sizes[\"mb\"] = sizes[\"mb\"].astype(str)\n", "sizes" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(sizes).mark_bar().encode(\n", " x=alt.X(\"mb:N\", sort=None), y=\"count:Q\", tooltip=\"count:Q\"\n", ").properties(width=400)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So while most are less than 500MB, more than 10,000 are between 0.5 and 1GB!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the biggest file available for download?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "identifier nla.obj-591001246\n", "title Map of the City of Rangoon and suburbs 1928-29...\n", "url http://nla.gov.au/nla.obj-591001246\n", "work_url https://trove.nla.gov.au/work/182743876\n", "date 1932\n", "creators Geological Survey of India\n", "publication NaN\n", "extent 1 map on 4 sheets : colour ; 154 x 126 cm, sheets\n", "copyright_status Out of Copyright\n", "scale Scale 1:12,000. 1 in. = 1000 ft.\n", "coordinates (E 96°06ʹ--E 96°13ʹ/N 16°53ʹ--N 16°44ʹ).\n", "filesize_string 3.38GB\n", "filesize 3623879488.0\n", "width 31769.0\n", "height 38023.0\n", "copy_role m\n", "mb 3456.000793\n", "Name: 5046, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[df[\"filesize\"].idxmax()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All downloads greater than 3GB." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " identifier title \\\n", "2424 nla.obj-2567709383 Map of the coastal plain of British Guiana \n", "5046 nla.obj-591001246 Map of the City of Rangoon and suburbs 1928-29... \n", "6038 nla.obj-3009772762 Shqipëria, hartë fiziko-politike : shkalla 1... \n", "7442 nla.obj-568387103 Peta geologi teknik daerah Jakarta - Bogor : E... \n", "7916 nla.obj-400826638 Nyūginia-tō zenzu / Taiwan Sōtokufu Gaijibu... \n", "11317 nla.obj-568387099 Geological map of Djawa and Madura / compiled ... \n", "14297 nla.obj-1954049619 A new chart of the South Pacific Ocean, includ... \n", "14717 nla.obj-2618718155 Proposed plan for the site for the federal cap... \n", "14887 nla.obj-2824965115 Map of the mandated territory of New Guinea / ... \n", "\n", " url \\\n", "2424 https://nla.gov.au/nla.obj-2567709383 \n", "5046 http://nla.gov.au/nla.obj-591001246 \n", "6038 https://nla.gov.au/nla.obj-3009772762 \n", "7442 http://nla.gov.au/nla.obj-568387103 \n", "7916 http://nla.gov.au/nla.obj-400826638 \n", "11317 http://nla.gov.au/nla.obj-568387099 \n", "14297 https://nla.gov.au/nla.obj-1954049619 \n", "14717 https://nla.gov.au/nla.obj-2618718155 \n", "14887 https://nla.gov.au/nla.obj-2824965115 \n", "\n", " work_url date \\\n", "2424 https://trove.nla.gov.au/work/152215030 1955 \n", "5046 https://trove.nla.gov.au/work/182743876 1932 \n", "6038 https://trove.nla.gov.au/work/191812727 1965 \n", "7442 https://trove.nla.gov.au/work/20208553 1970 \n", "7916 https://trove.nla.gov.au/work/205481810 1942 \n", "11317 https://trove.nla.gov.au/work/218208895 1963 \n", "14297 https://trove.nla.gov.au/work/237421392 1849-1857 \n", "14717 https://trove.nla.gov.au/work/239126400 1911 \n", "14887 https://trove.nla.gov.au/work/239997009 1925 \n", "\n", " creators \\\n", "2424 Bleackley, D. (David) \n", "5046 Geological Survey of India \n", "6038 Samimi, Ergjin \n", "7442 Indonesia. Direktorat Geologi \n", "7916 Taiwan \n", "11317 Indonesia. 
Direktorat Geologi \n", "14297 James Imray and Son \n", "14717 Wilson, George, died 1923 \n", "14887 Krahe, R. E. \n", "\n", " publication \\\n", "2424 [S.l.] : Geological Survey of British Guiana, ... \n", "5046 NaN \n", "6038 NaN \n", "7442 NaN \n", "7916 NaN \n", "11317 NaN \n", "14297 NaN \n", "14717 NaN \n", "14887 NaN \n", "\n", " extent \\\n", "2424 1 map : col. ; 88 x 205 cm. \n", "5046 1 map on 4 sheets : colour ; 154 x 126 cm, sheets \n", "6038 1 map on 3 sheets : color ; 173 x 91 cm, sheet... \n", "7442 1 map : colour ; 157 x 107 cm \n", "7916 1 map on 4 sheets : colour ; 172 x 99 cm \n", "11317 1 map : colour ; 78 x 216 cm. \n", "14297 1 map ; 96.4 x 183.0 cm \n", "14717 1 map : colour ; 141 x 141 cm \n", "14887 1 map : transparent architectural linen ; 210 ... \n", "\n", " copyright_status \\\n", "2424 In Copyright \n", "5046 Out of Copyright \n", "6038 In Copyright \n", "7442 In Copyright \n", "7916 Out of Copyright \n", "11317 In Copyright \n", "14297 Edition Out of Copyright \n", "14717 Out of Copyright \n", "14887 In Copyright \n", "\n", " scale \\\n", "2424 Scale [ca. 1:143,000]. \n", "5046 Scale 1:12,000. 1 in. = 1000 ft. \n", "6038 Scale 1:200,000. 1 cm to 2 km ; \n", "7442 Scale 1:50,000 \n", "7916 Scale 1:5,000,000 ; \n", "11317 Scale 1:500,000 \n", "14297 Scale approximately 1:11,000,000 at the equator \n", "14717 Scale 1:4,800 ; \n", "14887 Scale 1:1,000,000 \n", "\n", " coordinates filesize_string \\\n", "2424 (W 60°00ʹ--W 57°00ʹ/N 9°00ʹ--N 6°00ʹ). 3.08GB \n", "5046 (E 96°06ʹ--E 96°13ʹ/N 16°53ʹ--N 16°44ʹ). 3.38GB \n", "6038 (E 18°58ʹ--E 21°12ʹ/N 42°40ʹ--N 39°35ʹ). 3.04GB \n", "7442 (E 106°33'00\"--E 106°59'00\"/S 5°59'00\"--S 6°38... 3.05GB \n", "7916 (E 126°00ʹ--E 156°00ʹ/N 4°00ʹ--S 12°00ʹ). 3.04GB \n", "11317 (E 104°58ʹ28ʺ--E 113°98ʹ28ʺ/S 5°30ʹ00ʺ--S 9°00... 3.08GB \n", "14297 (E 111°--W 60°/N 20°--S 60°). 3.00GB \n", "14717 (E 149°08'/S 35°18'). 3.12GB \n", "14887 (E 140°50'00\"--E 159°41'00\"/S 0°33'00\"--S 11°5... 
3.37GB \n", "\n", " filesize width height copy_role mb \n", "2424 3.305391e+09 49731.0 22155.0 m 3152.266552 \n", "5046 3.623879e+09 31769.0 38023.0 m 3456.000793 \n", "6038 3.266078e+09 23106.0 47117.0 m 3114.774906 \n", "7442 3.279211e+09 26384.0 41429.0 m 3127.298904 \n", "7916 3.264456e+09 42659.0 25508.0 m 3113.228321 \n", "11317 3.311802e+09 52593.0 20990.0 m 3158.380127 \n", "14297 3.223027e+09 44606.0 24085.0 m 3073.717865 \n", "14717 3.344969e+09 33600.0 33184.0 m 3190.011211 \n", "14887 3.622362e+09 53028.0 22770.0 m 3454.553661 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[(df[\"filesize\"] / 2**10 / 2**10 / 2**10) > 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The widest image?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "identifier nla.obj-636346192\n", "title Land status petroleum mining agreement in resp...\n", "url http://nla.gov.au/nla.obj-636346192\n", "work_url https://trove.nla.gov.au/work/230363372\n", "date 1968\n", "creators Brunei Shell Petroleum Company\n", "publication NaN\n", "extent 1 map ; 286 x 58 cm\n", "copyright_status In Copyright\n", "scale Scale 1:10,000\n", "coordinates (E 114°09ʹ53ʺ--E 114°23ʹ34ʺ/N 4°38ʹ42ʺ--N 4°32...\n", "filesize_string 2.80GB\n", "filesize 3008938460.0\n", "width 68453.0\n", "height 14652.0\n", "copy_role m\n", "mb 2869.547329\n", "Name: 13113, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[df[\"width\"].idxmax()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The tallest image?" 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "identifier nla.obj-2824964225\n", "title Traverse of the Ramu River, navigated by the \"...\n", "url https://nla.gov.au/nla.obj-2824964225\n", "work_url https://trove.nla.gov.au/work/240049759\n", "date 1940-1945\n", "creators Stanley, Evan R. (Evan Richard), 1885-1924\n", "publication NaN\n", "extent 1 map : on architectural linen ; 410 x 76 cm\n", "copyright_status In Copyright\n", "scale Scale 1:31,760\n", "coordinates (E 144°35'--E 144°50'/S 4°01'--S 5°11').\n", "filesize_string 2.85GB\n", "filesize 3057135688.0\n", "width 13840.0\n", "height 73630.0\n", "copy_role m\n", "mb 2915.511787\n", "Name: 14904, dtype: object" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[df[\"height\"].idxmax()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/).\n", "\n", "Work on this notebook was originally supported by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12 (default, May 16 2022, 14:53:00) \n[GCC 11.2.0]" }, "vscode": { "interpreter": { "hash": "f54aba2de7a75230217f549a064c6555500d2132634fbcab9606dbfda34a2a1b" } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }