{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring digitised maps in Trove\n",
    "\n",
    "If you've ever poked around in Trove's 'map' zone, you might have noticed the beautiful deep-zoomable images available for many of the NLA's digitised maps. Even better, in many cases the high-resolution TIFF versions of the digitised maps are available for download.\n",
    "\n",
    "I knew there were lots of great maps you could download from Trove, but how many? And how big were the files? I thought I'd try to quantify this a bit by harvesting and analysing the metadata.\n",
    "\n",
    "The size of the downloadable files (both in bytes and pixels) are [embedded within the landing pages](https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-books/blob/master/Metadata-for-Trove-digitised-works.ipynb) for the digitised maps. So harvesting the metadata involves a number of steps:\n",
    "\n",
    "* Use the Trove API to search for maps that include the phrase \"nla.obj\" – this will filter the results to maps that have been digitised and are available through Trove\n",
    "* Work through the results, checking to see if the record includes a link to a digital copy.\n",
    "* If there is a digital copy, extract the embedded work data from the landing page.\n",
    "* Scrape the copyright status from the page.\n",
    "\n",
    "**2023 update!** It turns out that embedded within the embedded data are MARC descriptions that include some other metadata that's not available through the API. This includes the map scale and coordinates. The coordinates can either be a point, or a bounding box. I've saved these values as well, and explored some ways of parsing and visualising the coordinates in this notebook.\n",
    "\n",
    "The fields in the harvested dataset are:\n",
    "\n",
    "* `title` – title of the map\n",
    "* `url` – url to the map in the digitised file viewer\n",
    "* `work_url` – url to the work in the Trove map category\n",
    "* `identifier` – NLA identifier\n",
    "* `date` – date published or created\n",
    "* `creators` – creators of the map\n",
    "* `publication` – publication place, publisher, and publication date (if available)\n",
    "* `extent` – physical description of map\n",
    "* `copyright_status` – copyright status based on available metadata (scraped from web page)\n",
    "* `scale` – map scale\n",
    "* `coordinates` – map coordinates, either a point or a bounding box (format is 'W--E/N--S', eg: 'E 130⁰50'--E 131⁰00'/S 12⁰30'--S 12⁰40')\n",
    "* `filesize_string` – filesize string in MB\n",
    "* `filesize` – size of TIFF file in bytes\n",
    "* `width` – width of TIFF in pixels\n",
    "* `height` – height of TIFF in pixels\n",
    "* `copy_role` – I'm not sure what the values in this field signify, but as described below, you can use them to download high-res TIFF images\n",
    "\n",
    "## Getting map images\n",
    "\n",
    "There are a couple of undocumented tricks that make it easy to programatically download images of the maps.\n",
    "\n",
    "* To view the JPG version, just add `/image` to the map url. For example: http://nla.gov.au/nla.obj-232162256/image \n",
    "* The JPG image will be at the highest available resolution, but you requests smaller versions using the `wid` parameter to specify a pixel width. For example: http://nla.gov.au/nla.obj-232162256/image?wid=400\n",
    "* There seems to be an upper limit for the resolution of the JPG versions, higher resolutions might be available via the TIFF file which you can download by adding the `copy_role` value to the url. For example, if the `copy_role` is 'm' this url will download the TIFF: http://nla.gov.au/nla.obj-232162256/m (note that some of these files are very, very large – you might want to check the `filesize` before downloading)\n"
   ]
  },
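  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The url patterns described above can be wrapped in a couple of small helpers. This is a sketch under the assumption that the undocumented patterns keep working as described; the `image_url` and `tiff_url` names are hypothetical and aren't used elsewhere in this notebook.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlencode\n",
    "\n",
    "\n",
    "def image_url(map_url, width=None):\n",
    "    \"\"\"\n",
    "    Build the url of the JPG version of a map, optionally at a given pixel width.\n",
    "    \"\"\"\n",
    "    url = map_url.rstrip(\"/\") + \"/image\"\n",
    "    if width:\n",
    "        url += \"?\" + urlencode({\"wid\": width})\n",
    "    return url\n",
    "\n",
    "\n",
    "def tiff_url(map_url, copy_role):\n",
    "    \"\"\"\n",
    "    Build the download url of the TIFF by appending the `copy_role` value.\n",
    "    \"\"\"\n",
    "    return map_url.rstrip(\"/\") + \"/\" + copy_role\n",
    "\n",
    "\n",
    "print(image_url(\"http://nla.gov.au/nla.obj-232162256\", width=400))\n",
    "print(tiff_url(\"http://nla.gov.au/nla.obj-232162256\", \"m\"))\n",
    "```\n",
    "\n",
    "Given how large some of the TIFFs are, it's probably worth checking the `filesize` value before requesting a url built with `tiff_url`."
   ]
  },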
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting things up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import datetime\n",
    "import json\n",
    "import os\n",
    "import re\n",
    "import time\n",
    "import warnings\n",
    "from functools import reduce\n",
    "\n",
    "warnings.simplefilter(action=\"ignore\", category=FutureWarning)\n",
    "\n",
    "import altair as alt\n",
    "import pandas as pd\n",
    "import requests_cache\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import FileLink, display\n",
    "from requests.adapters import HTTPAdapter\n",
    "from requests.packages.urllib3.util.retry import Retry\n",
    "from tqdm.auto import tqdm\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "s = requests_cache.CachedSession()\n",
    "retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n",
    "s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n",
    "s.mount(\"http://\", HTTPAdapter(max_retries=retries))\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## You'll need a Trove API key to harvest the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# This creates a variable called 'api_key', paste your key between the quotes\n",
    "API_KEY = \"YOUR API KEY\"\n",
    "\n",
    "# Use an api key value from environment variables if it is available (useful for testing)\n",
    "if os.getenv(\"TROVE_API_KEY\"):\n",
    "    API_KEY = os.getenv(\"TROVE_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define some functions to do the work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_total_results(params):\n",
    "    \"\"\"\n",
    "    Get the total number of results for a search.\n",
    "    \"\"\"\n",
    "    these_params = params.copy()\n",
    "    these_params[\"n\"] = 0\n",
    "    response = s.get(\"https://api.trove.nla.gov.au/v3/result\", params=these_params, headers={\"X-API-KEY\": API_KEY})\n",
    "    data = response.json()\n",
    "    return int(data[\"category\"][0][\"records\"][\"total\"])\n",
    "\n",
    "\n",
    "def get_fulltext_url(links):\n",
    "    \"\"\"\n",
    "    Loop through the identifiers to find a link to the digital version of the journal.\n",
    "    \"\"\"\n",
    "    url = None\n",
    "    for link in links:\n",
    "        if link[\"linktype\"] == \"fulltext\" and \"nla.obj\" in link[\"value\"]:\n",
    "            url = link[\"value\"]\n",
    "            break\n",
    "    return url\n",
    "\n",
    "\n",
    "def get_copyright_status(response=None, url=None):\n",
    "    \"\"\"\n",
    "    Scrape copyright information from a digital work page.\n",
    "    \"\"\"\n",
    "    if url and not response:\n",
    "        response = s.get(url)\n",
    "    if response:\n",
    "        soup = BeautifulSoup(response.text, \"lxml\")\n",
    "        try:\n",
    "            copyright_status = str(\n",
    "                soup.find(\"div\", id=\"tab-access\").find(\"p\", class_=\"decorative\").string\n",
    "            )\n",
    "            return copyright_status\n",
    "        # No access tab\n",
    "        except AttributeError:\n",
    "            pass\n",
    "    return \"\"\n",
    "\n",
    "\n",
    "def get_work_data(url):\n",
    "    \"\"\"\n",
    "    Extract work data in a JSON string from the work's HTML page.\n",
    "    \"\"\"\n",
    "    response = s.get(url)\n",
    "    try:\n",
    "        work_data = json.loads(\n",
    "            re.search(\n",
    "                r\"var work = JSON\\.parse\\(JSON\\.stringify\\((\\{.*\\})\", response.text\n",
    "            ).group(1)\n",
    "        )\n",
    "    except (AttributeError, TypeError):\n",
    "        work_data = {}\n",
    "    # else:\n",
    "    # If there's no copyright info in the work data, then scrape it\n",
    "    # if \"copyrightPolicy\" not in work_data:\n",
    "    #    work_data[\"copyrightPolicy\"] = get_copyright_status(response)\n",
    "    if not response.from_cache:\n",
    "        time.sleep(0.2)\n",
    "    return work_data\n",
    "\n",
    "\n",
    "def find_field_content(record, tag, subfield):\n",
    "    \"\"\"\n",
    "    Loop through a MARC record looking for tag/subfield.\n",
    "    If found, return the subfield value.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        for field in record[\"datafield\"]:\n",
    "            if field[\"tag\"] == tag:\n",
    "                if isinstance(field[\"subfield\"], list):\n",
    "                    for sfield in field[\"subfield\"]:\n",
    "                        if sfield[\"code\"] == subfield:\n",
    "                            return sfield[\"content\"]\n",
    "                else:\n",
    "                    if field[\"subfield\"][\"code\"] == subfield:\n",
    "                        return field[\"subfield\"][\"content\"]\n",
    "    except (KeyError, TypeError):\n",
    "        pass\n",
    "    return \"\"\n",
    "\n",
    "\n",
    "def get_marc_field(work_data, tag, subfield):\n",
    "    \"\"\"\n",
    "    Loop through all the MARC records in work metadata looking for a tag/subfield.\n",
    "    If found, return the subfield value.\n",
    "    \"\"\"\n",
    "    if \"marcData\" in work_data and work_data[\"marcData\"]:\n",
    "        for record in work_data[\"marcData\"][\"record\"]:\n",
    "            content = find_field_content(record, tag, subfield)\n",
    "            if content:\n",
    "                return content\n",
    "    return \"\"\n",
    "\n",
    "\n",
    "def format_bytes(size):\n",
    "    \"\"\"\n",
    "    Format bytes as a human-readable string\n",
    "    \"\"\"\n",
    "    # 2**10 = 1024\n",
    "    if not size:\n",
    "        return \"\", \"\"\n",
    "    power = 2**10\n",
    "    n = 0\n",
    "    power_labels = {0: \"\", 1: \"K\", 2: \"M\", 3: \"G\", 4: \"T\"}\n",
    "    while size > power:\n",
    "        size /= power\n",
    "        n += 1\n",
    "    return size, power_labels[n] + \"B\"\n",
    "\n",
    "\n",
    "def get_publication_details(work_data):\n",
    "    \"\"\"\n",
    "    Get MARC values for publication details and combine into a single string.\n",
    "    \"\"\"\n",
    "    parts = []\n",
    "    for code in [\"a\", \"b\", \"c\"]:\n",
    "        value = get_marc_field(work_data, 260, code)\n",
    "        if value:\n",
    "            parts.append(str(value))\n",
    "    return \" \".join(parts)\n",
    "\n",
    "\n",
    "def get_map_data(work_data):\n",
    "    \"\"\"\n",
    "    Look for file size information in the embedded data\n",
    "    \"\"\"\n",
    "    map_data = {\n",
    "        \"filesize_string\": \"\",\n",
    "        \"filesize\": 0,\n",
    "        \"width\": 0,\n",
    "        \"height\": 0,\n",
    "        \"copy_role\": \"\"\n",
    "    }\n",
    "    width = None\n",
    "    height = None\n",
    "    num_bytes = None\n",
    "    try:\n",
    "        # Make sure there's a downloadable version\n",
    "        if (\n",
    "            work_data.get(\"accessConditions\") == \"Unrestricted\"\n",
    "            and \"copies\" in work_data\n",
    "        ):\n",
    "            for copy in work_data[\"copies\"]:\n",
    "                width = \"\"\n",
    "                height = \"\"\n",
    "                num_bytes = \"\"\n",
    "                copy_role = \"\"\n",
    "                # Get the pixel dimensions\n",
    "                if \"technicalmetadata\" in copy:\n",
    "                    width = copy[\"technicalmetadata\"].get(\"width\", 0)\n",
    "                    height = copy[\"technicalmetadata\"].get(\"height\", 0)\n",
    "                # Get filesize in bytes\n",
    "                elif (\n",
    "                    copy[\"copyrole\"] in [\"m\", \"o\", \"i\", \"fd\"]\n",
    "                    and copy[\"access\"] == \"true\"\n",
    "                ):\n",
    "                    num_bytes = copy.get(\"filesize\", 0)\n",
    "                    copy_role = copy.get(\"copyrole\", \"\")\n",
    "            size, unit = format_bytes(num_bytes)\n",
    "            # Convert bytes to something human friendly\n",
    "            if size:\n",
    "                map_data[\"filesize_string\"] = \"{:.2f}{}\".format(size, unit)\n",
    "            map_data[\"filesize\"] = num_bytes\n",
    "            map_data[\"width\"] = width\n",
    "            map_data[\"height\"] = height\n",
    "            map_data[\"copy_role\"] = copy_role\n",
    "\n",
    "    except AttributeError:\n",
    "        pass\n",
    "    return map_data\n",
    "\n",
    "\n",
    "def get_maps():\n",
    "    \"\"\"\n",
    "    Harvest metadata about maps.\n",
    "    \"\"\"\n",
    "    url = \"http://api.trove.nla.gov.au/v3/result\"\n",
    "    maps = []\n",
    "    params = {\n",
    "        \"q\": '\"nla.obj-\"',\n",
    "        \"category\": \"image\",\n",
    "        \"l-artType\": \"map\",\n",
    "        \"l-availability\": \"y\",\n",
    "        \"l-format\": \"Map/Single map\",\n",
    "        \"bulkHarvest\": \"true\",  # Needed to maintain a consistent order across requests\n",
    "        \"n\": 100,\n",
    "        \"encoding\": \"json\",\n",
    "    }\n",
    "    start = \"*\"\n",
    "    total = get_total_results(params)\n",
    "    with tqdm(total=total) as pbar:\n",
    "        while start:\n",
    "            params[\"s\"] = start\n",
    "            response = s.get(url, params=params, headers={\"X-API-KEY\": API_KEY})\n",
    "            data = response.json()\n",
    "            # If there's a startNext value then we get it to request the next page of results\n",
    "            try:\n",
    "                start = data[\"category\"][0][\"records\"][\"nextStart\"]\n",
    "            except KeyError:\n",
    "                start = None\n",
    "            for work in tqdm(\n",
    "                data[\"category\"][0][\"records\"][\"work\"], leave=False\n",
    "            ):\n",
    "                # Check to see if there's a link to a digital version\n",
    "                try:\n",
    "                    fulltext_url = get_fulltext_url(work[\"identifier\"])\n",
    "                except KeyError:\n",
    "                    pass\n",
    "                else:\n",
    "                    if fulltext_url:\n",
    "                        work_data = get_work_data(fulltext_url)\n",
    "                        map_data = get_map_data(work_data)\n",
    "                        obj_id = re.search(r\"(nla\\.obj\\-\\d+)\", fulltext_url).group(1)\n",
    "                        # Get basic metadata\n",
    "                        # You could add more work data here\n",
    "                        # Check the Trove API docs for work record structure\n",
    "                        map_data[\"title\"] = work.get(\"title\", \"\")\n",
    "                        map_data[\"url\"] = fulltext_url\n",
    "                        map_data[\"work_url\"] = work.get(\"troveUrl\", \"\")\n",
    "                        map_data[\"identifier\"] = obj_id\n",
    "                        map_data[\"date\"] = work.get(\"issued\", \"\")\n",
    "                        map_data[\"creators\"] = \"|\".join(work.get(\"contributor\", []))\n",
    "                        map_data[\"publication\"] = get_publication_details(work_data)\n",
    "                        map_data[\"extent\"] = work_data.get(\"extent\", \"\")\n",
    "                        # I think the copyright status scraped from the page (below) is more likely to be accurate\n",
    "                        # map_data[\"copyright_policy\"] = work_data.get(\"copyrightPolicy\")\n",
    "                        map_data[\"copyright_status\"] = get_copyright_status(\n",
    "                            url=fulltext_url\n",
    "                        )\n",
    "                        map_data[\"scale\"] = get_marc_field(work_data, 255, \"a\")\n",
    "                        map_data[\"coordinates\"] = get_marc_field(work_data, 255, \"c\")\n",
    "                        maps.append(map_data)\n",
    "                        # print(map_data)\n",
    "            pbar.update(100)\n",
    "    return maps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download map data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "maps = get_maps()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert to dataframe and save to CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Convert to dataframe\n",
    "# Convert dtypes converts numbers to integers rather than floats\n",
    "df = pd.DataFrame(maps).convert_dtypes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def merge_column(columns):\n",
    "    values = []\n",
    "    for value in columns:\n",
    "        if isinstance(value, list):\n",
    "            values += [str(v) for v in value if v]\n",
    "        elif value:\n",
    "            values.append(str(value))\n",
    "    return \" | \".join(sorted(set(values)))\n",
    "\n",
    "\n",
    "def merge_records(df):\n",
    "    # df[\"pages\"].fillna(0, inplace=True)\n",
    "    # df.fillna(\"\", inplace=True)\n",
    "    # df[\"pages\"] = df[\"pages\"].astype(\"Int64\")\n",
    "\n",
    "    # Add base dataset with columns that will always have only one value\n",
    "    dfs = [df[[\"identifier\", \"url\"]].drop_duplicates()]\n",
    "\n",
    "    # Columns that potentially have multiple values which will be merged\n",
    "    columns = [\n",
    "        \"title\",\n",
    "        \"work_url\",\n",
    "        \"date\",\n",
    "        \"creators\",\n",
    "        \"publication\",\n",
    "        \"extent\",\n",
    "        \"copyright_status\",\n",
    "        \"scale\",\n",
    "        \"coordinates\",\n",
    "        \"filesize_string\",\n",
    "        \"filesize\",\n",
    "        \"width\",\n",
    "        \"height\",\n",
    "        \"copy_role\"\n",
    "    ]\n",
    "\n",
    "    # Merge values from each column in turn, creating a new dataframe from each\n",
    "    for column in columns:\n",
    "        dfs.append(\n",
    "            df.groupby([\"identifier\", \"url\"])[column].apply(merge_column).reset_index()\n",
    "        )\n",
    "\n",
    "    # Merge all the individual dataframes into one, linking on `text_file` value\n",
    "    df_merged = reduce(\n",
    "        lambda left, right: pd.merge(left, right, on=[\"identifier\", \"url\"], how=\"left\"), dfs\n",
    "    )\n",
    "    return df_merged"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df_merged = merge_records(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href='single_maps_20240608.csv' target='_blank'>single_maps_20240608.csv</a><br>"
      ],
      "text/plain": [
       "/home/tim/mywork/glam-workbench/trove-maps/notebooks/single_maps_20240608.csv"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Reorder columns\n",
    "df_merged = df_merged[\n",
    "    [\n",
    "        \"identifier\",\n",
    "        \"title\",\n",
    "        \"url\",\n",
    "        \"work_url\",\n",
    "        \"date\",\n",
    "        \"creators\",\n",
    "        \"publication\",\n",
    "        \"extent\",\n",
    "        \"copyright_status\",\n",
    "        \"scale\",\n",
    "        \"coordinates\",\n",
    "        \"filesize_string\",\n",
    "        \"filesize\",\n",
    "        \"width\",\n",
    "        \"height\",\n",
    "        \"copy_role\",\n",
    "    ]\n",
    "]\n",
    "\n",
    "# Save to CSV\n",
    "csv_file = f\"single_maps_{datetime.datetime.now().strftime('%Y%m%d')}.csv\"\n",
    "df_merged.to_csv(csv_file, index=False)\n",
    "display(FileLink(csv_file))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's explore the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Reload data from CSV if necessary\n",
    "df = pd.read_csv(\n",
    "    \"https://raw.githubusercontent.com/GLAM-Workbench/trove-maps-data/main/single_maps.csv\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many digitised maps are available?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "35,042 maps\n"
     ]
    }
   ],
   "source": [
    "print(\"{:,} maps\".format(df.shape[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>identifier</th>\n",
       "      <th>title</th>\n",
       "      <th>url</th>\n",
       "      <th>work_url</th>\n",
       "      <th>date</th>\n",
       "      <th>creators</th>\n",
       "      <th>publication</th>\n",
       "      <th>extent</th>\n",
       "      <th>copyright_status</th>\n",
       "      <th>scale</th>\n",
       "      <th>coordinates</th>\n",
       "      <th>filesize_string</th>\n",
       "      <th>filesize</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>copy_role</th>\n",
       "      <th>mb</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2314</th>\n",
       "      <td>nla.obj-1059119069</td>\n",
       "      <td>[Western Australia gold mining leases]. Depart...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-1059119069</td>\n",
       "      <td>https://trove.nla.gov.au/work/14409767</td>\n",
       "      <td>1905</td>\n",
       "      <td>Western Australia. Department of Mines</td>\n",
       "      <td>[South Kensington : Science Museum Library, 19...</td>\n",
       "      <td>1 map ; 65 x 99 cm.</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale [ca. 1:31 680]</td>\n",
       "      <td>(E 122°10'30\"/S 28°49'00\")</td>\n",
       "      <td>1.19GB</td>\n",
       "      <td>1281565428</td>\n",
       "      <td>25062</td>\n",
       "      <td>17045</td>\n",
       "      <td>m</td>\n",
       "      <td>1222.196033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2315</th>\n",
       "      <td>nla.obj-1059119069</td>\n",
       "      <td>[Western Australia gold mining leases]. Depart...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-1059119069</td>\n",
       "      <td>https://trove.nla.gov.au/work/14409773</td>\n",
       "      <td>1905</td>\n",
       "      <td>Western Australia. Department of Mines</td>\n",
       "      <td>[South Kensington : Science Museum Library, 19...</td>\n",
       "      <td>1 map ; 65 x 99 cm.</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale [ca. 1:31 680]</td>\n",
       "      <td>(E 122°10'30\"/S 28°49'00\")</td>\n",
       "      <td>1.19GB</td>\n",
       "      <td>1281565428</td>\n",
       "      <td>25062</td>\n",
       "      <td>17045</td>\n",
       "      <td>m</td>\n",
       "      <td>1222.196033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2471</th>\n",
       "      <td>nla.obj-1059122984</td>\n",
       "      <td>[Western Australia gold mining leases]. (1.2.0...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-1059122984</td>\n",
       "      <td>https://trove.nla.gov.au/work/14626171</td>\n",
       "      <td>1905</td>\n",
       "      <td>Western Australia. Department of Mines</td>\n",
       "      <td>[London : Science Museum Library, 1905?]</td>\n",
       "      <td>1 map ; 65 x 98 cm.</td>\n",
       "      <td>Edition Out of Copyright</td>\n",
       "      <td>Scale [ca. 1:15 840]</td>\n",
       "      <td>(E 121°09'/S 30°57')</td>\n",
       "      <td>1.14GB</td>\n",
       "      <td>1228526852</td>\n",
       "      <td>23731</td>\n",
       "      <td>17256</td>\n",
       "      <td>m</td>\n",
       "      <td>1171.614506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2467</th>\n",
       "      <td>nla.obj-1059122984</td>\n",
       "      <td>[Western Australia gold mining leases]. (1.2.0...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-1059122984</td>\n",
       "      <td>https://trove.nla.gov.au/work/14626082</td>\n",
       "      <td>1905</td>\n",
       "      <td>Western Australia. Department of Mines</td>\n",
       "      <td>[London : Science Museum Library, 1905?]</td>\n",
       "      <td>1 map ; 65 x 98 cm.</td>\n",
       "      <td>Edition Out of Copyright</td>\n",
       "      <td>Scale [ca. 1:15 840]</td>\n",
       "      <td>(E 121°09'/S 30°57')</td>\n",
       "      <td>1.14GB</td>\n",
       "      <td>1228526852</td>\n",
       "      <td>23731</td>\n",
       "      <td>17256</td>\n",
       "      <td>m</td>\n",
       "      <td>1171.614506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2472</th>\n",
       "      <td>nla.obj-1059123632</td>\n",
       "      <td>[Western Australia gold mining leases]. 20.5.0...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-1059123632</td>\n",
       "      <td>https://trove.nla.gov.au/work/14626181</td>\n",
       "      <td>1905</td>\n",
       "      <td>Western Australia. Department of Mines</td>\n",
       "      <td>[London : Science Museum Library, 1905?]</td>\n",
       "      <td>1 map ; 65 x 98 cm.</td>\n",
       "      <td>Edition Out of Copyright</td>\n",
       "      <td>Scale [ca. 1:15 840]</td>\n",
       "      <td>(E 121°09'/S 30°57')</td>\n",
       "      <td>1.20GB</td>\n",
       "      <td>1287582536</td>\n",
       "      <td>25084</td>\n",
       "      <td>17110</td>\n",
       "      <td>m</td>\n",
       "      <td>1227.934395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4155</th>\n",
       "      <td>nla.obj-678124518</td>\n",
       "      <td>Geological atlas 1:50 000 series. Geological S...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-678124518</td>\n",
       "      <td>https://trove.nla.gov.au/work/16946369</td>\n",
       "      <td>1987</td>\n",
       "      <td>Geological Survey of Tasmania</td>\n",
       "      <td>Hobart, Tas. : The Dept., 1987</td>\n",
       "      <td>1 map : col. ; 56 x 84 cm. on sheet 69 x 107 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:50 000</td>\n",
       "      <td>(E 148⁰00'--E 148⁰30'/S 41⁰15'--S 41⁰30')</td>\n",
       "      <td>1.16GB</td>\n",
       "      <td>1245749060</td>\n",
       "      <td>25436</td>\n",
       "      <td>16325</td>\n",
       "      <td>m</td>\n",
       "      <td>1188.038883</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23268</th>\n",
       "      <td>nla.obj-893284845</td>\n",
       "      <td>Vegetation survey of Western Australia. mapped...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-893284845</td>\n",
       "      <td>https://trove.nla.gov.au/work/32952780</td>\n",
       "      <td>1973</td>\n",
       "      <td>Beard, J. S. (John Stanley), 1916-2011</td>\n",
       "      <td>Perth : Vegmap Publications, 1973</td>\n",
       "      <td>1 map ; 47 x 59 cm., on sheet 59 x 70 cm. + 1 ...</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:250,000</td>\n",
       "      <td>(E 123°00ʹ--E 124°30ʹ/S 33°00ʹ--S 34°00ʹ).</td>\n",
       "      <td>748.65MB</td>\n",
       "      <td>785014388</td>\n",
       "      <td>18087</td>\n",
       "      <td>14467</td>\n",
       "      <td>m</td>\n",
       "      <td>748.64806</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7405</th>\n",
       "      <td>nla.obj-893284845</td>\n",
       "      <td>Vegetation survey of Western Australia. mapped...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-893284845</td>\n",
       "      <td>https://trove.nla.gov.au/work/19937156</td>\n",
       "      <td>1973</td>\n",
       "      <td>Beard, J. S. (John Stanley), 1916-2011</td>\n",
       "      <td>Perth : Vegmap Publications, 1973</td>\n",
       "      <td>1 map ; 47 x 59 cm., on sheet 59 x 70 cm. + 1 ...</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:250,000</td>\n",
       "      <td>(E 123°00ʹ--E 124°30ʹ/S 33°00ʹ--S 34°00ʹ).</td>\n",
       "      <td>748.65MB</td>\n",
       "      <td>785014388</td>\n",
       "      <td>18087</td>\n",
       "      <td>14467</td>\n",
       "      <td>m</td>\n",
       "      <td>748.64806</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132</th>\n",
       "      <td>nla.obj-961623531</td>\n",
       "      <td>Australia 1:25 000 topographic survey. Produce...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-961623531</td>\n",
       "      <td>https://trove.nla.gov.au/work/10384783</td>\n",
       "      <td>1979</td>\n",
       "      <td>Western Australia. Department of Lands and Sur...</td>\n",
       "      <td>Perth (W.A.) : Dept. of Lands and Surveys, 1979</td>\n",
       "      <td>1 map : col. ; 55 x 50 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:25,000</td>\n",
       "      <td>(E 115°45ʹ00ʺ--E 115°52ʹ30ʺ/S 32°15ʹ00ʺ--S 32°...</td>\n",
       "      <td>676.20MB</td>\n",
       "      <td>709045908</td>\n",
       "      <td>14078</td>\n",
       "      <td>16788</td>\n",
       "      <td>m</td>\n",
       "      <td>676.198872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3339</th>\n",
       "      <td>nla.obj-961623531</td>\n",
       "      <td>Australia 1:25 000 topographic survey.: Wellar...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-961623531</td>\n",
       "      <td>https://trove.nla.gov.au/work/159335939</td>\n",
       "      <td>1891-1979</td>\n",
       "      <td>Western Australia. Department of Lands and Sur...</td>\n",
       "      <td>Perth (W.A.) : Dept. of Lands and Surveys, 1979</td>\n",
       "      <td>1 map : col. ; 55 x 50 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:25,000</td>\n",
       "      <td>(E 115°45ʹ00ʺ--E 115°52ʹ30ʺ/S 32°15ʹ00ʺ--S 32°...</td>\n",
       "      <td>676.20MB</td>\n",
       "      <td>709045908</td>\n",
       "      <td>14078</td>\n",
       "      <td>16788</td>\n",
       "      <td>m</td>\n",
       "      <td>676.198872</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>398 rows × 17 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               identifier                                              title  \\\n",
       "2314   nla.obj-1059119069  [Western Australia gold mining leases]. Depart...   \n",
       "2315   nla.obj-1059119069  [Western Australia gold mining leases]. Depart...   \n",
       "2471   nla.obj-1059122984  [Western Australia gold mining leases]. (1.2.0...   \n",
       "2467   nla.obj-1059122984  [Western Australia gold mining leases]. (1.2.0...   \n",
       "2472   nla.obj-1059123632  [Western Australia gold mining leases]. 20.5.0...   \n",
       "...                   ...                                                ...   \n",
       "4155    nla.obj-678124518  Geological atlas 1:50 000 series. Geological S...   \n",
       "23268   nla.obj-893284845  Vegetation survey of Western Australia. mapped...   \n",
       "7405    nla.obj-893284845  Vegetation survey of Western Australia. mapped...   \n",
       "132     nla.obj-961623531  Australia 1:25 000 topographic survey. Produce...   \n",
       "3339    nla.obj-961623531  Australia 1:25 000 topographic survey.: Wellar...   \n",
       "\n",
       "                                        url  \\\n",
       "2314   http://nla.gov.au/nla.obj-1059119069   \n",
       "2315   http://nla.gov.au/nla.obj-1059119069   \n",
       "2471   http://nla.gov.au/nla.obj-1059122984   \n",
       "2467   http://nla.gov.au/nla.obj-1059122984   \n",
       "2472   http://nla.gov.au/nla.obj-1059123632   \n",
       "...                                     ...   \n",
       "4155   https://nla.gov.au/nla.obj-678124518   \n",
       "23268  https://nla.gov.au/nla.obj-893284845   \n",
       "7405   https://nla.gov.au/nla.obj-893284845   \n",
       "132    https://nla.gov.au/nla.obj-961623531   \n",
       "3339   https://nla.gov.au/nla.obj-961623531   \n",
       "\n",
       "                                      work_url       date  \\\n",
       "2314    https://trove.nla.gov.au/work/14409767       1905   \n",
       "2315    https://trove.nla.gov.au/work/14409773       1905   \n",
       "2471    https://trove.nla.gov.au/work/14626171       1905   \n",
       "2467    https://trove.nla.gov.au/work/14626082       1905   \n",
       "2472    https://trove.nla.gov.au/work/14626181       1905   \n",
       "...                                        ...        ...   \n",
       "4155    https://trove.nla.gov.au/work/16946369       1987   \n",
       "23268   https://trove.nla.gov.au/work/32952780       1973   \n",
       "7405    https://trove.nla.gov.au/work/19937156       1973   \n",
       "132     https://trove.nla.gov.au/work/10384783       1979   \n",
       "3339   https://trove.nla.gov.au/work/159335939  1891-1979   \n",
       "\n",
       "                                                creators  \\\n",
       "2314              Western Australia. Department of Mines   \n",
       "2315              Western Australia. Department of Mines   \n",
       "2471              Western Australia. Department of Mines   \n",
       "2467              Western Australia. Department of Mines   \n",
       "2472              Western Australia. Department of Mines   \n",
       "...                                                  ...   \n",
       "4155                       Geological Survey of Tasmania   \n",
       "23268             Beard, J. S. (John Stanley), 1916-2011   \n",
       "7405              Beard, J. S. (John Stanley), 1916-2011   \n",
       "132    Western Australia. Department of Lands and Sur...   \n",
       "3339   Western Australia. Department of Lands and Sur...   \n",
       "\n",
       "                                             publication  \\\n",
       "2314   [South Kensington : Science Museum Library, 19...   \n",
       "2315   [South Kensington : Science Museum Library, 19...   \n",
       "2471            [London : Science Museum Library, 1905?]   \n",
       "2467            [London : Science Museum Library, 1905?]   \n",
       "2472            [London : Science Museum Library, 1905?]   \n",
       "...                                                  ...   \n",
       "4155                      Hobart, Tas. : The Dept., 1987   \n",
       "23268                  Perth : Vegmap Publications, 1973   \n",
       "7405                   Perth : Vegmap Publications, 1973   \n",
       "132      Perth (W.A.) : Dept. of Lands and Surveys, 1979   \n",
       "3339     Perth (W.A.) : Dept. of Lands and Surveys, 1979   \n",
       "\n",
       "                                                  extent  \\\n",
       "2314                                 1 map ; 65 x 99 cm.   \n",
       "2315                                 1 map ; 65 x 99 cm.   \n",
       "2471                                 1 map ; 65 x 98 cm.   \n",
       "2467                                 1 map ; 65 x 98 cm.   \n",
       "2472                                 1 map ; 65 x 98 cm.   \n",
       "...                                                  ...   \n",
       "4155    1 map : col. ; 56 x 84 cm. on sheet 69 x 107 cm.   \n",
       "23268  1 map ; 47 x 59 cm., on sheet 59 x 70 cm. + 1 ...   \n",
       "7405   1 map ; 47 x 59 cm., on sheet 59 x 70 cm. + 1 ...   \n",
       "132                           1 map : col. ; 55 x 50 cm.   \n",
       "3339                          1 map : col. ; 55 x 50 cm.   \n",
       "\n",
       "               copyright_status                 scale  \\\n",
       "2314           Out of Copyright  Scale [ca. 1:31 680]   \n",
       "2315           Out of Copyright  Scale [ca. 1:31 680]   \n",
       "2471   Edition Out of Copyright  Scale [ca. 1:15 840]   \n",
       "2467   Edition Out of Copyright  Scale [ca. 1:15 840]   \n",
       "2472   Edition Out of Copyright  Scale [ca. 1:15 840]   \n",
       "...                         ...                   ...   \n",
       "4155               In Copyright        Scale 1:50 000   \n",
       "23268              In Copyright       Scale 1:250,000   \n",
       "7405               In Copyright       Scale 1:250,000   \n",
       "132                In Copyright        Scale 1:25,000   \n",
       "3339               In Copyright        Scale 1:25,000   \n",
       "\n",
       "                                             coordinates filesize_string  \\\n",
       "2314                          (E 122°10'30\"/S 28°49'00\")          1.19GB   \n",
       "2315                          (E 122°10'30\"/S 28°49'00\")          1.19GB   \n",
       "2471                                (E 121°09'/S 30°57')          1.14GB   \n",
       "2467                                (E 121°09'/S 30°57')          1.14GB   \n",
       "2472                                (E 121°09'/S 30°57')          1.20GB   \n",
       "...                                                  ...             ...   \n",
       "4155           (E 148⁰00'--E 148⁰30'/S 41⁰15'--S 41⁰30')          1.16GB   \n",
       "23268         (E 123°00ʹ--E 124°30ʹ/S 33°00ʹ--S 34°00ʹ).        748.65MB   \n",
       "7405          (E 123°00ʹ--E 124°30ʹ/S 33°00ʹ--S 34°00ʹ).        748.65MB   \n",
       "132    (E 115°45ʹ00ʺ--E 115°52ʹ30ʺ/S 32°15ʹ00ʺ--S 32°...        676.20MB   \n",
       "3339   (E 115°45ʹ00ʺ--E 115°52ʹ30ʺ/S 32°15ʹ00ʺ--S 32°...        676.20MB   \n",
       "\n",
       "         filesize  width  height copy_role           mb  \n",
       "2314   1281565428  25062   17045         m  1222.196033  \n",
       "2315   1281565428  25062   17045         m  1222.196033  \n",
       "2471   1228526852  23731   17256         m  1171.614506  \n",
       "2467   1228526852  23731   17256         m  1171.614506  \n",
       "2472   1287582536  25084   17110         m  1227.934395  \n",
       "...           ...    ...     ...       ...          ...  \n",
       "4155   1245749060  25436   16325         m  1188.038883  \n",
       "23268   785014388  18087   14467         m    748.64806  \n",
       "7405    785014388  18087   14467         m    748.64806  \n",
       "132     709045908  14078   16788         m   676.198872  \n",
       "3339    709045908  14078   16788         m   676.198872  \n",
       "\n",
       "[398 rows x 17 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df.duplicated([\"url\"], keep=False)].sort_values(\"url\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many of the maps have high-resolution downloads?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(30738, 16)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df[\"filesize\"].notnull()].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are the `copy_role` values?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "copy_role\n",
       "m    30344\n",
       "i      364\n",
       "o       30\n",
       "Name: count, dtype: Int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"copy_role\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How much map data is available for download?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "14.41TB\n"
     ]
    }
   ],
   "source": [
    "size, unit = format_bytes(df[\"filesize\"].sum())\n",
    "print(f\"{size:.2f}{unit}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What's the copyright status of the maps?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "copyright_status\n",
       "Out of Copyright            25281\n",
       "In Copyright                 8618\n",
       "Edition Out of Copyright      631\n",
       "Copyright Undetermined        349\n",
       "Copyright Uncertain           110\n",
       "Unknown                        22\n",
       "Edition In Copyright            4\n",
       "Name: count, dtype: Int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"copyright_status\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's show the copyright status as a chart..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<style>\n",
       "  #altair-viz-0f28f62f2e3d4168a2e899f6de936684.vega-embed {\n",
       "    width: 100%;\n",
       "    display: flex;\n",
       "  }\n",
       "\n",
       "  #altair-viz-0f28f62f2e3d4168a2e899f6de936684.vega-embed details,\n",
       "  #altair-viz-0f28f62f2e3d4168a2e899f6de936684.vega-embed details summary {\n",
       "    position: relative;\n",
       "  }\n",
       "</style>\n",
       "<div id=\"altair-viz-0f28f62f2e3d4168a2e899f6de936684\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-0f28f62f2e3d4168a2e899f6de936684\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-0f28f62f2e3d4168a2e899f6de936684\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-b3969833ee7f0df3e915c487c4ea8dbc\"}, \"mark\": {\"type\": \"bar\"}, \"encoding\": {\"tooltip\": {\"field\": \"count\", \"type\": \"quantitative\"}, \"x\": {\"field\": \"count\", \"type\": \"quantitative\"}, \"y\": {\"field\": \"status\", \"type\": \"nominal\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.17.0.json\", \"datasets\": {\"data-b3969833ee7f0df3e915c487c4ea8dbc\": [{\"status\": \"Out of Copyright\", \"count\": 25281}, {\"status\": \"In Copyright\", \"count\": 8618}, {\"status\": \"Edition Out of Copyright\", \"count\": 631}, {\"status\": \"Copyright Undetermined\", \"count\": 349}, {\"status\": \"Copyright Uncertain\", \"count\": 110}, {\"status\": \"Unknown\", \"count\": 22}, {\"status\": \"Edition In Copyright\", \"count\": 4}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts = df[\"copyright_status\"].value_counts().to_frame().reset_index()\n",
    "counts.columns = [\"status\", \"count\"]\n",
    "alt.Chart(counts).mark_bar().encode(\n",
    "    y=\"status:N\", x=\"count\", tooltip=\"count\"\n",
    ").properties(height=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the sizes of the download files. To make this easier we'll divide the filesizes into ranges (bins) and count the number of files in each range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mb</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(0, 500]</td>\n",
       "      <td>16143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(500, 1000]</td>\n",
       "      <td>11454</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(1000, 1500]</td>\n",
       "      <td>2733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(1500, 2000]</td>\n",
       "      <td>311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>(2000, 3000]</td>\n",
       "      <td>84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>(3000, 3500]</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mb  count\n",
       "0      (0, 500]  16143\n",
       "1   (500, 1000]  11454\n",
       "2  (1000, 1500]   2733\n",
       "3  (1500, 2000]    311\n",
       "4  (2000, 3000]     84\n",
       "5  (3000, 3500]     12"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Convert bytes to megabytes (1MB = 2**20 bytes)\n",
    "df[\"mb\"] = df[\"filesize\"] / 2**10 / 2**10\n",
    "# Create bins (500MB wide, apart from the 2000-3000MB bin) and count the number of files in each bin\n",
    "sizes = (\n",
    "    pd.cut(df[\"mb\"], bins=[0, 500, 1000, 1500, 2000, 3000, 3500])\n",
    "    .value_counts()\n",
    "    .to_frame()\n",
    "    .reset_index()\n",
    ")\n",
    "sizes.columns = [\"mb\", \"count\"]\n",
    "# Convert intervals to strings for display in chart\n",
    "sizes[\"mb\"] = sizes[\"mb\"].astype(str)\n",
    "sizes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<style>\n",
       "  #altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b.vega-embed {\n",
       "    width: 100%;\n",
       "    display: flex;\n",
       "  }\n",
       "\n",
       "  #altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b.vega-embed details,\n",
       "  #altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b.vega-embed details summary {\n",
       "    position: relative;\n",
       "  }\n",
       "</style>\n",
       "<div id=\"altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-16d0f11e3ce24fdb8da003ea3ea5828b\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-15b2f253c0869bfa9f03e36395e6c3d9\"}, \"mark\": {\"type\": \"bar\"}, \"encoding\": {\"tooltip\": {\"field\": \"count\", \"type\": \"quantitative\"}, \"x\": {\"field\": \"mb\", \"sort\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"count\", \"type\": \"quantitative\"}}, \"width\": 400, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.17.0.json\", \"datasets\": {\"data-15b2f253c0869bfa9f03e36395e6c3d9\": [{\"mb\": \"(0, 500]\", \"count\": 16143}, {\"mb\": \"(500, 1000]\", \"count\": 11454}, {\"mb\": \"(1000, 1500]\", \"count\": 2733}, {\"mb\": \"(1500, 2000]\", \"count\": 311}, {\"mb\": \"(2000, 3000]\", \"count\": 84}, {\"mb\": \"(3000, 3500]\", \"count\": 12}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(sizes).mark_bar().encode(\n",
    "    x=alt.X(\"mb:N\", sort=None), y=\"count:Q\", tooltip=\"count:Q\"\n",
    ").properties(width=400)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So while just over half of the files are 500MB or smaller, more than 11,000 are between 500MB and 1,000MB!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What's the biggest file available for download?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "identifier                                         nla.obj-2458846831\n",
       "title               Geologic map of the Arabian Peninsula / compil...\n",
       "url                             https://nla.gov.au/nla.obj-2458846831\n",
       "work_url                       https://trove.nla.gov.au/work/12332257\n",
       "date                                                             1963\n",
       "creators                                     Geological Survey (U.S.)\n",
       "publication                       Washington, D.C. : The Survey, 1963\n",
       "extent                                   1 map : col. ; 113 x 132 cm.\n",
       "copyright_status                                         In Copyright\n",
       "scale                                             Scale 1:2,000,000 ;\n",
       "coordinates                               (E 34°--E 61°/N 32°--N 12°)\n",
       "filesize_string                                                3.64GB\n",
       "filesize                                                   3907679404\n",
       "width                                                           43211\n",
       "height                                                          30144\n",
       "copy_role                                                           m\n",
       "mb                                                        3726.653484\n",
       "Name: 1536, dtype: object"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df[\"filesize\"].idxmax()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's list all the downloads larger than 3GB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>identifier</th>\n",
       "      <th>title</th>\n",
       "      <th>url</th>\n",
       "      <th>work_url</th>\n",
       "      <th>date</th>\n",
       "      <th>creators</th>\n",
       "      <th>publication</th>\n",
       "      <th>extent</th>\n",
       "      <th>copyright_status</th>\n",
       "      <th>scale</th>\n",
       "      <th>coordinates</th>\n",
       "      <th>filesize_string</th>\n",
       "      <th>filesize</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>copy_role</th>\n",
       "      <th>mb</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1536</th>\n",
       "      <td>nla.obj-2458846831</td>\n",
       "      <td>Geologic map of the Arabian Peninsula / compil...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-2458846831</td>\n",
       "      <td>https://trove.nla.gov.au/work/12332257</td>\n",
       "      <td>1963</td>\n",
       "      <td>Geological Survey (U.S.)</td>\n",
       "      <td>Washington, D.C. : The Survey, 1963</td>\n",
       "      <td>1 map : col. ; 113 x 132 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:2,000,000 ;</td>\n",
       "      <td>(E 34°--E 61°/N 32°--N 12°)</td>\n",
       "      <td>3.64GB</td>\n",
       "      <td>3907679404</td>\n",
       "      <td>43211</td>\n",
       "      <td>30144</td>\n",
       "      <td>m</td>\n",
       "      <td>3726.653484</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2611</th>\n",
       "      <td>nla.obj-2567709383</td>\n",
       "      <td>Map of the coastal plain of British Guiana</td>\n",
       "      <td>https://nla.gov.au/nla.obj-2567709383</td>\n",
       "      <td>https://trove.nla.gov.au/work/152215030</td>\n",
       "      <td>1955</td>\n",
       "      <td>Bleackley, D. (David)</td>\n",
       "      <td>[S.l.] : Geological Survey of British Guiana, ...</td>\n",
       "      <td>1 map : col. ; 88 x 205 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale [ca. 1:143,000].</td>\n",
       "      <td>(W 60°00ʹ--W 57°00ʹ/N 9°00ʹ--N 6°00ʹ).</td>\n",
       "      <td>3.08GB</td>\n",
       "      <td>3305391052</td>\n",
       "      <td>49731</td>\n",
       "      <td>22155</td>\n",
       "      <td>m</td>\n",
       "      <td>3152.266552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5240</th>\n",
       "      <td>nla.obj-591001246</td>\n",
       "      <td>Map of the City of Rangoon and suburbs 1928-29...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-591001246</td>\n",
       "      <td>https://trove.nla.gov.au/work/182743876</td>\n",
       "      <td>1932</td>\n",
       "      <td>Geological Survey of India</td>\n",
       "      <td></td>\n",
       "      <td>1 map on 4 sheets : colour ; 154 x 126 cm, sheets</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale 1:12,000. 1 in. = 1000 ft.</td>\n",
       "      <td>(E 96°06ʹ--E 96°13ʹ/N 16°53ʹ--N 16°44ʹ).</td>\n",
       "      <td>3.38GB</td>\n",
       "      <td>3623879488</td>\n",
       "      <td>31769</td>\n",
       "      <td>38023</td>\n",
       "      <td>m</td>\n",
       "      <td>3456.000793</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6241</th>\n",
       "      <td>nla.obj-3009772762</td>\n",
       "      <td>Shqipëria, hartë fiziko-politike : shkalla 1...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-3009772762</td>\n",
       "      <td>https://trove.nla.gov.au/work/191812727</td>\n",
       "      <td>1965</td>\n",
       "      <td>Samimi, Ergjin</td>\n",
       "      <td></td>\n",
       "      <td>1 map on 3 sheets : color ; 173 x 91 cm, sheet...</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:200,000. 1 cm to 2 km ;</td>\n",
       "      <td>(E 18°58ʹ--E 21°12ʹ/N 42°40ʹ--N 39°35ʹ).</td>\n",
       "      <td>3.04GB</td>\n",
       "      <td>3266078212</td>\n",
       "      <td>23106</td>\n",
       "      <td>47117</td>\n",
       "      <td>m</td>\n",
       "      <td>3114.774906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7871</th>\n",
       "      <td>nla.obj-568387103</td>\n",
       "      <td>Peta geologi teknik daerah Jakarta - Bogor = E...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-568387103</td>\n",
       "      <td>https://trove.nla.gov.au/work/20208553</td>\n",
       "      <td>1970</td>\n",
       "      <td>Indonesia. Direktorat Geologi</td>\n",
       "      <td></td>\n",
       "      <td>1 map : colour ; 157 x 107 cm</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:50,000</td>\n",
       "      <td>(E 106°33'00\"--E 106°59'00\"/S 5°59'00\"--S 6°38...</td>\n",
       "      <td>3.05GB</td>\n",
       "      <td>3279210576</td>\n",
       "      <td>26384</td>\n",
       "      <td>41429</td>\n",
       "      <td>m</td>\n",
       "      <td>3127.298904</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8362</th>\n",
       "      <td>nla.obj-400826638</td>\n",
       "      <td>Nyūginia-tō zenzu / Taiwan Sōtokufu Gaijibu...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-400826638</td>\n",
       "      <td>https://trove.nla.gov.au/work/205481810</td>\n",
       "      <td>1942</td>\n",
       "      <td>Taiwan</td>\n",
       "      <td></td>\n",
       "      <td>1 map on 4 sheets : colour ; 172 x 99 cm</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale 1:5,000,000 ;</td>\n",
       "      <td>(E 126°00ʹ--E 156°00ʹ/N 4°00ʹ--S 12°00ʹ).</td>\n",
       "      <td>3.04GB</td>\n",
       "      <td>3264456500</td>\n",
       "      <td>42659</td>\n",
       "      <td>25508</td>\n",
       "      <td>m</td>\n",
       "      <td>3113.228321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11917</th>\n",
       "      <td>nla.obj-568387099</td>\n",
       "      <td>Geological map of Djawa and Madura / compiled ...</td>\n",
       "      <td>http://nla.gov.au/nla.obj-568387099</td>\n",
       "      <td>https://trove.nla.gov.au/work/218208895</td>\n",
       "      <td>1963</td>\n",
       "      <td>Indonesia. Direktorat Geologi</td>\n",
       "      <td></td>\n",
       "      <td>1 map : colour ; 78 x 216 cm.</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:500,000</td>\n",
       "      <td>(E 104°58ʹ28ʺ--E 113°98ʹ28ʺ/S 5°30ʹ00ʺ--S 9°00...</td>\n",
       "      <td>3.08GB</td>\n",
       "      <td>3311801600</td>\n",
       "      <td>52593</td>\n",
       "      <td>20990</td>\n",
       "      <td>m</td>\n",
       "      <td>3158.380127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14886</th>\n",
       "      <td>nla.obj-1954049619</td>\n",
       "      <td>A new chart of the South Pacific Ocean, includ...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-1954049619</td>\n",
       "      <td>https://trove.nla.gov.au/work/237421392</td>\n",
       "      <td>1849-1857</td>\n",
       "      <td>James Imray and Son</td>\n",
       "      <td></td>\n",
       "      <td>1 map ; 96.4 x 183.0 cm</td>\n",
       "      <td>Edition Out of Copyright</td>\n",
       "      <td>Scale approximately 1:11,000,000 at the equator</td>\n",
       "      <td>(E 111°--W 60°/N 20°--S 60°).</td>\n",
       "      <td>3.00GB</td>\n",
       "      <td>3223026784</td>\n",
       "      <td>44606</td>\n",
       "      <td>24085</td>\n",
       "      <td>m</td>\n",
       "      <td>3073.717865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15311</th>\n",
       "      <td>nla.obj-2618718155</td>\n",
       "      <td>Proposed plan for the site for the federal cap...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-2618718155</td>\n",
       "      <td>https://trove.nla.gov.au/work/239126400</td>\n",
       "      <td>1911</td>\n",
       "      <td>Wilson, George, died 1923</td>\n",
       "      <td></td>\n",
       "      <td>1 map : colour ; 141 x 141 cm</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale 1:4,800 ;</td>\n",
       "      <td>(E 149°08'/S 35°18').</td>\n",
       "      <td>3.12GB</td>\n",
       "      <td>3344969196</td>\n",
       "      <td>33600</td>\n",
       "      <td>33184</td>\n",
       "      <td>m</td>\n",
       "      <td>3190.011211</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15477</th>\n",
       "      <td>nla.obj-2824965115</td>\n",
       "      <td>Map of the mandated territory of New Guinea / ...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-2824965115</td>\n",
       "      <td>https://trove.nla.gov.au/work/239997009</td>\n",
       "      <td>1925</td>\n",
       "      <td>Krahe, R. E.</td>\n",
       "      <td></td>\n",
       "      <td>1 map : transparent architectural linen ; 210 ...</td>\n",
       "      <td>In Copyright</td>\n",
       "      <td>Scale 1:1,000,000</td>\n",
       "      <td>(E 140°50'00\"--E 159°41'00\"/S 0°33'00\"--S 11°5...</td>\n",
       "      <td>3.37GB</td>\n",
       "      <td>3622362060</td>\n",
       "      <td>53028</td>\n",
       "      <td>22770</td>\n",
       "      <td>m</td>\n",
       "      <td>3454.553661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34455</th>\n",
       "      <td>nla.obj-230705067</td>\n",
       "      <td>Plan shewing pastoral leases and claims in the...</td>\n",
       "      <td>https://nla.gov.au/nla.obj-230705067</td>\n",
       "      <td>https://trove.nla.gov.au/work/8818311</td>\n",
       "      <td>1885-1950</td>\n",
       "      <td>South Australia. Surveyor-General's Office</td>\n",
       "      <td>Adelaide : Surveyor General's Office, 1885</td>\n",
       "      <td>1 map on 3 sheets ; 169.1 x 99.7 cm., on sheet...</td>\n",
       "      <td>Out of Copyright</td>\n",
       "      <td>Scale [1:1 000 000]. 16 miles to 1 inch.</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>3.08GB</td>\n",
       "      <td>3308608288</td>\n",
       "      <td>25576</td>\n",
       "      <td>43121</td>\n",
       "      <td>m</td>\n",
       "      <td>3155.334747</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               identifier                                              title  \\\n",
       "1536   nla.obj-2458846831  Geologic map of the Arabian Peninsula / compil...   \n",
       "2611   nla.obj-2567709383         Map of the coastal plain of British Guiana   \n",
       "5240    nla.obj-591001246  Map of the City of Rangoon and suburbs 1928-29...   \n",
       "6241   nla.obj-3009772762  Shqipëria, hartë fiziko-politike : shkalla 1...   \n",
       "7871    nla.obj-568387103  Peta geologi teknik daerah Jakarta - Bogor = E...   \n",
       "8362    nla.obj-400826638  Nyūginia-tō zenzu / Taiwan Sōtokufu Gaijibu...   \n",
       "11917   nla.obj-568387099  Geological map of Djawa and Madura / compiled ...   \n",
       "14886  nla.obj-1954049619  A new chart of the South Pacific Ocean, includ...   \n",
       "15311  nla.obj-2618718155  Proposed plan for the site for the federal cap...   \n",
       "15477  nla.obj-2824965115  Map of the mandated territory of New Guinea / ...   \n",
       "34455   nla.obj-230705067  Plan shewing pastoral leases and claims in the...   \n",
       "\n",
       "                                         url  \\\n",
       "1536   https://nla.gov.au/nla.obj-2458846831   \n",
       "2611   https://nla.gov.au/nla.obj-2567709383   \n",
       "5240     http://nla.gov.au/nla.obj-591001246   \n",
       "6241   https://nla.gov.au/nla.obj-3009772762   \n",
       "7871     http://nla.gov.au/nla.obj-568387103   \n",
       "8362     http://nla.gov.au/nla.obj-400826638   \n",
       "11917    http://nla.gov.au/nla.obj-568387099   \n",
       "14886  https://nla.gov.au/nla.obj-1954049619   \n",
       "15311  https://nla.gov.au/nla.obj-2618718155   \n",
       "15477  https://nla.gov.au/nla.obj-2824965115   \n",
       "34455   https://nla.gov.au/nla.obj-230705067   \n",
       "\n",
       "                                      work_url       date  \\\n",
       "1536    https://trove.nla.gov.au/work/12332257       1963   \n",
       "2611   https://trove.nla.gov.au/work/152215030       1955   \n",
       "5240   https://trove.nla.gov.au/work/182743876       1932   \n",
       "6241   https://trove.nla.gov.au/work/191812727       1965   \n",
       "7871    https://trove.nla.gov.au/work/20208553       1970   \n",
       "8362   https://trove.nla.gov.au/work/205481810       1942   \n",
       "11917  https://trove.nla.gov.au/work/218208895       1963   \n",
       "14886  https://trove.nla.gov.au/work/237421392  1849-1857   \n",
       "15311  https://trove.nla.gov.au/work/239126400       1911   \n",
       "15477  https://trove.nla.gov.au/work/239997009       1925   \n",
       "34455    https://trove.nla.gov.au/work/8818311  1885-1950   \n",
       "\n",
       "                                         creators  \\\n",
       "1536                     Geological Survey (U.S.)   \n",
       "2611                        Bleackley, D. (David)   \n",
       "5240                   Geological Survey of India   \n",
       "6241                               Samimi, Ergjin   \n",
       "7871                Indonesia. Direktorat Geologi   \n",
       "8362                                       Taiwan   \n",
       "11917               Indonesia. Direktorat Geologi   \n",
       "14886                         James Imray and Son   \n",
       "15311                   Wilson, George, died 1923   \n",
       "15477                                Krahe, R. E.   \n",
       "34455  South Australia. Surveyor-General's Office   \n",
       "\n",
       "                                             publication  \\\n",
       "1536                 Washington, D.C. : The Survey, 1963   \n",
       "2611   [S.l.] : Geological Survey of British Guiana, ...   \n",
       "5240                                                       \n",
       "6241                                                       \n",
       "7871                                                       \n",
       "8362                                                       \n",
       "11917                                                      \n",
       "14886                                                      \n",
       "15311                                                      \n",
       "15477                                                      \n",
       "34455         Adelaide : Surveyor General's Office, 1885   \n",
       "\n",
       "                                                  extent  \\\n",
       "1536                        1 map : col. ; 113 x 132 cm.   \n",
       "2611                         1 map : col. ; 88 x 205 cm.   \n",
       "5240   1 map on 4 sheets : colour ; 154 x 126 cm, sheets   \n",
       "6241   1 map on 3 sheets : color ; 173 x 91 cm, sheet...   \n",
       "7871                       1 map : colour ; 157 x 107 cm   \n",
       "8362            1 map on 4 sheets : colour ; 172 x 99 cm   \n",
       "11917                      1 map : colour ; 78 x 216 cm.   \n",
       "14886                            1 map ; 96.4 x 183.0 cm   \n",
       "15311                      1 map : colour ; 141 x 141 cm   \n",
       "15477  1 map : transparent architectural linen ; 210 ...   \n",
       "34455  1 map on 3 sheets ; 169.1 x 99.7 cm., on sheet...   \n",
       "\n",
       "               copyright_status  \\\n",
       "1536               In Copyright   \n",
       "2611               In Copyright   \n",
       "5240           Out of Copyright   \n",
       "6241               In Copyright   \n",
       "7871               In Copyright   \n",
       "8362           Out of Copyright   \n",
       "11917              In Copyright   \n",
       "14886  Edition Out of Copyright   \n",
       "15311          Out of Copyright   \n",
       "15477              In Copyright   \n",
       "34455          Out of Copyright   \n",
       "\n",
       "                                                 scale  \\\n",
       "1536                               Scale 1:2,000,000 ;   \n",
       "2611                            Scale [ca. 1:143,000].   \n",
       "5240                  Scale 1:12,000. 1 in. = 1000 ft.   \n",
       "6241                   Scale 1:200,000. 1 cm to 2 km ;   \n",
       "7871                                    Scale 1:50,000   \n",
       "8362                               Scale 1:5,000,000 ;   \n",
       "11917                                  Scale 1:500,000   \n",
       "14886  Scale approximately 1:11,000,000 at the equator   \n",
       "15311                                  Scale 1:4,800 ;   \n",
       "15477                                Scale 1:1,000,000   \n",
       "34455         Scale [1:1 000 000]. 16 miles to 1 inch.   \n",
       "\n",
       "                                             coordinates filesize_string  \\\n",
       "1536                         (E 34°--E 61°/N 32°--N 12°)          3.64GB   \n",
       "2611              (W 60°00ʹ--W 57°00ʹ/N 9°00ʹ--N 6°00ʹ).          3.08GB   \n",
       "5240            (E 96°06ʹ--E 96°13ʹ/N 16°53ʹ--N 16°44ʹ).          3.38GB   \n",
       "6241            (E 18°58ʹ--E 21°12ʹ/N 42°40ʹ--N 39°35ʹ).          3.04GB   \n",
       "7871   (E 106°33'00\"--E 106°59'00\"/S 5°59'00\"--S 6°38...          3.05GB   \n",
       "8362           (E 126°00ʹ--E 156°00ʹ/N 4°00ʹ--S 12°00ʹ).          3.04GB   \n",
       "11917  (E 104°58ʹ28ʺ--E 113°98ʹ28ʺ/S 5°30ʹ00ʺ--S 9°00...          3.08GB   \n",
       "14886                      (E 111°--W 60°/N 20°--S 60°).          3.00GB   \n",
       "15311                              (E 149°08'/S 35°18').          3.12GB   \n",
       "15477  (E 140°50'00\"--E 159°41'00\"/S 0°33'00\"--S 11°5...          3.37GB   \n",
       "34455                                               <NA>          3.08GB   \n",
       "\n",
       "         filesize  width  height copy_role           mb  \n",
       "1536   3907679404  43211   30144         m  3726.653484  \n",
       "2611   3305391052  49731   22155         m  3152.266552  \n",
       "5240   3623879488  31769   38023         m  3456.000793  \n",
       "6241   3266078212  23106   47117         m  3114.774906  \n",
       "7871   3279210576  26384   41429         m  3127.298904  \n",
       "8362   3264456500  42659   25508         m  3113.228321  \n",
       "11917  3311801600  52593   20990         m  3158.380127  \n",
       "14886  3223026784  44606   24085         m  3073.717865  \n",
       "15311  3344969196  33600   33184         m  3190.011211  \n",
       "15477  3622362060  53028   22770         m  3454.553661  \n",
       "34455  3308608288  25576   43121         m  3155.334747  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select maps whose files are larger than 3 GiB (filesize is in bytes)\n",
    "df.loc[df[\"filesize\"] / 2**30 > 3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The widest image?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "identifier                                          nla.obj-636346192\n",
       "title               Land status petroleum mining agreement in resp...\n",
       "url                               http://nla.gov.au/nla.obj-636346192\n",
       "work_url                      https://trove.nla.gov.au/work/230363372\n",
       "date                                                             1968\n",
       "creators                               Brunei Shell Petroleum Company\n",
       "publication                                                          \n",
       "extent                                            1 map ; 286 x 58 cm\n",
       "copyright_status                                         In Copyright\n",
       "scale                                                  Scale 1:10,000\n",
       "coordinates         (E 114°09ʹ53ʺ--E 114°23ʹ34ʺ/N 4°38ʹ42ʺ--N 4°32...\n",
       "filesize_string                                                2.80GB\n",
       "filesize                                                   3008938460\n",
       "width                                                           68453\n",
       "height                                                          14652\n",
       "copy_role                                                           m\n",
       "mb                                                        2869.547329\n",
       "Name: 13749, dtype: object"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# idxmax() returns an index label, so select the row with .loc\n",
    "df.loc[df[\"width\"].idxmax()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tallest image?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "identifier                                         nla.obj-2824964225\n",
       "title               Traverse of the Ramu River, navigated by the \"...\n",
       "url                             https://nla.gov.au/nla.obj-2824964225\n",
       "work_url                       https://trove.nla.gov.au/work/36757550\n",
       "date                                                        1921-1945\n",
       "creators                   Stanley, Evan R. (Evan Richard), 1885-1924\n",
       "publication                                                          \n",
       "extent                   1 map : on architectural linen ; 410 x 76 cm\n",
       "copyright_status                                         In Copyright\n",
       "scale                                                  Scale 1:31,760\n",
       "coordinates                  (E 144°35'--E 144°50'/S 4°01'--S 5°11').\n",
       "filesize_string                                                2.85GB\n",
       "filesize                                                   3057135688\n",
       "width                                                           13840\n",
       "height                                                          73630\n",
       "copy_role                                                           m\n",
       "mb                                                        2915.511787\n",
       "Name: 31282, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# idxmax() returns an index label, so select the row with .loc\n",
    "df.loc[df[\"height\"].idxmax()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/).\n",
    "\n",
    "Work on this notebook was originally supported by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/).\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "rocrate": {
   "action": [
    {
     "description": "This dataset contains metadata describing digitised maps in Trove, harvested from the Trove API and other sources.",
     "isPartOf": "https://github.com/GLAM-Workbench/trove-maps-data",
     "mainEntityOfPage": "https://glam-workbench.net/trove-maps/single-maps-data/",
     "result": [
      {
       "url": "https://github.com/GLAM-Workbench/trove-maps-data/blob/main/single_maps_coordinates.csv"
      }
     ]
    }
   ],
   "author": [
    {
     "mainEntityOfPage": "https://timsherratt.au",
     "name": "Sherratt, Tim",
     "orcid": "https://orcid.org/0000-0001-7956-4498"
    }
   ],
   "description": "I knew there were lots of great maps you could download from Trove, but how many? And how big were the files? I thought I'd try to quantify this a bit by harvesting and analysing the metadata.",
   "mainEntityOfPage": "https://glam-workbench.net/trove-maps/exploring-digitised-maps/",
   "name": "Exploring digitised maps in Trove",
   "position": 0
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}