{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get a random work from Trove using queries and facets\n",
"\n",
"Here's another way you can get a random work from Trove's `book`, `article`, `picture`, `map`, `music`, or `collection` zones. This approach is particularly useful if you want to get a random result from a search, or want to apply a variety of facets. It's not as quick as [pinging random work ids at Trove](notebooks/random_work_by_id.ipynb), but it's more flexible.\n",
"\n",
"Basically this method gets all the available facets for a particular search. If the search has more than 100 results, it chooses one of the facets at random and applies it. It keeps doing this until the search returns less that 100 results. Then it chooses a work at random from the results. If you don't supply a query, it uses a random stop word to mix things up a bit.\n",
"\n",
"The problem with this approach is that facets can't always be extracted from records, and there's no way of finding records without a particular facet. For example, you can use the `year` facet to limit results to a particular year, but what about records that don't have a `year` value. Once you start using that facet, they're invisible. I'm worried that this will mean that certain parts of Trove will never be surfaced. It would of course be much better if Trove just supported random sorting so I didn't have to do all these stupid workarounds.\n",
"\n",
"Collection searches (ie using NUC identifiers) are particularly tricky, because items from a single collection can share very similar facet values. To try and limit the results in this sort of situation, I've provided a couple of extra parameters:\n",
"\n",
"* `add_word` – adds a random stopword to the query\n",
"* `add_number` – adds a random two digit number to the query (useful if the records use numeric identifiers)\n",
"\n",
"These can help increase the degree of randomness, but again I suspect some parts of collections will never be reached."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"import random\n",
"\n",
"import requests\n",
"from IPython.display import HTML, display\n",
"from requests.adapters import HTTPAdapter\n",
"from requests.packages.urllib3.util.retry import Retry\n",
"\n",
"s = requests.Session()\n",
"retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n",
"s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n",
"s.mount(\"http://\", HTTPAdapter(max_retries=retries))\n",
"\n",
"with open(\"stopwords.json\", \"r\") as json_file:\n",
" STOPWORDS = json.load(json_file)\n",
"\n",
"API_URL = \"http://api.trove.nla.gov.au/v2/result\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%capture\n",
"# Load variables from the .env file if it exists\n",
"# Use %%capture to suppress messages\n",
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Insert your Trove API key\n",
"API_KEY = \"YOUR API KEY\"\n",
"\n",
"# Use api key value from environment variables if it is available\n",
"if os.getenv(\"TROVE_API_KEY\"):\n",
" API_KEY = os.getenv(\"TROVE_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def get_facet_terms(terms):\n",
" \"\"\"\n",
" Get all the terms in a facet.\n",
" \"\"\"\n",
" facet_terms = []\n",
" for term in terms:\n",
" facet_terms.append(term[\"search\"])\n",
" if \"term\" in term:\n",
" facet_terms += get_facet_terms(term[\"term\"])\n",
" return facet_terms\n",
"\n",
"\n",
"def get_facets(data):\n",
" \"\"\"\n",
" Get the names/terms of facets available from a search.\n",
" \"\"\"\n",
" facets = []\n",
" for facet in data[\"response\"][\"zone\"][0][\"facets\"][\"facet\"]:\n",
" if facet[\"name\"][:3] != \"adv\" and facet[\"name\"] != \"decade\":\n",
" terms = get_facet_terms(facet[\"term\"])\n",
" facets.append({\"facet\": facet[\"name\"], \"terms\": terms})\n",
" return facets\n",
"\n",
"\n",
"def set_query(params, query=None, add_word=False, add_number=False):\n",
" \"\"\"\n",
" Add a 'q' value to the parameters, including random words and numbers if required.\n",
" \"\"\"\n",
" random_word = random.choice(STOPWORDS)\n",
" random_number = random.randrange(1, 100)\n",
" if query:\n",
" if add_word:\n",
" params[\"q\"] = f'{query} \"{random_word}\"'\n",
" elif add_number:\n",
" params[\"q\"] = f'{query} \"{random_number:02}\"'\n",
" else:\n",
" params[\"q\"] = query\n",
" else:\n",
" params[\"q\"] = f'\"{random_word}\"'\n",
" return params\n",
"\n",
"\n",
"def get_random_work_from_zone(zone, query, **kwargs):\n",
" total = 0\n",
" applied_facets = []\n",
" params = {\n",
" \"zone\": zone,\n",
" \"encoding\": \"json\",\n",
" # Keeping this at 0 until we've filtered the results speeds things up\n",
" \"n\": \"0\",\n",
" \"key\": API_KEY,\n",
" \"facet\": \"all\",\n",
" \"include\": \"links\",\n",
" }\n",
" params[\"q\"] = query\n",
" for key, value in kwargs.items():\n",
" params[f\"l-{key}\"] = value\n",
" applied_facets.append(key)\n",
" response = s.get(API_URL, params=params)\n",
" data = response.json()\n",
" total = int(data[\"response\"][\"zone\"][0][\"records\"][\"total\"])\n",
" facets = get_facets(data)\n",
" facets[:] = [f for f in facets if f.get(\"facet\") not in applied_facets]\n",
" # Keep going until we either have less than 100 results or we run out of facets\n",
" while total > 100 and len(facets) > 0:\n",
" # print(f'Facets: {len(facets)}')\n",
" # Select another facet\n",
" new_facet = random.choice(facets)\n",
" # Add it to the applied list\n",
" applied_facets.append(new_facet[\"facet\"])\n",
" # Add the new facet as a parameter\n",
" params[f'l-{new_facet[\"facet\"]}'] = random.choice(new_facet[\"terms\"])\n",
" # Get the new results\n",
" response = s.get(API_URL, params=params)\n",
" data = response.json()\n",
" # Get the facets available from the new search\n",
" facets = get_facets(data)\n",
" # Remove facets from the list that have already been applied\n",
" facets[:] = [f for f in facets if f.get(\"facet\") not in applied_facets]\n",
" total = int(data[\"response\"][\"zone\"][0][\"records\"][\"total\"])\n",
" # print(total)\n",
" # print(response.url)\n",
" if total > 0:\n",
" params[\"n\"] = \"100\"\n",
" # Cleaning up a bit\n",
" params.pop(\"facet\", None)\n",
" response = s.get(API_URL, params=params)\n",
" data = response.json()\n",
" work = random.choice(data[\"response\"][\"zone\"][0][\"records\"][\"work\"])\n",
" return work\n",
"\n",
"\n",
"def get_zones(data):\n",
" \"\"\"\n",
" Find which zones have results in them.\n",
" \"\"\"\n",
" zones = []\n",
" for zone in data[\"response\"][\"zone\"]:\n",
" if int(zone[\"records\"][\"total\"]) > 0:\n",
" zones.append(zone[\"name\"])\n",
" return zones\n",
"\n",
"\n",
"def get_random_work(zone=None, query=None, add_word=False, add_number=False, **kwargs):\n",
" tries = 0\n",
" zones = []\n",
" params = {\n",
" \"encoding\": \"json\",\n",
" \"n\": \"0\",\n",
" \"key\": API_KEY,\n",
" }\n",
" if zone:\n",
" params[\"zone\"] = zone\n",
" else:\n",
" params[\"zone\"] = \"book,article,picture,map,music,collection\"\n",
" params = set_query(params, query, add_word)\n",
" # Add any supplied facets\n",
" for key, value in kwargs.items():\n",
" params[f\"l-{key}\"] = value\n",
" # Make sure that at least some zones have results\n",
" while len(zones) == 0 and tries <= 10:\n",
" params = set_query(params, query, add_word, add_number)\n",
" response = s.get(API_URL, params=params)\n",
" # print(response.url)\n",
" data = response.json()\n",
" zones = get_zones(data)\n",
" tries += 1\n",
" if len(zones) > 0:\n",
" work = get_random_work_from_zone(\n",
" zone=random.choice(zones), query=params[\"q\"], **kwargs\n",
" )\n",
" return work"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get a work from Chinese-Australian Historical Images in Australia (CHIA)\n",
"\n",
"This is a collection were facets aren't terribly useful in slicing up the results because the range of values is very limited. However, items in this collection do have numeric identifiers, and so including the `add_number` parameter seems to help divide it up into chunks of less than 100."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '197894996',\n",
" 'url': '/work/197894996',\n",
" 'troveUrl': 'https://trove.nla.gov.au/work/197894996',\n",
" 'title': 'Chinatown, Darwin',\n",
" 'contributor': ['Jack Buscall'],\n",
" 'issued': '1939-1941',\n",
" 'type': ['Photograph'],\n",
" 'rights': 'Reproduction rights owned by the Northern Territory Library.',\n",
" 'holdingsCount': 1,\n",
" 'versionCount': 1,\n",
" 'hasCorrections': 'N',\n",
" 'relevance': {'score': '31.259758', 'value': 'very relevant'},\n",
" 'snippet': \". 32. Also titled: 'Darwin's Chinatown before World War 2' Copyprint Location: Cavenagh Street, Bennett\",\n",
" 'identifier': [{'type': 'url',\n",
" 'linktype': 'fulltext',\n",
" 'value': 'http://www.chia.chinesemuseum.com.au/objects/D002314.htm'},\n",
" {'type': 'url',\n",
" 'linktype': 'thumbnail',\n",
" 'value': 'http://www.territorystories.nt.gov.au/bitstream/handle/10070/1855/04375.JPG.jpg'}]}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_random_work(query='(nuc:\"VMUS:CHIA\")', add_number=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get a photo with a thumbnail\n",
"\n",
"Using the new `imageInd` parameter in the query to find records with thumbnails."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '238334520',\n",
" 'url': '/work/238334520',\n",
" 'troveUrl': 'https://trove.nla.gov.au/work/238334520',\n",
" 'title': 'Photograph - Post Card',\n",
" 'contributor': [\"Valentine's\"],\n",
" 'issued': 1908,\n",
" 'type': ['Photograph'],\n",
" 'rights': ['You may download, display, print or reproduce this image in an unaltered form and with acknowledgement to Phillip Island and District Historical Society Inc. for personal, educational and private research use. If you wish to use it for any other purposes you must obtain permission from Phillip Island and District Historical Society Inc..',\n",
" 'Attribution Non Commercial ShareAlike 3.0 Unported Creative Commons',\n",
" 'http://creativecommons.org/licenses/by-nc-sa/3.0/'],\n",
" 'holdingsCount': 1,\n",
" 'versionCount': 1,\n",
" 'hasCorrections': 'N',\n",
" 'relevance': {'score': '0.0059571834', 'value': 'vaguely relevant'},\n",
" 'snippet': ' Phillip Island\"\\nLetter written by Marie of Everton Cowes to Millie addressed to Mrs H. Blamey, \"Roslyn',\n",
" 'identifier': [{'type': 'url',\n",
" 'linktype': 'fulltext',\n",
" 'linktext': 'Explore further with Victorian Collections',\n",
" 'value': 'https://victoriancollections.net.au/items/56c400582162f10e68c9dea7'},\n",
" {'type': 'url',\n",
" 'linktype': 'thumbnail',\n",
" 'value': 'https://victoriancollections.net.au/media/collectors/4f729f5b97f83e0308601629/items/56c400582162f10e68c9dea7/item-media/5ee6cc6521ea671d3ca61101/item-130x0.jpg'}]}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_random_work(zone=\"picture\", q=\"imageInd:thumbnail\", format=\"Photograph\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get a work tagged 'Japan'\n",
"\n",
"You can include as many additional facets as you want. Here's an example using `publictag`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '6411867',\n",
" 'url': '/work/6411867',\n",
" 'troveUrl': 'https://trove.nla.gov.au/work/6411867',\n",
" 'title': 'The enigma of Japanese power : people and politics in a stateless nation / Karel van Wolferen',\n",
" 'contributor': ['Wolferen, Karel Van'],\n",
" 'issued': '1988-1993',\n",
" 'type': ['Book', 'Book/Illustrated', 'Audio book'],\n",
" 'isPartOf': {'type': 'series', 'value': 'Tut books'},\n",
" 'holdingsCount': 47,\n",
" 'versionCount': 12,\n",
" 'hasCorrections': 'N',\n",
" 'relevance': {'score': '0.0022870363', 'value': 'vaguely relevant'},\n",
" 'identifier': [{'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'source',\n",
" 'value': 'http://www.loc.gov/catdir/description/random048/89040552.html'},\n",
" {'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00wolf',\n",
" 'value': 'http://openlibrary.org/books/OL2062341M'},\n",
" {'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'HathiTrust Digital Library, Limited view (search only)',\n",
" 'value': 'http://catalog.hathitrust.org/api/volumes/oclc/19130854.html'},\n",
" {'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'Free eBook from the Internet Archive',\n",
" 'value': 'https://archive.org/details/enigmaofjapanese00wolf'},\n",
" {'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00kare',\n",
" 'value': 'http://openlibrary.org/books/OL16828742M'},\n",
" {'type': 'url',\n",
" 'linktype': 'restricted',\n",
" 'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00wolf_0',\n",
" 'value': 'http://openlibrary.org/books/OL2217099M'},\n",
" {'type': 'url',\n",
" 'linktype': 'notonline',\n",
" 'linktext': 'Publisher description',\n",
" 'value': 'http://www.loc.gov/catdir/description/random048/89040552.html'},\n",
" {'type': 'url',\n",
" 'linktype': 'notonline',\n",
" 'linktext': 'Additional information and access via Open Library',\n",
" 'value': 'https://openlibrary.org/books/OL2062341M'}]}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_random_work(publictag=\"Japan\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Display a random thumbnail\n",
"\n",
"Just to cheer myself up a bit..."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"record = get_random_work(zone=\"picture\", q=\"imageInd:thumbnail\", format=\"Photograph\")\n",
"for link in record[\"identifier\"]:\n",
" if link[\"linktype\"] == \"thumbnail\":\n",
" url = link[\"value\"]\n",
" break\n",
"display(HTML(f''))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Speed test"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.05 s ± 1.13 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"get_random_work()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"\n",
"Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}