{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Get a random work from Trove\n", "\n", "Here's a way you can get a random work from Trove's `book`, `article`, `picture`, `map`, `music`, or `collection` zones. It generates a random work id prefix and performs a wildcard search using the `id` index. If the prefix returns no results, a digit is sliced off the end. If the prefix returns more than 100 results, a digit is added to the end. This continues until the search returns between 1 and 100 results, and a work is then selected at random from that result set.\n", "\n", "This method should also work with the `format` facet. However, the further you go down the format hierarchy, the smaller the slices, and the harder it will be to match a work id. You should certainly be able to get random works with specific top-level formats without any drama, for example a random thesis from the book zone.\n", "\n", "This method is probably not going to work for specific collections (i.e. those identified by a NUC id), or in combination with other search queries. Basically, the more you limit the pool of potential resources, the harder it will be to match on random work ids. In that case you might want to [try using facets](notebooks/random_work_by_facets.ipynb)."
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import random\n", "\n", "import requests\n", "from requests.adapters import HTTPAdapter\n", "from urllib3.util.retry import Retry\n", "\n", "# Retry requests that fail with temporary server errors\n", "s = requests.Session()\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n", "s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n", "s.mount(\"http://\", HTTPAdapter(max_retries=retries))\n", "\n", "API_URL = \"http://api.trove.nla.gov.au/v2/result\"" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%capture\n", "# Load variables from the .env file if it exists\n", "# Use %%capture to suppress messages\n", "%load_ext dotenv\n", "%dotenv" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Insert your Trove API key\n", "API_KEY = \"YOUR API KEY\"\n", "\n", "# Use api key value from environment variables if it is available\n", "if os.getenv(\"TROVE_API_KEY\"):\n", "    API_KEY = os.getenv(\"TROVE_API_KEY\")" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "def get_random_work_from_zone(zone, work_format=None):\n", "    total = 0\n", "    params = {\"zone\": zone, \"encoding\": \"json\", \"n\": \"100\", \"key\": API_KEY}\n", "    if work_format:\n", "        params[\"l-format\"] = work_format\n", "    random_id = None\n", "    # Shuffled digits to append when a prefix matches too many works\n", "    random_sequence = list(range(0, 10))\n", "    random.shuffle(random_sequence)\n", "    pos = 0\n", "    # Keep adjusting the prefix until it matches between 1 and 100 works\n", "    while total == 0 or total > 100:\n", "        if total == 0 and random_id is None:\n", "            random_id = str(random.randrange(10000, 100000))\n", "        elif total == 0:\n", "            # No results: shorten the prefix, or start again if it is already short\n", "            if len(random_id) >= 4:\n", "                random_id = random_id[:-1]\n", "            else:\n", "                random_id = str(random.randrange(10000, 100000))\n", "        if total > 100 and pos < 10:\n", "            # Too many results: narrow the prefix by adding another digit\n", "            random_id = f\"{random_id}{random_sequence[pos]}\"\n", "            pos += 1\n", "        elif pos == 10:\n", "            # All ten digits have been tried: start again with a new prefix\n", "            random_id = str(random.randrange(10000, 100000))\n", "            pos = 0\n", "        params[\"q\"] = f\"id:{random_id}*\"\n", "        response = s.get(API_URL, params=params)\n", "        data = response.json()\n", "        total = int(data[\"response\"][\"zone\"][0][\"records\"][\"total\"])\n", "        # print(total)\n", "        # print(response.url)\n", "    return random.choice(data[\"response\"][\"zone\"][0][\"records\"][\"work\"])\n", "\n", "\n", "def get_random_work():\n", "    # Choose a zone at random, then get a random work from it\n", "    zone = random.choice([\"book\", \"article\", \"picture\", \"map\", \"music\", \"collection\"])\n", "    work = get_random_work_from_zone(zone)\n", "    return work" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random work from a random zone" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'id': '41455576',\n", " 'url': '/work/41455576',\n", " 'troveUrl': 'https://trove.nla.gov.au/work/41455576',\n", " 'title': 'Iutchib chombi',\n", " 'issued': 2010,\n", " 'type': ['Video'],\n", " 'holdingsCount': 0,\n", " 'versionCount': 1,\n", " 'hasCorrections': 'N',\n", " 'relevance': {'score': '6.0', 'value': 'very relevant'}}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_work()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random work from the `picture` zone\n", "\n", "You can specify one of `book`, `article`, `picture`, `map`, `music`, or `collection`. 
For example:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'id': '6547302',\n", " 'url': '/work/6547302',\n", " 'troveUrl': 'https://trove.nla.gov.au/work/6547302',\n", " 'title': 'Rockpool life of southern Australia / research and text by Harry Breidahl ; illustration and design by Alexis Beckett',\n", " 'contributor': ['Breidahl, Harry'],\n", " 'issued': 1990,\n", " 'type': ['Poster, chart, other'],\n", " 'holdingsCount': 1,\n", " 'versionCount': 1,\n", " 'hasCorrections': 'N',\n", " 'relevance': {'score': '6.0', 'value': 'very relevant'}}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_work_from_zone(\"picture\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random thesis from the `book` zone" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'id': '5465453',\n", " 'url': '/work/5465453',\n", " 'troveUrl': 'https://trove.nla.gov.au/work/5465453',\n", " 'title': 'Studies in colloid and polymer science',\n", " 'contributor': ['Chan, Derek Y. 
C'],\n", " 'issued': '1974-2018',\n", " 'type': ['Thesis'],\n", " 'holdingsCount': 3,\n", " 'versionCount': 2,\n", " 'hasCorrections': 'N',\n", " 'relevance': {'score': '7.0705104', 'value': 'very relevant'},\n", " 'identifier': [{'type': 'url',\n", " 'linktype': 'fulltext',\n", " 'value': 'http://hdl.handle.net/1885/139931'},\n", " {'type': 'url',\n", " 'linktype': 'thumbnail',\n", " 'value': 'https://openresearch-repository.anu.edu.au/bitstream/1885/139931/5/b10167766-Chan_D.pdf.jpg'}]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_work_from_zone(\"book\", work_format=\"Thesis\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speed test" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "430 ms ± 183 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "get_random_work()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" } }, "nbformat": 4, "nbformat_minor": 4 }