{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Get a random newspaper article from Trove\n", "\n", "Changes to the Trove API mean that the techniques I've previously used to select resources at random [will no longer work](https://updates.timsherratt.org/2019/10/09/creators-and-users.html). This notebook provides one alternative.\n", "\n", "I wanted something that would work efficiently, but would also expose as much of the content as possible. Combining a randomly generated query with multiple facets seems to work well for reducing the result set to 100 articles or fewer (the maximum available from a single API call), from which one article can then be chosen at random. This should mean that *most* of the newspaper articles are reachable, though it's hard to say exactly how many.\n", "\n", "Thanks to Mitchell Harrop for [suggesting I could use randomly selected stopwords](https://twitter.com/mharropesquire/status/1182175315860213760) as queries. I've supplemented the stopwords with letters and digits, and together they seem to do a good job of applying an initial filter and mixing up the relevance ranking.\n", "\n", "As you can see from the examples below, you can supply any of the facets available in the newspaper zone as keyword arguments, for example: `state`, `title`, `year`, `illtype`, `category`."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import random\n", "import requests\n", "from requests.adapters import HTTPAdapter\n", "from urllib3.util.retry import Retry\n", "import json\n", "\n", "s = requests.Session()\n", "# Retry up to 5 times, with increasing delays, if a request fails or returns a 502, 503 or 504 error\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])\n", "s.mount('https://', HTTPAdapter(max_retries=retries))\n", "s.mount('http://', HTTPAdapter(max_retries=retries))\n", "\n", "# Load the stopwords used to generate random queries\n", "with open('stopwords.json', 'r') as json_file:\n", "    STOPWORDS = json.load(json_file)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "API_KEY = 'YOUR API KEY'\n", "API_URL = 'http://api.trove.nla.gov.au/v2/result'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_random_facet_value(params, facet):\n", "    '''\n", "    Get values for the supplied facet and choose one at random.\n", "    '''\n", "    these_params = params.copy()\n", "    these_params['facet'] = facet\n", "    response = s.get(API_URL, params=these_params)\n", "    data = response.json()\n", "    try:\n", "        values = [t['search'] for t in data['response']['zone'][0]['facets']['facet']['term']]\n", "    except (TypeError, KeyError):\n", "        # No usable terms were returned for this facet\n", "        return None\n", "    return random.choice(values)\n", "\n", "\n", "def get_total_results(params):\n", "    '''\n", "    Get the total number of results for the supplied parameters.\n", "    '''\n", "    response = s.get(API_URL, params=params)\n", "    data = response.json()\n", "    total = int(data['response']['zone'][0]['records']['total'])\n", "    return total\n", "\n", "\n", "def get_random_article(query=None, **kwargs):\n", "    '''\n", "    Get a random article.\n", "    The kwargs can be any of the available facets, such as 'state', 'title', 'illtype', 'year'.\n", "    Returns None if no matching articles can be found.\n", "    '''\n", "    total = 0\n", "    applied_facets = []\n", "    facets = ['month', 'year', 'decade', 'word', 'illustrated', 'category', 'title']\n", "    tries = 0\n", "    params = {\n", "        'zone': 'newspaper',\n", "        'encoding': 'json',\n", "        # Note that keeping n at 0 until we've filtered the result set speeds things up considerably\n", "        'n': '0',\n", "        # Uncomment these if you need more than the basic data\n", "        #'reclevel': 'full',\n", "        #'include': 'articleText',\n", "        'key': API_KEY\n", "    }\n", "    if query:\n", "        params['q'] = query\n", "    # If there's no query supplied then use a random stopword to mix up the results\n", "    else:\n", "        random_word = random.choice(STOPWORDS)\n", "        params['q'] = f'\"{random_word}\"'\n", "    # Apply any supplied facets\n", "    for key, value in kwargs.items():\n", "        params[f'l-{key}'] = value\n", "        applied_facets.append(key)\n", "    # Remove any facets that have already been applied from the list of available facets\n", "    facets[:] = [f for f in facets if f not in applied_facets]\n", "    total = get_total_results(params)\n", "    # If our randomly selected stopword has produced no results,\n", "    # keep trying new stopwords until we get some (give up after 10 tries)\n", "    while total == 0 and not query and tries < 10:\n", "        random_word = random.choice(STOPWORDS)\n", "        params['q'] = f'\"{random_word}\"'\n", "        total = get_total_results(params)\n", "        tries += 1\n", "    # Apply facets one at a time until we have 100 or fewer results, or we run out of facets\n", "    while total > 100 and len(facets) > 0:\n", "        # Get the next facet\n", "        facet = facets.pop()\n", "        # Set the facet to a randomly selected value\n", "        # (requests drops parameters whose value is None, so an unavailable facet is skipped)\n", "        params[f'l-{facet}'] = get_random_facet_value(params, facet)\n", "        total = get_total_results(params)\n", "    # If we've ended up with some results, then select one (of the first 100) at random\n", "    if total > 0:\n", "        params['n'] = '100'\n", "        response = s.get(API_URL, params=params)\n", "        data = response.json()\n", "        article = random.choice(data['response']['zone'][0]['records']['article'])\n", "        return article\n", "    return None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get any old article..."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '166455502',\n", " 'url': '/newspaper/166455502',\n", " 'heading': 'THE SOCIETY OF ARTS EXHIBITION. A GLANCE AT THE PICTURES.',\n", " 'category': 'Detailed lists, results, guides',\n", " 'title': {'id': '824',\n", " 'value': 'Quiz and the Lantern (Adelaide, SA : 1890 - 1900)'},\n", " 'date': '1896-07-02',\n", " 'page': 13,\n", " 'pageSequence': 13,\n", " 'relevance': {'score': '0.22012922', 'value': 'may have relevance'},\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/166455502?searchTerm=%22where%22'}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random article about pademelons" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '155441736',\n", " 'url': '/newspaper/155441736',\n", " 'heading': 'UNDER THE SOUTHERN CROSS I.—A LIVELY PRIMITIVE: A MISSION ON THE MACLEAY RIVER.',\n", " 'category': 'Article',\n", " 'title': {'id': '647', 'value': 'The Methodist (Sydney, NSW : 1892 - 1954)'},\n", " 'date': '1915-02-13',\n", " 'page': 3,\n", " 'pageSequence': 3,\n", " 'relevance': {'score': '10.578832', 'value': 'very relevant'},\n", " 'snippet': 'The Macleay River is situated about 250 miles to the north of Newcastle. The river is one of the most important between the Hunter and the Brisbane. 
One of our devoted local preachers from',\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/155441736?searchTerm=pademelon'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(query='pademelon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random article from Tasmania" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '169044787',\n", " 'url': '/newspaper/169044787',\n", " 'heading': 'SUPREME COURT SITTING IN BANCO. FRIDAY, 11th MAY, 1866. (Before their Honor Sir Valentine Fleming, Knight, Chief Justice, and Sir F. Smith, Knight, Puisne Judge.)',\n", " 'category': 'Detailed lists, results, guides',\n", " 'title': {'id': '865',\n", " 'value': 'Tasmanian Morning Herald (Hobart, Tas. : 1865 - 1866)'},\n", " 'date': '1866-05-12',\n", " 'page': 2,\n", " 'pageSequence': 2,\n", " 'relevance': {'score': '0.47627062', 'value': 'likely to be relevant'},\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/169044787?searchTerm=%22how%22'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(state='Tasmania')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random article from the _Sydney Morning Herald_" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '28366302',\n", " 'url': '/newspaper/28366302',\n", " 'heading': 'LONDON TELEGRAMS TO THE AMERICAN PRESS.',\n", " 'category': 'Article',\n", " 'title': {'id': '35',\n", " 'value': 'The Sydney Morning Herald (NSW : 1842 - 1954)'},\n", " 'date': '1885-04-10',\n", " 'page': 5,\n", " 'pageSequence': 5,\n", " 'relevance': {'score': '1.1776975', 'value': 'likely to be relevant'},\n", " 'snippet': 'London, February 27.—The Russian newspaper Svet, the organ of General Komaroff, asserts that there is a strong party at Herat 
who desire that Russian protection be extended over the city. The Journal de St. Petersburg',\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/28366302?searchTerm=%22ain%22'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(title='35', category='Article')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random illustrated article" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '99665791',\n", " 'url': '/newspaper/99665791',\n", " 'heading': \"IN A MENAGERIE. Stories of the Animals' Cleverness.\",\n", " 'category': 'Article',\n", " 'title': {'id': '414',\n", " 'value': 'Argyle Liberal and District Recorder (NSW : 1903 - 1930)'},\n", " 'date': '1907-12-20',\n", " 'page': 3,\n", " 'pageSequence': 3,\n", " 'relevance': {'score': '5.633662', 'value': 'very relevant'},\n", " 'snippet': '\"That elephant there,\" said George, the veteran head of a large menagerie, \"hasn\\'t got any more morals than—well, she\\'s just naturally bad.\"',\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/99665791?searchTerm=%22doesn%22'}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(illustrated='true')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random illustrated advertisement from the _Australian Women's Weekly_" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '53250867',\n", " 'url': '/newspaper/53250867',\n", " 'heading': 'Advertising',\n", " 'category': 'Advertising',\n", " 'title': {'id': '112',\n", " 'value': \"The Australian Women's Weekly (1933 - 1982)\"},\n", " 'date': '1980-06-18',\n", " 'page': 101,\n", " 'pageSequence': 101,\n", " 'relevance': {'score': '0.24747185', 'value': 'may have relevance'},\n", " 'troveUrl': 
'https://trove.nla.gov.au/ndp/del/article/53250867?searchTerm=%22had%22'}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(title='112', illustrated='true', category='Advertising')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random cartoon" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '75198942',\n", " 'url': '/newspaper/75198942',\n", " 'heading': \"Don't Tell Auntie An Hilarious Christmas-time Episode\",\n", " 'category': 'Article',\n", " 'title': {'id': '251', 'value': 'Sunshine Advocate (Vic. : 1924 - 1954)'},\n", " 'date': '1937-12-17',\n", " 'page': 1,\n", " 'pageSequence': '1 S',\n", " 'relevance': {'score': '1.9126042', 'value': 'likely to be relevant'},\n", " 'snippet': \"George Cunningham was not carrying his auntie's luggage; he was only directing the efforts of all the porters who were. At the head of the\",\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/75198942?searchTerm=%22hadn%22'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(illtype='Cartoon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random article from 1930" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '113686939',\n", " 'url': '/newspaper/113686939',\n", " 'heading': 'DID BUILDER ERR? President Townsend Outspoken',\n", " 'category': 'Article',\n", " 'title': {'id': '410',\n", " 'value': 'Gilgandra Weekly and Castlereagh (NSW : 1929 - 1942)'},\n", " 'date': '1930-11-20',\n", " 'page': 7,\n", " 'pageSequence': 7,\n", " 'relevance': {'score': '0.9030686', 'value': 'likely to be relevant'},\n", " 'snippet': '\"In this case it appears that the Council has been deliberately flouted,\" declared the President (Cr. E. V. 
Townsend) at',\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/113686939?searchTerm=%22not%22'}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(year='1930')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get a random article tagged 'poem'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '4759567',\n", " 'url': '/newspaper/4759567',\n", " 'heading': \"MIDNIGHT LAMENTATION OF HEKI'S WIFE.\",\n", " 'category': 'Article',\n", " 'title': {'id': '18', 'value': 'The Melbourne Argus (Vic. : 1846 - 1848)'},\n", " 'date': '1846-07-10',\n", " 'page': 4,\n", " 'pageSequence': 4,\n", " 'relevance': {'score': '0.59091616', 'value': 'likely to be relevant'},\n", " 'snippet': 'WRITTEN ON READING, IN THE MELBOURNE ARGUS OF TUESDAY, 30TH JUNE, AN ACCOUNT OF A VISIT TO THE NEW ZEALAND CHIEFS, HEKI AND KAWITI.',\n", " 'troveUrl': 'https://trove.nla.gov.au/ndp/del/article/4759567?searchTerm=%22g%22'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_random_article(publictag='poem')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speed test" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.49 s ± 742 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "get_random_article()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }