{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Beyond the copyright cliff of death\n", "\n", "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "from datetime import datetime\n", "\n", "import pandas as pd\n", "import requests\n", "from dotenv import load_dotenv\n", "from IPython.display import FileLink, display\n", "\n", "load_dotenv()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Insert your Trove API key\n", "API_KEY = \"YOUR API KEY\"\n", "\n", "# Use the API key from the environment variables if it is available\n", "if os.getenv(\"TROVE_API_KEY\"):\n", "    API_KEY = os.getenv(\"TROVE_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search for articles published after 1954\n", "\n", "First we're going to run a date query to find all the articles published after 1954. But instead of looking at the articles themselves, we're going to get the `title` facet – this will tell us the number of articles for each newspaper."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "params = {\n", " \"q\": \"date:[1955 TO *]\", # date range query\n", " \"category\": \"newspaper\",\n", " \"l-artType\": \"newspaper\",\n", " \"facet\": \"title\", # get the newspaper facets\n", " \"encoding\": \"json\",\n", " \"n\": 0, # no articles thanks\n", " \"key\": API_KEY,\n", "}\n", "\n", "headers = {\"X-API-KEY\": API_KEY}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Make our API request\n", "response = requests.get(\n", " \"https://api.trove.nla.gov.au/v3/result\", params=params, headers=headers\n", ")\n", "data = response.json()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Get the facet data\n", "facets = data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>number_of_articles</th><th>id</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2567488</td><td>11</td></tr>\n",
"    <tr><th>1</th><td>573658</td><td>1685</td></tr>\n",
"    <tr><th>2</th><td>489896</td><td>370</td></tr>\n",
"    <tr><th>3</th><td>417472</td><td>1376</td></tr>\n",
"    <tr><th>4</th><td>263618</td><td>1694</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " number_of_articles id\n", "0 2567488 11\n", "1 573658 1685\n", "2 489896 370\n", "3 417472 1376\n", "4 263618 1694" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert to a dataframe\n", "df_articles = pd.DataFrame(facets)\n", "# Get rid of some columns\n", "df_articles = df_articles[[\"count\", \"search\"]]\n", "# Rename columns\n", "df_articles.columns = [\"number_of_articles\", \"id\"]\n", "# Change id to string, so we can merge on it later\n", "df_articles[\"id\"] = df_articles[\"id\"].astype(\"str\")\n", "# Preview results\n", "df_articles.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Match the facets with newspapers\n", "\n", "As you can see from the data above, the `title` facet only gives us the identifier for a newspaper, not its title or date range. To get more information about each newspaper, we're going to get a list of newspapers from the Trove API and then merge the two datasets." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Get ALL the newspapers\n", "response = requests.get(\n", " \"https://api.trove.nla.gov.au/v3/newspaper/titles\",\n", " params={\"encoding\": \"json\"},\n", " headers=headers,\n", ")\n", "newspapers_data = response.json()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "newspapers = newspapers_data[\"newspaper\"]\n", "# Convert to a dataframe\n", "df_newspapers = pd.DataFrame(newspapers)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>number_of_articles</th><th>id</th><th>title</th><th>state</th><th>issn</th><th>troveUrl</th><th>startDate</th><th>endDate</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2567488</td><td>11</td><td>The Canberra Times (ACT : 1926 - 1995)</td><td>ACT</td><td>01576925</td><td>https://nla.gov.au/nla.news-title11</td><td>1926-09-03</td><td>1995-12-31</td></tr>\n",
"    <tr><th>1</th><td>573658</td><td>1685</td><td>The Australian Jewish News (Melbourne, Vic. : ...</td><td>Victoria</td><td>NDP00187</td><td>https://nla.gov.au/nla.news-title1685</td><td>1935-05-24</td><td>1999-12-24</td></tr>\n",
"    <tr><th>2</th><td>489896</td><td>370</td><td>Port Lincoln Times (SA : 1927 - 1988; 1992 - 2...</td><td>South Australia</td><td>13215272</td><td>https://nla.gov.au/nla.news-title370</td><td>1927-08-05</td><td>2002-12-31</td></tr>\n",
"    <tr><th>3</th><td>417472</td><td>1376</td><td>Papua New Guinea Post-Courier (Port Moresby : ...</td><td>International</td><td>22087427</td><td>https://nla.gov.au/nla.news-title1376</td><td>1969-06-30</td><td>1981-06-30</td></tr>\n",
"    <tr><th>4</th><td>263618</td><td>1694</td><td>The Australian Jewish Times (Sydney, NSW : 195...</td><td>New South Wales</td><td>NDP00196</td><td>https://nla.gov.au/nla.news-title1694</td><td>1953-10-16</td><td>1990-04-06</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " number_of_articles id \\\n", "0 2567488 11 \n", "1 573658 1685 \n", "2 489896 370 \n", "3 417472 1376 \n", "4 263618 1694 \n", "\n", " title state \\\n", "0 The Canberra Times (ACT : 1926 - 1995) ACT \n", "1 The Australian Jewish News (Melbourne, Vic. : ... Victoria \n", "2 Port Lincoln Times (SA : 1927 - 1988; 1992 - 2... South Australia \n", "3 Papua New Guinea Post-Courier (Port Moresby : ... International \n", "4 The Australian Jewish Times (Sydney, NSW : 195... New South Wales \n", "\n", " issn troveUrl startDate endDate \n", "0 01576925 https://nla.gov.au/nla.news-title11 1926-09-03 1995-12-31 \n", "1 NDP00187 https://nla.gov.au/nla.news-title1685 1935-05-24 1999-12-24 \n", "2 13215272 https://nla.gov.au/nla.news-title370 1927-08-05 2002-12-31 \n", "3 22087427 https://nla.gov.au/nla.news-title1376 1969-06-30 1981-06-30 \n", "4 NDP00196 https://nla.gov.au/nla.news-title1694 1953-10-16 1990-04-06 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge the two dataframes by doing a left join on the 'id' column\n", "df_newspapers_post54 = pd.merge(df_articles, df_newspapers, how=\"left\", on=\"id\")\n", "df_newspapers_post54.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "119" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many newspapers?\n", "df_newspapers_post54.shape[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "# Reorder columns and save as CSV\n", "csv_file = f\"newspapers_post_54_{datetime.now().strftime('%Y%m%d')}.csv\"\n", "df_newspapers_post54[\n", " [\n", " \"title\",\n", " \"state\",\n", " \"id\",\n", " \"startDate\",\n", " \"endDate\",\n", " \"issn\",\n", " 
\"number_of_articles\",\n", " \"troveUrl\",\n", " ]\n", "].to_csv(csv_file, index=False)\n", "# Display a link for easy download\n", "display(FileLink(csv_file))" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "rocrate": { "action": [ { "description": "CSV formatted dataset containing a list of digitised newspapers in Trove with articles published after 1954 (the copyright cliff of death).", "isPartOf": "https://github.com/GLAM-Workbench/trove-newspapers-data-post-54", "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/csv-newspapers-post-54/", "name": "Trove newspapers with articles published after 1954", "result": [ { "url": "https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/blob/main/newspapers_post_54.csv" } ], "workExample": [ { "name": "Explore in Datasette", "url": "https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/blob/v1.5/newspapers_post_54.csv" } ] } ], "author": [ { "mainEntityOfPage": "https://timsherratt.au", "name": "Sherratt, Tim", "orcid": "https://orcid.org/0000-0001-7956-4498" } ], "category": "Trove newspapers in context", "description": "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. 
Let's find out how many, and which newspapers they were published in.", "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/Beyond_the_copyright_cliff_of_death/", "name": "Beyond the copyright cliff of death", "position": 4 } }, "nbformat": 4, "nbformat_minor": 4 }