{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Beyond the copyright cliff of death\n", "\n", "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import pandas as pd\n", "import requests\n", "from IPython.display import FileLink, display" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Load variables from the .env file if it exists\n", "# Use %%capture to suppress messages\n", "%load_ext dotenv\n", "%dotenv" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Insert your Trove API key\n", "API_KEY = \"YOUR API KEY\"\n", "\n", "# Use api key value from environment variables if it is available\n", "if os.getenv(\"TROVE_API_KEY\"):\n", " API_KEY = os.getenv(\"TROVE_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search for articles published after 1955\n", "\n", "First we're going to run a date query to find all the articles published after 1954. But instead of looking at the articles themselves, we're going to get the `title` facet – this will tell us the number of articles for each newspaper." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "params = {\n", " \"q\": \"date:[1955 TO *]\", # date range query\n", " \"zone\": \"newspaper\",\n", " \"facet\": \"title\", # get the newspaper facets\n", " \"encoding\": \"json\",\n", " \"n\": 0, # no articles thanks\n", " \"key\": API_KEY,\n", "}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Make our API request\n", "response = requests.get(\"https://api.trove.nla.gov.au/v2/result\", params=params)\n", "data = response.json()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Get the facet data\n", "facets = data[\"response\"][\"zone\"][0][\"facets\"][\"facet\"][\"term\"]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
number_of_articlesid
0256748811
15736581685
24174721376
32636181694
4225466112
\n", "
" ], "text/plain": [ " number_of_articles id\n", "0 2567488 11\n", "1 573658 1685\n", "2 417472 1376\n", "3 263618 1694\n", "4 225466 112" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert to a dataframe\n", "df_articles = pd.DataFrame(facets)\n", "# Get rid of some columns\n", "df_articles = df_articles[[\"count\", \"display\"]]\n", "# Rename columns\n", "df_articles.columns = [\"number_of_articles\", \"id\"]\n", "# Change id to string, so we can merge on it later\n", "df_articles[\"id\"] = df_articles[\"id\"].astype(\"str\")\n", "# Preview results\n", "df_articles.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Match the facets with newspapers\n", "\n", "As you can see from the data above, the `title` facet only gives us the identifier for a newspaper, not its title or date range. To get more information about each newspaper, we're going to get a list of newspapers from the Trove API and then merge the two datasets." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Get ALL the newspapers\n", "response = requests.get(\n", " \"https://api.trove.nla.gov.au/v2/newspaper/titles\",\n", " params={\"encoding\": \"json\", \"key\": API_KEY},\n", ")\n", "newspapers_data = response.json()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "newspapers = newspapers_data[\"response\"][\"records\"][\"newspaper\"]\n", "# Convert to a dataframe\n", "df_newspapers = pd.DataFrame(newspapers)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
number_of_articlesidtitlestateissntroveUrlstartDateendDate
0256748811The Canberra Times (ACT : 1926 - 1995)ACT01576925https://trove.nla.gov.au/ndp/del/title/111926-09-031995-12-31
15736581685The Australian Jewish News (Melbourne, Vic. : ...VictoriaNDP00187https://trove.nla.gov.au/ndp/del/title/16851935-05-241999-12-24
24174721376Papua New Guinea Post-Courier (Port Moresby : ...International22087427https://trove.nla.gov.au/ndp/del/title/13761969-06-301981-06-30
32636181694The Australian Jewish Times (Sydney, NSW : 195...New South WalesNDP00196https://trove.nla.gov.au/ndp/del/title/16941953-10-161990-04-06
4225466112The Australian Women's Weekly (1933 - 1982)National00050458https://trove.nla.gov.au/ndp/del/title/1121933-06-101982-12-15
\n", "
" ], "text/plain": [ " number_of_articles id title \\\n", "0 2567488 11 The Canberra Times (ACT : 1926 - 1995) \n", "1 573658 1685 The Australian Jewish News (Melbourne, Vic. : ... \n", "2 417472 1376 Papua New Guinea Post-Courier (Port Moresby : ... \n", "3 263618 1694 The Australian Jewish Times (Sydney, NSW : 195... \n", "4 225466 112 The Australian Women's Weekly (1933 - 1982) \n", "\n", " state issn troveUrl \\\n", "0 ACT 01576925 https://trove.nla.gov.au/ndp/del/title/11 \n", "1 Victoria NDP00187 https://trove.nla.gov.au/ndp/del/title/1685 \n", "2 International 22087427 https://trove.nla.gov.au/ndp/del/title/1376 \n", "3 New South Wales NDP00196 https://trove.nla.gov.au/ndp/del/title/1694 \n", "4 National 00050458 https://trove.nla.gov.au/ndp/del/title/112 \n", "\n", " startDate endDate \n", "0 1926-09-03 1995-12-31 \n", "1 1935-05-24 1999-12-24 \n", "2 1969-06-30 1981-06-30 \n", "3 1953-10-16 1990-04-06 \n", "4 1933-06-10 1982-12-15 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge the two dataframes by doing a left join on the 'id' column\n", "df_newspapers_post54 = pd.merge(df_articles, df_newspapers, how=\"left\", on=\"id\")\n", "df_newspapers_post54.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "92" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many newspapers?\n", "df_newspapers_post54.shape[0]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [ { "data": { "text/html": [ "newspapers_post_54.csv
" ], "text/plain": [ "/home/tim/Workspace/mycode/glam-workbench/trove-newspapers/notebooks/newspapers_post_54.csv" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Reorder columns and save as CSV\n", "df_newspapers_post54[\n", " [\n", " \"title\",\n", " \"state\",\n", " \"id\",\n", " \"startDate\",\n", " \"endDate\",\n", " \"issn\",\n", " \"number_of_articles\",\n", " \"troveUrl\",\n", " ]\n", "].to_csv(\"newspapers_post_54.csv\", index=False)\n", "# Display a link for easy download\n", "display(FileLink(\"newspapers_post_54.csv\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }