{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Beyond the copyright cliff of death\n",
    "\n",
    "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "from datetime import datetime\n",
    "\n",
    "import pandas as pd\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import FileLink, display\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Insert your Trove API key\n",
    "API_KEY = \"YOUR API KEY\"\n",
    "\n",
    "# Use api key value from environment variables if it is available\n",
    "if os.getenv(\"TROVE_API_KEY\"):\n",
    "    API_KEY = os.getenv(\"TROVE_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Search for articles published after 1955\n",
    "\n",
    "First we're going to run a date query to find all the articles published after 1954. But instead of looking at the articles themselves, we're going to get the `title` facet – this will tell us the number of articles for each newspaper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {\n",
    "    \"q\": \"date:[1955 TO *]\",  # date range query\n",
    "    \"category\": \"newspaper\",\n",
    "    \"l-artType\": \"newspaper\",\n",
    "    \"facet\": \"title\",  # get the newspaper facets\n",
    "    \"encoding\": \"json\",\n",
    "    \"n\": 0,  # no articles thanks\n",
    "    \"key\": API_KEY,\n",
    "}\n",
    "\n",
    "headers = {\"X-API-KEY\": API_KEY}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make our API request\n",
    "response = requests.get(\n",
    "    \"https://api.trove.nla.gov.au/v3/result\", params=params, headers=headers\n",
    ")\n",
    "data = response.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the facet data\n",
    "facets = data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>number_of_articles</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2567488</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>573658</td>\n",
       "      <td>1685</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>489896</td>\n",
       "      <td>370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>417472</td>\n",
       "      <td>1376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>263618</td>\n",
       "      <td>1694</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   number_of_articles    id\n",
       "0             2567488    11\n",
       "1              573658  1685\n",
       "2              489896   370\n",
       "3              417472  1376\n",
       "4              263618  1694"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Convert to a dataframe\n",
    "df_articles = pd.DataFrame(facets)\n",
    "# Get rid of some columns\n",
    "df_articles = df_articles[[\"count\", \"search\"]]\n",
    "# Rename columns\n",
    "df_articles.columns = [\"number_of_articles\", \"id\"]\n",
    "# Change id to string, so we can merge on it later\n",
    "df_articles[\"id\"] = df_articles[\"id\"].astype(\"str\")\n",
    "# Preview results\n",
    "df_articles.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Match the facets with newspapers\n",
    "\n",
    "As you can see from the data above, the `title` facet only gives us the identifier for a newspaper, not its title or date range. To get more information about each newspaper, we're going to get a list of newspapers from the Trove API and then merge the two datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get ALL the newspapers\n",
    "response = requests.get(\n",
    "    \"https://api.trove.nla.gov.au/v3/newspaper/titles\",\n",
    "    params={\"encoding\": \"json\"},\n",
    "    headers=headers,\n",
    ")\n",
    "newspapers_data = response.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "newspapers = newspapers_data[\"newspaper\"]\n",
    "# Convert to a dataframe\n",
    "df_newspapers = pd.DataFrame(newspapers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>number_of_articles</th>\n",
       "      <th>id</th>\n",
       "      <th>title</th>\n",
       "      <th>state</th>\n",
       "      <th>issn</th>\n",
       "      <th>troveUrl</th>\n",
       "      <th>startDate</th>\n",
       "      <th>endDate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2567488</td>\n",
       "      <td>11</td>\n",
       "      <td>The Canberra Times (ACT : 1926 - 1995)</td>\n",
       "      <td>ACT</td>\n",
       "      <td>01576925</td>\n",
       "      <td>https://nla.gov.au/nla.news-title11</td>\n",
       "      <td>1926-09-03</td>\n",
       "      <td>1995-12-31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>573658</td>\n",
       "      <td>1685</td>\n",
       "      <td>The Australian Jewish News (Melbourne, Vic. : ...</td>\n",
       "      <td>Victoria</td>\n",
       "      <td>NDP00187</td>\n",
       "      <td>https://nla.gov.au/nla.news-title1685</td>\n",
       "      <td>1935-05-24</td>\n",
       "      <td>1999-12-24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>489896</td>\n",
       "      <td>370</td>\n",
       "      <td>Port Lincoln Times (SA : 1927 - 1988; 1992 - 2...</td>\n",
       "      <td>South Australia</td>\n",
       "      <td>13215272</td>\n",
       "      <td>https://nla.gov.au/nla.news-title370</td>\n",
       "      <td>1927-08-05</td>\n",
       "      <td>2002-12-31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>417472</td>\n",
       "      <td>1376</td>\n",
       "      <td>Papua New Guinea Post-Courier (Port Moresby : ...</td>\n",
       "      <td>International</td>\n",
       "      <td>22087427</td>\n",
       "      <td>https://nla.gov.au/nla.news-title1376</td>\n",
       "      <td>1969-06-30</td>\n",
       "      <td>1981-06-30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>263618</td>\n",
       "      <td>1694</td>\n",
       "      <td>The Australian Jewish Times (Sydney, NSW : 195...</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>NDP00196</td>\n",
       "      <td>https://nla.gov.au/nla.news-title1694</td>\n",
       "      <td>1953-10-16</td>\n",
       "      <td>1990-04-06</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   number_of_articles    id  \\\n",
       "0             2567488    11   \n",
       "1              573658  1685   \n",
       "2              489896   370   \n",
       "3              417472  1376   \n",
       "4              263618  1694   \n",
       "\n",
       "                                               title            state  \\\n",
       "0             The Canberra Times (ACT : 1926 - 1995)              ACT   \n",
       "1  The Australian Jewish News (Melbourne, Vic. : ...         Victoria   \n",
       "2  Port Lincoln Times (SA : 1927 - 1988; 1992 - 2...  South Australia   \n",
       "3  Papua New Guinea Post-Courier (Port Moresby : ...    International   \n",
       "4  The Australian Jewish Times (Sydney, NSW : 195...  New South Wales   \n",
       "\n",
       "       issn                               troveUrl   startDate     endDate  \n",
       "0  01576925    https://nla.gov.au/nla.news-title11  1926-09-03  1995-12-31  \n",
       "1  NDP00187  https://nla.gov.au/nla.news-title1685  1935-05-24  1999-12-24  \n",
       "2  13215272   https://nla.gov.au/nla.news-title370  1927-08-05  2002-12-31  \n",
       "3  22087427  https://nla.gov.au/nla.news-title1376  1969-06-30  1981-06-30  \n",
       "4  NDP00196  https://nla.gov.au/nla.news-title1694  1953-10-16  1990-04-06  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Merge the two dataframes by doing a left join on the 'id' column\n",
    "df_newspapers_post54 = pd.merge(df_articles, df_newspapers, how=\"left\", on=\"id\")\n",
    "df_newspapers_post54.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "119"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# How many newspapers?\n",
    "df_newspapers_post54.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Reorder columns and save as CSV\n",
    "csv_file = f\"newspapers_post_54_{datetime.now().strftime('%Y%m%d')}.csv\"\n",
    "df_newspapers_post54[\n",
    "    [\n",
    "        \"title\",\n",
    "        \"state\",\n",
    "        \"id\",\n",
    "        \"startDate\",\n",
    "        \"endDate\",\n",
    "        \"issn\",\n",
    "        \"number_of_articles\",\n",
    "        \"troveUrl\",\n",
    "    ]\n",
    "].to_csv(csv_file, index=False)\n",
    "# Display a link for easy download\n",
    "display(FileLink(csv_file))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).  \n",
    "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "rocrate": {
   "action": [
    {
     "description": "CSV formatted dataset containing a list of digitised newspapers in Trove with articles published after 1954 (the copyright cliff of death).",
     "isPartOf": "https://github.com/GLAM-Workbench/trove-newspapers-data-post-54",
     "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/csv-newspapers-post-54/",
     "name": "Trove newspapers with articles published after 1954",
     "result": [
      {
       "url": "https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/blob/main/newspapers_post_54.csv"
      }
     ],
     "workExample": [
      {
       "name": "Explore in Datasette",
       "url": "https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/blob/v1.5/newspapers_post_54.csv"
      }
     ]
    }
   ],
   "author": [
    {
     "mainEntityOfPage": "https://timsherratt.au",
     "name": "Sherratt, Tim",
     "orcid": "https://orcid.org/0000-0001-7956-4498"
    }
   ],
   "category": "Trove newspapers in context",
   "description": "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in.",
   "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/Beyond_the_copyright_cliff_of_death/",
   "name": "Beyond the copyright cliff of death",
   "position": 4
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}