{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Today's news yesterday\n",
    "\n",
    "Version 2 of the Trove API fixed a problem with date searching. At last you can search for articles published on a particular day!\n",
    "\n",
    "There's a trick, though: the date range has to start on the day before the one you want. To find articles from 2 November 1942, you search for a date range from 1 November to 2 November. This is what the query looks like:\n",
    "\n",
    "```\n",
    "date:[1942-11-01T00:00:00Z TO 1942-11-02T00:00:00Z]\n",
    "```\n",
    "\n",
    "Once you know that, it's not too hard to do things like find front pages from exactly 100 years ago. This notebook shows you how."
   ]
  },
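  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The date range trick described above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the notebook's workflow: `day_query` is a hypothetical helper, and it uses the standard library's `datetime` module rather than the `arrow` package used later on.\n",
    "\n",
    "```python\n",
    "from datetime import date, timedelta\n",
    "\n",
    "\n",
    "def day_query(year, month, day):\n",
    "    # Hypothetical helper: build a date range query for a single day\n",
    "    target = date(year, month, day)\n",
    "    # The range starts on the day before the one we want\n",
    "    start = target - timedelta(days=1)\n",
    "    return \"date:[{}T00:00:00Z TO {}T00:00:00Z]\".format(\n",
    "        start.isoformat(), target.isoformat()\n",
    "    )\n",
    "\n",
    "\n",
    "print(day_query(1942, 11, 2))\n",
    "```\n",
    "\n",
    "For 2 November 1942 this should print the same query shown above."
   ]
  },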
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>\n",
    "\n",
    "<p>\n",
    "    Some tips:\n",
    "    <ul>\n",
    "        <li>Code cells have boxes around them.</li>\n",
    "        <li>To run a code cell, click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>\n",
    "        <li>While a cell is running, a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running, the asterisk will be replaced with a number.</li>\n",
    "        <li>In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.</li>\n",
    "        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>\n",
    "    </ul>\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get things ready"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "import re\n",
    "import shutil\n",
    "\n",
    "import arrow\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Image, display\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Set your API key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Insert your Trove API key\n",
    "API_KEY = \"YOUR API KEY\"\n",
    "\n",
    "# Use api key value from environment variables if it is available\n",
    "if os.getenv(\"TROVE_API_KEY\"):\n",
    "    API_KEY = os.getenv(\"TROVE_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a date query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get today's date\n",
    "now = arrow.now(\"Australia/Canberra\")\n",
    "# Go back in time 100 years\n",
    "end = now.shift(years=-100)\n",
    "# Subtract an extra day for the start of the date range\n",
    "start = end.shift(days=-1)\n",
    "# Format the query\n",
    "date_query = \"date:[{}Z TO {}Z]\".format(\n",
    "    start.format(\"YYYY-MM-DDT00:00:00\"), end.format(\"YYYY-MM-DDT00:00:00\")\n",
    ")\n",
    "date_query"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up API request parameters\n",
    "\n",
    "Note that we're adding `firstpageseq:1` to the date query. This limits results to articles on the front page. We can then get the identifier of the front page from the article record."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Set up parameters for our API query\n",
    "# <-- Click the run icon\n",
    "params = {\n",
    "    \"category\": \"newspaper\",\n",
    "    \"l-artType\": \"newspaper\",\n",
    "    \"reclevel\": \"full\",\n",
    "    \"encoding\": \"json\",\n",
    "    \"n\": \"100\",\n",
    "    \"q\": \"{} firstpageseq:1\".format(date_query),\n",
    "}\n",
    "\n",
    "headers = {\"X-API-KEY\": API_KEY}\n",
    "\n",
    "api_url = \"https://api.trove.nla.gov.au/v3/result\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make the API request"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
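   },
   "outputs": [],
   "source": [
    "response = requests.get(api_url, params=params, headers=headers)\n",
    "# Stop here with an error if the request failed (for example, an invalid API key)\n",
    "response.raise_for_status()\n",
    "data = response.json()\n",
    "# Pull the list of articles out of the first category in the results\n",
    "articles = data[\"category\"][0][\"records\"][\"article\"]"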
   ]
  },
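  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lookup above assumes the response always contains a category with a list of articles. If you'd rather get an empty list than an error when that structure is missing, something like the sketch below might help. `extract_articles` is a hypothetical helper, and the sample data is made up to mirror the keys used in this notebook; it isn't a real API response.\n",
    "\n",
    "```python\n",
    "def extract_articles(data):\n",
    "    # Hypothetical helper: return the list of articles, or an empty list\n",
    "    # if the data doesn't have the structure this notebook expects\n",
    "    try:\n",
    "        return data[\"category\"][0][\"records\"][\"article\"]\n",
    "    except (KeyError, IndexError, TypeError):\n",
    "        return []\n",
    "\n",
    "\n",
    "# Made-up sample data using the same keys as the lookup above\n",
    "sample = {\"category\": [{\"records\": {\"article\": [{\"id\": \"1\"}]}}]}\n",
    "print(len(extract_articles(sample)))\n",
    "print(extract_articles({\"category\": []}))\n",
    "```\n",
    "\n",
    "An empty list is easy to check for before trying to select a random article."
   ]
  },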
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select and download a front page\n",
    "\n",
    "Our API request returned a maximum of 100 articles. This function selects one at random, then downloads the front page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_front_page():\n",
    "    # Select a random article from the search results\n",
    "    article = random.choice(articles)\n",
    "    # Get the front page identifier from the page url\n",
    "    page_id = re.search(r\"news-page(\\d+)\", article[\"trovePageUrl\"]).group(1)\n",
    "    # Construct the url we need to download the image\n",
    "    page_url = \"https://trove.nla.gov.au/ndp/imageservice/nla.news-page{}/level2\".format(\n",
    "        page_id\n",
    "    )\n",
    "    # Download the page image, stopping with an error if the request failed\n",
    "    response = requests.get(page_url, stream=True)\n",
    "    response.raise_for_status()\n",
    "    # Make sure the output directory exists before saving the image\n",
    "    os.makedirs(\"data\", exist_ok=True)\n",
    "    with open(\"data/frontpage.jpg\", \"wb\") as out_file:\n",
    "        shutil.copyfileobj(response.raw, out_file)"
   ]
  },
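  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function above pulls the page identifier out of the page url with a regular expression. Here's a small standalone sketch of that step, which returns `None` instead of raising an error when the pattern isn't found. `get_page_id` is a hypothetical helper and the urls are made up for illustration.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "\n",
    "def get_page_id(page_url):\n",
    "    # Hypothetical helper: return the page identifier, or None if there's no match\n",
    "    match = re.search(r\"news-page(\\d+)\", page_url)\n",
    "    return match.group(1) if match else None\n",
    "\n",
    "\n",
    "# Made-up urls for illustration only\n",
    "print(get_page_id(\"https://example.com/nla.news-page12345\"))\n",
    "print(get_page_id(\"https://example.com/no-page-here\"))\n",
    "```\n",
    "\n",
    "The identifier comes back as a string of digits, ready to drop into the image url."
   ]
  },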
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display the front page\n",
    "\n",
    "First we use the function defined above to download a randomly selected front page, and then we display it.\n",
    "\n",
    "Re-run this cell for a different page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "get_front_page()\n",
    "display(Image(filename=\"data/frontpage.jpg\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).  \n",
    "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb).\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "rocrate": {
   "author": [
    {
     "mainEntityOfPage": "https://timsherratt.au",
     "name": "Sherratt, Tim",
     "orcid": "https://orcid.org/0000-0001-7956-4498"
    }
   ],
   "description": "Uses the date index and the firstpageseq parameter to find articles from exactly 100 years ago that were published on the front page. It then selects one of the articles at random and downloads and displays an image of the front page.",
   "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/Todays-news-yesterday/",
   "name": "Today's news yesterday"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}