{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Upload Trove newspaper articles to Omeka-S\n",
    "\n",
    "I was keen to play around with [Omeka-S](https://omeka.org/s/) – the new, Linked Open Data powered version of Omeka. In particular I wanted to understand its [API](https://omeka.org/s/docs/developer/key_concepts/api/), its use of [JSON-LD](https://json-ld.org/), and how I could create and use relationships between different types of entities. I decided that one way of exploring these things was to build a pipeline that would make it easy to upload Trove newspaper articles to an Omeka site. I recently figured out how to generate nice thumbnails from newspaper articles, so I could add images as well. But what about the 'Linked' part of LOD? I thought I'd not only create records for the newspaper articles, I'd create records for each cited newspaper and link them to the articles. Things not strings!\n",
    "\n",
    "Of course, this is not the only way to get Trove records into Omeka. You can save them to [Zotero](https://www.zotero.org/) and then use the [Zotero Import module](https://omeka.org/s/docs/user-manual/modules/zoteroimport/) to upload to Omeka. My [first Omeka 'how to'](http://discontents.com.au/some-exhibition-magic-with-zotero-and-omeka/) (written 8 years ago!) talked about using the Zotero Import plugin to upload records from the National Archives of Australia. While the import module works well, I wanted the flexibility to generate new images on the fly, modify the metadata, and unflatten the Zotero records to build newspaper-to-article relationships. I also wanted to be able to get newspaper articles from other sources, such as Trove lists.\n",
    "\n",
    "As I started playing around I realised I could rejig my thumbnail generator to [get an image of the whole article](https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-newspapers/blob/master/Save-Trove-newspaper-article-as-image.ipynb). This is useful because Trove's article download options tend to slice up images in unpleasant ways. So I decided to upload the complete image (or images if the article is published across multiple pages) and let Omeka create the derivative versions.\n",
    "\n",
    "In summary, you can use this notebook to:\n",
    "\n",
    "* Get a Trove newspaper article using the Trove API\n",
    "* Generate an image for the article\n",
    "* Search the Omeka site using the API to see if the newspaper that published the article already has a record, if not create one.\n",
    "* Upload the article details to Omeka using the API, including a link to the newspaper record, as well as the article image(s).\n",
    "\n",
    "But what newspaper articles are we uploading? I've created four different ways of selecting articles for upload. There are further details and configuration options for each of these  methods below:\n",
    "\n",
    "* [Option 1: Upload a Trove newspaper search](#Option-1%3A-Upload-a-Trove-newspaper-search)\n",
    "* [Option 2: Upload newspaper articles from a Trove list](#Option-2%3A-Upload-newspaper-articles-from-a-Trove-list)\n",
    "* [Option 3: Upload Trove newspaper articles saved in Zotero](#Option-3%3A-Upload-Trove-newspaper-articles-saved-in-Zotero)\n",
    "* [Option 4: Upload a list of article ids](#Option-4%3A-Upload-a-list-of-article-ids)\n",
    "\n",
    "There's a fair bit of configuration involved in getting this all working, so make sure you follow the instructions under [Basic configuration](#Basic-configuration) and [Preparing Omeka-S](#Preparing-Omeka-S) before proceeding.\n",
    "\n",
    "## An example\n",
    "\n",
    "Here's an example of a [newspaper article](https://trove.nla.gov.au/newspaper/article/162833980) in Trove:\n",
    "\n",
    "<img src=\"images/162833980.png\" style=\"width: 600px;\">\n",
    "\n",
    "Here's the same article after it's been imported into Omeka. Note that the article is linked to the newspaper that published it using the `isPartOf` property. You can also see the images and PDF of the article that were automatically uploaded along with the metadata.\n",
    "\n",
    "<img src=\"images/article_omeka.png\" style=\"width: 600px;\">\n",
    "\n",
    "And here's the record for the linked newspaper:\n",
    "\n",
    "<img src=\"images/newspaper_omeka.png\" style=\"width: 600px;\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic configuration\n",
    "\n",
    "First of all you need an Omeka-S site to upload to! If you don't have a site, a handy server, or a preferred web host, I'd suggest you set up an account with [Reclaim Hosting](https://reclaimhosting.com/) – they're affordable, have excellent support, and provide a one-click installer for Omeka-S. You'll be up and running very quickly.\n",
    "\n",
    "Once you have your Omeka-S site, you need to add some configuration values in the cell below:\n",
    "\n",
    "* `API_URL` – the url (or endpoint) of your Omeka site's API. This is basically just the url of your site with `/api` on the end.\n",
    "* `KEY_IDENTITY` and `KEY_CREDENTIAL` these are the authentication keys you need to upload new records to Omeka-S.\n",
    "* `TROVE_API_KEY` – authentication key to access the Trove API\n",
    "\n",
    "See below for instructions on generating your keys.\n",
    "\n",
    "### Generating your Omeka keys\n",
    "\n",
    "1. To generate your keys, log in to the admin interface of your Omeka-S site and click on 'Users' in the side menu. \n",
    "2. Find your user account in the list and then click on the pencil icon to edit your details. \n",
    "3. Click on the 'API keys' tab.\n",
    "4. In the 'New key label' box, enter a name for your key – something like 'Trove upload' would be fine.\n",
    "5. Click on the 'Save' button.\n",
    "6. A message will appear with your `key_identity` and `key_credential` values – the `key_credential` is only ever displayed once, so copy them both now!\n",
    "\n",
    "### Trove API key\n",
    "\n",
    "To get your Trove API key, just [follow these instructions](http://help.nla.gov.au/trove/building-with-trove/api).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Preparing Omeka-S\n",
    "\n",
    "Before we start uploading newspaper articles we need to set up the vocabularies and templates we need in Omeka to describe them.\n",
    "\n",
    "### Installing the schema.org vocabulary\n",
    "\n",
    "Omeka uses defined vocabularies to describe items. A number of vocabularies are built-in, but we're going to add the widely-used [Schema.org](https://schema.org/) vocabularly. Amongst many other things, Schema.org includes classes for both `Newspaper` and `NewsArticle`. You can download the [Schema.org definition files here](https://schema.org/docs/developers.html). In the instructions below, I suggest installing the 'all layers' version of the definition file, as this includes the [Bib extension](https://bib.schema.org/) as well as [Pending changes](https://pending.schema.org/) such as `ArchiveComponent` and `ArchiveOrganization`.\n",
    "\n",
    "To install Schema.org in Omeka:\n",
    "\n",
    "1. Log in to the admin interface of your Omeka-S site and click on 'Vocabularies' in the side menu.\n",
    "2. Click on the 'Import new vocabulary button.\n",
    "3. In the `Vocabularly url` field enter http://schema.org/version/latest/all-layers.rdf\n",
    "4. In the `File format` field select 'RDF/XML'\n",
    "5. In the `Prefix` field enter 'schema'\n",
    "6. In the `Namespace URI` field enter http://schema.org/\n",
    "7. In the `Label` field enter 'Schema.org'\n",
    "8. Click 'Import' to complete the installation.\n",
    "\n",
    "This is what the import form should look like:\n",
    "\n",
    "<img src=\"images/omeka-vocab-import.png\" width=\"600\">\n",
    "\n",
    "See the Omeka-S documentation for [more information on managing vocabularies](https://omeka.org/s/docs/user-manual/content/vocabularies/).\n",
    "\n",
    "### Installing the Numeric Data Type module\n",
    "\n",
    "The [Numeric Data Type module](https://omeka.org/s/docs/user-manual/modules/numericdatatypes/) gives you more options in defining the data type of a field. In particular, you can identify certain values as ISO formatted dates so they can then be properly formatted, sorted, and searched. The template I've created for newspaper articles (see below) uses the 'timestamp' data type to import the dates of the newspaper articles, so you'll need to install this module before importing the templates.\n",
    "\n",
    "1. Download the [Numeric Data Type module](https://omeka.org/s/modules/NumericDataTypes/) to your computer.\n",
    "2. Upload the zipped module to the modules folder of your Omeka site.\n",
    "3. Unzip the module.\n",
    "4. Log in to the admin interface of your Omeka-S site and click on 'Modules' in the side menu.\n",
    "5. Click on the 'Install' button.\n",
    "\n",
    "See the Omeka-S documentation for [more information on installing modules](https://omeka.org/s/docs/user-manual/modules/#installing-modules).\n",
    "\n",
    "### Importing the resource templates \n",
    "\n",
    "One of the powerful features of Omeka is the ability to define the types of items you're working with using 'resource templates'. These resource templates associate the items with specific vocabulary classes and list the expected properties. I've created resource templates for 'Newspaper' and 'Newspaper article' and exported them as JSON. The Trove upload code looks for these templates, so they need to be imported into your Omeka site.\n",
    "\n",
    "1. Download [Newspaper.json](templates/Newspaper.json) and [Newspaper_article.json](templates/Newspaper_article.json) to your own computer.\n",
    "2. Log in to the admin interface of your Omeka-S site and click on 'Resource templates' in the side menu.\n",
    "3. Click on the 'Import' button\n",
    "4. Click on 'Browse' and select the `Newspaper.json` file you downloaded.\n",
    "5. Click on the 'Review import' button.\n",
    "6. Click on the 'Complete import' button.\n",
    "7. Click the 'Edit Resource Template' button.\n",
    "8. Find the `name` property and click on the pencil icon to edit it.\n",
    "9. Check the box next to 'Use for resource title', then click on the 'Set changes' button.\n",
    "10. Find the `description` property and click on the pencil icon to edit it.\n",
    "11. Check the box next to 'Use for resource description', then click on the 'Set changes' button.\n",
    "12. Click on the 'Save' button.\n",
    "13. Repeat the same procedure to import `Newspaper_article.json`, but check at the 'Review import' stage to make sure that the selected data type for the `datePublished` property is `Timestamp`.\n",
    "\n",
    "Note that I've deliberately kept the templates fairly simple. You might want to add additional properties. However, if you change or remove any of the existing properties, you'll probably also have to edit the `add_article()` function below.\n",
    "\n",
    "See the Omeka-S documentation for [more information on managing resource templates](https://omeka.org/s/docs/user-manual/content/resource-template/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define all the functions that we need\n",
    "\n",
    "Hit **Shift+Enter** to run each of the cells below and set up all the basic configuration and functions we need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import re\n",
    "from pathlib import Path\n",
    "\n",
    "import arrow\n",
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "from dotenv import load_dotenv\n",
    "from omeka_s_tools.api import OmekaAPIClient\n",
    "from pyzotero import zotero\n",
    "from requests.adapters import HTTPAdapter\n",
    "from requests.packages.urllib3.util.retry import Retry\n",
    "from tqdm.auto import tqdm\n",
    "from trove_newspaper_images.articles import download_images\n",
    "\n",
    "s = requests.Session()\n",
    "retries = Retry(total=10, backoff_factor=1, status_forcelist=[502, 503, 504, 524])\n",
    "s.mount(\"http://\", HTTPAdapter(max_retries=retries))\n",
    "s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Insert the values of your Trove API key and your Omeka API url and authentication keys below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# CONFIGURATION\n",
    "# Things you need to change!\n",
    "# Paste your values in below!\n",
    "\n",
    "# The url of your Omeka site's api (basically your Omeka site site url with '/api' on the end)\n",
    "API_URL = \"http://your.omeka.site/api\"\n",
    "\n",
    "# The keys to your Omeka site\n",
    "KEY_IDENTITY = \"YOUR OMEKA KEY IDENTITY\"\n",
    "KEY_CREDENTIAL = \"YOUR OMEKA KEY CREDENTIAL\"\n",
    "\n",
    "# Your Trove API key\n",
    "TROVE_API_KEY = \"YOUR TROVE API KEY\"\n",
    "\n",
    "# Alternatively, use api keys and settings from environment variables if available\n",
    "if os.getenv(\"TROVE_API_KEY\"):\n",
    "    TROVE_API_KEY = os.getenv(\"TROVE_API_KEY\")\n",
    "if os.getenv(\"OMEKA_KEY_IDENTITY\"):\n",
    "    KEY_IDENTITY = os.getenv(\"OMEKA_KEY_IDENTITY\")\n",
    "if os.getenv(\"OMEKA_KEY_CREDENTIAL\"):\n",
    "    KEY_CREDENTIAL = os.getenv(\"OMEKA_KEY_CREDENTIAL\")\n",
    "if os.getenv(\"OMEKA_API_URL\"):\n",
    "    API_URL = os.getenv(\"OMEKA_API_URL\")\n",
    "\n",
    "TROVE_HEADERS = {\"X-API-KEY\": TROVE_API_KEY}\n",
    "\n",
    "# Resize images so this is the max dimension -- the Trove page images are very big, so you might want to resize before uploading to Omeka\n",
    "# Set this to None if you want them as big as possible (this might be useful if you're using the Omeka IIIF server & Universal viewer modules)\n",
    "MAX_IMAGE_SIZE = 3000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "omeka = OmekaAPIClient(\n",
    "    API_URL, key_identity=KEY_IDENTITY, key_credential=KEY_CREDENTIAL\n",
    ")\n",
    "\n",
    "\n",
    "def get_article(article_id):\n",
    "    \"\"\"\n",
    "    Retrieve an individual newspaper article from the Trove API.\n",
    "\n",
    "    Parameters:\n",
    "    * `article_id` - a Trove article identifier\n",
    "\n",
    "    Returns:\n",
    "    * a dict with article metadata from Trove API\n",
    "    \"\"\"\n",
    "    url = \"http://api.trove.nla.gov.au/v3/newspaper/{}\".format(article_id)\n",
    "    params = {\"include\": \"articleText\", \"encoding\": \"json\"}\n",
    "    response = s.get(url, params=params, headers=TROVE_HEADERS)\n",
    "    return response.json()\n",
    "\n",
    "\n",
    "def check_for_item(item_url, template_id):\n",
    "    \"\"\"\n",
    "    Check to see if an item exists in Omeka using schema:url values and template ids.\n",
    "\n",
    "    Parameters:\n",
    "    * `item_url` - a unique url saved in the schema:url field of an item\n",
    "    * `template_id` - the Omeka id of a template to filter item results\n",
    "\n",
    "    Returns:\n",
    "    * the JSON-LD item representation if found, None if not.\n",
    "    \"\"\"\n",
    "    # Filter items by the supplied parameters\n",
    "    results = omeka.filter_items_by_property(\n",
    "        filter_property=\"schema:url\",\n",
    "        filter_value=item_url,\n",
    "        resource_template_id=template_id,\n",
    "    )\n",
    "    # Get the first record, or None if there are no matches\n",
    "    try:\n",
    "        item = results[\"results\"][0]\n",
    "    except (KeyError, IndexError):\n",
    "        item = None\n",
    "    return item\n",
    "\n",
    "\n",
    "def add_newspaper(newspaper):\n",
    "    \"\"\"\n",
    "    Check to see if the given newspaper has already been uploaded to Omeka.\n",
    "    If not, upload metadata to Omeka.\n",
    "\n",
    "    Parameters:\n",
    "    * `newspaper` - this is the dict identifying the newspaper from a Trove article record\n",
    "\n",
    "    Returns:\n",
    "    * the Omeka id of the newspaper record\n",
    "    \"\"\"\n",
    "    # Get details of the Newspaper template\n",
    "    newspaper_template = omeka.get_template_by_label(\"Newspaper\")\n",
    "    template_id = newspaper_template[\"o:id\"]\n",
    "    # Get the class used with the Newspaper template\n",
    "    class_id = newspaper_template[\"o:resource_class\"][\"o:id\"]\n",
    "    # Construct a Trove persistent url for the newspaper\n",
    "    newspaper_url = f'http://nla.gov.au/nla.news-title{newspaper[\"id\"]}'\n",
    "    # Check to see if the newspaper has already been uploaded to Omeka\n",
    "    newspaper_item = check_for_item(newspaper_url, template_id)\n",
    "    # If it hasn't been uploaded, upload it!\n",
    "    if not newspaper_item:\n",
    "        # Prepare the data\n",
    "        newspaper_data = {\n",
    "            \"schema:name\": [newspaper[\"title\"]],\n",
    "            \"schema:identifier\": [newspaper[\"id\"]],\n",
    "            \"schema:url\": [newspaper_url],\n",
    "        }\n",
    "        # Construct the payload for upload to omeka\n",
    "        payload = omeka.prepare_item_payload_using_template(newspaper_data, template_id)\n",
    "        # Upload the item\n",
    "        newspaper_item = omeka.add_item(\n",
    "            payload, template_id=template_id, class_id=class_id\n",
    "        )\n",
    "    return newspaper_item[\"o:id\"]\n",
    "\n",
    "\n",
    "def add_article(article_id):\n",
    "    \"\"\"\n",
    "    Check to see if the given article has already been uploaded to Omeka.\n",
    "    If not, retrieve information about a newspaper article from Trove and\n",
    "    upload the metadata, text, and images to Omeka.\n",
    "\n",
    "    Parameters:\n",
    "    * `article_id` - a Trove article identifier\n",
    "\n",
    "    Returns:\n",
    "    * a JSON-LD representation of the new Omeka item\n",
    "    \"\"\"\n",
    "    # Get details of the newspaper article template\n",
    "    article_template = omeka.get_template_by_label(\"Newspaper article\")\n",
    "    template_id = article_template[\"o:id\"]\n",
    "    # Get the resource class used with the newspaper article template\n",
    "    class_id = article_template[\"o:resource_class\"][\"o:id\"]\n",
    "    # Construct a Trove article persistent url\n",
    "    article_url = f\"http://nla.gov.au/nla.news-article{article_id}\"\n",
    "    # Check to see if an article with this url has already been uploaded to Omeka\n",
    "    article_item = check_for_item(article_url, template_id)\n",
    "    # If the article hasn't been uploaded, we'll upload it!\n",
    "    if not article_item:\n",
    "        # Get article details from Trove\n",
    "        article = get_article(article_id)\n",
    "        # Format a description\n",
    "        formatted_date = arrow.get(article[\"date\"], \"YYYY-MM-DD\").format(\"D MMM YYYY\")\n",
    "        summary = (\n",
    "            f'{formatted_date}, {article[\"title\"][\"title\"]}, page {article[\"page\"]}'\n",
    "        )\n",
    "        # Get the Omeka id of the newspaper it was published in\n",
    "        newspaper_id = add_newspaper(article[\"title\"])\n",
    "        # Remove html tags from article text\n",
    "        try:\n",
    "            soup = BeautifulSoup(article[\"articleText\"])\n",
    "            article_text = soup.get_text()\n",
    "        except KeyError:\n",
    "            article_text = \"\"\n",
    "        # Prepare the article metadata\n",
    "        article_data = {\n",
    "            \"schema:name\": [article[\"heading\"]],\n",
    "            \"schema:description\": [summary],\n",
    "            \"schema:datePublished\": [article[\"date\"]],\n",
    "            \"schema:isPartOf\": [newspaper_id],\n",
    "            \"schema:pagination\": [article[\"page\"]],\n",
    "            \"schema:identifier\": [article_id],\n",
    "            \"schema:url\": [article_url],\n",
    "            \"schema:text\": [article_text],\n",
    "        }\n",
    "        # Construct the payload for uploading to Omeka\n",
    "        payload = omeka.prepare_item_payload_using_template(article_data, template_id)\n",
    "        # Download images of the article\n",
    "        article_images = download_images(article_id, output_dir=\"temp\")\n",
    "        image_paths = [Path(\"temp\", i) for i in article_images]\n",
    "        # Upload the article\n",
    "        article_item = omeka.add_item(\n",
    "            payload, media_files=image_paths, template_id=template_id, class_id=class_id\n",
    "        )\n",
    "    return article_item"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Select your upload method\n",
    "\n",
    "Select the method you want to use to select and process your articles and follow the associated instructions.\n",
    "\n",
    "----\n",
    "\n",
    "### Option 1: Upload a Trove newspaper search\n",
    "\n",
    "This will attempt to upload **all** the newspaper articles returned by a search to your Omeka site. Obviously you want to **restrict your search** to make sure you're only getting the articles you want – use things like facets and date ranges to keep the result set to a manageable size. **You have been warned...**\n",
    "\n",
    "Once you've constructed your search, enter the various parameters below. You might want to add additional facets such as `l-decade` or `l-year`. Check out [Trove's API documentation](https://help.nla.gov.au/trove/building-with-trove/api-version-2-technical-guide) to see all the options.\n",
    "\n",
    "Once you've edited the search details, hit **Shift+Enter** to run the cells below and start your upload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_total_results(params):\n",
    "    \"\"\"\n",
    "    Get the total number of results for a Trove search.\n",
    "    \"\"\"\n",
    "    these_params = params.copy()\n",
    "    these_params[\"n\"] = 0\n",
    "    response = s.get(\n",
    "        \"https://api.trove.nla.gov.au/v3/result\",\n",
    "        params=these_params,\n",
    "        headers=TROVE_HEADERS,\n",
    "    )\n",
    "    data = response.json()\n",
    "    return int(data[\"category\"][0][\"records\"][\"total\"])\n",
    "\n",
    "\n",
    "def upload_trove_search(params):\n",
    "    start = \"*\"\n",
    "    total = get_total_results(params)\n",
    "    with tqdm(total=total) as pbar:\n",
    "        while start:\n",
    "            params[\"s\"] = start\n",
    "            response = s.get(\n",
    "                \"https://api.trove.nla.gov.au/v3/result\",\n",
    "                params=params,\n",
    "                headers=TROVE_HEADERS,\n",
    "            )\n",
    "            data = response.json()\n",
    "            # The nextStart parameter is used to get the next page of results.\n",
    "            # If there's no nextStart then it means we're on the last page of results.\n",
    "            try:\n",
    "                start = data[\"category\"][0][\"records\"][\"nextStart\"]\n",
    "            except KeyError:\n",
    "                start = None\n",
    "            for article in data[\"category\"][0][\"records\"][\"article\"]:\n",
    "                add_article(article[\"id\"])\n",
    "                pbar.update(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Edit/add search values and parameters as required. These are an example only!\n",
    "trove_params = {\n",
    "    \"q\": '\"inigo jones\"',  # required -- change to anything you might enter in the Trove search box (including double quotes for phrases and boolean operators like AND)\n",
    "    \"category\": \"newspaper\",  # don't change this\n",
    "    \"l-artType\": \"newspaper\",\n",
    "    \"l-illustrated\": \"true\",  # edit or remove -- limits to illustrated articles\n",
    "    \"l-illtype\": \"Photo\",  # edit or remove -- limits to illustrations with photos\n",
    "    \"l-word\": \"1000+ Words\",  # edit or remove -- limits to article with more than 1000 words\n",
    "    \"include\": \"articleText\",  # don't change this\n",
    "    \"encoding\": \"json\",  # don't change this\n",
    "}\n",
    "\n",
    "upload_trove_search(trove_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "----\n",
    "\n",
    "### Option 2: Upload newspaper articles from a Trove list\n",
    "\n",
    "You can upload any newspaper articles stored in a Trove list to your Omeka site. \n",
    "\n",
    "To find the `list_id`, just go to the list's web page. The `list_id` is the string of numbers that appears after 'id=' in the url. Once you have your `list_id`, paste it where indicated in the cell below.\n",
    "\n",
    "Once you've edited the list details, hit **Shift+Enter** to run the cells below and start your upload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def upload_trove_list(list_id):\n",
    "    \"\"\"\n",
    "    Upload any newspaper articles in the given Trove list to Omeka.\n",
    "    \"\"\"\n",
    "    url = \"http://api.trove.nla.gov.au/v3/list/{}\".format(list_id)\n",
    "    params = {\"include\": \"listItems\", \"encoding\": \"json\", \"key\": TROVE_API_KEY}\n",
    "    response = s.get(url, params=params)\n",
    "    data = response.json()\n",
    "    for item in tqdm(data[\"listItem\"]):\n",
    "        for category, record in item.items():\n",
    "            if category == \"article\":\n",
    "                add_article(record[\"id\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Paste the identifier of your list between the quotes\n",
    "list_id = \"[Your list ID]\"\n",
    "list_id = \"83777\"\n",
    "upload_trove_list(list_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "----\n",
    "\n",
    "### Option 3: Upload Trove newspaper articles saved in Zotero\n",
    "\n",
    "You can upload any Trove newspaper articles stored in a Zotero collection to your Omeka site.\n",
    "\n",
    "To access Zotero you need four pieces of information:\n",
    "\n",
    "* `ZOTERO_KEY` – generate an API for this application by [going here](https://www.zotero.org/settings/keys/new) once you've logged into the Zotero web site. Your key only need read access.\n",
    "* `LIBRARY_ID` – a string of numbers that identifies your library. For personal libraries, go to [this page](https://www.zotero.org/settings/keys/) once you've logged into the Zotero web site and look for the line that says 'Your userID for use in API calls is...'. For group libraries, just open the groups web page – the `LIBRARY_ID` will be the string of numbers after '/groups/' in the url.\n",
    "* `LIBRARY_TYPE` – either 'user' or 'group', depending on whether it's a personal library of a group library.\n",
    "* `collection_id` – the id of the collection that contains your Trove newspaper articles. Just open the collection on the Zotero website – the `collection_id` is the string of letters and numbers that comes after '/collectionKey/' in the url.\n",
    "\n",
    "For additional information see the [Pyzotero](https://github.com/urschrei/pyzotero) documention.\n",
    "\n",
    "When you have all the values you need, simply paste them where indicated in the cells below.\n",
    "\n",
    "Once you've added your details, hit **Shift+Enter** to run the cells below and start your upload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# ENTER YOUR VALUES BETWEEN THE QUOTES WHERE INDICATED\n",
    "ZOTERO_KEY = \"YOUR ZOTERO KEY\"  # The Zotero API key you generated\n",
    "LIBRARY_TYPE = \"user\"  # user or group\n",
    "LIBRARY_ID = \"YOUR ZOTERO ID\"  # Either a personal user id or a group id\n",
    "\n",
    "# Or you can store your information in a .env file\n",
    "if os.getenv(\"ZOTERO_KEY\"):\n",
    "    ZOTERO_KEY = os.getenv(\"ZOTERO_KEY\")\n",
    "if os.getenv(\"ZOTERO_ID\"):\n",
    "    LIBRARY_ID = os.getenv(\"ZOTERO_ID\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def upload_zotero_collection(coll_id):\n",
    "    \"\"\"\n",
    "    Upload any Trove newspaper articles in the given collection to Omeka.\n",
    "    \"\"\"\n",
    "    zot = zotero.Zotero(LIBRARY_ID, LIBRARY_TYPE, ZOTERO_KEY)\n",
    "    items = zot.everything(zot.collection_items(coll_id))\n",
    "    articles = []\n",
    "    for item in items:\n",
    "        # Filter out things that aren't newspaper articles\n",
    "        try:\n",
    "            url = item[\"data\"][\"url\"]\n",
    "            if (\n",
    "                item[\"data\"][\"itemType\"] == \"newspaperArticle\"\n",
    "                and \"nla.news-article\" in url\n",
    "            ):\n",
    "                article_id = re.search(r\"(\\d+)$\", url).group(1)\n",
    "                articles.append(article_id)\n",
    "        except KeyError:\n",
    "            pass\n",
    "    for article_id in tqdm(articles):\n",
    "        add_article(article_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "# Paste your collection ID between the quotes below.\n",
    "collection_id = \"YOUR COLLECTION ID\"\n",
    "upload_zotero_collection(collection_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---- \n",
    "\n",
    "### Option 4: Upload a list of article ids\n",
    "\n",
    "This is useful for testing – just get some Trove newspaper article identifiers (the number in the url) and upload them to your Omeka site.\n",
    "\n",
    "Once you've edited the list of article, hit **Shift+Enter** to run the cells below and start your upload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Edit the list of articles as you see fit...\n",
    "article_ids = [130413505, 65179201]\n",
    "\n",
    "for article_id in tqdm(article_ids):\n",
    "    add_article(article_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Future developments\n",
    "\n",
    "* I'd originally hoped to use the Omeka-S API to do most of the configuration, including importing vocabularies and resource templates. But I [just couldn't get it to work properly](https://forum.omeka.org/t/upload-resource-template-via-api/8855/5). If I figure it out I'll add the details.\n",
    "\n",
    "* It would be good to be able to upload other types of things from Trove such as images, maps and journal articles. This should be possible, however, things are not as standardised as they are with the newspaper articles, so it'll take a bit of work. Let me know if you're interested and it'll give me a bit more motivation.\n",
    "\n",
    "* Also, going back to my Omeka how-to from 8 years back, I'll be creating a RecordSearch to Omeka-S pipeline as well."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).  \n",
    "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb).\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "rocrate": {
   "author": [
    {
     "mainEntityOfPage": "https://timsherratt.au",
     "name": "Sherratt, Tim",
     "orcid": "https://orcid.org/0000-0001-7956-4498"
    }
   ],
   "description": "This notebook steps through the process of uploading Trove newspaper articles to your own Omeka-S instance via the API. As well as uploading the article metadata, it attaches image(s) and PDFs of the articles, and creates a linked record for the publishing newspaper. The source of the articles can be a Trove search, a Trove list, a Zotero collection, or just a list of article ids.",
   "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/Upload-Trove-newspapers-to-Omeka/",
   "name": "Upload Trove newspaper articles to Omeka-S"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}