{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Beyond the copyright cliff of death\n",
    "\n",
    "Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import pandas as pd\n",
    "from IPython.display import display, FileLink"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "trove_api_key = 'YOUR API KEY'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Search for articles published after 1955\n",
    "\n",
    "First we're going to run a date query to find all the articles published after 1954. But instead of looking at the articles themselves, we're going to get the `title` facet – this will tell us the number of articles for each newspaper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {\n",
    "    'q': 'date:[1955 TO *]', # date range query\n",
    "    'zone': 'newspaper',\n",
    "    'facet': 'title', # get the newspaper facets\n",
    "    'encoding': 'json',\n",
    "    'n': 0, # no articles thanks\n",
    "    'key': trove_api_key\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make our API request\n",
    "response = requests.get('https://api.trove.nla.gov.au/v2/result', params=params)\n",
    "data = response.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the facet data\n",
    "facets = data['response']['zone'][0]['facets']['facet']['term']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>number_of_articles</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2567488</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>417472</td>\n",
       "      <td>1376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>225466</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>159156</td>\n",
       "      <td>1044</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>143726</td>\n",
       "      <td>856</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  number_of_articles    id\n",
       "0            2567488    11\n",
       "1             417472  1376\n",
       "2             225466   112\n",
       "3             159156  1044\n",
       "4             143726   856"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Convert to a dataframe\n",
    "df_articles = pd.DataFrame(facets)\n",
    "# Get rid of some columns\n",
    "df_articles = df_articles[['count', 'display']]\n",
    "# Rename columns\n",
    "df_articles.columns = ['number_of_articles', 'id']\n",
    "# Change id to string, so we can merge on it later\n",
    "df_articles['id'] = df_articles['id'].astype('str')\n",
    "# Preview results\n",
    "df_articles.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Match the facets with newspapers\n",
    "\n",
    "As you can see from the data above, the `title` facet only gives us the identifier for a newspaper, not its title or date range. To get more information about each newspaper, we're going to get a list of newspapers from the Trove API and then merge the two datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get ALL the newspapers\n",
    "response = requests.get('https://api.trove.nla.gov.au/v2/newspaper/titles', params={'encoding': 'json', 'key': trove_api_key})\n",
    "newspapers_data = response.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "newspapers = newspapers_data['response']['records']['newspaper']\n",
    "# Convert to a dataframe\n",
    "df_newspapers = pd.DataFrame(newspapers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>number_of_articles</th>\n",
       "      <th>id</th>\n",
       "      <th>endDate</th>\n",
       "      <th>issn</th>\n",
       "      <th>startDate</th>\n",
       "      <th>state</th>\n",
       "      <th>title</th>\n",
       "      <th>troveUrl</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2567488</td>\n",
       "      <td>11</td>\n",
       "      <td>1995-12-31</td>\n",
       "      <td>01576925</td>\n",
       "      <td>1926-09-03</td>\n",
       "      <td>ACT</td>\n",
       "      <td>The Canberra Times (ACT : 1926 - 1995)</td>\n",
       "      <td>https://trove.nla.gov.au/ndp/del/title/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>417472</td>\n",
       "      <td>1376</td>\n",
       "      <td>1981-06-30</td>\n",
       "      <td>22087427</td>\n",
       "      <td>1969-06-30</td>\n",
       "      <td>International</td>\n",
       "      <td>Papua New Guinea Post-Courier (Port Moresby : ...</td>\n",
       "      <td>https://trove.nla.gov.au/ndp/del/title/1376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>225466</td>\n",
       "      <td>112</td>\n",
       "      <td>1982-12-15</td>\n",
       "      <td>00050458</td>\n",
       "      <td>1933-06-10</td>\n",
       "      <td>National</td>\n",
       "      <td>The Australian Women's Weekly (1933 - 1982)</td>\n",
       "      <td>https://trove.nla.gov.au/ndp/del/title/112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>159156</td>\n",
       "      <td>1044</td>\n",
       "      <td>1993-07-19</td>\n",
       "      <td>22051902</td>\n",
       "      <td>1951-10-05</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>Dutch Australian Weekly (Sydney, NSW : 1951 - ...</td>\n",
       "      <td>https://trove.nla.gov.au/ndp/del/title/1044</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>143726</td>\n",
       "      <td>856</td>\n",
       "      <td>1999-12-31</td>\n",
       "      <td>22040722</td>\n",
       "      <td>1987-01-07</td>\n",
       "      <td>South Australia</td>\n",
       "      <td>Times (Victor Harbor, SA : 1987 - 1999)</td>\n",
       "      <td>https://trove.nla.gov.au/ndp/del/title/856</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  number_of_articles    id     endDate      issn   startDate            state  \\\n",
       "0            2567488    11  1995-12-31  01576925  1926-09-03              ACT   \n",
       "1             417472  1376  1981-06-30  22087427  1969-06-30    International   \n",
       "2             225466   112  1982-12-15  00050458  1933-06-10         National   \n",
       "3             159156  1044  1993-07-19  22051902  1951-10-05  New South Wales   \n",
       "4             143726   856  1999-12-31  22040722  1987-01-07  South Australia   \n",
       "\n",
       "                                               title  \\\n",
       "0             The Canberra Times (ACT : 1926 - 1995)   \n",
       "1  Papua New Guinea Post-Courier (Port Moresby : ...   \n",
       "2        The Australian Women's Weekly (1933 - 1982)   \n",
       "3  Dutch Australian Weekly (Sydney, NSW : 1951 - ...   \n",
       "4            Times (Victor Harbor, SA : 1987 - 1999)   \n",
       "\n",
       "                                      troveUrl  \n",
       "0    https://trove.nla.gov.au/ndp/del/title/11  \n",
       "1  https://trove.nla.gov.au/ndp/del/title/1376  \n",
       "2   https://trove.nla.gov.au/ndp/del/title/112  \n",
       "3  https://trove.nla.gov.au/ndp/del/title/1044  \n",
       "4   https://trove.nla.gov.au/ndp/del/title/856  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Merge the two dataframes by doing a left join on the 'id' column\n",
    "df_newspapers_post54 = pd.merge(df_articles, df_newspapers, how='left', on='id')\n",
    "df_newspapers_post54.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "72"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# How many newspapers?\n",
    "df_newspapers_post54.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href='newspapers_post_54.csv' target='_blank'>newspapers_post_54.csv</a><br>"
      ],
      "text/plain": [
       "/Volumes/Workspace/mycode/glam-workbench/trove-newspapers/notebooks/newspapers_post_54.csv"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Reorder columns and save as CSV\n",
    "df_newspapers_post54[['title', 'state', 'id', 'startDate', 'endDate', 'issn', 'number_of_articles', 'troveUrl']].to_csv('newspapers_post_54.csv', index=False)\n",
    "# Display a link for easy download\n",
    "display(FileLink('newspapers_post_54.csv'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}