{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Map Trove newspaper results by place of publication over time\n", "\n", "In [another notebook](Map-newspaper-results-by-place-of-publication.ipynb), I constructed a heatmap displaying the places of publication of articles returned by a search in Trove's newspapers zone.\n", "\n", "I suggested that it would be interesting to visualise changes over time. This notebook does just that by creating an animated heatmap.\n", "\n", "The key difference here is that instead of just getting and processing a single Trove API request, we'll need to fire off a series of API requests — one for each time interval.\n", "\n", "You can use this notebook to visualise your own search queries, just edit the search parameters were indicated.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!.

\n", "\n", "

\n", " Some tips:\n", "

\n", "

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add your API key\n", "\n", "You need an [API key](http://help.nla.gov.au/trove/building-with-trove/api) to get data from Trove." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This creates a variable called 'api_key', paste your key between the quotes\n", "# <-- Then click the run icon \n", "api_key = 'YOUR API KEY'\n", "\n", "# This displays a message with your key\n", "print('Your API key is: {}'.format(api_key))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up\n", "\n", "You don't need to edit anything here. Just run the cells to load the bits and pieces we need." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import the libraries we need\n", "# <-- Click the run icon \n", "import requests\n", "import pandas as pd\n", "import os\n", "import altair as alt\n", "import json\n", "import folium\n", "from folium.plugins import HeatMapWithTime\n", "import numpy as np\n", "from tqdm.auto import tqdm" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set up default parameters for our API query\n", "# <-- Click the run icon \n", "params = {\n", " 'zone': 'newspaper',\n", " 'encoding': 'json',\n", " 'facet': 'title',\n", " 'n': '1',\n", " 'key': api_key\n", "}\n", "\n", "api_url = 'http://api.trove.nla.gov.au/v2/result'" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [], "source": [ "# <-- Click the run icon \n", "def format_facets(data):\n", " '''\n", " Extract and normalise the facet data\n", " '''\n", " # Check to make sure we have results\n", " try:\n", " facets = data['response']['zone'][0]['facets']['facet']['term']\n", " except TypeError:\n", " # No results!\n", " raise\n", " else:\n", " # Convert to DataFrame\n", " df = pd.DataFrame(facets)\n", " # Select the columns we want\n", " 
df = df[['display', 'count']]\n", " # Rename the columns\n", " df.columns = ['title_id', 'total']\n", " # Make sure the total column is a number\n", " df['total'] = pd.to_numeric(df['total'], errors='coerce')\n", " return df\n", "\n", "def prepare_data(data):\n", " '''\n", " Reformat the facet data, merge with locations, and then generate a list of locations.\n", " '''\n", " # Check for results\n", " try:\n", " df = format_facets(data)\n", " except TypeError:\n", " # If there are no results just return an empty list\n", " hm_data = []\n", " else:\n", " # Merge facets data with geolocated list of titles\n", " df_located = pd.merge(df, locations, on='title_id', how='left')\n", " # Group results by place, and calculate the total results for each\n", " df_totals = df_located.groupby(['place', 'latitude', 'longitude']).sum()\n", " hm_data = []\n", " for place in df_totals.index:\n", " # Get the total (cast to int, as pandas may store it as a float)\n", " total = int(df_totals.loc[place]['total'])\n", " # Add the coordinates of the place to the list of locations as many times as there are articles\n", " hm_data += ([[place[1], place[2]]] * total)\n", " return hm_data\n", "\n", "# Get the geolocated titles data\n", "locations = pd.read_csv('data/trove-newspaper-titles-locations.csv', dtype={'title_id': 'int64'})\n", "# Only keep the first instance of each title\n", "locations.drop_duplicates(subset=['title_id'], keep='first', inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct your search\n", "\n", "This is where you set your search keywords. Change the value of `params['q']` in the cell below to anything you might enter in the Trove simple search box. Don't include a date range, as we'll be handling that separately. For example:\n", "\n", "`params['q'] = 'weather AND wragge'`\n", "\n", "`params['q'] = '\"Clement Wragge\"'`\n", "\n", "`params['q'] = 'text:\"White Australia Policy\"'`\n", "\n", "You can also limit the results to specific categories. 
To only search for articles, include this line:\n", "\n", "`params['l-category'] = 'Article'`" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Enter your search parameters\n", "# This can be anything you'd enter in the Trove simple search box\n", "params['q'] = 'text:\"White Australia\"'\n", "\n", "# Remove the \"#\" symbol from the line below to limit the results to the article category\n", "#params['l-category'] = 'Article'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set your date range\n", "\n", "In this example we'll use years as our time interval. We could easily change this to months, or even individual days for a fine-grained analysis." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "start_year = 1880\n", "end_year = 1950" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get the data from Trove\n", "\n", "We need to make an API request for each year in our date range, so we'll construct a loop.\n", "\n", "The cell below generates two lists. The first, `hm_series`, is a list containing the data from each API request. The second, `time_index`, is a list of the years we're getting data for. Obviously these two lists should be the same length — one dataset for each year." 
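, "\n",
"As a rough sketch of the structure `HeatMapWithTime` expects, `hm_series` is a list with one entry per time step, and each entry is a list of `[latitude, longitude]` pairs. The coordinates and counts below are made up for illustration, not real search results:\n",
"\n",
"```python\n",
"# Hypothetical example of the hm_series structure (toy data)\n",
"time_index = [1880, 1881]\n",
"hm_series = [\n",
"    [[-33.87, 151.21], [-33.87, 151.21]],  # two located articles in 1880\n",
"    [[-37.81, 144.96]],                    # one located article in 1881\n",
"]\n",
"# One dataset for each year in the index\n",
"assert len(hm_series) == len(time_index)\n",
"```"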
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# <-- Click the run icon \n", "hm_series = []\n", "time_index = []\n", "for year in tqdm(range(start_year, end_year + 1)):\n", " time_index.append(year)\n", " decade = str(year)[:3]\n", " params['l-decade'] = decade\n", " params['l-year'] = year\n", " response = requests.get(api_url, params=params)\n", " data = response.json()\n", " hm_data = prepare_data(data)\n", " hm_series.append(hm_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make an animated heatmap\n", "\n", "To create an animated heatmap we just need to feed it the `hm_series` data and time index." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# <-- Click the run icon \n", "# Create the map\n", "m = folium.Map(\n", " location=[-30, 135],\n", " zoom_start=4\n", ")\n", "\n", "#Add the heatmap data!\n", "HeatMapWithTime(\n", " hm_series,\n", " index=time_index,\n", " auto_play=True\n", ").add_to(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Search for \"White Australia\" from 1880 to 1950" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Make this Notebook Trusted to load map: File -> Trust Notebook
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# <-- Click the run icon \n", "display(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }