{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Map Trove newspaper results by place of publication\n", "\n", "In another notebook, I explored some ways in which you could [map Trove newspaper results using the `state` facet](Map-newspaper-results-by-state.ipynb). In this notebook we'll go a bit deeper and map the actual **locations** where the newspapers in our search results were published.\n", "\n", "To do this, we'll use the `title` facet. This returns a list of all the newspapers in our results, and the number of matching articles in each.\n", "\n", "You can use this notebook to visualise your own search queries. Just edit the search parameters where indicated.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "

\n", " Some tips:\n", "

\n", "

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add your API key\n", "\n", "You need an [API key](http://help.nla.gov.au/trove/building-with-trove/api) to get data from Trove." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This creates a variable called 'api_key', paste your key between the quotes\n", "# <-- Then click the run icon \n", "api_key = ''\n", "\n", "# This displays a message with your key\n", "print('Your API key is: {}'.format(api_key))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up\n", "\n", "You don't need to edit anything here. Just run the cells to load the bits and pieces we need." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import the libraries we need\n", "# <-- Click the run icon \n", "import requests\n", "import pandas as pd\n", "import os\n", "import altair as alt\n", "import json\n", "import folium\n", "from folium.plugins import MarkerCluster\n", "from folium.plugins import HeatMap\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set up default parameters for our API query\n", "# <-- Click the run icon \n", "params = {\n", " 'zone': 'newspaper',\n", " 'encoding': 'json',\n", " 'facet': 'title',\n", " 'n': '1',\n", " 'key': api_key\n", "}\n", "\n", "api_url = 'http://api.trove.nla.gov.au/v2/result'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct your search\n", "\n", "This is where you set your search keywords. Change 'weather AND wragge date:[* TO 1954]' in the cell below to anything you might enter in the Trove simple search box. 
For example:\n", "\n", "`params['q'] = 'weather AND wragge'`\n", "\n", "`params['q'] = '\"Clement Wragge\"'`\n", "\n", "`params['q'] = 'text:\"White Australia Policy\"'`\n", "\n", "`params['q'] = 'weather AND date:[1890-01-01T00:00:00Z TO 1920-12-11T00:00:00Z]'`\n", "\n", "You can also limit the results to specific categories. To only search for articles, include this line:\n", "\n", "`params['l-category'] = 'Article'`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Enter your search parameters\n", "# This can be anything you'd enter in the Trove simple search box\n", "params['q'] = 'weather AND wragge date:[* TO 1954]'\n", "\n", "# Remove the \"#\" symbol from the line below to limit the results to the article category\n", "#params['l-category'] = 'Article'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get the data from Trove\n", "\n", "Everything's set up, so just run the cells!\n", "\n", "### Make an API request" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# <-- Click the run icon \n", "response = requests.get(api_url, params=params)\n", "data = response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reformat the results" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | title_id | total |
| 0 | 840 | 4846 |
| 1 | 16 | 4833 |
| 2 | 508 | 1688 |
| 3 | 10 | 1553 |
| 4 | 74 | 1513 |
\n", "
" ], "text/plain": [ " title_id total\n", "0 840 4846\n", "1 16 4833\n", "2 508 1688\n", "3 10 1553\n", "4 74 1513" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# <-- Click the run icon \n", "def format_facets(data):\n", " facets = data['response']['zone'][0]['facets']['facet']['term']\n", " df = pd.DataFrame(facets)\n", " df = df[['display', 'count']]\n", " df.columns = ['title_id', 'total']\n", " df['total'] = pd.to_numeric(df['total'], errors='coerce')\n", " # Make sure the title ids are integers so they can be matched against the ids in the locations CSV\n", " df['title_id'] = df['title_id'].astype(int)\n", " return df\n", "df = format_facets(data)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load location data\n", "\n", "I've previously created a [CSV file](data/trove-newspaper-titles-locations.csv) that provides geolocated places of publication for newspapers in Trove. Some newspapers are associated with multiple places (for example a cluster of nearby country towns), so the CSV file can contain multiple rows for a single newspaper title. Note also that any newspapers that were added to Trove since I last harvested the locations in April 2018 will drop out of the data.\n", "\n", "We're going to merge the facets data with my geolocated titles file, matching on the `title_id`. We'll only take the first matching row from the geolocated data." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | title_id | total | newspaper_title | state | place_id | place | latitude | longitude |
| 0 | 840 | 4846 | The Telegraph (Brisbane, Qld. : 1872 - 1947) | QLD | QLD4555 | Brisbane | -27.467848 | 153.028013 |
| 1 | 16 | 4833 | The Brisbane Courier (Qld. : 1864 - 1933) | QLD | QLD4555 | Brisbane | -27.467848 | 153.028013 |
| 2 | 508 | 1688 | Evening News (Sydney, NSW : 1869 - 1931) | NSW | NSW79218 | Sydney | -33.873200 | 151.209600 |
| 3 | 10 | 1553 | The Mercury (Hobart, Tas. : 1860 - 1954) | TAS | TAS00752 | Hobart | -42.880001 | 147.320007 |
| 4 | 74 | 1513 | Launceston Examiner (Tas. : 1842 - 1899) | TAS | TAS00338 | Launceston | -41.439999 | 147.139999 |
\n", "
" ], "text/plain": [ " title_id total newspaper_title state \\\n", "0 840 4846 The Telegraph (Brisbane, Qld. : 1872 - 1947) QLD \n", "1 16 4833 The Brisbane Courier (Qld. : 1864 - 1933) QLD \n", "2 508 1688 Evening News (Sydney, NSW : 1869 - 1931) NSW \n", "3 10 1553 The Mercury (Hobart, Tas. : 1860 - 1954) TAS \n", "4 74 1513 Launceston Examiner (Tas. : 1842 - 1899) TAS \n", "\n", " place_id place latitude longitude \n", "0 QLD4555 Brisbane -27.467848 153.028013 \n", "1 QLD4555 Brisbane -27.467848 153.028013 \n", "2 NSW79218 Sydney -33.873200 151.209600 \n", "3 TAS00752 Hobart -42.880001 147.320007 \n", "4 TAS00338 Launceston -41.439999 147.139999 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the geolocated data\n", "locations = pd.read_csv('data/trove-newspaper-titles-locations.csv', dtype={'title_id': int})\n", "# Only keep the first instance of each title\n", "locations.drop_duplicates(subset=['title_id'], keep='first', inplace=True)\n", "# Merge the facets and the geolocated data\n", "df_located = pd.merge(df, locations, on='title_id', how='left')\n", "df_located.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display top 20 newspapers\n", "\n", "Now we have titles for our newspaper facets, let's chart the top twenty." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df_located[:20]).mark_bar().encode(\n", " y=alt.Y('newspaper_title:N', sort=df_located['newspaper_title'][:20].tolist(), title=''),\n", " x=alt.X('total:Q', title='Number of articles')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Map places of publication\n", "\n", "More than one newspaper can be associated with a place, so rather than map individual newspapers, we'll group them by place." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Group newspapers by place\n", "df_places = df_located.groupby(['place', 'latitude', 'longitude'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the fun part. We'll create a map, then we'll loop through the places, getting the total number of articles from all the grouped newspapers." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(\n", " location=[-30, 135],\n", " zoom_start=4\n", ")\n", "# We'll cluster the markers for better readability\n", "marker_cluster = MarkerCluster().add_to(m)\n", "\n", "for place, group in df_places:\n", " # Get the total articles from the grouped titles\n", " total = group['total'].sum()\n", " # Turn all the grouped title_ids into a string that we can use in a Trove search url\n", " titles = group['title_id'].astype('str').str.cat(sep='&l-title=')\n", " # Create the content of the marker popup -- includes a search link back to Trove!\n", " html = '{}
{} articles'.format(place[0], params['q'], titles, params.get('l-category', ''), total)\n", " # Add the marker to the map\n", " folium.Marker([place[1], place[2]], popup=html).add_to(marker_cluster)\n", "\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Play around with the map. Note the link on the total number of articles in the pop-ups: it should open Trove and find the matching articles!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make a heatmap\n", "\n", "The map above is great for browsing, but it doesn't give much of a sense of the **number** of results in each place. Let's try creating a heatmap instead.\n", "\n", "To populate a heatmap we just need a list of coordinates: one set of coordinates for each article." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Get the total number of articles for each place (only sum the 'total' column)\n", "df_totals = df_places['total'].sum()\n", "locations = []\n", "# Loop through the places\n", "for place in df_totals.index:\n", " # Get the total as an integer so it can be used to repeat the coordinates\n", " total = int(df_totals.loc[place])\n", " # Add the coordinates of the place to the list of locations as many times as there are articles\n", " locations += ([[place[1], place[2]]] * total)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create another map\n", "m2 = folium.Map(\n", " location=[-30, 135],\n", " zoom_start=4\n", ")\n", "\n", "#Add the heatmap data!\n", "HeatMap(locations).add_to(m2)\n", "m2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's looking pretty interesting. Hmmmm, it would be nice if we could animate this through time, but we'd need more data. Perhaps a [future notebook](Map-newspaper-results-by-place-of-publication-over-time.ipynb) topic?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }