{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Map Trove newspaper results by state\n", "\n", "Version 2 of the Trove API adds the `state` facet to the `newspapers` zone. This means we can easily get the number of articles from a search query published in each state." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up\n", "\n", "First we'll import the packages we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import the libraries we need\n", "# <-- Click the run icon\n", "import json\n", "import os\n", "\n", "import altair as alt\n", "import pandas as pd\n", "import requests\n", "from dotenv import load_dotenv\n", "\n", "load_dotenv()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need an [API key](http://help.nla.gov.au/trove/building-with-trove/api) to get data from Trove. Insert your key below." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Insert your Trove API key\n", "API_KEY = \"YOUR API KEY\"\n", "\n", "# Use api key value from environment variables if it is available\n", "if os.getenv(\"TROVE_API_KEY\"):\n", "    API_KEY = os.getenv(\"TROVE_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up some default parameters for our API query." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set up default parameters for our API query\n", "# <-- Click the run icon\n", "params = {\n", "    \"category\": \"newspaper\",\n", "    \"l-artType\": \"newspaper\",\n", "    \"encoding\": \"json\",\n", "    \"facet\": \"state\",\n", "    \"n\": 0,\n", "}\n", "\n", "headers = {\"X-API-KEY\": API_KEY}\n", "\n", "API_URL = \"http://api.trove.nla.gov.au/v3/result\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct your search\n", "\n", "This is where you set your search keywords. Change 'drought' in the cell below to anything you might enter in the Trove simple search box. 
For example:\n", "\n", "`params['q'] = 'weather AND wragge'`\n", "\n", "`params['q'] = '\"Clement Wragge\"'`\n", "\n", "`params['q'] = 'text:\"White Australia Policy\"'`\n", "\n", "`params['q'] = 'weather AND date:[1890-01-01T00:00:00Z TO 1920-12-11T00:00:00Z]'`\n", "\n", "You can also limit the results to specific categories. To only search for articles, include this line:\n", "\n", "`params['l-category'] = 'Article'`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Enter your search parameters\n", "# This can be anything you'd enter in the Trove simple search box\n", "params[\"q\"] = \"drought\"\n", "\n", "# Limit the results to the article category\n", "# (add a \"#\" at the start of the line below to search all categories instead)\n", "params[\"l-category\"] = \"Article\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get the data from Trove\n", "\n", "Everything's set up, so just run the cells!\n", "\n", "### Make an API request" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# <-- Click the run icon\n", "response = requests.get(API_URL, params=params, headers=headers)\n", "data = response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reformat the results" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>
<thead><tr><th></th><th>state</th><th>total</th></tr></thead>
<tbody>
<tr><th>0</th><td>New South Wales</td><td>558970</td></tr>
<tr><th>1</th><td>Queensland</td><td>301702</td></tr>
<tr><th>2</th><td>Victoria</td><td>258701</td></tr>
<tr><th>3</th><td>South Australia</td><td>151862</td></tr>
<tr><th>4</th><td>Western Australia</td><td>85960</td></tr>
<tr><th>5</th><td>Tasmania</td><td>47622</td></tr>
<tr><th>6</th><td>Australian Capital Territory</td><td>13026</td></tr>
<tr><th>7</th><td>Northern Territory</td><td>1778</td></tr>
</tbody>
</table>
\n", "
" ], "text/plain": [ " state total\n", "0 New South Wales 558970\n", "1 Queensland 301702\n", "2 Victoria 258701\n", "3 South Australia 151862\n", "4 Western Australia 85960\n", "5 Tasmania 47622\n", "6 Australian Capital Territory 13026\n", "7 Northern Territory 1778" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# <-- Click the run icon\n", "def format_facets(data):\n", "    # Get the list of terms (one per state) from the state facet\n", "    facets = data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]\n", "    df = pd.DataFrame(facets)\n", "    # Keep just the state name and the number of results\n", "    df = df[[\"display\", \"count\"]]\n", "    df.columns = [\"state\", \"total\"]\n", "    df[\"total\"] = pd.to_numeric(df[\"total\"], errors=\"coerce\")\n", "    # Expand the ACT abbreviation to the full territory name\n", "    df = df.replace(\"ACT\", \"Australian Capital Territory\")\n", "    # Drop the values that don't correspond to a state or territory\n", "    df = df[(df[\"state\"] != \"National\") & (df[\"state\"] != \"International\")]\n", "    return df\n", "\n", "\n", "df = format_facets(data)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some charts!\n", "\n", "Just run the cells!" 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon\n", "chart = (\n", "    alt.Chart(df)\n", "    .mark_bar(color=\"#084081\")\n", "    .encode(\n", "        x=alt.X(\"total\", axis=alt.Axis(title=\"Total articles\")),\n", "        y=alt.Y(\"state\", axis=alt.Axis(title=\"\")),\n", "        tooltip=[alt.Tooltip(\"total\", title=\"Total articles\")],\n", "    )\n", "    .properties(width=300, height=200)\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Make a choropleth map\n", "# <-- Click the run icon\n", "with open(\"data/aus_state.geojson\", \"r\") as geo_file:\n", "    geo_data = json.load(geo_file)\n", "map = (\n", "    alt.Chart(alt.Data(values=geo_data[\"features\"]))\n", "    .mark_geoshape(stroke=\"black\", strokeWidth=0.2)\n", "    .encode(\n", "        color=alt.Color(\n", "            \"total:Q\",\n", "            scale=alt.Scale(scheme=\"greenblue\"),\n", "            legend=alt.Legend(title=\"Total articles\"),\n", "        )\n", "    )\n", "    .transform_lookup(\n", "        lookup=\"properties.STATE_NAME\", from_=alt.LookupData(df, \"state\", [\"total\"])\n", "    )\n", "    .project(type=\"mercator\")\n", "    .properties(width=400, height=400)\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon\n", "alt.hconcat(map, chart).resolve_legend(color=\"independent\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate proportions\n", "\n", "Of course, this is just the raw number of results. Some states have more newspapers, so we have to be careful about making comparisons. This time we'll divide the number of search results for each state by the total number of articles published in that state to see what proportion our search represents. This should provide a more meaningful basis for comparison.\n", "\n", "To calculate what proportion of the total number of articles the search results represent, we need to make another API request to find out how many articles were published over the same time frame. This time we'll leave the keywords out of the query. If you've set a date range, you'll want to keep it as the value for `params['q']`. Otherwise remove the `q` parameter completely using `.pop()`, eg:\n", "\n", "```\n", "params.pop('q', None)\n", "```\n", "\n", "or\n", "\n", "```\n", "params['q'] = 'date:[* TO 1954]'\n", "```" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# params.pop(\"q\", None)\n", "params[\"q\"] = \"date:[* TO 1954]\"\n", "response = requests.get(API_URL, params=params, headers=headers)\n", "total_data = response.json()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>
<thead><tr><th></th><th>state</th><th>total_x</th><th>total_y</th><th>proportion</th></tr></thead>
<tbody>
<tr><th>0</th><td>New South Wales</td><td>558970</td><td>59704736</td><td>0.009362</td></tr>
<tr><th>1</th><td>Queensland</td><td>301702</td><td>29409712</td><td>0.010259</td></tr>
<tr><th>2</th><td>Victoria</td><td>258701</td><td>32163110</td><td>0.008043</td></tr>
<tr><th>3</th><td>South Australia</td><td>151862</td><td>16993879</td><td>0.008936</td></tr>
<tr><th>4</th><td>Western Australia</td><td>85960</td><td>16963973</td><td>0.005067</td></tr>
<tr><th>5</th><td>Tasmania</td><td>47622</td><td>11879580</td><td>0.004009</td></tr>
<tr><th>6</th><td>Australian Capital Territory</td><td>13026</td><td>553926</td><td>0.023516</td></tr>
<tr><th>7</th><td>Northern Territory</td><td>1778</td><td>294418</td><td>0.006039</td></tr>
</tbody>
</table>
" ], "text/plain": [ " state total_x total_y proportion\n", "0 New South Wales 558970 59704736 0.009362\n", "1 Queensland 301702 29409712 0.010259\n", "2 Victoria 258701 32163110 0.008043\n", "3 South Australia 151862 16993879 0.008936\n", "4 Western Australia 85960 16963973 0.005067\n", "5 Tasmania 47622 11879580 0.004009\n", "6 Australian Capital Territory 13026 553926 0.023516\n", "7 Northern Territory 1778 294418 0.006039" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reformat the facets\n", "total_df = format_facets(total_data)\n", "# Merge the two dataframes, joining them on the 'state' column\n", "df = pd.merge(df, total_df, on=\"state\", how=\"left\")\n", "# Create a new column that contains the proportion of the total results this search represents\n", "df[\"proportion\"] = df[\"total_x\"] / df[\"total_y\"]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some better charts!\n", "\n", "Just run the cells!" 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon\n", "chart2 = (\n", "    alt.Chart(df)\n", "    .mark_bar(color=\"#084081\")\n", "    .encode(\n", "        x=alt.X(\"proportion\", axis=alt.Axis(title=\"Proportion of articles\")),\n", "        y=alt.Y(\"state\", axis=alt.Axis(title=\"\")),\n", "        tooltip=[alt.Tooltip(\"proportion\", title=\"Proportion\")],\n", "    )\n", "    .properties(width=300, height=200)\n", ")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Make a choropleth map of the proportions\n", "# <-- Click the run icon\n", "map2 = (\n", "    alt.Chart(alt.Data(values=geo_data[\"features\"]))\n", "    .mark_geoshape(stroke=\"black\", strokeWidth=0.2)\n", "    .encode(\n", "        color=alt.Color(\n", "            \"proportion:Q\",\n", "            scale=alt.Scale(scheme=\"greenblue\"),\n", "            legend=alt.Legend(title=\"Proportion of articles\"),\n", "        )\n", "    )\n", "    .transform_lookup(\n", "        lookup=\"properties.STATE_NAME\",\n", "        from_=alt.LookupData(df, \"state\", [\"proportion\"]),\n", "    )\n", "    .project(type=\"mercator\")\n", "    .properties(width=400, height=400)\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon\n", "alt.hconcat(map2, chart2).resolve_legend(color=\"independent\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "rocrate": { "author": [ { "mainEntityOfPage": "https://timsherratt.au", "name": "Sherratt, Tim", "orcid": "https://orcid.org/0000-0001-7956-4498" } ], "category": "Visualising searches", "description": "Uses the Trove state facet to create a choropleth map that visualises the number of search results per state.", "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/Map-newspaper-results-by-state/", "name": "Map Trove newspaper results by state", "position": 4 } }, "nbformat": 4, "nbformat_minor": 4 }