{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Map Trove newspaper results by state\n", "\n", "Version 2 of the Trove API adds the `state` facet to the `newspaper` zone. This means we can easily get the number of articles matching a search query that were published in each state." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "

\n", " Some tips:\n", "

\n", "

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up\n", "\n", "First we'll import the packages we need." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Import the libraries we need\n", "# <-- Click the run icon\n", "import json\n", "import os\n", "\n", "import altair as alt\n", "import pandas as pd\n", "import requests" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Load variables from the .env file if it exists\n", "# Use %%capture to suppress messages\n", "%load_ext dotenv\n", "%dotenv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need an [API key](http://help.nla.gov.au/trove/building-with-trove/api) to get data from Trove. Insert your key below." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Insert your Trove API key\n", "API_KEY = \"YOUR API KEY\"\n", "\n", "# Use api key value from environment variables if it is available\n", "if os.getenv(\"TROVE_API_KEY\"):\n", " API_KEY = os.getenv(\"TROVE_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up some default parameters for our API query." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Set up default parameters for our API query\n", "# <-- Click the run icon\n", "params = {\n", " \"zone\": \"newspaper\",\n", " \"encoding\": \"json\",\n", " \"facet\": \"state\",\n", " \"n\": \"1\",\n", " \"key\": API_KEY,\n", "}\n", "\n", "API_URL = \"http://api.trove.nla.gov.au/v2/result\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct your search\n", "\n", "This is where you set your search keywords. Change 'weather' in the cell below to anything you might enter in the Trove simple search box. 
For example:\n", "\n", "`params['q'] = 'weather AND wragge'`\n", "\n", "`params['q'] = '\"Clement Wragge\"'`\n", "\n", "`params['q'] = 'text:\"White Australia Policy\"'`\n", "\n", "`params['q'] = 'weather AND date:[1890-01-01T00:00:00Z TO 1920-12-11T00:00:00Z]'`\n", "\n", "You can also limit the results to specific categories. To only search for articles, include this line:\n", "\n", "`params['l-category'] = 'Article'`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Enter your search parameters\n", "# This can be anything you'd enter in the Trove simple search box\n", "params[\"q\"] = \"radio\"\n", "\n", "# Limit the results to the article category\n", "# (add a \"#\" at the start of the line below to search every category)\n", "params[\"l-category\"] = \"Article\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get the data from Trove\n", "\n", "Everything's set up, so just run the cells!\n", "\n", "### Make an API request" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# <-- Click the run icon\n", "response = requests.get(API_URL, params=params)\n", "# Stop here with a clear error if the request failed (e.g. an invalid API key)\n", "response.raise_for_status()\n", "data = response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reformat the results" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
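<table>
<thead><tr><th></th><th>state</th><th>total</th></tr></thead>
<tbody>
<tr><th>0</th><td>New South Wales</td><td>547572</td></tr>
<tr><th>1</th><td>Queensland</td><td>333583</td></tr>
<tr><th>2</th><td>Western Australia</td><td>206961</td></tr>
<tr><th>3</th><td>Victoria</td><td>161047</td></tr>
<tr><th>4</th><td>South Australia</td><td>132064</td></tr>
<tr><th>5</th><td>Tasmania</td><td>91574</td></tr>
<tr><th>6</th><td>Australian Capital Territory</td><td>80686</td></tr>
<tr><th>9</th><td>Northern Territory</td><td>5908</td></tr>
</tbody>
</table>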
\n", "
" ], "text/plain": [ " state total\n", "0 New South Wales 547572\n", "1 Queensland 333583\n", "2 Western Australia 206961\n", "3 Victoria 161047\n", "4 South Australia 132064\n", "5 Tasmania 91574\n", "6 Australian Capital Territory 80686\n", "9 Northern Territory 5908" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# <-- Click the run icon\n", "def format_facets(data):\n", " facets = data[\"response\"][\"zone\"][0][\"facets\"][\"facet\"][\"term\"]\n", " df = pd.DataFrame(facets)\n", " df = df[[\"display\", \"count\"]]\n", " df.columns = [\"state\", \"total\"]\n", " df[\"total\"] = pd.to_numeric(df[\"total\"], errors=\"coerce\")\n", " df = df.replace(\"ACT\", \"Australian Capital Territory\")\n", " df = df[(df[\"state\"] != \"National\") & (df[\"state\"] != \"International\")]\n", " return df\n", "\n", "\n", "df = format_facets(data)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some charts!\n", "\n", "Just run the cells!" 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon\n", "chart = (\n", " alt.Chart(df)\n", " .mark_bar(color=\"#084081\")\n", " .encode(\n", " x=alt.X(\"total\", axis=alt.Axis(title=\"Total articles\")),\n", " y=alt.Y(\"state\", axis=alt.Axis(title=\"\")),\n", " tooltip=[alt.Tooltip(\"total\", title=\"Total articles\")],\n", " )\n", " .properties(width=300, height=200)\n", ")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Make a choropleth map\n", "# <-- Click the run icon\n", "with open(\"data/aus_state.geojson\", \"r\") as geo_file:\n", " geo_data = json.load(geo_file)\n", "map = (\n", " alt.Chart(alt.Data(values=geo_data[\"features\"]))\n", " .mark_geoshape(stroke=\"black\", strokeWidth=0.2)\n", " .encode(\n", " color=alt.Color(\n", " \"total:Q\",\n", " scale=alt.Scale(scheme=\"greenblue\"),\n", " legend=alt.Legend(title=\"Total articles\"),\n", " )\n", " )\n", " .transform_lookup(\n", " lookup=\"properties.STATE_NAME\", from_=alt.LookupData(df, \"state\", [\"total\"])\n", " )\n", " .project(type=\"mercator\")\n", " .properties(width=400, height=400)\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon\n", "alt.hconcat(map, chart).resolve_legend(color=\"independent\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate proportions\n", "\n", "Of course, this is just the raw number of results. Some states have more newspapers, so we have to be careful about making comparisons. This time we'll divide the number of search results for each state by the total number of articles published in that state to see what proportion our search represents. This should provide a more meaningful basis for comparison.\n", "\n", "To calculate what proportion of the total number of articles the search results represent, we need to make another API request to find out how many articles were published over the same time frame. This time we'll leave the keywords out of the query. If you've set a date range, you'll want to keep it as the value for `params['q']`. Otherwise set `params['q']` to a string containing nothing but a space, eg:\n", "\n", "```\n", "params['q'] = ' '\n", "```\n", "\n", "or\n", "\n", "```\n", "params['q'] = 'date:[* TO 1954]'\n", "```" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# params['q'] = ' '\n", "params[\"q\"] = \"date:[* TO 1954]\"\n", "response = requests.get(API_URL, params=params)\n", "total_data = response.json()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
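<table>
<thead><tr><th></th><th>state</th><th>total_x</th><th>total_y</th><th>proportion</th></tr></thead>
<tbody>
<tr><th>0</th><td>New South Wales</td><td>547572</td><td>55784552</td><td>0.009816</td></tr>
<tr><th>1</th><td>Queensland</td><td>333583</td><td>29382385</td><td>0.011353</td></tr>
<tr><th>2</th><td>Western Australia</td><td>206961</td><td>16490927</td><td>0.012550</td></tr>
<tr><th>3</th><td>Victoria</td><td>161047</td><td>26711477</td><td>0.006029</td></tr>
<tr><th>4</th><td>South Australia</td><td>132064</td><td>16943495</td><td>0.007794</td></tr>
<tr><th>5</th><td>Tasmania</td><td>91574</td><td>11803719</td><td>0.007758</td></tr>
<tr><th>6</th><td>Australian Capital Territory</td><td>80686</td><td>553926</td><td>0.145662</td></tr>
<tr><th>7</th><td>Northern Territory</td><td>5908</td><td>294418</td><td>0.020067</td></tr>
</tbody>
</table>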
\n", "
" ], "text/plain": [ " state total_x total_y proportion\n", "0 New South Wales 547572 55784552 0.009816\n", "1 Queensland 333583 29382385 0.011353\n", "2 Western Australia 206961 16490927 0.012550\n", "3 Victoria 161047 26711477 0.006029\n", "4 South Australia 132064 16943495 0.007794\n", "5 Tasmania 91574 11803719 0.007758\n", "6 Australian Capital Territory 80686 553926 0.145662\n", "7 Northern Territory 5908 294418 0.020067" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reformat the facets\n", "total_df = format_facets(total_data)\n", "# Merge the two dataframes, joining them on the 'state' column\n", "df = pd.merge(df, total_df, on=\"state\", how=\"left\")\n", "# Create a new column that contains the proportion of the total results this search represents\n", "df[\"proportion\"] = df[\"total_x\"] / df[\"total_y\"]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some better charts!\n", "\n", "Just run the cells!" 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon\n", "chart2 = (\n", " alt.Chart(df)\n", " .mark_bar(color=\"#084081\")\n", " .encode(\n", " x=alt.X(\"proportion\", axis=alt.Axis(title=\"Proportion of articles\")),\n", " y=alt.Y(\"state\", axis=alt.Axis(title=\"\")),\n", " tooltip=[alt.Tooltip(\"proportion\", title=\"Proportion\")],\n", " )\n", " .properties(width=300, height=200)\n", ")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "map2 = (\n", " alt.Chart(alt.Data(values=geo_data[\"features\"]))\n", " .mark_geoshape(stroke=\"black\", strokeWidth=0.2)\n", " .encode(\n", " color=alt.Color(\n", " \"proportion:Q\",\n", " scale=alt.Scale(scheme=\"greenblue\"),\n", " legend=alt.Legend(title=\"Proportion of articles\"),\n", " )\n", " )\n", " .transform_lookup(\n", " lookup=\"properties.STATE_NAME\",\n", " from_=alt.LookupData(df, \"state\", [\"proportion\"]),\n", " )\n", " .project(type=\"mercator\")\n", " .properties(width=400, height=400)\n", ")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon\n", "alt.hconcat(map2, chart2).resolve_legend(color=\"independent\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }