{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Map Trove newspaper results by state\n", "\n", "Version 2 of the Trove API adds a `state` facet to the `newspaper` zone. This means we can easily find out how many of the articles matching a search query were published in each state." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!\n", "\n", "Some tips:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add your API key" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This creates a variable called 'api_key'. Paste your key between the quotes.\n", "# <-- Then click the run icon \n", "api_key = 'YOUR API KEY'\n", "\n", "# This displays a message with your key\n", "print('Your API key is: {}'.format(api_key))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up\n", "\n", "You don't need to edit anything here. Just run the cells to load the bits and pieces we need." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Import the libraries we need\n", "# <-- Click the run icon \n", "import requests\n", "import pandas as pd\n", "import os\n", "import altair as alt\n", "import json" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Set up default parameters for our API query\n", "# <-- Click the run icon \n", "params = {\n", "    'zone': 'newspaper',\n", "    'encoding': 'json',\n", "    'facet': 'state',  # Ask for the number of results in each state\n", "    'n': '1',  # We only need the facet counts, so just ask for one result\n", "    'key': api_key\n", "}\n", "\n", "api_url = 'http://api.trove.nla.gov.au/v2/result'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct your search\n", "\n", "This is where you set your search keywords. Change 'radio' in the cell below to anything you might enter in the Trove simple search box. For example:\n", "\n", "`params['q'] = 'weather AND wragge'`\n", "\n", "`params['q'] = '\"Clement Wragge\"'`\n", "\n", "`params['q'] = 'text:\"White Australia Policy\"'`\n", "\n", "`params['q'] = 'weather AND date:[1890-01-01T00:00:00Z TO 1920-12-11T00:00:00Z]'`\n", "\n", "You can also limit the results to specific categories. 
To only search for articles, include this line:\n", "\n", "`params['l-category'] = 'Article'`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Enter your search parameters\n", "# This can be anything you'd enter in the Trove simple search box\n", "params['q'] = 'radio'\n", "\n", "# Limit the results to the article category\n", "# (add a \"#\" to the start of the line below to search all categories)\n", "params['l-category'] = 'Article'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get the data from Trove\n", "\n", "Everything's set up, so just run the cells!\n", "\n", "### Make an API request" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# <-- Click the run icon \n", "response = requests.get(api_url, params=params)\n", "# Stop with an error message if the request failed (for example, a problem with your API key)\n", "response.raise_for_status()\n", "data = response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reformat the results" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>state</th><th>total</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>New South Wales</td><td>524367</td></tr>\n",
"    <tr><th>1</th><td>Queensland</td><td>332178</td></tr>\n",
"    <tr><th>2</th><td>Western Australia</td><td>193398</td></tr>\n",
"    <tr><th>3</th><td>Victoria</td><td>139507</td></tr>\n",
"    <tr><th>4</th><td>South Australia</td><td>130366</td></tr>\n",
"    <tr><th>5</th><td>Tasmania</td><td>89391</td></tr>\n",
"    <tr><th>6</th><td>Australian Capital Territory</td><td>79916</td></tr>\n",
"    <tr><th>9</th><td>Northern Territory</td><td>5876</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " state total\n", "0 New South Wales 524367\n", "1 Queensland 332178\n", "2 Western Australia 193398\n", "3 Victoria 139507\n", "4 South Australia 130366\n", "5 Tasmania 89391\n", "6 Australian Capital Territory 79916\n", "9 Northern Territory 5876" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# <-- Click the run icon \n", "def format_facets(data):\n", "    # The state facet counts are nested inside the API response\n", "    facets = data['response']['zone'][0]['facets']['facet']['term']\n", "    df = pd.DataFrame(facets)\n", "    # Keep just the state names ('display') and the number of results ('count')\n", "    df = df[['display', 'count']]\n", "    df.columns = ['state', 'total']\n", "    # Make sure the totals are numbers, not strings\n", "    df['total'] = pd.to_numeric(df['total'], errors='coerce')\n", "    # Use the full name so it matches the state names in the map data\n", "    df = df.replace('ACT', 'Australian Capital Territory')\n", "    # 'National' and 'International' aren't states, so remove them\n", "    df = df[(df['state'] != 'National') & (df['state'] != 'International')]\n", "    return df\n", "\n", "df = format_facets(data)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some charts!\n", "\n", "Just run the cells!" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon \n", "chart = alt.Chart(df).mark_bar(color='#084081').encode(\n", "    x=alt.X('total', axis=alt.Axis(title='Total articles')),\n", "    y=alt.Y('state', axis=alt.Axis(title='')),\n", "    tooltip=[alt.Tooltip('total', title='Total articles')]\n", ").properties(width=300, height=200)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Make a choropleth map\n", "# <-- Click the run icon \n", "with open('data/aus_state.geojson', \"r\") as geo_file:\n", "    geo_data = json.load(geo_file)\n", "map = alt.Chart(alt.Data(values=geo_data['features'])\n", "    ).mark_geoshape(stroke='black', strokeWidth=0.2\n", "    ).encode(color=alt.Color('total:Q', scale=alt.Scale(scheme='greenblue'), legend=alt.Legend(title='Total articles'))\n", "    ).transform_lookup(lookup='properties.STATE_NAME', from_=alt.LookupData(df, 'state', ['total'])\n", "
).project(type='mercator'\n", " ).properties(width=400, height=400)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon \n", "alt.hconcat(map, chart).resolve_legend(\n", "    color=\"independent\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate proportions\n", "\n", "Of course, this is just the raw number of results. Some states have more newspapers in Trove than others, so we have to be careful about making comparisons. A more meaningful basis for comparison is the proportion of articles: we'll divide the number of search results for each state by the total number of articles published in that state.\n", "\n", "To find those totals, we need to make another API request, this time leaving the keywords out of the query. The totals should cover the same time frame as your search. If you've set a date range, keep it as the value for `params['q']`, for example:\n", "\n", "```\n", "params['q'] = 'date:[* TO 1954]'\n", "```\n", "\n", "Otherwise, set `params['q']` to a string containing nothing but a space:\n", "\n", "```\n", "params['q'] = ' '\n", "```" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Remove the keywords from the query, keeping any date range used in your search\n", "# (use ' ' if your search had no date range)\n", "#params['q'] = ' '\n", "params['q'] = 'date:[* TO 1954]'\n", "response = requests.get(api_url, params=params)\n", "response.raise_for_status()\n", "total_data = response.json()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>state</th><th>total_x</th><th>total_y</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>New South Wales</td><td>524367</td><td>55028553</td><td>0.009529</td></tr>\n",
"    <tr><th>1</th><td>Queensland</td><td>332178</td><td>29335754</td><td>0.011323</td></tr>\n",
"    <tr><th>2</th><td>Western Australia</td><td>193398</td><td>15423071</td><td>0.012540</td></tr>\n",
"    <tr><th>3</th><td>Victoria</td><td>139507</td><td>26099883</td><td>0.005345</td></tr>\n",
"    <tr><th>4</th><td>South Australia</td><td>130366</td><td>16835623</td><td>0.007743</td></tr>\n",
"    <tr><th>5</th><td>Tasmania</td><td>89391</td><td>11667227</td><td>0.007662</td></tr>\n",
"    <tr><th>6</th><td>Australian Capital Territory</td><td>79916</td><td>553926</td><td>0.144272</td></tr>\n",
"    <tr><th>7</th><td>Northern Territory</td><td>5876</td><td>294418</td><td>0.019958</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " state total_x total_y proportion\n", "0 New South Wales 524367 55028553 0.009529\n", "1 Queensland 332178 29335754 0.011323\n", "2 Western Australia 193398 15423071 0.012540\n", "3 Victoria 139507 26099883 0.005345\n", "4 South Australia 130366 16835623 0.007743\n", "5 Tasmania 89391 11667227 0.007662\n", "6 Australian Capital Territory 79916 553926 0.144272\n", "7 Northern Territory 5876 294418 0.019958" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reformat the facets\n", "total_df = format_facets(total_data)\n", "# Merge the two dataframes, joining them on the 'state' column\n", "df = pd.merge(df, total_df, on='state', how='left')\n", "# Create a new column that contains the proportion of the total results this search represents\n", "df['proportion'] = df['total_x']/df['total_y']\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make some better charts!\n", "\n", "Just run the cells!" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Create a bar chart\n", "# <-- Click the run icon \n", "chart2 = alt.Chart(df).mark_bar(color='#084081').encode(\n", " x=alt.X('proportion', axis=alt.Axis(title='Proportion of articles')),\n", " y=alt.Y('state', axis=alt.Axis(title='')),\n", " tooltip=[alt.Tooltip('proportion', title='Proportion')]\n", ").properties(width=300, height=200)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "map2 = alt.Chart(alt.Data(values=geo_data['features'])\n", " ).mark_geoshape(stroke='black', strokeWidth=0.2\n", " ).encode(color=alt.Color('proportion:Q', scale=alt.Scale(scheme='greenblue'), legend=alt.Legend(title='Proportion of articles'))\n", " ).transform_lookup(lookup='properties.STATE_NAME', from_=alt.LookupData(df, 'state', ['proportion'])\n", " ).project(type='mercator'\n", " ).properties(width=400, height=400)" ] }, { "cell_type": "code", 
"execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the charts side by side\n", "# <-- Click the run icon \n", "alt.hconcat(map2, chart2).resolve_legend(\n", " color=\"independent\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }