{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Find results by country in DigitalNZ\n", "\n", "Many items in DigitalNZ include location information. This can include a country, but as far as I can see there's no direct way to search for results relating to a particular country using the API.\n", "\n", "You can, however, search for geocoded locations using bounding boxes. This notebook shows how to use bounding boxes to search by country. It makes use of the handy [country-bounding-boxes](https://github.com/graydon/country-bounding-boxes) Python package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "

\n", " Some tips:\n", "

\n", "

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import pandas as pd\n", "from country_bounding_boxes import country_subunits_by_iso_code\n", "import ipywidgets as widgets\n", "import iso3166\n", "from tqdm.auto import tqdm\n", "from vega_datasets import data as vega_data\n", "import altair as alt\n", "from IPython.display import display, HTML\n", "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get your DigitalNZ API key\n", "\n", "[Get yourself an API key](https://digitalnz.org/developers/getting-started) and paste it between the quotes below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "api_key = '[YOUR API KEY]'\n", "print('Your API key is: {}'.format(api_key))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make a results by country widget\n", "\n", "First we'll make a little widget that you can use to find the number of results in DigitalNZ for a particular country." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def search_by_country(country):\n", "    # Set basic parameters for the API query\n", "    params = {\n", "        'api_key': api_key,\n", "        'text': ''\n", "    }\n", "    result_count = 0\n", "\n", "    # We'll collect the results we get to display a sample\n", "    results = []\n", "\n", "    # Get bounding boxes for the supplied two letter country code\n", "    # Note that there may be more than one bounding box per country\n", "    bboxes = [c.bbox for c in country_subunits_by_iso_code(country.alpha2)]\n", "\n", "    # Loop through bounding boxes\n", "    for bbox in bboxes:\n", "\n", "        # DigitalNZ expects a bounding box in the format N,W,S,E.\n", "        # We have to reorganise things and save the bbox as a string to use in the API query\n", "        bbox_str = '{3},{0},{1},{2}'.format(*bbox)\n", "\n", "        # Add the bbox to the query params\n", "        params['geo_bbox'] = bbox_str\n", "\n", "        # Make the request\n", "        response = requests.get('http://api.digitalnz.org/v3/records.json', params)\n", "\n", "        # Get the result as JSON\n", "        data = response.json()\n", "\n", "        # Get the total results\n", "        result_count += data['search']['result_count']\n", "\n", "        # Add the results to our collection for this country\n", "        results += data['search']['results']\n", "\n", "    # Display the number of results\n", "    html = '<p>There are {:,} items related to {}.</p>'.format(result_count, country.name)\n", "    if results:\n", "        # If there are more than 10 results select a random sample of 10\n", "        sample = 10 if len(results) > 10 else len(results)\n", "        html += '<p>Here are {} of {:,} results:</p><ul>'.format(sample, result_count)\n", "        # Assumes each record has a 'title' field; falls back to a placeholder if not\n", "        for result in random.sample(results, sample):\n", "            html += '<li>{}</li>'.format(result.get('title', 'Untitled'))\n", "        html += '</ul>'\n", "    display(HTML(html))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This widget is bound to the search_by_country function\n", "# Selecting a country will run the function and display the results\n", "widgets.interact(search_by_country, country=iso3166.countries_by_name);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the number of results for every country\n", "\n", "Now we'll get the number of results for every country and do something with them." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def totals_by_country():\n", "    country_totals = []\n", "    params = {\n", "        'api_key': api_key,\n", "        'text': ''\n", "    }\n", "    for country in tqdm(iso3166.countries):\n", "        result_count = 0\n", "        bboxes = [c.bbox for c in country_subunits_by_iso_code(country.alpha2)]\n", "        for bbox in bboxes:\n", "            # DigitalNZ expects N,W,S,E\n", "            bbox_str = '{3},{0},{1},{2}'.format(*bbox)\n", "            params['geo_bbox'] = bbox_str\n", "            response = requests.get('http://api.digitalnz.org/v3/records.json', params)\n", "            data = response.json()\n", "            result_count += data['search']['result_count']\n", "        country_totals.append({'country': country.name, 'alpha_code': country.alpha2, 'numeric_code': int(country.numeric), 'results': result_count})\n", "    return country_totals" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "totals = totals_by_country()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>country</th><th>alpha_code</th><th>numeric_code</th><th>results</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>159</th><td>New Zealand</td><td>NZ</td><td>554</td><td>723861</td></tr>\n",
"    <tr><th>8</th><td>Antarctica</td><td>AQ</td><td>10</td><td>18450</td></tr>\n",
"    <tr><th>158</th><td>New Caledonia</td><td>NC</td><td>540</td><td>1460</td></tr>\n",
"    <tr><th>13</th><td>Australia</td><td>AU</td><td>36</td><td>452</td></tr>\n",
"    <tr><th>75</th><td>France</td><td>FR</td><td>250</td><td>438</td></tr>\n",
"    <tr><th>236</th><td>United States of America</td><td>US</td><td>840</td><td>364</td></tr>\n",
"    <tr><th>39</th><td>Canada</td><td>CA</td><td>124</td><td>265</td></tr>\n",
"    <tr><th>21</th><td>Belgium</td><td>BE</td><td>56</td><td>254</td></tr>\n",
"    <tr><th>73</th><td>Fiji</td><td>FJ</td><td>242</td><td>226</td></tr>\n",
"    <tr><th>204</th><td>Solomon Islands</td><td>SB</td><td>90</td><td>218</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " country alpha_code numeric_code results\n", "159 New Zealand NZ 554 723861\n", "8 Antarctica AQ 10 18450\n", "158 New Caledonia NC 540 1460\n", "13 Australia AU 36 452\n", "75 France FR 250 438\n", "236 United States of America US 840 364\n", "39 Canada CA 124 265\n", "21 Belgium BE 56 254\n", "73 Fiji FJ 242 226\n", "204 Solomon Islands SB 90 218" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert the results to a dataframe\n", "df = pd.DataFrame(totals).sort_values(['results'], ascending=False)\n", "df.head(10)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<a href=\"country_totals.csv\">Download file</a>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Save the results to a CSV file in case you want to download them\n", "df.to_csv('country_totals.csv', index=False)\n", "display(HTML('<a href=\"country_totals.csv\">Download file</a>'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display the results on a choropleth map\n", "\n", "As you can see from the sample above, New Zealand and Antarctica have *lots* more items than anywhere else. To make it possible to see differences between the other countries in the world, we'll cap the colour scale of the map at 600." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the country outlines\n", "world = alt.topo_feature(vega_data.world_110m.url, 'countries')\n", "\n", "# Create the chart\n", "# Note domain setting to limit upper value of scale\n", "alt.Chart(world).mark_geoshape(stroke='black', strokeWidth=0.2).encode(\n", " color=alt.Color('results:Q', scale=alt.Scale(scheme='greenblue', domain=[0,600])),\n", " tooltip=['results:Q']\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(df, 'numeric_code', ['results'])\n", ").project(\n", " type='equirectangular'\n", ").properties(\n", " width=800,\n", " height=500\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }