{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting some top-level data from the DigitalNZ API\n", "\n", "This notebook pokes around at the top-level of DigitalNZ, mainly using facets.\n", "\n", "See the [API documentation](https://digitalnz.org/developers/api-docs-v3) for more detailed information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "

\n", " Some tips:\n", "

\n", "

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import pandas as pd\n", "import altair as alt\n", "from IPython.display import display, HTML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Get yourself an API key](https://digitalnz.org/developers/getting-started) and paste it between the quotes below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "api_key = '[YOUR API KEY]'\n", "print('Your API key is: {}'.format(api_key))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Base url for queries\n", "api_search_url = 'http://api.digitalnz.org/v3/records.json'\n", "\n", "# Set up the query params (we'll change these later)\n", "# Let's start with an empty text query to look at everything\n", "def set_params():\n", " params = {\n", " 'api_key': api_key,\n", " 'text': ''\n", " }\n", " return params" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def get_data(params):\n", " '''\n", " Retrieve an API query and extract the JSON payload.\n", " '''\n", " response = requests.get(api_search_url, params=params)\n", " return response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hello world!" 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " There are 32,111,791 items\n" ] } ], "source": [ "# How many items are there?\n", "params = set_params()\n", "data = get_data(params)\n", "print(' There are {:,} items'.format(data['search']['result_count']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Items by century" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "params['facets'] = 'century'\n", "data = get_data(params)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
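<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>century</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>1900</td>\n", "      <td>17209636</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1800</td>\n", "      <td>11159985</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2000</td>\n", "      <td>2482630</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>1700</td>\n", "      <td>6087</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>1600</td>\n", "      <td>2782</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>1400</td>\n", "      <td>1109</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>1300</td>\n", "      <td>1014</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>1500</td>\n", "      <td>606</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>8</th>\n", "      <td>600</td>\n", "      <td>542</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>9</th>\n", "      <td>700</td>\n", "      <td>388</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>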
" ], "text/plain": [ " century count\n", "0 1900 17209636\n", "1 1800 11159985\n", "2 2000 2482630\n", "3 1700 6087\n", "4 1600 2782\n", "5 1400 1109\n", "6 1300 1014\n", "7 1500 606\n", "8 600 542\n", "9 700 388" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "centuries = data['search']['facets']['century']\n", "centuries_df = pd.Series(centuries).to_frame().reset_index()\n", "centuries_df.columns = ['century', 'count']\n", "centuries_df" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = alt.Chart(centuries_df).mark_bar().encode(\n", " x = 'century:O',\n", " y = 'count:Q',\n", " tooltip = alt.Tooltip('count', format=',')\n", ")\n", "c2 = alt.Chart(centuries_df).mark_bar().encode(\n", " x = 'century:O',\n", " y = alt.Y('count:Q', \n", " scale=alt.Scale(type='log')),\n", " tooltip = alt.Tooltip('count', format=',')\n", ")\n", "c1 | c2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Items by decade" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "params['facets'] = 'decade'\n", "params['facets_per_page'] = 25\n", "data = get_data(params)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
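<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>decade</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>1900</td>\n", "      <td>6464371</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1910</td>\n", "      <td>6178640</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>1890</td>\n", "      <td>4758678</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>1880</td>\n", "      <td>3663331</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>1870</td>\n", "      <td>1844200</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>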
" ], "text/plain": [ " decade count\n", "0 1900 6464371\n", "1 1910 6178640\n", "2 1890 4758678\n", "3 1880 3663331\n", "4 1870 1844200" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "decades = data['search']['facets']['decade']\n", "decades_df = pd.Series(decades).to_frame().reset_index()\n", "decades_df.columns = ['decade', 'count']\n", "decades_df.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(decades_df).mark_bar().encode(\n", " x = 'decade:O',\n", " y = 'count:Q',\n", " tooltip = alt.Tooltip('count', format=',')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Top 25 collections" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "params['facets'] = 'display_collection'\n", "params['facets_per_page'] = 26\n", "data = get_data(params)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
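<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>collection</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Papers Past</td>\n", "      <td>26122911</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Radio New Zealand</td>\n", "      <td>778363</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>iNaturalist NZ — Mātaki Taiao</td>\n", "      <td>571510</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>TAPUHI</td>\n", "      <td>338051</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Auckland Libraries Heritage Images Collection</td>\n", "      <td>267112</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>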
" ], "text/plain": [ " collection count\n", "0 Papers Past 26122911\n", "1 Radio New Zealand 778363\n", "2 iNaturalist NZ — Mātaki Taiao 571510\n", "3 TAPUHI 338051\n", "4 Auckland Libraries Heritage Images Collection 267112" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that the facet is called 'primary_collection' in the results!\n", "collections = data['search']['facets']['primary_collection']\n", "collections_df = pd.Series(collections).to_frame().reset_index()\n", "collections_df.columns = ['collection', 'count']\n", "collections_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Papers Past is so much bigger than anything else, let's exclude it from the chart." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(collections_df[1:]).mark_bar().encode(\n", " x=alt.X('count:Q'),\n", " y=alt.Y('collection:N'),\n", " tooltip = alt.Tooltip('count', format=',')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a dataset of all collections" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "more = True\n", "all_collections = {}\n", "params['facets'] = 'display_collection'\n", "params['facets_per_page'] = 100\n", "params['facets_page'] = 1\n", "while more:\n", " data = get_data(params)\n", " facets = data['search']['facets']['primary_collection']\n", " if facets:\n", " all_collections.update(facets)\n", " params['facets_page'] += 1\n", " else:\n", " more = False" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
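<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>collection</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Papers Past</td>\n", "      <td>26122911</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Radio New Zealand</td>\n", "      <td>778363</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>iNaturalist NZ — Mātaki Taiao</td>\n", "      <td>571510</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>TAPUHI</td>\n", "      <td>338051</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Auckland Libraries Heritage Images Collection</td>\n", "      <td>267112</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>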
" ], "text/plain": [ " collection count\n", "0 Papers Past 26122911\n", "1 Radio New Zealand 778363\n", "2 iNaturalist NZ — Mātaki Taiao 571510\n", "3 TAPUHI 338051\n", "4 Auckland Libraries Heritage Images Collection 267112" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_collections_df = pd.Series(all_collections).to_frame().reset_index()\n", "all_collections_df.columns = ['collection', 'count']\n", "all_collections_df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Download CSV file" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "all_collections_df.to_csv('digitalnz_collections.csv', index=False)\n", "display(HTML('Download CSV file'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Top 25 newspapers in Papers Past" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "params['facets'] = 'collection'\n", "params['and[display_collection][]'] = 'Papers Past'\n", "params['facets_per_page'] = 26\n", "params['facets_page'] = 1\n", "data = get_data(params)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
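<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>newspaper</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Papers Past</td>\n", "      <td>26122911</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Evening Post</td>\n", "      <td>3772941</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>Otago Daily Times</td>\n", "      <td>1583125</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>Wanganui Chronicle</td>\n", "      <td>1163217</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Hawera &amp; Normanby Star</td>\n", "      <td>1075326</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>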
" ], "text/plain": [ " newspaper count\n", "0 Papers Past 26122911\n", "1 Evening Post 3772941\n", "2 Otago Daily Times 1583125\n", "3 Wanganui Chronicle 1163217\n", "4 Hawera & Normanby Star 1075326" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newspapers = data['search']['facets']['collection']\n", "newspapers_df = pd.Series(newspapers).to_frame().reset_index()\n", "newspapers_df.columns = ['newspaper', 'count']\n", "newspapers_df.head()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(newspapers_df[1:]).mark_bar().encode(\n", " x=alt.X('count:Q'),\n", " y=alt.Y('newspaper:N'),\n", " tooltip = alt.Tooltip('count', format=',')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All newspapers in Papers Past" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "more = True\n", "all_newspapers = {}\n", "params['facets'] = 'collection'\n", "params['and[display_collection][]'] = 'Papers Past'\n", "params['facets_per_page'] = 100\n", "params['facets_page'] = 1\n", "while more:\n", " data = get_data(params)\n", " facets = data['search']['facets']['collection']\n", " if facets:\n", " all_newspapers.update(facets)\n", " params['facets_page'] += 1\n", " else:\n", " more = False" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
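<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>newspaper</th>\n", "      <th>count</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Papers Past</td>\n", "      <td>26122911</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Evening Post</td>\n", "      <td>3772941</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>Otago Daily Times</td>\n", "      <td>1583125</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>Wanganui Chronicle</td>\n", "      <td>1163217</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Hawera &amp; Normanby Star</td>\n", "      <td>1075326</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>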
" ], "text/plain": [ " newspaper count\n", "0 Papers Past 26122911\n", "1 Evening Post 3772941\n", "2 Otago Daily Times 1583125\n", "3 Wanganui Chronicle 1163217\n", "4 Hawera & Normanby Star 1075326" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_newspapers_df = pd.Series(all_newspapers).to_frame().reset_index()\n", "all_newspapers_df.columns = ['newspaper', 'count']\n", "all_newspapers_df.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Download CSV file" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "all_newspapers_df[1:].to_csv('paperspast_newspapers.csv', index=False)\n", "display(HTML('Download CSV file'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }