{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting some top-level data from the DigitalNZ API\n",
    "\n",
    "This notebook pokes around at the top level of DigitalNZ, mainly using facets.\n",
    "\n",
    "See the [API documentation](https://digitalnz.org/developers/api-docs-v3) for more detailed information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>\n",
    "\n",
    "<p>\n",
    "    Some tips:\n",
    "    <ul>\n",
    "        <li>Code cells have boxes around them.</li>\n",
    "        <li>To run a code cell, click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>\n",
    "        <li>While a cell is running, a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running, the asterisk will be replaced with a number.</li>\n",
    "        <li>In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.</li>\n",
    "        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>\n",
    "    </ul>\n",
    "</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import pandas as pd\n",
    "import altair as alt\n",
    "from IPython.display import display, HTML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Get yourself an API key](https://digitalnz.org/developers/getting-started) and paste it between the quotes below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "api_key = '[YOUR API KEY]'\n",
    "print('Your API key is: {}'.format(api_key))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Base URL for queries (https, because the API key travels in the query string)\n",
    "api_search_url = 'https://api.digitalnz.org/v3/records.json'\n",
    "\n",
    "# Set up the query params (we'll change these later)\n",
    "# Let's start with an empty text query to look at everything\n",
    "def set_params():\n",
    "    params = {\n",
    "        'api_key': api_key,\n",
    "        'text': ''\n",
    "    }\n",
    "    return params"
   ]
  },
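  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_data(params):\n",
    "    '''\n",
    "    Retrieve an API query and extract the JSON payload.\n",
    "    Raises an exception if the request fails (for example, with a bad API key).\n",
    "    '''\n",
    "    response = requests.get(api_search_url, params=params, timeout=60)\n",
    "    # Stop here on an HTTP error rather than trying to parse an error page as JSON\n",
    "    response.raise_for_status()\n",
    "    return response.json()"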
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hello world!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 32,111,791 items\n"
     ]
    }
   ],
   "source": [
    "# How many items are there?\n",
    "params = set_params()\n",
    "data = get_data(params)\n",
    "print('There are {:,} items'.format(data['search']['result_count']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Items by century"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "params['facets'] = 'century'\n",
    "data = get_data(params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>century</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1900</td>\n",
       "      <td>17209636</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1800</td>\n",
       "      <td>11159985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2000</td>\n",
       "      <td>2482630</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1700</td>\n",
       "      <td>6087</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1600</td>\n",
       "      <td>2782</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1400</td>\n",
       "      <td>1109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1300</td>\n",
       "      <td>1014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1500</td>\n",
       "      <td>606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>600</td>\n",
       "      <td>542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>700</td>\n",
       "      <td>388</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  century     count\n",
       "0    1900  17209636\n",
       "1    1800  11159985\n",
       "2    2000   2482630\n",
       "3    1700      6087\n",
       "4    1600      2782\n",
       "5    1400      1109\n",
       "6    1300      1014\n",
       "7    1500       606\n",
       "8     600       542\n",
       "9     700       388"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "centuries = data['search']['facets']['century']\n",
    "centuries_df = pd.Series(centuries).to_frame().reset_index()\n",
    "centuries_df.columns = ['century', 'count']\n",
    "centuries_df"
   ]
  },
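  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps in the cell above (facet dictionary to Series, Series to dataframe, rename the columns) could be wrapped in a small helper and reused for other facets. What follows is only a sketch: `facet_to_df` is a hypothetical name rather than part of pandas or the DigitalNZ API, and it assumes the facet arrives as a dictionary of values and counts, as it does above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def facet_to_df(data, facet, label):\n",
    "    '''\n",
    "    Hypothetical helper: convert one facet from an API response\n",
    "    into a dataframe with a label column and a count column.\n",
    "    '''\n",
    "    values = data['search']['facets'][facet]\n",
    "    df = pd.Series(values).to_frame().reset_index()\n",
    "    df.columns = [label, 'count']\n",
    "    return df\n",
    "\n",
    "# Should match the centuries dataframe built above\n",
    "facet_to_df(data, 'century', 'century').head()"
   ]
  },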
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-9813cf4c6a7f451ead726ace381f419f\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-9813cf4c6a7f451ead726ace381f419f\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-9813cf4c6a7f451ead726ace381f419f\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function loadScript(lib) {\n",
       "      return new Promise(function(resolve, reject) {\n",
       "        var s = document.createElement('script');\n",
       "        s.src = paths[lib];\n",
       "        s.async = true;\n",
       "        s.onload = () => resolve(paths[lib]);\n",
       "        s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "        document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "      });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else if (typeof vegaEmbed === \"function\") {\n",
       "      displayChart(vegaEmbed);\n",
       "    } else {\n",
       "      loadScript(\"vega\")\n",
       "        .then(() => loadScript(\"vega-lite\"))\n",
       "        .then(() => loadScript(\"vega-embed\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"hconcat\": [{\"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"count\", \"format\": \",\"}, \"x\": {\"type\": \"ordinal\", \"field\": \"century\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"count\"}}}, {\"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"count\", \"format\": \",\"}, \"x\": {\"type\": \"ordinal\", \"field\": \"century\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"count\", \"scale\": {\"type\": \"log\"}}}}], \"data\": {\"name\": \"data-feb0c62c5795202b46bb61e6db92602d\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-feb0c62c5795202b46bb61e6db92602d\": [{\"century\": \"1900\", \"count\": 17209636}, {\"century\": \"1800\", \"count\": 11159985}, {\"century\": \"2000\", \"count\": 2482630}, {\"century\": \"1700\", \"count\": 6087}, {\"century\": \"1600\", \"count\": 2782}, {\"century\": \"1400\", \"count\": 1109}, {\"century\": \"1300\", \"count\": 1014}, {\"century\": \"1500\", \"count\": 606}, {\"century\": \"600\", \"count\": 542}, {\"century\": \"700\", \"count\": 388}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.HConcatChart(...)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Linear scale: the 1800s and 1900s dominate\n",
    "c1 = alt.Chart(centuries_df).mark_bar().encode(\n",
    "    x = 'century:O',\n",
    "    y = 'count:Q',\n",
    "    tooltip = alt.Tooltip('count', format=',')\n",
    ")\n",
    "# Log scale: makes the much smaller early centuries visible\n",
    "c2 = alt.Chart(centuries_df).mark_bar().encode(\n",
    "    x = 'century:O',\n",
    "    y = alt.Y('count:Q', scale=alt.Scale(type='log')),\n",
    "    tooltip = alt.Tooltip('count', format=',')\n",
    ")\n",
    "c1 | c2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Items by decade"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "params['facets'] = 'decade'\n",
    "params['facets_per_page'] = 25\n",
    "data = get_data(params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>decade</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1900</td>\n",
       "      <td>6464371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1910</td>\n",
       "      <td>6178640</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1890</td>\n",
       "      <td>4758678</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1880</td>\n",
       "      <td>3663331</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1870</td>\n",
       "      <td>1844200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  decade    count\n",
       "0   1900  6464371\n",
       "1   1910  6178640\n",
       "2   1890  4758678\n",
       "3   1880  3663331\n",
       "4   1870  1844200"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "decades = data['search']['facets']['decade']\n",
    "decades_df = pd.Series(decades).to_frame().reset_index()\n",
    "decades_df.columns = ['decade', 'count']\n",
    "decades_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-79fdebfe2001453894033eeeea769851\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-79fdebfe2001453894033eeeea769851\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-79fdebfe2001453894033eeeea769851\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function loadScript(lib) {\n",
       "      return new Promise(function(resolve, reject) {\n",
       "        var s = document.createElement('script');\n",
       "        s.src = paths[lib];\n",
       "        s.async = true;\n",
       "        s.onload = () => resolve(paths[lib]);\n",
       "        s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "        document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "      });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else if (typeof vegaEmbed === \"function\") {\n",
       "      displayChart(vegaEmbed);\n",
       "    } else {\n",
       "      loadScript(\"vega\")\n",
       "        .then(() => loadScript(\"vega-lite\"))\n",
       "        .then(() => loadScript(\"vega-embed\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-b0a7ac9f6d16ad1892825170cf3ded32\"}, \"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"count\", \"format\": \",\"}, \"x\": {\"type\": \"ordinal\", \"field\": \"decade\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"count\"}}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-b0a7ac9f6d16ad1892825170cf3ded32\": [{\"decade\": \"1900\", \"count\": 6464371}, {\"decade\": \"1910\", \"count\": 6178640}, {\"decade\": \"1890\", \"count\": 4758678}, {\"decade\": \"1880\", \"count\": 3663331}, {\"decade\": \"1870\", \"count\": 1844200}, {\"decade\": \"2010\", \"count\": 1770667}, {\"decade\": \"1920\", \"count\": 1741989}, {\"decade\": \"1930\", \"count\": 1201114}, {\"decade\": \"1860\", \"count\": 726969}, {\"decade\": \"1940\", \"count\": 658329}, {\"decade\": \"2000\", \"count\": 421910}, {\"decade\": \"2020\", \"count\": 290646}, {\"decade\": \"1960\", \"count\": 274765}, {\"decade\": \"1950\", \"count\": 261727}, {\"decade\": \"1970\", \"count\": 188858}, {\"decade\": \"1990\", \"count\": 166874}, {\"decade\": \"1980\", \"count\": 147218}, {\"decade\": \"1850\", \"count\": 125768}, {\"decade\": \"1840\", \"count\": 43237}, {\"decade\": \"1830\", \"count\": 3152}, {\"decade\": \"1800\", \"count\": 3109}, {\"decade\": \"1820\", \"count\": 1905}, {\"decade\": \"1810\", \"count\": 1415}, {\"decade\": \"1770\", \"count\": 1095}, {\"decade\": \"1790\", \"count\": 1056}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(decades_df).mark_bar().encode(\n",
    "    x = 'decade:O',\n",
    "    y = 'count:Q',\n",
    "    tooltip = alt.Tooltip('count', format=',')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Top 25 collections"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "params['facets'] = 'display_collection'\n",
    "# Ask for 26 values so that 25 are left once Papers Past is dropped from the chart\n",
    "params['facets_per_page'] = 26\n",
    "data = get_data(params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>collection</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Papers Past</td>\n",
       "      <td>26122911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Radio New Zealand</td>\n",
       "      <td>778363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>iNaturalist NZ — Mātaki Taiao</td>\n",
       "      <td>571510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>TAPUHI</td>\n",
       "      <td>338051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Auckland Libraries Heritage Images Collection</td>\n",
       "      <td>267112</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      collection     count\n",
       "0                                    Papers Past  26122911\n",
       "1                              Radio New Zealand    778363\n",
       "2                  iNaturalist NZ — Mātaki Taiao    571510\n",
       "3                                         TAPUHI    338051\n",
       "4  Auckland Libraries Heritage Images Collection    267112"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Note that the facet is called 'primary_collection' in the results!\n",
    "collections = data['search']['facets']['primary_collection']\n",
    "collections_df = pd.Series(collections).to_frame().reset_index()\n",
    "collections_df.columns = ['collection', 'count']\n",
    "collections_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Papers Past is so much bigger than anything else that it would swamp the chart, so let's exclude it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
     {
      "data": {
       "text/html": [
        "\n",
        "<div id=\"altair-viz-7df58fdd8e004f398cfb4ae9352d2183\"></div>\n",
        "<script type=\"text/javascript\">\n",
        "  (function(spec, embedOpt){\n",
        "    let outputDiv = document.currentScript.previousElementSibling;\n",
        "    if (outputDiv.id !== \"altair-viz-7df58fdd8e004f398cfb4ae9352d2183\") {\n",
        "      outputDiv = document.getElementById(\"altair-viz-7df58fdd8e004f398cfb4ae9352d2183\");\n",
        "    }\n",
        "    const paths = {\n",
        "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
        "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
        "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
        "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
        "    };\n",
        "\n",
        "    function loadScript(lib) {\n",
        "      return new Promise(function(resolve, reject) {\n",
        "        var s = document.createElement('script');\n",
        "        s.src = paths[lib];\n",
        "        s.async = true;\n",
        "        s.onload = () => resolve(paths[lib]);\n",
        "        s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
        "        document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
        "      });\n",
        "    }\n",
        "\n",
        "    function showError(err) {\n",
        "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
        "      throw err;\n",
        "    }\n",
        "\n",
        "    function displayChart(vegaEmbed) {\n",
        "      vegaEmbed(outputDiv, spec, embedOpt)\n",
        "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
        "    }\n",
        "\n",
        "    if(typeof define === \"function\" && define.amd) {\n",
        "      requirejs.config({paths});\n",
        "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
        "    } else if (typeof vegaEmbed === \"function\") {\n",
        "      displayChart(vegaEmbed);\n",
        "    } else {\n",
        "      loadScript(\"vega\")\n",
        "        .then(() => loadScript(\"vega-lite\"))\n",
        "        .then(() => loadScript(\"vega-embed\"))\n",
        "        .catch(showError)\n",
        "        .then(() => displayChart(vegaEmbed));\n",
        "    }\n",
        "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-066c8c5fc438fd9d0795ff1adddd1cf7\"}, \"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"count\", \"format\": \",\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"count\"}, \"y\": {\"type\": \"nominal\", \"field\": \"collection\"}}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-066c8c5fc438fd9d0795ff1adddd1cf7\": [{\"collection\": \"Radio New Zealand\", \"count\": 778363}, {\"collection\": \"iNaturalist NZ \\u2014 M\\u0101taki Taiao\", \"count\": 571510}, {\"collection\": \"TAPUHI\", \"count\": 338051}, {\"collection\": \"Auckland Libraries Heritage Images Collection\", \"count\": 267112}, {\"collection\": \"Auckland Museum Collections\", \"count\": 261411}, {\"collection\": \"Cenotaph Database\", \"count\": 252931}, {\"collection\": \"New Zealand Gazette\", \"count\": 223448}, {\"collection\": \"New Zealand Electronic Text Collection\", \"count\": 222512}, {\"collection\": \"Te Papa Collections Online\", \"count\": 217714}, {\"collection\": \"Nelson Provincial Museum\", \"count\": 153977}, {\"collection\": \"Archway\", \"count\": 140914}, {\"collection\": \"Puke Ariki\", \"count\": 133622}, {\"collection\": \"Canterbury Museum\", \"count\": 129754}, {\"collection\": \"QuakeStudies Repository\", \"count\": 129439}, {\"collection\": \"Trove\", \"count\": 122094}, {\"collection\": \"Kura Heritage Collections Online\", \"count\": 118991}, {\"collection\": \"National Library of New Zealand Catalogue\", \"count\": 105049}, {\"collection\": \"Antarctica NZ Digital Asset Manager\", \"count\": 58934}, {\"collection\": \"TVNZ\", \"count\": 54346}, {\"collection\": \"Anthropology Photographic Archive\", \"count\": 51499}, {\"collection\": \"Upper Hutt Newspaper Archive\", \"count\": 51184}, {\"collection\": \"Figure.NZ\", \"count\": 49517}, {\"collection\": \"Kete Christchurch\", \"count\": 45056}, {\"collection\": \"Newshub\", \"count\": 37818}, {\"collection\": \"Transactions and Proceedings of the Royal Society of New Zealand 1868-1961\", \"count\": 37625}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(collections_df[1:]).mark_bar().encode(\n",
    "    x=alt.X('count:Q'),\n",
    "    y=alt.Y('collection:N'),\n",
    "    tooltip = alt.Tooltip('count', format=',')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a dataset of all collections"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Page through the facet values, 100 at a time, until an empty page comes back\n",
    "all_collections = {}\n",
    "params['facets'] = 'display_collection'\n",
    "params['facets_per_page'] = 100\n",
    "params['facets_page'] = 1\n",
    "while True:\n",
    "    data = get_data(params)\n",
    "    # As before, the results call this facet 'primary_collection'\n",
    "    facets = data['search']['facets']['primary_collection']\n",
    "    if not facets:\n",
    "        break\n",
    "    all_collections.update(facets)\n",
    "    params['facets_page'] += 1"
   ]
  },
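  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above stops when a page comes back with no facet values. A more defensive variant, sketched below, also caps the number of pages it will request, so an unexpected response can't keep it running. `harvest_facet` and `max_pages` are hypothetical names, and the default cap of 50 pages is an arbitrary assumption, not a limit documented by DigitalNZ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def harvest_facet(params, facet_key, max_pages=50):\n",
    "    '''\n",
    "    Hypothetical helper: collect every value of a facet, one page at a time,\n",
    "    giving up after max_pages requests.\n",
    "    '''\n",
    "    # Work on a copy so the shared params dictionary isn't changed\n",
    "    page_params = dict(params)\n",
    "    results = {}\n",
    "    for page in range(1, max_pages + 1):\n",
    "        page_params['facets_page'] = page\n",
    "        facets = get_data(page_params)['search']['facets'].get(facet_key)\n",
    "        # An empty or missing facet means there are no more pages\n",
    "        if not facets:\n",
    "            break\n",
    "        results.update(facets)\n",
    "    return results\n",
    "\n",
    "# Should collect the same values as the loop above\n",
    "len(harvest_facet(params, 'primary_collection'))"
   ]
  },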
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>collection</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Papers Past</td>\n",
       "      <td>26122911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Radio New Zealand</td>\n",
       "      <td>778363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>iNaturalist NZ — Mātaki Taiao</td>\n",
       "      <td>571510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>TAPUHI</td>\n",
       "      <td>338051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Auckland Libraries Heritage Images Collection</td>\n",
       "      <td>267112</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      collection     count\n",
       "0                                    Papers Past  26122911\n",
       "1                              Radio New Zealand    778363\n",
       "2                  iNaturalist NZ — Mātaki Taiao    571510\n",
       "3                                         TAPUHI    338051\n",
       "4  Auckland Libraries Heritage Images Collection    267112"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_collections_df = pd.Series(all_collections).to_frame().reset_index()\n",
    "all_collections_df.columns = ['collection', 'count']\n",
    "all_collections_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href=\"digitalnz_collections.csv\" download>Download CSV file</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "all_collections_df.to_csv('digitalnz_collections.csv', index=False)\n",
    "display(HTML('<a href=\"digitalnz_collections.csv\" download>Download CSV file</a>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Top 25 newspapers in Papers Past"
   ]
  },
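  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Within Papers Past, the 'collection' facet lists the individual newspapers\n",
    "params['facets'] = 'collection'\n",
    "params['and[display_collection][]'] = 'Papers Past'\n",
    "# Ask for 26 values, as the first result is Papers Past itself\n",
    "params['facets_per_page'] = 26\n",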
    "params['facets_page'] = 1\n",
    "data = get_data(params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>newspaper</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Papers Past</td>\n",
       "      <td>26122911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Evening Post</td>\n",
       "      <td>3772941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Otago Daily Times</td>\n",
       "      <td>1583125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wanganui Chronicle</td>\n",
       "      <td>1163217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hawera &amp; Normanby Star</td>\n",
       "      <td>1075326</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                newspaper     count\n",
       "0             Papers Past  26122911\n",
       "1            Evening Post   3772941\n",
       "2       Otago Daily Times   1583125\n",
       "3      Wanganui Chronicle   1163217\n",
       "4  Hawera & Normanby Star   1075326"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
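   ],
   "source": [
    "# Convert the facet counts into a dataframe with one row per facet value\n",
    "# Note: row 0 is the 'Papers Past' total rather than an individual newspaper\n",
    "newspapers = data['search']['facets']['collection']\n",
    "newspapers_df = pd.Series(newspapers).to_frame().reset_index()\n",
    "newspapers_df.columns = ['newspaper', 'count']\n",
    "newspapers_df.head()"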
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<div id=\"altair-viz-49d278971e8f41bfb57409826ef9bc0d\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-49d278971e8f41bfb57409826ef9bc0d\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-49d278971e8f41bfb57409826ef9bc0d\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function loadScript(lib) {\n",
       "      return new Promise(function(resolve, reject) {\n",
       "        var s = document.createElement('script');\n",
       "        s.src = paths[lib];\n",
       "        s.async = true;\n",
       "        s.onload = () => resolve(paths[lib]);\n",
       "        s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "        document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "      });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else if (typeof vegaEmbed === \"function\") {\n",
       "      displayChart(vegaEmbed);\n",
       "    } else {\n",
       "      loadScript(\"vega\")\n",
       "        .then(() => loadScript(\"vega-lite\"))\n",
       "        .then(() => loadScript(\"vega-embed\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-e7189ce8320f11903201c2632a3ce22d\"}, \"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"count\", \"format\": \",\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"count\"}, \"y\": {\"type\": \"nominal\", \"field\": \"newspaper\"}}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-e7189ce8320f11903201c2632a3ce22d\": [{\"newspaper\": \"Evening Post\", \"count\": 3772941}, {\"newspaper\": \"Otago Daily Times\", \"count\": 1583125}, {\"newspaper\": \"Wanganui Chronicle\", \"count\": 1163217}, {\"newspaper\": \"Hawera & Normanby Star\", \"count\": 1075326}, {\"newspaper\": \"Marlborough Express\", \"count\": 1036166}, {\"newspaper\": \"Auckland Star\", \"count\": 1005149}, {\"newspaper\": \"Colonist\", \"count\": 999872}, {\"newspaper\": \"Poverty Bay Herald\", \"count\": 963395}, {\"newspaper\": \"Grey River Argus\", \"count\": 890905}, {\"newspaper\": \"Thames Star\", \"count\": 869635}, {\"newspaper\": \"Ashburton Guardian\", \"count\": 830871}, {\"newspaper\": \"Star\", \"count\": 819679}, {\"newspaper\": \"Wanganui Herald\", \"count\": 784091}, {\"newspaper\": \"Nelson Evening Mail\", \"count\": 693164}, {\"newspaper\": \"Taranaki Herald\", \"count\": 677704}, {\"newspaper\": \"Feilding Star\", \"count\": 650958}, {\"newspaper\": \"Otago Witness\", \"count\": 603649}, {\"newspaper\": \"Wairarapa Daily Times\", \"count\": 575596}, {\"newspaper\": \"West Coast Times\", \"count\": 534690}, {\"newspaper\": \"Taranaki Daily News\", \"count\": 525341}, {\"newspaper\": \"Hawke's Bay Herald\", \"count\": 445395}, {\"newspaper\": \"Southland Times\", \"count\": 418609}, {\"newspaper\": \"North Otago Times\", \"count\": 417958}, {\"newspaper\": \"Timaru Herald\", \"count\": 406184}, {\"newspaper\": \"Mataura Ensign\", \"count\": 363231}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Skip row 0 (the 'Papers Past' total) so the chart shows only the 25 newspaper titles\n",
    "alt.Chart(newspapers_df[1:]).mark_bar().encode(\n",
    "    x=alt.X('count:Q'),\n",
    "    y=alt.Y('newspaper:N'),\n",
    "    tooltip=alt.Tooltip('count', format=',')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## All newspapers in Papers Past"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Page through the 'collection' facet 100 values at a time, collecting every newspaper title\n",
    "more = True\n",
    "all_newspapers = {}\n",
    "params['facets'] = 'collection'\n",
    "params['and[display_collection][]'] = 'Papers Past'\n",
    "params['facets_per_page'] = 100\n",
    "params['facets_page'] = 1\n",
    "while more:\n",
    "    data = get_data(params)\n",
    "    facets = data['search']['facets']['collection']\n",
    "    if facets:\n",
    "        # Add this page of results, then request the next page\n",
    "        all_newspapers.update(facets)\n",
    "        params['facets_page'] += 1\n",
    "    else:\n",
    "        # An empty page means there are no more facet values\n",
    "        more = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>newspaper</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Papers Past</td>\n",
       "      <td>26122911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Evening Post</td>\n",
       "      <td>3772941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Otago Daily Times</td>\n",
       "      <td>1583125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wanganui Chronicle</td>\n",
       "      <td>1163217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hawera &amp; Normanby Star</td>\n",
       "      <td>1075326</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                newspaper     count\n",
       "0             Papers Past  26122911\n",
       "1            Evening Post   3772941\n",
       "2       Otago Daily Times   1583125\n",
       "3      Wanganui Chronicle   1163217\n",
       "4  Hawera & Normanby Star   1075326"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_newspapers_df = pd.Series(all_newspapers).to_frame().reset_index()\n",
    "all_newspapers_df.columns = ['newspaper', 'count']\n",
    "all_newspapers_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href=\"paperspast_newspapers.csv\" download>Download CSV file</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Drop row 0 (the 'Papers Past' total) before saving the newspaper counts as a CSV file\n",
    "all_newspapers_df[1:].to_csv('paperspast_newspapers.csv', index=False)\n",
    "display(HTML('<a href=\"paperspast_newspapers.csv\" download>Download CSV file</a>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}