{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring facets\n",
    "\n",
    "<p class=\"alert alert-info\">New to Jupyter notebooks? Try <a href=\"getting-started/Using_Jupyter_notebooks.ipynb\"><b>Using Jupyter notebooks</b></a> for a quick introduction.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Facets aggregate collection data in interesting and useful ways, allowing us to build pictures of the collection. This notebook shows you how to get facet data from Trove."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import altair as alt\n",
    "import pandas as pd\n",
    "import requests\n",
    "\n",
    "# Make sure data directory exists\n",
    "os.makedirs(\"data\", exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "# Load variables from the .env file if it exists\n",
    "# Use %%capture to suppress messages\n",
    "%load_ext dotenv\n",
    "%dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Insert your API key between the quotes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Your API key is: gq29l1g1h75pimh4\n"
     ]
    }
   ],
   "source": [
    "# This creates a variable called 'api_key', paste your key between the quotes\n",
    "API_KEY = \"\"\n",
    "\n",
    "# Use an api key value from environment variables if it is available (useful for testing)\n",
    "if os.getenv(\"TROVE_API_KEY\"):\n",
    "    API_KEY = os.getenv(\"TROVE_API_KEY\")\n",
    "\n",
    "# This displays a message with your key\n",
    "print(\"Your API key is: {}\".format(API_KEY))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "api_search_url = \"https://api.trove.nla.gov.au/v3/result\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up our query parameters. We want everything, so we set the `q` parameter to be a single space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {\n",
    "    \"q\": \" \",  # A space to search for everything\n",
    "    \"facet\": \"format\",\n",
    "    \"category\": \"book\",\n",
    "    \"encoding\": \"json\",\n",
    "    \"n\": 1,\n",
    "}\n",
    "\n",
    "headers = {\"X-API-KEY\": API_KEY}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = requests.get(api_search_url, params=params)\n",
    "data = response.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>facet</th>\n",
       "      <th>total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Archived website</td>\n",
       "      <td>33660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Article</td>\n",
       "      <td>7377170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Article/Abstract</td>\n",
       "      <td>99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Article/Book chapter</td>\n",
       "      <td>67276</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Article/Conference paper</td>\n",
       "      <td>112605</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Article/Journal or magazine article</td>\n",
       "      <td>1971332</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Article/Other article</td>\n",
       "      <td>4770227</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Article/Report</td>\n",
       "      <td>466581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Article/Review</td>\n",
       "      <td>285937</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Article/Working paper</td>\n",
       "      <td>73468</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Audio book</td>\n",
       "      <td>321559</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Book</td>\n",
       "      <td>17061706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Book/Braille</td>\n",
       "      <td>36613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Book/Illustrated</td>\n",
       "      <td>7922202</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Book/Large print</td>\n",
       "      <td>119801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Conference Proceedings</td>\n",
       "      <td>483440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Data set</td>\n",
       "      <td>27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Government publication</td>\n",
       "      <td>226184</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Microform</td>\n",
       "      <td>946703</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Periodical</td>\n",
       "      <td>2113846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Periodical/Journal, magazine, other</td>\n",
       "      <td>2028483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Periodical/Newspaper</td>\n",
       "      <td>87122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Thesis</td>\n",
       "      <td>38121</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                  facet     total\n",
       "21                     Archived website     33660\n",
       "4                               Article   7377170\n",
       "5                      Article/Abstract        99\n",
       "6                  Article/Book chapter     67276\n",
       "7              Article/Conference paper    112605\n",
       "8   Article/Journal or magazine article   1971332\n",
       "9                 Article/Other article   4770227\n",
       "10                       Article/Report    466581\n",
       "11                       Article/Review    285937\n",
       "12                Article/Working paper     73468\n",
       "18                           Audio book    321559\n",
       "0                                  Book  17061706\n",
       "1                          Book/Braille     36613\n",
       "2                      Book/Illustrated   7922202\n",
       "3                      Book/Large print    119801\n",
       "17               Conference Proceedings    483440\n",
       "22                             Data set        27\n",
       "19               Government publication    226184\n",
       "16                            Microform    946703\n",
       "13                           Periodical   2113846\n",
       "14  Periodical/Journal, magazine, other   2028483\n",
       "15                 Periodical/Newspaper     87122\n",
       "20                               Thesis     38121"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def facet_totals(data):\n",
    "    \"\"\"\n",
    "    Loop through facets saving terms and counts.\n",
    "    Returns a list of dictionaries.\n",
    "    \"\"\"\n",
    "    facets = []\n",
    "    try:\n",
    "        terms = data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]\n",
    "    except KeyError:\n",
    "        pass\n",
    "    else:\n",
    "        for term in terms:\n",
    "            facets.append({\"facet\": term[\"search\"], \"total\": int(term[\"count\"])})\n",
    "            if \"term\" in term:\n",
    "                # There be sub-terms!\n",
    "                for subterm in term[\"term\"]:\n",
    "                    facets.append(\n",
    "                        {\"facet\": subterm[\"search\"], \"total\": int(subterm[\"count\"])}\n",
    "                    )\n",
    "    return pd.DataFrame(facets)\n",
    "\n",
    "\n",
    "facet_totals = facet_totals(data)\n",
    "facet_totals.sort_values(\"facet\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assign a group by splitting\n",
    "facet_totals[\"group\"] = facet_totals[\"facet\"].apply(lambda x: x.split(\"/\")[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can create a bar chart using Altair. The `x` values will be the zone names, and the `y` values will be the totals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comment out either or both of these lines if not necessary\n",
    "# Sort by total (highest to lowest) and take the top twenty\n",
    "# top_facets = facet_totals.sort_values(by=\"total\", ascending=False)[:20]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<style>\n",
       "  #altair-viz-c11e470ed2de41e389c96bab6f2c495a.vega-embed {\n",
       "    width: 100%;\n",
       "    display: flex;\n",
       "  }\n",
       "\n",
       "  #altair-viz-c11e470ed2de41e389c96bab6f2c495a.vega-embed details,\n",
       "  #altair-viz-c11e470ed2de41e389c96bab6f2c495a.vega-embed details summary {\n",
       "    position: relative;\n",
       "  }\n",
       "</style>\n",
       "<div id=\"altair-viz-c11e470ed2de41e389c96bab6f2c495a\"></div>\n",
       "<script type=\"text/javascript\">\n",
       "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
       "  (function(spec, embedOpt){\n",
       "    let outputDiv = document.currentScript.previousElementSibling;\n",
       "    if (outputDiv.id !== \"altair-viz-c11e470ed2de41e389c96bab6f2c495a\") {\n",
       "      outputDiv = document.getElementById(\"altair-viz-c11e470ed2de41e389c96bab6f2c495a\");\n",
       "    }\n",
       "    const paths = {\n",
       "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
       "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
       "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.16.3?noext\",\n",
       "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
       "    };\n",
       "\n",
       "    function maybeLoadScript(lib, version) {\n",
       "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
       "      return (VEGA_DEBUG[key] == version) ?\n",
       "        Promise.resolve(paths[lib]) :\n",
       "        new Promise(function(resolve, reject) {\n",
       "          var s = document.createElement('script');\n",
       "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
       "          s.async = true;\n",
       "          s.onload = () => {\n",
       "            VEGA_DEBUG[key] = version;\n",
       "            return resolve(paths[lib]);\n",
       "          };\n",
       "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
       "          s.src = paths[lib];\n",
       "        });\n",
       "    }\n",
       "\n",
       "    function showError(err) {\n",
       "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
       "      throw err;\n",
       "    }\n",
       "\n",
       "    function displayChart(vegaEmbed) {\n",
       "      vegaEmbed(outputDiv, spec, embedOpt)\n",
       "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
       "    }\n",
       "\n",
       "    if(typeof define === \"function\" && define.amd) {\n",
       "      requirejs.config({paths});\n",
       "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
       "    } else {\n",
       "      maybeLoadScript(\"vega\", \"5\")\n",
       "        .then(() => maybeLoadScript(\"vega-lite\", \"5.16.3\"))\n",
       "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
       "        .catch(showError)\n",
       "        .then(() => displayChart(vegaEmbed));\n",
       "    }\n",
       "  })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-4b1a65bf6168b91b8d3b64119ae14124\"}, \"mark\": {\"type\": \"bar\"}, \"encoding\": {\"color\": {\"field\": \"group\", \"type\": \"nominal\"}, \"tooltip\": [{\"field\": \"facet\", \"type\": \"nominal\"}, {\"field\": \"total\", \"format\": \",\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"total\", \"type\": \"quantitative\"}, \"y\": {\"field\": \"facet\", \"type\": \"nominal\"}}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.16.3.json\", \"datasets\": {\"data-4b1a65bf6168b91b8d3b64119ae14124\": [{\"facet\": \"Book\", \"total\": 17061706, \"group\": \"Book\"}, {\"facet\": \"Book/Braille\", \"total\": 36613, \"group\": \"Book\"}, {\"facet\": \"Book/Illustrated\", \"total\": 7922202, \"group\": \"Book\"}, {\"facet\": \"Book/Large print\", \"total\": 119801, \"group\": \"Book\"}, {\"facet\": \"Article\", \"total\": 7377170, \"group\": \"Article\"}, {\"facet\": \"Article/Abstract\", \"total\": 99, \"group\": \"Article\"}, {\"facet\": \"Article/Book chapter\", \"total\": 67276, \"group\": \"Article\"}, {\"facet\": \"Article/Conference paper\", \"total\": 112605, \"group\": \"Article\"}, {\"facet\": \"Article/Journal or magazine article\", \"total\": 1971332, \"group\": \"Article\"}, {\"facet\": \"Article/Other article\", \"total\": 4770227, \"group\": \"Article\"}, {\"facet\": \"Article/Report\", \"total\": 466581, \"group\": \"Article\"}, {\"facet\": \"Article/Review\", \"total\": 285937, \"group\": \"Article\"}, {\"facet\": \"Article/Working paper\", \"total\": 73468, \"group\": \"Article\"}, {\"facet\": \"Periodical\", \"total\": 2113846, \"group\": \"Periodical\"}, {\"facet\": \"Periodical/Journal, magazine, other\", \"total\": 2028483, \"group\": \"Periodical\"}, {\"facet\": \"Periodical/Newspaper\", \"total\": 87122, \"group\": \"Periodical\"}, {\"facet\": \"Microform\", \"total\": 946703, \"group\": \"Microform\"}, {\"facet\": \"Conference Proceedings\", \"total\": 483440, \"group\": \"Conference Proceedings\"}, {\"facet\": \"Audio book\", \"total\": 321559, \"group\": \"Audio book\"}, {\"facet\": \"Government publication\", \"total\": 226184, \"group\": \"Government publication\"}, {\"facet\": \"Thesis\", \"total\": 38121, \"group\": \"Thesis\"}, {\"facet\": \"Archived website\", \"total\": 33660, \"group\": \"Archived website\"}, {\"facet\": \"Data set\", \"total\": 27, \"group\": \"Data set\"}]}}, {\"mode\": \"vega-lite\"});\n",
       "</script>"
      ],
      "text/plain": [
       "alt.Chart(...)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a bar chart\n",
    "alt.Chart(facet_totals).mark_bar().encode(\n",
    "    x=\"total:Q\",\n",
    "    y=\"facet:N\",\n",
    "    color=\"group:N\",\n",
    "    tooltip=[\"facet:N\", alt.Tooltip(\"total:Q\", format=\",\")],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": [
     "nbval-skip"
    ]
   },
   "outputs": [],
   "source": [
    "facet_totals.to_csv(f\"data/facet-{params['facet']}.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once you've saved this file, you can download it from the workbench [data directory](data)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Going further\n",
    "\n",
    "For an in depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see [Visualise Trove newspaper searches over time](https://glam-workbench.github.io/trove-newspapers/#visualise-trove-newspaper-searches-over-time)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "\n",
    "Created by [Tim Sherrratt](https://timsherratt.org) for the [GLAM workbench](https://glam-workbench.net/). Support this project by [becoming a GitHub sponsor](https://github.com/sponsors/wragge?o=esb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}