{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring facets\n", "\n", "

New to Jupyter notebooks? Try Using Jupyter notebooks for a quick introduction.

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Facets aggregate collection data in interesting and useful ways, allowing us to build pictures of the collection. This notebook shows you how to get facet data from Trove." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import altair as alt\n", "import pandas as pd\n", "import requests\n", "\n", "# Make sure data directory exists\n", "os.makedirs(\"data\", exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Load variables from the .env file if it exists\n", "# Use %%capture to suppress messages\n", "%load_ext dotenv\n", "%dotenv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Insert your API key between the quotes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your API key is: gq29l1g1h75pimh4\n" ] } ], "source": [ "# This creates a variable called 'API_KEY'. Paste your key between the quotes.\n", "API_KEY = \"\"\n", "\n", "# Use an API key value from environment variables if it is available (useful for testing)\n", "if os.getenv(\"TROVE_API_KEY\"):\n", "    API_KEY = os.getenv(\"TROVE_API_KEY\")\n", "\n", "# This displays a message with your key\n", "print(\"Your API key is: {}\".format(API_KEY))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "api_search_url = \"https://api.trove.nla.gov.au/v3/result\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up our query parameters. We want everything, so we set the `q` parameter to be a single space. The `facet` parameter asks for counts by `format` within the `book` category. Setting `n` to 1 keeps the response small, since we only need the facet counts." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "params = {\n", "    \"q\": \" \",  # A space to search for everything\n", "    \"facet\": \"format\",\n", "    \"category\": \"book\",\n", "    \"encoding\": \"json\",\n", "    \"n\": 1,\n", "}\n", "\n", "headers = {\"X-API-KEY\": API_KEY}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Pass the API key in the request headers\n", "response = requests.get(api_search_url, params=params, headers=headers)\n", "data = response.json()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>facet</th><th>total</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>21</th><td>Archived website</td><td>33660</td></tr>\n",
"<tr><th>4</th><td>Article</td><td>7377170</td></tr>\n",
"<tr><th>5</th><td>Article/Abstract</td><td>99</td></tr>\n",
"<tr><th>6</th><td>Article/Book chapter</td><td>67276</td></tr>\n",
"<tr><th>7</th><td>Article/Conference paper</td><td>112605</td></tr>\n",
"<tr><th>8</th><td>Article/Journal or magazine article</td><td>1971332</td></tr>\n",
"<tr><th>9</th><td>Article/Other article</td><td>4770227</td></tr>\n",
"<tr><th>10</th><td>Article/Report</td><td>466581</td></tr>\n",
"<tr><th>11</th><td>Article/Review</td><td>285937</td></tr>\n",
"<tr><th>12</th><td>Article/Working paper</td><td>73468</td></tr>\n",
"<tr><th>18</th><td>Audio book</td><td>321559</td></tr>\n",
"<tr><th>0</th><td>Book</td><td>17061706</td></tr>\n",
"<tr><th>1</th><td>Book/Braille</td><td>36613</td></tr>\n",
"<tr><th>2</th><td>Book/Illustrated</td><td>7922202</td></tr>\n",
"<tr><th>3</th><td>Book/Large print</td><td>119801</td></tr>\n",
"<tr><th>17</th><td>Conference Proceedings</td><td>483440</td></tr>\n",
"<tr><th>22</th><td>Data set</td><td>27</td></tr>\n",
"<tr><th>19</th><td>Government publication</td><td>226184</td></tr>\n",
"<tr><th>16</th><td>Microform</td><td>946703</td></tr>\n",
"<tr><th>13</th><td>Periodical</td><td>2113846</td></tr>\n",
"<tr><th>14</th><td>Periodical/Journal, magazine, other</td><td>2028483</td></tr>\n",
"<tr><th>15</th><td>Periodical/Newspaper</td><td>87122</td></tr>\n",
"<tr><th>20</th><td>Thesis</td><td>38121</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " facet total\n", "21 Archived website 33660\n", "4 Article 7377170\n", "5 Article/Abstract 99\n", "6 Article/Book chapter 67276\n", "7 Article/Conference paper 112605\n", "8 Article/Journal or magazine article 1971332\n", "9 Article/Other article 4770227\n", "10 Article/Report 466581\n", "11 Article/Review 285937\n", "12 Article/Working paper 73468\n", "18 Audio book 321559\n", "0 Book 17061706\n", "1 Book/Braille 36613\n", "2 Book/Illustrated 7922202\n", "3 Book/Large print 119801\n", "17 Conference Proceedings 483440\n", "22 Data set 27\n", "19 Government publication 226184\n", "16 Microform 946703\n", "13 Periodical 2113846\n", "14 Periodical/Journal, magazine, other 2028483\n", "15 Periodical/Newspaper 87122\n", "20 Thesis 38121" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def get_facet_totals(data):\n", "    \"\"\"\n", "    Loop through facets saving terms and counts.\n", "    Returns a DataFrame with one row per term or sub-term.\n", "    \"\"\"\n", "    facets = []\n", "    try:\n", "        terms = data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]\n", "    except KeyError:\n", "        pass\n", "    else:\n", "        for term in terms:\n", "            facets.append({\"facet\": term[\"search\"], \"total\": int(term[\"count\"])})\n", "            if \"term\" in term:\n", "                # There be sub-terms!\n", "                for subterm in term[\"term\"]:\n", "                    facets.append(\n", "                        {\"facet\": subterm[\"search\"], \"total\": int(subterm[\"count\"])}\n", "                    )\n", "    return pd.DataFrame(facets)\n", "\n", "\n", "# Use a different name for the function so re-running this cell doesn't overwrite it\n", "facet_totals = get_facet_totals(data)\n", "facet_totals.sort_values(\"facet\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Assign a group by splitting on the slash, so sub-terms share a group with their parent term\n", "facet_totals[\"group\"] = facet_totals[\"facet\"].apply(lambda x: x.split(\"/\")[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can create a bar chart using Altair. The `x` values will be the totals, and the `y` values will be the facet names. Bars are coloured by group." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Comment out either or both of these lines if not necessary\n", "# Sort by total (highest to lowest) and take the top twenty\n", "# top_facets = facet_totals.sort_values(by=\"total\", ascending=False)[:20]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a bar chart\n", "alt.Chart(facet_totals).mark_bar().encode(\n", "    x=\"total:Q\",\n", "    y=\"facet:N\",\n", "    color=\"group:N\",\n", "    tooltip=[\"facet:N\", alt.Tooltip(\"total:Q\", format=\",\")],\n", ")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "nbval-skip" ] }, "outputs": [], "source": [ "facet_totals.to_csv(f\"data/facet-{params['facet']}.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you've saved this file, you can download it from the workbench [data directory](data)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going further\n", "\n", "For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see [Visualise Trove newspaper searches over time](https://glam-workbench.github.io/trove-newspapers/#visualise-trove-newspaper-searches-over-time)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by [becoming a GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }