{ "cells": [ { "cell_type": "markdown", "id": "6cf0a308-d390-4a60-b7a9-db48f24b9b73", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# QueryPic\n", "\n", "#### Visualise searches in Trove's newspapers and gazettes\n", "\n", "[View in GLAM Workbench](https://glam-workbench.net/trove-newspapers/) · [View code](https://github.com/GLAM-Workbench/trove-newspapers/blob/master/querypic.ipynb)\n", "\n", "What does it mean when your search in [Trove's digitised newspapers](https://trove.nla.gov.au/newspaper/) returns 3 million results? QueryPic helps you explore your search results by showing you how they change over time – aggregating the number of articles matching your query by day, month, or year.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 1, "id": "5b58d28e-06d9-4db4-9d57-28bb9b0e72de", "metadata": {}, "outputs": [], "source": [ "# This notebook is designed to run in Voila\n", "# If you can see the code, just select 'View > Open with Voila in new browser tab' from the menu." 
] }, { "cell_type": "code", "execution_count": 2, "id": "96b91880-9793-43b9-b6b7-6569e73de2e4", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "\n", "import os\n", "import re\n", "from calendar import monthrange\n", "from operator import itemgetter # used for sorting\n", "\n", "import altair as alt\n", "import arrow\n", "import ipywidgets as widgets\n", "import pandas as pd # makes manipulating the data easier\n", "import requests_cache\n", "from dotenv import load_dotenv\n", "from IPython.display import HTML, display\n", "from requests.adapters import HTTPAdapter\n", "from requests.packages.urllib3.util.retry import Retry\n", "from tqdm.auto import tqdm\n", "from trove_query_parser.parser import parse_query\n", "\n", "load_dotenv()\n", "\n", "# Make sure data directory exists\n", "os.makedirs(\"data\", exist_ok=True)\n", "\n", "# Create a session that will automatically retry on server errors\n", "s = requests_cache.CachedSession(\"querypic\", expire_after=60 * 60)\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n", "s.mount(\"http://\", HTTPAdapter(max_retries=retries))\n", "s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n", "\n", "# CONFIG SO THAT ALTAIR HREFS OPEN IN A NEW TAB\n", "\n", "\n", "def blank_href():\n", " return {\"usermeta\": {\"embedOptions\": {\"loader\": {\"target\": \"_blank\"}}}}\n", "\n", "\n", "# register the custom theme under a chosen name\n", "alt.themes.register(\"blank_href\", blank_href)\n", "\n", "# enable the newly registered theme\n", "alt.themes.enable(\"blank_href\")\n", "\n", "dfs = []\n", "queries = []\n", "unit = None\n", "shifted = False\n", "API_KEY = \"\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "8fdd456a-072d-4dc6-a1e2-96337790359b", "metadata": {}, "outputs": [], "source": [ "def get_results(params):\n", " \"\"\"\n", " Get JSON response data from the Trove API.\n", " Parameters:\n", " params\n", " Returns:\n", " JSON formatted response data from Trove 
API\n", " \"\"\"\n", " response = s.get(\n", " \"https://api.trove.nla.gov.au/v3/result\",\n", " params=params,\n", " headers={\"X-API-KEY\": API_KEY},\n", " timeout=30,\n", " )\n", " response.raise_for_status()\n", " # display(response.url) # This shows us the url that's sent to the API\n", " data = response.json()\n", " return data\n", "\n", "\n", "def get_year_facets(data, start, end):\n", " \"\"\"\n", " Loop through facets in Trove API response, saving terms and counts.\n", " Parameters:\n", " data - JSON formatted response data from Trove API\n", " Returns:\n", " A list of dictionaries containing: 'year', 'total_results'\n", " \"\"\"\n", " dates = {}\n", " try:\n", " for term in data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]:\n", " if int(term[\"search\"]) >= start and int(term[\"search\"]) <= end:\n", " dates[f'{term[\"search\"]}-01-01'] = int(term[\"count\"])\n", " except (TypeError, KeyError):\n", " pass\n", " return dates\n", "\n", "\n", "def get_month_facets(data, year, start, end):\n", " \"\"\"\n", " Loop through facets in Trove API response, saving terms and counts.\n", " Parameters:\n", " data - JSON formatted response data from Trove API\n", " \"\"\"\n", " dates = {}\n", " try:\n", " for term in data[\"category\"][0][\"facets\"][\"facet\"][0][\"term\"]:\n", " iso_date = f'{year}-{int(term[\"search\"])}-01'\n", " date = arrow.get(iso_date)\n", " if date >= arrow.get(start) and date <= arrow.get(end):\n", " dates[iso_date] = int(term[\"count\"])\n", " except (TypeError, KeyError):\n", " pass\n", " return dates\n", "\n", "\n", "def combine_totals(query_data, total_data, start, end, unit):\n", " \"\"\"\n", " Take facets data from the query search and a blank search (ie everything) for a decade and combine them.\n", " Parameters:\n", " query_data - list of dictionaries containing facets data from a query search\n", " total_data - list of dictionaries containing facets data from a blank search\n", " Returns:\n", " A list of dictionaries 
containing: 'year', 'total_results', 'total articles'\n", " \"\"\"\n", " totals = []\n", " # These are for cases where a full datetime is provided\n", " if unit == \"year\":\n", " start = f\"{start[:4]}-01-01\"\n", " elif unit == \"month\":\n", " start = f\"{start[:7]}-01\"\n", " start_date = arrow.get(start)\n", " if shifted and unit == \"day\":\n", " start_date = start_date.shift(days=+1)\n", " end_date = arrow.get(end)\n", " while start_date <= end_date:\n", " totals.append(\n", " {\n", " \"date\": start_date.format(\"YYYY-MM-DD\"),\n", " \"total_results\": query_data.get(start_date.format(\"YYYY-MM-DD\"), 0),\n", " \"total_articles\": total_data.get(start_date.format(\"YYYY-MM-DD\"), 0),\n", " }\n", " )\n", " if unit == \"year\":\n", " start_date = start_date.shift(years=+1)\n", " elif unit == \"month\":\n", " start_date = start_date.shift(months=+1)\n", " elif unit == \"day\":\n", " start_date = start_date.shift(days=+1)\n", " return totals\n", "\n", "\n", "def clean_params(params):\n", " \"\"\"\n", " Remove unwanted facets from query to get total articles.\n", " \"\"\"\n", " keep = [\n", " \"l-decade\",\n", " \"l-year\",\n", " \"l-month\",\n", " \"l-title\",\n", " \"l-state\",\n", " \"l-artType\",\n", " \"key\",\n", " \"encoding\",\n", " \"q\",\n", " \"n\",\n", " \"category\",\n", " \"facet\",\n", " ]\n", " params_c = params.copy()\n", " for k in list(params_c.keys()):\n", " if k not in keep:\n", " del params_c[k]\n", " return params_c\n", "\n", "\n", "def year_totals(params):\n", " \"\"\"\n", " Generate a dataset for a search query.\n", " Parameters:\n", " params: the API search parameters\n", " Returns:\n", " A list of dicts, each containing:\n", " - date\n", " - total_results\n", " - total_articles\n", " \"\"\"\n", " global unit\n", " query_dates = {}\n", " total_dates = {}\n", " params_c = params.copy()\n", " q = params_c[\"q\"]\n", " if choose_unit.value != \"auto\":\n", " unit = choose_unit.value\n", " start, end, _ = set_date_range(params_c)\n", " 
else:\n", " start, end, unit = set_date_range(params_c)\n", " start_year = int(start[:4])\n", " end_year = int(end[:4])\n", " with results:\n", " if unit == \"year\":\n", " start_decade = int(start[:3])\n", " end_decade = int(end[:3])\n", " for decade in tqdm(range(start_decade, end_decade + 1), leave=False):\n", " params_c[\"facet\"] = \"year\"\n", " params_c[\"q\"] = q\n", " params_c[\"l-decade\"] = decade\n", " query_data = get_results(params_c)\n", " params_cleaned = clean_params(params_c)\n", " params_cleaned[\"q\"] = \" \"\n", " total_data = get_results(params_cleaned)\n", " query_dates.update(get_year_facets(query_data, start_year, end_year))\n", " total_dates.update(get_year_facets(total_data, start_year, end_year))\n", " totals = combine_totals(query_dates, total_dates, start, end, unit)\n", " totals.sort(key=itemgetter(\"date\"))\n", " elif unit == \"month\":\n", " for year in tqdm(range(start_year, end_year + 1), leave=False):\n", " params_c[\"q\"] = q\n", " params_c[\"l-decade\"] = str(year)[:3]\n", " params_c[\"l-year\"] = year\n", " params_c[\"facet\"] = \"month\"\n", " query_data = get_results(params_c)\n", " params_cleaned = clean_params(params_c)\n", " params_cleaned[\"q\"] = \" \"\n", " total_data = get_results(params_cleaned)\n", " query_dates.update(get_month_facets(query_data, year, start, end))\n", " total_dates.update(get_month_facets(total_data, year, start, end))\n", " totals = combine_totals(query_dates, total_dates, start, end, unit)\n", " elif unit == \"day\":\n", " totals = []\n", " start_date = arrow.get(start)\n", " if shifted:\n", " start_date = start_date.shift(days=+1)\n", " end_date = arrow.get(end)\n", " with tqdm(total=(end_date - start_date).days + 1, leave=False) as pbar:\n", " while start_date <= end_date:\n", " q = re.sub(r\" date:\\[.+\\]\", \"\", q)\n", " from_date = start_date.shift(days=-1).format(\"YYYY-MM-DDT00:00:00\")\n", " to_date = start_date.format(\"YYYY-MM-DDT00:00:00\")\n", " q = q + f\" date:[{from_date}Z TO 
{to_date}Z]\"\n", " params_c[\"q\"] = q\n", " query_data = get_results(params_c)\n", " params_cleaned = clean_params(params_c)\n", " params_cleaned[\"q\"] = f\"date:[{from_date}Z TO {to_date}Z]\"\n", " total_data = get_results(params_cleaned)\n", " totals.append(\n", " {\n", " \"date\": to_date,\n", " \"total_results\": int(\n", " query_data[\"category\"][0][\"records\"][\"total\"]\n", " ),\n", " \"total_articles\": int(\n", " total_data[\"category\"][0][\"records\"][\"total\"]\n", " ),\n", " }\n", " )\n", " start_date = start_date.shift(days=+1)\n", " pbar.update(1)\n", " return totals\n", "\n", "\n", "def set_date_range(params):\n", " \"\"\"\n", " Determines the date range from the query parameters,\n", " then uses the date range to set the time unit.\n", " Returns:\n", " - start: start date (ISO format)\n", " - end: end date (ISO format)\n", " - unit: one of 'year', 'month', or 'day'\n", " \"\"\"\n", " global shifted\n", " shifted = False\n", " if \"l-month\" in params:\n", " start = f'{params[\"l-year\"][0]}-{params[\"l-month\"][0]}-01'\n", " end = f'{params[\"l-year\"][0]}-{params[\"l-month\"][0]}-{monthrange(int(params[\"l-year\"][0]), int(params[\"l-month\"][0]))[1]}'\n", " unit = \"day\"\n", " elif \"l-year\" in params:\n", " start = params[\"l-year\"][0] + \"-01-01\"\n", " end = params[\"l-year\"][0] + \"-12-31\"\n", " unit = \"month\"\n", " elif \"l-decade\" in params:\n", " start = params[\"l-decade\"][0] + \"0-01-01\"\n", " end = params[\"l-decade\"][0] + \"9-12-31\"\n", " unit = \"month\"\n", " elif \"date:\" in params[\"q\"]:\n", " date_range = re.search(r\"date:\\[(.+)\\]\", params[\"q\"]).group(1)\n", " start_date, _, end_date = date_range.split()\n", " if len(start_date) > 4:\n", " shifted = True\n", " diff = arrow.get(end_date) - arrow.get(start_date)\n", " days = diff.days\n", " # More than 20 years\n", " if days > 2 * 3653:\n", " start = start_date[:10]\n", " end = end_date[:10]\n", " unit = \"year\"\n", " # A single year\n", " elif days ==
0:\n", " start = start_date[:10]\n", " end = (\n", " arrow.get(end_date[:10])\n", " .shift(years=+1)\n", " .shift(days=-1)\n", " .format(\"YYYY-MM-DD\")\n", " )\n", " unit = \"month\"\n", " elif days < 94:\n", " start = start_date[:10]\n", " end = end_date[:10]\n", " unit = \"day\"\n", " else:\n", " start = start_date[:10]\n", " end = end_date[:10]\n", " unit = \"month\"\n", " else:\n", " start = \"1803-01-01\"\n", " end = arrow.now().format(\"YYYY-01-01\")\n", " unit = \"year\"\n", " return start, end, unit\n", "\n", "\n", "def show_results(view=\"raw\"):\n", " \"\"\"\n", " Display the chart and the save data options.\n", " \"\"\"\n", " results.clear_output(wait=True)\n", " save_data.clear_output(wait=True)\n", " chart = make_chart(view=view)\n", " chart_type.unobserve(change_chart, \"value\")\n", " chart_type.value = \"raw\"\n", " chart_type.observe(change_chart, \"value\")\n", " csv_file = save_as_csv()\n", " with results:\n", " display(chart_type)\n", " display(chart)\n", " with save_data:\n", " display(\n", " widgets.HBox(\n", " [save_chart_button, save_chart_width, save_chart_height],\n", " layout=widgets.Layout(margin=\"50px 0 50px 0\"),\n", " ),\n", " # ,\n", " )\n", " display(HTML(f'Download data: {csv_file}'))\n", "\n", "\n", "def make_chart(view, width=800, height=400):\n", " \"\"\"\n", " Create the chart.\n", " Parameters:\n", " - view: either 'raw' or 'relative'\n", " - width: in pixels\n", " - height: in pixels\n", " \"\"\"\n", " # Combine dfs into a single df\n", " df = pd.concat(dfs, ignore_index=True)\n", " # Define shared tooltips\n", " tooltip = [\n", " alt.Tooltip(\"id\", title=\"query\"),\n", " alt.Tooltip(\"total_results:Q\", title=\"results\", format=\",\"),\n", " alt.Tooltip(\"PercentOfTotal:Q\", title=\"proportion\", format=\".2%\"),\n", " ]\n", " # Configure x & tooltips based on time unit\n", " if unit == \"year\":\n", " x = alt.X(\n", " \"year(date):T\",\n", " axis=alt.Axis(title=\"Year\", labelAngle=45),\n", " 
scale=alt.Scale(padding=10),\n", " )\n", " tooltip.insert(1, alt.Tooltip(\"year(date):T\", title=\"year\"))\n", " elif unit == \"month\":\n", " x = alt.X(\n", " \"yearmonth(date):T\",\n", " axis=alt.Axis(title=\"Month\"),\n", " scale=alt.Scale(padding=10),\n", " )\n", " tooltip.insert(1, alt.Tooltip(\"yearmonth(date):T\", title=\"month\"))\n", " elif unit == \"day\":\n", " x = alt.X(\n", " \"date:T\",\n", " axis=alt.Axis(title=\"Date\", format=\"%e %b %Y\"),\n", " scale=alt.Scale(padding=10),\n", " )\n", " tooltip.insert(1, alt.Tooltip(\"date:T\", title=\"date\", format=\"%A, %e %b %Y\"))\n", " # Configure y based on chart view type\n", " if view == \"raw\":\n", " y = alt.Y(\n", " \"total_results:Q\", axis=alt.Axis(format=\",d\", title=\"Number of articles\")\n", " )\n", " elif view == \"relative\":\n", " y = alt.Y(\n", " \"PercentOfTotal:Q\",\n", " axis=alt.Axis(format=\".2%\", title=\"Percentage of total articles\"),\n", " )\n", " # Create chart\n", " plot = (\n", " alt.Chart(df)\n", " .mark_line(point=True, interpolate=\"cardinal\")\n", " .encode(\n", " x=x,\n", " y=y,\n", " tooltip=tooltip,\n", " color=alt.Color(\"id\", legend=alt.Legend(title=\"\")),\n", " href=\"url:N\",\n", " )\n", " .properties(\n", " width=width,\n", " height=height,\n", " title={\n", " \"text\": \"Trove Newspapers & Gazettes Search\",\n", " \"subtitle\": f'Created by QueryPic: {arrow.now().format(\"D MMMM YYYY\")}',\n", " },\n", " )\n", " .transform_calculate(\n", " PercentOfTotal=\"datum.total_results / datum.total_articles\"\n", " )\n", " )\n", " # Create text chart listing queries\n", " query_list = list_queries()\n", " # Combine charts\n", " chart = (\n", " alt.vconcat(plot, query_list)\n", " .configure(padding=20)\n", " .configure_view(strokeWidth=0)\n", " .configure_title(fontSize=14)\n", " )\n", " return chart\n", "\n", "\n", "def list_queries():\n", " \"\"\"\n", " Creates a text-based chart that lists the saved queries.\n", " \"\"\"\n", " df = pd.DataFrame(queries)\n", " chart =
(\n", " alt.Chart(df)\n", " .mark_text(align=\"left\", dx=2, dy=1, baseline=\"middle\")\n", " .encode(\n", " x=alt.X(\"x:Q\", title=None, axis=None, scale=alt.Scale(range=[0, 1])),\n", " y=alt.Y(\n", " \"id:O\",\n", " title=None,\n", " axis=alt.Axis(labelFontWeight=\"bold\", domain=False, grid=False),\n", " ),\n", " text=\"url:N\",\n", " href=\"url\",\n", " color=alt.value(\"blue\"),\n", " )\n", " )\n", " return chart\n", "\n", "\n", "def clear_all(b):\n", " \"\"\"\n", " Clear all queries and results.\n", " \"\"\"\n", " global dfs, queries\n", " dfs = []\n", " queries = []\n", " query.value = \"\"\n", " results.clear_output()\n", " save_data.clear_output()\n", "\n", "\n", "def clear_last(b):\n", " \"\"\"\n", " Remove the most recent query from the chart.\n", " \"\"\"\n", " global dfs, queries\n", " results.clear_output()\n", " save_data.clear_output()\n", " dfs.pop()\n", " queries.pop()\n", " if dfs:\n", " show_results()\n", "\n", "\n", "def save_chart(b):\n", " \"\"\"\n", " Save the chart as HTML for download.\n", " \"\"\"\n", " width = save_chart_width.value\n", " height = save_chart_height.value\n", " if chart_type.value == \"proportion\":\n", " chart = make_chart(\"relative\", width, height)\n", " else:\n", " chart = make_chart(\"raw\", width, height)\n", " filename = f'data/querypic-{arrow.now().format(\"YYYYMMDDHHmmss\")}.html'\n", " chart.save(filename)\n", " with save_data:\n", " display(\n", " HTML(f\"Download HTML version: {filename}\")\n", " )\n", " # display(widgets.HBox([save_chart_button, save_chart_width, save_chart_height], layout=widgets.Layout(margin='50px 0 50px 0')))\n", "\n", "\n", "def save_as_csv():\n", " \"\"\"\n", " Save harvested data as a CSV for download.\n", " \"\"\"\n", " df = pd.concat(dfs, ignore_index=True)\n", " filename = f'data/querypic-{arrow.now().format(\"YYYYMMDDHHmmss\")}.csv'\n", " df.to_csv(filename, index=False)\n", " return filename\n", "\n", "\n", "def change_chart(o):\n", " \"\"\"\n", " Switch between chart views.\n", 
" \"\"\"\n", " results.clear_output()\n", " if chart_type.value == \"proportion\":\n", " view = \"relative\"\n", " else:\n", " view = \"raw\"\n", " chart = make_chart(view)\n", " # chart_type.value = view\n", " with results:\n", " display(chart_type)\n", " display(chart)\n", "\n", "\n", "def add_date_query(date):\n", " date_from = arrow.get(date).shift(days=-1).format(\"YYYY-MM-DD\")\n", " date_query = f\"date:[{date_from}T00:00:00Z TO {date}T00:00:00Z]\"\n", " url = re.sub(r\"\\s*date:\\[.+\\]\", \"\", query.value)\n", " url = re.sub(r\"(keyword=[^&]+)\", r\"\\1 \" + date_query, url)\n", " return url\n", "\n", "\n", "def add_urls_to_df(df):\n", " url = re.sub(r\"\\s*date:\\[.+\\]\", \"\", query.value)\n", " if unit == \"year\":\n", " df[\"url\"] = df[\"date\"].apply(lambda x: f\"{url}&l-decade={x[:3]}&l-year={x[:4]}\")\n", " elif unit == \"month\":\n", " # lstrip (not strip) so that month '10' is not reduced to '1'\n", " df[\"url\"] = df[\"date\"].apply(\n", " lambda x: f'{url}&l-decade={x[:3]}&l-year={x[:4]}&l-month={x[5:7].lstrip(\"0\")}'\n", " )\n", " elif unit == \"day\":\n", " df[\"url\"] = df[\"date\"].apply(lambda x: add_date_query(x))\n", " return df\n", "\n", "\n", "def get_data(b):\n", " \"\"\"\n", " Assemble the data and prepare it for display.\n", " \"\"\"\n", " global dfs, queries, API_KEY\n", " # Add current query to queries list\n", " queries.append(\n", " {\n", " \"x\": 0,\n", " \"y\": len(queries),\n", " \"id\": f\"Query {len(queries) + 1}\",\n", " \"url\": query.value,\n", " \"params\": query.value.split(\"?\")[1],\n", " }\n", " )\n", " # Extract params from query\n", " params = parse_query(query.value, 3)\n", " # Add extra params for API\n", " API_KEY = api_key.value\n", " params[\"encoding\"] = \"json\"\n", " params[\"n\"] = 1\n", " # Limit to newspapers if no specific category set\n", " if \",\" in params[\"category\"]:\n", " params[\"category\"] = \"newspaper\"\n", " # Get the data\n", " totals = year_totals(params)\n", " # Convert to dataframe\n", " df = pd.DataFrame(totals)\n", " # Add urls to the data
rows\n", " df = add_urls_to_df(df)\n", " # Add a query id to the dataframe\n", " df[\"id\"] = f\"Query {len(queries)}\"\n", " # Add current df to list of dfs\n", " dfs.append(df)\n", " # Display the results\n", " show_results()\n", "\n", "\n", "# CREATE WIDGETS\n", "\n", "results = widgets.Output()\n", "save_data = widgets.Output()\n", "\n", "chart_type = widgets.Dropdown(\n", " options=[\n", " (\"Raw number of results\", \"raw\"),\n", " (\"Proportion of total articles\", \"proportion\"),\n", " ],\n", " value=\"raw\",\n", ")\n", "\n", "chart_type.observe(change_chart, \"value\")\n", "\n", "api_key = widgets.Password(\n", " placeholder=\"Enter your Trove API key\",\n", " description=\"API key:\",\n", " disabled=False,\n", " value=\"\",\n", ")\n", "\n", "query = widgets.Text(\n", " placeholder=\"Enter your search query\",\n", " description=\"Query:\",\n", " disabled=False,\n", " value=\"\",\n", " layout=widgets.Layout(width=\"80%\"),\n", ")\n", "\n", "choose_unit = widgets.Dropdown(\n", " options=[\n", " (\"Automatic\", \"auto\"),\n", " (\"Year\", \"year\"),\n", " (\"Month\", \"month\"),\n", " (\"Day\", \"day\"),\n", " ],\n", " value=\"auto\",\n", " description=\"Time unit:\",\n", ")\n", "\n", "clear_last_button = widgets.Button(\n", " description=\"Remove last query\",\n", " disabled=False,\n", " button_style=\"\", # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip=\"Remove the last query\",\n", " icon=\"\",\n", ")\n", "\n", "clear_all_button = widgets.Button(\n", " description=\"Clear all queries\",\n", " disabled=False,\n", " button_style=\"\", # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip=\"Clear current queries\",\n", " icon=\"\",\n", ")\n", "\n", "get_data_button = widgets.Button(\n", " description=\"Visualise query\",\n", " disabled=False,\n", " button_style=\"primary\", # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip=\"Create chart from query\",\n", " icon=\"\",\n", ")\n", "\n", "save_chart_button = widgets.Button(\n", "
description=\"Save chart as HTML\",\n", " disabled=False,\n", " button_style=\"primary\", # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip=\"Save chart as HTML\",\n", " icon=\"\",\n", ")\n", "\n", "save_chart_width = widgets.BoundedIntText(\n", " value=700, min=700, max=2000, step=100, description=\"Width\", disabled=False\n", ")\n", "\n", "save_chart_height = widgets.BoundedIntText(\n", " value=400, min=400, max=1500, step=100, description=\"Height\", disabled=False\n", ")\n", "\n", "clear_all_button.on_click(clear_all)\n", "clear_last_button.on_click(clear_last)\n", "get_data_button.on_click(get_data)\n", "save_chart_button.on_click(save_chart)" ] }, { "cell_type": "markdown", "id": "92122b41-ac09-413a-820d-b627edb0eee8", "metadata": {}, "source": [ "## 1. Enter your Trove API key\n", "\n", "Get your own [Trove API key](https://trove.nla.gov.au/about/create-something/using-api) and enter it below." ] }, { "cell_type": "code", "execution_count": 4, "id": "eeb4138a-c8f9-4c02-b03d-39270dd1c240", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6222a6ea23e94f908e9ff0f261f782e7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Password(description='API key:', placeholder='Enter your Trove API key')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(api_key)" ] }, { "cell_type": "markdown", "id": "89237347-a145-4f00-9ed4-14ed4755ef94", "metadata": {}, "source": [ "## 2. Enter your search url\n", "\n", "Construct your search using the [Trove web interface](https://trove.nla.gov.au/newspaper/), then just copy and paste the url into the box below." 
] }, { "cell_type": "code", "execution_count": 5, "id": "3b76984e-c761-4938-8756-35a08d37bfb1", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3972b18127e2440d83f6fe681ea2aba3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='', description='Query:', layout=Layout(width='80%'), placeholder='Enter your search query')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(query)" ] }, { "cell_type": "markdown", "id": "a1c2da63-d5fa-4cea-9210-7ee5aecf3c3d", "metadata": {}, "source": [ "## 3. Select the scale (optional)\n", "\n", "QueryPic aggregates search results by time units – either 'year', 'month', or 'day'. If you choose 'Automatic' in the list below, QueryPic will choose a unit based on the date range of your query, trying to balance resolution and efficiency:\n", "\n", "* no date limits, or a date range of more than 20 years: 'year'\n", "* a decade or year facet, or a date range from 94 days up to 20 years: 'month'\n", "* a month facet, or a date range of less than 94 days: 'day'\n", "\n", "If you're not happy with these results you can select your own time unit." ] }, { "cell_type": "code", "execution_count": 6, "id": "052e5136-1381-4981-b1b4-054a3484a233", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a127d622a9244b8098399950c70f370a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Time unit:', options=(('Automatic', 'auto'), ('Year', 'year'), ('Month', 'month'), ('Day…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(choose_unit)" ] }, { "cell_type": "markdown", "id": "6977f588-5f28-4c22-b716-16d765ca3931", "metadata": {}, "source": [ "## 4. Create your chart\n", "\n", "You can add as many queries as you want to a single chart."
] }, { "cell_type": "code", "execution_count": 7, "id": "77c6095d-4ac4-4826-8921-e50c1476833a", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "37f3490591a743cf998ad54234931260", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(Button(button_style='primary', description='Visualise query', style=ButtonStyle(…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(\n", " widgets.VBox(\n", " [\n", " widgets.HBox(\n", " [get_data_button, clear_last_button, clear_all_button],\n", " layout=widgets.Layout(margin=\"0 0 20px 0\"),\n", " ),\n", " results,\n", " save_data,\n", " ]\n", " )\n", ")" ] }, { "cell_type": "markdown", "id": "3db7579a-6a6c-4a7a-9dc3-5d06faea3b78", "metadata": {}, "source": [ "## 5. Questions?\n", "\n", "**But what are you actually searching?** For more ways of analysing Trove's digitised newspaper corpus, see the [Trove Newspapers in Context](https://glam-workbench.net/trove-newspapers/#trove-newspapers-in-context) section of the GLAM Workbench.\n", "\n", "**How does it work?** For examples of how to use Trove's facets to construct high-level visualisations like these, see [Visualise Trove newspaper searches over time](https://glam-workbench.net/trove-newspapers/#visualise-trove-newspaper-searches-over-time).\n", "\n", "**Any problems?** Feel free to ask questions in the [GLAM Workbench section](https://ozglam.chat/c/glam-workbench/8) of OzGLAM Help."
] }, { "cell_type": "code", "execution_count": 8, "id": "b61f3132-4cde-44e7-aca4-cae96816f2ee", "metadata": {}, "outputs": [], "source": [ "# TESTING\n", "if os.getenv(\"GW_STATUS\") == \"dev\" and os.getenv(\"TROVE_API_KEY\"):\n", " api_key.value = os.getenv(\"TROVE_API_KEY\")\n", " query.value = \"https://trove.nla.gov.au/search/category/newspapers?keyword=cat\"\n", " get_data_button.click()" ] }, { "cell_type": "markdown", "id": "a101f6ba-2c21-4cfb-ba43-c47395a8b630", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb).\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "rocrate": { "author": [ { "mainEntityOfPage": "https://timsherratt.au", "name": "Sherratt, Tim", "orcid": "https://orcid.org/0000-0001-7956-4498" } ], "category": "Visualising searches", "description": "QueryPic helps you explore your newspapers search results by showing you how they change over time – aggregating the number of articles matching your query by day, month, or year.", "mainEntityOfPage": "https://glam-workbench.net/trove-newspapers/querypic/", "name": "QueryPic", "position": 1, "softwareRequirements": "Voila" } }, "nbformat": 4, "nbformat_minor": 5 }