{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DFAT Cable Finder\n", "\n", "**If you ever need to find a file in the National Archives of Australia that contains a specific numbered cable from the Department of Foreign Affairs, this is the tool for you!**\n", "\n", "Just give it a cable number and it will look in the series listed below for a file that might contain the cable. For each possible match it returns a link to the file as well as a bit of information about it.\n", "\n", "This tool works because many of the files in these series include the first and last numbered cable in the file title. So all it does is look at the numbers in each file title to see if the cable you're interested in falls somewhere between them. It's simple, but it's not something you can do in RecordSearch.\n", "\n", "It's far from perfect because the file titles are not always constructed consistently, but it's quicker than looking through all the file titles manually.\n", "\n", "Series searched:\n", "\n", "* [A11785](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A11785) – Top Secret original and spares inward cables, annual single number series (1948-1972)\n", "* [A11786](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A11786) – Top Secret original and spares outward cables, single number series (1948-1972)\n", "* [A3195](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A3195) – Master sheets (used stencils) of inwards cables, annual single number series (1939-1949)\n", "* [A3196](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A3196) – Master sheets (used stencils) of outwards cables, annual single number series (1939-1949)\n", "* [A6364](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A6364) – Printed copies of inward cables with I (Inward) prefix filed in binders alphabetically by post (1950-1974)\n", "* [A6366](http://www.naa.gov.au/cgi-bin/Search?O=S&Number=A6366) – Printed copies of outward cables with O (Outward) prefix filed in binders 
alphabetically by post (1950-1974)\n", "\n", "Let me know if you'd like additional series added. If you want to refresh the series data from RecordSearch, just delete the `cables_data.json` file before running a search. The tool will then reharvest all the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import json\n", "import re\n", "from copy import deepcopy\n", "\n", "import ipywidgets as widgets\n", "from IPython.display import HTML, display\n", "from recordsearch_data_scraper.scrapers import RSItemSearch" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "series = [\"A11785\", \"A11786\", \"A3195\", \"A3196\", \"A6364\", \"A6366\"]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Find files containing this numbered cable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ab395e28819e46e483b4460703c15e10", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value='', description='Cable:', placeholder='enter cable number'), HTML(value='

Filte…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def get_total_files(series):\n", "    \"\"\"\n", "    Get the number of files in a series.\n", "    \"\"\"\n", "    results = RSItemSearch(sort=5, digitised=False, series=series)\n", "    return int(results.total_results)\n", "\n", "\n", "def get_files(series):\n", "    \"\"\"\n", "    Harvest file details from a series in RecordSearch\n", "    \"\"\"\n", "    all_results = []\n", "    item_search = RSItemSearch(series=series, sort=5)\n", "    more = True\n", "    while more:\n", "        results = item_search.get_results()\n", "        all_results += results\n", "        if not results:\n", "            more = False\n", "    return all_results\n", "\n", "\n", "def refresh_data():\n", "    \"\"\"\n", "    Harvest data from the listed series and save the results in a json file.\n", "    \"\"\"\n", "    results = []\n", "    for s in series:\n", "        results += get_files(s)\n", "    with open(\"cables_data.json\", \"w\") as json_file:\n", "        json.dump(results, json_file)\n", "    return results\n", "\n", "\n", "def load_data():\n", "    \"\"\"\n", "    Try to load preharvested data.\n", "    If the data file doesn't exist, harvest it.\n", "    \"\"\"\n", "    try:\n", "        with open(\"cables_data.json\", \"r\") as json_file:\n", "            results = json.load(json_file)\n", "    except (FileNotFoundError, json.JSONDecodeError):\n", "        results = refresh_data()\n", "    return results\n", "\n", "\n", "def check_year(r, year):\n", "    \"\"\"\n", "    Check whether a year falls within the contents date range of a file.\n", "    \"\"\"\n", "    keep = False\n", "    try:\n", "        start = int(r[\"contents_dates\"][\"start_date\"][:4])\n", "        end = int(r[\"contents_dates\"][\"end_date\"][:4])\n", "    except (TypeError, KeyError, ValueError):\n", "        pass\n", "    else:\n", "        if int(year) >= start and int(year) <= end:\n", "            keep = True\n", "    return keep\n", "\n", "\n", "def find_cable(cable, series=None, year=None):\n", "    display_results.clear_output()\n", "    # Load preharvested data\n", "    results = load_data()\n", "    try:\n", "        cable_num = int(re.search(r\"[OI0]{0,1}\\.{0,1}\\s*?(\\d+)\", cable).group(1))\n", "    except AttributeError:\n", "        print(\"Not a 
number\")\n", "        # Stop here: without a cable number there is nothing to match against\n", "        return\n", "    filtered_results = deepcopy(results)\n", "    if series:\n", "        filtered_results = [r for r in filtered_results if r[\"series\"] == series]\n", "    if year:\n", "        filtered_results = [r for r in filtered_results if check_year(r, year) is True]\n", "    for result in filtered_results:\n", "        # Start conservatively, looking for O or I in front of numbers\n", "        cables = re.findall(r\"[OI]{1}\\.{0,1}\\s*?(\\d+)\", result[\"title\"])\n", "        if len(cables) == 0:\n", "            # If that didn't work find all numbers\n", "            cables = re.findall(r\"\\d+\", result[\"title\"])\n", "        if len(cables) > 2:\n", "            # If there are too many numbers, exclude ones that look like years\n", "            cables = [c for c in cables if not re.search(r\"^19[1-9]{1}\\d{1}$\", c)]\n", "        # Just right\n", "        # print(cables)\n", "        if len(cables) == 2:\n", "            if cable_num >= int(cables[0]) and cable_num <= int(cables[1]):\n", "                # Display the details of each candidate\n", "                html = '

NAA: {}, {}'.format(\n", "                    result[\"identifier\"], result[\"series\"], result[\"control_symbol\"]\n", "                )\n", "                html += \"
{}\".format(result[\"title\"])\n", "                html += \"
{}\".format(result[\"contents_dates\"][\"date_str\"])\n", "                if result[\"digitised_status\"] is True:\n", "                    html += \"
Digitised: {} pages\".format(result[\"digitised_pages\"])\n", "                html += \"

\"\n", "                with display_results:\n", "                    display(HTML(html))\n", "\n", "\n", "def run_query(b):\n", "    find_cable(cable.value, series=series_select.value, year=year.value)\n", "\n", "\n", "# All the widgety things\n", "# Put \"All\" in front of the series list so that every series stays selectable\n", "series_options = [(\"All\", None)] + [(s, s) for s in series]\n", "series_select = widgets.Dropdown(options=series_options, description=\"Series:\")\n", "year = widgets.Text(\n", "    value=None, placeholder=\"filter by year, eg 1940\", description=\"Year:\"\n", ")\n", "cable = widgets.Text(value=None, placeholder=\"enter cable number\", description=\"Cable:\")\n", "display_results = widgets.Output(layout=widgets.Layout(margin=\"40px 0 0 0\"))\n", "button = widgets.Button(\n", "    description=\"Find files!\",\n", "    button_style=\"primary\",\n", "    layout=widgets.Layout(margin=\"20px 0 0 0\"),\n", ")\n", "button.on_click(run_query)\n", "display(HTML(\"

Find files containing this numbered cable

\"))\n", "display(\n", " widgets.VBox(\n", " [\n", " cable,\n", " widgets.HTML(\n", " \"

Filter by series and/or year to reduce the number of results

\"\n", " ),\n", " series_select,\n", " year,\n", " button,\n", " display_results,\n", " ]\n", " )\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "0b6365c927b34006a7d6aa480ca4e257": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "TextModel", "state": { "description": "Year:", "layout": "IPY_MODEL_9226780600654fa496fe1500d1476580", "placeholder": "filter by year, eg 1940", "style": "IPY_MODEL_2689bfb63ecf4a8795aba7b8516a646b" } }, "2689bfb63ecf4a8795aba7b8516a646b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "2e8c03149f0a4700ac049c2bf061cd99": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DropdownModel", "state": { "_options_labels": [ "All", "A11786", "A3195 ", "A3196", "A6364", "A6366" ], "description": "Series:", "index": 3, "layout": "IPY_MODEL_6574b274926a4fdfb84b82f8873b0d29", "style": "IPY_MODEL_6b48b5fae4ca4b3999b777d27878a33b" } }, "34b35a2556854a2382017cfa161f8464": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ButtonModel", "state": { "button_style": "primary", "description": "Find files!", "layout": "IPY_MODEL_718e8de84f7d47e890002ac97040dc06", "style": "IPY_MODEL_5f64a5f2162a4837b092fe16f5394f31" } }, "37d09fae37934b66b0a27c247887b228": { "model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": { "layout": "IPY_MODEL_3c93a8a5ce224b8bb400f9fba3fa8b52", "outputs": [ { 
"data": { "text/html": "

NAA: A3196, 1939/1
Negative master carbon sheets of outward cables - O. 1 of 1939 to O. 457 of 1939
1939 - 1939

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1940/1
Negative master carbon sheets of outward cables - O. 1 of 1940 to O. 499 of 1940
1940 - 1940

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1941/1
Negative master carbon sheets of outward cables - O. 1 of 1941 to O. 500 of 1941
1941 - 1941
Digitised: 506 pages

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1941/47
Negative master carbon sheets of outward cables - O.57 to O.13531 of 1941
1941 - 1941

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1942/1
Negative master carbon sheets of outward cables - O.001 of 1942 to O.500 of 1942
1942 - 1942
Digitised: 485 pages

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1943/1
Negative master carbon sheets of outward cables - O.1 of 1943 to O.500 of 1943
1943 - 1943
Digitised: 466 pages

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1944/1
Negative master carbon sheets of outward cables - O.1 of 1944 to O.500 of 1944
1944 - 1944
Digitised: 443 pages

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1946/1
Negative master carbon sheets of outward cables - O.0001 of 1946 to O.0500 of 1946
1946 - 1946

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "

NAA: A3196, 1947/1
Negative master carbon sheets of outward cables - O.1 of 1947 to O.500 of 1947
1947 - 1947

", "text/plain": "" }, "metadata": {}, "output_type": "display_data" } ] } }, "3c93a8a5ce224b8bb400f9fba3fa8b52": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "margin": "40px 0 0 0" } }, "418c325039b94bbab9b172a61ed440aa": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "5f64a5f2162a4837b092fe16f5394f31": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ButtonStyleModel", "state": {} }, "6574b274926a4fdfb84b82f8873b0d29": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "6b48b5fae4ca4b3999b777d27878a33b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "718e8de84f7d47e890002ac97040dc06": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "margin": "20px 0 0 0" } }, "8335d78057374213a6054c79fd67992b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_85aea1b49897476a830990307057d310", "style": "IPY_MODEL_418c325039b94bbab9b172a61ed440aa", "value": "

Filter by series and/or year to reduce the number of results

" } }, "85aea1b49897476a830990307057d310": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "9226780600654fa496fe1500d1476580": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "ab395e28819e46e483b4460703c15e10": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "VBoxModel", "state": { "children": [ "IPY_MODEL_ccf84fe0468c460db5ee582116dfb78e", "IPY_MODEL_8335d78057374213a6054c79fd67992b", "IPY_MODEL_2e8c03149f0a4700ac049c2bf061cd99", "IPY_MODEL_0b6365c927b34006a7d6aa480ca4e257", "IPY_MODEL_34b35a2556854a2382017cfa161f8464", "IPY_MODEL_37d09fae37934b66b0a27c247887b228" ], "layout": "IPY_MODEL_faebd922dc064e46878edf251c23049a" } }, "ccf84fe0468c460db5ee582116dfb78e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "TextModel", "state": { "description": "Cable:", "layout": "IPY_MODEL_fe31197f01ff4cf29efe880f96114c1a", "placeholder": "enter cable number", "style": "IPY_MODEL_f8937ad9bd9c45d08c8cac5ef334b412", "value": "I100" } }, "f8937ad9bd9c45d08c8cac5ef334b412": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "faebd922dc064e46878edf251c23049a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "fe31197f01ff4cf29efe880f96114c1a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }