{ "cells": [ { "cell_type": "markdown", "id": "119ccc36", "metadata": {}, "source": [ "# RecordSearch\n", "\n", "Current version: [v1.1.1](https://github.com/GLAM-Workbench/recordsearch/releases/tag/v1.1.1)\n", "\n", "This repository contains Jupyter notebooks for working with data from the National Archives of Australia's RecordSearch database.\n", "\n", "[RecordSearch](https://recordsearch.naa.gov.au/) is the online collection database of the National Archives of Australia. Based on the [series system](https://www.naa.gov.au/help-your-research/getting-started/commonwealth-record-series-crs-system), RecordSearch provides rich, contextual information about series, items, agencies, and functions.\n", "\n", "Unfortunately, RecordSearch doesn't provide access to machine-readable data through an API, so we have to resort to screen scraping. The notebooks here make use of the [RecordSearch Data Scraper](https://wragge.github.io/recordsearch_data_scraper/).\n", "\n", "See the [RecordSearch section](https://glam-workbench.net/recordsearch/) of the GLAM Workbench for more details.\n", "\n", "## Notebook topics\n", "\n", "### Harvesting data\n", "\n", "* [**Harvest items from a search in RecordSearch**](harvesting_items_from_a_search.ipynb) – save the results of an item search in RecordSearch as a downloadable dataset; you can also save images and PDFs from digitised files\n", "* [**Harvest files with the access status of 'closed'**](harvest_closed_files.ipynb) – find out what we're not allowed to see by harvesting details of 'closed' files\n", "* [**Harvest recently digitised files from RecordSearch**](harvest_recently_digitised_files.ipynb) – save details of files digitised in the past month\n", "* [**Harvest details of all series in RecordSearch**](harvest_series_data.ipynb) – get details of all series registered in RecordSearch; also generates a summary dataset with the total number of items digitised, described, and in each access category\n", "* [**Harvesting functions from 
the RecordSearch interface**](harvesting_functions_from_recordsearch.ipynb) – extract information from the RecordSearch interface about the hierarchy of functions it uses to describe the work of government agencies\n", "* [**Harvest agencies associated with *all* functions**](get_all_agencies_by_function.ipynb) – loops through the list of functions, saving details of the agencies associated with each\n", "\n", "### Analysing data\n", "\n", "* [**Exploring harvested series data, 2021**](series_harvest_basic_stats.ipynb) – generates some basic statistics from the harvest of series data\n", "* [**Exploring harvested series data, 2022**](series_harvest_basic_stats_2022.ipynb) – generates some basic statistics from the harvest of series data in 2022 and compares the results to the previous year\n", "* [**Summary of records digitised in the previous week**](recently_digitised_update.ipynb) – run this notebook to analyse the most recent dataset of recently digitised files, summarising the results by series\n", "* [**How many of the functions are actually used?**](how_many_functions_are_used.ipynb) – looks at the harvest of functions to see how many are actually in use\n", "* [**Who's responsible?**](display_agencies_by_function.ipynb) – pick a function to see which agencies have been responsible for it over time\n", "\n", "### Useful tools\n", "\n", "* [**DIY Redaction Art Collages**](diy_redaction_collage.ipynb) – generates a random sample of ASIO redactions and packs them into one big image\n", "* [**Download the contents of a digitised file**](get_images_from_a_digitised_file.ipynb) – get a digitised file as a folder full of images\n", "* [**Get a list of agencies associated with a function**](get_agencies_associated_with_function.ipynb) – pick a function and create a downloadable list of agencies responsible for it\n", "* [**DFAT Cable Finder**](Find_cables.ipynb) – helps you find numbered cables created by DFAT\n", "\n", "## Data downloads\n", "\n", "* [Summary 
data about all series in RecordSearch, May 2021](https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv) (15 MB CSV) – contains basic descriptive information about all the series registered on RecordSearch (May 2021) as well as the total number of items described, digitised, and in each access category.\n", "* [Summary data about all series in RecordSearch, April 2022](https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_April_2022.csv) (15 MB CSV) – contains basic descriptive information about all the series registered on RecordSearch (April 2022) as well as the total number of items described, digitised, and in each access category.\n", "* [Recently digitised files](https://github.com/GLAM-Workbench/recordsearch/blob/master/data/recently-digitised-20210327) (CSV) – contains details of files digitised between 25 February and 26 March 2021; for an ongoing record of digitised files, see [this repository](https://github.com/wragge/naa-recently-digitised), which creates weekly snapshots." ] }, { "cell_type": "markdown", "id": "3b6ae58b", "metadata": {}, "source": [ "## Cite as\n", "\n", "See the GLAM Workbench or [Zenodo](https://doi.org/10.5281/zenodo.3544753) for up-to-date citation details.\n", "\n", "----\n", "\n", "This repository is part of the [GLAM Workbench](https://glam-workbench.github.io/). \n", "If you think this project is worthwhile, you might like [to sponsor me on GitHub](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }