{ "cells": [ { "cell_type": "markdown", "id": "7296915d", "metadata": {}, "source": [ "# Libraries Tasmania\n", "\n", "Current version: [v1.0.0](https://github.com/GLAM-Workbench/libraries-tasmania/releases/tag/v1.0.0)\n", "\n", "Tools and examples for working with data from [Libraries Tasmania](https://www.libraries.tas.gov.au/). For more information see the [Libraries Tasmania](https://glam-workbench.net/libraries-tasmania/) section of the GLAM Workbench.\n", "\n", "## Tasmanian Post Office Directories\n", "\n", "The [Tasmanian Post Office Directories from 1890 to 1948](https://stors.tas.gov.au/ILS/SD_ILS-981598) have been digitised and made available by Libraries Tasmania for download as PDFs. These notebooks document a workflow that extracts text and images from the PDFs to build a [searchable database of their contents](https://glam-workbench.net/tasmanian-post-office-directories/).\n", "\n", "* [Download and process Tasmanian Post Office Directory PDFs](tas-pod-save-text-images.ipynb) – downloads all 48 PDFs, then extracts images and text from the PDFs using PyMuPDF\n", "* [Upload Tasmanian Post Office Directory images to Amazon S3 for IIIF](tas-pod-upload-images.ipynb) – converts the images into pyramidal TIFFs using pyvips and then uploads them to an Amazon S3 bucket for delivery via IIIF\n", "* [Extract text from PDF images using Tesseract](tas-pod-ocr-with-tesseract.ipynb) – uses Tesseract to extract text from the images\n", "* [Add content from the Tasmanian Post Office Directories to an SQLite database](tas-pod-add-to-datasette.ipynb) – brings everything together in an SQLite database ready for delivery through Datasette\n", "\n", "See the [GLAM Workbench for more details](https://glam-workbench.github.io/libraries-tasmania/)." 
] }, { "cell_type": "markdown", "id": "176aaa47", "metadata": {}, "source": [ "## Cite as\n", "\n", "See the GLAM Workbench or [Zenodo](https://doi.org/10.5281/zenodo.7080836) for up-to-date citation details.\n", "\n", "----\n", "\n", "This repository is part of the [GLAM Workbench](https://glam-workbench.github.io/). \n" ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }