A Digital Catalogue of Medieval and Early Modern Manuscripts

Patrik Granholm, National Library of Sweden, Sweden (patrik.granholm@kb.se)

2019-04-23

Short Paper

Keywords: manuscripts, cataloguing, digitisation, TEI, IIIF

Topics: digital archives and digital libraries; medieval studies; digital research infrastructures and virtual research environments; English; library & information science; manuscripts description and representation

Introduction

This paper presents the guiding principles and ongoing development of Manuscripta, a digital catalogue of medieval and early modern manuscripts in Sweden, which started as a project-specific database but has since evolved into a national infrastructure. The manuscript descriptions are encoded in TEI, a highly suitable metadata format for detailed, scholarly catalogues, since its hierarchical structure corresponds to the four parts traditionally used in cataloguing: description of contents, codicological description, provenance, and bibliography. The digitised manuscripts are presented using the IIIF API, and the images are available free of restriction under the CC0 public domain dedication. The infrastructure is built entirely with open source software, and the source code, together with the TEI files, is available on GitHub.

Cataloguing and encoding principles

Medieval and early modern manuscripts are seldom monographs; they often contain multiple texts by different authors. Furthermore, they have usually gone through several stages of production and use: they may have been expanded, taken apart, deprived of certain parts, or rebound with new additions. Occasionally, new texts were inserted on previously blank pages. Until recently, manuscript cataloguing did not take this historical stratigraphy into account. Recent scholarship and methodological developments have shown, however, that the notion of codicological units is an essential requirement for detailed, scholarly catalogues, not least because it makes it possible to distinguish the different dates and provenances of the various units and to record this information clearly for each particular unit (Gumbert, 2004; Andrist et al., 2013).

The manuscript descriptions are therefore structured around the notion of codicological units, and the customised TEI encoding schema has been tailored to this end. The schema also provides cataloguing guidelines with examples and documentation of the elements and attributes used. Manuscripta is one of the first digital catalogues to implement this state-of-the-art research, which has at times been called a codicological revolution. So far, only a few printed catalogues have offered this form of stratified cataloguing.

The structure of the TEI-encoded descriptions follows as closely as possible the hierarchical structure of the manuscripts: the intellectual content, physical description, and history, where applicable, of each unit are described in separate <msPart> elements, whereas information common to all units, e.g. the binding, provenance, and bibliography, is described one level higher, directly under the <msDesc> element. The <msPart> element is always used, even when a manuscript consists of only one codicological unit, to reflect the notion of the ‘monomerous codex’, a term coined by Gumbert (Gumbert, 2004:26).
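The layering described above can be made concrete with a minimal sketch. The sample below is a hypothetical, heavily abbreviated description of a two-unit manuscript, not an excerpt from Manuscripta or its schema; the identifiers, titles, and dates are invented for illustration. It uses Python's standard library to show how information shared by the whole codex (here the binding) sits directly under <msDesc>, while contents and dating are read per <msPart>.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified description: all values are invented.
SAMPLE = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier><idno>Example MS 1</idno></msIdentifier>
  <physDesc>
    <bindingDesc><p>Binding shared by both units.</p></bindingDesc>
  </physDesc>
  <msPart>
    <msIdentifier><idno>Unit 1</idno></msIdentifier>
    <msContents><msItem><title>First text</title></msItem></msContents>
    <history><origin><origDate notBefore="1300" notAfter="1350"/></origin></history>
  </msPart>
  <msPart>
    <msIdentifier><idno>Unit 2</idno></msIdentifier>
    <msContents><msItem><title>Second text</title></msItem></msContents>
    <history><origin><origDate notBefore="1450" notAfter="1500"/></origin></history>
  </msPart>
</msDesc>
"""


def shared_binding(xml_text):
    """Return the binding note recorded once for the whole manuscript."""
    root = ET.fromstring(xml_text.strip())
    return root.findtext("tei:physDesc/tei:bindingDesc/tei:p", namespaces=TEI_NS)


def summarise_units(xml_text):
    """Return one record per codicological unit (<msPart>)."""
    root = ET.fromstring(xml_text.strip())
    units = []
    for part in root.findall("tei:msPart", TEI_NS):
        date = part.find("tei:history/tei:origin/tei:origDate", TEI_NS)
        units.append({
            "id": part.findtext("tei:msIdentifier/tei:idno", namespaces=TEI_NS),
            "titles": [t.text for t in part.findall(
                "tei:msContents/tei:msItem/tei:title", TEI_NS)],
            # Each unit carries its own date range, if one is recorded.
            "date": (date.get("notBefore"), date.get("notAfter"))
                    if date is not None else None,
        })
    return units
```

Calling `summarise_units(SAMPLE)` yields two records with separate date ranges, which is the distinction between units that a description of the codex as a single whole cannot express.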

Technology

The infrastructure is built entirely with open source software, which is an essential requirement for transparency and for ensuring long-term maintainability. It runs on the XML database eXist-db, which offers advanced indexing and search functionality through XQuery, as well as built-in functions for converting TEI to HTML and PDF. The digitised manuscripts are displayed next to the descriptions, and page references in the descriptions are linked to the corresponding images, giving immediate access to specific locations in the manuscript. Part of each description is given in running text, in which TEI makes it possible to tag words and phrases such as names, places, dates, writing material, and watermarks. These tags in turn allow for advanced text search queries. In addition, names, places, and bibliographical references are linked to authority files, which contain alternative name forms, short biographical and geographical data, and links to external resources. Since TEI is based on XML, it can easily be converted into other formats such as HTML, PDF, MODS, JSON, and RDFa. This is an essential requirement for making the data transferable between different platforms and for enabling Linked Open Data.
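How inline tagging supports both search and conversion can be sketched briefly. Manuscripta itself performs these operations in eXist-db through XQuery; the Python below is only an illustration of the idea, and the sample sentence, the names, and the `ref` values are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical running text from a description; all values are invented.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">Written by
<persName ref="#person_1">Example Scribe</persName> in
<placeName ref="#place_1">Example Town</placeName> on
<material>parchment</material>, see
<locus from="1r" to="12v">ff. 1r-12v</locus>.</p>
"""


def tagged_phrases(xml_text):
    """Collect every tagged word or phrase with its tag name and attributes."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for el in root.iter():
        if el is root:
            continue  # skip the enclosing paragraph itself
        records.append({
            "tag": el.tag.replace(TEI, ""),
            "text": "".join(el.itertext()).strip(),
            "attributes": dict(el.attrib),
        })
    return records


def find_by_tag(records, tag):
    """Return only the phrases marked up with the given element name."""
    return [r for r in records if r["tag"] == tag]


def to_json(xml_text):
    """Serialise the tagged phrases as JSON, one possible target format."""
    return json.dumps(tagged_phrases(xml_text), ensure_ascii=False, indent=2)
```

A query such as `find_by_tag(tagged_phrases(SAMPLE), "persName")` returns the tagged name together with its `ref` pointer, which is the hook by which a description can be joined to an authority record.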

Cataloguing is done in a web-based editing interface, built with React.js, which does not require any knowledge of TEI encoding and therefore simplifies the cataloguing process and reduces the time it requires. Previously, it was necessary to use an XML editor, which had a steep learning curve and was time-consuming as well as error-prone, even when validating against an encoding schema and following detailed cataloguing guidelines.

Future plans

More descriptions will be added to Manuscripta continuously: not only born-digital descriptions, but also descriptions from legacy databases such as Microsoft Access and FileMaker and from printed manuscript catalogues, using OCR and data extraction. These will then be encoded in the same schema as the born-digital descriptions. Further development will include adding a controlled vocabulary for technical terms used in the manuscript descriptions and authority files for works preserved in the manuscripts; enriching the HTML with Microdata, i.e. machine-readable data for the semantic web, which would enable search engines to give users more accurate search results; and creating stylesheets for converting the TEI of the authority files into linked open data formats such as RDF and JSON-LD.
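The planned conversion of authority files can be sketched as follows. This is an illustration under stated assumptions rather than the project's method: the paper envisages stylesheets, whereas the sketch uses Python; the sample person record, the base URI `https://example.org/person/`, and the choice of the schema.org vocabulary are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE_URI = "https://example.org/person/"  # placeholder, not a real endpoint

# Hypothetical authority record; all values are invented.
SAMPLE_PERSON = """
<person xmlns="http://www.tei-c.org/ns/1.0" xml:id="person_1">
  <persName>Example Author</persName>
  <persName type="alternative">Exemplum Auctor</persName>
  <idno type="URI">https://example.org/authority/123</idno>
</person>
"""


def person_to_jsonld(xml_text):
    """Map a TEI <person> entry to a JSON-LD document (schema.org terms)."""
    person = ET.fromstring(xml_text.strip())
    names = [n.text for n in person.findall("tei:persName", NS)]
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "@id": BASE_URI + person.get(XML_ID),
        "name": names[0],               # first name form is treated as preferred
        "alternateName": names[1:],     # remaining forms become alternatives
        "sameAs": [i.text for i in person.findall("tei:idno", NS)],
    }


if __name__ == "__main__":
    print(json.dumps(person_to_jsonld(SAMPLE_PERSON), indent=2))
```

Treating the first <persName> as the preferred form is a simplification made for the sketch; a real mapping would presumably rely on explicit attributes in the authority file.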

Bibliography

Andrist, P., Canart, P. and Maniaci, M. (2013). La syntaxe du codex : essai de codicologie structurale. Turnhout: Brepols.

Gumbert, J. P. (2004). Codicological Units: Towards a Terminology for the Stratigraphy of the Non-Homogeneous Codex. Segno e testo, 2: 17-42.