The Digital Knowledge Store was developed from March 2012 to April 2015 at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) and funded by the Deutsche Forschungsgemeinschaft (DFG). In this first project phase, a search infrastructure was created that enables centralised access to all digital resources of the BBAW. The DFG has granted funds for a second period, in which the Digital Knowledge Store will be expanded with a focus on deployment options and integration for partner research institutions.
The BBAW hosts over 170 research projects with over 1.2 million digital resources. For the first time, these resources and their metadata were completely integrated into a central full-text index and made accessible through an innovative user interface. The resources hosted at the BBAW vary widely in content, format and language. Most of them are provided as digital editions and translations in formats like XML, HTML and PDF, but there are also electronic catalogues, documentation, databases, digital full-text collections and dictionaries. The Digital Knowledge Store can be queried in different languages via the morphologically analyzed index. We utilize a number of language technologies (Bing, DONATUS) to enable this multilingual search. The search also covers automatically and manually created metadata, which enhance the resources, connect them semantically and provide additional information to the user. The metadata of all resources are also provided via a machine-readable web service (OAI-PMH) and in that way become part of the Linked Open Data Cloud.
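As an illustration of how a client might harvest metadata from an OAI-PMH service, the following minimal Python sketch builds a ListRecords request and extracts Dublin Core titles from a response. The base URL is a placeholder (the address of the actual service is not stated here), and the function names are chosen for this example only. The verb, the parameter names and the oai_dc metadata prefix come from the OAI-PMH protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder endpoint: the real base URL of the service is an assumption.
BASE_URL = "https://example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # restrict harvesting to one set
    if from_date is not None:
        params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Return all dc:title values found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall(".//dc:title", NS)]
```

A harvester would fetch the URL returned by `list_records_url`, pass the response body to `extract_titles`, and repeat the request with the `resumptionToken` from the response until the list is complete.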
The biggest challenge in building an interdisciplinary research data infrastructure like the Digital Knowledge Store was the heterogeneity of the digital resources created at the BBAW over the last 20 years. Hosted on different servers and in different databases, they vary widely in content and in the ways they can be accessed. The main task was to access these data generically and to bundle them in the central Apache Lucene index and in a metadata scheme adapted to the needs of the academy (based on OAI-ORE). Specific import modules were implemented for the various projects and resource collections; these modules integrate the varying data structures of the research projects. Semantic Web technologies (e.g. DBpedia) and text mining components were integrated to provide semantically connected suggestions, which extend the query term and invite the user to discover and explore the academy's projects.
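The idea of source-specific import modules feeding one central index can be sketched as follows. This is a simplified illustration, not the project's implementation: the record layouts, field names and importer functions are hypothetical, and a small in-memory inverted index stands in for the Apache Lucene index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Common target schema shared by all import modules (hypothetical)."""
    identifier: str
    title: str
    text: str
    language: str


def import_edition(row):
    # Hypothetical layout of a digital edition record.
    return Resource(row["id"], row["head"], row["body"], row.get("lang", "und"))


def import_catalogue(row):
    # Hypothetical layout of an electronic catalogue record.
    return Resource(f"cat-{row['nr']}", row["name"],
                    row.get("description", ""), row.get("language", "und"))


# One import module per source type; new sources register a new function.
IMPORTERS = {"edition": import_edition, "catalogue": import_catalogue}


def build_index(sources):
    """Normalise (kind, row) pairs and index their tokens by resource id."""
    index = defaultdict(set)
    for kind, row in sources:
        resource = IMPORTERS[kind](row)
        for token in f"{resource.title} {resource.text}".lower().split():
            index[token].add(resource.identifier)
    return index


def search(index, term):
    """Return the sorted identifiers of all resources containing the term."""
    return sorted(index.get(term.lower(), set()))
```

Because every importer returns the same `Resource` type, the indexing step does not need to know which project a record came from; only the import module changes when a new collection is added.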
The second project phase of the Digital Knowledge Store, running until 2017, will expand its possibilities, especially in terms of sustainability and availability. There is strong demand from academic institutions for sustainable long-term research infrastructures that can meet the specific requirements of research data in the humanities, e.g. the integration of heterogeneous resources and content handling. One important goal in the next stage is to broaden the target user group. The software components of the Digital Knowledge Store will be provided as an installation package. This will enable partner institutions to run their own Knowledge Store adapted to their own digital resources. In order to coordinate further development and collaboration with future users, an open workshop will be held in April 2016 in Berlin.
Another topic in the next project period will be the development of guidelines for the minimum structural and technical requirements that resources and metadata have to meet in order to be integrated easily into the index. The guidelines will include objectives for the (technical) quality of the resources and their metadata. These best-practice recommendations can become a general recommendation in the digital humanities for building and maintaining resource collections, and a reference on how to deal with the quality of resources and metadata beyond their specific use case. Our partner institutions will successively optimize the guidelines and adjust them to their specific needs. Additionally, workflows for the manual and automatic supply of metadata will be created and specified. Further development goals are the automatic evaluation and integration of user feedback into the query process, as well as visualization components.