Shaping Humanities Data: Use, Reuse, and Paths Toward Computationally Amenable Cultural Heritage Collections

Thomas Padilla
thomaspadilla@ucsb.edu
UC Santa Barbara, United States of America

Sarah Potvin
spotvin@library.tamu.edu
Texas A&M University, United States of America

Laurie Allen
laallen@upenn.edu
University of Pennsylvania, United States of America

Stewart Varner
svarner@sas.upenn.edu
University of Pennsylvania, United States of America

Galleries, libraries, archives, and museums (GLAMs) increasingly seek to make digitized and born-digital collections accessible as data optimized for computational methods and tools common to the Digital Humanities. Preparing and publishing collections as data extends possible collection use beyond the analog object interactions that collection interfaces tend to emulate. In line with open data efforts, libraries, archives, and museums typically work to assign open licenses to these data.

Current access methods diverge widely. They include providing compressed collection objects in ZIP files, exposing static collection websites that can be crawled using a tool like rsync, leveraging GitHub for text collection access, provisioning an API, enabling FTP access to collections, mediating computational processes performed on collection data through a platform, and facilitating data access through torrent technology. Concurrently, in response to researcher requests for data mining, commercial publishers have developed a range of processes for delivering proprietary corpora, including providing libraries with physical hard drives loaded with the data, under terms and conditions that significantly limit or expressly forbid data sharing.

There are no consensus-driven best practices that guide the generation, description, and provisioning of computationally amenable GLAM collections for the range of communities that fall within the Digital Humanities.
Without best practices in this space, institutions risk misplacing resources in ways that foster irregular, ultimately disorienting data access environments. Indeed, the panoply of institutional approaches poses a challenge to GLAM institutions seeking best practices and clear guidelines for publishing collections as data. One major barrier to the development of consensus-driven best practice is an incomplete understanding of how digital humanists, among others, are using and reusing cultural heritage data. This workshop aims to make progress towards bridging that gap.

Research indicates that types of use exhibited by digital humanists include, but are not limited to, text analysis, image analysis, mapping, sound analysis, and network analysis. Orientation to the full scope of academic use types can be gained through in-depth analysis of data use practices across disciplines as represented in core Digital Humanities journals (Padilla and Higgins 2016), by reviewing works at the annual global Digital Humanities conference (Weingart 2016), and by studying edited volumes that have so far compiled a broad range of research in this space (Gold 2012; Gold and Klein 2016; Burdick, Drucker, Lunenfeld et al. 2012; Schreibman, Siemens, and Unsworth 2016).

This workshop will build upon this orientation by engaging directly with digital humanists' existing and projected research and pedagogical practices that draw upon ever-growing GLAM collections. Blending short talks by practitioners, guided discussion, and workshopping of the organizers' draft framework (further described below), the workshop will focus on how researchers and educators use GLAM collections that have been made accessible as data, and will extend to consider how these uses should inform collection creation and access.
The organizers of this workshop are members of the project team for “Always Already Computational: Library Collections as Data,” an effort sponsored by the Institute of Museum and Library Services in the United States of America through its National Forum grant program.

The organizers have observed that GLAM approaches to the preparation of collections as data are often heavily influenced by national or regional priorities and associated infrastructures. Yet the use and reuse of these open data is necessarily international. While the organizers of the workshop are US-based, the workshop aims to surface geographically diverse praxis. The short talks in the workshop have been selected through an open CFP facilitated by an international program committee. The workshop may be structured thematically, based on talks and demos solicited via the CFP.

Participants will be encouraged to consider how efforts to develop computationally amenable collections, which run the risk of recreating and reinforcing long-standing biases inherent in cultural heritage collection practice, provide an opportunity to reframe, enrich, and/or contextualize collections in a manner that seeks to avoid replication of bias.

Potential themes include:

• Use and Reuse: How are data used and reused? What methods and tools are commonly employed? Do these differ by disciplinary community? What types of data are used? What types of data are desired but are difficult to use for reasons including but not limited to copyright status, content type (e.g. video, audio, web, software), and size? How, when, and where are data reused? What factors enhance or inhibit the likelihood of data reuse?

• Access: What can we learn from our collective experiences working to access data from within and outside of the cultural heritage community? What are preferred methods of data access? What factors should be considered when deciding among access methods?
When is simple click-and-download of bulk collections appropriate? What characteristics define an optimally useful API (application programming interface) for a wide range of users with varying technical expertise? Is an API always the best route? Is there a mix of options that should be considered? What considerations inform development of those access options?

• Description and Discovery: How do digital humanists locate appropriate data? What tools are used to search for data? What information about the data is necessary to enable use? When compiling meta-collections of data, how are digital humanists maintaining provenance and merging disparate metadata?

Burdick, A., Drucker, J., Lunenfeld, P., Presner, T. and Schnapp, J. (Eds) (2012). Digital_Humanities. Cambridge: MIT Press. https://mitpress.mit.edu/books/digitalhumanities

Gold, M. (Ed) (2012). Debates in the Digital Humanities. Minneapolis: University of Minnesota Press. http://dhdebates.gc.cuny.edu/debates/1

Gold, M. and Klein, L. (Eds) (2016). Debates in the Digital Humanities. Minneapolis: University of Minnesota Press. http://dhdebates.gc.cuny.edu/debates/2

Padilla, T. and Higgins, D. (2016). Data Praxis in the Digital Humanities: Use, Production, Access. In Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, Krakow, pp. 644-646.

Schreibman, S., Siemens, R. and Unsworth, J. (Eds) (2016). A New Companion to Digital Humanities. 2nd edition. Wiley-Blackwell.

Weingart, S. (2016). “Submissions to DH2016 (pt. 1).” On the scottbot irregular. http://scottbot.net/submissions-to-dh2016-pt-1/