The Trials of Tokenization

Bibliography Burrows, J. F. (2007). All the Way Through: Testing for Authorship in Different Frequency Strata. LLC, 22(1): 27–47. Chesnutt, C. W. (1900). The House Behind the Cedars. Houghton Mifflin, Boston, http://babel.hathitrust.org/cgi/pt?view=plaintext;size=100;id=nc01.ark%3A%2F13960%2Ft7cr7221k;page=root;seq=25;num=13. Chiarcos, C., Ritz, J. and Stede, M. (2012). By All These Lovely Tokens . . . : Merging Conflicting Tokenizations. Language Resources and Evaluation, 46(1): 53–74. Craig, H. and Kinney, A. F. (2010). Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press, Cambridge. Dridan, R., and Oepen, S. (2012). Tokenization: Returning to a Long Solved Problem: A Survey, Contrastive Experiment, Recommendations, and Toolkit. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 378–82. Eder, M. (2013). Mind Your Corpus: Systematic Errors in Authorship Attribution. LLC, 28(4): 603–14. Eder, M., Rybicki, J. and Kestemont, M. (2014). Stylo. Gaskell, E. (1910 [1851]). Cranford. Houghton Mifflin, Boston, http://babel.hathitrust.org/cgi/pt?q1=twelve;id=hvd.32044097042071;view=plaintext;start=1;sz=10;page=root;size=100;seq=143;num=107. Hoover, D. L. (2014a). Making Waves: Algorithmic Criticism Revisited. DH2014, Lausanne, Switzerland: EPFL-UNIL, pp. 202–4. Hoover, D. L. (2014b). The Full-Spectrum Text-Analysis Spreadsheet. Digital Humanities 2013, Center for Digital Research in the Humanities, Lincoln, NE, University of Nebraska, pp. 226–29. The Intelligent Archive. (2012). Centre for Literary and Linguistic Computing, University of Newcastle, Australia. Jockers, M. L. (2013). Macroanalysis: Digital Methods and Literary History. University of Illinois Press, Urbana-Champaign. Levitan, S. and Argamon, S. (2006). Fixing the Federalist: Correcting Results and Evaluating Editions for Automated Attribution. Digital Humanities 2006. Paris: Centre de Recherche Cultures Anglophones et Technologies de l’Information, pp. 323–26. Plasek, A. and Hoover, D. L. (2014). Starting the Conversation: Literary Studies, Algorithmic Opacity, and Computer-Assisted Literary Insight. DH2014, Lausanne: EPFL-UNIL, pp. 305–6. Ramsay, S. (2011). Reading Machines: Toward an Algorithmic Criticism. University of Illinois Press, Urbana. Scott, M. (2012). WordSmith Tools version 6. Liverpool: Lexical Analysis Software. Sinclair, S., Rockwell, G. and the Voyant Tools Team. (2012). Voyant Tools (web application). Tsukamoto, S. (2004). KWIC Concordance for Windows version 4.7.