An Evaluation of Rhyme Detection Using Historical Dictionaries

Introduction

As part of a larger project in distant reading nineteenth-century British poetry, a method for detecting line-end rhymes was devised that utilizes rhyme dictionaries published in the eighteenth and nineteenth centuries. This method was proposed in order to account for historical debates about the definition of poetic rhymes in English as well as historical changes in pronunciation. This paper describes an evaluation of this approach that compares it to a method commonly used in computational analysis, which is based on the CMU Pronouncing Dictionary, in order to understand what significant differences occur.

Historical context

Rhyme in English poetry is generally defined as the connection between two syllables “that have identical stressed vowels and subsequent phonemes but differ in initial consonant(s) if any are present” (Brogan and Cushman, 2012: 1184). Line-end rhyme is the most common use of rhyme, and it contributes to the structure and effect of particular poetic forms, like the sonnet and triolet, and to stanza patterns like the Spenserian stanza.

One syllable, or masculine rhymes, predominate in English poetry, as do “perfect” or exact rhymes, in which the vowel sounds are identical: cat/hat. Yet many poets have also used “imperfect” or near rhymes, in which the vowels are somewhat different: young/song. Literary critics in the nineteenth century frequently debated the rules for rhyme, either pointing to such examples as justification for a relaxed definition, or deriding them as bad poetry.

Alongside these debates, many different rhyme dictionaries were published in the nineteenth century, which offered critical definitions and examples of poetic rhyme, as well as lists of rhyme syllables and rhyming words in English. These dictionaries were aimed at would-be poets, students of poetry, and those wishing to improve their pronunciation of English. Rhyme dictionaries can thus serve as a data source for understanding both historical theories about rhyme and historical British pronunciation.

Rhyme detection with historical rhyme dictionaries

In previous work, a method for rhyme detection using John Walker’s “Index of Perfect and Allowable Rhymes” was demonstrated. Each entry consists of a rhyme syllable, a list of words that end with that syllable, other perfect rhymes, and a list of allowable rhymes:

Am, dam, ham, pam, ram, sam, cram, dram, fam., sham, swam, epigram, anagram, &c. Perfect rhymes, damn, lamb. Allowable rhymes dame, lame, &c. (Walker, 1824: 642)

A key-value table was created from these entries. The rhyme detection script uses the key-value pairs to identify perfect rhyme words and syllables first, followed by allowable rhyme words and syllables within the poem. Rhyme patterns are also visualized as a sequence of capital letters (ABABCDCD), as is standard in literary studies.

This method makes possible the detection of rhyme words, rhyme syllables, and rhyme patterns in large document sets. This method for computational historical poetics can compare different historical theories of rhyme as well as use them to evaluate rhyme usage in large document collections. This method also contributes to the study of rhyme’s effects on poetic vocabulary more generally in the nineteenth century.

Rhyme detection with the CMU Pronouncing Dictionary

The Carnegie Mellon University Pronouncing Dictionary is “an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations” (http://www.speech.cs.cmu.edu/cgi-bin/cmudict). It is widely used for a variety of language analysis tasks and is available through NLTK.

Several researchers have based their work on rhyme detection on the CMU Pronouncing Dictionary (Hirjee and Brown, 2010; Kao and Jurafsky, 2012; McCurdy, et al., 2015). The rhyme-plus package for node.js (https://www.npmjs.com/package/rhyme-plus) and the pronouncing package for Python (https: //pypi.org/project/pronouncing/) include functions for rhyme analysis based on the CMU Pronouncing Dictionary. This wide availability has made it standard for dictionary-based digital humanities work involving pronunciation. (It should be noted, however, that some other projects use speech transcription or speech synthesizer programs as an alternative to dictionary tables (Clement et al, 2013; Malmi et al, 2016).)

Evaluation

In this evaluation project, rhyme detection using historical rhyme dictionaries is compared to rhyme detection using the CMU Pronouncing Dictionary. First, the rhyme syllable and word pairs from Walker’s rhyme dictionary are compared against the CMU dictionary to discover which rhyme word pairs are found in both dictionaries; which rhyme word pairs are found only in the CMU dictionary; and which rhyme word pairs are found only in Walker’s dictionary.

Preliminary evaluation with a random sampling from Walker’s dictionary suggests that a significant proportion of historical rhymes labeled as perfect, as well as most of those labeled as “allowable” by Walker, are not discovered by using the CMU dictionary. Several reasons are suggested for this: pronunciation differences between American and British English; vocabulary differences between literary and general English; and vocabulary and pronunciation differences between nineteenth-century and contemporary English.

A second phase of evaluation tests the historical dictionary method and the CMU Pronouncing Dictionary over a corpus of 1500 British poems published between 1800-1900 to evaluate how significant the differences between the dictionaries are for the analysis of a literary corpus.

Bibliography Brogan, T. V. F. and Cushman, S. (2012). Rhyme. In Greene, R., et al. (eds), Princeton Encyclopedia of Poetry and Poetics. Princeton: Princeton University Press, pp. 1182-92. Clement, T. et al. (2013). Sounding for Meaning: Using Theories of Knowledge Representation to Analyze Aural Patterns in Texts. DHQ, 7(1). http://www.digitalhumanities.org/dhq/vol/7/1/000146/000146.html Hirjee, H. and Brown, D. (2010). Using Automated Rhyme Detection to Characterize Rhyming Style in Rap Music. Empirical Musicology Review, 5(4): 121-45. Kao, J. and Jurafsky, D. (2012). A Computational Analysis of Style, Affect, and Imagery in Contemporary Poetry. Workshop on Computational Linguistics for Literature. Montreal: ACL, pp. 8-17. Malmi, E., et al. (2016). DopeLearning: A Computational Approach to Rap Lyrics Generation. Proceedings of Knowledge Discovery and Data Mining (KDD). San Francisco: ACM. http://dx.doi.org/10.1145/2939672.293967 McCurdy, N., et al. (2015). Rhymedesign: A Tool for Analyzing Sonic Devices in Poetry. Proceedings of NAACL-HLT Fourth Workshop on Computational Linguistics for Literature. Denver: ACL, pp. 12-22. Walker, J. (1824). A Rhyming Dictionary; Answering, at the Same Time, the Purposes of Spelling and Pronouncing the English Language, on a Plan not Hitherto Attempted. New Edition. London: William Baynes and Son; Edinburgh: H. S. Baynes and Co.. Accessed in HathiTrust Digital Library: http://hdl.handle.net/2027/hvd.hwpa6m