Stylometric Analyses of Character Speeches in French Plays Galleron Ioana U. Sorbonne Nouvelle, France ioana.galleron@sorbonne-nouvelle.fr 2019-04-29T06:38:00Z Name, Institution
Street City Country Name

Converted from a Word document

Paper Short Paper characters stylometry French comedy corpus and text analysis literary studies stylistics and stylometry french studies content analysis English

It is usually understood that literary authors have style, and numerous papers have been published about the usefulness of digital approaches and techniques for identifying stylistic specificities of a writer, for confronting styles, for attributing texts, etc. (see Holmes, 1994; Brunet, 2004; Herman et al., 2004). However, another traditional type of stylistic analysis in literary studies has been less operationalized in a digital paradigm, aimed at observing how the characters speak (see however Brynes 2010 and 2012). This paper contributes to the testing of digital tools for such an approach; in other words, it tries to answer, with digital tools, whether literary characters have a style, or if the signal of the author that creates them is prevalent over all other kinds of linguistic specificities.

In order to answer this question, the paper focuses on theatrical texts from the 17 th and 18 th French centuries. To the contrary of what happens in narrative texts, character’s discourses have clear boundaries in plays, and can be easily extracted from an XML/TEI encoded text. Also, characters in plays have been somewhat less studied with digital tools than characters in novels (Jockers and Kiriloff, 2016; Jockers 2013).

A sample of 8 comedies staged approximately from 1630 to 1740 has been put together; the sample tries to balance well known play writers, such as Molière, Corneille Destouches and Marivaux, with more obscure authors (Dancourt, Regnard, Boissy, Boursault). However, all the plays are “grandes comédies” in 5 acts and in verses, with comparable lengths and quite similar numbers of characters. A total amount of 82 discourses has been extracted using an XQuery under BaseX.

First, the plays as a whole (but without the stage directions or any kind of meta-discourse) have been submitted to a PCA using the stylometric library under R written by (Eder and al., 2016). As expected, differences between author styles are well underlined by their distribution on the graph, with Molière in the middle and Regnard closer to him than to the authors from the 18 th century.

This first analysis has been conducted only to confirm that the tool works on the type of texts it has been fed to, and yields sensible results.

Second, character’s discourses have been submitted to the same kind of analysis. After further adjustments, such as the exclusion from the corpus of too short roles that were skewing the general distribution on the graph, and the testing of various algorithms (covariance, correlation or cluster analysis? Classic Delta, Canberra or Eder’s Delta?), the following representation has been obtained:

As it can be observed, characters do not group by “origin” (i. e., more often than not characters from a play do not appear together), nor do they display a clear historical split - to the difference to what was happening with the authors. This tends to confirm that characters do have a style, whose parameters are to be further identified and tested. Moreover, when merging in a same txt file discourses of feminine, respectively masculine, characters from the same play/ author, significant differences appear in certain cases, with Molière’s Agnès and other feminine characters being the most intriguing case:

In a third stage, an analysis of characters’ speeches is conducted with TXM (see Heiden, 2010), so as to delineate the stylistic differences pointed to by the PCA and to attempt an explanation. Verbs do not seem to be useful discriminators, even when the texts are lemmatized. The calculus of specificities shows that personnel pronouns, names and possessives are the most discriminant features. To these, one may add the adjectives, which appear more frequently as a characteristic of male speeches according to the table of preferred words built with the “oppose” function under stylo library in R. While the importance of “pronouns of dialogue” for characterizing plays has already been underlined (Muller, 1979), the other features are a bit more surprising, since one would have expected, for instance, verbs to play a more prominent role, related to the actantial position of the characters. Also, it is not clear why male characters would use more adjectives than their feminine counterparts; sensibility and a trend towards pathos are more often evoked in relation to the second ones. After further inquiries with a new set of discourses, from other plays by the same authors, but also from other authors, so as to confirm the above mentioned phenomena, the paper will try to propose some explanations based on a close analysis of the contexts.

Bibliography Brunet, Étienne. (2004). Où l’on mesure la distance entre les distances. Texto! http://www.revue-texto.net/Inedits/Brunet/Brunet_Distance.html Brynes, R. (2010). A statistical analysis of the ‘Eumaeus’ Phrasemes in James Joyce’s Ulysses . JADT 2010: Conference Proceedings, Rome: Edizioni Universitarie di Lettere Economia Diritto, pp. 289-96. Brynes, R. (2012). The Stylometry of Cliche Density and Character. Proceedings of JADT 2012, Liège/ Bruxelles: Université de Liège/ Facultés Universitaires Saint-Louis, pp. 239-46. Eder, M., Kestemont, M. and Rybicki, J. (2013). Stylometry with R: a suite of tools. Digital Humanities 2013: Conference Abstracts. University of Nebraska--Lincoln, NE, pp. 487-89. Heiden, S. (2010). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. 24th Pacific Asia Conference on Language, Information and Computation. Institute for Digital Enhancement of Cognitive Development, Waseda University, pp. 389-98. Herrmann, J. B., van Dalen-Oskam, K. and Schöch, C. (2015). Revisiting style, a key concept in literary studies. Journal of Literary Theory, 9(1) : 25-52. Holmes, D.I. (1994) Authorship attribution. Computers and the Humanities, 28(2): 87-106. Jockers, M. and Kiriloff, G. (2016) Understanding Gender and Character Agency in the 19th Century Novel. Journal of Cultural Analytics. DOI 10.22148/16.010 Jockers, M. (2013) Macroanalysis. Digital Methods and Literary History, Urbana, Chicago and Springfield: University of Illinois Press. Kastberg Sjöblom, M. (2004) L’indice pronominal est-il encore d’actualité? Lexicometrica, (5): 1-21 . Malrieu, D. (2001). Stylistique et statistique textuelle. À partir de l’article de C. Muller sur les ‘pronoms de dialogue’. Texto ! http://www.revue-texto.net/index.php?id=611 Muller, C. (1979). Langue française et linguistique quantitative. Genève : Slatkine.