Quantifying Complexity in Multimodal Media: Alan Moore and the “Density” of the Graphic Novel

Introduction

In an interview he gave in the year 2000, the well-known comics author Alan Moore made a remarkable observation about the graphic novel. Although he was critical of the term, which is commonly used to refer to book-length comics narratives, Moore acknowledged that canonical titles such as Art Spiegelman’s Maus and his own Watchmen could legitimately be described as novelistic on the basis of their higher “density” (Moore). Moore thus implicitly hypothesized that critical appreciation may have a formal basis. As we understand it, Moore’s brief reference to density—which he does not expand on in the interview—may be reformulated as complexity: the visual and textual cues that make it comparatively easy or difficult to comprehend and interpret a given narrative. Yet, this notion of complexity introduces further complexities for a scholarly understanding: On the one hand, Moore’s hypothesis accords with recent attempts in DH to find a new middle ground between the older formalisms and a cultural studies emphasis on the discursive construction of literary concepts. In practice, this reorientation necessitates a combination of computation and cultural sociology (English & Underwood). On the other hand, anyone who attempts to operationalize a concept such as density in multimodal media also faces technical challenges, in our case the automatic recognition of handwritten comics fonts.

In this paper, we describe the operationalization of Moore’s concept of density with the help of six textual and visual measures. We then present a pilot study of 40 graphic novels and memoirs, which are taken from the first representative corpus of English graphic narrative, or GNC (Dunst, Hartel & Laubrock). Six of these can be described as canonical given their frequent discussion in academic scholarship. The relatively small number of titles can be traced to the aforementioned technical hurdles. DH research on comics and visual media more generally has made significant progress in recent years (Dunst, Laubrock & Wildfeuer). Yet, existing computer vision methods still need to be adapted to the structural features of comics, such as individual panels, speech bubbles, and non-perspectival drawn images. Existing OCR software based on static and adaptive character classifiers leads to poor results in recognizing highly individualized and frequently handwritten comics fonts. This paper builds on early results of applying neural network-based automatic text recognition (ATR) to graphic narratives and may constitute the first computational analysis of comics text.

Dataset & Methodology

The brevity of Moore’s reference to density does not give any indication of his precise understanding of the term. However, our previous research has shown that Shannon entropy, a measure for the visual agitation of a page, and the number of shapes are useful indicators of style in comics (Dunst & Hartel 2018). These measures also capture central elements of basic visual processing, which distinguishes variations in color and brightness and establishes discontinuities between shapes (Lefèvre). In addition, we include the number of individual panels in our formalization. Most comics pages consist of several individual images that are framed by drawn borders or white space to suggest a sequence. Therefore, the number of panels per page indicates whether a page consists of one single image, or is constructed from the complex arrangement of many.

We currently achieve the most promising results recognizing comics text with the open-source Tesseract 4 software, which is based on a long short-term memory (LSTM) recurrent neural network (Smith). As described in earlier work (Hartel & Dunst), we use the similarity measure of the “Bag Error Rate” (BER) to decide whether the ATR software produces plausible results for text analysis based on a Bag-of-Words (BOW) approach. For each graphic novel, we manually annotated around 10% of its pages and compared this gold standard to the results of the ATR. Only if the BOW of the gold standard and the recognized texts are similar enough (a BER smaller than 40, for details see Hartel & Dunst), do we consider the graphic novel appropriate for text analysis. Research on the complexity of written texts often uses simple word-based measures. Standardized reading tests such as the Gunning fog index or Flesch-Kincaid count the length of individual words and sentence lengths. While our ATR does not reliably recognize punctuation at this point and is thus unable to count sentence length, we include the number of overall words on a page, word length by number of characters, and normalized type-token ratio in our calculation of textual complexity. In order to weigh all six textual and visual measures equally, we normalized each measure by computing the quotient of the value for each graphic novel and the maximum value for all graphic novels.

The designation of certain graphic novels and memoirs in the GNC as canonical is based on the frequency with which they are mentioned in the Bonn Online Bibliography of Comics Research. Figure 1 gives an overview of the 20 titles with the highest number of mentions and includes all six titles that were part of our study ( A Contract with God, Fun Home, Jimmy Corrigan, Maus, V for Vendetta, Watchmen).

Fig. 1: 20 titles in GNC with the most mentions in Bonn Bibliography of Comics Research

Results & Discussion

Our pilot study provides empirical evidence that supports Moore’s hypothesis that critically esteemed, or canonical, graphic novels are characterized by higher density. Despite the comparatively small number of titles analyzed, figure 2 shows that the results are highly significant, with p<2*10 -16.

Fig. 2.: Distinction in density between canonical and non-canonical graphic narratives

The results introduce a number of finer distinctions that are, by necessity, absent from Moore’s brief mention of density. Figures 3 and 4 compare four genres: the umbrella category graphic fantasy, which includes science fiction, fantasy, horror, and superhero narratives; graphic memoirs; graphic novels in the narrower sense of the word as fictional, literary narratives; and graphic non-fiction. Both information channels present in graphic novels show significant differences for canonical and less celebrated titles. If we look at different genres, we see that graphic memoirs are less complex visually than other titles but show the highest score for textual density. Graphic fantasy emerges as the most visually complex genre.

Fig. 3: Genre comparison for visual density. All categories show statistically significant distinction, with p < 0,05.

Fig. 4: Genre comparison for textual density. The pairings graphic novel–graphic fantasy, graphic novel–graphic memoir, and graphic novel–graphic non-fiction are statistically significant, with p < 0,05.

The difference between textual and visual density contributes to our empirical knowledge about narrative. The visual density of graphic fantasy is due to higher entropy and number of shapes. Work in progress indicates that these titles also tend to be more colorful and more irregular in their layout. Titles such as Moore’s Watchmen and V for Vendetta are thus visually highly complex, possibly because of the emphasis on spectacle and entertainment in these genres. In contrast, the textual density of graphic memoirs might contribute to their frequent discussion in academic scholarship, with Maus amassing 15% of all mentions in our corpus. Textual complexity arguably appeals to literary and cultural critics who have been schooled to appreciate titles that allow for ambiguity and subtle interpretations. However, many graphic memoirs (including Maus) are published in black and white—a feature that leads to lower entropy and, in our current operationalization, to somewhat lower visual density. Finally, a combination of high visual and textual density seems to augur well for the success of graphic narratives. As figure 5 shows, four of the six canonical examples included in our study can be found among the highest scoring titles on both counts.

Fig. 5: Scores for overall density, with canonical titles marked green

Conclusion & Outlook

We’ve detailed the operationalization of a concept, that of density or complexity, that was anecdotally connected to social processes of canonization by a leading comics author. Similar processes might be at work in multimodal media, including film and television. Generally speaking, higher-level concepts that combine information channels may provide useful research hypotheses for multimodal analysis. In contrast to more exploratory analyses of correlation between verbal and visual measures, these concepts can easily be connected to qualitative scholarship and sociological metadata. In a next step, we will increase the number of titles to the total of 250 included in the GNC. This will allow for a representative overview of graphic narrative. In addition to Tesseract 4, we are currently training Transkribus ATR software (Kahle et al.) on comics fonts and are working on enhanced text spotting, so that we will likely be able to present a more comprehensive version of this study in time for DH 2019.

Bibliography Dunst, A. & Hartel, R. (2018). “Automated Genre and Author Distinction in Comics“ DH 2018: Book of Abstracts, 184-188. Dunst, A. & Hartel, R. (2018). “The Quantitative Study of Comics: Towards a Visual Stylometry of Graphic Narrative”. A. Dunst, Laubrock, J. & Wildfeuer, J. (Ed.), Empirical Comics Research: Digital, Cognitive, and Multimodal Methods. New York: Routledge, 43-61. Dunst, A., Hartel, R. & Laubrock, J. (2017). “The Graphic Narrative Corpus (GNC): Design, Annotation, and Analysis for the Digital Humanities” In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition 2017, 15-20. Dunst, A., Laubrock, J. & Wildfeuer, J. (2017). “Comics and Empirical Research: An Introduction”. In: A. Dunst, Laubrock, J. & Wildfeuer, J. (Ed.), Empirical Comics Research: Digital, Cognitive, and Multimodal Methods. New York: Routledge, 1-23. English, J. & Underwood, T. (2016). “Shifting Scales: Between Literature and Social Science.” Modern Language Quarterly 77: 277-295. Hartel, R. & Dunst, A. (2019). “How Good is Good Enough? Establishing Quality Thresholds for the Automatic Text Analysis of Retro-Digitized Comics” In: Proceedings of the Multimedia Modeling Conference 2019, https://easychair.org/publications/preprint_open/Mdf2 . Kahle, P., Colutto, S., Hackl, G., & Mühlberger, G. (2017). “Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents”. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR): 19-24. Lefèvre, P. (2016). “No Content without Form: Graphic Style as the Primary Entrance to a Story.” In: N. Cohn (Ed.). The Visual Narrative Reader. London: Bloomsbury, 72. Moore, A. & Kavanagh, B. “Interview”, 17 October 2000 http://www.blather.net/articles/amoore/alanmoore.txt. Smith, R. (2007). “An Overview of the Tesseract OCR Engine.” In: Ninth International Conference on Document Analysis and Recognition (ICDAR): 629-633.