Converted from an OASIS Open Document
The character network of a given narrative (novel, play, film, graphic novel, etc.) models the structure formed by the relations in its character-system (Woloch, 2003). A relation between two characters symbolises their co-presence in parts of the narrative; the entire set of relations between all characters constitutes a formal model of this character-system and lends itself to display and analysis. For example, Moretti (2011) used network modelling to compare the importance of protagonists from Shakespeare’s
Hamlet, while Trilcke
et al. (2015) created character networks for 465 German plays and used them to initiate a wider study of German Theatre.
Most applications of character network analysis have disregarded temporality, possibly because of its representational complexity. Consequently, all relations in the system are considered as happening at the same time: one cannot distinguish if a given edge symbolises a relation at the start, at the end, or in several parts of the work under study. Furthermore, because temporality is not being accounted for, there is usually no way of relating the network visualisation with the unfolding of the source narrative. While prototypes such as those discussed in Roberts-Smith
et al. (2013) offer sophisticate ways of dynamically visualising the text of theatre plays, they do so in a way that is unrelated to character network modelling.
Based on these observations, we set out to develop an open source web application which models the character-system of theatre plays as a sequence of network states synchronised with the actual narrative content ( https://github.com/maladesimaginaires/intnetviz). This paper proposes a high-level overview of our application , successively focusing on the underlying structure extraction process, the conception of the graphical interface, and the range of uses envisioned for it. In the conclusion, we evoke the ways in which we intend to develop it and reflect on the potential significance of this development at a more epistemological level.
The underlying workflow has been divided into two parts. First, the text of a play is processed using
Orange Textable (Orange Canvas (http://orange.biolab.si/) visual programming environment: in particular, the play is divided into its component parts (acts, scenes, lines) and the characters present in each scene are identified and associated with each line. These data are then imported into a web interface based on the open source
D3 JavaScript library (
The workflow is designed to facilitate the later inclusion of new data. The play used in the initial development phase was Molière’s
L’Ecole des femmes, as found in raw text format on the Project Gutenberg website (
http://www.gutenberg.org/files/43535/43535-0.txt). Theatre plays constitute a privileged input for the automatic extraction of structural information: such information is in general explicitly encoded in these data, as illustrated by the excerpt of Molière’s play reproduced on Figure 1. It is worth noting that the extraction phase can be readily adapted to take advantage of cases where structural information has been formally encoded using TEI-XML annotation for instance, such as the “Théâtre classique” database used by Karsdorp
et al. (2015). We are currently investigating the possibility of extending our approach to a large body of plays in this way.
Structure extraction is performed by a mostly linear chaining of segmentation and recoding operations based on regular expressions which gradually transform the raw text of the play into the tables that will be later used for controlling the web interface. Each table has the same number of rows, corresponding to the extracted lines of the play. The first table gives the text of each line (lightly annotated in XML) along with the associated character and stage directions (Table 1). The second table indicates the presence or absence of each character when each line is spoken (see Table 2).
Applied to
L’Ecole des femmes, the process was found to give fairly reliable results: about 98% of the lines were correctly extracted. Most errors could be fixed by determining
a priori the set of acceptable character names in a given play. An unexpected source of errors was the assumption that characters could be consistently identified by a single string; even without taking into account situations where a character is explicitly “renamed” as part of a surprise effect, linguistic processes such as the succession of indefinite and definite articles in French discourse lead to formal variations in the designation of characters.
Data files are then imported into a browser and D3 is used to build an interactive visualisation of the character-system (a demo is available at http://bit.ly/network-demo) where the state of the network is synchronised with the current line (Figure 2). Character nodes are positioned using a force-based layout. Browsing the text simultaneously updates the network, line, and position indicator (act, scene, line). At each step, the weight of edges (expressed by their width) increases with the number of co-presences between the characters up to this point. Similarly, the weight of nodes (their radius) increases with the number of lines spoken by this character.
A character node can be in one of four states. It is:
The latter state makes it possible to offer a view of the final network state, highlighting absences as well as presences in the earlier stages.
We display each line in front of the matching network state for a reason: the network does not exhaust the richness of the play. Since co-presence analysis does not consider the actual content of lines, it cannot–nor aims to–account for discursive references to characters. However, by extracting an actantial model from speech turns, the tool provides a concise and dynamic representation of the enunciative structure of the play. Paired with background knowledge about theatrical narration, such a visualisation offers new ways of reading.
From a philological perspective, relating sociological variables (gender, age, social status) with structural properties of the network (dynamical statistics, centrality measures, word and line counts) helps clarifying which profiles lead the interactions, which ones merely react to them, and which ones are excluded from them, revealing potential social stigma. By expliciting not only the presence but also the absence of links between certain characters, the proposed visualisation may contribute to a visual and interactive deconstruction of the plays (Derrida, 1967).
By allowing the user to browse through successive pictures of the interactions, the interface provides a unique opportunity to "play" the play and visualise the flow of speech between characters. We believe that this playful dimension is particularly interesting for pedagogic uses, especially when discovering a new narrative work with students.
Last but not least, our method can also be applied to any text in the course of the writing process. In this context, disposing of a visualisation of existing interactions at each moment of the text may help the writer distance herself from an impressionistic representation.
One of the most significant benefits of a digital humanities approach is the dialectic relation created between the development of a prototype and the verification of scientific hypotheses. Discussing seemingly trivial design issues like the number of colors requested to map the network relations has frequently led us to expliciting and fruitfully questioning differences in our epistemological backgrounds. Thus we consider the following perspectives not only as potential improvements to our tool but also, more importantly, as opportunities to challenge our own theoretical preconceptions:
By allowing the user to browse the content of a narrative and manipulate an interactive character network, our motivation is ultimately to contribute to a better integration of distant and close reading practices.