Cuneiform characters have been described using various systems in the past, and the systems used in the literature as well as in daily work vary from language to language and from discipline to discipline. Commonly, sign lists (Borger 1971, Borger 2004, Ruster 1989, Deimel 1947) are created and published as dictionaries in a non-machine-readable form. Similarly, for computers, the only way to distinguish cuneiform characters is currently to assign them different numbers in a list (e.g. Unicode (Unicode Staff, 1991)) and to distinguish them on this level. We are therefore left with many systems and numbers describing the same cuneiform sign (Figure 4). Rather than listing cuneiform signs, Gottstein (2012) took another approach and created a searchable cuneiform character encoding based on wedge types, which would be implemented in applications such as CuneiPainter.
Although it represents yet another encoding scheme, a machine-readable paleographic description could link all systems of cuneiform character description, as it directly describes a character's shape and positioning parameters. Scholars could easily register newly found characters in a machine-readable way, and such descriptions would provide the basis for computational analysis of the paleographic shapes of cuneiform characters. Ideally, this paleographic information would be integrated into the currently emerging Semantic Dictionaries for cuneiform (Homburg, 2017, 2018) to enrich linguistic linked open data and thereby benefit the respective scholars. In addition, a machine-readable paleographic description provides the basis for capturing the sign variants of characters currently encoded in Unicode. It is very common for one Unicode code point to have many sign variants with the same meaning across the centuries in which cuneiform was written. These sign variants have never been assessed digitally (only as sketches in books) and could provide valuable insights for philologists.
PaleoCodage builds on the description of Gottstein (2012) by using simple character descriptions for certain wedge types and by extending it with a set of relational descriptions inspired by the Manuel de Codage (Van den Berg, 1997).
Cuneiform wedges are distinguished as follows:
The system encodes relations between wedges, the most frequent of which are the following:
In addition, size variations of cuneiform wedges are common and can be encoded as follows:
Capital letters signify a bigger version (e.g. A instead of a), and wedges prefixed with a small s signify a smaller version (e.g. sa instead of a).
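To make the size conventions concrete, the following minimal sketch tokenizes an encoding string into wedges and relation operators. It assumes a hypothetical token grammar that is not specified in this paper: a wedge is one letter naming its type, optionally preceded by the small-size prefix s, and the only relation operators are : and _, the two symbols that appear in the example encodings quoted in this paper. The names tokenize and Wedge are illustrative only.

```python
import re
from dataclasses import dataclass

# Assumed token grammar (hypothetical, for illustration only):
#   wedge    = optional "s" size prefix + one letter naming the wedge type
#   operator = ":" or "_", the relation symbols seen in the example encodings
TOKEN_RE = re.compile(r"(s?[a-zA-Z])|([:_])")


@dataclass(frozen=True)
class Wedge:
    kind: str  # wedge type, normalised to lower case, e.g. "a" or "b"
    size: str  # "normal", "big" (capital letter) or "small" ("s" prefix)


def tokenize(code: str) -> list:
    """Split a PaleoCodage-style string into Wedge objects and operator strings."""
    tokens = []
    pos = 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None:
            raise ValueError(f"unexpected character {code[pos]!r} at position {pos}")
        wedge, op = m.group(1), m.group(2)
        if op is not None:
            tokens.append(op)
        elif len(wedge) == 2:  # "s" prefix marks a smaller wedge
            tokens.append(Wedge(kind=wedge[1].lower(), size="small"))
        elif wedge.isupper():  # capital letter marks a bigger wedge
            tokens.append(Wedge(kind=wedge.lower(), size="big"))
        else:
            tokens.append(Wedge(kind=wedge, size="normal"))
        pos = m.end()
    return tokens


print(tokenize("A:sa_b"))
# [Wedge(kind='a', size='big'), ':', Wedge(kind='a', size='small'), '_', Wedge(kind='b', size='normal')]
```

Under this assumed grammar, a lone s followed by an operator is read as a wedge of type s; if s were itself a wedge type, the prefix convention would need a different delimiter to stay unambiguous.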
Lastly, the angles of diagonal wedges may vary between characters, which required angle modifiers to be added to the encoding.
While scholars do not always agree on the order in which cuneiform wedges were drawn (Devecchi, 2015), PaleoCodage uses an order that is independent of this dispute, from left to right and then from top to bottom, in order to avoid ambiguities in cuneiform sign definitions. To facilitate the representation of displaced wedge groups, PaleoCodage also includes the following positioning modifiers: / (half the size down), ~ (half the size to the left) and # (half the size to the right), as well as < and > as rotation modifiers, which rotate the whole glyph. Further operators could be added if glyphs that currently cannot be modeled require them.
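The ordering rule and the three displacement modifiers described above can be illustrated with a small sketch. It assumes that wedges have already been placed at (x, y) positions with y growing downwards (a screen convention, not stated in this paper), reads "left to right, then top to bottom" literally as a sort on (x, y), and interprets "half the size" as half of a hypothetical group_size parameter. The rotation modifiers < and > are left out because their angle is not specified here. PlacedWedge, canonical_order and displace are illustrative names.

```python
from typing import NamedTuple


class PlacedWedge(NamedTuple):
    kind: str
    x: float  # horizontal position, growing to the right
    y: float  # vertical position, growing downwards (assumed screen convention)


# Offsets as fractions of the displaced group's size (assumed unit):
# "/" half the size down, "~" half the size to the left, "#" half the size to the right.
# The rotation modifiers "<" and ">" are not modelled here.
MODIFIER_OFFSETS = {
    "/": (0.0, 0.5),
    "~": (-0.5, 0.0),
    "#": (0.5, 0.0),
}


def canonical_order(wedges):
    """Sort wedges left to right, breaking ties from top to bottom."""
    return sorted(wedges, key=lambda w: (w.x, w.y))


def displace(wedges, modifier, group_size):
    """Shift every wedge of a group by the offset that a positioning modifier denotes."""
    dx, dy = MODIFIER_OFFSETS[modifier]
    return [w._replace(x=w.x + dx * group_size, y=w.y + dy * group_size) for w in wedges]


drawn = [PlacedWedge("b", 1.0, 0.0), PlacedWedge("b", 0.0, 1.0), PlacedWedge("a", 0.0, 0.0)]
print([(w.kind, w.x, w.y) for w in canonical_order(drawn)])
# [('a', 0.0, 0.0), ('b', 0.0, 1.0), ('b', 1.0, 0.0)]
print(displace([PlacedWedge("a", 0.0, 0.0)], "/", 2.0))
# [PlacedWedge(kind='a', x=0.0, y=1.0)]
```

Hand-drawn input rarely produces exactly equal x coordinates, so a practical version would probably need to group x positions within some tolerance before applying the top-to-bottom tie-break.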
A proof of concept is provided for a representative subset of 200 cuneiform Unicode characters.
Table 1: Cuneiform Encoding Examples
A similarity graph generated for verification purposes with the new encoding method (Figure 2) shows that the encoding can be used to identify subglyphs contained in other glyphs, which in turn is useful information to include in (Semantic) dictionaries. Further similarity measures on the encoding (e.g. string similarity) could reveal additional connections between cuneiform character representations.
Figure 2: Cuneiform character relations as a graph (excerpt). Using the encoding, the computer can now recognize, e.g., that the glyph IMIN3 (b:b:b_b:b:b_b) is contained in the glyph ilimmu3 (b:b:b_b:b:b_b:b:b). With the Gottstein system, such a conclusion could not be drawn, as the two glyphs would be classified as b7 and b9 respectively.
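The contrast drawn in the Figure 2 caption can be reproduced with a few lines of string processing. The sketch below collapses an encoding to per-type wedge counts, in the spirit of the b7/b9 classification mentioned in the caption (not a reimplementation of Gottstein's system), and compares this with a naive substring test for subglyph containment. It also computes a plain Levenshtein edit distance as one possible string similarity measure. The helper names are hypothetical, and substring containment is only a textual proxy: it does not check that the matched wedges are actually arranged as a spatial subgroup.

```python
import re
from collections import Counter

# The two encodings quoted in the Figure 2 caption.
IMIN3 = "b:b:b_b:b:b_b"
ILIMMU3 = "b:b:b_b:b:b_b:b:b"


def wedge_count_signature(code: str) -> str:
    """Collapse an encoding to per-type wedge counts, e.g. 'b7' (sizes ignored)."""
    kinds = re.findall(r"s?([a-zA-Z])", code)  # drop an optional "s" size prefix
    counts = Counter(k.lower() for k in kinds)
    return "".join(f"{kind}{n}" for kind, n in sorted(counts.items()))


def contains_subglyph(glyph: str, candidate: str) -> bool:
    """Naive test: the candidate's encoding occurs verbatim inside the glyph's."""
    return candidate != glyph and candidate in glyph


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


print(wedge_count_signature(IMIN3), wedge_count_signature(ILIMMU3))  # b7 b9
print(contains_subglyph(ILIMMU3, IMIN3))  # True
print(levenshtein(IMIN3, ILIMMU3))  # 4
```

The count signatures b7 and b9 carry no information about arrangement, whereas the full encodings still share the prefix that the substring test and the edit distance pick up.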
Given paleographic information encoded in a standardized way, users can draw a rudimentary shape of a character in order to identify the character they see in front of them (e.g. in a picture or on a tablet). This functionality is currently being implemented in CuneiPainter.
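One way such a drawing-based lookup could work, assuming the drawing has already been converted into an encoding string, is to rank dictionary entries by string similarity to that string. The sketch below uses the standard library's difflib.SequenceMatcher ratio as the similarity score. The dictionary contains only the two encodings quoted in this paper, and the drawn input is a hypothetical partial sketch with eight wedges. This illustrates the idea and does not describe how CuneiPainter implements it.

```python
from difflib import SequenceMatcher

# Tiny illustrative dictionary: only the two encodings quoted in this paper.
SIGN_ENCODINGS = {
    "IMIN3": "b:b:b_b:b:b_b",
    "ilimmu3": "b:b:b_b:b:b_b:b:b",
}


def rank_candidates(drawn: str, dictionary: dict, top_n: int = 5) -> list:
    """Return (sign name, similarity in [0, 1]) pairs, most similar first."""
    scored = [
        (name, SequenceMatcher(None, drawn, code).ratio())
        for name, code in dictionary.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# A hypothetical sketch with eight wedges, encoded the same way.
for name, score in rank_candidates("b:b:b_b:b:b_b:b", SIGN_ENCODINGS):
    print(f"{name}: {score:.3f}")
```

For the eight-wedge input, ilimmu3 ranks slightly above IMIN3, because the drawn string is a prefix of the longer encoding. A full dictionary would need a smarter index than a linear scan, but the ranking principle stays the same.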
Figure 3: Paleo Codage Input (JavaScript Application)
Figure 4: Cuneiform Numbering Systems: Semantic Dictionary for Ancient Languages