Peer-Reviewed Article

Accepting Editor: Takako Kato, DeMontfort University, Leicester.
Recommending Reader: Judy Quinn, University of Cambridge.
Recommending Reader: Philip Shaw, University of Leicester.
Received: October 25, 2013
Revised: August 1, 2014
Published: July 10, 2015

The project The Variance of Njáls saga examines variation in the sixty-three medieval and post-medieval manuscripts of Njáls saga from a linguistic, philological, and literary perspective. This saga is the most extensive of the Icelandic family sagas and is thought to have been composed around 1280. The following article describes methods used in the project to identify synchronic variation at a linguistic level in the fourteenth-century manuscripts of the saga, and aspects of an analysis of the stemmatic relationship between manuscripts. In both fields the development of computer-based approaches has advanced notably in the last few years. However, affordable solutions customised for end-users are still lacking. The project therefore focuses on easy-to-use tools that can be applied in a short-term project with limited financial and human resources. The manuscripts are transcribed according to the conventions of MENOTA (Medieval Nordic Text Archive) in TEI-XML; a segmentation allowing for an identification of corresponding contents is added; linguistic features relevant for an examination of variation are tagged; and structures relevant for a comparison are displayed for further analysis with the help of XSLT stylesheets. The article discusses challenges that lie in the peculiarities of medieval writing (non-standardised orthography, abbreviations) and tries to outline practicable solutions. Initial results of comparisons of manuscripts based on this approach show variation not only in the semantic domain (substitution of words), but also in the syntactic domain (change of word order, usage of different syntactical constructions).

Keywords: Old Icelandic; Njáls saga; linguistic variation; XML-encoding; linguistic annotation.

§ 1    Njáls saga is the subject of a research project located at the Árni Magnússon Institute in Reykjavík which aims to re-evaluate the textual transmission of the saga. The last critical edition (Gíslason and Jónsson 1875) appeared more than 135 years ago, and work done in the project will be a contribution to the preparation of a modern critical edition. The short-term scientific interest of this project, however, is: a) a revision of the stemma of the manuscripts which was outlined by Einar Ólafur Sveinsson (1953) some sixty years ago and b) an investigation of variation in the manuscripts from different scientific angles (material philology, linguistics, stylistics, literary studies).

§ 2    Part of this research is the investigation of synchronic linguistic variation in the oldest manuscripts of the saga from the fourteenth century. In this article I give a short outline of how the corpus for this research is compiled and prepared for linguistic analyses, give one example of an application of the methods on a limited part of the corpus, describe some of the typical problems of work with medieval manuscripts in the context of digital humanities, and try to sketch possible solutions for some of these problems.

Njáls saga

§ 3    Njáls saga is usually seen as the pinnacle of the development of the Icelandic family saga, a genre which is not only Iceland's most important contribution to world literature, but also by far the largest corpus of original (i.e. non-translated) medieval prose texts we have in any Germanic language, which makes it especially interesting for historical-linguistic research. The text was presumably composed around 1280.[1] The special status of Njáls saga in the literary consciousness of the Icelanders, comparable to the role of Shakespeare's Hamlet in English, Goethe's Faust in German, or Dante's Divina Commedia in Italian literature, is due to the stylistic and compositional quality of the text. With approximately 100,000 words, Njáls saga is the longest Icelandic family saga. Its popularity is emphasised by the unusually large number of medieval and post-medieval manuscripts (eighteen manuscripts between 1300 and 1500 and forty-five from the sixteenth to the nineteenth century are still extant), and citations from the saga have reached the status of proverbs used in everyday communication.

Examples of variation

§ 4    The following short sample from the beginning of chapter sixty-three, which deals with the so-called Battle at Knafahólar, is intended to illustrate the typical kinds of variation we find between the manuscripts of the saga. The text follows what is apparently the oldest manuscript, the fragment Þormóðsbók (AM 162 B fol. δ) from around 1300; variants come from manuscripts from about the same period, Gráskinna (GKS 2870 4to, ca. 1300) and Reykjabók (AM 468 4to, ca. 1300-1325).[2]

§ 5    The transcription is diplomatic (see The diplomatic level: How to reconcile linguistic principles, practicability and philological traditions? below), i.e. the orthography of the original is kept (except for variation on the allographic level), abbreviations are expanded and italicised, upper-/lower-case letters and punctuation are kept, line breaks, page breaks and column breaks are indicated with superscript numbers or letters. To facilitate a comparison of the different manuscripts, sentence numbers (see the Segmentation section below) are displayed.

§ 6    The English text follows Robert Cook's translation, which is based on the text from the mid-fourteenth century codex Möðruvallabók (AM 132 fol.). Möðruvallabók has not yet been included in the text corpus of the project. To facilitate a comparison of the English translation and the Old Icelandic original, deviating readings in Möðruvallabók (apart from differences in spelling and use of tenses) have been added in brackets to the transcription from Reykjabók. The additions are based on Andrea van Arkel-de Leeuw van Weenen's transcription (1987).

Þormóðsbók (AM 162 B fol. δ), p. 11v B:

§ 7    om bardaga

§ 8    1 NU eggiar Starkaðr ſina menn9 2 oc ſnero fram i neſit at þeim . 3 Sigurðr 10 ſuinhofþi for fyrſt oc hafði 11 torgv ſciolld rendan en ſverð i 12 annarri hendi 4 Gunnarr ſer hann oc ſcytr 13 til hanſ af boganom . 5 hann bra upp við 14 ſcilldinom er hann ſa at orin flo hatt 15 oc flo orin igegnom ſciolldinn oc au16gat ſua at vt flo ihnaccann oc uarð 17 þat uig fyrſt . 6 Annarri avr ſkavt Gunnarr at 18 vlfheþni heima manni ſtarkaðar oc kom ſv 19 a hann miðian . oc fell hann a fetr bon20da ſinom . en bundinn fell um hann þveran 21 7 kolſceggr kaſtar ſteini ihofoð bond22anom oc varð þat hanſ bani .

Gráskinna (GKS 2870 4to)

§ 9    30 1 <S>iþan egiaði ſtarkaðr menn ſina . 2 ſnva þeir nv framm ineſit at [40v] 1 þeim . 3 Sigurðr ſvinhavfði for fyrſtr oc hafði tavrgo ſkiolld ein2byrðan enn ſviðv i annarri hendi . 4 Gvnnarr ſer hann oc ſkytr af bog[a]ganom 3 5 hann bra vp ſkilldinom er hann ſa orina hatt flivga oc kom avrin i gegnom 4 ſkiolldinn . oc i avgat ſva at vt kom vm hnakkann oc varð þat vig 5 fyrſt . 6 annarri avr ſkavtt Gvnnarr at vlfheðni raða manni ſtadkaðar oc 6 kom a hann miðian . oc fell hann fyrir fœtr einom . oc fell bondinn vm hann . 7 kolskeggr 7 kaſtaði til ſteini oc kom ihavfvt bondanom . oc var þat hans bani .

Reykjabók (AM 468 4to) (Möðruvallabók, AM 132 fol.)

§ 10    [32v] 26 1 Siðan eg[g]iaði ſtarkaðr [ſin]a menn . 2 ſnua þeir 27 þa fram i neſit at þeim 3 [Sigurðr] ſvinhofði for fyrſtr ok hafði tor28gv ſkiolld einn rauðan (M: einbyrðan) en ſuiðv iannarri hendi . 4 Gunnarr ſer hann ok ſkytr til 29 hanſ afboganvm . 5 hann bra vpp hat[t] (M: -) ſkildinvm er hann ſa aurina hatt flygia ok kom 30 orin igegnvm ſkioldinn ok iaugat sva at vtt kom i (M: -) hnackann okvarð þat vig fyrſt 6 [33r] 1 annarri aur ſkaut Gunnar[r] (M: hann) at vlfheðni manni (M: ráðamanni) ſtar[c]kaðar okkom ſv a hann miðian okfell hann fyrir fetr bo2anda einvm ok (M: +fell) bondinn v[m] hann . 7 kolſkeggr kaſtar til ſteini okkom ihofut bondanvm okvarð þat hanſ 3 bani

English translation by Robert Cook

§ 11    1 Starkad then urged his men on; 2 they headed toward those who were on the point of land. 3 Sigurd Swine-head was out in front and was holding a small round shield, and a hunting-spear in the other hand. 4 Gunnar saw him and shot at him with his bow. 5 Sigurd raised his shield when he saw the arrow flying high, but it went through the shield and into his eye and out at the back of his neck. That was the first slaying. 6 Gunnar shot another arrow at Ulfhedin, Starkad’s overseer, and it struck him in the waist and he fell at the feet of a farmer, and the farmer tripped over him. 7 Kolskegg threw a stone and it hit the head of the farmer, and that was his death.

Variation on the lexical level

§ 12    Certain types of variation on the lexical level, for example the use of coordinating conjunctions (oc/ok [and], en [but], or no conjunction), the use or omission of anaphoric subject pronouns[3] (þeir [they]) in coordinated main clauses or the denomination of Úlfhéðinn as heimamaður (roughly [farmhand]), ráðamaður (roughly [bailiff]) or simply maður ([man], manni being the dative form), have to be regarded as deliberate changes of the text and thus as being of potential stylistic value (it should be pointed out that statements about the direction of the change or relationships between single manuscripts are not intended at this point).

§ 13    Other instances of lexical variation are of lesser interest for the part of the project that deals with synchronic linguistic variation and is mostly concerned with grammatical features, but are in some instances highly interesting in connection with stemmatological questions. Those variants are the result of unconscious changes due to misreadings by a scribe. A good example is the description of the weapons Sigurd Swine-head is using: a sword (sverð) in Þormóðsbók, but a certain type of spear used for hunting (sviða) in Gráskinna and Reykjabók. In the Icelandic family sagas (the search was based on the forty-three different texts in “Snerpa.is.” 2015), the word sviða is much less frequent than the word sverð (a total of seven instances in forty-three different texts compared to several hundred instances for sverð); an (unconscious) change from sviða (found in Gráskinna and Reykjabók) to sverð is more likely than the opposite development (see also Sveinsson 1953, 77). For the description of Sigurður's shield the version of Gráskinna seems to fit the plot best. The information that the shield is einbyrður [single-layer] is useful because it explains why Gunnar's arrow so easily penetrates it. In comparison to this, the descriptions of the shield as red (in Reykjabók) or round (in Þormóðsbók) are superfluous or tautological. In the family sagas, the adjective einbyrður is also far less frequent than rauðr (rendr is rather uncommon too). It is tempting to hypothesise that einbyrðan was the original reading that was split up into two different words, einn and byrðan, by a scribe because it was separated by a line break in his exemplar (as is the case for example in Gráskinna, in the transcription above). However, an adjective byrðr [boarded] makes absolutely no sense in this context and would thus have been corrected, possibly to round or red, and also einn (either a numeral one or an indefinite pronoun a certain) would in this case have been superfluous at best: of course Sigurd Swine-Head uses only one shield, and the use of a postpositive indefinite pronoun (a certain/some kind of target-shield) would be rather marked in this context and not appropriate in connection with a rather common weapon such as a törguskjöld (a small round shield with a buckle). This would lead to the assumption that both Þormóðsbók and Reykjabók provide younger readings than Gráskinna (and also Möðruvallabók).

Linguistic variation

§ 14    The earliest manuscripts of Njáls saga exhibit several examples of synchronic grammatical variation that pose severe theoretical problems for linguistic approaches dealing with linguistic universals. A typical example that shows up in the text examples above is the order of the reflexive possessive pronoun sinn and its governing noun (sína menn/menn sína [his men]. Examples are given in Modern Icelandic spelling). Language typology claims that the order of nouns and attributes, in this case the noun maðr [man] and the reflexive possessive pronoun sinn [his], is one of the features that determines the type of a language and should thus be stable.[4] On grounds of this typological axiom, variation in this core area of syntax in different copies from the same text and produced at the same time is not to be expected. Nevertheless this variation can be observed, and fortunately linguistics is able to provide descriptive models to deal with the empirical fact of linguistic variation.
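
The variable described in this paragraph can be tallied mechanically once the text is available in normalised spelling. The following is only a minimal sketch under stated assumptions: the three readings are my own normalised renderings of sentence 1 of the sample above, and the form lists and the function name possessive_order are hypothetical illustrations, not part of the project's tool chain.

```python
# Illustrative sketch only: the form lists cover just the forms that occur
# in the sample sentence and are assumptions, not project data.
POSSESSIVE_FORMS = {"sína"}   # forms of the reflexive possessive sinn
NOUN_FORMS = {"menn"}         # forms of the noun maður


def possessive_order(tokens):
    """Classify the first adjacent pair of possessive and noun in a token list."""
    for left, right in zip(tokens, tokens[1:]):
        if left in POSSESSIVE_FORMS and right in NOUN_FORMS:
            return "possessive-noun"
        if left in NOUN_FORMS and right in POSSESSIVE_FORMS:
            return "noun-possessive"
    return None  # no adjacent pair found


# Assumed normalised renderings of sentence 1 in the three witnesses.
readings = {
    "Þormóðsbók": "nú eggjar Starkaður sína menn",
    "Gráskinna": "síðan eggjaði Starkaður menn sína",
    "Reykjabók": "síðan eggjaði Starkaður sína menn",
}

orders = {ms: possessive_order(text.split()) for ms, text in readings.items()}
```

A real investigation would of course need the full paradigms of sinn and of the governing nouns, or morphological tagging, rather than hand-written form lists.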

§ 15    A highly applicable model is Coseriu's (1988) model of a language as a system of different varieties that can be arranged on three different levels: location, social group, and circumstances. A language system at a certain time consists thus of different regional dialects, social dialects, and styles. If we act on the assumption that the variation in the use of certain grammatical constructions that can be observed in manuscripts of Njáls saga from about the same time can neither be attributed to historical developments nor to regional or social dialects, we have to deal with these types of variant as stylistic features.

§ 16    It is obvious that there is a connection between historical developments in a language, sometimes involving linguistic contact, and the style of certain types of texts. An example from Modern Icelandic that involves word order and may be comparable to variation in the order of noun and pronoun in the sample texts is the reverse order of negation and finite verb (negation-finite verb instead of finite verb-negation as in the standard language) in conjunctional clauses in certain types of religious and political texts that can be traced to developments in the history of Icelandic (Zeevaert 2009, 288). Nevertheless, in the modern Icelandic case the decision to use one of the two alternative structures is dependent not on historical-linguistic developments, but on (extralinguistic) circumstances (subject of the text, target audience, purpose). I hypothesise that for cases like the variation in the order of noun and attribute in the samples from the Njáls saga-manuscripts above, the same is true. To provide evidence for this hypothesis it is necessary to describe the usage of grammatical variables in different manuscripts on a larger scale in order to be able to attribute this variation to stylistic decisions dependent on certain non-linguistic circumstances.

Automatic Collation/Stemma

§ 17    As was mentioned above, an examination of the relationships between the manuscripts and, if necessary, a revision of the stemma set up by Einar Ólafur Sveinsson (1953) is one of the main goals of the project. This examination is based on XML-transcriptions of the individual manuscripts that by and large follow the Menota-guidelines. Menota is an acronym for Medieval Nordic Text Archive, an internet resource publishing digital medieval Scandinavian texts. The guidelines, which are based on the TEI-standard of text-encoding but include modifications necessary for dealing with specific properties of Scandinavian manuscripts, are an excellent and exhaustive description of the procedure of encoding texts for this archive. In some rare cases a modification of the Menota-guidelines was necessary to meet specific demands of our project.

§ 18    In the last few years, the development of computer-based collation and production of stemmata has advanced notably. The last step, that is to say software to collate manuscripts and to generate stemmata that are customised for the end-user, is still lacking. Juxta, an open-source tool for comparing and collating multiple witnesses to a single textual work (About Juxta 2015), originally developed at the University of Virginia, comes quite close to this ideal. Unfortunately the desktop version of the software shows some shortcomings that make it difficult to use with Old Icelandic texts. Icelandic characters like á, ð, æ, þ, ó or é, not to mention peculiarities of medieval Icelandic manuscripts like r-rotunda or insular f (ꝼ), are not displayed (see Fig. 1); and even more serious is the fact that some rather important variants were not found by the program. The difference between the text witnesses Þormóðsbók (the base text for the comparison) and Óssbók (AM 162 B fol. γ = gamma.txt) on the one hand and Gráskinna and Reykjabók on the other hand is that the first two manuscripts treat it as a fact that a certain Þiðrandi Síðu-Hallssonur was slain by dísir (female guardian spirits), whereas the other two witnesses treat this rather as a rumour.[5]

Figure 1: Extract from the critical apparatus in Juxta (for the complete text in the different manuscripts see Figure 2)

§ 19    The use of the collation software Collate (description available in Kondrup 2011, 469) is a rather unrealistic option, as it runs only on older Macintosh computers and neither technical nor other support is available. Its successor, CollateX (developed within the framework of the COST action Interedition), is not yet ready for public use. However, a web application with restricted functionality is available (http://collatex.net/demo/) and gives quite promising results (see Fig. 2).

Figure 2: Collation of the same sentence as in Figure 1 in Collate console

§ 20    The beta version of Juxta Commons, the online version of Juxta, does not exhibit the same problems with the handling of non-US-ASCII characters as the desktop version. Recently a function to compile a critical apparatus was added, and the results are much more reliable than the ones from Juxta, although still not completely error-free (see Fig. 3: in this example, the text from Þormóðsbók (W1) and Óssbók (W2) is represented correctly in the apparatus, whereas for the text from Gráskinna (W3) the words þann and are omitted erroneously). From experience, however, accessibility and maintenance are generally problematic issues in connection with web-based solutions.

Figure 3: Collation of the same sentence as in Figure 1 and Figure 2 in Juxta Commons

Compiling and preparing the corpus

The purpose of transcribing

§ 21    The purpose of transcribing manuscripts is quite obvious. Medieval manuscripts are precious unique copies with limited accessibility. Transcriptions reproduce the relevant contents of the original and adapt its form to the demands of contemporary readers. It is less obvious, though, which features of the manuscript are relevant and to what extent its text has to be prepared for the reader. It is self-evident that an analysis of letter forms or the use of abbreviations in a manuscript cannot be performed on an edition with normalised and modernised orthography. In editions aimed at a general audience or intended for use in educational contexts, however, a normalised text is the appropriate choice because it avoids irritation caused by variation in spelling, scribal errors, or deviation from the linguistic standard familiar from dictionaries and grammar books.

§ 22    One of the main advantages of the TEI-XML-format is that it allows for an encoding of additional information in a transcription and gives the opportunity to choose which information is to be suppressed and how the chosen features are to be displayed on the computer screen or represented in files for further steps of analysis or preparation. This makes it possible to use the transcriptions not only as a basis for computer-aided collating and producing of stemmata, but also for a comparison of different manuscripts with regard to, for example, linguistic variation, and for editions of single manuscript witnesses.

Levels of transcription

§ 23    In the project The Variance of Njáls saga, the text is transcribed in three parallel versions or levels (<facs>, a type-facsimile transcription, <dipl>, a diplomatic transcription, and <norm>, a normalised transcription), an approach that is suggested in the guidelines for transcriptions in the Medieval Nordic Text Archive.

Figure 4: Transcription on three levels. Special letter forms/characters are encoded as entities (for example ſ as &slong;)
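
How one of the three levels can be selected from such a transcription may be sketched as follows. This is a minimal illustration and not the project's own code: the sample markup is a simplified, hypothetical fragment modelled on the Menota convention of placing the levels inside a choice element, the namespace URIs are assumed, the content of the facs level is a placeholder, and the long s is written as a literal character rather than as the entity &slong; because no entity declarations are loaded here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for TEI and for the Menota extension elements.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "me": "http://www.menota.org/ns/1.0",
}

# Hypothetical, simplified fragment: two words, each transcribed on three levels.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0"
   xmlns:me="http://www.menota.org/ns/1.0" n="1">
  <w><choice><me:facs>ſina</me:facs><me:dipl>ſina</me:dipl><me:norm>sína</me:norm></choice></w>
  <w><choice><me:facs>menn</me:facs><me:dipl>menn</me:dipl><me:norm>menn</me:norm></choice></w>
</s>"""


def level_text(sentence, level):
    """Join the readings of one level ('facs', 'dipl' or 'norm') for all words."""
    parts = []
    for word in sentence.findall("tei:w", NS):
        # Look up the requested level inside the word's choice element.
        reading = word.findtext(f"tei:choice/me:{level}", default="", namespaces=NS)
        parts.append(reading)
    return " ".join(parts)


sentence = ET.fromstring(SAMPLE)
diplomatic = level_text(sentence, "dipl")
normalised = level_text(sentence, "norm")
```

The same selection could equally be expressed as an XSLT template; Python is used here only to keep the illustration self-contained.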

§ 24    The <facs>-level tries to reproduce the text of the manuscript in the style of a type facsimile: letter forms, abbreviation signs, punctuation, and layout are reproduced as accurately as possible using a special character set. The <facs>-level has proven to be a very useful tool for example in connection with analysing the use of abbreviations in certain manuscripts. Unfortunately, a transcription of the <facs>-level is very time consuming. It is not yet implemented in all manuscripts, and we decided to postpone it for the remaining manuscripts until the transcription of the project corpus is completed on the <dipl>- and <norm>-levels.

§ 25    The <norm>-level is a transformation of the diplomatic level to modern Icelandic spelling. A normalised orthography makes it, for example, much easier to search for words or morphological structures. The decision to use the modern Icelandic standard creates a certain distance between the transcription and the language of the original (although the distance between contemporary and medieval Icelandic is much smaller than in other European languages). The advantage in comparison to a reconstructed historical normalised orthography as, for example, used in the ONP (Ordbog over det norrøne prosasprog) is its easier applicability and better documentation which helps to avoid errors and inconsistencies during transcribing.

§ 26    No morphosyntactical adaptation of the language of the manuscripts is made on the <norm>-level: thus, for example, forms like vér and þér (modern Icelandic við [we] and þið [you], 1PL and 2PL of the personal pronoun) or em (modern Icelandic er [am], 1SG.PRS.IND of vera [to be]) are kept. This decision ensures an unambiguous approach to the normalisation of the manuscript texts (adaptations to modern language use only in the phonological but not in the morphological domain). However, it entails some limitations with respect to the straightforwardness of searches of words and forms and leads also to anachronistic combinations of the morphological and phonological shape in phrases like ég em maður skapharður (Modern Icelandic ég er maður skapharður, normalised Old Icelandic ek em maðr skapharðr [I'm a man harsh of mood]). Historical phonological changes from after the time of the writing of the manuscript, like the epenthesis of u after a consonant and before a word-final r (in the text example above the names Starkaðr and Sigurðr become Starkaður and Sigurður on the normalised level) or the shortening of final rr in weakly stressed endings (in the example from Þormóðsbók above the name Gunnarr becomes Gunnar on the normalised level), also affect morphological endings, which makes it, in some cases, difficult to distinguish between morphological and purely phonological change.

§ 27    The graphic distance from the original does not constitute a disadvantage for linguistic analyses of the text, which in any case have to be based on the <dipl>-level, but is a precondition for the successful use of collation software. An issue with machine-based approaches to manuscript collation is the difficulty of distinguishing between important and unimportant variants. In most cases, variation on the graphic level is not of interest for stemmatological questions; a normalised text is thus able to sieve a considerable amount of irrelevant variation. For the examples of comparisons of manuscripts with CollateX, Juxta, and Juxta Commons above the normalised versions were used. A positive side-effect of the modern Icelandic standard is the better accessibility of the text to an Icelandic audience.
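
The filtering effect of normalisation can be illustrated with a small token-level comparison. The following is a minimal sketch using Python's standard difflib module as a stand-in for a collation tool; it is not how CollateX or Juxta work internally. The diplomatic strings are simplified from sentence 4 of the samples above (editorial brackets and line numbers removed), and the normalised strings are my own assumed renderings.

```python
from difflib import SequenceMatcher


def variants(base, other):
    """List (operation, base tokens, other tokens) for every non-matching span."""
    a, b = base.split(), other.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Sentence 4, diplomatic level (simplified from the samples above).
dipl_thormodsbok = "Gunnarr ſer hann oc ſcytr til hanſ af boganom"
dipl_graskinna = "Gvnnarr ſer hann oc ſkytr af boganom"

# Sentence 4, normalised level (assumed renderings in modern spelling).
norm_thormodsbok = "Gunnar sér hann og skýtur til hans af boganum"
norm_graskinna = "Gunnar sér hann og skýtur af boganum"

dipl_variants = variants(dipl_thormodsbok, dipl_graskinna)
norm_variants = variants(norm_thormodsbok, norm_graskinna)
```

On the diplomatic level the purely graphic differences (u/v, c/k) are reported alongside the textual one, whereas on the normalised level only the omission of til hans remains.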

The diplomatic level: How to reconcile linguistic principles, practicability and philological traditions?

§ 28    A diplomatic version of the manuscript text which keeps the spelling of the manuscript but expands abbreviations, corrects errors, and normalises allographs without phonological value seems to be most appropriate for linguistic (apart from palaeographical) purposes. Such a version is represented by the <dipl>-level of our transcriptions.[6] Unfortunately clear descriptions of how to approach diplomatic transcriptions of Old Icelandic manuscripts are still lacking, a fact already mentioned by Knirk (1985, 612). The editorial practice of the Editiones Arnamagnæanæ, the series of critical editions of Old Icelandic texts published by the Arnamagnæan institutions in Copenhagen and Reykjavík, seems, to a large extent, to build on traditions developed in those institutions that partly reflect technical limitations now resolved (Jensen 1989, 211; Sigtryggsson 2005, 265-268). Aspects of this practice are documented in the introductions of some of the Arnamagnæan editions (for example in Chesnutt 2006, LXV-LXVIII); Knirk (1985) and Jóhannes B. Sigtryggsson (2005) give descriptions of the treatment of punctuation, word division, capitalisation, graphemes, expansion of abbreviations, and corrections of scribal errors.

§ 29    Unfortunately, however, some rather basic features like the non-distinction of letter forms/signs which render no phonological distinction are disputed in the literature: the distinction between <s> and <ſ>, for example, is described as solely palaeographical by Driscoll (2006, 255) but as potentially phonological (with <s> for /s:/ and <ſ> for /s/), at least in the oldest manuscripts, by Gunnlaugsson (2003, 202) and Haugen (2004, 92).

§ 30    Medieval manuscripts were written for contemporaries and not for twenty-first-century linguists, and to a modern reader medieval writing practice may often appear to be unsystematic. In Þormóðsbók (Knirk 1985, 609 giving examples from other manuscripts), the round form of s is used in different functions that partly overlap: most commonly (in approximately seventy percent of the cases) it is used in connection with abbreviations – obviously because, by contrast with <ſ>, it leaves space for superscript letters or abbreviation marks – and at the end of words.[7] It is less common that round <s> is used instead of a geminate consonant (approximately twelve percent of the cases)[8] or at the beginning of proper nouns (approximately eight percent). The two latter functions are also fulfilled by the small capital forms of g, n, and r (<ɢ>, <ɴ> and <ʀ>), which, in contrast to round <s>, exhibit clearly distinguishable forms for capitals and minuscules. Such a distinction on formal grounds is not possible for s. A distinction according to function, for example using <ſ> in the transcription for <s> with superscript abbreviation signs, superscript <s> and short final <s> in the manuscript and <s> in the transcription for <s> in the beginning of names and when it stands for /s:/ in the manuscript, seems not to be applicable, not only because of the functional overlap of some instances of <s>, but also because it would violate the principle of keeping the spelling of the manuscript and would not reduce phonologically allographic variation, but rather establish a phonological differentiation that is graphically not systematically present in the manuscript.

§ 31    A comparable issue is the distinction between u and v. The use of the round or the pointed variant is partly characteristic of certain Latin scripts (mainly due to the use of different writing tools), but often <u> and <v> are used as allographic variants, to some extent distributed according to their position in the word; <u> is, for example, typical for Carolingian minuscule (Derolez 2006, 52), the pointed form <v>, however, is reintroduced in the ninth century and used particularly in the beginning and to a lesser extent at the end of words (Bischoff 1986, 156). A differentiation between two graphemes <u> and <v> is post-medieval (Mazal 1986, 82). In Icelandic manuscripts the situation is complicated by the use of insular v <ꝩ>,[9] originally developed in England. It is traditionally transcribed as <v>. In Þormóðsbók it is mainly used word-initially, but in a substantial number of cases it is also used in medial position, and in rare cases in final position. It is clear from writings like <hlꝩt>, <hlvt>, and <hlut> for hlut (ACC.SG of hlutr [thing]) that there is free variation between the three allographs (i.e. no complementary distribution), although clear preferences for certain positions are visible (<ꝩ> 80.5% initial, <v> 51% initial, <u> 52.5% medial).[10]
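
Positional preferences of the kind quoted above can be obtained by a simple count over a facsimile-level word list. The sketch below is a hedged illustration only: the token list is a toy example, not data from Þormóðsbók, and the function name position_shares is hypothetical.

```python
from collections import Counter


def position_shares(tokens, char):
    """Percentage of occurrences of char in initial, medial and final position."""
    counts = Counter()
    for token in tokens:
        for index, letter in enumerate(token):
            if letter != char:
                continue
            if index == 0:
                counts["initial"] += 1
            elif index == len(token) - 1:
                counts["final"] += 1
            else:
                counts["medial"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {position: round(100 * n / total, 1) for position, n in counts.items()}


# Toy word list; a real count would run over the complete facsimile level.
toy_tokens = ["vig", "hlvt", "vt", "ſva", "nv"]
shares = position_shares(toy_tokens, "v")
```

Running the same function for each allograph in turn would yield a table comparable to the percentages given in the text.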

§ 32    Thus both <ꝩ> and <v> are rendered as v on the diplomatic level, whereas <u> and <v> are kept apart because they represent different phonemes in early Icelandic manuscripts (Hreinn Benediktsson 1965, 26), as in Modern Icelandic. The letters c, q, and k are distinguished on the diplomatic level. Their distribution in Icelandic manuscripts is for the most part conditioned by the graphic context. In contrast to the distribution of r-rotunda and normal r, however, where r-rotunda, at least originally, was mainly used after the round letters o and ð, this distribution is primarily phonologically motivated. According to Hreinn Benediktsson (1965, 30-31), c was replaced by k before front vowels in medieval texts written in Germanic languages in several countries. In Medieval Latin pronunciation, the unvoiced velar stop, represented by c in the Latin alphabet, was palatalised before front vowels, and the usage of the variant k clarified that this pronunciation was not to be applied in this context in Germanic languages. When the influence of Latin writing was reduced by the increasing number of manuscripts in Icelandic, this context became obscured and <c> and <k> were used as variants without regard to phonological/graphic context. Disregarding cases like the exclusive usage of c for the Roman numeral for one hundred and as the first letter in combinations that stand for the long unvoiced velar stop (<cc> and <ck> are used in the manuscript, but not <kk>), it has to be stated that the allographs <c> and <k> are used in free variation in Þormóðsbók, but with a pronounced tendency to prefer k word-initially and c word-finally (although the latter tendency becomes much less prominent if the conjunction oc [and] is not taken into account).[11] The letter q is exclusively used before u/v; c and k are not used in this position. This practice does not rest upon a phonological difference in Old Icelandic but is ultimately adopted from Latin orthography (Hreinn Benediktsson 1965, 33).

§ 33    Traditionally the decision on which allographs to keep in an edition and which not seems to rely at least partly on modern usage; distinctions involving allographs not present in the language at the time of the edition tend to be levelled out. Variants such as r-rotunda and normal r or insular f and normal f are more endangered than those still in use (for example <u> and <v> or <q> and <k> to the present day in English and German, <ſ> and <s> until the beginning of the twentieth century in Denmark). In this article, the r-rotunda used in Þormóðsbók and Gráskinna had to be replaced with the normal r in the example-texts from those manuscripts to avoid problems with the display of this character in browsers. Against this background it seems to be acceptable that the treatment of allographs is not based on phonological reasoning alone but considers also to a certain extent the traditions of Old Norse philology, and that not only phonological differences from the time of the writing of the manuscript are rendered, but also earlier and later developments. After all, the manuscripts dealt with in the project cover a period of more than five hundred years.[12]


§ 34    Another typical issue in transcriptions of Medieval Icelandic texts is the expansion of abbreviations. In comparison to Continental manuscripts, Icelandic manuscripts are characterised by a particularly heavy use of abbreviations. Icelandic abbreviation practice, which is mainly based on insular Latin traditions, exhibits a fairly unequivocal system. The problem for the transcription is therefore not so much identifying what an abbreviation stands for as deciding which orthography to use for the expanded form. In most cases abbreviations consist of highly conventionalised symbols that stand in an iconic relationship to the characters they represent (in the sense that the abbreviation sign renders the form of the letters it stands for). This iconic relationship, however, is partly obscured by the fact that letter forms have changed since the time of the creation of the abbreviation sign. The abbreviation for ra originally represents an open a (Cappelli 1967, XXVII), which is not used in Icelandic manuscripts from the fourteenth century. Furthermore, phonological change that is rendered in the orthography is not necessarily visible in abbreviation signs. The abbreviation for er is frequently used to abbreviate the word-ending that in the oldest manuscripts is written out -er, but from the thirteenth century to the end of the Middle Ages ‑ir (Þórólfsson 1925, XXIIf.).

§ 35    The usual practice is to expand abbreviations according to comparable unabbreviated forms used by the scribe and, in the (usual) case of orthographic variation, to use the most frequent forms. In other words, a diplomatic edition should seek to render the text as it was intended by the scribe.
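The expansion rule described here can be illustrated with a minimal sketch in Python. It assumes that the written-out forms of one scribe are available as a plain list of strings; the function name preferred_expansion and the sample word forms are hypothetical and serve only as an illustration, they are not data from the project.

```python
from collections import Counter


def preferred_expansion(written_out_forms, candidates):
    """Return the candidate spelling that the scribe writes out most often.

    written_out_forms: unabbreviated word forms by the same scribe.
    candidates: possible expansions of one abbreviation sign, e.g. ("er", "ir").
    Returns None if no written-out form ends in any of the candidates.
    """
    counts = Counter()
    for form in written_out_forms:
        for ending in candidates:
            if form.endswith(ending):
                counts[ending] += 1
    if not counts:
        return None  # no evidence: the decision stays with the editor
    return counts.most_common(1)[0][0]


# Invented sample of written-out forms (hypothetical, not from a manuscript)
sample = ["synir", "gestir", "faðer", "móðir"]
print(preferred_expansion(sample, ("er", "ir")))
```

In this invented sample three forms end in -ir and one in -er, so the abbreviation would be expanded as -ir. A real application would have to restrict the count to the relevant grammatical endings rather than to any word ending in these letters.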


§ 36    To facilitate a comparison between different manuscripts, a segmentation was added to the text. This segmentation makes it easier for collation programs to deal with differences caused by additions, omissions, or rearrangements of larger chunks of text, but it is also of great use for a comparison of linguistic features because it facilitates the finding of text corresponding to a certain structure with regard to context but deviating formally. In contrast to other major medieval literary, especially poetic, works (for example the Old French Chanson de Roland, Wolfram's Parzival, or Dante's Commedia), a generally recognised or self-evident reference system does not exist for Njáls saga. We therefore decided to introduce a system based on the smallest self-contained textual unit, the sentence. In this context the notion of sentence does not refer to a syntactical unit, but is used with regard to (semantic) contents. A similar system (chapter and verse) is used very successfully to identify corresponding textual units in the Bible. This segmentation was based on the latest edition of the text (Egilsson 2003) which adds punctuation marks according to modern usage that could easily be used to identify beginnings and endings of sentences. The TEI-conventions allow for the implementation of a numbered segmentation based on sentences which is compatible with a more precise syntactical segmentation.
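How such a reference system supports comparison can be sketched as follows. The sketch assumes that each transcription has already been reduced to a mapping from (chapter, sentence) numbers to text; the sigla A and B, the function name align_segments, and the placeholder strings are hypothetical.

```python
def align_segments(witnesses):
    """Group sentences from several witnesses by (chapter, sentence) number.

    witnesses: {siglum: {(chapter, sentence): text}}
    Returns {(chapter, sentence): {siglum: text or None}}, sorted by reference.
    A value of None marks a sentence that is missing in that witness.
    """
    references = sorted({ref for segments in witnesses.values() for ref in segments})
    return {
        ref: {siglum: segments.get(ref) for siglum, segments in witnesses.items()}
        for ref in references
    }


# Placeholder content only; the strings stand in for transcribed sentences
witnesses = {
    "A": {(8, 1): "sentence 8,1 in A", (8, 2): "sentence 8,2 in A"},
    "B": {(8, 1): "sentence 8,1 in B", (8, 3): "sentence 8,3 in B"},
}
for ref, readings in align_segments(witnesses).items():
    print(ref, readings)
```

Because the alignment relies on the numbering alone, additions, omissions, and rearrangements in one witness do not disturb the correspondence of the remaining sentences.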

Linguistic Analysis

§ 37    Software for academic purposes, not least in the area of digital humanities, tends to be commercially less promising than software for commercial purposes, and is thus often dependent on public funding, which is subject to financial and political conditions. We therefore decided not to rely solely on external solutions but to develop simple approaches that can be mastered with project-internal expertise and without further costs.

§ 38    Finding variation, especially of the linguistic type, in different versions of the same text requires the identification of certain structures to be compared and the output of corresponding chunks of text potentially containing this structure in the different manuscripts. The structure of our transcriptions allows for both a tagging of the structures in question and an identification of corresponding units of contents in different transcriptions, and to us the use of XSLT-style sheets appeared to be a manageable way to prepare this information for further research. Most tasks can be accomplished with a limited number of basic style sheets[13] that can be easily adapted to certain demands.

§ 39    Typical linguistic variables for the earliest manuscript-fragments of Njáls saga are the position of the finite verb (verb-first or verb-second order) and the order of noun and attribute (attribute before noun or noun before attribute), but also other stylistic phenomena such as historical present tense vs. past tense or the use of either accusative with infinitive or conjunctional subordinate clauses in indirect speech (examples for all variables are given below in Fig. 5, Fig. 6, and Fig. 7).

§ 40    In order to find and compare these variables in different manuscripts, a mark-up of certain grammatical information was added to the XML-transcriptions. The Menota-guidelines provide a complete tag set for the morphosyntactic annotation of Old Icelandic texts. A complete tagging of the text, however, is excessive in relation to the task. For an investigation of the above-mentioned variables, a tagging of parts of speech (noun, verb, adjective, etc.) and a partial morphosyntactic annotation seem to be sufficient. To find instances of historical present tense, a tagging of the tense of verbs is self-evident, but a marking of direct speech to distinguish instances of historical present tense in the narrative parts of the text from normal present tense used in dialogues is also necessary. For the variables involving word order it is not sufficient to tag parts of speech. What has to be added is case when nouns in the genitive are used as modifiers and a marking of syntactical units, noun phrases for the order of noun and modifier, and the beginnings of main clauses to determine the position of the finite verb. Subordinate-clause constructions in indirect speech (conjunctional clauses, accusatives with infinitive) require a marking of clause-type and tagging of the parts of speech decisive for the construction (conjunction, finite verb in the middle voice plus infinitive).

Figure 5: Tagging of present tense in direct speech

Figure 6: Tagging of historic present tense (no direct speech)

Figure 7: Part-of-speech, phrase and clause-tagging for determination of word order
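The interplay of tense tagging and the marking of direct speech can be sketched in Python with the standard library. Several details are assumptions made for the illustration and are not taken from the figures: the namespace URI bound to the me: prefix, the value tPS for present tense, the use of a q element for direct speech, and the omission of the TEI namespace in the sample. The words v1 to v3 are placeholders.

```python
import xml.etree.ElementTree as ET

ME = "http://www.menota.org/ns/1.0"  # namespace URI assumed for the me: prefix
MSA = f"{{{ME}}}msa"                 # attribute name in ElementTree notation

# Invented sample: one narrative present, one present in direct speech,
# one past tense form. The TEI namespace is left out for brevity.
SAMPLE = f"""<s xmlns:me="{ME}" n="1">
  <w me:msa="xVB fF tPS">v1</w>
  <q><w me:msa="xVB fF tPS">v2</w></q>
  <w me:msa="xVB fF tPT">v3</w>
</s>"""


def present_tense_verbs(elem, in_speech=False):
    """Yield (word, in_direct_speech) for every finite present-tense verb."""
    if elem.tag == "q":  # assumed marker for direct speech
        in_speech = True
    codes = elem.get(MSA, "").split()
    if elem.tag == "w" and "fF" in codes and "tPS" in codes:
        yield elem.text, in_speech
    for child in elem:
        yield from present_tense_verbs(child, in_speech)


hits = list(present_tense_verbs(ET.fromstring(SAMPLE)))
historical = [word for word, in_speech in hits if not in_speech]
print(historical)
```

Only present-tense forms outside direct speech remain as candidates for historical present; forms inside direct speech are kept apart as ordinary present tense.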

§ 41    An extension of the tagging that might be useful for further research questions is unproblematic and can be based on the already realised tagging at a later stage.

Style sheets

§ 42    Of course it is possible to find strings of characters (including tags) with the search-function of text- or XML-editors. A considerably more efficient method, however, is to use XSLT-style sheets (Extensible Stylesheet Language Transformations) that either transform the XML-file to HTML or generate PDF- or text files. The use of style sheets allows the definition of both content and form of the output, for example to limit the output to the <dipl>-level or to define a certain font style for expanded abbreviations:

Figure 8: Formatting instructions in the style sheet

This allows for the output as a PDF-document with information about page numbers, columns, line numbers, emendations, and expanded abbreviations (Fig. 9).

Figure 9: Output in the form of a diplomatic edition
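The principle of restricting the output to one level and formatting expanded abbreviations can also be illustrated outside XSLT. The following Python sketch is not the project's style sheet; it assumes a Menota-like structure in which each word contains a me:dipl element and expansions are wrapped in ex elements, and it assumes the namespace URI for the me: prefix. The sample words are invented.

```python
import xml.etree.ElementTree as ET
from html import escape

ME = "http://www.menota.org/ns/1.0"  # namespace URI assumed for the me: prefix
DIPL = f"{{{ME}}}dipl"

# Invented sample with a facsimile and a diplomatic level for each word
SAMPLE = f"""<s xmlns:me="{ME}">
  <w><choice><me:facs>h</me:facs><me:dipl>h<ex>ann</ex></me:dipl></choice></w>
  <w><choice><me:facs>oc</me:facs><me:dipl>oc</me:dipl></choice></w>
</s>"""


def render_dipl(dipl):
    """Render one diplomatic word as HTML, with expansions in italics."""
    parts = [escape(dipl.text or "")]
    for child in dipl:
        inner = escape("".join(child.itertext()))
        parts.append(f"<i>{inner}</i>" if child.tag == "ex" else inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def diplomatic_html(root):
    """Join the diplomatic level of all words, ignoring the other levels."""
    return " ".join(render_dipl(dipl) for dipl in root.iter(DIPL))


html_line = diplomatic_html(ET.fromstring(SAMPLE))
print(html_line)
```

The facsimile level is simply never visited, which corresponds to limiting the output to the <dipl>-level, and the italics correspond to a font style defined for expanded abbreviations.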

Of much more interest for a linguistic analysis and comparison of the manuscripts, however, is the possibility to count and display certain tagged structures.

Narrative inversion

§ 43    One of the features that is usually emphasised in descriptions of the literary quality of the Icelandic family sagas is their characteristic style, which is usually described as objective, controlled, lacking decorative figures, syntactically uncomplicated, and containing elements of oral style and so forth (Bollason 2011, 16; Szokody 2002, 985; Hallberg 1969, 63 etc.). Amongst the typical features of this style Hauksson and Óskarsson mention transpositions of word order and especially the so-called narrative inversion, that is the order finite verb–subject instead of the unmarked order subject–finite verb (1994, 273ff.).

§ 44    This stylistic feature is also present in Njáls saga, but during our work with the manuscripts we observed differences in its use between different text witnesses.[14] To quantify these observations and to prepare a more objective and less random analysis of these differences in the corpus, we tagged clauses and finite verbs in two chapters of Njáls saga in four different manuscripts: Betabrotið (AM 162 B fol. β), Kálfalækjarbók (AM 133 fol.), Skafinskinna (GKS 2868 4to) and Gráskinna (GKS 2870 4to). The fragments cover different parts of the text, which limits the number of corresponding chapters in different manuscripts.

§ 45    The beginning of clauses was marked with the <cl>-tag, finite verbs were tagged as xVB fF (cf. Fig. 10), which makes it easy to count and output clauses beginning with a finite verb using the XPath-expressions count(//cl[*[1][contains(@me:msa,'fF')]]) and (//s[.//cl[*[1][contains(@me:msa,'fF')]]]) (Fig. 11).
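For readers without an XSLT processor at hand, the logic of the two XPath-expressions can be re-expressed in Python. The standard library module used here does not support contains() or count(), so the test is written out by hand; this is a substitute for illustration, not the query used in the project. The namespace URI, the part-of-speech value xNC, and the sample (with placeholders V and S for verb and subject) are assumptions.

```python
import xml.etree.ElementTree as ET

ME = "http://www.menota.org/ns/1.0"  # namespace URI assumed for the me: prefix
MSA = f"{{{ME}}}msa"

# Invented sample: V = finite verb, S = subject; TEI namespace left out
SAMPLE = f"""<text xmlns:me="{ME}">
  <s n="8,1"><cl><w me:msa="xVB fF">V</w><w me:msa="xNC">S</w></cl></s>
  <s n="8,2"><cl><w me:msa="xNC">S</w><w me:msa="xVB fF">V</w></cl></s>
  <s n="8,3"><cl><w me:msa="xCC">ok</w><w me:msa="xVB fF">V</w></cl></s>
</text>"""


def is_v1(cl):
    """True if the first child element of the clause is tagged as finite (fF)."""
    children = list(cl)
    return bool(children) and "fF" in children[0].get(MSA, "")


def count_v1(root):
    """Counterpart of count(//cl[*[1][contains(@me:msa,'fF')]])."""
    return sum(1 for cl in root.iter("cl") if is_v1(cl))


def v1_sentences(root):
    """Counterpart of //s[.//cl[*[1][contains(@me:msa,'fF')]]]."""
    return [s for s in root.iter("s") if any(is_v1(cl) for cl in s.iter("cl"))]


root = ET.fromstring(SAMPLE)
print(count_v1(root), [s.get("n") for s in v1_sentences(root)])
```

As in the XPath-expressions, the test is a plain substring match on the attribute value, so it depends on fF not occurring inside any other code.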

Figure 10: Sentences with narrative inversion from Gráskinna

§ 46    This query was performed on all four transcriptions, and the results were checked for incorrect examples. Since a tagging of clause-types was not implemented in the XML-transcriptions, non-declarative (interrogative and imperative) and other inapplicable sentences had to be removed from the results. The sentences corresponding to all valid examples (identified by chapter- and sentence-number) were then output from all texts, with interesting results.

Figure 11: Display of V1-clauses in Gráskinna

§ 47    Twelve sentences (out of fifty) featured narrative inversion in at least one of the texts, but only one of them in all texts. The distribution was rather unbalanced between the different manuscripts: Betabrotið contained only three (2.5%), and Gráskinna eleven (9.5%) examples.

§ 48    Of course these results have to be interpreted with some reservation. The analysis was done only on two chapters out of 159, and only one stylistic feature was examined. In addition to this, narrative inversion in Icelandic is a bit more complex. For Table 1, only unintroduced main clauses were considered. In some of the manuscripts, however, narrative inversion after the conjunction ok [and] (searched for with the XPath-expression: //s[.//cl[*[1][contains(@me:msa,'xCC')] and *[2][contains(@me:msa,'fF')]]]) is clearly more frequent than in unintroduced main clauses (cf. Fig. 12).

Figure 12: Narrative inversion in four manuscripts of Njáls saga
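The two contexts compared here, unintroduced clauses and clauses introduced by a conjunction, differ only in the sequence of codes at the beginning of the clause. A small Python sketch can therefore treat both with one helper. The helper clause_starts_with, the namespace URI, and the sample are hypothetical; note also that the code xCC matches any coordinating conjunction and not only ok, just as in the XPath-expression quoted above.

```python
import xml.etree.ElementTree as ET

ME = "http://www.menota.org/ns/1.0"  # namespace URI assumed for the me: prefix
MSA = f"{{{ME}}}msa"

# Invented sample: V = finite verb, S = subject; TEI namespace left out
SAMPLE = f"""<text xmlns:me="{ME}">
  <cl><w me:msa="xVB fF">V</w><w me:msa="xNC">S</w></cl>
  <cl><w me:msa="xCC">ok</w><w me:msa="xVB fF">V</w><w me:msa="xNC">S</w></cl>
  <cl><w me:msa="xNC">S</w><w me:msa="xVB fF">V</w></cl>
  <cl><w me:msa="xCC">ok</w><w me:msa="xNC">S</w><w me:msa="xVB fF">V</w></cl>
</text>"""


def clause_starts_with(cl, pattern):
    """True if the first child elements carry the given codes, in order."""
    words = list(cl)[: len(pattern)]
    return len(words) == len(pattern) and all(
        code in word.get(MSA, "") for word, code in zip(words, pattern)
    )


def tally_inversion(root):
    """Count verb-first clauses without and with an introducing conjunction."""
    counts = {"unintroduced": 0, "after_conjunction": 0}
    for cl in root.iter("cl"):
        if clause_starts_with(cl, ["fF"]):
            counts["unintroduced"] += 1
        elif clause_starts_with(cl, ["xCC", "fF"]):
            counts["after_conjunction"] += 1
    return counts


counts = tally_inversion(ET.fromstring(SAMPLE))
print(counts)
```

Run on each transcription in turn, such a tally would give the figures needed to compare the two contexts across manuscripts.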

§ 49    Christoffersen's remarks on narrative inversion (she prefers the term discourse cohesion) in Old Nordic reveal that it is more frequent in main clauses introduced with ok than in unintroduced main clauses, although she is rather cautious about clear statements (2002, 185). A general problem for a precise survey seems to be that different texts behave differently, and it is interesting to see that in the small sample from our project's corpus this tendency also holds true for different contemporaneous manuscripts of the same text (see Table 1).

Table 1: Distribution of narrative inversion in four manuscripts of Njáls saga

Nr. AM 162 B fol. β AM 133 fol. GKS 2868 GKS 2870

7,37 x x
8,1 x
8,6 x x x
8,8 x x x x
8,9 x
8,10 x x x
8,11 x x x
8,15 x x x
8,16 x x
8,18 x x
8,27 x x
8,29 x x x


§ 50    The main problem in the automatic comparison of different manuscripts of medieval texts is not so much to identify textual variants, but to distinguish between important and unimportant variants. Software for an automatic collation of texts like Juxta or the different versions of Collate relies mostly on a comparison of chunks of text, which does not always lead to satisfying results, as the example in Fig. 1 shows. In the case of Njáls saga, a further problem is the number of sixty-three text-witnesses, which by far exceeds the number of texts that can be handled by software like Juxta or Juxta Commons and which would in any case result in a critical apparatus too large to be useful for a specific analysis of, for example, word order.

§ 51    Research interests in connection with Old Icelandic texts are manifold and require different kinds of transcriptions and annotations. Printed editions are usually designed with a certain audience in mind (Kondrup 2011, 43-86) which very often excludes their usage for, for example, linguistic research questions (cf. the discussions in Marques-Aguado 2013 and Beltrami 2013). One of the advantages of digital representations of manuscript texts over printed editions is their flexibility and adaptability to different demands. A complete transcription of all manuscript-witnesses of a text is usually less time-consuming than a manual collation if a suitable work-flow is applied (Andrews 2013, 67), and the possibilities to add additional levels of transcription or information needed for different kinds of analyses to a digital transcription are virtually unlimited. Thus, the actual challenge is not so much to develop methods for an automatic assessment but to prepare suitable data for work on different research questions with as little time and effort as possible. From my experience, this development is still very much influenced by the tradition of printed critical editions; a typical example from the project The variance of Njáls saga is the treatment of variants of letter forms (see the previous section: The diplomatic level: How to reconcile linguistic principles, practicability and philological traditions?).

§ 52    In this article I have described the methods used in the project The variance of Njáls saga to detect variation on different levels and I have put special emphasis on convenient tools and approaches used to find and analyse linguistic variation between manuscripts. This approach can be used to deal with most cases of linguistic variation, and the example of narrative inversion (see the previous section: Narrative inversion) shows that it is able to revise earlier research based on printed editions (Zeevaert 2014, 985-986 presents a stronger focus on results). Software designed to automatically compare different versions of a text can be a useful aid for this research. At the moment, however, the use of convenient tools and approaches as they are described in this article seems to be a more promising way to come to conclusions about linguistic differences between manuscripts of a text.


[1]. Cf. for example Um Brennu-Njáls sögu (1991, VII). A self-evident terminus ante quem is the age of the oldest extant manuscripts that can be dated to around 1300 (± 25 years). The terminus post quem is usually determined with regard to the usage of certain judicial proceedings and technical terms (for example the Low German loan word prófa [to examine]) that do not show up in the laws of the Icelandic free-state but are of Norwegian provenance. The laws of the free-state were replaced by Járnsíða in 1271, and according to Einar Ólafur Sveinsson (1954, LXXVIII) it is quite likely that Járnsíða was used as a source by the author of Njáls saga; for example, the proverb með lögum skal land várt byggja, en með ólögum eyða [with law our land shall rise, but it will perish with lawlessness] in chapter seventy of Njáls saga seems to be taken directly from Járnsíða, but it is assumed that it took some years for the new legal customs and law codices to have an effect on the writing of a saga; Einar Ólafur Sveinsson (1933, 299 ff., esp. 310).

[2]. It is assumed that the use of the sobriquets instead of call numbers makes it easier for the reader to distinguish between the different manuscripts. The call numbers (GKS stands for Gammel Kongelig Samling, the Old Royal Collection, AM for Den Arnamagnæanske håndskriftsamling, the Arnamagnean Collection) are given in brackets at first mention and in the list of Source texts. Þormóðsbók is named after the seventeenth-century Icelandic historian Þormóður Torfason, the name Gráskinna refers to the sealskin cover of the codex, and the names of Reykjabók and Möðruvallabók to the provenance of the manuscripts (Reykir and Möðruvellir are Icelandic place names).

[3]. In Þormóðsbók, sentences one and two are coordinated by oc [and], and the subject in sentence two is omitted, which is to be expected if both sentences have a common subject. In this case, however, the omission of the pronominal subject þeir [they] in sentence two in Þormóðsbók is sylleptic. The verb form snero refers to a subject in the plural, Starkaðr and his men; the subject in sentence one, however, is a singular, Starkaðr. Thus, the usage or omission of a pronominal subject in the plural may be taken as a conscious stylistic decision.

[4]. Zeevaert (2012, 173 ff.) challenges the idea of a universal principle of consistency in the order of modified and modifying elements in phrases and clauses, which is at the basis of typological approaches to syntax in the Greenbergian tradition, on grounds of lacking – and in the case of the Scandinavian languages counterfactual – empirical evidence.

[5]. Þormóðsbók: sá er dísir drápu/Óssbók: þann er að dísir vægju [the one who was slain by dísir]; Gráskinna: þann er sagt að dísir vægju [he is said to have been slain by dísir]; Reykjabók: þann er sagt er að dísir vægju [the one who is said to have been slain by the dísir].

[6]. Haugen (2004, 94) and Driscoll (2006, 254) point to the fact that the term diplomatic edition covers a continuum from strictly diplomatic editions aiming at reproducing every feature of a manuscript to semi-diplomatic editions giving no information about expanded abbreviations or the layout and the punctuation of the original. The method applied here corresponds by and large to what is described by Guðvarður Már Gunnlaugsson (2003, 202) as a stafrétt útgáfa [literal edition]. Especially for historical language stages the distinction between individual deviations from a linguistic norm and scribal errors is difficult and often subjectively biased. The correction of scribal errors in transcriptions should thus be applied with care, and has to be traceable to avoid wrong conclusions about the language of the manuscript.

[7]. A complementary distribution of different s-allographs exists for example in the Greek script (word final: <ς>, non-final: <σ>) or blackletter scripts like the German Fraktur (used until 1941, word final: <s>, non-final: <ſ>). In Iceland this distribution is common in late medieval manuscripts, but not in the earlier texts; in Þormóðsbók word-final <s> is written <ſ> in approximately eighty-three percent and <s> in approximately seventeen percent of the cases (in about half of these cases <s> stands for a geminate s), in non-final position the distribution is 95% for <ſ> and 5% for <s>. According to Mazal (1986, 11) <s> was introduced to Latin script in the eleventh century as part of the ligature -vs at the end of words and spread from there to other positions.

[8]. This seems to be the usage proposed by the First Grammarian (i.e. the anonymous author of the Icelandic so-called First Grammatical Treatise from the mid-twelfth century) who lists the shape of the letters to be used for the short and long variants of consonants, for example <g> for /g/, <ɢ> for /g:/, <c> for /k/, <k> for /k:/ (Nordal 1931, 88). Unfortunately, the scribe of Codex Wormianus, the only extant manuscript of the treatise, jumped over the line containing the character to be used for /s:/ when copying the text, which was added above the line as <s>, presumably by a younger hand.

[9]. The letter usually referred to as insular 'v' (for example in Hreinn Benediktsson 1965, 25) was adapted from <ƿ> (Wynn), which was used in Old English writing for the voiced labio-velar approximant /w/. In Post-Classical Latin, <v> represented a voiced labiodental fricative /v/ (Norberg 1968, 21), and <v> was thus no longer perceived as an adequate representation of English /w/. In a shape that resembles a capital Latin P it was originally part of the Elder fuþark and the Anglo-Saxon fuþorc.

[10]. It should be added that <v> and <u> are not clearly distinguishable in all cases: <v> does not have a clearly pointed but rather a round shape, <u> consists of two slightly waved downward strokes whereas the right stroke of <v> is bowed to the left. Thus <u> usually has a characteristic short stroke to the right on the base line, but one cannot exclude that the distribution in the transcription is slightly biased by modern orthography that uses <v> for the consonant /v/ and <u> for the vowel /u/ in cases where this stroke is not clearly discernible.

[11]. The tendency for a distribution of the two letters c and k in relation to their position in the word seems to reflect a rule that is formulated in the Second Grammatical Treatise, preserved together with three other writings on Old Icelandic grammatical matters in Codex Wormianus (AM 242 fol.) and under the name Háttalykill also in Codex Upsaliensis (DG 11). Together with <ð>, <z>, and <x>, <c> is classified as an undirstafr [sub-letter], which can only be used syllable-finally (En fjórði stafr er c, ok hafa sumir menn þann ritshátt, at setja hann fyrir k eða q; en hitt eina er rétt hans hljóð, at vera sem aðrir undirstafir í enda samstöfu, Raschellà 1982, 68). Hreinn Benediktsson (1965, 79) explains this rule as an attempt to reinterpret the use of two graphemes for one phoneme after the reason for their distribution was no longer transparent.

[12]. In comparison to the First Grammarian's orthographic principles, which are radically phonologically based (he proposes for example using <c> for /k/, to sort out <q> and to use <k> equivalent to the small capitals of other consonants, i.e. instead of the geminate consonant), this approach may appear inconsistent from a strictly linguistic point of view. It should be mentioned, though, that the First Grammarian's rules are not even applied in the part of Codex Wormianus that contains the First Grammatical Treatise.

[13]. Suitable style sheets are provided by different organisations. In the Njála-project we use mainly style sheets provided by Menota and TEI, but also style sheets that were originally developed by Kai Wörner (HZSK, Hamburg) for use with an Old Swedish corpus. Of valuable help for the adaptation of the style sheets to the tasks of our project and for the design of new style sheets were Ulrike Henny (CCeH, Cologne) and, at a course organised by the IDE (Institut für Dokumentologie und Editorik), Martina Semlak (University of Graz).

[14]. Hallberg (1968, 38ff.) gives an overview of the use of what he calls omvänd ordföljd [inverted word order] in different manuscripts of nine Icelandic sagas. The figures include four family sagas (not Njáls saga, however) which partly exhibit quite distinct differences in the use of the feature.


The project Breytileiki Njáls sögu/The Variance of Njáls saga (principal investigator: Dr Svanhildur Óskarsdóttir) is funded by Rannsóknarmiðstöð Íslands/The Icelandic Centre for Research (http://www.rannis.is/) (grant number 110610021).

This article is based on a presentation with the title Axes, halberds or foils given at the COST-workshop Easy Tools for Difficult Texts: Manuscripts & Textual Tradition at the Huygens ING, Den Haag, Netherlands, 18-19 April 2013.

I would like to thank Alaric Hall (School of English, University of Leeds) and two anonymous reviewers for valuable suggestions for an improvement of this article, and Emily Lethbridge (Miðaldastofa, Háskóli Íslands) for useful comments on an earlier version.

