Digital Medievalist 3 (2007-2008). ISSN: 1715-0736.
© Peter A. Stokes, 2007. Creative Commons Attribution-NonCommercial licence

Palaeography and Image-Processing: Some Solutions and Problems


Peer-Reviewed Article

Accepting Editor: Daniel Paul O'Donnell, University of Lethbridge.
Recommending Reader: Melissa Terras, University College London.
Received: January 11, 2007
Revised: November 6, 2007
Published: December 24, 2007



This paper considers the application of image-processing and data-mining to the analysis of scribal hands. The work of forensic document analysts on feature-extraction is considered, particularly the algorithms developed for automatic handwriting-recognition by Srihari, and by Bulacu and Schomaker. Automatic clustering is also considered using the AutoClass package. Preliminary results of the author’s own experiments with these approaches are presented, and some of the obstacles are outlined which must be overcome before a practical system can be developed for the automatic identification of medieval scribes.

Keywords: Palaeography; Imaging; Feature-extraction; Clustering; Forensic Document Analysis.


§ 1    “With the aid of technological advances palaeography, which is an art of seeing and comprehending, is in the process of becoming an art of measurement” (Bischoff 1990, 3). With this seemingly innocuous statement, and with the help of the editors of Scrittura e Civiltà, Bernhard Bischoff sparked a furious debate over the role in modern palaeography of objective measurement, and by implication of computing (Costamagna et al. 1995-1996; Pratesi 1998; Gumbert 1998; Derolez 2003, 6-9). Arianna Ciula has already discussed this debate in the inaugural volume of Digital Medievalist and I shall not repeat her work here. Instead, I wish to raise questions about the so-called “art of measurement” itself, and to see how work in related fields can be applied to palaeography. Ciula has already shown us one way in which computers can be used for objective analysis, and a different approach has recently been used to help scholars read the Vindolanda tablets (Terras 2006). However, one of the main difficulties faced by palaeographers is the classification and identification of hands, and this is an area which has already received a good deal of attention in other disciplines. Specifically, the community of forensic document analysts have been working for several years now to develop computer-based systems for identifying and classifying modern handwriting, and this raises the question of whether such work can be applied to medieval writing as well. The answers to this are complex and cannot possibly be covered in a single paper; instead, I wish to consider two techniques which have been developed by forensic document analysts and which can be tested relatively easily on medieval script. By doing so I hope to show that this related research is indeed useful to medievalists, and in showing this I seek also to demonstrate that the “art of measurement” can be used not to replace other techniques but to supplement them and to contribute to our understanding in new and previously unattainable ways.

Analysing the Script: Automatic Feature-Extraction

§ 2    The first approach to be considered here is automatic feature-extraction. In Zurada’s terms, the objective of feature-extraction is to produce feature-vectors which “retain the minimum number of data dimensions while maintaining the probability of correct classification”, and where “the feature space dimensionality is postulated to be much smaller than the dimensionality of the pattern space” (Zurada 1992, 95). Despite this technical-sounding definition, feature-extraction has long been a part of “traditional” manuscript studies and also forensic document analysis; in this context it is simply the identification of key features which are used to establish the degree of similarity between two hands. [1] At least among palaeographers, the emphasis on features has been used to introduce objectivity and communicability into the field: rather than describing the aspect of a page in subjective terms, it is now usually thought more useful to use clear and unambiguous criteria which can be easily understood and verified by others (Derolez 2003, 1-2 and 6-9). Attempts have been made, therefore, to establish terminologies for describing letter-forms and script-systems which reflect the decisions, conscious or otherwise, made by medieval scribes (Bischoff 1954; Brown 1990; Derolez 2003, 13-24). However, such features by their nature are the very ones which scribes can easily adopt and abandon at will. As Spumar has noted, “the copyist is not a machine programmed to determined functions and causing us to consider all confused variants and developments as the result of another hand” (Spumar 1976, 64); on the contrary, it has long been recognised that scribes will deliberately alter their writing to conform to the different expectations which accompany different kinds of text. [2] This is by no means to say that such a morphological approach to palaeography is invalid: on the contrary, it has proven to be extremely useful. [3] It does mean, however, that one must take some care in interpreting the evidence provided by letter-forms. The problem also remains of how one can determine which features are significant. Such a problem is somewhat less intractable when considering script-systems: in many cases, a given script can be defined with greater or lesser accuracy by a relatively small set of letter-forms. [4] The problem becomes much greater, however, when trying to distinguish between different scribes. Such identifications seem to be relatively sound if a cluster of unusual letter-forms can be found which occur in a group of related manuscripts and nowhere else: the assumption then is that those manuscripts were written by the same scribe, or at least by scribes from the same school. However, difficult questions must still be asked. How many features are required to secure an identification? How unusual must these features be? How can one be certain that a second scribe was not copying these features? Or that this unusualness is not an artefact of missing evidence rather than the oddities of a single scribe? Even the most highly-regarded palaeographers have slipped up while trying to navigate a path through these treacherous grounds. [5]

§ 3    One possible path towards solving these problems is to use a computer to extract large quantities of precisely defined information which can be analysed statistically and which could not be obtained any other way within a practical time-frame. This approach has been used by researchers who have been working to develop systems for the automatic identification of modern handwriting. [6] They have experimented with several different statistical measurements which can be obtained from a sample of handwriting and which can then be used for comparison and identification. One group tested their system on one thousand samples and obtained accuracy of about 95% for fully-automated identification and verification of modern handwriting (Srihari 2003, iii). A second group, using a much less complex system, achieved comparable results in generating a list of ten possible matches to a given sample out of a set of 250 writers (Bulacu, Schomaker, and Vuurpijl 2003, 4). In both cases, however, the results were obtained using samples of handwriting which were obtained under carefully controlled conditions: the text was the same for each sample and was selected to include all important features of the hand, the pages were the same and were laid out in the same way, the same pens, paper, and supports were used, and the samples were all digitised in the same way and under the same conditions. While this uniformity was necessary for the scientific validity of the experiments, these conditions are clearly ideal and thus represent the best results one could hope to achieve. Nevertheless, these results do seem promising, and so the application of these methods to medieval handwriting deserves investigation.

§ 4    In all of the cases considered, the approach has been to use a computer to extract features in the form of statistical measurements from digitised samples of handwriting, and to use these measurements to compare different hands. The bulk of the research, then, has been in determining which measurements to take, and a number of different solutions have been proposed. Some, such as the speed and pressure of the pen, need to be taken at the time of writing and so are of no use either to the palaeographer or to the forensic document analyst. Others, such as the entropy and the distribution of shades of grey, depend on high-quality images which have been digitised under nearly identical conditions. Although libraries are producing high-quality digital images in some quantity, they are still some way off producing a complete corpus in any sense. [7] Instead, a practical system will need to function with images from a variety of sources and should ideally be able to cope with scans of photographs and perhaps even of half-tone plates in books. Similarly, at least for the purposes of an initial study, the algorithms need to be fairly straightforward and quick to implement; if the results show promise, then a longer and more concerted effort can be justified.

§ 5    To this end, I have selected and implemented five measurements in order to test their usefulness to the study of medieval handwriting. The first of these is run-lengths (Arazi 1977) and is demonstrated in Figure 1 below. By scanning through an image, the software can count the number of consecutive pixels corresponding either to background or to ink in a given direction: for example, Figure 1 shows a run of four background pixels in the horizontal direction, and a run of five foreground pixels in the vertical direction. Thus a large number of long horizontal runs would indicate more space between vertical strokes, and many long vertical runs might suggest a hand with elongated and relatively upright ascenders, descenders, and minims. [8] Some degree of scaling and normalisation is required to account for differences in the size of the hand and the size and resolution of the image; this is discussed further below.

Figure 1: Run-lengths
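
The counting of runs described in § 5 can be illustrated with a minimal sketch. It is not the C++/DIPlib code used for the experiments: it assumes a binarised image held as a list of rows in which 1 stands for ink and 0 for background, and the function names and the `max_length` cap are hypothetical. No scaling or normalisation is attempted here.

```python
def run_lengths(line, value):
    """Return the lengths of consecutive runs of `value` in one row or column."""
    runs = []
    current = 0
    for pixel in line:
        if pixel == value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def run_length_histogram(image, value, direction, max_length):
    """Count runs of `value` by length; runs longer than max_length share the last bin."""
    if direction == "horizontal":
        lines = image
    else:
        # Transpose the image so that each column is scanned as a line.
        lines = [list(column) for column in zip(*image)]
    histogram = [0] * (max_length + 1)
    for line in lines:
        for run in run_lengths(line, value):
            histogram[min(run, max_length)] += 1
    return histogram


# Two upright strokes, five pixels tall, separated by four background pixels.
sample = [[1, 0, 0, 0, 0, 1] for _ in range(5)]
horizontal_background = run_length_histogram(sample, 0, "horizontal", 6)
vertical_ink = run_length_histogram(sample, 1, "vertical", 6)
```

In this toy image every row contributes one horizontal background run of length four, and each of the two strokes contributes one vertical ink run of length five, which mirrors the two kinds of run shown in Figure 1.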

§ 6    Another measurement is known as autocorrelation: in short, it measures the degree of regularity in a hand and indicates the distance between regularly occurring elements. It is calculated by overlaying a copy of the image onto itself and counting the number of pixels in common; the overlaid image is then moved horizontally by one pixel and the count repeated. A page filled entirely with perfectly reproduced and regularly spaced examples of the letter l, for example, will give an autocorrelation of almost zero for all horizontal shifts except those where the letters are all aligned, at which point it will be maximum. This is demonstrated in Figure 2, below. The first diagram shows almost no overlap and so the autocorrelation for this displacement is very low, the second shows some overlap and so a higher value, and the third shows almost complete overlap and so a near-maximum value.

Figure 2: Autocorrelation. Note the increasing overlap as the horizontal displacement of the blue image changes relative to the black one.
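
The overlay-and-count procedure of § 6 can be sketched as follows. This is an illustration only, under the assumptions that the image is binarised (1 for ink, 0 for background), that only ink pixels are counted as being "in common", and that the shifted copy does not wrap round the edge of the image; the function name is hypothetical.

```python
def horizontal_autocorrelation(image, max_shift):
    """For each horizontal shift, count ink pixels shared by the image and its shifted copy."""
    width = len(image[0])
    counts = []
    for shift in range(max_shift + 1):
        overlap = 0
        for row in image:
            # Compare each pixel with the pixel `shift` columns to its right.
            for col in range(width - shift):
                if row[col] and row[col + shift]:
                    overlap += 1
        counts.append(overlap)
    return counts


# Three rows of identical upright strokes spaced four pixels apart,
# a crude stand-in for the page of regularly spaced letters l.
strokes = [[1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0] for _ in range(3)]
profile = horizontal_autocorrelation(strokes, 8)
```

For this toy input the count is zero at every shift except 0, 4, and 8, where the strokes line up again; the peaks shrink at larger shifts only because fewer columns overlap when the copy is not wrapped.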

§ 7    Bulacu and Schomaker have also proposed edge-directions as another possible metric. In this case, the edges of every letter are broken down into small, straight lines and their directions measured and counted (Bulacu, Schomaker, and Vuurpijl 2003). The direction is measured by overlaying a theoretical box on the image centred at the lower tip of each line, and the angle is determined by detecting where the line crosses the edge of the box. Thus in Figure 3, below, the line has an “angle” of seven. Such a measurement, when calculated for all edge-segments in an image, gives an indication of the average direction of the strokes in a given hand. A very upright and angular hand will have most lines either vertical or horizontal (so with values clustering around 0 and 6), a sloping hand will show most edges in the direction of the slope, and a rotund hand will not show much of a peak in any direction.

Figure 3: Edge-direction (3-pixel radius). This edge-segment has an “angle” of 7.
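
A simplified sketch of the edge-direction count in § 7 is given here. It is not Bulacu and Schomaker's implementation: it assumes that the edges have already been extracted into a binary grid (1 for an edge pixel), that directions are numbered anticlockwise round the upper half of the box starting from the horizontal, and that a fragment counts only if every pixel along it is an edge pixel. With a 3-pixel radius this numbering gives twelve directions, with 0 horizontal and 6 vertical, which agrees with the values quoted above; the function names are hypothetical.

```python
def upper_perimeter_offsets(radius):
    """Offsets (dx, dy_up) on the upper half of the box, anticlockwise from (radius, 0)."""
    r = radius
    right = [(r, dy) for dy in range(0, r + 1)]
    top = [(dx, r) for dx in range(r - 1, -r - 1, -1)]
    left = [(-r, dy) for dy in range(r - 1, 0, -1)]
    return right + top + left


def fragment_present(edges, row, col, dx, dy_up, radius):
    """True if every pixel on the straight fragment from (row, col) to the box edge is an edge pixel."""
    height, width = len(edges), len(edges[0])
    for step in range(1, radius + 1):
        r = row - round(dy_up * step / radius)  # rows are numbered downwards
        c = col + round(dx * step / radius)
        if not (0 <= r < height and 0 <= c < width and edges[r][c]):
            return False
    return True


def edge_direction_histogram(edges, radius=3):
    """Proportion of edge fragments found in each direction."""
    offsets = upper_perimeter_offsets(radius)
    counts = [0] * len(offsets)
    for row in range(len(edges)):
        for col in range(len(edges[0])):
            if not edges[row][col]:
                continue
            for index, (dx, dy_up) in enumerate(offsets):
                if fragment_present(edges, row, col, dx, dy_up, radius):
                    counts[index] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts


# A single upright edge and a single horizontal edge.
upright = [[0, 0, 0, 1, 0, 0, 0] for _ in range(10)]
flat = [[0] * 10 for _ in range(3)] + [[1] * 10] + [[0] * 10 for _ in range(3)]
upright_histogram = edge_direction_histogram(upright)
flat_histogram = edge_direction_histogram(flat)
```

The upright edge places all of its weight in direction 6 and the horizontal edge in direction 0, so an upright, angular hand would be expected to show peaks at those two values.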

§ 8    Finally, the same authors have also proposed hinge-directions (Bulacu, Schomaker, and Vuurpijl 2003, 3). This is an extension of edge-directions but instead it considers “hinges”, namely the points where two straight lines meet. By measuring the angles of both lines in a “hinge” the metric seeks to characterise the bends and angles in a hand, as demonstrated in Figure 4 below.

Figure 4: Hinge-directions (3-pixel radius). The “hinge” here has an “angle” of 7,0.
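
The hinge measurement of § 8 can be sketched in the same simplified way. The code below is an illustration under stated assumptions rather than the published algorithm: edges are assumed to be a binary grid, directions are numbered anticlockwise round the whole box from the horizontal (so that, for a 3-pixel radius, 0 is right, 6 up, 12 left, and 18 down), and each pair of fragments leaving the same pixel is recorded with the smaller number first. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def full_perimeter_offsets(radius):
    """Offsets (dx, dy_up) round the whole box, anticlockwise from (radius, 0)."""
    r = radius
    right_upper = [(r, dy) for dy in range(0, r + 1)]
    top = [(dx, r) for dx in range(r - 1, -r - 1, -1)]
    left = [(-r, dy) for dy in range(r - 1, -r - 1, -1)]
    bottom = [(dx, -r) for dx in range(-r + 1, r + 1)]
    right_lower = [(r, dy) for dy in range(-r + 1, 0)]
    return right_upper + top + left + bottom + right_lower


def fragment_present(edges, row, col, dx, dy_up, radius):
    """True if every pixel on the straight fragment from (row, col) to the box edge is an edge pixel."""
    height, width = len(edges), len(edges[0])
    for step in range(1, radius + 1):
        r = row - round(dy_up * step / radius)  # rows are numbered downwards
        c = col + round(dx * step / radius)
        if not (0 <= r < height and 0 <= c < width and edges[r][c]):
            return False
    return True


def hinge_counts(edges, radius=3):
    """Count pairs of directions in which edge fragments leave the same pixel."""
    offsets = full_perimeter_offsets(radius)
    counts = Counter()
    for row in range(len(edges)):
        for col in range(len(edges[0])):
            if not edges[row][col]:
                continue
            found = [index for index, (dx, dy_up) in enumerate(offsets)
                     if fragment_present(edges, row, col, dx, dy_up, radius)]
            for first, second in combinations(found, 2):
                counts[(first, second)] += 1
    return counts


# A right-angled corner: an upright edge meeting a horizontal edge at the lower left.
corner = [[1, 0, 0, 0, 0] for _ in range(4)] + [[1, 1, 1, 1, 1]]
straight = [[0, 1, 0] for _ in range(10)]
corner_hinges = hinge_counts(corner)
straight_hinges = hinge_counts(straight)
```

The corner yields a single hinge with directions (0, 6), whereas the straight edge yields only "flat" hinges of (6, 18); it is the distribution of such pairs over a whole sample that characterises the bends in a hand.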

§ 9    The five algorithms were initially implemented in MATLAB using the DIPimage toolbox provided by the Pattern Recognition group in the Department of Applied Physics at the Delft University of Technology (for full details see van Ginkel and van Kempen 2003). However, the toolbox proved to be impractical, due partly to the inherent inefficiencies of MATLAB and partly to the quota of CPU cycles which had been imposed on the only system which was accessible at the time. Fortunately, the team at Delft have also made available the C library underlying the DIPimage toolbox. The MATLAB code was therefore converted to C++, using the DIPlib library instead of the DIPimage toolbox, and this was found to run at greatly increased speed while still allowing rapid prototyping. The five algorithms outlined above were implemented as described in the respective literature. [9] Two different measures of distance were tested, the Euclidean and the χ², and χ² was ultimately used. [10] The autocorrelation histogram was normalised such that the minimum value was zero and the first element was one; the others were all normalised to probability density functions (PDFs) but multiplied by 100 for convenience. The images were all scaled so that minims in each image were the same height, in order to eliminate bias due to differences in the size of the hand and the size and resolution of the image. Each image was then converted from greyscale to black and white and the edges of each stroke were obtained for the edge-direction and hinge-direction algorithms. [11]
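
The normalisation and distance measures mentioned in § 9 can be sketched as follows. The χ² distance is written here in one common form, the sum of (p − q)² / (p + q) over all bins, which is assumed rather than taken from the implementation itself; the function names are likewise hypothetical.

```python
import math


def to_pdf(histogram, scale=100.0):
    """Normalise a histogram so that its values sum to `scale` (100 for convenience)."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("cannot normalise an empty histogram")
    return [scale * value / total for value in histogram]


def normalise_autocorrelation(values):
    """Rescale so that the minimum value becomes zero and the first element becomes one."""
    low = min(values)
    span = values[0] - low
    if span == 0:
        raise ValueError("first element must exceed the minimum")
    return [(value - low) / span for value in values]


def euclidean_distance(p, q):
    """Straight-line distance between two feature-vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chi_squared_distance(p, q):
    """Chi-squared distance; bins that are empty in both vectors contribute nothing."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


first = to_pdf([1, 1, 2])
second = to_pdf([2, 1, 1])
distance = chi_squared_distance(first, second)
```

Because each squared difference is divided by the combined weight of the bin, the χ² measure gives relatively more importance to differences in sparsely populated bins than the Euclidean measure does.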

Figure 5: Sample images for Test 1

§ 10     For the first test, six images were used and are reproduced as Figure 5 above; four images are of Anglo-Caroline minuscule written by one scribe (B, D, E and F), and two of Insular minuscule by a second scribe (A and C). The images were all 1100×500 pixels large, 8-bit greyscale, and taken from the same 24-bit RGB image which was scanned at 300 dpi. In terms of the script, each image was about three lines of text high and about fifteen to twenty letters wide; minims were about 45 pixels high. Each set of measurements was generated for each sample; the results are shown in Figure 6 below. As these figures represent distances, the matching hands should show significantly lower values than other hands in the same matrix; these numbers are displayed in bold in the tables.

Figure 6: Distance matrices for the five methods in Test 1. Numbers in bold should be lower than the other numbers in that matrix if the corresponding method was successful.

§ 11    From these tables, it can be seen that correct results were obtained in every case except that the horizontal-runs algorithm failed to group Hand D correctly. Given the imprecise nature of the problem, it is unreasonable to expect that every metric should produce a perfect response every time: instead, a number of different measurements could be taken and a voting-mechanism or something similar used to make a final decision. In these circumstances, the above results look very promising indeed.
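
One possible form of the voting-mechanism suggested in § 11 is sketched below. It was not part of the experiments reported here: it simply assumes one square distance matrix per metric, lets each metric vote for the sample nearest to the one in question, and returns the candidate with the most votes. The matrices and function names are invented for illustration.

```python
from collections import Counter


def nearest_neighbour(distances, sample):
    """Index of the sample closest to `sample` in one distance matrix."""
    row = distances[sample]
    candidates = [index for index in range(len(row)) if index != sample]
    return min(candidates, key=lambda index: row[index])


def vote_nearest(matrices, sample):
    """Let each metric vote for its nearest neighbour; return (winner, votes received)."""
    votes = Counter(nearest_neighbour(matrix, sample) for matrix in matrices)
    winner, count = votes.most_common(1)[0]
    return winner, count


# Three hypothetical metrics comparing three samples (indices 0, 1, and 2).
metric_a = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
metric_b = [[0, 2, 6], [2, 0, 3], [6, 3, 0]]
metric_c = [[0, 7, 3], [7, 0, 2], [3, 2, 0]]
result = vote_nearest([metric_a, metric_b, metric_c], 0)
```

Here two of the three metrics place sample 1 closest to sample 0, so the dissenting third metric is outvoted; a weighted vote would be a natural refinement if some metrics prove more reliable than others.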

§ 12    One would certainly hope for results this good, however, since the sample hands were carefully chosen and all the images are of the same size and resolution and were taken from the same original image. As noted above, any useful system will need to be able to account for differences in all of these factors. The next test, therefore, was designed to see if the system can indeed account for the different sizes of images. This time, five images were used, reproduced as Figure 7 below.

Figure 7: Sample images for Test 2

§ 13    Samples A and B were written in English Vernacular minuscule, and C, D, and E in Anglo-Caroline minuscule; image E is a subset of C, and D also overlaps substantially with C. Each image was 8-bit greyscale, and taken from the same 24-bit RGB image which was scanned at 150 dpi. All of the images were 500×220 pixels large, except for Sample C which measured 500×440 pixels. In terms of the script, each image was about thirty letters wide and either five or ten lines of text high; cue-height corresponded to about 18 pixels.

Figure 8: Distance matrices for Test 2. Numbers in bold should be smaller than others: note that this is not so for H-runs, row B column A, and for V-runs is only so for row E column C.

§ 14    Once again, the results are good but not perfect. This time, both run-length algorithms had some difficulty in correctly identifying the matching hands, but the other three all produced correct results. Perhaps more significantly, however, the larger image (C) was misclassified no more often than any of the other samples. Indeed, C and E match very closely for all of the algorithms, and this suggests that the results are indeed approximately independent of the size of the image. Bulacu and Schomaker’s conclusions are also confirmed here: the directions of edges and hinges produce superior results to the older metrics of run-length and autocorrelation.

§ 15    A third test was conducted to compare samples of different sizes in both horizontal and vertical directions, as well as different resolutions. Again, five images were used, this time all of the same hand. All the images were 8-bit greyscale and were taken from a 24-bit colour image at 300 dpi, but the sizes and resolutions of the images varied, as shown in Figures 9 and 10 below.

Figure 9: Sample images for Test 3

Figure 10: Parameters for Test 3

§ 16    Samples D and E are identical except for the different resolutions. Since all of the samples were of the same hand, one would hope that no clear identifications would emerge and that all of the samples would be approximately the same distance from one another. On the other hand, if the system were sensitive to variations in size or resolution, then this should become apparent from this test. If C were substantially distant from the others, for example, then this would indicate that the results were indeed sensitive to size. Alternatively, if E were classified as distinct from the other four samples, then a bias due to resolution would be revealed.

Figure 11: Distance Matrices for Test 3. Note that no single value is consistently lower or higher than any others in any of the five matrices: this finding suggests that there is no bias in any of the methods.

§ 17    Once again, the results are promising. No strong bias due to size or resolution is revealed. Only the autocorrelation correctly reported no difference at all between samples D and E; more significantly, no algorithms clearly misassigned E to its own group, although all but the autocorrelation function returned slightly greater distances for this hand.

§ 18    From these preliminary experiments, it seems that the algorithms in question show some promise and are worthy of further attention. However, it should be noted that even an untrained person would have had little difficulty in classifying any of the samples which have been tested here, and a great deal more work is required before any computer-based system could out-perform a human. Nevertheless, advances in image-processing are rapid and much more sophisticated techniques are available than those used here. [12] What the computer can do is process very large numbers of hands very quickly and produce a short-list of likely matches. Even then questions remain as to how to interpret the data which the algorithms present. If the distances between samples are all relatively large, and if the algorithms all produce much the same classification, then all is well. But how similar need two hands be before they are grouped? Or, in more quantitative terms, what is the maximum allowable distance between two hands before they are classified as different? As the above results have shown, the distances vary from metric to metric, and from dataset to dataset, and so no single number can be assigned which will hold good for all situations. Instead, an adaptive system must be developed, which can account for these variations and decide for itself what values are appropriate, and how many different groups should be formed. Fortunately, this is another area in which a computer can be of use.


§ 19    The second technique which I shall consider has a somewhat different point of origin. As has been discussed above, one of the primary difficulties faced by palaeographers is the grouping of related specimens of handwriting. The degree of objectivity in such a grouping varies between individuals, but whatever the approach some difficulties remain the same. The first results from the sheer volume of data: I am aware of no studies on the subject but expert palaeographers I have spoken to claim to recall no more than perhaps thirty or forty scribal hands at most. However, the extant corpus from some scriptoria can number in the hundreds. Furthermore, as the previous discussion has demonstrated, it is not necessarily clear how such scribal hands should be grouped. A palaeographer can look through a large number of hands and collect data on all of the letter-forms used by all of the scribes, but this immediately produces a problem: either a small number of features are considered, but it is very difficult to determine which features are sufficient to characterise a given hand, or every possible feature is recorded, in which case the volume of data is too great for any one person to process. A similar difficulty applies to the automatic feature-extraction discussed in the previous section: in this case, not only the volume but also the nature of the data is prohibitive, since the long lists of numbers which are produced have little meaning outside the software which produced them. Several approaches have already been developed by palaeographers in an effort to accommodate the volume of data, [13] but these are relatively crude and are only useful in very simple cases. However, the problem of classification has been the subject of extensive research in computer science, and a large volume of software has already been developed and made freely available to help solve this problem. [14] Once again, then, a fundamental question in palaeography has already been examined in depth by researchers in another discipline, and so the question must be asked whether such research can be usefully applied here. In the remainder of this paper I shall therefore consider this question and test one of the many pieces of software in a practical example.

§ 20    The relevant discipline here is an entire field of artificial intelligence variously called data-mining, clustering, or unsupervised learning, and which has been defined by one author as “the problem of automatic discovery of classes in data” (Stutz and Cheeseman 1996, 61). [15]

§ 21    This is in contrast to “supervised learning”, in which the system is presented with a set of training-data, the desired classification of which is already known, and the system can then use this to learn how such classifications are to be obtained. Supervised learning presents difficulties to the palaeographer since hundreds of known examples are normally required, and in most cases nowhere near this number of scribal hands has been localised and dated. [16] Instead, many automated systems have been developed for unsupervised learning which determine their own criteria for categorisation: very few initial assumptions are made about the input-data, and the machine is left to make its own decisions about which features are significant, how groups should be formed, and even how many groups there should be. The applicability of this technique to palaeography need hardly be stated, but such an approach introduces complications in interpreting the output of these systems. Without any external guidance, the classifications which the system chooses could reflect either simple biases in data or important and hitherto unrecognised similarities. It need not be the case that an unsupervised classifier will produce exactly the same results as a human expert, and this raises the question of whether the machine’s results should be accepted when they differ from a person’s, and also how much the software should be forced to conform to preconceived notions about interrelations in the data. It may well be true that an eleventh-century documentary writ has some degree of commonality with a fourth-century luxury manuscript, however deeply buried that connexion is, but such a link is unlikely to be of much value to the palaeographer. The answer seems to be something of a compromise: as researchers have found, “discovery of important structure is usually a process of finding classes, interpreting the results, transforming and/or augmenting the data, and repeating the cycle” (Stutz and Cheeseman 1996, 62).

§ 22    The program chosen for initial experiments with the medieval scripts is known as AutoClass. This package was developed by a group at the NASA Ames Research Centre to implement “unsupervised classification based on the classical mixture model, supplemented by a Bayesian method for determining the optimal classes” (Stutz and Cheeseman 1996, 61). [17] The program has been carefully designed to be as general as possible, making no assumptions about the underlying data or even the number of groups into which the samples should be classified. It can accommodate both real and discrete values, and so can be used with the measurements discussed in the previous section of this paper but also with lists of features which have been gathered by a palaeographer. It was first used by this author to classify eleven images of five different hands using data which had been produced by the five algorithms discussed above. Although the number of images was not particularly large, the distance-matrices were still large enough, and the variation in distances small enough, that a grouping was not immediately apparent. The C++ software was therefore modified to produce the header, database, and model files required by AutoClass, incorporating all 888 data-points for each of the eleven hands. In practice, 327 of these points had only one unique value and so were of no use in classification; therefore 561-dimensional vectors of real scalar values were employed. [18] The software was allowed to run for 19365 tries, after which time it classified the hands into three different groups with an approximate marginal likelihood of -27417.639. The samples and their classification are shown in Figure 12 below, and the expected grouping is A-D as one, E-H as another (with subgroups E-F and G-H), and I-K as the third.

Figure 12: Classification of scribal hands using AutoClass. The expected result was to group A-D, E-H (with subgroups E-F and G-H), and I-K.
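
The removal of uninformative data-points described in § 22, where 327 of the 888 values were constant across all hands and so 561 remained, can be sketched as follows. This is an illustration of the filtering step only and does not reproduce the AutoClass file formats; the function name and the toy vectors are hypothetical.

```python
def drop_constant_dimensions(vectors):
    """Remove every dimension which takes only one unique value across all samples."""
    columns = list(zip(*vectors))
    kept = [index for index, column in enumerate(columns) if len(set(column)) > 1]
    reduced = [[vector[index] for index in kept] for vector in vectors]
    return reduced, kept


# Three toy feature-vectors: the first dimension is identical in every sample.
samples = [[1.0, 0.0, 3.0],
           [1.0, 2.0, 3.0],
           [1.0, 5.0, 4.0]]
reduced, kept = drop_constant_dimensions(samples)
```

Returning the indices of the surviving dimensions alongside the reduced vectors makes it possible to relate any feature which the classifier later reports as significant back to the measurement which produced it.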

§ 23    As can be seen, the classification was largely successful, except that it associated Samples A and B with E and F on the one hand, and C and D with G and H on the other. The precise reasons for this are not clear, but they may be due to the fact that C and D are twice the size of A and B: although earlier tests suggested that variations in image-size had little impact on the measurements, they may have had enough of an impact to affect the more subtle categorisation which is being attempted here. Given that the run-length measurements were more likely to return false groupings, it may be that removing the data contributed by these algorithms will produce better results. Although more work is certainly required, these initial results do suggest that this computer-based approach may be of some use.

§ 24    As noted above, AutoClass can also incorporate discrete data in addition to the automatically generated feature vectors. This then allows a second application: the classification of hands based on features which have been extracted manually by a palaeographer. To this end, I constructed a list of some 286 features and identified which of those features are present in 466 sample hands, primarily vernacular writing from England datable to the late tenth and early eleventh centuries. [19] In order to facilitate the entry of information into the computer, I created a form within a pre-existing database of all manuscripts and scribal hands under consideration. [20] Since the data to be entered is a simple “Yes/No” value for each field, it might be thought that the most appropriate form would contain nothing but a long list of check-boxes. In practice, however, this proved to be extremely unwieldy, and a great deal of time was initially spent looking through the nearly three hundred boxes in order to find the ones which were required. Similarly, it was very difficult to add, remove, or otherwise alter the list of features, and such alterations are essential as one’s sense of which features should be recorded alters with experience. Instead, a second table was created which simply contained hand-feature pairs, and a form created which contained nothing more than a drop-down list of hands and a drop-down list of features; this is shown in Figure 13 below.

Figure 13: Database form for the entry of features

§ 25    This proved to be very efficient for data-entry, and just over 17,000 hand-feature pairs were entered for the 466 hands. However, this specially developed format is not recognised by any generic classifier I am aware of, since those classifiers all expect input in the form of vectors. To this end, a second piece of software was developed which read in a file exported from the database, processed all of the hand-feature pairs for each hand, and then produced the database, header, and model files which AutoClass could then read. Facilities were also added to weight the data, to apply certain rules whereby the presence of a given feature could be inferred from another (for example, that horned a must also be flat-topped), and to produce histograms of features by date and location. Twenty-two of the 286 dimensions had only one unique value and so were ignored; the remainder were entered as discrete nominal values with the single multinomial model. Results at the time of writing have been somewhat disappointing as the AutoClass software has a strong tendency to group all of the manuscripts together in a single class. Somewhat better results have been obtained by reducing the number of features and considering only those which I had previously identified as being of greater significance but even this usually produces only two or three classes for the 466 different hands. Indeed, the most useful approach so far has been to abandon automatic classification entirely and to build forms into the database which allow an expert user to search directly for different features, or to obtain the features which are found in scribal hands from a given location. [21] Examples of these forms are shown in Figures 14 and 15 below.

Figure 14: Searching for scribal hands by letter-form. Note that eleven of the sixteen hands with the features indicated are associated either with Southeast England or with Ælfric, who was at Cerne Abbas, Dorset, but had close links to Christ Church, Canterbury (CaCC).

Figure 15: Searching for letter-forms and scribal hands by location. The form tells us that 56 hands can be localised to Worcester or York, of which 44 show wedged ascenders, 40 show round c, 37 show horizontal minim-feet, and so on.
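The two kinds of search illustrated in Figures 14 and 15 can be approximated in a few lines. The following is a sketch only, with invented records and hypothetical function names; it does not reproduce the database forms themselves.

```python
from collections import Counter

# Hypothetical records: each hand has a location and a set of features.
hands = {
    "Hand A": {"location": "Worcester", "features": {"wedged ascenders", "round c"}},
    "Hand B": {"location": "York", "features": {"wedged ascenders"}},
    "Hand C": {"location": "Canterbury", "features": {"round c", "horned a"}},
}


def hands_with(hands, required):
    """Return the hands which show every feature in `required`."""
    required = set(required)
    return sorted(name for name, h in hands.items()
                  if required <= h["features"])


def feature_counts(hands, locations):
    """Count how many hands from the given locations show each feature."""
    selected = [h for h in hands.values() if h["location"] in locations]
    counts = Counter(f for h in selected for f in h["features"])
    # most_common() lists the most frequent features first.
    return len(selected), counts.most_common()


by_letter_form = hands_with(hands, ["round c"])
total, by_location = feature_counts(hands, {"Worcester", "York"})
```

The first function corresponds to the search by letter-form, the second to the search by location, which reports the number of localised hands followed by the features ranked by frequency.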

§ 26    An approach such as this is useful but it is very time-consuming both to build the corpus and to search it. It also depends very heavily on all parties using the same terminology when describing letter-forms, but no such standard terminology yet exists (Bischoff 1954; Derolez 2003, 13-24). It is also almost impossible for a person to assess the relative significance of each of the 286 dimensions and to judge which combination of features would produce the best results. For these reasons a fully automated approach such as AutoClass may seem preferable. However, the difficulties of terminology still apply, since the same terms are used to build the underlying data. Similarly, the relative significance of features remains important in automated classification since, as discussed above, the results improved when a relatively small number of features was entered into the classifier, these features having been predetermined through “traditional” palaeographical research. However, this human intervention eliminates one of the primary advantages of using a computer, namely the ability to assess very many different elements at once. Furthermore, as observed above, it is an interesting question how far a computer-based approach might reveal new relationships and significant features which have not hitherto been considered by palaeographers. AutoClass itself reports what it considers to be the relative significance of features, and this information could be of use not only to reduce the volume of data which is entered, but also to provide clues to the palaeographer regarding which features should be considered. However, the software can only consider the data which it is given, and if the human user filters out much of this information beforehand then he or she denies this possibility to the machine.
A hierarchical classifier may produce better results, since the data can naturally be organised as hands within scribes within scriptoria within scripts, but this again imposes a structure which may or may not be valid, and indeed in the late Anglo-Saxon period the evidence seems quite clear that such a structure did not exist. [22] Ultimately, however, it is perhaps unreasonable to expect any software to produce useful results without a great deal of effort and experimentation, given such a complex data-set. As noted above, the discovery of structure is successful when implemented as a process rather than a one-off attempt.
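One simple way to explore hierarchical structure in data of this kind is agglomerative clustering, which produces the sequence of merges that a dendrogram displays. The sketch below is offered only as an illustration and rests on several assumptions: it uses average-linkage clustering with Jaccard distance between feature sets, which is not necessarily the hierarchical classifier envisaged above and has nothing to do with AutoClass, and the hands and features are invented.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two feature sets (0 if both are empty)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def agglomerate(hands):
    """Average-linkage clustering of hands by their feature sets.

    Returns the merges in order as (members, members, distance).
    """
    clusters = [frozenset([name]) for name in sorted(hands)]
    merges = []

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(hands[x], hands[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical sample data.
hands = {
    "Hand A": {"horned a", "flat-topped a", "wedged ascenders"},
    "Hand B": {"horned a", "flat-topped a"},
    "Hand C": {"round c", "wedged ascenders"},
}
merges = agglomerate(hands)
```

Here Hands A and B, which share two of their three features, are merged first, and Hand C joins them only at a much greater distance. Whether such a tree reflects any historical reality is precisely the question of validity raised above.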

Concluding Remarks

§ 27    The above discussion has concentrated on only two of the many possible ways in which research in apparently unrelated fields can be applied to palaeography. As I have already suggested, these applications all require careful thought and no small effort to ensure that they are carried out appropriately; technology provides tools rather than magical solutions, and no tool is useful unless it is properly used. Similarly, I do not think that computer-based approaches can or should replace traditional methods of palaeography; instead, the technology enables new approaches which provide different types of evidence for subsequent (human) interpretation. With proper care, these approaches can make significant contributions to our understanding of medieval palaeography and are certainly here to stay. Indeed, I can think of no better conclusion than Gumbert’s rephrasing of Bischoff’s well-known line: “palaeography, and codicology, which are arts of seeing and feeling, are now, happily, in the process of becoming also arts of measurement” (Gumbert 1998, 404).


[1] . I follow Malcolm Parkes here in distinguishing between “script”, “the model which the scribe has in mind’s eye when he writes”, and “(scribal) hand,” “what he actually puts down on the page.” See Parkes 1969, xxvi.

[2] . Examples include scribes deliberately distinguishing between English Vernacular minuscule and Anglo-Caroline minuscule, for which see especially Dumville 1988, 53-54; Dumville 1993, particularly 152-54; and Dumville 2001, 9.

[3] . For examples applied to late Anglo-Saxon script, see Ker 1957, especially xxv-xxxiii; Dumville 1987; Dumville 1994; and Dumville 1993; note also Derolez 2003, 6-9.

[4] . For some examples of such definitions, see note 2, above.

[5] . Neil Ker, for example, referred to a “characteristic” mark of punctuation used by the “Hemming” scribe but which is actually found in the work of several other scribes; see Stokes forthcoming.

[7] . The number of large-scale projects to digitise entire manuscripts or even libraries is increasing rapidly. For some examples see Codices Electronici Ecclesiae Coloniensis, Codices Electronici Sangallenses, Irish Script on Screen, Early Manuscripts at Oxford University, the Árni Magnússon Institute of Iceland, and Parker on the Web.

[8] . A minim is the basic short vertical stroke used to form many letters: the letter i is formed with one minim, n with two, and m with three. An ascender is the component of a letter which reaches above minim-height and is found in letters such as l, h, and b. A descender is that which reaches below the line of writing as found in letters like p and q.

[9] . Histograms of 100 bins were used for the run-length and autocorrelation metrics, and edge-fragments of four pixels were used for the edge and hinge directions: each script was therefore represented as an 888-dimensional vector of positive real values.

[10] . Bulacu and Schomaker tested Hamming, Minkowski up to fifth order, Hausdorff, χ², and Bhattacharyya functions to measure distance. Although they did not provide details of these tests, they have asserted that “only best-performing distance functions” were used in the final results, and their tables include only χ² and Euclidean distances. The same two functions were used in their second paper, but were applied to different features. See Schomaker, Bulacu, and van Erp 2003, 546, and compare Bulacu and Schomaker 2003, 3.

[11] . The images were processed by applying an isodata threshold, the MorphologicalRange function in DIPlib with filter parameters of 3, thresholding at 80, and then obtaining the Euclidean skeleton with the end-pixel condition set to “natural”. For these functions see van Kempen et al. 2003, 321, 415-16, and 139-40.

[12] . For an overview of some of the recent developments in this field, see the website of the 9th International Conference on Document Analysis and Recognition, and especially Schomaker et al. 2007.

[13] . For two such approaches, see Gumbert 1976, and Davis 1998.

[14] . Discussions of such principles have been given by MacKay 2003, esp. 300, for maximum-likelihood; Hanson, Stutz, and Cheeseman 1991 and Stutz and Cheeseman 1996 for Bayesian classification; and Zurada 1992, for neural networks.

[15] . For a similar definition see Zurada 1992, 56-58, among others.

[16] . For the use of supervised learning in a similar context see Terras 2006.

[17] . See also Hanson, Stutz, and Cheeseman 1991, and the project website.

[18] . The software was configured with a zero-point of 0, a relative error of 0.02, and the Single Normal CN model. An explanation of these settings is given in the preparation-c.text and models-c.text files which are included in the AutoClass distribution.

[19] . For a full discussion of the hands and features see Stokes 2005.

[20] . The database has not yet been made publicly available but the content is derived from Gneuss 2001, Ker 1957, Sawyer 1968, and my own research. A detailed discussion of the hands, and results obtained from the database, can be found in Stokes 2005.

[21] . For the results of this analysis see Stokes 2005, and for a similar approach but with a very different interface see the palaeographic catalogue in the MANCASS C11 database.

[22] . I am indebted to Prof. David MacKay for this suggestion. For an example of such a classifier see the dendrogram presented by Ciula 2005. For the lack of organisation in Anglo-Saxon script see especially Ker 1985, 34.

Works cited

Arazi, B. 1977. Handwriting identification by means of run-length measurements. IEEE Transactions on Systems, Man, and Cybernetics SMC-7, no. 12:878-81.

Bischoff, Bernhard. 1954. Nomenclature des écritures livresques du IXe au XIIIe siècle. In Nomenclature des écritures livresques du IXe au XVIe siècle, edited by B. Bischoff, G. I. Lieftinck and G. Battelli, 7-14. Paris: Centre National de la Recherche Scientifique.

Bischoff, Bernhard. 1990. Latin palaeography: Antiquity and the middle ages. Translated by D. Ó Cróinín and D. Ganz. Cambridge: Cambridge University Press.

Brown, Michelle P. 1990. A guide to western historical scripts from antiquity to 1600. London: British Library.

Bulacu, Marius, and Lambert Schomaker. 2003. Writer style from oriented fragments. In Proceedings of the Tenth International Conference on Computer Analysis of Images and Patterns (Groningen - The Netherlands, August), 460-469.

Bulacu, Marius, Lambert Schomaker, and Louis Vuurpijl. 2003. Writer-identification using edge-based directional features. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (Edinburgh - Scotland, August), 2:937-941.

Ciula, Arianna. 2005. Digital palaeography: Using the digital representation of medieval script to support palaeographic analysis. Digital Medievalist 1.

Costamagna, Giorgio, Françoise Gasparri, Léon Gilissen, et al. 1995 and 1996. Commentare Bischoff. Scrittura e Civiltà 19:325-48 and 20:401-7.

Davis, Lisa Fagin. 1998. Towards an automated system of script classification. Manuscripta 42:193-201.

Derolez, Albert. 2003. The palaeography of gothic manuscript books from the twelfth to the early sixteenth century. Cambridge: Cambridge University Press.

Dumville, David N. 1987. English square minuscule script: The background and earliest phases. Anglo-Saxon England 16:147-179.

Dumville, David N. 1988. Beowulf come lately: Some notes on the palaeography of the Nowell Codex. Archiv für das Studium der neueren Sprachen und Literaturen 225:49-63.

Dumville, David N. 1993. English caroline script and monastic history: Studies in benedictinism, A.D. 950-1030. Woodbridge: Boydell.

Dumville, David N. 1994. English square minuscule script: The mid-century phases. Anglo-Saxon England 23:133-164.

Dumville, David N. 2001. Specimina codicum palaeoanglicorum. In Kansai university collection of essays in commemoration of the 50th anniversary of the Institute of Oriental and Occidental Studies, 1-24. Suita, Osaka.

Gneuss, Helmut. 2001. Handlist of Anglo-Saxon Manuscripts: A List of Manuscripts and Manuscript Fragments Written or Owned in England up to 1100. Tempe, AZ: Arizona Center for Medieval and Renaissance Studies.

Gumbert, J. P. 1976. A proposal for a cartesian nomenclature. In Essays presented to G. I. Lieftinck, IV: Miniatures, scripts, collections, edited by J. P. Gumbert and M. J. M. de Haan, 45-52. Amsterdam: Van Gendt.

Gumbert, J. P. 1998. Commentare “Commentare Bischoff”. Scrittura e Civiltà 22:397-404.

Hanson, Robin, John Stutz, and Peter Cheeseman. 1991. Bayesian classification theory: Technical report FIA-90-12-7-01. NASA.

Ker, Neil R. 1957. Catalogue of manuscripts containing Anglo-Saxon. Oxford: Clarendon.

Ker, Neil R. 1985. Books, collectors and libraries: Studies in medieval heritage. London: Hambledon.

MacKay, David J. C. 2003. Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.

Parkes, Malcolm. 1969. English cursive book hands, 1250-1500. Oxford: Clarendon.

Pratesi, Alessandro. 1998. Commentare Bischoff: Un secondo intervento. Scrittura e Civiltà 22:405-8.

Sawyer, P. H. 1968. Anglo-Saxon charters: An annotated list and bibliography. London: Royal Historical Society. Revised electronic version by R. Rushforth, S. Kelly, S. Miller et al. available online.

Schomaker, Lambert, Marius Bulacu, and Merijn van Erp. 2003. Sparse-parametric writer identification using heterogeneous feature groups. In Proceedings of the International Conference on Image Processing (Barcelona - Spain, September), 1:545-548.

Schomaker, Lambert, Marius Bulacu, and Merijn van Erp. 2007. Advances in writer identification and verification. Keynote paper delivered to the 9th International Conference on Document Analysis and Recognition. Curitiba: ICDAR.

Spunar, Pavel. 1976. Palaeographical difficulties in defining an individual script. In Essays presented to G. I. Lieftinck, IV: Miniatures, scripts, collections, edited by J. P. Gumbert and M. J. M. de Haan, 62-68. Amsterdam: Van Gendt.

Srihari, Sargur N. 2001. Handwriting identification: Research to study validity of individuality of handwriting and develop computer-assisted procedures for comparing handwriting. Buffalo, NY: Center of Excellence for Document Analysis and Recognition.

Srihari, Sargur N. 2003. Quantitative assessment of handwriting individuality [Powerpoint Presentation]. CEDAR.

Srihari, Sargur N., Sung-Hyuk Cha, Hina Arora, and Sangjik Lee. 2002. Individuality of handwriting. Journal of Forensic Science 47:1-17.

Stokes, Peter A. 2005. English vernacular script ca 990–ca 1035. Cambridge: unpublished Ph.D. dissertation.

Stokes, Peter A. Forthcoming. The “Vision of Leofric”: manuscript, text, and content. Peritia.

Stutz, John, and Peter Cheeseman. 1996. Bayesian classification (Autoclass): Theory and results. In Advances in knowledge, discovery and data mining, edited by U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, 61-83. Cambridge, MA: MIT Press.

Terras, Melissa. 2006. Image to interpretation: An intelligent system to aid historians in reading the Vindolanda Texts. Oxford: Oxford University Press.

van Ginkel, Michael, and Geert van Kempen. 2004. DIPimage and DIPlib.

van Kempen, Geert, Michael van Ginkel, Cris L. Luengo Hendriks, and Lucas J. van Vliet. 2003. DIPlib function reference. Delft: Delft University of Technology.

Zhang, Bin, and Sargur N. Srihari. 2003. Binary vector dissimilarity measures for handwriting identification. Document Recognition and Retrieval 10:28-38.

Zurada, Jacek M. 1992. Introduction to artificial neural systems. St Paul: West Publishing.