Digital Medievalist 1 (2006). ISSN: 1715-0736.
© M. J. Driscoll 2005. Creative Commons Attribution-NonCommercial licence, 2.5

P5-MS: A general purpose tagset for manuscript description

Commissioned Project Report

Commissioning Editor: James Cummings, University of Oxford.
Received: November 25, 2005
Revised: January 6, 2006
Published: May 2, 2006

Abstract

This article discusses the new manuscript description module in TEI P5, looking in particular at how and why it differs from its immediate predecessors, the proposals made by the MASTER project, and the TEI Medieval Manuscripts Description Work Group (TEI-MMSS).

Keywords: manuscript description; electronic cataloguing; text encoding; metadata standards; XML.


Background

§ 1    The idea of using computers to provide greater access to medieval and other manuscript materials dates from the late 70s and early 80s, when a number of attempts were made to apply relational database technology to manuscript studies, in particular in the form of searchable electronic catalogues. Unfortunately, but also understandably, these projects generally relied on locally developed or proprietary software, with all the problems for long-term maintenance and interoperability that entails. Moreover, each system also had its own standards in respect of the nature and amount of information included, the order and way in which this information was presented and so on, reflecting the lack of even national standards for manuscript description. Although the results were frequently impressive, we were still a very long way from the seamless union catalogue of European medieval manuscripts now envisaged by manuscript scholars—some of them at least. [1]

§ 2    In the mid-nineties the advent of Standard Generalized Markup Language (SGML) and the World Wide Web gave new impetus to work on electronic manuscript cataloguing. At the same time, developments in digital imaging meant that manuscript holding institutions could provide an unprecedented degree of access to their holdings while diminishing their actual use: preservation and access in one. And with the rise of large digital collections came an increased awareness of the central importance of metadata standards.

§ 3    In 1996 the Mellon Foundation provided funding for three collaborative projects, Electronic Access to Medieval Manuscripts (EAMMS <http://www.hmml.org/eamms/>), Digital Scriptorium (<http://sunsite3.berkeley.edu/Scriptorium/>) and, of less relevance here but still important, Sagnanet (<http://sagnanet.is>), all of which sought to develop mechanisms for providing online access to manuscript materials of various kinds. In Europe, meanwhile, a meeting was held in November 1996 at Studley Priory, near Oxford, organised by Peter Robinson of De Montfort University and Hope Mayo from the EAMMS project and attended by representatives from major manuscript holding institutions in Britain, France, the Netherlands, Denmark, Germany, the Czech Republic, and Italy, together with experts on MARC, the Berkeley Finding Aids project, the Text Encoding Initiative, and the Dublin Core. The Studley Priory meeting was followed by meetings of the EAMMS group at the Hill Monastic Manuscript Library in December 1996, and in November 1997 by a meeting at Columbia University which brought together many of the participants in the EAMMS, Digital Scriptorium, and (then still nascent) MASTER projects. These meetings confirmed that there was indeed not only a widespread awareness of the need for a standard, but also a fairly broad consensus as to what form that standard should take and what the appropriate technical means were to implement it.

The MASTER project and TEI-MMSS

§ 4    The version of the TEI Guidelines currently under construction, TEI P5, contains a major new chapter on manuscript description (Driscoll et al. 2005) (hereafter referred to as P5-MS). The tagset documented there is based chiefly on that developed by the EU-funded MASTER project and the TEI Medieval Manuscripts Description Work Group (TEI-MMSS). [2] Although the work of these two groups proceeded largely in tandem, with members of each attending the other's meetings and so on, and despite an avowed intention that a single set of recommendations should emerge from them, there were, in the end, significant discrepancies between the two.

§ 5    In some cases these discrepancies arose because one of the groups simply paid more attention to some aspect of manuscript description than the other did. The MASTER project, for example, never finalised its discussion on seals before the end of the project period, while TEI-MMSS did, whereas MASTER developed quite sophisticated mechanisms for dealing with personographies and bibliographical references, an area largely untouched by the Work Group. In this sense the two schemes could be said to complement each other. There were, however, also discrepancies between the two which seemed to reflect a fundamental difference of opinion as to what the tagset should be used for and by whom. Thus TEI-MMSS, which consisted principally of librarians and cataloguers, seemed primarily concerned with the practicalities of manuscript cataloguing, and in particular with the accommodation of existing (legacy) data, while the MASTER project, which consisted principally of manuscript scholars, seemed more interested in determining the underlying structure of manuscript descriptions in a more general, theoretical way. (I do not want here to make too much of the distinction manuscript librarian vs. manuscript scholar for the simple reason that, while the opposite may not be true, most manuscript librarians are also manuscript scholars; I only wish to point out how an underlying practical vs. a theoretical orientation might not surprisingly lead to a somewhat different result.)

Legacy data

§ 6    The issue of legacy data is real enough, since most manuscript collections will have some form of catalogue (a printed book or card catalogue) and only very rarely is one required to describe a group of manuscripts ex nihilo. The existing catalogue may be of such great authority that one feels that nothing in it may be changed, and one may even want to reproduce as closely as possible the physical appearance of the original; more commonly, one may simply lack the time, expertise, or funding necessary to recast and augment existing data, although one may still want to provide sufficient markup to facilitate some basic searching capabilities. On the other hand, the existing catalogue may be old and out of date, and one may wish to update the information it provides on the basis of more recent scholarship; one may also want to describe a manuscript, a group of manuscripts, or an entire collection in much greater detail than has been done before, and one may therefore feel in no way bound to reproduce or even acknowledge existing data. Then there are those who are not engaged in cataloguing at all, but who rather are interested in encoding codicological and/or philological data (whether from an existing catalogue or not) for other scholarly purposes. Ideally, a tagset for manuscript description should be able to accommodate the full range of eventualities. MASTER was reasonably good at catering to the latter type of user but was too inflexible to be of much use to those who were either not able or not of a mind to modify existing data, but were still interested in providing wider access to it.

§ 7    The solution proposed by TEI-MMSS was to loosen up the DTD, essentially allowing the sub-elements of <msDescription> to occur in any order and to repeat, and also to occur within running prose. While this did undeniably ensure greater flexibility, the opportunities for abuse and illogicalities it opened up were worrying: things can, after all, become so flexible that they lose any kind of structural integrity.

TEI task force

§ 8    Recognising both the failure of MASTER to deal satisfactorily with issues such as legacy data and the inadequacy of the solutions proposed by TEI-MMSS, the TEI Council in 2002 appointed a special task force whose job it was "to review the current state of TEI-based recommendations for the detailed description of manuscript materials [...], to identify and define a common subset of those recommendations adequate to the needs of the TEI community [and] to document that set of recommendations in such a way as to facilitate their inclusion in TEI P5, once approved by the TEI Council." [3] Because the task force was able also to take into account the actual experience of the many projects using MASTER as well as complementary work done by other agencies, notably the Repertorium of Old Bulgarian Literature and Letters project (<http://clover.slavic.pitt.edu/~repertorium/>), the result was not simply a common subset of the two schemes, but rather a significant improvement on both.

§ 9    Initially the task force thought of proposing two alternative elements, along the lines of <bibl> vs. <biblStruct> or <entry> vs. <entryFree> , which would allow the user a choice between a structured and an unstructured <msDescription> . It was fairly quickly determined that this was not the ideal solution, however, since one might well want or need to begin with unstructured data to which one could add structure at a later time. The best way to achieve this, it was decided, was instead to offer the choice between unstructured data, in the form of simple paragraphs ( <p> ) and structured data, in the form of special purpose elements, at every level in the description.

§ 10    The first-level children of <msDescription> are the following:

  • <msIdentifier> : information which uniquely identifies the manuscript, i.e. its location, holding institution, and shelfmark.
  • <head> (formerly <msHeading> ; see below): a brief description of the manuscript, for example a uniform or supplied title, information on place and date of origin, and the language or languages of the contents.
  • <msContents> : information on the intellectual content of the manuscript or manuscript part.
  • <physDesc> : information concerning physical aspects of the manuscript or manuscript part, such as its material, size, format, script, decoration, binding, etc.
  • <history> : information on the history of the manuscript or manuscript part, its origin, provenance, and acquisition by its holding institution.
  • <additional> : administrative information relating to its availability, custodial history, surrogates, etc.
  • <msPart> : in essence a nested <msDescription> , to be used for composite manuscripts now regarded as constituting a single unit but made up of two or more parts which were originally physically distinct.

Of these, only <msIdentifier> is required. Following this, one has the option of providing a heading and then either one or more paragraphs, marked up as a series of <p> elements, or one or more of the specialised elements listed above (which are all optional, but if used may only appear once and only in the order given). Within each of these elements there is again a choice between paragraphs or a number of specialised sub-elements; <msContents> , for example, may consist either of one or more paragraphs or one or more <msItem> elements; each of these <msItem> elements may in turn contain either paragraphs or specific elements for <rubric> , <incipit> , <explicit> , and <colophon> , as well as the standard TEI elements <author> , <title> , <respStmt> , and <bibl> . In this way, it was felt, a single mechanism could be provided which would be flexible enough to deal with everything from raw legacy data to highly structured original descriptions.

Examples: MS. Add. A. 61

§ 11    To take a simple example, here is a short description of a manuscript, chosen more or less at random from the Bodleian Library's Summary catalogue (Madan et al. 1895-1953, 5: 515).

Figure 1: Description of Oxford, Bodleian, MS. Add. A. 61 in Madan et al. 1895-1953

A simple conversion

§ 12     If one wanted to put this catalogue entry into machine readable form, but was not interested in or was unable to add any further markup, one could begin by either scanning or keying in the text, providing the necessary information for the mandatory <msIdentifier> element, and then simply wrapping the three paragraphs of prose in <p> elements, as in the following example:

<msDescription>
  <msIdentifier>
    <settlement> Oxford </settlement>
    <repository> Bodleian Library </repository>
    <idno> MS. Add. A. 61 </idno>
    <altIdentifier type="SC">
      <idno> 28843 </idno>
    </altIdentifier>
  </msIdentifier>
  <p> In Latin, on parchment: written in more than one hand of the 13th cent. in England: 7¼ x 5⅜ in., i + 55 leaves, in double columns: with a few coloured capitals. </p>
  <p> 'Hic incipit Bruitus Anglie,' the De origine et gestis Regum Angliae of Geoffrey of Monmouth (Galfridus Monumetensis): beg. 'Cum mecum multa &amp; de multis.' </p>
  <p> On fol. 54v very faint is 'Iste liber est fratris guillelmi de buria de ... Roberti ordinis fratrum Pred[icatorum],' 14th cent. (?): 'hanauilla' is written at the foot of the page (15th cent.). Bought from the rev. W. D. Macray on March 17, 1863, for £1 10s. </p>
</msDescription>

§ 13    A simple conversion of this kind would take no more than a few minutes if done by hand, and could be largely automated. Should one wish to approximate more closely the appearance of the printed text, the rend attribute could be used on the three <p> elements, with appropriate values; alternatively, the first paragraph could be tagged as a <head> and the third as a <note> (and moved inside the previous, and now only, <p> ). The result would, with a suitable style sheet, be displayable in a browser; it would not, however, be very useful for search purposes (other than searches for shelfmark and Summary catalogue running number).
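
The second of these alternatives might look roughly as follows. This is only a sketch of one possible arrangement: the wording is taken unchanged from the printed entry, and placing the provenance and acquisition information in a single <note> at the end of the remaining paragraph is an assumption made here for illustration.

```xml
<msDescription>
  <msIdentifier>
    <settlement> Oxford </settlement>
    <repository> Bodleian Library </repository>
    <idno> MS. Add. A. 61 </idno>
    <altIdentifier type="SC">
      <idno> 28843 </idno>
    </altIdentifier>
  </msIdentifier>
  <!-- first paragraph of the printed entry, now serving as the heading -->
  <head> In Latin, on parchment: written in more than one hand of the 13th cent. in England: 7¼ x 5⅜ in., i + 55 leaves, in double columns: with a few coloured capitals. </head>
  <!-- second paragraph, with the former third paragraph moved inside it as a note -->
  <p> 'Hic incipit Bruitus Anglie,' the De origine et gestis Regum Angliae of Geoffrey of Monmouth (Galfridus Monumetensis): beg. 'Cum mecum multa &amp; de multis.'
    <note> On fol. 54v very faint is 'Iste liber est fratris guillelmi de buria de ... Roberti ordinis fratrum Pred[icatorum],' 14th cent. (?): 'hanauilla' is written at the foot of the page (15th cent.). Bought from the rev. W. D. Macray on March 17, 1863, for £1 10s. </note>
  </p>
</msDescription>
```

As with the plain-paragraph version, nothing here beyond the shelfmark and the Summary catalogue number is individually searchable; the gain is purely one of presentation.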

Rich conversion

§ 14     In order to provide slightly richer markup, one could wrap the paragraphs in the appropriate special-purpose first-child-level elements of <msDescription> and add some of the phrase-level elements available when the manuscript description module is in use. Doing so necessitates some slight reorganisation of the data vis-à-vis the original printed source, but no actual rewriting of the text. Now, however, one would be able to search specifically for title, material, and date and place of origin.

<msDescription>
   <msIdentifier>
     <settlement> Oxford </settlement>
     <repository> Bodleian Library </repository>
     <idno> MS. Add. A. 61 </idno>
     <altIdentifier type="SC">
       <idno> 28843 </idno>
     </altIdentifier>
   </msIdentifier>
   <msContents>
     <p> <q> Hic incipit Bruitus Anglie, </q> the <title> De origine et gestis Regum Angliae </title> of Geoffrey of Monmouth (Galfridus Monumetensis): beg. <q> Cum mecum multa &amp; de multis. </q> In Latin. </p>
   </msContents>
   <physDesc>
     <p> <material> Parchment </material> : written in more than one hand: 7¼ x 5⅜ in., i + 55 leaves, in double columns: with a few coloured capitals. </p>
   </physDesc>
   <history>
    <p> Written in <origPlace> England </origPlace> in the <origDate> 13th cent. </origDate> On fol. 54v very faint is <q> Iste liber est fratris guillelmi de buria de ... Roberti ordinis fratrum Pred[icatorum], </q> 14th cent. (?): <q> hanauilla </q> is written at the foot of the page (15th cent.). Bought from the rev. W. D. Macray on March 17, 1863, for £1 10s. </p>
   </history>
</msDescription>

Full restructuring

§ 15    One could also restructure the entire entry, using the full range of elements and sub-elements available within <msDescription> .

<msDescription>
   <msIdentifier>
     <settlement> Oxford </settlement>
     <repository> Bodleian Library </repository>
     <idno> MS. Add. A. 61 </idno>
     <altIdentifier type="SC">
       <idno> 28843 </idno>
     </altIdentifier>
   </msIdentifier>
   <msContents>
     <msItem>
       <author xml:lang="en"> Geoffrey of Monmouth </author>
       <author xml:lang="la"> Galfridus Monumetensis </author>
       <title type="uniform"> De origine et gestis Regum Angliae </title>
       <rubric> Hic incipit Bruitus Anglie </rubric>
       <incipit> Cum mecum multa &amp; de multis </incipit>
       <textLang mainLang="la"> Latin </textLang>
     </msItem>
   </msContents>
   <physDesc>
     <objectDesc form="codex">
       <supportDesc material="perg">
         <support>
           <p> Parchment. </p>
         </support>
         <extent> i + 55 leaves <dimensions scope="all" type="leaf" unit="inch">
             <height> 7¼ </height>
             <width> 5⅜ </width>
           </dimensions>
         </extent>
       </supportDesc>
       <layoutDesc>
         <layout columns="2">
           <p> In double columns. </p>
         </layout>
       </layoutDesc>
     </objectDesc>
     <handDesc>
       <p> Written in more than one hand. </p>
     </handDesc>
     <decoDesc>
       <p> With a few coloured capitals. </p>
     </decoDesc>
   </physDesc>
   <history>
     <origin>
       <p> Written in <origPlace> England </origPlace> in the <origDate notAfter="1300" notBefore="1200"> 13th cent. </origDate> </p>
     </origin>
     <provenance>
       <p> On fol. 54v very faint is <q> Iste liber est fratris guillelmi de buria de <gap/> Roberti ordinis fratrum Pred <expan> icatorum </expan> </q> , 14th cent. (?): <q> hanauilla </q> is written at the foot of the page (15th cent.). </p>
     </provenance>
     <acquisition>
       <p> Bought from the rev. <name type="person"> W. D. Macray </name> on <date value="1863-03-17"> March 17, 1863 </date> , for £1 10s. </p>
     </acquisition>
   </history>
</msDescription>

§ 16    Note that here again it is largely a question of cutting and pasting sections of text from the original; there has been no rewriting of the text as such, although one might well at this stage wish to do so, extracting the data and updating and supplementing it as required. And clearly this is the most sensible thing to do: extract the data rather than worrying about the exact wording—which is precisely what the Bodleian have done in their electronic catalogue:

Figure 2: Electronic catalogue entry for Oxford, Bodleian, MS. Add. A. 61.

TEI-MMSS encoding

§ 17    What one may not do is the following, viz. take the pre-existing text as it comes and mark it up with the relevant parts of <msDescription> :

<msDescription>
   <msIdentifier>
     <altName rend="bold" type="SC"> 28843. </altName>
   </msIdentifier>
   <msContents>
     <p> In <textLang langKey="la"> Latin </textLang> </p>
   </msContents>
   <physDesc>
     <support>
       <p> on parchment </p>
     </support>
     <msWriting>
       <p> written in more than one hand </p>
     </msWriting>
   </physDesc>
   <history>
     <origin>
       <p> of the <origDate> 13th cent. </origDate> in <origPlace> England </origPlace> </p>
     </origin>
   </history>
   <physDesc>
     <dimensions> 7¼ x 5⅜ in. </dimensions>
     <extent> i + 55 leaves </extent>
     <layout>
       <p> in double columns </p>
     </layout>
     <decoration>
       <p> with a few coloured capitals. </p>
     </decoration>
   </physDesc>
   <msContents>
     <msItem> <rubric> Hic incipit Bruitus Anglie </rubric> , the <title type="uniform"> De origine et gestis Regum Angliae </title> of <author> Geoffrey of Monmouth (Galfridus Monumetensis) </author> : beg. <incipit> Cum mecum multa &amp; de multis </incipit> </msItem>
   </msContents>
   <history>
     <provenance>
       <p> On fol. 54v very faint is 'Iste liber est fratris guillelmi de buria de ... Roberti ordinis fratrum Pred[icatorum]', 14th cent. (?): 'hanauilla' is written at the foot of the page (15th cent.). </p>
     </provenance>
     <acquisition>
       <p> Bought from the rev. W. D. Macray on March 17, 1863, for £1 10s. </p>
     </acquisition>
     <p> Now <msIdentifier type="primary">
         <idno> MS. Add. A. 61 </idno>
       </msIdentifier> . </p>
   </history>
</msDescription>

§ 18    Marking up legacy data in this way, which is what was proposed by TEI-MMSS (the example above parses against their DTD), is more time consuming than the cut and paste method, but requires an equal degree of familiarity with manuscript description conventions on the part of the person doing the conversion, who must be able to identify which parts of the text pertain to content, physical description, history, and so on. More importantly, flattening out the hierarchical structure of <msDescription> in this way leads to serious illogicalities, such as in the first section, where no fewer than six paragraphs are opened and closed within what is arguably a single sentence.

§ 19    As was said before, the relative inflexibility of MASTER made it less than ideal for those working with large amounts of legacy data and a tight budget and/or timetable, that is to say probably the majority of electronic cataloguing projects today, while the complete lack of constraints in the system proposed by TEI-MMSS made it unsuitable for those wishing to have more rigorously structured data. The solution put forward in P5-MS, it is hoped, allows the needs of the full range of potential users to be accommodated.

Major innovations in P5-MS

§ 20    As mentioned above, P5-MS represents not simply an amalgamation of the schemes proposed by MASTER and TEI-MMSS, but rather, we believe, a significant improvement on both. Listed below are the major innovations in P5-MS, along with a brief discussion of the thinking behind them.

Attributes on <msDescription>

§ 21    In MASTER and TEI-MMSS there were type and status attributes on <msDescription> (and <msPart> ). The intention with the former was to distinguish at the very top (as it were) between manuscripts proper and archival material (charters etc.). In P5-MS, however, the form attribute on the new grouping element <objectDesc> (discussed further below) provides the same information. The status attribute, on the other hand, was defined as specifying the compositional status of a manuscript or manuscript part and had in the MASTER scheme the possible values uni|compo|frag|def|unknown. TEI-MMSS proposed that these should be split in two, as a manuscript can be both composite and defective, with a status attribute the possible values of which were frag|def|unk, and a composite attribute with possible values y|n|u. Over the years it had been pointed out several times that, for one thing, status should probably be called msStatus or something similar, since it refers to the status of the manuscript and not of the description, and for another that the distinction between composite and non-composite manuscripts is in any case inferable from the presence or absence of <msPart> elements; similarly, a manuscript which is defective will (or should) have defective="true" on <msContents> or <msItem> . Thus the only possible value of status which cannot be inferred from other places in the document is frag; this was always potentially problematic anyway, as the distinction between a fragmentary and a defective manuscript is somewhat arbitrary (a rule of thumb was that if the cataloguer thought it likely that more than 50% of the leaves were missing, the manuscript was a fragment, otherwise it was merely defective). It was decided therefore to drop both these attributes in P5-MS.
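
By way of illustration, the following sketch shows where the information formerly carried by type and status would now be recorded. The record is hypothetical: the bracketed values are placeholders rather than data from any real catalogue, and only attributes mentioned above (form on <objectDesc>, defective on <msItem>) are used.

```xml
<!-- Hypothetical record: all bracketed values are placeholders -->
<msDescription>
  <msIdentifier>
    <settlement> [settlement] </settlement>
    <repository> [repository] </repository>
    <idno> [shelfmark] </idno>
  </msIdentifier>
  <msContents>
    <!-- formerly expressed by status="def" on msDescription -->
    <msItem defective="true">
      <title> [title of the incomplete text] </title>
    </msItem>
  </msContents>
  <physDesc>
    <!-- formerly expressed by the type attribute on msDescription -->
    <objectDesc form="codex">
      <p> [prose description of support and layout] </p>
    </objectDesc>
  </physDesc>
</msDescription>
```

Since the description contains no <msPart> elements, the manuscript is understood to be non-composite; no attribute is needed to say so.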

§ 22    TEI-MMSS also proposed a dateAttrib attribute on the elements <msDescription> and <msPart>, with possible values dated|datable|unknown, the idea being to distinguish, again at the topmost level, between manuscripts which are dated internally, those which can be dated on the basis of other evidence, and those for which an approximate date has been assigned by a scholar on the basis, for example, of palaeographical or orthographical features. Recognising the central importance of these distinctions to manuscript scholarship, the task force initially decided to incorporate this attribute, even proposing an analogous placeAttrib attribute, with possible values localized|localizable|unknown. Here again, however, these attributes merely repeat at a higher level information for which there already is a place elsewhere in the document, namely the evidence attribute, available on <origDate> and <origPlace>, which has the possible values internal, meaning that the manuscript is formally dated or localised by the scribe, external, meaning that the manuscript is datable or localisable via inferred knowledge from some aspect of the book itself, and conjecture, meaning that in the absence of internal or external evidence an attribution of place or date for the manuscript has been made by the cataloguer or scholar on the basis of his or her expertise. A value of internal for evidence on <origDate> thus means dateAttrib="dated", a value of external means dateAttrib="datable", and conjecture means dateAttrib="unknown", and the same is true of the values of evidence on <origPlace>. For this reason it was decided also to drop these attributes, leaving only those which are globally available.
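
A minimal sketch of the evidence attribute in use, based on the <origin> of MS. Add. A. 61 as encoded above. The printed catalogue does not say on what grounds the date and place were assigned, so the value conjecture is an assumption chosen purely for illustration.

```xml
<origin>
  <!-- evidence values are illustrative assumptions, not taken from the catalogue -->
  <p> Written in <origPlace evidence="conjecture"> England </origPlace> in the
    <origDate evidence="conjecture" notBefore="1200" notAfter="1300"> 13th cent. </origDate> </p>
</origin>
```

A search for <origDate> elements with evidence="internal" would then retrieve the manuscripts which dateAttrib="dated" was meant to identify.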

§ 23    This does, of course, mean that one can only distinguish diplomas from codices, composite from non-composite manuscripts, or dated and datable manuscripts from undatable ones if one chooses the more structured option, which will not apply to those hoping (or forced) to make do with an absolute minimum of tagging. It was felt, however, that there should only be one set of mechanisms available for making such distinctions, and that those wishing to do so should make use of those mechanisms.

Contents and structure of <msIdentifier>

§ 24    In both MASTER and TEI-MMSS the sub-elements available within <msIdentifier> were <country>, <region>, <settlement>, <institution>, <repository>, <collection>, <idno>, and <altName>, which was intended for a former shelfmark or some name other than the shelfmark by which a manuscript is known. In MASTER it was decided that three of these, <settlement>, <repository>, and <idno>, should be required, since they provide what is, by common consent, the minimum amount of information necessary to identify a manuscript uniquely: place, repository, shelfmark. TEI-MMSS argued that all the sub-elements of <msIdentifier> should be optional and repeatable, and that <msIdentifier> itself should be allowed to repeat, as there are cases, for example, where there are two legal owners of the same manuscript, where a collection legally exists in more than one city, where a repository only has one manuscript, or only one of any significance, which has no call number as such but is instead known by one or more names, or where the manuscript was formerly owned by someone else, or had another shelfmark. TEI-MMSS also proposed that <msIdentifier> should be used when referring to another manuscript within a description, and pointed out that in such cases the sub-elements might appear in varied order and interspersed with plain prose. A type attribute was proposed, with the values primary|former|cited|msPart, to keep these various uses separate. The task force felt quite strongly that there should only be one <msIdentifier> per <msDescription>, i.e. that pertaining to the manuscript being described, and that most of the scenarios envisaged by TEI-MMSS could be dealt with either by existing means or through the introduction of an <altIdentifier> element, in effect a nested <msIdentifier>, which could be used for former shelfmarks or other alternative forms of identification, such as the running number of a printed catalogue, as in the example from the Bodleian Library Summary catalogue given above. <altName>, renamed <msName>, is still available for nicknames. One of the cases mentioned by TEI-MMSS, that of a manuscript known only by a name but with no shelfmark as such, required a more radical solution, as did another case, not mentioned by TEI-MMSS but brought to the attention of the task force, namely that of scattered manuscripts, that is to say manuscripts which have been split up but which are nevertheless treated by the scholarly community as single units. One well-known example of this is the Old Church Slavonic manuscript known as Codex Suprasliensis, parts of which are found in three separate repositories, in Ljubljana, Warsaw, and St. Petersburg. This could be dealt with using <msName> followed by a series of <altIdentifier> elements, with an appropriate value on the type attribute to indicate the nature of the relationship between them.

<msIdentifier>
   <msName type="nickname" xml:lang="la"> Codex Suprasliensis </msName>
   <altIdentifier type="partial">
     <settlement> Ljubljana </settlement>
     <repository> Narodna in univerzitetna knjiznica </repository>
     <idno> MS Kopitar 2 </idno>
   </altIdentifier>
   <altIdentifier type="partial">
     <settlement> Warszawa </settlement>
     <repository> Biblioteka Narodowa </repository>
     <idno> BO 3.201 </idno>
   </altIdentifier>
   <altIdentifier type="partial">
     <settlement> Sankt-Peterburg </settlement>
     <repository> Rossiiskaia natsional'naia biblioteka </repository>
     <idno> Q.p.I.72 </idno>
   </altIdentifier>
</msIdentifier>

<msHeading> replaced by <head>

§ 25    The <msHeading> element was used in MASTER/TEI-MMSS in order to provide a short summary description of a manuscript, sometimes called a tombstone, such as might be displayed or printed as a heading to a catalogue description; it was made clear by both groups that <msHeading> was not intended to stand in place of a proper description, and that the elements internal to it (<author> etc.) were not there for search purposes. Unable to see any significant difference between <msHeading> and the standard TEI <head> element, the task force decided to replace the former with the latter. The content model of <head> allows for phrase-level elements only, which in P5-MS include <title>, <origDate>, <origPlace>, and <note>; of the elements in the old content model, <author>, <respStmt>, and <textLang> are thus no longer available (although <head> may also contain <bibl>, which may in turn contain <author> and <respStmt>). But, as was said, since this element is intended only to provide a heading for a manuscript description, rather than the description itself, structured information on contents, date and place of origin, language, and so on should be given under the appropriate elements. Here, for instance, is an <msHeading> according to MASTER/TEI-MMSS:

<msHeading>
   <title> Apocalypse with Commentary </title>
   <origPlace> Spain/Portugal </origPlace>
   <origDate notAfter="1300" notBefore="1200"> s. XIII </origDate>
   <textLang langKey="LAT"> Latin </textLang>
</msHeading>

The corresponding <head> element in P5-MS is:

<head>
   <title> Apocalypse with Commentary </title> ; Spain/Portugal, s. XIII, Latin.
</head>

Note that in the P5-conformant example only <title> has been used, chiefly for rendering purposes.

Two kinds of <msItem>

§ 26    There are now two forms of the <msItem> element, one still called <msItem>, the content model of which is essentially the same as that of the old MASTER/TEI-MMSS element, and a new <msItemStruct>, which can be used for a more rigorously structured description; the contents of the two are identical and the only difference between them is that in the former the order of the elements is free whereas in the latter it is constrained. Note that the <finalRubric> element, originally present in MASTER but subsequently removed, has been reinstated in both (although in <msItem> it would in theory also be possible to use <rubric type="final">). A new element, <filiation>, has been introduced (borrowed from the Repertorium project) to provide a place for information concerning the manuscript's relationship to other surviving witnesses of the same text, its protographs, antigraphs, and apographs.
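
The following is a sketch of how an <msItemStruct> might look, reusing the item from MS. Add. A. 61. The printed catalogue records neither a final rubric nor any filiation, so the content of those two elements is given as bracketed placeholders; the order of elements shown is an assumption and should be checked against the schema actually in use.

```xml
<msContents>
  <msItemStruct>
    <author xml:lang="en"> Geoffrey of Monmouth </author>
    <title type="uniform"> De origine et gestis Regum Angliae </title>
    <rubric> Hic incipit Bruitus Anglie </rubric>
    <incipit> Cum mecum multa &amp; de multis </incipit>
    <!-- placeholder: no final rubric is recorded in the source catalogue -->
    <finalRubric> [text of the final rubric] </finalRubric>
    <!-- placeholder: relationship to other witnesses of the same text -->
    <filiation> [relationship to other surviving witnesses] </filiation>
    <textLang mainLang="la"> Latin </textLang>
  </msItemStruct>
</msContents>
```

In an <msItem> the same elements could appear in any order, which is what makes that form the more convenient one for marking up existing prose.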

Structure of <physDesc>

§ 27    The greatest changes made to the original model proposed by MASTER and TEI-MMSS are in <physDesc> , where a number of grouping elements have been introduced in order to make the structure more logical, and several existing elements have been renamed for the sake of consistency. The first of these new grouping elements is <objectDesc> , which relates specifically to the text-bearing object. It contains two further elements, <supportDesc> and <layoutDesc> (both optional); <supportDesc> contains the elements relating to the physical object, or vehicle, on which the text is inscribed, <support> , <extent> , <foliation> , <collation> , and <condition> (all optional, but only in that order), while <layoutDesc> contains one or more <layout> elements, detailing the way(s) in which the text is organised on the page (or other surface). The structure of these sub-elements is essentially the same as it was in MASTER/TEI-MMSS, but at each level there is also the possibility of using paragraphs ( <p> ). Note that instead of the element <form> there is now a form attribute on <objectDesc> and similarly a material attribute on <supportDesc> , as it was felt that these were most likely to be of interest as search criteria (although the information may, of course, also be repeated in the prose).

§ 28    The following are two encodings of the same physical description, or at least of that part dealing with the support and layout. First, the old MASTER/TEI-MMSS-conformant record:

<physDesc>
   <form>
     <p> Codex. </p>
   </form>
   <support>
     <p> Parchment. The entire codex is a palimpsest, deriving from four separate manuscripts, two of which are from responsorialia from the tenth-eleventh century. There are also the remains of a ninth-century Catalonian <title> Forum Iudicum </title> written in early Visigothic minuscule. </p>
   </support>
   <extent> ii + 97 + ii, <dimensions scope="all" type="leaf">
       <height> 201 </height>
       <width> 129 </width>
     </dimensions> </extent>
   <collation>
     <p> <formula notation="AMI"> 1-3:8, 4:6, 5-13:8 </formula> <signatures> There are quire signatures in red ink in the centre lower margin, <q> ii </q> - <q> viiii </q> , on <locus> fols 39v </locus> , <locus> 47v </locus> , <locus> 55v </locus> , <locus> 64v </locus> , <locus> 71v </locus> , <locus> 79v </locus> , <locus> 87v </locus> , and <locus> 95v </locus> </signatures> . </p>
   </collation>
   <layout columns="1" writtenLines="24">
     <p> Written in one column throughout; 24 lines per page. </p>
   </layout>
   <!-- more -->
</physDesc>

Secondly, the new P5-MS:

<physDesc>
   <objectDesc form="codex">
     <supportDesc material="perg">
       <support>
         <p> <material> Parchment </material> . The entire codex is a palimpsest, deriving from four separate manuscripts, two of which are from responsorialia from the tenth-eleventh century. There are also the remains of a ninth-century Catalonian <title> Forum Iudicum </title> written in early Visigothic minuscule. </p>
       </support>
       <extent> ii + 97 + ii, <dimensions scope="all" type="leaf"> <height> 201 </height> <width> 129 </width> </dimensions>
       </extent>
       <collation>
         <p> <formula notation="AMI"> 1-3:8, 4:6, 5-13:8 </formula> <signatures> There are quire signatures in red ink in the centre lower margin, <q> ii </q> - <q> viiii </q> , on <locus> fols 39v </locus> , <locus> 47v </locus> , <locus> 55v </locus> , <locus> 64v </locus> , <locus> 71v </locus> , <locus> 79v </locus> , <locus> 87v </locus> , and <locus> 95v </locus> </signatures> . </p>
       </collation>
     </supportDesc>
     <layoutDesc>
       <layout columns="1" writtenLines="24">
         <p> Written in one column throughout; 24 lines per page. </p>
       </layout>
     </layoutDesc>
   </objectDesc>
   <!-- more -->
</physDesc>

§ 29    Following <objectDesc> comes <handDesc> (formerly <msWriting> ), containing one or more <handNote> (formerly <handDesc> ) elements, and after that <musicNotation> , containing one or more paragraphs, <decoDesc> (formerly <decoration> ), which contains one or more <decoNote> elements, <additions> , containing one or more paragraphs, <bindingDesc> , containing one or more <binding> elements, a new <sealDesc> element, containing one or more <seal> elements, and finally <accMat> , which formerly came under <additional> , containing one or more paragraphs. Here too, the structure of the sub-elements is essentially the same as it was in MASTER/TEI-MMSS, apart from <decoNote> , which previously had a large number of very specific attributes, viz. size, technique, style, and quality, all without fixed sets of values, and figurative and illustrative, the possible values of which were yes, no or unknown/non-applicable. It was decided to drop all of these, retaining only type and subtype (in addition to those globally available). Note that paragraphs are available as an alternative within all elements where there are special-purpose sub-elements, while it is hoped that more structured alternatives will be developed in the future for those elements for which there are none at present, <binding> for example.
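
The sequence of first-level children just described can be summarised in a skeleton such as the following. It is a sketch only: the bracketed text is placeholder content, the type value on <decoNote> is an invented example, and the use of <p> inside each element simply takes up the paragraph alternative mentioned in this section:

<physDesc>
   <objectDesc form="codex">
     <!-- supportDesc and layoutDesc as in the P5-MS example above -->
   </objectDesc>
   <handDesc>
     <handNote>
       <p> [description of a hand] </p>
     </handNote>
   </handDesc>
   <musicNotation>
     <p> [description of any musical notation] </p>
   </musicNotation>
   <decoDesc>
     <decoNote type="initial">
       <p> [description of one kind of decoration] </p>
     </decoNote>
   </decoDesc>
   <additions>
     <p> [later additions, marginalia and the like] </p>
   </additions>
   <bindingDesc>
     <binding>
       <p> [description of a binding] </p>
     </binding>
   </bindingDesc>
   <sealDesc>
     <seal>
       <p> [description of a seal] </p>
     </seal>
   </sealDesc>
   <accMat>
     <p> [accompanying material] </p>
   </accMat>
</physDesc>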

§ 30    An attempt at grouping the other first-child level elements of <physDesc> , that is, on the one hand, <handDesc> , <musicNotation> , <decoDesc> , and <additions> (what might be referred to as the meaningful inky bits as opposed to the matrix on which they are inscribed), and, on the other, <bindingDesc> , <sealDesc> , and <accMat> (things which happen to the manuscript after it has come into being and are less integrally a part of it), was abandoned, chiefly owing to a lack of suitable nomenclature, although it is hoped this may be taken up again at a later date.

Conclusion

§ 31    Attending the early meetings of MASTER and TEI-MMSS was, for the present writer at least but doubtless for others too, a bit like when as a youth one first has dinner at someone else's house and discovers that not everyone does everything in exactly the same way. It could be a small detail, such as how the table is set or the napkins folded, but it could also be something fairly major, like the order and composition of the courses: although pretty much everybody has their pudding last, some people eat their salad before, others with, and still others after the main course (but before the pudding, naturally) —and then of course there are those who don't eat salad at all. We found at these early meetings, and later in the series of MASTER workshops held around Europe, that while there is quite clearly a single tradition for the description of (western) manuscripts, one with its roots in antiquity, there is also a great deal of variation within that tradition, and the majority of us are brought up and remain within one regional variety. An encoding standard for the description of manuscripts—or, for that matter, meals—needs to be flexible enough to accommodate this variation, while remaining true to the underlying tradition. With the TEI P5 module for manuscript description we believe we have accomplished this. We believe moreover that although it was originally developed to meet the needs of manuscript scholars working in the European tradition, the module is general enough so that it can also be extended to other kinds of materials and other traditions, indeed virtually any text-bearing artefact.

Notes:

[1] . A useful survey of this early work can be found in Stevens 1991. Several pioneer projects from this period have carried on and developed into important research facilities, such as the International Computer Catalogue of Medieval Scientific Manuscripts in München (recently redubbed Jordanus <http://jordanus.ign.uni-muenchen.de>), the Zentralinventar Mittelalterlicher Handschriften (ZIH) at the Deutsche Staatsbibliothek in Berlin (which has subsequently developed into Manuscripta Mediaevalia <http://www.manuscripta-mediaevalia.de/>), and MEDIUM at the Institut de Recherche et d'Histoire des Textes in Paris <http://www.irht.cnrs.fr/>.

[2] . MASTER (Manuscript Access through Standards for Electronic Records) was an international project whose goal was to define and implement a general purpose standard for the description of manuscript materials using XML (initially SGML). Funding for the project came from the Telematics for Libraries section of the European Union Fourth Framework research programme. The project period began in January 1999 and ran through June 2001. The project leader was Peter Robinson, then at the Centre for Technology and the Arts at De Montfort University, Leicester (UK). Full partners, in addition to De Montfort University, were Koninklijke Bibliotheek, Den Haag (NL), Det Arnamagnæanske Institut, København (DK), L'Institut de recherche et d'histoire des textes, Paris/Orleans (FR), The Humanities Computing Unit, Oxford (UK), and Národní knihovna České republiky, Praha (CZ). Associate partners included Stofnun Árna Magnússonar á Íslandi, Reykjavík (IS), Universitetsbiblioteket, Lund (SE), Народна Библиотека Св Св Кирил и Методий and Институт по Математика и Информатика, БАН, Софиа (BG), The Perdita Project at Nottingham Trent University (UK), and Lietuvos nacionalinė Martyno Mažvydo biblioteka, Vilnius (LT). An independent expert group, made up of Dr Ian Doyle, Durham (UK), Professor Peter Gumbert, Leiden (NL), and Dr Gilbert Ouy, Paris (FR), monitored and commented on the development of the standard from the start. An archive copy of the reference manual for the MASTER DTD is available on the TEI website (<http://www.tei-c.org.uk/Master/Reference/oldindex.html>). TEI-MMSS (TEI Medieval Manuscripts Description Work Group) was headed by Consuelo W. Dutschke of the Rare Book and Manuscript Library, Columbia University (USA), and Ambrogio Piazzoni of the Biblioteca Apostolica Vaticana (IT); other members were Peter J. Kidd, The British Library (UK), Eva Nylander, Lunds universitets bibliotek (SE), and Merrilee Proffitt, Research Libraries Group (USA). The group was active between July 1998 and October 2000. Documents pertaining to the work of TEI-MMSS, including DTDs and documentation, are available at <http://www.merrilee.org/tei-mss/> (accessed 19/11 2005).

[3] . The present writer served as chair of the task force; other members were Merrilee Proffitt and David Birnbaum, at the time both members of the Council, as well as the two TEI editors (ex officio). The Council's original charge and other documents pertaining to the work of the task force can be found on the TEI website (<http://www.tei-c.org/Activities/MS/>).

Works cited

Madan, Falconer, Richard William Hunt, et al. 1895-1953. A summary catalogue of western manuscripts in the Bodleian Library at Oxford which have not hitherto been catalogued in the Quarto series. 7 vols. in 8 [vol. II in 2 parts]. Oxford: Clarendon Press. Reprinted with corrections in vols. I and VII, Munich, 1980.

Stevens, Wesley, ed. 1991. Bibliographic access to Medieval and Renaissance manuscripts: A survey of computerized data bases and information services. New York: Haworth Press.

Manuscript Description. In TEI P5: Guidelines for Electronic Text Encoding and Interchange, ed. C.M. Sperberg-McQueen and Lou Burnard <http://www.tei-c.org/release/doc/tei-p5-doc/html/MS.html>.