TEI P5 schema for Europeana Regia (ER), based upon ENRICH

1 Executive Summary

This document defines an XML format for the structure of the data which all Europeana Regia (ER) partners will contribute to Europeana, either directly or indirectly by means of a harvester or transformation process. It is an application of the TEI Release P5, according to the specification instantiated by the Europeana Regia ODD, based on the ENRICH ODD.

The schema defined by this document addresses three distinct aspects of a digitized manuscript:
  1. metadata describing the original source manuscript
  2. metadata describing digitized images of the original source manuscript
  3. a transcription of the text contained by the original source manuscript

Within ER, only the first two are required. However, the schema documented here also provides for the third, in the interest of completeness and for the assistance of ER partners wishing to provide richer access facilities to their holdings.

The schema defined by this document is available in RELAX NG and W3C Schema formats from the Europeana Regia website at http://www.hab.de/forschung/projekte/europeana-regia/ODD/. Both files are also included as part of this deliverable.

For Europeana Regia, therefore, we propose to reduce the number of alternative ways of encoding some phenomena and to constrain attribute values where possible. It should be stressed however that

The overall structure of an ER-conformant XML document may be summarized as follows:
<TEI>
  <teiHeader>
    <!-- ... metadata describing the digitization -->
    <!-- ... metadata describing the manuscript -->
  </teiHeader>
  <facsimile>
    <!-- ... metadata describing the digital images -->
  </facsimile>
  <text>
    <!-- (optional) transcription of the manuscript -->
  </text>
</TEI>
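
As a minimal sketch of how the facsimile part of this structure might be filled in, the fragment below describes two page images. The xml:id values, the image file names, and the choice of one surface per page are assumptions made for illustration; they are not prescribed by the ER schema.

```xml
<facsimile xmlns="http://www.tei-c.org/ns/1.0">
  <!-- one surface per digitized page; identifiers and URLs are hypothetical -->
  <surface xml:id="ms-example_00001">
    <graphic url="images/ms-example_00001.jpg"/>
  </surface>
  <surface xml:id="ms-example_00002">
    <graphic url="images/ms-example_00002.jpg"/>
  </surface>
</facsimile>
```

Giving each surface an xml:id would allow other parts of the record to point at an individual image.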

In Deliverable D2.2, we provide full user-level documentation for the ER encoding scheme. This documentation is generated from the P5 release of the TEI Guidelines together with some revised descriptive material. In the present document we summarize how the TEI P5 Release has been customized.

2 Customization Section

The schema includes the four basic TEI modules header, core, tei, and textstructure. Like ENRICH, it also includes five specialized modules: msdescription, linking, namesdates, figures, and transcr. For Europeana Regia we additionally propose to include the module gaiji, to allow the use of the element <g>, and the module nets, to allow the use of the element <graph>.
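
In ODD terms, a module selection of this kind is normally expressed with one moduleRef per module. The following is a sketch only: the ident value is taken from the schema name used later in this document, while the start attribute and the grouping comments are assumptions, and the actual ER ODD may be organized differently.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="europeana-regia" start="TEI">
  <!-- the four basic modules -->
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <!-- specialized modules shared with ENRICH -->
  <moduleRef key="msdescription"/>
  <moduleRef key="linking"/>
  <moduleRef key="namesdates"/>
  <moduleRef key="figures"/>
  <moduleRef key="transcr"/>
  <!-- modules added for ER -->
  <moduleRef key="gaiji"/>
  <moduleRef key="nets"/>
</schemaSpec>
```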

All the elements and attributes defined by these modules are included in the ER schema, with the following modifications. Firstly, several unwanted elements are deleted. Secondly, some optional attributes have been made compulsory, and their range of possible values has been constrained. Finally, the content model for a small number of elements has been simplified to remove unwanted alternatives.

The following elements are deleted: ab, alt, altGrp, analytic, appInfo, application, biblFull, biblStruct, binaryObject, broadcast, cRefPattern, cit, climate, correction, distinct, email, emph, equipment, equiv, fsdDecl, headItem, headLabel, hyphenation, imprint, interpretation, join, joinGrp, link, listNym, measureGrp, meeting, mentioned, metDecl, metSym, monogr, msItemStruct, namespace, normalization, num, nym, postBox, postCode, q, quotation, recording, recordingStmt, refsDecl, rendition, said, samplingDecl, scriptStmt, segmentation, series, soCalled, sp, speaker, stage, state, stdVals, street, tagUsage, tagsDecl, terrain, time, timeline, variantEncoding, when.
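
One common way to express such deletions in an ODD is an elementSpec in delete mode for each unwanted element. The sketch below shows four of the elements listed above; whether the ER ODD uses exactly this mechanism is an assumption, and only two moduleRef elements are shown to keep the fragment short.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="europeana-regia" start="TEI">
  <moduleRef key="core"/>
  <moduleRef key="linking"/>
  <!-- each unwanted element is removed from the generated schema -->
  <elementSpec ident="ab" module="linking" mode="delete"/>
  <elementSpec ident="cit" module="core" mode="delete"/>
  <elementSpec ident="emph" module="core" mode="delete"/>
  <elementSpec ident="q" module="core" mode="delete"/>
</schemaSpec>
```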

For ENRICH, the elements table, row, cell, measure, rs, teiCorpus, and linkGrp, together with the attribute class att.global.linking, had been deleted as well. For ER these deletions have been reversed for the following reasons:

On the altIdentifier element, the type attribute is compulsory, and must take one of the following values: former; palimpsest; partial; internal; system; other. For ER the following values are added: accession; alternative; catalog; collection; erroneous; multivolume.
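
A minimal sketch of how these values might be used inside a manuscript identifier follows. The place, library, and shelfmarks are invented for illustration.

```xml
<msIdentifier xmlns="http://www.tei-c.org/ns/1.0">
  <settlement>Examplecity</settlement>
  <repository>Example Library</repository>
  <idno>MS 123</idno>
  <!-- an earlier shelfmark of the same manuscript -->
  <altIdentifier type="former">
    <idno>Cod. 45</idno>
  </altIdentifier>
  <!-- the number under which it appears in a printed catalogue -->
  <altIdentifier type="catalog">
    <idno>Cat. 678</idno>
  </altIdentifier>
</msIdentifier>
```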

On the availability element, the status attribute is compulsory, and must take one of the following values: free; unknown; restricted.

On the biblScope element, the type attribute is compulsory, and must take one of the following values: volume; pages.

On the custEvent element, the type attribute is compulsory, and must take one of the following values: check; conservation; description; exhibition; loan; photography; other.

On the decoNote element, the type attribute is compulsory, and must take one of the following values: border; diagram; initial; marginal; miniature; mixed; paratext; secondary; other; illustration; printmark; publishmark; vignette; frieze; map; unspecified.

On the dimensions element, the type attribute is compulsory, and must take one of the following values: leaf; binding; slip; written; boxed; unknown.

On the measure element, the type attribute is compulsory, and may take one of the following values: columnsCount; leavesCount; linesCount; unknown. The attribute values defined for dimensions could not be reused for this element, because they would not distinguish between sizes and counts.
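
The distinction between counts and sizes can be illustrated with a short sketch: measure records how many leaves there are, while dimensions records how large a leaf is. All figures are invented.

```xml
<extent xmlns="http://www.tei-c.org/ns/1.0">
  <!-- a count: the number of leaves -->
  <measure type="leavesCount" quantity="212">212 leaves</measure>
  <!-- a size: height and width of a leaf, in millimetres -->
  <dimensions type="leaf" unit="mm">
    <height>310</height>
    <width>220</width>
  </dimensions>
</extent>
```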

On the gap element, the reason attribute is compulsory, and must take one of the following values: damage; illegible; cancelled; irrelevant.

On all members of the att.dimensions class, the unit attribute is compulsory, and must take one of the following values: chars; leaves; lines; mm; pages; words. The precision attribute is removed.

On the handNote element, the script attribute is compulsory, and must take one of the following values: carolmin; textualis; cursiva; hybrida; humbook; humcursiva; kanzlei; kurrent; capquad; caprust; uncialis; semiunc; benevent; luxeuil; corbie; insulmin; alemmin; raetmin; carolgot; textura; rotunda; cancell; bastarda; cursant; cursrec; other.

On the handNote element, the scope attribute is recommended, and must take one of the following values: sole; major; minor; unknown.
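
As an illustration, a description of two hands might be encoded as sketched below. The descriptive text is invented; only the attribute values come from the lists above.

```xml
<handDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- the hand responsible for most of the text -->
  <handNote script="textualis" scope="major">Main text in a regular textualis.</handNote>
  <!-- a second hand that contributes only occasionally -->
  <handNote script="cursiva" scope="minor">Marginal notes in a later cursive hand.</handNote>
</handDesc>
```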

On the hi element, the rend attribute is compulsory, and should take one of the following values: hyphenated; underline; double-underline; bold; spaced; stacked; caps; italic; sup; rubric.

On the layout element, the columns attribute is compulsory, and must take a numeric value.

On the msDesc element, the xml:id attribute is compulsory, and must be a valid XML identifier.

On the msDesc element, the xml:lang attribute is compulsory, and must be a valid ISO 639 language code.
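
A sketch of an msDesc carrying both compulsory attributes is given below. The identifier, the language code, and the contents of msIdentifier are invented for illustration; xml:lang is understood here as the language in which the description itself is written.

```xml
<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="ms-example-123" xml:lang="en">
  <msIdentifier>
    <settlement>Examplecity</settlement>
    <repository>Example Library</repository>
    <idno>MS 123</idno>
  </msIdentifier>
</msDesc>
```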

On the name element, the type attribute is compulsory, and must take one of the following values: person; place; org; unknown.

On the objectDesc element, the form attribute is compulsory, and must take one of the following values: codex; leaf; scroll; other.

On the person element, the sex attribute is compulsory, and must be one of 1 (male), 2 (female), 0 (inapplicable), or 9 (unknown).
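
For example, a list of persons might use the numeric codes as in the following sketch. The names and the xml:id are hypothetical.

```xml
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <!-- 1 = male -->
  <person xml:id="p-scribe" sex="1">
    <persName>Johannes Example</persName>
  </person>
  <!-- 9 = unknown -->
  <person sex="9">
    <persName>unidentified former owner</persName>
  </person>
</listPerson>
```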

On the ref element, the type attribute is recommended, and should take one of the following values: biblical; classical; medieval; altMs; mss; inkunabeln; drucke; pdf; purl; urn; doi; crossRef; repertorium.

On the region element, the type attribute is compulsory, and must take one of the following values: parish; county; compass; geog; state; unknown.

On the supplied element, the reason attribute is compulsory, and must take one of the following values: omitted; illegible; damage; unknown.

On the supportDesc element, the material attribute is compulsory, and must take one of the following values: perg; chart; mixed; unknown.
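
The constraints on objectDesc, supportDesc, and layout can be seen together in a short physical description. This is a sketch with invented descriptive text, not a required arrangement.

```xml
<physDesc xmlns="http://www.tei-c.org/ns/1.0">
  <objectDesc form="codex">
    <supportDesc material="perg">
      <support>
        <p>Parchment.</p>
      </support>
    </supportDesc>
    <layoutDesc>
      <!-- columns takes a numeric value -->
      <layout columns="2">Written in two columns.</layout>
    </layoutDesc>
  </objectDesc>
</physDesc>
```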

The following changes do not affect TEI conformance since either they affect only optional parts of TEI content models or they involve additional value constraints for TEI attributes:

Schema europeana-regia: Model classes

Schema europeana-regia: Attribute classes

att.global

Module tei
Attributes
  rendition
    Status Optional

Schema europeana-regia: Elements

<altIdentifier>

Module msdescription
Attributes
  type
    Status Required
    Legal values are:
      accession: accession number
      alternative: the standard identifier in an alternative written form
      catalog: number in a catalogue
      collection: a manuscript that has been grouped together with other manuscripts for some reason
      erroneous: erroneous shelf number, but used in some literature
      former: former shelf number
      internal: internal project identifier
      multivolume: the manuscript is part of a multivolume work and therefore has more than one shelfmark
      other: unspecified [Default]
      palimpsest: identifier of a previously written but deleted item
      partial: identifier of a previously distinct item
      system: former system identifier (Manuscriptorium specific)

<author>

Module core
Attributes
  role
    Status Recommended
    Suggested values include:
      author: author of a work - FRBR work [Default]
      translator: translator of a work - FRBR expression
      commentator: commentator of a work - FRBR manifestation
      editor: editor of a work - FRBR manifestation/item
      other: unspecified

<availability>

Module header
Attributes
  status
    Status Required
    Legal values are:
      free
      unknown [Default]
      restricted

<biblScope>

Module core
Attributes
  type
    Status Required
    Suggested values include:
      column
      volume
      pages [Default]

<custEvent>

Module msdescription
Attributes
  type
    Status Recommended
    Suggested values include:
      check
      conservation
      description
      exhibition
      loan
      photography
      other [Default]

<date>

Module core
Schematron

<s:pattern name="date_values">
  <s:rule context="tei:date">
    <s:assert test="@when or (@notAfter and @notBefore) or (@from and @to)">
      You must provide either @when or @to/@from, or @notAfter/@notBefore.
    </s:assert>
  </s:rule>
</s:pattern>
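
To make the effect of this rule concrete, the sketch below shows three date elements that satisfy the assertion and two that would be reported. The dates and the wrapping paragraph are invented for illustration.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- accepted: @when is present -->
  <date when="1455">1455</date>
  <!-- accepted: @notBefore and @notAfter are both present -->
  <date notBefore="1440" notAfter="1460">mid fifteenth century</date>
  <!-- accepted: @from and @to are both present -->
  <date from="1450" to="1455">1450 to 1455</date>
  <!-- reported: only one half of the notBefore/notAfter pair -->
  <date notBefore="1440">after 1440</date>
  <!-- reported: no dating attribute at all -->
  <date>fifteenth century</date>
</p>
```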

<decoNote>

Module msdescription
Attributes
  type
    Status Recommended
    Suggested values include:
      border
      diagram
      initial
      marginal
      miniature
      mixed
      paratext
      secondary
      other [Default]
      illustration
      printmark
      publishmark
      vignette
      frieze
      map
      unspecified

<dimensions>

Module msdescription
Attributes
  type
    Status Required
    Suggested values include:
      binding
      boxed
      illustration
      leaf
      slip
      written
      unknown [Default]

<gap>

Module core
Attributes
  reason
    gives the reason for omission of this material from the transcription.
    Status Recommended
    Legal values are:
      damage: medium is damaged
      illegible: material cannot be reliably read
      cancelled: material can be read but has been cancelled by scribe
      irrelevant: material is not regarded as relevant by the transcriber [Default]
      omitted: material omitted by transcriber
      lacuna: material missing from the source
  unit
    names the unit used for describing the extent of the gap
    Status Optional
    Legal values are:
      chars: written characters
      leaves: leaves
      lines: lines
      mm: millimetres
      pages: pages
      words: words
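
The two attributes might be combined in a transcription as in the following sketch. The transcribed words and the extents are invented; the extent attribute is the standard TEI attribute for the size of the gap and is assumed here to remain available in the ER schema.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  In principio
  <!-- about three characters lost through damage -->
  <gap reason="damage" unit="chars" extent="3"/>
  erat
  <!-- two lines that the transcriber could not read -->
  <gap reason="illegible" unit="lines" extent="2"/>
</p>
```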

<hi>

Module core
Attributes
  rend
    Status Recommended
    Suggested values include:
      font-stretch:expanded
      font-style:italic
      font-style:normal
      font-variant:small-caps
      font-weight:bold
      text-decoration:underline
      text-decoration:double-underline
      hyphenated
      rubric
      stacked
      sup

<layout>

Module msdescription
Attributes
  columns
    Status Recommended
    Datatype 1–2 occurrences of data.count separated by whitespace

<measure>

Module core
Attributes
  type
    Status Required
    Suggested values include:
      currency
      leavesCount
      pagesCount
      columnsCount
      linesCount
      pageDimensions
      binding
      written
      boxed
      miniature
      illustration
      unknown [Default]
  unit
    Status Optional
    Suggested values include:
      cm [Default]
      mm
      in
      chars
      lines
      columns
      leaves

<msDesc>

Module msdescription
Attributes
  xml:id
    Status Required
  xml:lang
    Status Required

<name>

Module core
Attributes
  type
    Status Recommended
    Suggested values include:
      person: the name of a person
      place: the name of a place
      project: the name of a project
      org: the name of an organisation
      unknown: name of an unknown type [Default]

<objectDesc>

Module msdescription
Attributes
  form
    Status Required
    Legal values are:
      codex: a bound codex [Default]
      fascicle: part of a bound codex with its own history
      leaf: a loose leaf
      scroll: a scroll
      other: any other format

<origDate>

Module msdescription
Schematron

<s:pattern name="date_values">
  <s:rule context="tei:origDate">
    <s:assert test="@when or (@notAfter and @notBefore) or (@from and @to)">
      You must provide either @when or @to/@from, or @notAfter/@notBefore.
    </s:assert>
  </s:rule>
</s:pattern>

<person>

Module namesdates
Attributes
  sex
    Status Recommended

<recordHist>

Module msdescription
Declaration
element recordHist { model.pLike+ | source }
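
This declaration allows either one or more paragraph-like elements or a single source element, but not both. The sketch below shows the second alternative; the first would place the p element directly inside recordHist. The wording of the paragraph is invented.

```xml
<adminInfo xmlns="http://www.tei-c.org/ns/1.0">
  <recordHist>
    <!-- the second alternative of the content model: a single source element -->
    <source>
      <p>Description derived from the printed catalogue of the holding library.</p>
    </source>
  </recordHist>
</adminInfo>
```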

<ref>

Module core
Attributes
  type
    Status Recommended
    Suggested values include:
      biblical: certain type of reference, should be accompanied by cRef
      classical: certain type of reference, should be accompanied by cRef
      medieval: certain type of reference, should be accompanied by cRef
      altMs: reference to a manuscript that belongs to an institution other than the owner of the described manuscript
      mss: reference to a manuscript that belongs to the same institution as the described manuscript, should be accompanied by cRef
      purl
      doi
      urn: URN of the work
      tgn: Number in the Getty Thesaurus of Geographical Names
      vd16: Number of an item in the database VD 16. To be used without leading "VD 16"; spaces are omitted, e.g. @cRef="E3185"
      vd17: Number of an item in the database VD 17. To be used without leading "VD 17", e.g. @cRef="23:320717T"
      manumed: Manuscripta Mediaevalia; identifier not yet established
      opac: reference number in the local library catalogue (OPAC)
      pl: Patrologia Latina; traditionally cited, with volume and column (a-c), without spaces, e.g. cRef="43_253A"
      pg: Patrologia Graeca; like PL, references the column of the Greek text, e.g. cRef="43_253A"
      ebdb: Einbanddatenbank
      wzma: Wasserzeichen des Mittelalters
      wilc: Watermarks in incunabula printed in the Low Countries
      pnd: Personennamendatei
      gnd: Gemeinsame Normdatei
      viaf: Virtual International Authority File (only to be used if no reference number in national authority files exists)
      other: any other type of reference
      wdb: Identifier (xml:id) from facsimile.xml, e.g. drucke_lh-4f-106-1_00003; cf. "Allgemeines" (general remarks)
      inkunabeln
      drucke
      pdf
      gbv: PPN of the GBV
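
As a sketch of how type and cRef work together, the fragment below reuses the two cRef values quoted in the list above. The visible link text and the wrapping paragraph are invented.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Patrologia Latina: volume and column, without spaces -->
  <ref type="pl" cRef="43_253A">PL 43, col. 253A</ref>
  <!-- VD 16 number without the leading "VD 16" and without spaces -->
  <ref type="vd16" cRef="E3185">VD 16 E 3185</ref>
</p>
```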

<region>

Module namesdates
Attributes
  type
    Status Recommended
    Legal values are:
      parish
      county
      compass
      geog
      state
      unknown [Default]

<rs>

Module core
Attributes
  role
    Status Recommended
    Suggested values include:
      author: author of a work - FRBR work [Default]
      translator: translator of a work - FRBR expression
      commentator: commentator of a work - FRBR manifestation
      editor: editor of a work - FRBR manifestation/item
      other: unspecified

<scriptNote>

Module header
Attributes
  xml:id
    Status Required

<supplied>

Module transcr
Attributes
  reason
    Status Recommended
    Legal values are:
      omitted
      illegible
      damage
      unknown [Default]

<supportDesc>

Module msdescription
Attributes
  material
    Status Required
    Suggested values include:
      perg: parchment
      chart: paper
      mixed: mixture of any other materials than paper, papyrus, and parchment
      papyrus
      papyrus_perg: mixture of papyrus and parchment
      papyrus_chart: mixture of papyrus and paper
      perg_chart: mixture of paper and parchment
      unknown: unknown [Default]

<surrogates>

Contains information about any non-digital representations of the manuscript being described which may exist in the holding institution or elsewhere.
Module msdescription

<term>

The term element may be used to encode structural metadata on a manuscript, such as physical entities, textual divisions, or special features like annotations or decorative elements. The values are copied from the set of terms for structural metadata for the DFG-Viewer.
Module core
Attributes
  key
    Status Recommended
    Suggested values include:
      additional: additional
      address: address
      annotation: annotation
      article: article
      binding: binding
      bookplate: bookplate
      chapter: chapter
      collation: collation
      colophon: colophon
      contained_work: contained work
      contents: table of contents
      corrigenda: corrigenda
      cover: cover
      cover_front: front cover
      cover_back: back cover
      dedication: dedication
      edge: edge
      endsheet: endsheet
      engraved_titlepage: engraved titlepage
      entry: entry
      fascicle: fascicle
      fragment: fragment
      illustration: illustration
      imprint: imprint
      index: index
      initial_decoration: initial decoration
      issue: issue
      manuscript: manuscript
      map: map
      monograph: monograph
      multivolume_work: multi volume work
      musical_notation: musical notation
      ornament: ornament
      paste_down: paste down
      periodical: periodical
      preface: preface
      printers_mark: printers mark
      privileges: privileges
      provenance: provenance
      scheme: scheme
      section: section
      spine: spine
      stamp: stamp
      table: table
      text: text
      title_page: title page
      verse: verse
      volume: volume
  type
    Status Optional
    Suggested values include:
      author: The term is the name of an author of a text in the manuscript.
      placename: The term is the name of a place.
      script: The term names the script used in the manuscript.
      structure: The term belongs to the set of structural metadata.
      title: The term is the title of a text in the manuscript.
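
A sketch of the term element in use follows. The key values come from the list above; the element content and the wrapping paragraph are invented, and where in a record such terms are best placed is left open here.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- structural metadata: key selects an entry from the DFG-Viewer term set -->
  <term type="structure" key="title_page">title page</term>
  <term type="structure" key="initial_decoration">decorated initial</term>
  <!-- a term of another type, without a key -->
  <term type="script">textualis</term>
</p>
```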

<textLang>

Module msdescription
Attributes
  mainLang
    Status Required
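
For instance, a manuscript mainly in Latin could be described as sketched below. The language codes and the prose are invented; otherLangs is the standard TEI companion attribute to mainLang and is assumed to be unchanged in the ER schema.

```xml
<textLang xmlns="http://www.tei-c.org/ns/1.0" mainLang="la" otherLangs="fr de">
  Latin, with some French and German glosses.
</textLang>
```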


Date: 0.1. Created on 16 Feb 2010