1 Executive Summary
This document defines an XML format for the structure of the data which all Europeana
Regia (ER) partners will contribute to Europeana, either directly or indirectly by means
of a harvester or transformation process. It is an application of the TEI Release P5,
according to the specification instantiated by the Europeana Regia ODD, based on the ENRICH
ODD.
The schema defined by this document addresses three distinct aspects of a digitized
manuscript:
- metadata describing the original source manuscript
- metadata describing digitized images of the original source manuscript
- a transcription of the text contained by the original source manuscript
Within ER, only the first two are required. However, the schema documented here also
provides for the third, in the interest of completeness and for the assistance of ER
partners wishing to provide richer access facilities to their holdings.
The schema defined by this document is available in RELAX NG and W3C Schema formats from
the Europeana Regia website at the address http://www.hab.de/forschung/projekte/europeana-regia/ODD/. Both files
are also included as part of this deliverable.
For Europeana Regia, therefore, we propose to reduce the number of choices available
for encoding some phenomena and to constrain attribute values where possible.
It should be stressed, however, that:
- the resulting schema must remain TEI Conformant: we are only
defining a subset
- no constraints will be introduced without full consent from all
partners in the project
The overall structure of an ER-conformant XML document may be summarized as follows:
<TEI>
<teiHeader>
</teiHeader>
<facsimile>
</facsimile>
<text>
</text>
</TEI>
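As a rough illustration of this skeleton, the following sketch fills in the minimum that a TEI header requires and shows where each of the three aspects listed above would sit. The namespace is the standard TEI P5 namespace; the title, shelfmark, image file name, and xml:id values are hypothetical placeholders and are not taken from any ER partner's data.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Description of MS 1 (hypothetical example)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample record.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- metadata describing the original source manuscript -->
        <msDesc xml:id="ms-example-1" xml:lang="en">
          <msIdentifier>
            <settlement>Example City</settlement>
            <repository>Example Library</repository>
            <idno>MS 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <!-- metadata describing digitized images of the manuscript -->
    <surface xml:id="surface-1r">
      <graphic url="ms-example-1_0001.jpg"/>
    </surface>
  </facsimile>
  <text>
    <!-- optional transcription of the manuscript text -->
    <body>
      <p>Transcribed text goes here.</p>
    </body>
  </text>
</TEI>
```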
In Deliverable D2.2, we provide full user-level documentation for the ER encoding scheme.
This documentation is generated from the P5 release of the TEI Guidelines together with
some revised descriptive material. In the present document we summarize how the TEI P5
Release has been customized.
2 Customization Section
We include in the schema the four basic TEI modules header,
core, tei, and textstructure. Like ENRICH, we also
include five specialized modules: msdescription, linking,
namesdates, figures, and transcr.
In addition, for Europeana Regia we propose to include the module gaiji,
to allow the use of the element <g>, and the module nets,
to allow the use of the element <graph>.
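In a TEI ODD, a module selection of this kind is usually expressed with one moduleRef per module. The fragment below is a minimal sketch of that pattern, not the project's actual ODD: the ident value is taken from the schema name used later in this document, and the start attribute is an assumption.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="europeana-regia" start="TEI teiCorpus">
  <!-- the four basic modules -->
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <!-- the five specialized modules also used by ENRICH -->
  <moduleRef key="msdescription"/>
  <moduleRef key="linking"/>
  <moduleRef key="namesdates"/>
  <moduleRef key="figures"/>
  <moduleRef key="transcr"/>
  <!-- added for Europeana Regia: gaiji provides g, nets provides graph -->
  <moduleRef key="gaiji"/>
  <moduleRef key="nets"/>
</schemaSpec>
```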
All the elements and attributes defined by these modules are included in the ER schema,
with the following modifications. Firstly, several unwanted elements are deleted.
Secondly, some optional attributes have been made compulsory, and their range of possible
values is constrained. Finally, the content model for a small number of elements has been
simplified to remove unwanted alternatives.
The following elements are deleted:
ab, alt, altGrp, analytic, appInfo, application, biblFull, biblStruct, binaryObject,
broadcast, cRefPattern, cit, climate, correction, distinct, email, emph, equipment,
equiv, fsdDecl, headItem, headLabel, hyphenation, imprint, interpretation, join,
joinGrp, link, listNym, measureGrp, meeting, mentioned, metDecl, metSym, monogr,
msItemStruct, namespace, normalization, num, nym, postBox, postCode, q, quotation,
recording, recordingStmt, refsDecl, rendition, said, samplingDecl, scriptStmt,
segmentation, series, soCalled, sp, speaker, stage, state, stdVals, street, tagUsage,
tagsDecl, terrain, time, timeline, variantEncoding, when.
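In an ODD, each such deletion is normally declared with an elementSpec in delete mode. The fragment below sketches this for three of the elements listed above; it illustrates the general mechanism and is not copied from the ER ODD itself.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="europeana-regia">
  <!-- moduleRef declarations omitted from this sketch -->
  <elementSpec ident="ab" module="linking" mode="delete"/>
  <elementSpec ident="biblStruct" module="core" mode="delete"/>
  <elementSpec ident="timeline" module="linking" mode="delete"/>
</schemaSpec>
```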
For ENRICH, the elements <table>, <row>, <cell>, <measure>, <rs>, <teiCorpus>, and <linkGrp>,
together with the attribute class att.global.linking, had been deleted as well.
For ER these deletions have been reverted, for the following reasons:
- We came across manuscript descriptions that contain tables. We therefore consider the
elements <table>, <row>, and <cell> to be useful in the schema.
- Some numbers, especially in the physical description, such as the number of leaves, should be encoded.
As this cannot be done using dimensions, the element measure has to be included in the schema.
- We came across some older descriptions that consist of little more than the manuscript heading.
If this heading contains an author's name, the name cannot be encoded with author,
as the element <head> cannot have it as a child, and the title is encoded as <title>.
Thus, if an ancient title contains an author's name, this has to be encoded using rs,
preferably with the type attribute set to author.
- The element <teiCorpus> is needed in order to deal with complex digital objects, such as a
digital edition of which the manuscript description could be a part. Also, if one wants to construct
a manuscript catalogue in which every manuscript description exists as one TEI document,
<teiCorpus> is one possible way to group these together.
- As the data for Europeana will be extracted not only from the manuscript descriptions
but also from structural metadata encoded in the <text> element of any TEI file,
there may be a need for extensive linkage between items in the manuscript description, represented as
<msItem> or decoNote elements, and textual divisions of the transcription. Thus both the
element <linkGrp> and the attributes of the class att.global.linking
should be present in the schema.
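Two of the cases above, a count of leaves and an author's name inside a heading, might be encoded as in the following sketch. All names, titles, and numbers are hypothetical placeholders, and the use of type on rs follows the recommendation given in the list above.

```xml
<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="ms-example-2" xml:lang="en">
  <msIdentifier>
    <settlement>Example City</settlement>
    <repository>Example Library</repository>
    <idno>MS 2</idno>
  </msIdentifier>
  <!-- rs marks the author's name within the heading -->
  <head><rs type="author">Author Name</rs>: <title>Title of the Work</title></head>
  <physDesc>
    <objectDesc form="codex">
      <supportDesc material="perg">
        <extent>
          <!-- a count of leaves, recorded with measure rather than dimensions -->
          <measure type="leavesCount" unit="leaves" quantity="120">120 leaves</measure>
        </extent>
      </supportDesc>
    </objectDesc>
  </physDesc>
</msDesc>
```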
On the altIdentifier element, the type attribute is compulsory, and
must take one of the following values: former; palimpsest; partial;
internal; system; other.
To these values the following values are added in the course of ER:
accession; alternative; catalog; erroneous; multivolume.
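A short sketch of how typed alternative identifiers could look inside msIdentifier follows. The shelfmarks and catalogue number are invented for illustration only.

```xml
<msIdentifier xmlns="http://www.tei-c.org/ns/1.0">
  <settlement>Example City</settlement>
  <repository>Example Library</repository>
  <idno>MS 3</idno>
  <!-- a shelfmark the manuscript carried earlier -->
  <altIdentifier type="former">
    <idno>Old shelfmark A.1</idno>
  </altIdentifier>
  <!-- the number under which a printed catalogue lists the manuscript -->
  <altIdentifier type="catalog">
    <idno>Cat. no. 17</idno>
  </altIdentifier>
</msIdentifier>
```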
On the availability element, the status attribute is compulsory, and
must take one of the following values: free; unknown;
restricted.
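Constraints of this kind, an attribute made compulsory and limited to a closed list, are typically written in ODD as an attDef with usage="req" and a closed valList. The fragment below sketches this for availability; it shows the general ODD pattern under that assumption and is not a quotation from the ER ODD.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="availability" module="header" mode="change">
  <attList>
    <!-- usage="req" makes the attribute compulsory -->
    <attDef ident="status" mode="change" usage="req">
      <!-- type="closed" rejects any value outside this list -->
      <valList type="closed" mode="replace">
        <valItem ident="free"/>
        <valItem ident="unknown"/>
        <valItem ident="restricted"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

The same pattern would apply to the other attributes described in this section, with the element name, attribute name, and value list changed accordingly.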
On the biblScope element, the type attribute is compulsory, and must
take one of the following values: volume; pages.
On the custEvent element, the type attribute is compulsory, and must
take one of the following values: check; conservation;
description; exhibition; loan; photography;
other.
On the decoNote element, the type attribute is compulsory, and must
take one of the following values: border; diagram;
initial; marginal; miniature; mixed;
paratext; secondary; other; illustration;
printmark; publishmark; vignette; frieze;
map; unspecified.
On the dimensions element, the type attribute is compulsory, and must
take one of the following values: leaf; binding; slip;
written; boxed; unknown.
On the measure element, the type attribute is compulsory, and may
take one of the following values: columnsCount; leavesCount;
linesCount; unknown. The attribute values defined for dimensions
could not be reused for this element, as they would not
distinguish between sizes and counts.
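The division of labour between dimensions (sizes) and measure (counts) might look as follows in a layout description. The figures are hypothetical, and the unit values on measure are taken from the reference list later in this document.

```xml
<layout xmlns="http://www.tei-c.org/ns/1.0" columns="2">
  <!-- a size: the written area, in millimetres -->
  Written area
  <dimensions type="written" unit="mm">
    <height unit="mm">210</height>
    <width unit="mm">140</width>
  </dimensions>,
  <!-- counts: columns and lines per page -->
  <measure type="columnsCount" unit="columns" quantity="2">2 columns</measure> of
  <measure type="linesCount" unit="lines" quantity="32">32 lines</measure>.
</layout>
```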
On the gap element, the reason attribute is compulsory, and must take
one of the following values: damage; illegible;
cancelled; irrelevant.
On all members of the att.dimensions class, the unit attribute
is compulsory, and must take one of the following values: chars;
leaves; lines; mm; pages; words.
The precision attribute is removed.
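For example, a gap of four characters caused by damage could be recorded as in the sketch below. The Latin words are placeholder text, and the quantity attribute is the standard TEI att.dimensions attribute; its availability in the ER schema is assumed here.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- reason and unit both use values from the lists above -->
  In principio <gap reason="damage" unit="chars" quantity="4"/> verbum
</p>
```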
On the <handNote> element, the script attribute is compulsory, and must
take one of the following values: carolmin; textualis;
cursiva; hybrida; humbook; humcursiva;
kanzlei; kurrent; capquad; caprust;
uncialis; semiunc; benevent; luxeuil;
corbie; insulmin; alemmin; raetmin;
carolgot; textura; rotunda; cancell;
bastarda; cursant; cursrec; other.
On the <handNote> element, the scope attribute is recommended, and must
take one of the following values: sole; major; minor; unknown.
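A hand description using these two attributes might be encoded as in the following sketch; the descriptions of the hands are invented for illustration.

```xml
<handDesc xmlns="http://www.tei-c.org/ns/1.0" hands="2">
  <!-- the hand responsible for most of the text -->
  <handNote script="textualis" scope="major">Main text hand.</handNote>
  <!-- a hand that appears only occasionally -->
  <handNote script="cursiva" scope="minor">Marginal additions.</handNote>
</handDesc>
```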
On the hi element, the rend attribute is compulsory, and should take
one of the following values: hyphenated; underline;
double-underline; bold; spaced; stacked;
caps; italic; sup; rubric.
On the layout element, the columns attribute is compulsory, and must
take a numeric value.
On the msDesc element, the xml:id attribute is compulsory, and must
be a valid XML identifier.
On the msDesc element, the xml:lang attribute is compulsory, and must
be a valid ISO 639 language code.
On the name element, the type attribute is compulsory, and must take
one of the following values: person; place; org;
unknown.
On the objectDesc element, the form attribute is compulsory, and must
take one of the following values: codex; leaf; scroll;
other.
On the person element, the sex attribute is compulsory, and must be
one of 1 (male), 2 (female), 0 (inapplicable), or 9 (unknown).
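The numeric codes for sex would appear on person as in this sketch. The identifiers and names are hypothetical placeholders.

```xml
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <!-- 1 = male -->
  <person xml:id="person-1" sex="1">
    <persName>Name of a Scribe</persName>
  </person>
  <!-- 9 = unknown -->
  <person xml:id="person-2" sex="9">
    <persName>Name of a Former Owner</persName>
  </person>
</listPerson>
```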
On the ref element, the type attribute is recommended, and should
take one of the following values: biblical; classical;
medieval; altMs; mss; inkunabeln;
drucke; pdf; purl; urn; doi;
crossRef; repertorium.
On the region element, the type attribute is compulsory, and must
take one of the following values: parish; county;
compass; geog; state; unknown.
On the supplied element, the reason attribute is compulsory, and must
take one of the following values: omitted; illegible;
damage; unknown.
On the supportDesc element, the material attribute is compulsory, and
must take one of the following values: perg; chart;
mixed; unknown.
The following changes do not affect TEI conformance since either they affect only
optional parts of TEI content models or they involve additional value constraints for TEI
attributes:
- On the textLang element, the mainLang attribute is compulsory,
and must take a legal language identifier as value.
- The optional attributes xml:id and xml:lang are made mandatory
for the msDesc element.
- The content model of the date element is changed to include a Schematron rule
which enforces an appropriate selection of attributes (one of: when,
to and from, or notAfter and notBefore).
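The effect of the date rule can be seen in the following sketch, in which the first three date elements satisfy the rule and the last one does not. All dates are hypothetical.

```xml
<history xmlns="http://www.tei-c.org/ns/1.0">
  <origin>
    <!-- accepted: a single date given with when -->
    <p>Written in <date when="1450">1450</date>.</p>
    <!-- accepted: an uncertain date bounded by notBefore and notAfter -->
    <p>Bound <date notBefore="1460" notAfter="1480">between 1460 and 1480</date>.</p>
    <!-- accepted: a period given with from and to -->
    <p>In use <date from="1480" to="1520">from 1480 to 1520</date>.</p>
    <!-- rejected: notBefore appears without its partner notAfter -->
    <p>Repaired <date notBefore="1600">after 1600</date>.</p>
  </origin>
</history>
```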
Schema europeana-regia: Model classes
Schema europeana-regia: Attribute classes
att.global
Module: tei
Attributes
Schema europeana-regia: Elements
<altIdentifier>
Module: msdescription
Attributes:
- type
  Status: Required
  Legal values are:
  - accession: accession number
  - alternative: the standard identification in an alternative version in writing
  - catalog: number in a catalogue
  - collection: a manuscript that has been grouped together with other manuscripts for some reason
  - erroneous: erroneous shelf number, but used in some literature
  - former: former shelf number
  - internal: internal project identifier
  - multivolume: mss is part of a multivolume and therefore has more than one shelfmark
  - other: unspecified [Default]
  - palimpsest: identifier of a previously written but deleted item
  - partial: identifier of a previously distinct item
  - system: former system identifier (Manuscriptorium specific)
<author>
Module: core
Attributes:
- role
  Status: Recommended
  Suggested values include:
  - author: author of a work - FRBR work [Default]
  - translator: translator of a work - FRBR expression
  - commentator: commentator of a work - FRBR manifestation
  - editor: editor of a work - FRBR manifestation/item
  - other: unspecified
<availability>
Module: header
Attributes:
- status
  Status: Required
  Legal values are:
  - free
  - unknown [Default]
  - restricted
<biblScope>
Module: core
Attributes:
- type
  Status: Required
  Suggested values include:
  - column
  - volume
  - pages [Default]
<custEvent>
Module: msdescription
Attributes:
- type
  Status: Recommended
  Suggested values include:
  - check
  - conservation
  - description
  - exhibition
  - loan
  - photography
  - other [Default]
<date>
Module: core
Schematron:
<s:pattern name="date_values">
  <s:rule context="tei:date">
    <s:assert test="@when or (@notAfter and @notBefore) or (@from and @to)">You must provide either @when or @to/@from, or @notAfter/@notBefore.</s:assert>
  </s:rule>
</s:pattern>
<decoNote>
Module: msdescription
Attributes:
- type
  Status: Recommended
  Suggested values include:
  - border
  - diagram
  - initial
  - marginal
  - miniature
  - mixed
  - paratext
  - secondary
  - other [Default]
  - illustration
  - printmark
  - publishmark
  - vignette
  - frieze
  - map
  - unspecified
<dimensions>
Module: msdescription
Attributes:
- type
  Status: Required
  Suggested values include:
  - binding
  - boxed
  - illustration
  - leaf
  - slip
  - written
  - unknown [Default]
<gap>
Module: core
Attributes:
- reason: gives the reason for omission of this material from the transcription.
  Status: Recommended
  Legal values are:
  - damage: medium is damaged
  - illegible: material cannot be reliably read
  - cancelled: material can be read but has been cancelled by scribe
  - irrelevant: material is not regarded as relevant by the transcriber [Default]
  - omitted: material omitted by transcriber
  - lacuna: material missing from the source
- unit: names the unit used for describing the extent of the gap
  Status: Optional
  Legal values are:
  - chars: written characters
  - leaves: leaves
  - lines: lines
  - mm: millimetres
  - pages: pages
  - words: words
<hi>
Module: core
Attributes:
- rend
  Status: Recommended
  Suggested values include:
  - font-stretch:expanded
  - font-style:italic
  - font-style:normal
  - font-variant:small-caps
  - font-weight:bold
  - text-decoration:underline
  - text-decoration:double-underline
  - hyphenated
  - rubric
  - stacked
  - sup
<layout>
Module: msdescription
Attributes:
- columns
  Status: Recommended
  Datatype: 1–2 occurrences of data.count separated by whitespace
<measure>
Module: core
Attributes:
- type
  Status: Required
  Suggested values include:
  - currency
  - leavesCount
  - pagesCount
  - columnsCount
  - linesCount
  - pageDimensions
  - binding
  - written
  - boxed
  - miniature
  - illustration
  - unknown [Default]
- unit
  Status: Optional
  Suggested values include:
  - cm [Default]
  - mm
  - in
  - chars
  - lines
  - columns
  - leaves
<msDesc>
Module: msdescription
Attributes
<name>
Module: core
Attributes:
- type
  Status: Recommended
  Suggested values include:
  - person: the name of a person
  - place: the name of a place
  - project: the name of a project
  - org: the name of an organisation
  - unknown: name of an unknown type [Default]
<objectDesc>
Module: msdescription
Attributes:
- form
  Status: Required
  Legal values are:
  - codex: a bound codex [Default]
  - fascicle: part of a bound codex with its own history
  - leaf: a loose leaf
  - scroll: a scroll
  - other: any other format
<origDate>
Module: msdescription
Schematron:
<s:pattern name="date_values">
  <s:rule context="tei:origDate">
    <s:assert test="@when or (@notAfter and @notBefore) or (@from and @to)">You must provide either @when or @to/@from, or @notAfter/@notBefore.</s:assert>
  </s:rule>
</s:pattern>
<person>
Module: namesdates
Attributes
<recordHist>
Module: msdescription
Declaration:
element recordHist { model.pLike+ | source }
<ref>
Module: core
Attributes:
- type
  Status: Recommended
  Suggested values include:
  - biblical: certain type of reference, should be accompanied by cRef
  - classical: certain type of reference, should be accompanied by cRef
  - medieval: certain type of reference, should be accompanied by cRef
  - altMs: reference to a manuscript that belongs to an institution other than the owner of the described manuscript
  - mss: reference to a manuscript that belongs to the same institution as the described manuscript, should be accompanied by cRef
  - purl
  - doi
  - urn: URN of the work
  - tgn: Number in the Getty Thesaurus of Geographical Names
  - vd16: Number of an item in the database VD 16. To be used without leading "VD 16"; spaces will be omitted, e.g. @cRef="E3185"
  - vd17: Number of an item in the database VD 17. To be used without leading "VD 17", e.g. @cRef="23:320717T"
  - manumed: Manuscripta Mediaevalia, identifier not yet established
  - opac: reference number in the local library catalogue (OPAC)
  - pl: Patrologia Latina; traditionally cited, with volume and column (a-c), without spaces, e.g. cRef="43_253A"
  - pg: Patrologia Graeca; like PL, references the column of the Greek text, e.g. cRef="43_253A"
  - ebdb: Einbanddatenbank
  - wzma: Wasserzeichen des Mittelalters
  - wilc: Watermarks in incunabula printed in the Low Countries
  - pnd: Personennamendatei
  - gnd: Gemeinsame Normdatei
  - viaf: Virtual International Authority File (only to be used if no reference number in national authority files exists)
  - other: any other type of reference
  - wdb: Identifier (xml:id) from facsimile.xml, e.g. drucke_lh-4f-106-1_00003; cf. Allgemeines
  - inkunabeln
  - drucke
  - pdf
  - gbv: PPN of the GBV
<region>
Module: namesdates
Attributes:
- type
  Status: Recommended
  Legal values are:
  - parish
  - county
  - compass
  - geog
  - state
  - unknown [Default]
<rs>
Module: core
Attributes:
- role
  Status: Recommended
  Suggested values include:
  - author: author of a work - FRBR work [Default]
  - translator: translator of a work - FRBR expression
  - commentator: commentator of a work - FRBR manifestation
  - editor: editor of a work - FRBR manifestation/item
  - other: unspecified
<scriptNote>
Module: header
Attributes
<supplied>
Module: transcr
Attributes:
- reason
  Status: Recommended
  Legal values are:
  - omitted
  - illegible
  - damage
  - unknown [Default]
<supportDesc>
Module: msdescription
Attributes:
- material
  Status: Required
  Suggested values include:
  - perg: parchment
  - chart: paper
  - mixed: mixture of any other materials than paper, papyrus, and parchment
  - papyrus
  - papyrus_perg: mixture of papyrus and parchment
  - papyrus_chart: mixture of papyrus and paper
  - perg_chart: mixture of paper and parchment
  - unknown: unknown [Default]
<surrogates>
Contains information about any non-digital representations of the manuscript being described which may exist in the holding institution or elsewhere.
Module: msdescription
<term>
The element term might be used to encode structural metadata on a manuscript such as physical entities, textual divisions, or special occurrences like annotations or decorative elements.
The values are copied from the set of terms for structural metadata for the DFG-Viewer.
Module: core
Attributes:
- key
  Status: Recommended
  Suggested values include:
  - additional: additional
  - address: address
  - annotation: annotation
  - article: article
  - binding: binding
  - bookplate: bookplate
  - chapter: chapter
  - collation: collation
  - colophon: colophon
  - contained_work: contained work
  - contents: table of contents
  - corrigenda: corrigenda
  - cover: cover
  - cover_front: front cover
  - cover_back: back cover
  - dedication: dedication
  - edge: edge
  - endsheet: endsheet
  - engraved_titlepage: engraved titlepage
  - entry: entry
  - fascicle: fascicle
  - fragment: fragment
  - illustration: illustration
  - imprint: imprint
  - index: index
  - initial_decoration: initial decoration
  - issue: issue
  - manuscript: manuscript
  - map: map
  - monograph: monograph
  - multivolume_work: multi volume work
  - musical_notation: musical notation
  - ornament: ornament
  - paste_down: paste down
  - periodical: periodical
  - preface: preface
  - printers_mark: printers mark
  - privileges: privileges
  - provenance: provenance
  - scheme: scheme
  - section: section
  - spine: spine
  - stamp: stamp
  - table: table
  - text: text
  - title_page: title page
  - verse: verse
  - volume: volume
- type
  Status: Optional
  Suggested values include:
  - author: The term is the name of an author of a text in the manuscript.
  - placename: The term is the name of a place.
  - script: The term names the script used in the manuscript.
  - structure: The term belongs to the set of structural metadata.
  - title: The term is the title of a text in the manuscript.
<textLang>
Module: msdescription
Attributes