Interlinking source text collections: a Norwegian example
Christian-Emil Ore
Charter by king Hákon Hákonsson 1225
Collection 1 – more recent transcripts
- Transcripts from manuscripts (facsimiles)
- Written in Old Norwegian 1170 – 1405
- Approx 5 000 transcripts
- Transcribed on paper 1960 – 1990
- Digitized with simple markup in the 1990s
- A part of the Menotec project 2010 – 2012:
– Linguistic text corpus, transcript checked
– XML encoding on TEI-MENOTA diplomatic level
– See www.menota.org
Collection 2 – Diplomatarium Norvegicum
[Figure: a Diplomatarium Norvegicum entry with its parts labelled: Summary, Source info, Date, Text number, Place, Edited text]
- 22 volumes, 19 000 entries (approx)
- Published 1846 – 1995
- Each entry identified by volume and text number
- Digitized in the 1990s, home-made XML encoding
- Online since 1996
– www.dokpro.uio.no/dipl_norv/diplom_felt.html
- TEI P5 encoding 2011
Collection 3 – Regesta Norvegica
- 9 volumes, 10 500 entries (approx)
- Published 1978 – present
- Typesetting files and word processor files
- Home-made XML encoding
- Online since 2004
– www.dokpro.uio.no/dipl_norv/regesta_felt.html
– Links to Dipl. Norv. on the web
- TEI P5 encoding 2011
Collection 3 – Regesta Norvegica
[Figure: a Regesta Norvegica entry with its parts labelled: Text witnesses; Where it is published, among others Diplomatarium Norvegicum]
Linking data – 1997
Archaeological acquisition catalogue:
- 23447. Grave find from the Roman Iron Age from the stone circle at Jåberg (farmnr. 139), Sandar parish, Vestfold county. A) Bronze fibula from the older Roman period of the main type [...]
“Norwegian farm names”:
- 139, Jaaberg. Pron: jåbber. References: - i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]
Diplomatarium Norvegicum:
- Vol II p. 657
- No. 882, Date 26 August 1471. Place: [Hyppestad]
[...] Jtem swor oc Stenulff Leidulfson sinsz fadhurs ordh at han gek med sin fadhur aff Jabergh som ligger i Sanda Hered deghi effther sancte Johannes dagh [...]
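The kind of link shown on this slide, from a farm-name entry to a Diplomatarium Norvegicum page, can be sketched as a small citation parser. This is only an illustration, not the project's actual tooling: the citation shape (spelling, "DN", Roman volume, page, year) is inferred from the single entry quoted above, and the function names are hypothetical.

```python
import re

# Assumed citation shape, inferred from the quoted farm-name entry:
# "<spelling> DN <Roman volume> <page>, <year>"
DN_REF = re.compile(r"(\w+)\s+DN\s+([IVXLC]+)\s+(\d+),\s+(\d{4})")

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'II' or 'XIV' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # Subtractive notation: a smaller numeral before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def find_dn_references(text):
    """Return every DN citation found in text as a dict."""
    return [
        {
            "spelling": m.group(1),
            "volume": roman_to_int(m.group(2)),
            "page": int(m.group(3)),
            "year": int(m.group(4)),
        }
        for m in DN_REF.finditer(text)
    ]


farm_entry = (
    "139, Jaaberg. Pron: jåbber. References: - i Jabærghi RB. 31, 56. "
    "Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539."
)

refs = find_dn_references(farm_entry)
for ref in refs:
    print(ref)
```

Each parsed reference carries the volume and page needed to look up the corresponding Diplomatarium Norvegicum entry; citations to other works (RB, NRJ) are ignored by this pattern.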
<TEI ...>
  <teiHeader>
    <!-- All kinds of metadata -->
    <!-- Persons, places, events, bibl. ref -->
    <!-- text witnesses etc -->
  </teiHeader>
  <text>
    <!-- XML-encoded text goes here -->
    ...
  </text>
  ...
</TEI>
Additional structure with extracted assertions and metadata from the document, expressed in RDF/XML compliant with CIDOC-CRM. Extraction done by XSLT, cf. the CLAROS project.
Part 1: the text
Part 2: data for Linked Data (semantic web), RDF triples
Linked Data – TEI XML documents
The CIDOC CRM (ISO-21127)
Top-level Classes relevant for Integration
[Diagram: top-level CRM classes and the relations between them]
Classes: E39 Actors (persons, inst.), E55 Types, E28 Conceptual Objects, E18 Physical Things, E2 Temporal Entities (Events), E41 Appellations, E53 Places, E52 Time-Spans
Relation labels: participate in; affect or refer to; refer to / refine; refer to / identify; have location; within; at
The four levels of FRBR (Functional Requirements for Bibliographic Records)
(E28) Work, Expression, Manifestation, Item
Actors, places and events:
TEI element | CIDOC-CRM class
person | E21 Person
org | E74 Group
place | E53 Place
event | E5 Event
Names:
TEI element | CIDOC-CRM class
name | E41 Appellation
placeName | E44 Place Appellation
… | …
Mapping between TEI and CIDOC-CRM
Detailed info at http://www.tei-c.org/SIG/Ontologies/guidelines/guidelinesTeiMappableCrm.xml
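A mapping of this kind is easy to express as a lookup table. The sketch below is illustrative only: it covers just the rows for person, org, place, event and name listed above, the XML fragment is a made-up, namespace-free example reusing names from the charter quoted earlier, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Subset of the TEI-to-CRM mapping listed above; the full mapping is in the
# TEI guidelines document referenced in the slides.
TEI_TO_CRM = {
    "person": "E21 Person",
    "org": "E74 Group",
    "place": "E53 Place",
    "event": "E5 Event",
    "name": "E41 Appellation",
}


def crm_class(tei_element: str) -> Optional[str]:
    """Return the CIDOC-CRM class for a TEI element name, or None if unmapped."""
    return TEI_TO_CRM.get(tei_element)


# Hypothetical fragment. Real TEI files declare the TEI namespace, which is
# left out here to keep the example short.
fragment = """
<div>
  <person><name>Stenulff Leidulfson</name></person>
  <place><name>Jabergh</name></place>
</div>
"""

root = ET.fromstring(fragment.strip())

# Walk the fragment in document order and keep only the mapped elements.
classified = [
    (el.tag, crm_class(el.tag))
    for el in root.iter()
    if crm_class(el.tag) is not None
]

for tag, cls in classified:
    print(f"{tag} -> {cls}")
```

Unmapped elements (here the wrapping div) are simply skipped, which mirrors the idea that only some TEI elements carry real-world assertions.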
- CLAROS: Classical Art Research Online Services
– Partners in France, Germany, Greece, UK
– The CLAROS project aims to combine discrete databases of information about the ancient world using an RDF triplestore of assertions based on CIDOC CRM.
– Currently includes art objects, archaeological sites, antiquarian photographs, and onomastics. The Lexicon of Greek Personal Names contributes via a representation in TEI XML.
Other projects exploring interlinking 1
- WissKI – Wissenschaftliche Kommunikations-Infrastruktur (2009 – 2011)
- Common work space, CIDOC-CRM, RDF triplestore
- Partners
– Germanisches Nationalmuseum, Nürnberg: art history
– Zoologisches Forschungsmuseum Alexander Koenig, Bonn: biological specimens, expedition diaries
– Friedrich-Alexander-Universität Erlangen-Nürnberg: IT expertise
Other projects exploring interlinking 2
Information architecture
The WissKI and CLAROS projects are based on
- A common database (RDF triplestore)
- A data model based on the CIDOC-CRM
- XML-encoded texts compliant with TEI P5
[Diagram: Partner databases ... | Common RDF triplestore based on the CIDOC-CRM ontology | User interface(s)]
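To make the idea of a common triplestore concrete, here is a minimal in-memory sketch. It is not how CLAROS or WissKI are implemented; the class, the `ex:` identifiers and the choice of CRM properties are assumptions made for illustration, using the Jåberg example from the earlier slide.

```python
class TripleStore:
    """A toy triplestore: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        """Load the triples contributed by one partner database."""
        self._triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical contributions from three partner databases. All ex: URIs are
# invented; the CRM property names are used loosely for illustration.
farm_names = [
    ("ex:farm/139", "rdf:type", "crm:E53_Place"),
    ("ex:farm/139", "rdfs:label", "Jaaberg"),
]
diplomatarium = [
    ("ex:dn/2/882", "crm:P67_refers_to", "ex:farm/139"),
]
catalogue = [
    ("ex:find/23447", "crm:P53_has_former_or_current_location", "ex:farm/139"),
]

store = TripleStore()
for partner in (farm_names, diplomatarium, catalogue):
    store.add_all(partner)

# Everything, from any partner, that points at the farm.
about_farm = store.match(o="ex:farm/139")
for triple in about_farm:
    print(triple)
```

Once every partner uses the same identifier for the farm, a single pattern query returns both the charter and the grave find, which is the point of the shared store.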
- An entry comprises
– A summary: who, what, when, to whom
– Information about text witness(es)
– Date and place of creation
– An edited text based on the text witness(es) (DN)
- An entry can be seen as a FRBR expression for the texts found on the text witness(es)
- Registers with additional information
Collection 2 & 3: DN & RN
- Follows the “new philology” tradition
– One text witness per transcript
– The diplomatic transcript is in principle unedited
- Information to identify the physical text witness
- Added information: word, lemma, part of speech, punctuation, syntactic information
Collection 1 – modern transcripts
- Diplomatarium Norvegicum and Regesta Norvegica
– Each volume represents a complete printed work
– One TEI XML file per volume
- The transcripts:
– Each transcript is a separate work
– Separate TEI XML files for each transcript
– Metadata taken from the other sources
- Persistent identifiers (URIs)?
The texts as TEI-XML documents
- RN/DN registers
– Persons, places, onomastic information
- Transcripts
– Linguistic information
- RN/DN entries
– Creation date, place
– Text witnesses, archival signature
– Cross references for copies (vidimus) etc.
– Published, mentioned, bibl. references
Possible points for external links
- The real-world and meta information is placed in the TEI header, with pointers to the corresponding parts of the TEI document.
- An XSLT stylesheet extracts the information from the header into a set of RDF triples, which can be used in a Linked Data environment, as in the CLAROS and WissKI projects.
Information architecture in the documents
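The header-to-triples step can be sketched as follows. The slides name XSLT as the actual tool; the Python below is a stand-in used only to show the shape of the transformation. The sample document, the base URI and the `rdf:type`/`rdfs:label` output vocabulary are assumptions; only the TEI namespace and the person/place class mapping come from established sources.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "http://example.org/id/"  # hypothetical base URI for minted identifiers

# TEI header elements to extract, with the CRM class each one maps to.
HEADER_CLASSES = {"person": "crm:E21_Person", "place": "crm:E53_Place"}

# Hypothetical TEI document; the header content is invented for illustration.
document = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc><listPerson>
        <person xml:id="stenulff"><persName>Stenulff Leidulfson</persName></person>
      </listPerson></particDesc>
      <settingDesc><listPlace>
        <place xml:id="jabergh"><placeName>Jabergh</placeName></place>
      </listPlace></settingDesc>
    </profileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>
"""


def header_to_triples(xml_text):
    """Extract (subject, predicate, object) triples from the TEI header only."""
    root = ET.fromstring(xml_text.strip())
    header = root.find(f"{{{TEI}}}teiHeader")
    triples = []
    for tag, crm in HEADER_CLASSES.items():
        for el in header.iter(f"{{{TEI}}}{tag}"):
            subject = BASE + el.get(XML_ID)
            # Collapse whitespace in the element's text content to get a label.
            label = " ".join("".join(el.itertext()).split())
            triples.append((subject, "rdf:type", crm))
            triples.append((subject, "rdfs:label", label))
    return triples


triples = header_to_triples(document)
for t in triples:
    print(t)
```

Because the `xml:id` values double as pointer targets inside the document, the minted URIs can lead back from the triplestore to the exact place in the text.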
- State-of-the-art TEI XML encoded files of the Diplomatarium Norvegicum and Regesta Norvegica
- The XML transcript files will have a richer metadata header with info from the other sources
- A better interlinked web site for medieval documents pertaining to Norway, with free download of all the XML texts
- Open the material to other projects using Linked (Open) Data
- Hopefully an invitation to other archives to at least discuss linking of their collections (especially relevant for the Nordic countries and Hanseatic archives)
Outcome
Encoding of the Diplomatarium Norvegicum texts
[...]
<lb xml:id="pb_dn_02_005_001"/> H[akon] konongr sun H[akonar] konongs sendir bondom oc
<lb xml:id="pb_dn_02_005_002"/> buþeignum. ollum guðs vinnum. oc sinum þeim er þetta bref sea eða
[...]
Encoding of the transcripts
[...]
<pb ed="ms"/><lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
<w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
<w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
<w xml:id="w000003"><me:dipl>sun</me:dipl></w>
<me:punct>.</me:punct>
<w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
<w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
[...]
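The two encodings are tied together by the `idref` on the transcript's line break, which points at the `xml:id` of the matching line in the Diplomatarium Norvegicum file. A minimal sketch of reading the transcript words and following that pointer is given below. The wrapping `ab` element, the namespace URI bound to the `me:` prefix and the small lookup table are assumptions made so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

ME = "http://www.menota.org/ns/1.0"  # assumed namespace URI for the me: prefix

# The transcript fragment from the slide, wrapped in a hypothetical <ab>
# element that declares the me: prefix.
transcript = f"""
<ab xmlns:me="{ME}">
  <lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
  <w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
  <w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
  <w xml:id="w000003"><me:dipl>sun</me:dipl></w>
  <me:punct>.</me:punct>
  <w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
  <w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
</ab>
"""

# Lines of the edited DN text keyed by xml:id, copied from the DN slide.
dn_lines = {
    "pb_dn_02_005_001": "H[akon] konongr sun H[akonar] konongs sendir bondom oc",
}

root = ET.fromstring(transcript.strip())

# Diplomatic word forms with the abbreviation expansions (<ex>) spelled out.
words = ["".join(w.itertext()) for w in root.iter("w")]

# Pointers from transcript line breaks to line ids in the DN file.
links = [lb.get("idref") for lb in root.iter("lb") if lb.get("idref")]

print(words)
for target in links:
    print(target, "->", dn_lines.get(target))
```

Comparing the two outputs side by side shows the difference between the editions: the transcript records expansions word by word, while the DN text marks them with square brackets.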