SLIDE 1

Interlinking source text collections – a Norwegian example

Christian-Emil Ore

SLIDE 2

Charter by king Hákon Hákonsson 1225

SLIDE 3

Collection 1 – more recent transcripts

SLIDE 4

Collection 1 – more recent transcripts

  • Transcripts from manuscripts (facsimiles)
  • Written in Old Norwegian, 1170 – 1405
  • Approx. 5 000 transcripts
  • Transcribed on paper 1960 – 1990
  • Digitized with simple markup in the 1990s
  • Part of the Menotec project 2010 – 2012:
    – Linguistic text corpus, transcripts checked
    – XML encoding on the TEI-MENOTA diplomatic level
    – See www.menota.org

SLIDE 5

Collection 2 – Diplomatarium Norvegicum

[Figure: a sample entry, labelled with its parts: summary, source info, date, text number, place, edited text]

SLIDE 6

Collection 2 – Diplomatarium Norvegicum

  • 22 volumes, approx. 19 000 entries
  • Published 1846 – 1995
  • Each entry identified by volume and text number
  • Digitized in the 1990s, home-made XML encoding
  • Online since 1996
    – www.dokpro.uio.no/dipl_norv/diplom_felt.html
  • TEI P5 encoding 2011

SLIDE 7

Collection 3 – Regesta Norvegica

  • 9 volumes, approx. 10 500 entries
  • Published 1978 – present
  • Typesetting files and word processor files
  • Home-made XML encoding
  • Online since 2004
    – www.dokpro.uio.no/dipl_norv/regesta_felt.html
    – Links to Dipl. Norv. on the web
  • TEI P5 encoding 2011

SLIDE 8

Collection 3 – Regesta Norvegica

[Figure: a sample entry, labelled with its text witnesses and where it is published, among others in Diplomatarium Norvegicum]

SLIDE 9

Linking data – 1997

Archaeological acquisition catalogue:
  • 23447. Grave find from the Roman Iron Age from the stone circle at Jåberg (farm no. 139), Sandar parish, Vestfold county. A) Bronze fibula from the older Roman period of the main type [...]

“Norwegian farm names”:
  • 139, Jaaberg. Pron: jåbber. References: - i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]

Diplomatarium Norvegicum:
  • Vol II p. 657. No. 882, Date 26 August 1471. Place: [Hyppestad]
    [...] Jtem swor oc Stenulff Leidulfson sinsz fadhurs ordh at han gek med sin fadhur aff Jabergh som ligger i Sanda Hered deghi effther sancte Johannes dagh [...]
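
The farm-names entry cites documents compactly, for example `DN II 657, 1471`: a collection siglum, a volume in Roman numerals, a number and a year. Such citations are natural anchors for links between the collections. The Python sketch below pulls them out of an entry. It assumes exactly this `SIGLUM ROMAN NUMBER[, YEAR]` pattern and only the sigla `DN` and `NRJ` seen here; `SourceRef`, `roman_to_int` and `find_refs` are hypothetical names. Whether the number is a page or a text number is left open (compare “Vol II p. 657” on the slide).

```python
import re
from typing import NamedTuple, Optional

class SourceRef(NamedTuple):
    """Hypothetical record for one citation such as 'DN II 657, 1471'."""
    collection: str        # siglum as written, e.g. "DN" or "NRJ"
    volume: int            # Roman numeral converted to an integer
    locator: int           # the number after the volume (page or text number)
    year: Optional[int]    # year after the comma, if present

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}

def roman_to_int(s: str) -> int:
    """Convert a Roman numeral (subtractive notation allowed) to an int."""
    total = 0
    for i, ch in enumerate(s):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(s) and value < ROMAN[s[i + 1]]:
            total -= value
        else:
            total += value
    return total

# Assumed pattern: siglum, optional dot, Roman volume, number, optional ", year".
REF_PATTERN = re.compile(r"\b(DN|NRJ)\.?\s+([IVXLC]+)\s+(\d+)(?:,\s*(\d{4}))?")

def find_refs(text: str) -> list[SourceRef]:
    """Return every citation matching REF_PATTERN, in order of appearance."""
    refs = []
    for m in REF_PATTERN.finditer(text):
        year = int(m.group(4)) if m.group(4) else None
        refs.append(SourceRef(m.group(1), roman_to_int(m.group(2)),
                              int(m.group(3)), year))
    return refs
```

Applied to the reference list above, `find_refs` yields one `SourceRef` per DN or NRJ citation; each could then be matched against the volume and number keys of the digitized collections to produce a link.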

SLIDE 10

Linked Data – TEI XML documents

<TEI ...>
  <teiHeader>
    <!-- All kinds of metadata -->
    <!-- Persons, places, events, bibl. refs -->
    <!-- Text witnesses etc. -->
  </teiHeader>
  <text>
    <!-- XML-encoded text goes here -->
    ...
  </text>
  ...
</TEI>

Part 1, the text: the TEI document.
Part 2, data for Linked Data (semantic web), RDF triples: an additional structure with assertions and metadata extracted from the document, expressed in RDF/XML compliant with the CIDOC-CRM. The extraction is done by XSLT, cf. the CLAROS project.

SLIDE 11

The CIDOC CRM (ISO 21127)

Top-level classes relevant for integration

[Figure: E39 Actors (persons, inst.), E55 Types, E28 Conceptual Objects, E18 Physical Things, E2 Temporal Entities (Events), E41 Appellations, E53 Places and E52 Time-Spans, connected by relations labelled “participate in”, “refer to / refine”, “refer to / identify”, “have location”, “within”, “at” and “affect or refer to”]

SLIDE 12

The four levels of FRBR (Functional Requirements for Bibliographic Records)

[Figure: the four levels Work, Expression, Manifestation and Item, with the label (E28)]

SLIDE 13

Mapping between TEI and CIDOC-CRM

Actors, places and events:

  TEI element   CIDOC-CRM class
  person        E21 Person
  org           E74 Group
  place         E53 Place
  event         E5 Event

Names:

  TEI element   CIDOC-CRM class
  name          E41 Appellation
  placeName     E44 Place Appellation
  …             …

Detailed info at http://www.tei-c.org/SIG/Ontologies/guidelines/guidelinesTeiMappableCrm.xml
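
The element-to-class table can be applied mechanically to a TEI document. A minimal Python sketch with the standard library's `xml.etree.ElementTree`, assuming TEI P5 documents in the TEI namespace. The dictionary covers only the rows for actors, places, events and `name`; the underscore spelling of the class names (`E21_Person`) is an assumed convention, not one taken from the slides, and `TEI_TO_CRM` and `crm_classes` are hypothetical names.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Rows of the mapping table (class names in an assumed underscore form).
TEI_TO_CRM = {
    "person": "E21_Person",
    "org": "E74_Group",
    "place": "E53_Place",
    "event": "E5_Event",
    "name": "E41_Appellation",
}

def crm_classes(tei_xml: str) -> list[tuple[str, str]]:
    """Return (xml:id or '-', CRM class) for every mappable TEI element."""
    root = ET.fromstring(tei_xml)
    found = []
    for el in root.iter():
        # ElementTree reports namespaced tags as "{namespace-uri}localname".
        if not el.tag.startswith("{" + TEI_NS + "}"):
            continue
        local = el.tag.split("}", 1)[1]
        if local in TEI_TO_CRM:
            found.append((el.get(XML_ID, "-"), TEI_TO_CRM[local]))
    return found
```

Elements not in the table (such as `listPerson`) are skipped, so the result lists only the entities that would become typed resources in the triplestore.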

SLIDE 14

Other projects exploring interlinking 1

  • CLAROS: Classical Art Research Online Services
    – Partners in France, Germany, Greece and the UK
    – The CLAROS project aims to combine separate databases of information about the ancient world in an RDF triplestore of assertions expressed with the CIDOC CRM.
    – Currently includes art objects, archaeological sites, antiquarian photographs and onomastics. The Lexicon of Greek Personal Names contributes via its representation in TEI XML.

SLIDE 15

Other projects exploring interlinking 2

  • WissKI – Wissenschaftliche Kommunikations-Infrastruktur (2009 – 2011)
  • Common workspace, CIDOC-CRM, RDF triplestore
  • Partners:
    – Germanisches Nationalmuseum, Nürnberg: art history
    – Zoologisches Forschungsmuseum Alexander Koenig, Bonn: biological specimens, expedition diaries
    – Friedrich-Alexander-Universität Erlangen-Nürnberg: IT expertise

SLIDE 16

Information architecture

The WissKI and CLAROS projects are based on

  • A common database (RDF triplestore)
  • A data model based on the CIDOC-CRM, and
  • XML-encoded texts compliant with TEI P5

[Figure: partner databases and user interface(s) around a common RDF triplestore based on the CIDOC-CRM ontology]
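
The idea of a common store can be illustrated with a toy in-memory triplestore: each partner database contributes triples, and a single query pattern then finds statements from all partners about the same entity. This Python sketch is only a stand-in for a real RDF triplestore with a SPARQL endpoint, not the software used by WissKI or CLAROS. The `ex:` identifiers are hypothetical; `P53_has_former_or_current_location` and `P67_refers_to` are CIDOC CRM properties written in an assumed underscore form.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

class TripleStore:
    """Toy in-memory triplestore holding a set of triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def load(self, source: Iterable[Triple]) -> None:
        # Each partner database contributes its triples; duplicates merge.
        self.triples.update(source)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard, like a variable in a SPARQL triple pattern.
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

# Two hypothetical partners describing the same place.
store = TripleStore()
store.load([("ex:grave_find_23447", "P53_has_former_or_current_location", "ex:Jaaberg")])
store.load([("ex:dn_no_882", "P67_refers_to", "ex:Jaaberg")])
```

Because both partners use the same identifier for the place, `store.match(o="ex:Jaaberg")` returns the archaeological find and the charter together; agreeing on such shared identifiers is what the common CIDOC-CRM data model is meant to support.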

SLIDE 17

Collection 2 & 3: DN & RN

  • An entry comprises
    – A summary: who, what, when, to whom
    – Information about the text witness(es)
    – Date and place of creation
    – An edited text based on the text witness(es) (DN)
  • An entry can be seen as a FRBR expression of the texts found on the text witness(es)
  • Registers with additional information
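
One possible reading of this FRBR view, sketched as Python dataclasses: a work (the text as a conceptual object, cf. E28) is realized in expressions such as a DN entry or a diplomatic transcript; expressions are embodied in manifestations, and each manifestation is held as physical items (text witnesses). The class and field names are hypothetical and the model is deliberately minimal.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """A single physical copy, e.g. a text witness with its shelfmark."""
    shelfmark: str

@dataclass
class Manifestation:
    """A physical embodiment, e.g. a manuscript or a printed volume."""
    description: str
    items: list[Item] = field(default_factory=list)

@dataclass
class Expression:
    """A realization of the text, e.g. a DN entry or a diplomatic transcript."""
    label: str
    manifestations: list[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    """The text as a conceptual object (cf. CRM E28)."""
    title: str
    expressions: list[Expression] = field(default_factory=list)

    def items(self) -> list[str]:
        """Walk Work -> Expression -> Manifestation -> Item, skipping duplicates."""
        found: list[str] = []
        for expr in self.expressions:
            for man in expr.manifestations:
                for item in man.items:
                    if item.shelfmark not in found:
                        found.append(item.shelfmark)
        return found
```

Walking down the levels in this way gives, for one work, every physical witness reachable from any of its expressions, which is the kind of cross-collection question the linking aims to answer.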

SLIDE 18

Collection 1 – modern transcripts

  • Follows the “new philology” tradition
    – One text witness per transcript
    – A diplomatic transcript is, in principle, unedited
  • Information to identify the physical text witness
  • Added information: word, lemma, part of speech, punctuation, syntactic information

SLIDE 19

The texts as TEI XML documents

  • Diplomatarium Norvegicum and Regesta Norvegica
    – Each volume represents a complete printed work
    – One TEI XML file per volume
  • The transcripts:
    – Each transcript is a separate work
    – A separate TEI XML file for each transcript
    – Metadata taken from the other sources
  • Persistent identifiers (URIs)?
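
The open question of persistent identifiers can be made concrete. Since each DN entry is identified by volume and text number, a URI could be minted from that key and parsed back again. The Python sketch below assumes a purely hypothetical base URI (`https://example.org/id`) and path layout; it does not describe any scheme actually in use, and `mint_uri` and `parse_uri` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical base URI; no real identifier scheme is implied.
BASE = "https://example.org/id"
COLLECTIONS = {"DN", "RN"}  # Diplomatarium Norvegicum, Regesta Norvegica

def mint_uri(collection: str, volume: int, number: int) -> str:
    """Build a URI from the (collection, volume, text number) key of an entry."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    if volume < 1 or number < 1:
        raise ValueError("volume and number must be positive")
    return f"{BASE}/{collection.lower()}/{volume}/{number}"

def parse_uri(uri: str) -> tuple[str, int, int]:
    """Invert mint_uri so that an incoming link resolves to an entry key."""
    # The path has the form /id/<collection>/<volume>/<number>.
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "id":
        raise ValueError(f"not an entry URI: {uri}")
    return parts[1].upper(), int(parts[2]), int(parts[3])
```

Deriving the URI only from the entry key, not from file names, keeps it stable even though the entries are stored as one TEI XML file per volume.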

SLIDE 20

Possible points for external links

  • RN/DN registers
    – Persons, places, onomastic information
  • Transcripts
    – Linguistic information
  • RN/DN entries
    – Creation date, place
    – Text witnesses, archival signature
    – Cross-references for copies (vidimus) etc.
    – Published, mentioned, bibl. references

SLIDE 21

Information architecture in the documents

  • Real-world information and metadata are placed in the TEI header, with pointers to the corresponding parts of the TEI document.
  • An XSLT stylesheet extracts the information from the header into a set of RDF triples, which can be used in a Linked Data environment as in the CLAROS and WissKI projects.
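
The slides do this step with an XSLT stylesheet. As a swapped-in illustration of the same idea, here is a Python sketch with `xml.etree.ElementTree` that reads persons and places from a TEI header and writes N-Triples lines. The choice of header elements, the use of `rdfs:label` for names, and the CRM namespace URI `http://www.cidoc-crm.org/cidoc-crm/` are assumptions; `RULES` and `header_to_ntriples` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"  # assumed CRM namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# (path inside the header, CRM class, child element holding the name)
RULES = [
    (".//tei:listPerson/tei:person", "E21_Person", "tei:persName"),
    (".//tei:listPlace/tei:place", "E53_Place", "tei:placeName"),
]

def header_to_ntriples(tei_xml: str, doc_uri: str) -> list[str]:
    """Emit N-Triples for identified persons and places in the TEI header."""
    header = ET.fromstring(tei_xml).find("tei:teiHeader", NS)
    lines: list[str] = []
    if header is None:
        return lines
    for path, crm_class, name_path in RULES:
        for el in header.findall(path, NS):
            xml_id = el.get(XML_ID)
            if xml_id is None:
                continue  # only entities with an xml:id get a URI
            subject = f"<{doc_uri}#{xml_id}>"
            lines.append(f"{subject} <{RDF_TYPE}> <{CRM}{crm_class}> .")
            name = el.find(name_path, NS)
            if name is not None and name.text:
                # Escape backslashes and quotes for an N-Triples literal.
                literal = name.text.strip().replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{subject} <{RDFS_LABEL}> "{literal}" .')
    return lines
```

Because each subject URI is the document URI plus the element's `xml:id`, every triple keeps a pointer back to the header entry it came from.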

SLIDE 22

Outcome

  • State-of-the-art TEI XML-encoded files of the Diplomatarium Norvegicum and Regesta Norvegica
  • The XML transcript files will have a richer metadata header with information from the other sources
  • A better interlinked website for medieval documents pertaining to Norway, with free download of all the XML texts
  • Opening the material to other projects using Linked (Open) Data
  • Hopefully an invitation to other archives to at least discuss linking their collections (especially relevant for the Nordic countries and the Hanseatic archives)

SLIDE 23

Encoding of the Diplomatarium Norvegicum texts

[...] <lb xml:id="pb_dn_02_005_001"/> H[akon] konongr sun H[akonar] konongs sendir bondom oc <lb xml:id="pb_dn_02_005_002"/> buþeignum. ollum guðs vinnum. oc sinum þeim er þetta bref sea eða [...]

Encoding of the transcripts

[...]
<pb ed="ms"/><lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
<w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
<w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
<w xml:id="w000003"><me:dipl>sun</me:dipl></w>
<me:punct>.</me:punct>
<w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
<w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
[...]
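
The `idref` attribute on the transcript's `<lb ed="ms">` points at an `xml:id` in the DN edition, so the two encodings can be checked against each other. A minimal Python sketch, assuming both files are well-formed TEI in the TEI namespace (with the `me:` prefix declared in the transcript) and that `idref` is the only cross-document pointer; `collect_ids` and `check_line_links` are hypothetical names.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def collect_ids(xml_text: str) -> set[str]:
    """All xml:id values in a document, e.g. the line ids of the DN edition."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}

def check_line_links(transcript: str, edition: str) -> tuple[list[str], list[str]]:
    """Split the transcript's <lb idref="..."> targets into resolved and dangling."""
    targets = collect_ids(edition)
    resolved: list[str] = []
    dangling: list[str] = []
    for lb in ET.fromstring(transcript).iter(TEI + "lb"):
        ref = lb.get("idref")
        if ref is None:
            continue  # e.g. <lb n="1" ed="DN"/> carries no cross-document link
        (resolved if ref in targets else dangling).append(ref)
    return resolved, dangling
```

A dangling reference signals that a line break in the transcript no longer matches a line in the edition, which is worth catching before the links are published.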