Interlinking source text collections – a Norwegian example


  1. Interlinking source text collections – a Norwegian example
     Christian-Emil Ore

  2. Charter by King Hákon Hákonsson, 1225

  3. Collection 1 – more recent transcripts

  4. Collection 1 – more recent transcripts
     • Transcripts from manuscripts (facsimiles)
     • Written in Old Norwegian, 1170–1405
     • Approx. 5,000 transcripts
     • Transcribed on paper 1960–1990
     • Digitized with simple markup in the 1990s
     • Part of the Menotec project, 2010–2012:
       – Linguistic text corpus, transcripts checked
       – XML encoding at the TEI-MENOTA diplomatic level
       – See www.menota.org

  5. Collection 2 – Diplomatarium Norvegicum
     [Annotated sample entry. Labelled parts: summary, source info, text number, date, place, edited text]

  6. Collection 2 – Diplomatarium Norvegicum
     • 22 volumes, approx. 19,000 entries
     • Published 1846–1995
     • Each entry identified by volume and text number
     • Digitized in the 1990s with home-made XML encoding
     • Online since 1996
       – www.dokpro.uio.no/dipl_norv/diplom_felt.html
     • TEI P5 encoding 2011

  7. Collection 3 – Regesta Norvegica
     • 9 volumes, approx. 10,500 entries
     • Published 1978–present
     • Typesetting files and word processor files
     • Home-made XML encoding
     • Online since 2004
       – www.dokpro.uio.no/dipl_norv/regesta_felt.html
       – Links to Dipl. Norv. on the web
     • TEI P5 encoding 2011

  8. Collection 3 – Regesta Norvegica
     [Annotated sample entry. Labelled parts: text witnesses; where the text is published (among others, Diplomatarium Norvegicum)]

  9. Linking data – 1997
     “Norwegian farm names”:
       139, Jaaberg. Pron: jåbber. References: - i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]
     Diplomatarium Norvegicum:
       Vol II p. 657 No. 882, Date 26 August 1471. Place: [Hyppestad] [...] Jtem swor oc Stenulff Leidulfson sinsz fadhurs ordh at han gek med sin fadhur aff Jabergh som ligger i Sanda Hered deghi effther sancte Johannes dagh [...]
     Archaeological acquisition catalogue:
       23447. Grave find from the Roman Iron Age, from the stone circle at Jåberg (farm no. 139), Sandar parish, Vestfold county. A) Bronze fibula from the older Roman period, of the main type [...]
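
The cross-reference between the farm-name entry and the Diplomatarium Norvegicum can be illustrated with a small parser. This is a minimal sketch, assuming that citations always follow the pattern seen in the example ("DN", Roman volume numeral, page, year); the function names are hypothetical and other citation forms (RB., NRJ.) are ignored.

```python
import re

# Values of the Roman digits needed for DN volume numbers (22 volumes).
ROMAN = {"I": 1, "V": 5, "X": 10}

def roman_to_int(numeral: str) -> int:
    """Convert a small Roman numeral such as 'II' or 'XIV' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # Subtractive notation: a smaller digit before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

# Assumed citation pattern: "DN II 657, 1471" = volume, page, year.
DN_REF = re.compile(r"DN\s+([IVX]+)\s+(\d+),\s*(\d{4})")

def find_dn_references(text: str) -> list:
    """Return every DN citation in the text as a dict of volume, page and year."""
    return [
        {
            "volume": roman_to_int(m.group(1)),
            "page": int(m.group(2)),
            "year": int(m.group(3)),
        }
        for m in DN_REF.finditer(text)
    ]

entry = (
    "i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. "
    "Iaberg NRJ. IV 127. Jabere DN III 836, 1539."
)
refs = find_dn_references(entry)
```

Each resulting dict gives a volume and page that can be matched against the Diplomatarium Norvegicum entry quoted above (Vol II p. 657).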

  10. Linked Data – TEI XML documents
      Part 1, the text:
          <TEI ...>
            <teiHeader>
              <!-- All kinds of metadata -->
              <!-- Persons, places, events, bibl. ref. -->
              <!-- Text witnesses etc. -->
            </teiHeader>
            <text>
              <!-- XML-encoded text goes here -->
              ...
            </text>
            ...
          </TEI>
      Part 2, additional structure with extracted assertions:
      data for and metadata from the document, expressed as Linked Data (RDF triples) in RDF/XML compliant with CIDOC-CRM (semantic web)
      Extraction done by XSLT, cf. the CLAROS project

  11. The CIDOC CRM (ISO 21127): top-level classes relevant for integration
      [Diagram. Classes: E55 Types; E39 Actors (persons, institutions); E28 Conceptual Objects; E41 Appellations; E18 Physical Things; E2 Temporal Entities (events); E52 Time-Spans; E53 Places. Relationship labels: refer to / refine; refer to / identify; participate in; affect or refer to; have location; at; within]

  12. The four levels of FRBR (Functional Requirements for Bibliographic Records)
      (E28) Work, Expression, Manifestation, Item

  13. Mapping between TEI and CIDOC-CRM
      Actors, places and events:
          TEI element    CIDOC-CRM class
          person         E21 Person
          org            E74 Group
          place          E53 Place
          event          E5 Event
      Names:
          TEI element    CIDOC-CRM class
          name           E41 Appellation
          placeName      E44 Place Appellation
          …              …
      Detailed info at http://www.tei-c.org/SIG/Ontologies/guidelines/guidelinesTeiMappableCrm.xml
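
The element-to-class mapping lends itself to a simple lookup table. The following is a minimal sketch covering only a subset of the rows listed above; the helper names and the sample fragment are hypothetical (the fragment is not valid TEI, it only exercises the lookup), and a real implementation would follow the detailed guidelines at the URL given.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Subset of the TEI element -> CIDOC-CRM class mapping listed on the slide.
TEI_TO_CRM = {
    "person": "E21 Person",
    "org": "E74 Group",
    "place": "E53 Place",
    "event": "E5 Event",
    "name": "E41 Appellation",
}

def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def crm_class_for(element: ET.Element) -> Optional[str]:
    """Return the CIDOC-CRM class for a TEI element, or None if unmapped."""
    return TEI_TO_CRM.get(local_name(element.tag))

# Hypothetical fragment used only to exercise the lookup.
sample = ET.fromstring(
    '<sample xmlns="http://www.tei-c.org/ns/1.0">'
    "<person/><org/><place/><event/><note/>"
    "</sample>"
)
classes = [crm_class_for(child) for child in sample]
```

Elements without an entry in the table, such as `note` here, yield `None`, which leaves room for the mapping to be extended row by row.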

  14. Other projects exploring interlinking 1
      • CLAROS: Classical Art Research Online Services
        – Partners in France, Germany, Greece, UK
        – Aims to combine discrete databases of information about the ancient world in an RDF triplestore of assertions based on the CIDOC CRM
        – Currently includes art objects, archaeological sites, antiquarian photographs, and onomastics; the Lexicon of Greek Personal Names contributes via a representation in TEI XML

  15. Other projects exploring interlinking 2
      • WissKI – Wissenschaftliche Kommunikations-Infrastruktur (2009–2011)
      • Common work space, CIDOC-CRM, RDF triplestore
      • Partners
        – Germanisches Nationalmuseum, Nürnberg: art history
        – Zoologisches Forschungsmuseum Alexander Koenig, Bonn: biological specimens, expedition diaries
        – Friedrich-Alexander-Universität Erlangen-Nürnberg: IT expertise

  16. Information architecture
      The WissKI and CLAROS projects are based on
      • A common database (RDF triplestore)
      • A data model based on the CIDOC-CRM
      • XML-encoded texts compliant with TEI P5
      [Diagram labels: user interface(s); common RDF triplestore based on the CIDOC-CRM ontology; partner databases]

  17. Collection 2 & 3: DN & RN
      • An entry comprises
        – A summary: who, what, when, to whom
        – Information about the text witness(es)
        – Date and place of creation
        – An edited text based on the text witness(es) (DN)
      • An entry can be seen as a FRBR expression of the texts found on the text witness(es)
      • Registers with additional information
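
The parts of an entry listed above map naturally onto a small record type. This is only an illustrative sketch: the class and field names are hypothetical, not taken from the project's encoding, and the sample values come from the Diplomatarium Norvegicum entry quoted on the "Linking data" slide (its summary and witness information are not reproduced there, so they are left empty).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    """One DN or RN entry; field names are illustrative assumptions."""
    collection: str                    # "DN" or "RN"
    volume: int                        # entries are identified by volume ...
    number: int                        # ... and text number
    summary: str = ""                  # who, what, when, to whom
    date: Optional[str] = None         # date of creation
    place: Optional[str] = None        # place of creation
    witnesses: list = field(default_factory=list)  # text witness descriptions
    edited_text: Optional[str] = None  # only DN entries carry an edited text

    def key(self) -> tuple:
        """Identify the entry by collection, volume and text number."""
        return (self.collection, self.volume, self.number)

entry = Entry(
    collection="DN",
    volume=2,
    number=882,
    date="26 August 1471",
    place="[Hyppestad]",
)
```

Keeping `edited_text` optional reflects the point that only DN entries include an edited text, while both collections share the remaining fields.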

  18. Collection 1 – modern transcripts
      • Follows the “new philology” tradition
        – One text witness per transcript
        – The diplomatic transcript is in principle unedited
      • Information to identify the physical text witness
      • Added information: word, lemma, part of speech, punctuation, syntactic information

  19. The texts as TEI-XML documents
      • Diplomatarium Norvegicum and Regesta Norvegica
        – Each volume represents a complete printed work
        – One TEI XML file per volume
      • The transcripts
        – Each transcript is a separate work
        – A separate TEI XML file for each transcript
        – Metadata taken from the other sources
      • Persistent identifiers (URIs)?

  20. Possible points for external links
      • RN/DN registers
        – Persons, places, onomastic information
      • Transcripts
        – Linguistic information
      • RN/DN entries
        – Creation date, place
        – Text witnesses, archival signature
        – Cross-references for copies (vidimus) etc.
        – Published, mentioned, bibl. references

  21. Information architecture in the documents
      • The real-world/meta information is placed in the TEI header, with pointers to the corresponding parts of the TEI document.
      • An XSLT stylesheet extracts the information from the header into a set of RDF triples, which can be used in a Linked Data environment as in the CLAROS and WissKI projects.
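
The header-to-triples step can be sketched as follows. The slides state that the extraction is done with XSLT; Python is used here only to keep the illustration short and testable. Everything specific in the sketch is an assumption: the sample document, the use of the TEI `corresp` attribute as the pointer from header to text, the `http://example.org/` base URI, and the `ex:mentionedAt` predicate are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "http://example.org/charter/"  # hypothetical URI base

# Hypothetical document: one person in the header pointing at a word in the text.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc><particDesc><listPerson>
      <person xml:id="p1" corresp="#w000001">
        <persName>Hákon Hákonsson</persName>
      </person>
    </listPerson></particDesc></profileDesc>
  </teiHeader>
  <text><body><p>
    <w xml:id="w000001">Hakon</w> <w xml:id="w000002">konongr</w>
  </p></body></text>
</TEI>"""

def header_triples(root, base=BASE):
    """Turn each header person into (subject, predicate, object) tuples."""
    known_ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    triples = []
    for person in root.iterfind(".//tei:teiHeader//tei:person", NS):
        subject = base + person.get(XML_ID)
        triples.append((subject, "rdf:type", "crm:E21_Person"))
        name = person.findtext("tei:persName", namespaces=NS)
        if name:
            triples.append((subject, "rdfs:label", name))
        # Follow each pointer into the text, keeping only targets that exist.
        for pointer in person.get("corresp", "").split():
            target = pointer.lstrip("#")
            if target in known_ids:
                triples.append((subject, "ex:mentionedAt", base + target))
    return triples

triples = header_triples(ET.fromstring(DOC))
```

Checking each pointer against the set of `xml:id` values means a dangling reference in the header produces no triple rather than a broken link.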

  22. Outcome
      • State-of-the-art TEI XML encoded files of the Diplomatarium Norvegicum and Regesta Norvegica
      • The XML transcript files will have a richer metadata header with information from the other sources
      • A better interlinked web site for medieval documents pertaining to Norway, with free download of all the XML texts
      • The material opened to other projects using Linked (Open) Data
      • Hopefully an invitation to other archives to at least discuss linking of their collections (especially relevant for the Nordic countries and Hanseatic archives)

  23. Encoding of the Diplomatarium Norvegicum texts
          [...]
          <lb xml:id="pb_dn_02_005_001"/> H[akon] konongr sun H[akonar] konongs sendir bondom oc
          <lb xml:id="pb_dn_02_005_002"/> buþeignum. ollum guðs vinnum. oc sinum þeim er þetta bref sea eða
          [...]
      Encoding of the transcripts
          [...]
          <pb ed="ms"/><lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
          <w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
          <w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
          <w xml:id="w000003"><me:dipl>sun</me:dipl></w>
          <me:punct>.</me:punct>
          <w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
          <w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
          [...]
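
The two encodings can be joined through the `idref` on the transcript's line break, which repeats the `xml:id` of the corresponding line in the Diplomatarium Norvegicum text. The sketch below shows one possible way to follow that link and to read the word forms with their `<ex>` expansions. It assumes the fragments are wrapped in a hypothetical `<p>` root, and the namespace URI bound to the `me:` prefix is an assumption, since the slide does not show the declaration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ME = "http://www.menota.org/ns/1.0"  # assumed namespace URI for the me: prefix

# Fragments from the slide, wrapped in a hypothetical <p> root so they parse.
DN = """<p>
<lb xml:id="pb_dn_02_005_001"/> H[akon] konongr sun H[akonar] konongs sendir bondom oc
<lb xml:id="pb_dn_02_005_002"/> buþeignum. ollum guðs vinnum. oc sinum þeim er þetta bref sea eða
</p>"""

TRANSCRIPT = f"""<p xmlns:me="{ME}">
<pb ed="ms"/><lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
<w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
<w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
<w xml:id="w000003"><me:dipl>sun</me:dipl></w>
<me:punct>.</me:punct>
<w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
<w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
</p>"""

def dn_lines(root):
    """Map each identified <lb> to the text that directly follows it."""
    return {
        lb.get(XML_ID): (lb.tail or "").strip()
        for lb in root.iter("lb")
        if lb.get(XML_ID)
    }

def transcript_words(root):
    """Return each word with its <ex> abbreviation expansions spelled out."""
    return ["".join(w.itertext()) for w in root.iter("w")]

def linked_dn_line(root, lines):
    """Follow the first idref that matches a known DN line, if any."""
    for lb in root.iter("lb"):
        target = lb.get("idref")
        if target in lines:
            return target, lines[target]
    return None

lines = dn_lines(ET.fromstring(DN))
transcript = ET.fromstring(TRANSCRIPT)
words = transcript_words(transcript)
link = linked_dn_line(transcript, lines)
```

With the link resolved, the diplomatic word forms of the transcript can be shown next to the edited line they correspond to.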
