
Linking the TEI: Approaches, Limitations, Use Cases

DH2019, Utrecht, 2019-07-11. Linking the TEI: Approaches, Limitations, Use Cases. Christian Chiarcos & Maxim Ionov, {chiarcos|ionov}@cs.uni-frankfurt.de, Applied Computational Linguistics, Goethe Universität Frankfurt, Germany


  1. DH2019, Utrecht, 2019-07-11
     Linking the TEI: Approaches, Limitations, Use Cases
     Christian Chiarcos & Maxim Ionov, {chiarcos|ionov}@cs.uni-frankfurt.de
     Applied Computational Linguistics, Goethe Universität Frankfurt, Germany

  2. Linking the TEI
     • The (TEI-)native way
       - URIs, but no properties
       - User-provided ad hoc converters
     • The inline way
       - TEI-endorsed, but not LOD-compliant
       - LOD-compliant, but not TEI-endorsed
     • The "proper" way
       - standard-conformant standoff annotations

  3. Linking the TEI
     • The (TEI-)native way
       - URIs, but no properties
       - User-provided ad hoc converters
     • The inline way
       - TEI-endorsed, but not LOD-compliant
       - LOD-compliant, but not TEI-endorsed
     • The "proper" way
       - standard-conformant standoff annotations
       - TEI/XML + WebAnnotation (JSON-LD)
       - TEI-compliant and LOD-compliant
       - restricted to static TEI documents

  4. Linking the TEI
     • The (TEI-)native way
       - URIs, but no properties
       - User-provided ad hoc converters
     • The inline way
       - TEI-endorsed, but not LOD-compliant
       - LOD-compliant, but not TEI-endorsed
       - inline XML solutions?
     • The "proper" way
       - standard-conformant standoff annotations
       - TEI/XML + WebAnnotation (JSON-LD)
       - TEI-compliant and LOD-compliant
       - restricted to static TEI documents
       - breaks if we have dynamic TEI content

  5. TEI and TEI customizations
     • very rich vocabulary
       - TEI P5: 569 elements, 505 attributes
     • TEI customization with ODD
       - high-level specification for customizing the TEI
         - select modules
         - refine vocabulary elements
         - generate (textual) documentation
         - generate actual schemas
       - any TEI project should start with such a customization

  6. TEI and TEI customizations
     • TEI occupies a very distinctive position with respect to the idea of standardization:
       - strictly speaking, it is not a standard,
       - but is poised between a standard and a consensus,
       - possessing some characteristics of each, in ways that have very interesting consequences for extension and interchange.
       (Bauman & Flanders 2004, bullet points by us)
     • TEI compliance does not entail interoperability
       - Documents following the same customization are interoperable.
       - Beyond that, the TEI provides only an orientation for their interpretation.
     • For problems not documented in the TEI documentation, different customizations will not be interoperable, e.g., when trying to encode RDF triples in the TEI ;)

  7. Resource Description Framework
     • RDF 1.1: general data model for the web of data
       - W3C recommendation 2014
     • directed labeled multi-graph
       - multiple edges between the same nodes
     • for every edge:
       - source node ("RDF subject") defined by URI
       - edge type ("RDF property / relation") defined by URI
       - target node ("RDF object") defined by URI
     • alternatively, the target can also be an atomic (literal) value
     ⇒ "triple" / "RDF statement"
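The data model on this slide can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the `Triple` type, the `edges_between` helper, and all `http://example.org/` URIs are hypothetical and do not come from the slides; only the `rdfs:label` URI is a real vocabulary term.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the source node
    predicate: str                # URI of the edge type
    obj: str                      # URI of the target node, or a literal value
    obj_is_literal: bool = False  # True if obj is an atomic (literal) value


EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# An RDF graph is a set of triples. Two edges between the same pair of
# nodes are fine as long as their edge types differ (multi-graph).
graph = {
    Triple(EX + "a", EX + "cites", EX + "b"),
    Triple(EX + "a", EX + "quotes", EX + "b"),
    Triple(EX + "a", RDFS_LABEL, "Text A", obj_is_literal=True),
}


def edges_between(graph, source, target):
    """Return the sorted edge types that connect source to target."""
    return sorted(
        t.predicate
        for t in graph
        if t.subject == source and t.obj == target and not t.obj_is_literal
    )


parallel = edges_between(graph, EX + "a", EX + "b")
print(parallel)
```

Storing the graph as a set mirrors the fact that an RDF graph has no ordering and no duplicate statements.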

  8. Triples and graphs: NT Mark 9:35 variants
     • prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
     • graph: a single node, perseus-nt:tlg002.perseus-grc1:9.35
     • (RDF) text: perseus-nt:tlg002.perseus-grc1:9.35 (URI)

  9. Triples and graphs: NT Mark 9:35 variants
     • prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       - RDFS URL: machine-readable representation of a particular vocabulary
     • graph:
       perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
     • (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .

  10. Triples and graphs: NT Mark 9:35 variants
     • prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX saws: <http://purl.org/saws/ontology#>
     • graph:
       saws-nt:divsection1.o14.a107 --saws:isVariantOf--> perseus-nt:tlg002.perseus-grc1:9.35
       perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
     • (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
       saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .

  11. Triples and graphs: NT Mark 9:35 variants
     • prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX saws: <http://purl.org/saws/ontology#>
     • graph:
       saws-nt:divsection1.o14.a107 --saws:isVariantOf--> perseus-nt:tlg002.perseus-grc1:9.35
       perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
       saws-nt:divsection1.o14.a107 --rdfs:label--> "ἔσχατος πάντων"
     • (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
       saws-nt:divsection1.o14.a107 rdfs:label "ἔσχατος πάντων" .
       saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .
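To make the role of the prefix declarations concrete, the following sketch expands the prefixed names of the Mark 9:35 example into full URIs and prints the three statements in N-Triples form. The prefixes and names are taken from the slides; the helper functions `expand` and `to_ntriples` are hypothetical, and a real project would more likely rely on an RDF library than on string handling like this.

```python
# Prefix table copied from the slides' PREFIX declarations.
PREFIXES = {
    "perseus-nt": "http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.",
    "saws-nt": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "saws": "http://purl.org/saws/ontology#",
}


def expand(name):
    """Expand a prefixed name: split at the first colon, look up the prefix."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject, predicate, obj, literal=False):
    """Serialize one statement as an N-Triples line (plain literals only)."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        target = '"' + escaped + '"'
    else:
        target = "<" + expand(obj) + ">"
    return f"<{expand(subject)}> <{expand(predicate)}> {target} ."


lines = [
    to_ntriples("perseus-nt:tlg002.perseus-grc1:9.35", "rdfs:label",
                "... πάντων ἔσχατος ...", literal=True),
    to_ntriples("saws-nt:divsection1.o14.a107", "rdfs:label",
                "ἔσχατος πάντων", literal=True),
    to_ntriples("saws-nt:divsection1.o14.a107", "saws:isVariantOf",
                "perseus-nt:tlg002.perseus-grc1:9.35"),
]
print("\n".join(lines))
```

Splitting at the first colon only is deliberate: the local part `tlg002.perseus-grc1:9.35` itself contains a colon.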

  12. Why RDF for philological data?
     • flexible and generic mechanism for creating cross-references
     • integration of / linking with linked open data (LOD)
     • facilitate re-usability, sustainability and replicability
     • build on existing LOD technology
     • enforce explicit, machine-readable semantics
       - URIs for data structures (say, rdfs:label) can resolve against a machine-readable, formal definition
       - facilitates semantic validation instead of syntactic validation
         - SHACL, OWL2/DL

  13. Linked Open Data (LOD): a best practice for publishing data on the web
     • Use URIs as names for things
     • Use HTTP URIs so that people can look up those names
     • Provide useful information, using RDF-based standards
     • Include links to other URIs
     https://www.w3.org/DesignIssues/LinkedData.html

  14. Linguistic Linked Open Data (LLOD) http://linguistic-lod.org/ (version of July 2017)

  15. Linking the TEI
     • Both TEI and LOD are important topics in DH
       - They do not converge ...
     • It is not possible to integrate RDF triples in a single TEI document in a way that is both TEI- and W3C-compliant
       - You can always create a customized TEI with extensions for your personal approach to encoding RDF triples, but there is no recommended way of doing so

  16. Linking the TEI: Options
     • The (TEI-)native way
       - URIs, but no properties
       - User-provided ad hoc converters
     • Inline XML solutions
       - TEI-endorsed, but not LOD-compliant
       - LOD-compliant, but not TEI-endorsed
     • What to do and how to choose

  17. Linking the TEI The (TEI-)native way

  18. The (TEI-)native way: @ref*
     • Various elements can take @ref attributes; these refer to URIs
       - these correspond to RDF targets
       - no explicit representation of the RDF source or the RDF predicate
         - these must be extrapolated from definitions or data snippets
     * other URI-bearing attributes do exist, too

  19. The (TEI-)native way: @ref
     • Text Database and Dictionary of Classic Mayan (TWKM, University of Bonn, Germany, 2014-2029)
       - subject: @xml:id
       - property: <g> "is instance of glyph type"
       - target: @ref
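An ad hoc converter for this mapping might look like the sketch below, which reads `<g>` elements and emits one triple per element that carries both @xml:id and @ref. The XML snippet, the document base URI, and the property URI are all assumptions made for illustration; the actual TWKM encoding and vocabulary are not reproduced here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical TEI fragment; element content and URIs are invented.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <g xml:id="sign1" ref="http://example.org/signs/0001"/>
  <g xml:id="sign2" ref="http://example.org/signs/0002"/>
  <g xml:id="sign3"/>
</ab>"""

DOC_BASE = "http://example.org/doc.xml#"  # assumed base URI of the TEI file
INSTANCE_OF = "http://example.org/vocab#instanceOfGlyphType"  # assumed property


def g_to_triples(xml_text):
    """Map every <g xml:id ref> to (subject, property, target)."""
    root = ET.fromstring(xml_text)
    triples = []
    for g in root.iter(f"{{{TEI_NS}}}g"):
        xml_id, ref = g.get(XML_ID), g.get("ref")
        if xml_id and ref:  # skip <g> elements without a link target
            triples.append((DOC_BASE + xml_id, INSTANCE_OF, ref))
    return triples


triples = g_to_triples(SAMPLE)
for t in triples:
    print(t)
```

Note that the property URI lives only in the converter: nothing in the TEI file itself says what the link between @xml:id and @ref means.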

  20. The (TEI-)native way: @ref
     • It's easy to build a converter, BUT
       - if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express the following for multiple alternative readings:
         - provenance
         - uncertainty
         - etc.
     • subject: @xml:id
     • property: <g> "is instance of glyph type"
     • target: @ref

  21. The (TEI-)native way: @ref
     • It's easy to build a converter, BUT
       - if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express the following for multiple alternative readings:
         - provenance
         - uncertainty
         - etc.
     • The colleagues will certainly invent something, but ...
       ... this would be a natural application of reification and established RDF vocabularies!
     • subject: @xml:id
     • property: <g> "is instance of glyph type"
     • target: @ref
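As a rough sketch of what reification could look like here, the code below describes two alternative readings of the same sign as `rdf:Statement` nodes and attaches provenance and a confidence value to each. The `rdf:` terms and `prov:wasAttributedTo` are existing vocabulary; everything under `http://example.org/`, including the confidence property, is hypothetical and not part of the slides.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for all example resources


def reify(statement, subject, predicate, obj, annotations):
    """Describe one triple as an rdf:Statement node and annotate that node."""
    triples = [
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", subject),
        (statement, RDF + "predicate", predicate),
        (statement, RDF + "object", obj),
    ]
    # Annotations hang off the statement node, not off the sign itself.
    triples += [(statement, prop, value) for prop, value in annotations.items()]
    return triples


sign = EX + "doc.xml#sign1"
instance_of = EX + "vocab#instanceOfGlyphType"

# Two competing readings of one sign, each with its own source and confidence.
readings = (
    reify(EX + "reading/1", sign, instance_of, EX + "signs/0001",
          {PROV + "wasAttributedTo": EX + "people/editorA",
           EX + "vocab#confidence": "0.7"})
    + reify(EX + "reading/2", sign, instance_of, EX + "signs/0002",
            {PROV + "wasAttributedTo": EX + "people/editorB",
             EX + "vocab#confidence": "0.3"})
)

for triple in readings:
    print(triple)
```

Because each reading is its own node, any number of alternatives can coexist without one overwriting another.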

  22. Inline XML I TEI-endorsed (not LOD-compliant)

  23. Inline XML: TEI-endorsed
     • several TEI elements are possible representatives for generic RDF triples: <graph>, <fs>, <link>, <relation>
       - all of these already have a different interpretation
         - application to RDF triples thus falls under tag abuse
       - TEI P5 features examples of RDF triples
         - using <relation>
         - not mentioned in the definition

  24. Inline XML: <relation> example
     • problems:
       - @active and @passive originate from functions in social (!) networks; their application to directed edges (here between geonames!) is inadequate and confusing
       - @name is a string attribute; the URI reference cannot be resolved without a specialized converter
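The second problem can be illustrated with a small sketch of such a specialized converter. It assumes that @active is read as the subject and @passive as the object, and that @name holds a prefixed name whose prefix is defined in a lookup table outside the XML. The sample element, the `ex` prefix, and the place URIs are hypothetical; only the attribute names come from the TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix table that the converter has to bring along,
# because @name is just a string as far as the XML is concerned.
PREFIXES = {"ex": "http://example.org/vocab#"}

# Hypothetical <relation>; attribute values are invented for illustration.
SAMPLE = (
    '<relation xmlns="http://www.tei-c.org/ns/1.0" name="ex:partOf" '
    'active="http://example.org/places/1" '
    'passive="http://example.org/places/2"/>'
)


def relation_to_triple(xml_text, prefixes):
    """Read @active as subject, @name as property, @passive as object."""
    rel = ET.fromstring(xml_text)
    prefix, _, local = rel.get("name").partition(":")
    if prefix not in prefixes:
        raise KeyError(f"no URI known for prefix {prefix!r} in @name")
    return (rel.get("active"), prefixes[prefix] + local, rel.get("passive"))


triple = relation_to_triple(SAMPLE, PREFIXES)
print(triple)
```

Without the prefix table the converter fails, which is the point made on the slide: the property cannot be resolved from the TEI document alone.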

  25. Inline XML: <relation> example II
     • problems:
       - @active and @passive (as before)
       - <relation> is syntactically constrained to environments which are reserved for named entities (e.g., <namesList>)
         - Is an arbitrary text passage really a named entity?
         - In the original proposal (SAWS), <relation> was made a child of <seg> and <ab>

  26. Inline XML II W3C-compliant (not TEI-endorsed)
