SLIDE 1

Linking the TEI

Approaches, Limitations, Use Cases

Christian Chiarcos & Maxim Ionov

{chiarcos|ionov}@cs.uni-frankfurt.de Applied Computational Linguistics Goethe Universität Frankfurt, Germany

DH2019, Utrecht, 2019-07-11

SLIDE 2

Linking the TEI

  • The (TEI-)native way
      • URIs, but no properties
      • User-provided ad hoc converters
  • The inline way
      • TEI-endorsed, but not LOD-compliant
      • LOD-compliant, but not TEI-endorsed
  • The "proper" way
      • standard-conformant standoff annotations

SLIDE 3

Linking the TEI

  • The (TEI-)native way
      • URIs, but no properties
      • User-provided ad hoc converters
  • The inline way
      • TEI-endorsed, but not LOD-compliant
      • LOD-compliant, but not TEI-endorsed
  • The "proper" way
      • standard-conformant standoff annotations

TEI/XML + WebAnnotation (JSON-LD): TEI-compliant and LOD-compliant, but restricted to static TEI documents

SLIDE 4

Linking the TEI

  • The (TEI-)native way
      • URIs, but no properties
      • User-provided ad hoc converters
  • The inline way
      • TEI-endorsed, but not LOD-compliant
      • LOD-compliant, but not TEI-endorsed
  • The "proper" way
      • standard-conformant standoff annotations

TEI/XML + WebAnnotation (JSON-LD): TEI-compliant and LOD-compliant, but restricted to static TEI documents

This breaks if we have dynamic TEI content. Inline XML solutions?
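The restriction to static TEI can be made concrete. The following Python sketch builds a W3C Web Annotation in JSON-LD whose target points into a TEI file via an XPathSelector, then resolves that selector against two versions of a hypothetical TEI document. The document URL, body URI and TEI content are invented for illustration, and `resolve` relies on ElementTree's limited XPath subset rather than a full XPath processor.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical TEI document, version 1
TEI_V1 = f"""<TEI xmlns="{TEI_NS}"><text><body>
<p>first paragraph</p>
<p>πάντων ἔσχατος</p>
</body></text></TEI>"""

# Version 2: an editor inserts a new paragraph at the top of <body>
TEI_V2 = f"""<TEI xmlns="{TEI_NS}"><text><body>
<p>new introduction</p>
<p>first paragraph</p>
<p>πάντων ἔσχατος</p>
</body></text></TEI>"""

def make_annotation(source, xpath, body):
    """Build a Web Annotation (JSON-LD) that targets a TEI element by XPath."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": body,
        "target": {
            "source": source,
            "selector": {"type": "XPathSelector", "value": xpath},
        },
    }

def resolve(tei_xml, xpath):
    """Return the text of the element the selector points to (or None)."""
    element = ET.fromstring(tei_xml).find(xpath, NS)
    return None if element is None else element.text

XPATH = "./tei:text/tei:body/tei:p[2]"
anno = make_annotation("http://example.org/edition.xml", XPATH,
                       "http://example.org/variants/1")  # hypothetical URIs
print(json.dumps(anno, ensure_ascii=False, indent=2))
print(resolve(TEI_V1, XPATH))  # the passage that was annotated
print(resolve(TEI_V2, XPATH))  # same selector, now a different passage
```

Once a paragraph is inserted, the unchanged annotation silently points at different text. Selecting by content (e.g., a TextQuoteSelector) survives such insertions, but not edits to the annotated passage itself.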

SLIDE 5

TEI and TEI customizations

  • very rich vocabulary
      • TEI P5: 569 elements, 505 attributes
  • TEI customization with ODD
      • high-level specification for customizing the TEI
      • select modules
      • refine vocabulary elements
      ⇒ generate (textual) documentation
      ⇒ generate actual schemas
  • any TEI project should start with such a customization

SLIDE 6

TEI and TEI customizations

TEI compliance does not entail interoperability

  • Documents following the same customization are interoperable.
  • Beyond that, the TEI provides only an orientation for their interpretation.
      • For problems not documented in the TEI documentation, different customizations will not be interoperable, e.g., when trying to encode RDF triples in the TEI ;)
  • TEI occupies a very distinctive position with respect to the idea of standardization:
      • strictly speaking, it is not a standard,
      • but is poised between a standard and a consensus,
      • possessing some characteristics of each, in ways that have very interesting consequences for extension and interchange.

(Bauman & Flanders 2004, bullet points by us)

SLIDE 7

Resource Description Framework

  • RDF 1.1: general data model for the web of data
      • W3C recommendation 2014
  • directed labeled multi-graph
      • multiple edges between the same nodes
  • for every edge:
      • source node ("RDF subject") defined by URI
      • edge type ("RDF property / relation") defined by URI
      • target node ("RDF object") defined by URI
      • alternatively, the target can also be an atomic (literal) value

⇒ "triple" / "RDF statement"
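The edge structure described above can be sketched as a small Python data model. It is an illustration, not an RDF library: it models only URIs (IRIs) and plain literals and omits blank nodes, datatypes and language tags, which RDF 1.1 also provides. The namespace `http://example.org/` and the properties `isVariantOf` and `precedes` under it are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]

    def __post_init__(self):
        # Subject and property must be URIs; the object may be a URI or a literal
        if not isinstance(self.subject, IRI) or not isinstance(self.predicate, IRI):
            raise TypeError("subject and predicate must be IRIs")
        if not isinstance(self.obj, (IRI, Literal)):
            raise TypeError("object must be an IRI or a literal")

def edge_types(graph, subject, obj):
    """All property URIs connecting subject to obj (a multi-graph allows several)."""
    return {t.predicate for t in graph if t.subject == subject and t.obj == obj}

EX = "http://example.org/"  # hypothetical namespace
a, b = IRI(EX + "a"), IRI(EX + "b")
graph = {
    Triple(a, IRI(EX + "isVariantOf"), b),
    Triple(a, IRI(EX + "precedes"), b),  # second edge between the same two nodes
    Triple(a, IRI("http://www.w3.org/2000/01/rdf-schema#label"), Literal("a")),
}
print(sorted(p.value for p in edge_types(graph, a, b)))
```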

SLIDE 8

Triples and graphs: NT Mark 9:35 variants

Graph (RDF): the text passage is a node, identified by a URI:

perseus-nt:tlg002.perseus-grc1:9.35

Prefix declarations:

PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>

SLIDE 9

Triples and graphs: NT Mark 9:35 variants

Graph (RDF): the node perseus-nt:tlg002.perseus-grc1:9.35 is linked via rdfs:label to the literal "... πάντων ἔσχατος ..."

Turtle:

perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .

Prefix declarations:

PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

The RDFS URL resolves to a machine-readable representation of this particular vocabulary.

SLIDE 10

Triples and graphs: NT Mark 9:35 variants

Graph (RDF): saws-nt:divsection1.o14.a107 is linked via saws:isVariantOf to perseus-nt:tlg002.perseus-grc1:9.35, which carries the rdfs:label "... πάντων ἔσχατος ..."

Turtle:

perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .

Prefix declarations:

PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX saws: <http://purl.org/saws/ontology#>

SLIDE 11

Triples and graphs: NT Mark 9:35 variants

Graph (RDF): saws-nt:divsection1.o14.a107 (rdfs:label "ἔσχατος πάντων") is linked via saws:isVariantOf to perseus-nt:tlg002.perseus-grc1:9.35 (rdfs:label "... πάντων ἔσχατος ...")

Turtle:

perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
saws-nt:divsection1.o14.a107 rdfs:label "ἔσχατος πάντων" .
saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .

Prefix declarations:

PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX saws: <http://purl.org/saws/ontology#>
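Turtle prefixed names are abbreviations of full URIs. As a sketch, the three statements above can be expanded with the slide's prefix declarations and written out as N-Triples, one triple per line with full URIs in angle brackets. This hand-rolled expansion only covers these four prefixes and plain string literals; in practice an RDF library would do the parsing and serialization.

```python
# Prefix declarations from the slides
PREFIXES = {
    "perseus-nt": "http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.",
    "saws-nt": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "saws": "http://purl.org/saws/ontology#",
}

class Literal(str):
    """Marks a string as an RDF literal rather than a prefixed name."""

def expand(prefixed_name):
    # Split at the first colon only: local names like 'tlg002.perseus-grc1:9.35' contain colons
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

def term(t):
    """Serialize one term in N-Triples syntax."""
    if isinstance(t, Literal):
        escaped = t.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{expand(t)}>"

def to_ntriples(triples):
    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]

TRIPLES = [
    ("perseus-nt:tlg002.perseus-grc1:9.35", "rdfs:label", Literal("... πάντων ἔσχατος ...")),
    ("saws-nt:divsection1.o14.a107", "rdfs:label", Literal("ἔσχατος πάντων")),
    ("saws-nt:divsection1.o14.a107", "saws:isVariantOf", "perseus-nt:tlg002.perseus-grc1:9.35"),
]

for line in to_ntriples(TRIPLES):
    print(line)
```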

SLIDE 12

Why RDF for philological data?

  • flexible and generic mechanism for creating cross-references
  • integration of / linking with linked open data (LOD)
      • facilitate re-usability, sustainability and replicability
      • build on existing LOD technology
  • enforce explicit, machine-readable semantics
      • URIs for data structures (say, rdfs:label) can resolve against a machine-readable, formal definition
      • facilitates semantic validation instead of syntactic validation
          • SHACL, OWL2/DL

SLIDE 13

Linked Open Data (LOD)

  • Use URIs as names for things
  • Use HTTP URIs so that people can look up those names.
  • Provide useful information, using RDF-based standards
  • Include links to other URIs

https://www.w3.org/DesignIssues/LinkedData.html

a best practice for publishing data on the web
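The second principle means that a URI such as the RDFS namespace URL can be looked up over HTTP. A minimal Python sketch, assuming the server supports content negotiation for `text/turtle` (whether it does is up to the server): it only builds the request with the standard library, since sending it needs network access. The fragment after `#` is never sent; the client retrieves the namespace document and finds the term in it.

```python
import urllib.request

def lookup_request(uri, accept="text/turtle"):
    """Build (but do not send) a request that dereferences a Linked Data URI,
    asking for an RDF serialization via HTTP content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": accept})

req = lookup_request("http://www.w3.org/2000/01/rdf-schema#")
print(req.get_method(), req.host, req.get_header("Accept"))

# To actually send it (requires network access):
# with urllib.request.urlopen(req) as response:
#     print(response.headers.get("Content-Type"))
#     data = response.read().decode("utf-8")
```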

SLIDE 14

Linguistic Linked Open Data (LLOD)

LLOD cloud diagram (version of July 2017): http://linguistic-lod.org/

SLIDE 15

Linking the TEI

  • Both TEI and LOD are important topics in DH
  • They do not converge ...
      • It is not possible to integrate RDF triples in a single TEI document in a way that is both TEI- and W3C-compliant
      • You can always create a customized TEI with extensions for your personal approach to encoding RDF triples, but there is no recommended way of doing so

SLIDE 16

Linking the TEI: Options

  • The (TEI-)native way
      • URIs, but no properties
      • User-provided ad hoc converters
  • Inline XML solutions
      • TEI-endorsed, but not LOD-compliant
      • LOD-compliant, but not TEI-endorsed
  • What to do and how to choose

SLIDE 17

Linking the TEI

The (TEI-)native way

SLIDE 18

The (TEI-)native way: @ref*

  • Various elements can take @ref attributes; these refer to URIs
      • these correspond to RDF targets (objects)
  • no explicit representation of the RDF source (subject) or the RDF predicate
      • these must be extrapolated from definitions or data snippets

* other URI-bearing attributes do exist, too

SLIDE 19

The (TEI-)native way: @ref

Text Database and Dictionary of Classic Mayan (TWKM, University of Bonn, Germany, 2014-2029)

Mapping for <g>: subject = @xml:id, property = "is instance of glyph type" (implied by <g>), target = @ref
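This mapping is easy to automate. A sketch of such a converter in Python, assuming a hypothetical property URI for "is instance of glyph type" (the slides name the relation only informally), hypothetical sign URIs, and a document URI used as the base for the @xml:id values; it is not the TWKM project's own converter.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Hypothetical URI for the implicit property "is instance of glyph type"
IS_INSTANCE_OF_GLYPH_TYPE = "http://example.org/vocab#isInstanceOfGlyphType"

def g_to_triples(tei_xml, document_uri):
    """Turn every <g xml:id="..." ref="..."> into one RDF triple."""
    triples = []
    for g in ET.fromstring(tei_xml).iter(f"{{{TEI_NS}}}g"):
        glyph_id, glyph_type = g.get(XML_ID), g.get("ref")
        if glyph_id is None or glyph_type is None:
            continue  # without both, subject or target cannot be recovered
        subject = f"{document_uri}#{glyph_id}"  # RDF subject from @xml:id
        triples.append((subject, IS_INSTANCE_OF_GLYPH_TYPE, glyph_type))  # target from @ref
    return triples

# Hypothetical TEI fragment; the sign URIs are placeholders
TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><ab>
<g xml:id="g1" ref="http://example.org/signs/A"/>
<g xml:id="g2" ref="http://example.org/signs/B"/>
<g xml:id="g3"/>
</ab></body></text></TEI>"""

for triple in g_to_triples(TEI, "http://example.org/edition"):
    print(triple)
```

The conversion is simple only because each <g> carries exactly one @ref; a competing reading of the same sign, with its own provenance and certainty, has no place in this mapping.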

SLIDE 20

The (TEI-)native way: @ref

  • It's easy to build a converter, BUT
  • if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express
      • provenance
      • uncertainty
      • etc. of multiple alternative readings

Mapping for <g>: subject = @xml:id, property = "is instance of glyph type" (implied by <g>), target = @ref

SLIDE 21

The (TEI-)native way: @ref

  • It's easy to build a converter, BUT
  • if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express
      • provenance
      • uncertainty
      • etc.

Mapping for <g>: subject = @xml:id, property = "is instance of glyph type" (implied by <g>), target = @ref

... this would be a natural application of reification and established RDF vocabularies! The colleagues will certainly invent something, but ...
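To illustrate what reification could look like here: in standard RDF reification, each alternative reading becomes a resource of type rdf:Statement with rdf:subject, rdf:predicate and rdf:object, to which provenance (here prov:wasAttributedTo from W3C PROV-O) and other metadata can be attached. RDF has no standard certainty property, so `ex:certainty`, like the scholars, readings and sign URIs below, is hypothetical. A Python sketch with triples as plain tuples:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/vocab#"  # hypothetical vocabulary and resources

def reify(statement, subject, predicate, obj, attributed_to, certainty):
    """Standard RDF reification of one reading, plus provenance and certainty."""
    return [
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", subject),
        (statement, RDF + "predicate", predicate),
        (statement, RDF + "object", obj),
        (statement, PROV + "wasAttributedTo", attributed_to),
        (statement, EX + "certainty", certainty),  # hypothetical property
    ]

def readings(graph, subject, predicate):
    """Collect (object, attributed_to, certainty) for all reified readings."""
    by_statement = {}
    for s, p, o in graph:
        by_statement.setdefault(s, {})[p] = o
    return sorted(
        (props[RDF + "object"], props[PROV + "wasAttributedTo"], props[EX + "certainty"])
        for props in by_statement.values()
        if props.get(RDF + "subject") == subject
        and props.get(RDF + "predicate") == predicate
    )

GLYPH = "http://example.org/edition#g1"
PROP = EX + "isInstanceOfGlyphType"  # hypothetical property
graph = (
    reify(EX + "reading1", GLYPH, PROP, "http://example.org/signs/A", EX + "scholarX", 0.7)
    + reify(EX + "reading2", GLYPH, PROP, "http://example.org/signs/B", EX + "scholarY", 0.3)
)
for reading in readings(graph, GLYPH, PROP):
    print(reading)
```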

SLIDE 22

Inline XML I

TEI-endorsed (not LOD-compliant)

SLIDE 23

Inline XML: TEI-endorsed

  • several TEI elements are possible candidates for representing generic RDF triples: <graph>, <fs>, <link>, <relation>
      • all of these already have a different interpretation
      • using them for RDF triples thus amounts to tag abuse
  • TEI P5 features examples of RDF triples
      • using <relation>
      • but this use is not mentioned in the definition of <relation>

SLIDE 24

Inline XML: <relation> example

  • problems:
      • @active and @passive originate from functions in social (!) networks; their application to directed edges (here between geonames!) is inadequate and confusing
      • @name is a string attribute; the URI reference cannot be resolved without a specialized converter
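The last problem is easiest to see in a converter. The Python sketch below turns <relation> elements into triples, assuming @active supplies the RDF subject(s) and @passive the object(s); because @name is just a string, the converter needs its own prefix map (hypothetical here) to obtain a property URI. The TEI snippet and all URIs are invented and have not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET
from itertools import product

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Hypothetical, project-specific prefix map: the TEI document itself does not
# tell a generic tool how to turn the string in @name into a URI.
PREFIX_MAP = {"ex": "http://example.org/vocab#"}

def expand_name(name):
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIX_MAP:
        return PREFIX_MAP[prefix] + local
    raise ValueError(f"cannot resolve @name={name!r} to a URI")

def relations_to_triples(tei_xml):
    triples = []
    for rel in ET.fromstring(tei_xml).iter(f"{{{TEI_NS}}}relation"):
        predicate = expand_name(rel.get("name", ""))
        # Assumption: @active = RDF subject(s), @passive = RDF object(s);
        # both may hold several space-separated pointers.
        subjects = rel.get("active", "").split()
        objects = rel.get("passive", "").split()
        for s, o in product(subjects, objects):
            triples.append((s, predicate, o))
    return triples

TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><listRelation>
<relation name="ex:locatedIn" active="http://example.org/place/a"
          passive="http://example.org/place/b"/>
</listRelation></body></text></TEI>"""

print(relations_to_triples(TEI))
```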

SLIDE 25

Inline XML: <relation> example II

  • problems:
      • @active and @passive (as before)
      • <relation> is syntactically constrained to environments which are reserved for named entities (e.g., <namesList>)
          • Is an arbitrary text passage really a named entity?
  • In the original proposal (SAWS), <relation> was made a child of <seg> and <ab>

SLIDE 26

Inline XML II

W3C-compliant (not TEI-endorsed)

SLIDE 27

RDF in attributes (RDFa)

  • augmenting XML-based markup languages with RDF data structures
      • W3C recommendation 2015
  • requires the addition of several attributes to the host vocabulary
      • @about (subject URI)
      • @property (property URIs)
      • @resource (object URI)
      • @content (object literal, if not taken from the element's text content)
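How the four attributes combine can be shown with a toy extractor based on Python's html.parser. It is deliberately not a conformant RDFa processor (for that, use a real one such as pyRdfa): it ignores @prefix and @vocab, leaves CURIEs like rdfs:label unexpanded, does not mark literals versus URIs in its output, and assumes every element is explicitly closed. The sample markup is a hypothetical encoding modelled on the `:8` entry of the Grande Chirurgie example.

```python
from html.parser import HTMLParser

class ToyRDFaParser(HTMLParser):
    """Collects (subject, property, object) tuples from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.subjects = [None]  # @about is inherited by descendants
        self.pending = []       # one entry per open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about", self.subjects[-1])
        self.subjects.append(subject)
        prop = a.get("property")
        if prop and "resource" in a:    # object URI from @resource
            self.triples.append((subject, prop, a["resource"]))
            self.pending.append(None)
        elif prop and "content" in a:   # object literal from @content
            self.triples.append((subject, prop, a["content"]))
            self.pending.append(None)
        elif prop:                      # object literal from the element text
            self.pending.append((subject, prop, []))
        else:
            self.pending.append(None)

    def handle_data(self, data):
        for entry in self.pending:
            if entry is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        self.subjects.pop()
        entry = self.pending.pop()
        if entry is not None:
            subject, prop, parts = entry
            self.triples.append((subject, prop, "".join(parts).strip()))

HTML = (
    '<div about="#8">'
    '<span property="rdfs:label">anathomie</span> '
    '<span property="rdfs:seeAlso" resource="deaf:anatomie"></span>'
    '<span property="skos:definition" content="structure ..."></span>'
    '</div>'
)

parser = ToyRDFaParser()
parser.feed(HTML)
for triple in parser.triples:
    print(triple)
```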
SLIDE 28

TEI+RDFa

  • Gui de Chauliac's Grande Chirurgie
  • most profound compendium of medical knowledge of the 14th c.
  • TEI+RDFa modelling, HTML+RDFa export

Tittel, Bermúdez-Sabel & Chiarcos (2018)

  • Ms. Bibl. Montpellier Fac. Médecine 184

SLIDE 29

:8 rdfs:label "anathomie" ; rdfs:seeAlso deaf:anatomie ; skos:definition "structure ..." .

SLIDE 30

TEI(+RDFa) => HTML5(+RDFa)

  • Online edition with
      • links to DEAF dictionary entries
      • display of sense definitions for every lexical unit
      • display of notes from the critical apparatus

SLIDE 31

Querying (HTML5+)RDFa

  • Online edition: http://www.deaf-page.de/guichaulmTel/edition.html
  • RDFa parser (pyRdfa): https://www.w3.org/2012/pyRdfa/

SLIDE 32

Querying (HTML5+)RDFa

  • Online edition: http://www.deaf-page.de/guichaulmTel/edition.html
  • RDFa parser (pyRdfa)
      • direct link: https://www.w3.org/2012/pyRdfa/extract?uri=http://www.deaf-page.de/guichaulmTel/edition.html&format=turtle

SLIDE 33

Querying (HTML5+)RDFa

  • Online edition: http://www.deaf-page.de/guichaulmTel/edition.html
  • RDFa parser (pyRdfa)
      • direct link: https://www.w3.org/2012/pyRdfa/extract?uri=http://www.deaf-page.de/guichaulmTel/edition.html&format=turtle
  • Querying the edition (sparql.org)
      • using the direct pyRdfa link as target graph

SLIDE 34

Querying (HTML5+)RDFa

  • Online edition: http://www.deaf-page.de/guichaulmTel/edition.html
  • RDFa parser (pyRdfa)
      • direct link: https://www.w3.org/2012/pyRdfa/extract?uri=http://www.deaf-page.de/guichaulmTel/edition.html&format=turtle
  • Querying the edition (sparql.org)
      • using the direct pyRdfa link as (FROM) URI
      • http://www.sparql.org/sparql?query=PREFIX+rdfs%3A+%3Chttp%3A...
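The direct pyRdfa link and the (FROM) URI can also be assembled programmatically, which takes care of the nested percent-encoding. A Python sketch using urllib.parse: the query is an illustrative example, any other form parameters the sparql.org endpoint may accept are left out, and unlike the direct link it also percent-encodes the inner page URL, which the server should decode to the same value.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EDITION = "http://www.deaf-page.de/guichaulmTel/edition.html"

def pyrdfa_extract_url(page, fmt="turtle"):
    """pyRdfa 'extract' link that returns the RDFa of `page` as RDF."""
    return "https://www.w3.org/2012/pyRdfa/extract?" + urlencode({"uri": page, "format": fmt})

def sparql_org_url(query):
    """sparql.org link that runs `query` (other form parameters omitted)."""
    return "http://www.sparql.org/sparql?" + urlencode({"query": query})

graph_uri = pyrdfa_extract_url(EDITION)
query = f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label
FROM <{graph_uri}>
WHERE {{ ?s rdfs:label ?label }}
LIMIT 10"""

url = sparql_org_url(query)
print(url)
```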

SLIDE 35

TEI+RDFa

  • using an established W3C standard
      • comes with relevant tooling for parsing, conversion, querying, etc.
  • RDF data can be preserved in output serializations
      • it does not have to be recreated
  • TEI approval pending: https://github.com/TEIC/TEI/issues/1860
      • discussed since 2006 (at least): http://tei-l.970651.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1692902&query=RDFa

SLIDE 36

Summary

  • No satisfying inline XML solution
      • encoding with TEI attributes is semantically limited and ambiguous
          • but enough for referring to LOD data sets
      • encoding with TEI elements is semantically ambiguous and largely to be considered tag abuse
          • but ok for internal use
      • encoding with RDFa attributes is not endorsed by the TEI
          • but comes with off-the-shelf tooling
  • The only way of staying both LOD-compliant and TEI-compliant is standoff annotation, e.g., with WebAnnotation and JSON-LD
      • technically demanding and not always feasible

SLIDE 37

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)

SLIDE 38

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa

SLIDE 39

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa
          • no ⇒ Do you want to combine human-readable and machine-readable views in multiple output documents?
              • yes ⇒ TEI+RDFa

SLIDE 40

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa
          • no ⇒ Do you want to combine human-readable and machine-readable views in multiple output documents?
              • yes ⇒ TEI+RDFa
              • no ⇒ Do you have the resources to create and maintain your own converters?
                  • no ⇒ TEI+RDFa

SLIDE 41

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa
          • no ⇒ Do you want to combine human-readable and machine-readable views in multiple output documents?
              • yes ⇒ TEI+RDFa
              • no ⇒ Do you have the resources to create and maintain your own converters?
                  • no ⇒ TEI+RDFa
                  • yes ⇒ Do you want your data to be interoperable or interpretable?
                      • interoperable ⇒ TEI+RDFa

SLIDE 42

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa
          • no ⇒ Do you want to combine human-readable and machine-readable views in multiple output documents?
              • yes ⇒ TEI+RDFa
              • no ⇒ Do you have the resources to create and maintain your own converters?
                  • no ⇒ TEI+RDFa
                  • yes ⇒ Do you want your data to be interoperable or interpretable?
                      • interoperable ⇒ TEI+RDFa
                      • just interpretable (not interoperable) ⇒ Do specific TEI markup elements sufficiently cover your use case?
                          • yes ⇒ native TEI/XML
                          • no ⇒ TEI <relation>

SLIDE 43

Different use cases require different solutions

If you want to provide a generic RDF view on, say, a digital edition or electronic data set:

  • Do you plan to update your TEI data?
      • no ⇒ TEI-XML + WebAnnotation (JSON-LD)
      • yes ⇒ Do you need W3C compliance?
          • yes ⇒ TEI+RDFa
          • no ⇒ Do you want to combine human-readable and machine-readable views in multiple output documents?
              • yes ⇒ TEI+RDFa
              • no ⇒ Do you have the resources to create and maintain your own converters?
                  • no ⇒ TEI+RDFa
                  • yes ⇒ Do you want your data to be interoperable or interpretable?
                      • interoperable ⇒ TEI+RDFa
                      • just interpretable (not interoperable) ⇒ Do specific TEI markup elements sufficiently cover your use case?
                          • yes ⇒ native TEI/XML
                          • no ⇒ TEI <relation>

Native TEI/XML or TEI <relation> ⇒ your own converters ⇒ RDF serializations (N3, RDF/TTL, RDF/HDT, JSON-LD, RDF/XML, RDF-Thrift, ...) and publication formats (HTML+RDFa, ePub+RDFa)
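Read top to bottom, the flowchart amounts to a chain of yes/no checks. The function below is one possible Python reading of it, with the branches reconstructed from the slide labels; the parameter names are ours, and it is a summary aid rather than a substitute for the trade-offs discussed above.

```python
def choose_encoding(updates_tei, needs_w3c, combined_views,
                    has_converter_resources, needs_interoperability,
                    tei_elements_suffice):
    """One reading of the decision tree on the slides (not an official tool)."""
    if not updates_tei:
        # Static TEI: standoff annotation stays both TEI- and LOD-compliant
        return "TEI-XML + WebAnnotation (JSON-LD)"
    if needs_w3c or combined_views:
        return "TEI+RDFa"
    if not has_converter_resources or needs_interoperability:
        # Off-the-shelf RDFa tooling instead of maintaining own converters
        return "TEI+RDFa"
    # Just interpretable: TEI markup plus your own converters
    if tei_elements_suffice:
        return "native TEI/XML + own converters"
    return "TEI <relation> + own converters"

print(choose_encoding(updates_tei=True, needs_w3c=False, combined_views=False,
                      has_converter_resources=True, needs_interoperability=False,
                      tei_elements_suffice=True))
```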

SLIDE 44

Thank you for your attention!

Special thanks to

TEI board & mailing list members, Sabine Tittel, Helena Bermúdez-Sabel, participants of SD-LLOD 2017, GlobaLex-2018 and LDL-2018, ...

SLIDE 45

Please cite as: Christian Chiarcos and Maxim Ionov (2019), Linking the TEI – Approaches, Limitations, Use Cases. Paper presented at the Digital Humanities Conference 2019 (DH2019), 9-12 July 2019, Utrecht, the Netherlands.

A proper follow-up publication is in preparation, stay tuned! Also, please share your thoughts and criticisms.