An ontology for digital graphematics and philology



slide-1
SLIDE 1

Paolo Monella

Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale
Bergische Universität Wuppertal (BUW), 6 February 2020

An ontology for digital graphematics and philology

slide-2
SLIDE 2

Outline

slide-3
SLIDE 3

Outline

  • Interoperability of digital scholarly editions (DSEs)

based on diplomatic transcriptions

  • Digital modelling (ontology) of pre-modern writing systems

Graphemes / allographs

Allographs: capitals, ligatures, positional variants, emphasis etc.

  • In practice:

how can grapheme/allograph modelling make my DSE more interoperable?

  • Open issues
slide-4
SLIDE 4

Interoperability: the issue

slide-6
SLIDE 6

Interoperability: the issue

  • uenenū
slide-7
SLIDE 7

Interoperability: the issue

  • uenenū

Diplomatic

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)
slide-8
SLIDE 8

Interoperability: the issue

  • uenenū
slide-9
SLIDE 9

Interoperability: the issue

  • uenenū
  • venenum
slide-10
SLIDE 10

Interoperability: the issue

  • uenenū
  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-11
SLIDE 11

Interoperability: the issue

  • uenenū
  • venenum

venenum

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-15
SLIDE 15

Interoperability: the issue

  • My focus: European Medieval handwriting

...and early print (imitating handwriting)

slide-16
SLIDE 16

Interoperability: current solutions

slide-17
SLIDE 17

Unicode (TEI’s recommendation)

  • Solution for new digital texts
  • Not enough for pre-modern writing systems

Allographs

  • ſ (U+017F) / s (U+0073; ASCII 115)
  • Have I encoded that they correspond to each other (variants of grapheme <s>)?
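The point about ſ and s can be made concrete with a minimal Python sketch. It relies only on the standard `unicodedata` module; the table `ALLOGRAPHS_OF` and the helper `grapheme_of` are hypothetical names introduced here for illustration and are not part of Unicode or TEI.

```python
import unicodedata

LONG_S = "\u017f"   # ſ LATIN SMALL LETTER LONG S
ROUND_S = "\u0073"  # s LATIN SMALL LETTER S (ASCII 115)

# Two distinct code points: Unicode alone does not say that they are
# variants of the same grapheme <s> in a given writing system.
same_code_point = LONG_S == ROUND_S

# NFKC compatibility normalization folds ſ into s, but it is a generic
# mapping and discards the allographic information altogether.
folded = unicodedata.normalize("NFKC", "ſucceſs")

# Hypothetical project-level table: the correspondence is stated
# explicitly and the allograph is kept in the transcription.
ALLOGRAPHS_OF = {"s": ["s", "ſ"]}


def grapheme_of(char):
    """Return the grapheme a character realizes, or the character itself."""
    for grapheme, allographs in ALLOGRAPHS_OF.items():
        if char in allographs:
            return grapheme
    return char


print(same_code_point, folded, grapheme_of(LONG_S))
```

The explicit table is the design choice of interest: unlike a blanket normalization, it documents which correspondences the project asserts.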

slide-18
SLIDE 18

Project-specific solutions

  • Disposable home-made solutions
  • Normalization software and strategies
  • TEI: theory-agnostic
slide-19
SLIDE 19

Interoperability through modelling

slide-20
SLIDE 20

Interoperability through modelling

  • Scholarly discussion on modelling
  • Documenting project-specific modelling

and normalization practices

prose

formal (software code, tables)

  • Shared models
  • Reusable software libraries
slide-21
SLIDE 21

An ontology for digital graphematics and philology

slide-22
SLIDE 22

Ontology

Grapheme Alphabeme Token (inflected word) Lemma Allograph

  • Ligature
  • Abbreviation
  • Logograph
  • Alphabetic Grapheme
  • Abbreviation Mark
  • Brevigraph
  • Diacritic
  • Space
  • Punctuation
  • Metamark

is_a +

slide-23
SLIDE 23

Ontology

Grapheme Linguistic Gr. Textual Gr. Logograph Intra-verbal Gr. {+alphabetic} {-alphabetic} Diacritic Alphabetic Grapheme Abbreviation Mark Brevigraph Space Metamark Punctuation is_a
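A minimal sketch of how such an is_a hierarchy could be stored and queried. The diagram is flattened in this transcript, so the parent assignments below are one plausible reading and an assumption: the {+alphabetic} / {-alphabetic} split is left out, and the four classes near it are attached directly to "Intra-verbal Gr.". The names `PARENT` and `is_a` are hypothetical.

```python
# ASSUMPTION: parent links reconstructed from the flattened diagram;
# the original slide may group these classes differently.
PARENT = {
    "Linguistic Gr.": "Grapheme",
    "Textual Gr.": "Grapheme",
    "Logograph": "Linguistic Gr.",
    "Intra-verbal Gr.": "Linguistic Gr.",
    "Alphabetic Grapheme": "Intra-verbal Gr.",
    "Diacritic": "Intra-verbal Gr.",
    "Abbreviation Mark": "Intra-verbal Gr.",
    "Brevigraph": "Intra-verbal Gr.",
    "Space": "Textual Gr.",
    "Metamark": "Textual Gr.",
    "Punctuation": "Textual Gr.",
}


def is_a(node, ancestor):
    """Walk the parent links upwards; True if ancestor is reached."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False


print(is_a("Brevigraph", "Linguistic Gr."))
print(is_a("Space", "Linguistic Gr."))
```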

slide-26
SLIDE 26

Graphemes/allographs

slide-27
SLIDE 27

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

Text

<z> <y> <x> <t> <s>

System

Comparatur vel ad se vel ad alium
(He is compared to himself or to another)

slide-28
SLIDE 28

«τ»

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

Text

<z> <y> <x> <t> <s>

System

«√»

System

slide-29
SLIDE 29

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

slide-30
SLIDE 30

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

Allographs Graphemes

slide-31
SLIDE 31

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

Allographs Graphemes

Gr   Allogr
t:   τ | Ɛ | √
u:   u | v
z:   z
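A grapheme/allograph table of this kind can be kept as data and inverted into a lookup. The sketch below is illustrative only: `TABLE`, `build_index` and `INDEX` are hypothetical names, and the consistency check (no allograph assigned to two graphemes) is an assumption about what a project might want to enforce.

```python
# Grapheme -> allographs, copied from the table on the slide.
TABLE = {
    "t": ["τ", "Ɛ", "√"],
    "u": ["u", "v"],
    "z": ["z"],
}


def build_index(table):
    """Invert the table into allograph -> grapheme, rejecting conflicts."""
    index = {}
    for grapheme, allographs in table.items():
        for allograph in allographs:
            if allograph in index:
                raise ValueError(
                    f"{allograph!r} is listed under both "
                    f"{index[allograph]!r} and {grapheme!r}"
                )
            index[allograph] = grapheme
    return index


INDEX = build_index(TABLE)
print(INDEX["Ɛ"], INDEX["v"])
```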

slide-32
SLIDE 32

Graphemes / allographs: what to transcribe?

  • What the project wants!

based on its scientific interests

(and on time / money)

  • But: framed in a larger model
slide-33
SLIDE 33

Can allographs have a distinctive value?

slide-34
SLIDE 34

√ √ √ √ τ τ τ τ τ

Allographs

Ɛ

Ɛ

slide-35
SLIDE 35

√ √ √ √ τ τ τ τ τ

Allographs

Ɛ

  • 1. «τ»
  • 2. «Ɛ»
  • 3. «√»

Ɛ

slide-36
SLIDE 36

Capitals: allographs or graphemes?

  • Cool (CA) is a cool town

Geographical name

  • Smith is a good smith

Proper name

  • ODD files are odd files

Acronym

OK for contemporary Western writing systems
Not for classical/medieval handwriting (see later)

slide-37
SLIDE 37

Capitals: allographs or graphemes?

Grapheme <D> Allograph «D» Allograph «d» Archi-grapheme D

Grapheme <D> Grapheme <d>

Alphabeme D

Grapheme <D> Grapheme <d>

  • R. Mordenti
  • F. Neuber
  • P. Monella
  • Cool (CA) is a cool town

Geographical name

  • Smith is a good smith

Proper name

  • ODD files are odd files

Acronym

slide-38
SLIDE 38
  • I go because I have to. Stay here!

I go because I have to stay here!

Sentence segmentation: distinctive value for meaning of the whole text

Capitals

slide-39
SLIDE 39

Sentence segmentation: distinctive value for meaning of the whole text

Capitals Punctuation

  • I go because I have to. Stay here!

I go because I have to stay here!

slide-40
SLIDE 40
  • σαῦρος, ſucceſs, daſs (daß)

Word segmentation: distinctive value for meaning of the whole text

slide-41
SLIDE 41
  • σαῦρος, ſucceſs, daſs (daß)

Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)

Word segmentation: distinctive value for meaning of the whole text

Positional allograph

slide-42
SLIDE 42
  • σαῦρος, ſucceſs, daſs (daß)

Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)

Word segmentation: distinctive value for meaning of the whole text

Space Positional allograph

slide-43
SLIDE 43

Connotators

slide-44
SLIDE 44

Connotators

slide-45
SLIDE 45

𝖝𝖎𝖕

WHO

≠ Connotator

“Gothic” (marked)

Connotator

“Gaul” (not marked) Pertinence

Connotators

slide-46
SLIDE 46

Connotators, pertinent for the writer

  • graphemes as entities

Emphasis

  • the Evangelist wrote

Respect

Connotators

slide-47
SLIDE 47

«Ɛ»

(Non-)pertinent allographs: positional variants

  • Ligatures
  • Non-pertinent for the writer
  • Connotators, pertinent

for (some) readers

editors, paleographers, codicologists, historians studying a MS / book

(Beneventan vs Caroline script, print font, ſ / s)

«τ» «√»

Allographs

slide-48
SLIDE 48

Distinctive value (pertinence) of allographs?

  • Graphemes change denotative meaning

fame vs name

Hjelmslev: denotative semiotics

  • Allographs can have other forms of distinctive value (pertinence)

For the writer

  • 𝖝𝖎𝖕

vs

WHO

  • Hjelmslev: connotative semiotics

For the reader (digital editor)

  • Digital editors can set their own pertinence (transcription) criteria

based on their scientific interests

E.g.: fraktur font → political connotation in WW1
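The "fame vs name" contrast is the commutation test in miniature: swapping one sign yields a different word. A minimal sketch of that test over a toy word list follows. The function name `minimal_pairs` and the three-word lexicon are hypothetical; a real project would run this against its own graphematic word list.

```python
def minimal_pairs(lexicon, a, b):
    """Return (word, other) pairs that differ only by one a -> b swap."""
    pairs = []
    for word in sorted(lexicon):
        for i, char in enumerate(word):
            if char != a:
                continue
            other = word[:i] + b + word[i + 1:]
            if other != word and other in lexicon:
                pairs.append((word, other))
    return pairs


# Toy lexicon (assumption, for illustration only).
LEXICON = {"fame", "name", "success"}

# f / n commute: the swap produces another word, so they are graphemes.
print(minimal_pairs(LEXICON, "f", "n"))

# s / ſ do not commute here: no swap produces a different word,
# which is what one expects of two allographs of the same grapheme.
print(minimal_pairs(LEXICON, "s", "ſ"))
```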

slide-49
SLIDE 49

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

slide-50
SLIDE 50

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A) Manual (selective) transcription (witness B)

slide-51
SLIDE 51

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-53
SLIDE 53

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Unicode characters Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-54
SLIDE 54

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Unicode characters Markup / annotation Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-55
SLIDE 55

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

OCR/HTT (witness A) Manual (selective) transcription (witness B)

Allographic transcription

slide-56
SLIDE 56

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

OCR/HTT (witness A) Manual (selective) transcription (witness B)

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription

slide-57
SLIDE 57

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss
unter dem ſchloſs

Gr   Allogr (witness A)
s:   s
t:   τ | Ɛ | √
u:   u | V

Gr   Allogr (witness B)
s:   s | ſ
t:   t
u:   u

OCR/HTT (witness A) Manual (selective) transcription (witness B)

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription

slide-58
SLIDE 58

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss
unter dem ſchloſs

Gr   Allogr (witness A)
s:   s
t:   τ | Ɛ | √
u:   u | V

Gr   Allogr (witness B)
s:   s | ſ
t:   t
u:   u

unter dem schloss

OCR/HTT (witness A) Manual (selective) transcription (witness B)

unter dem schloss

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription Graphematic transcription

slide-59
SLIDE 59

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss
unter dem ſchloſs

Gr   Allogr (witness A)
s:   s
t:   τ | Ɛ | √
u:   u | V

Gr   Allogr (witness B)
s:   s | ſ
t:   t
u:   u

unter dem schloss

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)

OCR/HTT (witness A) Manual (selective) transcription (witness B)

unter dem schloss

(More) interoperability

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription Graphematic transcription
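The step from two allographic transcriptions to one shared graphematic form can be sketched as follows. The two tables are copied from the slide; the function `to_graphematic` and the regular expression used to drop markup such as <hi> are assumptions made for illustration, not the author's implementation.

```python
import re

# Grapheme -> allographs, one table per witness (from the slide).
WITNESS_A = {"s": ["s"], "t": ["τ", "Ɛ", "√"], "u": ["u", "V"]}
WITNESS_B = {"s": ["s", "ſ"], "t": ["t"], "u": ["u"]}


def to_graphematic(text, table):
    """Strip inline markup, then map each allograph to its grapheme."""
    index = {a: g for g, allographs in table.items() for a in allographs}
    plain = re.sub(r"<[^>]+>", "", text)  # drop tags such as <hi>
    # Characters absent from the table are passed through unchanged.
    return "".join(index.get(char, char) for char in plain)


a = to_graphematic("Vnτer <hi>dem</hi> schloss", WITNESS_A)
b = to_graphematic("unter dem ſchloſs", WITNESS_B)

print(a)
print(b)
print(a == b)
```

Each witness keeps its own table, so the allographic layer remains available for historical documentation and visualization, while search, collation and NLP can run on the shared graphematic string.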

slide-60
SLIDE 60

Open issues

slide-61
SLIDE 61

Open issues

  • Allographic words

Spelling (wife / wyfge)

slide-62
SLIDE 62

Open issues

  • Allographic words

Spelling (wife / wyfge)

Abbreviations (ūter / unter)

slide-63
SLIDE 63

Open issues

ūter dem schloss

  • Allographic words

Spelling (wife / wyfge)

Abbreviations (ūter / unter)

unter dem schloss

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic

slide-64
SLIDE 64

Open issues

ūter dem schloss [unter]

  • Allographic words

Spelling (wife / wyfge)

Abbreviations (ūter / unter)

unter dem schloss [unter]

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic Linguistic (normalized)

slide-65
SLIDE 65

Open issues

ūter dem schloss [unter]

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)

Interoperability

  • Allographic words

Spelling (wife / wyfge)

Abbreviations (ūter / unter)

unter dem schloss [unter]

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic Linguistic (normalized)
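One way to picture the three layers is a token record that carries all of them. This is a sketch under assumptions: the class `Token`, its field names and the helper `layer` are hypothetical, and the layer values are entered by hand from the example on the slide rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    allographic: str   # signs as written, e.g. with ſ
    graphematic: str   # allographs resolved to graphemes
    linguistic: str    # normalized word, abbreviations expanded


# Witness with the abbreviated form (values taken from the slide).
abbreviated = [
    Token("ūter", "ūter", "unter"),
    Token("dem", "dem", "dem"),
    Token("ſchloſs", "schloss", "schloss"),
]

# Witness with the word written out in full.
written_out = [
    Token("unter", "unter", "unter"),
    Token("dem", "dem", "dem"),
    Token("ſchloſs", "schloss", "schloss"),
]


def layer(tokens, name):
    """Join one layer of a token sequence into a string."""
    return " ".join(getattr(token, name) for token in tokens)


# The graphematic layers still differ; only the linguistic layers match.
print(layer(abbreviated, "graphematic") == layer(written_out, "graphematic"))
print(layer(abbreviated, "linguistic") == layer(written_out, "linguistic"))
```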

slide-66
SLIDE 66

Outline

slide-67
SLIDE 67

Outline

  • Interoperability of digital scholarly editions (DSEs)

based on diplomatic transcriptions

  • Digital modelling (ontology) of pre-modern writing systems

Graphemes / allographs

Allographs: capitals, ligatures, positional variants, emphasis etc.

  • In practice:

how can grapheme/allograph modelling make my DSE more interoperable?

  • Open issues