An ontology for digital graphematics and philology



slide-1
SLIDE 1

Paolo Monella

Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale, Bergische Universität Wuppertal (BUW), 6 February 2020

An ontology for digital graphematics and philology

slide-2
SLIDE 2

Outline

slide-3
SLIDE 3

Outline

  • Interoperability of digital scholarly editions (DSEs)

based on diplomatic transcriptions

  • Digital modelling (ontology) of pre-modern writing systems

Graphemes / allographs

Allographs: capitals, ligatures, positional variants, emphasis etc.

  • In practice:

how can grapheme/allograph modelling make my DSE more interoperable?

  • Open issues
slide-4
SLIDE 4

Interoperability: the issue

slide-5
SLIDE 5

Interoperability: the issue

slide-6
SLIDE 6

Interoperability: the issue

  • uenenū
slide-7
SLIDE 7

Interoperability: the issue

  • uenenū

Diplomatic

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)
slide-8
SLIDE 8

Interoperability: the issue

  • uenenū
slide-9
SLIDE 9

Interoperability: the issue

  • uenenū
  • venenum
slide-10
SLIDE 10

Interoperability: the issue

  • uenenū
  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-11
SLIDE 11

Interoperability: the issue

  • uenenū
  • venenum

venenum

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-12
SLIDE 12

Interoperability: the issue

  • uenenū
  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-13
SLIDE 13

Interoperability: the issue

  • uenenū
  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-14
SLIDE 14

Interoperability: the issue

  • uenenū
  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)
slide-15
SLIDE 15

Interoperability: the issue

  • My focus: European Medieval handwriting

...and early print (imitating handwriting)

slide-16
SLIDE 16

Interoperability: the issue

  • My focus: European Medieval handwriting

...and early print (imitating handwriting)

Pre-Gutenberg (and shortly after)

  • Alphabetic writing systems (so far)

Latin script (Italian, English...), Greek, Cyrillic...

No non-alphabetic (Cuneiform, Arabic, Chinese etc.)

slide-17
SLIDE 17

Interoperability: current solutions

slide-18
SLIDE 18

Unicode (TEI’s recommendation)

  • Solution for new digital texts
  • Not enough for pre-modern writing systems

Allographs

  • ſ (U+017F) / s (U+0073; ASCII 115)
  • Have I encoded that they correspond to each other (variants of grapheme <s>)?

slide-19
SLIDE 19

Unicode (TEI’s recommendation)

  • Solution for new digital texts
  • Not enough for pre-modern writing systems

Allographs

  • ſ (U+017F) / s (U+0073; ASCII 115)
  • Have I encoded that they correspond to each other (variants of grapheme <s>)?

Ligatures

  • & (U+0026; ASCII 38)
  • Have I encoded that it is equivalent to “e + t” in that MS?

Grapheme set

  • u (U+0075; ASCII 117)
  • Have I encoded whether it “covers” (or not) <u> and <v>?
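The three questions above (allographs, ligatures, grapheme set) can be made explicit in a small per-manuscript table. The following is only a minimal sketch: the table name `MS_TABLE`, its contents and the sample string are assumptions chosen for illustration, not part of any existing edition or library.

```python
# Hypothetical per-manuscript table: each Unicode character used in the
# transcription is mapped to the grapheme(s) it stands for in that MS.
MS_TABLE = {
    "ſ": ["s"],       # long s (U+017F): allograph of grapheme <s>
    "s": ["s"],       # round s (U+0073): allograph of grapheme <s>
    "&": ["e", "t"],  # ligature: equivalent to "e + t" in this MS
    "u": ["u"],       # assumption: this MS has one grapheme <u> ...
    "v": ["u"],       # ... which also "covers" what we would print as v
}


def to_graphemes(text, table):
    """Replace every character by the grapheme(s) declared in the table.

    Characters missing from the table are passed through unchanged.
    """
    out = []
    for ch in text:
        out.extend(table.get(ch, [ch]))
    return out


print("".join(to_graphemes("ſucceſs & vnus", MS_TABLE)))  # success et unus
```

The point of the sketch is that the correspondences are declared as data, so another project can read them instead of guessing them from the Unicode code points.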
slide-20
SLIDE 20

Diplomatic/normalized: the surrender?

Diplomatic

  • uenenū
  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Normalized

  • venenum
  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (distant reading)...
slide-21
SLIDE 21

Project-specific solutions

  • Disposable home-made solutions
  • Normalization software and strategies
  • TEI: theory-agnostic
slide-22
SLIDE 22

Interoperability through modelling

slide-23
SLIDE 23

Interoperability through modelling

  • Scholarly discussion on modelling
  • Documenting project-specific modelling and normalization practices

prose

formal (software code, tables)

  • Shared models
  • Reusable software libraries
slide-24
SLIDE 24

An ontology for digital graphematics and philology

slide-25
SLIDE 25

Ontology

Grapheme Alphabeme Token (inflected word) Lemma Allograph

  • Ligature
  • Abbreviation
  • Logograph
  • Alphabetic Grapheme
  • Abbreviation Mark
  • Brevigraph
  • Diacritic
  • Space
  • Punctuation
  • Metamark

is_a +

slide-26
SLIDE 26

Ontology

Grapheme Linguistic Gr. Textual Gr. Logograph Intra-verbal Gr. {+alphabetic} {-alphabetic} Diacritic Alphabetic Grapheme Abbreviation Mark Brevigraph Space Metamark Punctuation is_a

slide-27
SLIDE 27

Ontology

Grapheme Alphabeme Token (inflected word) Lemma Allograph

  • Ligature
  • Abbreviation
  • Logograph
  • Alphabetic Grapheme
  • Abbreviation Mark
  • Brevigraph
  • Diacritic
  • Space
  • Punctuation
  • Metamark

is_a +

slide-28
SLIDE 28

Ontology

Grapheme Alphabeme Token (inflected word) Lemma Allograph

  • Ligature
  • Abbreviation
  • Logograph
  • Alphabetic Grapheme
  • Abbreviation Mark
  • Brevigraph
  • Diacritic
  • Space
  • Punctuation
  • Metamark

is_a +

slide-29
SLIDE 29

Digital modelling for pre-modern writing systems

slide-30
SLIDE 30

Digital modelling

slide-31
SLIDE 31

Digital modelling

  • co̊paraƐur uł adſe uładalium
  • Comparatur vel ad se vel ad alium

He is compared to himself or to another

slide-32
SLIDE 32

Digital modelling

  • co̊paraƐur uł adſe uładalium
  • Comparatur vel ad se vel ad alium

He is compared to himself or to another

slide-33
SLIDE 33

Digital modelling

  • co̊paraƐur uł adſe uładalium

Digital modelling

  • Comparatur vel ad se vel ad alium

He is compared to himself or to another

slide-34
SLIDE 34

Digital modelling

  • co̊paraƐur uł adſe uładalium
slide-35
SLIDE 35

A structural approach to digital modelling

  • co̊paraƐur uł adſe uładalium

Text System

<z> <y> <x> <t> <s>

Entities Analysis

slide-36
SLIDE 36

A structural approach to digital modelling

  • co̊paraƐur uł adſe uładalium

Text System

<z> <y> <x> <t> <s>

Entities Analysis

Digital modelling

slide-37
SLIDE 37

Graphemes/allographs

slide-38
SLIDE 38

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

Text

<z> <y> <x> <t> <s>

System

Comparatur vel ad se vel ad alium He is compared to himself or to another

slide-39
SLIDE 39

«τ»

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

Text

<z> <y> <x> <t> <s>

System

«√»

System

slide-40
SLIDE 40

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

slide-41
SLIDE 41

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

Allographs Graphemes

slide-42
SLIDE 42

«τ»

Substitution: → No change in “denotative meaning”

Graphemes/allographs: the commutation test

  • co̊paraƐur uł adſe uładalium

<z> <y> <x> <t> <s>

Commutation: → Change in “denotative meaning”

«√»

Allographs Graphemes

Gr  Allogr
t:  τ | Ɛ | √
u:  u | v
z:  z
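Once the commutation test has produced a grapheme/allograph table like the one on this slide, the table itself can tell whether exchanging two glyphs is a substitution (same grapheme, no change in denotative meaning) or a commutation (different graphemes). A minimal sketch, assuming the table is stored as a plain dictionary; the names `GRAPHEME_TABLE` and `relation` are illustrative only.

```python
# Grapheme -> allographs, copied from the slide's table.
GRAPHEME_TABLE = {
    "t": ["τ", "Ɛ", "√"],
    "u": ["u", "v"],
    "z": ["z"],
}


def invert(table):
    """Build the reverse lookup: allograph -> grapheme."""
    return {allo: gr for gr, allos in table.items() for allo in allos}


def relation(glyph_a, glyph_b, table):
    """Classify the exchange of two glyphs according to the table."""
    lookup = invert(table)
    if glyph_a not in lookup or glyph_b not in lookup:
        return "unknown"  # glyph not modelled in this table
    if lookup[glyph_a] == lookup[glyph_b]:
        return "substitution"  # allographs of the same grapheme
    return "commutation"  # different graphemes


print(relation("τ", "√", GRAPHEME_TABLE))  # substitution
print(relation("τ", "z", GRAPHEME_TABLE))  # commutation
```

The sketch only queries a table that a scholar has already established; it does not perform the commutation test itself, which requires a judgement about meaning.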

slide-43
SLIDE 43

Graphemes / allographs: what to transcribe?

  • What the project wants!

based on its scientific interests

(and on time / money)

  • But: framed in a larger model
slide-44
SLIDE 44

Saussure, pertinence and the scribe’s toolbox

slide-45
SLIDE 45

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

a b c d e f g h i l m n o p q r s t u z · ;

MS A MS B

Saussure, pertinence and the scribe’s toolbox

slide-46
SLIDE 46

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

a b c d e f g h i l m n o p q r s t u z · ;

OCR from Teubner

Saussure, pertinence and the scribe’s toolbox

OCR from Loeb

slide-47
SLIDE 47

Saussure, pertinence and the scribe’s toolbox

  • The toolbox of the scribe

Definition of graphemes, allographs…

  • Writing systems as autonomous semiotic systems (Sampson)

Not as epiphenomena of oral language (phonemes)

Mandarin / Cantonese

“Opaque” orthographies (English)

  • “knight”, “aile”, “read”, “read” (past tense)

Medieval MSS: pronunciation?

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

slide-48
SLIDE 48

Saussure, pertinence and the scribe’s toolbox

  • “In language there are only differences” (Saussure)

“But the statement that everything in language is negative is true only if the signified and the signifier are considered separately; when we consider the sign in its totality, we have something that is positive in its own class”

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

slide-49
SLIDE 49

Saussure, pertinence and the scribe’s toolbox

  • Can we define the scribe’s (graphematic, signifier) toolbox under complete ignorance of the linguistic (meaning, signified) dimension?

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

slide-50
SLIDE 50

Saussure, pertinence and the scribe’s toolbox

  • Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

slide-51
SLIDE 51

Saussure, pertinence and the scribe’s toolbox

  • Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

Segmentation

slide-52
SLIDE 52

Saussure, pertinence and the scribe’s toolbox

  • Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?

Devanāgarī அட௉ந்த பைன ிடீக்கூம் ட௉க தடீட௉யூம் ிடீக்கூம் - ஈ.வி.பைக.எஸ். இளங்பைகவன் வரூம் பைதர்தலில் ஐபைக் நிறூவனத்ன் இட௉ ந் ிற்ற உள்பைளம்: ஸ்லின் ிரசந்த் கிபை$ரின் நிறூவனத்ன் இட௉ ந்

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

slide-53
SLIDE 53

Saussure, pertinence and the scribe’s toolbox

  • Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?

a b c d e f g h i j l m n o p q r s t u v z . , ; : !

Devanāgarī Turkish, Latin, Italian, English

slide-54
SLIDE 54

Can allographs have a distinctive value?

slide-55
SLIDE 55

√ √ √ √ τ τ τ τ τ

Allographs

Ɛ

Ɛ

slide-56
SLIDE 56

√ √ √ √ τ τ τ τ τ

Allographs

Ɛ

  • 1. «τ»
  • 2. «Ɛ»

3.«√»

Ɛ

slide-57
SLIDE 57

Capitals: allographs or graphemes?

  • Cool (CA) is a cool town

Geographical name

  • Smith is a good smith

Proper name

  • ODD files are odd files

Acronym

OK for contemporary Western writing systems

Not for classical/medieval handwriting (see later)

slide-58
SLIDE 58

Capitals: allographs or graphemes?

  • R. Mordenti: Grapheme <D>, with Allograph «D» and Allograph «d»
  • F. Neuber: Archi-grapheme D, with Grapheme <D> and Grapheme <d>
  • P. Monella: Alphabeme D, with Grapheme <D> and Grapheme <d>
  • Cool (CA) is a cool town

Geographical name

  • Smith is a good smith

Proper name

  • ODD files are odd files

Acronym

slide-59
SLIDE 59
  • I go because I have to. Stay here!

I go because I have to stay here!

Sentence segmentation: distinctive value for meaning of the whole text

Capitals

slide-60
SLIDE 60

Sentence segmentation: distinctive value for meaning of the whole text

Capitals Punctuation

  • I go because I have to. Stay here!

I go because I have to stay here!

slide-61
SLIDE 61
  • σαῦρος, ſucceſs, daſs (daß)

Word segmentation: distinctive value for meaning of the whole text

slide-62
SLIDE 62
  • σαῦρος, ſucceſs, daſs (daß)

Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)

Word segmentation: distinctive value for meaning of the whole text

Positional allograph

slide-63
SLIDE 63
  • σαῦρος, ſucceſs, daſs (daß)

Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)

Word segmentation: distinctive value for meaning of the whole text

Space Positional allograph

slide-64
SLIDE 64

Connotators

slide-65
SLIDE 65

Connotators

slide-66
SLIDE 66

𝖝𝖎𝖕

WHO

≠ Connotator

“Gothic” (marked)

Connotator

“Gaul” (not marked) Pertinence

Connotators

slide-67
SLIDE 67

Connotators, pertinent for the writer

  • graphemes as entities

Emphasis

  • the Evangelist wrote

Respect

Connotators

slide-68
SLIDE 68

Distinctive value (pertinence) of allographs?

  • Pertinent differences define entities (graphemes, allographs)

Distinctive value

  • -etic vs -emic
slide-69
SLIDE 69

«Ɛ»

(Non-)pertinent allographs: positional variants

  • Complementary distribution

Hjelmslev’s “varieties”

«τ» «√»

Allographs
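Complementary distribution can be checked mechanically on a transcription. The sketch below is an assumption-laden illustration, not the author's method: it uses the ſ / s pair from the positional-allograph slides, reduces "context" to word-final versus non-final position, and the function names are hypothetical.

```python
from collections import defaultdict


def contexts(tokens, allographs):
    """Record in which positions (final / non-final) each allograph occurs."""
    seen = defaultdict(set)
    for token in tokens:
        for i, ch in enumerate(token):
            if ch in allographs:
                position = "final" if i == len(token) - 1 else "non-final"
                seen[ch].add(position)
    return dict(seen)


def complementary(tokens, a, b):
    """True if both allographs occur and never share a position class."""
    ctx = contexts(tokens, {a, b})
    if a not in ctx or b not in ctx:
        return False  # no evidence for one of the two allographs
    return ctx[a].isdisjoint(ctx[b])


tokens = ["ſucceſs", "daſs", "ſus"]
print(contexts(tokens, {"ſ", "s"}))
print(complementary(tokens, "ſ", "s"))  # True
```

A real manuscript would need richer contexts (neighbouring letters, line position), but the principle is the same: if the sets of contexts never overlap, the two shapes are positional variants.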

slide-70
SLIDE 70

«Ɛ»

(Non-)pertinent allographs: positional variants

  • Ligatures
  • Non-pertinent for the writer
  • Connotators, pertinent for (some) readers

editors, paleographers, codicologists, historians studying a MS / book

(Beneventan vs Caroline script, print font, ſ / s)

«τ» «√»

Allographs

slide-71
SLIDE 71

(Non-)pertinent allographs: free variants

√ √ √ √ τ τ τ τ τ Ɛ

Ɛ

  • Non-pertinent for the writer
  • Connotators, pertinent for (some) readers

editors, paleographers, codicologists, historians studying a MS / book

(Beneventan vs Caroline script, print font, ſ / s)

slide-72
SLIDE 72

(Non-)pertinent allographs: free variants

  • Infinite
  • Continuum → discrete

It is difficult to draw boundaries

Digital (=discrete) modelling

  • Hjelmslev: metasemiology

√ √ √ √ τ τ τ τ τ Ɛ

Ɛ

slide-73
SLIDE 73

Distinctive value (pertinence) of allographs?

  • Graphemes change denotative meaning

fame vs name

Hjelmslev: denotative semiotics

  • Allographs can have other forms of distinctive value (pertinence)

For the writer

  • 𝖝𝖎𝖕

vs

WHO

  • Hjelmslev: connotative semiotics

For the reader (digital editor)

  • Digital editors can set their own pertinence (transcription) criteria

based on their scientific interests

E.g.: fraktur font → political connotation in WW1

slide-74
SLIDE 74

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

slide-75
SLIDE 75

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

  • How can a structural digital modelling of the graphemes/allographs distinction make my DSE more interoperable?

slide-76
SLIDE 76

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A) Manual (selective) transcription (witness B)

slide-77
SLIDE 77

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-78
SLIDE 78

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-79
SLIDE 79

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Unicode characters Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-80
SLIDE 80

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

OCR/HTT (witness A)

Unicode characters Markup / annotation Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs

Manual (selective) transcription (witness B)

slide-81
SLIDE 81

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

OCR/HTT (witness A) Manual (selective) transcription (witness B)

Allographic transcription

slide-82
SLIDE 82

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

OCR/HTT (witness A) Manual (selective) transcription (witness B)

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription

slide-83
SLIDE 83

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

Gr  Allogr (witness A)
s:  s
t:  τ | Ɛ | √
u:  u | V

Gr  Allogr (witness B)
s:  s | ſ
t:  t
u:  u

OCR/HTT (witness A) Manual (selective) transcription (witness B)

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription

slide-84
SLIDE 84

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

Gr  Allogr (witness A)
s:  s
t:  τ | Ɛ | √
u:  u | V

Gr  Allogr (witness B)
s:  s | ſ
t:  t
u:  u

unter dem schloss

OCR/HTT (witness A) Manual (selective) transcription (witness B)

unter dem schloss

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription Graphematic transcription

slide-85
SLIDE 85

In practice: how can grapheme/allograph modelling make my DSE more interoperable?

Vnτer <hi>dem</hi> schloss unter dem ſchloſs

Gr  Allogr (witness A)
s:  s
t:  τ | Ɛ | √
u:  u | V

Gr  Allogr (witness B)
s:  s | ſ
t:  t
u:  u

unter dem schloss

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)

OCR/HTT (witness A) Manual (selective) transcription (witness B)

unter dem schloss

(More) interoperability

  • Historical documentation
  • Visualization
  • Processing
  • (Erkenntnispotentiale)

Allographic transcription Graphematic transcription
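The step from two allographic transcriptions to one shared graphematic transcription can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two tables are copied from the slide, the `<hi>` markup is simply discarded, and the names `TABLE_A`, `TABLE_B` and `graphematic` are hypothetical.

```python
import re

# Grapheme -> allographs, one table per witness (copied from the slide).
TABLE_A = {"s": ["s"], "t": ["τ", "Ɛ", "√"], "u": ["u", "V"]}
TABLE_B = {"s": ["s", "ſ"], "t": ["t"], "u": ["u"]}


def graphematic(text, table):
    """Reduce an allographic transcription to its graphematic layer."""
    # Reverse lookup: allograph -> grapheme.
    lookup = {allo: gr for gr, allos in table.items() for allo in allos}
    # Assumption: inline markup such as <hi> is irrelevant at this layer.
    plain = re.sub(r"<[^>]+>", "", text)
    # Characters not listed in the table are kept as they are.
    return "".join(lookup.get(ch, ch) for ch in plain)


witness_a = graphematic("Vnτer <hi>dem</hi> schloss", TABLE_A)
witness_b = graphematic("unter dem ſchloſs", TABLE_B)

print(witness_a)               # unter dem schloss
print(witness_a == witness_b)  # True
```

Each witness keeps its own allographic detail and its own table, while search, collation and NLP can run on the identical graphematic strings.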

slide-86
SLIDE 86

Open issues

slide-87
SLIDE 87

Open issues

  • Individual allographs

Distinctive value / pertinence (capitals, punctuation etc.)

Ligature segmentation: one (&) or two (et)?

slide-88
SLIDE 88

Open issues

  • Allographic words

Spelling (wife / wyffe)

slide-89
SLIDE 89

Open issues

  • Allographic words

Spelling (wife / wyffe)

Abbreviations (ūter / unter)

slide-90
SLIDE 90

Open issues

ūter dem schloss

  • Allographic words

Spelling (wife / wyffe)

Abbreviations (ūter / unter)

unter dem schloss

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic

slide-91
SLIDE 91

Open issues

ūter dem schloss [unter]

  • Allographic words

Spelling (wife / wyffe)

Abbreviations (ūter / unter)

unter dem schloss [unter]

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic Linguistic (normalized)

slide-92
SLIDE 92

Open issues

ūter dem schloss [unter]

  • Processing
  • Search
  • Collation
  • NLP (lemma, PoS etc.)
  • Statistics (dist. reading)

Interoperability

  • Allographic words

Spelling (wife / wyffe)

Abbreviations (ūter / unter)

unter dem schloss [unter]

Graphematic

ūter dem ſchloſs unter dem ſchloſs

Allographic Linguistic (normalized)
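The three layers named on this slide (allographic, graphematic, linguistic) can be kept side by side for every token. The sketch below is only one possible arrangement, offered as an assumption: the word-level lookup `EXPANSIONS` stands in for whatever abbreviation and spelling resolution a project adopts, and all names are hypothetical.

```python
import unicodedata

ALLOGRAPHS = {"ſ": "s"}  # allograph -> grapheme (witness B)
# Hypothetical word-level lookup resolving abbreviations and spellings.
EXPANSIONS = {unicodedata.normalize("NFC", "ūter"): "unter"}


def layers(token):
    """Return the allographic, graphematic and linguistic form of a token."""
    # NFC keeps "ū" as one code point, so lookups are stable.
    token = unicodedata.normalize("NFC", token)
    graphematic = "".join(ALLOGRAPHS.get(ch, ch) for ch in token)
    linguistic = EXPANSIONS.get(graphematic, graphematic)
    return {
        "allographic": token,
        "graphematic": graphematic,
        "linguistic": linguistic,
    }


for row in (layers(t) for t in "ūter dem ſchloſs".split()):
    print(row)
```

Here the abbreviation survives at the graphematic layer ("ūter") and is resolved only at the linguistic layer ("unter"), which is the layer that feeds search, collation and NLP.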

slide-93
SLIDE 93

Outline

slide-94
SLIDE 94

Outline

  • Interoperability of digital scholarly editions (DSEs)

based on diplomatic transcriptions

  • Digital modelling (ontology) of pre-modern writing systems

Graphemes / allographs

Allographs: capitals, ligatures, positional variants, emphasis etc.

  • In practice:

how can grapheme/allograph modelling make my DSE more interoperable?

  • Open issues