SLIDE 1 Paolo Monella
Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale ((Hyper-)diplomatic transcription and its epistemic potential), Bergische Universität Wuppertal (BUW), 6 February 2020
An ontology for digital graphematics and philology
SLIDE 2
Outline
SLIDE 3 Outline
- Interoperability of digital scholarly editions (DSEs) based on diplomatic transcriptions
- Digital modelling (ontology) of pre-modern writing systems
  – Graphemes / allographs
  – Allographs: capitals, ligatures, positional variants, emphasis etc.
- In practice: how can grapheme/allograph modelling make my DSE more interoperable?
SLIDE 4
Interoperability: the issue
SLIDE 5
Interoperability: the issue
SLIDE 6 Interoperability: the issue
SLIDE 7 Interoperability: the issue
Diplomatic
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
SLIDE 8 Interoperability: the issue
SLIDE 9 Interoperability: the issue
SLIDE 10 Interoperability: the issue
- uenenū
- venenum
- Processing
- Search
- Collation
- NLP (lemma, PoS etc.)
- Statistics (dist. reading)
SLIDE 11 Interoperability: the issue
venenum
- Processing
- Search
- Collation
- NLP (lemma, PoS etc.)
- Statistics (dist. reading)
SLIDE 15 Interoperability: the issue
- My focus: European Medieval handwriting
  – ...and early print (imitating handwriting)
SLIDE 16 Interoperability: the issue
- My focus: European Medieval handwriting
  – ...and early print (imitating handwriting)
  – Pre-Gutenberg (and shortly after)
- Alphabetic writing systems (so far)
  – Latin script (Italian, English...), Greek, Cyrillic...
  – No non-alphabetic systems (Cuneiform, Arabic, Chinese etc.)
SLIDE 17
Interoperability: current solutions
SLIDE 18 Unicode (TEI’s recommendation)
- Solution for new digital texts
- Not enough for pre-modern writing systems
  – Allographs
    - ſ (U+017F) / s (U+0073; ASCII 115)
    - Have I encoded that they correspond to each other (variants of grapheme <s>)?
SLIDE 19 Unicode (TEI’s recommendation)
- Solution for new digital texts
- Not enough for pre-modern writing systems
  – Allographs
    - ſ (U+017F) / s (U+0073; ASCII 115)
    - Have I encoded that they correspond to each other (variants of grapheme <s>)?
  – Ligatures
    - & (U+0026; ASCII 38)
    - Have I encoded that it is equivalent to “e + t” in that MS?
  – Grapheme set
    - u (U+0075; ASCII 117)
    - Have I encoded whether it “covers” (or not) <u> and <v>?
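A minimal sketch using Python's standard `unicodedata` module shows what Unicode itself does and does not record about these three cases. The helper names `describe` and `nfkc` are illustrative only. ſ and s are distinct code points, and compatibility normalization (NFKC) happens to fold ſ into s, but no Unicode normalization relates & to “e + t” or v to u: those correspondences have to be encoded by the edition itself.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def nfkc(text: str) -> str:
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


for ch in ["ſ", "s", "&", "u", "v"]:
    print(describe(ch), "-> NFKC:", nfkc(ch))

# Long s carries a compatibility decomposition to s, so NFKC folds it...
assert nfkc("ſchloſs") == "schloss"
# ...but Unicode relates neither & to "et" nor v to u.
assert nfkc("&") == "&" and nfkc("v") == "v"
```

NFKC also folds many other compatibility characters (for example the ligature code point ﬁ, U+FB01, becomes “fi”), so applying it wholesale may erase distinctions an editor wants to keep.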
SLIDE 20 Diplomatic/normalized: the surrender?
Diplomatic Normalized
- Processing
- Search
- Collation
- NLP (lemma, PoS etc.)
- Statistics (distant reading)...
- Historical documentation
- Visualization
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
SLIDE 21 Project-specific solutions
- Disposable home-made solutions
- Normalization software and strategies
- TEI: theory-agnostic
SLIDE 22
Interoperability through modelling
SLIDE 23 Interoperability through modelling
- Scholarly discussion on modelling
- Documenting project-specific modelling and normalization practices
  – prose
  – formal (software code, tables)
- Shared models
- Reusable software libraries
SLIDE 24
An ontology for digital graphematics and philology
SLIDE 25 Ontology
Grapheme Alphabeme Token (inflected word) Lemma Allograph
- Ligature
- Abbreviation
- Logograph
- Alphabetic Grapheme
- Abbreviation Mark
- Brevigraph
- Diacritic
- Space
- Punctuation
- Metamark
is_a +
SLIDE 26 Ontology
[Diagram: is_a hierarchy of grapheme types: Grapheme; Linguistic Gr.; Textual Gr.; Logograph; Intra-verbal Gr. {+alphabetic} / {-alphabetic}; Diacritic; Alphabetic Grapheme; Abbreviation Mark; Brevigraph; Space; Metamark; Punctuation]
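The is_a links of this diagram map naturally onto class inheritance. The sketch below is one possible reading of the flattened diagram, not the author's implementation. All class names are hypothetical. Logograph and Intra-verbal graphemes are placed under Linguistic graphemes, and Space, Metamark and Punctuation under Textual graphemes. The {+alphabetic}/{-alphabetic} feature is left out because the slide text does not show which leaves it applies to.

```python
class Grapheme:
    """Root of the hierarchy."""


class LinguisticGrapheme(Grapheme):
    """Graphemes that contribute to writing words."""


class TextualGrapheme(Grapheme):
    """Graphemes that organize the text (spacing, punctuation, metamarks)."""


class Logograph(LinguisticGrapheme):
    """A sign standing for a whole word."""


class IntraVerbalGrapheme(LinguisticGrapheme):
    """Signs used inside words."""


# Leaves under intra-verbal graphemes (the {+/-alphabetic} split is not modelled).
class AlphabeticGrapheme(IntraVerbalGrapheme):
    pass


class Diacritic(IntraVerbalGrapheme):
    pass


class AbbreviationMark(IntraVerbalGrapheme):
    pass


class Brevigraph(IntraVerbalGrapheme):
    pass


# Leaves under textual graphemes.
class Space(TextualGrapheme):
    pass


class Punctuation(TextualGrapheme):
    pass


class Metamark(TextualGrapheme):
    pass


def is_a_chain(cls: type) -> list[str]:
    """Names of cls and its ancestors up to Grapheme, following is_a links."""
    return [c.__name__ for c in cls.__mro__ if issubclass(c, Grapheme)]


print(is_a_chain(Brevigraph))
```

Using inheritance means a query such as “all textual graphemes” becomes a plain `issubclass` check, which is one way a shared model can be reused across projects.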
SLIDE 29
Digital modelling for pre-modern writing systems
SLIDE 30
Digital modelling
SLIDE 31 Digital modelling
- co̊paraƐur uł adſe uładalium
- Comparatur vel ad se vel ad alium
He is compared to himself or to another
SLIDE 34 Digital modelling
- co̊paraƐur uł adſe uładalium
SLIDE 35 A structural approach to digital modelling
- co̊paraƐur uł adſe uładalium
Text System
<z> <y> <x> <t> <s>
Entities Analysis
SLIDE 37
Graphemes/allographs
SLIDE 38 Graphemes/allographs: the commutation test
- co̊paraƐur uł adſe uładalium
Text
<z> <y> <x> <t> <s>
System
Comparatur vel ad se vel ad alium He is compared to himself or to another
SLIDE 39 «τ»
Graphemes/allographs: the commutation test
- co̊paraƐur uł adſe uładalium
Text
<z> <y> <x> <t> <s>
System
«√»
System
SLIDE 40 «τ»
Substitution: → No change in “denotative meaning”
Graphemes/allographs: the commutation test
- co̊paraƐur uł adſe uładalium
<z> <y> <x> <t> <s>
Commutation: → Change in “denotative meaning”
«√»
SLIDE 41 «τ»
Substitution: → No change in “denotative meaning”
Graphemes/allographs: the commutation test
- co̊paraƐur uł adſe uładalium
<z> <y> <x> <t> <s>
Commutation: → Change in “denotative meaning”
«√»
Allographs Graphemes
SLIDE 42 «τ»
Substitution: → No change in “denotative meaning”
Graphemes/allographs: the commutation test
- co̊paraƐur uł adſe uładalium
<z> <y> <x> <t> <s>
Commutation: → Change in “denotative meaning”
«√»
Allographs Graphemes
Gr | Allogr
t | τ, Ɛ, √
u | u, v
z | z
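A grapheme/allograph table like this one is easy to express as data. The sketch below is an illustration, not the author's software: the names `ALLOGRAPHS`, `TO_GRAPHEME` and `graphematic` are hypothetical, and τ, Ɛ and √ simply stand in for the t-shapes of the manuscript, as they do on the slides. Mapping each allograph to its grapheme turns an allographic transcription into a graphematic one.

```python
# Hypothetical grapheme -> allographs table, modelled on the slide above.
ALLOGRAPHS: dict[str, list[str]] = {
    "t": ["τ", "Ɛ", "√"],
    "u": ["u", "v"],
    "z": ["z"],
}

# Invert the table: allograph -> grapheme.
TO_GRAPHEME: dict[str, str] = {
    allo: gr for gr, allos in ALLOGRAPHS.items() for allo in allos
}


def graphematic(text: str) -> str:
    """Replace every known allograph with its grapheme; leave other characters as-is."""
    return "".join(TO_GRAPHEME.get(ch, ch) for ch in text)


print(graphematic("paraƐur"))  # paratur
print(graphematic("venenum"))  # uenenum
```

Because the table is data rather than code, each witness or project can publish its own table, which is one way of documenting normalization practices formally.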
SLIDE 43 Graphemes / allographs: what to transcribe?
– based on its scientific interests
– (and on time / money)
- But: framed in a larger model
SLIDE 44
Saussure, pertinence and the scribe’s toolbox
SLIDE 45 a b c d e f g h i j l m n o p q r s t u v z . , ; : !
a b c d e f g h i l m n o p q r s t u z · ;
MS A MS B
Saussure, pertinence and the scribe’s toolbox
SLIDE 46 a b c d e f g h i j l m n o p q r s t u v z . , ; : !
a b c d e f g h i l m n o p q r s t u z · ;
OCR from Teubner
Saussure, pertinence and the scribe’s toolbox
OCR from Loeb
SLIDE 47 Saussure, pertinence and the scribe’s toolbox
- The toolbox of the scribe
  – Definition of graphemes, allographs…
- Writing systems as autonomous semiotic systems (Sampson)
  – Not as epiphenomena of oral language (phonemes)
  – Mandarin / Cantonese
  – “Opaque” orthographies (English)
    - “knight”, “aile”, “read”, “read” (past tense)
  – Medieval MSS: pronunciation?
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
SLIDE 48 Saussure, pertinence and the scribe’s toolbox
- “In language there are only differences” (Saussure)
  – “But the statement that everything in language is negative is true only if the signified and the signifier are considered separately; when we consider the sign in its totality, we have something that is positive in its own class”
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
SLIDE 49 Saussure, pertinence and the scribe’s toolbox
- Can we define the scribe’s (graphematic, signifier) toolbox under complete ignorance of the linguistic (meaning, signified) dimension?
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
SLIDE 50 Saussure, pertinence and the scribe’s toolbox
- Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
SLIDE 51 Saussure, pertinence and the scribe’s toolbox
- Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
Segmentation
SLIDE 52 Saussure, pertinence and the scribe’s toolbox
- Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?
[Figure: writing samples in Devanāgarī and in Tamil script]
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
SLIDE 53 Saussure, pertinence and the scribe’s toolbox
- Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?
a b c d e f g h i j l m n o p q r s t u v z . , ; : !
Devanāgarī Turkish, Latin, Italian, English
SLIDE 54
Can allographs have a distinctive value?
SLIDE 55
√ √ √ √ τ τ τ τ τ
Allographs
Ɛ
Ɛ
SLIDE 56 √ √ √ √ τ τ τ τ τ
Allographs
Ɛ
3.«√»
Ɛ
SLIDE 57 Capitals: allographs or graphemes?
Geographical name
Proper name
Acronym
OK for contemporary Western writing systems
Not for classical/medieval handwriting (see later)
SLIDE 58 Capitals: allographs or graphemes?
Grapheme <D> Allograph «D» Allograph «d» Archi-grapheme D
Grapheme <D> Grapheme <d>
Alphabeme D
Grapheme <D> Grapheme <d>
- R. Mordenti
- F. Neuber
- P. Monella
- Cool (CA) is a cool town
Geographical name
Proper name
Acronym
SLIDE 59
- I go because I have to. Stay here!
I go because I have to stay here!
Sentence segmentation: distinctive value for the meaning of the whole text
Capitals
SLIDE 60 Sentence segmentation: distinctive value for the meaning of the whole text
Capitals Punctuation
- I go because I have to. Stay here!
I go because I have to stay here!
SLIDE 61
- σαῦρος, ſucceſs, daſs (daß)
Word segmentation: distinctive value for the meaning of the whole text
SLIDE 62
- σαῦρος, ſucceſs, daſs (daß)
Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)
Word segmentation: distinctive value for the meaning of the whole text
Positional allograph
SLIDE 63
- σαῦρος, ſucceſs, daſs (daß)
Paulus suſtinet me (Paolo holds me up)
Paulus ſus tinet me (Paolo the pig holds me)
Word segmentation: distinctive value for the meaning of the whole text
Space Positional allograph
SLIDE 64
Connotators
SLIDE 65
Connotators
SLIDE 66 𝖝𝖎𝖕
WHO
≠ Connotator
“Gothic” (marked)
Connotator
“Gaul” (not marked) Pertinence
Connotators
SLIDE 67 Connotators, pertinent for the writer
Emphasis
Respect
Connotators
SLIDE 68 Distinctive value (pertinence) of allographs?
- Pertinent differences define entities (graphemes, allographs)
  – Distinctive value
SLIDE 69 «Ɛ»
(Non-)pertinent allographs: positional variants
- Complementary distribution
  – Hjelmslev’s “varieties”
«τ» «√»
Allographs
SLIDE 70 «Ɛ»
(Non-)pertinent allographs: positional variants
- Ligatures
- Non-pertinent for the writer
- Connotators, pertinent for (some) readers
  – editors, paleographers, codicologists, historians studying a MS / book
  – (Beneventan vs Caroline script, print font, ſ / s)
«τ» «√»
Allographs
SLIDE 71 (Non-)pertinent allographs: free variants
√ √ √ √ τ τ τ τ τ Ɛ
Ɛ
- Non-pertinent for the writer
- Connotators, pertinent for (some) readers
  – editors, paleographers, codicologists, historians studying a MS / book
  – (Beneventan vs Caroline script, print font, ſ / s)
SLIDE 72 (Non-)pertinent allographs: free variants
- Infinite
- Continuum → discrete
  – It is difficult to draw boundaries
  – Digital (=discrete) modelling
√ √ √ √ τ τ τ τ τ Ɛ
Ɛ
SLIDE 73 Distinctive value (pertinence) of allographs?
- Graphemes change denotative meaning
  – fame vs name
  – Hjelmslev: denotative semiotics
- Allographs can have other forms of distinctive value (pertinence)
  – For the writer
vs
WHO
- Hjelmslev: connotative semiotics
  – For the reader (digital editor)
- Digital editors can set their own pertinence (transcription) criteria
  – based on their scientific interests
  – E.g.: Fraktur font → political connotation in WW1
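Letting editors set their own pertinence criteria can be sketched as a parameter of the transcription step. In this hypothetical example (the `FOLDABLE` table and the `transcribe` function are illustrative assumptions, not an existing tool), allographs that the editor declares pertinent are preserved, while every other known allograph is folded into its grapheme.

```python
# Hypothetical table of allographs that an edition may fold into graphemes.
FOLDABLE: dict[str, str] = {
    "ſ": "s",  # long s -> <s>
    "v": "u",  # v -> <u> (one grapheme in Latin)
}


def transcribe(text: str, pertinent: set[str]) -> str:
    """Keep the allographs the editor declares pertinent; fold the other known ones."""
    return "".join(ch if ch in pertinent else FOLDABLE.get(ch, ch) for ch in text)


# A paleographer may treat ſ/s as pertinent; an NLP pipeline may not.
print(transcribe("ſchloſs", pertinent={"ſ"}))  # ſchloſs
print(transcribe("ſchloſs", pertinent=set()))  # schloss
```

The same source transcription can thus serve different readers, as long as the pertinence set is published alongside the edition.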
SLIDE 74
In practice: how can grapheme/allograph modelling make my DSE more interoperable?
SLIDE 75 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
- How can a structural digital modelling of the graphemes/allographs distinction make my DSE more interoperable?
SLIDE 76 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
OCR/HTT (witness A) Manual (selective) transcription (witness B)
SLIDE 77 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
OCR/HTT (witness A)
Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs
Manual (selective) transcription (witness B)
SLIDE 79 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
OCR/HTT (witness A)
Unicode characters Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs
Manual (selective) transcription (witness B)
SLIDE 80 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
OCR/HTT (witness A)
Unicode characters Markup / annotation Vnτer <hi>dem</hi> schloss Allographic transcription unter dem ſchloſs
Manual (selective) transcription (witness B)
SLIDE 81 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
Vnτer <hi>dem</hi> schloss unter dem ſchloſs
OCR/HTT (witness A) Manual (selective) transcription (witness B)
Allographic transcription
SLIDE 82 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
Vnτer <hi>dem</hi> schloss unter dem ſchloſs
OCR/HTT (witness A) Manual (selective) transcription (witness B)
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
Allographic transcription
SLIDE 83 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
Vnτer <hi>dem</hi> schloss unter dem ſchloſs
Witness A (Gr: Allogr): s: s; t: τ | Ɛ | √; u: u | V
Witness B (Gr: Allogr): s: s | ſ; t: t; u: u
OCR/HTT (witness A) Manual (selective) transcription (witness B)
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
Allographic transcription
SLIDE 84 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
Vnτer <hi>dem</hi> schloss unter dem ſchloſs
Witness A (Gr: Allogr): s: s; t: τ | Ɛ | √; u: u | V
Witness B (Gr: Allogr): s: s | ſ; t: t; u: u
unter dem schloss
OCR/HTT (witness A) Manual (selective) transcription (witness B)
unter dem schloss
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
Allographic transcription Graphematic transcription
SLIDE 85 In practice: how can grapheme/allograph modelling make my DSE more interoperable?
Vnτer <hi>dem</hi> schloss unter dem ſchloſs
Witness A (Gr: Allogr): s: s; t: τ | Ɛ | √; u: u | V
Witness B (Gr: Allogr): s: s | ſ; t: t; u: u
unter dem schloss
- Processing
- Search
- Collation
- NLP (lemma, PoS etc.)
- Statistics (dist. reading)
OCR/HTT (witness A) Manual (selective) transcription (witness B)
unter dem schloss
(More) interoperability
- Historical documentation
- Visualization
- Processing
- (Erkenntnispotentiale)
Allographic transcription Graphematic transcription
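The pipeline on this slide, with one grapheme/allograph table per witness feeding a shared graphematic layer, can be sketched as follows. The tables mirror the slide, but the names `WITNESS_A`, `WITNESS_B` and `to_graphematic` are hypothetical. Stripping markup with a regular expression is a deliberate simplification that only works for this toy string; a real pipeline would use an XML parser.

```python
import re

# Hypothetical per-witness grapheme -> allographs tables (cf. the slide above).
WITNESS_A = {"s": ["s"], "t": ["τ", "Ɛ", "√"], "u": ["u", "V"]}  # witness A (OCR)
WITNESS_B = {"s": ["s", "ſ"], "t": ["t"], "u": ["u"]}           # witness B (manual)


def to_graphematic(text: str, table: dict[str, list[str]]) -> str:
    """Drop XML-style tags, then map each allograph to its grapheme."""
    plain = re.sub(r"<[^>]+>", "", text)  # removes <hi> and </hi>, keeps "dem"
    inverse = {allo: gr for gr, allos in table.items() for allo in allos}
    return "".join(inverse.get(ch, ch) for ch in plain)


a = to_graphematic("Vnτer <hi>dem</hi> schloss", WITNESS_A)
b = to_graphematic("unter dem ſchloſs", WITNESS_B)
print(a)       # unter dem schloss
print(b)       # unter dem schloss
print(a == b)  # True
```

Once both witnesses share a graphematic layer, search and collation can operate on it, while the allographic transcriptions remain available for historical documentation and visualization.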
SLIDE 86
Open issues
SLIDE 87 Open issues
–
Distinctive value / pertinence (capitals, punctuation etc.)
–
Ligature segmentation: one (&) or two (et)?
SLIDE 88 Open issues
–
Spelling (wife / wyffe)
SLIDE 89 Open issues
–
Spelling (wife / wyffe)
–
Abbreviations (ūter / unter)
SLIDE 90 Open issues
ūter dem schloss
–
Spelling (wife / wyffe)
–
Abbreviations (ūter / unter)
unter dem schloss
Graphematic
ūter dem ſchloſs unter dem ſchloſs
Allographic
SLIDE 91 Open issues
ūter dem schloss [unter]
–
Spelling (wife / wyffe)
–
Abbreviations (ūter / unter)
unter dem schloss [unter]
Graphematic
ūter dem ſchloſs unter dem ſchloſs
Allographic Linguistic (normalized)
SLIDE 92 Open issues
ūter dem schloss [unter]
- Processing
- Search
- Collation
- NLP (lemma, PoS etc.)
- Statistics (dist. reading)
Interoperability
–
Spelling (wife / wyffe)
–
Abbreviations (ūter / unter)
unter dem schloss [unter]
Graphematic
ūter dem ſchloſs unter dem ſchloſs
Allographic Linguistic (normalized)
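The three layers on this slide (allographic, graphematic, linguistic) can be stored side by side for each token. In the sketch below, the `Token` class, the `analyse` function and both lookup tables are hypothetical and cover only this example; a real edition would need a much richer abbreviation model, since abbreviations often need context to be resolved.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical tables covering only this example.
ALLOGRAPH_TO_GRAPHEME = {"ſ": "s"}     # long s is an allograph of <s>
ABBREVIATIONS = {"\u016bter": "unter"}  # "ūter": macron over u stands for an omitted n


@dataclass
class Token:
    allographic: str  # as written: allographs kept (ſ vs s)
    graphematic: str  # allographs folded, abbreviations kept
    linguistic: str   # abbreviations expanded (normalized form)


def analyse(word: str) -> Token:
    """Build the three transcription layers for one word."""
    word = unicodedata.normalize("NFC", word)  # store ū as one code point
    graphematic = "".join(ALLOGRAPH_TO_GRAPHEME.get(ch, ch) for ch in word)
    linguistic = ABBREVIATIONS.get(graphematic, graphematic)
    return Token(word, graphematic, linguistic)


for witness in ["ūter dem ſchloſs", "unter dem ſchloſs"]:
    tokens = [analyse(w) for w in witness.split()]
    print([t.graphematic for t in tokens], [t.linguistic for t in tokens])
```

For the first witness this prints the graphematic layer ['ūter', 'dem', 'schloss'] next to the linguistic layer ['unter', 'dem', 'schloss']; the second witness yields ['unter', 'dem', 'schloss'] on both layers, so the two witnesses converge fully only at the linguistic layer.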
SLIDE 93
Outline
SLIDE 94 Outline
- Interoperability of digital scholarly editions (DSEs) based on diplomatic transcriptions
- Digital modelling (ontology) of pre-modern writing systems
  – Graphemes / allographs
  – Allographs: capitals, ligatures, positional variants, emphasis etc.
- In practice: how can grapheme/allograph modelling make my DSE more interoperable?