Paolo Monella: Encoding pre-modern writing systems



slide-1
SLIDE 1

Paolo Monella

Encoding pre-modern writing systems

The scholarly digital edition and the humanities. Theoretical approaches and alternative tools DiXiT Workshop, Rome, 4 December 2014

slide-2
SLIDE 2

Saussure

slide-3
SLIDE 3

Saussure

Ferdinand de Saussure: relational nature of signs within a semiotic system

slide-4
SLIDE 4

a b c d e f g h i j l m n o p q r s t u v z . , :

a b c d e f g h i l m n o p q r s t u z . :

Saussure

MS A MS B

slide-6
SLIDE 6

Saussure

a b c d e f g h i l m n o p q r s t u z . :

MS A

a b c d e f g h i j l m n o p q r s t u v z . , :

yes: sir

a b c d e f g h i j l m n o p q r s t u v z . , :

he said: alas

U+003A

MS B

slide-7
SLIDE 7

Saussure

a b c d e f g h i l m n o p q r s t u z . :

MS A

a b c d e f g h i j l m n o p q r s t u v z . , :

Euery noun

a b c d e f g h i j l m n o p q r s t u v z . , :

U+0075

Every noun

U+0075 U+0076

MS B

slide-8
SLIDE 8

Saussure

Euery noun: U+0075
Every noun: U+0075, U+0076

  • Why compare texts?
slide-9
SLIDE 9

Saussure

Euery noun: U+0075
Every noun: U+0075, U+0076

  • Why compare texts?
    – Textual Criticism

MS A MS B

slide-10
SLIDE 10

Saussure

Euery noun: U+0075
Every noun: U+0075, U+0076

  • Why compare texts?
    – Textual Criticism
    – Processing (e.g. cross-corpus search)

query: "every"
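The search problem can be sketched in a few lines of Python. This is only an illustration, assuming one manuscript is transcribed as "Euery noun" and the other as "Every noun". The `corpus` dictionary and the helpers `codepoints` and `naive_search` are hypothetical names introduced here, not part of any project named in these slides.

```python
# Hypothetical two-manuscript corpus: one scribe writes u for both u and v,
# the other distinguishes the two letters.
corpus = {
    "single u/v sign": "Euery noun",
    "distinct u and v": "Every noun",
}


def codepoints(text):
    """Return the Unicode code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


def naive_search(query, texts):
    """Case-insensitive substring search that compares raw code points."""
    return [name for name, text in texts.items() if query.lower() in text.lower()]


# The second letter of the two transcriptions is a different code point,
# so a plain query for "every" finds only one of the two manuscripts.
print(codepoints("Euery")[1], codepoints("Every")[1])
print(naive_search("every", corpus))
```

The query misses the first transcription because the comparison happens at the level of code points, not at the level of each manuscript's own graphic system.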

slide-11
SLIDE 11

Saussure

  • TEI: Unicode (a “u” is a “u”)
    – P5.vi
    – P5.5

slide-12
SLIDE 12

Saussure

  • TEI: Unicode (a “u” is a “u”)
  • Corpus-wide normalization: Canterbury Tales documentation
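A rough sketch of what a corpus-wide normalization amounts to, in Python. The rule used here (lowercase, then v to u and j to i) is an assumption chosen for illustration; it is not the rule set of the Canterbury Tales project or of any other edition.

```python
def normalize(text):
    """Apply one rule to every manuscript alike (illustrative rule only):
    lowercase, then v -> u and j -> i."""
    return text.lower().translate(str.maketrans({"v": "u", "j": "i"}))


texts = ["Euery noun", "Every noun"]
normalized = [normalize(t) for t in texts]

# The query is normalized with the same rule, so both transcriptions match.
matches = [t for t in texts if normalize("every") in normalize(t)]

print(normalized)
print(matches)
```

Both transcriptions now match, but the two normalized strings are identical: the fact that one manuscript distinguishes u from v and the other does not is no longer recoverable from the normalized text.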

slide-13
SLIDE 13

Saussure

  • T. Orlandi: SDE and the theory of systems
    → Graphic system of MS D
slide-14
SLIDE 14

XML/TEI P5 Gaiji

slide-15
SLIDE 15

XML/TEI P5 Gaiji

a b

funny-b

c d ...

slide-16
SLIDE 16

XML/TEI P5 Gaiji

<charDecl> a b

<glyph xml:id="funny-b">

a b

funny-b

c d ...

slide-17
SLIDE 17

XML/TEI P5 Gaiji → <char>

<charDecl>
  <char xml:id="uv">
    <charName>SMALL LATIN U OR V</charName>
    <desc>Expression: U-shaped when lowercase, V-shaped when uppercase. Content: either letter Latin u or letter Latin v</desc>
  </char>
</charDecl>

Guidelines

<charDecl> a b <glyph xml:id="funny-b">

slide-18
SLIDE 18

XML/TEI P5 Gaiji → <char>

<charDecl>
  <char xml:id="uv">
    <charProp>
      <localName>Expression</localName>
      <value>U+0075</value>
    </charProp>
    <charProp>
      <localName>Content</localName>
      <value>u|v</value>
    </charProp>
  </char>
</charDecl>

Guidelines

<charDecl> a b <glyph xml:id="funny-b">

slide-19
SLIDE 19

XML/TEI P5 Gaiji → <char>

<charDecl>
  <char xml:id="v">
    <charName>Premodern Latin uncial lowercase v</charName>
    <charProp>
      <localName>Expression</localName>
      <value>U+0076</value>
    </charProp>
    <charProp>
      <localName>Content</localName>
      <value>v</value>
    </charProp>
    <mapping type="standard">v</mapping>
    <graphic url="v.jpg"/>
  </char>
</charDecl>

Guidelines

<charDecl> a b <glyph xml:id="funny-b">

slide-20
SLIDE 20

XML/TEI P5 Gaiji → Comparing

<char xml:id="u">
  <charProp><localName>Expression</localName><value>U+0075</value></charProp>
  <charProp><localName>Content</localName><value>u</value></charProp>
</char>
<char xml:id="uv">
  <charProp><localName>Expression</localName><value>U+0075</value></charProp>
  <charProp><localName>Content</localName><value>u|v</value></charProp>
</char>
<char xml:id="v">
  <charProp><localName>Expression</localName><value>U+0076</value></charProp>
  <charProp><localName>Content</localName><value>v</value></charProp>
</char>

MS A MS B

<g ref="#v" /> <g ref="#uv" />
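One way to read the comparison is that two signs from different manuscripts can match when their Content values overlap, whatever their Expression. The Python sketch below works under that assumption. It also assumes one `<charProp>` per property and a `|` separator between alternatives; `load_properties` and `may_match` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Attribute name under which ElementTree exposes xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

CHAR_DECL = """
<charDecl>
  <char xml:id="u">
    <charProp><localName>Expression</localName><value>U+0075</value></charProp>
    <charProp><localName>Content</localName><value>u</value></charProp>
  </char>
  <char xml:id="uv">
    <charProp><localName>Expression</localName><value>U+0075</value></charProp>
    <charProp><localName>Content</localName><value>u|v</value></charProp>
  </char>
  <char xml:id="v">
    <charProp><localName>Expression</localName><value>U+0076</value></charProp>
    <charProp><localName>Content</localName><value>v</value></charProp>
  </char>
</charDecl>
"""


def load_properties(xml_text):
    """Map each declared sign ID to its {property name: value} dictionary."""
    table = {}
    for char in ET.fromstring(xml_text).iter("char"):
        props = {}
        for prop in char.iter("charProp"):
            props[prop.findtext("localName").strip()] = prop.findtext("value").strip()
        table[char.get(XML_ID)] = props
    return table


def may_match(sign_a, sign_b, table):
    """True when the Content alternatives of the two signs overlap."""
    content_a = set(table[sign_a]["Content"].split("|"))
    content_b = set(table[sign_b]["Content"].split("|"))
    return bool(content_a & content_b)


signs = load_properties(CHAR_DECL)
print(may_match("v", "uv", signs))  # the ambiguous sign can stand for v
print(may_match("u", "v", signs))   # two unambiguous, different letters
```

Note that `u` and `uv` share the same Expression (U+0075) while differing in Content, which is the distinction a bare code point cannot carry.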

slide-21
SLIDE 21

XML/TEI P5 Gaiji → Comparing

<char xml:id="u">
  <mapping type="standard">u</mapping>
</char>
<char xml:id="uv">
  <mapping type="mapto_u">u</mapping>
  <mapping type="mapto_v">v</mapping>
</char>
<char xml:id="v">
  <mapping type="standard">v</mapping>
</char>

MS A MS B

<g ref="#uv" type="mapto_v" /> <g ref="#uv" type="mapto_u" />

slide-22
SLIDE 22

XML/TEI P5 Gaiji → Comparing

<char xml:id="u">
  <mapping type="standard">u</mapping>
</char>
<char xml:id="uv">
  <mapping type="mapto_u">u</mapping>
  <mapping type="mapto_v">v</mapping>
</char>
<char xml:id="v">
  <mapping type="standard">v</mapping>
</char>

MS A MS B

<g ref="#uv" type="mapto_v" /> <g ref="#v" />
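The typed mappings can be resolved mechanically. The Python sketch below assumes that a `<g>` without a `type` attribute selects the mapping of type "standard"; that default, like the helper names `load_mappings` and `resolve`, is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Attribute name under which ElementTree exposes xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

CHAR_DECL = """
<charDecl>
  <char xml:id="u"><mapping type="standard">u</mapping></char>
  <char xml:id="uv">
    <mapping type="mapto_u">u</mapping>
    <mapping type="mapto_v">v</mapping>
  </char>
  <char xml:id="v"><mapping type="standard">v</mapping></char>
</charDecl>
"""


def load_mappings(xml_text):
    """Map each sign ID to its {mapping type: target character} dictionary."""
    return {
        char.get(XML_ID): {m.get("type"): m.text.strip() for m in char.iter("mapping")}
        for char in ET.fromstring(xml_text).iter("char")
    }


def resolve(g_xml, mappings):
    """Return the character a single <g> element maps to."""
    g = ET.fromstring(g_xml)
    sign_id = g.get("ref").lstrip("#")
    # Assumption: no type attribute means the "standard" mapping.
    return mappings[sign_id][g.get("type", "standard")]


mappings = load_mappings(CHAR_DECL)
print(resolve('<g ref="#uv" type="mapto_v" />', mappings))
print(resolve('<g ref="#v" />', mappings))
```

Once both references resolve to the same character, the two manuscripts can be compared or searched without giving up the manuscript-specific sign in the transcription itself.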

slide-23
SLIDE 23

XML/TEI P5 Gaiji → All signs?

<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">

Guidelines

a b c … uv ...
<charDecl> a b <char xml:id="uv">

slide-24
SLIDE 24

XML/TEI P5 Gaiji → All signs?

<body> a b <g ref="#uv">
<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">

Guidelines

<charDecl> a b <char xml:id="uv">

slide-26
SLIDE 26

XML/TEI P5 Gaiji → All signs?

<body> <g ref="#a"> <g ref="#b"> <g ref="#uv">
<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">

Guidelines

<body> a b <g ref="#uv">
<charDecl> a b <char xml:id="uv">

slide-28
SLIDE 28

XML/TEI P5 Gaiji → All signs?

<body> <g ref="#a"> <g ref="#b"> <g ref="#uv">
<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">

Guidelines

Vespa Project: www.unipa.it/paolo.monella/lincei/edition.html

<body> a b <g ref="#uv">
<charDecl> a b <char xml:id="uv">

slide-29
SLIDE 29

XML/TEI P5 Gaiji → All signs?

<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">
<body> <g ref="#a"> <g ref="#b"> <g ref="#uv">

Guidelines

<body> a b <g ref="#uv">

Techn.

<body> a b <g ref="#uv">
<charDecl> a b <char xml:id="uv">

<g ref="#uv" />ir
<g ref="#uv" />ir
<g ref="#uv" /> <g ref="#i" /> <g ref="#r" />
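The difference between the two encodings can be made concrete with a small Python sketch. Both functions are hypothetical helpers written for this illustration: `encode_nonstandard_only` emits `<g>` only for declared signs, while `encode_all_signs` emits one `<g>` per sign.

```python
from xml.sax.saxutils import quoteattr


def g_element(sign_id):
    """Serialize one sign as an empty <g> element pointing at its declaration."""
    return f"<g ref={quoteattr('#' + sign_id)} />"


def encode_nonstandard_only(sign_ids, declared):
    """Use <g> only for signs in `declared`; write all other signs as plain text."""
    return "".join(g_element(s) if s in declared else s for s in sign_ids)


def encode_all_signs(sign_ids):
    """Use <g> for every sign, so no sign is left to bare Unicode."""
    return " ".join(g_element(s) for s in sign_ids)


word = ["uv", "i", "r"]
print(encode_nonstandard_only(word, declared={"uv"}))
print(encode_all_signs(word))
```

The second form is longer, but every sign in the transcription then refers to a declaration in the edition's own table rather than to a code point.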

slide-30
SLIDE 30

XML/TEI P5 Gaiji → What would it take?

slide-31
SLIDE 31

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

Techn.

slide-32
SLIDE 32

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

Techn.

  • More lines of code
slide-33
SLIDE 33

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

Techn.

  • More lines of code
  • Guidelines: <char>
slide-34
SLIDE 34

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

Techn.

  • More lines of code
  • Guidelines: <char>
  • Technical: no <g>
slide-35
SLIDE 35

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

Techn.

  • More lines of code
  • Guidelines: <char>
  • Technical: no <g>
  • Interoperability: <mapping>
slide-36
SLIDE 36

XML/TEI P5 Gaiji → What would it take?

<charDecl> a b <glyph id="fun-b">
<charDecl> <char xml:id="a"> <char xml:id="b"> <glyph id="fun-b">
<body> <g ref="#a"> <g ref="#b"> <g ref="#fun-b">
<body> a b <g ref="#fun-b">

Guidelines

<body> a b <g ref="#fun-b">

  • More lines of code
  • Guidelines: <char>
  • Technical: no <g>
  • Interoperability: <mapping>

Techn.

slide-37
SLIDE 37

Vespa Project

slide-38
SLIDE 38

Vespa Project → Table of signs

Graphemes ID   Content (alphabemes ID)   Expression
t              t                         Latin minuscule uncial t
u              uv                        Latin minuscule uncial u/v (u-shaped, not v-shaped)
ae             a, e                      Latin minuscule uncial e with tail bottom left: img/ax.jpg
b_             b, i, s                   Latin minuscule uncial b with macron top right
·                                        Middle dot
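The table of signs can be thought of as a lookup from grapheme IDs to alphabeme IDs. The Python sketch below models the five rows shown on this slide. The `Sign` class and the `alphabemes` function are hypothetical names, and treating the middle dot as having no alphabetic content is an assumption based on its empty Content cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """One row of the table of signs."""
    content: tuple     # alphabeme IDs the grapheme stands for
    expression: str    # verbal description of the grapheme's shape


TABLE_OF_SIGNS = {
    "t": Sign(("t",), "Latin minuscule uncial t"),
    "u": Sign(("uv",), "Latin minuscule uncial u/v (u-shaped, not v-shaped)"),
    "ae": Sign(("a", "e"), "Latin minuscule uncial e with tail bottom left"),
    "b_": Sign(("b", "i", "s"), "Latin minuscule uncial b with macron top right"),
    "·": Sign((), "Middle dot"),  # assumption: no alphabetic content
}


def alphabemes(grapheme_ids, table=TABLE_OF_SIGNS):
    """Expand a sequence of grapheme IDs into the alphabeme IDs they stand for."""
    result = []
    for grapheme_id in grapheme_ids:
        result.extend(table[grapheme_id].content)
    return result


print(alphabemes(["t", "u", "b_", "·", "ae"]))
```

One grapheme may expand to one alphabeme (t), to an ambiguous alphabeme (uv), to several (b_), or to none.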

slide-39
SLIDE 39

Vespa Project → Source file

Graphemes ID   Words ID
Judici(u_)     nou,[iudicium],n,s,iudicium
coci           nou,[cocus],g,s,coci
et             con,[et],et
pistoris       nou,[pistor],g,s,pistoris
lb

slide-40
SLIDE 40

Vespa Project → Generated files

graphematic.xml

<g id="1.1" ref="#J" />
<g id="1.2" ref="#u" />
<g id="1.3" ref="#d" />
<g id="1.4" ref="#i" />
<g id="1.5" ref="#c" />
<g id="1.6" ref="#i" />
<g id="1.7" ref="#u_" />

alphabetic.xml

<c id="1.1.1" ref="#j" />
<c id="1.2.1" ref="#uv" />
<c id="1.3.1" ref="#d" />
<c id="1.4.1" ref="#i" />
<c id="1.5.1" ref="#c" />
<c id="1.6.1" ref="#i" />
<c id="1.7.1" ref="#uv" />
<c id="1.7.2" ref="#m" />

linguistic.xml

<w id="1">nou,[iudicium],n,s,iudicium</w>
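How the graphematic and alphabetic layers could be derived from the source entry "Judici(u_)" is sketched below in Python. This is not the Vespa Project's own code. The parenthesis convention for multi-character grapheme IDs and the `CONTENT` lookup are read off the slides, and the function names are hypothetical.

```python
import re

# Grapheme ID -> alphabeme IDs, as implied by the generated files shown here.
CONTENT = {
    "J": ["j"], "u": ["uv"], "d": ["d"],
    "i": ["i"], "c": ["c"], "u_": ["uv", "m"],
}


def split_graphemes(token):
    """Split a source token into grapheme IDs.
    Assumption: "(u_)" encloses a multi-character ID, any other
    character is a grapheme ID on its own."""
    return [m.group(1) or m.group(2) for m in re.finditer(r"\(([^)]+)\)|(.)", token)]


def graphematic(word_no, graphemes):
    """One <g> per grapheme, numbered word.position."""
    return [f'<g id="{word_no}.{k}" ref="#{g}" />'
            for k, g in enumerate(graphemes, start=1)]


def alphabetic(word_no, graphemes, content=CONTENT):
    """One <c> per alphabeme, numbered word.position.index."""
    lines = []
    for k, g in enumerate(graphemes, start=1):
        for n, alphabeme in enumerate(content[g], start=1):
            lines.append(f'<c id="{word_no}.{k}.{n}" ref="#{alphabeme}" />')
    return lines


graphemes = split_graphemes("Judici(u_)")
print("\n".join(graphematic(1, graphemes)))
print("\n".join(alphabetic(1, graphemes)))
```

Seven graphemes yield eight alphabemes, because the abbreviation sign `u_` stands for two letters.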

slide-41
SLIDE 41

Vespa Project → Alignment

graphematic.xml

<g id="1.1" ref="#J" />
<g id="1.2" ref="#u" />
<g id="1.3" ref="#d" />
<g id="1.4" ref="#i" />
<g id="1.5" ref="#c" />
<g id="1.6" ref="#i" />
<g id="1.7" ref="#u_" />

alphabetic.xml

<c id="1.1.1" ref="#j" />
<c id="1.2.1" ref="#uv" />
<c id="1.3.1" ref="#d" />
<c id="1.4.1" ref="#i" />
<c id="1.5.1" ref="#c" />
<c id="1.6.1" ref="#i" />
<c id="1.7.1" ref="#uv" />
<c id="1.7.2" ref="#m" />

align_alph_graph.xml

<link targets="graphematic.xml#1.7 #1.7" />
<ptr id="1.7" targets="alphabetic.xml#1.7.1 alphabetic.xml#1.7.2" />
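The alignment follows from the ID scheme: an alphabeme ID is its grapheme ID plus one more numeric component. The Python sketch below groups alphabemes under their grapheme and serializes the pair of elements shown for grapheme 1.7. That the real alignment file is produced this way is an assumption, and `align` and `alignment_xml` are hypothetical names.

```python
from collections import defaultdict


def align(alphabeme_ids):
    """Group alphabeme IDs under the grapheme ID they extend (e.g. 1.7.2 -> 1.7)."""
    groups = defaultdict(list)
    for alphabeme_id in alphabeme_ids:
        grapheme_id = alphabeme_id.rsplit(".", 1)[0]
        groups[grapheme_id].append(alphabeme_id)
    return dict(groups)


def alignment_xml(grapheme_id, alphabeme_ids):
    """Serialize one grapheme-to-alphabemes alignment as <link> plus <ptr>."""
    targets = " ".join(f"alphabetic.xml#{a}" for a in alphabeme_ids)
    return [
        f'<link targets="graphematic.xml#{grapheme_id} #{grapheme_id}" />',
        f'<ptr id="{grapheme_id}" targets="{targets}" />',
    ]


ids = ["1.1.1", "1.2.1", "1.3.1", "1.4.1", "1.5.1", "1.6.1", "1.7.1", "1.7.2"]
groups = align(ids)
print("\n".join(alignment_xml("1.7", groups["1.7"])))
```

Only grapheme 1.7 maps to more than one alphabeme here, which is presumably why the slide shows that case.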

slide-42
SLIDE 42

Looking for solutions

slide-43
SLIDE 43

Looking for solutions

  • Medieval Unicode Font Initiative →
  • Unicode Private Use Area (PUA) →
  • Gaiji <glyph> →
  • SGML/TEI P3 Writing System Declaration (WSD) →
slide-44
SLIDE 44

Looking for solutions

  • SGML/TEI P3 Writing System Declaration (WSD) →

<writingSystemDeclaration lang='eng' name=' ... ' date='1993-05-29'>
  <language iso639='...'><!-- name of language here --></language>
  <script><!-- description of script here ... --></script>
  <direction chars=LR lines=TB>
  <characters>
    <!-- description of character inventory here ... -->
  </characters>
</writingSystemDeclaration>

slide-45
SLIDE 45

Looking for solutions

  • SGML/TEI P3 Writing System Declaration (WSD) →

25.4.1 Base Components of the WSD […] in the <characters> element:

  • reference to an international standard
  • reference to a public set of SGML entities
  • reference to another WSD
  • formal declaration of each graphic unit in the writing system
  • a combination of the above
slide-46
SLIDE 46

Looking for solutions

  • SGML/TEI P3 Writing System Declaration (WSD) →

Unicode

slide-47
SLIDE 47

Looking for solutions

  • SGML/TEI P3 Writing System Declaration (WSD) →

Unicode

… then Unicode arrived! Birnbaum, Cleminson, Kempgen & Ribarov 2008 →

slide-48
SLIDE 48

Looking for solutions

<charDecl> <char xml:id="a"> <char xml:id="b"> <char xml:id="uv">
<body> <g ref="#a"> <g ref="#b"> <g ref="#uv">

Guidelines

<body> a b <g ref="#uv">

Techn.

<body> a b <g ref="#uv">
<charDecl> a b <char xml:id="uv">