Greek Historiography Through Dependency Syntax Treebanking (PowerPoint presentation)

Greek Historiography Through Dependency Syntax Treebanking. Digital Classicist New England, March 25, 2015, Tufts University. Robert J. Gorman, Dept. of Classics, and Vanessa B. Gorman, Dept. of History, University of Nebraska-Lincoln.


SLIDE 1

Greek Historiography Through Dependency Syntax Treebanking

Digital Classicist New England, March 25, 2015, Tufts University

Robert J. Gorman, Dept. of Classics
Vanessa B. Gorman, Dept. of History
University of Nebraska-Lincoln

SLIDE 2

http://www.dh.uni-leipzig.de/wo/projects/digital-athenaeus/

SLIDE 3

How accurate are the quotes, paraphrases, excerpts, and epitomes attributed to earlier authors?


SLIDE 4

The Layers of Athenaeus (c. 200 CE)

  • Narrator (Athenaeus himself)
  • The 24 Deipnosophists
  • 2500+ quotes or paraphrases attributed to 800+ writers
  • All hopelessly intertangled

SLIDE 5

Corrupting Luxury in Ancient Greek Literature

By Robert J. Gorman and Vanessa B. Gorman

The University of Michigan Press, Ann Arbor

SLIDE 6

Derive Syntactic “Thumbprints”

  • Create a database of syntactically analyzed Greek prose
  • Teach the computer to distinguish known authors (proof of concept)
  • Compare directly transmitted to epitomized prose by the same author

SLIDE 7

Epitomizers and Excerptors

  • Polybius (2nd c. BCE) has 5 of 40 books preserved through direct transmission
  • Other books preserved mainly by the excerptors working for Emperor Constantine VII Porphyrogenitus (10th c. CE)
  • Diodorus Siculus (1st c. BCE) has 15 of 40 books preserved through direct transmission
  • Other books preserved mainly in Photius (9th c. CE) and the same Constantine excerptors

SLIDE 8

Fragments of Lost Authors

  • Compare to fragments of the same author that are preserved elsewhere
  • Compare to context in Athenaeus and Photius
  • Does it resemble:
  • The other fragments of the same author?
  • The context in Athenaeus?

SLIDE 9

Dependency Syntax Treebanking

  • Corpus Linguistics
  • Annotation: create a database of syntactically analyzed prose
  • Abstraction: translate into a computer-searchable dataset
  • Analysis: develop algorithms to query that dataset
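
As an illustration of the abstraction and analysis steps, here is a minimal sketch that flattens treebank XML into one searchable row per word and then runs a simple query. It assumes an AGDT-style layout of `<sentence>` elements containing `<word>` elements with `id`, `form`, `lemma`, `postag`, `relation`, and `head` attributes; the sample sentence (the English “Mary had a little lamb” example that appears later in these slides) and the helper names `load_words` and `query` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample in an AGDT-style layout (not taken from the authors' data).
# Heads follow the Mary/had/a/little/lamb labels; postag is omitted for brevity.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="Mary" lemma="Mary" relation="SBJ" head="2"/>
    <word id="2" form="had" lemma="have" relation="PRED" head="0"/>
    <word id="3" form="a" lemma="a" relation="ATR" head="5"/>
    <word id="4" form="little" lemma="little" relation="ATR" head="5"/>
    <word id="5" form="lamb" lemma="lamb" relation="OBJ" head="2"/>
  </sentence>
</treebank>"""

def load_words(xml_text: str) -> List[Dict[str, object]]:
    """Abstraction: flatten every <word> into a dict, one row per token."""
    rows = []
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            rows.append({
                "sentence": sentence.get("id"),
                "id": int(word.get("id")),
                "form": word.get("form"),
                "lemma": word.get("lemma"),
                "postag": word.get("postag", ""),  # empty if the attribute is absent
                "relation": word.get("relation"),
                "head": int(word.get("head")),     # 0 marks attachment to the root
            })
    return rows

def query(rows: List[Dict[str, object]], **criteria) -> List[Dict[str, object]]:
    """Analysis: return the rows whose fields equal all of the given criteria."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

rows = load_words(SAMPLE)
print([r["form"] for r in query(rows, relation="ATR")])  # ['a', 'little']
```

Real treebank files carry more attributes and metadata; the point is only that once each word is a flat row, queries become simple filters.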

SLIDE 10

Dependency vs. Constituency Grammar

SLIDE 11

Dependency vs. Constituency Grammar

SLIDE 12

http://nlp.perseus.tufts.edu/syntax/treebank/greek.html

SLIDE 13

My Dataset

AUTHOR             WORK                    TOKEN COUNT    STATUS
Athenaeus          Books 12-13                  45,584    submitted
Lysias             Orations 1, 14, 15            7,650    submitted
Polybius           Book 1                       28,288    submitted
Herodotus          Book 1                       32,879    editing
Plutarch           Lycurgus                     10,567    submitted
Antiphon           Oration 1                     2,015    editing
Diodorus Siculus   Book 11 [11.1-20 only]        6,247    in progress
Thucydides         Book 1 [1.1-80 only]         13,720    in progress
TOTAL [2/20/2015]                              146,950
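
The token column can be checked with a few lines of Python. The counts below are copied from the table; the per-status breakdown is only a convenience computed from them and does not appear on the slide.

```python
# Token counts copied from the "My Dataset" table (status as of 2/20/2015).
DATASET = [
    ("Athenaeus", "Books 12-13", 45_584, "submitted"),
    ("Lysias", "Orations 1, 14, 15", 7_650, "submitted"),
    ("Polybius", "Book 1", 28_288, "submitted"),
    ("Herodotus", "Book 1", 32_879, "editing"),
    ("Plutarch", "Lycurgus", 10_567, "submitted"),
    ("Antiphon", "Oration 1", 2_015, "editing"),
    ("Diodorus Siculus", "Book 11 [11.1-20 only]", 6_247, "in progress"),
    ("Thucydides", "Book 1 [1.1-80 only]", 13_720, "in progress"),
]

# Grand total across all works.
total = sum(tokens for _, _, tokens, _ in DATASET)

# Subtotal per annotation status, in order of first appearance.
by_status = {}
for _, _, tokens, status in DATASET:
    by_status[status] = by_status.get(status, 0) + tokens

print(f"{total:,}")  # 146,950
print(by_status)     # {'submitted': 92089, 'editing': 34894, 'in progress': 19967}
```

The computed total agrees with the 146,950 tokens reported on the slide.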

SLIDE 14
SLIDE 15

παρεσκευάζετο γὰρ πολλῇ δυνάμει πλεῖν ἐπὶ τὴν Ἑλλάδα καὶ συμμαχεῖν τοῖς Ἕλλησι κατὰ τῶν Περσῶν.

“He was preparing to sail to Greece with a great force and to fight with the Greeks against the Persians.”

(Diodorus 11.26.4 [sent. 58])

SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20

Color coding

SLIDE 21
SLIDE 22
SLIDE 23
SLIDE 24

Prague tagset

SLIDE 25
SLIDE 26
SLIDE 27
  • Thuc. 1.13.4 [elision]

SLIDE 28

A flat tree: Thuc. 1.9.2 [135 words]

SLIDE 29

A deep tree: Athen. 12.11 [82 words]
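
One way to make “flat” versus “deep” concrete is to count how many arcs separate each word from the root and take the maximum. This is a minimal sketch, assuming a sentence is stored as a list of head indices (1-based word ids, with 0 for attachment to the root); the two six-word toy sentences are hypothetical and are not the Thucydides or Athenaeus passages.

```python
from typing import List

def depths(heads: List[int]) -> List[int]:
    """Depth of each word: number of arcs on its path up to the root (id 0).

    heads[i] is the head of word i + 1; 0 marks attachment to the root.
    """
    result = []
    for i in range(1, len(heads) + 1):
        depth, node, seen = 0, i, set()
        while node != 0:
            if node in seen:  # guard against malformed (cyclic) annotation
                raise ValueError(f"cycle through word {node}")
            seen.add(node)
            node = heads[node - 1]
            depth += 1
        result.append(depth)
    return result

def tree_depth(heads: List[int]) -> int:
    """Depth of the deepest word in the sentence."""
    return max(depths(heads))

# Hypothetical sentences: the first attaches every word to word 1,
# the second nests each word under the one before it.
flat = [0, 1, 1, 1, 1, 1]
deep = [0, 1, 2, 3, 4, 5]
print(tree_depth(flat), tree_depth(deep))  # 2 6
```

Because depth is not tied to length, a long sentence can still be shallow, which is the contrast the two example trees draw.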

SLIDE 30
SLIDE 31
SLIDE 32

For each word in the AGDT (Ancient Greek Dependency Treebank) we have:

  • Dependency (the word’s parent and children)
  • Syntactic relation (grammatical label for the dependency)
  • Lemma
  • Morphology
  • Position in sentence
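
The five kinds of information listed above map naturally onto a small record type. This is a sketch under stated assumptions: positions are 1-based, a head of 0 means the root, and the `Word` class and `children_of` helper are illustrative names, not part of the AGDT itself. Only the parent is stored; the children are derived from it, so the two directions cannot drift out of sync.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Word:
    position: int    # 1-based position in the sentence (doubles as the word id)
    form: str
    lemma: str
    morphology: str  # morphological tag; left empty in the toy example below
    relation: str    # syntactic label for the arc to the head
    head: int        # position of the parent; 0 = root

def children_of(words: List[Word]) -> Dict[int, List[int]]:
    """Invert the parent pointers: map each position (and the root, 0) to its children."""
    kids: Dict[int, List[int]] = {0: []}
    kids.update({w.position: [] for w in words})
    for w in words:
        kids[w.head].append(w.position)
    return kids

# The "Mary had a little lamb" example from these slides, heads read off its labels.
sentence = [
    Word(1, "Mary", "Mary", "", "SBJ", 2),
    Word(2, "had", "have", "", "PRED", 0),
    Word(3, "a", "a", "", "ATR", 5),
    Word(4, "little", "little", "", "ATR", 5),
    Word(5, "lamb", "lamb", "", "OBJ", 2),
]
print(children_of(sentence))  # {0: [2], 1: [], 2: [1, 5], 3: [], 4: [], 5: [3, 4]}
```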

SLIDE 33
SLIDE 34

Dependency Degree

Linear vs. hubby structure
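
A minimal sketch of dependency degree, taken here (as an assumption, since the slide gives only the heading) to mean the number of direct dependents each word has. A chain-like “linear” tree keeps every degree at 0 or 1, while a “hubby” tree concentrates many dependents on a few words. Head lists use 1-based ids with 0 for the root, and both toy trees are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def degrees(heads: List[int]) -> List[int]:
    """Number of direct dependents of each word (word ids 1..n)."""
    counts = Counter(h for h in heads if h != 0)
    return [counts.get(i, 0) for i in range(1, len(heads) + 1)]

def summarize(heads: List[int]) -> Dict[str, float]:
    """Maximum and mean degree for one sentence."""
    d = degrees(heads)
    return {"max": max(d), "mean": sum(d) / len(d)}

linear = [0, 1, 2, 3, 4, 5]  # each word depends on the one before it
hubby = [0, 1, 1, 1, 1, 1]   # one word governs all the others
print(summarize(linear))  # {'max': 1, 'mean': 0.8333333333333334}
print(summarize(hubby))   # {'max': 5, 'mean': 0.8333333333333334}
```

Note that the mean is the same for both shapes: a single-rooted tree of n words always has n - 1 word-to-word arcs, so the difference between linear and hubby structure shows up in the maximum or the spread of degrees rather than in the average.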

SLIDE 35
SLIDE 36
SLIDE 37

Mary: SBJ-PRED-ROOT
had: PRED-ROOT
a: ATR-OBJ-PRED-ROOT
little: ATR-OBJ-PRED-ROOT
lamb: OBJ-PRED-ROOT
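
These labels can be generated mechanically: start from a word’s own relation, append the relation of each ancestor until the root is reached, and finish with ROOT. This is a minimal sketch, assuming 1-based head indices with 0 for the root and a well-formed (acyclic) tree; the head values are read off the labels above (Mary and lamb depend on had; a and little depend on lamb), and `path_label` is a hypothetical helper name.

```python
from typing import List

def path_label(i: int, heads: List[int], relations: List[str]) -> str:
    """Join the relations from word i (1-based) up to the root, ending in ROOT."""
    labels = []
    node = i
    while node != 0:
        labels.append(relations[node - 1])
        node = heads[node - 1]
    labels.append("ROOT")
    return "-".join(labels)

forms = ["Mary", "had", "a", "little", "lamb"]
heads = [2, 0, 5, 5, 2]
relations = ["SBJ", "PRED", "ATR", "ATR", "OBJ"]
for i, form in enumerate(forms, start=1):
    print(f"{form}: {path_label(i, heads, relations)}")
```

Run as written, this prints exactly the five lines shown on the slide, from "Mary: SBJ-PRED-ROOT" to "lamb: OBJ-PRED-ROOT".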

SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41

Ὦ τοῦ στρατηγήσαντος ἐν Τροίᾳ ποτὲ / Ἀγαμέμνονος παῖ "O child of Agamemnon, who once led the army at Troy" (Sophocles, Electra 1-2)

SLIDE 42

Ὦ τοῦ στρατηγήσαντος ἐν Τροίᾳ ποτὲ / Ἀγαμέμνονος παῖ "O child of Agamemnon, who once led the army at Troy" (Sophocles, Electra 1-2)

SLIDE 43
SLIDE 44
SLIDE 45
SLIDE 46
SLIDE 47
SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51

Burrows Delta
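
For reference, here is a minimal sketch of Burrows’s Delta in its classic form: take the relative frequency of each feature in each text, convert it to a z-score across the candidate texts, and average the absolute z-score differences between the disputed text and each candidate. The slides do not show which features or which Delta variant the authors used; classic Delta uses the most frequent words, but the same arithmetic applies to syntactic labels. The use of the population standard deviation, the function name `delta`, and the toy frequencies are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List

Profile = Dict[str, float]  # feature -> relative frequency in one text

def delta(test: Profile, candidates: Dict[str, Profile],
          features: List[str]) -> Dict[str, float]:
    """Burrows's Delta of `test` against each candidate (lower = closer)."""
    # Mean and population standard deviation of each feature across the candidates.
    mu = {f: mean([p.get(f, 0.0) for p in candidates.values()]) for f in features}
    sd = {f: pstdev([p.get(f, 0.0) for p in candidates.values()]) for f in features}
    usable = [f for f in features if sd[f] > 0]  # a constant feature has no z-score

    def z(p: Profile, f: str) -> float:
        return (p.get(f, 0.0) - mu[f]) / sd[f]

    # Mean absolute difference of z-scores over the usable features.
    return {name: mean([abs(z(test, f) - z(p, f)) for f in usable])
            for name, p in candidates.items()}

# Hypothetical relative frequencies of two features in three candidate texts.
candidates = {
    "A": {"x": 0.10, "y": 0.30},
    "B": {"x": 0.20, "y": 0.10},
    "C": {"x": 0.30, "y": 0.20},
}
test = {"x": 0.12, "y": 0.28}
scores = delta(test, candidates, ["x", "y"])
print(min(scores, key=scores.get))  # A
```

The candidate with the smallest Delta is the closest stylistic match; in practice many features are used, and the score is only meaningful relative to the other candidates.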

SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55

Craig’s Zeta

[Diagram: segments of corpus 1, each labeled “Hit”]

  • Divide corpus 1 into segments of equal size (size = n).
  • Segments with at least one example of the given feature are hits.
  • Each hit is worth 1 point.
  • Hits/segments = preferred feature score

[Diagram: segments of corpus 2, each labeled “Miss”]

  • Divide corpus 2 into segments of size n.
  • Segments with no examples of the feature are misses.
  • Each miss is worth -1 point.
  • Misses/segments = avoided feature score
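
The hit and miss counting above can be sketched in a few lines. The slide does not show how the preferred and avoided scores are combined; this sketch adds them, following Craig’s usual formulation (scores from 0 to 2), and does not apply the slide’s “-1 point” sign convention, so treat that step as an assumption. Dropping a leftover tail shorter than n, the function names, and the toy corpora are also assumptions; a feature can be any token-level label, such as a word or a path label.

```python
from typing import List, Sequence

def segments(tokens: Sequence[str], n: int) -> List[Sequence[str]]:
    """Cut a token sequence into consecutive segments of exactly n tokens.

    Assumption: a leftover tail shorter than n is dropped so that all segments are equal.
    """
    return [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]

def zeta(feature: str, corpus1: Sequence[str], corpus2: Sequence[str], n: int) -> float:
    segs1, segs2 = segments(corpus1, n), segments(corpus2, n)
    if not segs1 or not segs2:
        raise ValueError("each corpus needs at least one full segment of size n")
    hits = sum(1 for s in segs1 if feature in s)        # corpus-1 segments using the feature
    misses = sum(1 for s in segs2 if feature not in s)  # corpus-2 segments lacking it
    preferred = hits / len(segs1)
    avoided = misses / len(segs2)
    # Assumption: combine by addition, as in Craig's usual formulation (range 0 to 2).
    return preferred + avoided

corpus1 = "a b a c a b a d".split()
corpus2 = "b c b d c b d c".split()
print(zeta("a", corpus1, corpus2, 2))  # 2.0
print(zeta("b", corpus1, corpus2, 2))  # 0.75
```

A score near 2 marks a feature that corpus 1 consistently uses and corpus 2 consistently avoids; a feature common to both corpora scores lower.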

SLIDE 56
SLIDE 57

Thucydides


SLIDE 58
SLIDE 59
SLIDE 60
SLIDE 61
SLIDE 62
SLIDE 63
SLIDE 64

Herodotus

SLIDE 65
SLIDE 66
SLIDE 67

Polybius

SLIDE 68
SLIDE 69
SLIDE 70

Homer

SLIDE 71
SLIDE 72
SLIDE 73
SLIDE 74

Maciej Eder

SLIDE 75
SLIDE 76
SLIDE 77
SLIDE 78

What Next?

  • Test! Test! Test!
  • Cast the net as widely as possible:
  • Many flavors of sWord
  • With POS, with Dependency Distance …
  • N-grams
  • Many computational approaches
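
The “flavors” listed above can be sketched as variations on a single feature-extraction step. This minimal sketch assumes, without confirmation from the slides, that an sWord resembles the path-to-root labels shown earlier for “Mary had a little lamb”; the POS-prefixed and distance-suffixed variants, the function names, and the coarse POS tags are hypothetical illustrations. Dependency distance is taken here as the absolute difference between a word’s position and its head’s position.

```python
from typing import List, Optional, Tuple

def path_to_root(i: int, heads: List[int], rels: List[str]) -> str:
    """Relation of word i (1-based) followed by each ancestor's relation, then ROOT."""
    labels, node = [], i
    while node != 0:
        labels.append(rels[node - 1])
        node = heads[node - 1]
    return "-".join(labels + ["ROOT"])

def sword(i: int, heads: List[int], rels: List[str],
          pos: Optional[List[str]] = None, with_distance: bool = False) -> str:
    """Hypothetical variants: plain path, optionally prefixed with a POS tag
    and/or suffixed with the dependency distance |position - head position|."""
    label = path_to_root(i, heads, rels)
    if pos is not None:
        label = f"{pos[i - 1]}:{label}"
    if with_distance and heads[i - 1] != 0:  # the root word has no head to measure to
        label += f"@{abs(i - heads[i - 1])}"
    return label

def ngrams(items: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams over a sentence's sequence of labels."""
    return [tuple(items[k:k + n]) for k in range(len(items) - n + 1)]

# "Mary had a little lamb": heads read off the path labels; POS tags are illustrative.
heads = [2, 0, 5, 5, 2]
rels = ["SBJ", "PRED", "ATR", "ATR", "OBJ"]
pos = ["noun", "verb", "det", "adj", "noun"]

plain = [sword(i, heads, rels) for i in range(1, 6)]
print(ngrams(plain, 2)[0])                        # ('SBJ-PRED-ROOT', 'PRED-ROOT')
print(sword(3, heads, rels, pos=pos))             # det:ATR-OBJ-PRED-ROOT
print(sword(5, heads, rels, with_distance=True))  # OBJ-PRED-ROOT@3
```

Each variant yields a different vocabulary of features, which can then be fed unchanged into Delta, Zeta, or any other frequency-based method.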

SLIDE 79

What next?

  • Test! Test! Test!
  • Aim directly at the research question
  • Athenaeus and fragments
  • Are fragments of a single author distinguishable according to their transmitting source?


SLIDE 80

What’s needed?

  • Trees! Trees! Trees!
  • Metadata
  • Digital Athenaeus
  • Digital Fragmenta Historicorum Graecorum

  • Scalable workflow
  • Stable identification for each token

SLIDE 81

The Vision Thing

  • Treebanker’s Utopia
  • Real-time feedback for annotators
  • Is this syntactic structure feasible?
  • Is this structure prone to inter-annotator disagreement?
  • Philologist’s Elysium
  • Real-time feedback for close readers
  • How does this text compare to others:
  • Lexically, syntactically, semantically?
  • Pragmatically, acoustically, etc.?

SLIDE 82
  • Leipzig Open Philology Project
  • Digital Athenaeus Project
  • Perseus and Perseids Projects, Tufts University
  • Perseus Open Publication Series
  • University of Nebraska-Lincoln
  • Dept. of History
  • Dept. of Classics and Religious Studies

SLIDE 83