 
Deep Learning and Computational Authorship Attribution for Ancient Greek Texts: The Case of the Attic Orators
Mike Kestemont, Francesco Mambrini & Marco Passarotti
Digital Classicist Seminar, Berlin, Germany, 16 February 2016
A “Golden Age” of oratory: Athens, 5th-4th centuries BCE
Even today! (2003) Our Constitution ... is called a democracy because power is in the hands not of a minority but of the greatest number. Thucydides II, 37
Oratory
• Public speech:
  • court
  • parliament
  • ceremonies
• Often professional speech writers (‘orators’ vs. logographoi)
• Hired by a third party (who might deliver the speech themselves)
• Often survived in memory first, in written tradition only later
A canon of 10 names
Orator        Dates (BCE)    Speeches
Antiphon      ca 480-411     6*
Andocides     ca 440-390     4*
Lysias        ca 445-380     35*
Isocrates     436-338        21
Isaeus        ca 420-350     12
Demosthenes   384-321        61*
Aeschines     ca 390-322     3
Hyperides     ca 390-322     6
Lycurgus      ca 390-324     1
Dinarchus     ca 360-290     3
Demosthenes, Lysias: multiple genres, authenticity issues, professional writers
Why the orators?
• large corpus (600K+ words)
• homogeneous chronology, genre and dialect
• different personalities
• interesting problems of authorship, effect of:
  • genre
  • patron
  • authenticity
Computational Authorship Attribution
• Stylometry
• How a text is written
  • Fingerprint
  • Stylome
  • Stylistic DNA
  • (Tendentious)
• Young paradigm (1960)
• Mosteller & Wallace (US)
• Federalist papers (1780s)
• Innovation on 2 levels:
  • Quantitative approach
  • Function words
Traditional
• Guesswork
• Conspicuous features
• Odd verbs
• Checklist
• But...
  • schools, workshops, ...
  • Tradition
  • Forgeries, imitation, ...
  • …
  • (Attic orators rich tradition!)
Mosteller & Wallace
• Inconspicuous features
• Function words
  • articles (the, it, a)
  • prepositions (on, from, to)
  • pronouns (self, he)
Advantage?
• Many observations
• All authors, same set
• Relatively content-independent
Count the number of f’s on the following slide...
Finished files are the result of years of scientific study combined with the experience of many years.
How many?
Do we process function words ‘subconsciously’? Finished files are the result of years of scientific study combined with the experience of many years.
Which text is on the following slide?
So?
Difficult to spot errors...
Unimportant?
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
“Functors”
• Function words = ‘grammatical morphemes’ = “functors” in psycholinguistics
• In English often individual words
• In more inflected languages: often affixes
• Easy (naive?) solution: n-grams
N-grams
• Intuitive concept: slices of length n
• bigrams (n=2): ‘_b’, ‘bi’, ‘ig’, ‘gr’, ‘ra’, ‘am’, ‘ms’, ‘s_’
• Originally used in language identification
• So far, best feature in authorship attribution
• Sensitive to morphemic information (e.g. ‘s_’)
• ‘Functional’ n-grams are best (incl. punctuation)
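The slicing idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' extraction code: the function name `char_ngrams` and the choice of `_` as the boundary marker are assumptions taken from the notation on the slide.

```python
def char_ngrams(text, n=2, pad="_"):
    """Return all character n-grams of `text`, marking word boundaries with `pad`."""
    # Replace runs of whitespace by the boundary marker and add one at each end
    padded = pad + pad.join(text.split()) + pad
    # Slide a window of length n over the padded string
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


bigrams = char_ngrams("bigrams", n=2)
print(bigrams)
```

For the word "bigrams" this yields the eight bigrams listed on the slide; setting `n=4` gives character tetragrams.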
character tetragrams (top 100) _ αὐτ - _ γὰρ - _ δʼ _ - _ δὲ _ - _ εἰς - _ κατ - _ καὶ - _ μὲν - _ μὴ _ - _ οὐ _ - _ οὐδ - _ οὐκ - _ παρ - _ περ - _ πολ - _ προ - _ πρὸ - _ πόλ - _ ταῦ - _ τού - _ τοὺ - _ τοῖ - _ τοῦ - _ τὰ _ - _ τὴν - _ τὸ _ - _ τὸν - _ τῆς - _ τῶν - _ τῷ _ - _ ἀλλ - _ ἀπο - _ ἂν _ - _ ἐν _ - _ ἐπι - _ ὡς _ - ίαν _ - ίας _ - αι _ τ - αὐτο - αὶ _ π - αὶ _ τ - γὰρ _ - δὲ _ τ - ειν _ - ερὶ _ - εἰς _ - εῖν _ - θαι _ - ι _ κα - ι _ το - καὶ _ - μένο - μενο - μὲν _ - ν _ αὐ - ν _ εἰ - ν _ κα - ν _ οὐ - ν _ πρ - ν _ το - ν _ ἐπ - ναι _ - νον _ - νος _ - ντα _ - ντας - ντες - ντων - νων _ - οις _ - ους _ - οὐκ _ - οὺς _ - οῖς _ - οῦτο - περὶ - πρὸς - ρὸς _ - ς _ κα - ς _ οὐ - ς _ το - σθαι - σιν _ - ται _ - τας _ - τες _ - τον _ - τος _ - τούτ - τοὺς - τοῖς - τοῦ _ - των _ - τὰς _ - τὴν _ - τὸν _ - τῆς _ - τῶν _ - ὶ _ το
Advances, but many challenges
• (Large) benchmark datasets (cf. PAN)
• Cross-genre attribution (cf. suicide notes)
• Document length (cf. tweets)
• Separating content from style:
  • Function words work well for long texts
  • Mine stylistic information from content words too
Artificial Intelligence (AI) Reproduce human intelligence in software
Machine Learning
• “Learning” is a central component of human intelligence
• Optimise behaviour, anticipating the future
• All applications: map input to output
• Huge advances recently, via Deep Learning, a specific paradigm [LeCun et al. 2015]
Deep Learning paradigm Layered neural networks
‘Shallow’ versus ‘Deep’
Computer Vision Importance of layers
Low-level features Used to be ‘handcrafted’!
Higher-level features
Analogies with the human brain, e.g. [Cadieu et al. 2014]
Representation Learning
• More ‘objective’ name
• Networks learn to represent data
• To a large extent autonomously
• (As opposed to ‘handcrafting’)
Cat paper: 10 million 200x200 images from YouTube (1 week) [Le et al. 2012]
Cat paper (2) [Le et al. 2012]
How does it work? Chancellor elections
[Diagram: candidates C1, C2, C3; faculties F1, F2, F3, F4, F5]
Every faculty gets a vote
[Diagram: candidates C1, C2, C3, …; faculties F1, F2, F3, F4, F5]
Votes get weighted: some faculties are more important
[Diagram: candidates C1, C2, C3; faculties F1, F2, F3, F4, F5; connection weights such as .25, .10, .05]
A ‘dense’ layer
[Diagram: a dense layer between faculties F1, F2, F3, F4, F5 and candidates C1, C2, C3]
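The election analogy maps directly onto the arithmetic of a dense layer: every candidate's score is a weighted sum of all faculty votes. The sketch below is only an illustration of that idea; the vote and weight values are hypothetical (they merely echo the magnitudes shown on the slides), and the helper name `dense` is an assumption.

```python
def dense(inputs, weights, biases):
    """One dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical votes of the five faculties F1..F5 (1.0 = in favour, 0.0 = abstains)
faculty_votes = [1.0, 0.0, 1.0, 0.0, 1.0]

# One row of weights per candidate C1..C3; hypothetical values
weights = [
    [0.25, 0.10, 0.10, 0.25, 0.10],
    [0.05, 0.25, 0.05, 0.05, 0.25],
    [0.10, 0.10, 0.25, 0.10, 0.05],
]
biases = [0.0, 0.0, 0.0]

scores = dense(faculty_votes, weights, biases)
winner = scores.index(max(scores))  # index 0 stands for candidate C1
print(scores, winner)
```

Because every input is connected to every output, the layer is called "dense"; changing a single weight changes how much one faculty counts for one candidate.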
We add layers of ‘representation’
• Student union, professors, … get different weights too
• Different sensitivities at different layers (students like free beers, librarians like free books, …)
[Diagram: candidates C1, C2, C3; intermediate layer with student union, professors, dept. library, …; faculties F1, F2, F3, F4, F5]
Learning = optimising weights
(“Lobbying” for a certain candidate)
[Diagram: candidates C1, C2, C3; faculties F1, F2, F3, F4, F5; connection weights such as .25, .10, .05]
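What "optimising weights" means can be shown with a single output unit trained by gradient descent. This is a deliberately small sketch under stated assumptions: a squared-error loss, a learning rate of 0.1, and hypothetical input and weight values; the talk does not specify the optimiser actually used.

```python
def predict(weights, x):
    """Weighted sum of the inputs (a single unit of a dense layer)."""
    return sum(w * xi for w, xi in zip(weights, x))


def sgd_step(weights, x, target, lr=0.1):
    """One gradient-descent step on the loss 0.5 * (prediction - target) ** 2."""
    error = predict(weights, x) - target
    # The gradient with respect to each weight is error * input
    return [w - lr * error * xi for w, xi in zip(weights, x)]


weights = [0.25, 0.10, 0.10, 0.25, 0.10]  # hypothetical starting weights
x = [1.0, 0.0, 1.0, 0.0, 1.0]             # hypothetical faculty votes
target = 1.0                               # the candidate we "lobby" for

losses = []
for _ in range(20):
    losses.append(0.5 * (predict(weights, x) - target) ** 2)
    weights = sgd_step(weights, x, target)

print(losses[0], losses[-1])
```

Only the weights of faculties that actually voted (non-zero input) are adjusted, and the loss shrinks at every step.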
Neural architecture (3 layers)
[Diagram, from input to output: input features → dense layer → highway layer → dense layer → softmax → ten authors]
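A forward pass through such a stack might look as follows. This is a sketch under assumptions, not the authors' implementation: the layer sizes, the ReLU activations, the random initial weights, and all function and parameter names are illustrative. The highway layer follows the usual formulation in which a sigmoid gate mixes a transformed signal with the unchanged input.

```python
import math
import random


def affine(x, layer):
    """Weighted sum of the inputs plus a bias, for every output unit."""
    W, b = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def forward(x, params):
    h = relu(affine(x, params["dense_in"]))
    # Highway layer: the gate decides, per unit, how much of the transformed
    # signal to use and how much of the input to carry through unchanged
    transformed = relu(affine(h, params["highway_transform"]))
    gate = sigmoid(affine(h, params["highway_gate"]))
    h = [g * t + (1.0 - g) * a for g, t, a in zip(gate, transformed, h)]
    # Final dense layer followed by softmax: one probability per author
    return softmax(affine(h, params["dense_out"]))


rng = random.Random(0)
n_features, n_hidden, n_authors = 200, 16, 10  # sizes are assumptions
params = {
    "dense_in": init_layer(rng, n_features, n_hidden),
    "highway_transform": init_layer(rng, n_hidden, n_hidden),
    "highway_gate": init_layer(rng, n_hidden, n_hidden),
    "dense_out": init_layer(rng, n_hidden, n_authors),
}
x = [rng.random() for _ in range(n_features)]  # stand-in for one document's features
probs = forward(x, params)
print(probs)
```

With untrained weights the ten probabilities are close to uniform; training would adjust the weights so that the probability of the correct author increases.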
Networks uncommon in stylometry (Data size?)
• Burrows’s Delta: nearest neighbour, intuitive
• Support Vector Machine: discriminative margin, ‘black magic’
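For comparison, the intuition behind Burrows's Delta fits in a few lines: standardise each feature across the corpus, then attribute a text to the author of its nearest neighbour under the mean absolute difference of z-scores. The sketch below is a minimal version with hypothetical frequencies and labels; the helper names and the use of the sample standard deviation are assumptions.

```python
import statistics


def zscore_columns(matrix):
    """Standardise every feature (column) across all documents."""
    columns = list(zip(*matrix))
    means = [statistics.mean(c) for c in columns]
    # Guard against constant columns, whose standard deviation is zero
    stdevs = [statistics.stdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in matrix]


def delta(a, b):
    """Mean absolute difference between two z-score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def nearest_author(freqs, labels, query):
    """Label of the document closest to document `query`, excluding itself."""
    z = zscore_columns(freqs)
    others = [i for i in range(len(z)) if i != query]
    best = min(others, key=lambda i: delta(z[query], z[i]))
    return labels[best]


# Hypothetical relative frequencies of three features in four texts
freqs = [
    [0.050, 0.020, 0.010],
    [0.052, 0.019, 0.011],
    [0.030, 0.035, 0.020],
    [0.031, 0.033, 0.022],
]
labels = ["Dem", "Dem", "Lys", "Lys"]

guess_first = nearest_author(freqs, labels, 0)
guess_last = nearest_author(freqs, labels, 3)
print(guess_first, guess_last)
```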
Document-Feature matrix (‘bag of words’ model)
• Columns: features, e.g. _ αἰσ, _ βασ, γενέ, ημέν, εσθα, ες _ ἐ, ναῖο, ν _ ἀφ
• Rows: documents, e.g. Dem1, Dem2, Dem3, Lyc1, …, Lyc2, …, Ant1, Ant2
• E.g. 2000 columns (# MFI)
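Building such a matrix can be sketched as follows: count the character tetragrams in every document, keep the most frequent ones across the corpus as columns, and store relative frequencies per document. The function names, the two English toy documents and the reading of "MFI" as the most frequent items are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=4, pad="_"):
    """Character n-grams with `pad` marking word boundaries."""
    padded = pad + pad.join(text.split()) + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def doc_feature_matrix(docs, n_features, n=4):
    """Return (vocabulary, matrix) with one row per document."""
    counts = [Counter(char_ngrams(text, n)) for text in docs]
    corpus_counts = Counter()
    for c in counts:
        corpus_counts.update(c)
    # Columns: the n_features most frequent n-grams in the whole corpus
    vocab = [gram for gram, _ in corpus_counts.most_common(n_features)]
    matrix = []
    for c in counts:
        size = sum(c.values()) or 1  # avoid dividing by zero for empty texts
        matrix.append([c[gram] / size for gram in vocab])
    return vocab, matrix


# Toy documents standing in for speeches
docs = ["the cat and the dog", "the bird and the fish"]
vocab, matrix = doc_feature_matrix(docs, n_features=5)
print(vocab)
print(matrix)
```

Each row is an unordered "bag" of features: the order in which the n-grams occurred in the text is discarded.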
Experiment
• Leave-one-text-out attribution
• Non-disputed texts only
• (But class imbalance…)
• Evaluation: Accuracy, F1 (weighted), F1 (macro-averaged)
• Different features + MFI
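The evaluation protocol can be sketched as follows: each text is held out in turn, attributed on the basis of all remaining texts, and the predictions are scored. The nearest-neighbour classifier and the one-dimensional toy data are hypothetical stand-ins for the real classifiers and features; the point is to show why macro-averaged F1 drops under class imbalance, since an author with a single text can never be attributed correctly.

```python
def nearest_neighbour(train_X, train_y, x):
    """Stand-in classifier: label of the closest training text (Manhattan distance)."""
    dists = [sum(abs(a - b) for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


def leave_one_out(X, y, classify):
    """Predict every text from a model built on all the other texts."""
    preds = []
    for i in range(len(X)):
        preds.append(classify(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds


def f1_scores(true, pred):
    """Return (macro-averaged F1, support-weighted F1)."""
    labels = sorted(set(true))
    per_class = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(true, pred))
        fp = sum(t != lab and p == lab for t, p in zip(true, pred))
        fn = sum(t == lab and p != lab for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        per_class[lab] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[lab] * true.count(lab) for lab in labels) / len(true)
    return macro, weighted


# Hypothetical, imbalanced toy corpus: the last author has only one text
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [2.0]]
y = ["Dem", "Dem", "Dem", "Lys", "Lys", "Lyc"]

preds = leave_one_out(X, y, nearest_neighbour)
accuracy = sum(t == p for t, p in zip(y, preds)) / len(y)
macro_f1, weighted_f1 = f1_scores(y, preds)
print(accuracy, weighted_f1, macro_f1)
```

In this toy run the single-text author is always misattributed, which leaves accuracy and weighted F1 fairly high but pulls the macro-averaged F1 down.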
Words
# MFI    200                    2,000                  20,000
         Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)
Delta    76.04  75.50  50.22    60.00  60.70  34.66    23.20  29.83  15.57
SVM      83.20  81.06  53.21    81.97  77.57  45.27    63.45  52.04  19.01
Net      80.74  79.30  49.68    85.67  83.83  55.60    83.95  81.52  55.92

Character tetragrams
# MFI    200                    2,000                  20,000
         Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)
Delta    76.04  74.33  46.10    82.22  80.76  59.46    50.86  51.03  41.73
SVM      79.75  77.69  48.54    84.44  81.42  54.46    78.02  72.37  39.15
Net      79.50  78.37  48.16    85.92  84.29  60.41    84.69  81.38  46.38
Results
• Net produces the single highest score
• But mostly on par with SVM
• (Delta does surprisingly well, but is never best)
• Net: impressive robustness to a large input space
Visualization (PCA)
Visualization (Net)
Features
[Figure labels: Dem 58, Dem 60, Dem 61, Dem 7]
Thank you!