SLIDE 1

Deep Learning and Computational Authorship Attribution for Ancient Greek Texts

The Case of the Attic Orators

Mike Kestemont, Francesco Mambrini & Marco Passarotti
Digital Classicist Seminar, Berlin, Germany, 16 February 2016

SLIDE 2

A “Golden Age” of oratory

Athens, 5th-4th centuries BCE

SLIDE 3

Even today! (2003)

Our Constitution ... is called a democracy because power is in the hands not of a minority but of the greatest number. Thucydides II, 37

SLIDE 4

Oratory

  • Public speech:
    • court
    • parliament
    • ceremonies
  • Often professional speech writers (‘orators’ vs. logographoi)
  • Hired by a 3rd party (who might deliver the speech themselves)
  • Often survived in memory first, later in written tradition

SLIDE 5

A canon of 10 names

Orator        Dates (BCE)    Speeches
Antiphon      ca 480-411     6*
Andocides     ca 440-390     4*
Lysias        ca 445-380     35*
Isocrates     436-338        21
Isaeus        ca 420-350     12
Demosthenes   384-321        61*
Aeschines     ca 390-322     3
Hyperides     ca 390-322     6
Lycurgus      ca 390-324     1
Dinarchus     ca 360-290     3

SLIDE 6

Lysias Demosthenes

Multiple genres, authenticity issues, professional writers

SLIDE 7

Why the orators?

  • Large corpus (600K+ words)
  • Homogeneous chronology, genre and dialect
  • Different personalities
  • Interesting problems of authorship, effect of:
    • genre
    • patron
    • authenticity
SLIDE 8

Computational Authorship Attribution

  • Stylometry
  • How a text is written
  • Fingerprint
  • Stylome
  • Stylistic DNA
  • (Tendentious)
SLIDE 9
  • Young paradigm (1960s)
  • Mosteller & Wallace (US)
  • Federalist Papers (1780s)
  • Innovation on 2 levels:
    • Quantitative approach
    • Function words
SLIDE 10
Traditional:
  • Guesswork
  • Conspicuous features
    • odd verbs
    • checklist
  • But...
    • schools, workshops, ...
    • tradition
    • forgeries, imitation, ...
    • (Attic orators: a rich tradition!)

Mosteller & Wallace:
  • Inconspicuous features
  • Function words
    • articles (the, it, a)
    • prepositions (on, from, to)
    • pronouns (self, he)

SLIDE 11

Advantage?

  • Many observations
  • All authors, same set
  • Relatively content-independent

SLIDE 12

Count the number of f’s on the following slide...

SLIDE 13

Finished files are the result of years of scientific study combined with the experience of many years.
SLIDE 14

How many?

SLIDE 15

Finished files are the result of years of scientific study combined with the experience of many years.

Do we process function words ‘subconsciously’?

SLIDE 16

Which text is on the following slide?

SLIDE 17
SLIDE 18

So?

SLIDE 19

Difficult to spot errors...

SLIDE 20

Unimportant?

SLIDE 21

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

SLIDE 22

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

SLIDE 23

“Functors”

  • Function words = ‘grammatical morphemes’
  • = “functors” in psycholinguistics
  • In English often individual words
  • In more inflected languages: often affixes
  • Easy (naive?) solution: n-grams
SLIDE 24

N-grams

  • Intuitive concept: slices of length n
  • bigrams (n=2): ‘_b’, ‘bi’, ‘ig’, ‘gr’, ‘ra’, ‘am’, ‘ms’, ‘s_’
  • Originally used in language identification
  • So far, best feature in authorship attribution
  • Sensitive to morphemic information (e.g. ‘s_’)
  • ‘Functional’ n-grams are best (incl. punctuation)
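As a small illustration (not from the slides), character n-grams can be extracted in a few lines of Python; the `_` marks word boundaries, exactly as in the bigram example above.

```python
# Minimal character n-gram extraction (illustration only, not the authors' code).
def char_ngrams(text, n=2):
    text = "_" + text.replace(" ", "_") + "_"          # mark word boundaries with '_'
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("bigrams", n=2))
# ['_b', 'bi', 'ig', 'gr', 'ra', 'am', 'ms', 's_']
```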
SLIDE 25

Character tetragrams (top 100):

_αὐτ - _γὰρ - _δʼ_ - _δὲ_ - _εἰς - _κατ - _καὶ - _μὲν - _μὴ_ - _οὐ_ - _οὐδ - _οὐκ - _παρ - _περ - _πολ - _προ - _πρὸ - _πόλ - _ταῦ - _τού - _τοὺ - _τοῖ - _τοῦ - _τὰ_ - _τὴν - _τὸ_ - _τὸν - _τῆς - _τῶν - _τῷ_ - _ἀλλ - _ἀπο - _ἂν_ - _ἐν_ - _ἐπι - _ὡς_ - ίαν_ - ίας_ - αι_τ - αὐτο - αὶ_π - αὶ_τ - γὰρ_ - δὲ_τ - ειν_ - ερὶ_ - εἰς_ - εῖν_ - θαι_ - ι_κα - ι_το - καὶ_ - μένο - μενο - μὲν_ - ν_αὐ - ν_εἰ - ν_κα - ν_οὐ - ν_πρ - ν_το - ν_ἐπ - ναι_ - νον_ - νος_ - ντα_ - ντας - ντες - ντων - νων_ - οις_ - ους_ - οὐκ_ - οὺς_ - οῖς_ - οῦτο - περὶ - πρὸς - ρὸς_ - ς_κα - ς_οὐ - ς_το - σθαι - σιν_ - ται_ - τας_ - τες_ - τον_ - τος_ - τούτ - τοὺς - τοῖς - τοῦ_ - των_ - τὰς_ - τὴν_ - τὸν_ - τῆς_ - τῶν_ - ὶ_το

SLIDE 26

Advances, but many challenges

  • (Large) benchmark datasets (cf. PAN)
  • Cross-genre attribution (cf. suicide notes)
  • Document length (cf. tweets)
  • Separating content from style:
  • Function words work well for long texts
  • Mine stylistic information from content words too
SLIDE 27

Artificial Intelligence (AI)

Reproduce human intelligence in software

SLIDE 28

Machine Learning

  • “Learning” is a central component of human intelligence
  • Optimise behaviour, anticipating the future
  • All applications: map input to output
  • Huge advances recently, via Deep Learning, a specific paradigm [LeCun et al. 2015]

SLIDE 29

Deep Learning paradigm

Layered neural networks

SLIDE 30

‘Shallow’ versus ‘Deep’

SLIDE 31

Computer Vision

Importance of layers

SLIDE 32

Low-level features

Used to be ‘handcrafted’!

SLIDE 33

Higher-level features

SLIDE 34

Analogies with the human brain

e.g. [Cadieu et al. 2014]

SLIDE 35

Representation Learning

  • More ‘objective’ name
  • Networks learn to represent data
  • To large extent autonomously
  • (As opposed to ‘handcrafting’)
SLIDE 36

Cat paper

10 million 200x200 images from YouTube (1 week)

[Le et al. 2012]

SLIDE 37

Cat paper (2)

[Le et al. 2012]

SLIDE 38

How does it work?

Chancellor elections

[Diagram: three chancellor candidates (C1-C3) and five faculties (F1-F5)]

SLIDE 39

Every faculty gets a vote


SLIDE 40

Votes get weighed

Some faculties more important

[Diagram: the faculties' votes carry weights, e.g. .25, .25, .25, .10, .10, .10, .05, .05, .05]

SLIDE 41

A ‘dense’ layer


Dense layer
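To make the analogy concrete, the sketch below (all numbers made up, not from the slides) writes the weighted vote as a matrix product, which is exactly what a dense layer computes: each candidate's score is a weighted sum of the faculties' votes.

```python
# Toy illustration of the 'dense layer as weighted vote' analogy (made-up numbers).
import numpy as np

votes = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # F1..F5: which faculties voted
weights = np.array([[.25, .10, .05],          # one weight per (faculty, candidate) edge
                    [.25, .10, .05],
                    [.25, .10, .05],
                    [.10, .25, .05],
                    [.05, .10, .25]])

scores = votes @ weights                      # a "dense" layer: weighted sum per candidate
print(scores, "-> winner: C%d" % (scores.argmax() + 1))
```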

SLIDE 42

We add layers of ‘representation’

(Student union, professors, … get different weight too)

[Diagram: an intermediate layer of ‘representation’ (student union, professors, department, library) inserted between the faculties F1-F5 and the candidates C1-C3]

Different sensitivities at different layers (students like free beer, librarians like free books, …)

SLIDE 43

Learning = optimising weights

(“Lobbying” for a certain candidate)


SLIDE 44

Neural architecture (3 layers)

[Diagram: input features → dense layer → highway layer → dense layer with softmax → ten authors]
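A minimal sketch of such a network (not the authors' code; the hidden size, activations, and the use of Keras are assumptions for illustration). The highway layer uses a sigmoid gate to mix a transformed signal with an untransformed "carry" of its input.

```python
# Sketch of a dense -> highway -> dense+softmax classifier over the 10 orators.
from tensorflow.keras import layers, Model

n_features = 2000   # e.g. 2,000 most frequent character tetragrams
n_authors = 10      # the canon of ten orators
hidden = 128        # hypothetical hidden size

inputs = layers.Input(shape=(n_features,))
h = layers.Dense(hidden, activation="relu")(inputs)

# Highway layer: gate T decides how much transformed signal H passes through
# versus how much of the untransformed input h is carried along.
H = layers.Dense(hidden, activation="relu")(h)
T = layers.Dense(hidden, activation="sigmoid")(h)
carry = layers.Lambda(lambda t: 1.0 - t)(T)
highway = layers.Add()([layers.Multiply()([H, T]), layers.Multiply()([h, carry])])

outputs = layers.Dense(n_authors, activation="softmax")(highway)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```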

SLIDE 45

Networks uncommon in stylometry

(Data size?)

  • Burrows’s Delta: nearest neighbour, intuitive
  • Support Vector Machine: discriminative margin
  • Neural networks: ‘black magic’
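For readers unfamiliar with it, Burrows's Delta boils down to a nearest-neighbour classifier over z-scored feature frequencies; a minimal sketch (assumed input arrays, not the authors' implementation):

```python
# Burrows's Delta as a nearest-neighbour classifier over z-scored frequencies.
# Assumed inputs: train_freqs (texts x features, relative frequencies),
# train_labels (one author per training text), test_freq (one text's frequencies).
import numpy as np

def burrows_delta(train_freqs, train_labels, test_freq):
    mu = train_freqs.mean(axis=0)
    sigma = train_freqs.std(axis=0) + 1e-12            # avoid division by zero
    z_train = (train_freqs - mu) / sigma
    z_test = (test_freq - mu) / sigma
    deltas = np.abs(z_train - z_test).mean(axis=1)     # mean absolute z-score difference
    return train_labels[int(np.argmin(deltas))]        # closest training text's author
```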

SLIDE 46

Document-Feature matrix

‘Bag of words’ model

[Matrix: rows = texts (Dem1, Dem2, Dem3, Lyc1, Lyc2, …, Ant1, Ant2), columns = character tetragrams (_αἰσ, _βασ, γενέ, ημέν, εσθα, ες_ἐ, ναῖο, ν_ἀφ, …)]

E.g. 2,000 columns (number of most frequent items, MFI)
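A minimal sketch of building such a document-feature matrix with scikit-learn (the toy `texts` and `authors` lists, the `char_wb` analyzer and the other settings are illustrative assumptions, not the authors' pipeline):

```python
# 'Bag of character tetragrams' document-feature matrix (illustrative settings).
from sklearn.feature_extraction.text import CountVectorizer

texts = ["first speech as a plain string", "second speech", "third speech"]  # toy stand-ins
authors = ["Dem", "Dem", "Lys"]                                              # toy labels

vectorizer = CountVectorizer(analyzer="char_wb",   # character n-grams, padded at word edges
                             ngram_range=(4, 4),   # tetragrams
                             max_features=2000,    # keep the 2,000 most frequent items (MFI)
                             lowercase=False)
X = vectorizer.fit_transform(texts)                # rows = documents, columns = tetragram counts
print(X.shape)
print(vectorizer.get_feature_names_out()[:10])
```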

SLIDE 47

Experiment

  • Leave-one-text-out attribution
  • Non-disputed texts only
  • (But class imbalance…)
  • Evaluation: Accuracy, F1 (weighted), F1 (macro-averaged); sketched below
  • Different features + MFI
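A sketch of this evaluation setup with scikit-learn (assuming a document-feature matrix X and author labels y as in the earlier bag-of-words sketch; the linear SVM and its settings are illustrative only, not necessarily the authors' classifier):

```python
# Leave-one-text-out attribution with a linear SVM (illustrative settings only).
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

def evaluate_loo(X, y):
    # Each text is held out once and attributed by a model trained on the rest.
    preds = cross_val_predict(LinearSVC(), X, y, cv=LeaveOneOut())
    return {"accuracy": accuracy_score(y, preds),
            "F1 (weighted)": f1_score(y, preds, average="weighted"),
            "F1 (macro)": f1_score(y, preds, average="macro")}
```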
SLIDE 48

Words:

           200                    2,000                  20,000
        Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)
Delta   76.04  75.50  50.22    60.00  60.70  34.66    23.20  29.83  15.57
SVM     83.20  81.06  53.21    81.97  77.57  45.27    63.45  52.04  19.01
Net     80.74  79.30  49.68    85.67  83.83  55.60    83.95  81.52  55.92

Character tetragrams:

           200                    2,000                  20,000
        Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)    Acc    F1(w)  F1(m)
Delta   76.04  74.33  46.10    82.22  80.76  59.46    50.86  51.03  41.73
SVM     79.75  77.69  48.54    84.44  81.42  54.46    78.02  72.37  39.15
Net     79.50  78.37  48.16    85.92  84.29  60.41    84.69  81.38  46.38

(Columns: number of most frequent items (MFI) = 200 / 2,000 / 20,000; Acc = accuracy, F1(w) = weighted F1, F1(m) = macro-averaged F1.)

SLIDE 49

Results

  • Net produces single highest score
  • But mostly on par with SVM
  • (Delta does surprisingly well, but never best)
  • Net: impressive robustness to large input space
SLIDE 50

Visualization (PCA)
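Plots like these can be reproduced along the following lines (a sketch assuming the X and y from the earlier bag-of-words sketch; the scaling and plotting settings are illustrative, not the authors' exact figures):

```python
# PCA scatter plot of the documents, coloured by author (illustration only).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def plot_pca(X, y):
    # X is assumed to be a scipy sparse doc-feature matrix, y the author labels.
    coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X.toarray()))
    for author in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == author]
        plt.scatter(coords[idx, 0], coords[idx, 1], label=author, s=15)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.legend(fontsize=7)
    plt.show()
```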

SLIDE 51

Visualization (PCA)

SLIDE 52

Visualization (Net)

SLIDE 53

Visualization (Net)

SLIDE 54

Features

SLIDE 55

Features

SLIDE 56

Features

SLIDE 57

[Figures: disputed speeches Dem. 7, Dem. 58, Dem. 60, Dem. 61]

SLIDE 58

Deep Learning and Computational Authorship Attribution for Ancient Greek Texts

The Case of the Attic Orators

Mike Kestemont, Francesco Mambrini & Marco Passarotti
Digital Classicist Seminar, Berlin, Germany, 16 February 2016

Thank you!