The PROIEL parallel corpus of old Indo-European New Testament translations


The PROIEL corpora The Syntacticus interface Case studies

The PROIEL parallel corpus of old Indo-European New Testament translations

Dag Trygve Truslew Haug 23 February 2018

Dag Haug PROIEL 23 February 2018 1 / 30


Introduction

The PROIEL corpus: small, but deep
Core: New Testament (NT) in Greek, Latin, Gothic, Classical Armenian and Old Church Slavonic (OCS)
What can the deep annotation do for us?
Three sections:

  • The annotation
  • The Syntacticus interface
  • Some case studies


The background

A corpus for linguists: focus on making the most of a limited data set for linguistic research
Pragmatic Resources in Old Indo-European Languages (PROIEL, 2008-2012)

  • word order
  • anaphoric expressions
  • definiteness
  • participles (background events)
  • discourse particles

The corpus should help this research, but also be useful for others
Annotation continues (with fewer resources)


Texts

NT and translations
Classical Greek and Latin:

  • Herodotus
  • Gallic War, Letters to Atticus, De officiis

Post-classical Greek and Latin:

  • Sphrantzes’ Chronicles
  • Peregrinatio Aetheriae

Other corpora in the same format: Old Norse, Old Swedish, Medieval English and Romance, Old Russian and OCS


The PROIEL annotation

Many-layered annotation:

  • Morphological annotation
  • Syntactic annotation (dependency/LFG-based)
  • Semantic and other customised annotation (e.g. animacy)
  • Annotation of information structure and anaphoric links
  • Experimental discourse structure annotation
  • Token alignments


Morphology

All our languages have relatively rich morphology

Features: inflection, mood, tense, voice, degree, case, person, number, gender, strength

2623 unique MSD (morphosyntactic description) tags in the corpus
803 unique tags in Greek, 636 in Latin
In addition, 26 POS tags
Also derivational morphology for some categories
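A minimal sketch of how a positional morphological tag could be unpacked into the features listed above. The ten-slot layout, the slot order in `FIELDS`, the `-` placeholder and the sample tag are assumptions made for illustration; the slides do not specify the tag format.

```python
# Hypothetical slot order for a ten-character positional tag (an assumption,
# not confirmed by the slides); "-" marks a feature that does not apply.
FIELDS = ("person", "number", "tense", "mood", "voice",
          "gender", "case", "degree", "strength", "inflection")


def decode_tag(tag, fields=FIELDS):
    """Return a dict of the features that are actually set in `tag`."""
    if len(tag) != len(fields):
        raise ValueError(f"expected {len(fields)} characters, got {len(tag)}")
    return {name: value for name, value in zip(fields, tag) if value != "-"}


# Illustrative tag only: a finite verb form fills the verbal slots and the
# inflection slot, and leaves the nominal slots empty.
features = decode_tag("3spia----i")
print(features)
```

Keeping the slot order in one tuple means a different layout only requires changing `FIELDS`; counting distinct tag strings over a corpus would then give figures of the kind quoted above.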


Dependency syntax

  • Dependencies are asymmetric relations between words
  • We label these dependencies with the function of the dependent
  • The dependencies form a tree under an abstract root
  • No explicit constituency
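The points above can be sketched as a small data structure: each token stores its head and the function label of the dependency, and a sentence is well formed if every token reaches the abstract root. The `Token` class, the `is_tree` helper, the toy sentence and its relation labels are all hypothetical, chosen for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    id: int
    form: str
    head: Optional[int]   # id of the governing token; None = abstract root
    relation: str         # function of the dependent (labels are illustrative)


def is_tree(tokens):
    """True if every token reaches the abstract root without a cycle."""
    heads = {t.id: t.head for t in tokens}
    for start in heads:
        seen = set()
        node = start
        while node is not None:
            if node in seen or node not in heads:
                return False      # cycle, or head pointing outside the sentence
            seen.add(node)
            node = heads[node]
    return True


# Toy sentence (Latin "puer librum legit"); the labels are made up here.
sentence = [
    Token(1, "puer", 3, "sub"),
    Token(2, "librum", 3, "obj"),
    Token(3, "legit", None, "pred"),
]
print(is_tree(sentence))
```

Because only head pointers are stored, no constituent structure is represented; phrases would have to be derived from the subtree under each head.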


Semantic annotation – animacy

Categories: HUMAN, ORG, ANIMAL, VEH, CONC, PLACE, NONCONC, TIME

  • All Greek noun lemmata annotated for animacy
  • Adjustments at token level
  • Tag transfer to other parts of speech via anaphoric links
  • Tag transfer to other languages via token alignments
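A minimal sketch of the two transfer steps, assuming that anaphoric links and token alignments can both be represented as a mapping from an untagged token to a tagged one. The function name, the id scheme and the sample ids are hypothetical.

```python
def transfer_animacy(source_tags, links):
    """Copy animacy tags from tagged tokens to the tokens linked to them.

    source_tags: token id -> animacy tag (e.g. for Greek nouns)
    links: untagged token id -> id of the token it is linked to; the link may
           be an anaphoric link or a cross-language token alignment
    """
    transferred = {}
    for target, source in links.items():
        if source in source_tags:
            transferred[target] = source_tags[source]
    return transferred


# Invented ids for illustration: a Greek noun, a pronoun referring back to it,
# and the Latin token aligned with the noun.
greek_tags = {"grc:101": "HUMAN"}
anaphoric_links = {"grc:140": "grc:101"}
alignments = {"lat:101": "grc:101"}

print(transfer_animacy(greek_tags, anaphoric_links))
print(transfer_animacy(greek_tags, alignments))
```

Tokens whose link target carries no tag are simply left out, so token-level adjustments can still be made afterwards by hand.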


Givenness

Givenness tags based on which context the hearer uses to establish reference

  • Discourse (anaphora) → OLD
  • Situation (deixis) → ACC-sit
  • Scenarios (inferences) → ACC-inf
  • Encyclopedic knowledge → ACC-gen
  • No context (no extra-NP information) → NEW

Exists for 58756 NPs (full coverage of the Greek gospels + various other texts)
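The tag scheme above amounts to a lookup from the type of context to a tag, with NEW as the fallback. The sketch below spells that out; the dictionary keys and the function name are hypothetical labels, only the tag values come from the slide.

```python
GIVENNESS_BY_CONTEXT = {
    "discourse": "OLD",          # anaphora
    "situation": "ACC-sit",      # deixis
    "scenario": "ACC-inf",       # inferences
    "encyclopedic": "ACC-gen",   # encyclopedic knowledge
}


def givenness_tag(context=None):
    """Map the context used to establish reference to a givenness tag.

    With no usable context (no extra-NP information) the tag is NEW.
    """
    if context is None:
        return "NEW"
    return GIVENNESS_BY_CONTEXT[context]


print(givenness_tag("discourse"), givenness_tag())
```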


http://syntacticus.org


Case studies

Select case studies to show the value of deep analysis

  • OCS aspect
  • Latin participles
  • Early Slavic differential object marking (DOM)

A few words about the danger of superficial analyses of Biblical data


Patterns

PSNVNRNSNCVRSNRNCDVRSNVVSNCSNDNVRPCNRSNPVSNP SARPVCDSNPVRSNCVRSNMNVRSNCVRSNCSNVPRDSVSNVSN RSNVSNSNGVSNCVSNSNVCVRSNCVRSNSNVNCNSNNVRSNVD NCVPSNIRPCVPVNNCDVSNVPCVAVNSSNCNSNPCPRSNVSNC DVPCVSNPNRSNRSNVRPCVRNCDSNVRSNCVRSNPVDVPDNVC DDSNCDVRSNPNRNACVVPPCPNNVVPVPPVSASNCVPSNVCVR PCVPSNSACVNAVRPCVAGVPVPVPNARNDSNSAVCVPCVSNPD DRASNSNCDRSNVVRSNNCNRNCNSDNNVVCDVPRPCVVPVSNC VPSNCVPADVGVSNVRPPSDVCSVCVASNVRSNCVADVANCNAV CDVVSNGVPCDADVVCVRANCDVCVPNCSRPCVPCVPGPVPCVP VDRSVNGDDVRPDVCVVRSNPRASNCSNVCVRPAVPCVVPGGVV PVCVVSNPVCVVVCDVRPSNCVCVPDVPCVPVPPVCVPVSNCVR SNPPVNRNPPDVVVACVSNGDPVRNDVCDRANVCVRPD
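The slide shows a text reduced to a string of one-letter tags. One generic way to turn such a string into numeric features is to count tag n-grams; whether the talk used n-grams is not stated, so the sketch below is an assumption. The function name and the choice of n are hypothetical.

```python
from collections import Counter


def ngram_profile(tags, n=3):
    """Relative frequency of each n-gram in a string of one-letter tags."""
    tags = "".join(tags.split())          # drop spaces and line breaks
    grams = [tags[i:i + n] for i in range(len(tags) - n + 1)]
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


# The first 22 tags of the pattern shown above.
profile = ngram_profile("PSNVNRNSNCVRSNRNCDVRSN", n=2)
print(sorted(profile.items(), key=lambda item: -item[1])[:3])
```

Profiles of this kind, computed per book and per language version, could serve as input to a dimensionality-reduction method.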


Language makes a difference

[Figure: scatter plot of New Testament books on Factor 1 (37.5 %) against Factor 2 (12.3 %). Points are labelled with book names (Gospels, Acts, Epistles, Revelation); most books occur several times.]


But it is often trivial

[Figure: scatter plot of New Testament books on Factor 1 (24.1 %) against Factor 2 (9.7 %). Points are labelled with book names (Gospels, Acts, Epistles, Revelation); most books occur several times.]


Stylistics vs language

[Figure: scatter plot of New Testament books on Factor 1 (14.7 %) against Factor 2 (12.1 %). Points are labelled with book names; most books occur twice.]


Morphology: aspect in OCS

The historical evolution of the Slavic aspect system is unclear
Most of the modern languages express aspect through a derivational system
OCS has a morphological expression of aspect, but (as in modern Bulgarian) the derivational (affixation) system is also in evidence
It is therefore unclear which system is the primary exponent of aspect


Slavic verbs

Stem                Present         Aorist       Imperfect        Infinitive
tvori- (ipfv?)      tvoritŭ         tvori        tvorjaaše        tvoriti
sŭ-tvori- (pfv)     sŭ-tvoritŭ      sŭ-tvori     *sŭ-tvorjaaše    sŭ-tvoriti
sŭ-tvarja- (ipfv)   sŭ-tvarjajetŭ   *sŭ-tvarja   sŭ-tvarjaaše     *sŭ-tvarjati

There is a morphological exponent of ‘aspect’ in the past tenses (and in the participles)
The present and the infinitive do not express ‘aspect’ morphologically
Some of these cells are not attested, and we argue that some of the gaps are not coincidental


[Figure: Distribution of aspect in verbs from да. Bar chart of frequency (Freq) for the verbs въдати, въздати, дати, даꙗти, издаꙗти, отъдати, подати, прѣдати, прѣдаꙗти, продати, продаꙗти. Legend: Impf. (Gk. impf.), Impf. (Gk. pfv.), Aor. (Gk. impf.), Aor. (Gk. pfv.)]


[Figure: Infinitives and their Greek originals. Frequency (Freq) by verb type (Verbtype). Category labels: ~ipfv, ipfv, ipfv(<4), ~neut, ~pfv, pfv, pfv(<4), unkn.]


We concluded that OCS attests a double exponence where both inflection and derivation encode aspect
This was possible because the annotation let us extract more sophisticated patterns than mere translation equivalence

  • Inflectional morphology
  • Derivational morphology
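A minimal sketch of the kind of query this implies: cross-tabulating the inflectional form of each OCS verb token against the aspect of the Greek token it is aligned with, grouped by OCS lemma so that derivational variants stay apart. The function name, the tuple layout and the toy data are hypothetical; the tuples are invented and are not corpus counts.

```python
from collections import Counter


def cross_tabulate(aligned_pairs):
    """Count (OCS lemma, OCS tense, Greek aspect) combinations."""
    table = Counter()
    for ocs_lemma, ocs_tense, greek_aspect in aligned_pairs:
        table[(ocs_lemma, ocs_tense, greek_aspect)] += 1
    return table


# Invented toy data: each tuple stands for one OCS verb token and the aspect
# of the Greek token it is aligned with.
pairs = [
    ("дати", "aor", "pfv"),
    ("дати", "aor", "pfv"),
    ("даꙗти", "impf", "ipfv"),
]
table = cross_tabulate(pairs)
for key, count in sorted(table.items()):
    print(key, count)
```

The lemma carries the derivational information and the tense the inflectional information, so both layers of annotation are needed alongside the token alignment.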


Syntax: Translating participles in the Vulgate

Latin does not have a past active participle
Many translation strategies, including copying
Where does he (Jerome) do what?

              ptcp-X   ptcp-subj   sub-ptcp   left    right
adv.clause    12.9%    25.4%       12.0%      8.0%    12.0%
absolute      15.5%    8.2%        12.8%      10.4%   8.0%
main          15.1%    11.2%       13.7%      28.8%   12.0%
participle    51.1%    52.2%       51.3%      43.2%   60.0%


Other factor: mood

              ind     imp
adv.clause    7.7%    0.0%
absolute      9.0%    0.0%
main          25.2%   57.1%
participle    49.0%   42.9%

Examples: translation w/ imperative; translation w/ framing participle

The value of morphosyntactic annotation

By carefully examining the translation patterns we could argue that Jerome’s translations reflect syntactic ambiguity in the Greek source
This was possible because the annotation let us extract more sophisticated patterns than mere translation equivalence

  • Inflectional morphology
  • Syntax


Infostatus and animacy: OCS object marking (Eckhoff 2015)

In all modern Slavic languages with case, object marking interacts with animacy
Early stage attested in OCS with variable object realization (differential object marking, DOM)
Typical DOM factors:

  • animacy
  • definiteness
  • specificity
  • topicality

Which factors are operative in OCS?


Definiteness

Greek          Nom./Acc.   Gen.
W/o article    53          66
W/ article     41          228
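As a worked reading of these counts, the sketch below computes the share of genitive objects in each row and the odds ratio between the rows. The counts are those in the table; the helper names and the use of `acc` as shorthand for the Nom./Acc. column are assumptions, and the slides themselves do not report these derived figures.

```python
# Counts from the table above: OCS object case by presence of the Greek article.
counts = {
    "without article": {"acc": 53, "gen": 66},
    "with article": {"acc": 41, "gen": 228},
}


def genitive_share(row):
    """Proportion of objects in this row that are marked with the genitive."""
    return row["gen"] / (row["acc"] + row["gen"])


def odds_ratio(table):
    """Odds of genitive with the article relative to without it."""
    with_art, without = table["with article"], table["without article"]
    return (with_art["gen"] / with_art["acc"]) / (without["gen"] / without["acc"])


for label, row in counts.items():
    print(f"{label}: {genitive_share(row):.1%} genitive")
print(f"odds ratio: {odds_ratio(counts):.2f}")
```

The genitive is more frequent when the Greek has the article, but it is also common without it, which fits the later point that the Greek article is not the direct trigger.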

Examples: Greek indef., OCS acc.; Greek def., OCS gen.

Information status

Infostatus    Nom./Acc.   Gen.
New           27          21
Anchored      24          32
Accessible    3           65
Old           19          132
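One generic way to check whether case marking and information status are associated in a table like this is a Pearson chi-square statistic. The slides do not say which test, if any, was used, so the sketch below is an illustration under that assumption; `chi_square` is a hypothetical helper written out by hand to stay dependency-free.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table: row label -> tuple of counts, one count per column.
    """
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


# Counts from the table above, as (Nom./Acc., Gen.) per information status.
infostatus = {
    "New": (27, 21),
    "Anchored": (24, 32),
    "Accessible": (3, 65),
    "Old": (19, 132),
}
stat, dof = chi_square(infostatus)
print(f"chi-square = {stat:.1f} on {dof} degrees of freedom")
```

The statistic would be compared against a chi-square distribution with the returned degrees of freedom; a statistics library could be used to obtain a p-value.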

Example: Definite, but new – OCS acc.

Saliency and pickup rates

                       Nom./Acc.   Gen.
Mean no. of pickups    6.30        2.43

Example: A prominent son – OCS acc.

The value of morphosyntactic annotation

We argued that OCS DOM reflects information status (but not directly the Greek article), as well as “forward salience”
This was possible because the annotation let us extract more sophisticated patterns than mere translation equivalence

  • Inflectional morphology
  • Syntax


Conclusions

Deep annotation increases the possibilities of extracting information from parallel corpora
There is a tradeoff with breadth
PROIEL available in Universal Dependencies (UD) → deep and wide studies?


References

Eckhoff, Hanne Martine (2015): “Animacy and differential object marking in Old Church Slavonic”. Russian Linguistics 39:2.

Eckhoff, Hanne Martine and Dag Trygve Truslew Haug (2015): “Aspect and prefixation in Old Church Slavonic”. Diachronica 32:2.

Haug, Dag Trygve Truslew (2012): “Open verb-based adjuncts in New Testament Greek — with a view to the Latin Vulgate translation”. In Fabricius-Hansen, Cathrine and Dag Trygve Truslew Haug (eds.), Big events and small clauses. Mouton de Gruyter.
