SLIDE 1

A Reference Dependency Bank for Analyzing Complex Predicates

Tafseer Ahmed, Miriam Butt, Annette Hautli and Sebastian Sulger

Universität Konstanz

May 25th, 2012 LREC 2012

1 / 19

SLIDE 10

Context of Work

computational LFG grammar in development in Konstanz

aim: large-scale LFG grammar for parsing Urdu/Hindi ([Bögel et al. 2009], [Butt and King 2007])

grammar is part of the ParGram project

◮ collaborative, world-wide research project
◮ devoted to developing parallel LFG grammars for a variety of languages
◮ features and analyses are kept parallel for easy transfer between languages
◮ languages involved:

→ large-scale: English, German, French, Japanese, Norwegian
→ smaller-scale (yet...): Welsh, Georgian, Hungarian, Turkish, Chinese, Urdu (among many others)

2 / 19

SLIDE 15

Complex Predicates?

Urdu has about 700 basic verbs

vast majority of verbal predicates is constructed using complex predicates (CPs)

most other South Asian languages make use of CPs as well

knowing how to deal with CPs is essential for doing parsing/NLP for Hindi/Urdu and for South Asian languages in general

→ provide a reference dependency bank that can guide teams working on NLP applications for South Asian languages (or really any language that has CPs)

3 / 19

SLIDE 16

Overview

1 Complex Predicates
2 Types of Complex Predicates
3 A Reference Dependency Bank for CPs
4 Conclusion

4 / 19

SLIDE 23

Complex Predicates in General

combinations of two or more predicates that predicate as a single unit

the arguments of the CP members map onto a monoclausal syntactic structure [Butt 1995]

◮ verb+verb, noun+verb, adj+verb, morphological causative
◮ examples from Urdu: ‘memory (N) do (V)’ = ‘remember’, ‘telephone (N) do (V)’ = ‘telephone’, ‘fear (N) come (V)’ = ‘fear’, ‘throw (V) give (V)’ = ‘throw away’

often analyzed on a par with control constructions/auxiliaries/modal verbs, but: their syntax & semantics in fact differ markedly from these constructions [Butt 2010]

6 / 19

SLIDE 32

A Noun+Verb Complex Predicate

formed by combining a noun and a verb

◮ noun uninflected, light verb inflected

both contribute to overall argument structure of clause

◮ 1 argument from noun
◮ 2 arguments from verb
◮ combine into 3 arguments in resulting CP

example: Dar lag ‘be frightened by’

nAdiyah     kO   hATHI          sE    Dar        lag-A
Nadya.F.Sg  Dat  elephant.M.Sg  Inst  fear.M.Sg  attach-Perf.M.Sg
‘Nadya was frightened by the elephant.’

(lag ‘attach’: thing attached and thing that it is attached to; Dar ‘fear’: thing that is feared)

8 / 19

SLIDE 33

A Noun+Verb Complex Predicate

"nAdiyah kO hATHI sE Dar lagA"

PRED         'lag<[1:nAdiyah], 'Dar<[21:hATHI]>'>'
SUBJ [1]     PRED      'nAdiyah'
             NTYPE     [NSEM [PROPER [PROPER-TYPE name]], NSYN proper]
             SEM-PROP  [SPECIFIC +]
             CASE dat, GEND fem, NUM sg, PERS 3
OBJ          PRED      'Dar'
             NTYPE     [NSEM [COMMON count], NSYN common]
             CASE nom, CLAUSE-TYPE decl, GEND masc, NUM sg, PASSIVE -
OBL [21]     PRED      'hATHI'
             NTYPE     [NSEM [COMMON count], NSYN common]
             CASE inst, GEND masc, NUM sg, PERS 3
LEX-SEM      [AGENTIVE -, GOAL +]
TNS-ASP      [ASPECT perf, MOOD indicative]
VTYPE        [COMPLEX-PRED nv]
CLAUSE-TYPE decl, PASSIVE -
(index 104)

Figure: F-Structure for nAdiyah kO hATHI sE Dar lagA ‘Nadya was frightened by the elephant.’

9 / 19

SLIDE 41

A Permissive Complex Predicate

V+V complex predicate

◮ infinitival main verb
◮ finite light verb

both verbs contribute to overall argument structure of clause

◮ 2 arguments from main verb
◮ 2 arguments from light verb
◮ combine into 3 arguments in resulting CP

example: dEkH dE ‘let see’

nAdiyah     nE   yAsIn        kO   kitAb      dEkH-nE       d-I
Nadya.F.Sg  Erg  Yassin.M.Sg  Dat  book.F.Sg  see-Inf.M.Sg  give-Perf.F.Sg
‘Nadya let Yassin look at the book.’

(dEkH ‘see’: seer and seen item, dE ‘give’: permitter and action permitted)
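The way the two argument structures combine can be pictured with a small Python sketch. It is only an illustrative model, not the mechanism used in the grammar: the names `Pred` and `clause_arguments` are hypothetical, and the sketch assumes that the light verb's second slot (the action permitted) is filled by the main verb itself, which is why 2 + 2 slots surface as 3 nominal arguments in one clause.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Pred:
    """A predicate with an ordered list of argument slots (hypothetical model)."""
    name: str
    args: List[Union[str, "Pred"]] = field(default_factory=list)


def clause_arguments(pred: Pred) -> List[str]:
    """Collect the nominal arguments of a (possibly complex) predicate.

    A slot filled by another predicate does not surface as an argument of
    the clause; that predicate's own arguments are collected instead, so
    everything ends up in a single, monoclausal argument list.
    """
    collected: List[str] = []
    for arg in pred.args:
        if isinstance(arg, Pred):
            collected.extend(clause_arguments(arg))
        else:
            collected.append(arg)
    return collected


# dEkH 'see': seer and seen item
dekh = Pred("dEkH", ["yAsIn", "kitAb"])
# dE 'give' (permissive use): permitter and action permitted
de = Pred("dE", ["nAdiyah", dekh])

cp_arguments = clause_arguments(de)
print(cp_arguments)  # ['nAdiyah', 'yAsIn', 'kitAb']
```

The nesting mirrors the bracketing 'dE<nAdiyah, dEkH<yAsIn, kitAb>>': four slots in total, three nominal arguments in the resulting clause.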

10 / 19

SLIDE 42

Permissive Complex Predicate

"nAdiyah nE yAsIn kO kitAb dEkHnE dI"

PRED         'dE<[1:nAdiyah], 'dEkH<[21:yAsIn], [41:kitAb]>'>'
SUBJ [1]     PRED      'nAdiyah'
             NTYPE     [NSEM [PROPER [PROPER-TYPE name]], NSYN proper]
             SEM-PROP  [SPECIFIC +]
             CASE erg, GEND fem, NUM sg, PERS 3
OBJ-GO [21]  PRED      'yAsIn'
             NTYPE     [NSEM [PROPER [PROPER-TYPE name]], NSYN proper]
             SEM-PROP  [SPECIFIC +]
             CASE dat, GEND masc, NUM sg, PERS 3
OBJ [41]     PRED      'kitAb'
             NTYPE     [NSEM [COMMON count], NSYN common]
             CASE nom, GEND fem, NUM sg, PERS 3
LEX-SEM      [AGENTIVE +, GOAL +]
TNS-ASP      [ASPECT perf, MOOD indicative]
VTYPE        [COMPLEX-PRED vv-perm]
CLAUSE-TYPE decl, PASSIVE -, PERS 3
(index 83)

Figure: F-Structure for nAdiyah nE yAsIn kO kitAb dEkHnE dI ‘Nadya let Yassin look at the book.’
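An f-structure like this one is essentially a nested attribute-value structure. As a rough Python sketch, it can be held in a nested dictionary and reduced to flat attribute paths, with selected features dropped along the way. The function `flatten`, the path separator and the choice of dropped features are assumptions made for illustration; this is not the XLE-internal procedure, and only part of the f-structure is reproduced.

```python
from typing import Any, Dict, FrozenSet, Iterator, Tuple


def flatten(
    fs: Dict[str, Any],
    drop: FrozenSet[str] = frozenset(),
    prefix: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, str]]:
    """Yield (attribute path, atomic value) pairs from a nested f-structure.

    Attributes named in `drop` are deleted together with everything below
    them; embedded structures are flattened into dash-joined paths.
    """
    for attr, value in fs.items():
        if attr in drop:
            continue
        path = prefix + (attr,)
        if isinstance(value, dict):
            yield from flatten(value, drop, path)
        else:
            yield "-".join(path), value


# Partial, hand-written rendering of the f-structure for
# "nAdiyah nE yAsIn kO kitAb dEkHnE dI" (illustration only).
f_structure = {
    "SUBJ": {"PRED": "nAdiyah", "CASE": "erg", "GEND": "fem", "NUM": "sg", "PERS": "3"},
    "OBJ": {"PRED": "kitAb", "CASE": "nom", "GEND": "fem", "NUM": "sg", "PERS": "3"},
    "LEX-SEM": {"AGENTIVE": "+", "GOAL": "+"},
    "TNS-ASP": {"ASPECT": "perf", "MOOD": "indicative"},
}

flat = dict(flatten(f_structure, drop=frozenset({"GEND", "NUM", "PERS", "LEX-SEM"})))
for path, value in flat.items():
    print(path, value)
```

Dropping a feature removes the whole substructure under it, which is why `LEX-SEM` disappears entirely while `TNS-ASP` survives as two flat paths.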

11 / 19

SLIDE 47

Design of the Reference Dependency Bank

contains sentences illustrating examples of all common CP types in Hindi/Urdu

strategy for creating the dependency bank:

◮ sentences were parsed using the Urdu ParGram grammar → c- and f-structures
◮ banked/disambiguated using LFG Parsebanker [Rosén et al. 2009]
◮ converted into triples format (see PARC700, [King et al. 2003]) via XLE-internal process
◮ triples conversion is flexible; features may be flattened or deleted

triples format is theory-neutral; enables parsers to evaluate against the reference bank
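Because triples are plain relation/head/dependent tuples, a parser's output can be compared with the bank by simple set overlap. The sketch below assumes precision, recall and F-score over exact triple matches, a common choice for dependency banks but not one stated on the slides; the function name `score` and the "predicted" triples are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (relation, head, dependent)


def score(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted triples against gold triples."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Gold triples in the style of the reference bank.
gold = {
    ("subj", "dEkH_dE", "nAdiyah"),
    ("obj-go", "dEkH_dE", "yAsIn"),
    ("obj", "dEkH_dE", "kitAb"),
}
# Hypothetical parser output that mislabels the permittee as a plain object.
predicted = {
    ("subj", "dEkH_dE", "nAdiyah"),
    ("obj", "dEkH_dE", "yAsIn"),
    ("obj", "dEkH_dE", "kitAb"),
}

result = score(gold, predicted)
print(result)
```

Two of the three predicted triples match the gold set, so the mislabelled relation costs the parser both precision and recall.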

13 / 19

SLIDE 50

CPs in the Reference Dependency Bank

to model the verbal complex of CPs:

◮ all parts of CP contributing arguments are concatenated by underscore
◮ makes clear that CP is main predicate of clause

triples representation split in two parts:

◮ list arguments of the whole (complex) predication
◮ indication of which part of the CP contributes which argument
◮ consecutive labeling of CP parts based on their linear order

triples are restricted to predicate-argument relations and neglect the more detailed information in f-structures
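The conventions above can be sketched as a small Python function that assembles the triples for one CP. This is a hedged illustration, not the rewrite rules used to build the bank: `cp_triples` and its parameters are hypothetical, and the relation names follow the permissive example used in this talk.

```python
from typing import Dict, List


def cp_triples(
    parts: List[str],
    arguments: Dict[str, str],
    contributions: Dict[str, List[str]],
    cp_type: str,
) -> List[str]:
    """Build triples for a complex predicate.

    parts:         CP members in their linear order
    arguments:     grammatical function -> filler, for the whole CP
    contributions: CP member -> the fillers of its own argument slots
    cp_type:       label for the type of CP
    """
    head = "_".join(parts)  # parts concatenated by underscore
    triples = [f"pred(root,{head})"]
    # Part one: arguments of the whole (complex) predication.
    triples += [f"{gf}({head},{filler})" for gf, filler in arguments.items()]
    triples.append(f"complex-pred-type({head},{cp_type})")
    # Part two: CP parts labelled consecutively, then who contributes what.
    triples += [f"cp-part{i}({head},{part})" for i, part in enumerate(parts, start=1)]
    for part, fillers in contributions.items():
        triples += [f"arg{i}({part},{filler})" for i, filler in enumerate(fillers, start=1)]
    return triples


triples = cp_triples(
    parts=["dEkH", "dE"],
    arguments={"subj": "nAdiyah", "obj-go": "yAsIn", "obj": "kitAb"},
    contributions={"dE": ["nAdiyah", "dEkH"], "dEkH": ["yAsIn", "kitAb"]},
    cp_type="vv-perm",
)
print("\n".join(triples))
```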

14 / 19

SLIDE 54

CPs in the Reference Dependency Bank

nAdiyah     nE   yAsIn        kO   kitAb      dEkH-nE       d-I
Nadya.F.Sg  Erg  Yassin.M.Sg  Dat  book.F.Sg  see-Inf.M.Sg  give-Perf.F.Sg
‘Nadya let Yassin look at the book.’

XLE f-structure → triples conversion (application of rewrite rules) → triples format

pred(root,dEkH_dE)
subj(dEkH_dE,nAdiyah)
obj-go(dEkH_dE,yAsIn)
obj(dEkH_dE,kitAb)
complex-pred-type(dEkH_dE,vv-perm)
cp-part1(dEkH_dE,dEkH)
cp-part2(dEkH_dE,dE)
arg1(dE,nAdiyah)
arg2(dE,dEkH)
arg1(dEkH,yAsIn)
arg2(dEkH,kitAb)
asp(dEkH_dE,perf).
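A consumer of the bank might read such an entry back into tuples and ask which CP part contributes which argument. The Python sketch below assumes the surface syntax `relation(head,dependent)` seen in this example; the regular expression and the helper names `parse_triples` and `contributors` are illustrative guesses, not part of any released tooling.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (relation, head, dependent)

# relation names may contain letters, digits and dashes, e.g. "cp-part1"
TRIPLE_RE = re.compile(r"([\w-]+)\(([^,()]+),([^,()]+)\)")


def parse_triples(text: str) -> List[Triple]:
    """Extract (relation, head, dependent) tuples from a bank entry."""
    return [(rel, head.strip(), dep.strip()) for rel, head, dep in TRIPLE_RE.findall(text)]


def contributors(triples: List[Triple]) -> Dict[str, List[str]]:
    """Map each CP part to the arguments it contributes (argN relations)."""
    by_part: Dict[str, List[str]] = defaultdict(list)
    for rel, head, dep in triples:
        if re.fullmatch(r"arg\d+", rel):
            by_part[head].append(dep)
    return dict(by_part)


bank_entry = """
pred(root,dEkH_dE) subj(dEkH_dE,nAdiyah) obj-go(dEkH_dE,yAsIn) obj(dEkH_dE,kitAb)
complex-pred-type(dEkH_dE,vv-perm) cp-part1(dEkH_dE,dEkH) cp-part2(dEkH_dE,dE)
arg1(dE,nAdiyah) arg2(dE,dEkH) arg1(dEkH,yAsIn) arg2(dEkH,kitAb) asp(dEkH_dE,perf).
"""

parsed = parse_triples(bank_entry)
parts = contributors(parsed)
print(parts)  # {'dE': ['nAdiyah', 'dEkH'], 'dEkH': ['yAsIn', 'kitAb']}
```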

15 / 19

SLIDE 62

Conclusion I

South Asian languages make heavy use of CPs

essential to know about proper treatment

essential to know about different CP types

also: essential to know what is not a CP

◮ e.g. auxiliaries, modal constructions need to be distinguished from CPs
◮ examples of these constructions are also included in the dependency bank

other treebanks offer only limited annotation for CPs (e.g. HUTB, [Bhatt et al. 2009])

17 / 19

SLIDE 67

Conclusion II

presented a reference dependency bank for CPs (and other constructions that are often confused with CPs)

reference bank is designed in a theory-independent way

represents a typology of CPs (reflects what we currently know about CPs...)

researchers may consult this resource when working on a new language

◮ for theoretical syntax research
◮ for constructing analyses for treebanks
◮ for evaluating new parsers

freely available on the internet

http://ling.uni-konstanz.de/pages/home/pargram_urdu/main/Resources.html

18 / 19

SLIDE 68

References

Bhatt, R., B. Narasimhan, M. Palmer, O. Rambow, D. M. Sharma, and F. Xia. 2009. A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop, ACL-IJCNLP ’09, 186–189, Stroudsburg, PA, USA. Association for Computational Linguistics.

Bögel, T., M. Butt, A. Hautli, and S. Sulger. 2009. Urdu and the Modular Architecture of ParGram. In Proceedings of the Conference on Language and Technology 2009 (CLT09).

Butt, M. 1995. The Structure of Complex Predicates in Urdu. CSLI Publications.

Butt, M. 2010. The Light Verb Jungle: Still Hacking Away. In M. Amberber, B. Baker, and M. Harvey (Eds.), Complex Predicates in Cross-Linguistic Perspective. Cambridge University Press.

Butt, M., and T. H. King. 2007. Urdu in a Parallel Grammar Development Environment. Language Resources and Evaluation 41(2):191–207.

King, T. H., R. Crouch, S. Riezler, M. Dalrymple, and R. Kaplan. 2003. The PARC700 Dependency Bank. In Proceedings of the EACL03: 4th International Workshop on Linguistically Interpreted Corpora (LINC-03).

Rosén, V., P. Meurer, and K. de Smedt. 2009. LFG Parsebanker: A Toolkit for Building and Searching a Treebank as a Parsed Corpus. In F. V. Eynde, A. Frank, G. van Noord, and K. D. Smedt (Eds.), Proceedings of the 7th International Workshop on Treebanks and Linguistic Theories (TLT7), 127–133. LOT.

19 / 19