DS-to-PS conversion, Fei Xia, University of Washington, July 29, 2011

SLIDE 1

DS-to-PS conversion

Fei Xia University of Washington July 29, 2011

SLIDE 2

Main steps in building the treebank

  • DS treebank:
    – Tokenization
    – Morphological analysis, voice, etc.
    – POS tagging
    – DS annotation

  • PropBank: adding predicate-argument information
  • Automatic DS-to-PS conversion
  • Some manual checking to ensure that the conversion works well

SLIDE 3

Outline

  • Important concepts
  • Compatibility and consistency
  • Handling inconsistency

SLIDE 4

Important concepts

  • Linguistic phenomena
  • Representation type
  • Linguistic theory
    – Theoretical framework
    – Linguistic analyses

  • Annotation guidelines

SLIDE 5

Linguistic phenomena

  • They are what we want to represent, including:
    – General concepts, e.g., which words form a phrase? What types of phrases does a language have?
    – Types of relations between words or phrases (e.g., subjecthood, temporal modification)
    – Specific constructions (e.g., small clauses)
    – Finer-grained distinctions (e.g., unergative vs. unaccusative verbs)

SLIDE 6

Representation type

  • It is the type of mathematical object used to represent syntactic facts.
  • Examples: DS, PS
  • Each representation type can decide which more specific representation devices to employ:
    – Labels on the arcs of a tree
    – Use of empty nodes or coindexation between nodes

SLIDE 7

Linguistic theory

  • It explains how linguistic phenomena are represented in the chosen representation type.
  • It has two components:
    – Theoretical framework: provides the vocabulary and constraints within which linguistic theories can be formulated (e.g., GB, LFG, LTAG, HPSG)
    – Linguistic analyses

SLIDE 8

Small clause

SLIDE 9

“Exceptional case-marking” analysis

SLIDE 10

“Raising-to-object” analysis

SLIDE 11

Annotation guidelines

  • Guideline designers need to choose the following:
    – Linguistic phenomena to represent
    – Representation type
    – Theoretical framework
    – Linguistic analyses
    – Descriptions
    – Examples: sentences with DS or PS trees

SLIDE 12

Outline

  • Important concepts
  • Compatibility and consistency
  • Handling inconsistency

SLIDE 13

“Exceptional case-marking” analysis

SLIDE 14

“Raising-to-object” analysis

SLIDE 15

Implicit vs. explicit information

  • Certain kinds of information have to be expressed explicitly in DS but not in PS, or vice versa:
    – Heads in DS
    – Syntactic categories of phrases in PS
  • Not providing information explicitly does not mean that the corresponding concepts do not exist in DS/PS.
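To illustrate the second point, here is a minimal Python sketch, assuming a DS encoded as a list `heads` where `heads[i]` is the index of word i's head (-1 for the root); the function name `implicit_phrases` is hypothetical. A DS never labels phrases, yet each word with dependents, together with everything it dominates, still forms an implicit phrase.

```python
from typing import Dict, List


def implicit_phrases(words: List[str], heads: List[int]) -> Dict[int, str]:
    """Map each word that has dependents to the words its subtree covers.

    heads[i] is the index of word i's head, or -1 for the root word.
    """
    kids: Dict[int, List[int]] = {i: [] for i in range(len(words))}
    for dep, head in enumerate(heads):
        if head >= 0:
            kids[head].append(dep)

    def subtree(i: int) -> List[int]:
        # the word itself plus everything it (transitively) dominates
        covered = [i]
        for k in kids[i]:
            covered.extend(subtree(k))
        return covered

    # a head and its subtree form a phrase, even though the DS never labels it
    return {i: " ".join(words[j] for j in sorted(subtree(i)))
            for i in range(len(words)) if kids[i]}


words = ["Vinken", "will", "join", "the", "board"]
heads = [2, 2, -1, 4, 2]
print(implicit_phrases(words, heads))
# {2: 'Vinken will join the board', 4: 'the board'}
```

What the DS still lacks is a category label (NP, VP, ...) for each span, which is exactly what a PS states explicitly.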

SLIDE 16

Syntactic consistency

  • We assume that each phrase in a PS has a special word, its head word, which represents the properties of the phrase.
  • A (DS, PS) pair is called consistent if there is a way to assign a head word to each internal node in the PS so that the resulting DS is identical to the given DS.
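The definition can be read directly as a search problem. The sketch below (Python; the `Node` class, function names, and example trees are illustrative, not taken from any existing tool) tries every way of choosing a head child for each internal node, derives the DS induced by that choice, and reports whether any choice reproduces the given DS. It is exponential in the size of the tree, so it only illustrates the definition.

```python
import itertools


class Node:
    """PS node: internal nodes carry a phrase label; leaves also carry a word index."""

    def __init__(self, label, children=None, index=None):
        self.label = label
        self.children = children or []
        self.index = index  # word position in the sentence; set only on leaves


def internal_nodes(node):
    if node.index is not None:
        return []
    found = [node]
    for child in node.children:
        found.extend(internal_nodes(child))
    return found


def derive_ds(root, choice):
    """DS induced by a head assignment; choice maps id(node) -> position of its head child."""
    def head_word(n):
        return n.index if n.index is not None else head_word(n.children[choice[id(n)]])

    ds = {}
    for n in internal_nodes(root):
        for pos, child in enumerate(n.children):
            if pos != choice[id(n)]:
                # every non-head child's head word depends on the node's head word
                ds[head_word(child)] = head_word(n)
    return ds


def is_consistent(root, ds):
    """True iff some head assignment yields exactly `ds` (brute force, toy trees only)."""
    nodes = internal_nodes(root)
    for picks in itertools.product(*(range(len(n.children)) for n in nodes)):
        choice = {id(n): p for n, p in zip(nodes, picks)}
        if derive_ds(root, choice) == ds:
            return True
    return False


# Toy example: "Vinken will join the board Nov. 29" (word indices 0-6)
w = [Node(t, index=i) for i, t in enumerate(["Vinken", "will", "join", "the", "board", "Nov.", "29"])]
ps = Node("S", [Node("NP", [w[0]]),
                Node("VP", [w[1], Node("VP", [w[2], Node("NP", [w[3], w[4]]),
                                                Node("NP", [w[5], w[6]])])])])
ds = {0: 2, 1: 2, 3: 4, 4: 2, 5: 6, 6: 2}  # dependent index -> head index; root omitted
print(is_consistent(ps, ds))  # True
```

A DS that attaches "the" directly to "join" fails the test: no head choice inside the NP "the board" can make "the" depend on a word outside it.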

SLIDE 17

Consistent pairs

SLIDE 18

Inconsistent pairs

SLIDE 19

A real example

SLIDE 20

Consistency assumption

SLIDE 21

Definition of consistency

  • A DS and a PS are consistent iff there exists a flattened version of the PS that is identical to the DS.
  • If the input DS and the desired PS are consistent, the PS can be created by stretching the DS and adding syntactic labels.
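A minimal Python sketch of "stretching", assuming a DS encoded as a list `heads` (`heads[i]` is word i's head index, -1 for the root) and a hypothetical POS-to-phrase table `PROJECTION`: every word that has dependents projects one phrase containing itself and its dependents' phrases in surface order. This yields the flattest PS whose flattened version is the input DS; a real converter would insert longer projection chains and choose labels according to the guidelines.

```python
from typing import List

# Hypothetical POS -> phrase-label table; real labels would come from the PS guidelines.
PROJECTION = {"VB": "VP", "NN": "NP", "NNP": "NP", "CD": "NP", "IN": "PP"}


def stretch(words: List[str], tags: List[str], heads: List[int]) -> str:
    """Build the flattest PS (as a bracketed string) whose flattened form is the given DS.

    heads[i] is the index of word i's head, or -1 for the root.
    """
    kids = {i: [] for i in range(len(words))}
    root = -1
    for dep, head in enumerate(heads):
        if head < 0:
            root = dep
        else:
            kids[head].append(dep)

    def build(i: int) -> str:
        leaf = f"({tags[i]} {words[i]})"
        if not kids[i]:
            return leaf                          # words without dependents project nothing
        members = sorted(kids[i] + [i])          # head and dependents in surface order
        inner = " ".join(leaf if j == i else build(j) for j in members)
        return f"({PROJECTION.get(tags[i], 'XP')} {inner})"  # 'XP' for unknown tags

    return build(root)


print(stretch(["Vinken", "will", "join", "the", "board"],
              ["NNP", "MD", "VB", "DT", "NN"],
              [2, 2, -1, 4, 2]))
# (VP (NNP Vinken) (MD will) (VB join) (NP (DT the) (NN board)))
```

Note that the subject ends up inside the verb's single projection; adding intermediate levels (e.g., an S above the VP) is the kind of information a conversion rule would have to supply.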

SLIDE 22

Checking consistency

  • For each (dep, head) pair in the DS:
    – find their locations in the PS and their closest common ancestor
    – add heads to the nodes on the path between the leaf nodes and that ancestor
  • The DS and the PS are consistent iff each internal node in the PS has exactly one head.
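The procedure above can be sketched in Python as follows. The `Node` class, the dict encoding of the DS (dependent index mapped to head index, root word omitted), and the example trees are assumptions for illustration. Nodes strictly below the common ancestor on the dependent's side receive the dependent; nodes on the head's side, up to and including the ancestor, receive the head. The second example is a wh-question in the spirit of the wh-movement slides, with the trace omitted because this sketch does not handle empty elements.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Node:
    """PS node: internal nodes carry a phrase label; leaves also carry a word index."""

    def __init__(self, label: str, children: Optional[List["Node"]] = None,
                 index: Optional[int] = None):
        self.label = label
        self.children = children or []
        self.index = index  # word position; set only on leaves
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def all_nodes(node: Node):
    yield node
    for child in node.children:
        yield from all_nodes(child)


def ancestors(node: Node) -> List[Node]:
    """Nodes from the parent of `node` up to the root, bottom-up."""
    path, cur = [], node.parent
    while cur is not None:
        path.append(cur)
        cur = cur.parent
    return path


def check_consistency(ps_root: Node, ds: Dict[int, int]) -> bool:
    """ds maps each dependent's index to its head's index; the root word is absent."""
    leaf_of = {n.index: n for n in all_nodes(ps_root) if n.index is not None}
    heads = defaultdict(set)  # id(internal node) -> indices of head words assigned to it
    for dep, head in ds.items():
        dep_path, head_path = ancestors(leaf_of[dep]), ancestors(leaf_of[head])
        above_head = {id(n) for n in head_path}
        # closest common ancestor: first node above dep that also dominates head
        cut = next(i for i, n in enumerate(dep_path) if id(n) in above_head)
        lca = dep_path[cut]
        for n in dep_path[:cut]:      # dep's side, strictly below the ancestor
            heads[id(n)].add(dep)
        for n in head_path:           # head's side, up to and including the ancestor
            heads[id(n)].add(head)
            if n is lca:
                break
    internal = [n for n in all_nodes(ps_root) if n.index is None]
    return all(len(heads[id(n)]) == 1 for n in internal)


# "Vinken will join the board Nov. 29"
w = [Node(t, index=i) for i, t in enumerate(["Vinken", "will", "join", "the", "board", "Nov.", "29"])]
ps = Node("S", [Node("NP", [w[0]]),
                Node("VP", [w[1], Node("VP", [w[2], Node("NP", [w[3], w[4]]),
                                                Node("NP", [w[5], w[6]])])])])
ds = {0: 2, 1: 2, 3: 4, 4: 2, 5: 6, 6: 2}
print(check_consistency(ps, ds))  # True

# "who do you think came", with "who" attached to "came" in the DS
v = [Node(t, index=i) for i, t in enumerate(["who", "do", "you", "think", "came"])]
wh_ps = Node("SBARQ", [Node("WHNP", [v[0]]),
                       Node("SQ", [v[1], Node("NP", [v[2]]),
                                   Node("VP", [v[3], Node("S", [Node("VP", [v[4]])])])])])
wh_ds = {0: 4, 1: 3, 2: 3, 4: 3}
print(check_consistency(wh_ps, wh_ds))  # False
```

In the second example the VP headed by "think" and the SQ above it end up with two heads ("came" and "think"), which is what makes the pair inconsistent.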

SLIDE 23

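[Figure: the consistency check applied step by step to the pairs (Vinken, join), (board, join), (will, join), and (29, join); each PS node is annotated with the head assigned to it, e.g., (Vinken) (join) (join) (join) after the first pair.]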

SLIDE 24

Outline

  • Important concepts
  • Compatibility and consistency
  • Handling inconsistency

SLIDE 25

wh-movement

[Figure: after processing the pair (who, come), seven nodes are annotated "come"; the next pair to process is (come, think).]

SLIDE 26

wh-movement

[Figure: after processing (who, come) and (come, think), some nodes are annotated with both heads ("come | think").]

SLIDE 27

wh-movement

[Figure: after processing (who, come), (come, think), and (you, think), two nodes are marked "??" because they do not have exactly one head.]

SLIDE 28

Can DS and PS be inconsistent?

  • DS and PS can represent different aspects of the same overall picture and still be consistent:
    – Info provided in PropBank: e.g., empty subject, unaccusative
    – Info that is in PS only: e.g., traces
  • DS and PS should not choose “conflicting” analyses:
    – DS and PS are two images of the same underlying treebank, not two separate treebanks.
    – Ex: the ba-construction in Chinese: is ba a verb, a preposition, or something else?
    – Ex: free relatives: empty nominal head
  • Cases of inconsistency should be rare and well motivated.

SLIDE 29

How to handle inconsistency?

  • Detect inconsistency in (DS, PS) pairs in the guidelines
  • Consult guideline designers to determine whether the inconsistency can be resolved by changing analyses
  • If not, introduce DScons and ensure sufficient info is in DS for automatic conversion.

SLIDE 30

Two-stage conversion

  • DS to DScons: by removing “inconsistency” between DS and PS
  • DScons to PS: by applying conversion rules

SLIDE 31

Case #1: long-distance movement

[Figure panels: DSconst, DSprop]

  • Other examples: extraposition
  • Easily detectable due to non-projectivity
  • Create DSconst by moving the “moved element” up and leaving a trace
  • Which node is the “moved element”? The one that is separated from the other nodes in its subtree.
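A minimal sketch of the detection step, assuming a DS encoded as a list `heads` (`heads[i]` is word i's head index, -1 for the root). An arc is non-projective when some word between the dependent and its head is not dominated by the head. `moved_elements` implements one possible reading of the rule above (a heuristic, not necessarily the project's actual criterion): when a word's subtree covers a discontinuous span, the top word of each block not containing that word is treated as moved. The example is the English wh-question "who do you think came", with "who" attached to "came".

```python
from typing import Dict, List, Set, Tuple


def children_map(heads: List[int]) -> Dict[int, List[int]]:
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for dep, head in enumerate(heads):
        if head >= 0:
            kids[head].append(dep)
    return kids


def subtree(kids: Dict[int, List[int]], node: int) -> Set[int]:
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        seen.add(n)
        stack.extend(kids[n])
    return seen


def non_projective_arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """(dep, head) arcs spanning some word that the head does not dominate."""
    kids = children_map(heads)
    bad = []
    for dep, head in enumerate(heads):
        if head < 0:
            continue
        dominated = subtree(kids, head)
        lo, hi = sorted((dep, head))
        if any(k not in dominated for k in range(lo + 1, hi)):
            bad.append((dep, head))
    return bad


def moved_elements(heads: List[int]) -> List[int]:
    """Heuristic: top word(s) of each block of a discontinuous subtree not containing its head."""
    kids = children_map(heads)
    moved: Set[int] = set()
    for h in range(len(heads)):
        span = sorted(subtree(kids, h))
        # split the subtree's yield into maximal runs of adjacent positions
        blocks, current = [], [span[0]]
        for i in span[1:]:
            if i == current[-1] + 1:
                current.append(i)
            else:
                blocks.append(current)
                current = [i]
        blocks.append(current)
        for block in blocks:
            if h not in block:
                moved.update(i for i in block if heads[i] not in block)
    return sorted(moved)


wh_heads = [4, 3, 3, -1, 3]  # who do you think came
print(non_projective_arcs(wh_heads))  # [(0, 4)]
print(moved_elements(wh_heads))       # [0]
```

Creating DSconst would then re-attach the moved word higher up and leave a trace in its original slot; that step depends on the PS analysis and is not sketched here.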

SLIDE 32

Case #2: local scrambling

  • Detectable by assuming a canonical word order: k1 > k2
  • Need from the PS/DS teams: the canonical word order and which word orders trigger movement
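A sketch of the check, assuming that "k1 > k2" means k1 should precede k2 in the canonical order, and using a DS encoded as a head list plus a parallel list of dependency labels. `CANONICAL_ORDER` is a placeholder for whatever ordering the PS/DS teams provide; the transliterated Hindi tokens in the example ("seb raam_ne khaayaa", roughly 'Ram ate the apple' with the object fronted) are simplified for illustration, and the label "root" is hypothetical.

```python
from typing import List, Tuple

# Assumption: "k1 > k2" means k1 precedes k2; replace with the order the PS/DS teams supply.
CANONICAL_ORDER = ["k1", "k2"]


def scrambled_pairs(heads: List[int], labels: List[str]) -> List[Tuple[int, int]]:
    """(i, j) pairs of co-dependents, i before j, whose order contradicts CANONICAL_ORDER."""
    rank = {label: r for r, label in enumerate(CANONICAL_ORDER)}
    violations = []
    for h in sorted(set(heads)):
        if h < 0:
            continue
        # dependents of h that carry an ordered label, in surface order
        deps = [d for d in range(len(heads)) if heads[d] == h and labels[d] in rank]
        for a in range(len(deps)):
            for b in range(a + 1, len(deps)):
                i, j = deps[a], deps[b]
                if rank[labels[i]] > rank[labels[j]]:
                    violations.append((i, j))
    return sorted(violations)


# Object fronted: seb (apple, k2) raam_ne (Ram-ERG, k1) khaayaa (ate)
print(scrambled_pairs([2, 2, -1], ["k2", "k1", "root"]))  # [(0, 1)]
# Canonical order: raam_ne seb khaayaa
print(scrambled_pairs([2, 2, -1], ["k1", "k2", "root"]))  # []
```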

SLIDE 33

Case #3: small clause rule

  • Detectable by the dependency type k2s
  • Need confirmation from IIIT that k2s is used only for small clauses

SLIDE 34

Case #4: support verb

  • Detectable by the dependency type “pof”
  • Need confirmation from IIIT that “pof” is used only for support verbs
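Both label-based cases (k2s for small clauses, pof for support verbs) reduce to a lookup over the arcs. A minimal sketch, assuming a DS encoded as a head list plus a parallel label list and, as noted above, that IIIT confirms each label is used only for its construction; the transliterated example sentences and the "root" label are illustrative.

```python
from typing import List, Tuple

# Labels that trigger a DS-to-DScons transformation.
# Assumption (pending confirmation from IIIT): each label is used only for this construction.
TRIGGERS = {"k2s": "small clause", "pof": "support verb"}


def find_triggers(heads: List[int], labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (dependent, head, construction) for every arc whose label is a trigger."""
    return [(dep, heads[dep], TRIGGERS[label])
            for dep, label in enumerate(labels) if label in TRIGGERS]


# raam_ne use muurkh samjhaa ('Ram considered him a fool'): muurkh is k2s of samjhaa
print(find_triggers([3, 3, 3, -1], ["k1", "k2", "k2s", "root"]))  # [(2, 3, 'small clause')]
# raam_ne intazaar kiyaa ('Ram waited'): intazaar is pof of kiyaa
print(find_triggers([2, 2, -1], ["k1", "pof", "root"]))  # [(1, 2, 'support verb')]
```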

SLIDE 35

Conclusion

  • We define consistency between DS and PS.
  • DS and PS can be inconsistent, but such cases should be rare and well motivated.
  • We will handle inconsistency with the two-stage approach.

SLIDE 36

Conversion algorithm

SLIDE 37

Definition of conversion rule

  • A conversion rule is a (DS_pattern, PS_pattern) pair.
  • Ex:
  • Simplest case:
    – DS_pattern corresponds to only one dependency link
    – Decomposing the DS becomes trivial
    – PS_pattern is a tree fragment (e.g., wh-movement)
    – Learning rules from (PS, DS) pairs is easy
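In the simplest case, where each DS_pattern is a single dependency link, rule extraction from a consistent (DS, PS) pair can reuse the path computation from the consistency check. A minimal Python sketch, with an illustrative `Node` class whose leaves carry POS tags: the DS_pattern is the (dependent POS, head POS) pair, and the PS_pattern is simplified to the two chains of phrase labels from each word up to their closest common ancestor (sibling order is ignored).

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, label: str, children: Optional[List["Node"]] = None,
                 index: Optional[int] = None):
        self.label = label              # phrase label, or POS tag on a leaf
        self.children = children or []
        self.index = index              # word position; set only on leaves
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def leaves(node: Node) -> List[Node]:
    if node.index is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def path_up(node: Node) -> List[Node]:
    path, cur = [], node.parent
    while cur is not None:
        path.append(cur)
        cur = cur.parent
    return path


Rule = Tuple[Tuple[str, str], Tuple[Tuple[str, ...], Tuple[str, ...]]]


def extract_rules(ps: Node, ds: Dict[int, int]) -> List[Rule]:
    """One rule per link: (dep POS, head POS) -> (dep-side labels, head-side labels)."""
    leaf_of = {leaf.index: leaf for leaf in leaves(ps)}
    rules = []
    for dep, head in sorted(ds.items()):
        dep_path, head_path = path_up(leaf_of[dep]), path_up(leaf_of[head])
        # closest common ancestor: first node above dep that also dominates head
        cut = next(i for i, n in enumerate(dep_path) if any(n is m for m in head_path))
        lca = dep_path[cut]
        head_chain = head_path[: next(i for i, m in enumerate(head_path) if m is lca) + 1]
        ds_pattern = (leaf_of[dep].label, leaf_of[head].label)
        ps_pattern = (tuple(n.label for n in dep_path[:cut]),
                      tuple(n.label for n in head_chain))
        rules.append((ds_pattern, ps_pattern))
    return rules


tags = ["NNP", "MD", "VB", "DT", "NN", "NNP", "CD"]  # Vinken will join the board Nov. 29
w = [Node(t, index=i) for i, t in enumerate(tags)]
ps = Node("S", [Node("NP", [w[0]]),
                Node("VP", [w[1], Node("VP", [w[2], Node("NP", [w[3], w[4]]),
                                                Node("NP", [w[5], w[6]])])])])
ds = {0: 2, 1: 2, 3: 4, 4: 2, 5: 6, 6: 2}
rules = extract_rules(ps, ds)
print(rules[0])  # (('NNP', 'VB'), (('NP',), ('VP', 'VP', 'S')))
```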

SLIDE 38

Extracting rules

SLIDE 39

Rules extracted from the example

SLIDE 40

Input DS

SLIDE 41

41

slide-42
SLIDE 42

Gluing PS segments together

SLIDE 43

SLIDE 44
