Domain Adaptation for Constituency Parsing Using Partial Annotations


SLIDE 1

Domain Adaptation for Constituency Parsing Using Partial Annotations

Vidur Joshi, Matthew Peters, Mark Hopkins

SLIDE 2

Constituency Parsing is Useful

Textual Entailment (Bowman et al., 2016); Semantic Parsing (Hopkins et al., 2017); Sentiment Analysis (Socher et al., 2013); Language Modeling (Dyer et al., 2016)

SLIDE 3

Penn Treebank (PTB) (Marcus et al., 1993)

40,000 annotated sentences; newswire domain

SLIDE 4

But, Target Domains Are Diverse!

Geometry Problem: In the rhombus PQRS, PR = 24 and QS = 10.
Question: What's the second-most-used vowel in English?
Biochemistry: Ethoxycoumarin was metabolized by isolated epidermal cells via dealkylation to 7-hydroxycoumarin ( 7-OHC ) and subsequent conjugation.

SLIDE 5

Performance Outside Source Domain

Parse a geometry sentence with a PTB-trained parser.


SLIDE 8

How can we cheaply create high-quality parsers for new domains?

SLIDE 9

Relevant Recent Developments in NLP

  • Contextualized word representations improve sample efficiency (Peters et al., 2018).
  • Span-focused models achieve state-of-the-art constituency parsing results (Stern et al., 2017).

SLIDE 10

Contributions

  • Show that contextual word embeddings help domain adaptation, e.g., over 90% F1 on the Brown Corpus.
  • Adapt a parser using partial annotations, e.g., a 23% increase in correct geometry-domain parses.

SLIDE 11

Outline

  • Review: Contextual Word Representations
  • Partial Annotations: Definition; Training; Parsing as Span Classification; The Span Classification Model
  • Experiments and Results: Performance on PTB and New Domains; Adapting Using Partial Annotations

SLIDE 12

Contextualized Word Representations

ELMo, trained on the Billion Word Corpus (Peters et al., 2018).

SLIDE 13

Contextualized Word Representations

ELMo, trained on the Billion Word Corpus (Peters et al., 2018), improves sample efficiency.

SLIDE 14

Partial Annotations

Definition; Training; Parsing as Span Classification; The Span Classification Model

SLIDE 15

Selectively Annotate Important Phenomena

A triangle has a perimeter of 16 and one side of length 4.

SLIDE 16

Selectively Annotate Important Phenomena

A triangle has [a perimeter of 16] and one side of length 4.


SLIDE 18

Selectively Annotate Important Phenomena

A triangle has [a perimeter {of 16] and one side of length 4}.

SLIDE 19

Full Versus Partial Annotation

Full: (S (NP A triangle) (VP has (NP (NP (NP a perimeter) (PP of 16)) and (NP (NP one side) (PP of (NP length 4))))) .)
Partial: A triangle has [a perimeter {of 16] and one side of length 4}.

SLIDE 20

Partial Annotation Definition

A partial annotation is a labeled span.

A triangle has [a perimeter of 16] and one side of length 4 .
A triangle has [NP a perimeter of 16] and one side of length 4 .
A triangle has a perimeter {of 16 and one side of length 4} .
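The definition above can be sketched concretely: a partial annotation is nothing more than a token span plus an optional label. This is a minimal illustration, not the authors' code; the class and function names are made up for this example.

```python
from typing import List, NamedTuple

class PartialAnnotation(NamedTuple):
    start: int   # first token of the span (inclusive)
    end: int     # one past the last token (exclusive)
    label: str   # constituent label such as "NP"; "" if left unlabeled

def show(tokens: List[str], ann: PartialAnnotation) -> str:
    """Render a sentence with one bracketed annotation, as on the slides."""
    tag = f"{ann.label} " if ann.label else ""
    bracketed = "[" + tag + " ".join(tokens[ann.start:ann.end]) + "]"
    return " ".join(tokens[:ann.start] + [bracketed] + tokens[ann.end:])

tokens = "A triangle has a perimeter of 16 and one side of length 4 .".split()
print(show(tokens, PartialAnnotation(3, 7, "NP")))
# A triangle has [NP a perimeter of 16] and one side of length 4 .
```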

SLIDE 21

Why Partial Annotations?

Allowing annotators to selectively annotate important phenomena makes the process faster and simpler (Mielens et al., 2015).

SLIDE 22

Definition; Training; Parsing as Span Classification; The Span Classification Model

SLIDE 23

Objective for Full Annotation

SLIDE 24

Objective for Partial Annotation

Since we do not have a full parse, marginalize out components for which no supervision exists.

SLIDE 25

Objective for Partial Annotation

Marginalize out components for which no supervision exists. Expensive!
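The objective equations on slides 23-25 were images and did not survive extraction. A hedged reconstruction from the surrounding text (with T* the full gold tree, A a partial annotation, and T(S) the set of trees over sentence S) would be:

```latex
% Full annotation: maximize the log-likelihood of the gold tree.
\mathcal{L}_{\mathrm{full}}(\theta) = \log P_\theta(T^* \mid S)

% Partial annotation: marginalize over all trees consistent with the
% annotated spans A -- a sum over exponentially many trees ("Expensive!").
\mathcal{L}_{\mathrm{partial}}(\theta)
  = \log \sum_{\substack{T \in \mathcal{T}(S) \\ T \supseteq A}} P_\theta(T \mid S)
```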

SLIDE 26

One Solution: Approximation*

*(Mirroshandel and Nasr, 2011; Majidi and Crane, 2013; Nivre et al., 2014; Li et al., 2016)

SLIDE 27

Our Solution: Parsing as Span Classification

Assume the probability of a parse factors into a product of per-span probabilities.


SLIDE 30

Our Solution: Parsing as Span Classification

Assume the probability of a parse factors into a product of per-span probabilities. The objective then simplifies, and it is easy to compute if the model classifies spans!
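Under that factorization, the marginalization collapses: unannotated spans sum to one, so the objective reduces to a plain classification loss over only the annotated spans. A minimal pure-Python sketch (the function name, dictionary layout, and toy logits below are hypothetical, not the paper's code):

```python
import math

def span_nll(label_logits, annotations, label_index):
    """Negative log-likelihood of the annotated spans only, assuming the
    parse probability factors into independent per-span label distributions.

    label_logits: dict mapping (start, end) -> list of logits over labels
    annotations:  list of (start, end, label) partial annotations
    label_index:  dict mapping label string -> index into the logit list
    """
    nll = 0.0
    for start, end, label in annotations:
        logits = label_logits[(start, end)]
        log_z = math.log(sum(math.exp(x) for x in logits))  # log partition
        nll -= logits[label_index[label]] - log_z           # -log P(label | span)
    return nll

# Toy example: one annotated span, two candidate labels.
label_index = {"NP": 0, "VP": 1}
logits = {(3, 7): [2.0, 0.0]}
loss = span_nll(logits, [(3, 7, "NP")], label_index)  # ≈ 0.127
```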

SLIDE 31

Definition; Training; Parsing as Span Classification; The Span Classification Model

SLIDE 32

Parse Tree Labels All Spans*

*(Cross and Huang, 2016; Stern et al., 2017)


SLIDE 43

Training on Full and Partial Annotations

▪ A partial annotation is a labeled span.
▪ A full parse labels every span in the sentence.
▪ Therefore, training on both is identical under our derived objective.
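The equivalence above can be seen concretely: a full parse is just the set of labeled spans it contains (every other span implicitly carries the "no constituent" label), so full trees and partial annotations feed the same per-span loss. A sketch with trees as nested tuples, not the authors' code:

```python
def labeled_spans(tree, start=0):
    """Collect (start, end, label) for every constituent of a tree written
    as nested tuples (label, child, ...) with string leaves."""
    label, children = tree[0], tree[1:]
    spans, pos = [], start
    for child in children:
        if isinstance(child, str):          # leaf token
            pos += 1
        else:                               # subtree: recurse, then skip past it
            sub = labeled_spans(child, pos)
            spans.extend(sub)
            pos = sub[0][1]                 # end of the child's own span
    spans.insert(0, (start, pos, label))
    return spans

tree = ("S", ("NP", "She"), ("VP", "enjoys", ("NP", "tennis")), ".")
spans = {(s, e): lab for s, e, lab in labeled_spans(tree)}
# Every span absent from this dict implicitly carries the empty label,
# so a full parse assigns a label to all spans of the sentence.
```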

SLIDE 44

Parsing Using the Span Classification Model

Find the maximum using dynamic programming:
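The maximization on this slide can be done with the standard CKY-style recursion over split points, scoring a tree as the sum of its span scores. A sketch under that assumption, not the authors' implementation:

```python
import functools

def best_tree(score, n):
    """Highest-scoring binarized tree over tokens [0, n), where score(i, j)
    is the model's score (e.g., a log-probability) for span (i, j)."""
    @functools.lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:                      # single token: no split needed
            return score(i, j), (i, j)
        candidates = []
        for k in range(i + 1, j):           # try every split point
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k, j)
            candidates.append((left_score + right_score, (left_tree, right_tree)))
        s, children = max(candidates, key=lambda c: c[0])
        return s + score(i, j), ((i, j), children)
    return best(0, n)

# Hypothetical scores that favor grouping tokens 1 and 2 together.
s, t = best_tree(lambda i, j: 1.0 if (i, j) == (1, 3) else 0.0, 3)
```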


SLIDE 47

Summary

▪ Partial annotations are labeled spans.
▪ Use a span classification model to parse.
▪ Training on partial and full annotations becomes identical.

SLIDE 48

Definition; Training; Parsing as Span Classification; The Span Classification Model

SLIDE 49

Model Architecture (Stern et al., 2017)

She enjoys playing tennis .

(Figure: each word of the sentence feeds forward and backward LSTMs, yielding hidden-state vectors at every position.)

SLIDE 53

Span Embedding (Wang and Chang, 2016; Cross and Huang, 2016; Stern et al., 2017)

(Figure: the embedding of the span "enjoys playing" is computed from the biLSTM states over "She enjoys playing tennis .")
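A sketch of the span-embedding computation this slide depicts: a span is represented by the difference of forward LSTM states at its endpoints, concatenated with the corresponding backward-state difference. The 2-dimensional states below are made-up numbers purely for illustration.

```python
def span_embedding(fwd, bwd, i, j):
    """Embed span (i, j) in the style of Stern et al. (2017): concatenate
    the forward-state difference f[j] - f[i] with the backward-state
    difference b[i] - b[j]. fwd[k] / bwd[k] are biLSTM states at fencepost
    k (the boundary before token k). A sketch, not the real model."""
    f_diff = [a - b for a, b in zip(fwd[j], fwd[i])]
    b_diff = [a - b for a, b in zip(bwd[i], bwd[j])]
    return f_diff + b_diff

# "She enjoys playing tennis ." has 5 tokens, hence 6 fenceposts.
fwd = [[0.0, 0.0], [0.1, 0.2], [0.4, 0.1], [0.6, 0.5], [0.7, 0.6], [0.9, 0.8]]
bwd = [[0.8, 0.9], [0.7, 0.5], [0.5, 0.4], [0.3, 0.3], [0.2, 0.1], [0.0, 0.0]]
emb = span_embedding(fwd, bwd, 1, 3)  # span "enjoys playing"
```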

SLIDE 54

Model Architecture (Stern et al., 2017)

(Figure: the span embedding of "enjoys playing" is passed through an MLP to predict the span's label.)

SLIDE 55

Differences

                     Ours                          Stern et al., 2017
Objective            Maximum likelihood on labels  Maximum margin on trees
ELMo                 Yes                           No
POS Tags as Input    No                            Yes


SLIDE 59

Experiments and Results

Performance on PTB; Learning Curve on New Domains; Adapting Using Partial Annotations

SLIDE 60

Performance on PTB

Stern et al., 2017: 91.8 F1
+ ELMo: +2.2 F1
+ maximum likelihood on labels, without POS tags: +0.3 F1
= Ours: 94.3 F1

SLIDE 61

Performance on PTB

Previous SoTA ("Effective Inference for Generative Neural Parsing"): 92.6 F1
Ours: 94.3 F1, +1.7 F1 over previous SoTA*

*New SoTA is 95.1 F1 (Kitaev and Klein, ACL 2018)

SLIDE 62

Performance on PTB; Learning Curve on New Domains; Adapting Using Partial Annotations

SLIDE 63

Question Bank (Judge et al., 2006)

▪ 4,000 questions.
▪ In contrast, PTB has few questions.

Who is the author of the book, ``The Iron Lady: A Biography of Margaret Thatcher''?

SLIDE 64

Do We Need Domain Adaptation?

(Chart: F1 vs. number of parses from Question Bank.)
PTB-only baseline: 89.9 F1. Training on QB: +7.2%.

SLIDE 65

How Much Data Do We Need?

(Chart: F1 vs. number of parses from Question Bank. PTB-only baseline: 89.9 F1.)
From 0 to 100 parses: +6.3%. From 100 to 2,000 parses: +0.9%.

SLIDE 66

How Much Data Do We Need? Not much: improvements taper quickly.

(Chart: F1 vs. number of parses from Question Bank. PTB-only baseline: 89.9 F1.)

SLIDE 67

Experiments and Results

Performance on PTB; Learning Curve on New Domains; Adapting Using Partial Annotations

SLIDE 68

Geometry Problems (Seo et al., 2015): In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD at E. What is the length of BD?

Biochemistry (Nivre et al., 2007): Ethoxycoumarin was metabolized by isolated epidermal cells via dealkylation to 7-hydroxycoumarin ( 7-OHC ) and subsequent conjugation .

SLIDE 69

Setup

▪ The annotator is a parsing expert and sees the parser's output.
▪ Annotated sentences are randomly split into train and dev.

SLIDE 70

Biochemistry Annotations

610 partial annotations (avg. 4.6 per sentence); train: 72 sentences, dev: 62 sentences.

[ [ In situ ] hybridization ] has revealed a striking subnuclear distribution of [ c-myc RNA transcripts ] .

[ Cell growth of neuroblastoma cells in [ serum containing medium ] ] was clearly diminished by [ inhibition of FPTase ]

SLIDE 71

What do partial annotations buy us?

Correct constituents: +9.4%. Error-free sentences: +29.7%.

SLIDE 72

Geometry Annotations

379 partial annotations (avg. 3 per sentence); train: 63 sentences, dev: 62 sentences.

What is [ the value of [ y { + z } ] ] ?
[ Diameter AC ] is perpendicular [ to chord BD ] [ at E ] .
Find [ the measure of [ the angle designated by x ] ] .

SLIDE 73

What do partial annotations buy us?

Correct constituents: +15.1%. Error-free sentences: +33.4%.

SLIDE 74

Iterative Annotation

SLIDE 75

Error Analysis on Geometry Training Set

44% math syntax, e.g., "dimensions 16 by 8," "BAC = ¼ * ACB"
19% right-attaching participial adjectives, e.g., "segment labeled x," "the center indicated"
19% PP-attachment

SLIDE 76

Right-Attaching Participial Adjective Error

Find the hypotenuse of the triangle labeled t.


SLIDE 79

Iterative Annotation Proof-of-Concept

Invent 3 sentences similar to the incorrect one:
Find the hypotenuse of [ the triangle labeled t ] .
Given [ a circle with [ the tangent shown ] ] .
Examine [ the following diagram with [ the square highlighted ] ] .

SLIDE 80

Performance after Iterative Annotation

Correctly identified constituents: 87.0% → 88.6% (+1.6)
Error-free sentences: 72.6% → 75.8% (+3.2)

SLIDE 81

Conclusion

  • Recent developments make it much easier to train on partial annotations and build custom parsers.
  • Making a few partial annotations can lead to significant performance improvements.

Demo: http://demo.allennlp.org/constituency-parsing
Datasets: https://github.com/vidurj/parser-adaptation/tree/master/data
