Annotating and Automatically Tagging Constructions of Causal Language (PowerPoint presentation transcript)


SLIDE 1

Annotating and Automatically Tagging Constructions of Causal Language

SLIDES 2-5

What Google displays for “why” questions could be a lot more helpful.

SLIDES 6-7

Such cause-and-effect questions & assertions are far from rare.
(Chart figures: 33%, 12%, >5%)

SLIDES 8-9

We’d like to be able to parse causal relationships in text.
(Examples: a “Why …?” question; “set the stage for” labeled CAUSATION)

SLIDES 10-12

This style of analysis is known as “shallow semantic parsing.”
(Example: “set the stage for” labeled CAUSATION)

SLIDE 13

Task definition: connective discovery (e.g., finding the connective “because”) + argument identification.

SLIDES 14-18

Causality is expressed in an enormous variety of ways: impede, because of, so, opens the way for, so that, After, The more … the less.

SLIDES 19-23

Each existing semantic parsing representation handles only a portion of this space:
- MAKE.02 (“made”): verbs only
- CONTINGENCY:Cause (“so”): conjunctions & adverbs only
- CAUSATION (“made”): words or constituents only
- (one) meaning per (one) word

SLIDES 24-27

Construction grammar (CxG) offers a way forward.
(Example: the partially-fixed pattern “_____ ______ _______ so that” paired with the meaning EXTREME(_______))
SLIDES 28-30

(It’s not just causality, either.)
- Comparatives: as … as, More than, as … as … is
- Concessives: in spite of, no matter

SLIDES 31-37

Full CxG theory means “constructions all the way down”:

Example: “so offensive that I left”
- ⟨so ___⟩ → EXTREME(__)
- ⟨that ____⟩ → complement ____
- ⟨so ___ that ____⟩ → EXTREME(__) ____
- plus the word-level constructions ⟨_ _ _ _ _ _⟩ (e.g., “offensive,” “I left”)

SLIDES 38-39

The “constructions on top” approach reaps the low-hanging fruit from applying CxG to NLP:
… → Tokenization → POS tagging, syntactic parsing → Construction recognition → Tagging causal relations

SLIDE 40

“Constructions on top” borrows two key insights of CxG:
1. Words, multi-word expressions, and grammar are all on equal footing as “learned pairings of form and function.”
2. Constructions pair patterns of surface forms directly with meanings.

SLIDES 41-44

Using the “constructions on top” approach to applying CxG, we can:
- [build] richer, more flexible linguistic representations
- design annotation guidelines & annotate a corpus
- [train] automated machine learning taggers

SLIDES 45-46

Today’s talk:
1. The BECAUSE annotation scheme & corpus of causal language
2. Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
3. DeepCx: a neural, transition-based tagger for causal constructions

SLIDE 47

Previous projects have struggled to annotate real-world causality.
(Examples: <e1>flu</e1> <e2>virus</e2> with Cause-Effect(e2, e1) = "true"; “allocated” / “equip” labeled BEFORE-PRECONDITIONS)

SLIDE 48

Existing shallow semantic parsing schemes include some elements of causal language.
(Example: “made” labeled CAUSATION)

SLIDE 49

Causal language: a clause or phrase in which one event, state, action, or entity is explicitly presented as promoting or hindering another.

SLIDES 50-51

Connective: a fixed constructional cue indicating a causal relationship (e.g., because, prevented from, causes).

SLIDE 52

Cause: presented as producing the effect. Effect: presented as the outcome.
- John trapped the fox [because] it was threatening his chickens
- John [prevented] the fox [from] eating his chickens
- Ice cream consumption [causes] drowning

SLIDE 53

Connectives can be arbitrarily complex (e.g., For … to, opens the way for).
SLIDE 54

We distinguish three types of causation:
- CONSEQUENCE (e.g., because of)
- MOTIVATION (e.g., because)
- PURPOSE (e.g., in order to)

SLIDE 55

Latest annotation scheme shows very good inter-annotator agreement (F1 = 0.77, κ = 0.70).
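The κ figure above is Cohen’s kappa. As a reminder of how such chance-corrected agreement numbers are computed, here is a minimal sketch (toy labels, not the actual annotation data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[l] * freq_b[l] for l in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two annotators labeling 10 instances as causal (C) or not (N).
a = list("CCCNNCCNCN")
b = list("CCNNNCCNCC")
print(round(cohens_kappa(a, b), 3))  # -> 0.583
```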

SLIDE 56

We have annotated a small corpus with this scheme: BECauSE (Bank of Effects and Causes Stated Explicitly).
(Table totals: 121, 4790, 1803)

SLIDE 57

Actual corpus examples can get quite complex (average causal sentence length: 30 words), with connectives such as For … to, must, allowed to, If, prevents, and because.

SLIDES 58-59

Today’s talk:
1. The BECAUSE annotation scheme & corpus of causal language
2. Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
3. DeepCx: a neural, transition-based tagger for causal constructions

SLIDE 60

The computational task is challenging:
- Long tail of causal connectives
- Requires sense disambiguation of connectives (e.g., causal vs. non-causal “for … to”)
- Complex output structure
- Combinatorial connective possibilities

SLIDES 61-66

The Causeway pipeline (running example: causal “I…died from worry” vs. non-causal “…called me from your hotel”):
1. Pattern-based connective discovery
2. Argument identification
3. Statistical classifier to filter results
4. Remove duplicate connectives

SLIDES 67-68

Causeway-S/Causeway-L: two pattern-based taggers for causal constructions
i. Causeway-S: syntax-based pipeline
ii. Causeway-L: lexical pattern-based pipeline

SLIDES 69-72

In Causeway-S, each construction is treated as a partially-fixed parse tree fragment (e.g., “because/IN” attached by a mark dependency to a clause attached by advcl).

SLIDES 73-74

TRegex patterns are extracted in training and matched at test time. For the connective “because”:

(/^because_[0-9]+$/ <2 /^IN.*/ <1 mark > (/.*_[0-9]+/ <1 advcl > (/.*_[0-9]+/)))

SLIDES 75-76

Argument heads are expanded to include most dependents (e.g., the argument heads care/VBP and worry/VBP).

SLIDES 77-78

1. Causeway-S/Causeway-L: two simple systems for tagging causal constructions
i. Causeway-S: syntax-based pipeline
ii. Causeway-L: lexical pattern-based pipeline

SLIDE 79

In Causeway-L, constructions are matched by regular expressions over word lemmas. For the connective “because”:

(^| )(\S+ )+?(because/IN) (\S+ )+?
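As an illustration (a minimal sketch, not Causeway-L’s actual code), the slide’s lemma-level pattern can be applied to a lemma/POS-tagged sentence like this:

```python
import re

# Simplified version of the slide's pattern: lazy runs of "lemma/POS" tokens
# around the fixed connective token "because/IN".
PATTERN = re.compile(r"(^| )(\S+ )+?(because/IN) (\S+ )+?")

def to_lemma_pos_string(tokens):
    """Render (lemma, POS) pairs in the 'lemma/POS' format the pattern expects."""
    return " ".join(f"{lemma}/{pos}" for lemma, pos in tokens) + " "

sentence = [("i", "PRP"), ("leave", "VBD"), ("because", "IN"),
            ("it", "PRP"), ("rain", "VBD")]
match = PATTERN.search(to_lemma_pos_string(sentence))
print(match.group(3))  # -> because/IN
```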

SLIDE 80

Arguments are labeled by a conditional random field.
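To make the CRF step concrete, here is a minimal sketch of Viterbi decoding over BIO-style Cause/Effect labels (toy hand-set scores, not Causeway’s learned model):

```python
def viterbi(emissions, transitions, labels):
    """Decode the best label sequence of a linear-chain CRF by dynamic programming.

    emissions: per-token dicts mapping label -> score;
    transitions: dict mapping (prev_label, label) -> score (default 0).
    """
    best = [dict(emissions[0])]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[-1][yp] + transitions.get((yp, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + em[y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    y = max(labels, key=lambda yl: best[-1][yl])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]

LABELS = ["O", "B-Cause", "I-Cause", "B-Effect", "I-Effect"]
# Toy emission scores for "I left because it rained" (connective: "because").
emissions = [
    {"O": 0, "B-Cause": -1, "I-Cause": -9, "B-Effect": 2, "I-Effect": -9},
    {"O": 0, "B-Cause": -1, "I-Cause": -9, "B-Effect": -1, "I-Effect": 2},
    {"O": 3, "B-Cause": -1, "I-Cause": -9, "B-Effect": -1, "I-Effect": -9},
    {"O": 0, "B-Cause": 2, "I-Cause": -9, "B-Effect": -1, "I-Effect": -9},
    {"O": 0, "B-Cause": -1, "I-Cause": 2, "B-Effect": -1, "I-Effect": -9},
]
# Forbid I-X tags that do not continue an X span.
transitions = {(a, b): -100.0 for a in LABELS for b in ["I-Cause", "I-Effect"]
               if b[2:] != a[2:]}
print(viterbi(emissions, transitions, LABELS))
# -> ['B-Effect', 'I-Effect', 'O', 'B-Cause', 'I-Cause']
```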

SLIDES 81-84

Both approaches use a soft vote of three classifiers as a filter.
(Example classifier features: …)
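A soft vote averages the classifiers’ predicted probabilities rather than their hard decisions. A minimal sketch (hypothetical probabilities and threshold, not the actual classifiers):

```python
def soft_vote(probabilities, threshold=0.5):
    """Keep a candidate connective if the mean predicted probability of
    'truly causal' across the classifiers clears the threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean >= threshold

# Three classifiers score one candidate connective instance each.
print(soft_vote([0.9, 0.4, 0.45]))  # mean ~0.58 -> kept
print(soft_vote([0.6, 0.1, 0.2]))   # mean 0.30 -> filtered out
```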

SLIDE 85

Our benchmark is a dependency path memorization heuristic.
(Chart figures: 27/40/8, 14/1)
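The benchmark can be pictured as follows (an illustrative sketch under assumed data structures, not the actual implementation): memorize the connective-to-argument dependency paths seen in training, and tag a test candidate only if its path was seen.

```python
from collections import Counter

def train_memorizer(instances):
    """Count the dependency paths observed for each connective lemma in training."""
    seen = Counter()
    for inst in instances:
        seen[(inst["connective"], inst["dep_path"])] += 1
    return seen

def predict(seen, connective, dep_path, min_count=1):
    """Tag a test candidate as causal iff its (connective, path) was memorized."""
    return seen[(connective, dep_path)] >= min_count

# Hypothetical training instances with simplified path strings.
train = [
    {"connective": "because", "dep_path": "advcl/mark"},
    {"connective": "because", "dep_path": "advcl/mark"},
    {"connective": "from", "dep_path": "nmod/case"},
]
seen = train_memorizer(train)
print(predict(seen, "because", "advcl/mark"))  # True
print(predict(seen, "so", "advcl/mark"))       # False
```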

SLIDES 86-90

Connective discovery: Causeway outperforms the benchmark by ~20 points, and performance improves even more when Causeway is combined with the benchmark. The first stage gets high recall & low precision, but the filters balance them out for a better F1.
(Chart figures: 11.5%, 51.7%, 13.4%, 50.8%, 31.6%, 54.9%, 54.6%)

SLIDES 91-92

Argument identification is passable given connective discovery, though effects are harder than causes.

SLIDES 93-94

Today’s talk:
1. The BECAUSE annotation scheme & corpus of causal language
2. Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
3. DeepCx: a neural, transition-based tagger for causal constructions

SLIDE 95

Transition-based tagging builds a complex output structure using a sequence of simple operations.

SLIDES 96-125

The DeepCx transition scheme (animated walkthrough): scanning the sentence, the tagger updates its state and a partially-constructed instance with one of a small set of transitions (NO-CONN, NEW-CONN, CONN-FRAG-R, LEFT-ARG(x), NO-ARG-L, RIGHT-ARG(x), NO-ARG-R) until the instance is fully constructed.
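To show how such transitions assemble a causal instance, here is a minimal interpreter for a simplified version of the scheme (the operation semantics below are illustrative assumptions, not DeepCx’s exact definitions):

```python
CONN, CAUSE, EFFECT = "Conn", "Cause", "Effect"

def run_transitions(words, transitions):
    """Interpret a flat transition sequence over `words`.

    A cursor scans words left to right: NO-CONN advances it; NEW-CONN starts an
    instance anchored at the cursor word. LEFT-ARG(x)/NO-ARG-L then label each
    word left of the anchor, and RIGHT-ARG(x)/CONN-FRAG-R/NO-ARG-R label each
    word to its right.
    """
    instances, cur, i = [], None, 0
    left_ptr = right_ptr = 0
    for op, *arg in transitions:
        if op == "NO-CONN":
            i += 1
        elif op == "NEW-CONN":
            cur = {CONN: [words[i]], CAUSE: [], EFFECT: []}
            instances.append(cur)
            left_ptr, right_ptr = 0, i + 1
        elif op == "LEFT-ARG":
            cur[arg[0]].append(words[left_ptr]); left_ptr += 1
        elif op == "NO-ARG-L":
            left_ptr += 1
        elif op == "RIGHT-ARG":
            cur[arg[0]].append(words[right_ptr]); right_ptr += 1
        elif op == "CONN-FRAG-R":
            cur[CONN].append(words[right_ptr]); right_ptr += 1
        elif op == "NO-ARG-R":
            right_ptr += 1
    return instances

words = ["I", "left", "because", "it", "rained"]
script = [("NO-CONN",), ("NO-CONN",), ("NEW-CONN",),
          ("LEFT-ARG", EFFECT), ("LEFT-ARG", EFFECT),
          ("RIGHT-ARG", CAUSE), ("RIGHT-ARG", CAUSE)]
print(run_transitions(words, script))
# -> [{'Conn': ['because'], 'Cause': ['it', 'rained'], 'Effect': ['I', 'left']}]
```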

SLIDES 126-128

DeepCx uses long short-term memory (LSTM) networks to embed sequences of words; the network’s output is a summary of the inputs so far.
(Diagram: chained LSTM cells combining each input x_t with the previous state through σ and tanh gates to produce y_t.)
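The cells in the diagram can be sketched as follows (a from-scratch illustration of the standard LSTM equations with random toy weights, not DeepCx’s actual network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM cell: sigma gates and a tanh candidate update the memory c and
    emit the hidden state h, as in the diagram's sigma/tanh/x/+ nodes."""
    hidden = h.shape[0]
    z = W @ np.concatenate([x, h]) + b     # all four gate pre-activations
    i = sigmoid(z[:hidden])                # input gate
    f = sigmoid(z[hidden:2 * hidden])      # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(z[3 * hidden:])            # candidate memory
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def embed_sequence(xs, W, b, hidden):
    """Run the cell over a word-vector sequence; the final h summarizes it."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
    return h

rng = np.random.default_rng(0)
dim, hidden = 4, 3
W = rng.normal(scale=0.5, size=(4 * hidden, dim + hidden))
b = np.zeros(4 * hidden)
words = rng.normal(size=(5, dim))  # five toy word vectors
summary = embed_sequence(words, W, b, hidden)
print(summary.shape)  # -> (3,)
```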

SLIDES 129-135

The network predicting transition probabilities embeds the tagger state piece by piece:
- Connective, cause, effect, and means words
- Uncompared and compared words (left and right) and the comparison word
- Parse path
- Action history
Each word is represented by fixed word vectors, learned word vectors, and a POS tag embedding.

SLIDES 136-139

DeepCx significantly outperforms Causeway on connective discovery.
(Chart figures: 29.5%, 53.1%, 52.3%, 59.2%, 60.5%)
SLIDES 140-142

DeepCx also significantly outperforms Causeway on argument identification.
SLIDE 143

Today’s talk:
1. The BECAUSE annotation scheme & corpus of causal language
2. Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
3. DeepCx: a neural, transition-based tagger for causal constructions

SLIDES 144-145

Contributions:
1. The “constructions on top” (COT) approach to operationalizing CxG
2. A COT-based approach to comprehensively annotating causal language
3. Pattern-based methods & architecture for tagging causal constructions
4. Transition scheme & DNN architecture for tagging complex constructions