Annotating and Automatically Tagging Constructions of Causal Language

What Google displays for “why” questions could be a lot more helpful.
Such cause-and-effect questions & assertions are far from rare. (Chart values: 33%, 12%, >5%.)
We’d like to be able to parse causal relationships in text, e.g., tagging “set the stage for” as expressing CAUSATION. This style of analysis is known as “shallow semantic parsing.”
Task definition: connective discovery + argument identification (e.g., finding the connective “because,” then identifying its arguments).
Causality is expressed in an enormous variety of ways: impede, because of, so, opens the way for, so that, after, the more … the less ….
Each existing semantic parsing representation handles only a portion of this space:
- MAKE.02 (“made”): verbs only
- CONTINGENCY:Cause (“so”): conjunctions & adverbs only
- CAUSATION (“made”): words or constituents only, (one) meaning per word
Construction grammar (CxG) offers a way forward: e.g., a partially-fixed “so … that” template paired with the meaning EXTREME(___).
(It’s not just causality, either.)
- Comparatives: as … as, more than, as … as … is
- Concessives: in spite of, no matter
Full CxG theory means “constructions all the way down.” For “so offensive that I left”:
- 〈so ___〉 → EXTREME(__)
- 〈that ____〉 → complement ____
- 〈so ___ that ____〉 → EXTREME(__) ____
The “constructions on top” approach reaps the low-hanging fruit of applying CxG to NLP: tokenization → POS tagging & syntactic parsing → construction recognition → tagging causal relations.
“Constructions on top” borrows two key insights of CxG:
- 1. Words, multi-word expressions, and grammar are all on equal footing as “learned pairings of form and function.”
- 2. Constructions pair patterns of surface forms directly with meanings.
Using the “constructions on top” approach to applying CxG, we can:
- Build richer, more flexible linguistic representations
- Design annotation guidelines & annotate a corpus
- Train automated machine learning taggers
Today’s talk:
- The BECAUSE annotation scheme & corpus of causal language
- Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
- DeepCx: a neural, transition-based tagger for causal constructions
Previous projects have struggled to annotate real-world causality:
- <e1>flu</e1> … <e2>virus</e2> → Cause-Effect(e2, e1) = "true"
- allocated … equip → BEFORE-PRECONDITIONS
Existing shallow semantic parsing schemes include some elements of causal language (e.g., “made” tagged as CAUSATION).
Causal language: a clause or phrase in which one event, state, action, or entity is explicitly presented as promoting or hindering another.
Connective: a fixed constructional cue indicating a causal relationship (because, prevented from, causes).

Cause: presented as producing the effect. Effect: presented as the outcome.
e.g., “John trapped the fox [because] it was threatening his chickens”; “John [prevented] the fox [from] eating his chickens”; “Ice cream consumption [causes] drowning.”
Connectives can be arbitrarily complex: for … to …, opens the way for.
We distinguish three types of causation:
- CONSEQUENCE (because of)
- MOTIVATION (because)
- PURPOSE (in order to)
The latest annotation scheme shows very good inter-annotator agreement (F1 = 0.77, κ = 0.70).
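The κ here is Cohen’s kappa, agreement corrected for chance. As a quick illustration of the statistic (with toy labels, not the actual BECAUSE annotations), it can be computed in a few lines:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators, corrected
    for the agreement expected by chance from each annotator's label rates."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two annotators marking 10 tokens as connective (C) or not (O).
a = list("CCOOOCOOOC")
b = list("CCOOOOOOOC")
print(round(cohens_kappa(a, b), 2))  # → 0.78
```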
We have annotated a small corpus with this scheme: BECAUSE, the Bank of Effects and Causes Stated Explicitly. (Totals: 121 documents, 4790 sentences, 1803 causal instances.)
Actual corpus examples can get quite complex, with connectives such as for … to, must, allowed to, if, prevents, and because. Average causal sentence length: 30 words.
Today’s talk:
- The BECAUSE annotation scheme & corpus of causal language
- Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
- DeepCx: a neural, transition-based tagger for causal constructions
The computational task is challenging:
- Long tail of causal connectives
- Requires sense disambiguation of connectives (causal vs. non-causal for … to)
- Complex output structure
- Combinatorial connective possibilities (e.g., the many uses of from)
The Causeway pipeline:
- 1. Pattern-based connective discovery (e.g., from in both “I…died from worry” and “…called me from your hotel”)
- 2. Argument identification
- 3. Statistical classifier to filter results (keeping the causal “died from worry,” discarding “called me from your hotel”)
- 4. Remove duplicate connectives
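The four stages above can be sketched as a toy pipeline. Everything here is illustrative: the function names, the trivial pattern set, and the stubbed filter are stand-ins for Causeway’s actual components.

```python
def discover_connectives(tokens, patterns):
    """Stage 1: pattern-based connective discovery (every pattern match)."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in patterns]

def identify_arguments(tokens, conn_index):
    """Stage 2: naive argument identification: effect before, cause after."""
    return {"effect": tokens[:conn_index], "cause": tokens[conn_index + 1:]}

def keep_instance(instance):
    """Stage 3: statistical filter (stubbed here: require a non-empty cause)."""
    return bool(instance["cause"])

def tag_sentence(tokens, patterns=frozenset({"because", "from"})):
    candidates = [
        {"connective": tokens[i], **identify_arguments(tokens, i)}
        for i in discover_connectives(tokens, patterns)
    ]
    filtered = [c for c in candidates if keep_instance(c)]
    # Stage 4: remove duplicate connectives (keep the first occurrence).
    seen, deduped = set(), []
    for c in filtered:
        if c["connective"] not in seen:
            seen.add(c["connective"])
            deduped.append(c)
    return deduped

print(tag_sentence("I died from worry".split()))
```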
Causeway-S/Causeway-L: two pattern-based taggers for causal constructions
- i. Causeway-S: syntax-based pipeline
- ii. Causeway-L: lexical pattern-based pipeline
In Causeway-S, each construction is treated as a partially-fixed parse tree fragment (e.g., because/IN attached by a mark relation within an advcl).
TRegex patterns are extracted in training and matched at test time. For the “because” fragment above:

(/^because_[0-9]+$/ <2 /^IN.*/ <1 mark > (/.*_[0-9]+/ <1 advcl > (/.*_[0-9]+/)))
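The idea of matching a partially-fixed parse fragment can be illustrated without TRegex by searching dependency triples directly. The hand-built parse and the helper below are hypothetical simplifications of what Causeway-S does against real parse trees:

```python
def match_because_advcl(deps, tags):
    """Find (head, 'because') pairs where some head takes 'because' (tagged IN)
    as a mark, and that head itself attaches to its parent as an advcl."""
    matches = []
    for head, rel, dep in deps:
        if rel == "mark" and dep.lower() == "because" and tags[dep] == "IN":
            # `head` is the adverbial clause's head; check the advcl attachment.
            if any(r == "advcl" and d == head for _, r, d in deps):
                matches.append((head, dep))
    return matches

# "I left because it was offensive": left -advcl-> was, was -mark-> because
deps = [("left", "advcl", "was"), ("was", "mark", "because"),
        ("was", "nsubj", "it"), ("left", "nsubj", "I")]
tags = {"because": "IN", "was": "VBD", "left": "VBD", "it": "PRP", "I": "PRP"}
print(match_because_advcl(deps, tags))  # → [('was', 'because')]
```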
Argument heads are expanded to include most of their dependents (e.g., starting from the heads care/VBP and worry/VBP).
Causeway-S/Causeway-L: two simple systems for tagging causal constructions
- i. Causeway-S: syntax-based pipeline
- ii. Causeway-L: lexical pattern-based pipeline
In Causeway-L, constructions are matched by regular expressions over word lemmas. For “because”:

(^| )([\S]+ )+?(because/IN) ([\S]+ )+?
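A minimal Python analogue of this lemma-level matching, assuming a simplified `lemma/POS` string encoding (the real pattern-extraction machinery differs):

```python
import re

def lemma_pos_string(tagged):
    """Flatten (lemma, POS) pairs into the 'lemma/POS lemma/POS ...' form
    that the lexical patterns match against."""
    return " ".join(f"{lemma}/{pos}" for lemma, pos in tagged) + " "

# Simplified version of the slide's pattern for "because".
pattern = re.compile(r"(^| )(\S+ )+?(because/IN) (\S+ )+?")

tagged = [("I", "PRP"), ("leave", "VBD"), ("because", "IN"),
          ("it", "PRP"), ("be", "VBD"), ("offensive", "JJ")]
match = pattern.search(lemma_pos_string(tagged))
print(match.group(3))  # → because/IN
```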
Arguments are then labeled by a conditional random field (CRF).
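A sketch of the kind of per-token feature dictionaries a linear-chain CRF argument labeler consumes. The feature set here is invented for illustration rather than Causeway’s actual features, and the CRF itself (which would predict BIO labels such as B-Cause, I-Cause, O over these dicts) is omitted:

```python
def token_features(tokens, i, connective_index):
    """Features for token i relative to an already-discovered connective."""
    tok = tokens[i]
    return {
        "lemma": tok.lower(),
        "is_connective": i == connective_index,
        "side_of_connective": "L" if i < connective_index else "R",
        "distance_to_connective": abs(i - connective_index),
        "prev_lemma": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_lemma": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

tokens = "I died from worry".split()
feats = [token_features(tokens, i, connective_index=2)
         for i in range(len(tokens))]
print(feats[3]["side_of_connective"], feats[3]["distance_to_connective"])
```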
Both approaches use a soft vote of three classifiers as a filter.
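Soft voting just averages the classifiers’ predicted probabilities and thresholds the mean. A minimal sketch, with made-up classifier outputs:

```python
def soft_vote(probabilities, threshold=0.5):
    """Keep a candidate iff the mean predicted P(causal) crosses the threshold."""
    return sum(probabilities) / len(probabilities) >= threshold

# Three classifiers' P(causal) for two candidate connective instances:
died_from_worry = [0.9, 0.7, 0.6]    # mean ≈ 0.73 → keep
called_from_hotel = [0.4, 0.6, 0.2]  # mean = 0.4  → discard
print(soft_vote(died_from_worry), soft_vote(called_from_hotel))  # → True False
```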
Our benchmark is a dependency path memorization heuristic (memorizing, for each path, counts of causal vs. non-causal uses, e.g., 27/4, 0/8, 14/1, …).
Connective discovery: Causeway outperforms the benchmark by ~20 points, and performance improves even more when Causeway is combined with the benchmark. The first stage gets high recall & low precision, but the filters balance them out for a better F1. (Chart values: 11.5%, 51.7%, 13.4%, 50.8%, 31.6%, 54.9%, 54.6%.)
Argument identification is passable given connective discovery, though effects are harder than causes.
Today’s talk:
- The BECAUSE annotation scheme & corpus of causal language
- Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
- DeepCx: a neural, transition-based tagger for causal constructions
Transition-based tagging builds a complex output structure using a sequence of simple operations.
The DeepCx transition scheme: the tagger state tracks a partially-constructed instance as it scans the sentence, choosing one transition per word comparison:
- NO-CONN: the current word does not begin a connective
- NEW-CONN: begin a new connective at the current word
- CONN-FRAG-R: attach a word to the right as another fragment of the connective
- LEFT-ARG(x) / NO-ARG-L: add (or skip) a word to the connective’s left as part of argument x
- RIGHT-ARG(x) / NO-ARG-R: add (or skip) a word to the connective’s right as part of argument x
The instance is fully constructed once every word has been compared.
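To make the scheme concrete, here is a toy interpreter that applies a transition sequence to build a causal instance. The transition names follow the slides, but the cursor mechanics are a deliberate simplification of DeepCx’s actual comparison order:

```python
def apply_transitions(tokens, transitions):
    """Replay a transition sequence over a sentence, accumulating the
    connective and its Cause/Effect argument spans."""
    instance = {"connective": [], "Cause": [], "Effect": []}
    conn, cursor = None, 0

    def advance(c):
        c += 1
        return c + 1 if c == conn else c  # skip over the connective itself

    for action in transitions:
        name = action[0]
        if name == "NO-CONN":
            cursor += 1
        elif name == "NEW-CONN":
            conn = cursor
            instance["connective"].append(tokens[cursor])
            cursor = 0  # restart comparison from the sentence's left edge
        elif name == "CONN-FRAG-R":
            instance["connective"].append(tokens[cursor])
            cursor = advance(cursor)
        elif name in ("LEFT-ARG", "RIGHT-ARG"):
            instance[action[1]].append(tokens[cursor])
            cursor = advance(cursor)
        elif name in ("NO-ARG-L", "NO-ARG-R"):
            cursor = advance(cursor)
    return instance

tokens = "I left because it was offensive".split()
transitions = [
    ("NO-CONN",), ("NO-CONN",), ("NEW-CONN",),       # connective = because
    ("LEFT-ARG", "Effect"), ("LEFT-ARG", "Effect"),  # effect = I left
    ("RIGHT-ARG", "Cause"), ("RIGHT-ARG", "Cause"),  # cause = it was offensive
    ("RIGHT-ARG", "Cause"),
]
print(apply_transitions(tokens, transitions))
```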
DeepCx uses long short-term memory (LSTM) networks to embed sequences of words. [Diagram: a chain of LSTM cells, each combining σ and tanh gates with × and + operations to map inputs x1, x2, x3 to outputs y1, y2, y3.]
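A single LSTM step, written out gate-by-gate in plain Python so the σ/tanh structure of the diagram is explicit. Real LSTMs operate on vectors with learned weight matrices; the scalar weights here are arbitrary:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    f = sigmoid(w["f"] * x + w["uf"] * h_prev)    # forget gate (σ)
    i = sigmoid(w["i"] * x + w["ui"] * h_prev)    # input gate (σ)
    g = math.tanh(w["g"] * x + w["ug"] * h_prev)  # candidate cell (tanh)
    o = sigmoid(w["o"] * x + w["uo"] * h_prev)    # output gate (σ)
    c = f * c_prev + i * g                        # new cell state (× and +)
    h = o * math.tanh(c)                          # new hidden state / output
    return h, c

w = {"f": 0.5, "uf": 0.1, "i": 0.5, "ui": 0.1,
     "g": 1.0, "ug": 0.1, "o": 0.5, "uo": 0.1}
h, c = 0.0, 0.0
for x in [1.0, 2.0, 3.0]:  # embed the sequence x1, x2, x3
    h, c = lstm_step(x, h, c, w)
print(round(h, 3))
```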
The network embeds each information source and feeds the summaries to the tagger: the action history; connective, cause, effect, and means words; uncompared and compared words to the left and right of the current comparison word; the parse path; and the tagger state. Word inputs combine fixed word vectors, learned word vectors, and a POS tag embedding. The output is a set of transition probabilities.
DeepCx significantly outperforms Causeway on connective discovery. (Chart values: 29.5%, 53.1%, 52.3%, 59.2%, 60.5%.)
DeepCx also significantly outperforms Causeway on argument identification.
Today’s talk:
- The BECAUSE annotation scheme & corpus of causal language
- Causeway-L/Causeway-S: two pattern-based taggers for causal constructions
- DeepCx: a neural, transition-based tagger for causal constructions
Contributions
- 1. The “constructions on top” approach to operationalizing CxG
- 2. A COT-based approach to comprehensively annotating causal language
- 3. Pattern-based methods & architecture for tagging causal constructions
- 4. Transition scheme & DNN architecture for tagging complex constructions