Annotating Causal Language Using Corpus Lexicography of Constructions (Jesse Dunietz, Lori Levin, and Jaime Carbonell): PowerPoint PPT Presentation



SLIDE 1

Annotating Causal Language Using Corpus Lexicography of Constructions

Jesse Dunietz, Lori Levin, and Jaime Carbonell LAW 2015 June 5, 2015

SLIDE 2

Contributions of this paper

• Raising issues about corpus annotation: low agreement among non-experts; methodology for annotation projects
• Lexicon-driven annotation, as in PropBank and FrameNet
• An annotation scheme for causal language in English
• A constructicon of causal language in English
• A small annotated corpus of causal language in English
• All still in progress

SLIDE 3

Causal relations would be useful to annotate well…

• Ubiquitous in our mental models
• Ubiquitous in language (the 2nd most common relation between verbs)
• Useful for downstream applications (e.g., information extraction): medical symptoms, political events, interpersonal actions

Example: “The prevention of FOXP3 expression was not caused by interferences.”

SLIDE 4

…but annotating them raises difficult annotation issues.

Example connectives: causes; because of; for reasons of; forbid to; convinced to; too … to; if; after; don’t … because of

SLIDE 5
1. A detailed, construction-based representation

SLIDE 6

Several projects have attempted to annotate real-world causality.

(Figure: example annotations from each project.)
• SemEval 2007 Task 4: flu / virus
• Richer Event Descriptions: allocated / equipped (BEFORE-PRECONDITIONS); ill / need (OVERLAP-CAUSE)

SLIDE 7

Others have focused on causal language.

(Figure: example annotations from each project.)
• Penn Discourse Treebank
• Causality in TempEval-3: acquired … as a result of … agreement (CAUSE, BEFORE)
• BioCause

SLIDE 8

Causal language: a clause or phrase in which one event, state, action, or entity is explicitly presented as promoting or hindering another.

SLIDE 9

Connective: a fixed construction indicating a causal relationship

Examples: because, prevented from, causes. (Some instances of because are not “truly” causal.)

SLIDE 10

Effect: presented as outcome/inferred conclusion
Cause: presented as producing/indicating the effect

Examples (connective words lost in extraction):
• John killed the dog … it was threatening his chickens
• John … the dog … eating his chickens
• Ice cream consumption … drowning
• She must have met him before … she recognized him yesterday

SLIDE 11

We exclude language that does not encode pure, explicit causation:

SLIDE 12

Four types of causation:

• CONSEQUENCE (e.g., because of)
• MOTIVATION (e.g., because)
• PURPOSE (e.g., in order to)
• INFERENCE (e.g., so)

SLIDE 13

Not all causal relationships are of equal strength or polarity.

• FACILITATE (e.g., caused)
• ENABLE (e.g., Only by … can …)
• ENTAIL
• INHIBIT (e.g., kept from)
• DISENTAIL (e.g., Without …)
• PREVENT

SLIDE 14
2. Comparison of two annotation approaches

SLIDE 15

First Try

• Dunietz and three annotators (A1, A2, A3)
• A1, A2, and A3 are recently graduated linguistics majors.
• A1 had more than one year of annotation experience.
• A2 and A3 had no annotation experience.

SLIDE 16

First Try (Continued)

• Rounds of annotation and reconciliation
• Produced a coding manual
• Annotator A4: master’s in linguistics plus 30 years of experience with corpus annotation and NLP

SLIDE 17

Annotators determined the causation type using a decision tree.

(Decision-tree figure: Does an agent choose/feel/think? Is the effect a fact about the world, or an outcome he/she hopes to achieve (Purpose vs. Motivation)? Does it temporally follow the cause? Does it become more/less likely, more or less strongly?)
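The tree itself appears only as a figure in the original slides. As an illustrative sketch, this kind of procedure can be written as a small function; the questions, their order, and the parameter names here are assumptions, not the authors' actual tree.

```python
# Hypothetical sketch of a causation-type decision tree in the spirit of
# the slide: a few yes/no questions route each instance to one of the
# four causation types. The questions and their order are assumed.

def causation_type(agent_deliberates: bool,
                   effect_is_hoped_outcome: bool,
                   effect_follows_cause: bool) -> str:
    """Route one annotated instance to a causation type."""
    if agent_deliberates:
        # An agent chooses/feels/thinks: Purpose if the effect is an
        # outcome the agent hopes to achieve, otherwise Motivation.
        return "Purpose" if effect_is_hoped_outcome else "Motivation"
    # No deliberating agent: Consequence if the effect is a fact about
    # the world that follows the cause; otherwise the connective marks
    # the speaker's Inference.
    return "Consequence" if effect_follows_cause else "Inference"

print(causation_type(True, True, True))    # Purpose
print(causation_type(False, False, True))  # Consequence
```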

SLIDE 18

Annotators determined the causation degree using another decision tree.

(Decision-tree figure: Does the cause make the effect more or less likely? Increasing → Facilitate; decreasing → Inhibit.)

SLIDE 19

Annotators found a more fine-grained decision tree too difficult to apply.

(Decision-tree figure: increasing vs. decreasing likelihood, then significantly vs. merely, yielding Facilitate vs. Enable and Disentail vs. Inhibit.)
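Written down as tiny functions, the extra judgment the fine-grained tree demands becomes visible; the branch-to-label mapping below is an assumption for illustration, not the authors' exact tree.

```python
# Hypothetical sketches of the two degree decision trees. The coarse
# tree asks one question; the fine-grained tree adds a "significantly
# vs. merely" judgment that annotators found too difficult to apply.

def degree_coarse(increases_likelihood: bool) -> str:
    """One question: does the cause raise or lower the effect's likelihood?"""
    return "Facilitate" if increases_likelihood else "Inhibit"

def degree_fine(increases_likelihood: bool, significantly: bool) -> str:
    """Two questions: direction of influence, then its strength."""
    if increases_likelihood:
        return "Enable" if significantly else "Facilitate"
    return "Disentail" if significantly else "Inhibit"

print(degree_coarse(True))       # Facilitate
print(degree_fine(False, True))  # Disentail
```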

SLIDE 20

We have annotated a small corpus with this scheme.

(Corpus-statistics table; column headers lost in extraction. Totals: 93, 3333, 845.)

SLIDE 21

We computed intercoder agreement between Dunietz and A4 after 3 weeks of training.

201 sentences from randomly selected documents in the NYT subcorpus. Causation types:

SLIDE 22

Initial agreement between Dunietz and A4 was just moderate for connectives, and abysmal for causation types.

(Agreement table: connective F1 and causation-type κ; numeric values lost in extraction.)

Very unhappy annotators!
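The κ here is presumably Cohen's kappa, the standard chance-corrected agreement measure for two annotators. As a reference point, a self-contained computation (with invented labels, not the study's data) looks like:

```python
# Cohen's kappa for two annotators' labels: observed agreement,
# corrected for the agreement expected by chance from each annotator's
# label distribution. The example labels are invented for illustration.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label at random.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["Consequence", "Motivation", "Purpose", "Consequence"]
b = ["Consequence", "Motivation", "Inference", "Motivation"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```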

SLIDE 23

To eliminate difficult, repetitious decision-making, we compiled a “constructicon.”

  • Constructicon:
  • Fillmore, Lee-Goldman, and Rhodes, 2012
  • Lee-Goldman and Petruck, ms.
  • Our English causal language constructicon:
  • 79 lexical head words
  • 166 construction types (counting prevent and prevent from as the same lexical head word but different constructions)

SLIDE 24

Connective patterns:

• <cause> prevents <effect> from <effect>
• <enough cause> for <effect> to <effect>

SLIDE 25

Additional examples from the causal language constructicon

• For <effect> to <effect>, <cause>
• As a result, <effect>
• Enough <cause> to <effect>
• <effect> on grounds of <cause>
• <cause> is the reason to <effect>
• <effect> results from <cause>
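One way to make entries like these machine-checkable is to compile the slot notation into regular expressions with named capture groups. This is an illustrative sketch under that assumption, not the project's actual tooling.

```python
# Hypothetical sketch: compile a constructicon pattern such as
# "<cause> prevents <effect> from <effect>" into a regex. Repeated slot
# names get a numeric suffix, since regexes forbid duplicate group names.

import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    parts, seen = [], {}
    for tok in re.split(r"(<[^>]+>)", pattern):
        if tok.startswith("<") and tok.endswith(">"):
            name = re.sub(r"\W", "_", tok[1:-1])
            seen[name] = seen.get(name, 0) + 1
            if seen[name] > 1:
                name = f"{name}{seen[name]}"
            parts.append(rf"(?P<{name}>.+?)")  # lazy slot filler
        elif tok.strip():
            parts.append(re.escape(tok.strip()))  # literal connective words
    return re.compile(r"\s+".join(parts))

rx = pattern_to_regex("<cause> prevents <effect> from <effect>")
m = rx.fullmatch("The fence prevents the dog from escaping")
print(m.group("cause"), "|", m.group("effect"), "|", m.group("effect2"))
# The fence | the dog | escaping
```

A real matcher would need to handle inflection (prevents/prevented), intervening material, and discontinuous slots, which is exactly the richness the constructicon entries encode.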

SLIDE 26

Dunietz and a new annotator, A5, annotated a similarly-sized dataset using the constructicon.

Less than 1 day of training. 260 sentences annotated by Dunietz and A5. Causation types:

A5 has a master’s degree in language technologies and had no prior annotation experience.

SLIDE 27

Constructicon-based annotation improved results dramatically.

(Agreement table: connective F1 and causation-type κ; numeric values lost in extraction.)

Annotators reported no difficulty!

SLIDE 28

Lexicography helps when, without it, annotators must make the same decisions repeatedly

SLIDE 29
3. Broader implications of low non-expert agreement

SLIDE 30

Expertise

Baseball players use physics, but they don’t have to know physics. What can we expect from people who speak languages but are not trained in metalinguistic awareness? When they have trouble with our annotation schemes, we start to worry: is it something real that only experts are aware of? Are we, the experts, just making things up?

SLIDE 31

What lends validity to an annotation scheme?

• Riezler (2014): reproducibility by non-experts; improvement of an independent task
• Chomsky’s notion of explanatory adequacy and predictive power
• This annotation scheme will be validated by an independent task.

SLIDE 32

Thank you for listening