Annotating Causal Language Using Corpus Lexicography of Constructions (Jesse Dunietz, Lori Levin, and Jaime Carbonell): PowerPoint PPT Presentation



SLIDE 1

Annotating Causal Language Using Corpus Lexicography of Constructions

Jesse Dunietz, Lori Levin, and Jaime Carbonell LAW 2015 June 5, 2015

SLIDE 2

Contributions of this paper

• Raising issues about corpus annotation: low agreement among non-experts; methodology for annotation projects
• Lexicon-driven annotation, as in PropBank and FrameNet
• An annotation scheme for causal language in English
• A constructicon of causal language in English
• A small annotated corpus of causal language in English
• All still in progress

SLIDE 3

Causal relations would be useful to annotate well…

• Ubiquitous in our mental models
• Ubiquitous in language (the 2nd most common relation between verbs)
• Useful for downstream applications (e.g., information extraction): medical symptoms, political events, interpersonal actions

Example: “The prevention of FOXP3 expression was not caused by interferences.”

SLIDE 4

…but annotating them raises difficult annotation issues.

Example connectives: causes; because of; for reasons of; forbid to; convinced to; too … to; if; after; don’t … because of

SLIDE 5
1. A detailed, construction-based representation

SLIDE 6

Several projects have attempted to annotate real-world causality.

(Figure: example annotations from each project.)
• SemEval 2007 Task 4: flu / virus
• Richer Event Descriptions: allocated / equipped (BEFORE-PRECONDITIONS); ill / need (OVERLAP-CAUSE)

SLIDE 7

Others have focused on causal language.

(Figure: example annotations from each project.)
• Penn Discourse Treebank
• Causality in TempEval-3: acquired … as a result of … agreement (CAUSE, BEFORE)
• BioCause

SLIDE 8

Causal language: a clause or phrase in which one event, state, action, or entity is explicitly presented as promoting or hindering another.

SLIDE 9

Connective: a fixed construction indicating a causal relationship

Examples: because, prevented from, causes. (Some instances of because are not “truly” causal.)

SLIDE 10

Effect: presented as outcome/inferred conclusion
Cause: presented as producing/indicating the effect

Examples (connective words lost in extraction):
• John killed the dog … it was threatening his chickens
• John … the dog … eating his chickens
• Ice cream consumption … drowning
• She must have met him before … she recognized him yesterday

SLIDE 11

We exclude language that does not encode pure, explicit causation:

SLIDE 12

Four types of causation:

• CONSEQUENCE (e.g., because of)
• MOTIVATION (e.g., because)
• PURPOSE (e.g., in order to)
• INFERENCE (e.g., so)

SLIDE 13

Not all causal relationships are of equal strength or polarity.

• FACILITATE (e.g., caused)
• ENABLE (e.g., Only by … can …)
• ENTAIL
• INHIBIT (e.g., kept from)
• DISENTAIL (e.g., Without …)
• PREVENT

SLIDE 14
2. Comparison of two annotation approaches

SLIDE 15

First Try

• Dunietz and three annotators (A1, A2, A3)
• A1, A2, and A3 are recently graduated linguistics majors.
• A1 had more than one year of annotation experience.
• A2 and A3 had no annotation experience.

SLIDE 16

First Try (Continued)

• Rounds of annotation and reconciliation
• Produced a coding manual
• Annotator A4: master’s in linguistics plus 30 years of experience with corpus annotation and NLP

SLIDE 17

Annotators determined the causation type using a decision tree.

(Decision-tree figure: Does an agent choose/feel/think? Is the effect a fact about the world, or an outcome he/she hopes to achieve (Purpose vs. Motivation)? Does it temporally follow the cause? Does it become more/less likely, more or less strongly?)
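The tree itself appears only as a figure in the original slides. As an illustrative sketch, this kind of procedure can be written as a small function; the questions, their order, and the parameter names here are assumptions, not the authors' actual tree.

```python
# Hypothetical sketch of a causation-type decision tree in the spirit of
# the slide: a few yes/no questions route each instance to one of the
# four causation types. The questions and their order are assumed.

def causation_type(agent_deliberates: bool,
                   effect_is_hoped_outcome: bool,
                   effect_follows_cause: bool) -> str:
    """Route one annotated instance to a causation type."""
    if agent_deliberates:
        # An agent chooses/feels/thinks: Purpose if the effect is an
        # outcome the agent hopes to achieve, otherwise Motivation.
        return "Purpose" if effect_is_hoped_outcome else "Motivation"
    # No deliberating agent: Consequence if the effect is a fact about
    # the world that follows the cause; otherwise the connective marks
    # the speaker's Inference.
    return "Consequence" if effect_follows_cause else "Inference"

print(causation_type(True, True, True))    # Purpose
print(causation_type(False, False, True))  # Consequence
```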

SLIDE 18

Annotators determined the causation degree using another decision tree.

(Decision-tree figure: Does the cause make the effect more or less likely? Increasing → Facilitate; decreasing → Inhibit.)

SLIDE 19

Annotators found a more fine-grained decision tree too difficult to apply.

(Decision-tree figure: increasing vs. decreasing likelihood, then significantly vs. merely, yielding Facilitate vs. Enable and Disentail vs. Inhibit.)
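Written down as tiny functions, the extra judgment the fine-grained tree demands becomes visible; the branch-to-label mapping below is an assumption for illustration, not the authors' exact tree.

```python
# Hypothetical sketches of the two degree decision trees. The coarse
# tree asks one question; the fine-grained tree adds a "significantly
# vs. merely" judgment that annotators found too difficult to apply.

def degree_coarse(increases_likelihood: bool) -> str:
    """One question: does the cause raise or lower the effect's likelihood?"""
    return "Facilitate" if increases_likelihood else "Inhibit"

def degree_fine(increases_likelihood: bool, significantly: bool) -> str:
    """Two questions: direction of influence, then its strength."""
    if increases_likelihood:
        return "Enable" if significantly else "Facilitate"
    return "Disentail" if significantly else "Inhibit"

print(degree_coarse(True))       # Facilitate
print(degree_fine(False, True))  # Disentail
```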

SLIDE 20

We have annotated a small corpus with this scheme.

(Corpus-statistics table; column headers lost in extraction. Totals: 93, 3333, 845.)

SLIDE 21

We computed intercoder agreement between Dunietz and A4 after 3 weeks of training.

201 sentences from randomly selected documents in the NYT subcorpus. Causation types:

SLIDE 22

Initial agreement between Dunietz and A4 was just moderate for connectives, and abysmal for causation types.

(Agreement table: connective F1 and causation-type κ; numeric values lost in extraction.)

Very unhappy annotators!
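The κ here is presumably Cohen's kappa, the standard chance-corrected agreement measure for two annotators. As a reference point, a self-contained computation (with invented labels, not the study's data) looks like:

```python
# Cohen's kappa for two annotators' labels: observed agreement,
# corrected for the agreement expected by chance from each annotator's
# label distribution. The example labels are invented for illustration.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label at random.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["Consequence", "Motivation", "Purpose", "Consequence"]
b = ["Consequence", "Motivation", "Inference", "Motivation"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```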

SLIDE 23

To eliminate difficult, repetitious decision-making, we compiled a “constructicon.”

  • Constructicon:
  • Fillmore, Lee-Goldman, and Rhodes, 2012
  • Lee-Goldman and Petruck, ms.
  • Our English causal language constructicon:
  • 79 lexical head words
  • 166 construction types (counting prevent and prevent from as the same lexical head word but different constructions)

SLIDE 24

Connective patterns:

• <cause> prevents <effect> from <effect>
• <enough cause> for <effect> to <effect>

SLIDE 25

Additional examples from the causal language constructicon

• For <effect> to <effect>, <cause>
• As a result, <effect>
• Enough <cause> to <effect>
• <effect> on grounds of <cause>
• <cause> is the reason to <effect>
• <effect> results from <cause>
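One way to make entries like these machine-checkable is to compile the slot notation into regular expressions with named capture groups. This is an illustrative sketch under that assumption, not the project's actual tooling.

```python
# Hypothetical sketch: compile a constructicon pattern such as
# "<cause> prevents <effect> from <effect>" into a regex. Repeated slot
# names get a numeric suffix, since regexes forbid duplicate group names.

import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    parts, seen = [], {}
    for tok in re.split(r"(<[^>]+>)", pattern):
        if tok.startswith("<") and tok.endswith(">"):
            name = re.sub(r"\W", "_", tok[1:-1])
            seen[name] = seen.get(name, 0) + 1
            if seen[name] > 1:
                name = f"{name}{seen[name]}"
            parts.append(rf"(?P<{name}>.+?)")  # lazy slot filler
        elif tok.strip():
            parts.append(re.escape(tok.strip()))  # literal connective words
    return re.compile(r"\s+".join(parts))

rx = pattern_to_regex("<cause> prevents <effect> from <effect>")
m = rx.fullmatch("The fence prevents the dog from escaping")
print(m.group("cause"), "|", m.group("effect"), "|", m.group("effect2"))
# The fence | the dog | escaping
```

A real matcher would need to handle inflection (prevents/prevented), intervening material, and discontinuous slots, which is exactly the richness the constructicon entries encode.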

SLIDE 26

Dunietz and a new annotator, A5, annotated a similarly-sized dataset using the constructicon.

Less than 1 day of training. 260 sentences annotated by Dunietz and A5. Causation types:

A5 has a master’s degree in language technologies and had no prior annotation experience.

SLIDE 27

Constructicon-based annotation improved results dramatically.

(Agreement table: connective F1 and causation-type κ; numeric values lost in extraction.)

Annotators reported no difficulty!

SLIDE 28

Lexicography helps when, without it, annotators must make the same decisions repeatedly

SLIDE 29
3. Broader implications of low non-expert agreement

SLIDE 30

Expertise

Baseball players use physics, but they don’t have to know physics. What can we expect from people who speak languages but are not trained in metalinguistic awareness? When they have trouble with our annotation schemes, we start to worry: is it something real that only experts are aware of? Are we, the experts, just making things up?

SLIDE 31

What lends validity to an annotation scheme?

• Riezler (2014): reproducibility by non-experts; improvement of an independent task
• Chomsky’s notion of explanatory adequacy and predictive power
• This annotation scheme will be validated by an independent task.

SLIDE 32

Thank you for listening