SLIDE 1

Identifying Semantic Roles Using Combinatory Categorial Grammar

Daniel Gildea and Julia Hockenmaier, University of Pennsylvania (EMNLP 2003)

SLIDE 2

Introduction

Understanding difficult due to variation in syntactic realization of semantic roles:

  • John will meet with Mary.
  • John will meet Mary.
  • John and Mary will meet.
  • The door opened.
  • Mary opened the door.

SLIDE 3

Statistical Approaches to Semantic Roles

Gildea & Palmer ACL 2002: Predict PropBank roles using features derived from Treebank parser output (Collins). Similar approaches:

  • MUC data: Riloff & Schmelzenbach 1998, Miller et al. 2000
  • FrameNet data: Gildea & Jurafsky 2000

Problem: long-distance dependencies are difficult to find and interpret.

SLIDE 4

Long Distance Dependencies

Standard Treebank parsers do not return dependencies from relative clauses, wh-movement, control, or raising.

  truth:  [ARG0 Big investment banks] refused to step up to the plate to support [ARG1 the floor traders].
  system: Big investment banks refused to step up to the plate to support [ARG1 the floor traders].

CCG parsers return local and long-distance dependencies in the same form.

SLIDE 5

Overview

  • Semantic roles in PropBank
  • Combinatory Categorial Grammar
  • Features: matching CCG and PropBank
  • Results and Discussion

SLIDE 6

PropBank

  • Role labels defined per-predicate:
    – Core: Arg0, Arg1, ...
    – ArgM: Temporal, Locative, etc.
  • Rolesets correspond to senses
  • Tagging all verbs in treebanked Wall Street Journal
  • Preliminary corpus: 72,109 verb instances (2,462 unique verbs), 190,815 individual arguments (75% are “core”)

Kingsbury et al., HLT 2002

SLIDE 7

Sample PropBank Roleset Entry

  • offer
      Arg0: entity offering
      Arg1: commodity
      Arg2: benefactive or entity offered to
      Arg3: price
  • [ARG0 the company] to offer [ARG1 a 15% stake] to [ARG2 the public].
  • [ARG0 Sotheby’s] ... offered [ARG2 the Dorrance heirs] [ARG1 a money-back guarantee]
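
In code, a roleset is just a mapping from argument labels to their descriptions. A minimal Python sketch of the entry above (the names OFFER_ROLESET and example are illustrative, not from the PropBank distribution):

  # Minimal sketch of the "offer" roleset as a mapping; role
  # descriptions are copied from the entry above.
  OFFER_ROLESET = {
      "Arg0": "entity offering",
      "Arg1": "commodity",
      "Arg2": "benefactive or entity offered to",
      "Arg3": "price",
  }

  # One labeled instance as (role, filler) pairs, from the first example.
  example = [
      ("Arg0", "the company"),
      ("Arg1", "a 15% stake"),
      ("Arg2", "the public"),
  ]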

SLIDE 8

PropBank ArgM Roles

Location, Time, Manner, Direction, Cause, Discourse, Extent, Purpose, Negation, Modal, Adverbial

  • Location: in Tokyo
  • Discourse: However
  • Negation: not

SLIDE 9

Probability Model for Predicting Roles

Based on features extracted from parser output:

  • Phrase type: NP, PP, S, etc.

  • Position: Before/after predicate word
  • Voice: Active/passive
  • Head Word: Uses head rules of parser
  • Parse Tree Path: syntactic relation to predicate

Gildea and Palmer ACL 2002
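
For concreteness, this feature set can be pictured as one record per candidate constituent. A minimal Python sketch (field names mirror the bullets above and are illustrative, not Gildea & Palmer's code):

  from dataclasses import dataclass

  # One feature record per candidate constituent, relative to a predicate.
  @dataclass(frozen=True)
  class RoleFeatures:
      phrase_type: str  # e.g. "NP", "PP", "S"
      position: str     # "before" or "after" the predicate word
      voice: str        # "active" or "passive"
      head_word: str    # chosen by the parser's head rules
      path: str         # parse tree path to the predicate, e.g. "VB↑VP↑S↓NP"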

SLIDE 10

Parse Tree Path

[Parse tree: (S (NP (PRP He)) (VP (VB ate) (NP (DT some) (NN pancakes))))]

Ex: P(fe | p = “eat”, path = “VB↑VP↑S↓NP”, head = “He”)
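
A minimal sketch of how the path feature can be computed, assuming a toy tree type with parent pointers (illustrative, not the authors' implementation):

  class Node:
      def __init__(self, label, children=()):
          self.label = label
          self.children = list(children)
          self.parent = None
          for child in self.children:
              child.parent = self

  def ancestors(node):
      """Nodes from `node` up to the root, inclusive."""
      chain = [node]
      while chain[-1].parent is not None:
          chain.append(chain[-1].parent)
      return chain

  def tree_path(pred, const):
      """Climb (↑) from the predicate to the lowest common ancestor,
      then descend (↓) to the argument constituent."""
      up, down = ancestors(pred), ancestors(const)
      common = next(n for n in up if n in down)
      ups = up[:up.index(common) + 1]          # predicate ... common ancestor
      downs = down[:down.index(common)][::-1]  # just below common ... constituent
      return ("↑".join(n.label for n in ups)
              + "↓" + "↓".join(n.label for n in downs))

  # The example above:
  he = Node("PRP"); subj = Node("NP", [he])
  vb = Node("VB"); obj = Node("NP", [Node("DT"), Node("NN")])
  s = Node("S", [subj, Node("VP", [vb, obj])])
  assert tree_path(vb, subj) == "VB↑VP↑S↓NP"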

SLIDE 11

Backoff Lattice

Most specific:  P(r | h, pt, p)   P(r | pt, path, p)   P(r | pt, pos, v, p)
Back off to:    P(r | h, p)       P(r | pt, p)         P(r | pt, pos, v)
Then to:        P(r | h)          P(r | p)
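
One way to use such a lattice, sketched in Python: walk the levels from most to least specific and average whichever distributions have seen their conditioning context (the averaging here is illustrative; the paper combines estimates by linear interpolation):

  # tables[cond] = (counts, totals): counts[(r, context)] and
  # totals[context] are raw counts collected from training data.
  LATTICE = [
      [("h", "pt", "p"), ("pt", "path", "p"), ("pt", "pos", "v", "p")],
      [("h", "p"), ("pt", "p"), ("pt", "pos", "v")],
      [("h",), ("p",)],
  ]

  def backoff_estimate(r, feats, tables, lattice=LATTICE):
      """Estimate P(r | features) from the most specific lattice level
      with any observed conditioning context."""
      for level in lattice:
          estimates = []
          for cond in level:
              context = tuple(feats[name] for name in cond)
              counts, totals = tables[cond]
              if totals.get(context, 0) > 0:
                  estimates.append(counts.get((r, context), 0) / totals[context])
          if estimates:
              return sum(estimates) / len(estimates)
      return 0.0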

SLIDE 12

Sentence-Level Argument Assignment

Choose the best assignment of roles r_1..r_n given predicate p and features F_1..F_n:

  P(r_1..n | F_1..n, p) ≈ P({r_1..n} | p) · ∏_i P(r_i | F_i, p) / P(r_i | p)

Argument set probabilities provide (limited) dependence between individual labeling decisions.
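
A minimal sketch of this rescoring, maximizing the product over candidate assignments (the three probability functions are assumed helpers, not the authors' implementation):

  import math
  from itertools import product

  def best_assignment(candidates, feats, p, p_role, p_prior, p_roleset):
      """candidates: per-argument lists of possible role labels;
      p_role(r, F, p) = P(r | F, p), p_prior(r, p) = P(r | p),
      p_roleset(roles, p) = P({r_1..n} | p) for a set of labels."""
      def log_score(assignment):
          score = math.log(p_roleset(frozenset(assignment), p))
          for r, F in zip(assignment, feats):
              score += math.log(p_role(r, F, p)) - math.log(p_prior(r, p))
          return score
      return max(product(*candidates), key=log_score)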

SLIDE 13

Combinatory Categorial Grammar

  • Categories specify subcat lists of words/constituents

Declarative verb phrase:      S[dcl]\NP
Transitive declarative verb:  (S[dcl]\NP)/NP

  • Combinatory rules specify how constituents can combine.
  • Derivations spell out the process of combining constituents:

  London    denied            plans
  NP        (S[dcl]\NP)/NP    NP
            ---------------------- >
                 S[dcl]\NP
  -------------------------------- <
               S[dcl]
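
A minimal Python sketch of categories and function application, replaying the derivation above (a toy, not a CCG parser):

  from dataclasses import dataclass
  from typing import Union

  @dataclass(frozen=True)
  class Atom:          # atomic category, e.g. NP or S[dcl]
      label: str

  @dataclass(frozen=True)
  class Functor:       # e.g. (S[dcl]\NP)/NP
      result: "Category"
      slash: str       # "/" seeks its argument to the right, "\" to the left
      argument: "Category"

  Category = Union[Atom, Functor]

  def apply(fn, arg, direction):
      """Forward ("/") or backward ("\\") function application."""
      assert isinstance(fn, Functor) and fn.slash == direction
      assert fn.argument == arg, "argument must match the subcat slot"
      return fn.result

  NP, S_dcl = Atom("NP"), Atom("S[dcl]")
  denied = Functor(Functor(S_dcl, "\\", NP), "/", NP)  # (S[dcl]\NP)/NP
  vp = apply(denied, NP, "/")           # denied plans         -> S[dcl]\NP
  assert apply(vp, NP, "\\") == S_dcl   # London denied plans  -> S[dcl]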

SLIDE 14

Predicate-argument structure in CCG

  • The argument slots of functor categories define dependencies:

  London    denied              plans
  NP1       (S[dcl]\NP1)/NP2    NP2
            ------------------------ >
                 S[dcl]\NP1
  ---------------------------------- <
               S[dcl]

SLIDE 15

Long-range dependencies in CCG

  • Long-range dependencies are projected from the lexicon:

  plans   that                     London         denied
  NP2     (NP\NPi)/(S[dcl]/NPi)    NP             (S[dcl]\NP1)/NP2
                                   ----------- >T
                                   S/(S\NP1)
                                   ------------------------------ >B
                                            S[dcl]/NP2
          ------------------------------------------------------- >
                               NP\NP2
  --------------------------------------------------------------- <
                                 NP

  • Similar for control, raising, etc.
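
Continuing the Slide 13 toy, the two extra combinators used above can be sketched as follows (type-raising >T and forward composition >B; simplified to the S[dcl] atom, without the full feature system):

  def type_raise(arg, result):
      """X  =>  result/(result\\X)   (forward type-raising, >T)"""
      return Functor(result, "/", Functor(result, "\\", arg))

  def compose(f, g):
      """X/Y  Y/Z  =>  X/Z   (forward composition, >B)"""
      assert f.slash == "/" and g.slash == "/" and f.argument == g.result
      return Functor(f.result, "/", g.argument)

  london = type_raise(NP, S_dcl)          # S[dcl]/(S[dcl]\NP)
  s_slash_np = compose(london, denied)    # "London denied" -> S[dcl]/NP
  assert s_slash_np == Functor(S_dcl, "/", NP)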

SLIDE 16

CCG Predicate-Argument Relations

London denied plans on Monday

  functor   category                   slot   argument
  denied    (S[dcl]\NP1)/NP2            1     London
  denied    (S[dcl]\NP1)/NP2            2     plans
  on        ((S\NP1)\(S\NP)2)/NP3       2     denied
  on        ((S\NP1)\(S\NP)2)/NP3       3     Monday
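
These relations fall out as tuples; a minimal Python sketch mirroring the table above (the representation is illustrative):

  from typing import NamedTuple

  class Dep(NamedTuple):
      functor: str   # word whose category contains the argument slot
      category: str  # the functor's lexical category
      slot: int      # index of the filled argument slot
      argument: str  # head word filling the slot

  deps = [
      Dep("denied", r"(S[dcl]\NP1)/NP2", 1, "London"),
      Dep("denied", r"(S[dcl]\NP1)/NP2", 2, "plans"),
      Dep("on", r"((S\NP1)\(S\NP)2)/NP3", 2, "denied"),
      Dep("on", r"((S\NP1)\(S\NP)2)/NP3", 3, "Monday"),
  ]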

SLIDE 17

CCG and PropBank

  • CCG derivation often doesn’t match Penn Treebank constituent structure
  • Training: find the maximal projection in the CCG derivation of the headword of the constituent labeled in PropBank
  • Evaluation: score on headwords, rather than constituent boundaries (a minimal sketch follows below)
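
A minimal sketch of headword scoring, treating each argument as a (role, headword) pair (assumed data shapes, not the authors' evaluation code):

  def headword_prf(gold, predicted):
      """gold, predicted: sets of (role, headword) pairs."""
      correct = len(gold & predicted)
      precision = correct / len(predicted) if predicted else 0.0
      recall = correct / len(gold) if gold else 0.0
      f_score = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
      return precision, recall, f_score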

SLIDE 18

Mismatches between CCGbank and PropBank

  • 23% of PropBank arguments do not correspond to CCG relations:
    – to offer ... [PP to [NP_ARG2 the public]]
  • We use a path feature instead, read off the CCG derivation:

  offer      to                    the public
  S[b]\NP    ((S\NP)\(S\NP))/NP    NP
             ------------------------- >
                  (S\NP)\(S\NP)
  ----------------------------------- <
               S[b]\NP

The CCG path feature is sparser than the Treebank path feature.
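
The CCG path can be computed like the Treebank path, but over the derivation tree with categories as node labels; a hypothetical sketch reusing Node and tree_path from the Slide 10 sketch:

  offer = Node(r"S[b]\NP")
  to = Node(r"((S\NP)\(S\NP))/NP"); public = Node("NP")
  pp = Node(r"(S\NP)\(S\NP)", [to, public])
  root = Node(r"S[b]\NP", [offer, pp])
  path = tree_path(offer, public)
  # -> "S[b]\NP↑S[b]\NP↓(S\NP)\(S\NP)↓NP"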

SLIDE 19

Experiment

Train on Sections 02-21, test on Section 23.

  • Compare Treebank- and CCG-based systems
  • Compare automatic parser output and gold standard parses
  • Compare Treebank parses with and without traces

SLIDE 20

Accuracy of Semantic Role Prediction

Precision / recall / F-score by parser and argument type:

  Parses                      Args   Treebank-based        CCG-based
  Automatic                   core   75.9 / 69.6 / 72.6    76.1 / 73.5 / 74.8
                              all    72.6 / 61.2 / 66.4    71.0 / 63.1 / 66.8
  Gold-standard               core   85.5 / 81.7 / 83.5    82.4 / 78.6 / 80.4
                              all    78.8 / 69.9 / 74.1    76.3 / 67.8 / 71.8
  Gold-standard w/o traces    core   77.6 / 75.2 / 76.3    —
                              all    74.4 / 66.5 / 70.2    —

SLIDE 21

Comparison of scoring regimes

Precision / recall / F-score by scoring regime:

  Parses          Scoring     Treebank-based        CCG-based
  Automatic       Head word   72.6 / 61.2 / 66.4    71.0 / 63.1 / 66.8
                  Boundary    68.6 / 57.8 / 62.7    55.7 / 49.5 / 52.4
  Gold-standard   Head word   77.6 / 75.2 / 76.3    76.3 / 67.8 / 71.8
                  Boundary    74.4 / 66.5 / 70.2    67.5 / 60.0 / 63.5

SLIDE 22

Conclusion

  • CCG helps find long-distance dependencies
  • Performance on non-core arguments is lower due to:
    – mismatches between CCGbank and PropBank annotation
    – sparser CCG feature set

Future Work:

  • Use PropBank annotation in the conversion to CCG
