Learning Greedy Policies for the Easy-First Framework (PowerPoint PPT Presentation)



SLIDE 1

Learning Greedy Policies for the Easy-First Framework

Jun Xie, Chao Ma, Janardhan Rao Doppa, Prashanth Mannem, Xiaoli Fern, Tom Dietterich, Prasad Tadepalli
Oregon State University


SLIDE 2

The Easy-First Framework: Example

Doc 1: A 4.2 magnitude earthquake struck near eastern Sonoma County.
Doc 2: A tremor struck in Sonoma County.

SLIDE 3

The Easy-First Framework: Example

Doc 1: A 4.2 magnitude earthquake struck near eastern Sonoma County.
Doc 2: A tremor struck in Sonoma County.

Mentions: "A 4.2 magnitude earthquake", "A tremor", "eastern Sonoma County", "Sonoma County"

  • 1. Begin with every mention in its own cluster
SLIDE 4

The Easy-First Framework: Example

Doc 1: A 4.2 magnitude earthquake struck near eastern Sonoma County.
Doc 2: A tremor struck in Sonoma County.

Mentions: "A 4.2 magnitude earthquake", "A tremor", "eastern Sonoma County", "Sonoma County"

  • 1. Begin with every mention in its own cluster
  • 2. Evaluate all possible merges with a scoring function and select the highest scoring merge (easiest)

SLIDE 5

The Easy-First Framework: Example

Doc 1: A 4.2 magnitude earthquake struck near eastern Sonoma County.
Doc 2: A tremor struck in Sonoma County.

Mentions: "A 4.2 magnitude earthquake", "A tremor", "eastern Sonoma County", "Sonoma County"

  • 1. Begin with every mention in its own cluster
  • 2. Evaluate all possible merges with a scoring function and select the highest scoring merge (easiest)
  • 3. Repeat until stopping condition is met
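The three steps can be sketched as a greedy loop. This is an illustrative sketch, not the paper's code: `score` stands in for the learned scoring function, and the stopping condition is a simple threshold on the best merge score.

```python
def easy_first_merge(mentions, score, threshold=0.0):
    """Greedy easy-first clustering: repeatedly apply the single
    highest-scoring (easiest) merge until no merge clears the threshold."""
    # 1. Begin with every mention in its own cluster.
    clusters = [frozenset([m]) for m in mentions]
    while len(clusters) > 1:
        # 2. Evaluate all possible pairwise merges with the scoring function.
        candidates = [(score(a, b), a, b)
                      for i, a in enumerate(clusters)
                      for b in clusters[i + 1:]]
        best, a, b = max(candidates, key=lambda t: t[0])
        # 3. Repeat until the stopping condition is met.
        if best <= threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters
```

With a word-overlap `score`, "A 4.2 magnitude earthquake" and "A tremor ... Sonoma County" style mentions that share words would be merged first, mirroring the example above.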
SLIDE 6

Easy First Training

[Figure: an easy-first training run. From the initial state S, the trajectory passes through states S1, S2, S3, ..., each offering candidate actions (a, b, c, d, ...) scored by f (e.g. f(a) = 0.04, f(b) = 0.36, ...). Actions are marked good or bad, and weight updates ("Weight Update") are performed along the trajectory until the terminal state S_T is reached.]
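One way to read the training trajectory as code: follow greedy choices, and whenever the greedy action at a state is bad, update the weights before continuing. This is a perceptron-style sketch under assumed data structures, not the paper's actual update rule.

```python
import numpy as np

def easy_first_train(states, w, lr=0.1):
    """Follow a greedy trajectory; whenever the greedy (highest-scoring)
    action at a state is bad, nudge the weights toward the best good
    action and away from the chosen bad one (perceptron-style sketch).

    Each state is a pair (actions, good): a list of feature vectors and
    the set of indices of the good actions."""
    for actions, good in states:
        scores = [w @ x for x in actions]
        chosen = int(np.argmax(scores))
        if chosen not in good:                       # greedy choice was bad
            best_good = max(good, key=lambda g: scores[g])
            w = w + lr * (actions[best_good] - actions[chosen])
    return w
```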

SLIDE 7

Learning Scoring Function

Possible goal: learn a scoring function such that, in every state, ALL good actions are ranked higher than all bad actions.

This is an over-constrained goal.

A better goal: learn a scoring function such that, in every state, ONE good action is ranked higher than all bad actions.

SLIDE 8

Proposed Objective for Update

  • Goal: find a linear function that ranks one good action higher than all bad actions

– This can be achieved by a set of constraints:

    max_{g ∈ G} w·x_g > w·x_b + 1,   for all b ∈ B

  • Our Objective:
  • Use hinge loss to capture the constraints
  • Regularization to avoid an overly aggressive update

    argmin_w (1/|B|) Σ_{b ∈ B} [1 - max_{g ∈ G} w·x_g + w·x_b]_+ + (c/2) ||w - w'||^2

  (G: good actions, B: bad actions, x_a: feature vector of action a, w': weights before the update, [·]_+: hinge / positive part)
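The objective on this slide, a hinge loss asking the best good action to outscore each bad action by a margin of 1, plus a proximal regularizer toward the previous weights, can be evaluated directly. A NumPy sketch with illustrative names:

```python
import numpy as np

def bgvb_objective(w, w_prev, X_good, X_bad, c=1.0):
    """Objective value for one update: average hinge loss requiring the
    best good action to outscore every bad action by a margin of 1,
    plus a proximal term keeping w close to the previous weights."""
    best_good = (X_good @ w).max()                       # max_g w . x_g
    hinge = np.maximum(0.0, 1.0 - best_good + X_bad @ w)
    return hinge.mean() + 0.5 * c * np.sum((w - w_prev) ** 2)
```

A weight vector that already separates the best good action from every bad action by the margin pays only the proximal cost.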

SLIDE 9

Optimization

  • Majorization-Minimization (MM) algorithm to find a locally optimal solution.

  • In each MM iteration:

– Let g* be the current highest-scoring good action
– Solve the following convex objective (via subgradient descent):

    argmin_w (1/|B|) Σ_{b ∈ B} [1 - w·x_{g*} + w·x_b]_+ + (c/2) ||w - w'||^2
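The MM loop on this slide (fix g*, the currently highest-scoring good action, run subgradient descent on the resulting convex objective, and repeat) can be sketched in NumPy. Step sizes, iteration counts, and names are illustrative, not the paper's implementation:

```python
import numpy as np

def mm_update(w_prev, X_good, X_bad, c=1.0, lr=0.01, mm_iters=5, sub_iters=100):
    """Majorization-Minimization: alternate (a) fixing g*, the currently
    highest-scoring good action, with (b) subgradient descent on the
    convex objective obtained by holding g* fixed."""
    w = w_prev.copy()
    for _ in range(mm_iters):
        g_star = X_good[np.argmax(X_good @ w)]       # (a) current best good action
        for _ in range(sub_iters):
            margins = 1.0 - w @ g_star + X_bad @ w   # hinge margin per bad action
            violated = margins > 0                   # bad actions inside the margin
            grad = c * (w - w_prev)                  # gradient of proximal term
            if violated.any():
                # subgradient of the averaged hinge term
                grad += (X_bad[violated].sum(axis=0)
                         - violated.sum() * g_star) / len(X_bad)
            w = w - lr * grad                        # (b) subgradient step
    return w
```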

SLIDE 10

Contrast with Existing Methods

  • Average-good vs. average-bad (AGAB)
  • Best-good vs. best-bad (BGBB)
  • Proposed method: Best-good vs. violated-bad (BGVB)
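The three strategies differ only in which good/bad scores enter the comparison. A sketch of those choices (function and variable names are illustrative, and the margin-violation test follows the hinge objective from the earlier slide):

```python
import numpy as np

def compared_scores(w, X_good, X_bad, method):
    """Return the good/bad score(s) each update strategy contrasts."""
    sg, sb = X_good @ w, X_bad @ w
    if method == "AGAB":     # average good vs. average bad
        return sg.mean(), sb.mean()
    if method == "BGBB":     # best good vs. best (highest-scoring) bad
        return sg.max(), sb.max()
    if method == "BGVB":     # best good vs. every margin-violating bad
        return sg.max(), sb[sb > sg.max() - 1.0]
    raise ValueError(method)
```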

SLIDE 11

Experiment I: cross-document entity and event coref

[Bar chart: MUC, B-CUBED, CEAF_e, and CoNLL scores (range 10 to 80) on the EECB corpus (Lee et al., 2012) for BGBB, R-BGBB, BGVB, R-BGVB, and Lee et al.]

SLIDE 12

12

Experiment II: within-doc Coref

10 20 30 40 50 60 70 80 MUC B-CUBE CEAF_e CoNLL

Results on OntoNotes

BGBB R-BGBB BGVB R-BGVB

SLIDE 13

Diagnostics

  • Some training statistics on ACE 2004 corpus:

Approach  Total Steps  Mistakes  Recoveries  Percentage  Accuracy
RBGVB     50195        16228     4255        0.262       0.87
SLIDE 14

Diagnostics

  • Some training statistics on ACE 2004 corpus:

Approach  Total Steps  Mistakes  Recoveries  Percentage  Accuracy
RBGVB     50195        16228     4255        0.262       0.87
BGBB      50195        11625     4075        0.351       0.82

BGBB corrects errors more aggressively than RBGVB. This is strong evidence that overfitting does happen with BGBB.
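Assuming the Percentage column is Recoveries / Mistakes (an inference from the numbers, not stated on the slide), the table is internally consistent. A quick check:

```python
# Assumption: Percentage = Recoveries / Mistakes (inferred, not stated).
stats = {"RBGVB": (16228, 4255, 0.262),
         "BGBB": (11625, 4075, 0.351)}
for name, (mistakes, recoveries, pct) in stats.items():
    assert round(recoveries / mistakes, 3) == pct  # both rows check out
```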
SLIDE 15

Contributions

  • We precisely represent the learning goal for Easy-First as an optimization problem
  • We develop an efficient Majorization-Minimization algorithm to optimize the proposed objective
  • We achieve highly competitive results against the state of the art for both within- and cross-document coref
