Deep Reinforcement Learning for Mention-Ranking Coreference Models - PowerPoint PPT Presentation

Kevin Clark and Christopher D. Manning, Stanford University. Presented by Zubin Pahuja.


slide-1
SLIDE 1

Deep Reinforcement Learning for Mention-Ranking Coreference Models

Kevin Clark and Christopher D. Manning
Stanford University

Presented by Zubin Pahuja

slide-2
SLIDE 2

Coreference Resolution

  • Identify all mentions that refer to the same real-world entity

Barack Obama nominated Hillary Rodham Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady.

slide-3
SLIDE 3

Coreference Resolution

  • Identify all mentions that refer to the same real-world entity

Barack Obama nominated Hillary Rodham Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady.

slide-4
SLIDE 4

Coreference Resolution

  • Identify all mentions that refer to the same real-world entity
  • A document-level structured prediction task

Barack Obama nominated Hillary Rodham Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady.

slide-5
SLIDE 5

Applications

  • Full text understanding

Information extraction, question answering, summarization: “He was born in 1961”

slide-6
SLIDE 6

Applications

  • Dialog

“Book tickets to see James Bond” “Spectre is playing near you at 2:00 and 3:00 today. How many tickets would you like?” “Two tickets for the showing at three”

slide-7
SLIDE 7

Coreference Resolution is Hard!

  • “She poured water from the pitcher into the cup until it was full”
  • “She poured water from the pitcher into the cup until it was empty”
  • The trophy would not fit in the suitcase because it was too big.
  • The trophy would not fit in the suitcase because it was too small.
slide-8
SLIDE 8

Coreference Resolution is Hard!

  • “She poured water from the pitcher into the cup until it was full”
  • “She poured water from the pitcher into the cup until it was empty”
  • The trophy would not fit in the suitcase because it was too big.
  • The trophy would not fit in the suitcase because it was too small.
  • These are called Winograd schemas
slide-9
SLIDE 9

Three Kinds of Coreference Models

  • Mention Pair
  • Mention Ranking
  • Clustering
slide-10
SLIDE 10

Clustering

“I voted for Nader because he was most aligned with my values,” she said.

slide-11
SLIDE 11

Mention Ranking

  • Assign each mention its highest scoring candidate antecedent
  • Dummy mention NA allows model to decline assigning antecedent to current mention
slide-12
SLIDE 12

Mention Ranking

  • Assign each mention its highest scoring candidate antecedent
  • Dummy mention NA allows model to decline assigning antecedent to current mention
slide-13
SLIDE 13

Mention Ranking

  • Assign each mention its highest scoring candidate antecedent
  • Dummy NA mention allows model to decline linking the current mention to anything
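The decision rule above fits in a few lines. This is an illustrative sketch, not the authors' implementation; the `resolve_mention` name and the encoding of the NA dummy as `None` are our assumptions:

```python
def resolve_mention(scores):
    """Link the current mention to its highest-scoring candidate antecedent.

    `scores` maps each candidate antecedent (with the dummy NA encoded as
    None) to a score; returning None means "start a new entity".
    """
    return max(scores, key=scores.get)
```

For example, a mention whose best-scoring candidate is mention 0 links to it, while a mention for which NA scores highest starts a new entity.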
slide-14
SLIDE 14

Mention Ranking

  • Infer global structure by making a sequence of local decisions
slide-15
SLIDE 15

Mention Ranking

  • Infer global structure by making a sequence of local decisions
slide-16
SLIDE 16

Mention Ranking

  • Infer global structure by making a sequence of local decisions
slide-17
SLIDE 17

Mention Ranking

  • Infer global structure by making a sequence of local decisions
slide-18
SLIDE 18

Mention Ranking

  • Infer global structure by making a sequence of local decisions
slide-19
SLIDE 19

Challenge

How do we train a model to make local decisions such that they produce a good global structure?

slide-20
SLIDE 20

Some Local Decisions Matter More than Others

slide-21
SLIDE 21

Prior Work

Heuristically defines which error types are more important than others

slide-22
SLIDE 22

Prior Work: Coreference Error Types

slide-23
SLIDE 23

Learning Algorithms

Heuristic Loss Function

slide-24
SLIDE 24

Prior Work: Heuristic Loss Function

Heuristically defined costs for mistakes

slide-25
SLIDE 25

Prior Work: Heuristic Loss Function

Max-Margin Loss (Wiseman et al., 2015)
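The slack-rescaled max-margin loss hinges on each candidate antecedent, scaled by a heuristic cost Δ for that error type. A minimal sketch; the function and argument names (and using a cost of 0 for correct antecedents) are our assumptions:

```python
def max_margin_loss(scores, gold, cost):
    """Slack-rescaled max-margin loss over one mention's candidates.

    scores: model score for each candidate antecedent
    gold:   indices of correct antecedents for this mention
    cost:   heuristic cost (Delta) per candidate, 0 for correct ones
    """
    # Highest-scoring correct antecedent.
    s_gold = max(scores[g] for g in gold)
    # Hinge on each candidate, rescaled by its heuristic error cost.
    return max(cost[c] * max(0.0, 1.0 + scores[c] - s_gold)
               for c in range(len(scores)))
```

When the gold antecedent already outscores every wrong candidate by the margin, the loss is zero; otherwise the costliest violated margin dominates.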

slide-26
SLIDE 26

Prior Work: Heuristic Loss Function

Disadvantages

  • Requires careful tuning of hyperparameters using slow grid search
  • Does not generalize across datasets, languages, metrics
  • Does not optimize for evaluation metric
  • At best loss is correlated with metric
slide-27
SLIDE 27

Reinforcement Learning to the Rescue!

  • Does not require hyperparameter tuning
  • Small boost in accuracy
slide-28
SLIDE 28

Coref Resolution with Reinforcement Learning

  • Model takes a sequence of actions a_{1:T} = a_1, a_2, …, a_T
  • Action a_i = (c, m_i) adds a coreference link between the i-th mention m_i and candidate antecedent c
slide-29
SLIDE 29

Coref Resolution with Reinforcement Learning

  • After completing a sequence of actions, model receives a reward (the B³ metric)
slide-30
SLIDE 30

Learning Algorithms

REINFORCE algorithm (Williams, 1992)

slide-31
SLIDE 31

REINFORCE Algorithm

slide-32
SLIDE 32

REINFORCE Algorithm

  • Competitive with heuristic loss
  • Disadvantage vs. max-margin loss:
  • REINFORCE maximizes performance in expectation
  • We only need the highest-scoring action(s) to be correct, not the low-scoring ones
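REINFORCE samples actions and reinforces those whose reward beats a baseline. A toy sketch on a bandit where each action has a fixed reward, standing in for the end-of-document coreference reward; all names and hyperparameters here are our assumptions, not the paper's code:

```python
import math
import random

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(arm_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE (Williams, 1992) on a toy bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)   # policy logits
    baseline = 0.0                     # running-average reward baseline
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choices(range(len(p)), weights=p)[0]  # sample an action
        r = arm_rewards[a]
        baseline += 0.01 * (r - baseline)
        # Policy-gradient update: (r - baseline) * grad log p(a), where
        # grad log p(a) for a softmax policy is indicator(j == a) - p[j].
        for j in range(len(theta)):
            theta[j] += lr * (r - baseline) * ((1.0 if j == a else 0.0) - p[j])
    return softmax(theta)
```

After training, the policy concentrates on the highest-reward action, illustrating the "in expectation" behavior: probability mass shifts toward good actions, but nothing forces low-probability actions to be ranked correctly.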
slide-33
SLIDE 33

Combine the Best of Both Worlds!

Improve the cost function in the max-margin loss

slide-34
SLIDE 34

Learning Algorithms

Reward-Rescaling

slide-35
SLIDE 35

Reward-Rescaling

  • Since actions are independent, we can change an action a_i to a different one a_i′ and see what reward we would have gotten instead

slide-36
SLIDE 36

Reward-Rescaling

  • Since actions are independent, we can change an action a_i to a different one a_i′ and see what reward we would have gotten instead

slide-37
SLIDE 37

Reward-Rescaling

  • Since actions are independent, we can change an action a_i to a different one a_i′ and see what reward we would have gotten instead
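Because the decisions are independent, the cost of an action can be measured as the reward lost relative to the best alternative at that step, holding all other actions fixed. A sketch of that idea with a toy reward function; the names and inputs are our assumptions:

```python
def rescaled_costs(actions, alternatives, reward_fn):
    """Cost of each action a_i: the best reward achievable by changing
    only a_i, minus the reward of the sequence actually taken."""
    costs = []
    for i in range(len(actions)):
        best = max(reward_fn(actions[:i] + [alt] + actions[i + 1:])
                   for alt in alternatives[i])
        costs.append(best - reward_fn(actions))
    return costs
```

With a toy reward that counts matches against a gold sequence, a wrong action gets a positive cost and a correct one gets zero, so mistakes that hurt the final reward more are penalized more.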

slide-38
SLIDE 38

Reward-Rescaling

slide-39
SLIDE 39

Experimental Setup

  • English and Chinese CoNLL 2012 Shared Task datasets
  • Mentions predicted with the Stanford rule-based system (Lee et al., 2011)
  • Scores are CoNLL F1 scores
  • Average of the MUC, B³, and CEAF metrics
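The CoNLL score is the unweighted mean of the three metrics' F1 values; only the helper name below is ours:

```python
def conll_f1(muc_f1, b_cubed_f1, ceaf_f1):
    """CoNLL score: unweighted mean of MUC, B-cubed, and CEAF F1."""
    return (muc_f1 + b_cubed_f1 + ceaf_f1) / 3.0
```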
slide-40
SLIDE 40

Neural Mention Ranking Model

Standard feed-forward neural network (Clark and Manning, 2016)

slide-41
SLIDE 41

Features

  • Word Embeddings
  • Previous two words, first word, last word, head word of each mention
  • Groups of words as average of vectors for each word in the group
  • Also
  • Distance
  • String Matching
  • Document Genre
  • Speaker Information
  • Separate network for anaphoricity scores
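Representing a group of words as the average of their vectors might look like the following sketch; the zero-vector fallback for out-of-vocabulary words is our assumption, not necessarily the authors' choice:

```python
def group_embedding(words, emb, dim=4):
    """Average the word vectors of a group of words.

    emb maps a word to its vector; unknown words fall back to zeros.
    """
    zero = [0.0] * dim
    vecs = [emb.get(w, zero) for w in words] or [zero]
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
```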
slide-42
SLIDE 42

Evaluation

slide-43
SLIDE 43

Error Breakdown: Avoiding Costly Mistakes

  • Reward-Rescaling makes more errors in total!
  • However, the errors are less severe
slide-44
SLIDE 44

Comparison with Heuristic Loss

  • High variance in costs for a given error type
  • The distribution of “false new” costs is spread out, so using a fixed penalty per error type is insufficient

slide-45
SLIDE 45

Example Improvement: Proper Nouns

  • Fewer “false new” errors with proper nouns
slide-46
SLIDE 46

Conclusion

Heuristic Loss < REINFORCE < Reward-Rescaling

  • Why?
  • Keeps the benefit of the max-margin loss
  • Directly optimizes coreference metrics rather than a heuristic cost function
  • Advantages:
  • Does not require hyperparameter tuning
  • Small boost in accuracy with fewer costly mistakes
slide-47
SLIDE 47

Caveats

  • Reward metric needs to be fast since it will be computed many times!
  • May overfit to the evaluation metric
slide-48
SLIDE 48

Thank You

Any Questions?