
slide-1
SLIDE 1

Statistical Script Learning with Recurrent Neural Nets

Karl Pichotta Dissertation Proposal December 17, 2015

1

slide-2
SLIDE 2

Motivation

  • Following the Battle of Actium, Octavian invaded Egypt. As he approached Alexandria, Antony's armies deserted to Octavian on August 1, 30 BC.

  • Did Octavian defeat Antony?

2

slide-3
SLIDE 3

Motivation

  • Following the Battle of Actium, Octavian invaded Egypt. As he approached Alexandria, Antony's armies deserted to Octavian on August 1, 30 BC.

  • Did Octavian defeat Antony?

3

slide-4
SLIDE 4

Motivation

  • Antony’s armies deserted to Octavian ⇒ Octavian defeated Antony

  • Not simply a paraphrase rule!
  • Need world knowledge.

4

slide-5
SLIDE 5

Scripts

  • Scripts: models of events in sequence.
  • Events don’t appear in text randomly, but according to world dynamics.

  • Scripts try to capture these dynamics.
  • Enable automatic inference of implicit events, given events in text (e.g. Octavian defeated Antony).

5

slide-6
SLIDE 6

Research Questions

  • How can Neural Nets improve automatic inference of events from documents?
  • Which models work best empirically?
  • Which types of explicit linguistic knowledge are

useful?

6

slide-7
SLIDE 7

Outline

  • Background
  • Completed Work
  • Proposed Work
  • Conclusion

7

slide-8
SLIDE 8

Outline

  • Background
  • Statistical Scripts
  • Recurrent Neural Nets




8

slide-9
SLIDE 9

Background: Statistical Scripts

  • Statistical Scripts: Statistical Models of Event Sequences.
  • Non-statistical scripts date back to the 1970s [Schank & Abelson 1977].
  • Statistical script learning is a small-but-growing subcommunity [e.g. Chambers & Jurafsky 2008].
  • Model the probability of an event given prior events.

9

slide-10
SLIDE 10

Background: Statistical Script Learning

10

Millions of Documents → NLP Pipeline (Syntax, Coreference) → Millions of Event Sequences → Train a Statistical Model

slide-11
SLIDE 11

Background: Statistical Script Inference

11

New Test Document → NLP Pipeline (Syntax, Coreference) → Single Event Sequence → Query Trained Statistical Model → Inferred Probable Events

slide-12
SLIDE 12

Background: Statistical Scripts

  • Central Questions:
  • What is an “Event?” (Part 1 of completed work)
  • Which models work well? (Part 2 of completed work)

  • How to evaluate?
  • How to incorporate into end tasks?

12

slide-13
SLIDE 13

Outline

  • Background
  • Statistical Scripts
  • Recurrent Neural Nets




13

slide-14
SLIDE 14

Background: RNNs

  • Recurrent Neural Nets (RNNs): Neural Nets with cycles in computation graph.
  • RNN Sequence Models: Map inputs x1, …, xt to outputs o1, …, ot via learned latent vector states z1, …, zt (sketch below).

14
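Concretely, one step of a simple recurrence of this kind might look like the following NumPy sketch; the weight names and shapes are illustrative, not from the proposal:

    import numpy as np

    def softmax(v):
        e = np.exp(v - v.max())
        return e / e.sum()

    def rnn_step(x_t, z_prev, W_xz, W_zz, W_zo, b_z, b_o):
        # New latent state z_t from the current input and previous state,
        # then an output distribution o_t read off the state.
        z_t = np.tanh(W_xz @ x_t + W_zz @ z_prev + b_z)
        o_t = softmax(W_zo @ z_t + b_o)
        return z_t, o_t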

slide-15
SLIDE 15

Background: RNNs

[Figure: simple recurrent network; Elman 1990]

15

slide-16
SLIDE 16

Background: RNNs

  • Hidden Unit can be arbitrarily complicated, as long as we can calculate gradients!

16
slide-17
SLIDE 17

Background: LSTMs

  • Long Short-Term Memory (LSTM): More complex hidden RNN unit [Hochreiter & Schmidhuber, 1997].

  • Explicitly addresses two issues:
  • Vanishing Gradient Problem.
  • Long-Range Dependencies.

17

slide-18
SLIDE 18

Background: LSTM

gt = tanh(Wx,g xt + Wz,g zt−1 + bg)
it = σ(Wx,i xt + Wz,i zt−1 + bi)
ft = σ(Wx,f xt + Wz,f zt−1 + bf)
ot = σ(Wx,o xt + Wz,o zt−1 + bo)
mt = ft ⊙ mt−1 + it ⊙ gt
zt = ot ⊙ tanh(mt)

18
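These updates transcribe directly into NumPy; a minimal sketch (the dictionary-of-weights layout is just for readability):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def lstm_step(x_t, z_prev, m_prev, W, b):
        g = np.tanh(W['xg'] @ x_t + W['zg'] @ z_prev + b['g'])  # candidate values
        i = sigmoid(W['xi'] @ x_t + W['zi'] @ z_prev + b['i'])  # input gate
        f = sigmoid(W['xf'] @ x_t + W['zf'] @ z_prev + b['f'])  # forget gate
        o = sigmoid(W['xo'] @ x_t + W['zo'] @ z_prev + b['o'])  # output gate
        m_t = f * m_prev + i * g    # memory cell: gated blend of old and new
        z_t = o * np.tanh(m_t)      # hidden state exposed to the next step
        return z_t, m_t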
slide-19
SLIDE 19

Background: LSTMs

  • LSTMs successful for many hard NLP tasks recently:
  • Machine Translation [Kalchbrenner and Blunsom 2013, Bahdanau et al. 2015].
  • Captioning Images/Videos [Donahue et al. 2015, Venugopalan et al. 2015].
  • Language Modeling [Sundermeyer et al. 2012, Kim et al. 2016].
  • Question Answering [Hermann et al. 2015, Gao et al. 2015].

19

slide-20
SLIDE 20

Outline

  • Background
  • Completed Work
  • Proposed Work
  • Conclusion

20

slide-21
SLIDE 21

Outline

  • Background
  • Completed Work
  • Multi-Argument Events
  • RNN Scripts
21
slide-22
SLIDE 22

Outline

  • Background
  • Completed Work
  • Multi-Argument Events
  • RNN Scripts

22

slide-23
SLIDE 23

Events

  • To model “events,” we need a formal definition.
  • For us, it will be variations of “verbs with participants.”
23

slide-24
SLIDE 24
Pair Events

  • Other Methods use (verb, dependency) pair events [Chambers & Jurafsky 2008; 2009; Jans et al. 2012; Rudinger et al. 2015].
  • An event is a (vb, dep) pair: a Verb plus the Syntactic Dependency an entity holds to it.
  • Captures how an entity relates to a verb.

24

slide-25
SLIDE 25

Pair Events

  • Napoleon remained married to Marie Louise, though she did not join him in exile on Elba and thereafter never saw her husband again.

N.: (remain_married, subj), (not_join, obj), (not_see, obj)
M.L.: (remain_married, prep), (not_join, subj), (not_see, subj)

  • …Doesn’t capture interactions between entities.

25

slide-26
SLIDE 26

Multi-Argument Events

[P. & Mooney, EACL 2014]

  • Use more complex events with multiple entities.
  • Learning is more complicated…
  • …But inferred events are quantitatively better.

26

slide-27
SLIDE 27
Multi-Argument Events

  • We represent events as tuples: v(es, eo, ep), with a Verb v, Subject Entity es, Object Entity eo, and Prepositional Entity ep.
  • Entities may be null (“·”).
  • Entities have only coreference information.

27

slide-28
SLIDE 28

Multi-Argument Events

  • Napoleon remained married to Marie Louise, though she did not join him in exile on Elba and thereafter never saw her husband again.

remain_married(N, ·, to ML); not_join(ML, N, ·); not_see(ML, N, ·)

  • Incorporate entities into events as variables.
  • Captures pairwise interaction between entities.

28
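As a sketch, this tuple representation might be coded like so (names are illustrative; “·” marks a null argument, as on the slide):

    from collections import namedtuple

    NULL = '·'  # null argument marker
    Event = namedtuple('Event', ['verb', 'subj', 'obj', 'prep'])

    # The Napoleon / Marie Louise sequence from above:
    events = [
        Event('remain_married', 'N', NULL, 'ML'),
        Event('not_join', 'ML', 'N', NULL),
        Event('not_see', 'ML', 'N', NULL),
    ]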

slide-29
SLIDE 29

Entity Rewriting

remain_married(N, ·, to ML) not_join(ML, N, ·) not_see(ML, N, ·)

  • not_join(x, y, ·) should predict not_see(x, y, ·) for all x, y.
  • During learning, canonicalize co-occurring events:
  • Rename variables to a small fixed set.
  • Add co-occurrences of all consistent rewritings of the events (sketch below).

29
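A minimal sketch of the canonicalization step, assuming events are tuples of a verb plus entity arguments as above:

    def canonicalize(events, null='·'):
        # Rename entity variables to a small fixed set, in order of first
        # mention, so counts generalize across entity names.
        # (Assumes few distinct entities per window.)
        names = {}
        out = []
        for verb, *args in events:
            renamed = []
            for a in args:
                if a == null:
                    renamed.append(null)
                else:
                    if a not in names:
                        names[a] = 'xyzuvw'[len(names)]
                    renamed.append(names[a])
            out.append((verb, *renamed))
        return out

    # canonicalize([('not_join', 'ML', 'N', '·'), ('not_see', 'ML', 'N', '·')])
    # -> [('not_join', 'x', 'y', '·'), ('not_see', 'x', 'y', '·')]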

slide-30
SLIDE 30

Learning & Inference

  • Learning: From large corpus, count N(a,b), the number of times event b occurs after event a with at most two intervening events (“2-skip bigram” counts; sketch below).

  • Inference: Infer event b at timestep t according to:

S(b) = Σ_{i=1..t} log P(b | a_i) + Σ_{i=t+1..ℓ} log P(a_i | b)

  • First term: prob. of b following the events before t.
  • Second term: prob. of b preceding the events after t.

30

[Jans et al. 2012]
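Both steps fit in a short sketch, assuming events are hashable tuples; the add-alpha smoothing for the conditional probabilities is illustrative, not the exact estimator used:

    from collections import defaultdict
    from math import log

    def skip_bigram_counts(sequences, k=2):
        # N[a][b]: times event b occurs after event a with at most
        # k intervening events.
        N = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for i, a in enumerate(seq):
                for b in seq[i + 1 : i + k + 2]:
                    N[a][b] += 1
        return N

    def log_prob(N, a, b, vocab_size, alpha=1.0):
        # Add-alpha smoothed estimate of log P(b | a).
        return log((N[a][b] + alpha) / (sum(N[a].values()) + alpha * vocab_size))

    def score(b, seq, t, N, vocab_size):
        # S(b): b following the events before position t,
        # plus b preceding the events after t.
        return (sum(log_prob(N, a, b, vocab_size) for a in seq[:t]) +
                sum(log_prob(N, b, a, vocab_size) for a in seq[t:]))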

slide-31
SLIDE 31

Evaluation

  • “Narrative Cloze” (Chambers & Jurafsky, 2008): from an unseen document, hold one event out, try to infer it given remaining document.
  • “Recall at k” (Jans et al., 2012): make k top inferences, calculate recall of held-out events (sketch below).
  • We evaluate on a number of metrics, but only present one here for clarity (different results are comparatively similar).

31
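For example, recall at k over a set of cloze instances might be computed as in this sketch:

    def recall_at_k(ranked, held_out, k=10):
        # Fraction of cloze instances whose held-out event appears
        # among the model's top-k inferences.
        hits = sum(gold in preds[:k] for preds, gold in zip(ranked, held_out))
        return hits / len(held_out)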

slide-32
SLIDE 32

Experiments

  • Train on 1.1M NYT articles (Gigaword).
  • Use Stanford Parser/Coref.

32

slide-33
SLIDE 33

Results: Pair Events

Recall at 10 for inferring (verb, dependency) events:

Unigram: 0.297
Single-Protagonist: 0.282
Joint: 0.336

33

slide-34
SLIDE 34

Results: Multi-Argument Events

Recall at 10 for inferring multi-argument events:

Unigram: 0.216
Multi-Protagonist: 0.209
Joint: 0.245

34

slide-35
SLIDE 35

Outline

  • Background
  • Completed Work
  • Multi-Argument Events
  • RNN Scripts

35

slide-36
SLIDE 36

Co-occurrence Model Shortcomings

  • The co-occurrence-based method has shortcomings:
  • “x married y” and “x is married to y” are unrelated events.
  • Nouns are ignored (she sits on the chair vs. she sits on the board of directors).
  • Relative position of events in sequence is ignored (only one notion of co-occurrence).

36

slide-37
SLIDE 37

LSTM Script models

[P. & Mooney, AAAI 2016]

  • Feed event sequences into LSTM sequence model.
  • To infer events, have the model generate likely events from sequence.

  • Can input noun info, coref info, or both.

37

slide-38
SLIDE 38

LSTM Script models

  • In April 1866 Congress again passed the bill. Johnson again vetoed it.

[pass, congress, bill, in, april]; [veto, johnson, it, ·, ·]

38

slide-39
SLIDE 39

LSTM Script models

  • In April 1866 Congress again passed the bill. Johnson again vetoed it.

[pass, congress, bill, in, april]; [veto, johnson, it, ·, ·]

39

slide-40
SLIDE 40

LSTM Script models

  • In April 1866 Congress again passed the bill. Johnson again vetoed it.

[Diagram: the two events encoded component-by-component as (v, es, eo, ep, p) sequences, with entity arguments carrying coreference information.]

40

slide-41
SLIDE 41

LSTM Script models

  • Train on English Wikipedia.
  • Run Stanford parser, coref; extract sequences of events.
  • Train LSTM using Batch Stochastic Gradient Descent with Momentum.
  • Minimize cross-entropy loss of predictions.
  • Backpropagate error through layers and through time.
  • To infer new events, just have the LSTM generate the next five outputs with highest probability, using beam search (sketch below).

41
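A sketch of that beam-search step. The model interface here (step(state, last_token) returning a new state and (token, log-prob) pairs) is hypothetical, standing in for the trained LSTM:

    import heapq

    def beam_search(model, state, steps=5, width=10):
        # Each hypothesis: (cumulative log-prob, tokens so far, model state).
        beam = [(0.0, [], state)]
        for _ in range(steps):
            candidates = []
            for logp, toks, st in beam:
                new_st, dist = model.step(st, toks[-1] if toks else None)
                for tok, tok_logp in dist:
                    candidates.append((logp + tok_logp, toks + [tok], new_st))
            beam = heapq.nlargest(width, candidates, key=lambda c: c[0])
        return beam  # highest-probability next outputs first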

slide-42
SLIDE 42

Results: Predicting Verbs & Coreference Info

Recall at 25 for inferring verbs & coref info:

Unigram: 0.101
Joint: 0.124
LSTM coref: 0.145
LSTM coref+noun: 0.152

42

slide-43
SLIDE 43

Results: Predicting Verbs & Nouns

Recall at 25 for inferring verbs & nouns:

Unigram: 0.025
Joint: 0.037
LSTM noun: 0.054
LSTM coref+noun: 0.061

43

slide-44
SLIDE 44

Human Evaluations

  • Solicit judgments on individual inferences on Amazon Mechanical Turk.
  • Have annotators rate inferences from 1–5 (or mark “Nonsense,” scored 0).
  • More interpretable.

44

slide-45
SLIDE 45

Results: Crowdsourced Eval

Filtered human judgments of top inferences (5 max):

Random: 0.87
Joint Entity: 2.87
Joint Noun: 2.21
LSTM Entity: 3.08
LSTM Noun: 3.67

45

slide-46
SLIDE 46

Annotator Examples

46

As a result, during the October municipal election, serious violence broke out on polling day, with shots exchanged by competing mobs.

Random: “appeal to X” (2.7)
Joint Ent: “X has a X” (0.3)
Joint Noun: “known as X” (3.3)
LSTM Ent: “X has a X” (0.3)
LSTM Noun: “X was arrested” (4.3)
slide-47
SLIDE 47

Annotator Examples

47

Today the remaining community has shrunk to about 50 mostly elderly people. The Kehila Kedosha Yashan Synagogue remains locked, only opened for visitors on request. Emigrant Romaniotes return every summer and open the old synagogue.

Random: “all of the X’s men were lost” (2.0)
Joint Ent: “X found” (2.0)
Joint Noun: “X wrote” (2.0)
LSTM Ent: “build a X” (1.7)
LSTM Noun: “synagogue was closed” (3.0)
slide-48
SLIDE 48

Generating “Stories”

  • Can generate “stories” by starting with <S> beginning-of-sequence pseudo-event.
  • Sample from distribution of initial event components (first verb).
  • Take sample as first-step input, sample distribution of next components.
  • Repeat until sampling </S> end-of-sequence token (sketch below).

48
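A sketch of that sampling loop, again assuming a hypothetical model interface (initial_state(), step(), and a vocab attribute):

    import numpy as np

    def sample_story(model, max_len=100):
        # Sample event components until the end-of-sequence token.
        state, tokens = model.initial_state(), ['<S>']
        while tokens[-1] != '</S>' and len(tokens) < max_len:
            state, probs = model.step(state, tokens[-1])  # distribution over vocab
            tokens.append(np.random.choice(model.vocab, p=probs))
        return [t for t in tokens if t not in ('<S>', '</S>')]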

slide-49
SLIDE 49

Generated “Stories”

Generated event tuples → English descriptions:

(bear, ·, ·, kingdom, into) → Born into a kingdom, …
(attend, she, brown, graduation, after) → …she attended Brown after graduation.
(earn, she, master, university, from) → She earned her Masters from the University.
(admit, ·, she, university, to) → She was admitted to a University.
(receive, she, bachelor, university, from) → She had received a bachelors from a University.
(involve, ·, she, production, in) → She was involved in the production.
(represent, she, company, ·, ·) → She represented the company.

49

slide-50
SLIDE 50

Outline

  • Background
  • Completed Work
  • Proposed Work
  • Conclusion

50

slide-51
SLIDE 51

Research Questions

  • How can Neural Nets improve automatic inference of events from documents?
  • Which models work best empirically?
  • Which types of explicit linguistic knowledge are

useful?

51

slide-52
SLIDE 52

Outline

  • Background
  • Completed Work
  • Proposed Work
  • Conclusion

52

slide-53
SLIDE 53

Outline

  • Background
  • Completed Work
  • Proposed Work




  • Better Models
  • Better Events
  • Discourse Relations
  • Bonus
  • Coreference
  • Question-Answering

53

slide-54
SLIDE 54

Outline

  • Background
  • Completed Work
  • Proposed Work




  • Better Models
  • Better Events
  • Discourse Relations
  • Bonus
  • Coreference
  • Question-Answering

54

slide-55
SLIDE 55

Better Models

  • Other Neural Approaches may work better than LSTM.

55

slide-56
SLIDE 56

Better Models (1/3)

  • Different kinds of RNN:
  • Gated Recurrent Units (GRUs) [Cho et al. 2014]
  • Grid LSTM [Kalchbrenner et al. 2015]
  • Gated Feedback Recurrent Units [Chung et al. 2015]
  • Replacing one black box with another.

56

slide-57
SLIDE 57

Better Models (2/3)

  • Convolutional Neural Networks (CNNs):
  • Learn 1D convolution operators to apply to event sequences (sketch below).
  • Arrive ultimately at vector predicting next event(s).
  • Recent success with NLP classification tasks [Kalchbrenner, Grefenstette, & Blunsom 2014; Kim 2014; Zhang, Zhao, & LeCun 2015].

57
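As a sketch, one such learned filter amounts to a 1-D convolution over event embeddings followed by max-over-time pooling (shapes illustrative):

    import numpy as np

    def conv1d_feature(event_vecs, filt, bias=0.0):
        # event_vecs: (T, d) sequence of event embeddings; filt: (w, d) filter.
        # Returns one pooled feature: max activation over all length-w windows.
        T, _ = event_vecs.shape
        w = filt.shape[0]
        acts = [np.tanh(np.sum(event_vecs[i:i + w] * filt) + bias)
                for i in range(T - w + 1)]
        return max(acts)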

slide-58
SLIDE 58

Better Models (3/3)

  • Attention-based Models: contain explicit notion of where in input is most predictively useful (where to “pay attention”).
  • Recently shown to be useful in NLP tasks (Bahdanau et al. 2015, Hermann et al. 2015).

58

slide-59
SLIDE 59

Better Models (3/3)

Churchill had suffered a mild stroke while on holiday in the south of France in the summer of 1949. The strain of carrying the Premiership and Foreign Office contributed to his stroke at 10 Downing Street after dinner on the evening of 23 June 1953. Despite being partially paralysed down one side, he presided over a Cabinet meeting the next morning without anybody noticing his incapacity. Thereafter his condition deteriorated, and it was thought that he might not survive the weekend. Had Eden been fit, Churchill's premiership would most likely have been over. News of this was kept from the public and from Parliament, who were told that Churchill was suffering from exhaustion. He went to his country home, Chartwell, to recuperate, and by the end of June he astonished his doctors by being able, dripping with perspiration, to lift himself upright from his chair. He joked that news of his illness had chased the trial of the serial killer John Christie off the front pages. Churchill was still keen to pursue a meeting with the Soviets and was open to the idea of a reunified Germany. He refused to condemn the Soviet crushing of East Germany, commenting on 10 July 1953 that "The Russians were surprisingly patient about the disturbances in East Germany". He thought this might have been the reason for the removal of Beria. Churchill returned to public life in October 1953 to make a speech at the Conservative Party conference at Margate.

59

slide-60
SLIDE 60

Better Models (3/3)

[Same Churchill passage as the previous slide.]

60

slide-61
SLIDE 61

Better Models (3/3)

[Same Churchill passage as the previous slide.]

61

slide-62
SLIDE 62

Better Models (3/3)

  • An attention-based script system would add an explicit distribution of predictive utility over observed events (sketch below).

62
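A sketch of one way to realize this: bilinear attention scores over the vectors of observed events, normalized into a distribution and used to form a weighted summary. The bilinear form is an assumption for illustration, not a committed design:

    import numpy as np

    def attend(event_vecs, query, W):
        # event_vecs: (T, d) observed-event vectors; query: (d,); W: (d, d).
        # Returns (attention weights over events, weighted summary vector).
        scores = event_vecs @ W @ query          # one score per observed event
        weights = np.exp(scores - scores.max())  # softmax into a distribution
        weights /= weights.sum()
        return weights, weights @ event_vecs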

slide-63
SLIDE 63

Outline

  • Background
  • Completed Work
  • Proposed Work




  • Better Models
  • Better Events
  • Discourse Relations
  • Bonus
  • Coreference
  • Question-Answering

63

slide-64
SLIDE 64

Raw Text v. Linguistics

64

Only raw text ← ? → Some Linguistic Structure ← ? → Arbitrarily complex NLP

slide-65
SLIDE 65

Better Events

  • Events in P. & Mooney (2016) are v(es, eo, ep, p) 5-tuples.
  • This throws away a lot of important information. Importance is an empirical question, but should be investigated.
  • We will investigate a number of ways to add information to events (enumerated next…).

65

slide-66
SLIDE 66

Better Events (1/4)

  • Fixed-arity events throw away multiple Prepositional Phrases.
  • In 1697, Peter the Great traveled incognito to Europe on an 18-month journey with a large Russian delegation to seek the aid of the European monarchs.
  • Is presently: (travel, peter, ·, (in 1697)).
  • Could be something like: (travel, peter, ·, (in 1697), (to europe), (on journey), (with delegation)).

66

slide-67
SLIDE 67

Better Events (2/4)

  • Many important modifiers that aren’t grammatically prepositions.
  • King Frederick William I nearly executed his son for desertion.
  • Without “nearly” we make drastically wrong inferences!

67

slide-68
SLIDE 68

Better Events (3/4)

  • Head Nouns of Arguments are Insufficient:
  • Martin Luther wrote to his bishop protesting the sale of indulgences.
  • If “Sale of Indulgences” is just “sale”…
  • …we can’t conclude “Luther disapproved of indulgences.”

68

slide-69
SLIDE 69

Better Events (4/4)

  • Nominal (noun) events are very common:
  • In the years following his death, a series of civil wars tore Alexander’s empire apart.

  • Noun events are crucial for inferring events.

69

slide-70
SLIDE 70

Discourse Relations

  • Relations between events are important.
  • The Roman cavalry won an early victory by swiftly routing the Carthaginian horses.
  • Because the local authorities had forbidden students from forming organizations, Princip and other members of Young Bosnia met in secret.
  • Connectives express relations between events, and are likely useful for event inference.

70

slide-71
SLIDE 71

Discourse Parsers

  • Off-the-shelf Discourse Parsers have been trained to annotate discursively important relations between spans of text.
  • Trained on one of two Discourse treebanks [RST Treebank and Penn Discourse Treebank].
  • Label spans of text as being, e.g., causally or temporally related.

71

slide-72
SLIDE 72

Using Discourse Parsers

  • Could incorporate shallow discourse structure into events (i.e. “these two events are related by this discourse relation”).
  • Could also hypothetically incorporate discourse structure directly into structure of Neural Net.
  • RST parses are trees; could use recently-introduced Tree-LSTMs [Tai et al. 2015] on the topology.

72

slide-73
SLIDE 73

Induced Connectives

  • Could also induce a closed class of connectives (e.g. “before,” “because of,” …) in an unsupervised manner.
  • A number of ways to integrate into RNN sequence model.

73

slide-74
SLIDE 74

Outline

  • Background
  • Completed Work
  • Proposed Work




  • Better Models
  • Better Events
  • Discourse Relations
  • Bonus
  • Coreference
  • Question-Answering

74

slide-75
SLIDE 75
Scripts for Coreference

  • Voltaire, pretending to work in Paris as an assistant to a notary, spent much of his time writing poetry. When his father found out, he sent Voltaire to study law. Nevertheless, he continued to write…
  • Unifying “he” and “Voltaire” could be done with script knowledge.
  • Script information can improve coreference.

75

slide-76
SLIDE 76
Scripts for Coreference

  • A number of conceivable ways to incorporate scripts into ML-based coreference engines:
  • Add script probability, assuming a coreference decision is made, as feature to coref system.
  • Add probability of sequence of all events involving an entity as a feature.

76

slide-77
SLIDE 77
Scripts for Question-Answering

  • Scripts are intuitively useful for Question-Answering Systems.
  • Add confident script inferences to Knowledge Base about document.
  • Would allow inferences about implicit events.

77

slide-78
SLIDE 78

Conclusion

  • LSTMs do much better than Markov-like statistical script systems.

  • We propose:
  • Using better neural models.
  • Better events.
  • More discourse-awareness.
  • Trying to improve coref.
  • Improving question-answering.

78

slide-79
SLIDE 79

Thanks!

79