SLIDE 1

Distant Supervision and MultiR

Happy Mittal

SLIDE 2

We will discuss

  • Distant Supervision [Mintz et al., 2009]
  • MultiR [Hoffmann et al., 2011]
SLIDE 3

Relation Instance Extraction

  • Fully Supervised Learning
    • Labeled corpora of sentences.
    • Suffers from small datasets and domain bias.
  • Unsupervised Learning
    • Clusters patterns to identify relations.
    • Large corpora available.
    • Cannot assign names to the relations identified.
  • Bootstrap Learning
    • Starts from initial seed patterns and facts.
    • Generates more facts and patterns iteratively.
    • Suffers from semantic drift.
  • Distant Supervision
    • Combines the advantages of the above approaches.

Hrithik Roshan’s movie Kaabil features a love affair between two blind people.

Actor(Hrithik Roshan, Kaabil)

SLIDE 4

Distant Supervision [Mintz et al., 2009]

Sentences (e.g., Wikipedia articles)

Knowledge base (e.g., Freebase):

  Person         Birth Place
  Edwin Hubble   Marshfield
  …              …

Generate training data. HOW?

Assumption: a fact r(e1, e2) in the KB implies that every sentence containing entities e1 and e2 expresses relation r.
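Under this assumption, training-data generation can be sketched in a few lines (a minimal illustration with hypothetical toy data; real systems use NER and entity linking rather than substring matching):

```python
# Minimal sketch of distant-supervision labeling: every sentence that
# contains both arguments of a KB fact is labeled with that fact's relation.

kb = {("Edwin Hubble", "Marshfield"): "Birthplace"}

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble spent his childhood in Marshfield.",
    "Edwin Hubble worked at Mount Wilson Observatory.",
]

def generate_training_data(kb, sentences):
    examples = []
    for (e1, e2), relation in kb.items():
        for sent in sentences:
            if e1 in sent and e2 in sent:   # the (strong) DS assumption
                examples.append((sent, e1, e2, relation))
    return examples

for sent, e1, e2, rel in generate_training_data(kb, sentences):
    print(rel, "|", sent)
```

Note that the second sentence gets labeled Birthplace even though it does not express the relation; this is exactly the noise the later slides address.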

SLIDE 5

Distant Supervision (Generating training data)

  • Astronomer Edwin Hubble was born in Marshfield, Missouri.
  • Features:
    • Lexical Features
      • Entity types of both entities.

  NE1   NE2   Label
  PER   LOC   Birthplace

SLIDE 6

Distant Supervision (Generating training data)

  • Astronomer Edwin Hubble was born in Marshfield, Missouri.
  • Features:
    • Lexical Features
      • Words between the entities and their POS tags.

  NE1   Middle                           NE2   Label
  PER   [was/VERB born/VERB in/CLOSED]   LOC   Birthplace

SLIDE 7

Distant Supervision (Generating training data)

  • Astronomer Edwin Hubble was born in Marshfield, Missouri.
  • Features:
    • Lexical Features
      • Window of k words to the left and right, k ∈ {0, 1, 2}.

  Left Window      NE1   Middle                           NE2   Right Window   Label
  []               PER   [was/VERB born/VERB in/CLOSED]   LOC   []             Birthplace
  [Astronomer]     PER   [was/VERB born/VERB in/CLOSED]   LOC   [,]            Birthplace
  [#,Astronomer]   PER   [was/VERB born/VERB in/CLOSED]   LOC   [,Missouri]    Birthplace
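The lexical features above (middle tokens plus left/right windows of width k) can be sketched roughly as follows; the token spans, feature layout, and padding handling are simplifying assumptions, not the paper's exact implementation:

```python
# Rough sketch of Mintz-style lexical features for one tagged sentence.
# `tagged` is a list of (word, POS) pairs; entity spans are token index ranges.

def lexical_features(tagged, e1_span, e2_span, k_values=(0, 1, 2)):
    """Build middle-token and window features between two entity spans."""
    middle = tagged[e1_span[1]:e2_span[0]]
    middle_str = " ".join(f"{w}/{t}" for w, t in middle)
    feats = []
    for k in k_values:
        left = tagged[max(0, e1_span[0] - k):e1_span[0]]
        right = tagged[e2_span[1]:e2_span[1] + k]
        feats.append((
            [w for w, _ in left],     # left window of k words
            middle_str,               # words + POS tags between entities
            [w for w, _ in right],    # right window of k words
        ))
    return feats

tagged = [("Astronomer", "NOUN"), ("Edwin", "NOUN"), ("Hubble", "NOUN"),
          ("was", "VERB"), ("born", "VERB"), ("in", "CLOSED"),
          ("Marshfield", "NOUN"), (",", "CLOSED"), ("Missouri", "NOUN")]

# Entity spans: "Edwin Hubble" = tokens 1..2, "Marshfield" = token 6
for left, mid, right in lexical_features(tagged, (1, 3), (6, 7)):
    print(left, "|", mid, "|", right)
```

The slide's `#` sentence-start padding is omitted here for brevity.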

SLIDE 8

Distant Supervision (Generating training data)

  • Astronomer Edwin Hubble was born in Marshfield, Missouri.
  • Features:
    • Syntactic Features
      • Dependency path between the entities.
      • Window nodes in the dependency path.
SLIDE 9

Distant Supervision

  • Strong assumption: if a fact r(e1, e2) is in the KB, then
    • every sentence containing e1 and e2 expresses relation r.
  • Relax this assumption:
    • at least one sentence containing e1 and e2 expresses relation r [Riedel et al., 2010].
SLIDE 10

Relaxing the assumption [Riedel et al., 2010]

[Figure: Y ∈ R is the relation variable (here Y = Founded); each sentence X_i has a binary relation-mention variable Z_i ∈ {0, 1}.
  X1 : "Steve Jobs founded Apple"        → Z1 = 1
  X2 : "Steve Jobs is the CEO of Apple"  → Z2 = 0]

  • Model the joint distribution p(Y = y, Z = z | x).
SLIDE 11

Relaxing the assumption [Riedel et al., 2010]

[Figure repeated from the previous slide: Y = Founded; Z1 = 1 for "Steve Jobs founded Apple", Z2 = 0 for "Steve Jobs is the CEO of Apple".]

  • Model the joint distribution p(Y = y, Z = z | x).
  • Problem: doesn’t allow overlapping relations.
  • MultiR solves that problem.
SLIDE 12

MultiR [Hoffmann et al., 2011]

[Figure: aggregate relation variables Y ∈ {0, 1}^R capture entity-pair-level predictions; mention variables Z_i ∈ R capture sentence-level predictions.
  X1 : "Steve Jobs founded Apple"        → Z1 = Founded
  X2 : "Steve Jobs is the CEO of Apple"  → Z2 = CEO-of
  X3 : "Steve Jobs left Apple"           → Z3 = None
  Active aggregate relations: Founded, CEO-of]

SLIDE 13

MultiR [Hoffmann et al., 2011]

  • Probability Distribution

  p(Y = y, Z = z | x) = (1/Z_x) · ∏_r Φ_join(y_r, z) · ∏_i Φ_extract(z_i, x_i)

  • Φ_join(y_r, z) = 1 if at least one z_i mentions relation r (and y_r agrees), 0 otherwise.
  • Φ_extract(z_i, x_i) is based on [Mintz et al.] features.
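A toy score computation under this factorization might look like the following sketch (the relations, features, and weights are invented for illustration; the deterministic Φ_join and log-linear Φ_extract follow the definitions above):

```python
import math

# Toy (unnormalized) MultiR joint score: a deterministic join factor times
# log-linear extract factors over per-sentence features.

def phi_join(y, z):
    """1 if the aggregate labels y agree exactly with the mentions z."""
    mentioned = {r for r in z if r != "None"}
    return 1.0 if {r for r, on in y.items() if on} == mentioned else 0.0

def phi_extract(theta, features, relation):
    """Log-linear factor over sentence features for one mention label."""
    return math.exp(sum(theta.get((f, relation), 0.0) for f in features))

def joint_score(theta, y, z, sentence_feats):
    score = phi_join(y, z)
    for z_i, feats in zip(z, sentence_feats):
        score *= phi_extract(theta, feats, z_i)
    return score

theta = {("founded", "Founded"): 2.0, ("ceo", "CEO-of"): 1.5}
sentence_feats = [["founded"], ["ceo"]]
y = {"Founded": True, "CEO-of": True}
z = ["Founded", "CEO-of"]
print(joint_score(theta, y, z, sentence_feats))  # exp(2.0) * exp(1.5)
```

Any assignment where y disagrees with the mentions gets score 0, which is what makes the join factor deterministic.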

SLIDE 14

MultiR [Hoffmann et al., 2011]

  • Parameter Learning

  p(Y = y, Z = z | x; θ) = (1/Z_x) · ∏_r Φ_join(y_r, z) · ∏_i Φ_extract(z_i, x_i)
                         = (1/Z_x) · ∏_r Φ_join(y_r, z) · ∏_i exp( Σ_j θ_j φ_j(z_i, x_i) )

  • Φ_join(y_r, z) = 1 if at least one z_i mentions relation r; Φ_extract is based on [Mintz et al.] features.
  • Treat the Z variables as latent.
  • Interested in maximizing the likelihood:

  L(θ) = ∏_i p(y_i | x_i; θ) = ∏_i Σ_z p(y_i, z | x_i; θ)

  ℓ(θ) = Σ_i log Σ_z p(y_i, z | x_i; θ)

SLIDE 15

MultiR [Hoffmann et al., 2011]

  • Parameter learning
    • Assumes online training (parameters updated one example at a time).

SLIDE 16

MultiR [Hoffmann et al., 2011]

  • Parameter learning
    • The sum over z is difficult to compute; compute the argmax instead.

SLIDE 17

MultiR [Hoffmann et al., 2011]

  • Learning Algorithm
    • Requires two inference steps: argmax_{y,z} p(y, z | x; θ) and argmax_z p(z | x, y; θ).
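The two inference steps drive a perceptron-style update, which can be sketched as follows (the inference routines and feature extractor are stubs passed in by the caller; the names are illustrative, not the paper's code):

```python
# Sketch of a hidden-variable perceptron update for MultiR-style learning.
# infer_joint: argmax over (y, z); infer_conditioned: argmax over z given gold y.

def perceptron_update(theta, x, gold_y, infer_joint, infer_conditioned, phi):
    pred_y, pred_z = infer_joint(theta, x)            # inference 1
    if pred_y != gold_y:
        gold_z = infer_conditioned(theta, x, gold_y)  # inference 2
        for f, v in phi(x, gold_z).items():           # promote gold assignment
            theta[f] = theta.get(f, 0.0) + v
        for f, v in phi(x, pred_z).items():           # demote predicted assignment
            theta[f] = theta.get(f, 0.0) - v
    return theta
```

When the predicted aggregate labels already match the KB facts, no update is made; otherwise the weights move toward the best hidden assignment consistent with the gold labels.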

SLIDE 18

MultiR Inference 1 : argmax_{y,z} p(y, z | x; θ)

[Figure: aggregate relation variables Y ∈ {0, 1}^R (Founder, CEO-of, Capital) and sentence-level mention variables Z_i ∈ R, all unknown ("?").]

  X1 : "Steve Jobs founded Apple"
  X2 : "Apple was founded by Steve Jobs"
  X3 : "Steve Jobs is the CEO of Apple"

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

SLIDE 19

MultiR Inference 1 : argmax_{y,z} p(y, z | x; θ)

[Figure: same as the previous slide (animation step); all aggregate and mention variables still unknown.]
SLIDE 20

MultiR Inference 1 : argmax_{y,z} p(y, z | x; θ)

[Figure: mention variables assigned from the extract scores; aggregate variables still unknown.
  X1 : "Steve Jobs founded Apple"        → Z1 = Founder
  X2 : "Apple was founded by Steve Jobs" → Z2 = Founder
  X3 : "Steve Jobs is the CEO of Apple"  → Z3 = CEO-of]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

SLIDE 21

MultiR Inference 1 : argmax_{y,z} p(y, z | x; θ)

[Figure: aggregate variables set from the mentions: Founder = 1, CEO-of = 1, Capital = 0.
  X1 → Z1 = Founder, X2 → Z2 = Founder, X3 → Z3 = CEO-of]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

  Runs in O(|R| · |S|) time.
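This worked example can be reproduced in a few lines (a sketch: because the join factor is deterministic, each mention independently takes its best-scoring relation, and an aggregate relation is switched on if any mention selects it):

```python
# Inference 1 sketch: pick the best relation per sentence from the extract
# scores, then aggregate. Scores are the table from the slides.
scores = {
    "Founder": {"X1": 10.5, "X2": 12.5, "X3": 4.5},
    "CEO-of":  {"X1": 8.9,  "X2": 8.7,  "X3": 8.5},
    "Capital": {"X1": 6.3,  "X2": 4.5,  "X3": 0.5},
}

def inference1(scores, sentences):
    z = {s: max(scores, key=lambda r: scores[r][s]) for s in sentences}
    y = {r: any(zr == r for zr in z.values()) for r in scores}
    return y, z

y, z = inference1(scores, ["X1", "X2", "X3"])
print(z)  # {'X1': 'Founder', 'X2': 'Founder', 'X3': 'CEO-of'}
print(y)  # {'Founder': True, 'CEO-of': True, 'Capital': False}
```
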

SLIDE 22

MultiR Inference 2 : argmax_z p(z | x, y; θ)

[Figure: aggregate variables fixed (Founder = 1, CEO-of = 1, Capital = 0); mention variables Z1, Z2, Z3 unknown.]

  X1 : "Steve Jobs founded Apple"
  X2 : "Apple was founded by Steve Jobs"
  X3 : "Steve Jobs is the CEO of Apple"

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

SLIDE 23

MultiR Inference 2 : argmax_z p(z | x, y; θ)

[Figure: same setup, with the extract potentials drawn as edge weights between relations and sentences; edges for relations with y = 0 (Capital) are ignored.]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

SLIDE 24

MultiR Inference 2 : argmax_z p(z | x, y; θ)

  • A variant of the weighted edge-cover problem:
    • Potentials as edge weights (ignore edges for relations with y = 0).
    • Each active relation y gets at least one edge.
    • Each mention z gets exactly one edge.

[Figure: bipartite graph between active relations (Founder, CEO-of) and sentences X1–X3, with the table weights on the edges.]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

SLIDE 25

MultiR Inference 2 : argmax_z p(z | x, y; θ)

[Figure: same as the previous slide (animation step); weighted edge-cover formulation with the same constraints.]

SLIDE 26

MultiR Inference 2 : argmax_z p(z | x, y; θ)

  • Variant of the weighted edge-cover problem (potentials as edge weights; ignore edges with y = 0; each active y gets at least one edge; each z gets exactly one edge).

[Figure: solution — Z1 = Founder, Z2 = Founder, Z3 = CEO-of.]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

  Exact solution: O(V(E + V log V)).

SLIDE 27

MultiR Inference 2 : argmax_z p(z | x, y; θ)

  • Variant of the weighted edge-cover problem (potentials as edge weights; ignore edges with y = 0; each active y gets at least one edge; each z gets exactly one edge).

[Figure: approximate edge-cover solution on the same graph — Z1 = Founder, Z2 = Founder, Z3 = CEO-of.]

  Φ_extract   X1     X2     X3
  Founder     10.5   12.5   4.5
  CEO-of      8.9    8.7    8.5
  Capital     6.3    4.5    0.5

  Approximate solution: O(|R| · |S|).
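The approximate solution can be sketched as a two-pass greedy assignment (an illustrative reading of the edge-cover variant, not the paper's exact procedure): give every sentence its best active relation, then ensure every active relation covers at least one sentence.

```python
# Sketch of approximate weighted edge cover for inference 2.
# scores[r][s] are extract potentials; `active` is the set of relations with y = 1.
scores = {
    "Founder": {"X1": 10.5, "X2": 12.5, "X3": 4.5},
    "CEO-of":  {"X1": 8.9,  "X2": 8.7,  "X3": 8.5},
    "Capital": {"X1": 6.3,  "X2": 4.5,  "X3": 0.5},
}

def inference2(scores, sentences, active):
    # Pass 1: each sentence takes its best active relation.
    z = {s: max(active, key=lambda r: scores[r][s]) for s in sentences}
    # Pass 2: every active relation must cover at least one sentence;
    # an uncovered relation takes the sentence where switching loses least.
    for r in active:
        if r not in z.values():
            s = max(sentences, key=lambda s: scores[r][s] - scores[z[s]][s])
            z[s] = r
    return z

print(inference2(scores, ["X1", "X2", "X3"], {"Founder", "CEO-of"}))
```

On the slide's numbers, pass 1 already covers both active relations, so the result matches the figure; pass 2 only kicks in when some y = 1 relation is left without a mention.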

SLIDE 28

Experiments

  • Data
    • NY Times sentences, NER-tagged.
    • Freebase used as the KB.
  • Evaluation Metrics
    • Challenging:
      • Only 3% of sentences match facts in the KB.
      • The number of matches is highly unbalanced across relations.
    • Aggregate Extraction
      • Match extracted relations against Freebase relations.
      • Underestimates accuracy, because many true relations are not in Freebase.
    • Sentential Extraction
      • Sample sentences from the union of two sets:
        • sentences from which some relation is extracted;
        • sentences whose arguments match entities in Freebase.
      • Manually label them correct or incorrect.
      • Overestimates recall.
SLIDE 29

Experiments

  • Systems compared
    • Original implementation of Riedel et al. [2010]
    • SoloR : reimplementation of Riedel et al. [2010]
    • MultiR
  • Metrics
    • Aggregate and sentential extraction results (PR curves)
    • Relation-specific results
    • Running time
SLIDE 30

Experiments

  • Results
    • Aggregate extraction
      • MultiR: high precision over the whole recall range.
      • MultiR: recall improves from 20% to 25%.
      • Low precision in the 0–1% recall range.
      • To investigate, extracted the top 10 relations marked wrong: they were correct but not present in Freebase.
SLIDE 31

Experiments

  • Results
    • Sentential extraction
      • Riedel et al. did not report sentential results.
      • MultiR: high precision and recall.
      • MultiR: F1 score of 60.5%.
SLIDE 32

Experiments

  • Results
    • Relation-specific results
      • Take the 10 most frequent relations.
      • S_r^M : sentences from which MultiR extracted relation r.
      • S_r^F : sentences whose arguments match Freebase for relation r.
      • Sample 100 sentences from both sets.
      • Compute accuracy, precision, and recall.
SLIDE 33

Experiments

Effect of modeling overlapping relations
SLIDE 34

Discussion

  • Relies only on Freebase for experimental evaluation. [Nupur et al]
  • Assumes that if a fact is present in text, it must be present in the KB. [Dinesh Raghu]
  • Allows only one relation per sentence. [Barun]
  • Assumes entities occur as NPs only. [Gagan]
  • Should use sampling instead of argmax, as done in Riedel et al. [Happy, Barun]
  • Evaluation problem: only 3% of sentences match in Freebase. [Gagan]
  • For sentential extraction evaluation, only 1000 sentences were sampled.
  • Separate graph for every entity pair: scaling issue. [Prachi]
SLIDE 35

Possible Extensions

  • Evaluate on other datasets as well, such as the Google Knowledge Graph. [Anshul, Rishabh]
  • Bootstrapping, as in NELL. [Gagan et al]
  • Iteratively correct the facts during learning in the 0–1% recall range. [Surag]
  • Extract entity mentions spanning multiple sentences. [Anshul]
  • Relation to MLNs: apply lifting. [Ankit]