SLIDE 1
Probabilistic First Order Models for Coreference

Aron Culotta

Information Extraction & Synthesis Lab University of Massachusetts

joint work with advisor Andrew McCallum

Motivation

  • Beyond local representation of language
    – Information Extraction: reason about extracted records, not just fields
    – Identity Uncertainty (coreference resolution): reason about entities, not just mentions
    – Parsing: global semantic/discourse constraints
    – Joint Extraction and Data Mining

SLIDE 2

Toward High-Order Representations

Identity Uncertainty

..Howard Dean.. ..H Dean.. ..Dean Martin.. ..Dino.. ..Howard Martin.. ..Howard..


SLIDE 3

Toward High-Order Representations

Identity Uncertainty

Dean Martin Howard Dean Howard Martin

SamePerson(Howard Dean, Howard Martin)?
SamePerson(Dean Martin, Howard Martin)?
SamePerson(Dean Martin, Howard Dean)?

Pairwise Features

StringMatch(x1,x2) EditDistance(x1,x2)

Dean Martin Howard Dean Howard Martin

SamePerson(Howard Dean, Howard Martin, Dean Martin)?

First-Order Features

∀x1,x2 StringMatch(x1,x2)
∃x1,x2 ¬StringMatch(x1,x2)
∃x1,x2 EditDistance>.5(x1,x2)
ThreeDistinctStrings(x1,x2,x3)


SLIDE 4

Toward High-Order Representations

Identity Uncertainty

Dean Martin Howard Dean Howard Martin Dino Howie Martin

SamePerson(x1,x2)
SamePerson(x1,x2,x3)
SamePerson(x1,x2,x3,x4)
SamePerson(x1,x2,x3,x4,x5)
SamePerson(x1,x2,x3,x4,x5,x6)
…

Combinatorial Explosion!

This space complexity is common in first-order probabilistic models

SLIDE 5

Markov Logic as a Template to Construct a Markov Network using First-Order Logic

[Richardson & Domingos 2005]

Grounding the Markov network requires space O(n^r), where n = number of constants and r = highest clause arity

How can we perform inference and learning in models that cannot be grounded?
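The O(n^r) blow-up is easy to make concrete with a back-of-the-envelope count of ground atoms; the function name below is illustrative, not from the talk:

```python
def num_ground_atoms(n_constants: int, arity: int) -> int:
    """Number of ground atoms for one predicate of the given arity
    over n constants: every r-tuple of constants yields one ground
    atom, so the count is n**r."""
    return n_constants ** arity

# A modest coreference problem: 1,000 mentions, a clause of arity 3
# already yields a billion ground atoms.
print(num_ground_atoms(1000, 3))
```

Even arity-2 clauses give a million ground atoms at 1,000 mentions, which is why inference that avoids full grounding is attractive.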

SLIDE 6

Inference in First-Order Models

SAT Solvers

  • Weighted SAT solvers [Kautz et al 1997]
    – Requires complete grounding of the network
  • LazySAT [Singla & Domingos 2006]
    – Saves memory by only storing clauses that may become unsatisfied

MCMC

  • Gibbs Sampling
    – Difficult to move between high-probability configurations by changing single variables
    – Although, consider MC-SAT [Poon & Domingos '06]
  • An alternative: Metropolis-Hastings sampling
    – Can be extended to partial configurations: only instantiate relevant variables
    – Successfully used in BLOG models [Milch et al 2005]

SLIDE 7

Learning in First-Order Models

  • Sampling
  • Pseudo-likelihood
  • Voted Perceptron
  • We propose:
    – Conditional model to rank configurations
    – Intuitive objective function for Metropolis-Hastings

Contributions

  • Metropolis-Hastings sampling in an undirected model with first-order features
  • Discriminative training for Metropolis-Hastings
SLIDE 8

An Undirected Model of Identity Uncertainty

Dean Martin Howard Dean Howard Martin Dino Howie Martin

SamePerson(x1,x2)
SamePerson(x1,x2,x3)
SamePerson(x1,x2,x3,x4)
SamePerson(x1,x2,x3,x4,x5)
SamePerson(x1,x2,x3,x4,x5,x6)
…

Combinatorial Explosion!

SLIDE 9

Model

Howard Dean Governor Howie Dean Martin Dino Howard Martin Howie Martin

fw: SamePerson(x)
fb: DifferentPerson(x, x')
"First-order features"

SLIDE 10

Model

Z_X: sum over all possible configurations!

Inference with Metropolis-Hastings

  • y : configuration
  • p(y')/p(y) : likelihood ratio
    – ratio of P(Y|X); Z_X cancels
  • q(y'|y) : proposal distribution
    – probability of proposing the move y ⇒ y'
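The acceptance rule behind these quantities fits in a few lines; working in log space makes the cancellation of Z_X explicit. A minimal sketch, with illustrative names (mh_step, log_score, propose) that are not from the talk:

```python
import math
import random

def mh_step(y, log_score, propose, rng=random):
    """One Metropolis-Hastings move over configurations.

    log_score(y) is an unnormalized log-probability, so the ratio
    p(y')/p(y) never needs the partition function Z_X.
    propose(y) returns (y', log q(y'|y), log q(y|y')).
    """
    y_new, log_q_fwd, log_q_rev = propose(y)
    log_accept = (log_score(y_new) - log_score(y)) + (log_q_rev - log_q_fwd)
    if math.log(rng.random()) < min(0.0, log_accept):
        return y_new   # accept the proposed configuration
    return y           # reject: keep the current configuration
```

With a symmetric proposal the q terms cancel, so an uphill move is always accepted and a downhill move is accepted with probability p(y')/p(y).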

SLIDE 11

Proposal Distribution

(figure: mention clusters in the current configuration y and the proposed configuration y')

SLIDE 12

Proposal Distribution

Dean Martin Howie Martin Howard Martin Dino Dean Martin Howie Martin Howard Martin Howie Martin y y’

Learning the Likelihood Ratio

Given a pair of configurations, learn to rank the “better” configuration higher.

SLIDE 13

Learning the Likelihood Ratio

S*(Y): the true evaluation score of configuration Y (e.g., F1)

Sampling Training Examples

  • Run the sampler on training data
  • Generate a training example for each proposed move
  • Iteratively retrain during sampling
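The steps above can be sketched as a function that turns one proposed move into a ranking example. The feature-difference encoding and the names (phi, true_score) are my own illustration; true_score stands in for S*(Y), e.g. F1:

```python
def ranking_example(y, y_new, phi, true_score):
    """Build a training example from one proposed move y -> y'.

    phi(y) maps a configuration to a feature dict; true_score is
    the gold evaluation S*(Y). The example's features are the
    difference phi(y') - phi(y), and the label says which
    configuration the learned model should rank higher.
    """
    f_new, f_old = phi(y_new), phi(y)
    diff = {k: f_new.get(k, 0.0) - f_old.get(k, 0.0)
            for k in set(f_new) | set(f_old)}
    label = 1 if true_score(y_new) > true_score(y) else -1
    return diff, label
```

A linear model trained on such pairs directly scores the likelihood ratio used by the sampler, which is the "intuitive objective" for Metropolis-Hastings proposed above.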
SLIDE 14

Tying Parameters with Proposal Distribution

  • Proposal distribution q(y'|y): a "cheap" approximation to p(y)
  • Reuse a subset of the parameters in p(y)
  • E.g., in the identity uncertainty model:
    – Sample two clusters
    – Stochastic agglomerative clustering to propose a new configuration
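A toy version of the cluster-sampling step: pick two clusters uniformly at random and merge them. The talk's stochastic agglomerative proposal is richer; this sketch only shows the interface, and all names are illustrative:

```python
import random

def propose_merge(clusters, rng=random):
    """Toy proposal for identity uncertainty: sample two clusters
    uniformly and merge them into one. A real proposal would also
    return the forward and reverse q probabilities needed by the
    Metropolis-Hastings acceptance ratio."""
    if len(clusters) < 2:
        return [list(c) for c in clusters]
    i, j = rng.sample(range(len(clusters)), 2)
    merged = clusters[i] + clusters[j]
    rest = [list(c) for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]

# e.g. propose_merge([["Dean Martin", "Dino"], ["Howard Martin"], ["Howie Martin"]])
```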

Experiments

SLIDE 15

Simplified Model

  • Use only within-cluster factors.
  • Inference with agglomerative clustering

(figure: example clusters over the mentions Dean Martin, Dino, Howard Martin, Howie Martin)

Experiments

  • Paper citation coreference
  • Author coreference
  • First-order features
    – All Titles Match, Exists Year Mismatch, Average String Edit Distance > X, …
    – Number of mentions
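Features like these are computed over a whole cluster at once, not one pair at a time. A minimal sketch, assuming made-up mention fields (title, year) and using difflib similarity as a stand-in for the talk's string edit distance:

```python
from difflib import SequenceMatcher
from itertools import combinations

def cluster_features(mentions, threshold=0.5):
    """First-order features over a cluster of citation mentions,
    each a dict with 'title' and 'year' keys. Note these quantify
    over the whole cluster, unlike pairwise features."""
    titles = [m["title"] for m in mentions]
    years = [m["year"] for m in mentions]
    # Average pairwise title similarity (difflib ratio, in [0, 1]);
    # a stand-in for the slide's "Average String Edit Distance > X".
    sims = [SequenceMatcher(None, a, b).ratio()
            for a, b in combinations(titles, 2)]
    return {
        "AllTitlesMatch": all(t == titles[0] for t in titles),
        "ExistsYearMismatch": len(set(years)) > 1,
        "AvgTitleSimBelowThreshold": bool(sims)
            and sum(sims) / len(sims) < threshold,
        "NumMentions": len(mentions),
    }
```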

SLIDE 16

Results on Citation Data

84.9 81.0 reason 83.2 88.9 face 78.7 93.4 reinforce 76.7 82.3 constraint Pairwise First-Order 25.4 65.4 smith_b 36.2 43.2 li_w 61.7 41.9 miller_d Pairwise First-Order

Citeseer paper coreference results (pair F1) Author coreference results (pair F1)

Conclusions

  • Enable tractable training of first-order features in relational models
  • Higher-order representations can help identity uncertainty

SLIDE 17

Related Work

  • MLNs [Richardson & Domingos 2006]
  • BLOG [Milch et al 2005]
  • Lifted Inference [Poole '03] [Braz et al '05]
    – Inference over populations to avoid grounding the network
    – Difficult to answer queries about one specific input
  • SEARN [Daume et al 2005]
    – Learns a distribution over possible moves in search-based inference
    – Assumes all local moves can be enumerated
  • Reinforcement learning for combinatorial search [Zhang and Dietterich '95] [Boyan '98]