Soft Inference and Posterior Marginals


slide-1
SLIDE 1

Soft Inference and Posterior Marginals

September 19, 2013

slide-2
SLIDE 2

Soft vs. Hard Inference

  • Hard inference

– “Give me a single solution”
– Viterbi algorithm
– Maximum spanning tree (Chu-Liu-Edmonds algorithm)

  • Soft inference

– Task 1: Compute a distribution over outputs
– Task 2: Compute functions of that distribution

  • marginal probabilities, expected values, entropies, divergences

slide-3
SLIDE 3

Why Soft Inference?

  • Useful applications of posterior distributions

– Entropy: how confused is the model?
– Entropy: how confused is the model about its prediction at time i?
– Expectations

  • What is the expected number of words in a translation of this sentence?
  • What is the expected number of times a word ending in -ed was tagged as something other than a verb?

– Posterior marginals: given some input, how likely is it that some (latent) event of interest happened?

slide-4
SLIDE 4

String Marginals

  • Inference question for HMMs

– What is the probability of a string w?
 Answer: generate all possible tag sequences and explicitly marginalize (there are |Λ|^n of them, so this takes exponential time)

slide-5
SLIDE 5

Initial probabilities p(t_1):

  DET 0.5   ADJ 0.1   NN 0.3   V 0.1

Transition probabilities p(next tag | previous tag); columns give the previous tag, and the STOP row is the end-of-sentence transition (each column sums to 1):

         DET   ADJ   NN    V
  DET    0.0   0.0   0.0   0.5
  ADJ    0.3   0.2   0.1   0.1
  NN     0.7   0.7   0.3   0.2
  V      0.0   0.1   0.4   0.1
  STOP   0.0   0.0   0.2   0.1

Emission probabilities p(word | tag):

  DET: the 0.7, a 0.3
  ADJ: green 0.1, big 0.4, old 0.4, might 0.1
  NN:  book 0.3, plants 0.2, people 0.2, person 0.1, John 0.1, watch 0.1
  V:   might 0.2, watch 0.3, watches 0.2, loves 0.1, reads 0.19, books 0.01

Examples:

  John might watch → NN V V
  the old person loves big books → DET ADJ NN V ADJ NN

slide-6
SLIDE 6

All 4³ = 64 possible taggings of w = (John, might, watch), scored with the joint probability p(w, t). Only four taggings are nonzero; each of the remaining 60 contains a zero initial, transition, or emission probability and scores 0.0:

  NN ADJ NN   0.0000042
  NN ADJ V    0.0000009
  NN V NN     0.0000096
  NN V V      0.0000072

slide-7
SLIDE 7

Explicitly marginalizing, p(w) is the sum of all 64 entries:

  p(John, might, watch) = 0.0000042 + 0.0000009 + 0.0000096 + 0.0000072 = 0.0000219
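This enumeration is small enough to check in code. Below is a minimal Python sketch (the dict layout and the STOP key are one convenient encoding of the tables on Slide 5, not notation from the slides) that enumerates all 64 taggings and sums their joint probabilities:

```python
import itertools

TAGS = ["DET", "ADJ", "NN", "V"]

# Toy model from Slide 5. trans[prev][next]; the STOP entry encodes the
# end-of-sentence row, so every trans[prev] sums to 1.
init = {"DET": 0.5, "ADJ": 0.1, "NN": 0.3, "V": 0.1}
trans = {
    "DET": {"DET": 0.0, "ADJ": 0.3, "NN": 0.7, "V": 0.0, "STOP": 0.0},
    "ADJ": {"DET": 0.0, "ADJ": 0.2, "NN": 0.7, "V": 0.1, "STOP": 0.0},
    "NN":  {"DET": 0.0, "ADJ": 0.1, "NN": 0.3, "V": 0.4, "STOP": 0.2},
    "V":   {"DET": 0.5, "ADJ": 0.1, "NN": 0.2, "V": 0.1, "STOP": 0.1},
}
emit = {
    "DET": {"the": 0.7, "a": 0.3},
    "ADJ": {"green": 0.1, "big": 0.4, "old": 0.4, "might": 0.1},
    "NN":  {"book": 0.3, "plants": 0.2, "people": 0.2,
            "person": 0.1, "John": 0.1, "watch": 0.1},
    "V":   {"might": 0.2, "watch": 0.3, "watches": 0.2,
            "loves": 0.1, "reads": 0.19, "books": 0.01},
}

def joint(words, tags):
    """p(words, tags) = initial * emissions * transitions * stop."""
    p = init[tags[0]] * emit[tags[0]].get(words[0], 0.0)
    for i in range(1, len(words)):
        p *= trans[tags[i - 1]][tags[i]] * emit[tags[i]].get(words[i], 0.0)
    return p * trans[tags[-1]]["STOP"]

w = ("John", "might", "watch")
total = 0.0
for tags in itertools.product(TAGS, repeat=len(w)):
    p = joint(w, tags)
    if p > 0.0:
        print(tags, p)  # prints exactly the four nonzero taggings above
    total += p
print("p(w) =", total)  # ~2.19e-05
```

This exhaustive sum is exactly what the forward algorithm will later compute without enumerating the exponentially many taggings.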

slide-8
SLIDE 8

Weighted Logic Programming

  • Slightly different notation than the textbook, but you will see it in the literature

  • WLP is useful here because it lets us build hypergraphs

slide-9
SLIDE 9

Weighted Logic Programming

  • Slightly different notation than the textbook, but you will see it in the literature

  • WLP is useful here because it lets us build hypergraphs

slide-10
SLIDE 10

Hypergraphs

slide-11
SLIDE 11

Hypergraphs

slide-12
SLIDE 12

Hypergraphs

slide-13
SLIDE 13

Viterbi Algorithm

Item form

slide-14
SLIDE 14

Viterbi Algorithm

Item form
Axioms

slide-15
SLIDE 15

Viterbi Algorithm

Item form
Axioms
Goals

slide-16
SLIDE 16

Viterbi Algorithm

Item form
Axioms
Goals
Inference rules

slide-17
SLIDE 17

Viterbi Algorithm

Item form
Axioms
Goals
Inference rules
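Written out concretely, one conventional WLP presentation of Viterbi for HMM tagging looks like the following (a reconstruction in standard notation, not a transcription of the slides' own formulas), where alternative derivations of the same item combine by max:

  Item form:      [q, i]  (weight: best score of any tag prefix ending in state q at time i)
  Axioms:         [q, 1] = p(q) * p(w_1 | q)   for every state q
  Inference rule: [q, i] with weight u proves [r, i+1] with weight u * p(r | q) * p(w_{i+1} | r)
  Goal:           [goal], fed by [q, n] * p(STOP | q) for every state q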

slide-18
SLIDE 18

Viterbi Algorithm

w = (John, might, watch)

Goal:
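A minimal Python sketch of the tabular algorithm, reusing the TAGS/init/trans/emit dicts defined in the enumeration sketch earlier:

```python
def viterbi(words, tags, init, trans, emit):
    """Best tag sequence and its joint probability under the toy HMM."""
    # chart[i][q] = (best score of a path ending in state q at time i,
    #                backpointer to the best predecessor state)
    chart = [{q: (init[q] * emit[q].get(words[0], 0.0), None) for q in tags}]
    for i in range(1, len(words)):
        row = {}
        for r in tags:
            q = max(tags, key=lambda q: chart[i - 1][q][0] * trans[q][r])
            row[r] = (chart[i - 1][q][0] * trans[q][r]
                      * emit[r].get(words[i], 0.0), q)
        chart.append(row)
    # fold in the stop transition, then follow backpointers
    last = max(tags, key=lambda q: chart[-1][q][0] * trans[q]["STOP"])
    score = chart[-1][last][0] * trans[last]["STOP"]
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(chart[i][path[-1]][1])
    return list(reversed(path)), score

print(viterbi(("John", "might", "watch"), TAGS, init, trans, emit))
# (['NN', 'V', 'NN'], ~9.6e-06): the single highest-probability tagging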

slide-19
SLIDE 19

String Marginals

  • Inference question for HMMs

– What is the probability of a string w?
 Answer 1: generate all possible tag sequences and explicitly marginalize (exponential time)

 Answer 2: use the forward algorithm (O(n·|Λ|²) time, O(n·|Λ|) space)

slide-20
SLIDE 20

Forward Algorithm

  • Instead of computing a max of inputs at each node, use addition

  • Same run-time, same space requirements

  • Viterbi cell interpretation

– What is the score of the best path through the lattice ending in state q at time i?

  • What does a forward node weight correspond to?

slide-21
SLIDE 21

Forward Algorithm Recurrence
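Concretely, writing α(q, i) for the total probability of emitting w_1 … w_i and being in state q at time i (the convention that matches the chart values on Slide 26), the standard recurrence is:

  α(q, 1) = p(q) * p(w_1 | q)
  α(q, i) = p(w_i | q) * Σ_q' α(q', i-1) * p(q | q')
  p(w)    = Σ_q α(q, n) * p(STOP | q)

As code, again over the toy model dicts from the enumeration sketch:

```python
def forward(words, tags, init, trans, emit):
    """Forward chart: alpha[i][q] = total probability of w[:i+1] ending in state q."""
    alpha = [{q: init[q] * emit[q].get(words[0], 0.0) for q in tags}]
    for i in range(1, len(words)):
        alpha.append({r: emit[r].get(words[i], 0.0)
                         * sum(alpha[i - 1][q] * trans[q][r] for q in tags)
                      for r in tags})
    z = sum(alpha[-1][q] * trans[q]["STOP"] for q in tags)  # string marginal p(w)
    return alpha, z

alpha, z = forward(("John", "might", "watch"), TAGS, init, trans, emit)
print(z)  # ~2.19e-05, the same value as the brute-force enumeration
```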

slide-22
SLIDE 22

Forward Chart

[Chart diagram: forward value for input symbol a at i=1]

slide-23
SLIDE 23

Forward Chart

[Chart diagram: forward value for input symbol a at i=1]

slide-24
SLIDE 24

Forward Chart

[Chart diagram: forward value for input symbol a at i=1]

slide-25
SLIDE 25

Forward Chart

[Chart diagram: forward values for input symbols a, b at i=1, i=2]

slide-26
SLIDE 26

Forward chart for w = (John, might, watch):

        John    might    watch
  DET   0.0     0.0      0.0
  ADJ   0.0     0.0003   0.0
  NN    0.03    0.0      0.000069
  V     0.0     0.0024   0.000081

  p(w) = 0.000069 * 0.2 + 0.000081 * 0.1 = 0.0000219

slide-27
SLIDE 27

Posterior Marginals

  • Marginal inference question for HMMs

– Given x, what is the probability of being in state q at time i?

– Given x, what is the probability of transitioning from state q to r at time i?

slide-28
SLIDE 28

Posterior Marginals

  • Marginal inference question for HMMs

– Given x, what is the probability of being in state q at time i?

– Given x, what is the probability of transitioning from state q to r at time i?

slide-29
SLIDE 29

Posterior Marginals

  • Marginal inference question for HMMs

– Given x, what is the probability of being in state q at time i?

– Given x, what is the probability of transitioning from state q to r at time i?

slide-30
SLIDE 30

Backward Algorithm

  • Start at the goal node(s) and work backwards through the hypergraph

  • What is the probability in the goal node cell?

  • What if there is more than one cell?

  • What is the value of the axiom cell?
slide-31
SLIDE 31

Backward Recurrence
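With β(q, i) defined as the probability of generating everything after time i (including the stop transition) given state q at time i, the standard recurrence runs right to left:

  β(q, n) = p(STOP | q)
  β(q, i) = Σ_r p(r | q) * p(w_{i+1} | r) * β(r, i+1)

A sketch over the toy model, reusing alpha and z from the forward sketch for the sanity check:

```python
def backward(words, tags, trans, emit):
    """Backward chart: beta[i][q] = p(w[i+1:], STOP | state q at position i)."""
    n = len(words)
    beta = [None] * n
    beta[n - 1] = {q: trans[q]["STOP"] for q in tags}
    for i in range(n - 2, -1, -1):
        beta[i] = {q: sum(trans[q][r] * emit[r].get(words[i + 1], 0.0)
                          * beta[i + 1][r] for r in tags)
                   for q in tags}
    return beta

beta = backward(("John", "might", "watch"), TAGS, trans, emit)
# combining the two charts at any single time step recovers p(w):
print(sum(alpha[0][q] * beta[0][q] for q in TAGS))  # ~2.19e-05 again
```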

slide-32
SLIDE 32

Backward Chart

slide-33
SLIDE 33

Backward Chart

[Chart diagram: backward value at column i=5]

slide-34
SLIDE 34

Backward Chart

[Chart diagram: backward value at column i=5]

slide-35
SLIDE 35

Backward Chart

[Chart diagram: backward value at column i=5]

slide-36
SLIDE 36

Backward Chart

[Chart diagram: backward values at columns i=4 and i=5 (input symbol b)]

slide-37
SLIDE 37

Backward Chart

[Chart diagram: backward values at columns i=4 and i=5 (input symbol b)]

slide-38
SLIDE 38

Backward Chart

[Chart diagram: backward values at columns i=3, i=4, i=5 (input symbols b, c)]

slide-39
SLIDE 39

Forward-Backward

  • Compute forward chart
  • Compute backward chart

What is α(q, i) · β(q, i)?

slide-40
SLIDE 40

Forward-Backward

  • Compute forward chart
  • Compute backward chart

What is α(q, i) · β(q, i)?
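The product α(q, i) · β(q, i) is the probability of generating the whole string while passing through state q at time i, so dividing by Z = p(w) gives the posterior state marginal. A sketch, reusing the alpha/beta/z values computed above:

```python
def state_marginals(alpha, beta, z, tags):
    """Posterior p(state q at position i | w) = alpha * beta / Z."""
    return [{q: alpha[i][q] * beta[i][q] / z for q in tags}
            for i in range(len(alpha))]

marg = state_marginals(alpha, beta, z, TAGS)
print(marg[2])  # at "watch": roughly NN 0.63, V 0.37, DET/ADJ 0.0
```

Note that even though the single best tagging labels "watch" as NN, over a third of the posterior mass at that position still goes to V.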

slide-41
SLIDE 41

Edge Marginals

  • What is the probability that x was generated and q -> r happened at time t?

slide-42
SLIDE 42

Edge Marginals

  • What is the probability that x was generated and q -> r happened at time t?
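Under the chart conventions used above (α includes the emission at its own position, β does not), the unnormalized weight of this edge is α(q, t) * p(r | q) * p(w_{t+1} | r) * β(r, t+1); dividing by Z = p(w) gives the posterior edge marginal. A sketch:

```python
def edge_marginal(i, q, r, words, alpha, beta, z, trans, emit):
    """Posterior probability of the transition q -> r between positions i and i+1."""
    return (alpha[i][q] * trans[q][r]
            * emit[r].get(words[i + 1], 0.0) * beta[i + 1][r]) / z

print(edge_marginal(0, "NN", "V", ("John", "might", "watch"),
                    alpha, beta, z, trans, emit))  # ~0.77
```

On the toy example the NN -> V transition after "John" carries about 77% of the posterior mass, matching the two taggings that use it: (0.0000096 + 0.0000072) / 0.0000219.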

slide-43
SLIDE 43

Forward-Backward

[Chart diagram: forward and backward values over input w = (a, b, b, b, c), columns i=1 through i=5]

slide-44
SLIDE 44

Generic Inference

  • Semirings are useful structures in abstract algebra

– Set of values
– Addition, with additive identity 0: (a + 0 = a)
– Multiplication, with multiplicative identity 1: (a * 1 = a)
  • Also: a * 0 = 0
– Distributivity: a * (b + c) = a * b + a * c
– Not required: commutativity, inverses

slide-45
SLIDE 45

So What?

  • You can unify Forward and Viterbi by changing the semiring
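A sketch of that unification (the Semiring container and the instance names are illustrative, not from the slides): the trellis pass below is written once, and only the "addition" operation differs between the two algorithms.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Semiring:
    zero: float      # additive identity
    one: float       # multiplicative identity
    plus: Callable   # how alternative derivations combine
    times: Callable  # how a derivation extends

PROB = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b)  # forward
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)              # best path
# A counting semiring (0, 1, +, *) also fits, but the model's weights
# must first be mapped to 0/1 indicators to count paths.

def generic_forward(words, tags, init, trans, emit, sr):
    """One trellis pass; the semiring decides what the chart values mean."""
    prev = {q: sr.times(init[q], emit[q].get(words[0], sr.zero)) for q in tags}
    for i in range(1, len(words)):
        cur = {}
        for r in tags:
            total = sr.zero
            for q in tags:
                total = sr.plus(total, sr.times(prev[q], trans[q][r]))
            cur[r] = sr.times(total, emit[r].get(words[i], sr.zero))
        prev = cur
    goal = sr.zero
    for q in tags:
        goal = sr.plus(goal, sr.times(prev[q], trans[q]["STOP"]))
    return goal

w = ("John", "might", "watch")
print(generic_forward(w, TAGS, init, trans, emit, PROB))     # ~2.19e-05
print(generic_forward(w, TAGS, init, trans, emit, VITERBI))  # ~9.6e-06
```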

slide-46
SLIDE 46

Semiring Inside

  • Probability semiring

– marginal probability of output

  • Counting semiring

– number of paths (“taggings”)

  • Viterbi semiring

– best scoring derivation

  • Log semiring, with edge weights w[e] = w^T f(e)

– log(Z) = log partition function

slide-47
SLIDE 47

Semiring Edge-Marginals

  • Probability semiring

– posterior marginal probability of each edge

  • Counting semiring

– number of paths going through each edge

  • Viterbi semiring

– score of best path going through each edge

  • Log semiring

– log(sum over all paths containing e of exp(path weight))
 = log(posterior marginal probability) + log(Z)

slide-48
SLIDE 48

Max-Marginal Pruning

slide-49
SLIDE 49

Weighted Logic Programming

  • Slightly different notation than the textbook, but you will see it in the literature

  • WLP is useful here because it lets us build hypergraphs

slide-50
SLIDE 50

Hypergraphs

slide-51
SLIDE 51

Hypergraphs

slide-52
SLIDE 52

Hypergraphs

slide-53
SLIDE 53

Generalizing Forward-Backward

  • Forward/Backward algorithms are a special case of Inside/Outside algorithms

  • It’s helpful to think of I/O as algorithms on PCFG parse forests, but it’s more general

– Recall the 5 views of decoding: decoding is parsing
– More specifically, decoding is a weighted proof forest

slide-54
SLIDE 54

CKY Algorithm

Item form

slide-55
SLIDE 55

CKY Algorithm

Item form
Goals

slide-56
SLIDE 56

CKY Algorithm

Item form
Axioms
Goals

slide-57
SLIDE 57

CKY Algorithm

Item form
Axioms
Goals
Inference rules

slide-58
SLIDE 58

Posterior Marginals

  • Marginal inference question for PCFGs

– Given w, what is the probability of having a constituent of type Z from i to j?
– Given w, what is the probability of having a constituent of any type from i to j?
– Given w, what is the probability of using rule Z -> XY to derive the span from i to j?

slide-59
SLIDE 59

Inside Algorithm

slide-60
SLIDE 60

Inside Algorithm

slide-61
SLIDE 61

Inside Algorithm

slide-62
SLIDE 62

CKY Inside Algorithm

Base case(s)
Recurrence
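For a PCFG in Chomsky normal form, with inside(X, i, j) denoting the total probability of deriving the span w_i … w_{j-1} from nonterminal X (0-based spans, as in the code below), the standard version reads:

  Base case(s): inside(X, i, i+1) = p(X -> w_i)
  Recurrence:   inside(X, i, j) = Σ over rules X -> Y Z and splits i < k < j of
                                  p(X -> Y Z) * inside(Y, i, k) * inside(Z, k, j)

A Python sketch; the rule-dictionary layout and the tiny grammar at the bottom are made up for illustration:

```python
from collections import defaultdict

def cky_inside(words, lex_rules, bin_rules, start="S"):
    """Inside chart for a CNF PCFG.
    lex_rules: {(X, word): prob}; bin_rules: {(X, Y, Z): prob}."""
    n = len(words)
    inside = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):              # base case: preterminals
        for (x, term), p in lex_rules.items():
            if term == word:
                inside[(i, i + 1)][x] += p
    for width in range(2, n + 1):                 # recurrence: widen spans
        for i in range(n - width + 1):
            j = i + width
            for (x, y, z), p in bin_rules.items():
                for k in range(i + 1, j):         # split point
                    inside[(i, j)][x] += p * inside[(i, k)][y] * inside[(k, j)][z]
    return inside[(0, n)][start]                  # string marginal p(w)

lex = {("NP", "John"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
bins = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(cky_inside("John eats fish".split(), lex, bins))  # 0.25
```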

slide-63
SLIDE 63

Generic Inside

slide-64
SLIDE 64

Questions for Generic Inside

  • Probability semiring

– Marginal probability of input

  • Counting semiring

– Number of paths (parses, labels, etc.)

  • Viterbi semiring

– Viterbi probability (max joint probability)

  • Log semiring

– log Z(input)

slide-65
SLIDE 65

Outside probabilities: decomposing the problem

[Parse diagram: node N^f spans w_p … w_e inside the sentence w_1 … w_m; its children are N^j (spanning w_p … w_q) and N^g (spanning w_{q+1} … w_e)]

The shaded area represents the outside probability α_j(p, q), which we need to calculate. How can this be decomposed?

slide-66
SLIDE 66

Outside probabilities: decomposing the problem

[Parse diagram as above, with the outside of N^f shaded yellow]

Step 1: We assume that N^f_{p,e} is the parent of N^j_{p,q}. Its outside probability, α_f(p, e) (represented by the yellow shading), is available recursively. How do we calculate the cross-hatched probability?

slide-67
SLIDE 67

Outside probabilities: decomposing the problem

[Parse diagram as above, with the inside of N^g shaded red]

Step 2: The red shaded area is the inside probability of N^g_{q+1,e}, which is available as β_g(q+1, e).

slide-68
SLIDE 68

Outside probabilities: decomposing the problem

[Parse diagram as above, with the production N^f → N^j N^g shaded blue]

Step 3: The blue shaded part corresponds to the production N^f → N^j N^g which, because of the context-freeness of the grammar, does not depend on the positions of the words. Its probability is simply P(N^f → N^j N^g | N^f, G) and is available directly from the PCFG without calculation.
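Putting the three steps together, and adding the mirror-image case in which N^j is the right child of its parent (the figure only shows the left-child case), gives the standard outside recurrence:

  α_j(p, q) = Σ_{f,g} Σ_{e > q} α_f(p, e) * P(N^f → N^j N^g) * β_g(q+1, e)
            + Σ_{f,g} Σ_{e < p} α_f(e, q) * P(N^f → N^g N^j) * β_g(e, p-1)

with the base case α_j(1, m) = 1 when N^j is the start symbol and 0 otherwise.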

slide-69
SLIDE 69

Generic Outside

slide-70
SLIDE 70

Generic Inside-Outside

slide-71
SLIDE 71

Inside-Outside

  • Inside probabilities are required to compute Outside probabilities

  • Inside-Outside works where Forward-Backward does, but not vice versa

  • Implementation considerations

– Building a hypergraph explicitly simplifies code, but it can be expensive in terms of memory