Exact Lifted Inference with Distinct Soft Evidence on Every Object - PowerPoint PPT Presentation


SLIDE 1

Exact Lifted Inference with Distinct Soft Evidence on Every Object

Hung Hai Bui, Tuyen N. Huynh, Rodrigo de Salvo Braz

Artificial Intelligence Center SRI International Menlo Park, CA, USA

July 26, 2012

AAAI 2012 1/18

SLIDE 2

Outline

1 Outline
2 Distinct Soft Evidence is Problematic
3 LIDE (Lifted Inference with Distinct Evidence)
4 Experiments

SLIDE 3

Lifted Inference and the Problematic Soft Evidence

  • The main idea of lifted inference is to exploit the symmetry of probabilistic models. This leads to algorithms that can be very efficient on high tree-width, but symmetric, models.
  • Soft evidence at the level of every object destroys the model’s symmetry.
  • Everyone has a different weight, cholesterol level, etc.

[Figure: a symmetric model vs. the same model with its symmetry destroyed by per-object evidence]

  • Aim: lifted inference with distinct soft evidence on every object

SLIDE 4

Distinct Soft Evidence on a Unary Predicate

  • The simplest form of distinct soft evidence: on every grounding of a single unary predicate.
  • Consider an MLN M consisting of
  • An MLN M0 with a unary predicate q.
  • A set of soft evidence of the form wi : q(i) for every object i.

M0:

1.4 : ¬Smokes(x)
2.3 : ¬Cancer(x)
4.6 : ¬Friends(x, y)
1.5 : Smokes(x) ⇒ Cancer(x)
1.1 : Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y)

Evidence:

w1 : Cancer(P1)
w2 : Cancer(P2)
...
w1000 : Cancer(P1000)

(tree-width = 1000)

SLIDE 5

LIDE (Lifted Inference with Distinct Evidence)

  • Most lifted inference methods applied to M would completely shatter the model, thus reverting to ground inference.
  • LIDE’s approach:

1 Perform lifted inference on M0 only
2 Use special operations to absorb the soft evidence

  • Instead of exploiting the symmetry of the model, we exploit the symmetry of the partition function.

SLIDE 6

Symmetric Function

Definition

An n-variable function F(t1, . . . , tn) is symmetric if for every permutation π, permuting the arguments of F by π does not change its value, that is, F(t1, . . . , tn) = F(tπ(1), . . . , tπ(n)).

  • F depends only on the histogram of its arguments.
  • If ti ∈ {0, 1}, the set {ck}, k = 0, . . . , n, where ck = F(t) for any t such that Σi ti = k, is termed the counting representation of the symmetric function F.
  • An exchangeable distribution is a symmetric function, so it admits a counting representation.
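As a concrete illustration (our sketch, not from the paper; parity is an arbitrary choice of symmetric function), a symmetric function of binary arguments is fully determined by its counting representation {ck}:

```python
from itertools import product

# An example symmetric function of binary arguments: the parity of the inputs.
def F(t):
    return sum(t) % 2

n = 4
# Counting representation: c_k = F(t) for any t with exactly k ones.
c = [F([1] * k + [0] * (n - k)) for k in range(n + 1)]

# F depends only on the histogram of its arguments: every assignment t
# with the same number of ones yields the same value F(t) = c_{sum(t)}.
for t in product([0, 1], repeat=n):
    assert F(t) == c[sum(t)]

print(c)  # parity of k ones: [0, 1, 0, 1, 0]
```

So the 2^n values of F are summarized by n + 1 numbers, which is the symmetry LIDE exploits.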

SLIDE 7

Exchangeability of Groundings of a Unary Predicate

Theorem

Let D∗ = {d1, . . . , dn} be the set of individuals that do not appear as constants in the MLN M0 and let q be a unary predicate in M0. Let P0(·) = Pr(q(d1), . . . , q(dn) | M0). Then the random vector (q(d1), . . . , q(dn)) is exchangeable under P0.

  • Proof is in the paper.
  • This seems trivial: d1, . . . , dn do not appear in M0, so they are “indistinguishable”. But beware: “indistinguishable” does not necessarily imply exchangeable; groundings of an n-ary predicate are in general NOT exchangeable when n > 1.

SLIDE 8

LIDE as a Wrapper

1 Step 1: apply any applicable lifted inference technique to M0 to compute the counting representation {ck} of P0(·).

  • One natural method is counting elimination.

2 Step 2: absorb the soft evidence.

  • Equivalent to computing the posterior of a set of exchangeable binary random variables:

P(q1, . . . , qn) = (1/Z) P0(q1, . . . , qn) ∏i=1..n φi(qi),   where qi = q(di)

SLIDE 9

Posterior of Exchangeable Binary RVs

Pr(q1, . . . , qn) = (1/Z) P0(q1, . . . , qn) ∏i=1..n φi(qi)

We discuss three related problems, to compute:

  • The MAP configuration q under the marginal Pr(q) (a.k.a. the marginal-MAP problem)
  • The partition function Z
  • The marginal Pr(qi) for each individual di

SLIDE 10

MAP Inference

Let αi = φi(1)/φi(0) and Φ = ∏i=1..n φi(0). Then

P(q) = (Φ/Z) P0(q1, . . . , qn) ∏i=1..n αi^qi

max_q P(q) = (Φ/Z) max_k ck · max_{q : Σi qi = k} ∏i=1..n αi^qi

  • Observation: the 2nd maximization simply picks the k largest elements of α.
  • By sorting the vector α, the MAP problem can be solved in O(n log n) given {ck} as input.
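The sorting argument can be sketched in a few lines of Python (illustrative code; `lide_map` is our name, and the counting representation {ck} is assumed given from Step 1):

```python
def lide_map(c, alpha):
    """MAP of P(q) ∝ c_{k(q)} · prod_i alpha_i^{q_i}, where k(q) = sum_i q_i.

    For a fixed count k, the inner maximization over {q : sum(q) = k}
    sets q_i = 1 for the k largest alpha_i, so a single sort suffices:
    O(n log n) overall, given {c_k} as input.
    """
    n = len(alpha)
    order = sorted(range(n), key=lambda i: alpha[i], reverse=True)
    best_score, best_k, prod = float("-inf"), 0, 1.0
    for k in range(n + 1):
        score = c[k] * prod  # c_k times the product of the k largest alphas
        if score > best_score:
            best_score, best_k = score, k
        if k < n:
            prod *= alpha[order[k]]
    q = [0] * n
    for i in order[:best_k]:
        q[i] = 1
    return q

# Uniform P0 (c_k ≡ 1): the MAP sets q_i = 1 exactly where alpha_i > 1.
print(lide_map([1, 1, 1, 1], [0.5, 3.0, 2.0]))  # [0, 1, 1]
```

Note the counting representation is consulted only n + 1 times, once per candidate count k.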

SLIDE 11

Partition Function Z

Z(α1, . . . , αn) = Φ Σ_{q1, . . . , qn} P0(q1, . . . , qn) ∏i=1..n αi^qi

  • Observation: Z is a polynomial in α. More importantly, Z is a symmetric polynomial.
  • According to the fundamental theorem of symmetric polynomials, it can be expressed in terms of a small number of building units called elementary symmetric polynomials:

Z(α) = Φ Σ_{k=0..n} ck ek(α)

SLIDE 12

Elementary Symmetric Polynomials

  • ek(α) is the k-th order elementary symmetric polynomial in α, the sum of all products of k distinct elements of α:

ek(α) = Σ_{1 ≤ i1 < . . . < ik ≤ n} αi1 · · · αik

  • Sum of C(n, k) terms, so naive evaluation is a bad idea.
  • Newton’s identity: let pk(α1, . . . , αn) = Σ_{i=1..n} αi^k be the k-th power sum. Then

ek(α) = (1/k) Σ_{i=1..k} (−1)^(i−1) ek−i(α) pi(α)

  • This yields a recursive method to compute all ek(α) in O(n²).
  • Thus, Z can be computed in O(n²) given {ck}.
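The recursion translates directly into a short Python sketch (our illustrative code; `Phi` stands for the constant Φ = ∏i φi(0) defined earlier):

```python
def elementary_symmetric(alpha):
    """All e_0(alpha), ..., e_n(alpha) in O(n^2) via Newton's identity:
    e_k = (1/k) * sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i,
    where p_i = sum_j alpha_j^i is the i-th power sum.
    """
    n = len(alpha)
    p = [sum(a ** i for a in alpha) for i in range(n + 1)]  # power sums p_0..p_n
    e = [1.0] + [0.0] * n                                   # e_0 = 1
    for k in range(1, n + 1):
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i] for i in range(1, k + 1)) / k
    return e

def partition_function(c, alpha, Phi=1.0):
    """Z(alpha) = Phi * sum_k c_k e_k(alpha): O(n^2) given {c_k}."""
    e = elementary_symmetric(alpha)
    return Phi * sum(ck * ek for ck, ek in zip(c, e))

# e_1 = 2+3+4 = 9, e_2 = 2*3 + 2*4 + 3*4 = 26, e_3 = 2*3*4 = 24
print(elementary_symmetric([2.0, 3.0, 4.0]))  # [1.0, 9.0, 26.0, 24.0]
```

Each e_k reuses the previously computed e_0, . . . , e_{k−1}, which is where the O(n²) bound comes from.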

SLIDE 13

Marginal on Each Individual

  • As usual, the marginals Pr(qi) can be computed in a way similar to the computation of the normalization term Z, as the following theorem shows.
  • Let α^(i) be the vector such that α^(i)_i = 0 and α^(i)_j = αj for every j ≠ i. Then

Pr(qi = 0) = Z(α^(i)) / Z(α) = ( Σ_{k=0..n} ck ek(α^(i)) ) / ( Σ_{k=0..n} ck ek(α) )
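The ratio can be sketched as follows (our illustrative code; Φ is omitted since it cancels, and e_k is computed here by brute-force subset enumeration for clarity, which Newton's identity reduces to O(n²) in practice):

```python
from itertools import combinations
from math import prod

def Z(c, alpha):
    """Unnormalized Z(alpha) = sum_k c_k e_k(alpha).

    e_k is evaluated by enumerating all k-subsets: exponential, but
    transparent; replace with the Newton-identity recursion for O(n^2).
    """
    n = len(alpha)
    e = [sum(prod(s) for s in combinations(alpha, k)) for k in range(n + 1)]
    return sum(ck * ek for ck, ek in zip(c, e))

def marginal_zero(c, alpha, i):
    """Pr(q_i = 0) = Z(alpha^(i)) / Z(alpha), where alpha^(i) zeroes entry i."""
    alpha_i = list(alpha)
    alpha_i[i] = 0.0
    return Z(c, alpha_i) / Z(c, alpha)
```

Zeroing αi kills every term in which qi = 1, so the numerator sums exactly the configurations with qi = 0.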

SLIDE 14

Experimental Setup

  • Friends and Smokes domain.
  • Task: compute the marginal probability of having cancer for each person, given the cancer test readings of the whole population.
  • Individual soft evidence uniformly sampled from [0, 2]. Thus, lifted BP reduces to ground BP.
  • Two versions of the “Friends & Smokes” MLN:
  • Original Friends-and-Smokes: encourages Smokes(x) and Smokes(y) to be the same if Friends(x, y) is unknown.
  • Attractive potential between Smokes(x) and Smokes(y)
  • Friends-and-Smokes-Neg: −1.1 : Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y).
  • Repulsive potential between Smokes(x) and Smokes(y)
  • Difficult test case for BP

SLIDE 15

Running Time on “Friends & Smokes”

[Plot: running time (seconds) vs. number of persons, LIDE vs. BP]

*Note

  • Use a slightly modified C-FOVE for lifted inference without evidence.
  • C-FOVE time dominates evidence-absorbing time.
  • Junction-tree ran out of memory for N = 30.

SLIDE 16

Running Time on “Friends & Smokes-Neg”

[Plot: running time (seconds) vs. number of persons, LIDE vs. BP with damping (damping = 0.1)]

*Note: BP did not converge when N ≥ 200

SLIDE 17

Evidence Strength vs Probability

[Scatter plot: Prob(Cancer) vs. evidence strength w]

*Note:

  • This is a scatter plot, not a function.
  • The distribution of Pr(Cancer) spreads out, so quantization would lose accuracy.

SLIDE 18

Conclusion and Future Direction

  • We propose a new strategy for handling distinct soft evidence
  • Perform lifted inference (e.g., C-FOVE) without the distinct soft evidence
  • Absorb the soft evidence by exploiting the symmetry of the partition polynomial Z
  • Future directions
  • Soft evidence on multiple (L) unary predicates
  • Polynomial in domain size N, but super-exponential in L
  • Need to depart from exact inference and derive efficient approximations
  • Soft evidence on one binary predicate
  • Intractable in general
  • Are there efficient approximations that can be derived from this approach?
