SLIDE 1

Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees

Stephen H. Bach Bert Huang Lise Getoor Maryland Virginia Tech UC Santa Cruz

AISTATS 2015

SLIDE 2

This Talk

§ Markov random fields capture rich dependencies in structured data, but inference is NP-hard
§ Relaxed inference can help, but techniques have tradeoffs
§ Two approaches: Local Consistency Relaxation and MAX SAT Relaxation


SLIDE 3

Takeaways

§ We can combine their advantages: quality guarantees and highly scalable message-passing algorithms
§ New inference algorithm for a broad class of structured, relational models

Local Consistency Relaxation MAX SAT Relaxation


SLIDE 4

Modeling Relational Data with Markov Random Fields

SLIDE 5

Markov Random Fields

§ Probabilistic model for high-dimensional data:

  • The random variables x represent the data, such as whether a person has an attribute or whether a link exists
  • The potentials φ score different configurations of the data
  • The weights w scale the influence of different potentials

P(x) ∝ exp( w⊤ φ(x) )
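As a concrete sketch (not from the talk; the toy model and function names are invented for illustration), the unnormalized score w⊤φ(x) is all that is needed to compare assignments, since the normalizer cancels in ratios:

```python
import math

# Minimal MRF sketch: binary variables x, potential functions phi, and
# weights w. P(x) is proportional to exp(w . phi(x)), so unnormalized
# log-scores suffice to compare assignments.
def unnormalized_log_prob(x, weights, potentials):
    """Return w . phi(x), the log of the unnormalized probability."""
    return sum(w * phi(x) for w, phi in zip(weights, potentials))

# Toy model: two binary variables that prefer to agree.
potentials = [lambda x: float(x[0] == x[1])]
weights = [2.0]

agree = unnormalized_log_prob((1, 1), weights, potentials)     # 2.0
disagree = unnormalized_log_prob((0, 1), weights, potentials)  # 0.0
ratio = math.exp(agree - disagree)  # agreement is e^2 times more probable
```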
SLIDE 6

Markov Random Fields

§ Variables and potentials form graphical structure:

[Figure: factor graph linking potentials φ to variables x]

SLIDE 7

Modeling Relational Data

§ Many important problems have relational structure
§ Common to use logic to describe probabilistic dependencies
§ Relations in data map to logical predicates

[Figure: scene with activity labels — crossing, waiting, queueing, walking, talking, dancing, jogging]
SLIDE 8

Logical Potentials

§ One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks

[Richardson and Domingos, 2006]

§ Each first-order rule is a template for potentials

  • Ground out rule over relational data
  • The truth table of each ground rule is a potential
  • Each potential’s weight comes from the rule that templated it

5.0 : Friends(X, Y) ∧ Smokes(X) ⇒ Smokes(Y)

SLIDE 9

Logical Potentials: Grounding

5.0 : Friends(X, Y) ∧ Smokes(X) ⇒ Smokes(Y)

[Figure: social network over Alexis, Bob, Claudia, Dave, Erin]

5.0 : Friends(Alexis, Bob) ∧ Smokes(Alexis) ⇒ Smokes(Bob)
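A minimal sketch of the grounding step (the helper name, clause encoding, and toy constants are hypothetical): substitute every pair of constants for the logical variables X and Y, emitting one weighted ground rule per substitution:

```python
from itertools import product

# Illustrative grounding of the rule
#   Friends(X, Y) & Smokes(X) => Smokes(Y)
# over a set of constants. Each grounding becomes one weighted
# potential in the MRF, with the weight of the templating rule.
def ground_rule(constants, weight=5.0):
    groundings = []
    for X, Y in product(constants, repeat=2):
        if X == Y:
            continue  # skip reflexive friendships
        groundings.append(
            (weight, f"Friends({X}, {Y}) & Smokes({X}) => Smokes({Y})")
        )
    return groundings

people = ["Alexis", "Bob", "Claudia", "Dave", "Erin"]
grounded = ground_rule(people)
# 5 people -> 5 * 4 = 20 ground rules, each with weight 5.0
```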

SLIDE 10

Logical Potentials

§ Let R be a set of rules, where each rule R_j has the general form

w_j : ( ⋁_{i ∈ I_j⁺} x_i ) ∨ ( ⋁_{i ∈ I_j⁻} ¬x_i )

  • Weights w_j ≥ 0, and sets I_j⁺ and I_j⁻ index the positive and negated variables

SLIDE 11

MAP Inference

§ MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
§ MAP inference is

arg max_x P(x) ≡ arg max_{x ∈ {0,1}ⁿ} Σ_{R_j ∈ R} w_j ( (⋁_{i ∈ I_j⁺} x_i) ∨ (⋁_{i ∈ I_j⁻} ¬x_i) )

§ This MAX SAT problem is combinatorial and NP-hard!
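The objective above can be evaluated directly for a discrete assignment. This sketch (the clause encoding as weight plus positive/negative index lists, and the toy instance, are invented for illustration) scores an assignment as the weighted sum of satisfied clauses:

```python
# Each rule R_j is a clause with positive literals (pos) and negated
# literals (neg) over binary variables x. MAP inference maximizes the
# total weight of satisfied clauses.
def clause_satisfied(x, pos, neg):
    return any(x[i] for i in pos) or any(not x[i] for i in neg)

def max_sat_score(x, clauses):
    """clauses: list of (weight, pos_indices, neg_indices)."""
    return sum(w for w, pos, neg in clauses if clause_satisfied(x, pos, neg))

# Toy instance: (x0 or x1) with weight 1.0, (not x0) with weight 0.5
clauses = [(1.0, [0, 1], []), (0.5, [], [0])]
score = max_sat_score([0, 1], clauses)  # 1.5: both clauses satisfied
```

Finding the x maximizing this score over all 2ⁿ assignments is the NP-hard part; the sketch only evaluates one assignment.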

SLIDE 12

Relaxed MAP Inference

SLIDE 13

Approaches to Relaxed Inference

§ Local consistency relaxation

  • Developed in probabilistic graphical models community
  • ADVANTAGE: Many highly scalable algorithms available
  • DISADVANTAGE: No known quality guarantees for logical MRFs

§ MAX SAT relaxation

  • Developed in randomized algorithms community
  • ADVANTAGE: Provides strong quality guarantees
  • DISADVANTAGE: No algorithms designed for large-scale models

§ How can we combine these advantages?

SLIDE 14

Local Consistency Relaxation

SLIDE 15

Local Consistency Relaxation

§ LCR is a popular technique for approximating MAP in MRFs

  • Often simply called linear programming (LP) relaxation
  • Dual decomposition solves dual to LCR objective

§ Lots of work in PGM community, e.g.,

  • Globerson and Jaakkola, 2007
  • Wainwright and Jordan, 2008
  • Sontag et al. 2008, 2012

§ Idea: relax search over consistent marginals to simpler set

SLIDE 16

Local Consistency Relaxation

[Figure: factor graph with pseudomarginals µ attached to each variable x]

SLIDE 17

Local Consistency Relaxation

[Figure: factor graph with pseudomarginals θ attached to each potential φ]

SLIDE 18

Local Consistency Relaxation

§ µ : pseudomarginals over variable states
§ θ : pseudomarginals over joint potential states

arg max_{(θ, µ) ∈ L} Σ_{R_j ∈ R} w_j Σ_{x_j} θ_j(x_j) φ_j(x_j)

SLIDE 19

MAX SAT Relaxation

SLIDE 20

Approximate Inference

§ View MAP inference as optimizing rounding probabilities p: arg max_p Ŵ
§ The expected score of a clause is a weighted noisy-or function
§ Then the expected total score is

Ŵ = Σ_{R_j ∈ R} w_j ( 1 − ∏_{i ∈ I_j⁺} (1 − p_i) ∏_{i ∈ I_j⁻} p_i )

§ But Ŵ is highly non-convex!
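A short sketch of the expected score (illustrative; it reuses the hypothetical clause encoding of weight plus positive/negative index lists). A clause fails only if every positive literal rounds to 0 and every negated literal rounds to 1, which gives the noisy-or form:

```python
from math import prod  # Python 3.8+

# Expected score of independent randomized rounding, where each x_i is
# set to 1 with probability p_i. Each clause contributes its weight
# times the probability that it is satisfied (one minus the probability
# that all its literals fail).
def expected_score(p, clauses):
    """clauses: list of (weight, pos_indices, neg_indices)."""
    total = 0.0
    for w, pos, neg in clauses:
        p_unsat = prod(1 - p[i] for i in pos) * prod(p[i] for i in neg)
        total += w * (1 - p_unsat)
    return total

clauses = [(1.0, [0, 1], [])]                # clause (x0 or x1), weight 1
score = expected_score([0.5, 0.5], clauses)  # 1 * (1 - 0.5*0.5) = 0.75
```

The products of the p_i across literals are exactly what makes this objective non-convex.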

SLIDE 21

Approximate Inference

§ It is the products in the objective that make it non-convex
§ The expected score can be lower bounded using the relationship between arithmetic and geometric means [Goemans and Williamson, 1994]:

(p₁ + p₂ + · · · + p_k) / k ≥ (p₁ p₂ · · · p_k)^{1/k}

§ This leads to the lower bound

Σ_{R_j ∈ R} w_j ( 1 − ∏_{i ∈ I_j⁺} (1 − p_i) ∏_{i ∈ I_j⁻} p_i ) ≥ (1 − 1/e) Σ_{R_j ∈ R} w_j min{ Σ_{i ∈ I_j⁺} p_i + Σ_{i ∈ I_j⁻} (1 − p_i), 1 }
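The bound can be sanity-checked numerically. The following sketch (toy clauses and random trials, not from the paper) compares the noisy-or expected score against (1 − 1/e) times the linearized min{·, 1} score on random rounding probabilities:

```python
import random
from math import prod, e

# Per Goemans-Williamson, the noisy-or expected score is at least
# (1 - 1/e) times the linearized min{., 1} score, for any rounding
# probabilities p and clauses with nonnegative weights.
def expected_score(p, clauses):
    return sum(w * (1 - prod(1 - p[i] for i in pos) * prod(p[i] for i in neg))
               for w, pos, neg in clauses)

def linear_score(p, clauses):
    return sum(w * min(sum(p[i] for i in pos) + sum(1 - p[i] for i in neg), 1.0)
               for w, pos, neg in clauses)

random.seed(0)
clauses = [(1.0, [0, 1], [2]), (2.0, [2], [0])]
trials = [[random.random() for _ in range(3)] for _ in range(1000)]
bound_holds = all(
    expected_score(p, clauses) >= (1 - 1 / e) * linear_score(p, clauses) - 1e-9
    for p in trials
)
```

In fact, for a clause of length k the constant improves to 1 − (1 − 1/k)^k, which is always at least 1 − 1/e.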

SLIDE 22

Approximate Inference

§ So, we solve the linear program

arg max_{y ∈ [0,1]ⁿ} Σ_{R_j ∈ R} w_j min{ Σ_{i ∈ I_j⁺} y_i + Σ_{i ∈ I_j⁻} (1 − y_i), 1 }

§ If we set p_i = y_i, a greedy rounding method will find a (1 − 1/e)-optimal discrete solution
§ If we set p_i = (1/2) y_i + 1/4, it improves to ¾-optimal

[Goemans and Williamson, 1994]
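The greedy rounding step can be sketched with the method of conditional expectations, a standard derandomization (implementation details here are illustrative, not the paper's): fix one variable at a time, keeping whichever value does not decrease the expected score, so the final discrete score is at least the initial expected score:

```python
from math import prod

# Greedy rounding by conditional expectations over the noisy-or
# expected score. clauses: list of (weight, pos_indices, neg_indices).
def expected_score(p, clauses):
    return sum(w * (1 - prod(1 - p[i] for i in pos) * prod(p[i] for i in neg))
               for w, pos, neg in clauses)

def greedy_round(y, clauses, transform=lambda v: v):
    p = [transform(v) for v in y]  # e.g. p_i = y_i, or p_i = y_i/2 + 1/4
    for i in range(len(p)):
        p_one, p_zero = p[:], p[:]
        p_one[i], p_zero[i] = 1.0, 0.0
        # Keep the choice whose conditional expected score is larger;
        # one of the two is always >= the current expectation.
        p = p_one if expected_score(p_one, clauses) >= expected_score(p_zero, clauses) else p_zero
    return [int(v) for v in p]

clauses = [(1.0, [0, 1], []), (0.5, [], [0])]
x = greedy_round([0.4, 0.9], clauses, transform=lambda v: v / 2 + 0.25)
```

On this toy instance the rounded x satisfies both clauses, matching the discrete optimum.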

SLIDE 23

Unifying the Relaxations

SLIDE 24

Analysis

[Figure: factor graph with potentials labeled j = 1, j = 2, j = 3, and so on]

arg max_{(θ, µ) ∈ L} Σ_{R_j ∈ R} w_j Σ_{x_j} θ_j(x_j) φ_j(x_j)

SLIDE 25

Analysis

max_{θ_j | (θ_j, µ) ∈ L} w_j Σ_{x_j} θ_j(x_j) φ_j(x_j)   (one such subproblem per potential)

arg max_{µ ∈ [0,1]ⁿ} Σ_{R_j ∈ R} φ̂_j(µ)

SLIDE 26

Analysis

§ We can now analyze each potential's parameterized subproblem in isolation:

φ̂_j(µ) = max_{θ_j | (θ_j, µ) ∈ L} w_j Σ_{x_j} θ_j(x_j) φ_j(x_j)

§ Using the KKT conditions, we can find a simplified expression for each solution based on the parameters µ:

φ̂_j(µ) = w_j min{ Σ_{i ∈ I_j⁺} µ_i + Σ_{i ∈ I_j⁻} (1 − µ_i), 1 }

SLIDE 27

Analysis

§ Substitute each potential's closed-form solution, w_j min{ Σ_{i ∈ I_j⁺} µ_i + Σ_{i ∈ I_j⁻} (1 − µ_i), 1 }, back into the outer objective

arg max_{µ ∈ [0,1]ⁿ} Σ_{R_j ∈ R} φ̂_j(µ)
SLIDE 28

Analysis

§ Leads to a simplified, projected LCR over µ:

arg max_{µ ∈ [0,1]ⁿ} Σ_{j=1}^{m} w_j min{ Σ_{i ∈ I_j⁺} µ_i + Σ_{i ∈ I_j⁻} (1 − µ_i), 1 }

SLIDE 29

Analysis

Local Consistency Relaxation:

arg max_{µ ∈ [0,1]ⁿ} Σ_{R_j ∈ R} w_j min{ Σ_{i ∈ I_j⁺} µ_i + Σ_{i ∈ I_j⁻} (1 − µ_i), 1 }

MAX SAT Relaxation:

arg max_{y ∈ [0,1]ⁿ} Σ_{R_j ∈ R} w_j min{ Σ_{i ∈ I_j⁺} y_i + Σ_{i ∈ I_j⁻} (1 − y_i), 1 }

The two objectives are identical: the local consistency relaxation and the MAX SAT relaxation are equivalent.

SLIDE 30

Evaluation

SLIDE 31

New Algorithm: Rounded LP

§ Three steps:

  • Solves the relaxed MAP inference problem
  • Modifies the pseudomarginals
  • Rounds to discrete solutions

§ We use the alternating direction method of multipliers (ADMM) to implement a message-passing approach [Glowinski and Marrocco, 1975; Gabay and Mercier, 1976]
§ ADMM-based inference for the MAX SAT form of the problem was originally developed for hinge-loss MRFs [Bach et al., 2015]

SLIDE 32

Evaluation Setup

§ Compared with

  • MPLP [Globerson and Jaakkola, 2007]
  • MPLP with cycle tightening [Sontag et al., 2008, 2012]

§ MPLP uses coordinate-descent dual decomposition, so rounding is not applicable
§ Solved MAP in social-network opinion models with super- and submodular features
§ Measured primal score, i.e., the weighted sum of satisfied clauses

SLIDE 33

Results

§ Expected scores of Rounded LP are significantly better
§ Rounded LP's final scores are even better
§ Cycle tightening has limited effect
§ Rounded LP does 20% better than MPLP, and takes only 1 minute for 1 million clauses

[Figure: primal objective vs. number of clauses (×10⁵) for MPLP, MPLP w/ Cycles, LP Upper Bound, Rounded LP (Exp), and Rounded LP]
SLIDE 34

Conclusion

SLIDE 35

Conclusion

§ Uniting local consistency and MAX SAT relaxations combines the benefits of both: scalability and accuracy
§ Rounding pseudomarginals can significantly improve quality over coordinate-descent dual decomposition
§ Many applications to structured and relational data:

  • Social network analysis
  • Bioinformatics
  • Recommender systems
  • Text and video understanding

Thank You!

bach@cs.umd.edu @stevebach