Anytime Approximate Inference in Graphical Models Qi Lou Final - - PowerPoint PPT Presentation



SLIDE 1

Anytime Approximate Inference in Graphical Models


Qi Lou, Final Defense
Dec. 5, 2018

Committee: Alexander Ihler (Chair), Rina Dechter, Sameer Singh

SLIDE 2

Core of This Thesis

SLIDE 3

Graphical Models

  • Describe structure in large problems

– Large complex system
– Made of “smaller”, “local” interactions
– Complexity emerges through interdependence

  • More formally:
  • Example:

A graphical model consists of:

  • - variables
  • - domains
  • - (non-negative) functions or “factors”

(we’ll assume discrete)


[Figure: chain model over variables A — no, see caption — a simple model with variables A, B, C connected by pairwise factors]

A B f(A,B)        B C f(B,C)
0 0   0.24        0 0   0.12
0 1   0.56        0 1   0.36
1 0   1.1         1 0   0.3
1 1   1.2         1 1   1.8
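This toy model is small enough to evaluate exhaustively. A minimal Python sketch (illustrative, not part of the thesis; it assumes the blank cells in the extracted tables above are the 0-valued assignments):

```python
import itertools

# Pairwise factors of the toy model over binary A, B, C (values from the
# tables above; the rows with 0-assignments are assumed from context).
f_AB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
f_BC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}

def score(a, b, c):
    """Unnormalized probability of a configuration: product of all factors."""
    return f_AB[(a, b)] * f_BC[(b, c)]

# Summing the score over all 2^3 configurations gives the partition function.
Z = sum(score(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
```

The product-of-local-factors structure is exactly what later slides exploit: each factor touches only a small subset of variables.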

SLIDE 4

Graphical Models

  • Describe structure in large problems

– Large complex system
– Made of “smaller”, “local” interactions
– Complexity emerges through interdependence

  • Examples & Tasks

– Maximization (MAP): compute the most probable configuration

[Yanover & Weiss 2002]

SLIDE 5

Graphical Models

  • Describe structure in large problems

– Large complex system
– Made of “smaller”, “local” interactions
– Complexity emerges through interdependence

  • Examples & Tasks

– Summation & marginalization

[Figure: image labeling example (grass, plane, sky, cow)]

Observation y → marginals p(xi | y) and the “partition function”

e.g., [Plath et al. 2009]

SLIDE 6

Graphical Models

  • Describe structure in large problems

– Large complex system
– Made of “smaller”, “local” interactions
– Complexity emerges through interdependence

  • Examples & Tasks

– Mixed inference (marginal MAP, MEU, …)

[Figure: influence diagram for the “oil wildcatter” problem, with decision nodes (Test, Drill, Oil sale policy), chance nodes (Test result, Seismic structure, Oil underground, Oil produced, Market information), and utility nodes (Test cost, Drill cost, Sales cost, Oil sales)]

Influence diagrams & optimal decision-making (the “oil wildcatter” problem)

e.g., [Raiffa 1968; Shachter 1986]

SLIDE 7

Inference Queries/Tasks

  • Maximum A Posteriori (MAP): NP-hard in general
  • The Partition Function: #P-complete [Valiant 1979]
  • Marginal MAP (MMAP): NP^PP-complete (decision version) [Park 2002]

SLIDE 8

Desired Properties: Guarantee, Anytime, Anyspace

  • Anytime

– valid solution at any point
– solution quality improves with additional computation

  • Anyspace

– run with limited memory resources

[Figure: bounds tightening over time; bounded error at any stopping point]

SLIDE 9

Approximate inference

  • Three major paradigms

– Search: structured enumeration over all possible states
– Sampling: use randomization to estimate averages over the state space
– Variational methods: reason over small subsets of variables at a time

SLIDE 10

Approximate inference

  • Three major paradigms

– Variational methods (e.g., tree-reweighted belief propagation [Wainwright et al. 2003], mini-bucket elimination [Dechter & Rish 2001]).


SLIDE 11

Approximate inference

  • Three major paradigms

– (Monte Carlo) Sampling (e.g., importance-sampling-based methods [Bidyuk & Dechter 2007], approximate hash-based counting [Chakraborty et al. 2016]).


SLIDE 12

Approximate inference

  • Three major paradigms

– (Heuristic) Search (e.g., [Lou et al. 2017], [Viricel et al. 2016], [Henrion 1991]).


SLIDE 13

Main Contributions of This Thesis

SLIDE 14

Chapter 3: Best-first Search Aided by Variational Heuristics

[Diagram: weighted mini-bucket (variational) provides pre-compiled heuristics to AND/OR best-first search (AOBFS); extended to unified best-first search (UBFS)]

SLIDE 15

Search Trees and Summation

  • Organize / structure the state space

– Leaf nodes = model configurations
– “Value” of a node = sum of configurations below
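Viewed as code, a node's value is simply the recursive sum of its children's values, bottoming out at full configurations. A minimal depth-first sketch (illustrative Python with a small hypothetical factor list, not the thesis implementation):

```python
def node_value(order, factors, assignment):
    """Value of a search-tree node = sum, over every completion of the
    current partial assignment, of the product of all factors."""
    if len(assignment) == len(order):            # leaf: one full configuration
        value = 1.0
        for scope, table in factors:
            value *= table[tuple(assignment[v] for v in scope)]
        return value
    var = order[len(assignment)]                 # branch on the next variable
    return sum(node_value(order, factors, {**assignment, var: val})
               for val in (0, 1))

# Toy model over binary A, B, C; the root's value is the partition function Z.
factors = [(("A", "B"), {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}),
           (("B", "C"), {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8})]
Z = node_value(("A", "B", "C"), factors, {})
```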

[Figure: search tree over variables A, B, C, D, E, F; leaves correspond to full model configurations]

SLIDE 16

Search Trees and Summation

  • Heuristic search for summation

– Heuristic function upper bounds the value (sum below) at any node
– Expand the tree and compute updated bounds
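One way to make this concrete: keep the frontier in a priority queue, always expand the node with the largest heuristic bound, and report the exact mass of finished leaves plus the remaining frontier bounds as an anytime upper bound on Z. A self-contained sketch; the crude product-of-maxima heuristic below is purely illustrative, not the thesis's weighted mini-bucket heuristic:

```python
import heapq

def evaluate(factors, assignment):
    value = 1.0
    for scope, table in factors:
        value *= table[tuple(assignment[v] for v in scope)]
    return value

def crude_heuristic(factors, assignment, n_vars):
    """Valid upper bound on the sum below a node: each of the 2^free
    completions scores at most the product of per-factor maxima taken
    over entries consistent with the partial assignment."""
    bound = 2.0 ** (n_vars - len(assignment))
    for scope, table in factors:
        bound *= max(v for key, v in table.items()
                     if all(assignment.get(s, key[i]) == key[i]
                            for i, s in enumerate(scope)))
    return bound

def anytime_upper_bound(order, factors, steps):
    h = lambda asg: (evaluate(factors, asg) if len(asg) == len(order)
                     else crude_heuristic(factors, asg, len(order)))
    frontier = [(-h({}), ())]                # max-heap keyed on node bound
    exact = 0.0                              # mass of fully evaluated leaves
    for _ in range(steps):
        if not frontier:
            break
        neg_bound, items = heapq.heappop(frontier)
        assignment = dict(items)
        if len(assignment) == len(order):    # leaf: its value is exact
            exact += -neg_bound
            continue
        var = order[len(assignment)]         # expand: replace node by children
        for val in (0, 1):
            child = {**assignment, var: val}
            heapq.heappush(frontier, (-h(child), tuple(sorted(child.items()))))
    return exact + sum(-b for b, _ in frontier)

factors = [(("A", "B"), {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}),
           (("B", "C"), {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8})]
loose = anytime_upper_bound(("A", "B", "C"), factors, steps=1)
tight = anytime_upper_bound(("A", "B", "C"), factors, steps=100)
```

With enough expansions the frontier empties and the bound collapses to Z itself, which is exactly the anytime behavior the slide describes.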

[Figure: partially expanded search tree over A, B, C with heuristic bounds at the frontier]

SLIDE 17

AND/OR Best-first Search (AOBFS)

AOBFS instantiates best-first search with:
  • search space: AND/OR search tree
  • heuristic: weighted mini-bucket
  • priority: expand the node that potentially reduces the bound gap U – L on Z the most

SLIDE 18

AND/OR Search Trees

[Nilsson 1980; Dechter and Mateescu 2007]

[Figure: AND/OR search tree over variables A–G, alternating OR nodes (variable choices) and AND nodes (conditionally independent subproblems)]

(full) solution tree: corresponds to a complete configuration of all variables

SLIDE 19

Weighted Mini-bucket (WMB) Heuristics

  • Formed by intermediately generated factors (called messages, e.g., λD(A))
  • Upper (or lower) bound of the node value
  • Monotonic: resolving relaxations during search makes heuristics more (no less) accurate
  • Quality can be roughly controlled by the ibound

[Figure: bucket elimination over variables A–G with factors f and messages λ, e.g., λD(A), λC(B), λG(A,F)]

[Liu and Ihler, ICML’11]

SLIDE 20

Priority

  • Intuition: expand the frontier node that potentially reduces the bound gap U – L (L ≤ Z ≤ U) the most
  • Each frontier node carries a “gap priority” and an “upper priority”

[Figure: search tree with priorities marked at the frontier]

SLIDE 21

Overcoming the Memory Limit

  • Main strategy (SMA*-like [Russell 1992])

– Keep track of the lowest-priority nodes as well
– When the memory limit is reached, delete the lowest-priority nodes and keep expanding the top-priority ones
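The strategy above can be sketched as a capacity-bounded frontier that pops the best node and evicts the worst. This is a hedged illustration, not the thesis's data structure; real SMA*-style implementations also back the evicted nodes' bounds up into their parents so they can be regenerated later:

```python
import heapq

class BoundedFrontier:
    """Frontier with 'pop the best node' and, at a capacity limit,
    'evict the worst node', using two heaps with lazy deletion."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}        # node -> priority (the live entries)
        self.best = []         # max-heap with lazy deletion
        self.worst = []        # min-heap with lazy deletion

    def push(self, node, priority):
        if len(self.items) >= self.capacity:
            self._evict_worst()
        self.items[node] = priority
        heapq.heappush(self.best, (-priority, node))
        heapq.heappush(self.worst, (priority, node))

    def pop_best(self):
        while self.best:
            neg_priority, node = heapq.heappop(self.best)
            if self.items.get(node) == -neg_priority:   # entry still live?
                del self.items[node]
                return node, -neg_priority
        raise IndexError("pop from empty frontier")

    def _evict_worst(self):
        while self.worst:
            priority, node = heapq.heappop(self.worst)
            if self.items.get(node) == priority:
                del self.items[node]                    # deleted, not expanded
                return
```

Lazy deletion keeps both heaps O(log n) per operation at the cost of stale entries, a common trade-off when one priority structure must support removals from both ends.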

[Figure: low-priority frontier nodes deleted while top-priority nodes keep being expanded]
SLIDE 22

Anytime Behavior of AOBFS

[Figure: anytime bounds on (a) PIC’11/queen5_5_4 and (b) Protein/1g6x]

SLIDE 23

Aggregated Results

  • Number of instances solved to a “tight” tolerance interval. The best (most solved) for each setting is bolded.

23

slide-24
SLIDE 24

Best-first Search Aided by Variational Heuristics

[Diagram: weighted mini-bucket (WMB) [Liu and Ihler, ICML’11] provides optimized heuristics to AND/OR best-first search (AOBFS) for Z and to unified best-first search (UBFS) for marginal MAP]

SLIDE 25

Unified Best-first Search (UBFS)

  • Idea: unify max- and sum-inference in one search framework

– avoids some unnecessary exact evaluation of conditional summation problems

  • Principle: focus on reducing the upper bound of MMAP as quickly as possible

  • How it works:

– Track the current most promising (partial) MAP configuration, i.e., one with the highest upper bound – Expand the most “influential” frontier node of that (partial) MAP configuration

  • Frontier node that contributes most to its upper bound
  • Identified by a specially designed “double-priority” system
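A rough illustration of the selection rule (a hedged sketch with a hypothetical data layout, not the thesis's AND/OR implementation): score each candidate partial MAP configuration by the product of the upper bounds of its unexpanded pieces, pick the best-scoring configuration, then expand the piece contributing the largest bound:

```python
import math

def select_expansion(candidates):
    """candidates: dict mapping a partial MAP configuration (any hashable
    label) to the list of upper bounds of its frontier pieces; the
    configuration's overall upper bound is taken as their product.
    Returns (most promising configuration, index of its most
    'influential' frontier piece, i.e., the largest bound factor)."""
    best_cfg = max(candidates, key=lambda cfg: math.prod(candidates[cfg]))
    piece = max(range(len(candidates[best_cfg])),
                key=lambda i: candidates[best_cfg][i])
    return best_cfg, piece
```

For example, with bounds {"x=0": [2.0, 3.0], "x=1": [1.0, 5.0]}, configuration "x=0" wins (product 6 vs. 5) and its second piece is expanded first.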

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

Chapter 4: Sampling Enhanced by Best-first Search

[Diagram: weighted mini-bucket (WMB) provides heuristics to AND/OR best-first search (AOBFS) and a proposal (WMB-IS); search refines the proposal for dynamic importance sampling (DIS) and mixed dynamic importance sampling (MDIS)]

[Liu, Fisher, Ihler, NIPS’15]

SLIDE 30

Monte Carlo Estimators

  • Most basic form: empirical estimate of probability
  • Relevant considerations

– Able to sample from the target distribution p(x)?
– Able to evaluate p(x) explicitly, or only up to a constant?

  • “Anytime” properties

– Unbiased estimator, or asymptotically unbiased

– Variance of the estimator decreases with m
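Both anytime properties are easy to see on a toy target (illustrative Python, with a Bernoulli(0.3) target chosen for this sketch):

```python
import random
import statistics

random.seed(0)

# Plain Monte Carlo: estimate E[u(x)] for x ~ p by a sample mean.
# Toy case: p = Bernoulli(0.3) and u(x) = x, so the true mean is 0.3.
def mc_estimate(m):
    return sum(random.random() < 0.3 for _ in range(m)) / m

# The estimate is unbiased for every m; its spread shrinks roughly as 1/sqrt(m).
def spread(m, runs=500):
    return statistics.pstdev(mc_estimate(m) for _ in range(runs))
```

Running `spread` for increasing m shows the variance decreasing, which is exactly why more samples buy a better anytime estimate.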

SLIDE 31

Monte Carlo Estimators

  • Most basic form: empirical estimate of probability
  • Central limit theorem

– The estimate is asymptotically Gaussian:

  • Finite-sample confidence intervals

– If u(x) is bounded, probability concentrates rapidly around the expectation:
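For bounded u(x), Hoeffding's inequality gives a finite-sample interval; a small sketch (generic Hoeffding form, with a uniform toy target, not the slide's specific example):

```python
import math
import random

random.seed(1)

def hoeffding_interval(samples, b, delta=0.05):
    """Hoeffding: for i.i.d. samples in [0, b], the sample mean is within
    eps = b * sqrt(ln(2/delta) / (2 m)) of the true mean with
    probability at least 1 - delta."""
    m = len(samples)
    eps = b * math.sqrt(math.log(2.0 / delta) / (2.0 * m))
    mean = sum(samples) / m
    return mean - eps, mean + eps

# u(x) uniform on [0, 1]: true mean 0.5.
lo, hi = hoeffding_interval([random.random() for _ in range(2000)], b=1.0)
```

The interval width shrinks as 1/sqrt(m), matching the concentration pictured on the slide.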

[Figure: sampling distribution of the estimate concentrating as m = 1, 5, 15]

SLIDE 32

Importance Sampling

  • Basic empirical estimate of probability:
  • Importance sampling:

SLIDE 33

Importance Sampling

  • Basic empirical estimate of probability:
  • Importance sampling:

“importance weights”
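On the toy model from slide 3, importance sampling for the partition function reads Z = E_q[f(x)/q(x)]; a sketch with a uniform proposal (illustrative, not the thesis's WMB proposal):

```python
import random

random.seed(2)

# Toy model from slide 3; q = uniform over the 8 configurations, q(x) = 1/8.
f_AB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
f_BC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}

def importance_estimate(m):
    total = 0.0
    for _ in range(m):
        a, b, c = (random.randint(0, 1) for _ in range(3))
        weight = f_AB[(a, b)] * f_BC[(b, c)] / (1.0 / 8.0)  # importance weight
        total += weight
    return total / m                                        # unbiased for Z

z_hat = importance_estimate(100_000)
```

Any proposal with support wherever f > 0 keeps the estimator unbiased; the choice of q only affects the variance, which motivates the next slide.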

SLIDE 34

Choosing a proposal

  • Can use WMB upper bound to define a proposal

[Figure: mini-bucket partitioning along the elimination order; U = upper bound of Z]

Weighted mixture: use mini-bucket 1 with probability w1, or mini-bucket 2 with probability w2 = 1 - w1.

Key insight: this proposal provides bounded importance weights! [Liu, Fisher, Ihler, NIPS’15]

SLIDE 35

WMB-IS

[Figure: WMB-IS anytime behavior; upper bound U and “empirical Bernstein” bounds on Z]
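Bounded weights are what make empirical Bernstein bounds applicable. The generic Maurer & Pontil (2009) form is sketched below (the thesis derives its own version for WMB-IS; this toy data only illustrates why the variance-sensitive bound beats Hoeffding when weights are far below their range bound b):

```python
import math
import random
import statistics

random.seed(3)

def empirical_bernstein(samples, b, delta=0.05):
    """Empirical Bernstein (Maurer & Pontil 2009): for i.i.d. samples in
    [0, b], with probability >= 1 - delta the true mean lies within eps
    of the sample mean, where eps uses the *sample* variance."""
    m = len(samples)
    mean = sum(samples) / m
    var = statistics.variance(samples)
    log_term = math.log(2.0 / delta)
    eps = math.sqrt(2.0 * var * log_term / m) + 7.0 * b * log_term / (3.0 * (m - 1))
    return mean - eps, mean + eps

# Weights bounded by b = 10 but with small variance: true mean 0.5.
samples = [random.uniform(0.4, 0.6) for _ in range(1000)]
eb_lo, eb_hi = empirical_bernstein(samples, b=10.0)
hoeffding_eps = 10.0 * math.sqrt(math.log(2.0 / 0.05) / (2.0 * 1000))
```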

SLIDE 36

Two-step Sampling

SLIDE 37

Boundedness of Two-step Sampling

  • current search tree
  • proposal distribution defined by two-step sampling
  • refined upper bound given by the current search tree
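The first step can be sketched as drawing a frontier node of the current search tree with probability proportional to its refined upper bound (illustrative Python with made-up bound values; the second step, completing the configuration below the chosen node via the WMB proposal, is omitted here):

```python
import random

random.seed(0)

def sample_frontier_node(bounds):
    """Pick frontier node i with probability proportional to bounds[i].
    Returns (index, probability of that choice)."""
    total = sum(bounds)
    r = random.uniform(0.0, total)
    for i, u in enumerate(bounds):
        r -= u
        if r <= 0.0:
            return i, u / total
    return len(bounds) - 1, bounds[-1] / total   # guard against rounding

# Hypothetical refined bounds on two frontier nodes.
bounds = [1.0, 3.0]
frequency = sum(sample_frontier_node(bounds)[0] for _ in range(10_000)) / 10_000
```

Nodes carrying more bound mass are sampled more often, so refining the tree automatically concentrates samples where the proposal was weakest.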

SLIDE 38

Two Stage Sampling

SLIDE 39

Dynamic Importance Sampling (DIS)

SLIDE 40

Sample Aggregation Strategy for DIS

  • Weighted average of importance weights: weight each sample with its corresponding upper bound.

  • importance weight corresponding to the i-th sample
  • upper bound being refined in the search process
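The exact DIS aggregation weights are derived in the thesis; the sketch below only illustrates the underlying reason averaging across a *changing* proposal is sound: any fixed convex combination of per-sample importance weights remains unbiased for Z (toy two-state model, illustrative numbers):

```python
import random

random.seed(5)

# Two-state model f with Z = f(0) + f(1) = 4.
f = {0: 1.0, 1: 3.0}
proposals = [{0: 0.5, 1: 0.5},      # early, crude proposal
             {0: 0.25, 1: 0.75}]    # later, refined proposal

def one_weight(q):
    x = 0 if random.random() < q[0] else 1
    return f[x] / q[x]              # unbiased for Z under either proposal

# A convex combination (0.3 / 0.7 here) of weights from the two proposals
# is still unbiased for Z, sample after sample.
m = 50_000
combo = sum(0.3 * one_weight(proposals[0]) + 0.7 * one_weight(proposals[1])
            for _ in range(m)) / m
```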

SLIDE 41

Finite-sample Bounds for DIS

SLIDE 42

Results on Individual Instances

SLIDE 43

Mixed Dynamic Importance Sampling (MDIS)

[Figure: original model and its augmented counterpart]

– Construct an augmented model [Doucet et al. 2002]
– Generalize DIS to provide finite-sample bounds for a series of summation objectives
– Translate the bounds back to bound MMAP

SLIDE 44


Number of instances on which an algorithm achieves the best lower (top) and upper (bottom) bounds. (Entries for UBFS are blank since UBFS does not provide lower bounds.)

Empirical Evaluation for MDIS

SLIDE 45

Chapter 5: A General Interleaving Framework

[Diagram: variational methods optimize the proposal via message passing and provide it to sampling]

SLIDE 46

Adaptive Policy

  • Idea:

– Predict the unit gains (bound reduction) of a message passing step and a sampling step, respectively.
– Execute the action with the larger predicted unit gain.
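The loop above can be sketched as follows (a hedged illustration with hypothetical step callables, not the thesis implementation): track each action's observed bound reduction per second and always run the action with the larger predicted unit gain:

```python
import time

def adaptive_interleave(step_fns, gap, budget_seconds):
    """step_fns maps action name -> callable returning the new bound gap.
    Keeps a running estimate of each action's 'unit gain' (gap reduction
    per second) and greedily runs the action with the larger estimate."""
    gains = {name: float("inf") for name in step_fns}   # optimistic start
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline and gap > 0.0:
        action = max(gains, key=gains.get)
        start = time.monotonic()
        new_gap = step_fns[action]()
        elapsed = max(time.monotonic() - start, 1e-9)
        gains[action] = (gap - new_gap) / elapsed       # observed unit gain
        gap = new_gap
    return gap

# Hypothetical steps: message passing shrinks the gap much faster here,
# so the policy quickly learns to favor it.
state = {"gap": 10.0}
def message_passing_step():
    state["gap"] *= 0.5
    return state["gap"]
def sampling_step():
    state["gap"] *= 0.99
    return state["gap"]

final_gap = adaptive_interleave(
    {"mp": message_passing_step, "sample": sampling_step}, 10.0, 0.05)
```

The optimistic initialization makes the policy try each action once before trusting its gain estimates, a standard trick for greedy selection rules.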

SLIDE 47

Interleaving vs. Non-interleaving

SLIDE 48

Adaptive vs. Static

SLIDE 49

Conclusions

– AOBFS, UBFS
– DIS, MDIS
– A general interleaving framework

SLIDE 50

Future Directions

52