Local Structure and BE extensions COMPSCI 276, Spring 2017 Set 5b: - - PowerPoint PPT Presentation



slide-1
SLIDE 1


Local Structure and BE extensions

COMPSCI 276, Spring 2017 Set 5b: Rina Dechter

(Reading: Darwiche chapter 5, Dechter chapter 4)

slide-2
SLIDE 2

Outline

  • Special representations of CPTs
  • Bucket Elimination:
  • Finding induced-width
  • Bucket elimination over mixed networks
slide-3
SLIDE 3

Outline

  • Bayesian networks and queries
  • Building Bayesian Networks
  • Special representations of CPTs
  • Causal Independence (e.g., Noisy OR)
  • Context Specific Independence
  • Determinism
  • Mixed Networks
slide-4
SLIDE 4
slide-5
SLIDE 5

A noisy-OR circuit

We wish to specify a CPT with fewer parameters. Think about a headache and 10 different conditions that may cause it.
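To make the parameter saving concrete, here is a minimal sketch of a noisy-OR CPT for 10 binary causes. The function name `noisy_or_p_x1`, the absence of a leak term, and the inhibitor value 0.2 are assumptions chosen for illustration; they are not taken from the slides.

```python
from itertools import product

def noisy_or_p_x1(causes, inhibit):
    """P(X=1 | causes) for a noisy-OR without a leak term.

    causes  : tuple of 0/1 flags, one per parent
    inhibit : inhibit[i] = probability that active cause i fails to trigger X
    """
    p_all_fail = 1.0
    for active, q in zip(causes, inhibit):
        if active:
            p_all_fail *= q
    return 1.0 - p_all_fail

n = 10
inhibit = [0.2] * n     # hypothetical values, not from the slides

# The n numbers above determine every row of the explicit table.
full_cpt = {c: noisy_or_p_x1(c, inhibit) for c in product((0, 1), repeat=n)}
print(len(inhibit), "parameters define", len(full_cpt), "rows")
```

The explicit table grows as 2^n in the number of causes, while the noisy-OR description grows linearly.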

slide-6
SLIDE 6

Causal Independence

Binary OR

[Figure: A and B are the parents of X]

A  B  P(X=0|A,B)  P(X=1|A,B)
0  0  1           0
0  1  0           1
1  0  0           1
1  1  0           1

slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

Noisy/OR CPDs

slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23

A student’s example

[Figure: Bayesian network over Intelligence, Difficulty, Grade, Letter, SAT, Job, Apply]

slide-24
SLIDE 24

Tree CPD

[Figure: tree-CPD for Job with internal nodes A, S, and L (branches a0/a1, s0/s1, l0/l1) and leaf distributions (0.8,0.2), (0.9,0.1), (0.4,0.6), (0.1,0.9)]

If the student does not Apply, SAT and L are irrelevant.
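A tree-CPD can be stored as a nested structure and queried by walking from the root. The sketch below is only illustrative: the figure did not survive extraction, so the placement of the four leaf distributions on the branches is an assumption, as are the names `TREE` and `lookup`.

```python
# Internal node: (variable, {value: subtree}); leaf: (P(j0), P(j1)).
# ASSUMPTION: which leaf hangs under which branch is a guess.
TREE = ("A", {
    "a0": (0.8, 0.2),                       # did not apply: S and L are never read
    "a1": ("S", {
        "s1": (0.1, 0.9),
        "s0": ("L", {"l0": (0.9, 0.1), "l1": (0.4, 0.6)}),
    }),
})

def lookup(tree, assignment):
    """Follow the branches selected by `assignment` down to a leaf."""
    while isinstance(tree[1], dict):
        var, children = tree
        tree = children[assignment[var]]
    return tree

# With A = a0 the answer is the same for every value of S and L.
print(lookup(TREE, {"A": "a0", "S": "s1", "L": "l0"}))
print(lookup(TREE, {"A": "a0", "S": "s0", "L": "l1"}))
```

The tree has 4 leaves, whereas a full table over three binary parents has 8 rows.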

slide-25
SLIDE 25

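[Figure: network with Letter1, Letter2, and Choice as parents of Job. The tree-CPD for Job has internal nodes C (branches c1, c2), L1 (branches l11, l10), and L2 (branches l21, l20), with leaf distributions (0.8,0.2), (0.3,0.7), (0.1,0.9), (0.9,0.1).]

Captures irrelevant variables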

slide-26
SLIDE 26

Multiplexer CPD

A CPD P(Y | A, Z1, Z2, …, Zk) is a multiplexer iff Val(A) = {1, 2, …, k} and P(Y | a, Z1, …, Zk) = 1{Y = Z_a}, i.e., Y copies the value of Z_a.

[Figure: network over Letter1, Letter2, Choice, Letter, and Job]
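A multiplexer CPD is deterministic, so it reduces to an indicator function. The following is a small sketch; the function name `multiplexer_cpd` and the string values are hypothetical.

```python
def multiplexer_cpd(y, a, zs):
    """P(Y=y | A=a, Z1..Zk) for a multiplexer: Y copies Z_a (a is 1-based)."""
    return 1.0 if y == zs[a - 1] else 0.0

# Hypothetical use: Letter copies Letter1 or Letter2 depending on Choice.
letters = ("strong", "weak")            # values of (Letter1, Letter2)
print(multiplexer_cpd("weak", 2, letters))     # Choice = 2 selects Letter2
print(multiplexer_cpd("strong", 2, letters))
```

Only the selected Z_a matters for Y, which is the same kind of context-specific irrelevance that the tree-CPD captures.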

slide-27
SLIDE 27

Mixture of trees

Meila and Jordan, 2000

slide-28
SLIDE 28

Mixture model with shared structure

Meila and Jordan, 2000

slide-29
SLIDE 29

Can we use hidden variables?

slide-30
SLIDE 30

Mixed Networks

(Dechter 2013)

Augmenting probabilistic networks with constraints because:

  • Some information in the world is deterministic and undirected (X ≠ Y)
  • Some queries or evidence are complex (CNFs)

Queries are probabilistic queries

slide-31
SLIDE 31


Probabilistic Reasoning

Party example: the weather effect

Alex is likely to go in bad weather.
Chris rarely goes in bad weather.
Becky is indifferent but unpredictable.

Questions:

  • Given bad weather, which group of individuals is most likely to show up at the party?
  • What is the probability that Chris goes to the party but Becky does not?

P(W,A,C,B) = P(B|W) · P(C|W) · P(A|W) · P(W)
P(A,C,B | W=bad) = 0.9 · 0.1 · 0.5

P(A|W=bad) = .9
P(C|W=bad) = .1
P(B|W=bad) = .5

[Figure: network with W as the parent of A, C, and B, with CPTs P(W), P(A|W), P(C|W), P(B|W)]

W     A  P(A|W)
good  0  .01
good  1  .99
bad   0  .1
bad   1  .9
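Both questions can be answered by direct enumeration, because A, C, and B are independent given W. This is a minimal sketch using only the three bad-weather probabilities from the slide; the names `P_GO_BAD` and `p_group` are hypothetical.

```python
from itertools import product

P_GO_BAD = {"A": 0.9, "C": 0.1, "B": 0.5}   # P(person goes | W=bad)

def p_group(goes):
    """P(A, C, B | W=bad) for a dict such as {"A": 1, "C": 0, "B": 1}."""
    p = 1.0
    for person, g in goes.items():
        p *= P_GO_BAD[person] if g else 1.0 - P_GO_BAD[person]
    return p

# Question 2: Chris goes and Becky does not (Alex is summed out).
p_c_not_b = sum(p_group({"A": a, "C": 1, "B": 0}) for a in (0, 1))

# Question 1: most likely joint outcome over (A, C, B) given bad weather.
best = max(product((0, 1), repeat=3),
           key=lambda v: p_group(dict(zip("ACB", v))))

print(p_c_not_b, best)
```

Since P(B|W=bad) = .5, the outcomes "Alex only" and "Alex and Becky" are equally likely, and `max` simply reports the first of the two.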

slide-32
SLIDE 32

Party Example Again

[Figure: Bayes network over W, A, B, C with CPTs P(W), P(A|W), P(B|W), P(C|W), next to a constraint network over A, B, C with the constraints A→B and C→A]

Query: Is it likely that Chris goes to the party if Becky does not but the weather is bad?

P(C, ¬B | w = bad, A→B, C→A)

Semantics? Algorithms?
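One possible reading of the query is to condition the Bayesian-network distribution on the constraints being satisfied. The sketch below enumerates all worlds under that assumed semantics, with the bad-weather probabilities from the party example; the names `weight` and `consistent` are hypothetical.

```python
from itertools import product

P_GO_BAD = {"A": 0.9, "B": 0.5, "C": 0.1}   # P(goes | W=bad)

def weight(a, b, c):
    """P(A=a, B=b, C=c | W=bad) from the belief network alone."""
    p = 1.0
    for name, val in (("A", a), ("B", b), ("C", c)):
        p *= P_GO_BAD[name] if val else 1.0 - P_GO_BAD[name]
    return p

def consistent(a, b, c):
    """Constraint network: A -> B and C -> A."""
    return bool((not a or b) and (not c or a))

worlds = [w for w in product((0, 1), repeat=3) if consistent(*w)]
p_constraints = sum(weight(*w) for w in worlds)
p_joint = sum(weight(a, b, c) for a, b, c in worlds if c and not b)
answer = p_joint / p_constraints

print(p_constraints, answer)
```

Under this reading the answer is zero: C→A and A→B together force Becky to go whenever Chris goes, so no consistent world has C true and B false.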

slide-33
SLIDE 33

Outline

  • Special representations of CPTs
  • Bucket Elimination:
  • Finding induced-width
  • Bucket elimination over mixed networks
slide-34
SLIDE 34

More accurately: O(r · exp(w*(d))), where r is the number of CPTs. For Bayesian networks r = n. For Markov networks? The time and space bounds are O(n · exp(w* + 1)) and O(n · exp(w*)), respectively.

slide-35
SLIDE 35


Finding Small Induced-Width

(Dechter 3.4-3.5)

Finding the minimum induced width is NP-complete.
A tree has induced-width of ?
Greedy algorithms:

  • Min-width
  • Min-induced-width
  • Max-cardinality and chordal graphs
  • Fill-in (thought to be the best)
  • See anytime min-width (Gogate and Dechter)
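For a fixed ordering, the induced width is easy to compute: process nodes from last to first and connect each node's earlier neighbors. This is a minimal sketch assuming the graph is given as a dict mapping each node to a set of neighbors; the name `induced_width` is hypothetical.

```python
def induced_width(adj, order):
    """Induced width of an undirected graph along `order`.

    Nodes are processed from last to first; the earlier neighbors
    ("parents") of each processed node are connected to each other.
    """
    adj = {v: set(ns) for v, ns in adj.items()}     # work on a copy
    pos = {v: i for i, v in enumerate(order)}
    width = 0
    for v in reversed(order):
        parents = {u for u in adj[v] if pos[u] < pos[v]}
        width = max(width, len(parents))
        for u in parents:
            adj[u] |= parents - {u}                 # add induced edges
    return width

chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(induced_width(chain, ["a", "b", "c"]))
print(induced_width(chain, ["a", "c", "b"]))
```

The two calls show that the same tree can have different induced widths along different orderings, which is why the choice of ordering matters.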

slide-36
SLIDE 36

Type of graphs


slide-37
SLIDE 37

The induced width


slide-38
SLIDE 38


Different Induced-graphs

slide-39
SLIDE 39


Min-Width Ordering

Proposition (Freuder 1982): Algorithm min-width finds a min-width ordering of a graph. Complexity: O(|E|).
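The min-width algorithm is usually described as: repeatedly pick a node of smallest degree in the remaining graph, place it last among the unplaced positions, and remove it. The sketch below follows that description; it favors clarity and does not aim for the O(|E|) bound. The names `min_width_ordering` and `width` are hypothetical.

```python
def min_width_ordering(adj):
    """Greedy min-width ordering; adj maps node -> set of neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}     # work on a copy
    picked = []
    while adj:
        r = min(adj, key=lambda v: len(adj[v]))     # smallest remaining degree
        picked.append(r)
        for u in adj.pop(r):
            adj[u].discard(r)
    picked.reverse()                                # first pick goes last
    return picked

def width(adj, order):
    """Largest number of earlier neighbors of any node along `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max(sum(pos[u] < pos[v] for u in adj[v]) for v in order)

tree = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
d = min_width_ordering(tree)
print(d, width(tree, d))
```

Note that width ignores induced edges, so a min-width ordering does not in general minimize the induced width.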
slide-40
SLIDE 40

Greedy Orderings Heuristics


Theorem: A graph is a tree iff it has both width and induced-width of 1.

Complexity? O(n^3)
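As one example of a greedy ordering heuristic, here is a sketch of min-fill: at each step eliminate the node whose elimination adds the fewest fill-in edges. This is a straightforward, unoptimized version written for illustration; the function name `min_fill_ordering` and the dict-of-sets graph format are assumptions.

```python
from itertools import combinations

def min_fill_ordering(adj):
    """Return (ordering d, induced width along d) using greedy min-fill."""
    adj = {v: set(ns) for v, ns in adj.items()}     # work on a copy

    def fill(v):
        # Number of edges that eliminating v would add among its neighbors.
        return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

    eliminated, w = [], 0
    while adj:
        v = min(adj, key=fill)
        nbrs = adj.pop(v)
        w = max(w, len(nbrs))
        for a, b in combinations(nbrs, 2):          # connect the neighbors
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
        eliminated.append(v)
    return eliminated[::-1], w                      # eliminated first = last in d

tree = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
print(min_fill_ordering(tree))
```

On a tree there is always a leaf with zero fill, so the heuristic reaches induced width 1, in line with the theorem above.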

slide-41
SLIDE 41


Different Induced-Graphs

slide-42
SLIDE 42


Induced-width for chordal graphs

Definition: A graph is chordal if every cycle of length at least 4 has a chord.

Finding w* over a chordal graph is easy using the max-cardinality ordering: order vertices from 1 to n, always assigning the next number to the node connected to the largest set of previously numbered nodes. Let d be such an ordering.

A graph along a max-cardinality order has no fill-in edges iff it is chordal.

On chordal graphs width = induced-width.
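The max-cardinality rule and the no-fill-in test for chordality can be sketched as follows. The graph format (dict of neighbor sets), the tie-breaking by iteration order, and the names `max_cardinality_ordering` and `is_chordal` are assumptions made for illustration.

```python
def max_cardinality_ordering(adj):
    """Number nodes one by one, always taking an unnumbered node with the
    most already-numbered neighbors (ties broken by iteration order)."""
    order, numbered = [], set()
    while len(order) < len(adj):
        v = max((u for u in adj if u not in numbered),
                key=lambda u: len(adj[u] & numbered))
        order.append(v)
        numbered.add(v)
    return order

def is_chordal(adj):
    """Chordal iff no fill-in edge is needed along a max-cardinality order."""
    order = max_cardinality_ordering(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in adj[a]:
                    return False        # a and b would need a fill-in edge
    return True

cycle4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
with_chord = {"a": {"b", "c", "d"}, "b": {"a", "c"},
              "c": {"a", "b", "d"}, "d": {"a", "c"}}
print(is_chordal(cycle4), is_chordal(with_chord))
```

The plain 4-cycle fails the test, while adding the chord a-c makes it pass.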

slide-43
SLIDE 43


Max-cardinality ordering

What is the complexity of min-fill? Min-induced-width?

slide-44
SLIDE 44

K-trees


slide-45
SLIDE 45


Which greedy algorithm is best?

slide-46
SLIDE 46

Recent work in my group

  • Vibhav Gogate and Rina Dechter. "A Complete Anytime Algorithm for Treewidth". In UAI 2004.
  • Andrew E. Gelfand, Kalev Kask, and Rina Dechter. "Stopping Rules for Randomized Greedy Triangulation Schemes". In Proceedings of AAAI 2011.
  • Kalev Kask, Andrew E. Gelfand, Lars Otten, and Rina Dechter. "Pushing the Power of Stochastic Greedy Ordering Schemes for Inference in Graphical Models". In Proceedings of AAAI 2011.
  • Kask, Gelfand, and Dechter. "BEEM: Bucket Elimination with External Memory". AAAI 2011 or UAI 2011.

Potential project

slide-47
SLIDE 47

Mixed Networks

Augmenting probabilistic networks with constraints because:

  • Some information in the world is deterministic and undirected (X ≠ Y).
  • Some queries or evidence are complex (CNF formulas).

Queries are probabilistic queries

slide-48
SLIDE 48


Mixed Beliefs and Constraints

  • If the constraint is a CNF formula
  • Queries over hybrid network
  • Complex evidence structure

All reduce to CNF queries over a belief network:

CPE (CNF probability evaluation): Given a belief network and a CNF formula, find its probability.
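As a baseline for what CPE asks, the probability of a CNF formula can be computed by summing the joint over all satisfying assignments. The sketch below uses a hypothetical two-variable network A → B with made-up numbers; the clause encoding and the names `joint`, `satisfies`, and `cpe` are assumptions for illustration. It is exponential in the number of variables, which is what bucket elimination avoids.

```python
from itertools import product

# Hypothetical belief network A -> B (illustrative numbers only).
P_A1 = 0.3
P_B1_GIVEN_A = {0: 0.2, 1: 0.9}

def joint(a, b):
    pa = P_A1 if a else 1.0 - P_A1
    pb = P_B1_GIVEN_A[a] if b else 1.0 - P_B1_GIVEN_A[a]
    return pa * pb

def satisfies(assignment, cnf):
    """cnf: list of clauses; a clause is a list of (variable, wanted_value)."""
    return all(any(assignment[v] == val for v, val in clause)
               for clause in cnf)

def cpe(cnf):
    """Probability that the CNF formula holds, by brute-force enumeration."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        if satisfies({"A": a, "B": b}, cnf):
            total += joint(a, b)
    return total

phi = [[("A", 0), ("B", 1)]]        # the single clause (not A) or B
print(cpe(phi))
```

Evidence such as B = 1 is the special case of a unit clause, for example `[[("B", 1)]]`.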

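[Equations, only partly recoverable: an example CNF formula over the variables G, D, and B whose probability is queried, followed by two conditional queries of the form P(x | ·) = ? and P(x1 | ·) = ?]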

slide-49
SLIDE 49

Party example again

[Figure: probabilistic network (PN) over W, A, B, C with CPTs P(W), P(A|W), P(B|W), P(C|W), next to a constraint network (CN) over A, B, C with the constraints A→B and C→A]

Query: Is it likely that Chris goes to the party if Becky does not but the weather is bad?

P(C, ¬B | w = bad, A→B, C→A)

Semantics? Algorithms?

slide-50
SLIDE 50

Bucket Elimination for Mixed networks


The CPE query P((C  B) and P(A  C))

slide-51
SLIDE 51


slide-52
SLIDE 52


slide-53
SLIDE 53

Processing Mixed Buckets


slide-54
SLIDE 54

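A Hybrid Belief Network

[Figure: belief network over A, B, C, D, F, G, annotated with a conditional probability P(c|a) and a formula over F, D, G; the annotation symbols are not recoverable]

Belief network: P(g,f,d,c,b,a) = P(g|f,d) P(f|c,b) P(d|b,a) P(b|a) P(c|a) P(a)

Bucket G: P(G|F,D)               → λ_G(F,D)
Bucket F: P(F|B,C), λ_G(F,D)     → λ_F(B,C,D)
Bucket D: P(D|A,B), λ_F(B,C,D)   → λ_D(A,B,C)
Bucket C: P(C|A), λ_D(A,B,C)     → λ_C(A,B)
Bucket B: P(B|A), λ_C(A,B)       → λ_B(A)
Bucket A: P(A), λ_B(A)           → posterior over A given the evidence on G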
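The bucket schedule can be sketched in code for the network P(g|f,d) P(f|c,b) P(d|b,a) P(b|a) P(c|a) P(a) along the ordering A, B, C, D, F, G. This is a minimal illustration, not the course implementation: the CPT numbers are random, all variables are assumed binary, the evidence G = 1 is an assumed example, and the names `make_cpt`, `sum_out`, and `bucket_elimination` are hypothetical.

```python
from itertools import product
import random

def make_cpt(child, parents, rng):
    """Random binary CPT as a factor: (scope, {assignment tuple: value})."""
    scope = (child,) + tuple(parents)
    table = {}
    for pa in product((0, 1), repeat=len(parents)):
        p1 = rng.random()
        table[(1,) + pa] = p1
        table[(0,) + pa] = 1.0 - p1
    return scope, table

def sum_out(factors, var):
    """Multiply all factors of one bucket and sum out `var`."""
    scope = tuple(sorted({v for s, _ in factors for v in s} - {var}))
    out = {}
    for vals in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, vals))
        total = 0.0
        for x in (0, 1):
            env[var] = x
            term = 1.0
            for s, t in factors:
                term *= t[tuple(env[v] for v in s)]
            total += term
        out[vals] = total
    return scope, out

def bucket_elimination(factors, order):
    """Process buckets from the last variable of `order` to the first."""
    pos = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    for f in factors:                       # place each factor in the bucket
        buckets[max(f[0], key=pos.get)].append(f)   # of its latest variable
    result = 1.0
    for var in reversed(order):
        if not buckets[var]:
            continue
        scope, table = sum_out(buckets[var], var)
        if scope:                           # message goes to a lower bucket
            buckets[max(scope, key=pos.get)].append((scope, table))
        else:
            result *= table[()]
    return result

rng = random.Random(0)
cpts = [make_cpt("A", [], rng), make_cpt("B", ["A"], rng),
        make_cpt("C", ["A"], rng), make_cpt("D", ["B", "A"], rng),
        make_cpt("F", ["C", "B"], rng), make_cpt("G", ["F", "D"], rng)]
evidence_g1 = (("G",), {(1,): 1.0, (0,): 0.0})      # assumed evidence G = 1
order = ["A", "B", "C", "D", "F", "G"]
p_g1 = bucket_elimination(cpts + [evidence_g1], order)
print(p_g1)
```

The largest message here is over three variables, which matches the induced width of this ordering.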

slide-55
SLIDE 55

Variable elimination for a mixed network:

(a) regular Elim-CPE
(b) Elim-CPE-D with clause extraction

[Figure: two bucket schedules over the buckets listed below, showing the functions and clauses passed between buckets; the individual annotations are not recoverable]

Bucket G: P(G|F,D)
Bucket F: P(F|B,C)
Bucket D: P(D|A,B)
Bucket C: P(C|A)
Bucket B: P(B|A)
Bucket A: P(A)

slide-56
SLIDE 56

Trace of Elim-CPE

[Figure: belief network over A, B, C, D, F, G]

Belief network: P(g,f,d,c,b,a) = P(g|f,d) P(f|c,b) P(d|b,a) P(b|a) P(c|a) P(a)

Bucket G: P(G|F,D)
Bucket D: P(D|A,B)
Bucket B: P(B|A), P(F|B,C)
Bucket C: P(C|A)
Bucket F:
Bucket A:

[The clauses and messages recorded in each bucket on the original slide are not recoverable]

slide-57
SLIDE 57

Bucket-elimination example for a mixed network


slide-58
SLIDE 58

Markov Networks

Dechter, chapter 2

slide-59
SLIDE 59

Complexity


slide-60
SLIDE 60

The running intersection property