Slides Set 5: Probabilistic Networks (Rina Dechter)


SLIDE 1

Algorithms for Reasoning with Graphical Models

Slides Set 5: Probabilistic Networks

Rina Dechter

Reading: Darwiche chapters 3 and 4; Pearl chapter 3

SLIDE 2

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs

SLIDE 3

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- Capturing CIs by graphs
- D-separation: inferring CIs in graphs

SLIDE 4

Examples: Common Sense Reasoning

Zebra pajamas (7:30 pm): I told Susannah, "You have nice pajamas," but it was actually a dress. Why jump to that conclusion? 1. Because it is night time. 2. Certain designs look like pajamas.

Cars leaving a parking lot: You enter a parking lot that is quite full (UCI) and see a car coming out. You think: ah, now there is a space (just vacated); OR there is no space, and this driver looked around and is leaving for another lot. What other clues could we use?

Robot gets off at the wrong level: A robot rides the elevator down, and it stops at the 2nd floor instead of the ground floor. The robot steps out; it should immediately recognize that it is not on the right level and go back inside.

Turing quote: "If machines will not be allowed to be fallible, they cannot be intelligent." (Mathematicians are wrong from time to time, so a machine should also be allowed to be.)

SLIDE 5

Why/What/How Uncertainty?

- Why uncertainty? Answer: it is abundant.
- What formalism to use? Answer: probability theory.
- How to overcome the exponential representation? Answer: graphs, graphs, graphs... to capture irrelevance and independence.

SLIDE 6

Why Uncertainty?

AI goal: a declarative, model-based framework that allows a computer system to reason.

People reason with partial information.

Sources of uncertainty:

- Limitations in observing the world: e.g., a physician sees symptoms, not exactly what is going on in the body, when making a diagnosis; observations are noisy (test results are inaccurate).
- Limitations in modeling the world: maybe the world is not deterministic.


SLIDE 14

α (alpha) and β (beta) denote events.
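For reference, the basic identities on events α and β that the rest of the deck relies on:

\[
P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)}, \qquad
P(\alpha) = P(\alpha \mid \beta)\,P(\beta) + P(\alpha \mid \neg\beta)\,P(\neg\beta), \qquad
P(\alpha \mid \beta) = \frac{P(\beta \mid \alpha)\,P(\alpha)}{P(\beta)}.
\]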


SLIDE 16

Burglary is independent of Earthquake

SLIDE 17

Earthquake is independent of Burglary


SLIDE 27

Example

P(B, E, A, J, M) = ?
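Assuming the familiar alarm network (Burglary and Earthquake are parents of Alarm; JohnCalls and MaryCalls each depend only on Alarm), the Markov factorization answers this directly:

\[
P(B,E,A,J,M) = P(B)\,P(E)\,P(A \mid B,E)\,P(J \mid A)\,P(M \mid A).
\]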


SLIDE 32

Bayesian Networks: Representation

Variables: Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D).

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

A Bayesian network is a pair BN = (G, Θ): the conditional independencies of the DAG G yield an efficient representation via the local CPDs Θ.

CPD for P(D|C,B):

C B | D=0 D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
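As a sketch of how this factorization is used, the following Python snippet evaluates the joint as a product of local CPT lookups. Only the P(D|C,B) table comes from the slide; every other number is an invented placeholder.

```python
p_S1 = 0.2                          # P(S=1)             (assumed)
p_C1_given_S = {0: 0.01, 1: 0.10}   # P(C=1 | S)         (assumed)
p_B1_given_S = {0: 0.10, 1: 0.30}   # P(B=1 | S)         (assumed)
p_X1_given_CS = {(0, 0): 0.1, (0, 1): 0.2,
                 (1, 0): 0.8, (1, 1): 0.9}   # P(X=1 | C,S)  (assumed)
p_D1_given_CB = {(0, 0): 0.9, (0, 1): 0.3,
                 (1, 0): 0.2, (1, 1): 0.1}   # P(D=1 | C,B)  from the slide

def bernoulli(p_one, value):
    """P(V = value) for a binary variable V with P(V=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(s, c, b, x, d):
    """P(S=s, C=c, B=b, X=x, D=d), multiplying the local CPTs."""
    return (bernoulli(p_S1, s)
            * bernoulli(p_C1_given_S[s], c)
            * bernoulli(p_B1_given_S[s], b)
            * bernoulli(p_X1_given_CS[(c, s)], x)
            * bernoulli(p_D1_given_CB[(c, b)], d))

# Example query: P(S=1, C=0, B=1, X=0, D=1)
print(joint(1, 0, 1, 0, 1))
```

The point of the factorization is that five small tables replace one table of 2^5 joint entries.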

SLIDE 33

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs (Darwiche chapter 4)

SLIDE 34

The causal interpretation


SLIDE 48

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs

SLIDE 49

R and C are independent given A

This independence follows from the Markov assumption
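Markov(G) states that every variable is independent of its non-descendants given its parents. Assuming the network drawn here has A as C's only parent, with R among C's non-descendants (as in Darwiche's earthquake-alarm example), I(C, A, R) is a direct instance of that assumption.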

SLIDE 51

Properties of Probabilistic Independence

Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)

Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)

Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)

Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)

Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)
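As a sanity check of one axiom, decomposition follows in one line from the definition I(X,Z,Y) iff P(x | z, y) = P(x | z):

\[
P(x \mid z, y) = \sum_{w} P(x \mid z, y, w)\, P(w \mid z, y)
= \sum_{w} P(x \mid z)\, P(w \mid z, y) = P(x \mid z),
\]

where the middle equality uses I(X,Z,YW); hence I(X,Z,Y), and I(X,Z,W) follows symmetrically.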

SLIDE 53

In Pearl's words: if two combined pieces of information are irrelevant to X, then each one separately is irrelevant to X.

SLIDE 54

Example: Two coins and a bell
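In this classic example, two fair coins are tossed independently and a bell rings exactly when they land the same way (the standard textbook setup, assumed here). A short enumeration confirms the punch line: each coin is pairwise independent of the bell, yet the coins become dependent once the bell is observed.

```python
from itertools import product

# Worlds are (coin1, coin2, bell); the bell rings iff the coins agree.
worlds = [(c1, c2, int(c1 == c2)) for c1, c2 in product((0, 1), repeat=2)]
P = {w: 0.25 for w in worlds}   # uniform over the four coin outcomes

def prob(pred):
    """Total probability of the worlds satisfying pred."""
    return sum(p for w, p in P.items() if pred(w))

# Pairwise: coin1 and the bell are independent ...
p_c1 = prob(lambda w: w[0] == 1)
p_bell = prob(lambda w: w[2] == 1)
p_both = prob(lambda w: w[0] == 1 and w[2] == 1)
print(p_both == p_c1 * p_bell)            # True

# ... but given the bell, the coins are fully dependent.
p_c1_given_bell = p_both / p_bell
p_c1_given_bell_c2 = (prob(lambda w: w[0] == 1 and w[1] == 1 and w[2] == 1)
                      / prob(lambda w: w[1] == 1 and w[2] == 1))
print(p_c1_given_bell, p_c1_given_bell_c2)  # 0.5 vs 1.0
```

This is the standard counterexample to composition: coin1 is independent of coin2 alone and of the bell alone, but not of the two jointly.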


SLIDE 63

When there are no constraints


SLIDE 67

Properties of Probabilistic Independence

Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)

Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)

Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)

Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)

Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)

Graphoid axioms: symmetry, decomposition, weak union, and contraction; a positive graphoid adds intersection. (In Pearl's terminology, the five axioms together are called graphoids and the first four semi-graphoids.)

SLIDE 68

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs
  - I-maps, D-maps, perfect maps
  - Markov boundary and blanket
  - Markov networks

SLIDE 70

d-Separation

To test whether X and Y are d-separated by Z in a DAG G, consider every path between a node in X and a node in Y, and check that each path is blocked by Z.

A path is blocked by Z if at least one valve (node) on the path is 'closed' given Z:

- A divergent valve or a sequential valve is closed if it is in Z.
- A convergent valve is closed if it is not in Z and none of its descendants are in Z.
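For example, in the alarm network used earlier, the path B → A → J has a sequential valve at A: it is open with no evidence and closed once A is observed. The path B → A ← E has a convergent valve at A: it is closed with no evidence but opens once A, or any descendant of A such as J, is observed.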


SLIDE 74

No path is active = every path is blocked

SLIDE 75

Bayesian Networks as I-maps

- E: Employment
- V: Investment
- H: Health
- W: Wealth
- C: Charitable contributions
- P: Happiness

Are C and V d-separated given E and P? Are C and H d-separated?

SLIDE 76

d-Separation Using the Ancestral Graph

X is d-separated from Y given Z (⟨X,Z,Y⟩ under d-separation) iff:

1. Take the ancestral graph: the subgraph over X, Y, Z and all their ancestors.
2. Moralize the obtained subgraph (marry co-parents, drop edge directions).
3. Apply regular undirected graph separation.

Check: (E,{},V), (E,P,H), (C,EW,P), (C,E,HP)?
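A minimal Python sketch of this recipe, assuming the DAG is encoded as a dict mapping each node to its set of parents (the encoding and function names are mine, not from the slides):

```python
from collections import deque

def ancestors(dag, nodes):
    """Return the given nodes plus all their ancestors in the DAG."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(dag.get(n, ()))
    return result

def d_separated(dag, X, Y, Z):
    """d-separation via the ancestral-graph recipe: restrict to the
    ancestors of X∪Y∪Z, moralize, delete Z, then check plain
    undirected separation."""
    X, Y, Z = set(X), set(Y), set(Z)
    keep = ancestors(dag, X | Y | Z)
    adj = {n: set() for n in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:                    # undirected parent-child edges
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(parents):      # marry co-parents (moralize)
            for q in parents[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    frontier = deque(X - Z)                  # BFS that never enters Z
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in Y:
            return False                     # found an unblocked path
        for m in adj[n]:
            if m not in seen and m not in Z:
                seen.add(m)
                frontier.append(m)
    return True
```

On the alarm network dag = {'A': {'B', 'E'}, 'J': {'A'}, 'M': {'A'}}, d_separated(dag, {'B'}, {'E'}, set()) returns True, and d_separated(dag, {'B'}, {'E'}, {'A'}) returns False, matching the convergent-valve rule from slide 70.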

SLIDE 77

I_dsep(R, EC, B)?


SLIDE 81

I_dsep(C, S, B) = ?


SLIDE 86

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs
  - Soundness and completeness of d-separation
  - I-maps, D-maps, perfect maps
  - Constructing a minimal I-map of a distribution
  - Markov boundary and blanket

SLIDE 88

It is not a d-map
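For reference, the standard map definitions behind this remark: a DAG G is an I-map of P if every d-separation in G corresponds to an independency of P, a D-map if every independency of P corresponds to a d-separation in G, and a perfect map if both. The point here is presumably that the displayed DAG is an I-map of the distribution without being a D-map.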

SLIDE 90

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs
  - Soundness and completeness of d-separation
  - I-maps, D-maps, perfect maps
  - Constructing a minimal I-map of a distribution
  - Markov boundary and blanket

SLIDE 93

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs
  - Soundness and completeness of d-separation
  - I-maps, D-maps, perfect maps
  - Constructing a minimal I-map of a distribution
  - Markov boundary and blanket


SLIDE 100

Perfect Maps for DAGs

Theorem 10 [Geiger and Pearl 1988]: For any DAG D there exists a P such that D is a perfect map of P relative to d-separation.

Corollary 7: d-separation identifies any implied independency that follows logically from the set of independencies characterized by its DAG.

SLIDE 101

Outline

- Basics of probability theory
- DAGs, Markov(G), Bayesian networks
- Graphoids: axioms for inferring conditional independence (CI)
- D-separation: inferring CIs in graphs
  - Soundness and completeness of d-separation
  - I-maps, D-maps, perfect maps
  - Constructing a minimal I-map of a distribution
  - Markov boundary and blanket

SLIDE 103

Blanket Examples
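Recall what these examples exercise: a Markov blanket of X is any set S such that X is independent of all remaining variables given S; a minimal blanket is a Markov boundary. In a Bayesian network, the parents, children, and spouses (co-parents of X's children) of X always form a Markov blanket of X.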

SLIDE 104

Blanket Examples

SLIDE 105

Bayesian Networks as Knowledge-Bases

- Given any distribution P and an ordering of its variables, we can construct a minimal I-map (see the sketch below).
- The conditional probability of each variable X given its parents is all we need.
- In practice we go in the opposite direction: the parents must be identified by a human expert; they can be viewed as direct causes, or direct influences.
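A sketch of that ordered construction in Python, where is_independent(x, S, T) stands in for a conditional-independence oracle on P (the names and the greedy shrink are mine; for strictly positive distributions the intersection axiom makes the resulting parent set the unique minimal one):

```python
def minimal_i_map(order, is_independent):
    """Build a minimal I-map of a distribution P.

    order: list of variables (the chosen ordering).
    is_independent(x, S, T): CI oracle answering whether x is
    independent of the set T given the set S in P.
    Returns a dict mapping each variable to its parent set.
    """
    parents = {}
    for i, x in enumerate(order):
        preds = set(order[:i])
        pa = set(preds)
        # Shrink the candidate parents while x stays independent of
        # the remaining predecessors given the candidates.
        for p in sorted(preds):
            trial = pa - {p}
            if is_independent(x, trial, preds - trial):
                pa = trial
        parents[x] = pa
    return parents
```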

SLIDE 108

Markov Networks and Markov Random Fields (MRF)

Can we also capture conditional independence by undirected graphs? Yes: using simple graph separation


SLIDE 109

Graphoids

Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)

Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)

Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)

Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)

Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)

SLIDE 110

Undirected Graphs as I-maps of Distributions

SLIDE 111

Axiomatic Characterization of Graphs

Graph separation satisfies:

- Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
- Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
- Intersection: I(X,ZW,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
- Strong union: I(X,Z,Y) ⇒ I(X,ZW,Y)
- Transitivity: I(X,Z,Y) ⇒ for every node t, I(X,Z,t) or I(t,Z,Y)

SLIDE 112

Graphoids vs. Undirected Graphs

Probabilistic independence (graphoids):

- Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
- Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
- Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)
- Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
- Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)

Graph separation:

- Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
- Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
- Intersection: I(X,ZW,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
- Strong union: I(X,Z,Y) ⇒ I(X,ZW,Y)
- Transitivity: I(X,Z,Y) ⇒ for every node t, I(X,Z,t) or I(t,Z,Y)

SLIDE 113

Markov Networks

An undirected graph G that is a minimal I-map of a probability distribution P, meaning that deleting any edge destroys its I-mapness relative to (undirected) separation, is called a Markov network of P.

SLIDE 115

The unusual edge (3,4) reflects the reasoning that if we fix the arrival time (5), the travel time (4) must depend on the current time (3).

SLIDE 116

How can we construct a probability distribution that has all these independencies?

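The standard answer, presumably developed on the following slides, is the Gibbs form: take a normalized product of positive potentials, one per clique of G:

\[
P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{C \in \mathrm{cliques}(G)} \psi_C(x_C),
\qquad
Z = \sum_{x} \prod_{C} \psi_C(x_C).
\]

Every strictly positive distribution that factorizes this way satisfies all the separation-based independencies of G, and conversely (Hammersley-Clifford) every strictly positive distribution satisfying them factorizes this way.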

SLIDE 117

So, how do we learn Markov networks from data?

Markov Random Fields (MRF)

SLIDE 118

Examples of Bayesian and Markov Networks

SLIDE 119

Markov Networks

SLIDE 120

Sample Applications for Graphical Models
