Bayesian Networks: Inference with Probabilistic Graphical Models


SLIDE 1

Bayesian Networks
Inference with Probabilistic Graphical Models

Byoung-Tak Zhang
Biointelligence Lab, Seoul National University
4190.408 Artificial Intelligence (2016-Spring)

SLIDE 2

Machine Learning?

  • Learning System:

– A system that autonomously improves its performance (P) by automatically forming a model (M) from experiential data (D) obtained through interaction with the environment (E)
  • Self-improving Systems (Perspective of AI)
  • Knowledge Discovery (Perspective of Data Mining)
  • Data-Driven Software Design (Perspective of Software Engineering)
  • Automatic Programming (Perspective of Computer Engineering)


SLIDE 3

Machine Learning as Automatic Programming


[Figure: Traditional programming: Data + Program → Computer → Output. Machine learning: Data + Output → Computer → Program.]

SLIDE 4

Machine Learning (ML): Three Tasks


  • Supervised Learning

– Estimate an unknown mapping from known input and target output pairs
– Learn $f_w$ from a training set $D = \{(\mathbf{x}, y)\}$ s.t. $y = f(\mathbf{x}) \approx f_w(\mathbf{x})$
– Classification: y is discrete
– Regression: y is continuous

  • Unsupervised Learning

– Only input values are provided
– Learn $f_w(\mathbf{x})$ from $D = \{(\mathbf{x})\}$
– Density estimation and compression
– Clustering, dimension reduction

  • Sequential (Reinforcement) Learning

– No targets; rewards (critiques) are provided “sequentially”
– Learn a heuristic function $f_w(s_t, a_t, r_t)$ from $D_t = \{(s_t, a_t, r_t) \mid t = 1, 2, \ldots\}$
– With respect to the future, not just the past
– Sequential decision-making
– Action selection and policy learning

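To make the supervised setting concrete, here is a minimal sketch (not from the slides; data and parameter values are invented) that learns $f_w$ from a training set $D = \{(x, y)\}$ by least squares:

```python
import numpy as np

# Supervised learning sketch: fit f_w(x) = w0 + w1*x to D = {(x, y)} by least
# squares, so that y ~= f_w(x). The synthetic "unknown mapping" is y = 2x + 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=50)

X = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)    # learned parameters w = (w0, w1)
print("learned w:", w)                       # close to [0.5, 2.0]
```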

SLIDE 5

Machine Learning Models

  • Supervised Learning
– Neural Nets
– Decision Trees
– K-Nearest Neighbors
– Support Vector Machines

  • Unsupervised Learning
– Self-Organizing Maps
– Clustering Algorithms
– Manifold Learning
– Evolutionary Learning

  • Probabilistic Graphical Models
– Bayesian Networks
– Markov Networks
– Hidden Markov Models
– Hypernetworks

  • Dynamic Systems
– Kalman Filters
– Sequential Monte Carlo
– Particle Filters
– Reinforcement Learning


SLIDE 6

Outline

  • Bayesian Inference
– Monte Carlo
– Importance Sampling
– MCMC

  • Probabilistic Graphical Models
– Bayesian Networks
– Markov Random Fields

  • Hypernetworks
– Architecture and Algorithms
– Application Examples

  • Discussion


SLIDE 7

Bayes Theorem

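As a minimal numerical illustration of Bayes' theorem, $P(h \mid D) = P(D \mid h)\,P(h)/P(D)$, with invented numbers for a diagnostic-test scenario:

```python
# Hypothetical numbers: a test with 99% sensitivity, 95% specificity,
# and a 1% base rate of the condition h.
p_h = 0.01                # prior P(h)
p_d_given_h = 0.99        # likelihood P(D | h): positive test given h
p_d_given_not_h = 0.05    # false-positive rate P(D | not h)

# Bayes' theorem: P(h | D) = P(D | h) P(h) / P(D)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)   # marginal P(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(h | D) = {p_h_given_d:.3f}")                   # ~0.167
```

Even with a highly accurate test, the low prior keeps the posterior small, which is exactly the role the prior plays in the theorem.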

SLIDE 8

MAP vs. ML

  • What is the most probable hypothesis given the data?
  • From Bayes' theorem: $P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$
  • MAP (Maximum A Posteriori): $h_{MAP} = \arg\max_{h} P(D \mid h)\,P(h)$
  • ML (Maximum Likelihood): $h_{ML} = \arg\max_{h} P(D \mid h)$

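A small sketch (hypotheses, prior, and data are invented) showing how the MAP and ML hypotheses can differ when the prior is informative:

```python
import numpy as np

# Three coin-bias hypotheses and a prior favoring the fair coin.
hypotheses = np.array([0.3, 0.5, 0.7])    # P(heads) under each h
prior      = np.array([0.2, 0.6, 0.2])    # P(h)

heads, tails = 7, 3                       # observed data D
likelihood = hypotheses**heads * (1 - hypotheses)**tails    # P(D | h)

h_ml  = hypotheses[np.argmax(likelihood)]            # ML:  argmax_h P(D | h)
h_map = hypotheses[np.argmax(likelihood * prior)]    # MAP: argmax_h P(D | h) P(h)
print("ML:", h_ml, " MAP:", h_map)                   # ML: 0.7  MAP: 0.5
```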

SLIDE 9

Bayesian Inference


SLIDE 10


  • Prof. Schrater's lecture notes (Univ. of Minnesota)

SLIDE 11


SLIDE 12

Monte Carlo (MC) Approximation

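A minimal sketch of the idea: approximate $E_p[g(X)] \approx \frac{1}{N}\sum_{i=1}^{N} g(x_i)$ with $x_i \sim p$; the target distribution and $g$ here are chosen arbitrarily for illustration:

```python
import numpy as np

# Monte Carlo approximation of E[g(X)] with p = N(0, 1) and g(x) = x^2;
# the exact value is Var(X) = 1.
rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)    # x_i ~ p
estimate = np.mean(samples**2)        # (1/N) sum_i g(x_i)
print(f"MC estimate of E[X^2]: {estimate:.4f}  (exact: 1.0)")
```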

SLIDE 13

Markov chain Monte Carlo

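A minimal random-walk Metropolis sketch (one common MCMC algorithm); the target is known only up to a normalizing constant, and the step size and iteration counts are arbitrary:

```python
import numpy as np

def p_tilde(x):
    # Unnormalized target density; here proportional to a standard normal.
    return np.exp(-0.5 * x * x)

rng = np.random.default_rng(0)
x, samples = 0.0, []
for _ in range(50_000):
    x_new = x + rng.normal(scale=1.0)    # symmetric random-walk proposal
    # Accept with probability min(1, p~(x_new) / p~(x)).
    if rng.random() < min(1.0, p_tilde(x_new) / p_tilde(x)):
        x = x_new
    samples.append(x)

chain = np.array(samples[5_000:])        # discard burn-in
print("mean ~ 0:", chain.mean(), " var ~ 1:", chain.var())
```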

SLIDE 14

MC with Importance Sampling

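A sketch of the importance-sampling estimator $E_p[g(X)] = E_q[g(X)\,p(X)/q(X)]$: sample from an easy proposal q and reweight; the choice of p, q, and g is illustrative:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# p = N(0, 1) (target), q = N(0, 2^2) (proposal), g(x) = x^2.
rng = np.random.default_rng(0)
xs = rng.normal(scale=2.0, size=100_000)                   # draws from q
w = normal_pdf(xs, 0.0, 1.0) / normal_pdf(xs, 0.0, 2.0)    # weights p(x)/q(x)
print(f"IS estimate of E_p[X^2]: {np.mean(w * xs**2):.4f}  (exact: 1.0)")
```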

SLIDE 15

Graphical Models


[Figure: taxonomy of graphical models (GM), spanning directed GMs (causal models, Bayesian networks, DBNs, dependency networks, FSTs, HMMs, factorial HMMs, mixed-memory Markov models, BMMs, Kalman filters, segment models, mixture models, decision trees, and simple models such as PCA and LDA), undirected GMs (Markov random fields / Markov networks, Gibbs/Boltzmann distributions), and chain graphs and other semantics in between.]

SLIDE 16

BAYESIAN NETWORKS


SLIDE 17

Bayesian Networks

  • Bayesian network

– A DAG (Directed Acyclic Graph)
– Expresses dependence relations between variables
– Can use prior knowledge on the data (parameters)

[Figure: DAG with nodes A, B, C, D, E]

$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i))$

$P(A,B,C,D,E) = P(A)\,P(B \mid A)\,P(C \mid B)\,P(D \mid A,B)\,P(E \mid B,C,D)$
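A sketch of evaluating this factorization in code; the DAG structure is the slide's, but the CPT entries are invented for illustration:

```python
from itertools import product

# Hypothetical CPTs for the 5-node DAG; each maps parent values to P(node = True).
P_A = 0.3
P_B = {True: 0.8, False: 0.1}                        # P(B = T | A)
P_C = {True: 0.7, False: 0.2}                        # P(C = T | B)
P_D = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.05}     # P(D = T | A, B)
P_E = {(b, c, d): 0.6 if b else 0.3                  # P(E = T | B, C, D)
       for b in (True, False) for c in (True, False) for d in (True, False)}

def bern(p, value):
    # P(X = value) for a binary variable with P(X = True) = p.
    return p if value else 1.0 - p

def joint(a, b, c, d, e):
    # P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)
    return (bern(P_A, a) * bern(P_B[a], b) * bern(P_C[b], c)
            * bern(P_D[(a, b)], d) * bern(P_E[(b, c, d)], e))

# The factorized joint sums to 1 over all 2^5 assignments.
assert abs(sum(joint(*v) for v in product([True, False], repeat=5)) - 1.0) < 1e-12
print(joint(True, True, False, True, False))
```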

SLIDE 18

Representing Probability Distributions

  • Probability distribution = a probability for each combination of values of these attributes

  • Naïve representations (such as tables) run into trouble:
– 20 attributes require more than $2^{20} \approx 10^6$ parameters
– Real applications usually involve hundreds of attributes


Hospital patients are described by:

  • Background: age, gender, history of diseases, …
  • Symptoms: fever, blood pressure, headache, …
  • Diseases: pneumonia, heart attack, …
SLIDE 19

Bayesian Networks - Key Idea

  • Utilize conditional independence
  • Graphical representation of conditional independence (equivalently, of “causal” dependencies)


Exploit regularities!

SLIDE 20

Bayesian Networks

  • 1. Finite, directed acyclic graph
  • 2. Nodes: (discrete) random variables
  • 3. Edges: direct influences
  • 4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node


[Figure: example DAG with nodes B, E, A, J, M]

SLIDE 21

Bayesian Networks


[Figure: nodes X1, X2, X3, with X3 conditioned on X1 and X2; P(X1) = (0.2, 0.8), P(X2) = (0.6, 0.4)]

X1    | X2 | P(X3)
true  | 1  | (0.2, 0.8)
true  | 2  | (0.5, 0.5)
false | 1  | (0.23, 0.77)
false | 2  | (0.53, 0.47)

SLIDE 22

Example: Use a DAG to model causality


[Figure: DAG with nodes Train Strike, Martin Oversleep, Norman Oversleep, Boss Failure-in-Love, Martin Late, Norman Late, Norman Untidy, Office Dirty, Project Delay, Boss Angry]

SLIDE 23

Example: Attach prior probabilities to all root nodes


[Figure: the same DAG, with prior probability tables attached to the root nodes]

Martin Oversleep:     P(T) = 0.01, P(F) = 0.99
Train Strike:         P(T) = 0.1,  P(F) = 0.9
Norman Oversleep:     P(T) = 0.2,  P(F) = 0.8
Boss Failure-in-Love: P(T) = 0.01, P(F) = 0.99

SLIDE 24

Example: Attach prior probabilities to non-root nodes


[Figure: the same DAG]

P(Norman Untidy | Norman Oversleep):

Norman Oversleep  |  T  |  F
Norman Untidy = T | 0.6 | 0.2
Norman Untidy = F | 0.4 | 0.8

P(Martin Late | Train Strike, Martin Oversleep):

Train Strike     |  T   |  T  |  F  |  F
Martin Oversleep |  T   |  F  |  T  |  F
Martin Late = T  | 0.95 | 0.8 | 0.7 | 0.05
Martin Late = F  | 0.05 | 0.2 | 0.3 | 0.95

Each column sums to 1.
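Using the priors from the previous slide together with the table above, a short sketch that answers a query such as P(Martin Late = T) by marginalizing out the parents:

```python
# P(ML = T) = sum over TS, MO of P(ML = T | TS, MO) P(TS) P(MO),
# since Train Strike and Martin Oversleep are independent root nodes.
p_ts = {True: 0.1,  False: 0.9}      # Train Strike prior
p_mo = {True: 0.01, False: 0.99}     # Martin Oversleep prior
p_ml = {(True, True): 0.95, (True, False): 0.8,
        (False, True): 0.7, (False, False): 0.05}    # P(ML = T | TS, MO)

p_late = sum(p_ml[ts, mo] * p_ts[ts] * p_mo[mo]
             for ts in (True, False) for mo in (True, False))
print(f"P(Martin Late = T) = {p_late:.4f}")          # 0.1310
```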

SLIDE 25

Example: Attach prior probabilities to non-root nodes


P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty):

Boss Failure-in-Love |  T   |  T   |  T  |  T   |  F  |  F  |  F  |  F
Project Delay        |  T   |  T   |  F  |  F   |  T  |  T  |  F  |  F
Office Dirty         |  T   |  F   |  T  |  F   |  T  |  F  |  T  |  F
Boss Angry = very    | 0.98 | 0.85 | 0.6 | 0.5  | 0.3 | 0.2 | 0   | 0.01
Boss Angry = mid     | 0.02 | 0.15 | 0.3 | 0.25 | 0.5 | 0.5 | 0.2 | 0.02
Boss Angry = little  | 0    | 0    | 0.1 | 0.25 | 0.2 | 0.3 | 0.7 | 0.07
Boss Angry = no      | 0    | 0    | 0   | 0    | 0   | 0   | 0.1 | 0.9

Each column sums to 1.

SLIDE 26


Inference

SLIDE 27

MARKOV RANDOM FIELDS (MARKOV NETWORKS)


SLIDE 28

Graphical Models


[Figure: a directed graph (e.g., a Bayesian network) and an undirected graph (e.g., a Markov random field)]

SLIDE 29

Bayesian Image Analysis


[Figure: original image → noisy transmission (degradation process) → degraded (observed) image]

$\underbrace{\Pr(\text{Original} \mid \text{Degraded})}_{\text{a posteriori probability}} \;=\; \frac{\overbrace{\Pr(\text{Degraded} \mid \text{Original})}^{\text{degradation process (likelihood)}} \;\; \overbrace{\Pr(\text{Original})}^{\text{a priori probability}}}{\underbrace{\Pr(\text{Degraded})}_{\text{marginal probability}}}$

SLIDE 30

Image Analysis

  • We can thus represent both the observed image (X) and the true image (Y) as Markov random fields,
  • and invoke the Bayesian framework to find P(Y | X).


X – observed image
Y – true image

SLIDE 31

Details

  • Remember: $P(Y \mid X) \propto P(X \mid Y)\,P(Y)$
– $P(X \mid Y)$ is the data model.
– $P(Y)$ models the label interactions.

  • Next we need to specify the prior $P(Y = y)$ and the likelihood $P(X \mid Y)$.


$P(Y \mid X) = \dfrac{P(X \mid Y)\,P(Y)}{P(X)} \;\propto\; P(X \mid Y)\,P(Y)$

SLIDE 32

Back to Image Analysis


  • The likelihood can be modeled as a mixture of Gaussians.
  • The potential is modeled to capture domain knowledge. One common model is the Ising model, with pairwise potentials of the form $\beta y_i y_j$.
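A minimal sketch of the Ising pairwise potential on a grid with 4-neighborhoods; β and the example labeling are arbitrary:

```python
import numpy as np

def ising_energy(y, beta=1.0):
    # Sum -beta * y_i * y_j over all horizontally and vertically adjacent
    # pairs; with labels y in {-1, +1}, agreeing neighbors lower the energy.
    horiz = np.sum(y[:, :-1] * y[:, 1:])
    vert  = np.sum(y[:-1, :] * y[1:, :])
    return -beta * (horiz + vert)

y = np.array([[ 1,  1, -1],
              [ 1, -1, -1]])
print("energy:", ising_energy(y))
```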

SLIDE 33

Bayesian Image Analysis

  • Let X be the observed image: $X = \{x_1, x_2, \ldots, x_{mn}\}$
  • Let Y be the true image: $Y = \{y_1, y_2, \ldots, y_{mn}\}$
  • Goal: find $Y = y^* = \{y_1^*, y_2^*, \ldots\}$ such that $P(Y = y^* \mid X)$ is maximal.
  • This is a labeling problem with a search space of size $|L|^{mn}$:
– L is the set of labels.
– $m \times n$ observations.


SLIDE 34

Unfortunately


[Figure: an observed image and the labelings produced by an SVM and by an MRF]

SLIDE 35

Markov Random Fields (MRFs)

  • Introduced in the 1960s; a principled approach to incorporating context information.
  • Incorporates domain knowledge.
  • Works within the Bayesian framework.
  • Widely worked on in the 70s, faded during the 80s, and made a big comeback in the late 90s.


SLIDE 36

Markov Random Field

  • Random field: Let $\mathbf{F} = \{F_1, F_2, \ldots, F_M\}$ be a family of random variables defined on the set S, in which each random variable $F_i$ takes a value $f_i$ in a label set L. The family $\mathbf{F}$ is called a random field.

  • Markov random field: $\mathbf{F}$ is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied:

Positivity: $P(f) > 0, \quad \forall f \in \mathcal{F}$

Markovianity: $P(f_i \mid f_{S \setminus \{i\}}) = P(f_i \mid f_{N_i})$


SLIDE 37

Inference

  • Find the optimal y* such that P(Y = y* | X) is maximal.
  • The search space is exponential.
  • Exponential algorithm: simulated annealing (SA)
  • Greedy algorithm: iterated conditional modes (ICM)
  • There are also more advanced graph-cut-based strategies.


SLIDE 38

Sampling and Simulated Annealing

  • Sampling
– A way to generate random samples from a (potentially very complicated) probability distribution.
– Gibbs/Metropolis.

  • Simulated annealing
– A schedule for modifying the probability distribution so that, at “zero temperature”, you draw samples only from the MAP solution.

  • If you can find the right cooling schedule, the algorithm will converge to a global MAP solution.
  • Flip side: it is SLOW, and finding the correct schedule is non-trivial (see the sketch below).
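A compact sketch of annealed MAP search for a binary labeling under an Ising prior plus a quadratic data term. The energy form, β, λ, and the geometric cooling schedule are invented for illustration (the theoretical guarantee needs a much slower, logarithmic schedule):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_energy(y, x, i, j, beta=1.0, lam=2.0):
    # Energy terms that involve pixel (i, j): Ising pair term + data fidelity.
    h, w = y.shape
    nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= i + di < h and 0 <= j + dj < w]
    pair = -beta * sum(y[i, j] * y[a, b] for a, b in nbrs)
    return pair + lam * (y[i, j] - x[i, j]) ** 2

truth = np.ones((16, 16)); truth[:, 8:] = -1            # two-region image
x = truth + rng.normal(scale=0.8, size=truth.shape)     # noisy observation
y = np.where(x > 0, 1, -1)                              # initial labeling

T = 4.0
for sweep in range(60):
    for i in range(16):
        for j in range(16):
            e_now = local_energy(y, x, i, j)
            y[i, j] *= -1                               # propose a flip
            e_new = local_energy(y, x, i, j)
            # Metropolis acceptance at temperature T: keep the flip with
            # probability min(1, exp(-(e_new - e_now) / T)).
            if rng.random() >= np.exp(min(0.0, (e_now - e_new) / T)):
                y[i, j] *= -1                           # reject: undo the flip
    T *= 0.9                                            # cool down
print("disagreement with truth:", np.mean(y != truth))
```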


SLIDE 39

Iterated Conditional Modes

  • Greedy strategy with fast convergence.
  • The idea is to iteratively maximize the local conditional probabilities, given an initial solution.
  • Equivalent to simulated annealing with T = 0 (see the sketch below).
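A sketch of ICM with the same hypothetical energy as the annealing sketch above: each site is greedily set to the label that minimizes its local energy, holding the rest fixed:

```python
import numpy as np

def icm(x, beta=1.0, lam=2.0, sweeps=10):
    # Greedy local updates; equivalent to the annealed sampler at T = 0.
    y = np.where(x > 0, 1, -1)                  # initialize from the data
    h, w = y.shape
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                nbrs = [(i + di, j + dj)
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= i + di < h and 0 <= j + dj < w]
                def local(label):               # energy terms at (i, j)
                    return (-beta * sum(label * y[a, b] for a, b in nbrs)
                            + lam * (label - x[i, j]) ** 2)
                y[i, j] = min((-1, 1), key=local)    # pick the local mode
    return y

rng = np.random.default_rng(0)
signal = np.where(np.arange(16) < 8, 1.0, -1.0)      # two-region pattern
x = signal + rng.normal(scale=0.8, size=(16, 16))    # noisy observation
print(icm(x))
```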


SLIDE 40

Parameter Learning

  • Supervised learning (easiest case)
  • Maximum likelihood: $\theta^* = \arg\max_{\theta} P(f \mid \theta)$
  • For an MRF: $P(f \mid \theta) = \dfrac{1}{Z(\theta)}\, e^{-U(f \mid \theta)/T}$

SLIDE 41

Pseudo Likelihood

  • So we approximate the likelihood by the pseudo-likelihood (PL):
  • Large-lattice theorem: in the large-lattice limit ($M \to \infty$), the PL estimate converges to the ML estimate.
  • It turns out that a local learning method like pseudo-likelihood, combined with a local inference method such as ICM, does quite well, with close-to-optimal results (see the sketch after the formulas below).

$PL(f) = \prod_i P(f_i \mid f_{N_i}) = \prod_i \frac{e^{-U(f_i,\, f_{N_i})}}{\sum_{f_i' \in L} e^{-U(f_i',\, f_{N_i})}}$

where $U(f) = \sum_i U(f_i, f_{N_i})$.
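A sketch of evaluating and maximizing the log pseudo-likelihood for an Ising-style MRF with a single coupling parameter β; the grid search and the synthetic "observed" labeling are illustrative:

```python
import numpy as np

def log_pl(f, beta, labels=(-1, 1)):
    # Log pseudo-likelihood: sum over sites of log P(f_i | f_Ni), where each
    # local conditional is normalized over the label set L.
    h, w = f.shape
    total = 0.0
    for i in range(h):
        for j in range(w):
            nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + di < h and 0 <= j + dj < w]
            s = sum(f[a, b] for a, b in nbrs)
            # U(f_i, f_Ni) = -beta * f_i * (sum of neighboring labels)
            energies = np.array([-beta * lab * s for lab in labels])
            log_z = np.log(np.exp(-energies).sum())      # local normalizer
            total += -energies[labels.index(f[i, j])] - log_z
    return total

rng = np.random.default_rng(0)
f_obs = np.where(rng.random((8, 8)) < 0.5, 1, -1)        # synthetic labeling
betas = np.linspace(0.0, 1.0, 11)
best = max(betas, key=lambda b: log_pl(f_obs, b))
print("PL estimate of beta:", best)
```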
