

SLIDE 1

PATTERN RECOGNITION

AND MACHINE LEARNING

CHAPTER 8: GRAPHICAL MODELS

SLIDE 2

Bayesian Networks

Directed Acyclic Graph (DAG)

SLIDE 3

Bayesian Networks

General Factorization
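The general factorization for a directed graph is the product of one conditional per node given its parents (pa_k denoting the parents of node k):

    p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)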

SLIDE 4

Bayesian Curve Fitting (1)

Polynomial

SLIDE 5

Bayesian Curve Fitting (2)

Plate
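The plate collapses the N repeated observation nodes into a single box; the factorization it encodes for the curve-fitting model is:

    p(\mathbf{t}, \mathbf{w}) = p(\mathbf{w}) \prod_{n=1}^{N} p(t_n \mid \mathbf{w})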

SLIDE 6

Bayesian Curve Fitting (3)

Input variables and explicit hyperparameters

SLIDE 7

Bayesian Curve Fitting—Learning

Condition on data

SLIDE 8

Bayesian Curve Fitting—Prediction

Predictive distribution:

    p(t̂ | x̂, x, t) ∝ ∫ p(t̂ | x̂, w) p(w) ∏_n p(t_n | x_n, w) dw

where x, t are the training inputs and targets, and x̂, t̂ the new input and target.

SLIDE 9

Generative Models

Causal process for generating images
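A causal generative model of this kind can be simulated by ancestral sampling: draw each node from its conditional given already-sampled parents, in topological order. A minimal sketch; the object/position/orientation variables and their distributions are illustrative assumptions, not the slide's actual model:

    import random

    def sample_image_scene():
        # Sample parent variables first (ancestral / topological order).
        obj = random.choice(["cat", "dog", "car"])                 # object identity
        position = (random.uniform(0, 1), random.uniform(0, 1))    # image-plane position
        orientation = random.uniform(0, 360)                       # rotation in degrees
        # The "image" node is generated conditionally on its parents; here just a record.
        return {"object": obj, "position": position, "orientation": orientation}

    print(sample_image_scene())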

SLIDE 10

Discrete Variables (1)

General joint distribution: K^2 - 1 parameters. Independent joint distribution: 2(K - 1) parameters.
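As a quick check of these counts (a worked case of my own, not from the slides), take K = 10:

    K^2 - 1 = 99 \qquad\text{vs.}\qquad 2(K - 1) = 18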

SLIDE 11

Discrete Variables (2)

General joint distribution over M variables: K^M - 1 parameters. M-node Markov chain: K - 1 + (M - 1)K(K - 1) parameters.
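The savings from the chain factorization are dramatic; a worked case (my own numbers) with M = 10 nodes and K = 10 states:

    K^M - 1 = 10^{10} - 1 \qquad\text{vs.}\qquad K - 1 + (M - 1)K(K - 1) = 9 + 810 = 819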

SLIDE 12

Discrete Variables: Bayesian Parameters (1)

SLIDE 13

Discrete Variables: Bayesian Parameters (2)

Shared prior

SLIDE 14

Parameterized Conditional Distributions

If x_1, ..., x_M are discrete, K-state variables, then p(y | x_1, ..., x_M) in general has O(K^M) parameters.

The parameterized form requires only M + 1 parameters.
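The parameterized form in question is a logistic sigmoid applied to a linear combination of the (binary) parent variables, which is where the M + 1 count comes from: one weight per parent plus a bias.

    p(y = 1 \mid x_1, \dots, x_M) = \sigma\Big(w_0 + \sum_{i=1}^{M} w_i x_i\Big),
    \qquad \sigma(a) = \frac{1}{1 + e^{-a}}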

SLIDE 15

Linear-Gaussian Models

Directed graph over vector-valued Gaussian nodes.

Each node is Gaussian, with a mean that is a linear function of its parents.
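Concretely (pa_i denoting the parents of node i), each conditional is a Gaussian whose mean is linear in the parent values, and the resulting joint over all nodes is itself a multivariate Gaussian:

    p(x_i \mid \mathrm{pa}_i) = \mathcal{N}\Big(x_i \,\Big|\, \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i,\; v_i\Big)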

SLIDE 16

Conditional Independence

a is independent of b given c:

    p(a | b, c) = p(a | c)

Equivalently:

    p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c)

Notation: a ⊥⊥ b | c
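A minimal numeric sketch of this definition on a toy tail-to-tail graph a ← c → b (the probability tables are made-up for illustration): a and b are dependent marginally but become independent once c is observed.

    # Tail-to-tail toy model a <- c -> b, with made-up (illustrative) tables.
    p_c = {0: 0.4, 1: 0.6}
    p_a_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # p_a_c[c][a] = p(a | c)
    p_b_c = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}   # p_b_c[c][b] = p(b | c)

    def p_abc(a, b, c):
        # Joint from the directed factorization p(c) p(a | c) p(b | c).
        return p_c[c] * p_a_c[c][a] * p_b_c[c][b]

    def p_ab(a, b):
        return sum(p_abc(a, b, c) for c in (0, 1))

    def p_a(a):
        return sum(p_ab(a, b) for b in (0, 1))

    def p_b(b):
        return sum(p_ab(a, b) for a in (0, 1))

    # Marginally, a and b are dependent: p(a, b) != p(a) p(b) in general.
    print(p_ab(1, 1), p_a(1) * p_b(1))

    # Conditioned on c they are independent: p(a, b | c) = p(a | c) p(b | c).
    for c in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                assert abs(p_abc(a, b, c) / p_c[c]
                           - p_a_c[c][a] * p_b_c[c][b]) < 1e-12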

SLIDE 17

Conditional Independence: Example 1

SLIDE 18

Conditional Independence: Example 1

SLIDE 19

Conditional Independence: Example 2

SLIDE 20

Conditional Independence: Example 2

SLIDE 21

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c unobserved.

SLIDE 22

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

SLIDE 23

“Am I out of fuel?”

B = Battery (0 = flat, 1 = fully charged)
F = Fuel Tank (0 = empty, 1 = full)
G = Fuel Gauge Reading (0 = empty, 1 = full)

and hence the joint distribution factorizes as p(B, F, G) = p(B) p(F) p(G | B, F).

SLIDE 24

“Am I out of fuel?”

Probability of an empty tank increased by observing G = 0.

SLIDE 25

“Am I out of fuel?”

Probability of an empty tank is reduced by observing B = 0. This is referred to as “explaining away”.
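A small numeric sketch of this "explaining away" effect. The probability tables below are assumed for illustration rather than taken from the slides, but the qualitative behaviour is the same: observing G = 0 raises p(F = 0), and additionally observing B = 0 lowers it again.

    # Fuel-gauge network B -> G <- F with illustrative (assumed) tables.
    p_B = {0: 0.1, 1: 0.9}        # battery: 0 = flat, 1 = charged
    p_F = {0: 0.1, 1: 0.9}        # fuel tank: 0 = empty, 1 = full
    p_G1 = {                      # p(G = 1 | B, F): gauge reads full
        (1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1,
    }

    def joint(b, f, g):
        pg1 = p_G1[(b, f)]
        return p_B[b] * p_F[f] * (pg1 if g == 1 else 1 - pg1)

    # Prior probability the tank is empty.
    prior_F0 = p_F[0]

    # p(F = 0 | G = 0): an empty reading makes an empty tank more likely.
    num = sum(joint(b, 0, 0) for b in (0, 1))
    den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
    post_F0_given_G0 = num / den

    # p(F = 0 | G = 0, B = 0): a flat battery "explains away" the reading.
    num2 = joint(0, 0, 0)
    den2 = sum(joint(0, f, 0) for f in (0, 1))
    post_F0_given_G0_B0 = num2 / den2

    print(prior_F0, post_F0_given_G0, post_F0_given_G0_B0)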

SLIDE 26

D-separation

  • A, B, and C are non-intersecting subsets of nodes in a directed graph.
  • A path from A to B is blocked if it contains a node such that either
    a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
    b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C.
  • If all paths from A to B are blocked, A is said to be d-separated from B by C.
  • If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C.
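The blocking rule above can be checked mechanically for a single path; a minimal sketch (the graph representation and helper names are my own, not from the slides):

    def path_blocked(path, edges, observed, descendants):
        """Return True if `path` (a list of nodes) is blocked given `observed`.

        edges:        set of directed edges (parent, child)
        observed:     set C of observed nodes
        descendants:  dict node -> set of its descendants in the DAG
        """
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            head_to_head = (prev, node) in edges and (nxt, node) in edges
            if head_to_head:
                # Blocked unless the node or one of its descendants is observed.
                if node not in observed and not (descendants.get(node, set()) & observed):
                    return True
            else:
                # Head-to-tail or tail-to-tail: blocked if the node is observed.
                if node in observed:
                    return True
        return False

    # a -> c -> b with c observed is blocked; a -> c <- b with c observed is not.
    print(path_blocked(["a", "c", "b"], {("a", "c"), ("c", "b")}, {"c"}, {}))   # True
    print(path_blocked(["a", "c", "b"], {("a", "c"), ("b", "c")}, {"c"}, {}))   # False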

SLIDE 27

D-separation: Example

SLIDE 28

D-separation: I.I.D. Data

SLIDE 29

Directed Graphs as Distribution Filters

SLIDE 30

The Markov Blanket

Factors independent of x_i cancel between numerator and denominator.
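Written out for a directed graph (ch(i) denoting the children of node i), every factor not containing x_i appears in both numerator and denominator and cancels, leaving only the Markov blanket terms:

    p(x_i \mid \mathbf{x}_{\setminus i})
      = \frac{\prod_k p(x_k \mid \mathrm{pa}_k)}{\sum_{x_i} \prod_k p(x_k \mid \mathrm{pa}_k)}
      = \frac{p(x_i \mid \mathrm{pa}_i)\prod_{k \in \mathrm{ch}(i)} p(x_k \mid \mathrm{pa}_k)}
             {\sum_{x_i} p(x_i \mid \mathrm{pa}_i)\prod_{k \in \mathrm{ch}(i)} p(x_k \mid \mathrm{pa}_k)}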

SLIDE 31

Cliques and Maximal Cliques

Clique Maximal Clique

SLIDE 32

Joint Distribution

p(x) = (1/Z) ∏_C ψ_C(x_C), where ψ_C(x_C) is the potential over clique C and Z = Σ_x ∏_C ψ_C(x_C) is the normalization coefficient; note: M K-state variables ⇒ K^M terms in Z.

Energies and the Boltzmann distribution: ψ_C(x_C) = exp{−E(x_C)}.
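A tiny brute-force sketch of why Z is expensive: enumerating all K^M joint configurations (here for an assumed chain of pairwise potentials) quickly becomes infeasible as M grows.

    from itertools import product

    K, M = 3, 6                     # K-state variables, M nodes (kept small on purpose)

    def psi(a, b):
        # Illustrative pairwise potential on neighbouring nodes (assumed form).
        return 2.0 if a == b else 1.0

    # Z = sum over all K**M configurations of the product of clique potentials.
    Z = 0.0
    for x in product(range(K), repeat=M):
        weight = 1.0
        for i in range(M - 1):
            weight *= psi(x[i], x[i + 1])
        Z += weight

    print(f"{K**M} terms summed, Z = {Z}")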

SLIDE 33

Illustration: Image De-Noising (1)

Original Image Noisy Image

SLIDE 34

Illustration: Image De-Noising (2)

SLIDE 35

Illustration: Image De-Noising (3)

Noisy Image Restored Image (ICM)
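A minimal sketch of ICM for this kind of binary de-noising model, using the standard energy E(x, y) = h·Σ x_i − β·Σ x_i x_j − η·Σ x_i y_i over ±1 pixels; the coefficient values below are assumptions for illustration.

    import numpy as np

    def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
        """Iterated conditional modes on a grid MRF; y is a +/-1 noisy image."""
        x = y.copy()
        rows, cols = x.shape
        for _ in range(sweeps):
            for i in range(rows):
                for j in range(cols):
                    # Sum of the current neighbouring pixel values.
                    neigh = 0
                    if i > 0: neigh += x[i - 1, j]
                    if i < rows - 1: neigh += x[i + 1, j]
                    if j > 0: neigh += x[i, j - 1]
                    if j < cols - 1: neigh += x[i, j + 1]
                    # Energy contribution of x_ij = s is h*s - beta*s*neigh - eta*s*y_ij.
                    e_plus = h - beta * neigh - eta * y[i, j]
                    e_minus = -h + beta * neigh + eta * y[i, j]
                    x[i, j] = 1 if e_plus < e_minus else -1
        return x

    # Usage: y = np.where(noisy_pixels > 0.5, 1, -1); x_hat = icm_denoise(y)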

SLIDE 36

Illustration: Image De-Noising (4)

Restored Image (Graph cuts) Restored Image (ICM)

SLIDE 37

Converting Directed to Undirected Graphs (1)

SLIDE 38

Converting Directed to Undirected Graphs (2)

Additional links

SLIDE 39

Directed vs. Undirected Graphs (1)

SLIDE 40

Directed vs. Undirected Graphs (2)

SLIDE 41

Inference in Graphical Models

SLIDE 42

Inference on a Chain

SLIDE 43

Inference on a Chain

SLIDE 44

Inference on a Chain

SLIDE 45

Inference on a Chain

SLIDE 46

Inference on a Chain

To compute local marginals:

  • Compute and store all forward messages, μ_α(x_n).
  • Compute and store all backward messages, μ_β(x_n).
  • Compute Z at any node x_m.
  • Compute

        p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n)

    for all variables required.
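A minimal sketch of these forward/backward messages on a discrete chain with pairwise potentials; the array layout and function names are my own:

    import numpy as np

    def chain_marginals(psis):
        """Marginals of p(x) ∝ prod_n psi_n(x_n, x_{n+1}) on a chain.

        psis: list of N-1 non-negative arrays; psis[n][i, j] = psi(x_n = i, x_{n+1} = j).
        Returns a list of N normalized marginal vectors.
        """
        N = len(psis) + 1
        K = psis[0].shape[0]
        alpha = [np.ones(K) for _ in range(N)]   # forward messages mu_alpha
        beta = [np.ones(K) for _ in range(N)]    # backward messages mu_beta
        for n in range(1, N):
            alpha[n] = psis[n - 1].T @ alpha[n - 1]     # sum over x_{n-1}
        for n in range(N - 2, -1, -1):
            beta[n] = psis[n] @ beta[n + 1]             # sum over x_{n+1}
        Z = float(np.sum(alpha[N - 1] * beta[N - 1]))   # same Z at any node
        return [alpha[n] * beta[n] / Z for n in range(N)]

    # Example: 3 binary nodes with a potential favouring equal neighbours.
    psi = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(chain_marginals([psi, psi]))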

SLIDE 47

Trees

Undirected Tree Directed Tree Polytree

SLIDE 48

Factor Graphs

SLIDE 49

Factor Graphs from Directed Graphs

SLIDE 50

Factor Graphs from Undirected Graphs

SLIDE 51

The Sum-Product Algorithm (1)

Objective:

  i. to obtain an efficient, exact inference algorithm for finding marginals;
  ii. in situations where several marginals are required, to allow computations to be shared efficiently.

Key idea: Distributive Law
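The distributive law is what allows sums to be pushed past factors that do not depend on the summation variable; in its simplest form,

    ab + ac = a(b + c),

so, for example, \sum_{x_1}\sum_{x_2} f_1(x_1)\, f_2(x_1, x_2) = \sum_{x_1} f_1(x_1) \sum_{x_2} f_2(x_1, x_2), replacing one large sum over joint configurations by a sequence of smaller sums.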

SLIDE 52

The Sum-Product Algorithm (2)

SLIDE 53

The Sum-Product Algorithm (3)

SLIDE 54

The Sum-Product Algorithm (4)

SLIDE 55

The Sum-Product Algorithm (5)

SLIDE 56

The Sum-Product Algorithm (6)

SLIDE 57

The Sum-Product Algorithm (7)

Initialization

SLIDE 58

The Sum-Product Algorithm (8)

To compute local marginals:

  • Pick an arbitrary node as root.
  • Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
  • Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
  • Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
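On a general tree the same two-pass idea can be written as mutually recursive variable-to-factor and factor-to-variable messages; the sketch below recomputes messages rather than storing them (so it gives up the sharing the procedure above is designed for), and the tiny factor graph is my own illustrative example:

    import numpy as np
    from itertools import product

    # Tiny tree-structured factor graph (assumed example):
    # p(x1, x2, x3) ∝ f_a(x1, x2) f_b(x2, x3), all variables binary.
    domains = {"x1": 2, "x2": 2, "x3": 2}
    factors = {
        "f_a": (("x1", "x2"), np.array([[2.0, 1.0], [1.0, 2.0]])),
        "f_b": (("x2", "x3"), np.array([[2.0, 1.0], [1.0, 2.0]])),
    }

    def factor_neighbours(v):
        return [f for f, (vs, _) in factors.items() if v in vs]

    def msg_var_to_factor(v, f):
        # Product of messages arriving at v from all its other factors.
        out = np.ones(domains[v])
        for g in factor_neighbours(v):
            if g != f:
                out *= msg_factor_to_var(g, v)
        return out

    def msg_factor_to_var(f, v):
        vs, table = factors[f]
        out = np.zeros(domains[v])
        # Sum over all other variables of the factor times incoming messages.
        for assignment in product(*(range(domains[u]) for u in vs)):
            term = table[assignment]
            for u, val in zip(vs, assignment):
                if u != v:
                    term *= msg_var_to_factor(u, f)[val]
            out[assignment[vs.index(v)]] += term
        return out

    def marginal(v):
        p = np.ones(domains[v])
        for f in factor_neighbours(v):
            p *= msg_factor_to_var(f, v)
        return p / p.sum()

    for v in domains:
        print(v, marginal(v))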

SLIDE 59

Sum-Product: Example (1)

SLIDE 60

Sum-Product: Example (2)

SLIDE 61

Sum-Product: Example (3)

SLIDE 62

Sum-Product: Example (4)

SLIDE 63

The Max-Sum Algorithm (1)

Objective: an efficient algorithm for finding

  i. the value x^max that maximises p(x);
  ii. the value of p(x^max).

In general, the maximum of the marginals ≠ the joint maximum.

SLIDE 64

The Max-Sum Algorithm (2)

Maximizing over a chain (max-product)

SLIDE 65

The Max-Sum Algorithm (3)

Generalizes to tree-structured factor graph

maximizing as close to the leaf nodes as possible

SLIDE 66

The Max-Sum Algorithm (4)

Max-Product → Max-Sum

For numerical reasons, use ln p(x), so products of probabilities become sums of logs: ln(ab) = ln a + ln b.

Again, use the distributive law: max(a + b, a + c) = a + max(b, c).

SLIDE 67

The Max-Sum Algorithm (5)

Initialization (leaf nodes)

Recursion

SLIDE 68

The Max-Sum Algorithm (6)

Termination (root node)

Back-track, for all nodes i with l factor nodes to the root (l = 0)

SLIDE 69

The Max-Sum Algorithm (7)

Example: Markov chain
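For the Markov chain example, max-sum reduces to the familiar Viterbi-style recursion: propagate max-messages forward in log space, then back-track the arg-max choices. A minimal sketch; the pairwise potentials are assumed placeholders:

    import numpy as np

    def max_sum_chain(log_phis):
        """Most probable configuration of a chain with pairwise log-potentials.

        log_phis: list of N-1 arrays; log_phis[n][i, j] = log psi(x_n = i, x_{n+1} = j).
        Returns (best_configuration, max_log_score).
        """
        K = log_phis[0].shape[0]
        msg = np.zeros(K)                        # max-message arriving at node n
        backptr = []
        for lp in log_phis:
            scores = msg[:, None] + lp           # shape (K, K): previous state x next state
            backptr.append(scores.argmax(axis=0))
            msg = scores.max(axis=0)
        # Termination at the last node, then back-track the stored arg-max choices.
        x = [int(msg.argmax())]
        best = float(msg.max())
        for ptrs in reversed(backptr):
            x.append(int(ptrs[x[-1]]))
        x.reverse()
        return x, best

    log_psi = np.log(np.array([[2.0, 1.0], [1.0, 2.0]]))
    print(max_sum_chain([log_psi, log_psi]))     # e.g. ([0, 0, 0], log 4)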

SLIDE 70

The Junction Tree Algorithm

  • Exact inference on general graphs.
  • Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
  • Intractable on graphs with large cliques.
SLIDE 71

Loopy Belief Propagation

  • Sum-Product on general graphs.
  • Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!).
  • Approximate but tractable for large graphs.
  • Sometimes works well, sometimes not at all.