Belief Propagation Algorithm
Interest Group presentation by Eli Chertkov
Inference
Statistical inference is the determination of an underlying probability distribution from observed data.
Probabilistic Graphical Models
A single variable y₁:
p(y₁) = f₁(y₁)
Probabilistic Graphical Models
Two independent variables y₁ and y₂:
p(y₁, y₂) = f₁(y₁) f₂(y₂)
Probabilistic Graphical Models
Directed (Bayesian network):
p(y₁, y₂) = p(y₂ | y₁) f₁(y₁)
Undirected (Markov random field):
p(y₁, y₂) = f₁₂(y₁, y₂)
Probabilistic Graphical Models
Directed (Bayesian network):
p(y₁, y₂, y₃, y₄) = p(y₄ | y₃, y₂) p(y₃ | y₂, y₁) f₂(y₂) f₁(y₁)
Undirected (Markov random field):
p(y₁, y₂, y₃, y₄) = f₄₃(y₄, y₃) f₄₂(y₄, y₂) f₃₂(y₃, y₂) f₃₁(y₃, y₁) f₂(y₂) f₁(y₁)
Probabilistic Graphical Models
Examples of directed models (Bayesian networks): artificial neural networks (deep learning), hidden Markov models.
Examples of undirected models (Markov random fields): restricted Boltzmann machines, Ising models.
Source: Wikipedia
Factor Graphs
[Figure: the directed and undirected four-variable graphs redrawn as a single factor graph, with variable nodes y₁, y₂, y₃, y₄ and factor nodes f₁₂₃ and f₂₃₄.]
The factors f₁₂₃(y₁, y₂, y₃) and f₂₃₄(y₂, y₃, y₄) are chosen to match the original probability distributions:
p(y₁, y₂, y₃, y₄) = f₁₂₃(y₁, y₂, y₃) f₂₃₄(y₂, y₃, y₄)
Both the directed (Bayesian network) and the undirected (Markov random field) distributions can be represented in terms of factor graphs.
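To make the factorization concrete, here is a small brute-force sketch in Python (the factor tables are arbitrary values invented for the example, and the variable names are my own). It builds the two-factor model p(y₁, y₂, y₃, y₄) = f₁₂₃(y₁, y₂, y₃) f₂₃₄(y₂, y₃, y₄) over binary variables and computes a marginal by summing over every configuration, which is exactly the exponential cost that belief propagation is designed to avoid.

```python
import itertools
import numpy as np

# Hypothetical factor tables over binary variables (values chosen arbitrarily).
# Together they define p(y1, y2, y3, y4) = f123(y1, y2, y3) * f234(y2, y3, y4) / Z.
rng = np.random.default_rng(0)
f123 = rng.random((2, 2, 2))
f234 = rng.random((2, 2, 2))

def joint(y1, y2, y3, y4):
    """Unnormalized joint probability of one configuration."""
    return f123[y1, y2, y3] * f234[y2, y3, y4]

# Brute-force marginal p(y2): sum the joint over all the other variables.
# This loop visits 2^4 configurations; for N variables it would visit 2^N.
marginal_y2 = np.zeros(2)
for y1, y2, y3, y4 in itertools.product([0, 1], repeat=4):
    marginal_y2[y2] += joint(y1, y2, y3, y4)
marginal_y2 /= marginal_y2.sum()  # normalize by Z

print("p(y2) =", marginal_y2)
```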
Belief Propagation Outline
- The goal of BP is to compute the marginal probability distribution of a random variable y_n in a graphical model:
  p(y_n) = Σ_{y \ y_n} p(y_1, …, y_N)
- The probability distribution of a graphical model can be represented as a factor graph, so that
  p(y_n) = Σ_{y \ y_n} Π_{m ∈ M(y_n)} f_m(y_n, y_m^(n)),
  where y_m^(n) is the subset of the variables involved in factor m (those other than y_n) and M(y_n) is the set of factors that involve y_n.
- By interchanging the product and the sum, we can write
  p(y_n) = Π_{m ∈ M(y_n)} μ_{m→y_n}(y_n),
  where μ_{m→y_n}(y_n) = Σ_{y_m^(n)} f_m(y_n, y_m^(n)) is called a message.
Belief Propagation Message Passing
BP is a message-passing algorithm. The idea is to pass information through your factor graph by locally updating the messages between nodes.
Once the messages have converged, you can efficiently evaluate the marginal distribution of each variable:
p(y_n) = Π_{m ∈ M(y_n)} μ_{m→y_n}(y_n)
There are two types of message updates.
Factor node to variable node:
μ_{f→y_n}(y_n) = Σ_{{y_m ∈ M(f) \ y_n}} f({y_m}, y_n) Π_{y_m ∈ M(f) \ y_n} μ_{y_m→f}(y_m)
Variable node to factor node:
μ_{y_m→f}(y_m) = Π_{f′ ∈ M(y_m) \ f} μ_{f′→y_m}(y_m)
Here M(f) denotes the set of variables involved in factor f, and M(y_m) the set of factors that involve y_m.
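As a sketch of how these two updates fit together, here is a minimal sum-product implementation in Python on a small tree-structured factor graph. The graph, factor tables, variable names, and flooding schedule are my own illustrative choices, not taken from the slides.

```python
import numpy as np
from itertools import product

# Tree-structured toy factor graph: y1 - f12 - y2 - f23 - y3, plus a unary factor f1 on y1.
# (Variable names and factor tables are illustrative only.)
variables = {"y1": 2, "y2": 2, "y3": 2}            # name -> number of states
factors = {
    "f1":  (["y1"],       np.array([0.6, 0.4])),
    "f12": (["y1", "y2"], np.array([[0.9, 0.1], [0.2, 0.8]])),
    "f23": (["y2", "y3"], np.array([[0.7, 0.3], [0.4, 0.6]])),
}

# mu[(sender, receiver)] is a message: a nonnegative vector over the variable's states.
mu = {}
for f, (vs, _) in factors.items():
    for v in vs:
        mu[(f, v)] = np.ones(variables[v])  # factor -> variable
        mu[(v, f)] = np.ones(variables[v])  # variable -> factor

for _ in range(10):  # a few sweeps are enough for this tiny tree to converge
    # Variable -> factor update: product of messages from all *other* neighboring factors.
    for f, (vs, _) in factors.items():
        for v in vs:
            msg = np.ones(variables[v])
            for g, (ws, _) in factors.items():
                if g != f and v in ws:
                    msg *= mu[(g, v)]
            mu[(v, f)] = msg / msg.sum()
    # Factor -> variable update: sum the factor times the other variables' incoming messages.
    for f, (vs, table) in factors.items():
        for i, v in enumerate(vs):
            msg = np.zeros(variables[v])
            for idx in product(*[range(variables[w]) for w in vs]):
                term = table[idx]
                for j, w in enumerate(vs):
                    if j != i:
                        term *= mu[(w, f)][idx[j]]
                msg[idx[i]] += term
            mu[(f, v)] = msg / msg.sum()

# Marginal of each variable: product of all incoming factor -> variable messages.
for v in variables:
    belief = np.ones(variables[v])
    for f, (vs, _) in factors.items():
        if v in vs:
            belief *= mu[(f, v)]
    print(v, belief / belief.sum())
```

On a tree-structured graph such as this one the messages converge after a few sweeps and the resulting marginals are exact; on graphs with loops the same updates give the approximate "loopy" BP marginals.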
Killer app: Error-correcting codes
[Diagram: the source signal s is encoded into the redundant signal t, transmitted through a noisy channel, received as the degraded signal r, and decoded into an estimate of s.]
To prevent the degradation of a binary signal through a noisy channel, we encode our original signal s into a redundant one t. A theoretically useful encoding scheme is linear block coding, which relates the two signals by a (binary) linear transformation
t = Gᵀs
When the encoding matrix is random and sparse, the encoding is called a low-density parity-check (LDPC) code.
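A minimal sketch of the encoding step, assuming a small systematic (7,4) generator matrix chosen purely for illustration (it is not necessarily the matrix used in the toy example later in the slides); the only operation needed is a matrix-vector product mod 2.

```python
import numpy as np

# Illustrative generator matrix G^T for a systematic (7,4) code:
# the first four rows copy the source bits, the last three rows are parity checks.
# (Any binary G^T works the same way; this one is not taken from the slides.)
G_T = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
])

def encode(s):
    """t = G^T s with arithmetic over GF(2), i.e. a matrix-vector product mod 2."""
    return (G_T @ np.asarray(s)) % 2

s = [1, 0, 1, 1]   # source bits
t = encode(s)      # redundant transmitted bits (source bits followed by parity bits)
print("t =", t)
```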
Decoding the degraded signal r of an LDPC code, i.e., inferring the original signal s, is an NP-complete problem. Nonetheless, BP is efficient at providing an approximate solution.
Linear block code visualization (showing the source bits and the parity-check bits of the transmitted word)
Source: Information Theory, Inference, and Learning Algorithms
Linear block code as a graphical model
u = Gᵀt (the source bits are now written t₁, …, t₄ and the encoded bits u₁, …, u₇)
[Figure: the 7×4 generator matrix Gᵀ and the corresponding factor graph, with variable nodes t₁, …, t₄ and u₁, …, u₇ connected through one factor per encoded bit.]
p(t₁, t₂, t₃, t₄, u₁, …, u₇) ∝ a product of factors, one for each encoded bit u_m and its parent source bits.
t_n, u_n ∈ {0, 1} are binary random variables.
Linear block code as a graphical model
When decoding a signal, we observe the transmitted bits u_n and try to find the most likely source bits t_n. This means we want to maximize
p(t₁, t₂, t₃, t₄ | u₁, …, u₇ = 0101101),
where 0101101 is the observed signal.
Belief propagation is an efficient way to compute the marginal probability distribution p(t_n) of each source bit t_n.
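Because this code has only four source bits, the posterior marginals that BP approximates can also be computed by brute force, which makes a useful sanity check. The sketch below assumes a binary symmetric channel with a made-up flip probability of 0.1 and reuses the illustrative generator matrix from the encoding sketch; only the observed word 0101101 comes from the slide.

```python
import numpy as np
from itertools import product

# Same illustrative (7,4) generator matrix as in the encoding sketch (not from the slides).
G_T = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1],
])
flip_prob = 0.1  # assumed bit-flip probability of the binary symmetric channel

def likelihood(u_observed, t_bits):
    """p(u | t): each encoded bit is flipped independently with probability flip_prob."""
    u_clean = (G_T @ np.asarray(t_bits)) % 2
    n_flips = int(np.sum(u_clean != np.asarray(u_observed)))
    n_same = len(u_observed) - n_flips
    return flip_prob ** n_flips * (1.0 - flip_prob) ** n_same

def posterior_marginals(u_observed, k=4):
    """Brute-force p(t_n = 1 | u): sum over all 2^k source words (uniform prior on t)."""
    p_one = np.zeros(k)
    Z = 0.0
    for t_bits in product([0, 1], repeat=k):
        w = likelihood(u_observed, t_bits)
        Z += w
        p_one += w * np.array(t_bits)
    return p_one / Z

u = [0, 1, 0, 1, 1, 0, 1]  # the observed signal from the slide
print("p(t_n = 1 | u) =", posterior_marginals(u))
```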
My toy LDPC decoding example
Encoding matrix: [shown as a figure on the slide]
My toy LDPC decoding example
[Figure: the encoded signal, the noisy transmitted signal, the marginal probabilities computed by BP, and the reconstructed signal.]
Note: There is a very similar message-passing algorithm, called the max-product (or min-sum, or Viterbi) algorithm, which computes the maximum-probability configuration x* = argmax_x P(x); it might be better suited for this decoding task.
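A minimal max-product sketch on a two-factor chain (the factor tables are arbitrary examples): the sums in the factor-to-variable updates are replaced by maximizations, and back-tracking the maximizers recovers the most likely joint configuration.

```python
import numpy as np

# Max-product on a tiny chain y1 - f12 - y2 - f23 - y3 (tables are arbitrary examples).
# Replacing sums with maximizations turns marginals into max-marginals, whose argmax
# gives the most likely joint configuration.
f12 = np.array([[0.9, 0.1], [0.2, 0.8]])
f23 = np.array([[0.7, 0.3], [0.4, 0.6]])

m_12_to_2 = f12.max(axis=0)          # message f12 -> y2: maximize over y1
m_23_to_2 = f23.max(axis=1)          # message f23 -> y2: maximize over y3

max_marginal_y2 = m_12_to_2 * m_23_to_2
y2_star = int(np.argmax(max_marginal_y2))
y1_star = int(np.argmax(f12[:, y2_star]))   # back-track the best y1 given y2*
y3_star = int(np.argmax(f23[y2_star, :]))   # back-track the best y3 given y2*
print("most likely configuration (y1, y2, y3):", (y1_star, y2_star, y3_star))
```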
References
- Information Theory, Inference, and Learning Algorithms by David MacKay.
- Yedidia, J. S.; Freeman, W. T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations", Exploring Artificial Intelligence in the New Millennium (2003), Chap. 8, pp. 239-269.
- Pattern Recognition and Machine Learning by Christopher Bishop.