Belief Propagation: Algorithm Interest Group presentation by Eli Chertkov


SLIDE 1

Algorithm Interest Group presentation by Eli Chertkov

Belief Propagation

SLIDE 2

Inference

Statistical inference is the determination of an underlying probability distribution from observed data.

SLIDE 3

Probabilistic Graphical Models

A single variable node $y_1$:

$Q(y_1) = \phi_1(y_1)$

SLIDE 4

Probabilistic Graphical Models

Two independent variable nodes $y_1$ and $y_2$:

$Q(y_1, y_2) = \phi_1(y_1)\,\phi_2(y_2)$

SLIDE 5

Probabilistic Graphical Models

Directed: Bayesian Network (graph $y_1 \to y_2$):

$Q(y_1, y_2) = Q(y_2 \mid y_1)\,\phi_1(y_1)$

Undirected: Markov Random Field (graph $y_1 - y_2$):

$Q(y_1, y_2) = \phi_{12}(y_1, y_2)$

SLIDE 6

Probabilistic Graphical Models

Directed: Bayesian Network (graph over $y_1, y_2, y_3, y_4$):

$Q(y_1, y_2, y_3, y_4) = Q(y_4 \mid y_3, y_2)\, Q(y_3 \mid y_2, y_1)\, \phi_2(y_2)\, \phi_1(y_1)$

Undirected: Markov Random Field (graph over $y_1, y_2, y_3, y_4$):

$Q(y_1, y_2, y_3, y_4) = \phi_{43}(y_4, y_3)\, \phi_{42}(y_4, y_2)\, \phi_{32}(y_3, y_2)\, \phi_{31}(y_3, y_1)\, \phi_2(y_2)\, \phi_1(y_1)$

SLIDE 7

Probabilistic Graphical Models

Directed (Bayesian Network): Artificial Neural Network (Deep Learning), Hidden Markov Model

Undirected (Markov Random Field): Restricted Boltzmann Machine, Ising Model

Source: Wikipedia

SLIDE 8

Factor Graphs

Both the directed (Bayesian Network) and the undirected (Markov Random Field) distributions above can be represented in terms of factor graphs.

[Figure: factor graph with variable nodes $y_1, y_2, y_3, y_4$ connected to factor nodes $g_{123}$ and $g_{234}$]

The factors

$g_{123} = g_{123}(y_1, y_2, y_3)$
$g_{234} = g_{234}(y_2, y_3, y_4)$

are chosen to match the original probability distribution:

$Q(y_1, y_2, y_3, y_4) = g_{123}(y_1, y_2, y_3)\, g_{234}(y_2, y_3, y_4)$
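As an added illustration (not part of the original slides), the four-variable factor graph above can be written down directly as a small data structure; the factor tables below are arbitrary made-up numbers, chosen only to make the example concrete.

```python
import numpy as np

# Binary variables y1..y4; each factor is a table with one axis per variable it touches.
# The table entries are made-up positive numbers, purely for illustration.
rng = np.random.default_rng(0)
factors = {
    "g123": (["y1", "y2", "y3"], rng.random((2, 2, 2)) + 0.1),
    "g234": (["y2", "y3", "y4"], rng.random((2, 2, 2)) + 0.1),
}

def joint(assignment):
    """Unnormalized Q(y1, y2, y3, y4) for a dict like {'y1': 0, 'y2': 1, ...}."""
    q = 1.0
    for vars_, table in factors.values():
        q *= table[tuple(assignment[v] for v in vars_)]
    return q

print(joint({"y1": 0, "y2": 1, "y3": 0, "y4": 1}))
```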

SLIDE 9

Belief Propagation Outline

  • The goal of BP is to compute the marginal probability distribution of a random variable $y_j$ in a graphical model:

    $Q(y_j) = \sum_{\{y_k\} \setminus y_j} Q(y_1, \ldots, y_N)$

  • The probability distribution of a graphical model can be represented as a factor graph, so that

    $Q(y_j) = \sum_{\{y_k\} \setminus y_j} \; \prod_{g \in \mathrm{of}(y_j)} g(y_j, \{y_k\}^g)$

    where $\{y_k\}^g$ is the subset of the variables involved in factor $g$, and $\mathrm{of}(y_j)$ denotes the set of factors that involve $y_j$.

  • By interchanging the product and sum, we can write

    $Q(y_j) = \prod_{g \in \mathrm{of}(y_j)} \nu_{g \to y_j}(y_j)$

    where $\nu_{g \to y_j}(y_j) = \sum_{\{y_k\}^g} g(y_j, \{y_k\}^g)$ is called a message. (A worked two-factor example follows below.)

SLIDE 10

Belief Propagation Message Passing

BP is a message-passing algorithm. The idea is to pass information through the factor graph by locally updating the messages between nodes. Once the messages have converged, you can efficiently evaluate the marginal distribution of each variable:

$Q(y_j) = \prod_{g \in \mathrm{of}(y_j)} \nu_{g \to y_j}(y_j)$

There are two types of message updates:

Factor node to variable node:

$\nu_{g \to y_j}(y_j) = \sum_{\{y_k \in \mathrm{of}(g)\} \setminus y_j} g(\{y_k\}, y_j) \prod_{y_k \in \mathrm{of}(g) \setminus y_j} \nu_{y_k \to g}(y_k)$

Variable node to factor node:

$\nu_{y_j \to g}(y_j) = \prod_{g' \in \mathrm{of}(y_j) \setminus g} \nu_{g' \to y_j}(y_j)$

SLIDE 11

Killer app: Error-correcting codes

[Figure: block-code diagram labeling the signals t and u]

To prevent the degradation of a binary signal through a noisy channel, we encode our original signal s into a redundant one t. A theoretically useful encoding scheme is linear block coding, which relates the two signals by a (binary) linear transformation

$\mathbf{u} = \mathbf{H}_U\, \mathbf{t}$

When the matrix $\mathbf{H}_U$ is random and sparse, the encoding is called a low-density parity check (LDPC) code; the bits of $\mathbf{u}$ are the parity-check bits.

Decoding the degraded signal r of an LDPC code, i.e., inferring the original signal s, is an NP-complete problem. Nonetheless, BP is efficient at providing an approximate solution.
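As an added illustration of the encoding step (the matrix and bit values below are made up and are not the ones used later in the talk), each check bit is a mod-2 combination of the input bits selected by one row of a sparse 0/1 matrix:

```python
import numpy as np

# A made-up sparse 0/1 matrix standing in for H_U (3 parity checks on 6 bits).
H_U = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
])

t = np.array([1, 0, 1, 1, 0, 1])  # bits to be encoded
u = H_U @ t % 2                   # parity-check bits: u = H_U t, arithmetic mod 2
print(u)                          # -> [0 1 0]
```

In a real LDPC code the matrix is much larger and each row touches only a handful of bits, which is what keeps the corresponding factor graph sparse.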

SLIDE 12

Linear block code visualization

Source: Information Theory, Inference, and Learning Algorithms

SLIDE 13

Linear block code as a graphical model

$\mathbf{u} = \mathbf{H}_U\, \mathbf{t}$

$\mathbf{H}_U$ is a sparse $7 \times 4$ binary (0/1) matrix; each row defines one bit $u_k$ as a mod-2 combination of the source bits $t_j$, and its 1-entries define the edges of the factor graph.

[Figure: factor graph over the variable nodes $t_1, t_2, t_3, t_4$ and $u_1, \ldots, u_7$]

The joint distribution $Q(t_1, t_2, t_3, t_4, u_1, \ldots, u_7)$ maps onto this factor graph, where $t_j, u_k \in \{0, 1\}$ are binary random variables.

SLIDE 14

Linear block code as a graphical model

[Figure: the factor graph with the transmitted bits clamped to the observed signal 0101101]

When decoding a signal, we observe the transmitted bits $u_k$ and try to find the most likely source bits $t_j$. This means we want to maximize the posterior of the source bits given the observed signal,

$Q(t_1, t_2, t_3, t_4 \mid u_1, \ldots, u_7 = 0101101)$

Belief Propagation is an efficient way to compute the marginal probability distribution $Q(t_j)$ of each source bit $t_j$.
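To make the quantity that BP approximates concrete, here is a brute-force computation of the bitwise posterior marginals for a tiny made-up code (the check matrix, the observed bits, and the channel flip probability below are all assumptions for illustration; BP obtains this kind of marginal without enumerating every source configuration):

```python
import itertools
import numpy as np

# Made-up ingredients, for illustration only: a 3x4 check matrix, an observed
# (possibly corrupted) check vector, and an assumed bit-flip probability.
H_U = np.array([[1, 1, 0, 1],
                [0, 1, 1, 1],
                [1, 0, 1, 1]])
u_obs = np.array([0, 1, 1])
flip = 0.1

# Posterior marginals of each source bit t_j, by summing over all 2^4 source words.
# P(u_obs | t) assumes each observed check bit is independently flipped with prob `flip`.
marginals = np.zeros((4, 2))
for t in itertools.product([0, 1], repeat=4):
    u = H_U @ np.array(t) % 2
    likelihood = np.prod(np.where(u == u_obs, 1.0 - flip, flip))
    for j, tj in enumerate(t):
        marginals[j, tj] += likelihood

marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)             # row j = [Q(t_j = 0 | u_obs), Q(t_j = 1 | u_obs)]
print(marginals.argmax(1))   # bitwise most probable source bits
```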

SLIDE 15

My toy LDPC decoding example

Encoding matrix $\mathbf{H}_U$ = [matrix shown as a figure in the original slide]

SLIDE 16

My toy LDPC decoding example

[Figures: the encoded signal, the noisy transmitted signal, the BP marginal probabilities, and the reconstructed signal]

Note: There is a very similar message-passing algorithm, called the max-product (or min-sum, or Viterbi) algorithm, which computes the maximum-probability configuration of the distribution, $x^* = \operatorname{argmax}_x P(x)$, and might be better suited for this decoding task.
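For reference (a standard form of the update, not spelled out on the slides), the max-product variant replaces the sum in the factor-to-variable update with a maximization:

$\nu^{\max}_{g \to y_j}(y_j) = \max_{\{y_k \in \mathrm{of}(g)\} \setminus y_j} \; g(\{y_k\}, y_j) \prod_{y_k \in \mathrm{of}(g) \setminus y_j} \nu^{\max}_{y_k \to g}(y_k)$

Tracing back the maximizing assignments after convergence yields the most probable configuration $x^*$.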

SLIDE 17

References

  • Information Theory, Inference, and Learning Algorithms by David MacKay.

  • Yedidia, J. S.; Freeman, W. T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations", Exploring Artificial Intelligence in the New Millennium (2003), Chap. 8, pp. 239-269.

  • Pattern Recognition and Machine Learning by Christopher Bishop.