Belief Propagation Algorithm
Interest Group presentation by Eli Chertkov
Inference
Statistical inference is the determination of an underlying probability distribution from observed data.
Probabilistic Graphical Models
A single variable y₁:
p(y₁) = f₁(y₁)
Probabilistic Graphical Models
Two independent variables y₁ and y₂:
p(y₁, y₂) = f₁(y₁) f₂(y₂)
Probabilistic Graphical Models
Directed (Bayesian network):
p(y₁, y₂) = p(y₂ | y₁) f₁(y₁)
Undirected (Markov random field):
p(y₁, y₂) = f₁₂(y₁, y₂)
Probabilistic Graphical Models
Directed (Bayesian network):
p(y₁, y₂, y₃, y₄) = p(y₄ | y₃, y₂) p(y₃ | y₂, y₁) f₂(y₂) f₁(y₁)
Undirected (Markov random field):
p(y₁, y₂, y₃, y₄) = f₄₃(y₄, y₃) f₄₂(y₄, y₂) f₃₂(y₃, y₂) f₃₁(y₃, y₁) f₂(y₂) f₁(y₁)
Probabilistic Graphical Models
Examples of directed models (Bayesian networks): artificial neural networks (deep learning), hidden Markov models.
Examples of undirected models (Markov random fields): restricted Boltzmann machines, Ising models.
Source: Wikipedia
Factor Graphs
[Figure: the directed and undirected four-variable graphs redrawn as a single factor graph, with variable nodes y₁, y₂, y₃, y₄ and factor nodes f₁₂₃ and f₂₃₄.]
The factors f₁₂₃(y₁, y₂, y₃) and f₂₃₄(y₂, y₃, y₄) are chosen to match the original probability distributions:
p(y₁, y₂, y₃, y₄) = f₁₂₃(y₁, y₂, y₃) f₂₃₄(y₂, y₃, y₄)
Both the directed (Bayesian network) and the undirected (Markov random field) distributions can be represented in terms of factor graphs.
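To make the factorization concrete, here is a small brute-force sketch in Python (the factor tables are arbitrary values invented for the example, and the variable names are my own). It builds the two-factor model p(y₁, y₂, y₃, y₄) = f₁₂₃(y₁, y₂, y₃) f₂₃₄(y₂, y₃, y₄) over binary variables and computes a marginal by summing over every configuration, which is exactly the exponential cost that belief propagation is designed to avoid.

```python
import itertools
import numpy as np

# Hypothetical factor tables over binary variables (values chosen arbitrarily).
# Together they define p(y1, y2, y3, y4) = f123(y1, y2, y3) * f234(y2, y3, y4) / Z.
rng = np.random.default_rng(0)
f123 = rng.random((2, 2, 2))
f234 = rng.random((2, 2, 2))

def joint(y1, y2, y3, y4):
    """Unnormalized joint probability of one configuration."""
    return f123[y1, y2, y3] * f234[y2, y3, y4]

# Brute-force marginal p(y2): sum the joint over all the other variables.
# This loop visits 2^4 configurations; for N variables it would visit 2^N.
marginal_y2 = np.zeros(2)
for y1, y2, y3, y4 in itertools.product([0, 1], repeat=4):
    marginal_y2[y2] += joint(y1, y2, y3, y4)
marginal_y2 /= marginal_y2.sum()  # normalize by Z

print("p(y2) =", marginal_y2)
```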
Belief Propagation Outline
- The goal of BP is to compute the marginal probability distribution of a random variable y_n in a graphical model:
  p(y_n) = Σ_{y \ y_n} p(y_1, …, y_N)
- The probability distribution of a graphical model can be represented as a factor graph, so that
  p(y_n) = Σ_{y \ y_n} Π_{m ∈ M(y_n)} f_m(y_n, y_m^(n)),
  where y_m^(n) is the subset of the variables involved in factor m (those other than y_n) and M(y_n) is the set of factors that involve y_n.
- By interchanging the product and the sum, we can write
  p(y_n) = Π_{m ∈ M(y_n)} μ_{m→y_n}(y_n),
  where μ_{m→y_n}(y_n) = Σ_{y_m^(n)} f_m(y_n, y_m^(n)) is called a message.
Belief Propagation Message Passing
BP is a message-passing algorithm. The idea is to pass information through your factor graph by locally updating the messages between nodes.
Once the messages have converged, you can efficiently evaluate the marginal distribution of each variable:
p(y_n) = Π_{m ∈ M(y_n)} μ_{m→y_n}(y_n)
There are two types of message updates.
Factor node to variable node:
μ_{f→y_n}(y_n) = Σ_{{y_m ∈ M(f) \ y_n}} f({y_m}, y_n) Π_{y_m ∈ M(f) \ y_n} μ_{y_m→f}(y_m)
Variable node to factor node:
μ_{y_m→f}(y_m) = Π_{f′ ∈ M(y_m) \ f} μ_{f′→y_m}(y_m)
Here M(f) denotes the set of variables involved in factor f, and M(y_m) the set of factors that involve y_m.
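As a sketch of how these two updates fit together, here is a minimal sum-product implementation in Python on a small tree-structured factor graph. The graph, factor tables, variable names, and flooding schedule are my own illustrative choices, not taken from the slides.

```python
import numpy as np
from itertools import product

# Tree-structured toy factor graph: y1 - f12 - y2 - f23 - y3, plus a unary factor f1 on y1.
# (Variable names and factor tables are illustrative only.)
variables = {"y1": 2, "y2": 2, "y3": 2}            # name -> number of states
factors = {
    "f1":  (["y1"],       np.array([0.6, 0.4])),
    "f12": (["y1", "y2"], np.array([[0.9, 0.1], [0.2, 0.8]])),
    "f23": (["y2", "y3"], np.array([[0.7, 0.3], [0.4, 0.6]])),
}

# mu[(sender, receiver)] is a message: a nonnegative vector over the variable's states.
mu = {}
for f, (vs, _) in factors.items():
    for v in vs:
        mu[(f, v)] = np.ones(variables[v])  # factor -> variable
        mu[(v, f)] = np.ones(variables[v])  # variable -> factor

for _ in range(10):  # a few sweeps are enough for this tiny tree to converge
    # Variable -> factor update: product of messages from all *other* neighboring factors.
    for f, (vs, _) in factors.items():
        for v in vs:
            msg = np.ones(variables[v])
            for g, (ws, _) in factors.items():
                if g != f and v in ws:
                    msg *= mu[(g, v)]
            mu[(v, f)] = msg / msg.sum()
    # Factor -> variable update: sum the factor times the other variables' incoming messages.
    for f, (vs, table) in factors.items():
        for i, v in enumerate(vs):
            msg = np.zeros(variables[v])
            for idx in product(*[range(variables[w]) for w in vs]):
                term = table[idx]
                for j, w in enumerate(vs):
                    if j != i:
                        term *= mu[(w, f)][idx[j]]
                msg[idx[i]] += term
            mu[(f, v)] = msg / msg.sum()

# Marginal of each variable: product of all incoming factor -> variable messages.
for v in variables:
    belief = np.ones(variables[v])
    for f, (vs, _) in factors.items():
        if v in vs:
            belief *= mu[(f, v)]
    print(v, belief / belief.sum())
```

On a tree-structured graph such as this one the messages converge after a few sweeps and the resulting marginals are exact; on graphs with loops the same updates give the approximate "loopy" BP marginals.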
Killer app: Error-correcting codes
[Diagram: the source signal s is encoded into the redundant signal t, transmitted through a noisy channel, received as the degraded signal r, and decoded into an estimate of s.]
To prevent the degradation of a binary signal through a noisy channel, we encode our original signal s into a redundant one t. A theoretically useful encoding scheme is linear block coding, which relates the two signals by a (binary) linear transformation
t = Gᵀs
When the encoding matrix is random and sparse, the encoding is called a low-density parity-check (LDPC) code.
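A minimal sketch of the encoding step, assuming a small systematic (7,4) generator matrix chosen purely for illustration (it is not necessarily the matrix used in the toy example later in the slides); the only operation needed is a matrix-vector product mod 2.

```python
import numpy as np

# Illustrative generator matrix G^T for a systematic (7,4) code:
# the first four rows copy the source bits, the last three rows are parity checks.
# (Any binary G^T works the same way; this one is not taken from the slides.)
G_T = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
])

def encode(s):
    """t = G^T s with arithmetic over GF(2), i.e. a matrix-vector product mod 2."""
    return (G_T @ np.asarray(s)) % 2

s = [1, 0, 1, 1]   # source bits
t = encode(s)      # redundant transmitted bits (source bits followed by parity bits)
print("t =", t)
```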
Decoding the degraded signal r of an LDPC code, i.e., inferring the original signal s, is an NP-complete problem. Nonetheless, BP is efficient at providing an approximate solution.
Linear block code visualization (showing the source bits and the parity-check bits of the transmitted word)
Source: Information Theory, Inference, and Learning Algorithms
Linear block code as a graphical model
u = Gᵀt (the source bits are now written t₁, …, t₄ and the encoded bits u₁, …, u₇)
[Figure: the 7×4 generator matrix Gᵀ and the corresponding factor graph, with variable nodes t₁, …, t₄ and u₁, …, u₇ connected through one factor per encoded bit.]
p(t₁, t₂, t₃, t₄, u₁, …, u₇) ∝ a product of factors, one for each encoded bit u_m and its parent source bits.
t_n, u_n ∈ {0, 1} are binary random variables.
Linear block code as a graphical model
When decoding a signal, we observe the transmitted bits u_n and try to find the most likely source bits t_n. This means we want to maximize
p(t₁, t₂, t₃, t₄ | u₁, …, u₇ = 0101101),
where 0101101 is the observed signal.
Belief propagation is an efficient way to compute the marginal probability distribution p(t_n) of each source bit t_n.
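Because this code has only four source bits, the posterior marginals that BP approximates can also be computed by brute force, which makes a useful sanity check. The sketch below assumes a binary symmetric channel with a made-up flip probability of 0.1 and reuses the illustrative generator matrix from the encoding sketch; only the observed word 0101101 comes from the slide.

```python
import numpy as np
from itertools import product

# Same illustrative (7,4) generator matrix as in the encoding sketch (not from the slides).
G_T = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1],
])
flip_prob = 0.1  # assumed bit-flip probability of the binary symmetric channel

def likelihood(u_observed, t_bits):
    """p(u | t): each encoded bit is flipped independently with probability flip_prob."""
    u_clean = (G_T @ np.asarray(t_bits)) % 2
    n_flips = int(np.sum(u_clean != np.asarray(u_observed)))
    n_same = len(u_observed) - n_flips
    return flip_prob ** n_flips * (1.0 - flip_prob) ** n_same

def posterior_marginals(u_observed, k=4):
    """Brute-force p(t_n = 1 | u): sum over all 2^k source words (uniform prior on t)."""
    p_one = np.zeros(k)
    Z = 0.0
    for t_bits in product([0, 1], repeat=k):
        w = likelihood(u_observed, t_bits)
        Z += w
        p_one += w * np.array(t_bits)
    return p_one / Z

u = [0, 1, 0, 1, 1, 0, 1]  # the observed signal from the slide
print("p(t_n = 1 | u) =", posterior_marginals(u))
```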
My toy LDPC decoding example
Encoding matrix: [shown as a figure on the slide]
My toy LDPC decoding example
[Figure: the encoded signal, the noisy transmitted signal, the marginal probabilities computed by BP, and the reconstructed signal.]
Note: There is a very similar message-passing algorithm, called the max-product (or min-sum, or Viterbi) algorithm, which computes the maximum-probability configuration x* = argmax_x P(x); it might be better suited for this decoding task.
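A minimal max-product sketch on a two-factor chain (the factor tables are arbitrary examples): the sums in the factor-to-variable updates are replaced by maximizations, and back-tracking the maximizers recovers the most likely joint configuration.

```python
import numpy as np

# Max-product on a tiny chain y1 - f12 - y2 - f23 - y3 (tables are arbitrary examples).
# Replacing sums with maximizations turns marginals into max-marginals, whose argmax
# gives the most likely joint configuration.
f12 = np.array([[0.9, 0.1], [0.2, 0.8]])
f23 = np.array([[0.7, 0.3], [0.4, 0.6]])

m_12_to_2 = f12.max(axis=0)          # message f12 -> y2: maximize over y1
m_23_to_2 = f23.max(axis=1)          # message f23 -> y2: maximize over y3

max_marginal_y2 = m_12_to_2 * m_23_to_2
y2_star = int(np.argmax(max_marginal_y2))
y1_star = int(np.argmax(f12[:, y2_star]))   # back-track the best y1 given y2*
y3_star = int(np.argmax(f23[y2_star, :]))   # back-track the best y3 given y2*
print("most likely configuration (y1, y2, y3):", (y1_star, y2_star, y3_star))
```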
References
- Information Theory, Inference, and Learning Algorithms by David MacKay.
- Yedidia, J. S.; Freeman, W. T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations", Exploring Artificial Intelligence in the New Millennium (2003), Chap. 8, pp. 239-269.
- Pattern Recognition and Machine Learning by Christopher Bishop.