SLIDE 13 Bregman (Dual) divergences
Dual divergences have gradient entries swapped in the table:
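(Because of equivalence classes, it is sufficient to have $f = \Theta(g)$.)

| Dom. $\mathcal{X}$ | Function $F$ (or dual $G = F^*$) | Gradient $f = g^{-1}$ | Inverse gradient $g = f^{-1}$ | Divergence $D_F(p, q)$ |
|---|---|---|---|---|
| $\mathbb{R}$ | Squared function⋆: $x^2$ | $2x$ | $\frac{x}{2}$ | Squared loss (norm): $(p - q)^2$ |
| $\mathbb{R}^{+}$ | $x \log x - x$ | $\log x$ | $\exp(x)$ | Kullback-Leibler div. (I-div.): $p \log\frac{p}{q} - p + q$ |
| $\mathbb{R}$ | Exponential: $\exp x$ | $\exp x$ | $\log x$ | Exponential loss: $\exp(p) - (p - q + 1)\exp(q)$ |
| $\mathbb{R}^{+}_{*}$ | Burg entropy⋆: $-\log x$ | $-\frac{1}{x}$ | $-\frac{1}{x}$ | Itakura-Saito divergence: $\frac{p}{q} - \log\frac{p}{q} - 1$ |
| $[0, 1]$ | Bit entropy: $x \log x + (1 - x)\log(1 - x)$ | $\log\frac{x}{1 - x}$ | $\frac{\exp x}{1 + \exp x}$ | Logistic loss: $p \log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q}$ |
| $\mathbb{R}$ | Dual bit entropy: $\log(1 + \exp x)$ | $\frac{\exp x}{1 + \exp x}$ | $\log\frac{x}{1 - x}$ | Dual logistic loss: $\log\frac{1 + \exp p}{1 + \exp q} - (p - q)\frac{\exp q}{1 + \exp q}$ |
| $[-1, 1]$ | Hellinger⋆: $-\sqrt{1 - x^2}$ | $\frac{x}{\sqrt{1 - x^2}}$ | $\frac{x}{\sqrt{1 + x^2}}$ | Hellinger: $\frac{1 - pq}{\sqrt{1 - q^2}} - \sqrt{1 - p^2}$ |

(Self-dual divergences are marked with an asterisk ⋆. Note that $f = \nabla F$ and $g = \nabla F^{-1}$.)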
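The closed forms in the table can be sanity-checked numerically. The sketch below is not from the slides: it assumes the standard univariate definition $D_F(p, q) = F(p) - F(q) - (p - q) f(q)$ with $f = \nabla F$, and the names `bregman`, `TABLE`, `max_table_error` and `dual_gap` are hypothetical helpers introduced only for illustration. It also checks the swap of gradient entries for two dual pairs through the identity $D_F(p, q) = D_{F^*}(f(q), f(p))$.

```python
import math


def bregman(F, f, p, q):
    """Univariate Bregman divergence D_F(p, q) = F(p) - F(q) - (p - q) f(q).

    Assumes f is the derivative of F (this definition is not stated on the slide).
    """
    return F(p) - F(q) - (p - q) * f(q)


# Each entry: (generator F, gradient f, closed-form divergence from the table).
TABLE = {
    "squared": (
        lambda x: x * x,
        lambda x: 2 * x,
        lambda p, q: (p - q) ** 2,
    ),
    "kullback_leibler": (
        lambda x: x * math.log(x) - x,
        math.log,
        lambda p, q: p * math.log(p / q) - p + q,
    ),
    "exponential": (
        math.exp,
        math.exp,
        lambda p, q: math.exp(p) - (p - q + 1) * math.exp(q),
    ),
    "itakura_saito": (
        lambda x: -math.log(x),
        lambda x: -1 / x,
        lambda p, q: p / q - math.log(p / q) - 1,
    ),
    "logistic": (
        lambda x: x * math.log(x) + (1 - x) * math.log(1 - x),
        lambda x: math.log(x / (1 - x)),
        lambda p, q: p * math.log(p / q)
        + (1 - p) * math.log((1 - p) / (1 - q)),
    ),
    "dual_logistic": (
        lambda x: math.log(1 + math.exp(x)),
        lambda x: math.exp(x) / (1 + math.exp(x)),
        lambda p, q: math.log((1 + math.exp(p)) / (1 + math.exp(q)))
        - (p - q) * math.exp(q) / (1 + math.exp(q)),
    ),
    "hellinger": (
        lambda x: -math.sqrt(1 - x * x),
        lambda x: x / math.sqrt(1 - x * x),
        lambda p, q: (1 - p * q) / math.sqrt(1 - q * q) - math.sqrt(1 - p * p),
    ),
}


def max_table_error(p=0.3, q=0.6):
    """Largest gap between the generic formula and the tabulated closed forms.

    The defaults lie in (0, 1), which is inside every domain listed in the table.
    """
    return max(
        abs(bregman(F, f, p, q) - closed(p, q))
        for F, f, closed in TABLE.values()
    )


def dual_gap(primal, dual, p=0.3, q=0.6):
    """|D_F(p, q) - D_G(f(q), f(p))| for a primal row F and its dual row G = F*."""
    F, f, _ = TABLE[primal]
    G, g, _ = TABLE[dual]
    return abs(bregman(F, f, p, q) - bregman(G, g, f(q), f(p)))
```

Calling `max_table_error()` and, for example, `dual_gap("logistic", "dual_logistic")` or `dual_gap("kullback_leibler", "exponential")` should return values at the level of floating-point round-off if the table rows are transcribed consistently.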
- F. Nielsen, J.-D. Boissonnat and R. Nock
On Bregman Voronoi Diagrams