A Fresh Look at the Bayes' Theorem from Information Theory
SLIDE 1

A Fresh Look at the Bayes’ Theorem from Information Theory

Tan Bui-Thanh

Computational Engineering and Optimization (CEO) Group, Department of Aerospace Engineering and Engineering Mechanics, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin

Babuska Series, ICES Sep 9, 2016

SLIDE 2

Outline

1. Bayesian Inversion Framework
2. Entropy
3. Relative Entropy
4. Bayes' Theorem and Information Theory
5. Conclusions

SLIDE 3

Large-scale computation under uncertainty

Inverse electromagnetic scattering

Randomness: random errors in measurements are unavoidable; inadequacy of the mathematical model (Maxwell's equations).

Challenge

How to invert for the invisible shape/medium using computational electromagnetics with $O(10^6)$ degrees of freedom?

SLIDE 4

Large-scale computation under uncertainty

Full-waveform seismic inversion

Randomness: random errors in seismometer measurements are unavoidable; inadequacy of the mathematical model (elastodynamics).

Challenge

How to image the Earth's interior using a forward computational model with $O(10^9)$ degrees of freedom?

SLIDE 5

Inverse Shape Electromagnetic Scattering Problem

Maxwell equations:
$$\nabla \times E = -\mu \frac{\partial H}{\partial t} \quad \text{(Faraday)}, \qquad \nabla \times H = \epsilon \frac{\partial E}{\partial t} \quad \text{(Ampere)}$$
$E$: electric field, $H$: magnetic field, $\mu$: permeability, $\epsilon$: permittivity

Forward problem (discontinuous Galerkin discretization)

$d = G(x)$, where $G$ maps the shape parameters $x$ to the electric/magnetic field $d$ at the measurement points.

Inverse Problem

Given (possibly noise-corrupted) measurements of $d$, infer $x$.

SLIDE 8

The Bayesian Statistical Inversion Framework

Bayes' Theorem

$$\pi_{\text{post}}(x \mid d) \propto \pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)$$

SLIDE 11

Bayes' theorem for inverse electromagnetic scattering

Prior knowledge: the obstacle is smooth:
$$\pi_{\text{pr}}(x) \propto \exp\left(-\int_0^{2\pi} r''(x)\, d\theta\right)$$

Likelihood: additive Gaussian noise, for example,
$$\pi_{\text{like}}(d \mid x) \propto \exp\left(-\frac{1}{2} \left\| G(x) - d \right\|^2_{C_{\text{noise}}}\right)$$
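To make these two ingredients concrete, here is a minimal numerical sketch (not from the talk) of evaluating an unnormalized log-posterior for a toy problem. The forward map `G`, the covariance, and the discrete smoothness penalty are illustrative assumptions standing in for the scattering setup above.

```python
import numpy as np

# Toy stand-ins (assumptions, not the talk's scattering problem):
# x: shape-like parameters, G: a hypothetical linear forward map.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
G = lambda x: A @ x                          # d = G(x)

x_true = np.array([1.0, -0.5, 0.3])
C_noise = 0.1 * np.eye(5)                    # noise covariance
C_inv = np.linalg.inv(C_noise)
d = G(x_true) + rng.multivariate_normal(np.zeros(5), C_noise)

def log_prior(x):
    # Crude discrete smoothness penalty, standing in for the
    # exp(-integral of r'') prior on the slide.
    return -np.sum(np.diff(x, n=2) ** 2)

def log_like(x):
    # log pi_like(d|x) = -1/2 ||G(x) - d||^2_{C_noise} + const
    r = G(x) - d
    return -0.5 * r @ C_inv @ r

def log_post_unnormalized(x):
    # Bayes' theorem: pi_post ∝ pi_like × pi_prior
    return log_like(x) + log_prior(x)

print(log_post_unnormalized(x_true), log_post_unnormalized(np.zeros(3)))
```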


SLIDE 14

Entropy

Definition

We define the uncertainty in a random variable $X$ distributed by $0 \le \pi(x) \le 1$ as
$$H(X) = -\int \pi(x) \log \pi(x)\, dx \ge 0$$
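As a quick illustration (not from the talk), the discrete analogue of this definition, $H(X) = -\sum_i \pi_i \log \pi_i$, is a one-liner:

```python
import numpy as np

# Discrete analogue of the slide's definition: H(X) = -sum_i pi_i log pi_i.
def entropy(pi):
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]                  # convention: 0 * log 0 = 0
    return -np.sum(pi * np.log(pi))

print(entropy([0.5, 0.5]))           # log 2 ≈ 0.693, the maximum for 2 outcomes
print(entropy([0.99, 0.01]))         # near 0: almost no uncertainty
```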

SLIDE 15

Entropy

[Photos: Wiener and Shannon, Kolmogorov; copied from Sergio Verdu]

Wiener: "...for it belongs to the two of us equally"
Shannon: "...a mathematical pun"
Kolmogorov: "...has no physical interpretation"

SLIDE 20

Entropy

Entropy of the uniform distribution

Let $U$ be a uniform random variable with values in $\mathcal{X}$, with $|\mathcal{X}| < \infty$:
$$\pi(u) := \frac{1}{|\mathcal{X}|} \;\Rightarrow\; H(U) = \log(|\mathcal{X}|)$$

How uncertain is the uniform random variable? It is the most uncertain:
$$H(X) \le H(U)$$
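A quick numerical check (illustrative, not from the talk) that $H(U) = \log|\mathcal{X}|$ bounds the entropy of any distribution on the same finite set:

```python
import numpy as np

rng = np.random.default_rng(2)

def entropy(pi):
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))

k = 6                                # |X| = 6, e.g. a die
H_U = np.log(k)                      # entropy of the uniform distribution
for _ in range(1000):
    pi = rng.dirichlet(np.ones(k))   # a random distribution on k outcomes
    assert entropy(pi) <= H_U + 1e-12
print("H(X) <= H(U) =", H_U, "held in all trials")
```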

SLIDE 25

100 years of uniform distribution

[Photo: Hermann Weyl; source: Christoph Aistleitner]

SLIDE 27

Gaussian and Maximum Entropy

Maximum entropy distribution: given $X$ with known mean and variance, which $\pi(x)$ has maximum entropy?
$$\max_{\pi(x)} H(X) = -\int \pi(x) \log(\pi(x))\, dx$$
subject to
$$\int x\, \pi(x)\, dx = \mu, \qquad \int (x - \mu)^2\, \pi(x)\, dx = \sigma^2, \qquad \int \pi(x)\, dx = 1$$

The Gaussian distribution: $\pi(x) = \mathcal{N}\left(\mu, \sigma^2\right)$.
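The closed-form differential entropies at a fixed variance make the claim easy to check; the sketch below (my numbers, not the talk's) compares the Gaussian against uniform and Laplace distributions of the same variance:

```python
import numpy as np

# Differential entropies at a common variance sigma^2 (standard closed forms):
# the Gaussian value 0.5*log(2*pi*e*sigma^2) is the largest.
sigma2 = 1.7

h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # N(mu, sigma^2)
h_uniform = 0.5 * np.log(12 * sigma2)               # uniform with matching variance
h_laplace = 1 + 0.5 * np.log(2 * sigma2)            # Laplace with matching variance

print(h_gauss, h_uniform, h_laplace)                # Gaussian is the largest
```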


SLIDE 32

Relative Entropy

Abraham Wald (1945), Harold Jeffreys (1945):
$$D(\pi \,\|\, q) := \int \pi(x) \log\left(\frac{\pi(x)}{q(x)}\right) dx$$

SLIDE 33

Kullback-Leibler divergence = Relative Entropy

Solomon Kullback (1951), Richard Leibler (1951):
$$D(\pi \,\|\, q) := \int \pi(x) \log\left(\frac{\pi(x)}{q(x)}\right) dx \;\stackrel{\text{discrete}}{=}\; \sum_i \pi_i \log\left(\frac{\pi_i}{q_i}\right)$$
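In its discrete form, relative entropy is again a one-liner (an illustrative sketch, not from the talk):

```python
import numpy as np

# Discrete relative entropy D(pi || q) = sum_i pi_i log(pi_i / q_i).
def kl(pi, q):
    pi, q = np.asarray(pi, float), np.asarray(q, float)
    mask = pi > 0                    # terms with pi_i = 0 contribute 0
    return np.sum(pi[mask] * np.log(pi[mask] / q[mask]))

print(kl([0.5, 0.5], [0.9, 0.1]))    # positive: the distributions differ
print(kl([0.5, 0.5], [0.5, 0.5]))    # zero exactly when pi = q
```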

SLIDE 35

Information Inequality

The most important inequality in information theory:
$$D(\pi \,\|\, q) \ge 0$$

Can we see it easily?
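One standard way to see it is Jensen's inequality: $-D(\pi\|q) = \int \pi \log(q/\pi) \le \log \int \pi \,(q/\pi) = \log \int q = 0$. A brute-force numerical check (illustrative) over random pairs of distributions:

```python
import numpy as np

rng = np.random.default_rng(3)

def kl(pi, q):
    mask = pi > 0
    return np.sum(pi[mask] * np.log(pi[mask] / q[mask]))

# Empirical check of D(pi||q) >= 0 over random pairs of distributions.
for _ in range(10000):
    pi, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    assert kl(pi, q) >= 0
print("D(pi||q) >= 0 held in all trials")
```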


SLIDE 38

From Relative Entropy to Bayes' Theorem

Toss a $k$-sided die $n$ times, with the prior distribution $\{p_i\}_{i=1}^k$ over the faces:
$$\sum_{i=1}^k p_i = 1$$

Let $n_i$ be the number of times we see face $i$: $n_i/n \to p_i$.

What is the likelihood that these $n$ faces are also distributed by the posterior distribution $\{q_i\}_{i=1}^k$, with $\sum_{i=1}^k q_i = 1$?

The likelihood of $\{n_i\}_{i=1}^k$ distributed by $\{q_i\}_{i=1}^k$ (multinomial distribution):
$$L := \frac{n!}{\prod_{i=1}^k n_i!}\, \prod_{i=1}^k q_i^{n_i}$$

SLIDE 43

From Relative Entropy to Bayes' Theorem

The likelihood of $\{n_i\}_{i=1}^k$ distributed by $\{q_i\}_{i=1}^k$:
$$L := \frac{n!}{\prod_{i=1}^k n_i!}\, \prod_{i=1}^k q_i^{n_i}$$

Take the log-likelihood:
$$\log L = \log(n!) - \sum_i \log(n_i!) + \sum_i n_i \log(q_i)$$

Stirling's approximation $\log n! \approx n \log(n) - n$ gives
$$\log L \approx n \log(n) - \sum_i n_i \log(n_i) + \sum_i n_i \log(q_i) + \underbrace{\sum_i n_i - n}_{=\,0}$$

Relative entropy = average (negative log-)likelihood:
$$\frac{1}{n}(-\log L) = \sum_i \frac{n_i}{n} \log\left(\frac{n_i/n}{q_i}\right) = \sum_i p_i \log\left(\frac{p_i}{q_i}\right) = D(p \,\|\, q)$$
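The Stirling step is easy to test numerically. The sketch below (illustrative, using exact log-factorials via `math.lgamma`) shows $-\frac{1}{n}\log L \to D(p\|q)$ as $n$ grows, with idealized counts $n_i = n p_i$:

```python
import math
import numpy as np

p = np.array([0.5, 0.3, 0.2])        # "prior" face frequencies
q = np.array([0.4, 0.4, 0.2])        # candidate "posterior" distribution

def neg_log_L_over_n(n):
    # Multinomial log-likelihood of idealized counts n_i = n * p_i under q,
    # using exact log-factorials: lgamma(m + 1) = log(m!).
    ni = n * p
    log_L = (math.lgamma(n + 1)
             - sum(math.lgamma(c + 1) for c in ni)
             + float(np.sum(ni * np.log(q))))
    return -log_L / n

D_pq = float(np.sum(p * np.log(p / q)))   # relative entropy D(p||q)
for n in [10, 1000, 100000]:
    print(n, neg_log_L_over_n(n), "vs D(p||q) =", D_pq)
```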

SLIDE 49

From Relative Entropy to Bayes' Theorem

Relative entropy = average likelihood $\to$ Bayes:
$$\frac{1}{n}(-\log L) = D(p \,\|\, q)$$

Write $\sum \to \int$:
$$-\int \log(L)\, p(x)\, dx = \int \log\left(\frac{p}{q}\right) p(x)\, dx$$

Bayes' theorem:
$$q(x) = L(x)\, p(x)$$

SLIDE 53

From Optimization to Bayes' Theorem

Inverse Problem

Given the observation model $d = G(x) + \varepsilon$.

Inverse task: given $d$, infer $x$.

Statistical inversion: prior knowledge $X \sim \pi_{\text{prior}}(x)$. Look for the posterior distribution $\pi_{\text{post}}(x)$ that combines the prior information with the information from the data.

The likelihood: assume $\varepsilon \sim \mathcal{N}(0, C)$, so
$$\pi_{\text{like}}(x) \propto \exp\left(-\frac{1}{2} \left\| d - G(x) \right\|^2_C\right)$$

SLIDE 57

From Optimization to Bayes' Theorem

Prior Elicitation

Try to get the best prior information, i.e., minimize the discrepancy relative to the posterior.

Conversely, with the best prior, the information gained in the posterior should not be large.

Equivalently,
$$\pi_{\text{post}} = \arg\min_{\pi(x)} D(\pi \,\|\, \pi_{\text{prior}}) = \arg\min_{\pi(x)} \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx$$

SLIDE 60

From Optimization to Bayes' Theorem

How about information from the data? We want to find $x$ that matches the data as well as we can.

"Equivalently": we want to find the posterior distribution such that $\|d - G(x)\|^2_C$ is minimized.

One approach: minimize the mean squared error,
$$\pi_{\text{post}} = \arg\min_{\pi(x)} \int \pi(x)\, \|d - G(x)\|^2_C\, dx = \arg\min_{\pi(x)} \left(-\int \pi(x) \log\left(\pi_{\text{like}}(x)\right) dx\right)$$
(the two minimizers coincide since $-\log \pi_{\text{like}}(x) = \frac{1}{2}\|d - G(x)\|^2_C$ up to an additive constant).

SLIDE 64

From Optimization to Bayes' Theorem

Prior + data information

From the prior:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} D(\pi \,\|\, \pi_{\text{prior}}) = \arg\min_{\pi(x)} \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx$$

From the likelihood:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} \left(-\int \pi(x) \log\left(\pi_{\text{like}}(x)\right) dx\right)$$

A Compromise:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} \left(-\int \pi(x) \log\left(\pi_{\text{like}}(x)\right) dx + \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx\right)$$
subject to $\int \pi(x)\, dx = 1$ and $\pi(x) \ge 0$.

SLIDE 67

From Optimization to Bayes' Theorem

Does the compromise problem above have a solution $\pi_{\text{post}}(x)$? Is it unique? How to solve it? Lagrangian + calculus of variations.

Solution = Bayes' theorem:
$$\pi_{\text{post}}(x \mid d) = \frac{\pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)}{\int \pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)\, dx}$$
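A quick way to believe the variational characterization without the calculus of variations is to discretize: on a grid, minimize the compromise objective over probability vectors and compare with the normalized product $\pi_{\text{like}} \times \pi_{\text{prior}}$. The sketch below is illustrative, not the talk's derivation; a softmax parameterization handles the constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Grid discretization of the compromise problem:
#   min_pi  -sum pi*log(like) + sum pi*log(pi/prior),  pi a probability vector.
x = np.linspace(-3, 3, 40)
prior = np.exp(-0.5 * x**2); prior /= prior.sum()
like = np.exp(-0.5 * (x - 1.0)**2 / 0.5**2)           # unnormalized likelihood

def objective(z):
    pi = np.exp(z - z.max()); pi /= pi.sum()          # softmax: pi >= 0, sums to 1
    return -np.sum(pi * np.log(like)) + np.sum(pi * np.log(pi / prior))

res = minimize(objective, np.zeros(x.size), method="L-BFGS-B")
pi_opt = np.exp(res.x - res.x.max()); pi_opt /= pi_opt.sum()

bayes = like * prior; bayes /= bayes.sum()            # Bayes' theorem directly
print("max |pi_opt - bayes| =", np.abs(pi_opt - bayes).max())   # ~ 0
```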


SLIDE 73

Conclusions

1. Information theory provides an intuitive and fresh view of Bayes' theorem
2. Relative entropy → Bayes' theorem
3. Optimization + information → Bayes' theorem