Thermodynamic Formalism and Uncertainty Quantification


SLIDE 1

Thermodynamic Formalism and Uncertainty Quantification
Luc Rey-Bellet, University of Massachusetts Amherst
Quantissima III, Venice, August 2019
Work supported by NSF and AFOSR

SLIDE 2

Collaborators on related projects

  • Paul Dupuis (Brown University),
  • Markos Katsoulakis (UMass Amherst)
  • Sung-Ha Hwang (KAIST)
  • Peter Plechac (U. of Delaware)
  • Yannis Pantazis (FORTH Crete)
  • Jeremiah Birrell (UMass Amherst)
  • Panagiota Birmpa (UMass Amherst)
  • Konstantinos Gourgoulias (UMass Amherst)
  • Jinchao Feng (UMass Amherst)
  • Jie Wang (UMass Amherst)
  • Sosung Baek (KAIST)


SLIDE 3

Some references:

[1] K. Chowdhary and P. Dupuis: Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification. ESAIM: M2AN, 47:635–662, 2013.
[2] R. Atar, K. Chowdhary, and P. Dupuis: Robust bounds on risk-sensitive functionals via Rényi divergence. SIAM/ASA Journal on UQ, 3:18–33, 2015.
[3] P. Dupuis, M. A. Katsoulakis, Y. Pantazis, and P. Plecháč: Path-space information bounds for uncertainty quantification and sensitivity analysis of stochastic dynamics. SIAM/ASA Journal on UQ, 4(1):80–111, 2016.
[4] M. Katsoulakis, L. Rey-Bellet, and J. Wang: Scalable information inequalities for uncertainty quantification. J. Comp. Phys. 336, 1, 513–545 (2017).
[5] K. Gourgoulias, M. Katsoulakis, L. Rey-Bellet, and J. Wang: How biased is your model? Concentration inequalities, information and model bias. To be published in IEEE Trans. Inf. Theory.

SLIDE 4

[6] P. Dupuis, M. Katsoulakis, Y. Pantazis, and L. Rey-Bellet: Sensitivity analysis for rare events based on Rényi divergence. To be published in Ann. Appl. Prob.
[7] J. Birrell and L. Rey-Bellet: Uncertainty quantification for Markov processes via variational principles and functional inequalities. Submitted. arXiv:1812.05174.
[8] J. Birrell and L. Rey-Bellet: Concentration inequalities and performance guarantees for hypocoercive samplers. Submitted. arXiv:1907.11973.
[9] J. Birrell, M. Katsoulakis, and L. Rey-Bellet: Robustness of dynamical quantities of interest via goal-oriented information theory. arXiv:1906.09282.
[10] S. Baek, S.-H. Hwang, and L. Rey-Bellet: Thermodynamical formalism and uncertainty quantification. In preparation.

...and several more to come.

SLIDE 5

UQ framework: Baseline model

→ Baseline model P (= probability measure on X). Think of it as a (tractable) model you use to compute or do analysis. Maybe obtained after inference and/or model reduction, and so on....

Most interestingly, you should think of P as high-dimensional, e.g., P_ν is the distribution of a process {X_t}_{0 ≤ t < ∞} with X_0 ∼ ν, or P is a Gibbs measure on Ω^{Z^d}.

In any case, we think there are possibly lots of large uncertainties in the model (model-form uncertainties): P IS NOT TO BE TRUSTED!!

SLIDE 6

UQ framework: Quantities of interest

Specific observables/statistics/quantities of interest = QoI:

  • E_P[f] (expectation)
  • Var_P(f) (variance), or Cov_P(f, g) / √(Var_P(f) Var_P(g)) (correlation)
  • Λ_{P,f}(c) = log E_P[e^{cf}] (risk-sensitive functional)
  • log P(A) ∼ log e^{−I(A)/ε} (probability of some rare event)
  • or maybe path-space QoI:
  • E_{P_ν}[ ∫_0^τ f(X_t) dt ], where τ is a stopping time
  • E_{P_ν}[ (1/T) ∫_0^T f(X_s) ds ], that is, ergodic averages
  • E_{P_ν}[ ∫_0^∞ e^{−λs} f(X_s) ds ], that is, discounted observables
  • and so on....

SLIDE 7

UQ framework: Non-parametric stress tests

→ Family of alternative models Q. Think of it as describing the true but "unknowable" or partially known models. Set

    Q^η = { Q : Q is η-"close" to P }

Given a QoI f, can one find uncertainty bounds or performance guarantees

    inf_{Q ∈ Q^η} E_Q[f] ≤ E_P[f] ≤ sup_{Q ∈ Q^η} E_Q[f] ?

and similarly for other quantities. The bounds should be tight and computable (numerically or analytically).

→ Robustness, cf. the book by Hansen (Nobel 2013) and Sargent (Nobel 2011)
→ Stress tests in operations research, finance, etc....

SLIDE 8

UQ framework: Distances and divergences

Which measure of distance, or pseudo-distance (divergence), should one use?

→ Use information-theoretic concepts to measure the information loss between Q and P.

  • Relative entropy (a.k.a. Kullback–Leibler divergence):

    R(Q||P) = E_Q[ log dQ/dP ]

  • Relative Rényi entropy (a.k.a. Rényi divergence): for α ≠ 0, 1,

    R_α(Q||P) = 1/(α(α−1)) log E_P[ (dQ/dP)^α ] = 1/(α(α−1)) log E_P[ e^{α log dQ/dP} ]

  • Note that

    R_α(Q||P) → R(Q||P) as α → 1,    R_α(Q||P) → R(P||Q) as α → 0
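A quick numerical sanity check of these two limits (plain Python; the three-point distributions are made up for illustration), using the talk's 1/(α(α−1)) normalization of the Rényi divergence:

```python
import math

def kl(q, p):
    """Relative entropy R(Q||P) = sum_i q_i log(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def renyi(q, p, alpha):
    """Rényi divergence with the talk's 1/(alpha*(alpha-1)) normalization:
    R_alpha(Q||P) = 1/(alpha*(alpha-1)) * log E_P[(dQ/dP)^alpha]."""
    s = sum(pi * (qi / pi) ** alpha for qi, pi in zip(q, p))
    return math.log(s) / (alpha * (alpha - 1))

p = [0.5, 0.3, 0.2]
q = [0.6, 0.2, 0.2]

# alpha -> 1 recovers R(Q||P); alpha -> 0 recovers R(P||Q)
assert abs(renyi(q, p, 1 + 1e-6) - kl(q, p)) < 1e-4
assert abs(renyi(q, p, 1e-6) - kl(p, q)) < 1e-4
```

With this normalization one also gets the symmetry R_α(Q||P) = R_{1−α}(P||Q), which the same code confirms.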

SLIDE 9

UQ framework: Distances and divergences

  • Scalability: If Q^{0:T} and P^{0:T} are the distributions of the process restricted to the time window [0, T] then, typically,

    R_α(Q^{0:T}||P^{0:T}) = O(T) as T → ∞,

i.e., information is additive. For the relative entropy we have the chain rule, which is even better (not asymptotic in T).

  • Information processing inequality: If F is a sub-σ-algebra then

    R_α(Q|_F || P|_F) ≤ R_α(Q||P)

  • What is the right divergence for the QoI?
  • Not the whole story:

→ Heavy-tailed observables may require other entropies (f-divergences)
→ Wasserstein-type distances are needed if Q is not absolutely continuous with respect to P....

SLIDE 10

What is wrong with CKP? Scalability

Csiszár–Kullback–Pinsker inequality:

    |E_Q[f] − E_P[f]| ≤ √(2 R(Q||P)) ‖f − E_P[f]‖_∞

Take e.g. Markov measures P = P^{0:T} and Q = Q^{0:T} and F_T = (1/T) ∫_0^T f(X_s) ds. Then ‖F_T‖_∞ = ‖f‖_∞ = O(1) and R(Q^{0:T}||P^{0:T}) = O(T), and so

    |E_{Q^{0:T}}[F_T] − E_{P^{0:T}}[F_T]|   (= O(1))
      ≤ √(2 R(Q^{0:T}||P^{0:T})) ‖F_T − E_P[F_T]‖_∞   (= O(√T))

CKP does not scale correctly! Note though that Var_{P^{0:T}}[F_T] = O(1/T), so one would need the variance instead of the sup norm.
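The mismatch is easy to see numerically. A sketch with a hypothetical Bernoulli(0.5) baseline, a Bernoulli(0.55) alternative, and i.i.d. product measures over T steps (so that R(Q^{0:T}||P^{0:T}) = T·R(Q||P) exactly):

```python
import math

# Bernoulli baseline P and alternative Q; f = identity on {0, 1}
p, q = 0.5, 0.55
r_step = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))  # R(Q||P) per step

bias = abs(q - p)                 # |E_Q[F_T] - E_P[F_T]|, independent of T
sup_norm = max(p, 1 - p)          # ||f - E_P[f]||_inf = 1/2

for T in [10, 100, 1000]:
    ckp = math.sqrt(2 * T * r_step) * sup_norm  # CKP right-hand side, O(sqrt(T))
    assert bias <= ckp            # the bound is valid...
# ...but grows without bound while the actual bias stays O(1):
assert math.sqrt(2 * 1000 * r_step) * sup_norm > 10 * bias
```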

SLIDE 11

Gibbs variational principle, a.k.a. F = U − TS

  • Relative entropy (a.k.a. Kullback–Leibler divergence):

    R(Q||P) = E_Q[ log dQ/dP ]  if Q ≪ P,   +∞ otherwise

R(Q||P) is a divergence, that is, R(Q||P) ≥ 0 and R(Q||P) = 0 if and only if Q = P.

  • Gibbs variational principle for the relative entropy (convex duality):

    log E_P[e^f] = sup_Q { E_Q[f] − R(Q||P) }

with the supremum attained if and only if

    dQ = dQ_f = e^f dP / E_P[e^f]

This plays a central role in statistical mechanics, in large deviation theory, and in dynamical systems.
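A minimal numerical check of the variational principle and its tilted optimizer on a made-up three-point space:

```python
import math, random

p = [0.5, 0.3, 0.2]
f = [1.0, -0.5, 2.0]

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

lhs = math.log(sum(pi * math.exp(fi) for pi, fi in zip(p, f)))

# tilted optimizer dQ_f = e^f dP / E_P[e^f]
z = sum(pi * math.exp(fi) for pi, fi in zip(p, f))
q_opt = [pi * math.exp(fi) / z for pi, fi in zip(p, f)]
opt_val = sum(qi * fi for qi, fi in zip(q_opt, f)) - kl(q_opt, p)
assert abs(lhs - opt_val) < 1e-12   # supremum attained at the tilted measure

# any other Q gives a smaller value of E_Q[f] - R(Q||P)
random.seed(0)
for _ in range(100):
    w = [random.random() for _ in p]
    q = [wi / sum(w) for wi in w]
    assert sum(qi * fi for qi, fi in zip(q, f)) - kl(q, p) <= lhs + 1e-12
```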

SLIDE 12

Gibbs information inequality

From the Gibbs variational principle, for any Q and c ≥ 0,

    E_Q[±cf] ≤ log E_P[e^{±cf}] + R(Q||P)

Theorem (Gibbs information inequality):

    − inf_{c>0} [ (Λ(−c) + R(Q||P)) / c ]  ≤  E_Q[f] − E_P[f]  ≤  inf_{c>0} [ (Λ(c) + R(Q||P)) / c ]

where the left- and right-hand sides are −Ξ_{P,−f}(R(Q||P)) and Ξ_{P,f}(R(Q||P)), with

    Ξ_{P,f}(η) ≡ inf_{c>0} (Λ(c) + η) / c,    Λ(c) = log E_P[e^{c(f − E_P[f])}] = log E_P[e^{cf}] − c E_P[f]

How good is it? (Long history... Dupuis; Bobkov; Boucheron, Lugosi, Massart; Breuer, Csiszár, etc...)

SLIDE 13

Properties of the Gibbs information inequality

Ξ_{P,f}(R(Q||P)) is a divergence, i.e., Ξ_{P,f}(η) ≥ 0 and Ξ_{P,f}(η) = 0 ⇔ η = 0 (i.e., Q = P) or f = const.

Moreover, the Gibbs information inequality is tight: given the family of alternative models Q^η = { Q : R(Q||P) ≤ η } we have

    Ξ_{P,f}(η) = max_{Q ∈ Q^η} { E_Q[f] − E_P[f] }

and the maximum is attained at Q^η ∈ Q^η with

    dQ^η/dP = e^{c(η)f} / E_P[e^{c(η)f}]

with c(η) such that R(Q^η||P) = η, and of course similarly for the min.
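A sketch of the tightness property on a made-up four-point space: tilt P by e^{c*f}, set η = R(Q||P) for the tilted Q, and compare E_Q[f] − E_P[f] against a grid approximation of Ξ_{P,f}(η):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]
f = [0.0, 1.0, 2.0, 3.0]
Ef = sum(pi * fi for pi, fi in zip(p, f))

def cgf(c):
    """Centered CGF: Lambda(c) = log E_P[e^{c(f - E_P[f])}]."""
    return math.log(sum(pi * math.exp(c * (fi - Ef)) for pi, fi in zip(p, f)))

def xi(eta, cs):
    """Xi_{P,f}(eta) = inf_{c>0} (Lambda(c) + eta)/c, approximated on a grid."""
    return min((cgf(c) + eta) / c for c in cs)

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

cs = [0.001 * k for k in range(1, 20000)]

# tilted measure at a chosen c*: dQ/dP = e^{c* f} / E_P[e^{c* f}]
c_star = 0.7
z = sum(pi * math.exp(c_star * fi) for pi, fi in zip(p, f))
q = [pi * math.exp(c_star * fi) / z for pi, fi in zip(p, f)]
eta = kl(q, p)
gap = sum(qi * fi for qi, fi in zip(q, f)) - Ef

val = xi(eta, cs)
assert gap <= val + 1e-9        # the bound holds...
assert abs(gap - val) < 1e-3    # ...and is attained by the tilted measure
```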

SLIDE 14

Concentration / UQ duality

Recall: If X_1, X_2, ... are IID copies with (centered) MGF Λ(c) for f(X), then by the Chernoff bound

    P( (1/N) Σ_{k=1}^N f(X_k) − E_P[f] > x ) ≤ e^{−N Λ*(x)}    (concentration)

and, by the Cramér and Sanov theorems and the contraction principle,

    Λ*(x) = sup_c { xc − Λ(c) }    (Legendre transform)
          = inf_Q { R(Q||P) : E_Q[f] − E_P[f] = x }    ("entropy maximization")

versus (duality of optimization problems)

    (Λ*)^{-1}_±(η) = inf_{c≥0} (Λ(±c) + η) / c    (Fenchel–Young)
                   = sup_Q { ±(E_Q[f] − E_P[f]) : R(Q||P) = η }    (UQ bounds)
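In the Gaussian case Λ(c) = σ²c²/2 everything is explicit (Λ*(x) = x²/2σ² and (Λ*)^{-1}(η) = √(2σ²η)), so both sides of the duality can be confirmed by a grid computation (the numbers are arbitrary):

```python
import math

sigma2 = 2.0  # variance of f(X) under P; Gaussian case: Lambda(c) = sigma^2 c^2 / 2

def cgf(c):
    return 0.5 * sigma2 * c * c

def legendre(x, cs):
    """Lambda*(x) = sup_c { xc - Lambda(c) } on a grid."""
    return max(x * c - cgf(c) for c in cs)

def inv_rate(eta, cs):
    """(Lambda*)^{-1}(eta) = inf_{c>0} (Lambda(c) + eta)/c on a grid."""
    return min((cgf(c) + eta) / c for c in cs)

cs = [0.001 * k for k in range(1, 10000)]
x, eta = 1.5, 0.8

assert abs(legendre(x, cs) - x * x / (2 * sigma2)) < 1e-4          # x^2 / 2 sigma^2
assert abs(inv_rate(eta, cs) - math.sqrt(2 * sigma2 * eta)) < 1e-4  # sqrt(2 sigma^2 eta)
```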

SLIDE 15

Linearization / variance

Linearization: For small η = R(Q||P) one has the asymptotic expansion

    Ξ_{P,f}(η) = √(2 Var_P[f] η) + (1/3) √(Var_P[f]) γ_P(f) η + O(η^{3/2})

where γ_P(f) = E_P[(f − E_P[f])^3] / Var_P[f]^{3/2} is the skewness.

→ For small perturbations of P, UQ is driven by CLT fluctuations, in the linear regime.
→ For large perturbations of P, UQ is driven by rare events, or rather concentration of measure.
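A numerical check of the expansion on a made-up finite-state example (with Var_P[f] = 1 and skewness 0.6 by construction):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]
f = [0.0, 1.0, 2.0, 3.0]
Ef = sum(pi * fi for pi, fi in zip(p, f))
var = sum(pi * (fi - Ef) ** 2 for pi, fi in zip(p, f))
skew = sum(pi * (fi - Ef) ** 3 for pi, fi in zip(p, f)) / var ** 1.5

def xi(eta):
    """Grid approximation of Xi_{P,f}(eta) = inf_{c>0} (Lambda(c) + eta)/c."""
    cs = [1e-4 * k for k in range(1, 50000)]
    return min((math.log(sum(pi * math.exp(c * (fi - Ef)) for pi, fi in zip(p, f))) + eta) / c
               for c in cs)

# the two-term expansion tracks Xi to higher order than the leading sqrt term
for eta, tol in [(1e-3, 1e-4), (1e-4, 1e-5)]:
    expansion = math.sqrt(2 * var * eta) + (1.0 / 3) * math.sqrt(var) * skew * eta
    assert abs(xi(eta) - expansion) < tol

lead = math.sqrt(2 * var * 1e-3)
assert abs(xi(1e-3) - lead) > abs(xi(1e-3) - (lead + (1.0 / 3) * math.sqrt(var) * skew * 1e-3))
```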

SLIDE 16

Markov processes: choosing the right path-space entropy

Baseline: Markov process X_t with path-space measure P^{0:T}.
Alternative: stochastic process Y_t with path-space measure Q^{0:T} (not necessarily Markovian!) and Q^{0:T} ≪ P^{0:T}.

The idea is to restrict the relative entropy to a sub-σ-algebra tailored to the observables at hand.

  • Ergodic averages: apply the inequality to F_T = ∫_0^T f(X_t) dt:

    E_Q[F_T/T] − E_P[F_T/T] ≤ inf_{c>0} [ (1/T) log E_P[e^{c(F_T − E_P[F_T])}] + (1/T) R(Q^{0:T}_{ν_0} || P^{0:T}_{μ_0}) ] / c

  • Under suitable ergodicity assumptions for X_t, the bounds scale as T → ∞. The important quantity is the relative entropy rate (it scales nicely with T, as we shall see later)...

SLIDE 17
  • Ergodic averages: statistical mechanics.

P = Gibbs measure on Ω^{Z^d} (Ω a finite set) with potential Φ.
Q = any translation-invariant measure on Ω^{Z^d}. The relative entropy rate

    r(Q||P) = lim_{V ↗ Z^d} (1/|V|) R(Q|_V || P|_V)

always exists and is finite.

Theorem: For a (quasilocal) observable f,

    − inf_{c>0} [ (λ(−c) + r(Q||P)) / c ] ≤ E_Q[f] − E_P[f] ≤ inf_{c>0} [ (λ(c) + r(Q||P)) / c ]

where λ(c) = P(Φ + cΨ_f) − P(Φ) is the translated pressure (that is, with local Hamiltonian H_V + c Σ_{x∈V} τ_x(f)).

SLIDE 18
  • Stopping time τ and QoI F_τ = ∫_0^τ f(X_t) dt.

It is natural to restrict the relative entropy to the σ-algebra F_τ:

    E_Q[F_τ] − E_P[F_τ] ≤ inf_{c>0} [ log E_P[e^{c(F_τ − E_P[F_τ])}] + R(Q^{0:τ} || P^{0:τ}) ] / c

Just stop the process....
  • Discounted observable QoI G_λ(f) = ∫_0^∞ f(X_t) λ e^{−λt} dt.

Define a new measure P_λ: X_t runs up to a random time T with exponential distribution with mean 1/λ. Then

    R(Q_λ || P_λ) = ∫ R(Q^{0:t} || P^{0:t}) λ e^{−λt} dt    (discounted entropy)

and

    E_Q[G_λ(f)] − E_P[G_λ(f)] ≤ inf_{c>0} [ log E_{P_λ}[e^{c(f(X_T) − E_{P_λ}[f(X_T)])}] + R_λ(Q||P) ] / c
SLIDE 19

UQ for statistical estimators / mean-field formalism

How do we get UQ bounds for non-linear functionals of P, for example the variance or the skewness

    Var_P[f(X)],    γ_P[f] = E_P[(f − E_P[f(X)])^3] / Var_P[f(X)]^{3/2}

or more general statistical estimators?

A fundamental result in large deviations, the Laplace principle (Varadhan, Bryc, Dupuis–Ellis): the sequence S_N taking values in Y satisfies an LDP with rate function I(y) if and only if, for all Φ : Y → R bounded and continuous,

    lim_{N→∞} (1/N) log E_P[e^{N Φ(S_N)}] = sup_y { Φ(y) − I(y) }

SLIDE 20

Example: UQ for the variance

Build a statistical estimator for the variance:

    (1/N) Σ_{i=1}^N f(X_i)^2 − ( (1/N) Σ_{i=1}^N f(X_i) )^2 → Var_P[f]

where the X_i are IID copies of X. Apply the Gibbs information inequality to the statistical estimator to find:

Theorem (Gibbs UQ bounds for the variance):

    − inf_{c>0} [ (H(−c) + R(Q||P)) / c ] ≤ Var_Q[f] ≤ inf_{c>0} [ (H(c) + R(Q||P)) / c ]

where

    H(c) = lim_{N→∞} (1/N) log E_{P^{0:N}}[ exp( c ( Σ_{i=1}^N f(X_i)^2 − (1/N) (Σ_{i=1}^N f(X_i))^2 ) ) ]

SLIDE 21

Using the Laplace principle for the joint (f(X), f^2(X)) one finds the convex function

    H(c) = sup_{(u,v) ∈ R^2} { c(v − u^2) − I(u, v) }

where

    Λ(α, β) = log E_P[e^{α f(X) + β f^2(X)}]    (cumulant generating function)
    I(u, v) = sup_{α,β} { αu + βv − Λ(α, β) }    (rate function in Cramér's theorem)

The inequality is tight, with optimizer

    dQ_{α,β} = e^{α f + β f^2} dP / E_P[e^{α f + β f^2}]

for suitable α and β such that R(Q_{α,β}||P) = η. This generalizes to general statistical estimators.
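On a two-point space H(c) can be computed by brute force, using the representation H(c) = sup_Q { c·Var_Q[f] − R(Q||P) } that follows from combining the Laplace principle with the contraction principle; a sketch with an arbitrary Bernoulli baseline:

```python
import math

p = 0.3  # baseline P = Bernoulli(p), f = identity on {0, 1}

def kl(q):
    """R(Q||P) for Q = Bernoulli(q)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# mean-field CGF of the variance estimator:
#   H(c) = sup_Q { c * Var_Q[f] - R(Q||P) }  (Laplace principle + contraction)
grid = [k / 2000 for k in range(1, 2000)]
pre = [(q * (1 - q), kl(q)) for q in grid]

def H(c):
    return max(c * v - r for v, r in pre)

# Gibbs-type UQ bound for the variance of an alternative Q = Bernoulli(0.45):
#   Var_Q[f] <= inf_{c>0} (H(c) + R(Q||P)) / c
q_alt = 0.45
var_q = q_alt * (1 - q_alt)
cs = [0.1 * k for k in range(1, 201)]
upper = min((H(c) + kl(q_alt)) / c for c in cs)

assert var_q <= upper + 1e-6   # the bound holds
assert upper < 0.25            # and is nontrivial (< max variance on {0, 1})
```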

SLIDE 22

Rare events and risk-sensitive functionals

UQ for rare events: P(A) ∼ e^{−I(A)/ε} (rare-event probability). We really want to control I(A) = −ε log P(A). More generally, we consider risk-sensitive functionals

    log E_P[e^{cf}]  with c large    (free energy)

Relative Rényi entropy (a.k.a. Rényi divergence): for α ≠ 0, 1,

    R_α(Q||P) = 1/(α(α−1)) log E_P[ (dQ/dP)^α ] = 1/(α(α−1)) log E_P[ e^{α log dQ/dP} ]
SLIDE 23

Variational principle for the relative Rényi entropy

An extension of the Gibbs variational principle, proved by Atar, Chowdhary, and Dupuis. Recall, for α ≠ 0, 1,

    R_α(Q||P) = 1/(α(α−1)) log E_P[ (dQ/dP)^α ]

Rényi variational principle (Atar, Chowdhary, Dupuis):

    (1/β) log E_Q[e^{βg}] = inf_{γ > β} { (1/γ) log E_P[e^{γg}] + (1/(γ−β)) R_{γ/(γ−β)}(Q||P) }

    (1/β) log E_Q[e^{βg}] = sup_{γ < β} { (1/γ) log E_P[e^{γg}] − (1/(β−γ)) R_{β/(β−γ)}(Q||P) }

SLIDE 24

UQ bounds for risk-sensitive functionals

    sup_{β < γ} { (1/β) log E_P[e^{βg}] − (1/(γ−β)) R_{γ/(γ−β)}(Q||P) } ≤ (1/γ) log E_Q[e^{γg}]

    (1/γ) log E_Q[e^{γg}] ≤ inf_{β > γ} { (1/β) log E_P[e^{βg}] + (1/(β−γ)) R_{β/(β−γ)}(Q||P) }

  • One can prove similar tightness properties as well.

To treat rare events, take g = −M 1_{A^c}, let M → ∞, and relabel the indices:

UQ bounds for rare events:

    − inf_{α > 0} { ( log E_P[e^{−α log dQ/dP}] − log P(A) ) / α } ≤ log Q(A) − log P(A)

    log Q(A) − log P(A) ≤ inf_{α > 1} { ( log E_P[e^{α log dQ/dP}] − log P(A) ) / α }

Similar optimization problem as before.
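For Gaussians the rare-event bounds are fully explicit: if P = N(0,1) and Q = N(θ,1), then log E_P[(dQ/dP)^α] = α(α−1)θ²/2 in closed form. A sketch checking both bounds for the tail event A = {X > x} (θ and x are arbitrary):

```python
import math

def log_tail(x):
    """log P(N(0,1) > x) via the complementary error function."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2)))

theta, x = 0.5, 2.0            # Q = N(theta, 1), P = N(0, 1), event A = {X > x}
logP = log_tail(x)             # log P(A)
logQ = log_tail(x - theta)     # log Q(A)

# closed form: log E_P[(dQ/dP)^a] = a(a-1) theta^2 / 2
def log_mgf(a):
    return a * (a - 1) * theta ** 2 / 2

alphas = [0.01 * k for k in range(1, 2000)]
upper = min((log_mgf(a) - logP) / a for a in alphas if a > 1)
lower = -min((log_mgf(-a) - logP) / a for a in alphas)

assert lower <= logQ - logP <= upper
```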

SLIDE 25

Making it computable with concentration inequalities

Some examples (much more in Gourgoulias, Katsoulakis, R.-B., Wang):

  • If a ≤ f ≤ b we have Hoeffding's inequality

    Λ(c) ≤ c^2 (b − a)^2 / 8 ≤ c^2 ‖f − E_P[f]‖_∞^2 / 2

and then

    Ξ_{P,f}(η) ≤ √(2η) ‖f − E_P[f]‖_∞    (Csiszár–Kullback–Pinsker)

  • If f is bounded and Var_P[f] = σ^2 then we have the Bernstein inequality

    Λ(c) ≤ c^2 σ^2 / ( 2 (1 − c ‖f − E_P[f]‖_∞) )

and then

    Ξ_{P,f}(η) ≤ √(2 Var_P[f] η) + ‖f − E_P[f]‖_∞ η

This beats Pinsker if η is not too big (especially if σ^2 is small) and captures the exact small-η asymptotics.

  • Many more: sharper inequalities for bounded f, and others for Poissonian, Gaussian, exponential tails....
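A quick comparison of the two bounds against a grid computation of the exact Ξ_{P,f} for a made-up Bernoulli baseline:

```python
import math

p = 0.2                       # P = Bernoulli(p), f = identity on {0, 1}
var = p * (1 - p)
sup_norm = max(p, 1 - p)      # ||f - E_P[f]||_inf

def cgf(c):
    """Exact centered CGF of f under P."""
    return math.log((1 - p) * math.exp(-c * p) + p * math.exp(c * (1 - p)))

cs = [0.001 * k for k in range(1, 5000)]

def xi(eta):
    return min((cgf(c) + eta) / c for c in cs)

for eta in [0.001, 0.01, 0.05]:
    exact = xi(eta)
    pinsker = math.sqrt(2 * eta) * sup_norm
    bernstein = math.sqrt(2 * var * eta) + sup_norm * eta
    assert exact <= pinsker + 1e-9
    assert exact <= bernstein + 1e-9
    assert bernstein < pinsker   # Bernstein wins here since var << sup_norm^2
```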

SLIDE 26

Steady-state UQ bounds for ergodic Markov processes

Consider ergodic averages (1/T) ∫_0^T f(X_s) ds; then, using the Gibbs UQ bound, one obtains the steady-state bias bound

    − ξ_{P,−f}(r(Q||P)) ≤ lim_{T→∞} (1/T) ∫_0^T f(Y_s) ds  [true process]  − E_μ[f]  [baseline]  ≤ ξ_{P,f}(r(Q||P))

where

    ξ_{P,f}(η) = inf_{c>0} (λ(c) + η) / c
    λ(c) = lim_{T→∞} (1/T) log E_{P^{0:T}}[ e^{c ∫_0^T (f(X_s) − E_μ[f]) ds} ]    (CGF)
    η = lim_{T→∞} (1/T) R(Q^{0:T}||P^{0:T})    (relative entropy rate)
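For finite-state Markov chains the relative entropy rate is explicit, and if both chains are started from the stationary distribution of Q the chain rule gives R(Q^{0:T}||P^{0:T}) = T·r(Q||P) exactly; a brute-force check on a two-state example (the transition matrices are made up):

```python
import math, itertools

# two-state Markov chains: baseline P and alternative Q (row-stochastic)
P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.85, 0.15], [0.25, 0.75]]

def stationary(M):
    """Stationary distribution of a 2x2 chain."""
    a, b = M[0][1], M[1][0]
    return [b / (a + b), a / (a + b)]

mu_q = stationary(Q)

# relative entropy rate r(Q||P) = sum_x mu_Q(x) sum_y Q(x,y) log(Q(x,y)/P(x,y))
rate = sum(mu_q[x] * Q[x][y] * math.log(Q[x][y] / P[x][y])
           for x in range(2) for y in range(2))

def path_kl(T):
    """R(Q^{0:T}||P^{0:T}) by brute-force path enumeration, both chains
    started from mu_Q (so every Q-marginal is stationary)."""
    total = 0.0
    for path in itertools.product([0, 1], repeat=T + 1):
        lq = lp = math.log(mu_q[path[0]])
        for s, t in zip(path, path[1:]):
            lq += math.log(Q[s][t])
            lp += math.log(P[s][t])
        total += math.exp(lq) * (lq - lp)
    return total

for T in [1, 4, 8]:
    assert abs(path_kl(T) - T * rate) < 1e-9  # exactly linear in T
```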

SLIDE 27

Coercive dynamics

Langevin equation: dX_t = (−∇V(X_t) + J∇V(X_t)) dt + √2 dW_t, for any antisymmetric J, has invariant measure dμ = Z^{-1} e^{−V} dx, and we have

    L = (Δ − ∇V·∇)  [symmetric]  +  J∇V·∇  [antisymmetric]

  • Main idea (from Liming Wu): bound the Feynman–Kac semigroup

    e^{T(L+V)} h(x) = E_{P^{0:T}_{δ_x}}[ e^{∫_0^T V(X_s) ds} h(X_T) ]

using the Lumer–Phillips theorem:

    (1/T) log ‖e^{T(L+V)}‖_{L^2(μ)} ≤ sup { ⟨g, Lg⟩_{L^2(μ)} + ∫ V |g|^2 dμ : ‖g‖_2 = 1 }

See the works on concentration inequalities by Lezaud, Wu, Cattiaux, Guillin and collaborators, on which we rely here.

SLIDE 28

Poincaré inequalities and bounded f

Theorem: If we have a Poincaré inequality (spectral gap)

    Var_μ[f] ≤ −α ⟨f, Lf⟩_{L^2(μ)},    f ∈ D(L)

then for bounded f and general L

    λ(c) ≤ c^2 α Var_μ[f] / ( 1 − α c ‖f − E_μ[f]‖_∞ )

and the Bernstein-type bound

    ξ_{P,f}(η) ≤ 2 √( α Var_μ[f] η ) + α ‖f − E_μ[f]‖_∞ η

Theorem: For symmetric L we have the sharper bound

    λ(c) ≤ c^2 σ^2(f) / ( 2 (1 − αc ‖f‖_∞) )

and the Bernstein-type bound

    ξ_{P,f}(η) ≤ √( 2 σ^2(f) η ) + α ‖f − E_μ[f]‖_∞ η

(sharp for small η, with σ^2(f) the asymptotic variance).

SLIDE 29

Log-Sobolev inequalities and unbounded f

Assume a stronger log-Sobolev inequality:

    E_μ[f^2 log(f^2)] − E_μ[f^2] log E_μ[f^2] ≤ −β ⟨f, Lf⟩,    f ∈ D(L)

Then, using the Gibbs variational principle, one gets the bound

    ξ_{P,f}(η) ≤ inf_{c>0} ( log E_μ[e^{c(f − E_μ[f])}] + βη ) / c

The only trace of the dynamics is left in the constant β: the tail behavior of f in the stationary distribution determines the UQ. Use another concentration inequality.

If V(x) ∼ |x|^β (with the usual bounds on ∇V and ΔV...):

  • Poincaré for β > 1
  • log-Sobolev for β > 2, so UQ bounds for V(X) itself

For 1 < β ≤ 2 we can use F-Sobolev inequalities to consider unbounded f.

SLIDE 30

Hypocoercive samplers

Goal: To sample from ν(dq) ∝ e^{−βV(q)} dq, extend the phase space and sample from the measure

    μ(dp, dq) = ν(dq) π(dp) ∝ e^{−β(V(q) + p^2/2m)} dp dq

(You can use other distributions of p too.)

Why? Add extra dimensions to escape your bad karma.... Make the dynamics irreversible to get faster (maybe).

  • Ex. 1: Langevin equation

    dq_t = (p_t/m) dt,    dp_t = ( −∇V(q_t) − γ p_t/m ) dt + √(2γ/β) dW_t    (1)

    L = (p^T/m) ∇_q − ∇V^T ∇_p  [T = −T*]  +  γ ( (1/β) Δ_p − (p^T/m) ∇_p )  [S = S*]
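A minimal Euler–Maruyama sketch of Ex. 1, under assumptions not on the slide: toy potential V(q) = q²/2, m = β = γ = 1, and the standard √(2γ/β) noise amplitude. This is an illustration, not the speaker's code:

```python
import math, random

def langevin_chain(n_steps, dt=0.01, gamma=1.0, beta=1.0, m=1.0, seed=0):
    """Euler-Maruyama discretization of
       dq = (p/m) dt,  dp = (-V'(q) - gamma p/m) dt + sqrt(2 gamma / beta) dW
    for the toy potential V(q) = q^2/2."""
    rng = random.Random(seed)
    q, p = 0.0, 0.0
    sd = math.sqrt(2 * gamma / beta * dt)
    qs = []
    for _ in range(n_steps):
        grad_v = q                      # V'(q) for V(q) = q^2/2
        q += p / m * dt
        p += (-grad_v - gamma * p / m) * dt + sd * rng.gauss(0.0, 1.0)
        qs.append(q)
    return qs

qs = langevin_chain(200_000)
mean_q = sum(qs) / len(qs)
var_q = sum(x * x for x in qs) / len(qs) - mean_q ** 2

# the q-marginal of mu is N(0, 1/beta) = N(0, 1) for this toy potential
assert abs(mean_q) < 0.2
assert 0.7 < var_q < 1.3
```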

SLIDE 31
  • Ex. 2: Randomized Hamiltonian Monte Carlo.

The particle follows the Hamiltonian equations of motion

    dq_t = (p_t/m) dt,    dp_t = −∇V(q_t) dt

without noise or dissipation, for a random amount of time, at which point we resample the momentum according to the stationary measure. With the projection Πf = ∫ f(p, q) dπ(p), the generator is

    L = (p^T/m) ∇_q − ∇V^T ∇_p  [T = −T*]  +  λ(Π − I)  [S = S*]    (2)

SLIDE 32
  • Ex. 3: Bouncy particle sampler.

The particle follows straight lines for a random time. At updating times one either resamples the momentum according to the stationary measure, or the particle "bounces", i.e., it undergoes a Newtonian elastic collision on the hyperplane tangential to the gradient of the energy, and the momentum is updated according to the rule

    r(q)p = p − 2 ( p^T ∇V(q) / ‖∇V‖^2 ) ∇V,    Rf(p, q) = f(q, r(q)p)    (3)

The generator is

    L = (p/m)^T ∇_q  [free motion]  +  ( (p/m)^T ∇V(q) )_+ (R − I)  [bouncing]  +  λ(Π − I)  [noise]    (4)

  • Zig-zag sampler..... etc...
  • Temperature-accelerated molecular dynamics
  • Ask Gabriel Stoltz.

SLIDE 33

Hypocoercivity

Dolbeault–Mouhot–Schmeiser; Andrieu–Durmus–Nüsken–Roussel; Rousset–Stoltz–Trstanová; Olla, ... after many other works (Villani, Hérau–Nier, Hairer–Eckmann).

Idea: The dynamics is not coercive (no Poincaré inequality in L^2(μ) for L), but there exists a scalar product, equivalent to that of L^2(μ), in which a Poincaré inequality holds:

    ⟨f, g⟩_ε = ⟨f, g⟩ + ε ⟨f, (B + B*) g⟩,    B = (1 + (TΠ)*(TΠ))^{-1} (−TΠ)*

where T is the antisymmetric part of the generator.

Modified Poincaré inequality:

    −⟨Lg, g⟩_ε ≥ Λ(ε) Var_μ(g)    (5)

and Λ(ε) is explicitly expressed in terms of the Poincaré constant for ν(dq), the spectral gap of the noise operator, and the potential V....

SLIDE 34

Performance guarantees for hypocoercive samplers

New results (Jeremiah Birrell and L. R.-B.):

Theorem (Bernstein-type inequalities for hypocoercive samplers): For bounded f we have

    P_{μ_0}( (1/T) ∫_0^T f(X_t) dt − ∫ f dμ ≥ r ) ≤ a(ε) ‖dμ_0/dμ‖_{L^2(μ)} exp( −T b(ε) Λ(ε) r^2 / ( 4 Var_μ[f] + 2 c(ε) ‖f − E_μ[f]‖_∞ r ) )

where a(ε), b(ε), c(ε) depend only on ε.

→ Explicit non-asymptotic confidence intervals for ∫ f dμ, i.e.
→ UQ bounds for alternative processes:

    ξ_{P,f}(η) ≤ √( 2 d(ε) Λ(ε) Var_μ[f] η ) + e(ε) Λ(ε) ‖f − E_μ[f]‖_∞ η