
SLIDE 1

Simulation methods for lower previsions

Matthias C. M. Troffaes

Work partially supported by H2020 Marie Curie ITN, UTOPIAE, Grant Agreement No. 722734

Durham University, United Kingdom

July 2018

SLIDE 2

Outline

Problem Description

Imprecise Estimation
  Lower and Upper Estimators for the Minimum of a Function
  Bias of Lower and Upper Estimators
  Consistency of the Lower Estimator
  Discrepancy Bounds
  Confidence Interval from Lower and Upper Estimators

Examples
  Toy Problem
  Two-Level Monte Carlo v1
  Two-Level Monte Carlo v2
  Importance Sampling

Stochastic Approximation
  Kiefer-Wolfowitz
  Example 1
  Example 2

Open Questions

SLIDE 3

Outline

(Same outline as slide 2.)

SLIDE 4

Problem Description

Remember the natural extension of a gamble g:

E(g) ≔ min_{p ∈ M} E_p(g)    (1)

◮ It represents the supremum buying price α you should be willing to pay for g.
◮ We can use this natural extension for all statistical inference and decision making.
◮ How do we evaluate the minimum in eq. (1), provided we have an estimator for E_p(g)?

SLIDE 5

Problem Description

[Concept diagram:]

statistical inference under severe uncertainty
lower previsions
credal set
imprecise estimators
simulation
importance sampling
optimization

SLIDE 6

Outline

(Same outline as slide 2.)

SLIDE 7

Lower and Upper Estimators for the Minimum of a Function

(see [12])

◮ Ω = random variable, taking values in some subset of R^k
◮ t = parameter, taking values in some set T (assume T countable)
◮ θ(t) = arbitrary function of t
◮ θ̂_Ω(t) = arbitrary estimator for θ:

E(θ̂_Ω(t)) = θ(t)    (2)

Aim

Construct an estimator for the minimum of the function θ:

θ* ≔ inf_{t ∈ T} θ(t).    (3)

Example

Say, for instance, M = {p_t : t ∈ T}, and let θ(t) ≔ E_{p_t}(f). Then θ* = E(f), so estimation of θ* = estimation of the natural extension.

SLIDE 8

Lower and Upper Estimators for the Minimum of a Function

Define the function

τ_Ω ∈ arg inf_{t ∈ T} θ̂_Ω(t)    (4)

Theorem (Lower and Upper Estimator Theorem [12])

Assume Ω and Ω′ are i.i.d. and let

θ̂_*(Ω) ≔ θ̂_Ω(τ_Ω) = inf_{t ∈ T} θ̂_Ω(t)    (5)

θ̂^*(Ω, Ω′) ≔ θ̂_Ω(τ_{Ω′})    (6)

Then

θ̂_*(Ω) ≤ θ̂^*(Ω, Ω′)    (7)

and

E(θ̂_*(Ω)) ≤ θ* ≤ E(θ̂^*(Ω, Ω′)).    (8)
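A minimal numerical sketch of the theorem in Python, assuming a finite grid standing in for T and an artificial unbiased estimator θ̂_ω(t) = t² + ωt with ω ~ norm(0, 1), so that θ(t) = t² and θ* = 0; all names here are illustrative, not from [12]:

```python
import numpy as np

rng = np.random.default_rng(42)

# Artificial unbiased estimator: theta_hat(omega, t) = t^2 + omega * t with
# omega ~ norm(0, 1), so E(theta_hat(omega, t)) = t^2 = theta(t), theta* = 0.
def theta_hat(omega, t):
    return t**2 + omega * t

T = np.linspace(-1.0, 1.0, 201)   # finite grid standing in for T

def lower_upper(omega, omega_prime):
    """Lower and upper estimators, eqs. (5)-(6)."""
    tau = T[np.argmin(theta_hat(omega, T))]               # tau_Omega, eq. (4)
    tau_prime = T[np.argmin(theta_hat(omega_prime, T))]   # tau_Omega'
    return theta_hat(omega, tau), theta_hat(omega, tau_prime)

omega, omega_prime = rng.normal(size=2)   # two i.i.d. copies of Omega
lo, up = lower_upper(omega, omega_prime)
assert lo <= up                           # inequality (7) holds by construction
```

Averaging lower_upper over many i.i.d. draws of (ω, ω′) exhibits the sandwich (8): the one-argument estimator concentrates below θ* = 0, the two-argument one above.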

SLIDE 9

Lower and Upper Estimators for the Minimum of a Function

[Figure: θ(t) over t ∈ [0.30, 0.55], together with two estimator realisations θ̂_Ω(t) and θ̂_Ω′(t); the true minimum θ* is marked.]

SLIDE 10

Lower and Upper Estimators for the Minimum of a Function

[Figure: same plot as slide 9: θ(t), θ̂_Ω(t), θ̂_Ω′(t), with θ* marked.]

SLIDE 11

Lower and Upper Estimators for the Minimum of a Function

[Figure: the same plot, now also marking τ_Ω and the lower estimate θ̂_*(Ω).]

SLIDE 12

Lower and Upper Estimators for the Minimum of a Function

[Figure: the same plot, additionally marking τ_{Ω′}.]

SLIDE 13

Lower and Upper Estimators for the Minimum of a Function

[Figure: the same plot, additionally marking the upper estimate θ̂^*(Ω, Ω′).]

SLIDE 14

Bias of Lower and Upper Estimators

◮ θ̂_*(Ω): used throughout the literature as an estimator for lower previsions
  - not normally noted in the literature that it is negatively biased
  - the bias can be very large in general (even infinite)!
◮ θ̂^*(Ω, Ω′): introduced at last year's WPMSIIP
  - we still cannot prove much about it
  - it allows us to bound the bias without having to do hardcore stochastic process theory

Theorem (Unbiased Case [12])

If there is a t* ∈ T such that θ̂_Ω(t*) ≤ θ̂_Ω(t) for all t ∈ T, then

θ̂_*(Ω) = θ̂^*(Ω, Ω′) = θ̂_Ω(t*)    (9)

and consequently

E(θ̂_*(Ω)) = θ* = E(θ̂^*(Ω, Ω′)).    (10)

(The condition is not normally satisfied, but it explains why this is a sensible choice.)
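Continuing the illustrative sketch from slide 8, a quick simulation makes both biases visible; the quoted averages are approximate:

```python
# Monte Carlo check of eq. (8) for the artificial estimator above: the
# lower estimator is biased downwards, the upper estimator upwards.
draws = np.array([lower_upper(*rng.normal(size=2)) for _ in range(10_000)])
print(draws[:, 0].mean())   # approx -0.25 <= theta* = 0
print(draws[:, 1].mean())   # approx +0.23 >= theta* = 0
```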

SLIDE 15

Consistency of the Lower Estimator

Very often, an estimator takes the form of an empirical mean:

θ̂_{Ω,n}(t) = (1/n) Σ_{i=1}^{n} θ̂_{V_i}(t)    (11)

where Ω ≔ (V_i)_{i∈N} and the V_i are i.i.d. Under mild conditions, this estimator is consistent:

lim_{n→∞} P(|θ̂_{Ω,n}(t) − θ(t)| > ε) = 0    (12)

◮ Under what conditions is θ̂_{*,n}(Ω) a consistent estimator for θ*, i.e. when do we have

lim_{n→∞} P(|θ̂_{*,n}(Ω) − θ*| > ε) = 0    (13)

◮ How large should n be?

SLIDE 16

Consistency of the Lower Estimator

Simple case first:

Theorem (Consistency: Finite Case [12])

If T is finite, then θ̂_{*,n}(Ω) is a consistent estimator for θ*.

(Even though consistent, it may require an excessively large n to control the bias!)

For the general case there is no positive answer, but consistency can be linked to a well-known condition from stochastic process theory:

Theorem (Consistency: Sufficient Condition for General Case [12])

If the set of functions {θ̂(·, t) : t ∈ T} is a Glivenko-Cantelli class, then θ̂_{*,n}(Ω) is a consistent estimator for θ*.

SLIDE 17

Discrepancy Bounds for the Lower Estimator

Notation:

Z_n(t) ≔ θ̂_{Ω,n}(t) − θ(t)    (14)

d_n(s, t) ≔ √(E((Z_n(s) − Z_n(t))²))    (15)

Δ_n(A) ≔ sup_{s,t ∈ A} d_n(s, t)    (16)

σ_n² ≔ inf_{t ∈ T} Var(Z_n(t)) = inf_{t ∈ T} Var(θ̂_{Ω,n}(t))    (17)

Definition (Talagrand Functional)

Define the Talagrand functional [10, p. 25] as:

γ_2(T, d_n) ≔ inf_{(A_k)} sup_{t ∈ T} Σ_{k=0}^{∞} 2^{k/2} Δ_n(A_k(t))    (18)

where the infimum is taken over all admissible sequences of partitions of T.

SLIDE 18

Discrepancy Bounds for Empirical Mean Lower Estimator

Theorem (Discrepancy Bounds for Empirical Mean Lower Estimator [12])

Assume θ̂_{Ω,n}(t) ≔ (1/n) Σ_{i=1}^{n} θ̂_{V_i}(t). There is a universal constant L > 0 such that, if θ̂_{Ω,n}(t) is sub-Gaussian, then

P(|θ̂_{*,n}(Ω) − θ*| > u (σ_1 + γ_2(T, d_1))) ≤ L exp(−n u² / 2)    (19)

and

E(|θ̂_{*,n}(Ω) − θ*|) ≤ L (σ_1 + γ_2(T, d_1)) / √n.    (20)

Corollary (Consistency of Empirical Mean Lower Estimator [12])

If θ̂_{Ω,n}(t) is sub-Gaussian, then θ̂_{*,n}(Ω) is a consistent estimator for θ* whenever the minimal standard deviation σ_1 and the Talagrand functional γ_2(T, d_1) are finite.

Issue: the Talagrand functional is not easy to compute or to bound!

SLIDE 19

Empirical Mean Lower Estimator: How To Achieve Low Bias

Inconsistency Example

◮ θ̂_{Ω,n}(t) has non-zero variance across all t
◮ θ̂_{Ω,n}(s) and θ̂_{Ω,n}(t) are independent for all s ≠ t
◮ T is infinite

Then the Talagrand functional γ_2(T, d_1) is +∞. Important for two-level Monte Carlo: don't use i.i.d. samples in the outer loop over t ∈ T!

Main Take-Home Message for Design of Estimators

To get a low Talagrand functional (and hence a low bias), we want θ̂_{Ω,n}(s) and θ̂_{Ω,n}(t) to be as correlated as possible for all s ≠ t.
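A toy numerical illustration of this message (not from the slides): take θ(t) = 0 for all t, with each θ̂_{Ω,n}(t) an average of n standard normal draws. The minimum over an independent family drifts down like −√(2 ln|T| / n) as |T| grows, while a perfectly correlated (shared-sample) family stays unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 2000
for size_T in (10, 100, 1000):
    # independent estimators across t: downward bias grows with |T|
    indep = rng.normal(0.0, 1 / np.sqrt(n), size=(reps, size_T)).min(axis=1)
    # perfectly correlated estimators: taking the min over t changes nothing
    shared = rng.normal(0.0, 1 / np.sqrt(n), size=(reps,))
    print(size_T, indep.mean().round(3), shared.mean().round(3))
```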

SLIDE 20

Confidence Interval

Theorem (Confidence Interval from Lower and Upper Estimators [12])

Let χ_1, …, χ_N, χ′_1, …, χ′_N be a sequence of i.i.d. realisations of Ω. Define

Y_* ≔ (θ̂_*(χ_i))_{i=1}^{N}    Y^* ≔ (θ̂^*(χ_i, χ′_i))_{i=1}^{N}    (21)

Let Ȳ_* and Ȳ^* be the sample means of these sequences, and let S_* and S^* be their sample standard deviations. Let t_{N−1} denote the usual two-sided critical value of the t-distribution with N − 1 degrees of freedom at confidence level 1 − α. Then, provided that sup_{x,t} |θ̂(x, t)| < +∞,

[Ȳ_* − t_{N−1} S_* / √N,  Ȳ^* + t_{N−1} S^* / √N]    (22)

is an approximate confidence interval for θ* with confidence level (at least) 1 − α.

Why is this rather slow? Note: we can cheat and use θ̂^*(χ′_i, χ_i) instead for Y^*. This trick halves the computational time (caveat: we need Ȳ_* ≤ Ȳ^* with probability ≈ 1).
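A sketch of the interval (22) in Python, assuming the paired lower and upper estimates have already been computed as arrays y_lower = (θ̂_*(χ_i))_i and y_upper = (θ̂^*(χ_i, χ′_i))_i:

```python
import numpy as np
from scipy.stats import t as t_dist

def confidence_interval(y_lower, y_upper, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for theta*, eq. (22)."""
    N = len(y_lower)
    crit = t_dist.ppf(1 - alpha / 2, df=N - 1)   # two-sided critical value
    lo = np.mean(y_lower) - crit * np.std(y_lower, ddof=1) / np.sqrt(N)
    hi = np.mean(y_upper) + crit * np.std(y_upper, ddof=1) / np.sqrt(N)
    return lo, hi
```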

SLIDE 21

Outline

(Same outline as slide 2.)

SLIDE 22

Example: Toy Problem

(based on [13])

◮ V ≔ (U_1, U_2) ~ unif([0, 1]²)
◮ t ≔ (µ, σ) ∈ [−3, 3] × {1}
◮ x_t(V) ≔ µ + σ √(−2 ln U_1) cos(2πU_2) ~ norm(µ, σ²)
◮ f_t(x) ≔ (1/√(2πσ²)) exp(−(x − µ)² / (2σ²))
◮ h(x) ≔ I_D(x) where D = (−∞, −1] ∪ [1, ∞)
◮ θ(t) ≔ ∫ h(x) f_t(x) dx
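The toy problem transcribes directly to Python; these helpers are reused by the sketches on the next slides, with scipy used only for the exact reference value θ(t):

```python
import numpy as np
from scipy.stats import norm

def x_t(u1, u2, mu, sigma=1.0):
    """Box-Muller: maps (U1, U2) ~ unif([0, 1]^2) to a norm(mu, sigma^2) sample."""
    return mu + sigma * np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

def h(x):
    """Indicator of D = (-inf, -1] union [1, +inf)."""
    return (np.abs(x) >= 1.0).astype(float)

def theta(mu, sigma=1.0):
    """Exact theta(t) = P(|X| >= 1) for X ~ norm(mu, sigma^2)."""
    return norm.cdf(-1.0, mu, sigma) + norm.sf(1.0, mu, sigma)
```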

SLIDE 23

Example: Two-Level Monte Carlo v1

◮ different V_i(t) for each value of t:

θ̂_Ω(t) ≔ (1/n) Σ_{i=1}^{n} h(x_t(V_i(t)))

◮ simple
◮ inefficient
◮ hard to optimize
◮ horrible bias
◮ inconsistent

[Figure: panels for t = −2, 0, 2 showing the uniforms (u1, u2), the density f_t(x), and the samples x_t(V); alongside, θ(t) over [−3, 3] with realisations θ̂_Ω(t), θ̂_Ω′(t) and the resulting lower and upper estimates.]
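A sketch of v1, reusing x_t and h from the toy-problem block; each grid point t gets its own fresh uniforms, which is exactly the independent-across-t situation that slide 19 warns about:

```python
# Two-level Monte Carlo v1: fresh uniforms V_i(t) for every t.
rng = np.random.default_rng(1)
mus = np.linspace(-3.0, 3.0, 61)   # grid over the parameter set T
n = 1000

def theta_hat_v1(mu):
    u1, u2 = rng.uniform(size=(2, n))   # different V_i(t) for each t
    return h(x_t(u1, u2, mu)).mean()

theta_lower_v1 = min(theta_hat_v1(mu) for mu in mus)   # biased low, inconsistent
```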

SLIDE 24

Example: Two-Level Monte Carlo v2

◮ same V_i for each value of t:

θ̂_Ω(t) ≔ (1/n) Σ_{i=1}^{n} h(x_t(V_i))

◮ most efficient
◮ can be fairly hard to optimize (might have many local minima)
◮ minimal bias
◮ consistent

[Figure: same layout as slide 23: panels for t = −2, 0, 2 with f_t(x) and samples x_t(V); alongside, θ(t) with realisations θ̂_Ω(t), θ̂_Ω′(t) and the resulting lower and upper estimates.]
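A sketch of v2; the only change from v1 is that one shared set of uniforms is drawn up front (common random numbers), which maximises the correlation between θ̂_Ω(s) and θ̂_Ω(t):

```python
# Two-level Monte Carlo v2: one shared set of uniforms for every t.
u1, u2 = rng.uniform(size=(2, n))   # the same V_i for each t

def theta_hat_v2(mu):
    return h(x_t(u1, u2, mu)).mean()

theta_lower_v2 = min(theta_hat_v2(mu) for mu in mus)
```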

SLIDE 25

Example: Importance Sampling

(see [8, 4, 14, 11, 3, 12, 13])

◮ same V_i for each value of t
◮ same samples x_R(V_i) for all t:

θ̂_Ω(t) ≔ (1/n) Σ_{i=1}^{n} (f_t(x_R(V_i)) / f_R(x_R(V_i))) h(x_R(V_i))

◮ quite efficient for fast densities
◮ easiest to optimize
◮ small bias
◮ still consistent
◮ f_R needs to cover all f_t (variance inflation, iterative procedures, … [13])

[Figure: panels for t = −2, 0, 2 showing the reference density f_R(x) and the fixed samples x_R(V); alongside, θ(t) over [−3, 3] with realisations θ̂_Ω(t), θ̂_Ω′(t) and the resulting lower and upper estimates.]
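A sketch of the importance sampling estimator, continuing the previous blocks; the reference standard deviation s_R = 3 is an illustrative choice, wide enough to cover every f_t with µ ∈ [−3, 3]:

```python
# Importance sampling: sample once from the reference density f_R and
# reweight for each t; here f_R = norm(0, s_R^2).
s_R = 3.0
xs = x_t(u1, u2, mu=0.0, sigma=s_R)   # fixed samples x_R(V_i)
hx = h(xs)                            # h evaluated once, up front

def theta_hat_is(mu):
    w = norm.pdf(xs, mu, 1.0) / norm.pdf(xs, 0.0, s_R)   # importance weights
    return np.mean(w * hx)

theta_lower_is = min(theta_hat_is(mu) for mu in mus)
```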

SLIDE 26

Outline

(Same outline as slide 2.)

SLIDE 27

Stochastic Approximation: Kiefer-Wolfowitz

Assume E(θ̂_Ω(t)) = θ(t), with uniformly bounded variance. Let

◮ a_n ≔ 1/n
◮ c_n ≔ n^{−1/3}

Then the iteration

t_{n+1}(Ω_{n+1}) = t_n(Ω_n) − a_n [θ̂_{Ω_{n+1}}(t_n(Ω_n) + c_n) − θ̂_{Ω_{n+1}}(t_n(Ω_n) − c_n)] / (2 c_n)    (23)

(the bracketed difference over 2c_n is a stochastic approximation of the derivative dθ̂/dt) converges with probability 1 to the minimiser of θ, so that θ(t_n) → θ* = min_{t} θ(t), provided that θ(t) is strictly convex: an unbiased and consistent estimator!
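A sketch of the Kiefer-Wolfowitz iteration (23) for the toy problem, reusing x_t and h from the toy-problem block, with one fresh one-sample estimate per function evaluation and projection back onto [−3, 3]; the schedule a_n = 1/n, c_n = n^{−1/3} is as on the slide:

```python
# Kiefer-Wolfowitz iteration (23) on the toy problem.
rng = np.random.default_rng(3)

def theta_hat_single(mu):
    """One-sample unbiased estimate of theta(mu)."""
    u1, u2 = rng.uniform(size=2)
    return h(x_t(u1, u2, mu))

t = 2.0                                       # arbitrary starting point
for k in range(1, 10_000):
    a_k, c_k = 1.0 / k, k ** (-1.0 / 3.0)     # gains and spans from the slide
    grad = (theta_hat_single(t + c_k) - theta_hat_single(t - c_k)) / (2.0 * c_k)
    t = float(np.clip(t - a_k * grad, -3.0, 3.0))   # project back onto [-3, 3]
# t drifts towards the minimiser mu = 0 of theta
```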

SLIDE 28

Stochastic Approximation: Example 1 – Single Sample

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^4 (log scale); t ranges over [−1.0, 0.0].]

SLIDE 29

Stochastic Approximation: Example 1 – Mini-Batch MCv2

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^2 (log scale); t ranges over [−1.0, 0.0].]

SLIDE 30

Stochastic Approximation: Example 1 – Mini-Batch Importance

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^2 (log scale); t ranges over [−1.0, 0.0].]

SLIDE 31

Stochastic Approximation: Example 2 – Single Sample

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^4 (log scale); t ranges over [−0.5, 1.0].]

SLIDE 32

Stochastic Approximation: Example 2 – Mini-Batch MCv2

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^2 (log scale); t ranges over [−0.5, 0.0].]

SLIDE 33

Stochastic Approximation: Example 2 – Mini-Batch Importance

[Figure: iterates t_n versus iteration n, n from 10^0 to 10^2 (log scale); t ranges over [−0.8, 0.0].]

SLIDE 34

Outline

(Same outline as slide 2.)

SLIDE 35

Open Questions

◮ imprecise estimation
  - the good: we can construct confidence intervals
  - the bad: conditions for consistency are hard to quantify
  - the ugly: needs multiple runs
◮ stochastic approximation
  - the good: simple, no bias, consistent
  - the bad: conditions too restrictive? confidence intervals?
  - the ugly: no proofs yet (standard conditions are not satisfied, yet simulations appear to work)

SLIDE 36

References I

[1] J. E. Cano, L. D. Hernández, and S. Moral. Importance sampling algorithms for the propagation of probabilities in belief networks. International Journal of Approximate Reasoning, 15(1):77–92, 1996.

[2] Marco de Angelis, Edoardo Patelli, and Michael Beer. Advanced line sampling for efficient robust reliability analysis. Structural Safety, 52, Part B:170–182, 2015.

[3] Thomas Fetz. Efficient computation of upper probabilities of failure. In Christian Bucher, Bruce R. Ellingwood, and Dan M. Frangopol, editors, 12th International Conference on Structural Safety & Reliability, pages 493–502, 2017.

[4] Thomas Fetz and Michael Oberguggenberger. Imprecise random variables, random sets, and Monte Carlo simulation. In Thomas Augustin, Serena Doria, Enrique Miranda, and Erik Quaeghebeur, editors, ISIPTA '15: Proceedings of the Ninth International Symposium on Imprecise Probability: Theories and Applications, pages 137–146, 2015.

[5] Luis D. Hernández and Serafín Moral. Mixing exact and importance sampling propagation algorithms in dependence graphs. International Journal of Intelligent Systems, 12(8):553–576, August 1997.

SLIDE 37

References II

[6] S. Moral and N. Wilson. Importance sampling algorithms for the calculation of Dempster-Shafer belief. In Proceedings of the IPMU-96 Conference, volume 3, pages 1337–1344, 1996.

[7] Michael Oberguggenberger, Julian King, and Bernhard Schmelzer. Classical and imprecise probability methods for sensitivity analysis in engineering: A case study. International Journal of Approximate Reasoning, 50(4):680–693, 2009.

[8] B. O'Neill. Importance sampling for Bayesian sensitivity analysis. International Journal of Approximate Reasoning, 50(2):270–278, 2009.

[9] Art B. Owen. Monte Carlo theory, methods and examples. 2013.

[10] Michel Talagrand. Upper and Lower Bounds for Stochastic Processes. Springer, 2014.

SLIDE 38

References III

[11] Matthias C. M. Troffaes. A note on imprecise Monte Carlo over credal sets via importance sampling. In Alessandro Antonucci, Giorgio Corani, Inés Couso, and Sébastien Destercke, editors, Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, volume 62 of Proceedings of Machine Learning Research, pages 325–332. PMLR, July 2017.

[12] Matthias C. M. Troffaes. Imprecise Monte Carlo simulation and iterative importance sampling for the estimation of lower previsions. International Journal of Approximate Reasoning, 101:31–48, October 2018.

[13] Matthias C. M. Troffaes, Thomas Fetz, and Michael Oberguggenberger. Iterative importance sampling for estimating expectation bounds under partial probability specifications. In 8th International Workshop on Reliable Engineering Computing: Computing with Confidence (REC 2018), pages 139–146, Liverpool, UK, July 2018.

[14] Jiaxin Zhang and Michael D. Shields. Efficient propagation of imprecise probabilities. In 7th International Workshop on Reliable Engineering Computing, pages 197–209, 2016.