Winsorized Importance Sampling

Paulo Orenstein

Stanford University

February 8, 2019

Introduction

◮ Let f(x) be an arbitrary function and p(x), q(x) probability densities. Suppose we are interested in
$$\theta = \mathbb{E}_p[f(X)] = \int f(x)\,p(x)\,dx.$$
◮ Assume we can only sample from q, which is called the sampling distribution; p is the target distribution.
◮ The importance sampling estimator for θ is
$$\hat{\theta}_n = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)\,p(X_i)}{q(X_i)}, \qquad X_i \sim q.$$
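As a concrete reference point, here is a minimal numpy sketch of this estimator; the target p = N(0, 1), proposal q = N(0, 2²), and f(x) = x are illustrative choices of mine, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Illustrative setup (not from the talk): p = N(0,1), q = N(0, 2^2),
# f(x) = x, so theta = E_p[X] = 0.
def p_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x, sd=2.0):
    return np.exp(-x**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 2.0, size=n)   # X_i ~ q
y = x * p_pdf(x) / q_pdf(x)        # Y_i = f(X_i) p(X_i) / q(X_i)
theta_hat = y.mean()               # IS estimate of theta
```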

Introduction

◮ The importance sampling (IS) estimator is unbiased and, by the law of large numbers, consistent:
$$\hat{\theta}_n \xrightarrow{\,n\to\infty\,} \mathbb{E}_q\!\left[\frac{f(X)\,p(X)}{q(X)}\right] = \int \frac{f(x)\,p(x)}{q(x)}\,q(x)\,dx = \int f(x)\,p(x)\,dx = \theta,$$
as long as q(x) > 0 whenever f(x)p(x) ≠ 0.
◮ But it can have huge or even infinite variance, leading to terrible estimates.
◮ Can we control the variance of the terms $Y_i = f(X_i)\,p(X_i)/q(X_i)$ by sacrificing some small amount of bias?
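To make the variance problem concrete, here is a small sketch with a deliberately mismatched proposal (my choice, not the talk's): with p = N(0, 1), q = N(0, 0.6²), and f(x) = x, the terms Y_i have infinite variance (this happens whenever q's standard deviation falls below 1/√2), so a few tail draws dominate the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sd_q = 100_000, 0.6   # q = N(0, 0.6^2) is narrower than p = N(0, 1)

def p_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-x**2 / (2 * sd_q**2)) / (sd_q * np.sqrt(2 * np.pi))

x = rng.normal(0.0, sd_q, size=n)
y = x * p_pdf(x) / q_pdf(x)        # Y_i; here Var(Y_i) is infinite

print("largest |Y_i|:", np.sort(np.abs(y))[-3:])  # a few huge terms...
print("IS estimate:  ", y.mean())                 # ...swamp theta = 0
```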

Winsorizing

◮ Can we improve on the IS estimator by winsorizing, or capping, the weights?
◮ Denote the random variables winsorized at levels −M and M by
$$Y_i^M = \max(-M, \min(Y_i, M)).$$
◮ Define the winsorized importance sampling estimator at level M as
$$\hat{\theta}_n^M = \frac{1}{n}\sum_{i=1}^n Y_i^M.$$
◮ Picking the right threshold level M is crucial.
◮ Bias-variance trade-off: smaller M implies less variance but more bias.
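In code, winsorizing the raw IS terms is a one-line change on top of the earlier sketch; the heavy-tailed stand-in data and the levels below are illustrative.

```python
import numpy as np

def winsorized_is(y, m):
    """Winsorized IS estimate at level M: mean of max(-M, min(Y_i, M))."""
    return np.clip(y, -m, m).mean()

# Illustration of the bias-variance trade-off: smaller M cuts the tails
# (less variance) but pulls the mean toward zero (more bias).
rng = np.random.default_rng(2)
y = rng.standard_t(df=1.5, size=10_000)   # stand-in for raw IS terms Y_i
for m in (1000.0, 100.0, 10.0, 1.0):
    print(f"M = {m:>6}: estimate = {winsorized_is(y, m):+.4f}")
```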

How to pick M?

◮ Let $\{Y_i\}_{i=1}^n$ be random variables distributed iid with mean θ.
◮ Consider winsorizing the $Y_i$ at different threshold levels in a pre-chosen set Λ = {M_1, ..., M_k} to obtain winsorized samples $\{Y_i^{M_j}\}_{i=1}^n$, j = 1, ..., k.
◮ Pick the threshold level according to the rule
$$M^* = \min\left\{M \in \Lambda : \forall M', M'' \ge M,\ |\bar{Y}^{M'} - \bar{Y}^{M''}| \le \alpha \cdot \frac{\hat{\sigma}_{M'} + \hat{\sigma}_{M''}}{2}\right\},$$
where:
  $\alpha = c \cdot \frac{t}{\sqrt{n}-t}$, with c, t chosen constants;
  $\bar{Y}^M = \frac{1}{n}\sum_{i=1}^n Y_i^M$;
  $\hat{\sigma}_M = \sqrt{\frac{1}{n}\sum_{i=1}^n (Y_i^M - \bar{Y}^M)^2}$.
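A sketch of this selection rule (the function name is mine; the defaults c = 1 + √3 and t = 2 are the constants used later in the talk's self-avoiding walk example):

```python
import numpy as np

def select_threshold(y, levels, c=1 + np.sqrt(3), t=2.0):
    """Smallest M in `levels` such that the winsorized means at all pairs of
    levels >= M differ by at most alpha * (sigma' + sigma'') / 2."""
    n = len(y)
    alpha = c * t / (np.sqrt(n) - t)
    levels = sorted(levels)
    stats = {}
    for m in levels:
        ym = np.clip(y, -m, m)
        stats[m] = (ym.mean(), ym.std())   # (Y-bar^M, sigma-hat_M)
    m_star = levels[-1]                    # the largest level trivially qualifies
    for m in reversed(levels[:-1]):        # try truncating further
        larger = [l for l in levels if l >= m]
        if all(abs(stats[a][0] - stats[b][0]) <= alpha * (stats[a][1] + stats[b][1]) / 2
               for a in larger for b in larger):
            m_star = m
        else:
            break                          # rule fails here; no smaller M can qualify
    return m_star, stats[m_star][0]        # M* and the Balanced IS estimate
```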

Why?

◮ Why is this rule sensible?
◮ Intuitively, if we have truncation levels M' > M'', we are willing to truncate further to M'' if the increase in bias $\left|\frac{1}{n}\sum_{i=1}^n Y_i^{M'} - \frac{1}{n}\sum_{i=1}^n Y_i^{M''}\right|$ is small relative to the standard deviation.
◮ The actual rule can be thought of as a concrete version of the Balancing Principle (or Lepski's Method), which is reminiscent of oracle inequalities.
◮ With high probability, the mean-squared error using M* is less than 5 times the error incurred, roughly, by choosing the best threshold level in the set.

Theorem

Let $Y_i$ be iid with mean θ. Consider winsorizing $Y_i$ at different levels in Λ = {M_1, ..., M_k} to obtain samples $Y_i^{M_j}$. Pick the threshold level
$$M^* = \min\left\{M \in \Lambda : \forall M', M'' \ge M,\ |\bar{Y}^{M'} - \bar{Y}^{M''}| \le \alpha \cdot \frac{\hat{\sigma}_{M'} + \hat{\sigma}_{M''}}{2}\right\},$$
where $\alpha = c \cdot \frac{t}{\sqrt{n}-t}$ with c, t chosen constants.

Let K > 0 be such that $\mathbb{E}\big[|Y_i^{M_j} - \mathbb{E}[Y_i^{M_j}]|^3\big] \le K\,\big(\mathbb{V}[Y_i^{M_j}]\big)^{3/2}$ for all j. Then, with probability at least
$$1 - 2|\Lambda|\left(\frac{1 + 50K}{\sqrt{n}} + \Phi\!\left(-t\sqrt{\frac{n}{(\sqrt{n}-t)^2 + t^2}}\right)\right),$$
it holds that
$$|\bar{Y}^{M^*} - \theta| \le C \min_{M\in\Lambda}\left\{|\mathbb{E}[Y_i^M] - \theta| + \frac{t\sqrt{n}}{\sqrt{n}-t}\cdot\frac{\hat{\sigma}_M}{\sqrt{n}}\right\},$$
where C = C(c) can be made less than 4.25.

Proof

◮ Apply the Balancing Theorem:

Balancing Theorem

Suppose θ ∈ ℝ is an unknown parameter and $\{\hat{E}^M\}_{M\in\Theta}$ is a sequence of estimators of θ indexed by M ∈ Θ ⊂ ℝ, with Θ a finite set. Additionally, suppose that for each M we know
$$|\hat{E}^M - \theta| \le \mathrm{bias}(M) + \hat{s}(M),$$
where bias(M) is unknown but non-increasing in M, and $\hat{s}(M) > 0$ is observed and non-decreasing in M. Fix c > 2, and take
$$M^* = \min\left\{M \in \Theta : \forall M', M'' \ge M,\ |\hat{E}^{M'} - \hat{E}^{M''}| \le c\,\frac{\hat{s}(M') + \hat{s}(M'')}{2}\right\}.$$
Then we have that
$$|\hat{E}^{M^*} - \theta| \le C \min_{M\in\Theta}\left\{\hat{s}(M) + \mathrm{bias}(M)\right\},$$
where C is a constant depending on the chosen c, less than 4.25.

◮ Then, use Berry-Esseen to get probabilistic bounds.
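Note that the winsorized rule above is exactly this theorem with $\hat{E}^M = \bar{Y}^M$ and $\hat{s}(M) = \frac{t}{\sqrt{n}-t}\,\hat{\sigma}_M$, so that $c\,\hat{s}(M) = \alpha\,\hat{\sigma}_M$. A generic sketch of the balancing selector, with names of my choosing:

```python
def balance_select(estimates, spreads, c=3.0):
    """Generic balancing (Lepski-type) selection.

    estimates[m] = E-hat^m, spreads[m] = s-hat(m), with spreads assumed
    non-decreasing in m; requires c > 2. Returns the smallest level m such
    that |E^m' - E^m''| <= c * (s(m') + s(m'')) / 2 for all m', m'' >= m."""
    levels = sorted(estimates)
    m_star = levels[-1]
    for m in reversed(levels[:-1]):
        larger = [l for l in levels if l >= m]
        if all(abs(estimates[a] - estimates[b]) <= c * (spreads[a] + spreads[b]) / 2
               for a in larger for b in larger):
            m_star = m
        else:
            break
    return m_star
```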

Proof (of Balancing Theorem)

◮ We must thus show that for all M ∈ Θ, there exists C ≥ 0 such that $|\hat{E}^{M^*} - \theta| \le C(\hat{s}(M) + \mathrm{bias}(M))$. For this we shall consider two cases.
◮ (i) First, consider any fixed M such that M > M*. Then, by our definition of M*, and since $\hat{s}(M)$ is non-decreasing in M, $|\hat{E}^{M^*} - \hat{E}^M| \le c \cdot \hat{s}(M)$. Also, as $|\hat{E}^M - \theta| \le \mathrm{bias}(M) + \hat{s}(M)$, we get
$$|\hat{E}^{M^*} - \theta| \le |\hat{E}^{M^*} - \hat{E}^M| + |\hat{E}^M - \theta| \le c\,\hat{s}(M) + \mathrm{bias}(M) + \hat{s}(M) = \mathrm{bias}(M) + (c+1)\,\hat{s}(M).$$
This proves the case M > M*.
◮ (ii) The other side is harder.

How well does this work in practice?

◮ We consider examples with real and synthetic data.
◮ Compare three estimators:
  usual IS: no winsorization;
  CV IS: winsorization with threshold chosen via cross-validation;
  Balanced IS: winsorization with threshold chosen via the Balancing Theorem.
◮ CV IS takes 10-20× longer than Balanced IS and is usually worse.
◮ For small variances Balanced IS matches usual IS; as the proposal distribution gets worse, Balanced IS performs much better.

Example: self-avoiding walk [Knuth, 1976]

[Figure: illustrations of self-avoiding walks on the lattice.]

◮ Knuth suggested estimating the number of self-avoiding walks using importance sampling.
◮ For this, we need to choose a sampling distribution, q(x), over the self-avoiding walks.
◮ Consider building one sequentially.


Example: self-avoiding walk [Knuth, 1976]

◮ Define:
  $p(x) = \frac{1}{Z_n} I_{[\mathrm{SAW}]}(x)$; note $Z_n$ is the number of self-avoiding walks;
  $q(x) = \frac{1}{d_1 \cdot d_2 \cdots d_{m_x}}$; $d_i$ is the number of available neighbors at step i (could be 0);
  $f(x) = Z_n$.
◮ We would like to estimate
$$Z_n = \mathbb{E}_p[Z_n] = \mathbb{E}_p[f(X)] = \mathbb{E}_q\!\left[\frac{f(X)\,p(X)}{q(X)}\right] = \mathbb{E}_q\!\left[\frac{I_{[\mathrm{SAW}]}(X)}{q(X)}\right] \approx \frac{1}{n}\sum_{i=1}^n d_1(X_i)\,d_2(X_i) \cdots d_{m_{X_i}}(X_i) \cdot I_{[\mathrm{SAW}]}(X_i).$$
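A sketch of this scheme, under the assumption (consistent with θ ≈ 1.56 · 10^24 quoted below, Knuth's classic example) that the walks run from corner to corner of a 10 × 10 grid; the helper name is mine:

```python
import numpy as np

def saw_weight(rng, size=10):
    """One draw from q: walk from (0,0) toward (size,size), choosing uniformly
    among unvisited lattice neighbors. Returns d_1 * d_2 * ... * d_m if the
    walk reaches the far corner (I_SAW = 1), and 0 if it gets stuck."""
    pos, goal = (0, 0), (size, size)
    visited, weight = {pos}, 1.0
    while pos != goal:
        x, y = pos
        nbrs = [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx <= size and 0 <= y + dy <= size
                and (x + dx, y + dy) not in visited]
        if not nbrs:
            return 0.0                        # stuck: d_i = 0, not a valid SAW
        weight *= len(nbrs)                   # accumulate d_i = 1 / (q's step prob.)
        pos = nbrs[rng.integers(len(nbrs))]   # uniform over available neighbors
        visited.add(pos)
    return weight

rng = np.random.default_rng(3)
y = np.array([saw_weight(rng) for _ in range(1000)])
print("plain IS estimate of Z_n:", y.mean())  # Knuth's answer: about 1.6e24
```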

Example: self-avoiding walk [Knuth, 1976]

◮ How does winsorization perform?
◮ 1000 simulations of 1000 SAWs.
◮ θ = 1.56 · 10^24; c = 1 + √3, t = 2.
◮ M ∈ {10^21, 5 · 10^23, 10^25, 5 · 10^26, 10^28}.

      IS              CV IS           Balanced IS
MSE   2.075 · 10^49   2.457 · 10^48   2.437 · 10^48
MAD   1.817 · 10^24   1.567 · 10^24   1.561 · 10^24

Procedure

◮ The procedure is run as follows (see the sketch after this list):

  Let M_1 = 10^28;
    ◮ set M* = M_1.
  Let M_2 = 5 · 10^26;
    ◮ if $|\bar{Y}^{M_1} - \bar{Y}^{M_2}| \le \alpha \cdot \frac{\hat{\sigma}_{M_1} + \hat{\sigma}_{M_2}}{2}$, set M* = M_2, and consider further truncation;
    ◮ else, stop.
  Let M_3 = 10^25;
    ◮ if $|\bar{Y}^{M_1} - \bar{Y}^{M_3}| \le \alpha \cdot \frac{\hat{\sigma}_{M_1} + \hat{\sigma}_{M_3}}{2}$ and $|\bar{Y}^{M_2} - \bar{Y}^{M_3}| \le \alpha \cdot \frac{\hat{\sigma}_{M_2} + \hat{\sigma}_{M_3}}{2}$, set M* = M_3, and consider further truncation;
    ◮ else, stop.
  ...

◮ Computational complexity: O(|Λ| · (|Λ| + n)).
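A runnable trace of this downward scan on synthetic data (all numbers illustrative; the Student-t draws are just a heavy-tailed stand-in for raw IS terms):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_t(df=1.5, size=10_000)              # stand-in for Y_i

levels = [550.0, 500.0, 400.0, 200.0, 100.0, 10.0]   # Lambda, largest first
n = len(y)
c, t = 1 + np.sqrt(3), 2.0
alpha = c * t / (np.sqrt(n) - t)

stats = {m: (np.clip(y, -m, m).mean(), np.clip(y, -m, m).std()) for m in levels}
m_star = levels[0]                                   # start at M_1, the largest
for m in levels[1:]:
    larger = [l for l in levels if l >= m]
    ok = all(abs(stats[a][0] - stats[b][0]) <= alpha * (stats[a][1] + stats[b][1]) / 2
             for a in larger for b in larger)
    print(f"M = {m:>6.0f}: {'truncate further' if ok else 'stop'}")
    if not ok:
        break
    m_star = m
print(f"M* = {m_star:.0f}, Balanced IS estimate = {stats[m_star][0]:+.4f}")
```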

Simulation 1: Exponential

◮ p = Expo(1/θ), i.e., density $\frac{1}{\theta}e^{-x/\theta}$ with mean θ,
◮ q = Expo(1),
◮ f(x) = x,
◮ θ ∈ {1.3, 1.5, 1.9, 2, 2.1, 3, 4, 10},
◮ M ∈ {550, 500, 400, 200, 100, 10}.
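A sketch of one cell of this experiment, reading the setup as p = Expo with mean θ and q = Expo(1); for θ > 2 the raw terms Y_i have infinite variance, which is why the θ values straddle 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta = 1000, 3.0                     # theta > 2: raw Var(Y_i) is infinite

x = rng.exponential(1.0, size=n)         # X_i ~ q = Expo(1)
# Y_i = f(X_i) p(X_i) / q(X_i) with f(x) = x, p density (1/theta) e^{-x/theta}:
y = x * (np.exp(-x / theta) / theta) / np.exp(-x)

print("plain IS:", y.mean())             # targets theta = 3, very noisy
for m in (550, 400, 200, 100, 10):
    print(f"winsorized at M = {m:>3}:", np.clip(y, -m, m).mean())
```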

[Figure: MSE of Importance Sampling, CV Winsorized IS, and Balanced IS for Exp(1.5) through Exp(10).]

[Figure: MAD of Importance Sampling, CV Winsorized IS, and Balanced IS for Exp(1.5) through Exp(10).]

Simulation 2: Normal

◮ p = N(0, 1),
◮ q = N(0, θ),
◮ f(x) = x,
◮ θ ∈ {0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9},
◮ M ∈ {550, 500, 400, 200, 100, 10}.

[Figure: MSE of Importance Sampling, CV Winsorized IS, and Balanced IS for N(0, 0.9) down to N(0, 0.2).]

[Figure: MAD of Importance Sampling, CV Winsorized IS, and Balanced IS for N(0, 0.9) down to N(0, 0.2).]

Simulation 3: t

◮ p = t_{21}(0, 1),
◮ q = t_{21}(θ, 1 − 1/21),
◮ f(x) = x,
◮ θ ∈ {0, 0.5, 1, 1.5, 2, 2.5, 3},
◮ M ∈ {550, 500, 400, 200, 100, 50, 5, 1}.

[Figure: MSE of Importance Sampling, CV Winsorized IS, and Balanced IS for t(0, 20/21) through t(6, 20/21).]

[Figure: MAD of Importance Sampling, CV Winsorized IS, and Balanced IS for t(0, 20/21) through t(6, 20/21).]

Simulation 4: Multivariate Normal

◮ p = N_θ(0, I),
◮ q = t_{21,θ}(0.4 · 𝟙, 0.8 · I),
◮ $f(x) = \sum_{i=1}^{\theta} x_i$,
◮ θ ∈ {20, 40, 60, 80, 100} (θ is the dimension),
◮ M ∈ {550, 500, 400, 200, 100, 50, 10}.

[Figure: MSE of Importance Sampling, CV Winsorized IS, and Balanced IS for dimensions 20 through 100.]

[Figure: MAD of Importance Sampling, CV Winsorized IS, and Balanced IS for dimensions 20 through 100.]

Simulation 5: Normal Mixture

◮ p = 0.8 · N(0, 0.5) + 0.2 · N(θ, 0.5),
◮ q = N(0, 4),
◮ f(x) = x,
◮ θ ∈ {1, 3, 5, 7, 9, 11, 12},
◮ M ∈ {550, 500, 400, 200, 100, 10}.

[Figure: MSE of Importance Sampling, CV Winsorized IS, and Balanced IS for mixture means θ = 1 through 12.]

[Figure: MAD of Importance Sampling, CV Winsorized IS, and Balanced IS for mixture means θ = 1 through 12.]

Is it worth winsorizing?

◮ Negative aspects:
  the theory requires large n, at least ~10^8 (but this can be improved);
  truncation values must be provided;
  why winsorize symmetrically around 0?
◮ Positive aspects:
  works well in practice;
  adaptive to the sample;
  comes with finite-sample optimality properties.

Conclusion

◮ Importance sampling should not rely only on the sample mean.
◮ We need robust, adaptive alternatives.
◮ Balanced IS has theoretical guarantees and performs well in practice:
  in high-variance settings, it outperforms usual IS;
  in low-variance settings, it matches it.
◮ Many future extensions.

References

◮ Ionides, E. L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics, 17(2).
◮ Mathé, P. (2006). The Lepskii principle revisited. Inverse Problems, 22(3).
◮ Orenstein, P. (2018). Finite-sample guarantees for winsorized importance sampling. arXiv preprint arXiv:1810.11130.
◮ Shao, Q.-M. (2005). An explicit Berry-Esseen bound for Student's t-statistic via Stein's method. Stein's Method and Applications, 5:143.
◮ Vehtari, A., Gelman, A., and Gabry, J. (2015). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.