SLIDE 1

Confidence Bands for Distribution Functions: The Law of the Iterated Logarithm and Shape Constraints

Lutz Duembgen (Bern) Jon A. Wellner (Seattle) Petro Kolesnyk (Bern) Ralf Wilke (Copenhagen) November 2014

SLIDE 2
  • I. The LIL for Brownian Motion and Bridge
  • II. A General LIL for Sub-Exponential Processes
  • III. Implications for the Uniform Empirical Process
      III.1 Goodness-of-Fit Tests
      III.2 Confidence Bands
  • IV. Bi-Log-Concave Distribution Functions
  • V. Bi-Log-Concave Binary Regression
SLIDE 3
  • I. The LIL for Brownian Motion and Bridge

Standard Brownian motion W = (W(t))t≥0.

LIL for BM:

  lim sup_{t↓0} ±W(t) / √(2t log log(t⁻¹)) = 1 a.s.,

  lim sup_{t↑∞} ±W(t) / √(2t log log(t)) = 1 a.s.
SLIDE 4

Refined half of LIL for BM: For any constant ν > 3/2,

  lim_{t→{0,∞}} [ W(t)²/(2t) − log log(t + t⁻¹) − ν log log log(t + t⁻¹) ] = −∞ a.s.
SLIDE 5

Reformulation for standard Brownian bridge U = (U(t))t∈(0,1):

  (0, 1) ∋ t → logit(t) := log( t/(1 − t) ) ∈ R,

  R ∋ x → ℓ(x) := e^x/(1 + e^x) ∈ (0, 1).
SLIDE 6

Refined half of LIL for BB: For arbitrary constants ν > 3/2,

  sup_{t∈(0,1)} [ U(t)²/(2t(1 − t)) − C(t) − ν D(t) ] < ∞ a.s.,

where

  C(t) := log(1 + logit(t)²/2) ≈ log log( 1/(t(1 − t)) ),

  D(t) := log(1 + C(t)²/2) ≈ log log log( 1/(t(1 − t)) )

as t → {0, 1}.
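The transformations on the last two slides are simple to compute. A minimal Python sketch (the names logit, ell, C, D follow the slides; everything else is illustrative):

```python
import math

def logit(t):
    """Log-odds: logit(t) = log(t / (1 - t)) for t in (0, 1)."""
    return math.log(t / (1.0 - t))

def ell(x):
    """Inverse of logit: ell(x) = e^x / (1 + e^x), mapping R onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def C(t):
    """Additive correction C(t) = log(1 + logit(t)^2 / 2)."""
    return math.log(1.0 + logit(t) ** 2 / 2.0)

def D(t):
    """Third-order correction D(t) = log(1 + C(t)^2 / 2)."""
    return math.log(1.0 + C(t) ** 2 / 2.0)

# Both corrections vanish at t = 1/2, are symmetric about 1/2,
# and grow slowly as t approaches 0 or 1:
print(C(0.5), D(0.5))                        # 0.0 0.0
print(round(C(0.01), 3), round(C(0.99), 3))  # 2.447 2.447
```

The symmetry C(t) = C(1 − t) reflects the symmetry of the Brownian bridge about t = 1/2.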
SLIDE 7
  • II. A General LIL for Sub-Exponential Processes

Nonnegative stochastic process X = (X(t))t∈T with T ⊂ (0, 1).

Locally uniform sub-exponentiality:

LUSE0: For arbitrary a ∈ R, c ≥ 0 and η ≥ 0,

  IP( sup_{t∈[ℓ(a), ℓ(a+c)]} X(t) ≥ η ) ≤ M exp(−L(c) η),

where M ≥ 1 and L : [0, ∞) → [0, 1] satisfies L(c) = 1 − O(c) as c ↓ 0.
SLIDE 8

Refinement for ζ ∈ [0, 1]:

LUSEζ: For arbitrary a ∈ R, c ≥ 0 and η ≥ 0,

  IP( sup_{t∈[ℓ(a), ℓ(a+c)]} X(t) ≥ η ) ≤ M exp(−L(c) η) max(1, L(c) η)^ζ,

with M and L(·) as in LUSE0.

Example: X(t) := U(t)²/(2t(1 − t)) satisfies LUSE1/2 with M = 2 and L(c) = exp(−c).
SLIDE 9
  • Proposition. Suppose that X satisfies LUSEζ. For any Lo ∈ (0, 1) and ν > 2 − ζ there exists a constant Mo = Mo(M, L(·), ζ, Lo, ν) ≥ 1 such that

  IP( sup_T (X − C − ν D) ≥ η ) ≤ Mo exp(−Lo η)

for arbitrary η ≥ 0.
SLIDE 10
  • III. Implications for the Uniform Empirical Process

Let U1, U2, . . . , Un be i.i.d. ∼ Unif[0, 1].

Auxiliary function K : [0, 1] × (0, 1) → [0, ∞],

  K(x, p) := x log( x/p ) + (1 − x) log( (1 − x)/(1 − p) ),

i.e. the Kullback–Leibler divergence between Bin(1, x) and Bin(1, p).

Two key properties:

  K(x, p) = ( (x − p)²/(2p(1 − p)) ) (1 + o(1)) as x → p,

  K(x, p) ≤ c  ⇒  |x − p| ≤ √(2c p(1 − p)) + c  and  |x − p| ≤ √(2c x(1 − x)) + c.
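A direct implementation of K, with the usual convention 0 · log 0 = 0, lets one check the quadratic approximation numerically (this sketch and its tolerance choices are ours, not the slides'):

```python
import math

def K(x, p):
    """Kullback-Leibler divergence between Bin(1, x) and Bin(1, p)."""
    def xlog(a, b):
        if a == 0.0:
            return 0.0          # convention 0 * log 0 = 0
        if b == 0.0:
            return math.inf     # infinite divergence against a degenerate p
        return a * math.log(a / b)
    return xlog(x, p) + xlog(1.0 - x, 1.0 - p)

# K(x, p) / [(x - p)^2 / (2 p (1 - p))] -> 1 as x -> p:
p = 0.3
for eps in (1e-2, 1e-3, 1e-4):
    ratio = K(p + eps, p) / (eps ** 2 / (2.0 * p * (1.0 - p)))
    print(eps, round(ratio, 4))
```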
SLIDE 11

Implication 1: Uniform empirical distribution function

  Gn(t) := (1/n) Σ_{i=1}^n 1[Ui ≤ t].

Lemma 1. The process Xn = (Xn(t))t∈(0,1) with Xn(t) := n K(Gn(t), t) satisfies LUSE0 with M = 2 and L(c) = exp(−c).

Theorem 1. For any fixed ν > 2,

  sup_(0,1) ( Xn − C − ν D ) →L sup_{t∈(0,1)} [ U(t)²/(2t(1 − t)) − C(t) − ν D(t) ].
SLIDE 12

Main ingredients for proofs:

Gn(t)/t

  • t∈(0,1] is a reverse martingale.

◮ Exponential transform and Doob’s inequality for

submartingales.

◮ Analytical properties of K(·, ·). ◮ Donsker’s invariance for uniform empirical process.

slide-13
SLIDE 13

Implication 2: Uniform order statistics 0 < Un:1 < Un:2 < · · · < Un:n < 1. Tn := {tn1, tn2, . . . , tnn} with tni := I E(Un:i) = i n + 1. Lemma 2. The process ˜ Xn = ( ˜ Xn(t))t∈Tn with ˜ Xn(tni) := (n + 1)K(tni, Un:i) satisfies LUSE0 with M = 2 and L(c) = e−c. Theorem 2. For any fixed ν > 2, sup

Tn

˜ Xn − C − νD

  • →L

sup

t∈(0,1)

  • U(t)2

2t(1 − t) − C(t) − νD(t)

  • .
slide-14
SLIDE 14

Main ingredients for proofs:

Un:i/tni n

i=1 is a reverse martingale. ◮ Exponential transform and Doob’s inequality for

submartingales.

◮ Connection between Beta and Gamma distributions. ◮ Analytical properties of K(·, ·). ◮ Donsker’s invariance principle for uniform quantile process.

slide-15
SLIDE 15

Some realizations of ˜ Xn for n = 5000 and ν = 3:

0.0 0.2 0.4 0.6 0.8 1.0

  • 3
  • 2
  • 1

1 2 t X n(t)

slide-16
SLIDE 16

0.0 0.2 0.4 0.6 0.8 1.0

  • 3
  • 2
  • 1

1 2 t X n(t)

slide-17
SLIDE 17

0.0 0.2 0.4 0.6 0.8 1.0

  • 3
  • 2
  • 1

1 2 3 t X n(t)

slide-18
SLIDE 18

0.0 0.2 0.4 0.6 0.8 1.0

  • 3
  • 2
  • 1

1 2 t X n(t)

slide-19
SLIDE 19

Distribution function of arg maxt ˜ Xn(t):

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 t

slide-20
SLIDE 20

III.1 Goodness-of-Fit Tests

Let X1, X2, . . . , Xn be i.i.d. with unknown c.d.f. F on R. Empirical c.d.f.

  • Fn(x) := 1

n

n

  • i=1

1[Xi≤x]. Testing problem: Ho : F ≡ Fo versus HA : F ≡ Fo.

slide-21
SLIDE 21

Berk–Jones (1979) proposed the test statistic Tn(Fo) := sup

R

n K( Fn, Fo) with critical value κBJ

n,α := (1 − α) − quantile of

sup

t∈(0,1)

n Kn( Gn(t), t) = log log(n) + O

  • log log log(n)
  • .
slide-22
SLIDE 22

New proposal: Tn(Fo) := sup

R

  • n K(

Fn, Fo) − C(Fo) − νD(Fo)

  • with critical value

κnew

n,α

:= (1 − α) − quantile of sup

t∈(0,1)

  • n K(

Gn(t), t) − C(t) − νD(t)

  • → (1 − α) − quantile of

sup

t∈(0,1)

  • U(t)2

2t(1 − t) − C(t) − νD(t)

  • .
slide-23
SLIDE 23
  • Power. For any fixed κ > 0,

I PFn

  • Tn(Fo) > κ
  • → 1

as sup

R

√n|Fn − Fo|

  • (1 + C(Fo))Fo(1 − Fo) + C(Fo)/√n

→ ∞.

slide-24
SLIDE 24

Special case: Detecting heterogeneous Gaussian mixtures (Ingster 1997, 1998; Donoho–Jin 2004) Setting 1: Fo := Φ, Fn := (1 − εn) Φ + εn Φ(· − µn) with εn = n−β+o(1), β ∈ (1/2, 1), µn → ∞.

slide-25
SLIDE 25
  • Theorem. For any fixed κ > 0,

I PFn

  • Tn(Fo) > κ
  • → 1

provided that µn =

  • 2r log(n)

with r >

  • β − 1/2

if β ≤ 3/4,

  • 1 − √1 − β

2 if β ≥ 3/4.

slide-26
SLIDE 26

Setting 2 (Contiguous alternatives): Fo := Φ, Fn :=

  • 1 − π

√n

  • Φ + π

√n Φ(· − µ), π, µ > 0. Optimal level-α test of Fo versus Fn has asymptotic power Φ

  • Φ−1(α) + π2(exp(µ2) − 1)

4

  • .
slide-27
SLIDE 27
  • Theorem. Let µ =
  • 2s log(1/π) for fixed s > 0. As π ↓ 0,

Φ

  • Φ−1(α) + π2(exp(µ2) − 1)

4

  • α

if s < 1, 1 if s > 1, while for any fixed κ > 0, I PFn

  • Tn(Fo) > κ

1 if s > 1.

slide-28
SLIDE 28

III.2 Confidence Bands

Owen (1995) proposed (1 − α)-confidence band

  • F : sup

R

n K( Fn, F) ≤ κBJ

n,α

  • .

New proposal: With order statistics Xn:1 ≤ Xn:2 ≤ · · · ≤ Xn:n,

  • F : max

1≤i≤n

  • (n + 1)K(tni, F(Xn:i)) − C(tni) − νD(tni)
  • ≤ ˜

κn,α

slide-29
SLIDE 29

Resulting bounds for F(x): With confidence 1 − α, on [Xn:i, Xn:i+1), 0 ≤ i ≤ n, F ∈    [aBJO

ni

, bBJO

ni

] with Owen’s (1995) proposal, [anew

ni

, bnew

ni

] with new proposal, while

  • Fn = sni := i

n.

slide-30
SLIDE 30

n = 500: i → anew

ni

, sni, bnew

ni 100 200 300 400 500 0.0 0.2 0.4 0.6 0.8 1.0

slide-31
SLIDE 31

n = 500: i → a∗

ni − sni, b∗ ni − sni 100 200 300 400 500

  • 0.05

0.00 0.05

slide-32
SLIDE 32

n = 2000: i → a∗

ni − sni, b∗ ni − sni 500 1000 1500 2000

  • 0.04
  • 0.02

0.00 0.02 0.04

slide-33
SLIDE 33

n = 8000: i → a∗

ni − sni, b∗ ni − sni 2000 4000 6000 8000

  • 0.02
  • 0.01

0.00 0.01 0.02

slide-34
SLIDE 34
  • Theorem. For any fixed α ∈ (0, 1),

max

0≤i≤n

bnew

ni

− anew

ni

bBJO

ni

− aBJO

ni

→ 1, while max

0≤i≤n (bBJO ni

− aBJO

ni

) = (1 + o(1))

  • 2 log log n

n , max

0≤i≤n (bnew ni

− anew

ni

) = O(n−1/2).

slide-35
SLIDE 35
  • IV. Bi-Log-Concave Distribution Functions

Shape constraint 1: Log-concave density. F has density f = eφ with φ : R → [−∞, ∞) concave. Shape constraint 2: Bi-log-concave distribution function. Both log(F) and log(1 − F) are concave.

  • Log-concave density =

⇒ bi-log-concave c.d.f.

  • A bi-log-concave c.d.f. may have arbitrarily many modes!
slide-36
SLIDE 36

Theorem. Let J(F) := {x ∈ R : 0 < F(x) < 1} = ∅. Four equivalent statements:

◮ F bi-log-concave. ◮ F has a density f . On J(F),

f = F ′ > 0, f F ց and f 1 − F ր.

◮ F has a bounded density f . On J(F),

f = F ′ > 0 and −f 2 1 − F ≤ f ′ ≤ f 2 F .

◮ F has a density f s.t. for arbitrary x ∈ J(F) and t ∈ R,

F(x + t)      ≤ F(x) exp f F (x) · t

  • ,

≥ 1 − (1 − F(x)) exp

f 1 − F (x) · t

  • .
slide-37
SLIDE 37
  • 4
  • 2

2 4

  • 0.5

0.0 0.5 1.0 1.5

F 1 + log(F) −log(1 − F)

slide-38
SLIDE 38
  • 4
  • 2

2 4 0.0 0.1 0.2 0.3 0.4

f = F ' f F f (1 − F)

slide-39
SLIDE 39
  • 4
  • 2

2 4

  • 0.2
  • 0.1

0.0 0.1 0.2

f ' f 2 F −f 2 (1 − F)

slide-40
SLIDE 40
  • 4
  • 2

2 4

  • 0.2

0.0 0.2 0.4 0.6 0.8 1.0 1.2

slide-41
SLIDE 41
  • Estimation. Presumably no NPMLE of a bi-log-concave F

:-( Confidence bands. Starting from a standard (1 − α)-confidence band (Ln, Un) for F, I P

  • Ln ≤ F ≤ Un on R
  • = 1 − α,

define Lo

n(x) := inf

  • G(x) : G bi-log-concave, Ln ≤ G ≤ Un on R
  • ,

Uo

n (x) := sup

  • G(x) : G bi-log-concave, Ln ≤ G ≤ Un on R
  • .
slide-42
SLIDE 42

0.0 0.2 0.4 0.6 0.8 1.0

slide-43
SLIDE 43

0.0 0.2 0.4 0.6 0.8 1.0

slide-44
SLIDE 44

0.0 0.2 0.4 0.6 0.8 1.0

slide-45
SLIDE 45

0.0 0.2 0.4 0.6 0.8 1.0

slide-46
SLIDE 46
  • 4
  • 2

2 4 6 8 0.0 0.2 0.4 0.6 0.8 1.0 x F(x)

slide-47
SLIDE 47
  • 4
  • 2

2 4 6 8 0.0 0.2 0.4 0.6 0.8 1.0 x F(x)

slide-48
SLIDE 48
  • Theorem. For any integer k > 0,

sup

G : Lo

n≤G≤Uo n

  • xk G(dx) −
  • xk F(dx)
  • =

   Op

  • (log n)kn−1/2

with KS band, Op

  • n−1/2

with new band. Whenever

  • eλx F(dx) < ∞,

sup

G : Lo

n≤G≤Uo n

  • eλx G(dx) −
  • eλx F(dx)
  • = op(1).
slide-49
SLIDE 49
  • V. Bi-Log-Concave Binary Regression

Generic observation: (X, Y ) ∈ R × {0, 1} (or R × [0, 1]). Shape constraint: I E(Y | X = x) = µ(x) with µ : R → [0, 1] bi-log-concave: log(µ) and log(1 − µ) both concave. Nonparametric extension of logistic regression, because x → ℓ(a + bx) strictly bi-log-concave for arbitrary a, b ∈ R.