Model selection for fast density estimation - László (Laci) Györfi - PowerPoint PPT Presentation

Model selection for fast density estimation. László (Laci) Györfi, Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary. July 17, 2008. E-mail: gyorfi@szit.bme.hu


slide-1
SLIDE 1

Model selection for fast density estimation

László (Laci) Györfi¹

¹Department of Computer Science and Information Theory
Budapest University of Technology and Economics
Budapest, Hungary

July 17, 2008
e-mail: gyorfi@szit.bme.hu
www.szit.bme.hu/~gyorfi

Györfi
Model selection for fast density estimation


slide-5
SLIDE 5

Density estimation

$\mathbb{R}^d$-valued i.i.d. random vectors $X_1, \dots, X_n$ distributed according to an unknown probability measure $\mu$ with density $f$.

The $L_1$ norm:

$$\|f - g\| := \int_{\mathbb{R}^d} |f(x) - g(x)|\,dx = 2 \sup_A \left| \int_A f(x)\,dx - \int_A g(x)\,dx \right|$$
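
The identity between the L1 distance and twice the largest set-wise difference can be checked numerically. The following is a minimal sketch, assuming two made-up distributions on four cells in place of densities on $\mathbb{R}^d$; the function names are illustrative only. The supremum over sets is attained on the set where the first distribution exceeds the second.

```python
def l1_distance(p, q):
    """Sum of |p_i - q_i| over a common finite set of cells."""
    return sum(abs(a - b) for a, b in zip(p, q))


def sup_set_difference(p, q):
    """sup over sets A of |P(A) - Q(A)|, attained at A = {i : p_i > q_i}."""
    return sum(a - b for a, b in zip(p, q) if a > b)


# Two hypothetical distributions on four cells (illustrative values only).
p = [0.1, 0.4, 0.3, 0.2]
q = [0.25, 0.25, 0.25, 0.25]

l1 = l1_distance(p, q)
sup_a = sup_set_difference(p, q)
print(l1, 2 * sup_a)  # the two numbers agree up to rounding
```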

slide-6
SLIDE 6

Kernel density estimate

For a kernel function $K$ and bandwidth $h > 0$, let $f_n$ be the kernel density estimate with sample size $n$:

$$f_n(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right).$$
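
A minimal sketch of the kernel density estimate, assuming $d = 1$ and a standard normal kernel (the slide leaves $K$ general); the data and bandwidth are made up. Because $K$ is a density, the estimate should integrate to about 1, which the Riemann sum at the end checks.

```python
import math


def gaussian_kernel(u):
    """Standard normal density on R (one possible K; an assumption here)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h, kernel=gaussian_kernel):
    """f_n(x) = 1/(n h^d) * sum_i K((x - X_i)/h), written for d = 1."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)


sample = [-1.2, -0.3, 0.1, 0.4, 1.5]  # made-up data
h = 0.5                               # made-up bandwidth

# Riemann sum of f_n over [-6, 6] with step 0.01.
step = 0.01
total = sum(kde(-6.0 + i * step, sample, h) for i in range(1201)) * step
print(total)
```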


slide-8
SLIDE 8

Density-free consistency

If

$$\lim_{n\to\infty} h_n = 0 \quad\text{and}\quad \lim_{n\to\infty} n h_n^d = \infty,$$

then, for any density $f$,

$$\lim_{n\to\infty} \mathbb{E}\|f - f_n\| = 0 \quad\text{and}\quad \lim_{n\to\infty} \|f - f_n\| = 0 \ \text{a.s.}$$


slide-11
SLIDE 11

Rate of convergence

If the density $f$ has compact support and is twice differentiable, then

$$\mathbb{E}(\|f_n - f\|) \le \frac{c_1}{\sqrt{n h_n^d}} + c_2 h_n^2.$$

If $h_n = c\,n^{-1/(d+4)}$, then

$$\mathbb{E}(\|f_n - f\|) \le C n^{-2/(d+4)}.$$

TOO SLOW.
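
The bandwidth choice can be recovered by balancing the two terms of the bound. The following worked step is a sketch that treats $c_1$ and $c_2$ as fixed constants and minimizes the upper bound only, not the exact error.

```latex
\begin{align*}
B(h) &= c_1 (n h^d)^{-1/2} + c_2 h^2, \\
B'(h) &= -\tfrac{d}{2}\, c_1\, n^{-1/2} h^{-d/2 - 1} + 2 c_2 h = 0
  \;\Longrightarrow\;
  h^{(d+4)/2} = \frac{d\, c_1}{4 c_2}\, n^{-1/2}
  \;\Longrightarrow\;
  h = \Big(\frac{d\, c_1}{4 c_2}\Big)^{2/(d+4)} n^{-1/(d+4)}, \\
(n h^d)^{-1/2} &\propto n^{-1/2}\, n^{d/(2(d+4))} = n^{-2/(d+4)},
  \qquad h^2 \propto n^{-2/(d+4)}.
\end{align*}
```

Both terms are then of order $n^{-2/(d+4)}$. For $d = 1$ this is $n^{-2/5}$ and for $d = 10$ it is $n^{-1/7}$, compared with the parametric rate $n^{-1/2}$.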


slide-16
SLIDE 16

Model selection for density estimation

We wish to estimate a density $f$ on $\mathbb{R}^d$ that belongs to a parametric family $\mathcal{F}_k$, where $k$ is unknown, but $\mathcal{F}_k \subset \mathcal{F}_{k+1}$ for all $k$.

$$\mathcal{F} = \bigcup_{k \ge 1} \mathcal{F}_k.$$

The complexity associated with $f$ is defined as

$$k^* = \min\{k \ge 1 : f \in \mathcal{F}_k\}.$$

slide-17
SLIDE 17

Example

$\mathcal{F}_k$ is the set of mixtures of $d$-dimensional normal densities with at most $k$ components.
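
A minimal sketch of this family for $d = 1$, assuming a mixture is stored as a list of hypothetical `(weight, mean, sd)` triples. Padding a two-component mixture with a zero-weight component gives the same density written with three components, which illustrates why the families are nested.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """components: list of (weight, mean, sd); weights are >= 0 and sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


g_two = [(0.3, -1.0, 0.5), (0.7, 2.0, 1.0)]   # made-up mixture with 2 components
g_three = g_two + [(0.0, 0.0, 1.0)]           # same density, padded to 3 components

points = [-2.0, 0.0, 1.0, 3.0]
values_two = [mixture_pdf(x, g_two) for x in points]
values_three = [mixture_pdf(x, g_three) for x in points]
```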


slide-25
SLIDE 25

Objective

We wish to introduce an estimate $k_n$ of the complexity $k^*$ and to pick a density estimate $\hat f_{k_n}$ in $\mathcal{F}$ with

1. $k_n \to k^*$ almost surely (i.e., $k_n = k^*$ almost surely, for all $n$ large enough)

2. and

$$\mathbb{E}\{\|\hat f_{k_n} - f\|\} = O\left(\frac{1}{\sqrt{n}}\right).$$

Biau, Devroye (2004): $k_n$ and $\hat f_{k_n}$ via projection of the empirical measure with respect to the Yatracos class. Too complex.


slide-29
SLIDE 29

Testing homogeneity

Two mutually independent samples $X_1, \dots, X_n$ and $X'_1, \dots, X'_n$ distributed according to unknown probability distributions $\mu$ and $\mu'$ on $\mathbb{R}^d$.

We are interested in testing the null hypothesis that the two samples are homogeneous, that is,

$$H_0 : \mu = \mu'.$$

Empirical probability distributions: $\mu_n$ and $\mu'_n$.

slide-30
SLIDE 30

The test statistic

Based on a partition $\mathcal{P}_n = \{A_{n1}, \dots, A_{n m_n}\}$ of $\mathbb{R}^d$, we let the test statistic be defined as

$$T_n = \sum_{j=1}^{m_n} |\mu_n(A_{nj}) - \mu'_n(A_{nj})|.$$
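
A minimal sketch of the statistic $T_n$, assuming $d = 1$ and equal-width cells $[jh, (j+1)h)$ covering the whole line (the slide allows any finite partition); the sample values are made up. Only cells that contain at least one point contribute to the sum.

```python
import math
from collections import Counter


def cell_index(x, h):
    """Index j of the cell [j*h, (j+1)*h) that contains x (d = 1)."""
    return math.floor(x / h)


def l1_test_statistic(sample_a, sample_b, h):
    """T_n = sum over cells A of |mu_n(A) - mu'_n(A)| for equal-width cells."""
    counts_a = Counter(cell_index(x, h) for x in sample_a)
    counts_b = Counter(cell_index(x, h) for x in sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    cells = set(counts_a) | set(counts_b)
    # Counter returns 0 for cells that one of the samples never visits.
    return sum(abs(counts_a[c] / n_a - counts_b[c] / n_b) for c in cells)


xs = [0.1, 0.2, 0.7, 1.4]
same = l1_test_statistic(xs, xs, h=0.5)                          # identical samples
disjoint = l1_test_statistic([0.1, 0.2], [5.1, 5.2], h=0.5)      # no shared cell
```

Identical samples give the smallest possible value and samples with no shared cell give the largest, which matches the range $0 \le T_n \le 2$.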


slide-32
SLIDE 32

Asymptotic behavior of Tn

Theorem. Under $H_0$, for all $0 < \varepsilon < 2$,

$$\mathbb{P}\{T_n > \varepsilon\} = e^{-n(g_T(\varepsilon) + o(1))}, \quad \text{as } n \to \infty,$$

where

$$g_T(\varepsilon) = (1 + \varepsilon/2)\ln(1 + \varepsilon/2) + (1 - \varepsilon/2)\ln(1 - \varepsilon/2) \approx \varepsilon^2/4.$$

(Biau, Györfi (2005))
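
The quality of the approximation $g_T(\varepsilon) \approx \varepsilon^2/4$ can be inspected numerically. This is a small sketch; the function name `g_t` is illustrative. The approximation is tight for small $\varepsilon$ and underestimates the rate function as $\varepsilon$ grows.

```python
import math


def g_t(eps):
    """Rate function g_T(eps) from the theorem, defined for 0 < eps < 2."""
    half = eps / 2.0
    return (1 + half) * math.log(1 + half) + (1 - half) * math.log(1 - half)


for eps in (0.1, 0.5, 1.0):
    # Columns: eps, exact rate function, quadratic approximation.
    print(eps, g_t(eps), eps ** 2 / 4)
```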


slide-37
SLIDE 37

A strongly consistent test

Corollary. Consider the test which rejects $H_0$ when

$$T_n > 2\sqrt{\ln 2\,\frac{m_n}{n}}.$$

Assume that

$$\lim_{n\to\infty} \frac{m_n}{n} = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{m_n}{\ln n} = \infty.$$

Then, under $H_0$, after a random sample size the test makes a.s. no error. Moreover, if $\mu \ne \mu'$, and for each sphere $S$ centered at the origin

$$\lim_{n\to\infty} \max_{j : A_{nj} \cap S \ne \emptyset} \operatorname{diam}(A_{nj}) = 0,$$

then after a random sample size the test makes a.s. no error.

(Biau, Györfi (2005))
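
A minimal sketch of the rejection rule in the corollary. The choice $m_n = \sqrt{n}$ is an assumption made here for illustration; it is one sequence with $m_n/n \to 0$ and $m_n/\ln n \to \infty$, not a recommendation from the slides.

```python
import math


def rejection_threshold(m_n, n):
    """Threshold 2 * sqrt(ln 2 * m_n / n) from the corollary."""
    return 2.0 * math.sqrt(math.log(2.0) * m_n / n)


def reject_h0(t_n, m_n, n):
    """True when the statistic T_n exceeds the threshold."""
    return t_n > rejection_threshold(m_n, n)


n = 10_000
m_n = int(math.sqrt(n))  # hypothetical choice of the number of cells: 100
thr = rejection_threshold(m_n, n)
print(thr)
```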


slide-42
SLIDE 42

Complexity estimation

Split the sample into two subsamples: $\{X_1, \dots, X_n\}$ and $\{X'_1, \dots, X'_n\} = \{X_{n+1}, \dots, X_{2n}\}$.

Let $\mathcal{P}_n = \{A_{nj} : j \ge 1\}$ be a cubic partition of $\mathbb{R}^d$ with cells of volume $h_n^d$.

Introduce the statistic

$$d_{n,k} = \inf_{g \in \mathcal{F}_k} \sum_{A \in \mathcal{P}_n} \left| \int_A g - \mu_{2n}(A) \right|.$$

Let the threshold be

$$T_n = \sum_{A \in \mathcal{P}_n} |\mu_n(A) - \mu'_n(A)|.$$

Estimate of $k^*$:

$$k_n = \min\{k \ge 1 : d_{n,k} \le T_n\}.$$
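
The selection rule can be sketched as a loop over $k$. This is a toy version under strong assumptions: each family is replaced by a finite list of candidates, each candidate is represented only by its vector of cell probabilities $\int_A g$, and all numbers are made up. With real parametric families the infimum would need a numerical optimizer instead of `min`.

```python
def partition_distance(cell_probs, empirical):
    """sum over cells A of |integral_A g - mu_2n(A)| for one candidate g."""
    return sum(abs(p - m) for p, m in zip(cell_probs, empirical))


def estimate_complexity(families, empirical, t_n):
    """Smallest k (1-based) with d_{n,k} <= T_n, or None if no k qualifies.

    families[k-1] is a finite list of candidates standing in for F_k; each
    candidate is given by its vector of cell probabilities.
    """
    for k, candidates in enumerate(families, start=1):
        d_nk = min(partition_distance(g, empirical) for g in candidates)
        if d_nk <= t_n:
            return k
    return None


uniform = [0.25, 0.25, 0.25, 0.25]
bimodal = [0.4, 0.1, 0.1, 0.4]
families = [[uniform], [uniform, bimodal]]  # nested: F_1 is contained in F_2
mu_2n = [0.38, 0.12, 0.09, 0.41]            # made-up empirical cell frequencies

k_n = estimate_complexity(families, mu_2n, t_n=0.1)
```

Here the uniform candidate is too far from the empirical frequencies for the threshold 0.1, so the rule moves on to the second family.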


slide-45
SLIDE 45

Theorem 1

Assume that, for each $k \ge 1$, $\mathcal{F}_k$ is closed with respect to the weak convergence topology. Then there exists a positive constant $\kappa$, depending on $f$, such that

$$\mathbb{P}\{k_n \ne k^*\} \le \exp\left(-\kappa\, h_n^{-d}\right),$$

and consequently, for the choice $h_n = n^{-\delta}$ with $0 < \delta < 1/d$, $k_n = k^*$ almost surely, for all $n$ large enough.

(Biau, Cadre, Devroye, Györfi (2008))
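
The step from the exponential bound to the almost sure statement is a Borel-Cantelli argument. The following is a sketch of that step, reading the bound as a bound on the error probability $\mathbb{P}\{k_n \ne k^*\}$.

```latex
\begin{align*}
h_n = n^{-\delta} \;&\Longrightarrow\; h_n^{-d} = n^{\delta d}, \\
\sum_{n \ge 1} \mathbb{P}\{k_n \ne k^*\}
  \;&\le\; \sum_{n \ge 1} \exp\!\big(-\kappa\, n^{\delta d}\big) \;<\; \infty
  \qquad (\delta d > 0), \\
\text{hence}\quad
\mathbb{P}\{k_n \ne k^* \text{ for infinitely many } n\} \;&=\; 0 .
\end{align*}
```

Note also that $\delta < 1/d$ gives $n h_n^d = n^{1 - \delta d} \to \infty$, so the expected number of sample points per cell grows with $n$.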


slide-49
SLIDE 49

Fast density estimate

Fix $k \ge 1$ and introduce the (Yatracos) class of sets

$$\mathcal{A}_k = \big\{ \{x : g_1(x) > g_2(x)\} : g_1, g_2 \in \mathcal{F}_k \big\}$$

and the goodness criterion for a density $g \in \mathcal{F}_k$:

$$\Delta_k(g) = \sup_{A \in \mathcal{A}_k} \left| \int_A g - \mu_{2n}(A) \right|.$$

The minimum distance estimate $\hat f_k$ minimizes the criterion $\Delta_k(g)$ over all $g$ in $\mathcal{F}_k$.

The density estimate is $\hat f_{k_n}$.
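
A toy sketch of the minimum distance estimate, assuming a finite family of probability vectors on three points in place of densities, with at least two distinct candidates; all values are made up. Even in this toy setting the number of Yatracos sets grows quadratically with the number of candidates, which hints at why the projection is costly for rich families.

```python
def yatracos_sets(candidates):
    """All sets {x : g1(x) > g2(x)} for ordered pairs of distinct candidates."""
    support = range(len(candidates[0]))
    sets = []
    for i, g1 in enumerate(candidates):
        for j, g2 in enumerate(candidates):
            if i != j:
                sets.append([x for x in support if g1[x] > g2[x]])
    return sets


def minimum_distance_estimate(candidates, empirical):
    """Index of the candidate minimising max_A |g(A) - mu(A)| over the sets."""
    sets = yatracos_sets(candidates)

    def criterion(g):
        return max(abs(sum(g[x] - empirical[x] for x in a)) for a in sets)

    return min(range(len(candidates)), key=lambda i: criterion(candidates[i]))


candidates = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [1 / 3, 1 / 3, 1 / 3]]
empirical = [0.45, 0.35, 0.20]  # made-up empirical frequencies
best = minimum_distance_estimate(candidates, empirical)
```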


slide-52
SLIDE 52

Theorem 2

If $\mathcal{A}_{k^*}$ has finite Vapnik-Chervonenkis dimension, then

$$\mathbb{E}\{\|\hat f_{k_n} - f\|\} = O\left(\frac{1}{\sqrt{n}}\right).$$

(Biau, Devroye (2004))


slide-57
SLIDE 57

Problem

The projection with respect to the Yatracos class is too complex.

For a kernel function $K$ and bandwidth $r > 0$, let $f_{2n}$ be the kernel density estimate with sample size $2n$:

$$f_{2n}(x) = \frac{1}{2n r^d} \sum_{i=1}^{2n} K\left(\frac{x - X_i}{r}\right).$$

Let $K_r * g$ be the expectation of the kernel estimate with density $g$:

$$K_r * g(x) = \frac{1}{r^d} \int K\left(\frac{x - z}{r}\right) g(z)\,dz.$$

The estimate $\bar f_n$ is defined as

$$\bar f_n = \arg\min_{g \in \mathcal{F}_{k_n}} \|K_r * g - f_{2n}\|,$$

that is, $\bar f_n$ is an $L_1$-projection of the kernel density estimate $f_{2n}$ with fixed bandwidth $r$.
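
A minimal sketch of the L1 projection, under assumptions chosen to keep it short: $d = 1$, a standard normal kernel, and a hypothetical one-parameter family of $N(m, 1)$ densities standing in for the selected family. With this kernel, smoothing an $N(m, 1)$ density gives the $N(m, 1 + r^2)$ density, so the smoothed candidate has a closed form. The L1 distance is approximated by a Riemann sum and the minimization by a grid search over $m$.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def l1_projection_mean(sample, r, means, grid):
    """Grid search for the mean m minimising ||K_r * g_m - f_2n|| in L1.

    g_m is the N(m, 1) density and K is the standard normal kernel, so
    K_r * g_m is the N(m, 1 + r^2) density.
    """
    step = grid[1] - grid[0]
    size = len(sample)
    # Kernel estimate f_2n with fixed bandwidth r, evaluated on the grid.
    f2n = [sum(normal_pdf(x, xi, r * r) for xi in sample) / size for x in grid]

    def l1(m):
        return step * sum(abs(normal_pdf(x, m, 1.0 + r * r) - fx)
                          for x, fx in zip(grid, f2n))

    return min(means, key=l1)


random.seed(0)
sample = [random.gauss(1.0, 1.0) for _ in range(400)]  # simulated data, true mean 1
grid = [-6.0 + 0.05 * i for i in range(281)]           # integration grid on [-6, 8]
means = [0.1 * i for i in range(-10, 31)]              # candidate means in [-1, 3]
m_hat = l1_projection_mean(sample, r=0.5, means=means, grid=grid)
print(m_hat)
```

The bandwidth $r$ stays fixed as the sample grows; the comparison is made between smoothed candidates and the smoothed data, not between the raw densities.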


slide-64
SLIDE 64

Theorem 3

Assume that $\mathcal{F}_k$ is closed in the weak convergence topology for every $k \ge 1$. Choose $k_n$ as before such that the bandwidth is $h = h_n = (\ln n)^{-(1+\delta)/d}$ with $\delta > 0$. Choose the kernel function $K$ such that it is a density function and its characteristic function is everywhere non-zero. Suppose that

$$\sup_{g \in \mathcal{F}_{k^*}} \frac{\|g - f\|}{\|K_r * g - K_r * f\|} < \infty$$

and

$$\int \sqrt{f} < \infty.$$

Then

$$\mathbb{E}\{\|\bar f_n - f\|\} \le O\left(\frac{1}{\sqrt{n}}\right).$$


slide-68
SLIDE 68

Luc’s problem

Let $f$ be the density of a multidimensional normal distribution. Find the optimal density estimate in $L_1$:

$$\min_{f_n} \mathbb{E}\|f_n - f\|.$$

The plug-in estimate is not optimal.
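
For reference, a minimal sketch of a plug-in estimate, assuming that "plug-in" means substituting the sample mean and sample variance into the normal density and taking $d = 1$; the slides do not spell out the definition, and the data are made up. This is the estimate that the slide says is not optimal in $L_1$.

```python
import math


def plug_in_normal_density(sample):
    """Fit N(mean, var) by sample moments and return (mean, var, density)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n

    def density(x):
        return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

    return mean, var, density


mean, var, density = plug_in_normal_density([1.0, 2.0, 3.0, 4.0])
print(mean, var, density(mean))
```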