Fast adaptive estimation of log-additive exponential models in Kullback-Leibler divergence

SLIDE 1

Theoretic results Simulation study

Fast adaptive estimation of log-additive exponential models in Kullback-Leibler divergence

Colloque Jeunes Probabilistes et Statisticiens

Richard Fischer

EDF R&D MRI, CERMICS, LAMA

Supervisors: Cristina Butucea (LAMA), Jean-François Delmas (CERMICS), Anne Dutfoy (EDF R&D MRI)

18/04/2016

Richard Fischer Fast adaptive estimation of log-additive exponential models 1 / 21

SLIDE 2

Summary

1. Theoretic results
2. Simulation study

SLIDE 4

Estimation problem

Suppose that we have an i.i.d. sample $X^n = (X^1, X^2, \ldots, X^n)$ of a $d$-dimensional distribution whose density has a product form on $\triangle = \{x = (x_1, \ldots, x_d) \in \mathbb{R}^d : 0 \le x_1 \le x_2 \le \ldots \le x_d \le 1\}$:

$$f(x) = \prod_{i=1}^{d} p_i(x_i)\, \mathbf{1}_\triangle(x) = \exp\Big(\sum_{i=1}^{d} \ell^0_i(x_i) - a_0\Big)\, \mathbf{1}_\triangle(x),$$

such that $\int_{[0,1]} \ell^0_i\, q_i\, dx = 0$, with $q_i$ the $i$-th marginal of the Lebesgue measure on $\triangle$, and $a_0$ a normalizing constant.

Suppose that for all $1 \le i \le d$, $\ell^0_i$ belongs to a Sobolev space $W^2_{r_i}(q_i)$ with $r_i \in \mathbb{N}^*$ unknown:

$$W^2_{r_i}(q_i) = \big\{ h \in L^2(q_i);\ h^{(r_i - 1)} \text{ is absolutely continuous and } h^{(r_i)} \in L^2(q_i) \big\}.$$

The product structure of the density suggests a log-additive model, which reduces the $d$-variate problem to $d$ univariate problems.

SLIDE 5

Log-Additive Exponential Series Estimator

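Log-additive exponential family

For $\theta = (\theta_{i,k};\ 1 \le i \le d,\ 1 \le k \le m_i)$:

$$f_\theta(x) = \exp\Big(\sum_{i=1}^{d} \sum_{k=1}^{m_i} \theta_{i,k}\, \varphi_{i,k}(x_i) - \psi(\theta)\Big)\, \mathbf{1}_\triangle(x).$$

We require a family of functions $(\varphi_{i,k}(x_i);\ 1 \le i \le d,\ k \in \mathbb{N})$ adapted to $\triangle$ ("orthonormality" w.r.t. the Lebesgue measure on $\triangle$).

Basis functions

For $1 \le i \le d$, $k \in \mathbb{N}$, we define for $t \in I$:

$$\varphi_{i,k}(t) = \rho_{i,k}\, P^{(d-i,\, i-1)}_k(2t - 1),$$

where $P^{(d-i,\, i-1)}_k$ is the Jacobi polynomial of degree $k$ and $\rho_{i,k}$ a constant.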
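The role of the Jacobi parameters $(d-i, i-1)$ can be checked numerically. The sketch below is an illustration, not the author's code: it assumes $I = [0,1]$, uses the fact that the $i$-th marginal of the Lebesgue measure on $\triangle$ has density $q_i(t) = t^{i-1}(1-t)^{d-i} / ((i-1)!\,(d-i)!)$, and assumes that $\rho_{i,k}$ is chosen so that $\int \varphi_{i,k}^2\, q_i = 1$ (the slide only says "a constant"). All function names are hypothetical.

```python
from math import comb, factorial, sqrt

def shifted_jacobi(k, alpha, beta, t):
    """Evaluate P_k^{(alpha, beta)}(2t - 1) for integer alpha, beta >= 0."""
    # Explicit sum formula for Jacobi polynomials, rewritten in the variable t.
    return sum(
        comb(k + alpha, k - s) * comb(k + beta, s) * (t - 1.0) ** s * t ** (k - s)
        for s in range(k + 1)
    )

def q(i, d, t):
    """Density of the i-th marginal of the Lebesgue measure on the simplex."""
    return t ** (i - 1) * (1.0 - t) ** (d - i) / (factorial(i - 1) * factorial(d - i))

def rho(i, d, k):
    """Assumed normalisation: makes the integral of phi_{i,k}^2 * q_i equal to 1."""
    num = factorial(i - 1) * factorial(d - i) * (2 * k + d) * factorial(k + d - 1) * factorial(k)
    den = factorial(k + d - i) * factorial(k + i - 1)
    return sqrt(num / den)

def phi(i, d, k, t):
    """Basis function phi_{i,k}(t) = rho_{i,k} * P_k^{(d-i, i-1)}(2t - 1)."""
    return rho(i, d, k) * shifted_jacobi(k, d - i, i - 1, t)

def simpson(g, n=2000):
    """Composite Simpson rule on [0, 1] with n (even) sub-intervals."""
    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0

def gram(i, d, kmax):
    """Matrix of integrals of phi_{i,k} * phi_{i,l} * q_i for 1 <= k, l <= kmax."""
    return [
        [simpson(lambda t: phi(i, d, k, t) * phi(i, d, l, t) * q(i, d, t))
         for l in range(1, kmax + 1)]
        for k in range(1, kmax + 1)
    ]

# Example: d = 3, second coordinate; the Gram matrix should be close to the identity.
G = gram(2, 3, 3)
```

Under these assumptions the Gram matrix is numerically the identity, which is one way to read the quoted "orthonormality": each $\varphi_{i,k}$ is orthonormal with respect to the marginal $q_i$, not with respect to the uniform measure on $[0,1]$.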

SLIDE 6

Maximum likelihood estimator

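We have a sample of size $n$: $X^n = \big(X^j = (X^j_1, \ldots, X^j_d)\big)_{j = 1..n}$.

The maximum likelihood estimator $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ verifies, for $1 \le i \le d$, $1 \le k \le m_i$:

$$E_{f_{\hat\theta_{m,n}}}\big[\varphi_{i,k}(X_i)\big] = \hat\mu_{m,n,i,k} = \underbrace{\frac{1}{n} \sum_{j=1}^{n} \varphi_{i,k}(X^j_i)}_{\text{empirical mean}}.$$

This is equivalent to (with $|m| = \sum_{i=1}^{d} m_i$):

$$\hat\theta_{m,n} = \operatorname*{argmax}_{\theta \in \mathbb{R}^{|m|}}\ \theta \cdot \hat\mu_{m,n} - \psi(\theta) = \operatorname*{argmax}_{\theta \in \mathbb{R}^{|m|}}\ \underbrace{\frac{1}{n} \sum_{j=1}^{n} \log\big(f_\theta(X^j)\big)}_{\text{log-likelihood}}.$$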
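The equivalence between moment matching and likelihood maximisation can be illustrated on a small case. The following is a minimal sketch under stated assumptions, not the procedure used in the talk: $d = 2$, one basis function per direction ($m = (1,1)$), the constants $\rho_{i,k}$ dropped, $\psi(\theta)$ computed by a midpoint rule on the triangle, and a damped Newton iteration chosen here for simplicity. All names are hypothetical.

```python
import math
import random

def phi(x1, x2):
    """Degree-1 basis for d = 2: P_1^{(1,0)}(2t-1) = 3t-1 and P_1^{(0,1)}(2t-1) = 3t-2."""
    return (3.0 * x1 - 1.0, 3.0 * x2 - 2.0)

def triangle_nodes(n=60):
    """Midpoint nodes (x1, x2, weight) for the triangle 0 <= x1 <= x2 <= 1."""
    h = 1.0 / n
    return [((a + 0.5) * h, (b + 0.5) * h, h * h * (0.5 if a == b else 1.0))
            for a in range(n) for b in range(a, n)]

def moments(theta, nodes):
    """Return psi(theta), E_theta[phi] and the entries of Cov_theta(phi)."""
    z = m0 = m1 = s00 = s01 = s11 = 0.0
    for x1, x2, w in nodes:
        p0, p1 = phi(x1, x2)
        e = w * math.exp(theta[0] * p0 + theta[1] * p1)
        z += e
        m0 += e * p0
        m1 += e * p1
        s00 += e * p0 * p0
        s01 += e * p0 * p1
        s11 += e * p1 * p1
    m0, m1 = m0 / z, m1 / z
    return math.log(z), (m0, m1), (s00 / z - m0 * m0, s01 / z - m0 * m1, s11 / z - m1 * m1)

def fit_mle(sample, nodes, max_iter=100, tol=1e-8):
    """Maximise theta . mu_hat - psi(theta) by Newton steps with step halving."""
    n = len(sample)
    mu = [sum(phi(*x)[c] for x in sample) / n for c in (0, 1)]
    theta = [0.0, 0.0]
    for _ in range(max_iter):
        psi, mean, (c00, c01, c11) = moments(theta, nodes)
        g0, g1 = mu[0] - mean[0], mu[1] - mean[1]  # gradient of the objective
        if max(abs(g0), abs(g1)) < tol:
            break
        det = c00 * c11 - c01 * c01
        step = ((c11 * g0 - c01 * g1) / det, (c00 * g1 - c01 * g0) / det)
        obj = theta[0] * mu[0] + theta[1] * mu[1] - psi
        t = 1.0
        while True:
            cand = [theta[0] + t * step[0], theta[1] + t * step[1]]
            new = cand[0] * mu[0] + cand[1] * mu[1] - moments(cand, nodes)[0]
            if new >= obj - 1e-12 or t < 1e-8:
                break
            t /= 2.0
        theta = cand
    return theta, mu

def sample_model(theta, n, rng):
    """Rejection sampling from f_theta with uniform proposals on the triangle."""
    bound = max(theta[0] * p0 + theta[1] * p1 for p0, p1 in (phi(0, 0), phi(0, 1), phi(1, 1)))
    out = []
    while len(out) < n:
        x1, x2 = sorted((rng.random(), rng.random()))
        p0, p1 = phi(x1, x2)
        if rng.random() < math.exp(theta[0] * p0 + theta[1] * p1 - bound):
            out.append((x1, x2))
    return out

rng = random.Random(0)
nodes = triangle_nodes()
sample = sample_model((1.0, -0.5), 500, rng)  # hypothetical true parameter
theta_hat, mu_hat = fit_mle(sample, nodes)
model_mean = moments(theta_hat, nodes)[1]     # should match mu_hat
```

At convergence the model expectation of each basis function coincides with its empirical mean, which is exactly the first-order condition stated on the slide.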

SLIDE 7

Result of non-adaptive convergence rate I.

Theorem

Let $f^0(x) = \exp\big(\sum_{i=1}^{d} \ell^0_i(x_i) - a_0\big)\, \mathbf{1}_\triangle(x)$. Assume that $\ell^0_i \in W^2_{r_i}(q_i)$, $r_i \in \mathbb{N}$, $r_i > d$. Choose $m_i = m_i(n) \to \infty$ such that:

$$|m|^{2d} \sum_{i=1}^{d} m_i^{-2r_i} \to 0 \quad \text{and} \quad |m|^{2d+1} / n \to 0.$$

Then the Kullback-Leibler divergence between $f^0$ and $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ satisfies:

$$D\big(f^0 \,\|\, \hat f_{m,n}\big) = O_P\Big(\sum_{i=1}^{d} \Big(m_i^{-2r_i} + \frac{m_i}{n}\Big)\Big).$$

SLIDE 8

Result of non-adaptive convergence rate II.

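Optimal convergence rate

If we choose $m_i$ proportional to $n^{1/(2r_i+1)}$, we obtain the optimal univariate rate:

$$D\big(f^0 \,\|\, \hat f_{m,n}\big) = O_P\Big(\sum_{i=1}^{d} n^{-\frac{2r_i}{2r_i+1}}\Big) = O_P\Big(n^{-\frac{2\min(r)}{2\min(r)+1}}\Big).$$

The same rate is obtained with $m_i = n^{1/(2\min(r)+1)}$ for all $1 \le i \le d$.

Uniform convergence

$$\mathcal{K}_r(\kappa) = \Big\{ f^0 = e^{\sum_{i=1}^{d} \ell^0_i(x_i) - a_0};\ \|\ell^0_i\|_\infty \le \kappa,\ \big\|(\ell^0_i)^{(r_i)}\big\|_{L^2(q_i)} \le \kappa \Big\}$$

The convergence in probability is uniform on the set $\mathcal{K}_r(\kappa)$ of densities:

$$\lim_{K \to \infty}\ \limsup_{n \to \infty}\ \sup_{f^0 \in \mathcal{K}_r(\kappa)} P\Big( D\big(f^0 \,\|\, \hat f_{m,n}\big) \ge \Big(\sum_{i=1}^{d} m_i^{-2r_i} + \frac{|m|}{n}\Big) K \Big) = 0.$$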
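The choice $m_i \propto n^{1/(2r_i+1)}$ comes from balancing the approximation term $m_i^{-2r_i}$ against the estimation term $m_i/n$ in the non-adaptive bound. A short check, treating $m_i$ as a real number and ignoring constants:

```latex
\[
m_i^{-2r_i} = \frac{m_i}{n}
\iff m_i^{2r_i+1} = n
\iff m_i = n^{1/(2r_i+1)},
\qquad\text{and then}\qquad
m_i^{-2r_i} + \frac{m_i}{n} = 2\, n^{-\frac{2r_i}{2r_i+1}} .
\]
```

Since $r \mapsto n^{-2r/(2r+1)}$ is decreasing in $r$ for $n > 1$, the sum over $i$ is dominated by the coordinate with the smallest regularity, which gives the rate $n^{-2\min(r)/(2\min(r)+1)}$.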

SLIDE 9

Adaptive estimation

The optimal choice $m_i \sim n^{1/(2\min(r)+1)}$ depends on $r$, which is unknown.

Adaptation method:

1. Split the sample $X^n$ into two parts, $X^n_1$ and $X^n_2$ (the first part feeds the estimators, the second part the aggregation).
2. Create multiple estimators $\hat f_{m,n} = f_{\hat\theta_{m,n}}$ with $m \in \mathcal{M}_n$ based on the sample $X^n_1$.
  • Number of estimators: $N_n$, increasing with $n$.
  • Each $m \in \mathcal{M}_n$ corresponds to regularity parameters $r$ with $\min(r)$ fixed.
3. Perform a convex aggregation on the logarithms of $\hat f_{m,n}$ with the sample $X^n_2$ to obtain the final estimator $f_{\hat\lambda^*_n}$.

SLIDE 10

Choice of estimators

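Number of estimators: $N_n = o(\log(n))$, $\lim_{n \to \infty} N_n = +\infty$.

The grid:

$$\mathcal{N}_n = \Big\{ \big\lfloor n^{\frac{1}{2(d+j)+1}} \big\rfloor,\ 1 \le j \le N_n \Big\}$$

Same number of basis functions in each direction:

$$\mathcal{M}_n = \big\{ m = (v, \ldots, v) \in \mathbb{R}^d,\ v \in \mathcal{N}_n \big\}$$

[Figure: the grid shown in the plane with axes $m_1$ and $m_2$.]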
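To make the grid concrete, the values $\lfloor n^{1/(2(d+j)+1)} \rfloor$ can be computed directly. This is a small sketch with hypothetical names; the number of estimators is passed in by hand, since the slide only requires $N_n = o(\log(n))$ without fixing a formula.

```python
import math

def grid(n, d, num_estimators):
    """Values floor(n^(1 / (2(d + j) + 1))) for j = 1, ..., num_estimators."""
    return [math.floor(n ** (1.0 / (2 * (d + j) + 1)))
            for j in range(1, num_estimators + 1)]

def model_set(n, d, num_estimators):
    """Vectors m = (v, ..., v) with v in the grid (duplicates removed, order kept)."""
    values = list(dict.fromkeys(grid(n, d, num_estimators)))
    return [(v,) * d for v in values]

# Example with d = 2 and three candidate regularities (j = 1, 2, 3).
large = grid(10 ** 6, 2, 3)       # [7, 4, 3]
small = model_set(1000, 2, 3)     # [(2, 2), (1, 1)]
```

For moderate sample sizes several values of $j$ collapse onto the same integer, so the grid contains fewer distinct models than $N_n$.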

SLIDE 14

Convex aggregation of log-densities

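Convex combination of log-densities

Let $\hat\ell_{m,n}(x) = \sum_{i=1}^{d} \sum_{k=1}^{m_i} \hat\theta_{i,k}\, \varphi_{i,k}(x_i)$ for $m \in \mathcal{M}_n$, and

$$f_\lambda(x) = \exp\Big(\sum_{m \in \mathcal{M}_n} \lambda_m\, \hat\ell_{m,n}(x) - \psi_\lambda\Big)\, \mathbf{1}_\triangle(x),$$

with $\lambda \in \Lambda^+ = \{(\lambda_m,\ m \in \mathcal{M}_n),\ \lambda_m \ge 0 \text{ and } \sum_{m \in \mathcal{M}_n} \lambda_m = 1\}$.

Selection of weights $\hat\lambda^*_n$ based on the sample $X^n_2$:

$$\hat\lambda^*_n = \operatorname*{argmax}_{\lambda \in \Lambda^+}\ \underbrace{\frac{1}{|X^n_2|} \sum_{X^j \in X^n_2} \log\big(f_\lambda(X^j)\big)}_{\text{log-likelihood}} - \underbrace{\frac{1}{2}\, \mathrm{pen}(\lambda)}_{\text{penalty}},$$

with $\mathrm{pen}(\lambda) = \sum_{m \in \mathcal{M}_n} \lambda_m\, D\big(f_\lambda \,\|\, \hat f_{m,n}\big)$.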
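The penalised criterion can be sketched for two candidate log-densities on the triangle ($d = 2$). This is an illustration under assumptions, not the author's implementation: the two functions in `ells` are hypothetical stand-ins for fitted $\hat\ell_{m,n}$, the hold-out points are made up, integrals use a midpoint rule, and the maximisation is a plain grid search over $\lambda = (t, 1-t)$.

```python
import math

def triangle_nodes(n=30):
    """Midpoint nodes (x1, x2, weight) for the triangle 0 <= x1 <= x2 <= 1."""
    h = 1.0 / n
    return [((a + 0.5) * h, (b + 0.5) * h, h * h * (0.5 if a == b else 1.0))
            for a in range(n) for b in range(a, n)]

def log_norm(ell, nodes):
    """psi: log of the integral of exp(ell) over the triangle."""
    return math.log(sum(w * math.exp(ell(x1, x2)) for x1, x2, w in nodes))

def mix(lam, ells):
    """Convex combination of log-densities, before normalisation."""
    return lambda x1, x2: sum(l * e(x1, x2) for l, e in zip(lam, ells))

def kl(ell_a, ell_b, nodes):
    """D(f_a || f_b) for densities f = exp(ell - psi) on the triangle."""
    psi_a, psi_b = log_norm(ell_a, nodes), log_norm(ell_b, nodes)
    total = 0.0
    for x1, x2, w in nodes:
        la, lb = ell_a(x1, x2) - psi_a, ell_b(x1, x2) - psi_b
        total += w * math.exp(la) * (la - lb)
    return total

def penalty(lam, ells, nodes):
    """pen(lambda) = sum over m of lambda_m * D(f_lambda || f_m)."""
    ell_lam = mix(lam, ells)
    return sum(l * kl(ell_lam, e, nodes) for l, e in zip(lam, ells))

def objective(lam, ells, sample, nodes):
    """Hold-out log-likelihood of f_lambda minus half the penalty."""
    ell_lam = mix(lam, ells)
    psi = log_norm(ell_lam, nodes)
    loglik = sum(ell_lam(x1, x2) - psi for x1, x2 in sample) / len(sample)
    return loglik - 0.5 * penalty(lam, ells, nodes)

def aggregate_two(ells, sample, nodes, steps=50):
    """Grid search over lambda = (t, 1 - t), t in {0, 1/steps, ..., 1}."""
    candidates = [(j / steps, 1.0 - j / steps) for j in range(steps + 1)]
    return max(candidates, key=lambda lam: objective(lam, ells, sample, nodes))

# Hypothetical candidates and hold-out points (not taken from the talk).
ells = [
    lambda x1, x2: 0.5 * (3.0 * x1 - 1.0),
    lambda x1, x2: 1.0 * (3.0 * x1 - 1.0) - 0.5 * (3.0 * x2 - 2.0),
]
holdout = [(0.1, 0.4), (0.3, 0.9), (0.5, 0.6), (0.2, 0.7), (0.6, 0.8)]
nodes = triangle_nodes()
lam_hat = aggregate_two(ells, holdout, nodes)
```

The penalty vanishes at the vertices of $\Lambda^+$, where $f_\lambda$ is one of the candidates, and is non-negative elsewhere, so it discourages mixtures that sit far from all candidates at once.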

SLIDE 15

Sharp oracle inequality for aggregation

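Lemma

Let $n \in \mathbb{N}^*$ be fixed. The convex aggregate estimator $f_{\hat\lambda^*_n}$ verifies, for any $x > 0$, with probability greater than $1 - \exp(-x)$:

$$D\big(f^0 \,\|\, f_{\hat\lambda^*_n}\big) - \min_{m \in \mathcal{M}_n} D\big(f^0 \,\|\, \hat f_{m,n}\big) \le \frac{\beta\, (\log(N_n) + x)}{n},$$

with a constant $\beta = \beta\big(\|\ell^0\|_\infty,\ \|(\ell^0_i)^{(r_i)}\|_{L^2(q_i)}\big)$.

The remainder term, of order $\log(N_n)/n$, is negligible compared to $n^{-2\min(r)/(2\min(r)+1)}$.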
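The negligibility of the remainder can be seen by taking the ratio of the two orders, for a fixed $r$ and using $N_n = o(\log(n))$ from the choice of estimators (so that $N_n \le \log(n)$ for $n$ large enough):

```latex
\[
\frac{\log(N_n)/n}{n^{-\frac{2\min(r)}{2\min(r)+1}}}
= \log(N_n)\, n^{-\frac{1}{2\min(r)+1}}
\le \log(\log(n))\, n^{-\frac{1}{2\min(r)+1}}
\xrightarrow[n \to \infty]{} 0 .
\]
```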

SLIDE 16

Adaptive estimation - Main result

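Theorem

The convex aggregate estimator $f_{\hat\lambda^*_n}$ converges to $f^0$ in probability with the convergence rate:

$$D\big(f^0 \,\|\, f_{\hat\lambda^*_n}\big) = O_P\Big(n^{-\frac{2\min(r)}{2\min(r)+1}}\Big).$$

Uniform convergence

The convergence is uniform for $r \in (\mathcal{R}_n)^d$, with $\mathcal{R}_n = \{j,\ d + 1 \le j \le R_n\}$:

$$\lim_{K \to \infty}\ \limsup_{n \to \infty}\ \sup_{r \in (\mathcal{R}_n)^d}\ \sup_{f^0 \in \mathcal{K}_r(\kappa)} P\Big( D\big(f^0 \,\|\, f_{\hat\lambda^*_n}\big) \ge n^{-\frac{2\min(r)}{2\min(r)+1}}\, K \Big) = 0,$$

where $R_n$ satisfies:

$$R_n \le N_n + d, \qquad R_n \le \Big\lfloor n^{\frac{1}{2(d+N_n)+1}} \Big\rfloor, \qquad R_n \le \frac{\log(n)}{2 \log(\log(N_n))} - \frac{1}{2}.$$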

SLIDE 17

Summary

1. Theoretic results
2. Simulation study

SLIDE 18

Truncation model

$Y = (Y_1, Y_2)$, with $Y_1$ and $Y_2$ independent with densities $p_1$, $p_2$.

We observe only when $0 \le Y_1 \le Y_2 \le 1$.

Density of the observations:

$$f(x) = \frac{p_1(x_1)\, p_2(x_2)}{\int_\triangle p_1(x_1)\, p_2(x_2)\, dx}\, \mathbf{1}_\triangle(x).$$

[Figure: curves $g_1$ and $g_2$, plotted as f(x) against x, in three panels: (a) Beta, (b) Gumbel, (c) Normal mix.]
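Observations from the truncation model can be simulated by drawing independent pairs and keeping only those that fall in $\triangle$. The sketch below uses hypothetical Beta parameters, since the parameters behind the three panels are not given on the slide.

```python
import random

def sample_truncated(draw_y1, draw_y2, n, rng):
    """Keep pairs of independent draws (Y1, Y2) that satisfy 0 <= Y1 <= Y2 <= 1."""
    out = []
    while len(out) < n:
        y1, y2 = draw_y1(rng), draw_y2(rng)
        if 0.0 <= y1 <= y2 <= 1.0:
            out.append((y1, y2))
    return out

rng = random.Random(1)
# Hypothetical choice: Y1 ~ Beta(2, 4) and Y2 ~ Beta(4, 2).
observations = sample_truncated(
    lambda r: r.betavariate(2.0, 4.0),
    lambda r: r.betavariate(4.0, 2.0),
    1000,
    rng,
)
```

The accepted pairs have exactly the density $f$ above: the acceptance step conditions the product density $p_1(x_1) p_2(x_2)$ on the event $\{0 \le Y_1 \le Y_2 \le 1\}$, and the normalising integral is the acceptance probability.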

SLIDE 19

Simulation framework

  • Aggregate estimator with $m_1 = m_2 = 1, 2, 3, 4$
  • Sample size: $n = 200, 500, 1000$
  • Split into two parts: $n_1 = 0.8n$, $n_2 = 0.2n$
  • Parameter estimation: $\hat\theta_{m,n} = \operatorname*{argmax}_{\theta \in \mathbb{R}^{m_1 + m_2}}\ \theta \cdot \hat\mu_{m,n} - \psi(\theta)$, calculated by numerical optimization
  • Comparison with a kernel density estimator with Gaussian kernel and bandwidth selected according to Scott's rule
  • We calculate the average Kullback-Leibler distance based on 100 estimations
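Two ingredients of this framework are easy to sketch: the 80/20 split and the kernel benchmark. The code below is an assumption-laden illustration with hypothetical names: it uses a product Gaussian kernel with per-coordinate bandwidths $h_i = \hat\sigma_i\, n^{-1/(d+4)}$, one common form of Scott's rule; the slides do not say whether a diagonal or a full-covariance bandwidth was used.

```python
import math
import random

def split_sample(sample, frac=0.8):
    """Split into the first frac * n points and the remaining ones."""
    n1 = int(round(frac * len(sample)))
    return sample[:n1], sample[n1:]

def scott_factor(n, d):
    """Scott's rule scaling factor n^(-1 / (d + 4))."""
    return n ** (-1.0 / (d + 4))

def scott_bandwidths(sample):
    """Per-coordinate bandwidths: sample standard deviation times Scott's factor."""
    n, d = len(sample), len(sample[0])
    hs = []
    for c in range(d):
        mean = sum(x[c] for x in sample) / n
        var = sum((x[c] - mean) ** 2 for x in sample) / (n - 1)
        hs.append(scott_factor(n, d) * math.sqrt(var))
    return hs

def gaussian_kde(sample, hs):
    """Return the product-Gaussian kernel density estimate as a function."""
    norm = len(sample) * math.prod(h * math.sqrt(2.0 * math.pi) for h in hs)
    def density(x):
        return sum(
            math.exp(-0.5 * sum(((xc - sc) / h) ** 2 for xc, sc, h in zip(x, s, hs)))
            for s in sample
        ) / norm
    return density

# Toy data: uniform points on the triangle, n = 1000.
rng = random.Random(0)
data = [tuple(sorted((rng.random(), rng.random()))) for _ in range(1000)]
first, second = split_sample(data)      # n1 = 800, n2 = 200
kde = gaussian_kde(data, scott_bandwidths(data))
value = kde((0.3, 0.6))
```

A Gaussian kernel estimate of this kind puts some mass outside $\triangle$, which is one reason a comparison on the simplex can favour an estimator built on the support itself.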

SLIDE 20

Simulation results - Beta I.

[Figure: AESE versus Kernel on the Beta example, six panels: KL-distance for n = 200, 500, 1000 and L2-distance for n = 200, 500, 1000.]

SLIDE 21

Simulation results - Beta II.

[Figure: density f(X,T) over the axes X and T, with contour plots over the axes x_1 and x_2, in three panels: (a) True density, (b) LAESE, (c) Kernel.]

SLIDE 22

Simulation results - Normal mix I.

[Figure: AESE versus Kernel on the Normal mix example, six panels: KL-distance for n = 200, 500, 1000 and L2-distance for n = 200, 500, 1000.]

SLIDE 23

Simulation results - Normal mix II.

[Figure: density f(X_1,X_2) over the axes X_1 and X_2, with contour plots over the axes x_1 and x_2, in three panels: (a) True density, (b) LAESE, (c) Kernel.]
