SLIDE 1

Estimation — Functional linear model

Adaptive estimation in functional linear model: a model selection approach

Angelina Roche

joint work with Élodie Brunel and André Mas

I3M-Université Montpellier II

7èmes Journées de Statistique Fonctionnelle et Opératorielle 28-29 Juin 2012, Montpellier


SLIDE 2

Summary

1. Model definition and estimation context

2. Estimation procedure of the slope function
   Estimation: minimization of the least squares contrast
   Penalized contrast model selection

3. Upper and lower bounds on the risk
   Oracle inequality
   Convergence rate over Sobolev spaces

4. Numerical results
   Simulation method
   Results

SLIDE 4

Functional linear regression model

We model the dependence between a functional random predictor $X$ and a scalar response $Y$ by the linear relation

\[ Y = \int_0^1 \beta(t) X(t)\,dt + \varepsilon, \tag{1} \]

where

$\beta$ is an unknown function in $L^2([0,1])$, the slope function;

$X$ is a centred random variable with values in $L^2([0,1])$;

$\varepsilon$ is a centred real random variable, independent of $X$, with variance $\sigma^2$.

The aim is to estimate the function $\beta$ from a sample $\{(X_i, Y_i),\ i = 1, \dots, n\}$ satisfying Equation (1).

SLIDE 5

Some existing works on the functional linear model

Many estimation procedures come with asymptotic convergence results (see e.g. Cardot et al. (1999, 2003), Cai and Hall (2006), Hall and Horowitz (2007), Crambes et al. (2009), ...).

The theoretically optimal choice of the smoothing parameters depends on the unknown regularity of both the slope $\beta$ and the predictor $X$.

In practice, the smoothing parameters are chosen by cross-validation.

Non-asymptotic results providing adaptive data-driven estimators were missing until the recent paper of Comte and Johannes (2010).

SLIDE 6

Prediction error: definition

Let $(X_{n+1}, Y_{n+1})$ be independent of the sample $(X_1, Y_1), \dots, (X_n, Y_n)$, and let

\[ \hat Y_{n+1} = \int_0^1 \hat\beta(s) X_{n+1}(s)\,ds \]

be the value of $Y_{n+1}$ predicted from $X_{n+1}$ and the estimator $\hat\beta$.

Prediction error. The prediction error of $\hat\beta$ is the quantity

\[ E\left[ \left( \hat Y_{n+1} - E[Y_{n+1} \mid X_{n+1}] \right)^2 \,\middle|\, X_1, \dots, X_n \right] = \sum_{j \ge 1} \lambda_j \langle \hat\beta - \beta, \varphi_j \rangle^2 =: \|\hat\beta - \beta\|_\Gamma^2, \]

where, for all $j$, $\lambda_j$ is the eigenvalue of the covariance operator $\Gamma$ associated with the eigenfunction $\varphi_j$, and $\langle \cdot, \cdot \rangle$ is the usual scalar product of $L^2([0,1])$.
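
As a small illustration of the definition above, the $\Gamma$-norm of an estimation error can be computed directly from its basis coefficients. The Python sketch below is only an illustration: the function name `gamma_norm_sq` and the example numbers are assumptions, not part of the talk.

```python
import numpy as np

def gamma_norm_sq(coef_diff, eigenvalues):
    """Return sum_j lambda_j * c_j^2, where c_j = <beta_hat - beta, phi_j>.

    coef_diff holds the first coefficients c_1, c_2, ...; later ones are
    treated as zero. eigenvalues must be at least as long as coef_diff.
    """
    coef_diff = np.asarray(coef_diff, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(eigenvalues[: coef_diff.size] * coef_diff ** 2))

# Hypothetical example: eigenvalues lambda_j = j^{-2}, three error coefficients.
lam = 1.0 / np.arange(1, 6) ** 2
err = gamma_norm_sq([0.1, 0.0, -0.2], lam)
```

Because the weights $\lambda_j$ decrease, an error carried by high-frequency coefficients costs less in prediction than the same error on the first coefficients.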

SLIDE 7

Reformulation of the problem

Multiplying both sides of the model equation by $X(s)$ and taking the expectation, we obtain the following formulation of the problem:

\[ g(s) := E[Y X(s)] = E\left[ \left( \int_0^1 \beta(t) X(t)\,dt \right) X(s) \right] =: \Gamma\beta(s), \]

where $\Gamma$ is the covariance operator associated with the random function $X$.

The problem of estimating the function $\beta$ is therefore closely related to the inversion of the covariance operator $\Gamma$, or of its empirical counterpart

\[ \Gamma_n : f \in L^2([0,1]) \mapsto \frac{1}{n} \sum_{i=1}^{n} \langle f, X_i \rangle X_i. \]

SLIDE 8

Dimension reduction

Aim: find an approximation space $S_m$ of dimension $D_m < +\infty$ containing as much information as possible.

The final objective is to define an estimator based on functional PCA, i.e. to take $S_m$ as the space spanned by the eigenfunctions associated with the $m$ largest eigenvalues of $\Gamma_n$.

Problem: it is difficult to control an estimator defined on a random space.

SLIDE 9

Hypothesis: consequence on the covariance operator eigenfunctions

Context of "circular data": assumption on the curve $X$. The curve $X$ is assumed to be 1-periodic ($X(0) = X(1)$) and second-order stationary.

In this context the Fourier basis $(\varphi_j)_{j \ge 1}$ of $L^2([0,1])$, defined by

\[ \varphi_1 \equiv 1, \qquad \varphi_{2j}(\cdot) = \sqrt{2} \cos(2\pi j \cdot) \qquad \text{and} \qquad \varphi_{2j+1}(\cdot) = \sqrt{2} \sin(2\pi j \cdot), \]

is a basis of eigenfunctions of the covariance operator $\Gamma$.
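
The Fourier basis defined above can be evaluated on a grid and checked for orthonormality. The following Python sketch is a minimal illustration; the function name `fourier_basis`, the grid size and the number of basis functions are assumptions chosen for the example.

```python
import numpy as np

def fourier_basis(n_funcs: int, grid: np.ndarray) -> np.ndarray:
    """Return an (n_funcs, len(grid)) array whose row j-1 holds phi_j on the grid."""
    basis = np.empty((n_funcs, grid.size))
    for j in range(1, n_funcs + 1):
        k = j // 2  # frequency: phi_{2k} is a cosine, phi_{2k+1} is a sine
        if j == 1:
            basis[j - 1] = 1.0
        elif j % 2 == 0:
            basis[j - 1] = np.sqrt(2.0) * np.cos(2.0 * np.pi * k * grid)
        else:
            basis[j - 1] = np.sqrt(2.0) * np.sin(2.0 * np.pi * k * grid)
    return basis

# Regular grid on [0, 1): Riemann sums are then exact for low-frequency
# trigonometric polynomials.
grid = np.arange(512) / 512
phi = fourier_basis(7, grid)
gram = phi @ phi.T / grid.size  # approximates the inner products <phi_j, phi_k>
```

Here `gram` should be numerically close to the identity matrix, which reflects the orthonormality of the basis in $L^2([0,1])$.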

SLIDE 12

Dimension reduction

Approximation spaces. Let $N_n \in \mathbb{N}^*$ and $m \in \{1, \dots, N_n\}$. We denote by

\[ S_m := \mathrm{span}\{\varphi_1, \dots, \varphi_{2m+1}\} \]

the linear space, called a model, spanned by the first $2m+1$ functions of the trigonometric basis.

For all $m \in \{1, \dots, N_n\}$, we define an estimator of $\beta$ on the space $S_m$.

SLIDE 13

Estimation on $S_m$: minimization of the least squares contrast

For all $m \in \{1, \dots, N_n\}$, we compute the least squares estimator on $S_m$:

\[ \hat\beta_m := \arg\min_{f \in S_m} \gamma_n(f), \]

where $\gamma_n$ is the least squares contrast defined by

\[ \gamma_n : f \mapsto \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \langle f, X_i \rangle \right)^2. \]

After this first step, we obtain a family $\{\hat\beta_m,\ m = 1, \dots, N_n\}$ of estimators of the slope function $\beta$.
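
In coordinates, minimizing $\gamma_n$ over $S_m$ is an ordinary least squares problem in the scores $\langle X_i, \varphi_j \rangle$, $j = 1, \dots, 2m+1$. The Python sketch below illustrates this on synthetic data; the function names, the simulated scores and the chosen coefficients of $\beta$ are assumptions for the example, not the authors' code.

```python
import numpy as np

def least_squares_on_model(scores: np.ndarray, y: np.ndarray, m: int) -> np.ndarray:
    """Coefficients of hat(beta)_m in (phi_1, ..., phi_{2m+1}).

    scores[i, j] is assumed to hold <X_i, phi_{j+1}>.
    """
    design = scores[:, : 2 * m + 1]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def contrast(scores: np.ndarray, y: np.ndarray, coef: np.ndarray) -> float:
    """gamma_n(f) = (1/n) sum_i (Y_i - <f, X_i>)^2 for f with coefficients coef."""
    fitted = scores[:, : coef.size] @ coef
    return float(np.mean((y - fitted) ** 2))

# Synthetic data (illustrative values only).
rng = np.random.default_rng(0)
n, n_basis = 200, 21
lam = 1.0 / np.arange(1, n_basis + 1) ** 2           # eigenvalues j^{-2}
scores = rng.standard_normal((n, n_basis)) * np.sqrt(lam)
beta_coef = np.zeros(n_basis)
beta_coef[:3] = [1.0, -0.5, 0.25]                     # beta lies in S_1 here
y = scores @ beta_coef + 0.1 * rng.standard_normal(n)

# Family of estimators hat(beta)_m for m = 1, ..., 10.
family = {m: least_squares_on_model(scores, y, m) for m in range(1, 11)}
```

Since the models are nested, the contrast $\gamma_n(\hat\beta_m)$ can only decrease as $m$ grows, which is why the contrast alone cannot be used to choose the dimension.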

SLIDE 15

Dimension selection: heuristic

Denote by $m^*$ the unknown ideal dimension, called the oracle:

\[ m^* := \arg\min_{m = 1, \dots, N_n} E\left[ \|\beta - \hat\beta_m\|_\Gamma^2 \right]. \]

The idea is to define a data-driven criterion that selects a dimension whose performance is similar to that of the oracle.

Let $\beta_m$ be the orthogonal projection of $\beta$ onto $S_m$. We have the following bias-variance decomposition:

\[ E\left[ \|\beta - \hat\beta_m\|_\Gamma^2 \right] = E\left[ \|\beta - \beta_m\|_\Gamma^2 \right] + E\left[ \|\beta_m - \hat\beta_m\|_\Gamma^2 \right]. \]

We therefore look for a criterion with a similar behaviour:

\[ \mathrm{crit}(m) := \gamma_n(\hat\beta_m) + \mathrm{pen}(m). \]

SLIDE 16

Dimension selection via penalization

We then choose an element of the family $\{\hat\beta_m,\ m = 1, \dots, N_n\}$ minimizing the penalized criterion

\[ \mathrm{crit}(m) := \gamma_n(\hat\beta_m) + \mathrm{pen}(m), \qquad \text{where} \qquad \mathrm{pen}(m) := \kappa \frac{2m+1}{n} \sigma^2, \]

$\kappa$ is a numerical constant and $\sigma^2$ can be replaced by an estimator $\hat\sigma_m^2$ (see the simulation section).

The estimator selected by our penalized criterion is then $\hat\beta_{\hat m}$ with

\[ \hat m \in \arg\min_{m = 1, \dots, N_n} \mathrm{crit}(m). \]
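
The selection rule above can be sketched in a few lines of Python. This is a hedged illustration on synthetic scores: the function name `select_dimension`, the simulated data and in particular the default `kappa=2.0` are assumptions, since the talk does not state the value of the constant $\kappa$.

```python
import numpy as np

def select_dimension(scores, y, n_max, sigma2, kappa=2.0):
    """Return (m_hat, crit) with crit[m] = gamma_n(hat(beta)_m) + pen(m).

    scores[i, j] is assumed to hold <X_i, phi_{j+1}>; kappa=2.0 is a
    placeholder value, not taken from the source.
    """
    n = y.size
    crit = {}
    for m in range(1, n_max + 1):
        dim = 2 * m + 1                       # dimension of the model S_m
        design = scores[:, :dim]
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        gamma_n = float(np.mean((y - design @ coef) ** 2))
        crit[m] = gamma_n + kappa * dim / n * sigma2
    m_hat = min(crit, key=crit.get)           # minimizer of the penalized criterion
    return m_hat, crit

# Synthetic data (illustrative values only).
rng = np.random.default_rng(1)
n, n_max, sigma2 = 500, 10, 0.01
lam = 1.0 / np.arange(1, 2 * n_max + 2) ** 2
scores = rng.standard_normal((n, lam.size)) * np.sqrt(lam)
beta_coef = 1.0 / np.arange(1, lam.size + 1) ** 2
y = scores @ beta_coef + np.sqrt(sigma2) * rng.standard_normal(n)

m_hat, crit = select_dimension(scores, y, n_max, sigma2)
```

The penalty grows linearly with the model dimension $2m+1$, which offsets the automatic decrease of the contrast for larger models.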

SLIDE 19

Oracle inequality

Theorem (Brunel and Roche (2011)). Assume moment conditions on the random variables $\langle \varphi_j, X \rangle / \sqrt{\lambda_j}$ and $\varepsilon$, and choose $N_n$ such that

\[ \min_{1 \le j \le 2N_n+1} \lambda_j \ge 2/n^2 \qquad \text{and} \qquad 2N_n + 1 \le K \frac{n}{\log^3 n}, \]

with $K$ a constant. Then, for every slope function $\beta \in L^2([0,1])$ such that $E[\langle \beta, X \rangle^4] < +\infty$:

\[ E\left[ \|\hat\beta_{\hat m} - \beta\|_\Gamma^2 \right] \le C_1 \min_{m = 1, \dots, N_n} \left( \inf_{f \in S_m} \|\beta - f\|_\Gamma^2 + \mathrm{pen}(m) \right) + \frac{C(\beta, \Gamma)}{n}, \]

where

\[ C(\beta, \Gamma) = C_2 \left( 1 + \|\beta\|_\Gamma^2 + E[\langle \beta, X \rangle^4] \right). \]

SLIDE 21

Periodized Sobolev spaces

We recall the definition of a Sobolev space on $[0,1]$:

\[ W_2^\alpha = \left\{ f \in L^2([0,1]),\ f^{(\alpha-1)} \text{ absolutely continuous},\ \|f^{(\alpha)}\| \le L \right\}, \]

for $\alpha \in \mathbb{N}^*$ and $L > 0$.

We consider the following subset of $W_2^\alpha$:

\[ W^{per}(\alpha, L) = \left\{ f \in W_2^\alpha,\ \forall j = 1, \dots, \alpha - 1,\ f^{(j)}(0) = f^{(j)}(1) \right\}. \]

SLIDE 22

Upper bound on the rate of convergence

Theorem.

Polynomial case. If, for all $j$, $j^{-2a}/c \le \lambda_j \le c\, j^{-2a}$, with $a > 1/2$ and $c > 0$, then

\[ \sup_{\beta \in W^{per}(\alpha, L)} E\left[ \|\hat\beta_{\hat m} - \beta\|_\Gamma^2 \right] \le C_P\, n^{-(2\alpha + 2a)/(2\alpha + 2a + 1)}. \]

Exponential case. If, for all $j$, $\exp(-j^{2a})/c \le \lambda_j \le c \exp(-j^{2a})$, with $a, c > 0$, then

\[ \sup_{\beta \in W^{per}(\alpha, L)} E\left[ \|\hat\beta_{\hat m} - \beta\|_\Gamma^2 \right] \le C_E\, n^{-1} (\log n)^{1/(2a)}. \]

Remark: in the case where $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, these bounds coincide with the lower bounds given by Cardot and Johannes (2010).

SLIDE 25

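Simulation of $X$

\[ X = \sum_{j=1}^{1001} \sqrt{\lambda_j}\, \xi_j \varphi_j, \]

with $\xi_1, \dots, \xi_{1001}$ independent realizations of $\mathcal{N}(0, 1)$.

We consider two sequences $(\lambda_j)_{j \ge 1}$:

\[ \lambda_j^{(P)} = \frac{1}{j^2}; \qquad \lambda_j^{(E)} = \exp\left( -\sqrt{j} \right). \]

[Figure: simulated curves $X(t)$ against $t \in [0, 1]$, one panel for $\lambda = \lambda^{(E)}$ and one panel for $\lambda = \lambda^{(P)}$.]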
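
The simulation of $X$ described above can be sketched as follows in Python. It assumes that the scores have variance $\lambda_j$ (so the coefficients are $\sqrt{\lambda_j}\,\xi_j$) and uses the Fourier basis $\varphi_1 \equiv 1$, $\varphi_{2j} = \sqrt{2}\cos(2\pi j\,\cdot)$, $\varphi_{2j+1} = \sqrt{2}\sin(2\pi j\,\cdot)$; the function name `simulate_curves`, the grid and the sample size of 50 are assumptions for the example.

```python
import numpy as np

def simulate_curves(n, eigenvalues, grid, rng):
    """Draw n curves X = sum_j sqrt(lambda_j) * xi_j * phi_j on the given grid.

    Returns (curves, scores), with curves of shape (n, len(grid)) and
    scores[i, j-1] = sqrt(lambda_j) * xi_j for curve i.
    """
    n_funcs = eigenvalues.size
    j = np.arange(1, n_funcs + 1)
    freq = (j // 2)[:, None]                         # frequency of phi_j
    angle = 2.0 * np.pi * freq * grid[None, :]
    is_cosine = (j % 2 == 0)[:, None]                # even index: cosine, odd: sine
    basis = np.sqrt(2.0) * np.where(is_cosine, np.cos(angle), np.sin(angle))
    basis[0] = 1.0                                   # phi_1 is the constant function
    xi = rng.standard_normal((n, n_funcs))
    scores = xi * np.sqrt(eigenvalues)
    return scores @ basis, scores

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 257)
j = np.arange(1, 1002)
lam_poly = 1.0 / j ** 2                              # lambda_j^(P) = j^{-2}
lam_exp = np.exp(-np.sqrt(j))                        # lambda_j^(E) = exp(-sqrt(j))
curves_poly, _ = simulate_curves(50, lam_poly, grid, rng)
curves_exp, _ = simulate_curves(50, lam_exp, grid, rng)
```

By construction each simulated curve satisfies $X(0) = X(1)$, in line with the periodicity assumption of the "circular data" setting.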

SLIDE 26

Slope functions

We define:

$\beta_1(t) = \ln(15t^2 + 10) + \cos(4\pi t)$ (Cardot et al. (2003));

$\beta_2(t) = t(t - 1)$.

[Figure: plots of $\beta_1$ and $\beta_2$ on $[0, 1]$.]

Variance of the noise $\varepsilon$: $\sigma^2 = 0.01$, assumed to be known.

SLIDE 28

Estimation of $\beta_1(x) = \ln(15x^2 + 10) + \cos(4\pi x)$

[Figure: two panels showing estimates of $\beta_1(x)$ against $x$, for $n = 2000$.]

Mean and median over 1000 Monte Carlo replications of $\|\hat\beta_{\hat m} - \beta\|_\Gamma^2$:

| Eigenvalues | Statistic | n = 500 | n = 1000 | n = 2000 |
|---|---|---|---|---|
| $\lambda_j^{(P)} = j^{-2}$ | mean ($\times 10^{-3}$) | 0.54 | 0.17 | 0.091 |
| $\lambda_j^{(P)} = j^{-2}$ | median ($\times 10^{-3}$) | 0.55 | 0.15 | 0.089 |
| $\lambda_j^{(E)} = e^{-\sqrt{j}}$ | mean ($\times 10^{-3}$) | 0.58 | 0.23 | 0.13 |
| $\lambda_j^{(E)} = e^{-\sqrt{j}}$ | median ($\times 10^{-3}$) | 0.58 | 0.21 | 0.13 |

SLIDE 29

Estimation of $\beta_2(x) = x(x - 1)$

[Figure: two panels showing estimates of $\beta_2(x)$ against $x$, for $n = 2000$.]

Mean and median over 1000 Monte Carlo replications of $\|\hat\beta_{\hat m} - \beta\|_\Gamma^2$:

| Eigenvalues | Statistic | n = 500 | n = 1000 | n = 2000 |
|---|---|---|---|---|
| $\lambda_j^{(P)} = j^{-2}$ | mean ($\times 10^{-3}$) | 0.51 | 0.084 | 0.037 |
| $\lambda_j^{(P)} = j^{-2}$ | median ($\times 10^{-3}$) | 0.53 | 0.057 | 0.033 |
| $\lambda_j^{(E)} = e^{-\sqrt{j}}$ | mean ($\times 10^{-3}$) | 0.52 | 0.10 | 0.044 |
| $\lambda_j^{(E)} = e^{-\sqrt{j}}$ | median ($\times 10^{-3}$) | 0.56 | 0.073 | 0.042 |

SLIDE 30

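Estimating the noise variance

When the variance $\sigma^2$ is not assumed to be known, we replace the penalty $\mathrm{pen}(m) = \kappa \sigma^2 \frac{D_m}{n}$ by

\[ \widehat{\mathrm{pen}}(m) := \kappa\, \hat\sigma_m^2\, \frac{D_m}{n}, \qquad \text{where} \qquad \hat\sigma_m^2 := \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \langle \hat\beta_m, X_i \rangle \right)^2 = \gamma_n(\hat\beta_m). \]

Mean over 1000 Monte Carlo replications of $\|\hat\beta_{\hat m} - \beta\|_\Gamma^2 \times 10^{3}$, for $\lambda_j = j^{-2}$:

| Slope | Noise variance | n = 500 | n = 1000 | n = 2000 |
|---|---|---|---|---|
| $\beta_1$ | known $\sigma^2$ | 0.54 | 0.17 | 0.091 |
| $\beta_1$ | unknown $\sigma^2$ | 0.57 | 0.18 | 0.091 |
| $\beta_2$ | known $\sigma^2$ | 0.51 | 0.084 | 0.037 |
| $\beta_2$ | unknown $\sigma^2$ | 0.54 | 0.089 | 0.037 |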

SLIDE 31

Conclusion and perspectives

The slope function $\beta$ is estimated by minimization of a penalized least squares contrast.

The estimation procedure is simple enough to be implemented easily.

The prediction error of this estimator is controlled by an oracle inequality, whatever the regularity of the function to be estimated.

The maximal risk over Sobolev spaces reaches the optimal rate of convergence.

However, the assumption that the function $X$ is periodic is too restrictive; a generalization of the results presented here to non-periodic curves is in progress.

SLIDE 32

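Estimation procedure

Let $(\hat\lambda_j, \hat\varphi_j)_{j \ge 1}$ be the eigenelements of $\Gamma_n$, sorted such that $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots$.

1. Estimation on $\hat S_m$

For all $m = 1, \dots, N_n$, if $\hat\lambda_m > 0$, set

\[ \tilde\beta_m = \sum_{j=1}^{m} \frac{\langle \hat g, \hat\varphi_j \rangle}{\hat\lambda_j}\, \hat\varphi_j, \]

the unique minimizer of the least squares contrast on $\hat S_m = \mathrm{span}\{\hat\varphi_1, \dots, \hat\varphi_m\}$ (recall that $\hat g := \frac{1}{n} \sum_{i=1}^{n} Y_i X_i$).

2. Dimension selection

\[ \hat m \in \arg\min_{m = 1, \dots, N_n} \left( \gamma_n(\tilde\beta_m) + \mathrm{pen}(m) \right), \qquad \text{where} \qquad \mathrm{pen}(m) = \kappa' \frac{m}{n} \sigma^2. \]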
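
Step 1 of the procedure above can be sketched in Python for curves observed on a regular grid. This is only an illustration under stated assumptions: integrals are approximated by Riemann sums with step `step`, and the function name `fpca_estimator`, the rank-3 synthetic curves and the noise-free response are choices made for the example.

```python
import numpy as np

def fpca_estimator(curves, y, m, step):
    """Discretized FPCA estimator: sum_{j<=m} <g_hat, phi_hat_j> / lambda_hat_j * phi_hat_j.

    curves has shape (n, G); inner products are approximated by step * sum.
    """
    n = y.size
    cov = step * curves.T @ curves / n               # matrix of the operator Gamma_n
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:m]             # m largest eigenvalues
    lam_hat = eigval[order]
    if lam_hat[-1] <= 0:
        raise ValueError("hat(lambda)_m must be positive")
    phi_hat = eigvec[:, order].T / np.sqrt(step)     # unit norm for the discrete L2 product
    g_hat = y @ curves / n                           # g_hat = (1/n) sum_i Y_i X_i
    coef = step * (phi_hat @ g_hat) / lam_hat        # <g_hat, phi_hat_j> / lambda_hat_j
    return coef @ phi_hat

# Synthetic check: rank-3 curves and a slope in their span, without noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
step = grid[1] - grid[0]
psi = np.sqrt(2.0) * np.sin(np.pi * (np.arange(1, 4)[:, None] - 0.5) * grid[None, :])
curves = (rng.standard_normal((200, 3)) * np.array([1.0, 0.5, 0.25])) @ psi
beta = np.array([1.0, -2.0, 0.5]) @ psi
y = step * curves @ beta                             # Y_i = <beta, X_i>, no noise
beta_tilde = fpca_estimator(curves, y, m=3, step=step)
```

In this noise-free, rank-3 setting the estimator with $m = 3$ recovers $\beta$ on the grid, because $\tilde\beta_m$ is then the projection of $\beta$ onto the span of the observed curves.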

SLIDE 33

Estimation results

[Figure: estimation of $\beta_1(x) = \ln(15x^2 + 10) + \cos(4\pi x)$, $n = 2000$; three panels showing $\tilde\beta(x)$ against $x$, for $\lambda_j = j^{-2}$, $\lambda_j = j^{-3}$ and $\lambda_j = e^{-j}$.]

[Figure: estimation of $\beta_2(x) = x(x - 1)$, $n = 2000$; three panels showing $\tilde\beta(x)$ against $x$, for $\lambda_j = j^{-2}$, $\lambda_j = j^{-3}$ and $\lambda_j = e^{-j}$.]

SLIDE 34

Thank you for your attention!

Brunel, E. and Roche, A. (2011). Penalized contrast estimation in functional linear model with circular data, hal.archives-ouvertes.fr:hal-00651399.

Cai, T. and Hall, P. (2006). Prediction in functional linear regression, Ann. Statist., 34(5), 2159–2179.

Cai, T. and Yuan, M. (2012). Minimax and adaptive prediction for functional linear regression, J. American Statistical Association, to appear.

Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimator for the functional linear model, Statistica Sinica, 13, 571–591.

Comte, F. and Johannes, J. (2010). Adaptive estimation in circular functional linear models, Math. Method. Statist., 19(1), 42–63.

Crambes, C., Kneip, A. and Sarda, P. (2009). Smoothing spline estimator for functional linear regression, Ann. Statist., 37(1), 35–72.

SLIDE 35

Assumptions on the moments

There exist two constants $v > 0$ and $c > 0$ such that, for all $j = 1, \dots, 2m + 1$ and for all $q \ge 2$:

\[ E\left[ \left( \frac{\langle \varphi_j, X_i \rangle}{\sqrt{\lambda_j}} \right)^{2q} \right] \le \frac{q!}{2}\, v^2 c^{q-2}. \]

The random variable $\varepsilon$ admits a moment $\tau_p$ of order $p > 6$.

SLIDE 36

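Sketch of proof

For all $m = 1, \dots, N_n$:

\[ \gamma_n(\hat\beta_{\hat m}) + \mathrm{pen}(\hat m) \le \gamma_n(\hat\beta_m) + \mathrm{pen}(m) \le \gamma_n(\beta_m) + \mathrm{pen}(m), \]

with $\beta_m$ the orthogonal projection of $\beta$ on $S_m$, and

\[ \gamma_n(\hat\beta_{\hat m}) - \gamma_n(\beta_m) = \|\hat\beta_{\hat m} - \beta\|_n^2 - \|\beta_m - \beta\|_n^2 + 2\nu_n(\beta_m - \hat\beta_{\hat m}), \]

with, for all $f \in L^2([0,1])$:

\[ \nu_n(f) := \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \langle f, X_i \rangle \qquad \text{and} \qquad \|f\|_n^2 := \frac{1}{n} \sum_{i=1}^{n} \langle f, X_i \rangle^2. \]

Then, combining the two displays,

\[ \|\hat\beta_{\hat m} - \beta\|_n^2 \le \|\beta_m - \beta\|_n^2 + 2\nu_n(\hat\beta_{\hat m} - \beta_m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m). \]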

SLIDE 37

Simulation of $X$

\[ X = \sum_{j=1}^{501} \sqrt{\lambda_j}\, \xi_j \psi_j, \]

with $\xi_1, \dots, \xi_{501}$ independent realizations of $\mathcal{N}(0, 1)$, and $(\psi_j)_{j \ge 1}$ the eigenfunctions of the covariance operator associated with the Brownian motion:

\[ \psi_j(x) = \sqrt{2} \sin(\pi (j - 0.5) x). \]

We consider three sequences $(\lambda_j)_{j \ge 1}$:

\[ \lambda_j^{(P_1)} = j^{-2}, \qquad \lambda_j^{(P_2)} = j^{-3}, \qquad \lambda_j^{(E)} = \exp(-j). \]

SLIDE 38

Comparison with cross-validation ($\beta_1$, $\lambda_j = j^{-3}$, $n = 1000$)

Penalized criterion: $\mathrm{crit}(m) = \gamma_n(\hat\beta_m) + \mathrm{pen}(m)$ (red line).

GCV: $\mathrm{crit}_{GCV}(m) = \dfrac{\sum_{i=1}^{n} (Y_i - \hat Y_i)^2}{(1 - \mathrm{tr}(H_m)/n)^2}$ (light blue dotted line).

CV: $\mathrm{crit}_{CV}(m) = \dfrac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat Y_i^{(-i)} \right)^2$ (blue dot-dash line).

[Figure: six panels showing $\tilde\beta(x)$ against $x$ for the three selection criteria.]

SLIDE 39

Dataset

We apply our estimation procedure to the ozone concentration data studied previously by Aneiros-Perez et al., Cardot et al. (2006) and Crambes et al. (2009).

The dataset consists of 474 daily measurements of ozone concentrations and the ozone peak of the next day.

[Figure: left panel "Ozone concentration", ozone concentration (µg/m^3) against time (h); right panel "Ozone peak", ozone concentration against days.]

SLIDE 40

Division of the data sample

We suppose that the dependence between the concentration curve $X_j$ of day $j$ and the ozone peak $Y_j$ of the following day can be modelled by functional linear regression.

We randomly split our sample into two subsets:

A sub-sample $\{(X_i^E, Y_i^E),\ i = 1, \dots, n\}$, with $n = 373$, used to compute the slope estimator $\hat\beta_{\hat m}$.

The rest of the sample, $\{(X_i^T, Y_i^T),\ i = 1, \dots, 101\}$, is kept to evaluate the performance of the estimator.
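
A random split of this kind can be sketched in Python as follows. The sizes 474, 373 and 101 come from the slides; the function name `random_split` and the seed are assumptions for the example, and the talk does not say how the split was actually drawn.

```python
import numpy as np

def random_split(n_total: int, n_train: int, rng: np.random.Generator):
    """Return sorted index arrays (train_idx, test_idx) partitioning range(n_total)."""
    perm = rng.permutation(n_total)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

rng = np.random.default_rng(2012)    # arbitrary seed, for reproducibility only
train_idx, test_idx = random_split(474, 373, rng)
```

The estimator would then be computed on the curves indexed by `train_idx` and its predictions compared with the observed peaks indexed by `test_idx`.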

SLIDE 41

Results

[Figure: estimator defined on the Fourier basis. Left panel: "Predicted values vs. observed values", predicted concentration peak (µg/m^3) against observed concentration peak (µg/m^3). Right panel: "Residuals".]

Figure: Left: plot of the points $(\hat Y_i^{(T)}, Y_i^{(T)})$ (blue) and of the line $y = x$. Right: boxplot of the vector $\{\hat Y_i^{(T)} - Y_i^{(T)},\ i = 1, \dots, 101\}$.

SLIDE 42

Application to ozone data

[Figure: "Predicted values vs. observed values", predicted concentration peak (µg/m^3) against observed concentration peak (µg/m^3), one panel for the estimator defined on the Fourier basis and one panel for FPCA; "Residuals", boxplots for the estimator defined on the Fourier basis and for FPCA.]

SLIDE 43

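First theoretical results: oracle-type inequality

Suppose that the eigenvalues $(\lambda_j)_{j \ge 1}$ decrease at a polynomial or exponential rate, and that some moment assumptions hold on the curves $X$ and the noise $\varepsilon$. Then, for all $\beta \in L^2([0,1])$ such that there exists a constant $b > 0$ verifying

\[ \sum_{j \ge 1} j^b \langle \beta, \varphi_j \rangle^2 < +\infty, \]

we have:

\[ E\left[ \|\tilde\beta_{\hat m} - \beta\|_\Gamma^2 \right] \le C_1 \min_{m \in \mathcal{M}_n} \left( E\left[ \|\beta - \hat\Pi_m \beta\|_n^2 \right] + E\left[ \|\beta - \hat\Pi_m \beta\|_\Gamma^2 \right] + \mathrm{pen}(m) \right) + \frac{C_2}{n} \left( 1 + \|\beta\|_\Gamma^2 \right), \]

with $C_1, C_2 > 0$ independent of $\beta$ and $n$.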