slide-1
SLIDE 1

Conditional quantiles with functional covariates: an application to Ozone pollution forecasting

Hervé Cardot, Christophe Crambes & Pascal Sarda

Compstat - Prague, August 2004

Compstat 2004 - Prague – p.1/14

slide-5
SLIDE 5

Presentation of the data (1)

Data (ORAMIP):

9 variables: NO, N2, O3, WD, WS, . . . (hourly measurements)

6 stations

4 years: 1997 − 2000 (15th May - 15th Sept)

slide-6
SLIDE 6

Presentation of the data (2)

[Figure: Ozone (axis ticks 20 to 120) plotted against hours (axis ticks 10 to 70).]

slide-11
SLIDE 11

Presentation of the data (3)

variable of interest: daily maximum of O3:

$$Y = {}^t(Y_1, \ldots, Y_n)$$

covariates: NO, N2, O3, WD or WS:

                    18h ... 24h     1h ... 17h
day 0/day 1         X_{1,1}   ...   ...   X_{1,24}
...                 ...                   ...
day n−1/day n       X_{n,1}   ...   ...   X_{n,24}

$(X_i, Y_i)_{i=1,\ldots,n}$: couples of random variables with $Y_i \in \mathbb{R}$ and $X_i \in L^2(I)$

$X_i$ is known at $t_1, \ldots, t_p \in I$ (equispaced)
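The data layout described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_samples`, the nested-list input format, and the choice of taking the daily maximum over all 24 hours of the day are hypothetical and not taken from the slides.

```python
from typing import List, Tuple


def build_samples(
    covariate: List[List[float]], ozone: List[List[float]]
) -> Tuple[List[List[float]], List[float]]:
    """Build the pairs (X_i, Y_i) from hourly series.

    covariate[d][h - 1] and ozone[d][h - 1] hold the measurement on day d
    at hour h (h = 1..24). The curve X_i runs from 18h on day i-1 to 17h
    on day i; Y_i is the ozone maximum on day i (assumed here to be taken
    over the whole day).
    """
    X, Y = [], []
    for d in range(1, len(ozone)):
        evening = covariate[d - 1][17:24]  # hours 18..24 of the previous day
        morning = covariate[d][0:17]       # hours 1..17 of the current day
        X.append(evening + morning)        # 7 + 17 = 24 discretization points
        Y.append(max(ozone[d]))
    return X, Y


# Toy data: 3 days, value = 100 * day + hour, ozone used as its own covariate.
ozone = [[100 * d + h for h in range(1, 25)] for d in range(3)]
X, Y = build_samples(ozone, ozone)
```

With n + 1 days of measurements this yields n pairs, each curve being observed at p = 24 equispaced points.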

slide-15
SLIDE 15

Definition of the conditional quantiles

$\alpha \in\, ]0, 1[$, $x \in L^2(I)$

α conditional quantile:

$$P(Y \le g_\alpha(X) \mid X = x) = \alpha$$

property:

$$g_\alpha(x) = \arg\min_{a \in \mathbb{R}} E\big(l_\alpha(Y - a) \mid X = x\big) \quad \text{with } l_\alpha(u) = |u| + (2\alpha - 1)u$$
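The minimisation property can be checked numerically in the unconditional case, where the expectation is replaced by a sample average. The sketch below is an illustration only: the helper names and the toy sample are hypothetical, and the search is restricted to the sample points, which is enough because the empirical risk is convex and piecewise linear.

```python
def l_alpha(u: float, alpha: float) -> float:
    """Loss from the slide: l_alpha(u) = |u| + (2 * alpha - 1) * u."""
    return abs(u) + (2.0 * alpha - 1.0) * u


def empirical_risk(a: float, ys, alpha: float) -> float:
    """Sample average of l_alpha(y - a)."""
    return sum(l_alpha(y - a, alpha) for y in ys) / len(ys)


def argmin_over_sample(ys, alpha: float) -> float:
    """Sample point minimising the empirical risk (a minimiser always
    exists among the sample points for a piecewise-linear convex risk)."""
    return min(sorted(ys), key=lambda a: empirical_risk(a, ys, alpha))


ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median = argmin_over_sample(ys, 0.5)  # alpha = 0.5 gives l(u) = |u|
q90 = argmin_over_sample(ys, 0.9)
```

On this sample the minimiser for α = 0.5 is the median, 3.0, and for α = 0.9 it is the largest value, 100.0. The loss puts slope 2α on positive residuals and 2(1 − α) on negative ones.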

slide-18
SLIDE 18

Presentation of the model

model (cf. Koenker and Bassett, 1978):

$$g_\alpha(X) = c + \langle \Psi_\alpha, X \rangle = c + \int_I \Psi_\alpha(t) X(t)\, dt$$

we want to estimate the function $\Psi_\alpha \in L^2(I)$: spline estimation
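Since the curves are only observed on a grid, the integral in the model has to be approximated in practice. The sketch below uses the trapezoidal rule, which is an assumption (the slides do not say which quadrature is used); the function names, the grid on [0, 1] and the toy curves are hypothetical.

```python
def inner_product(psi, x, t) -> float:
    """Trapezoidal approximation of the integral of psi(t) * x(t) over the grid t."""
    total = 0.0
    for j in range(len(t) - 1):
        left = psi[j] * x[j]
        right = psi[j + 1] * x[j + 1]
        total += 0.5 * (left + right) * (t[j + 1] - t[j])
    return total


def g_alpha(c: float, psi, x, t) -> float:
    """Functional linear predictor c + <psi, x>."""
    return c + inner_product(psi, x, t)


p = 24
t = [j / (p - 1) for j in range(p)]  # p equispaced points on I = [0, 1]
psi = [1.0] * p                      # toy coefficient function psi(t) = 1
x = list(t)                          # toy curve x(t) = t
value = g_alpha(2.0, psi, x, t)      # 2 + integral of t over [0, 1]
```

Here the integrand is linear, so the trapezoidal rule is exact up to rounding and `value` is approximately 2.5.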

slide-21
SLIDE 21

Spline estimation of Ψα

$k \in \mathbb{N}^\star$, $q \in \mathbb{N}$

[Diagram: the interval I is divided into k sub-intervals I_1, ..., I_j, ..., I_k.]

B-splines basis:

$$B_{k,q} = {}^t(B_1, \ldots, B_{k+q})$$

estimator:

$$\hat{\Psi}_\alpha = {}^tB_{k,q}\,\hat{\theta} = \sum_{j=1}^{k+q} \hat{\theta}_j B_j$$
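To make the basis concrete, the following sketch evaluates the k + q B-splines of degree q at a point with the Cox-de Boor recursion. It assumes equal sub-intervals on I = [a, b] and boundary knots repeated q extra times; the function name `bspline_basis` and these conventions are illustrative choices, not taken from the slides.

```python
def bspline_basis(t: float, k: int, q: int, a: float = 0.0, b: float = 1.0):
    """Values at t of the k + q B-splines of degree q on [a, b],
    built on k equal sub-intervals (Cox-de Boor recursion)."""
    h = (b - a) / k
    # Knot vector: k + 1 breakpoints, boundary knots repeated q extra times.
    knots = [a] * q + [a + j * h for j in range(k)] + [b] * (q + 1)
    n_knots = len(knots)  # k + 2q + 1

    # Degree 0: indicator functions of the knot intervals.
    vals = []
    for j in range(n_knots - 1):
        inside = knots[j] <= t < knots[j + 1]
        # Close the last non-empty interval on the right so that t = b is covered.
        if t == b and knots[j] < knots[j + 1] == b:
            inside = True
        vals.append(1.0 if inside else 0.0)

    # Raise the degree one step at a time; zero-length spans contribute 0.
    for d in range(1, q + 1):
        new = []
        for j in range(n_knots - 1 - d):
            left = right = 0.0
            den1 = knots[j + d] - knots[j]
            den2 = knots[j + d + 1] - knots[j + 1]
            if den1 > 0:
                left = (t - knots[j]) / den1 * vals[j]
            if den2 > 0:
                right = (knots[j + d + 1] - t) / den2 * vals[j + 1]
            new.append(left + right)
        vals = new
    return vals  # length n_knots - 1 - q = k + q


basis = bspline_basis(0.3, k=8, q=3)
```

With k = 8 and q = 3 this returns 11 non-negative values that sum to 1 for any t in [a, b]. An estimate of Ψα at t is then the dot product of these values with a coefficient vector θ.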

slide-27
SLIDE 27

Determination of c and θ

$\hat{\theta}$ and $\hat{c}$ are solutions of the minimisation problem:

$$\min_{\theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} l_\alpha\big(Y_i - c - \langle {}^tB_{k,q}\theta, X_i \rangle\big) + \rho \left\| ({}^tB_{k,q}\theta)^{(m)} \right\|^2 \right\}$$

first term: empirical version of $E\big(l_\alpha(Y - c - \langle s, X \rangle)\big)$

second term: penalization

no explicit solution

algorithm: Iterative Reweighted Least Squares
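The idea behind Iterative Reweighted Least Squares can be illustrated on the simplest possible case: an intercept only, with no functional covariate and no penalty. This is a deliberately reduced sketch and not the authors' implementation; the function name `irls_quantile`, the starting value, the fixed iteration count and the safeguard `eps` are all assumptions. Each step rewrites l_α(r) as w·r² with the weight w frozen at the current residual, then solves the resulting weighted least-squares problem in closed form.

```python
def irls_quantile(ys, alpha: float, n_iter: int = 100, eps: float = 1e-10) -> float:
    """Minimise sum_i l_alpha(y_i - c) over c by iteratively reweighted
    least squares, with l_alpha(u) = |u| + (2 * alpha - 1) * u."""
    c = sum(ys) / len(ys)  # start from the ordinary least-squares solution (the mean)
    for _ in range(n_iter):
        num = den = 0.0
        for y in ys:
            r = y - c
            # l_alpha(r) = slope * |r|, with slope 2*alpha for r > 0
            # and 2*(1 - alpha) otherwise, hence l_alpha(r) = w * r**2
            # with w = slope / |r| (eps avoids division by zero).
            slope = 2.0 * alpha if r > 0 else 2.0 * (1.0 - alpha)
            w = slope / max(abs(r), eps)
            num += w * y
            den += w
        c = num / den  # weighted least-squares update for the intercept
    return c


ys = [1.0, 2.0, 3.0, 4.0, 100.0]
c_median = irls_quantile(ys, 0.5)
c_q90 = irls_quantile(ys, 0.9)
```

On this sample the iterations approach the median 3 for α = 0.5 and the value 100 for α = 0.9. In the penalized spline problem the same reweighting would apply, with each step solving a penalized weighted least-squares problem in (c, θ) instead of a scalar average.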

slide-34
SLIDE 34

Multiple conditional quantiles

v covariates $X^1, \ldots, X^v$

model:

$$g_\alpha(X^1, \ldots, X^v) = c + \int_I \Psi^1_\alpha(t) X^1(t)\, dt + \ldots + \int_I \Psi^v_\alpha(t) X^v(t)\, dt$$

algorithm: backfitting + Iterative Reweighted Least Squares
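Backfitting cycles over the covariates, refitting one component at a time on the partial residuals left by the others. The sketch below shows only this cycling structure, under strong simplifications that are assumptions and not the authors' method: the covariates are scalars rather than curves, and each inner fit is an ordinary least-squares fit rather than the Iterative Reweighted Least Squares step for l_α. The name `backfit` and the toy data are hypothetical.

```python
def backfit(xs, y, n_sweeps: int = 5000, tol: float = 1e-12):
    """Fit y ~ c + sum_j b[j] * xs[j] by backfitting.

    xs: list of v covariates, each a list of n scalars; y: list of n responses.
    Each inner update is a one-dimensional least-squares fit on partial residuals.
    """
    n, v = len(y), len(xs)
    c, b = 0.0, [0.0] * v
    for _ in range(n_sweeps):
        # Intercept update given the current slopes.
        new_c = sum(y[i] - sum(b[l] * xs[l][i] for l in range(v)) for i in range(n)) / n
        change = abs(new_c - c)
        c = new_c
        for j in range(v):
            # Partial residuals: remove the intercept and all other components.
            r = [y[i] - c - sum(b[l] * xs[l][i] for l in range(v) if l != j)
                 for i in range(n)]
            new_bj = sum(xs[j][i] * r[i] for i in range(n)) / sum(x * x for x in xs[j])
            change = max(change, abs(new_bj - b[j]))
            b[j] = new_bj
        if change < tol:  # stop once a full sweep no longer moves the estimates
            break
    return c, b


x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * u - w for u, w in zip(x1, x2)]  # exact model: c = 1, b = (2, -1)
c_hat, b_hat = backfit([x1, x2], y)
```

On this noiseless toy data the sweeps recover c = 1 and b = (2, −1) up to numerical tolerance. In the functional setting each inner update would be replaced by the penalized spline quantile fit for one covariate.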

slide-38
SLIDE 38

Application to the pollution data

learning sample: $(X_{l_i}, Y_{l_i})_{i=1,\ldots,n_{learn}}$

test sample: $(X_{t_i}, Y_{t_i})_{i=1,\ldots,n_{test}}$

number of knots: k = 8 (equispaced)

degree of spline functions: q = 3

order of derivation in the penalization: m = 2

choice of ρ: Generalized Cross Validation

slide-42
SLIDE 42

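Quality criteria of the models

$$C_1 = \frac{\frac{1}{n_t} \sum_{i=1}^{n_t} (Y_{t_i} - \hat{Y}_{t_i})^2}{\frac{1}{n_t} \sum_{i=1}^{n_t} (Y_{t_i} - \bar{Y}_l)^2}$$

$$C_2 = \frac{1}{n_t} \sum_{i=1}^{n_t} |Y_{t_i} - \hat{Y}_{t_i}|$$

$$C_3 = \frac{\frac{1}{n_t} \sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - \hat{Y}_{t_i})}{\frac{1}{n_t} \sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - q_\alpha(Y_l))}$$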
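The three criteria are straightforward to compute once predictions are available. The sketch below reads the symbol with subscript l in C1 as the mean of the learning sample and q_α(Y_l) as the empirical α-quantile of the learning sample; both readings, the order-statistic definition of the empirical quantile, and all function names are assumptions made for illustration.

```python
import math


def l_alpha(u: float, alpha: float) -> float:
    """Loss l_alpha(u) = |u| + (2 * alpha - 1) * u."""
    return abs(u) + (2.0 * alpha - 1.0) * u


def empirical_quantile(ys, alpha: float) -> float:
    """Order statistic y_(j) with j = ceil(alpha * n) (one common convention)."""
    s = sorted(ys)
    j = max(1, math.ceil(alpha * len(s)))
    return s[j - 1]


def criteria(y_test, y_pred, y_learn, alpha: float):
    """Return (C1, C2, C3) for test responses, predictions and the learning sample."""
    nt = len(y_test)
    mean_l = sum(y_learn) / len(y_learn)
    q_l = empirical_quantile(y_learn, alpha)
    # C1: squared error relative to predicting the learning-sample mean.
    c1 = (sum((y - p) ** 2 for y, p in zip(y_test, y_pred)) / nt) / (
        sum((y - mean_l) ** 2 for y in y_test) / nt
    )
    # C2: mean absolute error.
    c2 = sum(abs(y - p) for y, p in zip(y_test, y_pred)) / nt
    # C3: quantile loss relative to predicting the learning-sample quantile.
    c3 = (sum(l_alpha(y - p, alpha) for y, p in zip(y_test, y_pred)) / nt) / (
        sum(l_alpha(y - q_l, alpha) for y in y_test) / nt
    )
    return c1, c2, c3


c1, c2, c3 = criteria(
    y_test=[2.0, 4.0, 6.0],
    y_pred=[3.0, 4.0, 5.0],
    y_learn=[1.0, 2.0, 3.0, 4.0, 5.0],
    alpha=0.5,
)
```

For this toy input C1 = 2/11, C2 = 2/3 and C3 = 0.4. Values of C1 and C3 below 1 indicate that the model does better on the test sample than the constant predictor built from the learning sample.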

slide-44
SLIDE 44

Results (conditional median)

Models          Variables             C1       C2        C3
1 covariate     N2                    0.814    16.916    0.906
                O3                    0.414    12.246    0.656
                WS                    0.802    16.836    0.902
2 covariates    O3, NO                0.413    11.997    0.643
                O3, N2                0.413    11.880    0.637
                O3, WS                0.414    12.004    0.635
3 covariates    O3, NO, N2            0.412    12.127    0.644
                O3, N2, WD            0.409    12.004    0.645
                O3, N2, WS            0.410    11.997    0.642
4 covariates    O3, NO, N2, WS        0.400    11.718    0.634
5 covariates    O3, NO, N2, WD, WS    0.401    11.750    0.639

slide-46
SLIDE 46

Forecasting (conditional median)

Predicted maximum of Ozone versus measured maximum of Ozone (covariates: O3, NO, N2, WS):

[Figure: scatter plot of predicted against measured values, axes labelled $Y_{t_i}$ and $\hat{Y}_{t_i}$; axis ticks run from 40 to 180 and from 60 to 160.]

slide-47
SLIDE 47

Conclusion

satisfactory predictions

improvements: use of other covariates (temperature, . . . )

outlook: model where $X_i$ is not observed (we observe $W_i = X_i + \delta_i$)