SLIDE 1

Linear predictive functional model on environmental data: case of chlorophyll-a oceanographic profiles

Séverine Bayle (1), Pascal Monestiez (1), David Nerini (2)

(1) INRA, UR 546 Biostatistics and Spatial Processes (BioSP), F-84914 AVIGNON.
(2) Mediterranean Institute of Oceanography (MIO) - UMR 7294, Pytheas Institute (OSU), Aix-Marseille University, Campus de Luminy, Case 901, 13288 MARSEILLE Cedex 09.

7th Days of functional statistics, Montpellier, June 28-29, 2012

Séverine Bayle (INRA) 1 / 26

SLIDE 2

Plan of the talk

1. Introduction
2. Methodology
3. Results
4. Conclusion

SLIDE 3

Introduction

Context and purpose of the study

Physical data (profiles) collected within the framework of the ANR project IPSOS-SEAL between October 2009 and January 2010 in the Southern Ocean around the Kerguelen Islands:

  • Chlorophyll-a (Chl-a): CTD-Fluo and Argos devices
  • Brightness: TDR + GPS devices

SLIDE 4

Introduction

Capturing elephant seals for installing devices

SLIDE 11

Introduction

Elephant seal dataset

[Figure: two panels, light (W/m²) against depth (m) and Chl-a (mg/l) against depth (m)]

SLIDE 16

Introduction

Context and purpose of the study

  • Primary productivity: production of plant matter
  • Photosynthesis: made possible by the Chl-a content of oceanic phytoplankton
    → Vital link between living and inorganic stocks of carbon
  • The Chl-a concentration measured throughout the water column in the Southern Ocean is used as an indicator of the amount of phytoplankton and reveals the distribution of primary productivity
  • Few Chl-a profiles are recorded: devices that record fluorescence are energy-intensive
  • But many brightness profiles are available
  • Idea: reconstruct Chl-a profiles from brightness profiles

SLIDE 20

Introduction

Context and purpose of the study

  • To calibrate the relationship between the two kinds of profiles, only profiles collected during the day were kept
  • For more accurate estimation and smoothing, only Chl-a profiles with 18 observations recorded every 10 meters between -5 and -175 meters were kept (407 profiles selected)
  • Selection of Chl-a and brightness profiles collected at the same time: 208 profiles altogether
  • One Chl-a profile is reconstructed for each of the 208 pairs
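The selection rules above can be sketched as a simple filter. This is only an illustration: the function name `keep_profile` and its arguments (`depths`, `values`, `recorded_by_day`) are hypothetical and do not come from the talk; only the depth grid (18 depths, every 10 m from -5 m to -175 m) is taken from the slide.

```python
import numpy as np

# Depth grid from the slide: 18 depths, one every 10 m from -5 m down to -175 m.
TARGET_DEPTHS = np.arange(-5, -176, -10)

def keep_profile(depths, values, recorded_by_day):
    """Return True if a profile passes the selection rules described above.

    All argument names are hypothetical; they stand for the sampling depths (m),
    the measured values, and a day/night flag attached to one profile.
    """
    if not recorded_by_day:
        return False  # night profiles are discarded
    depths = np.asarray(depths, dtype=float)
    values = np.asarray(values, dtype=float)
    if depths.shape != TARGET_DEPTHS.shape or values.shape != TARGET_DEPTHS.shape:
        return False  # wrong number of observations
    # The profile must sit exactly on the target grid and have no missing value.
    on_grid = np.array_equal(np.sort(depths)[::-1], TARGET_DEPTHS)
    return bool(on_grid and np.all(np.isfinite(values)))
```

Pairing Chl-a and brightness profiles recorded at the same time would be a further step on top of this filter and is not shown here.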

SLIDE 23

Methodology

Functional data analysis

Chl-a and brightness functional profiles can be considered as curves:

$$z^c_i(t) = y_i(t) + \epsilon_i(t), \qquad z^b_i(s) = x_i(s) + \epsilon_i(s)$$

Modeling these functional profiles requires defining basis functions $\phi_k$, $k = 1, \dots, K$.

Functional profiles are defined as linear combinations of these basis functions:

$$y_i(t) = \sum_{k=1}^{K} c_{ik}\,\phi_k(t), \qquad x_i(s) = \sum_{k=1}^{K} d_{ik}\,\phi_k(s)$$

  • $c_1, \dots, c_K$ and $d_1, \dots, d_K$: expansion coefficients
  • $\phi_1, \phi_2, \dots, \phi_K$: basis functions

SLIDE 27

Methodology

Functional data analysis

Reconstruct the functional profiles $y$ and $x$ from the data $(t, z^c_i)$ and $(s, z^b_i)$, $i = 1, \dots, n$.

Use of 10 spline basis functions of order 4, fitted by minimizing the penalized criterion

$$\frac{1}{n} \sum_{i=1}^{n} \bigl(x(t_i) - y(t_i)\bigr)^2 + \lambda \int \bigl(x''(u)\bigr)^2 \, du$$

$\lambda$: trade-off between the smoothness of the curve and the sum of squared deviations between model and data.

We now work with the spline coefficients $c_k$ and $d_k$.
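The penalized criterion above has a closed-form minimizer once the fitted curve is written as a basis expansion. The following is a sketch under assumed notation that does not appear on the slides: $\mathbf{z}$ stacks the $n$ observed values of one profile, $\boldsymbol{\Phi}$ is the $n \times K$ matrix with entries $\phi_k(t_i)$, $\mathbf{c}$ is the coefficient vector of the fitted curve, and $\mathbf{R}$ is the $K \times K$ matrix with entries $\int \phi_k''(u)\,\phi_l''(u)\,du$.

```latex
\begin{aligned}
J_\lambda(\mathbf{c})
  &= \frac{1}{n}\,\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{c} \rVert^2
     + \lambda\, \mathbf{c}^\top \mathbf{R}\, \mathbf{c}, \\
\nabla J_\lambda(\mathbf{c}) = \mathbf{0}
  &\iff \bigl( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} + n\lambda\, \mathbf{R} \bigr)\, \mathbf{c}
     = \boldsymbol{\Phi}^\top \mathbf{z}, \\
\hat{\mathbf{c}}
  &= \bigl( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} + n\lambda\, \mathbf{R} \bigr)^{-1}
     \boldsymbol{\Phi}^\top \mathbf{z}.
\end{aligned}
```

With $\lambda = 0$ this reduces to ordinary least squares on the basis; as $\lambda$ grows, the second derivative of the fitted curve is pushed toward zero and the fit approaches a straight line.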

SLIDE 28

Methodology

[Figure: Chl-a (µg/l) against depth (m)]

Number of basis functions = number of interior knots + order of the splines
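To make the counting rule concrete, here is a minimal NumPy sketch that evaluates a B-spline basis with the Cox-de Boor recursion. It assumes a B-spline basis with equally spaced interior knots and boundary knots at -175 m and -5 m; the talk only states "10 splines of order 4", so the knot placement and the function name `bspline_basis` are assumptions made for illustration.

```python
import numpy as np

def bspline_basis(x, interior_knots, lower, upper, order=4):
    """Evaluate all B-spline basis functions of the given order at points x.

    Returns an array of shape (len(x), len(interior_knots) + order).
    """
    x = np.asarray(x, dtype=float)
    # Full knot vector: each boundary knot is repeated `order` times.
    t = np.concatenate([np.full(order, lower),
                        np.sort(interior_knots),
                        np.full(order, upper)]).astype(float)
    # Order-1 B-splines are indicators of the knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval on the right so x == upper is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == upper, last] = 1.0
    # Cox-de Boor recursion: build order k from order k - 1.
    for k in range(2, order + 1):
        B_next = np.zeros((x.size, t.size - k))
        for j in range(t.size - k):
            left = t[j + k - 1] - t[j]
            right = t[j + k] - t[j + 1]
            if left > 0:
                B_next[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (t[j + k] - x) / right * B[:, j + 1]
        B = B_next
    return B

# 18 sampling depths and 6 equally spaced interior knots (assumed placement).
depths = np.arange(-175, 0, 10)
interior = np.linspace(-175, -5, 8)[1:-1]
Phi = bspline_basis(depths, interior, lower=-175, upper=-5, order=4)
# Phi has 6 + 4 = 10 columns, one per basis function.
```

A profile with coefficient vector `c` is then evaluated on the depth grid as `Phi @ c`.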

SLIDE 30

Methodology

Functional linear model

We consider a fully functional linear model.

Assumption: there is a relationship between the derivative of the brightness function and the Chl-a function:

$$y(t) = \alpha(t) + \int \beta(s, t)\, x(s)\, ds + \epsilon(t)$$

  • $y(t)$: reconstructed (or predicted) Chl-a profile
  • $t$ and $s$: depths
  • $x(s)$: derivative of the brightness function
  • $\alpha(t)$: univariate coefficient (functional intercept)
  • $\beta(s, t)$: bivariate coefficient
  • $\epsilon(t)$: functional error

Software: fda package in R
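When both curves are represented by basis coefficients, the model above becomes an ordinary multivariate regression of the Chl-a coefficients on the coefficients of the brightness derivative. The sketch below illustrates that idea with plain least squares. It is not the fda package's implementation: the function names are hypothetical, no roughness penalty is applied to the coefficient surface, and the fitted slope matrix absorbs the inner products of the basis functions rather than being the coefficient matrix of $\beta(s, t)$ itself.

```python
import numpy as np

def fit_coefficient_regression(D, C):
    """Least-squares fit of C ≈ intercept + D @ slope.

    D: (n, K_x) basis coefficients of the predictor curves (assumed input).
    C: (n, K_y) basis coefficients of the response curves (assumed input).
    """
    D = np.asarray(D, dtype=float)
    C = np.asarray(C, dtype=float)
    design = np.column_stack([np.ones(D.shape[0]), D])  # column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, C, rcond=None)
    return coef[0], coef[1:]  # intercept (K_y,), slope (K_x, K_y)

def predict_coefficients(D_new, intercept, slope):
    """Predicted response coefficients for new predictor coefficients."""
    return intercept + np.asarray(D_new, dtype=float) @ slope
```

Predicted curves are obtained by multiplying the predicted coefficients by the basis functions evaluated on the depth grid.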

SLIDE 31

Results

Chl-a functional profiles well predicted...

[Figure: four panels comparing reconstructed and predicted profiles, amount of chlorophyll-a (µg/l) against depth (m)]

SLIDE 32

Results

...But some problems remain!

[Figure: four panels comparing reconstructed and predicted profiles, amount of chlorophyll-a (µg/l) against depth (m)]

SLIDE 35

Results

Cross validation

Is 10 basis functions the optimal number to use? Check by cross validation:

  → Computation for each marine mammal
  → One profile is withdrawn (validation set), and the other profiles form the training set
  → Calculation of the mean squared error
  → Repetition with another validation profile that has not yet been used to validate the model
  → The mean of all mean squared errors is calculated to estimate the prediction error

5 basis functions are enough to minimize the prediction error.
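The leave-one-out procedure described above can be sketched as follows. The regression step is a plain least-squares fit on basis coefficients, used here as a stand-in for the functional linear model; the function name and arguments are hypothetical. Errors are measured on curve values (through the basis matrix `Phi`) rather than on coefficients, so that runs with different numbers of basis functions remain comparable; that choice is an assumption, since the talk does not say how the error was computed.

```python
import numpy as np

def loo_prediction_error(D, C, Phi):
    """Leave-one-out estimate of the prediction error for one animal.

    D:   (n, K_x) predictor coefficients (assumed input).
    C:   (n, K_y) response coefficients (assumed input).
    Phi: (T, K_y) response basis functions evaluated on a depth grid.
    """
    D = np.asarray(D, dtype=float)
    C = np.asarray(C, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    n = D.shape[0]
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # withdraw profile i
        design = np.column_stack([np.ones(n - 1), D[train]])
        coef, *_ = np.linalg.lstsq(design, C[train], rcond=None)
        c_pred = np.concatenate(([1.0], D[i])) @ coef
        # Mean squared error between predicted and held-out curves on the grid.
        errors[i] = np.mean((Phi @ (c_pred - C[i])) ** 2)
    return errors.mean()  # mean of all mean squared errors
```

Running this for several basis sizes and for each animal, then keeping the size with the smallest error, mirrors the comparison between 5 and 10 basis functions.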

SLIDE 37

Results

Comparison of R² between the use of 5 and 10 basis functions

Calculation of R² between measured Chl-a profiles and predicted Chl-a profiles:

$$R_i^2 = \frac{\lVert y_i - \bar{y}_i \rVert^2 - \lVert \hat{y}_i - y_i \rVert^2}{\lVert y_i - \bar{y}_i \rVert^2}$$

Number of basis functions   Elephant seal   Mean R²   Median R²
5                           1st             0.87      0.93
5                           2nd             0.77      0.87
5                           3rd             0.70      0.85
10                          1st             0.87      0.93
10                          2nd             0.78      0.86
10                          3rd             0.70      0.84
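The R² formula above can be computed per profile as in the sketch below. Two assumptions are made that the slide does not spell out: $\bar{y}_i$ is taken to be the mean of profile $i$ over depth, and the squared norms are approximated by sums over a regular depth grid (the grid spacing cancels in the ratio). The function name is hypothetical.

```python
import numpy as np

def profile_r2(y, y_hat):
    """R² between one measured profile y and its prediction y_hat.

    Both inputs are values on the same regular depth grid.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    total = np.sum((y - y.mean()) ** 2)   # ||y_i - mean(y_i)||^2
    residual = np.sum((y_hat - y) ** 2)   # ||y_hat_i - y_i||^2
    return (total - residual) / total

# Example with a made-up profile: a perfect prediction gives 1,
# predicting the profile mean everywhere gives 0.
y = np.array([2.0, 1.5, 1.0, 0.5])
r2_perfect = profile_r2(y, y)
r2_mean = profile_r2(y, np.full(4, y.mean()))
```

The mean and median in the table would then be `np.mean` and `np.median` of these values over all profiles of one animal.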

SLIDE 38

Results

Characterization of fine-scale variations (one day)

  • Only one Chl-a functional profile
  • 21 profiles predicted from 21 brightness functional profiles

→ Highlighting of fine-scale structures

SLIDE 42

Conclusion

Discussion

  • The method is well suited to predicting Chl-a profiles
  • Applicable under similar conditions; make sure that brightness profiles are recorded during the day
  • Choosing the number of basis functions is difficult: cross validation seems to point to a small number
  • Chl-a data required preprocessing (daytime data only), which has a significant influence on the fit

SLIDE 45

Conclusion

Prospects

  • Making use of the many historical records of brightness profiles, which cover a large geographic area and will enable monitoring of phytoplankton production
  • Methodological development (function linmod in R) to integrate several explanatory variables into the model
  • How to account for Chl-a profiles recorded at night? → Interpolation using kriging

SLIDE 46

Conclusion

References

  • Ramsay, J. & Silverman, B. (2005). Functional Data Analysis. Springer, New York.
  • Ramsay, J., Hooker, G. & Graves, S. (2009). Functional Data Analysis with R and MATLAB. Springer, New York.
  • Nerini, D., Monestiez, P. & Manté, C. (2010). Cokriging for spatial functional data. Journal of Multivariate Analysis, 101: 409-418.

SLIDE 47

Thanks for your attention!