
Goodness-of-fit tests for the functional linear model with scalar response with responses missing at random



  1. Goodness-of-fit tests for the functional linear model with scalar response with responses missing at random
     Manuel Febrero-Bande (1), Pedro Galeano (2), Eduardo García-Portugués (2) and Wenceslao González-Manteiga (1)
     (1) Department of Statistics, Mathematical Analysis and Optimization, Universidade de Santiago de Compostela
     (2) Department of Statistics and UC3M-BS Institute of Financial Big Data, Universidad Carlos III de Madrid
     Pedro Galeano, Goodness-of-fit for the FLM with MAR responses, III IWAFDA, 1 / 22

  2. Motivation
     Regression model with a functional covariate and a scalar response:
     ◮ General model: $Y = m(\mathcal{X}) + \varepsilon$, where:
       ⋆ Real response: $Y$, centered.
       ⋆ Functional covariate: $\mathcal{X} \in \mathbb{H}$, centered and with covariance operator $\Gamma$.
       ⋆ Hilbert space: $\mathbb{H}$, the space of square-integrable functions, with inner product $\langle \cdot, \cdot \rangle$ and associated norm $\| \cdot \|$.
       ⋆ Regression operator: $m(x) = \mathbb{E}[Y \mid \mathcal{X} = x]$.
       ⋆ Error random variable: $\varepsilon \sim (0, \sigma^2)$, uncorrelated with $\mathcal{X}$.
     ◮ Interest: Given a random sample $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ from $(\mathcal{X}, Y)$, check whether the regression operator $m$ is linear.
     ◮ Goodness-of-fit tests for linearity:
       ⋆ García-Portugués, González-Manteiga and Febrero-Bande (2014, JCGS).
       ⋆ Cuesta-Albertos, García-Portugués, González-Manteiga and Febrero-Bande (2019, AoS).

  3. Motivation
     Febrero-Bande, Galeano, and González-Manteiga (2019, CSDA):
     ◮ Data set: Data from 73 Spanish weather stations over the period 1980-2009.
     ◮ Functional covariate: Mean curve of the annual average daily temperature.
     ◮ Real response: Average of the total number of hours of sunshine per year.
     ◮ Missing responses: The responses are not observed at 26 of the 73 weather stations (35.62% missing responses).
     ◮ Functional linear model with scalar response (FLMSR): $m(\mathcal{X}) = \langle \mathcal{X}, \beta \rangle$, where $\beta \in \mathbb{H}$ is a functional slope and $\langle \cdot, \cdot \rangle$ is the inner product of $\mathbb{H}$.
     ◮ Two methods for estimating $\beta$ with FPCs:
       1. Simplified method: Delete the pairs with missing responses.
       2. Imputed method: Impute the missing responses before estimation.
     ◮ Results suggest: The imputed method outperforms the simplified method.

  4. Motivation
     Work in progress:
     ◮ Goal: Analyze goodness-of-fit tests for functional regression models when some of the responses are missing at random.
     ◮ Two possibilities:
       1. Delete the pairs with missing responses, then apply goodness-of-fit tests.
       2. Impute the missing responses, then apply goodness-of-fit tests.
     ◮ Question: Which option is better?
     ◮ Today, initial results on:
       ⋆ Model: Functional linear model with scalar response (FLMSR).
       ⋆ Goodness-of-fit test: García-Portugués et al. (2014, JCGS).

  5. The testing problem
     Elements of the testing problem:
     ◮ Problem: Test the linear hypothesis $H_0: m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ versus the alternative hypothesis $H_1: m \notin \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$.
     ◮ Random sample: $\{(\mathcal{X}_i, Y_i, R_i)\}_{i=1}^{n}$ generated from $(\mathcal{X}, Y, R)$, where $R$ is a Bernoulli variable with $R_i = 1$ if $Y_i$ is observed and $R_i = 0$ if $Y_i$ is missing.
     ◮ Missing at random (MAR) mechanism: $P(R = 1 \mid Y, \mathcal{X}) = P(R = 1 \mid \mathcal{X}) = p(\mathcal{X})$, where $p: \mathbb{H} \to [0, 1]$ is an unspecified operator of $\mathcal{X}$.
     ◮ Consequence: Since missingness depends only on the always-observed covariate $\mathcal{X}$, the missing responses can be predicted from the available information.
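To make the MAR setting concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: curves are discretized on an equispaced grid of [0, 1], inner products use a rectangle rule with weight `dt`, the covariates are random combinations of four sine functions, and $p(\mathcal{X})$ is a logistic function of $\langle \mathcal{X}, \gamma_0 \rangle$ with hypothetical constants `a0`, `a1`. The name `simulate_flm_mar` is hypothetical. The key MAR feature is that `p` is computed from `X` alone, never from `Y`.

```python
import numpy as np

def simulate_flm_mar(n, m=101, a0=0.5, a1=2.0, sigma=0.5, seed=0):
    """Simulate {(X_i, Y_i, R_i)} from an FLMSR with MAR responses.

    Hypothetical design: curves are random combinations of 4 sine basis
    functions on a grid of [0, 1]; p(X) is a logistic function of <X, gamma0>.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    dt = t[1] - t[0]                                  # rectangle-rule weight for <f, g>
    basis = np.array([np.sqrt(2) * np.sin(j * np.pi * t) for j in range(1, 5)])
    z = rng.standard_normal((n, 4)) / np.arange(1, 5)  # decreasing component variances
    X = z @ basis                                     # n x m matrix of discretized curves
    beta = 2.0 * basis[0] - basis[1]                  # hypothetical functional slope
    Y = X @ beta * dt + sigma * rng.standard_normal(n)  # Y_i = <X_i, beta> + eps_i
    gamma0 = basis[0]                                 # hypothetical direction driving missingness
    p = 1.0 / (1.0 + np.exp(-(a0 + a1 * (X @ gamma0) * dt)))  # p(X) uses X only (MAR)
    R = rng.binomial(1, p)                            # R_i = 1 if Y_i is observed
    Y_obs = np.where(R == 1, Y, np.nan)               # missing responses stored as NaN
    return t, X, Y_obs, R, p

t, X, Y_obs, R, p = simulate_flm_mar(n=73)
print(f"missing responses: {np.mean(R == 0):.2%}")
```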

  6. Estimation of the FLMSR with MAR responses
     Estimation of $\beta$ with functional principal components (FPCs):
     ◮ The FLMSR: $Y = \langle \mathcal{X}, \beta \rangle + \varepsilon$.
     ◮ Functional slope: $\beta = \sum_{k=1}^{\infty} b_k \psi_k$, where:
       ⋆ $\psi_1, \psi_2, \ldots$ are the eigenfunctions of $\Gamma$ associated with the eigenvalues $\lambda_1 > \lambda_2 > \cdots > 0$.
       ⋆ $b_k = \mathrm{Cov}[Y, S_k] / \lambda_k$, for $k \in \mathbb{N}$.
       ⋆ $S_k = \langle \mathcal{X}, \psi_k \rangle$, for $k \in \mathbb{N}$, are the FPC scores of $\mathcal{X}$.
     ◮ Problem: Estimate $\beta$ from a random sample $\{(\mathcal{X}_i, Y_i, R_i)\}_{i=1}^{n}$.
     ◮ Need:
       ⋆ Estimates of $\psi_1, \psi_2, \ldots$ and $\lambda_1, \lambda_2, \ldots$.
       ⋆ Sample values of the scores $S_1, S_2, \ldots$.
       ⋆ A cutoff to truncate the infinite sum that defines $\beta$.
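A minimal sketch of the FPC machinery on discretized curves, assuming a grid with rectangle-rule weight `dt` (the helper name `fpca` and the sine-basis data are hypothetical). The covariance operator becomes the matrix $K = n^{-1} X_c^\top X_c \, dt$; its eigenvectors are rescaled by $1/\sqrt{dt}$ so each $\hat\psi_k$ has unit $L^2$ norm. In a noiseless example ($Y_i = \langle \mathcal{X}_i, \beta \rangle$), the sample analogue of $\mathrm{Cov}[Y, S_k]/\lambda_k$ reproduces $\langle \beta, \hat\psi_k \rangle$ exactly, because $\hat\Gamma$ is self-adjoint.

```python
import numpy as np

def fpca(X, dt):
    """Discretized FPCA of curves stored as rows of X (n x m grid values).

    Returns eigenvalues (descending), eigenfunctions (rows, unit L2 norm)
    and scores S[i, k] = <X_i - mean, psi_k>, using rectangle-rule weights dt.
    """
    Xc = X - X.mean(axis=0)
    K = (Xc.T @ Xc) / X.shape[0] * dt       # matrix of the sample covariance operator
    lam, V = np.linalg.eigh(K)              # eigh returns ascending order
    lam, V = lam[::-1], V[:, ::-1]
    psi = (V / np.sqrt(dt)).T               # rescale so that sum(psi_k**2) * dt = 1
    S = Xc @ psi.T * dt                     # FPC scores
    return lam, psi, S

# Hypothetical noiseless example: Y_i = <X_i, beta> exactly.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
basis = np.array([np.sqrt(2) * np.sin(j * np.pi * t) for j in range(1, 5)])
X = (rng.standard_normal((200, 4)) / np.arange(1, 5)) @ basis
beta = 2.0 * basis[0] - basis[1]
Y = X @ beta * dt

lam, psi, S = fpca(X, dt)
k = 4
b_hat = (S[:, :k].T @ (Y - Y.mean())) / (len(Y) * lam[:k])  # sample Cov[Y, S_k] / lambda_k
b_true = psi[:k] @ beta * dt                                  # <beta, psi_hat_k>
```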

  7. Estimation of the FLMSR with MAR responses
     Simplified estimation (Febrero-Bande et al., 2019, CSDA):
     ◮ Complete-case analysis: Delete pairs with missing responses.
     ◮ Covariates of complete pairs: $\mathcal{X}_S = \{\mathcal{X}_i : i \in I_S\}$, where $I_S = \{i : R_i = 1\}$.
     ◮ Estimates of $\psi_1, \psi_2, \ldots$ and $\lambda_1, \lambda_2, \ldots$: Eigenfunctions $\hat\psi_{1,S}, \hat\psi_{2,S}, \ldots$ and eigenvalues $\hat\lambda_{1,S} \ge \hat\lambda_{2,S} \ge \cdots$ of the sample covariance operator $\hat\Gamma_{\mathcal{X}_S}$.
     ◮ Sample FPC scores: $\hat S_{i,k,S} = \langle \mathcal{X}_i, \hat\psi_{k,S} \rangle$, for $i \in I_S$ and $k \in \mathbb{N}$.
     ◮ Estimate of $b_k$: $\hat b_{k,S} = \frac{1}{\hat\lambda_{k,S}} \frac{1}{n_S} \sum_{i \in I_S} Y_i \hat S_{i,k,S}$, where $n_S = \# I_S$, for $k \in \mathbb{N}$.
     ◮ Estimate of $\beta$: $\hat\beta_{k_S} = \sum_{k=1}^{k_S} \hat b_{k,S} \hat\psi_{k,S}$, where $k_S$ is a cutoff.
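The simplified (complete-case) estimator can be sketched as follows, under the same grid-and-rectangle-rule assumption. The function names are hypothetical, and the sample covariance operator is centered with the complete-case mean. With noiseless data whose slope lies in the span of the covariates, $\hat\beta_{k_S}$ with $k_S$ equal to the rank of the data recovers $\beta$, which gives a simple sanity check even though roughly 40% of the responses are missing.

```python
import numpy as np

def fpca(X, dt):
    """Eigenvalues (descending), eigenfunctions (rows, unit L2 norm) and
    scores of the discretized sample covariance operator of the rows of X."""
    Xc = X - X.mean(axis=0)
    K = (Xc.T @ Xc) / X.shape[0] * dt
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]
    psi = (V / np.sqrt(dt)).T
    return lam, psi, Xc @ psi.T * dt

def simplified_estimator(X, Y, R, dt, k_S):
    """Complete-case estimator beta_hat_{k_S}, evaluated on the grid."""
    obs = R == 1                            # I_S = {i : R_i = 1}
    X_S, Y_S = X[obs], Y[obs]
    lam, psi, S = fpca(X_S, dt)             # psi_hat_{k,S}, lambda_hat_{k,S}, S_hat_{i,k,S}
    n_S = obs.sum()
    b = S[:, :k_S].T @ (Y_S - Y_S.mean()) / (n_S * lam[:k_S])  # b_hat_{k,S}
    return b @ psi[:k_S]                    # sum_{k <= k_S} b_hat_{k,S} psi_hat_{k,S}

# Hypothetical noiseless data with MAR responses (NaN where R = 0).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
basis = np.array([np.sqrt(2) * np.sin(j * np.pi * t) for j in range(1, 5)])
X = (rng.standard_normal((200, 4)) / np.arange(1, 5)) @ basis
beta = 2.0 * basis[0] - basis[1]
Y = X @ beta * dt
p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * X @ basis[0] * dt)))  # depends on X only
R = rng.binomial(1, p)
Y_obs = np.where(R == 1, Y, np.nan)

beta_hat = simplified_estimator(X, Y_obs, R, dt, k_S=4)
```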

  8. Estimation of the FLMSR with MAR responses
     Imputed estimator (Febrero-Bande et al., 2019, CSDA):
     ◮ Impute missing responses: $\hat Y_{i,k_S} = \langle \mathcal{X}_i, \hat\beta_{k_S} \rangle$, for $i \notin I_S$.
     ◮ New set of responses: $\tilde Y_{i,k_S} = R_i Y_i + (1 - R_i) \hat Y_{i,k_S}$, for $i = 1, \ldots, n$.
     ◮ Covariates of all pairs: $\mathcal{X}_C = \{\mathcal{X}_i : i = 1, \ldots, n\}$.
     ◮ Estimates of $\psi_1, \psi_2, \ldots$ and $\lambda_1, \lambda_2, \ldots$: Eigenfunctions $\hat\psi_{1,C}, \hat\psi_{2,C}, \ldots$ and eigenvalues $\hat\lambda_{1,C} \ge \hat\lambda_{2,C} \ge \cdots$ of the sample covariance operator $\hat\Gamma_{\mathcal{X}_C}$.
     ◮ Sample FPC scores: $\hat S_{i,k,C} = \langle \mathcal{X}_i, \hat\psi_{k,C} \rangle$, for $i = 1, \ldots, n$ and $k \in \mathbb{N}$.
     ◮ Estimate of $b_k$: $\hat b_{k,k_S,C} = \frac{1}{\hat\lambda_{k,C}} \frac{1}{n} \sum_{i=1}^{n} \tilde Y_{i,k_S} \hat S_{i,k,C}$, for $k \in \mathbb{N}$.
     ◮ Estimate of $\beta$: $\hat\beta_{k_S,k_C} = \sum_{k=1}^{k_C} \hat b_{k,k_S,C} \hat\psi_{k,C}$, where $k_C$ is a cutoff.
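The two-stage imputed estimator can be sketched by reusing the complete-case fit to fill in the missing responses and then running FPCA on all $n$ curves. As on the slides, the model is assumed centered, so the imputation $\hat Y_{i,k_S} = \langle \mathcal{X}_i, \hat\beta_{k_S} \rangle$ adds no intercept; the function names and the simulated data are hypothetical. In the noiseless check, imputation is exact and $\hat\beta_{k_S,k_C}$ recovers $\beta$.

```python
import numpy as np

def fpca(X, dt):
    """Eigenvalues (descending), eigenfunctions (rows, unit L2 norm) and
    scores of the discretized sample covariance operator of the rows of X."""
    Xc = X - X.mean(axis=0)
    K = (Xc.T @ Xc) / X.shape[0] * dt
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]
    psi = (V / np.sqrt(dt)).T
    return lam, psi, Xc @ psi.T * dt

def simplified_estimator(X, Y, R, dt, k_S):
    """Complete-case estimator beta_hat_{k_S} on the grid."""
    obs = R == 1
    lam, psi, S = fpca(X[obs], dt)
    Y_S = Y[obs]
    b = S[:, :k_S].T @ (Y_S - Y_S.mean()) / (obs.sum() * lam[:k_S])
    return b @ psi[:k_S]

def imputed_estimator(X, Y, R, dt, k_S, k_C):
    """Imputed estimator beta_hat_{k_S,k_C} on the grid (centered model, no intercept)."""
    beta_S = simplified_estimator(X, Y, R, dt, k_S)
    Y_hat = X @ beta_S * dt                        # hat Y_{i,k_S} = <X_i, beta_hat_{k_S}>
    Y_tilde = np.where(R == 1, Y, Y_hat)           # R_i Y_i + (1 - R_i) hat Y_{i,k_S}
    lam, psi, S = fpca(X, dt)                      # eigen-pairs from all n curves (X_C)
    n = len(Y_tilde)
    b = S[:, :k_C].T @ (Y_tilde - Y_tilde.mean()) / (n * lam[:k_C])  # b_hat_{k,k_S,C}
    return b @ psi[:k_C]

# Hypothetical noiseless data with MAR responses.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
basis = np.array([np.sqrt(2) * np.sin(j * np.pi * t) for j in range(1, 5)])
X = (rng.standard_normal((200, 4)) / np.arange(1, 5)) @ basis
beta = 2.0 * basis[0] - basis[1]
Y = X @ beta * dt
p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * X @ basis[0] * dt)))
R = rng.binomial(1, p)
Y_obs = np.where(R == 1, Y, np.nan)

beta_hat_C = imputed_estimator(X, Y_obs, R, dt, k_S=4, k_C=4)
```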

  9. Estimation of the FLMSR with MAR responses
     Important notes:
     ◮ Selection of cutoffs: Use leave-one-out cross-validation or standard model selection criteria (GCV, AIC, AICc, SIC, SICc, ...).
     ◮ Consequence: The cutoff $k_S$ in $\hat\beta_{k_S}$ may differ from $k_S$ and/or $k_C$ in $\hat\beta_{k_S,k_C}$; e.g., the chosen estimators may be $\hat\beta_2$ and $\hat\beta_{1,3}$, respectively.
     ◮ Two sources of potential improvement:
       1. Principal component estimation: $\hat\beta_{k_S}$ depends on $\hat\psi_{k,S}$ (constructed with $\mathcal{X}_S$), while $\hat\beta_{k_S,k_C}$ depends on $\hat\psi_{k,C}$ (constructed with $\mathcal{X}_C$).
       2. Cutoff selection: $\hat\beta_{k_S,k_C}$ may have a smaller MSEE than $\hat\beta_{k_S}$ if the cutoffs are selected appropriately (see Febrero-Bande et al., 2019, CSDA).
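As one example of the model selection criteria listed above, a GCV-based cutoff can be sketched as follows. The sketch assumes the FPC fit is a projection onto the first $k$ score vectors with $\hat\psi_k$ treated as fixed, so the hat matrix has trace $k$ (the degree of freedom spent on centering is ignored); this form of GCV is an assumption, not necessarily the one used in the talk. `gcv_cutoff` is a hypothetical name: applied to the complete cases it would give $k_S$, and applied to $(\mathcal{X}_C, \tilde Y)$ it would give $k_C$. Because the scores are orthogonal, $\hat b_k$ coincides with least squares on the scores, so the residual sum of squares cannot increase with $k$.

```python
import numpy as np

def fpca(X, dt):
    """Eigenvalues (descending), eigenfunctions (rows, unit L2 norm) and
    scores of the discretized sample covariance operator of the rows of X."""
    Xc = X - X.mean(axis=0)
    K = (Xc.T @ Xc) / X.shape[0] * dt
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]
    psi = (V / np.sqrt(dt)).T
    return lam, psi, Xc @ psi.T * dt

def gcv_cutoff(X, Y, dt, k_max):
    """Pick the FPC cutoff minimizing GCV(k) = (RSS_k / n) / (1 - k / n)^2."""
    n = len(Y)
    lam, psi, S = fpca(X, dt)
    Yc = Y - Y.mean()
    gcv, rss = [], []
    for k in range(1, k_max + 1):
        b = S[:, :k].T @ Yc / (n * lam[:k])        # b_hat_k for the first k components
        res = Yc - S[:, :k] @ b                     # residuals of the truncated fit
        rss.append(res @ res)
        gcv.append(rss[-1] / n / (1.0 - k / n) ** 2)
    gcv = np.array(gcv)
    return int(np.argmin(gcv)) + 1, gcv, np.array(rss)

# Hypothetical data: 8 sine components, noisy linear response.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
basis = np.array([np.sqrt(2) * np.sin(j * np.pi * t) for j in range(1, 9)])
X = (rng.standard_normal((150, 8)) / np.arange(1, 9)) @ basis
Y = X @ (2.0 * basis[0] - basis[1]) * dt + 0.3 * rng.standard_normal(150)

k_hat, gcv, rss = gcv_cutoff(X, Y, dt, k_max=6)
```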

  10. Testing linearity with MAR responses
     A Cramér-von Mises testing procedure (I):
     ◮ García-Portugués et al. (2014, JCGS): The following statements are equivalent:
       1. $m(x) = \langle x, \beta \rangle$ for all $x \in \mathbb{H}$.
       2. $\mathbb{E}\left[(Y - \langle \mathcal{X}, \beta \rangle)\, \mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\right] = 0$ for a.e. $u \in \mathbb{R}$ and for all $\gamma \in \mathbb{S}_{\mathbb{H}}$, where $\mathbb{S}_{\mathbb{H}} = \{\gamma \in \mathbb{H} : \|\gamma\| = 1\}$.
     ◮ Estimate of $\beta$: $\hat\beta$, which may be $\hat\beta_{k_S}$, $\hat\beta_{k_S,k_C}$ or some other estimator.
     ◮ Residuals: $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \hat\beta \rangle$, for $i \in I_S = \{i : R_i = 1\}$.
     ◮ Therefore: Residuals are available only for the observed responses.
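A small sketch of the residual step, assuming discretized curves with rectangle-rule weight `dt` and a slope estimate `beta_hat` given on the same grid (`observed_residuals` is a hypothetical name). Residuals exist only for $i \in I_S$; storing zeros for the missing ones makes the products $R_i \hat\varepsilon_i$ that enter the test statistic well defined.

```python
import numpy as np

def observed_residuals(X, Y, R, beta_hat, dt):
    """eps_i = Y_i - <X_i, beta_hat> for i in I_S, and 0 where R_i = 0,
    so that R_i * eps_i is well defined for every i (Y may be NaN if missing)."""
    fitted = X @ beta_hat * dt                  # <X_i, beta_hat> via the rectangle rule
    return np.where(R == 1, Y - fitted, 0.0)

# Toy example: two curves on a 3-point grid, second response missing.
X = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
Y = np.array([4.0, np.nan])
R = np.array([1, 0])
eps = observed_residuals(X, Y, R, beta_hat=np.ones(3), dt=0.5)
```

Here the fitted value for the first curve is $(1 + 2 + 3) \cdot 0.5 = 3$, so its residual is 1, while the missing response contributes 0.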

  11. Testing linearity with MAR responses
     A Cramér-von Mises testing procedure (II):
     ◮ Residual marked empirical process based on projections: $R_n(\hat\beta, u, \gamma) = n^{-1/2} \sum_{i=1}^{n} R_i \hat\varepsilon_i \mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}}$, where $u \in \mathbb{R}$ and $\gamma \in \mathbb{S}_{\mathbb{H}}$.
     ◮ CvM statistic: Measure the deviation of $\{(\mathcal{X}_i, Y_i, R_i)\}_{i=1}^{n}$ from $H_0$ with
       $\mathrm{PCvM}(\hat\beta) = \int_{\mathbb{R} \times \mathbb{S}_{\mathbb{H}}} R_n(\hat\beta, u, \gamma)^2 \, F_{n,\gamma}(du) \, \omega(d\gamma)$,
       where $F_{n,\gamma}$ is the ECDF of $\{\langle \mathcal{X}_i, \gamma \rangle : i = 1, \ldots, n\}$ and $\omega$ is a measure on $\mathbb{S}_{\mathbb{H}}$.
     ◮ Unfortunately: Computing $\mathrm{PCvM}(\hat\beta)$ is not feasible because $\mathbb{S}_{\mathbb{H}}$ is infinite-dimensional.
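For intuition only, the sketch below replaces the measure $\omega$ by an equal-weight average over a finite, user-supplied set of directions. This is an assumption of the sketch, not the approach of the talk. The inner integral can be computed exactly: $F_{n,\gamma}$ puts mass $1/n$ on each projection $\langle \mathcal{X}_l, \gamma \rangle$, so it is the average of $R_n(\hat\beta, u, \gamma)^2$ over those $n$ points. `pcvm_finite_directions` is a hypothetical name, curves are discretized with weight `dt`, and `eps_R` holds $R_i \hat\varepsilon_i$ (zero for missing responses).

```python
import numpy as np

def pcvm_finite_directions(X, eps_R, directions, dt):
    """Approximate PCvM by averaging over a finite set of directions (equal weights).

    eps_R holds R_i * eps_i (0 for missing responses). For each direction gamma,
    the integral w.r.t. F_{n,gamma} is an average over the n projections
    u_l = <X_l, gamma>, l = 1, ..., n.
    """
    n = X.shape[0]
    total = 0.0
    for g in directions:
        g = g / np.sqrt(np.sum(g ** 2) * dt)       # unit L2 norm: gamma in S_H
        proj = X @ g * dt                          # <X_i, gamma>, i = 1..n
        ind = (proj[:, None] <= proj[None, :]).astype(float)  # 1{<X_i,g> <= <X_l,g>}
        R_n = eps_R @ ind / np.sqrt(n)             # R_n(beta_hat, u_l, gamma) for each l
        total += np.mean(R_n ** 2)                 # integral w.r.t. F_{n,gamma}
    return total / len(directions)

# Toy example: two "curves" on a 2-point grid and a single direction.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
eps_R = np.array([1.0, 2.0])
stat = pcvm_finite_directions(X, eps_R, [np.array([1.0, 0.0])], dt=1.0)  # approximately 3.25
```

In the toy case the projections are 1 and 0, so $R_n$ equals $3/\sqrt{2}$ at $u = 1$ and $2/\sqrt{2}$ at $u = 0$, giving $(4.5 + 2)/2 = 3.25$.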

  12. Testing linearity with MAR responses
     A Cramér-von Mises testing procedure (III):
     ◮ Idea: Replace $\gamma \in \mathbb{S}_{\mathbb{H}}$ in PCvM with:
       ⋆ Simplified estimator: $\hat\gamma_{k_S} = \sum_{k=1}^{k_S} \langle \gamma, \hat\psi_{k,S} \rangle \hat\psi_{k,S}$, where $\gamma \in \mathbb{S}_{\mathbb{H}}$.
       ⋆ Imputed estimator: $\hat\gamma_{k_S,k_C} = \sum_{k=1}^{k_C} \langle \gamma, \hat\psi_{k,C} \rangle \hat\psi_{k,C}$, where $\gamma \in \mathbb{S}_{\mathbb{H}}$.
     ◮ Modified CvM statistic:
       $\mathrm{MPCvM}(\hat\beta) = \int_{\mathbb{R} \times \mathbb{S}^{k}_{\mathbb{H}}} R_n(\hat\beta, u, \hat\gamma_k)^2 \, F_{n,\hat\gamma_k}(du) \, \omega(d\hat\gamma_k)$,
       where $k$ is either $k_S$ or $k_C$, and $F_{n,\hat\gamma_k}$ is the ECDF of $\{\langle \mathcal{X}_i, \hat\gamma_k \rangle : i = 1, \ldots, n\}$.
     ◮ Simpler expression: After some algebra, it is possible to show that
       $\mathrm{MPCvM}(\hat\beta) = n^{-2} \hat\varepsilon_S' A \hat\varepsilon_S$,
       where $\hat\varepsilon_S$ is the vector of residuals and $A$ is a certain square symmetric matrix.
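The slides do not give $A$ explicitly: it depends on the weight measure $\omega$ and its closed form is derived in García-Portugués et al. (2014, JCGS). The sketch below only illustrates why a quadratic form appears. It replaces $\omega$ by an equal-weight average over $D$ random unit directions in the span of the first $k$ estimated eigenfunctions, a Monte Carlo stand-in that is an assumption of this sketch. Expanding the square gives $n^{-2} \hat\varepsilon_S' A \hat\varepsilon_S$ with $A_{ij} = D^{-1} \sum_{d} \sum_{l} \mathbb{1}_{\{P_{id} \le P_{ld}\}} \mathbb{1}_{\{P_{jd} \le P_{ld}\}}$ restricted to $i, j \in I_S$, where $P_{id} = \langle \mathcal{X}_i, \hat\gamma_d \rangle$. Because the $\hat\psi_k$ are orthonormal, these projections are linear combinations of the FPC scores; using centered scores only shifts all projections by a constant, which leaves the indicators unchanged. `mpcvm_quadratic` is a hypothetical name.

```python
import numpy as np

def mpcvm_quadratic(S_k, eps_S, obs, n_dirs=200, seed=0):
    """Monte Carlo stand-in for MPCvM = n^{-2} eps_S' A eps_S.

    S_k: n x k matrix of FPC scores (all n curves). Directions are
    gamma_hat = sum_j c_j psi_hat_j with c uniform on the unit sphere of R^k,
    so <X_i, gamma_hat> = S_k[i] @ c (up to a common shift).
    obs: boolean mask of I_S; eps_S: residuals for i in I_S.
    """
    rng = np.random.default_rng(seed)
    n, k = S_k.shape
    A = np.zeros((n, n))
    for _ in range(n_dirs):
        c = rng.standard_normal(k)
        c /= np.linalg.norm(c)                       # unit direction in span(psi_hat)
        proj = S_k @ c
        ind = (proj[:, None] <= proj[None, :]).astype(float)
        A += ind @ ind.T                             # sum_l 1{P_i <= P_l} 1{P_j <= P_l}
    A /= n_dirs
    A_S = A[np.ix_(obs, obs)]                        # rows/columns with R_i = 1
    return eps_S @ A_S @ eps_S / n ** 2, A_S

# Hypothetical usage with random scores and a random missingness pattern.
rng = np.random.default_rng(4)
S_k = rng.standard_normal((30, 3))
obs = rng.random(30) < 0.65
eps_S = rng.standard_normal(obs.sum())
stat, A_S = mpcvm_quadratic(S_k, eps_S, obs)
```

Note that $n^{-2}$ uses the full sample size $n$, because the ECDF $F_{n,\hat\gamma_k}$ is built from all $n$ curves, including those with missing responses.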
