Data driven tests for homoscedastic linear regression model

  1. Data driven tests for homoscedastic linear regression model

Tadeusz Inglot and Teresa Ledwina

Model and null hypothesis

Z = (X, Y) is a random vector in [0, 1] × R, with X and ε independent and the distributions of X and ε unknown; X ~ g, ε ~ f, E_f ε = 0, τ = E_f ε² ∈ (0, ∞). The null hypothesis is

H₀ : Y = β[v(X)]^T + ε, β ∈ R^q,

where v(x) = (v₁(x), ..., v_q(x)) is a given vector of bounded functions.

Overfitting. Auxiliary models M(k), k = 1, 2, ...:

Y = θ[u(x)]^T + β[v(x)]^T + ε, θ ∈ R^k,

where u(x) = (u₁(x), ..., u_k(x)) is a vector of bounded functions, linearly independent of v(x).

Auxiliary solution

Given k and M(k), we construct the efficient score statistic for H₀(k) : θ = 0, with nuisance parameter η = (β, √g, √f). Here ℓ* is the k-dimensional efficient score vector, i.e. the residuals from projections [derived under H₀(k)] of the scores for the parameters of interest [θ₁, ..., θ_k] onto the scores for the nuisance parameters [η]; Neyman (1959). Set

W_k(η) = [ (1/√n) Σ_{i=1}^n ℓ*(Z_i; η) ] L [ (1/√n) Σ_{i=1}^n ℓ*(Z_i; η) ]^T,

where L = { E_η [ℓ*(Z; η)]^T [ℓ*(Z; η)] }^{−1}. Since E_η ℓ*(Z; η) = 0, it holds that W_k(η) →_D χ²_k.
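As a concrete illustration, the statistic can be sketched numerically in a strongly simplified setting. This is not the paper's full construction: the sketch below assumes Gaussian errors, estimates β by ordinary least squares under H₀, and takes the estimated efficient score for θ to be the residual times the part of u(X) orthogonal to the span of v(X). The function name and the cosine basis in the example are illustrative only.

```python
import numpy as np

def efficient_score_stat(X, Y, v_funcs, u_funcs):
    """Simplified score statistic W_k: Gaussian errors assumed,
    beta fitted by OLS under H0, scores for theta taken as the
    residual times u(X) projected off the span of v(X)."""
    n = len(X)
    V = np.column_stack([f(X) for f in v_funcs])   # n x q design for v(x)
    U = np.column_stack([f(X) for f in u_funcs])   # n x k design for u(x)
    beta_hat, *_ = np.linalg.lstsq(V, Y, rcond=None)
    eps_hat = Y - V @ beta_hat                      # residuals under H0
    tau_hat = np.mean(eps_hat ** 2)                 # variance estimate
    # residuals of u(X) after projection onto span of v(X)
    U_res = U - V @ np.linalg.lstsq(V, U, rcond=None)[0]
    scores = U_res * eps_hat[:, None] / tau_hat     # estimated efficient scores
    S = scores.sum(axis=0) / np.sqrt(n)
    L_hat = np.linalg.inv(scores.T @ scores / n)    # estimator of L
    return S @ L_hat @ S                            # quadratic form, >= 0

# quick check under the null model Y = 1 + 2X + eps
rng = np.random.default_rng(0)
X = rng.uniform(size=300)
Y = 1 + 2 * X + rng.normal(scale=0.5, size=300)
v = [lambda x: np.ones_like(x), lambda x: x]
u = [lambda x, j=j: np.cos((j + 1) * np.pi * x) for j in range(1, 4)]
print(round(efficient_score_stat(X, Y, v, u), 2))
```

Under H₀ this quadratic form should behave roughly like a χ²₃ draw for the three auxiliary directions used here.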

  2. Define

W_k(η̂) = [ (1/√n) Σ_{i=1}^n ℓ̂*(Z_i; η̂) ] L̂ [ (1/√n) Σ_{i=1}^n ℓ̂*(Z_i; η̂) ]^T,

where ℓ̂*(·; η̂) is an estimator of ℓ*(·; η), while L̂ is an estimator of L.

Theorem 1. Assume the null hypothesis H₀(k) : θ = 0 is true and some mild extra assumptions are fulfilled. Suppose that L̂ is a consistent estimator of L and the estimator ℓ̂*(·; η̂) satisfies the condition

P_η( ‖ Σ_{i=1}^n [ℓ̂*(Z_i; η̂) − ℓ*(Z_i; η)] ‖ ≥ δ√n ) → 0

for every δ > 0, as n → ∞. Then for the test statistic W_k(η̂) it holds that W_k(η̂) →_D χ²_k, as n → ∞.

A class of estimators is proposed for which Theorem 1 holds.

Selecting k in W_k(η̂)

Score-based rule mimicking Schwarz's BIC:

S1 = min{ 1 ≤ k ≤ d : W_k(η̂) − k log n ≥ W_s(η̂) − s log n, s = 1, ..., d }.

Score-based rule imitating Akaike's AIC:

A1 = min{ 1 ≤ k ≤ d : W_k(η̂) − 2k ≥ W_s(η̂) − 2s, s = 1, ..., d }.

Refined score-based selection rule [Inglot and Ledwina (2006c)]. Set

(𝒴₁, ..., 𝒴_k) = [ n^{−1/2} Σ_{i=1}^n ℓ̂*(Z_i; η̂) ] L̂^{1/2},

so that W_k(η̂) = ‖(𝒴₁, ..., 𝒴_k)‖². Define the new penalty

π(s, p) = s log n, if max_{1≤t≤d} |𝒴_t| ≤ √(p log n),
π(s, p) = 2s, if max_{1≤t≤d} |𝒴_t| > √(p log n),

where p is some fixed positive number. Then the refined selection rule is given by

T1 = min{ 1 ≤ k ≤ d : W_k(η̂) − π(k, p) ≥ W_s(η̂) − π(s, p), s = 1, ..., d }.
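The three selection rules differ only in their penalties, so given the vector (W₁(η̂), ..., W_d(η̂)) they can be sketched in a few lines. The function name `select_k`, and passing in the component maximum max_t |𝒴_t| directly as `ymax`, are illustrative choices, not the authors' code.

```python
import numpy as np

def select_k(W, n, rule="S1", ymax=None, p=2.4):
    """Score-based dimension selection for W = (W_1, ..., W_d).
    'S1' uses the BIC-type penalty k*log(n), 'A1' the AIC-type 2k,
    and 'T1' switches: BIC-type penalty unless some component
    |Y_t| exceeds sqrt(p*log(n)), in which case the AIC-type 2k."""
    d = len(W)
    ks = np.arange(1, d + 1)
    if rule == "S1":
        pen = ks * np.log(n)
    elif rule == "A1":
        pen = 2.0 * ks
    else:  # "T1"
        small_components = ymax <= np.sqrt(p * np.log(n))
        pen = ks * np.log(n) if small_components else 2.0 * ks
    # smallest k maximizing the penalized statistic
    return int(np.argmax(np.asarray(W) - pen)) + 1

W = np.array([4.0, 9.5, 12.0, 12.3])
print(select_k(W, n=300, rule="S1"))            # -> 1 (heavy BIC-type penalty)
print(select_k(W, n=300, rule="A1"))            # -> 3 (lighter AIC-type penalty)
print(select_k(W, n=300, rule="T1", ymax=1.0))  # -> 1 (small components: acts like S1)
```

With log 300 ≈ 5.70, the BIC-type penalty dominates the gains beyond k = 1, while the AIC-type penalty lets the larger statistic at k = 3 win; T1 interpolates between the two regimes via the size of the components.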

  3. Data driven score test statistics: W_{S1}(η̂), W_{T1}(η̂).

Asymptotic behaviour under H₀

For simplicity we assumed that d, the number of models on the list, does not depend on n.

Theorem 2. Under the null hypothesis H₀ : Y = β[v(X)]^T + ε, the assumptions of Theorem 1 and n → ∞, it holds that

P_η(S1 > 1) → 0, W_{S1}(η̂) →_D χ²₁, and P_η(T1 > 1) → 0, W_{T1}(η̂) →_D χ²₁.

Example

H₀ : Y = β₁ + β₂X + ε. Simulated critical values of W_{S1} and W_{T1} under the null model Y = 1 + 2X + ε with X uniform on [0, 1] and different errors. Sample size n = 300, 5% significance level, N = 10000 MC runs, p = 2.4.

Error distribution      Parameter   Variance   Critical values
                                               W_S1   W_T1
Gaussian G(σ)           0.25        0.063      5.91   6.11
                        0.50        0.250      5.63   5.92
                        0.75        0.563      5.83   6.04
                        1.00        1.000      5.79   6.02
Laplace L(φ)            4.00        0.125      5.29   5.57
                        2.00        0.500      5.27   5.50
                        1.00        2.000      5.75   5.93
                        0.50        8.000      5.61   5.82
Normal mixture NM(μ)    0.20        1.191      5.94   6.08
                        0.40        1.762      5.67   6.00
                        0.60        2.714      5.81   6.05
                        0.80        4.048      5.66   5.85

  4. Empirical powers

Alternatives: Y = 1 + 2X + r_j(X) + ε, j = 1, ..., 4.

Auxiliary models M(k):

Y = 1 + 2X + Σ_{j=1}^k θ_j cos([j + 1]πX) + ε, k = 1, ..., 10,

that is, v(x) = (1, x) and u(x) = (cos(2πx), ..., cos([k + 1]πx)).

Errors: as described above.

Tests for comparison: CvM, the Cramér-von Mises test; T̂, the statistic of Guerre and Lavergne (2005).

[Figure: two panels of simulated power curves for W_{T1}, W_{S1}, T̂ and CvM; left panel G(0.25) with r₁ (power versus the frequency o), right panel L(4) with r₂ (power versus the degree s).]

r₁(x) = c × cos(πox), r₂(x) = c × L_s(x), where L_s is the s-th normalized Legendre polynomial on [0, 1].

Simulated powers of tests based on W_{T1}, W_{S1}, T̂ and CvM under the alternatives Y = 1 + 2X + r_j(X) + ε, j = 1, 2, with X uniform on [0, 1] and different errors. Signal/noise ratio 0.25, 5% nominal level, n = 300, N = 10000 MC runs, p = 2.4.

  5. [Figure: two panels of simulated power curves for W_{T1}, W_{S1}, T̂ and CvM; left panel L(4) with r₅, c = 0.15 (power versus b), right panel G(0.25) with r₄, a = 0.3 (power versus c).]

r₅(x) = c × arctan[b(2x − 1)], r₄(x) = c × (x − a) 1_{[a,1]}(x).

Simulated powers of tests based on W_{T1}, W_{S1}, T̂ and CvM under the alternatives Y = 1 + 2X + r_j(X) + ε, j = 4, 5, with X uniform on [0, 1] and different errors. 5% nominal level, n = 300, N = 10000 MC runs, p = 2.4.

References

Guerre, E., Lavergne, P. (2005). Data-driven rate-optimal specification testing in regression models. Ann. Statist. 33, 840-870.

Inglot, T., Ledwina, T. (2006a). Data driven score tests for a homoscedastic linear regression model: the construction and simulations. Proc. Prague Stochastics 2006, 124-137.

Inglot, T., Ledwina, T. (2006b). Data driven score tests for a homoscedastic linear regression model: asymptotic results. Probab. Math. Statist., 41-61.

Inglot, T., Ledwina, T. (2006c). Towards data driven selection of a penalty function for data driven Neyman tests. Linear Algebra and its Appl., 579-590.

Neyman, J. (1959). Optimal asymptotic tests of composite statistical hypotheses. In Probability and Statistics: The Harald Cramér Volume (U. Grenander, ed.), 213-234. Wiley, New York.
