Gaussian Model Selection with Unknown Variance


  1. Gaussian Model Selection with Unknown Variance
Y. Baraud, C. Giraud and S. Huet
Université de Nice - Sophia Antipolis, INRA Jouy-en-Josas
Luminy, 13-17 November 2006

  2. The statistical setting
The statistical model. Observations: Y_i = µ_i + σ ε_i, i = 1, …, n
• µ = (µ_1, …, µ_n)′ ∈ R^n and σ > 0 are unknown
• ε_1, …, ε_n are i.i.d. standard Gaussian
Collection of models / estimators
• S = {S_m, m ∈ M}, a countable collection of linear subspaces of R^n (the models)
• µ̂_m = least-squares estimator of µ on S_m
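This setting can be sketched numerically; a minimal illustration, where the true mean, noise level, and the basis spanning the model are invented for the example and not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observations Y_i = mu_i + sigma * eps_i, with eps_i i.i.d. standard Gaussian
n, sigma = 100, 0.5
t = np.arange(n) / n
mu = np.sin(2 * np.pi * t)                      # an arbitrary true mean vector
Y = mu + sigma * rng.standard_normal(n)

# One model S_m: a 3-dimensional linear subspace of R^n, spanned by the columns of X
X = np.column_stack([np.ones(n), np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])

# Least-squares estimator of mu on S_m = orthogonal projection of Y onto col(X)
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_hat = X @ coef
```

The residual Y − µ̂_m is orthogonal to S_m, which is exactly the least-squares characterization of the projection.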

  3. Example: change-point detection
• µ_i = f(x_i) with f : [0, 1] → R piecewise constant.
• M is the set of increasing sequences m = (t_0, …, t_q) with q ∈ {1, …, p}, t_0 = 0, t_q = 1, and {t_1, …, t_{q−1}} ⊂ {x_1, …, x_n}.
• Models: S_m = {(g(x_1), …, g(x_n))′, g ∈ F_m}, where
  F_{(t_0, …, t_q)} = { g = Σ_{j=1}^q a_j 1_{[t_{j−1}, t_j)} with (a_1, …, a_q) ∈ R^q }.
• No residual sum of squares is available to estimate the variance.
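A sketch of fitting one change-point model from such a collection; the jump location, noise level, and candidate segmentation below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 60
x = np.arange(1, n + 1) / n                    # design points x_i in (0, 1]
mu = np.where(x < 0.4, 0.0, 2.0)               # piecewise-constant f with one jump
Y = mu + 0.3 * rng.standard_normal(n)

def fit_piecewise(Y, x, t):
    """Least squares on S_m for m = (t_0, ..., t_q): one mean per interval [t_{j-1}, t_j)."""
    mu_hat = np.empty_like(Y)
    for lo, hi in zip(t[:-1], t[1:]):
        seg = (x >= lo) & ((x < hi) | (hi == t[-1]))   # close the last interval at t_q = 1
        mu_hat[seg] = Y[seg].mean()
    return mu_hat

mu_hat = fit_piecewise(Y, x, np.array([0.0, 0.4, 1.0]))   # a candidate model with D_m = 2
```

Note that with a saturated segmentation (one observation per piece) the residual sum of squares is exactly 0, which is why this collection provides no model-free variance estimate.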

  4. Risk on a single model
Euclidean risk on S_m:
  E[ ‖µ − µ̂_m‖² ] = ‖µ − µ_m‖² (bias) + D_m σ² (variance),
where µ_m denotes the orthogonal projection of µ onto S_m.
Ideal: estimate µ with µ̂_{m*}, where m* minimizes m ↦ E[ ‖µ − µ̂_m‖² ] …
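The bias–variance identity above can be checked by simulation; the model, true mean, and noise level below are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)

n, sigma, reps = 50, 1.0, 20_000
mu = np.sin(np.linspace(0.0, 3.0, n))              # a mean vector outside the model

# Orthogonal projector P onto a 3-dimensional model S_m (the span of 1, i, i^2)
i = np.arange(n, dtype=float)
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), i, i ** 2]))
P = Q @ Q.T
D_m = 3

mu_bar = P @ mu                                    # projection of mu onto S_m

Y = mu + sigma * rng.standard_normal((reps, n))    # many independent samples
mu_hat = Y @ P                                     # P is symmetric: row-wise projections
risk_mc = np.mean(np.sum((mu_hat - mu) ** 2, axis=1))

risk_exact = np.sum((mu - mu_bar) ** 2) + D_m * sigma ** 2   # bias^2 + D_m * sigma^2
```

The Monte Carlo risk matches the closed-form decomposition up to simulation error.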

  5. Model selection
Selection rule: we set D_m = dim(S_m) and select m̂ minimizing
  Crit_L(m) = ‖Y − µ̂_m‖² ( 1 + pen(m)/(n − D_m) )          (1)
or
  Crit_K(m) = (n/2) log( ‖Y − µ̂_m‖² ) + (1/2) pen′(m).      (2)
Some classical penalties:
  FPE:  pen(m) = 2 D_m
  AIC:  pen′(m) = 2 D_m
  BIC:  pen′(m) = D_m log n
  AMDL: pen′(m) = 3 D_m log n

  6. Model selection
Selection rule: we select m̂ minimizing
  Crit_L(m) = ‖Y − µ̂_m‖² ( 1 + pen(m)/(n − D_m) )
or
  Crit_K(m) = (n/2) log( ‖Y − µ̂_m‖² ) + (1/2) pen′(m).
Criteria (1) and (2) are equivalent with
  pen′(m) = n log( 1 + pen(m)/(n − D_m) ).
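A quick numerical check of this equivalence; the residual sums of squares and dimensions below are invented numbers:

```python
import numpy as np

n = 100
rss = np.array([50.0, 30.0, 28.5])              # hypothetical ||Y - mu_hat_m||^2 values
D = np.array([2, 5, 20])                        # corresponding model dimensions D_m

pen = 2.0 * D                                   # e.g. the FPE penalty pen(m) = 2 D_m
crit_L = rss * (1.0 + pen / (n - D))

pen_prime = n * np.log(1.0 + pen / (n - D))     # the equivalence formula above
crit_K = 0.5 * n * np.log(rss) + 0.5 * pen_prime
```

Since (n/2) log Crit_L(m) = Crit_K(m) term by term and log is increasing, both criteria select the same m̂.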

  7. Objectives
• For classical criteria: to analyze the Euclidean risk of µ̂_m̂ with regard to the complexity of the family of models S, and to compare this risk to inf_{m∈M} E[ ‖µ − µ̂_m‖² ].
• To propose penalties versatile enough to take into account the complexity of S and the sample size.
Complexity: we say that S has an index of complexity (M, a) if for all D ≥ 1,
  card{ m ∈ M : D_m = D } ≤ M e^{aD}.

  8. Theorem 1: performances of classical penalties
Let K > 1 and S with complexity index (M, a) ∈ R²₊. If for all m ∈ M, D_m ≤ D_max(K, M, a) (explicit) and
  pen(m) ≥ K² φ⁻¹(a) D_m,  with φ(x) = (x − 1 − log x)/2 for x ≥ 1,
then
  E[ ‖µ − µ̂_m̂‖² ] ≤ K/(K − 1) · inf_{m∈M} { ( 1 + pen(m)/(n − D_m) ) ‖µ − µ_m‖² + pen(m) σ² } + R
where
  R = K σ² [ (K² φ⁻¹(a) + 2K) / (2(K − 1)) + 8KM e^{−a} / (e^{φ(K)/2} − 1) ].
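Since φ is increasing on [1, +∞) with φ(1) = 0, the quantity φ⁻¹(a) in the penalty condition can be evaluated by bisection; a small stdlib-only sketch (the function names are mine):

```python
import math

def phi(x):
    """phi(x) = (x - 1 - log x) / 2, increasing on [1, +inf) with phi(1) = 0."""
    return 0.5 * (x - 1.0 - math.log(x))

def phi_inv(a, tol=1e-12):
    """Inverse of phi on [1, +inf), computed by bisection."""
    lo, hi = 1.0, 2.0
    while phi(hi) < a:                 # grow the bracket until phi(hi) >= a
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With this, the minimal penalty K² φ⁻¹(a) D_m can be tabulated for whatever complexity index a is at hand.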

  9. Performances of µ̂_m̂
• Under the above hypotheses, if pen(m) = K φ⁻¹(a) D_m with K > 1, then
  E[ ‖µ − µ̂_m̂‖² ] ≤ c(K, M) φ⁻¹(a) ( inf_{m∈M} E[ ‖µ − µ̂_m‖² ] + σ² ).
• The condition "pen(m) ≥ K² φ⁻¹(a) D_m with K > 1" is sharp (at least when a = 0 and a = log n). Roughly, for large values of n this imposes the restrictions:
  Criterion:   FPE        AIC        BIC             AMDL
  Complexity:  a < 0.15   a < 0.15   a < log(n)/2    a < 3 log(n)/2

  10. The Dkhi function
For x ≥ 0, we define
  Dkhi[D, N, x] = (1/E[X_D]) × E[ ( X_D − x X_N / N )₊ ] ∈ ]0, 1],
where X_D and X_N are two independent χ²(D) and χ²(N) random variables.
Computation: x ↦ Dkhi[D, N, x] is decreasing, and
  Dkhi[D, N, x] = P( F_{D+2,N} ≥ x/(D + 2) ) − (x/D) P( F_{D,N+2} ≥ (N + 2)x/(DN) ),
where F_{D,N} is a Fisher random variable with D and N degrees of freedom.
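The defining expectation can be checked directly by Monte Carlo; the sample size and the (D, N, x) values below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def dkhi_mc(D, N, x, size=200_000):
    """Monte Carlo estimate of Dkhi[D, N, x] = E[(X_D - x * X_N / N)_+] / E[X_D]."""
    XD = rng.chisquare(D, size)
    XN = rng.chisquare(N, size)
    return np.mean(np.maximum(XD - x * XN / N, 0.0)) / D   # E[X_D] = D

D, N = 5, 20
vals = [dkhi_mc(D, N, x) for x in (0.0, 2.0, 5.0, 10.0)]
```

At x = 0 the value is 1 (the positive part is X_D itself), and the estimates decrease toward 0 as x grows, as stated on the slide.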

  11. Theorem 2: a general risk bound
Let pen be an arbitrary non-negative penalty function and assume that N_m = n − D_m ≥ 2 for all m ∈ M. If m̂ exists a.s., then for any K > 1,
  E[ ‖µ − µ̂_m̂‖² ] ≤ K/(K − 1) · inf_{m∈M} { ( 1 + pen(m)/N_m ) ‖µ − µ_m‖² + pen(m) σ² } + Σ   (3)
where
  Σ = K² σ²/(K − 1) · Σ_{m∈M} (D_m + 1) Dkhi[ D_m + 1, N_m − 1, (N_m − 1) pen(m) / (K N_m) ].

  12. Minimal penalties
• Choose K > 1 and non-negative weights L = {L_m, m ∈ M} such that
  Σ′ = Σ_{m∈M} (D_m + 1) e^{−L_m} < +∞.
• For any m ∈ M, set
  pen_{K,L}(m) = K · N_m/(N_m − 1) · Dkhi⁻¹[ D_m + 1, N_m − 1, e^{−L_m} ].
• When L_m ∨ D_m ≤ κn with κ < 1:
  pen_{K,L}(m) ≤ C(K, κ) ( L_m ∨ D_m ).
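One way to evaluate pen_{K,L}(m) without special functions is to invert a fixed-sample Monte Carlo version of Dkhi by bisection. This is only a sketch of the definition; the sample size, the helper names, and the example values of K, L_m, D_m, N_m are my own choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def make_dkhi(D, N, size=100_000):
    """Fixed chi-square samples give a smooth, decreasing approximation of x -> Dkhi[D, N, x]."""
    XD = rng.chisquare(D, size)
    XN_over_N = rng.chisquare(N, size) / N
    return lambda x: np.mean(np.maximum(XD - x * XN_over_N, 0.0)) / D

def inv_decreasing(f, target, hi=1.0, iters=60):
    """Solve f(x) = target for a decreasing f on [0, +inf), by bisection."""
    while f(hi) > target:              # grow the bracket until f(hi) <= target
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > target else (lo, mid)
    return 0.5 * (lo + hi)

def pen_KL(K, L_m, D_m, N_m):
    """pen_{K,L}(m) = K * N_m/(N_m - 1) * Dkhi^{-1}[D_m + 1, N_m - 1, exp(-L_m)]."""
    dkhi = make_dkhi(D_m + 1, N_m - 1)
    return K * N_m / (N_m - 1) * inv_decreasing(dkhi, np.exp(-L_m))

pen_value = pen_KL(K=1.1, L_m=2.0, D_m=3, N_m=50)
```

The exact computation would instead go through the Fisher tail probabilities of the previous slide; the Monte Carlo proxy above only illustrates how the inverse is used.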

  13. How to choose the L_m?
• When S has complexity index (M, a): a possible choice is L_m = a D_m + 3 log(D_m + 1). Then
  Σ′ = Σ_{m∈M} (D_m + 1) e^{−L_m} ≤ M Σ_{D≥1} D⁻².
• For change-point detection: we choose L_m = L(|m|) = log C(n, |m| − 2) + 2 log(|m|), where C(n, k) denotes the binomial coefficient, for which
  Σ′ = Σ_{D=2}^{p+1} C(n, D − 2) D e^{−L(D)} = Σ_{D=2}^{p+1} 1/D ≤ log(p + 1).
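The bound Σ′ ≤ log(p + 1) for the change-point weights is easy to verify numerically; the values of n and p below are arbitrary:

```python
import math

n, p = 100, 20

def L(D):
    """Weight L(|m|) = log C(n, |m| - 2) + 2 log(|m|) for a segmentation with |m| = D."""
    return math.log(math.comb(n, D - 2)) + 2.0 * math.log(D)

# C(n, D - 2) segmentations have |m| = D (choose the interior breakpoints among the x_i),
# and each contributes D * exp(-L(D)) to Sigma'
sigma_prime = sum(math.comb(n, D - 2) * D * math.exp(-L(D)) for D in range(2, p + 2))
```

Each term collapses to 1/D, so Σ′ is the harmonic sum Σ_{D=2}^{p+1} 1/D, which is at most log(p + 1).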
