Gaussian model selection with an unknown variance


  1. Gaussian model selection with an unknown variance. Yannick Baraud, Laboratoire J.A. Dieudonné, Université de Nice Sophia Antipolis, baraud@unice.fr. Joint work with C. Giraud and S. Huet.

  2. The statistical framework. We observe $Y \sim \mathcal{N}(\mu, \sigma^2 I_n)$, where both parameters $\mu \in \mathbb{R}^n$ and $\sigma > 0$ are unknown. Our aim: estimate $\mu$ from the observation of $Y$.

  3. Example: variable selection. $Y \sim \mathcal{N}(\mu, \sigma^2 I_n)$ with $\mu = \sum_{j=1}^{p} \theta_j X_j$, and $p$ possibly larger than $n$, but we expect that $|\{j,\ \theta_j \neq 0\}| \ll n$. Our aim: estimate $\mu$ and $\{j,\ \theta_j \neq 0\}$.

  4. The estimation strategy: model selection. We start with a collection $\{S_m,\ m \in \mathcal{M}\}$ of linear subspaces (models) of $\mathbb{R}^n$, each inducing the projection estimator $S_m \to \hat\mu_m = \Pi_{S_m} Y$. Our aim: select $\hat m = \hat m(Y)$ among $\mathcal{M}$ in such a way that $\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]$ is close to $\inf_{m \in \mathcal{M}} \mathbb{E}\left[|\mu - \hat\mu_m|^2\right]$.
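
A minimal sketch of the projection step (the helper name and the matrix `X_m`, whose columns are assumed to span $S_m$, are conventions of this write-up, not of the talk): $\hat\mu_m = \Pi_{S_m} Y$ via least squares.

```python
import numpy as np

def projection_estimator(Y, X_m):
    """Orthogonal projection of Y onto S_m = Span(columns of X_m)."""
    # The least-squares fitted values X_m @ theta equal Pi_{S_m} Y,
    # even when X_m is rank-deficient.
    theta, *_ = np.linalg.lstsq(X_m, Y, rcond=None)
    return X_m @ theta
```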

  5. Variable selection (continued). $Y \sim \mathcal{N}(\mu, \sigma^2 I_n)$ with $\mu = \sum_{j=1}^{p} \theta_j X_j$. For $m \subset \{1, \dots, p\}$ such that $|m| \le D_{\max} < n$ we set $S_m = \mathrm{Span}\{X_j,\ j \in m\}$. Ordered variable selection: take $\mathcal{M}_o = \{\{1, \dots, D\},\ D \le D_{\max}\} \cup \{\emptyset\}$. (Almost) complete variable selection: take $\mathcal{M}_c = \{m \subset \{1, \dots, p\},\ |m| \le D_{\max}\}$.

  6. Some selection criteria. $$\hat m = \mathrm{argmin}_{m \in \mathcal{M}} \left[|Y - \hat\mu_m|^2 + \mathrm{pen}(m)\right].$$ Mallows' $C_p$ (1973): $\mathrm{pen}(m) = 2 D_m \sigma^2$, where $D_m = \dim(S_m)$. Birgé & Massart (2001): $\mathrm{pen}(m) = \mathrm{pen}(m, \sigma^2)$.
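
As a concrete illustration, a sketch of this generic penalized criterion with Mallows' $C_p$ penalty, assuming $\sigma^2$ is known (exactly the assumption the talk later removes); the `designs` dictionary is a convention of this sketch:

```python
import numpy as np

def mallows_cp_select(Y, designs, sigma2):
    """argmin over m of |Y - mu_hat_m|^2 + 2 * D_m * sigma2.
    `designs` maps a model label m to a matrix X_m whose columns span S_m."""
    def crit(m):
        X_m = designs[m]
        theta, *_ = np.linalg.lstsq(X_m, Y, rcond=None)  # mu_hat_m = Pi_{S_m} Y
        D_m = np.linalg.matrix_rank(X_m)                 # D_m = dim(S_m)
        return float(np.sum((Y - X_m @ theta) ** 2)) + 2 * D_m * sigma2
    return min(designs, key=crit)
```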

  7. Advantages: a non-asymptotic theory; for variable selection, no assumption on the predictors $X_j$; a Bayesian flavor, allowing (to some extent) knowledge and intuition to be taken into account. Drawbacks: the computation of $\hat m$ may not be feasible if $\mathcal{M}$ is too large.

  8. For the problem of variable selection. Tibshirani (1996), Lasso: $$\hat\theta_\lambda = \mathrm{argmin}_{\theta \in \mathbb{R}^p} \left[\Big|Y - \sum_{j=1}^{p} \theta_j X_j\Big|^2 + \lambda |\theta|_1\right].$$ Candès & Tao (2007), Dantzig selector: $$\hat\theta_\lambda = \arg\min \left\{ |\theta|_1 \ :\ \max_{j=1,\dots,p} \Big|\Big\langle X_j,\ Y - \sum_{j'=1}^{p} \theta_{j'} X_{j'} \Big\rangle\Big| \le \lambda \right\}.$$ Then $\hat\theta_\lambda \to \hat m_\lambda = \{j,\ \hat\theta_{\lambda,j} \neq 0\}$ and $\hat\mu_{\hat m_\lambda} = \sum_{j \in \hat m_\lambda} \hat\theta_{\lambda,j} X_j$.

  9. Advantages: the computation is feasible even if $p$ is very large; a non-asymptotic theory. Drawbacks: the procedures work under suitable assumptions on the predictors $X_j$; there is no way to check these assumptions if $p$ is very large; blind to knowledge/intuition.

  10. For all these procedures, the problem remains of estimating $\sigma^2$ or choosing $\lambda$. These parameters depend on the data distribution and must be estimated. In general, there is no natural estimator of $\sigma^2$ (complete variable selection with $p > n$). Cross-validation... The performance of the procedure depends crucially on these parameters.

  11. Other selection criteria: $$\mathrm{Crit}(m) = |Y - \hat\mu_m|^2 \left(1 + \frac{\mathrm{pen}(m)}{n - D_m}\right)$$ or $$\mathrm{Crit}'(m) = n \log\left(|Y - \hat\mu_m|^2\right) + \mathrm{pen}'(m).$$ Both criteria select the same model if one takes $$\mathrm{pen}'(m) = n \log\left(1 + \frac{\mathrm{pen}(m)}{n - D_m}\right) \approx \mathrm{pen}(m).$$

  12. With $$\mathrm{Crit}(m) = |Y - \hat\mu_m|^2 \left(1 + \frac{\mathrm{pen}(m)}{n - D_m}\right) \quad \text{or} \quad \mathrm{Crit}'(m) = n \log\left(|Y - \hat\mu_m|^2\right) + \mathrm{pen}'(m):$$ Akaike (1969), FPE: $\mathrm{pen}(m) = 2 D_m$. Akaike (1973), AIC: $\mathrm{pen}'(m) = 2 D_m$. Schwarz/Akaike (1978), BIC/SIC: $\mathrm{pen}'(m) = D_m \log(n)$. Saito (1994), AMDL: $\mathrm{pen}'(m) = 3 D_m \log(n)$.
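
A sketch of the log-form criterion $\mathrm{Crit}'$ for the penalties listed above (FPE uses the multiplicative form $\mathrm{Crit}$ instead; additive constants in the $\log$ term do not affect the argmin, so they are omitted here):

```python
import numpy as np

def crit_prime(Y, mu_hat, D_m, which="AIC"):
    """Crit'(m) = n * log(|Y - mu_hat_m|^2) + pen'(m) for the classical penalties."""
    n = len(Y)
    rss = float(np.sum((Y - mu_hat) ** 2))
    pen_prime = {"AIC": 2 * D_m,
                 "BIC": D_m * np.log(n),
                 "AMDL": 3 * D_m * np.log(n)}[which]
    return n * np.log(rss) + pen_prime
```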

  13. Two questions. (1) What can be said about these selection criteria from a non-asymptotic point of view? (2) Is it possible to propose other penalties that would take into account the complexity of the collection $\{S_m,\ m \in \mathcal{M}\}$?

  14. What do we mean by complexity? We shall say that the collection $\{S_m,\ m \in \mathcal{M}\}$ is $a$-complex (with $a \ge 0$) if $$|\{m \in \mathcal{M},\ D_m = D\}| \le e^{aD} \quad \forall D \ge 1.$$ For the collection $\{S_m,\ m \in \mathcal{M}_o\}$: $|\{m \in \mathcal{M},\ D_m = D\}| \le 1 \Rightarrow a = 0$. For the collection $\{S_m,\ m \in \mathcal{M}_c\}$: $|\{m \in \mathcal{M},\ D_m = D\}| \le \binom{p}{D} \le p^D \Rightarrow a = \log(p)$.
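
A quick numerical check of the counting bound behind $a = \log(p)$, with illustrative values:

```python
from math import comb, exp, log

p = 50
for D in range(1, 11):
    # Complete variable selection: |{m : D_m = D}| = C(p, D) <= p**D = e^{D log p},
    # so the collection M_c is a-complex with a = log(p).
    assert comb(p, D) <= exp(D * log(p))
```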

  15. Penalty choice with regard to complexity. Let $\phi(x) = (x - 1 - \log(x))/2$ for $x \ge 1$. Consider an $a$-complex collection $\{S_m,\ m \in \mathcal{M}\}$. If for some $K, K' > 1$ $$K \le \frac{\mathrm{pen}(m)}{\phi^{-1}(a)\, D_m} \le K', \quad \forall m \in \mathcal{M}^*,$$ and we select $$\hat m = \mathrm{argmin}_{m \in \mathcal{M}}\ |Y - \hat\mu_m|^2 \left(1 + \frac{\mathrm{pen}(m)}{n - D_m}\right),$$ then $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \le C(K)\, K'\, \phi^{-1}(a) \left[\inf_{m \in \mathcal{M}} \frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$

  16. Case of ordered variable selection: $a = 0$, $\phi^{-1}(a) = 1$. If for all $m \in \mathcal{M}$ such that $D_m \neq 0$ $$1 < K \le \frac{\mathrm{pen}(m)}{D_m} \le K',$$ one has $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \le C(K)\, K' \left[\inf_{m \in \mathcal{M}} \frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$ → FPE and AIC (for $n$ large enough).

  17. Case of complete variable selection with $p = n$: $a = \log(n)$, $\phi^{-1}(a) \approx 2\log(n)$. If for all $m \in \mathcal{M}$ such that $D_m \neq 0$ $$1 < K \le \frac{\mathrm{pen}(m)}{2 D_m \log(n)} \le K',$$ then $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \le C(K)\, K' \log(n) \left[\inf_{m \in \mathcal{M}} \frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$ → AMDL (but not AIC, FPE, BIC).

  18. New penalties. Definition: let $X_D \sim \chi^2(D)$ and $X_N \sim \chi^2(N)$ be two independent $\chi^2$ random variables. Define $$H_{D,N}(x) = \frac{1}{\mathbb{E}(X_D)} \times \mathbb{E}\left[\left(X_D - x \frac{X_N}{N}\right)_+\right], \quad x \ge 0.$$ Definition: to each $S_m$ with $D_m < n - 1$, we associate a weight $L_m \ge 0$ and the penalty $$\mathrm{pen}(m) = 1.1\, \frac{N_m}{N_m - 1}\, H^{-1}_{D_m + 1,\, N_m - 1}\left(e^{-L_m}\right), \quad \text{where } N_m = n - D_m.$$
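
A Monte Carlo sketch of this penalty (an assumption of this write-up, not the authors' implementation: here the expectation defining $H_{D,N}$ is estimated from simulated $\chi^2$ draws and inverted by bisection, using the fact that $x \mapsto H_{D,N}(x)$ is decreasing with $H_{D,N}(0) = 1$):

```python
import numpy as np

def pen(n, D_m, L_m, K=1.1, n_mc=200_000, seed=0):
    """pen(m) = K * N_m/(N_m - 1) * H^{-1}_{D_m+1, N_m-1}(exp(-L_m)), N_m = n - D_m."""
    N_m = n - D_m
    D, N = D_m + 1, N_m - 1
    rng = np.random.default_rng(seed)
    XD = rng.chisquare(D, n_mc)                    # X_D ~ chi2(D)
    XN = rng.chisquare(N, n_mc)                    # X_N ~ chi2(N), independent of X_D
    # H_{D,N}(x) = E[(X_D - x * X_N / N)_+] / E[X_D], with E[X_D] = D.
    H = lambda x: float(np.mean(np.clip(XD - x * XN / N, 0.0, None))) / D
    lo, hi = 0.0, 1e8                              # H decreases from 1 at x = 0 to 0
    target = np.exp(-L_m)                          # in (0, 1] since L_m >= 0
    for _ in range(100):                           # bisection on the fixed sample
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if H(mid) > target else (lo, mid)
    return K * N_m / (N_m - 1) * 0.5 * (lo + hi)
```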

  19. Theorem: let $\{S_m,\ m \in \mathcal{M}\}$ be a collection of models and $\{L_m,\ m \in \mathcal{M}\}$ a family of weights. Assume that $N_m \ge 7$ and $D_m \vee L_m \le n/2$ for all $m \in \mathcal{M}$. Define $$\hat m = \mathrm{argmin}_{m \in \mathcal{M}}\ |Y - \hat\mu_m|^2 \left(1 + \frac{\mathrm{pen}(m)}{n - D_m}\right).$$ The estimator $\hat\mu_{\hat m}$ satisfies $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \lesssim \inf_{m \in \mathcal{M}} \left[\frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} + L_m\right] + \sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m}.$$
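
Putting the pieces together, a sketch of the selection rule of the theorem for nonempty models, reusing the hypothetical `pen` helper sketched after slide 18:

```python
import numpy as np

def select(Y, designs, weights):
    """argmin over m of |Y - mu_hat_m|^2 * (1 + pen(m) / (n - D_m)),
    where `designs[m]` spans S_m and `weights[m]` is the weight L_m."""
    n = len(Y)
    def crit(m):
        X_m = designs[m]
        theta, *_ = np.linalg.lstsq(X_m, Y, rcond=None)  # mu_hat_m = Pi_{S_m} Y
        D_m = np.linalg.matrix_rank(X_m)
        rss = float(np.sum((Y - X_m @ theta) ** 2))
        return rss * (1.0 + pen(n, D_m, weights[m]) / (n - D_m))
    return min(designs, key=crit)
```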

  20. Ordered variable selection. For $m \in \mathcal{M}_o$, $m = \{1, \dots, D\}$, take $L_m = |m|$. → $\sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m} \le 2.51$. If $|m| \le D_{\max} \le [n/2] \wedge p$, $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \lesssim \inf_{m \in \mathcal{M}} \left[\frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$

  21. Complete variable selection. For $m \in \mathcal{M}_c$, $$L_m = \log\binom{p}{|m|} + 2\log(|m| + 1).$$ → $\sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m} \le \log(p)$. If $|m| \le D_{\max} \le [n/(2\log(p))] \wedge p$, $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \lesssim \log(p)\, \inf_{m \in \mathcal{M}} \left[\frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$
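
The weight in code, with a numerical check of the bound $\sum_m (D_m + 1)e^{-L_m} \le \log(p)$: each dimension $D$ contributes $\binom{p}{D}(D+1)e^{-L_m} = 1/(D+1)$, a harmonic sum. The values of $p$ and $D_{\max}$ below are illustrative:

```python
from math import comb, exp, log

def L_complete(p, m_size):
    """L_m = log C(p, |m|) + 2 * log(|m| + 1) for complete variable selection."""
    return log(comb(p, m_size)) + 2 * log(m_size + 1)

p, D_max = 100, 10
total = sum(comb(p, D) * (D + 1) * exp(-L_complete(p, D)) for D in range(D_max + 1))
assert abs(total - sum(1 / (D + 1) for D in range(D_max + 1))) < 1e-9  # harmonic sum
assert total <= log(p)
```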

  22. Complete variable selection: order of magnitude of the penalty. [Figure: penalty versus dimension $D$, for $n = 32$ and $n = 512$, comparing the new penalty with $K = 1.1$ to AMDL.]

  23. Comparison with Lasso/Adaptive Lasso. The "Adaptive Lasso", proposed by Zou (2006): $$\hat\theta_\lambda = \mathrm{argmin}_{\theta \in \mathbb{R}^p} \left[\Big|Y - \sum_{j=1}^{p} \theta_j X_j\Big|^2 + \lambda \sum_{j=1}^{p} \frac{|\theta_j|}{|\tilde\theta_j|^\gamma}\right].$$ → $\lambda, \gamma$ obtained by cross-validation.
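
A sketch of the adaptive lasso via the standard column-rescaling reduction to a plain lasso (an assumption of this write-up, not the paper's code; note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}|Y - X\theta|^2 + \alpha|\theta|_1$, so `alpha` matches $\lambda$ only up to that scaling):

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, Y, lam, gamma, theta_init):
    """Minimize |Y - X theta|^2 + lam * sum_j |theta_j| / |theta_init_j|^gamma.
    Substituting theta_j = w_j * beta_j with w_j = |theta_init_j|^gamma turns the
    weighted penalty into a plain l1 penalty on beta."""
    w = np.abs(theta_init) ** gamma
    lasso = Lasso(alpha=lam, fit_intercept=False)
    lasso.fit(X * w, Y)        # columns scaled by w; zero-weight columns drop out
    return lasso.coef_ * w     # map beta back to theta
```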

  24. Simulation 1. Consider the predictors $X_1, \dots, X_8 \in \mathbb{R}^{20}$ such that for all $i = 1, \dots, 20$ the rows $X_i^T = (X_{1,i}, \dots, X_{8,i})$ are i.i.d. $\mathcal{N}(0, \Gamma)$ with $\Gamma_{j,k} = 0.5^{|j-k|}$, and $\mu = 3 X_1 + 1.5 X_2 + 2 X_5$.
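
A sketch of the data-generating step of Simulation 1 (the seed and the noise draw are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 8
# Gamma_{j,k} = 0.5^{|j-k|}
Gamma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Gamma, size=n)  # rows X_i^T i.i.d. N(0, Gamma)
mu = 3 * X[:, 0] + 1.5 * X[:, 1] + 2 * X[:, 4]           # mu = 3 X_1 + 1.5 X_2 + 2 X_5
sigma = 1.0
Y = mu + sigma * rng.standard_normal(n)                  # Y ~ N(mu, sigma^2 I_n)
```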

  25. Results of Simulation 1.

$\sigma = 1$:

| | r | $\mathbb{E}(|\hat m|)$ | %$\{\hat m = m_0\}$ | %$\{\hat m \supseteq m_0\}$ |
|---|---|---|---|---|
| Our procedure | 1.57 | 3.34 | 72% | 97.8% |
| Lasso | 2.09 | 5.21 | 10.8% | 100% |
| A. Lasso | 1.99 | 4.56 | 16.8% | 99% |

$\sigma = 3$:

| | r | $\mathbb{E}(|\hat m|)$ | %$\{\hat m = m_0\}$ | %$\{\hat m \supseteq m_0\}$ |
|---|---|---|---|---|
| Our procedure | 3.08 | 2.01 | 10.3% | 15.7% |
| Lasso | 2.06 | 4.56 | 10.5% | 100% |
| A. Lasso | 2.44 | 3.81 | 13.2% | 52% |

  26. Simulation 2. Let $X_1, X_2, X_3$ be three vectors of $\mathbb{R}^n$ defined by $$X_1 = (1, -1, 0, \dots, 0)/\sqrt{2},$$ $$X_2 = (-1, 1.001, 0, \dots, 0)\Big/\sqrt{1 + 1.001^2},$$ $$X_3 = (1/2,\ 1/2,\ 1/n, \dots, 1/n)\Big/\sqrt{1/2 + (n-2)/n^2},$$ and $X_j = e_j$ for all $j = 4, \dots, n$. We take $p = n = 20$, $D_{\max} = 8$ and $\mu = (n, n, 0, \dots, 0) \in \mathrm{Span}\{X_1, X_2\}$. → $\mu$ almost $\perp X_1, X_2$ and very correlated with $X_3$.
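
A sketch constructing this design (the normalization of $X_3$ follows the unit-norm reading of the slide, which the extraction left ambiguous):

```python
import numpy as np

n = 20
X = np.zeros((n, n))                        # column j-1 is the predictor X_j
X[:2, 0] = [1.0, -1.0]
X[:, 0] /= np.sqrt(2.0)                     # X_1
X[:2, 1] = [-1.0, 1.001]
X[:, 1] /= np.sqrt(1.0 + 1.001 ** 2)        # X_2
X[:2, 2] = [0.5, 0.5]
X[2:, 2] = 1.0 / n
X[:, 2] /= np.sqrt(0.5 + (n - 2) / n ** 2)  # X_3, normalized to unit length
for j in range(3, n):
    X[j, j] = 1.0                           # X_j = e_j for j = 4, ..., n
mu = np.zeros(n)
mu[:2] = n                                  # mu = (n, n, 0, ..., 0)
assert abs(mu @ X[:, 0]) < 1e-9             # mu is orthogonal to X_1
```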

  27. The result.

| | r | $\mathbb{E}(|\hat m|)$ | %$\{\hat m = m_0\}$ | %$\{\hat m \supseteq m_0\}$ |
|---|---|---|---|---|
| Our procedure | 2.24 | 2.19 | 83.4% | 96.2% |
| Lasso | 285 | 6 | 0% | 30% |
| A. Lasso | 298 | 5 | 0% | 25% |

  28. Mixed strategy. Let $m \in \mathcal{M}_c$. Take $L_m = |m|$ if $m \in \mathcal{M}_o$, and $L_m = \log\binom{p}{|m|} + \log\big(p(|m| + 1)\big)$ if $m \in \mathcal{M}_c \setminus \mathcal{M}_o$. → $\sum_{m \in \mathcal{M}} (D_m + 1)\, e^{-L_m} \le 3.51$, and $$\frac{\mathbb{E}\left[|\mu - \hat\mu_{\hat m}|^2\right]}{\sigma^2} \lesssim \left[\inf_{m \in \mathcal{M}_o} \frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right] \wedge \log(p) \left[\inf_{m \in \mathcal{M}_c} \frac{\mathbb{E}\left[|\mu - \hat\mu_m|^2\right]}{\sigma^2} \vee 1\right].$$
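
The mixed weights in code, with a numerical check of the bound $\sum_m (D_m+1)e^{-L_m} \le 3.51$ for illustrative $p$ and $D_{\max}$ (for each size $D \ge 1$, the collection $\mathcal{M}_c \setminus \mathcal{M}_o$ contains $\binom{p}{D} - 1$ models):

```python
from math import comb, exp, log

def L_mixed(p, m_size, ordered):
    """L_m = |m| on the ordered models, else log C(p, |m|) + log(p * (|m| + 1))."""
    if ordered:
        return m_size
    return log(comb(p, m_size)) + log(p * (m_size + 1))

p, D_max = 100, 10
total = sum((D + 1) * exp(-L_mixed(p, D, True)) for D in range(D_max + 1))
total += sum((comb(p, D) - 1) * (D + 1) * exp(-L_mixed(p, D, False))
             for D in range(1, D_max + 1))
assert total <= 3.51
```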
