
Compressed sensing, sparsity and p-values. Sara van de Geer, April 16, 2015 (Leiden)



  1. Compressed sensing, sparsity and p-values Sara van de Geer April 16, 2015 (Leiden) Dantzig April 16, 2015 1 / 49

  2. Basis Pursuit [Chen, Donoho and Saunders (1998)]. Given are an $n \times p$ (sensing) matrix $X$ and an $n$-vector of measurements $f^0$. We know $f^0 = X\beta^0$. We want to recover $\beta^0 \in \mathbb{R}^p$. There are $n$ equations and $p$ unknowns. High-dimensional case: $p \gg n$. Notation: the $\ell_1$-norm is $\|\beta\|_1 := \sum_{j=1}^p |\beta_j|$, $\beta \in \mathbb{R}^p$. Basis pursuit solution: $\beta^* := \arg\min\{\|\beta\|_1 : X\beta = f^0\}$.
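
The basis pursuit problem on this slide is a linear program, which a short sketch can make concrete. The code below is an illustration only, not the speaker's implementation: it assumes SciPy's `linprog` with the HiGHS backend, and the helper name `basis_pursuit`, the Gaussian sensing matrix, and the sizes `n`, `p`, `s` are hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(X, f0):
    """Solve min ||beta||_1 subject to X beta = f0 as a linear program."""
    n, p = X.shape
    # Write beta = u - v with u, v >= 0 and minimize sum(u) + sum(v),
    # which is an upper bound on ||beta||_1 that is tight at the optimum.
    c = np.ones(2 * p)
    A_eq = np.hstack([X, -X])
    res = linprog(c, A_eq=A_eq, b_eq=f0, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:p], res.x[p:]
    return u - v


# Hypothetical test instance: random Gaussian X and a 3-sparse beta0.
rng = np.random.default_rng(0)
n, p, s = 20, 40, 3
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
f0 = X @ beta0

beta_star = basis_pursuit(X, f0)
print("l1 norm of beta*:", np.sum(np.abs(beta_star)))
print("l1 norm of beta0:", np.sum(np.abs(beta0)))
print("max |beta* - beta0|:", np.max(np.abs(beta_star - beta0)))
```

Since $\beta^0$ is feasible, the solution always satisfies $\|\beta^*\|_1 \le \|\beta^0\|_1$ up to solver tolerance. Whether $\beta^* = \beta^0$ depends on the null-space property of the particular $X$.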

  3. Let $S \subset \{1, \ldots, p\}$. Notation: $\beta_S := \{\beta_j \mathbf{1}\{j \in S\}\}$, $\beta_{-S} := \beta_{S^c} = \beta - \beta_S$. For example, with $1 \in S$, $j-1 \notin S$, $j \in S$ and $p \notin S$: $\beta_S = (\beta_1, \ldots, 0, \beta_j, \ldots, 0)^T$ and $\beta_{-S} = (0, \ldots, \beta_{j-1}, 0, \ldots, \beta_p)^T$. Definition: the matrix $X$ satisfies the null-space property at $S$ if for all $\beta \neq 0$ in $\mathrm{null}(X)$ it holds that $\|\beta_{-S}\|_1 > \|\beta_S\|_1$.
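
The $\beta_S$ / $\beta_{-S}$ notation can be illustrated with a few lines of NumPy. This is a minimal sketch; the helper name `split_on` and the example vector are hypothetical, and Python indices start at 0 while the slides count from 1.

```python
import numpy as np


def split_on(beta, S):
    """Return (beta_S, beta_minus_S): beta with entries outside / inside S zeroed."""
    beta = np.asarray(beta, dtype=float)
    mask = np.zeros(beta.shape[0], dtype=bool)
    mask[list(S)] = True
    beta_S = np.where(mask, beta, 0.0)
    beta_minus_S = beta - beta_S
    return beta_S, beta_minus_S


def l1(b):
    return np.sum(np.abs(b))


beta = np.array([1.5, -2.0, 0.0, 0.5])
S = {0, 3}  # 0-based indices, i.e. {1, 4} in the slide's numbering
beta_S, beta_minus_S = split_on(beta, S)
print(beta_S, beta_minus_S)
print(l1(beta), l1(beta_S) + l1(beta_minus_S))
```

The two printed norms agree because $\beta_S$ and $\beta_{-S}$ have disjoint supports, so the $\ell_1$-norm splits as $\|\beta\|_1 = \|\beta_S\|_1 + \|\beta_{-S}\|_1$.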

  4. Basis pursuit solution: $\beta^* := \arg\min\{\|\beta\|_1 : X\beta = f^0\}$. Let $S_0 := \{j : \beta_j^0 \neq 0\}$ be the active set of $\beta^0$. Loose definition: the vector $\beta^0$ is called sparse if $S_0$ is small. Theorem: suppose $X$ has the null-space property at $S_0$. Then we have exact recovery: $\beta^* = \beta^0$.

  5. Proof. Suppose $\beta^* \neq \beta^0$. Since $X\beta^* = X\beta^0 = f^0$ we have $\beta^* - \beta^0 \in \mathrm{null}(X)$. Because $\beta^0_{-S_0} = 0$, the null-space property applied to $\beta^* - \beta^0$ gives $\|\beta^*_{-S_0}\|_1 > \|\beta^*_{S_0} - \beta^0\|_1$. Since $\beta^*$ minimizes $\|\cdot\|_1$ we have $\|\beta^*\|_1 \le \|\beta^0\|_1$. We can decompose the $\ell_1$-norm as $\|\beta^*\|_1 = \|\beta^*_{S_0}\|_1 + \|\beta^*_{-S_0}\|_1$. Hence $\|\beta^*_{S_0}\|_1 + \|\beta^*_{-S_0}\|_1 \le \|\beta^0\|_1$. But by the triangle inequality $\|\beta^0\|_1 \le \|\beta^*_{S_0}\|_1 + \|\beta^*_{S_0} - \beta^0\|_1$, so $\|\beta^*_{-S_0}\|_1 \le \|\beta^*_{S_0} - \beta^0\|_1$. Thus we arrive at a contradiction. $\square$

  6. Definition [vdG (2007)]: the compatibility constant for the set $S$ and the stretching constant $L > 0$ is $\hat\phi^2(L, S) = \min\left\{ \frac{|S|}{n} \|X\beta_S - X\beta_{-S}\|_2^2 : \|\beta_{-S}\|_1 \le L, \ \|\beta_S\|_1 = 1 \right\}$. We have: $X$ satisfies the null-space property at $S$ $\Leftrightarrow$ $\hat\phi(1, S) > 0$.

  7. [Figure: the compatibility constant $\hat\phi(1, S)$ for the case $S = \{1\}$, drawn with the column $X_1$ and the columns $X_2, \ldots, X_p$.]

  8. Regularized formulation: $\beta_\lambda := \arg\min_\beta \left\{ \|X\beta - f^0\|_2^2 / n + 2\lambda \|\beta\|_1 \right\}$. Lemma: we have $\|X(\beta_\lambda - \beta^0)\|_2^2 / n \le \dfrac{\lambda^2 |S_0|}{\hat\phi^2(1, S_0)}$.

  9. Adding noise. Let $Y = f^0 + \epsilon$ with $\epsilon$ unobservable noise. Let $\beta^0$ be a solution of $f^0 = X\beta^0$. Definition: the Lasso is $\hat\beta := \hat\beta_\lambda := \arg\min_\beta \left\{ \|Y - X\beta\|_2^2 / n + 2\lambda \|\beta\|_1 \right\}$.
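
The Lasso objective above, with its $1/n$ scaling and penalty $2\lambda\|\beta\|_1$, can be minimized by proximal gradient descent (ISTA). The sketch below is one possible solver, not the method used in the talk: the function names, the simulated data, the iteration count, and the choice of `lam` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal map of t * ||.||_1, applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_objective(X, Y, beta, lam):
    n = X.shape[0]
    return np.sum((Y - X @ beta) ** 2) / n + 2.0 * lam * np.sum(np.abs(beta))


def lasso_ista(X, Y, lam, n_iter=2000):
    """Minimize ||Y - X beta||_2^2 / n + 2 lam ||beta||_1 by ISTA."""
    n, p = X.shape
    # Lipschitz constant of the gradient of the quadratic part.
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (Y - X @ beta) / n
        beta = soft_threshold(beta - step * grad, 2.0 * lam * step)
    return beta


# Hypothetical data: normalized Gaussian design, 5 active coefficients.
rng = np.random.default_rng(1)
n, p, s, sigma = 50, 100, 5, 0.5
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)  # diag(X^T X) / n = I
beta0 = np.zeros(p)
beta0[:s] = 2.0
Y = X @ beta0 + sigma * rng.standard_normal(n)

lam = 2.0 * sigma * np.sqrt(2.0 * np.log(2.0 * p) / n)  # illustrative choice
beta_hat = lasso_ista(X, Y, lam)
print("objective at 0:       ", lasso_objective(X, Y, np.zeros(p), lam))
print("objective at beta_hat:", lasso_objective(X, Y, beta_hat, lam))
print("prediction error:     ", np.sum((X @ (beta_hat - beta0)) ** 2) / n)
```

With step size $1/\text{lipschitz}$ the objective never increases from one iteration to the next, so the value at `beta_hat` cannot exceed the value at the starting point $0$.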

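  10. Theorem (prediction error of the Lasso): let $\lambda_\epsilon \ge \|X^T \epsilon\|_\infty / n$. Take $\lambda > \lambda_\epsilon$. Then for $\underline{\lambda} := \lambda - \lambda_\epsilon$, $\bar\lambda := \lambda + \lambda_\epsilon$, $L := \bar\lambda / \underline{\lambda}$ we have $\|X(\hat\beta - \beta^0)\|_2^2 / n \le \dfrac{\bar\lambda^2 |S_0|}{\hat\phi^2(L, S_0)}$.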

  11. Note 1: $\|\cdot\|_\infty$ is the dual norm of $\|\cdot\|_1$. Note 2: suppose $\epsilon \sim \mathcal{N}_n(0, \sigma_0^2 I)$ and $\mathrm{diag}(X^T X)/n = I$. Then $\mathbb{P}\left( \|X^T \epsilon\|_\infty / n \ge \sigma_0 \sqrt{\dfrac{2 \log(2p/\alpha)}{n}} \right) \le \alpha$. Note 3: under compatibility conditions the Lasso thus has prediction error $\|X(\hat\beta - \beta^0)\|_2^2 / n \sim \sigma_0^2 \dfrac{\log p \times |S_0|}{n} = \sigma_0^2 \dfrac{\log p \times \text{number of active parameters}}{\text{number of observations}}$. This is an oracle inequality: the Lasso adapts to the unknown sparsity.
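
The tail bound in Note 2 can be checked by simulation. The sketch below is illustrative only: it assumes a fixed design drawn once from a Gaussian distribution and rescaled so that $\mathrm{diag}(X^T X)/n = I$, and the function name and the values of `n`, `p`, `alpha` and `n_trials` are hypothetical.

```python
import numpy as np


def exceedance_frequency(n, p, sigma0, alpha, n_trials, seed=0):
    """Estimate P(||X^T eps||_inf / n >= sigma0 * sqrt(2 log(2p/alpha) / n))."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Rescale the columns so that diag(X^T X) / n = I, as Note 2 requires.
    X *= np.sqrt(n) / np.linalg.norm(X, axis=0)
    threshold = sigma0 * np.sqrt(2.0 * np.log(2.0 * p / alpha) / n)
    # Each row of eps is one independent noise vector of length n.
    eps = sigma0 * rng.standard_normal((n_trials, n))
    sup_norms = np.max(np.abs(eps @ X), axis=1) / n
    return np.mean(sup_norms >= threshold), threshold


freq, thr = exceedance_frequency(n=100, p=200, sigma0=1.0, alpha=0.05, n_trials=2000)
print("threshold:", thr)
print("empirical exceedance frequency:", freq, "(bound: 0.05)")
```

The bound comes from a union bound over the $p$ columns, so the empirical frequency is typically well below $\alpha$.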


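  14. What if $\beta^0$ is only approximately sparse? Theorem (trade-off approximation error and sparsity): let $\lambda_\epsilon \ge \|X^T \epsilon\|_\infty / n$. Take $\lambda > \lambda_\epsilon$. Then for $\underline{\lambda} := \lambda - \lambda_\epsilon$, $\bar\lambda := \lambda + \lambda_\epsilon$, $L := \bar\lambda / \underline{\lambda}$ we have for all $\beta$ and $S$: $\|X(\hat\beta - \beta^0)\|_2^2 / n \le \|X(\beta - \beta^0)\|_2^2 / n + 4\lambda \|\beta_{-S}\|_1 + \dfrac{\bar\lambda^2 |S|}{\hat\phi^2(L, S)}$. The slide labels the leading part of the bound the approximation error and the last term the "effective sparsity".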

  15. Corollary: let $S \subset \{1, \ldots, p\}$ be arbitrary. Let $f_S$ be the projection of $f^0$ on the space spanned by $\{X_j\}_{j \in S}$. Then $\|X(\hat\beta - \beta^0)\|_2^2 / n \le \|f_S - f^0\|_2^2 / n + \dfrac{\bar\lambda^2 |S|}{\hat\phi^2(L, S)}$. So $\|X(\hat\beta - \beta^0)\|_2^2 / n \le \min_S \left\{ \|f_S - f^0\|_2^2 / n + \dfrac{\bar\lambda^2 |S|}{\hat\phi^2(L, S)} \right\}$.

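  16. What about the $\ell_1$-estimation error? Theorem (including the $\ell_1$-error): let $\lambda_\epsilon \ge \|X^T \epsilon\|_\infty / n$. Take $\lambda > \lambda_\epsilon$. Then for $\underline{\lambda} := \lambda - \lambda_\epsilon$, $\bar\lambda := \lambda + \lambda_\epsilon + \delta\underline{\lambda}$, $L := \dfrac{\bar\lambda}{(1 - \delta)\underline{\lambda}}$ we have for all $\beta$ and $S$: $2\delta\underline{\lambda} \|\hat\beta - \beta\|_1 + \|X(\hat\beta - \beta^0)\|_2^2 / n \le \|X(\beta - \beta^0)\|_2^2 / n + \dfrac{\bar\lambda^2 |S|}{\hat\phi^2(L, S)} + 4\lambda \|\beta_{-S}\|_1$.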

  17. Corollary (weak sparsity): let $\rho_r^r := \sum_{j=1}^p |\beta_j^0|^r$, $0 < r < 1$, and $S_* := \{j : |\beta_j^0| > 3\lambda_\epsilon\}$. We have (with $\delta = 1/5$, $\lambda = 2\lambda_\epsilon$) $\|\hat\beta - \beta^0\|_1 \le 2^8 \dfrac{\lambda_\epsilon^{1-r} \rho_r^r}{\hat\phi^2(4, S_*)}$. Asymptopia: suppose $1/\hat\phi^2(4, S_*) = O(1)$. Let $\lambda_\epsilon \asymp \sqrt{\log p / n}$. When $\rho_r^r = o\left((n / \log p)^{\frac{1-r}{2}}\right)$ we have $\|\hat\beta - \beta^0\|_1 = o_{\mathbb{P}}(1)$.
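
The stretching constant $4$ in $\hat\phi^2(4, S_*)$ can be traced back to the choices $\delta = 1/5$ and $\lambda = 2\lambda_\epsilon$. The short calculation below assumes the definitions of $\underline{\lambda}$, $\bar\lambda$ and $L$ from the theorem that includes the $\ell_1$-error, read here as $\bar\lambda = \lambda + \lambda_\epsilon + \delta\underline{\lambda}$ and $L = \bar\lambda / ((1-\delta)\underline{\lambda})$.

```latex
\begin{align*}
\underline{\lambda} &= \lambda - \lambda_\epsilon = 2\lambda_\epsilon - \lambda_\epsilon = \lambda_\epsilon, \\
\bar\lambda &= \lambda + \lambda_\epsilon + \delta\underline{\lambda}
  = 2\lambda_\epsilon + \lambda_\epsilon + \tfrac{1}{5}\lambda_\epsilon
  = \tfrac{16}{5}\lambda_\epsilon, \\
L &= \frac{\bar\lambda}{(1 - \delta)\underline{\lambda}}
  = \frac{\tfrac{16}{5}\lambda_\epsilon}{\tfrac{4}{5}\lambda_\epsilon} = 4 .
\end{align*}
```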

  18. Question: what is so special about the $\ell_1$-norm? Why does it lead to exact recovery and oracle inequalities? Answer: its decomposability, $\|\beta\|_1 = \|\beta_S\|_1 + \|\beta_{-S}\|_1$.


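  21. Definition: the sub-differential of $\beta \mapsto \|\beta\|_1$ is $\partial\|\beta\|_1 = \{z : \|z\|_\infty = 1, \ z^T \beta = \|\beta\|_1\}$ (at $\beta = 0$ the first condition relaxes to $\|z\|_\infty \le 1$). [Figure: the function $|\beta|$ with slopes $-1$ and $+1$; subdifferential calculus.]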

  22. We actually invoke decomposability as the triangle property: $\max_{z \in \partial\|\beta^0\|_1} z^T \beta \ge \|\beta_{-S_0}\|_1 - \|\beta_{S_0}\|_1$.
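
A short derivation shows why the triangle property holds for the $\ell_1$-norm. It uses only the fact that every $z \in \partial\|\beta^0\|_1$ has $z_j = \mathrm{sign}(\beta_j^0)$ for $j \in S_0$ and $|z_j| \le 1$ for $j \notin S_0$, so the coordinates outside $S_0$ can be chosen freely to maximize $z^T \beta$.

```latex
\begin{align*}
\max_{z \in \partial\|\beta^0\|_1} z^T \beta
  &= \sum_{j \in S_0} \mathrm{sign}(\beta_j^0)\, \beta_j
     + \max_{|z_j| \le 1,\ j \notin S_0} \sum_{j \notin S_0} z_j \beta_j \\
  &= \sum_{j \in S_0} \mathrm{sign}(\beta_j^0)\, \beta_j + \|\beta_{-S_0}\|_1 \\
  &\ge \|\beta_{-S_0}\|_1 - \|\beta_{S_0}\|_1 .
\end{align*}
```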

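  23. Other norms. Let $\Omega$ be a norm on $\mathbb{R}^p$. Definition: the dual norm of $\Omega$ is $\Omega_*(z) := \max_{\Omega(\beta) \le 1} z^T \beta$, $z \in \mathbb{R}^p$. Definition: the sub-differential of $\beta \mapsto \Omega(\beta)$ is $\partial\Omega(\beta) := \{z : \Omega_*(z) = 1, \ z^T \beta = \Omega(\beta)\}$ (at $\beta = 0$ the first condition relaxes to $\Omega_*(z) \le 1$).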
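
For $\Omega = \|\cdot\|_1$ both definitions can be checked numerically. The sketch below is an illustration with hypothetical helper names: it evaluates the dual norm by maximizing $z^T \beta$ over the vertices $\pm e_j$ of the $\ell_1$ unit ball (a linear function on a polytope attains its maximum at a vertex) and verifies that $z = \mathrm{sign}(\beta)$ meets both conditions of the sub-differential for a nonzero $\beta$.

```python
import numpy as np


def dual_of_l1(z):
    """max of z^T beta over ||beta||_1 <= 1, evaluated at the vertices +/- e_j."""
    z = np.asarray(z, dtype=float)
    p = z.shape[0]
    vertices = np.vstack([np.eye(p), -np.eye(p)])
    return np.max(vertices @ z)


z = np.array([0.3, -2.0, 1.0])
print(dual_of_l1(z), np.max(np.abs(z)))  # both equal the sup-norm of z

beta = np.array([1.5, -2.0, 0.0])
z_sub = np.sign(beta)  # one element of the sub-differential at beta
print(dual_of_l1(z_sub), z_sub @ beta, np.sum(np.abs(beta)))
```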


  25. Definition: we say that $\Omega$ is weakly decomposable at $\beta^0$ if there exist semi-norms $\Omega^+$ and $\Omega^-$ (depending on $\beta^0$) with $\Omega^-(\beta^0) = 0$ such that for all $\beta$: $\Omega(\beta) \ge \Omega^+(\beta) + \Omega^-(\beta)$. Definition: we say that $\Omega$ satisfies the triangle property at $\beta^0$ if there exist semi-norms $\Omega^+$ and $\Omega^-$ (depending on $\beta^0$) such that for all $\beta$: $\max_{z \in \partial\Omega(\beta^0)} z^T(\beta - \beta^0) \ge \Omega^-(\beta) - \Omega^+(\beta - \beta^0)$.
