The smoothed multivariate square-root Lasso


  1. The smoothed multivariate square-root Lasso: an optimization lens on concomitant estimation Joseph Salmon http://josephsalmon.eu IMAG, Univ. Montpellier, CNRS Series of works with: Quentin Bertrand (INRIA) Mathurin Massias (University of Genova) Olivier Fercoq (Institut Polytechnique de Paris) Alexandre Gramfort (INRIA) 1 / 40

  2. Table of Contents Neuroimaging The M/EEG problem Statistical model Estimation procedures Sparsity and Multi-task approaches Smoothing interpretation of concomitant and √Lasso Optimization algorithm 2 / 40

  3. The M/EEG inverse problem ◮ observe electric and magnetic fields outside the scalp (100 sensors) ◮ reconstruct cerebral activity inside the brain (10,000 locations) p = 10,000 locations, n = 100 sensors, n ≪ p : ill-posed problem ◮ Motivation : identify brain regions responsible for the signals ◮ Applications : epilepsy treatment, brain aging, anesthesia risks 3 / 40

  4. M/EEG inverse problem for brain imaging ◮ sensors: electric and magnetic fields during a cognitive task 4 / 40

  5. MEG elements: magnetometers and gradiometers Sensors Detail of a sensor Device 5 / 40

  6. M/EEG = MEG + EEG Photo Credit: Stephen Whitmarsh 6 / 40

  7. Table of Contents Neuroimaging The M/EEG problem Statistical model Estimation procedures Sparsity and Multi-task approaches Smoothing interpretation of concomitant and √Lasso Optimization algorithm 7 / 40

  8. Source modeling SPACE TIME Position a few thousand candidate sources over the brain (e.g., every 5 mm) 8 / 40

  9. Design matrix - Forward operator 9 / 40

  10. Mathematical model: linear regression 10 / 40

  11. Experiments repeated r times [figure: stimuli presented to the stimulated patient; M/EEG signals observed for each repetition] 11 / 40

  12. M/EEG specificity # 1: combined measurements Sensor detail Device Sensors Structure of Y and X : 12 / 40

  13. Sensor types & noise structure [figure: evoked responses (N_ave = 55) over 0–175 ms and noise covariances for EEG (59 channels, µV), gradiometers (203 channels, fT/cm) and magnetometers (102 channels, fT)] 13 / 40

  14. M/EEG specificity # 2: averaging repetitions of experiment [figure: stimuli presented to the stimulated patient; M/EEG signals observed for each repetition] 14 / 40

  15. M/EEG specificity # 2: averaging repetitions of experiment [figure: stimuli presented to the stimulated patient; M/EEG signals observed for each repetition, plus the averaged signal] 14 / 40

  16. M/EEG specificity # 2: averaged signals [figure: averaged signals, EEG only] Limit on the repetitions: subject/patient fatigue 15 / 40

  17. A multi-task framework Multi-task regression notation: ◮ n observations (number of sensors) ◮ T tasks (temporal information) ◮ p features (spatial description) ◮ r number of repetitions for the experiment ◮ Y^(1), . . . , Y^(r) ∈ R^{n×T} observation matrices; Ȳ = (1/r) Σ_{l=1}^r Y^(l) ◮ X ∈ R^{n×p} forward matrix Model: Y^(l) = X B* + S* E^(l), where ◮ B* ∈ R^{p×T} : true source activity matrix (unknown) ◮ S* ∈ S^n_{++} : co-standard deviation matrix (1) (unknown) ◮ E^(1), . . . , E^(r) ∈ R^{n×T} : white noise (standard Gaussian) (1) S ⪰ σ means S − σ Id_n is positive semi-definite 16 / 40
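To make the notation concrete, here is a minimal NumPy sketch of this generative model; the dimensions, the row-sparsity of B*, and the particular choice of S* below are illustrative assumptions, not taken from the talk.

```python
# A minimal sketch of the multi-task generative model above; dimensions, the
# row-sparsity of B*, and the choice of S* are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, T, r = 20, 50, 10, 5                 # sensors, sources, time points, repetitions

X = rng.standard_normal((n, p))            # forward matrix X in R^{n x p}
B_star = np.zeros((p, T))                  # true source activity, row-sparse
B_star[:3] = rng.standard_normal((3, T))   # only 3 active sources (assumption)

# S*: a symmetric positive-definite "co-standard deviation" matrix.
A = rng.standard_normal((n, n))
S_star = A @ A.T / n + 0.1 * np.eye(n)

# r repetitions: Y^(l) = X B* + S* E^(l), with E^(l) standard Gaussian noise.
Y_list = [X @ B_star + S_star @ rng.standard_normal((n, T)) for _ in range(r)]
Y_bar = sum(Y_list) / r                    # averaged observation Y-bar
```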

  18. Table of Contents Neuroimaging The M/EEG problem Statistical model Estimation procedures Sparsity and Multi-task approaches Smoothing interpretation of concomitant and √Lasso Optimization algorithm 17 / 40

  19. Sparsity everywhere Signals can often be represented by combining a few atoms/features: ◮ Fourier decomposition for sounds (2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992. (3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” In: Vision research (1997). 18 / 40

  20. Sparsity everywhere Signals can often be represented by combining a few atoms/features: ◮ Fourier decomposition for sounds ◮ Wavelets for images (1990s) (2) (2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992. (3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” In: Vision research (1997). 18 / 40

  21. Sparsity everywhere Signals can often be represented by combining a few atoms/features: ◮ Fourier decomposition for sounds ◮ Wavelets for images (1990s) (2) ◮ Dictionary learning for images (2000s) (3) (2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992. (3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” In: Vision research (1997). 18 / 40

  22. Sparsity everywhere Signals can often be represented by combining a few atoms/features: ◮ Fourier decomposition for sounds ◮ Wavelets for images (1990s) (2) ◮ Dictionary learning for images (2000s) (3) ◮ Neuroimaging: measurements assumed to be explained by a few active brain sources (2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992. (3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” In: Vision research (1997). 18 / 40

  23. Justification for dipolarity assumption Sparsity holds: dipolar patterns equivalent to focal sources ◮ short duration ◮ simple cognitive task ◮ repetitions of experiment average out other sources ◮ ICA recovers dipolar patterns, (4) well modeled by focal sources: (4) A. Delorme et al. “Independent EEG sources are dipolar”. In: PLoS ONE 7.2 (2012), e30135. 19 / 40

  24. (Structured) Sparsity inducing penalties (5) B̂ ∈ arg min_{B ∈ R^{p×T}} (1/(2nT)) ‖Y − XB‖²_F + λ ‖B‖_1 Sparse support: no structure ✗ Lasso penalty: ‖B‖_1 = Σ_{j=1}^p Σ_{t=1}^T |B_jt| [figure: estimated support over sources × time] (5) G. Obozinski, B. Taskar, and M. I. Jordan. “Joint covariate selection and joint subspace selection for multiple classification problems”. In: Statistics and Computing 20.2 (2010), pp. 231–252. 20 / 40
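For illustration, the entrywise ℓ1 penalty above and its proximal operator (soft-thresholding) can be written in a few lines of NumPy; this is a generic sketch of the penalty, not the solver used in the talk.

```python
# Entrywise l1 penalty and its proximal operator (soft-thresholding), the basic
# building block of proximal/coordinate-descent solvers for this objective.
# Generic illustration only, not the solver used in the talk.
import numpy as np

def l1_penalty(B):
    """||B||_1 = sum_j sum_t |B_jt| (no structure across time)."""
    return np.abs(B).sum()

def soft_threshold(B, tau):
    """Prox of tau * ||.||_1, applied entrywise: shrinks each B_jt towards 0."""
    return np.sign(B) * np.maximum(np.abs(B) - tau, 0.0)
```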

  25. (Structured) Sparsity inducing penalties (5) B̂ ∈ arg min_{B ∈ R^{p×T}} (1/(2nT)) ‖Y − XB‖²_F + λ ‖B‖_{2,1} Sparse support: group structure ✓ Group-Lasso penalty: ‖B‖_{2,1} = Σ_{j=1}^p ‖B_{j:}‖_2, with B_{j:} the j-th row of B [figure: estimated support over sources × time] (5) G. Obozinski, B. Taskar, and M. I. Jordan. “Joint covariate selection and joint subspace selection for multiple classification problems”. In: Statistics and Computing 20.2 (2010), pp. 231–252. 20 / 40
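Similarly, here is a sketch of the ℓ2,1 penalty and its row-wise prox (block soft-thresholding), which zeroes entire rows of B and hence switches off entire sources across time; again a generic illustration, not the implementation behind the slides.

```python
# Row-wise l2,1 penalty and its prox (block soft-thresholding): whole rows of B
# are shrunk, and possibly set exactly to 0, so whole sources are switched off.
# Generic sketch, not the implementation behind the slides.
import numpy as np

def l21_penalty(B):
    """||B||_{2,1} = sum_j ||B_{j:}||_2 (one l2 norm per source/row)."""
    return np.linalg.norm(B, axis=1).sum()

def block_soft_threshold(B, tau):
    """Prox of tau * ||.||_{2,1}: scale each row by max(1 - tau / ||row||_2, 0)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return B * scale
```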

  26. Data-fitting term and experiment repetitions ◮ Classical estimator: use averaged (6) signal Ȳ: B̂ ∈ arg min_{B ∈ R^{p×T}} (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1} ◮ How to take advantage of the number of repetitions? Intuitive estimator: B̂^repet ∈ arg min_{B ∈ R^{p×T}} (1/(2nTr)) Σ_{l=1}^r ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1} (6) & whitened, say using baseline data 21 / 40

  27. Data-fitting term and experiment repetitions ◮ Classical estimator: use averaged (6) signal Ȳ: B̂ ∈ arg min_{B ∈ R^{p×T}} (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1} ◮ How to take advantage of the number of repetitions? Intuitive estimator: B̂^repet ∈ arg min_{B ∈ R^{p×T}} (1/(2nTr)) Σ_{l=1}^r ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1} ◮ Fail: B̂^repet = B̂ (because of the datafit ‖·‖²_F) (6) & whitened, say using baseline data 21 / 40

  28. Data-fitting term and experiment repetitions ◮ Classical estimator: use averaged (6) signal Ȳ: B̂ ∈ arg min_{B ∈ R^{p×T}} (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1} ◮ How to take advantage of the number of repetitions? Intuitive estimator: B̂^repet ∈ arg min_{B ∈ R^{p×T}} (1/(2nTr)) Σ_{l=1}^r ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1} ◮ Fail: B̂^repet = B̂ (because of the datafit ‖·‖²_F) → investigate other datafits (6) & whitened, say using baseline data 21 / 40
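One way to see why the two estimators coincide: expanding the squares gives (1/r) Σ_l ‖Y^(l) − XB‖²_F = ‖Ȳ − XB‖²_F + (1/r) Σ_l ‖Y^(l) − Ȳ‖²_F, and the second term does not depend on B, so the two penalized objectives differ by a constant and share the same arg min. A small self-contained numerical check of this identity (all sizes illustrative):

```python
# Numerical check: the repetition datafit equals the averaged-signal datafit
# plus a constant independent of B, so adding the same penalty to both
# objectives yields the same arg min.  All sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p, T, r = 15, 30, 8, 4
X = rng.standard_normal((n, p))
Y_list = [rng.standard_normal((n, T)) for _ in range(r)]   # any observations
Y_bar = sum(Y_list) / r
B = rng.standard_normal((p, T))                            # any candidate B

lhs = sum(np.linalg.norm(Y - X @ B, "fro") ** 2 for Y in Y_list) / r
rhs = (np.linalg.norm(Y_bar - X @ B, "fro") ** 2
       + sum(np.linalg.norm(Y - Y_bar, "fro") ** 2 for Y in Y_list) / r)
print(np.isclose(lhs, rhs))                                # True for any B
```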

  29. Table of Contents Neuroimaging The M/EEG problem Statistical model Estimation procedures Sparsity and Multi-task approaches Smoothing interpretation of concomitant and √Lasso Optimization algorithm 22 / 40

  30. Lasso (7) , (8) : the “modern least-squares” (9) β̂ ∈ arg min_{β ∈ R^p} (1/(2n)) ‖y − Xβ‖² + λ ‖β‖_1 ◮ y ∈ R^n : observations ◮ X ∈ R^{n×p} : design matrix ◮ sparsity : for λ large enough, ‖β̂‖_0 ≪ p (7) R. Tibshirani. “Regression Shrinkage and Selection via the Lasso”. In: J. R. Stat. Soc. Ser. B Stat. Methodol. 58.1 (1996), pp. 267–288. (8) S. S. Chen and D. L. Donoho. “Atomic decomposition by basis pursuit”. In: SPIE. 1995. (9) E. J. Candès, M. B. Wakin, and S. P. Boyd. “Enhancing Sparsity by Reweighted ℓ1 Minimization”. In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877–905. 23 / 40
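As a quick sanity check of the sparsity property, scikit-learn's Lasso can be run on synthetic data; its `alpha` plays the role of λ and its objective uses the same (1/(2n)) scaling as above. The data and the value of alpha below are illustrative assumptions.

```python
# Quick illustration of the sparsity property with scikit-learn's Lasso, whose
# objective (1/(2*n_samples)) ||y - X beta||^2 + alpha * ||beta||_1 matches the
# scaling above.  The synthetic data and alpha value are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                                  # only 5 active features
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)                   # larger alpha => sparser estimate
print((lasso.coef_ != 0).sum(), "nonzero coefficients out of", p)
```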
