SLIDE 1

The smoothed multivariate square-root Lasso:
an optimization lens on concomitant estimation

Joseph Salmon, http://josephsalmon.eu, IMAG, Univ. Montpellier, CNRS

Series of works with: Quentin Bertrand (INRIA), Mathurin Massias (University of Genova), Olivier Fercoq (Institut Polytechnique de Paris), Alexandre Gramfort (INRIA)

1 / 40

SLIDE 2

Table of Contents

◮ Neuroimaging
    The M/EEG problem
    Statistical model
    Estimation procedures
◮ Sparsity and Multi-task approaches
◮ Smoothing interpretation of concomitant and √Lasso
◮ Optimization algorithm

2 / 40

SLIDE 3

The M/EEG inverse problem

◮ observe the electric and magnetic fields outside the scalp (≈ 100 sensors)
◮ reconstruct the cerebral activity inside the brain (≈ 10,000 locations)

n = 100 sensors, p = 10,000 locations

n ≪ p: ill-posed problem

◮ Motivation: identify the brain regions responsible for the signals
◮ Applications: epilepsy treatment, brain aging, anesthesia risks

3 / 40

SLIDE 4

M/EEG inverse problem for brain imaging

◮ sensors: electric and magnetic fields during a cognitive task

4 / 40

SLIDE 5

MEG elements: magnetometers and gradiometers

(Figure: device, sensor array, detail of a single sensor)

5 / 40

SLIDE 6

M/EEG = MEG + EEG

Photo Credit: Stephen Whitmarsh

6 / 40

SLIDE 7

Table of Contents

◮ Neuroimaging
    The M/EEG problem
    Statistical model
    Estimation procedures
◮ Sparsity and Multi-task approaches
◮ Smoothing interpretation of concomitant and √Lasso
◮ Optimization algorithm

7 / 40

SLIDE 8

Source modeling

(Figure: candidate sources over the brain, in time and space)

Position a few thousand candidate sources over the brain (e.g., every 5 mm)

8 / 40

SLIDE 9

Design matrix - Forward operator

9 / 40

SLIDE 10

Mathematical model: linear regression

10 / 40

SLIDE 11

Experiments repeated r times

(Figure: stimuli, repetitions, stimulated patient, observed M/EEG signals)

11 / 40

SLIDE 12

M/EEG specificity #1: combined measurements

(Figure: device, sensors, sensor detail)

Structure of Y and X:

12 / 40

SLIDE 13

Sensor types & noise structure

(Figure: evoked responses (Nave = 55) and the corresponding noise covariance matrix for each sensor type)

◮ EEG (59 channels) and its covariance
◮ Gradiometers (203 channels) and their covariance
◮ Magnetometers (102 channels) and their covariance

13 / 40

SLIDE 14

M/EEG specificity #2: averaging repetitions of experiment

(Figure: stimuli, repetitions, stimulated patient, observed M/EEG signals)

14 / 40

SLIDE 15

M/EEG specificity #2: averaging repetitions of experiment

(Figure: stimuli, repetitions, stimulated patient, observed M/EEG signals, averaged signal)

14 / 40

SLIDE 16

M/EEG specificity #2: averaged signals

(Figure: averaged signals, EEG only)

Limit on the number of repetitions: subject/patient fatigue

15 / 40

SLIDE 17

A multi-task framework

Multi-task regression notation:
◮ n observations (number of sensors)
◮ T tasks (temporal information)
◮ p features (spatial description)
◮ r repetitions of the experiment
◮ Y^(1), ..., Y^(r) ∈ R^{n×T}: observation matrices; Ȳ = (1/r) Σ_l Y^(l)
◮ X ∈ R^{n×p}: forward matrix

Model: Y^(l) = X B* + S* E^(l), where
◮ B* ∈ R^{p×T}: true source activity matrix (unknown)
◮ S* ∈ S^n_{++}: co-standard deviation matrix(1) (unknown)
◮ E^(1), ..., E^(r) ∈ R^{n×T}: white noise (i.i.d. standard Gaussian entries)

(1) S ⪰ σ means that S − σ Id_n is positive semi-definite.

16 / 40
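For concreteness, here is a minimal NumPy simulation of this generative model. It is only an illustrative sketch: the sizes and the names B_true, S_true, Y_bar are introduced here and are not from the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, T, r = 100, 1000, 50, 20            # sensors, sources, time points, repetitions

    X = rng.standard_normal((n, p))           # forward (gain) matrix
    B_true = np.zeros((p, T))                 # row-sparse source activity
    B_true[:5] = rng.standard_normal((5, T))  # only 5 active sources

    A = rng.standard_normal((n, n)) / np.sqrt(n)
    S_true = A @ A.T + 0.1 * np.eye(n)        # a positive-definite co-standard deviation

    # r noisy repetitions of the same experiment, plus their average
    Y = [X @ B_true + S_true @ rng.standard_normal((n, T)) for _ in range(r)]
    Y_bar = sum(Y) / r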

SLIDE 18

Table of Contents

◮ Neuroimaging
    The M/EEG problem
    Statistical model
    Estimation procedures
◮ Sparsity and Multi-task approaches
◮ Smoothing interpretation of concomitant and √Lasso
◮ Optimization algorithm

17 / 40

SLIDE 19

Sparsity everywhere

Signals can often be represented by combining a few atoms/features:
◮ Fourier decomposition for sounds

(2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
(3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. In: Vision Research (1997).

18 / 40

SLIDE 20

Sparsity everywhere

Signals can often be represented by combining a few atoms/features:
◮ Fourier decomposition for sounds
◮ Wavelets for images (1990’s)(2)

(2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
(3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. In: Vision Research (1997).

18 / 40

SLIDE 21

Sparsity everywhere

Signals can often be represented by combining a few atoms/features:
◮ Fourier decomposition for sounds
◮ Wavelets for images (1990’s)(2)
◮ Dictionary learning for images (2000’s)(3)

(2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
(3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. In: Vision Research (1997).

18 / 40

SLIDE 22

Sparsity everywhere

Signals can often be represented by combining a few atoms/features:
◮ Fourier decomposition for sounds
◮ Wavelets for images (1990’s)(2)
◮ Dictionary learning for images (2000’s)(3)
◮ Neuroimaging: measurements assumed to be explained by a few active brain sources

(2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
(3) B. A. Olshausen and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. In: Vision Research (1997).

18 / 40

SLIDE 23

Justification for dipolarity assumption

Sparsity holds: dipolar patterns are equivalent to focal sources
◮ short duration
◮ simple cognitive task
◮ repetitions of the experiment average out other sources
◮ ICA recovers dipolar patterns,(4) well modeled by focal sources:

(4) A. Delorme et al. “Independent EEG sources are dipolar”. In: PloS ONE 7.2 (2012), e30135.

19 / 40

SLIDE 24

(Structured) Sparsity inducing penalties(5)

    B̂ ∈ arg min_{B ∈ R^{p×T}}  (1/(2nT)) ‖Y − XB‖²_F + λ ‖B‖_1

(Figure: sources × time support pattern)

Sparse support: no structure ✗

Lasso penalty: ‖B‖_1 = Σ_{j=1}^{p} Σ_{t=1}^{T} |B_{jt}|

(5) G. Obozinski, B. Taskar, and M. I. Jordan. “Joint covariate selection and joint subspace selection for multiple classification problems”. In: Statistics and Computing 20.2 (2010), pp. 231–252.

20 / 40

SLIDE 25

(Structured) Sparsity inducing penalties(5)

    B̂ ∈ arg min_{B ∈ R^{p×T}}  (1/(2nT)) ‖Y − XB‖²_F + λ ‖B‖_{2,1}

(Figure: sources × time support pattern)

Sparse support: group (row) structure ✓

Group-Lasso penalty: ‖B‖_{2,1} = Σ_{j=1}^{p} ‖B_{j:}‖_2, with B_{j:} the j-th row of B

(5) G. Obozinski, B. Taskar, and M. I. Jordan. “Joint covariate selection and joint subspace selection for multiple classification problems”. In: Statistics and Computing 20.2 (2010), pp. 231–252.

20 / 40
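A quick illustration of the two penalties on a coefficient matrix. This is a minimal NumPy sketch; lasso_penalty and group_lasso_penalty are names introduced here for the example.

    import numpy as np

    def lasso_penalty(B):
        """Entrywise l1 norm: sum_j sum_t |B_jt| (no structure)."""
        return np.abs(B).sum()

    def group_lasso_penalty(B):
        """l2,1 norm: sum over rows j of ||B_j:||_2 (row-sparse structure)."""
        return np.linalg.norm(B, axis=1).sum()

    B = np.zeros((6, 4))
    B[1] = [1.0, -2.0, 0.5, 0.0]   # a single active row (source)
    print(lasso_penalty(B), group_lasso_penalty(B))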

SLIDE 26

Data-fitting term and experiment repetitions

◮ Classical estimator: use the averaged(6) signal Ȳ

    B̂ ∈ arg min_{B ∈ R^{p×T}}  (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1}

◮ How to take advantage of the number of repetitions?

Intuitive estimator:

    B̂_repet ∈ arg min_{B ∈ R^{p×T}}  (1/(2nTr)) Σ_{l=1}^{r} ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1}

(6) & whitened, say using baseline data

21 / 40

SLIDE 27

Data-fitting term and experiment repetitions

◮ Classical estimator: use the averaged(6) signal Ȳ

    B̂ ∈ arg min_{B ∈ R^{p×T}}  (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1}

◮ How to take advantage of the number of repetitions?

Intuitive estimator:

    B̂_repet ∈ arg min_{B ∈ R^{p×T}}  (1/(2nTr)) Σ_{l=1}^{r} ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1}

◮ Fail: B̂_repet = B̂ (because of the squared Frobenius datafit ‖·‖²_F)

(6) & whitened, say using baseline data

21 / 40

SLIDE 28

Data-fitting term and experiment repetitions

◮ Classical estimator: use the averaged(6) signal Ȳ

    B̂ ∈ arg min_{B ∈ R^{p×T}}  (1/(2nT)) ‖Ȳ − XB‖²_F + λ ‖B‖_{2,1}

◮ How to take advantage of the number of repetitions?

Intuitive estimator:

    B̂_repet ∈ arg min_{B ∈ R^{p×T}}  (1/(2nTr)) Σ_{l=1}^{r} ‖Y^(l) − XB‖²_F + λ ‖B‖_{2,1}

◮ Fail: B̂_repet = B̂ (because of the squared Frobenius datafit ‖·‖²_F)

↪ investigate other datafits

(6) & whitened, say using baseline data

21 / 40
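A short check of why the two estimators coincide, expanding the squared Frobenius datafit around the average Ȳ:

    \frac{1}{2nTr}\sum_{l=1}^{r}\|Y^{(l)} - XB\|_F^2
      = \frac{1}{2nT}\|\bar{Y} - XB\|_F^2
        + \underbrace{\frac{1}{2nTr}\sum_{l=1}^{r}\|Y^{(l)} - \bar{Y}\|_F^2}_{\text{constant in } B}

since the cross terms \sum_l \langle Y^{(l)} - \bar{Y},\ \bar{Y} - XB \rangle vanish by definition of \bar{Y}. The two objectives thus differ by a constant, so they have the same minimizers.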

SLIDE 29

Table of Contents

◮ Neuroimaging
    The M/EEG problem
    Statistical model
    Estimation procedures
◮ Sparsity and Multi-task approaches
◮ Smoothing interpretation of concomitant and √Lasso
◮ Optimization algorithm

22 / 40

SLIDE 30

Lasso(7),(8): the “modern least-squares”(9)

    β̂ ∈ arg min_{β ∈ R^p}  (1/(2n)) ‖y − Xβ‖²₂ + λ ‖β‖_1

◮ y ∈ R^n: observations
◮ X ∈ R^{n×p}: design matrix
◮ sparsity: for λ large enough, ‖β̂‖_0 ≪ p

(7) R. Tibshirani. “Regression Shrinkage and Selection via the Lasso”. In: J. R. Stat. Soc. Ser. B Stat. Methodol. 58.1 (1996), pp. 267–288.
(8) S. S. Chen and D. L. Donoho. “Atomic decomposition by basis pursuit”. In: SPIE. 1995.
(9) E. J. Candès, M. B. Wakin, and S. P. Boyd. “Enhancing Sparsity by Reweighted l1 Minimization”. In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877–905.

23 / 40
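As a quick illustration, this objective matches the Lasso of scikit-learn, whose documented objective is (1/(2n))‖y − Xw‖²₂ + α‖w‖₁, so α plays the role of λ. A minimal sketch on synthetic data, assuming scikit-learn is installed (the data are made up here):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 1.0
    y = X @ beta_true + 0.5 * rng.standard_normal(n)

    # scikit-learn minimizes (1/(2n)) ||y - X w||_2^2 + alpha * ||w||_1
    lasso = Lasso(alpha=0.1).fit(X, y)
    print((lasso.coef_ != 0).sum(), "nonzero coefficients out of", p)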

SLIDE 31

Lasso and optimal λ(10),(11)

Theorem
For y = Xβ* + σ*ε, with ε ~ N(0, Id_n) and X satisfying the “Restricted Eigenvalue” property, if

    λ = 2σ* √( 2 log(p/δ) / n ),

then

    (1/n) ‖Xβ* − Xβ̂‖² ≤ (18 / κ²_{s*}) · (σ*² s* / n) · log(p/δ)

with probability 1 − δ, where β̂ is a Lasso solution.

Rem: optimal rate in the minimax sense (up to constant/log terms), BUT σ* is unknown in practice!

(10) P. J. Bickel, Y. Ritov, and A. B. Tsybakov. “Simultaneous analysis of Lasso and Dantzig selector”. In: Ann. Statist. 37.4 (2009), pp. 1705–1732.
(11) A. S. Dalalyan, M. Hebiri, and J. Lederer. “On the Prediction Performance of the Lasso”. In: Bernoulli 23.1 (2017), pp. 552–581.

24 / 40

SLIDE 32

Other datafit: the √ Lasso(12)

    β̂_Lasso ∈ arg min_{β ∈ R^p}  (1/(2n)) ‖y − Xβ‖²₂ + λ ‖β‖_1

Optimal λ ∝ σ*

Confirmed in practice (Figure: Lasso)

(12) A. Belloni, V. Chernozhukov, and L. Wang. “Square-root Lasso: pivotal recovery of sparse signals via conic programming”. In: Biometrika 98.4 (2011), pp. 791–806.

25 / 40

SLIDE 33

Other datafit: the √ Lasso(12)

    β̂_√Lasso ∈ arg min_{β ∈ R^p}  (1/√n) ‖y − Xβ‖₂ + λ ‖β‖_1

Optimal λ adaptive to σ*

Confirmed in practice (Figure: Square-root Lasso)

(12) A. Belloni, V. Chernozhukov, and L. Wang. “Square-root Lasso: pivotal recovery of sparse signals via conic programming”. In: Biometrika 98.4 (2011), pp. 791–806.

25 / 40

SLIDE 34

Unhappy optimizer

√Lasso: non-smooth + non-smooth
↪ use the Concomitant Lasso(13):

    (β̂, σ̂) ∈ arg min_{β ∈ R^p, σ > 0}  ‖y − Xβ‖² / (2nσ) + σ/2 + λ ‖β‖_1

Same solutions when ‖y − Xβ̂_√Lasso‖ ≠ 0, but jointly convex,
non-smooth + separable: solvable by alternate minimization(14) in β and σ.

(Figure: graph of f(a, b) = a²/b)

(13) A. B. Owen. “A robust hybrid of lasso and ridge regression”. In: Contemporary Mathematics 443 (2007), pp. 59–72.
(14) T. Sun and C.-H. Zhang. “Scaled sparse linear regression”. In: Biometrika 99.4 (2012), pp. 879–898.

26 / 40

SLIDE 35

Unhappy optimizer

√Lasso: non-smooth + non-smooth
↪ use the Concomitant Lasso(13):

    (β̂, σ̂) ∈ arg min_{β ∈ R^p, σ ≥ σ̲}  ‖y − Xβ‖² / (2nσ) + σ/2 + λ ‖β‖_1

Same solutions when ‖y − Xβ̂_√Lasso‖ ≠ 0, but jointly convex,
smooth + separable: solvable by alternate minimization(14) in β and σ.

(Figure: graph of f(a, b) = a²/b)

(13) A. B. Owen. “A robust hybrid of lasso and ridge regression”. In: Contemporary Mathematics 443 (2007), pp. 59–72.
(14) T. Sun and C.-H. Zhang. “Scaled sparse linear regression”. In: Biometrika 99.4 (2012), pp. 879–898.

26 / 40

SLIDE 36

“Concomitant”: smoothing the √ Lasso(17)

“Huberization”: replace ‖·‖/√n by a smooth approximation:

    huber_σ̲(z) =  ‖z‖² / (2nσ̲) + σ̲/2     if ‖z‖/√n ≤ σ̲
                  ‖z‖/√n                   if ‖z‖/√n > σ̲

               =  min_{σ ≥ σ̲}  ‖z‖² / (2nσ) + σ/2

               =  ( (1/√n)‖·‖  □  ( (1/(2nσ̲)) ‖·‖² + σ̲/2 ) )(z)

Leads to the Smoothed(15),(16) Concomitant Lasso formulation:

    (β̂, σ̂) ∈ arg min_{β ∈ R^p, σ ≥ σ̲}  ‖y − Xβ‖² / (2nσ) + σ/2 + λ ‖β‖_1

(15) A. Beck and M. Teboulle. “Smoothing and first order methods: A unified framework”. In: SIAM J. Optim. 22.2 (2012), pp. 557–580.
(16) Y. Nesterov. “Smooth minimization of non-smooth functions”. In: Math. Program. 103.1 (2005), pp. 127–152.
(17) E. Ndiaye et al. “Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression”. In: Journal of Physics: Conference Series 904.1 (2017), p. 012006.

27 / 40
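A small numerical sketch of this smoothed datafit (illustrative names, not the authors' code). It checks that the piecewise form coincides with the minimization over σ ≥ σ̲, whose minimizer is σ = max(‖z‖/√n, σ̲):

    import numpy as np

    def huber_datafit(z, sigma_min):
        """Piecewise form of huber_{sigma_min}(z) for a residual vector z of length n."""
        n = z.size
        norm = np.linalg.norm(z)
        if norm / np.sqrt(n) <= sigma_min:
            return norm ** 2 / (2 * n * sigma_min) + sigma_min / 2
        return norm / np.sqrt(n)

    def huber_via_min(z, sigma_min):
        """Same quantity, obtained by minimizing over sigma >= sigma_min."""
        n = z.size
        sigma = max(np.linalg.norm(z) / np.sqrt(n), sigma_min)  # closed-form minimizer
        return np.linalg.norm(z) ** 2 / (2 * n * sigma) + sigma / 2

    z = np.random.default_rng(0).standard_normal(100)
    for s in (0.01, 0.5, 5.0):
        assert np.isclose(huber_datafit(z, s), huber_via_min(z, s))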

SLIDE 37

Smoothing aparté(18),(19)

Smoothing: for σ > 0, a “smoothed” version of f is

    f_σ = σ ω(·/σ) □ f,   where (f □ g)(x) = inf_u { f(u) + g(x − u) }

◮ ω is a predefined smooth function (s.t. ∇ω is Lipschitz)

Kernel smoothing analogy:

    Kernel smoothing                     Inf-convolution smoothing
    Fourier transform: F(f)              Fenchel/Legendre transform: f*
    convolution: ⋆                       inf-convolution: □
    F(f ⋆ g) = F(f) · F(g)               (f □ g)* = f* + g*
    Gaussian: F(g) = g                   ω = ‖·‖²/2: ω* = ω
    f_h = (1/h) g(·/h) ⋆ f               f_σ = σ ω(·/σ) □ f

(18) Y. Nesterov. “Smooth minimization of non-smooth functions”. In: Math. Program. 103.1 (2005), pp. 127–152.
(19) A. Beck and M. Teboulle. “Smoothing and first order methods: A unified framework”. In: SIAM J. Optim. 22.2 (2012), pp. 557–580.

28 / 40
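A worked instance of this definition, the case plotted on the next slides, taking f = |·| and ω(t) = t²/2:

    f_\sigma(x) = \inf_{u} \Big\{ |u| + \frac{(x - u)^2}{2\sigma} \Big\}
                = \begin{cases} \dfrac{x^2}{2\sigma} & \text{if } |x| \le \sigma,\\[4pt]
                                |x| - \dfrac{\sigma}{2} & \text{if } |x| > \sigma, \end{cases}

i.e. the Huber function. With ω(t) = t²/2 + 1/2 (the “bis” version below), σ ω(x/σ) = x²/(2σ) + σ/2, so the same computation gives x²/(2σ) + σ/2 for |x| ≤ σ and |x| for |x| > σ, which is exactly the huber_σ̲ datafit used above (up to the 1/√n scaling).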

SLIDE 38

Huber function: ω(t) = t²/2

(Figure: | · |)

29 / 40

SLIDE 39

Huber function: ω(t) = t²/2

(Figure: | · | and f_σ for σ = 2.5)

29 / 40

SLIDE 40

Huber function: ω(t) = t²/2

(Figure: | · | and f_σ for σ = 2.5, 1.0)

29 / 40

SLIDE 41

Huber function: ω(t) = t²/2

(Figure: | · | and f_σ for σ = 2.5, 1.0, 0.2)

29 / 40

SLIDE 42

Huber function (bis): ω(t) = t²/2 + 1/2

(Figure: | · |)

30 / 40

SLIDE 43

Huber function (bis): ω(t) = t²/2 + 1/2

(Figure: | · | and f_σ for σ = 2.5)

30 / 40

SLIDE 44

Huber function (bis): ω(t) = t²/2 + 1/2

(Figure: | · | and f_σ for σ = 2.5, 1.0)

30 / 40

SLIDE 45

Huber function (bis): ω(t) = t²/2 + 1/2

(Figure: | · | and f_σ for σ = 2.5, 1.0, 0.2)

30 / 40

SLIDE 46

Smoothing other norms

◮ Smoothing the Frobenius norm yields a trivial generalization of the concomitant Lasso
◮ More interesting: S. van de Geer introduced the pivotal multivariate √Lasso,(20) using the trace/nuclear norm for data-fitting:

    arg min_{B ∈ R^{p×T}}  (1/(n√T)) ‖Y − XB‖_* + λ ‖B‖_{2,1}

  hard to solve, and the statistical analysis makes stringent assumptions
◮ Smoothing the datafit makes optimization and statistics easier!

(20) S. van de Geer. Estimation and testing under sparsity. École d’Été de Probabilités de Saint-Flour. 2016.

31 / 40

SLIDE 47

Smoothing the nuclear norm(21)

Nuclear norm (Schatten-1 norm, or trace norm): for Z ∈ R^{n×T},

    ‖Z‖_* = Σ_{i=1}^{n∧T} γ_i,   where the γ_i’s are the singular values of Z

Smoothing it as above:

    ( ‖·‖_*  □  ( (1/(2σ)) ‖·‖²_F + nσ/2 ) )(Z) = Σ_i huber_σ(γ_i)
                                                = min_{S ⪰ σ Id_n}  (1/2) ‖Z‖²_{S⁻¹} + (1/2) Tr(S),

    where ‖Z‖²_{S⁻¹} := Tr(Z⊤ S⁻¹ Z)   (and huber_σ is the scalar Huber above, i.e. the n = 1 case)

(21) Q. Bertrand et al. “Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso”. In: NeurIPS. 2019.

32 / 40
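A numerical check of this identity (an illustrative NumPy sketch, not the authors' code): the left-hand side applies the scalar Huber to the singular values, padded with zeros up to n; the right-hand side evaluates the S-objective at its closed-form minimizer, whose spectrum is the singular values clipped from below at σ.

    import numpy as np

    def smoothed_nuclear_lhs(Z, sigma):
        """sum_i huber_sigma(gamma_i), with singular values padded by zeros up to n."""
        gammas = np.zeros(Z.shape[0])
        sv = np.linalg.svd(Z, compute_uv=False)
        gammas[:sv.size] = sv
        small = gammas <= sigma
        return np.sum(np.where(small, gammas**2 / (2 * sigma) + sigma / 2, gammas))

    def smoothed_nuclear_rhs(Z, sigma):
        """0.5 * Tr(Z^T S^{-1} Z) + 0.5 * Tr(S) evaluated at the clipped-spectrum minimizer S."""
        U, sv, _ = np.linalg.svd(Z, full_matrices=True)
        gammas = np.zeros(Z.shape[0])
        gammas[:sv.size] = sv
        d = np.maximum(gammas, sigma)            # clipped spectrum of the optimal S
        S = (U * d) @ U.T
        return 0.5 * np.trace(Z.T @ np.linalg.inv(S) @ Z) + 0.5 * np.trace(S)

    Z = np.random.default_rng(0).standard_normal((8, 5))
    for sigma in (0.1, 1.0, 3.0):
        assert np.isclose(smoothed_nuclear_lhs(Z, sigma), smoothed_nuclear_rhs(Z, sigma))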

SLIDE 48

Smoothing of the multivariate √ Lasso

Smoothed Generalized Concomitant Lasso (SGCL)(22):

    (B̂_SGCL, Ŝ_SGCL) ∈ arg min_{B ∈ R^{p×T}, S ∈ S^n_{++}, S ⪰ σ̲ Id}
        ‖Ȳ − XB‖²_{S⁻¹} / (2nT) + Tr(S) / (2n) + λ ‖B‖_{2,1}

Concomitant Lasso with Repetitions (CLaR)(23):

    (B̂_CLaR, Ŝ_CLaR) ∈ arg min_{B ∈ R^{p×T}, S ∈ S^n_{++}, S ⪰ σ̲ Id}
        Σ_{l=1}^{r} ‖Y^(l) − XB‖²_{S⁻¹} / (2nTr) + Tr(S) / (2n) + λ ‖B‖_{2,1}

(22) M. Massias et al. “Generalized concomitant multi-task Lasso for sparse multimodal regression”. In: AISTATS. Vol. 84. 2018, pp. 998–1007.
(23) Q. Bertrand et al. “Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso”. In: NeurIPS. 2019.

33 / 40

SLIDE 49

Simulations: row support identification

◮ n = 150, p = 500, T = 100
◮ X Toeplitz-correlated
◮ S* Toeplitz matrix: S*_{i,j} = ρ_{S*}^{|i−j|}, with ρ_{S*} ∈ ]0, 1[

(Figure: support identification performance of CLaR, SGCL, ℓ2,1-MLER, ℓ2,1-MLE, ℓ2,1-MRCER, MTL)

34 / 40

SLIDE 50

Table of Contents

◮ Neuroimaging
    The M/EEG problem
    Statistical model
    Estimation procedures
◮ Sparsity and Multi-task approaches
◮ Smoothing interpretation of concomitant and √Lasso
◮ Optimization algorithm

35 / 40

SLIDE 51

SGCL and CLaR: alternate updates

Alternate minimization converges.

B update (S fixed): standard multi-task Lasso optimization,
off-the-shelf techniques and lots of refinements.

S update (B fixed):

    arg min_{S ⪰ σ̲ Id}  (1/(2nT)) Tr(Z⊤ S⁻¹ Z) + (1/(2n)) Tr(S)   (with T replaced by rT for CLaR)

closed-form solution: clipped square root of the eigenvalue decomposition of

    (1/T) (Ȳ − XB)(Ȳ − XB)⊤    or    (1/(rT)) Σ_{l=1}^{r} (Y^(l) − XB)(Y^(l) − XB)⊤

Rem: see the online Python code https://github.com/QB3/CLaR

36 / 40
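A sketch of this closed-form S update (illustrative NumPy, not the reference implementation): Z is the residual matrix Ȳ − XB, or the r repetition residuals stacked horizontally, and T_total the corresponding number of columns.

    import numpy as np

    def update_S(Z, T_total, sigma_min):
        """Closed-form S update: clipped square root of the EVD of Z Z^T / T_total."""
        M = Z @ Z.T / T_total                      # empirical residual covariance
        eigvals, U = np.linalg.eigh(M)             # EVD of a symmetric PSD matrix
        d = np.maximum(np.sqrt(np.clip(eigvals, 0.0, None)), sigma_min)
        return (U * d) @ U.T                       # S = U diag(max(sqrt(eigval), sigma_min)) U^T

For the repetition version, pass Z = np.hstack([Y_l - X @ B for Y_l in Y]) and T_total = r * T.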

SLIDE 52

Algorithm: Concomitant Lasso with Repetitions (CLaR)

input : X ∈ R^{n×p}, Y^(1), ..., Y^(r) ∈ R^{n×T}, σ̲ > 0, λ > 0
init  : B = 0_{p,T}, R = Ȳ

for iter = 1, ... do
    S ← SpectralClipping( (1/(Tr)) Σ_l (Y^(l) − XB)(Y^(l) − XB)⊤, σ̲ )
        // closed-form solution of the minimization in S:
        // EVD, then clip the square roots of the eigenvalues at level σ̲
    for j = 1, ..., p do
        L_j ← X_{:j}⊤ S⁻¹ X_{:j}                         // Lipschitz constants
    for j = 1, ..., p do
        R ← R + X_{:j} B_{j:}                            // partial residual update
        B_{j:} ← BST( X_{:j}⊤ S⁻¹ R / L_j , λnT / L_j )  // coefficient update
        R ← R − X_{:j} B_{j:}                            // residual update
return B, S

Complexity? Fine, if we store S⁻¹X and S⁻¹R instead of R. An eigenvalue decomposition is still needed, O(n³) (here n ≈ 100).

37 / 40
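The only non-standard primitive above is BST, block soft-thresholding of a row at a given level. A minimal sketch of it together with one pass of the inner coordinate-descent loop (illustrative names, assuming S_inv = S⁻¹ and R = Ȳ − XB are maintained as in the pseudocode, and update_S from the previous snippet plays the role of SpectralClipping):

    import numpy as np

    def BST(x, tau):
        """Block soft-thresholding of a row vector x at level tau."""
        norm = np.linalg.norm(x)
        if norm <= tau:
            return np.zeros_like(x)
        return (1.0 - tau / norm) * x

    def cd_pass(X, B, R, S_inv, lam, n, T):
        """One pass of coordinate descent over the rows of B, updating R in place."""
        p = X.shape[1]
        for j in range(p):
            L_j = X[:, j] @ S_inv @ X[:, j]        # Lipschitz constant of coordinate j
            R += np.outer(X[:, j], B[j])           # add back column j's contribution
            B[j] = BST(X[:, j] @ S_inv @ R / L_j, lam * n * T / L_j)
            R -= np.outer(X[:, j], B[j])           # update the residual
        return B, R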

SLIDE 53

Statistical properties for i.i.d. case(24)

    B̂ ∈ arg min_{B ∈ R^{p×T}, S ∈ S^n_{++}, σ̲ Id ⪯ S ⪯ σ̄ Id}
        ‖Y − XB‖²_{S⁻¹} / (2nT) + Tr(S) / (2n) + λ ‖B‖_{2,1}

Proposition
◮ i.i.d. Gaussian noise
◮ X satisfying the “mutual incoherence” property
◮ λ ∝ √(log p) / (T √n)   (independent of σ*)
◮ c1 σ̲ ≤ σ* ≤ c2 σ̄

⟹ with probability at least 1 − n e^{−cT/n},

    (1/T) ‖B* − B̂‖_{2,∞} ≤ C σ* (1/T) √( log(p) / n )

(24) M. Massias et al. “Support recovery and sup-norm convergence rates for sparse pivotal regression”. In: AISTATS. 2020.

38 / 40

SLIDE 54

Real data experiments

(Figure: source estimates for CLaR (ours), ℓ2,1-MLER, ℓ2,1-MLE, ℓ2,1-MRCER, MTL)

◮ expected: 2 sources (one in each auditory cortex)
◮ λ chosen such that ‖B̂‖_{2,0} = 2
◮ deep sources for ℓ2,1-MRCER (not visible)

39 / 40

SLIDE 55

Links

“All models are wrong but some come with good open source implementation and good documentation to use these.”
    (A. Gramfort)

◮ Papers: arXiv / personal webpage(25),(26),(27)
◮ CLaR Python code: https://github.com/QB3/CLaR

(25) M. Massias et al. “Generalized concomitant multi-task Lasso for sparse multimodal regression”. In: AISTATS. Vol. 84. 2018, pp. 998–1007.
(26) Q. Bertrand et al. “Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso”. In: NeurIPS. 2019.
(27) M. Massias et al. “Support recovery and sup-norm convergence rates for sparse pivotal regression”. In: AISTATS. 2020.

40 / 40

SLIDE 56

References I

Beck, A. and M. Teboulle. “Smoothing and first order methods: A unified framework”. In: SIAM J. Optim. 22.2 (2012), pp. 557–580.

Belloni, A., V. Chernozhukov, and L. Wang. “Square-root Lasso: pivotal recovery of sparse signals via conic programming”. In: Biometrika 98.4 (2011), pp. 791–806.

Bertrand, Q. et al. “Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso”. In: NeurIPS. 2019.

Bickel, P. J., Y. Ritov, and A. B. Tsybakov. “Simultaneous analysis of Lasso and Dantzig selector”. In: Ann. Statist. 37.4 (2009), pp. 1705–1732.

Candès, E. J., M. B. Wakin, and S. P. Boyd. “Enhancing Sparsity by Reweighted l1 Minimization”. In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877–905.

41 / 40

SLIDE 57

References II

Chen, S. S. and D. L. Donoho. “Atomic decomposition by basis pursuit”. In: SPIE. 1995.

Dalalyan, A. S., M. Hebiri, and J. Lederer. “On the Prediction Performance of the Lasso”. In: Bernoulli 23.1 (2017), pp. 552–581.

Daubechies, I. Ten lectures on wavelets. SIAM, 1992.

Delorme, A. et al. “Independent EEG sources are dipolar”. In: PloS ONE 7.2 (2012), e30135.

Massias, M. et al. “Generalized concomitant multi-task Lasso for sparse multimodal regression”. In: AISTATS. Vol. 84. 2018, pp. 998–1007.

Massias, M. et al. “Support recovery and sup-norm convergence rates for sparse pivotal regression”. In: AISTATS. 2020.

Ndiaye, E. et al. “Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression”. In: Journal of Physics: Conference Series 904.1 (2017), p. 012006.

42 / 40

SLIDE 58

References III

Nesterov, Y. “Smooth minimization of non-smooth functions”. In: Math. Program. 103.1 (2005), pp. 127–152.

Obozinski, G., B. Taskar, and M. I. Jordan. “Joint covariate selection and joint subspace selection for multiple classification problems”. In: Statistics and Computing 20.2 (2010), pp. 231–252.

Olshausen, B. A. and D. J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. In: Vision Research (1997).

Owen, A. B. “A robust hybrid of lasso and ridge regression”. In: Contemporary Mathematics 443 (2007), pp. 59–72.

Sun, T. and C.-H. Zhang. “Scaled sparse linear regression”. In: Biometrika 99.4 (2012), pp. 879–898.

43 / 40

SLIDE 59

References IV

Tibshirani, R. “Regression Shrinkage and Selection via the Lasso”. In: J. R. Stat. Soc. Ser. B Stat. Methodol. 58.1 (1996), pp. 267–288.

van de Geer, S. Estimation and testing under sparsity. École d’Été de Probabilités de Saint-Flour. 2016.

44 / 40

SLIDE 60

Statistical assumptions

Gaussian noise: the entries E_{i,j} are i.i.d. N(0, σ*²) random variables.

Mutual incoherence: the Gram matrix Ψ := (1/n) X⊤X satisfies

    Ψ_{jj} = 1,  and  max_{j′ ≠ j} |Ψ_{jj′}| ≤ 1/(7αs),  ∀ j ∈ [p],

for some integer s ≥ 1 and some constant α > 1.

Residuals bound: for the multivariate square-root Lasso, Ê⊤Ê is invertible, and there exists η such that ‖((1/T) Ê⊤Ê)^{1/2}‖_2 ≤ C σ*.

Smoothing parameter value: σ̲, σ̄ and η verify σ̲ ≤ σ*/√2 and σ̄ = (2 + η)σ*, with η ≥ 1.

45 / 40

SLIDE 61

Competitors

◮ (smoothed) ℓ2,1-MLE:

    (B̂, Σ̂) ∈ arg min_{B ∈ R^{p×T}, Σ ⪰ (σ̲²/r) Id}
        ‖Ȳ − XB‖²_{Σ⁻¹} − log det(Σ⁻¹) + λ ‖B‖_{2,1} ,

◮ and its repetitions version (ℓ2,1-MLER):

    (B̂, Σ̂) ∈ arg min_{B ∈ R^{p×T}, Σ ⪰ σ̲² Id}
        (1/r) Σ_{l=1}^{r} ‖Y^(l) − XB‖²_{Σ⁻¹} − log det(Σ⁻¹) + λ ‖B‖_{2,1} .

Rem: ℓ2,1-MLE and ℓ2,1-MLER are bi-convex but not jointly convex

◮ MRCER has an additional penalty term µ ‖Σ⁻¹‖₁ w.r.t. ℓ2,1-MLER

46 / 40