

SLIDE 1

Sparse Regression Codes

Andrew Barron (Yale University) and Ramji Venkataramanan (University of Cambridge)
Joint work with Antony Joseph, Sanghee Cho, Cynthia Rush, Adam Greig, Tuhin Sarkar, Sekhar Tatikonda
ISIT 2016

SLIDE 2

Part II of the tutorial:

  • Approximate message passing (AMP) decoding
  • Power-allocation schemes to improve finite block-length performance

(Joint work with Cynthia Rush and Adam Greig)

SLIDE 3

SPARC Decoding

[Diagram: the design matrix A has n rows and ML columns, divided into L sections of M columns each. The message vector β has exactly one non-zero entry per section; the non-zero entry in section ℓ equals √(nP_ℓ):]

β = (0, …, √(nP₁), …, 0 | 0, …, √(nP₂), …, 0 | … | 0, …, √(nP_L), …, 0)^T

Channel output: y = Aβ + ε. We want an efficient algorithm to decode β from y. (An encoding sketch follows below.)
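To make the setup concrete, here is a minimal NumPy sketch of SPARC encoding over an AWGN channel. All numerical choices here (L, M, n, the snr, the exponentially decaying allocation) are illustrative assumptions of this sketch, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

L, M, n = 32, 64, 256            # sections, columns per section, code length
N = L * M
sigma2 = 1.0                     # noise variance
snr = 15.0
P = snr * sigma2                 # total codeword power
C = 0.5 * np.log(1 + snr)        # AWGN capacity in nats

# Exponentially decaying allocation P_l proportional to e^{-2Cl/L}, summing to P
Pl = np.exp(-2 * C * np.arange(1, L + 1) / L)
Pl *= P / Pl.sum()

# i.i.d. N(0, 1/n) design matrix
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))

# Message beta: one non-zero entry per section, value sqrt(n * P_l) in section l
sent = rng.integers(0, M, size=L)        # sent column index within each section
beta = np.zeros(N)
beta[np.arange(L) * M + sent] = np.sqrt(n * Pl)

# Channel output
y = A @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
```

Later sketches reuse these variables (A, y, beta, Pl, sent, and the scalars) to illustrate the decoder.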

SLIDE 4

AMP for Compressed Sensing

  • Approximation of loopy belief propagation for dense graphs [Donoho-Montanari-Maleki ’09, Rangan ’11, Krzakala et al. ’11, …]
  • Compressed sensing (CS): want to recover β from y = Aβ + ε, where A is an n × N measurement matrix and β is i.i.d. with a known prior

[Factor graph: variable nodes β₁, β₂, …, β_N connected to the n factor nodes]

In CS, we often solve the LASSO: β̂ = arg min_β ∥y − Aβ∥₂² + λ∥β∥₁

SLIDES 5–7

Min-Sum Message Passing for LASSO

Want to compute β̂ = arg min_β ∑_{i=1}^n (y_i − (Aβ)_i)² + λ ∑_{j=1}^N |β_j|

On the factor graph, for j = 1, …, N and i = 1, …, n, iterate:

M^t_{j→i}(β_j) = λ|β_j| + ∑_{i′∈[n]∖i} M̂^{t−1}_{i′→j}(β_j)

M̂^t_{i→j}(β_j) = min_{β∖β_j} [ (y_i − (Aβ)_i)² + ∑_{j′∈[N]∖j} M^t_{j′→i}(β_{j′}) ]

SLIDE 8

But computing these messages is infeasible:
  — Each message must be computed for every β_j ∈ R
  — There are nN such messages

Further, the factor graph is nothing like a tree!

SLIDES 9–10

Quadratic Approximation of Messages

Approximating each message by a quadratic reduces it to two numbers:

r^t_{i→j} = y_i − ∑_{j′∈[N]∖j} A_{ij′} β^t_{j′→i}

β^{t+1}_{j→i} = η_t( ∑_{i′∈[n]∖i} A_{i′j} r^t_{i′→j} )

  • For LASSO, η_t is the soft-thresholding operator
  • We still have nN messages in each step …

SLIDE 11

Rewrite each message as a full sum plus a correction term:

r^t_{i→j} = y_i − ∑_{j′∈[N]} A_{ij′} β^t_{j′→i} + A_{ij} β^t_{j→i}

β^{t+1}_{j→i} = η_t( ∑_{i′∈[n]} A_{i′j} r^t_{i′→j} − A_{ij} r^t_{i→j} )

Using Taylor approximations, the 2nN messages collapse to n + N quantities, giving the AMP algorithm …

SLIDES 12–13

The AMP Algorithm for LASSO

[Donoho-Montanari-Maleki ’09, Rangan ’11, Krzakala et al. ’11, …]

r^t = y − Aβ^t + r^{t−1} ∥β^t∥₀ / n

β^{t+1} = η_t(A^T r^t + β^t)

  • AMP iteratively produces estimates β⁰ = 0, β¹, …, β^t, …
  • r^t is the ‘modified residual’ after step t
  • η_t denoises the effective observation to produce β^{t+1}

The momentum term in r^t ensures that asymptotically

A^T r^t + β^t ≈ β + τ_t Z_t, where Z_t is N(0, I)

⇒ The effective observation A^T r^t + β^t is the true signal observed in independent Gaussian noise. (A sketch of this iteration follows below.)
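A minimal sketch of this AMP iteration for LASSO, assuming a fixed soft-threshold level theta; in practice the threshold would be tuned (e.g., via state evolution), which these slides do not cover. The function names are conventions of this sketch.

```python
import numpy as np

def soft_threshold(x, theta):
    """Componentwise soft thresholding: the LASSO denoiser eta_t."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def amp_lasso(y, A, theta, num_iters=30):
    n, N = A.shape
    beta = np.zeros(N)               # beta^0 = 0
    r = np.copy(y)                   # r^0 = y when beta^0 = 0
    for t in range(num_iters):
        # Denoise the effective observation A^T r^t + beta^t
        beta_new = soft_threshold(A.T @ r + beta, theta)
        # Modified residual with the momentum term r^t * ||beta^{t+1}||_0 / n
        r = y - A @ beta_new + r * np.count_nonzero(beta_new) / n
        beta = beta_new
    return beta
```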

SLIDE 14

AMP for SPARC Decoding

[A/β diagram as on Slide 3]

y = Aβ + ε, with ε i.i.d. ∼ N(0, σ²). SPARC decoding is a different optimization problem from LASSO:

  • Want arg min_β ∥y − Aβ∥² s.t. β is a valid SPARC message
  • β has one non-zero entry per section, and the section size M → ∞
  • The undersampling ratio n/(ML) → 0

Let us revisit the (approximated) min-sum updates …

SLIDES 15–16

Approximated Min-Sum

r^t_{i→j} = y_i − ∑_{j′∈[N]∖j} A_{ij′} β^t_{j′→i}

β^{t+1}_{j→i} = η_{t,j}(stat_{t,j}),  where  stat_{t,j} = ∑_{i′∈[n]∖i} A_{i′j} r^t_{i′→j}

If, for j ∈ [N], stat_{t,j} is approximately distributed as β_j + τ_t Z_{t,j}, then the Bayes-optimal choice of η_{t,j} is …

SLIDES 17–18

Bayes-Optimal η_t

η_t(stat_t = s) = E[β | β + τ_t Z = s]:

η_{t,j}(s) = √(nP_ℓ) · exp( s_j √(nP_ℓ)/τ_t² ) / ∑_{j′∈sec_ℓ} exp( s_{j′} √(nP_ℓ)/τ_t² ),   j ∈ section ℓ

Note that β^{t+1} is
  • the MMSE estimate of β given the observation β + τ_t Z_t
  • proportional to the posterior probability of entry j of β being non-zero

As before, rewrite the messages as full sums plus correction terms and apply Taylor approximations … (A sketch of the resulting denoiser follows below.)
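A sketch of this denoiser in NumPy: within each section it returns √(nP_ℓ) times a softmax of s_j √(nP_ℓ)/τ². The function name eta and the (L, M) reshaping are conventions of this sketch.

```python
import numpy as np

def eta(s, Pl, n, tau2):
    """Bayes-optimal sectionwise denoiser.

    s:   effective observation, reshaped to (L, M)
    Pl:  per-section powers, shape (L,)
    """
    c = np.sqrt(n * Pl)[:, None]          # non-zero value per section
    u = s * c / tau2
    u -= u.max(axis=1, keepdims=True)     # stabilize the softmax
    w = np.exp(u)
    w /= w.sum(axis=1, keepdims=True)     # posterior prob. of each entry being the sent one
    return c * w                          # MMSE estimate, section by section
```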

SLIDES 19–20

AMP Decoder

Set β⁰ = 0. For t ≥ 0:

r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1}) ( P − ∥β^t∥²/n )

β^{t+1}_j = η_{t,j}(A^T r^t + β^t),  for j = 1, …, ML

with η_{t,j}(s) = √(nP_ℓ) exp( s_j √(nP_ℓ)/τ_t² ) / ∑_{j′∈sec_ℓ} exp( s_{j′} √(nP_ℓ)/τ_t² ) for j ∈ section ℓ.

β^{t+1} is the MMSE estimate of β given that β^t + A^T r^t ≈ β + τ_t Z_t. (A sketch of the full decoder follows below.)
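Putting the pieces together, a sketch of the AMP decoder, reusing the hypothetical eta() above and the encoder variables (A, y, Pl, sent, and the scalars) from the earlier sketch. Updating τ_t² online from ∥β^t∥² and stopping after a fixed number of iterations are simplifications of this sketch.

```python
import numpy as np

def amp_decode(y, A, Pl, P, sigma2, L, M, num_iters=25):
    n = len(y)
    beta = np.zeros(L * M)                    # beta^0 = 0
    r = np.copy(y)                            # r^0 = y when beta^0 = 0
    tau2 = sigma2 + P                         # tau_0^2 = sigma^2 + P
    for t in range(num_iters):
        # Effective observation beta^t + A^T r^t, approximately beta + tau_t Z
        s = (beta + A.T @ r).reshape(L, M)
        beta = eta(s, Pl, n, tau2).ravel()
        # Modified residual with momentum term (r^t / tau_t^2)(P - ||beta||^2 / n)
        r = y - A @ beta + (r / tau2) * (P - np.linalg.norm(beta) ** 2 / n)
        # Online estimate of tau_{t+1}^2 = sigma^2 + P(1 - x_{t+1})
        tau2 = sigma2 + P - np.linalg.norm(beta) ** 2 / n
    # Hard decision: pick the largest entry in each section
    return beta.reshape(L, M).argmax(axis=1)

decoded = amp_decode(y, A, Pl, P, sigma2, L, M)
print("section error rate:", np.mean(decoded != sent))
```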

SLIDE 21

The Statistic β^t + A^T r^t

Suppose r^t = y − Aβ^t (the plain residual). Then

β^t + A^T r^t = β + A^T ε + (I − A^T A)(β^t − β)

where the entries of A^T ε are ≈ N(0, σ²) and the entries of (I − A^T A) are ≈ N(0, 1/n).

SLIDE 22

The Statistic β^t + A^T r^t

With the AMP residual r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1})(P − ∥β^t∥²/n):

β^t + A^T r^t = β + A^T ε + (I − A^T A)(β^t − β) + A^T r^{t−1} (P − ∥β^t∥²/n) / τ²_{t−1}

  • The momentum term asymptotically cancels out the dependence between (I − A^T A) and (β − β^t), so that β^t + A^T r^t ≈ β + τ_t Z_t, where τ_t² = σ² + (1/n) E∥β − β^t∥²
  • Recall that the plain residual does not give a statistic with the desired representation

SLIDE 23

Iteratively Compute the Variances τ_t²

τ₀² = σ² + P
τ_t² = σ² + P(1 − x_t(τ_{t−1})),  t ≥ 1

where

x_t(τ_{t−1}) = ∑_{ℓ=1}^L (P_ℓ/P) E[ exp( (√(nP_ℓ)/τ_{t−1}) (U^ℓ_1 + √(nP_ℓ)/τ_{t−1}) ) / { exp( (√(nP_ℓ)/τ_{t−1}) (U^ℓ_1 + √(nP_ℓ)/τ_{t−1}) ) + ∑_{j=2}^M exp( (√(nP_ℓ)/τ_{t−1}) U^ℓ_j ) } ]

and the {U^ℓ_j} are i.i.d. ∼ N(0, 1).

With stat_t = β + τ_t Z:

(1/n) E∥β − β^t∥² = P(1 − x_t)  and  (1/n) E[β^T β^t] = (1/n) E∥β^t∥² = P x_t

  • x_t: expected power-weighted fraction of correctly decoded sections after step t (a Monte Carlo sketch follows below)
  • P(1 − x_t): interference due to undecoded sections
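A Monte Carlo sketch of this recursion, assuming the variables (Pl, P, n, M, sigma2) from the earlier sketches; the ratio inside the expectation is rewritten in a numerically stable form before sampling.

```python
import numpy as np

def se_x(tau, Pl, P, n, M, num_samples=1000, rng=None):
    """Monte Carlo estimate of x(tau) = sum_l (P_l / P) E[ratio_l]."""
    rng = rng or np.random.default_rng(1)
    x = 0.0
    for l in range(len(Pl)):
        a = np.sqrt(n * Pl[l]) / tau
        U = rng.standard_normal((num_samples, M))
        # Stable form of exp(a(U1+a)) / (exp(a(U1+a)) + sum_{j>=2} exp(a U_j))
        ratio = 1.0 / (1.0 + np.exp(a * (U[:, 1:] - U[:, :1] - a)).sum(axis=1))
        x += (Pl[l] / P) * ratio.mean()
    return x

# Iterate tau_t^2 = sigma^2 + P(1 - x(tau_{t-1})), starting from tau_0^2 = sigma^2 + P
tau2 = sigma2 + P
for t in range(10):
    tau2 = sigma2 + P * (1 - se_x(np.sqrt(tau2), Pl, P, n, M))
    print(f"t={t + 1}: tau_t^2 = {tau2:.4f}")
```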

SLIDE 24

x_t vs. t

[Plot: x_t and x*_t versus iteration t, for a SPARC with M = 512, L = 1024, snr = 15, R = 0.7C, P_ℓ ∝ 2^{−2Cℓ/L}]

x_t = (1/(nP)) E[β^T β^t],  x*_t = (1/(nP)) β^T β^t

“Power-weighted fraction of correctly decoded sections in β^t”

SLIDE 25

State Evolution

stat_t = A^T r^t + β^t ≈ β + τ_t Z_t, with τ_t² = σ² + P(1 − x_t), where x_t = x(τ_{t−1}) is given by the expectation on Slide 23.

KEY property:

  • Starting from 0, x_t increases with t for a finite number of steps T_n, with x_{T_n} ≈ 1
  • Starting from τ₀² = σ² + P, the τ_t² decrease with t until they reach ≈ σ²
  • AMP has effectively converted the matrix A into an identity!

SLIDE 26

State Evolution Asymptotics

Lemma [Rush, Greig, Venkataramanan ’15]:

x̄(τ) := lim_{L→∞} x(τ) = lim_{L→∞} ∑_{ℓ=1}^L (P_ℓ/P) 1{c_ℓ > 2Rτ²},

where c_ℓ = LP_ℓ and R is in nats. In the large system limit:

  • In step t + 1, all sections with power c_ℓ > 2Rτ_t² will be decodable, i.e., the sent terms have weights very close to 1
  • The other sections will not be decodable
  • Can use this to understand the decoding progression for any power allocation (a sketch follows below)
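The lemma makes the large-L limit a one-line computation; a sketch follows, with the function name x_bar being a convention of this sketch.

```python
import numpy as np

def x_bar(tau2, Pl, P, R):
    """Large-L limit: power-weighted fraction of sections with c_l > 2*R*tau^2 (R in nats)."""
    Pl = np.asarray(Pl)
    c = len(Pl) * Pl              # c_l = L * P_l
    return Pl[c > 2 * R * tau2].sum() / P
```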

SLIDES 27–28

Asymptotics for P_ℓ ∝ e^{−2Cℓ/L}

x̄_t := lim x_t = [ (1 + snr) − (1 + snr)^{1−ξ_{t−1}} ] / snr

τ̄_t² := lim τ_t² = σ² + P(1 − x̄_t) = σ² (1 + snr)^{1−ξ_{t−1}}

where ξ₋₁ = 0 and, for t ≥ 0, ξ_t = min{ (1/(2C)) log(C/R) + ξ_{t−1}, 1 }.

For R < C, x̄_t ↗ 1 and τ̄_t² ↘ σ² in exactly T* = ⌈2C / log(C/R)⌉ steps.

Run the AMP decoder for T* steps to get β¹, …, β^{T*} → β̂
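A sketch of this closed-form progression; the values snr = 15, σ² = 1, and R = 0.7C are illustrative assumptions of the example, not from the slide.

```python
import numpy as np

snr, sigma2 = 15.0, 1.0
P = snr * sigma2
C = 0.5 * np.log(1 + snr)       # capacity in nats
R = 0.7 * C                     # an example rate below capacity

T_star = int(np.ceil(2 * C / np.log(C / R)))
xi = 0.0                        # xi_{-1} = 0
for t in range(1, T_star + 1):
    xi = min(np.log(C / R) / (2 * C) + xi, 1.0)        # xi_{t-1}
    x_t = ((1 + snr) - (1 + snr) ** (1 - xi)) / snr    # fraction decoded after step t
    tau2_t = sigma2 * (1 + snr) ** (1 - xi)
    print(f"t={t}: x_t={x_t:.4f}, tau_t^2={tau2_t:.4f}")
```

In the final step xi reaches 1, so x_t = 1 and tau_t^2 = sigma^2, matching the limits on the slide.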

SLIDE 29

Performance of the AMP Decoder

The section error rate of a decoder for a SPARC S is

E_sec(S) := (1/L) ∑_{ℓ=1}^L 1{β̂_ℓ ≠ β_ℓ}

Theorem [Rush, Greig, Venkataramanan ’15]: Fix a rate R < C and b > 0. Consider a sequence of rate-R SPARCs {S_n} indexed by block length n, with design-matrix parameters L and M = L^b, and power allocation P_ℓ ∝ e^{−2Cℓ/L}. Then the section error rate of the AMP decoder converges to zero almost surely, i.e., for any ϵ > 0,

lim_{n₀→∞} P( E_sec(S_n) < ϵ, ∀ n ≥ n₀ ) = 1

SLIDE 30

Error Exponent

Theorem [Rush, Venkataramanan ’16]: For sufficiently large n and L, the section error rate of the AMP decoder satisfies

P( E_sec(S_n) > ϵ ) ≤ P( ∥β^{T*} − β∥²/n > ϵ σ² ln(1 + snr)/4 ) ≤ K exp( −κ L ϵ² / (log M)^{2T*} )

SLIDES 31–32

Proof Idea

Step 1: Characterize the conditional distributions of the statistic and the residual as i.i.d. Gaussian plus a deviation term. Show

(A^T r^t + β^t − β) | past, β, ε  =_d  τ_t Z_t + Δ_t

(r^t − ε) | past, β, ε  =_d  √(τ_t² − σ²) Z′_t + Δ′_t

where the deviation term is explicit but unwieldy:

Δ_t = ∑_{r=0}^{t−1} (α^t_r − α̂^t_r) h^{r+1} + [ ( ∥m^t_⊥∥/√n − τ^⊥_t ) I − ( ∥m^t_⊥∥/√n ) P^∥_{Q_{t+1}} ] Z_t + Q_{t+1} ( Q^T_{t+1} Q_{t+1}/n )^{−1} ( B^T_{t+1} m^t_⊥/n − Q^T_{t+1} q^t_⊥/n ),    Δ′_t = …

SLIDE 33

Proof Idea

Steps:
  1. Characterize the conditional distributions of the statistic and residual in terms of an i.i.d. Gaussian plus a deviation term.
  2. Inductively obtain concentration results to show that the deviation terms are small with high probability.

  • The result also shows that ∥r^t∥²/n concentrates around τ_t²
  • So the decoder could use these empirical estimates instead of precomputing the τ_t²

“Finite-Sample Analysis of AMP”: talk by Cynthia Rush on Monday at 17:30

SLIDE 34

Empirical Performance

[Plot: section error rate vs. rate for a SPARC with snr = 15]

Power allocation plays a key role at finite block lengths!

SLIDES 35–36

Power Allocation

[Picture: exponential decay up to section fL, flat thereafter]

P_ℓ = κ · e^{−2aCℓ/L}  for 0 < ℓ/L ≤ f
P_ℓ = κ · e^{−2aCf}   for f < ℓ/L ≤ 1

Parameters a, f ∈ [0, 1]:

  • Smaller a makes the exponential decay gentler, allocating less power to the initial sections
  • f controls where the allocation flattens
  • Given R and snr, we can optimize over all (a, f) that give asymptotic x̄_T = 1 for some finite T (a sketch of this allocation follows below)

Can we have a non-parametric algorithm to find the optimal power allocation for a given R and snr?
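A sketch of the (a, f)-parametrized allocation, normalized so the section powers sum to P; the function name power_alloc is a convention of this sketch.

```python
import numpy as np

def power_alloc(L, C, P, a, f):
    """Exponential decay e^{-2aC*l/L} up to section f*L, flat thereafter; sums to P."""
    ell = np.arange(1, L + 1) / L
    Pl = np.where(ell <= f, np.exp(-2 * a * C * ell), np.exp(-2 * a * C * f))
    return Pl * (P / Pl.sum())
```

Setting a = 1 and f = 1 recovers the pure exponential allocation from the earlier slides.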

SLIDES 37–41

Algorithmic Power Allocation

x̄(τ_t²) = lim_{L→∞} ∑_{ℓ=1}^L (P_ℓ/P) 1{c_ℓ > 2Rτ_t²}, where c_ℓ = LP_ℓ

  • Fix a target number of decoding steps T*
  • Asymptotically, we want to decode L/T* sections in each step

With τ₀² = σ² + P, set c_ℓ = 2Rτ₀² + δ for ℓ ≤ L/T*. Then, for t = 1, 2, 3, …, compute

τ_t² = σ² + P(1 − x̄(τ²_{t−1}))

and compare 2Rτ_t² + δ with allocating the remaining power equally over the remaining sections; choose the greater for the next L/T* sections. (A sketch of this procedure follows below.)
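A sketch of this greedy procedure, assuming R is in nats; the final renormalization is a safeguard of the sketch, not part of the slides' description.

```python
import numpy as np

def algorithmic_alloc(L, P, sigma2, R, T_star, delta):
    """Greedy power allocation: decode ~L/T_star sections per AMP step."""
    Pl = np.zeros(L)
    chunk = L // T_star
    tau2 = sigma2 + P                        # tau_0^2
    allocated = 0.0
    for t in range(T_star):
        lo = t * chunk
        hi = L if t == T_star - 1 else lo + chunk
        # Equal split of the remaining power over all remaining sections
        flat = (P - allocated) / (L - lo)
        # Per-section power just above the decoding threshold: c_l = 2*R*tau^2 + delta
        Pl[lo:hi] = max((2 * R * tau2 + delta) / L, flat)
        allocated = Pl.sum()
        # Sections allocated so far clear the threshold, so x_bar = allocated / P
        tau2 = sigma2 + P - allocated        # tau_{t+1}^2 = sigma^2 + P(1 - x_bar)
    return Pl * (P / Pl.sum())               # renormalize any leftover mismatch
```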

SLIDE 42

[Plot: section error rate vs. R for a SPARC with snr = 15, comparing power allocations]

  • The algorithmic power allocation performs close to (or even beats!) the numerically optimized one over a wide range of snr and R values
  • The optimal value of the tolerance δ varies (slightly) with snr

SLIDE 43

Spatially-Coupled Design Matrices

[Barbier, Krzakala ’15]: another way to improve performance at finite block lengths

[Figure: band-diagonal design matrix built from blocks, with coupling width w and an oversampled first block column]

  • Band-diagonal structure for A; each block consists of random rows of a Hadamard matrix
  • The first few sections of β are oversampled to kick-start the decoding progression
  • Good empirical results at finite block lengths
  • A (non-rigorous) replica analysis indicates that this construction is asymptotically capacity-achieving

SLIDES 44–45

Complexity of the AMP Decoder

Complexity is determined by the matrix-vector multiplications Aβ^t and A^T r^t. For Gaussian A:

  • Complexity and memory are both O(nN)
  • Hard to implement for very large M, L, e.g., M = L = 1024

For a practical implementation, use a Hadamard design:

  • For N = 2^m, the Hadamard matrix H_m ∈ R^{N×N} is defined by

    H_m = [ H_{m−1}   H_{m−1} ]
          [ H_{m−1}  −H_{m−1} ],    H₀ = 1

  • Design matrix A: pick n random rows of H_m
  • Multiplications via the fast Walsh-Hadamard transform ⇒ complexity O(N log N) ∼ n^{1+ϵ} (see the sketch below)
  • No need to store A in memory!
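A sketch of applying a Hadamard-based design without storing A, using an in-place fast Walsh-Hadamard transform. The row selection and the omitted column normalization (e.g., scaling by 1/√n) are simplifications of this sketch; the dimensions match the earlier encoding example (N = 2048 = L·M).

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform H_m @ x in O(N log N); len(x) must be 2^m."""
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

m, n = 11, 256
N = 2 ** m
rng = np.random.default_rng(2)
rows = rng.choice(N, size=n, replace=False)   # the n random rows of H_m defining A

def A_mul(beta):
    """A @ beta via one FWHT: evaluate H_m beta, keep the selected rows."""
    return fwht(beta)[rows]

def AT_mul(r):
    """A.T @ r via one FWHT, using the symmetry of H_m."""
    z = np.zeros(N)
    z[rows] = r
    return fwht(z)
```

Plugging A_mul and AT_mul into the AMP decoder sketch in place of A @ beta and A.T @ r removes the need to store A.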

SLIDES 46–47

Comparison of SPARC Decoders

In both decoders, β^{t+1}_j = η_{t,j}(stat_t), for j = 1, …, ML.

AMP:

r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1})(P − ∥β^t∥²/n),   stat_t = β^t + A^T r^t

Adaptive soft-decision decoding:

G_t = part of Aβ^t orthogonal to Aβ^{t−1}, …, y
Z_t = √n A^T G_t / ∥G_t∥
Z^comb_t = λ₀ Z₀ + λ₁ Z₁ + … + λ_t Z_t,  with ∑ λ_k² = 1
stat_t = τ_t Z^comb_t + β^t

  • For both decoders, P( |(1/n) β^T β^t − x_t P| > δ ) ≤ K_t exp( −κ_t n δ² / (log M)^{2t} )
  • The constants and analysis techniques are different

SLIDE 48

Summary + Future Directions

  • AMP gives low-complexity, parameter-free SPARC decoding
  • For any rate R < C, the probability of the section error rate exceeding ϵ decays exponentially in Lϵ²/(log M)^{2T*}
  • With Hadamard-based matrices, the decoder can be implemented at large block lengths, making SPARCs an attractive alternative to coded modulation

Open questions:

  1. Theoretical guarantees for Hadamard-based SPARCs
  2. Can we combine power allocation and spatial coupling to get good empirical performance close to C at reasonable block lengths?
  3. The BIG question: can we design feasible decoders whose gap to capacity is O(1/n^a) for some a ∈ (0, 1/2)?

SLIDE 49

References

– C. Rush, A. Greig, and R. Venkataramanan, “Capacity-achieving Sparse Superposition Codes with Approximate Message Passing Decoding,” https://arxiv.org/abs/1501.05892, 2015 (short version at ISIT ’15)
– J. Barbier and F. Krzakala, “Approximate Message-Passing Decoder and Capacity-Achieving Sparse Superposition Codes,” https://arxiv.org/abs/1503.08040, 2015
– C. Rush and R. Venkataramanan, “Finite-Sample Analysis of Approximate Message Passing,” https://arxiv.org/abs/1606.01800, 2016 (ISIT ’16)