
Sparse Regression Codes

Andrew Barron (Yale University) and Ramji Venkataramanan (University of Cambridge). Joint work with Antony Joseph, Sanghee Cho, Cynthia Rush, Adam Greig, Tuhin Sarkar, and Sekhar Tatikonda. ISIT 2016.

Part III of the tutorial:

  • SPARCs for Lossy Compression
  • SPARCs for Multi-terminal Source and Channel Coding
  • Open questions

(Joint work with Sekhar Tatikonda, Tuhin Sarkar, Adam Greig)


Lossy Compression

Codebook: e^{nR} length-n codewords. The source sequence S = (S1, …, Sn) is mapped to a reconstruction Ŝ = (Ŝ1, …, Ŝn), i.e., R nats/sample.

  • Distortion criterion: (1/n)‖S − Ŝ‖² = (1/n) Σ_k (S_k − Ŝ_k)²
  • For an i.i.d. N(0, ν²) source, the minimum distortion is ν² e^{−2R}
  • Can we achieve this with low-complexity codes? (Storage & computation; a quick numeric check of the limit follows.)
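A minimal sketch (assuming rate is measured in nats/sample, as on this slide) that evaluates the Gaussian distortion-rate function D*(R) = ν² e^{−2R} and the rate needed for a target distortion:

    import math

    def distortion_rate(R, var=1.0):
        """Minimum squared distortion for an i.i.d. N(0, var) source at rate R nats/sample."""
        return var * math.exp(-2.0 * R)

    def rate_distortion(D, var=1.0):
        """Rate (nats/sample) needed to reach squared distortion D for an N(0, var) source."""
        return 0.5 * math.log(var / D)

    # Example: at R = 1 nat/sample, a unit-variance source can be compressed
    # to squared distortion e^{-2} ~ 0.135.
    print(distortion_rate(1.0))    # 0.1353...
    print(rate_distortion(0.135))  # ~1.0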


SPARC Construction

[Figure: the design matrix A has n rows and ML columns, split into L sections of M columns each; β is sparse, with exactly one nonzero entry per section, taking value c_ℓ in section ℓ. Entries A_ij ∼ N(0, 1/n).]

Choosing M and L:
  • For a rate R codebook, need M^L = e^{nR}
  • Choose M polynomial in n ⇒ L ∼ n/log n
  • Storage complexity ↔ size of A: polynomial in n (see the sketch below)
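A minimal numpy sketch of this construction (the parameter values in the example are illustrative, not from the slides): it draws A with A_ij ∼ N(0, 1/n), places one nonzero coefficient per section of β, and forms the codeword Aβ.

    import numpy as np

    def make_sparc(n, M, L, coeffs, seed=0):
        """Build a SPARC codeword: A is n x ML, beta has one nonzero per section."""
        rng = np.random.default_rng(seed)
        A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, M * L))  # A_ij ~ N(0, 1/n)
        beta = np.zeros(M * L)
        for ell in range(L):
            j = rng.integers(M)                  # index of the chosen column in section ell
            beta[ell * M + j] = coeffs[ell]      # nonzero value c_ell
        return A, beta, A @ beta

    # Illustrative sizes: L sections of M columns, block length n.
    n, M, L = 200, 32, 10
    A, beta, codeword = make_sparc(n, M, L, coeffs=np.ones(L))
    print(A.shape, np.count_nonzero(beta))       # (200, 320) 10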


Optimal Encoding

Minimum-distance encoding: β̂ = arg min_{β ∈ SPARC} ‖S − Aβ‖²

Theorem [Venkataramanan, Tatikonda '12, '14]: For a source S i.i.d. ∼ N(0, ν²), the sequence of rate R SPARCs with growing n, L and M = L^b, b > b*(R), satisfies

  P( (1/n)‖S − Aβ̂‖² > D ) < e^{−n(E*(R,D) + o(1))}.

SPARCs therefore achieve the optimal rate-distortion function with the optimal error exponent E*(R, D).


Successive Cancellation Encoding

[Figure: step 1 operates on section 1 of the dictionary, with M columns.]

Step 1: Choose the column in section 1 that minimizes ‖S − c1 A_j‖²
  • Equivalently, take the maximum among the M inner products ⟨S, A_j⟩
  • c1 = √(2ν² log M)
  • Residual: R1 = S − c1 Â1, where Â1 is the chosen column


Step 2: Choose the column in section 2 that minimizes ‖R1 − c2 A_j‖²
  • Maximum among the M inner products ⟨R1, A_j⟩
  • c2 = √(2(log M) ν² (1 − 2R/L))
  • Residual: R2 = R1 − c2 Â2


Step L: Choose the column in section L that minimizes ‖R_{L−1} − c_L A_j‖²
  • c_L = √(2(log M) ν² (1 − 2R/L)^{L−1})
  • Final residual: R_L = R_{L−1} − c_L Â_L
  • Final distortion = (1/n)‖R_L‖²
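A minimal sketch of this successive-cancellation encoder (the coefficient formula c_i = √(2ν² log M) (1 − 2R/L)^{(i−1)/2} simply combines the per-step values above, with R in nats/sample):

    import numpy as np

    def sparc_encode(S, A, M, L, R, var):
        """Greedy successive-cancellation encoding of the source vector S."""
        beta_hat = np.zeros(M * L)
        residual = S.copy()
        for i in range(1, L + 1):
            c_i = np.sqrt(2 * var * np.log(M)) * (1 - 2 * R / L) ** ((i - 1) / 2)
            cols = A[:, (i - 1) * M : i * M]         # section i of the dictionary
            j = np.argmax(cols.T @ residual)         # max inner product <residual, A_j>
            beta_hat[(i - 1) * M + j] = c_i
            residual = residual - c_i * cols[:, j]   # subtract the chosen column
        return beta_hat, residual                    # (1/n)*||residual||^2 is the distortion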


Performance

Theorem [Venkataramanan, Sarkar, Tatikonda '13]: For an ergodic source S with mean 0 and variance ν², the encoding algorithm produces a codeword Aβ̂ that satisfies the following for sufficiently large M, L:

  P( (1/n)‖S − Aβ̂‖² > ν² e^{−2R} + Δ ) < e^{−κn(Δ − c log log M / log M)}

The deviation between the actual distortion and the optimal value is O(log log n / log n).

Encoding complexity: ML inner products and comparisons ⇒ polynomial in n.
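A hedged usage example (illustrative sizes, small enough to run quickly; it assumes the design matrix built as in the construction sketch and the `sparc_encode` sketch above), comparing the achieved distortion against ν² e^{−2R}:

    import numpy as np

    rng = np.random.default_rng(1)
    var = 1.0
    L, M = 15, 15 ** 3              # M = L^3, as in the experiment on the next slide
    n = 120
    R = L * np.log(M) / n           # rate in nats/sample, since M^L = e^{nR}

    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, M * L))
    S = rng.normal(0.0, np.sqrt(var), size=n)
    _, residual = sparc_encode(S, A, M, L, R, var)   # encoder sketch from above

    print("achieved distortion:", np.sum(residual ** 2) / n)
    print("optimal, nu^2 e^{-2R}:", var * np.exp(-2 * R))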


Numerical Experiment

Gaussian source: Mean 0, Variance 1

[Plot: distortion vs. rate (bits/sample) for the SPARC encoder, compared with the Shannon limit. Parameters: M = L³, L ∈ [30, 100].]


Why does the algorithm work?


Each section is a code of rate R/L (with L ∼ n/log n):

  • Step 1: S → R1 = S − c1 Â1, with |R1|² ≈ ν² e^{−2R/L} ≈ ν² (1 − 2R/L) for c1 = √(2ν² log M)
  • Step 2: treating R1 as the 'source', R1 → R2 = R1 − c2 Â2


  • Step i: treating R_{i−1} as the 'source', R_{i−1} → R_i = R_{i−1} − c_i Â_i. With c_i² = (2Rν²/L)(1 − 2R/L)^{i−1},
    |R_i|² ≈ |R_{i−1}|² (1 − 2R/L) ≈ ν² (1 − 2R/L)^i


Final distortion: |R_L|² ≈ ν² (1 − 2R/L)^L ≤ ν² e^{−2R}. This is an L-stage successive refinement, with L ∼ n/log n (a quick numeric check follows).
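A one-line numeric check (illustrative values) that ν²(1 − 2R/L)^L approaches ν² e^{−2R} from below as L grows:

    import math

    R, var = 1.0, 1.0                 # rate in nats/sample, source variance
    for L in (10, 100, 1000):
        print(L, var * (1 - 2 * R / L) ** L, var * math.exp(-2 * R))
    # The first printed value increases towards e^{-2} ~ 0.1353 as L grows.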


Successive Refinement Interpretation

  • The encoder successively refines the source over ∼ n/log n stages
  • The deviations in each stage can be significant:
      |R_i|² = ν² (1 − 2R/L)^i (1 + Δ_i)²,  i = 0, …, L,
    where ν² (1 − 2R/L)^i is the 'typical value' and Δ_i is the deviation
  • Key to the result: controlling the final deviation Δ_L
  • Recall: successive cancellation does not work for SPARC AWGN (channel) decoding
10 / 25

slide-20
SLIDE 20

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Open Questions in SPARC Compression

  • Better encoders with a smaller gap to D*(R)? Iterative soft-decision encoding, AMP?
  • The AMP decoder for AWGN channel decoding doesn't work when used directly for compression: it may need decimation, à la LDGM codes for compression
  • But recall: with min-distance encoding, SPARCs attain the rate-distortion function with the optimal error exponent
  • Compression performance with ±1 dictionaries
  • Compression of finite-alphabet sources


Sparse Regression Codes for multi-terminal networks


Codes for multi-terminal problems

Key ingredients:

  • Superposition (multiple-access channel, broadcast channel)
  • Random binning (e.g., distributed compression, channel coding with side information, …)

SPARC is based on superposition coding!


Multiple-Access Channel

[Figure: two users transmit X1 and X2 over an additive-noise channel; the decoder sees Y = X1 + X2 + Noise and outputs (M̂1, M̂2).]

Power constraints: ‖X1‖²/n ≤ P1, ‖X2‖²/n ≤ P2; Noise ∼ N(0, σ²).

The corner points of the capacity region are

  R1 = ½ log(1 + P1/(P2 + σ²)),  R2 = ½ log(1 + P2/σ²)

and

  R1 = ½ log(1 + P1/σ²),  R2 = ½ log(1 + P2/(P1 + σ²))
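A small sketch (illustrative powers and noise variance) that evaluates the two corner points above:

    import math

    def corner_points(P1, P2, sigma2):
        """Corner points of the two-user Gaussian MAC capacity region, in nats per channel use."""
        a = (0.5 * math.log(1 + P1 / (P2 + sigma2)), 0.5 * math.log(1 + P2 / sigma2))
        b = (0.5 * math.log(1 + P1 / sigma2), 0.5 * math.log(1 + P2 / (P1 + sigma2)))
        return a, b

    print(corner_points(P1=1.0, P2=2.0, sigma2=1.0))
    # Both corners have the same sum rate, 0.5 * log(1 + (P1 + P2) / sigma2).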


Successive Decoding

Y = X1 + X2 + Noise. The rate pair

  R1 = ½ log(1 + P1/(P2 + σ²)),  R2 = ½ log(1 + P2/σ²)

can be achieved with point-to-point codes:
  • X1 is decoded from Y, treating X2 as noise
  • Subtract off X1, then decode X2 at SNR P2/σ²

Easy to implement with SPARCs:
  • Rate R1 SPARC defined by an n × M1L1 matrix A1
  • Rate R2 SPARC defined by an n × M2L2 matrix A2
  • Y = A1β1 + A2β2 + Noise (a schematic decoding sketch follows)
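A schematic sketch of the successive decoder; here `decode_sparc` stands for a hypothetical point-to-point SPARC decoder (e.g. an AMP-style decoder), which is not specified on these slides:

    import numpy as np

    def successive_decode(Y, A1, A2, decode_sparc):
        """Decode user 1 treating user 2 as noise, subtract it off, then decode user 2."""
        beta1_hat = decode_sparc(Y, A1)           # X2 + noise acts as the effective noise
        residual = Y - A1 @ beta1_hat             # subtract user 1's decoded codeword
        beta2_hat = decode_sparc(residual, A2)    # user 2 now sees SNR P2 / sigma^2
        return beta1_hat, beta2_hat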


Joint Decoding

Y = [A1 A2] [β1; β2] + Noise

  • One can also decode the message pair β := [β1; β2] directly, using the design matrix A := [A1 A2]
  • This achieves all points in the capacity region
  • The idea extends to more than 2 users
  • The capacity region of the scalar Gaussian broadcast channel can be achieved with a similar superposition idea

Codes for the MAC and BC are straightforward because SPARC is already based on superposition coding!


Compression with Decoder Side-Information

[Figure: the encoder compresses X at rate R; the decoder has side information Y = X + Z and outputs X̂.]

  • Side information Y = X + Z, with X ∼ N(0, ν²) and Z ∼ N(0, σ²)
  • Want to compress X to within squared distortion D ∈ (0, var(X|Y)), where var(X|Y) = ν²σ²/(ν² + σ²)
  • [Wyner-Ziv '75]: The optimal rate-distortion function is R*(D) = ½ log( var(X|Y) / D ), for D ∈ (0, var(X|Y))
  • Want to achieve this with feasible encoding + decoding
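A minimal sketch (example values are illustrative) evaluating var(X|Y) and the Wyner-Ziv rate-distortion function above:

    import math

    def wyner_ziv_rate(D, nu2, sigma2):
        """R*(D) = 0.5 * log(var(X|Y) / D) in nats, for side information Y = X + Z."""
        var_x_given_y = nu2 * sigma2 / (nu2 + sigma2)
        assert 0 < D < var_x_given_y, "D must lie in (0, var(X|Y))"
        return 0.5 * math.log(var_x_given_y / D)

    print(wyner_ziv_rate(D=0.1, nu2=1.0, sigma2=1.0))   # var(X|Y) = 0.5, R* ~ 0.80 nats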


Wyner-Ziv Coding Scheme

Side information Y = X + Z, with X ∼ N(0, ν²) and Z ∼ N(0, σ²). The encoder uses a quantization codebook of 2^{nR1} codewords U, partitioned into 2^{nR} bins.


Encoder

  • Quantize X to U: find the codeword U that minimizes ‖X − U‖²
  • Let Z′ = X − U; choose R1 large enough that the quantization distortion satisfies

      ‖Z′‖²/n ≤ ( 1/ν² + 1/D − 1/var(X|Y) )^{−1}


Decoder

Y = X + Z = U + Z′ + Z

  • Find the codeword U within the indicated bin that minimizes ‖Y − U‖²
  • Reconstruct X̂ = E[ X | U, Y ]
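A small sketch computing the encoder's quantization-distortion target from the formula above (values illustrative):

    import math

    def quantization_target(D, nu2, sigma2):
        """Target for ||Z'||^2 / n, i.e. (1/nu^2 + 1/D - 1/var(X|Y))^{-1}."""
        var_x_given_y = nu2 * sigma2 / (nu2 + sigma2)
        return 1.0 / (1.0 / nu2 + 1.0 / D - 1.0 / var_x_given_y)

    print(quantization_target(D=0.1, nu2=1.0, sigma2=1.0))   # 1/9 ~ 0.111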


Binning with SPARCs


  • Quantize X to U using an n × ML SPARC (rate R1)
  • Divide each section into sub-sections of M′ columns
  • The encoder sends the indices of the sub-sections containing the chosen columns
  • Each bin is a collection of L sub-sections: (M/M′)^L = 2^{nR} bins (see the sketch below)
  • The decoder decodes Y to a codeword U within the smaller n × M′L SPARC defined by the bin
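A small sketch of the binning arithmetic above: mapping each chosen column to its sub-section index and counting the bins (parameter values illustrative):

    def bin_index(chosen_cols, M_sub):
        """Per-section sub-section indices: this list is the bin label the encoder sends."""
        # chosen_cols[l] is the within-section index (0..M-1) of the column picked in section l.
        return [j // M_sub for j in chosen_cols]

    M, M_sub, L = 64, 8, 4
    print((M // M_sub) ** L)                      # number of bins = (M / M')^L = 4096
    print(bin_index([5, 13, 63, 40], M_sub))      # [0, 1, 7, 5]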


Writing on Dirty Paper

[Figure: the encoder knows the message M and the interference S; the channel output is Y = X + S + ε; the decoder outputs M̂.]

Y = X + S + ε, with S ∼ N(0, σ_s²), ε ∼ N(0, σ²), and power constraint ‖X‖²/n ≤ P.

Theorem [Gelfand-Pinsker '80, Costa '83]: The capacity of this channel is ½ log(1 + P/σ²).

High-rate channel code split into bins of lower-rate "source" codes.


SPARC Construction

Y = X + S + ε, with power constraint ‖X‖²/n ≤ P.

[Figure: rate R1 SPARC with L sections of M columns each; every section is split into sub-sections of M′ columns.]

Encoder

  • n × ML SPARC of rate R1
  • Divide each section into sub-sections of M′ columns
  • This defines (M/M′)^L = 2^{nR} bins


  • Within the message's bin, quantize S to U using a rate (R1 − R) SPARC
  • Transmit X = U − αS, for an appropriately chosen constant α (a hedged sketch of one standard choice follows)
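The slides leave α unspecified; a minimal sketch, assuming the standard Costa choice α = P/(P + σ²) (an assumption here, not stated on the slide), together with the capacity expression from the previous slide:

    import math

    def dirty_paper(P, sigma2):
        """Dirty-paper capacity from the slide, plus the (assumed) Costa scaling alpha."""
        capacity = 0.5 * math.log(1 + P / sigma2)   # nats per channel use
        alpha = P / (P + sigma2)                    # standard Costa choice (assumption)
        return capacity, alpha

    print(dirty_paper(P=1.0, sigma2=0.5))   # (0.549..., 0.666...)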


Decoder

Y = X + S + ε  ↔  Y = (1 + κ)U + ε′

  • Decode U from Y using the big (rate R1) codebook


Binning with SPARCs

Theorem (Venkataramanan-Tatikonda '12)

With optimal (ML) encoding and decoding, SPARCs attain the optimal information-theoretic rates for the Gaussian Wyner-Ziv and Gelfand-Pinsker models, with probability of error decaying exponentially in n.


Summary

Sparse Regression Codes:

  • Rate-optimal for Gaussian point-to-point communication and compression
  • Low-complexity encoding and decoding algorithms
  • Nice structure that enables binning and superposition


Ongoing Work/Open Questions

– Power allocation for binning with feasible encoding & decoding: the optimal allocations for the source and channel coding parts are different!
– SPARCs for Gaussian channel coding, source coding, binning + superposition ⇒ low-complexity, rate-optimal codes for:

  • Distributed Lossy Compression ("Berger-Tung")
  • Gaussian Multiple Descriptions
  • Gaussian Relay Channels
  • Fading Channels, MIMO Channels
  • Gaussian Multi-terminal Networks
  • …

– SPARCs for interpreting variables that arise in converses


References

– R. Venkataramanan, A. Joseph and S. Tatikonda, "Lossy Compression via Sparse Linear Regression: Performance under Minimum-distance Encoding," IEEE Trans. Inf. Theory, June 2014.
– R. Venkataramanan, T. Sarkar and S. Tatikonda, "Lossy Compression via Sparse Linear Regression: Computationally Efficient Encoding and Decoding," IEEE Trans. Inf. Theory, June 2014.
– R. Venkataramanan and S. Tatikonda, "The Rate-Distortion Function and Error Exponent of Sparse Regression Codes with Optimal Encoding," http://arxiv.org/abs/1401.5272 (short version at ISIT '14).
– R. Venkataramanan and S. Tatikonda, "Sparse Regression Codes for Multi-terminal Source and Channel Coding," Allerton 2012.
