SLIDE 1

Sparse Regression Codes

Information and Control in Networks

October 18, 2012

Sekhar Tatikonda (Yale University) in collaboration with Ramji Venkataramanan (University of Cambridge) Antony Joseph (UC-Berkeley) Tuhin Sarkar (IIT-Bombay)

slide-2
SLIDE 2

Outline

Summary

  • Lossy coding is a fundamental component of networked control
  • Efficient codes for lossy compression of Gaussian sources
  • Based on sparse regression
  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
slide-3
SLIDE 3

Gaussian Data Compression

Source S = (S_1, . . . , S_n), codebook of 2^{nR} codewords (R bits/sample), reconstruction Ŝ = (Ŝ_1, . . . , Ŝ_n)

S: i.i.d. Gaussian source N(0, σ²)

MSE distortion: (1/n)‖S − Ŝ‖² ≤ D

Achievable iff R > R*(D) = (1/2) log(σ²/D)

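As a quick numerical check of the rate-distortion formula above, here is a minimal sketch; the σ² and D values are arbitrary illustrations, not numbers from the talk:

```python
import numpy as np

# Minimum rate (bits/sample) to compress an i.i.d. N(0, sigma^2) source to MSE D.
def rate_distortion(sigma2, D):
    return 0.5 * np.log2(sigma2 / D)

print(rate_distortion(sigma2=1.0, D=0.25))   # 1.0 bit/sample
```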
SLIDE 4

Achieving R∗(D)

Shannon random coding

  • Codewords Ŝ(1), . . . , Ŝ(2^{nR}), each i.i.d. N(0, σ² − D)
  • Exponential storage & encoding complexity

Lattice codes - compact representation

  • Conway-Sloane, Eyuboglu-Forney, Zamir-Shamai-Erez, . . .

GOAL: Compact representation + fast encoding & decoding

SLIDE 5

Related Work

Sparse regression codes for source coding

  • [Kontoyiannis, Rad, Gitzenis ITW ’10]
  • Computationally feasible constructions for finite-alphabet sources:
  • Gupta, Verdu, Weissman [ISIT ’08]
  • Jalali, Weissman [ISIT ’10]
  • Kontoyiannis, Gioran [ITW ’10]
  • LDGM codes: [Wainwright, Maneva, Martinian ’10]
  • Polar codes: [Korada, Urbanke ’10]

SLIDE 6

In this talk . . .

Ensemble of codes based on sparse linear regression

  • For point-to-point & multi-terminal problems

Provably achieve rates close to info-theoretic limits

  • with fast encoding + decoding

Based on construction of Barron & Joseph for AWGN channel

  • Achieve capacity with fast decoding [ISIT ’10, Arxiv ’12]

SLIDE 7

Outline

  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
SLIDE 8

Sparse Regression Codes (SPARC)

A: n × ML design matrix or ‘dictionary’ with i.i.d. N(0, 1) entries, partitioned into L sections of M columns each (Section 1, Section 2, . . . , Section L)

β: sparse ML × 1 vector with exactly one non-zero entry per section, each equal to c

Codewords of the form Aβ

  • c² = codeword variance / L

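As a concrete toy illustration of this structure, the sketch below builds a small dictionary and forms one codeword Aβ; the parameter values (n, M, L), the seed, and the unit codeword variance are arbitrary choices for illustration, not values from the talk:

```python
import numpy as np

# Toy SPARC codebook: dictionary A with L sections of M columns, one scaled
# column selected per section, codeword = A @ beta.
n, M, L = 64, 8, 4                    # illustrative sizes
R = L * np.log2(M) / n                # rate in bits/sample: M^L codewords
c = np.sqrt(1.0 / L)                  # non-zero value; codeword variance L*c^2 = 1 here

rng = np.random.default_rng(0)
A = rng.standard_normal((n, M * L))   # i.i.d. N(0, 1) entries

def make_beta(cols):
    """cols[l] in {0, ..., M-1}: chosen column within section l."""
    beta = np.zeros(M * L)
    for l, j in enumerate(cols):
        beta[l * M + j] = c           # exactly one non-zero entry per section
    return beta

codeword = A @ make_beta([3, 0, 7, 5])   # one of the M^L codewords
```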
SLIDE 9

SPARC Construction

[Figure: SPARC dictionary A (n rows) with L sections of M columns each, as on the previous slide]

Choosing M and L

For a rate-R codebook, need M^L = 2^{nR}

Shannon codebook: L = 1, M = 2^{nR}

We choose M = L^b ⇒ L ∼ Θ(n/log n)

Size of A ∼ n × (n/log n)^{b+1}: polynomial in n

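A hedged numeric sketch of this parameter choice, with arbitrary illustrative values of n, R and b: solve M^L = 2^{nR} with M = L^b and report the resulting dictionary size.

```python
import numpy as np

# Solve L * b * log2(L) = n * R for the smallest integer L, then set M = L^b.
n, R, b = 1000, 0.5, 2.0              # illustrative values only
L = 2
while L * b * np.log2(L) < n * R:
    L += 1
M = round(L ** b)
print(f"L = {L}, M = {M}, entries in A = {n * M * L:,}")   # polynomial in n
```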
SLIDE 10

Minimum Distance Encoding

[Figure: SPARC dictionary A with L sections of M columns each]

Encoder: find β̂ = argmin_β ‖S − Aβ‖

Decoder: reconstruct Ŝ = Aβ̂

P_n = P( (1/n)‖S − Ŝ‖² > D )

  • Error exponent: T = − lim sup_n (1/n) log P_n ⇒ P_n ≲ e^{−nT}

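To make the encoding rule concrete, here is a hedged brute-force sketch of minimum-distance encoding on toy parameters; the exhaustive search over all M^L column choices is exactly the exponential cost that motivates the practical encoder later in the talk (names and sizes are illustrative):

```python
import itertools
import numpy as np

# Brute-force minimum-distance SPARC encoding (exponential in L; toy sizes only).
n, M, L = 32, 4, 3
c = np.sqrt(1.0 / L)
rng = np.random.default_rng(1)
A = rng.standard_normal((n, M * L))
S = rng.standard_normal(n)                      # source sequence ~ N(0, 1)

def codeword(cols):
    """Sum of one scaled column taken from each section."""
    return c * sum(A[:, l * M + j] for l, j in enumerate(cols))

best = min(itertools.product(range(M), repeat=L),        # all M^L codewords
           key=lambda cols: np.sum((S - codeword(cols)) ** 2))
S_hat = codeword(best)
distortion = np.mean((S - S_hat) ** 2)          # (1/n) ||S - S_hat||^2
```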
SLIDE 11

Outline

  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
SLIDE 12

Correlated Codewords

Each codeword is a sum of L columns; codewords Ŝ(i), Ŝ(j) are dependent if they share common columns.

[Figure: dictionary A (n rows) with L sections of M = L^b columns each]

# codewords dependent with Ŝ(i) = M^L − 1 − (M − 1)^L

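A quick, hedged sanity check of that count on toy values (the reference codeword and the sizes are arbitrary):

```python
from itertools import product

# Count codewords sharing at least one chosen column with a fixed reference codeword.
M, L = 4, 3
ref = (0,) * L                                  # reference: column 0 in every section
dependent = sum(1 for cols in product(range(M), repeat=L)
                if cols != ref and any(c == r for c, r in zip(cols, ref)))
assert dependent == M**L - 1 - (M - 1)**L       # 63 - 27 = 36
```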
SLIDE 13

Error Analysis for SPARC

P(E) ≤ P(|S|² ≥ a²) + P(E | |S|² < a²)

  • First term: exponent given by a KL divergence
  • Second term: ?

Define U_i(S) = 1 if |Ŝ(i) − S|² < D, and 0 otherwise

P(E(S) | |S|² < a²) = P( Σ_{i=1}^{2^{nR}} U_i(S) = 0 | |S|² < a² )

  • {U_i(S)} are dependent

SLIDE 14

Dependency Graph

For random variables {U_i}, i ∈ I: any graph with vertex set I such that, if A and B are two disjoint subsets of I with no edges between them, then the families {U_i}, i ∈ A, and {U_i}, i ∈ B, are independent.

SLIDE 15

For our problem . . .

U_i(S) = 1 if |Ŝ(i) − S|² < D, and 0 otherwise,   i = 1, . . . , 2^{nR}

For the family {U_i(S)}, the relation {i ∼ j : i ≠ j and Ŝ(i), Ŝ(j) share at least one common term} defines a dependency graph.

SLIDE 16

Suen’s correlation inequality

Let {U_i}, i ∈ I, be Bernoulli random variables with dependency graph Γ. Then

P( Σ_{i∈I} U_i = 0 ) ≤ exp( − min( λ/2, λ²/(8Δ), λ/(6δ) ) )

where

λ = Σ_{i∈I} E[U_i],   Δ = (1/2) Σ_{i∈I} Σ_{j∼i} E[U_i U_j],   δ = max_{i∈I} Σ_{k∼i} E[U_k].

SLIDE 17

Optimal Error Exponent for Gaussian Source

[Figure: spheres |S|² = σ² and |S|² = a²]

R = (1/2) log(a²/D)   [Ihara, Kubo ’00]

2^{nR} codewords, i.i.d. N(0, a² − D)

P_n ≲ P(|S|² ≥ a²) + P(|S|² < a²) · P( error | |S|² < a² )

  • First term ∼ exp(−n D(a²‖σ²))
  • Second term ↓ double-exponentially

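For intuition about the first exponent, the sketch below evaluates the divergence term, reading D(a²‖σ²) as the KL divergence between N(0, a²) and N(0, σ²), i.e. ½(a²/σ² − 1 − ln(a²/σ²)); this reading and the numeric values are assumptions for illustration, not stated on the slide:

```python
import numpy as np

# Exponent governing P(|S|^2 >= a^2), read as KL( N(0, a^2) || N(0, sigma^2) ).
def kl_exponent(a2, sigma2):
    r = a2 / sigma2
    return 0.5 * (r - 1.0 - np.log(r))

print(kl_exponent(a2=1.3, sigma2=1.0))          # illustrative values
```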
SLIDE 18

Main Result

[Figure: dictionary A (n rows) with L sections of M = L^b columns each]

Theorem

SPARCs with minimum-distance encoding achieve the rate-distortion function with the optimal error exponent when b > 3.5R / (R − (1 − 2^{−2R})). This is possible whenever D/σ² < 0.203.

Codebook representation polynomial in n: n × (n/log n)^{b+1} elements

SLIDE 19

Performance: Min-distance Encoding

[Plot: Rate (bits) vs. D/σ²; curves 0.5 log(σ²/D) and 1 − D/σ²]

SLIDE 20

Outline

  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
SLIDE 21

SPARC Construction

[Figure: dictionary A (n rows, ML columns) with L sections of M columns each; β has one non-zero entry c_ℓ in section ℓ]

Choosing M and L: for a rate-R codebook, need M^L = 2^{nR}

Choose M polynomial in n ⇒ L ∼ n/log n

Storage complexity ↔ size of A: polynomial in n

SLIDE 22

A Simple Encoding Algorithm

[Figure: Section 1 of the dictionary]

Step 1: Choose the column in Section 1 that minimizes ‖X − c_1 A_j‖²

  • Equivalently, max among inner products ⟨X, A_j⟩
  • ‘Residue’ R_1 = X − c_1 Â_1

SLIDE 23

A Simple Encoding Algorithm

[Figure: Section 2 of the dictionary]

Step 2: Choose the column in Section 2 that minimizes ‖R_1 − c_2 A_j‖²

  • Max among inner products ⟨R_1, A_j⟩
  • Residue R_2 = R_1 − c_2 Â_2

SLIDE 24

A Simple Encoding Algorithm

[Figure: Section L of the dictionary]

Step L: Choose the column in Section L that minimizes ‖R_{L−1} − c_L A_j‖²

  • Max among inner products ⟨R_{L−1}, A_j⟩
  • Final residue R_L = R_{L−1} − c_L Â_L

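Putting the L steps together, here is a hedged sketch of the successive encoder; the parameters are toy values and the per-section coefficients c_ℓ are taken equal for simplicity, whereas the talk lets them vary by section:

```python
import numpy as np

# Greedy successive SPARC encoding: in each section, pick the column with the
# largest inner product with the current residue, then update the residue.
n, M, L = 64, 16, 8                       # toy sizes
c = np.full(L, np.sqrt(1.0 / L))          # section coefficients (equal here)
rng = np.random.default_rng(2)
A = rng.standard_normal((n, M * L))
X = rng.standard_normal(n)                # source sequence to compress

residue = X.copy()
chosen = []
for l in range(L):
    section = A[:, l * M:(l + 1) * M]              # the M columns of section l
    j = int(np.argmax(section.T @ residue))        # max <residue, A_j> over the section
    chosen.append(j)
    residue -= c[l] * section[:, j]                # R_l = R_{l-1} - c_l * A_hat_l

X_hat = X - residue                       # sum of the chosen scaled columns
distortion = np.mean(residue ** 2)        # (1/n) ||X - X_hat||^2
```

Encoding costs ML inner products and comparisons in total, which matches the complexity claim on the next slide.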
SLIDE 25

Performance

Theorem (RV, Sarkar, Tatikonda ’12)

The proposed encoding algorithm approaches the rate-distortion function with exponentially small probability of error. In particular,

P( Distortion > σ²e^{−2R} + ∆ ) ≤ e^{−L∆}   for ∆ ≥ 1/log M.

Computation Complexity

ML inner products and comparisons ⇒ polynomial in n

Storage Complexity

Design matrix A: n × ML ⇒ polynomial in n

SLIDE 26

Outline

  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
SLIDE 27

Point-to-point Communication

[Diagram: M → Encoder → X → + (Noise) → Z → Decoder → M̂]

Z = X + Noise,   ‖X‖²/n ≤ P,   Noise ∼ Normal(0, N)

SPARCs

Provably good with low-complexity decoding

  • [Barron-Joseph, ISIT ’10, ’11, arXiv ’12]

SLIDE 28

SPARC Construction

[Figure: dictionary A (n rows, ML columns) with L sections of M columns each; β has one non-zero entry c_ℓ per section]

β ↔ message, codeword Aβ

For a rate-R codebook, need M^L = 2^{nR}

  • choose M polynomial in n ⇒ L ∼ n/log n

Adaptive successive decoding achieves R < Capacity

SLIDE 29

Wyner-Ziv coding

[Diagram: X → Encoder → rate R → Decoder → X̂, with side information Y at the decoder]

Side-info Y = X + Z,   X ∼ N(0, σ²),   Z ∼ N(0, N)

SLIDE 30

Wyner-Ziv coding

Auxiliary codebook: 2^{nR₁} codewords for U

Encoder

U = X + V,   V ∼ N(0, Q). Quantize X to U

  • Find the U that minimizes ‖X − aU‖²,   a = σ²/(σ² + Q)

SLIDE 31

Wyner-Ziv coding

Group the 2^{nR₁} U-codewords into 2^{nR} bins

Encoder

Quantize X to U as before (minimize ‖X − aU‖², a = σ²/(σ² + Q)); send the index of the bin containing U

SLIDE 32

Wyner-Ziv coding

Decoder

Y = X + Z  ↔  Y = aU + Z′

Find the U within the received bin that minimizes ‖Y − aU‖²

  • Reconstruct X̂ = E[X | U, Y]

SLIDE 33

Binning with SPARCs

[Figure: SPARC dictionary A with L sections of M columns each]

Quantize X to aU using an n × ML SPARC (rate R₁)

SLIDE 34

Binning with SPARCs

[Figure: each section of M columns is split into subsections of M′ columns]

Quantize X to aU using an n × ML SPARC (rate R₁), with (M/M′)^L = 2^{nR}

SLIDE 35

Binning with SPARCs

[Figure: each section of M columns is split into subsections of M′ columns]

Quantize X to aU using an n × ML SPARC (rate R₁), with (M/M′)^L = 2^{nR}

Bin: defined by one subsection from each section

  • Encoder only sends the indices of the non-zero subsections

Decoder decodes Y to U within the smaller n × M′L SPARC defined by the bin

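To make the binning concrete, here is a hedged sketch of the index bookkeeping; splitting each section into contiguous subsections of M′ columns is an illustrative choice, since the talk does not fix a particular indexing:

```python
# Each section of M columns is split into M // M_sub subsections of M_sub columns;
# a bin fixes one subsection per section, so there are (M // M_sub)**L bins.
M, M_sub, L = 16, 4, 3                     # illustrative sizes (M_sub plays the role of M')

def bin_index(columns):
    """Bin of a codeword = which subsection its chosen column lies in, per section."""
    return tuple(j // M_sub for j in columns)

def columns_in_bin(bin_idx, section):
    """Columns of `section` the decoder searches within this bin."""
    start = bin_idx[section] * M_sub
    return range(start, start + M_sub)

cols = (5, 14, 2)                          # chosen column per section (rate-R1 codeword)
print(bin_index(cols))                     # (1, 3, 0): what the encoder transmits
print(list(columns_in_bin(bin_index(cols), section=0)))   # [4, 5, 6, 7]
```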
SLIDE 36

Writing on Dirty Paper

[Diagram: M → Encoder → X → + (interference S, Noise) → Z → Decoder → M̂; S known at the encoder]

Z = X + S + N,   ‖X‖²/n ≤ P

SLIDE 37

Writing on Dirty Paper

Z = X + S + N,   ‖X‖²/n ≤ P

[Figure: SPARC dictionary A with L sections of M columns each]

Encoder

n × ML SPARC of rate R₁; divide each section into subsections of M′ columns

  • Defines (M/M′)^L = 2^{nR} bins

SLIDE 39

Writing on Dirty Paper

Z = X + S + N,   ‖X‖²/n ≤ P

[Figure: SPARC dictionary with each section split into subsections of M′ columns]

Encoder

Within the message bin, ‘quantize’ S to U:  U = X + αS,   U ∼ N(0, P + α²σ²_s)

SLIDE 40

Writing on Dirty Paper

Z = X + S + N,   ‖X‖²/n ≤ P

[Figure: SPARC dictionary with each section split into subsections of M′ columns]

Decoder

Z = X + S + N  ↔  Z = (1 + κ)U + N′

Decode U from Z within the big (rate R₁) codebook

SLIDE 41

Main Result

[Diagrams: the Wyner-Ziv setup (X → Encoder → rate R → Decoder → X̂, with side information Y) and the Gelfand-Pinsker setup (M → Encoder → X → + (S, Noise) → Y → Decoder → M̂)]

Theorem

SPARCs attain the optimal information-theoretic limits for the Gaussian Wyner-Ziv and Gelfand-Pinsker problems with exponentially decaying probability of error.

SLIDE 42

Other multi-terminal networks

Multiple-access

[Diagram: X₁, X₂, X₃ → + (Noise) → Z → Decoder → M̂₁, M̂₂, M̂₃]

Broadcast

[Diagram: X → three noisy channels → Z₁, Z₂, Z₃ → M̂₁, M̂₂, M̂₃]

SLIDE 43

Outline

  • Background
  • Sparse Regression Codes
  • Optimal Encoding
  • Practical Encoding
  • Multi-terminal Extensions
  • Conclusions
SLIDE 44

Summary

Sparse Regression Codes

  • Rate-optimal codes for compression and communication
  • Low-complexity coding algorithms
  • Nice structure that enables

  • Binning (Wyner-Ziv, Gelfand-Pinsker)
  • Superposition (Multiple-access, Broadcast)

Future Directions

  • Interference channels, multiple descriptions, . . .
  • Improved coding algorithms - ℓ₁ minimization etc.?
  • General design matrices
  • Finite-field analogs?
