Support Material for Presentation of Orange Book on LDPC Code

Jet Propulsion Laboratory California Institute of Technology Support Material for Presentation of Orange Book on LDPC Code Selection for CCSDS Standard CCSDS, Toulouse, Nov. 15, 2004 JPL Proprietary Material 1 CCSDS, Toulouse, Nov., 2004


SLIDE 1

Jet Propulsion Laboratory

California Institute of Technology CCSDS, Toulouse, Nov., 2004

JPL Proprietary Material

Support Material for Presentation of Orange Book on LDPC Code Selection for CCSDS Standard

CCSDS, Toulouse, Nov. 15, 2004

SLIDE 2

LDPC Code Family Construction Method

  • Protographs are “skeletons” of the code family
      – Selected from dozens (hundreds?) of candidates for:
          – Low threshold, determined by Density Evolution
          – Small size, for simple implementation
          – Small edge degrees, to reduce node complexity
  • Circulants used to expand protographs, so:
      – Code description tables are small
      – Hardware has fast, simple memory addressing
  • Progressive Edge Growth (PEG) used to select circulants
      – PEG is a greedy algorithm that chooses “good” circulants
      – ACE criterion defines how good a set of circulants is
      – ACE (Approximate Cycle EMD) is a low-complexity Extrinsic Message Degree measure
      – Better than minimum loop length for preventing low-weight codewords
  • Simulation (in hardware and software) used to determine error floor and number of iterations required

  • Code is constructed by extending a seed protograph into a large code, where interconnections are organized as “circulants”, and avoiding short loops
      – lower complexity decoders, while maintaining the relevant code characteristics for good performance
      – fast encoders
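
The circulant expansion described on this slide can be sketched in a few lines. This is a minimal illustration, not the JPL construction: the 2 x 3 protograph and the shift values below are hypothetical, whereas the slides select circulants with PEG and the ACE criterion. Each protograph edge becomes a cyclically shifted identity block, so the whole code is described by one shift per edge.

```python
def circulant(size, shift):
    """size x size permutation matrix: row r has its single 1 in column (r + shift) % size."""
    return [[1 if col == (row + shift) % size else 0 for col in range(size)]
            for row in range(size)]


def expand(protograph, shifts, size):
    """Replace every protograph edge by a circulant and every non-edge by a zero block."""
    h = []
    for proto_row, shift_row in zip(protograph, shifts):
        blocks = [circulant(size, s) if edge else [[0] * size for _ in range(size)]
                  for edge, s in zip(proto_row, shift_row)]
        for r in range(size):
            # Concatenate row r of every block in this block-row.
            h.append([bit for block in blocks for bit in block[r]])
    return h


# Hypothetical protograph (rows = check nodes, columns = variable nodes) and
# hypothetical shifts; neither is taken from the slides.
PROTO = [[1, 1, 0],
         [0, 1, 1]]
SHIFTS = [[0, 1, 0],
          [0, 2, 3]]

H = expand(PROTO, SHIFTS, size=4)
print(len(H), "x", len(H[0]))
```

Every row and column of the expanded matrix keeps the degree of the protograph node it was copied from, which is why the protograph's threshold carries over to the whole family.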

SLIDE 3

LDPC Code Selection for Standard (Cont’d)

Family of protographs

Protograph of ARA family: code rate = (n+1)/(n+2), n = 0, 1, ...
[Figure: ARA protograph; node and edge labels 3, 4, 2, 1, 2, 1, and 2n]

This simple seed protograph, replicated enough times to obtain the large code, yields a much more structured code, suitable for high-speed decoding.

Threshold table (near-capacity)

Code Rate   Protograph Threshold (dB)   Capacity (dB)   Difference (dB)
1/2         0.516                       0.187           0.329
2/3         1.288                       1.059           0.229
4/5         2.277                       2.040           0.237
7/8         3.129                       2.845           0.284

Fast encoder structure

[Figure: encoder block diagram; labels: input message, sparse circulant G matrices, sparse matrix multiplies, accumulate, permute, puncture, output codeword, D, α, Π1, Π2, Π3, Π4+Π5, Π6+Π7, s0, s1, p0, p1, p2]

Code Family (code rates and block lengths)

Information block   Code block length n
length k            rate 1/2   rate 2/3   rate 4/5
1024                2048       1536       1280
4096                8192       6144       5120
16384               32768      24576      20480
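
The code-family table follows directly from n = k / rate. The short check below recomputes the nine code block lengths from the three information block lengths and three rates on the slide; the variable and function names are illustrative only.

```python
from fractions import Fraction

INFO_LENGTHS = [1024, 4096, 16384]                        # k, from the slide
RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(4, 5)]  # from the slide


def code_length(k, rate):
    """Transmitted code block length n for information length k at the given rate."""
    n = k / rate                      # exact rational arithmetic
    assert n.denominator == 1, "k must be divisible by the rate numerator"
    return int(n)


family = {k: [code_length(k, r) for r in RATES] for k in INFO_LENGTHS}
for k, lengths in family.items():
    print(k, lengths)
```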

SLIDE 4

ARAx2_4c_64c parity check matrix, with structure indicated

  • Protograph is 3 rows by 5 columns
  • Expanded 2 times (by hand) to eliminate parallel edges
  • Expanded 4 times with circulants to introduce necessary irregularity
  • Expanded 64 times with circulants to construct full code

Parity check matrix

SLIDE 5

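[Figure: block structure of the parity check matrix (m rows, m+k columns); block rows and columns of size M; block columns numbered 0 to 7 (index j) and labeled s0, s1, s2, p0, p1, p2, each marked transmitted or punctured; permutation blocks Π1, Π2, Π3, Π4+Π5, Π6+Π7, and α; label c2(j)]

n = 2048, k = 1024, m = 1536, M = 512
punctured rate = k/n = 1/2
unpunctured rate = k/(m+k) = 2/5
m = n + (punctured) - k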
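
The parameter relations on this slide can be checked with exact arithmetic. The sketch below assumes that the number of punctured symbols is 512, which is the value that makes m = n + (punctured) - k come out to the stated m = 1536 and equals the block size M = 512.

```python
from fractions import Fraction

n, k = 2048, 1024      # transmitted length and information length, from the slide
punctured = 512        # assumption: one block of size M = 512 is punctured

m = n + punctured - k                    # number of parity checks (rows of H)
punctured_rate = Fraction(k, n)          # rate seen on the channel
unpunctured_rate = Fraction(k, m + k)    # rate of the mother code with m + k symbols

print(m, punctured_rate, unpunctured_rate)
```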

SLIDE 6

Rates 1/2, 2/3, and 4/5 at k=16384; rate 0.87451 at k=7136

Performance curves

SLIDE 7

Ten proposed codes: blue=1/2, green=2/3, red=4/5, black=0.87451

Performance curves

SLIDE 8

  • Followed a two-pronged development approach: conceived and developed two promising types of decoders. Selection of best approach is in progress.

Hardware platform:

  • BenONE™: single-slot DIME-II™ motherboard PCI card
  • Java GUI user interface for remote access to HW platform
  • $14K purchase from Nallatech
  • Daughter card BenDATA-WS™: 24 MByte ZBT SRAM, Xilinx Virtex II, 8M gates

  • K. Andrews, C. Jones

FY04 Accomplishments (Cont’d): LDPC Decoder Architecture

SLIDE 9

  • D. Divsalar, J. Lee, J. Thorpe, K. Andrews, A. Abbasfar

Parallel Decoder Type-1 – for protograph codes

Parallelization method

  • Decodes one protograph per clock cycle per half-iteration
  • Protograph with k input bits has:

        Decoder speed = (k/2) x clock speed / iterations

    (each iteration takes two clock cycles, one per half-iteration)
  • Example: ARA protograph with 16 input bits (expanded by 8 from seed protograph) yields 20 Mbps FPGA decoder with 50 MHz clock and 20 average iterations
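
The speed formula and the worked example on this slide can be reproduced numerically. The function name is illustrative; the inputs (16 input bits, 50 MHz, 20 iterations, 512 slices) are the slide's own figures.

```python
def type1_throughput_bps(k_proto, clock_hz, iterations):
    """Decoder speed = (k/2) x clock speed / iterations, as stated on the slide.

    One protograph copy is processed per clock per half-iteration, so a full
    iteration costs two clocks and delivers k_proto decoded input bits.
    """
    return (k_proto / 2) * clock_hz / iterations


speed = type1_throughput_bps(k_proto=16, clock_hz=50e6, iterations=20)
max_input_block = 16 * 512   # 16 input bits per slice, up to 512 slices

print(speed / 1e6, "Mbps;", max_input_block, "input bits")
```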

Pros:

  • Highly parallel architecture for fast decoders
  • Regular structure

Cons:

  • Little code flexibility: tailored to protograph codes

Expanded protograph has 40 variable nodes, 24 check nodes, and 112 edges. FPGA can support up to 512 slices of protograph. This corresponds to an input block size of up to 8192 bits. FPGA utilization factor is 39% logic, 67% RAM.

[Figure: parallelization method; protograph slices #1, #2, ..., #N; legend: check node, variable node connected to channel, variable node not connected to channel]

FY04 Accomplishments (Cont’d)

SLIDE 10

  • D. Divsalar, J. Lee, J. Thorpe, K. Andrews, S. Dolinar

Parallel Decoder Type-1 – for protograph codes (Cont’d)

Hardware implementation

  • Developed high-speed decoder architecture that needs only simple addition operations at both variable and check nodes
      – Variable nodes add “reliabilities” = log-likelihoods
      – Check nodes add “unreliabilities”
      – Exchanged messages transformed between reliability and unreliability
  • Non-uniform quantizer designed to maximize performance while simplifying this transformation
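
The slide does not give the reliability/unreliability transformation explicitly. A common choice with exactly this add-only property is the self-inverse function phi(x) = -ln tanh(x/2); the sketch below assumes that function (an assumption, not something stated in the deck) and checks that adding "unreliabilities" at a check node reproduces the standard tanh rule for message magnitudes. Signs would be handled separately, and the hardware's non-uniform quantizer is not modeled.

```python
import math


def phi(x):
    """Assumed reliability <-> unreliability map; phi(phi(x)) == x for x > 0."""
    return -math.log(math.tanh(x / 2.0))


def variable_node(llrs):
    """Variable nodes simply add reliabilities (log-likelihood ratios)."""
    return sum(llrs)


def check_node_magnitude(reliabilities):
    """Check node: transform, add the unreliabilities, transform back."""
    return phi(sum(phi(r) for r in reliabilities))


def tanh_rule_magnitude(reliabilities):
    """Reference: 2 * atanh(product of tanh(r / 2))."""
    product = 1.0
    for r in reliabilities:
        product *= math.tanh(r / 2.0)
    return 2.0 * math.atanh(product)


incoming = [1.2, 0.7, 2.5]   # hypothetical message magnitudes
print(check_node_magnitude(incoming), tanh_rule_magnitude(incoming))
```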

[Figure: decoder implementation for sample protograph; edge memories, variable node processors, constraint node processors, Rel/Unrel transformation, variable nodes, check nodes]

FY04 Accomplishments (Cont’d)

[Figure: quantized reliability/unreliability transformation]

  • Suitable for in-situ communications. Estimates predict 32 Msps using XQR2V6000 radiation-tolerant FPGA (largest rad-tol FPGA available today)

SLIDE 11

Parallel Decoder – Type-1 for Protograph-based LDPC codes (cont.)

  • Developed design for efficient use of Virtex FPGA block RAM memories to maximize both the decoder’s speed and decodable code size
      – Comparison: Type-1 protograph decoder processing e edges in parallel every half-iteration is roughly e/(2L) times faster than Type-2 universal decoder processing 2L edges in parallel every half-iteration
          – e.g., e = 140 vs. L = 16 yields speedup factor > 4
      – Nearly a factor of 2 additional parallelizability/speedup may be possible if the FPGA logic can make use of the Virtex RAM’s read-before-write mode
          – This would increase the parallelizability limit on e; revised constraint would be e/2 + n/2 < B
          – e.g., e = 18*14 = 252 for the rate-1/2 ARA protograph would yield speedup factor ≈ 8 vs. universal decoder with L = 16
      – Maximum decodable code size is (nT, kT), where (n,k) is the size of the protograph and T is the size of the circulant expansion
          – e.g., (nT,kT) = (40960, 20480) for the rate-1/2 ARA protograph expanded to e = 140
          – e.g., (nT,kT) = (73728, 36864) for the rate-1/2 ARA protograph expanded to e = 252

  • K. Andrews

Oct’04 Accomplishments (Cont’d)

Notation and other relevant details:

      – Virtex block RAMs are 2048 x 9 bits
      – Design achieves protograph expansion factor T = 1024 by using two half-RAMs for 1024 inputs and 1024 outputs for each protograph edge
      – Half-RAM addresses are accessed in sequence, exploiting simplicity of circulant permutations on protograph edges
      – Design achieves decoder parallelizability corresponding to a maximum protograph size with e + n/2 < B, where
          – B = # block RAMs (B = 168 for current FPGA, Virtex II 8000)
          – e = # edges in protograph
          – n = # channel symbols input to protograph
      – Example: small rate-1/2 ARA protograph can be preliminarily expanded T′ = 10 times to yield e = 140, n = 40, e + n/2 = 160 < B = 168
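
The block-RAM budget on this slide is easy to tabulate. The sketch below assumes a seed protograph with 14 edges, 4 channel symbols, and 2 information bits; those per-seed values are inferred from the slide's own examples (e = 140, n = 40 at T′ = 10; e = 18*14; code size (40960, 20480)) rather than stated directly.

```python
BLOCK_RAMS = 168                                # B, Virtex II 8000
SEED_EDGES, SEED_CHANNEL, SEED_INFO = 14, 4, 2  # inferred seed protograph sizes
T = 1024                                        # circulant expansion factor


def fits(pre_expansion, read_before_write=False):
    """Check e + n/2 < B, or the revised e/2 + n/2 < B with read-before-write."""
    e = SEED_EDGES * pre_expansion
    n = SEED_CHANNEL * pre_expansion
    edge_rams = e / 2 if read_before_write else e
    return edge_rams + n / 2 < BLOCK_RAMS


def code_size(pre_expansion):
    """Maximum decodable code size (nT, kT)."""
    return (SEED_CHANNEL * pre_expansion * T, SEED_INFO * pre_expansion * T)


def speedup_vs_universal(e, L):
    """Type-1 speed relative to a Type-2 decoder handling 2L edges per clock."""
    return e / (2 * L)


for t_prime, rbw in [(10, False), (18, True)]:
    e = SEED_EDGES * t_prime
    print(t_prime, fits(t_prime, rbw), code_size(t_prime), speedup_vs_universal(e, 16))
```

Under these assumptions the two examples on the slide are the largest pre-expansions that satisfy their respective constraints.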

SLIDE 12

  • C. Jones

Parallel Decoder – Type-2 for arbitrary LDPC codes

Pros:

  • Decodes any LDPC code
  • Adapts to new codes via RAM write (no FPGA redesign)

Cons:

  • Potentially less parallelizable
  • More memory dedicated to code description

[Figure labels: 4x4 MUX, 4x4 MUX, inverse interleaver RAMs, interleaver RAMs]

  • Determined that parallelization factors allowing up to 2L = 32 edges/clock cycle are feasible
      – Check nodes use reduced-complexity approx min*: min(reliability) + correction terms
      – 8-bit quantizer is uniform in reliability domain
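
The "min(reliability) + correction terms" form can be illustrated with the exact pairwise min* operation on signed log-likelihood ratios. The slide's reduced-complexity approximation of the correction terms is not specified, so this sketch shows only the exact version and compares it with the tanh rule; dropping the correction entirely would give plain min-sum.

```python
import math


def min_star(a, b):
    """Exact pairwise check-node combine: sign * min(|a|, |b|) plus two correction terms."""
    sign = (1 if a >= 0 else -1) * (1 if b >= 0 else -1)
    correction = (math.log1p(math.exp(-abs(a + b)))
                  - math.log1p(math.exp(-abs(a - b))))
    return sign * min(abs(a), abs(b)) + correction


def tanh_rule(a, b):
    """Reference implementation of the same operation."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


# Hypothetical message pairs.
for a, b in [(1.0, 1.0), (1.0, -1.0), (0.3, 2.7), (-4.0, -0.5)]:
    print(a, b, round(min_star(a, b), 6), round(tanh_rule(a, b), 6))
```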

[Figure: decoder architecture for parallelism L = 4; double-buffered edge memories, variable node processors, constraint node processors]

  • Parallel Universal Decoder: L processors
  • Parity matrix subdivided by pre-processing algorithm for load balancing and collision avoidance
  • Performs 2L edge updates per clock cycle

FY04 Accomplishments (Cont’d)

  • Status of universal decoder implementation:
      – Implemented stopping rule
      – Integrated noise generation module -> FER testing to 10^-9, BER to 10^-11. Measured frame error rates two orders of magnitude lower than possible by software simulation
      – Screened more than 20 candidate codes for error floor location
      – Receiver data can be interfaced to decoder via PCI backplane (tested)

Universal Decoder for SPArse CodEs (UDSPACE)

SLIDE 13

  • The UDSPACE architecture allows decoding of ANY code up to a given (edge) complexity
      – Max edges depends on # of available block RAMs
      – Speed of decoder approximately proportional to parallelism factor (L)
  • Over past year decoder design improved speed by increasing parallelization factor L from 1 to 8 and improved max. codeblock size by using a larger portion of available RAM

Parallel Decoder – Type-2 for arbitrary LDPC codes

Technology migration path Oct 2004 new accomplishments/revised plans in red

  • Current decoder has L=8 and uses 152 out of 168 RAMs
      – Includes 6 of 9 CCSDS codes at 10 Msps
  • Testing of more parallel design (L=16 or 2xL=8) is underway
  • L=16 decoder is now up and running (Virtex II XC2V8000)
      – Double speed (20 Msps) now verified
      – Add’l testing/tweaking in progress
  • Soon-to-be-available FPGA parts (Virtex 2 Pro V2P100; price quote received this month) will supply more block RAM and faster clock speed
      – V2P100 is nearly capable of decoding 3 largest CCSDS codes (~25 Msps for L=16): its RAM can accommodate codes with k up to ~14000, not quite k = 16384
      – V2P125 part with sufficient block RAM for 3 largest CCSDS codes will not be available; new technology path skips this stage and goes straight to Virtex 4
      – Planned Virtex 4 part will provide further improvements

  • C. Jones

[Figure: technology migration path; code size in edges (4K to 128K) vs. decoding speed in Msps (8 iterations, 90 MHz); parts shown: Virtex II XC2V8000, Virtex II Pro V2P100, Virtex II Pro V2P125, Virtex 4; design points labeled L=1, L=4, L=8, 2xL=8, L=16; CCSDS k=1K, k=4K, and k=16K code sizes marked; Sep’04 and Oct’04 status]

Oct ’04 Accomplishments

Revised plan to skip V2P125 (part not available)

SLIDE 14

  • D. Divsalar, J. Thorpe, S. Dolinar, C. Jones
  • Measured code performance with hardware decoder down to 10^-11 BER
  • Evaluated implementation loss for UDSPACE hardware decoder
  • Discovered that HW decoder surprisingly achieves lower error floor than SW decoder on the same code (implementation gain!)
      – Due to clipping of likelihood ratio values in HW decoder. Improvement has now been reproduced in SW decoder, which was using full-precision values.
  • Found “trapping sets”, which noticeably influence the error floor
      – These sets are small subsets of variable nodes that can be decoded incorrectly while satisfying all but a few neighboring check nodes at the set’s “frontier”
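
The trapping-set description above can be made concrete with a toy example. The parity checks below are hypothetical and tiny (they are not one of the proposed codes); the point is only that a check is unsatisfied exactly when it touches an odd number of erroneous variable nodes, so a small error set whose members mostly share checks in pairs leaves very few checks violated.

```python
def unsatisfied_checks(checks, error_set):
    """Indices of checks that see an odd number of erroneous variable nodes."""
    return [i for i, members in enumerate(checks)
            if len(members & error_set) % 2 == 1]


# Hypothetical parity checks, each given as the set of variable nodes it connects.
CHECKS = [{0, 1, 2}, {1, 2, 3}, {0, 3, 4}, {2, 4, 5}]
ERRORS = {1, 2}   # two variable nodes decoded incorrectly

frontier = unsatisfied_checks(CHECKS, ERRORS)
print(frontier)
```

Here two wrong bits violate a single check, which gives an iterative decoder little evidence that anything is wrong.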

Discoveries by using HW decoder (Sept ’04 Accomplishments)

[Figure: BER (10^-11 to 10^-1) vs. Eb/No (0.5 to 1.7 dB); HW and SW decoder curves for the ARA and Flarion codes, annotated “IMPLEMENTATION GAIN !!!” and “IMPLEMENTATION LOSS”]

PROPOSED CCSDS STANDARD (5/04)

SLIDE 15

Comparison of some codes with respect to requirements and evaluation criteria

Criterion                 Goddard C1    Goddard C2    DVB             JPL ARA
Rate                      0.822222222   0.875244618   1/4 to 11/12    1/2; 2/3; 4/5
n                         4095          8176          16200; 64800
k                         3367          7156                          1K; 4K; 16K
Family                    no            no            yes             yes
Threshold                 *             **            **              ***
Error floor               ***?          **?           **?             *
Decoder computation       *             ***           ***             ***
Encoder computation       **            ***?          ***             ***
Simple code description   ***           ***           **              ***
Public domain             yes           yes           no?             yes

Other labels on the slide (placement not recoverable): All-but-one, Regular (3,6), JPL ARA, Jeremy's linear dmin, Code J, Flarion

SLIDE 16

Measuring the Asymptotic Near-Optimality of Protograph Families

  • Metric: Iterative decoding threshold (dB) minus capacity limit (dB) for protographs of each rate
  • This metric is a measure of each protograph’s intrinsic non-optimality, in that it depends on
      – how the protograph’s structure constrains the inherent goodness of all codes that can be expanded from it
      – how effectively the protograph’s structure interplays with the iterative decoding algorithm to allow near-ML decoding
  • This is an asymptotic metric, in that it is calculated in the limit as
      – the block size of any code expanded from the protograph becomes infinite
      – the target word error rate becomes arbitrarily small
  • Illustrated in next slide for several code families
      – G4d, G3d = (4,d), (3,d) regular Gallager codes
      – Ci = protograph family generated from GSFC code C2 by splitting/merging check nodes
      – Ci+, Ci+pre = post-coded, or post-coded & pre-coded, versions of Ci
      – AR4A, AR3A = ARA code families using repetitions of degree 4, 3
  • AR4A, AR3A = ARA code families using repetitions of degree 4, 3

SLIDE 17

  • Conclusions from graph above
  • The AR3A family yields the closest approach to optimality over the entire range of rates considered
  • This is JPL’s main rationale for selecting the AR3A family for the draft Orange Book
  • The AR4A family performs within 0.13 dB of the AR3A family over the entire range
  • AR4A may be preferred since finite-size codes expanded from AR4A typically have lower error floors
  • The straightforward family Ci built from C2 is fine at high rates but unacceptably far from optimal at low rates
  • Post-coded and pre-coded versions of the Ci family can approximate the near-optimality of the AR4A family

Measuring the Asymptotic Near-Optimality of Protograph Families

[Figure: “Optimality of Decoding Thresholds for Code Families”; decoding threshold minus capacity limit (dB, 0.0 to 1.4) vs. code rate (0.4 to 1.0); curves: Ci, G4d, G3d, Ci+, Ci+pre, AR4A, AR3A]

SLIDE 18

Part II

Need a better error floor?

SLIDE 19

ARA Code Families ARA Repeat-3 and ARA Repeat-4

  • ARA Repeat-3 family (Nov. 2004 JPL Orange Book) suffers to some extent from error flooring at high code rates.
  • ARA Repeat-4 family is a related family that does not exhibit flooring at high code rates.
  • We present ARA Repeat-4 performance at k = 4096 for rates 1/2 through 5/6.
  • For rate 7/8 we present performance for k = 7168 and make comparison with GSFC C2 (k = 7154).

Summary: ARA Repeat-4 family

  • Closely related to ARA Repeat-3 family
  • Low threshold performance across code rates 1/2 to 8/9
  • BER levels approach 1e-10 before detected flooring at every rate. No flooring yet detected for rates above 2/3.

SLIDE 20

[Figure: BER/FER (10^-11 to 10^0) vs. Eb/No (0.5 to 4.5 dB). Legend:
    ARA3chr12 4c 512c 14 8 ACE, k = 4096
    ARA3chr23 4c 256c 9 5 ACE, k = 4096
    ARA3chr34 4c 170c 8 5 ACE, k = 4080
    ARA3chr45 4c 128c 8 5 ACE, k = 4096
    ARA3chr56 4c 102c 8 4 ACE, k = 4080
    ARA3chr78 4c 128c 8 4 ACE, k = 7168
    GSFC C2 Rate 0.875 HW, k = 7154]

ARA Repeat-4 Family Performance

SLIDE 21

ARA Code Families

[Figure: protographs of the two families, each with code rate = (n+1)/(n+2), n = 0, 1, ..., and a node group labeled 2n]

ARA Repeat-3 (Nov04 Orange Book)

Code Rate   Protograph Threshold (dB)   Capacity (dB)   Difference (dB)
1/2         0.516                       0.187           0.329
2/3         1.288                       1.059           0.229
3/4         1.848                       1.626           0.222
4/5         2.277                       2.040           0.237
5/6         2.620                       2.362           0.258
6/7         2.897                       2.625           0.272
7/8         3.129                       2.845           0.284
8/9         3.324                       3.033           0.291

ARA Repeat-4 (Related family)

Code Rate   Protograph Threshold (dB)   Capacity (dB)   Difference (dB)
1/2         0.560                       0.187           0.373
2/3         1.414                       1.059           0.355
3/4         1.980                       1.626           0.354
4/5         2.396                       2.040           0.356
5/6         2.717                       2.362           0.355
6/7         2.980                       2.625           0.355
7/8         3.197                       2.845           0.352
8/9         3.385                       3.033           0.352
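
The threshold figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes the first table belongs to ARA Repeat-3 and the second to ARA Repeat-4, following the order of the slide labels. It recomputes the "Difference" column and the threshold penalty of Repeat-4 relative to Repeat-3, which an earlier slide quotes as within 0.13 dB.

```python
from fractions import Fraction

RATES = [Fraction(n + 1, n + 2) for n in range(8)]   # 1/2, 2/3, ..., 8/9
CAPACITY = [0.187, 1.059, 1.626, 2.040, 2.362, 2.625, 2.845, 3.033]   # dB
REPEAT3 = [0.516, 1.288, 1.848, 2.277, 2.620, 2.897, 3.129, 3.324]    # dB
REPEAT4 = [0.560, 1.414, 1.980, 2.396, 2.717, 2.980, 3.197, 3.385]    # dB

gap3 = [round(t - c, 3) for t, c in zip(REPEAT3, CAPACITY)]
gap4 = [round(t - c, 3) for t, c in zip(REPEAT4, CAPACITY)]
penalty = [round(t4 - t3, 3) for t4, t3 in zip(REPEAT4, REPEAT3)]

for rate, g3, g4, p in zip(RATES, gap3, gap4, penalty):
    print(rate, g3, g4, p)
print("largest Repeat-4 penalty (dB):", max(penalty))
```

With these numbers the largest penalty occurs at rate 3/4 and is about 0.13 dB.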

SLIDE 22

Attached are nine files containing the parity check matrices for our proposed CCSDS codes. Each file contains one line per edge in the graph, i.e. each nonzero entry in the H matrix. Each line contains two numbers, giving the column and row indices (numbered from 0) for those entries. Matlab users, for example, could read the file "YCCSDS1280.cr" and construct the sparse parity check matrix with the two commands:

    cr=load('YCCSDS1280.cr')+1;
    H=sparse(cr(:,2),cr(:,1),ones(1,length(cr)));

Note these are punctured codes, so their rates are higher than is apparent from the dimensions of the parity check matrices. If the H matrix is of size m by n, then the punctured columns are those numbered m through (4/3)m-1, inclusive (numbered from zero). The filenames and codes are:

filename          n       k       edges    rate   punctured columns
YCCSDS1280.cr     1280    1024    4096     4/5    384-511
YCCSDS1536.cr     1536    1024    5120     2/3    768-1023
YCCSDS2048.cr     2048    1024    7168     1/2    1536-2047
YCCSDS5120.cr     5120    4096    16384    4/5    1536-2047
YCCSDS6144.cr     6144    4096    20480    2/3    3072-4095
YCCSDS8192.cr     8192    4096    28672    1/2    6144-8191
YCCSDS20480.cr    20480   16384   65536    4/5    6144-8191
YCCSDS24576.cr    24576   16384   81920    2/3    12288-16383
YCCSDS32768.cr    32768   16384   114688   1/2    24576-32767
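
For readers without Matlab, the same file format can be read with a few lines of Python. This is a sketch under stated assumptions: the two numbers on each line are whitespace-separated, the matrix dimensions are taken from the largest indices present, and the sample data below is made up for illustration rather than taken from any of the nine files.

```python
def read_cr(lines):
    """Parse 'column row' pairs (0-based, one per edge) into a set of (row, col) entries."""
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                       # skip blank or malformed lines
        col, row = int(parts[0]), int(parts[1])
        edges.add((row, col))
    m = 1 + max(row for row, _ in edges)       # number of checks (rows of H)
    total_columns = 1 + max(col for _, col in edges)
    return edges, m, total_columns


def punctured_columns(m):
    """Columns m through (4/3)m - 1, inclusive, as stated on the slide."""
    return range(m, (4 * m) // 3)


# Made-up sample in the same format. For a real file, pass an open file object:
#     with open("YCCSDS1280.cr") as f: edges, m, total = read_cr(f)
SAMPLE = "0 0\n1 0\n1 1\n2 1\n3 2\n4 2\n3 0\n".splitlines()
edges, m, total_columns = read_cr(SAMPLE)
transmitted = total_columns - len(punctured_columns(m))
print(m, total_columns, list(punctured_columns(m)), transmitted)
```

As a consistency check against the table, m = 384 gives punctured columns 384 through 511, matching the entry for YCCSDS1280.cr.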

Parity check matrices Back-up

SLIDE 23

Family of Accumulate-Repeat-Accumulate codes (encoder)

[Figure: two possible, low-complexity implementations of the encoder, one labeled “Alternate encoder structure (systematic)”; blocks: input, accumulator (D), repetition 3, permutation (interleaver) π, accumulator (D), puncture P0, puncture P1, puncture P2, puncture 0x, output to channel]

“Puncturing” yields a family of codes with higher code rates

Patent Pending

ARA Back-up
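
The building blocks named in the encoder figure (accumulator, repetition 3, interleaver, puncturing) can be sketched generically. The exact wiring and the puncturing patterns P0, P1, P2 are not recoverable from this transcript, so the chain below (precode, repeat, permute, accumulate, puncture, then append to the systematic bits), the interleaver, and the pattern are all hypothetical choices for illustration, not the encoder proposed in the slides.

```python
def accumulate(bits):
    """1/(1+D) accumulator: running XOR of the input bits."""
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out


def repeat(bits, times=3):
    """Repetition code: each bit is emitted `times` times."""
    return [b for b in bits for _ in range(times)]


def permute(bits, order):
    """Interleaver: output position i takes input position order[i]."""
    return [bits[i] for i in order]


def puncture(bits, pattern):
    """Keep bit i only where the cyclically repeated pattern holds a 1."""
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)]]


def ara_like_encode(message, order, pattern):
    """Hypothetical accumulate-repeat-accumulate chain; returns systematic bits plus parity."""
    inner_input = permute(repeat(accumulate(message), 3), order)
    return message + puncture(accumulate(inner_input), pattern)


MESSAGE = [1, 0, 1, 1]
ORDER = [(5 * i) % 12 for i in range(12)]   # hypothetical interleaver (5 is coprime to 12)
PATTERN = [0, 0, 1]                         # hypothetical: keep every third parity bit
codeword = ara_like_encode(MESSAGE, ORDER, PATTERN)
print(len(MESSAGE), "->", len(codeword), "bits")
```

With this hypothetical pattern the chain maps 4 message bits to 8 code bits; transmitting fewer of the parity bits raises the rate, which is the sense in which puncturing yields a family of higher-rate codes.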