Computer Generation of Efficient Software Viterbi Decoders




SLIDE 1

Carnegie Mellon

Computer Generation of Efficient Software Viterbi Decoders

Sponsors: DARPA DESA program, ONR, NSF-NGS/ITR, NSF-ACR, Mercury, and Intel

Frédéric de Mesmay, Srinivas Chellappa, Franz Franchetti, Markus Püschel
Electrical and Computer Engineering, Carnegie Mellon University
Co-Founder SpiralGen, Inc.

SLIDE 2

Viterbi Decoder

• Error correction
  • Forward Error Correction
  • Digital cellular (CDMA, GSM), modems, satellite/deep space communications, 802.11 wireless LANs
  • Software defined radio (SDR)

• Pattern recognition
  • Speech recognition
  • Text recognition
  • Computational linguistics
  • Bioinformatics

NASA Cassini Orbiter: K=15, rate=1/6
GSM (TCH/FS): K=5, rate=1/2
CDMA2000/UMTS/IS-95: K=9, rate=1/3

SDR requires efficient Viterbi decoder software implementations
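To make the K and rate parameters above concrete, here is a minimal C sketch of a rate-1/n convolutional encoder. It is an illustration under stated assumptions, not code from the presentation: the function names, the shift-register convention (newest bit in the least significant position), and the generator polynomials passed in by the caller are all hypothetical choices. A code with constraint length K has 2^(K-1) trellis states, which is why K=15 is much more expensive to decode than K=5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of trellis states for constraint length K: 2^(K-1). */
static unsigned num_states(unsigned K) {
    assert(K >= 1 && K <= 16);
    return 1u << (K - 1);
}

/* Parity (XOR of all bits) of x. */
static unsigned parity(uint32_t x) {
    x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return x & 1u;
}

/* Encode nbits input bits (one bit per byte) with a rate-1/n code.
   polys[j] are K-bit generator polynomials (caller-supplied assumption).
   out receives n output bits per input bit, one bit per byte. */
static void conv_encode(const uint8_t *in, size_t nbits, unsigned K,
                        const uint32_t *polys, unsigned n, uint8_t *out) {
    assert(K >= 1 && K <= 16);
    uint32_t sr = 0;                      /* K-bit shift register */
    uint32_t mask = (1u << K) - 1u;
    for (size_t i = 0; i < nbits; i++) {
        sr = ((sr << 1) | (in[i] & 1u)) & mask;   /* shift in newest bit */
        for (unsigned j = 0; j < n; j++)
            out[i * n + j] = (uint8_t)parity(sr & polys[j]);
    }
}
```

With the textbook K=3 generators 7 and 5 (octal), which are not taken from the slides, the input bits 1 0 1 1 encode to 11 10 00 01.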

SLIDE 3

Software Defined Radio

[Figure: WiFi transmitter on Intel Atom Dualcore. Run time per OFDM symbol [μs] vs. data rate [Mbit/s] (6 to 54), with the real-time bound marked. Series: best standard C code; straightforward C code but minimizing op count; Spiral: computer generated. Annotated speedups: 6.3x and 8x.]

Parallelism: 2 threads, 4-16 way SIMD

Compilers fail to optimize: 50x

SLIDE 4

Spiral: Viterbi Software Generation

“Click”: Push-button code generation http://www.spiral.net/software/viterbi.html

SLIDE 5

Spiral: Generated SSE Viterbi Code

“Click”: Push-button code generation http://www.spiral.net/software/viterbi.html

void viterbi_ccsds(unsigned char *Y, unsigned char *X, unsigned char *syms,
                   unsigned char *dec, unsigned char *Branchtab) {
    for(int i9 = 0; i9 <= 1026; i9++) {
        unsigned char a75, a81;
        int a73, a92;
        ...
        a71 = ((__m128i *) X);
        s18 = *(a71);
        a72 = (a71 + 2);
        s19 = *(a72);
        a73 = (4 * i9);
        a74 = (syms + a73);
        a75 = *(a74);
        a76 = _mm_set1_epi8(a75);
        a77 = ((__m128i *) Branchtab);
        a78 = *(a77);
        a79 = _mm_xor_si128(a76, a78);
        b6 = (a73 + syms);
        a80 = (b6 + 1);
        a81 = *(a80);
        a82 = _mm_set1_epi8(a81);
        a83 = (a77 + 2);
        a84 = *(a83);
        a85 = _mm_xor_si128(a82, a84);
        t13 = _mm_avg_epu8(a79, a85);
        a86 = ((__m128i ) t13);
        a87 = _mm_srli_epi16(a86, 2);
        a88 = ((__m128i ) a87);
        t14 = _mm_and_si128(a88, _mm_set_epi8(63, 63, 63, 63, 63, 63, 63, 63,
                                              63, 63, 63, 63, 63, 63, 63, 63));
        t15 = _mm_subs_epu8(_mm_set_epi8(63, 63, 63, 63, 63, 63, 63, 63,
                                         63, 63, 63, 63, 63, 63, 63, 63), t14);
        m23 = _mm_adds_epu8(s18, t14);
        m24 = _mm_adds_epu8(s19, t15);
        m25 = _mm_adds_epu8(s18, t15);
        m26 = _mm_adds_epu8(s19, t14);
        a89 = _mm_min_epu8(m24, m23);
        ...
    }
    ...
}
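To read the generated code more easily, the following scalar C sketch models what a single 8-bit lane of the visible SSE instructions computes: two received symbols are XORed with branch-table entries, averaged with rounding, and reduced to a 6-bit metric t14 with complement t15 = 63 - t14, which are then added with saturation to two values s18 and s19 loaded from X. The helper names and the struct are hypothetical. Reading s18 and s19 as old path metrics is an interpretation, and the second minimum (survivor1) is an assumption about the elided part of the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-ins for the byte-wise SSE2 operations used above. */
static uint8_t avg_u8(uint8_t a, uint8_t b) {           /* _mm_avg_epu8 */
    return (uint8_t)(((unsigned)a + b + 1u) >> 1);
}
static uint8_t adds_u8(uint8_t a, uint8_t b) {          /* _mm_adds_epu8 */
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}
static uint8_t min_u8(uint8_t a, uint8_t b) {           /* _mm_min_epu8 */
    return a < b ? a : b;
}

/* 6-bit metric for one lane (t14 in the generated code). The 16-bit shift
   followed by the AND with 63 is equivalent to a per-byte shift by 2. */
static uint8_t branch_metric(uint8_t sym0, uint8_t sym1,
                             uint8_t bt0, uint8_t bt1) {
    uint8_t t13 = avg_u8((uint8_t)(sym0 ^ bt0), (uint8_t)(sym1 ^ bt1));
    return (uint8_t)((t13 >> 2) & 63u);
}

typedef struct {
    uint8_t m23, m24, m25, m26;   /* the four candidate sums */
    uint8_t survivor0;            /* a89 in the generated code */
    uint8_t survivor1;            /* assumed: elided on the slide */
} lane_result;

static lane_result lane_step(uint8_t s18, uint8_t s19, uint8_t t14) {
    assert(t14 <= 63u);
    lane_result r;
    uint8_t t15 = (uint8_t)(63u - t14);   /* never saturates since t14 <= 63 */
    r.m23 = adds_u8(s18, t14);
    r.m24 = adds_u8(s19, t15);
    r.m25 = adds_u8(s18, t15);
    r.m26 = adds_u8(s19, t14);
    r.survivor0 = min_u8(r.m24, r.m23);
    r.survivor1 = min_u8(r.m26, r.m25);
    return r;
}
```

The generated code performs the same arithmetic on 16 lanes at once, one lane per byte of an __m128i register.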

SLIDE 6

Organization

• Spiral
• Generating software Viterbi decoders
• Performance results
• Summary

SLIDE 7

Organization

• Spiral
• Generating software Viterbi decoders
• Performance results
• Summary

SLIDE 8

Automatic Performance Tuning

• Current vicious circle: whenever a new platform comes out, the same functionality needs to be rewritten and reoptimized

• Automatic Performance Tuning

  • BLAS: ATLAS, PHiPAC
  • Linear algebra: Sparsity/OSKI, Flame
  • Sorting
  • Fourier transform: FFTW
  • Linear transforms (and Viterbi): Spiral
  • …others

Proceedings of the IEEE special issue, Feb. 2005

New problem class: software Viterbi decoders

SLIDE 9

What is Spiral?

Traditionally: high performance library, optimized for given platform

Spiral approach: Spiral generates a high performance library, optimized for given platform

Comparable performance

SLIDE 10

Idea: Common Abstraction and Rewriting

Architectural parameter (ν, p, μ): vector length, #processors, …

Kernel: problem size, algorithm choice

Model: common abstraction = spaces of matching formulas = domain-specific language

[Diagram: the architecture space and the algorithm space are each linked to the model by "abstraction"; arrows labeled "rewriting defines" and "search pick" connect the model to the two spaces; overall label: optimization]

SLIDE 11

Program Generation in Spiral

[Diagram: Problem specification (transform) → Algorithm Generation / Algorithm Optimization → algorithm → Implementation / Code Optimization → C code → Compilation / Compiler Optimizations → fast executable → performance → Search, which controls the algorithm and implementation levels]

Spiral

Spiral: Complete automation of the implementation and optimization task

Basic ideas:

  • Declarative representation of algorithms
  • Rewriting systems to generate and optimize algorithms at a high level of abstraction

Markus Püschel, José M. F. Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo: SPIRAL: Code Generation for DSP Transforms. Special issue, Proceedings of the IEEE 93(2), 2005

SLIDE 12

Some Kernels as Operator Formulas

[Figure: operator formulas for four kernels: Viterbi decoding (convolutional encoder, Viterbi decoder), linear transforms, matrix-matrix multiplication, and Synthetic Aperture Radar (SAR: interpolation, 2D iFFT, matched filtering, preprocessing)]

SLIDE 13

Same Approach for Different Paradigms

  • Vectorization
  • Threading
  • GPUs
  • Verilog for FPGAs

SLIDE 14

Organization

• Spiral
• Generating software Viterbi decoders
• Performance results
• Summary

SLIDE 15

Structure of Viterbi Decoders

[Figure: state machine with states 00, 01, 10, 11 and input/output edge labels 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 0/10, 1/00, shown next to the corresponding Viterbi trellis (data flow) with axes "stages" and "states"]

Key observation: similarity to Walsh-Hadamard transform (WHT)
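The data flow behind this observation can be sketched in C as follows. This is a minimal hard-decision decoder for a toy K=3, rate-1/2 code with generators 7 and 5 (octal), written under the assumption that the newest input bit sits in the least significant position of the state. None of the names or parameters come from the slides. In each stage, old states i and i + S/2 feed new states 2i and 2i + 1, a fixed strided butterfly pattern that is the same in every stage, much like the butterflies of a WHT.

```c
#include <assert.h>
#include <stdint.h>

/* Toy code parameters (assumptions for illustration only). */
enum { K = 3, S = 1 << (K - 1), HALF = S / 2, N = 2, MAXLEN = 64 };
static const unsigned POLY[N] = { 7, 5 };

static unsigned parity(unsigned x) {
    x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return x & 1u;
}

/* Hamming distance between the N code bits emitted on the edge
   (old state p, input bit b) and the received hard bits rx[0..N-1]. */
static unsigned branch_metric(unsigned p, unsigned b, const uint8_t *rx) {
    unsigned sr = (p << 1) | b, d = 0;
    for (int j = 0; j < N; j++)
        d += parity(sr & POLY[j]) ^ (rx[j] & 1u);
    return d;
}

/* One trellis stage (add-compare-select). Old states i and i + HALF
   feed new states 2i and 2i + 1; dec[j] records which one survived. */
static void acs_stage(const unsigned *old, unsigned *nw, uint8_t *dec,
                      const uint8_t *rx) {
    for (unsigned i = 0; i < HALF; i++)
        for (unsigned b = 0; b < 2; b++) {
            unsigned j = 2 * i + b;
            unsigned m0 = old[i] + branch_metric(i, b, rx);
            unsigned m1 = old[i + HALF] + branch_metric(i + HALF, b, rx);
            dec[j] = (uint8_t)(m1 < m0);
            nw[j] = dec[j] ? m1 : m0;
        }
}

/* Decode len stages (N hard bits each), assuming the encoder starts and
   ends in state 0. Returns 0 on success, -1 if len is too large. */
static int viterbi_decode(const uint8_t *rx, unsigned len, uint8_t *out) {
    uint8_t dec[MAXLEN][S];
    unsigned buf0[S], buf1[S], *old = buf0, *nw = buf1;
    if (len > MAXLEN) return -1;
    for (unsigned s = 0; s < S; s++) old[s] = (s == 0) ? 0u : 1000u;
    for (unsigned t = 0; t < len; t++) {          /* forward pass */
        acs_stage(old, nw, dec[t], rx + t * N);
        unsigned *tmp = old; old = nw; nw = tmp;
    }
    unsigned state = 0;
    for (unsigned t = len; t-- > 0; ) {           /* traceback */
        out[t] = (uint8_t)(state & 1u);
        state = (state >> 1) + (dec[t][state] ? HALF : 0u);
    }
    return 0;
}
```

The inner loops of acs_stage have no dependencies between butterflies within a stage, which is the property that makes SIMD vectorization of the forward pass possible.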

SLIDE 16

Viterbi Language (VL)

[Figure: VL in Backus-Naur Form (BNF)]

[Figure: Viterbi decoder forward pass in VL]

SLIDE 17

Compiling VL To Code

SLIDE 18

Vectorization Through Rewriting

[Figure: Vectorization Rule Set]

[Figure: Vectorized Viterbi Decoder]

SLIDE 19

VL Compilation System

[Diagram: VL compilation system. Inputs: VL expression, target architecture, metric spread, overflow factors. Stages: vectorization by rewriting, VL compiler, peephole optimization. Outputs: vectorized decoder and scalar decoder, passed on to execution.]
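The slide names metric spread and overflow factors as inputs but does not show how they are used. One common way to keep narrow (for example 8-bit) path metrics from saturating is periodic renormalization: when the smallest metric exceeds a threshold, it is subtracted from all metrics, which preserves their differences. The C sketch below illustrates that general technique. It is an assumption for illustration, not a description of what the VL compiler emits, and the function names and threshold parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest path metric among n states (n must be positive). */
static uint8_t min_metric(const uint8_t *m, size_t n) {
    assert(n > 0);
    uint8_t lo = m[0];
    for (size_t i = 1; i < n; i++)
        if (m[i] < lo) lo = m[i];
    return lo;
}

/* If the smallest metric exceeds threshold, subtract it from every metric.
   Differences between metrics are unchanged, so later compare-select
   decisions are unaffected. Returns 1 if renormalization happened. */
static int renormalize(uint8_t *m, size_t n, uint8_t threshold) {
    uint8_t lo = min_metric(m, n);
    if (lo <= threshold) return 0;
    for (size_t i = 0; i < n; i++)
        m[i] = (uint8_t)(m[i] - lo);
    return 1;
}
```

The threshold has to leave enough headroom for the largest difference between path metrics plus the branch metrics added before the next check, which is presumably where a bound on the metric spread matters.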

SLIDE 20

Organization

• Spiral
• Generating Software Viterbi Decoders
• Performance results
• Summary

SLIDE 21

Comparison to Hand-Tuned Code

[Figure: Karn's decoders vs. Spiral-generated decoders; series for each: 16-way, 8-way, 4-way, scalar]

Karn’s implementation: hand-written assembly for 4 specific Viterbi codes

Single core of Core2 Extreme (quad-core), 3 GHz, Intel C++ compiler 10.0

SLIDE 22

Vectorization Speed-Up

Single core of Core2 Extreme (quad-core), 3 GHz, Intel C++ compiler 10.0

SLIDE 23

Data Rate Results

[Figure: decoders for rate 1/4. Performance (kbit/s), logarithmic axis from 1 to 100,000, vs. constraint length K from 6 to 16; series: 16-way, 8-way, 4-way, scalar]

Single core of Core2 Extreme (quad-core), 3 GHz, Intel C++ compiler 10.0

SLIDE 24

Organization

• Spiral
• Generating Software Viterbi Decoders
• Performance results
• Summary

SLIDE 25

Summary

• Platforms are powerful yet complicated; optimization will stay a hard problem
• Automatic generation of Viterbi decoder from high-level specification
• Spiral: program generation and autotuning can provide full automation
• Performance of Spiral’s Viterbi decoders is competitive with expert hand tuning

[Figure: architecture A(µ) and kernel. Image: Intel]

SLIDE 26

www.spiral.net

(Part of the) Spiral Team