SoC-Network for Interleaving in Wireless Communications
Norbert Wehn (wehn@eit.uni-kl.de)
Microelectronic System Design Research Group, University of Kaiserslautern, www.eit.uni-kl.de/wehn
MPSoC'03, 7-11 July 2003, Chamonix, France


  1. Outline
     - Motivation
     - Outer Modem Algorithms
       - Channel Coding
       - Interleaving (Turbo-Codes)
     - Application Specific Processing Node
     - Application Specific Communication Network
       - Network Structure
       - Network Analysis
     - Results
     - Conclusion

  2. Wireless Implementation Challenges I
     - Processing requirements: DECT 10 MIPS, GSM 100 MIPS, UMTS x1000 MIPS

     Wireless Implementation Challenges II
     - Algorithmic complexity
       - "Shannon's Law beats Moore's Law"
     - Programmability and flexibility
       - different QoS
       - "multi-mode" support: different algorithms & standards
       - "software radio"
       - different throughput requirements
     - Low power / low energy
       - BUT: the "Energy-Flexibility Gap"
     - Design space
       - algorithms, architecture, ...

  3. Motivation
     New architectures: AP-MPSoC
     - scalable, highly parallel, programmable, energy-efficient
     - application-specific processor nodes running at low frequency
     - application-specific communication network
     Wireless baseband algorithms
     - Inner modem
       - signal processing based on matrix computations, e.g. multi-user detection, interference cancellation, filtering, correlators
       - many publications on efficient multi-processor implementations of matrix computations, e.g. systolic arrays
     - Outer modem
       - channel coding, interleaving, data stream segmentation
       - efficient multi-processor implementations largely unexplored

     Importance of Channel Coding
     - Efficient channel coding is key for reliable communication.
     - At high throughput, the complexity lies in data distribution, not in computation.

  4. Channel Coding Techniques
     - Convolutional codes
       - Viterbi decoding algorithm
       - intensively studied (HW/SW/DSP extensions)
     - Most efficient codes: Turbo-Codes (1993), LDPC codes (1996)
       - block-based
       - iterative decoding techniques
       - computational complexity increased by an order of magnitude
       - memory access and data transfers are very critical
     - Turbo-Codes
       - one of the big changes when moving from 2G to 3G
       - part of many emerging standards, e.g. WLAN, 4G
       - Turbo principle extended to modulation
       - very active research area in the communications community
     Mapping this type of algorithm onto programmable architectures is largely unexplored.

     Turbo-En/Decoder Structure
     [Block diagram: on the encoder side, the systematic stream x^s feeds RSC Coder 1 directly and RSC Coder 2 through the interleaver, producing the parity streams x^1p and x^2p; on the decoder side, two softoutput decoders (MAP 1, MAP 2) exchange reliability information (LLRs Λ1, Λ2) through the interleaver and deinterleaver. A code sketch of the encoder structure follows below.]
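
The encoder structure above can be written down compactly. The following is a minimal Python sketch (not from the talk): a rate-1/3 parallel concatenation of two recursive systematic convolutional (RSC) coders, the second one fed through the interleaver. For brevity it uses a 4-state (1, 5/7)-octal RSC and omits trellis termination; the 3GPP turbo code uses 8-state constituent encoders.

```python
import random

def rsc_encode(bits):
    """Recursive systematic convolutional (RSC) encoder, generators (1, 5/7) octal,
    memory 2. Returns only the parity stream; the systematic stream is the input itself."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2          # feedback 1 + D + D^2
        parity.append(a ^ s2)    # feedforward 1 + D^2
        s2, s1 = s1, a           # shift the register
    return parity

def turbo_encode(bits, interleaver):
    """Parallel concatenation as on the slide: systematic stream, parity from RSC 1,
    and parity from RSC 2, which is fed with the interleaved systematic stream."""
    systematic = list(bits)
    parity1 = rsc_encode(systematic)
    parity2 = rsc_encode([systematic[i] for i in interleaver])
    return systematic, parity1, parity2

if __name__ == "__main__":
    K = 16
    info = [random.randint(0, 1) for _ in range(K)]
    pi = random.sample(range(K), K)   # a random interleaver table, for illustration only
    x_s, x_1p, x_2p = turbo_encode(info, pi)
    print(x_s, x_1p, x_2p, sep="\n")
```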

  5. Turbo-Codes
     - Iterative decoding process
       - block-based; block sizes: 3GPP 20-5114 bits, 3GPP2 378-20730 bits
       - schedule: DEC1, interleaving, DEC2, deinterleaving (see the code sketch below)
       - interleaved reliability information is exchanged between the decoders
     - Softoutput decoder
       - determines the Log-Likelihood Ratio (LLR) of each bit being sent as "0" or "1" (Viterbi determines only the most likely path in the trellis)
       - three-step algorithm: forward recursion, backward recursion, LLR calculation
       - ~2.5x the computational complexity of the Viterbi algorithm
       - memory complexity (size, access) >> Viterbi algorithm
     - Interleaving/Deinterleaving
       - important step on the physical layer
       - scrambles the data processing order to yield timing diversity
       - minimizes burst errors

     Implementation Challenges
     - Programmability and flexibility
       "...It is critical for next generation programmable DSPs to address the requirements of algorithms such as Turbo-Codes since these algorithms are essential for improved 2G and 3G wireless communication." (I. Verbauwhede, "DSPs for wireless communications")
     - High throughput requirements
       - UMTS: 2 Mbit/s (terminal), >10 Mbit/s (base station)
       - emerging standards: >100 Mbit/s
     - DSP performance (UMTS-compliant, based on the Log-MAP algorithm):

       Processor    Architecture   Clock freq. [MHz]   Cycles/(bit*MAP)   Throughput @ 5 iter.
       MOT 56603    16-bit DSP      80                 472                 17 kbit/s
       STM ST120    VLIW, 2 ALU    200                 100                ~200 kbit/s
       SC140        VLIW, 4 ALU    300                  50                600 kbit/s
       ADI TS (1)   VLIW, 2 ALU    180                  27                666 kbit/s
       (1) With special ACS-instruction support
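
To make the DEC1, interleave, DEC2, deinterleave schedule explicit, here is a minimal Python sketch (not from the talk). The component decoder `map_decode` is a hypothetical stand-in for a softoutput Log-MAP decoder; only the exchange of extrinsic reliability information through the interleaver and deinterleaver is shown.

```python
def turbo_decode(llr_sys, llr_par1, llr_par2, interleaver, map_decode, iterations=5):
    """Iteration schedule: DEC1, interleaving, DEC2, deinterleaving.

    map_decode(sys, par, apriori) is assumed to return extrinsic LLRs; its
    forward/backward recursion and LLR calculation are not shown here.
    interleaver[k] is the natural-order index read at interleaved position k."""
    K = len(llr_sys)
    deinterleaver = [0] * K
    for pos, src in enumerate(interleaver):
        deinterleaver[src] = pos
    llr_sys_int = [llr_sys[i] for i in interleaver]

    extrinsic1 = [0.0] * K
    extrinsic2 = [0.0] * K                          # a-priori input of DEC1, natural order
    for _ in range(iterations):
        extrinsic1 = map_decode(llr_sys, llr_par1, extrinsic2)
        extrinsic1_int = [extrinsic1[i] for i in interleaver]              # interleave
        extrinsic2_int = map_decode(llr_sys_int, llr_par2, extrinsic1_int)
        extrinsic2 = [extrinsic2_int[deinterleaver[i]] for i in range(K)]  # deinterleave

    # Final LLR per bit: channel value plus both extrinsic contributions.
    return [llr_sys[i] + extrinsic1[i] + extrinsic2[i] for i in range(K)]
```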

  6. Multiprocessor Solution (Block Level)
     A multiprocessor solution becomes mandatory. Simple MP solution: instead of a single processor, N processors P_1 ... P_N each run a complete decoder (MAP decoder plus interleaver/deinterleaver) on their own block, so N blocks are processed in parallel.
     - Sequential processing, per block, of
       - the MAP algorithm (two MAP component decoders)
       - interleaving and deinterleaving
     - Large latency
     - Low architectural efficiency
       - large area (memory!)
       - high energy

     Optimized MPSoC (Sub-Block Level)
     Better solution: parallelization on the algorithmic level (sub-block level), as sketched after this slide.
     - MAP decoder parallelization (exploiting the trellis windowing technique)
       - each processor can execute a sub-block of the complete block independently
       - slight increase in computational complexity due to the acquisition phase
       - allows distributed computing
     - Iterative exchange of interleaved information yields only limited locality
     - Low latency (decreases with N)
     - Large architectural efficiency
     - Computational locality, but a network-centric architecture
     [Figure: processors P_1 ... P_N, each holding one sub-block, read from and write to a shared interleaver/deinterleaver network.]
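
The sub-block partitioning and the acquisition overhead can be sketched as follows (an illustration under simple assumptions: equal-sized sub-blocks and a fixed acquisition window on both sides of each sub-block; the exact windowing scheme is not specified on the slide).

```python
def split_with_acquisition(block_len, num_workers, acq_len):
    """Partition a block of trellis steps into sub-blocks, each extended by an
    acquisition window so the forward/backward recursions can start from an
    estimated state. Returns (compute_range, keep_range) per processor."""
    sub = (block_len + num_workers - 1) // num_workers   # ceiling division
    jobs = []
    for p in range(num_workers):
        keep_start = p * sub
        keep_end = min((p + 1) * sub, block_len)
        comp_start = max(0, keep_start - acq_len)        # extra steps, results discarded
        comp_end = min(block_len, keep_end + acq_len)
        jobs.append(((comp_start, comp_end), (keep_start, keep_end)))
    return jobs

# Example: a 5114-bit 3GPP block on 8 processors with a 32-step acquisition window.
# The overhead is roughly 2*acq_len extra trellis steps per interior processor.
print(split_with_acquisition(5114, 8, 32))
```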

  7. Interleaver Bottleneck
     - Data from N sources have to be "perfectly randomly" distributed.
       [Example figure: bits stored at P_1 (memory M_1, interleaving positions 1,2,3) and P_2 (memory M_2, positions 4,5,6) are sent through the network to their interleaved positions, so several sources can target the same memory in one cycle.]
     - Average: P_i sends & receives the same amount of values per cycle.
     - Peak: P_i can receive up to N-1 more values than the average.
     - Crossbar functionality, but with output blocking conflicts (see the sketch after this slide).

     Interleaving Network Requirements
     - Flexibility and scalability
       - the interleaver scheme can change from decoding block to block (e.g. ~5000 different interleaver tables in UMTS)
       - different throughput requirements
     - Global data distribution
       - good interleavers imply no locality
     - Zero latency penalty
       - data distribution should be done completely in parallel to data calculation
     - Write conflicts, i.e. different PEs write simultaneously onto the same target PE
       - multi-port memories are infeasible
       - conflict-free interleaver design (e.g. the IMEC approach), but lacks flexibility
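
The average-versus-peak behaviour can be illustrated with a short experiment (a sketch, not from the talk). It assumes the block is split evenly over the PEs, every PE emits one value per cycle, and the target PE of a value is the one whose local memory holds its interleaved position.

```python
import random
from collections import Counter

def conflict_profile(block_len, num_pe, trials=100):
    """Estimate the worst-case number of values written to one target PE in a
    single cycle when num_pe PEs each emit one interleaved value per cycle."""
    sub = block_len // num_pe          # positions per PE memory (assumes divisibility)
    worst = 0
    for _ in range(trials):
        pi = random.sample(range(block_len), block_len)   # a random interleaver table
        for cycle in range(sub):
            # In this cycle, PE p emits the value at source position p*sub + cycle;
            # the interleaved position pi[...] lives in the memory of PE pi[...] // sub.
            targets = [pi[p * sub + cycle] // sub for p in range(num_pe)]
            worst = max(worst, max(Counter(targets).values()))
    return worst

# On average each PE receives one value per cycle, but the observed peak is
# typically several values in one cycle -- the output blocking conflict named above.
print(conflict_profile(block_len=96, num_pe=8))
```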

  8. Application Specific Processing Node
     - Increased ILP: Tensilica Xtensa RISC core extended for MAP calculation
       - double add-compare-select operation (butterfly), see the code sketch below:
         α_k(2n)   = max*( α_{k-1}(n) + Λ_k^in(I),  α_{k-1}(n+M/2) + Λ_k^in(II) )
         α_k(2n+1) = max*( α_{k-1}(n) + Λ_k^in(II), α_{k-1}(n+M/2) + Λ_k^in(I) )
       - max* operation: max*(x_1, x_2) = max(x_1, x_2) + ln(1 + exp(-|x_2 - x_1|))
       - zero-overhead data transfers: memory operations in parallel to the butterfly operation
     - 1.54 mm² (0.18 µm technology), f = 133 MHz

       Processor    Clock freq. [MHz]   Cycles/(bit*MAP)   Throughput @ 5 iter.
       Xtensa       133                   9                1.4 Mbit/s
       STM ST120    200                 100                ~200 kbit/s
       SC140        300                  50                600 kbit/s
       ADI TS       180                  27                666 kbit/s

     Processing Node Interface
     - Fast single-cycle local data memory
       - mapped into the processor's address space
     - XLMI single-cycle data interface for interprocessor communication
     - Communication device for data distribution
       - message-passing network (message = data + target address)
       - single-cycle access
     [Block diagram: Xtensa CPU core with PIF and XLMI interfaces, local data memory, FIFO buffers, and a communication device attached to the cluster bus. Message format: data (8 bit), buffer ID (1 bit), target processor node ID (7 bit), local address in buffer (14 bit).]
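
The butterfly and max* operations above translate directly into code. A small Python sketch as a functional reference (not the Xtensa instruction-set extension itself):

```python
import math

def max_star(x1, x2):
    """max*(x1, x2) = max(x1, x2) + ln(1 + exp(-|x2 - x1|)) (Jacobian logarithm)."""
    return max(x1, x2) + math.log1p(math.exp(-abs(x2 - x1)))

def alpha_butterfly(alpha_prev, lam_I, lam_II, n, M):
    """Double add-compare-select ('butterfly') from the slide: computes the two new
    state metrics alpha_k(2n) and alpha_k(2n+1) from the old states n and n + M/2.
    The branch metrics lam_I and lam_II are assumed to be given."""
    a_even = max_star(alpha_prev[n] + lam_I,  alpha_prev[n + M // 2] + lam_II)  # alpha_k(2n)
    a_odd  = max_star(alpha_prev[n] + lam_II, alpha_prev[n + M // 2] + lam_I)   # alpha_k(2n+1)
    return a_even, a_odd
```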
