MIMO OFDM Transceiver for a Many-Core Computing Fabric – A Nucleus based Implementation

SLIDE 1

MIMO OFDM Transceiver for a Many-Core Computing Fabric – A Nucleus based Implementation

Institute for Communication Technologies and Embedded Systems

  • T. Kempf, D. Günther, A. Ishaque, G. Ascheid

ISS (Chair of Integrated Signal Processing Systems)

SLIDE 2

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 3

Flexible SDR

Software Defined Radio Vision

[Figure: today's mobile phone (e.g. GSM) next to a future SDR mobile phone (e.g. UMTS); the free area means cost savings or new functionality. Source: Infineon Technologies]

SLIDE 4

Flexible SDR

Software Defined Radio Vision

[Figure: today's mobile phone (e.g. GSM) next to a future SDR mobile phone (e.g. Bluetooth); the free area means cost savings or new functionality. Source: Infineon Technologies]

The three key properties:

Portability

  • Software is portable onto different platforms

Standard.exe → Device_1, ..., Device_n

Interoperability

  • Different devices configured for the same standard interoperate

Standard_1/Device_1 ↔ Standard_1/Device_2

Loadability

  • Platform is capable of running different standards

Device ← Standard_1.exe, ..., Standard_n.exe

But we must not forget:

Efficiency

  • Power consumption of flexible SDR must be close to power consumption of dedicated device (battery driven!)

SLIDE 5

Flexible SDR

Software Defined Radio Vision

[Figure: today's mobile phone (e.g. GSM) next to a future SDR mobile phone with on-the-fly configuration (e.g. Bluetooth, GSM.exe, UMTS.exe, LTE.exe); the free area means cost savings or new functionality. Source: Infineon Technologies]

The three key properties:

Portability

  • Software is portable onto different platforms

Standard.exe → Device_1, ..., Device_n

Interoperability

  • Different devices configured for the same standard interoperate

Standard_1/Device_1 ↔ Standard_1/Device_2

Loadability

  • Platform is capable of running different standards

Device ← Standard_1.exe, ..., Standard_n.exe

But we must not forget:

Efficiency

  • Power consumption of flexible SDR must be close to power consumption of dedicated device (battery driven!)

Contradicting Requirements! Flexibility (programmability) vs. Energy Efficiency

SLIDE 6

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 7

Nucleus Methodology

[Figure: a transceiver description composed of nuclei (N 1, N 2, N 5, N 7, ...) from the Nucleus Library and non-nucleus tasks, next to a HW platform with PE 1 (ASIP), PE 2 (rASIP), PE 3 (DSP), PE 4 (GPP), PE 5 (FPGA), MEM and a communication architecture]

Nucleus

  • Critical, demanding, algorithmic kernel
  • Kernel is common among different waveforms
  • Neither waveform nor hardware specific

SLIDE 8

Nucleus Methodology

[Figure: transceiver description composed of nuclei (N 1, N 2, N 5, N 7, ...) from the Nucleus Library and non-nucleus tasks; HW platform with PE 1 (ASIP), PE 2 (rASIP), PE 3 (DSP), PE 4 (GPP), PE 5 (FPGA), MEM and a communication architecture; NI blocks attached to the PEs, labeled "Flavor"]

SLIDE 9

Nucleus Methodology

[Figure: transceiver description composed of nuclei (N 1, N 2, N 5, N 7, ...) from the Nucleus Library and non-nucleus tasks; steps "Mapping & Evaluation" and "Compile"; HW platform with PE 1 (ASIP), PE 2 (rASIP), PE 3 (DSP), PE 4 (GPP), PE 5 (FPGA), MEM, a communication architecture and a Board Support Package; several NI blocks per PE, labeled "Flavors"]
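The relation between nuclei, flavors and the mapping step can be illustrated with a small data model. This is only a sketch: the class names `Flavor` and `Nucleus`, the function `map_nucleus`, the nucleus name and all cycle counts are hypothetical and do not come from the methodology's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flavor:
    """One implementation of a nucleus on a specific PE type (hypothetical model)."""
    pe_type: str  # e.g. "DSP", "ASIP", "GPP"
    cycles: int   # illustrative cost figure, not a measured value


@dataclass
class Nucleus:
    """A waveform- and hardware-independent kernel with its available flavors."""
    name: str
    flavors: list = field(default_factory=list)


def map_nucleus(nucleus, available_pes):
    """Pick the cheapest flavor whose PE type exists on the platform, else None."""
    candidates = [f for f in nucleus.flavors if f.pe_type in available_pes]
    return min(candidates, key=lambda f: f.cycles, default=None)


# Hypothetical library entry with three flavors
qrd = Nucleus("regularized_qrd",
              [Flavor("GPP", 9000), Flavor("DSP", 2500), Flavor("ASIP", 800)])

platform = {"GPP", "DSP"}  # assumed platform without an ASIP
chosen = map_nucleus(qrd, platform)
print(chosen)
```

The transceiver description refers only to the nucleus name; which flavor is used is decided per platform, which is the portability argument of the methodology.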

SLIDE 10

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 11

Nuclei Identification: Transceiver Structure

IEEE 802.11n

Outer Modem

  • Channel (De-)coding
  • (De-)Interleaving

Inner Modem (RX)

  • RX OFDM Processing
  • Channel Estimation
  • Spatial Equalizing: Mitigate channel impact on payload
  • Soft Demapping: Calculate soft bits (LLRs)
  • BPSK, 4QAM, 16QAM

[Figure label: OFDM Slot]

SLIDE 12

Nuclei Identification: Kernel Identification

  • Analyze different algorithmic choices within RX blocks
  • Identify computational kernels
  • Recurring tasks
  • Operate on data with certain alignment
  • Build application as composition of kernels

SLIDE 13

Nuclei Identification: Kernel Identification (Example)

LMMSE MIMO Equalizer with QRD

  • Basic transmission equation

$$ y = Hx + n $$

  • Linear MMSE equalization

$$ \hat{x} = G\,y, \qquad G = \left( \hat{H}^H \hat{H} + \frac{\sigma_n^2}{E_s} I \right)^{-1} \hat{H}^H $$

  • Regularized QRD

$$ \begin{bmatrix} \hat{H} \\ \frac{\sigma_n}{\sqrt{E_s}} I \end{bmatrix} = Q R = \begin{bmatrix} Q_a \\ Q_b \end{bmatrix} R $$

  • Rewrite G using Qa and Qb

$$ G = \frac{\sqrt{E_s}}{\sigma_n} Q_b Q_a^H $$

Computational Kernels

  • Regularized QR decomposition
  • Matrix-matrix multiplication
  • Matrix-vector multiplication
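The equivalence between the direct LMMSE filter and its regularized-QRD form can be checked numerically. The following NumPy sketch is an illustration under assumptions: the 4x4 dimensions, the values of `es` and `sigma_n`, and the random channel estimate are chosen freely and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx = 4, 4
es, sigma_n = 1.0, 0.3  # assumed symbol energy and noise standard deviation

# Random complex channel estimate, standing in for H-hat (illustrative only)
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# Direct form: G = (H^H H + sigma_n^2/E_s * I)^-1 H^H
G_direct = np.linalg.inv(
    H.conj().T @ H + (sigma_n**2 / es) * np.eye(n_tx)) @ H.conj().T

# Regularized QRD: stack H on top of (sigma_n/sqrt(E_s)) * I, then decompose
alpha = sigma_n / np.sqrt(es)
H_ext = np.vstack([H, alpha * np.eye(n_tx)])
Q, R = np.linalg.qr(H_ext)        # reduced QR: Q has shape (n_rx + n_tx, n_tx)
Q_a, Q_b = Q[:n_rx], Q[n_rx:]     # upper and lower block of Q

# QRD form: G = sqrt(E_s)/sigma_n * Q_b Q_a^H
G_qrd = (1.0 / alpha) * Q_b @ Q_a.conj().T

print(np.allclose(G_direct, G_qrd))
```

The lower block gives `Q_b @ R = alpha * I`, so `R` is inverted without an explicit matrix inversion. That is why the equalizer reduces to the three kernels listed above.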

SLIDE 14

Nuclei Identification: Kernel Overview

Application variants consist of a few kernels only!

SLIDE 15

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 16

Application Implementation: P2012 Platform (STMicroelectronics)

SoC platform with a maximum of 32 clusters

One cluster provides

  • Max. 16 RISC cores (STxP70) @ 600 MHz
  • VECx vector extension (SIMD)
  • 128 bit vector registers
  • 8x16 bit or 4x32 bit operations
  • Hardware synchronizer for inter-core signaling
  • Interface for hardware accelerators (ASICs)

SLIDE 17

Application Implementation: Kernel Overview

  • For 2x2 and 4x4 MIMO use case
  • Cycles for execution on a single STxP70 processor core including the VECx unit
  • Corresponding time for 600 MHz clock frequency

In the range of …

  • Competing solutions
  • IEEE 802.11n real time (4 µs per OFDM slot)
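The conversion from cycle counts to execution time at 600 MHz, and the cycle budget implied by a 4 µs OFDM slot, is simple arithmetic. A minimal sketch follows; the function name `cycles_to_us` and the example cycle count are hypothetical and not taken from the slide's table.

```python
CLOCK_HZ = 600e6  # STxP70 clock frequency stated on the slides
SLOT_US = 4.0     # IEEE 802.11n real-time budget per OFDM slot


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Execution time in microseconds for a given cycle count."""
    return cycles / clock_hz * 1e6


# Number of cycles a single core can spend within one OFDM slot
slot_budget_cycles = round(SLOT_US * 1e-6 * CLOCK_HZ)

example_cycles = 3000  # hypothetical kernel cost, for illustration only
print(slot_budget_cycles)
print(cycles_to_us(example_cycles))
```

A kernel that needs more cycles than the slot budget on one core has to be parallelized across cores or slots to stay real-time capable.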

SLIDE 18

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 19

Algorithm Performance Evaluation: Investigated Algorithmic Choices

Wide variety of algorithms is implemented

  • Channel Estimation, Spatial Equalizer, Channel Coding

Determine superior choice by error correction performance

Channel simulation

  • Fading: i.i.d. Rayleigh fading
  • Power delay profile: exponential, 20 dB drop along 150 ns
  • Noise: AWGN
  • 4x4 MIMO system
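A channel model of this kind can be sketched in NumPy as follows. The tap spacing of 50 ns (a 20 MHz sampling assumption), the number of taps, the noise level and all function names are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np


def exp_pdp(n_taps, tap_spacing_ns, drop_db=20.0, span_ns=150.0):
    """Exponential power delay profile: drop_db decay over span_ns, unit total power."""
    delays_ns = np.arange(n_taps) * tap_spacing_ns
    power = 10.0 ** (-drop_db * delays_ns / span_ns / 10.0)
    return power / power.sum()


def rayleigh_mimo_channel(rng, pdp, n_rx=4, n_tx=4):
    """Draw i.i.d. Rayleigh taps, one n_rx x n_tx matrix per delay, scaled by the PDP."""
    shape = (len(pdp), n_rx, n_tx)
    g = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return np.sqrt(pdp)[:, None, None] * g


def awgn(rng, shape, sigma_n):
    """Complex white Gaussian noise with variance sigma_n**2 per sample."""
    return sigma_n / np.sqrt(2) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


rng = np.random.default_rng(1)
pdp = exp_pdp(n_taps=4, tap_spacing_ns=50.0)  # taps at 0, 50, 100, 150 ns
h = rayleigh_mimo_channel(rng, pdp)           # shape (taps, rx, tx)
noise = awgn(rng, (4,), sigma_n=0.1)          # one noise sample per RX antenna
print(h.shape, 10 * np.log10(pdp[-1] / pdp[0]))
```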

SLIDE 20

Algorithm Performance Evaluation: ZF vs. MMSE MIMO Equalization

MMSE equalizer

  • Better performance at little computational cost

4x4 MIMO, r = 1/2, g1 = (133)_8, g2 = (171)_8, nconv = 6144 bit, nldpc = 1944 bit

I(C, ) = H(C) - H(C| )

[Figure regions: 4QAM region, 16QAM region]
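The performance gap between ZF and MMSE equalization can be made concrete with the standard closed-form error covariances of both linear equalizers. The sketch below assumes a perfectly known random 4x4 channel and freely chosen values for `es` and `sigma_n`; it does not reproduce the mutual-information curves of the slide.

```python
import numpy as np


def total_mse(H, es, sigma_n):
    """Return (ZF, MMSE) total mean squared error summed over all TX streams."""
    n_tx = H.shape[1]
    gram = H.conj().T @ H
    # ZF error covariance: sigma_n^2 * (H^H H)^-1
    zf = sigma_n**2 * np.trace(np.linalg.inv(gram)).real
    # MMSE error covariance: sigma_n^2 * (H^H H + sigma_n^2/E_s * I)^-1
    mmse = sigma_n**2 * np.trace(
        np.linalg.inv(gram + (sigma_n**2 / es) * np.eye(n_tx))).real
    return zf, mmse


rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
zf_mse, mmse_mse = total_mse(H, es=1.0, sigma_n=0.5)
print(zf_mse, mmse_mse)
```

The regularization term keeps the inverse bounded when the channel is ill-conditioned, so the MMSE error never exceeds the ZF error. The extra cost over ZF is the addition of a scaled identity, in line with "little computational cost".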

SLIDE 21

Frame Error Rate of 4x4 MIMO System (Short Frames)

Fixed-point issues at low FERs when using MMSE-QRD, SIC-MMSE

SLIDE 22

Frame Error Rate of 4x4 MIMO System (Short Frames)

For the investigated algorithms, MMSE-DS-QRD is a viable trade-off between algorithmic performance and implementation complexity

SLIDE 23

Frame Error Rate of 4x4 MIMO System for different Frame Sizes

Algorithmic performance comparable to results found in literature

SLIDE 24

Outline

Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation

  • Application Analysis - Nuclei Identification
  • Efficient Nuclei Implementations on HW Platform (Flavor)
  • Algorithmic Performance Evaluation
  • Application-to-Architecture Mapping

Summary & Outlook

SLIDE 25

Application-to-Platform Mapping: Identify Parallelism

Parallelizable dimensions of OFDM receiver application

  • Space (RX antennas)
  • Frequency (subcarriers)
  • Time (OFDM slots)

[Figure labels: Preamble, Data payload]
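Parallelism along the frequency dimension can be illustrated with per-subcarrier equalization: every subcarrier is processed independently, so the work can be split across cores without communication. The sketch below is an illustration only; the count of 52 subcarriers, the 4 workers and the random data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sc, n_rx, n_tx = 52, 4, 4  # 52 subcarriers is an illustrative assumption

# One equalizer matrix G[k] and one received vector y[k] per subcarrier k
G = rng.standard_normal((n_sc, n_tx, n_rx)) + 1j * rng.standard_normal((n_sc, n_tx, n_rx))
y = rng.standard_normal((n_sc, n_rx)) + 1j * rng.standard_normal((n_sc, n_rx))

# Sequential reference: one matrix-vector product per subcarrier
x_loop = np.stack([G[k] @ y[k] for k in range(n_sc)])

# Batched form: the same products expressed over the whole subcarrier axis
x_batched = np.einsum("kij,kj->ki", G, y)

# Split the subcarriers into 4 chunks, as 4 cores could process them independently
chunks = np.array_split(np.arange(n_sc), 4)
x_split = np.concatenate(
    [np.einsum("kij,kj->ki", G[idx], y[idx]) for idx in chunks])

print(np.allclose(x_loop, x_batched), np.allclose(x_loop, x_split))
```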

SLIDE 26

Application-to-Platform Mapping: Assign Cores to PGs

Given: Single core timing requirements
Goal: Assign cores to match real time constraints (4 µs per slot)

Task                                 time (µs)   #cores
Preprocessing (per OFDM frame)
  LS Channel Estimation                17.47       4
  Equalizer Preprocessing             215.31       4
Actual Processing (per OFDM slot)
  OFDM Demodulation (mem. realign)      6.83       2
  Equalizer (Actual Detection)          6.08       4
  Soft Demapping (16 QAM)               2.84       2
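A lower bound on the number of cores per per-slot task follows from the single-core times and the 4 µs slot budget. The sketch below assumes ideal scaling, for example consecutive OFDM slots distributed round-robin over the cores of a group; this scaling model and the helper name `min_cores` are assumptions, while the task times and core counts are those listed above.

```python
import math

SLOT_US = 4.0  # real-time constraint per OFDM slot

# (single-core time in µs, cores assigned) for the per-slot tasks
per_slot_tasks = {
    "OFDM Demodulation (mem. realign)": (6.83, 2),
    "Equalizer (Actual Detection)": (6.08, 4),
    "Soft Demapping (16 QAM)": (2.84, 2),
}


def min_cores(task_us, budget_us=SLOT_US):
    """Smallest core count that meets the budget under ideal scaling."""
    return math.ceil(task_us / budget_us)


for name, (task_us, assigned) in per_slot_tasks.items():
    needed = min_cores(task_us)
    per_core_us = task_us / assigned
    print(f"{name}: needs >= {needed}, assigned {assigned}, "
          f"{per_core_us:.2f} us per slot and core")
```

Under this model every assigned core count meets or exceeds the lower bound. The equalizer group is larger than the bound requires, which is consistent with it also carrying the per-frame preprocessing.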

SLIDE 27

Application-to-Platform Mapping: Assign Cores

Final mapping

  • Partitioning of components into processing groups
  • Number of cores per group
  • 8 cores enable real time

PG 1: Modulation, 2 PEs
PG 2: PP&EQ, 4 PEs
PG 3: Demapping, 2 PEs

SLIDE 28

2PARMA: Occupation Graph

  • Implementation on P2012 platform using 8 cores
  • Minimum latency for 27 or more OFDM slots of data payload
  • Latency = 8.2 µs
  • IEEE 802.11n allows 16 µs (including MAC layer)

SLIDE 29

Thank you for your attention! Any questions?

kempf@ice.rwth-aachen.de