SLIDE 1

Sora: High Performance Software Radio using General Purpose Multi-Core Processors

Kun Tan† Jiansong Zhang† Ji Fang‡ He Liu§ Yusheng Ye§ Shen Wang§ Yongguang Zhang† Haitao Wu† Wei Wang† Geoffrey M. Voelker◊

† Microsoft Research Asia
‡ Tsinghua University, Beijing, China
§ Beijing Jiaotong University, Beijing, China
◊ UCSD, La Jolla, USA

NSDI 2009, Boston, USA

SLIDE 2

Software Radio

Benefits

  • Promise of universal connectivity and cost saving
  • Programmability => faster development cycle, faster to market
  • Open platform for wireless research

[Figure: radio standards (CDMA, WiFi, WiMAX, GPS, 3G, GSM, Bluetooth, LTE …) implemented in software on top of a general RF frontend]

SLIDE 3

Fundamental Challenges

  • Large volume of high-fidelity digital signals

– Require a high-speed system I/O

[Figure: antenna and RF frontend (A/D, D/A) in hardware, exchanging digital samples with software on the processor]

1.2Gbps for 802.11 (20MHz channel, 16b A/D, 4x)

Up to ~5Gbps for 11n (4x4 MIMO); over 10Gbps for future high-speed wireless
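
As a rough check on the 802.11 figure above, the three parameters on the slide can simply be multiplied. This is a minimal sketch; the function name is hypothetical, and reading "4x" as one combined multiplier (for example I and Q components with 2x oversampling) is an assumption rather than something the slide spells out.

```python
def sample_rate_bps(channel_hz: int, bits_per_sample: int, factor: int) -> int:
    """Raw digital sample rate in bits per second.

    `factor` is the slide's "4x", treated here as a single multiplier
    (assumption: I/Q components combined with oversampling).
    """
    return channel_hz * bits_per_sample * factor


# 20 MHz channel, 16-bit A/D, 4x
rate = sample_rate_bps(20_000_000, 16, 4)

if __name__ == "__main__":
    print(f"{rate / 1e9:.2f} Gbps")  # prints "1.28 Gbps"
```

The result, 1.28Gbps, agrees with the roughly 1.2Gbps quoted for 802.11 and with the "Samples @1.28Gbps" rate shown at the RF end of the processing pipeline on the following slides.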

SLIDE 4

Fundamental Challenges

  • Large volume of high-fidelity digital signals

– Require a high-speed system I/O

  • Computation-intensive signal processing

[Figure: PHY processing pipeline with data rates between stages]

Transmitter: From MAC → (Bits @24Mbps) → Scramble → (Bits @24Mbps) → Convolutional encoder → (Bits @48Mbps) → Interleaving → (Bits @48Mbps) → QAM Mod → (Samples @384Mbps) → IFFT → (Samples @512Mbps) → GI Addition → (Samples @640Mbps) → Symbol Wave Shaping → (Samples @1.28Gbps) → To RF

Receiver: From RF → (Samples @1.28Gbps) → Decimation → (Samples @640Mbps) → Remove GI → (Samples @512Mbps) → FFT → (Samples @384Mbps) → Demod + Interleaving → (Bits @48Mbps) → Viterbi decoding → (Bits @24Mbps) → Descramble → (Bits @24Mbps) → To MAC

SLIDE 5

Fundamental Challenges

  • Large volume of high-fidelity digital signals

– Require a high-speed system I/O

  • Computation-intensive signal processing

[Figure: PHY processing pipeline with data rates between stages]

Transmitter: From MAC → (Bits @24Mbps) → Scramble → (Bits @24Mbps) → Convolutional encoder → (Bits @48Mbps) → Interleaving → (Bits @48Mbps) → QAM Mod → (Samples @384Mbps) → IFFT → (Samples @512Mbps) → GI Addition → (Samples @640Mbps) → Symbol Wave Shaping → (Samples @1.28Gbps) → To RF

Receiver: From RF → (Samples @1.28Gbps) → Decimation → (Samples @640Mbps) → Remove GI → (Samples @512Mbps) → FFT → (Samples @384Mbps) → Demod + Interleaving → (Bits @48Mbps) → Viterbi decoding → (Bits @24Mbps) → Descramble → (Bits @24Mbps) → To MAC

Raw computation power required: 802.11b => 10Gops, 802.11a => 40Gops! (today's server-class CPUs run at a 3GHz clock)

SLIDE 6

Fundamental Challenges

  • Large volume of high-fidelity digital signals

– Require a high-speed system I/O

  • Computation-intensive signal processing
  • Hard deadline and accurate timing control

– 802.11 MAC requires response within a few µs
– Event trigger timing accuracy at µs level

SLIDE 7

Approaches

[Figure: design space of SDR platforms, plotted as Programmability (low to high) against Performance (low to high)]

Programmable hardware (FPGA), embedded DSP

  • Example: Rice WARP, TI SFF-SDR

Low-performance GPP-based SDR

  • Example: GNU Radio/USRP (v1 & 2)
  • Interface USB/GbE: <1Gbps, >1ms
  • Achievable wireless xput: ~100Kbps

Sora: resolving the SDR platform dilemma

  • Commodity PC w/ C program
  • High performance
  • sys tput: 10Gbps; ~µs latency
  • target wireless xput: 10M~1Gbps

SLIDE 8

Sora Approach

  • New PCIe-based interface card => high system throughput
  • New optimizations to implement PHY algorithms and streamline processing on multi-core CPU => efficient PHY processing
  • Core dedication => real-time support

SLIDE 9

Sora Architecture

[Figure: applications (APP) run over the Sora soft-radio stack on a multi-core CPU with memory; digital samples move at multiple Gbps over the PCIe bus between the CPU and the Sora hardware, which consists of the RCB and RF front-ends (A/D, D/A)]

Sora Hardware

General radio front-end: 700M/1.8G/2.4G/5GHz

SLIDE 10

Radio Control Board (RCB)

[Figure: Sora architecture diagram, as on slide 9]

PCIe-based high-speed interface card

  • PCIe is commodity in most modern PCs
  • High throughput: 16Gbps at PCIe-8x
  • Low latency: ~1 µs
  • Separated from other I/O devices

SLIDE 11

RCB Details

  • PCIe-8x interface: up to 16Gbps throughput
  • Versatile RF interface: up to 8 channels (8x8 MIMO)
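
The 16Gbps figure is consistent with a simple lane calculation. The sketch below assumes first-generation PCIe signaling (2.5 GT/s per lane with 8b/10b line coding); the slides do not state the PCIe generation, so treat those constants, and the function name, as assumptions.

```python
def pcie_payload_bps(lanes: int, transfers_per_s: int = 2_500_000_000,
                     code_num: int = 8, code_den: int = 10) -> int:
    """Usable bit rate of a PCIe link after line coding, per direction.

    Defaults model first-generation PCIe (assumption): 2.5 GT/s per lane,
    8b/10b coding, i.e. 8 data bits for every 10 bits on the wire.
    """
    return transfers_per_s * code_num // code_den * lanes


x8 = pcie_payload_bps(8)

# Measured throughput quoted on the next slide, assuming it refers to
# the same PCIe data path.
measured = 12_300_000_000
efficiency = measured / x8

if __name__ == "__main__":
    print(f"x8 link: {x8 / 1e9:.0f} Gbps, measured/peak = {efficiency:.2f}")
```

Under these assumptions the measured 12.3Gbps would be roughly three quarters of the link's peak rate.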

SLIDE 12

RCB Details

  • Versatile RF interface: up to 8 channels (8x8 MIMO)
  • Buffered data path: bridging the synchronous ops at RF and asynchronous processing at CPU (12.3Gbps measured)
  • Low-latency control path for software (0.36 µs measured)

[Figure: RCB block diagram with antenna, RF front-end (RF circuit, A/D, D/A), RF controller, FIFOs, registers, DMA controller, SDRAM controller, DDR SDRAM, FPGA, PCIe controller and PCIe bus]

SLIDE 13

Sora Software

[Figure: Sora architecture diagram, as on slide 9]

High-performance SDR processing w/ key software techniques

  • Efficient PHY implementation using SIMD and LUTs
  • Speed up PHY using multi-core streamline processing
  • Core dedication for real-time support

SLIDE 14

Efficient PHY Implementation

  • Exploit large high-speed cache memory

– Extensive use of lookup tables (LUT): trade memory for calculation; the tables still fit well within the L2 cache
– Applicable to more than half of the common algorithms; speedup ranges from 1.5x to 22x

Ex: Convolutional encoder

  • Direct impl.: 8 ops per bit
  • LUT impl.: 2 lookup ops for 8 bits! (size 32KB)

[Figure: convolutional encoder drawn as a shift register of six Tb delay elements feeding two adders that produce Output Data A and Output Data B]
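
The LUT idea can be sketched for the rate-1/2, constraint-length-7 encoder used by 802.11a (generator polynomials 133 and 171 octal). This is an illustrative sketch rather than Sora's code: the function names, the LSB-first bit order and the table layout (indexed by 6-bit state plus 8-bit input, returning 16 output bits) are all assumptions.

```python
from array import array

G0, G1 = 0o133, 0o171  # 802.11a generator polynomials, K = 7


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def encode_bitwise(bits, state=0):
    """Direct implementation: several shift/mask/XOR steps per input bit."""
    out = []
    for b in bits:
        reg = (b << 6) | state           # newest bit on top of the 6-bit history
        out.append(parity(reg & G0))     # output A
        out.append(parity(reg & G1))     # output B
        state = reg >> 1                 # drop the oldest bit
    return out, state


def build_lut():
    """Precompute the 16 output bits for every (state, input byte) pair."""
    lut = array("H", [0]) * (1 << 14)    # 16384 entries of 16 bits each
    for state in range(64):
        for byte in range(256):
            bits = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumption)
            out, _ = encode_bitwise(bits, state)
            lut[(state << 8) | byte] = sum(o << i for i, o in enumerate(out))
    return lut


def encode_lut(data: bytes, lut, state=0):
    """LUT implementation: one table lookup per input byte."""
    words = []
    for byte in data:
        words.append(lut[(state << 8) | byte])
        state = byte >> 2                # the last six input bits form the new state
    return words, state


if __name__ == "__main__":
    table = build_lut()
    print(len(table) * table.itemsize, "bytes")
    print(encode_lut(b"\x01\x02", table))
```

With 2-byte entries, a table laid out this way occupies 32KB, the same size the slide quotes, although the slide does not describe how Sora's table is actually organized.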

SLIDE 15

Efficient PHY Implementation

  • Exploit data parallelism in PHY

– Utilize wide-vector SIMD extension in CPU
– Applicable to many PHY algorithms with significant speedups (1.6x ~ 50x)

  • Ex. (I)FFT

SLIDE 16

Speed up PHY using multi-core streamline processing

  • Efficiently partition and schedule the PHY processing across cores

– Interconnecting sub-pipelines with light-weight, synchronized FIFOs
– Static scheduling of processing modules in PHY pipeline

[Figure: receiver pipeline (Decimation, Remove GI, FFT, Demod + Interleaving, Viterbi decoding, Descramble) split across Core 1 and Core 2, with a synchronized FIFO between the two sub-pipelines]
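
To make the sub-pipeline hand-off concrete, here is a minimal sketch of a single-producer, single-consumer FIFO joining two pipeline halves. The slide does not describe the FIFO internals; the per-slot full/empty flag, the class and function names, and the stand-in arithmetic for the PHY stages are all assumptions, and Python threads stand in for dedicated cores.

```python
import threading
import time


class SyncFifo:
    """Single-producer, single-consumer ring buffer (illustrative only)."""

    def __init__(self, slots: int = 8):
        self._full = [False] * slots   # per-slot flag: True while the slot holds data
        self._data = [None] * slots
        self._head = 0                 # next slot to write; used only by the producer
        self._tail = 0                 # next slot to read; used only by the consumer

    def try_put(self, item) -> bool:
        i = self._head
        if self._full[i]:
            return False               # ring is full, caller retries
        self._data[i] = item
        self._full[i] = True           # publish only after the payload is stored
        self._head = (i + 1) % len(self._full)
        return True

    def try_get(self):
        i = self._tail
        if not self._full[i]:
            return False, None         # ring is empty
        item = self._data[i]
        self._full[i] = False          # hand the slot back to the producer
        self._tail = (i + 1) % len(self._full)
        return True, item


_END = object()  # end-of-stream marker


def run_pipeline(samples):
    """Run two statically assigned pipeline halves joined by one FIFO."""
    fifo, out = SyncFifo(), []

    def front_half():                  # stand-in for the stages on one core
        for s in list(samples) + [_END]:
            item = s if s is _END else s * 2
            while not fifo.try_put(item):
                time.sleep(0)          # yield and retry

    def back_half():                   # stand-in for the stages on the other core
        while True:
            ok, item = fifo.try_get()
            if not ok:
                time.sleep(0)
                continue
            if item is _END:
                return
            out.append(item + 1)

    threads = [threading.Thread(target=front_half),
               threading.Thread(target=back_half)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(range(5)))
```

Because each index is touched by only one side and slot ownership is passed through the flag, no lock is taken. A C version on real hardware would additionally have to enforce memory ordering between writing the payload and setting the flag.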

SLIDE 17

Core Dedication for Real-time Support

  • Exclusively allocate enough cores for SDR processing in multi-core systems

– Guarantee the CPU, cache and memory bandwidth resources for predictable performance
– Achieve µs-level timing control
– Simple abstraction, and easier to implement in standard OSes than an RT scheduler

  • Implemented in WinXP without modifications to the kernel

SLIDE 18

Implementation

  • Sora software platform on Win XP

– 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc.

  • SoftWiFi: full implementation of IEEE 802.11a/b/g PHY and DCF MAC

– 9K lines of C code; 4 man-months for dev & test
– DSSS 1, 2, 5.5, 11Mbps for 11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54Mbps for 11a/g

SLIDE 19

Results: PHY Processing

[Figure: two bar charts of required computation (giga cycles per second, axis 1 to 10) by modulation mode (1M, 2M, 5.5M, 11M for 802.11b; 6M, 24M, 54M for 802.11a/g); one chart is labeled "After Sora Optimization". Bar labels: 11.6, 11.7, 11.7, 11.8, 18.3, 60.4, 132.4. Annotations: >30x speedup, ~10x speedup]

SLIDE 20

Results: PHY Processing

[Figure: two bar charts of required computation (giga cycles per second, axis 1 to 10) by modulation mode (1M, 2M, 5.5M, 11M for 802.11b; 6M, 24M, 54M for 802.11a/g); one chart is labeled "After Sora Optimization". Bar labels: 11.6, 11.7, 11.7, 11.8, 18.3, 60.4, 132.4. Annotations: >30x speedup, ~10x speedup]

Sora enables software implementations of today's high-speed wireless systems on a standard PC with a few cores

SLIDE 21

Results: End-to-end Throughput

Communicating with commercial 802.11a/b/g card

[Figure: bar chart of throughput (Mbps, axis 5 to 25) by modulation mode (1M, 2M, 5.5M, 11M, 6M, 24M, 54M) for three configurations: Sora-Commercial, Commercial-Commercial, Commercial-Sora]

SLIDE 22

Results: End-to-end Throughput

Communicating with commercial 802.11a/b/g card

[Figure: bar chart of throughput (Mbps, axis 5 to 25) by modulation mode (1M, 2M, 5.5M, 11M, 6M, 24M, 54M) for three configurations: Sora-Commercial, Commercial-Commercial, Commercial-Sora]

Seamlessly interoperate with commercial WiFi

  • Correctness of all PHY algorithms
  • Satisfying timing requirements of standards
  • Commercial equivalent performance

SLIDE 23

Extensions

Jumbo frames in 802.11

TDMA MAC

SLIDE 24

Extensions: New Applications

SLIDE 25

Conclusion

  • Sora is a fully programmable software radio platform on commodity PC architecture

– Easy C programming on multi-core CPU
– High performance: high processing speed, low latency, and performance guarantee

  • Confirmed by SoftWiFi, the first fully interoperable IEEE 802.11 (PHY and MAC) on general purpose processors
  • Plan to release Sora SDK to research community

– H/W: RCB + 2.4G RF front-end set (~$2K USD)

SLIDE 26

DEMO

  • Sora demo in the demo session this evening
  • You can interact with Sora with your own laptop, iPhone, or other smart phones

SSID: soranet

HD Video streaming

11/5/2008

SLIDE 27

Q&A