Sora: High Performance Software Radio using General Purpose Multi-Core Processors



  1. Sora: High Performance Software Radio using General Purpose Multi-Core Processors
     Kun Tan†, Jiansong Zhang†, Ji Fang‡, He Liu§, Yusheng Ye§, Shen Wang§, Yongguang Zhang†, Haitao Wu†, Wei Wang†, Geoffrey M. Voelker◊
     † Microsoft Research Asia; ‡ Tsinghua University, Beijing, China; § Beijing Jiaotong University, Beijing, China; ◊ UCSD, La Jolla, USA
     NSDI 2009, Boston, USA

  2. Software Radio
     [Figure: a general RF front-end plus software implementing Bluetooth, GPS, WiFi, WiMAX, GSM, CDMA, 3G, LTE, …]
     Benefits
     • Promise of universal connectivity and cost saving
     • Programmability => faster development cycle, faster to market
     • Open platform for wireless research

  3. Fundamental Challenges
     • Large volume of high-fidelity digital signals
       – Requires a high-speed system I/O
       – 1.2 Gbps for 802.11 (20 MHz channel, 16-bit A/D, 4x); up to ~5 Gbps for 802.11n (4x4 MIMO); over 10 Gbps for future high-speed wireless
     [Figure: antenna → RF front-end → A/D and D/A (hardware) → digital samples → processor (software)]

  4. Fundamental Challenges (cont.)
     • Computation-intensive signal processing
     Transmitter: From MAC → [bits @24 Mbps] → Scramble → [24 Mbps] → Convolutional encoder → [48 Mbps] → Interleaving → [48 Mbps] → QAM Mod → [samples @384 Mbps] → IFFT → [512 Mbps] → GI Addition → [640 Mbps] → Symbol Wave Shaping → [1.28 Gbps] → To RF
     Receiver: From RF → [samples @1.28 Gbps] → Decimation → [640 Mbps] → Remove GI → [512 Mbps] → FFT → [384 Mbps] → Demod + Deinterleaving → [bits @48 Mbps] → Viterbi decoding → [24 Mbps] → Descramble → [24 Mbps] → To MAC

  5. Fundamental Challenges (cont.)
     • Raw computation power required: 802.11b => 10 Gops, 802.11a => 40 Gops!
       – For comparison, a current server-class CPU runs at a 3 GHz clock

  6. Fundamental Challenges (cont.)
     • Hard deadline and accurate timing control
       – 802.11 MAC requires a response within a few µs
       – Event-trigger timing accuracy at the µs level

  7. Approaches: Resolving the SDR platform dilemma
     [Figure: SDR platforms plotted by performance (low/high) against programmability (low/high)]
     • Programmable hardware (FPGA) and embedded DSP: high performance, low programmability
       – Examples: Rice WARP, TI SFF-SDR
     • Low-performance GPP-based SDR
       – Example: GNU Radio/USRP (v1 & v2)
       – Interface USB/GbE: <1 Gbps, >1 ms latency
       – Achievable wireless throughput: ~100 Kbps
     • Sora: high performance and high programmability
       – Commodity PC with C programs
       – System throughput: 10 Gbps; ~µs latency
       – Target wireless throughput: 10 Mbps to 1 Gbps

  8. Sora Approach
     • New PCIe-based interface card => high system throughput
     • New optimizations to implement PHY algorithms and streamline processing on multi-core CPUs => efficient PHY processing
     • Core dedication => real-time support

  9. Sora Architecture
     [Figure: Sora hardware (RF front-ends, A/D and D/A, Radio Control Board (RCB)) connected over the PCIe bus to a multi-core PC with memory running the Sora Soft-Radio Stack and applications; digital samples flow at multiple Gbps]
     • General radio front-end: 700 MHz / 1.8 GHz / 2.4 GHz / 5 GHz

  10. Radio Control Board
     PCIe-based high-speed interface card
     • PCIe is commodity in most modern PCs
     • High throughput: 16 Gbps at PCIe x8
     • Low latency: ~1 µs
     • Separated from other I/O devices

  11. RCB Details
     • PCIe x8 interface: up to 16 Gbps throughput
     • Versatile RF interface: up to 8 channels (8x8 MIMO)

  12. RCB Details (cont.)
     [Figure: RCB block diagram. An FPGA with A/D and D/A FIFOs, an RF controller, a DMA controller, a PCIe controller, an SDRAM controller, and registers; on-board DDR SDRAM; an RF circuit linking to the RF front-end and antenna; the PCIe bus to the host]
     • Buffered data path: bridges the synchronous operations at the RF side and the asynchronous processing on the CPU (12.3 Gbps measured)
     • Low-latency control path for software (0.36 µs measured)

  13. Sora Software
     High-performance SDR processing with key software techniques:
     • Efficient PHY implementation using SIMD and LUTs
     • Speed up PHY using multi-core streamline processing
     • Core dedication for real-time support

  14. Efficient PHY Implementation
     • Exploit large high-speed cache memory
       – Extensive use of lookup tables (LUTs): trade memory for computation; the tables still fit well in the L2 cache
       – Applicable to more than half of the common algorithms; speedups range from 1.5x to 22x
     • Example: convolutional encoder (figure: six T_b delay elements feeding XOR adders that produce output data A and B)
       – Direct implementation: 8 ops per bit
       – LUT implementation: 2 lookup ops per 8 bits! (table size 32 KB)

  15. Efficient PHY Implementation (cont.)
     • Exploit data parallelism in the PHY
       – Utilize the wide-vector SIMD extensions of the CPU
       – Applicable to many PHY algorithms, with significant speedups (1.6x to 50x)
     • Example: (I)FFT

  16. Speed up PHY using multi-core streamline processing
     • Efficiently partition and schedule the PHY processing across cores
       – Interconnect sub-pipelines with lightweight, synchronized FIFOs
       – Static scheduling of processing modules in the PHY pipeline
     [Figure: the receiver pipeline (Decimation, Remove GI, FFT, Demod + Deinterleaving, Viterbi decoding, Descramble) split between Core 1 and Core 2, connected by a synchronized FIFO]

  17. Core Dedication for Real-time Support
     • Exclusively allocate enough cores for SDR processing in multi-core systems
       – Guarantees the CPU, cache, and memory bandwidth resources for predictable performance
       – Achieves µs-level timing control
       – Simple abstraction, easier to implement in standard OSes than a real-time scheduler
     • Implemented in Windows XP without modifications to the kernel

  18. Implementation
     • Sora software platform on Windows XP
       – 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc.
     • SoftWiFi: full implementation of the IEEE 802.11a/b/g PHY and DCF MAC
       – 9K lines of C code; 4 man-months for development and testing
       – DSSS 1, 2, 5.5, 11 Mbps for 802.11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54 Mbps for 802.11a/g

  19. Results: PHY Processing
     [Figure: required computation (giga-cycles per second, axis 0 to 10) for 802.11b (1M, 2M, 5.5M, 11M) and 802.11a/g (6M, 24M, 54M) modes, with the requirement after Sora optimization. Bar labels: 11.6, 11.7, 11.7, 11.8, 18.3, 60.4, 132.4. Annotations: >30x speedup, ~10x speedup]

  20. Results: PHY Processing (cont.)
     • Sora enables software implementation of today's high-speed wireless systems on a standard PC with a few cores

  21. Results: End-to-end Throughput
     [Figure: throughput (Mbps, 0 to 25) vs. modulation mode (1M, 2M, 5.5M, 11M, 6M, 24M, 54M) when communicating with a commercial 802.11a/b/g card; series: Sora-Commercial, Commercial-Commercial, Commercial-Sora]

  22. Results: End-to-end Throughput (cont.)
     Seamlessly interoperates with commercial WiFi:
     • Correctness of all PHY algorithms
     • Satisfies the timing requirements of the standards
     • Performance equivalent to commercial cards

  23. Extensions: TDMA MAC; jumbo frames in 802.11

  24. Extensions: New Applications

  25. Conclusion
     • Sora is a fully programmable software radio platform on commodity PC architecture
       – Easy C programming on multi-core CPUs
       – High performance: high processing speed, low latency, and performance guarantees
     • Confirmed by SoftWiFi, the first fully interoperable IEEE 802.11 implementation (PHY and MAC) on general-purpose processors
     • Plan to release the Sora SDK to the research community
       – Hardware: RCB + 2.4 GHz RF front-end set (~$2K USD)
