Adapting a SDR environment to GPU architectures
06/22/2011 - 06/24/2011
SDR’11 - WinnComm - Europe
Pierre-Henri Horrein, Frédéric Pétrot (TIMA), Christine Hennebert
Contents
1 Context and aim
2 Approaches
3 Results
4 Conclusion
SDR’11 - WinnComm - Europe - Pierre-Henri Horrein | 2
OpenCL architecture
- Centralized management on host
- SIMD architecture: same kernels applied on large vectors
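As a rough illustration of this execution model, here is a sketch in plain Python standing in for OpenCL. The names `scale_kernel` and `enqueue` are hypothetical and are not OpenCL API calls. The host decides what to run, and the same kernel body is applied to every index of a large vector.

```python
def scale_kernel(gid, src, dst, gain):
    """Kernel body: what a single work-item does for its index `gid`."""
    dst[gid] = gain * src[gid]


def enqueue(kernel, global_size, *args):
    """Host-side stand-in for a kernel launch (hypothetical helper).

    A real device would run these work-items in parallel on its processing
    elements; this loop only emulates the result sequentially.
    """
    for gid in range(global_size):
        kernel(gid, *args)


samples = [float(i) for i in range(8)]
out = [0.0] * len(samples)
enqueue(scale_kernel, len(samples), samples, out, 2.0)
```

The host keeps full control of when and on which data the kernel runs; the kernel itself only knows its own index.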
GNURadio
SDR framework
Provides:
- a large set of SDR basic operations
- runtime management of the operations
- I/O integration (Ettus Research, audio, . . . )
[Figure: flow graph; labels: IQ Samples, Task 0 to Task 5, Applications]
Aim
[Figure: a host and two compute devices (Comp. Dev.) made of compute units (CU) and processing elements (PE), with "??" marking the open question]
Straightforward approach
- Use GPU as a single very efficient CPU
- Per-block optimization
- Efficient for some operations on very large data sets
Mapping to GPU: parallelism
- Use each PE as a small CPU
- Apply an optimized sequential operation on each data set
- Launch the operation on multiple data sets
- Efficient for streaming applications, requires more memory
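The idea can be sketched as follows, again in plain Python rather than real GPU code. The helper names and the running-sum routine are hypothetical placeholders for whatever sequential operation a block implements. The stream is cut into independent data sets, and one launch hands one data set to each PE.

```python
def split_into_blocks(stream, block_size):
    """Cut the sample stream into independent data sets, one per PE."""
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def sequential_op(block):
    """Hypothetical per-PE routine: an ordinary sequential loop over one block."""
    acc = 0.0
    out = []
    for x in block:
        acc += x
        out.append(acc)
    return out


def launch(blocks):
    """Emulates one launch: PE number k handles blocks[k].

    On a GPU the blocks would be processed concurrently, so every block has
    to be resident in device memory at the same time.
    """
    return [sequential_op(b) for b in blocks]


stream = [1.0] * 12
results = launch(split_into_blocks(stream, 4))
```

Because all data sets are held at once, the memory footprint grows with the number of data sets per launch, which matches the "requires more memory" remark above.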
Test platform and method
Test platform
- Intel Core i5 760 CPU (4 cores, 2.8GHz, 8MB cache)
- 4GB DDR3 memory
- Linux 2.6.36 kernel
- NVidia GTS 450 GPU, Asus DirectCU card, 1GB GDDR5 memory
Method
3 single operations:
- FFT
- IIR
- Mapping
Sequences of operations
FFT
[Figure: time (ms), 500 to 2500, versus N, 5 to 13; legend: CPU, OO]
- Straightforward solution inefficient on considered vector sizes
- Small gain for GPU solution
- Data transfer reduces performance
- GPU monitoring:
  - 10% for straightforward solution
  - 98% for parallel solution
IIR
[Figure: time (ms), 500 to 2500, versus N, 5 to 13; legend: CPU, OO]
- No optimized algorithm for straightforward solution
- About 50% gain for GPU solution
- High block size requires more memory
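One plausible reason an IIR filter resists per-sample parallelization is its recurrence: each output needs the previous output. A minimal sketch, assuming a first-order filter (the slides do not state the filter order or coefficients, so `b0` and `a1` are illustrative):

```python
def iir_first_order(x, b0, a1):
    """First-order IIR: y[n] = b0 * x[n] + a1 * y[n-1], with y[-1] = 0."""
    y = []
    prev = 0.0
    for sample in x:
        # The new output depends on the previous output, so samples inside
        # one data set cannot be computed independently of each other.
        prev = b0 * sample + a1 * prev
        y.append(prev)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
response = iir_first_order(impulse, b0=0.5, a1=0.5)
```

Running this loop unchanged on each PE, one data set per PE, fits the parallel approach without having to restructure the filter itself.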
Demapping
[Figure: time (ms), 500 to 2500, versus N, 5 to 13; legend: CPU, OO]
- No need for high processing power → GPU core is sufficient
- Very efficient on GPU, even for large data sets
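Demapping suits this model because each symbol is decided on its own with a few comparisons. A minimal sketch, assuming hard-decision QPSK where a bit is 1 when the corresponding component is negative (the slides do not state the constellation or the bit mapping used in the tests):

```python
def qpsk_demap(symbol):
    """Hard decision for one QPSK symbol.

    Assumed mapping: first bit from the sign of the real part, second bit
    from the sign of the imaginary part (1 when negative).
    """
    return (1 if symbol.real < 0 else 0, 1 if symbol.imag < 0 else 0)


# Four noisy symbols, one per quadrant.
symbols = [0.9 + 1.1j, -1.0 + 0.8j, -0.7 - 1.2j, 1.1 - 0.9j]
bits = [qpsk_demap(s) for s in symbols]
```

No symbol depends on any other, so the per-symbol work is small and trivially spread across processing elements.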
Multitasking
[Figure: time (ms), 500 to 3500, versus N, 5 to 13; legend: CPU, OO]
- No multitasking on GPU: sequential execution
- A buffer management issue reduces performance
- 20% gain with 4 tasks at size 1024
Conclusion and perspectives
Contributions
Study of two possible solutions for GPU integration:
- an existing solution, with disappointing results
- a new solution for streaming applications, with promising performance
Perspectives
- Resolve the buffer management issue
- Experiment in a real radio application