SLIDE 1

Adapting a SDR environment to GPU architectures

06/22/2011 - 06/24/2011 SDR’11 - WinnComm - Europe

Pierre-Henri Horrein, Frédéric Pétrot (TIMA), Christine Hennebert

SLIDE 2

Contents

Context and aim Approaches Results Conclusion

  1. Context and aim
  2. Approaches
  3. Results
  4. Conclusion

SDR’11 - WinnComm - Europe - Pierre-Henri Horrein | 2

SLIDE 3

Outline

1 Context and aim 2 Approaches 3 Results 4 Conclusion

SLIDE 4

OpenCL architecture

  • Centralized management on the host
  • SIMD architecture: the same kernels are applied on large vectors
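
The data-parallel model can be illustrated with a minimal sketch in plain Python (not OpenCL). The kernel `scale_kernel` and the host-side helper `enqueue` are hypothetical names introduced only for this illustration. The kernel is written for a single index, and the host applies it to every element of a vector. A real OpenCL device would run the work-items concurrently; the loop here is sequential and only mirrors the structure.

```python
def scale_kernel(gid, src, dst, gain):
    """Body executed by one work-item; gid plays the role of get_global_id(0)."""
    dst[gid] = gain * src[gid]


def enqueue(kernel, global_size, *args):
    """Host-side stand-in for launching a kernel over an index range (hypothetical helper)."""
    for gid in range(global_size):
        kernel(gid, *args)


samples = [0.5, -1.0, 2.0, 4.0]
out = [0.0] * len(samples)
# The host decides what runs and on how many elements; the kernel only sees one index.
enqueue(scale_kernel, len(samples), samples, out, 2.0)
```

The split between `enqueue` and `scale_kernel` mirrors the two points above: the host centralizes management, and the same kernel body runs for every element.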

SLIDE 5

GNURadio

An SDR framework that provides:

  • a large set of SDR basic operations
  • runtime management of the operations
  • I/O integration (Ettus Research, audio, ...)

[Figure: flow graph of Task 0 to Task 5, labeled "IQ Samples" and "Applications"]

SLIDE 6

Aim

[Figure: a host and two compute devices (Comp. Dev.), each made of compute units (CU) containing processing elements (PE); a "??" marks the open question of the mapping]

SLIDE 7

Outline

1 Context and aim 2 Approaches 3 Results 4 Conclusion

SLIDE 8

Straightforward approach

  • Use the GPU as a single, very efficient CPU
  • Per-block optimization
  • Efficient for some operations on very large data sets

SLIDE 9

Mapping to GPU: parallelism

  • Use each PE as a small CPU
  • Apply an optimized sequential operation on each data set
  • Launch the operation on multiple data sets
  • Efficient for streaming applications, but requires more memory
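
The idea of one sequential operation per PE, launched over several data sets, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: `running_sum`, `run_batch` and `samples_buffered` are hypothetical names, and the emulated PEs run one after another instead of concurrently.

```python
def running_sum(block):
    """Sequential operation: each output depends on the previous one."""
    acc, out = 0.0, []
    for x in block:
        acc += x
        out.append(acc)
    return out


def run_batch(op, batch):
    """One emulated PE per data set; on a GPU these would run concurrently."""
    return [op(block) for block in batch]


def samples_buffered(batch):
    """Number of samples that must be resident at once to feed every PE."""
    return sum(len(block) for block in batch)


batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]]
results = run_batch(running_sum, batch)
resident = samples_buffered(batch)
```

The memory cost is visible in `samples_buffered`: every data set has to be buffered before the batch is launched, so the footprint grows with the number of data sets processed together.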

SLIDE 10

Outline

1 Context and aim 2 Approaches 3 Results 4 Conclusion

SLIDE 11

Test platform and method

Test platform

  • Intel Core i5 760 CPU (4 cores, 2.8 GHz, 8 MB cache)
  • 4 GB DDR3 memory
  • Linux 2.6.36 kernel
  • NVidia GTS 450 GPU, Asus DirectCU card, 1 GB GDDR5 memory

Method

3 single operations:

  • FFT
  • IIR
  • Mapping

Sequences of operations

SLIDE 12

FFT

[Figure: FFT execution time (ms) versus N, for N from 5 to 13; the legend includes a CPU series]

  • The straightforward solution is inefficient for the vector sizes considered
  • Small gain for the GPU solution
  • Data transfer reduces performance

GPU monitoring:

  • 10% for the straightforward solution
  • 98% for the parallel solution

SLIDE 13

IIR

[Figure: IIR execution time (ms) versus N, for N from 5 to 13; the legend includes a CPU series]

  • No optimized algorithm for the straightforward solution
  • ∼50% gain for the GPU solution
  • Large block sizes require more memory
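
One plausible reading of the first point is that the feedback term of an IIR filter makes each output depend on the previous one, so a single block is hard to split across PEs. The sketch below assumes a first-order filter y[n] = b0·x[n] + a1·y[n-1]; the slides do not give the filter order or coefficients, and `iir_first_order` is a hypothetical name. Processing blocks independently, as done here, also assumes they belong to separate streams or that the initial state is handled elsewhere.

```python
def iir_first_order(block, b0, a1, y_prev=0.0):
    """y[n] = b0*x[n] + a1*y[n-1]; the feedback term forces in-order evaluation."""
    out = []
    for x in block:
        y_prev = b0 * x + a1 * y_prev
        out.append(y_prev)
    return out


# Impulse response of the assumed filter (b0 = 1.0, a1 = 0.5).
impulse = [1.0, 0.0, 0.0, 0.0]
response = iir_first_order(impulse, b0=1.0, a1=0.5)

# Several independent blocks, each handled as one sequential job.
blocks = [impulse, [2.0, 0.0]]
filtered = [iir_first_order(b, b0=1.0, a1=0.5) for b in blocks]
```

Each entry of `blocks` stays sequential inside, which is why the parallelism has to come from the number of blocks and why larger blocks increase the memory needed per job.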

SLIDE 14

Demapping

[Figure: demapping execution time (ms) versus N, for N from 5 to 13; the legend includes a CPU series]

  • No need for high processing power → a GPU core is sufficient
  • Very efficient on GPU, even for large data sets
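
To show why demapping needs so little processing power per sample, here is a minimal sketch of a hard-decision demapper. The slides do not name the constellation; QPSK with one bit taken from the sign of each of I and Q is an assumption made for this example, and `qpsk_demap` is a hypothetical name.

```python
def qpsk_demap(symbol):
    """Hard decision: one bit from the sign of I, one from the sign of Q (assumed mapping)."""
    bit_i = 0 if symbol.real >= 0 else 1
    bit_q = 0 if symbol.imag >= 0 else 1
    return (bit_i, bit_q)


# Noisy received symbols, one per quadrant.
symbols = [complex(0.9, 1.1), complex(-1.2, 0.8), complex(-0.7, -1.0), complex(1.0, -0.9)]
bits = [qpsk_demap(s) for s in symbols]
```

Each symbol is decided with two comparisons and no dependency on its neighbours, so every sample can be handled by a separate, simple core.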

SLIDE 15

Multitasking

[Figure: multitasking execution time (ms) versus N, for N from 5 to 13; the legend includes a CPU series]

  • No multitasking on GPU: sequential execution
  • An issue with buffer management reduces performance
  • 20% gain for 4 tasks at size 1024

SLIDE 16

Outline

1 Context and aim 2 Approaches 3 Results 4 Conclusion

SLIDE 17

Conclusion and perspectives

Contributions

Study of two possible solutions for GPU integration:

  • an existing solution, with disappointing results
  • a new solution for streaming applications, with promising performance

Perspectives

  • Resolve the buffer management issue
  • Experiment in a real radio application
