Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation



SLIDE 1

23 September 2010

Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation

2010 High Performance Embedded Computing Workshop

Dan Campbell, Mark McCans, Mike Davis, Mike Brinkmann
dan.campbell@gtri.gatech.edu

GTRI_B-1

ECRB - HPC - 1

SLIDE 2

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 3

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 4

Radar Clutter

Radar will observe echo from object…

…as well as a strong return from the ground.

Strong returns from the ground, called “clutter”, often limit the performance of radars in air-to-air and air-to-ground operations.

SLIDE 5

Synthetic Air-to-Air Clutter

[Figure: synthetic range-Doppler clutter maps, Range Bin vs. Doppler (Hz). Labels surviving extraction: 7,500 Hz, 10,000 Hz, 12,500 Hz; MPRF Look-Down, MPRF, RG-HPRF, HPRF.]

Targets at the same range/Doppler as clutter will be obscured.

SLIDE 6

RF Clutter Simulation

Approach: Subdivide the ground into a number of unresolvable clutter patches and compute the contribution of each.

SLIDE 7

RF Clutter Simulation

[Figure labels: Signal, Delayed Signal, Phase Shift]

Radar clutter data is the sum of delayed and phase-shifted versions of the radar waveform.
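The idea of summing delayed and phase-shifted copies of the waveform can be sketched in a few lines of Python. This is only an illustration: the function names, the integer-sample delays, and the two toy scatterers are assumptions made here, not taken from the presentation.

```python
import cmath

def delayed_copy(waveform, delay_samples, n_out):
    """Return the waveform shifted right by an integer number of samples."""
    out = [0j] * n_out
    for i, sample in enumerate(waveform):
        k = i + delay_samples
        if 0 <= k < n_out:
            out[k] = sample
    return out

def clutter_sum(waveform, scatterers, n_out):
    """Sum amplitude-scaled, delayed, phase-shifted copies of the waveform.

    scatterers: iterable of (amplitude, delay_samples, phase_radians).
    """
    total = [0j] * n_out
    for amplitude, delay, phase in scatterers:
        rotation = amplitude * cmath.exp(1j * phase)
        for k, sample in enumerate(delayed_copy(waveform, delay, n_out)):
            total[k] += rotation * sample
    return total

# Toy example (hypothetical values): a two-sample rectangular pulse and
# two scatterers, the second one delayed by one sample and inverted.
pulse = [1 + 0j, 1 + 0j]
echoes = [(1.0, 0, 0.0), (0.5, 1, cmath.pi)]
received = clutter_sum(pulse, echoes, 4)
```

In the toy case the two echoes overlap in the second sample and partially cancel there, which is the behaviour the slide's figure describes.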

SLIDE 8

RF Clutter Simulation

Notional Parameters

                         Air-to-Air           SAR Imaging (Air-to-Ground)   Our Test
  # of Range Bins        200                  1750                          500
  # of Pulses            128                  3000                          8
  # of Clutter Patches   6,800 Rng x 96 Az    14,500 Rng x 26,812 Az        566 Rng x 52 Az
                         = 6.5 x 10^5         = 3.8 x 10^8                  = 29,432

Computational load depends on radar parameters and collection geometry (e.g., high-resolution scenarios require a large number of independent clutter patches).

SLIDE 9

RF Clutter Simulation

Algorithm:

Inputs

  • Radar Parameters (waveform, antenna, etc.)
  • Location of platform for each pulse

Output

  • Simulated radar data cube (sample voltage for each pulse, each channel, and each range bin)

For each pulse and for each range bin…
For each clutter patch in this range ring…

  1. Compute range, azimuth, and elevation from platform to clutter patch.
  2. Scale contribution of this clutter patch according to the radar range equation.
  3. Accumulate the contribution of this clutter patch to the simulated data cube.
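The three steps above can be sketched as a plain Python loop. This is a simplified illustration under stated assumptions, not the authors' code: it handles a single channel, treats each patch as a point response in one range bin, uses a 1/R^2 voltage fall-off for the range equation, and takes a hypothetical `antenna_gain(azimuth, elevation)` callback. It also iterates over patches and derives the range bin from the computed range, which visits the same pulse/bin/patch combinations as the loop order on the slide.

```python
import cmath
import math

def unit_gain(azimuth, elevation):
    """Placeholder antenna pattern (assumption): gain of 1 everywhere."""
    return 1.0

def simulate_clutter(platform_positions, patches, n_range_bins,
                     bin_size_m, wavelength_m, antenna_gain=unit_gain):
    """Accumulate clutter returns into a [pulse][range_bin] data cube.

    platform_positions: one (x, y, z) tuple per pulse, in metres.
    patches: list of ((x, y, z), reflectivity) tuples.
    """
    cube = [[0j] * n_range_bins for _ in platform_positions]
    for p, (px, py, pz) in enumerate(platform_positions):
        for (cx, cy, cz), reflectivity in patches:
            dx, dy, dz = cx - px, cy - py, cz - pz
            # Step 1: range, azimuth, and elevation from platform to patch.
            rng = math.sqrt(dx * dx + dy * dy + dz * dz)
            azimuth = math.atan2(dy, dx)
            elevation = math.asin(dz / rng)
            # Step 2: scale by the radar range equation. Received power
            # falls off as 1/R^4, so voltage falls off as 1/R^2.
            amplitude = (reflectivity * antenna_gain(azimuth, elevation)
                         / (rng * rng))
            phase = -4.0 * math.pi * rng / wavelength_m  # two-way path
            # Step 3: accumulate into the range bin containing this patch.
            range_bin = int(rng // bin_size_m)
            if range_bin < n_range_bins:
                cube[p][range_bin] += amplitude * cmath.exp(1j * phase)
    return cube

# Hypothetical two-pulse scenario with one patch directly below pulse 0.
positions = [(0.0, 0.0, 10000.0), (1.0, 0.0, 10000.0)]
ground = [((0.0, 0.0, 0.0), 1.0)]
cube = simulate_clutter(positions, ground, n_range_bins=8,
                        bin_size_m=2000.0, wavelength_m=0.03)
```

The innermost body is independent for each (pulse, patch) pair apart from the accumulation, which is what makes the problem attractive for a GPU.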

SLIDE 10

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 11

Validation Needs

  • Porting MATLAB → C introduces changes
    • Random Number Generator
    • Double → Single
    • Implementation of some functions, e.g. transcendentals
    • Reordering of operations
    • Programmer Error
  • Identical output too costly
  • Derive acceptance criteria from expected usage needs
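Two of the listed sources of change, reordering of operations and double-to-single conversion, are easy to demonstrate. The snippet below is a generic floating-point illustration with values chosen here for convenience; it is not drawn from the simulation itself.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Reordering: the same three terms summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
reorder_diff = abs(left_to_right - right_to_left)

# Double -> single: relative rounding error introduced for one value.
single_rel_err = abs(to_single(0.1) - 0.1) / 0.1
```

Neither difference is a bug, which is why bit-identical output is an impractical acceptance test for a port.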

SLIDE 12

Validation Approach

  • Modify sim to capture RNG stream from MATLAB
  • Automate large number of runs for golden data
  • Accelerated port optionally ingests RNG stream
  • Capture port output and compare to golden data
  • Acceptance Criteria:
    • CNR∆ = ( CNR_M - CNR_T ) / CNR_M < 10^-4
    • ECR = 20 log10( norm(M(:) - T(:)) / norm(M(:)) ) < -60 dB
    • ADMSE = Mean( | fft2(M(:)) - fft2(T(:)) |^2 ) < 10^-3
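The acceptance criteria can be written out as a small checker. The sketch below assumes that M is the golden MATLAB output and T is the output of the port under test, that both are small 2-D arrays of complex samples, and that the CNR values are supplied by the simulation. The absolute value in `cnr_delta` and the naive DFT (used here in place of `fft2` to stay dependency-free) are choices made for this example.

```python
import cmath
import math

def flatten(matrix):
    return [v for row in matrix for v in row]

def norm(values):
    """Euclidean norm of a flat list of complex samples."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))

def cnr_delta(cnr_m, cnr_t):
    """Relative CNR difference (absolute value is an assumption)."""
    return abs(cnr_m - cnr_t) / abs(cnr_m)

def ecr_db(m, t):
    """20*log10(norm(M - T) / norm(M)), in dB."""
    diff = [a - b for a, b in zip(flatten(m), flatten(t))]
    ratio = norm(diff) / norm(flatten(m))
    return -math.inf if ratio == 0.0 else 20.0 * math.log10(ratio)

def dft2(matrix):
    """Naive 2-D DFT; adequate only for small test matrices."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += matrix[r][c] * cmath.exp(1j * angle)
            out_row.append(acc)
        out.append(out_row)
    return out

def admse(m, t):
    """Mean squared magnitude of the difference of the 2-D spectra."""
    fm, ft = flatten(dft2(m)), flatten(dft2(t))
    return sum(abs(a - b) ** 2 for a, b in zip(fm, ft)) / len(fm)

def accepted(cnr_m, cnr_t, m, t):
    return (cnr_delta(cnr_m, cnr_t) < 1e-4
            and ecr_db(m, t) < -60.0
            and admse(m, t) < 1e-3)

# Hypothetical data: the port differs from golden by 1e-6 in one sample.
golden = [[1 + 0j, 2 + 0j], [3 + 0j, 4 + 0j]]
port = [[1 + 1e-6 + 0j, 2 + 0j], [3 + 0j, 4 + 0j]]
zeros = [[0j, 0j], [0j, 0j]]
ok = accepted(10.0, 10.0000001, golden, port)
bad = accepted(10.0, 10.0000001, golden, zeros)
```

Here `ok` passes all three thresholds, while `bad` fails on ECR because an all-zero output gives an error-to-signal ratio of 0 dB.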

SLIDE 13

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 14

GPU VSIPL

  • http://www.vsipl.org
  • Industry standard C API for portable dense linear algebra & signal processing
  • Also C++, Python
  • Accelerated implementations for many platforms, primarily embedded, coprocessor-based systems

VSIPL implementation that exploits Graphics Processing Units to accelerate VSIPL applications, developed at GTRI

  • http://gpu-vsipl.gtri.gatech.edu

SLIDE 15

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 16

Original Validation Results

  • VSIPL versions compared to MATLAB version

                   VSIPL Double   VSIPL Single   Threshold
  CNR Consistent   Yes            Yes
  CNR ∆            10^-16         10^-6          < 10^-4
  ECR              -152 dB        2.9 dB         < -60 dB
  ADMSE            10^-12         10^4           < 10^-3

SLIDE 17

Single Precision

Single precision errors caused by high dynamic range in platform to clutter patch range calculation:

d(Platform → clutter) >>> d(clutter patch → clutter patch)

Solution: use far-field approximation technique

  • Double precision used to compute a base range
  • Single precision for sets of ∆R values
  • Small number of double precision calculations has negligible effect on performance
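A short numerical sketch shows why splitting the range into a double precision base and a single precision ∆R helps. The 50 km base range and 1 mm offset are hypothetical values chosen for illustration. Between 32,768 m and 65,536 m, adjacent single precision values are 2^-8 m (about 3.9 mm) apart, so a millimetre-scale offset is lost if the full range is stored in single precision.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

base_range_m = 50000.0   # hypothetical platform-to-scene range
delta_r_m = 0.001        # hypothetical patch-to-patch range offset
exact_m = base_range_m + delta_r_m

# Naive: the full range is stored in single precision.
naive_m = to_single(exact_m)
naive_error_m = abs(naive_m - exact_m)

# Split: base range stays in double, only the small delta R is single.
split_m = base_range_m + to_single(delta_r_m)
split_error_m = abs(split_m - exact_m)
```

In the naive case the whole 1 mm offset disappears, while the split form keeps the error well below a nanometre. The expensive double precision work is confined to one base range per set of ∆R values.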

SLIDE 18

Far Field Approx. via Taylor Expansion

Range between platform at x and clutter patch at y

Linear approximation near x0

  • Distance from center of scene
  • Unit vector from CPI center to clutter patch

Quadratic Term

  • Distance travelled in direction orthogonal to “lines” of constant range
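The slide's equations are not present in this transcript. A standard second-order Taylor expansion of the range that is consistent with the surviving labels is given below as a reconstruction. The symbols R_0, \hat{\mathbf{u}}, and the subscript \perp are notation introduced here and may differ from the authors' own.

```latex
R(\mathbf{x}) = \lVert \mathbf{y} - \mathbf{x} \rVert
\;\approx\;
\underbrace{R_0 - \hat{\mathbf{u}} \cdot (\mathbf{x} - \mathbf{x}_0)}_{\text{linear approximation near } \mathbf{x}_0}
\;+\;
\underbrace{\frac{\lVert (\mathbf{x} - \mathbf{x}_0)_{\perp} \rVert^{2}}{2 R_0}}_{\text{quadratic term}},
\qquad
R_0 = \lVert \mathbf{y} - \mathbf{x}_0 \rVert,
\quad
\hat{\mathbf{u}} = \frac{\mathbf{y} - \mathbf{x}_0}{R_0},
\quad
(\mathbf{x} - \mathbf{x}_0)_{\perp}
= (\mathbf{x} - \mathbf{x}_0)
- \bigl(\hat{\mathbf{u}} \cdot (\mathbf{x} - \mathbf{x}_0)\bigr)\,\hat{\mathbf{u}}
```

Here \hat{\mathbf{u}} is the unit vector from the CPI center x0 to the clutter patch, and the perpendicular component is the distance travelled orthogonal to that line of sight. Dropping the quadratic term leaves the linear far-field approximation.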

SLIDE 19

Bounding Error

Approximation Error

Case 1: Air-to-Air

  • 128 pulses, 20 kHz PRF, 300 m/s velocity
  • 10 km Altitude
  • error < 50 µm, < 0.06° phase at X band

Case 2: SAR

  • 10 second dwell, 100 m/s velocity
  • 10 km Altitude
  • error < 12.5 m >> λ at X band!!!

Linear approximation to range may be appropriate for typical air-to-air scenarios.
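The two range-error figures can be reproduced with a few lines of arithmetic. The sketch assumes that the neglected quadratic term is bounded by d^2 / (2R), that d is half the distance flown during the dwell (the largest offset from the CPI center), and that R is at least the altitude. These assumptions match the slide's numbers, but the slide does not state the authors' exact derivation.

```python
def quadratic_term_bound_m(distance_travelled_m, min_range_m):
    """Upper bound on the neglected quadratic range term, d^2 / (2 R).

    d is taken as half the distance travelled during the dwell and
    R as the minimum platform-to-patch range (assumptions).
    """
    d = distance_travelled_m / 2.0
    return d * d / (2.0 * min_range_m)

# Case 1: air-to-air, 128 pulses at 20 kHz PRF, 300 m/s, 10 km altitude.
air_to_air_dwell_s = 128 / 20e3
air_to_air_m = quadratic_term_bound_m(air_to_air_dwell_s * 300.0, 10e3)

# Case 2: SAR, 10 s dwell at 100 m/s, 10 km altitude.
sar_m = quadratic_term_bound_m(10.0 * 100.0, 10e3)
```

The air-to-air case comes out at roughly 46 µm, under the quoted 50 µm, and the SAR case at 12.5 m, which is hundreds of X-band wavelengths.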

SLIDE 20

Validation Results

  • Comparison to original MATLAB version
  • Approximation technique used in each version listed

                   MATLAB Single   VSIPL Double   VSIPL Single   Threshold
  CNR Consistent   Yes             Yes            Yes
  CNR ∆            10^-7           10^-14         10^-5          < 10^-4
  ECR              -101 dB         -130 dB        -98 dB         < -60 dB
  ADMSE            10^-7           10^-10         10^-6          < 10^-3

SLIDE 21

Outline

  • RF Clutter Simulation
  • Validation Approach
  • GPU VSIPL
  • Precision Issues
  • VSIPL Port, Optimization, and Results

SLIDE 22

VSIPL PORT

  • MATLAB to VSIPL port made easier due to VSIPL functions that emulate MATLAB operations
  • Original MATLAB code very complex, particularly for a radar novice
  • First pass of the port was done with almost no attempt at optimization
  • GPU transition required some additional changes
    • Single vs Double precision issues
    • Time cost of operations differs TASP → GPU
    • VSIPL needs “sample” function

SLIDE 23

Optimization Issues

  • MATLAB code written for readability over speed
    • Too many nested loops, operations involving small datasets
    • Many redundant calculations
  • Original code was very flexible, due to large user base
    • Most optimizations required removing some generality
    • Assumptions need to be made about the scenario
  • Abstraction barrier issues
    • Small operations less costly on CPU than GPU
    • Operation fusion, coarser operations, and leaving small things in C each helped
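Operation fusion and invariant hoisting can be illustrated in plain Python. The functions below are a generic sketch with hypothetical names and a made-up per-patch calculation; they are not VSIPL calls and not the authors' code. In an array library each whole-array step in `unfused` would be a separate operation with its own temporary, and on a GPU typically its own kernel launch.

```python
import math

def unfused(ranges_m, wavelength_m, gain):
    """Three whole-array passes, each producing a temporary array, with
    the constant 4*pi/wavelength recomputed for every element."""
    phases = [-(4.0 * math.pi / wavelength_m) * r for r in ranges_m]
    amplitudes = [gain / (r * r) for r in ranges_m]
    return [a * math.cos(p) for a, p in zip(amplitudes, phases)]

def fused(ranges_m, wavelength_m, gain):
    """One pass, no temporaries, loop-invariant constant hoisted."""
    k = -4.0 * math.pi / wavelength_m  # hoisted invariant
    return [gain / (r * r) * math.cos(k * r) for r in ranges_m]

# Hypothetical inputs for a quick equivalence check.
sample_ranges_m = [10000.0, 10000.5, 10250.0]
slow = unfused(sample_ranges_m, 0.03, 2.0)
fast = fused(sample_ranges_m, 0.03, 2.0)
```

Both versions produce the same values; the fused form simply touches memory once per element, which is the effect the coarser, fused operations were after.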

SLIDE 24

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

SLIDE 25

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

Annotations:

  • Reduced generality; Dynamic → Static

SLIDE 26

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

Annotations:

  • Reduced generality; Dynamic → Static
  • Small ops VSIPL → C

SLIDE 27

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

Annotations:

  • Reduced generality; Dynamic → Static
  • Small ops VSIPL → C
  • Reduced generality; simplified operations

SLIDE 28

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

Annotations:

  • Reduced generality; Dynamic → Static
  • Small ops VSIPL → C
  • Reduced generality; simplified operations
  • Hoisted invariants; reordered for fusion

SLIDE 29

HPC Port – Performance

  • Optimization progression of single precision VSIPL:

[Figure: bar chart of runtime, vertical axis in 20s steps up to 180s. Legend: Matlab, VSIPL, GPU VSIPL.]

Annotations:

  • Reduced generality; Dynamic → Static
  • Small ops VSIPL → C
  • Reduced generality; simplified operations
  • Hoisted invariants; reordered for fusion
  • Stride consciousness; coarser VSIPL ops; loop fusion

SLIDE 30

HPC Port – Performance

  • Performance Timing Results:

  Version             Runtime (s)   Speedup
  MATLAB              162.5         1x
  TASP VSIPL Double   20.9          7.8x
  TASP VSIPL Single   14.0          11.6x
  GPU VSIPL Single    2.2           73.8x
  CUDA Native         1.3           125x

  • GTX 480/Q6600 TASP single core only