SLIDE 1

August 2004 LMF

Utilizing commercial graphics processors in the real-time geo-registration of streaming high-resolution imagery

Laurence Flath, Michael Kartz, Randall Frank
Lawrence Livermore National Laboratory

GP² Workshop
August 7-8, 2004

UCRL-PRES-205737

SLIDE 2

Outline

  • Introduction
  • Problem
    – Real-time video streaming
  • Solution
    – Real-time geo-registration using GPU
  • Implementation on graphics processor
  • Experimental results
  • Demonstration
  • Conclusion
SLIDE 3

Introduction

  • Real-time video processing is a trade-off
    – Image size [pixels / frame]
    – Image rate [frames / second]
  • General-purpose central processing units (CPUs) are very fast, but total system throughput often is not fast enough
    + Superscalar architecture, vector units
    – Inadequate memory bandwidth
  • Graphics Processing Units (GPUs) are not as fast, but are specialized for the task
    + Huge memory bandwidth
    – Not ideal for generalized algorithms

Image sources: Intel, ATI

SLIDE 4

Graphics Processing Work at LLNL

  • GPU Discovery Project
    – Develop infrastructure for streaming data processing
    – Research novel data mapping for non-traditional data types
    – Implement algorithms on graphics processors for real-world computational problems
    – Catalog advantages / disadvantages of GPU-based algorithms
    – Investigate mapping of algorithms to higher-level APIs
    – Study impact of GPU implementations on next-generation systems
  • Visualization
  • Numerous application-based projects for which the GPU is an integral system component

SLIDE 5

Problem - Viewing Remote Video

[Diagram labels: Mobile Camera, Ground, Link, Scene]

SLIDE 6

Video Transmission Limited by Bandwidth

  • High-resolution video generates a huge amount of data
    – 10 to 100 MPixels/sec
    – 10 to 400 MB/sec
  • Mobile platforms often don’t have access to a high-bandwidth transport medium
    – Microwave, laser comm systems designed for stationary operation
    – Available mobile systems range from 9.6 kb/sec to 30 Mb/sec
  • Need a 10^4 to 10^5 compression ratio
    – Generic spatial-temporal algorithms give (at best) 100:1
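
The required ratio is the raw video bit rate divided by the link bit rate. A minimal sketch of that arithmetic using the figures on this slide; the function name is illustrative, and the conversion assumes 1 MB = 10^6 bytes of 8 bits each:

```python
def compression_ratio(source_mb_per_s: float, link_bits_per_s: float) -> float:
    """Ratio of raw video bit rate to available link bit rate."""
    source_bits_per_s = source_mb_per_s * 1e6 * 8  # MB/sec -> bits/sec
    return source_bits_per_s / link_bits_per_s


# Source rates from the slide: 10 to 400 MB/sec.
# Link rates from the slide: 9.6 kb/sec to 30 Mb/sec.
for source in (10.0, 400.0):
    for link in (9.6e3, 30e6):
        ratio = compression_ratio(source, link)
        print(f"{source:6.1f} MB/sec over {link:10.0f} b/sec -> {ratio:12.1f}:1")
```

Under these assumptions the slowest link needs roughly 10^4:1 to 10^5:1, and even the fastest link paired with the largest source rate needs about 100:1, the best case quoted for generic algorithms.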

SLIDE 7

How To Get 10^4 Compression Ratio

  • Re-scope the problem!
  • Ground imagery doesn’t change very often
    – Point camera for a long time at the same spot
    – Transform imagery such that background doesn’t change, even though camera platform is moving
    – Only transmit required scene information
      • Moving objects
      • Stationary background (occasionally)
SLIDE 8

Geo-Registration

  • Transform imagery recorded in one perspective into another
  • Usually produce ‘nadir-looking’ view
    – Stationary scene ideal for background removal algorithms
    – Result may be used as a map, with features given in GPS coordinates
    – Permits sensor fusion; e.g. visible, infrared, radar, etc.
  • Requires inertial navigation system (INS) data from sensor platform
    – Global positioning system (GPS) provides position
    – Inertial measurement unit (IMU) provides attitude; e.g. roll, pitch, heading
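
The attitude angles from the IMU are typically combined into a single rotation before any perspective mapping. The sketch below is one common way to do that, not necessarily the convention used in this work: it assumes angles in radians and the composition R = Rz(heading) · Ry(pitch) · Rx(roll). The function names are illustrative.

```python
import math


def rotation_from_attitude(roll: float, pitch: float, heading: float):
    """3x3 rotation matrix R = Rz(heading) * Ry(pitch) * Rx(roll).

    Assumed convention (not stated in the slides): angles in radians,
    heading about z, pitch about y, roll about x.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ch, sh = math.cos(heading), math.sin(heading)
    return [
        [ch * cp, ch * sp * sr - sh * cr, ch * sp * cr + sh * sr],
        [sh * cp, sh * sp * sr + ch * cr, sh * sp * cr - ch * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# A pure 90-degree heading change turns the x axis into the y axis.
print(rotate(rotation_from_attitude(0.0, 0.0, math.pi / 2), [1.0, 0.0, 0.0]))
```

Axis directions and rotation order differ between navigation systems, so the convention must be matched to the actual INS output before this is used.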

SLIDE 9

Imaging the ground obliquely - y-axis

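Treat system like a pinhole camera; i.e. Z, G >> f

[Figure: side view of the ‘Lens’ imaging the ground; labels: y_c, y_g, G, pitch, y, f, Z, D, Center of FOV]

$$ y_g = \frac{D^2}{Z} \cdot \frac{y_c}{f} \cdot \frac{1}{1 - \dfrac{G}{Z}\,\dfrac{y_c}{f}} $$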
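
One way to arrive at the y-axis relation is sketched here under assumptions that the slide text does not fully confirm. Z is taken as the camera height above the ground, G as the ground distance from the point below the camera to the center of the field of view, D as the slant range with D^2 = Z^2 + G^2, and y_c as positive toward the horizon. The angles \theta and \alpha are introduced only for this derivation.

```latex
\begin{aligned}
\tan\theta &= \frac{G}{Z}, \qquad \tan\alpha = \frac{y_c}{f} \\
y_g &= Z\tan(\theta + \alpha) - G
     = Z\,\frac{\dfrac{G}{Z} + \dfrac{y_c}{f}}{1 - \dfrac{G}{Z}\,\dfrac{y_c}{f}} - G \\
    &= \frac{Z^2 + G^2}{Z}\cdot\frac{y_c}{f}\cdot\frac{1}{1 - \dfrac{G}{Z}\,\dfrac{y_c}{f}}
     = \frac{D^2}{Z}\cdot\frac{y_c}{f}\cdot\frac{1}{1 - \dfrac{G}{Z}\,\dfrac{y_c}{f}}
\end{aligned}
```

With the opposite sign convention for y_c, the minus sign in the denominator becomes a plus.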

SLIDE 10

Imaging the ground obliquely - x-axis

[Figure: “Single-point perspective” construction; labels: y, y_c, x_c, x_F]

$$ x_g = D \cdot \frac{x_c}{f} \cdot \frac{1}{1 - \dfrac{G}{Z}\,\dfrac{y_c}{f}} $$
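
Both oblique-imaging relations share the same perspective divisor, so they can be evaluated together. This is a minimal sketch, assuming the relations take the form x_g = D (x_c/f) / (1 - (G/Z)(y_c/f)) and y_g = (D^2/Z)(y_c/f) / (1 - (G/Z)(y_c/f)), with D the slant range to the center of the field of view and y_c positive toward the horizon. The function name and the error handling are illustrative.

```python
import math


def image_to_ground(xc: float, yc: float, f: float, Z: float, G: float):
    """Map image-plane coordinates (xc, yc) to ground offsets (xg, yg)
    measured from the center of the field of view.

    Assumptions: Z is camera height, G is ground distance to the center of
    the field of view, and D = sqrt(Z^2 + G^2) is the slant range.
    """
    D = math.hypot(Z, G)
    divisor = 1.0 - (G / Z) * (yc / f)  # shared perspective divisor
    if divisor <= 0.0:
        raise ValueError("ray is at or above the horizon; no ground intersection")
    xg = D * (xc / f) / divisor
    yg = (D * D / Z) * (yc / f) / divisor
    return xg, yg


# Looking straight down (G = 0) reduces to plain scaling by Z / f.
print(image_to_ground(xc=0.01, yc=0.02, f=0.1, Z=1000.0, G=0.0))
# A 45-degree oblique view stretches the far side of the image.
print(image_to_ground(xc=0.05, yc=0.1, f=1.0, Z=1000.0, G=1000.0))
```

The divisor approaches zero as the ray approaches the horizon, which is why pixels on the far side of an oblique image cover much more ground than pixels on the near side.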

SLIDE 11

Geo-Registration Algorithm

  • Calculate angles and distances from INS data
  • Map perspective equations to a homogeneous coordinate transformation
  • Propagate source pixels through transformation to output image plane
    – Interpolate / anti-alias
    – Fill-in blank patches
  • Remove jitter due to GPS uncertainty, IMU drift
    – Register to known target or previous imagery
    – Shift result to sub-pixel resolution
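
The step of mapping perspective equations to a homogeneous transformation can be illustrated with a 3x3 matrix acting on (x_c, y_c, 1). This is a sketch only, assuming the oblique-imaging relations have the form x_g = D (x_c/f) / (1 - (G/Z)(y_c/f)) and y_g = (D^2/Z)(y_c/f) / (1 - (G/Z)(y_c/f)) with D = sqrt(Z^2 + G^2). The function names are illustrative, and a graphics API would use the 4x4 equivalent.

```python
import math


def ground_homography(f: float, Z: float, G: float):
    """3x3 homogeneous matrix taking (xc, yc, 1) to (xg*w, yg*w, w)."""
    D = math.hypot(Z, G)  # assumed slant range to center of field of view
    return [
        [D / f, 0.0, 0.0],
        [0.0, D * D / (Z * f), 0.0],
        [0.0, -G / (Z * f), 1.0],  # third row produces the perspective divisor
    ]


def apply_homography(H, xc: float, yc: float):
    """Multiply by H, then divide by the homogeneous coordinate w."""
    x, y, w = (row[0] * xc + row[1] * yc + row[2] for row in H)
    return x / w, y / w


H = ground_homography(f=1.0, Z=1000.0, G=1000.0)
print(apply_homography(H, 0.05, 0.1))
```

Writing the mapping as a single matrix means the division happens once per pixel in the final step, which is the operation graphics hardware performs natively when texturing a transformed quad.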

SLIDE 12

Muir Flight Experiment

Demonstrate real-time on-board geo-registration and moving target extraction

  • MRC RF Comm System (TCP/IP)
  • Commodity PCs (Twin Apple G5s, ATI Radeon 9600)
  • Twin 11 MPixel, 2 Hz Prototype Cameras
  • GPS/IMU System
  • Helicopter Platform
  • Wescam 3-Axis Stabilized Gimbal

SLIDE 13

System Data Flow - Per Camera

[Data-flow diagram blocks: Camera, Disk Record, GPS/IMU Rectify, Correlate / Shift, Decimate / Segment, Motion Detect, Blob Find, Disk Record; two Socket Connections]

  • Six objects (threads) distribute work load between two CPUs:
    – Camera / frame grabber
    – Camera record
    – Geo-registration
    – Motion-detect / blob find
    – Blob record
    – Decimated imagery
  • Socket communications
    – Individual control of objects
    – Live data for ground station
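
The thread-per-object layout above can be sketched as stages connected by queues. This is an illustration of the structure only, not the flight software: the stage functions are hypothetical stand-ins, only two of the six objects are shown, and the socket control channel is omitted.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Run `work` on every item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        outbox.put(work(item))


def run_pipeline(frames, stages):
    """Feed frames through one thread per stage; return results in order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for two of the processing objects.
def geo_register(frame):
    return ("registered", frame)


def motion_detect(item):
    return ("blobs", item[1])


print(run_pipeline(range(3), [geo_register, motion_detect]))
```

Because each stage is a single thread reading a first-in, first-out queue, frame order is preserved without extra bookkeeping.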

SLIDE 14

Image Processing Steps

[Diagram: Per-Sensor Data Pipeline]

  • Raw Data Collection (Frame Grabber)
  • Geo-Rectify (GPS/IMU)
  • Scene-Based Registration (Spatial Correlator)
  • Motion-Detect (Star-Killer)
  • Blob-Detect (Threshold / Blob-Find)
  • Data Archival (Transmit and/or Store)
  • Geometry Correction (Image Flip, Distortion)

Other diagram labels: Registration From Adjacent FPAs; INS Data
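
The motion-detect and blob-detect stages can be illustrated with a generic stand-in. The slides do not describe the "Star-Killer" algorithm, so this sketch substitutes plain frame differencing against a background image, followed by a threshold and 4-connected component grouping. All names here are illustrative.

```python
def frame_difference(current, background):
    """Absolute per-pixel difference between two equally sized images."""
    return [
        [abs(a - b) for a, b in zip(row_cur, row_bg)]
        for row_cur, row_bg in zip(current, background)
    ]


def find_blobs(diff, threshold):
    """Group pixels above `threshold` into 4-connected blobs.

    Returns a list of blobs, each a sorted list of (row, col) pixels.
    """
    rows, cols = len(diff), len(diff[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or diff[r][c] <= threshold:
                continue
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and diff[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append(sorted(blob))
    return blobs


background = [[10] * 5 for _ in range(4)]
current = [row[:] for row in background]
current[0][0], current[0][1], current[3][4] = 200, 180, 250
print(find_blobs(frame_difference(current, background), threshold=50))
```

Only the blob lists need to be transmitted per frame, which is far smaller than the imagery itself.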

SLIDE 15

Image Processing Steps

[Diagram: per-sensor pipeline with each block marked GPU-based or CPU-based]

  • Raw Data Collection (Frame Grabber)
  • Geo-Rectify (GPS/IMU)
  • Scene-Based Registration (Spatial Correlator)
  • Motion-Detect (Star-Killer)
  • Blob-Detect (Threshold / Blob-Find)
  • Data Transmit / Archive (Transmit and/or Store)
  • Geometry Correction (Image Flip, Distortion)

Other diagram labels: Registration From Adjacent FPAs; INS Data

SLIDE 16

GPU Processing

  • Load image into texture(s)
    – 4K x 2.6K x U16 pixel image
    – Four 2K x 2K tiles (GL_MAX_TEXTURE_SIZE == 2048)
  • Calculate transformation matrix based on INS data
  • Render registration region
    – Read pixels
    – Perform correlation on CPU
    – Feed shifts back into transformation matrix
  • Render entire output image
    – Currently rendering to glX context (not a pbuffer)
    – If desired output greater than 1K x 1K, need to tile output as well
    – Use asynchronous transfer and shared caching modes to reduce readback time
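
Splitting the source image to fit the texture size limit can be sketched as follows. The exact pixel dimensions are not given in the slides, so 4096 x 2600 is a hypothetical reading of "4K x 2.6K", and the function name is illustrative.

```python
def tile_rects(width: int, height: int, max_size: int):
    """Split a width x height image into tiles at most max_size on a side.

    Returns (x, y, w, h) tuples in row-major order.
    """
    rects = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            rects.append((x, y, min(max_size, width - x), min(max_size, height - y)))
    return rects


# Hypothetical dimensions for the "4K x 2.6K" image; 2048 is the texture limit.
for rect in tile_rects(4096, 2600, 2048):
    print(rect)
```

With these assumed dimensions the image needs four textures, and the bottom two use only part of their 2K x 2K allocation.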

SLIDE 17

SLIDE 18

Muir Experimental Results

  • Demonstrated on-board, real-time image processing of data
    – Geo-rectification
    – Imagery stabilized by auto-correlation
    – Motion detection and object tracking
  • 15 hours of broad area geo-locked imagery
  • Stored 5 TB of raw imagery
    – Raw data enable us to repeat the flight in the lab and validate our capabilities
    – Full image collection is not necessary for operation
  • Ground station displays geo-registered imagery and object info
  • Image updates every 20 sec (depending on link bandwidth)
  • Blob information updates at camera frame rate
SLIDE 19

Demonstration

SLIDE 20

Conclusion

  • LLNL has several efforts aimed at incorporating graphics processors into real-world applications
  • GPU-based geo-registration algorithm was demonstrated successfully in the field
    – Real-time transformation of 44 Mpixels/sec, including jitter-removal, motion detection, and blob-finding using commodity hardware
    – Permitted real-time transmission of high-resolution ground imagery
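
The 44 Mpixels/sec figure is consistent with the twin 11 MPixel, 2 Hz cameras and the U16 pixel format quoted earlier in the deck. A short check of that arithmetic; the function name is illustrative, and 2 bytes per U16 pixel is the only added assumption:

```python
def pixel_rate(cameras: int, megapixels_per_frame: float, frames_per_second: float) -> float:
    """Aggregate pixel rate in MPixels/sec."""
    return cameras * megapixels_per_frame * frames_per_second


rate_mpixels = pixel_rate(cameras=2, megapixels_per_frame=11, frames_per_second=2)
rate_mbytes = rate_mpixels * 2  # U16 pixels occupy 2 bytes each
print(rate_mpixels, "MPixels/sec,", rate_mbytes, "MB/sec")
```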

SLIDE 21

Future Work

  • Ongoing efforts for remote video application
    – Map motion detection algorithms to GPU, if found to be practical
      • Fragment shader
    – Remove optical distortions via image warping
      • Vertex shader
    – Add support for digital elevation map (DEM)
      • Wrap texture to 3-D surface
    – PCI Express to reduce readback overhead

[Pipeline diagram blocks: Raw Data Collection, Geo-Rectify, Registration, Motion-Detect, Blob-Detect, Data Transmit / Archive, Geometry Correction]

SLIDE 22

Acknowledgements

  • NAI
    – Charles Bennett
    – Michael Carter
    – John Marion
    – Robert Sawvel
    – Sheila Vaidya
  • Computations
    – David Bremer
    – John Johnson
    – Holger Jones
    – Jeremy Meredith

SLIDE 23

Reserve Slides

SLIDE 24

CPU Comparison

Columns, left to right: Intel Pentium 4 | IBM PowerPC 970 | ATI Radeon 9700 | Sony Emotion Engine
Rows with fewer than four values had blank cells in the original table; their values are listed in left-to-right order.

Virtual address range: 32 bits | 64 bits | 64 bits
Real address range: 32 bits | 42 bits
Scalar datapath width: 32 bits | 64 bits | 32/128 bits | 128 bits
CPU cores per die: 1 | 1 | 1 | 1
Superscalar execution: 4 | 4 + 1 branch | 4 Vertex + 8 pixel | 2
Pipeline depth (int): 20 stages | 16 stages | 5 (Vertex)
Vector extensions: Yes | Yes | Yes
FPUs: 1 + SSE2 | 2 + AltiVec
L1 cache I/D (ways): 12K/8K | 64K/32K (DM) | - | 16K/8K + 16K Scratch
L2 cache: 512K | 512K | -
Core frequency (max): 3.0 GHz | 1.8 GHz | 325 MHz | 300 MHz
FSB frequency: 200 MHz | 450 MHz | 310 MHz | 150 MHz
FSB effective bit rate: 800 MHz | 900 MHz | 620 MHz
FSB width: 64 bits | 2 x 32 bits | 1 x 32 bits (AGP 8x), 4 x 64 bits (DDR) | 2 x 16 bit
FSB data bandwidth: 3.2 GB/s | 2 x 3.2 GB/s | 2 GB/s (AGP 8x), 19.8 GB/s (DDR) | 3.2 GB/s (RDRAM)
Transistors: 54 million | 52 million | 107 million | 10.5 million
IC process: 0.13 µm | 0.13 µm Cu + SOI | 0.15 µm | 0.25 µm
Die size: 131 mm² | 118 mm² | 240 mm²
Voltage (core): 1.5 V | 1.3 V (1.8 GHz) | 1.8 V
Power (typical): 82 W | 42 W @ 1.8 GHz | ~50 W | 13 W
Production: Shipping | Shipping | Shipping | Shipping
Floating-point performance: 6.2 GFLOPS

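SLIDE 25

Homogeneous Coordinates

  • Homogeneous coordinates are scale invariant:

$$ \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \equiv \begin{pmatrix} ax \\ ay \\ az \\ aw \end{pmatrix} $$

  • Represent scaled coordinates in 3-space:

$$ \underbrace{\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}}_{\text{Homogeneous}} \rightarrow \underbrace{\begin{pmatrix} x/w \\ y/w \\ z/w \end{pmatrix}}_{\text{3-D World}} $$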
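
Both properties can be checked numerically. A minimal sketch with an illustrative function name, assuming w is nonzero:

```python
def to_world(h):
    """Convert homogeneous (x, y, z, w) to 3-D world (x/w, y/w, z/w)."""
    x, y, z, w = h
    if w == 0:
        raise ValueError("w = 0 represents a point at infinity")
    return (x / w, y / w, z / w)


point = (2.0, 4.0, 6.0, 2.0)
scaled = tuple(3.0 * c for c in point)  # same point, scaled by a = 3

print(to_world(point))
print(to_world(scaled))
```

Both calls give the same 3-D point, which is the scale invariance stated above.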

SLIDE 26

Homogeneous Coordinates - cont.

  • A homogeneous affine transform:

$$ \begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix} = \begin{pmatrix} m_0 & m_4 & m_8 & m_{12} \\ m_1 & m_5 & m_9 & m_{13} \\ m_2 & m_6 & m_{10} & m_{14} \\ m_3 & m_7 & m_{11} & m_{15} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} $$

  • Basic means of all transform operators in higher-level 3-D languages
    – OpenGL
    – DirectX
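
The index layout m0 ... m15 runs down the columns, which matches the column-major storage OpenGL uses for its matrices. A small sketch of applying such a matrix to a homogeneous vector; the function names are illustrative:

```python
def transform(m, v):
    """Multiply a 4x4 matrix, stored as 16 values in column-major order
    (m[0..3] is the first column), by a 4-vector v."""
    return tuple(
        sum(m[col * 4 + row] * v[col] for col in range(4))
        for row in range(4)
    )


def translation(tx, ty, tz):
    """Column-major translation matrix: the offsets sit in m12, m13, m14."""
    return [
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        tx, ty, tz, 1,
    ]


print(transform(translation(10, 20, 30), (1, 2, 3, 1)))
```

Reading the flat list row by row instead of column by column would transpose the matrix, a common source of errors when moving matrices between APIs.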

SLIDE 27

Scheimpflug Checkerboard

  • Virtual overlap of focal plane arrays permits effective pixel counts much larger than commercially available
    – Multiple apertures (maximum required is four)
    – Use inexpensive COTS FPAs

[Figure: checkerboard of numbered FPA tiles grouped by aperture; c: 4, 6, 12, 14; a: 2, 8, 10; d: 5, 7, 13, 15; b: 1, 3, 9, 11]