Autonomous driving visual perception on the DRIVE PX2



SLIDE 1

http://adas.cvc.uab.es/elektra/

Autonomous driving visual perception on the DRIVE PX2

  • Dr. Antonio M. López
    www.cvc.uab.es/~antonio
    http://adas.cvc.uab.es/elektra/
  • Dr. Antonio Espinosa
    http://grupsderecerca.uab.cat/hpca4se/en/content/gpu

SLIDE 2

Our background: camera-based ADAS

SLIDE 3

Our current research:

SLIDE 4

Stereo Vision for Depth Computation

Disparity: distance between the same point in the left & right images

Higher disparity = objects are closer

10 meters
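The relation between disparity and distance can be made concrete with the standard pinhole stereo formula, depth = focal length × baseline / disparity. The sketch below is only an illustration: the function name and the camera values (1000 px focal length, 0.2 m baseline) are hypothetical and are not taken from the slides.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen with the given disparity (in pixels).

    Standard rectified-stereo relation: Z = f * B / d.
    A disparity of zero (or less) is treated as infinitely far away.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: f = 1000 px, B = 0.2 m (assumed values).
near = depth_from_disparity(40, 1000.0, 0.2)  # larger disparity
far = depth_from_disparity(20, 1000.0, 0.2)   # smaller disparity
print(near, far)  # the point with the larger disparity is the closer one
```

With these assumed values, halving the disparity doubles the estimated distance, which is why far pixels such as sky end up with a disparity close to zero.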

SLIDE 5

Optimize Stereo Matching: Semi-Global Matching (SGM)

Total Computation Work ∝ (Height × Width × MaxDisp × Path Directions)

Image Resolution    Computation (Max. Disparity: 256)
640 x 480           0.63 B ops.
1280 x 480          1.26 B ops.
1280 x 960          2.51 B ops.
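As a quick check of the work estimate, the product can be evaluated directly. The slide does not state the number of path directions used for these figures; the value 8 below is an assumption, chosen because it reproduces the listed numbers with a maximum disparity of 256. The function name is hypothetical.

```python
def sgm_ops(width, height, max_disp=256, path_directions=8):
    """Work estimate: Height x Width x MaxDisp x Path Directions.

    path_directions=8 is an assumption (not stated on the slide).
    """
    return height * width * max_disp * path_directions


for width, height in [(640, 480), (1280, 480), (1280, 960)]:
    print(f"{width} x {height}: {sgm_ops(width, height) / 1e9:.2f} B ops")
# 640 x 480: 0.63 B ops
# 1280 x 480: 1.26 B ops
# 1280 x 960: 2.52 B ops
```

The last value is 2.5166 B, which rounds to 2.52; the slide appears to truncate it to 2.51.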

[Figure: processing pipeline. Input: left and right images (HOST, CPU); matching cost and smoothed cost (DEVICE, GPU); output: disparity image (HOST, CPU).]
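To illustrate how a matching cost becomes a smoothed cost, here is a minimal NumPy sketch of the standard SGM path aggregation along a single direction (left to right). It follows the textbook recurrence and is not the authors' GPU kernel; the function name and the penalty values `p1` and `p2` are assumptions made for the example.

```python
import numpy as np


def aggregate_left_to_right(cost, p1=1.0, p2=8.0):
    """Smooth a matching cost volume along the left-to-right path.

    cost: array of shape (H, W, D) with the matching cost per pixel/disparity.
    p1:   penalty for a disparity change of +/-1 (assumed value).
    p2:   penalty for any larger disparity change (assumed value).
    """
    h, w, d = cost.shape
    agg = np.empty((h, w, d), dtype=float)
    agg[:, 0, :] = cost[:, 0, :]  # first column has no predecessor
    for x in range(1, w):  # serialized: column x depends on column x - 1
        prev = agg[:, x - 1, :]                      # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)   # (H, 1)
        padded = np.pad(prev, ((0, 0), (1, 1)), constant_values=np.inf)
        from_lower = padded[:, :-2] + p1             # previous cost at d - 1
        from_upper = padded[:, 2:] + p1              # previous cost at d + 1
        best = np.minimum(np.minimum(prev, from_lower),
                          np.minimum(from_upper, prev_min + p2))
        # Subtracting prev_min keeps the values from growing along the path.
        agg[:, x, :] = cost[:, x, :] + best - prev_min
    return agg


cost = np.array([[[0.0, 5.0, 5.0], [5.0, 5.0, 0.0]]])  # H=1, W=2, D=3
print(aggregate_left_to_right(cost)[0, 1])  # [5. 6. 5.]
```

In a full SGM pipeline the same recurrence is run for every path direction and the results are summed before picking the disparity with the lowest total cost. The loop over `x` shows the dependency along the path, while all rows and disparities inside one step can be processed in parallel.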

SLIDE 6

GPU Implementation: Proposal

[Figure: matching cost volume C (axes x, y, d) and path cost L, processed with the SGM stencil.]

  • First level parallelism (big granularity)
  • Second level parallelism (medium granularity)
  • Third level parallelism (fine-grained)
  • Dependency: serialized
  • Collaborative work

SLIDE 7

GPU Implementation: Performance & Energy Efficiency

  • Tegra X1 achieves real-time performance (20 FPS) with more than 2x the energy efficiency of the Titan X
  • Newer GPUs, with higher memory bandwidth, will achieve faster solutions

[Charts: Energy Efficiency; Tegra X1 performance (frames/second, fps) at 640x480, 1280x480 and 1280x960 with D = 128 and 2, 4 or 8 path directions, with the real-time threshold marked.]

SLIDE 8

GPU Implementation: Results

SLIDE 9

Stixel World: Compact representation of the world

  • Stereo of 1280 x 960 = 1,228,800 pixels => Too much data to process
  • Medium-level representation with only relevant information
  • Fixed width stixels, variable number of stixels per column
  • Stixel = Stick + Pixel

[Figure: Stereo + Horizon Line + Road Slope → Stixels. Panels: Stereo Images, Horizon, Slope. Stixel labels: Obj., Sky, Grnd.]

SLIDE 10

Stixel World: a Mid-level Image Representation

Find best configuration

  • Sky: far pixels, near 0 disparity
  • Object: constant disparity
  • Ground: close to expected model

  • Computed independently for each column
  • Enforces constraints: no sky below horizon, no neighboring objects at the same distance…
  • Combinatorial explosion (of possible segments): dynamic programming technique

Total Computation Work ∝ (Width × Height²)

Image Resolution    Computation
640 x 480           147 M ops.
1280 x 480          294 M ops.
1280 x 960          1179 M ops.
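The per-column dynamic programming idea can be sketched as follows. This is a deliberately simplified illustration, not the authors' formulation: the squared-error costs, the fixed penalty per segment, and all names (`segment_column`, `ground_model`, `horizon_row`, `seg_penalty`) are assumptions. Only the "no sky below horizon" constraint is modeled.

```python
import numpy as np

LABELS = ("sky", "object", "ground")


def segment_column(disp, ground_model, horizon_row, seg_penalty=1.0):
    """Split one disparity column (top to bottom) into labeled segments.

    Returns a list of (first_row, end_row_exclusive, label) tuples.
    """
    disp = np.asarray(disp, dtype=float)
    ground_model = np.asarray(ground_model, dtype=float)
    h = len(disp)
    # Prefix sums give every segment cost in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(disp)))
    s2 = np.concatenate(([0.0], np.cumsum(disp ** 2)))
    sg = np.concatenate(([0.0], np.cumsum((disp - ground_model) ** 2)))

    def cost(a, b, label):
        if label == "sky":      # disparity near 0, never below the horizon
            return s2[b] - s2[a] if b <= horizon_row else np.inf
        if label == "object":   # constant disparity: error around the mean
            total = s1[b] - s1[a]
            return (s2[b] - s2[a]) - total * total / (b - a)
        return sg[b] - sg[a]    # ground: close to the expected model

    best = np.full(h + 1, np.inf)  # best[b]: cheapest labeling of rows [0, b)
    best[0] = 0.0
    back = [None] * (h + 1)
    for b in range(1, h + 1):      # the (a, b) double loop is the Height² term
        for a in range(b):
            for label in LABELS:
                c = best[a] + cost(a, b, label) + seg_penalty
                if c < best[b]:
                    best[b], back[b] = c, (a, label)

    segments, b = [], h
    while b > 0:                   # walk the back-pointers from the bottom row
        a, label = back[b]
        segments.append((a, b, label))
        b = a
    return segments[::-1]


disp = [0, 0, 0, 10, 10, 10, 6, 8, 10, 12]
ground = [max(0, 2 * (v - 3)) for v in range(10)]  # hypothetical road model
print(segment_column(disp, ground, horizon_row=3))
# [(0, 3, 'sky'), (3, 6, 'object'), (6, 10, 'ground')]
```

Because each column is solved independently, the columns can be distributed across parallel workers, and the nested loops over segment start and end rows account for the Height² factor in the work estimate.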

SLIDE 11

GPU Implementation: Proposal

[Figure: stereo disparity image of height h, with one CTA per image column. Labels: first level parallelism (big granularity), second level parallelism (fine-grained).]

  • Independent / task level: typical CPU parallelization
  • Each image column is processed in parallel by a CTA
  • CTA = Cooperative Thread Array

SLIDE 12

GPU Implementation: Second Level Parallelism

[Figure: the threads of one CTA process the values C_i of a column in sequential steps (step 1, step 2, step 3, …).]

  • Extra parallelism level needed for efficient GPU use
  • Sequentially perform h (image height) steps
  • CTA threads collaborate sharing info each step
  • Decreasing Parallelism: Each step uses one thread less

Computational Analysis

Thread Parallelism          h × w
Compute Work per thread     h
Total Global Data Reads     h² × w
Total Global Data Stores    h × w
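The decreasing-parallelism schedule can be illustrated with a small count of active threads per step. This sketch assumes that the first step starts with h active threads per column (one per image row) and that exactly one thread drops out at each step; the function names are hypothetical and the code only counts work, it does not model the GPU.

```python
def active_threads_per_step(h):
    """Active threads in one CTA at each of the h sequential steps.

    Assumption: step 1 uses h threads and every later step uses one less.
    """
    return [h - step for step in range(h)]


def thread_steps(h, w):
    """Total (thread, step) pairs over all w columns: w * h * (h + 1) / 2."""
    return w * sum(active_threads_per_step(h))


print(active_threads_per_step(4))  # [4, 3, 2, 1]
print(thread_steps(480, 640))      # 73881600
```

Under these assumptions the total grows with h² × w, the same order as the global data reads in the analysis above, and on average about half of the threads in a CTA are active.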

SLIDE 13

GPU Implementation: Performance & Energy Efficiency

  • Real-time performance on an energy-efficient GPU: NVIDIA DRIVE PX
  • NVIDIA DRIVE PX has better energy efficiency than high-end GPUs

GPU Performance (frames per second)

Image Resolution    Tegra X1    Titan X
1280 x 240          86.8        1000
640 x 480           45.7        581.0
1280 x 480          22.3        373.0

Energy Efficiency (frames per second / Watt)

Image Resolution    Tegra X1    Titan X
1280 x 240          8.68        4.00
640 x 480           4.57        2.32
1280 x 480          2.23        1.49
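The performance and efficiency figures on this slide can be cross-checked with a few lines of arithmetic. The pairing of each value with a GPU and a resolution is an assumption read off the order in which the chart labels appear; the dictionary layout and variable names are hypothetical.

```python
# (frames per second, frames per second per Watt), as read from the charts.
results = {
    "1280 x 240": {"Tegra X1": (86.8, 8.68), "Titan X": (1000.0, 4.00)},
    "640 x 480": {"Tegra X1": (45.7, 4.57), "Titan X": (581.0, 2.32)},
    "1280 x 480": {"Tegra X1": (22.3, 2.23), "Titan X": (373.0, 1.49)},
}

efficiency_ratio = {}
implied_power_w = {}
for resolution, gpus in results.items():
    tegra_fps, tegra_eff = gpus["Tegra X1"]
    titan_fps, titan_eff = gpus["Titan X"]
    # How many times more frames per Watt the Tegra X1 delivers.
    efficiency_ratio[resolution] = tegra_eff / titan_eff
    # Power implied by the two charts: fps divided by fps-per-Watt.
    implied_power_w[resolution] = (tegra_fps / tegra_eff, titan_fps / titan_eff)
    print(f"{resolution}: Tegra X1 is {efficiency_ratio[resolution]:.2f}x "
          f"as efficient as Titan X")
```

With this pairing, the two charts are consistent with a power draw of about 10 W for the Tegra X1 and about 250 W for the Titan X, and the efficiency advantage of the Tegra X1 shrinks as the image gets larger.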

SLIDE 14

GPU Implementation: Result (using SYNTHIA)

SLIDE 15

Image generator to acquire thousands of images with precise ground truth of several kinds.

  • RGB & per pixel: depth, semantic class, instance ID, optical flow
  • Covering popular Cityscapes classes (see www.synthia-dataset.net)

What is this for? Teaching cars to perceive the environment using machine learning techniques (e.g. deep learning, reinforcement learning).

SLIDE 16