GPU-Based Deep Learning in Cloud and Embedded Systems



SLIDE 1

Confidential and Proprietary

GPU-BASED DEEP LEARNING IN CLOUD AND EMBEDDED SYSTEMS

FREDERICK SOO, CTO

April 4, 2016

SLIDE 2

Nauto is launching a connected camera for professional drivers

  • Drive more than most consumers
  • Exposed to passenger and driver liability
  • Driver quality unknown - small number of very bad drivers

SLIDE 3

Massive shift in transportation due to synergistic technologies

Autonomous, Connected, Electric, Shared

  • 90% reduction in accidents
  • $0.08 / mile
  • 85% efficient drivetrain
  • 50-70% utilization
  • Fleet optimization

SLIDE 4

Why use deep learning?

  • Good at visual tasks
  • Scalable
  • Deployable

Most important for NAUTO

SLIDE 5

Small brains have a lot of functionality

Neurons        Power
26 billion     20 watts
1 million      1 mW
10 million     10 mW
100 million    100 mW
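
The power figures on this slide are roughly what one gets by scaling the 20-watt, 26-billion-neuron budget down in proportion to neuron count. The sketch below works through that arithmetic. The linear-scaling rule is an assumption made here for illustration (the slide does not state how its numbers were derived), and `estimated_power_mw` is a hypothetical helper name.

```python
# Assumption: power scales linearly with neuron count.
# The slide does not state this rule; it is used here only for illustration.
REFERENCE_NEURONS = 26e9  # neuron count quoted on the slide
REFERENCE_WATTS = 20.0    # power budget quoted on the slide


def estimated_power_mw(neurons: float) -> float:
    """Scale the reference power budget linearly to `neurons`, in milliwatts."""
    watts_per_neuron = REFERENCE_WATTS / REFERENCE_NEURONS
    return neurons * watts_per_neuron * 1000.0


if __name__ == "__main__":
    for neurons in (1e6, 1e7, 1e8):
        print(f"{neurons:.0e} neurons -> {estimated_power_mw(neurons):.2f} mW")
```

Under this assumption one million neurons comes to a little under 1 mW, the same order of magnitude as the rounded figures on the slide.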

SLIDE 6

Required performance depends on use case

SLIDE 7

Small changes in F1 with size

  • Large networks can be used in later stages of cascade
  • Order of magnitude improvements in speed with basic exploration
  • Always worth measuring performance/size tradeoff
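
Measuring the performance/size tradeoff can be as simple as computing F1 for each candidate network and picking the smallest one whose F1 stays within a chosen tolerance of the best. The following is a minimal sketch of that idea, not the method used in the talk. The `Candidate` type, the `smallest_within_tolerance` helper, and all counts in the usage example are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One trained network and its confusion counts on a held-out set."""
    name: str
    params_millions: float  # model size in millions of parameters
    tp: int                 # true positives
    fp: int                 # false positives
    fn: int                 # false negatives


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def smallest_within_tolerance(candidates: List[Candidate],
                              tolerance: float) -> Candidate:
    """Return the smallest model whose F1 is within `tolerance` of the best F1."""
    scores = {c.name: f1_score(c.tp, c.fp, c.fn) for c in candidates}
    best = max(scores.values())
    acceptable = [c for c in candidates if best - scores[c.name] <= tolerance]
    return min(acceptable, key=lambda c: c.params_millions)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the code.
    candidates = [
        Candidate("large", 60.0, tp=90, fp=10, fn=10),
        Candidate("medium", 6.0, tp=89, fp=11, fn=11),
        Candidate("small", 0.6, tp=80, fp=20, fn=20),
    ]
    choice = smallest_within_tolerance(candidates, tolerance=0.02)
    print(choice.name, choice.params_millions)
```

The tolerance encodes the use case: a stage early in a cascade may accept a larger F1 drop in exchange for speed than a final stage would.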

SLIDE 8

Test your chipsets - algorithm speed important but not entire story

[Chart: Nauto CNN forward pass (msec) on embedded SoCs A to E; axis ticks at 30, 60, 90, 120, 150]

  • Chipsets released in 2014, 2015 and 2016
  • Pricing varying from $25 to $60+
  • Varying degrees of HW/SW support

SLIDE 9

Algorithm is not the bottleneck

Step                       Time
Image processing           30 msec
Conversion to CNN space    30 msec
CNN forward pass           … msec
Other steps                15 msec
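
A per-step breakdown like this one can be obtained by timing each stage of the pipeline separately instead of timing only the CNN. The sketch below shows one way to do that with the Python standard library. The stage functions are hypothetical placeholders, not Nauto's pipeline, and `profile_pipeline` and `bottleneck` are names introduced here for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def profile_pipeline(stages: List[Stage], frame: Any,
                     runs: int = 10) -> Dict[str, float]:
    """Run the pipeline `runs` times; return mean wall-clock msec per stage."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(runs):
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)  # output of one stage feeds the next
            totals[name] += (time.perf_counter() - start) * 1000.0
    return {name: total / runs for name, total in totals.items()}


def bottleneck(timings: Dict[str, float]) -> str:
    """Name of the stage with the largest mean time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    # Hypothetical placeholder stages standing in for the real steps.
    stages = [
        ("image_processing", lambda x: [v * 2 for v in x]),
        ("conversion_to_cnn_space", lambda x: [float(v) for v in x]),
        ("cnn_forward_pass", lambda x: sum(x)),
        ("other_steps", lambda x: x),
    ]
    timings = profile_pipeline(stages, list(range(1000)))
    for name, msec in timings.items():
        print(f"{name}: {msec:.3f} msec")
    print("slowest stage:", bottleneck(timings))
```

Measuring every stage on the target chipset, not just the forward pass, shows where optimization effort would actually pay off.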

SLIDE 10

Entire system must be optimized

            Collect data   Label    Train    Deploy
Pre-GPU     years          months   months   months/years

SLIDE 11

Entire system must be optimized

            Collect data   Label    Train    Deploy
Post-GPU    weeks          months   months   months/years
Pre-GPU     years          months   months   months/years

SLIDE 12

Entire system must be optimized

                  Collect data   Label    Train    Deploy
Post-GPU          weeks          months   months   months/years
Nauto prototype   days           weeks    weeks    weeks
Pre-GPU           years          months   months   months/years

SLIDE 13

Entire system must be optimized

                  Collect data   Label    Train    Deploy
Post-GPU          weeks          months   months   months/years
Nauto prototype   days           weeks    weeks    weeks
Pre-GPU           years          months   months   months/years
Nauto at-scale    ?              ?        ?        ?

SLIDE 14

Easy to think of optimization; hard to think of system

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Donald Knuth

SLIDE 15

Lessons

  • Embedded pipeline as important as raw CNN performance
  • Match algorithm performance to use case
  • Overall system performance (data acquisition, labeling, training) is where big progress is to be made

SLIDE 16

The future is in distributed awareness

Real world search

SLIDE 17

Team

  • Ludmila Levkova
  • Nikhil Deshmukh
  • Joe Virzi
  • Jonathan Soo