A SINGLE NEURAL NETWORK FORWARD PROPAGATION DETECTOR MINYOUNG KIM - - PowerPoint PPT Presentation



SLIDE 1

A SINGLE NEURAL NETWORK FORWARD PROPAGATION DETECTOR

MINYOUNG KIM

MINYOUNG.KIM@US.PANASONIC.COM PANASONIC SILICON VALLEY LABORATORY

SLIDE 2

PANASONIC SILICON VALLEY LABORATORY

  • Location: Cupertino
  • Lab size: ~100 people
  • Team: ~20
  • Activities
      • Deep Learning
      • Robotics
      • ADAS (Advanced Driver Assistance Systems)
      • Drones
      • Collaboration with universities around the world

SLIDE 3

OBJECT DETECTION WITH DEEP LEARNING

Pros.
  • High performance
      • Beats state-of-the-art records in many tasks, including image classification and detection 1)

Cons.
  • Needs a large dataset
  • High computational power
  • Deep neural networks with millions of parameters
  • Slower running time than most conventional algorithms

SLIDE 4

OBJECT DETECTION SYSTEM

Building Object Detection System
  • Training Deep Neural Network for Classification
      • Pedestrian detection: binary classification
  • Object Proposal Generation at different scales
      • Generate box proposals (1000 ~ 2000 boxes)
          • Selective Search 2), Edge Boxes 3), etc.
  • Merge largely overlapping boxes
      • Non-Maximum Suppression

[Figure: detection pipeline. Proposal Generation -> Run Recognizer (Recognition Network; classification: Pedestrian / Background) -> Merge boxes]
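The "merge largely overlapping boxes" step is commonly done with greedy Non-Maximum Suppression. The sketch below is a minimal illustration, not the presenter's implementation: the function names, the sample boxes and scores, and the 0.5 overlap threshold are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    The 0.5 default is an assumption, not a value from the slides."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: two near-duplicates of one pedestrian, one separate box.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (200, 40, 240, 120)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The lower-scoring duplicate is suppressed and the separate box survives, so one box remains per pedestrian.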

SLIDE 5

OBJECT DETECTION SYSTEM

Proposal Generation & Scaling
  • Region proposal
      • Selective Search: 2 seconds per image (CPU)
          • Order of magnitude slower
      • Edge Boxes: 0.2 seconds per image
  • Scaling
      • Multiple forward propagations
  • Bottleneck
      • A forward propagation of an image
          • Less than 0.1 seconds (GPU)

[Figure: Proposal Generation and Scaling stages, both marked "Time Consuming!"]
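The per-image times above translate directly into frame-rate ceilings. The sketch below only does that arithmetic; it assumes the stages run one after another with no overlap, and the function name is made up for this example.

```python
def fps_ceiling(seconds_per_image):
    """Best possible frame rate if a stage alone takes this long per image."""
    return 1.0 / seconds_per_image


# Times quoted on the slide (seconds per image).
selective_search_fps = fps_ceiling(2.0)  # proposal step alone caps the system
edge_boxes_fps = fps_ceiling(0.2)
# The forward pass takes less than 0.1 s, so on its own it allows more than this.
forward_only_fps = fps_ceiling(0.1)

print(f"Selective Search ceiling: {selective_search_fps:.1f} fps")
print(f"Edge Boxes ceiling:       {edge_boxes_fps:.1f} fps")
print(f"Forward pass alone:       above {forward_only_fps:.1f} fps")
```

Under these assumptions the proposal step, not the network, limits throughput, which is the motivation for removing it.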

SLIDE 6

OUR PEDESTRIAN DETECTION SYSTEM

Purpose
  • Speed up
      • Remove the proposal generation step to make the system faster
      • Speed is one of the most important elements in ADAS (Advanced Driver Assistance Systems) applications (practical applications)
  • Build a scale-invariant system
      • No need to process multiple scaled images to detect pedestrians of different sizes in an image

PSVL Pedestrian Detection System

[Figure: INPUT -> PSVL Neural Detector (a single forward propagation) -> OUTPUT]

SLIDE 7

OUR PEDESTRIAN DETECTION SYSTEM

  • Recognition Network
  • Fully Convolutional Network as Detector
  • Add Regression Layer and Fine-tune
  • Detection by a single forward propagation

SLIDE 8

RECOGNITION NETWORK

Train DNN for recognition
  • GPU
      • NVIDIA Titan X, NVIDIA Tesla K80
  • Framework
      • Caffe 5) (deep learning framework by the BVLC 6))
  • Network Architectures
      • Layers
          • Modified GoogLeNet 7)
          • 25~30 convolutional layers
      • Input
          • Patches of pedestrians and backgrounds (80x32)
      • Output
          • Sigmoid or Softmax
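For a binary pedestrian/background classifier the two output choices are closely related: a two-class softmax gives the same pedestrian probability as a sigmoid applied to the difference of the two logits. The sketch below checks this numerically; the logit values are hypothetical and the helper names are chosen for this example only.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw network outputs for one 80x32 patch.
pedestrian_logit, background_logit = 1.3, -0.4

p_softmax = softmax([pedestrian_logit, background_logit])[0]
p_sigmoid = sigmoid(pedestrian_logit - background_logit)
print(p_softmax, p_sigmoid)
```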

SLIDE 9

RECOGNITION NETWORK

Train DNN for recognition (Cont'd.)
  • Dataset: Caltech Pedestrian Detection Benchmark 4)
      • Approximately 10 hours of 640(w) x 480(h) 30 Hz video
      • Regular traffic in an urban environment
      • About 250,000 frames with a total of 350,000 bounding boxes

Training Set

  • Mean Height: 64 pixels
  • Mean Width: 24 pixels

Testing Set

  • Mean Height: 52 pixels
  • Mean Width: 22 pixels

We choose…

  • 80(h) x 32(w) pixels
  • 0.4 Aspect Ratio
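The chosen 80 x 32 patch can be compared with the dataset statistics above by computing width-to-height ratios. This is only a rough check: it uses the ratio of the mean sizes, which is not the same as the mean of per-box ratios, and the function name is made up for the example.

```python
def aspect_ratio(width, height):
    """Width divided by height, the convention behind the 0.4 on the slide."""
    return width / height


train_ratio = aspect_ratio(24, 64)  # training set mean width / mean height
test_ratio = aspect_ratio(22, 52)   # testing set mean width / mean height
patch_ratio = aspect_ratio(32, 80)  # chosen patch, 80(h) x 32(w)

print(f"train {train_ratio:.3f}, test {test_ratio:.3f}, patch {patch_ratio:.3f}")
```

The patch ratio falls between the two set-level ratios, which is consistent with the choice of 0.4.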
SLIDE 10

FULLY CONVOLUTIONAL NETWORK

Convert recognition network to a fully convolutional network

[Figure: Base Network (fully connected; limited input size) vs. Fully Convolutional Network (convolutional; kernel sliding; input size not limited)]
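The conversion can be illustrated on a toy single-channel case: the weights of a fully connected unit trained on a fixed-size patch are reused as a convolution kernel and slid over a larger image, giving one score per window position. This is a sketch of the general idea under simplifying assumptions (one channel, one unit, made-up weights), not the network from the talk.

```python
def fc_score(patch, weights, bias):
    """Fully connected unit: a single score for a patch of exactly the kernel size."""
    return bias + sum(
        p * w
        for patch_row, weight_row in zip(patch, weights)
        for p, w in zip(patch_row, weight_row)
    )


def conv_scores(image, weights, bias, stride=1):
    """Same weights used as a sliding kernel: one score per window position."""
    kh, kw = len(weights), len(weights[0])
    out = []
    for top in range(0, len(image) - kh + 1, stride):
        row = []
        for left in range(0, len(image[0]) - kw + 1, stride):
            window = [r[left:left + kw] for r in image[top:top + kh]]
            row.append(fc_score(window, weights, bias))
        out.append(row)
    return out


# Hypothetical 2x2 "trained" weights and inputs.
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = 0.1
patch = [[1.0, 2.0], [3.0, 4.0]]                               # training-size input
image = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]    # larger input

single = conv_scores(patch, weights, bias)     # 1x1 map, same as the FC output
score_map = conv_scores(image, weights, bias)  # 2x2 map of window scores
print(single, score_map)
```

On a patch of the training size the convolutional form returns exactly the fully connected score; on a larger image it returns a map of scores without any change to the weights.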

SLIDE 11

FULLY CONVOLUTIONAL NETWORK

Regression Layer
  • Ground truth data
      • Nx4 box coordinates data
          • N: feature map resolution (NX x NY)
          • Original GT box: B = [x1, y1, x2, y2]
          • New GT box: B' = B / m
          • m: multiplier of window size (120 x 120)

[Figure: output feature map of resolution NX x NY with 4 box-coordinate values per position; example labels 240 and 120, m = 2]
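The ground-truth rescaling B' = B / m is a per-coordinate division. The sketch below applies it with m = 2, the value shown in the figure; the function name and the sample box are hypothetical.

```python
def rescale_gt_box(box, m):
    """B' = B / m for a ground-truth box B = [x1, y1, x2, y2]."""
    return [coord / m for coord in box]


gt_box = [100, 60, 132, 140]        # hypothetical box in original image pixels
new_gt_box = rescale_gt_box(gt_box, m=2)
print(new_gt_box)
```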

SLIDE 12

FULLY CONVOLUTIONAL NETWORK

Training detector network
  • Network Architectures
  • Custom loss functions
      • Feature Map: Cross Entropy Loss with Boosting
          • Boosting
              • Ped: Correct Results (TPs) + Ground Truths (FNs)
                  • True Positive if IOU > 0.5
                  • False Negative if a ground truth is not detected
              • NonPed: FPs
                  • False Positive if IOU < 0.5
      • Regression: Euclidean Loss with Feature Map Data incorporated

[Figure: 640x480 original images -> Fully Convolutional Network + Regression Layer -> Feature Map, Box Coordinates]
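The TP / FN / FP rules above can be sketched as a small labelling routine. This is an illustration under stated assumptions, not the training code from the talk: each detection is matched to its best-overlapping ground truth, an IoU of exactly 0.5 is treated as a false positive (the slide leaves that case open), and all names and boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truths, iou_threshold=0.5):
    """Return (true_positives, false_positives, false_negatives)."""
    true_pos, false_pos, matched = [], [], set()
    for det in detections:
        overlaps = [iou(det, gt) for gt in ground_truths]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] > iou_threshold:
            true_pos.append(det)       # TP: IoU > 0.5 with a ground truth
            matched.add(best)
        else:
            false_pos.append(det)      # FP: no ground truth overlaps enough
    # FN: ground truths that no detection matched
    false_neg = [gt for i, gt in enumerate(ground_truths) if i not in matched]
    return true_pos, false_pos, false_neg


ground_truths = [(10, 10, 50, 90), (200, 40, 240, 120)]
detections = [(12, 12, 52, 92), (300, 300, 340, 380)]
tp, fp, fn = label_detections(detections, ground_truths)

ped_samples = tp + fn     # "Ped" set: correct results plus missed ground truths
nonped_samples = fp       # "NonPed" set: false positives
print(len(ped_samples), len(nonped_samples))
```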

SLIDE 13

PERFORMANCE – VERY FAST WITH COMPETITIVE ACCURACY

Performance of Pedestrian Detection Methods (Accuracy vs. Speed)
  • From the DeepCascade paper 8)
  • DeepCascade: NVIDIA K20
      • 15 fps
  • Ours: NVIDIA GTX770
      • 34 fps
      • Speed adjustment
          • 34 * 0.9699 = 33 fps (adjustment factor from 9))
  • Ours: NVIDIA Titan X
      • 51.422 fps w/o cuDNN
      • 85.565 fps with cuDNN4

[Figure: accuracy vs. speed plot of pedestrian detection methods, with PSVL ND marked; axis annotations "Faster" and "More accurate"]

(*): Left-hand side for methods with unknown fps or less than 0.2 fps
(**): DeepCascade without extra data
(***): SpatialPooling+/Katamari methods use additional motion information
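The speed figures on this slide can be checked with a few lines of arithmetic. The numbers below are the ones quoted on the slide; the variable names are made up for the example, and the derived ratios are not stated in the talk.

```python
GTX770_FPS = 34.0
ADJUSTMENT_FACTOR = 0.9699   # factor quoted on the slide, attributed to 9)
DEEPCASCADE_FPS = 15.0       # DeepCascade on an NVIDIA K20, from 8)
TITAN_X_FPS_NO_CUDNN = 51.422
TITAN_X_FPS_CUDNN = 85.565

adjusted_fps = GTX770_FPS * ADJUSTMENT_FACTOR
ratio_vs_deepcascade = adjusted_fps / DEEPCASCADE_FPS
cudnn_gain = TITAN_X_FPS_CUDNN / TITAN_X_FPS_NO_CUDNN

print(f"adjusted GTX770 speed: {adjusted_fps:.2f} fps")
print(f"ratio to DeepCascade:  {ratio_vs_deepcascade:.2f}x")
print(f"gain from cuDNN:       {cudnn_gain:.2f}x")
```

The adjusted speed rounds to the 33 fps shown on the slide.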

SLIDE 14

ND ON PORTABLE DEVICE

Deploy PSVL ND on Google Nexus 9
  • Hardware Specification
      • 8MP rear camera, 1.6MP front camera
      • Processor
          • NVIDIA Tegra K1
          • GPU: NVIDIA Kepler with 192 CUDA cores
  • Software
      • Android application
      • Adjustable threshold bars
          • Probability and NMS Threshold Bar
  • Speed
      • Base resolution (600x390): 5 fps
      • Lower resolution (280x240): 16 fps
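The two Nexus 9 speed figures can be related to the amount of image data processed. The sketch below only compares pixel counts and frame rates from the slide; the variable names and the comparison itself are added for illustration and are not part of the talk.

```python
base_w, base_h, base_fps = 600, 390, 5.0
low_w, low_h, low_fps = 280, 240, 16.0

pixel_ratio = (base_w * base_h) / (low_w * low_h)   # how many times fewer pixels
fps_ratio = low_fps / base_fps                      # how many times faster

base_pixels_per_second = base_w * base_h * base_fps
low_pixels_per_second = low_w * low_h * low_fps

print(f"pixel ratio {pixel_ratio:.2f}, fps ratio {fps_ratio:.2f}")
print(f"pixels/s: base {base_pixels_per_second:.0f}, low {low_pixels_per_second:.0f}")
```

The frame rate rises roughly in proportion to the reduction in pixels, which suggests the cost of the single forward propagation scales approximately with input size on this device.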

SLIDE 15

ND APPLICATION

[Figure: application screenshot with the following labels]
  • Threshold Information
  • Probability and NMS Threshold Bar
  • Detection box with Probability
  • Toggle for Threshold Bar

SLIDE 16

ND APPLICATION DEMO (NEXUS 9)

SLIDE 17

ND APPLICATION DEMO (NEXUS 9)

SLIDE 18

ND APPLICATION DEMO (LAPTOP WITH GTX970M)

SLIDE 19

ND APPLICATION DEMO (CLUSTER WITH TITAN X)

SLIDE 20

SUMMARY & CONCLUSION

PSVL Neural Detector supports…
  • End-to-end pedestrian detection with a single forward propagation of the neural network
  • Very high speeds with competitive accuracy
  • Real-time operation even when deployed in embedded systems

PSVL Neural Detector can be used for…
  • An extended system for multiple-object detection in road conditions
      • Pedestrian, Car, Bus, Truck, Bicycle, Traffic Sign, etc.
  • Scalable
      • With a bit of extra computational power needed

SLIDE 21

THANK YOU!

SLIDE 22

REFERENCES

1) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541–551, Dec. 1989.
2) J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013.
3) C. Lawrence Zitnick and Piotr Dollár, Microsoft Research.
4) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
5) http://caffe.berkeleyvision.org/
6) Berkeley Vision and Learning Center (http://bvlc.eecs.berkeley.edu/)
7) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
8) A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, D. Ferguson. Real-Time Pedestrian Detection With Deep Network Cascades.
9) http://caffe.berkeleyvision.org/performance_hardware.html