a single neural network forward propagation detector

A SINGLE NEURAL NETWORK FORWARD PROPAGATION DETECTOR, MINYOUNG KIM



  1. A SINGLE NEURAL NETWORK FORWARD PROPAGATION DETECTOR MINYOUNG KIM MINYOUNG.KIM@US.PANASONIC.COM PANASONIC SILICON VALLEY LABORATORY

  2. PANASONIC SILICON VALLEY LABORATORY
     - Location: Cupertino
     - Lab size: ~100 people
     - Team: ~20
     - Deep Learning, Robotics, ADAS (Advanced Driver Assistance Systems), Drones
     - Collaboration with universities around the world

  3. OBJECT DETECTION WITH DEEP LEARNING
     Pros:
     - High performance
       - Beats state-of-the-art records in many tasks, including image classification and detection 1)
     Cons:
     - Requires a large dataset
     - Requires high computational power
       - Deep neural networks have millions of parameters
       - Slower running time than most conventional algorithms

  4. OBJECT DETECTION SYSTEM
     Pipeline (diagram): Proposal Generation → Run Recognizer (Recognition Network: Pedestrian / Background classification) → Merge boxes
     Building an object detection system:
     - Train a deep neural network for classification
       - Pedestrian detection: binary classification
     - Generate object proposals at different scales
       - Generate box proposals (1000 ~ 2000 boxes)
       - Selective Search 2), Edge Boxes 3), etc.
     - Merge largely overlapping boxes
       - Non-Maximum Suppression
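The "merge largely overlapping boxes" step on slide 4 can be illustrated with a minimal greedy Non-Maximum Suppression sketch. This is not the presenter's implementation: the function names, the (x1, y1, x2, y2) box layout, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the
    threshold. Returns the indices of the kept boxes.
    The 0.5 default is an assumed value, not one stated on the slides."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes around one pedestrian, plus a separate box.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (200, 40, 240, 120)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.86, so only the first and third boxes survive.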

  5. OBJECT DETECTION SYSTEM
     Proposal Generation & Scaling
     - Region proposal
       - Selective Search: 2 seconds per image (CPU)
         - an order of magnitude slower
       - Edge Boxes: 0.2 seconds per image
     - Scaling
       - multiple forward propagations
       - a forward propagation of an image takes less than 0.1 seconds (GPU)
     Diagram callout: proposal generation and scaling are time consuming and form the bottleneck

  6. OUR PEDESTRIAN DETECTION SYSTEM
     Purpose
     - Speed up
       - Remove the proposal generation step to make the system faster
       - Speed is one of the most important elements in ADAS (Advanced Driver Assistance Systems) applications (practical applications)
     - Build a scale-invariant system
       - No need to process multiple scaled images to detect pedestrians of different sizes in an image
     PSVL Pedestrian Detection System (diagram): INPUT → PSVL Neural Detector (a single forward propagation) → OUTPUT

  7. OUR PEDESTRIAN DETECTION SYSTEM
     Diagram: Recognition Network → Fully Convolutional Network as Detector → Add Regression Layer and Finetune → Detection by a single forward propagation

  8. RECOGNITION NETWORK
     Train DNN for recognition
     - GPU
       - NVIDIA Titan X, NVIDIA Tesla K80
     - Framework
       - Caffe 5) (deep learning framework by the BVLC 6))
     - Network architecture
       - Layers
         - Modified GoogLeNet 7)
         - 25~30 convolutional layers
       - Input
         - Patches of pedestrians and backgrounds (80x32)
       - Output
         - Sigmoid or Softmax

  9. RECOGNITION NETWORK
     Train DNN for recognition (Cont'd.)
     - Dataset: Caltech Pedestrian Detection Benchmark 4)
       - Approximately 10 hours of 640(w) x 480(h) 30Hz video
       - Regular traffic in an urban environment
       - About 250,000 frames with a total of 350,000 bounding boxes
     - Patch size we choose: 80(h) x 32(w) pixels, 0.4 aspect ratio
     - Training set: mean height 64 pixels, mean width 24 pixels
     - Testing set: mean height 52 pixels, mean width 22 pixels

  10. FULLY CONVOLUTIONAL NETWORK
      Convert recognition network to a fully convolutional network
      - Base network: fully connected layers, limited input size
      - Fully convolutional network: convolutional kernel sliding, input size not limited
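The conversion on slide 10 rests on the fact that a fully connected unit over a fixed-size patch is the same dot product as one convolution kernel applied at a single position. The toy sketch below (pure Python, one unit, stride 1, no padding) is an illustration under those assumptions and does not reproduce the modified GoogLeNet from the slides; all names and numbers are hypothetical.

```python
def fc_score(patch, weights, bias):
    """Fully connected unit: weighted sum over a patch of fixed size."""
    return bias + sum(
        p * w for prow, wrow in zip(patch, weights) for p, w in zip(prow, wrow)
    )


def conv_score_map(image, weights, bias):
    """Reuse the same weights as a sliding kernel (stride 1, no padding),
    so the input may be any size at least as large as the kernel."""
    kh, kw = len(weights), len(weights[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            fc_score([row[x:x + kw] for row in image[y:y + kh]], weights, bias)
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


weights = [[1.0, -1.0], [0.5, 2.0]]  # toy 2x2 "fully connected" weights
bias = 0.1
patch = [[1.0, 2.0], [3.0, 4.0]]     # the fixed input size the unit was trained on
image = [[1.0, 2.0, 0.0],            # a larger image whose top-left 2x2 is `patch`
         [3.0, 4.0, 1.0],
         [0.0, 1.0, 2.0]]

score_map = conv_score_map(image, weights, bias)
print(fc_score(patch, weights, bias), score_map)
```

On the 2x2 patch the unit yields one score; on the 3x3 image the same weights yield a 2x2 score map whose top-left entry equals that score, which is the sense in which the input size is no longer limited.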

  11. FULLY CONVOLUTIONAL NETWORK
      Regression Layer
      - Ground truth data
        - N x 4 box coordinates data
        - N: feature map resolution (N_X x N_Y)
        - Original GT box: B = [x_1, y_1, x_2, y_2]
        - New GT box: B' = B / m
        - m: multiplier of window size (120 x 120)
      Figure: output feature map of size N_X x N_Y with N x 4 box coordinates; example window sizes 120 and 240, where m = 2
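The ground-truth rescaling B' = B / m on slide 11 can be written out as a short sketch. Reading m as the ratio of a window size to the 120 x 120 base window (so that a 240 window gives m = 2) is an interpretation of the slide's figure, and the function names and sample box are hypothetical.

```python
def window_multiplier(window_size, base_window=120):
    """Multiplier m of the base window size (assumed: m = window / 120)."""
    return window_size / base_window


def scale_gt_box(box, m):
    """Map a ground-truth box B = [x1, y1, x2, y2] to B' = B / m."""
    return [coord / m for coord in box]


m = window_multiplier(240)          # the slide's example: 240 vs. 120
original_box = [100, 60, 148, 180]  # hypothetical ground-truth box in pixels
new_box = scale_gt_box(original_box, m)
print(m, new_box)
```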

  12. FULLY CONVOLUTIONAL NETWORK
      Training detector network
      - Network architecture
        - Diagram: 640x480 original images → Fully Convolutional Network + Regression Layer → Feature Map, Box Coordinates
      - Custom loss functions
        - Feature map: Cross Entropy Loss with Boosting
          - Boosting
            - Ped: correct results (TPs) + ground truths (FNs)
              - True Positive if IOU > 0.5
              - False Negative if ground truths not detected
            - NonPed: FPs
              - False Positive if IOU < 0.5
        - Regression: Euclidean Loss with feature map data incorporated
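The boosting rule on slide 12 sorts detector outputs into pedestrian samples (TPs plus undetected ground truths) and non-pedestrian samples (FPs) using IoU against the ground truth. The sketch below shows one way such labelling could work; the best-match strategy, the function names, and the handling of an IoU of exactly 0.5 (treated as a false positive here) are assumptions, since the slide does not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truths, iou_threshold=0.5):
    """Return (true_pos, false_pos, false_neg).
    A detection is a TP if its best IoU with any ground truth exceeds the
    threshold, otherwise an FP; ground truths matched by no TP are FNs."""
    true_pos, false_pos, matched = [], [], set()
    for det in detections:
        best_iou, best_gt = 0.0, None
        for g, gt in enumerate(ground_truths):
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, g
        if best_iou > iou_threshold:
            true_pos.append(det)
            matched.add(best_gt)
        else:
            false_pos.append(det)
    false_neg = [gt for g, gt in enumerate(ground_truths) if g not in matched]
    return true_pos, false_pos, false_neg


detections = [(10, 10, 50, 90), (300, 300, 340, 380)]
ground_truths = [(12, 12, 52, 92), (100, 100, 140, 180)]
tp, fp, fn = label_detections(detections, ground_truths)

ped_samples = tp + fn   # "Ped": correct results (TPs) + ground truths (FNs)
nonped_samples = fp     # "NonPed": FPs
print(ped_samples, nonped_samples)
```

Here the first detection matches the first ground truth, the second detection overlaps nothing and becomes a non-pedestrian sample, and the undetected second ground truth is added back as a pedestrian sample.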

  13. PERFORMANCE – VERY FAST WITH COMPETITIVE ACCURACY
      Plot: Performance of Pedestrian Detection Methods (Accuracy vs. Speed), from the DeepCascade paper 8); annotations: "Faster", "More accurate", "PSVL ND"
      - DeepCascade: NVIDIA K20
        - 15 fps
      - Ours: NVIDIA GTX770
        - 34 fps
        - Speed adjustment: 34 * 0.9699 9) = 33 fps
      - Ours: NVIDIA Titan X
        - 51.422 fps w/o cuDNN
        - 85.565 fps with cuDNN4
      Plot notes:
      (*): Left-hand side for methods with unknown fps or less than 0.2 fps
      (**): DeepCascade without extra data
      (***): SpatialPooling+/Katamari methods use additional motion information

  14. ND ON PORTABLE DEVICE
      Deploy PSVL ND on Google Nexus 9
      - Hardware specification
        - 8MP rear camera, 1.6MP front camera
        - Processor: NVIDIA Tegra K1
        - GPU: NVIDIA Kepler with 192 CUDA cores
      - Software
        - Android application
        - Adjustable threshold bars: probability and NMS threshold bars
      - Speed
        - base resolution (600x390): 5 fps
        - lower resolution (280x240): 16 fps

  15. ND APPLICATION
      Screenshot labels:
      - Toggle for threshold bar
      - Detection box with probability
      - Probability and NMS threshold information
      - Threshold bar

  16. ND APPLICATION DEMO (NEXUS 9)

  17. ND APPLICATION DEMO (NEXUS 9)

  18. ND APPLICATION DEMO (LAPTOP WITH GTX970M)

  19. ND APPLICATION DEMO (CLUSTER WITH TITAN X)

  20. SUMMARY & CONCLUSION
      PSVL Neural Detector supports …
      - end-to-end pedestrian detection with a single forward propagation of the neural network
      - very high speeds (34 fps on an NVIDIA GTX770, up to 85.565 fps on an NVIDIA Titan X with cuDNN4) with competitive accuracy
      - real-time operation even when deployed in embedded systems
      PSVL Neural Detector can be used for …
      - an extended system for multiple-object detection in road conditions
        - Pedestrian, Car, Bus, Truck, Bicycle, Traffic Sign, etc.
      - Scalable
        - with a bit of extra computational power needed

  21. THANK YOU!

  22. REFERENCES
      1) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541–551, Dec. 1989.
      2) J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. IJCV, 2013.
      3) C. Lawrence Zitnick and Piotr Dollár. Microsoft Research.
      4) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
      5) http://caffe.berkeleyvision.org/
      6) Berkeley Vision and Learning Center (http://bvlc.eecs.berkeley.edu/)
      7) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
      8) A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, and D. Ferguson. Real-Time Pedestrian Detection With Deep Network Cascades.
      9) http://caffe.berkeleyvision.org/performance_hardware.html
