
You Only Look Once: Unified, Real-Time Object Detection
Redmon et al., CVPR 2016; presentation by Mincheul Kang


  1. You Only Look Once: Unified, Real-Time Object Detection
     Redmon et al., CVPR 2016
     Mincheul Kang

  2. Image Retrieval using Scene Graphs
     • Develops a novel framework for semantic image retrieval based on the notion of a scene graph
     • Uses scene graphs as queries
     • Introduces a novel dataset of 5K human-generated scene graphs grounded to images
     [Figure labels: Measure, Score, Query, Output, Object & Attribute, Relationship]

  3. Contents
     1. Background
     2. Related work
     3. Overview
     4. Approach
     5. Results
     6. Conclusion
     7. Q&A

  4. Background
     • Object detection
       • Localization: where?
       • Recognition: what?
     Source: Fast R-CNN slides, Ross Girshick

  5. Background
     • Object detection in applications
       • Image retrieval
       • Robotics
       • Self-driving cars
     • These applications need fast and accurate algorithms
     http://www.nvidia.com/object/drive-px.html
     http://kitschthingoftheday.blogspot.com/2011/06/breakfast-making-robots-at-tum.html

  6. Background
     • Progress of object detection
     [Chart: PASCAL VOC mean Average Precision (mAP, 0% to 80%) by year, 2006 to 2016; labeled methods: DPM, R-CNN, Fast R-CNN, Faster R-CNN; annotation: "After CNN"]
     Machine learning + Computer vision

  7. Related work
     • R-CNN (region proposals + CNN)
       • Selective search generates the region proposals
       • A CNN extracts a fixed-length feature vector from each region
       • Binary linear SVMs classify each region
     Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Ross Girshick et al., CVPR 2014

  8. Related work
     • Problems with R-CNN
       • Training proceeds in several stages
       • Training and detection are slow
       • Requires a large amount of storage space

  9. Related work
     • Fast R-CNN
       • Training is single-stage, using a multi-task loss
       • Training can update all network layers
       • No disk storage is required for feature caching
     Fast R-CNN, Ross Girshick, ICCV 2015

  10. Related work
     • Faster R-CNN
       • Selective search takes a long time to compute
       • A Region Proposal Network generates the proposals instead
     Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Shaoqing Ren et al., NIPS 2015 (paper and slides)

  11. Related work
     • Summary
       • Speed and mAP have improved since the adoption of CNNs
       • However, these detectors are still not fast enough to run in real time
     • YOLO
       • Enables real-time speeds while maintaining high average precision

  12. Overview
     • YOLO detection system
       1) Resizes the input image to 448 × 448
       2) Runs a single convolutional network on the image
       3) Thresholds the resulting detections by the model’s confidence
     You only look once: Unified, real-time object detection, J Redmon et al., CVPR 2016

  13. Approach
     • Divide the input image into an S × S grid
     [Figure: input image]
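
The grid division can be illustrated with a minimal sketch. It assumes the rule from the YOLO paper (not stated on the slide) that the cell containing an object's center is the one responsible for detecting it. The function name `responsible_cell` and the default `S=7` are illustrative choices; S = 7 is the value the paper uses for PASCAL VOC.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing the point (cx, cy).

    cx, cy are pixel coordinates of an object's center.
    S=7 is an assumption taken from the paper, not from the slide.
    """
    # Scale the center into grid units, then clamp so that points on the
    # right or bottom image edge fall into the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


if __name__ == "__main__":
    # Center of a 448 x 448 image lands in the middle cell of a 7 x 7 grid.
    print(responsible_cell(224, 224, 448, 448))
```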

  14. Approach
     • Each grid cell predicts bounding boxes and confidence scores for those boxes
     • IOU (intersection over union)
     • Confidence: $\Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$
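
The IOU named on this slide is the area of overlap between two boxes divided by the area of their union. A minimal sketch follows, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; that box format and the function name `iou` are illustrative assumptions, not something the slides specify.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, so their IOU is 1 / (4 + 4 − 1) = 1/7.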

  15. Approach
     • Each grid cell also predicts conditional class probabilities
     • Class probability: $\Pr(\text{Class}_i \mid \text{Object})$
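
In the YOLO paper, a cell's conditional class probabilities are multiplied by each box's confidence at test time, which gives a class-specific score per box. The sketch below shows only that multiplication; the function name `class_scores` and the plain-list representation are illustrative assumptions.

```python
def class_scores(class_probs, box_confidence):
    """Class-specific scores for one predicted box.

    class_probs: the cell's conditional class probabilities, one per class.
    box_confidence: the confidence predicted for this box.
    Both inputs are plain Python numbers here (an illustrative choice).
    """
    return [p * box_confidence for p in class_probs]


if __name__ == "__main__":
    # A cell that is 50% "dog" and 25% "cat", with a box confidence of 0.5.
    print(class_scores([0.5, 0.25], 0.5))
```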

  16. Approach
     • Thresholds the resulting detections by the model’s confidence
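
The thresholding step can be sketched as a simple filter. The dictionary layout, the function name `threshold_detections`, and the default cutoff of 0.25 are all hypothetical; the slides do not state a threshold value.

```python
def threshold_detections(detections, min_score=0.25):
    """Keep only detections whose score is at least min_score.

    detections: list of dicts with a "score" key (hypothetical layout).
    min_score: cutoff chosen for illustration; not taken from the slides.
    """
    return [d for d in detections if d["score"] >= min_score]


if __name__ == "__main__":
    dets = [
        {"label": "dog", "score": 0.8},
        {"label": "cat", "score": 0.1},
    ]
    # Only the high-scoring "dog" detection survives the cutoff.
    print(threshold_detections(dets))
```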

  17. Approach
     • YOLO
       • Enables end-to-end training and real-time speeds
       • Predicts all bounding boxes across all classes for an image simultaneously
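
Predicting everything at once means the network emits a single fixed-size output. As a rough sketch, assuming the paper's encoding (each of the S × S cells predicts B boxes with 5 values each, plus C class probabilities) and its PASCAL VOC settings S = 7, B = 2, C = 20, none of which appear on the slide, the output shape works out as follows. The function name `output_shape` is illustrative.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor under the paper's encoding.

    Each cell predicts B boxes (x, y, w, h, confidence) and C class
    probabilities. The defaults are the paper's PASCAL VOC settings,
    which are an assumption here because the slide does not list them.
    """
    depth = B * 5 + C
    return (S, S, depth)


if __name__ == "__main__":
    S, _, depth = output_shape()
    print(output_shape())      # grid height, grid width, values per cell
    print(S * S * depth)       # total number of predicted values
    print(S * S * 2)           # total number of predicted boxes for B = 2
```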

  18. Approach
     • Training
     • Cost function: [equation shown on the slide]
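
The cost function did not survive in this transcript. For reference, the multi-part sum-squared loss given in the YOLO paper is reproduced below, on the assumption that the slide showed the same equation. Here $\mathbb{1}_{ij}^{\text{obj}}$ indicates that the $j$-th box predictor in cell $i$ is responsible for an object, $\mathbb{1}_{i}^{\text{obj}}$ indicates that an object appears in cell $i$, $C_i$ is the box confidence, and $p_i(c)$ is the class probability; the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make the same absolute error count for more in small boxes than in large ones, and the two $\lambda$ weights rebalance localization error against the many cells that contain no object.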

  19. Results
     • Qualitative results on sample artwork and natural images from the internet

  20. Results
     • Real-time speeds while maintaining high average precision (value shown on the slide: 69.0)

  21. Conclusion
     • Because YOLO uses a single network, it can be optimized end-to-end directly on detection
     • Predicts all bounding boxes across all classes for an image simultaneously
     • Real-time speeds while maintaining high average precision
     • Limitations
       • Struggles with small objects that appear in groups, such as flocks of birds
       • Incorrect localizations

  22. Q & A
