You Only Look Once: Unified, Real-Time Object Detection (Redmon et al., CVPR 2016)


SLIDE 1

You Only Look Once: Unified, Real-Time Object Detection

Redmon et al., CVPR 2016

Mincheul Kang

SLIDE 2

Image Retrieval using Scene Graphs

  • Develop a novel framework for semantic image retrieval based on the notion of a scene graph
  • Use scene graphs as queries
  • Introduce a novel dataset of 5K human-generated scene graphs grounded to images

[Figure labels: Object & Attribute, Relationship, Query, Output, Measure Score]

SLIDE 3

Contents

  • 1. Background
  • 2. Related work
  • 3. Overview
  • 4. Approach
  • 5. Results
  • 6. Conclusion
  • 7. Q&A
SLIDE 4

Background

  • Object detection
    • Recognition: What?
    • Localization: Where?

Source: Fast R-CNN slides, Ross Girshick

SLIDE 5

Background

  • Object detection in applications
    • Image retrieval
    • Robotics
    • Self-driving cars

Need fast and accurate algorithms

Image sources:
http://www.nvidia.com/object/drive-px.html
http://kitschthingoftheday.blogspot.com/2011/06/breakfast-making-robots-at-tum.html

SLIDE 6

Background

  • Progress of object detection

[Figure: mean Average Precision (mAP) on PASCAL VOC by year, 2006 to 2016, y-axis 0% to 80%; methods shown: DPM, R-CNN, Fast R-CNN, Faster R-CNN; annotations: "After CNN", "Machine learning + Computer vision"]

SLIDE 7

Related work

  • R-CNN (Region proposals + CNN)
    • Selective search
    • CNN that extracts a fixed-length feature vector from each region
    • Binary linear SVMs

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Ross Girshick et al., CVPR 2014

SLIDE 8

Related work

  • Problems in R-CNN
    • Training proceeds in several stages
    • Training and detection are slow
    • Requires high-capacity storage space
SLIDE 9

Related work

  • Fast R-CNN
    • Training is single-stage, using a multi-task loss
    • Training can update all network layers
    • No disk storage is required for feature caching

Fast R-CNN, Ross Girshick, ICCV 2015

SLIDE 10

Related work

  • Faster R-CNN
    • “Selective search” => computing time is long
    • Replaced by a Region Proposal Network

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Shaoqing Ren et al., NIPS 2015 and Slides

SLIDE 11

Related work

  • Summary
    • Speed and mAP have improved since CNNs were adopted
    • But still not fast enough for real-time operation
  • YOLO
    • Enables real-time speeds while maintaining high average precision

SLIDE 12

Overview

  • YOLO detection system

1) Resizes the input image to 448 × 448
2) Runs a single convolutional network on the image
3) Thresholds the resulting detections by the model’s confidence

You only look once: Unified, real-time object detection, J Redmon et al., CVPR 2016

SLIDE 13

Approach

  • Divide the input image into an S × S grid

[Figure: input image]
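The grid step can be sketched as follows. This is a minimal illustration, assuming S = 7 (the value used in the cited paper) and an object center given in pixel coordinates; in the paper, the cell that contains an object's center is the one responsible for detecting it. The function name `cell_for_center` is hypothetical.

```python
def cell_for_center(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing point (cx, cy).

    Assumes pixel coordinates with 0 <= cx <= img_w and 0 <= cy <= img_h.
    S=7 is an assumption taken from the cited paper's setup.
    """
    # Scale the coordinate to [0, S] and truncate to get the cell index.
    # min(...) clamps points lying exactly on the right/bottom border.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


if __name__ == "__main__":
    # Center of a 448 x 448 image falls in the middle cell of a 7 x 7 grid.
    print(cell_for_center(224, 224, 448, 448))
```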

SLIDE 14

Approach

  • Each grid cell predicts bounding boxes and confidence scores for those boxes.
    • IOU (intersection over union)
    • Confidence : Pr(Object) × IOU(truth, pred)
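A minimal sketch of IOU and of the confidence target, which the cited paper defines as Pr(Object) times the IOU between the predicted box and the ground-truth box. The corner-format box representation `(x1, y1, x2, y2)` and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pr_object, pred_box, truth_box):
    """Confidence = Pr(Object) * IOU(pred, truth); zero when no object is present."""
    return pr_object * iou(pred_box, truth_box)
```

With this definition, a cell that contains no object has a confidence target of zero, and a cell that contains one has a target equal to the IOU of its predicted box.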

SLIDE 15

Approach

  • Each grid cell also predicts conditional class probabilities
    • Class probability : Pr(Class_i | Object)
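In the cited paper, the conditional class probabilities are multiplied by each box's confidence at test time, giving a class-specific score Pr(Class_i | Object) × Pr(Object) × IOU = Pr(Class_i) × IOU. A small sketch of that product, with a hypothetical function name and plain Python lists standing in for the network output:

```python
def class_scores(cond_class_probs, box_confidence):
    """Class-specific scores for one box.

    cond_class_probs: list of Pr(Class_i | Object) for the box's grid cell.
    box_confidence:   the box's confidence, Pr(Object) * IOU.
    """
    return [p * box_confidence for p in cond_class_probs]


if __name__ == "__main__":
    # Two classes, box confidence 0.5.
    print(class_scores([0.5, 0.25], 0.5))
```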

SLIDE 16

Approach

  • Threshold the resulting detections by the model’s confidence
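The thresholding step might look like the sketch below. The `(label, score, box)` tuple layout, the function name, and the default threshold of 0.2 are all assumptions chosen for illustration, not values stated on the slide.

```python
def threshold_detections(detections, min_score=0.2):
    """Keep only detections whose score is at least min_score.

    detections: list of (label, score, box) tuples (hypothetical layout).
    min_score:  illustrative default, not taken from the slides.
    """
    return [d for d in detections if d[1] >= min_score]


if __name__ == "__main__":
    dets = [("dog", 0.9, (10, 20, 200, 300)), ("cat", 0.1, (5, 5, 50, 50))]
    # Only the "dog" detection passes the 0.2 threshold.
    print(threshold_detections(dets))
```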

SLIDE 17

Approach

  • YOLO
    • Enables end-to-end training and real-time speeds
    • Predicts all bounding boxes across all classes for an image simultaneously
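Predicting everything at once means the network emits a single S × S × (B·5 + C) tensor. A sketch of how one cell's vector could be split, assuming the cited paper's PASCAL VOC setup (S = 7, B = 2, C = 20). The ordering of values inside the vector (boxes first, then class probabilities) is an assumption made for illustration and may differ from any real implementation.

```python
S, B, C = 7, 2, 20        # grid size, boxes per cell, classes (assumed from the paper)
CELL_LEN = B * 5 + C      # 5 values per box: x, y, w, h, confidence


def split_cell(cell):
    """Split one cell's prediction vector into B boxes and C class probabilities.

    Assumed layout: [box_0 (5 values), ..., box_{B-1} (5 values), class probs (C values)].
    """
    if len(cell) != CELL_LEN:
        raise ValueError("expected %d values, got %d" % (CELL_LEN, len(cell)))
    boxes = [tuple(cell[b * 5:(b + 1) * 5]) for b in range(B)]
    class_probs = list(cell[B * 5:])
    return boxes, class_probs


def total_outputs():
    """Number of values the network predicts for one image."""
    return S * S * CELL_LEN
```

Under these assumptions each image yields 7 × 7 × 30 values, covering 98 candidate boxes in a single forward pass.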

SLIDE 18

Approach

  • Training
    • Cost function : [equation image]
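The cost function on this slide was an image and does not survive in the extracted text. As a reference point, the multi-part sum-squared-error loss given in the cited paper has the form below. This is a transcription to be checked against the paper, not a recovery of the slide's exact content. Here 1_{ij}^{obj} indicates that box predictor j in cell i is responsible for an object, 1_{i}^{obj} that an object appears in cell i, C is the box confidence, and p(c) the class probability; the paper sets λ_coord = 5 and λ_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make a given absolute error count for more in small boxes than in large ones, and the two λ weights rebalance localization error against the many cells that contain no object.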

SLIDE 19

Results

  • Results on sample artwork and natural images from the internet

SLIDE 20

Results

  • Real-time speeds while maintaining high average precision

[Figure: results comparison, value shown: 69.0]

SLIDE 21

Conclusion

  • Uses a single network, so it can be optimized end-to-end directly on detection performance
  • Predicts all bounding boxes across all classes for an image simultaneously
  • Real-time speeds while maintaining high average precision
  • Limitations
    • Struggles with small objects that appear in groups, such as flocks of birds
    • Incorrect localizations
SLIDE 22

Q & A