CSC2548: Machine Learning in Computer Vision
YOLO9000: Better, Faster, Stronger
Date: January 24, 2018
Prepared by Haris Khan (University of Toronto)
Overview
1. Motivation for one-shot object detection and weakly-supervised learning
and Faster R-CNN [6]
Motivation:
class probabilities at the same time
methods
VOC 2007 / 2012:
ImageNet1000:
retriever, European fire salamander
MS COCO:
Motivation:
during training using existing detection and classification datasets
PASCAL VOC:
cell has B objects.
into S × S × (5B + C)
Image Credit: [1]
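As a worked check of the S × S × (5B + C) output size: each cell predicts B boxes with 5 numbers each (x, y, w, h, confidence) plus one set of C class probabilities. The helper name `yolo_v1_output_shape` is hypothetical (it is not from [1]); the example values S = 7, B = 2, C = 20 are the PASCAL VOC settings reported in [1].

```python
def yolo_v1_output_shape(S: int, B: int, C: int) -> tuple:
    """Shape of the YOLOv1 prediction tensor for an S x S grid (hypothetical helper).

    Each cell predicts B boxes (x, y, w, h, confidence = 5 numbers each)
    plus one set of C class probabilities shared by the boxes in that cell.
    """
    return (S, S, 5 * B + C)


if __name__ == "__main__":
    # PASCAL VOC settings from [1]: 7 x 7 grid, 2 boxes per cell, 20 classes
    print(yolo_v1_output_shape(7, 2, 20))  # (7, 7, 30)
```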
Image Credit: [1] Grid creation, bounding box & class predictions
Image Credit: [1]
VOC 2007 Test Results and VOC 2012 Test Results (Table Credits: [1])
*Speed measured on Titan X GPU
Image Credit: [1]
YOLOv2 [2]:
accuracy
YOLO9000 [2]:
can learn using weakly-supervised training on the union of detection (e.g., VOC, COCO) and classification (e.g., ImageNet) datasets
Modification: Effect
Bounding Boxes:
  Anchor boxes: 7% recall increase
  Dimension clusters + new bounding box parameterization: 4.8% mAP increase
Architecture:
  New Darknet-19 replaces GoogLeNet: 33% computation decrease, 0.4% mAP increase
  Convolutional prediction layer: 0.3% mAP increase
Training:
  Batch normalization: 2% mAP increase
  High resolution fine-tuning of weights: 4% mAP increase
  Multi-scale images: 1.1% mAP increase
  Passthrough for fine-grained features: 1% mAP increase
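A minimal sketch of the multi-scale images row, assuming the schedule described in [2]: every 10 batches a new square input size is drawn from the multiples of 32 between 320 and 608, which works because the network downsamples by a factor of 32. The function name and the seeded `random.Random` are illustrative choices, not the Darknet implementation.

```python
import random

STRIDE = 32  # Darknet-19 downsamples the input by a factor of 32
SIZES = list(range(320, 608 + 1, STRIDE))  # 320, 352, ..., 608


def input_size_schedule(num_batches: int, every: int = 10, seed: int = 0) -> list:
    """Input resolution for each batch; a new size is drawn every `every` batches."""
    rng = random.Random(seed)
    schedule = []
    size = rng.choice(SIZES)
    for batch in range(num_batches):
        if batch > 0 and batch % every == 0:
            size = rng.choice(SIZES)  # switch to a new square resolution
        schedule.append(size)
    return schedule


if __name__ == "__main__":
    for s in sorted(set(input_size_schedule(50))):
        # grid side = size // 32, e.g. 320 -> 10, 608 -> 19
        print(s, "->", s // STRIDE, "x", s // STRIDE)
```

Because the detector is fully convolutional, the same weights run on every size; only the output grid changes.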
various aspect ratios to be detected in a single grid cell
clustering of VOC 2007 training set
IOU / model complexity
predicts bounding box centre point, width and height
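The dimension clusters come from k-means on training-set box sizes; [2] replaces Euclidean distance with d(box, centroid) = 1 − IOU(box, centroid) so that large boxes do not dominate the error. The sketch below assumes boxes are given as (width, height) pairs with aligned centres; the helper names and the toy data are hypothetical.

```python
import random


def wh_iou(a, b):
    """IOU of two boxes given as (width, height), with their centres aligned."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means over (w, h) pairs using d(box, centroid) = 1 - IOU(box, centroid)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assignment step: nearest centroid under the 1 - IOU distance
        clusters = [[] for _ in range(k)]
        for box in boxes:
            j = min(range(k), key=lambda c: 1.0 - wh_iou(box, centroids[c]))
            clusters[j].append(box)
        # Update step: mean width/height per cluster (keep old centroid if empty)
        new = [
            (sum(w for w, _ in m) / len(m), sum(h for _, h in m) / len(m)) if m else centroids[j]
            for j, m in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def mean_iou(boxes, centroids):
    """Average IOU between each box and its best-matching centroid."""
    return sum(max(wh_iou(b, c) for c in centroids) for b in boxes) / len(boxes)


if __name__ == "__main__":
    boxes = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
    anchors = kmeans_anchors(boxes, k=2)
    print(sorted(anchors), mean_iou(boxes, anchors))
```

On the toy data the two anchors settle near (1, 1) and (5, 5). With real box statistics, raising k increases the average IOU but adds predictions per cell, which is the IOU / model complexity trade-off noted above.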
Image Credit: [2]
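The new parameterization in [2] predicts offsets (tx, ty, tw, th) and decodes them as bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, bh = ph·e^th, where (cx, cy) is the cell's offset from the top-left corner and (pw, ph) is the prior's size. A sketch of decoding one prediction, with everything in grid-cell units (multiplying by the stride gives pixels); the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior):
    """Decode raw outputs t = (tx, ty, tw, th) for one anchor (hypothetical helper).

    cell  = (cx, cy): offset of the grid cell from the top-left, in cells
    prior = (pw, ph): anchor (dimension-cluster) width and height, in cells
    Returns (bx, by, bw, bh) in grid-cell units.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx   # sigmoid keeps the centre inside its own cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # exponential scaling of the prior's width/height
    bh = ph * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 4), (1.5, 2.0)))  # (3.5, 4.5, 1.5, 2.0)
```

Zero offsets place the centre in the middle of cell (3, 4) and leave the prior's size unchanged; bounding the centre with σ is what keeps training stable compared with unconstrained offsets.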
pooling layers
Table Credit: [2] DarkNet-19 for Image Classification
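The Darknet-19 table in [2] lists 19 convolutional layers and 5 max-pooling layers. The sketch below transcribes that layout as a plain Python list (the tuple encoding and the `summarize` helper are hypothetical) and tracks the spatial size, showing the overall downsampling factor of 32.

```python
# Darknet-19 classification layout transcribed from the table in [2]:
# ("conv", filters, kernel) or ("maxpool",). Default input is 224 x 224.
DARKNET19 = [
    ("conv", 32, 3), ("maxpool",),
    ("conv", 64, 3), ("maxpool",),
    ("conv", 128, 3), ("conv", 64, 1), ("conv", 128, 3), ("maxpool",),
    ("conv", 256, 3), ("conv", 128, 1), ("conv", 256, 3), ("maxpool",),
    ("conv", 512, 3), ("conv", 256, 1), ("conv", 512, 3),
    ("conv", 256, 1), ("conv", 512, 3), ("maxpool",),
    ("conv", 1024, 3), ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 1000, 1),  # 1x1 conv to 1000 classes, followed by global avgpool + softmax
]


def summarize(layers, input_size=224):
    """Count layer types and track spatial size (convs keep size via 'same' padding)."""
    size = input_size
    n_conv = n_pool = 0
    for layer in layers:
        if layer[0] == "conv":
            n_conv += 1
        else:
            n_pool += 1
            size //= 2  # 2x2 max-pool with stride 2 halves height and width
    return n_conv, n_pool, size


if __name__ == "__main__":
    print(summarize(DARKNET19))       # (19, 5, 7)
    print(summarize(DARKNET19, 416))  # (19, 5, 13)
```

A 224 × 224 input ends at 7 × 7; a 416 × 416 detection input gives the 13 × 13 grid used by YOLOv2.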
Video link: https://youtu.be/Cgxsv1riJhI?t=290
Image Credits: [2]
Slide Credit: Joseph Redmon [3]
Image Credit: [2]
Slide Credit: Joseph Redmon [3]
Image Credit: Joseph Redmon [3]
Image Credit: Joseph Redmon [3]
Datasets:
Data Augmentation:
Hyperparameters:
Training Enhancements:
layers replace last convolutional layer of DarkNet-19 base model
3×3×512 and second-to-last convolutional layers, adding fine-grained features to the prediction layer
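The passthrough layer in [2] stacks adjacent spatial positions into channels, turning the 26 × 26 × 512 feature map into 13 × 13 × 2048 so it can be concatenated with the coarser features. A pure-Python space-to-depth sketch with block size 2 follows; the helper name is hypothetical and the channel ordering is an assumption that may differ from Darknet's own reorg layer.

```python
def space_to_depth(x, block=2):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (C*block*block).

    Each block x block patch of neighbouring positions is stacked along the
    channel axis, so no activations are discarded (unlike pooling).
    """
    H, W = len(x), len(x[0])
    if H % block or W % block:
        raise ValueError("height and width must be divisible by the block size")
    out = []
    for i in range(0, H, block):
        row = []
        for j in range(0, W, block):
            channels = []
            for di in range(block):
                for dj in range(block):
                    channels.extend(x[i + di][j + dj])  # append this position's C values
            row.append(channels)
        out.append(row)
    return out


if __name__ == "__main__":
    # 4 x 4 x 1 toy map whose values encode their position: 4*row + col
    x = [[[4 * h + w] for w in range(4)] for h in range(4)]
    print(space_to_depth(x)[0][0])  # [0, 1, 4, 5]
```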
Datasets:
Bounding Boxes:
Backpropagating Loss:
backpropagate as in YOLOv2
images, only backpropagate the classification loss, computed on the bounding box that predicts the highest probability for the labelled class in the WordTree
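For the WordTree, [2] predicts a conditional probability at every node, applies a separate softmax over each group of siblings (co-hyponyms), and obtains a node's absolute probability by multiplying the conditionals along its path to the root. The sketch below uses a hypothetical toy tree and hypothetical logits, not the full WordTree, and the helper names are illustrative.

```python
import math

# Hypothetical toy WordTree fragment: child -> parent (root has parent None)
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "artifact": "physical object",
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
    "retriever": "dog",
}


def sibling_groups(parent):
    """Group nodes that share a parent (co-hyponyms); the root is excluded."""
    groups = {}
    for node, par in parent.items():
        if par is not None:
            groups.setdefault(par, []).append(node)
    return groups


def conditional_probs(logits, parent):
    """Softmax over each sibling group separately, giving Pr(node | parent)."""
    probs = {}
    for members in sibling_groups(parent).values():
        m = max(logits[n] for n in members)  # subtract max for numerical stability
        exps = {n: math.exp(logits[n] - m) for n in members}
        total = sum(exps.values())
        for n in members:
            probs[n] = exps[n] / total
    return probs


def absolute_prob(node, cond, parent):
    """Multiply conditional probabilities from node up to the root (Pr(root) = 1)."""
    p = 1.0
    while parent[node] is not None:
        p *= cond[node]
        node = parent[node]
    return p


if __name__ == "__main__":
    uniform = {n: 0.0 for n in PARENT}
    cond = conditional_probs(uniform, PARENT)
    print(absolute_prob("terrier", cond, PARENT))  # 0.125
```

With uniform logits every sibling pair splits 0.5/0.5, so "terrier" three levels below the root gets 0.5 × 0.5 × 0.5.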
Image Credit: Joseph Redmon [3]
VOC 2007 Test Results
Table Credits: [2]
VOC 2012 Test Results
COCO Test-Dev 2015 Results
ImageNet and COCO
classes
Table Credit: [2] Best and Worst Classes on ImageNet
Strengths:
training data
datasets
Weaknesses:
domains, such as image segmentation or dense captioning
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[2] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” arXiv preprint arXiv:1612.08242, 2016.
[3] J. Redmon, “YOLO9000: Better, Faster, Stronger,” presented at CVPR, 2017.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[5] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[6] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015.
[7] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” arXiv preprint arXiv:1708.02002, 2017.