MIL-UT at ILSVRC2014
Senthil Purushwalkam, Yuichiro Tsuchiya, Atsushi Kanehira, Asako Kanezaki and *Tatsuya Harada The University of Tokyo
IIT Guwahati (undergrad) -> Virginia Tech (intern)
Pipeline of CLS-LOC task

1-1 Scoring each bounding box by R-CNN
Input image -> Extract region proposals -> Extract CNN features (fc7) -> Scoring regions by multiclass PA (averaged multiclass Passive Aggressive with hard negative mining)

1-2 Scoring whole image by FV as contextual scores
Whole image -> Extract FV with spatial information -> Scoring whole image by multiclass PA (averaged multiclass Passive Aggressive)

Late fusion of 1-1 and 1-2 -> Score
Multiclass Object Detection with hard negative classes
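R. Girshick, J. Donahue, T. Darrell and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014.
J. Uijlings, K. van de Sande, T. Gevers and A. Smeulders. Selective Search for Object Recognition. IJCV, 2013.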
Kuniyoshi and T. Harada. Hard Negative Classes for Multiple Object Detection. ICRA, 2014.
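We use Passive Aggressive (PA) [Crammer et al., 2006] to learn multi-class linear classifiers:

$$W^{(t+1)} = \arg\min_{W} \frac{1}{2}\left\|W - W^{(t)}\right\|^{2} + C\xi \quad \text{s.t.} \quad \ell(\mathbf{x}_t, y_t; W) \le \xi,\ \xi \ge 0$$

The resulting update is

$$\mathbf{w}_r^{(t+1)} = \mathbf{w}_r^{(t)} + \tau_t \mathbf{x}_t, \qquad \mathbf{w}_s^{(t+1)} = \mathbf{w}_s^{(t)} - \tau_t \mathbf{x}_t$$

where

$$\tau_t = \min\left\{C,\ \frac{1 - \left(\mathbf{w}_r^{(t)\top}\mathbf{x}_t - \mathbf{w}_s^{(t)\top}\mathbf{x}_t\right)}{2\left\|\mathbf{x}_t\right\|^{2}}\right\}$$

r: positive class
s: negative class with the highest score

[Figure: the scores $W\mathbf{x}_t$ (score of class 1, score of class 2, …, score of class K) computed from $\mathbf{w}_1, \dots, \mathbf{w}_K$, with r and s marked; label: ERROR.]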
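The multiclass PA step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the team's implementation: the function names `dot` and `pa_update`, the list-of-lists weight layout, and the toy data are all hypothetical, and clipping the loss at zero (so that a sample with enough margin causes no update) is an assumption taken from the usual PA formulation rather than from the slide.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pa_update(W, x, r, C=1.0):
    """Apply one multiclass PA step in place and return (s, tau).

    W: list of K weight vectors, x: feature vector, r: index of the true class.
    """
    scores = [dot(w, x) for w in W]
    # s is the wrong class with the highest score.
    s = max((k for k in range(len(W)) if k != r), key=lambda k: scores[k])
    # Assumption: hinge loss on the margin between r and s, clipped at zero.
    loss = max(0.0, 1.0 - (scores[r] - scores[s]))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return s, 0.0  # passive step: nothing to correct
    tau = min(C, loss / (2.0 * sq_norm))
    W[r] = [w + tau * xi for w, xi in zip(W[r], x)]  # pull w_r toward x
    W[s] = [w - tau * xi for w, xi in zip(W[s], x)]  # push w_s away from x
    return s, tau


W = [[0.0, 0.0] for _ in range(3)]  # K = 3 classes, 2-dim toy features
x = [1.0, 0.0]
first = pa_update(W, x, r=0)
second = pa_update(W, x, r=0)
third = pa_update(W, x, r=0)
```

On this toy input the step size shrinks from one call to the next and the third call is passive, because the margin between class 0 and its strongest competitor has reached 1.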
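With hard negative classes, the score vector contains the score of class 1, class 2, …, class K (weights $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_K$) and the score of negative class 1, negative class 2, …, negative class K (weights $\mathbf{w}'_1, \mathbf{w}'_2, \dots, \mathbf{w}'_K$). The loss $\ell(\mathbf{x}_t, y_t; W)$ and the PA update keep the same form:

$$\mathbf{w}_r^{(t+1)} = \mathbf{w}_r^{(t)} + \tau_t \mathbf{x}_t, \qquad \mathbf{w}_s^{(t+1)} = \mathbf{w}_s^{(t)} - \tau_t \mathbf{x}_t, \qquad \tau_t = \min\left\{C,\ \frac{1 - \left(\mathbf{w}_r^{(t)\top}\mathbf{x}_t - \mathbf{w}_s^{(t)\top}\mathbf{x}_t\right)}{2\left\|\mathbf{x}_t\right\|^{2}}\right\}$$

Cf.) A single background class $\mathbf{w}_{bg}$ does not work.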
Ex.) If a training sample $\mathbf{x}_t$ is a positive sample of class 2, then r = class 2 and s is the negative class with the highest score. Candidates for s include class 1, 3, …, K. The update adds $\tau_t \mathbf{x}_t$ to $\mathbf{w}_r$ and subtracts it from $\mathbf{w}_s$, which corrects a classification error.
Ex.) If a training sample $\mathbf{x}_t$ is a negative sample of class 2, then r = negative class 2 and s = class 2. The update adds $\tau_t \mathbf{x}_t$ to $\mathbf{w}'_2$ and subtracts it from $\mathbf{w}_2$, which corrects a detection error.
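The two examples can be traced with a small sketch of how samples might be mapped onto the 2K rows of the weight matrix. Everything here is illustrative: the 0-based row layout (classes first, then their negative classes), the helper names `label_index` and `hardest_competitor`, the score values, and the choice to search all other rows for the competitor s are assumptions, since the slides do not spell out the full candidate set.

```python
def label_index(cls, is_positive, num_classes):
    """Row index in the 2K-row weight matrix (0-based class ids).

    Assumed layout: rows 0..K-1 are the classes, rows K..2K-1 their negative classes.
    """
    return cls if is_positive else num_classes + cls


def hardest_competitor(scores, r):
    """Index of the highest-scoring row other than the true row r."""
    return max((k for k in range(len(scores)) if k != r), key=lambda k: scores[k])


K = 3
# Hypothetical scores: rows 0..2 are classes, rows 3..5 their negative classes.
scores = [0.2, 0.9, 0.1, 0.0, 0.4, 0.3]

# Positive sample of class 1: the true row is the class itself.
r_pos = label_index(1, True, K)
s_pos = hardest_competitor(scores, r_pos)

# Negative sample of class 1: the true row is the negative class of class 1.
r_neg = label_index(1, False, K)
s_neg = hardest_competitor(scores, r_neg)
```

With these scores the negative sample of class 1 competes against class 1 itself, so a PA step would lower the detector's score on that region, which is the detection-error case.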
- INRIA's Fisher vector implementation
- L2 normalization, power normalization, spatial pyramid
- Dimension reduction of local feature (D): 64 dim
- Number of components in GMM (K): 256
- 5 scales of local patches
- Spatial pyramid (P): 1x1 + 2x2 + 3x1 = 8
- Dimension of IFK: 2PKD = 262,144 dim
- SIFT
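The Fisher vector dimension follows directly from the listed settings. A quick check, using hypothetical variable names:

```python
# Settings taken from the list above; variable names are illustrative only.
D = 64                               # local feature dimension after reduction
K = 256                              # number of GMM components
pyramid = [(1, 1), (2, 2), (3, 1)]   # spatial pyramid grids

P = sum(rows * cols for rows, cols in pyramid)  # number of spatial cells
fv_dim = 2 * P * K * D                          # 2PKD

print(P, fv_dim)
```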
Three Guidelines of Online Learning for Large-Scale Visual Recognition. CVPR, 2014.
1. Perceptron can compete against the latest methods.
2. Averaging is necessary for any algorithm.
   - … cannot compete against second-order algorithms.
   - … accuracies of all algorithms become very close to each other.
3. Investigate multiclass learning first.
   - … multiclass learning achieve similar accuracy.
   - … much longer CPU time to converge than multiclass does.
Averaging:

$$\bar{\mu} = \frac{1}{T}\left(\mu_1 + \mu_2 + \dots + \mu_T\right)$$

$$\hat{y}_i = \arg\max_{y \in Y \setminus y_i} \mu_y \cdot x_i, \quad \forall i$$
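Averaging can be sketched as taking the mean of the weight vectors visited during online training and using that mean at test time. The snippet below assumes this standard form; the function name `averaged_weights` and the snapshot values are hypothetical.

```python
def averaged_weights(snapshots):
    """Mean of T weight vectors, one snapshot per online update."""
    T = len(snapshots)
    return [sum(column) / T for column in zip(*snapshots)]


# Hypothetical weight vectors after updates 1, 2 and 3.
snapshots = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
w_bar = averaged_weights(snapshots)
```

In practice the mean is usually accumulated as a running sum so that the snapshots need not be stored, which matters for 262,144-dim features.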
1-1 Scoring each bounding box by R-CNN (fc7)
Input image -> Extract region proposals -> Compute CNN features -> Scoring regions by multiclass PA for each class
Multiclass PA for class 1, …, class c, …, class 1000 give $s_{b,1}^{CNN}, \dots, s_{b,c}^{CNN}, \dots, s_{b,1000}^{CNN}$.

1-2 Scoring whole image by FV as contextual scores
Whole image -> Extract FV with spatial information -> Scoring by linear classifier trained by PA for each class
Multiclass PA for class 1, …, class c, …, class 1000 give $s_{1}^{FV}, \dots, s_{c}^{FV}, \dots, s_{1000}^{FV}$.

Late fusion: for bounding box b, class c,

$$s_{b,c}^{new} = s_{b,c}^{CNN}\, s_{c}^{FV}$$
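Late fusion can be sketched as follows, assuming the region score and the whole-image score for the same class are combined by multiplication. The function name `late_fusion` and the score values are hypothetical, and the product rule itself should be read as an assumption about the fusion formula rather than a confirmed detail.

```python
def late_fusion(region_scores, global_scores):
    """Fuse per-box class scores with whole-image class scores.

    region_scores[b][c]: score of class c for bounding box b (CNN branch).
    global_scores[c]:    contextual score of class c for the whole image (FV branch).
    """
    return [[s_bc * s_c for s_bc, s_c in zip(row, global_scores)]
            for row in region_scores]


# Hypothetical scores for 2 boxes and 2 classes.
region_scores = [[0.5, 2.0], [1.0, 0.0]]
global_scores = [2.0, 0.5]
fused = late_fusion(region_scores, global_scores)

# Pick the (box, class) pair with the highest fused score.
best_box, best_class = max(
    ((b, c) for b in range(len(fused)) for c in range(len(global_scores))),
    key=lambda bc: fused[bc[0]][bc[1]],
)
```

Here box 0 has the strongest raw region score (class 1), but the whole-image context favors class 0, so the fused winner is box 1 with class 0.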
Validation dataset

Method                                                               Localization error   Classification error
R-CNN feature + one-vs-all SVMs                                      0.631743             0.460080
R-CNN feature + multi-class PA                                       0.446121             0.285720
R-CNN feature + multi-class PA using hard negative classes           0.387516             0.227200
R-CNN feature + multi-class PA using hard negative classes, and FV   0.341743             0.18768

Test dataset

Team name        Localization error   Classification error
VGG              0.253231             0.07405
GoogLeNet        0.264414             0.14828
SYSU_Vision      0.31899              0.14446
MIL (our team)   0.337414             0.20734
- R-CNN based region proposals and features, with multi-class object detectors that create a hard negative class for each positive class
- Global features (FVs) with multi-class online learning
- Late fusion of region score and global score
- … performance.
- … the-rest SVMs.