Segments, Residuals and Embeddings for Few-Example Video Event Detection
Dennis Koelma and Cees Snoek, University of Amsterdam, The Netherlands
Pipeline 10Ex 2016
Videos -> Frames (sample 2 / sec) -> CNN Inception (ImageNet Shuffle)
M1: pool5 -> avg pool -> SVM 10Ex
M2: prob -> avg pool -> SVM 10Ex
M3: dense trajectories -> Fisher vector -> SVM 10Ex
M4: mfcc0, mfcc1, mfcc2 -> Fisher vector -> SVM 10Ex
M5: Video Story embedding -> SVM 10Ex
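The "sample 2 / sec" and "avg pool" steps of the pipeline above can be illustrated with a minimal sketch. It assumes each sampled frame has already been turned into a fixed-length feature vector; the function names `sample_times` and `avg_pool` and the toy 3-dimensional features are hypothetical, not taken from the slides.

```python
def sample_times(duration_sec, per_second=2):
    """Timestamps (in seconds) at which frames are sampled."""
    step = 1.0 / per_second
    count = int(duration_sec * per_second)
    return [i * step for i in range(count)]


def avg_pool(frame_features):
    """Average equal-length per-frame vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    n = len(frame_features)
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


# Toy stand-in for per-frame CNN features of a 2-second clip (4 sampled frames).
times = sample_times(2.0)
frames = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [1.0, 4.0, 2.0], [3.0, 4.0, 2.0]]
video_vector = avg_pool(frames)
```

The pooled vector is what a per-event classifier would be trained on; the classifier itself is left out of this sketch.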
Videos -> Frames (sample 2 / sec) -> ResNet + ResNeXt (ImageNet Shuffle)
M1: pool5 -> difference coding -> SVM 10Ex
M2: sliding window -> avg pool -> SVM 10Ex
M3: dense trajectories -> Fisher vector -> SVM 10Ex
M4: mfcc0, mfcc1, mfcc2 -> Fisher vector -> SVM 10Ex
M5: Video Story embedding -> SVM 10Ex
Irrelevant classes: Gametophyte, Siderocyte
Example imbalance: 296 classes with 1 image
Roll
N < 3000 : Bind
N > 2000 : Subsample
N < 200 : Promote
The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection, Pascal Mettes, Dennis Koelma and Cees Snoek, International Conference on Multimedia Retrieval, 2016
Figure: MAP on the 2014 Test Set, flatten-avg vs. flatten-dc, for ResNet, ResNeXt and Fusion (axis range 0.290 to 0.350).
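MAP in these plots stands for mean average precision. As a reminder of what the number measures, here is a minimal sketch of non-interpolated average precision over a ranked list, averaged over events. It assumes every positive video appears somewhere in the ranked list; the function names and the two toy rankings are hypothetical, and the official evaluation tooling may compute the score differently.

```python
def average_precision(ranked_labels):
    """Non-interpolated AP for a ranked list of 0/1 relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-event AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two hypothetical events, each with a ranked list of test videos.
event_a = [1, 0, 1, 0]  # AP = (1/1 + 2/3) / 2
event_b = [0, 1, 0, 0]  # AP = 1/2
map_score = mean_average_precision([event_a, event_b])
```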
Bike Motorcycle Stunt
Videostory: A new multimedia embedding for few-example recognition and translation of events, Amirhossein Habibian, Thomas Mensink and Cees Snoek, Proceedings of the ACM International Conference on Multimedia, 2014
Figure: MAP on the 2014 Test Set, flatten-avg vs. video story, for ResNet, ResNeXt and Fusion (axis range 0.300 to 0.335).
Attempting a bike trick
0.45 bike, 0.30 man
1.0 attempt, 1.0 bike, 1.0 trick
Cosine similarity
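The cosine similarity between the two term-weight lists on this slide can be worked through with a short sketch. It assumes the lists are treated as sparse vectors over a shared vocabulary and that only the terms visible on the slide are present; the names `cosine_similarity`, `video_terms` and `event_terms` are hypothetical labels for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Term weights as shown on the slide.
video_terms = {"bike": 0.45, "man": 0.30}
event_terms = {"attempt": 1.0, "bike": 1.0, "trick": 1.0}
score = cosine_similarity(video_terms, event_terms)
```

Only "bike" is shared, so the dot product is 0.45 and the score comes out at roughly 0.48 under these assumptions.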
Window Example1 Example1_1 Example1_2 Example1_3
Cosine similarity
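The Window slide above shows one example being split into Example1_1, Example1_2 and Example1_3. A minimal sketch of that idea follows, assuming each window is average-pooled over its frames. The window length, the stride and the helper names are assumptions, and the slides do not say how the cosine similarity enters this step, so it is left out.

```python
def sliding_windows(frame_features, window, stride):
    """Yield (start_index, avg-pooled vector) for each full window."""
    dim = len(frame_features[0])
    for start in range(0, len(frame_features) - window + 1, stride):
        chunk = frame_features[start:start + window]
        pooled = [sum(f[d] for f in chunk) / window for d in range(dim)]
        yield start, pooled


def expand_example(name, frame_features, window=2, stride=2):
    """Name each window after its parent example: Example1_1, Example1_2, ..."""
    return {
        f"{name}_{i}": pooled
        for i, (_, pooled) in enumerate(
            sliding_windows(frame_features, window, stride), start=1)
    }


# Six toy 2-dimensional frame vectors give three non-overlapping windows.
frames = [[0.0, 2.0], [2.0, 2.0], [4.0, 0.0], [6.0, 0.0], [8.0, 4.0], [8.0, 6.0]]
segments = expand_example("Example1", frames)
```

With stride equal to the window length the segments do not overlap; a smaller stride would give overlapping windows.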
Figure: MAP on the 2014 Test Set, flatten-avg vs. flatten-window, for ResNet, ResNeXt and Fusion (axis range 0.295 to 0.345).
Figure: flatten-avg, softmax, trajectories, mfcc, video story, flatten-dc and flatten-window, for ResNet, ResNeXt and Fusion (axis range 0.05 to 0.4).
ResNet < ResNeXt < Fusion
DC (difference coding) is best
VS (Video Story) > flatten-window > flatten-avg
Figure: VS, DC, Win, DC-VS, DC-Win and VS-DC-Win, for ResNet + ResNeXt (axis range 0.315 to 0.360).
Figure: feature combinations for ResNeXt and ResNet+ResNeXt (axis range 0.315 to 0.360):
AVG2-DT-MFCC-VS: last year
DC: new features, single mod
VS-DC-Win: visual fusion
VS-DC-Win-DT-MFCC: MM fusion
VS-DC-Win-DT-MFCC-AVG2: + avg
Figure: Feature Extraction (axis range 50 to 250) and MAP (axis range 33.8 to 35.8) for the runs p-visualFusionTwoCNN, c-mmFusionTwoCNN, c-visualFusionOneCNN, c-mmFusionOneCNN and c-visualSingle.
Figure: Classification (axis range 0.02 to 0.12), Test 2014, PS and AH, for the same five runs.
Figure: PS (axis range 5 to 50) and AH (axis range 10 to 80) for MediaMill, TokyoTech, ITICERTH and INF.