TRECVID 2014 TokyoTech-Waseda

Semantic Indexing Using Deep CNNs and GMM Supervectors
Nakamasa Inoue and Koichi Shinoda (Tokyo Institute of Technology)
Zhang Xuefeng and Kazuya Ueki (Waseda University)

Outline
! Part 1: Our system at TRECVID 2014
- Deep CNNs + GMM supervectors
- n-gram models for re-scoring
Best result: Mean InfAP = 0.281 (run TokyoTech-Waseda_1)
! Part 2: Motion features & Future work
System Overview
! Deep CNN + GMM Supervectors
[System diagram: each video shot is processed along two paths, (1) audio/visual low-level features encoded as GMM supervectors and classified by SVMs, and (2) a Deep CNN feature classified by an SVM; the scores are then combined by fusion and re-scoring]
Deep CNN
! A 4096-dimensional feature vector is extracted at the sixth layer
! A model pre-trained on ImageNet 2012 is used [1]

[1] Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," Proc. ACM Multimedia Open Source Software Competition, 2014.
GMM Supervectors
! Extend BoW to a probabilistic framework
1) Extract 6 types of visual/audio features: Har-SIFT, Hes-SIFT, Dense HOG, Dense LBP, Dense SIFTH, and MFCC
2) Estimate GMM parameters for each shot
3) Combine normalized mean vectors into a GMM supervector
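The three steps above can be sketched with scikit-learn's `GaussianMixture`. This is a minimal illustration only: the actual system estimates shot GMMs differently (typically by adaptation rather than fitting a small GMM per shot from scratch), and the normalization here (scaling each mean by the square roots of its mixture weight and variance) is the common supervector convention, assumed rather than taken from the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(features, n_components=4, seed=0):
    """Fit a diagonal-covariance GMM to a shot's local descriptors and
    stack its normalized mean vectors into a single supervector."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed).fit(features)
    # Normalize each mean by sqrt(weight) and per-dimension std dev,
    # then concatenate the component means into one long vector.
    parts = [np.sqrt(w) * mu / np.sqrt(var)
             for w, mu, var in zip(gmm.weights_, gmm.means_, gmm.covariances_)]
    return np.concatenate(parts)

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))  # stand-in for SIFT-like local features
sv = gmm_supervector(descriptors)
print(sv.shape)  # (32,) = 4 components x 8 dims
```

The supervector dimension is (number of mixture components) x (descriptor dimension), which is why these vectors get large for real descriptor sizes.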
Shot Scores
! Linear combination of SVM scores:
  Score(shot) = Σ_F w_F · Score_F(shot)
  where F is a feature type and w_F is its fusion weight.
[Illustration: per-shot scores for shots 1 to 5]
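The weighted fusion above can be sketched in a few lines. All scores and weights below are illustrative placeholders (the submitted runs tuned the weights; uniform weights are used here only for the example):

```python
# Late fusion: the final shot score is a weighted sum of per-feature
# SVM scores. The numeric values are made up for illustration.
svm_scores = {"Har-SIFT": 0.3, "Hes-SIFT": 0.1, "Dense HOG": 0.5,
              "Dense LBP": 0.2, "Dense SIFTH": 0.4, "MFCC": -0.1,
              "Deep CNN": 0.8}
weights = {f: 1.0 / len(svm_scores) for f in svm_scores}  # uniform example

shot_score = sum(weights[f] * s for f, s in svm_scores.items())
print(shot_score)
```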
n-Gram Models
! n consecutive video shots are dependent
! Bigram (n=2): the score of shot i is re-scored using shot i-1
  (labels take values +1 or -1)

- N. Inoue and K. Shinoda, "n-gram models for video semantic indexing," Proc. ACM Multimedia, 2014.
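As a rough illustration of the idea (not the exact probabilistic model from the paper, which couples shot labels), each shot's score can be blended with its predecessor's, so shots next to high-scoring shots get a boost. The blending weight `alpha` is an assumption for the example:

```python
# Bigram (n=2) re-scoring sketch: consecutive shots are dependent, so a
# shot adjacent to a high-scoring shot is boosted. alpha is illustrative.
def bigram_rescore(scores, alpha=0.3):
    rescored = list(scores)
    for i in range(1, len(scores)):
        rescored[i] = scores[i] + alpha * scores[i - 1]
    return rescored

rescored = bigram_rescore([0.1, 0.9, 0.2])
print(rescored)
```

Here the shot following the 0.9-scoring shot rises from 0.2 to about 0.47, reflecting the temporal dependence.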
A Full-Gram Model
! All shots in a video clip are treated as dependent
! Full-gram: we simply add the maximum shot score in the video clip to each shot's score
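The full-gram rule above is a one-liner: take the clip-level maximum and add it to every shot in the clip. A minimal sketch, assuming unweighted addition as "simply add" suggests:

```python
# Full-gram re-scoring sketch: every shot in a clip depends on all others,
# so the maximum shot score in the clip is added to each shot's score.
def fullgram_rescore(scores):
    m = max(scores)
    return [s + m for s in scores]

rescored = fullgram_rescore([0.1, 0.9, 0.2])
print(rescored)
```

Shots in clips that contain at least one strong detection are all promoted, which matches the intuition that a concept visible in one shot often persists across the clip.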
Results

Run ID               Method                                              Mean InfAP
TokyoTech-Waseda_4   Baseline: GMM supervectors + full-gram re-scoring   0.260
TokyoTech-Waseda_3   + sampling                                          0.262
TokyoTech-Waseda_2   + Deep CNN                                          0.280
TokyoTech-Waseda_1   + Deep CNN (optimized weights)                      0.281
InfAP by Semantic Concepts
[Bar chart: InfAP per semantic concept for run TokyoTech-Waseda_1; per-concept values not recoverable]
Evaluation of n-Gram Models
! Mean AP on SIN 2012

Method           Mean AP (SIN 2012)
Baseline         0.306
Bi-gram (n=2)    0.312
Tri-gram (n=3)   0.312
Full-gram        0.321
Conclusion (Part 1)
! Deep CNN + GMM Supervector
! n-gram models for re-scoring
! Experimental Results
- Mean InfAP: 0.281
! Future work
- Improving audio analysis
- Introducing motion features for object tracking with deep CNNs
Motion features
! Our baseline system did not include any motion information
- 5 visual features (Har-SIFT, Hes-SIFT, Dense HOG, Dense LBP, and Dense SIFTH) + 1 audio feature
! We tried to introduce Dense trajectories into our system
- Probably effective for some actions/movements, e.g., "Running", "Swimming", and "Throwing"
- But unfortunately, we could not finish before the submission deadline
Dense trajectories
! 4 types of features were extracted from each shot
- Trajectory (a sequence of displacement vectors)
- HOG (Histogram of Oriented Gradients)
- HOF (Histogram of Optical Flow)
- MBH (Motion Boundary Histogram)
Dense trajectories
! Settings
- Use every other frame
- Trajectory length L = 15
  (More than 30 frames are needed to extract features, but about 40% of shots have fewer than 30 frames!)
- The trajectory volume is subdivided into a spatio-temporal grid of size 2 x 2 x 3
- Orientations are quantized into 8 (or 9) bins
- Descriptor dimensions: HOG 96, HOF 108, MBH 108 x 2; PCA reduces HOG and HOF to 32 dims each and MBH to 64 dims
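The descriptor sizes follow directly from the grid and bin counts. A quick check, assuming 9 bins for HOF and MBH (consistent with the slide's "8 (or 9) bins" and the listed 108 and 108 x 2 sizes):

```python
# Spatio-temporal grid: 2 x 2 spatial cells x 3 temporal cells = 12 cells.
cells = 2 * 2 * 3

hog = cells * 8      # 8 orientation bins per cell  -> 96 dims
hof = cells * 9      # 9 bins per cell              -> 108 dims
mbh = 2 * cells * 9  # x and y flow components, 9 bins each -> 216 = 108 x 2

print(hog, hof, mbh)  # 96 108 216
```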
Dense trajectories
[Pipeline diagram: from each video shot, the n extracted trajectories yield Trajectory, HOG, HOF, and MBH descriptors; each descriptor type is encoded as a GMM supervector and scored by its own SVM]
Performance of dense trajectories
! Mean AP on SIN 2010

Method                  Mean AP (%)
Baseline (6 features)   14.07
Trajectory              1.28
HOG                     8.30
HOF                     4.79
MBH                     7.14
Complementarity
! Late fusion of Dense HOG with HOG on trajectories (Mean AP (%) on SIN 2010):

Method                  Mean AP (%)
Dense HOG               9.82
HOG (on trajectories)   8.30
Late fusion             10.90

! We have not tried fusion weight optimization, but Dense HOG and HOG on trajectories are not very complementary.
Complementarity
! HOF and MBH are different from the other features.
! Finally, we could slightly improve mean AP by combining MBH with our baseline method (Mean AP (%) on SIN 2010):

Method                           Mean AP (%)
Baseline (6 features)            14.07
MBH (on trajectories)            7.14
Late fusion (6 features + MBH)   14.29

(*) no fusion weight optimization
Future work
! Adapt velocity pyramid to dense SIFT/HOG/LBP
! Motion features with deep CNN