Semantic Indexing Using GMM Supervectors and Tree-structured GMMs

  1. TRECVID 2011 TokyoTech+Canon Semantic Indexing Using GMM Supervectors and Tree-structured GMMs Nakamasa Inoue, Koichi Shinoda, Department of Computer Science, Tokyo Institute of Technology

  2. Outline
     - System overview
     - Fast and high-performance semantic indexing system
       - 6 types of audio and visual features
       - Gaussian mixture model (GMM) supervectors
       - Tree-structured GMMs
     - Best result: Mean InfAP = 17.3%

  3. System Overview
     - Fast and high-performance semantic indexing system
     [Figure: a video shot is described by six features, 1) SIFT-Har, 2) SIFT-Hes, 3) SIFTH-Dense, 4) HOG-Dense, 5) HOG-Sub, 6) MFCC; each feature passes through tree-structured GMMs to a GMM supervector and an SVM score, and the SVM scores are combined by score fusion.]

  5. Local Feature Extraction
     1) SIFT-Har
        - Harris-affine detector: an extension of the Harris corner detector [Mikolajczyk, 2004]
        - Multi-frame (every other frame)
     2) SIFT-Hes
        - Hessian-affine detector
        - Multi-frame (every other frame)

     Feature type | Avg. #features per frame | Avg. #features per shot
     SIFT-Har     | 247                      | 19,536
     SIFT-Hes     | 240                      | 18,986

  6. Local Feature Extraction
     3) SIFTH-Dense
        - SIFT + hue histogram
        - 30,000 samples from a key-frame
     4) HOG-Dense
        - 32-dimensional HOG
        - 10,000 samples from a key-frame
     5) HOG-Sub
        - Dense HOG features extracted from temporal subtraction images
        - Captures movement

  7. Local Feature Extraction
     6) MFCC
        - Mel-frequency cepstral coefficients (MFCC)
        - Audio features used in speech recognition
        - Targets: Speaking, Singing, etc.
     [Figure: feature vector layout: MFCC(12), MFCC(12), MFCC(12), Log-power(1), Log-power(1)]

  8. System Overview (pipeline diagram repeated): video (shot), six features, tree-structured GMMs, GMM supervectors, SVM scores, score fusion.

  9. Gaussian Mixture Models (GMMs)
     - Each shot is modeled by a GMM over its local features, defined by its GMM parameters.
     - The GMM parameters are estimated using fast maximum a posteriori (MAP) adaptation.
     [Figure: UBM* -> fast MAP adaptation -> GMM of the shot]
     *Universal background model (UBM): a prior GMM estimated from all video data.
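The GMM equation itself is not preserved in this transcript. As a point of reference, the standard mixture density is written below; the symbols (x for a local feature, K for the number of components, w_k, mu_k, Sigma_k for the parameters) are assumptions chosen for illustration, not necessarily the slide's notation.

```latex
p(x \mid \theta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\theta = \{ w_k, \mu_k, \Sigma_k \}_{k=1}^{K},
\qquad
\sum_{k=1}^{K} w_k = 1
```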

  10. Gaussian Mixture Models (GMMs)
      - (Basic) MAP adaptation for mean vectors: each mean is updated using the responsibility of each Gaussian component for each local feature.
      - Computational cost: high
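The update formula on this slide is not preserved in the transcript. The following is a minimal sketch of the commonly used relevance-MAP mean update, assuming diagonal covariances; the relevance factor `tau` and all function names are hypothetical and may differ from the authors' formulation. It also shows why the basic version is costly: every feature is scored against every component.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian.

    X: (n, d); means, variances: (K, d). Returns an (n, K) array.
    """
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def responsibilities(X, weights, means, variances):
    """Posterior probability of each component for each feature, shape (n, K)."""
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, tau=20.0):
    """Assumed update: mean_k = (tau * ubm_mean_k + sum_i c_ik x_i) / (tau + sum_i c_ik)."""
    c = responsibilities(X, weights, means, variances)  # (n, K): the costly part
    n_k = c.sum(axis=0)                                 # soft counts, (K,)
    weighted_sum = c.T @ X                              # (K, d)
    return (tau * means + weighted_sum) / (tau + n_k)[:, None]
```

Components that receive almost no responsibility stay close to their UBM means, while well-supported components move toward the data of the shot.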

  11. Gaussian Mixture Models (GMMs)
      - The responsibility of each component for each local feature has to be computed over all Gaussian components.
      - Tree-structured GMMs calculate these responsibilities quickly.

  13. Tree-structured GMMs
      - Calculate responsibilities quickly.
      [Nakamasa Inoue, Koichi Shinoda, “A Fast MAP Adaptation Technique for GMM-supervector-based Video Semantic Indexing Systems,” in Proc. of ACM Multimedia (short paper), 2011]

  14. Tree-structured GMMs
      - Leaf layer: each leaf node has a Gaussian of the UBM (prior GMM).

  15. Tree-structured GMMs
      - Non-leaf layers: each non-leaf node has a Gaussian that approximates its descendant Gaussians.

  19. Fast MAP Adaptation
      - Calculate responsibilities quickly.
      1. Initialize the set of active nodes.
      [Figure legend: active node]

  20. Fast MAP Adaptation
      - Calculate responsibilities quickly.
      2. Make the children of active nodes active.
      3. Keep nodes active if [condition missing in source].
      [Figure legend: active node]

  23. Fast MAP Adaptation: summary of the algorithm
      1. Initialize the set of active nodes to the root node.
      2. Make the children of active nodes active.
      3. Calculate [quantity missing in source] and keep nodes active if [condition missing in source].
      4. Go to 5 if all active nodes are leaves; otherwise return to 2.
      5. Output GMM parameters.

  24. Fast MAP Adaptation: summary of the algorithm (continued)
      5. Output GMM parameters [equation missing in source].
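The steps summarized above can be sketched as a top-down search over the tree for one local feature. The pruning rule is not preserved in this transcript, so the sketch assumes one: a node stays active if its weighted likelihood is at least `threshold` times that of the best candidate at the current step. The `Node` class, `threshold`, and the diagonal-covariance assumption are all hypothetical and are not taken from the authors' paper.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    weight: float
    mean: np.ndarray
    var: np.ndarray                      # diagonal covariance (assumption)
    children: list = field(default_factory=list)
    leaf_index: int = -1                 # index of the UBM component (leaves only)

def log_weighted_density(x, node):
    """log(weight * N(x | mean, diag(var))) for a single feature x."""
    return (np.log(node.weight)
            - 0.5 * np.sum((x - node.mean) ** 2 / node.var
                           + np.log(2.0 * np.pi * node.var)))

def tree_responsibilities(x, root, n_leaves, threshold=1e-3):
    """Approximate responsibilities of the UBM components for x."""
    active = [root]                                      # step 1
    while any(n.children for n in active):               # step 4
        # Step 2: replace every non-leaf active node by its children.
        candidates = []
        for n in active:
            candidates.extend(n.children if n.children else [n])
        # Step 3 (assumed criterion): keep candidates close to the best one.
        scores = np.array([log_weighted_density(x, n) for n in candidates])
        ratio = np.exp(scores - scores.max())            # best candidate has ratio 1
        active = [n for n, r in zip(candidates, ratio) if r >= threshold]
    # Step 5: normalize over the surviving leaves; pruned leaves get zero.
    scores = np.array([log_weighted_density(x, n) for n in active])
    post = np.exp(scores - scores.max())
    post /= post.sum()
    resp = np.zeros(n_leaves)
    for n, p in zip(active, post):
        resp[n.leaf_index] = p
    return resp
```

Because whole subtrees are dropped as soon as their parent is pruned, most leaf Gaussians are never evaluated, which is where the speed-up would come from under these assumptions.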

  25. Fast MAP Adaptation
      - Calculation time for MAP adaptation
        - 4.2 times faster than without tree-structured GMMs
        - No decrease in accuracy
      [Table: Mean InfAP (%) on the TRECVID 2010 dataset; values missing in source]
      Optimized tree: the best tree in terms of calculation time on training data. Trees of depth at most 5 with at most 5 children per node were tested.

  26. GMM Supervector
      - Combine the normalized mean vectors of the adapted GMM into a single supervector.
      [Figure: UBM -> fast MAP adaptation -> GMM -> supervector]
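The normalization formula is not preserved in this transcript. A minimal sketch follows, assuming the normalization often used for GMM supervectors, where each adapted mean is scaled by the square root of its mixture weight and by the inverse square root of its (diagonal) UBM variance. This choice and the function name `gmm_supervector` are assumptions for illustration.

```python
import numpy as np

def gmm_supervector(weights, adapted_means, ubm_variances):
    """Stack normalized means into one vector of length K * d.

    weights: (K,), adapted_means: (K, d), ubm_variances: (K, d), diagonal.
    Assumed normalization: sqrt(w_k) * mean_k / sqrt(var_k), elementwise.
    """
    scale = np.sqrt(weights)[:, None] / np.sqrt(ubm_variances)
    return (scale * adapted_means).reshape(-1)

# Example with K = 2 components and d = 2 dimensions.
sv = gmm_supervector(
    np.array([0.25, 0.75]),
    np.array([[2.0, 4.0], [6.0, 8.0]]),
    np.array([[4.0, 4.0], [1.0, 1.0]]),
)
```

With K components of dimension d, every shot maps to a fixed-length vector of K * d values regardless of how many local features it contains, which is what lets a standard SVM consume it.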

  28. Score Fusion
      - SVMs are trained with RBF kernels.
      - Score fusion: a linear combination of the SVM scores. The combination coefficients are optimized on a validation set (IACC_1_tv10_training for training and IACC_1_A for validation).
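A minimal sketch of linear score fusion with coefficients chosen on a validation set. The slide does not say how the coefficients were optimized or give the fusion formula, so this sketch assumes a coarse exhaustive grid search, coefficients normalized to sum to 1, and plain average precision as the objective (not the inferred AP used by TRECVID). All function names are hypothetical.

```python
import itertools
import numpy as np

def average_precision(scores, labels):
    """Plain (non-inferred) average precision of a ranking; labels are 0/1."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / hits.sum())

def fuse(score_matrix, coeffs):
    """Linear combination of per-feature SVM scores; score_matrix is (n_shots, n_features)."""
    return score_matrix @ coeffs

def fit_fusion_coeffs(val_scores, val_labels, steps=4):
    """Grid search over coefficients; only practical for a handful of features."""
    grid = np.linspace(0.0, 1.0, steps + 1)
    best_ap, best = -1.0, None
    for combo in itertools.product(grid, repeat=val_scores.shape[1]):
        total = sum(combo)
        if total == 0:
            continue  # skip the all-zero combination
        coeffs = np.array(combo) / total
        ap = average_precision(fuse(val_scores, coeffs), val_labels)
        if ap > best_ap:
            best_ap, best = ap, coeffs
    return best, best_ap
```

The coefficients would be fitted once per semantic concept on validation scores and then applied unchanged to the test scores with `fuse`.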

  29. Experimental Condition
      - TokyoTech_Canon_1: 6 features, 3 parameters for the RBF kernel (18 SVMs for one semantic concept)
      - TokyoTech_Canon_2: 6 features, the parameter h is fixed to 1.0 (6 SVMs for one semantic concept)
