Waseda_Meisei at TRECVID 2017 Ad-hoc Video Search(AVS) Kazuya UEKI - PowerPoint PPT Presentation

Waseda_Meisei at TRECVID 2017 Ad-hoc Video Search (AVS)
Kazuya UEKI, Koji HIRAKAWA, Kotaro KIKUCHI, Tetsuji OGAWA, Tetsunori KOBAYASHI
Waseda University, Meisei University

Highlights
- AVS task objective: to return, for each query, a list of at most 1000 shot IDs ranked by their likelihood of matching the query.
- Our system: based on a large semantic concept bank (more than 50,000 concepts).
- This is our first submission of a fully automatic run.
  Problem: word ambiguity in the concept selection step.
  We proposed WordNet-based and Word2Vec-based methods; the WordNet-based method outperformed the Word2Vec-based one.

1. System outline

1. System outline
Diagram (components are marked either as new or as the same as in our 2016 system):
- Query → Keyword 1, Keyword 2, ..., Keyword N.
- For each keyword i, concepts i,1 ... i,M_i are chosen from the concept bank (more than 50K concepts).
- Score calculation: the CNN/SVM model of each concept gives a score (Score i,j for concept i,j) to each video shot.
- Score fusion: the concept scores are fused into one score per shot.
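The final step of the pipeline turns fused shot scores into the ranked list of at most 1000 shot IDs required by the task. The sketch below assumes the fused scores are already available as a dictionary keyed by shot ID; the function name `rank_shots` and the shot IDs are hypothetical.

```python
import heapq
from typing import Dict, List


def rank_shots(fused_scores: Dict[str, float], max_results: int = 1000) -> List[str]:
    """Return at most `max_results` shot IDs, ordered from highest to lowest fused score."""
    # heapq.nlargest keeps only the best candidates instead of sorting every shot.
    best = heapq.nlargest(max_results, fused_scores.items(), key=lambda item: item[1])
    return [shot_id for shot_id, _score in best]


if __name__ == "__main__":
    # Hypothetical shot IDs and fused scores, for illustration only.
    scores = {"shot1_1": 0.2, "shot1_2": 0.9, "shot2_5": 0.5}
    print(rank_shots(scores, max_results=2))  # ['shot1_2', 'shot2_5']
```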

Concept bank

| Training dataset | Type | #Concepts / data | Network | Model |
|---|---|---|---|---|
| TRECVID346 (ImageNet) | Object, Scene, Action | 346 concepts | GoogLeNet | CNN/SVM tandem |
| PLACES205 | Scene | 205 concepts, 2500K pictures | AlexNet | CNN |
| PLACES365 | Scene | 365 concepts, 1800K pictures | GoogLeNet | CNN |
| Hybrid1183 (Places+ImageNet) | Object, Scene | 1183 concepts, 3600K pictures | AlexNet | CNN |
| ImageNet1000 | Object | 1000 concepts, 1200K pictures | AlexNet | CNN |
| ImageNet4000, 4437, 8201, 12988 | Object | 4000, 4437, 8201, 12988 concepts | GoogLeNet | CNN |
| ImageNet21841 | Object | 21841 concepts, 14200K pictures | GoogLeNet | CNN |
| FCVID239 (ImageNet) | Object, Scene, Action | 239 concepts, 91223 movies | GoogLeNet | CNN/SVM tandem |
| UCF101 (ImageNet) | Action | 101 concepts, 13320 movies | GoogLeNet | CNN/SVM tandem |
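As a quick check of the "more than 50,000 concepts" figure, the per-dataset counts in the table can be added up. This simply sums the numbers listed; it counts concept models, not distinct meanings, since the ImageNet subsets very likely overlap.

```python
# Concept counts per training dataset, copied from the concept bank table.
CONCEPT_COUNTS = {
    "TRECVID346": 346,
    "PLACES205": 205,
    "PLACES365": 365,
    "Hybrid1183": 1183,
    "ImageNet1000": 1000,
    "ImageNet4000": 4000,
    "ImageNet4437": 4437,
    "ImageNet8201": 8201,
    "ImageNet12988": 12988,
    "ImageNet21841": 21841,
    "FCVID239": 239,
    "UCF101": 101,
}

# Total number of concept models (overlapping concepts are counted once per dataset).
total = sum(CONCEPT_COUNTS.values())
print(total)  # 54906
```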

2. Detail of concept selection

2. Detail: Step 1 Extract keywords
Search keywords are extracted from the query.
Query: "One or more people at train station platform"
Keywords: "people", "train", "station", "platform", plus the collocation "train_station_platform". Some parts of the query yield no keyword (N/A).
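A minimal sketch of this step, assuming a hypothetical stop-word list and a hypothetical vocabulary of multi-word concept names joined with underscores; the slides do not describe the actual parser, so this only reproduces the example's output, not the authors' method.

```python
import re
from typing import List, Set

# Hypothetical stop-word list; the slides do not say which words are filtered out.
STOP_WORDS = {"a", "an", "the", "at", "of", "in", "one", "or", "more", "find", "shots"}


def extract_keywords(query: str, collocations: Set[str], max_n: int = 3) -> List[str]:
    """Return single-word keywords followed by any known multi-word collocations."""
    tokens = re.findall(r"[a-z]+", query.lower())
    keywords = [t for t in tokens if t not in STOP_WORDS]
    # Check n-grams of the original token sequence (longest first) against the
    # collocation vocabulary, e.g. "train station platform" -> "train_station_platform".
    for n in range(max_n, 1, -1):
        for i in range(len(tokens) - n + 1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in collocations and candidate not in keywords:
                keywords.append(candidate)
    return keywords


if __name__ == "__main__":
    vocab = {"train_station_platform", "sign_language"}  # hypothetical collocation list
    print(extract_keywords("One or more people at train station platform", vocab))
    # ['people', 'train', 'station', 'platform', 'train_station_platform']
```

Matching n-grams on the original token sequence, rather than on the filtered one, keeps the sketch from gluing together words that were separated by a stop word.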

2. Detail: Step 2 Choose concepts for each keyword
Each keyword i extracted from the query (e.g., "One or more people at train station platform") must be mapped to indices of the concept bank (Index 1: model of concept 1, Index 2: model of concept 2, ..., Index N: model of concept N).
Problem: the wording of a keyword is not always the same as that of the index word (e.g., the keyword "aircraft" vs. the index "airplane"). Which concept should be used for the keyword?

2. Detail: Step 2 Choose concepts for each keyword
• Manual runs
  – The concept for each keyword is selected manually.
• Automatic runs
  – WordNet-based method: exact match of synsets.
  – Word2Vec-based method: similarity of skip-gram embeddings.
  – Hybrid of WordNet and Word2Vec.

2. Detail: Step 2 Choose concepts for each keyword
Automatic approach #1: WordNet synset matching
WordNet: Word → Lexeme → Synset. Each word has a set of lexemes, and lexemes that share the same meaning form a synset.
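The word → lexeme → synset relation can be sketched with plain Python data structures. The entries below are a hand-written toy example, not data taken from WordNet, and modelling a lexeme as a `(word, sense)` pair is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A lexeme is modelled as a (word, sense) pair: one spelling used with one meaning.
# These entries are a hand-written toy example, not data taken from WordNet.
LEXEMES: List[Tuple[str, str]] = [
    ("plane", "sense_aircraft"),
    ("airplane", "sense_aircraft"),
    ("aeroplane", "sense_aircraft"),
    ("plane", "sense_flat_surface"),
    ("scarf", "sense_garment"),
    ("scarf", "sense_wood_joint"),
]


def build_synsets(lexemes: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group lexemes that share a meaning into synsets (sense -> set of words)."""
    synsets: Dict[str, Set[str]] = defaultdict(set)
    for word, sense in lexemes:
        synsets[sense].add(word)
    return dict(synsets)


def senses_of(word: str, lexemes: List[Tuple[str, str]]) -> Set[str]:
    """Collect the synsets reached through all lexemes of a word."""
    return {sense for w, sense in lexemes if w == word}
```

In this toy data, "plane" and "scarf" each belong to two synsets. That is the kind of word ambiguity the concept selection step has to cope with.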

2. Detail: Step 2 Choose concepts for each keyword
Automatic approach #1: WordNet synset matching
The synset of keyword i (from a query such as "One or more people at train station platform") is compared with the synset of each concept index in the concept bank (Index 1 ... Index N). Concepts whose synset exactly matches the keyword's synset are selected; the others are not.
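Exact synset matching reduces to a membership test once every concept index carries a synset ID. The sketch assumes a keyword's candidate synsets are all kept, which the slides do not specify. The concept bank mapping and synset IDs are hypothetical.

```python
from typing import Dict, List, Set


def select_by_synset(keyword_synsets: Set[str], concept_synsets: Dict[str, str]) -> List[str]:
    """Select concept indices whose synset exactly matches one of the keyword's synsets."""
    return [concept for concept, synset in concept_synsets.items() if synset in keyword_synsets]


if __name__ == "__main__":
    # Hypothetical concept bank: concept index -> synset ID.
    bank = {"scarf": "syn_scarf_garment", "scarf_joint": "syn_scarf_joint", "train": "syn_train"}
    # An ambiguous keyword maps to several synsets, so every matching sense is selected.
    print(select_by_synset({"syn_scarf_garment", "syn_scarf_joint"}, bank))
    # ['scarf', 'scarf_joint']
```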

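2. Detail: Step 2 Choose concepts for each keyword
Automatic approach #2: Word2Vec similarity
Word2Vec (skip-gram): each word w_i is mapped to an embedding w'_i that is trained to predict the surrounding words w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}. Two words w_j and w_k are then compared by the similarity of their embeddings w'_j and w'_k.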
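For reference, the standard skip-gram formulation with the window of ±2 words drawn in the diagram is written below. Here $\mathbf{w}'_i$ is the embedding of $w_i$, $\mathbf{u}_o$ is an output vector, and $V$ is the vocabulary size. In practice the softmax denominator is usually approximated, for example by negative sampling. The slides only say "similarity", so cosine similarity is an assumption, although it is the usual choice for Word2Vec embeddings.

```latex
\max \; \frac{1}{T}\sum_{i=1}^{T} \sum_{\substack{-2 \le j \le 2 \\ j \neq 0}} \log p(w_{i+j} \mid w_i),
\qquad
p(w_o \mid w_i) = \frac{\exp\!\left(\mathbf{u}_{o}^{\top}\mathbf{w}'_i\right)}{\sum_{v=1}^{V}\exp\!\left(\mathbf{u}_{v}^{\top}\mathbf{w}'_i\right)}

\operatorname{sim}(w_j, w_k) = \frac{\mathbf{w}'^{\top}_j\,\mathbf{w}'_k}{\lVert \mathbf{w}'_j \rVert \, \lVert \mathbf{w}'_k \rVert}
```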

2. Detail: Step 2 Choose concepts for each keyword
Automatic approach #2: Word2Vec similarity
The vector representation (Word2Vec embedding) of keyword i is compared with the vector representation of each concept index (Index 1 ... Index N). Concepts whose vectors are similar to the keyword's vector are selected; dissimilar ones are not.
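A sketch of similarity-based selection, assuming cosine similarity and a fixed threshold. The slides give neither the similarity measure nor the cut-off, so `threshold=0.6` and the 2-D toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_by_similarity(
    keyword_vec: Sequence[float],
    concept_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.6,
) -> List[str]:
    """Return concepts whose embedding is similar enough to the keyword's, best first."""
    scored = [(name, cosine(keyword_vec, vec)) for name, vec in concept_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, sim in scored if sim >= threshold]


if __name__ == "__main__":
    # Toy 2-D "embeddings" for a keyword such as "aircraft"; real ones are much larger.
    bank = {"airplane": [0.9, 0.2], "helicopter": [0.7, 0.5], "train": [0.0, 1.0]}
    print(select_by_similarity([1.0, 0.1], bank))  # ['airplane', 'helicopter']
```

A loose threshold lets related but different concepts through, as with "helicopter" here.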

2. Detail: Step 2 Choose concepts for each keyword
Automatic approach #3: Hybrid
Hybrid method: apply the WordNet-based method first. If it fails (i.e., the WordNet-based method finds no concepts), apply the Word2Vec-based method instead.
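The fallback rule can be written as a small higher-order function. The two selectors are passed in as callables; the toy selectors in the usage lines are hypothetical stand-ins for the WordNet-based and Word2Vec-based methods.

```python
from typing import Callable, List

Selector = Callable[[str], List[str]]


def hybrid_select(keyword: str, wordnet_select: Selector, word2vec_select: Selector) -> List[str]:
    """Apply the WordNet-based selector first; fall back to Word2Vec only if it finds nothing."""
    concepts = wordnet_select(keyword)
    if concepts:
        return concepts
    return word2vec_select(keyword)


if __name__ == "__main__":
    # Hypothetical selectors standing in for the two methods.
    wordnet = lambda k: {"scarf": ["scarf"]}.get(k, [])
    word2vec = lambda k: ["w2v:" + k]
    print(hybrid_select("scarf", wordnet, word2vec))          # ['scarf']
    print(hybrid_select("sign_language", wordnet, word2vec))  # ['w2v:sign_language']
```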

2. Detail: Step 2 Choose concepts for each keyword
Expected coverage, compared with the desired (ideal) concept set:
- The Word2Vec-based approach tends to select too many concepts.
- The WordNet-based approach tends to miss some concepts.

2. Detail: Step 3 Calculate score
- TRECVID346, FCVID239, UCF101: CNN/SVM tandem connectionist architecture.
At most 10 images (1st frame, 2nd frame, ..., 10th frame) are taken from each shot and fed to the CNN. The hidden-layer feature vectors of these frames are max-pooled into one vector, and the SVM of each concept turns that vector into the shot score.
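A minimal sketch of the tandem scoring, as read from the diagram: element-wise max pooling over at most 10 frames, then an SVM decision value. The linear kernel, the weights and bias, and taking the first 10 frames are all assumptions made for illustration; the slides do not state how frames are sampled or which kernel is used.

```python
from typing import List, Sequence


def max_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over the per-frame feature vectors of one shot."""
    if not frame_features:
        raise ValueError("a shot needs at least one frame")
    return [max(values) for values in zip(*frame_features)]


def linear_svm_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Decision value w.x + b of a (hypothetical) linear SVM for one concept."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def tandem_shot_score(frame_features, weights, bias, max_frames: int = 10) -> float:
    """CNN hidden-layer features of up to `max_frames` frames -> max pooling -> SVM score."""
    pooled = max_pool(list(frame_features)[:max_frames])
    return linear_svm_score(pooled, weights, bias)
```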

2. Detail: Step 3 Calculate score
- PLACES205, PLACES365, HYBRID1183, IMAGENET1000, IMAGENET4000, IMAGENET4437, IMAGENET8201, IMAGENET12988, IMAGENET21841: CNN only.
The shot scores were obtained directly from the output layer (before softmax was applied). At most 10 images (1st frame, 2nd frame, ..., 10th frame) of each shot are fed to the CNN, and the output-layer values are max-pooled across frames to give the score of each concept.
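For the CNN-only models the score is simply the per-concept maximum of the pre-softmax outputs over the frames of a shot. The sketch assumes these values are already available as one list per frame. One property of using raw outputs is that they do not depend on the other classes, whereas a softmax renormalizes every score against all concepts in the same frame.

```python
from typing import List, Sequence


def shot_concept_scores(frame_logits: Sequence[Sequence[float]], max_frames: int = 10) -> List[float]:
    """Per-concept shot score: max over frames of the pre-softmax output-layer values."""
    frames = list(frame_logits)[:max_frames]
    if not frames:
        raise ValueError("a shot needs at least one frame")
    # zip(*frames) yields, for each concept, its values across all frames.
    return [max(column) for column in zip(*frames)]
```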

3. Results

3. Results (Manual runs)
Comparison of Waseda_Meisei manual runs

| Name | Fusion method | Fusion weight | mAP (%) |
|---|---|---|---|
| Manual-1 | Multiply (log) | w/ weight | 21.6 |
| Manual-2 | Multiply (log) | w/o weight | 20.4 |
| Manual-3 | Sum (linear) | w/ weight | 20.7 |
| Manual-4 | Sum (linear) | w/o weight | 18.9 |

Fusion method: Multiply (log) > Sum (linear)
Fusion weight: w/ weight > w/o weight
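The two fusion methods can be sketched as follows. This assumes the scores being fused are normalized into (0, 1] so that the logarithm is defined; the slides do not say how scores were normalized or how the weights were chosen, so the weights here are hypothetical.

```python
import math
from typing import Optional, Sequence


def fuse_sum(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Sum (linear) fusion: weighted sum of the scores."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))


def fuse_log(scores: Sequence[float], weights: Optional[Sequence[float]] = None,
             eps: float = 1e-12) -> float:
    """Multiply (log) fusion: weighted sum of log scores, i.e. the log of a weighted product.

    Assumes scores are already normalized into (0, 1]; eps guards against log(0).
    """
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * math.log(max(s, eps)) for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Hypothetical normalized scores of two concepts for two shots.
    shot_a, shot_b = [0.9, 0.01], [0.4, 0.4]
    print(fuse_sum(shot_a) > fuse_sum(shot_b))  # True: sum favours shot A
    print(fuse_log(shot_a) > fuse_log(shot_b))  # False: product favours shot B
```

With the product, a shot must score reasonably on every concept of the query, while the sum lets one strong concept compensate for a missing one.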

3. Results (Manual runs)
[Figure: comparison of the Waseda_Meisei runs (Manual-1 to Manual-4 highlighted) with the runs of other teams, for all submitted manually assisted runs.]

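3. Results (Automatic runs)
Comparison of Waseda_Meisei automatic runs

| Name | Concept selection | mAP (%) |
|---|---|---|
| Auto-1 | WordNet synset | 15.9 |
| Auto-2 | Word2Vec | 14.3 |
| Auto-3 | Word2Vec (richer concept bank including FCVID239 + UCF101) | 14.1 |
| Auto-4 | WordNet + Word2Vec hybrid (with a bug) | 12.5 |

WordNet vs. Word2Vec: WordNet > Word2Vec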

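3. Results (Automatic runs on the TRECVID 2016 dataset)
Results for the 2016 TRECVID dataset, with the same run configurations as the 2017 automatic runs

| Name | mAP (%) |
|---|---|
| Auto-1 | 17.8 |
| Auto-2 | 17.4 |
| Auto-3 | 17.4 |
| Auto-4 | 17.8 |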

3. Results (Automatic runs)
Auto-1: WordNet synset
Auto-2: Word2Vec
Auto-3: Word2Vec (richer concept bank including FCVID239 + UCF101)
Auto-4: WordNet + Word2Vec hybrid (with a bug)
[Figure: comparison of the Waseda_Meisei runs with the runs of other teams, for all fully automatic runs.]

3. Results: Difference between our automatic and manual runs
[Figure: per-query results (0.0 to 1.0) of our automatic and manual runs; queries 534, 542, and 559 are highlighted.]
534 Find shots of a person talking behind a podium wearing a suit outdoors during daytime → the concept "Speaker_At_Podium" is used in the manual run.
542 Find shots of at least two planes both visible → an object counting module is used in the manual condition.
559 Find shots of a man and woman inside a car → "car_interior" is used and "car" is not used in the manual run.
(All three are parsing (linguistic) problems.)

3. Results: Difference between our automatic run and the top run
[Figure: per-query results (0.0 to 1.0) of our automatic run and the top run; queries 543, 548, 554, and 558 are highlighted.]
543 Find shots of a person communicating using sign language → there is no concept for "sign language". (Lack of concepts)
554 Find shots of a person holding or operating a TV or movie camera → "TV" contaminated the selected concepts. (Parsing problem)
558 Find shots of a person wearing a scarf → "scarf_joint" contaminated the selected concepts (word-concept matching problem), and a scarf itself is difficult to recognize (scoring problem).

4. Summary & future work

4. Summary and future work
Summary
• We participated in the ad-hoc video search (AVS) task.
• This was our first attempt at a fully automatic run. For Step 2 (selecting concepts for each keyword), we proposed WordNet-based and Word2Vec-based methods.
• WordNet-based concept selection outperformed Word2Vec-based selection.

4. Summary and future work
Future work
• Improve the concept selection methods, e.g., other uses of WordNet / Word2Vec.
• Improve the linguistic part, e.g., "a person talking behind xxxx", "inside car", "at least two xxxx", "TV or movie camera".
• Handle action-type concepts.

Thank you for your attention. Any questions?
