Kobe University, NICT, and University of Siegen at TRECVID 2016 AVS Task

Yasuyuki Matsumoto, Kuniaki Uehara (Kobe University); Takashi Shinozaki (NICT); Kimiaki Shirahama, Marcin Grzegorzek (University of Siegen)

Our Contribution: (1) A method that uses small-scale neural networks to greatly accelerate concept classifier training. (2) Transfer learning that acquires temporal characteristics efficiently by combining small networks with LSTM. (3) An evaluation of the effectiveness of using balanced examples at training time.

The Problem: Using pre-trained neural networks to extract features is a very popular approach. However, training classifiers on those features takes a long time, and the cost grows with the number of classifiers required. [Figure: image -> pre-trained network -> extracted feature -> ?]

Micro Neural Networks: A micro neural network (microNN) is a fully connected neural network with a single hidden layer. It acts as a binary classifier that outputs two values to predict the presence or absence of a concept. Dropout is used to avoid overfitting. Calculation time can be reduced from hours to minutes.
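
As a rough illustration of the structure described on this slide, the following sketch builds a network with one hidden layer, two outputs, and dropout on the hidden units. The layer sizes, the ReLU activation, the softmax over the two outputs, and the names `init_micronn` and `forward` are assumptions made for the example; the slides do not specify them.

```python
import math
import random


def init_micronn(n_in, n_hidden, seed=0):
    """Create weights for a one-hidden-layer network with two outputs."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(n_hidden)],
        "b2": [0.0, 0.0],
    }


def forward(net, x, dropout=0.0, rng=None):
    """Return [P(concept absent), P(concept present)] for one feature vector."""
    n_hidden = len(net["b1"])
    hidden = []
    for j in range(n_hidden):
        a = net["b1"][j] + sum(x[i] * net["W1"][i][j] for i in range(len(x)))
        h = max(0.0, a)  # ReLU activation (an assumption, not from the slides)
        if dropout > 0.0:
            # Inverted dropout, applied only while training:
            # drop the unit with probability `dropout`, rescale the survivors.
            h = h / (1.0 - dropout) if rng.random() >= dropout else 0.0
        hidden.append(h)
    logits = [
        net["b2"][k] + sum(hidden[j] * net["W2"][j][k] for j in range(n_hidden))
        for k in range(2)
    ]
    # Softmax over the two outputs (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


net = init_micronn(n_in=8, n_hidden=4)
feature = [0.5, -1.0, 0.0, 2.0, 1.5, -0.5, 0.3, 0.9]  # stand-in for an extracted feature
probs = forward(net, feature)                          # inference: no dropout
train_probs = forward(net, feature, dropout=0.5, rng=random.Random(1))
```

Because the network has so few parameters compared with the feature extractor, one such classifier per concept stays cheap to train, which is the point the slide makes about calculation time.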

Our Approach - Overview: Overview of our method for the TRECVID 2016 AVS task. [Figure: pipeline from Query to Concept to Model to Precision, with the stages manual selection, feature extraction, microNN training, LSTM, and shot retrieval.]

Our Approach - Overview: How we extracted concepts from the queries.

Our Approach - Manual Selection: Begin by manually selecting relevant concepts for each query. A simple rule is used so that concept selection is easier to automate in the future: pick only nouns and verbs, reduce them to their base form, and add synonyms from ImageNet. Example, query 502, "Find shots of a man indoors looking at camera where a bookcase is behind him": picked words "man", "look" (base form), and "bookcase"; synonyms "bookshelf" and "furniture". Selected concepts: Indoor, Speaking_to_camera, Bookshelf, Furniture.

Our Approach - Overview: Combine the concepts from each query.

Our Approach - Feature Extraction: A pre-trained network is usually transferred into classifiers suitable for the target problem. We use the pre-trained VGGNet from ILSVRC 2014, a CNN with a very deep architecture. The 16-layer version is used, and the output of the second fully connected layer (FC7) serves as the feature. [Figure: VGGNet layers Conv1, Conv2, Conv3, Conv4, Conv5, FC6, FC7, FC8, Softmax.] Reference: K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition".

Previous Approach - SVM Training: Until now, previous studies have trained classifiers such as SVMs on the extracted features, which requires a lot of time. [Figure: Image -> VGG Net -> SVM.]

Our Approach - MicroNN Training: Perform gradual transfer learning for each concept in the following steps. ① Start by training the microNN using images. [Figure: Image -> VGG Net -> microNN.]

Our Approach - MicroNN Training: Perform gradual transfer learning for each concept in the following steps. ② Refine the microNN using shots in the video dataset. The microNN takes the weight parameters learned in the first step as its initial values. [Figure: the parameters W, b of the image-trained microNN initialize the microNN trained on video.]
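
The first two steps can be sketched as a warm start: train a small network on image features, copy its weights, and continue training on video features. The sketch below is only an illustration under assumptions. The toy two-dimensional data, the plain SGD trainer, the learning rate, the epoch counts, and the names `init_net`, `predict`, `train`, and `toy_data` are hypothetical and do not come from the slides.

```python
import copy
import math
import random


def init_net(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(n_hidden)],
        "b2": [0.0, 0.0],
    }


def predict(net, x):
    """Return (hidden activations, [P(absent), P(present)])."""
    n_h = len(net["b1"])
    h = [max(0.0, net["b1"][j] + sum(xi * row[j] for xi, row in zip(x, net["W1"])))
         for j in range(n_h)]
    logits = [net["b2"][k] + sum(h[j] * net["W2"][j][k] for j in range(n_h))
              for k in range(2)]
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return h, [v / s for v in e]


def train(net, data, epochs, lr=0.1):
    """Plain SGD on cross-entropy; updates `net` in place and returns it."""
    for _ in range(epochs):
        for x, y in data:
            h, p = predict(net, x)
            d_logit = [p[k] - (1.0 if k == y else 0.0) for k in range(2)]
            # Backpropagate to the hidden layer before W2 is modified.
            d_h = [sum(net["W2"][j][k] * d_logit[k] for k in range(2)) if h[j] > 0 else 0.0
                   for j in range(len(h))]
            for j in range(len(h)):
                for k in range(2):
                    net["W2"][j][k] -= lr * h[j] * d_logit[k]
                net["b1"][j] -= lr * d_h[j]
                for i in range(len(x)):
                    net["W1"][i][j] -= lr * x[i] * d_h[j]
            for k in range(2):
                net["b2"][k] -= lr * d_logit[k]
    return net


def toy_data(n, shift, seed):
    """Hypothetical 2-D features; `shift` mimics a change of domain."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = (1.0 if y else -1.0) + shift
        data.append(([rng.gauss(centre, 0.5), rng.gauss(-centre, 0.5)], y))
    return data


image_data = toy_data(40, shift=0.0, seed=1)
video_data = toy_data(40, shift=0.3, seed=2)

image_net = train(init_net(2, 4), image_data, epochs=50)            # step 1: images
video_net = train(copy.deepcopy(image_net), video_data, epochs=10)  # step 2: warm start
_, video_probs = predict(video_net, video_data[0][0])
```

The deep copy keeps the image-trained weights intact, so the same starting point could be reused for several refinement runs.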

Our Approach - MicroNN Training: Perform gradual transfer learning for each concept in the following steps. ③ Further, the hidden layer of the microNN is replaced with an LSTM to acquire temporal characteristics. The resulting network is refined starting from the weight parameters learned in the second step as initial values. [Figure: Video -> LSTM-based microNN, with parameters W, b, and V.]
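
To make the idea of "temporal characteristics" concrete, the sketch below contrasts max-pooling over frame features, which ignores frame order, with a standard LSTM cell, whose final state depends on the order of the frames. It is an illustration only: the weights are random, biases are omitted for brevity, and the names `max_pool`, `init_lstm`, and `lstm_last_hidden` are hypothetical. It does not reproduce how the slides initialize the LSTM from the microNN weights.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def max_pool(frames):
    """Element-wise maximum over frames; the result ignores frame order."""
    return [max(column) for column in zip(*frames)]


def init_lstm(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    # One weight matrix per gate, acting on [frame feature, previous hidden state].
    return {
        gate: [[rng.gauss(0, 0.5) for _ in range(n_in + n_hidden)]
               for _ in range(n_hidden)]
        for gate in ("input", "forget", "output", "cell")
    }


def lstm_last_hidden(params, frames, n_hidden):
    """Run a standard LSTM cell over the frames; return the final hidden state."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in frames:
        z = list(x) + h
        pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
               for gate, rows in params.items()}
        i_gate = [sigmoid(v) for v in pre["input"]]
        f_gate = [sigmoid(v) for v in pre["forget"]]
        o_gate = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cell"]]
        c = [f_gate[k] * c[k] + i_gate[k] * cand[k] for k in range(n_hidden)]
        h = [o_gate[k] * math.tanh(c[k]) for k in range(n_hidden)]
    return h


frames = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]  # hypothetical per-frame features
params = init_lstm(n_in=2, n_hidden=3)

pooled = max_pool(frames)                       # same for any frame order
pooled_reversed = max_pool(frames[::-1])
state = lstm_last_hidden(params, frames, 3)     # depends on frame order
state_reversed = lstm_last_hidden(params, frames[::-1], 3)
```

Reversing the frames leaves the pooled vector unchanged but changes the LSTM state, which is why an LSTM can separate concepts that differ only in how the frames unfold over time.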

Our Approach - Overview: How we go from a shot's concept relevance to its search score.

Our Approach - Shot Retrieval: For each shot, calculate the average of the microNN output values for the concepts selected for a query. MicroNN outputs are normalized to [-1, 1] to balance between different concepts. Example output values: Indoor 0.7, Speaking_to_camera 0.1, Bookshelf 0.4, Furniture 0.6.

Our Approach - Shot Retrieval: How do we compare a shot with other shots? Calculate the average of the output values and use it as the overall search score. Example: for the concepts Indoor, Speaking_to_camera, Bookshelf, and Furniture, (0.7 + 0.1 + 0.4 + 0.6) / 4 = 0.45.
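
The scoring rule on these two slides can be sketched in a few lines. The slides do not say how outputs are normalized to [-1, 1]; the min-max scaling in `normalize` is an assumption, and the shot identifiers and the values for `shot_b` and `shot_c` are made up for illustration.

```python
def normalize(values):
    """Min-max scale one concept's raw outputs across shots to [-1, 1].

    The scaling method is an assumption; the slides only state the target range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def search_score(outputs):
    """Average of the normalized microNN outputs for the query's concepts."""
    return sum(outputs) / len(outputs)


def rank_shots(shot_outputs):
    """Return shot ids sorted by search score, best first."""
    scores = {shot: search_score(vals) for shot, vals in shot_outputs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Values from the slide: Indoor, Speaking_to_camera, Bookshelf, Furniture.
slide_score = search_score([0.7, 0.1, 0.4, 0.6])  # about 0.45

ranking = rank_shots({
    "shot_a": [0.7, 0.1, 0.4, 0.6],
    "shot_b": [-0.2, 0.3, 0.1, 0.0],
    "shot_c": [0.9, 0.8, 0.2, 0.5],
})
scaled = normalize([0.0, 5.0, 10.0])
```

Averaging treats every selected concept as equally important, so a shot needs reasonably high outputs on most concepts to rank near the top.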

Purpose of Experiment: 1. Evaluate the learning speed. 2. Evaluate the effectiveness of using LSTM to acquire temporal characteristics. 3. Evaluate whether using the same number of positive and negative examples ("Balanced") for training improves classification.

Experiment - Three Runs: We submitted the following runs for the TRECVID 2016 AVS task.
kobe_nict_siegen_D_M_1 (Imbalanced): Fine-tuning is carried out using imbalanced numbers of positive and negative examples (30,000 total).
kobe_nict_siegen_D_M_2 (Balanced): Fine-tuning is carried out using balanced numbers of positive and negative examples (30,000 total).
kobe_nict_siegen_D_M_3 ((Imbalanced) LSTM): Unlike max-pooling, LSTM obtains temporal characteristics. LSTM-based microNNs are trained only for the 14 concepts for which temporal relations among video frames are important.
[Figure: diagrams of the positive and negative example proportions (labels: positive, negative, Dataset Ratio).]
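
The difference between the balanced and imbalanced runs comes down to how the training examples are drawn. The sketch below shows one way this could look. Treating "imbalanced" as "keep the positive share found in the dataset" is an assumption based on the figure labels, and the function name `sample_training_set` and the toy pool sizes are hypothetical.

```python
import random


def sample_training_set(positives, negatives, total, balanced, seed=0):
    """Draw up to `total` training examples as (positives, negatives)."""
    rng = random.Random(seed)
    if balanced:
        # Same number of positive and negative examples.
        n_pos = n_neg = min(total // 2, len(positives), len(negatives))
    else:
        # Assumption: follow the positive share observed in the dataset.
        share = len(positives) / (len(positives) + len(negatives))
        n_pos = min(round(total * share), len(positives))
        n_neg = min(total - n_pos, len(negatives))
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)


# Toy pool: 100 positive and 900 negative example ids.
positives = list(range(100))
negatives = list(range(100, 1000))

bal_pos, bal_neg = sample_training_set(positives, negatives, total=200, balanced=True)
imb_pos, imb_neg = sample_training_set(positives, negatives, total=200, balanced=False)
```

With a fixed total, balancing trades negative examples for positive ones, so the two runs see the same amount of data but a different class mix.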

Experiment - Dataset: Datasets used in this study: ImageNet (image data, 39 concepts), TRECVID IACC (video data, 61 concepts), and UCF 101 (video data, 5 concepts). Training time: 3 sec / concept (30000 shots); 2 min / concept (30000 shots).

Experiment - Dataset: List of some concepts selected for each query, each taken from ImageNet, TRECVID, or UCF 101. Query 501: Outdoor, playingGuitar. Query 502: Indoor, Speaking_to_camera, bookshelf, Furniture. Query 503: drum, drumming, Indoor.

Experiment - Result: Performance comparison between Imbalanced, Balanced, and LSTM on each of the 30 queries. Using imbalanced training examples leads to higher average precisions than using balanced ones. With LSTM, the average precision is more than three times higher than without LSTM. [Figure: bar chart of average precision (AP, axis from 0 to 0.35) for queries 501 to 530, comparing Imbalanced, Balanced, and LSTM.]

Experiment - Result: Performance comparison between our method and the other methods developed for the manually-assisted category in the AVS task. [Figure: bar chart of MAP (axis from 0 to 0.2) for our runs (LSTM, Imbalanced, Balanced) and the other methods.]
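
For readers unfamiliar with the AP and MAP values reported on the result slides, the sketch below computes the textbook versions of both measures from a ranked list of shots. This is a plain illustration; the official evaluation may use a different, for example sampled, estimate. The shot ids and relevance sets are made up.

```python
def average_precision(ranked_shots, relevant):
    """Mean of the precision values at each rank where a relevant shot appears."""
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average AP over queries; `runs` is a list of (ranked_shots, relevant)."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical query: relevant shots found at ranks 1 and 4.
ap = average_precision(["s3", "s1", "s7", "s2"], {"s3", "s2"})

# Two hypothetical queries; the second has its only relevant shot at rank 2.
map_score = mean_average_precision([
    (["s3", "s1", "s7", "s2"], {"s3", "s2"}),
    (["s5", "s6"], {"s6"}),
])
```

Here the first query contributes (1/1 + 2/4) / 2 = 0.75 and the second contributes 1/2 = 0.5, so the mean over both queries is 0.625.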
