PKU_ICST at TRECVID 2017: Instance Search Task


SLIDE 1

PKU_ICST at TRECVID 2017: Instance Search Task

Yuxin Peng, Xin Huang, Jinwei Qi, Junchao Zhang, Junjie Zhao, Mingkuan Yuan, Yunkan Zhuo, Jingze Chi, and Yuxin Yuan

Institute of Computer Science and Technology, Peking University, Beijing 100871, China {pengyuxin@pku.edu.cn}

TRECVID 2017

SLIDE 2

Outline

  • Introduction
  • Our approach
  • Results and conclusions
  • Our related works

SLIDE 3

Introduction

  • Instance search (INS) task

– Provided: separate person and location examples
– Topic: combination of a person and a location
– Target: retrieve specific persons in specific locations

[Example: query person (Ryan) + query location (Cafe1) → topic “Ryan in Cafe1”]

SLIDE 4

Outline

  • Introduction
  • Our approach
  • Results and conclusions
  • Our related works

SLIDE 5

Our approach

  • Overview

[Pipeline diagram: similarity computing stage (location-specific search + person-specific search → fusion), then result re-ranking stage (semi-supervised re-ranking)]

SLIDE 6

Our approach

  • Overview

[Pipeline diagram; this part: location-specific search in the similarity computing stage]

SLIDE 7

Our approach

  • Location-specific search

– Integrates handcrafted and deep features
– Similarity score:

sim_location = w1 · AKM + w2 · DNN

(AKM-based location search; DNN-based location search)
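The weighted fusion above can be sketched as follows; the equal weights and the min-max normalization are illustrative assumptions, not the authors’ settings:

```python
import numpy as np

def location_score(akm_scores, dnn_scores, w1=0.5, w2=0.5):
    """Late fusion of the two location similarity lists:
    sim_location = w1 * AKM + w2 * DNN (per candidate shot)."""
    def norm(x):  # min-max normalize so the two score ranges are comparable
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return w1 * norm(akm_scores) + w2 * norm(dnn_scores)
```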

SLIDE 8

Location-specific search

  • AKM-based location search

– Keypoint-based BoW features are applied to capture local details
– 6 kinds of BoW features in total, the combinations of 3 detectors and 2 descriptors
– The AKM algorithm is used to build one-million-dimensional visual words
– Similarity score:

AKM = (1/N) Σ_k BOW_k

where BOW_k is the similarity under the k-th BoW feature and N is the number of BoW features
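The averaged BoW score can be sketched as below, using cosine similarity between small dense histograms for brevity (the real system uses one-million-dimensional sparse vocabularies):

```python
import numpy as np

def bow_cosine(h1, h2):
    """Cosine similarity between two BoW histograms (dense for brevity)."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    denom = np.linalg.norm(h1) * np.linalg.norm(h2)
    return float(h1 @ h2) / denom if denom > 0 else 0.0

def akm_score(query_hists, shot_hists):
    """AKM = (1/N) * sum_k BOW_k over the N BoW variants
    (3 detectors x 2 descriptors -> N = 6 in the deck)."""
    sims = [bow_cosine(q, s) for q, s in zip(query_hists, shot_hists)]
    return sum(sims) / len(sims)
```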

SLIDE 9

Location-specific search

  • DNN-based location search

– DNN features are used to capture semantic information
– Ensemble of 3 CNN models: VGGNet, GoogLeNet, ResNet

SLIDE 10

Location-specific search

  • DNN-based location search

– All 3 CNNs are trained with a progressive training strategy

  • Progressive training

[Diagram: training data (query examples) → VGGNet / GoogLeNet / ResNet]

SLIDE 12

Location-specific search

  • DNN-based location search

– All 3 CNNs are trained with a progressive training strategy

  • Progressive training

[Diagram: training data (query examples + top-ranked shots) → VGGNet / GoogLeNet / ResNet]
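The progressive training loop can be sketched like this; `train_fn` and `rank_fn` are hypothetical callables standing in for CNN fine-tuning and gallery ranking, and the round and top-k counts are illustrative:

```python
def progressive_train(model, train_fn, rank_fn, queries, gallery,
                      rounds=2, top_k=50):
    """Fine-tune on the query examples, rank the gallery, then feed the
    top-ranked shots back as pseudo-positive training data and repeat."""
    data = list(queries)
    for _ in range(rounds):
        model = train_fn(model, data)                # fine-tune the CNN
        ranked = rank_fn(model, queries, gallery)    # rank all shots
        data = list(queries) + list(ranked[:top_k])  # expand training set
    return model
```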

SLIDE 14

Our approach

  • Overview

[Pipeline diagram; this part: person-specific search in the similarity computing stage]

SLIDE 15

Our approach

  • Person-specific search

– We apply a face recognition technique based on a deep model
– We also conduct text-based person search, where persons’ auxiliary information is mined from the provided video transcripts

SLIDE 16

Person-specific search

  • Face recognition based person search

– Face detection

SLIDE 17

Person-specific search

  • Face recognition based person search

– Face detection
– Remove “bad” faces automatically: hard to distinguish


[Examples before removal of bad faces, labeled: Wrong, Right, Wrong]

SLIDE 18

Person-specific search

  • Face recognition based person search

– Face detection
– Remove “bad” faces automatically: hard to distinguish


[Examples after removal of bad faces, labeled: Right, Right, Right]

SLIDE 19

Person-specific search

  • Face recognition based person search

– We use the VGG-Face model to extract face features
– We integrate cosine similarity and SVM prediction scores to get the person similarity score:

sim_person = w1 · COS + w2 · SVM

SLIDE 20

Person-specific search

  • Face recognition based person search

– We use the VGG-Face model to extract face features
– We integrate cosine similarity and SVM prediction scores to get the person similarity score:

sim_person = w1 · COS + w2 · SVM

– We adopt a similar progressive training strategy to fine-tune the VGG-Face model
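A minimal sketch of this person-score fusion; the sigmoid squashing of the SVM decision value and the equal weights are our assumptions to put COS and SVM on a shared scale:

```python
import math
import numpy as np

def person_score(face_emb, query_emb, svm_decision, w1=0.5, w2=0.5):
    """sim_person = w1 * COS + w2 * SVM, with COS the cosine similarity
    of face embeddings (e.g. VGG-Face features) and SVM a squashed
    decision value of a per-person classifier."""
    f, q = np.asarray(face_emb, float), np.asarray(query_emb, float)
    cos = float(f @ q) / (np.linalg.norm(f) * np.linalg.norm(q))
    svm = 1.0 / (1.0 + math.exp(-svm_decision))  # map R -> (0, 1)
    return w1 * cos + w2 * svm
```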


Progressive training

SLIDE 21

Our approach

  • Overview

[Pipeline diagram; this part: fusion of location-specific and person-specific search results]

SLIDE 22

Our approach

  • Instance score fusion

– Direction 1: search for the person in the specific location
– μ is a bonus parameter based on text-based person search

s1 = μ · sim_person

SLIDE 25

Our approach

  • Instance score fusion

– Direction 2: search for the location containing the specific person
– μ is a bonus parameter based on text-based person search

s2 = μ · sim_location

SLIDE 26

Our approach

  • Instance score fusion

– Combine the scores of the above two directions:
– ω indicates whether the shot is simultaneously included in the candidate location shots and the candidate person shots

s_f = ω · (α · s1 + β · s2)
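The two-direction fusion can be sketched as below; the indicator is modeled as a boolean, and the parameter values are illustrative rather than the authors’ settings:

```python
def fuse_instance(sim_person, sim_location, in_both_candidate_sets,
                  mu=1.2, alpha=0.5, beta=0.5):
    """s1 = mu * sim_person, s2 = mu * sim_location, and
    s_f = indicator * (alpha * s1 + beta * s2), where the indicator is 1
    only when the shot appears in both candidate shot lists."""
    s1 = mu * sim_person      # direction 1: person in a given location
    s2 = mu * sim_location    # direction 2: location containing the person
    return (alpha * s1 + beta * s2) if in_both_candidate_sets else 0.0
```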

SLIDE 27

Our approach

  • Overview

Similarity computing stage Result re-ranking stage

Location-specific search Fusion Semi- supervised re-ranking Person- specific search

SLIDE 28

Our approach

  • Re-ranking

– Most of the top-ranked shots are correct and look similar
– Noisy shots with large dissimilarity can be filtered using similarity scores among the top-ranked shots
– A semi-supervised re-ranking method is proposed to refine the result

SLIDE 29

Re-ranking

  • Semi-supervised re-ranking algorithm

– Obtain the affinity matrix W over the features f of the top-ranked shots:

W_jk = (f_j · f_k) / (|f_j| |f_k|) if j ≠ k, and W_jj = 0, for j, k = 1, 2, ⋯, n

– Update W according to the k-NN graph:

W_jk = W_jk if f_j ∈ kNN(f_k), and 0 otherwise

– Construct the normalized matrix:

S = D^(−1/2) W D^(−1/2)

where D is the diagonal degree matrix of W

– Re-rank the search result by iterating:

G_(t+1) = α S G_t + (1 − α) Y

where Y is the ranked list obtained by the above fusion step
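The steps above can be sketched as score propagation over a k-NN cosine graph; the symmetrization of W and the fixed iteration count are our simplifications:

```python
import numpy as np

def rerank(features, initial_scores, k=5, alpha=0.8, iters=30):
    """Cosine affinity over the top-ranked shots, sparsified to a k-NN
    graph, normalized as S = D^(-1/2) W D^(-1/2), then propagated with
    G <- alpha * S @ G + (1 - alpha) * Y."""
    F = np.asarray(features, dtype=float)
    n = len(F)
    U = F / np.linalg.norm(F, axis=1, keepdims=True)
    W = U @ U.T                      # pairwise cosine similarities
    np.fill_diagonal(W, 0.0)
    for j in range(n):               # keep only the k nearest neighbours
        drop = np.argsort(W[:, j])[:-k] if k < n else []
        W[drop, j] = 0.0
    W = np.maximum(W, W.T)           # symmetrize (our simplification)
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))
    y = np.asarray(initial_scores, dtype=float)
    g = y.copy()
    for _ in range(iters):
        g = alpha * (S @ g) + (1 - alpha) * y
    return g
```

Shots similar to the initially high-scoring ones gain score, while isolated noisy shots keep only their initial evidence.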

SLIDE 30

Outline

  • Introduction
  • Our approach
  • Results and conclusions
  • Our related works

SLIDE 31

Results and Conclusions

  • Results

– We submitted 7 runs and ranked 1st in both automatic and interactive search
– The interactive run is performed based on RUN2, expanding positive examples as queries

Type         ID      MAP    Brief description
Automatic    RUN1_A  0.448  AKM+DNN+Face
Automatic    RUN1_E  0.471  AKM+DNN+Face
Automatic    RUN2_A  0.531  RUN1+Text
Automatic    RUN2_E  0.549  RUN1+Text
Automatic    RUN3_A  0.528  RUN2+Re-rank
Automatic    RUN3_E  0.549  RUN2+Re-rank
Interactive  RUN4    0.677  RUN2+Human feedback

SLIDE 32

Results and Conclusions

  • Conclusions

– Video examples are helpful for accuracy improvement
– Automatic removal of “bad” faces is important
– Fusion of location and person similarity is a key factor in instance search

Type         ID      MAP    Brief description
Automatic    RUN1_A  0.448  AKM+DNN+Face
Automatic    RUN1_E  0.471  AKM+DNN+Face
Automatic    RUN2_A  0.531  RUN1+Text
Automatic    RUN2_E  0.549  RUN1+Text
Automatic    RUN3_A  0.528  RUN2+Re-rank
Automatic    RUN3_E  0.549  RUN2+Re-rank
Interactive  RUN4    0.677  RUN2+Human feedback

SLIDE 33

Outline

  • Introduction
  • Our approach
  • Results and conclusions
  • Our related works

SLIDE 34
  • 1. Video concept recognition (1/2)

  • Video concept recognition

− Learn semantics from video content and classify videos into pre-defined categories automatically
− For example: human action recognition, multimedia event detection, etc.

[Example concepts: HorseRiding, PlayingGuitar, BirthdayCelebration, Parade]

SLIDE 36
  • 1. Video concept recognition (2/2)

  • We propose two-stream collaborative learning with spatial-temporal attention

− Spatial-temporal attention model: jointly captures video evolution in both the spatial and temporal domains
− Static-motion collaborative model: adopts collaborative guidance between static and motion information to promote feature learning

Yuxin Peng, Yunzhen Zhao, and Junchao Zhang, “Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification”, IEEE TCSVT 2017 (after minor revision). arXiv: 1704.01740

SLIDE 37

  • 2. Cross-media Retrieval (1/5)

  • Cross-media retrieval:

− Perform retrieval among different media types, such as image, text, audio and video
− Submit a query of any media type (e.g., query examples of Golden Gate Bridge)

  • Challenge:

− Heterogeneity gap: different media types have inconsistent representations

SLIDE 39
  • We propose common representation learning based on sparse and semi-supervised regularization, which models correlation and high-level semantics in a unified framework, and exploits complementary information among multiple media types to reduce noise

[Diagram: image, text, video and audio are projected into a common representation space by combining a cross-media correlation term, a semantic constraint term, sparse regularization, and semi-supervised graph regularization]

  • Yuxin Peng, Xiaohua Zhai, Yunzhen Zhao, and Xin Huang, “Semi-Supervised Cross-Media Feature Learning with Unified Patch Graph Regularization”, IEEE TCSVT 2016

  • Xiaohua Zhai, Yuxin Peng, and Jianguo Xiao, “Learning Cross-Media Joint Representation with Sparse and Semisupervised Regularization”, IEEE TCSVT 2014

Comment from reviewers of TCSVT: “the proposed method is quite novel”, and “jointly represents several media for cross-media retrieval, while the previous works usually deal with two different media”

  • 2. Cross-media Retrieval (2/5)

SLIDE 41

[Diagram: multi-grained fusion with joint optimization; multi-task learning]

  • Yuxin Peng, Xin Huang, and Jinwei Qi, “Cross-media Shared Representation by Hierarchical Learning with Multiple Deep Networks”, IJCAI 2016

  • Yuxin Peng, Jinwei Qi, Xin Huang, and Yuxin Yuan, “CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network”, IEEE TMM 2017

  • We propose a cross-modal correlation learning approach with multi-grained fusion by hierarchical network. It exploits multi-level association with joint optimization and adopts multi-task learning to preserve intra-modality and inter-modality correlation

  • 2. Cross-media Retrieval (3/5)

SLIDE 43

[Diagram: knowledge transfer from a single-modal source domain (image) to a cross-modal target domain (image and text) via cross-modal correlation, yielding a cross-media common representation; transfer paths: single-media transfer, cross-media transfer, hybrid transfer]

Xin Huang, Yuxin Peng, and Mingkuan Yuan, “Cross-modal Common Representation Learning by Hybrid Transfer Network”, IJCAI 2017.

  • To address the problem of insufficient training data in DNN-based cross-media retrieval methods, we propose a cross-media hybrid transfer network, which exploits the semantic information of existing large-scale single-media datasets to promote the network training of cross-media common representation learning

  • 2. Cross-media Retrieval (4/5)

SLIDE 44
  • 2. Cross-media Retrieval (5/5)

  • We have released the PKU-XMedia and PKU-XMediaNet datasets with 5 media types.

  • Datasets and source codes of our related works: http://www.icst.pku.edu.cn/mipl/xmedia

  • Interested in cross-media retrieval? Hope our recent overview is helpful for you:

Yuxin Peng, Xin Huang, and Yunzhen Zhao, "An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges", IEEE TCSVT, 2017. arXiv: 1704.02223

SLIDE 45

  • 3. Fine-grained Image Classification (1/4)

  • Fine-grained image classification:

− Recognize hundreds of subcategories belonging to the same basic-level category

  • Challenges:

− Large variances in the same subcategory
− Small variances among different subcategories

[Example subcategories: Black Footed Albatross, Marsh Wren, Rock Wren, Winter Wren; Smart fortwo Convertible, BMW 1, Hyundai Elantra, Toyota Sequoia]

SLIDE 47
  • To address the problem of fine-grained image classification, an object-part attention model is proposed, which is the first work to classify fine-grained images without using object or part annotations in either the training or the testing phase, while still achieving promising results

  • Yuxin Peng, Xiangteng He, and Junjie Zhao, “Object-Part Attention Model for Fine-grained Image Classification”, IEEE TIP 2017

  • Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, and Zheng Zhang, “The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification”, CVPR 2015

  • 3. Fine-grained Image Classification (2/4)

SLIDE 49
  • To accelerate classification speed, saliency-guided fine-grained discriminative localization is proposed, which jointly facilitates fine-grained image classification and discriminative localization

Xiangteng He, Yuxin Peng and Junjie Zhao, “Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN”, ACM MM 2017

  • 3. Fine-grained Image Classification (3/4)

SLIDE 51

[Diagram: visual description and textual description streams]

  • Considering the complementarity of text, a two-stream model is proposed to combine vision and language for learning multi-granularity, multi-view and multi-level representations

Xiangteng He and Yuxin Peng, “Fine-grained Image Classification via Combining Vision and Language”, CVPR 2017

  • 3. Fine-grained Image Classification (4/4)

SLIDE 52

Contact:

Email: pengyuxin@pku.edu.cn
Phone: 010-82529699
Lab Website: http://www.icst.pku.edu.cn/mipl

TRECVID 2017
