Attentive Moment Retrieval in Videos
Meng Liu¹, Xiang Wang², Liqiang Nie¹, Xiangnan He², Baoquan Chen¹, and Tat-Seng Chua²
¹ Shandong University, China   ² National University of Singapore, Singapore
Pipeline • Background • Learning Model • Experiment • Conclusion
Background • Inter-video Retrieval: retrieving whole videos that match a query. Query: FIFA World Cup … Query: Peppa Pig …
Background • Intra-video Retrieval: retrieving a segment from an untrimmed video, which contains complex scenes and involves a large number of objects, attributes, actions, and interactions. Query: Messi's penalty / football shot
Background • Surveillance videos: finding missing children, pets, or suspects. Query: A girl in orange first walks by the camera. • Home videos: recalling a desired moment. Query: Baby's face gets very close to the camera. • Online videos: quickly jumping to a specific moment.
Background • Reality: dragging the progress bar to locate the desired moment is boring and time-consuming. • Research: densely segmenting the long video into moments at different scales and then matching each moment against the query incurs expensive computational costs and an exponential search space.
Problem Formulation – Temporal Moment Localization. Input: a video and a language query. Query: a girl in orange walks by the camera. Output: the temporal moment corresponding to the given query (green box), with time points [24s, 30s].
Pipeline • Background • Learning Model • Experiment • Conclusion
Learning Model – Pipeline (architecture figure): the video and the query (e.g., "Girl with blue shirt drives past on bike.") are encoded into a moment feature and a query feature; a memory attention network attends over each moment's context, a cross-modal fusion step combines the two modalities, and an MLP outputs an alignment score (e.g., 0.9) and a localization offset (e.g., [1.2s, 2.5s]).
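A minimal PyTorch sketch of the pipeline's final stage, assuming the fused moment-query representation has already been computed; the class name, layer sizes, and output layout below are illustrative rather than the authors' exact implementation.

import torch.nn as nn

class ACRNHead(nn.Module):
    """Tail of the pipeline: a fused moment-query vector goes through an MLP
    that predicts one alignment score and two localization offsets."""
    def __init__(self, fused_dim, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 3))

    def forward(self, fused):                        # fused: (batch, fused_dim)
        out = self.mlp(fused)
        score, offsets = out[..., 0], out[..., 1:]   # e.g. 0.9 and [1.2s, 2.5s]
        return score, offsets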
Learning Model – Feature Extraction
• Video:
1. Segmentation: segment the video into moments with a sliding window; each moment $m$ has a time location $[\tau_s, \tau_e]$.
2. Location offset: $[\delta_s, \delta_e] = [l_s, l_e] - [\tau_s, \tau_e]$, where $[l_s, l_e]$ is the ground-truth temporal interval of the given query.
3. Spatio-temporal feature $v_m$: a C3D feature for each moment.
• Query $q$: Skip-thoughts feature.
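A small Python sketch of the segmentation and offset computation above; the window lengths, the 80% stride, and the function names are illustrative assumptions.

def generate_moments(video_duration, window_lengths=(64.0, 128.0, 256.0), stride_ratio=0.8):
    """Enumerate candidate moments [tau_s, tau_e] with multi-scale sliding windows."""
    moments = []
    for w in window_lengths:
        stride = w * stride_ratio
        start = 0.0
        while start + w <= video_duration:
            moments.append((start, start + w))
            start += stride
    return moments

def location_offset(moment, ground_truth):
    """Offset [delta_s, delta_e] = [l_s, l_e] - [tau_s, tau_e]."""
    (tau_s, tau_e), (l_s, l_e) = moment, ground_truth
    return (l_s - tau_s, l_e - tau_e)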
Learning Model – Memory Attention Network • The given query often contains temporal-constraint words such as "first", "second", and "closer to", so the temporal context of a moment is useful for localization. • Not all contexts have the same influence on localization: nearby contexts matter more than distant ones.
Learning Model – Memory Attention Network
• Memory cell for the $j$-th context of moment $t$: $h_t^j = W_m\, c_t^j + b_m$
• Query-guided relevance score: $S(c_t^j, q) = \sigma\big(W_c\, c_t^j + W_q\, q + b\big)$
• Attention weight: $\alpha_t^j = \dfrac{\exp\big(S(c_t^j, q)\big)}{\sum_{j'=-T}^{T} \exp\big(S(c_t^{j'}, q)\big)},\quad j \in [-T, T]$
• Attended moment feature: $\tilde{v}_t = \sum_{j \in [-T, T]} \alpha_t^j\, h_t^j$
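A PyTorch-style sketch of the query-guided attention over the $2T{+}1$ contextual moments; the additive scoring form is an assumption consistent with the reconstruction above, not necessarily the exact formula used in the paper.

import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Attend over a moment's contextual features, guided by the query (sketch)."""
    def __init__(self, ctx_dim, query_dim, hidden_dim):
        super().__init__()
        self.W_m = nn.Linear(ctx_dim, hidden_dim)    # memory cell embedding
        self.W_q = nn.Linear(query_dim, hidden_dim)  # query embedding
        self.w = nn.Linear(hidden_dim, 1)            # scalar relevance score

    def forward(self, contexts, query):
        # contexts: (2T+1, ctx_dim), query: (query_dim,)
        memory = self.W_m(contexts)                                   # h_t^j
        scores = self.w(torch.tanh(memory + self.W_q(query))).squeeze(-1)
        alpha = torch.softmax(scores, dim=0)                          # alpha_t^j
        return (alpha.unsqueeze(-1) * memory).sum(dim=0)              # attended moment feature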
Learning Model – Cross-Modal Fusion
The output of this fusion procedure explores the intra-modal and the inter-modal feature interactions to generate the moment-query representation: the (mean-pooled) query feature $\bar{q}$ and the attended moment feature $\tilde{v}$ are combined as $o_{vq} = \tilde{v} \otimes \bar{q}$ and concatenated into $o = [\,\tilde{v},\ \tilde{v} \otimes \bar{q},\ \bar{q},\ 1\,]$, covering the intra-visual, inter-modal, and intra-query interactions plus a constant term.
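One way to realize the fusion formula above, sketched in PyTorch: the outer product of the augmented vectors $[\tilde{v}; 1]$ and $[\bar{q}; 1]$ contains exactly the inter-modal block $\tilde{v} \otimes \bar{q}$, the intra-visual part $\tilde{v}$, the intra-query part $\bar{q}$, and the constant 1. Treating $\otimes$ as an outer product is an assumption.

import torch

def cross_modal_fusion(v, q):
    """Fuse a moment feature v and a query feature q (both 1-D tensors) into one
    moment-query representation via the outer product of [v; 1] and [q; 1]."""
    v_aug = torch.cat([v, torch.ones(1)])        # [v; 1]
    q_aug = torch.cat([q, torch.ones(1)])        # [q; 1]
    return torch.outer(v_aug, q_aug).flatten()   # contains v⊗q, v, q, and 1

The flattened outer product grows quadratically with the feature sizes, so in practice lower-dimensional projections of the moment and query features would be fused.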
Learning Model – Loss Function
The fused representation is fed into a two-layer MLP whose output is a three-dimensional vector $o = [s_{vq}, \delta_s, \delta_e]$: an alignment score and the two location offsets.
$\mathcal{L} = \mathcal{L}_{align} + \lambda\, \mathcal{L}_{loc}$
$\mathcal{L}_{align} = \alpha_1 \sum_{(v,q) \in \mathcal{P}} \log\big(1 + \exp(-s_{vq})\big) + \alpha_2 \sum_{(v,q) \in \mathcal{N}} \log\big(1 + \exp(s_{vq})\big)$
$\mathcal{L}_{loc} = \sum_{(v,q) \in \mathcal{P}} \big[\, R(\delta_s^* - \delta_s) + R(\delta_e^* - \delta_e) \,\big]$
where $\mathcal{P}$ and $\mathcal{N}$ are the sets of positive and negative moment-query pairs, $[\delta_s^*, \delta_e^*]$ are the ground-truth offsets, and $R(\cdot)$ is a smooth $L_1$ regression loss.
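A PyTorch sketch of the two loss terms under the reconstruction above, using softplus for $\log(1 + \exp(\cdot))$ and a smooth-L1 loss for $R(\cdot)$; the weights a1, a2, and lam are placeholder hyper-parameters.

import torch.nn.functional as F

def alignment_loss(pos_scores, neg_scores, a1=1.0, a2=1.0):
    """Logistic loss: push positive moment-query scores up and negative ones down."""
    return (a1 * F.softplus(-pos_scores).sum()    # log(1 + exp(-s)) over positives
            + a2 * F.softplus(neg_scores).sum())  # log(1 + exp(s)) over negatives

def location_loss(pred_offsets, gt_offsets):
    """Smooth-L1 regression of predicted offsets towards the ground truth (positives only)."""
    return F.smooth_l1_loss(pred_offsets, gt_offsets, reduction='sum')

def total_loss(pos_scores, neg_scores, pred_offsets, gt_offsets, lam=1.0):
    return alignment_loss(pos_scores, neg_scores) + lam * location_loss(pred_offsets, gt_offsets)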
Pipeline • Background • Learning Model • Experiment • Conclusion
Experiment – Datasets • TACoS and DiDeMo • Evaluation metric: R(n, m) = "R@n, IoU=m", the fraction of queries for which at least one of the top-n retrieved moments has temporal IoU ≥ m with the ground truth.
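A short Python sketch of how the metric can be computed for a single query; averaging the indicator over all queries gives R(n, m).

def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def hit_at_n(ranked_moments, gt, n, iou_threshold):
    """1 if any of the top-n retrieved moments reaches the IoU threshold, else 0."""
    return int(any(temporal_iou(m, gt) >= iou_threshold for m in ranked_moments[:n]))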
Experiment – Performance Comparison
Experiment – Model Variants • ACRN-a (pink): mean-pooled context features as the moment feature (no attention) • ACRN-m (purple): attention model without the memory component • ACRN-c (blue): simple concatenation of the multi-modal features
Experiment – Qualitative Result
Pipeline • Background • Learning Model • Experiment • Conclusion
Conclusion • We present a novel Attentive Cross-Modal Retrieval Network, which jointly characterizes the attentive contextual visual feature and the cross-modal feature representation. • We introduce a temporal memory attention network to memorize the contextual information for each moment, and treat the natural language query as the input of an attention network to adaptively assign weights to the memory representation. • We perform extensive experiments on two benchmark datasets to demonstrate the performance improvement.
Thank you Q&A