Energy Based Fast Event Retrieval in Video with Temporal Match Kernel



SLIDE 1

Energy Based Fast Event Retrieval in Video with Temporal Match Kernel

Junfu Pu (1), Yusuke Matsui (2), Fan Yang (3, 2), Shin’ichi Satoh (2, 3)

  • 1. University of Science and Technology of China
  • 2. National Institute of Informatics
  • 3. The University of Tokyo
SLIDE 2

Outline

  • Introduction
  • Background
  • Matching with Energy
  • Algorithm Speedup with PQ
  • Experiments
  • Conclusion

SLIDE 3

Introduction

  • An approach for fast content-based search in large video databases

[Figure: query video and video database]

SLIDE 4

Introduction

  • Related work
      • Jerome Revaud et al., "Event retrieval in large video collections with circulant temporal encoding," CVPR, 2013
      • Matthijs Douze et al., "Stable hyper-pooling and query expansion for event detection," ICCV, 2013
      • Sebastien Poullot et al., "Temporal matching kernel with explicit feature maps," ACM MM, 2015

  • Contribution
      • Simplify the similarity metric by calculating the energy of the score function
      • Derive the energy formulation using Parseval’s theorem
      • Accelerate the computation with product quantization (PQ)

SLIDE 5

Background

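$\mathbf{x} = (\mathbf{x}_0, \dots, \mathbf{x}_t, \dots)$, $\mathbf{y} = (\mathbf{y}_0, \dots, \mathbf{y}_t, \dots)$; time offset: $\Delta$

A kernel defined with $\mathbf{x}$, $\mathbf{y}$, and $\Delta$:

$$
\kappa_\Delta(\mathbf{x}, \mathbf{y}) \propto \sum_{t=0}^{\infty} \mathbf{x}_t^\top \mathbf{y}_{t+\Delta}
= \underbrace{\Big(\sum_{t=0}^{\infty} \mathbf{x}_t \otimes \varphi(t)\Big)^{\top}}_{\psi_0(\mathbf{x})^\top}
\underbrace{\Big(\sum_{t'=0}^{\infty} \mathbf{y}_{t'} \otimes \varphi(t'+\Delta)\Big)}_{\psi_\Delta(\mathbf{y})}
$$

$$
\varphi(t) = \Big[a_0,\ a_1\cos\big(\tfrac{2\pi}{T}t\big),\ a_1\sin\big(\tfrac{2\pi}{T}t\big),\ \dots,\ a_m\cos\big(\tfrac{2\pi}{T}mt\big),\ a_m\sin\big(\tfrac{2\pi}{T}mt\big)\Big]^\top
$$

$a_i$: the Fourier coefficients

$$
\psi_0(\mathbf{x}) = \big[V_0^\top,\ V_{1,c}^\top,\ V_{1,s}^\top,\ \dots,\ V_{m,c}^\top,\ V_{m,s}^\top\big]^\top
$$

$$
V_0 = a_0 \sum_{t=0}^{\infty} \mathbf{x}_t \in \mathbb{R}^D, \qquad
V_{i,c} = a_i \sum_{t=0}^{\infty} \mathbf{x}_t \cos\big(\tfrac{2\pi}{T} i t\big) \in \mathbb{R}^D, \qquad
V_{i,s} = a_i \sum_{t=0}^{\infty} \mathbf{x}_t \sin\big(\tfrac{2\pi}{T} i t\big) \in \mathbb{R}^D
$$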
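
As a minimal sketch of the aggregation on this slide, the Python function below turns a list of frame descriptors into the zeroth-order block followed by one cosine block and one sine block per frequency. The function name `tmk_descriptor`, the toy frames, and the coefficient values are assumptions made for illustration; the slides do not say how the Fourier coefficients are chosen.

```python
import math

def tmk_descriptor(frames, coeffs, period):
    """Aggregate frame descriptors into [V_0, V_1c, V_1s, ..., V_mc, V_ms].

    frames: list of D-dimensional lists, frames[t] is the frame at time t.
    coeffs: [a_0, a_1, ..., a_m], the Fourier coefficients (assumed given).
    period: the period T used inside the cosine and sine terms.
    """
    dim = len(frames[0])
    m = len(coeffs) - 1
    # Zeroth-order block: a_0 times the sum of all frame descriptors.
    blocks = [[coeffs[0] * sum(f[d] for f in frames) for d in range(dim)]]
    for i in range(1, m + 1):
        vc = [0.0] * dim
        vs = [0.0] * dim
        for t, frame in enumerate(frames):
            angle = 2.0 * math.pi * i * t / period
            cos_w, sin_w = math.cos(angle), math.sin(angle)
            for d in range(dim):
                vc[d] += coeffs[i] * frame[d] * cos_w
                vs[d] += coeffs[i] * frame[d] * sin_w
        blocks.extend([vc, vs])
    return blocks

# Toy input (hypothetical): three 2-D frames, one frequency, period 4.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
desc = tmk_descriptor(frames, coeffs=[1.0, 0.5], period=4)
```

The nested loops keep the sketch dependency-free; with real 1024-D descriptors a vectorized array library would be the natural choice.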


SLIDE 6

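Background

  • Final formulation

$$
\begin{aligned}
\kappa_{\mathbf{x},\mathbf{y}}(\Delta) ={}& \big\langle V_0^{(\mathbf{x})}, V_0^{(\mathbf{y})} \big\rangle \\
&+ \sum_{n=1}^{m} \cos(n\Delta) \Big( \big\langle V_{n,c}^{(\mathbf{x})}, V_{n,c}^{(\mathbf{y})} \big\rangle + \big\langle V_{n,s}^{(\mathbf{x})}, V_{n,s}^{(\mathbf{y})} \big\rangle \Big) \\
&+ \sum_{n=1}^{m} \sin(n\Delta) \Big( -\big\langle V_{n,c}^{(\mathbf{x})}, V_{n,s}^{(\mathbf{y})} \big\rangle + \big\langle V_{n,s}^{(\mathbf{x})}, V_{n,c}^{(\mathbf{y})} \big\rangle \Big)
\end{aligned}
$$

  • Similarity score

$$
S(\mathbf{x}, \mathbf{y}) = \max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta), \qquad
t_m = \arg\max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta)
$$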
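
The baseline scoring on this slide can be sketched as a scan over candidate offsets: evaluate the kernel at each offset and keep the maximum and the offset that attains it. In the sketch below, the descriptor layout (zeroth-order block, then cosine and sine blocks per frequency), the function names, the uniform grid of 360 offsets, and the toy descriptors are all assumptions; the offset is taken to be already scaled to radians, as the slide's cos and sin terms suggest.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kernel_at_offset(vx, vy, delta):
    """Evaluate the kernel at one offset.

    vx, vy: descriptors laid out as [V_0, V_1c, V_1s, ..., V_mc, V_ms].
    delta: offset, assumed to be expressed in radians.
    """
    m = (len(vx) - 1) // 2
    value = dot(vx[0], vy[0])
    for n in range(1, m + 1):
        xc, xs = vx[2 * n - 1], vx[2 * n]
        yc, ys = vy[2 * n - 1], vy[2 * n]
        value += math.cos(n * delta) * (dot(xc, yc) + dot(xs, ys))
        value += math.sin(n * delta) * (-dot(xc, ys) + dot(xs, yc))
    return value

def max_score(vx, vy, num_offsets=360):
    """Scan a uniform grid of offsets; return (best value, best offset)."""
    offsets = [2.0 * math.pi * k / num_offsets for k in range(num_offsets)]
    best = max(offsets, key=lambda d: kernel_at_offset(vx, vy, d))
    return kernel_at_offset(vx, vy, best), best

# Toy 1-D descriptors with a single frequency (hypothetical values).
vx = [[1.0], [1.0], [0.0]]
vy = [[1.0], [0.0], [1.0]]
score, offset = max_score(vx, vy)
```

For these toy descriptors the kernel reduces to 1 - sin(offset), so the scan settles on an offset of three quarters of a period. The cost of the scan grows with the number of offsets tested for every query and candidate pair.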

SLIDE 7

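Our Method

  • Matching with energy

$$
E(\kappa_{\mathbf{x},\mathbf{y}_1}) > E(\kappa_{\mathbf{x},\mathbf{y}_2}) \quad \text{if} \quad S(\mathbf{x}, \mathbf{y}_1) > S(\mathbf{x}, \mathbf{y}_2)
$$

Denote the Fourier series of $f(x)$ as

$$
f(x) = \frac{1}{2} c_0 + \sum_{n=1}^{m} c_n \cos(nx) + \sum_{n=1}^{m} s_n \sin(nx)
$$

The energy of $f(x)$ is

$$
E(f(x)) = \int_{-\infty}^{\infty} |f(x)|^2 \, dx
$$

According to Parseval’s theorem,

$$
\frac{1}{2\pi} \int_{-\infty}^{\infty} |f(x)|^2 \, dx = \sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big) + c_0^2
$$

The similarity score is then taken as the energy of the kernel:

$$
S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta))
$$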
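
The Parseval step can be sanity-checked numerically. The sketch below assumes the integral is taken over one period, from -π to π, and uses the textbook normalization, (1/π) ∫ f² dx = c₀²/2 + Σ (cₙ² + sₙ²). That normalization is an assumption here and its constants differ from the ones written on the slide. The function names and coefficient values are hypothetical.

```python
import math

def fourier_eval(x, c0, c, s):
    """Evaluate c0/2 + sum_n c[n-1]*cos(n*x) + s[n-1]*sin(n*x)."""
    total = 0.5 * c0
    for n, (cn, sn) in enumerate(zip(c, s), start=1):
        total += cn * math.cos(n * x) + sn * math.sin(n * x)
    return total

def energy_numeric(c0, c, s, steps=2000):
    """(1/pi) * integral of f(x)^2 over [-pi, pi], using a uniform Riemann sum."""
    h = 2.0 * math.pi / steps
    total = sum(fourier_eval(-math.pi + k * h, c0, c, s) ** 2 for k in range(steps))
    return total * h / math.pi

def energy_parseval(c0, c, s):
    """The same quantity computed from the coefficients alone."""
    return 0.5 * c0 ** 2 + sum(cn ** 2 + sn ** 2 for cn, sn in zip(c, s))

# Hypothetical coefficients for a two-frequency series.
c0, c, s = 1.0, [0.5, -0.25], [2.0, 0.75]
lhs = energy_numeric(c0, c, s)
rhs = energy_parseval(c0, c, s)
```

Both routes agree to rounding error, which is the point of the slide: the energy can be read off the coefficients without evaluating the function at any offset.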

SLIDE 8

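Our Method

  • Matching with energy

The final form of the energy $S(\mathbf{x}, \mathbf{y})$ for $\kappa_{\mathbf{x},\mathbf{y}}(\Delta)$ is

$$
S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta)) = \sum_{n=1}^{m} \Big( \big\langle V_{n,c}^{(\mathbf{x})}, V_{n,c}^{(\mathbf{y})} \big\rangle + \big\langle V_{n,s}^{(\mathbf{x})}, V_{n,s}^{(\mathbf{y})} \big\rangle \Big)^2
$$

  • Generalized formulation

$$
S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)^p}
$$

$$
S_\infty(\mathbf{x}, \mathbf{y}) = \lim_{p \to \infty} \sqrt[p]{\frac{1}{m} \sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)^p} = \max_{n} \big( c_n^2 + s_n^2 \big)
$$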
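
A small sketch of the generalized formulation, assuming the per-frequency cosine and sine coefficients of the kernel are already available as pairs. The function names and the example pairs are hypothetical. The sketch also illustrates that the p-th root of the sum approaches the largest per-frequency energy as p grows.

```python
def generalized_score(pairs, p):
    """p-th root of the sum over frequencies of (c^2 + s^2)^p."""
    return sum((c * c + s * s) ** p for c, s in pairs) ** (1.0 / p)

def limit_score(pairs):
    """Limit of the generalized score as p grows: the largest c^2 + s^2."""
    return max(c * c + s * s for c, s in pairs)

# Hypothetical (cosine, sine) coefficient pairs for two frequencies.
pairs = [(0.0, -1.0), (2.0, -2.0)]

s1 = generalized_score(pairs, 1)      # plain sum of per-frequency energies
s2 = generalized_score(pairs, 2)
s200 = generalized_score(pairs, 200)  # already very close to the limit
s_inf = limit_score(pairs)
```

With p = 1 every frequency contributes equally to the score; larger p shifts the weight toward the dominant frequency.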

SLIDE 9

Our Method

  • Matching with energy
      • Given a query video, go through the candidates in the database
      • Calculate $S(\mathbf{x}, \mathbf{y})$ between the query and each candidate
      • Retrieve by ranking the candidates on $S(\mathbf{x}, \mathbf{y})$

  • Advantages
      • More stable (the maximum-based $S(\mathbf{x}, \mathbf{y})$ is sensitive to noise)
      • Lower computational complexity
      • Can be further accelerated with an approximate nearest neighbor method such as PQ
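
The retrieval loop described above can be sketched as follows. The energy is taken here as the sum over frequencies of the squared cosine and sine coefficients of the kernel, which is one reading of the slides and should be treated as an assumption. The descriptor layout, function names, and toy database are also hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def energy_score(vx, vy):
    """Sum over frequencies of c_n^2 + s_n^2; no offset scan is needed.

    vx, vy: descriptors laid out as [V_0, V_1c, V_1s, ..., V_mc, V_ms].
    """
    m = (len(vx) - 1) // 2
    total = 0.0
    for n in range(1, m + 1):
        xc, xs = vx[2 * n - 1], vx[2 * n]
        yc, ys = vy[2 * n - 1], vy[2 * n]
        c = dot(xc, yc) + dot(xs, ys)    # cosine coefficient of the kernel
        s = -dot(xc, ys) + dot(xs, yc)   # sine coefficient of the kernel
        total += c * c + s * s
    return total

def rank_database(query, database):
    """Return candidate names ordered by decreasing energy score."""
    scored = [(energy_score(query, desc), name) for name, desc in database.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]

# Toy 1-D descriptors with a single frequency (hypothetical values).
query = [[1.0], [1.0], [0.0]]
database = {
    "a": [[1.0], [0.0], [1.0]],
    "b": [[0.0], [2.0], [0.0]],
    "c": [[5.0], [0.0], [0.0]],
}
ranking = rank_database(query, database)
```

Each candidate costs a fixed number of inner products, independent of how finely offsets would otherwise be sampled.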

SLIDE 10

Our Method

  • Algorithm speedup with PQ
      • The $j$th codebook $\mathbf{c}_{j*}$ is generated from $\{V_{j,c}(\mathbf{x}_i) : i \in \{1, \dots, N\}\} \cup \{V_{j,s}(\mathbf{x}_i) : i \in \{1, \dots, N\}\}$

  • Searching steps
      • Quantize the query $q$ to its $\omega$ nearest neighbors with $S(\mathbf{x}, \mathbf{y})$
      • Compute the squared distances and dot products for each subquantizer $j$ and each of its centroids $\mathbf{c}_{ji}$
      • Using the subvector-to-centroid distances, calculate the similarity score $S(\mathbf{x}, \mathbf{y})$
      • Order the candidates by decreasing $S(\mathbf{x}, \mathbf{y})$
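
A rough sketch of how the lookup-table part of these steps could work, under several assumptions: one codebook per frequency shared by the cosine and sine blocks, centroids already trained (codebook training is not shown), and no coarse quantization of the query, so the step that quantizes the query to its nearest neighbors is omitted. All function names and values are hypothetical. Database blocks are stored as centroid indices, and the query's dot products with every centroid are computed once and reused for all candidates.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def nearest(vec, centroids):
    """Index of the centroid closest to vec in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])))

def encode(desc, codebooks):
    """Replace each cosine and sine block by its nearest-centroid index."""
    return [(nearest(desc[2 * j - 1], cb), nearest(desc[2 * j], cb))
            for j, cb in enumerate(codebooks, start=1)]

def build_tables(query, codebooks):
    """Per frequency: dot products of the query's cosine and sine blocks with every centroid."""
    return [([dot(query[2 * j - 1], cen) for cen in cb],
             [dot(query[2 * j], cen) for cen in cb])
            for j, cb in enumerate(codebooks, start=1)]

def pq_energy(tables, codes):
    """Approximate energy score using table lookups only."""
    total = 0.0
    for (table_c, table_s), (kc, ks) in zip(tables, codes):
        c = table_c[kc] + table_s[ks]    # approximates the cosine coefficient
        s = -table_c[ks] + table_s[kc]   # approximates the sine coefficient
        total += c * c + s * s
    return total

# Toy setup (hypothetical): one frequency, 1-D blocks, three centroids.
codebooks = [[[0.0], [1.0], [2.0]]]
candidate = [[0.0], [2.1], [0.1]]
query = [[1.0], [1.0], [0.0]]

codes = encode(candidate, codebooks)
tables = build_tables(query, codebooks)
approx = pq_energy(tables, codes)
```

In this toy case the exact energy is 4.42 and the table-based approximation is 4.0; the gap is the quantization error of the candidate's blocks.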

SLIDE 11

Experiments

  • EVent VidEo (EVVE) dataset [CVPR’13]
      • 620 queries, 2375 database videos, 13 events
      • 1024-D multi-VLAD frame descriptor

  • Experimental results

[Figure: the average mAP using $S_p(\mathbf{x}, \mathbf{y})$ for different $p$ (x-axis: $p$; y-axis: mAP), where $S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} (c_i^2 + s_i^2)^p}$]
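
For reference, mean average precision over ranked result lists can be computed as in the sketch below. This is the generic definition, not the dataset's official evaluation script, which may treat some videos differently. The function names and toy rankings are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the ranks where relevant items appear."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy rankings for two queries (hypothetical).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["b", "a"], {"a"}),
]
score = mean_average_precision(runs)
```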

SLIDE 12

Experiments

  • Results on EVVE and comparison
      • Baseline (temporal match kernel): MM’15
      • MMV (mean-multiVLAD): CVPR’13
      • CTE (circulant temporal encoding): CVPR’13
      • SHP (stable hyper-pooling): ICCV’13

SLIDE 13

Conclusion

  • Proposed a fast event retrieval method for video databases based on the temporal match kernel
  • Used the energy of the score function as the similarity metric
  • Derived the simplified energy formulation using Parseval’s theorem
  • With the energy formulation, used PQ to accelerate the computation
  • Achieved performance competitive with the state of the art

SLIDE 14

Thank you!