Energy Based Fast Event Retrieval in Video with Temporal Match Kernel
Junfu Pu (1), Yusuke Matsui (2), Fan Yang (3, 2), Shin’ichi Satoh (2, 3)
- 1. University of Science and Technology of China
- 2. National Institute of Informatics
- 3. The University of Tokyo
- Introduction
- Background
- Matching with energy
- Algorithm speedup with PQ
- Experiments
- Conclusion
An approach for fast content-based search in large video databases
[Figure: a query video and the video database]
Related work
- Jerome Revaud et al., Event retrieval in large video collections with circulant temporal encoding, CVPR, 2013
- Matthijs Douze et al., Stable hyper-pooling and query expansion for event detection, ICCV, 2013
- Sebastien Poullot et al., Temporal matching kernel with explicit feature maps, ACM MM, 2015
Contribution
- Simplify the similarity metric by calculating the energy of the score function
- Derive the energy formulation by Parseval’s theorem
- Accelerate the computation with product quantization
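$\mathbf{y} = (\mathbf{y}_0, \dots, \mathbf{y}_u, \dots)$, $\mathbf{z} = (\mathbf{z}_0, \dots, \mathbf{z}_u, \dots)$, time offset: $\Delta$
A kernel defined with $\mathbf{y}$, $\mathbf{z}$, and $\Delta$:
$$\lambda_{\Delta}(\mathbf{y}, \mathbf{z}) \propto \sum_{u=0}^{\infty} \mathbf{y}_u^{\top} \mathbf{z}_{u+\Delta} = \underbrace{\Big( \sum_{u=0}^{\infty} \mathbf{y}_u \otimes \chi(u) \Big)^{\top}}_{\omega_0(\mathbf{y})^{\top}} \; \underbrace{\Big( \sum_{u'=0}^{\infty} \mathbf{z}_{u'} \otimes \chi(u' + \Delta) \Big)}_{\omega_{\Delta}(\mathbf{z})}$$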
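$$\chi(u) = \Big[\, b_0,\; b_1 \cos\big(\tfrac{2\pi}{U} u\big),\; b_1 \sin\big(\tfrac{2\pi}{U} u\big),\; \dots,\; b_n \cos\big(\tfrac{2\pi}{U} n u\big),\; b_n \sin\big(\tfrac{2\pi}{U} n u\big) \Big]^{\top}$$
$b_j$: the Fourier coefficients
$$\omega_0(\mathbf{y}) = \Big[\, \mathbf{W}_0^{\top},\; \mathbf{W}_{1,d}^{\top},\; \mathbf{W}_{1,t}^{\top},\; \dots,\; \mathbf{W}_{n,d}^{\top},\; \mathbf{W}_{n,t}^{\top} \Big]^{\top}$$
$$\mathbf{W}_0 = b_0 \sum_{u=0}^{\infty} \mathbf{y}_u \in \mathbb{R}^{E}, \qquad \mathbf{W}_{j,d} = b_j \sum_{u=0}^{\infty} \mathbf{y}_u \cos\big(\tfrac{2\pi}{U} j u\big) \in \mathbb{R}^{E}, \qquad \mathbf{W}_{j,t} = b_j \sum_{u=0}^{\infty} \mathbf{y}_u \sin\big(\tfrac{2\pi}{U} j u\big) \in \mathbb{R}^{E}$$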
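The descriptor components can be computed directly from a sequence of frame descriptors. The following is a minimal sketch, assuming a finite list of frames indexed from $u = 0$; the function name `tmk_descriptor`, the argument names, and the toy values of `b` and `period` (standing in for $b_j$ and $U$) are illustrative and do not come from the slides.

```python
import math

def tmk_descriptor(frames, b, period):
    """Return (w0, w_cos, w_sin) for a list of frame descriptors.

    w0           corresponds to W_0     = b[0] * sum_u y_u
    w_cos[j - 1] corresponds to W_{j,d} = b[j] * sum_u y_u * cos(2*pi*j*u / period)
    w_sin[j - 1] corresponds to W_{j,t} = b[j] * sum_u y_u * sin(2*pi*j*u / period)
    """
    dim = len(frames[0])

    def pooled(weight):
        # Weighted sum of the frame descriptors, one scalar weight per time index u.
        return [sum(weight(u) * frame[k] for u, frame in enumerate(frames))
                for k in range(dim)]

    w0 = pooled(lambda u: b[0])
    w_cos, w_sin = [], []
    for j in range(1, len(b)):
        theta = 2.0 * math.pi * j / period
        w_cos.append(pooled(lambda u, th=theta, bj=b[j]: bj * math.cos(th * u)))
        w_sin.append(pooled(lambda u, th=theta, bj=b[j]: bj * math.sin(th * u)))
    return w0, w_cos, w_sin

# Toy input (hypothetical): three 2-D frames, one frequency, period 4.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0, w_cos, w_sin = tmk_descriptor(frames, b=[1.0, 0.5], period=4.0)
```

The result has $2n + 1$ vectors of the same dimension as a frame descriptor, independent of the video length.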
Final Formulation
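$$\lambda_{\mathbf{y},\mathbf{z}}(\Delta) = \big\langle \mathbf{W}_0(\mathbf{y}), \mathbf{W}_0(\mathbf{z}) \big\rangle + \sum_{o=1}^{n} \cos(o\Delta) \Big( \big\langle \mathbf{W}_{o,d}(\mathbf{y}), \mathbf{W}_{o,d}(\mathbf{z}) \big\rangle + \big\langle \mathbf{W}_{o,t}(\mathbf{y}), \mathbf{W}_{o,t}(\mathbf{z}) \big\rangle \Big) + \sum_{o=1}^{n} \sin(o\Delta) \Big( -\big\langle \mathbf{W}_{o,d}(\mathbf{y}), \mathbf{W}_{o,t}(\mathbf{z}) \big\rangle + \big\langle \mathbf{W}_{o,t}(\mathbf{y}), \mathbf{W}_{o,d}(\mathbf{z}) \big\rangle \Big)$$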
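The final formulation is a trigonometric polynomial in the offset, so it can be evaluated from a handful of inner products. The sketch below assumes the descriptors are given as tuples `(w0, w_cos, w_sin)` and that the argument `phase` already equals the rescaled offset $2\pi\Delta/U$; these names, the rescaling convention, and the toy vectors are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fourier_coefficients(desc_y, desc_z):
    """Constant, cosine and sine coefficients of the kernel as a function of the offset."""
    (y0, y_cos, y_sin), (z0, z_cos, z_sin) = desc_y, desc_z
    d0 = dot(y0, z0)
    d = [dot(yc, zc) + dot(ys, zs)
         for yc, ys, zc, zs in zip(y_cos, y_sin, z_cos, z_sin)]
    t = [-dot(yc, zs) + dot(ys, zc)
         for yc, ys, zc, zs in zip(y_cos, y_sin, z_cos, z_sin)]
    return d0, d, t

def kernel(d0, d, t, phase):
    """Evaluate d0 + sum_o (d_o * cos(o * phase) + t_o * sin(o * phase))."""
    return d0 + sum(d[o - 1] * math.cos(o * phase) + t[o - 1] * math.sin(o * phase)
                    for o in range(1, len(d) + 1))

# Toy descriptors (hypothetical): 2-D vectors, a single frequency.
desc_y = ([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
desc_z = ([2.0, 0.0], [[1.0, 0.0]], [[1.0, 0.0]])
d0, d, t = fourier_coefficients(desc_y, desc_z)
value_at_zero = kernel(d0, d, t, 0.0)
```

Once `d0`, `d`, and `t` are known, evaluating the kernel at any offset costs only $O(n)$ operations.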
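Similarity Score
$$T(\mathbf{y}, \mathbf{z}) = \max_{\Delta} \lambda_{\mathbf{y},\mathbf{z}}(\Delta), \qquad u_n = \arg\max_{\Delta} \lambda_{\mathbf{y},\mathbf{z}}(\Delta)$$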
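A simple way to obtain the maximum and its location is to scan a grid of offsets. This is only a sketch: the grid step, the function names, and the convention that the kernel is evaluated at phase $2\pi\Delta/U$ are assumptions, and the coefficients `d0`, `d`, `t` are taken as already computed from the final formulation.

```python
import math

def kernel(d0, d, t, phase):
    return d0 + sum(d[o - 1] * math.cos(o * phase) + t[o - 1] * math.sin(o * phase)
                    for o in range(1, len(d) + 1))

def best_offset(d0, d, t, period, step=1.0):
    """Scan offsets 0, step, 2*step, ... over one period; return (max score, arg max)."""
    offsets = [k * step for k in range(int(period / step))]
    scores = [kernel(d0, d, t, 2.0 * math.pi * off / period) for off in offsets]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], offsets[best]

# Toy coefficients (hypothetical): a pure sine term peaks a quarter period in.
score, offset = best_offset(1.0, [0.0], [1.0], period=8.0)
```

The scan costs one kernel evaluation per candidate offset, which is the per-pair cost the energy-based score avoids.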
Matching with energy
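$F(\lambda_{\mathbf{y},\mathbf{z}_1}) > F(\lambda_{\mathbf{y},\mathbf{z}_2})$ if $T(\mathbf{y}, \mathbf{z}_1) > T(\mathbf{y}, \mathbf{z}_2)$, so the energy is used as the score: $T(\mathbf{y}, \mathbf{z}) = F(\lambda_{\mathbf{y},\mathbf{z}}(\Delta))$
Denote the Fourier series of $g(y)$ as
$$g(y) = \frac{1}{2} d_0 + \sum_{o=1}^{n} d_o \cos(o y) + \sum_{o=1}^{n} t_o \sin(o y)$$
The energy of $g(y)$ over one period is
$$F(g(y)) = \int_{-\pi}^{\pi} |g(y)|^2 \, dy$$
According to Parseval’s theorem
$$\frac{1}{\pi} \int_{-\pi}^{\pi} |g(y)|^2 \, dy = \frac{1}{2} d_0^2 + \sum_{j=1}^{n} \big( d_j^2 + t_j^2 \big)$$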
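Parseval’s identity can be checked numerically for a small trigonometric polynomial. The sketch below assumes the normalization $\frac{1}{\pi}\int_{-\pi}^{\pi} |g(y)|^2\,dy = \frac{1}{2} d_0^2 + \sum_j (d_j^2 + t_j^2)$ for $g(y) = \frac{1}{2} d_0 + \sum_o (d_o \cos(oy) + t_o \sin(oy))$; the function names and coefficient values are made up for illustration.

```python
import math

def trig_poly(d0, d, t, y):
    return 0.5 * d0 + sum(d[o - 1] * math.cos(o * y) + t[o - 1] * math.sin(o * y)
                          for o in range(1, len(d) + 1))

def energy_numeric(d0, d, t, samples=512):
    """(1/pi) * integral of g(y)^2 over [-pi, pi], using a uniform Riemann sum."""
    h = 2.0 * math.pi / samples
    total = sum(trig_poly(d0, d, t, -math.pi + k * h) ** 2 for k in range(samples))
    return total * h / math.pi

def energy_parseval(d0, d, t):
    """The same quantity computed from the coefficients alone."""
    return 0.5 * d0 ** 2 + sum(a * a + b * b for a, b in zip(d, t))

# Hypothetical coefficients for a degree-2 polynomial.
d0, d, t = 1.0, [2.0, -0.5], [0.3, 1.5]
numeric = energy_numeric(d0, d, t)
closed_form = energy_parseval(d0, d, t)
```

The closed form needs no integration and no search over the offset, which is why the energy is cheap to use as a score.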
Matching with energy
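The final form of the energy $T(\mathbf{y}, \mathbf{z})$ for $\lambda_{\mathbf{y},\mathbf{z}}(\Delta)$ is
$$T(\mathbf{y}, \mathbf{z}) = F\big(\lambda_{\mathbf{y},\mathbf{z}}(\Delta)\big) = \sum_{o=1}^{n} \Big( \big\langle \mathbf{W}_{o,d}(\mathbf{y}), \mathbf{W}_{o,d}(\mathbf{z}) \big\rangle + \big\langle \mathbf{W}_{o,t}(\mathbf{y}), \mathbf{W}_{o,t}(\mathbf{z}) \big\rangle \Big)^{2}$$
Generalized formulation
$$T_q(\mathbf{y}, \mathbf{z}) = \sqrt[q]{\sum_{j=1}^{n} \big( d_j^2 + t_j^2 \big)^{q}}$$
$$T_{\infty}(\mathbf{y}, \mathbf{z}) = \lim_{q \to \infty} \sqrt[q]{\frac{1}{N} \sum_{j=1}^{n} \big( d_j^2 + t_j^2 \big)^{q}} = \max_{o} \big( d_o^2 + t_o^2 \big)$$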
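The generalized formulation is straightforward to compute once the per-frequency coefficients are known. The sketch below takes $d_j$ and $t_j$ as plain lists and treats them as the cosine and sine coefficients of the kernel, which is an assumption about the notation; the function names are illustrative.

```python
def energy_score(d, t, q=1.0):
    """q-th root of sum_j (d_j^2 + t_j^2)^q."""
    return sum((a * a + b * b) ** q for a, b in zip(d, t)) ** (1.0 / q)

def energy_score_inf(d, t):
    """Limit q -> infinity: the largest per-frequency energy."""
    return max(a * a + b * b for a, b in zip(d, t))

# Hypothetical coefficients: per-frequency energies 25 and 1.
d, t = [3.0, 0.0], [4.0, 1.0]
score_q1 = energy_score(d, t, q=1.0)
score_q50 = energy_score(d, t, q=50.0)
score_inf = energy_score_inf(d, t)
```

As $q$ grows the score is dominated by the strongest frequency, which the large-$q$ value in the example illustrates.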
Matching with energy
- Given a query video, go through the candidates in the database
- Calculate $T(\mathbf{y}, \mathbf{z})$ between the query and each candidate
- Retrieve the candidates ranked by $T(\mathbf{y}, \mathbf{z})$
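The retrieval steps above can be sketched end to end as follows. This is an illustrative sketch, not the authors' implementation: it uses the $q = 1$ case of the generalized energy, $\sum_j (d_j^2 + t_j^2)$, and the function names, the dictionary-based database, and the toy values of `b` and `period` are all assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def descriptor(frames, b, period):
    """Cosine and sine components (W_{j,d}, W_{j,t}) for j = 1..n."""
    dim = len(frames[0])

    def pooled(weight):
        return [sum(weight(u) * f[k] for u, f in enumerate(frames)) for k in range(dim)]

    w_cos = [pooled(lambda u, j=j: b[j] * math.cos(2 * math.pi * j * u / period))
             for j in range(1, len(b))]
    w_sin = [pooled(lambda u, j=j: b[j] * math.sin(2 * math.pi * j * u / period))
             for j in range(1, len(b))]
    return w_cos, w_sin

def energy_score(desc_y, desc_z):
    (y_cos, y_sin), (z_cos, z_sin) = desc_y, desc_z
    score = 0.0
    for yc, ys, zc, zs in zip(y_cos, y_sin, z_cos, z_sin):
        d = dot(yc, zc) + dot(ys, zs)    # cosine coefficient of the kernel
        t = -dot(yc, zs) + dot(ys, zc)   # sine coefficient of the kernel
        score += d * d + t * t
    return score

def retrieve(query_frames, database, b, period):
    """Score every candidate against the query and sort by decreasing energy."""
    query_desc = descriptor(query_frames, b, period)
    scored = [(name, energy_score(query_desc, descriptor(frames, b, period)))
              for name, frames in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy database (hypothetical): a copy of the query, a scaled copy, and an empty signal.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
database = {
    "empty": [[0.0, 0.0]] * 3,
    "faint": [[0.1 * v for v in frame] for frame in query],
    "same": [list(frame) for frame in query],
}
ranking = retrieve(query, database, b=[1.0, 0.5], period=4.0)
```

In practice the database descriptors would be computed once offline, so only the inner products are evaluated at query time.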
Advantages
- More stable (the maximum used for $T(\mathbf{y}, \mathbf{z})$ is sensitive to noise)
- Lower computational complexity
- The computation can be further accelerated with an approximate nearest neighbor method such as PQ
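Algorithm speedup with PQ
The $k$th codebook $\mathbf{d}_k^{*}$ is generated from
$$\big\{ \mathbf{W}_{k,d}(\mathbf{y}_j) : j \in \{1, \dots, O\} \big\} \cup \big\{ \mathbf{W}_{k,t}(\mathbf{y}_j) : j \in \{1, \dots, O\} \big\}$$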
Searching steps
- Quantize the query $r$ to its $\partial$ nearest neighbors with $T(\mathbf{y}, \mathbf{z})$
- Compute the squared distances and dot products for each subquantizer $k$ and each of its centroids $\mathbf{d}_{kj}$
- Using the subvector-to-centroid distances, calculate the similarity score $T(\mathbf{y}, \mathbf{z})$
- Order the candidates by decreasing $T(\mathbf{y}, \mathbf{z})$
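The lookup-table step of product quantization can be illustrated on a single inner product. The sketch below assumes the codebooks are already trained (for example by k-means, which is not shown) and covers only the encoding of a database vector and the table-based approximate dot product; it does not reproduce the coarse quantization of the query, and all names and toy values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(vec, parts):
    """Cut a vector into `parts` sub-vectors of equal length."""
    size = len(vec) // parts
    return [vec[k * size:(k + 1) * size] for k in range(parts)]

def encode(vec, codebooks):
    """For each subquantizer, store the index of the nearest centroid."""
    code = []
    for sub, centroids in zip(split(vec, len(codebooks)), codebooks):
        dists = [sum((s - c) ** 2 for s, c in zip(sub, cen)) for cen in centroids]
        code.append(dists.index(min(dists)))
    return code

def lookup_table(query, codebooks):
    """table[k][j] = dot product of the k-th query sub-vector with centroid j."""
    return [[dot(sub, cen) for cen in centroids]
            for sub, centroids in zip(split(query, len(codebooks)), codebooks)]

def approx_dot(code, table):
    """Approximate <query, database vector> with one table lookup per subquantizer."""
    return sum(row[idx] for row, idx in zip(table, code))

# Toy setup (hypothetical): 4-D vectors, 2 subquantizers with 2 centroids each.
codebooks = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
code = encode([0.9, 0.1, 0.2, 0.8], codebooks)
table = lookup_table([2.0, 3.0, 4.0, 5.0], codebooks)
approx = approx_dot(code, table)
```

The table is built once per query, after which each database vector costs only one lookup and one addition per subquantizer. Here the approximation gives 7.0, while the exact dot product with the unquantized vector is 6.9.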
EVent VidEo (EVVE) dataset [CVPR’13]
- 620 queries, 2375 database videos, 13 events
- 1024-D multi-VLAD frame descriptor
Experimental results
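The average mAP using $T_q(\mathbf{y}, \mathbf{z})$ for different $q$, where
$$T_q(\mathbf{y}, \mathbf{z}) = \sqrt[q]{\sum_{j=1}^{n} \big( d_j^2 + t_j^2 \big)^{q}}$$
[Plot: mAP versus $q$]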
Results on EVVE and comparison
- Baseline (temporal match kernel): MM’15
- MMV (mean-multiVLAD): CVPR’13
- CTE (circulant temporal encoding): CVPR’13
- SHP (stable hyper-pooling): ICCV’13
Conclusion
- Propose a fast event retrieval method for video databases with the temporal match kernel
- Use the energy of the score function as the similarity metric
- Derive the simplified energy formulation using Parseval’s theorem
- With the energy formulation, use PQ to accelerate the computation
- Achieve performance competitive with the state of the art