LEARNING TEMPORAL EMBEDDINGS FOR COMPLEX VIDEO ANALYSIS


slide-1
SLIDE 1

LEARNING TEMPORAL EMBEDDINGS FOR COMPLEX VIDEO ANALYSIS

BY RAMANATHAN, TANG, MORI, AND LI

Chad Voegele

slide-2
SLIDE 2

PROBLEM

What can we learn about videos without supervision?

slide-3
SLIDE 3

MOTIVATION

... quick fox jumps over dog ...

slide-4
SLIDE 4

WORD2VEC FOR VIDEOS?

words ↔ frames

sentences ↔ video segments

slide-5
SLIDE 5

WORD2VEC FOR VIDEOS?

ISSUES

  1. Frames are not discrete.
  2. Visual similarity between neighboring frames.
  3. Representation of context.
slide-6
SLIDE 6

FRAME EMBEDDING

slide-7
SLIDE 7

FRAME EMBEDDING

AlexNet

[Figure: network diagram with labels fc7, input, Magic, ReLU, LRN]

slide-8
SLIDE 8

EMBEDDING OBJECTIVE

$$\mathrm{similarity}(a, b) = \frac{a \cdot b}{\lVert a \rVert \lVert b \rVert} = a \cdot b$$
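
The second equality on this slide only holds when both vectors have unit norm, which the slide appears to assume for the embeddings. A minimal sketch of that shortcut follows; the helper names and the 2-D vectors are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """a . b / (||a|| ||b||), the general form on the slide."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical 2-D "embeddings", used only for illustration.
a_raw, b_raw = [3.0, 4.0], [4.0, 3.0]
a_unit, b_unit = normalize(a_raw), normalize(b_raw)

# Both expressions agree up to floating-point error (about 0.96 here).
general = cosine_similarity(a_raw, b_raw)
shortcut = dot(a_unit, b_unit)
```

Once embeddings are normalized, every similarity in the later slides reduces to a plain dot product.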

slide-9
SLIDE 9

EMBEDDING OBJECTIVE

$$f_{v_j} \cdot h_{v_j} \gg f^{-} \cdot h_{v_j}$$

slide-10
SLIDE 10

EMBEDDING OBJECTIVE

$$\min_{\text{embedding}} \sum_{\substack{v \in V \\ v_j \in v \\ v^{-} \neq v_j}} \max\left(0,\; 1 - (f_{v_j} - f^{-}) \cdot h_{v_j}\right)$$
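
To make the hinge term concrete, here is a small sketch that evaluates it for single (frame, negative, context) triples and sums the results. The function names and toy vectors are assumptions for illustration; the slides do not show the actual training code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hinge_term(f_pos, f_neg, h):
    """max(0, 1 - (f_pos - f_neg) . h) for one triple."""
    diff = [p - n for p, n in zip(f_pos, f_neg)]
    return max(0.0, 1.0 - dot(diff, h))

def objective(triples):
    """Sum of hinge terms over (f_pos, f_neg, h) triples."""
    return sum(hinge_term(f_pos, f_neg, h) for f_pos, f_neg, h in triples)

# Hypothetical toy triples, not taken from the paper.
h = [1.0, 0.0]
triples = [
    ([2.0, 0.0], [0.0, 0.0], h),  # margin satisfied, contributes 0
    ([1.0, 0.0], [1.0, 0.0], h),  # positive and negative tie, contributes 1
    ([0.0, 1.0], [1.0, 0.0], h),  # negative scores higher, contributes 2
]
total = objective(triples)
```

A term drops to zero only when the true frame beats the negative by a margin of at least 1 with respect to the context.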

slide-11
SLIDE 11

EMBEDDING OBJECTIVE

WANT

$$f_{v_j} \cdot h_{v_j} > 1 + f^{-} \cdot h_{v_j} \;\Leftrightarrow\; 1 - (f_{v_j} - f^{-}) \cdot h_{v_j} < 0$$

slide-12
SLIDE 12

FRAME CONTEXT

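$$h_{v_j} = \frac{1}{2T} \sum_{t=1}^{T} \left( f_{v_{j-t}} + f_{v_{j+t}} \right)$$

$$h_{v_j} = \frac{1}{T} \sum_{t=1}^{T} f_{v_{j-t}}$$

$$h_{v_j} \in \{ f_{v_k} \mid k \neq j \}$$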
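
The first two context definitions on this slide are averages over a window of neighboring frame embeddings. A minimal sketch, assuming the embeddings of one video are stored in a list indexed by frame position (the function names are hypothetical):

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def context_past_and_future(frames, j, T):
    """h = 1/(2T) * sum over t=1..T of (frames[j-t] + frames[j+t])."""
    if j - T < 0 or j + T >= len(frames):
        raise ValueError("window does not fit inside the video")
    window = [frames[j - t] for t in range(1, T + 1)]
    window += [frames[j + t] for t in range(1, T + 1)]
    return mean_vectors(window)

def context_past_only(frames, j, T):
    """h = 1/T * sum over t=1..T of frames[j-t]."""
    if j - T < 0:
        raise ValueError("window does not fit inside the video")
    return mean_vectors([frames[j - t] for t in range(1, T + 1)])

# Hypothetical 1-D embeddings for five consecutive frames.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
both = context_past_and_future(frames, j=2, T=2)  # mean of frames 0, 1, 3, 4
past = context_past_only(frames, j=2, T=2)        # mean of frames 0, 1
```

The third definition needs no averaging: the context is simply the embedding of some other frame `frames[k]` with `k != j`.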

slide-13
SLIDE 13

MULTI-RESOLUTION & NEGATIVES

slide-14
SLIDE 14

EVENT RETRIEVAL

TASK

$$v \to \{ v_j \in V \mid \mathrm{event}(v) = \mathrm{event}(v_j) \}$$

METHOD

For each $v_j \in V$,

  1. Uniformly sample 4 frames from $v_j$.
  2. Compute and average the frame embeddings.

Then,

  1. Sort $\{ \bar{f}_v \cdot \bar{f}_{v_k} \mid v_k \neq v \}$.
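
The retrieval steps above can be sketched as follows. This is an illustration under stated assumptions: frame embeddings are already computed, "uniformly sample" means evenly spaced positions, and the sort is by descending similarity. All names and the toy data are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sample_uniform(frames, n=4):
    """Pick n frames at evenly spaced positions (assumed sampling scheme)."""
    step = len(frames) / n
    return [frames[int(step * i + step / 2)] for i in range(n)]

def video_embedding(frame_embeddings, n=4):
    """Average the embeddings of n uniformly sampled frames."""
    return mean_vectors(sample_uniform(frame_embeddings, n))

def retrieve(query_id, videos):
    """Rank all other videos by dot product with the query's pooled embedding."""
    pooled = {vid: video_embedding(frames) for vid, frames in videos.items()}
    query = pooled[query_id]
    others = [vid for vid in pooled if vid != query_id]
    return sorted(others, key=lambda vid: dot(query, pooled[vid]), reverse=True)

# Hypothetical videos with constant 2-D frame embeddings.
videos = {
    "a": [[1.0, 0.0]] * 8,
    "b": [[0.9, 0.1]] * 8,
    "c": [[0.0, 1.0]] * 8,
}
ranking = retrieve("a", videos)  # ["b", "c"]
```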

slide-15
SLIDE 15

EVENT RETRIEVAL

Method                   mAP (%)
Chance                    6.53
Two-stream pre-trained   20.09
fc6                      20.08
fc7                      21.24
Model (no future)        21.30
Model (no hard neg.)     24.22
Model (best)             25.07
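
The slide does not say how mAP was computed. The sketch below uses the common definition (mean over queries of average precision on the ranked list), which is an assumption here; the relevance lists are made up for illustration.

```python
def average_precision(relevance):
    """relevance: 0/1 flags in ranked order (1 = same event as the query)."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    """Mean of average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Two hypothetical queries with their ranked relevance flags.
ap_first = average_precision([1, 0, 1])           # (1/1 + 2/3) / 2
map_score = mean_average_precision([[1, 0, 1], [0, 1]])
```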

slide-16
SLIDE 16

EVENT RETRIEVAL

slide-17
SLIDE 17

SAMPLE VIDEOS

Awesome Parkour and Freerunning 20...

Skateboarding Montage 2015

slide-18
SLIDE 18

TEMPORAL ORDER RECOVERY

2 1 4 3 → 1 2 3 4

slide-19
SLIDE 19

TEMPORAL ORDER RECOVERY

METHOD

Given $\{ s_{v_j} \mid s_{v_j} \in v_j \}$, until done:

  1. Average the last two frame embeddings.
  2. Find the next frame as the frame with the highest similarity.
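
The greedy procedure above can be sketched as follows. The slide does not say how the sequence is seeded, so this sketch assumes the first two frames are known; the function names and the toy embeddings (unit vectors at increasing angles) are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def recover_order(embeddings, first, second):
    """Greedily order frame indices, starting from two known frames (assumed)."""
    order = [first, second]
    remaining = set(range(len(embeddings))) - {first, second}
    while remaining:
        # Step 1: average the last two frame embeddings.
        last_two = zip(embeddings[order[-2]], embeddings[order[-1]])
        context = [(x + y) / 2 for x, y in last_two]
        # Step 2: the next frame is the remaining one most similar to that average.
        best = max(sorted(remaining), key=lambda i: dot(context, embeddings[i]))
        order.append(best)
        remaining.remove(best)
    return order

def unit(angle_degrees):
    """Unit vector at the given angle, standing in for a frame embedding."""
    radians = math.radians(angle_degrees)
    return [math.cos(radians), math.sin(radians)]

# Shuffled frames whose true order follows increasing angle: 0, 10, 20, 30, 40.
shuffled = [unit(20), unit(0), unit(40), unit(10), unit(30)]
order = recover_order(shuffled, first=1, second=3)  # [1, 3, 0, 4, 2]
```

Because each step looks only at the last two frames, one early mistake can propagate through the rest of the sequence.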

slide-20
SLIDE 20

TEMPORAL ORDER RECOVERY

Method               Kendall Tau
Chance               50
Two-stream           42.05
fc6                  42.43
fc7                  41.67
Model (pairwise)     42.03
Model (no future)    40.91
Model (best)         40.41
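
The slide does not define the metric. One reading consistent with a chance value of 50 is the normalized Kendall tau distance in percent, i.e. the share of frame pairs placed in the wrong relative order, where lower is better. A sketch under that assumption:

```python
from itertools import combinations

def kendall_tau_distance_percent(predicted, truth):
    """Percentage of item pairs whose relative order differs between orderings."""
    position = {item: index for index, item in enumerate(predicted)}
    # combinations() yields pairs (a, b) with a before b in the true order.
    pairs = list(combinations(truth, 2))
    discordant = sum(1 for a, b in pairs if position[a] > position[b])
    return 100.0 * discordant / len(pairs)

truth = [1, 2, 3, 4]
perfect = kendall_tau_distance_percent([1, 2, 3, 4], truth)   # 0.0
reversed_ = kendall_tau_distance_percent([4, 3, 2, 1], truth)  # 100.0
swapped = kendall_tau_distance_percent([2, 1, 4, 3], truth)    # 2 of 6 pairs flipped
```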

slide-21
SLIDE 21

TEMPORAL ORDERING FOR PHOTOS

slide-22
SLIDE 22

DISCUSSION

  • How are long-distance dependencies captured?
  • Can we estimate the quality of embeddings independent of application?
  • Hyper-parameter tuning: fps sampling, embedding dimension, negative selection, context representation

slide-23
SLIDE 23

SOURCES

  • Groundhog Day, 1993, Columbia Pictures
  • Word2Vec: An Introduction
  • Unsupervised Learning of Visual Representations using Videos by Nitish Srivastava
  • Visualizing Data using t-SNE by van der Maaten
  • Fox Over Dog Picture
  • Efficient Estimation of Word Representations in Vector Space by Mikolov