Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan, Andrew Zisserman
Cemil Zalluhoğlu
Introduction
Aim: Extend deep Convolutional Networks to action recognition in video.
- $d_t$: displacement vector field between the pair of consecutive frames $t$ and $t+1$.
- $d_t(u,v)$: the displacement vector at point $(u,v)$ in frame $t$, which moves the point to the corresponding point in the following frame $t+1$.
- $d_t^x$, $d_t^y$: horizontal and vertical components of the vector field.
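Let $w$ and $h$ be the width and height of a video and $L$ the number of consecutive frames. The input volume $I_\tau \in \mathbb{R}^{w \times h \times 2L}$ has $2L$ channels because each frame contributes two flow components ($d_t^x$ and $d_t^y$).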
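As a minimal NumPy sketch of the stacking (not the authors' code), assume each displacement field $d_t$ is stored as an array of shape (h, w, 2), rows first, with channel 0 holding $d_t^x$ and channel 1 holding $d_t^y$. The hypothetical function `stack_optical_flow` interleaves the two components of the $L$ fields $d_\tau, \dots, d_{\tau+L-1}$, so channels $2k$ and $2k+1$ (0-based) of the volume come from $d_{\tau+k}$; this channel order is an assumption of the sketch.

```python
import numpy as np


def stack_optical_flow(flows, tau, L):
    """Stack L consecutive flow fields into an (h, w, 2L) input volume.

    flows: sequence of arrays; flows[t] has shape (h, w, 2), where
           flows[t][..., 0] is d_t^x and flows[t][..., 1] is d_t^y.
    tau:   index of the first flow field in the stack.
    L:     number of consecutive flow fields to stack.
    """
    if tau < 0 or tau + L > len(flows):
        raise ValueError("not enough flow fields after frame tau")
    h, w, _ = flows[tau].shape
    volume = np.empty((h, w, 2 * L), dtype=np.float32)
    for k in range(L):
        d = flows[tau + k]
        volume[..., 2 * k] = d[..., 0]      # horizontal component of d_{tau+k}
        volume[..., 2 * k + 1] = d[..., 1]  # vertical component of d_{tau+k}
    return volume


# Usage with random stand-in flow fields (h=4, w=5), L=10.
rng = np.random.default_rng(0)
flows = [rng.standard_normal((4, 5, 2)) for _ in range(12)]
print(stack_optical_flow(flows, tau=1, L=10).shape)  # (4, 5, 20)
```

The array here is laid out as (h, w, 2L), the row-major transpose of the $w \times h \times 2L$ order above; only the memory layout differs, not the content.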
Construct an input volume $I_\tau$ by stacking $L/2$ forward flows between frames $\tau$ and $\tau + L/2$ and $L/2$ backward flows between frames $\tau - L/2$ and $\tau$. The input $I_\tau$ thus has the same number of channels ($2L$) as before.
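A sketch of the bi-directional variant under stated assumptions: `forward[t]` is taken to be the flow from frame t to frame t+1 and `backward[t]` the flow from frame t+1 back to frame t, both of shape (h, w, 2). These indexing conventions, the forward-then-backward channel order, and the function names are hypothetical. The point it illustrates is that L/2 + L/2 = L fields still give 2L channels.

```python
import numpy as np


def interleave_xy(fields):
    """Stack (h, w, 2) flow fields into one (h, w, 2n) array, x/y interleaved."""
    stacked = np.stack(fields, axis=2)   # shape (h, w, n, 2)
    h, w, n, _ = stacked.shape
    return stacked.reshape(h, w, 2 * n)  # channel 2k = x, 2k+1 = y of field k


def bidirectional_flow_volume(forward, backward, tau, L):
    """Hypothetical conventions (not from the slides):
    forward[t]:  flow from frame t to frame t + 1, shape (h, w, 2)
    backward[t]: flow from frame t + 1 back to frame t, shape (h, w, 2)
    """
    if L % 2 != 0:
        raise ValueError("L must be even")
    half = L // 2
    if tau - half < 0 or tau + half > len(forward) or tau > len(backward):
        raise ValueError("tau is too close to the start or end of the video")
    fwd = list(forward[tau:tau + half])   # L/2 forward flows, frames tau .. tau + L/2
    bwd = list(backward[tau - half:tau])  # L/2 backward flows, frames tau - L/2 .. tau
    return interleave_xy(fwd + bwd)       # L fields -> 2L channels, as in the unidirectional case
```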
For camera motion compensation, subtract its mean vector from each displacement field $d$.
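A minimal sketch of the mean-vector subtraction, again assuming an (h, w, 2) layout for illustration; `subtract_mean_flow` is a hypothetical name. Averaging over both spatial axes gives one 2-D mean displacement, which is removed from every pixel, so a uniform translation such as a camera pan is cancelled.

```python
import numpy as np


def subtract_mean_flow(d):
    """Subtract the mean displacement vector from one flow field.

    d: array of shape (h, w, 2) holding d^x and d^y at every pixel.
    """
    mean_vec = d.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 2): mean x and y
    return d - mean_vec                             # broadcast over all pixels


# A uniform pan of (+3, -1) pixels everywhere becomes zero motion.
pan = np.tile(np.array([3.0, -1.0]), (4, 5, 1))
print(np.abs(subtract_mean_flow(pan)).max())  # 0.0
```

A single mean vector only removes a global translation; camera rotation or zoom produces spatially varying flow that this step does not cancel.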
videos respectively.
connected layer:
coming from the respective dataset.
weight derivatives can be found by back-propagation.
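The fragments above (a softmax layer on the last fully-connected layer, losses on videos coming from the respective dataset, weight derivatives by back-propagation) suggest a multi-task setup with one classification head per dataset on shared features. A minimal NumPy sketch of the combined loss under that reading follows; summing the per-task losses is an assumption here, since the combination rule is cut off in this excerpt, and `multitask_loss`, `softmax_cross_entropy`, and the (W, b) head layout are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under a softmax over logits (n, c)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def multitask_loss(features, dataset_ids, labels, heads):
    """Sum of per-dataset softmax losses on shared features.

    features:    (n, f) output of the shared fully-connected layer
    dataset_ids: (n,) integer id of the dataset each sample comes from
    labels:      (n,) class index within that sample's dataset
    heads:       list of (W, b) with W: (f, c_i), b: (c_i,), one per dataset
    """
    total = 0.0
    for task_id, (W, b) in enumerate(heads):
        mask = dataset_ids == task_id  # each head sees only its own dataset's samples
        if not mask.any():
            continue
        logits = features[mask] @ W + b
        total += softmax_cross_entropy(logits, labels[mask])
    return total
```

Because every head shares the same features, gradients from both datasets flow back into the shared layers, while each head's weights are updated only by its own dataset.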
cropped and flipped.
iterations, then to $10^{-4}$ after 70K iterations, and training is stopped after 80K iterations.
training stopped after 20K iterations.
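The schedule fragments describe a piecewise-constant (step) learning-rate schedule. A minimal sketch follows; `step_learning_rate` is a hypothetical helper, and it reads "after N iterations" as "from iteration N onward", which is an assumption. Only the $10^{-4}$ rate from 70K iterations and the stopping points at 80K and 20K iterations come from the text; the rate in force before 70K is cut off in this excerpt, so `EARLIER_LR` is a placeholder.

```python
def step_learning_rate(iteration, initial_lr, steps):
    """Piecewise-constant learning rate.

    steps: (start_iteration, lr) pairs sorted by start_iteration; the rate
           switches to lr from start_iteration onward (assumed reading of "after").
    """
    lr = initial_lr
    for start, new_lr in steps:
        if iteration < start:
            break
        lr = new_lr
    return lr


# Values from the text: 1e-4 from 70K iterations, stop at 80K (from scratch)
# or 20K (fine-tuning). EARLIER_LR is a hypothetical placeholder: the rate used
# before 70K iterations is cut off in this excerpt.
EARLIER_LR = 1e-3
SCRATCH_STEPS = [(70_000, 1e-4)]
SCRATCH_MAX_ITERS = 80_000
FINETUNE_MAX_ITERS = 20_000

print(step_learning_rate(75_000, EARLIER_LR, SCRATCH_STEPS))  # 0.0001
```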
constitutes a 3.2 times speed-up over single-GPU training.
videos.
report the average accuracy over three splits on both UCF-101 and HMDB-51.
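Averaging over the three splits can be read as the unweighted mean of the per-split accuracies; that reading and the helper `mean_accuracy_over_splits` are assumptions of this sketch, not the authors' evaluation code.

```python
def mean_accuracy_over_splits(per_split):
    """Unweighted mean of per-split accuracies.

    per_split: list of (num_correct, num_test_videos) pairs, one per split.
    """
    if not per_split:
        raise ValueError("need at least one split")
    accuracies = [correct / total for correct, total in per_split]
    return sum(accuracies) / len(accuracies)
```

Pooling all test videos across splits would give larger splits more weight; the unweighted mean treats each split equally.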