Gate-Shift Networks for Video Action Recognition
Swathikiran Sudhakaran 1, Sergio Escalera 2,3, Oswald Lanz 1
1 Fondazione Bruno Kessler, Italy; 2 Computer Vision Center, Spain; 3 Universitat de Barcelona, Spain
Motivation
Video action recognition requires spatio-temporal reasoning
The large number of parameters in 3D CNNs requires large-scale annotated data for training
Existing approaches address this problem with a hard-wired decomposition of the 3D kernels, which is suboptimal
GSM leverages spatial gating for adaptive feature propagation
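As a rough illustration of the parameter argument above, the sketch below counts the weights of a single 3x3x3 layer under three designs: a full 3D kernel (as in C3D), a spatial-then-temporal factorization (in the spirit of S3D / R(2+1)D), and a plain 2D kernel combined with a parameter-free temporal shift (in the spirit of TSM). The channel width of 64, the absence of a bottleneck, and the function names are assumptions made for illustration. Biases are ignored, and the small gating layer that GSM adds is not counted.

```python
def conv3d_params(c_in, c_out, kt=3, kh=3, kw=3):
    """Weights of a full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def conv2plus1d_params(c_in, c_out, c_mid=None, kt=3, kh=3, kw=3):
    """Weights of a 1 x kh x kw spatial conv followed by a kt x 1 x 1 temporal conv."""
    if c_mid is None:
        c_mid = c_out  # assumption: no bottleneck between the two convolutions
    spatial = c_in * c_mid * kh * kw
    temporal = c_mid * c_out * kt
    return spatial + temporal


def conv2d_params(c_in, c_out, kh=3, kw=3):
    """Weights of a 2D convolution; a fixed temporal shift adds no weights."""
    return c_in * c_out * kh * kw


channels = 64  # hypothetical layer width, chosen only for the example
full_3d = conv3d_params(channels, channels)
factorized = conv2plus1d_params(channels, channels)
shift_based = conv2d_params(channels, channels)

print("full 3D kernel:      ", full_3d)
print("(2+1)D factorization:", factorized)
print("2D kernel + shift:   ", shift_based)
```

Under these assumptions the full 3D layer carries three times the weights of the 2D layer, which is the gap the decomposition approaches try to close.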
[Figure: spatio-temporal kernel structure over the T, H x W and C axes for C3D, S3D / R(2+1)D, TSM and GSM]
GSM learns a flexible, data-dependent decomposition of 3D kernels with fewer parameters and lower computational overhead
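A minimal sketch of the gate-then-shift idea follows, under strong simplifying assumptions. Spatial positions are dropped, so the input is just x[t][c]. The learned spatial gating layer is replaced by a hypothetical scalar gate tanh(gate_weight * x). The first half of the channels is shifted forward in time and the second half backward. The names gate_shift, shift_time and gate_weight are illustrative and are not taken from the authors' code.

```python
import math


def shift_time(seq, step):
    """Shift per-frame values by `step` frames, zero-padding the ends."""
    n = len(seq)
    out = [0.0] * n
    for i in range(n):
        j = i - step
        if 0 <= j < n:
            out[i] = seq[j]
    return out


def gate_shift(x, gate_weight=1.0):
    """Apply a toy gate-shift to x[t][c] and return a list of the same shape.

    Assumption: the gate is tanh(gate_weight * value), a stand-in for a
    learned gating layer. The gated part of each feature is shifted in
    time; the remainder (residual) stays at its original frame.
    """
    t_len, c_len = len(x), len(x[0])
    half = c_len // 2
    out = [[0.0] * c_len for _ in range(t_len)]
    for c in range(c_len):
        channel = [x[t][c] for t in range(t_len)]
        gated = [math.tanh(gate_weight * v) * v for v in channel]
        residual = [v - g for v, g in zip(channel, gated)]
        step = 1 if c < half else -1  # forward for first group, backward for second
        shifted = shift_time(gated, step)
        for t in range(t_len):
            out[t][c] = shifted[t] + residual[t]
    return out


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 frames, 2 channels
closed = gate_shift(x, gate_weight=0.0)   # gate closed: nothing is shifted
opened = gate_shift(x, gate_weight=50.0)  # gate saturated: everything is shifted
```

With the gate closed the features pass through unchanged, as in a purely 2D network. With the gate saturated every feature is shifted, as in a fixed channel shift. Intermediate gate values route only part of each feature through time, which is what makes the propagation data dependent in this toy version.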
Example classes: "Putting something similar to other things that are already on the table", "Unfolding something"
Ablation study on Sth-V1
TSN: 10.45M parameters, 16.37G FLOPs
Gate-Shift TSN: 10.50M parameters, 16.46G FLOPs, +29% accuracy over TSN
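The overhead implied by the ablation figures can be checked with a short calculation. It assumes the first pair of numbers (10.45M, 16.37G) belongs to the TSN baseline and the second pair (10.50M, 16.46G) to Gate-Shift TSN, with M read as millions of parameters and G as giga-FLOPs. The helper name relative_increase is illustrative.

```python
def relative_increase(base, new):
    """Percentage increase of `new` over `base`."""
    return 100.0 * (new - base) / base


# Figures as given on the slide; the pairing with each model is an assumption.
tsn_params_m, gsm_params_m = 10.45, 10.50
tsn_flops_g, gsm_flops_g = 16.37, 16.46

param_overhead = relative_increase(tsn_params_m, gsm_params_m)
flop_overhead = relative_increase(tsn_flops_g, gsm_flops_g)

print(f"parameters: +{param_overhead:.2f}%")
print(f"FLOPs:      +{flop_overhead:.2f}%")
```

Under this reading, both overheads are well below one percent of the baseline.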
State-of-the-art recognition accuracy of 55% on Something-Something-V1