Gate-Shift Networks for Video Action Recognition

Swathikiran Sudhakaran (1), Sergio Escalera (2,3), Oswald Lanz (1)

(1) Fondazione Bruno Kessler, Italy
(2) Computer Vision Center, Spain
(3) Universitat de Barcelona, Spain


SLIDE 1

Gate-Shift Networks for Video Action Recognition

Swathikiran Sudhakaran (1), Sergio Escalera (2,3), Oswald Lanz (1)

(1) Fondazione Bruno Kessler, Italy
(2) Computer Vision Center, Spain
(3) Universitat de Barcelona, Spain

SLIDE 2

Motivation

Video action recognition requires spatio-temporal reasoning.

Examples: "Putting something similar to other things that are already on the table" and "Taking one of many similar things on the table".

SLIDE 3

Contribution

The large number of parameters in 3D CNNs requires large-scale annotated data for training.

Existing approaches address this problem with a hard-wired decomposition of the 3D kernels, which is suboptimal.

GSM leverages spatial gating for adaptive feature propagation.

[Figure: kernel structure of C3D, S3D / R(2+1)D, TSM, and GSM along the T, H x W, and C axes]
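The parameter argument on this slide can be made concrete with a small count. The sketch below is not taken from the slides: the 3x3x3 kernel size, the 64-channel layer, and the choice of an intermediate width equal to the output width are all assumptions picked for illustration, and biases are ignored.

```python
def conv3d_params(c_in, c_out, kt=3, kh=3, kw=3):
    """Weights in a full 3D convolution (C3D-style), bias ignored."""
    return kt * kh * kw * c_in * c_out


def factorized_params(c_in, c_out, c_mid=None, kt=3, kh=3, kw=3):
    """Weights in a spatial conv followed by a temporal conv (S3D / R(2+1)D-style)."""
    if c_mid is None:
        c_mid = c_out  # assumption: intermediate width equals output width
    spatial = kh * kw * c_in * c_mid   # 1 x kh x kw kernel
    temporal = kt * c_mid * c_out      # kt x 1 x 1 kernel
    return spatial + temporal


def shift_params(c_in, c_out, kh=3, kw=3):
    """Weights in a 2D conv preceded by a parameter-free temporal shift (TSM-style)."""
    return kh * kw * c_in * c_out


# Hypothetical layer with 64 input and 64 output channels.
channels = 64
counts = {
    "full 3D": conv3d_params(channels, channels),
    "spatial + temporal": factorized_params(channels, channels),
    "shift + 2D": shift_params(channels, channels),
}
for name, n in counts.items():
    print(f"{name}: {n} weights")
```

Under these assumptions the full 3D kernel needs the most weights and the shift-based layer the fewest, which is the trade-off the fixed decompositions make at the cost of flexibility.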

SLIDE 4

[Figure, repeated from slide 3: kernel structure of C3D, S3D / R(2+1)D, TSM, and GSM along the T, H x W, and C axes]

SLIDE 5

GSM yields a flexible, data-dependent decomposition of 3D kernels, with reduced parameter count and computational overhead.

[Figure: GSM compared with C3D, S3D / R(2+1)D, and TSM]
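One way to picture a data-dependent alternative to a fixed temporal shift is sketched below. It is an illustrative reading, not the authors' implementation: the gate is passed in as a plain array rather than learned, spatial dimensions are collapsed so each feature is a single number indexed by time and channel, and the function names `temporal_shift` and `gate_shift` are hypothetical.

```python
def temporal_shift(y):
    """Shift the first half of the channels forward in time and the rest backward.

    y is a list of T time steps, each a list of C channel values.
    Positions with no source frame are zero-padded.
    """
    T, C = len(y), len(y[0])
    half = C // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            src = t - 1 if c < half else t + 1
            if 0 <= src < T:
                out[t][c] = y[src][c]
    return out


def gate_shift(x, gate):
    """Route the gated part of x through the temporal shift; the rest stays in place.

    gate has the same shape as x. In this sketch it is given, not learned (assumption).
    """
    T, C = len(x), len(x[0])
    gated = [[gate[t][c] * x[t][c] for c in range(C)] for t in range(T)]
    residual = [[x[t][c] - gated[t][c] for c in range(C)] for t in range(T)]
    shifted = temporal_shift(gated)
    return [[shifted[t][c] + residual[t][c] for c in range(C)] for t in range(T)]


# Three time steps, two channels.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
closed = [[0.0, 0.0] for _ in x]  # gate fully closed: nothing moves in time
opened = [[1.0, 1.0] for _ in x]  # gate fully open: behaves like a fixed shift

print(gate_shift(x, closed))
print(gate_shift(x, opened))
```

With the gate closed everywhere the features pass through unchanged, and with it open everywhere the result matches a hard-wired shift. Intermediate gate values mix the two, which is the sense in which the routing becomes data dependent in this sketch.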

SLIDE 6

Effectiveness of GSM

Ablation study on Sth-V1

TSN: 10.45M parameters, 16.37G FLOPs
Gate-Shift TSN: 10.50M parameters, 16.46G FLOPs
Accuracy gain from adding GSM: +29%

[Figure: TSN vs. Gate-Shift TSN on the examples "Putting sth similar to other things that are already on the table" and "Unfolding sth"]
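The relative overhead implied by the figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that the values ending in M are parameter counts in millions, that the values ending in G are FLOPs in billions, and that the first pair (10.45M, 16.37G) belongs to plain TSN and the second pair (10.50M, 16.46G) to Gate-Shift TSN; the slide transcript does not label them explicitly.

```python
def relative_overhead(base, new):
    """Percentage increase of `new` over `base`."""
    return (new - base) / base * 100.0


# Figures from the slide; the units are an assumption (see above).
tsn_params_m, gsm_params_m = 10.45, 10.50
tsn_flops_g, gsm_flops_g = 16.37, 16.46

params_pct = relative_overhead(tsn_params_m, gsm_params_m)
flops_pct = relative_overhead(tsn_flops_g, gsm_flops_g)

print(f"parameter overhead: {params_pct:.2f}%")
print(f"compute overhead:   {flops_pct:.2f}%")
```

Both overheads come out at roughly half a percent under these assumptions, set against the +29% gain reported on the same slide.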

SLIDE 7

State-of-the-art recognition accuracy of 55% on Something Something-V1
