Neural Attention for Object Tracking - Brian Cheung

SLIDE 1

April 4-7, 2016 | Silicon Valley

Brian Cheung bcheung@berkeley.edu

Redwood Center for Theoretical Neuroscience, UC Berkeley
Visual Computing Research, NVIDIA

Neural Attention for Object Tracking

SLIDE 2

Source: Wikipedia “School Bus”

SLIDE 3

Motivation

Solving complex vision problems

  • Question Answering
  • Search
  • Navigation

Two core components:

  • Attention
  • Memory

SLIDE 4

Emergent Properties from Attention

Xu et al. 2015

SLIDE 5

Recurrent Networks


[Diagram: recurrent network unrolled over time, with hidden state h(t) and input x(t)]
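For reference, a minimal vanilla RNN update sketched in NumPy; the sizes, random weights, and five-step unroll are illustrative only, not the networks used later in the talk:

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # One step of a vanilla recurrent network: h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b).
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    # Illustrative sizes: 10-dim input, 32-dim hidden state.
    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(10, 32))
    W_hh = rng.normal(scale=0.1, size=(32, 32))
    b_h = np.zeros(32)

    h = np.zeros(32)
    for t in range(5):                 # unroll over a few time steps
        x_t = rng.normal(size=10)      # stand-in for the input x(t)
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)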
SLIDE 6

Formulating a Glimpse


Parameters in the kernel control the layout of the attention window over the original image.

[Figure: glimpse kernels illustrating translation and scale]
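A rough sketch of such a kernel in one dimension, loosely following a DRAW-style Gaussian filterbank; the grid size, sigma, and 1-D simplification are illustrative choices rather than the talk's exact formulation:

    import numpy as np

    def gaussian_glimpse_1d(signal, center, stride, n=8, sigma=1.0):
        # Extract an n-sample glimpse: `center` translates the attention window,
        # `stride` scales it by spreading the n Gaussian filters wider or narrower.
        length = signal.shape[0]
        mu = center + stride * (np.arange(n) - (n - 1) / 2.0)    # filter centers
        positions = np.arange(length)
        filters = np.exp(-0.5 * ((positions[None, :] - mu[:, None]) / sigma) ** 2)
        filters /= filters.sum(axis=1, keepdims=True) + 1e-8     # normalize each filter
        return filters @ signal

    row = np.linspace(0.0, 1.0, 100)
    zoomed_in  = gaussian_glimpse_1d(row, center=50.0, stride=1.0)  # narrow window
    zoomed_out = gaussian_glimpse_1d(row, center=50.0, stride=6.0)  # wide window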

SLIDE 7

Spatial Transformer

Jaderberg et al. 2015

SLIDE 8

Spatial Transformer Network

Jaderberg et al. 2015
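The spatial transformer resamples the input through a grid produced from an affine transform θ. A compact NumPy sketch of that sampling step (affine grid plus bilinear interpolation); the single-channel image, output size, and example θ are illustrative assumptions:

    import numpy as np

    def spatial_transform(image, theta, out_h, out_w):
        # Sample `image` (H, W) through an affine grid defined by theta (2, 3).
        # Target coordinates live in [-1, 1]; theta maps them to source coordinates,
        # which are then read out with bilinear interpolation.
        H, W = image.shape
        ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                             np.linspace(-1, 1, out_w), indexing="ij")
        grid = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])   # (3, N)
        src = theta @ grid                                                  # (2, N)
        sx = (src[0] + 1) * (W - 1) / 2          # back to pixel coordinates
        sy = (src[1] + 1) * (H - 1) / 2
        x0 = np.clip(np.floor(sx).astype(int), 0, W - 2)
        y0 = np.clip(np.floor(sy).astype(int), 0, H - 2)
        wx, wy = sx - x0, sy - y0
        out = (image[y0, x0] * (1 - wx) * (1 - wy) + image[y0, x0 + 1] * wx * (1 - wy)
               + image[y0 + 1, x0] * (1 - wx) * wy + image[y0 + 1, x0 + 1] * wx * wy)
        return out.reshape(out_h, out_w)

    # A zoomed, shifted 16x16 glimpse of a 64x64 image (scale 0.5, translation +0.2 in x).
    img = np.random.default_rng(0).random((64, 64))
    theta = np.array([[0.5, 0.0, 0.2],
                      [0.0, 0.5, 0.0]])
    glimpse = spatial_transform(img, theta, 16, 16)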

SLIDE 9

Foveal Attention Network


Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]

SLIDE 10

Foveal Attention Network


Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]

SLIDE 11

Foveal Attention Network


Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]

SLIDE 12

Foveal Attention Network


Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]

SLIDE 13

Foveal Attention Network


Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]

SLIDE 14

Foveal Attention Network


Predicted class: ‘5’

Cheung et al. 2015

[Diagram: Image, Glimpse Network, Recurrent Network, Location Network, Classification Network]
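The slides above build this network up one glimpse at a time. A minimal sketch of the control flow, assuming the usual RAM-style wiring (glimpse network feeds the recurrent network, which drives the location and classification networks); the sizes, random linear maps, and fixed-crop glimpse stub are placeholders, not the trained model:

    import numpy as np

    rng = np.random.default_rng(0)
    GLIMPSE, HID, N_CLASSES = 64, 128, 10                    # illustrative sizes

    # Toy stand-ins for the learned sub-networks (random linear maps).
    W_g   = rng.normal(scale=0.05, size=(GLIMPSE + 2, HID))  # glimpse network
    W_h   = rng.normal(scale=0.05, size=(HID, HID))          # recurrent network
    W_loc = rng.normal(scale=0.05, size=(HID, 2))            # location network
    W_cls = rng.normal(scale=0.05, size=(HID, N_CLASSES))    # classification network

    def extract_glimpse(image, loc):
        # Placeholder: a real glimpse would crop/resize around `loc`.
        return image[:8, :8].ravel()

    image = rng.random((28, 28))
    h, loc = np.zeros(HID), np.zeros(2)                      # initial state, central fixation
    for t in range(4):                                       # a few glimpses
        g = extract_glimpse(image, loc)
        g_feat = np.tanh(np.concatenate([g, loc]) @ W_g)     # glimpse network
        h = np.tanh(g_feat + h @ W_h)                        # recurrent network
        loc = np.tanh(h @ W_loc)                             # location network: next fixation
    logits = h @ W_cls                                       # classification network
    prediction = int(np.argmax(logits))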

SLIDE 15

Benefits of Attention

  • Fewer parameters / less computation

    ○ Smaller convolutional network

  • Better performance

    ○ Significantly outperforms a ConvNet applied to the entire image
    ○ Breaks a complex problem down into a sequence of simpler problems
    ○ Filters out noise and distractors

  • Localization information comes for free

SLIDE 16

KITTI Tracking Dataset


  • 375×1240 video
  • Bounding boxes over time for cars, pedestrians, etc.

Geiger et al. 2012
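For reference, a small helper for turning one annotation line into a per-frame bounding-box record; the field order assumed below follows the published KITTI tracking label files (frame, track id, type, then the 2-D box at fields 6-9), and the sample line is made up:

    from dataclasses import dataclass

    @dataclass
    class TrackBox:
        frame: int       # video frame index
        track_id: int    # object identity over time
        label: str       # 'Car', 'Pedestrian', ...
        left: float      # 2-D bounding box in pixel coordinates
        top: float
        right: float
        bottom: float

    def parse_tracking_line(line: str) -> TrackBox:
        # Assumed layout: frame, track_id, type, truncated, occluded, alpha,
        # bbox_left, bbox_top, bbox_right, bbox_bottom, ... (3-D fields ignored).
        f = line.split()
        return TrackBox(int(f[0]), int(f[1]), f[2],
                        float(f[6]), float(f[7]), float(f[8]), float(f[9]))

    box = parse_tracking_line(
        "0 2 Pedestrian 0 0 -2.52 219.3 188.5 245.5 218.6 1.89 0.48 1.20 1.84 1.47 8.41 0.01")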

SLIDE 17

[Diagram: Convolutional Network, Recurrent Network, Localization Network, Grid Generator (θ), Tracking Network]

Generate image glimpse: glimpse(t) = T_θ(Image(t), θ_loc(t-1))

SLIDE 18

[Diagram: Convolutional Network, Recurrent Network, Localization Network, Grid Generator (θ), Tracking Network]

Generate features from the ConvNet: h_cnet(t) = f_cnet(glimpse(t))

SLIDE 19

[Diagram: Convolutional Network, Recurrent Network, Localization Network, Grid Generator (θ), Tracking Network]

Generate features from the Recurrent Network: h_rnn(t) = f_rnn(h_cnet(t), θ_loc(t-1), h_rnn(t-1))

SLIDE 20

[Diagram: Convolutional Network, Recurrent Network, Localization Network, Grid Generator (θ), Tracking Network]

Generate parameters for the next glimpse from the Localization Network: θ_loc(t) = f_loc(h_rnn(t-1))

SLIDE 21

[Diagram: Convolutional Network, Recurrent Network, Localization Network, Grid Generator (θ), Tracking Network]

Generate the tracking prediction from the Tracking Network: θ_pred(t), y_pres(t) = f_tracking(h_rnn(t-1))
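Putting slides 17-21 together, one tracking step chains the four updates above. A structural sketch with stubbed sub-networks; the stubs, sizes, and frame count are placeholders, and only the data flow follows the slide equations (the slides index the localization and tracking outputs off h_rnn(t-1); here they are computed at the end of each step from the freshly updated state, which is the same wiring viewed from the following frame):

    import numpy as np

    # Stubs standing in for the learned modules named on the slides.
    def f_grid(image_t, theta_loc_prev):          # Grid Generator / sampler T_theta
        return image_t[:16, :16]
    def f_cnet(glimpse):                          # Convolutional Network
        return glimpse.ravel()[:64]
    def f_rnn(h_cnet, theta_loc_prev, h_prev):    # Recurrent Network
        return np.tanh(h_cnet.sum() + theta_loc_prev.sum() + h_prev)
    def f_loc(h_rnn):                             # Localization Network
        return np.tanh(np.full(6, h_rnn))         # glimpse parameters for the next frame
    def f_tracking(h_rnn):                        # Tracking Network
        return np.tanh(np.full(4, h_rnn)), 1.0 / (1.0 + np.exp(-h_rnn))  # box, presence

    h_rnn, theta_loc = 0.0, np.zeros(6)
    frames = np.random.default_rng(0).random((5, 375, 1240))   # stand-in video
    for image_t in frames:
        glimpse = f_grid(image_t, theta_loc)       # glimpse(t) = T_theta(Image(t), theta_loc(t-1))
        h_cnet = f_cnet(glimpse)                   # h_cnet(t) = f_cnet(glimpse(t))
        h_rnn = f_rnn(h_cnet, theta_loc, h_rnn)    # h_rnn(t) = f_rnn(h_cnet(t), theta_loc(t-1), h_rnn(t-1))
        theta_pred, y_pres = f_tracking(h_rnn)     # tracking prediction for this frame
        theta_loc = f_loc(h_rnn)                   # parameters for the next glimpse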

SLIDE 22

Pretraining on Classification Task

[Diagram: Grid Generator, Convolutional Network, classifying glimpses into {‘Car’, ‘Pedestrian’, ‘Truck’, ‘Tram’, ‘Cyclist’, ‘Misc’, ‘Van’, ‘Person Sitting’}]

  • ~3% classification error on the validation set
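A minimal sketch of that pretraining objective, a softmax cross-entropy over the eight KITTI object classes; the random logits stand in for the ConvNet's output on one glimpse:

    import numpy as np

    CLASSES = ['Car', 'Pedestrian', 'Truck', 'Tram', 'Cyclist', 'Misc', 'Van', 'Person Sitting']

    def cross_entropy(logits, label_idx):
        # Softmax cross-entropy for one glimpse; `logits` holds one score per class.
        z = logits - logits.max()
        log_probs = z - np.log(np.exp(z).sum())
        return -log_probs[label_idx]

    rng = np.random.default_rng(0)
    loss = cross_entropy(rng.normal(size=len(CLASSES)), CLASSES.index('Car'))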

SLIDE 23

Pretraining on the Registration Task


[Diagram: Grid Generator, Convolutional Network, glimpse parameters θ]

SLIDE 24

Pretraining on the Registration Task


[Figure: input glimpse, predicted correction, actual correction]

  • A simpler task similar to tracking: fix a bad glimpse
  • Provides a useful training signal for the Localization Network
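One way such registration pairs can be constructed, sketched below; the parameter layout (translation x, translation y, scale) and the jitter range are illustrative assumptions rather than the talk's setup:

    import numpy as np

    def registration_example(true_params, max_jitter=0.2, rng=np.random.default_rng(0)):
        # Perturb well-placed glimpse parameters to simulate a bad glimpse;
        # the regression target for the Localization Network is the correction
        # that undoes the jitter.
        jitter = rng.uniform(-max_jitter, max_jitter, size=true_params.shape)
        bad_params = true_params + jitter
        correction = -jitter
        return bad_params, correction

    # (translation_x, translation_y, scale) of a well-placed glimpse.
    bad, target = registration_example(np.array([0.1, -0.3, 0.5]))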
SLIDE 25

Comparing Training Gradients

[Figure: training gradients with ConvNet pretraining vs. without pretraining (random initialization)]

SLIDE 26

Bouncing MNIST

SLIDE 27

Bouncing MNIST

SLIDE 28

Bouncing MNIST

[Plots: predicted vs. ground-truth x and y positions of the MNIST digit (Tracking Network output) and of the attention window (Localization Network output)]
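A rough sketch of how a bouncing-digit sequence with ground-truth positions can be generated; the canvas size, velocity range, and the white square standing in for an MNIST digit are illustrative:

    import numpy as np

    def bounce_sequence(digit, canvas=64, steps=20, rng=np.random.default_rng(0)):
        # Move a digit patch across a blank canvas, reflecting its velocity at the
        # borders, and record the ground-truth (x, y) position of every frame.
        h, w = digit.shape
        pos = rng.uniform(0, [canvas - w, canvas - h])   # (x, y)
        vel = rng.uniform(-3, 3, size=2)
        frames, positions = [], []
        for _ in range(steps):
            pos += vel
            for i, limit in enumerate((canvas - w, canvas - h)):
                if pos[i] < 0 or pos[i] > limit:         # bounce off the border
                    vel[i] = -vel[i]
                    pos[i] = np.clip(pos[i], 0, limit)
            x, y = pos.astype(int)
            frame = np.zeros((canvas, canvas))
            frame[y:y + h, x:x + w] = digit
            frames.append(frame)
            positions.append((x, y))
        return np.stack(frames), positions

    frames, xy = bounce_sequence(np.ones((28, 28)))      # white square as a stand-in digit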

SLIDE 29

Conclusions

  • End-to-end visual attention works for simple tasks

  • Robust to the encoding of the attention parameters

SLIDE 30

Conclusions

  • Difficult to train on more complex tasks

    ○ First Step toward Model-Free, Anonymous Object Tracking with Recurrent Neural Networks (Gan et al. 2015)
    ○ RATM: Recurrent Attentive Tracking Model (Kahou et al. 2015)

  • Scaling computational costs

SLIDE 31

Future Work

  • Integrate more tailored components

    ○ Spatial Memory (Weiss et al. 2015)

  • Train compact ImageNet models for initialization

  • Exploration/unsupervised strategies to recover from mistakes

    ○ Error-based attention (Rezende et al. 2016)

SLIDE 32

Acknowledgements

Special thanks to: Shalini Gupta, Jan Kautz, Pavlo Molchanov, Stephen Tyree, Eric Weiss
