Recurrent Neural Networks for Person Re-identification Revisited



SLIDE 1

Recurrent Neural Networks for Person Re-identification Revisited

Jean-Baptiste Boin, Stanford University, jbboin@stanford.edu
André Araujo, Google AI, andrearaujo@google.com
Bernd Girod, Stanford University, bgirod@stanford.edu

SLIDE 2

Person video re-identification

▪ Goal: associate person video tracks from different cameras
▪ Applications:
  › Video surveillance
  › Home automation
  › Crowd dynamics understanding


Image credit: PRID2011 dataset [Hirzer et al., 2011]

SLIDE 3

Person video re-identification: challenges

▪ Lighting variations
▪ Clothing similarity
▪ Viewpoint changes
▪ Background clutter and occlusions

Credit: iLIDS-VID dataset [Wang et al., 2014]

SLIDE 4

Framework: re-identification by retrieval

▪ Sequence feature extraction for the database tracks (camera A) and the query track (camera B)
▪ Sequence matching by feature similarity
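The retrieval step can be sketched as below; the sequence feature extractor is abstracted away, and the feature dimension and the use of cosine similarity are illustrative assumptions, not details from the slides:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank database (camera A) tracks by cosine similarity to the query (camera B)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                 # cosine similarity of each gallery track to the query
    return np.argsort(-sims)     # gallery indices, best match first

# Toy example: 3 gallery tracks with 4-dim sequence features.
gallery = np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.0])
print(rank_gallery(query, gallery))  # gallery track 1 is the closest match
```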

SLIDE 5

Related work

▪ Most common setup
  › Frame feature extraction: CNN
  › Sequence processing: RNN
  › Temporal pooling: mean pooling

› [McLaughlin et al., 2016], [Yan et al., 2016], [Wu et al., 2016]

[Figure: pipeline diagram — per-frame CNN features fed to an RNN, followed by mean pooling into a sequence feature]

SLIDE 6

Related work


▪ Extensions
  › Bi-directional RNNs [Zhang et al., 2017]
  › Multi-scale + attention pooling [Xu et al., 2017]
  › Fusion of CNN+RNN features [Chen et al., 2017]
▪ See review paper [Zheng et al., 2016]


SLIDE 7

Outline

▪ Feed-forward RNN approximation with similar representational power
▪ New training protocol to leverage multiple video tracks within a mini-batch
▪ Experimental evaluation
▪ Conclusions

SLIDE 8

RNN setup

▪ Per-frame CNN features f^(t) are fed to the RNN block:

  o^(t) = W_i f^(t) + W_s r^(t-1),   r^(t) = tanh(o^(t))

[Figure: RNN block with input weights W_i, recurrent weights W_s, and tanh nonlinearity]

SLIDE 9

Proposed feed-forward approximation (1/2)

▪ "Short-term dependency" approximation

  › Disregard terms from step (t-2) and earlier in the output at step (t):

    o^(t) ≈ W_i f^(t) + W_s tanh(W_i f^(t-1))

SLIDE 10

Proposed feed-forward approximation (2/2)

▪ "Long sequence" approximation

  › Using the approximation from the previous slide, the pooled sequence feature sums the shifted term W_s tanh(W_i f^(t-1)) over the track
  › Disregard edge cases (first and last frame) since videos are long, so the shifted term can be replaced by W_s tanh(W_i f^(t))
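The two approximations can be checked numerically. Below is a minimal sketch, assuming the recurrence o(t) = W_i f(t) + W_s r(t-1) with r(t) = tanh(o(t)) (as in [McLaughlin et al., 2016]) and the recurrence-free block õ(t) = W_i f(t) + W_s tanh(W_i f(t)); the dimensions, weight scales, and sequence length are arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 500
W_i = rng.normal(size=(d, d)) * 0.3   # input weights
W_s = rng.normal(size=(d, d)) * 0.05  # recurrent weights, kept small for stability
f = rng.normal(size=(T, d))           # stand-in per-frame CNN features

# RNN: o(t) = W_i f(t) + W_s r(t-1),  r(t) = tanh(o(t))
r = np.zeros(d)
outputs = []
for t in range(T):
    o = W_i @ f[t] + W_s @ r
    r = np.tanh(o)
    outputs.append(o)
pooled_rnn = np.mean(outputs, axis=0)

# FNN: o~(t) = W_i f(t) + W_s tanh(W_i f(t)) -- no recurrent connection
proj = f @ W_i.T
pooled_fnn = (proj + np.tanh(proj) @ W_s.T).mean(axis=0)

# For long sequences, the two mean-pooled features nearly coincide.
cos = pooled_rnn @ pooled_fnn / (np.linalg.norm(pooled_rnn) * np.linalg.norm(pooled_fnn))
print(f"cosine similarity between pooled RNN and FNN features: {cos:.3f}")
```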

SLIDE 11

Proposed feed-forward approximation: new block

▪ Ours: FNN block

  õ^(t) = W_i f^(t) + W_s tanh(W_i f^(t))

▪ Same memory footprint as the RNN block
▪ Direct mapping between RNN and FNN parameters

[Figure: side-by-side comparison of the recurrent RNN block and the proposed feed-forward FNN block]

SLIDE 12

Training pipeline

▪ Training data

[Figure: video tracks from camera A and camera B, each track composed of frames]

SLIDE 13

Training pipeline: RNN baseline

▪ SEQ: load sequences of consecutive frames in a mini-batch

SLIDE 14

Proposed FNN training pipeline

▪ FRM: load independent frames
▪ Load images from many more identities in a mini-batch (same memory/computational cost)

[Figure: mini-batch composition under SEQ (baseline) vs. FRM (ours)]
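The two sampling strategies can be sketched as below; the identity count, track length, and batch size are made-up toy values:

```python
import random

def seq_batch(tracks, n_tracks=2, seq_len=8):
    """SEQ (baseline): consecutive frames from a handful of tracks."""
    batch = []
    for pid in random.sample(list(tracks), n_tracks):
        frames = tracks[pid]
        start = random.randrange(len(frames) - seq_len + 1)
        batch += [(pid, f) for f in frames[start:start + seq_len]]
    return batch

def frm_batch(tracks, batch_size=16):
    """FRM (ours): independent frames, covering many more identities."""
    pids = random.choices(list(tracks), k=batch_size)
    return [(pid, random.choice(tracks[pid])) for pid in pids]

random.seed(0)
# Toy data: 50 identities with 100-frame tracks (frame indices stand in for images).
tracks = {pid: list(range(100)) for pid in range(50)}
print(len({pid for pid, _ in seq_batch(tracks)}), "identities in a SEQ batch of 16")
print(len({pid for pid, _ in frm_batch(tracks)}), "identities in a FRM batch of 16")
```

Both batches cost the same to process, but an FRM batch exposes training to far more identities per update.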

SLIDE 15

Data and experimental protocol

▪ Dataset 1: PRID2011 [Hirzer et al., 2011]
  › 200 identities, average length: 100 frames / track
▪ Dataset 2: iLIDS-VID [Wang et al., 2014]
  › 300 identities, average length: 71 frames / track
▪ Data splits
  › Train/test sets with half of the identities each
  › Performance averaged over 20 splits
▪ Evaluation metric: CMC (equivalent to mean accuracy at rank k)
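A minimal sketch of the CMC metric, assuming a single correct gallery track per query identity; the toy similarity matrix below is made up:

```python
import numpy as np

def cmc(sim, query_ids, gallery_ids, max_rank=5):
    """CMC: fraction of queries whose correct match appears within the top k ranks.
    Assumes exactly one correct gallery track per query."""
    hits = np.zeros(max_rank)
    for i, qid in enumerate(query_ids):
        order = np.argsort(-sim[i])                       # gallery sorted by similarity
        rank = np.where(gallery_ids[order] == qid)[0][0]  # rank of the true match
        if rank < max_rank:
            hits[rank:] += 1
    return hits / len(query_ids)

# Toy example: 3 queries vs. 3 gallery tracks; true matches ranked 1st, 2nd, 3rd.
sim = np.array([[0.9, 0.2, 0.1],
                [0.8, 0.5, 0.1],
                [0.7, 0.6, 0.2]])
query_ids = np.array([0, 1, 2])
gallery_ids = np.array([0, 1, 2])
print(cmc(sim, query_ids, gallery_ids, max_rank=3))
```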

SLIDE 16

Experiment: Influence of the recurrent connection

▪ Train weights on RNN-SEQ (RNN architecture, SEQ training protocol)
▪ Evaluate on both RNN and FNN using the weights directly (no re-training)
▪ Same performance obtained

PRID2011 dataset

SLIDE 17

Experiment: Comparison with baseline

▪ FNN-FRM (ours) outperforms RNN-SEQ
▪ More diversity in mini-batches allows for much better training

SLIDE 18

Comparison with baseline (comprehensive)


▪ Our method outperforms the baseline for all ranks in both datasets

CMC values (in %)

SLIDE 19

Comparison with state-of-the-art RNN methods

▪ Our method is considerably simpler than the other state-of-the-art RNN methods compared, yet achieves comparable performance

CMC values (in %)

SLIDE 20

Conclusions

▪ Simple feed-forward RNN approximation with similar representational power
▪ New training protocol to leverage multiple video sequences within a mini-batch
▪ Results significantly and consistently improved compared to the baseline
▪ Results on par with or better than other published RNN-based work, with a much simpler technique
▪ Faster model training compared to the RNN baseline

SLIDE 21

Questions?