Crowd Scene Understanding with Coherent Recurrent Neural Networks



SLIDE 1

Crowd Scene Understanding with Coherent Recurrent Neural Networks

Hang Su, Yinpeng Dong, Jun Zhu

Department of Computer Science and Technology, Tsinghua University

July 12, 2016

SLIDE 2

Outline

1 Introduction
2 LSTM Recap
3 Coherent LSTM
4 Experimental Results
5 Conclusion

SLIDE 6

Background

Understanding collective behaviors has a wide range of applications in video surveillance and crowd management.
In real scenes, pedestrians tend to form groups, and their trajectories are influenced by other pedestrians and by obstacles.
The main challenges of crowd motion analysis are nonlinear dynamics and coherent motion.

SLIDE 8

Problem Formulation

Obtain reliable tracklets from each scene using KLT trackers.
At any time instant t, the i-th person is represented by the coordinate $(x_i(t), y_i(t))$.
Predict future trajectories of pedestrians and use the extracted hidden features to recognize crowd motions.

SLIDE 11

Previous Work

Social Force model
  Optimize energy function
  Hand-crafted functions
  Hard to generalize

Probabilistic Forecasting
  Gaussian Process

Recurrent Neural Networks
  N-LSTM [Alahi et al., 2016]

SLIDE 14

LSTM

Structure
  Input / output / forget gates
  Memory state $c_t$

Advantage
  Prevents the vanishing gradient problem
  Nonlinear characteristic
  Generalization

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \qquad (1)$$
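
The memory update in Eq. (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `matvec` and `cell_update` are hypothetical, the gate values `f_t` and `i_t` are passed in as given numbers (in a full LSTM they come from sigmoid gates), and the weights are toy values rather than anything from the talk.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cell_update(f_t, i_t, c_prev, x_t, h_prev, W_xc, W_hc, b_c):
    """Eq. (1): c_t = f_t * c_{t-1} + i_t * tanh(W_xc x_t + W_hc h_{t-1} + b_c).

    All products between vectors are element-wise.
    """
    pre = [a + b + c
           for a, b, c in zip(matvec(W_xc, x_t), matvec(W_hc, h_prev), b_c)]
    return [f * c + i * math.tanh(p)
            for f, c, i, p in zip(f_t, c_prev, i_t, pre)]


# Toy setup (assumed values): 2 memory units, 2-D input (x, y coordinate).
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ZEROS = [[0.0, 0.0], [0.0, 0.0]]

# Forget gate fully open, input gate closed: the memory is carried over.
kept = cell_update([1.0, 1.0], [0.0, 0.0], [0.3, -0.2],
                   [1.0, 2.0], [0.1, 0.1], IDENTITY, IDENTITY, [0.0, 0.0])

# Forget gate closed, input gate open: the memory is replaced by the candidate.
fresh = cell_update([0.0, 0.0], [1.0, 1.0], [0.3, -0.2],
                    [0.5, -0.5], [0.0, 0.0], IDENTITY, ZEROS, [0.0, 0.0])
```

The two toy calls show the two extremes of the update: the forget gate decides how much of the old memory survives, and the input gate decides how much of the new candidate is written.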

SLIDE 18

Why Coherent LSTM?

LSTM can model individual behaviors but cannot capture the interactions within a group.
When the neighboring relationships of individuals remain invariant over time and the correlation of their velocities remains high, they tend to have similar hidden states.
The trajectories of pedestrians not only follow the past trend but are also influenced by the current environment.

[Figure: LSTM units coupled by coherent regularization, feeding motion prediction]

SLIDE 19

cLSTM Unit

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) + \sum_{j \in N} \lambda_j(t)\, f_t^j \odot c_{t-1}^j \qquad (2)$$

[Figure: cLSTM unit with forget, input, and output gates (σ) and cell (ϕ); inputs x_t and h_{t-1}, output h_t, plus the coherent regularization term]

SLIDE 21

Coherent Motion Modeling

Use coherent filtering [Zhou et al., 2012] [Shao et al., 2014] to discover the coherent group.
The dependency relationship between two tracklets within the same group is measured as:

$$\tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\|v_i(t)\|\,\|v_j(t)\|} \qquad (3)$$

SLIDE 23

Dependency Coefficient

The dependency coefficient between the i-th and j-th tracklets in Eq. (2) is defined as

$$\lambda_j(t) = \frac{1}{Z_i} \exp\left(\frac{\tau_j(t) - 1}{2\sigma^2}\right) \in (0, 1] \qquad (4)$$

$Z_i$: normalization constant corresponding to the i-th tracklet.
$\lambda_j(t) \simeq Z_i^{-1}$ if $v_i(t) \simeq v_j(t)$, which implies that tracklets i and j are similar.
Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
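
A rough sketch of how the velocity correlation, the dependency coefficient, and the extra coherent term of the cLSTM memory update fit together is given below. It rests on assumptions that the slides do not state: the correlation is taken to be the cosine between the two velocity vectors, $Z_i$ is taken to be the sum of the unnormalized weights over the neighbors of tracklet i, and all function names are hypothetical.

```python
import math


def velocity_correlation(v_i, v_j):
    """Cosine between two velocity vectors (assumed reading of tau_j(t))."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm = (math.sqrt(sum(a * a for a in v_i))
            * math.sqrt(sum(b * b for b in v_j)))
    return dot / norm if norm > 0 else 0.0


def dependency_coefficients(v_i, neighbor_velocities, sigma=1.0):
    """lambda_j(t) = exp((tau_j(t) - 1) / (2 sigma^2)) / Z_i for each neighbor.

    Assumption: Z_i is the sum of the unnormalized weights, so the
    coefficients of one tracklet add up to 1.
    """
    raw = [math.exp((velocity_correlation(v_i, v_j) - 1.0) / (2.0 * sigma ** 2))
           for v_j in neighbor_velocities]
    z_i = sum(raw)
    return [r / z_i for r in raw] if raw else []


def coherent_term(lambdas, neighbor_forget_gates, neighbor_cells):
    """sum_j lambda_j(t) * f_t^j * c_{t-1}^j, with element-wise products."""
    if not neighbor_cells:
        return []
    out = [0.0] * len(neighbor_cells[0])
    for lam, f_j, c_j in zip(lambdas, neighbor_forget_gates, neighbor_cells):
        for k, (f, c) in enumerate(zip(f_j, c_j)):
            out[k] += lam * f * c
    return out


# Toy example: one aligned neighbor and one moving at a right angle.
lams = dependency_coefficients([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], sigma=1.0)
extra = coherent_term([0.5, 0.5],
                      [[1.0, 1.0], [1.0, 1.0]],
                      [[2.0, 4.0], [4.0, 8.0]])
```

In the toy example the aligned neighbor receives the larger coefficient, so its memory contributes more to the update of tracklet i. The vector returned by `coherent_term` would be added to the ordinary LSTM memory update.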

SLIDE 24

Framework

Unsupervised encoder-decoder cLSTM framework:

$$h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}) \qquad (5)$$
$$\hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad t \in [1, T] \qquad (6)$$
$$\hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad t > T \qquad (7)$$

[Figure: the encoder (weights W_e) reads x_1, x_2, x_3 into the learnt hidden features h_T; the reconstruction decoder (W_rd) outputs x̂_1, x̂_2, x̂_3 and the prediction decoder (W_pd) outputs x̂_4, x̂_5, x̂_6, with coherent regularization applied]
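
The control flow of the encoder, the reconstruction decoder, and the prediction decoder can be sketched as follows. This shows only the loop structure, not the cLSTM itself: the step functions are passed in as callables, the toy steps below are a constant-velocity stand-in rather than a learnt network, and the way each decoder is seeded (`recon_seed`, `pred_seed`) is an assumption, since the slides do not say how the first decoder input is chosen. All names are hypothetical.

```python
def run_encoder_decoder(inputs, horizon, h0, encode_step, reconstruct_step,
                        predict_step, recon_seed, pred_seed):
    """Encode the observed points, then reconstruct them and predict ahead."""
    # Encoder: fold the observed points into the final hidden state h_T.
    h = h0
    for x in inputs:
        h = encode_step(x, h)
    h_T = h

    # Reconstruction decoder: runs backwards in time, each output is
    # conditioned on the previously emitted (later) point.
    recon, h_r, x_hat = [], h_T, recon_seed
    for _ in range(len(inputs)):
        x_hat, h_r = reconstruct_step(h_r, x_hat)
        recon.append(x_hat)
    recon.reverse()  # back to chronological order

    # Prediction decoder: runs forwards in time from the same h_T.
    pred, h_p, x_hat = [], h_T, pred_seed
    for _ in range(horizon):
        x_hat, h_p = predict_step(h_p, x_hat)
        pred.append(x_hat)
    return h_T, recon, pred


# Toy stand-ins: the "hidden state" is (last point, last displacement).
def toy_encode(x, h):
    last, _ = h
    vel = (0.0, 0.0) if last is None else (x[0] - last[0], x[1] - last[1])
    return (x, vel)


def toy_reconstruct(h, x_next):
    _, vel = h
    return (x_next[0] - vel[0], x_next[1] - vel[1]), h


def toy_predict(h, x_prev):
    _, vel = h
    return (x_prev[0] + vel[0], x_prev[1] + vel[1]), h


observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
h_T, recon, pred = run_encoder_decoder(
    observed, 2, (None, (0.0, 0.0)),
    toy_encode, toy_reconstruct, toy_predict,
    recon_seed=(3.0, 3.0), pred_seed=observed[-1])
```

Both decoders start from the same `h_T`, which is why that state can later be reused as a feature vector for the tracklet.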

SLIDE 25

Crowd Scene Profiling

Solve critical tasks in crowd scene analysis:
  Group state estimation
  Crowd video classification

Softmax classification using the features learnt from the unsupervised cLSTM.
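
As a minimal sketch of the softmax step, the snippet below maps a feature vector to class probabilities with a linear layer followed by softmax. The function names, the two-class setup, and the weight values are assumptions made for illustration; in the talk the features come from the unsupervised cLSTM and the classifier is trained, which is not shown here.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities; the max is subtracted for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, W, b):
    """Linear scores W @ feature + b, then softmax; returns (label, probabilities)."""
    scores = [sum(w * f for w, f in zip(row, feature)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Toy weights (assumed): 2 classes, 2-D feature vector.
label, probs = classify([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
```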

SLIDE 27

Datasets and Settings

CUHK Crowd Dataset
  http://www.ee.cuhk.edu.hk/~xgwang/CUHKcrowd.html
  Scenes: streets, shopping malls, airports, and parks
  More than 400 sequences and more than 200,000 tracklets

Settings
  128 hidden units in cLSTM
  2/3 of tracklets as the input and 1/3 as the predicted tracklets to evaluate the performance

SLIDE 28

Future Path Forecasting

Table 1: Error of path prediction (pixels)

Method              Error
Kalman Filter       9.32 ± 1.99
Un-coherent LSTM    6.64 ± 1.76
Coherent LSTM       4.37 ± 0.93
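
One way an error in pixels like the one in Table 1 could be computed is sketched below. Two assumptions are made that the slides do not confirm: the 2/3 and 1/3 setting is read as splitting each tracklet in time into an observed part and a held-out part, and the error is taken to be the mean Euclidean distance between predicted and true points. The function names are hypothetical.

```python
import math


def split_tracklet(points):
    """First 2/3 of the points as input, last 1/3 as ground truth (assumed reading)."""
    cut = (2 * len(points)) // 3
    return points[:cut], points[cut:]


def mean_pixel_error(predicted, actual):
    """Mean Euclidean distance in pixels between matching points."""
    dists = [math.hypot(px - ax, py - ay)
             for (px, py), (ax, ay) in zip(predicted, actual)]
    return sum(dists) / len(dists)


tracklet = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
observed, held_out = split_tracklet(tracklet)

# A made-up prediction that is off by 3 pixels in x and 4 pixels in y.
error = mean_pixel_error([(7, 4), (8, 4)], held_out)
```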

SLIDE 29

Group State Estimation

[Figure: the four group states: (a) gas, (b) solid, (c) pure fluid, (d) impure fluid]

Confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.

SLIDE 30

Crowd Video Classification

All video clips are annotated into 8 classes:
1) Highly mixed pedestrian walking
2) Crowd walking following a mainstream and well organized
3) Crowd walking following a mainstream but poorly organized
4) Crowd merge
5) Crowd split
6) Crowd crossing in opposite directions
7) Intervened escalator traffic
8) Smooth escalator traffic

SLIDE 32

Conclusion

A novel recurrent neural network with coherent long short-term memory (cLSTM) units.
Introduces a coherent regularization to account for the collective properties of the crowd.
Outperforms other methods in group state estimation and crowd video classification.

SLIDE 33

Thanks for your time! Questions?
