Crowd Scene Understanding with Coherent Recurrent Neural Networks
Hang Su, Yinpeng Dong, Jun Zhu
Department of Computer Science and Technology, Tsinghua University
July 12, 2016
Hang Su, Yinpeng Dong, Jun Zhu July 12, 2016 1
Outline: 1 Introduction; 2 LSTM Recap; 3 Coherent LSTM; 4 Experimental Results; 5 Conclusion
Understanding collective behaviors has a wide range of applications in video surveillance and crowd management.
In real scenes, pedestrians tend to form groups, and their trajectories are influenced by other pedestrians and by obstacles.
The main challenges of crowd motion analysis are nonlinear dynamics and coherent motion.
Obtain reliable tracklets from each scene using KLT trackers.
At any time instant $t$, the $i$-th person is represented by his/her coordinates $(x_i(t), y_i(t))$.
Predict the future trajectories of pedestrians and use the extracted hidden features to recognize crowd motions.
Social Force model: optimizes an energy function; hand-crafted functions; hard to generalize
Probabilistic Forecasting: Gaussian Process
Recurrent Neural Networks: N-LSTM [Alahi et al., 2016]
Structure: input, output, and forget gates; memory state $c_t$
Advantages: prevents the vanishing gradient problem; nonlinear characteristics; generalization
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{1}$$
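The memory-state update in Eq. (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the gate activations `f_t` and `i_t` are taken as given inputs, and the helper names (`matvec`, `lstm_cell_state`) and all toy numbers are hypothetical, not taken from the slides.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_cell_state(c_prev, f_t, i_t, x_t, h_prev, W_xc, W_hc, b_c):
    """Eq. (1): c_t = f_t * c_{t-1} + i_t * tanh(W_xc x_t + W_hc h_{t-1} + b_c)."""
    pre = [a + b + c for a, b, c in
           zip(matvec(W_xc, x_t), matvec(W_hc, h_prev), b_c)]
    return [f * c + i * math.tanh(p)
            for f, c, i, p in zip(f_t, c_prev, i_t, pre)]

# Toy example (made-up numbers): 2-D input (x, y) and 2 hidden units.
c_t = lstm_cell_state(
    c_prev=[0.5, -0.2], f_t=[1.0, 0.0], i_t=[0.0, 1.0],
    x_t=[1.0, 2.0], h_prev=[0.0, 0.0],
    W_xc=[[0.1, 0.0], [0.0, 0.1]],
    W_hc=[[0.0, 0.0], [0.0, 0.0]],
    b_c=[0.0, 0.0],
)
```

In the toy call, the first unit keeps its old memory (forget gate 1, input gate 0), while the second unit discards it and stores only the new candidate value.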
An LSTM can model individual behaviors but cannot capture the interactions within a group.
When the neighboring relationships of individuals remain invariant, those individuals are expected to have similar hidden states.
The trajectories of pedestrians not only follow their past trend but are also influenced by the current environment.
[Figure: per-tracklet LSTMs for motion prediction, linked by coherent regularization.]
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) + \sum_j \lambda_j(t)\, f_t^j \odot c_{t-1}^j \tag{2}$$
[Figure: coherent LSTM unit with inputs $x_t$ and $h_{t-1}$, forget, input, and output gates ($\sigma$), cell ($\phi$), output $h_t$, and the coherent regularization term.]
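As a rough sketch of how the coherent term in Eq. (2) changes the update, the function below adds the weighted, gated memories of neighboring tracklets to an already computed Eq. (1) result. It assumes the extra term is summed over the other tracklets $j$ in the same coherent group; the function name and the toy values are hypothetical.

```python
def coherent_cell_state(c_own, neighbors):
    """Add the coherent term of Eq. (2) to an ordinary LSTM cell update.

    c_own:     list of floats, the Eq. (1) part for tracklet i, i.e.
               f_t * c_{t-1} + i_t * tanh(...)
    neighbors: list of (lam_j, f_j, c_prev_j) for tracklets j assumed to be
               in the same coherent group, where lam_j is the dependency
               coefficient, f_j the forget gate, c_prev_j the previous memory
    """
    c_t = list(c_own)
    for lam_j, f_j, c_prev_j in neighbors:
        for k, (f, c) in enumerate(zip(f_j, c_prev_j)):
            c_t[k] += lam_j * f * c  # lambda_j(t) * f_t^j * c_{t-1}^j
    return c_t

# Toy example (made-up numbers): two neighbors with equal weight 0.5.
c_new = coherent_cell_state(
    c_own=[0.5, 0.1],
    neighbors=[
        (0.5, [1.0, 1.0], [0.2, 0.4]),
        (0.5, [0.0, 1.0], [1.0, -0.2]),
    ],
)
```

With an empty neighbor list the function returns the plain Eq. (1) value, which matches the un-coherent LSTM case.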
Use coherent filtering [Zhou et al., 2012] [Shao et al., 2014] to discover the coherent groups.
The dependency relationship between two tracklets within the same group is measured as:
$$\tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\|v_i(t)\| \, \|v_j(t)\|} \tag{3}$$
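The dependency measure in Eq. (3) is a cosine similarity, which the following sketch computes. It assumes that $v_i(t)$ denotes the velocity of tracklet $i$ and approximates it by the difference of consecutive tracked positions; that finite-difference choice, the zero fallback for stationary points, and the function names are assumptions for illustration.

```python
import math

def velocity(track, t):
    """Finite-difference velocity at index t (assumed definition of v_i(t)).

    track is a list of (x, y) positions; t must be >= 1.
    """
    (x0, y0), (x1, y1) = track[t - 1], track[t]
    return (x1 - x0, y1 - y0)

def dependency(v_i, v_j, eps=1e-12):
    """Eq. (3): cosine of the angle between two velocity vectors."""
    dot = v_i[0] * v_j[0] + v_i[1] * v_j[1]
    norm = math.hypot(v_i[0], v_i[1]) * math.hypot(v_j[0], v_j[1])
    # Assumption: a stationary tracklet has no direction, so return 0.
    return dot / norm if norm > eps else 0.0

# Two pedestrians walking in the same direction at different speeds.
track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(5.0, 5.0), (7.0, 5.0)]
tau = dependency(velocity(track_a, 1), velocity(track_b, 1))
```

The value lies in $[-1, 1]$: it is 1 for parallel motion, 0 for perpendicular motion, and -1 for opposite directions, independent of speed.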
The dependency coefficient between the $i$-th and $j$-th tracklets in Eq. (2) is defined as
$$\lambda_j(t) = \frac{1}{Z_i} \exp\left(\frac{\tau_j(t) - 1}{2\sigma^2}\right) \tag{4}$$
$Z_i$: normalization constant corresponding to the $i$-th tracklet.
$\lambda_j(t) \simeq Z_i^{-1}$ if $v_i(t) \simeq v_j(t)$, which implies that tracklets $i$ and $j$ are similar.
Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
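A small numerical sketch of Eq. (4) may help show how the weights behave. It assumes that $Z_i$ is the sum of the unnormalized weights over the neighbors of tracklet $i$, so that the coefficients sum to one; the slides only call $Z_i$ a normalization constant, so this choice, the default `sigma`, and the function name are assumptions.

```python
import math

def dependency_coefficients(taus, sigma=1.0):
    """Eq. (4): lambda_j = exp((tau_j - 1) / (2 * sigma^2)) / Z_i.

    taus holds tau_j(t) for every neighbor j of tracklet i.
    Assumption: Z_i is the sum of the unnormalized weights.
    """
    weights = [math.exp((tau - 1.0) / (2.0 * sigma ** 2)) for tau in taus]
    z_i = sum(weights)
    return [w / z_i for w in weights]

# Neighbors moving with, across, and against tracklet i.
lams = dependency_coefficients([1.0, 0.0, -1.0], sigma=1.0)
```

A neighbor with $\tau_j(t) = 1$ has unnormalized weight $\exp(0) = 1$, so its coefficient equals $Z_i^{-1}$, and the weight decays as the directions diverge. A smaller `sigma` makes that decay sharper.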
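Unsupervised encoder-decoder cLSTM framework:
$$h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}) \tag{5}$$
$$\hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad t \in [1, T] \tag{6}$$
$$\hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad t > T \tag{7}$$
[Figure: encoder-decoder cLSTM. The encoder (weights $W_e$, with coherent regularization) reads $x_1, x_2, x_3$ and produces the learnt hidden features $h_T$; the reconstruction decoder (weights $W_{rd}$) outputs $\hat{x}_1, \hat{x}_2, \hat{x}_3$; the prediction decoder (weights $W_{pd}$) outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$.]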
Solve critical tasks in crowd scene analysis: group state estimation and crowd video classification.
Softmax classification using the features learnt from the unsupervised cLSTM.
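The classification step can be illustrated with a plain softmax layer applied to a learnt feature vector. This is a generic sketch, not the authors' classifier: the weight matrix `W`, bias `b`, feature size, and number of classes below are made-up placeholders.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h, W, b):
    """Return (predicted class index, class probabilities) for features h."""
    scores = [sum(w * x for w, x in zip(row, h)) + b_k
              for row, b_k in zip(W, b)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# Toy example: 2-D feature vector, 3 classes, hypothetical weights.
label, probs = classify(
    h=[1.0, 0.0],
    W=[[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]],
    b=[0.0, 0.0, 0.0],
)
```

In practice `h` would be the hidden feature produced by the encoder, and `W` and `b` would be trained on labeled group states or video classes.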
CUHK Crowd Dataset
http://www.ee.cuhk.edu.hk/~xgwang/CUHKcrowd.html
Scenes: streets, shopping malls, airports, and parks
More than 400 sequences and more than 200,000 tracklets
Settings
128 hidden units in the cLSTM
2/3 of tracklets as the input and 1/3 as the predicted tracklets, used to evaluate the performance
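One possible reading of the 2/3 and 1/3 setting is a split in time: the first two thirds of each tracklet are observed and the final third is predicted. The sketch below implements that reading only; the slides do not spell out the split, so the interpretation and the function name are assumptions.

```python
def split_tracklet(track):
    """Split a tracklet in time into (observed part, part to predict).

    Assumption: the first 2/3 of the points are the input and the
    remaining 1/3 are the prediction target.
    """
    n_in = (2 * len(track)) // 3
    return track[:n_in], track[n_in:]

# Toy tracklet with 9 positions moving along the x axis.
track = [(float(k), 0.0) for k in range(9)]
observed, target = split_tracklet(track)
```

Here `observed` holds the first 6 positions and `target` the last 3.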
Table 1: Error of path prediction (pixels)
Method              Error
Kalman Filter       9.32 ± 1.99
Un-coherent LSTM    6.64 ± 1.76
Coherent LSTM       4.37 ± 0.93
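A path prediction error in pixels is commonly computed as the average Euclidean distance between predicted and ground-truth positions. The sketch below uses that definition as an assumption; the slides do not state the exact formula behind Table 1, and the function name and sample points are hypothetical.

```python
import math

def mean_displacement_error(predicted, truth):
    """Average Euclidean distance (in pixels) between matched points."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need two non-empty paths of equal length")
    total = sum(math.hypot(p[0] - q[0], p[1] - q[1])
                for p, q in zip(predicted, truth))
    return total / len(truth)

# Toy example: one exact point and one point that is 5 pixels off.
err = mean_displacement_error(
    predicted=[(0.0, 0.0), (3.0, 4.0)],
    truth=[(0.0, 0.0), (0.0, 0.0)],
)
```

Averaging this value over all test tracklets, and reporting its spread, would give figures in the same form as the table.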
Figure: the four group states: (a) gas; (b) solid; (c) pure fluid; (d) impure fluid.
Figure: confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.
All video clips are annotated into 8 classes: 1) highly mixed pedestrian walking; 2) crowd walking following a mainstream and well [...] escalator traffic.
A novel recurrent neural network with a coherent long short-term memory (cLSTM) unit.
Introduces a coherent regularization that accounts for the collective properties of crowds.
Outperforms other methods in group state estimation and crowd video classification.