Crowd Scene Understanding with Coherent Recurrent Neural Networks

  1. Crowd Scene Understanding with Coherent Recurrent Neural Networks. Hang Su, Yinpeng Dong, Jun Zhu. Department of Computer Science and Technology, Tsinghua University. July 12, 2016.

  2. Outline: 1 Introduction; 2 LSTM Recap; 3 Coherent LSTM; 4 Experimental Results; 5 Conclusion.

  6. Background: Understanding collective behaviors has a wide range of applications in video surveillance and crowd management. In real scenes, pedestrians tend to form groups, and their trajectories are influenced by other pedestrians and by obstacles. The main challenges of crowd motion analysis are nonlinear dynamics and coherent motion.

  8. Problem Formulation: Obtain reliable tracklets from each scene using KLT trackers. At any time instant $t$, the $i$-th person is represented by his/her coordinates $(x_i(t), y_i(t))$. Predict the future trajectories of pedestrians and use the extracted hidden features to recognize crowd motions.

  11. Previous Work: Social Force model (optimize an energy function; hand-crafted functions; hard to generalize). Probabilistic forecasting (Gaussian Process). Recurrent Neural Networks (N-LSTM [Alahi et al., 2016]).

  14. LSTM: Structure: input, output, and forget gates; memory state $c_t$. Advantages: prevents the vanishing gradient problem; nonlinear characteristic; generalization. The memory state is updated as
      $$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{1}$$
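The memory-state update in Eq. (1) can be sketched in NumPy as follows. This is only an illustrative sketch: the slide does not give the gate formulas, so the gate activations `f_t` and `i_t` are taken as inputs, and the function name, toy dimensions, and random weights are assumptions rather than anything from the presentation.

```python
import numpy as np

def lstm_memory_update(f_t, i_t, c_prev, x_t, h_prev, W_xc, W_hc, b_c):
    """Return c_t as in Eq. (1); f_t and i_t are precomputed gate activations."""
    # Candidate memory content computed from the current input and previous hidden state.
    candidate = np.tanh(W_xc @ x_t + W_hc @ h_prev + b_c)
    # Element-wise gating: keep part of the old memory, write part of the candidate.
    return f_t * c_prev + i_t * candidate

# Toy dimensions (hypothetical): 2-D input (x, y) and 3 hidden units.
rng = np.random.default_rng(0)
x_t = np.array([0.5, -0.2])
h_prev = np.zeros(3)
c_prev = np.array([0.1, -0.3, 0.7])
W_xc = rng.normal(size=(3, 2))
W_hc = rng.normal(size=(3, 3))
b_c = np.zeros(3)

# Forget gate fully open, input gate closed: the memory is carried over unchanged.
c_kept = lstm_memory_update(np.ones(3), np.zeros(3), c_prev, x_t, h_prev, W_xc, W_hc, b_c)
# Forget gate closed, input gate fully open: the memory becomes the tanh candidate.
c_new = lstm_memory_update(np.zeros(3), np.ones(3), c_prev, x_t, h_prev, W_xc, W_hc, b_c)
```

The additive form is what lets the old memory pass through unchanged when the forget gate is open, which is the property the slide associates with the vanishing gradient problem.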

  18. Why Coherent LSTM? An LSTM can model individual behaviors but cannot capture the interactions within a group. When the neighboring relationships of individuals remain invariant over time and the correlation of their velocities remains high, they tend to have similar hidden states. The trajectories of pedestrians not only follow the old trend but are also influenced by the current environment. [Figure labels: LSTM, LSTM regularization, coherent regularization, coherent motion prediction.]

  19. cLSTM Unit:
      $$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) + \sum_{j \in \mathcal{N}} \lambda_j(t)\, f_t^j \odot c_{t-1}^j \tag{2}$$
      [Figure: cLSTM unit; labels: forget gate, input gate, output gate, cell, coherent regularization, $x_t$, $h_{t-1}$, $h_t$.]
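A minimal NumPy sketch of the coherent update in Eq. (2) is given below. It assumes the gate activations and the neighbors' forget gates and previous memory states are already available; the function name, argument layout, and toy values are hypothetical and only illustrate how the coherent term adds to the standard LSTM update.

```python
import numpy as np

def clstm_memory_update(f_t, i_t, c_prev, x_t, h_prev, W_xc, W_hc, b_c,
                        neighbor_lambdas, neighbor_f, neighbor_c_prev):
    """Return c_t as in Eq. (2): standard LSTM update plus the coherent term."""
    # Standard LSTM memory update, identical to Eq. (1).
    own = f_t * c_prev + i_t * np.tanh(W_xc @ x_t + W_hc @ h_prev + b_c)
    # Coherent regularization: weighted sum over neighbors j in the coherent group.
    coherent = np.zeros_like(own)
    for lam_j, f_j, c_j in zip(neighbor_lambdas, neighbor_f, neighbor_c_prev):
        coherent += lam_j * f_j * c_j
    return own + coherent

# Toy example (hypothetical values): 2 hidden units, zero weights so tanh(...) = 0.
zeros_w = np.zeros((2, 2))
args = (np.ones(2), np.zeros(2), np.array([1.0, 2.0]),
        np.zeros(2), np.zeros(2), zeros_w, zeros_w, np.zeros(2))

# Without neighbors, Eq. (2) reduces to Eq. (1).
c_alone = clstm_memory_update(*args, [], [], [])
# One neighbor with dependency coefficient 0.5 contributes 0.5 * f_j * c_j.
c_group = clstm_memory_update(*args, [0.5], [np.ones(2)], [np.array([2.0, 4.0])])
```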

  21. Coherent Motion Modeling: Use coherent filtering [Zhou et al., 2012] [Shao et al., 2014] to discover the coherent group. The dependency relationship between two tracklets within the same group is measured as
      $$\tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\|v_i(t)\| \, \|v_j(t)\|} \tag{3}$$
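Eq. (3) is the cosine of the angle between the two vectors $v_i(t)$ and $v_j(t)$, which are assumed here to be 2-D velocity vectors. A short sketch follows; the function name and the small `eps` guard against zero-length vectors are additions for illustration and do not appear on the slide.

```python
import numpy as np

def velocity_correlation(v_i, v_j, eps=1e-12):
    """Return tau_j(t) from Eq. (3): cosine similarity of two velocity vectors."""
    v_i = np.asarray(v_i, dtype=float)
    v_j = np.asarray(v_j, dtype=float)
    denom = np.linalg.norm(v_i) * np.linalg.norm(v_j)
    # eps (an assumption) avoids division by zero for a stationary tracklet.
    return float(np.dot(v_i, v_j) / max(denom, eps))

tau_same = velocity_correlation([1.0, 0.0], [2.0, 0.0])       # same direction
tau_orth = velocity_correlation([1.0, 0.0], [0.0, 3.0])       # perpendicular
tau_opposite = velocity_correlation([1.0, 1.0], [-1.0, -1.0]) # opposite direction
```

The value depends only on direction, not speed, so two pedestrians walking the same way at different speeds still score close to 1.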

  23. Dependency Coefficient: The dependency coefficient between the $i$-th and $j$-th tracklets in Eq. (2) is defined as
      $$\lambda_j(t) = \frac{1}{Z_i} \exp\left(\frac{\tau_j(t) - 1}{2\sigma^2}\right) \in (0, 1] \tag{4}$$
      where $Z_i$ is the normalization constant corresponding to the $i$-th tracklet. $\lambda_j(t) \simeq Z_i^{-1}$ if $v_i(t) \simeq v_j(t)$, which implies that tracklets $i$ and $j$ are similar. Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
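Eq. (4) can be evaluated directly once $\tau_j(t)$ is known. The sketch below takes $Z_i$ as a plain argument because the slide does not say how the normalization constant is computed; the function name, the default `z_i=1.0`, and the sample values of `sigma` are assumptions for illustration.

```python
import math

def dependency_coefficient(tau, sigma, z_i=1.0):
    """Return lambda_j(t) from Eq. (4) for a correlation tau in [-1, 1]."""
    # tau - 1 <= 0, so the exponential lies in (0, 1]; z_i is supplied by the caller.
    return math.exp((tau - 1.0) / (2.0 * sigma ** 2)) / z_i

lam_aligned = dependency_coefficient(tau=1.0, sigma=0.5)    # identical directions
lam_orth = dependency_coefficient(tau=0.0, sigma=0.5)       # perpendicular motion
lam_opposite = dependency_coefficient(tau=-1.0, sigma=0.5)  # opposite motion
```

With `z_i = 1` the coefficient reaches its maximum of 1 when the velocities are aligned and decays as the directions diverge; a smaller `sigma` makes that decay sharper.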

  24. Framework: Unsupervised encoder-decoder cLSTM framework:
      $$h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}) \tag{5}$$
      $$\hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad \text{where } t \in [1, T] \tag{6}$$
      $$\hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad \text{where } t > T \tag{7}$$
      [Figure: the encoder (weights $W_e$) reads $x_1, x_2, x_3$ and produces the learnt hidden features $h_T$; the reconstruction decoder (weights $W_{rd}$) outputs $\hat{x}_3, \hat{x}_2, \hat{x}_1$; the prediction decoder (weights $W_{pd}$) outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$; label: coherent regularization.]

  25. Crowd Scene Profiling: Solve critical tasks in crowd scene analysis: group state estimation and crowd video classification. Softmax classification is applied to the features learnt by the unsupervised cLSTM.

  27. Datasets and Settings: CUHK Crowd Dataset (http://www.ee.cuhk.edu.hk/~xgwang/CUHKcrowd.html). Scenes: streets, shopping malls, airports, and parks. More than 400 sequences and more than 200,000 tracklets. Settings: 128 hidden units in the cLSTM; 2/3 of the tracklets are used as the input and 1/3 as the predicted tracklets to evaluate the performance.
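One possible reading of the 2/3 and 1/3 setting is a split along the time axis of each tracklet, with the first two thirds of the points observed and the last third held out for prediction. The slide does not spell this out, so the sketch below is an assumption, and the function name and the `(T, 2)` array layout are hypothetical.

```python
import numpy as np

def split_tracklet(tracklet, observed_fraction=2 / 3):
    """Split a (T, 2) array of (x, y) points into observed and held-out parts."""
    tracklet = np.asarray(tracklet, dtype=float)
    # Number of leading points given to the model as input (assumed time-wise split).
    n_obs = int(round(len(tracklet) * observed_fraction))
    return tracklet[:n_obs], tracklet[n_obs:]

# Toy tracklet with 9 time steps: 6 observed points, 3 points to predict.
toy = np.arange(18).reshape(9, 2)
observed, future = split_tracklet(toy)
```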

  28. Future Path Forecasting: Table 1: Error of path prediction (pixels).

      | Method           | Error (pixels) |
      | Kalman Filter    | 9.32 ± 1.99    |
      | Un-coherent LSTM | 6.64 ± 1.76    |
      | Coherent LSTM    | 4.37 ± 0.93    |
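The slide reports the path prediction error in pixels but does not define the metric. A common choice, assumed here purely for illustration, is the mean Euclidean distance between predicted and ground-truth points; the function name and sample coordinates are hypothetical and are not the data behind Table 1.

```python
import numpy as np

def mean_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance (in pixels) between two (T, 2) point sequences."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    # Per-time-step distance, then averaged over the predicted horizon.
    return float(np.mean(np.linalg.norm(predicted - ground_truth, axis=-1)))

# Toy example: first point is exact, second is off by a 3-4-5 triangle.
error = mean_displacement_error([[0, 0], [3, 4]], [[0, 0], [0, 0]])
```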

  29. Group State Estimation: Group states: (a) gas; (b) solid; (c) pure fluid; (d) impure fluid. [Figure: confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.]

  30. Crowd Video Classification: All video clips are annotated into 8 classes: 1) highly mixed pedestrian walking; 2) crowd walking following a mainstream and well organized; 3) crowd walking following a mainstream but poorly organized; 4) crowd merge; 5) crowd split; 6) crowd crossing in opposite directions; 7) intervened escalator traffic; and 8) smooth escalator traffic.

  32. Conclusion: A novel recurrent neural network with a coherent long short-term memory (cLSTM) unit. A coherent regularization is introduced to account for the collective properties. The method outperforms other methods in group state estimation and crowd video classification.

  33. Thanks for your time! Questions?
