Crowd Scene Understanding with Coherent Recurrent Neural Networks

Hang Su, Yinpeng Dong, Jun Zhu
Department of Computer Science and Technology, Tsinghua University
July 12, 2016

Outline
1 Introduction
2 LSTM Recap
3 Coherent LSTM
4 Experimental Results
5 Conclusion

Background
Understanding collective behaviors has a wide range of applications in video surveillance and crowd management.
In real scenes, pedestrians tend to form groups, and their trajectories are influenced by other pedestrians and by obstacles.
The main challenges of crowd motion analysis are nonlinear dynamics and coherent motion.

Problem Formulation
Obtain reliable tracklets from each scene using KLT trackers.
At any time instant $t$, the $i$-th person is represented by his/her coordinates $(x_i(t), y_i(t))$.
Predict the future trajectories of pedestrians and use the extracted hidden features to recognize crowd motions.

Previous Work
Social Force model: optimizes an energy function; relies on hand-crafted functions; hard to generalize.
Probabilistic forecasting: Gaussian Process.
Recurrent Neural Networks: N-LSTM [Alahi et al., 2016].

LSTM
Structure: input, output, and forget gates; memory state $c_t$.
Advantages: prevents the vanishing gradient problem; nonlinear characteristic; generalization.
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \qquad (1)$$
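The memory update in Eq. (1) can be illustrated with a minimal sketch in plain Python. The gate activations `f_t` and `i_t` are taken as given inputs, because the slide shows only the cell equation; the helper `matvec` and the list-based vector layout are illustrative assumptions, not part of the original slides.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_cell_update(f_t, i_t, c_prev, x_t, h_prev, W_xc, W_hc, b_c):
    """Eq. (1): c_t = f_t * c_{t-1} + i_t * tanh(W_xc x_t + W_hc h_{t-1} + b_c).

    All products between vectors are element-wise.
    """
    wx = matvec(W_xc, x_t)
    wh = matvec(W_hc, h_prev)
    # Candidate memory content, squashed into (-1, 1).
    candidate = [math.tanh(a + b + c) for a, b, c in zip(wx, wh, b_c)]
    # The forget gate scales the old memory, the input gate scales the candidate.
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, candidate)]
```

With the forget gate fully open and the input gate closed, the previous memory is carried over unchanged, which is the path that lets gradients survive over long sequences.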

Why Coherent LSTM?
LSTM can model individual behaviors but cannot capture the interactions within a group.
When the neighboring relationships of individuals remain invariant over time and the correlation of their velocities remains high, the individuals tend to have similar hidden states.
The trajectories of pedestrians not only follow their past trend but are also influenced by the current environment.
[Figure labels: Coherent Motion Prediction; LSTM regularization; Coherent regularization]

cLSTM Unit
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) + \sum_{j \in \mathcal{N}} \lambda_j(t)\, f_t^j \odot c_{t-1}^j \qquad (2)$$
[Figure: cLSTM unit with input gate, forget gate, output gate, cell, and coherent regularization; inputs $x_t$ and $h_{t-1}$, output $h_t$.]
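The extra term in Eq. (2) can be sketched as follows. The function takes the ordinary LSTM part of the update (the right-hand side of Eq. (1)) as a precomputed vector and adds the neighbor contributions. The tuple layout `(lambda_j, f_j, c_prev_j)` for each neighbor is a hypothetical choice made for this illustration.

```python
def coherent_term(neighbors, size):
    """Sum over neighbors j of lambda_j * (f_j * c_prev_j), element-wise."""
    total = [0.0] * size
    for lam, f_j, c_prev_j in neighbors:
        for k in range(size):
            total[k] += lam * f_j[k] * c_prev_j[k]
    return total

def clstm_cell_update(lstm_term, neighbors):
    """Eq. (2): the standard LSTM memory update plus the coherent regularization.

    lstm_term -- f_t * c_{t-1} + i_t * tanh(...), already computed
    neighbors -- list of (lambda_j, f_j, c_prev_j) for tracklets in the same group
    """
    extra = coherent_term(neighbors, len(lstm_term))
    return [a + b for a, b in zip(lstm_term, extra)]
```

With an empty neighbor list the update reduces to Eq. (1), so an isolated tracklet behaves like a plain LSTM.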

Coherent Motion Modeling
Use coherent filtering [Zhou et al., 2012] [Shao et al., 2014] to discover the coherent group.
The dependency relationship between two tracklets within the same group is measured as:
$$\tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\|v_i(t)\| \, \|v_j(t)\|} \qquad (3)$$
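Eq. (3) is the cosine of the angle between two velocity vectors. The sketch below computes it from tracklet coordinates. Deriving velocities by finite differences of consecutive positions, and returning 0 for a stationary point, are assumptions made for this example; the slides do not say how these cases are handled.

```python
import math

def velocities(points):
    """Finite-difference velocities from consecutive (x, y) positions."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]

def velocity_correlation(v_i, v_j):
    """Eq. (3): dot product of two velocities divided by the product of their norms."""
    norm_i = math.hypot(v_i[0], v_i[1])
    norm_j = math.hypot(v_j[0], v_j[1])
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: a stationary point is treated as uncorrelated
    dot = v_i[0] * v_j[0] + v_i[1] * v_j[1]
    return dot / (norm_i * norm_j)
```

The value is close to 1 for pedestrians moving in the same direction, 0 for perpendicular motion, and close to -1 for opposite directions, regardless of speed.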

Dependency Coefficient
The dependency coefficient between the $i$-th and $j$-th tracklets in Eq. (2) is defined as
$$\lambda_j(t) = \frac{1}{Z_i} \exp\!\left(\frac{\tau_j(t) - 1}{2\sigma^2}\right) \in (0, 1] \qquad (4)$$
$Z_i$: normalization constant corresponding to the $i$-th tracklet.
$\lambda_j(t) \simeq Z_i^{-1}$ if $v_i(t) \simeq v_j(t)$, which implies that tracklets $i$ and $j$ are similar.
Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
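A small numeric sketch of Eq. (4) may help in seeing how the coefficient decays as the velocity correlation drops. The slides do not state how $Z_i$ is computed, so it is passed in as a parameter here; the default values `sigma=1.0` and `z_i=1.0` are placeholders chosen for illustration only.

```python
import math

def dependency_coefficient(tau, sigma=1.0, z_i=1.0):
    """Eq. (4): lambda_j(t) = exp((tau_j(t) - 1) / (2 * sigma^2)) / Z_i.

    tau   -- velocity correlation from Eq. (3), in [-1, 1]
    sigma -- bandwidth; a smaller sigma penalizes misaligned velocities more sharply
    z_i   -- normalization constant (assumed given; not specified on the slide)
    """
    return math.exp((tau - 1.0) / (2.0 * sigma ** 2)) / z_i
```

Because the exponent is never positive for a correlation in [-1, 1], the value stays in (0, 1] whenever `z_i` is at least 1, and it equals `1 / z_i` for perfectly aligned velocities.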

Framework
Unsupervised encoder-decoder cLSTM framework:
$$h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}) \qquad (5)$$
$$\hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad t \in [1, T] \qquad (6)$$
$$\hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad t > T \qquad (7)$$
[Figure: the encoder (weights $W_e$) maps the inputs $x_1, x_2, x_3$ to the learnt hidden features $h_T$ under coherent regularization; the reconstruction decoder (weights $W_{rd}$) outputs $\hat{x}_3, \hat{x}_2, \hat{x}_1$; the prediction decoder (weights $W_{pd}$) outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$.]
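The control flow implied by Eqs. (5) to (7) can be sketched with the recurrent units abstracted as callables. This is only an outline of the data flow, not the authors' implementation: the step signatures, the way each decoder carries its own hidden state, and the seed inputs `recon_seed` and `pred_seed` are all assumptions introduced for this example.

```python
def run_encoder_decoder(xs, n_future, encode_step, recon_step, pred_step,
                        h0, recon_seed, pred_seed):
    """Encode xs into h_T, then decode it twice: backwards and forwards in time.

    encode_step(x, h)     -> next hidden state          (Eq. 5)
    recon_step(h, x_prev) -> (x_hat, next hidden state) (Eq. 6)
    pred_step(h, x_prev)  -> (x_hat, next hidden state) (Eq. 7)
    """
    # Encoder: fold the observed sequence x_1..x_T into the feature h_T.
    h = h0
    for x in xs:
        h = encode_step(x, h)
    h_T = h

    # Reconstruction decoder: emits x_hat_T first and x_hat_1 last,
    # since each output is conditioned on the estimate for time t + 1.
    recon, h_r, x_prev = [], h_T, recon_seed
    for _ in range(len(xs)):
        x_prev, h_r = recon_step(h_r, x_prev)
        recon.append(x_prev)
    recon.reverse()  # return in chronological order

    # Prediction decoder: emits x_hat_{T+1}, x_hat_{T+2}, ... in order.
    pred, h_p, x_prev = [], h_T, pred_seed
    for _ in range(n_future):
        x_prev, h_p = pred_step(h_p, x_prev)
        pred.append(x_prev)

    return h_T, recon, pred
```

Both decoders start from the same `h_T`, which is why that vector can serve as a single learnt feature for the later classification tasks.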

Crowd Scene Profiling
Solve critical tasks in crowd scene analysis: group state estimation and crowd video classification.
Softmax classification using the features learnt from the unsupervised cLSTM.
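A softmax classifier on top of a learnt feature vector can be sketched as below. The weight matrix `weights` and bias vector `biases` are hypothetical parameters that would be trained on labeled data; the slides do not give their shape or how they are fitted.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights, biases):
    """Linear scores followed by softmax; returns (predicted class index, probabilities)."""
    scores = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

In this setting `feature` would be the hidden feature produced by the unsupervised cLSTM, and each row of `weights` corresponds to one group state or video class.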

Datasets and Settings
CUHK Crowd Dataset: http://www.ee.cuhk.edu.hk/~xgwang/CUHKcrowd.html
Scenes: streets, shopping malls, airports, and parks.
More than 400 sequences and more than 200,000 tracklets.
Settings: 128 hidden units in cLSTM; 2/3 of the tracklets are used as the input and 1/3 as the predicted tracklets to evaluate the performance.
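One possible reading of the 2/3 and 1/3 setting is that each tracklet is cut in time: the first two thirds of its points are observed and the last third is the prediction target. The sketch below follows that reading, which is an assumption; the rounding rule (integer division) is also an illustrative choice.

```python
def split_tracklet(points):
    """Split one tracklet into an observed prefix (2/3) and a held-out suffix (1/3)."""
    cut = (2 * len(points)) // 3  # assumption: round the cut point down
    return points[:cut], points[cut:]
```

For a tracklet of 9 points this yields 6 observed points and 3 points to predict.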

Future Path Forecasting
Table 1: Error of path prediction (pixels)
Method              Error (pixels)
Kalman Filter       9.32 ± 1.99
Un-coherent LSTM    6.64 ± 1.76
Coherent LSTM       4.37 ± 0.93
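The slides report the prediction error in pixels without defining the metric. A common choice, assumed here purely for illustration, is the mean Euclidean distance between predicted and ground-truth positions; the function name `mean_displacement_error` is hypothetical.

```python
import math

def mean_displacement_error(predicted, ground_truth):
    """Average Euclidean distance (in pixels) between matching (x, y) points."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty point lists of equal length")
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return sum(dists) / len(dists)
```

For example, a prediction that is exact at one time step and off by (3, 4) pixels at the next has a mean error of 2.5 pixels.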

Group State Estimation
Group states: (a) Gas; (b) Solid; (c) Pure Fluid; (d) Impure Fluid.
Figure: Confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.

Crowd Video Classification
All video clips are annotated into 8 classes:
1) Highly mixed pedestrian walking
2) Crowd walking following a mainstream and well organized
3) Crowd walking following a mainstream but poorly organized
4) Crowd merge
5) Crowd split
6) Crowd crossing in opposite directions
7) Intervened escalator traffic
8) Smooth escalator traffic

Conclusion
A novel recurrent neural network with a coherent long short-term memory unit.
A coherent regularization is introduced to account for the collective properties.
The method outperforms other methods in group state estimation and crowd video classification.

Thanks for your time! Questions?
