Crowd Counting and Behavior Modeling with Convolutional Neural Networks
Hongsheng Li 李鴻升
1 Dept. of Electronic Engineering, 2 Multimedia Laboratory, The Chinese University of Hong Kong
[Stauffer and Grimson 1999] [Elgammal et al. 2000] [Zivkovic 2004] [Kim et al. 2005] [Sheikh and Shah 2005]
[Lucas and Kanade 1981] [Shi and Tomasi 1994] [Wang et al. 2011]
[Ali and Shah 2007] [Amer and Todorovic 2011] [Chang et al. 2011] [Loy et al. 2012] [Pellegrini et al. 2009] [Zhou et al. 2013]
Figure: top-20 training scenes for Test Scene 1 and for Test Scene 2.
Figure: target scene density distribution and training patches density distribution.
Dataset | # frames | # scenes | Resolution | FPS | # people per frame | # total annotations
UCSD | 2,000 | 1 | 158 X 238 | 10 | 11‐46 | 49885
UCF_FF_50 | 50 | 50 | Various | image | 94‐4543 | 63974
WorldExpo | 4.44 million | 108 | 576 X 720 | 25 | 1‐253 | 199923
Method | Scene 1 | Scene 2 | Scene 3 | Scene 4 | Scene 5 | Average
LBP+RR | 13.6 | 58.9 | 37.1 | 21.8 | 23.4 | 31.0
Fiaschi et al. ICPR’12 | 2.2 | 87.3 | 22.2 | 16.4 | 5.4 | 26.7
Chen et al. BMVC’12 | 2.1 | 55.9 | 9.6 | 11.3 | 3.4 | 16.5
Crowd CNN | 2.0 | 29.5 | 9.7 | 9.3 | 3.1 | 10.7
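As a quick arithmetic check on the table, the Average column can be recomputed from the five per-scene values. The sketch below copies the numbers from the slide; the helper names (`mean`, `averages_match`, `best_per_scene`) are made up for illustration, and it assumes the table reports errors, so lower is better.

```python
# Per-scene values (Scene 1..5) and reported averages, copied from the table.
per_scene = {
    "LBP+RR": [13.6, 58.9, 37.1, 21.8, 23.4],
    "Fiaschi et al. ICPR'12": [2.2, 87.3, 22.2, 16.4, 5.4],
    "Chen et al. BMVC'12": [2.1, 55.9, 9.6, 11.3, 3.4],
    "Crowd CNN": [2.0, 29.5, 9.7, 9.3, 3.1],
}
reported_average = {
    "LBP+RR": 31.0,
    "Fiaschi et al. ICPR'12": 26.7,
    "Chen et al. BMVC'12": 16.5,
    "Crowd CNN": 10.7,
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def averages_match(tolerance=0.05):
    """True if every reported average equals the recomputed mean up to
    one-decimal rounding (plus a little slack for float error)."""
    return all(
        abs(mean(per_scene[m]) - reported_average[m]) <= tolerance + 1e-9
        for m in per_scene
    )


def best_per_scene():
    """Method with the lowest value in each scene (assumes lower is better)."""
    n_scenes = 5
    return [
        min(per_scene, key=lambda m: per_scene[m][s]) for s in range(n_scenes)
    ]


if __name__ == "__main__":
    print(averages_match())
    print(best_per_scene())
```

With these numbers, Crowd CNN has the lowest value in every scene except Scene 3, where Chen et al. is lower by 0.1.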
Figure: GT counting map, GT velocity map, and GT density map, shown with the estimated counting map, estimated velocity map, and estimated density map.
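The slides do not say how the GT density map is built from annotations. A common choice in the crowd-counting literature, assumed here, is to place one normalized Gaussian at each annotated head position, so that summing the map gives the person count. The function names, the `sigma` value, and the example points below are illustrative only and are not taken from the talk.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map from (row, col) head annotations.

    Assumption: one isotropic Gaussian per person, renormalized over the
    pixels that fall inside the image so each person contributes exactly 1.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for r0, c0 in points:
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                dist2 = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-dist2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        if total == 0.0:
            continue  # annotation lies too far outside the image
        for (r, c), w in weights.items():
            dmap[r][c] += w / total
    return dmap


def count_from_density(dmap):
    """The crowd count is the integral (here: the sum) of the density map."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(5, 5), (10, 20), (0, 0)]
    print(round(count_from_density(density_map(heads, 24, 32)), 6))
```

The same summation applies to an estimated density map, which is how a density-regression network yields a count.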
BLUE: input locations. GREEN: GT future locations. RED: current locations.
Input pedestrian walking paths → Encoding → Encoded input displacement volume → Behavior-CNN → Encoded output displacement volume → Decoding → Predicted pedestrian walking paths
Encoding: Input pedestrian walking paths → Encoded input displacement volume. The walking paths of pedestrian i and pedestrian j give displacement vector i and displacement vector j in the displacement volume.
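The encoding step can be sketched roughly as follows. The slide only names the pieces (walking paths, per-pedestrian displacement vectors, a displacement volume), so the details here are assumptions: each displacement vector holds the offsets of earlier locations relative to the current location, and it is stored at the grid cell of the pedestrian's current location. The function names and the toy grid are hypothetical.

```python
def displacement_vector(path):
    """Flatten the offsets of earlier locations relative to the current
    (last) location of one pedestrian: [dx_1, dy_1, dx_2, dy_2, ...]."""
    x_now, y_now = path[-1]
    vec = []
    for x, y in path[:-1]:
        vec.extend([x - x_now, y - y_now])
    return vec


def encode(paths, height, width):
    """Encode walking paths into a height x width x depth volume.

    Assumption: every path has the same number of time steps, and each
    pedestrian's vector is written at the cell of their current location.
    Cells without a pedestrian stay all-zero.
    """
    if not paths:
        raise ValueError("at least one walking path is required")
    steps = len(next(iter(paths.values())))
    depth = 2 * (steps - 1)
    volume = [[[0.0] * depth for _ in range(width)] for _ in range(height)]
    for path in paths.values():
        if len(path) != steps:
            raise ValueError("all paths must have the same length")
        x_now, y_now = path[-1]
        row, col = int(y_now), int(x_now)
        if 0 <= row < height and 0 <= col < width:
            volume[row][col] = displacement_vector(path)
    return volume


if __name__ == "__main__":
    paths = {
        "i": [(1, 1), (2, 1), (3, 1)],  # walking right
        "j": [(5, 4), (5, 3), (5, 2)],  # walking up
    }
    volume = encode(paths, height=6, width=8)
    print(volume[1][3], volume[2][5])
```

Storing vectors at spatial locations keeps nearby pedestrians in nearby cells, which is what lets convolutional filters see local interactions; decoding would read vectors back out of the output volume in the same way.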
 | Dataset I [*] | Dataset II
Scene type | Indoor | Outdoor
Resolution (pixel) | 1,920 by 1,080 | 1,920 by 1,080
Video duration (s) | 4,000 | 450
Frame rate (fps) | 25 | 25
Annotated pedestrians | 5,000 | 560
 | 12,684 | 797
[*] S. Yi, H. Li, and X. Wang. Understanding pedestrian behaviors from stationary crowd groups. In Proc. CVPR. IEEE, 2015.
The input pedestrian paths can be classified into some rough categories by the filters in conv1.
Filters of conv2 and conv3 generally classify pedestrians into finer and more specific categories. Filters in higher-level layers generally encode more complex behavior, e.g. stationary crowds.
Method | Dataset I (Annotation) | Dataset II (Annotation) | Dataset I (KLT) | Dataset II (KLT)
Behavior‐CNN | 2.421% | 2.348% | 2.517% | 3.816%
Constant velocity | 6.091% | 6.468% | 5.864% | 5.635%
Constant acceleration | 9.899% | 9.428% | 6.619% | 7.656%
SVM regression | 4.639% | 4.276% | 5.053% | 5.327%
SFM [Helbing’95] | 4.280% | 5.921% | 4.447% | 5.044%
LTA [Pellegrini’09] | 4.723% | 4.571% | 4.346% | 4.639%
TIM [Cancela’14] | 4.075% | 4.141% | 4.790% | 4.790%
Prediction results (MSE) of different methods trained on the annotated pedestrian walking paths or the KLT trajectories on Dataset I and Dataset II.
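To make the simplest baseline in the table concrete, here is a minimal sketch of a constant-velocity predictor together with a mean-squared-error measure. The slides report MSE as percentages but do not state the normalization, so the sketch computes plain MSE on raw coordinates, defined here (as an assumption) as the mean squared Euclidean distance per predicted point. The function names and toy path are illustrative.

```python
def constant_velocity(path, n_future):
    """Extrapolate future (x, y) locations by repeating the last observed
    step. Needs at least two observed locations."""
    if len(path) < 2:
        raise ValueError("need at least two observed locations")
    (x0, y0), (x1, y1) = path[-2], path[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, n_future + 1)]


def mean_squared_error(predicted, ground_truth):
    """Mean squared Euclidean distance between matched predicted and
    ground-truth locations (unnormalized; an assumed definition)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        (px - gx) ** 2 + (py - gy) ** 2
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    )
    return total / len(predicted)


if __name__ == "__main__":
    observed = [(0, 0), (1, 0), (2, 0)]    # pedestrian walking right
    future_gt = [(3, 1), (4, 2)]           # actually drifts upward
    predicted = constant_velocity(observed, n_future=2)
    print(predicted, mean_squared_error(predicted, future_gt))
```

A pedestrian who turns or stops is exactly the case where this baseline accumulates error, which is consistent with learned models scoring lower in the table.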
Annotation. Training: annotated pedestrian locations. Evaluation: annotated pedestrian locations.
3conv+pool+3conv and 3conv+3conv.
Prediction results of different filter sizes and net structures on Dataset I.
Figure: input pedestrian walking paths with predictions from the proposed Behavior‐CNN, constant velocity, and LTA [Pellegrini et al.]. BLUE dots: input previous locations. GREEN dots: ground truth future locations. RED dots: the predicted future locations.
Results of pedestrian tracking on Dataset I
GREEN dots: ground truth
BLUE dots: RFT [Zhou et al.] RED dots: proposed