Crowd Counting and Behavior Modeling with Convolutional Neural Networks



  1. Crowd Counting and Behavior Modeling with Convolutional Neural Networks. Hongsheng Li (李鴻升). 1 Dept. of Electronic Engineering, 2 Multimedia Laboratory, The Chinese University of Hong Kong

  2. Typical Surveillance Scenario

  3. Background Subtraction [Stauffer and Grimson 1999] [Elgammal et al. 2000] [Zivkovic 2004] [Kim et al. 2005] [Sheikh and Shah 2005]

  4. Crowd Tracking [Lucas and Kanade 1981] [Shi and Tomasi 1994] [Wang et al. 2011]

  5. Crowd Motion Analysis [Ali and Shah 2007] [Amer and Todorovic 2011] [Chang et al. 2011] [Loy et al. 2012] [Pellegrini et al. 2009] [Zhou et al. 2013]

  6. Contents
     • Crowd Counting
       – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
       – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
     • Crowd Behavior Modeling
       – Multi-person walking path prediction [Yi et al. ECCV’16]

  7. Contents
     • Crowd Counting
       – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
       – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
     • Crowd Behavior Modeling
       – Multi-person walking path prediction [Yi et al. ECCV’16]

  8. Cross-scene Crowd Counting
     • Problem definition
       – Counting the people in the Region-Of-Interest (ROI, denoted as the blue region)

  9. CNN for Crowd Counting
     • Training strategy
       – Patch-based training
       – Alternating training between the crowd-count and crowd-density objectives

  10. Create Ground Truth Patches (Cont’d)
      • Estimating the perspective map of a scene
        – Each scene needs 2-4 annotations of person height
        – Each pixel of the map stores how many meters that pixel represents
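
The perspective-map step on this slide can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that a person's pixel height grows linearly with the image row and that annotated pedestrians have a nominal height of 1.75 m. Neither assumption is stated on the slide, and the names `perspective_map` and `ASSUMED_HEIGHT_M` are hypothetical.

```python
import numpy as np

ASSUMED_HEIGHT_M = 1.75  # assumption: nominal adult height, not given on the slide

def perspective_map(annotations, image_shape):
    """annotations: (image_row, person_height_in_pixels) pairs, 2-4 per scene.
    Returns an (H, W) map of how many meters one pixel represents."""
    rows = np.array([a[0] for a in annotations], dtype=float)
    heights = np.array([a[1] for a in annotations], dtype=float)
    # Assumption: pixel height of a person varies linearly with the image row.
    slope, intercept = np.polyfit(rows, heights, deg=1)
    h, w = image_shape
    pixel_height = slope * np.arange(h) + intercept
    pixel_height = np.clip(pixel_height, 1e-6, None)  # guard against division by zero
    meters_per_pixel = ASSUMED_HEIGHT_M / pixel_height
    # Every pixel in a row shares the same value.
    return np.repeat(meters_per_pixel[:, None], w, axis=1)

# Three hypothetical height annotations for a 576 x 720 frame.
pmap = perspective_map([(100, 35.0), (300, 70.0), (500, 105.0)], (576, 720))
```

Rows near the bottom of the frame get smaller values because people there appear larger, so each pixel covers less ground.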

  11. Create Ground Truth Patches (Cont’d)
      • Convolve the head annotation map with a person-shape kernel
        – The person-shape kernel should sum to 1
        – Crop 3x3 meter patches
        – Normalize patches to the same size (72x72)
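
A minimal sketch of the density-map construction on this slide. It swaps the person-shape kernel for a plain 2D Gaussian (an assumption made only to keep the example short) and omits the 3x3 meter cropping and 72x72 resizing. What it shows is why the kernel must sum to 1: the density map then integrates to the number of annotated people. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=15, sigma=3.0):
    """Stand-in for the person-shape kernel (assumption), normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def density_map(head_points, shape, kernel):
    """Place one kernel copy at every annotated head position (row, col)."""
    h, w = shape
    r = kernel.shape[0] // 2
    padded = np.zeros((h + 2 * r, w + 2 * r))
    for row, col in head_points:
        # The kernel centre lands on (row + r, col + r) in the padded map.
        padded[row:row + 2 * r + 1, col:col + 2 * r + 1] += kernel
    return padded[r:r + h, r:r + w]  # crop the padding away

kernel = gaussian_kernel()
heads = [(20, 30), (40, 50), (41, 52)]  # hypothetical head annotations
dmap = density_map(heads, (72, 72), kernel)
estimated_count = dmap.sum()  # equals the number of heads when none touch the border
```

Heads closer to the border than the kernel radius lose part of their mass in the final crop, so the sum slightly undercounts there.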

  12. Alternating Training Strategy
      • Train each step until convergence
        – Train with pixel-level density maps and L2 loss
        – Train with crowd counts of patches
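
The two objectives on this slide can be written down as a short sketch. The pixel-level L2 loss is taken from the slide. The squared-error form of the count loss and the simple round-robin schedule are assumptions, and all function names are hypothetical.

```python
import numpy as np

def density_loss(pred_density, gt_density):
    """Pixel-level L2 loss between predicted and ground-truth density maps."""
    return 0.5 * float(np.sum((pred_density - gt_density) ** 2))

def count_loss(pred_count, gt_count):
    """Squared error on the patch-level crowd count (loss form is an assumption)."""
    return 0.5 * float((pred_count - gt_count) ** 2)

def objective_schedule(num_rounds):
    """Yield which objective to train in each round, switching every round.
    Per the slide, each round would run until convergence."""
    for r in range(num_rounds):
        yield "density" if r % 2 == 0 else "count"

# Toy 72x72 patch containing two people spread uniformly.
gt = np.full((72, 72), 2.0 / (72 * 72))
pred = np.zeros((72, 72))
losses = {
    "density": density_loss(pred, gt),
    "count": count_loss(pred.sum(), gt.sum()),
}
schedule = list(objective_schedule(4))
```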

  13. Finetuning on Unseen Scenes
      • Training on all training scenes
      • For an unseen scene, the trained model might not be suitable for direct deployment
      • Finetuning the pre-trained model on training patches similar to the test patches

  14. Training Patch Retrieval
      • Candidate training scene retrieval
        – Given a target scene, retrieve training scenes with a similar perspective map (i.e., scenes with similar viewing angles)
        – The top 20 perspective-map-similar training scenes are kept
      [Figure: top-20 training scenes for Test Scene 1 and Test Scene 2]
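
A minimal sketch of the scene-retrieval step. The slide does not say how perspective-map similarity is measured. The Euclidean distance between maps of equal size used here is an assumption, and `retrieve_similar_scenes` is a hypothetical name.

```python
import numpy as np

def retrieve_similar_scenes(target_pmap, training_pmaps, top_k=20):
    """Return indices of the top_k training scenes, most similar first.
    Similarity = small Euclidean distance between perspective maps (assumption)."""
    dists = [np.linalg.norm(target_pmap - p) for p in training_pmaps]
    return np.argsort(dists)[:top_k].tolist()

# Four toy training scenes with constant perspective maps.
training = [np.full((4, 4), v) for v in (0.1, 0.5, 0.2, 0.9)]
target = np.full((4, 4), 0.25)
nearest = retrieve_similar_scenes(target, training, top_k=2)
```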

  15. Training Patch Retrieval (cont’d)
      • Candidate training patch retrieval
        – Estimate the target scene density using the pretrained model
        – Retrieve training patches to match the distribution of the target scene according to its density histogram
      [Figure: target scene density distribution; training patches density distribution]
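
One way to sketch histogram-matched patch retrieval. It assumes each patch is summarized by a single density value (here a per-patch people count) and that patches are drawn per histogram bin in proportion to the target scene's histogram. The binning and sampling scheme are assumptions, not details from the slide, and `retrieve_patches` is a hypothetical name.

```python
import numpy as np

def retrieve_patches(train_density, target_density, bins, num_patches, seed=0):
    """Pick training-patch indices whose density histogram follows the target's."""
    rng = np.random.default_rng(seed)
    train_density = np.asarray(train_density, dtype=float)
    target_hist, _ = np.histogram(target_density, bins=bins)
    target_prob = target_hist / target_hist.sum()
    # Bin index of each training patch, using the same edges as the histogram.
    train_bin = np.digitize(train_density, bins[1:-1])
    selected = []
    for b, p in enumerate(target_prob):
        candidates = np.flatnonzero(train_bin == b)
        quota = int(round(p * num_patches))
        if quota == 0 or candidates.size == 0:
            continue
        # Sample with replacement only when the bin has too few patches.
        picked = rng.choice(candidates, size=quota, replace=candidates.size < quota)
        selected.extend(picked.tolist())
    return selected

bins = np.array([0, 5, 10, 20])
target_density = [1, 2, 3, 12, 15, 18, 19, 11]            # estimated by the pretrained model
train_density = [0, 1, 4, 6, 7, 11, 13, 14, 16, 19, 2, 3]  # known for training patches
selected = retrieve_patches(train_density, target_density, bins, num_patches=8)
```

In this toy case the target scene has no patches in the middle bin, so no mid-density training patches are retrieved.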

  16. Datasets
      • UCSD [Chan et al. CVPR’08]
      • UCF_CC_50 [Idrees et al. CVPR’13]
      • WorldExpo’10 dataset (with SJTU)
        – Train & validation: 1,127 one-minute video clips of 103 scenes
        – Test: 5 one-hour video clips from 5 scenes

      | Dataset | # frames | # scenes | Resolution | FPS | # people per frame | # total annotations |
      |---|---|---|---|---|---|---|
      | UCSD | 2,000 | 1 | 158 x 238 | 10 | 11-46 | 49,885 |
      | UCF_CC_50 | 50 | 50 | Various | image | 94-4543 | 63,974 |
      | WorldExpo | 4.44 million | 108 | 576 x 720 | 25 | 1-253 | 199,923 |

  17. Results

  18. Results: WorldExpo’10
      • Metric: mean absolute error

      | Method | Scene 1 | Scene 2 | Scene 3 | Scene 4 | Scene 5 | Average |
      |---|---|---|---|---|---|---|
      | LBP+RR | 13.6 | 58.9 | 37.1 | 21.8 | 23.4 | 31.0 |
      | Fiaschi et al. ICPR’12 | 2.2 | 87.3 | 22.2 | 16.4 | 5.4 | 26.7 |
      | Chen et al. BMVC’12 | 2.1 | 55.9 | 9.6 | 11.3 | 3.4 | 16.5 |
      | Crowd CNN | 2.0 | 29.5 | 9.7 | 9.3 | 3.1 | 10.7 |
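
For reference, the metric on this slide can be computed as in the sketch below, which also recomputes the Average column from the per-scene values listed on the slide. The helper name `mean_absolute_error` is hypothetical. The per-scene numbers are copied from the slide, not newly measured.

```python
import numpy as np

def mean_absolute_error(pred_counts, gt_counts):
    """MAE between predicted and ground-truth per-frame counts."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(pred - gt)))

# Per-scene MAE values as listed on the slide (Scene 1 to Scene 5).
per_scene = {
    "LBP+RR": [13.6, 58.9, 37.1, 21.8, 23.4],
    "Fiaschi et al.": [2.2, 87.3, 22.2, 16.4, 5.4],
    "Chen et al.": [2.1, 55.9, 9.6, 11.3, 3.4],
    "Crowd CNN": [2.0, 29.5, 9.7, 9.3, 3.1],
}
# The Average column is the mean over the five test scenes, rounded to one decimal.
averages = {m: round(float(np.mean(v)), 1) for m, v in per_scene.items()}
```

The recomputed averages agree with the slide's Average column for all four methods.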

  19. Results: UCSD & UCF_CC_50
      [Panels: UCSD; UCF_CC_50]

  20. Contents
      • Crowd Counting
        – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
        – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
      • Crowd Behavior Modeling
        – Multi-person walking path prediction [Yi et al. ECCV’16]

  21. Crossing-line Crowd Counting
      • Problem definition
        – Count people crossing a Line-of-Interest (LOI) in both directions
        – Addresses practical needs in intelligent surveillance

  22. Temporal slicing
      • Existing LOI counting methods mostly use temporal slices

  23. CNN with Pixel-level Supervision
      • CNN trained with pixel-level supervision maps
        – Instantaneous crowd counting map, which can be decomposed into
          – Crowd density map
          – Crowd velocity map

  24. Definition of Crowd Counting Map
      • For a single time step, the map records at each location how many persons have passed that location along the x and y directions.
      • Crossing-line counts can be calculated by projecting the values onto the normal direction of the LOI

  25. Definition of Crowd Counting Map (cont’d)
      • The crowd counting map can be decomposed as the elementwise multiplication of the crowd density map and the crowd velocity map
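
Written out, the decomposition on this slide reads as below. The notation is assumed for illustration and does not come from the slides: $D_t(p)$ is the crowd density at pixel $p$ and time $t$, $V_t(p) = (v^x_t(p), v^y_t(p))$ is the crowd velocity, and $C_t(p)$ is the counting map with its x and y components.

```latex
C_t(p) \;=\; D_t(p)\, V_t(p)
       \;=\; \bigl( D_t(p)\, v^x_t(p),\; D_t(p)\, v^y_t(p) \bigr)
```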

  26. Two-phase Strategy
      • Two-phase strategy
        – Phase I: train with density and velocity supervision
        – Phase II: train with counting supervision

  27. Supervision Maps
      [Figure panels: GT counting map, GT velocity map, GT density map; estimated counting map, estimated velocity map, estimated density map]

  28. From Instantaneous Counts to LOI Counts
      • Project the x- and y-directional counting values on the LOI onto its normal direction.
      • Integrating over all the projected values gives the instantaneous LOI counts in the two directions at time t.
      • For a period of time T, integrate the instantaneous counts to obtain the final crossing-line counts within T.
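
The projection and integration steps on this slide can be sketched as follows. It assumes the counting maps are stored as a (T, H, W, 2) array and that the two crossing directions are separated by the sign of the projected value. Both the storage layout and the sign-based split are assumptions for illustration, and `loi_counts` is a hypothetical name.

```python
import numpy as np

def loi_counts(counting_maps, loi_pixels, normal):
    """counting_maps: (T, H, W, 2) array of x and y counting values per frame.
    loi_pixels: (row, col) pixels lying on the LOI.
    normal: (nx, ny) vector normal to the LOI.
    Returns (count along the normal, count against the normal) over all T frames."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    rows = np.array([p[0] for p in loi_pixels])
    cols = np.array([p[1] for p in loi_pixels])
    on_line = counting_maps[:, rows, cols, :]   # (T, L, 2): values on the LOI
    projected = on_line @ n                     # (T, L): projection onto the normal
    # Assumption: positive and negative projections are the two crossing directions.
    forward = np.clip(projected, 0, None).sum()
    backward = np.clip(-projected, 0, None).sum()
    return float(forward), float(backward)

# Toy example: horizontal LOI on row 2 of a 4x4 frame, normal pointing along +y.
maps = np.zeros((3, 4, 4, 2))
maps[0, 2, 1, 1] = 0.5    # half a person crosses along +y in frame 0
maps[1, 2, 1, 1] = 0.5    # the other half in frame 1
maps[2, 2, 3, 1] = -1.0   # one person crosses the other way in frame 2
loi = [(2, c) for c in range(4)]
down, up = loi_counts(maps, loi, normal=(0.0, 1.0))
```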

  29. LOI Counting Dataset
      • A new LOI counting dataset
      • Evaluation metric: Mean Windowed Relative Absolute Errors

  30. LOI Counting Dataset
      • A new LOI counting dataset

  31. Results
      • Baselines:
        – Phase I: no Phase II training; the estimated velocity map and density map are directly multiplied
        – Direct-A: CNN without elementwise multiplication, trained directly with Phase II supervision
        – Direct-B: CNN with elementwise multiplication, trained directly with Phase II supervision
        – Two-separate: two separate CNNs for velocity and density

  32. Results
      [Figure labels: 2X Speed; Downward; Upward]

  33. Contents
      • Crowd Counting
        – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
        – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
      • Crowd Behavior Modeling
        – Multi-person walking path prediction [Yi et al. ECCV’16]

  34. Problem Definition
      • Previous five frames as input
      [Figure legend: BLUE = input locations; GREEN = GT future locations; RED = current locations]

  35. Problem Definition
      • Need to predict the future five frames
      [Figure legend: BLUE = input locations; GREEN = GT future locations; RED = current locations]
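
The input/output setup of the prediction task can be illustrated with a short sketch that cuts one pedestrian track into (previous five frames, future five frames) pairs. It assumes one (x, y) location per frame and treats the last input frame as the current location. Both are assumptions, and `split_trajectory` is a hypothetical name. The sketch covers only the data layout, not the prediction model.

```python
import numpy as np

def split_trajectory(track, num_input=5, num_future=5):
    """track: (N, 2) array of (x, y) locations, one per frame.
    Returns a list of (past, future) pairs taken with a sliding window."""
    track = np.asarray(track, dtype=float)
    window = num_input + num_future
    samples = []
    for start in range(len(track) - window + 1):
        past = track[start:start + num_input]               # input locations
        future = track[start + num_input:start + window]    # ground-truth targets
        samples.append((past, future))
    return samples

# A hypothetical 12-frame track moving diagonally by one unit per frame.
track = np.stack([np.arange(12.0), np.arange(12.0)], axis=1)
samples = split_trajectory(track)
past, future = samples[0]
current_location = past[-1]  # assumption: last input frame is the current location
```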

  36. Main Difficulties
      • How to solve the problem with a deep neural network?
      • How to encode pedestrian walking paths as the input of a deep network?
      • How to jointly model the behaviors of all pedestrians in the scene?
