PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video


1. PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video
General Coach: Wen Gao (a), Xihong Wu (b), Tiejun Huang (a)
Executive Coach: Yonghong Tian (a), Yaowei Wang (a), Lei Qing (a)
Members: Zhipeng Hu (a*), Guangnan Ye (b*), Guochen Jia (a), Xibin Chen (b), Qiong Hu (c), Kaihua Jiang (b)
(a) National Engineering Laboratory for Video Technology, Peking University
(b) Speech and Hearing Research Center, Peking University
(c) Key Lab of Intel. Inf. Proc., Institute of Computing Technology, Chinese Academy of Sciences

2. Outline
- Overview
  - Introduction of the TRECVID-ED tasks
  - Summary of TRECVID-ED 2008
  - Our results in TRECVID-ED 2009
- Our solution in the eSur system
  - Background modeling
  - Detection and tracking
  - Event classification
  - Post-processing
- Illustrative results
- Summary

3. Overview of the TRECVID-ED Tasks
- Task
  - To develop an automatic system to detect observable events in surveillance video
- Ten events
  - PeopleMeet, PeopleSplitUp, Embrace, ElevatorNoEntry, PersonRuns, CellToEar, ObjectPut, TakePicture, Pointing, OpposingFlow
- Challenges
  - Cluttered scenes
  - Illumination variations
  - Occlusion
  - Different camera views
  - No clear event definition

4. The Best Results of 2008

SITEID       | Event           | #Ref | #Sys  | #CorDet | #FA   | #Miss | Act. DCR
IFP-UIUC-NEC | CellToEar       | 349  | 15    | 1       | 14    | 348   | 0.999
Intuvision   | ElevatorNoEntry | 0    | 8     | 0       | 8     | 0     | NA
DCU          | Embrace         | 401  | 36193 | 91      | 5091  | 310   | 1.271
IFP-UIUC-NEC | ObjectPut       | 1944 | 83    | 6       | 77    | 1938  | 1.004
Intuvision   | OpposingFlow    | 12   | 31    | 9       | 12    | 3     | 0.251
SJTU         | PeopleMeet      | 1182 | 25033 | 270     | 5779  | 912   | 1.337
CMU          | PeopleSplitUp   | 671  | 42415 | 185     | 42230 | 486   | 4.856
MCG-ICT-CAS  | PersonRuns      | 314  | 662   | 23      | 639   | 291   | 0.989
SJTU         | Pointing        | 2316 | 1005  | 35      | 970   | 2281  | 1.080
Intuvision   | TakePicture     | 23   | 10    | 0       | 10    | 23    | 1.000

- Note:
  - There is still much room for improvement.
  - The OpposingFlow event has good detection performance.
  - ElevatorNoEntry and TakePicture have zero correct detections.

5. Approaches in 2008
- PeopleMeet (SJTU): Camshift-guided particle filter + HMM
  - Combine a head-top detector and a human detector
  - Camshift-guided particle filter to obtain trajectories
  - HMM models to detect hidden states defined by trajectory features
- PeopleSplitUp (CMU): Key points + SVM
  - Cluster interest points into visual keywords
  - SVM classifiers to detect activities
  - Event segmentation is done in a multi-resolution framework, where all activity durations found in training are tried
- Embrace (DCU): Pedestrian tracking in 3D space
  - Detect and track pedestrians to infer their 3D locations
  - Calculate the probability of a person taking part in an Embrace event
- PersonRuns (ICT): Data correlation + trajectory features
  - Train full-body and head-shoulder detectors using standard Haar-like features
  - Adopt the data-correlation method with visual features to track objects
  - Detect events by trajectory length, location of trajectory points, and speed
- ElevatorNoEntry (Intuvision): Pedestrian detection + histogram matching
  - Haar-based pedestrian detection
  - Histogram matching to find persons not entering an elevator
- ……

References:
- A. Hauptmann et al., Informedia @ TRECVID 2008: Exploring New Frontiers.
- X. Yang et al., Shanghai Jiao Tong University participation in high-level feature extraction, automatic search and surveillance event detection at TRECVID 2008.
- P. Wilkins et al., Dublin City University at TRECVID 2008.
- J.B. Guo et al., TRECVID 2008 Event Detection by MCG-ICT-CAS.
- P. Yarlagadda et al., Intuvision Event Detection System for TRECVID 2008.

6. Our Results in TRECVID-ED 2009 (1)

p-eSur_1
Event           | #Ref | #Sys | #CorDet | #FA | #Miss | Act. DCR
PeopleMeet      | 449  | 125  | 7       | 118 | 442   | 1.023
PeopleSplitUp   | 187  | 198  | 7       | 191 | 180   | 1.025
Embrace         | 175  | 80   | 1       | 79  | 174   | 1.020
ElevatorNoEntry | 3    | 4    | 2       | 2   | 1     | 0.334

p-eSur_2
Event           | #Ref | #Sys | #CorDet | #FA | #Miss | Act. DCR
PeopleMeet      | 449  | 210  | 15      | 195 | 434   | 1.030
PeopleSplitUp   | 187  | 881  | 14      | 867 | 173   | 1.209
Embrace         | 175  | 164  | 3       | 161 | 172   | 1.036
PersonRuns      | 107  | 356  | 5       | 351 | 102   | 1.068

p-eSur_3
Event           | #Ref | #Sys | #CorDet | #FA | #Miss | Act. DCR
PeopleMeet      | 449  | 210  | 15      | 195 | 434   | 1.030
PeopleSplitUp   | 187  | 881  | 14      | 867 | 173   | 1.209
Embrace         | 175  | 164  | 3       | 161 | 172   | 1.036
ElevatorNoEntry | 3    | 0    | 0       | 0   | 3     | 1.000

7. Our Results in TRECVID-ED 2009 (2)
- Compared with the best results in TRECVID-ED 2008, directly on the reported results, in terms of Act. DCR

Event           | Our Best | Best 2008 | Imp.
PeopleMeet      | 1.023    | 1.337     | -0.314
PeopleSplitUp   | 1.025    | 4.856     | -3.831
Embrace         | 1.020    | 1.271     | -0.251
ElevatorNoEntry | 0.334    | N/A       | -
PersonRuns      | 1.068    | 0.989     | +0.079

Note: Our results are evaluated on the ED 2009 data with the 2009 DCR metric, while the 2008 best results are evaluated on the ED 2008 data with the 2008 DCR metric.

- On the TRECVID-ED 2008 data, in terms of the 2008 Act. DCR

Event           | Our Best | Best 2008 | Imp.
PeopleMeet      | 1.245    | 1.337     | -0.092
PeopleSplitUp   | 1.976    | 4.856     | -2.880
Embrace         | 1.208    | 1.271     | -0.063
ElevatorNoEntry | 0.130    | N/A       | -
PersonRuns      | 1.249    | 0.989     | +0.260

8. What Has Been Improved?
- What?
  1. Effectively reduced false alarms in detection
  2. Obtained comparable detection accuracy, and much better results for ElevatorNoEntry
- Why?
  1. Adaptive background modeling
  2. Effective human detection and tracking
  3. Ensemble of one-vs.-all SVM and automata-based classifiers
  4. Effective event merging and post-processing

9. Our Solution: Treatments for Different Event Categories
- Pair-activity event: one person interacts with another person
- Single-actor event: no interaction with other people

Retrospective event detection
- Pair-activity events: PeopleMeet, PeopleSplitUp, Embrace
- Single-actor events: PersonRuns, ElevatorNoEntry

10. Our eSur Framework for TRECVID-ED
[Framework diagram] Pipeline modules: Background Subtraction and Camera Classification; Body Detection and Head-Shoulder Detection; Object Tracking; Feature Extraction; One-vs-All SVM and Automata classifiers; Events Merging; Post-Processing

11. Our Solution (1): Background Modeling
- Mixture of Gaussians (MoG):
  - To accurately extract the foreground while effectively decreasing detection false alarms
- Block-wise PCA model:
  - To identify which camera the video belongs to
  - Also used in ElevatorNoEntry event detection
  - "Block": segment each frame into blocks
  - "Wise": adaptively select the principal component for background reconstruction

12. MoG
- Key idea
  - Randomly select 1000 frames from each camera
  - Manually label the foreground objects
  - Use the EM algorithm to estimate the model
- Results of background reconstruction: [reconstructed backgrounds for Cam1, Cam2, Cam3, and Cam5]
- Disadvantage: computationally time-consuming
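A minimal sketch of mixture-of-Gaussians background subtraction, using OpenCV's standard MOG2 model as a stand-in. The slides estimate a per-camera MoG offline with EM on 1000 manually labelled frames, which this sketch does not reproduce; the video file name and parameter values are placeholders.

```python
import cv2

# Stand-in for the per-camera MoG background model on this slide. OpenCV's MOG2
# updates the mixture online rather than fitting it offline with EM on labelled
# frames, so treat this only as an illustration of the subtraction step.
mog = cv2.createBackgroundSubtractorMOG2(history=1000, varThreshold=16,
                                         detectShadows=False)

cap = cv2.VideoCapture("camera1.avi")  # placeholder file name
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog.apply(frame)                 # foreground mask for this frame
    background = mog.getBackgroundImage()      # current background reconstruction
    # Clean small speckles before passing the mask on to the detection stage.
    fg_mask = cv2.morphologyEx(
        fg_mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
cap.release()
```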

13. Block-wise PCA
- General PCA
  - Models a whole frame
  - Problems: high spatio-temporal computational complexity; high miss ratio (especially for static objects)
- Block-wise PCA
  - Segments a frame into blocks and models each block separately
  - Lower spatio-temporal computational complexity
  - Adaptively selects the principal component by MMSE with respect to the mean background
  - Lower miss ratio and less blocking effect

For each block i, the reconstruction is

B_i = \arg\min_{\phi_i} \| I - B_i \|^2, \quad B_i = \phi_i \phi_i^T I

where I is the trained mean background, \phi_i is the i-th principal component, and B_i is the i-th reconstructed background.
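A minimal numpy sketch of the block-wise selection rule above: each block keeps the single principal component whose rank-one reconstruction of the mean background has minimum squared error. The block size, number of candidate components, and the use of uncentred SVD are assumptions, not values from the slides.

```python
import numpy as np

def blockwise_pca_background(frames, block=16, n_components=5):
    """Illustrative block-wise PCA background reconstruction.

    frames: (T, H, W) array of grayscale training frames; H and W are assumed
    to be multiples of `block`.
    """
    T, H, W = frames.shape
    background = np.zeros((H, W))
    for y in range(0, H, block):
        for x in range(0, W, block):
            X = frames[:, y:y + block, x:x + block].reshape(T, -1)  # T samples per block
            mean_bg = X.mean(axis=0)                  # trained mean background I
            # Principal directions of the (uncentred) block samples.
            _, _, Vt = np.linalg.svd(X, full_matrices=False)
            best, best_err = None, np.inf
            for phi in Vt[:n_components]:             # candidate components phi_i
                B_i = phi * (phi @ mean_bg)           # B_i = phi_i phi_i^T I
                err = np.sum((mean_bg - B_i) ** 2)    # MMSE selection criterion
                if err < best_err:
                    best, best_err = B_i, err
            background[y:y + block, x:x + block] = best.reshape(block, block)
    return background
```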

14. Comparative Results
- Blocking vs. no blocking

Method                         | No blocking | Blocking
Training time (for 300 frames) | 361.332 s   | 150.406 s

  - Experiment platform: Intel Xeon E5410 2.33 GHz, 8 GB
  - [Foreground extraction result with no blocking vs. with blocking]
- Block PCA vs. block-wise PCA
  - [Original image, block-wise PCA reconstruction, and block PCA reconstruction]

15. Our Solution (2): Detection and Tracking
- Detection: histograms of oriented gradients (HOG) for both the whole body and the head-shoulder region
- Tracking: online boosting
  - Forward and backward tracking
  - Combining color similarity to reduce drift

16. HOG Detector
- Fusion of head-shoulder and body detection
- Adjust the detector's search scales
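An illustrative multi-scale HOG detection pass using OpenCV's built-in full-body people detector. The system described here also fuses an in-house head-shoulder detector, which is not public, so only the body detector is shown; the image name, stride, scale, and confidence threshold are assumptions.

```python
import cv2
import numpy as np

# Standard HOG person detector (full body only).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("surveillance_frame.jpg")  # placeholder image

# "Adjust the detector's search scales": winStride and scale control how densely
# and over how many pyramid levels the sliding window is evaluated.
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)
for (x, y, w, h), score in zip(boxes, np.ravel(weights)):
    if score > 0.5:  # assumed confidence threshold
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```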

17. Detection Results
[Example frames with detection bounding boxes]

18. Tracking Process
[Diagram: forward tracking and backward tracking over frames 1, 2, 3, 4, …, and the combined result. Legend: expected target, detection result, canceled, expected path, final path]

19. State Machine of Tracking
- D: a detection exists
- ND: no detection result
- P: online-boosting prediction result
- NH: not human; drifting has happened
- H: no drifting
- S: online-boosting and detection results are similar
- U: online-boosting and detection results are not similar
[State-machine diagram: Start → head-shoulder and body detection (D / ND) → online-boosting prediction (P), with similarity (S / U) and drift (H / NH) checks leading to End]
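A hedged Python sketch of one plausible reading of this state machine. The exact transition rules are only partly visible in the slide, so the state names, the similarity handling, and the drift test below are placeholders rather than the authors' logic.

```python
from enum import Enum, auto

class State(Enum):
    START = auto()    # new track initialised
    DETECT = auto()   # head-shoulder / body detection available this frame
    PREDICT = auto()  # coasting on the online-boosting prediction (ND)
    END = auto()      # track terminated because drifting was detected (NH)

def step(state, detection, prediction, is_similar, is_human):
    """One transition of the tracking state machine (illustrative only).

    detection  - detector output for this frame, or None (D / ND)
    prediction - online-boosting prediction for this frame (P)
    is_similar - whether detection and prediction agree (S / U)
    is_human   - whether the predicted patch still looks human (H / NH)
    """
    if state in (State.START, State.DETECT, State.PREDICT):
        if detection is not None:            # D
            # S: detection and prediction agree; U: re-seed the tracker from the detection.
            return State.DETECT, detection
        if not is_human:                      # ND and NH: drifting, cancel the track
            return State.END, None
        return State.PREDICT, prediction      # ND and H: keep coasting on the prediction
    return State.END, None
```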

20. Detection and Tracking Results
[Example detection results and tracking results]

21. Drift Reduction by Color Similarity
- Problem: drifting
- Solution: combine color similarity to refine the tracking results
[Video comparison: tracking result without color-similarity comparison vs. with color-similarity comparison]
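A small sketch of how color similarity can flag a drifting track: compare a color histogram of the current tracked patch against one of the original target patch and reject the update when the similarity drops. The histogram bins, the correlation metric, and the 0.5 threshold are assumptions for illustration, not values taken from the slides.

```python
import cv2

def hsv_hist(patch, bins=(30, 32)):
    """Normalised Hue-Saturation histogram of a BGR image patch."""
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def has_drifted(reference_patch, tracked_patch, threshold=0.5):
    """Flag drift when the tracked patch no longer resembles the target patch."""
    similarity = cv2.compareHist(hsv_hist(reference_patch),
                                 hsv_hist(tracked_patch),
                                 cv2.HISTCMP_CORREL)
    return similarity < threshold
```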

22. Our Solution (3): Event Detection - Pair-Activity
- Event analysis using key frames
  - Key frames: frames that characterize an event happening
- "PeopleMeet" and "Embrace"
  - Key frame at the end of the event
- "PeopleSplitUp"
  - Key frame at the beginning of the event
[Example key frames for PeopleMeet, Embrace, and PeopleSplitUp]
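A hypothetical illustration of the key-frame idea: the function, threshold, and distance test below are not from the slides, they only show how an end-of-event key frame for PeopleMeet/Embrace (two tracks converging) and a start-of-event key frame for PeopleSplitUp (two tracks diverging) might be located from a pair of trajectories.

```python
import numpy as np

def key_frames(traj_a, traj_b, near=50.0):
    """traj_a, traj_b: (T, 2) arrays of per-frame image positions for two tracks.

    Returns (meet_frame, split_frame): the first frame at which the pair comes
    closer than `near` pixels (candidate PeopleMeet/Embrace key frame, at the
    end of the event) and the first later frame at which they separate again
    (candidate PeopleSplitUp key frame, at the beginning of the event).
    """
    dist = np.linalg.norm(traj_a - traj_b, axis=1)
    together = dist < near
    meet_frame = int(np.argmax(together)) if together.any() else None
    split_frame = None
    if meet_frame is not None:
        apart = ~together[meet_frame:]
        if apart.any():
            split_frame = meet_frame + int(np.argmax(apart))
    return meet_frame, split_frame
```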
