AT&T Research at TRECVID 2013: Surveillance Event Detection

SLIDE 1

AT&T Research at TRECVID 2013: Surveillance Event Detection

Xiaodong Yang†*, Zhu Liu‡, Eric Zavesky‡, David Gibbon‡, Behzad Shahraray‡

†City College of New York, CUNY ‡AT&T Labs - Research

*This work was carried out while the author was a research intern at AT&T Labs - Research.

SLIDE 2

Team Members

Xiaodong Yang, Zhu Liu, Eric Zavesky, David Gibbon, Behzad Shahraray

SLIDE 3

Outline

  • System Overview
  • Low-Level Features
  • Video Representation
  • CascadeSVMs
  • Human Interactions
  • Performance Evaluation
  • Conclusion
SLIDE 5

System Overview

SLIDE 8

Low-Level Feature Extraction

  • STIP-HOG/HOF
  • MoSIFT
  • ActionHOG
  • Dense Trajectories (DT)
      • Trajectory
      • HOG
      • HOF
      • Motion Boundary Histogram (MBH)
SLIDE 9

Low-Level Feature Extraction

  • STIP
      • 3D Harris corner detector
      • HOG-HOF descriptor
  • I. Laptev. On Space-Time Interest Points. IJCV, 2005.
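
HOG and HOF share one core step: a local patch is summarized as a magnitude-weighted histogram of orientations. HOG takes the orientations from image gradients, HOF from optical-flow vectors. The sketch below illustrates only that shared step, assuming NumPy; the helper name `orientation_histogram`, the bin count, and the L2 normalization are illustrative choices, not the settings of the STIP implementation.

```python
import numpy as np

def orientation_histogram(dx, dy, n_bins=8):
    """Magnitude-weighted histogram of vector orientations over a patch.

    dx, dy: 2-D arrays of x/y components (image gradients for a HOG-style
    histogram, optical-flow vectors for a HOF-style one).
    n_bins is an illustrative value, not taken from the slides.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)            # map to [0, 2*pi)
    bins = np.floor(angle / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Toy patch: intensity increases left to right, so gradients point along +x.
patch = np.tile(np.arange(8, dtype=float), (8, 1))
gy, gx = np.gradient(patch)          # np.gradient returns d/drow, then d/dcol
hog_like = orientation_histogram(gx, gy)
```

For the toy patch all gradient energy falls into the first orientation bin. A full descriptor would concatenate such histograms over a grid of spatio-temporal cells around each interest point.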
SLIDE 10

Low-Level Feature Extraction

  • MoSIFT
      • SIFT detector + motion
      • SIFT descriptor
          • image gradient
          • optical flow
  • M. Chen and A. Hauptmann. MoSIFT: Recognizing Human Actions in Surveillance Videos. CMU-CS-09-161, 2009.

SLIDE 11

Low-Level Feature Extraction

  • ActionHOG
      • SURF detector + motion
      • HOG
          • image gradient
          • motion history image
          • optical flow
  • X. Yang, C. Yi, L. Cao, and Y. Tian. MediaCCNY at TRECVID 2012: Surveillance Event Detection. NIST TRECVID Workshop, 2012.

SLIDE 12

Low-Level Feature Extraction

  • Dense Trajectories
      • dense sampling + tracking
      • Trajectory
      • HOG
      • HOF
      • MBH
  • H. Wang, A. Kläser, C. Schmid, and C. Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
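
Of the dense-trajectory descriptors, MBH is the least self-explanatory: it histograms the spatial gradients of each optical-flow component separately, so locally constant motion (for example a camera pan) contributes nothing. A rough sketch follows, assuming NumPy and a precomputed flow field; `mbh_descriptor`, the bin count, and the normalization are hypothetical choices for illustration, not the parameters of the cited implementation.

```python
import numpy as np

def _orientation_hist(dx, dy, n_bins):
    """Magnitude-weighted orientation histogram of the vectors (dx, dy)."""
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)
    idx = np.floor(ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(idx.ravel(), weights=mag.ravel(), minlength=n_bins)

def mbh_descriptor(flow_u, flow_v, n_bins=8):
    """Concatenate MBHx and MBHy: orientation histograms of the spatial
    gradients of the horizontal (u) and vertical (v) flow components."""
    hists = []
    for comp in (flow_u, flow_v):
        d_dy, d_dx = np.gradient(np.asarray(comp, dtype=float))
        hists.append(_orientation_hist(d_dx, d_dy, n_bins))
    desc = np.concatenate(hists)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

# Uniform flow (pure translation) has no motion boundaries: all-zero descriptor.
uniform = mbh_descriptor(np.full((6, 6), 2.0), np.full((6, 6), -1.0))

# A vertical boundary in the u component yields a non-zero MBHx part.
u = np.zeros((6, 6))
u[:, 3:] = 1.0
boundary = mbh_descriptor(u, np.zeros((6, 6)))
```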

SLIDE 15

Video Representation

  • Fisher Vector
      • low-level features
      • GMM
      • gradient wrt. mean
      • gradient wrt. variance
  • F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher Kernel for Large-Scale Image Classification. ECCV, 2010.
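
The pipeline on this slide (low-level features, a GMM, then gradients with respect to the GMM means and variances) can be sketched as follows. This is a minimal illustration assuming NumPy and a diagonal-covariance GMM whose parameters are already fitted; the toy parameters are made up. The power and L2 normalization steps come from the cited Perronnin et al. paper, and whether the authors applied them in exactly this form is an assumption.

```python
import numpy as np

def fisher_vector(x, weights, means, sigmas):
    """x: (T, D) descriptors; weights: (K,); means, sigmas: (K, D).
    sigmas holds per-dimension standard deviations of a diagonal GMM."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (T, K, D)

    # log(w_k * N(x_t | mu_k, sigma_k^2)) up to a constant shared by all k
    log_p = (np.log(weights)[None, :]
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)         # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)          # posteriors, (T, K)

    # Gradients wrt. mean and wrt. variance, one D-vector per Gaussian
    g_mean = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_var = (np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1.0)
             / (T * np.sqrt(2 * weights)[:, None]))

    fv = np.concatenate([g_mean.ravel(), g_var.ravel()])   # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                    # L2 normalization

# Toy GMM (made-up parameters) and random descriptors
rng = np.random.default_rng(0)
K, D = 4, 6
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
fv = fisher_vector(rng.normal(size=(50, D)), weights, means, sigmas)
```

In practice the GMM would be fitted on descriptors sampled from the training videos rather than drawn at random as here.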

SLIDE 16

Video Representation

  • Fisher Vector
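      • concatenation of the gradients wrt. mean and wrt. variance
      • dimension of the Fisher Vector: listed per feature below (FV-Dim)
      • GMM-128 (128 Gaussian components)

Feature    STIP   MoSIFT   ActionHOG   DT-HOG   DT-HOF   DT-MBH   DT-Traj
Feat-Dim   162    256      216         96       108      192      30
FV-Dim     330K   520K     440K        200K     220K     400K     60K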
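
As a sanity check on the FV-Dim values: a Fisher Vector built from mean and variance gradients has 2·K·D entries for K Gaussians and D-dimensional descriptors. With K = 128 this gives 41,472 for STIP, about one eighth of the listed 330K. All listed values are consistent with an extra factor of 8, which would fit 8 spatial-pyramid cells. That factor is inferred from the numbers and is not stated on the slide, so the sketch below treats it as an assumption.

```python
K = 128      # GMM-128, from the slide
CELLS = 8    # ASSUMPTION: inferred from the listed numbers, not stated in the source

feat_dim = {"STIP": 162, "MoSIFT": 256, "ActionHOG": 216, "DT-HOG": 96,
            "DT-HOF": 108, "DT-MBH": 192, "DT-Traj": 30}
listed_fv_dim = {"STIP": 330_000, "MoSIFT": 520_000, "ActionHOG": 440_000,
                 "DT-HOG": 200_000, "DT-HOF": 220_000, "DT-MBH": 400_000,
                 "DT-Traj": 60_000}

def fv_dim(d, k=K, cells=CELLS):
    """2*K*D per cell (mean + variance gradients), times the number of cells."""
    return 2 * k * d * cells

computed = {name: fv_dim(d) for name, d in feat_dim.items()}
for name, value in computed.items():
    print(f"{name:10s} computed={value:7d}  listed~{listed_fv_dim[name]:7d}")
```

Under this assumption every computed value lands within a few percent of the rounded figure on the slide.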

SLIDE 17

Video Representation

  • Spatial Pyramids
  • S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR, 2006.
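
A spatial pyramid encodes local features separately for each cell of several grids and concatenates the results, which keeps coarse layout information. The sketch below assumes NumPy and feature positions normalized to [0, 1). The default grids (1x1, 2x2, and three horizontal strips) are a layout often used for video and are an assumption, since the slide does not state the layout used. Mean pooling stands in for the per-cell encoder.

```python
import numpy as np

def spatial_pyramid(positions, descriptors, grids=((1, 1), (2, 2), (3, 1)),
                    encode=lambda d: d.mean(axis=0)):
    """positions: (N, 2) array of (x, y) in [0, 1); descriptors: (N, D).
    grids: (rows, cols) layouts; the default is an assumed layout.
    encode: maps the descriptors of one cell to a fixed-length vector."""
    positions = np.asarray(positions, dtype=float)
    descriptors = np.asarray(descriptors, dtype=float)
    out_dim = np.atleast_1d(encode(descriptors)).shape[0]
    parts = []
    for rows, cols in grids:
        row = np.minimum((positions[:, 1] * rows).astype(int), rows - 1)
        col = np.minimum((positions[:, 0] * cols).astype(int), cols - 1)
        cell = row * cols + col
        for c in range(rows * cols):
            members = descriptors[cell == c]
            # Empty cells contribute zeros so the output length stays fixed.
            parts.append(encode(members) if len(members) else np.zeros(out_dim))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
pos = rng.random((100, 2))
desc = rng.normal(size=(100, 5))
pyramid = spatial_pyramid(pos, desc)    # 8 cells x 5 dims = 40 values
```

Passing a Fisher Vector encoder as `encode` would give one Fisher Vector per cell instead of a mean.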

SLIDE 20

CascadeSVMs

  • Imbalanced Data
SLIDE 21

CascadeSVMs

  • Imbalanced Data

[Figure: plot with a percentage (%) axis, ticks from 0.5 to 4.5]

SLIDE 22

CascadeSVMs

  • X. Yang, C. Yi, L. Cao, and Y. Tian. MediaCCNY at TRECVID 2012: Surveillance Event Detection. NIST TRECVID Workshop, 2012.

[Figure: cascade diagram; labels: Sample, Model-1, Model-2, Model-3, Model-C, positive prediction, negative prediction]
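
The diagram labels suggest a chain of models through which a sample passes. One common way to build such a cascade for heavily imbalanced data is sketched below. The stage-wise rules used here (train each stage on all positives plus an equally sized sample of the remaining negatives, keep only negatives still predicted positive, and accept a test sample only if every stage accepts it) are assumptions, not a description of the authors' exact procedure. A nearest-centroid classifier stands in for the SVMs so the sketch needs only NumPy.

```python
import numpy as np

class CentroidClassifier:
    """Stand-in for an SVM (assumption): predicts the nearer class centroid."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self

    def predict(self, X):
        d_pos = np.linalg.norm(X - self.pos, axis=1)
        d_neg = np.linalg.norm(X - self.neg, axis=1)
        return (d_pos < d_neg).astype(int)

def train_cascade(X_pos, X_neg, n_stages=3, seed=0, make_model=CentroidClassifier):
    """Each stage is trained on a balanced set; negatives that the stage
    still mistakes for positives are passed on to the next stage."""
    rng = np.random.default_rng(seed)
    stages, remaining = [], X_neg
    for _ in range(n_stages):
        if len(remaining) == 0:
            break
        n = min(len(remaining), len(X_pos))
        subset = remaining[rng.choice(len(remaining), size=n, replace=False)]
        X = np.vstack([X_pos, subset])
        y = np.concatenate([np.ones(len(X_pos), dtype=int), np.zeros(n, dtype=int)])
        model = make_model().fit(X, y)
        stages.append(model)
        remaining = remaining[model.predict(remaining) == 1]   # hard negatives
    return stages

def cascade_predict(stages, X):
    """A sample is positive only if every stage predicts positive."""
    keep = np.ones(len(X), dtype=bool)
    for model in stages:
        keep[keep] = model.predict(X[keep]) == 1
    return keep.astype(int)

# Toy imbalanced data: 20 positives vs. 500 negatives
rng = np.random.default_rng(1)
X_pos = rng.normal(loc=2.0, size=(20, 2))
X_neg = rng.normal(loc=-2.0, size=(500, 2))
stages = train_cascade(X_pos, X_neg)
pred = cascade_predict(stages, np.array([[3.0, 3.0], [-3.0, -3.0]]))
```

Training every stage on a balanced subset keeps the rare positive class from being swamped, while later stages concentrate on the negatives that earlier stages got wrong.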

SLIDE 23

CascadeSVMs

  • Feature Fusion
SLIDE 26

Human Interactions

  • High Throughput UI
SLIDE 27

Human Interactions

  • Triage UI
SLIDE 29

Performance Evaluation

  • Experimental Setup
      • PersonRuns
      • Fisher Vector
      • CascadeSVMs
      • 40 hours of video for training
      • 10 hours of video for testing
SLIDE 30

Performance Evaluation

  • Number of Gaussian Components
      • STIP
SLIDE 31

Performance Evaluation

  • Comparisons of Low-Level Features
      • STIP
      • MoSIFT
      • ActionHOG
      • DT-Trajectory
      • DT-HOG
      • DT-HOF
      • DT-MBH
SLIDE 33

Performance Evaluation

  • How A Larger Training Set Helps
      • 40 vs. 90 hours of training video
SLIDE 34

Performance Evaluation

  • Feature Fusion
      • 90 hours of training video
      • STIP, DT-Trajectory, DT-MBH
      • Early Fusion
      • Late Fusion
      • Early + Late Fusion
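
The two fusion styles named above differ in where the features are combined: early fusion concatenates the per-feature representations before a single classifier is trained, while late fusion combines the scores of classifiers trained per feature. A minimal sketch follows, assuming NumPy. The score-averaging rule and the toy scores are illustrative assumptions, since the slide does not give the exact combination rule.

```python
import numpy as np

def early_fusion(feature_blocks):
    """Concatenate per-feature representations (e.g. STIP, DT-Trajectory,
    DT-MBH vectors) into one vector per sample before training."""
    return np.hstack(feature_blocks)

def late_fusion(score_lists, weights=None):
    """Weighted average of per-classifier scores; equal weights by default."""
    scores = np.vstack(score_lists)                  # (n_classifiers, n_samples)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return np.asarray(weights) @ scores

# Toy data: 4 samples, three feature types with different dimensions
rng = np.random.default_rng(0)
stip = rng.normal(size=(4, 3))
traj = rng.normal(size=(4, 2))
mbh = rng.normal(size=(4, 5))
fused_features = early_fusion([stip, traj, mbh])     # shape (4, 10)

# Hypothetical per-classifier scores for the same 4 samples
s_stip = np.array([0.9, 0.2, 0.4, 0.1])
s_traj = np.array([0.7, 0.4, 0.6, 0.3])
s_mbh = np.array([0.8, 0.0, 0.5, 0.2])
fused_scores = late_fusion([s_stip, s_traj, s_mbh])
```

One plausible reading of "Early + Late Fusion" is to add the score of a classifier trained on `fused_features` as one more entry in the list passed to `late_fusion`; that reading is an assumption.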
SLIDE 36

Performance Evaluation

  • Formal Evaluation
      • Comparative Results
SLIDE 38

Conclusion

  • Best ADCR
SLIDE 39

Conclusion

  • Best ADCR

Event categories (labels in slide order): Single Person, Multiple People, Multiple People, Person Object, Single Person, Person Object, Multiple People
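
The slides do not define ADCR. In the TRECVID SED evaluation it is the detection cost rate (DCR) measured at the system's actual decision threshold, and lower is better. DCR combines the miss probability with the false-alarm rate per hour. The sketch below uses the cost constants commonly quoted for SED (miss cost 10, false-alarm cost 1, target rate 20 events per hour); treat them as assumptions and check them against the official evaluation plan. The counts in the example are made up.

```python
def detection_cost_rate(n_miss, n_target, n_false_alarm, hours,
                        cost_miss=10.0, cost_fa=1.0, r_target=20.0):
    """DCR = P_miss + beta * R_fa, with beta = cost_fa / (cost_miss * r_target).

    The default constants are assumed values, not taken from the slides.
    """
    p_miss = n_miss / n_target             # fraction of true events missed
    r_fa = n_false_alarm / hours           # false alarms per hour of video
    beta = cost_fa / (cost_miss * r_target)
    return p_miss + beta * r_fa

# Made-up counts: 80 of 100 events missed, 50 false alarms in 10 hours
adcr = detection_cost_rate(n_miss=80, n_target=100, n_false_alarm=50, hours=10)
```

With these defaults a system that outputs nothing scores exactly 1.0, so values below 1.0 indicate that the detections are worth more than their false alarms cost.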

SLIDE 40

Conclusion

  • Multiple Features
      • fusion scheme
      • ranking and selection
      • event-specific investigation
  • Fisher Vector
      • accuracy and computation
  • Human Interaction
      • collaborative mode
      • cross-event mode
      • static gesture detection