Action Recognition with Improved Trajectories

  1. Action Recognition with Improved Trajectories. Heng Wang and Cordelia Schmid, LEAR, INRIA, France. IEEE ICCV 2013. Presentation by Santiago Gonzalez.

  2. The Problem • How can we recognize actions in video? • Applications include gesture recognition, threat detection, media indexing and querying, etc.

  3. Past Approaches • Image segmentation to separate background and estimate camera motion • Stabilization using coarse optical flow • Saliency mapping • Dense trajectory clustering

  4. Agenda • The Problem and Past Approaches • Improved Trajectories • Experimental Setup • Results • Concluding Remarks and Discussion

  5. Action Recognition with Improved Trajectories • Explicit camera motion estimation • Corrects optical flow, prunes background • Leads to better motion descriptor performance

  6. Improved Trajectories

  7. Pipeline Overview • For consecutive frames: • Extract SURF descriptors and match them with nearest-neighbor matching • Estimate optical flow; sample points by thresholding the smallest eigenvalue (λ) of the autocorrelation matrix (optimal sampling for tracking) [35] • Estimate homography using RANSAC • Remove camera-induced displacement via thresholding
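
The last pipeline step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a homography `H` has already been estimated, subtracts the displacement that `H` predicts from each flow vector, and drops points whose residual motion is small. The function names and the `min_residual` threshold are hypothetical.

```python
import math

def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography H (nested lists, row-major)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v

def remove_camera_motion(points, flows, H, min_residual=1.0):
    """Keep only points whose motion is not explained by the camera.

    points: list of (x, y) positions in frame t
    flows:  list of (dx, dy) optical-flow vectors at those positions
    H:      homography mapping frame t to frame t+1 (assumed given)
    min_residual: hypothetical threshold, in pixels
    """
    kept = []
    for (x, y), (dx, dy) in zip(points, flows):
        cx, cy = apply_homography(H, x, y)
        # Residual flow = observed flow minus camera-induced displacement.
        rx, ry = dx - (cx - x), dy - (cy - y)
        if math.hypot(rx, ry) >= min_residual:
            kept.append(((x, y), (rx, ry)))
    return kept
```

With a pure-translation homography, a point that moves exactly with the camera is pruned as background, while a point with extra motion is kept with its corrected flow.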

  8. Features • SURF works great for detecting blob-like structures • (Speeded [sic] Up Robust Features) • Much faster than SIFT • Patented • Optical flow w/ good-features-to-track [35] great for detecting large gradients (i.e., corners and edges)
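
The good-features-to-track criterion can be illustrated with a short sketch. It assumes image gradients `(Ix, Iy)` have already been computed for every pixel in a patch; the function names and the threshold are hypothetical. A patch is considered trackable when the smaller eigenvalue of its 2x2 autocorrelation matrix is large, which happens only when gradients vary in two directions.

```python
import math

def min_eigenvalue(gradients):
    """Smallest eigenvalue of the 2x2 autocorrelation matrix of a patch.

    gradients: list of (Ix, Iy) image gradients, one per pixel in the patch.
    """
    a = sum(ix * ix for ix, _ in gradients)
    b = sum(ix * iy for ix, iy in gradients)
    c = sum(iy * iy for _, iy in gradients)
    # Eigenvalues of [[a, b], [b, c]] are (a+c)/2 +/- sqrt(((a-c)/2)^2 + b^2).
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)

def is_trackable(gradients, threshold):
    """True if the patch passes the min-eigenvalue test (threshold is hypothetical)."""
    return min_eigenvalue(gradients) > threshold
```

A patch with gradients in only one direction yields a smallest eigenvalue of zero and is rejected, whereas a patch with gradients in both directions passes.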

  9. Polynomial Expansion Optical Flow Estimation [8] • Gunnar Farnebäck, 2003 • Estimate displacement d by modeling pixel neighborhood as a quadratic polynomial • Assume slowly varying displacement field
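
The core idea can be written out for the idealized case of a single global displacement. The symbols $A_i$, $b_i$, $c_i$ are the quadratic, linear, and constant coefficients of the local polynomial in frame $i$; they are introduced here for illustration and do not appear on the slide.

```latex
\begin{align}
f_1(x) &= x^\top A_1 x + b_1^\top x + c_1 \\
f_2(x) &= f_1(x - d)
        = x^\top A_1 x + (b_1 - 2 A_1 d)^\top x + d^\top A_1 d - b_1^\top d + c_1 \\
\intertext{Matching coefficients with $f_2(x) = x^\top A_2 x + b_2^\top x + c_2$ gives}
A_2 &= A_1, \qquad b_2 = b_1 - 2 A_1 d \\
d &= -\tfrac{1}{2} A_1^{-1} (b_2 - b_1)
\end{align}
```

In practice the coefficients are estimated per pixel and are noisy, which is where the slowly-varying-displacement assumption comes in: the constraint is aggregated over a neighborhood instead of being solved at a single point.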

  10. Human Detection • We know humans aren’t background a priori • Part-based human detection with tracking, works with occlusion • Mask away matches from humans when estimating homography • (Figure panels: SURF; Flow; SURF + detection; Flow + detection)
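
A minimal sketch of the masking step, assuming the human detector returns axis-aligned bounding boxes as `(x0, y0, x1, y1)` and that each match is a pair of points across consecutive frames. Both data layouts and the function names are hypothetical.

```python
def inside(box, x, y):
    """True if (x, y) lies within the box (x0, y0, x1, y1), borders included."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def mask_human_matches(matches, human_boxes):
    """Drop matches whose frame-t point falls inside any detected human box.

    matches: list of ((x_t, y_t), (x_t1, y_t1)) point correspondences
    human_boxes: list of (x0, y0, x1, y1) detections in frame t
    """
    return [
        m for m in matches
        if not any(inside(box, m[0][0], m[0][1]) for box in human_boxes)
    ]
```

Only the surviving matches would then be passed to the homography estimation, so that motion of the person does not bias the camera-motion estimate.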

  11. Experimental Setup

  12. Dense Trajectory Features* • Points are densely sampled at different spatial scales • Points are tracked using optical flow in heterogeneous areas (tracked for 15 frames to avoid drift) • HOG, HOF, MBH, and trajectory (i.e., concatenation of displacement vectors) descriptors are calculated • Descriptors are calculated in a space-time volume aligned with the trajectory • * Nothing new, mostly replicating the setup in [40]
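
The trajectory descriptor can be sketched as below. The concatenation of displacement vectors follows the slide; the normalization by the sum of displacement magnitudes is an assumption added here to make the descriptor scale-independent, and the function name is hypothetical.

```python
import math

def trajectory_descriptor(points):
    """Concatenate frame-to-frame displacements of a tracked point.

    points: list of (x, y) positions, one per frame (L + 1 points give
    L displacement vectors, so the result has 2 * L values).
    Normalizing by the total displacement magnitude is an assumption.
    """
    disps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in disps)
    if total == 0:
        # Static trajectory: return an all-zero descriptor.
        return [0.0] * (2 * len(disps))
    return [component / total for d in disps for component in d]
```

For a point tracked over 15 frames, as on the slide, this yields a 30-dimensional vector.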

  13. Feature Encoding • Bag of features and Fisher vector (includes 2nd-order statistics) • 4,000-element codebook built using k-means from 100,000 random features • Classification: • RBF-kernel SVM for bag of features • Linear SVM for Fisher vector
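
The bag-of-features encoding can be illustrated with a small sketch. It assumes the codebook has already been learned (for example with k-means, as on the slide) and uses hard assignment to the nearest codeword with a normalized histogram; the function names and the normalization choice are assumptions.

```python
def nearest_codeword(feature, codebook):
    """Index of the codeword closest to the feature in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )

def bag_of_features(features, codebook):
    """Histogram of codeword assignments for one video, normalized to sum to 1."""
    hist = [0.0] * len(codebook)
    for feature in features:
        hist[nearest_codeword(feature, codebook)] += 1.0
    n = len(features)
    return [h / n for h in hist] if n else hist
```

Each video is reduced to one fixed-length histogram regardless of how many local descriptors it contains, which is what allows a standard classifier such as an SVM to be trained on top.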

  14. Datasets • Hollywood2: 69 movies, 12 actions • HMDB51: >6k videos, 51 actions • Olympic Sports: 783 sequences, 16 actions • UCF50: >6k YouTube videos, 50 actions • Each dataset has hundreds to thousands of video sequences.

  15. Results

  16. Video Demo: https://lear.inrialpes.fr/people/wang/improved_trajectories

  17. Recognition Accuracy (conditions compared: use all features; warping with homography; background pruning; warping with homography and background pruning)

  18. Recognition Accuracy

  19. Combined Descriptor Recognition Accuracy (compared: Dense Trajectory Features vs. Improved Trajectory Features)

  20. Human Detection: Effect on Accuracy * with Fisher Vector encoding

  21. State of the Art Results (per dataset: improvement over state of the art; state-of-the-art accuracy) • Hollywood2: 2%; 62.5% • HMDB51: 5%; 52.1% • Olympic Sports: 8%; 83.2% • UCF50: 8%; 83.3%

  22. Technique Deficiencies • Failure cases: • Homography is fit to foreground if it dominates the frame • Strong motion blur (issue in real-world datasets)

  23. Technique Deficiencies • Failure cases: • Complex mapping from estimated homography to background

  24. Discussion + Q&A

  25. Discussion Points • How can some of this technique’s deficiencies be overcome? • What other types of a priori knowledge can be incorporated? • The four datasets are all human-centric; how well would this pipeline work for nonhuman agents (e.g., cars)? • Bag of features and Fisher vectors seem somewhat naïve; would a different encoding work better?
