Action recognition in videos II
Cordelia Schmid, INRIA Grenoble
Action recognition - goal
- Short actions, e.g. answer phone, shake hands
[Example frames: hand shake, answer phone]
Action recognition - goal
- Activities/events, e.g. making a sandwich, doing homework
[Example frames: making sandwich, doing homework - TrecVid multi-media event detection dataset]
Action recognition - goal
- Activities/events, e.g. birthday party, parade
[Example frames: birthday party, parade - TrecVid multi-media event detection dataset]
Action recognition - tasks
- Action classification: assigning an action label to a video clip
[Example: making sandwich - present; feeding animal - not present; …]
- Action localization: search locations of an action in a video
Outline
- Improved video description
– Dense trajectories and motion-boundary descriptors
- Adding temporal information to the bag of features
– Actom sequence model for efficient action detection
- Modeling human-object interaction
Dense trajectories - motivation
- Dense sampling improves results over sparse interest points for image classification [Fei-Fei'05, Nowak'06]
- Recent progress by using feature trajectories for action recognition [Messing'09, Sun'09]
- The 2D space domain and 1D time domain in videos have very different characteristics
- Dense trajectories: a combination of dense sampling with feature trajectories [Wang, Klaeser, Schmid & Liu, CVPR'11]
Approach
- Dense multi-scale sampling
- Feature tracking over L frames with optical flow
- Trajectory-aligned descriptors with a spatio-temporal grid
Approach
- Dense sampling
– remove untrackable points, based on the eigenvalues of the auto-correlation matrix
- Feature tracking
– by median filtering in a dense optical flow field
– length is limited to avoid drifting
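The tracking step is compact enough to sketch. Below is a minimal Python/OpenCV version: each point is displaced by the median-filtered dense flow at its position. Farneback flow is used here as an assumption; the published implementation ships its own flow computation.

```python
import cv2
import numpy as np

def track_one_step(prev_gray, next_gray, points):
    """Advance trajectory points by one frame through a median-filtered
    dense optical flow field (a sketch of the dense-trajectory tracker,
    not the reference implementation)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, pyr_scale=0.5, levels=3,
        winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Median filtering of each flow channel suppresses tracking noise.
    fx = cv2.medianBlur(np.ascontiguousarray(flow[..., 0]), 5)
    fy = cv2.medianBlur(np.ascontiguousarray(flow[..., 1]), 5)
    h, w = prev_gray.shape
    tracked = []
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:  # drop points leaving the frame
            tracked.append((x + fx[yi, xi], y + fy[yi, xi]))
    return tracked
```

Trajectories are cut after L frames (L = 15 in the paper) precisely to limit the drift mentioned above.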
Feature tracking
[Comparison figure: KLT tracks, SIFT tracks, dense tracks]
Trajectory descriptors
- Motion boundary descriptor (MBH)
– spatial derivatives are calculated separately for the optical flow in x and y, and quantized into a histogram
– captures the relative dynamics of different regions
– suppresses constant motion, as appears for example due to background camera motion
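A minimal sketch of the MBH idea follows: a gradient-orientation histogram computed on each flow channel, so that locally constant (camera) motion has zero derivative and drops out. The bin count and normalization are illustrative choices, not the paper's exact settings.

```python
import cv2
import numpy as np

def mbh_histogram(flow, n_bins=8):
    """Motion boundary histogram for a dense flow field of shape
    (H, W, 2): HOG-style statistics on the spatial derivatives of
    each flow channel (MBHx, then MBHy)."""
    hists = []
    for c in range(2):  # flow in x, then flow in y
        comp = np.ascontiguousarray(flow[..., c])
        gx = cv2.Sobel(comp, cv2.CV_32F, 1, 0)  # spatial derivatives
        gy = cv2.Sobel(comp, cv2.CV_32F, 0, 1)  # of the flow channel
        mag = np.sqrt(gx ** 2 + gy ** 2)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
        # Magnitude-weighted orientation histogram.
        h = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
        hists.append(h / (h.sum() + 1e-8))
    return np.concatenate(hists)
```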
Trajectory descriptors
- Trajectory shape described by normalized relative point coordinates (sketched below)
- HOG, HOF and MBH are encoded along each trajectory
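The shape descriptor is a one-liner; this sketch follows the normalization stated in the CVPR'11 paper: concatenated displacement vectors divided by the sum of their magnitudes.

```python
import numpy as np

def trajectory_shape(points):
    """Normalized relative point coordinates of one trajectory:
    displacements dP_t = P_{t+1} - P_t, divided by the sum of
    their magnitudes."""
    pts = np.asarray(points, dtype=np.float32)  # shape (L + 1, 2)
    disp = np.diff(pts, axis=0)
    norm = np.linalg.norm(disp, axis=1).sum() + 1e-8
    return (disp / norm).ravel()                # 2L-dimensional descriptor
```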
Experimental setup
- Bag-of-features with 4000 clusters obtained by k-means; classification by a non-linear SVM with an RBF chi-square kernel
– confirmed by recent results with Fisher vector + linear SVM
- Descriptors are combined by addition of distances
- Evaluation on two datasets: UCF Sports (classification accuracy) and Hollywood2 (mean average precision)
- Two baseline trajectories: KLT and SIFT
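A minimal scikit-learn sketch of this pipeline; the variable names and the gamma value are illustrative, and chi2_kernel computes the exponential chi-square kernel used with bag-of-features histograms.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def bof_histograms(per_video_descs, vocab):
    """L1-normalized bag-of-features histogram per video, given a
    k-means vocabulary fitted on sampled training descriptors."""
    k = vocab.n_clusters
    hists = [np.bincount(vocab.predict(d), minlength=k).astype(float)
             for d in per_video_descs]
    return np.array([h / (h.sum() + 1e-8) for h in hists])

# Usage sketch (train_descs / test_descs are lists of per-video
# descriptor arrays, e.g. the MBH descriptors from above):
# vocab = KMeans(n_clusters=4000, n_init=1).fit(np.vstack(train_descs))
# X_tr = bof_histograms(train_descs, vocab)
# K_tr = chi2_kernel(X_tr, gamma=0.5)            # exponential chi2 kernel
# clf = SVC(kernel="precomputed").fit(K_tr, train_labels)
# X_te = bof_histograms(test_descs, vocab)
# preds = clf.predict(chi2_kernel(X_te, X_tr, gamma=0.5))
```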
Comparison of descriptors

            Hollywood2   UCF Sports
Trajectory  47.8%        75.4%
HOG         41.2%        84.3%
HOF         50.3%        76.8%
MBH         55.1%        84.2%
Combined    58.2%        88.0%
- Trajectory descriptor performs well
- HOF >> HOG for Hollywood2, dynamic information is relevant
- HOG >> HOF for the sports dataset, spatial context is relevant
- MBH consistently outperforms HOF, robust to camera motion
Comparison of trajectories

                        Hollywood2   UCF Sports
Dense trajectory + MBH  55.1%        84.2%
KLT trajectory + MBH    48.6%        78.4%
SIFT trajectory + MBH   40.6%        72.1%
- Dense >> KLT >> SIFT trajectories
Improved trajectories (Wang & Schmid, ICCV'13)
- Dense trajectories impacted by camera motion
- Stabilize camera motion before computing optical flow
– extract feature matches (SURF and dense optical flow)
– compute a robust homography
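A sketch of the stabilization step with OpenCV. SURF requires the opencv-contrib package (ORB would be a patent-free drop-in); RANSAC rejects matches that sit on moving foreground, so the homography captures the camera motion.

```python
import cv2
import numpy as np

def camera_homography(prev_gray, next_gray):
    """Estimate a robust frame-to-frame homography from SURF matches
    (a sketch of the camera-motion estimation step)."""
    surf = cv2.xfeatures2d.SURF_create(400)
    kp1, des1 = surf.detectAndCompute(prev_gray, None)
    kp2, des2 = surf.detectAndCompute(next_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects matches on moving foreground objects.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H

# Warping the next frame by the inverse homography cancels the camera
# motion before the optical flow for the trajectories is computed:
# stabilized = cv2.warpPerspective(next_gray, np.linalg.inv(H),
#                                  next_gray.shape[::-1])
```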
Improved trajectories
Experimental setting
Results
Excellent results in TrecVid MED'13
- Combination of MBH, SIFT, audio, text & speech recognition
- First in the known-event challenge, first in the ad-hoc event challenge
Making sandwich – results
[Example frames: rank 1 (pos), rank 20 (pos), rank 21 (neg)]
FlashMob gathering – results
[Example frames: rank 1 (pos), rank 18 (pos), rank 19 (neg)]
Impact of different channels
Conclusion
- Dense trajectory representation for action recognition
– outperforms existing approaches
- Motion stabilization improves performance of the motion-based descriptors MBH and HOF
- Efficient algorithm, on-line available at https://lear.inrialpes.fr/software
- Recent excellent results in the TrecVid MED 2013 challenge
Outline
- Improved video description
– Dense trajectories and motion-boundary descriptors
- Adding temporal information to the bag of features
– Actom sequence model for efficient action detection
- Modeling human-object interaction
Approach for action modeling
- Model of the temporal structure of an action with a sequence of "action atoms" (actoms)
- Action atoms are action-specific short key events, whose sequence is characteristic of the action
Related work
- Temporal structuring of video data
– Bag-of-features with spatio-temporal pyramids [Laptev'08]
– Loose hierarchical structure of latent motion parts [Niebles'10]
– Facial action recognition with action unit detection and structured learning of temporal segments [Simon'10]
Approach for action modeling
- Actom Sequence Model (ASM): histogram of time-anchored visual features
Actom annotation
- Actoms for training actions are obtained manually (3 actoms per action here)
- Alternative supervision to clip annotation (beginning and end frames), with similar cost and smaller annotation variability
- Automatic detection of actoms at test time
Actom descriptor
- An actom is parameterized by:
– central frame location
– time-span
– temporally weighted feature assignment mechanism
- Actom descriptor:
– histogram of quantized visual words in the actom's range
– contribution depends on the temporal distance to the actom center (using temporal Gaussian weighting)
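A minimal sketch of one actom's soft-voted histogram (the parameter names are illustrative); the ASM of the next slide is simply the concatenation of these per-actom histograms.

```python
import numpy as np

def actom_histogram(word_ids, frame_ids, center, radius, sigma, k=4000):
    """Soft-voted visual-word histogram for one actom: features inside
    the actom's temporal range vote with a Gaussian weight that decays
    with distance to the actom's central frame."""
    word_ids = np.asarray(word_ids)
    frame_ids = np.asarray(frame_ids, dtype=float)
    in_range = np.abs(frame_ids - center) <= radius
    w = np.exp(-0.5 * ((frame_ids[in_range] - center) / sigma) ** 2)
    h = np.bincount(word_ids[in_range], weights=w, minlength=k)
    return h / (h.sum() + 1e-8)
```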
Actom sequence model (ASM)
- ASM: concatenation of actom histograms
- Temporally structured extension of BOF
- Action represented by a single sparse sequential model
Actom Sequence Model (ASM) - parameters
- The ASM model has two parameters: the overlap between actoms (controls the radius) and the soft-voting "peakiness" (controls the profile)
[Illustration: from keyframe-like to BOF-like extremes]
Automatic temporal detection - training
- ASM classifier:
– non-linear SVM on ASM representations with intersection kernel, random training negatives, probability outputs
– estimates the posterior probability of an action knowing the temporal location of its actoms
- Actoms unknown at test time:
– use training examples to learn a prior on the temporal structure of actom candidates
Training - action classifier
- ASM classifier: non-linear SVM on ASM representations
– intersection kernel
– random training negatives
– class-balancing
– probability outputs
– estimates the posterior probability of an action knowing the temporal location of its actoms
- Actoms unknown at test time: use training examples to learn actom candidates (see the sketch below)
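A scikit-learn sketch of such an intersection-kernel SVM; class_weight="balanced" stands in for the slide's class-balancing and probability=True yields Platt-scaled probability outputs (illustrative choices, not the paper's exact setup).

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    """Histogram intersection kernel K(x, y) = sum_i min(x_i, y_i),
    computed between all rows of A and all rows of B."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Sketch of the ASM classifier (X_train holds ASM vectors):
# clf = SVC(kernel="precomputed", class_weight="balanced",
#           probability=True)                     # Platt-scaled outputs
# clf.fit(intersection_kernel(X_train, X_train), y_train)
# p = clf.predict_proba(intersection_kernel(X_test, X_train))[:, 1]
```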
Prior on temporal structure
- Temporal structure: inter-actom spacings
- Non-parametric model of the temporal structure
– kernel density estimation over inter-actom spacings from training action examples
– discretize it (small support in practice: K ≈ 10)
– use as prior on temporal structure during detection
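A SciPy sketch of this prior (the grid granularity and the top-K selection are illustrative assumptions): fit a KDE on the spacings observed in training and keep the K most probable discretized spacing sequences as candidates.

```python
import numpy as np
from scipy.stats import gaussian_kde

def actom_spacing_prior(train_spacings, k=10):
    """Learn a discretized non-parametric prior on inter-actom spacings.

    train_spacings: (n_examples, n_actoms - 1) array of spacings."""
    d = np.asarray(train_spacings, dtype=float)
    kde = gaussian_kde(d.T)                       # joint density of spacings
    # Evaluate the density on a grid of integer spacings.
    grids = [np.arange(lo, hi + 1) for lo, hi in zip(d.min(0), d.max(0))]
    mesh = np.stack([g.ravel() for g in np.meshgrid(*grids)], axis=0)
    p = kde(mesh)
    top = np.argsort(p)[::-1][:k]                 # K candidate sequences
    return mesh[:, top].T, p[top] / p[top].sum()  # candidates and their prior
```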
Training - example of learned candidates
- Actom models corresponding to the candidates learned for "smoking" (with the ASM parameters used in our experiments)
Automatic Temporal Detection
- Probability of an action at frame t_m obtained by marginalizing over all learned candidate actom sequences
- Sliding central frame: detection in a long video stream by evaluating the probability every N frames (N = 5)
- Non‐maxima suppression post‐processing step
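Putting the pieces together, a sketch of the detection loop: it reuses actom_histogram from the earlier sketch, and place_actoms encodes one simple anchoring convention (an assumption; the paper's exact anchoring may differ).

```python
import numpy as np

def place_actoms(t_m, spacings):
    """Center a candidate actom sequence on frame t_m (one simple
    anchoring convention, not necessarily the paper's)."""
    offsets = np.concatenate(([0.0], np.cumsum(spacings)))
    return t_m + offsets - offsets.mean()

def detection_scores(word_ids, frame_ids, candidates, prior, clf,
                     n_frames, radius, sigma, step=5):
    """Sliding-central-frame detection: every `step` frames, marginalize
    the classifier's posterior over the K candidate actom sequences.
    `clf` is any classifier exposing predict_proba over ASM vectors."""
    scores = []
    for t_m in range(0, n_frames, step):
        s = 0.0
        for spacings, p in zip(candidates, prior):
            x = np.concatenate([
                actom_histogram(word_ids, frame_ids, c, radius, sigma)
                for c in place_actoms(t_m, spacings)])
            s += p * clf.predict_proba([x])[0, 1]
        scores.append((t_m, s))
    return scores  # non-maxima suppression is applied on these scores
```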
Experiments - datasets
- Coffee & Cigarettes: localize drinking, smoking in 36k frames [Laptev'07]
- DLSBP: localize opening a door, sitting down in 443k frames [Duchenne'09]
- Evaluation: average precision (AP) computed wrt 20% overlap with ground-truth test actions
Quantitative Results
[Result plots: Coffee & Cigarettes, DLSBP]
- ASM method outperforms BOF
- ASM improves over a rigid temporal structure, BOF T3 (BOF T3: concatenation of 3 BOFs over the beginning, middle and end of the action)
- More accurate detections with ASM compared to the state of the art
Qualitative Results
Central frames
- Frames of the top 5 actions detected with ASM for drinking and opening a door (only #2 of opening a door is a false positive)
Qualitative Results
Actoms
Frames of automatically detected actom sequences for 4 actions
[Actions shown: open door, drinking, smoking, sitting down]
Localization results for action drinking
Localization results for action smoking
Conclusion
- ASM: efficient model of actions with a flexible sequence of key semantic sub-actions (actoms)
- Principled multi-scale action detection using a learned prior on temporal structure
- ASM outperforms bag-of-features, rigid temporal structures and the state of the art
Outline
- Improved video description
– Dense trajectories and motion-boundary descriptors
- Adding temporal information to the bag of features
– Actom sequence model for efficient action detection
- Modeling human-object interaction
Action recognition
- Description of the human pose
– Silhouette description [Sullivan & Carlsson, 2002]
– Histogram of gradients (HOG) [Dalal & Triggs, 2005]
– Human body part estimation [Felzenszwalb & Huttenlocher, 2005]
Importance of action objects
- Human pose often not sufficient by itself
- Objects define the actions
Action recognition from still images
- Supervised modeling of the interaction between human & object [Gupta et al. 2009, Yao & Fei-Fei 2009]
- Weakly-supervised learning of objects [Prest, Schmid & Ferrari 2011]
[Results on the PASCAL VOC 2010 human action classification dataset]
Weakly-supervised learning of objects - Overview
Key idea: automatically localize action objects in training images
[Prest et al., PAMI’12]
Automatic localization of action objects
- Find the object recurring over images at similar positions wrt the human
- Human‐centric: human detection serves as a reference frame
Input: images with automatically detected humans (extension of Felzenszwalb et al., PAMI 2009)
Output: localized action object
The Human-Object Model
- Objectness candidate windows
- Find one window per image minimizing an energy
- Approximate inference with TRW-S (Kolmogorov, PAMI 2006)
- Energy terms: human-object spatial relation similarity, object appearance similarity (bag-of-words), unary terms (objectness + human overlap)
- Human overlap term: penalize overlap with the human, as it is the most frequent pattern
Object appearance similarity
For a pair of candidate windows (j, m) in training images (i, l) measure:
- between color histograms
- between bag-of-words with a 3-level spatial pyramid of SURF [Lazebnik et al., CVPR 2006; Bay et al., CVIU 2008]
Human‐object spatial relation similarity
- Similarity between the spatial relations of two candidate windows wrt the human in their respective images
- Cues: relative scale, relative distance, relative overlap, relative location (sketched below)
- Dissimilar pairs get high energy; similar pairs get low energy
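These four cues are simple box geometry. A sketch follows; the (x, y, w, h) box convention and the normalizations are illustrative assumptions.

```python
import numpy as np

def spatial_relation(human, obj):
    """Human-object spatial relation features for one image:
    relative scale, distance, overlap and location of the object
    window wrt the human window. Boxes are (x, y, w, h)."""
    hx, hy, hw, hh = human
    ox, oy, ow, oh = obj
    scale = (ow * oh) / (hw * hh)                        # relative scale
    dist = np.hypot((ox + ow / 2) - (hx + hw / 2),
                    (oy + oh / 2) - (hy + hh / 2)) / hw  # relative distance
    ix = max(0.0, min(hx + hw, ox + ow) - max(hx, ox))
    iy = max(0.0, min(hy + hh, oy + oh) - max(hy, oy))
    overlap = (ix * iy) / (ow * oh)                      # relative overlap
    loc = ((ox - hx) / hw, (oy - hy) / hh)               # relative location
    return np.array([scale, dist, overlap, *loc])
```

The pairwise energy can then score how close these vectors are across two images, so windows that recur in the same configuration wrt the human receive low energy.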
Sports dataset
- 6 action classes
- 30 training images per class
- 20 test images per class
Annotations
- human silhouettes; used for training by [1] (limb annotations by [2])
- object bounding-boxes; used by [1,2] to train object detectors
- Our model is trained only from image labels
[1] Gupta, Kembhavi, Davis, PAMI 2009
[2] Yao, Fei-Fei, CVPR 2010
Example localized action objects
[Minimizing the energy localizes the objects: examples on the Sports dataset (Gupta, PAMI'09) and the TBH dataset (Prest, PAMI'12)]
Learning the human‐object interaction model
- From the localized objects, learn the human-object spatial distribution
[Figure: example image, ground truth, our result; relative x,y position and relative scale]
- Results qualitatively close to the ground truth
Overall action classifier
- Human-object relative position: as in previous slides
- Whole-scene: GIST (Oliva and Torralba, IJCV 2001)
- Object appearance: bag-of-words of SURF
- Pose-from-gradients: GIST around the human
Action classification results: Sports dataset
                      Whole-scene   + Pose-from-gradients   Full model
                      classifier    + object appearance     (+ hum-obj relations)
Our model             67            76                      81
Gupta et al. [1]      66            -                       79
Yao and Fei-Fei [2]   -             -                       83
Average classification rate on the test set

+ performs similar to [1,2] while using substantially less supervision
+ human-object spatial relations contribute visibly to performance
[1] Gupta, Kembhavi, Davis, PAMI 2009
[2] Yao, Fei-Fei, CVPR 2010
Action classification results: PASCAL Action 2010
                       Whole-scene   + Pose-from-gradients   Full model   Koniusz et al. [3]
                                     + object appearance
all classes            28            59                      62           62
human-object classes   28            59                      63           58
Average classification accuracy on the test set

- [3] is the highest-mAP entry in the 2010 challenge; all methods input the human location
+ on all classes: performs on par with [3] with only weak supervision
+ on human-object classes: outperforms [3]
[3] Everingham et al., The PASCAL VOC 2010 results
Importance of temporal information
- Video/temporal information necessary to disambiguate actions
- Temporal context describes the action/activity
- Key frames provide significantly less information
Our approach
- Modeling temporal human-object interactions
- Describing human and object tracks and their relative motion
Tracking humans and objects
- Fully automatic human tracks: state-of-the-art detector + Brox tracks
- Object tracks: detector learnt from annotated training examples + Brox tracks
- Extraction of a large number of human-object track pairs
Action descriptors
- Interaction descriptor: relative location, area and motion between human and object tracks (sketched below)
- Human track descriptor: 3DHOG-track [Klaeser et al.'10]
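A sketch of the per-frame quantities behind the interaction descriptor; the track format and the human-width normalization are assumptions, and the actual descriptor aggregates such values over the track pair.

```python
import numpy as np

def interaction_features(human_track, obj_track):
    """Relative location, area and motion between a human track and
    an object track; tracks are (T, 4) arrays of (x, y, w, h) boxes
    on the same T frames."""
    h = np.asarray(human_track, dtype=float)
    o = np.asarray(obj_track, dtype=float)
    hc = h[:, :2] + h[:, 2:4] / 2                  # human box centers
    oc = o[:, :2] + o[:, 2:4] / 2                  # object box centers
    rel_loc = (oc - hc) / h[:, 2:3]                # location in human widths
    rel_area = (o[:, 2] * o[:, 3]) / (h[:, 2] * h[:, 3])
    rel_motion = np.diff(oc - hc, axis=0)          # change of relative position
    return rel_loc, rel_area, rel_motion
```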
Experimental results on C&C - drinking [result slide]
Experimental results on C&C - smoking [result slide]
Experimental results on C&C - comparison to the state of the art [result slide]
Experimental results on Gupta dataset
[Action classes: answering the phone, making a phone call, drinking, using a light torch, pouring water from a cup, using a spray bottle]
- Interactions achieve the best performance alone
- Combination improves results further: only 2 misclassified samples
- Compared to the state of the art: Gupta et al. use significantly more training information
Experimental results on Rochester dataset
- Rochester daily activities dataset
– 150 videos of 5 persons
– leave-one-person-out test scenario
Conclusion
- Human-object interaction descriptor obtains state-of-the-art performance
- Complementary to the 3DHOG-track descriptor
- Combination obtains excellent performance
- Automatic extraction of objects?
Prest, Leistner, Civera, Schmid, Ferrari CVPR 2012, Learning object detectors from weakly annotated video
Automatic extraction of objects
- Candidate tubes from dense point tracks
- N. Sundaram et al., Dense point trajectories by GPU-accelerated large displacement optical flow, ECCV 2010