

SLIDE 1

Ivan Laptev

ivan.laptev@inria.fr INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d’Informatique, Ecole Normale Supérieure, Paris

Motion and Human Actions

Reconnaissance d’objets et vision artificielle (Object Recognition and Computer Vision) 2013

SLIDE 2

Class overview

Motivation:
  • Historic review
  • Modern applications

Appearance-based methods:
  • Motion history images
  • Active shape models
  • Tracking and motion priors

Motion-based methods:
  • Generic and parametric optical flow
  • Motion templates

Space-time methods:
  • Local space-time features
  • Action classification and detection
  • Weakly-supervised action learning

SLIDE 3

What have we seen so far?

Temporal templates:
  + simple, fast
  - sensitive to segmentation errors

Active shape models:
  + shape regularization
  - sensitive to initialization and tracking failures

Tracking with motion priors:
  + improved tracking and simultaneous action recognition
  - sensitive to initialization and tracking failures

Motion-based recognition:
  + generic descriptors; less dependent on appearance
  - sensitive to localization/tracking errors

SLIDE 4

Motivation

Goal: Interpreting complex dynamic scenes

No global assumptions about the scene.

Common methods:
  • Segmentation
  • Tracking

Common problems:
  • Complex and changing background
  • Changing appearance

SLIDE 5

Space-time

No global assumptions => consider local spatio-temporal neighborhoods

boxing hand waving

SLIDE 6

Actions == Space-time objects?

SLIDE 7

Local approach: Bag of Visual Words (object categories such as Airplanes, Motorbikes, Faces, Wild Cats, Leaves, People, Bikes)

SLIDE 8

Space-time local features

SLIDE 9

Space-Time Interest Points: Detection

What neighborhoods to consider? Distinctive neighborhoods, i.e. those with high image variation in space and time => look at the distribution of the space-time gradient.

Definitions:
  • Original image sequence: f(x, y, t)
  • Space-time Gaussian kernel g with spatial variance σ² and temporal variance τ²
  • Scale-space representation: L = g * f
  • Space-time gradient: ∇L = (Lx, Ly, Lt)ᵀ
  • Second-moment matrix: μ = g(·; sσ², sτ²) * (∇L ∇Lᵀ), a Gaussian-weighted average of gradient outer products

SLIDE 10

μ defines a second-order approximation of the local distribution of ∇L within a neighborhood.

Properties of μ:
  • 1D space-time variation of f (e.g. a moving bar): one large eigenvalue
  • 2D space-time variation of f (e.g. a moving ball): two large eigenvalues
  • 3D space-time variation of f (e.g. a jumping ball): three large eigenvalues

Points where all eigenvalues of μ are large can be detected as local maxima of H over (x, y, t):

  H = det(μ) − k · trace³(μ)

(similar to the Harris operator [Harris and Stephens, 1988])

Space-Time Interest Points: Detection
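The detector described above can be sketched in a few lines of numpy. This is a hedged illustration, not Laptev's implementation: the truncated-kernel smoothing, the scales sigma/tau, and the constant k are illustrative choices; only the criterion H = det(μ) − k·trace³(μ) follows the slide.

```python
import numpy as np

def smooth(vol, sigma, axis):
    """1-D Gaussian smoothing along one axis (truncated kernel, numpy-only)."""
    r = max(1, int(2 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), axis, vol)

def harris3d(f, sigma=1.5, tau=1.5, s=2.0, k=0.0005):
    """Space-time Harris response H = det(mu) - k * trace(mu)**3."""
    # Scale-space representation L = g(sigma, tau) * f, axis 0 = time
    L = f.astype(float)
    for ax, sg in zip((0, 1, 2), (tau, sigma, sigma)):
        L = smooth(L, sg, ax)
    Lt, Ly, Lx = np.gradient(L)                          # space-time gradient
    grads = (Lx, Ly, Lt)
    # Second-moment matrix: integration-scale average of gradient outer products
    mu = np.empty(L.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            m = grads[i] * grads[j]
            for ax, sg in zip((0, 1, 2), (s * tau, s * sigma, s * sigma)):
                m = smooth(m, sg, ax)
            mu[..., i, j] = m
    return np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3

# Toy sequence: a bright square that appears at frame 8 (an "appearance" event)
f = np.zeros((16, 20, 20))
f[8:, 8:13, 8:13] = 1.0
H = harris3d(f)
t, y, x = np.unravel_index(np.argmax(H), H.shape)
```

On this toy input the strongest response lands near the appearance event, where the image varies in all three dimensions; static edges (rank-deficient μ) score low or negative.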

SLIDE 11

Space-Time interest points

Detected event types: velocity changes, appearance/disappearance, split/merge

SLIDE 12

Motion event detection

Space-Time Interest Points: Examples

SLIDE 13

Spatio-temporal scale selection

Selection of temporal scales captures the frequency of events. Local features can be adapted to scale changes.

SLIDE 14


Relative camera motion

Local features can be adapted to motion changes

SLIDE 15

Local features for human actions

SLIDE 16

boxing walking hand waving

Local features for human actions

SLIDE 17

Multi-scale space-time patches

Local space-time descriptor: HOG/HOF
  • Histogram of oriented spatial gradients (HOG): 3x3x2x4 bins
  • Histogram of optical flow (HOF): 3x3x2x5 bins
(3x3 spatial cells, 2 temporal cells, 4 orientation bins for HOG and 5 flow bins for HOF)
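A minimal numpy sketch of the HOG half of this descriptor, assuming the 3x3x2-cell layout named above; the binning and normalization details are illustrative, not the exact implementation of Laptev et al.

```python
import numpy as np

def hog_patch(patch, cells=(3, 3, 2), nbins=4):
    """HOG over a space-time patch (t, y, x): a 3x3 spatial x 2 temporal grid
    of cells, each holding a histogram of spatial gradient orientations
    (4 bins). A HOF descriptor is built analogously from optical flow
    vectors, with an extra 'no motion' bin (hence 3x3x2x5 bins)."""
    gt, gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    T, Y, X = patch.shape
    ny, nx, nt = cells
    desc = np.zeros((ny, nx, nt, nbins))
    for t in range(T):                               # accumulate per cell
        for y in range(Y):
            for x in range(X):
                desc[y * ny // Y, x * nx // X, t * nt // T,
                     bins[t, y, x]] += mag[t, y, x]
    d = desc.ravel()
    return d / (np.linalg.norm(d) + 1e-8)            # L2 normalization

patch = np.random.default_rng(0).random((6, 12, 12))  # toy space-time patch
d = hog_patch(patch)                                  # 3*3*2*4 = 72 values
```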

SLIDE 18

Visual Vocabulary: K-means clustering

Clustering:
  • Group similar points in the space of image descriptors using K-means clustering
  • Select significant clusters; the cluster centers c1, c2, c3, c4 become the visual words

Classification: assign each new descriptor to the nearest cluster center
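The vocabulary step can be sketched with plain Lloyd's k-means plus nearest-word quantization. This is a generic numpy sketch (real pipelines use large vocabularies, e.g. thousands of words, and subsampled training descriptors).

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (numpy-only sketch of the vocabulary step)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):                   # skip empty clusters
                C[j] = X[labels == j].mean(0)
    return C

def bow_histogram(descriptors, C):
    """Assign each local descriptor to its nearest visual word and return
    the normalized histogram of word counts (the clip representation)."""
    labels = ((descriptors[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    h = np.bincount(labels, minlength=len(C)).astype(float)
    return h / h.sum()

rng = np.random.default_rng(1)
# Toy training descriptors drawn around four well-separated centers
train = np.vstack([rng.normal(m, 0.1, size=(50, 72)) for m in (0.0, 1.0, 2.0, 3.0)])
vocab = kmeans(train, k=4)
h = bow_histogram(rng.normal(1.0, 0.1, size=(30, 72)), vocab)
```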

SLIDE 20
  • Finds similar events in pairs of video sequences

Local features: Matching

SLIDE 21

Action Classification: Overview

Bag of space-time features + multi-channel SVM [Laptev’03, Schuldt’04, Niebles’06, Zhang’07]:
  1. Collection of space-time patches
  2. HOG & HOF patch descriptors
  3. Histograms of visual words
  4. Multi-channel SVM classifier
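One common way to combine the HOG and HOF channels in such an SVM is a chi-square kernel per channel, normalized by the mean distance and summed inside an exponential. The sketch below shows only the kernel computation (an assumption about the exact combination rule; the Gram matrix would then be fed to any kernel SVM).

```python
import numpy as np

def chi2_dist(H1, H2):
    """Pairwise chi-square distances between two sets of histograms."""
    a, b = H1[:, None, :], H2[None, :, :]
    return 0.5 * ((a - b) ** 2 / (a + b + 1e-10)).sum(-1)

def multichannel_kernel(channels_i, channels_j, A=None):
    """K = exp(-sum_c D_c / A_c): chi-square distance D_c per feature channel
    (e.g. HOG and HOF bag-of-words histograms), each normalized by A_c,
    here set to the mean distance of that channel."""
    D = [chi2_dist(Hi, Hj) for Hi, Hj in zip(channels_i, channels_j)]
    if A is None:
        A = [d.mean() for d in D]
    return np.exp(-sum(d / a for d, a in zip(D, A)))

rng = np.random.default_rng(0)
hog = rng.dirichlet(np.ones(20), size=6)   # toy HOG word histograms, 6 clips
hof = rng.dirichlet(np.ones(20), size=6)   # toy HOF word histograms
K = multichannel_kernel([hog, hof], [hog, hof])
```

The resulting matrix is symmetric with unit diagonal, as required of a similarity kernel over the training clips.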

SLIDE 22

Action classification results on the KTH dataset and the Hollywood-2 dataset (classes include GetOutCar, AnswerPhone, Kiss, HandShake, StandUp, DriveCar) [Laptev, Marszałek, Schmid, Rozenfeld 2008]

SLIDE 23

Action classification

Test episodes from movies “The Graduate”, “It’s a Wonderful Life”, “Indiana Jones and the Last Crusade”

SLIDE 24

Evaluation of local feature detectors and descriptors

Four types of detectors:
  • Harris3D [Laptev 2003]
  • Cuboids [Dollar et al. 2005]
  • Hessian [Willems et al. 2008]
  • Regular dense sampling

Four types of descriptors:
  • HoG/HoF [Laptev et al. 2008]
  • Cuboids [Dollar et al. 2005]
  • HoG3D [Kläser et al. 2008]
  • Extended SURF [Willems et al. 2008]

Three human actions datasets:
  • KTH actions [Schuldt et al. 2004]
  • UCF Sports [Rodriguez et al. 2008]
  • Hollywood-2 [Marszałek et al. 2009]

SLIDE 25

Harris3D Hessian Cuboids Dense

Space-time feature detectors

SLIDE 26

Results on KTH Actions

Average accuracy scores; rows = descriptors, columns = detectors:

            Harris3D  Cuboids  Hessian  Dense
  HOG3D      89.0%    90.0%    84.6%    85.3%
  HOG/HOF    91.8%    88.7%    88.7%    86.1%
  HOG        80.9%    82.3%    77.7%    79.0%
  HOF        92.1%    88.2%    88.6%    88.0%
  Cuboids      -      89.1%      -        -
  E-SURF       -        -      81.4%      -

  • Best results for sparse Harris3D + HOF
  • Dense features perform relatively poorly compared to sparse features

6 action classes, 4 scenarios, staged [Wang, Ullah, Kläser, Laptev, Schmid, 2009]

SLIDE 27

Results on UCF Sports

  • Best results for Dense + HOG3D

10 action classes (Diving, Kicking, Walking, Skateboarding, High-Bar-Swinging, Golf-Swinging, …), videos from TV broadcasts. Average precision scores; rows = descriptors, columns = detectors:

            Harris3D  Cuboids  Hessian  Dense
  HOG3D      79.7%    82.9%    79.0%    85.6%
  HOG/HOF    78.1%    77.7%    79.3%    81.6%
  HOG        71.4%    72.7%    66.0%    77.4%
  HOF        75.4%    76.7%    75.3%    82.6%
  Cuboids      -      76.6%      -        -
  E-SURF       -        -      77.3%      -

[Wang, Ullah, Kläser, Laptev, Schmid, 2009]

SLIDE 28

Results on Hollywood-2

  • Best results for Dense + HOG/HOF

12 action classes (GetOutCar, AnswerPhone, Kiss, HandShake, StandUp, DriveCar, …) collected from 69 movies. Average precision scores; rows = descriptors, columns = detectors:

            Harris3D  Cuboids  Hessian  Dense
  HOG3D      43.7%    45.7%    41.3%    45.3%
  HOG/HOF    45.2%    46.2%    46.0%    47.4%
  HOG        32.8%    39.4%    36.2%    39.4%
  HOF        43.3%    42.9%    43.0%    45.5%
  Cuboids      -      45.0%      -        -
  E-SURF       -        -      38.2%      -

[Wang, Ullah, Kläser, Laptev, Schmid, 2009]
SLIDE 29

Other recent local representations

  • L. Yeffet and L. Wolf, "Local Trinary Patterns for Human Action Recognition", ICCV 2009
  • H. Wang, A. Kläser, C. Schmid, C.-L. Liu, "Action Recognition by Dense Trajectories", CVPR 2011
  • P. Matikainen, R. Sukthankar and M. Hebert, "Trajectons: Action Recognition Through the Motion Analysis of Tracked Features", ICCV VOEC Workshop 2009
  • J. Liu, B. Kuipers, S. Savarese, "Recognizing Human Actions by Attributes", CVPR 2011
SLIDE 30

[Wang et al. CVPR’11]

Dense trajectory descriptors

SLIDE 31

Dense trajectory descriptors

[Wang et al. CVPR’11]
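The slides only name the method; the sketch below illustrates the trajectory-shape idea of Wang et al. under simplifying assumptions: dense optical-flow fields are assumed given, points are followed by nearest-pixel lookup rather than median-filtered bilinear interpolation, and the descriptor is the displacement sequence normalized by its total magnitude.

```python
import numpy as np

def track_point(flows, y0, x0):
    """Follow one point through a sequence of dense flow fields.
    flows: list of (H, W, 2) arrays holding (dy, dx) per pixel and frame."""
    pts = [(float(y0), float(x0))]
    for F in flows:
        y, x = pts[-1]
        iy = int(np.clip(round(y), 0, F.shape[0] - 1))
        ix = int(np.clip(round(x), 0, F.shape[1] - 1))
        dy, dx = F[iy, ix]
        pts.append((y + dy, x + dx))
    return np.array(pts)

def trajectory_shape(pts):
    """Trajectory-shape descriptor: the sequence of displacement vectors,
    normalized by the sum of their magnitudes."""
    d = np.diff(pts, axis=0)
    norm = np.linalg.norm(d, axis=1).sum() + 1e-8
    return (d / norm).ravel()

# Toy flow: constant motion of 1 px to the right over 15 frames
# (L = 15 is the trajectory length used by Wang et al.)
flows = [np.tile(np.array([0.0, 1.0]), (32, 32, 1)) for _ in range(15)]
pts = track_point(flows, y0=16, x0=5)
desc = trajectory_shape(pts)
```

In the full method, HOG, HOF, and MBH histograms are additionally computed in a space-time volume around each trajectory.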


SLIDE 32

Dense trajectory descriptors

[Wang et al. CVPR’11]. Computational cost: (figure omitted)

SLIDE 33

Optical flow from MPEG video compression

Highly-efficient video descriptors

SLIDE 34

Highly-efficient video descriptors

Evaluation on Hollywood2

[Kantorov & Laptev, 2013]

Evaluation on UCF50

(baseline: [Wang et al. ’11])

SLIDE 35

Beyond BOF: Temporal structure

  • J.C. Niebles, C.-W. Chen and L. Fei-Fei, "Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification", ECCV 2010
  • K. Tang, L. Fei-Fei and D. Koller, "Learning Latent Temporal Structure for Complex Event Detection", CVPR 2012

SLIDE 36

Beyond BOF: Social roles

  • V. Ramanathan, B. Yao, and L. Fei-Fei, "Social Role Discovery in Human Events", CVPR 2013
  • L. Ding and A. Yilmaz, "Learning relations among movie characters: A social network perspective", ECCV 2010
  • T. Yu, S.-N. Lim, K. Patwardhan, and N. Krahnstoever, "Monitoring, recognizing and discovering social networks", CVPR 2009

SLIDE 37

Beyond BOF: Egocentric activities

  • A. Fathi, A. Farhadi, and J. M. Rehg, "Understanding egocentric activities", ICCV 2011
  • H. Pirsiavash, D. Ramanan, "Recognizing Activities of Daily Living in First-Person Camera Views", CVPR 2012

SLIDE 38

Manual annotation of drinking actions in movies: “Coffee and Cigarettes”; “Sea of Love”

  • Temporal annotation: first frame, keyframe, last frame
  • Spatial annotation: head rectangle, torso rectangle
  • "Drinking": 159 annotated samples; "Smoking": 149 annotated samples

Beyond BOF: Action localization

SLIDE 39

Action representation

  • Hist. of Gradient
  • Hist. of Optic Flow
SLIDE 40
Action learning with AdaBoost:
  • Efficient discriminative classifier [Freund & Schapire’97]
  • Good performance for face detection [Viola & Jones’01]
  • Boosting selects weak classifiers (features) from pre-aligned samples
  • Weak classifiers: Haar features or histogram features, combined via a Fisher discriminant with an optimal threshold

[Laptev, Perez 2007]
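The boosting step can be illustrated with a minimal AdaBoost over decision stumps. This is a generic sketch, not the Laptev & Perez implementation: single-feature threshold stumps stand in for the Fisher-discriminant weak learners on histogram features.

```python
import numpy as np

def train_adaboost(X, y, rounds=5):
    """AdaBoost with decision stumps: each round picks the stump (feature,
    threshold, sign) with the lowest weighted error, then reweights samples
    so misclassified ones count more in the next round."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(sign * (X[:, j] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        alpha = 0.5 * np.log((1 - err + 1e-10) / (err + 1e-10))
        pred = np.where(sign * (X[:, j] - thr) > 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    s = sum(a * np.where(sg * (X[:, j] - t) > 0, 1, -1)
            for a, j, t, sg in ensemble)
    return np.sign(s)

# Toy data: the label is decided by feature 0, so a perfect stump exists
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.where(X[:, 0] > 0, 1, -1)
clf = train_adaboost(X, y)
```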

SLIDE 41

Action Detection

Test episodes from the movie “Coffee and cigarettes”

[Laptev, Perez 2007]

SLIDE 42

20 most confident detections

SLIDE 43

Where to get training data? Weakly-supervised learning

SLIDE 44

Actions in movies

  • Realistic variation of human actions
  • Many classes and many examples per class
  • Typically only a few class-samples per movie
  • Manual annotation is very time consuming
SLIDE 45

subtitles:
  1172  01:20:17,240 --> 01:20:20,437  Why weren't you honest with me? Why'd you keep your marriage a secret?
  1173  01:20:20,640 --> 01:20:23,598  It wasn't my secret, Richard. Victor wanted it that way.
  1174  01:20:23,800 --> 01:20:26,189  Not even our closest friends knew about our marriage.

movie script:
  RICK: Why weren't you honest with me? Why did you keep your marriage a secret?
  Rick sits down with Ilsa.
  ILSA: Oh, it wasn't my secret, Richard. Victor wanted it that way. Not even our closest friends knew about our marriage.

=> aligned time interval: 01:20:17 to 01:20:23

  • Scripts are available for >500 movies (no time synchronization): www.dailyscript.com, www.movie-page.com, www.weeklyscript.com, …
  • Subtitles (with time info) are available for most movies
  • Can transfer time to scripts by text alignment

Script-based video annotation

[Laptev, Marszałek, Schmid, Rozenfeld 2008]
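The time-transfer step can be sketched with the standard library alone. Real systems align subtitle and script words with dynamic programming; this hedged sketch matches whole lines with difflib's similarity ratio, and the subtitle/script strings below are toy stand-ins for parsed files.

```python
import difflib

subtitles = [  # (start, end, text) as parsed from an .srt file
    ("01:20:17", "01:20:20",
     "Why weren't you honest with me? Why'd you keep your marriage a secret?"),
    ("01:20:20", "01:20:23",
     "It wasn't my secret, Richard. Victor wanted it that way."),
]
script_lines = [  # dialogue lines from the movie script (no timing)
    "RICK: Why weren't you honest with me? Why did you keep your marriage a secret?",
    "ILSA: Oh, it wasn't my secret, Richard. Victor wanted it that way.",
]

def transfer_times(subtitles, script_lines):
    """Give each script line the time interval of its best-matching subtitle."""
    timed = []
    for line in script_lines:
        best = max(subtitles, key=lambda s: difflib.SequenceMatcher(
            None, s[2].lower(), line.lower()).ratio())
        timed.append((best[0], best[1], line))
    return timed

timed_script = transfer_times(subtitles, script_lines)
```

The script's scene descriptions between timed dialogue lines then inherit the surrounding interval, which is what makes script-based action annotation possible.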

SLIDE 46

Text-based action retrieval

Examples of the GetOutCar action:
  “… Will gets out of the Chevrolet. …”
  “… Erin exits her new truck …”

  • Large variation of action expressions in text
  • Potential false positives: “… About to sit down, he freezes …”
  • => Supervised text classification approach

[Laptev, Marszałek, Schmid, Rozenfeld 2008]

SLIDE 47

Hollywood-2 actions dataset

Training and test samples are obtained from 33 and 36 distinct movies, respectively. The Hollywood-2 dataset is online: http://www.irisa.fr/vista/actions/hollywood2 [Laptev, Marszałek, Schmid, Rozenfeld 2008]

SLIDE 48

Action classification results: average precision (AP) on the Hollywood-2 dataset, comparing training on clean vs. automatic annotations (figure omitted).

SLIDE 49

Actions in the context of scenes

Examples: Eating -- kitchen; Eating -- cafe; Running -- road; Running -- street

Human actions are frequently correlated with particular scene classes. Reasons: the physical properties and particular purposes of scenes.

SLIDE 50

01:22:00 01:22:03 01:22:15 01:22:17

Mining scene captions

ILSA: I wish I didn't love you so much.
She snuggles closer to Rick.
CUT TO:
EXT. RICK'S CAFE - NIGHT
Laszlo and Carl make their way through the darkness toward a side entrance of Rick's. They run inside the entryway. The headlights of a speeding police car sweep toward them. They flatten themselves against a wall to avoid detection. The lights move past them.
CARL: I think we lost them. …

[Marszałek, Laptev, Schmid 2008]

SLIDE 51

Co-occurrence of actions and scenes in scripts

[Marszałek, Laptev, Schmid 2008]

SLIDE 52

Results: actions and scenes (jointly)

  • Actions in the context of scenes
  • Scenes in the context of actions

[Marszałek, Laptev, Schmid 2008]

SLIDE 53

Handling temporal uncertainty

Script alignment gives only an uncertain temporal interval for each action (e.g. 24:25 to 24:51): uncertainty!

[Duchenne, Laptev, Sivic, Bach, Ponce, 2009]

SLIDE 54

Automatic collection of video clips

Input:
  • Action type, e.g. ”Person opens door”
  • Videos + aligned scripts

Discriminative action clustering

[Duchenne, Laptev, Sivic, Bach, Ponce, 2009]

SLIDE 55

Discriminative action clustering

In feature space, the nearest-neighbor solution is wrong. Negative samples are random video samples: there are lots of them, and they have a very low chance of being positives. [Duchenne, Laptev, Sivic, Bach, Ponce, 2009]

SLIDE 56

Action clustering

Formulation (in feature space): a discriminative cost combining the loss on the parameterized positive samples with the loss on the negative samples, with an SVM solution for the classifier.

Optimization: coordinate descent on the latent assignment of positive samples [Xu et al. NIPS’04] [Bach & Harchaoui NIPS’07]
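The coordinate-descent scheme can be sketched as a latent-variable loop. This is a heavily simplified illustration, not the cost of Duchenne et al.: regularized least squares stands in for the SVM, and the latent update simply re-selects, in each clip, the window the current classifier scores highest.

```python
import numpy as np

def fit_linear(X, y, lam=1e-2):
    """Regularized least-squares classifier, a stand-in for the SVM step."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def discriminative_clustering(bags, negatives, iters=5):
    """Alternate (1) training a classifier on the current positives vs. the
    fixed negative pool and (2) re-selecting, per clip, the candidate
    window with the highest classifier score (the latent update)."""
    pos = [bag.mean(0) for bag in bags]        # init: average window per clip
    for _ in range(iters):
        X = np.vstack([np.vstack(pos), negatives])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(negatives))])
        w = fit_linear(X, y)
        pos = [bag[np.argmax(bag @ w)] for bag in bags]   # latent update
    return w, pos

rng = np.random.default_rng(0)
action = np.array([3.0, 0.0, 0.0])             # toy 'true action' feature
bags = [np.vstack([rng.normal(size=(4, 3)),    # background candidate windows
                   action + 0.1 * rng.normal(size=3)])
        for _ in range(10)]                    # one candidate bag per clip
negatives = rng.normal(size=(50, 3))
w, selected = discriminative_clustering(bags, negatives)
```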

SLIDE 57

Clustering results

Drinking actions in Coffee and Cigarettes

SLIDE 58

Action detection: Sliding time window

“Sit Down” and “Open Door” actions in ~5 hours of movies
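A sliding-time-window detector can be sketched as follows, assuming per-frame visual-word indicator features and a clip-level scoring function; the window lengths, step, and the non-maximum-suppression overlap threshold are illustrative values.

```python
import numpy as np

def sliding_window_detect(frame_words, score_fn, lengths=(30, 60, 90),
                          step=10, overlap=0.5):
    """Score every temporal window of several lengths with a clip-level
    classifier, then keep local maxima by greedy temporal NMS."""
    T = len(frame_words)
    dets = []
    for L in lengths:
        for s in range(0, T - L + 1, step):
            h = frame_words[s:s + L].sum(0)
            h = h / (h.sum() + 1e-8)          # BoF histogram of the window
            dets.append((score_fn(h), s, s + L))
    dets.sort(reverse=True)                   # best-scoring windows first
    keep = []
    for sc, s, e in dets:
        if all(min(e, e2) - max(s, s2) < overlap * min(e - s, e2 - s2)
               for _, s2, e2 in keep):
            keep.append((sc, s, e))
    return keep

# Toy stream of per-frame visual words: word 0 dominates inside [100, 160)
rng = np.random.default_rng(0)
feats = np.zeros((300, 5))
feats[np.arange(300), rng.integers(0, 5, 300)] = 1.0
feats[100:160] = 0.0
feats[100:160, 0] = 1.0
score = lambda h: h[0]                        # toy 'classifier': weight on word 0
detections = sliding_window_detect(feats, score)
```

On this toy stream the top surviving detection is the 60-frame window covering the simulated action interval exactly.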

SLIDE 59

Temporal detection of “Sit Down” and “Open Door” actions in movies: The Graduate, The Crying Game, Living in Oblivion [Duchenne et al. 09]

SLIDE 60

As the headwaiter takes them to a table they pass by the piano, and the woman looks at Sam. Sam, with a conscious effort, keeps his eyes on the keyboard as they go past. The headwaiter seats Ilsa...


SLIDE 64

On-going: Joint Recognition of Actions and Actors

The script says “Rick walks up behind Ilsa”. Which detected person is Rick? Which one walks?

[Bojanowski, Bach, Laptev, Ponce, Sivic, Schmid, 2013, in submission]

SLIDE 65

On-going: Joint Recognition of Actions and Actors

Resolved assignment: Rick walks (“Rick walks up behind Ilsa”).

[Bojanowski, Bach, Laptev, Ponce, Sivic, Schmid, 2013, in submission]

SLIDE 66

Recognition of Actions and Actors

[Bojanowski, Bach, Laptev, Ponce, Sivic, Schmid, 2013]

SLIDE 67

What we have seen so far: action understanding in realistic settings, i.e. action classification (and localization). Is classification the final answer?

SLIDE 68

Is action classification the right problem?

Is action vocabulary well-defined?

  • Examples of “Open” action:

What granularity of action vocabulary shall we consider?

SLIDE 69

Do we want to learn a person-throws-cat-into-trash-bin classifier?

Source: http://www.youtube.com/watch?v=eYdUZdan5i8

SLIDE 70

Crowdsourcing action definitions

MTurk interface:

(Joint work with T.H. Vu, C. Olsson, A. Oliva and J. Sivic)

SLIDE 71

Crowdsourcing action definitions

Input video (situation 1), five responses for person P1:
  “P1 is dancing with P2.” / “P1 dances with P2.” / “P1 is dancing with P2.” / “P1 is dancing with P2.” / “P1 is dancing with P2.”
  => Similar expressions

SLIDE 72

Crowdsourcing action definitions

Input video (situation 1), action responses for P1:
  “P1 greets P2 and shakes hands” / “P1 shakes P2's hand and greets him.” / “P1 is shaking P2's hand” / “P1 is shaking hands.” / “P1 shakes hands with P2.”
  => Similar expressions

SLIDE 73

Crowdsourcing action definitions

Input video (situation 2), action responses for P2:
  “P2 is walking up to P1 and talking to him.” / “P2 approaches P1.” / “P2 runs towards P1 and speaks to him.” / “P2 is rushing to P1 before he leaves.” / “P2 stops P1 before he can leave to talk to him.”
  => Similar meaning, different expressions

SLIDE 74

Crowdsourcing action definitions

Input video (situation 2), action responses for P1:
  “P1 is leaving the room” / “P1 gets up and leaves the table” / “P1 storms from the table.” / “P1 gets up and leaves to the back of the room.” / “P1 is walking away from an interaction with P2.”
  => Similar meaning, different expressions

SLIDE 75

Crowdsourcing action definitions

Input video (situation 3), action responses for P1:
  “P1 is carrying his money to the casino banker.” / “P1 is leading P3 and P4.” / “P1 walks in front of a group of people” / “P1 is leading P3 and P4 through the room.” / “P1 is walking up to the cage”
  => Different expressions, different meanings

SLIDE 76

Crowdsourcing action definitions

Input video (situation 3), action responses for P1:
  “P1 is walking through a crowd carrying cases” / “P1 is walking.” / “P1 is looking perplexed and walking away.” / “P1 scans the area.” / “P1 is looking for someone.”
  => Different expressions, different meanings

SLIDE 77

What can current methods not do?

SLIDE 78

Limitations of Current Methods

What is the intention of this person? Is this scene dangerous? What is unusual in this scene?

SLIDE 79

Next challenge: shift the focus of computer vision

From object, scene and action recognition (“Is this a picture of a dog?”, “Is the person running in this video?”) to recognition of objects’ function and people’s intentions: What do people do with objects? How do they do it? For what purpose? This would enable new applications.

SLIDE 80

Motivation

  • Exploit the link between human pose, action and object function.
  • Use human actors as active sensors to reason about the surrounding scene.

[Delaitre, Fouhey, Laptev, Sivic, Gupta, Efros, 2012]

SLIDE 81

Goal

Recognize objects by the way people interact with them (semantic object segmentation: Table, Sofa, Wall, Shelf, Floor, Tree).

Data: time-lapse “Party & Cleaning” videos, with lots of person-object interactions and many scenes on YouTube.

SLIDE 82

New “Party & Cleaning” dataset


SLIDE 84

Pose vocabulary

SLIDE 85

Pose histogram


SLIDE 86

Some qualitative results

SLIDE 87

Object classes: Sofa/Armchair, CoffeeTable, Chair, Table, Cupboard, Bed, Other, Background. Rows: Ground truth; ‘A+P’ soft segm.; ‘A+P’ hard segm.; ‘A+L’ soft segm.

SLIDE 88

Using our model as pose prior

Given a bounding box and the ground-truth segmentation, we fit the pose clusters in the box and score them by summing the joints’ weights over the underlying objects.

SLIDE 89

Using our model as pose prior

SLIDE 90

Conclusions

  • Video labeling by action classes is not the end of the story. New challenging problems are waiting.
  • Bag-of-features methods give state-of-the-art results for action recognition in realistic data. Better models are needed.
  • Weakly-supervised methods are crucial to address the large scale and large diversity of video data.