Video-based Action Recognition
Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208
Outline
◮ Introduction
  ◮ The Task of Action Recognition
  ◮ Main Challenges in Action Recognition
  ◮ Categorization of Existing Methods
  ◮ Commonly-used Action Datasets
◮ Action Recognition by Appearance Representation - I
  ◮ On Space-Time Interest Points
◮ Action Recognition by Appearance Representation - II
  ◮ Recognizing Human Actions: A Local SVM Approach
◮ Action Recognition by Dynamic Modeling
  ◮ Coupled Hidden Markov Models for Complex Action Recognition
What is an Action?
◮ Action: Atomic motion(s) that can be unambiguously
distinguished (e.g. sitting down, running).
◮ An activity is composed of several actions performed in
succession (e.g. dining, meeting a person).
◮ Event is a combination of activities (e.g. football match,
traffic accident).
What is Action Recognition?
◮ What is Recognition?
  ◮ Verification: Is the walking man Obama?
  ◮ Identification: Who is the walking man?
  ◮ Recognition: What is the man doing?
◮ The recognition of an action is to match an observation (e.g. a video) with previously defined patterns and then assign it a label, i.e. an action type.
◮ Input: an action video;
◮ Output: an action label;
Why Need Action Recognition?
◮ Expensive human effort is needed to handle the rapidly increasing amount of video recordings;
◮ Large number of potential applications:
  ◮ visual surveillance
  ◮ crowd behavior analysis
  ◮ human-machine interfaces
  ◮ sports video analysis
  ◮ video retrieval
  ◮ etc.
Main Challenges in Action Recognition
◮ Different scales
  ◮ People may appear at different scales in different videos, yet perform the same action.
◮ Movement of the camera
◮ Background “clutter”
  ◮ Other objects/humans present in the video frame.
◮ Partial occlusions
◮ Human/action variation (large intra-class variation)
  ◮ Walking movements can differ in speed and stride length.
◮ Etc.
Categories of Action Recognition Methods
Appearance Representation
◮ Focus on extracting “better” appearance representation
from action video;
◮ hand-crafted features: HOG [7], HOF [4], MBH [18] or
combinations [18];
◮ learned features: deep neural network [20, 5, 16]
Categories of Action Recognition Methods
Dynamic Modeling
◮ Focus on modeling the dynamics and motions in action videos;
◮ Deterministic models: dynamic time warping [24],
maximum margin temporal warping [19], actom sequence model [6], graphs [3] and deep neural architectures [14, 17];
◮ Generative models: HMM [10], coupled HMM [2],
CRF [21] and dynamic Bayes nets [23].
Small Size Datasets
◮ The KTH Dataset [13]
◮ 6 actions (walking, jogging, running, boxing, hand
waving and hand clapping)
◮ The Weizmann Dataset [1]
◮ 10 actions (walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack and skip)
◮ The UCF Sports Action Dataset
◮ 9 actions (diving, golf swinging, kicking, weightlifting,
horseback riding, running, skating, swinging a baseball bat and walking)
Large Size Datasets
◮ The IXMAS Dataset [22]
◮ 14 actions (check watch, cross arms, scratch head, sit
down, get up, turn around, walk, wave, punch, kick, point, pick up, throw over head and throw from bottom up)
◮ Hollywood Human Action Dataset [11]
◮ 12 actions (answer phone, get out of car, handshake,
hug, kiss, sit down, sit up, stand up, drive car, eat, fight and run)
◮ The UCF50 Dataset [12]: 50 different actions/activities
◮ The HMDB51 Dataset [8]: 51 different actions/activities
◮ The UCF101 Dataset [15]: 101 different actions/activities
Data Samples
Figure : (a) KTH Dataset; (b) Hollywood Dataset
Action Recognition by Appearance Representation - I (On Space-Time Interest Points)
Overview
◮ Title: On Space-Time Interest Points (2005) 1
◮ Motivated by the Harris and Förstner spatial interest point operators, extended into the spatio-temporal domain;
◮ Aims to find “good” spatio-temporal positions in a sequence for feature extraction;
◮ Distinct and stable descriptors are extracted from the obtained interest points;
◮ Author: Ivan Laptev
1 I. Laptev. On space-time interest points. International Journal of Computer Vision, 64(2-3):107–123, 2005
Spatio-Temporal Interest Points
◮ The points that have large variations along both the spatial and the temporal directions in local spatio-temporal volumes.
Figure : Detecting the strongest spatio-temporal interest points in a football sequence with a player heading the ball.
Spatio-Temporal Interest Point Detection
◮ In the spatial domain, we can model an image f^sp by its linear scale-space representation L^sp:

    L^sp(x, y; σ_l²) = g^sp(x, y; σ_l²) ∗ f^sp(x, y)

◮ Like the operation for images, we can model the sequence f by a linear spatio-temporal scale-space representation L:

    L(·; σ_l², τ_l²) = g(·; σ_l², τ_l²) ∗ f(·)

  where the spatio-temporal Gaussian kernel is

    g(x, y, t; σ_l², τ_l²) = exp(−(x² + y²)/(2σ_l²) − t²/(2τ_l²)) / √((2π)³ σ_l⁴ τ_l²)
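As a concrete sketch of this smoothing step (an illustrative NumPy/SciPy implementation, not the paper's code: the function name `scale_space`, the (T, H, W) layout, and the toy input are my own), the separable Gaussian convolution with independent spatial and temporal scales can be written as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(video, sigma_l, tau_l):
    """Spatio-temporal scale-space L(.; sigma_l^2, tau_l^2) = g * f, computed
    as a separable Gaussian convolution: tau_l smooths the temporal axis,
    sigma_l the two spatial axes. `video` has shape (T, H, W)."""
    return gaussian_filter(video.astype(float), sigma=(tau_l, sigma_l, sigma_l))

# toy input: a single bright dot moving one pixel per frame
f = np.zeros((16, 32, 32))
for t in range(16):
    f[t, 16, 8 + t] = 1.0
L = scale_space(f, sigma_l=2.0, tau_l=1.0)
```

Smoothing spreads the moving dot into a blurred space-time streak while (with the default reflective boundary handling) preserving its total mass.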
Spatio-Temporal Interest Point Detection
◮ Construct a 3 × 3 spatio-temporal second-moment matrix, averaged with a Gaussian window at integration scales (σ_i², τ_i²):

    μ = g(·; σ_i², τ_i²) ∗ [ L_x²    L_xL_y  L_xL_t ;
                             L_xL_y  L_y²    L_yL_t ;
                             L_xL_t  L_yL_t  L_t²   ]
◮ The first-order derivatives are defined as (ξ ∈ {x, y, t}):

    L_ξ(·; σ_l², τ_l²) = ∂_ξ(g ∗ f)

◮ With λ1, λ2 and λ3 the three eigenvalues of μ, the spatio-temporal Harris corner function is defined as:

    H = det(μ) − k · trace³(μ) = λ1 λ2 λ3 − k (λ1 + λ2 + λ3)³

◮ Detect the interest points as the positive local maxima of H;
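The detection pipeline (smooth, differentiate, average the second-moment matrix, evaluate H) can be sketched as follows. This is an illustrative implementation, not Laptev's original code: the name `harris3d`, the parameter defaults (local scales, the integration-to-local scale ratio `s`, and `k`), and the toy clip are all assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris3d(video, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Space-time Harris function H = det(mu) - k * trace(mu)^3 for a
    (T, H, W) video. sigma/tau are the local scales; the Gaussian averaging
    window uses the integration scales s*sigma and s*tau."""
    L = gaussian_filter(video.astype(float), sigma=(tau, sigma, sigma))
    Lt, Ly, Lx = np.gradient(L)  # first derivatives along t, y, x
    w = (s * tau, s * sigma, s * sigma)
    # entries of the second-moment matrix, averaged at the integration scale
    m = {name: gaussian_filter(a * b, sigma=w)
         for name, (a, b) in {"xx": (Lx, Lx), "yy": (Ly, Ly), "tt": (Lt, Lt),
                              "xy": (Lx, Ly), "xt": (Lx, Lt),
                              "yt": (Ly, Lt)}.items()}
    det = (m["xx"] * (m["yy"] * m["tt"] - m["yt"] ** 2)
           - m["xy"] * (m["xy"] * m["tt"] - m["xt"] * m["yt"])
           + m["xt"] * (m["xy"] * m["yt"] - m["yy"] * m["xt"]))
    tr = m["xx"] + m["yy"] + m["tt"]
    return det - k * tr ** 3

# toy usage: a step edge that appears mid-sequence forms a space-time corner
clip = np.zeros((12, 24, 24))
clip[6:, 12:, 12:] = 1.0
H = harris3d(clip)  # interest points = positive local maxima of H
```

In a full detector one would follow this with non-maximum suppression over 3D neighborhoods to keep only the strongest positive local maxima.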
Space-Time Interest Points: Examples
(a) Action: clapping hands; (b) The detected interest points
Spatio-Temporal Scale Adaptation
◮ Recall the scale-space representation L(·; σ_l², τ_l²): the two scale factors σ_l² and τ_l² strongly influence the result;
◮ The larger τ_l² is, the more easily space-time structures with long temporal extents are detected;
◮ The larger σ_l² is, the more easily space-time structures with large spatial extents are detected;
Spatio-Temporal Scale Adaptation
◮ By finding the extrema of the scale-normalized Laplacian ∇²_norm L over both spatial and temporal scales, we can automatically determine the scale factors.
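A minimal sketch of this scale-selection idea (illustrative only; the normalization exponents σ²τ^(1/2) for the spatial part and στ^(3/2) for the temporal part follow Laptev's choice, while the function name, the candidate scale grid, and the synthetic blob are my own assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def norm_laplacian(video, sigma, tau):
    """Scale-normalized spatio-temporal Laplacian:
    sigma^2 * tau^(1/2) * (Lxx + Lyy) + sigma * tau^(3/2) * Ltt."""
    L = gaussian_filter(video.astype(float), sigma=(tau, sigma, sigma))
    Ltt = np.gradient(np.gradient(L, axis=0), axis=0)
    Lyy = np.gradient(np.gradient(L, axis=1), axis=1)
    Lxx = np.gradient(np.gradient(L, axis=2), axis=2)
    return sigma**2 * np.sqrt(tau) * (Lxx + Lyy) + sigma * tau**1.5 * Ltt

# evaluate a small grid of candidate scales on a blob of known extent,
# then keep the (sigma, tau) pair giving the strongest extremal response
blob = np.zeros((16, 32, 32))
blob[8, 16, 16] = 1.0
blob = gaussian_filter(blob, sigma=(2.0, 3.0, 3.0))
scales = [(s, t) for s in (1.0, 2.0, 4.0) for t in (1.0, 2.0)]
responses = {st: np.abs(norm_laplacian(blob, *st)).max() for st in scales}
best_sigma, best_tau = max(responses, key=responses.get)
```

The selected (σ, τ) pair tracks the spatial and temporal extent of the underlying structure, which is exactly why the operator can adapt the detector's scales automatically.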
Result
Figure : Results of spatial/spatio-temporal interest point detection for a zoom-in sequence of a walking person.
Result
Figure : (top): Correct matches in sequences with leg actions; (bottom): Correct matches in sequences with arm actions;
Action Recognition by Appearance Representation - II (Recognizing Human Actions: A Local SVM Approach)
Overview
◮ Title: Recognizing Human Actions: A Local SVM
Approach (2004) 2
◮ Use local space-time features to represent video
sequences that contain actions.
◮ Classification is done via an SVM.
◮ Authors: Christian Schuldt, Ivan Laptev and Barbara Caputo
2 C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local svm approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, pages 32–36. IEEE, 2004
Local Space-time Features
Figure : Local space-time features detected for a walking pattern 3
3 C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local svm approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, pages 32–36. IEEE, 2004
Representation of Features
◮ Spatio-temporal “jets” (up to 4th order) are computed at each feature center:

    j = (L_x, L_y, L_t, L_xx, ..., L_tttt) |_{σ² = σ̃_i², τ² = τ̃_i²}

  where

    L_{x^m y^n t^k} = σ^{m+n} τ^k (∂_{x^m y^n t^k} g) ∗ f

◮ Using k-means clustering over the jets j, a vocabulary of words h_i is created from the jet descriptors;
◮ Finally, a given video is represented by a histogram of occurrence counts of the features corresponding to each h_i in that video: H = (h1, ..., hn)
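The vocabulary-and-histogram step can be sketched with scikit-learn's k-means (an illustrative stand-in: random vectors replace real jet descriptors, and `video_histogram` and the vocabulary size are my own choices; 34 is the number of partial derivatives of orders 1-4 in the three variables x, y, t):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# stand-in for 4th-order jet descriptors collected from training videos
training_jets = rng.normal(size=(500, 34))
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(training_jets)

def video_histogram(jets, kmeans):
    """Represent a video by counts of its jets' nearest vocabulary words."""
    words = kmeans.predict(jets)
    H = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return H / max(H.sum(), 1.0)  # normalize so videos of any length compare

# a "video" with 60 detected features becomes one fixed-length histogram
H = video_histogram(rng.normal(size=(60, 34)), kmeans)
```

Normalizing the histogram makes videos with different numbers of detected interest points directly comparable as inputs to a classifier.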
Recognition by Support Vector Machines
◮ For action recognition, the obtained local space-time features are combined with an SVM;
◮ Given a set of training data from different action classes {(H_i, y_i)}_{i=1}^n, an SVM classifier for each action class is learned:

    f(H) = sgn( Σ_{i=1}^n α_i y_i ⟨H_i, H⟩ + b )

◮ Easy to extend to a kernelized version by replacing the inner product ⟨H_i, H⟩ with a kernel K(H_i, H);
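The classification stage might look like the following sketch (synthetic data: the two toy "action classes" are Dirichlet-distributed histograms with different dominant vocabulary words, a construction of my own, not the paper's data):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# class 0: mass concentrated on words 0-3; class 1: on words 4-7
X0 = rng.dirichlet(np.r_[5.0 * np.ones(4), np.ones(4)], size=40)
X1 = rng.dirichlet(np.r_[np.ones(4), 5.0 * np.ones(4)], size=40)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(40), np.ones(40)]

# linear SVM on the histograms; swap kernel= for a kernelized version
clf = SVC(kernel="linear").fit(X, y)
acc = clf.score(X, y)
```

For multiple action classes one would train one such classifier per class (one-vs-rest) and pick the class with the largest decision value.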
Results
Figure : Results of action recognition for different methods and scenarios on KTH dataset.
Action Recognition by Dynamic Modeling (Coupled Hidden Markov Models for Complex Action Recognition)
Overview
◮ Title: Coupled Hidden Markov Models for Complex
Action Recognition (1997) 4
◮ Hidden Markov models (HMMs) are well suited to modeling and classifying dynamic behaviors.
◮ But an HMM is not suitable for multiple interacting processes, which have structure in both time and space.
◮ Coupled hidden Markov models can model multiple
interacting processes without running afoul of the Markov condition.
◮ Author: Matthew Brand, Nuria Oliver and Alex Pentland
4 M. Brand, N. Oliver, and A. Pentland. Coupled hidden markov models for complex action recognition. In Computer vision and pattern recognition, 1997. proceedings., 1997 ieee computer society conference on, pages 994–999. IEEE, 1997
Restriction of HMM
◮ HMMs are favored because they implicitly handle time-varying signals that satisfy the Markov condition.
◮ Consequently, HMMs are ill-suited to systems with compositional state, e.g., multiple interacting processes that have structure in both time and space.
◮ Think about how to model “A gave B the C”?
Coupling and Factoring HMMs
◮ In order to handle multiple interacting processes (to couple HMMs), we need to obtain a joint HMM C from the two coupled HMMs A and B;
◮ Given the states a_i and b_j and transition parameters P_{a_i|a_j} and P_{b_k|b_l}, the joint states are c_ij = {a_i, b_j} and the transitions are:

    P_{c_ik|c_jl} = Ψ(P_{a_i|a_j}, P_{b_k|b_l}, P_{a_i|b_l}, P_{b_k|a_j})

◮ P_{a_i|b_l} and P_{b_k|a_j} are the coupling parameters;
Coupling and Factoring HMMs
◮ We can also project the joint HMM back onto its components:

    P_{a_i|a_j} ∝ Σ_l Σ_k P_{c_ik|c_jl}        P_{a_i|b_l} ∝ Σ_j Σ_k P_{c_ik|c_jl}

◮ So a joint HMM can be trained via standard HMM methods;
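The coupling and factoring operations can be sketched in NumPy. This is only an illustration: the simple renormalized product used here for Ψ is one plausible choice, not necessarily Brand et al.'s exact formulation, and all function names and the indexing convention are my own.

```python
import numpy as np

rng = np.random.default_rng(2)

def row_stochastic(n, m):
    """A random transition table whose rows sum to one."""
    P = rng.random((n, m))
    return P / P.sum(axis=1, keepdims=True)

def couple(P_aa, P_bb, P_ab, P_ba):
    """Joint transition table for states c_ij = (a_i, b_j), using a simple
    product form for Psi, renormalized per row.  Indexing convention:
    P_aa[j, i] = P(a_i|a_j), P_ab[l, i] = P(a_i|b_l),
    P_bb[l, k] = P(b_k|b_l), P_ba[j, k] = P(b_k|a_j)."""
    nA, nB = P_aa.shape[0], P_bb.shape[0]
    J = np.zeros((nA * nB, nA * nB))
    for j in range(nA):
        for l in range(nB):
            for i in range(nA):
                for k in range(nB):
                    J[j * nB + l, i * nB + k] = (P_aa[j, i] * P_ab[l, i]
                                                 * P_bb[l, k] * P_ba[j, k])
    return J / J.sum(axis=1, keepdims=True)

def factor_a(J, nA, nB):
    """Project back onto chain A: P(a_i|a_j) prop. to sum over l, k."""
    P = J.reshape(nA, nB, nA, nB).sum(axis=(1, 3))  # marginalize l and k
    return P / P.sum(axis=1, keepdims=True)

P_aa, P_bb = row_stochastic(3, 3), row_stochastic(2, 2)
P_ab, P_ba = row_stochastic(2, 3), row_stochastic(3, 2)
J = couple(P_aa, P_bb, P_ab, P_ba)       # 6 x 6 joint transition table
P_aa_back = factor_a(J, 3, 2)            # recovered 3 x 3 component table
```

Because the joint table J is an ordinary row-stochastic transition matrix over the product state space, the coupled model can indeed be trained with standard HMM machinery and then factored back into its components.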
One Case of Application
Figure : Here we can use the proposed model to represent the action performed by two hands.
References I
- M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri.
Actions as space-time shapes. In Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, volume 2, pages 1395–1402. IEEE, 2005.
- M. Brand, N. Oliver, and A. Pentland.
Coupled hidden markov models for complex action recognition. In Computer vision and pattern recognition, 1997. proceedings., 1997 ieee computer society conference on, pages 994–999. IEEE, 1997.
- W. Brendel and S. Todorovic.
Learning spatiotemporal graphs of human activities. In 2011 International Conference on Computer Vision, pages 778–785. IEEE, 2011.
- R. Chaudhry, A. Ravichandran, G. Hager, and R. Vidal.
Histograms of oriented optical flow and binet-cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1932–1939. IEEE, 2009.
- G. Chéron, I. Laptev, and C. Schmid.
P-cnn: Pose-based cnn features for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 3218–3226, 2015.
References II
- A. Gaidon, Z. Harchaoui, and C. Schmid.
Actom sequence models for efficient action detection. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3201–3208. IEEE, 2011.
- A. Kläser, M. Marszałek, and C. Schmid.
A spatio-temporal descriptor based on 3d-gradients. In BMVC 2008-19th British Machine Vision Conference, pages 275–1. British Machine Vision Association, 2008.
- H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre.
Hmdb: a large video database for human motion recognition. In 2011 International Conference on Computer Vision, pages 2556–2563. IEEE, 2011.
- I. Laptev.
On space-time interest points. International Journal of Computer Vision, 64(2-3):107–123, 2005.
- K. Li, J. Hu, and Y. Fu.
Modeling complex temporal composition of actionlets for activity prediction. In European Conference on Computer Vision, pages 286–299. Springer, 2012.
References III
- M. Marszalek, I. Laptev, and C. Schmid.
Actions in context. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2929–2936. IEEE, 2009.
- K. K. Reddy and M. Shah.
Recognizing 50 human action categories of web videos. Machine Vision and Applications, 24(5):971–981, 2013.
- C. Schuldt, I. Laptev, and B. Caputo.
Recognizing human actions: a local svm approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, pages 32–36. IEEE, 2004.
- K. Simonyan and A. Zisserman.
Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, pages 568–576, 2014.
- K. Soomro, A. R. Zamir, and M. Shah.
Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
- N. Srivastava, E. Mansimov, and R. Salakhutdinov.
Unsupervised learning of video representations using lstms. CoRR, abs/1502.04681, 2, 2015.
References IV
- V. Veeriah, N. Zhuang, and G.-J. Qi.
Differential recurrent neural networks for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 4041–4049, 2015.
- H. Wang, A. Kläser, C. Schmid, and C.-L. Liu.
Action recognition by dense trajectories. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3169–3176. IEEE, 2011.
- J. Wang and Y. Wu.
Learning maximum margin temporal warping for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 2688–2695, 2013.
- L. Wang, Y. Qiao, and X. Tang.
Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4305–4314, 2015.
- Y. Wang and G. Mori.
Hidden part models for human action recognition: Probabilistic versus max margin. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1310–1323, 2011.
References V
- D. Weinland, E. Boyer, and R. Ronfard.
Action recognition from arbitrary views using 3d exemplars. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–7. IEEE, 2007.
- T. Xiang and S. Gong.
Beyond tracking: Modelling activity and understanding behaviour. International Journal of Computer Vision, 67(1):21–51, 2006.
- B. Yao and S.-C. Zhu.
Learning deformable action templates from cluttered videos. In 2009 IEEE 12th International Conference on Computer Vision, pages 1507–1514. IEEE, 2009.