Action Recognition with Improved Trajectories



SLIDE 1

Presentation by Santiago Gonzalez

Action Recognition with Improved Trajectories

Heng Wang and Cordelia Schmid LEAR, INRIA, France

IEEE ICCV 2013

SLIDE 2

The Problem

  • How can we recognize actions in video?
  • Applications include gesture recognition, threat detection, media indexing and querying, etc.


SLIDE 3

Past Approaches

  • Image segmentation to separate the background and estimate camera motion

  • Stabilization using coarse optical flow
  • Saliency mapping
  • Dense trajectory clustering
SLIDE 4

Agenda

  • The Problem and Past Approaches
  • Improved Trajectories
  • Experimental Setup
  • Results
  • Concluding Remarks and Discussion
SLIDE 5

Action Recognition with Improved Trajectories

  • Explicit camera motion estimation
  • Corrects optical flow, prunes background
  • Leads to better motion descriptor performance

SLIDE 6

Improved Trajectories

SLIDE 7

Pipeline Overview

  • For consecutive frames:
  • Extract SURF descriptors with nearest-neighbor matching
  • Estimate optical flow; sample points by thresholding the smallest eigenvalue λ of the autocorrelation matrix (optimal sampling for tracking) [35]

  • Estimate homography using RANSAC
  • Remove camera-induced displacement via thresholding
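
The homography and thresholding steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes point matches (from SURF or flow) are already available as arrays, and the names `fit_homography`, `ransac_homography`, `moving_mask`, and the parameters `inlier_tol` and `min_motion` are hypothetical. It also thresholds the residual of each match directly, which is a simplification of pruning trajectories by their warped-flow displacement.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography H (dst ~ H @ src) via the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    if abs(H[2, 2]) < 1e-12:  # degenerate sample; the caller skips it
        return None
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    warped = pts_h @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return warped[:, :2] / warped[:, 2:3]

def ransac_homography(src, dst, n_iters=200, inlier_tol=1.0, seed=0):
    """Fit H to the largest set of matches that agree within inlier_tol pixels."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)  # minimal sample
        H = fit_homography(src[idx], dst[idx])
        if H is None:
            continue
        err = np.linalg.norm(apply_homography(H, src) - dst, axis=1)
        inliers = err < inlier_tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_homography(src[best], dst[best]), best

def moving_mask(H, src, dst, min_motion=1.0):
    """True where motion remains after subtracting the camera motion predicted by H."""
    residual = dst - apply_homography(H, src)
    return np.linalg.norm(residual, axis=1) > min_motion
```

Matches flagged `False` by `moving_mask` move consistently with the camera and would be treated as background; the rest carry motion of their own.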
SLIDE 8

Features

  • SURF works great for detecting blob-like structures
  • (Speeded [sic] Up Robust Features)
  • Much faster than SIFT
  • Patented
  • Optical flow with good-features-to-track [35] is great for detecting large gradients (i.e., corners and edges)
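
As a rough illustration of the good-features-to-track criterion, the sketch below computes the smaller eigenvalue of the 2x2 autocorrelation (structure tensor) matrix at every pixel; points are kept where this value exceeds a threshold. The function name `min_eigenvalue_map` and the 3x3 summation window are assumptions made for the example.

```python
import numpy as np

def min_eigenvalue_map(image, window=3):
    """Smaller eigenvalue of the 2x2 autocorrelation matrix at each pixel."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, columns
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy

    def box_sum(a):
        # Sum over a (window x window) neighbourhood, zero padded at the border.
        pad = window // 2
        p = np.pad(a, pad)
        out = np.zeros_like(a)
        for dy in range(window):
            for dx in range(window):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    # Closed form for the smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    return 0.5 * (sxx + syy) - np.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
```

The value is near zero in flat regions and along straight edges, and large only where the gradient varies in two directions, which is why thresholding it favours corner-like points that can be tracked reliably.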

SLIDE 9

Polynomial Expansion Optical Flow Estimation [8]

  • Gunnar Farnebäck, 2003
  • Estimate displacement d by modeling each pixel neighborhood as a quadratic polynomial

  • Assume slowly varying displacement field
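
The core algebra can be sketched as follows, assuming both frames share the same quadratic term A: if f1(x) = xᵀAx + b1ᵀx + c1 and f2(x) = f1(x − d), then matching the linear terms gives b2 = b1 − 2Ad. The function names are hypothetical, and the second function uses uniform weights as a stand-in for the neighborhood weighting that the slowly-varying-field assumption permits.

```python
import numpy as np

def displacement_from_expansion(A, b1, b2):
    """Solve b2 = b1 - 2 A d for the displacement d at a single pixel."""
    return -0.5 * np.linalg.solve(A, b2 - b1)

def displacement_over_neighborhood(As, delta_bs):
    """Least-squares d minimising sum_i ||A_i d - delta_b_i||^2.

    delta_b_i stands for -0.5 * (b2_i - b1_i) at neighbour i; pooling the
    constraints assumes d is (nearly) constant over the neighbourhood.
    """
    lhs = sum(A.T @ A for A in As)
    rhs = sum(A.T @ db for A, db in zip(As, delta_bs))
    return np.linalg.solve(lhs, rhs)
```

Pooling over a neighborhood makes the estimate more robust than the pointwise solve, at the cost of blurring motion boundaries.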
SLIDE 10

Human Detection

  • We know a priori that humans are not part of the background
  • Part-based human detection with tracking, works with occlusion
  • Mask away matches from humans when estimating homography
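
The masking step might look like the sketch below, assuming the detector returns axis-aligned boxes as (x0, y0, x1, y1) tuples; the function name `background_mask` and the box format are hypothetical. Only the matches it keeps would be passed to the homography estimation.

```python
import numpy as np

def background_mask(points, boxes):
    """True for points that fall outside every human bounding box."""
    points = np.asarray(points, dtype=float)
    keep = np.ones(len(points), dtype=bool)
    for x0, y0, x1, y1 in boxes:
        inside = ((points[:, 0] >= x0) & (points[:, 0] <= x1) &
                  (points[:, 1] >= y0) & (points[:, 1] <= y1))
        keep &= ~inside  # drop matches that lie on a detected person
    return keep
```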

[Figure panels: SURF; Flow; SURF + detection; Flow + detection]

SLIDE 11

Experimental Setup

SLIDE 12

Dense Trajectory Features*

  • Points densely sampled at different spatial scales
  • Points are tracked in heterogeneous areas (for 15 frames, to avoid drift)
  • HOG, HOF, MBH, and trajectory (i.e., concatenation of displacement vectors) descriptors are calculated
  • Descriptors are calculated in a space-time volume aligned with the trajectory

* Nothing new, mostly replicating setup in [40]
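
A minimal sketch of the trajectory descriptor: frame-to-frame displacement vectors are concatenated into one vector. Dividing by the summed displacement magnitudes is an assumption here (it follows the normalization commonly described for dense trajectories), and the function name is hypothetical.

```python
import numpy as np

def trajectory_descriptor(points):
    """Concatenate displacement vectors, normalised by their total magnitude."""
    points = np.asarray(points, dtype=float)   # (L + 1, 2) tracked positions
    disp = np.diff(points, axis=0)             # (L, 2) displacement vectors
    total = np.linalg.norm(disp, axis=1).sum()
    if total == 0:                             # static track: nothing to normalise
        return np.zeros(disp.size)
    return (disp / total).ravel()
```

A track spanning 15 frame-to-frame steps (16 positions) gives a 30-dimensional vector.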

SLIDE 13

Feature Encoding

  • Bag of features and Fisher vector (includes 2nd-order data)
  • 4,000-element codebook built using k-means from 100,000 random features

  • Classification:
  • RBF-kernel SVM for bag of features
  • Linear SVM for Fisher vector
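
For the bag-of-features side, the encoding step can be sketched as below: each descriptor is assigned to its nearest codeword and the assignments are counted into a normalized histogram. The codebook is assumed to have been learned already (e.g., by k-means); the function name and the L1 normalization are assumptions for illustration.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    # Squared Euclidean distance from every descriptor to every codeword: (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)  # guard against an empty video
```

With a 4,000-word codebook the full distance matrix can get large, so a practical version would likely process descriptors in chunks.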
SLIDE 14

Datasets

Each dataset has hundreds to thousands of video sequences.

  • Hollywood2: 69 movies, 12 actions
  • HMDB51: >6k videos, 51 actions
  • UCF50: >6k YouTube videos, 50 actions
  • Olympic Sports: 783 sequences, 16 actions

SLIDE 15

Results

SLIDE 16

https://lear.inrialpes.fr/people/wang/improved_trajectories

Video Demo

SLIDE 17

Recognition Accuracy

[Figure legend: warping with homography and background pruning; warping with homography; background pruning; use all features]

SLIDE 18

Recognition Accuracy

SLIDE 19

Combined Descriptor Recognition Accuracy

[Figure columns: Dense Trajectory Features; Improved Trajectory Features]

SLIDE 20

Human Detection: Effect on Accuracy

* with Fisher Vector encoding

SLIDE 21

State of the Art Results

Dataset          State of the Art Accuracy   Improvement Over State of the Art
Hollywood2       62.5%                       2%
HMDB51           52.1%                       5%
Olympic Sports   83.2%                       8%
UCF50            83.3%                       8%

SLIDE 22

Technique Deficiencies

  • Failure cases:
  • Homography is fit to foreground if it dominates the frame
  • Strong motion blur (issue in real-world datasets)
SLIDE 23

Technique Deficiencies

  • Failure cases:
  • Complex mapping from estimated homography to background
SLIDE 24

Discussion + Q&A

SLIDE 25

Discussion Points

  • How can some of this technique’s deficiencies be overcome?
  • What other types of a priori knowledge can be incorporated?
  • The four datasets are all human-centric; how well would this pipeline work for nonhuman agents (e.g., cars)?
  • Bag of features and Fisher vectors seem somewhat naïve; would a different encoding work better?