Computer Vision by Learning: Motion in Action (Jan van Gemert, UvA)

SLIDE 1

Computer Vision by Learning: Motion in Action

Jan van Gemert, UvA

SLIDE 2

Motion and perceptual organization

  • Even “impoverished” motion data can evoke a strong percept

SLIDE 3

Motion and perceptual organization

  • Even “impoverished” motion data can evoke a strong percept

SLIDE 4

Uses of motion

  • Estimating 3D structure
  • Segmenting objects based on motion cues
  • Learning dynamical models
  • Improving video quality (motion stabilization)
  • Recognizing actions, activities, events
SLIDE 5

Action Recognition Pipeline

  • Spatio-temporal interest point detection
  • Space-time patch/trajectory extraction
  • Space-time descriptor computation

Similar setup as in static image classification, followed by Bag-of-Words/Fisher vector encoding and an SVM.

SLIDE 6

Measuring Motion

Two perspectives, borrowed from fluid dynamics: Lagrangian and Eulerian

SLIDE 7

1. Lagrangian Perspective: Optical Flow

  • Track each pixel as it moves through the video

SLIDE 8

Problem definition: optical flow

How to estimate pixel motion from image H to image I?

  • Solve pixel correspondence problem

– given a pixel in H, look for nearby pixels of the same color in I

Key assumptions

  • color constancy: a point in H looks the same in I

– For grayscale images, this is brightness constancy

  • small motion: points do not move very far

This is called the optical flow problem

[Lucas-Kanade, 1981]
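The two assumptions above (brightness constancy, small motion) lead directly to the Lucas-Kanade least-squares estimate. A minimal numpy sketch for a single patch; the function name and synthetic test pattern are illustrative, not the original 1981 code:

```python
import numpy as np

def lucas_kanade_patch(H, I):
    """Estimate one (vx, vy) for a small patch, assuming brightness
    constancy and small motion (hypothetical helper, not the 1981 code)."""
    Ix = np.gradient(H, axis=1)   # spatial derivatives of frame H
    Iy = np.gradient(H, axis=0)
    It = I - H                    # temporal derivative
    # normal equations of the least-squares problem Ix*vx + Iy*vy + It = 0
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)

# synthetic test: shift a smooth pattern by one pixel in x
x, y = np.meshgrid(np.arange(32, dtype=float), np.arange(32, dtype=float))
H = np.sin(0.3 * x) + np.cos(0.2 * y)
I = np.sin(0.3 * (x - 1)) + np.cos(0.2 * y)   # pattern moved 1 px right
v = lucas_kanade_patch(H, I)                   # v[0] close to 1, v[1] close to 0
```

Note how the small-motion assumption enters: the one-pixel shift must be well approximated by the first-order Taylor expansion of H.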

SLIDE 9

Visualizing optical flow

Color Legend

(Hue is an angular color space)

Q: What do you notice?

  • Camera Motion
  • Parallax
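The hue legend above can be sketched in a few lines: flow direction maps to hue (an angular quantity), flow magnitude to saturation. This is a generic mapping in the spirit of the legend, not the exact colormap of any particular paper:

```python
import colorsys
import numpy as np

def flow_to_color(vx, vy):
    """Map one flow vector to RGB: direction -> hue, magnitude -> saturation."""
    angle = np.arctan2(vy, vx)               # flow direction in radians
    hue = (angle + np.pi) / (2 * np.pi)      # wrap direction to [0, 1]
    mag = min(np.hypot(vx, vy), 1.0)         # clip magnitude to [0, 1]
    return colorsys.hsv_to_rgb(hue, mag, 1.0)

# zero flow is white; opposite directions land on opposite hues
r_right = flow_to_color(1.0, 0.0)
r_left = flow_to_color(-1.0, 0.0)
```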
SLIDE 10

Optical flow and parallax

(figure: a 3D point P(t) with velocity V, and its image projection p(t) with apparent velocity v)

  • P(t) is a moving 3D point
  • Velocity of the scene point: V = dP/dt
  • p(t) = (x(t), y(t)) is the projection of P in the image
  • Apparent velocity v in the image: components vx = dx/dt and vy = dy/dt
  • Length of v is inversely proportional to the depth Z of the 3D point
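The inverse dependence on depth Z is easy to check numerically with a pinhole model: image position is f·X/Z, so a camera translation t_x shifts a point's projection by f·t_x/Z. All numbers below are toy values:

```python
def image_shift(X, Z, t_x, f=1.0):
    """Image displacement of a static 3D point at depth Z when the camera
    translates by t_x (pinhole model, focal length f; toy numbers)."""
    before = f * X / Z
    after = f * (X - t_x) / Z   # camera moving right = point moving left in camera frame
    return after - before        # equals -f * t_x / Z

near = image_shift(X=2.0, Z=5.0, t_x=0.5)    # nearby point
far = image_shift(X=2.0, Z=50.0, t_x=0.5)    # same camera motion, 10x the depth
# |near| is exactly 10x |far|: parallax is inversely proportional to depth
```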
SLIDE 11

Optical flow and camera motion

Q: Name the camera motion:

(flow fields shown: zoom out, zoom in, pan right to left)

SLIDE 12

Motion boundaries

[Dalal, eccv06]

(panels: video frame; optical flow (quivers); spatial gradient; optical flow (hue); horizontal and vertical motion boundaries)

Color legend

  • Motion boundaries are invariant to constant camera motion

Q: What do you notice?

SLIDE 13

Animated Example

(panels: video frame; optical flow (quivers); spatial gradient; optical flow (hue); horizontal and vertical motion boundaries)

Color legend

Q: What do you notice?

Motion boundaries are the spatial gradients of the x and y flow images

  • Similar properties as the spatial gradient
  • No motion: motion boundaries disappear
  • Parallax
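Since motion boundaries are the spatial gradients of the x and y flow images, they take two lines of numpy. The toy flow below also checks the invariance property from the previous slide: adding constant camera motion leaves the boundaries unchanged:

```python
import numpy as np

# Toy flow field: left half of the image static, right half moving with vx = 2
vx = np.zeros((20, 20))
vx[:, 10:] = 2.0

# motion boundaries = spatial gradients of the flow channel
gx_y, gx_x = np.gradient(vx)               # d(vx)/dy, d(vx)/dx
boundary_strength = np.hypot(gx_x, gx_y)   # peaks where the motion changes

# adding the same constant camera motion everywhere changes nothing
gx_y2, gx_x2 = np.gradient(vx + 5.0)
boundary_strength_cam = np.hypot(gx_x2, gx_y2)
```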
SLIDE 14

Modeling camera motion

[wang, iccv13] [jain, cvpr13]

(panels: video frame; optical flow; camera motion subtracted)

Remove global motion:
  • 1. Globally align the frames (assumes a homography)
  • 2. Compute optical flow on the aligned frames

(panels: video frame; camera motion subtracted; with human detector; camera motion subtracted)

Subtract background motion:
  • Assume the background is where the human is not
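A cheap variant of the global-motion removal above can be sketched by fitting a global affine motion model to the flow by least squares and subtracting it; what remains is independent motion. This is a simplification (the slide assumes a homography), and all values are synthetic:

```python
import numpy as np

h, w = 40, 40
ys, xs = np.mgrid[0:h, 0:w].astype(float)

# synthetic flow: camera pans (constant flow) plus one independently moving patch
vx = np.full((h, w), 3.0)
vy = np.full((h, w), -1.0)
vx[10:15, 10:15] += 4.0   # object moving relative to the background

# design matrix for a per-component affine model v = a0 + a1*x + a2*y
A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
px, _, _, _ = np.linalg.lstsq(A, vx.ravel(), rcond=None)
py, _, _, _ = np.linalg.lstsq(A, vy.ravel(), rcond=None)

# residual flow = original flow minus fitted global (camera) motion
residual_x = vx - (A @ px).reshape(h, w)
residual_y = vy - (A @ py).reshape(h, w)
# the residual is large only on the independently moving patch
```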

SLIDE 15

Flow trajectory descriptors

[Wang, ijcv13]

SLIDE 16

2. Eulerian Perspective: Stationary

  • Treat each pixel as a time series through the video

SLIDE 17

Spatio-Temporal Interest Points (STIP)

[Laptev, ijcv05]

Spatio-Temporal Harris Corners

SLIDE 18
  • Spatio-temporal extension of the Hessian blob detector
  • Strength S of the interest point computed with the determinant of the Hessian matrix H
  • Approximations with integral videos

Spatio-Temporal Blobs (Hes-STIP)

[Willems, eccv08]

SLIDE 19

Periodic Interest Points (Cuboids)

Response combines a 2D Gaussian smoothing kernel g in space with a quadrature pair of 1D Gabor filters h_ev, h_od applied temporally:

R = (I * g * h_ev)^2 + (I * g * h_od)^2

[Dollár, vspets05]

Beyond spatio-temporal corners
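The temporal part of this periodic detector can be sketched for a single pixel's time series: filter with an even/odd Gabor pair and sum the squared responses. The spatial Gaussian smoothing is omitted here, and the tau/omega values are illustrative, not those of the paper:

```python
import numpy as np

def gabor_pair(tau=1.5, omega=0.25, half=8):
    """Quadrature pair of 1D Gabor filters (illustrative parameters)."""
    t = np.arange(-half, half + 1, dtype=float)
    env = np.exp(-t**2 / tau**2)
    return np.cos(2 * np.pi * omega * t) * env, np.sin(2 * np.pi * omega * t) * env

def periodic_response(signal, tau=1.5, omega=0.25):
    """Detector response for one pixel's time series (spatial smoothing omitted)."""
    h_ev, h_od = gabor_pair(tau, omega)
    r_ev = np.convolve(signal, h_ev, mode="same")
    r_od = np.convolve(signal, h_od, mode="same")
    return r_ev**2 + r_od**2   # quadrature energy: high for periodic motion

t = np.arange(200, dtype=float)
periodic = np.sin(2 * np.pi * 0.25 * t)   # flickering/periodic intensity
constant = np.ones(200)                    # static pixel
# the periodic signal gets a much stronger response than the static one
```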

SLIDE 20

Dense Sampling

  • Motivation: dense sampling outperforms interest points for object recognition
  • Extract 3D cubes at regular positions (x, y, t) with varying scales
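Dense sampling itself is just a regular grid over the video volume. A minimal sketch (a single scale; patch size and stride are illustrative):

```python
import numpy as np

def dense_cubes(video, size=8, stride=4):
    """Extract 3D patches on a regular (t, y, x) grid from a video volume."""
    T, H, W = video.shape
    cubes = []
    for t in range(0, T - size + 1, stride):
        for y in range(0, H - size + 1, stride):
            for x in range(0, W - size + 1, stride):
                cubes.append(video[t:t + size, y:y + size, x:x + size])
    return np.stack(cubes)

video = np.random.rand(16, 24, 24)   # (time, height, width)
cubes = dense_cubes(video)           # 3 * 5 * 5 = 75 cubes of shape (8, 8, 8)
```

In practice the loop is repeated over several spatial and temporal scales.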

SLIDE 21

Spatio-Temporal Gradient Descriptor (HOG3D)

[Kläser, bmvc08]

  • 2D HOG/SIFT: gradient orientations quantized over a polygon
  • HOG3D: quantization of the 3D gradient over a polyhedron (a 3D extension of HOG/SIFT)
  • Extensions to color: concatenation, or integration via tensors [Everts, cvpr13]

SLIDE 22

3. Action Recognition

  • Automatically recognizing actions, activities, events
  • Learn from training data
  • Apply on unseen test data
SLIDE 23

Video Coding

  • Create a feature vocabulary
    – k-means (BoW)
    – GMM (Fisher)
  • Assign features to the vocabulary
    – Hard assignment (BoW)
    – Vector differences (Fisher)
  • Aggregate over the whole video
    – Spatio-temporal pyramid
  • Classifier (SVM)
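The bag-of-words branch of this coding step can be sketched in a few lines: hard-assign each local descriptor to its nearest vocabulary word and build a normalized histogram. The vocabulary below is a toy stand-in for real k-means centers:

```python
import numpy as np

def bow_encode(descriptors, vocabulary):
    """Hard-assignment bag-of-words: L1-normalized word histogram."""
    # pairwise distances, shape (n_descriptors, n_words)
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)                                  # hard assignment
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                                  # normalize per video

vocab = np.array([[0.0, 0.0], [10.0, 10.0]])                  # toy 2-word vocabulary
desc = np.array([[0.1, 0.2], [9.8, 10.1], [0.0, 0.3], [10.2, 9.9]])
h = bow_encode(desc, vocab)   # two descriptors per word: [0.5, 0.5]
```

The Fisher vector replaces the hard assignment with soft GMM posteriors and aggregates first- and second-order differences to the centers instead of counts.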

SLIDE 24

Detectors and Descriptors

Pipeline: interest point detection, space-time patch/trajectory, space-time descriptor.

Detectors: Dense, Harris3D STIP, Hessian STIP, Cuboids, KLTtraj, DenseTraj
Descriptors: HOG3D, HOG, HOF, MBH
Camera modeled: yes/no

SLIDE 25

Action Recognition Datasets

  • HMDB51, 51 classes, 6,766 vids: body motion, facial expressions, human interactions
  • UCF50, 50 classes, 6,618 vids: sports, daily exercises
  • Hollywood2, 12 classes, 1,707 vids: movie actions

SLIDE 26

Results Hollywood2

(mAP in %)

ref              Detector        HOG3D  HOG   HOF
[wang,bmvc09]    Harris3D STIP   43.7   32.8  43.3
[wang,bmvc09]    Cuboids         45.7   39.4  42.9
[wang,bmvc09]    Hessian STIP    41.3   36.2  43
[wang,bmvc09]    Dense           45.3   39.4  45.5
[klaser,bmvc08]  Cuboids         48.6   38.2  43.8

Take-away: motion (HOF) is important.

SLIDE 27

Results Hollywood2

(mAP in %)

ref              Detector        HOG3D  HOG   HOF   MBH
[wang,bmvc09]    Harris3D STIP   43.7   32.8  43.3  -
[wang,bmvc09]    Cuboids         45.7   39.4  42.9  -
[wang,bmvc09]    Hessian STIP    41.3   36.2  43    -
[wang,bmvc09]    Dense           45.3   39.4  45.5  -
[klaser,bmvc08]  Cuboids         48.6   38.2  43.8  -
[wang,ijcv13]    Dense           -      43.3  48    52.1
[wang,ijcv13]    KLTtraj         -      41    48.4  48.6
[wang,ijcv13]    DenseTraj       -      41.2  50.3  55.1
[wang,ijcv13]    Harris3D STIP   -      40.4  44.9  -
[jain,cvpr13]    DenseTrajCam    -      45.6  54.1  54.2

Take-aways: motion is important; camera motion invariance helps; dense trajectories.

SLIDE 28

Results Hollywood2

(mAP in %)

ref              Detector            HOG3D  HOG   HOF   MBH
[wang,bmvc09]    Harris3D STIP       43.7   32.8  43.3  -
[wang,bmvc09]    Cuboids             45.7   39.4  42.9  -
[wang,bmvc09]    Hessian STIP        41.3   36.2  43    -
[wang,bmvc09]    Dense               45.3   39.4  45.5  -
[klaser,bmvc08]  Cuboids             48.6   38.2  43.8  -
[wang,ijcv13]    Dense               -      43.3  48    52.1
[wang,ijcv13]    KLTtraj             -      41    48.4  48.6
[wang,ijcv13]    DenseTraj           -      41.2  50.3  55.1
[wang,ijcv13]    Harris3D STIP       -      40.4  44.9  -
[jain,cvpr13]    DenseTrajCam        -      45.6  54.1  54.2
[oneata,iccv13]  DenseTrajFisher     -      42.5  -     61.9
[wang,iccv13]    DenseTrajFisher     -      46.9  51.4  57.4
[wang,iccv13]    DenseTrajFisherCam  -      47.1  58.8  60.5

Take-aways: motion is important; camera motion invariance; Fisher vectors; dense trajectories.

SLIDE 29

Results UCF50 and HMDB51

UCF50 (accuracy in %):

ref              Detector            HOG3D  HOG   HOF   MBH
[everts,cvpr13]  Cuboids             68.3   -     -     -
[everts,cvpr13]  CuboidsColor        72.9   -     -     -
[wang,ijcv13]    Dense               -      64.4  65.9  78.3
[wang,ijcv13]    KLTtraj             -      57.4  57.9  71.1
[wang,ijcv13]    DenseTraj           -      68    68.2  82.2
[shi,cvpr13]     Dense10k            72.4   58.6  69.7  80.1
[oneata,iccv13]  DenseTrajFisher     -      76.3  -     87.8
[wang,iccv13]    DenseTrajFisher     -      81.8  74.3  86.5
[wang,iccv13]    DenseTrajFisherCam  -      82.6  85.1  88.9

HMDB51 (accuracy in %):

ref              Detector            HOG3D  HOG   HOF   MBH
[wang,ijcv13]    Dense               -      25.2  29.4  40.9
[wang,ijcv13]    KLTtraj             -      22.2  23.7  33.7
[wang,ijcv13]    DenseTraj           -      27.9  31.5  43.2
[shi,cvpr13]     Dense10k            34.7   21    33.5  43
[oneata,iccv13]  DenseTrajFisher     -      34.8  -     51.9
[jain,cvpr13]    DenseTrajCam        -      29.1  38.6  40.9
[wang,iccv13]    DenseTrajFisher     -      38.4  39.5  49.1
[wang,iccv13]    DenseTrajFisherCam  -      40.2  48.9  52.1

Take-aways for both datasets: dense trajectories, camera motion invariance, and Fisher vectors.

SLIDE 30

Reflection

  • Detector: dense (trajectories)
  • Descriptor: camera motion invariance
    – MBH descriptor
  • Fisher vector
  • Ignored:
    – Combinatorics of HOG+HOF+MBH (muddies the analysis)
    – Human pose modeling literature
    – Deep learning, which still performs below the state of the art
  • Gaps:
    – Fisher on HOG3D?
    – Camera motion invariance for Eulerian methods?
    – Parallax?

SLIDE 31

4. Action Localisation

Goal: finding actions in videos; where, when, and what is happening (a tube)
Challenges: exponential search space, occlusion, motion, non-rigid deformations
Applications: video indexing, security, sport statistics, animal monitoring, elderly safety, marketing research

SLIDE 32

Inspired by Object Localization in Static Images

  • Sliding Window: image [Rowley, pami98], video [Rodriguez, cvpr08]
  • Branch and Bound: image [Lampert, pami09], video [Yuan, pami11]
  • Deformable Parts: image [Felzenswalb, pami10], video [Tian, cvpr13]
  • Boosting Cascade: image [violaJones, ijcv04], video [ke, iccv05]

SLIDE 33

Selective Search for Static Image Object Localization

[Uijlings, ijcv13]

  • Object hypotheses based on hierarchical grouping of super-pixels
  • High recall with a modest number of object hypotheses
  • Can afford to train an expensive classifier per hypothesis

Q: How would you extend it to video?

SLIDE 34

Selective Search for Action Localisation in Video

[Xu, cvpr12]

  • 1. Super-voxel video segmentation with high boundary recall
  • 2. Tubelet hypothesis generation by merging independent cues
  • 3. Tubelet classification based on MBH and an SVM

[Jain, cvpr14]

SLIDE 35

Example of Super-Voxel Segmentation

SLIDE 36

Example of Super-Voxel Segmentation

SLIDE 37

Merging Super-Voxels

Tubelet hypothesis generation by merging independent cues:

– Color: HSV histograms – Texture: HOG – Motion: Independent motion, ML-estimate of affine camera deviation – Size: Smallest-first – Fill: Joined_cuboid – voxel_pair

37

SLIDE 38

Experiments: Datasets

  • UCF-Sports: 150 vids, 10 actions
  • MSR-II: 54 vids, 3 actions (boxing, hand-clapping, hand-waving), unconstrained temporal location

SLIDE 39

Experiment 1: Tubelet Recall

Overlap: Avg Best Overlap (ABO): MABO = mean ABO over all classes

39
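The overlap underlying ABO/MABO can be sketched as the average per-frame intersection-over-union between two tubes. This is a common definition, the exact formula in the slides may differ, and `box_iou`/`tube_overlap` are hypothetical helper names:

```python
import numpy as np

def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def tube_overlap(tube_a, tube_b):
    """Tube overlap = mean per-frame box IoU over corresponding frames."""
    return float(np.mean([box_iou(a, b) for a, b in zip(tube_a, tube_b)]))

t1 = [(0, 0, 10, 10), (1, 1, 11, 11)]       # 2-frame tube
t2 = [(0, 0, 10, 10), (1, 1, 11, 11)]       # identical tube
t3 = [(20, 20, 30, 30), (20, 20, 30, 30)]   # spatially disjoint tube
o_same = tube_overlap(t1, t2)   # 1.0
o_diff = tube_overlap(t1, t3)   # 0.0
```

ABO then takes, per ground-truth instance, the maximum of this score over all hypotheses, and MABO averages the per-class ABO values.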

SLIDE 40

Experiment 2: Action Localisation

(figure: precision/recall curves for UCF-Sports and MSR-II)

SLIDE 41

Example Detections

SLIDE 42

Summary

  • Lagrangian perspective: optical flow
    – Tracking points, dense trajectories
    – Camera motion (MBH)
  • Eulerian perspective: temporal derivatives
    – Dense sampling, STIPs, Cuboids, HOG3D
  • Action recognition
    – Bag of words / Fisher vectors
  • Action localisation
    – Temporal extensions of 2D localisation
    – Temporal selective search

SLIDE 43

Referred Literature

  • [dalal, eccv06]: Human detection using oriented histograms of flow and appearance
  • [dollár, vspets05]: Behavior recognition via sparse spatio-temporal features
  • [everts, cvpr13]: Evaluation of Color STIPs for Human Action Recognition
  • [Felzenswalb, pami10]: Object detection with discriminatively trained part-based models
  • [jain,cvpr13]: Better exploiting motion for better action recognition
  • [jain, CVPR14]: Action Localization by Tubelets from Motion
  • [kläser, bmvc08]: A spatio-temporal descriptor based on 3d-gradients
  • [ke, iccv05]: Efficient Visual Event Detection using Volumetric Features
  • [lampert, pami09]: Efficient subwindow search: a branch and bound framework for object localization
  • [laptev, ijcv05]: On space-time interest points
  • [lucas-kanade, 1981]: An iterative image registration technique with an application to stereo vision
  • [oneata,iccv13]: Action and Event Recognition with Fisher Vectors on a Compact Feature Set
  • [rodriguez, cvpr08]: Action mach: a spatio-temporal maximum average correlation height filter for action recognition
  • [rowley, pami98]: Neural network-based face detection
  • [shi,cvpr13]: Sampling strategies for real-time action recognition
  • [Tian, cvpr13]: Spatiotemporal deformable part models for action detection
  • [Uijlings, ijcv13]: Selective search for object recognition
  • [violaJones, ijcv04]: Robust real-time face detection
  • [wang,bmvc09]: Evaluation of local spatio-temporal features for action recognition
  • [wang,iccv13]: Action recognition with improved trajectories
  • [wang, ijcv13]: Dense trajectories and motion boundary descriptors for action recognition
  • [willems,eccv08]: An efficient dense and scale-invariant spatio-temporal interest point detector
  • [xu, cvpr12]: Evaluation of Super-Voxel Methods for Early Video Processing
  • [yuan, pami11]: Discriminative video pattern search for efficient action detection


SLIDE 44

Questions?