Computer Vision by Learning: Motion in Action
Jan van Gemert, UvA
Motion and perceptual organization
Even impoverished motion data can evoke a strong percept.
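Two views of motion: Lagrangian (follow each point as it moves) vs. Eulerian (observe how the signal at a fixed location changes over time)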
Lagrangian
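Estimating pixel motion from image H to image I is a pixel correspondence problem:
– given a pixel in H, look for nearby pixels of the same color in I
Key assumptions
– Color constancy: a point in H looks the same in I
– For grayscale images, this is brightness constancy
– Small motion: points do not move very far
This is called the optical flow problem.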
[Lucas-Kanade, 1981]
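A minimal sketch of the Lucas-Kanade idea for a single window, written in Python with NumPy (the slides contain no code, so the language, function name and window size are assumptions). Linearizing brightness constancy gives one equation Ix·u + Iy·v + It = 0 per pixel; Lucas-Kanade assumes a single flow (u, v) for the whole window and solves the overdetermined system by least squares. Averaging the spatial gradients over both frames is one common variant, chosen here for illustration.

```python
import numpy as np

def lucas_kanade_window(H, I, y, x, half=2):
    """Flow (u, v) of the (2*half+1)^2 window centred at row y, column x, from frame H to frame I."""
    H = np.asarray(H, dtype=float)
    I = np.asarray(I, dtype=float)
    # Spatial derivatives (np.gradient returns d/drow, d/dcol), averaged over both frames.
    hy, hx = np.gradient(H)
    iy, ix = np.gradient(I)
    gx, gy = 0.5 * (hx + ix), 0.5 * (hy + iy)
    gt = I - H                                   # temporal derivative
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    # One equation gx*u + gy*v = -gt per pixel in the window; solve by least squares.
    A = np.stack([gx[win].ravel(), gy[win].ravel()], axis=1)
    b = -gt[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)
```

In practice this is evaluated around many pixels or feature points, often inside a coarse-to-fine image pyramid to relax the small-motion assumption.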
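Color legend for optical flow: hue encodes the flow direction (hue is an angular color space, so a direction maps naturally onto it); saturation or brightness typically encodes the flow magnitude.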
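A small sketch of such a color coding, assuming hue = flow angle measured from the +x axis and brightness = magnitude normalized by the largest magnitude in the field. This convention and the function name are illustrative assumptions; published flow visualizations use their own color wheels.

```python
import colorsys
import numpy as np

def flow_to_rgb(u, v):
    """Map a flow field (u, v) to RGB: direction -> hue, normalized magnitude -> value."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    mag = np.hypot(u, v)
    # Angle in [0, 2*pi) mapped to hue in [0, 1): the angular nature of hue matches direction.
    hue = (np.arctan2(v, u) % (2 * np.pi)) / (2 * np.pi)
    val = mag / mag.max() if mag.max() > 0 else np.zeros_like(mag)
    rgb = np.zeros(u.shape + (3,))
    for idx in np.ndindex(u.shape):
        rgb[idx] = colorsys.hsv_to_rgb(hue[idx], 1.0, val[idx])
    return rgb
```

With this convention, rightward motion is red, leftward motion is cyan, and zero motion is black.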
Q: Name the camera motion:
Zoom out / Zoom in / Pan right to left
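One rough way to tell these apart from a dense flow field, sketched under simple assumptions: a zoom shows up as divergence (flow pointing away from the center for zoom in, towards it for zoom out), a pan as a uniform translation. The function name and tolerance are hypothetical, and the sketch reports the direction in which image content moves; how that maps to the camera's pan direction depends on convention.

```python
import numpy as np

def label_global_motion(u, v, tol=1e-3):
    """Very rough label for a dense flow field (u, v) sampled on a pixel grid."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Mean divergence du/dx + dv/dy: positive = expanding field, negative = contracting.
    divergence = float(np.mean(np.gradient(u, axis=1) + np.gradient(v, axis=0)))
    if divergence > tol:
        return "zoom in"
    if divergence < -tol:
        return "zoom out"
    # No zoom: look at the mean translation instead.
    mean_u, mean_v = float(u.mean()), float(v.mean())
    if max(abs(mean_u), abs(mean_v)) <= tol:
        return "static"
    if abs(mean_u) >= abs(mean_v):
        return "pan: image content moves " + ("right" if mean_u > 0 else "left")
    return "tilt: image content moves " + ("down" if mean_v > 0 else "up")
```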
[Dalal, ECCV06]
[Figure panels: video frame, optical flow (quivers), spatial gradient, optical flow (hue, with color legend), horizontal motion boundaries, vertical motion boundaries]
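Motion boundaries are the spatial gradients of the horizontal (x) and vertical (y) optical flow images. Because they differentiate the flow, locally constant motion, such as a camera translation, is suppressed.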
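A simplified sketch of a motion boundary histogram (MBH): take the spatial gradient of each flow component and bin its orientation, weighted by gradient magnitude. The real descriptor [Dalal, ECCV06] aggregates such histograms over a grid of cells with normalization; this single-cell, unnormalized version and its function names are assumptions for illustration. Since it differentiates the flow, a constant flow field yields an all-zero descriptor.

```python
import numpy as np

def orientation_histogram(gx, gy, n_bins=8):
    """Histogram of gradient orientations over [0, 2*pi), weighted by magnitude."""
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)

def mbh(u, v, n_bins=8):
    """MBHx and MBHy: orientation histograms of the gradients of the x- and y-flow images."""
    uy, ux = np.gradient(np.asarray(u, dtype=float))   # gradient of horizontal flow
    vy, vx = np.gradient(np.asarray(v, dtype=float))   # gradient of vertical flow
    return orientation_histogram(ux, uy, n_bins), orientation_histogram(vx, vy, n_bins)
```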
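[Wang, ICCV13] [Jain, CVPR13]
Camera motion compensation: estimate the camera motion (assuming a homography between frames) and subtract it from the optical flow.
[Figure panels: video frame, optical flow, camera motion subtracted]
A human detector refines this: the background is where the human is not, so camera motion is estimated outside the detected person.
[Figure panels: video frame, camera motion subtracted, human detector + camera motion subtracted]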
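A minimal sketch of homography-based compensation: fit a homography to background point correspondences with the direct linear transform (DLT), compute the flow it induces at every pixel, and subtract it. The function names are hypothetical; the sketch assumes clean correspondences, whereas real pipelines fit robustly (e.g., with RANSAC) on many noisy matches and usually normalize coordinates first.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: 3x3 H with dst ~ H @ [x, y, 1] from >= 4 point pairs (no 3 collinear)."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    Hm = vt[-1].reshape(3, 3)          # null vector = smallest singular value
    return Hm / Hm[2, 2]

def camera_flow(Hm, xs, ys):
    """Flow induced by the homography at pixel coordinates (xs, ys)."""
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = Hm @ pts
    mapped = mapped[:2] / mapped[2]
    return (mapped[0] - xs.ravel()).reshape(xs.shape), (mapped[1] - ys.ravel()).reshape(ys.shape)

def compensate(u, v, Hm):
    """Subtract the camera-induced flow from an observed flow field (u, v)."""
    ys, xs = np.mgrid[0:u.shape[0], 0:u.shape[1]].astype(float)
    cu, cv = camera_flow(Hm, xs, ys)
    return u - cu, v - cv
```

Usage: fit on points taken outside the detected person, then only independently moving regions keep a residual flow.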
Dense trajectories [Wang, IJCV13]
Eulerian
Space-time interest points, Harris3D STIP [Laptev, IJCV03]
Hessian STIP [Willems, ECCV08]
Cuboid detector: a 2D Gaussian smoothing kernel applied spatially and a quadrature pair of 1D Gabor filters applied temporally [Dollar, VS-PETS05]
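A sketch of the cuboid detector response R = (I*g*h_ev)^2 + (I*g*h_od)^2, where g is the spatial 2D Gaussian and h_ev(t) = -cos(2πtω)·exp(-t²/τ²), h_od(t) = -sin(2πtω)·exp(-t²/τ²) are the temporal Gabor pair. The paper ties ω to τ; here ω is left as a free parameter in cycles per frame, and all defaults and helper names are assumptions. Interest points would be local maxima of R (not shown).

```python
import numpy as np

def _filter_axis(x, kernel, axis):
    # 1D convolution along one axis; np.convolve(mode="same") needs len(signal) >= len(kernel).
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), axis, x)

def gaussian_kernel(sigma):
    r = int(np.ceil(3 * sigma))
    t = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-t ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def cuboid_response(video, sigma=2.0, tau=4.0, omega=0.25):
    """video: (T, H, W) array. Returns R = (I*g*h_ev)^2 + (I*g*h_od)^2 for every voxel."""
    video = np.asarray(video, dtype=float)
    g = gaussian_kernel(sigma)
    smooth = _filter_axis(_filter_axis(video, g, 2), g, 1)   # 2D Gaussian on spatial axes only
    r = int(np.ceil(2 * tau))
    t = np.arange(-r, r + 1, dtype=float)
    envelope = np.exp(-t ** 2 / tau ** 2)
    h_ev = -np.cos(2 * np.pi * t * omega) * envelope        # even temporal Gabor
    h_od = -np.sin(2 * np.pi * t * omega) * envelope        # odd temporal Gabor
    # np.convolve flips the kernel; the resulting sign flip of h_od vanishes after squaring.
    even = _filter_axis(smooth, h_ev, 0)
    odd = _filter_axis(smooth, h_od, 0)
    return even ** 2 + odd ** 2
```

A static patch gives almost no response, while a patch whose brightness oscillates near frequency ω responds strongly.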
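HOG3D [Kläser, BMVC08]: 3D gradient orientations are quantized using regular polyhedrons, the 3D counterpart of quantizing 2D orientations with a regular polygon.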
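A simplified sketch of polyhedron-based orientation binning: the 20 face normals of a regular icosahedron serve as bin directions, and each spatio-temporal gradient (gx, gy, gt) is assigned to the face it points at most, weighted by its magnitude. HOG3D itself distributes the magnitude over several faces via projection and thresholding; the hard assignment here and the function names are assumptions.

```python
import itertools
import numpy as np

PHI = (1 + 5 ** 0.5) / 2   # golden ratio

def icosahedron_face_normals():
    """20 unit vectors through the face centres of a regular icosahedron."""
    pts = [np.array(p, dtype=float) for p in itertools.product((-1, 1), repeat=3)]
    for a, b in itertools.product((-1, 1), repeat=2):
        pts.append(np.array([0.0, a / PHI, b * PHI]))
        pts.append(np.array([a / PHI, b * PHI, 0.0]))
        pts.append(np.array([b * PHI, 0.0, a / PHI]))
    P = np.stack(pts)
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def hog3d_histogram(gx, gy, gt):
    """Hard-binned 3D orientation histogram (20 bins), weighted by gradient magnitude."""
    P = icosahedron_face_normals()
    g = np.stack([np.ravel(gx), np.ravel(gy), np.ravel(gt)], axis=1).astype(float)
    mag = np.linalg.norm(g, axis=1)
    bins = np.argmax(g @ P.T, axis=1)   # face direction with the largest projection
    return np.bincount(bins, weights=mag, minlength=len(P))
```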
[Everts, CVPR13]
Video representation:
– Vocabulary: k-means (BoW), GMM (Fisher)
– Encoding: hard assignment (BoW), vector differences (Fisher)
– Spatial layout: spatio-temporal pyramid
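A compact sketch contrasting the two encodings: a k-means vocabulary with hard-assignment BoW histograms, and the first-order (mean-difference) part of a Fisher vector for a diagonal GMM. The GMM parameters are assumed given (e.g., fitted with EM); the full Fisher vector also has variance terms and power/L2 normalization, omitted here. Function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k centroids (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):          # keep the old centroid if a cluster empties
                C[j] = X[labels == j].mean(0)
    return C

def bow_histogram(X, C):
    """Hard assignment: count descriptors per nearest codeword, L1-normalized."""
    labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
    h = np.bincount(labels, minlength=len(C)).astype(float)
    return h / h.sum()

def fisher_vector_means(X, w, mu, sigma):
    """First-order Fisher vector: soft-assigned, whitened differences to each GMM mean."""
    diff = (X[:, None, :] - mu[None]) / sigma[None]                  # (N, K, D)
    log_p = np.log(w)[None] - np.log(sigma).sum(1)[None] - 0.5 * (diff ** 2).sum(-1)
    log_p -= log_p.max(1, keepdims=True)                             # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(1, keepdims=True)                             # posteriors (N, K)
    G = (gamma[:, :, None] * diff).sum(0) / (len(X) * np.sqrt(w)[:, None])
    return G.ravel()                                                 # K * D values
```

BoW keeps only counts per codeword, whereas the Fisher vector keeps how descriptors deviate from each component, which is why it is much higher-dimensional for the same vocabulary size.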
Reference, detector: scores for HOG3D / HOG / HOF / MBH (rows with fewer values report only a subset)
[Wang, BMVC09] Harris3D STIP: 43.7 / 32.8 / 43.3
[Wang, BMVC09] Cuboids: 45.7 / 39.4 / 42.9
[Wang, BMVC09] Hessian STIP: 41.3 / 36.2 / 43
[Wang, BMVC09] Dense: 45.3 / 39.4 / 45.5
[Kläser, BMVC08] Cuboids: 48.6 / 38.2 / 43.8
[Wang, IJCV13] Dense: 43.3 / 48 / 52.1
[Wang, IJCV13] KLTtraj: 41 / 48.4 / 48.6
[Wang, IJCV13] DenseTraj: 41.2 / 50.3 / 55.1
[Wang, IJCV13] Harris3D STIP: 40.4 / 44.9
[Jain, CVPR13] DenseTrajCam: 45.6 / 54.1 / 54.2
[Oneata, ICCV13] DenseTrajFisher: 42.5 / 61.9
[Wang, ICCV13] DenseTrajFisher: 46.9 / 51.4 / 57.4
[Wang, ICCV13] DenseTrajFisherCam: 47.1 / 58.8 / 60.5
Reference, detector: scores for HOG3D / HOG / HOF / MBH (rows with fewer values report only a subset)
[Everts, CVPR13] Cuboids: 68.3
[Everts, CVPR13] CuboidsColor: 72.9
[Wang, IJCV13] Dense: 64.4 / 65.9 / 78.3
[Wang, IJCV13] KLTtraj: 57.4 / 57.9 / 71.1
[Wang, IJCV13] DenseTraj: 68 / 68.2 / 82.2
[Shi, CVPR13] Dense10k: 72.4 / 58.6 / 69.7 / 80.1
[Oneata, ICCV13] DenseTrajFisher: 76.3 / 87.8
[Wang, ICCV13] DenseTrajFisher: 81.8 / 74.3 / 86.5
[Wang, ICCV13] DenseTrajFisherCam: 82.6 / 85.1 / 88.9

Reference, detector: scores for HOG3D / HOG / HOF / MBH (rows with fewer values report only a subset)
[Wang, IJCV13] Dense: 25.2 / 29.4 / 40.9
[Wang, IJCV13] KLTtraj: 22.2 / 23.7 / 33.7
[Wang, IJCV13] DenseTraj: 27.9 / 31.5 / 43.2
[Shi, CVPR13] Dense10k: 34.7 / 21 / 33.5 / 43
[Oneata, ICCV13] DenseTrajFisher: 34.8 / 51.9
[Jain, CVPR13] DenseTrajCam: 29.1 / 38.6 / 40.9
[Wang, ICCV13] DenseTrajFisher: 38.4 / 39.5 / 49.1
[Wang, ICCV13] DenseTrajFisherCam: 40.2 / 48.9 / 52.1
Goal: finding actions in videos: where, when and what is happening (a spatio-temporal tube).
Challenges: exponential search space, occlusion, motion, non-rigid deformations.
Applications: video indexing, security, sport statistics, animal monitoring, elderly safety, marketing research.
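A back-of-the-envelope count makes the exponential search space concrete, assuming a tube picks one of B candidate boxes independently in every frame of a contiguous range [t_s, t_e] within a T-frame video:

```latex
N_{\text{tubes}} \;=\; \sum_{1 \le t_s \le t_e \le T} B^{\,t_e - t_s + 1} \;\ge\; B^{T},
\qquad \text{e.g. } B = 10^{3},\; T = 100 \;\Rightarrow\; B^{T} = 10^{300}.
```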
Localization approaches, image vs. video:
– Sliding window: [Rowley, PAMI98] (image), [Rodriguez, CVPR08] (video)
– Branch and bound: [Lampert, PAMI09] (image), [Yuan, PAMI11] (video)
– Deformable parts: [Felzenszwalb, PAMI10] (image), [Tian, CVPR13] (video)
– Boosting cascade: [Viola-Jones, IJCV04] (image), [Ke, ICCV05] (video)
– …
Selective search [Uijlings, IJCV13]: object hypotheses based on hierarchical grouping of super-pixels
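A minimal sketch of hierarchical grouping: start from super-pixels, repeatedly merge the most similar pair of neighbors, and keep every region created along the way as a hypothesis. Selective search combines several similarity cues and grouping strategies; here a single user-supplied similarity function stands in for them, and all names are hypothetical.

```python
def hierarchical_grouping(features, sizes, adjacency, similarity):
    """Greedy bottom-up grouping of super-pixels.

    features[i], sizes[i]: descriptor (e.g. mean color) and pixel count of super-pixel i.
    adjacency: pairs (i, j) of neighboring super-pixels.
    similarity(region_a, region_b) -> float, where a region is (members, feature, size).
    Returns the member sets of every region created, i.e. the object hypotheses.
    """
    regions = {i: (frozenset([i]), features[i], sizes[i]) for i in range(len(features))}
    hypotheses = [members for members, _, _ in regions.values()]
    edges = {frozenset(pair) for pair in adjacency}
    next_id = len(features)
    while edges:
        # Merge the most similar pair of neighboring regions.
        best = max(edges, key=lambda e: similarity(*(regions[i] for i in sorted(e))))
        a, b = sorted(best)
        ma, fa, sa = regions.pop(a)
        mb, fb, sb = regions.pop(b)
        # Size-weighted mean feature for the merged region.
        regions[next_id] = (ma | mb, (fa * sa + fb * sb) / (sa + sb), sa + sb)
        hypotheses.append(ma | mb)
        # The merged region inherits the neighbors of both parts.
        edges = {frozenset(next_id if r in (a, b) else r for r in e) for e in edges if e != best}
        edges = {e for e in edges if len(e) == 2}
        next_id += 1
    return hypotheses
```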
[Xu, CVPR12] [Jain, CVPR14]
– Super-voxel video segmentation with high boundary recall
– Tubelet hypothesis generation by merging independent cues
– Tubelet classification based on MBH and SVM
Tubelet hypothesis generation by merging independent cues:
– Color: HSV histograms
– Texture: HOG
– Motion: independent motion, ML-estimate of affine camera deviation
– Size: smallest-first
– Fill: joined cuboid − voxel pair
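A sketch of how the color, texture, size and fill cues could be scored for a pair of super-voxels, modeled on the selective-search style definitions (histogram intersection; size and fill relative to the video volume). The exact formulas used for tubelets, the box convention and all names are assumptions; the motion cue is omitted because it needs flow and the affine camera estimate.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SuperVoxel:
    color_hist: np.ndarray    # L1-normalized HSV histogram
    texture_hist: np.ndarray  # L1-normalized HOG-style histogram
    size: int                 # number of voxels
    box: tuple                # bounding cuboid (t0, y0, x0, t1, y1, x1), end-exclusive

def cuboid_volume(b):
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

def join_boxes(a, b):
    return tuple(min(a[i], b[i]) for i in range(3)) + tuple(max(a[i], b[i]) for i in range(3, 6))

def s_hist(h1, h2):
    """Histogram intersection, used for both color and texture."""
    return float(np.minimum(h1, h2).sum())

def s_size(r1, r2, video_volume):
    """High for small pairs, so the smallest super-voxels are merged first."""
    return 1.0 - (r1.size + r2.size) / video_volume

def s_fill(r1, r2, video_volume):
    """High when the pair fills its joined cuboid (joined cuboid minus voxel pair is small)."""
    gap = cuboid_volume(join_boxes(r1.box, r2.box)) - r1.size - r2.size
    return 1.0 - gap / video_volume

def combined_similarity(r1, r2, video_volume):
    return (s_hist(r1.color_hist, r2.color_hist) + s_hist(r1.texture_hist, r2.texture_hist)
            + s_size(r1, r2, video_volume) + s_fill(r1, r2, video_volume))
```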
Datasets:
– UCF-Sports: 150 videos, 10 actions
– MSR-II: 54 videos, 3 actions (boxing, hand-clapping, hand-waving), unconstrained temporal location
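Overlap: the Average Best Overlap (ABO) of a class is the best overlap between each ground-truth tube and any tubelet, averaged over the ground truths of that class; MABO is the mean ABO over all classes.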
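A sketch of ABO and MABO, assuming one common tube-overlap definition: per-frame box IoU averaged over the union of frames covered by either tube, with frames covered by only one tube counting as zero. Tubes are dicts from frame index to (x0, y0, x1, y1); the data layout and names are hypothetical.

```python
from collections import defaultdict

def box_iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def tube_overlap(t1, t2):
    """Mean per-frame IoU over the union of frames; frames in only one tube count as 0."""
    frames = set(t1) | set(t2)
    return sum(box_iou(t1[f], t2[f]) if f in t1 and f in t2 else 0.0 for f in frames) / len(frames)

def mabo(ground_truths, proposals):
    """ground_truths: list of (video_id, class_name, tube); proposals: {video_id: [tube, ...]}.

    Returns (MABO, {class_name: ABO}).
    """
    best_per_class = defaultdict(list)
    for vid, cls, gt in ground_truths:
        best = max((tube_overlap(gt, p) for p in proposals.get(vid, [])), default=0.0)
        best_per_class[cls].append(best)
    abo = {c: sum(v) / len(v) for c, v in best_per_class.items()}
    return sum(abo.values()) / len(abo), abo
```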
[Plots: precision and recall on UCF-Sports and MSR-II]