Beyond Object Recognition in 2D Georgia Gkioxari Object Recognition - - PowerPoint PPT Presentation

Feb 06, 2023

SLIDE 1

Beyond Object Recognition in 2D

Georgia Gkioxari

SLIDE 2

Object Recognition in 2D

SLIDE 3

Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

The World is 3D

SLIDE 4

Johansson, Biological Motion Perception

Motion is Important for Recognition

SLIDE 5

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 6

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 7

2D: Mask R-CNN

He et al., Mask R-CNN, ICCV 2017

SLIDE 8

2D: Mask R-CNN

Object Localization
Instance Segmentation
Pose Estimation

from a Single Image

He et al., Mask R-CNN, ICCV 2017

SLIDE 9

2D + t: Object & Pose Tracking

Challenges

Multiple Objects
Occlusions
Variations in Poses

SLIDE 10

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

SLIDE 11

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

3D inflated CNN
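
A minimal sketch of what "inflating" a 2D kernel to 3D can mean, assuming the common initialization where the 2D weights are repeated along time and divided by the temporal extent. The slide does not say which initialization is used, so the function names and the "mean" scheme here are illustrative assumptions. With this scheme, a clip of identical frames produces the same response as the original 2D kernel on a single frame.

```python
def inflate_kernel_2d_to_3d(kernel_2d, t):
    """Repeat a 2D kernel t times along time, scaling each copy by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]


def conv_response_2d(patch, kernel_2d):
    """Response of one 2D kernel on one image patch of the same size."""
    return sum(p * w
               for p_row, k_row in zip(patch, kernel_2d)
               for p, w in zip(p_row, k_row))


def conv_response_3d(clip, kernel_3d):
    """Response of one 3D kernel on a clip (one same-sized patch per frame)."""
    return sum(conv_response_2d(frame, k_slice)
               for frame, k_slice in zip(clip, kernel_3d))


# Hypothetical 3x3 kernel and patch, chosen only for illustration.
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0],
         [4.0, 1.0, 1.0],
         [5.0, 2.0, 1.0]]

kernel_3d = inflate_kernel_2d_to_3d(kernel_2d, t=4)
static_clip = [patch] * 4  # a "video" in which nothing moves

response_2d = conv_response_2d(patch, kernel_2d)
response_3d = conv_response_3d(static_clip, kernel_3d)
```

Because the static clip reproduces the 2D response, an inflated network can start from image-pretrained weights and only has to learn how to use temporal change.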

SLIDE 12

2D + t: 3D Mask R-CNN

Predicts 3D tubes instead of 2D RoIs

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

SLIDE 13

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

RoIAlign in (x, y, t)
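
One way to picture RoIAlign over (x, y, t): treat a tube as one box per time step and pool each frame's feature map with that frame's box, then stack the results along time. The sketch below is a simplified illustration, not the implementation from the cited paper. It assumes a single channel, one bilinear sample per output bin, and feature values located at integer (x, y) coordinates; all function names are hypothetical.

```python
import math


def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at continuous (x, y)."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the valid range
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def roi_align_frame(feat, box, out_size):
    """Pool one box (x1, y1, x2, y2) to out_size x out_size, sampling each bin center."""
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [[bilinear(feat, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
             for j in range(out_size)]
            for i in range(out_size)]


def roi_align_tube(clip_feats, tube, out_size):
    """Apply per-frame pooling along a tube: one box per time step."""
    return [roi_align_frame(feat, box, out_size)
            for feat, box in zip(clip_feats, tube)]


# Hypothetical 4x4 feature map with value x + 10*y, reused for two frames.
feat = [[x + 10 * y for x in range(4)] for y in range(4)]
tube = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)]  # the box shifts over time
pooled = roi_align_tube([feat, feat], tube, out_size=2)
```

The output has a fixed size per time step regardless of how the box moves or changes scale, which is what lets the downstream heads operate on the whole tube.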

SLIDE 14

2D + t: 3D Mask R-CNN

Tube object classification

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

SLIDE 15

2D + t: 3D Mask R-CNN

Pose estimation for each tube for each time step

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

SLIDE 16

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

SLIDE 17

The Challenges When Learning from Video

3D CNNs are time and memory consuming
Small batch sizes
Prone to overfitting
Redundant Computations
Consecutive frames look similar
3D convolutions allocate the same amount of computation across time and pixels

3D extensions of Image-based CNNs might be suboptimal
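
A rough back-of-the-envelope sketch of why 3D CNNs cost more time and memory. It counts multiply-accumulates for a single convolution layer, assuming stride 1 and an output the same size as the input; the layer sizes are hypothetical and not taken from the talk.

```python
def conv2d_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for one k x k 2D convolution over an h x w output."""
    return h * w * c_in * c_out * k * k


def conv3d_macs(t, h, w, c_in, c_out, k_t, k):
    """Multiply-accumulates for one k_t x k x k 3D convolution over t output frames."""
    return t * k_t * conv2d_macs(h, w, c_in, c_out, k)


# Hypothetical layer: 56x56 spatial size, 64 -> 64 channels, 3x3 kernel.
macs_2d = conv2d_macs(56, 56, 64, 64, 3)
# The same layer applied to an 8-frame clip with a temporal kernel extent of 3.
macs_3d = conv3d_macs(8, 56, 56, 64, 64, 3, 3)

compute_ratio = macs_3d // macs_2d            # t * k_t
activation_ratio = (64 * 8 * 56 * 56) // (64 * 56 * 56)  # output elements grow by t

print("compute ratio:", compute_ratio)
print("activation ratio:", activation_ratio)
```

Under these assumptions the layer needs t * k_t times the compute and t times the activation memory of its 2D counterpart, which is what pushes training toward small batch sizes.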

SLIDE 18

Slow-Fast Networks for Video Recognition

Feichtenhofer et al., arXiv 2018

SLIDE 19

Slow-Fast Networks for Video Recognition

[Figure: two-pathway diagram. Slow pathway: T frames, C channels, spatial size H,W. Fast pathway: αT frames, βC channels. Both pathways lead to the prediction.]

Feichtenhofer et al., arXiv 2018
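
A small sketch of how the two pathways could be fed from one clip, reading the figure labels as follows: the Slow pathway sees T frames with C channels, and the Fast pathway sees αT frames with βC channels. The concrete values (64 raw frames, a Slow stride of 16, α = 8, β = 1/8) and the function names are assumptions chosen for illustration.

```python
def slowfast_indices(num_frames, slow_stride, alpha):
    """Frame indices for the Slow and Fast pathways sampled from one raw clip."""
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha  # the Fast pathway samples alpha times more densely
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def pathway_shapes(t, c, alpha, beta, h, w):
    """(channels, frames, height, width) of the features in each pathway."""
    slow_shape = (c, t, h, w)
    fast_shape = (int(beta * c), alpha * t, h, w)
    return slow_shape, fast_shape


slow_idx, fast_idx = slowfast_indices(num_frames=64, slow_stride=16, alpha=8)
slow_shape, fast_shape = pathway_shapes(t=len(slow_idx), c=64, alpha=8,
                                        beta=1 / 8, h=56, w=56)

print("slow frames:", slow_idx)
print("fast frame count:", len(fast_idx))
print("slow shape:", slow_shape, "fast shape:", fast_shape)
```

With β below 1, the Fast pathway trades channel capacity for temporal resolution, so the extra frames do not multiply the cost the way a uniformly dense 3D network would.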

SLIDE 20

Slow-Fast Networks for Video Recognition

Feichtenhofer et al., arXiv 2018

[Figure: Fast pathway (αT frames, βC channels) and Slow pathway (T frames, C channels, spatial size H,W), joined by concat.]

SLIDE 21

Slow-Fast Networks for Video Recognition

Kinetics 400

SLIDE 22

Slow-Fast Networks for Video Recognition

AVA

SLIDE 23

Can Motion Also Help 2D?

Motion is important for video understanding
Object Tracking
Action Recognition
Can motion help single image understanding?
Humans learn to recognize using motion cues
Can motion help us recognize better or with less data?

SLIDE 24

DensePose

[Figure panels: input image; DensePose; surface of 3D model]

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

SLIDE 25

DensePose: Annotations

[Figure panels: full annotations; limited dense annotations; sparse annotations; keypoints]

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

SLIDE 26

DensePose: Performance wrt #Annotations

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

SLIDE 27

DensePose: Annotation Propagation with Optical Flow

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

Transfer a given label to a new frame
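
A minimal sketch of transferring labels to a new frame with optical flow, assuming a dense flow field that stores one (dx, dy) displacement per pixel and annotations given at integer pixel locations. The label format (part, u, v) and the function name are hypothetical, and the cited paper's actual procedure may include steps not shown here.

```python
def propagate_labels(points, flow, width, height):
    """Move each labeled point along the flow at its pixel; drop points that leave the frame."""
    moved = []
    for x, y, label in points:
        dx, dy = flow[y][x]          # displacement from this frame to the next
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            moved.append((nx, ny, label))
    return moved


width, height = 4, 3
# Hypothetical flow: every pixel moves one pixel to the right.
flow = [[(1, 0) for _ in range(width)] for _ in range(height)]

# Hypothetical sparse annotations: (x, y, (part, u, v)).
annotations = [
    (0, 0, ("head", 0.2, 0.7)),
    (3, 1, ("hand", 0.5, 0.1)),   # sits on the right edge, so it will leave the frame
]

propagated = propagate_labels(annotations, flow, width, height)
print(propagated)
```

Each propagated point becomes an extra training label on the neighboring frame at no additional annotation cost, as long as the flow estimate is reliable at that pixel.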

SLIDE 28

DensePose: Annotation Propagation with Optical Flow

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

[Figure: gains in performance for ground truth, propagation, equivariance, and all]

SLIDE 29

SLIDE 30

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 31

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 32

Mesh R-CNN: Objects and Shapes

Gkioxari et al., Mesh R-CNN, arXiv 2019

SLIDE 33

Mesh R-CNN: Objects and Shapes

Gkioxari et al., Mesh R-CNN, arXiv 2019

SLIDE 34

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

SLIDE 35

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

SLIDE 36

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

SLIDE 37

Mesh R-CNN: Objects and Shapes

SLIDE 38

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 39

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

SLIDE 40

Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

SLIDE 41

Thank you