Beyond Object Recognition in 2D, Georgia Gkioxari

slide-1
SLIDE 1

Beyond Object Recognition in 2D

Georgia Gkioxari

slide-2
SLIDE 2

Object Recognition in 2D

slide-3
SLIDE 3

Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

The World is 3D

slide-4
SLIDE 4

Johansson, Biological Motion Perception

Motion is Important for Recognition

slide-5
SLIDE 5

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-6
SLIDE 6

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-7
SLIDE 7

2D: Mask R-CNN

He et al., Mask R-CNN, ICCV 2017

slide-8
SLIDE 8

2D: Mask R-CNN

  • Object Localization
  • Instance Segmentation
  • Pose Estimation

from a Single Image

He et al., Mask R-CNN, ICCV 2017

slide-9
SLIDE 9

2D + t: Object & Pose Tracking

Challenges

  • Multiple Objects
  • Occlusions
  • Variations in Poses
slide-10
SLIDE 10

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

slide-11
SLIDE 11

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

3D inflated CNN
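
One way to read "inflated" is sketched below: a 2D kernel is copied along a new time axis and scaled by 1/T, so that a video of identical frames produces the same response as the original 2D filter on one frame. This "mean" initialization is an assumption for illustration (the slide does not say which scheme is used), and the helper names `inflate_kernel`, `conv2d_at`, and `conv3d_at` are hypothetical.

```python
def inflate_kernel(kernel_2d, t):
    """Replicate a 2D kernel t times along time and scale each copy by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]


def conv2d_at(image, kernel, y, x):
    """Response of a 2D kernel whose top-left corner sits at (y, x)."""
    return sum(
        kernel[i][j] * image[y + i][x + j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def conv3d_at(video, kernel_3d, t0, y, x):
    """Response of a 3D kernel starting at frame t0 and position (y, x)."""
    return sum(
        conv2d_at(video[t0 + k], kernel_3d[k], y, x)
        for k in range(len(kernel_3d))
    )


kernel = [[1.0, 0.0], [0.0, -1.0]]
frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
static_video = [frame, frame, frame]  # the same frame repeated over time

inflated = inflate_kernel(kernel, t=3)
response_2d = conv2d_at(frame, kernel, 0, 0)
response_3d = conv3d_at(static_video, inflated, 0, 0, 0)
```

On the static video the two responses agree up to floating-point rounding, which is why inflation lets a 3D network start from image-pretrained weights.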

slide-12
SLIDE 12

2D + t: 3D Mask R-CNN

Predicts 3D tubes instead of 2D RoIs

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

slide-13
SLIDE 13

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

RoIAlign in (x, y, t)
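
A minimal sketch of pooling along a tube, under stated assumptions: the tube is represented as one box per frame, each temporal slice is pooled with a simplified 2D RoIAlign (one bilinear sample per output bin), and the slices are stacked in time. The representation and the helpers `bilinear`, `roi_align_2d`, and `roi_align_tube` are hypothetical, not the paper's code.

```python
import math


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2D feature map at a continuous (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_2d(fmap, box, out_size):
    """Pool a box (x_min, y_min, x_max, y_max) to out_size x out_size."""
    x_min, y_min, x_max, y_max = box
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [
        [
            bilinear(fmap, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def roi_align_tube(features, tube, out_size):
    """Pool every temporal slice of the tube and stack the results in time."""
    return [roi_align_2d(fmap, box, out_size) for fmap, box in zip(features, tube)]


# Two frames of 4x4 features with value x + 10*y + 100*t.
features = [
    [[x + 10 * y + 100 * t for x in range(4)] for y in range(4)]
    for t in range(2)
]
tube = [(0, 0, 2, 2), (1, 1, 3, 3)]  # the box moves between the two frames
pooled = roi_align_tube(features, tube, out_size=2)
```

The output has shape (time, out_size, out_size), so the heads that follow see a fixed-size feature volume regardless of how the box moves across frames.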

slide-14
SLIDE 14

2D + t: 3D Mask R-CNN

Tube object classification

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

slide-15
SLIDE 15

2D + t: 3D Mask R-CNN

Pose estimation for each tube for each time step

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

slide-16
SLIDE 16

2D + t: 3D Mask R-CNN

Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

slide-17
SLIDE 17

The Challenges When Learning from Video

  • 3D CNNs are time and memory consuming
      • Small batch sizes
      • Prone to overfitting
  • Redundant computations
      • Consecutive frames look similar
      • 3D convolutions allocate the same amount of computation across time and pixels
  • 3D extensions of image-based CNNs might be suboptimal
slide-18
SLIDE 18

Slow-Fast Networks for Video Recognition

Feichtenhofer et al., arXiv 2018

slide-19
SLIDE 19

Slow-Fast Networks for Video Recognition

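[Figure: two-pathway architecture. The Slow pathway processes T frames with C channels; the Fast pathway processes αT frames with βC channels. Spatial size is H,W, and both pathways feed a single prediction.]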

Feichtenhofer et al., arXiv 2018
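
The T, αT, C, and βC labels can be made concrete with a small sketch of how the two pathways might sample one clip. The temporal stride `tau` and the values chosen for `alpha` and `beta` are illustrative assumptions, and `sample_pathways` and `pathway_shapes` are hypothetical helpers.

```python
def sample_pathways(num_frames, tau, alpha):
    """Return frame indices for the Slow and Fast pathways.

    The Slow pathway takes every tau-th frame; the Fast pathway samples
    alpha times more densely (every tau // alpha frames).
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, tau // alpha))
    return slow, fast


def pathway_shapes(t, c, alpha, beta):
    """(frames, channels) per pathway: Slow is (T, C), Fast is (alpha*T, beta*C)."""
    return {"slow": (t, c), "fast": (alpha * t, round(beta * c))}


# Illustrative values only: a 64-frame clip, tau = 16, alpha = 8, beta = 1/8.
slow_idx, fast_idx = sample_pathways(num_frames=64, tau=16, alpha=8)
shapes = pathway_shapes(t=len(slow_idx), c=64, alpha=8, beta=1 / 8)
```

With these values the Slow pathway sees 4 frames at 64 channels and the Fast pathway sees 32 frames at 8 channels, so the Fast pathway trades channel capacity for temporal resolution.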

slide-20
SLIDE 20

Slow-Fast Networks for Video Recognition

Feichtenhofer et al., arXiv 2018

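[Figure: Fast pathway (αT frames, βC channels) and Slow pathway (T frames, C channels; spatial size H,W), with Fast pathway features merged into the Slow pathway by concatenation (concat).]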
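
The concat label suggests that Fast pathway features are merged into the Slow pathway. A minimal sketch of one possible fusion follows, assuming the Fast features are first subsampled in time (every α-th step) so that both pathways have T steps, then concatenated along channels. The subsampling choice and the helper `fuse_lateral` are assumptions; the slide only shows that a concatenation takes place.

```python
def fuse_lateral(slow_feats, fast_feats, alpha):
    """Concatenate time-strided Fast features onto Slow features.

    slow_feats: T time steps, each a list of C channel values.
    fast_feats: alpha*T time steps, each a list of beta*C channel values.
    Returns T time steps, each with C + beta*C channel values.
    """
    strided = fast_feats[::alpha]  # keep every alpha-th Fast time step
    if len(strided) != len(slow_feats):
        raise ValueError("Fast pathway must have alpha times as many steps as Slow")
    return [s + f for s, f in zip(slow_feats, strided)]


slow = [[1, 2, 3], [4, 5, 6]]     # T = 2, C = 3
fast = [[10], [11], [12], [13]]   # alpha*T = 4, beta*C = 1
fused = fuse_lateral(slow, fast, alpha=2)
```

After fusion each Slow time step carries its own C channels plus the βC channels of the matching Fast time step.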

slide-21
SLIDE 21

Slow-Fast Networks for Video Recognition

  • Kinetics 400
slide-22
SLIDE 22

Slow-Fast Networks for Video Recognition

  • AVA
slide-23
SLIDE 23

Can Motion Also Help 2D?

  • Motion is important for video understanding
      • Object tracking
      • Action recognition
  • Can motion help single-image understanding?
      • Humans learn to recognize using motion cues
      • Can motion help us recognize better or with less data?
slide-24
SLIDE 24

DensePose

[Figure panels: input image, DensePose, surface of 3D model]

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

slide-25
SLIDE 25

DensePose: Annotations

[Figure panels: full annotations, limited dense annotations, sparse annotations, keypoints]

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

slide-26
SLIDE 26

DensePose: Performance wrt #Annotations

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

slide-27
SLIDE 27

DensePose: Annotation Propagation with Optical Flow

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

Transfer a given label to a new frame
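
Label transfer can be sketched as moving each annotated point along the optical flow to the next frame. The sketch assumes a dense flow field is already available as `flow[y][x] = (dx, dy)` and that labels are opaque values attached to integer pixel positions; `propagate_labels` is a hypothetical helper, and how the flow is estimated is out of scope here.

```python
def propagate_labels(points, flow):
    """Move labeled points from frame t to frame t+1 along a dense flow field.

    points: list of (x, y, label) with integer pixel coordinates in frame t.
    flow:   flow[y][x] = (dx, dy), the displacement from frame t to frame t+1.
    Points that leave the image are dropped.
    """
    h, w = len(flow), len(flow[0])
    moved = []
    for x, y, label in points:
        dx, dy = flow[y][x]
        nx, ny = x + dx, y + dy
        if 0 <= nx <= w - 1 and 0 <= ny <= h - 1:
            moved.append((nx, ny, label))
    return moved


# A 3x4 flow field in which every pixel moves one pixel to the right.
flow = [[(1, 0) for _ in range(4)] for _ in range(3)]
annotated = [(0, 0, "torso"), (3, 1, "head")]
propagated = propagate_labels(annotated, flow)
```

The first point lands at (1, 0) in the new frame with its label intact; the second would move outside the image and is discarded, so the new frame receives only labels the flow can support.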

slide-28
SLIDE 28

DensePose: Annotation Propagation with Optical Flow

Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

[Figure: gains in performance for ground truth, propagation, equivariance, and all; axis ticks at 0.5, 1, 1.5, 2.]

slide-29
SLIDE 29
slide-30
SLIDE 30

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-31
SLIDE 31

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-32
SLIDE 32

Mesh R-CNN: Objects and Shapes

Gkioxari et al., Mesh R-CNN, arXiv 2019

slide-33
SLIDE 33

Mesh R-CNN: Objects and Shapes

Gkioxari et al., Mesh R-CNN, arXiv 2019

slide-34
SLIDE 34

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

slide-35
SLIDE 35

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

slide-36
SLIDE 36

Mesh R-CNN: Objects and Shapes

sofa chair

Gkioxari et al., Mesh R-CNN, arXiv 2019

slide-37
SLIDE 37

Mesh R-CNN: Objects and Shapes

slide-38
SLIDE 38

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-39
SLIDE 39

Appearance (x, y) Motion (x, y, t) Shape (x, y, z)

slide-40
SLIDE 40

Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

slide-41
SLIDE 41

Thank you