Beyond Object Recognition in 2D
Georgia Gkioxari
Object Recognition in 2D
Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018
The World is 3D
Johansson, Biological Motion Perception
Motion is Important for Recognition
Appearance (x, y) Motion (x, y, t) Shape (x, y, z)
2D: Mask R-CNN
He et al., Mask R-CNN, ICCV 2017
- Object Localization
- Instance Segmentation
- Pose Estimation
from a Single Image
2D + t: Object & Pose Tracking
Challenges
- Multiple Objects
- Occlusions
- Variations in Poses
2D + t: 3D Mask R-CNN
Girdhar et al., Detect-and-Track: Efficient Pose Estimation in Videos, CVPR 2018
- 3D inflated CNN
- Predicts 3D tubes instead of 2D RoIs
- RoIAlign in (x, y, t)
- Tube object classification
- Pose estimation for each tube at each time step
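To make the "3D inflated CNN" step concrete, here is a minimal sketch of inflating a 2D convolution kernel into a 3D one. It shows two common initializations: "mean" (replicate the kernel over time and divide by the temporal extent) and "center" (keep the kernel in the middle time slice, zeros elsewhere). The function name and the plain-list kernel layout are assumptions made for illustration and are not taken from the paper's code.

```python
def inflate_kernel(kernel_2d, t, mode="mean"):
    """Inflate a 2D kernel (list of rows) into a 3D kernel with t time slices.

    Hypothetical helper for illustration only.
    Returns a nested list indexed as kernel_3d[time][row][col].
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if mode == "mean":
        # Replicate over time and rescale, so a clip of identical frames
        # produces the same response as the original 2D kernel.
        return [[[v / t for v in row] for row in kernel_2d] for _ in range(t)]
    if mode == "center":
        # Only the middle time slice carries the 2D weights.
        center = t // 2
        return [
            [[(v if i == center else 0.0) for v in row] for row in kernel_2d]
            for i in range(t)
        ]
    raise ValueError("mode must be 'mean' or 'center'")


kernel = [[1.0, 2.0], [3.0, 4.0]]
mean_init = inflate_kernel(kernel, t=4, mode="mean")
center_init = inflate_kernel(kernel, t=3, mode="center")
```

With either initialization, summing the inflated kernel over the time axis recovers the original 2D kernel, which is why a network inflated this way starts out behaving like its image-based counterpart on static input.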
The Challenges When Learning from Video
- 3D CNNs are time and memory consuming
- Small batch sizes
- Prone to overfitting
- Redundant Computations
- Consecutive frames look similar
- 3D convolutions allocate the same amount of computation across time and pixels
- 3D extensions of Image-based CNNs might be suboptimal
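As a rough illustration of the cost argument above, the sketch below counts multiply-accumulate operations for a dense 2D convolution and for a 3D convolution with the same spatial settings. The layer sizes (64 channels, 56x56 output, 3x3 kernel, 8 output frames, 3 temporal taps) are arbitrary values chosen for the example and do not come from the slides.

```python
def conv_macs(c_in, c_out, out_h, out_w, k_h, k_w, out_t=1, k_t=1):
    """Multiply-accumulates of a dense convolution (no groups, bias ignored).

    With out_t == 1 and k_t == 1 this is an ordinary 2D convolution.
    """
    return c_in * c_out * out_h * out_w * out_t * k_h * k_w * k_t


# Illustrative layer sizes, not taken from any specific network.
macs_2d = conv_macs(64, 64, 56, 56, 3, 3)
macs_3d = conv_macs(64, 64, 56, 56, 3, 3, out_t=8, k_t=3)

# The 3D layer costs (output frames) x (temporal taps) times the 2D layer.
ratio = macs_3d // macs_2d
print(ratio)
```

The ratio depends only on the number of output frames and the temporal kernel size, which is one way to see that a 3D convolution spends the same computation on every frame regardless of how similar consecutive frames are.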
SlowFast Networks for Video Recognition
Feichtenhofer et al., arXiv 2018
[Figure: two-pathway architecture. Slow pathway: T frames, C channels. Fast pathway: αT frames, βC channels. Spatial size H,W. Diagram labels: concat, prediction.]
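The two-pathway layout can be sketched as a frame-sampling rule: the Slow pathway sees T frames with C channels, while the Fast pathway sees αT frames with βC channels. In the sketch below, the temporal stride `tau` and the defaults `tau=16`, `alpha=8`, `beta=1/8` are illustrative assumptions, and both function names are hypothetical.

```python
def sample_pathways(num_frames, tau=16, alpha=8):
    """Return (slow_indices, fast_indices) for one clip of num_frames frames.

    Assumed scheme: the Slow pathway takes every tau-th frame and the Fast
    pathway takes every (tau // alpha)-th frame, so Fast sees alpha times
    as many frames as Slow.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha in this sketch")
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, tau // alpha))
    return slow, fast


def fast_channels(slow_channels, beta=1 / 8):
    """Channel width of the Fast pathway: beta times the Slow width."""
    return int(slow_channels * beta)


slow_idx, fast_idx = sample_pathways(64)
print(len(slow_idx), len(fast_idx), fast_channels(64))
```

For a 64-frame clip with these assumed values, the Slow pathway gets T = 4 frames and the Fast pathway gets αT = 32 frames, while the Fast pathway uses a fraction β of the channels to keep its cost low.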
SlowFast Networks for Video Recognition
- Kinetics 400
- AVA
Can Motion Also Help 2D?
- Motion is important for video understanding
- Object Tracking
- Action Recognition
- Can motion help single image understanding?
- Humans learn to recognize using motion cues
- Can motion help us recognize better or with less data?
DensePose
[Figure: input image, DensePose, surface of 3D model]
Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
DensePose: Annotations
[Figure: full annotations, limited dense annotations, sparse annotations, keypoints]
DensePose: Performance vs. Number of Annotations
DensePose: Annotation Propagation with Optical Flow
Transfer a given label to a new frame
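Transferring a label to a new frame can be sketched as moving each annotated point along the optical flow vector at its location. The example below assumes a dense flow field is already available from some flow estimator and uses integer pixel coordinates. The function name and data layout are hypothetical, and this is not the authors' implementation.

```python
def propagate_points(points, flow):
    """Move labeled points from frame t to frame t+1 using dense optical flow.

    points: list of (x, y, label) with integer pixel coordinates in frame t.
    flow:   nested list where flow[y][x] == (dx, dy), the displacement of
            pixel (x, y) from frame t to frame t+1.
    Points whose new position falls outside the image are dropped.
    """
    height, width = len(flow), len(flow[0])
    moved = []
    for x, y, label in points:
        dx, dy = flow[y][x]
        new_x, new_y = x + dx, y + dy
        if 0 <= new_x <= width - 1 and 0 <= new_y <= height - 1:
            moved.append((new_x, new_y, label))
    return moved


# Toy 3x4 flow field: every pixel moves one pixel to the right.
toy_flow = [[(1.0, 0.0) for _ in range(4)] for _ in range(3)]
annotations = [(0, 0, "a"), (3, 2, "b")]
propagated = propagate_points(annotations, toy_flow)
```

In the toy example the second point sits on the right image border, so the flow carries it out of view and it is discarded. A real pipeline would also need to handle flow errors and occlusions, which this sketch ignores.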
[Figure: gains in performance for ground truth, propagation, equivariance, and all]
Appearance (x, y) Motion (x, y, t) Shape (x, y, z)
Mesh R-CNN: Objects and Shapes
Gkioxari et al., Mesh R-CNN, arXiv 2019
[Figure: example objects labeled sofa and chair]
Appearance (x, y) Motion (x, y, t) Shape (x, y, z)