Mask R-CNN: Object Instance Segmentation and Human Pose Estimation



SLIDE 1

Mask R-CNN

OBJECT INSTANCE SEGMENTATION AND HUMAN POSE ESTIMATION

Kaiming He (Research Scientist)
Georgia Gkioxari (Postdoc)
Piotr Dollár (Research Scientist)
Ross Girshick (Research Scientist)

FACEBOOK AI RESEARCH (FAIR)

SLIDE 2

Classic Computer Vision Problems

Image classification

✓boat ✓person

Source: PASCAL Dataset

SLIDE 3

Classic Computer Vision Problems

Image classification

✓boat ✓person

Object detection

Source: PASCAL Dataset

SLIDE 4

Semantic Segmentation

person

Semantic segmentation

(pixel-level classification)

Source: PASCAL Dataset

SLIDE 5

The Instance Segmentation Task

Semantic segmentation (pixel-level classification): person

Instance segmentation (pixel-level detection): Person 1, Person 2, Person 3, Person 4, Person 5

Our task: instance segmentation

Source: PASCAL Dataset
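
The difference between the two labelings can be made concrete with a toy example. The sketch below assumes NumPy; the tiny 4x6 "image" and all variable names are invented for illustration and do not come from the talk.

```python
import numpy as np

# Toy 4x6 label image: 0 = background, 1 and 2 = two different persons.
instance_ids = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic segmentation: one class label per pixel (1 = person).
# Both persons collapse into the same label.
semantic_mask = (instance_ids > 0).astype(np.uint8)

# Instance segmentation: one binary mask per object.
instance_masks = np.stack([instance_ids == i for i in (1, 2)])

print(semantic_mask.sum())   # number of "person" pixels
print(instance_masks.shape)  # (num_instances, height, width)
```

Taking the union of the instance masks recovers the semantic mask, but the semantic mask alone cannot tell the two persons apart.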

SLIDE 6

Source: COCO Dataset

SLIDE 7

Source: DAVIS Dataset

SLIDE 8

Mask R-CNN

  • Mask R-CNN
  • Object instance segmentation
  • Human pose estimation
  • Role of Caffe2 in our research
  • Conclusions

TALK OUTLINE

SLIDE 9

SOURCE: GIRSHICK, DONAHUE, DARRELL, MALIK. RICH FEATURE HIERARCHIES FOR ACCURATE OBJECT DETECTION AND SEMANTIC SEGMENTATION. CVPR 2014

REGION-BASED CONVOLUTIONAL NEURAL NETWORK

Object Detection: R-CNN

Image → Region proposals (external algorithm) → Per-region classification by a CNN

SLIDE 10

SOURCE: GIRSHICK, DONAHUE, DARRELL, MALIK. RICH FEATURE HIERARCHIES FOR ACCURATE OBJECT DETECTION AND SEMANTIC SEGMENTATION. CVPR 2014

REGION-BASED CONVOLUTIONAL NEURAL NETWORK

Object Detection: R-CNN

[Diagram: Image → Region proposals (external algorithm) → Per-region classification by a CNN, with a separate CNN pass and class/box output for each region]

SLIDE 11

SOURCE: GIRSHICK. FAST R-CNN. ICCV 2015

A SHARED CNN BODY

Fast R-CNN

[Diagram: CNN applied to entire image → RoIPool op → Shared region-wise subnetwork → Class/box for each region. Regions come from an external region proposal algorithm (same as R-CNN).]
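
The RoIPool op turns a variable-sized region of the shared feature map into a fixed-size grid. Below is a minimal NumPy sketch of that idea, not the Fast R-CNN implementation: it handles one 2-D feature channel, the function name `roi_pool` and the 2x2 output size are illustrative choices, and the region is assumed to be non-empty after rounding.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=2):
    """Max-pool one RoI of a 2-D feature map into output_size x output_size bins.

    roi = (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    Coordinates are rounded to integers first; this quantization is what
    RoIAlign later avoids.
    """
    x0, y0, x1, y1 = (int(round(v)) for v in roi)
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    # Integer bin edges along each axis.
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    out = np.empty((output_size, output_size), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Guarantee each bin covers at least one cell.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out

features = np.arange(36, dtype=np.float32).reshape(6, 6)
pooled = roi_pool(features, roi=(0.6, 0.2, 4.4, 4.1))
print(pooled)
```

Whatever the size of the input region, the output is always `output_size` x `output_size`, which is what lets a single shared subnetwork process every region.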

SLIDE 12

SOURCE: REN, HE, GIRSHICK, SUN. FASTER R-CNN: TOWARDS REAL-TIME OBJECT DETECTION WITH REGION PROPOSAL NETWORKS. NIPS 2015

REGION PROPOSAL NETWORK

Faster R-CNN

[Diagram: CNN applied to entire image → In-network region proposals from RPN → RoIPool op → Shared region-wise subnetwork → Class/box for each region]
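
The slide does not show how the RPN produces its proposals. In the Faster R-CNN paper, the RPN scores and refines a fixed set of reference boxes (anchors) placed at every feature-map location. The sketch below only generates such anchors with NumPy; the function name, the stride of 16, and the scale and ratio values are assumptions chosen for illustration, and the scoring and box-regression steps are omitted.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as an (N, 4) array of (x0, y0, x1, y1) boxes,
    where N = feat_h * feat_w * len(scales) * len(ratios)."""
    shapes = []
    for s in scales:
        for r in ratios:
            # r is height / width; the area stays at s * s.
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)                      # (A, 2) as (w, h)

    # Center of every feature-map cell, in image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (H*W, 2)

    half = shapes / 2.0
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

anchors = make_anchors(feat_h=2, feat_w=3)
print(anchors.shape)
```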

SLIDE 13

Mask R-CNN for Instance Segmentation

  • An extension of Faster R-CNN
  • Surprisingly simple
  • Fast: 200 ms per image
  • Accurate: state of the art on COCO

OVERVIEW

SLIDE 14

Mask R-CNN for Instance Segmentation

[Diagram: Faster R-CNN (CNN applied to entire image, with RoIAlign), extended with a region-wise segmentation subnetwork: the mask “head”]
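
RoIAlign replaces the rounding step of RoIPool with bilinear interpolation at real-valued coordinates. The NumPy sketch below illustrates the idea under simplifying assumptions: one 2-D feature channel, a single sample at the center of each output bin (implementations commonly take several samples per bin and aggregate them), and invented function names.

```python
import numpy as np

def bilinear(feature_map, y, x):
    """Bilinearly interpolate a 2-D feature map at real-valued (y, x)."""
    h, w = feature_map.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature_map, roi, output_size=2):
    """Sample the center of each output bin; roi = (x0, y0, x1, y1) is never rounded."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / output_size
    bin_w = (x1 - x0) / output_size
    out = np.empty((output_size, output_size), dtype=np.float64)
    for i in range(output_size):
        for j in range(output_size):
            out[i, j] = bilinear(feature_map,
                                 y0 + (i + 0.5) * bin_h,
                                 x0 + (j + 0.5) * bin_w)
    return out

features = np.arange(36, dtype=np.float64).reshape(6, 6)
aligned = roi_align(features, roi=(0.5, 0.5, 4.5, 4.5))
print(aligned)
```

Because the region boundaries stay fractional, the sampled values follow the region exactly instead of snapping to the feature-map grid, which matters for pixel-level outputs such as masks.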

SLIDE 15

Mask R-CNN results on COCO

SLIDE 16

Mask R-CNN results on COCO

SLIDE 17

Mask R-CNN results on COCO

SLIDE 18

Quantitative Results

method              backbone                 mask AP
MNC                 ResNet-101-C4            24.6
FCIS w/ OHEM        ResNet-101-C5-dilated    29.2
FCIS+++ w/ OHEM     ResNet-101-C5-dilated    33.6
Mask R-CNN          ResNet-101-C4            33.1
Mask R-CNN          ResNet-101-FPN           35.7
Mask R-CNN          ResNeXt-101-FPN          37.1

2015 COCO winner: MNC. 2016 COCO winner: FCIS.

[Plot: inference time in seconds per image, with "Our 200 ms version" marked]
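
The mask AP values above rest on the overlap between predicted and ground-truth masks: COCO averages AP over intersection-over-union (IoU) thresholds from 0.50 to 0.95. The sketch below shows only the mask IoU computation in NumPy, not the full AP evaluation; the function name and the toy masks are illustrative.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty; defined as 0 here by convention
    return float(np.logical_and(pred, gt).sum() / union)

gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True        # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True      # 6 predicted pixels, 4 of them overlapping

iou = mask_iou(pred, gt)
print(iou)
```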

SLIDE 19

Mask R-CNN for Human Pose Estimation

  • Keypoint = 1-hot mask
  • Human pose = 17 keypoints
  • Represent pose as 17 masks

OVERVIEW
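
The three bullets above can be sketched as an encode/decode pair: each keypoint becomes a mask with a single foreground pixel, and a pose is a stack of 17 such masks. This is a minimal NumPy illustration, not the authors' code; the 56x56 mask size, the function names, and the random pose are assumptions made for the example.

```python
import numpy as np

NUM_KEYPOINTS = 17  # from the slide
MASK_SIZE = 56      # assumed mask resolution, for illustration only

def encode_keypoints(keypoints_xy, size=MASK_SIZE):
    """Turn (K, 2) integer (x, y) keypoints into (K, size, size) one-hot masks."""
    masks = np.zeros((len(keypoints_xy), size, size), dtype=np.uint8)
    for k, (x, y) in enumerate(keypoints_xy):
        masks[k, y, x] = 1
    return masks

def decode_keypoints(masks):
    """Recover (x, y) per keypoint as the argmax of each mask (or score map)."""
    k, h, w = masks.shape
    flat = masks.reshape(k, -1).argmax(axis=1)
    return np.stack([flat % w, flat // w], axis=1)

rng = np.random.default_rng(0)
pose = rng.integers(0, MASK_SIZE, size=(NUM_KEYPOINTS, 2))
masks = encode_keypoints(pose)
print(masks.shape)  # (17, 56, 56)
```

With this representation the mask branch can be reused for pose: predicting a keypoint becomes choosing one pixel out of `size * size` candidates.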

SLIDE 20

Mask R-CNN results on COCO

SLIDE 21

Mask R-CNN results on COCO

SLIDE 22

Mask R-CNN results on COCO

SLIDE 23

Mask R-CNN results on COCO

SLIDE 24

Quantitative Results

method                           keypoint AP
CMU-Pose+++                      61.8
G-RMI [w/ extra data]            62.4
Mask R-CNN [keypoint-only]       62.7
Mask R-CNN [keypoint & mask]     63.1

2016 COCO winner: CMU-Pose+++

[Plot: inference time in seconds per image, with "Our 200 ms version" marked]

SLIDE 25

Caffe2 Accelerated Research

SLIDE 26

Caffe2 Object Detection Platform

  • Early alpha users starting in May 2016
  • Ported py-faster-rcnn from Caffe to Caffe2
  • Key design choices
      • Flexible framework for implementing object detection models
      • Parallelize data loading with forward/backward computation

RAPID IDEA ITERATION IS A KEY ENABLING FACTOR IN RESEARCH
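
Parallelizing data loading with the forward/backward computation can be illustrated with a background thread that fills a bounded queue while the training loop consumes from it. The sketch below uses only the Python standard library and is not Caffe2 code; `load_batch` and `train_step` are hypothetical stand-ins for image loading and a training iteration.

```python
import queue
import threading
import time

def load_batch(i):
    """Hypothetical loader: stands in for reading and preprocessing images."""
    time.sleep(0.01)
    return [i] * 4

def train_step(batch):
    """Hypothetical forward/backward pass."""
    time.sleep(0.01)
    return sum(batch)

def producer(num_batches, out_queue):
    """Load batches in the background and hand them to the training loop."""
    for i in range(num_batches):
        out_queue.put(load_batch(i))  # blocks while the queue is full
    out_queue.put(None)               # sentinel: no more data

def run(num_batches=5, prefetch=2):
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=producer, args=(num_batches, q), daemon=True)
    worker.start()
    results = []
    while True:
        batch = q.get()               # the next batch is usually already loaded
        if batch is None:
            break
        results.append(train_step(batch))
    worker.join()
    return results

results = run()
print(results)
```

While `train_step` runs on one batch, the producer thread is already loading the next one, so the compute device does not sit idle waiting for input.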

SLIDE 27

Caffe2 Object Detection Platform

  • Sync SGD with 8 GPUs [Tesla M40] in a Big Sur server
  • Rapid prototyping of Mask R-CNN models in 8-12 hours
  • SOTA Mask R-CNN models train in 44 hours
  • Previous systems: ~4 days training time [experience from MSRA]

RAPID IDEA ITERATION IS A KEY ENABLING FACTOR IN RESEARCH
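
Synchronous SGD means every worker computes a gradient on its own share of the minibatch, the gradients are averaged, and one identical update is applied everywhere. The sketch below simulates 8 workers on a toy linear-regression problem with NumPy; it is not the Caffe2 multi-GPU implementation, and the model, data, learning rate, and function names are all assumptions for illustration.

```python
import numpy as np

def grad_mse(w, x, y):
    """Gradient of the mean squared error for a linear model y ~ x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

def sync_sgd_step(w, shards, lr=0.1):
    """One synchronous step: per-worker gradients are averaged,
    then a single update is applied to the shared weights."""
    grads = [grad_mse(w, x, y) for x, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
x = rng.normal(size=(64, 2))
y = x @ true_w

NUM_WORKERS = 8  # mirrors the 8 GPUs on the slide
shards = list(zip(np.split(x, NUM_WORKERS), np.split(y, NUM_WORKERS)))

w = np.zeros(2)
for _ in range(200):
    w = sync_sgd_step(w, shards)
print(w)
```

With equal-sized shards, the averaged gradient equals the gradient over the full minibatch, so the simulated 8-worker run follows the same trajectory as single-worker training on all 64 samples.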

SLIDE 28

From Research to Mobile with Caffe2

SLIDE 29

Conclusions

  • Simple and effective
  • Fast inference
  • Box, mask, and pose all-in-one network and method
  • Caffe2 enables extremely fast prototyping, critical to our success
