

SLIDE 1

A Fuller Understanding of Fully Convolutional Networks

Evan Shelhamer*, Jonathan Long*, Trevor Darrell

UC Berkeley. In CVPR'15, PAMI'16.

SLIDE 2

pixels in, pixels out

  • semantic segmentation
  • monocular depth + normals, Eigen & Fergus 2015
  • boundary prediction, Xie & Tu 2015
  • optical flow, Fischer et al. 2015
  • colorization, Zhang et al. 2016

SLIDE 3

convnets perform classification

end-to-end learning: “tabby cat” as a 1000-dim vector in < 1 millisecond

SLIDE 4

lots of pixels, little time?

end-to-end learning in ~1/10 second: ???

SLIDE 5

a classification network (“tabby cat” out)

SLIDE 6

becoming fully convolutional

SLIDE 7

becoming fully convolutional
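Becoming fully convolutional means recasting the classifier's fully connected layers as convolutions. A minimal numpy sketch of the equivalence (shapes and names here are illustrative, not taken from the released nets):

```python
import numpy as np

rng = np.random.default_rng(0)
k, C, num_classes = 3, 4, 5

# dense classifier weights over a flattened k x k x C feature map
fc_W = rng.standard_normal((num_classes, C * k * k))
feat = rng.standard_normal((C, k, k))

# fully connected view: flatten, then matrix multiply
fc_out = fc_W @ feat.reshape(-1)

# convolutional view: the same weights reshaped into a kernel that
# covers the whole k x k window, applied at its single valid position
conv_W = fc_W.reshape(num_classes, C, k, k)
conv_out = np.einsum('ochw,chw->o', conv_W, feat)

# identical outputs: an FC layer is a convolution over its input extent
assert np.allclose(fc_out, conv_out)
```

Because the reshaped kernel can slide, the same weights applied to a larger input yield a spatial grid of class scores rather than a single vector.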

SLIDE 8

upsampling output
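The upsampling layers are commonly initialized to bilinear interpolation and then learned end-to-end. A sketch of the bilinear kernel construction (a standard recipe; the exact released initialization code may differ):

```python
import numpy as np

def upsample_filt(size):
    """Bilinear interpolation kernel of a given side length, the usual
    initialization for a learnable stride-f upsampling (deconvolution)
    layer with a kernel of side 2f - f % 2."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

# 4 x 4 kernel for 2x upsampling: separable outer product of
# [0.25, 0.75, 0.75, 0.25] with itself
filt = upsample_filt(4)
```

Initializing to bilinear gives sensible dense output from the start; learning can then adjust the interpolation.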

SLIDE 9

end-to-end, pixels-to-pixels network

SLIDE 10

end-to-end, pixels-to-pixels network

  • conv, pool, nonlinearity
  • upsampling
  • pixelwise output + loss

SLIDE 11

spectrum of deep features

combine where (local, shallow) with what (global, deep)

fuse features into deep jet

(cf. Hariharan et al. CVPR15 “hypercolumn”)

SLIDE 12

skip layers

skip to fuse layers: interp + sum, interp + sum, dense output

end-to-end, joint learning of semantics and location
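The interp + sum fusion can be sketched in a few lines of numpy; the arrays here are made up, and nearest-neighbor upsampling stands in for the learned bilinear interpolation:

```python
import numpy as np

# hypothetical per-class score maps from two depths of the net:
# coarse (e.g. stride 32) and finer (e.g. stride 16)
coarse = np.arange(8.0).reshape(2, 2, 2)   # (classes, H, W)
fine = np.ones((2, 4, 4))

# interp: 2x upsampling of the coarse scores
up = coarse.repeat(2, axis=1).repeat(2, axis=2)

# + sum: elementwise fusion of what (deep) with where (shallow)
fused = up + fine
```

Repeating this fusion at successive layers yields the stride-16 and stride-8 nets of the next slide.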
SLIDE 13

skip layer refinement

stride 32 (no skips) → stride 16 (1 skip) → stride 8 (2 skips), shown against the input image and ground truth

SLIDE 14

skip FCN computation

Stage 1 (60.0 ms), Stage 2 (18.7 ms), Stage 3 (23.0 ms): a multi-stream network that fuses features/predictions across layers

SLIDE 15

(figure: FCN vs. SDS* outputs, with ground truth and input)

Relative to prior state-of-the-art SDS:

  • 30% relative improvement in mean IoU
  • 286× faster

*Simultaneous Detection and Segmentation, Hariharan et al. ECCV14

SLIDE 16

leaderboard == segmentation with Caffe

(leaderboard figure: FCN-based entries throughout)

SLIDE 17

SLIDE 18

care and feeding of fully convolutional networks

SLIDE 19

usage

  • train a full image at a time, without sampling
  • reshape the network to take input of any size
  • forward time is ~100 ms for 500 × 500 × 21 output (on a Titan X)
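That the network accepts input of any size follows from using only sliding operations; a toy numpy check (single channel, 'valid' convolution, for shape illustration only):

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive single-channel 'valid' 2-D convolution, enough to show shapes."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum()
    return out

w = np.ones((3, 3))  # one set of weights, no fixed input size

# the same weights handle any input size; only the output size changes
small = conv2d_valid(np.zeros((8, 8)), w)
large = conv2d_valid(np.zeros((16, 16)), w)
```

This is why "reshaping" the network for a new input size changes no parameters, only the spatial extent of the output.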

SLIDE 20

image-to-image optimization

SLIDE 21

momentum and batch size

SLIDE 22

sampling images?

no need! no improvement from sampling across images

SLIDE 23

sampling pixels?

no need! no improvement from (partially) decorrelating pixels, whether by uniform or Poisson sampling

SLIDE 24

context?

  • do FCNs incorporate contextual cues?
  • loses 3-4 points of IU when the background is masked
  • can learn from BG/shape alone if forced to!
  • Standard: 85 IU; BG alone: 38 IU; Shape alone: 29 IU
SLIDE 25

past and future history of fully convolutional networks

SLIDE 26

history

  • Shape Displacement Network, Matan & LeCun 1992
  • Convolutional Locator Network, Wolf & Platt 1994

SLIDE 27

pyramids

Scale Pyramid, Burt & Adelson ’83

The scale pyramid is a classic multi-resolution representation. Fusing multi-resolution network layers is a learned, nonlinear counterpart.

SLIDE 28

jets

Jet, Koenderink & Van Doorn ’87

The local jet collects the partial derivatives at a point for a rich local description. The deep jet collects layer compositions for a rich, learned description.

SLIDE 29

extensions

  • detection + instances
  • structured output
  • weak supervision
SLIDE 30

detection: fully conv. proposals

Fast R-CNN, Girshick ICCV'15; Faster R-CNN, Ren et al. NIPS'15

end-to-end detection by proposal FCN + RoI classification

SLIDE 31

fully conv. nets + structured output

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Chen* & Papandreou* et al. ICLR 2015.

SLIDE 32

fully conv. nets + structured output

Conditional Random Fields as Recurrent Neural Networks. Zheng* & Jayasumana* et al. ICCV 2015.

SLIDE 33

dilation for structured output

Multi-Scale Context Aggregation by Dilated Convolutions. Yu & Koltun. ICLR 2016.

  • enlarge the effective receptive field for the same no. of params
  • raise resolution
  • convolutional context model: similar accuracy to CRF but non-probabilistic
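A sketch of dilation in 1-D, showing receptive-field growth with no extra parameters (illustrative code, not from the paper):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D convolution with holes: taps spaced `dilation` apart,
    so the receptive field grows without adding parameters."""
    k = len(w)
    span = (k - 1) * dilation + 1   # effective receptive field
    return np.array([sum(w[j] * x[i + j * dilation] for j in range(k))
                     for i in range(len(x) - span + 1)])

x = np.arange(10.0)
w = [1.0, 1.0, 1.0]

# 3 taps at dilation 2 -> receptive field of 5 with the same 3 weights
out = dilated_conv1d(x, w, dilation=2)
```

Stacking layers with exponentially increasing dilation is what lets the context module aggregate multi-scale context at full resolution.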

SLIDE 34

[ comparison credit: CRF as RNN, Zheng* & Jayasumana* et al. ICCV 2015 ]

DeepLab: Chen* & Papandreou* et al. ICLR 2015. CRF-RNN: Zheng* & Jayasumana* et al. ICCV 2015.

SLIDE 35

fully conv. nets + weak supervision

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Pathak et al. arXiv 2015.

FCNs expose a spatial loss map to guide learning: segment from tags by MIL or pixelwise constraints

SLIDE 36

fully conv. nets + weak supervision

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. Dai et al. 2015.

FCNs expose a spatial loss map to guide learning: mine boxes + feedback to refine masks

SLIDE 37

fully conv. nets + weak supervision

FCNs can learn from sparse annotations == sampling the loss

What's the Point? Semantic Segmentation with Point Supervision. Bearman et al. ECCV 2016.
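One way to read "sampling the loss": compute the pixelwise loss only where annotations exist. A minimal numpy sketch (the function name and the ignore-label convention are assumptions for illustration, not the papers' code):

```python
import numpy as np

def sampled_pixel_nll(scores, labels, ignore=255):
    """Mean per-pixel negative log-likelihood over labeled pixels only.
    scores: (C, H, W) raw class scores; labels: (H, W) int labels, with
    `ignore` marking unannotated pixels, where no loss is taken."""
    # numerically stable log-softmax over the class axis
    logp = scores - scores.max(axis=0)
    logp = logp - np.log(np.exp(logp).sum(axis=0))
    # sample the loss: keep only annotated pixel positions
    ys, xs = np.nonzero(labels != ignore)
    return -logp[labels[ys, xs], ys, xs].mean()

rng = np.random.default_rng(0)
scores = rng.standard_normal((21, 4, 4))
labels = np.full((4, 4), 255)
labels[1, 2] = 3          # a single point annotation
loss = sampled_pixel_nll(scores, labels)
```

The spatial loss map makes this trivial: masking positions in the loss is all that point or scribble supervision needs from the network side.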

SLIDE 38

conclusion

fully convolutional networks are fast, end-to-end models for pixelwise problems

  • code in Caffe
  • models for PASCAL VOC, NYUDv2, SIFT Flow, PASCAL-Context

fcn.berkeleyvision.org
caffe.berkeleyvision.org
github.com/BVLC/caffe (model, inference, and solving examples)