


SLIDE 1

SEMANTIC IMAGE SEGMENTATION WITH DEEP CONVOLUTIONAL NETS AND FULLY CONNECTED CRFS

Paper by Chen, Papandreou, Kokkinos, Murphy, Yuille
Slides by Josh Kelle (with graphics from the paper)

SLIDE 2

Semantic Segmentation

Goal: Partition the image into semantically meaningful parts, and classify each part.

[Figure: example semantic segmentation with the labels horse, person, car, and background]

SLIDE 3

Main Idea

1. Use a CNN to generate a rough prediction of the segmentation (a smooth, blurry heat map).
2. Refine this prediction with a conditional random field (CRF).

[Figure: image, CNN output, CRF output]

SLIDE 4

Why are CNNs insufficient?

Too much invariance: good for high-level vision tasks like classification, bad for low-level tasks like segmentation.

  • Problem: subsampling
    Solution: ‘atrous’ algorithm (hole algorithm)
  • Problem: spatial invariance (shared kernel weights)
    Solution: fully connected CRF

SLIDE 5

Example

[Figure: image, ground truth, DCNN output, and CRF output after 1, 2, and 10 iterations]

SLIDE 6

Part 1: CNN

SLIDE 7

CNNs for Dense Feature Extraction

  • Construct “DeepLab” by modifying VGG-16 (a 16-layer CNN pre-trained on ImageNet, publicly available).
  • Convert the fully-connected layers of VGG-16 into convolutional layers.
  • Skip subsampling after the last two max-pooling layers.
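
The conversion of fully-connected layers into convolutional layers can be illustrated in one dimension. The sketch below is not taken from the slides or from VGG-16; the function names and the toy weights are hypothetical. It only shows that sliding fully-connected weights over a larger input reproduces the original output at the first position and yields a dense map of scores elsewhere.

```python
def fully_connected(weights, bias, x):
    """One output unit of a fully-connected layer: dot(weights, x) + bias."""
    assert len(weights) == len(x)
    return sum(w * v for w, v in zip(weights, x)) + bias


def as_convolution(weights, bias, signal):
    """Reuse the same weights as a sliding window over a longer input.

    Each window position yields one score, so the output is a map
    rather than a single number.
    """
    n = len(weights)
    return [fully_connected(weights, bias, signal[i:i + n])
            for i in range(len(signal) - n + 1)]


# Toy values, chosen only for illustration.
weights, bias = [0.5, -1.0, 2.0], 0.1
patch = [1.0, 2.0, 3.0]               # input of the size the layer was trained on
longer = [1.0, 2.0, 3.0, 4.0, 5.0]    # larger input

fc_out = fully_connected(weights, bias, patch)
conv_out = as_convolution(weights, bias, longer)
print(fc_out, conv_out)
```

The first entry of `conv_out` equals `fc_out`, and the remaining entries are the same unit evaluated at shifted positions.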

SLIDE 8

Hole Algorithm

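  • How can we skip the subsampling after max pooling, but keep the learned kernels the same?
  • Could introduce zeros into the kernels, but that’s slow.
  • The hole algorithm is faster: it leaves the kernels intact and instead samples the feature maps sparsely, using an input stride.

[Figure: hole algorithm, illustrated with an input stride]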
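
The two options on this slide can be compared with a 1-D sketch. This is an illustrative toy, not the DeepLab implementation; the function names and the test signal are hypothetical. `insert_holes` builds the zero-filled kernel (the slow option), while `atrous_correlate` keeps the original kernel and reads the input with an input stride. Both give the same result.

```python
def correlate_valid(signal, kernel):
    """Plain sliding-window correlation without padding."""
    n = len(kernel)
    return [sum(k * s for k, s in zip(kernel, signal[i:i + n]))
            for i in range(len(signal) - n + 1)]


def insert_holes(kernel, rate):
    """Slow option: place rate - 1 zeros between neighbouring kernel taps."""
    dilated = [0.0] * ((len(kernel) - 1) * rate + 1)
    dilated[::rate] = kernel
    return dilated


def atrous_correlate(signal, kernel, rate):
    """Hole algorithm: original kernel, input samples taken `rate` apart."""
    span = (len(kernel) - 1) * rate + 1
    return [sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 2.0, -1.0]

slow = correlate_valid(signal, insert_holes(kernel, 2))
fast = atrous_correlate(signal, kernel, 2)
print(slow, fast)
```

The strided version performs the same number of multiplications as the original kernel, whereas the zero-filled kernel also multiplies by every inserted zero.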

SLIDE 9

Image Resolution

  • The CNN shrinks the image, but we need the output at the original image resolution.
  • Skipping the subsampling in the last two max-pooling phases helps, but the CNN output is still 8x too small.
  • Since the score maps are smooth, just use bilinear interpolation to upsample them.

[Figure: input → deep convolutional neural network → coarse score map (aeroplane) → bilinear interpolation]
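
A minimal sketch of the upsampling step, in pure Python on a tiny score map. The function name is hypothetical, and the sampling convention (output corners aligned with input corners) is an assumption; the slides do not say which convention the paper's implementation uses.

```python
def bilinear_upsample(score_map, factor):
    """Upsample a 2-D list of scores by an integer factor.

    Assumes corner-aligned sampling: the four corner values of the
    input appear unchanged at the four corners of the output.
    """
    h, w = len(score_map), len(score_map[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for r in range(out_h):
        # Fractional input row that this output row maps to.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = score_map[y0][x0] * (1 - dx) + score_map[y0][x1] * dx
            bottom = score_map[y1][x0] * (1 - dx) + score_map[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


coarse = [[0.0, 1.0],
          [1.0, 2.0]]                 # toy 2x2 score map for one class
fine = bilinear_upsample(coarse, 8)   # 8x larger along each dimension
print(len(fine), len(fine[0]))
```

Because the interpolated values are weighted averages of their neighbours, the upsampled map stays as smooth as the coarse one, which is why the CRF stage is still needed to recover sharp boundaries.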

SLIDE 10

Part 2: CRF

SLIDE 11

Fully Connected CRF

  • Traditionally, short-range CRFs are used to smooth noisy segmentation.
  • CNN output is already very smooth. A short-range CRF would make it worse.
  • Use a fully connected CRF. The graphical model has every pixel connected to every other pixel.

SLIDE 12

CRF Energy Function

$$E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)$$

where $x_i$ is the label assignment of pixel $i$, and

$$\theta_i(x_i) = -\log P(x_i)$$

where $P(x_i)$ is the label assignment probability computed by the CNN.
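
As a sketch of the unary term only, the snippet below turns per-pixel label probabilities into potentials and sums them for a given labeling. The function names, the `eps` clamp, and the toy probabilities are assumptions for illustration; in the pipeline the probabilities would come from the CNN.

```python
import math


def unary_potentials(probs, eps=1e-12):
    """theta_i(x_i) = -log P(x_i) for every candidate label of one pixel.

    `eps` is an added safeguard against log(0); it is not part of the formula.
    """
    return [-math.log(max(p, eps)) for p in probs]


def unary_energy(prob_maps, labeling):
    """Sum of the unary potentials selected by a labeling."""
    return sum(unary_potentials(probs)[label]
               for probs, label in zip(prob_maps, labeling))


# Toy example: 3 pixels, 2 labels. Each row is P(x_i) for one pixel.
prob_maps = [[0.9, 0.1],
             [0.6, 0.4],
             [0.2, 0.8]]

likely = unary_energy(prob_maps, [0, 0, 1])     # follows the most probable labels
unlikely = unary_energy(prob_maps, [1, 1, 0])   # picks the least probable labels
print(likely, unlikely)
```

Labelings that agree with confident CNN predictions get low energy, so minimizing the unary term alone would simply reproduce the CNN's per-pixel argmax.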

SLIDE 13

CRF Energy Function

$$\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j)$$

SLIDE 14

CRF Energy Function

$$\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j)$$

Indicator function: $\mu(x_i, x_j) = 1$ if $x_i \neq x_j$, and zero otherwise.

SLIDE 15

CRF Energy Function

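$$\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j)$$

Indicator function: $\mu(x_i, x_j) = 1$ if $x_i \neq x_j$, and zero otherwise.

2 Gaussian kernels:

$$\sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j) = w_1 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2} \right) + w_2 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2} \right)$$

where $p$ = pixel position and $I$ = pixel color intensities.

($w$ and $\sigma$ are hyperparameters fit with cross-validation.)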
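
The pairwise term for a single pair of pixels can be written out directly from the formula. This is a sketch under stated assumptions: the function names are hypothetical, and the default values for `w1`, `w2`, and the three sigmas are arbitrary placeholders, not the cross-validated values.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def pairwise_potential(xi, xj, pi, pj, Ii, Ij,
                       w1=5.0, w2=3.0,
                       sigma_alpha=50.0, sigma_beta=10.0, sigma_gamma=3.0):
    """theta_ij(x_i, x_j) for one pair of pixels.

    xi, xj: labels; pi, pj: pixel positions; Ii, Ij: pixel colors.
    The weight and sigma defaults are placeholders for illustration only.
    """
    if xi == xj:
        return 0.0  # indicator mu(x_i, x_j) is zero for equal labels
    # First kernel: depends on both position and color.
    appearance = w1 * math.exp(-sq_dist(pi, pj) / (2 * sigma_alpha ** 2)
                               - sq_dist(Ii, Ij) / (2 * sigma_beta ** 2))
    # Second kernel: depends on position only.
    smoothness = w2 * math.exp(-sq_dist(pi, pj) / (2 * sigma_gamma ** 2))
    return appearance + smoothness


red, blue = (200, 10, 10), (10, 10, 200)

# Neighbouring pixels with the same color but different labels: high penalty.
near_similar = pairwise_potential(0, 1, (5, 5), (5, 6), red, red)
# Distant pixels with different colors and different labels: almost no penalty.
far_different = pairwise_potential(0, 1, (5, 5), (300, 300), red, blue)
print(near_similar, far_different)
```

The penalty is largest when two nearby, similarly colored pixels receive different labels, which is what pulls label boundaries toward color edges in the image.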

SLIDE 16

Full Pipeline “DeepLab-CRF”

[Figure: input → deep convolutional neural network → coarse score map (aeroplane) → bilinear interpolation → fully connected CRF → final output]

SLIDE 17

Comparison to state-of-the-art

Method                      mean IOU (%)
MSRA-CFM                    61.8
FCN-8s                      62.2
TTI-Zoomout-16              64.4
DeepLab-CRF                 66.4
DeepLab-MSc-CRF             67.1
DeepLab-MSc-CRF-LargeFOV    71.6
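
For readers unfamiliar with the metric, the following is a minimal sketch of mean intersection-over-union on flattened label lists. The function name and the toy labels are hypothetical, and benchmark evaluation code may differ in details such as accumulating counts over a whole dataset or ignoring certain pixels.

```python
def mean_iou(pred, truth, num_classes):
    """Average, over classes, of |pred ∩ truth| / |pred ∪ truth|.

    Classes that appear in neither list are skipped (an assumption of
    this sketch, made to avoid dividing by zero).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 3 classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

score = mean_iou(pred, truth, 3)
print(score)
```

Here the per-class values are 1/3, 2/3, and 1/2, so the mean is 0.5, which would be reported as 50%.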

SLIDE 18

Comparison to state-of-the-art

[Figure: image, ground truth, FCN-8s, DeepLab-CRF]

SLIDE 19

Comparison to state-of-the-art

[Figure: image, ground truth, TTI-Zoomout-16, DeepLab-CRF]

SLIDE 20

Success Cases

[Figure: image, ground truth, DeepLab, DeepLab-CRF]

SLIDE 21

Failure Cases

[Figure: image, ground truth, DeepLab, DeepLab-CRF]

SLIDE 22

Conclusion

  • Modify the CNN architecture to become less spatially invariant.
  • Use the CNN to compute a rough score map.
  • Use a fully connected CRF to sharpen the score map.