SEMANTIC IMAGE SEGMENTATION WITH DEEP CONVOLUTIONAL NETS AND FULLY CONNECTED CRFS
Paper by Chen, Papandreou, Kokkinos, Murphy, Yuille Slides by Josh Kelle (with graphics from the paper)
Goal: Partition the image into semantically meaningful parts, and classify each part.
[Figure: semantic segmentation example with regions labeled horse, person, car, and background]
1. Use CNN to generate a rough prediction of segmentation (smooth, blurry heat map)
2. Refine this prediction with a conditional random field (CRF)
[Figure: image, CNN output, CRF output]
Too much invariance. Good for high-level vision tasks like classification, bad for low level tasks like segmentation.
Solution: ‘atrous’ algorithm (hole algorithm)
weights) Solution: fully connected CRF
[Figure: image, ground truth, DCNN output, and CRF output after 1, 2, and 10 iterations]
layer CNN pre-trained on ImageNet, publicly available).
convolutional layers.
layers.
keep learned kernels the same?
kernels, but that’s slow.
Input stride
resolution.
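A minimal 1-D sketch of the hole idea, assuming the usual formulation: inserting zeros ("holes") between the taps of a learned kernel gives the same output as leaving the kernel unchanged and sampling the input with an input stride, but the second form skips the multiplications by zero. The function names, the signal, and the kernel below are illustrative and not taken from the paper.

```python
def dilate_kernel(kernel, rate):
    """Insert rate-1 zeros between kernel taps (the 'holes')."""
    dilated = []
    for idx, weight in enumerate(kernel):
        dilated.append(weight)
        if idx < len(kernel) - 1:
            dilated.extend([0.0] * (rate - 1))
    return dilated


def conv1d_valid(signal, kernel):
    """Plain 'valid' cross-correlation, the operation a CNN layer applies."""
    n_out = len(signal) - len(kernel) + 1
    return [
        sum(kernel[k] * signal[i + k] for k in range(len(kernel)))
        for i in range(n_out)
    ]


def atrous_conv1d(signal, kernel, rate):
    """Same output as the zero-filled kernel, but reads the input with stride `rate`."""
    span = (len(kernel) - 1) * rate + 1  # receptive field of the dilated kernel
    n_out = len(signal) - span + 1
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(n_out)
    ]


signal = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]  # toy input
kernel = [1.0, 0.0, -1.0]                         # toy 3-tap kernel

slow = conv1d_valid(signal, dilate_kernel(kernel, 2))  # zero-filled kernel
fast = atrous_conv1d(signal, kernel, 2)                # input stride of 2
```

Both calls return the same values; the kernel weights themselves are never modified, only the spacing at which the input is read.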
the CNN output is still 8x too small.
interpolation to grow the image.
[Figure: pipeline. Input -> Deep Convolutional Neural Network -> Coarse Score map (Aeroplane) -> Bi-linear Interpolation]
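The upsampling step can be sketched in plain Python as below. This is only an illustration of bilinear interpolation on a coarse score map: the corner-aligned coordinate mapping is an assumption (libraries differ on this convention), and `bilinear_upsample` and the 2x2 score map are hypothetical.

```python
def bilinear_upsample(score_map, factor):
    """Upsample a 2-D list of scores by `factor` using bilinear interpolation.

    Assumes a corner-aligned mapping: the four corner values of the input
    land exactly on the four corners of the output.
    """
    h, w = len(score_map), len(score_map[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for r in range(out_h):
        # Fractional source row for this output row.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            # Fractional source column for this output column.
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - dx) * score_map[y0][x0] + dx * score_map[y0][x1]
            bottom = (1 - dx) * score_map[y1][x0] + dx * score_map[y1][x1]
            row.append((1 - dy) * top + dy * bottom)
        out.append(row)
    return out


coarse = [[0.0, 1.0],
          [2.0, 3.0]]                      # toy 2x2 score map
full = bilinear_upsample(coarse, 8)        # 16x16 map, matching the 8x gap
```

Because the interpolated map is smooth, it recovers the resolution but not sharp object boundaries, which is what the CRF stage addresses.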
noisy segmentation.
CRF would make it worse.
has every pixel connected to every other pixel.
\[ E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j) \]
where $x_i$ is the label assignment of pixel $i$
\[ \theta_i(x_i) = -\log P(x_i) \]
$P(x_i)$ = label assignment probability computed by CNN
\[ \theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j) \]
$\mu(x_i, x_j) = 1$ if $x_i \neq x_j$, and zero otherwise (indicator function)
\[ \sum_{m=1}^{K} w_m \cdot k_m(f_i, f_j) = w_1 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2} \right) + w_2 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2} \right) \]
(2 Gaussian kernels)
$p$ = pixel position, $I$ = pixel color intensities
(w and σ are hyperparameters fit with cross-validation)
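A small sketch of how the energy could be evaluated for one candidate labeling, assuming each unordered pixel pair is counted once. The hyperparameter values, the 4-pixel "image", and the function names are hypothetical, chosen only to make the behavior visible; they are not the values fit in the paper.

```python
import math


def unary(prob):
    """theta_i(x_i) = -log P(x_i), with P taken from the CNN output."""
    return -math.log(prob)


def kernel(p_i, p_j, c_i, c_j, w1, w2, s_alpha, s_beta, s_gamma):
    """Weighted sum of the two Gaussian kernels between pixels i and j."""
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))   # ||p_i - p_j||^2
    d_col = sum((a - b) ** 2 for a, b in zip(c_i, c_j))   # ||I_i - I_j||^2
    appearance = w1 * math.exp(-d_pos / (2 * s_alpha ** 2)
                               - d_col / (2 * s_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * s_gamma ** 2))
    return appearance + smoothness


def energy(labels, probs, positions, colors, **hyper):
    """E(x) = sum of unary terms + pairwise terms over all pixel pairs."""
    n = len(labels)
    total = sum(unary(probs[i][labels[i]]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):          # fully connected: every pair
            if labels[i] != labels[j]:     # mu(x_i, x_j) indicator
                total += kernel(positions[i], positions[j],
                                colors[i], colors[j], **hyper)
    return total


# Toy data: four pixels in a row, two dark and two bright.
positions = [(0, 0), (0, 1), (0, 2), (0, 3)]
colors = [(10, 10, 10), (12, 12, 12), (200, 200, 200), (205, 205, 205)]
probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]  # P(label) per pixel
hyper = dict(w1=5.0, w2=3.0, s_alpha=50.0, s_beta=10.0, s_gamma=3.0)

edge_aligned = energy([0, 0, 1, 1], probs, positions, colors, **hyper)
misaligned = energy([0, 1, 1, 1], probs, positions, colors, **hyper)
```

With these toy numbers the labeling whose boundary follows the color edge has the lower energy, because the appearance kernel charges a large penalty for giving different labels to nearby pixels of similar color.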
[Figure: pipeline. Input -> Deep Convolutional Neural Network -> Coarse Score map (Aeroplane) -> Bi-linear Interpolation -> Fully Connected CRF -> Final Output]
Method                      mean IOU (%)
MSRA-CFM                    61.8
FCN-8s                      62.2
TTI-Zoomout-16              64.4
DeepLab-CRF                 66.4
DeepLab-MSc-CRF             67.1
DeepLab-MSc-CRF-LargeFOV    71.6
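For reference, a sketch of how a mean IOU score can be computed from predicted and ground-truth labels. This is a simplified single-image version on flattened label lists; benchmark evaluation typically accumulates the counts over a whole dataset, and the `mean_iou` helper and toy labels here are illustrative only.

```python
def mean_iou(pred, truth, num_classes):
    """Average intersection-over-union across classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:                 # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious)


truth = [0, 0, 1, 1, 2, 2]            # toy ground-truth labels, one per pixel
pred = [0, 1, 1, 1, 2, 0]             # toy predicted labels
score = mean_iou(pred, truth, num_classes=3)
```

Here the per-class IOUs are 1/3, 2/3, and 1/2, so the mean is 0.5, which would be reported as 50.0%.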
[Figures: qualitative comparisons. Panels: image, ground truth, FCN-8s, DeepLab-CRF; image, ground truth, TTI-Zoomout-16, DeepLab-CRF; image, ground truth, DeepLab, DeepLab-CRF]
spatially invariant.
map.