SLIDE 1

Learning Deep Structured Models for Semantic Segmentation

Guosheng Lin

SLIDE 2

Semantic Segmentation

SLIDE 3

Outline

  • Exploring Context with Deep Structured Models

– Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel;
Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation; arXiv.

  • Learning CNN based Message Estimators

– Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel;
Deeply Learning the Messages in Message Passing Inference; NIPS 2015.

SLIDE 4

Background

  • Fully convolutional network for semantic segmentation
– Long et al., CVPR 2015.
– The fully convolutional net outputs a score map: a low-resolution prediction, e.g., 1/32 or 1/8 of the input image size.
– Bilinear upsampling turns the score map into a prediction at the size of the input image.

SLIDE 5

Background

  • Fully convolutional net: the score map is a low-resolution prediction, e.g., 1/32 or 1/8 of the input image size.
  • Bilinear upsampling and refinement produce the prediction at the size of the input image.
  • Recent methods focus on the upsample-and-refine stage, e.g., DeepLab (ICLR 2015), CRF-RNN (ICCV 2015), DeconvNet (ICCV 2015), DPN (ICCV 2015).
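For concreteness, a minimal sketch of this baseline pipeline (assuming PyTorch; the 21 classes and the 256x256 input size are illustrative, not taken from the slides):

```python
# A low-resolution score map is bilinearly upsampled to the input size;
# the per-pixel argmax over classes gives the segmentation.
import torch
import torch.nn.functional as F

scores = torch.randn(1, 21, 32, 32)    # (N, K, h, w): 1/8-resolution score map
upsampled = F.interpolate(scores, size=(256, 256),
                          mode='bilinear', align_corners=False)
prediction = upsampled.argmax(dim=1)   # (N, 256, 256): per-pixel class labels
```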

SLIDE 6

Background

  • Same pipeline, but the low-resolution score map (e.g., 1/32 or 1/8 of the input image size) is produced by a contextual deep structured model, before bilinear upsampling and refinement to the size of the input image.
  • Our focus: explore contextual information using a deep structured model.

SLIDE 7

Explore Context

  • Spatial Context:

– Semantic relations between image regions.

  • e.g., a car is likely to appear over a road
  • e.g., a person appearing above a horse is more likely than a dog appearing above a horse

– We focus on two types of context:

  • Patch-Patch context
  • Patch-Background context
SLIDE 8

Patch-Patch Context / Patch-Background Context (illustration)

SLIDE 9

Overview

SLIDE 10

Patch-Patch Context

  • Learning CRFs with CNN based pairwise potential functions.
  • Pipeline: FeatMap-Net generates the feature map; the CRF graph is then created on the feature map (nodes and pairwise connections).

SLIDE 11

Create the CRF graph (create nodes and pairwise connections):

  • Create nodes: one node corresponds to one spatial position in the feature map.
  • Generate pairwise connections: one node connects to all nodes that lie within a spatial range box (the box drawn with dashed lines).

SLIDE 12

Patch-Patch Context

  • Construct the CRF graph: pairwise connections are built as in the sketch below.
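A minimal illustration of the construction (not the authors' code; the node indexing and the box radius are assumptions):

```python
# One node per spatial position of the feature map; each node is
# connected to every other node inside its surrounding "range box".

def build_crf_edges(height, width, box_radius):
    """Return pairwise edges (i, j) with i < j, where node index
    i = row * width + col and two nodes are connected if one lies
    in the other's (2*box_radius+1)^2 range box."""
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            for dr in range(-box_radius, box_radius + 1):
                for dc in range(-box_radius, box_radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        j = rr * width + cc
                        if i < j:  # avoid duplicate and self edges
                            edges.append((i, j))
    return edges

edges = build_crf_edges(height=8, width=8, box_radius=2)
```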

SLIDE 13-14

CRFs with CNN based potentials

The conditional likelihood for one image:
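The slide's equation image is not preserved in this export; a reconstruction in the standard CRF form, consistent with the graph and potentials described above:

```latex
% conditional likelihood for one image
P(y \mid x; \theta) = \frac{1}{Z(x; \theta)} \exp\!\big(-E(y, x; \theta)\big),
\qquad
Z(x; \theta) = \sum_{y'} \exp\!\big(-E(y', x; \theta)\big)
% energy: CNN-based unary potentials U over the nodes and pairwise
% potentials V over the range-box connections built above
E(y, x; \theta) = \sum_{p \in \mathcal{N}} U(y_p, x; \theta_U)
  + \sum_{(p,q) \in \mathcal{E}} V(y_p, y_q, x; \theta_V)
```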


SLIDE 15

Explore background context

FeatMap-Net: a multi-scale network for generating the feature map (a sketch follows).
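A hypothetical sketch of such a multi-scale network (assuming PyTorch; the scales and fusion by concatenation are illustrative assumptions): run a shared backbone on rescaled copies of the image, bring the feature maps to a common resolution, and concatenate.

```python
import torch
import torch.nn.functional as F

def multi_scale_feature_map(backbone, image, scales=(1.0, 0.75, 0.5)):
    feats, target_size = [], None
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s,
                               mode='bilinear', align_corners=False)
        f = backbone(scaled)                   # (N, C, h_s, w_s)
        if target_size is None:
            target_size = f.shape[-2:]         # resolution at scale 1.0
        f = F.interpolate(f, size=target_size,
                          mode='bilinear', align_corners=False)
        feats.append(f)
    return torch.cat(feats, dim=1)             # channel-wise fusion
```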

SLIDE 16

SLIDE 17-18

Prediction

  • Coarse-level prediction stage:
– P(y|x) is approximated using the mean-field algorithm.
  • Prediction refinement stage:
– Sharpen the object boundaries by leveraging low-level pixel information for smoothness.
– First up-sample the confidence map of the coarse prediction to the original input image size, then apply Dense-CRF (P. Krähenbühl and V. Koltun, NIPS 2011).
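A minimal mean-field sketch in NumPy (an assumed form of the coarse prediction stage, not the authors' exact implementation): iteratively update per-node label distributions Q given unary costs U, a shared pairwise cost table V, and the CRF edges built earlier.

```python
import numpy as np

def mean_field(U, V, edges, n_iters=5):
    """U: (N, K) unary costs; V: (K, K) pairwise costs shared across
    edges; edges: list of (i, j) pairs. Returns Q: (N, K) marginals."""
    N, K = U.shape
    nbrs = [[] for _ in range(N)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    Q = np.exp(-U)
    Q /= Q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        msg = np.zeros_like(U)
        for p in range(N):
            for q in nbrs[p]:
                msg[p] += Q[q] @ V.T   # expected pairwise cost at node p
        Q = np.exp(-U - msg)
        Q /= Q.sum(axis=1, keepdims=True)
    return Q
```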


SLIDE 19

CRF learning

Minimize the negative log-likelihood with SGD. The difficulty lies in calculating the gradient of the partition function: it requires marginal inference at every SGD iteration. Given the huge number of SGD iterations and the large number of nodes, this approach is impractical or even intractable. We apply piecewise training to avoid repeated inference at each SGD iteration.
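A sketch of the piecewise objective (standard form, consistent with the paper): the likelihood is replaced by a product of independent per-potential likelihoods, each normalized only over its own variables, so the global partition function Z(x) never has to be evaluated:

```latex
% piecewise approximation of the likelihood
P(y \mid x) \approx
  \prod_{p \in \mathcal{N}} P_U(y_p \mid x)
  \prod_{(p,q) \in \mathcal{E}} P_V(y_p, y_q \mid x)
% each factor normalizes over its own labels only, e.g.
P_V(y_p, y_q \mid x) =
  \frac{\exp\!\big(-V(y_p, y_q, x)\big)}
       {\sum_{y'_p, y'_q} \exp\!\big(-V(y'_p, y'_q, x)\big)}
```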

SLIDE 20

Results

SLIDE 21

PASCAL Leaderboard

http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

Examples on Internet images

SLIDE 26

Test image: street scene

SLIDE 27

Result from a model trained on street scene images (around 1000 training images)

SLIDE 28

Legend: Building, Road, Sidewalk, Car

SLIDE 29

Legend: Tree, Rider, Fence, Person

SLIDE 30

SLIDE 31

Result from a model trained on street scene images (around 1000 training images)

SLIDE 32

Result from PASCAL VOC model

SLIDE 33

Test image: indoor scene

SLIDE 34

Result from NYUD trained model (around 800 training images)

SLIDE 35

Result from PASCAL VOC trained model

SLIDE 36

SLIDE 37

Result from NYUD trained model

SLIDE 38

Message Learning

SLIDE 39

CRFs+CNNs

The conditional likelihood is defined through an energy function whose (log-)potential functions (factor functions) are CNN based. A potential function can be unary, pairwise, or higher-order:

  • CNN based unary potential: measures the labelling confidence of a single variable.
  • CNN based pairwise potential: measures the confidence of a pairwise label configuration (e.g., over y1 and y2).
  • Factor graph: a factorization of the joint distribution of the variables.
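The slide's equations are image-only; an assumed standard factor-graph form consistent with the description:

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\big(-E(y, x)\big),
\qquad
E(y, x) = \sum_{F \in \mathcal{F}} E_F(y_F, x)
% each (log-)potential E_F -- unary, pairwise, or higher-order --
% is computed by a CNN on the variables y_F of factor F
```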

slide-40
SLIDE 40

Challenges in Learning CRFs+CNNs

  • Prediction can be made by marginal inference (e.g., message passing).
  • CRF-CNN joint learning: learn the CNN potential functions by optimizing the CRF objective, typically minimizing the negative conditional log-likelihood (NLL).
  • The CNN parameters are learned with stochastic gradient descent (SGD). The partition function Z brings difficulties for the optimization:
– Each SGD iteration requires approximate marginal inference to calculate the factor marginals.
– CNN training needs a large number of SGD iterations, so training becomes intractable.
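To see where the difficulty comes from, the NLL gradient (a standard result, stated here for reference) involves an expectation under the model, i.e., the factor marginals:

```latex
% NLL for one training pair (x, y)
-\log P(y \mid x; \theta) = E(y, x; \theta) + \log Z(x; \theta)
% the partition-function term differentiates to a model expectation,
% which must be re-approximated by inference at every SGD step
\nabla_\theta \log Z(x; \theta)
  = -\,\mathbb{E}_{y' \sim P(y' \mid x; \theta)}
      \big[\nabla_\theta E(y', x; \theta)\big]
```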

SLIDE 41

Solutions

  • Traditional approach: apply approximate learning objectives
– Replace the optimization objective to avoid inference, e.g., piecewise training, pseudo-likelihood.
– Aims to learn the potential functions and then performs inference for the final prediction.
  • Our approach: directly target the final prediction
– Do not learn the potential functions; instead, learn CNN estimators that directly output the required intermediate values of an inference algorithm.
– Focus on message passing based inference for prediction (specifically loopy BP).
– Directly learn CNNs to predict the messages.

SLIDE 42

Belief propagation: message passing based inference

  • A message is a K-dimensional vector, where K is the number of classes (node states).
  • Two kinds of messages are passed on the factor graph: factor-to-variable messages and variable-to-factor messages.
  • The marginal distribution (beliefs) of one variable is computed from its incoming factor-to-variable messages.
  • The slide illustrates a simple example of marginal inference on the node y2 (with neighbours y1 and y3).
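For reference, the standard sum-product update equations (not the slide's exact notation):

```latex
% factor-to-variable message: marginalize the factor over all its
% variables except y_p, weighting by incoming variable-to-factor messages
m_{F \to p}(y_p) \propto
  \sum_{y_F \setminus y_p} \exp\!\big(-E_F(y_F, x)\big)
  \prod_{q \in F \setminus p} m_{q \to F}(y_q)
% variable-to-factor message: product of the other incoming messages
m_{p \to F}(y_p) \propto \prod_{F' \ni p,\; F' \neq F} m_{F' \to p}(y_p)
% belief (approximate marginal) of variable y_p
b_p(y_p) \propto \prod_{F \ni p} m_{F \to p}(y_p)
```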

SLIDE 43

CNN message estimators

  • Directly learn a CNN function that outputs the message vector.
– There is no need to learn the potential functions.
  • The factor-to-variable message is a message prediction function formulated by a CNN. Its inputs are:
– the input image region, and
– a dependent message feature vector, which encodes all dependent messages from the neighbouring nodes that are connected to the node p by the factor F.
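A hypothetical sketch of such an estimator (assuming PyTorch; the names and layer sizes are illustrative): it maps features of the image region around node p, concatenated with the dependent message feature vector, directly to the K-dimensional message.

```python
import torch
import torch.nn as nn

class MessageEstimator(nn.Module):
    def __init__(self, region_feat_dim, msg_feat_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(region_feat_dim + msg_feat_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),  # one score per class: the message
        )

    def forward(self, region_feat, msg_feat):
        # region_feat: CNN features of the image region for node p
        # msg_feat: encodes the dependent messages arriving through factor F
        return self.net(torch.cat([region_feat, msg_feat], dim=-1))
```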

SLIDE 44

Learning CNN message estimator

  • Define the cross-entropy loss between the ideal marginal and the estimated marginal.
  • The variable marginals are estimated from the CNN-predicted messages.
  • The optimization problem for learning minimizes this loss over the CNN parameters.
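The equation images are missing from this export; an assumed form consistent with the description above: the estimated marginal of node p is formed from the CNN-predicted factor-to-variable messages, and learning minimizes the cross-entropy to the ground-truth labels (here written with the ground truth as a one-hot "ideal" marginal, so the loss reduces to a negative log term per node):

```latex
% estimated marginal from CNN-predicted messages (theta: CNN parameters)
b_p(y_p; \theta) \propto \prod_{F \ni p} m_{F \to p}(y_p; \theta)
% learning objective over all training images and nodes,
% where \hat{y}_p is the ground-truth label of node p
\min_{\theta} \; -\sum_{\text{images}} \sum_{p} \log b_p(\hat{y}_p; \theta)
```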

SLIDE 45

Application to semantic segmentation