Pay Attention to the Pixel, Understand the Scene Better Shu Kong - - PowerPoint PPT Presentation



SLIDE 1

Pay Attention to the Pixel, Understand the Scene Better

Shu Kong
CS, ICS, UCI

SLIDE 2

Background: Scene Parsing

Pixel-level labeling (semantic segmentation): assigning a class label to each pixel.

wall, painting, pillow, sofa, cabinet, towel

SLIDE 3

Background: Scene Parsing

  • Old days (before 2013): extracting features to represent pixels, unary pixel classification, and pixel pairs for a CRF
  • features, transforms, grouping, etc.

SLIDE 4

Background: Scene Parsing

Nowadays: deep learning.

SLIDE 5

Background: Convolutional Neural Network

Convolutional Neural Network (CNN): aggregating local information avoids computing over the whole image directly.

  • Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.
SLIDE 6

Background: Receptive Field in CNN

Receptive field at high-level layers.

photo credit to Honglak Lee

SLIDE 7

Background: CNN for Scene Parsing

Image classification: cat vs. dog.

SLIDE 8

Background: CNN for Scene Parsing

Dense prediction:

wall, painting, pillow, sofa, cabinet, towel

SLIDE 9

Background: Perspective and Scale

size(car) > size(train)? size(chair) > size(whiteboard)?

SLIDE 10

Outline

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 11

Outline

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 12

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information.

SLIDE 13

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information.

SLIDE 14

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information. Depth conveys the scale information.

SLIDE 15

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth.

SLIDE 16

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth, using dilated convolution (atrous convolution).

SLIDE 17

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth, using dilated convolution (atrous convolution).

SLIDE 18

Perspective-aware Pooling

2D atrous convolution with different dilation rates, allowing a larger RF to aggregate more contextual information.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
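The effect of the dilation rate can be sketched with a minimal NumPy implementation (an illustrative sketch, not the DeepLab code; `dilated_conv2d` is a name made up here): spacing the kernel taps `rate` pixels apart enlarges the receptive field without adding parameters.

```python
import numpy as np

def dilated_conv2d(image, kernel, rate):
    """'Valid' 2D cross-correlation whose kernel taps are spaced `rate`
    pixels apart: a k x k kernel with dilation rate r covers a receptive
    field of (k - 1) * r + 1 pixels per side, with no extra parameters."""
    k = kernel.shape[0]
    span = (k - 1) * rate + 1                       # effective RF per side
    H, W = image.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + span:rate, j:j + span:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(64.0).reshape(8, 8)
kernel = np.ones((3, 3))
print(dilated_conv2d(image, kernel, rate=1).shape)  # (6, 6): 3x3 RF
print(dilated_conv2d(image, kernel, rate=2).shape)  # (4, 4): 5x5 RF
```

With the rates {1, 2, 4, 8, 16} from the following slides, the same 3x3 kernel covers receptive fields of 3, 5, 9, 17, and 33 pixels per side.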

SLIDE 19

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth.

SLIDE 20

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 21

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

Multiplicative gating
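A minimal sketch of depth-quantized multiplicative gating (the equal-width binning, the branch ordering, and the helper names are assumptions made for illustration; the paper's exact scheme may differ):

```python
import numpy as np

RATES = [16, 8, 4, 2, 1]  # assumption: closer pixels (smaller depth) get larger dilation

def depth_gates(depth, n_scales=5):
    """Quantize a per-pixel depth map into one-hot gate masks, one mask per
    scale. Equal-width binning is an assumption made for illustration."""
    lo, hi = depth.min(), depth.max()
    bins = np.clip(((depth - lo) / (hi - lo + 1e-8) * n_scales).astype(int),
                   0, n_scales - 1)
    return np.stack([(bins == s).astype(float) for s in range(n_scales)])

def gated_pool(branch_outputs, gates):
    """Multiplicative gating: masking each branch by its gate and summing
    selects, per pixel, the single branch matching that pixel's depth scale."""
    return np.sum(branch_outputs * gates, axis=0)

depth = np.linspace(0.2, 5.0, 16).reshape(4, 4)    # toy per-pixel depth
gates = depth_gates(depth)                          # (5, 4, 4), one-hot per pixel
branches = np.stack([np.full((4, 4), float(r)) for r in RATES])
fused = gated_pool(branches, gates)                 # nearest pixel picks rate 16
```

Because the gates are one-hot across scales, the sum of masked branches reduces to a per-pixel selection of exactly one dilated branch.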

SLIDE 22

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 23

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 24

Perspective-aware Pooling

When depth is not available at inference time --

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 25

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 26

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 27

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

Why better? Capacity, representation power.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 28

Perspective-aware Pooling

Consistent improvement.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 29

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 30

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 31

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 32

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 33

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 34

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 35

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

gt-depth / pred-depth / attention
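The unsupervised attention idea can be sketched as a per-pixel softmax over the pooling branches, replacing the hard depth-derived gates with learned soft weights (names and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

def attention_pool(branch_outputs, attn_logits):
    """Soft attention over pooling branches: a per-pixel softmax across the
    branch axis gives weights that sum to 1, so the branch mixture can be
    learned end-to-end without any depth supervision."""
    e = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    attn = e / e.sum(axis=0, keepdims=True)          # (branches, H, W)
    return np.sum(branch_outputs * attn, axis=0)

branches = np.stack([np.full((4, 4), float(i)) for i in range(5)])
logits = np.zeros((5, 4, 4))                         # uniform attention
out = attention_pool(branches, logits)               # averages the branches
```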

SLIDE 36

Pixel-wise Attentional Gating

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 37

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.

SLIDE 38

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 39

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating -- computational saving.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
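Why binary gating saves computation, in a toy sketch: with a hard mask, a gated branch only needs to run on the active pixels, whereas a soft (weighted) gate must be evaluated everywhere (`sparse_apply` is a hypothetical helper for illustration):

```python
import numpy as np

def sparse_apply(x, mask, f):
    """Binary gating's computational saving: the branch f runs only on the
    active pixels, while inactive pixels keep a zero (gated-off) output.
    A weighted (soft) gate would require evaluating f at every pixel."""
    out = np.zeros_like(x)
    active = mask.astype(bool)
    out[active] = f(x[active])      # work is proportional to mask.sum()
    return out

x = np.arange(9.0).reshape(3, 3)
mask = np.eye(3)                    # keep only the diagonal pixels active
y = sparse_apply(x, mask, lambda v: v + 100.0)
```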
SLIDE 40

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax.

SLIDE 41

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training?

SLIDE 42

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training? Idea: the Gumbel-softmax trick [1,2].

[1] Categorical reparameterization with gumbel-softmax, ICLR, 2017
[2] The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 43

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 44

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 45

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π). Here is an alternative: z = argmax_i (log π_i + m_i), where

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 46

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π). Here is an alternative: z = argmax_i (log π_i + m_i), where the random variable m_i follows the Gumbel distribution: m = -log(-log u), u ~ Uniform(0, 1).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)
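The Gumbel-max trick above can be checked numerically: perturbing log-probabilities with Gumbel(0, 1) noise and taking the argmax reproduces the categorical distribution (a small Monte Carlo sketch; the toy probabilities are made up):

```python
import numpy as np

# Gumbel-max trick: z = argmax_i (log pi_i + m_i), with m_i ~ Gumbel(0, 1),
# draws exact samples from Categorical(pi).
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])

u = rng.uniform(1e-12, 1.0, size=(100_000, 3))
m = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
samples = np.argmax(np.log(pi) + m, axis=1)

freq = np.bincount(samples, minlength=3) / len(samples)
# empirical frequencies match pi up to Monte Carlo error
```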

SLIDE 47

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable.

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 48

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable. Expressing a discrete random variable as a one-hot vector,

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 49

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable. Expressing a discrete random variable as a one-hot vector, the Gumbel-softmax relaxation is y_i = exp((log π_i + m_i) / τ) / Σ_j exp((log π_j + m_j) / τ).

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 50

Pixel-wise Attentional Gating

As the temperature τ goes from 0 to inf, the relaxed sample moves from a one-hot-encoded categorical sample (argmax) toward a uniform distribution.

Categorical reparameterization with gumbel-softmax, ICLR, 2017
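The temperature behavior can be sketched directly from the relaxation formula (an illustrative NumPy version; practical implementations inside a deep net typically pair this with a straight-through estimator):

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Gumbel-softmax relaxation:
    y_i = exp((logits_i + m_i) / tau) / sum_j exp((logits_j + m_j) / tau),
    with Gumbel noise m = -log(-log u), u ~ Uniform(0, 1)."""
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    m = -np.log(-np.log(u))                   # Gumbel(0, 1) noise
    x = (logits + m) / tau
    e = np.exp(x - x.max())                   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
hard = gumbel_softmax(logits, tau=0.01, rng=rng)   # nearly one-hot (argmax)
soft = gumbel_softmax(logits, tau=100.0, rng=rng)  # nearly uniform
```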

SLIDE 51

Pixel-wise Attentional Gating

PAG produces binary masks. Two applications:
1. a parallel pooling branch for deciding the “right” receptive field
2. pixel-level dynamic routing

SLIDE 52

Attentional Pooling for the “Right” Receptive Field

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 53

Attentional Pooling for the “Right” Receptive Field

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating -- computational saving.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 54

Attentional Pooling for the “Right” Receptive Field

Visual summary of three tasks on three different datasets. Consistent result: PAG improves over the baseline and is comparable to multiplicative gating.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 55

Attentional Pooling for the “Right” Receptive Field

Semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 56

Pixel-Level Dynamic Routing

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 57

Pixel-Level Dynamic Routing

Dynamic computation time for each pixel of an image; useful under a limited computation budget.

SLIDE 58

Pixel-Level Dynamic Routing

SLIDE 59

Pixel-Level Dynamic Routing

Sparse PAG at each layer.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 60

Pixel-Level Dynamic Routing

Sparse binary masks: a KL-divergence term encourages mask sparsity.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
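One common way to encourage sparse masks with a KL term is to penalize the divergence between a target Bernoulli rate and the mean gate activation (a sketch under that assumption; `target_rate` is a hypothetical knob and the paper's exact regularizer may differ):

```python
import numpy as np

def sparsity_kl(gate_probs, target_rate):
    """KL divergence between a target Bernoulli rate and the mean gate
    activation, usable as a regularizer pushing masks toward sparsity.
    E.g. target_rate = 0.3 means 'keep roughly 30% of pixels active'."""
    p = target_rate
    q = np.clip(gate_probs.mean(), 1e-8, 1 - 1e-8)   # mean activation
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

dense = np.full((8, 8), 0.9)    # gates mostly on: heavily penalized
sparse = np.full((8, 8), 0.3)   # gates match the target: no penalty
```

Minimizing this term alongside the task loss trades a little accuracy for substantially fewer active gates.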
SLIDE 61

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 62

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 63

Pixel-Level Dynamic Routing

Learning to skip layers for dynamic routing.

[1] BlockDrop: Dynamic Inference Paths in Residual Networks
[2] Convolutional Networks with Adaptive Computation Graphs
[3] SkipNet: Learning Dynamic Routing in Convolutional Networks
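Pixel-level layer skipping can be sketched as a gated residual update: pixels with gate 0 bypass the layer's transformation entirely (an illustrative sketch, not the architecture of the paper or of the cited works):

```python
import numpy as np

def routed_layer(x, gate, f):
    """Pixel-level dynamic routing through one residual layer: gated pixels
    take the transformed path f(x); ungated pixels pass through unchanged,
    so the layer's computation can be skipped for them."""
    return gate * f(x) + (1.0 - gate) * x

x = np.arange(16.0).reshape(4, 4)
gate = np.eye(4)                       # sparse binary PAG mask (toy)
y = routed_layer(x, gate, lambda v: v * 2.0)
# where gate == 0, the layer is skipped and the input passes through
```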

SLIDE 64

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 65

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 66

Pixel-Level Dynamic Routing

Semantic segmentation on the NYU-depth-v2 dataset; boundary detection on the BSDS500 dataset.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 67

Pixel-Level Dynamic Routing

Visualization of sparse binary masks; the ponder map gives the per-pixel computation depth.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
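The ponder map mentioned above can be computed by summing the binary gate masks over layers (an illustrative sketch with toy masks):

```python
import numpy as np

def ponder_map(masks):
    """Per-pixel computation depth: summing the binary gate masks over all
    layers counts how many layers each pixel was routed through."""
    return np.sum(masks, axis=0)

# toy gates for a 6-layer network on a 4x4 image: only diagonal pixels active
masks = np.stack([np.eye(4)] * 6)
pm = ponder_map(masks)   # diagonal pixels pondered through all 6 layers
```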
SLIDE 68

Pixel-Level Dynamic Routing

indoor image / panorama / street scene

SLIDE 69

Discussion

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 70

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.

SLIDE 71

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance.

SLIDE 72

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. Enough savings? Real-time?

SLIDE 73

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance.
3. More attention to pixels for scene parsing.

SLIDE 74

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 75

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 76

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 77

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 78

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 79

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. Real-time?
3. More attention to pixels for scene parsing.
4. Potentially a unified model for all these tasks. How?

SLIDE 80

Thanks

Q&A

Charless Fowlkes, Shu Kong