Pay Attention to the Pixel, Understand the Scene Better
Shu Kong, CS, ICS, UCI
Background: Scene Parsing
pixel-level labeling (semantic segmentation): assigning a class label to each pixel
wall painting pillow sofa cabinet towel
Background: Scene Parsing
old days (before 2013): extracting features to represent pixels,
unary pixel classification and pixel pairs for CRF
features, transforms, grouping, etc.
Background: Scene Parsing
nowadays: deep learning
Background: Convolutional Neural Net
Convolutional Neural Network (CNN): aggregates local information and avoids computing over the whole image directly
- Y. Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.
Background: Receptive Field in CNN
Receptive field at high-level layers
photo credit to Honglak Lee
Background: CNN for Scene Parsing
Image classification: cat vs. dog
Dense prediction: wall, painting, pillow, sofa, cabinet, towel
Background: Perspective and Scale
size(car) > size(train)? size(chair) > size(whiteboard)?
Outline
- 1. Perspective-aware Pooling
- 2. Pixel-wise Attentional Gating (PAG)
  1. Attentional pooling for the “right” receptive field
  2. Pixel-level dynamic routing
- 3. Discussion
Perspective-aware Pooling
Goal: deciding, for each pixel, the size of the receptive field (RF) over which to aggregate information.
The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information.
Depth conveys the scale information.
Perspective-aware Pooling
Idea: making the pooling size adaptive w.r.t. depth, using dilated (atrous) convolution.
2D atrous convolution with different dilation rates allows a larger RF to aggregate more contextual information.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
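To make the link between dilation rate and receptive field concrete, here is a minimal sketch in plain Python. It uses a 1D signal rather than a 2D feature map to stay short, and the helper names `dilated_conv1d` and `receptive_field` are illustrative, not from the talk. A kernel with k taps and dilation rate r spans (k - 1) * r + 1 input positions.

```python
def receptive_field(kernel_size, rate):
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1D correlation whose kernel taps are `rate` positions apart."""
    span = receptive_field(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Tap i reads the input at offset i * rate from the window start.
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


signal = list(range(10))   # 0, 1, ..., 9
kernel = [1, 1, 1]         # 3-tap sum, same weights at every rate

rf_sizes = [receptive_field(len(kernel), r) for r in (1, 2, 4, 8, 16)]
rate1 = dilated_conv1d(signal, kernel, 1)
rate2 = dilated_conv1d(signal, kernel, 2)
```

The number of weights stays at three for every rate; only the spacing between taps changes, which is why a larger rate widens the context without adding parameters.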
Perspective-aware Pooling
Idea: making the pooling size adaptive w.r.t. depth
quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}
Multiplicative gating
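The depth quantization and multiplicative gating can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the authors' implementation: the uniform binning, the depth range arguments `d_min` and `d_max`, and the function names are all hypothetical. The only parts taken from the slides are the five rates and the rule that nearer pixels should get a larger receptive field.

```python
RATES = [1, 2, 4, 8, 16]


def depth_to_scale(depth, d_min, d_max, n_scales=len(RATES)):
    """Map a depth value to an index into RATES (assumed uniform bins).

    Nearer pixels get a larger index, hence a larger dilation rate.
    """
    t = (depth - d_min) / (d_max - d_min)   # 0 = nearest, 1 = farthest
    t = min(max(t, 0.0), 1.0)
    far_bin = min(int(t * n_scales), n_scales - 1)
    return (n_scales - 1) - far_bin


def gate(branch_outputs, scale_index):
    """Multiplicative gating: weight each branch by a one-hot gate, then sum."""
    gates = [1.0 if k == scale_index else 0.0 for k in range(len(branch_outputs))]
    return sum(g * f for g, f in zip(gates, branch_outputs))


# One pixel: hypothetical responses of the five dilated branches.
branches = [10.0, 20.0, 30.0, 40.0, 50.0]
near = depth_to_scale(0.5, d_min=0.0, d_max=10.0)
mid = depth_to_scale(5.0, d_min=0.0, d_max=10.0)
far = depth_to_scale(9.9, d_min=0.0, d_max=10.0)
picked = gate(branches, mid)
```

With a one-hot gate, the sum simply selects one branch per pixel; a soft (weighted) gate would blend the branches instead.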
Perspective-aware Pooling
When depth is not available at inference, the idea is to train a depth estimator.
- S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
Perspective-aware Pooling
Why better? capacity, representation power
Consistent improvement
Perspective-aware Pooling
Recurrent refinement by adapting the predicted depth
Perspective-aware Pooling
No depth at all? Idea: train an unsupervised attention map
Figure panels: ground-truth depth, predicted depth, attention
Pixel-wise Attentional Gating
Improvement
1. attention vs. depth: general, learning rule
2. binary gating vs. weighted gating: computational saving
- S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018
Pixel-wise Attentional Gating
PAG produces binary masks, e.g., using argmax on softmax.
How to output binary maps while still allowing end-to-end training?
Idea: Gumbel-softmax trick [1,2]
[1] Categorical reparameterization with gumbel-softmax, ICLR, 2017
[2] The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017
Pixel-wise Attentional Gating
The normal way of sampling is to use the argmax operator on softmax.
An alternative is to add a noise term m to each log-probability before taking the argmax, where the random variable m follows the Gumbel distribution.
Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)
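The Gumbel-max trick behind this slide can be sketched as follows. The sketch assumes the standard formulation: draw u uniformly in (0, 1), set m = -log(-log(u)), and take the argmax of log-probability plus m. The function names and the example probabilities are illustrative, and all probabilities are assumed strictly positive.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel sample as -log(-log(u)), u ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep both logs finite
    return -math.log(-math.log(u))


def gumbel_max_sample(probs, rng):
    """Sample a category index: argmax over log-probability plus Gumbel noise."""
    scores = [math.log(p) + sample_gumbel(rng) for p in probs]
    return max(range(len(probs)), key=scores.__getitem__)


rng = random.Random(0)
probs = [0.1, 0.2, 0.7]
n_draws = 20000
counts = [0] * len(probs)
for _ in range(n_draws):
    counts[gumbel_max_sample(probs, rng)] += 1
freqs = [c / n_draws for c in counts]
```

The empirical frequencies in `freqs` should approach `probs` as the number of draws grows. The argmax itself is still a hard, non-differentiable choice, which is what motivates the relaxation.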
Pixel-wise Attentional Gating
but the argmax is not continuous, not differentiable
expressing a discrete random variable as a one-hot vector
the Gumbel-softmax relaxation replaces the argmax with a softmax, so the sample becomes continuous and differentiable
Pixel-wise Attentional Gating
one-hot-encoded categorical distribution
from argmax to uniform, controlled by the temperature, which ranges from 0 to infinity
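The effect of the temperature can be illustrated with a small sketch of the Gumbel-softmax relaxation as it is usually stated: the softmax of (log-probability plus Gumbel noise) divided by the temperature. The function name, the example probabilities, and the two temperature values are assumptions chosen for illustration.

```python
import math
import random


def gumbel_softmax(probs, temperature, rng):
    """Relaxed one-hot sample: softmax((log p_k + m_k) / temperature)."""
    scaled = []
    for p in probs:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        m = -math.log(-math.log(u))              # standard Gumbel noise
        scaled.append((math.log(p) + m) / temperature)
    peak = max(scaled)                           # subtract max for stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.1, 0.2, 0.7]
# Same seed, so both calls see the same noise; only the temperature differs.
sharp = gumbel_softmax(probs, 0.01, random.Random(0))    # close to one-hot
flat = gumbel_softmax(probs, 1000.0, random.Random(0))   # close to uniform
```

A low temperature gives an almost binary mask, while a high temperature smooths the output toward the uniform vector. Every step is differentiable in the log-probabilities, which is what allows end-to-end training.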
Pixel-wise Attentional Gating
PAG produces binary masks. Two applications:
1. parallel pooling branches for deciding the “right” receptive field
2. pixel-level dynamic routing
Attentional Pooling for the “Right” Receptive Field
Improvement
1. attention vs. depth: general, learning rule
2. binary gating vs. weighted gating: computational saving
Attentional Pooling for the “Right” Receptive Field
Visual summary of three tasks on three different datasets
Consistent result: PAG improves over the baseline and is comparable to multiplicative gating.
- S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
Attentional Pooling for the “Right” Receptive Field
semantic segmentation
Pixel-Level Dynamic Routing
dynamic computation time for each pixel of an image
It is useful under a limited computation budget.
Pixel-Level Dynamic Routing
sparse PAG at each layer
sparse binary mask: a KL-divergence term is used to encourage sparse masks
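One way a KL-divergence term can push a binary mask toward a target sparsity is sketched below. The exact form used in the paper is not given on the slide, so this is an assumption: it compares a Bernoulli distribution at a hypothetical target activation rate with the observed fraction of active pixels. The function name `sparsity_kl` and the direction of the divergence are illustrative choices.

```python
import math


def sparsity_kl(mask, target_rate, eps=1e-8):
    """KL(Bernoulli(target_rate) || Bernoulli(observed_rate)) for a binary mask.

    Assumes 0 < target_rate < 1 and a non-empty 2D mask of 0/1 values.
    """
    flat = [v for row in mask for v in row]
    observed = sum(flat) / len(flat)
    observed = min(max(observed, eps), 1.0 - eps)   # avoid log(0)
    return (target_rate * math.log(target_rate / observed)
            + (1.0 - target_rate) * math.log((1.0 - target_rate) / (1.0 - observed)))


mask_sparse = [[1, 0, 0, 0],
               [0, 0, 1, 0]]    # 2 of 8 pixels active
mask_dense = [[1, 1, 1, 1],
              [1, 1, 1, 1]]     # every pixel active

kl_sparse = sparsity_kl(mask_sparse, target_rate=0.25)
kl_dense = sparsity_kl(mask_dense, target_rate=0.25)
```

The penalty is zero when the mask fires at exactly the target rate and grows as the mask becomes denser, so adding it to the task loss trades accuracy against computation.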
Pixel-Level Dynamic Routing
Experiment: semantic segmentation
Pixel-Level Dynamic Routing
learning to skip layers for dynamic routing
[1] BlockDrop: Dynamic Inference Paths in Residual Networks
[2] Convolutional Networks with Adaptive Computation Graphs
[3] SkipNet: Learning Dynamic Routing in Convolutional Networks
Pixel-Level Dynamic Routing
Semantic segmentation on the NYU-depth-v2 dataset
Boundary detection on the BSDS500 dataset
Pixel-Level Dynamic Routing
Visualization of sparse binary masks
ponder map = pixel-level computation depth
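Reading the ponder map as the number of layers that actually process each pixel, it can be computed by summing the per-layer binary masks. The sketch below assumes that reading; the function names and the toy 2x2 masks are made up for illustration.

```python
def ponder_map(layer_masks):
    """Sum per-layer binary masks into a per-pixel count of executed layers."""
    height, width = len(layer_masks[0]), len(layer_masks[0][0])
    ponder = [[0] * width for _ in range(height)]
    for mask in layer_masks:
        for r in range(height):
            for c in range(width):
                ponder[r][c] += mask[r][c]
    return ponder


def compute_fraction(layer_masks):
    """Fraction of (layer, pixel) evaluations that are actually executed."""
    total = sum(v for mask in layer_masks for row in mask for v in row)
    cells = len(layer_masks) * len(layer_masks[0]) * len(layer_masks[0][0])
    return total / cells


# Three hypothetical layers gating a 2x2 image.
masks = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
]
ponder = ponder_map(masks)
fraction = compute_fraction(masks)
```

Here the top-left pixel passes through all three layers while the top-right pixel skips every one, and half of all layer-pixel evaluations are skipped overall.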
Pixel-Level Dynamic Routing
Examples: indoor image, panorama, street scene
Discussion
1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general and agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. Enough savings? Real-time?
3. More attention to pixels for scene parsing.
More Attention to Pixels for Scene Parsing
photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot