Pay Attention to the Pixel, Understand the Scene Better Shu Kong - - PowerPoint PPT Presentation



SLIDE 1

Pay Attention to the Pixel, Understand the Scene Better

Shu Kong
CS, ICS, UCI

SLIDE 2

Background: Scene Parsing

Pixel-level labeling (semantic segmentation): assigning a class label to each pixel.

wall, painting, pillow, sofa, cabinet, towel

SLIDE 3

Background: Scene Parsing

  • Old days (before 2013): extracting features to represent pixels, unary pixel classification, and pixel pairs for a CRF
  • features, transforms, grouping, etc.

SLIDE 4

Background: Scene Parsing

Nowadays: deep learning.

SLIDE 5

Background: Convolutional Neural Network

Convolutional Neural Network (CNN): aggregating local information avoids computing over the whole image directly.

  • Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.
SLIDE 6

Background: Receptive Field in CNN

Receptive field at high-level layers.

photo credit to Honglak Lee

SLIDE 7

Background: CNN for Scene Parsing

Image classification: cat vs. dog.

SLIDE 8

Background: CNN for Scene Parsing

Dense prediction:

wall, painting, pillow, sofa, cabinet, towel

SLIDE 9

Background: Perspective and Scale

size(car) > size(train)? size(chair) > size(whiteboard)?

SLIDE 10

Outline

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 11

Outline

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 12

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information.

SLIDE 13

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information.

SLIDE 14

Perspective-aware Pooling

Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information. Depth conveys the scale information.

SLIDE 15

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth.

SLIDE 16

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth, using dilated convolution (atrous convolution).

SLIDE 17

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth, using dilated convolution (atrous convolution).

SLIDE 18

Perspective-aware Pooling

2D atrous convolution with different dilation rates, allowing a larger RF to aggregate more contextual information.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
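The effect of the dilation rate can be sketched with a minimal NumPy implementation (an illustrative sketch, not the DeepLab code; `dilated_conv2d` is a name made up here): spacing the kernel taps `rate` pixels apart enlarges the receptive field without adding parameters.

```python
import numpy as np

def dilated_conv2d(image, kernel, rate):
    """'Valid' 2D cross-correlation whose kernel taps are spaced `rate`
    pixels apart: a k x k kernel with dilation rate r covers a receptive
    field of (k - 1) * r + 1 pixels per side, with no extra parameters."""
    k = kernel.shape[0]
    span = (k - 1) * rate + 1                       # effective RF per side
    H, W = image.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + span:rate, j:j + span:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(64.0).reshape(8, 8)
kernel = np.ones((3, 3))
print(dilated_conv2d(image, kernel, rate=1).shape)  # (6, 6): 3x3 RF
print(dilated_conv2d(image, kernel, rate=2).shape)  # (4, 4): 5x5 RF
```

With the rates {1, 2, 4, 8, 16} from the following slides, the same 3x3 kernel covers receptive fields of 3, 5, 9, 17, and 33 pixels per side.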

SLIDE 19

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth.

SLIDE 20

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 21

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

Multiplicative gating
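A minimal sketch of depth-quantized multiplicative gating (the equal-width binning, the branch ordering, and the helper names are assumptions made for illustration; the paper's exact scheme may differ):

```python
import numpy as np

RATES = [16, 8, 4, 2, 1]  # assumption: closer pixels (smaller depth) get larger dilation

def depth_gates(depth, n_scales=5):
    """Quantize a per-pixel depth map into one-hot gate masks, one mask per
    scale. Equal-width binning is an assumption made for illustration."""
    lo, hi = depth.min(), depth.max()
    bins = np.clip(((depth - lo) / (hi - lo + 1e-8) * n_scales).astype(int),
                   0, n_scales - 1)
    return np.stack([(bins == s).astype(float) for s in range(n_scales)])

def gated_pool(branch_outputs, gates):
    """Multiplicative gating: masking each branch by its gate and summing
    selects, per pixel, the single branch matching that pixel's depth scale."""
    return np.sum(branch_outputs * gates, axis=0)

depth = np.linspace(0.2, 5.0, 16).reshape(4, 4)    # toy per-pixel depth
gates = depth_gates(depth)                          # (5, 4, 4), one-hot per pixel
branches = np.stack([np.full((4, 4), float(r)) for r in RATES])
fused = gated_pool(branches, gates)                 # nearest pixel picks rate 16
```

Because the gates are one-hot across scales, the sum of masked branches reduces to a per-pixel selection of exactly one dilated branch.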

SLIDE 22

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 23

Perspective-aware Pooling

Idea: making the pooling size adaptive w.r.t. depth; quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

SLIDE 24

Perspective-aware Pooling

When depth is not available at inference time --

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 25

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 26

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 27

Perspective-aware Pooling

When depth is not available at inference time -- Idea: train a depth estimator.

Why better? Capacity, representation power.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 28

Perspective-aware Pooling

Consistent improvement.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 29

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 30

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 31

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 32

Perspective-aware Pooling

Recurrent refinement by adapting the predicted depth.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
SLIDE 33

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 34

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

SLIDE 35

Perspective-aware Pooling

No depth at all? Idea: train an unsupervised attention map.

  • S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

gt-depth / pred-depth / attention
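The unsupervised attention idea can be sketched as a per-pixel softmax over the pooling branches, replacing the hard depth-derived gates with learned soft weights (names and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

def attention_pool(branch_outputs, attn_logits):
    """Soft attention over pooling branches: a per-pixel softmax across the
    branch axis gives weights that sum to 1, so the branch mixture can be
    learned end-to-end without any depth supervision."""
    e = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    attn = e / e.sum(axis=0, keepdims=True)          # (branches, H, W)
    return np.sum(branch_outputs * attn, axis=0)

branches = np.stack([np.full((4, 4), float(i)) for i in range(5)])
logits = np.zeros((5, 4, 4))                         # uniform attention
out = attention_pool(branches, logits)               # averages the branches
```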

SLIDE 36

Pixel-wise Attentional Gating

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 37

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.

SLIDE 38

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 39

Pixel-wise Attentional Gating

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating -- computational saving.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
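Why binary gating saves computation, in a toy sketch: with a hard mask, a gated branch only needs to run on the active pixels, whereas a soft (weighted) gate must be evaluated everywhere (`sparse_apply` is a hypothetical helper for illustration):

```python
import numpy as np

def sparse_apply(x, mask, f):
    """Binary gating's computational saving: the branch f runs only on the
    active pixels, while inactive pixels keep a zero (gated-off) output.
    A weighted (soft) gate would require evaluating f at every pixel."""
    out = np.zeros_like(x)
    active = mask.astype(bool)
    out[active] = f(x[active])      # work is proportional to mask.sum()
    return out

x = np.arange(9.0).reshape(3, 3)
mask = np.eye(3)                    # keep only the diagonal pixels active
y = sparse_apply(x, mask, lambda v: v + 100.0)
```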
SLIDE 40

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax.

SLIDE 41

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training?

SLIDE 42

Pixel-wise Attentional Gating

PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training? Idea: the Gumbel-softmax trick [1,2].

[1] Categorical reparameterization with gumbel-softmax, ICLR, 2017
[2] The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 43

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 44

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 45

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π). Here is an alternative: z = argmax_i (log π_i + m_i), where

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

SLIDE 46

Pixel-wise Attentional Gating

The normal way of sampling uses the argmax operator on softmax probabilities π = softmax(α). Then, a sample is drawn as z ~ Categorical(π). Here is an alternative: z = argmax_i (log π_i + m_i), where the random variable m_i follows the Gumbel distribution: m = -log(-log u), u ~ Uniform(0, 1).

Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)
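The Gumbel-max trick above can be checked numerically: perturbing log-probabilities with Gumbel(0, 1) noise and taking the argmax reproduces the categorical distribution (a small Monte Carlo sketch; the toy probabilities are made up):

```python
import numpy as np

# Gumbel-max trick: z = argmax_i (log pi_i + m_i), with m_i ~ Gumbel(0, 1),
# draws exact samples from Categorical(pi).
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])

u = rng.uniform(1e-12, 1.0, size=(100_000, 3))
m = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
samples = np.argmax(np.log(pi) + m, axis=1)

freq = np.bincount(samples, minlength=3) / len(samples)
# empirical frequencies match pi up to Monte Carlo error
```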

SLIDE 47

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable.

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 48

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable. Expressing a discrete random variable as a one-hot vector,

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 49

Pixel-wise Attentional Gating

But argmax is not continuous, not differentiable. Expressing a discrete random variable as a one-hot vector, the Gumbel-softmax relaxation is y_i = exp((log π_i + m_i) / τ) / Σ_j exp((log π_j + m_j) / τ).

Categorical reparameterization with gumbel-softmax, ICLR, 2017
The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

SLIDE 50

Pixel-wise Attentional Gating

As the temperature τ goes from 0 to inf, the relaxed sample moves from a one-hot-encoded categorical sample (argmax) toward a uniform distribution.

Categorical reparameterization with gumbel-softmax, ICLR, 2017
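The temperature behavior can be sketched directly from the relaxation formula (an illustrative NumPy version; practical implementations inside a deep net typically pair this with a straight-through estimator):

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Gumbel-softmax relaxation:
    y_i = exp((logits_i + m_i) / tau) / sum_j exp((logits_j + m_j) / tau),
    with Gumbel noise m = -log(-log u), u ~ Uniform(0, 1)."""
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    m = -np.log(-np.log(u))                   # Gumbel(0, 1) noise
    x = (logits + m) / tau
    e = np.exp(x - x.max())                   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
hard = gumbel_softmax(logits, tau=0.01, rng=rng)   # nearly one-hot (argmax)
soft = gumbel_softmax(logits, tau=100.0, rng=rng)  # nearly uniform
```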

SLIDE 51

Pixel-wise Attentional Gating

PAG produces binary masks. Two applications:
1. a parallel pooling branch for deciding the “right” receptive field
2. pixel-level dynamic routing

SLIDE 52

Attentional Pooling for the “Right” Receptive Field

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 53

Attentional Pooling for the “Right” Receptive Field

Improvement 1: attention vs. depth -- general, a learned rule.
Improvement 2: binary gating vs. weighted gating -- computational saving.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 54

Attentional Pooling for the “Right” Receptive Field

Visual summary of three tasks on three different datasets. Consistent result: PAG improves over the baseline and is comparable to multiplicative gating.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 55

Attentional Pooling for the “Right” Receptive Field

Semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 56

Pixel-Level Dynamic Routing

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 57

Pixel-Level Dynamic Routing

Dynamic computation time for each pixel of an image; useful under a limited computation budget.

SLIDE 58

Pixel-Level Dynamic Routing

SLIDE 59

Pixel-Level Dynamic Routing

Sparse PAG at each layer.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 60

Pixel-Level Dynamic Routing

Sparse binary masks: a KL-divergence term encourages mask sparsity.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
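One common way to encourage sparse masks with a KL term is to penalize the divergence between a target Bernoulli rate and the mean gate activation (a sketch under that assumption; `target_rate` is a hypothetical knob and the paper's exact regularizer may differ):

```python
import numpy as np

def sparsity_kl(gate_probs, target_rate):
    """KL divergence between a target Bernoulli rate and the mean gate
    activation, usable as a regularizer pushing masks toward sparsity.
    E.g. target_rate = 0.3 means 'keep roughly 30% of pixels active'."""
    p = target_rate
    q = np.clip(gate_probs.mean(), 1e-8, 1 - 1e-8)   # mean activation
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

dense = np.full((8, 8), 0.9)    # gates mostly on: heavily penalized
sparse = np.full((8, 8), 0.3)   # gates match the target: no penalty
```

Minimizing this term alongside the task loss trades a little accuracy for substantially fewer active gates.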
SLIDE 61

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 62

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 63

Pixel-Level Dynamic Routing

Learning to skip layers for dynamic routing.

[1] BlockDrop: Dynamic Inference Paths in Residual Networks
[2] Convolutional Networks with Adaptive Computation Graphs
[3] SkipNet: Learning Dynamic Routing in Convolutional Networks
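Pixel-level layer skipping can be sketched as a gated residual update: pixels with gate 0 bypass the layer's transformation entirely (an illustrative sketch, not the architecture of the paper or of the cited works):

```python
import numpy as np

def routed_layer(x, gate, f):
    """Pixel-level dynamic routing through one residual layer: gated pixels
    take the transformed path f(x); ungated pixels pass through unchanged,
    so the layer's computation can be skipped for them."""
    return gate * f(x) + (1.0 - gate) * x

x = np.arange(16.0).reshape(4, 4)
gate = np.eye(4)                       # sparse binary PAG mask (toy)
y = routed_layer(x, gate, lambda v: v * 2.0)
# where gate == 0, the layer is skipped and the input passes through
```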

SLIDE 64

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 65

Pixel-Level Dynamic Routing

Experiments on semantic segmentation.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018

SLIDE 66

Pixel-Level Dynamic Routing

Semantic segmentation on the NYU-depth-v2 dataset; boundary detection on the BSDS500 dataset.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
SLIDE 67

Pixel-Level Dynamic Routing

Visualization of sparse binary masks; the ponder map gives the per-pixel computation depth.

  • S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arXiv 1805.01556, 2018
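The ponder map mentioned above can be computed by summing the binary gate masks over layers (an illustrative sketch with toy masks):

```python
import numpy as np

def ponder_map(masks):
    """Per-pixel computation depth: summing the binary gate masks over all
    layers counts how many layers each pixel was routed through."""
    return np.sum(masks, axis=0)

# toy gates for a 6-layer network on a 4x4 image: only diagonal pixels active
masks = np.stack([np.eye(4)] * 6)
pm = ponder_map(masks)   # diagonal pixels pondered through all 6 layers
```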
SLIDE 68

Pixel-Level Dynamic Routing

indoor image / panorama / street scene

SLIDE 69

Discussion

  • 1. Perspective-aware Pooling
  • 2. Pixel-wise Attentional Gating (PAG)
      1. Attentional Pooling for the “right” receptive field
      2. Pixel-level dynamic routing
  • 3. Discussion

SLIDE 70

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.

SLIDE 71

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance.

SLIDE 72

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. Enough savings? Real-time?

SLIDE 73

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance.
3. More attention to pixels for scene parsing.

SLIDE 74

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 75

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 76

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 77

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 78

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

SLIDE 79

Discussion

1. The Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation across pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. Real-time?
3. More attention to pixels for scene parsing.
4. Potentially a unified model for all these tasks. How?

SLIDE 80

Thanks

Q&A

Charless Fowlkes, Shu Kong