Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Eslami, N. Heess, T. Weber, Y. Tassa, D. Szepesvari, K. Kavukcuoglu, G. E. Hinton
Nicolas Brandt, nbrandt@cs.toronto.edu

Origins
Structured generative methods:
+ : More easily interpretable
- : Inference hard and slow
Deep generative methods:
+ : Impressive samples and likelihood scores
- : Lack of interpretable meaning
How can we combine deep networks and structured probabilistic models to obtain interpretable representations while remaining time-efficient?

Principle
Many real-world scenes can be decomposed into objects. Thus, given an image $x$, we can make the modeling assumption that the underlying scene description $z$ is structured into groups of variables $z^i$. Each $z^i$ represents the attributes of one object in the scene (type, appearance, position...).

Principle
Given $x$ and a model $p_\theta(x|z)\,p_\theta(z)$ parameterized by $\theta$, we wish to recover $z$ by computing the posterior $p_\theta(z|x) = p_\theta(x|z)\,p_\theta(z) / p_\theta(x)$.
As the number of objects present in the image will most likely vary from one image to another, we place a prior $p_N(n)$ on the number of objects, so that
$$p_\theta(x) = \sum_{n=1}^{N} p_N(n) \int p_\theta(z|n)\, p_\theta(x|z)\, dz \qquad [1]$$
NB: We have to fix $N$, the maximum possible number of objects present in an image.

Principle
The inference network attends to one object at a time and is trained jointly with the generative model.

Inference Network
Most of the time, equation [1] is intractable, so the true posterior has to be approximated. We learn a distribution $q_\Phi(z,n|x)$, parameterized by $\Phi$, that minimizes $\mathrm{KL}[q_\Phi(z,n|x)\,\|\,p_\theta(z,n|x)]$ (amortized variational approximation, as in a VAE).
Nevertheless, in order to use this approximation we have to resolve two other problems.

Inference Network
Trans-dimensionality: Amortized variational approximation is normally used with a latent space of fixed size; here the size is itself a random variable. We have to evaluate $p_N(n|x) = \int p_\theta(z,n|x)\,dz$ for $n = 1, \dots, N$.
Symmetry: As the index of each object is arbitrary, there are several equivalent assignments of the objects appearing in an image $x$ to the latent variables $z^i$.
To resolve these issues, we use an iterative process implemented as a recurrent neural network. This network is run for $N$ steps and infers at each step the attributes of one object, given the image and what it already knows about the other objects in the image.

Inference Network
If we consider a vector $z_{pres}$ composed of $n$ ones followed by a zero, we can work with $q_\Phi(z, z_{pres}|x)$ instead of $q_\Phi(z,n|x)$. This representation simplifies the sequential reasoning: $z_{pres}$ acts as a stop signal. As long as the network $q_\Phi$ outputs $z_{pres} = 1$, it should describe at least one more object; once it outputs $z_{pres} = 0$, all objects have been described.
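The unary encoding and the stopping rule can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: `toy_step` stands in for one step of the recurrent inference network and simply reports the next non-zero entry of a 1-D "image"; it is not the network from the paper.

```python
import numpy as np

def count_to_z_pres(n):
    """Encode an object count n as n ones followed by a single zero."""
    return np.array([1] * n + [0], dtype=int)

def z_pres_to_count(z_pres):
    """Recover n by counting the ones that precede the first zero."""
    count = 0
    for flag in z_pres:
        if flag == 0:
            break
        count += 1
    return count

def run_inference(step_fn, image, max_steps):
    """Call step_fn until it emits z_pres = 0 or max_steps objects are described."""
    state = None
    objects = []
    for _ in range(max_steps):
        z_pres, z, state = step_fn(image, state)
        if z_pres == 0:  # stop signal: every object has been described
            break
        objects.append(z)
    return objects

def toy_step(image, state):
    """Hypothetical stand-in for one network step on a 1-D toy 'image'."""
    start = 0 if state is None else state + 1
    for idx in range(start, len(image)):
        if image[idx] != 0:
            return 1, (idx, image[idx]), idx  # one more object found
    return 0, None, state                     # nothing left to describe

print(count_to_z_pres(2))
print(run_inference(toy_step, [0, 3, 0, 7], max_steps=3))
```

The `state` argument plays the role of the recurrent memory: it is what lets each step avoid describing an object that an earlier step has already explained.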

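Learning process
The parameters $\theta$ (model) and $\Phi$ (inference network) can be jointly optimized with gradient-based methods in order to maximize the negative free energy:
$$\mathcal{L}(\theta,\Phi) = \mathbb{E}_{q_\Phi}\left[\log \frac{p_\theta(x,z,n)}{q_\Phi(z,n|x)}\right]$$
If $p_\theta$ is differentiable in $\theta$, it is possible to compute a Monte Carlo estimate of $\partial\mathcal{L}/\partial\theta$. Computing $\partial\mathcal{L}/\partial\Phi$ is a bit more complex.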
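As a worked step, the standard variational identity below shows why maximizing the negative free energy, written here as $\mathcal{L}(\theta,\Phi)$, also minimizes the KL divergence between $q_\Phi$ and the true posterior. It uses only the quantities already defined and is a sketch of the usual argument rather than a derivation taken from the slides.

```latex
\begin{align*}
\log p_\theta(x)
  &= \mathbb{E}_{q_\Phi(z,n|x)}\left[\log \frac{p_\theta(x,z,n)}{q_\Phi(z,n|x)}\right]
   + \mathbb{E}_{q_\Phi(z,n|x)}\left[\log \frac{q_\Phi(z,n|x)}{p_\theta(z,n|x)}\right] \\
  &= \mathcal{L}(\theta,\Phi)
   + \mathrm{KL}\left[q_\Phi(z,n|x)\,\|\,p_\theta(z,n|x)\right]
   \;\geq\; \mathcal{L}(\theta,\Phi)
\end{align*}
```

Since $\log p_\theta(x)$ does not depend on $\Phi$, raising $\mathcal{L}$ with respect to $\Phi$ necessarily lowers the KL term.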

Learning process
For a step $i$, let $w^i$ denote the parameters of the distribution over $(z^i_{pres}, z^i)$. By the chain rule, the gradient of the objective $\mathcal{L}$ is:
$$\frac{\partial\mathcal{L}}{\partial\Phi} = \sum_i \frac{\partial\mathcal{L}}{\partial w^i}\,\frac{\partial w^i}{\partial\Phi}$$
Now, if we consider an arbitrary element $z^i$ of $(z^i_{pres}, z^i)$, the way we compute $\partial\mathcal{L}/\partial w^i$ depends on whether $z^i$ is continuous (e.g. position) or discrete (e.g. $z^i_{pres}$).
Continuous: we use the reparameterization trick in order to back-propagate through $z^i$.
Discrete: we use the likelihood ratio estimator.
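The two estimators can be compared on toy problems whose gradients are known in closed form. The NumPy sketch below is illustrative only: the test functions, the unit-variance Gaussian, and the function names are assumptions chosen for the example, not part of the AIR model.

```python
import numpy as np

def reparam_grad(mu, f_prime, n_samples, rng):
    """Estimate d/dmu E_{z ~ N(mu, 1)}[f(z)] with the reparameterization trick.

    Writing z = mu + eps with eps ~ N(0, 1) gives dz/dmu = 1, so the
    gradient is simply the average of f'(z) over the samples.
    """
    eps = rng.standard_normal(n_samples)
    z = mu + eps
    return np.mean(f_prime(z))

def likelihood_ratio_grad(p, f, n_samples, rng):
    """Estimate d/dp E_{b ~ Bernoulli(p)}[f(b)] with the likelihood ratio estimator.

    Uses grad E[f(b)] = E[f(b) * d/dp log Bernoulli(b; p)].
    """
    b = (rng.random(n_samples) < p).astype(float)
    score = b / p - (1.0 - b) / (1.0 - p)  # d/dp log Bernoulli(b; p)
    return np.mean(f(b) * score)

rng = np.random.default_rng(0)

# Continuous case: f(z) = z^2, so E[f] = mu^2 + 1 and the true gradient is 2 * mu.
g_cont = reparam_grad(1.5, lambda z: 2.0 * z, 200_000, rng)

# Discrete case: E[f] = p * f(1) + (1 - p) * f(0), so the true gradient is f(1) - f(0).
f = lambda b: (b - 0.2) ** 2
g_disc = likelihood_ratio_grad(0.3, f, 200_000, rng)

print(g_cont, g_disc)
```

The likelihood ratio estimator needs only values of `f`, not its derivative, which is why it applies to discrete variables; the price is usually a higher variance than the reparameterized estimate.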

Experiment: MNIST digits
Objective: Learn to detect and generate the constituent digits from scratch.
In this experiment, we consider $N = 3$. In practice, each image contains only 0, 1 or 2 digits.
Here, $z^i = (z^i_{what}, z^i_{where})$, where $z^i_{what}$ encodes what the digit is (its appearance) and $z^i_{where}$ is a 3-dimensional vector (scale and position of the digit).
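One way to read a 3-dimensional $z^i_{where}$ is as the parameters of a scale-and-shift affine map that places an object's patch into the image. The sketch below assumes the ordering (scale, shift_x, shift_y) and normalized coordinates in [-1, 1]; both are illustrative assumptions, since the slides do not state the exact parameterization.

```python
import numpy as np

def z_where_to_affine(z_where):
    """Build a 2x3 affine matrix from z_where = (scale, shift_x, shift_y)."""
    s, tx, ty = z_where
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty]])

def object_to_image_coords(z_where, points):
    """Map (K, 2) points from the object's frame into the image frame."""
    points = np.asarray(points, dtype=float)
    # Append a constant 1 so the shift is applied by the matrix product.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # (K, 3)
    return homogeneous @ z_where_to_affine(z_where).T              # (K, 2)

# Two opposite corners of the object patch, in normalized coordinates.
corners = [[-1.0, -1.0], [1.0, 1.0]]
print(object_to_image_coords((0.5, 0.2, -0.1), corners))
```

With a scale of 0.5 the patch covers half of the image extent along each axis, and the two shifts move its centre away from the image centre.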

Experiment: MNIST digits
Generative Model:

Experiment: MNIST digits
Inference Network:

Experiment: MNIST digits
Interaction between Inference and Generation networks:

Experiment: MNIST digits
Result:
source: https://www.youtube.com/watch?v=4tc84kKdpY4&feature=youtu.be&t=60

Generalization
When the model is trained only on images containing 0, 1 or 2 digits, it is not able to infer the correct count when given an image with 3 digits: during training, the model has learnt not to expect more than 2 digits.
How can we improve the generalization?

Differential AIR

Conclusion
This model structure manages to keep an interpretable representation while allowing fast inference (5.6 ms for MNIST).
Nevertheless, some challenges remain:
● Dealing with the reconstruction loss
● Not limiting the maximum number of objects

Thank you for your attention!
