
SLIDE 1

Sequential Attend, Infer, Repeat:

Generative Modelling of Moving Objects

NeurIPS 2018

Poster #24

Adam R. Kosiorek¹,², Hyunjik Kim², Ingmar Posner¹, Yee Whye Teh²

¹ Applied AI Lab, Oxford Robotics Institute
² Department of Statistics, University of Oxford

SLIDE 2

Attend, Infer, Repeat¹

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.


SLIDE 7


Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

Variational Autoencoder (VAE)
Decomposes an image into objects
Explains each object with a separate latent variable
Here, we have two objects, with superscripts 1 and 4
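To make "decomposes an image into objects" concrete, here is a minimal sketch of how an AIR-style decoder could compose an image from per-object pieces. It is an illustration under assumptions, not the authors' code: each object carries a small appearance glimpse (standing in for its decoded "what" latent), a (scale, x, y) placement (standing in for "where") and a presence flag. Nearest-neighbour upsampling replaces the spatial transformer that AIR actually uses, and the names render, glimpse and present are hypothetical.

```python
# Hypothetical AIR-style compositional renderer (illustrative only).
# Each object: a small appearance "glimpse", a placement (scale, x, y) and a
# presence flag. Nearest-neighbour upsampling stands in for the spatial
# transformer that AIR uses; object contributions are summed and clipped to 1.


def render(objects, height, width):
    """Compose present objects onto an empty (all-zero) canvas."""
    canvas = [[0.0] * width for _ in range(height)]
    for obj in objects:
        if not obj["present"]:  # absent objects contribute nothing
            continue
        glimpse = obj["glimpse"]
        scale, x0, y0 = obj["where"]
        rows, cols = len(glimpse), len(glimpse[0])
        for r in range(int(rows * scale)):
            for c in range(int(cols * scale)):
                y, x = y0 + r, x0 + c
                if 0 <= y < height and 0 <= x < width:
                    # nearest-neighbour lookup into the object's glimpse
                    value = glimpse[int(r / scale)][int(c / scale)]
                    canvas[y][x] = min(1.0, canvas[y][x] + value)
    return canvas


# Two visible copies of the same tiny "digit" and one absent object.
digit = [[1.0, 1.0],
         [0.0, 1.0]]
objects = [
    {"present": True, "glimpse": digit, "where": (2, 0, 0)},   # large, top-left
    {"present": True, "glimpse": digit, "where": (1, 4, 4)},   # small, bottom-right
    {"present": False, "glimpse": digit, "where": (1, 2, 2)},  # does not exist
]
canvas = render(objects, height=6, width=6)
for row in canvas:
    print(" ".join("#" if v > 0 else "." for v in row))
```

The image is simply the sum of its objects, so each latent variable only has to explain its own part of the scene.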



SLIDE 11


AIR: Latent Variables

Objects are explained by separate latent variables:
what: Gaussian, what does it look like?
where: Gaussian, where is it and how big is it?
presence: Bernoulli, does it exist?
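The three kinds of latent variables can be illustrated by sampling them from a simple prior. This is a minimal sketch, not the trained model: the dimensionality of "what", the constant presence probability p_present and the function name sample_air_prior are hypothetical. Following AIR, objects are generated one at a time and generation stops at the first object whose presence variable is 0, so the number of objects is itself random.

```python
import random


def sample_air_prior(max_objects=3, p_present=0.5, what_dim=4, seed=None):
    """Sample per-object latents (what, where) with Bernoulli presence.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    objects = []
    for _ in range(max_objects):
        # presence: Bernoulli(p_present); the first "absent" draw ends the scene
        if rng.random() >= p_present:
            break
        # what: Gaussian appearance code (what does it look like?)
        z_what = [rng.gauss(0.0, 1.0) for _ in range(what_dim)]
        # where: Gaussian (scale, x, y) before any squashing (where is it, how big?)
        z_where = [rng.gauss(0.0, 1.0) for _ in range(3)]
        objects.append({"what": z_what, "where": z_where})
    return objects


scene = sample_air_prior(seed=0)
print(f"{len(scene)} object(s) sampled")
```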

SLIDE 12

Sequential Attend, Infer, Repeat


SLIDE 16


SQAIR: Generative Model

Sequential Attend, Infer, Repeat (SQAIR) extends AIR to image sequences
Like AIR, it models objects with separate latent variables
Objects can appear and disappear in every frame
Here, object 4 appeared and object 3 disappeared in frame t
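Appearance and disappearance can be sketched as one step of a sequential prior with two parts, in the spirit of SQAIR's propagation and discovery: existing objects either survive (keeping their identity while their latents change slightly) or disappear, and new objects may then be added AIR-style. This is a simplified toy, not SQAIR's learned model: the fixed probabilities p_survive and p_new, the Gaussian random-walk updates, the object cap and all function names are assumptions made for illustration. In this sketch a disappeared object never comes back and identities are never reused.

```python
import random
from itertools import count


def sqair_prior_step(prev_objects, new_ids, rng, p_survive=0.9, p_new=0.3,
                     max_objects=4, what_dim=4):
    """One toy prior step: propagate old objects, then discover new ones."""
    objects = []
    # Propagation: each surviving object keeps its id; latents drift slightly.
    for obj in prev_objects:
        if rng.random() < p_survive:
            objects.append({
                "id": obj["id"],
                "what": [v + rng.gauss(0.0, 0.01) for v in obj["what"]],
                "where": [w + rng.gauss(0.0, 0.1) for w in obj["where"]],
            })
    # Discovery: add fresh objects while Bernoulli presence draws succeed.
    while len(objects) < max_objects and rng.random() < p_new:
        objects.append({
            "id": next(new_ids),
            "what": [rng.gauss(0.0, 1.0) for _ in range(what_dim)],
            "where": [rng.gauss(0.0, 1.0) for _ in range(3)],
        })
    return objects


def sample_sequence(num_frames, seed=0):
    """Sample a list of frames, each a list of object latents."""
    rng = random.Random(seed)
    new_ids = count(1)  # identities are never reused
    frames, objects = [], []
    for _ in range(num_frames):
        objects = sqair_prior_step(objects, new_ids, rng)
        frames.append(objects)
    return frames


for t, frame in enumerate(sample_sequence(6, seed=2)):
    print(t, sorted(obj["id"] for obj in frame))
```

Because surviving objects carry their id from frame to frame, identity is maintained by construction rather than by matching detections after the fact.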


SLIDE 20


MNIST: Reconstructions

SQAIR can model sequences of moving objects like this one
Any VAE could reconstruct it, but SQAIR:
uses one latent variable per object
knows their locations
maintains object identity over time (unlike AIR)


SLIDE 23


MNIST: Samples

Once trained, we can sample from SQAIR to check what the model has learned:
Object appearance does not change between frames
Motion is consistent with the motion patterns in the training set


SLIDE 25


MNIST: Conditional Generation

Condition the model on three frames
Predict the next 97 frames by sampling from the prior
For every conditioning sequence, we can imagine different rollouts
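Conditional generation can be illustrated with a toy motion prior for a single object's position: estimate the motion from the conditioning frames, then extrapolate by repeatedly sampling from the prior. Different random seeds give different rollouts of the same conditioning sequence. This is a sketch under assumptions: the constant-velocity prior with Gaussian velocity noise, the noise scale and the function name rollout are hypothetical, whereas SQAIR conditions on the latents it infers for the observed frames and samples from its learned prior.

```python
import random


def rollout(observed, num_steps, seed, noise=0.05):
    """Extrapolate (x, y) positions by sampling from a toy motion prior.

    observed: at least two conditioning positions, oldest first.
    """
    rng = random.Random(seed)
    n = len(observed)
    # average velocity over the conditioning frames
    vx = (observed[-1][0] - observed[0][0]) / (n - 1)
    vy = (observed[-1][1] - observed[0][1]) / (n - 1)
    x, y = observed[-1]
    predicted = []
    for _ in range(num_steps):
        # prior: velocity follows a Gaussian random walk
        vx += rng.gauss(0.0, noise)
        vy += rng.gauss(0.0, noise)
        x, y = x + vx, y + vy
        predicted.append((x, y))
    return predicted


conditioning = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]  # three observed positions
rollouts = [rollout(conditioning, num_steps=97, seed=s) for s in range(3)]
for s, path in enumerate(rollouts):
    x, y = path[-1]
    print(f"rollout {s}: final position ({x:.1f}, {y:.1f})")
```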

SLIDE 32


SQAIR vs AIR

Reconstruction from partial observations

[Panels: SQAIR, AIR]

Disentangling overlapping objects

[Panels: SQAIR, AIR]

missing objects!

SLIDE 33

Real World Data: Unsupervised Detection & Tracking of Pedestrians


SLIDE 36


DukeMTMC: Reconstructions

DukeMTMC dataset² contains videos from static CCTV cameras
Pre-process by removing backgrounds and inverting colours
SQAIR learns to detect & track pedestrians without human supervision!

² Ristani et al., “Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking”, ECCV workshop, 2016.
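The slide does not say how the backgrounds were removed. One common approach for static cameras, sketched here purely as an assumption, estimates the background as the per-pixel median over time, paints pixels close to it white, and then inverts intensities so the background becomes 0 and pedestrians stand out against an empty canvas. The threshold value, the grayscale list-of-lists input and the function name preprocess are hypothetical.

```python
from statistics import median


def preprocess(frames, threshold=0.1):
    """Median-background removal followed by colour inversion (illustrative).

    frames: list of H x W grayscale images with intensities in [0, 1].
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Static camera: the per-pixel median over time approximates the background.
    background = [[median(f[r][c] for f in frames) for c in range(width)]
                  for r in range(height)]
    processed = []
    for f in frames:
        out = []
        for r in range(height):
            row = []
            for c in range(width):
                near_background = abs(f[r][c] - background[r][c]) < threshold
                value = 1.0 if near_background else f[r][c]  # paint background white
                row.append(1.0 - value)  # invert: white background becomes 0
            out.append(row)
        processed.append(out)
    return processed


# Three 1 x 3 frames of a grey (0.75) scene; a dark "pedestrian" (0.25) passes once.
frames = [[[0.75, 0.75, 0.75]],
          [[0.75, 0.25, 0.75]],
          [[0.75, 0.75, 0.75]]]
print(preprocess(frames))
```

After inversion the empty scene is all zeros, which suits a model that renders an image as a sum of objects on an empty canvas.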


SLIDE 39


DukeMTMC: Conditional Generation

SQAIR is trained on sequences of five frames
Condition the model on five frames
Predict the next 15 frames by sampling from the prior
Each row contains five different predictions for the same sequence

SLIDE 40


Poster #24

Code: /akosiorek/SQAIR