SLIDE 1

Sequential Attend, Infer, Repeat:

Generative Modelling of Moving Objects

NeurIPS 2018

Poster #24

Adam R. Kosiorek¹,², Hyunjik Kim², Ingmar Posner¹, Yee Whye Teh²

¹ Applied AI Lab, Oxford Robotics Institute
² Department of Statistics, University of Oxford

SLIDE 2

Attend, Infer, Repeat¹

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 3

Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 4

Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

  • Variational Autoencoder (VAE)

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 5

Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

  • Variational Autoencoder (VAE)
  • Decomposes an image into objects

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 6

Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

  • Variational Autoencoder (VAE)
  • Decomposes an image into objects
  • Explains each object with a separate latent variable

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 7

Attend, Infer, Repeat

Attend, Infer, Repeat¹ (AIR):

  • Variational Autoencoder (VAE)
  • Decomposes an image into objects
  • Explains each object with a separate latent variable

Here, we have two objects with superscripts 1 and 4.

¹ Eslami et al., “Attend, Infer, Repeat”, NIPS 2016.

SLIDE 8

AIR: Latent Variables

Objects are explained by separate latent variables

SLIDE 9

AIR: Latent Variables

Objects are explained by separate latent variables:

  • what: Gaussian, what does it look like?

SLIDE 10

AIR: Latent Variables

Objects are explained by separate latent variables:

  • what: Gaussian, what does it look like?
  • where: Gaussian, where and how big is it?

SLIDE 11

AIR: Latent Variables

Objects are explained by separate latent variables:

  • what: Gaussian, what does it look like?
  • where: Gaussian, where and how big is it?
  • presence: Bernoulli, does it exist?
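The three latent variables per object can be illustrated with a minimal NumPy sketch. This is not the AIR implementation: the function name `sample_object_latents`, the latent sizes, the standard-normal priors and the independent presence draws are all assumptions made for illustration.

```python
import numpy as np

def sample_object_latents(rng, max_objects=3, what_dim=4, presence_prob=0.5):
    """Sample toy per-object latents (sizes and priors are assumptions)."""
    objects = []
    for _ in range(max_objects):
        # presence: Bernoulli, does the object exist?
        present = bool(rng.random() < presence_prob)
        # what: Gaussian appearance code
        what = rng.standard_normal(what_dim)
        # where: Gaussian position and size, here (scale, shift_x, shift_y)
        where = rng.standard_normal(3)
        objects.append({"presence": present, "what": what, "where": where})
    return objects

rng = np.random.default_rng(0)
latents = sample_object_latents(rng)
for i, z in enumerate(latents):
    print(i, z["presence"], z["what"].shape, z["where"].shape)
```

Each dictionary corresponds to one object slot; a decoder (not shown) would turn `what` into an image patch and use `where` to place it on the canvas whenever `presence` is true.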

SLIDE 12

Sequential Attend, Infer, Repeat

SLIDE 13

SQAIR: Generative Model

  • Sequential Attend, Infer, Repeat (SQAIR) extends AIR to image sequences

SLIDE 14

SQAIR: Generative Model

  • Sequential Attend, Infer, Repeat (SQAIR) extends AIR to image sequences
  • Like AIR: model objects with separate latent variables

SLIDE 15

SQAIR: Generative Model

  • Sequential Attend, Infer, Repeat (SQAIR) extends AIR to image sequences
  • Like AIR: model objects with separate latent variables
  • Objects can appear and disappear in every frame

SLIDE 16

SQAIR: Generative Model

  • Sequential Attend, Infer, Repeat (SQAIR) extends AIR to image sequences
  • Like AIR: model objects with separate latent variables
  • Objects can appear and disappear in every frame

Here, object 4 appeared and object 3 disappeared in frame t.
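Objects appearing and disappearing between frames can be mimicked with a toy simulation. The sketch below is only an illustration under stated assumptions: the function `step`, the probabilities `keep_prob` and `birth_prob`, the Gaussian drift and the one-new-object-per-frame limit are hypothetical and are not taken from the slides.

```python
import numpy as np

def step(rng, objects, next_id, keep_prob=0.9, birth_prob=0.2, what_dim=4):
    """One toy time step: keep or drop existing objects, then maybe add one."""
    survivors = {}
    for obj_id, z in objects.items():
        # A Bernoulli presence draw decides whether the object stays in the frame.
        if rng.random() < keep_prob:
            # Appearance is kept; position drifts by a small Gaussian step.
            survivors[obj_id] = {
                "what": z["what"],
                "where": z["where"] + 0.1 * rng.standard_normal(3),
            }
    # At most one new object appears per frame in this toy version.
    if rng.random() < birth_prob:
        survivors[next_id] = {
            "what": rng.standard_normal(what_dim),
            "where": rng.standard_normal(3),
        }
        next_id += 1
    return survivors, next_id

rng = np.random.default_rng(0)
objects, next_id = {}, 1
history = []
for t in range(10):
    objects, next_id = step(rng, objects, next_id)
    history.append(sorted(objects))
    print(t, history[-1])
```

Because identifiers are never reused, an object keeps the same index for as long as it is present, which is the identity-preserving behaviour the slides attribute to SQAIR.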

SLIDE 17

MNIST: Reconstructions

SQAIR can model sequences of moving objects

SLIDE 18

MNIST: Reconstructions

SQAIR can model sequences of moving objects like this one

SLIDE 19

MNIST: Reconstructions

SQAIR can model sequences of moving objects like this one

Any VAE could reconstruct it

SLIDE 20

MNIST: Reconstructions

SQAIR can model sequences of moving objects like this one

Any VAE could reconstruct it

SQAIR:

  • one latent variable per object
  • knows their location
  • maintains identity (unlike AIR)

SLIDE 21

MNIST: Samples

  • Once trained, we can sample from SQAIR
  • Check what the model learned

SLIDE 22

MNIST: Samples

  • Once trained, we can sample from SQAIR
  • Check what the model learned
  • Object appearance does not change between frames

SLIDE 23

MNIST: Samples

  • Once trained, we can sample from SQAIR
  • Check what the model learned
  • Object appearance does not change between frames
  • Motion is consistent with motion patterns in the training set

SLIDE 24

MNIST: Conditional Generation

  • Condition the model on three frames
  • Predict the next 97 frames by sampling from the prior

SLIDE 25

MNIST: Conditional Generation

  • Condition the model on three frames
  • Predict the next 97 frames by sampling from the prior
  • For every conditioning sequence, we can imagine different rollouts
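The idea of conditioning on a few frames and then sampling the future from a prior can be sketched with a toy stand-in. Here the "prior" is an assumed constant-velocity random walk over a single 2D position rather than SQAIR's learned prior; `rollout` and `noise_std` are hypothetical names.

```python
import numpy as np

def rollout(rng, observed, n_future=97, noise_std=0.05):
    """Extend observed 2D positions by sampling from a toy random-walk prior."""
    observed = np.asarray(observed, dtype=float)
    # Assumed prior: velocity from the last two observed frames plus Gaussian noise.
    velocity = observed[-1] - observed[-2]
    position = observed[-1]
    future = []
    for _ in range(n_future):
        velocity = velocity + noise_std * rng.standard_normal(2)
        position = position + velocity
        future.append(position)
    return np.stack(future)

observed = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]  # three conditioning frames
samples = [rollout(np.random.default_rng(seed), observed) for seed in range(3)]
for s in samples:
    print(s.shape, s[-1])
```

Every call shares the same three conditioning frames, but each seed yields a different 97-step continuation, mirroring the different rollouts per conditioning sequence.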

SLIDE 26

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

SLIDE 27

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

SLIDE 28

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

SLIDE 29

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

Disentangling overlapping objects

[Figure panels: SQAIR, AIR]

SLIDE 30

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

Disentangling overlapping objects

[Figure panels: SQAIR, AIR]

SLIDE 31

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

Disentangling overlapping objects

[Figure panels: SQAIR, AIR]

missing objects!

SLIDE 32

SQAIR vs AIR

Reconstruction from partial observations

[Figure panels: SQAIR, AIR]

Disentangling overlapping objects

[Figure panels: SQAIR, AIR]

missing objects!

SLIDE 33

Real World Data: Unsupervised Detection & Tracking of Pedestrians

SLIDE 34

DukeMTMC: Reconstructions

  • DukeMTMC dataset² contains videos from static CCTV cameras

² Ristani et al., “Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking”, ECCV workshop, 2016.

SLIDE 35

DukeMTMC: Reconstructions

  • DukeMTMC dataset² contains videos from static CCTV cameras
  • Pre-process by removing backgrounds and inverting colours

² Ristani et al., “Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking”, ECCV workshop, 2016.

SLIDE 36

DukeMTMC: Reconstructions

  • DukeMTMC dataset² contains videos from static CCTV cameras
  • Pre-process by removing backgrounds and inverting colours
  • SQAIR learns to detect & track pedestrians without human supervision!

² Ristani et al., “Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking”, ECCV workshop, 2016.
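The slides do not say how the backgrounds were removed. One common option for static cameras, assumed here purely for illustration, is to subtract a per-pixel temporal median and then invert the result; `preprocess` is a hypothetical helper, not the authors' pipeline.

```python
import numpy as np

def preprocess(frames):
    """frames: float array in [0, 1] with shape (time, height, width)."""
    frames = np.asarray(frames, dtype=float)
    # Assumed background model: per-pixel median over time (static camera).
    background = np.median(frames, axis=0)
    foreground = np.abs(frames - background)
    # Invert colours: unchanged pixels map to 1, moving content becomes darker.
    return 1.0 - foreground

video = np.full((5, 8, 8), 0.3)   # static grey background
for t in range(5):
    video[t, 2, t] = 1.0          # one bright pixel moving to the right
out = preprocess(video)
print(out.shape, out.min(), out.max())
```

In this synthetic clip the static background maps to 1.0 and the moving pixel to about 0.3, so only the moving content remains distinguishable.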

SLIDE 37

DukeMTMC: Conditional Generation

SQAIR trained on sequences of five frames

SLIDE 38

DukeMTMC: Conditional Generation

SQAIR trained on sequences of five frames

  • Condition the model on five frames
  • Predict the next 15 frames by sampling from the prior

SLIDE 39

DukeMTMC: Conditional Generation

SQAIR trained on sequences of five frames

  • Condition the model on five frames
  • Predict the next 15 frames by sampling from the prior

Each row contains five different predictions for the same sequence

SLIDE 40

Poster #24

Code: /akosiorek/SQAIR