S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation (PowerPoint presentation)

SLIDE 1

S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

SLIDE 2

Disentangled Representation Learning: Framework

  • Encoder
  • Decoder
  • LSTM in the latent space

VAE Objectives:
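The objective itself appeared as an image on this slide and did not survive extraction. As a hedged reference point, a standard sequential-VAE evidence lower bound that fits the three components above (encoder, decoder, LSTM in the latent space) is written below. The notation and exact conditioning are our assumptions, not taken from the slide: $x_{1:T}$ are the frames, $z_f$ is the static (appearance) code, $z_{1:T}$ are the dynamic (motion) codes, $p(z_f) = \mathcal{N}(0, I)$, and $p(z_t \mid z_{<t})$ is a Gaussian whose mean and variance an LSTM computes from $z_{<t}$.

```latex
\mathcal{L}_{\mathrm{VAE}} =
  \mathbb{E}_{q(z_f, z_{1:T} \mid x_{1:T})}\Big[\sum_{t=1}^{T} \log p(x_t \mid z_f, z_t)\Big]
  - \mathrm{KL}\big(q(z_f \mid x_{1:T}) \,\|\, p(z_f)\big)
  - \sum_{t=1}^{T} \mathbb{E}_{q}\Big[\mathrm{KL}\big(q(z_t \mid x_{\le t}) \,\|\, p(z_t \mid z_{<t})\big)\Big]
```

Training maximizes this bound: the first term rewards reconstructing every frame from the shared static code plus that frame's dynamic code, and the two KL terms keep the posteriors close to their priors.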

SLIDE 3

Self-Supervised Signal (1): Static Consistency Constraint

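  • To encourage the appearance representation to exclude any dynamic information.

  • Triplet loss: the appearance representation of a video (anchor) should stay close to that of the same video with its frames in shuffled temporal order (positive) and far from that of a different video (negative).

[Figure: shuffle temporal order; Anchor, Positive, Negative]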
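The constraint can be sketched as a hinge triplet loss on static codes. This is a minimal illustration under stated assumptions: the Euclidean distance, the margin of 1.0, and the helper names (`static_code`, `triplet_loss`, `static_consistency_loss`) are hypothetical, and `static_code` is a toy stand-in for the learned appearance encoder.

```python
import math
import random


def static_code(frames):
    """Hypothetical toy static encoder: averages per-frame feature vectors.
    Averaging ignores temporal order by construction."""
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def static_consistency_loss(video, other_video, encode=static_code, margin=1.0, rng=random):
    """Anchor: the video; positive: same frames in shuffled order; negative: another video."""
    shuffled = list(video)
    rng.shuffle(shuffled)  # same content, shuffled temporal order
    anchor = encode(video)
    positive = encode(shuffled)
    negative = encode(other_video)
    return triplet_loss(anchor, positive, negative, margin)
```

With the toy averaging encoder the anchor-positive distance is (numerically) zero, so the loss only depends on the negative. With a learned recurrent encoder that could leak frame order into the appearance code, the shuffled positive is what penalizes that leakage.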

SLIDE 4

Self-Supervised Signal (2): Dynamic Factor Prediction

  • To encourage the motion representation to carry adequate and correct time-dependent information at each timestep.

  • Optical flow provides the location of motion

○ Divide the optical flow map into a grid of indexed cells; the index of the cell where the motion occurs serves as the label

  • Landmarks capture the subtle motion of facial expressions

○ Distances between the upper and lower eyelids and between the upper and lower lips

[Figure: the input frame and its optical flow; the three distances on the face]
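The two kinds of prediction targets can be sketched as follows, assuming the motion code at each timestep is trained to predict them through an auxiliary head. The slide only says to grid the flow map with indices; taking the cell with the largest mean flow magnitude, the 3 x 3 grid, and the landmark names are our assumptions, and all function names are hypothetical.

```python
import math


def flow_grid_label(flow, grid=3):
    """Hypothetical pseudo-label: split an H x W optical-flow map (rows of
    (dx, dy) tuples, with H, W >= grid) into grid x grid cells and return the
    row-major index of the cell with the largest mean flow magnitude."""
    h, w = len(flow), len(flow[0])
    best_index, best_mag = 0, -1.0
    for gi in range(grid):
        for gj in range(grid):
            r0, r1 = gi * h // grid, (gi + 1) * h // grid
            c0, c1 = gj * w // grid, (gj + 1) * w // grid
            mags = [math.hypot(*flow[r][c]) for r in range(r0, r1) for c in range(c0, c1)]
            mean_mag = sum(mags) / len(mags)
            if mean_mag > best_mag:
                best_index, best_mag = gi * grid + gj, mean_mag
    return best_index


def face_distances(landmarks):
    """Hypothetical pseudo-labels from named 2-D landmarks:
    (left eyelid gap, right eyelid gap, lip gap)."""
    def dist(a, b):
        (x1, y1), (x2, y2) = landmarks[a], landmarks[b]
        return math.hypot(x1 - x2, y1 - y2)

    return (
        dist("left_upper_lid", "left_lower_lid"),
        dist("right_upper_lid", "right_lower_lid"),
        dist("upper_lip", "lower_lip"),
    )
```

Under this reading the grid index would be a classification target and the three distances a regression target for each frame's motion code.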

SLIDE 5

Self-Supervised Signal (3): Mutual Information

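  • To encourage the information in the appearance (static) representation and the motion (dynamic) representation to be mutually exclusive.
  • To minimize the mutual information between the appearance and motion representations.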
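The slide does not say how the mutual information is estimated. One minimal sketch, assuming diagonal-Gaussian posteriors that factorize over the static and dynamic codes, approximates each aggregate posterior by the mixture of the minibatch posteriors, (1/M) * sum_j q(z | x_j). This is similar in spirit to the minibatch-weighted sampling estimator of β-TCVAE but omits its dataset-size correction; the function names are hypothetical.

```python
import math


def log_normal_diag(z, mu, logvar):
    """Log density of z under a diagonal Gaussian N(mu, exp(logvar))."""
    return sum(
        -0.5 * (math.log(2 * math.pi) + lv + (zi - mi) ** 2 / math.exp(lv))
        for zi, mi, lv in zip(z, mu, logvar)
    )


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def minibatch_mutual_information(z_s, mu_s, lv_s, z_d, mu_d, lv_d):
    """Estimate I(z_static; z_dynamic) over a minibatch of M samples.

    z_s[i], z_d[i]: sampled static / dynamic codes of sample i (flat lists).
    mu_*, lv_*: posterior means / log-variances of each sample.
    """
    batch = len(z_s)
    log_m = math.log(batch)
    total = 0.0
    for i in range(batch):
        # log q(z_i | x_j) for every posterior j in the minibatch
        a = [log_normal_diag(z_s[i], mu_s[j], lv_s[j]) for j in range(batch)]
        b = [log_normal_diag(z_d[i], mu_d[j], lv_d[j]) for j in range(batch)]
        log_q_joint = logsumexp([aj + bj for aj, bj in zip(a, b)]) - log_m
        log_q_static = logsumexp(a) - log_m
        log_q_dynamic = logsumexp(b) - log_m
        total += log_q_joint - log_q_static - log_q_dynamic
    return total / batch
```

Adding this estimate to the loss with a positive weight pushes the two codes toward independence. The estimate is bounded above by log M and can be negative for small, noisy minibatches.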
SLIDE 6

Experiments: Representation Swapping

  • Swap the appearance and motion representations of two given videos

[Figure: panels labeled Video A and Video B]
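The data flow of a swap is: encode both videos, exchange the static codes, and decode. In the real model the encoder and decoder are neural networks; the sketch below replaces them with a hypothetical additive toy (each frame is a scalar, the appearance code is the mean frame, the motion codes are per-frame deviations) purely to show that flow.

```python
def encode(video):
    """Hypothetical toy encoder: static code = mean frame value,
    dynamic codes = per-frame deviation from that mean."""
    static = sum(video) / len(video)
    dynamic = [frame - static for frame in video]
    return static, dynamic


def decode(static, dynamic):
    """Hypothetical toy decoder: frame_t = static + dynamic_t."""
    return [static + d for d in dynamic]


def swap(video_a, video_b):
    """Return (A's appearance with B's motion, B's appearance with A's motion)."""
    static_a, dynamic_a = encode(video_a)
    static_b, dynamic_b = encode(video_b)
    return decode(static_a, dynamic_b), decode(static_b, dynamic_a)
```

For example, a static video A = [10, 10, 10, 10] swapped with a ramp B = [0, 1, 2, 3] yields A's level with B's ramp, [8.5, 9.5, 10.5, 11.5], and B's level held still, [1.5, 1.5, 1.5, 1.5].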

SLIDE 7

Experiments: Representation Swapping

[Figure: real video and synthesized video]

SLIDE 8

Experiments: Manipulating video generation (Dsprite)

[Figure panels: fix appearance representation; fix motion representation]
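A plausible reading of these panels, which we assume rather than take from the slide, is: to fix appearance, keep the static code and draw new motion codes from the learned prior; to fix motion, keep the motion codes and draw a new static code from N(0, 1). The sketch uses hypothetical toy stand-ins: a scalar AR(1) process (`decay`, `noise_std`) in place of the LSTM prior and an additive decoder.

```python
import random


def sample_dynamics(num_steps, rng, decay=0.9, noise_std=0.5):
    """Hypothetical stand-in for the LSTM prior p(z_t | z_<t): a scalar
    AR(1) process z_t ~ N(decay * z_{t-1}, noise_std^2) starting from z_0 = 0."""
    z, codes = 0.0, []
    for _ in range(num_steps):
        z = rng.gauss(decay * z, noise_std)
        codes.append(z)
    return codes


def decode(static, dynamic):
    """Hypothetical toy decoder: frame_t = static + dynamic_t."""
    return [static + d for d in dynamic]


def generate_fixed_appearance(static, num_videos, num_steps, rng):
    """Keep the appearance code fixed; sample a new motion sequence per video."""
    return [decode(static, sample_dynamics(num_steps, rng)) for _ in range(num_videos)]


def generate_fixed_motion(dynamic, num_videos, rng):
    """Keep the motion codes fixed; sample a new appearance code from N(0, 1)."""
    return [decode(rng.gauss(0.0, 1.0), dynamic) for _ in range(num_videos)]
```

Passing an explicit `random.Random(seed)` keeps generations reproducible, which makes side-by-side panels like the ones on this slide easy to regenerate.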

SLIDE 9

Experiments: Manipulating video generation (MUG)

[Figure panels: fix appearance representation; fix motion representation]

SLIDE 10

Experiments: Quantitative performance comparison

  • Baseline: our sequential VAE without self-supervision
  • Baseline-sv: our sequential VAE supervised with ground-truth labels
  • Full model: our sequential VAE with self-supervision