SLIDE 1

Multi-Dimensional LSTM Networks for Video Prediction

Wonmin Byeon

NVIDIA Research

March 29, 2018

Wonmin Byeon | NVIDIA Research | March 29, 2018 1 / 44

SLIDE 2

Convolutional Neural Networks

Semantic Segmentation: SegNet [Badrinarayanan16]
Biomedical Image Segmentation: U-Net [Ronneberger15]
Page Segmentation: Fully-CNN [Wick18]
Fluid Simulation / Pressure Solve: CNN [Tompson17]

SLIDE 3

Convolutional Neural Networks

Semantic Segmentation: SegNet [Badrinarayanan16]
Biomedical Image Segmentation: U-Net [Ronneberger15]

Needs a lot of computation

SLIDE 4

Convolutional Neural Networks

Semantic Segmentation: SegNet [Badrinarayanan16]
Biomedical Image Segmentation: U-Net [Ronneberger15]

Needs a lot of computation

Each window can be computed in parallel
An efficient GPU implementation is possible

SLIDE 5

Convolutional Neural Networks

Semantic Segmentation: SegNet [Badrinarayanan16]
Biomedical Image Segmentation: U-Net [Ronneberger15]

Needs a lot of computation

Each window can be computed in parallel
An efficient GPU implementation is possible

Has a fixed-size receptive field

SLIDE 6

Convolutional Neural Networks

Semantic Segmentation: SegNet [Badrinarayanan16]
Biomedical Image Segmentation: U-Net [Ronneberger15]

Needs a lot of computation

Each window can be computed in parallel
An efficient GPU implementation is possible

Has a fixed-size receptive field
Perceives only a small local context around each pixel

SLIDE 7

Convolutional Neural Networks

Images from Zheng’s ECCV16 tutorial

Needs a lot of computation

Each window can be computed in parallel
An efficient GPU implementation is possible

Has a fixed-size receptive field
Perceives only a small local context around each pixel

SLIDE 8

Convolutional Neural Networks

Images from Zheng’s ECCV16 tutorial

Needs a lot of computation

Each window can be computed in parallel
An efficient GPU implementation is possible

Has a fixed-size receptive field
Perceives only a small local context around each pixel

Solutions?

SLIDE 9

Convolutional Neural Networks: solutions

Up-pooling (deconvolution): DeconvNet [Noh16]
Adding Conditional Random Field (CRF): DeepLab [Chen16]

SLIDE 10

Convolutional Neural Networks: solutions?

Up-pooling (deconvolution): DeconvNet [Noh16]
Adding Conditional Random Field (CRF): DeepLab [Chen16]
Using Dilated/Atrous Convolutions: Dilated Convolutions [Yu15], DeepLab V2 [Chen16]

Animation from https://github.com/vdumoulin/conv_arithmetic
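
As a rough illustration of why dilation enlarges the context seen by each output pixel, the sketch below computes the receptive field of a stack of stride-1 convolutions. The function name and the two layer configurations are illustrative assumptions, not taken from the cited papers.

```python
def receptive_field(layers):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs (hypothetical configuration).
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * dilation
    return rf


plain = [(3, 1), (3, 1), (3, 1)]      # three ordinary 3x3 convolutions
dilated = [(3, 1), (3, 2), (3, 4)]    # same depth, dilation doubled per layer

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and weights, the dilated stack sees roughly twice the extent, but the receptive field is still fixed by the architecture.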

SLIDE 11

Convolutional Neural Networks: solutions?

Using Dilated Convolutions & Going Deeper: DeepLab V3 [Chen17]

SLIDE 12

Convolutional Neural Networks: solutions?

Using Dilated Convolutions & Going Deeper: DeepLab V3 [Chen17]
Fusing Multi-Resolutions: RefineNet [Lin16]
Adopting Large Kernels: Global-CNN [Peng17]

SLIDE 13

How can we efficiently capture global/long range context?

SLIDE 16

How can we efficiently capture global/long range context?

Image from http://staffwww.dcs.shef.ac.uk/people/H.Lu/feeler.html

SLIDE 17

Long Short Term Memory Recurrent Networks

SLIDE 18

LSTM Networks for Sequence Learning

speech [Graves05, Graves06]
handwriting [Liwicki07, Graves09]

SLIDE 19

Sequence Classification Task with Dependencies

Mapping x1 x2 ... to y1 y2 ...

[Diagram: inputs x1, x2, x3, x4 mapped to outputs y1, y2, y3, y4]

$F : x \in X^* \to y \in Y^*$

SLIDE 20

Sequence Classification Task with Dependencies

Mapping x1 x2 ... to y1 y2 ...

[Diagram: inputs x1, x2, x3, x4, ... feed a hidden state (h1, ...) that produces outputs y1, y2, y3, y4, ...]

input sequence: x*
output sequence: y*

SLIDE 21

1-Dimensional LSTM Networks

Standard LSTM [Hochreiter97, Gers99]

[Diagram: Input (xt−1, xt, xt+1) → Hidden Layer (LSTM, with recurrent state ht−1) → Output (yt)]
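
To make the recurrence in the diagram concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell with a forget gate, applied over a short random sequence. The stacked weight layout, the gate order, and all sizes are assumptions chosen for illustration; this is not code from the talk.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order inside the stacked matrices (assumed): input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
D, H, T = 3, 4, 5                  # input size, hidden size, sequence length
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
hs = []
for x_t in rng.normal(size=(T, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)   # h carries context from all earlier steps
    hs.append(h)
hs = np.stack(hs)
print(hs.shape)  # (5, 4)
```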

SLIDE 22

1-Dimensional LSTM Networks

Standard LSTM [Hochreiter97, Gers99]
Bidirectional LSTM [Graves05, Chen05]

[Diagram, standard LSTM: Input (xt−1, xt, xt+1) → Hidden Layer (LSTM, with recurrent state ht−1) → Output (yt)]
[Diagram, bidirectional LSTM: Input (xt−1, xt, xt+1) → two Hidden Layers (a forward LSTM using ht−1 and a backward LSTM using ht+1) → Output (yt)]

SLIDE 23

Multi-Dimensional LSTM networks

Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

SLIDE 24

2-Dimensional LSTM Networks for images

red: the current pixel

SLIDE 33

2-Dimensional LSTM Networks for images

[Diagram: Input Ik (3 channels) → windows s (3x1x1) → LSTM Layer (four LSTM blocks) → Hidden Layer → ... → Output]

Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

SLIDE 34

2-Dimensional LSTM Networks for images

[Diagram: Input Ik (3 channels) → windows s (3x1x1) → LSTM Layer (four LSTM blocks) → Hidden Layer → ... → Output]

Perceives the entire spatio-temporal context of each pixel in a few sweeps through all pixels
Requires fewer parameters to take both local and global contexts into account
End-to-end learning, no pre- or post-processing

Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

SLIDE 35

2-Dimensional LSTM Networks for images

[Diagram: Input Ik (3 channels) → windows s (3x1x1) → LSTM Layer (four LSTM blocks) → Hidden Layer → ... → Output]

Perceives the entire spatio-temporal context of each pixel in a few sweeps through all pixels
Requires fewer parameters to take both local and global contexts into account
End-to-end learning, no pre- or post-processing

Needs a lot of computation
Is hard to parallelize due to the dependencies between pixels

Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]
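
The pixel-to-pixel dependency can be sketched as a single sweep of a 2D MD-LSTM from the top-left corner. This is a simplified NumPy illustration with one forget gate per spatial dimension; the weight shapes, gate order, and sizes are assumptions, and a full layer would run such sweeps from all four corners.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mdlstm_2d_sweep(image, W, U_up, U_left, b):
    """One top-left to bottom-right sweep of a 2D MD-LSTM (illustrative only).

    image: (rows, cols, D); W: (5n, D); U_up, U_left: (5n, n); b: (5n,).
    Gate layout (assumed): input, forget-up, forget-left, output, candidate.
    """
    rows, cols, _ = image.shape
    n = b.shape[0] // 5
    # Index 0 along each axis is zero padding for the missing neighbours.
    h = np.zeros((rows + 1, cols + 1, n))
    c = np.zeros((rows + 1, cols + 1, n))
    for y in range(1, rows + 1):        # pixels must be visited in order:
        for x in range(1, cols + 1):    # each one waits for its two neighbours
            z = (W @ image[y - 1, x - 1]
                 + U_up @ h[y - 1, x] + U_left @ h[y, x - 1] + b)
            i, f_up, f_left, o = (sigmoid(z[k * n:(k + 1) * n]) for k in range(4))
            g = np.tanh(z[4 * n:])
            c[y, x] = f_up * c[y - 1, x] + f_left * c[y, x - 1] + i * g
            h[y, x] = o * np.tanh(c[y, x])
    return h[1:, 1:]


rng = np.random.default_rng(0)
D, n = 3, 4
W = rng.normal(scale=0.5, size=(5 * n, D))
U_up = rng.normal(scale=0.5, size=(5 * n, n))
U_left = rng.normal(scale=0.5, size=(5 * n, n))
b = np.zeros(5 * n)
image = rng.normal(size=(6, 5, D))

out = mdlstm_2d_sweep(image, W, U_up, U_left, b)
print(out.shape)  # (6, 5, 4)
```

The nested Python loops mirror the problem stated on the slide: the state at a pixel cannot be computed until both of its predecessors are done, so the work cannot simply be spread over all pixels at once.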

SLIDE 36

Too slow.
Can we use the model for more than 2-dimensional data?

SLIDE 37

Too slow.
Can we use the model for more than 2-dimensional data?
Parallel computing?

SLIDE 38

Parallel Multi-Dimensional LSTM

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 39

Parallel MD-LSTM Networks

MD-LSTM: connections from (x−1, y) and (x, y−1)

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 40

Parallel MD-LSTM Networks

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 41

Parallel MD-LSTM Networks

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Retains all advantages of LSTMs

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 42

Parallel MD-LSTM Networks

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Retains all advantages of LSTMs
Easy to parallelize on GPUs
All operations can be implemented using convolutions

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]
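
A minimal NumPy sketch of the parallel variant: because every pixel of line x depends only on three neighbours in line x−1, a whole line can be updated at once, and the recurrent connection becomes a 1D convolution over the previous line. The weight shapes, gate order, and sizes are assumptions made for illustration, not the implementation from [Stollenga&Byeon15].

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pmd_lstm_sweep(image, W, K, b):
    """Sweep along x; every position y of a line is updated at once.

    image: (X, Y, D); W: (4n, D); K: (3, 4n, n), one recurrent matrix per
    neighbour offset (y-1, y, y+1); b: (4n,).
    Gate order (assumed): input, forget, output, candidate.
    """
    X, Y, _ = image.shape
    n = b.shape[0] // 4
    h = np.zeros((Y, n))
    c = np.zeros((Y, n))
    out = np.zeros((X, Y, n))
    for x in range(X):                      # only this loop is sequential
        hp = np.pad(h, ((1, 1), (0, 0)))    # zero-pad the previous line along y
        # 1D convolution over the previous line:
        # neighbours (x-1, y-1), (x-1, y), (x-1, y+1)
        rec = hp[:-2] @ K[0].T + hp[1:-1] @ K[1].T + hp[2:] @ K[2].T
        z = image[x] @ W.T + rec + b        # shape (Y, 4n), all y at once
        i, f, o = (sigmoid(z[:, k * n:(k + 1) * n]) for k in range(3))
        g = np.tanh(z[:, 3 * n:])
        c = f * c + i * g
        h = o * np.tanh(c)
        out[x] = h
    return out


rng = np.random.default_rng(0)
D, n = 3, 4
W = rng.normal(scale=0.5, size=(4 * n, D))
K = rng.normal(scale=0.5, size=(3, 4 * n, n))
b = np.zeros(4 * n)
image = rng.normal(size=(4, 9, D))

out = pmd_lstm_sweep(image, W, K, b)
print(out.shape)  # (4, 9, 4)
```

In this sketch the context of a pixel grows by one position on each side per step along x, so each sweep covers a triangular (in 3D, pyramidal) region rather than a full quadrant.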

SLIDE 43

Parallel MD-LSTM Networks

[Diagram, 2D images: a line of inputs (x1, x2, x3) convolved (*) with a 1D kernel (w1, w2, w3)]
[Diagram, 3D volumetric images: a plane of inputs (x1 ... x9) convolved (*) with a 2D kernel (w1 ... w9)]

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 44

Parallel MD-LSTM Networks

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Retains all advantages of LSTMs
Easy to parallelize on GPUs
All operations can be implemented using convolutions
Covers the volume more efficiently

PyraMiD-LSTM: 6 pyramids

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 45

Applications

SLIDE 46

Application 1: Image Segmentation

[Figure columns: Input, Ground truth, MD-LSTM]

Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

SLIDE 47

Application 2: Layout Analysis

Simultaneously performs: text/image segmentation, border noise removal, text/math segmentation, text line detection
1 sec / page (GPU)

Training for Different Detection Tasks

[Figure panels: original page, body text detection, text detection, text frame detection, border noise detection]

Robust, Simple Page Segmentation using Hybrid Convolutional MDLSTM Networks [Breuel17]

SLIDE 48

Application 3: Medical Image Segmentation

Membrane Segmentation on Electron Microscopy Images
Rand error: human 0.002, CNNs [Ciresan12] 0.050, MD-LSTM 0.047

[Figure columns: Input, CNNs (after post-processing), MD-LSTM]

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]

SLIDE 49

Video Prediction

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 50

Video Prediction

t: time frame

Convolutional LSTM (ConvLSTM) [Shi15, Finn16, Lotter16, Villegas17]
3D CNN [Tran15, Mathieu15]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 51

Convolutional LSTM for Video Prediction

[Diagram labels: frame t-1, frame t]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 52

Convolutional LSTM for Video Prediction

Blind Spot Problem

[Diagram: frames T-3, T-2, T-1, T, and the predicted frame T+1]
red: current pixel; blue: context covered; gray: blind spots

ContextVP: Fully Context-Aware Video Prediction [Byeon17]
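
A back-of-the-envelope sketch of the blind spots, assuming a single ConvLSTM layer whose input-to-state and state-to-state kernels have the same odd size (both function names and the example numbers are hypothetical). The input of the most recent frame is seen only through one kernel, and each step further back in time adds one more kernel radius.

```python
def covered_radius(kernel_size, steps_back):
    """Spatial radius of the input context seen at frame T - steps_back."""
    r = (kernel_size - 1) // 2
    # One hop for the input-to-state kernel, plus one hop per recurrent step.
    return (steps_back + 1) * r


def blind_fraction(kernel_size, steps_back, width):
    """Fraction of a row of `width` pixels that a pixel far from the border cannot see."""
    visible = min(width, 2 * covered_radius(kernel_size, steps_back) + 1)
    return 1.0 - visible / width


for j in range(4):
    print(f"frame T-{j}: radius {covered_radius(5, j)}, "
          f"blind fraction {blind_fraction(5, j, 64):.3f}")
```

Under these assumptions the covered context is a cone that widens into the past, so most of the recent frames stays in the blind spot.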

SLIDE 53

Convolutional LSTM for Video Prediction

Solution?

[Diagram: frames T-3, T-2, T-1, T, and the predicted frame T+1]
red: current pixel; blue: context covered; gray: blind spots

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 54

Fully Context-Aware Video Prediction

[Diagram: frames T-3, T-2, T-1, T, and the predicted frame T+1]
[Diagram: five recurrence directions (t-, w-, w+, h-, h+), each shown across frames t-1, t and t+1]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 55

Fully Context-Aware Video Prediction

[Diagram: t- direction, from frame t-1 to frame t]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 56

Fully Context-Aware Video Prediction

[Diagram: w- direction across frames t-1, t and t+1]
[Diagram: h- direction across frames t-1, t and t+1]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 58

Fully Context-Aware Video Prediction: ContextVP

[Diagram: PMD1 → Blending1 → PMD2 → Blending2 → PMD3 → Blending3 → PMD4 → Blending4 → Output layer (σ); each PMD stage contains PMD(w-), PMD(w+), PMD(h-), PMD(h+) and PMD(t-)]

Parallel MD-LSTM (PMD) Block: $s^d$, $d \in D = \{h-, h+, w-, w+, t-\}$

Blending Block

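Uniform Blending: $m = f\left(\left(\sum_{d \in D} s^d\right) \cdot W + b\right)$

Weighted Blending: $m = f(S \cdot W + b)$, $S = \left[\, s^{t-} \;\; s^{h-} \;\; s^{h+} \;\; s^{w-} \;\; s^{w+} \,\right]^T$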
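
The two blending rules can be sketched in NumPy as follows. The activation f (tanh here), the per-pixel matrix layout, and all sizes are assumptions; S is built as a channel-wise concatenation of the five directional outputs, which is one way to realise the stacked form on the slide.

```python
import numpy as np


def uniform_blending(states, W, b, f=np.tanh):
    """m = f((sum over d of s^d) . W + b); states: list of (N, C), W: (C, C_out)."""
    return f(sum(states) @ W + b)


def weighted_blending(states, W, b, f=np.tanh):
    """m = f(S . W + b), S = concatenated states; W: (len(states) * C, C_out)."""
    S = np.concatenate(states, axis=-1)
    return f(S @ W + b)


rng = np.random.default_rng(0)
N, C, C_out = 6, 4, 3                       # pixels and channel counts (hypothetical)
directions = ["t-", "h-", "h+", "w-", "w+"]
states = [rng.normal(size=(N, C)) for _ in directions]
b = np.zeros(C_out)

W_uniform = rng.normal(size=(C, C_out))
W_weighted = rng.normal(size=(len(directions) * C, C_out))

m_uniform = uniform_blending(states, W_uniform, b)
m_weighted = weighted_blending(states, W_weighted, b)

# Uniform blending is the special case in which every direction shares one W.
W_tied = np.concatenate([W_uniform] * len(directions), axis=0)
print(np.allclose(m_uniform, weighted_blending(states, W_tied, b)))  # True
```

Weighted blending therefore uses five times as many blending weights as uniform blending, in exchange for letting the network weight each direction differently.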

Directional Weight Sharing (DWS)
Loss: L1 + Gradient Difference Loss (GDL) [Mathieu15]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]
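
A small NumPy sketch of an L1 + GDL objective for a single frame. The gradient difference term compares the absolute differences between neighbouring pixels of the prediction and of the target; averaging instead of summing, the exponent `alpha`, and the weight `lam` are assumptions made for this illustration.

```python
import numpy as np


def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))


def gdl_loss(pred, target, alpha=1.0):
    """Gradient difference loss on (H, W) or (H, W, C) frames."""
    # Absolute differences between vertically adjacent pixels.
    dy_pred = np.abs(np.diff(pred, axis=0))
    dy_true = np.abs(np.diff(target, axis=0))
    # Absolute differences between horizontally adjacent pixels.
    dx_pred = np.abs(np.diff(pred, axis=1))
    dx_true = np.abs(np.diff(target, axis=1))
    return (np.mean(np.abs(dy_true - dy_pred) ** alpha)
            + np.mean(np.abs(dx_true - dx_pred) ** alpha))


def total_loss(pred, target, lam=1.0):
    return l1_loss(pred, target) + lam * gdl_loss(pred, target)


target = np.array([[0.0, 1.0],
                   [0.0, 1.0]])     # a sharp vertical edge
pred = np.zeros((2, 2))            # a prediction that misses the edge

print(l1_loss(pred, target))     # 0.5
print(gdl_loss(pred, target))    # 1.0
print(total_loss(pred, target))  # 1.5
```

The gradient term penalises the missing edge directly, which is why it is commonly paired with a pixel-wise loss to discourage blurry predictions.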

SLIDE 59

Fully Context-Aware Video Prediction: Baseline

A stack of 20 ConvLSTM layers + 2 skip connections
The number of parameters is comparable to our best model

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 60

Fully Context-Aware Video Prediction: Baseline

A stack of 20 ConvLSTM layers + 2 skip connections
The number of parameters is comparable to our best model
It outperforms almost all state-of-the-art models!

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 61

Fully Context-Aware Video Prediction: Baseline

A stack of 20 ConvLSTM layers + 2 skip connections
The number of parameters is comparable to our best model
It outperforms almost all state-of-the-art models!

Compared to ContextVP?

ContextVP performs better than the baseline with a similar number of parameters
ContextVP is more suitable for parallelization: pixel-level and direction-level parallelization are possible

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 62

Next-Frame Prediction

Train: KITTI dataset, Test: CalTech Pedestrian dataset, 10 input frames

ContextVP: 4 layers with Weighted Blending and DWS
ConvLSTM: 20 ConvLSTM layers (baseline)
PredNet: Deep Predictive Coding Network [Lotter16]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]

SLIDE 63

Multi-Step Prediction

Trained for next-frame prediction
Tested to recursively predict 8 future frames

UCF-101 dataset, 4 input frames

ContextVP: 4 layers with Weighted Blending and DWS
ConvLSTM: 20 ConvLSTM layers (baseline)
MCnet [Villegas17]

ContextVP: Fully Context-Aware Video Prediction [Byeon17]
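
Recursive multi-step prediction with a next-frame model can be sketched as below: each predicted frame is appended to the input window and fed back in. The `rollout` helper and the toy linear-extrapolation predictor are hypothetical stand-ins for a trained network.

```python
import numpy as np


def rollout(predict_next, context, n_future):
    """Recursively predict n_future frames from a next-frame predictor.

    predict_next: callable mapping a (k, H, W) window to the next (H, W) frame.
    context: (k, H, W) array of observed input frames.
    """
    frames = list(context)
    k = len(context)
    predictions = []
    for _ in range(n_future):
        next_frame = predict_next(np.stack(frames[-k:]))
        predictions.append(next_frame)
        frames.append(next_frame)   # the prediction becomes an input
    return np.stack(predictions)


def toy_model(window):
    # Stand-in for a trained network: linear extrapolation of the last two frames.
    return 2 * window[-1] - window[-2]


context = np.stack([np.full((2, 2), float(t)) for t in range(4)])  # 4 input frames
future = rollout(toy_model, context, n_future=8)

print(future.shape)               # (8, 2, 2)
print(future[:, 0, 0].tolist())   # [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```

Because later predictions are conditioned on earlier ones, errors accumulate over the rollout, which is why quality is reported per time step.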

SLIDE 65

Multi-Step Prediction

[Figure: two plots over time steps 1 to 8, PSNR (axis 20 to 34) and SSIM (axis 0.75 to 0.90), comparing ContextVP4-WD-big, ContextVP4-WD-small, ConvLSTM20, MCnet+RES [16] and BeyondMSE (from [16])]

Number of parameters:
ContextVP4-WD-small: 2.0M
ContextVP4-WD-big: 8.6M
ConvLSTM20: 9.0M
MCnet: 14M
BeyondMSE: 8.9M

ContextVP: Fully Context-Aware Video Prediction [Byeon17]
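
For reference, PSNR for one predicted frame can be computed as in this short sketch. Pixel values are assumed to lie in [0, max_val]; the function name and the example frames are illustrative. SSIM is more involved and is not shown.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")   # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)


target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)    # constant error of 0.1 on every pixel

print(round(psnr(pred, target), 2))  # 20.0
```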

SLIDE 67

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

SLIDE 68

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Applications

image segmentation [Byeon15]
layout analysis [Breuel17] (body text detection, text detection)
medical image segmentation [Stollenga&Byeon15]

SLIDE 69

MD-LSTM: connections from (x−1, y) and (x, y−1)
Parallel MD-LSTM: connections from (x−1, y+1), (x−1, y) and (x−1, y−1)

Applications

image segmentation [Byeon15]
layout analysis [Breuel17] (body text detection, text detection)
medical image segmentation [Stollenga&Byeon15]

Fully Context-Aware Video Prediction [Byeon17]
Paper: https://arxiv.org/abs/1710.08518

SLIDE 70

Collaborators

Qin Wang (ETH Zurich)
Rupesh Kumar Srivastava (NNAISENSE)
Petros Koumoutsakos (ETH Zurich)
Thomas M. Breuel (NVIDIA)
Juergen Schmidhuber (IDSIA, NNAISENSE)
Marijn Stollenga (TU Munich)
Federico Raue (TU KL)
Marcus Liwicki (TU KL)
