Multi-Dimensional LSTM Networks for Video Prediction
Wonmin Byeon
NVIDIA Research
March 29, 2018
Images from Zheng’s ECCV16 tutorial
Animation from https://github.com/vdumoulin/conv_arithmetic
Image from http://staffwww.dcs.shef.ac.uk/people/H.Lu/feeler.html
Figure labels: input sequence x*, y*
Figure: LSTM network diagram (Input, Hidden Layer, Output)
Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]
Figure: LSTM layer diagram. Labels: Input Ik, 3x1x1 windows, four LSTM blocks, Hidden Layer, Output.
Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]
Figure: neighbor connections at position (x, y). Labels: (x−1, y), (x, y−1); then (x−1, y+1), (x−1, y), (x−1, y−1); weights w1, w2, w3.
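The neighbor labels (x−1, y−1), (x−1, y), (x−1, y+1) and the weights w1, w2, w3 suggest that each position gathers context from three cells of the previous column. A minimal sketch of that weighted sum follows, in plain Python. The function name `neighbor_context`, the pairing of each weight with a particular neighbor, and the zero padding at the image border are all assumptions made for illustration; this is not the implementation from [Stollenga&Byeon15], and it omits the LSTM gating entirely.

```python
# Sketch only: weighted sum over the three previous-column neighbors
# (x-1, y-1), (x-1, y), (x-1, y+1).
# Assumptions (not from the slides): w1 pairs with (x-1, y-1),
# w2 with (x-1, y), w3 with (x-1, y+1); borders are zero padded.

def neighbor_context(prev_column, w1, w2, w3):
    """For every row y, return w1*h[y-1] + w2*h[y] + w3*h[y+1],
    where h holds the values of column x-1 (zero outside the image)."""
    height = len(prev_column)
    padded = [0.0] + list(prev_column) + [0.0]
    return [
        w1 * padded[y] + w2 * padded[y + 1] + w3 * padded[y + 2]
        for y in range(height)
    ]


prev = [1.0, 2.0, 3.0, 4.0]          # hypothetical values of column x-1
context = neighbor_context(prev, w1=0.25, w2=0.5, w3=0.25)
print(context)                        # [1.0, 2.0, 3.0, 2.75]
```

Because every row of column x depends only on column x−1, all rows of a column can be computed at once, which is what makes this connection pattern easy to parallelize.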
Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]
Robust, Simple Page Segmentation using Hybrid Convolutional MDLSTM Networks [Breuel17]
Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation [Stollenga&Byeon15]
ContextVP: Fully Context-Aware Video Prediction [Byeon17]
Figure: frames T-3, T-2, T-1, and T are used to predict frame T+1.
Figure: the five processing directions (t-, w-, w+, h-, h+), shown over frames t-1, t, and t+1.
Figure: detail of the w- direction and the h- direction over frames t-1, t, and t+1.
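Network architecture: four stacked stages, PMD1 and Blending1 through PMD4 and Blending4, followed by an output layer (σ).
Each PMD stage runs five directional units: PMD(w-), PMD(w+), PMD(h-), PMD(h+), PMD(t-).
Blending: $f\big(\big(\sum_{d \in D} s^d\big) \cdot W + b\big)$, where $s^d$ is the output of the PMD unit for direction $d$, $D$ = {w-, w+, h-, h+, t-}, and $f$ is an activation function.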
ContextVP: Fully Context-Aware Video Prediction [Byeon17]
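The blending step sums the directional PMD outputs s^d over d in D and applies a linear map W with bias b. A minimal sketch for a single pixel follows, in plain Python. The function name `blend`, the choice of tanh as the activation, and the toy shapes and values are assumptions for illustration only; the slides do not state which activation is used, and this is not the ContextVP implementation.

```python
import math

# The five directions named on the slide.
DIRECTIONS = ("w-", "w+", "h-", "h+", "t-")


def blend(states, weights, bias):
    """Sum the per-direction feature vectors, then apply a linear map.

    states:  dict mapping direction -> list of n_in floats (one pixel)
    weights: n_in x n_out nested list (W)
    bias:    list of n_out floats (b)
    The tanh activation is an assumption, not taken from the slides.
    """
    n_in = len(weights)
    summed = [sum(states[d][i] for d in DIRECTIONS) for i in range(n_in)]
    return [
        math.tanh(sum(summed[i] * weights[i][j] for i in range(n_in)) + bias[j])
        for j in range(len(bias))
    ]


# Hypothetical toy input: every direction reports the same 2-feature vector.
states = {d: [0.25, -0.5] for d in DIRECTIONS}
identity = [[1.0, 0.0], [0.0, 1.0]]
blended = blend(states, identity, bias=[0.0, 0.0])
print(blended)  # two values: tanh(1.25) and tanh(-2.5)
```

Summing before the linear map keeps the parameter count independent of the number of directions, since a single W is shared by all of them.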
Train: KITTI dataset, Test: CalTech Pedestrian dataset, 10 input frames
UCF-101 dataset, 4 input frames
Figure: PSNR (axis range 20 to 34) and SSIM (axis range 0.75 to 0.90) versus time steps 1 to 8. Legend: ContextVP4-WD-big, ContextVP4-WD-small, ConvLSTM20, MCnet+RES [16], BeyondMSE (from [16]).
ContextVP: Fully Context-Aware Video Prediction [Byeon17]
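PSNR, one of the two metrics plotted, is defined from the mean squared error between a predicted frame and the ground-truth frame. The sketch below shows the standard formula in plain Python. It assumes frames are given as flat lists of pixel intensities in [0, 1]; the function name `psnr` and the sample values are illustrative and are not taken from the evaluation code behind these plots. SSIM is not covered here.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    pred, target: equal-length, non-empty sequences of pixel intensities.
    max_value:    largest possible intensity (1.0 is assumed here).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel frames with an error of 0.1 per pixel.
score = psnr([0.5, 0.5], [0.4, 0.6])
print(round(score, 2))  # about 20 dB
```

Higher PSNR means a smaller pixel-wise error, so curves that stay higher across the predicted time steps indicate predictions that degrade more slowly.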
Summary of applications:
image segmentation [Byeon15]
layout analysis (body text detection, text detection) [Breuel17]
medical image segmentation [Stollenga&Byeon15]