DVDNet: Deep Blind Video Decaptioning with 3D-2D Gated Convolutions
Dahun Kim*, Sanghyun Woo*, Joonyoung Lee, In So Kweon
2018 ChaLearn Looking at People Challenge
- Track 2. Video Decaptioning
Our problem: remove text overlays in video.
Two important points:
+ 3D-2D U-Net
+ Gated convolution
[Figure: input → 3D gated-CNN encoder → 2D gated-CNN decoder → prediction (output), with skip connections]
[Figure: 2D U-Net*: input → 2D CNN encoder → 2D CNN decoder → prediction, with skip connections]
* Ronneberger, O. et al. “U-Net: Convolutional networks for biomedical image segmentation.” MICCAI 2015.
[Figure: consecutive frames showing object movements and subtitle changes]
[Figure: 3D U-Net*: input → 3D CNN encoder → 3D CNN decoder → prediction, with skip connections]
* Çiçek, Ö. et al. “3D U-Net: Learning dense volumetric segmentation from sparse annotation.” MICCAI 2016.
3D-2D U-Net
[Figure: input = leading frames, center frame, lagging frames → 3D-2D U-Net → output = center frame]
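A minimal sketch of how such an input clip might be assembled, assuming k leading and k lagging frames around the center frame and replication of the first/last frame at the video boundaries (the window size and boundary policy are assumptions; the slide does not state them). The helper `make_clip` is hypothetical; the network would then predict only the center frame.

```python
import numpy as np


def make_clip(video: np.ndarray, center: int, k: int) -> np.ndarray:
    """Stack k leading frames, the center frame, and k lagging frames.

    video: array of shape (T, H, W, C).
    Returns an array of shape (2k + 1, H, W, C). Out-of-range indices are
    clamped to [0, T - 1], i.e. edge frames are repeated (assumed policy).
    """
    num_frames = video.shape[0]
    offsets = np.arange(-k, k + 1)                  # e.g. [-2, -1, 0, 1, 2]
    idx = np.clip(center + offsets, 0, num_frames - 1)
    return video[idx]


# Toy video: 5 frames of 4x4 RGB, where frame t is filled with the value t.
video = np.stack([np.full((4, 4, 3), t, dtype=np.float32) for t in range(5)])
clip = make_clip(video, center=0, k=2)
print(clip.shape)        # (5, 4, 4, 3)
print(clip[:, 0, 0, 0])  # [0. 0. 0. 1. 2.]
```

Clamping keeps every clip the same length, so the 3D encoder always sees a fixed temporal extent even for the first and last frames.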
[Figure: 3D gated-CNN encoder → 2D gated-CNN decoder → prediction, with skip connections]
Skip connections: encoder features are adjusted to match the shape of the decoder features and then concatenated.
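Encoder features in a 3D-2D design keep a temporal axis, (T, H, W, C), while 2D decoder features are (H, W, C'), so a skip connection has to remove or absorb that axis before concatenation. The slide does not say how this is done; the sketch below assumes the temporal axis is folded into the channel axis, which is only one plausible choice (temporal pooling or taking the center time slice are others). `fold_time` and `skip_concat` are hypothetical names.

```python
import numpy as np


def fold_time(enc_feat: np.ndarray) -> np.ndarray:
    """Fold a 3D-encoder feature map (T, H, W, C) into (H, W, T * C).

    Assumed shape-matching step; the slide only says shapes are matched.
    """
    t, h, w, c = enc_feat.shape
    # Move time next to channels, then merge the two axes.
    return enc_feat.transpose(1, 2, 0, 3).reshape(h, w, t * c)


def skip_concat(enc_feat: np.ndarray, dec_feat: np.ndarray) -> np.ndarray:
    """Concatenate a folded 3D skip feature with a 2D decoder feature (H, W, C')."""
    folded = fold_time(enc_feat)
    if folded.shape[:2] != dec_feat.shape[:2]:
        raise ValueError("spatial sizes of encoder and decoder features must agree")
    return np.concatenate([folded, dec_feat], axis=-1)


rng = np.random.default_rng(0)
enc = rng.random((5, 8, 8, 16)).astype(np.float32)  # 5 frames, 16 channels
dec = rng.random((8, 8, 32)).astype(np.float32)     # 2D decoder feature
out = skip_concat(enc, dec)
print(out.shape)  # (8, 8, 112)
```

Folding keeps all temporal information at the cost of more channels (here 5 × 16 = 80 from the encoder plus 32 from the decoder).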
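Gated convolution*: implicitly knows the inpainting mask
[Figure: input feature → Conv (feature branch); input feature → Conv → Sigmoid (gate); the two branches are multiplied element-wise]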
* Yu, J. et al. “Free-form image inpainting with gated convolution.” arXiv preprint arXiv:1806.03589.
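Gated convolution (Yu et al.) runs two convolutions on the same input: a feature branch passed through an activation φ and a gating branch passed through a sigmoid, then multiplies them element-wise, O = φ(W_f * I) ⊙ σ(W_g * I). Because the gate is learned per pixel and per channel, it can behave like a soft mask, which is one way to read "implicitly knows the inpainting mask": no mask has to be supplied in the blind setting. Below is a minimal single-image NumPy sketch, assuming ReLU for φ, stride 1, and zero "same" padding; the helper names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Naive stride-1, zero-padded 'same' 2D convolution (cross-correlation).

    x: (H, W, C_in); w: (kh, kw, C_in, C_out) with odd kh, kw; b: (C_out,).
    """
    kh, kw, _, c_out = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))  # zero padding
    h, wd, _ = x.shape
    out = np.empty((h, wd, c_out), dtype=np.result_type(x, w))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + kh, j:j + kw, :]        # (kh, kw, C_in)
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def gated_conv2d(x, w_feat, b_feat, w_gate, b_gate):
    """Gated convolution: phi(Conv_f(x)) * sigmoid(Conv_g(x)).

    phi is ReLU here (an assumption). Returns the output and the gate so the
    learned soft mask can be inspected.
    """
    feature = np.maximum(conv2d_same(x, w_feat, b_feat), 0.0)
    gate = sigmoid(conv2d_same(x, w_gate, b_gate))  # values in (0, 1)
    return feature * gate, gate


rng = np.random.default_rng(0)
x = rng.random((6, 6, 3))
w_f = rng.standard_normal((3, 3, 3, 4))
b_f = np.zeros(4)
# A zero gate kernel with zero bias gives sigmoid(0) = 0.5 everywhere,
# so the output is exactly half of the ReLU-activated feature branch.
w_g = np.zeros((3, 3, 3, 4))
y, g = gated_conv2d(x, w_f, b_f, w_g, np.zeros(4))
print(y.shape)              # (6, 6, 4)
print(np.allclose(g, 0.5))  # True
```

A strongly negative gating response drives the output toward zero at that location, which is how a trained gate could suppress text pixels without an explicit mask.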