- Prof. Leal-Taixé and Prof. Niessner
Advanced Deep Learning for Computer Vision
The Team
Lecturers: Prof. Dr. Laura Leal-Taixé, Prof. Dr. Matthias Niessner
Tutors: Tim Meinhardt, Ji Hou, Maxim Maximov, Dave Chen
various Computer Vision tasks
specific vision problem!
project where you can put all the knowledge to practice
10:00-11:30h
:00-15:30h (Seminar Room, 02.09.023)
https://dvl.in.tum.de/teaching/adl4cv-ws19/
27th February, 13:30-14:30
https://dvl.in.tum.de/teaching/adl4cv-ws19/
.10., midnight: deliver a 1-page abstract of your idea for the project.
First presentation: first results, challenges
– 04.12.: Groups #1 – 11.12.: Groups #2
Second presentation: almost final results, new things you tried
– 08.01.: Groups #1 – 15.01.: Groups #2
.02.: final deadline on report (deadline noon) – Max 4 pages using CVPR template
Final presentation = POSTER – Date 05.02. 13:00-16:00
– Presentations (2 oral pres. + 1 poster) = 1/3 – Final report = 1/3 – Code/submission = 1/3
have weekly office hours to discuss the progress
approved
https://dvl.in.tum.de/teaching/adl4cv-ws19/
adl4cv@dvl.in.tum.de
– Chat after the lecture – Post it on Moodle
Ji Hou
3D Detection/Segmentation/Instance/Completion on
Detection on Single RGB-D Image.
– Song, Shuran, and Jianxiong Xiao. "Deep sliding shapes for amodal 3d object detection in rgb-d images." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. – Qi, Charles R., et al. "Frustum pointnets for 3d object detection from rgb-d data." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. – Qi, Charles R., et al. "Deep Hough Voting for 3D Object Detection in Point Clouds." arXiv preprint arXiv:1904.09664 (2019).
Lifting 2D detection to 3D
– Srivastava, Siddharth, Frederic Jurie, and Gaurav Sharma. "Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles." arXiv preprint arXiv:1904.08494 (2019). – Kulkarni, Nilesh, et al. "3D-RelNet: Joint Object and Relational Network for 3D Prediction." arXiv preprint arXiv:1906.02729 (2019). – http://www.cvlibs.net/datasets/kitti/
Instance Segmentation/Completion on 3D reconstruction
– Hou, Ji, Angela Dai, and Matthias Nießner. "3d-sis: 3d semantic instance segmentation of rgb-d scans." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. – Hou, Ji, Angela Dai, and Matthias Nießner. "3D-SIC: 3D Semantic Instance Completion for RGB-D Scans." arXiv preprint arXiv:1904.12012 (2019).
Detection on
views
– Chen, Xiaozhi, et al. "Multi-view 3d object detection network for autonomous driving." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. – Single View + Merging
to combine geometry and color (and radar)
– Dai, Angela, and Matthias Nießner. "3dmv: Joint 3d-multi-view prediction for 3d semantic scene segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
for 3D Scene Understanding." arXiv preprint arXiv:1909.13603 (2019).
Reconstruction from RGB image(s)
and multi-view 3d object reconstruction." European conference on computer vision. Springer, Cham, 2016.
generation network for 3d object reconstruction from a single image." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Dave Z. Chen
3D Cross-modal Retrieval: Bridging the Gap between 3D Objects and Natural Language Descriptions
– Chen et al. "Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings" ArXiv Preprint. 2018. – Han et al. "Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences" The AAAI Conference on Artificial Intelligence. 2018. – Tutor: Dave Z. Chen – Contact: zhenyu.chen@tum.de
Automatic Description Generating for 3D CAD models
– Xu et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. – Lu et al. "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. – Tutor: Dave Z. Chen – Contact: zhenyu.chen@tum.de
Generating descriptions for objects in 3D scenes
– Xu et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. – Lu et al. "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. – Tutor: Dave Z. Chen – Contact: zhenyu.chen@tum.de
Localization in 3D scenes using Natural Language
– Hu et al. "Natural Language Object Retrieval" Proceedings of the IEEE Conference
– Hu et al. "Segmentation from Natural Language Expressions" Proceedings of the IEEE European Conference on Computer Vision. 2016. – Tutor: Dave Z. Chen – Contact: zhenyu.chen@tum.de
Grounding referring expressions in 3D scenes with multimodal data
– Hu et al. "Natural Language Object Retrieval" Proceedings of the IEEE Conference
– Dai, Angela, and Matthias Nießner. "3dmv: Joint 3d-multi-view prediction for 3d semantic scene segmentation." Proceedings of the European Conference on Computer Vision. 2018. – Tutor: Dave Z. Chen – Contact: zhenyu.chen@tum.de
Tim Meinhardt
Video object segmentation (single/multiple objects)
Bringing OSVOS to real world pedestrian tracking scenarios:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017. Related work:
Voigtlaender, B. Leibe, BMVC 2017. – Datasets: MOTS: Multi-Object Tracking and Segmentation. Paul Voigtlaender, Michael Krause, Aljoša Ošep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, Bastian Leibe. 2019.
– Tutor: Tim Meinhardt – Contact: tim.meinhardt@tum.de
Video object segmentation (single/multiple objects)
Enhancing OSVOS for multi-object segmentation: Related work:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, and L. Van Gool. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.
Order Spatio-Temporal MRF. L. Bao, B. Wu, W. Liu, CVPR 2018.
– Tutor: Tim Meinhardt – Contact: tim.meinhardt@tum.de
Multiple object tracking in real-world scenarios – Meta-learning for:
Philipp Bergmann, Tim Meinhardt, and Laura Leal-Taixe. IEEE International Conference on Computer Vision (ICCV), 2019. Related work:
Liangliang Ren, Jiwen Lu, Zifeng Wang, Qi Tian, Jie Zhou. ECCV 2018. – Tutor: Tim Meinhardt – Contact: tim.meinhardt@tum.de
Multiple object tracking in real-world scenarios – Building an appearance model for:
Philipp Bergmann, Tim Meinhardt, and Laura Leal-Taixe. IEEE International Conference on Computer Vision (ICCV), 2019.
– Tutor: Tim Meinhardt – Contact: tim.meinhardt@tum.de
Maxim Maximov
Matching blurry images
– How partially blurry images can be matched with sharp ones (for different tasks) – Estimate between 2 images: disparity map OR camera localization OR some other metric
– “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric” – “Efficient Deep Learning for Stereo Matching” – “Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching” – "DeMoN: Depth and Motion Network for Learning Monocular Stereo" – "Learning Monocular Depth by Distilling Cross-domain Stereo Networks" – Other works for stereo matching
ing methods: de-blurring, stabilization, stylization etc.
Video Stabilization
– Temporally coherent & sharp
– Only from videos
– “Burst Image Deblurring Using Permutation Invariant Convolutional Neural Networks” – Google Approach – “Deep Online Video Stabilization” – "Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring" – "DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks" – Other motion deblurring papers
to render 2D Images (Video)
Estimate components of rendered image
Rendering from intermediate renders
– “Render” RGB image based on masks, normals, depth, RGB, etc. – Make it realistic (appearance, shadow) – Different options (regular approach, focus on light/shadows, GAN, video)
– “Geometric Image Synthesis” – “Photographic Image Synthesis with Cascaded Refinement Networks” – "IGNOR: Image-guided Neural Object Rendering" – "NVS Machines: Learning Novel View Synthesis with Fine-grained View Control" – Other Image synthesis papers
Representation of
scene reconstruction network
– How to fuse representations from different viewpoints – Open topic
– “Neural scene representation and rendering” – "Inverting Visual Representations with Convolutional Networks" – "Learning to Generate Chairs, Tables and Cars with Convolutional Networks" – "Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling" – "Neural Discrete Representation Learning" – "DeepVoxels: Learning Persistent 3D Feature Embeddings" – Other 3D Reconstruction papers with latent representation
Illumination estimation
– Use RGB (+ optionally Depth) – How a mirror ball would look given an image – And/or estimate shadow map
– “Neural Inverse Rendering of an Indoor Scene from a Single Image” – “DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality” – “What Is Around The Camera?” – "LIME: Live Intrinsic Material Estimation" – "Learning to Reconstruct Shape and Spatially-Varying Reflectance from a Single Image" – AR selfie method – Other inverse-rendering papers
Deep Learning Models Interpretability
– Analysis + Visualization
– Common Problems – Open topic – Many related works
– Github - pytorch-cnn-vizualization – Building blocks of interpretability – “The elephant in the room”, etc – Many other papers
Generative adversarial networks for video generation
"Generating videos with scene dynamics." Advances In Neural Information Processing Systems. 2016
preprint arXiv:1610.00527 (2016)
preprint arXiv:1808.06601 (2018).
forgery generation and detection
Video Dataset for Forgery Detection in Human Faces." arXiv preprint arXiv:1803.09179 (2018).
preprint arXiv:1805.11714 (2018).
– No lecture next week !!! (ICCV)
projects!