Delving Deep into Computer Vision
Caner Hazirbas, Machine Learning Meetup #1 (PowerPoint PPT Presentation)


SLIDE 1

Delving Deep into Computer Vision

Caner Hazirbas Machine Learning Meetup #1

SLIDE 2

Delving Deep into Computer Vision Caner Hazirbas | hazirbas@cs.tum.edu 2

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 3

Delving Deep into Computer Vision

FlowNet

[Figure: FlowNetSimple architecture. The two input images are stacked into a 384×512×6 tensor and passed through a contracting encoder conv1–conv6 (kernels 7×7, 5×5, 5×5, then 3×3; channels 64, 128, 256, 256, 512, 512, 512, 512, 1024); a refinement stage upsamples the flow prediction to 136×320.]

FlowNetSimple

SLIDE 4

[Figure: FlowNetSimple architecture (see Slide 3).]

FlowNetSimple

[Figure: FlowNetCorr architecture. The two 384×512×3 inputs are processed by separate conv1–conv3 streams (kernels 7×7, 5×5, 5×5; channels 64, 128, 256); a correlation layer (441 channels) is concatenated with a 1×1 conv_redir (32 channels) into 473 channels, followed by conv3_1–conv6 (3×3 kernels; channels 256, 512, 512, 512, 512, 1024) and a refinement stage that predicts flow at 136×320.]

FlowNetCorr

Learning Optical Flow with Convolutional Networks

ICCV’15

FlowNet

SLIDE 5

Flying Chairs

FlowNet

SLIDE 6

[Figure: FlowNetSimple architecture (see Slide 3).]

FlowNetSimple

FlowNetSimple

FlowNet

SLIDE 7

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

FlowNetCorr

FlowNet

c(x1, x2) = Σ_{o ∈ [−k,k] × [−k,k]} ⟨f1(x1 + o), f2(x2 + o)⟩,   K := 2k + 1
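A direct (unoptimized) sketch of this correlation for a single pair of locations, assuming feature maps of shape (H, W, C). In FlowNetCorr the comparison is restricted to a limited displacement range around x1 with striding, which yields the 441 = 21 × 21 output channels shown in the diagram; this sketch only evaluates one (x1, x2) pair.

```python
import numpy as np

def correlation(f1, f2, x1, x2, k=1):
    """FlowNet-style correlation of two feature maps at locations x1, x2.

    Computes c(x1, x2) = sum over offsets o in [-k, k] x [-k, k] of
    <f1(x1 + o), f2(x2 + o)>, i.e. a patch-wise dot product (patch size K = 2k + 1).
    f1, f2: arrays of shape (H, W, C); x1, x2: (row, col) with a k-pixel margin.
    """
    total = 0.0
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            total += np.dot(f1[x1[0] + dy, x1[1] + dx],
                            f2[x2[0] + dy, x2[1] + dx])
    return total
```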
SLIDE 8

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

FlowNetS FlowNetCorr

Simple vs. Corr
 Flying Chairs

FlowNet

SLIDE 9

FlowNetS FlowNetCorr

Simple vs. Corr
 Sintel

FlowNet

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

SLIDE 10

Learning Optical Flow with Convolutional Networks

FlowNet

SLIDE 11

Delving Deep into Computer Vision

FuseNet FlowNet

SLIDE 12

Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture

ACCV’16

SLIDE 13

A conventional way: HHA

Multi-Scale Convolutional Architecture for Semantic Segmentation, Raj et al., Tech. Report CMU-RI-TR-15-21, 2015

SLIDE 14

A deep way…

SLIDE 15

Why a second encoder for Depth input?

SLIDE 16

  • The proposed network improves all segmentation metrics


Are we any better than HHA?

SLIDE 17

  • The proposed network improves all segmentation metrics
  • Metrics:
      Global: fraction of correctly classified pixels
      Mean: average class accuracy
      IoU: average intersection over union
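The three metrics can be computed from a pair of label maps as follows; a generic numpy sketch, not the evaluation code used in the paper (classes absent from the ground truth are skipped when averaging):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Global pixel accuracy, mean class accuracy, and mean IoU from two label maps."""
    pred, gt = pred.ravel(), gt.ravel()
    glob = np.mean(pred == gt)                      # Global: fraction of correct pixels
    accs, ious = [], []
    for c in range(num_classes):
        gt_c, pred_c = gt == c, pred == c
        if gt_c.sum() == 0:                         # class absent from ground truth
            continue
        accs.append((pred_c & gt_c).sum() / gt_c.sum())        # per-class accuracy
        ious.append((pred_c & gt_c).sum() / (pred_c | gt_c).sum())  # intersection / union
    return glob, np.mean(accs), np.mean(ious)
```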


What about the others?

SLIDE 18

Delving Deep into Computer Vision

FuseNet PoseLSTM FlowNet

[Figure: PoseLSTM pipeline. Pretrained GoogLeNet CNN → y ∈ R^2048, reshaped to Y ∈ R^(32×64) → LSTMs → z ∈ R^128 → FC layers → position p ∈ R^3 and orientation quaternion q ∈ R^4.]

SLIDE 19

ICCV’17

Image-based localization using LSTMs for structured feature correlation

[Figure: PoseLSTM pipeline (see Slide 18).]

SLIDE 20

PoseNet

[Figure: PoseNet pipeline. Pretrained GoogLeNet CNN → y ∈ R^2048 → FC layers → position p ∈ R^3 and orientation quaternion q ∈ R^4 (no LSTM).]

SLIDE 21

Structured Feature Correlation

[Figure: PoseLSTM pipeline (see Slide 18).]
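The dimensions in the pipeline can be traced with a minimal numpy sketch. All weights below are random, hypothetical stand-ins, and the four-directional structured LSTM pass is collapsed into a single matrix for brevity; only the tensor shapes come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the slide; weights are random stand-ins, not trained parameters.
y = rng.standard_normal(2048)                    # GoogLeNet feature vector, y in R^2048
Y = y.reshape(32, 64)                            # reshaped to Y in R^(32x64) for the LSTM passes
W_lstm = rng.standard_normal((128, 2048)) * 0.01
z = np.tanh(W_lstm @ Y.ravel())                  # stand-in for the structured-LSTM output, z in R^128
W_p = rng.standard_normal((3, 128)) * 0.1
W_q = rng.standard_normal((4, 128)) * 0.1
p = W_p @ z                                      # FC regresses camera position p in R^3
q = W_q @ z                                      # FC regresses orientation quaternion q in R^4
q = q / np.linalg.norm(q)                        # quaternion normalized to unit length
```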

SLIDE 22

Winner in Outdoor: SIFT

SLIDE 23

The map cannot be reconstructed for lack of sufficient matches: repeated structures and textureless areas

Where SIFT dies…

TUM-LSI Dataset

SLIDE 24

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 25

  • The image of a point converges to a single point on the camera sensor when the point is in focus
  • Therefore, sharpness identifies the in-focus regions of the images

https://inst.eecs.berkeley.edu/~cs39j/sp02/session12.html

Deep Depth From Focus

SLIDE 26

  • The image of a point converges to a single point on the camera sensor when the point is in focus
  • Therefore, sharpness identifies the in-focus regions of the images
  • The distance of a point from the camera can be formulated with respect to its focus

Pipeline: measure of sharpness [Pertuz et al.] → optimizer [Moeller et al.]

DDFF

Conventional DFF methods
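As a concrete, hypothetical baseline in the spirit of this conventional pipeline, one can score the sharpness of every stack slice and take a per-pixel argmax over the stack. The squared Laplacian response below is a simple stand-in for the focus measures surveyed by [Pertuz et al.]; [Moeller et al.] instead solve a variational optimization rather than an argmax.

```python
import numpy as np

def depth_from_focus(stack, depths):
    """Naive depth-from-focus: per-pixel argmax of a sharpness score over the stack.

    stack: (S, H, W) focal stack; depths: (S,) focus distance of each slice.
    Sharpness is the squared response of a 4-neighbor Laplacian (circular borders).
    """
    scores = []
    for img in stack:
        lap = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
               np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)
        scores.append(lap ** 2)
    best = np.argmax(np.stack(scores), axis=0)  # index of sharpest slice per pixel
    return depths[best]
```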

SLIDE 27

  • Focus changes gradually across the images in the stack
  • End-to-end trained convolutional auto-encoder
  • Depth (disparity) from the focal stack

DDFF

Deep Depth From Focus

SLIDE 28

How to get data?

SLIDE 29

http://limu.ait.kyushu-u.ac.jp/e/project/project003.html


I(x, y) = ∫u ∫v L(u, v, x, y) ∂u ∂v
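The integral above can be sketched numerically for a discretized light field (axis order L[u, v, x, y] assumed for illustration):

```python
import numpy as np

def lightfield_to_image(L):
    """Integrate a discretized 4-D light field L[u, v, x, y] over the aperture
    coordinates (u, v) to form the conventional 2-D image I(x, y)."""
    return L.sum(axis=(0, 1))
```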

DDFF

Light-field Imaging

SLIDE 30

http://limu.ait.kyushu-u.ac.jp/e/project/project003.html

DDFF

Light-field Imaging

SLIDE 31

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v
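The refocusing formula can be sketched for a discretized light field, restricted to integer shifts for simplicity (`shifts[u][v]` holds the hypothetical per-sub-aperture (∆x, ∆y)):

```python
import numpy as np

def refocus(L, shifts):
    """Digital refocusing: shift each sub-aperture view L[u, v] by its (dx, dy),
    then integrate over the aperture. Integer shifts with circular borders."""
    U, V, H, W = L.shape
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dx, dy = shifts[u][v]
            out += np.roll(L[u, v], shift=(dx, dy), axis=(0, 1))
    return out
```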

DDFF

Digital Refocusing

SLIDE 32

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v

DDFF

Digital Refocusing

SLIDE 33

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v

DDFF

Digital Refocusing

(∆x(u), ∆y(v))ᵀ = (baseline · f / Z) · (u_center − u, v_center − v)ᵀ

where baseline · f / Z is the disparity, and:

  • Z: any arbitrary depth
  • baseline: distance between adjacent sub-apertures
  • f: focal length of the micro-lenses
  • (u, v)ᵀ: spatial location of the sub-aperture in the camera plane
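The relation maps directly to code (argument names are illustrative):

```python
def subaperture_shift(Z, baseline, f, u, v, u_center, v_center):
    """Refocusing shift (dx, dy) of sub-aperture (u, v) for a target depth Z."""
    disparity = baseline * f / Z          # disparity = baseline * f / Z
    return disparity * (u_center - u), disparity * (v_center - v)
```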
SLIDE 34

  • 720 recorded light-field/depth pairs
  • collected in 12 different scenes
  • 6 scenes with 100 pairs each, 6 scenes with 20 pairs each


DDFF 12-Scene dataset

SLIDE 35

  • Micro disparity (270 µm = 2.7 × 10⁻⁴ m) between sub-apertures results in a sub-pixel shift
  • Therefore, the focus change is not observable by the human eye
  • Shift the sub-apertures using a phase-shift algorithm


F{I(x + ∆x(u))}(ξ) = F{I(x)}(ξ) · e^(2πi ξ ∆x(u))

[Jeon et al.]

DDFF

First Challenge

SLIDE 36

  • 10 refocused images, focused between 50 cm and 7 m
  • Focus (disparity) changes linearly across the stack
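Combining this with disparity = baseline · f / Z from the refocusing slides, the stack's disparity values can be sketched as follows (a hypothetical helper; actual camera parameters are not given here):

```python
import numpy as np

def stack_disparities(baseline, f, z_near=0.5, z_far=7.0, n=10):
    """Disparities of the n refocused stack images, spaced linearly between the
    disparities of the farthest (7 m) and nearest (0.5 m) focus distances,
    using disparity = baseline * f / Z."""
    return np.linspace(baseline * f / z_far, baseline * f / z_near, n)
```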


Focal Stack

SLIDE 37

  • What network to choose?


Second Challenge

SLIDE 38

  • What network to choose?
  • How to process the stack through the network?


Second Challenge

SLIDE 39

  • What network to choose?
  • How to process the stack through the network?
  • What do we expect the network to learn?


Second Challenge

SLIDE 40

  • Loss: missing depth/disparity values are ignored


Training

L = Σ_p^{HW} M(p) · ‖f_W(S, p) − D(p)‖₂² + λ‖W‖₂²
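A numpy sketch of this masked training loss (names are illustrative: `pred` stands for f_W(S, ·), `target` for D, `mask` for M, `weights` for the network parameters W):

```python
import numpy as np

def ddff_loss(pred, target, mask, weights, lam=1e-4):
    """Masked L2 loss over all HW pixels plus L2 weight decay.
    Pixels without ground-truth depth/disparity (mask == 0) are ignored."""
    data = np.sum(mask * (pred - target) ** 2)              # sum_p M(p) * ||f - D||^2
    decay = lam * sum(np.sum(w ** 2) for w in weights)      # lambda * ||W||^2
    return data + decay
```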
SLIDE 41

  • DDFFNet reduces the depth error by 75% with respect to VDFF
  • Best scaling factor for VDFF and Lytro:


Evaluation

k* = arg min_k Σ_p ‖k · Z̃_p − Z_p‖₂²
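This least-squares scale has a closed-form solution, k* = ⟨Z̃, Z⟩ / ⟨Z̃, Z̃⟩, which can be sketched directly (`pred` is the scale-ambiguous estimate Z̃, `gt` the ground truth Z; names are illustrative):

```python
import numpy as np

def best_scale(pred, gt):
    """Closed-form least-squares scale k* aligning a predicted depth map to
    ground truth: minimizes sum_p ||k * pred_p - gt_p||^2."""
    pred, gt = pred.ravel(), gt.ravel()
    return np.dot(pred, gt) / np.dot(pred, pred)
```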
SLIDE 42

  • DDFFNet-CC3 (S = 10) has the lowest badpix and depth error


Evaluation

SLIDE 43

More analyses of DDFFNet

  • sharpness in DDFFNet
  • non-linearly refocused stack

DDFF 12-Scene dataset

  • refocusing
  • DLF
  • 3D reconstruction

DDFF

What’s Next?

SLIDE 44

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 45

  • FlowNet: Learning Optical Flow with Convolutional Networks
    A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, T. Brox, ICCV'15
  • FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture
    C. Hazirbas, L. Ma, C. Domokos, D. Cremers, ACCV'16
  • Image-based localization using LSTMs for structured feature correlation
    F. Walch, C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck, D. Cremers, ICCV'17
  • Deep Depth From Focus
    C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck, D. Cremers, ArXiv'16

References