Delving Deep into Computer Vision
Caner Hazirbas, Machine Learning Meetup #1 (PowerPoint PPT Presentation)


SLIDE 1

Delving Deep into Computer Vision

Caner Hazirbas Machine Learning Meetup #1

SLIDE 2

Delving Deep into Computer Vision Caner Hazirbas | hazirbas@cs.tum.edu 2

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 3

Delving Deep into Computer Vision

FlowNet

[Figure: FlowNetSimple architecture. The two input images are stacked into a 384×512×6 tensor and passed through a contracting encoder conv1–conv6 (kernels 7×7, 5×5, 5×5, then 3×3; channels 64, 128, 256, 256, 512, 512, 512, 512, 1024); a refinement stage upsamples the flow prediction to 136×320.]

FlowNetSimple

SLIDE 4

[Figure: FlowNetSimple architecture (see Slide 3).]

FlowNetSimple

[Figure: FlowNetCorr architecture. The two 384×512×3 inputs are processed by separate conv1–conv3 streams (kernels 7×7, 5×5, 5×5; channels 64, 128, 256); a correlation layer (441 channels) is concatenated with a 1×1 conv_redir (32 channels) into 473 channels, followed by conv3_1–conv6 (3×3 kernels; channels 256, 512, 512, 512, 512, 1024) and a refinement stage that predicts flow at 136×320.]

FlowNetCorr

Learning Optical Flow with Convolutional Networks

ICCV’15

FlowNet

SLIDE 5

Flying Chairs

FlowNet

SLIDE 6

[Figure: FlowNetSimple architecture (see Slide 3).]

FlowNetSimple

FlowNetSimple

FlowNet

SLIDE 7

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

FlowNetCorr

FlowNet

c(x1, x2) = Σ_{o ∈ [−k,k] × [−k,k]} ⟨f1(x1 + o), f2(x2 + o)⟩,   K := 2k + 1
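A direct (unoptimized) sketch of this correlation for a single pair of locations, assuming feature maps of shape (H, W, C). In FlowNetCorr the comparison is restricted to a limited displacement range around x1 with striding, which yields the 441 = 21 × 21 output channels shown in the diagram; this sketch only evaluates one (x1, x2) pair.

```python
import numpy as np

def correlation(f1, f2, x1, x2, k=1):
    """FlowNet-style correlation of two feature maps at locations x1, x2.

    Computes c(x1, x2) = sum over offsets o in [-k, k] x [-k, k] of
    <f1(x1 + o), f2(x2 + o)>, i.e. a patch-wise dot product (patch size K = 2k + 1).
    f1, f2: arrays of shape (H, W, C); x1, x2: (row, col) with a k-pixel margin.
    """
    total = 0.0
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            total += np.dot(f1[x1[0] + dy, x1[1] + dx],
                            f2[x2[0] + dy, x2[1] + dx])
    return total
```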
SLIDE 8

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

FlowNetS FlowNetCorr

Simple vs. Corr
 Flying Chairs

FlowNet

SLIDE 9

FlowNetS FlowNetCorr

Simple vs. Corr
 Sintel

FlowNet

[Figure: FlowNetCorr architecture (see Slide 4).]

FlowNetCorr

SLIDE 10

Learning Optical Flow with Convolutional Networks

FlowNet

SLIDE 11

Delving Deep into Computer Vision

FuseNet FlowNet

SLIDE 12

Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture

ACCV’16

SLIDE 13

A conventional way: HHA

Multi-Scale Convolutional Architecture for Semantic Segmentation, Raj et al., Tech. Report CMU-RI-TR-15-21, 2015

SLIDE 14

A deep way…

SLIDE 15

Why a second encoder for Depth input?

SLIDE 16

  • The proposed network improves all segmentation metrics


Are we any better than HHA?

SLIDE 17

  • The proposed network improves all segmentation metrics
  • Metrics:
      Global: fraction of correctly classified pixels
      Mean: average class accuracy
      IoU: average intersection over union
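The three metrics can be computed from a pair of label maps as follows; a generic numpy sketch, not the evaluation code used in the paper (classes absent from the ground truth are skipped when averaging):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Global pixel accuracy, mean class accuracy, and mean IoU from two label maps."""
    pred, gt = pred.ravel(), gt.ravel()
    glob = np.mean(pred == gt)                      # Global: fraction of correct pixels
    accs, ious = [], []
    for c in range(num_classes):
        gt_c, pred_c = gt == c, pred == c
        if gt_c.sum() == 0:                         # class absent from ground truth
            continue
        accs.append((pred_c & gt_c).sum() / gt_c.sum())        # per-class accuracy
        ious.append((pred_c & gt_c).sum() / (pred_c | gt_c).sum())  # intersection / union
    return glob, np.mean(accs), np.mean(ious)
```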


What about the others?

SLIDE 18

Delving Deep into Computer Vision

FuseNet PoseLSTM FlowNet

[Figure: PoseLSTM pipeline. Pretrained GoogLeNet CNN → y ∈ R^2048, reshaped to Y ∈ R^(32×64) → LSTMs → z ∈ R^128 → FC layers → position p ∈ R^3 and orientation quaternion q ∈ R^4.]

SLIDE 19

ICCV’17

Image-based localization using LSTMs for structured feature correlation

[Figure: PoseLSTM pipeline (see Slide 18).]

SLIDE 20

PoseNet

[Figure: PoseNet pipeline. Pretrained GoogLeNet CNN → y ∈ R^2048 → FC layers → position p ∈ R^3 and orientation quaternion q ∈ R^4 (no LSTM).]

SLIDE 21

Structured Feature Correlation

[Figure: PoseLSTM pipeline (see Slide 18).]
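The dimensions in the pipeline can be traced with a minimal numpy sketch. All weights below are random, hypothetical stand-ins, and the four-directional structured LSTM pass is collapsed into a single matrix for brevity; only the tensor shapes come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the slide; weights are random stand-ins, not trained parameters.
y = rng.standard_normal(2048)                    # GoogLeNet feature vector, y in R^2048
Y = y.reshape(32, 64)                            # reshaped to Y in R^(32x64) for the LSTM passes
W_lstm = rng.standard_normal((128, 2048)) * 0.01
z = np.tanh(W_lstm @ Y.ravel())                  # stand-in for the structured-LSTM output, z in R^128
W_p = rng.standard_normal((3, 128)) * 0.1
W_q = rng.standard_normal((4, 128)) * 0.1
p = W_p @ z                                      # FC regresses camera position p in R^3
q = W_q @ z                                      # FC regresses orientation quaternion q in R^4
q = q / np.linalg.norm(q)                        # quaternion normalized to unit length
```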

SLIDE 22

Winner in Outdoor: SIFT

SLIDE 23

The map cannot be reconstructed for lack of sufficient matches: repeated structures and textureless areas

Where SIFT dies…

TUM-LSI Dataset

SLIDE 24

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 25

  • The image of a point converges to a single point on the camera sensor when the point is in focus
  • Therefore, sharpness identifies the in-focus regions of the images

https://inst.eecs.berkeley.edu/~cs39j/sp02/session12.html

Deep Depth From Focus

SLIDE 26

  • The image of a point converges to a single point on the camera sensor when the point is in focus
  • Therefore, sharpness identifies the in-focus regions of the images
  • The distance of a point from the camera can be formulated with respect to its focus

Pipeline: measure of sharpness [Pertuz et al.] → optimizer [Moeller et al.]

DDFF

Conventional DFF methods
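As a concrete, hypothetical baseline in the spirit of this conventional pipeline, one can score the sharpness of every stack slice and take a per-pixel argmax over the stack. The squared Laplacian response below is a simple stand-in for the focus measures surveyed by [Pertuz et al.]; [Moeller et al.] instead solve a variational optimization rather than an argmax.

```python
import numpy as np

def depth_from_focus(stack, depths):
    """Naive depth-from-focus: per-pixel argmax of a sharpness score over the stack.

    stack: (S, H, W) focal stack; depths: (S,) focus distance of each slice.
    Sharpness is the squared response of a 4-neighbor Laplacian (circular borders).
    """
    scores = []
    for img in stack:
        lap = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
               np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)
        scores.append(lap ** 2)
    best = np.argmax(np.stack(scores), axis=0)  # index of sharpest slice per pixel
    return depths[best]
```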

SLIDE 27

  • Focus changes gradually across the images in the stack
  • End-to-end trained convolutional auto-encoder
  • Depth (disparity) from the focal stack

DDFF

Deep Depth From Focus

SLIDE 28

How to get data?

SLIDE 29

http://limu.ait.kyushu-u.ac.jp/e/project/project003.html


I(x, y) = ∫u ∫v L(u, v, x, y) ∂u ∂v
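The integral above can be sketched numerically for a discretized light field (axis order L[u, v, x, y] assumed for illustration):

```python
import numpy as np

def lightfield_to_image(L):
    """Integrate a discretized 4-D light field L[u, v, x, y] over the aperture
    coordinates (u, v) to form the conventional 2-D image I(x, y)."""
    return L.sum(axis=(0, 1))
```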

DDFF

Light-field Imaging

SLIDE 30

http://limu.ait.kyushu-u.ac.jp/e/project/project003.html

DDFF

Light-field Imaging

SLIDE 31

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v
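The refocusing formula can be sketched for a discretized light field, restricted to integer shifts for simplicity (`shifts[u][v]` holds the hypothetical per-sub-aperture (∆x, ∆y)):

```python
import numpy as np

def refocus(L, shifts):
    """Digital refocusing: shift each sub-aperture view L[u, v] by its (dx, dy),
    then integrate over the aperture. Integer shifts with circular borders."""
    U, V, H, W = L.shape
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dx, dy = shifts[u][v]
            out += np.roll(L[u, v], shift=(dx, dy), axis=(0, 1))
    return out
```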

DDFF

Digital Refocusing

SLIDE 32

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v

DDFF

Digital Refocusing

SLIDE 33

I′(x, y) = ∫u ∫v L(u, v, x + ∆x(u), y + ∆y(v)) ∂u ∂v

DDFF

Digital Refocusing

(∆x(u), ∆y(v))ᵀ = (baseline · f / Z) · (u_center − u, v_center − v)ᵀ

where baseline · f / Z is the disparity, and:

  • Z: any arbitrary depth
  • baseline: distance between adjacent sub-apertures
  • f: focal length of the micro-lenses
  • (u, v)ᵀ: spatial location of the sub-aperture in the camera plane
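The relation maps directly to code (argument names are illustrative):

```python
def subaperture_shift(Z, baseline, f, u, v, u_center, v_center):
    """Refocusing shift (dx, dy) of sub-aperture (u, v) for a target depth Z."""
    disparity = baseline * f / Z          # disparity = baseline * f / Z
    return disparity * (u_center - u), disparity * (v_center - v)
```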
SLIDE 34

  • 720 recorded light-field/depth pairs
  • collected in 12 different scenes
  • 6 scenes with 100 pairs each, 6 scenes with 20 pairs each


DDFF 12-Scene dataset

SLIDE 35

  • Micro disparity (270 µm = 2.7 × 10⁻⁴ m) between sub-apertures results in a sub-pixel shift
  • Therefore, the focus change is not observable by the human eye
  • Shift the sub-apertures using a phase-shift algorithm


F{I(x + ∆x(u))}(ξ) = F{I(x)}(ξ) · e^(2πi ξ ∆x(u))

[Jeon et al.]

DDFF

First Challenge

SLIDE 36

  • 10 refocused images, focused between 50 cm and 7 m
  • Focus (disparity) changes linearly across the stack
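Combining this with disparity = baseline · f / Z from the refocusing slides, the stack's disparity values can be sketched as follows (a hypothetical helper; actual camera parameters are not given here):

```python
import numpy as np

def stack_disparities(baseline, f, z_near=0.5, z_far=7.0, n=10):
    """Disparities of the n refocused stack images, spaced linearly between the
    disparities of the farthest (7 m) and nearest (0.5 m) focus distances,
    using disparity = baseline * f / Z."""
    return np.linspace(baseline * f / z_far, baseline * f / z_near, n)
```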


Focal Stack

SLIDE 37

  • What network to choose?


Second Challenge

SLIDE 38

  • What network to choose?
  • How to process the stack through the network?


Second Challenge

SLIDE 39

  • What network to choose?
  • How to process the stack through the network?
  • What do we expect the network to learn?


Second Challenge

SLIDE 40

  • Loss: missing depth/disparity values are ignored


Training

L = Σ_p^{HW} M(p) · ‖f_W(S, p) − D(p)‖₂² + λ‖W‖₂²
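A numpy sketch of this masked training loss (names are illustrative: `pred` stands for f_W(S, ·), `target` for D, `mask` for M, `weights` for the network parameters W):

```python
import numpy as np

def ddff_loss(pred, target, mask, weights, lam=1e-4):
    """Masked L2 loss over all HW pixels plus L2 weight decay.
    Pixels without ground-truth depth/disparity (mask == 0) are ignored."""
    data = np.sum(mask * (pred - target) ** 2)              # sum_p M(p) * ||f - D||^2
    decay = lam * sum(np.sum(w ** 2) for w in weights)      # lambda * ||W||^2
    return data + decay
```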
SLIDE 41

  • DDFFNet reduces the depth error by 75% with respect to VDFF
  • Best scaling factor for VDFF and Lytro:


Evaluation

k* = arg min_k Σ_p ‖k · Z̃_p − Z_p‖₂²
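This least-squares scale has a closed-form solution, k* = ⟨Z̃, Z⟩ / ⟨Z̃, Z̃⟩, which can be sketched directly (`pred` is the scale-ambiguous estimate Z̃, `gt` the ground truth Z; names are illustrative):

```python
import numpy as np

def best_scale(pred, gt):
    """Closed-form least-squares scale k* aligning a predicted depth map to
    ground truth: minimizes sum_p ||k * pred_p - gt_p||^2."""
    pred, gt = pred.ravel(), gt.ravel()
    return np.dot(pred, gt) / np.dot(pred, pred)
```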
SLIDE 42

  • DDFFNet-CC3 (S = 10) has the lowest badpix and depth error


Evaluation

SLIDE 43

More analyses of DDFFNet

  • sharpness in DDFFNet
  • non-linearly refocused stack

DDFF 12-Scene dataset

  • refocusing
  • DLF
  • 3D reconstruction

DDFF

What’s Next?

SLIDE 44

Delving Deep into Computer Vision

FuseNet PoseLSTM DDFF FlowNet

SLIDE 45

  • FlowNet: Learning Optical Flow with Convolutional Networks
    A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, T. Brox, ICCV'15
  • FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture
    C. Hazirbas, L. Ma, C. Domokos, D. Cremers, ACCV'16
  • Image-based localization using LSTMs for structured feature correlation
    F. Walch, C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck, D. Cremers, ICCV'17
  • Deep Depth From Focus
    C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck, D. Cremers, ArXiv'16

References