Bored by Classification ConvNets? End-to-end Learning of other Computer Vision Tasks

Bored by Classification ConvNets? End-to-end Learning of other Computer Vision Tasks Thomas Brox University of Freiburg Germany Research funded by ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung


SLIDE 1

Bored by Classification ConvNets? End-to-end Learning of other Computer Vision Tasks

Thomas Brox University of Freiburg Germany

Research funded by ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung

SLIDE 2

Outline

  • Generative networks
  • U-Net: Multi-instance segmentation
  • FlowNet: Estimating optical flow

SLIDE 3

Typical ConvNet architecture

cat

Classification network

SLIDE 4

Typical ConvNet architecture

cat

Classification network

cat

SLIDE 5

Image generation

Small gray office chair, side view

Up-convolutional network

Related work:

  • Eigen et al. NIPS 2014: Network for depth map prediction
  • Long et al. CVPR 2015: Network for semantic segmentation

Alexey Dosovitskiy CVPR 2015

New: Expanding network architecture
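As a rough illustration of what one step of an expanding (up-convolutional) architecture might look like, the sketch below doubles the resolution of a feature map and then filters it. The slides give no layer details, so the zero-filling unpooling scheme, the 3x3 kernel, and the function names `unpool2x`, `conv2d_same`, and `upconv` are all assumptions made for this example.

```python
def unpool2x(fmap):
    """Double the resolution: each value lands in the top-left of a 2x2 block,
    the other three entries are zero (an assumed unpooling scheme)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            out[2 * y][2 * x] = fmap[y][x]
    return out


def conv2d_same(fmap, kernel):
    """Filter a 2D map with an odd-sized square kernel, zero padding at the border."""
    h, w = len(fmap), len(fmap[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = acc
    return out


def upconv(fmap, kernel):
    """One hypothetical up-convolution step: unpool, then filter."""
    return conv2d_same(unpool2x(fmap), kernel)


coarse = [[1.0, 2.0],
          [3.0, 4.0]]
box = [[1.0] * 3 for _ in range(3)]  # all-ones kernel, standing in for learned weights
fine = upconv(coarse, box)           # 4x4 map produced from a 2x2 map
```

In a trained network the kernel weights would be learned and several such steps would be stacked; the all-ones kernel here only makes the result easy to check by hand.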

SLIDE 6

Generating chair images with a network

Dosovitskiy et al., CVPR 2015

SLIDE 7

Training set

3D chair dataset (Aubry et al. CVPR 2014)

  • Rendering 809 chair styles
  • From 62 viewpoints

Some of the rendered chairs

Source: https://github.com/dimatura/seeing3d

SLIDE 8

Generating images of unseen views

Training set split into two subsets:

  • Source set: 62 viewpoints available (90% of all chair models)
  • Target set: fewer viewpoints available (10% of all models)
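The 90%/10% split can be pictured with a small sketch. The figures 809 and 62 come from the training-set slide; the random assignment, the seed, and the helper name `split_models` are assumptions, since the slides do not say how models were assigned to each subset.

```python
import random

NUM_STYLES = 809      # chair styles, from the slides
NUM_VIEWPOINTS = 62   # viewpoints per style, from the slides


def split_models(num_models, source_fraction=0.9, seed=0):
    """Randomly split model ids into a source set and a target set (assumed procedure)."""
    ids = list(range(num_models))
    random.Random(seed).shuffle(ids)
    cut = round(source_fraction * num_models)
    return ids[:cut], ids[cut:]


source_ids, target_ids = split_models(NUM_STYLES)
print(len(source_ids), len(target_ids))  # 728 81
print(NUM_STYLES * NUM_VIEWPOINTS)       # 50158 rendered images if every view is used
```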

SLIDE 9

Generating images of unseen views

  • 8 azimuths available
  • 4 azimuths available
  • 2 azimuths available
  • 1 azimuth available

SLIDE 10

Comparison to baselines

Alexey Dosovitskiy CVPR 2015

SLIDE 11

Interpolation of chair styles

Alexey Dosovitskiy CVPR 2015

SLIDE 12

Correspondences between chair instances

Alexey Dosovitskiy CVPR 2015

SLIDE 13

Correspondences between chair instances

  • Generate intermediate images with the network
  • Track points with optical flow (LDOF) along the sequence

Method                                                  All   Easy  Difficult
Deformable Spatial Pyramid Matching (Kim et al. 2013)   5.2   3.3   6.3
SIFT Flow (Liu et al. 2008)                             4.0   2.8   4.8
Ours                                                    3.9   3.9   3.9
Human performance                                       1.1   1.1   1.1
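The tracking step named on this slide, following points with optical flow through the generated intermediate images, can be sketched as chaining per-frame displacements. This is a minimal illustration, not the LDOF method itself: the flow fields are assumed to be given as nested lists of `(dx, dy)` tuples, and the nearest-pixel lookup and the name `track_point` are choices made for this example.

```python
def track_point(point, flows):
    """Follow one (x, y) point through a sequence of flow fields.

    flows[t][y][x] is the (dx, dy) displacement from frame t to frame t + 1.
    """
    x, y = point
    for flow in flows:
        h, w = len(flow), len(flow[0])
        # Nearest-pixel lookup, clamped to the image (an assumption; real
        # trackers usually interpolate the flow at sub-pixel positions).
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
    return x, y


# Two intermediate steps, each shifting every pixel by (1.0, 0.5).
constant_flow = [[(1.0, 0.5)] * 4 for _ in range(4)]
end = track_point((0.0, 0.0), [constant_flow, constant_flow])
print(end)  # (2.0, 1.0)
```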

SLIDE 14

Preview: Inverting ConvNets with ConvNets

Alexey Dosovitskiy arXiv 2015

  • Image features, e.g. from AlexNet
  • Up-convolutional network
  • Learn to re-generate the input image from its feature representation

Related work:

  • Mahendran & Vedaldi CVPR 2015
  • Zeiler & Fergus ECCV 2014

SLIDE 15

Reconstruction results

Methods shown: Up-Conv., Mahendran & Vedaldi, Autoencoder

More reconstructions with up-convolutional network

SLIDE 16

Color and position are preserved in high layers

Experiments: Color experiment, Position experiment

Reconstructions shown: input, All FC8, Top 5 FC8, All but Top 5 FC8
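One way to read the "Top 5 FC8" and "All but Top 5 FC8" labels is as masked versions of the FC8 activation vector that are then fed to the reconstruction network. The sketch below shows such masking under that assumption; setting the excluded entries to zero and the helper names are guesses for illustration, as the slide does not describe the procedure.

```python
def top_k_indices(activations, k=5):
    """Indices of the k largest activations."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return set(order[:k])


def keep_top_k(activations, k=5):
    """Keep only the k largest activations, zero the rest (assumed 'Top 5' variant)."""
    top = top_k_indices(activations, k)
    return [a if i in top else 0.0 for i, a in enumerate(activations)]


def drop_top_k(activations, k=5):
    """Zero the k largest activations, keep the rest (assumed 'All but Top 5' variant)."""
    top = top_k_indices(activations, k)
    return [0.0 if i in top else a for i, a in enumerate(activations)]


fc8 = [0.1, 0.9, 0.3, 0.7]     # toy vector; a real FC8 layer is much longer
print(keep_top_k(fc8, k=2))    # [0.0, 0.9, 0.0, 0.7]
print(drop_top_k(fc8, k=2))    # [0.1, 0.0, 0.3, 0.0]
```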

SLIDE 17

Outline

  • A generative network
  • U-Net: Multi-instance segmentation
  • FlowNet: Estimating optical flow

SLIDE 18

U-Net: Image segmentation with a ConvNet

Olaf Ronneberger, Philipp Fischer (MICCAI 2015)

  • Similar to Fully Convolutional Network [Long et al., CVPR 2015]
  • Original inspiration: Depth map prediction [Eigen et al., NIPS 2014]

SLIDE 19

Binary segmentation

  • Electron microscopy: ISBI 2012 Challenge, Rank 1
  • Light microscopy cell tracking: ISBI 2015 Challenge, Rank 1

SLIDE 20

Multi-class semantic segmentation

Intersection over union: 77.5% (second best: 46%)

X-ray dental segmentation, ISBI 2015 Challenge, Rank 1
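For readers unfamiliar with the intersection-over-union score quoted on this slide, here is a minimal sketch of the standard definition on binary masks. The nested-list mask format, the function name, and the convention of returning 1.0 when both masks are empty are assumptions for the example; the challenge's own evaluation code may differ in detail.

```python
def intersection_over_union(pred, truth):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treating that case as 1.0 is a convention.
    return inter / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(intersection_over_union(pred, truth))  # 0.5
```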

SLIDE 21

Multi-instance segmentation

Light microscopy, DIC-HeLa cell tracking ISBI 2015 Challenge: Rank 1

SLIDE 22

Outline

  • A generative network
  • U-Net: Multi-instance segmentation
  • FlowNet: Estimating optical flow

SLIDE 23

FlowNet: Estimating optical flow with a ConvNet

Refinement: expanding architecture

SLIDE 24

Helping the network with a correlation layer

Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Häusser, Caner Hazirbas, Vladimir Golkov

Joint work with the group of Daniel Cremers
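The idea behind a correlation layer can be sketched as comparing each feature vector in the first image with feature vectors at nearby positions in the second image. The version below uses single-pixel dot products and a small displacement range; these simplifications, the data layout, and the name `correlation` are assumptions for illustration and not the layer as specified by the authors.

```python
def correlation(f1, f2, max_disp=1):
    """Dot-product correlation between two feature maps.

    f1[y][x] and f2[y][x] are feature vectors (lists of floats).
    Returns (out, disps) where out[y][x][c] is the similarity between
    f1 at (x, y) and f2 at (x + dx, y + dy) for disps[c] == (dx, dy).
    Positions that fall outside the second map score 0.
    """
    h, w = len(f1), len(f1[0])
    disps = [(dx, dy)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(disps) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for c, (dx, dy) in enumerate(disps):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    out[y][x][c] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, disps


# A 1x3 feature map and a copy shifted one pixel to the right.
f1 = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
f2 = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
out, disps = correlation(f1, f2)
scores = out[0][0]
best = disps[max(range(len(disps)), key=scores.__getitem__)]
print(best)  # (1, 0): the best match for pixel (0, 0) lies one pixel to the right
```

The output gives the rest of the network explicit matching scores per candidate displacement instead of requiring it to learn the comparison from raw features.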

SLIDE 25

Enough data to train such a network?

  • Getting ground truth optical flow for realistic videos is hard
  • Existing datasets are small:

Dataset      Frames with ground truth
Middlebury   8
KITTI        194
Sintel       1041
Needed       >10000

SLIDE 26

Realism is overrated: the “flying chair” dataset

Rendered image Optical flow

SLIDE 27

It works!

Although the network has only seen flying chairs for training, it predicts good optical flow on Sintel.

Panels: Input images, Ground truth, FlowNetSimple, FlowNetCorr

SLIDE 28

Results on various datasets

Method          Middlebury  KITTI  Sintel Clean  Sintel Final  Flying Chairs
EpicFlow        0.39        3.8    4.1           6.3           2.9
DeepFlow        0.42        5.8    5.4           7.2           3.5
LDOF            0.56        12.4   7.6           9.1           3.5
FlowNetS        -           -      7.4           8.4           2.7
FlowNetS+v      -           -      6.5           7.7           2.9
FlowNetS+ft     -           9.1    7.0           7.8           3.0
FlowNetS+ft+v   0.47        7.6    6.2           7.2           3.0
FlowNetC        -           -      7.3           8.8           2.2
FlowNetC+v      -           -      6.3           8.0           2.6
FlowNetC+ft     -           -      6.9           8.5           2.3
FlowNetC+ft+v   0.5         -      6.1           7.9           2.7

Networks can compete with state-of-the-art conventional optical flow estimation methods
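The slide does not name the error measure behind these numbers. Assuming they are average endpoint errors in pixels, the usual measure on these benchmarks, the computation can be sketched as follows; the nested-list flow format and the function name are choices made for this example.

```python
import math


def average_endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and true flow vectors.

    pred[y][x] and truth[y][x] are (u, v) displacement tuples.
    """
    total = 0.0
    count = 0
    for p_row, t_row in zip(pred, truth):
        for (pu, pv), (tu, tv) in zip(p_row, t_row):
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    return total / count


pred = [[(3.0, 4.0), (0.0, 0.0)]]
truth = [[(0.0, 0.0), (0.0, 0.0)]]
print(average_endpoint_error(pred, truth))  # 2.5
```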

SLIDE 29

Can handle large displacements

Panels: Input images, Ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al. ICCV 2013), EpicFlow (Revaud et al. CVPR 2015)

SLIDE 30

Sometimes wrong direction

Panels: Input images, Ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al. ICCV 2013), EpicFlow (Revaud et al. CVPR 2015)

SLIDE 31

Often captures fine details

Panels: Input images, Ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al. ICCV 2013), EpicFlow (Revaud et al. CVPR 2015)

SLIDE 32

Results on “Flying chairs” test set

Panels: Input images, FlowNetCorr, EpicFlow (Revaud et al. CVPR 2015), Ground truth

SLIDE 33

Runs at 10 fps on the GPU

SLIDE 34

Summary

  • A generative network
  • U-Net: Multi-instance segmentation
  • FlowNet: Estimating optical flow

SLIDE 35

Tip of the day