Video Object Segmentation - PowerPoint PPT Presentation



slide-1
SLIDE 1

Video Object Segmentation

CV3DST | Prof. Leal-Taixé 1

slide-2
SLIDE 2

Video Object Segmentation

  • Object Detection: Lectures 2-3
  • Object Tracking: Lectures 4-5
  • Object Segmentation: Lectures 7-8
  • Video Object Segmentation: this lecture


slide-3
SLIDE 3

Video Object Segmentation

  • Goal: Generate accurate and temporally consistent pixel masks for objects in a video sequence.


slide-4
SLIDE 4

VOS: some challenges

  • Strong viewpoint/appearance changes


slide-5
SLIDE 5

VOS: some challenges

  • Strong viewpoint/appearance changes
  • Occlusions


slide-6
SLIDE 6

VOS: some challenges

  • Strong viewpoint/appearance changes
  • Occlusions
  • Scale changes


slide-7
SLIDE 7

VOS: some challenges

  • Strong viewpoint/appearance changes
  • Occlusions
  • Scale changes
  • Illumination
  • Shape

Hard to make assumptions about the object's appearance

Hard to make assumptions about the object's motion


slide-8
SLIDE 8

VOS: tasks

  • Semi-supervised (one-shot) video object segmentation: we get the ground-truth mask of the first frame, so we know which object to segment.
  • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks.


slide-9
SLIDE 9

VOS: tasks

  • Semi-supervised (one-shot) video object segmentation: we get the ground-truth mask of the first frame, so we know which object to segment.
  • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks.


Motion segmentation, salient object detection, ...

slide-10
SLIDE 10

VOS: tasks

  • Semi-supervised (one-shot) video object segmentation: we get the ground-truth mask of the first frame, so we know which object to segment.
  • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks.


→ This lecture: semi-supervised (one-shot) video object segmentation

slide-11
SLIDE 11

Supervised Video Object Segmentation

  • Task formulation
    – Given: segmentation mask of target object(s) in the first frame
    – Goal: pixel-accurate segmentation of the entire video
    – Currently a major testing ground for segmentation-based tracking

(Figure: given the first-frame ground truth, the goal is the complete video segmentation)
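To make "pixel-accurate" measurable, a common choice (the region similarity J reported on DAVIS) is the intersection-over-union (Jaccard index) between the predicted and ground-truth masks, averaged over the frames of a video. The sketch below is a minimal illustration, not the official benchmark code: it assumes masks are NumPy arrays, the helper names mask_iou and mean_iou are hypothetical, and both skipping the given first frame and scoring two empty masks as 1.0 are our own conventions.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks (convention: 1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pred_masks, gt_masks, skip_first: bool = True) -> float:
    """Average per-frame IoU over a video.

    The first-frame mask is given as input in semi-supervised VOS, so by
    default it is excluded from the average (our assumption).
    """
    pairs = list(zip(pred_masks, gt_masks))[1 if skip_first else 0:]
    return float(np.mean([mask_iou(p, g) for p, g in pairs]))
```

For example, if the middle frame of a three-frame video recovers only half of the object and the last frame is perfect, mean_iou returns 0.75 with the first frame skipped.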


slide-12
SLIDE 12

VOS Datasets

  • Remember that large-scale datasets are needed for learning-based methods
  • DAVIS 2016 (30/20 training/evaluation videos, single objects, masks given in the first frame)
  • DAVIS 2017 (60/90 training/evaluation videos, multiple objects, masks given in the first frame)
  • YouTube-VOS 2018 (3471/982 training/evaluation videos, multiple objects, mask given in the first frame where each object appears)

https://davischallenge.org https://youtube-vos.org


slide-13
SLIDE 13

Before we get started…

  • Pixel-wise output
  • If we talk about pixel-wise outputs and motion, there is a concept in computer vision that we need to know first


slide-14
SLIDE 14

Optical flow


slide-15
SLIDE 15

Optical flow

  • Input: 2 consecutive images (e.g. from a video)
  • Output: displacement of every pixel from image A to image B
  • Results in the "perceived" (apparent) 2D motion in the image plane, not the real motion of the object
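A flow field is usually stored as an H × W × 2 array holding the displacement (u, v) of every pixel. One quick way to see what it encodes: warping image B back with the flow from A to B should approximately reproduce image A. The sketch below assumes that convention (channel 0 = horizontal u, channel 1 = vertical v), uses nearest-neighbour sampling, and clamps out-of-bounds samples to the border; warp_backward is a hypothetical helper name.

```python
import numpy as np


def warp_backward(img_b: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp image B into the frame of image A using the flow A -> B.

    flow[y, x] = (u, v) means pixel (x, y) of A moves to (x + u, y + v) in B.
    Nearest-neighbour sampling; coordinates outside B are clamped to the border.
    Works for grayscale (H, W) and colour (H, W, C) images.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel of A lands in B, rounded to the nearest integer pixel.
    xb = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    yb = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img_b[yb, xb]
```

If image B is image A shifted two pixels to the right, a constant flow of (2, 0) warps B back onto A everywhere except the columns whose target falls outside the image.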


slide-16
SLIDE 16

Optical flow


slide-17
SLIDE 17

Optical flow


slide-18
SLIDE 18

Optical flow with CNNs

  • End-to-end supervised learning of optical flow


  • P. Fischer et al. "FlowNet: Learning Optical Flow With Convolutional Networks". ICCV 2015
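End-to-end supervised learning needs a loss that compares the predicted flow with the ground truth. A standard error measure for optical flow, which FlowNet also uses as its training loss, is the average endpoint error (EPE): the Euclidean distance between predicted and true displacement vectors, averaged over pixels. A minimal NumPy sketch, assuming (H, W, 2) arrays; average_epe is a hypothetical name.

```python
import numpy as np


def average_epe(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Average endpoint error between two flow fields of shape (H, W, 2)."""
    diff = flow_pred - flow_gt
    # Per-pixel Euclidean length of the error vector, then the mean over pixels.
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))
```

A prediction of (0, 0) against a true displacement of (3, 4) at every pixel gives an EPE of 5.0.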
slide-19
SLIDE 19

Optical flow with CNNs


slide-20
SLIDE 20

FlowNet: architecture 1

  • Stack both images → input is now 2 × RGB = 6 channels
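In the FlowNet paper this first variant is called FlowNetSimple: the two frames are simply concatenated along the channel axis and fed to one ordinary CNN. The sketch assumes channels-last (H, W, 3) NumPy images; deep-learning frameworks often use channels-first instead, and stack_pair is a hypothetical helper name.

```python
import numpy as np


def stack_pair(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Concatenate two RGB frames (H, W, 3) along the channel axis -> (H, W, 6)."""
    if img_a.shape != img_b.shape or img_a.shape[-1] != 3:
        raise ValueError("expected two RGB images of identical shape")
    return np.concatenate([img_a, img_b], axis=-1)
```

Channels 0 to 2 hold image A and channels 3 to 5 hold image B, so the first convolution sees both frames at once.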


slide-21
SLIDE 21

FlowNet: architecture 2

  • Siamese architecture
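"Siamese" means that both images go through the same feature extractor with one shared set of weights, so their features live in the same space and can be compared afterwards. In the toy sketch below, a per-pixel linear map followed by a ReLU (equivalent to a 1×1 convolution) stands in for the real convolutional tower; conv1x1_relu and the shapes used are illustrative assumptions.

```python
import numpy as np


def conv1x1_relu(img: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy feature extractor: 1x1 convolution (per-pixel linear map) + ReLU.

    img: (H, W, C_in), weights: (C_in, C_out) -> features (H, W, C_out).
    """
    return np.maximum(img @ weights, 0.0)


rng = np.random.default_rng(0)
shared_w = rng.standard_normal((3, 8))  # ONE set of weights ...
img_a = rng.random((4, 4, 3))
img_b = rng.random((4, 4, 3))
feat_a = conv1x1_relu(img_a, shared_w)  # ... applied to image A
feat_b = conv1x1_relu(img_b, shared_w)  # ... and to image B (Siamese branches)
```

Because the weights are shared, identical image content produces identical features in both branches, which is what makes the later comparison between the two feature maps meaningful.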


slide-22
SLIDE 22

FlowNet: architecture 2

  • Two key design choices


How to combine the information from both images?

slide-23
SLIDE 23

Correlation layer

  • Multiplies a feature vector from one image with a feature vector from the other image (dot product)


Fixed operation. No learnable weights!
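Written out, the FlowNet paper defines the correlation of two patches centred at x1 in feature map f1 and at x2 in feature map f2 as a sum of dot products over a square patch of size K = 2k + 1 (our transcription). With k = 0 it reduces to the single dot product on this slide, and the formula contains no weights, hence "fixed operation".

```latex
c(\mathbf{x}_1, \mathbf{x}_2) = \sum_{\mathbf{o} \in [-k, k] \times [-k, k]} \left\langle \mathbf{f}_1(\mathbf{x}_1 + \mathbf{o}),\, \mathbf{f}_2(\mathbf{x}_2 + \mathbf{o}) \right\rangle
```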

slide-24
SLIDE 24

Correlation layer

  • The matching score represents how correlated these two feature vectors are
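Computing this score for every pixel and for every displacement within a search window gives a cost volume: one output channel per displacement. The NumPy sketch below assumes patch size 1 (k = 0), feature maps of shape (H, W, C), and a hypothetical max_disp parameter bounding the displacements. It omits the strides the paper additionally uses. Dividing by C to keep scores on a comparable scale is our assumption, since implementations differ here.

```python
import numpy as np


def correlation(f1: np.ndarray, f2: np.ndarray, max_disp: int) -> np.ndarray:
    """Cost volume of dot products <f1(x), f2(x + d)> for |dx|, |dy| <= max_disp.

    f1, f2: feature maps of shape (H, W, C).
    Returns shape (H, W, (2 * max_disp + 1) ** 2); channels are ordered with
    dy as the outer loop and dx as the inner loop.
    Displacements that fall outside f2 score 0 (zero padding).
    There are no learnable parameters: this is a fixed operation.
    """
    h, w, c = f1.shape
    pad = max_disp
    f2p = np.pad(f2, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad spatially only
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shifted[y, x] == f2[y + dy, x + dx] (or 0 outside the image)
            shifted = f2p[pad + dy: pad + dy + h, pad + dx: pad + dx + w]
            out.append(np.sum(f1 * shifted, axis=-1) / c)  # normalised dot product
    return np.stack(out, axis=-1)
```

With max_disp = 1 there are 9 channels, and channel 5 corresponds to the displacement (dy, dx) = (0, 1), that is, comparing each pixel with its right-hand neighbour in the second feature map.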


slide-25
SLIDE 25

Correlation layer

  • Hint for anyone interested in 3D reconstruction: useful for finding image correspondences


  • I. Rocco et al. "Convolutional neural network architecture for geometric matching". CVPR 2017.

(Figure: find a transformation from image A to image B)
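Rocco et al. go further and regress the transformation itself with a CNN built on correlation scores. The sketch below does not do that. It only illustrates the simpler idea from the hint: turn correlation scores into correspondences by pairing each feature in A with its best-scoring feature in B, and keeping only mutual nearest neighbours. It assumes descriptors are given as (N, C) and (M, C) arrays, and mutual_nn_matches is a hypothetical helper.

```python
import numpy as np


def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> list[tuple[int, int]]:
    """Match feature vectors by correlation, keeping mutual nearest neighbours.

    desc_a: (N, C) features from image A, desc_b: (M, C) features from image B.
    Returns pairs (i, j) where j is i's best match in B and i is j's best match in A.
    """
    # L2-normalise so the dot product becomes a cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    scores = a @ b.T                # (N, M) correlation between every pair
    best_b = scores.argmax(axis=1)  # best match in B for each feature of A
    best_a = scores.argmax(axis=0)  # best match in A for each feature of B
    return [(i, int(j)) for i, j in enumerate(best_b) if best_a[j] == i]
```

The mutual check discards ambiguous one-way matches. A transformation from A to B could then be fitted to the surviving pairs, for example with least squares, though that step is not shown here.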

slide-26
SLIDE 26

FlowNet: architecture 2

  • Two key design choices


How to combine the information from both images?
How to obtain high-quality results?

slide-27
SLIDE 27

Can we do VOS with OF (optical flow)?

  • Indeed!
  • Better if we focus on the flow of the object
  • We can improve segmentation and OF iteratively (no DL yet)
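The most basic way to use flow for semi-supervised VOS is to propagate the previous frame's mask along the flow into the next frame. This is not the method of Tsai et al., which refines segmentation and flow jointly. It only illustrates the propagation step that such refinement improves on, since errors of plain propagation accumulate from frame to frame. The sketch assumes backward flow (from frame t+1 to frame t), nearest-neighbour lookup, and a hypothetical helper name propagate_mask.

```python
import numpy as np


def propagate_mask(mask_t: np.ndarray, flow_next_to_prev: np.ndarray) -> np.ndarray:
    """Propagate a binary mask from frame t to frame t+1 using optical flow.

    flow_next_to_prev[y, x] = (u, v): pixel (x, y) of frame t+1 came from
    (x + u, y + v) in frame t (backward flow). Nearest-neighbour lookup;
    pixels whose source lies outside frame t are set to background.
    """
    h, w = mask_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow_next_to_prev[..., 0]).astype(int)
    src_y = np.rint(ys + flow_next_to_prev[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(mask_t)
    out[inside] = mask_t[src_y[inside], src_x[inside]]
    return out
```

Applied repeatedly, starting from the given first-frame mask, this yields a mask for every frame of the video.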


Y.H. Tsai et al. "Video Segmentation via Object Flow". CVPR 2016