Video Object Segmentation

Sep 12, 2022


SLIDE 1

Video Object Segmentation

CV3DST | Prof. Leal-Taixé 1

SLIDE 2

Video Object Segmentation

Object Detection: Lectures 2-3
Object Tracking: Lectures 4-5
Object Segmentation: Lectures 7-8
Video Object Segmentation: This lecture

SLIDE 3

Video Object Segmentation

Goal: Generate accurate and temporally consistent pixel masks for objects in a video sequence.

SLIDE 4

VOS: some challenges

Strong viewpoint/appearance changes

SLIDE 5

VOS: some challenges

Strong viewpoint/appearance changes
Occlusions

SLIDE 6

VOS: some challenges

Strong viewpoint/appearance changes
Occlusions
Scale changes

SLIDE 7

VOS: some challenges

Strong viewpoint/appearance changes
Occlusions
Scale changes
Illumination
Shape
…

Hard to make assumptions about the object’s appearance
Hard to make assumptions about the object’s motion

SLIDE 8

VOS: tasks

Semi-supervised (one-shot) video object segmentation: We get the first-frame ground truth mask, so we know what object to segment.
Unsupervised (zero-shot) video object segmentation: We have to find the objects as well as their masks.

SLIDE 9

VOS: tasks

Semi-supervised (one-shot) video object segmentation: We get the first-frame ground truth mask, so we know what object to segment.
Unsupervised (zero-shot) video object segmentation: We have to find the objects as well as their masks.

Motion segmentation, salient object detection, …

SLIDE 10

VOS: tasks

Semi-supervised (one-shot) video object segmentation: We get the first-frame ground truth mask, so we know what object to segment.
Unsupervised (zero-shot) video object segmentation: We have to find the objects as well as their masks.

This lecture

SLIDE 11

Supervised Video Object Segmentation

Task formulation

– Given: segmentation mask of target object(s) in the first frame
– Goal: pixel-accurate segmentation of the entire video
– Currently a major testing ground for segmentation-based tracking

Given: First-frame ground truth
Goal: Complete video segmentation

SLIDE 12

VOS Datasets

Remember that large-scale datasets are needed for learning-based methods

DAVIS 2016 (30/20, single objects, first frames)
DAVIS 2017 (60/90, multiple objects, first frames)
YouTube-VOS 2018 (3471/982, multiple objects, first frame where object appears)

https://davischallenge.org
https://youtube-vos.org

SLIDE 13

Before we get started…

Pixel-wise output
If we talk about pixel-wise outputs and motion, there is a concept in Computer Vision that we need to know first

SLIDE 14

Optical flow

SLIDE 15

Optical flow

Input: 2 consecutive images (e.g. from a video)
Output: displacement of every pixel from image A to image B
Results in the “perceived” 2D motion, not the real motion of the object
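The output described above can be pictured as an array holding one 2D displacement per pixel. The following is a minimal sketch, assuming a dense flow field stored as an (H, W, 2) NumPy array with (dx, dy) per pixel; the function name `apply_flow` and the toy values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def apply_flow(points_xy, flow):
    """Move integer pixel coordinates from image A to image B.

    points_xy: (N, 2) integer array of (x, y) positions in image A.
    flow:      (H, W, 2) array, flow[y, x] = (dx, dy) for that pixel (assumed layout).
    """
    x = points_xy[:, 0]
    y = points_xy[:, 1]
    displacement = flow[y, x]        # look up one (dx, dy) per point
    return points_xy + displacement  # positions in image B

# Toy flow field: every pixel moves 2 px to the right and 1 px up.
H, W = 4, 6
flow = np.zeros((H, W, 2), dtype=np.float32)
flow[..., 0] = 2.0
flow[..., 1] = -1.0

points = np.array([[1, 2], [3, 3]])  # (x, y) positions in image A
moved = apply_flow(points, flow)
```

Note that this only describes apparent 2D image motion: a pure camera pan over a static scene would produce the same kind of uniform flow field even though no object moved.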

SLIDE 16

Optical flow

SLIDE 17

Optical flow

SLIDE 18

Optical flow with CNNs

End-to-end supervised learning of optical flow

P. Fischer et al. “FlowNet: Learning Optical Flow With Convolutional Networks”. ICCV 2015

SLIDE 19

Optical flow with CNNs

P. Fischer et al. “FlowNet: Learning Optical Flow With Convolutional Networks”. ICCV 2015

SLIDE 20

FlowNet: architecture 1

Stack both images → input is now 2 x RGB = 6 channels
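The stacking step can be sketched in a few lines. This is a minimal illustration, assuming channels-last (H, W, 3) images; the helper name `stack_image_pair` and the image size are assumptions chosen for the example, and the network that would consume the 6-channel input is not shown.

```python
import numpy as np

def stack_image_pair(image_a, image_b):
    """Concatenate two RGB images along the channel axis: 2 x 3 = 6 channels."""
    if image_a.shape != image_b.shape or image_a.shape[-1] != 3:
        raise ValueError("expected two RGB images of identical shape (H, W, 3)")
    return np.concatenate([image_a, image_b], axis=-1)

# Two random stand-in frames (hypothetical size).
rng = np.random.default_rng(0)
image_a = rng.random((96, 128, 3), dtype=np.float32)
image_b = rng.random((96, 128, 3), dtype=np.float32)

stacked = stack_image_pair(image_a, image_b)  # shape (96, 128, 6)
```

With this design the network itself has to learn how to relate the first three channels to the last three; nothing in the input tells it which pixels correspond.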

SLIDE 21

FlowNet: architecture 2

Siamese architecture

SLIDE 22

FlowNet: architecture 2

Two key design choices

How to combine the information from both images?

SLIDE 23

Correlation layer

Multiplies a feature vector with another feature vector

Fixed operation. No learnable weights!

SLIDE 24

Correlation layer

The matching score represents how correlated these two feature vectors are
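The matching score described above can be sketched as a dot product between a feature vector in the first map and feature vectors at nearby positions in the second map. This is a simplified illustration, assuming single feature vectors rather than patches, channels-last (H, W, C) maps, and zero padding at the border; the name `correlation_layer` and the `max_disp` parameter are assumptions for the example, not the FlowNet implementation.

```python
import numpy as np

def correlation_layer(feat_a, feat_b, max_disp=1):
    """Dot-product matching scores between two feature maps.

    feat_a, feat_b: (H, W, C) feature maps.
    Returns (H, W, D*D) with D = 2*max_disp + 1. Channel k = row*D + col holds
    the score for displacement (dy, dx) = (row - max_disp, col - max_disp).
    There are no learnable weights: the operation is fixed.
    """
    H, W, _ = feat_a.shape
    D = 2 * max_disp + 1
    pad = ((max_disp, max_disp), (max_disp, max_disp), (0, 0))
    padded_b = np.pad(feat_b, pad)  # zero padding outside the image
    scores = np.zeros((H, W, D * D), dtype=feat_a.dtype)
    k = 0
    for row in range(D):
        for col in range(D):
            shifted_b = padded_b[row:row + H, col:col + W, :]
            scores[..., k] = np.sum(feat_a * shifted_b, axis=-1)
            k += 1
    return scores

# Toy example: the same feature appears one pixel further right in map B.
feat_a = np.zeros((5, 5, 2), dtype=np.float32)
feat_b = np.zeros((5, 5, 2), dtype=np.float32)
feat_a[2, 2] = [1.0, 2.0]
feat_b[2, 3] = [1.0, 2.0]

scores = correlation_layer(feat_a, feat_b, max_disp=1)
best = int(np.argmax(scores[2, 2]))  # channel 5 corresponds to (dy, dx) = (0, +1)
```

The highest score at position (2, 2) lands in the channel for a one-pixel shift to the right, which is exactly the displacement a flow estimator would want to read off.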

SLIDE 25

Correlation layer

Hint for anyone interested in 3D reconstruction: Useful for finding image correspondences

Find a transformation from image A to image B

I. Rocco et al. “Convolutional neural network architecture for geometric matching”. CVPR 2017

SLIDE 26

FlowNet: architecture 2

Two key design choices

How to combine the information from both images?
How to obtain high-quality results?

SLIDE 27

Can we do VOS with OF?

Indeed!
Better if we focus on the flow of the object
We can improve segmentation and OF iteratively (no DL yet)

Y.H. Tsai et al. “Video Segmentation via Object Flow”. CVPR 2016
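One simple way to see how optical flow can help VOS is to push a known mask forward along the flow to get an initial mask for the next frame. The sketch below is a naive forward warp with nearest-pixel rounding, assuming a boolean (H, W) mask and an (H, W, 2) flow field with (dx, dy) per pixel; it is not the method of Tsai et al., and the name `propagate_mask` is an assumption for the example.

```python
import numpy as np

def propagate_mask(mask, flow):
    """Forward-warp a binary object mask from frame t to frame t+1.

    mask: (H, W) boolean array, True on the object in frame t.
    flow: (H, W, 2) array, flow[y, x] = (dx, dy) from frame t to t+1 (assumed layout).
    """
    H, W = mask.shape
    ys, xs = np.nonzero(mask)                            # object pixels in frame t
    new_xs = np.rint(xs + flow[ys, xs, 0]).astype(int)   # nearest target column
    new_ys = np.rint(ys + flow[ys, xs, 1]).astype(int)   # nearest target row
    inside = (new_xs >= 0) & (new_xs < W) & (new_ys >= 0) & (new_ys < H)
    warped = np.zeros_like(mask)
    warped[new_ys[inside], new_xs[inside]] = True        # drop pixels leaving the frame
    return warped

# Toy example: a 2x2 object moves one pixel right and one pixel down.
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 1:3] = True
flow = np.ones((5, 5, 2), dtype=np.float32)

warped = propagate_mask(mask, flow)
```

A warp like this inherits every error in the flow and can leave holes when the object grows, which is one reason to refine segmentation and flow together rather than trusting either one alone.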