Video Object Segmentation


  1. Video Object Segmentation. CV3DST | Prof. Leal-Taixé

  2. Video Object Segmentation • Lectures 2-3: Object Detection • Lectures 4-5: Object Tracking • Lectures 7-8: Object Segmentation • This lecture: Video Object Segmentation

  3. Video Object Segmentation • Goal: Generate accurate and temporally consistent pixel masks for objects in a video sequence.

  4. VOS: some challenges • Strong viewpoint/appearance changes

  5. VOS: some challenges • Strong viewpoint/appearance changes • Occlusions

  6. VOS: some challenges • Strong viewpoint/appearance changes • Occlusions • Scale changes

  7. VOS: some challenges • Strong viewpoint/appearance changes • Occlusions • Scale changes • Illumination • Shape • … Hard to make assumptions about the object's appearance. Hard to make assumptions about the object's motion.

  8. VOS: tasks • Semi-supervised (one-shot) video object segmentation: we get the first-frame ground truth mask, so we know what object to segment • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks

  9. VOS: tasks • Semi-supervised (one-shot) video object segmentation: we get the first-frame ground truth mask, so we know what object to segment • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks • Motion segmentation, salient object detection..

  10. VOS: tasks • Semi-supervised (one-shot) video object segmentation (this lecture): we get the first-frame ground truth mask, so we know what object to segment • Unsupervised (zero-shot) video object segmentation: we have to find the objects as well as their masks

  11. Supervised Video Object Segmentation • Given: first-frame ground truth • Goal: complete video segmentation • Task formulation – Given: segmentation mask of target object(s) in the first frame – Goal: pixel-accurate segmentation of the entire video – Currently a major testing ground for segmentation-based tracking

  12. VOS Datasets • Remember that large-scale datasets are needed for learning-based methods • DAVIS 2016 (30/20, single objects, first frames) • DAVIS 2017 (60/90, multiple objects, first frames) • YouTube-VOS 2018 (3471/982, multiple objects, first frame where object appears) • https://davischallenge.org • https://youtube-vos.org

  13. Before we get started… • Pixel-wise output • If we talk about pixel-wise outputs and motion, there is a concept in Computer Vision that we need to know first

  14. Optical flow

  15. Optical flow • Input: 2 consecutive images (e.g. from a video) • Output: displacement of every pixel from image A to image B • Results in the "perceived" 2D motion, not the real motion of the object
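To make the input/output definition above concrete, here is a minimal sketch in plain Python. It is not an optical flow estimator: it only shows what a flow field is, by moving every pixel of image A by its displacement to produce image B. The function name `apply_flow`, the `(dx, dy)` tuple layout, and the restriction to integer displacements are assumptions made for illustration; real flow fields are sub-pixel.

```python
def apply_flow(image_a, flow):
    """Move every pixel of image_a by its (dx, dy) displacement.

    image_a: H x W grid of intensities (list of lists).
    flow:    H x W grid of integer (dx, dy) tuples, one per pixel of A.
    Pixels that leave the image are dropped; unfilled targets stay 0.
    """
    height, width = len(image_a), len(image_a[0])
    image_b = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                image_b[new_y][new_x] = image_a[y][x]
    return image_b


# Toy frame with a single bright pixel.
image_a = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
]
# Assumed flow: every pixel moves one step to the right.
flow = [[(1, 0)] * 4 for _ in range(3)]
image_b = apply_flow(image_a, flow)
```

Estimating optical flow is the inverse problem: given `image_a` and `image_b`, recover `flow`.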

  16. Optical flow

  17. Optical flow

  18. Optical flow with CNNs • End-to-end supervised learning of optical flow. P. Fischer et al. "FlowNet: Learning Optical Flow With Convolutional Networks". ICCV 2015

  19. Optical flow with CNNs. P. Fischer et al. "FlowNet: Learning Optical Flow With Convolutional Networks". ICCV 2015

  20. FlowNet: architecture 1 • Stack both images → input is now 2 × RGB = 6 channels
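The stacking step above can be sketched as follows. This only builds the 6-channel input and does not implement the network; the per-pixel tuple layout and the name `stack_image_pair` are assumptions chosen to keep the example dependency-free.

```python
def stack_image_pair(image_a, image_b):
    """Concatenate two RGB images channel-wise.

    Each image is an H x W grid of (r, g, b) tuples. The result is an
    H x W grid of 6-tuples: the channels of A followed by those of B.
    """
    return [
        [pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Two tiny 1 x 2 RGB frames (made-up values).
image_a = [[(255, 0, 0), (0, 255, 0)]]
image_b = [[(0, 0, 255), (10, 20, 30)]]
stacked = stack_image_pair(image_a, image_b)
channels = len(stacked[0][0])  # 2 x RGB = 6
```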

  21. FlowNet: architecture 2 • Siamese architecture

  22. FlowNet: architecture 2 • Two key design choices: How to combine the information from both images?

  23. Correlation layer • Multiplies a feature vector with another feature vector. Fixed operation. No learnable weights!

  24. Correlation layer • The matching score represents how correlated these two feature vectors are
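The correlation layer described on the two slides above can be sketched in plain Python as a dot product between a feature vector from image A and feature vectors from nearby positions in image B. This is a simplified illustration, assuming a patch size of one feature vector and a small search window; `correlation_layer` and `max_disp` are hypothetical names, and the sketch is not the FlowNet implementation. Note that it contains no learnable weights.

```python
def dot(u, v):
    """Matching score between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def correlation_layer(feat_a, feat_b, max_disp=1):
    """Compare each feature vector of A with nearby feature vectors of B.

    feat_a, feat_b: H x W grids of feature vectors (lists of floats).
    Returns the list of (dy, dx) offsets and, for every position, one
    score per offset. Offsets that fall outside the image score 0.0.
    """
    height, width = len(feat_a), len(feat_a[0])
    offsets = [
        (dy, dx)
        for dy in range(-max_disp, max_disp + 1)
        for dx in range(-max_disp, max_disp + 1)
    ]
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = []
            for dy, dx in offsets:
                yb, xb = y + dy, x + dx
                if 0 <= yb < height and 0 <= xb < width:
                    scores.append(dot(feat_a[y][x], feat_b[yb][xb]))
                else:
                    scores.append(0.0)
            row.append(scores)
        output.append(row)
    return offsets, output


# Toy 1 x 3 feature maps: B is A shifted one position to the right.
feat_a = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
feat_b = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
offsets, scores = correlation_layer(feat_a, feat_b, max_disp=1)

# The best-scoring offset at position (0, 0) recovers the shift.
best = offsets[scores[0][0].index(max(scores[0][0]))]
```

In this toy example the highest score at position (0, 0) is at offset (dy, dx) = (0, 1), which matches the one-pixel shift to the right.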

  25. Correlation layer • Hint for anyone interested in 3D reconstruction: useful for finding image correspondences • Find a transformation from image A to image B. I. Rocco et al. "Convolutional neural network architecture for geometric matching". CVPR 2017.

  26. FlowNet: architecture 2 • Two key design choices: How to obtain high-quality results? How to combine the information from both images?

  27. Can we do VOS with OF? • Indeed! • Better if we focus on the flow of the object • We can improve segmentation and OF iteratively (no DL yet). Y.H. Tsai et al. "Video Segmentation via Object Flow". CVPR 2016
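As a rough illustration of how optical flow could drive semi-supervised VOS, the sketch below warps the first-frame mask from frame to frame using a given flow field. This is a naive baseline for intuition only, not the iterative method of Tsai et al.; the names `propagate_mask` and `segment_video` are hypothetical, and the flow is assumed to be already estimated and to hold integer `(dx, dy)` displacements.

```python
def propagate_mask(mask, flow):
    """Warp a binary mask to the next frame using per-pixel (dx, dy) flow."""
    height, width = len(mask), len(mask[0])
    next_mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                dx, dy = flow[y][x]
                new_x, new_y = x + dx, y + dy
                if 0 <= new_x < width and 0 <= new_y < height:
                    next_mask[new_y][new_x] = 1
    return next_mask


def segment_video(first_mask, flows):
    """Propagate the first-frame mask through the video, one flow per step."""
    masks = [first_mask]
    for flow in flows:
        masks.append(propagate_mask(masks[-1], flow))
    return masks


# First-frame ground truth mask of a two-pixel object.
first_mask = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
# Assumed flows: the scene moves right, then down.
right = [[(1, 0)] * 4 for _ in range(3)]
down = [[(0, 1)] * 4 for _ in range(3)]
masks = segment_video(first_mask, [right, down])
```

Pure warping like this accumulates errors under occlusions and appearance changes, which is one reason to refine segmentation and flow jointly rather than rely on flow alone.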
