Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
Dong Zhang1, Omar Javed2, Mubarak Shah1 presented by Sehyun Joo based on Dong’s slide
– Method Framework – Object Proposal Expansion – Layered DAG based Primary Object Selection – GMMs and MRF Optimization
Segmentation of the primary moving object in videos
label MRF optimization”, BMVC, 2010:
– SegTrack dataset – Assume manual annotations
segmentation”, ICCV, 2011:
– Object proposals – Extract “key-segments”
constraints for video object segmentation”, CVPR, 2012:
– Object proposals – NP-hard problem, approximate optimization, computation cost
[1] Ian Endres and Derek Hoiem, “Category Independent Object Proposals”, ECCV, 2010
[2] Alexe, B., Deselaers, T. and Ferrari, V., “What is an object?”, CVPR, 2010
[Figure: ranked object proposals by frame index, SegTrack (monkeydog)]
[Figure: ranked object proposals expansion by frame index, SegTrack (parachute)]
problem for an efficient solution.
(1) A novel DAG formulation (2) Dynamic programming solution
(1) Object proposal expansion (2) Flow-warped shape and color similarity measure to connect frames (3) Optical flow gradient scoring function
Input Videos → Object Proposal Generation & Expansion → Layered DAG Optimization for Primary Object Selection → GMMs and MRF based Optimization → Object Segmentation
Unary edge: joins the beginning node and the ending node of an object proposal; its weight represents the proposal's object-ness.
$S_{unary}(r) = M(r) + A(r)$
$A(r)$: appearance score from [1]; $M(r)$: average Frobenius norm of the optical flow gradient
$U_x = \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}$
$\|U_x\|_F = \sqrt{u_x^2 + u_y^2 + v_x^2 + v_y^2}$
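The unary score above (a motion term plus an appearance term) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the flow components `u` and `v` are given as 2-D lists, derivatives are approximated with forward differences, the appearance score from [1] is passed in as a number, and all function names are hypothetical. Which pixels are averaged (the whole region or a band around its boundary) is left to the caller.

```python
import math

def flow_gradient_norm(u, v, y, x):
    """Frobenius norm of the 2x2 optical flow gradient at pixel (y, x).

    u, v: 2-D lists with the horizontal and vertical flow components.
    Derivatives are forward differences, clamped at the image border
    (an assumption of this sketch).
    """
    h, w = len(u), len(u[0])
    y1, x1 = min(y + 1, h - 1), min(x + 1, w - 1)
    u_x, u_y = u[y][x1] - u[y][x], u[y1][x] - u[y][x]
    v_x, v_y = v[y][x1] - v[y][x], v[y1][x] - v[y][x]
    return math.sqrt(u_x ** 2 + u_y ** 2 + v_x ** 2 + v_y ** 2)

def motion_score(u, v, pixels):
    """Motion term: average gradient norm over the given (y, x) pixels."""
    pixels = list(pixels)
    return sum(flow_gradient_norm(u, v, y, x) for y, x in pixels) / len(pixels)

def unary_score(appearance, u, v, pixels):
    """Unary score: motion term plus the appearance term.

    `appearance` is a stand-in for the score produced by [1].
    """
    return motion_score(u, v, pixels) + appearance
```

A sharp change in flow between neighbouring pixels, as expected at the boundary of a moving object, raises the motion term, while a uniform flow field contributes nothing.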
[Figure panels: original video frame; optical flow; object region; optical flow gradient; boundary region; OF gradient around boundary]
Binary edge
[Figure: binary edges between object proposals in frames i, i+1 and i+2]
$S_{binary}(r_m, r_n) = \lambda \cdot S_{overlap}(r_m, r_n) \cdot S_{color}(r_m, r_n)$
$S_{color}(r_m, r_n) = hist(r_m) \cdot hist(r_n)^T$
$S_{overlap}(r_m, r_n) = \frac{|r_m \cap warp_{mn}(r_n)|}{|r_m \cup warp_{mn}(r_n)|}$
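The binary edge weight above multiplies a flow-warped overlap term by a color histogram term. The sketch below shows one way this could be computed, assuming regions are sets of (y, x) pixels, the flow is a hypothetical dict mapping each pixel to a (dy, dx) displacement, and histograms are already normalized lists. The rounding-based warp and all function names are assumptions made for illustration.

```python
def warp_region(region, flow):
    """Shift every (y, x) pixel of `region` by its rounded flow vector.

    flow: dict mapping (y, x) -> (dy, dx); a simplified stand-in for a
    dense optical flow field.
    """
    warped = set()
    for (y, x) in region:
        dy, dx = flow[(y, x)]
        warped.add((y + round(dy), x + round(dx)))
    return warped

def overlap_score(r_m, r_n, flow_n_to_m):
    """Intersection over union of r_m and r_n warped into r_m's frame."""
    warped = warp_region(r_n, flow_n_to_m)
    union = r_m | warped
    return len(r_m & warped) / len(union) if union else 0.0

def color_score(hist_m, hist_n):
    """Dot product of two color histograms of equal length."""
    return sum(a * b for a, b in zip(hist_m, hist_n))

def binary_score(r_m, r_n, flow_n_to_m, hist_m, hist_n, lam=1.0):
    """Edge weight: lam * overlap term * color term."""
    return lam * overlap_score(r_m, r_n, flow_n_to_m) * color_score(hist_m, hist_n)
```

Because the two terms are multiplied, a pair of proposals scores highly only when both shape (after warping) and color agree.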
[Figure: layered DAG with source s and sink t; frames i-1, i and i+1 span layers 2i-3 to 2i+2]
Goal: select exactly one object proposal from each frame such that the selected proposals have high object-ness and high similarity across frames. This is equivalent to finding the highest-weighted path in the DAG, i.e. the longest path problem on a DAG, which has a dynamic programming solution.
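The highest-weighted path can be found with a Viterbi-style dynamic program over the frames. The sketch below is a simplified illustration, not the presented implementation: it assumes binary edges only between consecutive frames, takes the unary scores as a list of lists, and takes the binary score as a hypothetical callback `binary(t, j, k)` for the edge from proposal `j` in frame `t-1` to proposal `k` in frame `t`.

```python
def select_proposals(unary, binary):
    """Pick one proposal per frame maximizing unary plus binary scores.

    unary[t][k]: unary score of proposal k in frame t.
    binary(t, j, k): weight of the edge from proposal j in frame t-1
                     to proposal k in frame t.
    Returns (best total score, list with the chosen index per frame).
    """
    best = list(unary[0])   # best path score ending at each proposal
    back = []               # back-pointers, one list per frame after the first
    for t in range(1, len(unary)):
        new_best, pointers = [], []
        for k, score in enumerate(unary[t]):
            # Best predecessor in the previous frame for proposal k.
            j = max(range(len(best)), key=lambda j: best[j] + binary(t, j, k))
            new_best.append(best[j] + binary(t, j, k) + score)
            pointers.append(j)
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the last frame.
    k = max(range(len(best)), key=best.__getitem__)
    total, path = best[k], [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return total, path[::-1]
```

With T frames and at most K proposals per frame, this runs in O(T·K²) time, which is what makes the DAG formulation efficient compared with an exhaustive search over K^T combinations.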
Original video Ground truth Selected object proposals Segmentation results
Region within the red boundary is the object region
Original video Ground truth Segmentation results Region within the red boundary is the object region
             Ours   [14]   [13]   [20]   [6]
Supervised?  N      N      N      Y      Y
Birdfall     155    189    288    252    454
Cheetah      633    806    905    1142   1217
Girl         1488   1698   1785   1304   1755
Monkeydog    365    472    521    533    683
Parachute    220    221    201    235    502
Avg.         452    542    592    594    791
* Average per-frame pixel error rate. The smaller, the better.
Original video Segmentation results Region within the red boundary is the object region
(a) Directed Acyclic Graph (b) Directed Cyclic Graph (c) Undirected Acyclic Graph (d) Undirected Cyclic Graph
What is the criterion for the weight of a binary edge?
(a) how non-moving the object is (b) how similar the object is across frames (c) how much the color histograms differ (d) how precise the object's shape is in a frame