SLIDE 27 Perceptual contrastive loss L_contrast
To help the decoder D segment the co-occurrent objects, we exploit two properties:
◮ high foreground-object similarity across images;
◮ high foreground-background discrepancy within each image.
We first generate the object image I^o_i and the background image I^b_i for each image I_i by

I^o_i = M_i ⊗ I_i and I^b_i = (1 − M_i) ⊗ I_i for i ∈ {A, B},   (6)

where ⊗ denotes the pixel-wise multiplication between the mask and the image.
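Eq. (6) can be sketched in NumPy as follows; the shapes and values here are purely illustrative (a 4×4 RGB image and a hypothetical binary mask), with the mask broadcast over the channel axis to realize the pixel-wise product ⊗:

```python
import numpy as np

# Illustrative shapes: H x W x 3 image, H x W binary mask.
H, W = 4, 4
rng = np.random.default_rng(0)
I_i = rng.random((H, W, 3))        # input image I_i
M_i = np.zeros((H, W))             # binary co-occurrent object mask M_i (assumed given)
M_i[1:3, 1:3] = 1.0

# Eq. (6): pixel-wise multiplication, broadcasting the mask over channels.
I_o = M_i[..., None] * I_i         # object image I^o_i = M_i ⊗ I_i
I_b = (1.0 - M_i)[..., None] * I_i # background image I^b_i = (1 − M_i) ⊗ I_i

# The two parts decompose the image exactly.
assert np.allclose(I_o + I_b, I_i)
```

Since the mask is binary, the object and background images partition each pixel of I_i, which is what the final assertion checks.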
We apply an ImageNet-pretrained ResNet-50 network F to I^o_i and I^b_i to extract their semantic feature vectors F(I^o_i) and F(I^b_i), respectively.