
SLIDE 1

Evaluation of Segmentation Quality via Adaptive Composition of Reference Segmentations

Bo Peng1, Lei Zhang2, Xuanqin Mou3, and Ming-Hsuan Yang4

1Southwest Jiaotong University, 2Hong Kong Polytechnic University, 3Xi'an Jiaotong University, 4University of California at Merced

SLIDE 2

Introduction

Ø What is image segmentation?

Extract "regions" or "boundaries" via labeling.

[Figure: example image with regions labeled soft tissue, soft bone, and hard bone]

SLIDE 3

Introduction

Ø Evaluation of image segmentation quality

Hand-labeled segmentation (ground truth) vs. machine segmentation

Reference-based segmentation evaluation

SLIDE 4

Introduction

Ø Applications

Ø Performance evaluation of segmentation algorithms.
Ø Proper parameter values can be determined based on reliable quantitative evaluation of image segmentation.

SLIDE 5

Related work

Ø Variation of Information metric (VOI)

It measures the distance between two segmentations in terms of their average conditional entropy.

Ø Segmentation Covering (SC)

It measures the similarity between two segmentations by a weighted average of the overlaps of their regions.

VOI(S_1, S_2) = H(S_1 | S_2) + H(S_2 | S_1) = H(S_1) + H(S_2) - 2 I(S_1, S_2)

C(S' → S) = (1/N) Σ_{R ∈ S} |R| · max_{R' ∈ S'} ( |R ∩ R'| / |R ∪ R'| )

  • 1. M. Meila. Comparing clusterings: an axiomatic view. In International Conference on Machine Learning, pages 577-584, 2005.
  • 2. P. Arbelaez, M. Maire, C. C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):898-916, 2011.

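The VOI measure described above can be sketched in a few lines; this is an illustrative implementation with hypothetical function names, not the authors' code:

```python
import numpy as np

def variation_of_information(s1, s2):
    """VOI(S1, S2) = H(S1 | S2) + H(S2 | S1)  (Meila, 2005).

    s1, s2 -- label maps of equal size; lower VOI means closer segmentations.
    """
    s1 = np.asarray(s1).ravel()
    s2 = np.asarray(s2).ravel()
    n = s1.size
    joint, p1, p2 = {}, {}, {}
    for a, b in zip(s1.tolist(), s2.tolist()):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        p1[a] = p1.get(a, 0) + 1
        p2[b] = p2.get(b, 0) + 1
    voi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # -p(a,b) * [log p(a|b) + log p(b|a)] accumulates both conditional entropies.
        voi -= p_ab * (np.log2(p_ab / (p2[b] / n)) + np.log2(p_ab / (p1[a] / n)))
    return voi
```

Identical segmentations (up to label renaming) give VOI = 0; two 4-pixel maps splitting the image along orthogonal halves score 2 bits.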
SLIDE 6

Related work

Ø Global Consistency Error (GCE)

It measures the degree to which the segmentations S1 and S2 agree with each other.

Ø F-measure

A combination of precision and recall leads to the F-measure.

E(S_1, S_2, p_i) = |R(S_1, p_i) \ R(S_2, p_i)| / |R(S_1, p_i)|

GCE(S_1, S_2) = (1/N) · min{ Σ_i E(S_1, S_2, p_i), Σ_i E(S_2, S_1, p_i) }

F = PR / (αR + (1 - α)P)

  • D. Martin. An Empirical Approach to Grouping and Segmentation. PhD thesis, EECS Department, University of California, Berkeley, 2002.
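The refinement-error definition above translates directly to code; a minimal sketch with hypothetical names (quadratic in the number of pixels, for clarity rather than speed):

```python
import numpy as np

def gce(s1, s2):
    """Global Consistency Error (Martin, 2002): the less-refined direction of
    the local refinement error E, averaged over all N pixels."""
    s1 = np.asarray(s1).ravel()
    s2 = np.asarray(s2).ravel()
    n = s1.size

    def refinement_error_sum(a, b):
        total = 0.0
        for i in range(n):
            ra = a == a[i]                  # region R(a, p_i) as a boolean mask
            rb = b == b[i]                  # region R(b, p_i)
            total += np.count_nonzero(ra & ~rb) / np.count_nonzero(ra)
        return total

    return min(refinement_error_sum(s1, s2), refinement_error_sum(s2, s1)) / n

def f_measure(p, r, alpha=0.5):
    """F = PR / (alpha*R + (1 - alpha)*P); alpha = 0.5 gives the harmonic mean."""
    return p * r / (alpha * r + (1 - alpha) * p)
```

GCE is 0 when one segmentation is a strict refinement of the other, which is exactly its tolerance to differences in granularity.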

SLIDE 7

Related work

Ø Global comparison strategy

Elements (e.g. pixels) from one segmentation are fully compared with those of another segmentation (i.e. the ground truth).

Ø The human visual system (HVS) is highly adapted to extract structural information from natural scenes.
Ø Human observers may pay different attention to different parts of an image.
Ø Ground truths of the same image therefore present various granularities in the object parts. This makes them rarely identical in the global view, yet highly consistent in the local structures.

SLIDE 8

Motivation

An illustrative example comparing a machine segmentation with human-labeled segmentations.

SLIDE 9

Proposed evaluation framework

SLIDE 10

Composing Reference Segmentations

Seek the labeling l that minimizes the energy:

E(l) = Σ_{g_j} D_{g_j}(l_{g_j}) + Σ_{{g_j, g_j'} ∈ M} u_{{g_j, g_j'}} · T(l_{g_j}, l_{g_j'})

We use l labels, where each label corresponds to one reference segmentation, to compose G*.

10

slide-11
SLIDE 11

Composing Reference Segmentations

Seek the labeling l that minimizes the energy:

E(l) = Σ_{g_j} D_{g_j}(l_{g_j}) + Σ_{{g_j, g_j'} ∈ M} u_{{g_j, g_j'}} · T(l_{g_j}, l_{g_j'})

with the data term, smoothness indicator, and edge weight:

D_{g_j}(l_{g_j}) = d(s_j, g_j)

T(l_{g_j}, l_{g_j'}) = 1 if l_{g_j} ≠ l_{g_j'}, and 0 otherwise

u_{{g_j, g_j'}} = min{ d_{g_j}, d_{g_j'} }

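Given precomputed data costs and edge weights, the energy of a candidate labeling can be evaluated as below. Names and data layout are illustrative assumptions; in the paper the energy is minimized with multi-label graph cuts, not by exhaustive evaluation:

```python
def labeling_energy(labels, data_cost, edges, edge_weight):
    """E(l) = sum_j D_{g_j}(l_{g_j}) + sum_{{g_j, g_j'} in M} u * T(l_{g_j}, l_{g_j'}).

    labels      -- labels[j] is the reference index assigned to pixel j
    data_cost   -- data_cost[j][l] = d(s_j, g_j) under label l
    edges       -- neighbor pairs (j, k), i.e. the set M
    edge_weight -- edge_weight[(j, k)] = min(d_{g_j}, d_{g_k})
    """
    # Data term: cost of each pixel's chosen label.
    energy = sum(data_cost[j][labels[j]] for j in range(len(labels)))
    # Smoothness term: Potts penalty on neighbors with different labels.
    for j, k in edges:
        if labels[j] != labels[k]:          # indicator T(l_j, l_k)
            energy += edge_weight[(j, k)]
    return energy
```

The Potts form of T is what makes alpha-expansion applicable, since each binary move problem is submodular.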
SLIDE 12

Composing Reference Segmentations

Seek the labeling that minimizes the energy:

E(l) = Σ_{g_j} D_{g_j}(l_{g_j}) + Σ_{{g_j, g_j'} ∈ M} u_{{g_j, g_j'}} · T(l_{g_j}, l_{g_j'})

Multi-label graph cuts [Y.Boykov et al. 2001]

SLIDE 13

Composing Reference Segmentations

Ø Localization errors arise from the human labeling process.
Ø Structural similarity index: define a pixel-based distance using the complex Gabor transform coefficients:

d(c_s, c_{s'}) = 1 - H(c_s, c_{s'})

where H is the complex wavelet SSIM (CW-SSIM) on the transform coefficients [M. Sampat et al., 2009]:

H(c_x, c_y) = [ 2 Σ_{i=1}^N |c_{x,i}| |c_{y,i}| / ( Σ_{i=1}^N |c_{x,i}|² + Σ_{i=1}^N |c_{y,i}|² ) ] · [ 2 |Σ_{i=1}^N c_{x,i} c*_{y,i}| / ( 2 Σ_{i=1}^N |c_{x,i} c*_{y,i}| ) ]
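The CW-SSIM distance can be sketched directly from the formula. The coefficient vectors would come from a complex wavelet decomposition (not shown here), and the small stabilizing constant K of the published index is set to 0 in this sketch:

```python
import numpy as np

def cw_ssim(cx, cy):
    """CW-SSIM between two complex coefficient vectors (Sampat et al., 2009):
    a magnitude-consistency factor times a phase-consistency factor."""
    cx = np.asarray(cx, dtype=complex)
    cy = np.asarray(cy, dtype=complex)
    prod = cx * np.conj(cy)
    magnitude = 2 * np.sum(np.abs(cx) * np.abs(cy)) / \
        (np.sum(np.abs(cx) ** 2) + np.sum(np.abs(cy) ** 2))
    # 2|sum prod| / (2 sum |prod|) simplifies to the ratio below.
    phase = np.abs(np.sum(prod)) / np.sum(np.abs(prod))
    return magnitude * phase

def pixel_distance(cs, cs2):
    """d(s, s') = 1 - H(c_s, c_{s'}) from the slide."""
    return 1.0 - cw_ssim(cs, cs2)
```

Equal magnitudes with a consistent phase shift still score near 1, which is why CW-SSIM tolerates the small spatial misalignments of human boundary labels.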

SLIDE 14

Measuring Segmentation Quality

  • Compute the similarity (or distance) between S and the reference G*.

The similarity between S and the composite ground truth G*:

Q_p(S, G*) = (1/N) Σ_{i=1}^{K} Σ_{s_j ∈ R_i} ( 1 - (1/2) d(s_j, g*_j) )

The empirical global confidence of G*:

R = 1 - (1/N) Σ_{s_j} d(s_j, g*_j)
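A sketch of the quality score, assuming Q_p averages the per-pixel similarities 1 - d(s_j, g*_j)/2 over the N pixels (a reconstruction of the slide's formula; the distance map is assumed precomputed):

```python
import numpy as np

def quality_score(dist):
    """Q_p(S, G*) = (1/N) * sum_j (1 - d(s_j, g*_j) / 2), with d in [0, 1]."""
    dist = np.asarray(dist, dtype=float).ravel()
    return float(np.mean(1.0 - 0.5 * dist))
```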

SLIDE 15

Examples of composed references

Q_p,1 = 0.81, Q_p,2 = 0.80, Q_1 = 0.73, Q_2 = 0.69

Q_p,1 = 0.52, Q_p,2 = 0.56, Q_1 = 0.49, Q_2 = 0.49

SLIDE 16

Datasets

Ø Image segmentation dataset

Ø User interface of the developed image segmentation tool.
Ø Livewire

SLIDE 17

Datasets

Ø Image segmentation dataset

Database               BSDS            Our database
# images               500             200
# ground truths/image  4-9             6-15
Image type             Natural images  Natural images
Software supported     yes             yes
# subjects             30              45
Time/segmentation      5-30 min        2-4 min

SLIDE 18

Datasets

Ø Segmentation evaluation dataset

Ø Compare the performance of a pair of segmentations based on a segmentation dataset with human labeled results.
Ø Contains 500 pairs of segmentations and the corresponding evaluation results by human subjects.

  • Seg. algorithms and parameter values:

  EG:   K = {600, 800, 1000, 1400, 1800}
  MS:   hr = {7, 11, 15, 19, 23}, hs = 7, minR = 150
  CTM:  ε = {0.1, 0.2, 0.3, 0.4, 0.5}
  TBES: Nsp = 200, ε = {50, 100, 200, 300, 400}

SLIDE 19

Datasets

Ø Segmentation evaluation dataset

Ø Compare the performance of a pair of segmentations based on a segmentation dataset with human labeled results.
Ø Contains 500 pairs of segmentations and the corresponding evaluation results by human subjects.

  • Seg. pairs: with 10 human subjects, identify the best 3 and the worst 3 segmentations, then randomly select one segmentation from each of the good/bad groups.

Subjective evaluation: 70 subjects with little or no research experience in image segmentation. The 500 pairs of segmentations are evenly divided into 10 groups.

SLIDE 20

Datasets

Ø Segmentation evaluation dataset

Distribution of confidence rates on the proposed segmentation evaluation dataset.

SLIDE 21

Experimental Results

Ø Intel Core 2 Duo 3.00 GHz CPU and 4 GB memory.
Ø Run time:
  24.6 ± 6.0 seconds for composing the reference G*
  10.7 ± 1.1 seconds for computing the score Q_p

  21

slide-22
SLIDE 22

Experimental Results

Ø Sensitivity analysis

Test the effects of the weight parameter and the initial labeling on the final evaluation score.

Alpha-expansion algorithm: break the multi-way cut computation into a sequence of binary s-t cuts.

E(l) = Σ_{g_j} D_{g_j}(l_{g_j}) + Σ_{{g_j, g_j'} ∈ M} u_{{g_j, g_j'}} · T(l_{g_j}, l_{g_j'})

SLIDE 23

Experimental Results

Ø Sensitivity analysis

Fix the weight parameter to values in [500, 1200], with an interval of 50. The initial labeling of the graph cut is set randomly; the mean values and standard deviations of Q_p are then computed.

SLIDE 24

Experimental Results

Ø Sensitivity analysis

Ø Fix the weight parameter to 800.
Ø Carry out the proposed algorithm 50 times with random initialization of the labeling.

SLIDE 25

Evaluation with Meta-Measure

Ø The meta-measure

Ø human labeled segmentation vs. human labeled segmentation of the same image
Ø human labeled segmentation vs. machine segmentations of a different image

SLIDE 26

Evaluation with Meta-Measure

Ø The meta-measure

Ø human labeled segmentation vs. human labeled segmentation of the same image
Ø human labeled segmentation vs. machine segmentations of a different image
Ø the percentage of comparisons that agree with this principle is taken as the meta-measure result

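The meta-measure itself is just an agreement rate. A sketch with hypothetical inputs, where each list holds the scores a measure assigns to the two kinds of comparisons:

```python
def meta_measure(scores_same_image, scores_cross_image):
    """Fraction of (same-image human pair, cross-image human-vs-machine pair)
    comparisons in which the measure ranks the same-image pair higher."""
    agree = sum(1 for a in scores_same_image
                  for b in scores_cross_image if a > b)
    return agree / (len(scores_same_image) * len(scores_cross_image))
```

A measure that always scores same-image human pairs above cross-image pairs attains the maximum value of 1.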
SLIDE 27

Evaluation with meta-measure

Evaluation results with the meta-measure on different measures

Measures          PRI    GCE    VOI    BDE    F-measure  SC(S→G)  SC(G→S)  Q_p
BSDS500           0.911  0.929  0.967  0.921  0.882      0.962    0.956    0.984
Proposed dataset  0.959  0.981  0.991  0.947  0.838      0.974    0.979    0.994

SLIDE 28

Evaluation with proposed segmentation dataset

SLIDE 29

Evaluation with proposed segmentation dataset

Evaluation results by different measures.

SLIDE 30

Evaluation with proposed segmentation dataset

The false evaluation rates with respect to the confidence rate of human subjects

SLIDE 31

Further work

Composed exemplar reference image using a region-based distance:

[1] Features of similarity. [A. Tversky. Psychological Review, 1977]
[2] Region based exemplar references for image segmentation evaluation. [B. Peng et al. SPL, 2016]

d(A, B) = 1 - M(A ∩ B) / M(A ∪ B)

where M(·) is a matching function defined over regions R_j.

Improved measures: Q_PRI, Q_GCE, Q_VOI
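With M taken as plain region cardinality (an assumption; Tversky's matching function admits other choices), the region-based distance reduces to a Jaccard-style measure:

```python
def region_distance(a, b):
    """d(A, B) = 1 - M(A ∩ B) / M(A ∪ B), here with M(R) = |R| (pixel count)."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)
```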

SLIDE 32

Experimental Results

Ø Intel Core 2 Duo 3.00 GHz CPU and 4 GB memory.
Ø Run time: 6.5 ± 4.5 seconds for composing the reference G*
Ø The weight parameter is set by line search within the range [50, 500] for each input segmentation.

SLIDE 33

Experimental Results

Meta-measure results:

Subjective evaluation results:

SLIDE 34

Conclusions

Ø Proposed a framework for evaluating segmentation quality with multiple human labeled segmentations.
Ø A reference segmentation was adaptively constructed.
Ø We presented a segmentation dataset and a segmentation evaluation dataset to facilitate quantitative quality assessment.
Ø Extensive experiments demonstrate the effectiveness of our framework.
