

SLIDE 1

ALPHATECH, Inc.

Performance Metrics for Group-Detection Algorithms

Presented at Interface 2004 May 29, 2004 Jim White jim.white@alphatech.com Sam Steingold Connie Fournelle

SLIDE 2

Introduction

  • What is the group detection problem?
  • Evaluating Group-Detection Algorithms (GDAs) with synthetic data
  • Performance metrics
  • Some performance evaluation results
SLIDE 3

Introduction to Group Detection

Flow: Links → GDA → Putative Groups (with link-quality parameters)

Example: links such as (1) P45, P671, P7; (2) P456, P73; (3) P7, P55, P873, P1356, P561, P3 go in; lists of proteins (the putative groups) come out.

  • Each link is a list of proteins that were observed to be working together or interacting, probably because they belong to a larger group of interacting proteins
  • Groups may be cellular processes, biological subsystems, …
  • Links are noisy fragments of evidence, possibly much smaller than the generating groups

SLIDE 4

Groups and Links

Diagram: Groups 1 through n generate links; orphan entities belong to no group.

  • Entities (proteins)
    • Exchangeable
    • May belong to more than one group
  • Groups (processes, systems)
    • Independent, may overlap
    • Generate links
  • Observed links
    • Either group-generated or clutter
    • Each group-generated link is produced by one of the groups
  • Orphan entities
    • Don’t belong to any group
  • Link-quality parameters
    • PI = prior probability that a link is clutter (independent of the groups)
    • PR = prior probability that an entity in a group-generated link is noise (not in the group)

SLIDE 5

How Links Are Generated

  • With probability PI, the link is clutter: select N entities from the population by uniform random sampling
  • With probability 1 − PI, make a group-generated link: randomly select a group, then randomly sample that group N times
  • Add noise to the group-generated link: each entity in the link has probability PR of being replaced by an entity from outside the group
  • Each link is either clutter or is generated by one of the groups
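The generative model above can be sketched in a few lines of Python. The function name and data layout (groups and the population as lists of entity ids) are illustrative assumptions, not part of the original slides.

```python
import random

def generate_link(groups, population, link_size, PI, PR):
    """Sketch of the link-generation model (names are illustrative).

    groups: list of lists of entity ids; population: list of all entity ids.
    PI: prior probability that a link is clutter.
    PR: probability that an entity in a group-generated link is noise.
    """
    if random.random() < PI:
        # Clutter link: N entities drawn uniformly from the population.
        return random.sample(population, link_size)
    # Group-generated link: pick a group, then sample it N times.
    group = random.choice(groups)
    link = [random.choice(group) for _ in range(link_size)]
    # Noise: each entity may be replaced by one from outside the group.
    outside = [e for e in population if e not in set(group)]
    return [random.choice(outside) if random.random() < PR else e
            for e in link]
```

Setting PI = PR = 0.1 reproduces the 10% clutter / 10% noise configuration used in the evaluations later in the deck.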

SLIDE 6

Evaluating GDAs with Synthetic Data

Flow: Synthetic Groups → Noisy Synthetic Links → GDA (the group-detection system under test) → GDA Outputs → Statistical Analysis → System Performance Metrics

System performance depends on both the GDA and the information content of the links

SLIDE 7

Testing with Synthetic Data Can Answer Important Questions

  • How many links are needed?
  • Is link size critical?
  • How sensitive is performance to noise and clutter?
  • How does performance vary with the number of groups and group size?
  • What are typical scenarios in which the GDA does very well?
  • What are problem scenarios in which the GDA underperforms?
  • Testing with synthetic data provides a rational basis for planning follow-up tests with real data

SLIDE 8

Performance Metrics

SLIDE 9

Input-Output Model for Analyzing Detection System Performance

Diagram: input x (actual group membership) → Detection System (link data & GDA) → output y (detector output), with P(y,x) = P(y|x)P(x)

Input SNR = P(x=1)/P(x=0)
Output SNR = P(x=1|y=1)/P(x=0|y=1)

Indicator variables (x,y) for membership of a generic entity in a generic group:

x = 1 if the entity actually belongs to the group, x = 0 otherwise
y = 1 if the detector assigns the entity to the group, y = 0 otherwise

Four probabilities characterize detection performance (the joint distribution):

P(x,y):   P(x=0,y=0)   P(x=0,y=1)
          P(x=1,y=0)   P(x=1,y=1)

SLIDE 10

Performance in a 3-D World

The four probabilities in P(x,y) sum to 1, so detection performance lives in a 3-D world. A convenient parameterization:

Pg = P(x=1), the prior probability of an entity belonging to the group (group prevalence)
Fn = P(y=0|x=1), the false-negative rate (miss rate)
Fp = P(y=1|x=0), the false-positive rate (false-alarm rate)

Classical performance metrics are functions of Fn, Fp, and Pg:

  • Error rate: Pe = P(x ≠ y) = Pg Fn + (1 − Pg) Fp
  • Detection probability: Tp = P(y=1|x=1) = 1 − Fn (recall, sensitivity)
  • Positive predictive value: PV+ = P(x=1|y=1) (precision)
  • Negative predictive value: PV− = P(x=0|y=0)
  • Bayes factor: G1 = posterior odds favoring x=1 divided by prior odds favoring x=1
  • Signal-to-noise ratios: SNRout = G1 · SNRin, where SNRin = P(x=1)/P(x=0)
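All of these metrics follow from the joint distribution built out of Pg, Fn, and Fp. A minimal sketch (the function name and returned keys are illustrative):

```python
def detection_metrics(Pg, Fn, Fp):
    """Classical detection metrics as functions of Pg, Fn, and Fp.

    Pg: group prevalence P(x=1); Fn: miss rate P(y=0|x=1);
    Fp: false-alarm rate P(y=1|x=0). Assumes 0 < Pg < 1 and Fp > 0.
    """
    # The four cells of the joint distribution P(x, y).
    p01 = (1 - Pg) * Fp        # false alarms
    p10 = Pg * Fn              # misses
    p11 = Pg * (1 - Fn)        # detections
    p00 = (1 - Pg) * (1 - Fp)  # correct rejections
    Pe = p10 + p01             # error rate: Pg*Fn + (1-Pg)*Fp
    Tp = 1 - Fn                # detection probability (recall)
    PVp = p11 / (p11 + p01)    # positive predictive value (precision)
    PVn = p00 / (p00 + p10)    # negative predictive value
    SNRin = Pg / (1 - Pg)      # prior odds favoring x=1
    SNRout = p11 / p01         # posterior odds given y=1
    G1 = SNRout / SNRin        # Bayes factor
    return dict(Pe=Pe, Tp=Tp, PVp=PVp, PVn=PVn,
                SNRin=SNRin, SNRout=SNRout, G1=G1)
```

For example, `detection_metrics(0.1, 1e-4, 1e-4)` gives a precision near 0.999, matching the large-group case in the comparison later in the deck.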

SLIDE 11

Proficiency Metric

Avoids a Limitation of Classical Metrics

  • No single classical metric is sensitive to both Fn and Fp as the input SNR goes to 0
  • The analyst must consider two metrics simultaneously to measure performance
    • ROC curves
    • Fn and Fp
    • Precision and recall
  • Juggling two metrics complicates algorithm optimization and the interpretation of detection performance
  • The usual focus on error rate can be very misleading
  • The Proficiency metric from information theory is never blind
    • Proficiency = I(x,y) / H(x)
    • I(x,y) = the amount of information about x that is provided by y (the mutual information)
    • H(x) = the amount of information about x that is required to achieve ideal error-free performance (the entropy of x)
    • 0 ≤ Proficiency ≤ 1
  • Deficiency is defined as 1 − Proficiency
SLIDE 12

Definitions of I(X,Y) and H(X)

  • Mutual information of the joint distribution P(x,y):

I(X,Y) = Σx Σy P(x,y) log { P(x,y) / [ P(x) P(y) ] }

  • Entropy of the marginal distribution P(x):

H(X) = − Σx P(x) log P(x)
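These definitions make Proficiency straightforward to compute for the binary detection model. A sketch, assuming the (Pg, Fn, Fp) parameterization from the earlier slides and an illustrative function name; log base 2 is used, but the ratio is base-independent:

```python
import math

def proficiency(Pg, Fn, Fp):
    """Proficiency = I(X;Y) / H(X) for a binary detector.

    Pg: group prevalence P(x=1); Fn: miss rate; Fp: false-alarm rate.
    """
    # Joint distribution P(x, y) from the parameterization.
    joint = {(0, 0): (1 - Pg) * (1 - Fp), (0, 1): (1 - Pg) * Fp,
             (1, 0): Pg * Fn, (1, 1): Pg * (1 - Fn)}
    px = {0: 1 - Pg, 1: Pg}
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    # I(X;Y) = sum_xy P(x,y) log[ P(x,y) / (P(x) P(y)) ]
    I = sum(p * math.log2(p / (px[x] * py[y]))
            for (x, y), p in joint.items() if p > 0)
    # H(X) = -sum_x P(x) log P(x)
    H = -sum(p * math.log2(p) for p in px.values() if p > 0)
    return I / H
```

A perfect detector (Fn = Fp = 0) scores 1; a detector whose output is statistically independent of x scores 0.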

SLIDE 13

Proficiencies and ROC Curves

Figure: ROC curves (recall vs log10 false-positive rate), each at constant proficiency; P(x=1) = 0.002, with the precision = 0.5 contour marked

  • Each curve has constant proficiency
  • Red box
    • Recall > 0.75
    • Precision > 0.5
    • Contains the operating points such that Proficiency > 0.56

SLIDE 14

Comparing Deficiency with Error Rate

  • A detection system is looking for two groups: a large one and a small one
  • The group sizes are 1/10 and 1/10,000 of the population
  • The detection system has low error rates: Fn = Fp = 1/10,000
  • The Deficiency metric shows that the smaller group is harder to find, while the error rate is the same for both
    • Deficiencies: 0.0026 vs 0.136
    • Error rates: 0.0001 vs 0.0001 (insensitive to changes in group prevalence)
  • The Precision metric (and the output SNR) track the performance difference in this case
    • Precision: 0.999 vs 0.5
    • Output SNR: 999 vs 1
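The numbers above can be checked by plugging Fn = Fp = 1/10,000 into the earlier definitions. The sketch below (helper names are illustrative) computes deficiency through the identity I(X;Y) = H(X) − H(X|Y), so Deficiency = H(X|Y) / H(X):

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def err_prec_def(Pg, Fn=1e-4, Fp=1e-4):
    """Error rate, precision, and deficiency for a group of prevalence Pg."""
    p01 = (1 - Pg) * Fp        # false alarms
    p10 = Pg * Fn              # misses
    p11 = Pg * (1 - Fn)        # detections
    p00 = (1 - Pg) * (1 - Fp)  # correct rejections
    py1 = p01 + p11            # P(y=1)
    err = p01 + p10
    precision = p11 / py1
    # Deficiency = H(X|Y) / H(X), i.e. 1 - I(X;Y)/H(X).
    H_x = entropy([Pg, 1 - Pg])
    H_x_given_y = (py1 * entropy([p11 / py1, p01 / py1]) +
                   (1 - py1) * entropy([p10 / (1 - py1), p00 / (1 - py1)]))
    return err, precision, H_x_given_y / H_x
```

Both groups show the same 10⁻⁴ error rate, but deficiencies of roughly 0.0026 (large group) versus 0.136 (small group), matching the slide.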
SLIDE 15

Example of Performance Evaluation Using Synthetic Data

SLIDE 16

Sensitivity to Number of Experiments

Using CMU’s K-Groups Algorithm (J. Schneider & A. Moore)

  • Synthetic universe
    • 10,000 proteins
    • 5 groups, each containing 20 proteins
  • Synthetic links
    • Link = proteins observed to interact or work together
    • Size = 2, 4, or 6 proteins
    • One link per experiment
  • Link quality
    • 10% clutter links
    • 10% noise in group links
  • Evaluation objective
    • Determine proficiency vs number of experiments
  • Google autonlab to get the k-groups software
  • Unsupervised detection

Figure: Group Proficiency (%) and Orphan Proficiency (%) vs number of experiments (2 to 512); P(x=1) = 0.002

SLIDE 17

More Noise or Clutter

Figure: Group Proficiency (%) and Orphan Proficiency (%) vs number of experiments (2 to 512), with PI = 0.2 (twice the clutter) and, separately, PR = 0.2 (twice the noise)

SLIDE 18

Less Noise or Clutter

Figure: Group Proficiency (%) and Orphan Proficiency (%) vs number of experiments (2 to 512), with PI = 0.05 (half the clutter) and, separately, PR = 0.05 (half the noise)

SLIDE 19

Summary

  • Group detection
    • Is distinct from clustering
    • Looks for small groups of interacting entities in large populations
  • The Proficiency metric
    • Is a rigorous information-theoretic performance measure
    • Is much safer than using just error rate or accuracy
    • May be used when tuning the parameters of machine-learning algorithms that use supervised learning
    • Simplifies the interpretation of performance evaluations based on synthetic or labeled real data

SLIDE 20

Appendix

SLIDE 21

Proficiency and Area Under ROC Curve

Figure: area under the ROC curve plotted against Proficiency, for P(x=1) = 0.1

SLIDE 22

Finding scientific teams doing research on aerosols

Using CMU’s K-Groups Algorithm (J. Schneider & A. Moore)

  • Synthetic links
    • Authors of 504 research papers published over the last 3 years
  • Synthetic universe
    • 10,000 scientists, engineers, and mathematicians
  • Link quality
    • 10% clutter (links that were not generated by a single aerosol research team)
    • 10% noise (percentage of authors that were actually not on the research team that wrote the paper)
  • Synthetic ground truth
    • Twenty research teams
  • Test objective
    • Determine proficiency vs number of groups to find (an input to K-Groups)
  • Underestimating the number of research teams is worse than overestimating it

Figure: Group Proficiency (%) and Orphan Proficiency (%) vs number of groups to find (10, 15, 20, 30, 40)

SLIDE 23

Proficiency and the ROC

P(x=1) is 0.1

  • Proficiency → ROC curve
    • Each ROC curve corresponds to a different value of proficiency
    • The mapping depends on the group prevalence P(x=1)
  • Red box
    • Recall > 75%
    • Precision > 50%

Figure: true-positive rate vs false-positive rate for the constant-proficiency ROC curves