ALPHATECH, Inc.
Performance Metrics for Group-Detection Algorithms
Presented at Interface 2004, May 29, 2004
Jim White (jim.white@alphatech.com), Sam Steingold, Connie Fournelle
[Diagram: Links and link-quality parameters → GDA → Putative Groups]
Links:
1. P45, P671, P7
2. P456, P73
3. P7, P55, P873, P1356, P561, P3
…
Putative Groups:
1. List of proteins
2. List of proteins
3. List of proteins
…
interacting, probably because they belong to a larger group of interacting proteins
generating groups
Group 1 Group 2 Group 3 Group 4 Group n
(independent of the groups)
group-generated link is noise (not in the group)
Link-generation model (flowchart):
PI = Prob. link is independent of groups
With probability PI, make a clutter link: select N entities from the population by uniform random sampling.
With probability 1 - PI, make a group-generated link: randomly select a group, then randomly sample that group N times.
Add noise to link: each entity in link has probability PR … by an entity from …
Output: group-generated link
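The link-generation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `generate_link` and its parameter names are hypothetical, sampling is done with replacement, and the noise step is assumed to replace an entity with one drawn uniformly from the whole population (the slide text for that step is incomplete).

```python
import random

def generate_link(groups, population, n, p_clutter, p_noise, rng):
    """Generate one synthetic link of n entities.

    p_clutter plays the role of PI and p_noise the role of PR.
    Returns (link, group_index); group_index is None for a clutter link.
    """
    if rng.random() < p_clutter:
        # Clutter link: n entities drawn uniformly from the population.
        return [rng.choice(population) for _ in range(n)], None
    # Group-generated link: pick a group, then sample that group n times.
    g = rng.randrange(len(groups))
    link = [rng.choice(groups[g]) for _ in range(n)]
    # Noise (assumed form): with probability p_noise, replace an entity
    # by one drawn uniformly from the population.
    link = [rng.choice(population) if rng.random() < p_noise else e
            for e in link]
    return link, g

# Usage with made-up sizes: 100 entities, two groups of 10.
rng = random.Random(0)
population = list(range(100))
groups = [list(range(0, 10)), list(range(10, 20))]
links = [generate_link(groups, population, 4, 0.1, 0.1, rng)
         for _ in range(1000)]
```

Returning the generating group index alongside the link keeps the ground truth available for scoring a detector later.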
[Diagram blocks: Synthetic Groups; Noisy Synthetic Links; GDA (Group Detection System Under Test); GDA Outputs; Statistical Analysis; System Performance Metrics]
Detection System
[Diagram: x (Actual Group Membership) → Link data & GDA → y (Detector Output)]
x = 1 if entity actually belongs to the group, x = 0 otherwise
y = 1 if detector assigns the entity to the group, y = 0 otherwise
P(y,x) = P(y|x) P(x)
P(x,y) = [ P(x=0,y=0)  P(x=0,y=1) ; P(x=1,y=0)  P(x=1,y=1) ]  (rows indexed by x, columns by y)
Input SNR = P(x=1)/P(x=0)
Output SNR = P(x=1|y=1)/P(x=0|y=1)
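As an illustration of these definitions, the joint distribution P(x,y) and the two signal-to-noise ratios can be estimated from paired 0/1 labels. The sketch below is an assumption-laden example rather than the authors' code: the function names are hypothetical, the labels are made up, and the probabilities are plain relative frequencies.

```python
def joint_distribution(x, y):
    """Estimate the 2x2 matrix P[x][y] from paired 0/1 labels."""
    counts = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    total = len(x)
    return [[c / total for c in row] for row in counts]

def snr_in(P):
    """Input SNR = P(x=1) / P(x=0)."""
    return (P[1][0] + P[1][1]) / (P[0][0] + P[0][1])

def snr_out(P):
    """Output SNR = P(x=1|y=1) / P(x=0|y=1) = P(x=1,y=1) / P(x=0,y=1)."""
    return P[1][1] / P[0][1]

# Made-up example: 10 entities, 3 truly in the group, 3 flagged.
x = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
P = joint_distribution(x, y)
```

The output SNR is undefined when the detector produces no false positives in the sample (P(x=0,y=1) = 0); a real evaluation would need to handle that case.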
Pg = P(x=1): prior probability of an entity belonging to the group (group prevalence)
Fn = P(y=0|x=1): false-negative rate (miss rate)
Fp = P(y=1|x=0): false-positive rate (false-alarm rate)
Error rate: Pe = P(x ≠ y) = Pg Fn + (1 - Pg) Fp
Detection probability: Tp = P(y=1|x=1) = 1 - Fn (recall metric, sensitivity)
Positive predictive value: PV+ = P(x=1|y=1) (precision metric)
Negative predictive value: PV- = P(x=0|y=0)
Bayes factor: G1 = posterior odds favoring x=1 divided by prior odds favoring x=1
Signal-to-noise ratios: SNRout = G1 SNRin, where SNRin = P(x=1)/P(x=0)
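All of the metrics above follow from the three numbers Pg, Fn and Fp by Bayes' rule. The following sketch shows one way to compute them; the function name `detection_metrics` and the dictionary keys are illustrative choices, and the inputs are assumed to lie strictly between 0 and 1 so that no denominator vanishes.

```python
def detection_metrics(pg, fn, fp):
    """Derive the detection metrics from prevalence and error rates."""
    tp = 1.0 - fn                        # detection probability (recall)
    p_y1 = pg * tp + (1.0 - pg) * fp     # P(y=1)
    p_y0 = 1.0 - p_y1                    # P(y=0)
    pe = pg * fn + (1.0 - pg) * fp       # error rate P(x != y)
    ppv = pg * tp / p_y1                 # PV+ = P(x=1|y=1), precision
    npv = (1.0 - pg) * (1.0 - fp) / p_y0 # PV- = P(x=0|y=0)
    snr_in = pg / (1.0 - pg)             # prior odds favoring x=1
    snr_out = ppv / (1.0 - ppv)          # posterior odds given y=1
    g1 = snr_out / snr_in                # Bayes factor
    return {"Tp": tp, "Pe": pe, "PV+": ppv, "PV-": npv,
            "SNRin": snr_in, "SNRout": snr_out, "G1": g1}

# Made-up operating point at a prevalence of 0.002.
m = detection_metrics(pg=0.002, fn=0.2, fp=0.0016)
```

Because posterior odds equal the likelihood ratio times prior odds, G1 computed this way agrees with Tp/Fp, which gives a quick consistency check on any implementation.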
detection performance
performance (the entropy of x)
I(X,Y) = Σ_x Σ_y P(x,y) log( P(x,y) / [ P(x) P(y) ] )
H(X) = - Σ_x P(x) log P(x)
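The two formulas can be evaluated directly on a 2x2 joint distribution. The sketch below also computes a proficiency score under the assumption that proficiency is the mutual information normalized by the entropy of x, which is what the surrounding slides suggest but do not state in full. Function names are hypothetical, and base-2 logarithms are an arbitrary choice (the ratio does not depend on the base).

```python
import math

def entropy(p):
    """H = -sum p log2 p, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(P):
    """I(X,Y) for a joint matrix P[x][y]."""
    px = [sum(row) for row in P]          # marginal over y
    py = [sum(col) for col in zip(*P)]    # marginal over x
    total = 0.0
    for i, row in enumerate(P):
        for j, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

def proficiency(P):
    """Assumed definition: I(X,Y) / H(X)."""
    px = [sum(row) for row in P]
    return mutual_information(P) / entropy(px)

# A perfect detector scores 1; an independent (useless) one scores 0.
perfect = [[0.9, 0.0], [0.0, 0.1]]
useless = [[0.81, 0.09], [0.09, 0.01]]
```

Under this normalization the score is bounded between 0 and 1 regardless of prevalence, which is what makes it usable for comparing detectors across data sets with different P(x=1).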
[Figure: ROC curves for different proficiencies. Axes: Log10(False-Positive Rate) and Recall; curves labeled by proficiency (%).]
constant proficiency
points such that Proficiency > 0.56
P(x=1) is 0.002 Precision = 0.5
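To see what a precision of 0.5 demands at this prevalence, Bayes' rule can be applied to the earlier definitions (Pg, Tp, Fp, PV+). The short derivation below is a worked illustration using only those symbols; the numerical value is rounded.

```latex
\mathrm{PV}^{+} = \frac{P_g\,T_p}{P_g\,T_p + (1 - P_g)\,F_p}
\quad\Longrightarrow\quad
\mathrm{PV}^{+} = 0.5 \iff F_p = \frac{P_g}{1 - P_g}\,T_p
= \frac{0.002}{0.998}\,T_p \approx 0.002\,T_p
```

In other words, at P(x=1) = 0.002 the false-positive rate must be roughly 500 times smaller than the recall before half of the flagged entities are true group members.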
Rate is the same for both
case
proteins
interact or work together
experiments
groups software
[Figure: Group Proficiency (%) and Orphan Proficiency (%), each on a 10 to 100 scale; labels 2, 4, 8, 16, 32, 64, 128, 256, 512.]
P(x=1) is 0.002
[Figure, two panels, each showing Group Proficiency (%) and Orphan Proficiency (%) on a 10 to 100 scale; labels 2, 4, 8, 16, 32, 64, 128, 256, 512.]
Panel: PI = 0.2 (twice the clutter)
Panel: PR = 0.2 (twice the noise)
[Figure, two panels, each showing Group Proficiency (%) and Orphan Proficiency (%) on a 10 to 100 scale; labels 2, 4, 8, 16, 32, 64, 128, 256, 512.]
Panel: PI = 0.05 (half the clutter)
Panel: PR = 0.05 (half the noise)
supervised learning
labeled real data
P(x=1) is 0.1
Proficiency
published over last 3 years
mathematicians
The links that were not generated by a single aerosol research team
Percentage of authors that were actually not on the research team that wrote the paper
find (an input to K-Groups)
[Figure: Group Proficiency (%) and Orphan Proficiency (%), each on a 10 to 100 scale; labels 10, 15, 20, 30, 40.]
P(x=1) is 0.1
curve
corresponds to a different value of proficiency
prevalence P(x=1)
True positive rate