Methods for biomarker identification and model interrogation and statistical approaches for model comparisons
Lee Lancashire
Bioinformatics Group Leader: Compandia Ltd.
Visiting Scholar: Nottingham Trent University
Stratified Medicine: Diagnostic, Prognostic and Predictive Biomarkers in Clinical Practice
University of Birmingham, 30th June 2010
Outline
- Introduction to biomarkers and the problems associated with their identification
- Outline of current solutions being developed by Compandia to overcome these issues
– Case study 1
- Introduction to statistical approaches for comparing diagnostic models
– Case study 2
Uses of biomarkers
- Early detection screening
- Diagnosis
- Outcome risk: prognostic
- Treatment selection: predictive
Classification using biomarkers
- Binary classification
– (Instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn)
– yi is {0,1}‐valued
– Classifier: provides a class prediction Ŷ for an instance
- Outcomes for a prediction:
                     True class 1           True class 0
Predicted class 1    True positive (TP)     False positive (FP)
Predicted class 0    False negative (FN)    True negative (TN)
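The four prediction outcomes can be counted directly from paired true and predicted labels. The following is a minimal Python sketch, assuming labels are coded 0/1 as in the slide; the function names `confusion_counts` and `sensitivity_specificity` and the example labels are illustrative and not taken from the talk.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels coded as 0/1."""
    # Each key is a (predicted class, true class) pair.
    counts = Counter(zip(y_pred, y_true))
    return {
        "TP": counts[(1, 1)],  # predicted 1, truly 1
        "FP": counts[(1, 0)],  # predicted 1, truly 0
        "FN": counts[(0, 1)],  # predicted 0, truly 1
        "TN": counts[(0, 0)],  # predicted 0, truly 0
    }


def sensitivity_specificity(c):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = c["TP"] / (c["TP"] + c["FN"])
    specificity = c["TN"] / (c["TN"] + c["FP"])
    return sensitivity, specificity


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(sensitivity_specificity(counts))
```

The sketch assumes at least one truly positive and one truly negative case; otherwise the ratios are undefined and the division fails.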
Problems with biomarker identification
- Dimensionality
– Particularly in genomic and proteomic studies
– Thousands of genes, proteins or peptides representing the profile of an individual
- Complexity
– Genes and proteins relate to phenotype with non‐linear relationships
Biomarker Distiller
- An advanced algorithm based on artificial neural networks (ANNs).
– Predicts classes or continuous variables.
– Models the outcome of the question being asked, e.g. responder or non‐responder, patient or control.
– Can cope with the noise, complexity and non‐linearity found in biological data.
- Comprehensive and robust data‐mining.
– For a typical gene array dataset, searches through 50 million model combinations for an optimum solution
– Every model developed is optimised for performance on an unseen data set.
- Models predict well for new blind cases.
– Provide decision tools that are applicable to all cases that could present
- Finds an optimised solution.
– E.g. 9 genes compared with 70+ genes (comparison with other, recursive methods)
- We can gain information on a system by interrogation of this optimised model.
– Assess performance measures, e.g. ROC curves, sensitivity and specificity
– Ranking of cases and population structure
– A probability visualisation for all cases
– Response curves and surfaces for each parameter in the model
– Performance and probabilities for any new or blind cases available
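To make the ROC performance measure concrete, the sketch below computes ROC points and the area under the curve (AUC) from per-case predicted probabilities. It is a generic illustration in plain Python and not the Biomarker Distiller implementation; the function names and example scores are assumptions made for this example.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a case is called positive if score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(roc_points(y_true, scores))
print(roc_auc(y_true, scores))
```

Sorting cases by score also gives the ranking of cases mentioned above. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.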