Harnessing Metabolomics for Biomarker Discovery in Colon Cancer: - - PowerPoint PPT Presentation
Harnessing Metabolomics for Biomarker Discovery in Colon Cancer: Opportunities and Challenges
Daniel Raftery
UW Medicine Northwest Metabolomics Research Center
Acknowledgements
Current and Past Members
Nagana Gowda Danijel Djukovic Lisa Bettcher Natalie Nguyen Hayley Purcell Vadim Pascua Fausto Carnevale-Neto
Collaborators
Min Zhang, Purdue Dabao Zhang, Purdue Chen Chen, Purdue Bruce Clurman, FH Gabi Chiorean, SCCA Jiyong Dong, Xiamen
https://nwwashington.edu
Rob Pepin Jianjiang Zhu Xinyu Zhang Haiwei Gu Leela Paudel Lingli Deng Ping Zhang Wentao Zhu Dan Du Renke Zhang Dongfang Wang Qiang Fei
Metabolomics in Context
Biological Systems
Genes Metabolites Metabolic Profiling Proteomics Genomics Proteins Advanced Analytical Techniques Stimuli
Genotype + Environment --> Phenotype
- Analysis of small molecules in bio-systems
~20,000 aqueous & 200,000+ lipid metabolites
Endogenous metabolites
Exogenous metabolites
- Applications in Metabolomics
Disease Diagnostics, Companion (Drug) Diagnostics, Toxicology, Food and Nutrition, Drug Discovery, Personalized Medicine, Systems Biology Research
- G. A. N. Gowda, S. Zhang, H. Gu, V. Asiago, N. Shanaiah and D. Raftery, "Metabolomics-Based Methods for Early Disease Diagnostics: A Review,"
Expert Rev. Mol. Diagnos., 8:617-33 (2008)
Metabolomics
Time Sick Regulation Healthy
Discovery of Biomarkers
Understanding Systems Biology BioMarkers
Brief History
2000 BC: Chinese/Greek apocryphal story of ants and urine
1800-1900: Identification of various metabolites
1930-1950s: Metabolite pathways identified
1950-1960s: MS and NMR development
1960s: First “metabolomics” studies
1970s: LC and chemometrics development
1980s: LC-MS and high field NMR development
1998-99: Metabonomics and metabolomics coined
2000s: Development of statistical methods and databases
Field is expanding rapidly (>1000 papers/year)
Metabolism
Metabolism is:
Complex
Interconnected
Influenced by genetics & environment (food, stresses including illness)
Affects upstream biology (gene expression, epigenetics, protein function)
Metabolic Maps
Metabolomics Methods and Applications
Biomarkers
Biospecimens
Figure: ROC curve (sensitivity vs. 1-specificity)
Mechanistic Studies: Tracing Altered Pathways
Statistical Modeling
Identifying Drug Targets & Translation
Unknown Identification
MS or NMR Metabolite Detection
Early Disease Detection
Animal Models, Human Population, Cell Lines
Oxaloacetate Citrate Isocitrate 2-Oxo-glutarate Succinyl-CoA Succinate Fumarate Malate TCA CYCLE
Gowda & Raftery
- J. Magn. Reson. 2015
Analysis of complex biological samples/systems: 1000’s of small molecules
SysBio
Cancer Related Metabolomic Findings
New findings link a genetic defect with metabolic up-regulation of a metabolite linked with brain cancer.
Sarcosine found as a strong tissue marker of PC aggressiveness.
Lipid Panel for Alzheimer’s Disease Prediction
10 lipids found to predict AD with 90% accuracy
Biomarker Translation Process
Gowda and Raftery, Current Metabolomics, 2013
Metabolomics Related Companies
The Metabolome and Its Measure
Human metabolome: 20,000 aqueous + 200,000 lipid metabolites
Global Profiling: >2000 aq. metabolites
Targeted Profiling: 20-300 aq. metabolites
Quantitative: 10-70 aq. metabolites
Flux
Metabolome = small molecules <1500 Da
Dark Metabolome
Quantitative Lipidomics: ~1200 lipids, 13 classes
GOT-MS
Assays at NW-MRC
SOPs
Assay | Metabolites
Targeted Aqueous Assay | >300 aqueous metabolites from 60 pathways
Global profiling | ~2000 MS features, ~400 metabolites
Quantitative lipid profiling | Up to 1100 lipids from 13 classes
Flux analysis | TCA, glycolysis, amino acids, fatty acids, etc.
Bile acid analysis | 55 bile acids
Tryptophan pathway analysis | 25 Trp metabolites
Carnitine analysis | ~40 carnitines
Cardiolipin assay | ~20 cardiolipins
Oxylipin assay | ~100 signaling lipids
Co-enzyme analysis | 7 co-enzymes
In situ monitoring | Time resolved kinetics
… …
Major Metabolomics Tools
NMR: amino acids, organic acids, some amines, glucose, lipid classes. Detected molecules: 30-100
LC-MS: amino acids, amines, fatty acids, nucleosides, lipids, carbohydrates, etc. Detected molecules: ~2000 (500 ID’d)
GC-MS: organic acids, aldehydes, ketones, other volatiles, fatty acids, amino acids, steroids. Detected molecules: ~300 (150 ID’d)
Positive Negative
Targeted LC-MS Analysis
- >350 metabolite identities verified by standards
- Covers 60 KEGG pathways
- Hydrophilic interaction chromatography (HILIC) column
- Two columns in parallel for high throughput analysis
- All metabolites verified with standards
- Multiple-reaction-monitoring (MRM) mode
- Throughput: 30 study samples per day
- CV ~6-8%
Quantitative Lipidomics
New quantitative platform targets over 1100 lipids from 13 lipid classes
Measures 700-900 lipids in blood with absolute concentrations:
triacylglycerols (TAG), diacylglycerols (DAG), cholesterol esters (CE), free fatty acids (FFA), phosphatidylcholines (PC), phosphatidylethanolamines (PE), lysophosphatidylcholines (LPC), lysophosphatidylethanolamines (LPE), sphingomyelins (SM), ceramides (CER, DCER, LCER, HCER)
Performance:
Reproducibility: ~5% within batch
Accuracy: ~10%
Figure: total ion current (TIC) chromatogram and extracted ion chromatogram (EIC) for compound 2278 (-ESI, m/z 488.8730, retention time 12.652 min, Frag=120.0V); counts vs. acquisition time (min)
Human urine, ESI+, LC-QTOF-MS, m/z 60-1000, Agilent 6520, >2200 compounds
Global Profiling
How do you characterize these data?
1) Number of features
2) Number of compounds
3) Average CV
4) Number of metabolite IDs
Known Metabolome
Major Challenge:
Tools for Unknown Identification
Metabolic annotation of ten urine samples analyzed by HILIC-(-)-ESI-MS/MS (DDA), according to “direct parent level” of the ClassyFire ranking system.
Molecular Networking from UC San Diego
Global mass spectral molecular network of urine samples acquired in ESI positive mode. Molecular network colored by putative chemical “superclass level” retrieved through the MolNetEnhancer workflow and ClassyFire.
ClassyFire from UC Davis
Data Quality Metrics for MS Global Profiling
Despite successes and wide usage, global profiling has had trouble with reproducibility and data quality
There are some new efforts to develop standard reference materials (SRMs) at NIST for example.
But better measures of data quality are needed.
Towards that end we’re working on a set of 5 Data Quality Metrics (5 Easy Metrics).
Experiment:
50 replicates of a pooled human serum sample
protein precipitated
Run on 2 Agilent QTOF instruments (6520 and 6545)
ESI+ only
Processed using Progenesis QI
Profile and Centroid data acquired
Feature/compound defined as having a minimum of 2 ions
Goal is to help define a set of consensus measures
Xinyu Zhang Jiyang Dong
5 Easy Metrics of Data Quality
1) TIC reproducibility (figure: TIC (A.U.) vs. retention time (min), ESI (+), with gradient B% overlaid)
2) Missing values
3) CV vs intensity
4) Metabolites detected (figure: compound numbers for 6545(P), 6545(C), 6520(P), 6520(C); compounds detected with two or more ions, m/z identified by HMDB)
5) ICC vs intensity
Useful for comparing and optimizing analyses as well as documenting/publishing
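Two of the metrics above, missing values and CV, can be computed per feature from the replicate injections. The sketch below is illustrative only: the feature names and intensities are hypothetical, a missing value is represented as None, and it is not the Progenesis QI workflow used in the experiment.

```python
import math
import statistics

def missing_fraction(values):
    """Fraction of replicates in which the feature was not detected (None)."""
    return sum(v is None for v in values) / len(values)

def cv_percent(values):
    """Coefficient of variation (%) over the replicates where the feature was detected."""
    detected = [v for v in values if v is not None]
    if len(detected) < 2:
        return math.nan
    return 100.0 * statistics.stdev(detected) / statistics.mean(detected)

# Hypothetical intensities for three features over five replicate injections.
replicates = {
    "feature_A": [100.0, 102.0, 98.0, 101.0, 99.0],
    "feature_B": [10.0, None, 14.0, 9.0, None],
    "feature_C": [5000.0, 5100.0, 4900.0, 5050.0, 4950.0],
}

for name, values in replicates.items():
    print(name, round(missing_fraction(values), 2), round(cv_percent(values), 1))
```

Plotting cv_percent against mean intensity for every feature gives the "CV vs intensity" view.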
Typical Metabolomics Data Analysis Workflow
1,000,000 data points
2,000 features
300 identified metabolites
10-50 statistically different metabolites
Instrument manufacturer or 3rd party software
Library of compound spectra
Statistical methods: Feature selection
Statistical model for validation
Figure: score plot with axes GCGCMS PC1, NMR PC1 and NMR PC2; groups: Cancer, Control
Statistical methods: Model building and testing
Figure: MS or NMR data (global profiling data); extracted ion chromatogram for compound 2278 (-ESI, m/z 488.8730, retention time 12.652 min), counts vs. acquisition time (min)
Supervised Multivariate Statistics: PLS-DA
bi: regression coefficients for variables X1 through Xp
PLS-DA is used to fit a model between the spectral data and the class information:
Y = b0 + b1X1 + b2X2 + ... + bpXp
Class variables: “Control = 0” and “Case = 1”
Test Set of Samples Disease Control Training set samples Statistical model Training Set of Samples Disease Control
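As a sketch of how the linear model Y = b0 + b1X1 + ... + bpXp can be fitted and used for classification, the code below implements PLS regression with a single latent variable in plain Python and assigns a class by thresholding the predicted Y at 0.5. The data, the single component and the 0.5 cut-off are assumptions for illustration; real studies use several latent variables and a dedicated package.

```python
def fit_pls1(X, y):
    """One-latent-variable PLS regression; returns (b0, [b1..bp])."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # mean-centred X
    yc = [v - y_mean for v in y]                                 # mean-centred Y
    # Weight vector: direction in X-space with maximal covariance with Y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress Y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    b = [q * wj for wj in w]  # with one component, b = w * q
    b0 = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return b0, b

def predict_class(b0, b, x, threshold=0.5):
    """Control = 0, Case = 1, decided by thresholding the predicted Y."""
    y_hat = b0 + sum(bj * xj for bj, xj in zip(b, x))
    return 1 if y_hat >= threshold else 0

# Hypothetical training set: 3 controls (0) and 3 cases (1), 3 variables each.
X_train = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [0.9, 5.1, 1.9],
           [2.0, 5.0, 2.0], [2.2, 4.9, 2.1], [1.9, 5.2, 1.8]]
y_train = [0, 0, 0, 1, 1, 1]

b0, b = fit_pls1(X_train, y_train)
print([predict_class(b0, b, x) for x in X_train])
```

The fitted coefficients are then applied unchanged to the test set of samples, which is what keeps the test-set performance honest.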
Figure axes: LV 1 (12.48%), LV 2 (10.69%), LV 3 (3.84%); groups: Control, Cancer, BE, HGD
PLS-DA score plot for the whole NMR spectra
Figure axes: LV 1 (3.69%), LV 2 (3.80%), LV 3 (2.40%); groups: Normal, Cancer, BE, HGD
PLS-DA score plot for the whole LC-MS spectra
Zhang J, et al. J Thorac Cardiovasc Surg., 2011; Zhang J, et al. PLoS ONE, 2012; Buas et al. Metabolomics, 2017
Global Metabolomics of Esophageal Cancer
Analysis of serum samples from patients with EC, at-risk patients and healthy controls.
However, the clinically relevant comparison, BE vs EC, is harder to distinguish.
Focusing on at-Risk Patients
BE (n=62) vs GERD (n=100): AUC = 0.64
HGD/EC (n=100) vs BE (n=60): AUC = 0.75
Buas et al., Metabolomics 2017.
Challenges in Biomarker Translation
Ease of discovery in small studies
- Typically, variables > samples
- Very, very powerful statistical methods
- Opportunity for mismatched sample sets to give “signal”
Difficulty in Validation
- Biomarker signals tend to be small, Δµ/σ < 1
- Performance degrades due to overtraining
- Mismatch between discovery and validation sets
Challenges in uptake by medical community
- Does biomarker change clinical practice?
- Ramifications of false positives vs false negatives?
- Will anyone pay for it?
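To put Δµ/σ < 1 in perspective: if a single marker is assumed to be normally distributed with equal variance in both groups (an assumption made here for illustration, not a claim from the slides), its AUC is Φ(d/√2) with d = Δµ/σ. A minimal sketch:

```python
import math

def auc_from_effect_size(d):
    """AUC of one marker under equal-variance normal distributions.

    AUC = Phi(d / sqrt(2)); since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    this simplifies to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))

for d in (0.25, 0.5, 1.0):
    print(d, round(auc_from_effect_size(d), 3))
```

Under this assumption a single marker with Δµ/σ below 1 cannot exceed an AUC of about 0.76, which is why panels of metabolites are usually needed.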
Targeted LC-MS Based Metabolomics of Colon Cancer
Jiangjiang Zhu
Colorectal Cancer (CRC)
No. 3 leading cancer type in the US.
No. 3 cause of cancer death in the US.
Five-year relative survival rates:
Local: 90%
Regional: 70%
Distant: 12%
American Cancer Society, Surveillance Research 2013
Picture source: AGAJournals.org
Colon Cancer Development
CC can develop for 10-20 years before polyps convert to cancer.
Risk factors: age, race, gender, smoking, diet, diabetes, other cancers, industrial countries
Classical Screening Tests
Blood test Stool test Colonoscopy Biopsy
Blackdoctor.org Nytimes.com Healthland.tiem.com drdach.com
- Low sensitivity
  - 43% for FOBT, 70% for FIT
- Invasiveness
- Potential risks of complications
- Experience of pain and discomfort
- Low compliance rate (<50% for colonoscopy)
Drawbacks
Cologuard Test
Uses BMP4, NDRG4, KRAS gene panel + FIT for human hemoglobin.
10,000 patient trial (300 CRC patients)
$100M, FDA approved
Now covered by some insurance
2014
Altered Metabolism in Cancer: Warburg Effect
Glutamine found to replace missing energy
Thompson et al, Science 2009
Cancer Disturbs Normal Metabolism
Positive Negative
Targeted LC-MS Analysis
- Hydrophilic interaction chromatography (HILIC) column
- Two columns in parallel for high throughput analysis
- Multiple-reaction-monitoring (MRM) mode
- All 158 metabolite identities verified by standards
Danijel Djukovic
Metabolic Pathways | Number of Metabolites
Alanine, aspartate and glutamate metabolism | 10
Arginine and proline metabolism | 13
Butanoate metabolism | 6
Citrate cycle (TCA cycle) | 11
Cysteine and methionine metabolism | 14
Fatty acid metabolism | 3
Glutathione metabolism | 12
Glycine, serine and threonine metabolism | 6
Glycolysis / Gluconeogenesis | 12
Histidine metabolism | 8
Lysine biosynthesis | 7
Lysine degradation | 6
Nitrogen metabolism | 9
Oxidative phosphorylation | 6
Pentose phosphate pathway | 5
Phenylalanine metabolism | 6
Phenylalanine, tyrosine and tryptophan biosynthesis | 8
Purine metabolism | 10
Pyrimidine metabolism | 6
Pyruvate metabolism | 5
Synthesis and degradation of ketone bodies | 4
Tryptophan metabolism | 10
Tyrosine metabolism | 9
Valine, leucine and isoleucine biosynthesis | 7
Valine, leucine and isoleucine degradation | 5
Metabolites/Pathways Under Investigation
Distribution of Coefficient of Variance (CV)
Figure: % of metabolites in each CV bin (<5%, 5-10%, 10-15%, 15-20%, >20%), positive mode and negative mode
Batch | 1 | 2 | 3 | 4 | 5 | Average
Pos CV (%) | 8.6 | 6.5 | 7.7 | 5.6 | 6.4 | 7.0
Neg CV (%) | 6.7 | 7.2 | 5.7 | 5.8 | 7.3 | 6.5
Average CV over 12 days is ~ 11%
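The "Average" column of the batch table is the plain mean of the five per-batch CVs, which can be checked directly with the values from the table:

```python
# Per-batch CV values (%) copied from the table above.
pos_cv = [8.6, 6.5, 7.7, 5.6, 6.4]
neg_cv = [6.7, 7.2, 5.7, 5.8, 7.3]

pos_avg = round(sum(pos_cv) / len(pos_cv), 1)
neg_avg = round(sum(neg_cv) / len(neg_cv), 1)
print(pos_avg, neg_avg)  # 7.0 6.5
```

The ~11% figure over 12 days is larger because it also includes day-to-day variation, which the within-batch values do not capture.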
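Total n=234 | CRC n=66 | Polyps n=76 | Healthy Control n=92
Age: Median | 58 | 56 | 57
Age: Min | 27 | 37 | 18
Age: Max | 88 | 86 | 80
Gender: Male | 30 | 37 | 45
Gender: Female | 36 | 39 | 47
Cancer stage: Stage I/II | 21 | − | −
Cancer stage: Stage III | 17 | − | −
Cancer stage: Stage IV | 28 | − | −
Diagnosis: Colon Cancer | 39 | − | −
Diagnosis: Rectal Cancer | 27 | − | −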
Summary of Patient Information
Data Analysis And Biomarker Selection
Univariate analysis (p-value, individual ROC)
Multivariate analysis
- Variable (core metabolite) selection using PLS VIP
- Build PLS-DA model using VIP metabolites
- Build enhanced PLS-DA model by adding clinical factors (age, gender, smoking and alcohol status)
Monte Carlo Cross Validation
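Monte Carlo cross validation repeatedly splits the samples at random into training and test sets, refits the model on each training set and averages the test performance. The sketch below shows the loop only; the threshold classifier stands in for the PLS-DA model, and the data, the 100 iterations and the 30% test fraction are assumptions, since the slides do not state them.

```python
import random

def fit_threshold(values, labels):
    """Stand-in classifier: midpoint between the two class means."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    controls = [v for v, l in zip(values, labels) if l == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2.0

def predict_threshold(threshold, value):
    return 1 if value >= threshold else 0

def mccv_accuracy(values, labels, n_iter=100, test_frac=0.3, seed=0):
    """Mean test-set accuracy over n_iter random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    n_test = max(1, int(round(test_frac * len(idx))))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_threshold([values[i] for i in train], [labels[i] for i in train])
        hits = sum(predict_threshold(model, values[i]) == labels[i] for i in test)
        accuracies.append(hits / n_test)
    return sum(accuracies) / n_iter

# Hypothetical, well-separated marker levels: 10 controls and 10 cases.
values = [1.0 + 0.1 * i for i in range(10)] + [3.0 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10
print(mccv_accuracy(values, labels))
```

Because the model never sees the held-out samples during fitting, the averaged accuracy is a less optimistic estimate than the training-set fit.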
Metabolite (MRM transition) | Cancer vs. Healthy p-value | FC | Cancer vs. Polyps p-value | FC
Histidine (156.1 / 110.0) | 2.75E-06 | 0.81 | 2.57E-05 | 0.85
Glyceraldehyde (89.0 / 59.0) | 1.52E-05 | 1.34 | 2.19E-07 | 1.41
Glycochenodeoxycholate (448.3 / 74.0) | 5.84E-05 | 1.42 | 9.22E-05 | 2.27
Hippuric Acid (178.0 / 134.0) | 8.46E-05 | 2.73 | 1.40E-05 | 3.22
Methionine (150.1 / 61.0) | 1.16E-04 | 0.88 | 1.89E-06 | 0.85
Lysine (147.1 / 84.0 (2)) | 1.18E-04 | 0.88 | 5.00E-06 | 0.84
Linolenic Acid (279.1 / 261.0) | 3.32E-04 | 0.78 | 2.04E-04 | 0.77
Glycocholate (464.3 / 74.0) | 4.25E-04 | 1.79 | 1.94E-04 | 3.01
Glutamic acid (148.1 / 84.0) | 6.27E-04 | 1.22 | 5.84E-03 | 1.22
N-AcetylGlycine (116.0 / 74.0) | 7.70E-04 | 0.75 | 1.71E-03 | 0.67
2'-Deoxyuridine (229.1 / 113.0) | 8.10E-04 | 0.91 | 3.81E-05 | 0.89
Allantoin (157.0 / 114.0) | 1.02E-03 | 1.24 | 2.58E-02 | 1.11
Glutamine (147.1 / 84.0) | 1.12E-03 | 0.92 | 3.91E-04 | 0.92
Aspartic Acid (132.0 / 88.0) | 1.40E-03 | 1.37 | 1.32E-03 | 1.39
Dimethylglycine (104.1 / 58.0) | 1.40E-03 | 0.82 | 2.11E-04 | 0.78
Maleic Acid (115.0 / 71.0 (2)) | 1.45E-03 | 1.13 | 1.96E-03 | 1.11
Hydroxyproline/Aminolevulinate (132.1 / 86.2) | 1.66E-03 | 1.37 | 9.09E-04 | 1.32
Adenylosuccinate (462.1 / 79.0) | 2.35E-03 | 1.21 | 1.02E-02 | 1.19
Malonic Acid/3HBA (103.0 / 59.0) | 3.47E-03 | 0.78 | - | -
Cystathionine (221.1 / 134.0) | 4.90E-03 | 1.45 | - | -
Alpha-Ketoglutaric Acid (145.0 / 101.0) | 5.71E-03 | 0.93 | 3.06E-03 | 0.93
Kynurenate (188.0 / 144.0) | 6.49E-03 | 1.16 | 1.01E-02 | 1.10
2-Aminoadipate (160.1 / 116.0) | 8.81E-03 | 0.86 | - | -
Creatinine (114.1 / 44.0) | 9.08E-03 | 0.90 | 2.04E-03 | 0.88
Urate (167.0 / 124.0) | 1.40E-02 | 0.93 | 1.49E-02 | 0.93
Homogentisate (167.0 / 123.0) | 1.59E-02 | 0.92 | 2.81E-02 | 0.93
Proline (116.1 / 70.0) | 1.72E-02 | 1.10 | - | -
gamma-Aminobutyrate (102.1 / 85.0) | 1.75E-02 | 0.93 | 4.95E-03 | 0.91
Arginine (175.1 / 70.0) | 1.77E-02 | 1.08 | 1.05… | …
Summary of p-Values and Fold Changes
Metabolite | AUROC | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound | Sensitivity | Specificity | Accuracy
Histidine | 0.719 | 0.040 | 0.640 | 0.798 | 0.924 | 0.467 | 0.658
Glyceraldehyde | 0.702 | 0.042 | 0.619 | 0.785 | 0.742 | 0.641 | 0.686
Glycochenodeoxycholate | 0.688 | 0.042 | 0.605 | 0.770 | 0.879 | 0.435 | 0.620
Hippuric Acid | 0.684 | 0.044 | 0.597 | 0.771 | 0.591 | 0.794 | 0.709
Methionine | 0.680 | 0.043 | 0.596 | 0.764 | 0.667 | 0.630 | 0.646
Lysine | 0.680 | 0.043 | 0.595 | 0.764 | 0.530 | 0.794 | 0.684
Linolenic Acid | 0.668 | 0.044 | 0.581 | 0.755 | 0.439 | 0.880 | 0.696
Glycocholate | 0.665 | 0.043 | 0.580 | 0.749 | 0.742 | 0.565 | 0.703
Glutamic acid | 0.660 | 0.044 | 0.574 | 0.746 | 0.606 | 0.707 | 0.665
N-AcetylGlycine | 0.657 | 0.044 | 0.570 | 0.744 | 0.788 | 0.511 | 0.623
2'-Deoxyuridine | 0.656 | 0.044 | 0.571 | 0.742 | 0.576 | 0.685 | 0.639
Allantoin | 0.653 | 0.043 | 0.568 | 0.739 | 0.606 | 0.663 | 0.639
Glutamine | 0.652 | 0.044 | 0.566 | 0.739 | 0.546 | 0.707 | 0.639
Aspartic Acid | 0.649 | 0.046 | 0.559 | 0.739 | 0.439 | 0.859 | 0.684
Dimethylglycine | 0.649 | 0.044 | 0.562 | 0.736 | 0.606 | 0.663 | 0.639
Maleic Acid | 0.649 | 0.045 | 0.560 | 0.737 | 0.606 | 0.707 | 0.665
Hydroxyproline/Aminolevulinate | 0.647 | 0.044 | 0.561 | 0.733 | 0.682 | 0.587 | 0.627
Adenylosuccinate | 0.642 | 0.045 | 0.553 | 0.731 | 0.439 | 0.815 | 0.658
Malonic Acid/3HBA | 0.637 | 0.048 | 0.542 | 0.731 | 0.546 | 0.815 | 0.703
Single Metabolite Performance
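For reference, the quantities in the single-metabolite table can be computed as sketched below. The AUROC is written in its rank form (the probability that a randomly chosen case scores higher than a randomly chosen control), and accuracy is the class-size-weighted mean of sensitivity and specificity. The score lists are hypothetical; the class sizes of 66 CRC and 92 healthy controls are taken from the patient summary, and applying them to this table is an inference.

```python
def auroc(case_scores, control_scores):
    """P(random case > random control), counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def accuracy(sensitivity, specificity, n_cases, n_controls):
    """Overall accuracy from sensitivity, specificity and class sizes."""
    return (sensitivity * n_cases + specificity * n_controls) / (n_cases + n_controls)

# Hypothetical scores, for illustration only.
print(round(auroc([3, 4, 5], [1, 2, 3]), 3))

# Histidine row: sensitivity 0.924, specificity 0.467, 66 CRC vs. 92 controls.
print(round(accuracy(0.924, 0.467, 66, 92), 3))  # 0.658
```

The histidine accuracy of 0.658 is reproduced with these class sizes, which suggests the table refers to the cancer vs. healthy control comparison.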
A: Cancer patients vs. healthy controls, AUROC 0.90
B: Cancer patients vs. polyp patients, AUROC 0.94
ROC Curves of PLS-DA VIP Metabolite Model
Cancer patients vs. Healthy controls Cancer patients vs. Polyp patients
MCCV of VIP Metabolite Model
CRC vs. Control: AUROC 0.90 (metabolites only model), AUROC 0.93 (metabolites + clinical factor model); Sensitivity: 0.96, Specificity: 0.80
CRC vs. Polyp: AUROC 0.94 (metabolites only model), AUROC 0.95 (metabolites + clinical factor model); Sensitivity: 0.89, Specificity: 0.88
Improved Model using Clinical Information
Compared to Healthy controls | Colon Cancer | Rectal Cancer | Stage I/II | Stage III | Stage IV
AUROC | 0.96 | 0.93 | 0.93 | 0.93 | 0.99
Sensitivity | 0.95 | 0.93 | 0.95 | 0.93 | 0.94
Specificity | 0.88 | 0.82 | 0.82 | 0.75 | 0.94
Compared to Polyps | Colon Cancer | Rectal Cancer | Stage I/II | Stage III | Stage IV
AUROC | 0.96 | 0.95 | 0.97 | 0.94 | 0.99
Sensitivity | 0.92 | 0.92 | 0.95 | 0.94 | 1.00
Specificity | 0.91 | 0.78 | 0.92 | 0.82 | 0.96
Model Performance
Summary
Combining MS and NMR using Variable Selection
Ling Li, A. Chem., 2016
Study Demographics
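 | Healthy Controls | Polyps | CRC
Patients | 55 | 44 | 28
Age, mean (range) | 52.8 (21-74) | 56.1 (39-68) | 55.3 (27-86)
Gender: Male | 25 | 23 | 16
Gender: Female | 30 | 21 | 12
Polyps size 0-5 | - | 26 | -
Polyps size 6-9 | - | 4 | -
Polyps size 10-15 | - | 3 | -
Polyps size 16+ | - | 11 | -
Cancer stage: Stage I/II | - | - | 3
Cancer stage: Stage III | - | - | 8
Cancer stage: Stage IV | - | - | 17
Diagnosis: Colon Cancer | - | - | 19
Diagnosis: Rectal Cancer | - | - | 9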
Ethnicity Caucasian 50 23 15 African American 4 1 2 Asian 1 NA* 1 19 11
Statistical Approach
127 serum samples: CRC=28, Polyp=44, Control=55
Drop each variable and establish new PLS-DA models
PLS-DA model with selected variables
Select the variables that generate the best prediction
Monte Carlo cross validation
MS data, NMR data, NMR+MS data
Optimal variable subset
Backward Variable Elimination
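The elimination loop can be sketched as follows. Each round drops every remaining variable in turn, keeps the reduced set that scores best, and remembers the best subset seen overall. The scoring function here is a hypothetical toy; in the study it would be the cross-validated prediction performance of a PLS-DA model built on that subset, and the exact stopping rule used there is not given on the slide.

```python
def backward_elimination(variables, score):
    """Greedy backward variable elimination; returns (best_subset, best_score)."""
    current = list(variables)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # Try dropping each variable once and keep the best reduced set.
        candidates = [[v for v in current if v != drop] for drop in current]
        current = max(candidates, key=score)
        s = score(current)
        if s >= best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score

# Hypothetical toy score: informative variables help, noise variables hurt.
informative = {"m1", "m2"}

def toy_score(subset):
    n_good = sum(v in informative for v in subset)
    n_noise = len(subset) - n_good
    return n_good - 0.5 * n_noise

subset, best = backward_elimination(["m1", "m2", "n1", "n2", "n3"], toy_score)
print(sorted(subset), best)
```

Running the same loop separately on MS variables, NMR variables and the combined NMR+MS set allows the three optimal subsets to be compared on equal terms.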
BVE Results
MCCV Model Accuracy
Differentiating Polyps from Controls
More difficult, still BVE approach provides best accuracy
Cologuard Results
Polyps are hard to detect, even using stool based DNA…
NEJM, 2014
Modeling Metabolite Levels
Usually, in metabolomics, one models a disease state based on metabolite levels and possibly clinical/demographic variables (age, gender, BMI, etc). For example, in a PLS-DA model:
Y = b0 + b1X1 + b2X2 + ... + bpXp
Clinical variables can have a large effect on the model. We’re interested in the effects of clinical variables on metabolite levels. So, instead, we’re going to try to model metabolite levels based on clinical variables…
Chen et al., J. Prot. Res., 2015
Supervised Multivariate Statistics: PLS-DA
bi: regression coefficients for variables X1 through Xp
PLS-DA is used to fit a model between the spectral data and the class information:
Y = b0 + b1X1 + b2X2 + ... + bpXp
Class variables: “Control = 0” and “Case = 1”
Test Set of Samples Disease Control Training set samples Statistical model Training Set of Samples Disease Control
Seemingly Unrelated Regression
To do this, we’re going to use a SUR model. Developed in 1962 by Arnold Zellner to model missing but correlated economic data:
… …
Where:
Y is a predicted metabolite level
x is a confounding factor
β is a fitted coefficient
ϵ is a correlated error matrix
Chen et al., J. Prot. Res., 2015; Metabolomics, 2017
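For orientation, the standard textbook form of a seemingly unrelated regression system is sketched below using the symbols defined above. This is the generic Zellner formulation, not necessarily the exact parameterization used in the cited papers; the indices i (sample), j (metabolite) and k (covariate) and the symbol σ are introduced here for illustration.

```latex
Y_{ij} = \beta_{j0} + \sum_{k=1}^{K} \beta_{jk}\, x_{ik} + \epsilon_{ij},
\qquad j = 1, \dots, m,
\qquad \operatorname{Cov}(\epsilon_{ij}, \epsilon_{ij'}) = \sigma_{jj'}
```

Each metabolite has its own regression equation, and the equations are linked only through the correlated errors, which is what lets the model borrow strength across metabolites.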
Study Information
24 metabolites detected by NMR
Clinical info: age, gender, BMI, smoking, alcohol, diagnosis
Added interactions between diagnosis and clinical variables.
Total n=102 | Polyps n=44 | Healthy Controls n=58
Age: Mean | 62.4±6.3 | 59.7±6.6
Gender: Male | 29 | 24
Gender: Female | 15 | 34
BMI: Mean | 29.1±5.6 | 27.9±6.5
Ever Smoked: Yes | 22 | 25
Ever Smoked: No | 22 | 33
Alcohol Status: Alcohol | 29 | 43
Alcohol Status: Little/No Alcohol | 15 | 15
Analysis Flowchart
Build 24 metabolite SUR model
Backward Selection for Covariates
M = b0 + b1G + b2BMI + b3S + ... + bpDp
SUR Results on Covariates and Metabolites
Selected Effects | p-Value
Gender | 9.9e-08
BMI | 0.0023
BMI2 | 0.15
Smoking | 0.56
Diagnosis | 0.39
Diagnosis × Gender | 0.045
Diagnosis × BMI | 0.012
Diagnosis × BMI2 | 0.17
Diagnosis × Smoking | 0.0049
Metabolite | p-Value
3-Hydroxybutyric acid | 0.59
Acetic acid | 0.98
Acetoacetate | 0.66
Acetone | 0.22
Alanine | 0.35
Choline/Phosphocholine | 0.78
Citric acid | 0.75
Creatinine | 0.46
Dimethylglycine | 0.91
Formate | 0.91
Glucose | 0.070
Glutamic acid | 0.84
Glutamine | 0.86
Glycine | 0.60
Histidine | 0.67
Isoleucine | 0.54
Lactate | 0.81
Lysine | 0.51
Phenylalanine | 0.26
Threonine | 0.78
Tyrosine | 0.55
Unsaturated-Lipids | 0.39
Urea | 0.098
Valine | 0.010
A number of clinical covariates significantly affect metabolite levels. However, when we take these clinical factors into account, the disease signal on individual metabolites is very small.
M = b0 + b1G + b2BMI + b3S + ... + bpDp
SUR Results on Biological Groups
Biological Groups p-Value Adjusted p-Value
Group 1: acetate, glucose, lactate | 0.014 | 0.023
Group 2: isoleucine, valine | 0.0046 | 0.012
Group 3: alanine, glutamic acid, glutamine | 0.060 | 0.069
Group 4: creatinine, glutamine, urea | 0.0010 | 0.0050
Group 5: glutamic acid, histidine | 0.058 | 0.072
Group 6: acetoacetate, acetone, lactate | 0.33 | 0.33
Group 7: acetoacetate, citric acid, tyrosine | 0.0011 | 0.0041
Group 8: citric acid, formate, glutamic acid, glutamine | 0.23 | 0.25
Group 9: phenylalanine, tyrosine | 0.0021 | 0.0063
Group 10: alanine, glutamic acid, glutamine, glycine, histidine, isoleucine, lysine, phenylalanine, threonine, tyrosine, valine | 1.5e-07 | 2.3e-06
Group 11: alanine, citric acid, glucose, lactate | 0.021 | 0.032
Group 12: glycine, threonine | 0.0051 | 0.011
Group 13: alanine, glutamic acid, glycine, threonine | 0.031 | 0.042
Group 14: alanine, glutamic acid, glycine, isoleucine, threonine, valine | 1.2e-05 | 9.1e-05
Group 15: choline/phosphocholine, glycine, threonine | 0.0057 | 0.011
Combining the results of multiple metabolites did bring back some performance:
Modeling Metabolite Levels
Chen et al., Metabolomics, 2017
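The adjusted p-values in the biological-groups table are numerically consistent with scaling each p-value by m/rank, the Benjamini-Hochberg scaling without the final monotonicity step. The slide does not name the correction, so this is an inference from the numbers rather than a statement of the authors' method. A quick check against the table values:

```python
def scale_by_rank(p_values):
    """Multiply each p-value by m / rank (rank 1 = smallest), capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(1.0, p_values[i] * m / rank)
    return adjusted

# Raw and reported adjusted p-values for Groups 1-15, copied from the table.
raw = [0.014, 0.0046, 0.060, 0.0010, 0.058, 0.33, 0.0011, 0.23,
       0.0021, 1.5e-07, 0.021, 0.0051, 0.031, 1.2e-05, 0.0057]
reported = [0.023, 0.012, 0.069, 0.0050, 0.072, 0.33, 0.0041, 0.25,
            0.0063, 2.3e-06, 0.032, 0.011, 0.042, 9.1e-05, 0.011]

for group, (calc, rep) in enumerate(zip(scale_by_rank(raw), reported), start=1):
    print(f"Group {group}: calculated {calc:.2g}, reported {rep:.2g}")
```

Small differences are expected because the raw p-values on the slide are already rounded to two significant figures.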
Using Targeted MS Data
Effect of SUR on Correlations
Urea cycle
New Approaches for Pathway Mapping
Conclusions
- Biomarker discovery and especially validation are very challenging. High quality data is an important first step.
- Improved performance of global profiling is being achieved, and new sets of data quality metrics will be useful to assess these improvements.
- Broad-based targeted MS provides excellent reliability, which allows a realistic idea of putative biomarker strength.
- Advanced statistical methods (like SUR and other modeling approaches) may account for some confounding factors and provide a path for improving biomarker performance.
Biological Samples
Metabolomics is a comparative science
Human samples are challenging because they vary due to:
- Gender
- Age
- Diet
- Lifestyle, etc.
Nevertheless, human studies are possible with careful design
Animal samples can be controlled better
- Sample numbers can be smaller (>5)
- Genetic knock-outs, knock-ins; normal v disease
- Paired studies are better
- Great also for validating human discoveries
Cell based studies can be even more attractive
- Provide an opportunity to probe fundamental biology
- Of course, such studies are further from important applications
Sample Prep
Quench sample (-80 °C)
Extract metabolites (MeOH, CHCl3, etc.)
Dry/concentrate sample
Add standards and run!