Harnessing Metabolomics for Biomarker Discovery in Colon Cancer (PowerPoint presentation)

SLIDE 1

Harnessing Metabolomics for Biomarker Discovery in Colon Cancer: Opportunities and Challenges

Daniel Raftery, UW Medicine

Northwest Metabolomics Research Center

SLIDE 2

Acknowledgements

Current and Past Members

Nagana Gowda, Danijel Djukovic, Lisa Bettcher, Natalie Nguyen, Hayley Purcell, Vadim Pascua, Fausto Carnevale-Neto

Collaborators

Min Zhang, Purdue; Dabao Zhang, Purdue; Chen Chen, Purdue; Bruce Clurman, FH; Gabi Chiorean, SCCA; Jiyang Dong, Xiamen

https://nwwashington.edu

Rob Pepin, Jiangjiang Zhu, Xinyu Zhang, Haiwei Gu, Leela Paudel, Lingli Deng, Ping Zhang, Wentao Zhu, Dan Du, Renke Zhang, Dongfang Wang, Qiang Fei

SLIDE 3

Metabolomics in Context

Biological Systems

[Figure: omics overview linking stimuli, genes (genomics), proteins (proteomics), and metabolites (metabolic profiling, shown with example chemical structures), all measured with advanced analytical techniques]

Genotype + Environment --> Phenotype

SLIDE 4

Analysis of small molecules in biosystems: ~20,000 aqueous metabolites and 200,000+ lipids, both endogenous and exogenous

Applications in Metabolomics

Disease diagnostics, companion (drug) diagnostics, toxicology, food and nutrition, drug discovery, personalized medicine, systems biology research

G. A. N. Gowda, S. Zhang, H. Gu, V. Asiago, N. Shanaiah and D. Raftery, "Metabolomics-Based Methods for Early Disease Diagnostics: A Review," Expert Rev. Mol. Diagn., 8:617-33 (2008)

[Figure: metabolomics follows regulation over time as a system moves from healthy to sick, supporting the discovery of biomarkers and the understanding of systems biology]

SLIDE 5

Brief History

- 2000 BC: Chinese/Greek apocryphal story of ants and urine
- 1800-1900: Identification of various metabolites
- 1930s-1950s: Metabolite pathways identified
- 1950s-1960s: MS and NMR development
- 1960s: First "metabolomics" studies
- 1970s: LC and chemometrics development
- 1980s: LC-MS and high-field NMR development
- 1998-99: "Metabonomics" and "metabolomics" coined
- 2000s: Development of statistical methods and databases
- Field is expanding rapidly (>1000 papers/year)

SLIDE 6

Metabolism

Metabolism is:

- Complex
- Interconnected
- Influenced by genetics & environment (food, stresses including illness)
- Affects upstream biology (gene expression, epigenetics, protein function)

SLIDE 7

Metabolic Maps

SLIDE 8

Metabolomics Methods and Applications

[Figure: metabolomics methods and applications. Biospecimens (cell lines, animal models, human populations) undergo MS or NMR metabolite detection, unknown identification, and statistical modeling. Applications: biomarkers for early disease detection (ROC curve, sensitivity vs. 1-specificity), mechanistic studies tracing altered pathways (e.g., the TCA cycle: oxaloacetate, citrate, isocitrate, 2-oxoglutarate, succinyl-CoA, succinate, fumarate, malate), and identifying drug targets & translation]

Gowda & Raftery, J. Magn. Reson. 2015

Analysis of complex biological samples/systems: 1000s of small molecules

SysBio

SLIDE 9

Cancer Related Metabolomic Findings

New findings link a genetic defect with metabolic upregulation of a metabolite linked with brain cancer. Sarcosine was found to be a strong tissue marker of prostate cancer (PC) aggressiveness.

SLIDE 10

Lipid Panel for Alzheimer’s Disease Prediction

10 lipids found to predict AD with 90% accuracy

SLIDE 11

Biomarker Translation Process

Gowda and Raftery, Current Metabolomics, 2013

SLIDE 12

Metabolomics Related Companies

SLIDE 13

The Metabolome and Its Measure

Human metabolome: 20,000 aqueous + 200,000 lipid metabolites (metabolome = small molecules <1500 Da)

- Global profiling: >2000 aqueous metabolites
- Targeted profiling: 20-300 aqueous metabolites
- Quantitative: 10-70 aqueous metabolites
- Flux
- Quantitative lipidomics: ~1200 lipids, 13 classes
- GOT-MS
- Dark metabolome

SLIDE 14

Assays at NW-MRC

SOPs

Assay | Metabolites
Targeted aqueous assay | >300 aqueous metabolites from 60 pathways
Global profiling | ~2000 MS features, ~400 metabolites
Quantitative lipid profiling | Up to 1100 lipids from 13 classes
Flux analysis | TCA, glycolysis, amino acids, fatty acids, etc.
Bile acid analysis | 55 bile acids
Tryptophan pathway analysis | 25 Trp metabolites
Carnitine analysis | ~40 carnitines
Cardiolipin assay | ~20 cardiolipins
Oxylipin assay | ~100 signaling lipids
Co-enzyme analysis | 7 co-enzymes
In situ monitoring | Time-resolved kinetics

… …

SLIDE 15

Major Metabolomics Tools

NMR: amino acids, organic acids, some amines, glucose, lipid classes. Detected molecules: 30-100

LC-MS: amino acids, amines, fatty acids, nucleosides, lipids, carbohydrates, etc. Detected molecules: ~2000 (500 ID'd)

GC-MS: organic acids, aldehydes, ketones, other volatiles, fatty acids, amino acids, steroids. Detected molecules: ~300 (150 ID'd)

SLIDE 16

Targeted LC-MS Analysis

[Figure: example positive- and negative-mode chromatograms]

>350 metabolite identities verified by standards
Covers 60 KEGG pathways
Hydrophilic interaction chromatography (HILIC) column
Two columns in parallel for high throughput analysis
All metabolites verified with standards
Multiple-reaction-monitoring (MRM) mode
Throughput: 30 study samples per day
CV ~6-8%

SLIDE 17

Quantitative Lipidomics

New quantitative platform targets over 1100 lipids from 13 lipid classes

Measures 700-900 lipids in blood with absolute concentrations: triacylglycerols (TAG), diacylglycerols (DAG), cholesterol esters (CE), free fatty acids (FFA), phosphatidylcholines (PC), phosphatidylethanolamines (PE), lysophosphatidylcholines (LPC), lysophosphatidylethanolamines (LPE), sphingomyelins (SM), ceramides (CER, DCER, LCER, HCER)

Performance: reproducibility ~5% (within batch); accuracy ~10%

SLIDE 18

[Figure: total ion current (TIC) chromatogram of human urine and an extracted ion chromatogram (EIC, m/z 488.8730, ESI-, 12.652 min); counts vs. acquisition time (min)]

Human urine, ESI+ LC-QTOF-MS, m/z 60-1000, Agilent 6520: >2200 compounds

Global Profiling

How do you characterize these data?

1) Number of features
2) Number of compounds
3) Average CV
4) Number of metabolite IDs

SLIDE 19

Known Metabolome

Major Challenge:

SLIDE 20

Tools for Unknown Identification

Metabolic annotation of ten urine samples analyzed by HILIC-(-)-ESI-MS/MS (DDA), according to “direct parent level” of the ClassyFire ranking system.

Molecular Networking from UC San Diego

Global mass spectral molecular network of urine samples acquired in ESI positive mode. Molecular network colored by putative chemical “superclass level” retrieved through the MolNetEnhancer workflow and ClassyFire.

ClassyFire from the University of Alberta

SLIDE 21

Data Quality Metrics for MS Global Profiling



- Despite successes and wide usage, global profiling has had trouble with reproducibility and data quality.
- There are some new efforts to develop standard reference materials (SRMs), for example at NIST.
- But better measures of data quality are needed.
- Towards that end, we're working on a set of 5 Data Quality Metrics ("5 Easy Metrics").

Experiment:

- 50 replicates of a pooled human serum sample
- Protein precipitated
- Run on 2 Agilent QTOF instruments (6520, 6545)
- ESI+ only
- Processed using Progenesis QI
- Profile and centroid data acquired
- Feature/compound defined as having a minimum of 2 ions

Goal is to help define a set of consensus measures.

Xinyu Zhang Jiyang Dong

SLIDE 22

5 Easy Metrics of Data Quality

1) TIC reproducibility
2) Missing values
3) CV vs. intensity
4) Metabolites detected: compounds detected with two or more ions; m/z values identified by HMDB
5) ICC vs. intensity

[Figure: one panel per metric, including TIC (A.U.) vs. retention time (min) with the mobile-phase B% gradient, ESI(+), and compound numbers for the 6545 and 6520 instruments with profile (P) and centroid (C) data]

Useful for comparing and optimizing analyses as well as documenting/publishing.
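Several of these metrics are easy to compute once features have been extracted. The sketch below assumes a hypothetical replicate matrix `X` (rows = injections of the pooled QC sample, columns = features, `NaN` = feature not detected) and a vector of per-injection TIC values; it reports TIC relative standard deviation, the fraction of missing values, and per-feature CV. The function names, the 80% presence filter, and the 20% CV cutoff are illustrative assumptions, not the NW-MRC implementation.

```python
import numpy as np

def cv_percent(values):
    """Coefficient of variation in percent: 100 * SD / mean, ignoring NaNs."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.nanstd(values, ddof=1) / np.nanmean(values)

def data_quality_summary(X, tic, min_presence=0.8, cv_cutoff=20.0):
    """Simple QC-replicate metrics (hypothetical helper).

    X   : (n_injections, n_features) intensities, NaN where a feature was not detected
    tic : (n_injections,) total ion current of each injection
    """
    X = np.asarray(X, dtype=float)
    presence = 1.0 - np.isnan(X).mean(axis=0)      # detection rate per feature
    kept = X[:, presence >= min_presence]           # features seen in most injections
    feature_cv = 100.0 * np.nanstd(kept, axis=0, ddof=1) / np.nanmean(kept, axis=0)
    return {
        "tic_rsd_percent": float(cv_percent(tic)),
        "missing_fraction": float(np.isnan(X).mean()),
        "median_feature_cv_percent": float(np.median(feature_cv)),
        "fraction_cv_below_cutoff": float(np.mean(feature_cv < cv_cutoff)),
    }

# Hypothetical data: 50 injections of one pooled sample, 500 features,
# ~8% multiplicative noise and ~5% randomly missing values
rng = np.random.default_rng(0)
true_levels = rng.lognormal(mean=10, sigma=1, size=500)
X = true_levels * rng.normal(1.0, 0.08, size=(50, 500))
X[rng.random(X.shape) < 0.05] = np.nan
tic = np.nansum(X, axis=1)
summary = data_quality_summary(X, tic)
print(summary)
```

Reporting the same small dictionary for every instrument and data mode (e.g., profile vs. centroid) is one way to make such comparisons reproducible.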

SLIDE 23

Typical Metabolomics Data Analysis Workflow

1,000,000 data points --> 2,000 features --> 300 identified metabolites --> 10-50 statistically different metabolites

MS or NMR data (global profiling data) are processed with instrument manufacturer or 3rd party software, identified against a library of compound spectra, and analyzed with statistical methods: feature selection, model building and testing, and a statistical model for validation.

[Figure: example extracted ion chromatogram and PCA score plots (GCGCMS PC1; NMR PC1 vs. PC2) comparing cancer and control samples]

SLIDE 24

Supervised Multivariate Statistics: PLS-DA

PLS-DA is used to fit a model between the spectral data and the class information:

$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p$$

where $b_i$ are the regression coefficients for variables $X_1$ through $X_p$.

Class variables: "Control = 0" and "Case = 1"

[Figure: a statistical model is built on a training set of samples (disease and control) and then applied to a test set of samples (disease and control)]

SLIDE 25

[Figure: PLS-DA score plot for the whole NMR spectra; groups: control, cancer, BE, HGD; LV1 (12.48%), LV2 (10.69%), LV3 (3.84%)]

[Figure: PLS-DA score plot for the whole LC-MS spectra; groups: normal, cancer, BE, HGD; LV1 (3.69%), LV2 (3.80%), LV3 (2.40%)]

Zhang J, et al. J Thorac Cardiovasc Surg., 2011; Zhang J, et al. PLoS One, 2012; Buas et al. Metabolomics, 2017

Global Metabolomics of Esophageal Cancer

Analysis of serum samples from patients with esophageal cancer (EC), at-risk patients (Barrett's esophagus, BE; high-grade dysplasia, HGD), and healthy controls. However, the clinically relevant comparison, BE vs. EC, is harder to distinguish.

SLIDE 26

Focusing on at-Risk Patients

- BE (n=62) vs. GERD (n=100): AUC = 0.64
- HGD/EC (n=100) vs. BE (n=60): AUC = 0.75

Buas et al., Metabolomics 2017.

SLIDE 27

Challenges in Biomarker Translation

- Ease of discovery in small studies
  - Typically, variables > samples
  - Very, very powerful statistical methods
  - Opportunity for mismatched sample sets to give "signal"
- Difficulty in validation
  - Biomarker signals tend to be small, Δµ/σ < 1
  - Performance degrades due to overtraining
  - Mismatch between discovery and validation sets
- Challenges in uptake by the medical community
  - Does the biomarker change clinical practice?
  - Ramifications of false positives vs. false negatives?
  - Will anyone pay for it?
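To see why a small Δµ/σ is a problem, assume the marker is normally distributed with the same SD in cases and controls (an assumption; real metabolite levels are often skewed). The case-minus-control difference is then normal with mean Δµ and SD σ√2, so AUROC = Φ(Δµ/(σ√2)). Under that assumption a single marker with Δµ/σ = 1 reaches an AUROC of only about 0.76.

```python
import math
from statistics import NormalDist

def auc_from_effect_size(d):
    """AUROC of a single marker, assuming normal distributions with equal SD.

    d = (mu_case - mu_control) / sigma, i.e. the delta-mu/sigma on the slide.
    AUROC = P(X_case > X_control) = Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(d / math.sqrt(2))

for d in (0.25, 0.5, 1.0, 2.0):
    print(f"delta-mu/sigma = {d:4.2f}  ->  AUROC = {auc_from_effect_size(d):.3f}")
```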

SLIDE 28

Targeted LC-MS Based Metabolomics of Colon Cancer

Jiangjiang Zhu

SLIDE 29

Colorectal Cancer (CRC)

- No. 3 leading cancer type in the US
- No. 3 cause of cancer death in the US
- Five-year relative survival rates:
  - Local: 90%
  - Regional: 70%
  - Distant: 12%

American Cancer Society, Surveillance Research 2013

Picture source: AGAJournals.org

SLIDE 30

Colon Cancer Development

Polyps can take 10-20 years to progress to colon cancer (CC). Risk factors: age, race, gender, smoking, diet, diabetes, other cancers, living in industrialized countries.

SLIDE 31

Classical Screening Tests

Blood test, stool test, colonoscopy, biopsy

Image sources: Blackdoctor.org, Nytimes.com, Healthland.time.com, drdach.com

SLIDE 32

Drawbacks

- Low sensitivity: 43% for FOBT (fecal occult blood test), 70% for FIT (fecal immunochemical test)
- Invasiveness
- Potential risks of complications
- Experience of pain and discomfort
- Low compliance rate (<50% for colonoscopy)

SLIDE 33

Cologuard Test

Uses a stool DNA panel (BMP3, NDRG4, KRAS) plus FIT for human hemoglobin. 10,000-patient trial (300 CRC patients), $100M; FDA approved in 2014; now covered by some insurance.

SLIDE 34

Altered Metabolism in Cancer: Warburg Effect

Glutamine was found to supply the missing energy

Thompson et al, Science 2009

SLIDE 35

Cancer Disturbs Normal Metabolism

SLIDE 36

Targeted LC-MS Analysis

[Figure: example positive- and negative-mode chromatograms]

Hydrophilic interaction chromatography (HILIC) column
Two columns in parallel for high throughput analysis
Multiple-reaction-monitoring (MRM) mode
All 158 metabolite identities verified by standards

Danijel Djukovic

SLIDE 37

Metabolites/Pathways Under Investigation

Metabolic pathway | Number of metabolites
Alanine, aspartate and glutamate metabolism | 10
Arginine and proline metabolism | 13
Butanoate metabolism | 6
Citrate cycle (TCA cycle) | 11
Cysteine and methionine metabolism | 14
Fatty acid metabolism | 3
Glutathione metabolism | 12
Glycine, serine and threonine metabolism | 6
Glycolysis / Gluconeogenesis | 12
Histidine metabolism | 8
Lysine biosynthesis | 7
Lysine degradation | 6
Nitrogen metabolism | 9
Oxidative phosphorylation | 6
Pentose phosphate pathway | 5
Phenylalanine metabolism | 6
Phenylalanine, tyrosine and tryptophan biosynthesis | 8
Purine metabolism | 10
Pyrimidine metabolism | 6
Pyruvate metabolism | 5
Synthesis and degradation of ketone bodies | 4
Tryptophan metabolism | 10
Tyrosine metabolism | 9
Valine, leucine and isoleucine biosynthesis | 7
Valine, leucine and isoleucine degradation | 5

SLIDE 38

Distribution of Coefficient of Variation (CV)

[Figure: % of metabolites in each CV bin (<5%, 5-10%, 10-15%, 15-20%, >20%), positive and negative modes]

Batch | 1 | 2 | 3 | 4 | 5 | Average
Pos CV (%) | 8.6 | 6.5 | 7.7 | 5.6 | 6.4 | 7.0
Neg CV (%) | 6.7 | 7.2 | 5.7 | 5.8 | 7.3 | 6.5

Average CV over 12 days is ~11%
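Distributions like the one on this slide come from repeated QC measurements. A minimal sketch, assuming CV = 100 × SD / mean per metabolite across QC injections and the same bin edges as the figure; the function name and toy data are hypothetical.

```python
import numpy as np

CV_BIN_LABELS = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]

def cv_distribution(qc):
    """Per-metabolite CV (%) across QC injections and the % of metabolites per CV bin.

    qc : (n_injections, n_metabolites) intensities of repeated QC measurements
    """
    qc = np.asarray(qc, dtype=float)
    cv = 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)
    # digitize: 0 -> <5, 1 -> [5, 10), 2 -> [10, 15), 3 -> [15, 20), 4 -> >=20
    bin_index = np.digitize(cv, [5, 10, 15, 20])
    counts = np.bincount(bin_index, minlength=len(CV_BIN_LABELS))
    percent = 100.0 * counts / cv.size
    return cv, dict(zip(CV_BIN_LABELS, percent))

# Toy example: two QC injections of three metabolites
qc = [[100.0, 100.0, 100.0],
      [100.0, 110.0, 150.0]]
cv, distribution = cv_distribution(qc)
print(np.round(cv, 1), distribution)
```

Running the same function on each batch separately gives per-batch averages such as those in the table; pooling injections across days gives the longer-term CV.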

SLIDE 39

Summary of Patient Information

Characteristic | CRC | Polyps | Healthy control
n (total 234) | 66 | 76 | 92
Age, median | 58 | 56 | 57
Age, min | 27 | 37 | 18
Age, max | 88 | 86 | 80
Male | 30 | 37 | 45
Female | 36 | 39 | 47
Stage I/II | 21 | − | −
Stage III | 17 | − | −
Stage IV | 28 | − | −
Colon cancer | 39 | − | −
Rectal cancer | 27 | − | −

SLIDE 40

Data Analysis And Biomarker Selection

- Univariate analysis (p-value, individual ROC)
- Multivariate analysis
  - Variable (core metabolite) selection using PLS VIP (variable importance in projection)
  - Build a PLS-DA model using the VIP metabolites
  - Build an enhanced PLS-DA model by adding clinical factors (age, gender, smoking and alcohol status)
  - Monte Carlo cross validation

SLIDE 41

Summary of p-Values and Fold Changes

Metabolite (Q1 / Q3 m/z) | Cancer vs. Healthy p-value | FC | Cancer vs. Polyps p-value | FC
Histidine (156.1 / 110.0) | 2.75E-06 | 0.81 | 2.57E-05 | 0.85
Glyceraldehyde (89.0 / 59.0) | 1.52E-05 | 1.34 | 2.19E-07 | 1.41
Glycochenodeoxycholate (448.3 / 74.0) | 5.84E-05 | 1.42 | 9.22E-05 | 2.27
Hippuric acid (178.0 / 134.0) | 8.46E-05 | 2.73 | 1.40E-05 | 3.22
Methionine (150.1 / 61.0) | 1.16E-04 | 0.88 | 1.89E-06 | 0.85
Lysine (147.1 / 84.0 (2)) | 1.18E-04 | 0.88 | 5.00E-06 | 0.84
Linolenic acid (279.1 / 261.0) | 3.32E-04 | 0.78 | 2.04E-04 | 0.77
Glycocholate (464.3 / 74.0) | 4.25E-04 | 1.79 | 1.94E-04 | 3.01
Glutamic acid (148.1 / 84.0) | 6.27E-04 | 1.22 | 5.84E-03 | 1.22
N-Acetylglycine (116.0 / 74.0) | 7.70E-04 | 0.75 | 1.71E-03 | 0.67
2'-Deoxyuridine (229.1 / 113.0) | 8.10E-04 | 0.91 | 3.81E-05 | 0.89
Allantoin (157.0 / 114.0) | 1.02E-03 | 1.24 | 2.58E-02 | 1.11
Glutamine (147.1 / 84.0) | 1.12E-03 | 0.92 | 3.91E-04 | 0.92
Aspartic acid (132.0 / 88.0) | 1.40E-03 | 1.37 | 1.32E-03 | 1.39
Dimethylglycine (104.1 / 58.0) | 1.40E-03 | 0.82 | 2.11E-04 | 0.78
Maleic acid (115.0 / 71.0 (2)) | 1.45E-03 | 1.13 | 1.96E-03 | 1.11
Hydroxyproline/Aminolevulinate (132.1 / 86.2) | 1.66E-03 | 1.37 | 9.09E-04 | 1.32
Adenylosuccinate (462.1 / 79.0) | 2.35E-03 | 1.21 | 1.02E-02 | 1.19
Malonic acid/3HBA (103.0 / 59.0) | 3.47E-03 | 0.78 | − | −
Cystathionine (221.1 / 134.0) | 4.90E-03 | 1.45 | − | −
Alpha-Ketoglutaric acid (145.0 / 101.0) | 5.71E-03 | 0.93 | 3.06E-03 | 0.93
Kynurenate (188.0 / 144.0) | 6.49E-03 | 1.16 | 1.01E-02 | 1.10
2-Aminoadipate (160.1 / 116.0) | 8.81E-03 | 0.86 | − | −
Creatinine (114.1 / 44.0) | 9.08E-03 | 0.90 | 2.04E-03 | 0.88
Urate (167.0 / 124.0) | 1.40E-02 | 0.93 | 1.49E-02 | 0.93
Homogentisate (167.0 / 123.0) | 1.59E-02 | 0.92 | 2.81E-02 | 0.93
Proline (116.1 / 70.0) | 1.72E-02 | 1.10 | − | −
gamma-Aminobutyrate (102.1 / 85.0) | 1.75E-02 | 0.93 | 4.95E-03 | 0.91
A… (175.1 / 70.0) | 1.77E-02 | 1.08 | 1.05…
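The slide does not state how the p-values and fold changes were computed. A minimal sketch for a single metabolite, assuming fold change = ratio of group means (cancer / comparison group) and Welch's two-sample t-test from SciPy; a Mann-Whitney U test or log-transformed intensities would be equally reasonable choices. The function name and intensities are hypothetical.

```python
import numpy as np
from scipy import stats

def fold_change_and_p(case, comparison):
    """Fold change (ratio of group means) and Welch's t-test p-value for one metabolite.

    Assumption: the slide does not say which test or FC definition was used.
    """
    case = np.asarray(case, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    fc = case.mean() / comparison.mean()
    p_value = stats.ttest_ind(case, comparison, equal_var=False).pvalue
    return fc, float(p_value)

# Hypothetical intensities for one metabolite
cancer = [2.0, 2.5, 3.1, 2.8, 3.4]
healthy = [1.0, 1.3, 1.1, 1.6, 1.2]
fc, p = fold_change_and_p(cancer, healthy)
print(f"FC = {fc:.2f}, p = {p:.2e}")
```

With ~30 metabolites tested per comparison, the smallest p-values in the table are the ones that remain meaningful after a multiple-testing correction.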

SLIDE 42

Single Metabolite Performance

Metabolite | AUROC | Std. error | 95% CI lower | 95% CI upper | Sensitivity | Specificity | Accuracy
Histidine | 0.719 | 0.040 | 0.640 | 0.798 | 0.924 | 0.467 | 0.658
Glyceraldehyde | 0.702 | 0.042 | 0.619 | 0.785 | 0.742 | 0.641 | 0.686
Glycochenodeoxycholate | 0.688 | 0.042 | 0.605 | 0.770 | 0.879 | 0.435 | 0.620
Hippuric acid | 0.684 | 0.044 | 0.597 | 0.771 | 0.591 | 0.794 | 0.709
Methionine | 0.680 | 0.043 | 0.596 | 0.764 | 0.667 | 0.630 | 0.646
Lysine | 0.680 | 0.043 | 0.595 | 0.764 | 0.530 | 0.794 | 0.684
Linolenic acid | 0.668 | 0.044 | 0.581 | 0.755 | 0.439 | 0.880 | 0.696
Glycocholate | 0.665 | 0.043 | 0.580 | 0.749 | 0.742 | 0.565 | 0.703
Glutamic acid | 0.660 | 0.044 | 0.574 | 0.746 | 0.606 | 0.707 | 0.665
N-Acetylglycine | 0.657 | 0.044 | 0.570 | 0.744 | 0.788 | 0.511 | 0.623
2'-Deoxyuridine | 0.656 | 0.044 | 0.571 | 0.742 | 0.576 | 0.685 | 0.639
Allantoin | 0.653 | 0.043 | 0.568 | 0.739 | 0.606 | 0.663 | 0.639
Glutamine | 0.652 | 0.044 | 0.566 | 0.739 | 0.546 | 0.707 | 0.639
Aspartic acid | 0.649 | 0.046 | 0.559 | 0.739 | 0.439 | 0.859 | 0.684
Dimethylglycine | 0.649 | 0.044 | 0.562 | 0.736 | 0.606 | 0.663 | 0.639
Maleic acid | 0.649 | 0.045 | 0.560 | 0.737 | 0.606 | 0.707 | 0.665
Hydroxyproline/Aminolevulinate | 0.647 | 0.044 | 0.561 | 0.733 | 0.682 | 0.587 | 0.627
Adenylosuccinate | 0.642 | 0.045 | 0.553 | 0.731 | 0.439 | 0.815 | 0.658
Malonic acid/3HBA | 0.637 | 0.048 | 0.542 | 0.731 | 0.546 | 0.815 | 0.703
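The confidence intervals in the table are consistent with AUROC ± 1.96 × SE (for histidine, 0.719 ± 1.96 × 0.040 gives about 0.641-0.797, matching the reported 0.640-0.798 to rounding), and the accuracy column is consistent with (sensitivity × 66 + specificity × 92) / 158, i.e., the CRC and healthy-control group sizes from the patient table. These are inferences from the numbers, not statements on the slide. The sketch below computes AUROC as the Mann-Whitney probability, a Hanley-McNeil standard error, the Wald interval, and accuracy; the slide does not say which SE estimator was used, and Hanley-McNeil (an assumption) gives about 0.042 rather than 0.040 for histidine with these group sizes.

```python
import math
import numpy as np

def auc_mann_whitney(case, control):
    """AUROC = P(case > control) + 0.5 * P(tie), over all case-control pairs."""
    case = np.asarray(case, dtype=float)[:, None]
    control = np.asarray(control, dtype=float)[None, :]
    return float(np.mean(case > control) + 0.5 * np.mean(case == control))

def hanley_mcneil_se(auc, n_case, n_control):
    """Standard error of the AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_case - 1) * (q1 - auc ** 2)
           + (n_control - 1) * (q2 - auc ** 2)) / (n_case * n_control)
    return math.sqrt(var)

def wald_ci(auc, se, z=1.96):
    """Normal-approximation 95% confidence interval: AUROC +/- 1.96 * SE."""
    return auc - z * se, auc + z * se

def accuracy(sensitivity, specificity, n_case, n_control):
    """Overall accuracy = (true positives + true negatives) / all samples."""
    return (sensitivity * n_case + specificity * n_control) / (n_case + n_control)

# Histidine row, assuming 66 CRC cases and 92 healthy controls
print(wald_ci(0.719, 0.040))                # interval from the reported SE
print(hanley_mcneil_se(0.719, 66, 92))      # Hanley-McNeil SE for comparison
print(accuracy(61 / 66, 43 / 92, 66, 92))   # 0.924 = 61/66 and 0.467 = 43/92
```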

SLIDE 43

ROC Curves of PLS-DA VIP Metabolite Model

(A) Cancer patients vs. healthy controls: AUROC 0.90
(B) Cancer patients vs. polyp patients: AUROC 0.94

SLIDE 44

MCCV of VIP Metabolite Model

[Figure: MCCV results for cancer patients vs. healthy controls and for cancer patients vs. polyp patients]

SLIDE 45

Improved Model using Clinical Information

CRC vs. control: AUROC 0.90 (metabolites-only model), 0.93 (metabolites + clinical factor model); sensitivity 0.96, specificity 0.80
CRC vs. polyp: AUROC 0.94 (metabolites-only model), 0.95 (metabolites + clinical factor model); sensitivity 0.89, specificity 0.88

SLIDE 46

Model Performance

Compared to healthy controls | Colon cancer | Rectal cancer | Stage I/II | Stage III | Stage IV
AUROC | 0.96 | 0.93 | 0.93 | 0.93 | 0.99
Sensitivity | 0.95 | 0.93 | 0.95 | 0.93 | 0.94
Specificity | 0.88 | 0.82 | 0.82 | 0.75 | 0.94

Compared to polyps | Colon cancer | Rectal cancer | Stage I/II | Stage III | Stage IV
AUROC | 0.96 | 0.95 | 0.97 | 0.94 | 0.99
Sensitivity | 0.92 | 0.92 | 0.95 | 0.94 | 1.00
Specificity | 0.91 | 0.78 | 0.92 | 0.82 | 0.96

SLIDE 47

Summary

SLIDE 48

Combining MS and NMR using Variable Selection

Lingli Deng, Anal. Chem., 2016

SLIDE 49

Study Demographics

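Characteristic | Healthy controls | Polyps | CRC patients
n | 55 | 44 | 28
Age, mean (range) | 52.8 (21-74) | 56.1 (39-68) | 55.3 (27-86)
Male | 25 | 23 | 16
Female | 30 | 21 | 12
Polyp size 0-5 | − | 26 | −
Polyp size 6-9 | − | 4 | −
Polyp size 10-15 | − | 3 | −
Polyp size 16+ | − | 11 | −
Stage I/II | − | − | 3
Stage III | − | − | 8
Stage IV | − | − | 17
Colon cancer | − | − | 19
Rectal cancer | − | − | 9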

Ethnicity Caucasian 50 23 15 African American 4 1 2 Asian 1 NA* 1 19 11

SLIDE 50

Statistical Approach

127 serum samples: CRC = 28, polyp = 44, control = 55

MS data, NMR data, NMR+MS data --> Drop each variable and establish new PLS-DA models --> Select the variables that generate the best prediction --> Optimal variable subset --> PLS-DA model with selected variables --> Monte Carlo cross validation

Backward Variable Elimination

SLIDE 51

BVE Results

MCCV Model Accuracy

SLIDE 52

Differentiating Polyps from Controls

More difficult; still, the BVE approach provides the best accuracy.

SLIDE 53

Cologuard Results

Polyps are hard to detect, even using stool-based DNA testing…

NEJM, 2014

SLIDE 54

Modeling Metabolite Levels

Usually, in metabolomics, one models a disease state based on metabolite levels and possibly clinical/demographic variables (age, gender, BMI, etc.). For example, in a PLS-DA model:

$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p$$

Clinical variables can have a large effect on the model. We’re interested in the effects of clinical variables on metabolite levels. So, instead, we’re going to try to model metabolite levels based on clinical variables…

Chen et al., J. Prot. Res., 2015

SLIDE 55

Supervised Multivariate Statistics: PLS-DA


SLIDE 56

Seemingly Unrelated Regression

To do this, we're going to use a seemingly unrelated regression (SUR) model, developed in 1962 by Arnold Zellner to model seemingly unrelated but correlated economic data:

… …

where Y is a predicted metabolite level, x is a confounding factor, β is a fitted coefficient, and ε is the error term, which is correlated across equations (metabolites).

Chen et al., J. Proteome Res., 2015; Metabolomics, 2017
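The SUR equation itself is elided above. As a rough illustration of the idea, the sketch below implements the textbook two-step feasible GLS estimator: fit each metabolite equation by ordinary least squares, estimate the cross-equation error covariance Σ from the residuals, then refit all equations jointly by GLS with error covariance Σ ⊗ I. The covariates (intercept, gender, BMI, diagnosis), coefficients, and function name are hypothetical. When every equation uses the same regressors, the SUR point estimates equal OLS; the gain is the joint covariance matrix `cov`, which supports tests that span several metabolites at once, such as the biological groups tested later in this talk.

```python
import numpy as np

def sur_fgls(X_list, y_list):
    """Two-step feasible GLS estimate of a seemingly unrelated regression (Zellner, 1962).

    X_list[i] : (n, k_i) design matrix of equation i (e.g. covariates for metabolite i)
    y_list[i] : (n,) response of equation i (e.g. level of metabolite i)
    Returns per-equation coefficients, their joint covariance matrix, and the
    estimated cross-equation error covariance Sigma.
    """
    m, n = len(y_list), len(y_list[0])
    # Step 1: ordinary least squares, one equation at a time
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)
    ])
    sigma = resid.T @ resid / n
    # Step 2: generalized least squares on the stacked system, Var(eps) = Sigma kron I_n
    ks = [X.shape[1] for X in X_list]
    X_big = np.zeros((m * n, sum(ks)))
    col = 0
    for i, X in enumerate(X_list):
        X_big[i * n:(i + 1) * n, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(y_list)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    info = X_big.T @ omega_inv @ X_big
    beta = np.linalg.solve(info, X_big.T @ omega_inv @ y_big)
    return np.split(beta, np.cumsum(ks)[:-1]), np.linalg.inv(info), sigma

# Hypothetical data: 3 metabolites, covariates = intercept, gender, BMI, diagnosis
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(28, 5, n), rng.integers(0, 2, n)])
true_beta = np.array([[1.0, 0.5, 0.02, 0.2],
                      [0.5, -0.3, 0.01, 0.1],
                      [2.0, 0.0, 0.03, 0.3]])
errors = rng.multivariate_normal(np.zeros(3), [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]], size=n)
Y = X @ true_beta.T + errors
betas, cov, sigma = sur_fgls([X] * 3, [Y[:, j] for j in range(3)])
print(np.round(sigma, 2))
```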

SLIDE 57

Study Information

- 24 metabolites detected by NMR
- Clinical info: age, gender, BMI, smoking, alcohol, diagnosis
- Added interactions between diagnosis and clinical variables

Characteristic (total n = 102) | Polyps (n = 44) | Healthy controls (n = 58)
Age, mean | 62.4 ± 6.3 | 59.7 ± 6.6
Male | 29 | 24
Female | 15 | 34
BMI, mean | 29.1 ± 5.6 | 27.9 ± 6.5
Ever smoked: yes | 22 | 25
Ever smoked: no | 22 | 33
Alcohol | 29 | 43
Little/no alcohol | 15 | 15

SLIDE 58

Analysis Flowchart

Build 24-metabolite SUR model --> Backward selection for covariates

$$M = b_0 + b_1G + b_2\,\mathrm{BMI} + b_3S + \cdots + b_pD_p$$

SLIDE 59

SUR Results on Covariates and Metabolites

Selected effect | p-value
Gender | 9.9e-08
BMI | 0.0023
BMI² | 0.15
Smoking | 0.56
Diagnosis | 0.39
Diagnosis × Gender | 0.045
Diagnosis × BMI | 0.012
Diagnosis × BMI² | 0.17
Diagnosis × Smoking | 0.0049

Metabolite | p-value
3-Hydroxybutyric acid | 0.59
Acetic acid | 0.98
Acetoacetate | 0.66
Acetone | 0.22
Alanine | 0.35
Choline/Phosphocholine | 0.78
Citric acid | 0.75
Creatinine | 0.46
Dimethylglycine | 0.91
Formate | 0.91
Glucose | 0.070
Glutamic acid | 0.84
Glutamine | 0.86
Glycine | 0.60
Histidine | 0.67
Isoleucine | 0.54
Lactate | 0.81
Lysine | 0.51
Phenylalanine | 0.26
Threonine | 0.78
Tyrosine | 0.55
Unsaturated lipids | 0.39
Urea | 0.098
Valine | 0.010

A number of clinical covariates significantly affect metabolite levels. However, when we take these clinical factors into account, the disease signal on individual metabolites is very small.

$$M = b_0 + b_1G + b_2\,\mathrm{BMI} + b_3S + \cdots + b_pD_p$$

SLIDE 60

SUR Results on Biological Groups

Biological group | p-value | Adjusted p-value
Group 1: acetate, glucose, lactate | 0.014 | 0.023
Group 2: isoleucine, valine | 0.0046 | 0.012
Group 3: alanine, glutamic acid, glutamine | 0.060 | 0.069
Group 4: creatinine, glutamine, urea | 0.0010 | 0.0050
Group 5: glutamic acid, histidine | 0.058 | 0.072
Group 6: acetoacetate, acetone, lactate | 0.33 | 0.33
Group 7: acetoacetate, citric acid, tyrosine | 0.0011 | 0.0041
Group 8: citric acid, formate, glutamic acid, glutamine | 0.23 | 0.25
Group 9: phenylalanine, tyrosine | 0.0021 | 0.0063
Group 10: alanine, glutamic acid, glutamine, glycine, histidine, isoleucine, lysine, phenylalanine, threonine, tyrosine, valine | 1.5e-07 | 2.3e-06
Group 11: alanine, citric acid, glucose, lactate | 0.021 | 0.032
Group 12: glycine, threonine | 0.0051 | 0.011
Group 13: alanine, glutamic acid, glycine, threonine | 0.031 | 0.042
Group 14: alanine, glutamic acid, glycine, isoleucine, threonine, valine | 1.2e-05 | 9.1e-05
Group 15: choline/phosphocholine, glycine, threonine | 0.0057 | 0.011

Combining the results of multiple metabolites did bring back some performance:
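The slide does not say how the group p-values were adjusted. The reported values appear consistent with a Benjamini-Hochberg-type correction over the 15 groups, p × 15 / rank, without the running-minimum step of the standard procedure (for example, group 5 is reported as 0.072 although group 3, with a larger raw p-value, is 0.069). This is an inference from the numbers, not a statement in the source. The sketch computes both variants; the function name is hypothetical.

```python
import numpy as np

def bh_adjust(pvals, enforce_monotone=True):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper).

    Each sorted p-value p_(i) is scaled to p_(i) * m / i. The standard BH procedure
    then enforces monotonicity (running minimum from the largest p-value down).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    if enforce_monotone:
        scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Raw p-values of groups 1-15 from the table above
group_p = [0.014, 0.0046, 0.060, 0.0010, 0.058, 0.33, 0.0011, 0.23,
           0.0021, 1.5e-07, 0.021, 0.0051, 0.031, 1.2e-05, 0.0057]
print(np.round(bh_adjust(group_p, enforce_monotone=False), 4))
print(np.round(bh_adjust(group_p), 4))
```

The two variants differ only for a few groups here, and the conclusions (which groups pass a 0.05 threshold) are the same.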

SLIDE 61

Modeling Metabolite Levels

Chen et al., Metabolomics, 2017

Using Targeted MS Data

SLIDE 62

Effect of SUR on Correlations

Urea cycle

SLIDE 63

New Approaches for Pathway Mapping

SLIDE 64

Conclusions

- Biomarker discovery, and especially validation, is very challenging. High-quality data is an important first step.
- Improved performance of global profiling is being achieved, and new sets of data quality metrics will be useful to assess these improvements.
- Broad-based targeted MS provides excellent reliability, which gives a realistic idea of putative biomarker strength.
- Advanced statistical methods (like SUR and other modeling approaches) may account for some confounding factors and provide a path for improving biomarker performance.

SLIDE 65

Biological Samples

- Metabolomics is a comparative science
- Human samples are challenging because they vary due to:
  - Gender
  - Age
  - Diet
  - Lifestyle, etc.
- Nevertheless, human studies are possible with careful design
- Animal samples can be controlled better
  - Sample numbers can be smaller (>5)
  - Genetic knock-outs, knock-ins; normal vs. disease
  - Paired studies are better
  - Also great for validating human discoveries
- Cell-based studies can be even more attractive
  - Provide an opportunity to probe fundamental biology
  - Of course, such studies are further from important applications

SLIDE 66

Sample Prep

Quench sample (-80 °C) --> Extract metabolites (MeOH, CHCl3, etc.) --> Dry/concentrate sample --> Add standards and run!

Challenge: Each sample prep method will emphasize some metabolites over others. You have to choose… It’s impossible to measure all metabolites with one sample prep type and one instrumental measurement. Most metabolites are still not identified by the MS or NMR instruments!