Metabolite Identification via Machine Learning Huibin Shen

Metabolite Identification via Machine Learning
Huibin Shen
Department of Information and Computer Science
Aalto University
February 7, 2013

Outline
1. Introduction
2. Fingerprints prediction
3. Database matching
4. Result
5. Conclusion

General picture
What is metabolite identification?
Figure 1: Metabolomics pipeline towards a systems biology approach: from the whole metabolome to identified metabolites [M. Sofia, 2007].

Standard computational method
Matching against a reference spectral database.
Problems:
- Quality of data.
- Seldom public.
- Limited number.
- Diversity of mass spectrometers.
- Similarity definition.

Molecular fingerprint
Figure 2: Representation of a molecular substructure fingerprint with a substructure fingerprint dictionary of given substructure patterns. The molecule is represented as a series of binary bits that encode the presence or absence of particular substructures in the molecule [D.S. Cao, 2012].
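The idea in Figure 2 can be sketched in a few lines of Python. This is a toy illustration, not the fingerprint software used in the talk: the dictionary `SUBSTRUCTURES` and the description of a molecule as a set of substructure labels are hypothetical. A real pipeline would detect substructures from the chemical structure with a cheminformatics toolkit.

```python
# Hypothetical substructure dictionary: bit i answers "is pattern i present?"
SUBSTRUCTURES = ["hydroxyl", "carboxyl", "amine", "benzene ring", "carbonyl"]


def fingerprint(present_substructures):
    """Return a list of 0/1 bits, one per entry in SUBSTRUCTURES."""
    present = set(present_substructures)
    return [1 if name in present else 0 for name in SUBSTRUCTURES]


# Toy molecule described only by the substructures it contains (assumed input).
molecule = {"carboxyl", "amine"}
bits = fingerprint(molecule)
print(bits)  # [0, 1, 1, 0, 0]
```

Each bit is then a separate binary prediction target, which is what makes the fingerprint usable as an intermediate representation between spectrum and molecule.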

Machine learning method
We propose a new framework to identify metabolites through machine learning:
Figure 3: Overview of the two-step metabolite identification framework.

Support Vector Machine (SVM)
SVM is a supervised machine learning method for classification and regression.
Figure 4: Three-dimensional case for SVM [1].
[1] Figure from http://www.dtreg.com/svm.htm

Kernels for mass spectra
- Feature mapping ≈ kernel function.
- Three basic features and their combination.
- Two families of kernels: integral mass kernel and probability product kernel.

Integral mass kernels
$k(x, x') = \langle x, x' \rangle$
[Figure: example molecule and its MS/MS spectra at collision energies 10 eV, 20 eV and 30 eV; x-axis m/z (0 to 200), y-axis intensity (0 to 1); labelled peaks at m/z 73, 117.0, 145.1, 169.3 and 187.4.]
Figure 5: Three basic features and integral mass kernel.
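A minimal sketch of an inner-product kernel on integer-mass bins follows. The slide gives only the formula, so the details here are assumptions: peaks are binned to the nearest integer m/z, intensities in the same bin are summed, and the function names and toy intensities are hypothetical.

```python
import math
from collections import defaultdict


def integral_mass_features(peaks):
    """Map a spectrum, given as (m/z, intensity) pairs, to a sparse vector
    indexed by integer mass. Intensities falling in the same bin are summed."""
    features = defaultdict(float)
    for mz, intensity in peaks:
        # Nearest integer mass (assumed binning rule).
        features[math.floor(mz + 0.5)] += intensity
    return dict(features)


def integral_mass_kernel(peaks_a, peaks_b):
    """Linear kernel k(x, x') = <x, x'> on the integer-mass feature vectors."""
    fa = integral_mass_features(peaks_a)
    fb = integral_mass_features(peaks_b)
    return sum(value * fb[mass] for mass, value in fa.items() if mass in fb)


# Toy spectra: the intensities are made up for illustration.
spectrum_a = [(145.1, 1.0), (117.0, 0.4), (73.0, 0.1)]
spectrum_b = [(145.3, 0.8), (117.2, 0.5), (187.4, 0.1)]
print(integral_mass_kernel(spectrum_a, spectrum_b))
```

Only bins occupied in both spectra contribute, so peaks that differ by less than the bin width still match while small mass errors inside a bin are ignored.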

Probability product kernel
$k(x, x') = k_{\mathrm{prob}}(p(x), p'(x')) = \int_{\mathcal{X}} p(x)\, p'(x')\, dx$
Figure 6: Probability product kernel.
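One way to make the integral concrete is sketched below. The slide does not state how the densities are built, so this example assumes each spectrum is an equally weighted mixture of Gaussians with a shared standard deviation `sigma`, one centred on each peak mass. Under that assumption the integral of the product of two Gaussian components has a closed form, and the kernel is the average over all peak pairs. The function names and the default `sigma` are hypothetical.

```python
import math


def gaussian_overlap(a, b, sigma):
    """Closed form of the integral over x of N(x; a, sigma^2) * N(x; b, sigma^2)."""
    return math.exp(-((a - b) ** 2) / (4.0 * sigma**2)) / math.sqrt(4.0 * math.pi * sigma**2)


def probability_product_kernel(masses_a, masses_b, sigma=0.1):
    """Kernel between two spectra, each modelled as an equally weighted
    mixture of Gaussians centred on its peak masses (an assumption)."""
    total = sum(gaussian_overlap(a, b, sigma) for a in masses_a for b in masses_b)
    return total / (len(masses_a) * len(masses_b))


# Toy peak lists: nearby peaks contribute strongly, distant ones almost nothing.
print(probability_product_kernel([145.1, 117.0], [145.15, 73.0]))
```

Unlike integer binning, this kernel decays smoothly with the mass difference, so it can exploit high-resolution measurements.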

Scoring
Given the cross-validation accuracy $p = (p_i)_{i=1}^{m} \in \mathbb{R}^m$ over $m$ fingerprints, the similarity score between two fingerprints $y = (y_i)_{i=1}^{m}$ and $y^*$ is:
$$p(y \mid p, y^*) = \prod_{i=1}^{m} p_i^{\,1 - |y_i - y_i^*|} \, (1 - p_i)^{|y_i - y_i^*|}.$$
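The score can be sketched in Python as follows. Each bit contributes $p_i$ when the two fingerprints agree and $1 - p_i$ when they differ. The sketch works in log space to avoid underflow for long fingerprints, which leaves the ranking unchanged; it assumes every accuracy lies strictly between 0 and 1. The function names and the toy candidates are hypothetical.

```python
import math


def fingerprint_log_score(y, y_star, accuracy):
    """Logarithm of the score: bit i adds log(p_i) when the bits agree
    and log(1 - p_i) when they differ. Assumes 0 < p_i < 1."""
    score = 0.0
    for y_i, y_star_i, p_i in zip(y, y_star, accuracy):
        score += math.log(p_i) if y_i == y_star_i else math.log(1.0 - p_i)
    return score


def rank_candidates(predicted, candidates, accuracy):
    """Sort (name, fingerprint) candidates from best to worst score."""
    return sorted(
        candidates,
        key=lambda c: fingerprint_log_score(c[1], predicted, accuracy),
        reverse=True,
    )


# Toy example with three fingerprint bits and made-up accuracies.
accuracy = [0.9, 0.8, 0.6]
predicted = [1, 0, 1]
candidates = [("C", [0, 0, 0]), ("A", [1, 0, 1]), ("B", [1, 1, 1])]
ranking = [name for name, _ in rank_candidates(predicted, candidates, accuracy)]
print(ranking)  # ['A', 'B', 'C']
```

A mismatch on a reliably predicted bit (high $p_i$) costs more than a mismatch on an unreliable one, which is why candidate B, differing only on the second bit, still ranks above C.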

Experiments data
A summary of the datasets is listed in this table.

Data    Device                Size  Mode  Mass error  Std    Fingerprints
QqQ     misc                  514   Pos                      286
        - API3000             410   Pos   0.128       0.164
        - QuattroPremier XE   82    Pos   -0.092      0.073
        - TSQ 7000            17    Pos   -0.124      0.036
        - TSQ Quantum AM      3     Pos
        - Q-Trap              2     Pos
Ltq     LTQ Orbitrap XL       293   Pos   0.0         0.049  128
Lipids  LTQ Orbitrap          403   Neg   -0.135      0.090  20

Table 1: The dataset statistics. Only a subset of fingerprints are exhibited in each dataset’s molecules.

Fingerprint prediction
We show the prediction accuracies for the Ltq dataset.
[Figure: accuracy (0.5 to 1.0) per fingerprint (1 to 128); legend: integral mass kernel, high resolution mass kernel.]
Figure 7: Light grey is the improvement of the integral mass kernel over the default classifier. Dark grey is the improvement of the probability product kernel over the integral mass kernel.

Feature selection
We show the effect of different features.
[Figure: mean accuracy (88 to 94) against mean F1 (40 to 65) for the feature sets peaks, nloss, diff, peaks+nloss, peaks+diff and full; legend: integral mass kernel, high resolution mass kernel.]
Figure 8: Scatter plot of the aggregate average accuracy/F1 across three datasets. The non-filled marks represent a higher accuracy/F1 ratio in the quadratic kernel.
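Accuracy and F1 can disagree sharply when a fingerprint bit is rarely set, which is one reason to look at both. The sketch below uses the standard definitions for a single binary fingerprint; the function name and the toy labels are hypothetical and not taken from the experiments.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for one binary fingerprint over many molecules."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denominator = 2 * tp + fp + fn
    # F1 is defined as 0 here when there are no positives at all.
    f1 = 2 * tp / denominator if denominator > 0 else 0.0
    return accuracy, f1


# A rare substructure: 2 of 10 molecules have it, and one is missed.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(accuracy, f1)
```

Here nine of ten predictions are correct, yet F1 is only two thirds because half of the true positives are missed.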

Experiments data (for CASMI challenge)
MS2 spectra are used to train the model and MS1 spectra are used for comparing the results of isotopic pattern matching.

MS type  Instrument type   Size  No. of Mol  Fingerprints
MS2      APCI-ITFT-CID     295   65          179
         APCI-ITFT-HCD     882   86          181
         LC-ESI-ITFT-CID   447   244         281
         LC-ESI-ITFT-HCD   2655  225         281
         LC-ESI-QTOF-CID   1027  523         290
MS1      LC-ESI-ITFT       41    41
         LC-ESI-QTOF       62    62

Table 2: The dataset statistics. Only a subset of fingerprints are exhibited in each dataset’s molecules.
