How We Handle Mass Spectra
NIST Mass Spectrometry Data Center

NIST/EPA/NIH Mass Spectral Library
[Figure: numbers of spectra (replicates and compounds, 0 to 200,000) by library release, '78 to '02; eras labeled Red Books, EPA, NIST.]

Libraries Distributed/Year
[Figure: libraries distributed per year (0 to 4500), '88 to '04.]

The Data
[Figure: 2D chemical structures with numbered atoms.]

Connection Table
[Figure: connection table for a four-atom structure containing Cl, with bonds marked S (single) or D (double).]

From Structure to Spectrum: A Mass “Fragmentogram”
[Figure: electron ionization of a molecule of mass = 140 u (M + e- gives M+ + 2e-), followed by fragmentation to ions of mass = 125 u and mass = 99 u.]

Molecular Fingerprints
[Figure: panels labeled VX, HD, GB.]

I will discuss
• Library Searching
  – Full and Partial Spectra
• Spectrum Purification
• Chemical Structure Representation
• Peptide Spectra Libraries

Instrument ‘Noise Signature’
250 hexachlorobenzene spectra, same instrument, calibration mix
[Figure: spectrum plot; bars show quartiles.]

Instrument Effects

Library Search
[Figure: spectrum comparison; labels: unknown, MF=93, sarin, MF=68.]

Spectral Similarity
$\mathrm{Similarity} = \dfrac{\sum M R}{\sqrt{\sum M^{2}}\,\sqrt{\sum R^{2}}}$
• M = f(Abundance), peak in measured spectrum
• R = f(Abundance), peak in reference spectrum
• Sum over all peaks
• f(Abundance)
  – Abundance
  – Abundance * m/z
  – Certainty
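A minimal sketch of the similarity measure on this slide, assuming it is a normalized dot product over all peaks and that the match factor is reported on a 0 to 100 scale. The function names, the dictionary representation of a spectrum, and the `scale` parameter are illustrative assumptions, not NIST's implementation.

```python
import math

def abundance(mz, a):
    """f(Abundance) = Abundance."""
    return a

def abundance_times_mz(mz, a):
    """f(Abundance) = Abundance * m/z (gives more weight to high-mass peaks)."""
    return a * mz

def match_factor(measured, reference, f=abundance, scale=100.0):
    """Normalized dot product of two spectra given as {m/z: abundance} dicts.

    The 0-100 scale is an assumption made for this sketch.
    """
    mz_values = sorted(set(measured) | set(reference))
    m = [f(mz, measured.get(mz, 0.0)) for mz in mz_values]
    r = [f(mz, reference.get(mz, 0.0)) for mz in mz_values]
    dot = sum(a * b for a, b in zip(m, r))
    norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in r))
    return scale * dot / norm if norm > 0 else 0.0

# Made-up spectra for illustration only.
unknown = {81: 10.0, 99: 100.0, 125: 30.0}
candidate = {81: 12.0, 99: 100.0, 125: 25.0, 43: 5.0}
print(round(match_factor(unknown, candidate), 1))
print(round(match_factor(unknown, candidate, f=abundance_times_mz), 1))
```

Passing a different `f` switches between the weighting options listed on the slide without touching the normalization.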

Algorithm Performance
12,592 replicate spectra against NIST Library (percent correct)

Model                    Top Hit   Top 2 Hits   Top 3 Hits
Correlation – Weighted   74.9      86.9         91.7
Correlation              72.9      85.9         90.8
Euclidean Distance       71.9      83.9         88.9
Absolute Distance        67.9      80.3         85.5
PBM - Published          64.7      78.4         84.8
Hites/Hertz/Biemann      64.4      77.2         83.2
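The "percent correct" columns can be read as top-k hit rates. The sketch below shows one way such a figure could be computed from ranked search results; the function name and the toy data are hypothetical, and the actual evaluation protocol is not described on the slide.

```python
def percent_correct_top_k(ranked_hits, true_ids, k):
    """Percentage of searches whose true compound appears in the first k hits.

    ranked_hits: one list of compound IDs per search, best match first.
    true_ids:    the correct compound ID for each search, same order.
    """
    correct = sum(1 for hits, truth in zip(ranked_hits, true_ids)
                  if truth in hits[:k])
    return 100.0 * correct / len(true_ids)

# Toy data: four searches whose correct answer is always "a".
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["x", "y", "a"]]
truth = ["a", "a", "a", "a"]
for k in (1, 2, 3):
    print(k, percent_correct_top_k(ranked, truth, k))
```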

FP/FN Above Given Match Factor for NIST Library Spectra
[Figure: fraction recovered (0.0 to 1.0) vs. match factor threshold (0 to 100); curves: false negatives (21,000 replicate spectra) and false positives (108,000 compounds).]

FP/FN Expanded View
[Figure: fraction recovered (0.0 to 0.8) vs. match factor (80 to 100); FN curves with m/z weighting and with no weighting; FP curve x 10,000.]
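One way to read the FP/FN curves: for a chosen match factor threshold, a false negative is a correct match scoring below the threshold, and a false positive is a wrong compound scoring at or above it. The sketch below assumes that definition; the function name and the sample scores are made up for illustration.

```python
def fp_fn_at_threshold(true_match_mfs, wrong_match_mfs, threshold):
    """Return (false-positive fraction, false-negative fraction) at a threshold.

    true_match_mfs:  match factors of replicate spectra against their own compound.
    wrong_match_mfs: match factors of spectra against other compounds.
    """
    fn = sum(mf < threshold for mf in true_match_mfs) / len(true_match_mfs)
    fp = sum(mf >= threshold for mf in wrong_match_mfs) / len(wrong_match_mfs)
    return fp, fn

true_scores = [95, 90, 85, 70]    # hypothetical
wrong_scores = [60, 40, 82, 10]   # hypothetical
for t in (0, 80, 100):
    print(t, fp_fn_at_threshold(true_scores, wrong_scores, t))
```

Raising the threshold trades false positives for false negatives, which is the trade-off the two curves display.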

FP Depends on Spectrum Uniqueness
[Figure: FP (0 to 200) vs. match factor (0 to 100) for decalin, decane, TMB, HCB, DMPB, sarin, malathion.]
HCB = hexachlorobenzene
DMPB = dimethylphenobarbital
TMB = 1,2,3-trimethylbenzene

Multiple Ion Monitoring
• What is it?
  – Use 2-5 major peaks in spectrum of target
• 10 to 100 times more sensitive
• What’s the problem?
  – Can match major target peaks with minor sample peaks
• What we have done:
  – Examine risk using library as source of potential false positive IDs

False Positive Risk vs Number of Peaks Used
[Figure 1: median FPP (FP/spectrum, log scale, 0.00001 to 1) vs. NP (number of peaks, 1 to 5); one curve per abundance ratio (biggest search peak / matching peak in FP): 1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128; an additional curve is labeled BMA.]
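A rough sketch of how a library could be used to estimate false positive risk for a set of monitored ions. It assumes a library compound counts as a potential false positive when it has peaks at every target m/z, the relative abundances among those peaks agree with the target within a tolerance, and the matching peaks are allowed to be as small as a given fraction of the compound's base peak. All names, the tolerance, and the toy library are hypothetical; the actual criteria used for the figure are not given on the slide.

```python
def is_potential_false_positive(target_peaks, spectrum,
                                min_fraction=1 / 16, ratio_tol=0.2):
    """target_peaks and spectrum are {m/z: abundance} dicts."""
    # Every monitored ion must be present in the candidate spectrum.
    if any(spectrum.get(mz, 0.0) <= 0.0 for mz in target_peaks):
        return False
    base = max(spectrum.values())
    biggest_mz = max(target_peaks, key=target_peaks.get)
    # Matching peaks may be minor, but not below min_fraction of the base peak.
    if spectrum[biggest_mz] / base < min_fraction:
        return False
    # Relative abundances among the monitored ions must agree with the target.
    for mz, a in target_peaks.items():
        expected = a / target_peaks[biggest_mz]
        observed = spectrum[mz] / spectrum[biggest_mz]
        if abs(observed - expected) > ratio_tol * expected:
            return False
    return True

def count_false_positives(target_peaks, library, **kwargs):
    """Number of library spectra that could be mistaken for the target."""
    return sum(is_potential_false_positive(target_peaks, s, **kwargs)
               for s in library.values())

target = {99: 100.0, 125: 30.0}              # hypothetical monitored ions
library = {
    "A": {43: 100.0, 99: 10.0, 125: 3.0},    # right ratios, minor peaks
    "B": {99: 100.0, 125: 80.0},             # wrong ratio
    "C": {43: 100.0, 99: 1.0, 125: 0.3},     # right ratios, very minor peaks
    "D": {57: 100.0, 99: 50.0},              # missing m/z 125
}
print(count_false_positives(target, library))
print(count_false_positives(target, library, min_fraction=1 / 128))
```

Loosening `min_fraction` admits more minor-peak matches, which mirrors the family of abundance-ratio curves in the figure.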

Mass Spectral Peak Occurrences are Correlated
[Figure: relative joint occurrence probability (0 to 100) vs. difference in peak position (m/z difference, 0 to 100); panels for small, medium, and big peaks.]

FP Observed and Computed (from individual peak probabilities)
[Figure: actual FP (log scale, 0.1 to 10000) vs. observed FPP percentile/10 (1 to 10); reference line: no peak correlation.]
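The comparison on this slide can be sketched as follows: if peak occurrences were independent, the expected number of false positives would be the library size times the product of the individual peak probabilities, and correlated peaks make the observed count differ from that. The function names and the toy library are assumptions for illustration.

```python
import math

def peak_probability(library, mz, min_abundance):
    """Fraction of library spectra with a peak at mz of at least min_abundance."""
    hits = sum(1 for s in library if s.get(mz, 0.0) >= min_abundance)
    return hits / len(library)

def computed_fp(library, peaks):
    """Expected false positives if peak occurrences were uncorrelated."""
    return len(library) * math.prod(
        peak_probability(library, mz, a) for mz, a in peaks.items())

def observed_fp(library, peaks):
    """Spectra that actually contain all of the requested peaks."""
    return sum(all(s.get(mz, 0.0) >= a for mz, a in peaks.items())
               for s in library)

# Toy library: m/z 91 and 65 always occur together.
toy_library = [
    {91: 100.0, 65: 20.0},
    {91: 100.0, 65: 15.0},
    {43: 100.0},
    {57: 100.0},
]
peaks = {91: 50.0, 65: 10.0}
print(computed_fp(toy_library, peaks), observed_fp(toy_library, peaks))
```

In the toy library the two peaks co-occur, so the observed count is higher than the independent-probability estimate.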

Search Results Depend on Search Spectrum Quality AMDIS: http://chemdata.nist.gov

Real Data
[Figure: total ion chromatogram; a mass spectrum (scan).]

Chromatogram with single ion

AMDIS Analysis of Data
[Figure: AMDIS result with the structure of the identified compound (contains O, P, F); AMDIS Match = 81.]

Order of Analysis
• Noise Analysis – find ‘Noise Factor’
• Find and quantify maximizing ions
• Combine to create ‘Model Peak’
• Use Model Peak shape (intensity vs time) to purify spectra
• Find best matching library spectrum

Derive Noise Factor
$\mathrm{Noise} = K_{\mathrm{noise}}\,\sqrt{\mathrm{Intensity}}$
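A minimal sketch of deriving a noise factor, assuming noise scales with the square root of intensity (as expected for ion-counting statistics) and that the factor is estimated from a signal-free, roughly constant segment. Using the mean absolute deviation as the noise estimate is an assumption of this sketch, as are the function names.

```python
import math

def noise_factor(segment):
    """Estimate K in noise = K * sqrt(intensity) from a flat data segment."""
    mean = sum(segment) / len(segment)
    if mean <= 0:
        return 0.0
    mean_abs_dev = sum(abs(x - mean) for x in segment) / len(segment)
    return mean_abs_dev / math.sqrt(mean)

def expected_noise(intensity, k):
    """Noise expected at a given intensity for noise factor k."""
    return k * math.sqrt(intensity)

flat_segment = [96.0, 104.0, 100.0, 100.0]   # hypothetical baseline readings
k = noise_factor(flat_segment)
print(k, expected_noise(400.0, k))
```

Once K is known, the expected noise at any intensity follows directly, which lets later steps judge whether a small peak is significant.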

Finding Possible Peaks for Each m/z
[Figure: single-ion trace vs. scan number; labels: maximum rate, scan n.]

Find Possible Compounds: Do Ions Maximize at Same Time?
[Figure: ion maxima placed on a tenth-of-a-scan grid (.0 to .9) between scans 1 and 2.]
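A sketch of the "do ions maximize at the same time?" test. It locates each ion's maximum to a fraction of a scan by fitting a parabola through the apex and its two neighbours, then groups ions whose maxima fall within a tolerance of each other. The parabolic interpolation, the tolerance value, and the function names are assumptions of this sketch, not a description of the AMDIS internals.

```python
def apex_position(trace):
    """Fractional scan index of the maximum of one ion chromatogram."""
    i = max(range(len(trace)), key=trace.__getitem__)
    if i == 0 or i == len(trace) - 1:
        return float(i)
    left, mid, right = trace[i - 1], trace[i], trace[i + 1]
    denom = left - 2 * mid + right
    if denom == 0:
        return float(i)
    # Vertex of the parabola through the three points around the apex.
    return i + 0.5 * (left - right) / denom

def coeluting_groups(chromatograms, tolerance=0.3):
    """Group m/z values whose maxima lie within `tolerance` scans of each other."""
    apexes = sorted((apex_position(t), mz) for mz, t in chromatograms.items())
    groups = []
    for pos, mz in apexes:
        if groups and pos - groups[-1]["last"] <= tolerance:
            groups[-1]["mz"].append(mz)
            groups[-1]["last"] = pos
        else:
            groups.append({"last": pos, "mz": [mz]})
    return [sorted(g["mz"]) for g in groups]

traces = {                      # hypothetical ion chromatograms
    99: [0, 10, 100, 10, 0],
    125: [0, 5, 50, 5, 0],
    81: [0, 20, 100, 40, 0],
    43: [0, 0, 10, 100, 10],
}
print(coeluting_groups(traces))
```

Ions in the same group are candidates for belonging to one component; ions that maximize elsewhere are left to other components.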

Separate the Components
[Figure: ion abundances at their maxima on the tenth-of-a-scan grid (.0 to .9); candidate ions are marked yes or NO.]

A ‘Model Peak’ Provides Shape
The model shape is defined as the sum of all of the ion chromatograms that maximize within the range and have a sharpness value within 75% of the maximum.
[Figure: ion abundances at their maxima on the tenth-of-a-scan grid (.0 to .9); candidate ions are marked yes or NO.]
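The model-peak definition can be sketched as below. The slide does not say how sharpness is computed, so the `sharpness` function here is a stand-in (relative drop from the apex over a few scans), and the function names, `half_width`, and the toy traces are all hypothetical. Only the selection rule (maximize within the range, sharpness at least 75% of the maximum, then sum) comes from the slide.

```python
def apex_scan(trace):
    """Index of the largest value in one ion chromatogram."""
    return max(range(len(trace)), key=trace.__getitem__)

def sharpness(trace, apex, half_width=2):
    """Stand-in sharpness: average relative drop per scan on both sides of the apex."""
    if trace[apex] <= 0:
        return 0.0
    left = trace[max(apex - half_width, 0)]
    right = trace[min(apex + half_width, len(trace) - 1)]
    return ((trace[apex] - left) + (trace[apex] - right)) / (2 * half_width * trace[apex])

def model_peak(chromatograms, scan_range, threshold=0.75):
    """Sum the ion chromatograms that maximize in scan_range and are sharp enough."""
    lo, hi = scan_range
    candidates = {}
    for mz, trace in chromatograms.items():
        apex = apex_scan(trace)
        if lo <= apex <= hi:
            candidates[mz] = sharpness(trace, apex)
    if not candidates:
        return [], []
    cutoff = threshold * max(candidates.values())
    chosen = sorted(mz for mz, s in candidates.items() if s >= cutoff)
    n_scans = len(next(iter(chromatograms.values())))
    shape = [sum(chromatograms[mz][i] for mz in chosen) for i in range(n_scans)]
    return chosen, shape

ion_traces = {                                   # hypothetical data
    99: [0, 10, 50, 100, 50, 10, 0],             # sharp, maximizes at scan 3
    125: [0, 5, 25, 50, 25, 5, 0],               # sharp, maximizes at scan 3
    43: [20, 30, 40, 50, 40, 30, 20],            # broad background ion
    81: [0, 0, 0, 10, 50, 100, 50],              # maximizes outside the range
}
chosen, shape = model_peak(ion_traces, scan_range=(2, 4))
print(chosen, shape)
```

The resulting shape (intensity vs. time) is what the later purification step would fit to each ion to decide how much of its signal belongs to the component.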

AMDIS Testing – Closely Eluting Components

Representing Chemical Identity
• Visual: 2D Structure
• Text: IUPAC Name
• Digital: No Accepted, Open Method
• Solution: The IUPAC/NIST Chemical Identifier

Connection Table
[Figure: connection table for a four-atom structure containing Cl, with bonds marked S (single) or D (double).]

Chemical Identity Problems
[Figure: structures drawn with H3C and CH3 groups.]
Registry Number possible for each exact form, mixture, unknown, unspecified
Experts required
Expensive, ambiguous and error prone

Requirements
• Different compounds have different identifiers
  – Keep all distinguishing structural information
[Figure: two distinct structures, labeled IChI-1 and IChI-2.]

Requirements
• One compound has only one identifier
  – Omit unnecessary information
[Figure: several drawings of one structure containing N and O atoms, all giving the same INChI.]

3 Steps to INChI
• Chemistry – ‘Normalize’ Input Structure
  • Implement chemical rules
• Math – ‘Canonicalize’ (label the atoms)
  • Equivalent atoms get the same label
• Format – ‘Serialize’ Labeled Structure
  • Output as character string (‘name’)

“Layers”
Chemical Substances
[Figure: layers: formula, connectivity, stereo, isotope.]

Nitrobenzene
[Figure: structure with canonical numbering of atoms 1 to 9.]

Layers         Description
formula        C6H5NO2
connectivity   8-7(9)6-4-2-1-3-5-6
H-atoms        1-5H
charges

MSG
[Figure: structure with canonical numbering of atoms 1 to 10, plus Na.]

Layers         Description
formula        C5H8NO4.Na
connectivity   6-3(5(9)10)1-2-4(7)8;
H-atoms        1-2H2,3H,6H2(H-,7,8,9,10);
stereo sp3     3-;
charges        -1;+1

C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;/h1-2H2,3H,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3-;/m1./s1
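Because the identifier is a layered string, its parts can be pulled apart with plain string handling. The sketch below splits the MSG string from this slide on "/" and keys each layer by its one-letter prefix; the function name and the handling of an optional "InChI=" prefix are assumptions, and no chemistry toolkit is involved.

```python
def split_layers(identifier):
    """Split a layered identifier string into {prefix: layer text}.

    The first segment (no prefix letter) is stored under "formula".
    """
    if identifier.startswith("InChI="):
        # Drop a leading version segment such as "InChI=1" if present.
        identifier = identifier.split("/", 1)[1]
    formula, *rest = identifier.split("/")
    layers = {"formula": formula}
    for part in rest:
        if part:                      # ignore empty segments
            layers[part[0]] = part[1:]
    return layers

msg = ("C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;"
       "/h1-2H2,3H,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3-;/m1./s1")
for prefix, text in split_layers(msg).items():
    print(prefix, text)
```

In this string, "c" is the connectivity layer, "h" the H-atoms layer, "q" the charges, and "t" the sp3 stereo layer listed in the table.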

INChI Test Version
[Screenshot: input and result panes; options: Mobile H on/off, include Org-Metal bonds.]

Peptide Mass Spectra: Libraries for Organisms
• Proteins are linear sequences of amino acids – characteristic of Genome (organism)
• Peptides are ‘digested’ fragments of proteins
• MS ‘sequences’ peptides to reveal source Protein
• Peptide fragmentation spectra are not quite predictable
• Peptide fragmentation spectra for a ‘genome’ can be contained in one Library.

Spectrum Prediction Programs

Peptide Spectra Reference Library
(multiple measurements each of 10,000 peptides)
[Figure: spectra of peptide HLQLAIR/2+.]
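As a small worked example for the library entry label, the sketch below computes the precursor m/z of HLQLAIR, assuming "/2+" denotes a doubly protonated ion and an unmodified peptide. It uses standard monoisotopic residue masses; the function name is hypothetical.

```python
# Monoisotopic residue masses (u) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # H2O added for the free N- and C-termini
PROTON = 1.007276    # mass of one proton

def precursor_mz(sequence, charge):
    """m/z of an unmodified peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge

print(round(precursor_mz("HLQLAIR", 2), 4))
```

A library keyed by sequence and charge state could use this value to pre-filter candidate entries before comparing fragmentation spectra.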

MS Mapped to the Genome From Eric Deutsch, ISB, 6/2004
