
How We Handle Mass Spectra: NIST Mass Spectrometry Data Center



  1. How We Handle Mass Spectra NIST Mass Spectrometry Data Center

  2. NIST/EPA/NIH Mass Spectral Library: Numbers of Spectra [chart: compounds and replicates, 0 to 200,000 spectra, by year from '78 to '02; eras labeled Red Books, EPA, NIST]

  3. Libraries Distributed/Year [chart: 0 to 4,500 libraries per year, '88 to '04]

  4. The Data [figure: chemical structure with numbered atoms]

  5. Connection Table [figure: small structure containing Cl with atoms numbered 1 to 4, and its connection table with bonds labeled S and D]
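A connection table lists numbered atoms and the bonds between them. The Python sketch below shows that data structure. The class name, its methods, and the vinyl chloride example are all hypothetical. Reading the slide's "S" and "D" labels as single and double bonds is our assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Hypothetical connection table: numbered atoms plus a bond list."""
    atoms: dict[int, str] = field(default_factory=dict)              # atom number -> element
    bonds: list[tuple[int, int, str]] = field(default_factory=list)  # (atom i, atom j, "S" or "D")

    def add_atom(self, number: int, element: str) -> None:
        self.atoms[number] = element

    def add_bond(self, i: int, j: int, kind: str = "S") -> None:
        if i not in self.atoms or j not in self.atoms:
            raise KeyError("add both atoms before bonding them")
        self.bonds.append((i, j, kind))

    def neighbors(self, number: int) -> list[int]:
        # Each bond is stored once, so check both ends.
        return sorted(j if i == number else i for i, j, _ in self.bonds if number in (i, j))


# Vinyl chloride, CH2=CH-Cl, heavy atoms only (hypothetical example).
ct = ConnectionTable()
ct.add_atom(1, "C")
ct.add_atom(2, "C")
ct.add_atom(3, "Cl")
ct.add_bond(1, 2, "D")
ct.add_bond(2, 3, "S")
print(ct.neighbors(2))  # [1, 3]
```

Storing each bond once keeps the table compact. The cost is that a neighbor lookup has to scan both ends of every bond.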

  6. From Structure to Spectrum: A Mass "Fragmentogram" [figure: electron ionization of sarin (e⁻ in, 2e⁻ out; mass = 140 u) and its fragment ions of mass = 125 u and mass = 99 u]
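The 140 u and 125 u labels can be checked with nominal (integer) isotope masses. This sketch sums them for sarin, C4H10FO2P. We assign the 125 u ion to loss of a methyl group (15 u) only because 140 − 15 = 125; the 99 u ion is not modeled.

```python
# Nominal masses (u) of the most abundant isotope of each element used here.
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "F": 19, "P": 31, "S": 32, "Cl": 35}


def nominal_mass(formula: dict[str, int]) -> int:
    """Sum nominal masses for a formula given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


sarin = {"C": 4, "H": 10, "F": 1, "O": 2, "P": 1}   # C4H10FO2P
methyl = {"C": 1, "H": 3}

m = nominal_mass(sarin)
print(m)                          # 140
print(m - nominal_mass(methyl))   # 125, assuming loss of CH3
```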

  7. Molecular Fingerprints [figure: VX, HD and GB]

  8. I will discuss:
      • Library searching: full and partial spectra
      • Spectrum purification
      • Chemical structure representation
      • Peptide spectra libraries

  9. Instrument 'Noise Signature' [chart: 250 hexachlorobenzene spectra from the same instrument (calibration mix); x axis 0 to 300, y axis 0 to 1000; bars show quartiles]

  10. Instrument Effects

  11. Library Search [figure: unknown spectrum, MF = 93; sarin, MF = 68]

  12. Spectral Similarity

$$\mathrm{Similarity} = \frac{\left(\sum M R\right)^{2}}{\sum M^{2}\,\sum R^{2}}$$

      • M = f(Abundance) of a peak in the measured spectrum
      • R = f(Abundance) of a peak in the reference spectrum
      • Sums run over all peaks
      • f(Abundance) can be:
        – Abundance
        – Abundance × m/z
        – Certainty
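One common form of this kind of similarity is the squared cosine, (ΣMR)² / (ΣM² ΣR²). The sketch below assumes that form. It supports two of the listed choices of f: plain abundance, and abundance × m/z. The "certainty" weighting is not modeled. The function name and the {m/z: abundance} spectrum format are our own.

```python
def similarity(measured: dict[int, float], reference: dict[int, float],
               weight_by_mz: bool = False) -> float:
    """Squared-cosine similarity of two spectra given as {m/z: abundance}.

    M and R are f(abundance): the abundance itself, or abundance * m/z when
    weight_by_mz is True. A peak missing from one spectrum counts as zero.
    Returns a value in [0, 1].
    """
    def f(mz: int, abundance: float) -> float:
        return abundance * mz if weight_by_mz else abundance

    peaks = set(measured) | set(reference)          # sum over all peaks
    m = {mz: f(mz, measured.get(mz, 0.0)) for mz in peaks}
    r = {mz: f(mz, reference.get(mz, 0.0)) for mz in peaks}
    num = sum(m[mz] * r[mz] for mz in peaks) ** 2
    den = sum(v * v for v in m.values()) * sum(v * v for v in r.values())
    return num / den if den else 0.0


unknown = {50: 100.0, 51: 20.0}
print(similarity(unknown, {50: 100.0, 51: 20.0}))   # identical spectra
print(similarity(unknown, {60: 100.0}))             # no shared peaks
```

The score ignores overall intensity scale, so a spectrum and a scaled copy of it still score 1.0. Multiplying the score by 100 puts it on a 0 to 100 range like the match factors in these slides; whether the original used that scaling is an assumption.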

  13. Algorithm Performance: 12,592 replicate spectra searched against the NIST library (percent correct)

| Model                 | Top hit | Top 2 hits | Top 3 hits |
|-----------------------|---------|------------|------------|
| Correlation, weighted | 74.9    | 86.9       | 91.7       |
| Correlation           | 72.9    | 85.9       | 90.8       |
| Euclidean distance    | 71.9    | 83.9       | 88.9       |
| Absolute distance     | 67.9    | 80.3       | 85.5       |
| PBM, published        | 64.7    | 78.4       | 84.8       |
| Hites/Hertz/Biemann   | 64.4    | 77.2       | 83.2       |
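The "percent correct" columns count searches whose correct compound appears among the first k hits. This small sketch computes that figure. The hit lists and compound names are hypothetical and are not data from the table.

```python
def top_k_percent_correct(ranked_hits: list[list[str]], truths: list[str], k: int) -> float:
    """Percent of searches whose correct compound is among the first k hits."""
    if len(ranked_hits) != len(truths):
        raise ValueError("need one ranked hit list per query")
    correct = sum(truth in hits[:k] for hits, truth in zip(ranked_hits, truths))
    return 100.0 * correct / len(truths)


# Hypothetical ranked hit lists for three replicate spectra.
hits = [["sarin", "soman"], ["decane", "decalin"], ["TMB", "HCB"]]
truth = ["sarin", "decalin", "HCB"]
for k in (1, 2):
    print(k, top_k_percent_correct(hits, truth, k))
```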

  14. FP/FN Above a Given Match Factor for NIST Library Spectra [chart: fraction recovered (0 to 1.0) vs match factor threshold (0 to 100); false negatives from 21,000 replicate spectra, false positives from 108,000 compounds]
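Curves like these come from sweeping a match factor threshold. The sketch below assumes these definitions: a match is reported when its score is at or above the threshold; a false negative is a correct (replicate) match scoring below it; a false positive is a match to a different compound scoring at or above it. The scores are hypothetical.

```python
def fp_fn_fractions(true_scores, false_scores, threshold):
    """Return (false-negative fraction, false-positive fraction) at a threshold.

    true_scores: match factors of correct (replicate) matches.
    false_scores: match factors of matches to different compounds.
    """
    fn = sum(s < threshold for s in true_scores) / len(true_scores)
    fp = sum(s >= threshold for s in false_scores) / len(false_scores)
    return fn, fp


replicate_mf = [95, 90, 70, 88]   # hypothetical scores of correct matches
other_mf = [40, 82, 60, 30, 10]   # hypothetical scores of wrong compounds
for t in (60, 80, 90):
    print(t, fp_fn_fractions(replicate_mf, other_mf, t))
```

Raising the threshold moves errors from false positives to false negatives, which is the trade-off the chart plots.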

  15. FP/FN Expanded View [chart: fraction recovered vs match factor (80 to 100), with m/z weighting and with no weighting; FN, and FP scaled ×10,000]

  16. FP Depends on Spectrum Uniqueness [chart: FP (0 to 200) vs match factor (0 to 100) for decalin, decane, TMB, HCB, DMPB, sarin and malathion]
      HCB = hexachlorobenzene; DMPB = dimethylpenobarbital; TMB = 1,2,3-trimethylbenzene

  17. Multiple Ion Monitoring
      • What is it?
        – Uses 2 to 5 major peaks in the spectrum of the target
        – 10 to 100 times more sensitive
      • What's the problem?
        – Major target peaks can match minor peaks in the sample
      • What we have done:
        – Examined the risk, using the library as a source of potential false-positive IDs
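Using the library as a source of potential false positives can be sketched as a scan over every library spectrum: does it contain all the monitored ions, even as minor peaks? The function name, the {name: {m/z: relative abundance}} format, the 1% presence cutoff, and the three-compound library are all hypothetical.

```python
def mim_false_positives(target_ions, library, exclude=None, min_rel_abundance=1.0):
    """List library spectra (other than `exclude`) containing every monitored ion.

    target_ions: m/z values monitored for the target (its 2 to 5 major peaks).
    library: {name: {m/z: relative abundance, 0-100}}.
    min_rel_abundance: smallest relative abundance counted as present
    (an assumed cutoff). Minor peaks count, which is the source of the risk.
    """
    hits = []
    for name, spectrum in library.items():
        if name == exclude:
            continue
        if all(spectrum.get(mz, 0.0) >= min_rel_abundance for mz in target_ions):
            hits.append(name)
    return hits


library = {
    "target": {99: 100.0, 125: 60.0, 81: 20.0},
    "cpd_A": {43: 100.0, 81: 40.0, 99: 5.0, 125: 2.0},   # target ions present as minor peaks
    "cpd_B": {43: 50.0, 99: 100.0},
}
print(mim_false_positives([99, 125, 81], library, exclude="target"))
print(mim_false_positives([99, 125, 81], library, exclude="target", min_rel_abundance=3.0))
```

Tightening the presence cutoff, or monitoring more ions, shrinks the list of candidate false positives.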

  18. False Positive Risk vs Number of Peaks Used [Figure 1. Median FPP vs. NP: FP/spectrum (median, 0.00001 to 1, log scale) vs number of peaks (1 to 5); curves for abundance ratio (biggest search peak / matching peak in FP) of 1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, and BMA]

  19. Mass Spectral Peak Occurrences Are Correlated [chart: relative probabilities of joint occurrence (0 to 100) vs difference in peak position (m/z difference, 0 to 100), for small, medium and big peaks]

  20. FP Observed and Computed (from individual peak probabilities) [chart: actual FP (0.1 to 10,000, log scale) vs observed FPP percentile (1 to 10); series "No Peak Correlation" and "FP Percentile/10"]
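Computing false positives "from individual peak probabilities" implies an independence assumption: the expected number of spectra holding all search peaks is N × Πpᵢ. This sketch compares that estimate with a direct count. The 10-spectrum library is hypothetical, built so that m/z 43 and 57 always occur together.

```python
import math


def peak_probabilities(library, peaks):
    """Fraction of library spectra (sets of m/z) containing each peak."""
    n = len(library)
    return [sum(mz in spectrum for spectrum in library) / n for mz in peaks]


def computed_fp(library, peaks):
    """Expected matches if peaks occurred independently: N * prod(p_i)."""
    return len(library) * math.prod(peak_probabilities(library, peaks))


def observed_fp(library, peaks):
    """Number of spectra that actually contain every peak."""
    return sum(all(mz in spectrum for mz in peaks) for spectrum in library)


# Hypothetical library: m/z 43 and 57 are perfectly correlated.
library = [{43, 57}, {43, 57}, {43, 57}, {43, 57, 71}, {77},
           {91}, {105}, {91, 105}, {51, 77}, {39}]
print(observed_fp(library, [43, 57]))   # 4 spectra
print(computed_fp(library, [43, 57]))   # about 1.6 under independence
```

With correlated peaks the independence estimate falls short of the real count. That gap is why peak correlation matters for this kind of risk estimate.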

  21. Search Results Depend on Search Spectrum Quality (AMDIS: http://chemdata.nist.gov)

  22. Real Data [figure: total ion chromatogram and a mass spectrum (scan)]

  23. Chromatogram with single ion

  24. AMDIS Analysis of Data [figure: structure containing O, P and F; AMDIS match = 81]

  25. Order of Analysis
      • Noise analysis: find the 'noise factor'
      • Find and quantify maximizing ions
      • Combine them to create a 'model peak'
      • Use the model peak shape (intensity vs time) to purify spectra
      • Find the best-matching library spectrum
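The "purify spectra" step can be illustrated with a single-component least-squares fit. Each ion chromatogram y is scaled to the model peak shape s with a = Σys / Σs². The purified abundance of that ion is then a times the model apex. This is a simplified stand-in chosen by us, not AMDIS's actual procedure; the function name and data are hypothetical.

```python
def purify_spectrum(model_shape, ion_chromatograms):
    """Estimate each ion's contribution from the component with this shape.

    model_shape: intensities vs scan for the model peak.
    ion_chromatograms: {m/z: intensities over the same scans}.
    The least-squares scale minimizing sum((y - a*s)^2) is
    a = sum(y*s) / sum(s*s); negative scales are clipped to zero.
    Returns {m/z: estimated abundance at the model-peak apex}.
    """
    ss = sum(s * s for s in model_shape)
    apex = max(model_shape)
    spectrum = {}
    for mz, y in ion_chromatograms.items():
        a = sum(yi * si for yi, si in zip(y, model_shape)) / ss
        spectrum[mz] = max(a, 0.0) * apex
    return spectrum


model = [0.0, 1.0, 4.0, 1.0, 0.0]
ions = {
    50: [0.0, 2.0, 8.0, 2.0, 0.0],   # same shape, twice as tall
    60: [0.0, 0.5, 2.0, 0.5, 0.0],   # same shape, half as tall
    80: [6.0, 2.0, 0.0, 0.0, 0.0],   # peaks earlier: mostly rejected
}
print(purify_spectrum(model, ions))
```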

  26. Derive Noise Factor [chart: noise vs intensity]

$$\mathrm{Noise} = K_{\mathrm{noise}}\sqrt{\mathrm{Intensity}}$$
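Suppose noise grows as the square root of intensity, Noise = K_noise × √Intensity; that is our reading of this slide. K_noise can then be estimated as the median of |deviation| / √intensity over measured points. The function name, the use of the median, and the sample numbers are assumptions. How the deviations are obtained (for example, by differences from a smoothed signal) is left open.

```python
import math
import statistics


def noise_factor(intensities, deviations):
    """Estimate K_noise in Noise = K_noise * sqrt(Intensity).

    Uses the median of |deviation| / sqrt(intensity) over points with
    positive intensity, so a few outliers do not dominate.
    """
    ratios = [abs(d) / math.sqrt(i) for i, d in zip(intensities, deviations) if i > 0]
    if not ratios:
        raise ValueError("need at least one positive intensity")
    return statistics.median(ratios)


k = noise_factor([100.0, 400.0, 900.0, 1600.0], [2.0, -4.0, 6.0, 8.0])
print(k)                        # 0.2
print(k * math.sqrt(2500.0))    # expected noise at intensity 2500
```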

  27. Finding Possible Peaks for Each m/z [chart: ion signal vs scan number n, with "maximum rate" marked]
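The slide does not spell out its peak criterion. As a stand-in, this sketch flags simple local maxima in one ion chromatogram. A flat top is reported once, at its first scan. The function name and the `min_intensity` cutoff are hypothetical.

```python
def ion_maxima(chromatogram, min_intensity=0.0):
    """Scan indices n where an ion chromatogram has a local maximum.

    Scan n qualifies if its intensity exceeds min_intensity, is greater
    than scan n-1, and is at least scan n+1.
    """
    hits = []
    for n in range(1, len(chromatogram) - 1):
        y = chromatogram[n]
        if y > min_intensity and y > chromatogram[n - 1] and y >= chromatogram[n + 1]:
            hits.append(n)
    return hits


trace = [0, 1, 5, 2, 0, 3, 8, 3, 0]
print(ion_maxima(trace))                    # [2, 6]
print(ion_maxima(trace, min_intensity=6))   # [6]
```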

  28. Find Possible Compounds: Do Ions Maximize at the Same Time?
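To ask whether ions maximize at the same time, pool every (scan, m/z) maximum. Then group those that fall within a small scan tolerance. In this sketch a new group starts whenever a maximum lies more than `tolerance` scans after the first maximum of the current group. The grouping rule, the function name, and the data are hypothetical.

```python
def group_by_apex(maxima, tolerance=1):
    """Group ions whose maxima fall close together in time.

    maxima: {m/z: [scan indices of maxima]}.
    Returns a list of (first scan of group, [m/z, ...]).
    """
    events = sorted((scan, mz) for mz, scans in maxima.items() for scan in scans)
    groups = []
    for scan, mz in events:
        if groups and scan - groups[-1][0] <= tolerance:
            groups[-1][1].append(mz)
        else:
            groups.append((scan, [mz]))
    return groups


maxima = {43: [10, 25], 57: [10], 71: [11], 91: [25], 105: [26], 77: [40]}
for apex, ions in group_by_apex(maxima):
    print(apex, ions)
```

Each group is a candidate compound. An ion such as m/z 43 that maximizes twice contributes to two candidates.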

  29. Separate the Components [figure: ion chromatograms for individual m/z values, each marked "yes" or "NO"]

  30. A 'Model Peak' Provides Shape [figure: ion chromatograms marked "yes" or "NO"]
      The model shape is defined as the sum of all of the ion chromatograms that maximize within the range and have a sharpness value within 75% of the maximum.
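The model peak rule above can be sketched as follows: keep the ions that have a local maximum within a window around the apex scan, then keep those whose sharpness is at least 75% of the best sharpness among them, and sum their chromatograms. We read "within 75% of the maximum" as "at least 0.75 × the maximum sharpness". Sharpness values are supplied by the caller, since the slide does not define them here. Names and data are hypothetical.

```python
def _has_max_near(y, apex, window):
    """True if y has a local maximum within `window` scans of `apex`."""
    for n in range(max(1, apex - window), min(len(y) - 1, apex + window + 1)):
        if y[n] > y[n - 1] and y[n] >= y[n + 1]:
            return True
    return False


def model_peak(chromatograms, sharpness, apex, window=1, fraction=0.75):
    """Return (chosen m/z list, summed model shape)."""
    near = [mz for mz, y in chromatograms.items() if _has_max_near(y, apex, window)]
    if not near:
        return [], []
    best = max(sharpness[mz] for mz in near)
    chosen = [mz for mz in near if sharpness[mz] >= fraction * best]
    length = len(next(iter(chromatograms.values())))
    shape = [sum(chromatograms[mz][n] for mz in chosen) for n in range(length)]
    return chosen, shape


chroms = {
    50: [0, 2, 8, 2, 0],
    60: [0, 1, 4, 1, 0],
    70: [0, 3, 3, 3, 0],   # maximizes in range but is too broad
    80: [8, 2, 0, 0, 0],   # maximizes outside the range
}
sharp = {50: 0.9, 60: 0.8, 70: 0.2, 80: 0.95}
print(model_peak(chroms, sharp, apex=2))   # ([50, 60], [0, 3, 12, 3, 0])
```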

  31. AMDIS Testing – Closely Eluting Components

  32. Representing Chemical Identity
      • Visual: 2D structure
      • Text: IUPAC name
      • Digital: no accepted, open method
      • Solution: the IUPAC/NIST Chemical Identifier

  33. Connection Table [figure: small structure containing Cl with atoms numbered 1 to 4, and its connection table with bonds labeled S and D]

  34. Chemical Identity Problems [figure: two drawings of a structure with H3C and CH3 groups]
      • A registry number is possible for each exact form, mixture, unknown or unspecified substance
      • Experts required
      • Expensive, ambiguous and error-prone

  35. Requirements
      • Different compounds have different identifiers
        – Keep all distinguishing structural information
      [figure: two different structures with different identifiers, IChI-1 and IChI-2]

  36. Requirements
      • One compound has only one identifier
        – Omit unnecessary information
      [figure: alternative drawings of one compound giving the same INChI]

  37. Three Steps to INChI
      • Chemistry: 'normalize' the input structure
        – Implement chemical rules
      • Math: 'canonicalize' (label the atoms)
        – Equivalent atoms get the same label
      • Format: 'serialize' the labeled structure
        – Output as a character string (the 'name')
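Canonicalization means any input numbering of the same structure yields one output. The toy sketch below shows that property by brute force: it tries every atom relabeling and keeps the lexicographically smallest (element sequence, bond list). This is not the INChI algorithm. It runs in O(n!), so it only suits tiny molecules, and it does not give symmetry-equivalent atoms equal labels. The function name and output format are hypothetical.

```python
from itertools import permutations


def canonical_form(elements, bonds):
    """Brute-force canonical string for a small molecular graph.

    elements: element symbols, list index = input atom number.
    bonds: (i, j) pairs of input atom numbers.
    """
    best = None
    for perm in permutations(range(len(elements))):   # perm[new label] = old index
        new_of = {old: new for new, old in enumerate(perm)}
        elems = tuple(elements[old] for old in perm)
        edges = tuple(sorted(tuple(sorted((new_of[a], new_of[b]))) for a, b in bonds))
        key = (elems, edges)
        if best is None or key < best:
            best = key
    elems, edges = best
    return "".join(elems) + "|" + ",".join(f"{a}-{b}" for a, b in edges)


ethanol_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])   # C-C-O
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])   # O-C-C, same molecule
ether = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])       # C-O-C, different molecule
print(ethanol_a, ethanol_b, ether)
```

The two ethanol numberings collapse to one string. Dimethyl ether, with the same formula but different connectivity, gets a different one, matching both requirements on the earlier slides.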

  38. "Layers" of Chemical Substances: formula, connectivity, stereo, isotope

  39. Nitrobenzene [figure: nitrobenzene with canonical atom numbering 1 to 9]
      Description layers:
      • formula: C6H5NO2
      • connectivity: 8-7(9)6-4-2-1-3-5-6
      • H-atoms: 1-5H
      • charges
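A connectivity layer such as 8-7(9)6-4-2-1-3-5-6 can be read back into a bond list. Each atom number bonds to the previous atom. "(" remembers a branch point, ")" returns to it, and "," starts another branch from it. The parser below is a hypothetical sketch of those rules. It resets at ";" between components and does not handle multiplier prefixes such as 2*.

```python
def parse_connectivity(layer):
    """Return (lower, higher) atom-number bonds encoded in a connectivity layer."""
    bonds = []
    stack = []     # branch-point atoms opened by '('
    prev = None    # atom the next number bonds to
    i = 0
    while i < len(layer):
        ch = layer[i]
        if ch.isdigit():
            j = i
            while j < len(layer) and layer[j].isdigit():
                j += 1
            atom = int(layer[i:j])
            if prev is not None:
                bonds.append((min(prev, atom), max(prev, atom)))
            prev = atom
            i = j
            continue
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch == ",":
            prev = stack[-1]
        elif ch == ";":
            prev = None
            stack.clear()
        # '-' only separates consecutive atoms of a chain
        i += 1
    return bonds


nitrobenzene = parse_connectivity("8-7(9)6-4-2-1-3-5-6")
print(nitrobenzene)   # N7 carries O8 and O9; atoms 1-6 close the ring via 5-6
```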

  40. Monosodium Glutamate (MSG) [figure: MSG with canonical atom numbering]
      Description layers:
      • formula: C5H8NO4.Na
      • connectivity: 6-3(5(9)10)1-2-4(7)8;
      • H-atoms: 1-2H2,3H,6H2(H-,7,8,9,10);
      • stereo sp3: 3-;
      • charges: -1;+1
      INChI: C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;/h1-2H2,3H,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3-;/m1./s1
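The serialized string is a formula followed by "/"-separated layers. Each layer starts with a one-letter prefix: c connectivity, h hydrogens, q charge, p protons, t tetrahedral stereo, m stereo inversion flag, s stereo type. This sketch splits the MSG string into those layers. The function name is hypothetical. Only the first occurrence of a repeated prefix is kept.

```python
def split_inchi(inchi):
    """Split an INChI-style string into its formula and prefixed layers.

    A leading 'InChI=<version>/' tag, if present, is dropped.
    """
    body = inchi.split("/", 1)[1] if inchi.startswith("InChI=") else inchi
    formula, *rest = body.split("/")
    layers = {"formula": formula}
    for field in rest:
        if field:
            layers.setdefault(field[0], field[1:])
    return layers


msg = "C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;/h1-2H2,3H,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3-;/m1./s1"
for prefix, value in split_inchi(msg).items():
    print(prefix, value)
```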

  41. Input/Result [screenshot: INChI test version, with options "Mobile H On/Off" and "Include Org-Metal Bonds"]

  42. Peptide Mass Spectra: Libraries for Organisms
      • Proteins are linear sequences of amino acids
        – characteristic of the genome (organism)
      • Peptides are 'digested' fragments of proteins
      • MS 'sequences' peptides to reveal the source protein
      • Peptide fragmentation spectra are not quite predictable
      • Peptide fragmentation spectra for a 'genome' can be contained in one library
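The slide does not say which enzyme digests the proteins. This sketch assumes trypsin and its common rule: cleave after K or R unless the next residue is P, with no missed cleavages. The protein sequence is hypothetical.

```python
import re


def tryptic_digest(protein):
    """Split a protein sequence after K or R, except before P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


peptides = tryptic_digest("AKPGRHLQLAIRDEK")   # hypothetical sequence
print(peptides)   # ['AKPGR', 'HLQLAIR', 'DEK']
```

The K at position 2 is not cleaved because P follows it. Real digests also produce missed cleavages, which this sketch ignores.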

  43. Spectrum Prediction Programs

  44. Peptide Spectra Reference Library (multiple measurements each of 10,000 peptides) [figure: HLQLAIR/2+]
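The label HLQLAIR/2+ names a peptide sequence and its precursor charge. The precursor m/z follows from monoisotopic residue masses plus water, with one proton added per charge. The residue table below covers only the residues in this example, so other residues raise a KeyError. The function name is hypothetical.

```python
# Monoisotopic residue masses (u) for the residues used in this example.
RESIDUE_MASS = {
    "A": 71.03711, "H": 137.05891, "I": 113.08406,
    "L": 113.08406, "Q": 128.05858, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """m/z of [M + zH]z+ for an unmodified peptide."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


print(round(precursor_mz("HLQLAIR", 2), 4))   # about 425.7665
```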

  45. MS Mapped to the Genome (from Eric Deutsch, ISB, 6/2004)
