SLIDE 1

Mass Spectra Alignments and their Significance

Sebastian Böcker¹, Hans-Michael Kaltenbach²

¹ Technische Fakultät, Universität Bielefeld

² NRW Int’l Graduate School in Bioinformatics and Genome Research, Universität Bielefeld

Böcker, Kaltenbach

Mass Spectra Alignments CPM 2005

SLIDE 2

Overview

◮ Mass Spectrometry in Proteomics
◮ Protein Identification via MS
◮ Alignment of Spectra
◮ Score Significance
◮ Conclusion

SLIDE 7

Proteins

Biology: Proteins are directed polymers of 20 different amino acids.

[Figure: example protein drawn as a chain of amino acid letters]

Mathematics: Proteins are strings over an alphabet Σ.

SLIDE 8

Mass Spectrometry

Mass Spectrometry in Bioscience: Mass spectrometry measures the masses and quantities of the molecules in a sample. It is widely used in the biosciences to identify proteins and other biomolecules.

SLIDE 10

Fragmentation of peptides

Problem: Solely measuring the mass of a protein is not sufficient for identification.

[Figure: mass spectrum; axes: mass, abundance]

Idea: Break up the protein into smaller pieces in a deterministic way. The spectrum of these pieces is called a fingerprint of the protein.

[Figure: mass spectrum; axes: mass, abundance]

SLIDE 11

Peptide Mass Fingerprints

Enzymatic cleavage example: An enzyme cuts the amino acid sequence after each letter K.

[Figure: example sequence with the cleavage sites after each K]

SLIDE 13

Peptide Mass Fingerprints

[Figure: Artificial spectrum of GTDSTNKDMKASTAKAKQIT; x-axis: mass, y-axis: relative abundance]

Annotated peaks (peptide / mass):

GTDSTNK / 703.7071
DMK / 374.4614
ASTAK / 458.5151
AK / 199.3618
QIT / 343.3801
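
The five annotated peptides can be reproduced by an in-silico digest that cuts after every K. The following is a minimal Python sketch, assuming that the cleavage character stays attached to the fragment it terminates; the function name `digest` is made up for this illustration and is not taken from the talk.

```python
def digest(sequence, cleavage_char="K"):
    """Cut `sequence` after every occurrence of `cleavage_char`."""
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue == cleavage_char:
            # The cleavage character closes the current fragment.
            fragments.append(current)
            current = ""
    if current:
        # Trailing piece that does not end in the cleavage character.
        fragments.append(current)
    return fragments


print(digest("GTDSTNKDMKASTAKAKQIT"))
```

For the example sequence this yields GTDSTNK, DMK, ASTAK, AK and QIT, the peptides listed for the artificial spectrum.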

SLIDE 14

Real Mass Spectrum (PMF peaks annotated)

SLIDE 15

Processing the spectrum

Peak extraction: Spectra are summarized into peak lists, but extracting peaks is inherently difficult.

Problem: Peak lists are never correct

◮ Inaccurate calibration
◮ Sample contamination
◮ Peak detection
◮ ...

SLIDE 16

Identification

Protein Identification w/ PMF

◮ Isolate many copies of ONE protein
◮ Digest it into specific smaller fragments (Mass Fingerprint)
◮ Make a mass spectrum of these fragments
◮ Compare spectrum to all predicted mass spectra from DB

[Figure: identification workflow. The protein yields a peaklist via mass fingerprint and mass spectrometry. The database sequences (AVKKPPTVHIIT..., KVVGTASILLYV..., VVNMTREEEASD..., QEVFGGTELLPP..., PLMKKRPHGTFD..., ..., KLMMMTGERDFG..., HILKMLVFDSAQ...) yield a peaklist via mass fingerprint via in-silico fragmentation. Comparison of the two peaklists gives score + significance.]

SLIDE 18

Comparing Two Peak Lists

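Peaklists and Empty Peaks: Let $S_m$, $S_p$ be an extracted and a predicted peaklist. Let $\varepsilon$ denote a special gap peak.

Scoring Scheme: Each assignment between the two peak lists can be scored:

$$
\mathrm{score}(S_p, S_m) =
\underbrace{\sum_{\text{matched } i,j} \mathrm{score}(i, j)}_{\text{matched peaks}}
+ \underbrace{\sum_{\text{missing } i} \mathrm{score}(i, \varepsilon)}_{\text{missing peaks}}
+ \underbrace{\sum_{\text{additional } j} \mathrm{score}(\varepsilon, j)}_{\text{additional peaks}}
$$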

SLIDE 19

Matching peaklists

Matching

◮ One-to-one peak matching
◮ Peak matchings should not cross
◮ Any peak must be matched either to a peak or to the gap peak
◮ Matching score mainly based on mass difference but can include other features

Best matching: Using such scoring schemes, the best peaklist matching can be computed using standard global alignment.

SLIDE 20

Scoring scheme example: Peak counting

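Peak counting score:

$$
\mathrm{score}(i, j) =
\begin{cases}
1 & \text{if } |\mathrm{mass}(i) - \mathrm{mass}(j)| \le \delta \\
0 & \text{else}
\end{cases}
\qquad
\mathrm{score}(i, \varepsilon) = \mathrm{score}(\varepsilon, j) = 0
$$

Example: $\delta = 10$, $S_m = \{1000, 1230, 1500\}$ and $S_p = \{1000, 1235, 1700\}$

Alignment:

S_p   1000   1235   ε      1700
S_m   1000   1230   1500   ε

$\mathrm{score}(S_p, S_m) = (1 + 1) + 0 + 0 = 2$.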
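
The peak counting example can be checked with a small global-alignment dynamic program over two sorted peak lists. This is only a sketch under assumptions that are not stated on the slide: peaks farther apart than δ are never paired (with gap scores of 0 this gives the same optimum as pairing them at score 0), both lists are sorted by mass so that matchings cannot cross, and the name `align_peaklists` is made up here.

```python
def align_peaklists(predicted, measured, delta=10.0, gap_score=0.0):
    """Best peak counting score of a non-crossing, one-to-one matching."""
    n, m = len(predicted), len(measured)
    # best[i][j]: best score for the first i predicted and first j measured peaks.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap_score  # predicted peak i is missing
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap_score  # measured peak j is additional
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            value = max(best[i - 1][j] + gap_score, best[i][j - 1] + gap_score)
            if abs(predicted[i - 1] - measured[j - 1]) <= delta:
                # The two peaks are close enough to count as one match.
                value = max(value, best[i - 1][j - 1] + 1.0)
            best[i][j] = value
    return best[n][m]


print(align_peaklists([1000, 1235, 1700], [1000, 1230, 1500], delta=10))
```

For the two peak lists of the example the sketch returns a score of 2, in agreement with the alignment shown on the slide.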

SLIDE 21

Estimating the score distribution

Problem: The score distribution depends on

◮ Measured spectrum
◮ Sequence length
◮ Mass and probability of characters

Estimation techniques

◮ Different null-models: sampling against spectra or sampling against sequences
◮ Sampling against sequences: random or DB sequences both take a long time
◮ Estimation of moments: works with certain classes of distributions

SLIDE 22

Score distribution

Claim: In most useful cases, the score distribution for fixed string length can be well approximated by a normal distribution and is then determined by expectation and variance. Missing and additional scores are usually very small compared to matches.
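
Under the normal approximation, a p-value follows directly from the two moments. The sketch below assumes that significance means the upper tail P(X ≥ score); the function name and the numbers in the usage line are hypothetical and only illustrate the call.

```python
import math


def normal_p_value(score, mean, variance):
    """Upper-tail probability P(X >= score) for X ~ Normal(mean, variance)."""
    z = (score - mean) / math.sqrt(variance)
    # 0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z) for the standard normal CDF Phi.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical moments; in the talk they come from the moment estimation step.
print(normal_p_value(score=12.0, mean=5.0, variance=4.0))
```

Using `math.erfc` rather than `1 - cdf` keeps precision for scores far in the tail, which is where significant matches are expected to lie.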

SLIDE 23

Computing moments

Main Idea: The probability of a peak corresponds to the probability of a fragment of the same mass in the peptide.

◮ Discretize masses by scaling and rounding
◮ Compute probability of fragment of length l with mass = m
◮ Compute probability of string of length L to have no fragment of peak mass m
◮ Can all be done in preprocessing
◮ Estimate moments
◮ Compute p-value
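
The first step, discretizing masses by scaling and rounding, might look as follows. The scale factor of 10 (a resolution of 0.1 mass units) is an assumption chosen for illustration; the talk does not state which factor is used. The masses are the peak annotations of the artificial spectrum.

```python
def discretize(mass, scale=10):
    """Map a real-valued mass to an integer by scaling and rounding."""
    return int(round(mass * scale))


peak_masses = [703.7071, 374.4614, 458.5151, 199.3618, 343.3801]
print([discretize(mass) for mass in peak_masses])
```

A larger scale factor gives finer mass resolution but enlarges the tables indexed by mass in the later steps.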

SLIDE 25

Fragment probability

Weighted Alphabet: We call the tuple $(\Sigma, \mu)$ with mass function $\mu : \Sigma \to \mathbb{N}$ an (integer) weighted alphabet. Define

$$
\mu(s) := \sum_{k=1}^{|s|} \mu(s_k).
$$

Fragments: Let $x$ be the cleavage character and $\Sigma_x = \Sigma \setminus \{x\}$. The number of fragments of length $l$ with mass $m$ is then given by

$$
c[l, m] = \sum_{\sigma \in \Sigma_x,\ \mu(\sigma) \le m} c[l - 1, m - \mu(\sigma)]
$$

and for uniform character distribution we get the probability that such a fragment does not have mass $m$:

$$
r[l, m] = 1 - \frac{c[l, m]}{|\Sigma_x|^l}
$$
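
The recurrence for c[l, m] and the resulting r[l, m] can be tried out on a toy weighted alphabet. This is a sketch with labeled assumptions: the three-letter alphabet and its integer masses are invented for illustration, and the base case (only the empty string has length 0 and mass 0) is not given on the slide but is the natural starting point for the recurrence.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy weighted alphabet; K plays the role of the cleavage character x.
MASS = {"A": 2, "B": 3, "K": 5}
CLEAVE = "K"
SIGMA_X = [ch for ch in MASS if ch != CLEAVE]  # alphabet without x


@lru_cache(maxsize=None)
def c(l, m):
    """Number of strings over SIGMA_X with length l and total mass m."""
    if m < 0:
        return 0
    if l == 0:
        return 1 if m == 0 else 0  # assumed base case: the empty string has mass 0
    return sum(c(l - 1, m - MASS[ch]) for ch in SIGMA_X)


def r(l, m):
    """Probability that a uniform random string over SIGMA_X of length l does NOT have mass m."""
    return 1 - Fraction(c(l, m), len(SIGMA_X) ** l)


# AB and BA are the only strings of length 2 with mass 5, so c = 2 and r = 1 - 2/4.
print(c(2, 5), r(2, 5))
```

Exact fractions are used so that the values can be compared against hand counts; a production version would more likely fill floating-point tables once during preprocessing.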

SLIDE 29

Probability in Strings

Main idea: We compute the probability of a string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate.

[Figure: example string with the 1st cleavage site at position l. Labels: p[L,m] for the whole string, r[l-1,m-µ(K)] for the first fragment, p[L-l,m] for the remaining suffix.]

SLIDE 30

Probability in Strings

The probability of a string $s \in \Sigma^L$ having NO fragment of mass $m$ is given by

$$
\bar{p}[L, m] = r[L, m] \times P(\text{no cleavage at all})
+ \sum_{l=1}^{L} \underbrace{r[l - 1, m - \mu(x)]}_{\text{first frag.}}
\times P(\text{first cleavage at } l)
\times \underbrace{\bar{p}[L - l, m]}_{\text{suffix left}}
$$
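
The recurrence can be evaluated directly once the two cleavage probabilities are fixed. The sketch below assumes independent, uniformly distributed characters, so that P(no cleavage at all) = q^L and P(first cleavage at l) = q^(l-1) (1 - q) with q = |Σ_x| / |Σ|; these expressions, the toy alphabet and the base case of c are assumptions for illustration and do not appear on the slide.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy weighted alphabet; K plays the role of the cleavage character x.
MASS = {"A": 2, "B": 3, "K": 5}
CLEAVE = "K"
SIGMA_X = [ch for ch in MASS if ch != CLEAVE]
Q = Fraction(len(SIGMA_X), len(MASS))  # probability that a character is not x


@lru_cache(maxsize=None)
def c(l, m):
    """Number of strings over SIGMA_X with length l and total mass m."""
    if m < 0:
        return 0
    if l == 0:
        return 1 if m == 0 else 0  # assumed base case
    return sum(c(l - 1, m - MASS[ch]) for ch in SIGMA_X)


def r(l, m):
    """Probability that a cleavage-free string of length l does NOT have mass m."""
    return 1 - Fraction(c(l, m), len(SIGMA_X) ** l)


@lru_cache(maxsize=None)
def p_bar(L, m):
    """Probability that a uniform random string of length L has NO fragment of mass m."""
    total = r(L, m) * Q ** L  # no cleavage character at all
    for l in range(1, L + 1):
        first_cleavage_at_l = Q ** (l - 1) * (1 - Q)
        # First fragment: l - 1 non-cleavage characters followed by x.
        total += r(l - 1, m - MASS[CLEAVE]) * first_cleavage_at_l * p_bar(L - l, m)
    return total


print(p_bar(2, 5))
```

For L = 2 and m = 5 the nine possible strings can be enumerated by hand: AA, BB, AK and BK are the only ones without a fragment of mass 5, which matches the value 4/9 produced by the recurrence.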

SLIDE 31

Expected match score of a peak

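[Figure: extracted peaks M1, M2, M3 with supports U1, U2, U3 and a score threshold; axes: m/z [Da], score; the score distribution is shown next to the extracted peaks]

The expected match score of extracted peak $j$ with support $U_j$ is

$$
E(\mathrm{matchscore}(j)) = \sum_{m \in U_j} p[L, m] \times \mathrm{score}(\mathrm{mass}(j), m)
$$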
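
As a sketch, the expectation is a weighted sum over the support of the peak. The example reads p[L, m] as the probability that a random string of length L has a fragment of mass m (presumably the complement of the no-fragment probability, although the slides do not spell this out). The probabilities, the triangular score function and all names below are hypothetical and serve only to make the sum concrete.

```python
def expected_match_score(peak_mass, support, p, score):
    """Sum of p[m] * score(peak_mass, m) over all masses m in the support."""
    return sum(p[m] * score(peak_mass, m) for m in support)


def triangular_score(peak_mass, m, width=4):
    """Hypothetical score: 1 at the peak mass, falling linearly to 0 at distance `width`."""
    return max(0.0, 1.0 - abs(peak_mass - m) / width)


# Made-up fragment probabilities for three discretized masses around the peak.
p = {998: 0.1, 1000: 0.2, 1002: 0.1}
support = [998, 1000, 1002]
print(expected_match_score(1000, support, p, triangular_score))
```

With these toy numbers the sum is 0.1 × 0.5 + 0.2 × 1 + 0.1 × 0.5, that is about 0.3.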

SLIDE 32

Conclusion

Main features

◮ Scoring schemes allow very flexible identification routines
◮ Computation of significance is database independent
◮ Extension to other cleavage schemes possible
◮ Extension to nonuniform alphabets and to isotope masses straightforward
