Statistical modeling in molecular medicine: proteomics, Anna Gambin



slide-1
SLIDE 1

Statistical modeling in molecular medicine: proteomics

Anna Gambin Institute of Informatics, University of Warsaw

slide-2
SLIDE 2
Outline
  • mass spec basics
  • modeling isotopic distribution
  • modeling exopeptidase activity
  • incorporating MEROPS data
  • peptidase activity in time
  • modeling electron transfer dissociation
  • deconvolution of spectra
  • modeling fragmentation
slide-3
SLIDE 3

Mass Spectrometry

Proteins

  • data source: Center for Proteomics, Antwerp, Belgium

slide-4
SLIDE 4
slide-5
SLIDE 5

Identifying proteins is complicated

  • there are plenty of proteins in a sample
  • proteins are frequently fragmented
  • even a single protein has a complicated signal

slide-6
SLIDE 6

Chemical compounds are made of different isotopes

  • isotopic envelope
slide-7
SLIDE 7

$\mathrm{C}_c\mathrm{H}_h\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s$

  • ne

ie

  • huge number of isotopologues
slide-8
SLIDE 8

important observation

some isotopic variants are more probable than others P( ) =

slide-9
SLIDE 9

Assume

  • 1) isotopic variants of individual atoms are independent
  • 2) elements differ in the abundances of their isotopes

P( ) =

slide-10
SLIDE 10
$o_0 + o_1 + o_2 = 200$
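Under the two assumptions above, the isotope counts of a single element follow a multinomial distribution, and the probability of a whole molecule is the product over elements. A minimal sketch for 200 oxygen atoms follows; the abundance values are approximate textbook figures and the function names are illustrative, neither comes from the slides.

```python
from math import lgamma, log, exp

# Approximate natural abundances of 16O, 17O, 18O (assumed values, not from the slides).
OXYGEN = (0.99757, 0.00038, 0.00205)

def isotopologue_probability(counts, abundances):
    """Multinomial probability of observing counts[k] atoms of isotope k."""
    n = sum(counts)
    log_p = lgamma(n + 1)  # log n!
    for k, p in zip(counts, abundances):
        log_p += k * log(p) - lgamma(k + 1)
    return exp(log_p)

def molecule_probability(config, abundances_by_element):
    """Independence across elements: multiply the per-element probabilities."""
    prob = 1.0
    for element, counts in config.items():
        prob *= isotopologue_probability(counts, abundances_by_element[element])
    return prob

# Two configurations with o0 + o1 + o2 = 200.
p_light = isotopologue_probability((200, 0, 0), OXYGEN)
p_one_heavy = isotopologue_probability((199, 0, 1), OXYGEN)
```

Working in log space with `lgamma` keeps the factorials from overflowing for large atom counts.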
slide-11
SLIDE 11

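How much do we gain by considering the smallest set with a fixed probability?

size of the smallest set:

$$\approx C_{\mathrm{lattice}} \Big( \prod_{e \in \mathrm{Elements}} n_e^{\frac{i_e - 1}{2}} \sqrt{\det \Delta_e} \Big)\, q_{\chi^2(k)}^{k}\, \frac{\pi^{k/2}}{\Gamma(k/2 + 1)} \;\propto\; \prod_{e \in \mathrm{Elements}} n_e^{\frac{i_e - 1}{2}}$$

versus

$$\prod_{e \in \mathrm{Elements}} n_e^{\,i_e - 1}$$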

slide-12
SLIDE 12

To get the smallest set with probability P:

  • Find the most probable variant
  • while Total Probability < P: get the next layer of variants v such that p > P(v) >= qp, where p = P(vmin of the previous layer)
  • Trim the least probable variants from the last layer so that Total Probability >= P

slide-13
SLIDE 13

Smallest set with current Total Probability

Monotonic Expansion Property:

For each v, the set {W : P(W) >= P(v)} is adjacent to v

multinomial distribution
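The Monotonic Expansion Property suggests that variants can be visited in order of decreasing probability by growing a set outwards from the most probable one. The sketch below uses a plain priority queue rather than the layered scheme of the slides, handles a single element only, and treats two configurations as adjacent when one atom is moved between two isotopes; all names and the oxygen abundances are assumptions made for illustration.

```python
import heapq
from math import lgamma, log, exp

def log_prob(config, probs):
    """Log multinomial probability of the isotope counts in `config`."""
    out = lgamma(sum(config) + 1)
    for k, p in zip(config, probs):
        out += k * log(p) - lgamma(k + 1)
    return out

def neighbours(config):
    """Configurations obtained by moving one atom from isotope i to isotope j."""
    for i in range(len(config)):
        if config[i] == 0:
            continue
        for j in range(len(config)):
            if i != j:
                c = list(config)
                c[i] -= 1
                c[j] += 1
                yield tuple(c)

def mode(n, probs):
    """Hill-climb to the most probable configuration."""
    current = tuple([n] + [0] * (len(probs) - 1))
    while True:
        best = max(neighbours(current), key=lambda c: log_prob(c, probs), default=current)
        if log_prob(best, probs) <= log_prob(current, probs):
            return current
        current = best

def smallest_set(n, probs, P):
    """Best-first expansion from the mode until the total probability reaches P."""
    start = mode(n, probs)
    heap = [(-log_prob(start, probs), start)]
    seen = {start}
    chosen, total = [], 0.0
    while heap and total < P:
        neg_lp, config = heapq.heappop(heap)
        chosen.append((config, exp(-neg_lp)))
        total += exp(-neg_lp)
        for c in neighbours(config):
            if c not in seen:
                seen.add(c)
                heapq.heappush(heap, (-log_prob(c, probs), c))
    return chosen, total

# 200 oxygen atoms, approximate abundances of 16O, 17O, 18O (assumed values).
chosen, total = smallest_set(200, (0.99757, 0.00038, 0.00205), 0.99)
```

The heap costs a logarithmic factor per variant, which is the overhead that a layered approach with a final trimming step is designed to avoid.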

slide-14
SLIDE 14
slide-15
SLIDE 15
Our OPTIMAL implementation uses

  • a queue for storing subsequent layers
  • a version of quickselect for trimming
  • other tricks

complexity

  • O(n) in the total number of configurations
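To illustrate how a quickselect-style partition can trim a layer without sorting it, here is a hedged sketch. It keeps the most probable entries of the last layer until a required amount of probability is covered, in expected linear time. The function name, the interface (a bare list of probabilities) and the numbers are assumptions, and this is not the authors' implementation.

```python
import random

def trim_layer(layer, needed, rng=random):
    """Return the most probable entries of `layer` whose sum first reaches `needed`.

    Quickselect-style: partition around a random pivot and recurse into one side only.
    """
    kept = []
    candidates = list(layer)
    while candidates and needed > 0:
        pivot = rng.choice(candidates)
        high = [x for x in candidates if x > pivot]
        equal = [x for x in candidates if x == pivot]
        low = [x for x in candidates if x < pivot]
        s_high = sum(high)
        if s_high >= needed:
            # The answer lies entirely among the entries above the pivot.
            candidates = high
            continue
        # All entries above the pivot are required.
        kept.extend(high)
        needed -= s_high
        for x in equal:
            if needed <= 0:
                break
            kept.append(x)
            needed -= x
        candidates = low
    return kept

# Hypothetical last layer and the probability still missing from the set.
kept = trim_layer([0.1, 0.05, 0.2, 0.01, 0.03], needed=0.32)
```

Only one side of each partition is examined again, so the expected work is linear in the layer size, in contrast to sorting the whole layer.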
slide-16
SLIDE 16

We provide theoretical background and get better run times

slide-17
SLIDE 17

LC-MS/MS

  • data for colorectal cancer patients and healthy donors
  • ca. 1000 peptides
  • preprocessing: spectra interpretation and retention time alignment

proteolytic fragmentation

slide-18
SLIDE 18

Exopeptidase activity

  • motivation: differential exoprotease activities contribute to cancer type–specific serum peptidome degradation
  • our goal: first formal model estimated from LC-MS/MS data

Villanueva, J., Nazarian, A., Lawlor, K., et al. 2008. A sequence-specific exopeptidase activity test (sseat) for “functional” biomarker discovery. Mol. Cell. Proteomics 7, 509–518.

slide-19
SLIDE 19

FT FTS TS FTSS TSS SS FTSST TSST SST ST FTSSTS TSSTS SSTS STS SSTSY STSY TSY SY

Cleavage graph

$$Q(x, x') = \begin{cases}
a_{\star i} & \text{(create) if } x'_i = x_i + 1,\ x'_{-i} = x_{-i} \text{ for some } i,\\
a_{r(i,j)}\, x_i & \text{(move) if } x'_j = x_j + 1,\ x'_i = x_i - 1, \text{ and } x'_{-ij} = x_{-ij} \text{ for some } i \to j,\\
a_{i\dagger}\, x_i & \text{(annihilate/degrade) if } x'_i = x_i - 1,\ x'_{-i} = x_{-i} \text{ for some } i.
\end{cases}$$

transition intensities for the Markov process describing the flow of particles through the graph, i.e. the process of peptidome degradation

slide-20
SLIDE 20

Proposition 1 (Equilibrium distribution). The process $(X(t))$ has the equilibrium (stationary) distribution given by:

$$\pi(x) = \prod_{i \in V} e^{-\lambda_i} \frac{\lambda_i^{x_i}}{x_i!},$$

where the configuration of intensities $(\lambda_i)_{i \in V}$ is the unique solution to the following system of "balance" equations:

$$\sum_{k \to i} \lambda_k\, a_{r(k,i)} + a_{\star i} = \lambda_i \Big( \sum_{i \to j} a_{r(i,j)} + a_{i\dagger} \Big) \quad \text{for every } i \in V.$$

in equilibrium

  • old as the hills, but…
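The balance equations can be solved by simple substitution when the cleavage graph is acyclic, which is assumed here because every cleavage shortens the peptide. The sketch below takes the nodes in topological order (precursors first); the graph, the numerical intensities and the function name are hypothetical.

```python
def equilibrium_intensities(nodes, create, move, degrade):
    """Solve the balance equations node by node.

    nodes   : peptides in topological order (every parent before its children)
    create  : dict peptide -> creation intensity
    move    : dict (parent, child) -> cleavage intensity
    degrade : dict peptide -> degradation intensity
    Assumes every node has a positive total outflow.
    """
    lam = {}
    for i in nodes:
        inflow = create.get(i, 0.0)
        inflow += sum(lam[k] * a for (k, j), a in move.items() if j == i)
        outflow = degrade.get(i, 0.0)
        outflow += sum(a for (k, j), a in move.items() if k == i)
        lam[i] = inflow / outflow
    return lam

# Hypothetical three-node graph: FTS is created, then loses a terminal residue.
lam = equilibrium_intensities(
    nodes=["FTS", "FT", "TS"],
    create={"FTS": 2.0},
    move={("FTS", "FT"): 1.0, ("FTS", "TS"): 3.0},
    degrade={"FT": 0.5, "TS": 1.0},
)
```

With these numbers the inflow to FTS is 2.0 and its outflow rate is 4.0, so its intensity is 0.5; FT and TS then follow from the intensity of their parent.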
slide-21
SLIDE 21

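hierarchical Bayesian model, accounting for missing readings and errors

  • $(b_r)_{r \in R} \sim \mathrm{Dir}((B_r)_{r \in R})$
  • $(b_{\star i})_{i \in V_{in}} \sim \mathrm{Dir}((B_{\star i})_{i \in V_{in}})$
  • $s \sim \mathrm{Gamma}(S_{shape}, S_{rate})$
  • $\lambda_i = \lambda_i(s, b_\star, b)$ for $i \in V$
  • $x_i \sim \mathrm{Poiss}(\lambda_i)$ for $i$: $\epsilon_i = 1$
  • a $\mathrm{Bern}(q)$ indicator for $i$: $\epsilon_i = 1$
  • $y_i \sim \mathrm{LogNormal}(x_i, \tau)$ for $i$ with indicator equal to 1
  • $y_i \sim \mathrm{Background}$ for $i$ with indicator equal to 0

Metropolis-Hastings to sample from posterior: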

slide-22
SLIDE 22

NON TRIVIAL TASK: filling the cleavage graph with real data

  • 1000 peptides: mass, charge, retention time
  • 243 precursor peptides
  • ca. 40 000 subsequences

FTSS

  • from aa sequence: calculate mass
  • consider all charges
  • predict retention time (random forests)

quite often: missing reads and errors!
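The first two steps, computing the mass from the amino acid sequence and considering all charges, can be sketched as follows. The residue masses are standard monoisotopic values rounded to five decimals; the function names and the charge range are assumptions, and retention time prediction is left out.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.007276

def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mz_values(sequence, max_charge=3):
    """m/z of the protonated peptide for charges 1..max_charge."""
    mass = peptide_mass(sequence)
    return {z: (mass + z * PROTON) / z for z in range(1, max_charge + 1)}

def subsequences(precursor):
    """All distinct contiguous subsequences of a precursor peptide."""
    n = len(precursor)
    return sorted({precursor[i:j] for i in range(n) for j in range(i + 1, n + 1)})

candidates = {s: mz_values(s) for s in subsequences("FTSS")}
```

Each candidate can then be matched against the observed peptides by m/z within a tolerance, which is where missing reads and errors show up.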

slide-23
SLIDE 23

Cleavage graph for real proteolytic events

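  • 20 colorectal cancer patients and 20 healthy donors,
  • ca. 1000 peptides,
  • preprocessing phase

[Figure: example cleavage graph. The peptide u = MSFT†LTN†K is cut by thermolysin ($\rho^{ther}_{vw}$) into v = MSFT and w = LTNK, or by pepsin ($\rho^{peps}_{xy}$) into x = MSFT†L†TN and y = K. The peptide x is cut by thermolysin ($\rho^{ther}_{vz}$) into v = MSFT and z = LTN, or by chymotrypsin ($\rho^{chem}_{st}$) into s = MSFTL and t = TN.]

MUCH SMALLER cleavage graphs!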

slide-24
SLIDE 24

[Figure: heatmap of peptidases (rows) against data sets no. 1 to 39 (columns), with color key and histogram.]

Peptidases shown: eupitrilysin, cathepsin B, membrane-type matrix metallopeptidase 3, trypsin 1, cathepsin S, granzyme B (Homo sapiens-type), elastase 1, tripeptidyl-peptidase I, matrix metallopeptidase 20, tryptase alpha, chymase (Homo sapiens-type), myeloblastin, cathepsin G, cathepsin L, calpain 1, membrane-type matrix metallopeptidase 6, ADAMTS5 peptidase, chymotrypsin C, pepsin A, ADAMTS4 peptidase, caspase 1, ADAM17 peptidase, ADAM10 peptidase, cathepsin H, membrane-type matrix metallopeptidase 4, cathepsin K, legumain, aminopeptidase PILS, kallikrein-related peptidase 3, matrix metallopeptidase 3, calpain 2, neprilysin, plasmin

identified enzymes make sense!

slide-25
SLIDE 25

stochastic dynamics in time

  • A. Gambin, B. Kluge / Modeling Proteolysis from MS data

[Figure: the example cleavage graph for u = MSFT†LTN†K from slide 23, with cleavages by pepsin, thermolysin and chymotrypsin.]

$$Q_{xx'} = \begin{cases}
c^T \rho_{vw}\, x_u & \text{if } x' = x - u + v + w \text{ and } u = v \dagger w,\\
0 & \text{otherwise.}
\end{cases}$$

  • $\rho_{vw}$: the vector of all peptidase affinity coefficients for the cleavage $v \dagger w$ (from MEROPS)
  • $c$: the vector of peptidase cutting intensities (to be estimated)
  • $P(x, t) = P(X(t) = x)$ (calculated from the CME)

$$\frac{\partial}{\partial t} P(x, t) = \sum_{y \neq x} \big( Q_{yx} P(y, t) - Q_{xy} P(x, t) \big)$$

$$= \sum_{u = v \dagger w} c^T \rho_{vw} \big[ (x_u + 1)\, P(x + u - v - w, t) - x_u P(x, t) \big]$$

$$= \sum_{u = v \dagger w} c^T \rho_{vw} \big[ x'_u P(x', t) - x_u P(x, t) \big], \quad \text{where } x' = x + u - v - w.$$

no more a monomolecular system: we have reactions A -> B and A -> B + C (endopeptidases)

slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30

interesting moments...

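Denote by $E_q(t)$ the expected number of instances of peptide $q$ at time $t$:

$$E_q(t) = \sum_x x_q P(x, t).$$

From the equation above:

$$\frac{\partial}{\partial t} E_q(t) = \sum_{u \to q} \lambda_{uq} E_u(t) + \lambda_{qq} E_q(t), \quad q \in V.$$

$$E(t) = E(0)^T \exp(\Lambda t),$$

[Figure: heatmap of the matrix $\Lambda = (\lambda_{vw})_{v,w \in V}$ for peptide VAHRFKDLGEEN; rows and columns indexed up to about 60, values ranging from −150 to 150.]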
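The closed form $E(t) = E(0)^T \exp(\Lambda t)$ can be evaluated for a toy system with a small hand-written matrix exponential. The sketch below uses a Taylor series with scaling and squaring in pure Python; the three-peptide system (MSFTLTNK cut into MSFT and LTNK) and the rate c are hypothetical, and a production version would more likely rely on a numerical library.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_exp(m, terms=20, squarings=8):
    """exp(m) via a truncated Taylor series of m / 2**squarings, then repeated squaring."""
    n = len(m)
    scale = 2 ** squarings
    small = [[v / scale for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, small)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def expected_counts(e0, lam, t):
    """E(t) = E(0)^T exp(Lambda t), with E treated as a row vector."""
    p = mat_exp([[v * t for v in row] for row in lam])
    return [sum(e0[i] * p[i][j] for i in range(len(e0))) for j in range(len(e0))]

# Peptide order: MSFTLTNK, MSFT, LTNK. One cleavage with hypothetical rate c.
c = 2.0
LAMBDA = [[-c, c, c],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
e_half = expected_counts([1.0, 0.0, 0.0], LAMBDA, 0.5)
```

For this system the precursor decays as exp(-ct) and each product grows as 1 - exp(-ct), which gives a quick sanity check of the matrix form.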

slide-31
SLIDE 31

ETD fragmentation

  • more fragments
  • more insight into structure
  • more confidence in correct identification

slide-32
SLIDE 32

some bonds get easily broken .. others not

ETD

slide-33
SLIDE 33

  • understand fragmentation inside the instrument under different experimental conditions
  • use purified chemical samples
  • study fragmentation pathways
  • locate fragments in data

the goal of the MassTodon solution:

  • 1. deconvolute signals and
  • 2. infer fragmentation reaction constants

slide-34
SLIDE 34

[Figure: isotopic spectrum, Probability (0.00 to 0.03) against Mass [Da] (410 to 430).]

using atomic compositions of the fragments we generate isotopic spectra with

we can aggregate masses to match data resolution
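Aggregating masses to the resolution of the data amounts to binning the theoretical peaks and summing their probabilities. A minimal sketch, assuming peaks are given as (mass, probability) pairs and that bins are formed by rounding to a fixed width; the peak values below are made up for illustration.

```python
from collections import defaultdict

def aggregate(peaks, bin_width):
    """Sum peak probabilities within bins of width `bin_width`.

    peaks : iterable of (mass, probability)
    Returns a list of (bin centre, total probability), sorted by mass.
    """
    bins = defaultdict(float)
    for mass, prob in peaks:
        bins[round(mass / bin_width)] += prob
    return [(index * bin_width, total) for index, total in sorted(bins.items())]

# Hypothetical fine-structure peaks collapsed to a resolution of 1 Da.
fine = [(410.02, 0.5), (410.04, 0.1), (411.03, 0.3), (411.01, 0.1)]
coarse = aggregate(fine, 1.0)
```

A smaller `bin_width` keeps more of the isotopic fine structure, at the cost of more peaks to match against the data.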

slide-35
SLIDE 35

complications

we take into account charges … and imprecisions in instrumental mass calibration

slide-36
SLIDE 36

mass imprecisions

  • tolerance intervals around theoretical isotopic envelope
  • natural data centroiding

[Figure: tolerance intervals T0, T1, T2 on the m/z [Th] axis.]

slide-37
SLIDE 37

[Figure: tolerance intervals T0, T1, T2 of two fragments and experimental groups G0 to G7 on the m/z [Th] axis.]

  • intervals may overlap
  • using interval trees we build up the deconvolution graph

slide-38
SLIDE 38

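[Figure: deconvolution graph along the m/z axis. On the theory side, fragments FA and FB with their theoretical peaks TA0, TA1, TA2 and TB0, TB1, TB2; on the empirical side, experimental groups G0 to G7.]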

slide-39
SLIDE 39

[Figure: connected components of a deconvolution graph, with nodes labelled F, P, E and G.]

  • a-theoretical peaks (no fragments around)
  • fragment with no empirical support
  • fragment with its isotopic envelope
  • a fragment with empirical support: trivial case (no need for deconvolution)
  • more a-theoretical peaks
  • two fragments with empirical support: suitable for deconvolution

connected components of the deconvolution graph provide a wealth of insight into the spectrum
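The cases listed above can be recovered by linking each theoretical peak to the observed peaks inside its tolerance interval and then grouping fragments that share an observed peak. The sketch below swaps the interval tree for a sorted list with binary search and uses a union-find structure for the components; the fragment names, m/z values and tolerance are hypothetical.

```python
from bisect import bisect_left, bisect_right

def fragment_components(theoretical, observed, tol):
    """Group fragments whose tolerance intervals capture a common observed peak.

    theoretical : dict fragment -> list of theoretical m/z values
    observed    : list of observed m/z values
    tol         : half-width of the tolerance interval
    Returns (groups of fragments, observed support per fragment, a-theoretical peaks).
    """
    observed = sorted(observed)
    parent = {frag: frag for frag in theoretical}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # index of observed peak -> first fragment that captured it
    support = {frag: set() for frag in theoretical}
    for frag, mzs in theoretical.items():
        for mz in mzs:
            lo = bisect_left(observed, mz - tol)
            hi = bisect_right(observed, mz + tol)
            for g in range(lo, hi):
                support[frag].add(observed[g])
                if g in owner:
                    parent[find(frag)] = find(owner[g])
                else:
                    owner[g] = frag
    groups = {}
    for frag in theoretical:
        groups.setdefault(find(frag), set()).add(frag)
    unexplained = [mz for g, mz in enumerate(observed) if g not in owner]
    return list(groups.values()), support, unexplained

theoretical = {"A": [100.0, 101.0, 102.0], "B": [102.01, 103.0], "C": [300.0]}
observed = [100.001, 101.002, 102.005, 103.001, 250.0]
groups, support, unexplained = fragment_components(theoretical, observed, tol=0.01)
```

Here A and B land in one component because both capture the peak at 102.005, C has no empirical support, and 250.0 is an a-theoretical peak.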

slide-40
SLIDE 40

to perform deconvolution we present the problem as a linear programme similar to the max flow problem

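[Figure: flow network from the theory side to the empirical side, with node labels $\alpha_A$, $\alpha_B$, $p_{A0}$, $p_{A1}$, $p_{A2}$, $p_{B0}$, $p_{B1}$, $p_{B2}$ and flow variables $x_{A00}$, $x_{A11}$, $x_{A12}$, $x_{A24}$, $x_{A25}$, $x_{B02}$, $x_{B03}$, $x_{B15}$, $x_{B16}$, $x_{B27}$.]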

slide-41
SLIDE 41

Electron Transfer Dissociation

  • Cleavage of protein backbone by a rapid neutralization of charge
  • To identify proteins
  • To sequence proteins de novo
  • To identify post-translational modifications

slide-42
SLIDE 42

Petri Net model

[Figure: Petri net with transitions labelled PTR, ETnoD, ETDn.]

  • Electron Transfer Dissociation (ETD):

$[M + nH]^{n+} \rightarrow [M_1 + n_1H]^{n_1+} + [M_2 + n_2H]^{n_2+}$

  • Proton Transfer Reaction (PTR):

$[M + nH]^{n+} \rightarrow [M + (n-1)H]^{(n-1)+}$

  • Electron Transfer Without Dissociation (ETnoD):

$[M + nH]^{n+} \rightarrow [M + nH]^{(n-1)+}$

slide-43
SLIDE 43

Ion Space Parametrization

(A, p, q): amino acid sequence, number of protons quenched by ETnoD, charge

  • Evolution of an ion = Markov Jump Process
  • Jump = transition between states
  • Jump intensity = reaction intensity

  • Draw reaction time
  • Draw reaction type
  • Compute reaction products
  • Put ions into sample
  • Distribute charges on precursor sequence
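The jump-process view can be sketched as a Gillespie-style simulation of one precursor ion. Everything numerical here is an assumption made for illustration: each reaction's intensity is taken to be a base rate times the ion's charge, the ETD cleavage site is uniform along the backbone, and the remaining charges and quenched protons are split uniformly between the two products.

```python
import random

def simulate_ion(sequence, charge, rates, t_end, rng):
    """Simulate ETD, PTR and ETnoD on one precursor; states are tuples (A, p, q).

    rates : dict of hypothetical base intensities for 'ETD', 'PTR', 'ETnoD'
    Returns the final list of ions and the number of reactions that occurred.
    """
    ions = [(sequence, 0, charge)]
    t, n_reactions = 0.0, 0
    while True:
        events, weights = [], []
        for idx, (seq, p, q) in enumerate(ions):
            for name, rate in rates.items():
                # Assumption: intensity proportional to charge; ETD needs a bond to break.
                if q > 0 and (name != "ETD" or len(seq) > 1):
                    events.append((idx, name))
                    weights.append(rate * q)
        total = sum(weights)
        if total == 0:
            break
        t += rng.expovariate(total)                          # draw reaction time
        if t > t_end:
            break
        idx, name = rng.choices(events, weights=weights)[0]  # draw reaction type
        seq, p, q = ions[idx]
        if name == "PTR":                                    # compute reaction products
            products = [(seq, p, q - 1)]
        elif name == "ETnoD":
            products = [(seq, p + 1, q - 1)]
        else:
            cut = rng.randint(1, len(seq) - 1)
            q_left = rng.randint(0, q - 1)
            p_left = rng.randint(0, p)
            products = [(seq[:cut], p_left, q_left),
                        (seq[cut:], p - p_left, q - 1 - q_left)]
        ions[idx:idx + 1] = products                         # put ions back into the sample
        n_reactions += 1
    return ions, n_reactions

rng = random.Random(0)
ions, n_reactions = simulate_ion("RPKPQQ", 3, {"ETD": 1.0, "PTR": 0.5, "ETnoD": 0.5}, 10.0, rng)
```

Every reaction removes exactly one charge, so the total charge of the final ions plus the number of reactions equals the precursor charge, which is a convenient invariant to check.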

slide-44
SLIDE 44

Population approach

Stochastic description of a single ion

ODE description of a big population of ions

slide-45
SLIDE 45
  • Tree-like structure
slide-46
SLIDE 46

Sequence   Charge   Electrons   Intensity
RPKPQQ     3                    0.25
RPKP       1        1           0.01
PQQ        1                    0.12
...        ...      ...         ...

slide-47
SLIDE 47

Intensity estimation

slide-48
SLIDE 48

Many thanks to collaborators

  • Piotr Dittwald
  • Frederik Lermyte
  • Dirk Valkenborg
  • Michal Startek
  • Frank Sobott
  • Blazej Miasojedow
  • Mateusz Łącki
  • Michał Ciach