L14 Mass Spec Quantitation MS applications Microarray analysis


SLIDE 1

CSE182

L14

Mass Spec Quantitation MS applications Microarray analysis

SLIDE 2

LC-MS Maps

[Figure: LC-MS map with time and m/z axes and intensity I, showing Peptide 1, Peptide 2, and the elution of Peptide 2]

  • A peptide/feature can be labeled with the triple (M,T,I):

– monoisotopic M/Z, centroid retention time, and intensity

  • An LC-MS map is a collection of features
SLIDE 3

Time scaling: Approach 1 (geometric matching)

  • Match features based on M/Z, and (loose) time matching.

Objective: $\sum_f (t_1 - t_2)^2$

  • Let $t_2' = a t_2 + b$. Select a, b so as to minimize $\sum_f (t_1 - t_2')^2$
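The minimization over a and b is an ordinary least-squares line fit. A minimal sketch, assuming the features have already been matched into (t1, t2) pairs; the function name `fit_time_scaling` and the sample pairs are illustrative and not from the lecture:

```python
def fit_time_scaling(pairs):
    """Return (a, b) minimizing sum over pairs of (t1 - (a*t2 + b))**2.

    pairs: list of (t1, t2) retention times of matched features.
    """
    n = len(pairs)
    mean_t1 = sum(t1 for t1, _ in pairs) / n
    mean_t2 = sum(t2 for _, t2 in pairs) / n
    # Closed-form least-squares slope and intercept.
    sxx = sum((t2 - mean_t2) ** 2 for _, t2 in pairs)
    sxy = sum((t2 - mean_t2) * (t1 - mean_t1) for t1, t2 in pairs)
    if sxx == 0:
        raise ValueError("all t2 values are identical; slope is undefined")
    a = sxy / sxx
    b = mean_t1 - a * mean_t2
    return a, b


# Hypothetical matched pairs lying exactly on t1 = 2*t2 + 1.
a, b = fit_time_scaling([(3.0, 1.0), (5.0, 2.0), (7.0, 3.0)])
```

With real data the pairs come from the loose M/Z and time matching, so outliers from wrong matches can pull the fit; that is one motivation for the matching step on the next slide.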
SLIDE 4

Geometric matching

  • Make a graph. Peptide a in LCMS1 is linked to all peptides with identical m/z.
  • Each edge has score proportional to t1/t2
  • Compute a maximum weight matching.
  • The ratio of times of the matched pairs gives a.
  • Rescale and compute the scaling factor

[Figure: features of the two runs plotted with axes T and M/Z]

SLIDE 5

Approach 2: Scan alignment

  • Each time scan is a vector of intensities.
  • Two scans in different runs can be scored for similarity (using a dot product)

[Figure: scans S11, S12 of run 1 and S21, S22 of run 2]

$M(S_{1i}, S_{2j}) = \sum_k S_{1i}(k)\, S_{2j}(k)$

Example scans: $S_{1i}$ = (10, 5, 0, 0, 7, 0, 0, 2, 9), $S_{2j}$ = (9, 4, 2, 3, 7, 0, 6, 8, 3)
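The dot-product score can be worked through on the two example scans from the slide. A minimal sketch; the function name `scan_similarity` is illustrative:

```python
def scan_similarity(scan_a, scan_b):
    """Dot product of two scans binned on the same m/z grid."""
    if len(scan_a) != len(scan_b):
        raise ValueError("scans must have the same number of m/z bins")
    return sum(a * b for a, b in zip(scan_a, scan_b))


s1i = [10, 5, 0, 0, 7, 0, 0, 2, 9]
s2j = [9, 4, 2, 3, 7, 0, 6, 8, 3]
score = scan_similarity(s1i, s2j)  # 90 + 20 + 49 + 16 + 27 = 202
```

Bins where either scan has zero intensity contribute nothing, so the score is driven by peaks the two scans share.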

SLIDE 6

Scan Alignment

  • Compute an alignment of the two runs
  • Let W(i,j) be the best scoring alignment of the first i scans in run 1, and first j scans in run 2
  • Advantage: does not rely on feature detection.
  • Disadvantage: Might not handle affine shifts in time scaling, but is better for local shifts

[Figure: scans S11, S12 of run 1 aligned against scans S21, S22 of run 2]

$W(i, j) = \max \begin{cases} W(i-1, j-1) + M[S_{1i}, S_{2j}] \\ W(i-1, j) + \ldots \\ W(i, j-1) + \ldots \end{cases}$
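The recurrence can be filled in as a standard dynamic program. The slide leaves the two skip terms unspecified, so this sketch assumes a constant `gap` score for skipping a scan in either run; that parameter and the function names are assumptions, not part of the lecture:

```python
def dot(u, v):
    """Scan similarity M: dot product of two intensity vectors."""
    return sum(a * b for a, b in zip(u, v))


def align_runs(run1, run2, gap=0.0):
    """Best alignment score of two runs, each a list of scans.

    W[i][j] is the best score for the first i scans of run1 and
    the first j scans of run2. `gap` is an assumed constant score
    for leaving a scan unmatched.
    """
    n, m = len(run1), len(run2)
    W = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        W[i][0] = W[i - 1][0] + gap
    for j in range(1, m + 1):
        W[0][j] = W[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            W[i][j] = max(
                W[i - 1][j - 1] + dot(run1[i - 1], run2[j - 1]),  # match scans
                W[i - 1][j] + gap,  # skip a scan in run 1
                W[i][j - 1] + gap,  # skip a scan in run 2
            )
    return W[n][m]


# Two toy runs with identical scans align with score 1 + 1 = 2.
best = align_runs([[1, 0], [0, 1]], [[1, 0], [0, 1]])
```

The table has (n+1)(m+1) cells, so the cost grows with the product of the scan counts of the two runs.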

SLIDE 7

Chemistry based methods for comparing peptides

SLIDE 8

ICAT

  • The reactive group attaches to Cysteine
  • Only Cys-peptides will get tagged
  • The biotin at the other end is used to pull down peptides that contain this tag.
  • The X is either Hydrogen, or Deuterium (Heavy)

– Difference = 8 Da

SLIDE 9

ICAT

  • ICAT reagent is attached to particular amino-acids (Cys)
  • Affinity purification leads to simplification of complex mixture

[Figure: ICAT workflow for cell state 1 and cell state 2 (“Normal” and “diseased”). Label the proteins of one state with heavy ICAT and of the other with light ICAT; combine; fractionate protein prep (membrane, cytosolic); proteolysis; isolate ICAT-labeled peptides.]

  • Nat. Biotechnol. 17: 994-999, 1999
SLIDE 10

Differential analysis using ICAT

ICAT pairs at known distance

[Figure: heavy and light peaks of an ICAT pair plotted with axes Time and M/Z]
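Because the heavy and light tags differ by a known mass, candidate pairs can be found by a simple scan over the features. A minimal sketch, assuming a single tag per peptide, a known charge, and features that co-elute within a tolerance; the `Feature` class, the function name, and the tolerance values are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float         # monoisotopic m/z
    rt: float         # centroid retention time
    intensity: float  # assumed positive


def find_icat_pairs(features, charge=1, delta_da=8.0, mz_tol=0.02, rt_tol=30.0):
    """Return (light, heavy, heavy/light intensity ratio) candidates.

    A tag mass difference of delta_da shows up as delta_da / charge in m/z.
    """
    expected = delta_da / charge
    ordered = sorted(features, key=lambda f: f.mz)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            gap = heavy.mz - light.mz
            if gap > expected + mz_tol:
                break  # later features are even further away in m/z
            if abs(gap - expected) <= mz_tol and abs(heavy.rt - light.rt) <= rt_tol:
                pairs.append((light, heavy, heavy.intensity / light.intensity))
    return pairs


features = [Feature(500.0, 100.0, 200.0), Feature(508.0, 102.0, 400.0),
            Feature(650.0, 300.0, 50.0)]
pairs = find_icat_pairs(features)
```

This returns every candidate pair, so a feature that sits in a cluster of peaks spaced by the tag distance will appear in more than one pair and needs a further disambiguation step.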

SLIDE 11

ICAT issues

  • The tag is heavy, and decreases the dynamic range of the measurements.
  • The tag might break off
  • Only Cysteine containing peptides are retrieved
  • Non-specific binding to streptavidin

SLIDE 12

Serum ICAT data

MA13_02011_02_ALL01Z3I9A* Overview (exhibits ’stack-ups’)

SLIDE 13

Serum ICAT data

[Figure: serum ICAT peak cluster with peaks labeled 8, 16, 22, 24, 30, 32, 38, 40, 46]

  • Instead of pairs, we see entire clusters at 0, +8, +16, +22
  • ICAT based strategies must clarify ambiguous pairing.

SLIDE 14

ICAT problems

  • Tag is bulky, and can break off.
  • Cys is low abundance
  • MS2 analysis to identify the peptide is harder.

SLIDE 15

SILAC

  • A novel stable isotope labeling strategy
  • Mammalian cell-lines do not ‘manufacture’ all amino-acids. Where do they come from?
  • Labeled amino-acids are added to amino-acid deficient culture, and are incorporated into all proteins as they are synthesized
  • No chemical labeling or affinity purification is performed.
  • Leucine was used (10% abundance vs 2% for Cys)
SLIDE 16

SILAC vs ICAT

  • Leucine is higher abundance than Cys
  • No affinity tagging done
  • Fragmentation patterns for the two peptides are identical

– Identification is easier

Ong et al. MCP, 2002

SLIDE 17

Incorporation of Leu-d3 at various time points

  • Doubling time of the cells is 24 hrs.
  • Peptide = VAPEEHPVLLTEAPLNPK
  • What is the charge on the peptide?

SLIDE 18

Quantitation on controlled mixtures

SLIDE 19

Identification

  • MS/MS of differentially labeled peptides
SLIDE 20

Peptide Matching

  • Computational: Under identical Liquid Chromatography conditions, peptides will elute in the same order in two experiments.

– These peptides can be paired computationally

  • SILAC/ICAT allow us to compare relative peptide abundances in a single run using an isotope tag.

SLIDE 21

MS quantitation Summary

  • A peptide elutes over a mass range (isotopic peaks), and a time range.
  • A ‘feature’ defines all of the peaks corresponding to a single peptide.
  • Matching features is the critical step to comparing relative intensities of the same peptide in different samples.
  • The matching can be done chemically (isotope tagging), or computationally (LCMS map comparison)

SLIDE 22

Biol. Data analysis: Review

[Diagram: Protein Sequence Analysis; Sequence Analysis/DNA signals; Gene Finding; Assembly]

SLIDE 23

Other static analysis is possible

[Diagram: Protein Sequence Analysis; Sequence Analysis; Gene Finding; Assembly; ncRNA; Genomic Analysis/Pop. Genetics]

SLIDE 24

A Static picture of the cell is insufficient

  • Each Cell is continuously active,

– Genes are being transcribed into RNA
– RNA is translated into proteins
– Proteins are post-translationally modified and transported
– Proteins perform various cellular functions

  • Can we probe the Cell dynamically?

– Which transcripts are active?
– Which proteins are active?
– Which proteins interact?

[Diagram labels: Gene Regulation, Proteomic profiling, Transcript profiling]

SLIDE 25

Micro-array analysis

SLIDE 26

The Biological Problem

  • Two conditions that need to be differentiated (have different treatments).
  • Ex: ALL (Acute Lymphocytic Leukemia) & AML (Acute Myelogenous Leukemia)
  • Possibly, the set of expressed genes is different in the two conditions

SLIDE 27

Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50 genes most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent dataset. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. The expression level of each gene in the independent dataset is shown relative to the mean of expression levels for that gene in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in ALL, the bottom panel shows genes more highly expressed in AML.

SLIDE 28

Gene Expression Data

  • Gene Expression data:

– Each row corresponds to a gene
– Each column corresponds to the expression values in one experiment (sample)

  • Can we separate the experiments into two or more classes?
  • Given a training set of two classes, can we build a classifier that places a new experiment in one of the two classes?

[Figure: expression matrix with axis labels g (genes) and s1, s2, ... (samples)]

SLIDE 29

Three types of analysis problems

  • Cluster analysis/unsupervised learning
  • Classification into known classes (Supervised)
  • Identification of “marker” genes that characterize different tumor classes

SLIDE 30

Supervised Classification: Basics

  • Consider genes g1 and g2

– g1 is up-regulated in class A, and down-regulated in class B.
– g2 is down-regulated in class A, and up-regulated in class B.

  • Intuitively, g1 and g2 are effective in classifying the two samples. The samples are linearly separable.

Sample    1    2    3    4    5    6
g1        1    .9   .8   .1   .2   .1
g2        .1   0    .2   .8   .7   .9

SLIDE 31

Basics

  • With 3 genes, a plane is used to separate (linearly separable) samples. In higher dimensions, a hyperplane is used.

SLIDE 32

Non-linear separability

  • Sometimes, the data is not linearly separable, but can be separated by some other function
  • In general, the linearly separable problem is computationally easier.

SLIDE 33

Formalizing the classification problem for micro-arrays

  • Each experiment (sample) is a vector of expression values.

– By default, all vectors v are column vectors.
– $v^T$ is the transpose of a vector

  • The genes are the dimensions of a vector.
  • Classification problem: Find a surface that will separate the classes

[Figure: column vector v and its transpose $v^T$]

SLIDE 34

Formalizing Classification

  • Classification problem: Find a surface (hyperplane) that will separate the classes
  • Given a new sample point, its class is then determined by which side of the surface it lies on.
  • How do we find the hyperplane? How do we find the side that a point lies on?

Sample    1    2    3    4    5    6
g1        1    .9   .8   .1   .2   .1
g2        .1   0    .2   .8   .7   .9

SLIDE 35

Basic geometry

  • What is ||x||2 ?
  • What is x/||x||
  • Dot product?

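[Figure: vectors $x = (x_1, x_2)$ and y, at angles $\theta_x$ and $\theta_y$]

$x^T y = x_1 y_1 + x_2 y_2 = \|x\| \cdot \|y\| \cos\theta_x \cos\theta_y + \|x\| \cdot \|y\| \sin\theta_x \sin\theta_y = \|x\| \cdot \|y\| \cos(\theta_x - \theta_y)$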
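The identity can be checked numerically on a small example. A minimal sketch; the vectors and helper names are illustrative:

```python
import math


def dot(x, y):
    """x^T y for two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean length ||x||."""
    return math.sqrt(dot(x, x))


x = (3.0, 4.0)
y = (1.0, 0.0)
theta_x = math.atan2(x[1], x[0])  # angle of x from the first axis
theta_y = math.atan2(y[1], y[0])  # angle of y from the first axis

lhs = dot(x, y)
rhs = norm(x) * norm(y) * math.cos(theta_x - theta_y)

# x / ||x|| is the unit vector in the direction of x.
unit_x = tuple(a / norm(x) for a in x)
```

Here `lhs` and `rhs` agree up to floating-point rounding, and `unit_x` has length 1.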

SLIDE 36

End of L14

SLIDE 37

Dot Product

  • Let β be a unit vector.

– $\|\beta\| = 1$

  • Recall that

– $\beta^T x = \|x\| \cos\theta$

  • What is $\beta^T x$ if x is orthogonal (perpendicular) to β?

[Figure: vector x at angle θ to the unit vector β]

SLIDE 38

Hyperplane

  • How can we define a hyperplane L?
  • Find the unit vector that is perpendicular (normal) to the hyperplane

SLIDE 39

Points on the hyperplane

  • Consider a hyperplane L defined by unit vector β, and distance $\beta_0$
  • Notes:

– For all x ∈ L, $x^T \beta$ must be the same, $x^T \beta = \beta_0$
– For any two points $x_1, x_2 \in L$, $(x_1 - x_2)^T \beta = 0$

[Figure: points x1 and x2 on the hyperplane]

SLIDE 40

Hyperplane properties

  • Given an arbitrary point x, what is the distance from x to the plane L?

– $D(x, L) = \beta^T x - \beta_0$

  • When are points x1 and x2 on different sides of the hyperplane?

[Figure: point x and the hyperplane at distance $\beta_0$]
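Both questions reduce to the sign of $\beta^T x - \beta_0$: two points lie on different sides exactly when their signed distances have opposite signs. A minimal sketch, assuming β is already a unit vector; the function names and example values are illustrative:

```python
def signed_distance(x, beta, beta0):
    """D(x, L) = beta^T x - beta0, for a unit normal vector beta."""
    return sum(b * xi for b, xi in zip(beta, x)) - beta0


def on_different_sides(x1, x2, beta, beta0):
    """True when the signed distances of x1 and x2 have opposite signs."""
    return signed_distance(x1, beta, beta0) * signed_distance(x2, beta, beta0) < 0


beta = (0.6, 0.8)   # unit vector: 0.36 + 0.64 = 1
beta0 = 1.0
d_far = signed_distance((3.0, 4.0), beta, beta0)     # about 4: positive side
d_origin = signed_distance((0.0, 0.0), beta, beta0)  # -1: negative side
split = on_different_sides((3.0, 4.0), (0.0, 0.0), beta, beta0)
```

A point with signed distance exactly 0 lies on L itself, so the product test treats it as being on neither side.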

SLIDE 41

Separating by a hyperplane

  • Input: A training set of +ve & -ve examples
  • Goal: Find a hyperplane that separates the two classes.
  • Classification: A new point x is +ve if it lies on the +ve side of the hyperplane, -ve otherwise.
  • The hyperplane is represented by the line $\{x : -\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0\}$

[Figure: +ve and -ve examples plotted with axes x1 and x2]

SLIDE 42

Error in classification

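  • An arbitrarily chosen hyperplane might not separate the two classes. We need to minimize a mis-classification error
  • Error: sum of distances of the misclassified points.
  • Let $y_i = -1$ for +ve example i,

– $y_i = 1$ otherwise.

  • Other definitions are also possible.
  • $D(\beta, \beta_0) = \sum_{i \in M} y_i (x_i^T \beta + \beta_0)$, where M is the set of misclassified points

[Figure: +ve and -ve examples in the (x1, x2) plane with a hyperplane of normal β]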

SLIDE 43

Gradient Descent

  • The function D(β) defines the error.
  • We follow an iterative refinement. In each step, refine β so the error is reduced.
  • Gradient descent is an approach to such iterative refinement.

[Figure: error curve D(β) plotted against β, with slope D'(β)]

$\beta \leftarrow \beta - \rho \cdot D'(\beta)$
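The update rule can be tried on a one-dimensional error function. A minimal sketch, assuming the illustrative error $D(\beta) = (\beta - 3)^2$ with derivative $2(\beta - 3)$; the step size, step count, and function name are arbitrary choices for the example:

```python
def gradient_descent(d_prime, beta, rho=0.1, steps=100):
    """Repeat beta <- beta - rho * D'(beta) for a fixed number of steps."""
    for _ in range(steps):
        beta = beta - rho * d_prime(beta)
    return beta


# Hypothetical error D(beta) = (beta - 3)**2, minimized at beta = 3.
beta_min = gradient_descent(lambda b: 2.0 * (b - 3.0), beta=0.0)
```

For this error each step shrinks the distance to the minimum by a factor of 0.8, so `beta_min` ends up very close to 3. A step size ρ that is too large would make the iterates overshoot instead.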

SLIDE 44

Rosenblatt’s perceptron learning algorithm

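$D(\beta, \beta_0) = \sum_{i \in M} y_i (x_i^T \beta + \beta_0)$

$\frac{\partial D(\beta, \beta_0)}{\partial \beta} = \sum_{i \in M} y_i x_i$

$\frac{\partial D(\beta, \beta_0)}{\partial \beta_0} = \sum_{i \in M} y_i$

$\Rightarrow$ Update rule: $\begin{pmatrix} \beta \\ \beta_0 \end{pmatrix} \leftarrow \begin{pmatrix} \beta \\ \beta_0 \end{pmatrix} - \rho \begin{pmatrix} \sum_{i \in M} y_i x_i \\ \sum_{i \in M} y_i \end{pmatrix}$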

SLIDE 45

Classification based on perceptron learning

  • Use Rosenblatt’s algorithm to compute the hyperplane L=(β,β0).
  • Assign x to class 1 if f(x) >= 0, and to class 2 otherwise, where $f(x) = x^T \beta + \beta_0$.
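Training and classification can be sketched together on the six-sample g1/g2 example from the earlier slide. This sketch follows the lecture's sign convention ($y_i = -1$ for +ve examples, $y_i = 1$ otherwise) and its batch update over the misclassified set M. Treating samples 1 to 3 as the +ve class, counting points exactly on the hyperplane as misclassified, the step size ρ = 1, and the epoch cap are assumptions made for the example:

```python
def f(x, beta, beta0):
    """f(x) = x^T beta + beta0."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def train_perceptron(points, ys, rho=1.0, max_epochs=1000):
    """Batch perceptron update over the misclassified set M."""
    beta = [0.0] * len(points[0])
    beta0 = 0.0
    for _ in range(max_epochs):
        # Point i is misclassified when y_i * f(x_i) >= 0 (boundary counts).
        M = [i for i, x in enumerate(points) if ys[i] * f(x, beta, beta0) >= 0]
        if not M:
            return beta, beta0
        for d in range(len(beta)):
            beta[d] -= rho * sum(ys[i] * points[i][d] for i in M)
        beta0 -= rho * sum(ys[i] for i in M)
    raise RuntimeError("no separating hyperplane found within max_epochs")


def classify(x, beta, beta0):
    """Class 1 if f(x) >= 0, class 2 otherwise."""
    return 1 if f(x, beta, beta0) >= 0 else 2


# (g1, g2) values of samples 1..6; samples 1-3 assumed +ve (y = -1).
points = [(1.0, 0.1), (0.9, 0.0), (0.8, 0.2), (0.1, 0.8), (0.2, 0.7), (0.1, 0.9)]
ys = [-1, -1, -1, 1, 1, 1]
beta, beta0 = train_perceptron(points, ys)
label = classify((0.95, 0.05), beta, beta0)
```

The epoch cap is there because the loop would otherwise never stop on data that is not linearly separable.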

SLIDE 46

Perceptron learning

  • If many solutions are possible, it does not choose between solutions
  • If data is not linearly separable, it does not terminate, and it is hard to detect.
  • Time of convergence is not well understood
SLIDE 47

Linear Discriminant analysis

  • Provides an alternative approach to classification with a linear function.
  • Project all points, including the means, onto vector β.
  • We want to choose β such that

– Difference of projected means is large.
– Variance within group is small

[Figure: +ve and -ve points in the (x1, x2) plane projected onto vector β]
SLIDE 48

LDA Cont’d

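$\tilde{m}_1 = \frac{1}{n_1} \sum_{x \in D_1} \beta^T x = \beta^T m_1$

Scatter between samples:

$|\tilde{m}_1 - \tilde{m}_2|^2 = |\beta^T (m_1 - m_2)|^2 = \beta^T S_B \beta$

Scatter within sample: $\tilde{s}_1^2 + \tilde{s}_2^2$, where the sum over y runs over the projected points $y = \beta^T x$, $x \in D_1$:

$\tilde{s}_1^2 = \sum_{y} (y - \tilde{m}_1)^2 = \sum_{x \in D_1} (\beta^T (x - m_1))^2 = \beta^T S_1 \beta$

$\tilde{s}_1^2 + \tilde{s}_2^2 = \beta^T (S_1 + S_2) \beta = \beta^T S_w \beta$

Fisher Criterion:

$\max_\beta \frac{\beta^T S_B \beta}{\beta^T S_w \beta}$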
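For two classes, the Fisher criterion is maximized by a β proportional to $S_w^{-1}(m_1 - m_2)$, a standard result the slide does not derive. A minimal two-dimensional sketch under that assumption, using $S_k = \sum_{x \in D_k} (x - m_k)(x - m_k)^T$; the function names and the reuse of the g1/g2 example as data are illustrative:

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]


def scatter(points, m):
    """2x2 scatter matrix: sum of (x - m)(x - m)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dev = [p[0] - m[0], p[1] - m[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += dev[r] * dev[c]
    return s


def fisher_direction(class1, class2):
    """beta = S_w^{-1} (m1 - m2), using the explicit 2x2 inverse."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]


class1 = [(1.0, 0.1), (0.9, 0.0), (0.8, 0.2)]
class2 = [(0.1, 0.8), (0.2, 0.7), (0.1, 0.9)]
beta = fisher_direction(class1, class2)
proj1 = [beta[0] * x + beta[1] * y for x, y in class1]
proj2 = [beta[0] * x + beta[1] * y for x, y in class2]
```

On this data the projections of the two classes do not overlap, so a single threshold on $\beta^T x$ separates them. Only the direction of β matters for the criterion, since scaling β leaves the ratio unchanged.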

SLIDE 49

Maximum Likelihood discrimination

  • Suppose we knew the distribution of points in each class.

– We can compute $\Pr(x \mid \omega_i)$ for all classes i, and take the maximum

SLIDE 50

ML discrimination

  • Suppose all the points were in 1 dimension, and all classes were normally distributed.

$\Pr(\omega_i \mid x) = \frac{\Pr(x \mid \omega_i) \Pr(\omega_i)}{\sum_j \Pr(x \mid \omega_j) \Pr(\omega_j)}$

$g_i(x) = \ln(\Pr(x \mid \omega_i)) + \ln(\Pr(\omega_i)) \cong -\frac{(x - \mu_i)^2}{2 \sigma_i^2} + \ln(\Pr(\omega_i))$

SLIDE 51

ML discrimination recipe

  • We know the distribution for each class, but not the parameters
  • Estimate the mean and variance for each class.
  • For a new point x, compute the discrimination function $g_i(x)$ for each class i.
  • Choose $\arg\max_i g_i(x)$ as the class for x
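The recipe can be sketched for one-dimensional data. This sketch uses the approximate discriminant from the previous slide, $g_i(x) \cong -(x - \mu_i)^2 / (2\sigma_i^2) + \ln \Pr(\omega_i)$, which leaves out the $-\ln \sigma_i$ term of the full Gaussian log-density. The training values, the equal priors, and the function names are assumptions made for the example:

```python
import math
import statistics


def fit_class(samples, prior):
    """Estimate the mean and (population) variance of one class."""
    return {
        "mu": statistics.mean(samples),
        "var": statistics.pvariance(samples),
        "prior": prior,
    }


def g(x, params):
    """Approximate discriminant: -(x - mu)^2 / (2 var) + ln(prior)."""
    return -((x - params["mu"]) ** 2) / (2.0 * params["var"]) + math.log(params["prior"])


def classify(x, classes):
    """Return the class name with the largest discriminant value."""
    return max(classes, key=lambda name: g(x, classes[name]))


# Hypothetical 1-D training data for two classes with equal priors.
classes = {
    "A": fit_class([1.0, 1.2, 0.8], prior=0.5),
    "B": fit_class([3.0, 3.2, 2.8], prior=0.5),
}
label_low = classify(1.1, classes)
label_high = classify(2.9, classes)
```

When the class variances differ noticeably, the omitted $-\ln \sigma_i$ term can change which class wins, so it would need to be added back in that case.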