L14 Mass Spec Quantitation MS applications Microarray analysis CSE182

LC-MS Maps
• A peptide/feature can be labeled with the triple (M,T,I):
– monoisotopic M/Z, centroid retention time, and intensity
• An LC-MS map is a collection of features
[Figure: LC-MS map with features marked as x's on m/z vs. time axes; the elution of Peptide 1 and Peptide 2 is shown along with intensity I]

Time scaling: Approach 1 (geometric matching)
• Match features based on M/Z, and (loose) time matching. Objective: $\sum_f (t_1 - t_2)^2$
• Let $t_2' = a t_2 + b$. Select a, b so as to minimize $\sum_f (t_1 - t_2')^2$
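The minimization over a and b is an ordinary least-squares line fit, so it has a closed form. The sketch below assumes the feature pairs have already been matched on M/Z; the function name `fit_time_scaling` and the sample retention times are illustrative and not from the slides.

```python
def fit_time_scaling(pairs):
    """Least-squares fit of t1 ~ a*t2 + b over matched feature pairs (t1, t2)."""
    n = len(pairs)
    mean_t1 = sum(t1 for t1, _ in pairs) / n
    mean_t2 = sum(t2 for _, t2 in pairs) / n
    # Slope = covariance(t2, t1) / variance(t2); intercept follows from the means.
    cov = sum((t2 - mean_t2) * (t1 - mean_t1) for t1, t2 in pairs)
    var = sum((t2 - mean_t2) ** 2 for _, t2 in pairs)
    if var == 0:
        raise ValueError("need at least two distinct t2 values")
    a = cov / var
    b = mean_t1 - a * mean_t2
    return a, b


# Made-up retention times that satisfy t1 = 2*t2 + 2 exactly.
pairs = [(12.0, 5.0), (22.0, 10.0), (42.0, 20.0)]
a, b = fit_time_scaling(pairs)
print(a, b)
```

A few wrong matches can pull a plain least-squares fit badly, which is presumably why the slides pair it with a matching step first.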

Geometric matching
• Make a graph. Peptide a in LCMS1 is linked to all peptides in the second run with identical m/z.
• Each edge has score proportional to $t_1/t_2$
• Compute a maximum weight matching.
• The ratio of times of the matched pairs gives a.
• Rescale and compute the scaling factor
[Figure: features of the two runs plotted on M/Z vs. T axes]

Approach 2: Scan alignment
• Each time scan is a vector of intensities.
• Two scans in different runs can be scored for similarity (using a dot product)
$S_{1i}$ = 10 5 0 0 7 0 0 2 9
$S_{2j}$ = 9 4 2 3 7 0 6 8 3
$M(S_{1i}, S_{2j}) = \sum_k S_{1i}(k)\, S_{2j}(k)$
[Figure: run 1 with scans $S_{11}, S_{12}, \ldots$ and run 2 with scans $S_{21}, S_{22}, \ldots$]
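As a quick check of the scoring function, the dot product of the two example scans can be computed directly. The helper name `scan_similarity` is an assumption made for this sketch.

```python
def scan_similarity(scan_a, scan_b):
    """Dot product M(S_1i, S_2j) of two intensity vectors over the same m/z bins."""
    if len(scan_a) != len(scan_b):
        raise ValueError("scans must use the same m/z binning")
    return sum(a * b for a, b in zip(scan_a, scan_b))


s_1i = [10, 5, 0, 0, 7, 0, 0, 2, 9]
s_2j = [9, 4, 2, 3, 7, 0, 6, 8, 3]
score = scan_similarity(s_1i, s_2j)
print(score)  # 90 + 20 + 49 + 16 + 27 = 202
```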

Scan Alignment
• Compute an alignment of the two runs
• Let W(i,j) be the score of the best alignment of the first i scans in run 1, and first j scans in run 2
$$W(i,j) = \max \begin{cases} W(i-1,j-1) + M[S_{1i}, S_{2j}] \\ W(i-1,j) + \ldots \\ W(i,j-1) + \ldots \end{cases}$$
• Advantage: does not rely on feature detection.
• Disadvantage: Might not handle affine shifts in time scaling, but is better for local shifts
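The recurrence can be filled in with a standard dynamic-programming table. The slide elides the terms added when a scan is skipped, so the constant `gap` score used below is purely an assumption of this sketch, as are the function names and the toy scans.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def align_runs(run1, run2, gap=0.0):
    """Best alignment score W(n, m) for two runs, each a list of scan vectors.

    `gap` is a hypothetical score for leaving a scan unmatched; the slide
    does not say what is added in those two cases.
    """
    n, m = len(run1), len(run2)
    W = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        W[i][0] = W[i - 1][0] + gap
    for j in range(1, m + 1):
        W[0][j] = W[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            W[i][j] = max(
                W[i - 1][j - 1] + dot(run1[i - 1], run2[j - 1]),  # match scan i with scan j
                W[i - 1][j] + gap,                                # skip scan i of run 1
                W[i][j - 1] + gap,                                # skip scan j of run 2
            )
    return W[n][m]


# Toy runs: run 2 has one extra empty scan at the start (a local time shift).
run1 = [[1, 0], [0, 1]]
run2 = [[0, 0], [1, 0], [0, 1]]
best = align_runs(run1, run2)
print(best)
```

With raw dot products and a zero gap score, the alignment simply skips the empty scan and matches the remaining two, giving a score of 2.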

Chemistry-based methods for comparing peptides

ICAT
• The reactive group attaches to Cysteine
• Only Cys-peptides will get tagged
• The biotin at the other end is used to pull down peptides that contain this tag.
• The X is either Hydrogen, or Deuterium (Heavy)
– Difference = 8 Da

ICAT
[Figure (Nat. Biotechnol. 17: 994-999, 1999): ICAT workflow. Protein preps (membrane, cytosolic) are fractionated from two cell states ("Normal", "diseased"); proteins from cell state 1 are labeled with heavy ICAT and proteins from cell state 2 with light ICAT; the samples are combined and proteolysed, and the ICAT-labeled peptides are isolated.]
• ICAT reagent is attached to particular amino-acids (Cys)
• Affinity purification leads to simplification of complex mixture

Differential analysis using ICAT
• ICAT pairs (heavy, light) appear at a known distance
[Figure: heavy and light features of each pair on M/Z vs. Time axes]
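Pairing at a known distance can be sketched as a search for two features whose m/z values differ by 8 Da divided by the charge and that elute at about the same time. The 8 Da spacing comes from the ICAT slide; the tolerances, the `(mz, rt, intensity)` tuple layout, and the feature values are assumptions made for illustration.

```python
ICAT_MASS_DIFF = 8.0  # Da between heavy and light tags, as stated on the ICAT slide


def find_icat_pairs(features, charge=1, mz_tol=0.02, rt_tol=0.5):
    """Return (light, heavy, heavy/light intensity ratio) for candidate ICAT pairs.

    features: list of (mz, retention_time, intensity) tuples.
    mz_tol and rt_tol are hypothetical tolerances, not values from the slides.
    """
    expected = ICAT_MASS_DIFF / charge  # spacing on the m/z axis
    pairs = []
    for light in features:
        for heavy in features:
            same_spacing = abs((heavy[0] - light[0]) - expected) <= mz_tol
            co_eluting = abs(heavy[1] - light[1]) <= rt_tol
            if same_spacing and co_eluting and light[2] > 0:
                pairs.append((light, heavy, heavy[2] / light[2]))
    return pairs


# Made-up features: the first two form a doubly charged pair (8 / 2 = 4 m/z apart).
features = [(500.00, 30.0, 1000.0), (504.00, 30.1, 2000.0), (610.25, 45.0, 500.0)]
pairs = find_icat_pairs(features, charge=2)
print(pairs)
```

The intensity ratio of each pair is the relative abundance of the peptide in the two samples.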

ICAT issues
• The tag is heavy, and decreases the dynamic range of the measurements.
• The tag might break off
• Only Cysteine containing peptides are retrieved
• Non-specific binding to streptavidin

Serum ICAT data
MA13_02011_02_ALL01Z3I9A* Overview (exhibits ‘stack-ups’)

Serum ICAT data
• Instead of pairs, we see entire clusters at 0, +8, +16, +22
• ICAT based strategies must clarify ambiguous pairing.
[Figure: stacked clusters; labels 0, 8, 16, 22, 24, 30, 32, 38, 40, 46]

ICAT problems
• Tag is bulky, and can break off.
• Cys is low abundance
• MS2 analysis to identify the peptide is harder.

SILAC
• A novel stable isotope labeling strategy
• Mammalian cell-lines do not ‘manufacture’ all amino-acids. Where do they come from?
• Labeled amino-acids are added to amino-acid deficient culture, and are incorporated into all proteins as they are synthesized
• No chemical labeling or affinity purification is performed.
• Leucine was used (10% abundance vs 2% for Cys)

SILAC vs ICAT
Ong et al. MCP, 2002
• Leucine is higher abundance than Cys
• No affinity tagging done
• Fragmentation patterns for the two peptides are identical
– Identification is easier

Incorporation of Leu-d3 at various time points
• Doubling time of the cells is 24 hrs.
• Peptide = VAPEEHPVLLTEAPLNPK
• What is the charge on the peptide?
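One way to approach the charge question: each Leu-d3 carries three deuteriums, so a fully labeled peptide is heavier by about 3 Da per leucine, and the labeled and unlabeled peaks sit (3 × number of Leu) / z apart on the m/z axis. The sketch below tabulates that spacing for a few charges. It assumes a nominal 3 Da shift per leucine and full incorporation; the observed spacing from the spectrum is not in this transcript, so the charge itself is left open.

```python
LEU_D3_SHIFT = 3.0  # nominal Da added per Leu-d3 residue (assumed nominal mass shift)


def labeled_mz_shift(peptide, charge):
    """Expected m/z spacing between unlabeled and fully Leu-d3 labeled peptide."""
    n_leu = peptide.count("L")
    return n_leu * LEU_D3_SHIFT / charge


peptide = "VAPEEHPVLLTEAPLNPK"
n_leu = peptide.count("L")
shifts = {z: labeled_mz_shift(peptide, z) for z in (1, 2, 3)}
print(n_leu, shifts)
```

Comparing the spacing read off the spectrum with this table gives the charge z.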

Quantitation on controlled mixtures

Identification
• MS/MS of differentially labeled peptides

Peptide Matching
• Computational: Under identical Liquid Chromatography conditions, peptides will elute in the same order in two experiments.
– These peptides can be paired computationally
• SILAC/ICAT allow us to compare relative peptide abundances in a single run using an isotope tag.

MS quantitation Summary
• A peptide elutes over a mass range (isotopic peaks), and a time range.
• A ‘feature’ defines all of the peaks corresponding to a single peptide.
• Matching features is the critical step to comparing relative intensities of the same peptide in different samples.
• The matching can be done chemically (isotope tagging), or computationally (LCMS map comparison)

Biol. Data analysis: Review
[Figure: overview of topics: Assembly, Protein Sequence Analysis, Sequence Analysis/Gene Finding, DNA signals]

Other static analysis is possible
[Figure: overview of topics: Genomic Analysis/Pop. Genetics, Assembly, Protein Sequence Analysis, Sequence Analysis, Gene Finding, ncRNA]

A Static picture of the cell is insufficient
• Each Cell is continuously active,
– Genes are being transcribed into RNA
– RNA is translated into proteins
– Proteins are post-translationally modified and transported
– Proteins perform various cellular functions
• Can we probe the Cell dynamically?
– Which transcripts are active?
– Which proteins are active?
– Which proteins interact?
[Figure labels: Gene Regulation, Proteomic profiling, Transcript profiling]

Micro-array analysis

The Biological Problem
• Two conditions that need to be differentiated (they have different treatments).
• Ex: ALL (Acute Lymphocytic Leukemia) & AML (Acute Myelogenous Leukemia)
• Possibly, the set of expressed genes is different in the two conditions

Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50 genes most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent dataset. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. The expression level of each gene in the independent dataset is shown relative to the mean of expression levels for that gene in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in ALL, the bottom panel shows genes more highly expressed in AML.

Gene Expression Data
• Gene Expression data:
– Each row corresponds to a gene
– Each column corresponds to an experiment (sample); each entry is an expression value
• Can we separate the experiments into two or more classes?
• Given a training set of two classes, can we build a classifier that places a new experiment in one of the two classes?
[Figure: expression matrix with genes g as rows and samples $s_1, s_2, \ldots$ as columns]

Three types of analysis problems
• Cluster analysis/unsupervised learning
• Classification into known classes (Supervised)
• Identification of “marker” genes that characterize different tumor classes

Supervised Classification: Basics
• Consider genes $g_1$ and $g_2$
– $g_1$ is up-regulated in class A, and down-regulated in class B.
– $g_2$ is down-regulated in class A, and up-regulated in class B.
• Intuitively, $g_1$ and $g_2$ are effective in classifying the two classes. The samples are linearly separable.

Sample   1    2    3    4    5    6
$g_1$    1    .9   .8   .1   .2   .1
$g_2$    .1   0    .2   .8   .7   .9

[Figure: the six samples plotted in the $g_1$-$g_2$ plane]

Basics
• With 3 genes, a plane is used to separate (linearly separable samples). In higher dimensions, a hyperplane is used.

Non-linear separability
• Sometimes, the data is not linearly separable, but can be separated by some other function
• In general, the linearly separable problem is computationally easier.

Formalizing the classification problem for micro-arrays
• Each experiment (sample) is a vector v of expression values.
– By default, all vectors v are column vectors.
– $v^T$ is the transpose of a vector
• The genes are the dimensions of a vector.
• Classification problem: Find a surface that will separate the classes

Formalizing Classification
• Classification problem: Find a surface (hyperplane) that will separate the classes
• Given a new sample point, its class is then determined by which side of the surface it lies on.
• How do we find the hyperplane? How do we find the side that a point lies on?

Sample   1    2    3    4    5    6
$g_1$    1    .9   .8   .1   .2   .1
$g_2$    .1   0    .2   .8   .7   .9

[Figure: the six samples plotted in the $g_1$-$g_2$ plane]
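The "which side" question can be illustrated on the six samples in the table: take a normal vector β and an offset, and look at the sign of $\beta^T x$ minus the offset. In the sketch below the line $g_1 = g_2$ (β = (1, -1), offset 0) was picked by inspecting the table, not learned, and calling the positive side "A" is a labeling assumption.

```python
def side(beta, offset, x):
    """Return 'A' if x lies on the positive side of the hyperplane beta.x = offset, else 'B'."""
    score = sum(b * xi for b, xi in zip(beta, x)) - offset
    return "A" if score > 0 else "B"


# (g1, g2) expression values for samples 1..6, copied from the table.
samples = {
    1: (1.0, 0.1),
    2: (0.9, 0.0),
    3: (0.8, 0.2),
    4: (0.1, 0.8),
    5: (0.2, 0.7),
    6: (0.1, 0.9),
}

beta = (1.0, -1.0)  # hand-picked normal vector: separates along g1 = g2
offset = 0.0
labels = [side(beta, offset, samples[s]) for s in sorted(samples)]
print(labels)
```

Samples 1 to 3 fall on one side and samples 4 to 6 on the other, which is what linear separability means for this table.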

Basic geometry
• What is $\|x\|_2$?
• What is $x/\|x\|$?
• Dot product? For $x = (x_1, x_2)$ and $y = (y_1, y_2)$, with $\theta_x$ and $\theta_y$ the angles of x and y:
$$x^T y = x_1 y_1 + x_2 y_2 = \|x\| \|y\| \cos\theta_x \cos\theta_y + \|x\| \|y\| \sin\theta_x \sin\theta_y = \|x\| \|y\| \cos(\theta_x - \theta_y)$$
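The identity can be checked numerically. This sketch uses two arbitrary example vectors (not from the slides) and measures the angles from the first axis with `math.atan2`.

```python
import math


def norm(v):
    """Euclidean length ||v||."""
    return math.sqrt(sum(c * c for c in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x = (3.0, 4.0)
y = (1.0, 2.0)
theta_x = math.atan2(x[1], x[0])
theta_y = math.atan2(y[1], y[0])

lhs = dot(x, y)                                         # x1*y1 + x2*y2
rhs = norm(x) * norm(y) * math.cos(theta_x - theta_y)   # ||x|| ||y|| cos(theta_x - theta_y)
unit_x = tuple(c / norm(x) for c in x)                  # x / ||x|| has length 1
print(lhs, rhs, norm(unit_x))
```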

End of L14

Dot Product
• Let β be a unit vector.
– $\|\beta\| = 1$
• Recall that
– $\beta^T x = \|x\| \cos\theta$
• What is $\beta^T x$ if x is orthogonal (perpendicular) to β?
[Figure: vector x at angle θ to the unit vector β]
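Since $\cos 90^\circ = 0$, the dot product with a perpendicular vector vanishes, while for a vector along β it equals the vector's length. A small numeric illustration, with vectors chosen only for this example:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


beta = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector along the diagonal
x_along = (2.0, 2.0)                         # parallel to beta (theta = 0)
x_perp = (-1.0, 1.0)                         # perpendicular to beta (theta = 90 degrees)

proj_along = dot(beta, x_along)  # equals ||x_along||
proj_perp = dot(beta, x_perp)    # equals 0
print(proj_along, proj_perp)
```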

Hyperplane
• How can we define a hyperplane L?
• Find the unit vector that is perpendicular (normal) to the hyperplane
