Pathways analysis in proteomics
Angela Bachi
Dibit-San Raffaele Scientific Institute, Milano
The input is the expression proteomics data and the output is the list of activated or dominant pathways in a given sample.
Aim
- To generate non-trivial functional hypotheses on biological systems
- To define disease biomarkers among pathways or pathway patterns instead of single molecules
- To rationalize how molecules interact in ‘molecular pathways’, i.e. chains of chemical reactions or physical interactions in which the product of one reaction becomes the reactant of the next
- To construct for each organism the minimum set of linearly independent (orthogonal) pathways, non-negative linear combinations of which represent all potential steady states of the organism
Differential Analysis of Promonocytic U937 “Plus” And “Minus” Cell Clones
The promonocytic U937 cell line is used as an in vitro model of HIV infection. Two different clones of U937 have been described and defined as “Plus” and “Minus” according to their efficiency or inefficiency in supporting productive HIV-1 infection. The aim of this study was to investigate the whole proteome of the Plus (10) and Minus (34) clones in order to detect potential qualitative and quantitative differences at the protein expression level, and so to unravel protein correlates of efficient or inefficient HIV replication.
Known differences at morphological, proliferative and molecular levels
Expression of common and differential myelomonocytic Ag in U937 Plus and Minus cell clones
Cellular factors differentially expressed by Plus and Minus U937 cell clones
- SINGLE-STEP ANALYSIS
About 250 ng of tryptic digest was injected into the nLC and separated with a 194-minute gradient.
[Figure: base peak chromatogram of run AA1_7LM (relative abundance vs. time, RT 8.95-77.50 min) and MS/MS spectrum of precursor m/z 681.33 (scan #746, RT 25.55 min; relative abundance vs. m/z)]
MS at high resolution; top 5 MS/MS at low resolution
- SINGLE-STEP ANALYSIS
About 250 ng of tryptic digest was injected into the nLC and separated with a 180-minute gradient. Mass spectra were acquired twice, with two partially overlapping mass ranges:
- 300-900 (low mass)
- 750-1600 (high mass)
U937_10 U937_10bis U937_34 U937_34bis 12415 10689 Number of proteins (bold red and ion score more than 20) 664 733 752 680 13708 10510 Number of queries
Mass ranges combined. Mascot and X! Tandem searches: 5 ppm for the precursor, 0.5 Da for fragments. At least 2 peptides at 95%, protein probability >99%.
More than 700 proteins identified
Expression Proteomics
- Fast (2-4h)
- Reliable (more than 2 peptides,>99%)
- Quantitative (R2 around 0.9)
- Specific (FDR< 1%)
- Deep (dynamic range >1:20)
- Sensitive (100 ng proteins)
- Comprehensive (>500 proteins )
- Inexpensive (200 euro)
The QUANTI analysis:
The MASCOT database search is compared to the chromatographic run to get the abundance of each single peptide identified.
[Figure: MASCOT result, MS of a peptide (accurate mass, sum of isotopic peaks) and its chromatographic peak (retention time), giving the abundance of the peptide]
Peptide QUANTIfication
SEQUENCE | RT MASCOT | RT APEX | MZ MASCOT | MZ QUANTI | IPI | MASCOT SCORE | INTENSITY FULLINT
R.YESLTDPSKLDSGK.E | 91.200607 | 84.407524 | 770.381287 | 770.379028 | IPI00784295 | 61 | 212771.6406
K.HLEINPDHSIIETLR.Q | 86.204216 | 86.17421 | 893.976074 | 893.978271 | IPI00784295 | 64 | 65058548
K.VILHLKEDQTEYLEER.R | 83.876076 | 83.714897 | 1008.028015 | 1008.027954 | IPI00784295 | 70 | 10953614
K.DLVILLYETALLSSGFSLEDPQTHANR.I | 133.445953 | 131.745605 | 1001.520325 | 1001.524231 | IPI00784295 | 137 | 203456512
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNK.E | 173.490768 | 173.519012 | 1560.401001 | 1560.397095 | IPI00784295 | 54 | 182668.7188
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLR.E | 177.030182 | 176.660629 | 1335.139771 | 1335.140381 | IPI00784295 | 26 | 87048.85938
K.TLNDELEIIEGMK.F | 82.980316 | 77.69648 | 752.882202 | 752.881042 | IPI00784154 | 96 | 1714931.5
R.ALMLQGVDLLADAVAVTMGPK.G | 125.89357 | 126.296585 | 1057.073608 | 1057.074341 | IPI00784154 | 98 | 118192832
R.TALLDAAGVASLLTTAEVVVTEIPK.E | 201.519104 | 95.73555 | 828.140686 | 828.142761 | IPI00784154 | 69 | 1631606.125
K.VVIGMDVAASEFFR.S | 122.195076 | 119.691261 | 770.89563 | 770.895142 | IPI00465248 | 115 | 32162.14258
K.IDKLMIEMDGTENK.S | 105.149544 | 103.420265 | 818.901428 | 818.9021 | IPI00465248 | 56 | 147749.8281
R.AAVPSGASTGIYEALELR.D | 92.408096 | 92.50663 | 902.976379 | 902.978149 | IPI00465248 | 93 | 456041760
K.LAMQEFMILPVGAANFR.E | 101.599594 | 101.737404 | 954.499329 | 954.499756 | IPI00465248 | 70 | 258961744
K.FTASAGIQVVGDDLTVTNPK.R | 112.769814 | 117.65992 | 1017.031128 | 1017.029846 | IPI00465248 | 88 | 78492.90625
K.FTASAGIQVVGDDLTVTNPKR.I | 88.349304 | 88.265793 | 1095.084839 | 1095.081543 | IPI00465248 | 111 | 9042372
Protein QUANTIfication
The abundance of a protein is the sum of the intensities of every peak belonging to that protein:
IPI          Abundance
IPI00784295  279951163.2
IPI00784154  121539369.6
IPI00465248  724304280.9
…            …
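The protein-level sum can be checked against the peptide table above. The following is a minimal Python sketch, not the QUANTI implementation: the function name `protein_abundance` is hypothetical, and the (IPI, INTENSITY FULLINT) pairs are copied from the peptide table.

```python
from collections import defaultdict

# (IPI accession, INTENSITY FULLINT) pairs copied from the peptide table.
peptides = [
    ("IPI00784295", 212771.6406),
    ("IPI00784295", 65058548),
    ("IPI00784295", 10953614),
    ("IPI00784295", 203456512),
    ("IPI00784295", 182668.7188),
    ("IPI00784295", 87048.85938),
    ("IPI00784154", 1714931.5),
    ("IPI00784154", 118192832),
    ("IPI00784154", 1631606.125),
    ("IPI00465248", 32162.14258),
    ("IPI00465248", 147749.8281),
    ("IPI00465248", 456041760),
    ("IPI00465248", 258961744),
    ("IPI00465248", 78492.90625),
    ("IPI00465248", 9042372),
]

def protein_abundance(peptide_rows):
    """Sum the peptide intensities for each protein accession (hypothetical helper)."""
    totals = defaultdict(float)
    for ipi, intensity in peptide_rows:
        totals[ipi] += intensity
    return dict(totals)

abundance = protein_abundance(peptides)
for ipi, total in abundance.items():
    print(f"{ipi}\t{total:.1f}")
```

Rounded to one decimal, the three sums agree with the protein abundances listed on the slide.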
SINGLE-STEP ANALYSIS
Total lysates digested in solution: 250 ng of digested proteins separated by LC through a very long gradient.
- Analysis in duplicate
- MS acquired twice, with two different and partially overlapping mass ranges (300-900 and 750-1600)
- Mass spectra summed and submitted to database searching as a whole; MASCOT and X! Tandem algorithms for database search
More than 700 proteins identified and quantitated:
- 19 unique to clone 10
- 47 unique to clone 34
- 64 up-regulated and 27 down-regulated in clone 34
GO analysis
PSE (Pathway Search Engine): Zubarev et al., J Proteomics, 2008
Protein names and protein abundances are loaded. Two analyses can be performed, direct and TF-mediated, but both pass through the Key Node filtering step (key nodes are signaling molecules found on pathway intersections in the upstream vicinity of the genes from the input list).
The resultant sets of genes are compared and their intersection is mapped on a pathway database. Each found key-node receives a score reflecting its connectivity, i.e. how many input- list genes are reached and the proximities to the reached genes. Key-nodes with the highest connectivity (highest score) are then selected, and downstream genes are chosen as a subset for subsequent mapping onto the pathways.
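The idea of a connectivity score can be illustrated with a toy example. The sketch below is not the ExPlain or PSE scoring function, which the slides do not specify. It assumes a hypothetical network (`network`), hypothetical gene names, and an illustrative score in which every input-list gene reached within `max_steps` contributes 1/distance, so that more reached genes and shorter distances both raise the score.

```python
from collections import deque

# Hypothetical toy network: edges point downstream (regulator -> target).
network = {
    "K1": ["A", "B", "M1"],
    "M1": ["C"],
    "K2": ["C"],
    "A": [], "B": [], "C": [],
}
input_genes = {"A", "B", "C"}  # hypothetical input list

def downstream_distances(graph, start, max_steps):
    """Breadth-first search: fewest steps from start to each node within max_steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the search radius
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def key_node_score(graph, candidate, genes, max_steps=3):
    """Illustrative score (an assumption): each reached input gene adds 1/distance."""
    dist = downstream_distances(graph, candidate, max_steps)
    return sum(1.0 / dist[g] for g in genes if g in dist and dist[g] > 0)

scores = {k: key_node_score(network, k, input_genes) for k in ("K1", "M1", "K2")}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked[0])
```

In this toy network, K1 reaches all three input genes and therefore ranks first; any real scoring scheme may weight reach and proximity differently.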
Load Proteins list and abundance to ExPlain
IPI no.      Abundance
IPI00003362  433576562
IPI00784154  104565394
IPI00021439  620550130
IPI00010796  188395124
IPI00784347  75332303
IPI00604784  1741269646
IPI00465028  149096200
IPI00019502  98212724
IPI00788958  82243515
IPI00396378  101936146
IPI00169383  143714167
IPI00027720  115625859
IPI00215743  35239791
IPI00219018  193798265
IPI00554648  217364644
IPI00021405  80042812
IPI00030363  66232299
IPI00291006  98192134
IPI00003865  130129714
IPI00303476  91854550
IPI00465248  176033959
IPI00021428  86977256
IPI00013808  47563661
Proteins converted to genes
KeyNodes Analysis
[Screenshot columns: KeyNode Score, KeyNode Name]
KeyNode Score = connectivity
Key Node = signalling molecule found on pathway intersections in the upstream vicinity of the genes from the input list
Mapping of signaling molecules onto pathways
Pathway Score
Pathway name                     Score
EGF pathway                      4.38
stress-associated pathways       3.45
E2F network                      3.34
Caspase network                  3.28
insulin pathway                  2.80
p53 pathway                      2.71
T-cell antigen receptor pathway  2.54
JNK pathway                      2.42
Fas pathway                      2.40
PRL pathway                      2.29
B-cell antigen receptor pathway  2.22
TGFbeta pathway                  2.16
RANKL pathway                    2.06
Sphase(Cdk2)                     2.03
Epo pathway                      1.86
TLR4 pathway                     1.85
IL-1 pathway                     1.83
G1phase(Cdk2)                    1.80
.htm file of KeyNode list + .htm file of pathway list
Score
Pathway score = ∑KeyNode scores
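As a minimal illustration of this sum, the Python sketch below adds up the scores of the key nodes mapped onto each pathway. All key-node names, scores and pathway memberships here are hypothetical placeholders, not values from the analysis.

```python
# Hypothetical key-node scores and pathway memberships (not taken from the slides).
key_node_scores = {"KN1": 1.5, "KN2": 0.75, "KN3": 0.5}
pathway_members = {
    "pathway A": ["KN1", "KN2"],
    "pathway B": ["KN2", "KN3"],
}

def pathway_score(members, scores):
    """Pathway score = sum of the scores of the key nodes found on the pathway."""
    return sum(scores[k] for k in members if k in scores)

pathway_scores = {
    name: pathway_score(members, key_node_scores)
    for name, members in pathway_members.items()
}
# Pathways sorted from highest to lowest score, as in a pathway list.
for name in sorted(pathway_scores, key=pathway_scores.get, reverse=True):
    print(name, pathway_scores[name])
```

A key node shared by two pathways (KN2 here) contributes to both sums in this sketch; whether the real tool does the same is not stated in the slides.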
Sample
Score  Temp Score  Rank
4.32   -0.41       19
4.10    3.87        1
2.93   -0.76       22
2.67    0.65        5
2.63   -0.49       20
2.52    1.48        2
2.27   -1.00       23
2.17    1.01        3
2.12   -0.34       17
2.08    0.09        6
2.04    0.85        4
2.04   -0.38       18
1.99   -0.10       10
1.78   -0.15       13
1.76   -0.15       14
1.75   -0.11       11
1.74   -0.18       15
1.66   -0.63       21
1.58   -0.12       12
1.41   -0.08        9
1.40   -0.22       16
1.29    0.00        8
1.27    0.01        7
Ranked by Temp Score
Control
Pathway name             Score
EGF pathway              4.41
E2F network              3.01
Stress pathways          3.18
p53 pathway              2.41
insulin pathway          2.81
Sphase(Cdk2)             1.84
Caspase network          2.67
G1phase(Cdk2)            1.64
PRL pathway              2.28
TGFbeta pathway          2.03
G2/Mphase(cyclinB:Cdk1)  1.57
JNK pathway              2.22
RANKL pathway            2.04
Epo pathway              1.86
IL-1 pathway             1.85
PDGF pathway             1.81
TLR4 pathway             1.84
Fas pathway              2.00
TLR3 pathway             1.66
IL-1beta → p50:RelA      1.47
p38 pathway              1.55
tuberin pathway          1.29
VEGF-A pathway           1.26
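The Rank column of the Sample table is the position of each pathway when the Temp Scores are sorted from highest to lowest. The short Python sketch below reproduces it from the Temp Score values on the slide. The helper name `rank_descending` is hypothetical, and breaking ties in table order is an assumption that happens to match the two -0.15 entries.

```python
# Temp Score and Rank columns copied from the Sample table, in table order.
temp_scores = [-0.41, 3.87, -0.76, 0.65, -0.49, 1.48, -1.00, 1.01, -0.34, 0.09,
               0.85, -0.38, -0.10, -0.15, -0.15, -0.11, -0.18, -0.63, -0.12,
               -0.08, -0.22, 0.00, 0.01]
listed_ranks = [19, 1, 22, 5, 20, 2, 23, 3, 17, 6, 4, 18, 10, 13, 14, 11, 15,
                21, 12, 9, 16, 8, 7]

def rank_descending(values):
    """Rank 1 = highest value; sorted() is stable, so ties keep their table order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

ranks = rank_descending(temp_scores)
print(ranks == listed_ranks)
```

The computed ranks coincide with the Rank column, which confirms that the ranking is by Temp Score and not by the raw Score column.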
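Each Sample/Control pair gives one rank per pathway. With 2 Samples and 2 Controls there are 4 Sample/Control pairs, hence 4 ranks for each pathway:
$$\text{e-value} = \frac{R_1 \times R_2 \times R_3 \times R_4}{N^3}$$
($R_i$ = rank of the pathway in pair $i$, $N$ = total number of pathways)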
Final Score:
$$\text{Final Score} = \log \frac{(N - R_1)(N - R_2)(N - R_3)(N - R_4)}{R_1 \times R_2 \times R_3 \times R_4}$$
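Both quantities can be sketched in a few lines of Python. This is a reading of the slide, not the authors' code: the final score is taken as the logarithm of the product of the (N - R) terms divided by the product of the ranks, and base 10 is an assumption since the slide only says "log". The function names are hypothetical.

```python
import math

def e_value(ranks, n_pathways):
    """e-value for four ranks: R1*R2*R3*R4 / N**3, as on the slide."""
    r1, r2, r3, r4 = ranks
    return r1 * r2 * r3 * r4 / n_pathways ** 3

def final_score(ranks, n_pathways):
    """Assumed reading: log10 of prod(N - R_i) / prod(R_i).

    Undefined when any rank equals N (the numerator becomes zero).
    """
    numerator = math.prod(n_pathways - r for r in ranks)
    denominator = math.prod(ranks)
    return math.log10(numerator / denominator)

# Hypothetical example with N = 100 pathways.
print(e_value((1, 1, 1, 1), 100))
print(final_score((1, 1, 1, 1), 100))      # consistently top-ranked: positive
print(final_score((50, 50, 50, 50), 100))  # mid-ranked
print(final_score((99, 99, 99, 99), 100))  # consistently bottom-ranked: negative
```

Under this reading a pathway that ranks near the top in all four pairs gets a large positive score and one that ranks near the bottom gets a large negative score, which matches the up-regulated and down-regulated sides of the final-score distribution.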
Comparing pathways from different samples
Identify up/down regulated pathways
Pathway List X 4 (2 Controls and 2 Samples)
Final Score  Sample  Random
-8           1
-7           1
-6           4
-5           4
-4           8       3
-3           15      6
-2           27      25
-1           45      67
0            81      88
1            79      80
2            41      54
3            25      29
4            10      8
5            6       1
6            5
7            3
8            3
9            1
10           1
11           1
[Figure: histogram of the number of pathways per Final Score for Sample and Random; down-regulated pathways on the left, up-regulated pathways on the right]
Distributions of final score of pathways
U937_34Long Vs U937_10Long
Top 3 Outliers (> 3 σ)
Two pathways are dominant in the Minus clone, Fas and JNK, while IFN is dominant in the Plus clone.
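One way to apply a 3 σ cut-off to the final-score histogram is sketched below in Python. The slide does not say which distribution σ refers to. This sketch assumes it is the Random distribution, and it works on the integer-binned counts read from the slide (the row for score 0 is inferred, since its label is missing in the extracted text). The counts it produces are therefore only indicative and need not match the outliers selected on the slide.

```python
import math

# Final-score histogram read from the slide: score -> (sample count, random count).
histogram = {
    -8: (1, 0), -7: (1, 0), -6: (4, 0), -5: (4, 0), -4: (8, 3), -3: (15, 6),
    -2: (27, 25), -1: (45, 67), 0: (81, 88), 1: (79, 80), 2: (41, 54),
    3: (25, 29), 4: (10, 8), 5: (6, 1), 6: (5, 0), 7: (3, 0), 8: (3, 0),
    9: (1, 0), 10: (1, 0), 11: (1, 0),
}

def weighted_mean_sd(counts):
    """Mean and population standard deviation of binned scores."""
    total = sum(counts.values())
    mean = sum(score * c for score, c in counts.items()) / total
    variance = sum(c * (score - mean) ** 2 for score, c in counts.items()) / total
    return mean, math.sqrt(variance)

sample_counts = {score: s for score, (s, _) in histogram.items()}
random_counts = {score: r for score, (_, r) in histogram.items()}

# Assumption: sigma is estimated from the Random distribution.
mean, sd = weighted_mean_sd(random_counts)
low, high = mean - 3 * sd, mean + 3 * sd

n_up = sum(c for score, c in sample_counts.items() if score > high)
n_down = sum(c for score, c in sample_counts.items() if score < low)
print(f"random mean={mean:.2f}, sd={sd:.2f}, cut-offs=({low:.2f}, {high:.2f})")
print(f"sample pathways above cut-off: {n_up}, below cut-off: {n_down}")
```

Both columns of the histogram sum to the same number of pathways, which is a useful sanity check on the reconstructed rows before any threshold is applied.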
RESULTS
Up to now, the analysis of the two clones showed differences at the morphological, proliferative and molecular levels (such as the expression of surface antigens, proteases, and membrane receptors). A new dimension has been added by pathways analysis, but it needs to be validated.
Involved pathways:
- Fas and JNK activated in U937 clone 34
- IFN upregulated in U937 clone 10
To confirm this hypothesis, different antibodies against elements of these pathways have been used in western blot analysis. In particular:
- anti-casp3 (cleaved form) (Fas pathway)
- anti-pJNK (active form) (JNK pathway)
- anti-CASP8 (Fas pathway)
Testing the hypothesis
[Western blot: anti-caspase-8; lanes #34, #34, #10, #10]
Fas pathway is activated in clone 34
The JNK family includes JNK1 (four isoforms), JNK2 (four isoforms), and JNK3 (two isoforms). JNKs are activated by MAP2 kinases. The activated JNK/SAPK translocate to the nucleus, where they phosphorylate transcription factors such as c-Jun, Elk1 and DPC4.
[Diagram: JNK signalling across membrane, cytoplasm and nucleus. Stimuli: growth factors, UV, inflammatory cytokines, cellular stress, γ radiation. Components: GTPases, MAPKKKs, MKK4/7, JNK (JNK1/3), scaffold proteins JIP1, JIP2, JIP3, transcription factors (TF). Outcome: transcription (growth, differentiation, survival, apoptosis)]
Testing the hypothesis
JNK1 is activated in clone 34, while JNK2/3 are activated in clone 10
[Western blot: anti-pSAPK/JNK (MW markers 62 and 47.5) with actin as loading control (MW markers 47.5 and 32.5); lanes #10, #34, #34, #10]
Expected MW: approx. 46 kDa (pJNK1) and 54 kDa (pJNK2/3)
Where are we?
We can, in one experiment from 1×10^6 cells, identify and relatively quantitate (one state vs. the other) about 500-1000 proteins.
Pathway Search Engine
Dominant pathway identification
What do we need?
- Development of a ‘global’ kinetic model of cell regulation (linking the observed protein abundances and the classical signaling pathways)
Problems to overcome:
- The lack of the orthogonal pathway basis set
- The strong nonlinearity of biological systems
- The variability in the definitions of even major pathways across biological databases
- Are our data suitable? (We are limited to key nodes because we see a partial set.)
- Is our quantitation/dynamic description suitable?
What is missing?
- Is our data set informative and big enough?
- PTMs
- How can we select the crucial protein in a pathway?
- ......
Thanks to
- Roman Zubarev & Eva Fung, Uppsala University, Sweden
- Guido Poli, AIDS Immunopathogenesis Unit
- Umberto Restuccia