Pathways analysis in proteomics, Angela Bachi, Dibit-San Raffaele

Oct 15, 2023

SLIDE 1

Angela Bachi Dibit-San Raffaele Scientific Institute, Milano

Pathways analysis in proteomics

SLIDE 2

Pathways analysis in proteomics

The input is the expression proteomics data and the output is the list of activated or dominant pathways in a given sample.

SLIDE 3

Aim

To generate non-trivial functional hypotheses on biological systems

To define disease biomarkers among pathways or pathway patterns instead of single molecules

To rationalize how molecules interact in ‘molecular pathways’, i.e. chains of chemical reactions or physical interactions in which the product of one reaction becomes the reactant of the next

To construct for each organism the minimum set of linearly independent (orthogonal) pathways, non-negative linear combinations of which represent all potential steady states of the organism

SLIDE 4

Differential Analysis of Promonocytic U937 “Plus” And “Minus” Cell Clones

The promonocytic U937 cell line is used as an in vitro model of HIV infection. Two different clones of U937 have been described and defined as “Plus” and “Minus” according to their efficiency or inefficiency in supporting productive HIV-1 infection. The aim of this study was to investigate the whole proteome of the Plus (10) and Minus (34) clones in order to detect potential qualitative/quantitative differences at the protein expression level and so unravel protein correlates of efficient/inefficient HIV replication.

SLIDE 5

Known differences at morphological, proliferative and molecular levels

Expression of common and differential myelomonocytic Ag in U937 Plus and Minus cell clones

Cellular factors differentially expressed by Plus and Minus U937 cell clones

SLIDE 6

SINGLE-STEP ANALYSIS

About 250 ng of tryptic digest were injected into the nLC and separated with a 194-minute gradient.

[Figure: base peak chromatogram of run AA1_7LM, acquired 9/28/2007 (relative abundance vs. time, RT 8.95-77.50 min, NL 8.43E4), and MS/MS spectrum of scan #746 (RT 25.55 min, precursor m/z 681.33, CID 35, NL 1.79E4; relative abundance vs. m/z, range 175-1375)]

MS: high resolution; Top 5 MS/MS: low resolution

SLIDE 7

SINGLE-STEP ANALYSIS

About 250 ng of tryptic digest were injected into the nLC and separated with a 180-minute gradient. Mass spectra were acquired twice, with two partially overlapping mass ranges:

300-900 (low mass)
750-1600 (high mass)

                                                           U937_10   U937_10bis   U937_34   U937_34bis
Number of proteins (bold red and ion score more than 20)   664       733          752       680

Number of queries: 12415, 10689, 13708, 10510

Mass ranges combined. Mascot and X! Tandem searches. Tolerances: 5 ppm for precursors, 0.5 Da for fragments. At least 2 peptides at 95%, protein probability >99%.

More than 700 proteins identified

SLIDE 8

Expression Proteomics

Fast (2-4h)
Reliable (more than 2 peptides, >99%)
Quantitative (R² around 0.9)
Specific (FDR <1%)
Deep (dynamic range >1:20)
Sensitive (100 ng proteins)
Comprehensive (>500 proteins)
Inexpensive (200 euro)

SLIDE 9

The QUANTI analysis:

The MASCOT database search is compared to the chromatographic run to get the abundance of each single peptide identified.

[Figure labels: MASCOT result; MS of a peptide; accurate mass; retention time; chromatographic peak; sum of isotopic peaks; abundance of the peptide]

SLIDE 10

Peptide QUANTIfication

SEQUENCE                                           RT MASCOT    RT APEX      MZ MASCOT     MZ QUANTI     IPI          MASCOT SCORE  INTENSITY FULLINT
R.YESLTDPSKLDSGK.E                                 91.200607    84.407524    770.381287    770.379028    IPI00784295  61            212771.6406
K.HLEINPDHSIIETLR.Q                                86.204216    86.17421     893.976074    893.978271    IPI00784295  64            65058548
K.VILHLKEDQTEYLEER.R                               83.876076    83.714897    1008.028015   1008.027954   IPI00784295  70            10953614
K.DLVILLYETALLSSGFSLEDPQTHANR.I                    133.445953   131.745605   1001.520325   1001.524231   IPI00784295  137           203456512
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNK.E       173.490768   173.519012   1560.401001   1560.397095   IPI00784295  54            182668.7188
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLR.E  177.030182   176.660629   1335.139771   1335.140381   IPI00784295  26            87048.85938
K.TLNDELEIIEGMK.F                                  82.980316    77.69648     752.882202    752.881042    IPI00784154  96            1714931.5
R.ALMLQGVDLLADAVAVTMGPK.G                          125.89357    126.296585   1057.073608   1057.074341   IPI00784154  98            118192832
R.TALLDAAGVASLLTTAEVVVTEIPK.E                      201.519104   95.73555     828.140686    828.142761    IPI00784154  69            1631606.125
K.VVIGMDVAASEFFR.S                                 122.195076   119.691261   770.89563     770.895142    IPI00465248  115           32162.14258
K.IDKLMIEMDGTENK.S                                 105.149544   103.420265   818.901428    818.9021      IPI00465248  56            147749.8281
R.AAVPSGASTGIYEALELR.D                             92.408096    92.50663     902.976379    902.978149    IPI00465248  93            456041760
K.LAMQEFMILPVGAANFR.E                              101.599594   101.737404   954.499329    954.499756    IPI00465248  70            258961744
K.FTASAGIQVVGDDLTVTNPK.R                           112.769814   117.65992    1017.031128   1017.029846   IPI00465248  88            78492.90625
K.FTASAGIQVVGDDLTVTNPKR.I                          88.349304    88.265793    1095.084839   1095.081543   IPI00465248  111           9042372

Protein QUANTIfication

Protein abundance = sum of the intensity of every peak belonging to that protein:

IPI           Abundance
IPI00784295   279951163.2
IPI00784154   121539369.6
IPI00465248   724304280.9
…             …
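The protein-level sum can be checked with a minimal Python sketch. The accessions and intensities are copied from the peptide table on this slide; the function and variable names are illustrative only and are not part of the QUANTI software.

```python
from collections import defaultdict

# (IPI accession, peptide intensity) pairs copied from the peptide table on this slide.
peptide_intensities = [
    ("IPI00784295", 212771.6406),
    ("IPI00784295", 65058548),
    ("IPI00784295", 10953614),
    ("IPI00784295", 203456512),
    ("IPI00784295", 182668.7188),
    ("IPI00784295", 87048.85938),
    ("IPI00784154", 1714931.5),
    ("IPI00784154", 118192832),
    ("IPI00784154", 1631606.125),
    ("IPI00465248", 32162.14258),
    ("IPI00465248", 147749.8281),
    ("IPI00465248", 456041760),
    ("IPI00465248", 258961744),
    ("IPI00465248", 78492.90625),
    ("IPI00465248", 9042372),
]


def protein_abundance(rows):
    """Sum the peptide intensities belonging to each protein accession."""
    totals = defaultdict(float)
    for ipi, intensity in rows:
        totals[ipi] += intensity
    return dict(totals)


abundance = protein_abundance(peptide_intensities)
for ipi, total in abundance.items():
    print(f"{ipi}\t{total:.1f}")
```

Rounded to one decimal, the three sums agree with the abundances listed on the slide.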

SLIDE 11

SINGLE-STEP ANALYSIS

Total lysates digested in solution: 250 ng of digested proteins separated by LC with a very long gradient.
Analysis in duplicate.
MS acquired twice, with two different and partially overlapping mass ranges (300-900 and 750-1600).
Mass spectra summed and submitted to database searching as a whole; MASCOT and X! Tandem algorithms used for the database search.
More than 700 proteins identified and quantitated:
19 unique to clone 10
47 unique to clone 34
64 up-regulated and 27 down-regulated in clone 34

SLIDE 12

GO analysis

SLIDE 13

SLIDE 14

PSE Zubarev et al. J Proteomics. 2008

Protein names and protein abundances are loaded.

Two analyses can be performed, direct and TF-mediated, but both pass through the Key Node filtering step (key nodes are signaling molecules found on pathway intersections in the upstream vicinity of the genes from the input list).

The resultant sets of genes are compared and their intersection is mapped onto a pathway database. Each key node found receives a score reflecting its connectivity, i.e. how many input-list genes are reached and how close the key node is to them. Key nodes with the highest connectivity (highest score) are then selected, and their downstream genes are chosen as a subset for subsequent mapping onto the pathways.
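The slide does not give the scoring formula, so the following Python sketch is only one possible reading of "connectivity". It assumes a toy directed network and scores a candidate key node as the sum of 1/distance over the input-list genes reachable downstream of it. The network, the gene names, the `max_depth` cut-off and the 1/distance weighting are all hypothetical.

```python
from collections import deque

# Hypothetical toy network: each edge points downstream (regulator -> target).
NETWORK = {
    "K1": ["A", "X"],
    "X": ["B"],
    "K2": ["C"],
    "A": [],
    "B": [],
    "C": [],
}


def keynode_score(network, start, input_genes, max_depth=3):
    """Toy connectivity score: sum of 1/distance over reachable input genes.

    More reached genes and shorter distances both raise the score. This is an
    assumed weighting, not the formula used by the actual software.
    """
    seen = {start}
    queue = deque([(start, 0)])
    score = 0.0
    while queue:
        node, dist = queue.popleft()
        if dist > 0 and node in input_genes:
            score += 1.0 / dist
        if dist < max_depth:
            for nxt in network.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return score


input_genes = {"A", "B", "C"}
scores = {k: keynode_score(NETWORK, k, input_genes) for k in ("K1", "K2")}
print(scores)
```

In this toy case K1 reaches two input genes (one directly, one through X) and K2 reaches one, so K1 gets the higher score.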

SLIDE 15

Load Proteins list and abundance to ExPlain

IPI no.       Abundance
IPI00003362   433576562
IPI00784154   104565394
IPI00021439   620550130
IPI00010796   188395124
IPI00784347   75332303
IPI00604784   1741269646
IPI00465028   149096200
IPI00019502   98212724
IPI00788958   82243515
IPI00396378   101936146
IPI00169383   143714167
IPI00027720   115625859
IPI00215743   35239791
IPI00219018   193798265
IPI00554648   217364644
IPI00021405   80042812
IPI00030363   66232299
IPI00291006   98192134
IPI00003865   130129714
IPI00303476   91854550
IPI00465248   176033959
IPI00021428   86977256
IPI00013808   47563661

Abundance

Proteins converted to genes

SLIDE 16

KeyNodes Analysis

KeyNode Name

KeyNode Score = connectivity

Key Node = signalling molecule found on pathway intersections in the upstream vicinity of the genes from the input list

SLIDE 17

Mapping of signaling molecules onto pathways

SLIDE 18

Pathway Score

Pathway name                      Score
EGF pathway                       4.38
stress-associated pathways        3.45
E2F network                       3.34
Caspase network                   3.28
insulin pathway                   2.80
p53 pathway                       2.71
T-cell antigen receptor pathway   2.54
JNK pathway                       2.42
Fas pathway                       2.40
PRL pathway                       2.29
B-cell antigen receptor pathway   2.22
TGFbeta pathway                   2.16
RANKL pathway                     2.06
Sphase(Cdk2)                      2.03
Epo pathway                       1.86
TLR4 pathway                      1.85
IL-1 pathway                      1.83
G1phase(Cdk2)                     1.80

.htm file of KeyNode list + .htm file of pathway list

Pathway score = ∑ KeyNode scores
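The summation can be sketched in a few lines of Python. The key-node scores and pathway memberships below are made-up placeholders, not values from the slides, and the names `pathway_score` and `pathway_members` are illustrative.

```python
# Hypothetical key-node scores and pathway memberships (placeholders only).
keynode_scores = {"K1": 1.5, "K2": 1.0, "K3": 0.25}
pathway_members = {
    "pathway A": ["K1", "K2"],
    "pathway B": ["K2", "K3"],
}


def pathway_score(members, scores):
    """Pathway score = sum of the scores of the key nodes mapped onto the pathway."""
    return sum(scores.get(k, 0.0) for k in members)


pathway_scores = {
    name: pathway_score(members, keynode_scores)
    for name, members in pathway_members.items()
}

# Rank pathways from highest to lowest score, as in the pathway list.
ranked = sorted(pathway_scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}\t{score:.2f}")
```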

SLIDE 19

Pathway name              Sample Score   Control Score   Temp Score   Rank
EGF pathway               4.32           4.41            -0.41        19
E2F network               4.10           3.01            3.87         1
Stress pathways           2.93           3.18            -0.76        22
p53 pathway               2.67           2.41            0.65         5
insulin pathway           2.63           2.81            -0.49        20
Sphase(Cdk2)              2.52           1.84            1.48         2
Caspase network           2.27           2.67            -1.00        23
G1phase(Cdk2)             2.17           1.64            1.01         3
PRL pathway               2.12           2.28            -0.34        17
TGFbeta pathway           2.08           2.03            0.09         6
G2/Mphase(cyclinB:Cdk1)   2.04           1.57            0.85         4
JNK pathway               2.04           2.22            -0.38        18
RANKL pathway             1.99           2.04            -0.10        10
Epo pathway               1.78           1.86            -0.15        13
IL-1 pathway              1.76           1.85            -0.15        14
PDGF pathway              1.75           1.81            -0.11        11
TLR4 pathway              1.74           1.84            -0.18        15
Fas pathway               1.66           2.00            -0.63        21
TLR3 pathway              1.58           1.66            -0.12        12
IL-1beta → p50:RelA       1.41           1.47            -0.08        9
p38 pathway               1.40           1.55            -0.22        16
tuberin pathway           1.29           1.29            0.00         8
VEGF-A pathway            1.27           1.26            0.01         7

Ranked by Temp Score

Each Sample/Control pair yields one rank per pathway: with 2 Samples and 2 Controls there are 4 Sample/Control pairs, hence 4 ranks for each pathway.

e-value = Rank1 • Rank2 • Rank3 • Rank4 / N³ (N = total number of pathways)

Final Score = log [ (N − R1) • (N − R2) • (N − R3) • (N − R4) / (R1 × R2 × R3 × R4) ]

Comparing pathways from different samples
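A short Python sketch of how four ranks could be combined. The e-value follows the slide's definition, the product of the four ranks divided by N cubed. The final-score formula is garbled in the slide text, so it is read here as the logarithm of the product of the (N − R) terms divided by the product of the ranks. That reading, the base-10 logarithm and the function names are all assumptions.

```python
import math


def e_value(ranks, n_pathways):
    """e-value = R1 * R2 * R3 * R4 / N**3, with N the total number of pathways."""
    if len(ranks) != 4:
        raise ValueError("expected one rank per Sample/Control pair (4 ranks)")
    return math.prod(ranks) / n_pathways ** 3


def final_score(ranks, n_pathways):
    """Assumed reading: log10 of prod(N - Ri) / prod(Ri).

    Positive when the pathway ranks near the top in every pair, negative when
    it ranks near the bottom. A rank equal to N makes the numerator zero, so
    math.log10 raises ValueError in that case.
    """
    numerator = math.prod(n_pathways - r for r in ranks)
    denominator = math.prod(ranks)
    return math.log10(numerator / denominator)


N = 361  # hypothetical total number of pathways
print(e_value([1, 2, 3, 4], N))
print(final_score([1, 1, 1, 1], N))          # pathway ranked first in all four pairs
print(final_score([360, 360, 360, 360], N))  # pathway ranked near the bottom in all four pairs
```

Under this reading a pathway that ranks consistently high gets a large positive score and a consistently low-ranking one gets a large negative score.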

SLIDE 20

Identify up/down regulated pathways

Pathway List X 4 (2 Controls and 2 Samples)

Pathway name                      Score
EGF pathway                       4.38
stress-associated pathways        3.45
E2F network                       3.34
Caspase network                   3.28
insulin pathway                   2.80
p53 pathway                       2.71
T-cell antigen receptor pathway   2.54
JNK pathway                       2.42
Fas pathway                       2.40
PRL pathway                       2.29
B-cell antigen receptor pathway   2.22
TGFbeta pathway                   2.16
RANKL pathway                     2.06
Sphase(Cdk2)                      2.03
Epo pathway                       1.86
TLR4 pathway                      1.85
IL-1 pathway                      1.83
G1phase(Cdk2)                     1.80

Final Score   Sample   Random
-8            1
-7            1
-6            4
-5            4
-4            8        3
-3            15       6
-2            27       25
-1            45       67
0             81       88
1             79       80
2             41       54
3             25       29
4             10       8
5             6        1
6             5
7             3
8             3
9             1
10            1
11            1

Distributions of final score of pathways (negative final scores: down-regulated pathways; positive final scores: up-regulated pathways)

SLIDE 21

U937_34Long Vs U937_10Long

Top 3 Outliers (> 3 σ)

Two pathways are dominant in the Minus clone, Fas and JNK, while IFN is dominant in the Plus clone.
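The slides do not say which distribution defines σ. As a rough Python sketch, one option is to treat the Random final-score distribution from the previous slide as the null, estimate its mean and standard deviation from the binned counts, and flag Sample bins lying more than 3 σ from the mean. The choice of null, the use of binned data and the variable names are assumptions. The bin holding 81 (Sample) and 88 (Random) pathways has no label in the extracted text and is assumed to be score 0.

```python
import math

# Final-score histograms from the previous slide: score bin -> number of pathways.
# The bin for 81/88 is unlabeled in the extracted text; it is assumed to be 0.
sample = {-8: 1, -7: 1, -6: 4, -5: 4, -4: 8, -3: 15, -2: 27, -1: 45, 0: 81,
          1: 79, 2: 41, 3: 25, 4: 10, 5: 6, 6: 5, 7: 3, 8: 3, 9: 1, 10: 1, 11: 1}
random_null = {-4: 3, -3: 6, -2: 25, -1: 67, 0: 88,
               1: 80, 2: 54, 3: 29, 4: 8, 5: 1}


def hist_mean_sd(hist):
    """Mean and population standard deviation of a binned distribution."""
    n = sum(hist.values())
    mean = sum(x * c for x, c in hist.items()) / n
    var = sum(c * (x - mean) ** 2 for x, c in hist.items()) / n
    return mean, math.sqrt(var)


mean, sd = hist_mean_sd(random_null)
upper, lower = mean + 3 * sd, mean - 3 * sd

# Count Sample pathways whose score bin falls outside the 3-sigma band.
n_up = sum(c for x, c in sample.items() if x > upper)
n_down = sum(c for x, c in sample.items() if x < lower)
print(f"mean={mean:.2f} sd={sd:.2f} band=({lower:.2f}, {upper:.2f})")
print(f"up-regulated candidates: {n_up}, down-regulated candidates: {n_down}")
```

With integer bins this gives only a coarse cut-off; ranking the individual pathway scores beyond the band would be needed to pick a top 3.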

SLIDE 22

RESULTS

Up to now, analysis of the two clones had shown differences at the morphological, proliferative and molecular levels (such as the expression of surface antigens, proteases and membrane receptors).
Pathway analysis adds a new dimension, but it needs to be validated.
Involved pathways:
Fas and JNK activated in U937 clone 34
IFN up-regulated in U937 clone 10
To test this hypothesis, antibodies against elements of these pathways were used in western blot analysis. In particular:
anti-casp3 (cleaved form) (Fas pathway)
anti-pJNK (active form) (JNK pathway)
anti-CASP8 (Fas pathway)

SLIDE 23

Testing the hypothesis

#34 #34 #10 #10

Fas pathway is activated in clone 34

Anti Caspase-8

SLIDE 24

The JNK family includes JNK1 (four isoforms), JNK2 (four isoforms) and JNK3 (two isoforms).
JNKs are activated by MAP2 kinases.
The activated JNK/SAPK translocate to the nucleus, where they phosphorylate transcription factors such as c-Jun, Elk1 and DPC4.

[Figure: JNK signalling scheme. Labels: membrane, cytoplasm, nucleus; growth factors, UV, inflammatory cytokines, cellular stress, γ radiation; GTPases; MAPKKKs; MKK4/7; JNK, JNK1/3; JIP1, JIP2, JIP3; TF; transcription; growth, differentiation, survival, apoptosis]

SLIDE 25

Testing the hypothesis

JNK1 is activated in clone 34, while JNK2/3 are activated in clone 10

47.5 32.5

Actin

62 47.5

Anti pSAPK/JNK

#10 #34 #34 #10

Expected MW: approx. 46 kDa (pJNK1) and 54 kDa (pJNK2/3)

SLIDE 26

Where are we?

We can, in one experiment from 1×10⁶ cells, identify and relatively quantitate (one state vs. the other) about 500-1000 proteins.

Pathway Search Engine

Dominant pathway identification

SLIDE 27

What do we need?

Development of a ‘global’ kinetic model of cell regulation (linking the observed protein abundances and the classical signaling pathways)

Problems to overcome:

The lack of an orthogonal pathway basis set
The strongly nonlinear relations within biological systems
The variability in the definitions of even major pathways across biological databases

Are our data suitable? (We are limited to key nodes because we see only a partial set.)

Is our quantitation/dynamic description suitable?

SLIDE 28

What is missing?

Is our data set informative and big enough?

PTMs
How can we select the crucial protein in a pathway?

......

SLIDE 29

SLIDE 30

Thanks to

Roman Zubarev & Eva Fung
Uppsala University, Sweden

Guido Poli
AIDS Immunopathogenesis Unit

Umberto Restuccia