

SLIDE 1

Post-separation analysis

Pierre-Alain Binz Swiss Institute of Bioinformatics EMBNet course March 1-5, 2004

Proteomics pathway / generic workflow

[Figure: generic proteomics workflow with the stages Sample, Separation, Data analysis / Selection of spot(s), Post-separation analysis, Data processing and Databases]

SLIDE 2

Proteomics pathway / generic workflow

  • Sample: choice of sample, sample collection, sample pre-fractionation, sample pre-treatment
  • Separation: LC (CEX, affinity, etc.), 1-DE (CE, SDS), 2-DLC, 2-DE
  • Sample and data tracking
  • Data analysis, selection of spot(s): samples comparison, statistical analysis, choice of fractions (LC), choice of gel spots (1-DE, 2-DE), systematic analysis
  • Post-separation analysis: Edman sequencing, AAA, endoproteolytic cleavage, mass spectrometry (MALDI-MS, ESI MS/MS)
  • Data processing, databases: specific identification tools, specific characterisation tools, analysis tools, validation tools

Experimental attributes for proteome studies

[Figure: experimental attributes obtainable from a biological sample or cell extract]

  • Sample description: species, tissue, keyword
  • 2D-PAGE: pI, Mr, spot intensity
  • Sample handling: «spot» cut from 2-DE, transblot on a membrane, HPLC, extraction, digestion
  • Entire protein: Edman degradation, AA analysis, MALDI-MS, ESI-MS (sequence, sequence tags, AA composition, Mw)
  • Peptide fragments: peptide mass fingerprints, MS/MS (PSD, CID, ISD), sequence, sequence tags
  • PTM

SLIDE 3

Protein Identification Scheme

Access Statistics Feb 1st 2004

Expert Protein Analysis System

http://www.expasy.org

  • Total number of connections since August 1993: 347'781'091
  • Total number of hosts that accessed ExPASy: 4'101'804
  • Connections in January 2004: 6'296'007
  • Currently 8 mirrors available: Australia, Bolivia, Canada, China, Korea, Switzerland, Taiwan, USA

And a secure server at

SLIDE 4

ExPASy tools page

  • Identification: PeptIdent, Aldente, TagIdent, AACompIdent, MultiIdent, CombSearch
  • Characterization: FindMod, GlycoMod, FindPept
  • Analysis: PeptideMass, GlycanMass, BioGraph, ProtScale, ProtParam

  • Use annotation in SWISS-PROT and TrEMBL (preprocessing, PTMs, etc.)
  • Hyper-links between tools and databases
SLIDE 5

Protein identification using proteomic analysis methods

  • Gel matching
  • Co-migration
  • AA composition analysis
  • Immunodetection
  • Mass spectrometry

Confidence index: − to +

Gel matching & protein co-migration

SLIDE 6

AA composition analysis

MRSLLILVLC FLPLAALGKV FGRCELAAAM...

  • Acid hydrolysis (6 M HCl, 1 h, 160°C; catalyst: phenol crystal), giving free amino acids
  • AA derivatisation
  • HPLC separation, giving a chromatogram
  • Determination of the AA composition
  • Quantification of material analysed

SLIDE 7

AA composition analysis

SLIDE 8

AA composition analysis of Lysozyme

AA        LYC_CHICK   STD       [AA] in pmol   % AA   Theo AA %
D & N     900429      284708    790.7          19.2   21.4
E & Q     355747      342065    260.0           6.3    5.1
Cys-PAM   1162        -         -               0.0    -
Cys-CAM   55843       -         -               0.0    -
Hyp       1299384     1145045   283.7           0.0    -
S         800004      500572    399.5           9.7   10.2
H         14344       263754    13.6            0.3    1.0
G         1260209     636948    494.6          12.0   12.2
T         491550      484090    253.9           6.2    7.1
A         727955      441015    412.7          10.0   12.2
P         192289      619034    77.7            1.9    2.1
Y         61184       276460    55.3            1.3    3.0
R         796946      471232    422.8          10.3   11.2
V         344061      577000    149.1           3.6    6.2
M         47795       355085    33.7            0.8    2.1
I         287566      521712    137.8           3.4    6.2
L         344061      503932    170.7           4.2    8.1
F         397367      259872    382.3           9.3    3.0
K         74344       325145    57.2            1.4    6.2

Acid hydrolysis converts Asn to Asp and Gln to Glu, so the first two rows report D & N (Asx) and E & Q (Glx) together.

AACompIdent
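The "% AA" column of the lysozyme table can be reproduced from the pmol column, and a measured composition can then be compared with a theoretical one. The sketch below is only illustrative: the sum of squared differences used as a distance is an assumption made here, and the names `mol_percent` and `composition_distance` are hypothetical, not part of any ExPASy tool.

```python
# Measured amounts in pmol, copied from the lysozyme table
# (Hyp and the Cys derivatives are left out, as in the "% AA" column).
measured_pmol = {
    "Asx": 790.7, "Glx": 260.0, "Ser": 399.5, "His": 13.6,
    "Gly": 494.6, "Thr": 253.9, "Ala": 412.7, "Pro": 77.7,
    "Tyr": 55.3, "Arg": 422.8, "Val": 149.1, "Met": 33.7,
    "Ile": 137.8, "Leu": 170.7, "Phe": 382.3, "Lys": 57.2,
}

# Theoretical composition of LYC_CHICK in percent, from the same table.
theoretical_percent = {
    "Asx": 21.4, "Glx": 5.1, "Ser": 10.2, "His": 1.0,
    "Gly": 12.2, "Thr": 7.1, "Ala": 12.2, "Pro": 2.1,
    "Tyr": 3.0, "Arg": 11.2, "Val": 6.2, "Met": 2.1,
    "Ile": 6.2, "Leu": 8.1, "Phe": 3.0, "Lys": 6.2,
}


def mol_percent(pmol):
    """Convert absolute amounts into percentages of the total."""
    total = sum(pmol.values())
    return {aa: 100.0 * amount / total for aa, amount in pmol.items()}


def composition_distance(observed, reference):
    """Sum of squared differences between two percent compositions.

    Assumption: a smaller value means a closer match.
    """
    return sum((observed[aa] - reference[aa]) ** 2 for aa in observed)


observed_percent = mol_percent(measured_pmol)
score = composition_distance(observed_percent, theoretical_percent)
print(round(observed_percent["Asx"], 1), round(score, 1))
```

A database search would compute this distance against every candidate entry and rank the entries by increasing distance.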

SLIDE 9

  • PVDF-bound protein from 2-D gel
  • Gas phase hydrolysis in glass vial
  • One-step AA extraction in same vial
  • Automated AA analysis
  • Protein identification by database matching

Protein identification by amino acid analysis

AACompIdent output for single species identification:

Spot ECOLItest
==============
AMINO ACID COMPOSITION
Asx: 17.90   Glx: 15.00   Ser:  3.40   Gly:  7.70
His:  0.30   Thr:  5.50   Ala: 14.10   Pro:  4.70
Tyr:  0.30   Arg:  1.40   Val:  9.30   Met:  0.70
Ile:  3.70   Leu:  4.50   Phe:  3.80   Lys:  8.50
Tagging: No_Tag
pI: 4.68    Range: 4.43 - 4.93
Mw: 9741    Range: 7793 - 11689

The ECOLI entries having pI and Mw values in the specified range:

Rank  Score  Protein      pI     Mw      Description
=======================================================================
1     20     HDEA_ECOLI   4.68   9741    PROTEIN HDEA.
2     187    GLR1_ECOLI   4.81   9685    GLUTAREDOXIN 1 (GRX1).
3     187    YCCJ_ECOLI   4.70   8524    HYPOTHETICAL 8.5 KD PROTEIN IN AGP
4     224    YFHF_ECOLI   4.43   11536   HYPOTHETICAL 11.5 KD PROTEIN IN HSCA
5     224    THIO_ECOLI   4.67   11675   THIOREDOXIN.

SLIDE 10

Edman Degradation

SLIDE 11

AACompIdent with tag option

SpotNb YEAST-JH7
===============
AMINO ACID COMPOSITION FOR UNKNOWN PROTEIN
Asx: 15.68   Glx:  9.70   Ser:  6.63   His:  1.09
Gly: 12.17   Thr:  4.86   Ala: 11.52   Pro:  2.57
Tyr:  2.47   Arg:  2.51   Val:  8.95   Met:  0.50
Ile:  5.53   Leu:  6.30   Phe:  3.17   Lys:  6.35
Tagging: VRVA
pI: 6.60     Range: 6.35 - 6.85
Mw: 33095    Range: 26476 - 39714

The YEAST entries having pI and Mw values in the specified range:

  Rank  Score  Protein      pI     Mw      N-terminal Sequence
==========================================================================
* 1     19     G3P3_YEAST   6.49   35615   vrvaINGFGRIGRLVMRIALSRPNVEVVALNDPFITNDYA
  2     32     DHSO_YEAST   6.46   38165   MSQNSNPAVVLEKVGDIAIEQRPIPTIKDPHYVKLAIKAT
  3     40     YKV8_YEAST   6.67   34899   MIVPTYGDVLDASNRIKEYVNKTPVLTSRMLNDRLGAQIY
  4     43     YHQ6_YEAST   6.41   37087   MIKHIVSPFRTNFVGISKSVLSRMIHHKVTIIGSGPAAHT
  5     44     MDHM_YEAST   6.79   33833   YKVTVLGAGGGIGQPLSLLLKLNHKVTDLRLYDLKGAKGV

AACompIdent output for sequence tag and AAC identification

SLIDE 12

Many proteins have unique N- and C- sequence tags

[Figure: bar chart of the % of proteins with a unique tag (axis 10 to 100) for four tag types: 3AA N-term, 4AA N-term, 3AA C-term, 4AA C-term]

  • M. genitalium (469)
  • E. coli (4481)
  • S. cerevisiae (4799)

How frequent are non-unique E. coli N- and C- tags?

[Figure: four bar charts of tag occurrence counts (axis 10 to 40); the tags shown in each panel are listed below]

  • 3AA N-tags: MKK, MSE, MSK, MKT, MSN, MSQ, MKI, MST, MTT, MKQ
  • 4AA N-tags: MKTL, MKKI, MKKL, MALL, MKIL, MKNI, MLKR, MRVL, MSEK, MAKN
  • 3AA C-tags: AKK, KKK, RRR, AAQ, AVL, EEE, KAA, LLK, QGE, YLS
  • 4AA C-tags: AKKK, EAAQ, FGSN, KAGR, LEDE, RTIA, VEKV, VYQF, AAAN, AAGG
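The percentage of proteins with a unique terminal tag can be computed by counting how often each tag occurs in a proteome. The sketch below shows the counting step only; the function name `unique_tag_fraction` and the four toy sequences are hypothetical and are not taken from any real database.

```python
from collections import Counter


def unique_tag_fraction(sequences, tag_length, terminus="N"):
    """Return the fraction of sequences whose terminal tag occurs exactly once."""
    if terminus == "N":
        tags = [seq[:tag_length] for seq in sequences]
    elif terminus == "C":
        tags = [seq[-tag_length:] for seq in sequences]
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    counts = Counter(tags)
    unique = sum(1 for tag in tags if counts[tag] == 1)
    return unique / len(tags)


# Hypothetical toy proteome: two entries share the N-terminal tag MKTL.
toy_proteome = ["MKTLAAGQ", "MKTLVVKK", "MDQTYSLE", "MSEKRRRA"]

print(unique_tag_fraction(toy_proteome, 4, "N"))
print(unique_tag_fraction(toy_proteome, 3, "C"))
```

Run over a full proteome, the same function gives one bar of the chart per tag length and terminus.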

SLIDE 13

  • E. coli proteins with N-tag MKTL have different pI and MW

[Figure: scatter plot of predicted mass (10000 to 60000) against predicted pI (3 to 10)]

Protein identification with TagIdent http://www.expasy.org/tools/tagident.html

SLIDE 14

Example: E. coli 2D-000KWF

Search performed with following values:
pI = 5.97          Mw = 45098
delta-pI = 0.50    delta-Mw = 9019
OS or OC = ECOLI
Tagging = MDQT

  • 223 proteins found
  • Results with tagging: 1 found
    DHE4_ECOLI (P00370) [pI = 5.98; Mw = 48581.1]
    mdqtYSLESFLNHVQKRDPNQTEFAQAVREVMTTLWPFLE
  • Results without tagging: 222 found
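The search above combines a pI/Mw window with a sequence tag. A minimal sketch of that logic follows, assuming the tag is matched at the N-terminus only. The `Entry` class, the `search` function and the two `HYPO` entries are hypothetical illustrations; only the DHE4_ECOLI values come from the result shown above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    pi: float
    mw: float
    sequence: str


def search(entries, pi, mw, delta_pi, delta_mw, tag=None):
    """Keep entries inside the pI/Mw window; optionally require an N-terminal tag."""
    hits = [
        e for e in entries
        if abs(e.pi - pi) <= delta_pi and abs(e.mw - mw) <= delta_mw
    ]
    if tag is not None:
        # Assumption: the tag must start the sequence (case-insensitive).
        hits = [e for e in hits if e.sequence.upper().startswith(tag.upper())]
    return hits


entries = [
    Entry("DHE4_ECOLI", 5.98, 48581.1, "MDQTYSLESFLNHVQKRDPNQTEFAQAVREVMTTLWPFLE"),
    Entry("HYPO1_ECOLI", 6.10, 44000.0, "MKTLAAGQ"),  # hypothetical entry
    Entry("HYPO2_ECOLI", 8.20, 45000.0, "MDQTAAAA"),  # hypothetical, outside the pI window
]

without_tag = search(entries, pi=5.97, mw=45098, delta_pi=0.50, delta_mw=9019)
with_tag = search(entries, pi=5.97, mw=45098, delta_pi=0.50, delta_mw=9019, tag="MDQT")
print([e.name for e in without_tag], [e.name for e in with_tag])
```

The window alone leaves several candidates; adding the four-residue tag narrows the list to one, which mirrors the 223 to 1 reduction in the example.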

Protein identification with sequence tag, pI & MW

Best applications:

  • Organisms with a “small” proteome (e.g. bacteria)
  • Organisms with a known genome

Means of generating terminal tags:

  • Edman degradation (automated Tag machine, 8-24 samples/day)
  • C-terminal chemical sequencing
  • Amino- or carboxypeptidases and mass spectrometry

If results are ambiguous:

  • Use the same PVDF membrane for further analyses

Has advantages:

  • No pretreatment needed (e.g. digestion / extraction)
  • Data and results are quickly interpreted

Wilkins M.R., Gasteiger E. et al., J. Mol. Biol. 278:599-608 (1998)

SLIDE 15

Protein identification with sequence tag, pI & MW

Has disadvantages:

  • slow
  • needs > 1 pmol of purified protein
  • 75% of high eukaryote proteins are N-term blocked
  • N-term tag of mature protein can be different from precursor
  • pI and MW not valid if protein highly modified

Immunodetection

  • 1st step: Protein electroblotting from a gel to a PVDF membrane
  • 2nd step: Proteins on the membrane are incubated with a specific antibody solution (diluted serum/plasma or purified antibody solution)
  • 3rd step: Peroxidase-conjugated anti-antibodies are added and incubated
  • 4th step: Light is obtained 5 - 10 minutes after addition of the colouring solution and bands are visible at the sites on the membrane occupied by antibodies
SLIDE 16

Immunodetection

Western immunoblots of food contaminated with Staphylococcal Enterotoxin A (SEA). Food samples were homogenized and spiked with purified SEA. The sample (40 µl) was then applied directly to the gel and assayed by Western blot. Milk, potato salad and meat product with or without SEA were tested. Lane 1 -- Protein Standards; Lane 2 -- milk; Lane 3 -- milk+SEA; Lane 4 -- potato salad; Lane 5 -- potato salad+SEA; Lane 6 -- meat; Lane 7 -- meat+SEA.

Protein identification by immunodetection

Has advantages:

  • No pretreatment needed (e.g. digestion / extraction)
  • Data and results are quickly interpreted (no bioinformatics)
  • Can highlight the presence of a protein in a complex mixture
  • Very sensitive

Has disadvantages:

  • Some proteins are difficult to transfer onto membranes
  • Only one protein (epitope) can be detected at a time
  • Specificity depends on the specificity of the antibody
  • Rarely allows PTMs or splice variants to be addressed
SLIDE 17

Protein identification / characterisation

  • Edman degradation: 20-40 residues/day
  • Mass spectrometry analysis:
      • Peptide analysis: 100-1000 samples/day
      • Peptide sequence analysis: 15-30 samples/day
  • Amino acid composition analysis: 20-40 proteins/day
  • Protein identification by immunoblotting: 1-10 proteins/analysis

Protein Identification using Mass Spectrometry

  • Protein from gel / PVDF / LC fraction (1-DE, 2-DE, LC)
  • Tryptic digestion & peptide extraction, giving unmodified and modified peptides (TYGGAAR, PSTTGVEMFR, EHICLLGK, GANK)
  • PMF identification: mass spectrometry, peptide mass fingerprints
  • MS/MS identification: MS fragmentation, mass spectrometry of peptide MS fragments

SLIDE 18

How does a mass spectrometer work?

Let’s start with PMF-based protein identification and characterization

SLIDE 19

Proteome complexity

[Figure: schematic protein segments a, b, c, d shown in variant forms (a b c; a c d; a’ b c d; a b’ c’ d)]

  • splicing variants
  • truncations, fragments
  • discrete and heterogeneous PTMs

"I have identified the protein ABC." "The protein ABC? OK, which one?"

Protein Identification using Mass Spectrometry

  • Protein from gel / PVDF / LC fraction (1-DE, 2-DE, LC)
  • Tryptic digestion & peptide extraction, giving unmodified and modified peptides (TYGGAAR, PSTTGVEMFR, EHICLLGK, GANK)
  • PMF identification: mass spectrometry, peptide mass fingerprints, compared with database sequences (FRSDKTHMNIFR…, EWQLNNHFSIRS…, HRITPEWMHCDL…)
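The theoretical side of a peptide mass fingerprint comes from an in-silico tryptic digest followed by a mass calculation. The sketch below assumes the common rule "cleave after K or R unless the next residue is P", no missed cleavages, and monoisotopic residue masses. The function names are hypothetical, and the test sequence is simply the four example peptides joined in an order that avoids a K-P or R-P junction.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mh_plus(peptide):
    """m/z of the singly protonated peptide, as typically observed in MALDI-MS."""
    return peptide_mass(peptide) + PROTON


protein = "PSTTGVEMFRTYGGAAREHICLLGKGANK"
for pep in tryptic_digest(protein):
    print(pep, round(mh_plus(pep), 4))
```

Comparing such a list of theoretical masses against the measured peak list, entry by entry in a database, is the core of PMF identification.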

SLIDE 20

Input parameters

PeptIdent output

SLIDE 21

  • GlycoMod: prediction of glycosylations
  • FindMod: prediction of PTMs
  • FindPept: prediction of non-specific cleavages, contaminants, etc.
  • PeptideMass: calculation of theoretical peptide masses

Protein characterization

  • Exact primary structure
  • Splicing variants
  • Sequence conflicts
  • PTMs

1 protein entry does not represent 1 unique molecule

SLIDE 22

SWISS-PROT feature table: active protein is more than just translation of gene sequence (example: P20366)

Detection of PTMs in MS of tryptic fragments

[Figure: two peptide mass spectra, m/z axis 600 to 2200]

  • Unmodified tryptic fragments: 624.3, 769.8, 893.4, 994.5, 1056.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6
  • Tryptic fragments of a modified protein: 624.3, 769.8, 893.4, 994.5, 1070.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6

Δ m/z => PTM
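The two peak lists in this slide differ by one peak, 1056.1 against 1070.1, and the mass difference hints at the modification. The sketch below looks up such differences in a small table of well-known monoisotopic mass shifts. The table is deliberately short, the 0.1 Da tolerance is an assumption, and a 14 Da shift can have other explanations (for example some amino acid substitutions), so the output is a hypothesis to verify, not an assignment.

```python
# A few common monoisotopic mass shifts in Da (not an exhaustive list).
KNOWN_DELTAS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}

unmodified = [624.3, 769.8, 893.4, 994.5, 1056.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6]
observed = [624.3, 769.8, 893.4, 994.5, 1070.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6]


def explain_shifts(expected, measured, tolerance=0.1):
    """List (expected peak, measured peak, modification) for unexplained measured peaks."""
    def near(a, b):
        return abs(a - b) <= tolerance

    hits = []
    for m in measured:
        if any(near(m, e) for e in expected):
            continue  # peak already explained by an unmodified fragment
        for e in expected:
            for name, delta in KNOWN_DELTAS.items():
                if near(m - e, delta):
                    hits.append((e, m, name))
    return hits


print(explain_shifts(unmodified, observed))
```

With the two lists from the slide, the only unexplained peak is 1070.1, which sits about 14 Da above the 1056.1 fragment.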

SLIDE 23

A complementary pair of protein identification and characterization tools at ExPASy using peptide mass fingerprinting data

http://www.expasy.org/tools/

  • PeptIdent, Aldente: identification programs that account for known modifications & processing in SWISS-PROT
  • FindMod, GlycoMod, FindPept: programs to predict new post-translational modifications, amino acid substitutions and processing events

MALDI-MS Identification of EFTU_ECOLI and Verification of N-acetylation and Methyllysine

unmatched peptides: 1085.26 1192.68 1216.92 1631.81 1734.56 2682.66

SLIDE 24

FindMod Tool

URL address : http://www.expasy.org/tools/findmod/

[Figure: FindMod input form, with fields labelled: DB entry, experimental masses, experimental options, AA modifications]

FindMod Output

  • unmodified peptides, modified peptides known in SWISS-PROT and chemically modified peptides
  • putatively modified peptides predicted by mass differences
  • putative AA substitutions

SLIDE 25

Modification rules can be defined from SWISS-PROT, PROSITE and the literature

Some examples:

modification                  amino acid rule                           exceptions
farnesylation                 Cys                                       -
palmitoylation                Cys                                       Ser, Thr
O-GlcNAc                      Ser, Thr                                  Asn
amidation                     Xaa (C-term) where Gly followed Xaa
pyrrolidone carboxylic acid   Gln (N-term)                              -
phosphorylation               in eukaryotes: Ser, Thr, Asp, His, Tyr    -
                              in prokaryotes: Ser, Thr, Asp, His, Cys   -
sulfation                     in eukaryotes: Tyr, PROSITE PDOC00003

FindMod Output - Application of Rules

  • potentially modified peptides that agree with rules are listed
  • amino acids that potentially carry modifications are shown
  • peptides potentially modified only by mass difference
  • predictions can be tested by MS-MS peptide fragmentation
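Applying such rules to a peptide amounts to checking which of its residues may carry a given modification. The sketch below encodes a few of the rules from this slide as a lookup table; the data structure, the function name `candidate_sites` and the reading of the "exceptions" column as "less common but possible residues" are assumptions made for illustration.

```python
# modification -> (residues allowed by the rule, residues listed as exceptions)
RULES = {
    "farnesylation": ("C", ""),
    "palmitoylation": ("C", "ST"),
    "O-GlcNAc": ("ST", "N"),
    "phosphorylation (eukaryotes)": ("STDHY", ""),
    "phosphorylation (prokaryotes)": ("STDHC", ""),
    "sulfation (eukaryotes)": ("Y", ""),
}


def candidate_sites(peptide, modification):
    """Return (position, residue, source) for residues that may carry the modification."""
    rule, exceptions = RULES[modification]
    sites = []
    for position, residue in enumerate(peptide, start=1):
        if residue in rule:
            sites.append((position, residue, "rule"))
        elif residue in exceptions:
            sites.append((position, residue, "exception"))
    return sites


print(candidate_sites("PSTTGVEMFR", "phosphorylation (eukaryotes)"))
print(candidate_sites("EHICLLGK", "palmitoylation"))
```

A peptide whose mass difference suggests a modification but which has no candidate site would fall into the "modified only by mass difference" group.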
SLIDE 26

GlycoMod - Prediction of glycopeptides

  • Protein glycosylation is one of the most complex and most frequently occurring PTMs;
  • GlycoMod can predict the possible N- or O-linked oligosaccharide structures that occur on proteins from their experimentally determined masses;
  • can be used for free or derivatized oligosaccharides and for glycopeptides;
  • 12 million different compositions for O-linked oligosaccharides with masses <= 5000 Da, 2 million different masses.

Prediction of glycosylation: http://www.expasy.org/tools/glycomod/

Proteomics 1:340-349 (2001).
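Predicting oligosaccharide compositions from a mass is, at its simplest, an enumeration of monosaccharide counts whose summed mass falls within a tolerance. The brute-force sketch below covers only four residue types and free, underivatized oligosaccharides at neutral monoisotopic mass; the function name, the count limit and the tolerance are assumptions, and it does not reproduce how GlycoMod itself works.

```python
from itertools import product

# Monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.01056


def compositions_for_mass(target, tolerance=0.05, max_count=6):
    """Enumerate residue counts whose free-oligosaccharide mass matches the target."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        mass = WATER + sum(n * RESIDUE_MASS[name] for n, name in zip(counts, names))
        if abs(mass - target) <= tolerance:
            hits.append(dict(zip(names, counts)))
    return hits


# 910.328 Da corresponds to Hex3 HexNAc2.
print(compositions_for_mass(910.328))
```

The number of candidate compositions grows quickly with the mass and the number of residue types, which is why many distinct compositions share far fewer distinct masses.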

SLIDE 27

GlycoMod - input form (part 2)

GlycoMod Result: N-linked glycosylation of P09923 (human alkaline phosphatase, intestinal)

Interpretation help: link to glycosylation db

SLIDE 28

GlycoSuiteDb (http://www.glycosuite.com)

FindPept

http://www.expasy.org/tools/findpept.html

  • From MS (peptide mass fingerprint) data, detection of:
      • Matching peptides for unspecific cleavage
      • Masses resulting from possible contaminants
      • Matching peptides for specific cleavage (16 different enzymes)
      • Peptides resulting from protease autolysis
SLIDE 29

SLIDE 30

[Figure: spectrometer deviation function. Measured masses are plotted against real masses; the deviation is approximated by a line defined by a shift and a slope]

The main idea is to take into account the calibration problems linked to the spectrometer.

[Figure: plot of Exp (experimental m/z) against Theo (theoretical m/z)]

All possible matches between theoretical and experimental masses.
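The deviation described here, a shift plus a slope, is a straight line relating experimental to theoretical masses. The sketch below estimates that line with an ordinary least-squares fit on already paired masses and then corrects the experimental values. Least squares is swapped in purely for illustration, since the slides do not say how the line is found, and the function names and example numbers are hypothetical.

```python
def fit_line(theoretical, experimental):
    """Least-squares fit of experimental = slope * theoretical + shift."""
    n = len(theoretical)
    mean_t = sum(theoretical) / n
    mean_e = sum(experimental) / n
    sxx = sum((t - mean_t) ** 2 for t in theoretical)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(theoretical, experimental))
    slope = sxy / sxx
    shift = mean_e - slope * mean_t
    return slope, shift


def recalibrate(experimental, slope, shift):
    """Invert the fitted line to bring experimental masses back onto the theoretical scale."""
    return [(e - shift) / slope for e in experimental]


# Hypothetical paired masses with a 200 ppm slope error and a 0.1 Da shift.
theo = [1000.0, 1500.0, 2000.0]
exp = [1.0002 * t + 0.1 for t in theo]

slope, shift = fit_line(theo, exp)
corrected = recalibrate(exp, slope, shift)
print(round(slope, 6), round(shift, 4))
```

In practice the pairing of experimental and theoretical masses is not known in advance, which is why all possible matches are considered first.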

SLIDE 31

[Figure: Exp against Theo plots showing ambiguous cases: 1 exp for 3 theo, 2 exp for 2 theo]

The user defines the maximum delta allowed when comparing masses. It can be given in daltons, in ppm, or both. The program locates and resolves the ambiguities.
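A tolerance given in daltons, in ppm, or both can be expressed as a single comparison function, and ambiguities are simply experimental masses with more than one theoretical candidate. The sketch below only finds the candidates and flags the ambiguous ones; how the program resolves them is not described in the slides, so no resolution step is attempted. All names and example masses are hypothetical.

```python
def within_tolerance(exp, theo, max_da=None, max_ppm=None):
    """True if exp and theo agree within every tolerance that is set."""
    delta = abs(exp - theo)
    if max_da is not None and delta > max_da:
        return False
    if max_ppm is not None and delta / theo * 1e6 > max_ppm:
        return False
    return True


def candidate_matches(experimental, theoretical, max_da=None, max_ppm=None):
    """Map each experimental mass to the theoretical masses inside the tolerance."""
    return {
        e: [t for t in theoretical if within_tolerance(e, t, max_da, max_ppm)]
        for e in experimental
    }


def ambiguous(matches):
    """Keep only experimental masses with more than one theoretical candidate."""
    return {e: ts for e, ts in matches.items() if len(ts) > 1}


experimental = [1000.05, 1500.0]
theoretical = [1000.00, 1000.08, 1500.30]

loose = candidate_matches(experimental, theoretical, max_da=0.1)
strict = candidate_matches(experimental, theoretical, max_da=0.1, max_ppm=40)
print(ambiguous(loose), strict)
```

Here the dalton window alone leaves one experimental mass with two candidates; adding the ppm limit removes the ambiguity.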

SLIDE 32

[Figure: plot of Exp against Theo]

The program finds the best way to match the masses.

[Figure: plot of Exp against Exp - Theo]

Another representation, which allows the axes to be scaled differently. The internal error allowed by the user is also shown.

SLIDE 33

SLIDE 34

Other PMF identification tools on the web

  • Mascot
  • MS-Fit
  • Peptide Search
  • ProFound
  • PepMapper

SLIDE 35

http://www.expasy.org/tools/

SLIDE 36

SLIDE 37

SLIDE 38

SLIDE 39

Challenges of MS data interpretation

  • Origin of the signals: peptides from protein A, peptides from another protein, peptides from keratins, peptides from trypsin autolysis, non-peptidic signals
  • PTM: no modification, annotated PTM, non-annotated PTM
  • Met: no modification, oxidized Met
  • Cys: no modification, Cys-CAM, Cys-PAM
  • Sequence: no modification, mutated AA without K/R, mutated AA with K/R
  • Unspecific cleavage
  • Adducts: + Na, + K, ...
  • Invisible peptides: too big, too small, not ionized
  • Metastable ions

SLIDE 40

PMF modelisation

PKM modelisation

  • Use extracted knowledge (decision tree, model tree, …) to predict the relative intensities the peptides should attain in an experimental PKM spectrum.

SLIDE 41

Use of the predictors

Next step: MS/MS