Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)

Peptide Mapping - Mass Accuracy

Peptide Mapping - Database Size (panels: Human, C. elegans, S. cerevisiae)

Peptide Mapping - Cys-Containing Peptides (panels: Human, C. elegans, S. cerevisiae)

Identification – Peptide Mass Fingerprinting

Experiment: Digestion → MS
Database: Sequence DB → Pick Protein → All Peptide Masses (repeat for each protein)
Comparison: Compare, Score, Test Significance → Identified Proteins
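The peptide mass fingerprinting loop can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular search engine: the trypsin-like cleavage rule, the 0.5 Da tolerance, the simple match count used as a score, and the two database sequences are all assumptions made for the example. The residue masses are the monoisotopic values from the amino acid table in this deck.

```python
# Monoisotopic residue masses (Da) from the amino acid table in this deck.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.0106   # mass of H2O added once per peptide (assumed constant)
PROTON = 1.00728  # singly protonated ions, M+H (assumed constant)


def digest(sequence):
    """Split after K or R unless followed by P (assumed trypsin-like rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic M+H mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def score_protein(sequence, observed, tol=0.5):
    """Count observed masses within tol Da of any theoretical peptide mass."""
    theoretical = [peptide_mh(p) for p in digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(database, observed, tol=0.5):
    """Repeat the comparison for each protein and sort by score, best first."""
    scores = [(name, score_protein(seq, observed, tol)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical two-protein database and two measured peptide masses.
database = {"P1": "SGFLEEDELKAAR", "P2": "GGGGKVVVVR"}
print(rank_proteins(database, [1166.56, 317.19]))
```

A match count is the simplest possible score; it says nothing about significance, which is why the workflow ends with a separate significance test.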

ProFound Results

Database size

Mixtures

Peptide Fragmentation

Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector
Fragment ion types: b, y

Identification – Tandem MS

Tandem MS – Sequence Confirmation

Peptide: SGFLEEDELK, precursor [M+2H]2+

Residue   S     G     F     L     E     E     D     E     L     K
b ions    88    145   292   405   534   663   778   907   1020  1166
y ions    1166  1080  1022  875   762   633   504   389   260   147

[Spectrum: % relative abundance (0 to 100) versus m/z (0 to 1000). Labeled peaks: 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080 and the [M+2H]2+ precursor. The slide builds highlight mass differences of 113 and 129 between adjacent peaks.]
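The b and y ion masses listed for SGFLEEDELK can be checked with a short calculation. The sketch below assumes singly charged fragments, monoisotopic residue masses from the amino acid table in this deck, and standard values for water and the proton; the function names are illustrative only.

```python
# Monoisotopic residue masses (Da) from the amino acid table in this deck.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.0106   # assumed constant
PROTON = 1.00728  # assumed constant


def b_ions(peptide):
    """Singly charged b ions b1..b(n-1): N-terminal residues plus a proton."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += MONO[aa]
        masses.append(total)
    return masses


def y_ions(peptide):
    """Singly charged y ions y1..y(n-1): C-terminal residues plus water and a proton."""
    masses, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        masses.append(total)
    return masses


peptide = "SGFLEEDELK"
print("b:", [round(m, 2) for m in b_ions(peptide)])
print("y:", [round(m, 2) for m in y_ions(peptide)])
print("M+H:", round(sum(MONO[aa] for aa in peptide) + WATER + PROTON, 2))
```

The computed values carry a mass defect of up to about 0.5 Da relative to the nominal masses printed on the slide, so comparisons against the slide should use a tolerance rather than exact equality.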

Tandem MS – de novo Sequencing

Approach: Spectrum → Mass Differences → Sequences consistent with spectrum

Amino acid masses

1-letter  3-letter  Monoisotopic  Average   Chemical
code      code      mass          mass      formula
A         Ala       71.0371       71.0788   C3H5ON
R         Arg       156.101       156.188   C6H12ON4
N         Asn       114.043       114.104   C4H6O2N2
D         Asp       115.027       115.089   C4H5O3N
C         Cys       103.009       103.139   C3H5ONS
E         Glu       129.043       129.116   C5H7O3N
Q         Gln       128.059       128.131   C5H8O2N2
G         Gly       57.0215       57.0519   C2H3ON
H         His       137.059       137.141   C6H7ON3
I         Ile       113.084       113.159   C6H11ON
L         Leu       113.084       113.159   C6H11ON
K         Lys       128.095       128.174   C6H12ON2
M         Met       131.04        131.193   C5H9ONS
F         Phe       147.068       147.177   C9H9ON
P         Pro       97.0528       97.1167   C5H7ON
S         Ser       87.032        87.0782   C3H5O2N
T         Thr       101.048       101.105   C4H7O2N
W         Trp       186.079       186.213   C11H10ON2
Y         Tyr       163.063       163.176   C9H9O2N
V         Val       99.0684       99.1326   C5H9ON

Tandem MS – de novo Sequencing

Mass differences between each pair of peaks (column peak minus row peak):

         292   389   405   504   534   633   663   762   778   875   907  1020  1022  1079
260       32   129   145   244   274   373   403   502   518   615   647   760   762   819
292             97   113   212   242   341   371   470   486   583   615   728   730   787
389                   16   115   145   244   274   373   389   486   518   631   633   690
405                         99   129   228   258   357   373   470   502   615   617   674
504                               30   129   159   258   274   371   403   516   518   575
534                                     99   129   228   244   341   373   486   488   545
633                                           30   129   145   242   274   387   389   446
663                                                 99   115   212   244   357   359   416
762                                                       16   113   145   258   260   317
778                                                             97   129   242   244   301
875                                                                   32   145   147   204
907                                                                        113   115   172
1020                                                                               2    59
1022                                                                                    57

Differences that match an amino acid residue mass ((X) marks rows flagged with an X on the slide):

Peak   Matching differences
260    129 = E
292    97 = P, 113 = I/L     (X)
389    115 = D
405    99 = V, 129 = E       (X)
504    129 = E
534    99 = V, 129 = E       (X)
633    129 = E
663    99 = V, 115 = D       (X)
762    113 = I/L
778    97 = P, 129 = E       (X)
875    147 = F
907    113 = I/L, 115 = D    (X)
1022   57 = G

Partial sequences consistent with the spectrum (one is the reverse of the other):
…GF(I/L)EEDE(I/L)…
…(I/L)EDEE(I/L)FG…

Completing the termini with the peptide mass (M+H = 1166):
1166 - 1079 = 87 => S
1166 - 1020 - 18 = 128 => K or Q

Result: SGF(I/L)EEDE(I/L)(K/Q)
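The mass-difference step can be reproduced with a small script that compares every pairwise peak difference against the residue masses. This is a sketch under stated assumptions: the 0.5 Da tolerance and the function names are illustrative, and the peaks are the nominal values from the slide. It only finds candidate residues; chaining them into a sequence is left out.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) from the amino acid table in this deck.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}


def residue_candidates(delta, tol=0.5):
    """Residues whose monoisotopic mass lies within tol Da of delta (tol is an assumed value)."""
    return sorted(aa for aa, mass in MONO.items() if abs(mass - delta) <= tol)


def annotate_differences(peaks, tol=0.5):
    """Map each peak pair (low, high) to candidate residues; pairs with no match are omitted."""
    hits = {}
    for low, high in combinations(sorted(peaks), 2):
        candidates = residue_candidates(high - low, tol)
        if candidates:
            hits[(low, high)] = candidates
    return hits


peaks = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1079]
for pair, residues in annotate_differences(peaks).items():
    print(pair, "/".join(residues))

# Terminal residues from the peptide mass, as on the slide (M+H = 1166).
print("N-terminal:", residue_candidates(1166 - 1079))
print("C-terminal:", residue_candidates(1166 - 1020 - 18))
```

I and L have identical masses, and K and Q differ by less than the tolerance, so both pairs come back as joint candidates, which is the same ambiguity the slide writes as (I/L) and (K/Q).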

Tandem MS – de novo Sequencing

Challenges in de novo sequencing:
• Neutral loss (-H2O, -NH3)
• Modifications
• Background peaks
• Incomplete information

Tandem MS – Database Search

Experiment: Lysis → Fractionation → Digestion → LC-MS → MS/MS
Database: Sequence DB → Pick Protein (repeat for all proteins) → Pick Peptide (repeat for all peptides) → All Fragment Masses
Comparison: Compare, Score, Test Significance
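The compare-and-score step of a tandem MS database search can be sketched as follows. This is an illustration under several assumptions, not the algorithm of any specific search engine: candidates are filtered by precursor M+H within 1.0 Da, only singly charged b and y ions are considered, fragments match within 0.6 Da, and the score is a plain count of matched peaks. The candidate list is hypothetical; the peak list is the one from the SGFLEEDELK example in this deck.

```python
# Monoisotopic residue masses (Da) from the amino acid table in this deck.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.0106   # assumed constant
PROTON = 1.00728  # assumed constant


def fragment_masses(peptide):
    """Singly charged b and y ion masses for every backbone cleavage site."""
    masses = []
    for i in range(1, len(peptide)):
        masses.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)          # b ion
        masses.append(sum(MONO[aa] for aa in peptide[i:]) + WATER + PROTON)  # y ion
    return masses


def search(peaks, precursor_mh, peptides, precursor_tol=1.0, fragment_tol=0.6):
    """Score every candidate whose M+H matches the precursor; return best first."""
    results = []
    for peptide in peptides:
        mh = sum(MONO[aa] for aa in peptide) + WATER + PROTON
        if abs(mh - precursor_mh) > precursor_tol:
            continue  # wrong precursor mass, skip without scoring
        theoretical = fragment_masses(peptide)
        score = sum(any(abs(p - t) <= fragment_tol for t in theoretical) for p in peaks)
        results.append((peptide, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))


peaks = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1079]
candidates = ["SGFLEEDELK", "SGFLEEDLEK", "AAR"]  # hypothetical candidate peptides
print(search(peaks, 1166.56, candidates))
```

The second candidate has the same composition and precursor mass as the first, so only the fragment masses separate them; a raw match count like this still needs a significance test before it can be trusted.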

Search Results

Significance Testing

False protein identifications are caused by random matching. An objective criterion for testing the significance of protein identification results is therefore necessary. The significance of protein identifications can be tested once the distribution of scores for false results is known.

Significance Testing - Expectation Values

The majority of sequences in a collection will give a score due to random matching.

Significance Testing - Expectation Values

Spectrum (m/z) → Database Search → List of Candidates → Distribution of Scores for Random and False Identifications → Extrapolate and Calculate Expectation Values → List of Candidates with Expectation Values
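One way to carry out the extrapolation step is to fit a straight line to the logarithm of the survival counts of the false-match scores and evaluate it at the score of the top candidate. The sketch below assumes that the tail of the score distribution is exponential and that the input contains only scores from random matches; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def fit_log_survival(scores):
    """Least-squares line through (score, ln(number of scores >= score)).

    Requires at least two distinct score values.
    """
    xs = sorted(set(scores))
    ys = [math.log(sum(1 for s in scores if s >= x)) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def expectation_value(score, intercept, slope):
    """Extrapolated number of random matches expected to score at least this well."""
    return math.exp(intercept + slope * score)


# Toy scores for random matches: the survival counts 8, 4, 2, 1 halve with each unit of score.
random_scores = [0, 0, 0, 0, 1, 1, 2, 3]
intercept, slope = fit_log_survival(random_scores)
print(expectation_value(6, intercept, slope))
```

In the toy data the count of matches halves per unit of score, so a candidate scoring 6 extrapolates to an expectation value well below 1, meaning fewer than one random match is expected to score that high.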

Rho-diagrams: Overall Quality of a Data Set

Expectation values as a function of score for random matching:

\[ e(s) = \beta \exp(-\alpha s) \]

Definition: E_i (i = 0, -1, -2, …) is the number of spectra that have been assigned an expectation value between exp(i) and exp(i-1).

For random matching, with a uniform density N of spectra per unit expectation value:

\[ E_i = \int_{\exp(i-1)}^{\exp(i)} N \, de = N \{ \exp(i) - \exp(i-1) \} \]

\[ E_i = N \exp(i) \{ 1 - \exp(-1) \} \]

\[ \rho(i) = \log\left( \frac{E_i}{E_0} \right) = \log\left( \frac{E_i}{N \{ 1 - \exp(-1) \}} \right) = i \]
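The definition of E_i translates directly into a binning of expectation values on a natural-log scale. The following sketch computes rho(i) = log(E_i / E_0) from a list of expectation values; the function name and the choice to ignore expectation values above 1 are assumptions made for the example.

```python
import math
from collections import Counter


def rho_values(expectation_values):
    """Return {i: rho(i)} where E_i counts values e with exp(i-1) < e <= exp(i), i <= 0."""
    counts = Counter()
    for e in expectation_values:
        if 0 < e <= 1:  # values above 1 fall outside the bins i = 0, -1, -2, ...
            counts[math.ceil(math.log(e))] += 1
    if counts[0] == 0:
        raise ValueError("no spectra in bin i = 0, so rho is undefined")
    return {i: math.log(n / counts[0]) for i, n in sorted(counts.items(), reverse=True)}


# Toy input: two values in bin 0, one in bin -1, one above 1 that is ignored.
print(rho_values([0.5, 0.9, 0.2, 2.0]))
```

For a data set dominated by random matches, the returned values should fall close to the line rho(i) = i; bins that lie well above that line contain more low expectation values than random matching would produce.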

Rho-diagram Random Matching

[Plot: ρ versus log(e), both axes from 0 to -6]

Rho-diagram Data Quality

[Plot: ρ versus log(e), both axes from 0 to -10]

Rho-diagram Parameters

How many fragments are sufficient?
• To identify an unmodified peptide?
• To identify a modified peptide?
• To localize a modification on a peptide?

How many fragments are sufficient?

How does it depend on different parameters?
• Precursor mass
• Precursor mass error
• Fragment mass error
• Background peaks
