Comparative protein structure modeling of genes and genomes, Marc A. Marti-Renom (PowerPoint presentation)

SLIDE 1

Comparative protein structure modeling of genes and genomes

Marc A. Marti-Renom

Department of Biopharmaceutical Sciences University of California, San Francisco

SLIDE 2

Comparative Modeling…

SLIDE 3

Year 2003: 1,000,000 sequences; 18,000 structures
Year 2005: millions of sequences; 50,000 structures

Why protein structure prediction?

SLIDE 5

Year 2003: 1,000,000 sequences; 18,000 structures

Why protein structure prediction?

[Figure: Theory vs. Experiment; label: 400,000]

http://salilab.org/modbase

SLIDE 6

Principles of Protein Structure

GFCHIKAYTRLIMVG…

Folding: Ab initio prediction

Evolution: Threading, Comparative Modeling

[Figure: related structures from Anabaena 7120, Anacystis nidulans, Chondrus crispus, Desulfovibrio vulgaris]

SLIDE 9

Comparative Modeling by Satisfaction of Spatial Restraints (MODELLER)

3D   GKITFYERGFQGHCYESDC-NLQP…
SEQ  GKITFYERG---RCYESDCPNLQP…

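  • 1. Extract spatial restraints and combine them into the molecular probability density function $F(\mathbf{R}) = \prod_i p_i(f_i \mid I)$, where $\mathbf{R}$ are the atomic coordinates of the model and $p_i$ is the probability density of restraint $i$ on feature $f_i$ (e.g. a distance or dihedral angle), given the information $I$ from the template alignment.
  • 2. Satisfy spatial restraints by optimizing $F(\mathbf{R})$.

  • A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
  • J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
  • A. Fiser, R. Do & A. Šali. Prot. Sci. 9, 1753, 2000.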

http://salilab.org/modeller
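The product form of $F(\mathbf{R})$ can be illustrated with a minimal Python sketch. It is not MODELLER's implementation: it assumes every restraint is a Gaussian on a single inter-atomic distance, and the restraint means and widths in the usage lines are hypothetical. Taking the negative logarithm turns the product into a sum, which is the form an optimizer would minimize.

```python
import math
from dataclasses import dataclass


@dataclass
class DistanceRestraint:
    """Hypothetical Gaussian restraint p_i(f_i | I) on the distance between atoms a and b."""
    a: int
    b: int
    mean: float  # most probable distance in Angstrom (e.g. taken from the template)
    sd: float    # standard deviation in Angstrom


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def restraint_density(coords, r: DistanceRestraint) -> float:
    # The feature f_i here is the Euclidean distance between two atoms.
    return gaussian_pdf(math.dist(coords[r.a], coords[r.b]), r.mean, r.sd)


def molecular_pdf(coords, restraints) -> float:
    """F(R) = prod_i p_i(f_i | I)."""
    f = 1.0
    for r in restraints:
        f *= restraint_density(coords, r)
    return f


def objective(coords, restraints) -> float:
    """-ln F(R) = sum_i -ln p_i(f_i | I); minimizing it maximizes F(R)."""
    return -sum(math.log(restraint_density(coords, r)) for r in restraints)


# Usage: three C-alpha atoms restrained to ~3.8 Angstrom spacing (hypothetical numbers).
restraints = [DistanceRestraint(0, 1, 3.8, 0.1), DistanceRestraint(1, 2, 3.8, 0.1)]
good = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bad = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(objective(good, restraints) < objective(bad, restraints))
```

A real modeling run would optimize the coordinates against many restraints of several types; this sketch only evaluates the objective for fixed coordinates.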

SLIDE 15

Steps in Comparative Protein Structure Modeling

START

TARGET
ASILPKRLFGNCEQTSDEGLK IERTPLVPHISAQNVCLKIDD VPERLIPERASFQWMNDK

  • 1. Template Search → TEMPLATE
  • 2. Target – Template Alignment
    TEMPLATE  MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
    TARGET    ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
  • 3. Model Building
  • 4. Model Evaluation. OK? Yes → END; No → back to Target – Template Alignment

  • A. Šali. Curr. Opin. Biotech. 6, 437, 1995.
  • R. Sánchez & A. Šali. Curr. Opin. Struct. Biol. 7, 206, 1997.
  • M. A. Martí-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291, 2000.
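Sequence identity between target and template, the quantity that model accuracy depends on, can be read off the alignment in step 2. The sketch below assumes identity is counted over columns where neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (identical columns, ungapped columns, % identity) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = aligned = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        aligned += 1
        identical += x == y
    pct = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, pct


template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
print(percent_identity(template, target))  # 29 identities over 47 ungapped columns, ~61.7%
```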
SLIDE 16

Model Accuracy as a Function of Target-Template Sequence Identity

SLIDE 17

Typical Errors in Comparative Models

  • Distortion in correctly aligned regions
  • Region without a template
  • Sidechain packing
  • Incorrect template
  • Misalignment

(Figure legend: MODEL, X-RAY, TEMPLATE)

SLIDE 21

Model Accuracy

Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.

  • HIGH ACCURACY: NM23, seq id 77%. Cα equiv 147/148, RMSD 0.41 Å. Sources of error: sidechains, core backbone, loops.
  • MEDIUM ACCURACY: CRABP, seq id 41%. Cα equiv 122/137, RMSD 1.34 Å. Sources of error: sidechains, core backbone, loops, alignment.
  • LOW ACCURACY: EDN, seq id 33%. Cα equiv 90/134, RMSD 1.17 Å. Sources of error: sidechains, core backbone, loops, alignment, fold assignment.

(Figure legend: X-RAY, MODEL)
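The "Cα equiv" and "RMSD" figures compare model and X-ray Cα positions. A minimal sketch of that kind of measure, under stated assumptions: the two structures are already superposed, atoms are matched by index, and a pair counts as equivalent when it lies within a hypothetical 3.5 Å cutoff (the slide gives neither the cutoff nor the superposition method). RMSD is computed over the equivalent pairs only.

```python
import math


def ca_equivalence_and_rmsd(model, reference, cutoff: float = 3.5):
    """Return (equivalent pairs, total pairs, RMSD over equivalent pairs).

    Assumptions: coordinates are already superposed, model[i] matches reference[i],
    and a pair is "equivalent" if its distance is <= cutoff Angstrom (hypothetical 3.5).
    """
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    dists = [math.dist(p, q) for p, q in zip(model, reference)]
    close = [d for d in dists if d <= cutoff]
    rmsd = math.sqrt(sum(d * d for d in close) / len(close)) if close else float("nan")
    return len(close), len(reference), rmsd


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.0), (11.4, 5.0, 0.0)]
print(ca_equivalence_and_rmsd(model, reference))
```

Under this convention a value such as 122/137 would mean 122 of 137 Cα pairs fall within the cutoff. Because distant pairs are excluded from the RMSD, a less accurate model can show a lower RMSD over fewer equivalent atoms, which is one reason to read the two numbers together.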

SLIDE 22

Applications of Comparative Models

  • A. Sali & J. Kuriyan. TIBS 22, M20, 1999.
  • D. Baker & A. Sali. Science 294, 93, 2001.

SLIDE 23

genes…

SLIDE 24

Do mast cell proteases bind proteoglycans? Where? When?

  • 1. Do mMCPs bind negatively charged proteoglycans through electrostatic interactions?
  • 2. Comparative models were used to find clusters of positively charged surface residues.
  • 3. The predictions were tested by site-directed mutagenesis.

Huang et al. J. Clin. Immunol. 18, 169, 1998.
Matsumoto et al. J. Biol. Chem. 270, 19524, 1995.
Šali et al. J. Biol. Chem. 268, 9023, 1993.

[Figure: native mMCP-7 at pH 5 (His+) and at pH 7 (His0)]

Predicting features of a model that are not present in the template
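The His+ and His0 labels reflect how histidine protonation changes between pH 5 and pH 7. A sketch using the Henderson-Hasselbalch relation, assuming a side-chain pKa of 6.0 (a typical value for free histidine; the actual pKa of the histidines in mMCP-7 is not given here and depends on their environment):

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Henderson-Hasselbalch: fraction of a titratable base in its protonated (charged) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (5.0, 7.0):
    # pKa 6.0 is an assumed value for a histidine side chain.
    print(f"pH {ph:.0f}: {fraction_protonated(ph):.2f} of His protonated")
# pH 5: 0.91 of His protonated
# pH 7: 0.09 of His protonated
```

Under this assumption, histidines that are mostly positively charged at pH 5 become mostly neutral at pH 7, removing positive charge from any surface cluster they belong to.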

SLIDE 25

What is the physiological ligand of Brain Lipid-Binding Protein?

  • L. Xu, R. Sánchez, A. Šali, N. Heintz, J. Biol. Chem. 271, 24711, 1996.

[Figure: ligand-binding cavity of the BLBP model with docosahexaenoic acid and with oleic acid; panel labels "Cavity is filled" and "Cavity is not filled"]

  • 1. BLBP binds fatty acids.
  • 2. Build a 3D model.
  • 3. Find the fatty acid that fits most snugly into the ligand-binding cavity.

Predicting features of a model that are not present in the template

SLIDE 26

Some Models Can Be Used in Docking to Density Maps

(Yeast Ribosomal 40S subunit)

Docking of comparative models into the cryo-EM map.

Spahn et al. Cell 107, 373-386, 2001.

[Figure: small 30S subunit from Thermus thermophilus; large 50S subunit from Haloarcula marismortui]

SLIDE 27

[Figure panels: 40S subunit, 60S subunit]

  • 43 proteins could be modeled at 20-56% sequence identity to a known structure.
  • The coverage of the models ranges from 34% to 99%.
  • Models were manually docked into the 15 Å cryo-electron density map.
  • The solid orange in the 60S subunit and the solid green in the 40S subunit correspond to proteins without known bacterial homologs.

SLIDE 28

Nebojsa Mirkovic, Marc A. Marti-Renom, Andrej Sali, Alvaro N.A. Monteiro (Strang Center, Cornell U.)

Structural analysis of missense mutations in human BRCA1 BRCT domains

SLIDE 29

[Figure: domain map of human BRCA1 (RING, NLS and BRCT regions; scale bar 200 aa; globular and nonglobular regions) and the structure of the BRCA1 BRCT repeats, PDB 1jnx]

Human BRCA1 and its two BRCT domains

Williams, Green & Glover. Nat. Struct. Biol. 8, 838, 2001.

SLIDE 30

SLIDE 31
SLIDE 32

C1697R R1699W A1708E S1715R P1749R M1775R M1652I A1669S V1665M D1692N G1706A D1733G M1775V P1806A M1652K L1657P E1660G H1686Q R1699Q K1702E Y1703H F1704S L1705P S1715N S1722F F1734L G1738E G1743R A1752P F1761I F1761S M1775E M1775K L1780P I1807S V1833E A1843T M1652T V1653M L1664P T1685A T1685I M1689R D1692Y F1695L V1696L R1699L G1706E W1718C W1718S T1720A W1730S F1734S E1735K V1736A G1738R D1739E D1739G D1739Y V1741G H1746N R1751P R1751Q R1758G L1764P I1766S P1771L T1773S P1776S D1778N D1778G D1778H M1783T A1823T V1833M W1837R W1837G S1841N A1843P T1852S P1856T P1859R

[Figure labels: cancer associated; ? ?]

Missense Mutations in BRCT Domains by Function

C1787S G1788D G1788V G1803A V1804D V1808A V1809A V1809F V1810G Q1811R P1812S N1819S

[Figure labels: not cancer associated; no transcription activation; transcription activation]

9/18/02
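Mutation lists in this wild-type/position/mutant notation (e.g. M1775R) can be checked mechanically. A small sketch with illustrative, hypothetical function names: it parses each token and groups substitutions by position, raising an error if one position is listed with two different wild-type residues, which would indicate a transcription error.

```python
import re
from collections import defaultdict

# One-letter wild-type residue, sequence position, one-letter mutant residue.
MUTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(token: str) -> tuple[str, int, str]:
    m = MUTATION.match(token)
    if m is None:
        raise ValueError(f"not a missense mutation token: {token!r}")
    wt, pos, mut = m.groups()
    return wt, int(pos), mut


def substitutions_by_position(tokens) -> dict[int, list[str]]:
    """Group mutant residues by position, checking that each position has a single wild type."""
    wild_type: dict[int, str] = {}
    by_pos: dict[int, list[str]] = defaultdict(list)
    for token in tokens:
        wt, pos, mut = parse_mutation(token)
        if wild_type.setdefault(pos, wt) != wt:
            raise ValueError(f"conflicting wild type at position {pos}")
        by_pos[pos].append(mut)
    return dict(by_pos)


sample = "M1775R M1775V M1775E M1775K C1697R R1699W R1699Q R1699L".split()
print(substitutions_by_position(sample)[1775])  # ['R', 'V', 'E', 'K']
```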

SLIDE 33

“Decision” Tree for Predicting Functional Impact of Genetic Variants

[Figure: decision tree from START to YES/NO outcomes. Features tested: buriedness (exposed, buried); residue rigidity and neighborhood rigidity (rigid < -0.7, non-rigid ≥ -0.7); volume change (thresholds at 30, 60 and 90 Å³); charge change; polarity change (< 0, ≥ 0); phylogenetic entropy (0 or 1 class, 2 class); functional site; mutation likelihood; other information (helix breaker, turn breaker).]
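Two of the tree's inputs, volume change and charge change, can be computed directly from the wild-type and mutant residues. A sketch, assuming the commonly tabulated residue volumes of Zamyatnin (1972), the absolute volume difference, and side-chain charges near neutral pH with histidine treated as neutral. Only the 30, 60 and 90 Å³ thresholds come from the slide; which volume scale and sign convention the tree actually uses is not stated.

```python
# Residue volumes in cubic Angstrom (Zamyatnin, 1972); an assumed scale.
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1, "P": 112.7,
    "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0, "Q": 143.8, "H": 153.2,
    "M": 162.9, "I": 166.7, "L": 166.7, "K": 168.6, "R": 173.4, "F": 189.9,
    "Y": 193.6, "W": 227.8,
}
# Side-chain charge near neutral pH; His treated as neutral (an assumption).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def volume_change_bin(wt: str, mut: str) -> str:
    """Bin the absolute volume change using the 30/60/90 cubic-Angstrom thresholds."""
    dv = abs(VOLUME[mut] - VOLUME[wt])
    if dv < 30:
        return "<30"
    if dv < 60:
        return "30-60"
    if dv < 90:
        return "60-90"
    return ">=90"


def charge_change(wt: str, mut: str) -> int:
    """Mutant minus wild-type side-chain charge."""
    return CHARGE.get(mut, 0) - CHARGE.get(wt, 0)


# Usage with mutations from the BRCT list: M1775R, W1837R, G1738R.
for wt, mut in [("M", "R"), ("W", "R"), ("G", "R")]:
    print(wt, mut, volume_change_bin(wt, mut), charge_change(wt, mut))
```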

SLIDE 34

Putative Binding Site on BRCA1

RMSMVVSGLTPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYF WVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDRKIFRGLEICCYGPFTNMPTDQLEWMVQL CGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDTYLI PQIP


SLIDE 35

genomes…

SLIDE 41

Structural Genomics

Characterize most protein sequences based on related known structures.

There are ~16,000 families at 30% sequence identity (90%) (Vitkup et al. Nat. Struct. Biol. 8, 559, 2001).

  • Sali. Nat. Struct. Biol. 5, 1029, 1998.
  • Sali et al. Nat. Struct. Biol. 7, 986, 2000.
  • Sali. Nat. Struct. Biol. 7, 484, 2001.
  • Baker & Sali. Science 294, 93, 2001.

The number of “families” is much smaller than the number of proteins. Any one member of a family is sufficient.

11/11/02

SLIDE 42

Modeling with NY-SGRC structures

June 2001

Bonanno et al. Proc. Natl. Acad. Sci. USA 98, 12896, 2001.
Chance et al. Protein Sci. 11, 723, 2002.

http://salilab.org/modweb/

SLIDE 43

MODPIPE: Large-Scale Comparative Protein Structure Modeling

START
For each sequence:
  • 1. Get profile for sequence (PSI-BLAST, NR)
  • 2. Scan sequence profile against representative PDB chains; scan PDB chain profiles against sequence
  • 3. Select templates using permissive E-value cutoff
  • 4. Expand match to cover complete domains
  For each template:
  • 5. Align matched parts of sequence and structure
  • 6. Build model for target segment by satisfaction of spatial restraints (MODELLER)
  • 7. Evaluate model
END

  • R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998.
  • N. Eswar, M. Marti-Renom, M.S. Madhusudhan, B. John, A. Fiser, R. Sánchez, F. Melo, N. Mirkovic, A. Šali.
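The control flow of the pipeline can be sketched in Python. This is not MODPIPE's code: the `Hit` record, the default cutoff of 1.0 and the `build`/`evaluate` callables are hypothetical stand-ins, the hits are assumed to come from the profile searches, and domain expansion (step 4) is omitted. What it shows is the structure: a permissive E-value filter per sequence, then one model and one evaluation per selected template.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Hit:
    """Hypothetical record of one target-template match from the profile searches."""
    template: str   # template chain identifier
    e_value: float
    start: int      # first matched target residue (1-based)
    end: int        # last matched target residue


def select_templates(hits: Iterable[Hit], cutoff: float = 1.0) -> list[Hit]:
    """Keep hits at or below a permissive E-value cutoff (1.0 is a hypothetical default), best first."""
    return sorted((h for h in hits if h.e_value <= cutoff), key=lambda h: h.e_value)


def model_sequence(seq_id: str, hits: Iterable[Hit],
                   build: Callable, evaluate: Callable, cutoff: float = 1.0) -> list:
    """For each selected template, build a model of the matched segment and evaluate it."""
    results = []
    for hit in select_templates(hits, cutoff):
        model = build(seq_id, hit)  # stands in for alignment + model building
        results.append((hit.template, evaluate(model)))
    return results


# Usage with stub callables in place of real alignment, modeling and evaluation.
hits = [Hit("t1", 1e-30, 1, 120), Hit("t2", 5.0, 10, 80), Hit("t3", 1e-3, 130, 250)]
print(model_sequence("seq1", hits,
                     build=lambda s, h: f"{s}:{h.start}-{h.end}",
                     evaluate=len))
```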
SLIDE 44

Comparative modeling of the TrEMBL database

Unique sequences processed: 733,239
Sequences with fold assignments or models: 415,937 (57%)

4/03/02: ~4 weeks on 500 Pentium III CPUs

70% of models are based on <30% sequence identity to the template. On average, only one domain per protein is modeled (an “average” protein has 2.5 domains of 175 aa).
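The coverage figure and the cost per sequence follow from the numbers above. A back-of-the-envelope check, assuming "~4 weeks" means 28 days of continuous use of all 500 CPUs:

```python
processed = 733_239
with_models = 415_937
cpus = 500
weeks = 4

coverage = with_models / processed            # fraction with a fold assignment or model
cpu_hours = weeks * 7 * 24 * cpus             # assumes all CPUs busy around the clock
minutes_per_sequence = cpu_hours * 60 / processed

print(f"coverage {coverage:.0%}, {cpu_hours:,} CPU-hours, "
      f"~{minutes_per_sequence:.0f} CPU-minutes per sequence")
```

Under that assumption the run used about 336,000 CPU-hours, or roughly half a CPU-hour per processed sequence.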

SLIDE 45

Modeling Coverage of the Sequence Space

Fold assignment: PSI-BLAST E-value ≤ 10⁻⁴
Reliable model: model score ≥ 0.7

  • Not Attempted: 43%
  • Reliable Model Only: 0%
  • Fold Assignment Only: 12%
  • Reliable Model + Fold Assignment: 44%
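The two criteria define the pie-chart categories. A sketch of the classification for a single sequence, assuming `None` stands for "no PSI-BLAST hit" or "no model built". The thresholds are the ones above; the extra "Neither criterion met" label is a hypothetical catch-all for a case the chart does not show separately.

```python
from typing import Optional

FOLD_E_VALUE_MAX = 1e-4    # fold assignment: PSI-BLAST E-value <= 10^-4
RELIABLE_SCORE_MIN = 0.7   # reliable model: model score >= 0.7


def coverage_category(best_e_value: Optional[float], best_model_score: Optional[float]) -> str:
    """Assign one sequence to a coverage category; None means no hit or no model (assumption)."""
    if best_e_value is None and best_model_score is None:
        return "Not Attempted"
    fold = best_e_value is not None and best_e_value <= FOLD_E_VALUE_MAX
    reliable = best_model_score is not None and best_model_score >= RELIABLE_SCORE_MIN
    if fold and reliable:
        return "Reliable Model + Fold Assignment"
    if fold:
        return "Fold Assignment Only"
    if reliable:
        return "Reliable Model Only"
    return "Neither criterion met"  # not a separate slice on the slide


print(coverage_category(1e-10, 0.9))
```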

SLIDE 46

http://salilab.org/modbase

Pieper et al., Nucl. Acids Res. 2002.

8/9/02

SLIDE 47

Conclusions

  • At present, useful 3D models can be obtained for domains in ~55% of the proteins (25% of domains).
  • Sampling at >30% sequence identity level.
  • Completeness in structural coverage.
  • Application to biological problems.

SLIDE 48

http://www.salilab.org

Acknowledgments

Andrej Sali, Frank Alber, Fred Davis, Damien Devos, Narayanan Eswar, Rachel Karchin, Libusha Kelly, Michael F. Kim, Dmitry Korkin, M. S. Madhusudhan, Nebojsa Mirkovic, Ursula Pieper, Andrea Rossi, Min-yi Shen, Maya Topf