 
Computational and Mathematical Biology in the Genomics Age: Predicting protein structures
Ron Elber, Cornell
11/13/2006
Crash course on proteins
• Proteins are one-dimensional polymers
• Made of 20 types of monomers (amino acids) with different side chains (ACDEFG…) but the same backbone
• Fold into a well-defined 3D shape that includes secondary structure elements (helices, sheets)
• They are the machines of the smallest living entities (cells)
Why protein structures? Sequence determines 3D shape. Shape determines function.
(Figure labels: sequence ACDEFGHIJKLMNPQ; Active site!; Drug design…)
Approaches to determine protein structure
• Experiment (X-ray, NMR): months
• Modeling the chemical physics: weeks
• Homology-based modeling: hours
Structures Are Evolutionary Templates
A high degree of structural similarity is often observed in proteins with diverse sequences and from different species, even at about 15 percent sequence identity (below the noise level of sequence comparison).
(Figure: oxygen transport proteins, leghemoglobin in plants and myoglobin in mammals.)
Three steps in homology modeling
• Identify a structural template for the unknown sequence (ACEFGH….)
• Align the unknown sequence to the structural template:
    A - C D W L K
    A R C - F L R
• Build an atomic model based on the template
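Step 3 needs to know which template position each residue of the unknown sequence occupies. The sketch below recovers this from a gapped alignment. It rests on several assumptions. The rows are equal-length strings with '-' for gaps. The first row is the unknown sequence and the second is the template, since the slide does not label the rows. The function name `aligned_pairs` is hypothetical and does not come from any tool used in the lecture.

```python
from typing import List, Optional, Tuple


def aligned_pairs(query_row: str, template_row: str,
                  gap: str = "-") -> List[Tuple[Optional[int], Optional[int]]]:
    """Return one (query_index, template_index) pair per alignment column.

    Indices are 0-based positions in the ungapped sequences; None marks a gap.
    """
    if len(query_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    pairs = []
    qi = ti = 0  # next ungapped position in each sequence
    for q, t in zip(query_row, template_row):
        pairs.append((None if q == gap else qi, None if t == gap else ti))
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return pairs


# Residues placed on a template position are the ones whose coordinates
# a model could copy from the template.
pairs = aligned_pairs("A-CDWLK", "ARC-FLR")
placed = {q: t for q, t in pairs if q is not None and t is not None}
print(placed)  # {0: 0, 1: 2, 3: 3, 4: 4, 5: 5}
```

Residue 2 (D) has no template position. In a real model it would have to be built separately, for example as a loop.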
Measures of tertiary structure fitness
Instead of direct sequence comparison:

1BIN:A    2/3    AFTEKQDALVSSSFEAFKANIPQYSVVFYTSILEKAPAAKDLFSFLANG-----VDPTNP
1MBC:_    1/2    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASE-

1BIN:A   57/58   KLTGHAEKLFALVRDSAGQLKASGTVV--ADAALGSVHAQKAVTDPQFVVVKEALLKTIK
1MBC:_   60/61   DLKKHGVTVLTALGAILKKK---GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLH

1BIN:A  115/116  AAVGDKWSDELSRAWEVAYDELAAAIKKA
1MBC:_  117/118  SRHPGDFGADAQGAMNKALELFRKDIAAK

Match an unknown sequence to a known structure:
AFTEKQDALVSSSFEAFKANIPQYSVVFYTSILE
KAPAAKDLFSFLANGVDPTNPKLTGHAEKLFA
LVRDSAGQLKASGTVVADAALGSVHAQKAVT
DPQFVVVKEALLKTIKAAVGDKWSDELSRAW
EVAYDELAAAIKKA
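Sequence identity is the direct comparison this slide sets aside, and it can be computed from the two aligned rows. This sketch counts identity only over columns where neither row has a gap. That is an assumption: other conventions divide by the alignment length or by the length of the shorter sequence. The 1BIN:A and 1MBC:_ rows are copied from the slide. The single long dash in the second 1BIN:A block is written as `--` so that every block keeps 60 columns.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of identical residues among columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    paired = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not paired:
        return 0.0
    return sum(a == b for a, b in paired) / len(paired)


print(percent_identity("A-CDWLK", "ARC-FLR"))  # 0.6 (3 identities in 5 gap-free columns)

# The leghemoglobin (1BIN:A) / myoglobin (1MBC:_) alignment shown above.
bin_a = ("AFTEKQDALVSSSFEAFKANIPQYSVVFYTSILEKAPAAKDLFSFLANG-----VDPTNP"
         "KLTGHAEKLFALVRDSAGQLKASGTVV--ADAALGSVHAQKAVTDPQFVVVKEALLKTIK"
         "AAVGDKWSDELSRAWEVAYDELAAAIKKA")
mbc = ("VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASE-"
       "DLKKHGVTVLTALGAILKKK---GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLH"
       "SRHPGDFGADAQGAMNKALELFRKDIAAK")
print(f"{percent_identity(bin_a, mbc):.1%}")
```

Low identity alone cannot tell whether two proteins share a fold. This is why the slides move to measures of structural fitness.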
Sequence structure function
• Testing folds: ISTHISMYSHAPE
• Find homologs: ANYRELATIVES PERHAPSIAM
A Machine Learning Algorithm to Match a Protein Sequence to a Homolog Structure
• Potential design: formulation and application
• Generating and learning alignments
• Applications
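Potential design
Pair or contact potential: $E = \sum_{i>j} u(r_{ij}; P)$
Profile potential: $E = \sum_{i} u(x_i; P)$
Here $P$ denotes the potential parameters, $r_{ij}$ the distance between residues $i$ and $j$, and $x_i$ the structural environment of site $i$.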
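The two functional forms can be evaluated with the Python sketch below. It makes these hypothetical assumptions:
- The pair term $u(r_{ij}; P)$ is a step function equal to the table entry for the residue types of $i$ and $j$ when $r_{ij}$ is below a cutoff, and zero otherwise.
- The profile term looks up a table entry for the residue type and a discrete environment label $x_i$.
- The cutoff of 6.5 and the minimum chain separation `min_sep` are illustrative, not values from the lecture.

`math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def pair_energy(seq: str, coords: Sequence[Coord],
                P: Dict[Tuple[str, str], float],
                cutoff: float = 6.5, min_sep: int = 2) -> float:
    """Contact potential E = sum_{i>j} u(r_ij; P).

    Assumed form: u = P[(a_i, a_j)] if r_ij < cutoff, else 0. Pairs closer
    than min_sep along the chain are skipped (an illustrative choice).
    """
    energy = 0.0
    for i in range(len(seq)):
        for j in range(i - min_sep + 1):  # j <= i - min_sep, so i > j
            if math.dist(coords[i], coords[j]) < cutoff:
                key = (seq[i], seq[j])
                # The table is symmetric; accept either ordering of the pair.
                energy += P.get(key, P.get(key[::-1], 0.0))
    return energy


def profile_energy(seq: str, environments: Sequence[str],
                   P: Dict[Tuple[str, str], float]) -> float:
    """Profile potential E = sum_i u(x_i; P): one independent term per site."""
    return sum(P[(a, x)] for a, x in zip(seq, environments))


coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(pair_energy("ACD", coords, {("A", "D"): -1.0}))  # -1.0 (only the A-D pair is in contact)
```

The profile form decomposes into one term per site. This is what makes it compatible with dynamic-programming alignment. The pair form couples two sites and does not decompose this way.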
Learning the fold that matches a sequence from the set of all known structures
$E(S_n, X_i; P) - E(S_n, X_n; P) > 0$, where $S_n = a_1 a_2 a_3 \ldots a_n$
The sequence $S_n$ must have a lower energy in its native structure $X_n$ than in any other known structure $X_i$.
Learning folds: find a potential that recognizes the native fold
$E(S_n, X_i; P) - E(S_n, X_n; P) > 0$
$E(X) = \sum_i p_i f_i(X)$
Contact energies: $E = \sum_{\alpha} p_\alpha n_\alpha$, where $n_\alpha$ is the number of contacts of type $\alpha$
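$E$ is linear in the parameters, so checking whether a given potential recognizes the native fold reduces to comparing weighted contact counts. The following is a minimal Python sketch. The two-letter H/P contact types and the parameter values are hypothetical toy data, not lecture values.

```python
from typing import Dict, List

Counts = Dict[str, int]  # contact type alpha -> n_alpha


def energy(n: Counts, p: Dict[str, float]) -> float:
    """Linear energy E = sum_alpha p_alpha * n_alpha."""
    return sum(p[alpha] * count for alpha, count in n.items())


def recognizes_native(native: Counts, decoys: List[Counts],
                      p: Dict[str, float]) -> bool:
    """True if every decoy X_i satisfies E(S_n, X_i; P) - E(S_n, X_n; P) > 0."""
    e_native = energy(native, p)
    return all(energy(d, p) - e_native > 0 for d in decoys)


# Hypothetical contact types and parameters (H = hydrophobic, P = polar).
p = {"HH": -1.0, "HP": 0.0, "PP": 0.5}
native = {"HH": 3, "PP": 1}  # E = -2.5
print(recognizes_native(native, [{"HH": 1, "HP": 2, "PP": 1}], p))  # True  (decoy E = -0.5)
print(recognizes_native(native, [{"HH": 4}], p))                    # False (decoy E = -4.0)
```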
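Mathematical programming approach to potential design (contact energies): interior point, SVM
$E = \sum_{i>j} p_{ij} = \sum_{\alpha} n_\alpha p_\alpha$ (the first sum runs over residue pairs in contact)
$\Delta E_{i,nat} = E_i - E_{nat} > 1$
$\Delta E_{i,nat} = \sum_{\alpha} (n_\alpha^{i} - n_\alpha^{nat})\, p_\alpha = \Delta n \cdot p > 1$
$\min_p \|p\|^2$ subject to the constraints above; $p$ is the unknown.
Because $E$ is linear in $p$, any solution of the strict inequalities $\Delta n \cdot p > 0$ can be rescaled so that every energy gap exceeds 1. The margin of 1 only fixes the scale of $p$.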
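The lecture solves this system with interior-point LP or SVM methods. As a dependency-free stand-in, the sketch below uses a margin perceptron, which is a swapped-in technique and not the method used in the lecture. It searches for some $p$ with $\Delta n \cdot p > 1$ for every decoy. If the system is feasible, the standard perceptron convergence argument guarantees that it stops after finitely many updates. If the system is infeasible, it gives up after `max_epochs` (a hypothetical parameter). Unlike the SVM objective, it does not minimize $\|p\|^2$.

```python
from typing import List, Optional, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def find_potential(delta_ns: Sequence[Sequence[float]],
                   max_epochs: int = 1000) -> Optional[List[float]]:
    """Search for p with dot(delta_n, p) > 1 for every row delta_n = n^i - n^nat.

    Margin perceptron: add each violated row to p until no row is violated.
    Returns None if max_epochs passes without satisfying all constraints.
    """
    p = [0.0] * len(delta_ns[0])
    for _ in range(max_epochs):
        violated = False
        for dn in delta_ns:
            if dot(dn, p) <= 1.0:
                p = [pk + dk for pk, dk in zip(p, dn)]
                violated = True
        if not violated:
            return p
    return None


rows = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy Delta n vectors
print(find_potential(rows))                    # [2.0, 2.0]
print(find_potential([[1.0], [-1.0]], 50))     # None: no p satisfies both
```

The second call mirrors the situation reported on the next slides: when the inequalities contradict each other, no parameter set exists.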
$\Delta E_{i,nat} = \sum_{\alpha} (n_\alpha^{i} - n_\alpha^{nat})\, p_\alpha = \Delta n \cdot p > 1$
Learning the correct fold using 60 million comparisons between native and wrong structures
$E(S_n, X_i) - E(S_n, X_n) > 0, \quad i = 1, \ldots, 60\,000\,000$, where $S_n = a_1 a_2 a_3 \ldots a_n$
General pairwise potentials are insufficient to recognize the correct protein fold for a large set of protein-like structures (13 steps optimized independently lead to infeasibility): Tobi & Elber, Proteins 41, 40-46 (2000).
Pairwise potentials are better than profile models (to be shown) but still not good enough. Statistical enhancement of the signal is needed.
Threading Onion Model (THOM2)
An improved profile model that combines the accuracy of pairwise energies with the efficiency of profile energies.
Effective pair energies are defined in terms of the structural fingerprints of sites in contact…
THOM2 yields effective pair interactions while maintaining the efficiency of profile models.
• Comparable performance to contact potentials (with 300 parameters) in terms of self-recognition
• LP-derived optimal parameters (interior point algorithms!)
• Optimal alignments with gaps found using dynamic programming
• Need for gap penalties for family recognition…
Alignment
Even if we identify a homolog, the problem of structural modeling is not solved. An accurate alignment is crucial for successful modeling. Also, the presence of gaps can make the identification more difficult.
(Figure: a gapped alignment of the sequence a1 a2 a3 a4, with two gaps, against the structure x1 x2 x3 x4 x5, with one gap.)
If we need gaps, we call the fitness function a score (instead of an energy) and denote it by T.
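For one fixed gapped alignment, the score T adds a term for every aligned column and a penalty for every gap. The sketch below assumes a linear gap penalty `g` per gapped column and a caller-supplied scoring function `s(a, x)`. Both are hypothetical placeholders for the learned parameters. The toy example scores letters against letters rather than residues against structural sites.

```python
from typing import Callable


def toy_score(a: str, x: str) -> float:
    """Hypothetical score: +1 for identical letters, -1 otherwise."""
    return 1.0 if a == x else -1.0


def alignment_score(seq_row: str, struct_row: str,
                    s: Callable[[str, str], float], g: float,
                    gap: str = "-") -> float:
    """Score T of a fixed alignment: s(a, x) per aligned column plus g per gap column."""
    if len(seq_row) != len(struct_row):
        raise ValueError("aligned rows must have the same length")
    T = 0.0
    for a, x in zip(seq_row, struct_row):
        if a == gap and x == gap:
            raise ValueError("a column cannot be a gap in both rows")
        T += g if gap in (a, x) else s(a, x)
    return T


print(alignment_score("A-CDWLK", "ARC-FLR", toy_score, g=-2.0))  # -3.0
```

This example has three identities (+3), two mismatches (-2), and two gaps (-4), for a total of -3.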
An alignment is a path in a dynamic programming table

        −     a1    a2    a3    a4    a5
  −     0     g     2g    3g    4g    5g
  x1    g
  x2    2g
  x3    3g
  x4    4g
  x5    5g

Each cell is entered from its left neighbor (→), from the cell above (↓), or diagonally (\). The first row and column accumulate gap penalties.
Finding the optimal alignment is quadratic in the protein length using dynamic programming.
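The sketch below is in the style of Needleman-Wunsch global alignment. It fills the table in O(nm) time and then traces the optimal path back from the bottom-right cell. It assumes a linear gap penalty `g`, so k leading gaps cost k·g, and treats higher scores as better. The names `global_align` and `toy_score` are hypothetical.

```python
from typing import Callable, List, Tuple


def toy_score(a: str, x: str) -> float:
    """Hypothetical score: +1 for identical letters, -1 otherwise."""
    return 1.0 if a == x else -1.0


def global_align(a: str, x: str, s: Callable[[str, str], float],
                 g: float) -> Tuple[float, str, str]:
    """Optimal global alignment of a (length n) and x (length m) in O(nm) time and space."""
    n, m = len(a), len(x)
    # T[i][j] = optimal score for aligning a[:i] with x[:j].
    T: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + g  # leading gaps accumulate: i * g
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(T[i - 1][j - 1] + s(a[i - 1], x[j - 1]),  # diagonal: pair a_i with x_j
                          T[i - 1][j] + g,                          # a_i opposite a gap
                          T[i][j - 1] + g)                          # x_j opposite a gap
    # Trace the optimal path back from the bottom-right corner.
    out_a: List[str] = []
    out_x: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and T[i][j] == T[i - 1][j - 1] + s(a[i - 1], x[j - 1]):
            out_a.append(a[i - 1])
            out_x.append(x[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and T[i][j] == T[i - 1][j] + g:
            out_a.append(a[i - 1])
            out_x.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_x.append(x[j - 1])
            j -= 1
    return T[n][m], "".join(reversed(out_a)), "".join(reversed(out_x))


print(global_align("ACD", "AD", toy_score, g=-2.0))  # (0.0, 'ACD', 'A-D')
```

The boundary cells are filled cumulatively rather than as `i * g`. As a result, the traceback's equality tests recompute exactly the same floating-point values that were stored.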
Dynamic programming
Find the optimal alignment for a given set of parameters.
$T(n, m)$: the optimal score for aligning a sequence of length $n$ against a sequence of length $m$.
If we had the optimal scores for the following earlier alignments:
$T(n-1, m-1)$, $T(n-1, m)$, $T(n, m-1)$,
can we construct the score $T(n, m)$? Yes…
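The standard recurrence combines the three earlier scores as shown below. It assumes a linear gap penalty $g$ and a score $s(a_n, x_m)$ for placing residue $a_n$ opposite position $x_m$; both are assumed quantities, not definitions from the lecture. The boundary values are $T(0,0) = 0$ and $T(k,0) = T(0,k) = k\,g$. If $T$ is treated as an energy to be lowered, replace max by min.

$$
T(n, m) = \max
\begin{cases}
T(n-1, m-1) + s(a_n, x_m) & \text{align } a_n \text{ with } x_m \\
T(n-1, m) + g & a_n \text{ opposite a gap} \\
T(n, m-1) + g & x_m \text{ opposite a gap}
\end{cases}
$$

Each of the $(n+1)(m+1)$ cells is filled in constant time, which is the source of the quadratic cost.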