Inferring protein functions by matching binding surfaces through evolutionary models
Jie Liang (Joint work with Jeffrey Tseng)
- Dept. of Bioengineering
- University of Illinois at Chicago
Outline
Methodology: Computational geometry of
All β α/β
(from SCOP)
Tenascin (1ten): all beta proteins; Ig-like beta sandwich.
Phosphotransferase (1poh): a+b proteins; HPr fold. (SCOP)
– Sequence homologs are often hypothetical proteins.
– Geometric computation: a library of >2 million surface patterns on >20,000 PDB structures (cast.engr.uic.edu).
– Similarity measure: sequence patterns, coordinate RMSD, and orientational RMSD.
– Scoring matrix.
(Binkowski, Adamian, and Liang,
(Mucke and Edelsbrunner, ACM Trans. Graphics. 1994. Edelsbrunner, et al, Discrete Applied Math. 1998.)
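The coordinate RMSD listed among the similarity measures can be illustrated with a minimal sketch. It assumes the two surface patterns have already been paired residue by residue and superimposed; the function name `coordinate_rmsd` and the sample coordinates are hypothetical, and the slides do not state how the superposition or the orientational RMSD is computed.

```python
import math

def coordinate_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: points are already paired and superimposed;
    no optimal rotation or translation is computed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between paired points.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates: pocket_b is pocket_a shifted by 1 along z.
pocket_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pocket_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(coordinate_rmsd(pocket_a, pocket_b))
```

A full comparison would first find the rigid-body superposition that minimizes this quantity; that step is left out of the sketch.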
[Plot: number of voids and pockets vs. number of residues]
1cdk.A    49 LGTGSFGRVMLVKHKETGNHFAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEYSFKDNSNL
             YMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFG
             FAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVR
             FPSHFSSDLKDLLRNLLQVDLTKRFGNLKDGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSN
             F 327
1cdk.A_p  49 LGTGSFGRV A K V MEYV E K EN L TD F
2src.m   273 LGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIV
             TEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVAD 404
2src.m_p 273 LGQGCFGEV A K V TEYM GS D D R AN L AD
Tyr Protein Kinase c-src
cAMP Dependent Protein Kinase
$$S' = \lambda S - \ln(Kmn), \qquad p(S' \geq x) = 1 - \exp\left(-e^{-x}\right)$$
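As a rough illustration of the extreme-value statistics used here, with normalized score S' = λS − ln(Kmn) and p(S' ≥ x) = 1 − exp(−e^{−x}), the sketch below converts a raw score into a p-value. The function names and the numeric values of `lam`, `K`, `m`, and `n` are hypothetical; the slides do not give fitted parameters, and taking m and n as the sizes of the two patterns being compared is an assumption based on the usual convention.

```python
import math

def normalized_score(raw_score, lam, K, m, n):
    """S' = lam * S - ln(K * m * n)."""
    return lam * raw_score - math.log(K * m * n)

def p_value(s_prime):
    """p(S' >= x) = 1 - exp(-e^{-x}).

    -expm1(y) equals 1 - exp(y); it stays accurate when e^{-x} is tiny.
    """
    return -math.expm1(-math.exp(-s_prime))

# Hypothetical parameters, for illustration only.
lam, K, m, n = 0.5, 0.1, 10, 10
s_prime = normalized_score(10.0, lam, K, m, n)
print(s_prime, p_value(s_prime))
```

For large normalized scores the p-value approaches e^{−x}, so each additional unit of S' lowers it by roughly a factor of e.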
HIV-1 Protease (5hvp)
– Family: Retroviral protease
– Class: All β
– Fold: Acid proteases
– Pocket: binds polypeptide substrate; acetyl-pepstatin
– CATH

Heat Shock Protein 90 (1yes)
– Family: Hsp90
– Class: α+β
– Fold: α/β sandwich
– Pocket: binds protein segment; geldanamycin
– CATH
In both pockets, conserved residues are important for polypeptide binding.
Both pockets undergo conformational changes upon binding.
[Equations: a column vector h with entries indexed 1, 2, ..., s, and a sum over residues x ∈ A with indices i ∈ I and (i, j) ∈ ε]
[Figure: tree with nodes labeled 1 to 16; scale bar: 0.1 substitution/site]
[Plot: negative log likelihood vs. number of steps; data points collected every 500 simulation steps]
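Recording the negative log likelihood at fixed intervals during a Markov chain Monte Carlo run can be sketched as follows. This is a toy Metropolis sampler for a single rate parameter of exponentially distributed data, not the substitution-rate model of the talk; the function names, the flat prior, the proposal width, and the sample data are all assumptions made for illustration. Only the recording interval of 500 steps is taken from the slide.

```python
import math
import random

def neg_log_likelihood(rate, waits):
    """Negative log likelihood of exponential waiting times with the given rate."""
    return -(len(waits) * math.log(rate) - rate * sum(waits))

def metropolis_trace(waits, n_steps=5000, record_every=500, step=0.1, seed=0):
    """Random-walk Metropolis over rate > 0 with a flat prior (assumed).

    Returns a list of (step index, rate, negative log likelihood),
    recorded once every `record_every` steps.
    """
    rng = random.Random(seed)
    rate = 1.0
    current = neg_log_likelihood(rate, waits)
    trace = []
    for i in range(1, n_steps + 1):
        proposal = rate + rng.gauss(0.0, step)  # symmetric proposal
        if proposal > 0:  # proposals outside the support are rejected
            cand = neg_log_likelihood(proposal, waits)
            # Accept with probability min(1, exp(current - cand)).
            if cand <= current or rng.random() < math.exp(current - cand):
                rate, current = proposal, cand
        if i % record_every == 0:
            trace.append((i, rate, current))
    return trace

waits = [0.5, 1.2, 0.3, 0.8, 1.0]  # hypothetical data
for step_index, rate, nll in metropolis_trace(waits):
    print(step_index, round(rate, 3), round(nll, 3))
```

Plotting the third column against the first gives a trace of the same kind as the one on the slide, which is a common way to judge whether a chain has settled.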
[Plot over rate index; series: JTT model, Parameters Estimated 1, Parameters Estimated 2]
[Plot: relative error]
[Plot: relative error vs. sequence length]
[Plot: Amino Acid Composition of Active Site Pockets; x-axis: amino acid (A R N D C Q E G H I L K M F P S T W Y V); y-axis: amino acid probability; series: active site pocket composition, JTT amino acid composition]
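An amino acid composition of the kind compared here can be computed by counting residues and dividing by the total. The sketch below does this for a single pocket, using the 1cdk.A_p pocket residues from the kinase alignment earlier in the talk; the function name `composition` is hypothetical, and the composition in the talk is presumably pooled over many pockets rather than taken from one.

```python
from collections import Counter

# The 20 amino acids in the order used on the plot axes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def composition(residues):
    """Fraction of each of the 20 amino acids in a string of one-letter codes.

    Characters outside the 20-letter alphabet (spaces, gaps) are ignored.
    """
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no amino acid residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

# Pocket residues of 1cdk.A_p as listed in the alignment (spaces are ignored).
pocket = "LGTGSFGRV A K V MEYV E K EN L TD F"
comp = composition(pocket)
for aa in AMINO_ACIDS:
    print(aa, round(comp[aa], 3))
```

Pooling is a matter of concatenating pocket strings before counting, which weights each residue equally rather than each pocket.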
[Histogram: active site pocket length distribution; x-axis: length; y-axis: frequency]
Total: 6273 protein functional pockets; mean ~35 residues; median ~23 residues.
[Figure: four panels over amino acid pairs (both axes: A R N D C Q E G H I L K M F P S T W Y V)]
(a) The Active Pocket [ValidPairs: 39]
(b) The rest of Surface [ValidPairs: 177]
(c) Interior [ValidPairs: 190]
(d) Surface [ValidPairs: 187]
S_ij: (i, j) are residues shown in the same column of the MSA, defined as sampled pairs; the S_ij are estimated by Bayesian MCMC.
[Figure: tree with leaves NP_991531, 1A7U, 1BRT, 1HKH, 1A88, 1A8S, 1A8Q, 1IUN, 1J1I, ZP_00217495, ZP_00221914, ZP_00194950, NP_822724, ZP_00031039, ZP_00032388, NP_069699, ZP_00081046, NP_273326, NP_618510, NP_871588, NP_778087, NP_249193, NP_790345, ZP_00086843, NP_904047, NP_298645, NP_796527, 1M33; scale bar: 0.1 substitutions/site]
[Numeric labels on the tree: 99 100 94 68 100 93 91 60 62 70 85 44 63 82 97 66 99 53 93 98 100 49 85 90 85]