Prediction of Human Protein Kinase Substrate Specificities



SLIDE 1

Prediction of Human Protein Kinase Substrate Specificities

Javad Safaei (1), Jan Manuch (1), Arvind Gupta (1), Ladislav Stacho (2), Steven Pelech (3)

1. UBC, Department of Computer Science
2. SFU, Department of Mathematics
3. UBC, Department of Medicine, and Kinexus Bioinformatics Corporation

SLIDE 2

Cell Signaling Network

  • The human body consists of different types of cells
  • Cells contain 23,000 different protein types
  • Cell types differ in the level of each protein type
  • Defects in the cell signaling network lead to 400 diseases (esp. cancer, diabetes, and Alzheimer's disease)
  • Modeling the network is useful for drug discovery

SLIDE 3

Major components in cell phosphorylation signaling

  • Each component participates in interactions via its domains (d1, d2)
  • Phosphorylation creates dramatic changes in the 3D structure of proteins, leading to their inhibition or stimulation
  • Kinases are S, S-T, or Y specific, according to the residues they phosphorylate (serine, threonine, tyrosine)

SLIDE 4

Dynamics of Kinase-Substrate Interaction (Docking)

  • The kinase-phosphosite interaction follows a lock-and-key model
  • Active sites should be close to each other for interaction
  • Important factors in the bond:
      • Size and position of amino acids
      • Charge of amino acids

[Figure: a protein kinase docking with a protein substrate; residues are marked "+" and "H"]

SLIDE 5

cAMP-dependent Protein Kinase Structure (PKA)

  • Alanine at position 0 (base), Isoleucine at position +1, Arginine at position -2
  • Phospho-S peptide: GRTGRRNSIHPDSAC
  • Sub-domains are shown in color
  • SDRs (specificity-determining residues) are the key residues helpful for specificity prediction

[Figure labels: L205, L198, P202; +1 I; -2 R; E230, P169, E170]

SLIDE 6

Problem and Dataset nature

  • Peptides are usually found in vitro by mass spectrometry
  • A peptide is a short sub-sequence of length 15 centered at the phosphosite (S, T, or Y)
  • Kinases fall into three groups:
      • with a lot of peptides
      • with a few peptides
      • with no peptides
  • The problem is to find the PSSM (kinase specificity) of every kinase, knowing only its primary structure
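The 15-residue peptide window described on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `phospho_windows`, the `flank` parameter, and the `-` padding character are assumptions introduced here, and the function lists every S, T, or Y as a candidate site rather than only experimentally confirmed phosphosites.

```python
def phospho_windows(protein, flank=7, pad="-"):
    """Return (site_index, window) for every S, T or Y residue in `protein`.

    Each window has length 2 * flank + 1 (15 by default) and is centered on
    the candidate site; positions past either end are filled with `pad`.
    """
    padded = pad * flank + protein + pad * flank
    windows = []
    for i, residue in enumerate(protein):
        if residue in "STY":
            # Residue i of `protein` sits at index i + flank of `padded`,
            # so the slice below is centered on it.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Peptide shown on the PKA slide, used here only as sample input.
    for site, window in phospho_windows("GRTGRRNSIHPDSAC"):
        print(site, window)
```

Padding keeps every window the same length, so each peptide position can later be treated as one column of a fixed-width matrix.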

SLIDE 7

Alignments of catalytic domains

  • Done with the ClustalW tool
  • Refined manually by experts
  • Each column is a random variable (RV)
  • We can now infer how the dynamics of the binding will be

SLIDE 8

Charge Matrix R(xi,yj)

  • Glycine is favoured to be on the peptide
  • Histidine is less positive than the other positively charged amino acids
  • S, T, and Y are neutral but tend to attract each other
  • Proline is neutral and creates a stair-like structure on the protein

SLIDE 9

Graphical model of the interaction

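  • Mutual Information Charge Dependency (n is the number of training data for each RV)
  • Correlation Charge Dependency

[Figure: graphical model linking kinase alignment columns ..., X1, X2, X3, X4, ..., X245, X246, X247, ... to peptide positions Y1, Y2, Y3, ..., Y15]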
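The formulas on this slide did not survive extraction. As a rough point of reference only, the plain empirical mutual information between one alignment column X and one peptide position Y can be sketched as below. This is the textbook estimate over residue symbols; it does not include the charge matrix R(xi,yj), so it should not be read as the slide's "Mutual Information Charge Dependency". The function name and the paired-list input format are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two paired symbol lists.

    xs[t] is the residue in an alignment column for training pair t and
    ys[t] is the residue at a peptide position for the same pair.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    print(mutual_information("RRKKRRKK", "DDEEDDEE"))  # fully dependent pair
    print(mutual_information("RRKK", "DEDE"))          # independent pair
```

With few training pairs per column (small n), this estimate is biased upward, which is one reason a count-aware score may be preferable in practice.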

SLIDE 10

Graphical model of the interaction

  • Pick the top 7 X variables as SDRs
  • Compute the probability of each amino acid on the peptide
  • Having trained the model, we can predict the specificity matrix for a new aligned kinase catalytic domain, knowing only its SDRs

[Figure: SDR nodes Z1, Z2, Z3, ..., Z7 connected to peptide position Y1, with edges labeled Cc(Z1,Y1), Cc(Z2,Y1), Cc(Z3,Y1), ..., Cc(Z7,Y1)]
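The selection step on this slide (keep the 7 highest-scoring X variables as SDRs, then read only those positions from a new aligned kinase) can be sketched as below. The slide does not say how the per-column score is computed or how ties are broken, so the score list is taken as given, ties go to the lower column index, and both function names are hypothetical.

```python
def top_k_columns(scores, k=7):
    """Return the indices of the k highest-scoring alignment columns.

    `scores[c]` is an already-computed dependency score for column c
    (assumption: higher means more informative). Ties go to the lower index.
    """
    order = sorted(range(len(scores)), key=lambda c: (-scores[c], c))
    return order[:k]


def sdr_residues(aligned_domain, columns):
    """Read the residues of one aligned catalytic domain at the SDR columns."""
    return "".join(aligned_domain[c] for c in columns)


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.3, 0.7, 0.5]           # toy scores, not real data
    columns = top_k_columns(scores, k=3)
    print(columns, sdr_residues("AK-EL", columns))
```

Because the columns come from a shared alignment, the same indices can be applied to any kinase domain aligned to it, including one with no peptide data.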

SLIDE 11

Compute profile matrix of a kinase without peptide data

  • Having trained the model, we can predict the profile matrix for a new aligned kinase catalytic domain, knowing only its SDRs

SLIDE 12

Data and Process Flow

[Flow chart; boxes listed in extracted order]
  • NetPhorest predictor sites
  • 9,125 kinase-phosphopeptide pairs for 309 kinase domains
  • 550 kinases in human
  • 500 kinase catalytic domains from 488 kinases
  • Phospho.ELM
  • PhosphoSitePlus
  • Literature
  • 229 kinases with consensus sequences
  • Compute background (surface) frequency of amino acids
  • Compute profile matrix of 309 kinase domains with data
  • Compute specificity (PSSM) matrices of 309 kinase domains
  • Find SDRs and profile matrix of 500 kinases with no data
  • Compute PSSM matrices for 500 domains
  • Remove atypical kinases
  • Comparison in experiment (two boxes)
  • Machine learning: ANN, SVM, HMM

SLIDE 13

Definitions

  • Background frequency $B(i)$: the probability of amino acid $i$ on the surface, computed from the peptide training data
  • Profile matrix of each kinase, $P_k(i,j)$: the frequency of amino acid $i$ at position $j$ of the peptides phosphorylated by kinase $k$
  • The specificity (PSSM) matrix of a kinase is usually the log-odds ratio:
      $M_k(i,j) = \log\left( P_k(i,j) / B(i) \right)$
  • We used the following equation to eliminate $-\infty$ entries in the matrix:
      $M_k(i,j) = \operatorname{sgn}\left( P_k(i,j) - B(i) \right) \times \left| P_k(i,j) - B(i) \right|^{1.2}$
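The definitions on this slide can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the background is estimated by pooling all residues of the peptides passed in, the 1.2 is read as an exponent, and the function names and toy peptides are made up for the example.

```python
from math import copysign

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background(peptides):
    """B(i): frequency of amino acid i over all residues of all peptides."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for pep in peptides:
        for a in pep:
            if a in counts:  # skip padding or unknown symbols
                counts[a] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in AMINO_ACIDS}


def profile(peptides, length=15):
    """P_k(i, j): frequency of amino acid i at position j, as a list of columns."""
    columns = []
    for j in range(length):
        counts = {a: 0 for a in AMINO_ACIDS}
        for pep in peptides:
            if pep[j] in counts:
                counts[pep[j]] += 1
        n = sum(counts.values())
        columns.append({a: (counts[a] / n if n else 0.0) for a in AMINO_ACIDS})
    return columns


def signed_power_pssm(P, B, exponent=1.2):
    """M_k(i, j) = sgn(P - B) * |P - B| ** exponent; finite even when P is 0."""
    return [
        {a: copysign(abs(col[a] - B[a]) ** exponent, col[a] - B[a])
         for a in AMINO_ACIDS}
        for col in P
    ]


if __name__ == "__main__":
    toy = ["RSA", "RSG"]  # toy 3-residue peptides, not real data
    M = signed_power_pssm(profile(toy, length=3), background(toy))
    print(M[0]["R"], M[0]["A"])
```

Unlike the log-odds form, an amino acid that never occurs at a position gets a finite negative score instead of negative infinity.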

SLIDE 14

Predicted vs. Experimental profile matrices

  • Comparison for the 309 kinases for which we have phospho-peptide data
  • Prediction was 100% correct in recognizing S, S-T, and Y specific kinases, using only their aligned SDRs
SLIDE 15

Comparison with NetPhorest

  • NetPhorest has
      • 8,746 phosphosite-kinase pairs
      • 169 kinases
      • 50 kinase groups
      • Doesn't work for kinases with no data
  • Keeping the best kinase for each site leads to 6,299 site-kinase pairs for comparison
  • Our method
      • Works for all 500 kinases
      • 500 different profile matrices and specificities
      • Finds SDRs, yielding information about 3D structure

SLIDE 16

Future work (1)

  • Hybrid recommender systems for prediction
  • The sparse utility matrix should be completed
  • SDRs are therefore important features in the user spec vector

[Figure: utility matrix U (N×M) indexed by User/Kinase 1, 2, ..., N and Movie/Peptide 1, 2, 3, ..., M, with known entries u1, u2, u3, u4, u5 and an unknown entry "?"; similarity matrices S (N×N) and Q (M×M)]

SLIDE 17

Future work (2)

  • Generalize it to SH2 and PTB domain proteins, to complete our model of the cell signalling pathway
  • We have many crystallographic datasets from PDB, and computational geometry or vision methods can be applied
  • As in the user-movie problem, there is a signal strength between SH2 domain proteins and receptor (substrate) proteins