Evolutionary Conservation of Human Phosphorylation Sites

SLIDE 1

Evolutionary Conservation of Human Phosphorylation Sites

Javad Safaei (1), Jan Manuch (1), Arvind Gupta (1), Ladislav Stacho (2), Steven Pelech (3)

1. UBC, Department of Computer Science
2. SFU, Department of Mathematics
3. UBC, Department of Medicine, and Kinexus Bioinformatics Corporation

SLIDE 2

Cell Signaling Network

The human body consists of different types of cells

23,000 different protein types occur in cells

Cell types differ in the level of each protein type

Defects in the cell signaling network lead to 400 diseases (especially cancer, diabetes, and Alzheimer's disease)

Modeling the network is useful for drug discovery

SLIDE 3

Cell Phosphorylation Network

Network defects correlate with 400 diseases (e.g., cancer)

Phosphorylation

  • An important post-translational modification (PTM)
  • Protein kinases phosphorylate substrates, protein phosphatases dephosphorylate substrates, and phospho-dependent proteins bind to the phosphate and move around it
  • Can change protein function and 3D structure dramatically

Phosphosites

  • Occur only on serine (S), threonine (T), tyrosine (Y), and rarely histidine (H)
  • Fall into two main groups:
  • Inhibitory: inhibit the protein's activity
  • Activatory: activate the protein

SLIDE 4

Phosphosites Conservation

Why?

  • Correlate conservation with inhibitory/activatory sites
  • Correlate conservation with confirmed disease mutation data
  • Investigate conservation in S, T, and Y sites
  • Measure how often phosphates (a negatively charged moiety) are replaced by negatively charged amino acids: aspartic acid (D) and glutamic acid (E)

Conservation of sites requires conservation of proteins, which in turn requires recognition of human protein orthologs in other species.

SLIDE 5

Orthologs Recognition

  • The most similar protein in the other species is the ortholog
  • A certain threshold of similarity is needed for a protein to count as an ortholog
  • Global sequence alignment is the similarity measure
  • In the figure, protein orthologs are aligned with blue rectangles
  • The number of proteins differs between species

SLIDE 6

Orthologs Recognition

  • The protein databases are big, so ortholog recognition needs to be fast and accurate
  • For each species, build the BLAST database from FASTA sequences

Species_DB <= formatdb -i Species_Seqs -p T -o T

  • -p T marks the input as protein sequences, and -o T creates indices in the results.
  • For each human protein, run a BLAST search on each formatted species database and retrieve the top five candidate proteins

Top_5_Orthologs <= blastp -i Input_Seq -d Species_DB -b 5

  • BLAST is an imperfect database search, so for each candidate protein compute the global alignment based on Needleman–Wunsch.
  • The candidate with the highest percent identity is chosen as the ortholog of the human protein in that species.
  • As a check, the procedure correctly finds the protein itself in the human protein database.
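The candidate-scoring step above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the scoring values (match +1, mismatch -1, gap -1), the definition of percent identity (identical columns divided by alignment length), and the names `needleman_wunsch`, `percent_identity`, and `pick_ortholog` are all assumptions. A real pipeline would more likely use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return the two gapped rows of one optimal alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def percent_identity(row_a, row_b):
    """Identical, non-gap columns as a percentage of the alignment length (assumed definition)."""
    if not row_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(row_a, row_b))
    return 100.0 * same / len(row_a)


def pick_ortholog(human_seq, candidates):
    """Return (name, percent identity) of the best candidate, e.g. among the top five BLAST hits."""
    scored = {name: percent_identity(*needleman_wunsch(human_seq, seq))
              for name, seq in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Searching a sequence against a candidate set that contains the sequence itself should return that entry at 100% identity, which mirrors the self-check described in the last bullet.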

SLIDE 7

Conservation of Phosphosites

  • Phosphosites are analyzed through regions r1, r2, r3 (subsequences) centered at each site (15 residues in our case)
  • This region is well known in biology, and the specificity of kinases and phosphatases is defined using it.
  • Globally aligning human proteins (ph) with species orthologs (ps) automatically aligns the phospho-regions, but with a high probability of gaps.
  • We modified the Needleman–Wunsch global alignment to place gaps outside of the phospho-regions, and also to predict more sites in the ortholog: constrained global alignment (CGA)
  • Some sites (r3) are aligned with amino acids other than S, T, or Y (we don't count those cases in the statistics).
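A small Python sketch of the bookkeeping described above: cutting the 15-residue region around a site and checking which ortholog residue an alignment places opposite it. The function names, the 0-based site index, and the choice to truncate the window at the protein termini are assumptions made for illustration.

```python
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def phospho_region(seq, site, width=15):
    """Window of `width` residues centered on 0-based index `site`.

    Assumption: the window is truncated (not padded) near the termini.
    """
    half = width // 2
    return seq[max(0, site - half): site + half + 1]


def aligned_residue(row_h, row_s, site):
    """Character of the ortholog row aligned to human residue `site`.

    `row_h` and `row_s` are the gapped rows of a pairwise alignment;
    `site` is a 0-based index into the ungapped human sequence.
    """
    pos = -1
    for h, s in zip(row_h, row_s):
        if h != "-":
            pos += 1
            if pos == site:
                return s
    raise IndexError("site lies outside the human sequence")


def is_countable_site(row_h, row_s, site):
    """True if the ortholog carries S, T, or Y at the aligned position."""
    return aligned_residue(row_h, row_s, site) in PHOSPHO_RESIDUES
```

Sites whose aligned residue is a gap or any amino acid other than S, T, or Y return `False` here, matching the cases the slide excludes from the statistics.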

SLIDE 8

Constrained Global Alignment (CGA)
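The slide's figure is not available in this transcript, so the following Python sketch shows only one plausible way to constrain a Needleman–Wunsch alignment so that gaps fall outside the phospho-regions. It is an assumption, not the authors' algorithm: the constraint is enforced by forbidding gap moves at protected human positions, the scoring (match +1, mismatch -1, gap -1) is arbitrary, and the name `constrained_global_align` is hypothetical.

```python
NEG_INF = float("-inf")


def constrained_global_align(h, s, protected, match=1, mismatch=-1, gap=-1):
    """Global alignment of human h and ortholog s with no gaps inside `protected`.

    `protected` is a set of 0-based human positions (the phospho-regions).
    Returns (score, gapped human row, gapped ortholog row).
    """
    n, m = len(h), len(s)
    # score[i][j]: best score for aligning h[:i] with s[:j]
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                sub = match if h[i - 1] == s[j - 1] else mismatch
                options.append((score[i - 1][j - 1] + sub, "diag"))
            # A gap opposite h[i-1] is not allowed for protected residues.
            if i > 0 and (i - 1) not in protected:
                options.append((score[i - 1][j] + gap, "up"))
            # A gap in the human row is not allowed between two protected residues.
            if j > 0 and not ((i - 1) in protected and i in protected):
                options.append((score[i][j - 1] + gap, "left"))
            if options:
                score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    if score[n][m] == NEG_INF:
        raise ValueError("no alignment satisfies the constraint")
    # Trace back along the stored moves.
    row_h, row_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            row_h.append(h[i - 1])
            row_s.append(s[j - 1])
            i -= 1
            j -= 1
        elif step == "up":
            row_h.append(h[i - 1])
            row_s.append("-")
            i -= 1
        else:
            row_h.append("-")
            row_s.append(s[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_h)), "".join(reversed(row_s))
```

With an empty `protected` set this reduces to ordinary global alignment. Protecting a region trades alignment score for a gap-free region, so the constrained score is never higher than the unconstrained one.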

SLIDE 9

Phosphosite Prediction in Human

  • Sites gathered from PhosphoSitePlus, Phospho.ELM, PHOSIDA, and the literature
  • Prediction of over 3,000 human phosphosites by constrained global alignment from 30,000 sites in 3 different species.
  • (T, Y)-sites are more conserved than S-sites.
  • Zero Y-sites in yeast lead to 9 Y-sites in human (i.e., S or T has changed to Y in human)
  • The more similar the species is to human, the more sites are predicted in human.

Prediction          Species                  # Proteins  P-Ser    P-Thr    P-Tyr    Total Sites
Yeast to Human      Yeast                    1,542       8,184    1,855    0        10,039
                    Human                    311         225      126      9        360
                    Ratio (Human/Yeast)      20.17%      2.75%    6.79%    NA       3.59%
Worm to Human       Worm                     696         3,060    440      114      3,614
                    Human                    369         178      82       27       287
                    Ratio (Human/Worm)       53.02%      5.82%    18.64%   23.68%   7.94%
Fruit fly to Human  Fruit Fly                3,956       11,556   3,495    705      15,756
                    Human                    1,676       1,666    917      188      2,771
                    Ratio (Human/Fruit Fly)  42.37%      14.42%   26.24%   26.67%   17.59%
Total Predicted     Human                    2,356       2,069    1,125    224      3,418
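The ratio rows and the totals in the table can be recomputed from the counts. The short Python check below uses the table's numbers. The yeast P-Tyr count is blank on the slide and is taken as 0 here, following the "zero Y-sites in yeast" bullet. The variable and function names are illustrative.

```python
# Column order: proteins, P-Ser, P-Thr, P-Tyr, total sites.
COUNTS = {
    "Yeast": {"species": (1542, 8184, 1855, 0, 10039),   # P-Tyr assumed 0
              "human": (311, 225, 126, 9, 360)},
    "Worm": {"species": (696, 3060, 440, 114, 3614),
             "human": (369, 178, 82, 27, 287)},
    "Fruit Fly": {"species": (3956, 11556, 3495, 705, 15756),
                  "human": (1676, 1666, 917, 188, 2771)},
}


def ratios(species_row, human_row):
    """Human/species percentages per column; None where the species count is 0."""
    return [round(100 * h / s, 2) if s else None
            for s, h in zip(species_row, human_row)]


def total_predicted(counts):
    """Column-wise sum of the predicted human rows."""
    return tuple(sum(col) for col in zip(*(c["human"] for c in counts.values())))


for name, rows in COUNTS.items():
    print(name, ratios(rows["species"], rows["human"]))
print("Total predicted human:", total_predicted(COUNTS))
```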

SLIDE 10

Phosphosite Prediction in Species

Using 90K experimentally confirmed phosphosites in human

Prediction of about 620K sites in 19 species

Availability

www.phosphonet.ca includes the exact proteins and site information

The more distant the species is from human, the higher the Thr/Ser ratio

#   Species                 P-Ser     P-Thr     P-Tyr     Thr/Ser   All
    Human                   53,478    16,971    18,849    32%       89,298
1   Mouse                   45,096    14,344    16,598    32%       76,038
2   Dog                     42,479    13,605    15,830    32%       71,914
3   Chimpanzee              41,471    14,030    15,227    34%       70,728
4   Rhesus macaque          40,163    13,228    14,735    33%       68,126
5   Rat                     39,733    13,437    14,672    34%       67,842
6   Chicken                 30,333    11,233    12,566    37%       54,132
7   Brachydanio rerio       26,669    11,045    11,050    41%       48,764
8   Duckbill platypus       24,467    9,035     10,023    37%       43,525
9   African clawed frog     19,780    8,617     8,911     44%       37,308
10  Fruit fly               9,665     5,878     4,698     61%       20,241
11  Purple sea urchin       8,156     4,709     3,489     58%       16,354
12  Honeybee                6,766     4,219     3,440     62%       14,425
13  Nematode worm           5,364     3,390     2,846     63%       11,600
14  Baker's yeast           3,135     2,223     1,661     71%       7,019
15  Mouse-ear cress         3,070     1,752     1,444     57%       6,266
16  Red bread mold          791       671       557       85%       2,019
17  Maize                   693       419       488       60%       1,600
18  Western balsam poplar   747       430       371       58%       1,548
19  Tammar wallaby          38        31        23        82%       92
    Total Predicted Sites   348,616   132,296   138,629   NA        619,541
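The Thr/Ser column is the P-Thr count divided by the P-Ser count, expressed as a percentage. A brief Python check on four rows of the table (the selection of rows and the names are illustrative):

```python
# (P-Ser, P-Thr) counts copied from the table for a few species.
SITES = {
    "Human": (53478, 16971),
    "Mouse": (45096, 14344),
    "Fruit fly": (9665, 5878),
    "Baker's yeast": (3135, 2223),
}


def thr_ser_percent(p_ser, p_thr):
    """P-Thr sites per 100 P-Ser sites, rounded to a whole percent."""
    return round(100 * p_thr / p_ser)


for species, (p_ser, p_thr) in SITES.items():
    print(f"{species}: {thr_ser_percent(p_ser, p_thr)}%")
```

On these four rows the ratio rises from 32% in human and mouse to 61% in fruit fly and 71% in baker's yeast.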

SLIDE 11

Human Phosphosites Scores

Avg Activation score

  • Checks whether the negatively charged phosphate (PO4 3-) is replaced by aspartic acid (D) or glutamic acid (E) in other species to keep the functionality.

Avg Conservation score

  • Identity conservation
  • Similarity conservation
  • Divide by the number of found phospho-regions (fewer than 20)
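The exact scoring formulas are not preserved in this transcript, so the Python sketch below is only an approximation under stated assumptions. Identity conservation is taken as the percent of identical positions between the human region and each ortholog region, averaged over the species where a region was found. The D/E check is reduced to the fraction of found regions whose central residue is D or E, which is not the slide's activation score. All function names are hypothetical.

```python
def region_identity(a, b):
    """Percent of positions with identical residues in two equal-length regions."""
    if len(a) != len(b):
        raise ValueError("regions must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def avg_conservation(human_region, ortholog_regions):
    """Sum of per-species identities divided by the number of found regions.

    `ortholog_regions` holds one entry per species; None means no region was found.
    """
    found = [r for r in ortholog_regions if r is not None]
    if not found:
        return 0.0
    return sum(region_identity(human_region, r) for r in found) / len(found)


def acidic_replacement_fraction(human_region, ortholog_regions):
    """Fraction of found regions with D or E at the central (phosphosite) position."""
    center = len(human_region) // 2
    found = [r for r in ortholog_regions if r is not None]
    if not found:
        return 0.0
    return sum(r[center] in ("D", "E") for r in found) / len(found)


human = "AAAAAAASAAAAAAA"                       # toy 15-residue region, S at the center
orthologs = ["AAAAAAASAAAAAAA", "AAAAAAADAAAAAAA", None, "CCCCCCCTCCCCCCC"]
print(avg_conservation(human, orthologs))
print(acidic_replacement_fraction(human, orthologs))
```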

SLIDE 12

Amino Acids Similarity

  • To compute the percent similarity of phospho-regions, the following graph is suggested by experience.
  • An edge between two amino acids means they are similar
  • This differs from the BLOSUM matrix, which is for conservation
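The graph itself is not reproduced in this transcript. The Python sketch below shows how such a similarity graph could be used to compute percent similarity between two regions. The edge list is hypothetical, chosen only for illustration, and is not the graph from the slide.

```python
# Hypothetical similarity edges; the slide's actual graph is not available.
SIMILAR_PAIRS = [("S", "T"), ("D", "E"), ("K", "R")]


def build_graph(pairs):
    """Undirected adjacency map: amino acid -> set of similar amino acids."""
    graph = {}
    for x, y in pairs:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)
    return graph


def percent_similarity(a, b, graph):
    """Percent of positions that are identical or joined by an edge in the graph."""
    if len(a) != len(b):
        raise ValueError("regions must have equal length")
    hits = sum(x == y or y in graph.get(x, ()) for x, y in zip(a, b))
    return 100.0 * hits / len(a)


graph = build_graph(SIMILAR_PAIRS)
print(percent_similarity("ASKD", "AYKE", graph))
```

Unlike a substitution matrix, the graph gives a yes/no answer per position, so the result is a plain percentage that can be averaged across species.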

SLIDE 13

Conclusion Results

                                   P-Ser      P-Thr      P-Tyr      Total
All P-Sites (Total: 89,298)
  Avg Activation                   0.00390    0.00099    -0.00048   0.00242
  Avg Conservation                 25.62      27.33      30.52      26.98
Functional P-Sites (Total: 769)
  Avg Activation - Activating      0.006557   0.006543   0.018085   0.009709
  Avg Conservation - Activating    35.81      39.36      34.57      36.59
  Avg Activation - Inhibitory      9.26E-05   -0.00634   0.025484   0.003
  Avg Conservation - Inhibitory    30.56      34.34      33.34      31.90
Functional Kinase P-Sites (Total: 183)
  Avg Activation - Activating      0.006025   0.008227   0.016565   0.009931
  Avg Conservation - Activating    37.48      40.51      34.86      37.67
  Avg Activation - Inhibitory      -0.00357   0.002857   0.025      0.005179
  Avg Conservation - Inhibitory    30.29      36.01      33.30      32.47
Kinase P-Sites (Total: 7,121)
  Avg Activation                   0.000331   0.000408   0.003833   0.001276
  Avg Conservation                 27.62      32.34      33.78      30.33

Conservation Scores (similarity conservation is used)

  • Phospho-Thr sites are more conserved than Ser and Tyr sites.
  • The Phospho-Thr/Phospho-Ser ratio increases in species more distant from human
  • Kinase sites are more conserved than a random site in a substrate
  • Functional activatory sites are more conserved.

Activation Scores

  • Activatory sites have a higher average activation score than inhibitory sites, as we expected.

SLIDE 14

Acknowledgement

CRD grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the MITACS Accelerate Internship Program

Kinexus Bioinformatics Corporation, for data preparation

SLIDE 15

Questions
