Protein Interaction Prediction: The PIPE and InSiPS Projects



SLIDE 1

Frank Dehne ■ www.dehne.net

Frank Dehne

School of Computer Science Centre For Advanced Studies Canada

Protein Interaction Prediction: The PIPE and InSiPS Projects

SLIDE 2

Parallel Computational Biochemistry

Protein-Protein Interactions

SLIDE 3

The PIPE Project (Start: 2003)

  • Computer Science

– F. Dehne

  • Biochemistry

– A. Golshani, A. Wong, J. Greenblatt (Toronto)

  • Biomedical Engineering

– J. Green

  • Graduate Students (CS)

– S. Pitre, C. North, A. Amos-Binks, A. Schoenrock, ...

  • Graduate Students (Biochemistry)

– B. Samanfar, M. Hooshyar, M. Alamgir, K. Omidi, D. Burnside, ...

Multi-Disciplinary Team

SLIDE 4

Proteins

SLIDE 5

Proteins

Primary Sequence: V H L T P E E K ...

3D Structure: (figure)

SLIDE 6

Protein-Protein Interactions (PPIs)

SLIDE 7

Protein-Protein Interaction Networks

Partial Arabidopsis PPI Network

SLIDE 8

PPI-Enabled Cell Processes

  • S. cerevisiae (Yeast)

SLIDE 9

Tandem Affinity Purification (TAP)

Do YGL227W and YMR135C interact?

SLIDE 10

Experimental Data

Y2H (yeast two-hybrid), TAP tag

  • S. cerevisiae (Yeast)

SLIDE 11

Experimental Data

Species        # proteins   # protein pairs   # known interactions   # unknown interactions
S. cerevisiae       6,300        19,867,056                 15,151   ???
C. elegans         23,684       280,454,086                  6,607   ???
H. sapiens         22,513       253,406,328                 41,678   ???
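The pair counts are the number of unordered pairs of distinct proteins, n(n − 1)/2. The quick check below (the helper names `unordered_pairs` and `density` are ours, not from the slides) reproduces the C. elegans and H. sapiens figures exactly; for yeast, 19,867,056 corresponds to 6,304 proteins, so the 6,300 in the table appears to be rounded.

```python
def unordered_pairs(n_proteins: int) -> int:
    """Number of distinct unordered protein pairs {A, B} with A != B."""
    return n_proteins * (n_proteins - 1) // 2


def density(known_interactions: int, n_proteins: int) -> float:
    """Fraction of all possible pairs that are known interactions."""
    return known_interactions / unordered_pairs(n_proteins)


# Figures from the table; yeast uses 6,304 proteins (the table rounds to 6,300).
species = {
    "S. cerevisiae": (6_304, 15_151),
    "C. elegans": (23_684, 6_607),
    "H. sapiens": (22_513, 41_678),
}
for name, (n, known) in species.items():
    print(f"{name}: {unordered_pairs(n):,} pairs, density {density(known, n):.4%}")
```

All three densities come out well below 0.1%, which is the sparsity the Challenges slide refers to.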

SLIDE 12

PPI Prediction

  • Can we detect PPIs based on primary sequence only?
  • Advantages:

– No 3D structure information needed (the PDB is small).

– Can be applied to all proteins, even those without a known 3D structure.

– Can be applied to all genomes, even newly sequenced ones.

SLIDE 13

Basic PIPE Algorithm

String comparison

Match: two fragments match if the sum of their pairwise (position-by-position) PAM values exceeds a threshold.
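The match rule compares two equal-length fragments position by position. A minimal sketch, assuming ungapped windows; `toy_score` and the threshold in the usage are hypothetical placeholders, not PIPE's actual PAM120 values or parameters.

```python
from typing import Callable


def window_score(a: str, b: str, score: Callable[[str, str], int]) -> int:
    """Sum of pairwise substitution scores of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return sum(score(x, y) for x, y in zip(a, b))


def is_match(a: str, b: str, score: Callable[[str, str], int], threshold: int) -> bool:
    """Match = (sum of pairwise scores > threshold)."""
    return window_score(a, b, score) > threshold


def toy_score(x: str, y: str) -> int:
    # Hypothetical stand-in for a PAM lookup: +5 for identity, -1 otherwise.
    return 5 if x == y else -1


print(is_match("VHLTPEEK", "VHLTPEEK", toy_score, threshold=30))  # True (score 40)
print(is_match("VHLTPEEK", "AHLTAEEK", toy_score, threshold=30))  # False (score 28)
```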

SLIDE 14

PIPE Output

Positive

SLIDE 15

PIPE Output

Negative
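The positive and negative outputs appear only as figures. Following the idea in the title of the 2006 PIPE paper (re-occurring short polypeptide sequences between known interacting pairs), the sketch below builds a window-by-window count matrix for a query pair (A, B): cell (i, j) counts known interacting pairs (X, Y) in which window i of A matches some window of X and window j of B matches some window of Y. The identity-count similarity and the peak-versus-cutoff decision in `predict` are simplifying assumptions, not PIPE's published parameters.

```python
from typing import List, Tuple


def windows(seq: str, w: int) -> List[str]:
    """All length-w fragments of seq (sliding window, step 1)."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def matches_any(frag: str, protein: str, w: int, threshold: int) -> bool:
    """Hypothetical toy similarity: identical positions > threshold (stand-in for PAM scores)."""
    return any(sum(x == y for x, y in zip(frag, other)) > threshold
               for other in windows(protein, w))


def pipe_matrix(a: str, b: str, known_pairs: List[Tuple[str, str]],
                w: int, threshold: int) -> List[List[int]]:
    """Cell (i, j) counts known pairs supporting window i of a with window j of b."""
    wa, wb = windows(a, w), windows(b, w)
    m = [[0] * len(wb) for _ in wa]
    for x, y in known_pairs:
        cells = set()
        # Known interactions are unordered, so test both orientations.
        for p, q in ((x, y), (y, x)):
            rows = [i for i, f in enumerate(wa) if matches_any(f, p, w, threshold)]
            cols = [j for j, f in enumerate(wb) if matches_any(f, q, w, threshold)]
            cells.update((i, j) for i in rows for j in cols)
        for i, j in cells:  # each known pair adds at most 1 per cell
            m[i][j] += 1
    return m


def predict(m: List[List[int]], cutoff: int) -> bool:
    """Hypothetical decision rule: positive if the peak count reaches the cutoff."""
    return max((max(row) for row in m if row), default=0) >= cutoff


known = [("AAAAKKKK", "MMMMTTTT"), ("QAAAA", "TTTTR")]
m = pipe_matrix("AAAACCCC", "GGGGTTTT", known, w=4, threshold=3)
print(predict(m, cutoff=2))  # True
```

A sharp peak in the matrix corresponds to a "positive" output, a flat matrix to a "negative" one.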

SLIDE 16

PIPE: Detecting Novel Protein-Protein Interactions

Yeast: YGL227W - YMR135C

SLIDE 17

PIPE: Detecting Novel Protein-Protein Interactions

Banting and Best Institute of Medical Research, Toronto

Yeast: YGL227W - YMR135C Experimental Verification

SLIDE 18

PIPE: Detecting Novel Protein-Protein Interactions

Banting and Best Institute of Medical Research, Toronto

Yeast: YGL227W - YMR135C

SLIDE 19

PIPE: Detecting Novel Protein-Protein Interactions

Banting and Best Institute of Medical Research, Toronto

Protein complex: YGL227W, YMR135C, YIL017C, YDL176W, YIL097W, YDR255C, YBR105C

Yeast: YGL227W - YMR135C

SLIDE 20

PIPE: Elucidating the Architecture of Protein Complexes

  • S. cerevisiae

SLIDE 21

Global Scan of Entire Protein Interaction Networks

Species        # proteins   # protein pairs   # known interactions   # unknown interactions
S. cerevisiae       6,300        19,867,056                 15,151   ???
C. elegans         23,684       280,454,086                  6,607   ???
H. sapiens         22,513       253,406,328                 41,678   ???

SLIDE 22

Challenges

  • Large number of protein pairs

– Requires innovative data structures for approximate string matching (a Hamming-style distance weighted by the PAM matrix).

– Requires high-performance computing.

  • Small number of true positives (very sparse, ~0.1% density)

– Requires very high specificity, ~99.95% (i.e., a false positive rate below 0.05%).

– Otherwise: # false positives > # true positives.
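A short calculation makes the specificity requirement concrete. Assuming a density of 0.1% and, hypothetically, a sensitivity of 50% (the slide does not give one), a specificity of 99.95% already yields about as many false positives as true positives, and 99.9% yields roughly twice as many.

```python
from typing import Tuple


def expected_counts(n_pairs: int, density: float, sensitivity: float,
                    specificity: float) -> Tuple[float, float]:
    """Expected (true positives, false positives) when scanning n_pairs protein pairs."""
    positives = n_pairs * density       # truly interacting pairs
    negatives = n_pairs - positives     # non-interacting pairs
    true_pos = sensitivity * positives
    false_pos = (1.0 - specificity) * negatives
    return true_pos, false_pos


# 0.1% density and a hypothetical 50% sensitivity over 1,000,000 pairs.
for spec in (0.999, 0.9995, 0.9999):
    tp, fp = expected_counts(1_000_000, 0.001, 0.5, spec)
    print(f"specificity {spec:.2%}: ~{tp:.0f} true vs ~{fp:.0f} false positives")
```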

SLIDE 23

Challenges

  • False positives created by “popular” motifs that are not related to protein interaction.
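A fragment that occurs in many unrelated proteins will match windows in many known interacting pairs, inflating the counts whether or not it mediates binding. One hypothetical mitigation, shown only for illustration and not taken from the slides, is to count how many distinct proteins contain each window and flag windows above a cutoff. Exact matching is used here for brevity; a PAM-based similarity would replace it in practice.

```python
from collections import defaultdict
from typing import Dict, Set


def window_protein_counts(proteins: Dict[str, str], w: int) -> Dict[str, int]:
    """For each exact length-w fragment, count the distinct proteins that contain it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - w + 1):
            seen[seq[i:i + w]].add(name)
    return {frag: len(names) for frag, names in seen.items()}


def popular_windows(proteins: Dict[str, str], w: int, max_proteins: int) -> Set[str]:
    """Fragments occurring in more than max_proteins proteins (hypothetical cutoff)."""
    counts = window_protein_counts(proteins, w)
    return {frag for frag, c in counts.items() if c > max_proteins}


proteome = {"p1": "GGGGAKLM", "p2": "QGGGGR", "p3": "GGGGW", "p4": "ACDE"}
print(popular_windows(proteome, 4, max_proteins=2))  # {'GGGG'}
```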

SLIDE 24

PIPE's Prediction Accuracy

SLIDE 25

BMC Bioinformatics: Comparison Study

Figure: comparison on Yeast and Human data sets (labels: PIPE; Consensus (incl. PIPE); 2nd)

SLIDE 26

PIPE's Performance

PIPE Sequential Performance Improvements:

  • The character-based amino acid representation was converted into binary encodings that eliminated lookups in PAM120.
  • The “sliding window” process was improved to use incremental updates.
  • Fast similarity search: all possible protein fragment comparisons were pre-computed, and all matches of similar fragments were stored in a hash table.
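The three sequential optimizations can be sketched together under stated assumptions: amino acids are encoded as small integers so a substitution score becomes a 2-D array index; the sliding window along one diagonal is updated in O(1) per step by adding the entering pair and subtracting the leaving one; and a dictionary maps each fragment position to the positions of similar fragments. The toy score table and all function names are hypothetical; PIPE itself used PAM120 and its own encodings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i for i, aa in enumerate(ALPHABET)}  # amino acid -> small integer
# Hypothetical 20x20 score table (+5 identity, -1 otherwise); PIPE used PAM120.
SCORE = [[5 if i == j else -1 for j in range(20)] for i in range(20)]


def encode(seq: str) -> List[int]:
    """Integer encoding: a substitution score becomes SCORE[x][y], no character lookup."""
    return [CODE[aa] for aa in seq]


def diagonal_window_scores(a: List[int], b: List[int], w: int) -> List[int]:
    """Scores of a[k:k+w] vs b[k:k+w] for every offset k along one diagonal.

    Each score is updated in O(1): add the pair entering the window and
    subtract the pair leaving it. Pass shifted slices (a[i:], b[j:]) to
    scan other diagonals.
    """
    n = min(len(a), len(b))
    if n < w:
        return []
    s = sum(SCORE[a[k]][b[k]] for k in range(w))
    scores = [s]
    for k in range(1, n - w + 1):
        s += SCORE[a[k + w - 1]][b[k + w - 1]] - SCORE[a[k - 1]][b[k - 1]]
        scores.append(s)
    return scores


def build_similarity_index(proteins: Dict[str, str], w: int, threshold: int
                           ) -> Dict[Tuple[str, int], List[Tuple[str, int]]]:
    """Pre-compute all fragment comparisons once; map (protein, offset) to similar fragments."""
    frags = [(name, i, encode(seq[i:i + w]))
             for name, seq in proteins.items()
             for i in range(len(seq) - w + 1)]
    index: Dict[Tuple[str, int], List[Tuple[str, int]]] = defaultdict(list)
    for name, i, f in frags:
        for other, j, g in frags:
            if sum(SCORE[x][y] for x, y in zip(f, g)) > threshold:
                index[(name, i)].append((other, j))
    return dict(index)
```

The index is built once per proteome, so later queries look matches up instead of recomputing them; the quadratic build loop here favors clarity over speed.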

SLIDE 27

Large-Scale Parallelization: MP-PIPE

H. sapiens protein pairs

Architecture:

  • Cluster of multi-core processors
  • One MP-PIPE worker per processor
  • Each worker runs multiple threads
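The slide does not show how MP-PIPE schedules protein pairs. The sketch below assumes a simple static round-robin split of all n(n − 1)/2 pairs over every (worker, thread) slot, so each pair is evaluated exactly once; `assign_pairs` is a hypothetical name, and a real deployment would obtain the worker index from its process launcher.

```python
from itertools import combinations
from typing import Iterator, Tuple


def assign_pairs(n_proteins: int, n_workers: int, worker: int,
                 n_threads: int, thread: int) -> Iterator[Tuple[int, int]]:
    """Yield the protein pairs (i, j), i < j, handled by one thread of one worker."""
    slot = worker * n_threads + thread   # global index of this (worker, thread)
    total_slots = n_workers * n_threads
    for k, pair in enumerate(combinations(range(n_proteins), 2)):
        if k % total_slots == slot:      # round-robin over all slots
            yield pair


print(list(assign_pairs(5, 2, 0, 2, 1)))  # [(0, 2), (1, 3), (3, 4)]
```

A static split keeps the sketch simple; with uneven per-pair costs, dynamic work distribution would balance load better.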

SLIDE 28

Global Scan of Entire Protein Interaction Networks

Species        # proteins   # protein pairs   # known interactions   # novel PIPE predictions*
S. cerevisiae       6,300        19,867,056                 15,151                      14,438
C. elegans         23,684       280,454,086                  6,607                      32,548
H. sapiens         22,513       253,406,328                 41,678                     130,470

* False positive rate: 0.0001

Running time (1,000 processor cores): 1 hour, 1 week, 3 months

MP-PIPE's superior performance and prediction accuracy enabled the first-ever complete scan of entire protein interaction networks.

SLIDE 29

H. sapiens: dsDNA Break Repair

  • Blue: proteins known to be involved in dsDNA break repair
  • Green: known interactions
  • Red: novel interactions discovered by PIPE
  • Yellow: novel proteins likely involved in dsDNA break repair

SLIDE 30

A computational tool that can synthesize proteins with specific protein-protein interaction prediction profiles.

InSiPS: The In Silico Protein Synthesizer

SLIDE 31

The In Silico Protein Synthesizer (InSiPS)

  • Given

– a set of target proteins and
– a set of non-target proteins,

  • design a protein (sequence) that is

– predicted to interact with the target proteins and
– predicted not to interact with the non-targets.

SLIDE 32

Drugs Based On PPI Inhibitors

SLIDE 33

Drugs Based On PPI Inhibitors

SLIDE 34

Fragment Based Screening

SLIDE 35

InSiPS: Synthetic Proteins As PPI Inhibitors

  • More “druggable targets”
  • Can attach to larger, “flat” interaction regions that smaller compounds cannot recognize
  • Natural compounds can have side effects

Figure labels: target, pathway intercept, no side effects

SLIDE 36

InSiPS: Algorithm

SLIDE 37

Performance on Blue Gene/Q

Figure axis (both panels): #Nodes (16 cores per node)

Setup: population size of 1,500 sequences; 1 target; 250 non-targets.
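The algorithm slide is a figure, but a population of 1,500 sequences indicates that InSiPS evolves a population of candidate sequences. The sketch below is a generic genetic algorithm under that assumption. `toy_score` is a hypothetical stand-in for PIPE's interaction score (shared 3-mers); the fitness, target score × (1 − max non-target score), matches the numbers reported on the verification slide; tournament selection, single-point crossover, point mutation, elitism and the small default sizes are our choices, not necessarily those of InSiPS.

```python
import random
from typing import List

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(candidate: str, protein: str, k: int = 3) -> float:
    """Hypothetical stand-in for PIPE: fraction of the protein's k-mers present in the candidate."""
    kmers = {protein[i:i + k] for i in range(len(protein) - k + 1)}
    if not kmers:
        return 0.0
    cand = {candidate[i:i + k] for i in range(len(candidate) - k + 1)}
    return len(kmers & cand) / len(kmers)


def fitness(seq: str, targets: List[str], non_targets: List[str]) -> float:
    """Reward the weakest target score; penalize the strongest non-target score."""
    target = min(toy_score(seq, t) for t in targets)
    worst = max((toy_score(seq, n) for n in non_targets), default=0.0)
    return target * (1.0 - worst)


def evolve(targets: List[str], non_targets: List[str], length: int = 60,
           pop_size: int = 50, generations: int = 40,
           mutation_rate: float = 0.05, seed: int = 0) -> str:
    rng = random.Random(seed)

    def fit(s: str) -> float:
        return fitness(s, targets, non_targets)

    pop = ["".join(rng.choice(AMINO) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = ranked[:2]                          # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fit)   # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fit)
            cut = rng.randrange(1, length)             # single-point crossover
            child = "".join(c if rng.random() >= mutation_rate else rng.choice(AMINO)
                            for c in p1[:cut] + p2[cut:])  # point mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)


targets, non_targets = ["MKTAYIAKQRQISFVKSHFSRQ"], ["GSSGSSGSSGSSGSS"]
best = evolve(targets, non_targets)
print(best, fitness(best, targets, non_targets))
```

Because the two best sequences always survive, the best fitness never decreases from one generation to the next.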

SLIDE 38

Parameter Tuning Limitations

Parameters:

  • Can InSiPS always find a PPI inhibitor for any combination of target / non-target proteins?
  • No! (It may not even be biochemically possible.)

SLIDE 39

InSiPS: Limitations

“Good” Cases “Bad” Cases

SLIDE 40

InSiPS: Experimental Verification

  • Task: Design a protein that attaches to a yeast protein involved in DNA repair, thereby blocking its function.
  • Target yeast protein: YAL017W (PSK1), involved in DNA repair
  • Non-targets: all other yeast proteins (~6,000)
  • InSiPS-generated protein “Anti-PSK1”:

HHHHHHSDNEHLHKCQRLKTRWKMARQFSDPQHNMYWIINWAQAMNIHADQNQEEEEELHDASVNNAEQYMAQCAPEEACQYPVRRSYGLHATNCIERRKCCMIMYQHPTCRQWEAKNTCAISRAGKGVYWKGIIFMRAWKHWCTRRLVQ

  • Fitness: 0.465163
  • Target score: 0.71832232
  • Max non-target score: 0.35243136 (YLL039C)
  • Avg non-target score: 0.0720702297

Blue Gene/Q

InSiPS
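The reported numbers are consistent with a fitness equal to the target score multiplied by one minus the maximum non-target score; the average non-target score does not enter. This relationship is inferred from the values on the slide rather than stated there, and `insips_fitness` is a hypothetical name.

```python
def insips_fitness(target_score: float, max_non_target_score: float) -> float:
    """Fitness inferred from the slide: reward target binding, penalize the worst off-target."""
    return target_score * (1.0 - max_non_target_score)


print(round(insips_fitness(0.71832232, 0.35243136), 6))  # 0.465163
```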


SLIDE 42

InSiPS: Experimental Verification

Figure: DNA → mRNA → protein diagrams under UV light for three cases, labeled PSK1, Deletion, and Anti-PSK1.

SLIDE 43

InSiPS: Experimental Verification

Figure rows: WT; WT (empty vector); WT + Anti-PSK1 expressed; PSK1 knockout

Expression of Anti-PSK1 causes sensitivity to UV light. Equal numbers of cells were serially diluted (decreasing cell density) and exposed to 30 s of UV light.

SLIDE 44

Current Project: Muscular Dystrophy

With Alex Blais, Ottawa General Hospital

SLIDE 45

Muscular Dystrophy

With Alex Blais, Ottawa General Hospital

SLIDE 46

Muscular Dystrophy

Healthy donor Dystrophic patient

  • 1. Muscle biopsy from healthy donor
  • 2. Satellite cell isolation
  • 3. In vitro expansion
  • 4. Transplantation into patient

Stem Cell Therapy

With Alex Blais, Ottawa General Hospital

SLIDE 47

Muscular Dystrophy

  • Six1 transcription factor
  • Eya family of Six co-factors

Problem: Immediate fusion of satellite cells

With Alex Blais, Ottawa General Hospital

SLIDE 48

Muscular Dystrophy

  • Six1 transcription factor
  • Eya family of Six co-factors

Problem: Immediate fusion of satellite cells

Research Questions:

  • Can we disrupt the interaction between Six1 and Eya?
  • Does Six1 have other co-factors it interacts with directly? Can we disrupt their interaction too?

With Alex Blais, Ottawa General Hospital

SLIDE 49

Publications

  • "Engineering inhibitory proteins with InSiPS: The in-silico protein synthesizer", Proc. Supercomputing (SC'15), ACM Digital Library, 2015 (to appear).
  • "Efficient prediction of human protein-protein interactions at a global scale", BMC Bioinformatics, vol. 15, art. 383, 2014.
  • "Short co-occurring polypeptide regions can predict global protein interaction maps", Scientific Reports (Nature.com/srep), vol. 2, art. 239, 2012.
  • "Binding site prediction for protein-protein interactions and novel motif discovery using re-occurring polypeptide sequences", BMC Bioinformatics, vol. 12, art. 225, 2011.
  • "Global investigation of protein–protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences", Nucleic Acids Research, vol. 36, pp. 4286–4294, 2008.
  • "PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs", BMC Bioinformatics, vol. 7, art. 365 (15 pages), 2006.