Continuous global optimization for protein structure analysis



SLIDE 1

Introduction Continuous Optimization Numerical Results Conclusions

Continuous global optimization for protein structure analysis

P. Bertolazzi¹, C. Guerra², F. Lampariello¹, G. Liuzzi¹

PR PS BB 2011, 13/09/2011

¹ IASI - Consiglio Nazionale delle Ricerche

² DEI - Università di Padova

SLIDE 2

Outline

1. Introduction

2. Continuous Optimization

3. Numerical Results

4. Conclusions

SLIDE 3

Problem description

Given two patches taken from two protein surfaces, find the isometric transformation (roto-translation) that best overlaps one patch onto the other.

SLIDE 4

Motivations

Binding pockets or cavities of similar shape are likely to bind the same ligand.

Surface alignment is useful in determining whether a portion of a target protein is similar to the active site of a known (model) protein.

If so, the target protein is likely to bind the same ligand as the model one, and thus to have similar functional properties.

[Figure: proteins 1gol and 1csn. Ligand: ATP (adenosine triphosphate)]

SLIDE 5

Approaches

In computer vision and computer graphics the problem is also known as Surface Registration.

Geometric Hashing [Lamdan & Wolfson IEEE CV ’88]: preprocessing to build a hash table (time consuming); recognition based on a voting process; finds the most similar image among a set of reference images.

Iterative Closest Point (ICP) [Besl & McKay IEEE PAMI ’92]: no preprocessing needed; fast but often yields poor alignments; outcome depends on the initial guess.

Shape Contexts [Belongie et al. IEEE PAMI ’02]: preprocessing to build shape contexts (time consuming); recognition based on a correlation process.

SLIDE 10

Iterative Closest Point

Assumes that closest points correspond to each other.

Optimizes to reduce the overall error.

Good for registering surfaces whose shapes are very similar to each other.

The final result depends heavily on the initial relative position of the surfaces: convergence to local minima.

Poor results for protein surfaces: ICP is attracted by local minima.
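
The ICP loop described above can be sketched on a toy problem. The code below is a simplified 2D version (the slides concern 3D surfaces), written only to show the alternation between nearest-point matching and a least-squares rigid fit; the function names `nearest`, `best_rigid_2d` and `icp_2d` are hypothetical and this is not the implementation used by the authors.

```python
import math

def nearest(p, points):
    # Closest point of `points` to p (squared Euclidean distance).
    return min(points, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def best_rigid_2d(src, dst):
    # Closed-form least-squares rotation + translation for paired 2D points.
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        num += ax * by - ay * bx
        den += ax * bx + ay * by
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return c, s, tx, ty

def icp_2d(source, target, iterations=20):
    current = list(source)
    for _ in range(iterations):
        # Step 1: assume each point corresponds to its closest target point.
        matches = [nearest(p, target) for p in current]
        # Step 2: rigid transform minimizing the error for these pairs.
        c, s, tx, ty = best_rigid_2d(current, matches)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
    matches = [nearest(p, target) for p in current]
    error = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                for p, q in zip(current, matches))
    return current, error / len(current)
```

When the initial misplacement is small, the closest-point assumption yields the right pairs and the loop converges quickly. With a poor starting position the same loop settles on whatever pairing the first nearest-point match suggests, which is the local-minimum behaviour noted on the slide.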

SLIDE 12

Continuous Global Optimization

The problem can be (mathematically) formulated as

$$\min_{u = (x, y, z, \alpha, \beta, \gamma)} f(u) \qquad \text{(P)}$$

where f(u) is a so-called distance function.

The global minimizer(s) $u^\star = (x, y, z, \alpha, \beta, \gamma)^\star$ of Problem (P) give the isometric transformation(s) that make the two surfaces best overlap.
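
As a rough illustration of Problem (P), the sketch below applies a roto-translation u = (x, y, z, α, β, γ) to a set of 3D points and evaluates one possible distance function. The slides specify neither the angle convention nor the form of f, so the Z-Y-X rotation order and the mean squared nearest-neighbour distance used here are assumptions, and the names `rotation_matrix`, `transform` and `distance` are hypothetical.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    # Assumed convention: R = Rz(gamma) * Ry(beta) * Rx(alpha).
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]

def transform(u, points):
    # Apply p -> R p + t with u = (x, y, z, alpha, beta, gamma).
    x, y, z, alpha, beta, gamma = u
    r = rotation_matrix(alpha, beta, gamma)
    t = (x, y, z)
    return [
        tuple(sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def distance(u, moving, fixed):
    # Hypothetical f(u): mean squared distance from each moved point
    # to its nearest neighbour on the fixed patch.
    moved = transform(u, moving)
    total = 0.0
    for p in moved:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in fixed)
    return total / len(moved)
```

A function of this kind is cheap to evaluate but has no readily available derivatives, because the nearest-neighbour assignment changes abruptly as u varies.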

SLIDE 13

Problem Properties

Problem (P) has the following distinguishing properties:

presence of many local minima besides the global ones;

first derivatives of f are unavailable.

Hence we use a Derivative-Free Controlled Random Search (DF-CRS) global optimization method.

SLIDE 14

ACRS

The method maintains a population of candidate solutions throughout the entire process. It mainly consists of two phases:

an initial global random search phase: generation of the initial population;

an iterative local refinement phase: progressive update of the population.

SLIDE 15

ACRS

Let $\epsilon > 0$ be a given tolerance, $N = 6$, $p = 50N$.

Global phase: randomly generate the set $S_0 = \{u_0^1, \dots, u_0^p\}$. Set $k = 0$.

Do While $f_k^{\max} - f_k^{\min} > \epsilon$

    Local phase: generate a new point and update the set $S_k$.

    Set $k = k + 1$.

End Do

where

$u_k^{\max} = \arg\max_{u \in S_k} f(u)$, $f_k^{\max} = f(u_k^{\max})$,

$u_k^{\min} = \arg\min_{u \in S_k} f(u)$, $f_k^{\min} = f(u_k^{\min})$.
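
The loop above can be sketched as follows. The slide does not say how the local phase generates the new point, so this sketch assumes the basic controlled random search rule of Price (reflect one random point through the centroid of N others, and replace the worst point if the trial is better). The name `crs_minimize`, the box `bounds` and the `max_iter` safeguard are also assumptions and are not taken from the source.

```python
import random

def crs_minimize(f, bounds, eps=1e-6, max_iter=20000, seed=0):
    rng = random.Random(seed)
    n = len(bounds)
    p = 50 * n  # population size, p = 50N as stated on the slide

    # Global phase: uniform random sampling of the initial set S_0.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(p)]
    vals = [f(u) for u in pop]

    k = 0
    # Stop when f_max - f_min <= eps (max_iter is an added safeguard).
    while max(vals) - min(vals) > eps and k < max_iter:
        # Local phase (assumed rule): reflect one random point through
        # the centroid of n other random points.
        idx = rng.sample(range(p), n + 1)
        centroid = [sum(pop[i][j] for i in idx[:n]) / n for j in range(n)]
        trial = [2.0 * centroid[j] - pop[idx[n]][j] for j in range(n)]
        inside = all(lo <= t <= hi for t, (lo, hi) in zip(trial, bounds))
        if inside:
            f_trial = f(trial)
            worst = max(range(p), key=vals.__getitem__)
            if f_trial < vals[worst]:
                # Update S_k: the trial point replaces the current worst point.
                pop[worst], vals[worst] = trial, f_trial
        k += 1

    best = min(range(p), key=vals.__getitem__)
    return pop[best], vals[best], k
```

For Problem (P), `bounds` would hold six intervals, three for the translation and three for the rotation angles, and `f` would be the distance function. Only function values are used, which matches the derivative-free setting.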

SLIDE 16

Running Time

Numerical experience revealed that ACRS requires from $O(10^3)$ up to $O(10^4)$ iterations, on average, to converge.

It is able to recover a good alignment when the surfaces are indeed similar.

SLIDE 17

Binding sites alignment

We perform an all-to-all comparison on a dataset of 100 proteins in complex with one of 9 ligands: AMP, ATP, FAD, FMN, GLC, HEME, NAD, PO4, and TES.

The proteins were carefully selected so that the dataset is non-redundant and the binding sites are not evolutionarily related.

Use the atoms near the binding site (within 7 Å of the ligand).

Report

$$q_{ij} = \frac{2 \cdot \text{num. aligned atoms}}{\text{num. } P_1 + \text{num. } P_2}$$

which is between 0 and 1.
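
As a small worked example of the similarity score, the function below evaluates q for given atom counts. The name `q_score` and its argument names are hypothetical; the formula is the one reported on the slide.

```python
def q_score(n_aligned, n_atoms_p1, n_atoms_p2):
    # q = 2 * (aligned atoms) / (atoms in P1 + atoms in P2).
    return 2.0 * n_aligned / (n_atoms_p1 + n_atoms_p2)

# Two binding sites of 40 and 60 atoms with 20 aligned atoms.
example = q_score(20, 40, 60)
```

Assuming each atom is aligned at most once, the number of aligned atoms cannot exceed the size of the smaller site, so q stays between 0 and 1 and equals 1 only when two sites of equal size are fully aligned.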

SLIDE 18

Binding sites alignment

[Figure: all-to-all similarity matrix; rows and columns grouped by ligand: AMP, ATP, FAD, FMN, GLC, HEM, NAD, PO4, TES]

RED corresponds to a high number of aligned atoms (good similarity).

Mostly red areas around the main diagonal: proteins of the same class are correctly classified.

Proteins belonging to the PO4 group are similar to each other and well separated from the other groups.

To some extent, similar considerations apply to HEM and FAD.

SLIDE 21

Binding sites recognition

An important aspect of an alignment method is its ability to retrieve, for a given query binding site, those proteins of the dataset that bind the same ligand.

Given a query binding site:

Compute the alignment between the query and all other (model) binding sites.

Rank the alignments from best to worst in terms of number of aligned atoms.

Choose a threshold 0 ≤ s ≤ 1.

A query-model pairing is considered a TRUE positive if both bind the same ligand and $q_{ij} \geq s$.

Results are summarized by Receiver Operating Characteristic (ROC) curves.
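
The thresholding step can be sketched as below for a single query. The sketch assumes the standard ROC convention (true positive rate against false positive rate) and that both classes are non-empty; `roc_points`, `q_values` and `same_ligand` are hypothetical names, with `q_values[i]` the score of the query against model i and `same_ligand[i]` telling whether model i binds the same ligand.

```python
def roc_points(q_values, same_ligand, thresholds):
    n_pos = sum(1 for s in same_ligand if s)
    n_neg = len(same_ligand) - n_pos
    points = []
    for s in thresholds:
        # Pairings with q >= s are predicted to bind the same ligand.
        tp = sum(1 for q, pos in zip(q_values, same_ligand) if pos and q >= s)
        fp = sum(1 for q, pos in zip(q_values, same_ligand) if not pos and q >= s)
        points.append((fp / n_neg, tp / n_pos))
    return points

# s = 0 accepts every pairing, s = 1 accepts almost none.
curve = roc_points([0.9, 0.8, 0.3, 0.1], [True, False, True, False], [0.0, 0.5, 1.0])
```

Averaging such curves over all queries of a ligand group would give one curve per group.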

SLIDE 22

Binding sites recognition

[Figure: ROC curves, one per ligand group (AMP, ATP, FAD, FMN, GLC, HEM, NAD, PO4, TES); both axes ticked from 10 to 100]

ROC curves display the fraction of TP vs the fraction of TN for all positions of the ranked solution.

Each curve in the figure displays the average value obtained on all query proteins of a group.

s = 0: top right corner; s = 1: bottom left corner.

Again, proteins of the PO4 group are well recognised.

As before, HEM and FAD are also well recognised.

SLIDE 25

Comparison with MolLoc

MolLoc [Angaran et al. 2009] is a binding site alignment tool which is publicly available on the web.

1ATP vs 18 proteins binding ligand ATP.

Use RMSD and SAS to assess the alignment:

$$\text{SAS} = \frac{\text{RMSD} \times 100}{\text{num. aligned atoms}}$$
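
A one-line helper makes the SAS definition concrete. The name `sas` is hypothetical; the check uses the values reported in the source for the pair 1atpE-1hck (RMSD 1.2 over 62 corresponding atoms, SAS 1.94).

```python
def sas(rmsd, n_aligned_atoms):
    # SAS = RMSD * 100 / number of aligned atoms (lower is better).
    return rmsd * 100.0 / n_aligned_atoms

sas_1hck = round(sas(1.2, 62), 2)
```

Because SAS divides by the number of aligned atoms, a low RMSD obtained on only a handful of atoms is penalized relative to the same RMSD on a larger alignment.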

SLIDE 26

Comparison with MolLoc

Atoms = number of corresponding atoms.

                          CO                      MolLoc
Rank  Protein pair   Atoms  RMSD   SAS      Atoms  RMSD   SAS
1     1atpE-1hck     62     1.2    1.94     45     1.3    2.89
2     1atpE-1phk     57     0.91   1.6      63     0.9    1.43
3     1atpE-1csn     50     1.18   2.36     55     0.9    1.64
4     1atpE-1nsf     34     2.11   6.21     11     1.4    12.73
5     1atpE-1j7k     25     1.81   7.24     25     1.6    6.4
6     1atpE-1e8xA    24     1.74   7.25     20     1.7    8.5
7     1atpE-1f9aC    21     2.17   10.33    18     1.6    8.89
8     1atpE-1kay     20     1.9    9.5      8      1.7    21.25
9     1atpE-1yag     20     1.92   9.6      17     1.6    9.41
10    1atpE-1a82     19     2.02   10.63    13     1.9    14.62
11    1atpE-1jjv     18     1.76   9.78     10     1.8    18
12    1atpE-1gn8A    17     2.37   13.94    14     1.6    11.43
13    1atpE-1b8aA    16     2.05   12.81    10     2      20
14    1atpE-1mjhA    16     2.28   14.25    14     1.9    13.57
15    1atpE-1e2q     15     1.39   9.27     5      1.8    36
16    1atpE-1kp2A    13     1.51   11.62    15     1.9    12.67
17    1atpE-1ayl     12     1.21   10.08    16     2      12.5
18    1atpE-1g5t     7      2.26   32.29    8      1.6    20
avg. SAS                           10.04                  12.88

SLIDE 27

Conclusions

We used a continuous global optimization method for binding site alignment and comparison.

Though iterative, the method is quite fast.

Results on a 100-protein dataset are encouraging and show, to some extent, the usefulness of the method.

CO is publicly available on the web: http://www.iasi.cnr.it/~liuzzi/BIOCOMP/PSA/

SLIDE 28

Future work

Use a continuous global optimization approach for protein-peptide docking.

Find the docking pocket and the correct docking position by minimizing the total potential energy of the interaction.

Consider the peptide to be as flexible as possible.

Assume the total energy is given by the Lennard-Jones ($E_{LJ}$) and Coulomb ($E_C$) potentials.

Solve a constrained global optimization problem by means of a deterministic Gaussian-filling method.
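
For orientation only, the two potentials named above can be written in their usual textbook pairwise forms. The slides give no formulas or parameters, so the 12-6 Lennard-Jones form, the constant `k`, and all function and argument names below are assumptions.

```python
def lennard_jones(r, epsilon, sigma):
    # Textbook 12-6 form: 4 * eps * ((sigma/r)^12 - (sigma/r)^6).
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2, k=1.0):
    # Electrostatic energy of two point charges at distance r;
    # k bundles the physical constants and unit choices.
    return k * q1 * q2 / r

def pair_energy(r, epsilon, sigma, q1, q2, k=1.0):
    # Energy of one atom pair; a total energy would sum over all pairs.
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2, k)
```

In this form the Lennard-Jones term reaches its minimum value, -epsilon, at r = 2^(1/6) * sigma.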

SLIDE 33

Thank you for your attention!