Incorporating Molecular Flexibility into Three-Dimensional Structural Kernels


Center for Bioinformatics Tübingen. Incorporating Molecular Flexibility into Three-Dimensional Structural Kernels. Andreas Jahn. 4th German Conference on Chemoinformatics, 10.11.2008, Goslar. Computer Science Department, Computer Architecture.


SLIDE 1

Center for Bioinformatics Tübingen

Computer Science Department • Computer Architecture • Prof. Zell

Incorporating Molecular Flexibility into Three-Dimensional Structural Kernels

Andreas Jahn

4th German Conference on Chemoinformatics, 10.11.2008, Goslar

SLIDE 2

Introduction & Motivation

“… enzyme and substrate must fit each other like a lock and key.” (Emil Fischer, 1894)

“Form follows function.” (Louis H. Sullivan, 1896)

Activity is a function of the 3D structure.

  • The 3D structure is not unique due to the flexibility of the compounds.
  • What are the possible 3D structures?
  • Possible solution: conformational sampling
  • But: time-consuming and non-deterministic

Try to encode the flexibility and possible shape into the data structure.

SLIDE 3

Basics

Optimal Assignment Kernel

  • Atom-based similarity measure
  • RBF kernel calculates local atom similarity using atom and bond descriptors
  • Incorporates the local neighbourhood
  • Atom-wise similarity acts as the weight of an edge in a complete bipartite graph.
  • Choose the edges that maximize the sum of the edge weights.

Source: QSAR Comb. Sci., 2006, 25, 4, 317-323

$$ w(i, j) = K_{\mathrm{RBF}}(i, j) + \gamma_1 K_{\mathrm{RBF}}(i_1, j_1) + \gamma_1 K_{\mathrm{RBF}}(i_2, j_1) + \dots + \gamma_2 K_{\mathrm{RBF}}(i_3, j_2) $$
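The edge selection described on this slide corresponds to a maximum-weight bipartite matching between the atoms of two molecules. A sketch of that idea in formula form, where the molecule names A and B, the atom labels a_i and b_j, and the injective map π are notation introduced here rather than taken from the slides:

```latex
k_{\mathrm{OA}}(A, B) \;=\; \max_{\pi} \sum_{i=1}^{|A|} w\bigl(a_i,\, b_{\pi(i)}\bigr),
\qquad |A| \le |B|
```

Here π assigns every atom of the smaller molecule to a distinct atom of the larger one, and w is the atom-wise similarity used as edge weight.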

SLIDE 4

Basics

Two problems of the Optimal Assignment Kernel

  • The Optimal Assignment Kernel is not a valid kernel function. Fix the kernel matrix with

$$ K \leftarrow K - \lambda_{\min} I $$

  • No consideration of the flexibility and the shape of the structures.

Source: J.-P. Vert, Technical Report HAL-00218278, 2008
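A short derivation may help to see why this fix yields a usable kernel matrix. It assumes that K is symmetric and that λ_min is its smallest eigenvalue, which is negative whenever K is not positive semidefinite; the slide does not define λ_min, so this reading is an assumption.

```latex
K v_i = \lambda_i v_i
\;\Longrightarrow\;
(K - \lambda_{\min} I)\, v_i = (\lambda_i - \lambda_{\min})\, v_i,
\qquad
\lambda_i - \lambda_{\min} \ge 0 \quad \text{for all } i .
```

The shifted matrix keeps the eigenvectors of K, and all of its eigenvalues are non-negative, so it is positive semidefinite.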

SLIDE 5

Methods - Overview

Two different methods were implemented

  • OAKFLEX
      • Encode the neighbourhood flexibility space relative to an atom.
      • Determine the similarity of the flexibility space.
      • Incorporate the similarity of the flexibility space into the Optimal Assignment Kernel.
  • Rigid Superposition
      • Identify rigid scaffolds of the structures.
      • Superposition of rigid fragments and determine a similarity score.
      • Integrate the similarity score into the Optimal Assignment Kernel.

SLIDE 6

Rigid Superposition

  • Rule-based expert system identifies rigid scaffolds of the structures.
  • Calculate all pairwise similarity values.
  • Calculate the optimal assignment of the fragments.

[Figure: Molecule a and Molecule b, each split into Fragment #1 and Fragment #2]

Similarity of fragments (Molecule a vs. Molecule b):

          F #1      F #2
  F #1    23.662    8.676
  F #2    6.599     19.262

Optimal assignment: 23.662 and 19.262
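For two fragments per molecule there are only two possible one-to-one assignments, so the optimum in this example can be checked by hand using the four similarity values from the slide:

```latex
\begin{aligned}
\text{F\#1} \leftrightarrow \text{F\#1},\ \text{F\#2} \leftrightarrow \text{F\#2}: &\quad 23.662 + 19.262 = 42.924 \\
\text{F\#1} \leftrightarrow \text{F\#2},\ \text{F\#2} \leftrightarrow \text{F\#1}: &\quad 8.676 + 6.599 = 15.275
\end{aligned}
```

The first assignment has the larger sum, which matches the two values highlighted on the slide.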

SLIDE 7

Rigid Superposition

  • Superposition of the assigned fragments.
  • Calculate a similarity score based on the overlap volume.
  • Integrate information into the Optimal Assignment Kernel.

SLIDE 8

OAKFLEX

Encode the neighbourhood flexibility space relative to an atom

  • The flexibility space results from rotatable bonds. All single bonds outside of a ring generate flexibility spaces.
  • Flexibility space of the whole molecule is important. For each atom the relative flexibility space has to be enumerated.
  • Flexibility spaces have to be comparable. Unique parameterisation of the space necessary.
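The rule "all single bonds outside of a ring" can be illustrated with a minimal sketch. It treats the molecule as a plain graph and calls a bond a ring bond if its two atoms stay connected when that bond is ignored. The function names, the bond-list format and the toy molecule are hypothetical and not taken from the slides.

```python
from collections import deque


def in_ring(adjacency, a, b):
    """Return True if atoms a and b stay connected when the bond a-b is ignored."""
    seen = {a}
    queue = deque([a])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if {atom, neighbour} == {a, b}:
                continue  # skip the bond under test
            if neighbour == b:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def rotatable_bonds(bonds):
    """bonds: list of (atom_a, atom_b, order). Returns the single, non-ring bonds."""
    adjacency = {}
    for a, b, _ in bonds:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return [(a, b) for a, b, order in bonds
            if order == 1 and not in_ring(adjacency, a, b)]


# Hypothetical toy molecule: a three-membered ring (0, 1, 2),
# a chain 2-3-4 and a double bond 4=5.
toy_bonds = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (2, 3, 1), (3, 4, 1), (4, 5, 2)]
print(rotatable_bonds(toy_bonds))  # [(2, 3), (3, 4)]
```

A real implementation would probably take ring and bond-order information from a cheminformatics toolkit; the graph search here only keeps the example self-contained.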

SLIDE 9

OAKFLEX

Flexibility space and the unique parameterisation

  • Core atom acts as origin.
  • Parameterisation of the flexibility relative to core atom.
  • 1st order rotation is parameterised by d1 and r1.

[Figure: core atom, neighbour n1, rotatable bond, neighbour n2]

SLIDE 10

OAKFLEX

Enumeration of the 1st order rotations

  • Depth-limited search with a depth limit of 2.
  • Prune subtrees after a rigid bond.

[Figure: example molecule with atoms numbered 1 to 11]
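A depth-limited search with pruning can be sketched as follows. This is one possible reading of the slide, not the author's implementation: a branch is no longer expanded once it has crossed `max_rigid` rigid bonds. The adjacency map, the set of rigid bonds and all names are hypothetical.

```python
def depth_limited_paths(adjacency, rigid_bonds, core, max_depth=2, max_rigid=1):
    """Enumerate paths that start at `core` and contain at most `max_depth` bonds.

    A branch is not expanded further once it has crossed `max_rigid` rigid bonds.
    """
    paths = []

    def visit(path, rigid_seen):
        if len(path) > 1:
            paths.append(list(path))
        if len(path) - 1 == max_depth or rigid_seen >= max_rigid:
            return  # depth limit reached or subtree pruned
        for neighbour in adjacency[path[-1]]:
            if neighbour in path:
                continue  # do not walk back along the current path
            is_rigid = frozenset((path[-1], neighbour)) in rigid_bonds
            path.append(neighbour)
            visit(path, rigid_seen + is_rigid)
            path.pop()

    visit([core], 0)
    return paths


# Hypothetical toy graph: core atom 0 with two branches, bond 0-2 is rigid.
toy_adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
toy_rigid = {frozenset((0, 2))}
print(depth_limited_paths(toy_adjacency, toy_rigid, core=0))
# [[0, 1], [0, 1, 3], [0, 2]]
```

Under the same reading, calling the function with `max_depth=3` and `max_rigid=2` would correspond to the settings given for the 2nd order rotations.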

SLIDE 11

OAKFLEX

Extension to the 2nd order rotation

  • Unique parameterisation by M1, M2, r2 and h.
  • Additional flag necessary for case differentiation.

[Figure: 2nd order rotation relative to the core atom, with the parameter h marked]

SLIDE 12

OAKFLEX

Different cases of the 2nd order rotation

  • Both cases are special cases of the 1st order rotation.

Only two parameters and the flag are necessary.

[Figure: the two cases, each drawn with its core atom; atom n1 is marked]

SLIDE 13

OAKFLEX

Enumeration of the 2nd order rotations

  • Depth-limited search with a depth limit of 3.
  • Prune subtrees after 2 rigid bonds.

SLIDE 14

OAKFLEX

Similarity calculation of two flexibility spaces

  • RBF kernel based on the parameters.
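  • An individual σ adjusts the weight of each parameter.

[Figure: two core atoms, each with the parameters d and r of its flexibility space]

$$ \text{Similarity} = e^{-\left( \frac{(d - d')^2}{2\sigma_d} + \frac{(r - r')^2}{2\sigma_r} \right)} $$

Here (d, r) and (d', r') are the parameters of the two flexibility spaces.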
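The similarity of two flexibility spaces can be sketched in a few lines. The sketch follows the slide's formula with one σ per parameter and 2σ (not 2σ²) in the denominator; the function name, the parameter values and the σ values are made up for illustration.

```python
import math


def flex_similarity(params_a, params_b, sigmas):
    """RBF similarity of two flexibility spaces given as equal-length parameter tuples."""
    exponent = sum((a - b) ** 2 / (2.0 * sigma)
                   for a, b, sigma in zip(params_a, params_b, sigmas))
    return math.exp(-exponent)


# Hypothetical (d, r) parameters of two 1st order rotations and made-up sigmas.
space_a = (1.2, 0.9)
space_b = (1.4, 0.7)
sigmas = (0.5, 0.5)

print(flex_similarity(space_a, space_a, sigmas))  # 1.0
print(flex_similarity(space_a, space_b, sigmas))
```

Because the parameters are passed as tuples, the same function would also cover the longer parameter sets of the 2nd order rotations, given one σ per parameter.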

SLIDE 15

OAKFLEX

Comparison of the flexibility spaces of two core atoms

  • Atoms have list of flexibility spaces.
  • But: Only one similarity value is needed.

Calculate the similarity matrix and use the optimal assignment.

Similarity matrix (rows: flexibility spaces of atom a; columns: flexibility spaces of atom b):

          #1            #2            #3
  #1      RBF(#1,#1)    RBF(#1,#2)    RBF(#1,#3)
  #2      RBF(#2,#1)    RBF(#2,#2)    RBF(#2,#3)

Normalize the similarity value:

$$ k(a, b) \leftarrow \frac{k(a, b)}{\sqrt{k(a, a)\, k(b, b)}} $$
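The comparison of two atoms, each with a list of flexibility spaces, can be sketched as below. The optimal assignment is found by brute force over all one-to-one assignments, which is only practical for short lists; the slides use the Hungarian method for this step. All names, the single shared σ and the parameter values are hypothetical.

```python
import math
from itertools import permutations


def rbf(p, q, sigma=0.5):
    """RBF similarity of two parameter tuples (one shared sigma for brevity)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(p, q)) / (2.0 * sigma))


def assignment_score(spaces_a, spaces_b):
    """Best one-to-one assignment of the shorter list onto the longer one."""
    short, long_ = sorted((spaces_a, spaces_b), key=len)
    if not short:
        return 0.0
    return max(sum(rbf(s, long_[j]) for s, j in zip(short, perm))
               for perm in permutations(range(len(long_)), len(short)))


def normalised_similarity(spaces_a, spaces_b):
    """k(a, b) / sqrt(k(a, a) * k(b, b))."""
    k_ab = assignment_score(spaces_a, spaces_b)
    k_aa = assignment_score(spaces_a, spaces_a)
    k_bb = assignment_score(spaces_b, spaces_b)
    return k_ab / math.sqrt(k_aa * k_bb) if k_aa and k_bb else 0.0


# Hypothetical flexibility spaces: atom a has two, atom b has three.
atom_a = [(1.2, 0.9), (2.1, 1.0)]
atom_b = [(1.2, 0.9), (2.0, 1.1), (2.5, 0.4)]
print(normalised_similarity(atom_a, atom_b))
print(normalised_similarity(atom_a, atom_a))
```

The normalisation keeps the value in the range 0 to 1 and gives an atom compared with itself a similarity of 1, regardless of how many flexibility spaces it has.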

SLIDE 16

OAKFLEX

Overview of the calculation steps

[Flow chart]

  • Atom A and atom B, 1st order rotations → RBF kernel → 1st order similarity matrix → Hungarian method
  • Atom A and atom B, 2nd order rotations → RBF kernel → 2nd order similarity matrix → Hungarian method
  • Both results → flex matrix
  • Flex matrix and local atom similarity matrix (OAK) → Hungarian method → normalisation → kernel value

SLIDE 17

Results

  • Methods evaluated on 8 QSAR datasets compiled by Sutherland et al.
  • ε-SVR used to build the models.
  • Seeded 10-fold multirun
  • Equal folds for both methods
  • Comparison of the methods possible
  • 100 multiruns generate 1000 MSE values.
  • Each value is considered as a sample of a Gaussian distribution.

A paired Wilcoxon signed-rank test determines significant shifts of the mean.

  • Hypotheses for the test:

$$ H_0: \mu_{\mathrm{OAK}} = \mu_{\mathrm{OAKFLEX}} \qquad H_1: \mu_{\mathrm{OAK}} > \mu_{\mathrm{OAKFLEX}} $$

Source: J. Med. Chem., 2004, 47, 22, 5541-5554
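The seeded multirun can be sketched as follows: a fixed seed makes the fold splits reproducible, so both kernels can be evaluated on exactly the same folds and their MSE values can be paired. The function name, the seed value and the dataset size are assumptions for illustration; the slides do not give these details.

```python
import random


def seeded_folds(n_samples, n_folds=10, n_runs=100, seed=0):
    """Return a list of (train_indices, test_indices) for repeated k-fold validation."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test = indices[fold::n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            splits.append((train, test))
    return splits


# Hypothetical dataset with 50 compounds: 100 runs x 10 folds.
splits = seeded_folds(n_samples=50)
print(len(splits))  # 1000
```

Evaluating both methods on `splits` would give 1000 paired MSE values, which is the input a paired test such as the Wilcoxon signed-rank test needs.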

SLIDE 18

Results

            OAK                            OAKFLEX
Dataset     MSE            Q²              MSE            Q²              p-value
ACE α       1.48 ± 0.61    0.73 ± 0.13     1.52 ± 0.63    0.71 ± 0.13     0.98
AchE β      0.86 ± 0.36    0.48 ± 0.21     0.80 ± 0.30    0.52 ± 0.19     0.02
BZR γ       0.67 ± 0.30    0.48 ± 0.19     0.58 ± 0.25    0.54 ± 0.17     < 0.001
COX2 δ      1.02 ± 0.31    0.51 ± 0.13     0.97 ± 0.27    0.53 ± 0.12     0.001
DHFR ε      0.64 ± 0.19    0.71 ± 0.08     0.60 ± 0.17    0.73 ± 0.08     0.001
GPB ζ       0.55 ± 0.33    0.59 ± 0.25     0.53 ± 0.37    0.60 ± 0.29     0.089
THER η      1.64 ± 0.96    0.64 ± 0.21     1.56 ± 1.00    0.66 ± 0.22     0.149
THR θ       0.47 ± 0.26    0.57 ± 0.25     0.42 ± 0.24    0.59 ± 0.24     0.022

α Angiotensin-Converting Enzyme, β Acetylcholinesterase, γ Benzodiazepine Receptor, δ Cyclooxygenase II, ε Dihydrofolate Reductase, ζ Glycogen Phosphorylase B, η Thermolysin, θ Thrombin

SLIDE 19

Computation time

Comparison of the avg. runtime

  • Overhead between 16% and 70%.
  • Overhead correlates with the flexibility of the molecules.

Dataset           ACE     AchE    BZR     COX2    DHFR    GPB     THERM   THR
Ø OAK (ms)        5.3     6.7     4.8     6.2     6.0     4.7     7.2     11.7
Ø OAKFLEX (ms)    7.9     8.5     5.6     8.5     8.2     6.9     12.3    18.0
Ring atoms        35%     71%     77%     66%     58%     36%     24%     51%
Factor            1.49    1.26    1.16    1.37    1.36    1.46    1.70    1.53
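The runtime factor and its relation to the share of ring atoms can be recomputed from the table values. The sketch below uses the numbers from the slide and treats the fraction of ring atoms as an inverse proxy for flexibility, which is an interpretation and not stated on the slide; the helper `pearson` is written out by hand to keep the example self-contained.

```python
import math

datasets = ["ACE", "AchE", "BZR", "COX2", "DHFR", "GPB", "THERM", "THR"]
oak_ms = [5.3, 6.7, 4.8, 6.2, 6.0, 4.7, 7.2, 11.7]
oakflex_ms = [7.9, 8.5, 5.6, 8.5, 8.2, 6.9, 12.3, 18.0]
ring_atoms = [0.35, 0.71, 0.77, 0.66, 0.58, 0.36, 0.24, 0.51]

# Runtime factor: average OAKFLEX time divided by average OAK time.
factors = [flex / oak for flex, oak in zip(oakflex_ms, oak_ms)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


for name, factor in zip(datasets, factors):
    print(f"{name}: factor {factor:.2f}, overhead {100 * (factor - 1):.0f}%")
print("correlation with ring-atom fraction:", round(pearson(ring_atoms, factors), 2))
```

A strongly negative correlation means that datasets with fewer ring atoms, and therefore more rotatable bonds, show the larger overhead.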

SLIDE 20

Discussion

  • Interpretation of kernel models is difficult.
  • But: Visualization of the mappings discloses differences.

[Figure: atom mappings for OAK and OAKFLEX]

SLIDE 21

Conclusion

  • Method incorporates molecular flexibility for similarity calculations.
  • Significant performance gain in 5 of 8 QSAR datasets.
  • Type of encoding the flexibility not suitable for all datasets (ACE: quality of the model decreased).
  • Publication: Fechner, N.; Jahn, A.; Hinselmann, G.; Zell, A. Journal of Chemical Information and Modeling, in revision.

SLIDE 22

Acknowledgement

I thank Nikolas Fechner, Georg Hinselmann and Andreas Zell.

SLIDE 23

Thank you for your attention

SLIDE 24

OAKFLEX

Performance tuning

  • Hungarian method: $O((a + b)^3)$
  • Performance problem due to the high number of calculations.
  • Implementation of a greedy heuristic to reduce the computational cost.

                1st order rotations          2nd order rotations
                Heuristic     Hungarian      Heuristic     Hungarian
Ø sum           2.079         2.091          2.709         2.18
Difference      0.561%                       0.008%
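The slide does not say which greedy heuristic was implemented. One common variant, shown here purely as an assumption, repeatedly picks the largest remaining similarity value whose row and column are still free. The function name and the example matrix are hypothetical.

```python
def greedy_assignment(matrix):
    """Greedy heuristic: repeatedly take the largest entry whose row and column are free."""
    cells = sorted(((value, i, j)
                    for i, row in enumerate(matrix)
                    for j, value in enumerate(row)), reverse=True)
    used_rows, used_cols = set(), set()
    total, pairs = 0.0, []
    for value, i, j in cells:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            pairs.append((i, j))
            total += value
    return total, pairs


# Made-up similarity matrix on which the greedy choice is not optimal.
example = [[0.9, 0.8],
           [0.8, 0.1]]
total, pairs = greedy_assignment(example)
print(pairs)  # [(0, 0), (1, 1)]
```

For this matrix the greedy sum is 0.9 + 0.1, while the optimal assignment takes the two off-diagonal entries, 0.8 + 0.8. The heuristic trades such losses for a running time dominated by one sort of the matrix entries instead of the cubic cost of the Hungarian method.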