Exploring Activity Cliffs from a Chemoinformatics Perspective, Jürgen Bajorath (PowerPoint presentation)

SLIDE 1

Exploring Activity Cliffs from a Chemoinformatics Perspective

Jürgen Bajorath, Life Science Informatics, University of Bonn

SLIDE 2

Activity Cliff Concept

• An activity cliff is generally defined as a pair of structurally similar active compounds with a large difference in potency

Paradigm: “small chemical modifications – large biological effects” → high SAR information content

[Figure: pair of analogs with potencies of 2390 nM and 6 nM]

SLIDE 3

Activity Cliffs in Medicinal Chemistry

• Utility in SAR analysis and compound optimization
• Which compound to make next?
• Typically focused on individual compound series
• Methodological simplicity and chemical intuition are key to practical utility in med. chem.

SLIDE 4

Activity Cliffs in Chemoinformatics

• Much stronger emphasis on methodological aspects
• Departure from individual series toward global analysis

SLIDE 5

Activity Cliffs in Chemoinformatics

• Molecular representation dependence
• Large-scale compound data mining
• Activity cliff networks
• Prediction of activity cliffs

SLIDE 6

Activity Cliffs

• An activity cliff is generally defined as a pair of structurally similar active compounds with a large difference in potency

Definition requires consideration of:
• Similarity criterion
• Potency difference criterion

[Figure: pair of analogs with potencies of 2390 nM and 6 nM]

SLIDE 7

Activity Cliff Definition

• Alternative similarity criteria
  − Fingerprint Tanimoto similarity, e.g. MACCS (Tc threshold 0.85) or ECFP4 (Tc threshold 0.55)
  − Substructure-based similarity: matched molecular pairs, scaffolds

• Potency difference criterion
  − Usually at least 1 or 2 orders of magnitude (10- or 100-fold)
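The two criteria can be combined into a simple pair check. The sketch below is illustrative only: it assumes fingerprints given as sets of on-bit indices and potencies as Ki values in nM, treats the similarity threshold as inclusive (an assumption), and uses made-up fingerprints for the 2390 nM / 6 nM analog pair; all function names are hypothetical.

```python
import math

def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient (Tc) of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def potency_difference(ki_a_nm: float, ki_b_nm: float) -> float:
    """Potency difference in orders of magnitude, |log10(Ki_a / Ki_b)|."""
    return abs(math.log10(ki_a_nm / ki_b_nm))

def is_activity_cliff(fp_a, fp_b, ki_a_nm, ki_b_nm,
                      tc_threshold=0.55, min_log_diff=2.0):
    """Similarity criterion (Tc >= threshold, inclusive by assumption) plus
    potency difference criterion (min_log_diff = 2.0 means at least 100-fold)."""
    return (tanimoto(fp_a, fp_b) >= tc_threshold
            and potency_difference(ki_a_nm, ki_b_nm) >= min_log_diff)

# Hypothetical fingerprints for the analog pair (2390 nM vs. 6 nM).
fp_weak = {1, 4, 9, 16, 25, 36, 49, 64}
fp_strong = {1, 4, 9, 16, 25, 36, 49, 81}
print(round(tanimoto(fp_weak, fp_strong), 3))          # 0.778 (7 shared of 9 bits)
print(round(potency_difference(2390, 6), 2))           # 2.6 orders of magnitude
print(is_activity_cliff(fp_weak, fp_strong, 2390, 6))  # True
```

A 10-fold criterion corresponds to `min_log_diff=1.0`, and a fingerprint-specific threshold such as MACCS Tc 0.85 is passed via `tc_threshold`.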

SLIDE 8
  • 1. Molecular Representations

• Activity cliff distribution is strongly influenced by the selected molecular representations and similarity criteria

• Qualifying pairs (QPs)

  − QPs are compound pairs exceeding a given similarity threshold

• Activity cliff frequency

  − Percentage of QPs with more than a 100-fold difference in potency
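Given pairwise similarities and potencies, QPs and the activity cliff frequency follow directly from these two definitions. A minimal sketch, assuming a precomputed similarity lookup and pKi values (a more than 100-fold difference corresponds to ΔpKi > 2); the toy data set, compound IDs, and function name are hypothetical.

```python
from itertools import combinations

def cliff_statistics(pki, similarity, sim_threshold):
    """Return (number of QPs, activity cliff frequency in %).

    pki: dict mapping compound ID -> pKi
    similarity: function (id_a, id_b) -> similarity value
    QPs are pairs exceeding sim_threshold; cliffs are QPs with a
    more than 100-fold potency difference (delta pKi > 2).
    """
    qps = [(a, b) for a, b in combinations(sorted(pki), 2)
           if similarity(a, b) > sim_threshold]
    if not qps:
        return 0, 0.0
    cliffs = [(a, b) for a, b in qps if abs(pki[a] - pki[b]) > 2.0]
    return len(qps), 100.0 * len(cliffs) / len(qps)

# Hypothetical toy data: four compounds and a lookup table of pairwise Tc values.
pki = {"c1": 8.5, "c2": 6.0, "c3": 7.9, "c4": 5.1}
tc = {("c1", "c2"): 0.70, ("c1", "c3"): 0.80, ("c1", "c4"): 0.30,
      ("c2", "c3"): 0.60, ("c2", "c4"): 0.20, ("c3", "c4"): 0.58}
n_qps, freq = cliff_statistics(pki, lambda a, b: tc[(a, b)], sim_threshold=0.55)
print(n_qps, round(freq, 1))  # 4 50.0
```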

SLIDE 9

Molecular Representation Dependence

• QPs and activity cliff distribution for six different fingerprints

Representation   QPs         Activity cliff frequency
ECFP4            447,224     5.47%
FCFP4            414,224     5.47%
GpiDAPH3         467,592     5.87%
MACCS            512,026     6.78%
TGT              468,145     7.43%
TGD              563,445     8.88%
Consensus        130,223     3.36%
Union            1,076,177   8.99%

• 128 activity classes from ChEMBL with more than 100 compounds
• 35,021 unique compounds

Stumpfe D, Hu Y, Dimova D & Bajorath J. J Med Chem, 57, 18 (2014)

SLIDE 10

Activity Cliff-Forming Compounds

• Percentage of compounds that form at least one activity cliff
• Union of cliff-forming compounds: more than 64% of all compounds form at least one cliff

Stumpfe D, Hu Y, Dimova D & Bajorath J. J Med Chem, 57, 18 (2014)

128 activity classes (>100 cpds) from ChEMBL

Representation   Cliff-forming compounds
ECFP4            35.3%
FCFP4            34.2%
GpiDAPH3         36.1%
MACCS            41%
TGT              37.2%
TGD              41.4%
Consensus        14.7%
Union            64.5%
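The percentage of cliff-forming compounds, and its union and consensus over several representations, reduce to set operations on the compounds that take part in at least one cliff. The sketch assumes that cliffs have already been identified per representation (as pairs of compound IDs); the cliff lists and the total compound count are hypothetical, and reading "consensus" as compounds forming cliffs under every representation is an assumption.

```python
def cliff_forming_compounds(cliffs):
    """Set of compounds involved in at least one activity cliff."""
    return {cpd for pair in cliffs for cpd in pair}

def percentage(compounds, n_total):
    return 100.0 * len(compounds) / n_total

# Hypothetical cliffs per representation for a data set of 10 compounds.
cliffs_by_rep = {
    "ECFP4": [("a", "b"), ("a", "c")],
    "MACCS": [("a", "b"), ("d", "e")],
    "MMP":   [("a", "b")],
}
n_total = 10
per_rep = {rep: cliff_forming_compounds(c) for rep, c in cliffs_by_rep.items()}
union = set().union(*per_rep.values())          # cliff former under any representation
consensus = set.intersection(*per_rep.values())  # cliff former under all representations

for rep, cpds in per_rep.items():
    print(f"{rep}: {percentage(cpds, n_total):.1f}%")
print(f"Union: {percentage(union, n_total):.1f}%")          # Union: 50.0%
print(f"Consensus: {percentage(consensus, n_total):.1f}%")  # Consensus: 20.0%
```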

SLIDE 11

MMPs as Molecular Representation

• A matched molecular pair (MMP) is formed by two structurally related compounds that
  − differ only by a small structural change at a single site
  − are related by the exchange of a substructure (termed a chemical transformation)

[Figure: example MMP]
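A common way to find MMPs is to cut each compound at single acyclic bonds into a core and a substituent and to index compounds by core: two compounds sharing a core but carrying different substituents form an MMP, and the exchanged substituents define the chemical transformation. The sketch below covers only this indexing step. It assumes the fragmentation has already been done with a cheminformatics toolkit and uses made-up fragment strings with `[*]` as the attachment point; `find_mmps` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations

def find_mmps(fragments):
    """Index single-cut fragments by core and pair compounds sharing a core.

    fragments: iterable of (compound_id, core, substituent) tuples.
    Returns a set of (compound_a, compound_b, core, transformation) tuples,
    with the transformation written as 'substituent_a>>substituent_b'.
    """
    by_core = defaultdict(list)
    for cpd, core, subst in fragments:
        by_core[core].append((cpd, subst))
    mmps = set()
    for core, members in by_core.items():
        for (a, sub_a), (b, sub_b) in combinations(sorted(members), 2):
            # Skip self-pairs (several cuts of one compound) and identical substituents.
            if a != b and sub_a != sub_b:
                mmps.add((a, b, core, f"{sub_a}>>{sub_b}"))
    return mmps

# Hypothetical, already fragmented compounds.
fragments = [
    ("cpd1", "c1ccc(cc1)[*]", "[*]Cl"),
    ("cpd2", "c1ccc(cc1)[*]", "[*]F"),
    ("cpd3", "c1ccc(cc1)[*]", "[*]OC"),
    ("cpd4", "C1CCNCC1[*]", "[*]OC"),
]
mmps = find_mmps(fragments)
for mmp in sorted(mmps):
    print(mmp)  # three MMPs among cpd1-cpd3; cpd4 has no partner
```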

SLIDE 12

Transformation Size Restriction

• Transformation size-restricted MMPs were introduced to limit transformations to small and chemically intuitive replacements

[Figure: examples of the largest permitted transformations]
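A size restriction can be applied as a filter on each MMP's fragments. The sketch assumes heavy-atom counts have already been computed. Its default thresholds are assumptions to be checked against the original MMP-cliff publications: substituents of at most 13 heavy atoms, a core at least twice as large as the larger substituent, and a size difference of at most 8 heavy atoms between the exchanged substituents. `is_size_restricted` is a hypothetical helper.

```python
def is_size_restricted(core_ha, subst_a_ha, subst_b_ha,
                       max_subst=13, min_core_ratio=2.0, max_diff=8):
    """Check an MMP against transformation size limits.

    core_ha: heavy atoms in the shared core
    subst_a_ha, subst_b_ha: heavy atoms in the two exchanged substituents
    All default thresholds are illustrative assumptions.
    """
    largest = max(subst_a_ha, subst_b_ha)
    return (largest <= max_subst
            and core_ha >= min_core_ratio * largest
            and abs(subst_a_ha - subst_b_ha) <= max_diff)

print(is_size_restricted(20, 1, 2))   # True: small exchange on a large core
print(is_size_restricted(30, 1, 12))  # False: substituents differ by 11 heavy atoms
print(is_size_restricted(10, 3, 6))   # False: core not twice as large as the substituent
```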

SLIDE 13

Preferred Activity Cliff Definition

• Transformation size-restricted MMPs
  − Substructure-based similarity assessment (med. chem. focus)

• At least 100-fold difference in potency
• Equilibrium constants (Ki)

[Figure: MMP-cliff example with potencies of 7.2 pKi and 4.6 pKi]

Stumpfe D & Bajorath J. J Chem Inf Model 52, 2348 (2012)

SLIDE 14

Activity Cliff-Forming Compounds

• MMPs and six fingerprint representations
• Among the individual representations, MMPs yield the smallest percentage of cliff-forming compounds (27.5%)

Cliff-forming compounds (percentage of 35,021 compounds per representation)
MMP                  27.5%
ECFP4                35.3%
FCFP4                34.2%
GpiDAPH3             36.1%
MACCS                41%
TGT                  37.2%
TGD                  41.4%
Consensus (FP only)  14.7%
Consensus            10.9%
Union (FP only)      64.5%
Union                65.6%

128 activity classes (>100 cpds) from ChEMBL

Stumpfe D, Hu Y, Dimova D & Bajorath J. J Med Chem, 57, 18 (2014)

SLIDE 15
  • 2. Large-Scale Data Mining

Proportion of bioactive compounds forming activity cliffs?
Percentage of all bioactive compounds involved in the formation of activity cliffs (ChEMBL survey):

• 31.7% (ECFP4/Tanimoto-based cliffs)
• 22.8% (MMP-cliffs)

SLIDE 16

Large-Scale Data Mining

Currently available high-confidence activity cliffs?

(ChEMBL version 17)

20,080 MMP-cliffs detected for 293 targets involving 11,783 unique active compounds

SLIDE 17

Target Distribution

[Figure: % MMP-cliffs and % cliff-forming compounds plotted against the number of compounds per activity class]

414 activity classes from ChEMBL

Hu Y, Stumpfe D, Bajorath J. F1000Research 2, 199 (2013)

SLIDE 18

Target Distribution

For data sets with more than 200 compounds, activity cliffs and cliff-forming compounds are fairly evenly distributed among many different targets

[Figure: % MMP-cliffs and % cliff-forming compounds plotted against the number of compounds per activity class]

414 activity classes from ChEMBL

SLIDE 19

Ligand Efficiency (LE) for MMP-Cliffs

• Changes in LE accompanying activity cliff formation
• Difference in LE between weakly and highly potent cliff partners
• LE increase detected for 99.1% of all activity cliffs; average ΔLE = 6.27

[Figure: percentage of cliff partners vs. ligand efficiency for weakly and highly potent cliff partners]

LE = pKi / MW

de la Vega de Leon A & Bajorath J. AAPS J 16, 335 (2014)
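With LE defined on the slide as pKi / MW, the change in LE across a cliff is the LE of the highly potent partner minus that of the weakly potent partner. A minimal sketch with hypothetical potencies and molecular weights. The magnitudes quoted above (average ΔLE = 6.27) suggest that the reported LE values were scaled by a factor the slide does not state, so the sketch exposes a `scale` parameter (default 1.0, i.e. the unscaled formula) rather than guessing it.

```python
def ligand_efficiency(pki: float, mw: float, scale: float = 1.0) -> float:
    """LE = pKi / MW, optionally multiplied by an (unstated) scale factor."""
    return scale * pki / mw

def delta_le(weak, strong, scale=1.0):
    """LE(highly potent partner) - LE(weakly potent partner).
    weak, strong: (pKi, MW) tuples for the two cliff partners."""
    return ligand_efficiency(*strong, scale) - ligand_efficiency(*weak, scale)

# Hypothetical cliff partners: delta pKi = 2.3 (about 200-fold), MW up by 14 Da.
weak, strong = (6.0, 400.0), (8.3, 414.0)
print(round(delta_le(weak, strong), 4))  # 0.005
```

A positive ΔLE means the potency gain outweighs the added molecular weight.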

SLIDE 20

Lipophilic Efficiency (LipE)

• Changes in LipE accompanying activity cliff formation
• Difference in LipE between weakly and highly potent cliff partners
• LipE increase detected for 96.7% of all activity cliffs; average ΔLipE = 2.42

LipE = pKi - cLogP

[Figure: percentage of cliff partners vs. lipophilic efficiency for weakly and highly potent cliff partners]
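Because LipE = pKi - cLogP, the change across a cliff is ΔLipE = ΔpKi - ΔcLogP, so LipE increases whenever the potency gain exceeds the increase in cLogP. A minimal sketch with hypothetical values for the two cliff partners:

```python
def lipe(pki: float, clogp: float) -> float:
    """Lipophilic efficiency, LipE = pKi - cLogP."""
    return pki - clogp

def delta_lipe(weak, strong):
    """LipE(highly potent partner) - LipE(weakly potent partner);
    weak, strong are (pKi, cLogP) tuples."""
    return lipe(*strong) - lipe(*weak)

# Hypothetical cliff partners: potency gain of 2.3 log units, cLogP up by 0.4.
weak, strong = (6.0, 3.1), (8.3, 3.5)
print(round(delta_lipe(weak, strong), 2))  # 1.9
```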

SLIDE 21
  • 3. Activity Cliff Network Analysis
SLIDE 22

Isolated vs. Coordinated Cliffs

• ‘Isolated’ cliffs: cliff partners are only involved in a single activity cliff
• ‘Coordinated’ cliffs: cliff partners are involved in multiple and overlapping activity cliffs

Cliff type    Isolated cliffs (%)   Coordinated cliffs (%)
MACCS         1.4                   98.6
ECFP4         2.2                   97.8
MMP-cliffs    3.5                   96.5

128 activity classes (>100 cpds) from ChEMBL
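In network terms, each activity cliff is an edge between two compounds. A cliff is isolated when neither partner takes part in any other cliff (both nodes have degree 1) and coordinated otherwise. A minimal sketch on a hypothetical edge list, assuming no duplicate edges:

```python
from collections import Counter

def classify_cliffs(cliffs):
    """Split cliffs (pairs of compound IDs) into isolated and coordinated ones."""
    degree = Counter(cpd for pair in cliffs for cpd in pair)
    isolated = [p for p in cliffs if degree[p[0]] == 1 and degree[p[1]] == 1]
    coordinated = [p for p in cliffs if not (degree[p[0]] == 1 and degree[p[1]] == 1)]
    return isolated, coordinated

cliffs = [("a", "b"), ("c", "d"), ("c", "e"), ("c", "f"), ("e", "f")]
isolated, coordinated = classify_cliffs(cliffs)
print(f"isolated: {len(isolated)} ({100 * len(isolated) / len(cliffs):.1f}%)")           # 1 (20.0%)
print(f"coordinated: {len(coordinated)} ({100 * len(coordinated) / len(cliffs):.1f}%)")  # 4 (80.0%)
```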

SLIDE 23

Isolated vs. Coordinated Cliffs

• MMP-cliff network for serotonin 1D receptor ligands

46 compounds (nodes), 69 MMP-cliffs (edges): 2 isolated cliffs, 67 coordinated cliffs

[Figure legend: highly potent cliff partner; weakly potent cliff partner; both highly and weakly potent cliff partner]

SLIDE 24

Global MMP-Cliff Network

• ChEMBL 17
• 14,044 nodes (compounds)
• 20,080 edges (MMP-cliffs)
• Many separate components
• 2072 clusters

Stumpfe D et al. & Bajorath J. J Chem Inf Model 54, 451 (2014)
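The clusters of such a network are its connected components. The sketch below finds them with a union-find structure over a hypothetical edge list (compounds as nodes, MMP-cliffs as edges); a component of size 2 corresponds to an isolated cliff. `connected_components` is an illustrative helper, not a specific library function.

```python
def connected_components(edges):
    """Group the nodes of an undirected graph into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return list(components.values())

edges = [("a", "b"), ("c", "d"), ("d", "e"), ("e", "c"), ("f", "g")]
clusters = connected_components(edges)
print(sorted(len(c) for c in clusters))  # [2, 2, 3]
```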

SLIDE 25

Activity Cliff Cluster Size Distribution

• 769 isolated cliffs
• 1303 coordinated cliff clusters
• 26 clusters with > 50 compounds
• 420 clusters comprising 6 to 15 compounds

Cluster size (compounds)   # Clusters
1-5                        1463
6-10                       306
11-15                      114
16-20                      65
21-30                      56
31-40                      27
41-50                      15
51-60                      11
61-70                      4
71-80                      2
81-90                      3
91-100                     2
101-152                    4

SLIDE 26

Node Degree Distribution

• Average node degree 2.9
• The union of all clusters follows a power law, P(k) ~ k^(−γ), with γ = 2.5, which is characteristic of scale-free networks
• Many densely connected nodes: activity cliff hubs

Node degree   # Nodes
1-4           11878
5-9           1552
10-14         341
15-20         155
21-30         85
31-40         17
41-50         9
51-60         4
61-70         3
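The average degree of an undirected network is 2E/N, and the exponent of a power law P(k) ~ k^(−γ) is often estimated, as a first approximation, from the slope of log P(k) against log k. The sketch below fits that slope by ordinary least squares on synthetic data generated with γ = 2.5. It is only an illustration: least-squares fits on log-log data are a rough estimator, and the slide does not state how γ was obtained.

```python
import math

def average_degree(n_nodes, n_edges):
    """Average node degree of an undirected network: 2E / N."""
    return 2 * n_edges / n_nodes

def fit_power_law_exponent(degree_counts):
    """Rough estimate of gamma in P(k) ~ k^(-gamma) from a least-squares line
    through log P(k) vs. log k. degree_counts maps degree k >= 1 to node counts."""
    total = sum(degree_counts.values())
    xs = [math.log(k) for k in degree_counts]
    ys = [math.log(count / total) for count in degree_counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

print(round(average_degree(14044, 20080), 1))  # 2.9 with the node/edge counts of the global network

# Synthetic counts that follow k^(-2.5) exactly (not the ChEMBL network data).
synthetic = {k: 10000 * k ** -2.5 for k in range(1, 21)}
print(round(fit_power_law_exponent(synthetic), 3))  # 2.5
```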

SLIDE 27

Network Modification

• Deletion of all hubs with a degree ≥ 5 (2166 nodes, i.e. 15.4%)

SLIDE 28

Network Modification

• Deletion of all hubs with a degree ≥ 10 (614 nodes, i.e. 4.4%)
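Hub deletion amounts to removing every node whose degree reaches the threshold, together with all of its edges. A sketch on a hypothetical star-like network; `remove_hubs` is an illustrative helper.

```python
from collections import Counter

def remove_hubs(edges, min_hub_degree):
    """Delete all nodes with degree >= min_hub_degree and their incident edges.
    Returns (removed_nodes, remaining_edges)."""
    degree = Counter(node for edge in edges for node in edge)
    hubs = {node for node, d in degree.items() if d >= min_hub_degree}
    remaining = [(a, b) for a, b in edges if a not in hubs and b not in hubs]
    return hubs, remaining

# Hypothetical network: hub "h" forms cliffs with five compounds, plus one extra cliff.
edges = [("h", x) for x in "abcde"] + [("a", "b")]
hubs, remaining = remove_hubs(edges, min_hub_degree=5)
print(sorted(hubs), remaining)  # ['h'] [('a', 'b')]
```

The share of removed nodes is `len(hubs) / n_nodes`; for the global network, 2166 / 14,044 gives the 15.4% quoted for the degree ≥ 5 cut.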

SLIDE 29

Global MMP-Cliff Network

• 2072 clusters
• 769 isolated cliffs
• 19,311 coordinated cliffs in 1303 clusters
• 450 cluster topologies with 1 to 769 instances

Stumpfe D et al. & Bajorath J. J Chem Inf Model 54, 451 (2014)

SLIDE 30

Activity Cliff Cluster Topologies

¡ Topologies with ≥ 3 instances ¡ Identification of 3 recurrent main topologies

[Figure: main topologies: star, chain, rectangle]

SLIDE 31

Activity Cliff Cluster Topologies

¡ Topologies with ≥ 3 instances ¡ Cover 861 of 1303 clusters main topologies

[Figure: main topologies: star, chain, rectangle]
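The three main topologies can be told apart from node and edge counts and the degree sequence of a cluster. The rules below are a simplified sketch written for illustration and are not the exact criteria used in the study. A star is taken as one center linked to all other nodes and nothing else, a chain as a simple path, and a rectangle as a 4-cycle. The input is assumed to be one connected cluster, and extensions and hybrid topologies fall into "other". A three-node path matches both star and chain; it is reported as a star here.

```python
from collections import Counter

def classify_topology(edges):
    """Assign a connected cluster (list of edges) to a simplified topology class."""
    degree = Counter(node for edge in edges for node in edge)
    n_nodes, n_edges = len(degree), len(edges)
    max_degree = max(degree.values())
    if n_nodes == 2 and n_edges == 1:
        return "isolated cliff"
    if n_edges == n_nodes - 1 and max_degree == n_nodes - 1:
        return "star"       # one center connected to every other node
    if n_edges == n_nodes - 1 and max_degree == 2:
        return "chain"      # connected, acyclic, max degree 2: a simple path
    if n_nodes == 4 and n_edges == 4 and set(degree.values()) == {2}:
        return "rectangle"  # 4-cycle
    return "other"

print(classify_topology([("hub", x) for x in "abcd"]))                        # star
print(classify_topology([("a", "b"), ("b", "c"), ("c", "d")]))                # chain
print(classify_topology([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]))    # rectangle
```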

SLIDE 32

Main Topologies and Extensions

Main topology   Extension of main topology
Star            Twin Star
Chain           Modified Chain
Rectangle       Modified Rectangle

SLIDE 33

Main Topologies and Extensions

[Figure: main topologies (star, chain, rectangle) with their extensions, hybrid topologies, and irregular topologies]

SLIDE 34

Star Topology Example

• Adenosine A3 receptor ligands

[Figure: star-topology cluster; compound potencies 8.3, 5.5, 5.6, 6.2, 4.8, and 5.1 pKi]

SLIDE 35

Star Topology Example

• Adenosine A3 receptor ligands

[Figure: star-topology cluster; compound potencies 6.9, 6.7, 6.2, 9.1, and 6.7 pKi]

SLIDE 36

Rectangle Topology Example

• Adenosine A2b receptor ligands

[Figure: rectangle-topology cluster; compound potencies 8.0, 6.0, 5.4, and 8.3 pKi]

SLIDE 37
  • 4. Can We Predict Activity Cliffs?

• Support vector machines for prediction of activity cliffs in compound data sets
• Non-trivial problem: compound pairs (with different potency) need to be predicted
• Design of compound pair-based kernel functions

SLIDE 38

Support Vector Machines (SVMs)

• Derivation of a separating hyperplane in chemical space between positive and negative training compounds
• If no linear separation is possible, the data are projected into a higher-dimensional space through the use of kernel functions

[Figure: hyperplane separating the negative class from the positive class]

SLIDE 39

Support Vector Machines (SVMs)

• Binary classification of test compounds depending on the side of the hyperplane on which they fall
• Ranking of test compounds based on their (positive or negative) distance from the hyperplane

[Figure: hyperplane separating the negative class from the positive class]
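Classification and ranking both use the decision function f(x) = w·x + b: its sign gives the side of the hyperplane, and f(x) / ||w|| gives the signed distance used for ranking. The sketch assumes a linear SVM whose weight vector `w` and bias `b` are already known (the values are made up). With a nonlinear kernel the decision function becomes f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b over the support vectors, which is not shown here.

```python
import math

def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return score / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical trained linear model and test data points.
w, b = [2.0, -1.0], -0.5
test_points = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [2.0, 1.0]}

# Rank from most confidently positive to most confidently negative.
ranked = sorted(test_points, key=lambda k: signed_distance(test_points[k], w, b), reverse=True)
for name in ranked:
    d = signed_distance(test_points[name], w, b)
    print(name, "positive" if d > 0 else "negative", round(d, 3))
# t3 positive 1.118 / t1 positive 0.671 / t2 negative -0.671
```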

SLIDE 40

SVMs in Compound Pair Space

[Figure: MMPs in compound pair space; negative class: non-cliff MMP (pKi = 7.5 and 8.1); positive class: activity cliff MMP (pKi = 6.3 and 8.9)]

Heikamp K et al. & Bajorath J. J Chem Inf Model 52, 2354 (2012)

• Data points are compound pairs (MMPs)
• Negative class: MMPs not forming activity cliffs
• Positive class: MMPs forming activity cliffs
• Reference space: compound pair space

SLIDE 41

Design of a Transformation Kernel

Design principle:
  • Encode activity cliff transformations and compare them with transformations from non-cliffs

[Figure: MMP 1 (activity cliff; pKi = 6.3 and 8.9) and MMP 2 (non-cliff; pKi = 7.5 and 8.1)]

SLIDE 42

Transformation Kernel

Step 1: Fingerprint representation of transformation substructures (fingerprints: structural keys or atom pairs)

[Figure: transformation substructures Trans 1 and Trans 2 of MMP 1 and MMP 2 in transformation space]

SLIDE 43

Kernel for Compound Pairs

[Figure: MMP 1 (pKi = 6.3 and 8.9) and MMP 2 (pKi = 7.5 and 8.1), each divided into a core (Core 1, Core 2) and a transformation (Trans 1, Trans 2)]

SLIDE 44

Transformation Kernel

Step 2: Substructure difference vector (size 2n) from transformation mini-fingerprints (size n); each pair of transformations yields feature vectors u and v
• The substructure difference vector contains all features that distinguish the transformation substructures
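One way to read Step 2 in code: each transformation is a pair of substructure fingerprints of length n. The 2n-dimensional difference vector keeps, in its first half, the features present only in the first substructure and, in its second half, the features present only in the second. This layout, and the choice of which substructure counts as "first", are assumptions made for illustration. The step itself states only that the vector has size 2n and contains all distinguishing features.

```python
def difference_vector(fp_first, fp_second):
    """Build a 2n-dimensional substructure difference vector from two binary
    fingerprints of length n (lists of 0/1). The layout is an illustrative assumption."""
    if len(fp_first) != len(fp_second):
        raise ValueError("fingerprints must have the same length")
    only_first = [int(a and not b) for a, b in zip(fp_first, fp_second)]
    only_second = [int(b and not a) for a, b in zip(fp_first, fp_second)]
    return only_first + only_second

# Hypothetical 6-bit mini-fingerprints of the two exchanged substructures.
trans_a = [1, 0, 1, 1, 0, 0]
trans_b = [1, 1, 0, 1, 0, 1]
print(difference_vector(trans_a, trans_b))  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```

Features shared by both substructures (bits 0 and 3 here) drop out, so only the distinguishing features remain.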

SLIDE 45

Transformation Kernel

Step 3: Kernel from substructure difference feature vectors (Tanimoto kernel calculated from the feature vectors in transformation space)
K_transformation(u, v) = K(u, v)
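For feature vectors u and v, the Tanimoto kernel is K(u, v) = ⟨u, v⟩ / (⟨u, u⟩ + ⟨v, v⟩ − ⟨u, v⟩), which reduces to the Tanimoto coefficient for binary vectors. A minimal sketch on hypothetical difference vectors. Mapping a pair of all-zero vectors to 0.0 is a convention chosen here to avoid division by zero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tanimoto_kernel(u, v):
    """Tanimoto kernel K(u, v) = <u,v> / (<u,u> + <v,v> - <u,v>).
    Two all-zero vectors give 0.0 by convention (an assumption)."""
    uv = dot(u, v)
    denom = dot(u, u) + dot(v, v) - uv
    return uv / denom if denom else 0.0

u = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
v = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(tanimoto_kernel(u, v))  # 0.5, i.e. 2 / (3 + 3 - 2)
```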

SLIDE 46


Design of an MMP Kernel

Design principle:

  • combine core structure and transformation information
  • add core structure representation to substructure difference vectors

[Figure: activity cliff and non-cliff MMPs]

SLIDE 47

MMP Kernel

• Core space: fingerprint representation of the cores (structural fragments or atom environments); Core 1 → c1, Core 2 → c2
• Transformation space: K_transformation(u, v) = K(u, v)

SLIDE 48

MMP Kernel

Combining core and transformation feature vectors yields a kernel product

SLIDE 49

MMP Kernel

K_MMP = K_core(c1, c2) × K_transformation(t1, t2)
K_MMP = K_Tanimoto(c1, c2) × K_Tanimoto(t1, t2)
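Because the product of two valid kernels is again a valid kernel, the MMP kernel can be computed by multiplying a core-space Tanimoto kernel and a transformation-space Tanimoto kernel. The sketch represents each MMP as a hypothetical (core vector, transformation vector) tuple; the vectors and the `mmp_kernel` name are made up for illustration.

```python
def tanimoto_kernel(u, v):
    """Tanimoto kernel on feature vectors (0.0 if both vectors are all zeros)."""
    uv = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - uv
    return uv / denom if denom else 0.0

def mmp_kernel(mmp1, mmp2):
    """K_MMP = K_Tanimoto(c1, c2) x K_Tanimoto(t1, t2) for MMPs given as
    (core vector, transformation vector) tuples."""
    (c1, t1), (c2, t2) = mmp1, mmp2
    return tanimoto_kernel(c1, c2) * tanimoto_kernel(t1, t2)

mmp1 = ([1, 1, 0, 1], [0, 1, 1, 0, 0, 1])
mmp2 = ([1, 0, 0, 1], [0, 1, 0, 0, 0, 1])
print(round(mmp_kernel(mmp1, mmp2), 4))  # 0.4444, i.e. (2/3) * (2/3)
```

To train an SVM with such a function, one option is scikit-learn's `SVC(kernel="precomputed")`, which is fitted on the n_train × n_train Gram matrix and predicts from an n_test × n_train kernel matrix.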

SLIDE 50

Accurate Prediction of Activity Cliffs

                 Transformation kernel             MMP kernel
Target   TPR     TNR      P        F-score   TPR     TNR      P        F-score
*fxa     72.69   91.79    50.45    59.51     82.17   96.11    70.92    76.03
*mcr4    78.15   96.72    53.02    63.01     83.05   99.06    80.82    81.82
*kor     66.87   88.92    35.44    46.26     72.58   96.87    67.88    70.04
*thr     81.15   90.23    58.99    68.29     84.05   95.43    76.20    79.85
*aa3     71.95   88.46    38.63    50.19     74.45   97.20    73.23    73.57
cal2     97.44   95.40    92.14    94.65     97.69   97.63    95.79    96.70
catb     88.33   97.56    91.10    89.39     90.83   98.67    95.43    92.76
dpp8     99.29   100.00   100.00   99.63     99.29   100.00   100.00   99.63
jak2     92.73   88.42    82.82    87.30     91.82   90.53    85.55    88.28

Parameters:
• MACCS for transformation substructure representation
• Molprint2D for core structure representation
• Tanimoto/transformation kernel, Tanimoto/MMP kernel

TPR: true positive rate; TNR: true negative rate; P: precision; F-score = 2 · TPR · P / (TPR + P)

* Unbalanced composition: ratio of non-cliffs to cliffs between 6 and 21
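The performance measures follow from the confusion matrix of predicted vs. actual cliffs, with the F-score computed as defined above (the harmonic mean of TPR and precision). A sketch with hypothetical counts for an unbalanced test set; no guard against empty classes is included.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return TPR, TNR, precision and F-score (in %) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # true positive rate (recall)
    tnr = tn / (tn + fp)        # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * tpr * precision / (tpr + precision)
    return {name: round(100 * value, 2) for name, value in
            [("TPR", tpr), ("TNR", tnr), ("P", precision), ("F-score", f_score)]}

# Hypothetical counts: 50 cliffs and 500 non-cliffs (ratio 10).
print(classification_metrics(tp=40, fp=20, tn=480, fn=10))
# {'TPR': 80.0, 'TNR': 96.0, 'P': 66.67, 'F-score': 72.73}
```

With many more non-cliffs than cliffs, even a small false positive rate lowers precision noticeably, which is why P and F-score are reported alongside TPR and TNR.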

SLIDE 51

Accurate Prediction of Activity Cliffs

Results:
• Both methods can accurately predict activity cliffs in different data sets
• Prediction accuracy is further improved when core structure information is added to transformation information (MMP kernel)

Heikamp K et al. & Bajorath J. J Chem Inf Model 52, 2354 (2012)

SLIDE 52

Activity Cliff Transformations

• Identification of characteristic cliff transformations leading to highly potent compounds

[Figure: calpain 2 inhibitors; important structural patterns in highly potent and weakly potent compounds]

SLIDE 53

Activity Cliff Summary

  • Similarity / potency difference criteria are critical
  • Cliffs can be represented in different ways
  • Preference for MMP-cliffs
  • Bioactive compounds frequently form activity cliffs
  • Similar distribution over different targets
  • Most cliffs are formed in a coordinated manner
  • Global activity cliff network: scale-free
  • Activity cliff clusters with recurrent topology
  • Prediction of activity cliffs via SVM / MMP kernels