Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX - - PowerPoint PPT Presentation
Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX - - PowerPoint PPT Presentation
Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX Ecole Polytechnique, France MFD 2014, Campinas [L., Lavor: Introduction to Euclidean Distance Geometry , in preparation] Table of contents 1. Applications 2. Definition 3.
Table of contents
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
2
Applications
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
3
Clock Synchronization
5 7 3 4 Atomic clock (S) 16:27 Chuck Alice Bob
↓
16:20 16:22 16:24 16:26 16:28 16:30
A C S B
16:21 16:23 16:25 16:27 16:29 16:31
[Singer, 2011]
4
Sensor network localization
29
0 / 2.5078136
1 / 1.9790651
2 / 1.1364952
3 / 2.9295580
4 / 2.554891
5 / 2.4284 175 / 2.49225 177 / 1.427687
176 / 2.73147 211 / 0.989917 214 / 2.3178649
210 / 2.8329184
213 / 3.0201866
212 / 1.56417 268 / 1.76662 267 / 2.31042 271 / 1.7541769
269 / 2.062273
270 / 3.1047985
272 / 1.15897 346 / 1.500561 18
6 / 1.9012637
7 / 2.6135241
8 / 1.660445
9 / 1.9291357
10 / 2.6640758
11 / 0.65886183
12 / 1.32064 110 / 1.29792 111 / 0.840347 113 / 2.14071 114 / 2.0357647
112 / 2.5425 215 / 2.12848 217 / 3.0822 218 / 1.98508 216 / 2.70003 229 / 2.97748 231 / 1.63275 232 / 2.32966 230 / 2.92252 249 / 1.49322 250 / 1.4467 251 / 3.11231 289 / 2.505681
290 / 2.54013 291 / 1.979312 5
13 / 1.6222714
14 / 2.0101517
15 / 2.6842220
16 / 3.0587223
17 / 2.4833242
18 / 1.0575664
19 / 2.2135667
20 / 2.4291288
21 / 2.3414693
22 / 1.0662 28 / 2.06104 29 / 1.80061 31 / 1.46243 33 / 1.38178 35 / 1.0118633
30 / 3.1539265
32 / 2.1972670
34 / 3.17108 82 / 2.05285 83 / 1.9929 84 / 2.90088 85 / 0.965501 87 / 1.40579 88 / 2.51699 86 / 3.1900398
89 / 2.50718 106 / 1.36517 107 / 2.2605 109 / 2.8833655
108 / 2.81041 119 / 2.25758 123 / 0.666733 126 / 2.79445 122 / 1.26894 124 / 2.4502971
125 / 2.61515 127 / 2.552356
120 / 2.4616863
121 / 3.15779 143 / 2.62768 145 / 3.01294 146 / 1.6287189
147 / 3.16128 144 / 2.60777 233 / 1.59139 234 / 3.07459 235 / 1.62072 316 / 1.57213 318 / 2.5408492
317 / 3.16387 326 / 2.30061 325 / 2.97621 327 / 2.62621 350 / 3.189273
23 / 0.844888 26 / 2.43216 24 / 1.43953 25 / 2.52302 27 / 1.2847 329 / 2.97101 331 / 1.47226 330 / 2.8153379
339 / 3.17026 195 / 3.02256 196 / 2.72116 194 / 3.1643153
193 / 0.88823 319 / 1.78484 322 / 3.15941 320 / 1.19371 321 / 2.59083 332 / 2.8817177
333 / 2.54276
36 / 2.65816 37 / 2.84506 39 / 3.0091339
38 / 0.977096 40 / 0.14758390
41 / 2.54617 223 / 1.11811 225 / 1.7429172
224 / 2.873 257 / 2.650417 10
42 / 1.6771116
43 / 1.2809230
44 / 2.6039944
45 / 1.93798 46 / 0.9094678
47 / 1.9520486
48 / 3.170297
49 / 2.91638 50 / 2.79803 59 / 2.64882 61 / 3.11897 63 / 2.57151 64 / 0.63344 65 / 1.4705915
58 / 2.5359621
60 / 2.7287459
62 / 2.44092 98 / 2.45974 102 / 2.89582 101 / 3.00823 100 / 2.72757 103 / 1.23805 104 / 2.65248 105 / 1.5614924
99 / 2.76855 182 / 2.80251 183 / 2.17394 178 / 0.684621 184 / 2.04844 186 / 2.36739 179 / 1.75353 181 / 1.2755962
180 / 2.82612 185 / 2.02673 244 / 2.90515 245 / 2.54461246 / 1.4908 248 / 2.19914 242 / 2.10075 243 / 1.81286 247 / 2.70375 334 / 2.85895 335 / 2.74921 345 / 0.96983896
344 / 3.015178
52 / 1.59789 53 / 3.110913
51 / 2.96353 54 / 2.606294
55 / 2.44249 81 / 2.4437231
80 / 3.09041 352 / 3.1736395
353 / 1.696069
56 / 2.23711 57 / 2.87413 95 / 2.43097 97 / 1.79475 90 / 0.575857 91 / 1.8488260
92 / 3.1144161
93 / 0.74585374
94 / 1.08674 96 / 1.91462 132 / 2.48921 134 / 1.67584 128 / 2.41661 129 / 2.54924 130 / 0.857403 131 / 1.03428 133 / 1.33929 295 / 2.78515 296 / 3.03355 297 / 2.79379 292 / 2.23004 293 / 2.5905976
294 / 2.2505211 22
66 / 2.5433525
67 / 1.5903926
68 / 2.5564435
69 / 1.2218343
70 / 1.2632346
71 / 0.657841 72 / 1.45548 73 / 1.45696 74 / 2.4359799
75 / 0.710271 135 / 1.03939 137 / 2.64809 138 / 2.36917 139 / 2.85603 141 / 1.99344 142 / 0.63212928
136 / 3.03397 140 / 1.63894 150 / 2.05658 151 / 2.68427 152 / 1.33314 153 / 1.85365 155 / 1.0403 156 / 1.22973 157 / 2.29037 154 / 2.67744 158 / 3.05556 159 / 2.34031 160 / 1.82595 162 / 2.70381 163 / 2.43938 161 / 1.90326 203 / 0.991465 204 / 1.81584 206 / 2.66247 207 / 2.45539 208 / 2.2551 209 / 1.246648
205 / 3.06568 236 / 1.90515 238 / 2.5944 239 / 2.72018 240 / 3.16484 241 / 0.720351 237 / 2.87679 252 / 0.86256 254 / 0.843728 255 / 2.43018 256 / 1.26966 253 / 2.83326 262 / 0.880856 263 / 3.06774 264 / 1.88647 261 / 1.9719 348 / 2.26962 349 / 2.1111 351 / 3.0276612
78 / 2.35652 76 / 0.500095 77 / 3.13454 79 / 2.02458 173 / 2.85379 172 / 2.85207 174 / 2.12157 281 / 1.77315 280 / 2.13908 187 / 0.741716 188 / 1.28288 304 / 2.67349 298 / 3.19922 301 / 3.10155 303 / 1.246468
299 / 2.68553 300 / 2.9330782
302 / 2.70564 306 / 3.166 308 / 2.46248 305 / 0.378469 307 / 1.95331 341 / 2.70349 340 / 1.87485 354 / 2.04535 149 / 1.7129232
148 / 2.6343819
117 / 2.4561 118 / 1.55824 115 / 1.4090450
116 / 0.648045 259 / 2.59804 260 / 2.60371 258 / 2.04883 265 / 2.53475 266 / 1.20584 285 / 3.04711 284 / 1.28966 286 / 0.444549 287 / 2.49879 283 / 0.739388 282 / 2.4463 288 / 2.40494 312 / 2.0258 313 / 1.03139 314 / 2.65429 315 / 1.7063834
189 / 2.9280638
190 / 2.0733940
191 / 2.80145 192 / 2.35256 324 / 2.63064 323 / 2.5872927
171 / 2.23482 167 / 2.38609 164 / 1.92141 165 / 2.6582 166 / 2.28031 168 / 2.41683 169 / 0.955716 170 / 0.967808 199 / 2.18351 197 / 0.88776 198 / 1.51159 200 / 0.583974 201 / 2.87535 202 / 1.31584 220 / 2.98526 219 / 1.26904 221 / 0.308754 222 / 1.8407 226 / 1.33417227 / 3.09905 228 / 1.31378 328 / 1.65874 338 / 1.28689 337 / 1.8367575
336 / 3.11488 347 / 3.11676 310 / 2.37803 309 / 1.9722 311 / 1.13749 276 / 3.05065 277 / 2.40579 273 / 2.81885 275 / 3.02541 274 / 2.4944454
278 / 3.02172 279 / 1.17834 343 / 1.31891 342 / 2.50522− → [Yemini, 1978]
5
Protein conformation from NMR data
1 2 3 4 5 9 6 8 10 7 11
− → [Crippen & Havel 1988]
6
Clock synchronization: solutions
|xA − xB| = 7 |xA − xC| = 3 |xA − xS| = 5 |xB − xC| = 4 xS = 16:27.
5 7 3 4
xA xB xC xS
xS = 16:27 xA = 16:22 xB=16:15 xC=16:19 xB=16:15 xC=16:25 xB=16:29 xC=16:19 xB=16:29 xC=16:25 xA = 16:32 xB=16:25 xC=16:29 xB=16:25 xC=16:35 xB=16:39 xC=16:29 xB=16:39 xC=16:35
7
Definition
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
8
Distance Geometry Problem (DGP)
Given: Determine whether ∃: – a simple graph G = (V, E) – an edge function d : E → R≥0 – an integer K ∈ N a realization x : V → RK s.t. ∀{u, v} ∈ E xu − xv2 = duv 1 2 3 4 1 √3 1 1 1 1 1 2 3 4
Let n = |V |
9
More applications
- Autonomous underwater vehicles [Bahr et al. 2009]
- Statics of rigid structures [Maxwell 1864]
- Matrix completion [Laurent 2009]
- Statistics [Boer 2013]
- Psychology [Kruskal 1964]
[Liberti et al., SIREV 2014]
10
Complexity primer
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
11
Definitions
- Decision problem: mathematical YES/NO-type question depending
- n a parameter vector π
- Instance: same as above with π replaced by given values v
- Certificate: proof that a given answer is true
- P: all decision problems solvable in at most p(|π|) steps
where p is a polynomial
- NP: all decision problems with |YES certificate| ≤ p(|π|)
where p is a polynomial
12
Reductions
- P, Q: decision problems
- If ∃ algorithm A which:
- 1. reformulates instances ¯
P of P into instances ¯ Q of Q
- 2. has answer( ¯
P) = YES iff answer(A( ¯ Q)) = YES
- 3. is polytime in the instance size | ¯
P| then A is a reduction of P to Q
13
NP-hardness
- Q is NP-hard if every problem in NP reduces to Q
- Q is NP-complete if it is NP-hard and is in NP
Why does it work? any P in NP
polytime reduction
− − − − − − − − − − − − − − − − − − → Q: how hard?
- Suppose Q easier than P
- Solve P by reducing to Q in polytime and then solve Q
- Then P as easy as Q, against assumption
- ⇒ Q at least as hard as P
So if Q is NP-hard it is as hard as any problem in NP
⇒ Q is as hard as the hardest problem in NP
14
NP-hardness proofs
Given a new problem Q, take any known NP-hard problem P and reduce it to Q Why does it work? P: NP-hard
polytime reduction
− − − − − − − − − − − − − − − − − − → Q: how hard?
- As before: Suppose . . . (etc.) ⇒ Q at least as hard as P
- Since P is NP-hard, it is hardest in NP, and so is Q
⇒ Q is NP-hard
15
Complexity of the DGP
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
16
DGP ∈ NP?
- NP: YES/NO problems with polytime-checkable proofs for YES
- DGP is a YES/NO problem
- DGP1 ∈ NP, since duv = |xu − xv| ⇒ (d ∈ Q → x ∈ Q)
- Solutions might involve irrational numbers when K > 1
- Some empirical evidence that DGP ∈ NP [Beeker et al. 2013]
17
The DGP is NP-hard
Partition Given a = (a1, . . . , an) ∈ Nn, ∃ I ⊆ {1, . . . , n} s.t.
i∈I
ai =
i∈I
ai ?
- Reduce (NP-hard) Partition to DGP1
- a −
→ cycle C with V (C) = {1, . . . , n}, E(C) = {{1, 2}, . . . , {n, 1}}
- For i < n let di,i+1 = ai, and dn,n+1 = dn1 = an
- E.g. for a = (1, 4, 1, 3, 3), get cycle graph:
1 2 3 4 5 1 4 1 3 3
[Saxe, 1979]
18
Partition is YES ⇒ DGP1 is YES
- Given: I ⊂ {1, . . . , n} s.t.
i∈I
ai =
i∈I
ai
- Construct: realization x of C in R
- 1. x1 = 0
// start
- 2. induction step: suppose xi known
if i ∈ I let xi+1 = xi + di,i+1
// go right
else xi+1 = xi − di,i+1
// go left
- Correctness proof: by the same induction
but careful when i = n: have to show xn+1 = x1
19
I = {1, 2, 3}
1 2 3 4 5 6 x1
1 2 3 4 5 6 x1 x2 1 ∈ I
1 2 3 4 5 6 x1 x2 x3 1 ∈ I 2 ∈ I
1 2 3 4 5 6 x1 x2 x3 x4 1 ∈ I 2 ∈ I 3 ∈ I
1 2 3 4 5 6 x1 x2 x3 x4 x5 1 ∈ I 2 ∈ I 3 ∈ I 4 ∈ I
1 2 3 4 5 6 x6 = x1? x2 x3 x4 x5 1 ∈ I 2 ∈ I 3 ∈ I 4 ∈ I 5 ∈ I
Partition is YES ⇒ DGP1 is YES
(1) =
- i∈I
(xi+1 − xi) =
- i∈I
di,i+1 = =
- i∈I
ai =
- i∈I
ai = =
- i∈I
di,i+1 =
- i∈I
(xi − xi+1) = (2) (1) = (2) ⇒
- i∈I
(xi+1 − xi) =
- i∈I
(xi − xi+1) ⇒
- i≤n
(xi+1 − xi) = ⇒ (xn+1 − xn) + (xn − xn−1) + · · · + (x3 − x2) + (x2 − x1) = ⇒ xn+1 = x1
26
Partition is NO ⇒ DGP1 is NO
- By contradiction: suppose DGP1 is YES, x realization of C
- F = {{u, v} ∈ E(C) | xu ≤ xv}, E(C)F = {{u, v} ∈ E(C) | xu > xv}
- Trace x1, . . . , xn: follow edges in F (→) and in E(C) F (←)
1 2 3
- 1
- 2
- 3
x1 x2 x3 x4 x5
- {u,v}∈F
(xv − xu) =
- {u,v}∈F
(xu − xv)
- {u,v}∈F
|xu − xv| =
- {u,v}∈F
|xu − xv|
- {u,v}∈F
duv =
- {u,v}∈F
duv
- Let J = {i < n | {i, i + 1} ∈ F} ∪ {n | {n, 1} ∈ F}
⇒
- i∈J
ai =
- i∈J
ai
- So J solves Partition instance, contradiction
- ⇒ DGP is NP-hard, DGP1 is NP-complete
27
Number of solutions
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
28
With congruences
- (G, K): DGP instance
- ˜
X ⊆ RKn: set of solutions
- Congruence: composition of translations, rotations, reflections
- C = set of congruences in RK
- x ∼ y means ∃ρ ∈ C (y = ρx):
distances in x are preserved in y through ρ
- ⇒ if | ˜
X| > 0, | ˜ X| = 2ℵ0
29
Modulo congruences
- Congruence is an equivalence relation ∼ on ˜
X (reflexive, symmetric, transitive) ∼
- Partitions ˜
X into equivalence classes
- X = ˜
X/∼: sets of representatives of equivalence classes
- Focus on |X| rather than | ˜
X|
30
Cardinality of X
- infeasible ⇔ |X| = 0
- rigid graph ⇔ |X| < ℵ0
- globally rigid graph ⇔ |X| = 1
- flexible graph ⇔ |X| = 2ℵ0
- |X| = ℵ0: impossible by Milnor’s theorem
31
Milnor’s theorem implies |X| = ℵ0
- System S of polynomial equations of degree d
∀i ≤ m pi(x1, . . . , xnK) = 0
- Let X be the set of x ∈ RnK satisfying S
- Number of connected components of X is ≤ d(2d − 1)nK−1
[Milnor 1964]
- If |X| is countable then G cannot be flexible
⇒ incongruent elements of X are separate connected components ⇒ by Milnor’s theorem, there’s finitely many of them
32
Examples
V 1 = {1, 2, 3} E1 = {{u, v} | u < v} d1 = 1
x1 x2 x3
ρ congruence in R2 ⇒ ρx valid realization |X| = 1 V 2 = V 1 ∪ {4} E2 = E1 ∪ {{1, 4}, {2, 4}} d2 = 1 ∧ d14 = √ 2
x1 x2 x3 x4
ρ reflects x4 wrt x1, x2 ⇒ ρx valid realization |X| = 2 ( , ) V 3 = V 2 E3 = {{u, u + 1}|u ≤ 3} ∪ {1, 4} d1 = 1
x1 x2 x3 x4
ρ rotates x2x3, x1x4 by θ ⇒ ρx valid realization |X| is uncountable ( , , , , . . .)
33
Mathematical optimization formulations
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
34
System of quadratic constraints ∀{u, v} ∈ E xu − xv2 = d2
uv
- Around 10 vertices
- Computationally useless
35
Quadratic objective min
x∈RnK
- {u,v}∈E
(xu − xv2 − d2
uv)2
- Globally optimal value zero iff x is a realization of G
- sBB: 10-100 vertices, exact solutions
- heuristics: 100-1000 vertices, poor quality
[Lavor et al., 2006]
36
Convexity and concavity max
x∈RnK
- {u,v}∈E
xu − xv2 ∀{u, v} ∈ E xu − xv2 ≤ d2
uv
- Convex constraints, concave objective
- Computationally no better than “quadratic objective”
37
Pointwise reformulation max
x∈RnK
- {u,v}∈E,k≤K
θuvk(xuk − xvk) ∀{u, v} ∈ E xu − xv2 ≤ d2
uv
- Convex subproblem in stochastic iterative heuristics
“guess θ and solve”
- 100-1000 vertices, good quality
[L. IOS14/MAGO14(slides)]
38
SDP formulation min
X0
- {u,v}∈E
(Xuu + Xvv − 2Xuv) ∀{u, v} ∈ E Xuu + Xvv − 2Xuv ≥ d2
uv
- Similar to those of Ye, Wolkowicz — works better for proteins
- 100 vertices, good quality
[D’Ambrosio et al., in progress]
39
Realizing complete graphs
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
40
Cliques
2-clique 1 2 3-clique 1 2 3 4-clique 1 2 3 4 (K + 1)-clique = K-clique ⊕ a vertex
41
Triangulation
1 2 3
1 1 2
− → 1 2 3
1 2
Example: realize triangle on a line
- From x3 − x1 = 2 and x3 − x2 = 1 get
x2
3 − 2x1x3 + x2 1
= 4 (1) x2
3 − 2x2x3 + x2 2
= 1. (2)
- (??) − (??) yields
2x3(x1 − x2) = x2
1 − x2 2 − 3
⇒ 2x3 = 4,
- Hence x3 = 2
42
Realizing a (K + 1)-clique in RK−1
- Apply triangulation inductively on K
assume x1, . . . , xK ∈ RK−1 known, compute y = xK+1
- K quadratic eqns (∀j ≤ K y − xj2 = d2
j,K+1) in K − 1 vars
y2 − 2x1 · y + x12 = d2
1,K+1
[1] . . . . . . y2 − 2xK · y + xK2 = d2
K,K+1
[K]
- Form system ∀j ≤ K ([j] − [K])
2(x1 − xK) · y = x12 − xK2 − d2
1,K+1 + d2 K,K+1
[1] − [K] . . . . . . 2(xK−1 − xK) · y = xK−12 − xK2 − d2
K−1,K+1 + d2 K,K+1
[K − 1] − [K]
- This is a (K − 1) × (K − 1) linear system Ay = b
Solve to find y [Dong, Wu 2002]
43
“Solve”?
- 1. What if A is singular?
- 2. Or: A nonsingular but instance is NO
44
Singularity: rkA = K − 2
One row xj − xK of A depends on the others
K = 2 triangle in R1 x1 − x2 = 0
x1 =x2 x3? x3?
K = 3 4-clique in R2 x1, x2, x3 on a line
x1 x2 x3 x4? x4?
K = 4 5-clique in R3 x1, . . . , x4 in a plane
x1 x2 x3 x4 x5? x5?
Trend continues: rk A = K − 2 ⇒ |X| = 2 (see later)
45
Singularity: rkA = K − 3
Two rows xj − xk depend on the others
K = 3 4-clique in R2 x1 = x2 = x3
x1=x2=x3 x4?
K = 4 5-clique in R3 x1, . . . , x4 on a line
x1 x2 x3 x4 x5?
Trend continues: [Hendrickson, 1992]
- Thm. 5.8. If a graph G is connected, flexible and has more than K vertices, |X|
contains almost always a submanifold diffeomorphic to a circle
46
Hendrickson’s theorem also applies to non-cliques
47
Nonsingular matrix A with NO instance
- Infeasible quadratic system ∀j ≤ K + 1 xj − xK2 = djK
- Take differences, get nonsingular A and value for xK
- . . . but it’s wrong!
Shit happens! Every time you solve the linear system Ay = b check feasibility with quadratic system
48
Algorithm for realizing complete graphs in RK
- Assume:
(i) G = (V, E) complete (ii) |V | = n ≥ K + 2 (iii) we know x1, . . . , xK+1
- Increase K: we know how to realize xK+2 in RK
- Use this inductively for each i ∈ {K + 2, . . . , n}
49
Algorithm for realizing complete graphs in RK
// realize next vertex iteratively for i ∈ {K + 2, . . . , n} do // use (K + 1) immediate adjacent predecessors to compute xi if rkA = K then xi = A−1b // A, b defined as above else xi = ∞ // A singular, mark ∞ and exit break end if // check that xi is feasible w.r.t. other distances for {j ∈ N(i) | j < i} do if xi − xj = dij then // if not, mark infeasible and exit loop ∗ xi = ∅ break end if end for if xi = ∅ then break end if end for return x
∗ the “ignore trouble” policy, a.k.a. “ignore probability zero events”
50
Complexity of Alg. 1
- Outer loop: O(n)
- Rank and inverse of A: O(K3)
- Inner loop: O(n)
- Get O(n2K3)
- But in most applications K is fixed
- Get O(n2)
But how do we find the realization of the first K + 1 vertices?
51
Realizing (K + 1)-cliques in RK
- Realizing (K + 1)-cliques in RK−1 yields “flat simplices”
(e.g. triangles on lines)
- Use “natural” embedding dimension RK
- Same reasoning as above:
get system Ay = b where y = xK+1 and Aj = 2(xj − xK)
- But now A is (K − 1) × K
- Same as previous case with A singular
52
Almost square
How can you solve the following system Ay = b:
a11 a12 . . . a1K . . . . . . ... . . . aK−1,1 aK−1,2 . . . aK−1,K
y1 . . . yK−1
=
b1 . . . bK−1
where A has one more columns than rows and rank K − 1?
53
Basics and nonbasics
- Since rk A = K − 1, ∃ K − 1 linearly independent columns
- B: set of their indices
- N : index of remaining columns
- B: (K − 1) × (K − 1) square matrix of columns in B
- ⇒ B is nonsingular
- Can partition columns as A = (B|N)
Column j corresponds to variable yj
- Variables yB are called basic variables
- Variable yN is called nonbasic variable
54
The dictionary
(B|N)y = b ⇒ ByB + NyN = b ⇒ yB = B−1b − B−1NyN Basics expressed in function of nonbasic
55
One quadratic equation
- From value of yN, can use dictionary to get y
- Use one quadratic equation
- 1. Pick any h ∈ {1, . . . , K − 1}, equation is xh − y2
2 = d2 hK
- 2. y = (yB|yN)⊤
- 3. Replace yB with B−1b − B−1NyN in equation
- 4. Solve resulting quadratic equation in one variable yN
- 5. Get 0,1 or 2 values for yN
- 6. ⇒ Get 0,1 or 2 positions for xK+1
56
What if B−1N is zero?
- yB = B−1b − B−1NyN reduces to yB = B−1b
- Use one quadratic equation
- 1. Pick any h ∈ {1, . . . , K − 1}, equation is xh − y2
2 = d2 hK
- 2. y = (yB|yN)⊤
- 3. Replace yB with B−1b in equation
- 4. Solve resulting quadratic equation in one variable yN
- 5. Get 0,1 or 2 values for yN
- 6. ⇒ Get 0,1 or 2 positions for xK+1
57
The difference
- B−1N = 0: yN
dictionary
− − − − − − − → yB
- Different values y+
N = y− N −
→ y+, y− with different components
- B−1N = 0: yB
quadratic eqn.
− − − − − − − − − − → yN
- Even if y+
N = y− N, K − 1 components of y+, y− are equal
aff(x1, . . . , xK−1) = {y ∈ RK | yN = 0}
58
The case of no solutions
- No realizations exist for this (K + 1)-clique in RK
- DGP instance is NO
59
The case of one solution
- Assume for simplicity: N = K, h = 1, B−1N = 0
Then xh − y2 = d2
h,K+1 becomes: λy2
K − 2µyK + ν
= 0, where λ =
- ℓ,j<K
1 + β2
ℓja2 jK
µ = x1K +
- ℓ,j<K
βℓjajK(βℓjbℓ − x1ℓ) ν =
- ℓ,j<K
βℓjbℓ(βℓjbℓ − 2x1ℓ) + x12 − d2
1,K+1
- (Exactly one solution for yK) ⇔ µ2 = λν, not a tautology
- The set of all (K + 1)-clique DGP instances in RK s.t. µ2 = λν
has Lebesgue measure 0
- Ignore them, they happen with probability∗ 0!
∗ Assuming continuous distributions over the reals. For floating point number, who knows? . . .
but we’ll ignore these instances anyhow 60
Discriminant > 0, = 0
4 4 4 1 3 1 2 3 1 2 3 2
61
The case of two solutions
- K spheres SK−1
1
, . . . , SK−1
K
in RK
centered at x1, . . . , xK with radii d1,K+1, . . . , dK,K+1
- xK+1 must be at the intersection of SK−1
1
, . . . , SK−1
K
- If
j SK−1 j
= ∅, then |
j SK−1 j
| = 2 in general
- will not mention “probability 0” or “in general” anymore
[Coope 2000]
62
Mirror images
- Let x+ = {x1, . . . , xK, x+
K+1}, x− = {x1, . . . , xK, x− K+1}
assume dim aff(x1, . . . , xK) = K (†)
- Theorem
x+, x− ∈ RK are reflections w.r.t. hyperplane defined by x1, . . . , xK
- Proof
- 1. x+, x− congruent by construction
- 2. ∀i ≤ K xi ∈ x+ ∩ x− → x+, x− not translations
- 3. |x+ ∩ x−| = K < |x+| = |x−| → x+, x− not rotations by (†)
- 4. ⇒ must be reflections
63
Algorithm for realizing (K + 1)-cliques in RK
// realize 1 at the origin x1 = (0, . . . , 0) // realize next vertex iteratively for ℓ ∈ {2, . . . , K + 1} do // at most two positions in Rℓ−1 for vertex ℓ S =
i<ℓ
Sℓ−2
i
if S = ∅ then // warn if infeasible return ∅ end if // arbitrarily choose one of the two points choose any xℓ ∈ S end for // return feasible realization return x
64
Complexity of Alg. 2
- Outer loop: O(K)
- Gaussian elimination on A: O(K3)
- Some messing about to obtain x+
K+1, x− K+1: +O(K2)
- Get O(K4)
- But in most applications K is fixed
- Get O(1)
65
Back to complete graphs
- Alg. 2: realize 1, . . . , K + 1 in RK: O(1)
- Alg. 1: Realize K + 2, . . . , n: O(n2)
- ⇒ O(n2)
- What about |X|?
– Alg. 1 is deterministic: one solution from x1, . . . , xK+1 – Alg. 2 is stochastic: pick one of two values K times
⇒ |X| = 2K
66
K-trilaterative graphs
- In Alg. 1 we only need each v > K + 1 to have K + 1 adjacent
predecessors in order to find a unique solution for xv
- Determination of xv from K+1 adjacent predecessors: K-trilateration
- K-trilaterative graph:
(i) has a vertex order ensuring this property (ii) the initial K + 1 vertices induce a (K + 1)-clique the order is called K-trilateration order
- Alg. 1 realizes all K-trilaterative graphs
The DGP restricted to K-trilaterative graphs in RK is easy [Eren et al. 2004]
67
The story so far
- Lots of nice applications
- DGP is NP-hard
- May have 0, 1, finitely many or 2ℵ0 solutions modulo congruences
- Continuous optimization techniques don’t scale well
- Using K + 1 adjacent predecessors, realize K-trilaterative graphs
in RK in polytime
- Do we need K + 1 adjacent predecessors, or can we do with
less?
68
The Branch-and-Prune algorithm
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
69
Fewer adjacent predecessors
- Alg. 2 only needs K adjacent predecessor
- Extend to n vertices: (K − 1)-trilaterative graphs
- Can we realize (K − 1)-trilaterative graphs in RK?
- A small case: graph consisting of two K + 1 cliques
1 1 1 5 5 5 2 2 2 3 3 3 4 4 4
70
Take a closer look. . .
5 5 2 2 3 3 4 4
- Realization of a K + 1 clique in RK knowing x1, . . . , xK
- We know how to do that!
- Consistent with 2 solutions for x5, reflected across plane through
x2, x3, x4
71
Discretization and pruning edges
- (K − 1)-trilaterative graph G = (V, E):
∀v > K ∃Uv ⊂ V (|Uv| = K ∧ ∀u ∈ Uv(u < v) ∧ {u, v} ∈ E)
- Discretization edges:
ED = {{u, v} ∈ E | u, v ≤ K}
- initial clique
∪ {{u, v} ∈ E | v > K ∧ u ∈ Uv}
- vertex order
- Pruning edges EP = E ED
10 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 18 19
72
Role of discretization edges
Missing discretization edge ⇒ non-rigid structure ⇒ X uncountable Else: X finite
73
Role of pruning edges
No pruning edges: 8 incongruent realizations in R2 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
74
Role of pruning edges
Pruning edge {1, 4}: only 4 realizations remain valid 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
75
Motivation Protein backbones
- Total order < on V
- Covalent bond distances: {u − 1, u} ∈ E
- Covalent bond angles: {u − 2, u} ∈ E
- NMR experiments: {u − 3, u} ∈ E
(and other edges {u, v} with v − u > 3) Generalize “3” to K
[Lavor et al., COAP 2012]
76
KDMDGP graphs
K = 2 K = 3 1 2 3 4 5 Generalization of protein backbone order: v > K is adjacent to K immediate predecessors v − 1, . . . , v − K
KDMDGP: Discretizable Molecular Distance Geometry Problem
77
The Branch-and-Prune (BP) algorithm
BP(v, ¯ x, X):
- 1. Given v > K, realization ¯
x = (x1, . . . , xv−1)
- 2. Compute S =
- u∈Uv
SK−1
u
- 3. For each xv ∈ S s.t. ∀{u, v} ∈ EP (u < v → xu − xv = duv)
(a) let x = (¯ x, xv) (b) if v = n add x to X, else call BP(v + 1, x, X)
- Recursive: starts with BP(K + 1, (x1, . . . , xK), ∅)
- All realizations in X are incongruent∗
- Can be easily modified to find only p solutions for given p
- Applies to all (K − 1)-trilaterative graphs in RK
- Specialize to KDMDGP graph by setting Uv = {v − 1, . . . , v − K}
∗ with probability 1, and aside from one reflection at v = K + 1
[L. et al. ITOR 2008]
78
Complexity of BP
- Most operations are O(Kh) for some fixed h ⇒ O(1)
- Distance check at Step 3: O(n)
- Recursion on at most 2 branches at each call: binary tree
- Only recurse when v > K, v < n: 2n−K nodes
- Overall O(n2n−K) = O(2n)
Worst-case exponential behaviour
79
Hardness of KDMDGP
- The KDMDGP is NP-hard for each K
– every DGP instance is also DMDGP if K = 1 – reduction from Partition can be extended to any K
- (K − 1)-trilateration graphs are NP-hard by inclusion
- No polytime algorithm unless P=NP
Trilaterative graphs in RK are complexitywise borderline at K
80
Correctness
Thm. When BP terminates, X contains every incongruent realization of G Proof.
- Let ¯
y be any realization of G
- Since G has an initial K-clique, can rotate/translate/reflect ¯
y to y[K] = x[K] for all x ∈ X
- BP exhaustively constructs every extension of x[K] which is feasible
with all distances, so y ∈ X
for a realization y, y[h] = (y1, . . . , yh) is the initial segment of y 81
Two examples
82
Empirical observations
- Fast: up to 10k vertices in a few seconds on 2010 hardware
- Precise: errors in range O(10−9)-O(10−12)
- Number of solutions always a power of 2:
- bvious if EP = ∅, but otherwise mysterious
- Linear-time behaviour on proteins:
this really shouldn’t happen
83
Symmetry in the KDMDGP
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
[L. et al. DAM 2014]
84
Partial reflections
- For each v > K, let
gv(x) = (x1, . . . , xv−1, Rv
x(xv), . . . , Rv x(xn))
be the partial reflection of x w.r.t. v
xv−3 xv−2 xv−1
- Note: the gv’s are idempotent operators
- GD = (V, ED): subgraph of G given by discretization edges
- ∀v > K reflection Rv
x gives a binary choice in general∗
- XD ⊂ RnK contains 2n−K incongruent realizations of GD
∗ subsequent results hold “with probability 1”
85
Discretization group
- GD = gv | v > K: the discretization group of G w.r.t. K
subgroup of a Cartesian product of reflection groups
- An element g ∈ GD has the form
- v>K
gav
v , where av ∈ {0, 1}
- Action of GD on XD: g(x) =
- gaK+1
K+1 ◦ · · · ◦ gan n
- (x)
86
Commutativity of partial reflections
Lemma A GD is Abelian Proof Assume K < u < v. Then gugv(x) = gu(x1, . . . , xv−1, Rv
x(xv), . . . , Rv x(xn))
= (x1 . . . , xu−1, Ru
gv(x)(xu), . . . , Ru gv(x)Rv x(xv), . . . , Ru gv(x)Rv x(xn))
= (x1 . . . , xu−1, Ru
x(xu), . . . , Rv gu(x)Ru x(xv), . . . , Rv gu(x)Ru x(xn))
= gv(x1, . . . , xu−1, Ru
x(xu), . . . , Ru x(xn))
= gvgu(x) where equality of these terms holds by a Technical Lemma (next slide)
[L. et al. 2013]
87
Commutativity of partial reflections
Technical Lemma (Proof sketch for K = 2) Let y⊥Aff(xv−1, . . . , xv−K) and ρy = Rv
x
ρz ρy t O z y ρzy ρyt ρzt ρρzy ρzρyt = ρρzyρzt
88
One realization generates all others
Lemma B The action of GD on XD is transitive
K = 2
x y g g g5
4 3
∃g ∈ GD (y = g(x)): namely, y = g5(g4(g3(x)))
Proof By induction on v: assume result holds to v − 1 with g′, then either it holds for v and g = g′, else flip and let g = gvg′ [L. et al. 2013]
89
Structure and invariance
- GD is Abelian and generated by n − K idempotent elements
⇒ GD ∼ = Cn−K
2
- GD ≤ Aut(XD) by construction
90
Solution sets
- X: set of incongruent realizations of G
- GD defined on same vertices but fewer edges
⇒ fewer distance constraints on realizations ⇒ more realizations
- All realizations of G are also realizations of GD
⇒ X ⊆ XD
91
Losing invariance on pruning edges
Lemma C Let W uv = {u + K + 1, . . . , v} be the range of {u, v} ∀x ∈ X, u, w, v ∈ V (w ∈ W uv ↔ xu − xv = gw(x)u − gw(x)v) Proof sketch for K = 2
u w v
Corollary If {u, v} ∈ EP and w ∈ W uv, gw(x) ∈ X
[L. et al. 2013]
92
Pruning group
Define: Γ = {gw ∈ GD | w > K ∧ ∀{u, v} ∈ EP (w ∈ W uv)}
GP
= Γ Lemma D X is invariant w.r.t. GP Proof Follows by corollary, invariance of XD w.r.t. GD and X ⊆ XD
93
Transitivity of the pruning group
Lemma E The action of GP on X is transitive
- Given x, y ∈ X, aim to show ∃g ∈ GP (y = g(x))
- Lemma B ⇒ ∃g ∈ GD with y = g(x) ∈ XD
- Suppose g ∈ GP and aim for a contradiction
- ⇒ ∃{u, v} ∈ EP and w ∈ W uv s.t. gw is a component of g
- Lemma C ⇒ gw(x)u − gw(x)v = duv
- If w is the only such vertex, y = g(x) = x against hypothesis, done
- Suppose ∃ another z ∈ W uv s.t. gz is a component of g
- Set of cases s.t. xu − xv = gzgw(x)u − gzgw(x)v given gw(x)u − gw(x)v =
xu − xv = gz(x)u − gz(x)v has Lebesgue measure 0 in all DGP inputs
- By induction, holds for any number of components gz of g with z ∈ W uv
- ⇒ y = g(x) = x against hypothesis, done
[L. et al. 2013]
94
The main result
Theorem |X| = 2|Γ|
- Lemma A ⇒ GD ∼
= Cn−K
2
⇒ |GD| = 2n−K
- GP ≤ GD ⇒ ∃ℓ ∈ N (GP ∼
= Cℓ
2) , with ℓ = |Γ|
- Lemma E ⇒ ∀x ∈ X GPx = X
- Idempotency ⇒ ∀g ∈ GP
g−1 = g ⇒ ∀g, h ∈ GP, x ∈ X (gx = hx → h−1gx = x → hgx = x → hg = I → h = g−1= g) ⇒ the mapping GPx → GP given by gx → g is injective
- ∀g, h ∈ GP, x ∈ X (g = h → gx = hx)
⇒ the mapping gx → g is surjective
- ⇒ the mapping gx → g is a bijection
- ⇒ |GPx| = |GP|
- ⇒ ∀x ∈ X
|X| = |GPx| = |GP| = 2|Γ| [L. et al. 2013]
95
Symmetry-aware BP
- Don’t need to explore all branches of BP tree
- Build Γ as a pre-processing step
- Run BP, terminating as soon as |X| = 1
- For each g ∈ GP , compute gx
[Mucherino et al. JBCB 2012]
96
Complexity
- Computing Γ: O(mn)
- 1. initialize indicator vector ι = (ιK+1, . . . , ιn) for gv ∈ Γ
- 2. initialize ι = 1
- 3. for each {u, v} ∈ EP and w ∈ W uv let ιw = 0
- BP: O(2n)
- Compute gx for each g ∈ GP: O(2|Γ|)
- Overall: O(2n)
- Gains depend on the instance
97
Tractability of protein instances
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
[L. et al. 2013]
98
Let’s handle the BP tree
1 2 3 4 29 5 17 6 13 7 12 8 9 10 11 14 15 16 18 22 19 21 20 23 24 25 26 27 28 30 42 31 38 32 37 33 34 35 36 39 40 41 43 47 44 46 45 48 49 50 51 52 53
Max depth: n, looks good! Aim to prove width is bounded
99
Number of solutions at each BP tree level
Depends on range of longer pruning edge incident to level v
2 4 8 16 32 64 128 256 512 256 128 64 32 16 8 2 1 2 3 4 5 6 7 8 9 2 4 8 16 32 64 128 2 4 8 16 32 64 2 4 8 16 32 2 4 8 16 2 4 8 2 4 2 4 K+ Green path 8 nodes at level K + 4 {3, K+5} ∈ EP ⇒ gK+4, gK+5 ∈ Γ ⇒ no symmetry at levels K + 4 and K + 5 ⇒ only 4 nodes at level K + 5 no pruning edges pruning edges make graph K-trilaterative 100
Periodic pruning edges
1 16 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17
11 5 6 7 8 9 2 4 8 16 32 64 128 256 512 256 128 64 32 16 8 2 2 4 8 16 32 64 128 2 4 8 16 32 64 2 4 8 16 32 2 4 8 16 2 4 8 2 4 2 4 12 10 ... 4
- 2ℓ growth up to level ℓ, then constant: O(2ℓn) nodes in BP tree
- BP is Fixed-Parameter Tractable (FPT) in a bunch of cases
- For all tested protein backbones, ℓ ≤ 5 ⇒ BP linear on proteins!
101
The story so far
- Nice applications, problem is hard, could have many solutions
- Continuous methods don’t scale
- If certain vertex orders are present, use mixed-combinatorial methods
- Realize K-trilaterative in polytime but (K − 1)-trilaterative are
hard
- If adjacent predecessors are immediate, theory of symmetries
- Number of solutions is a power of two
- For proteins, BP is linear time
- How do we find these vertex orders?
102
Finding vertex orders
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
[Cassioli et al., DAM]
103
. . . wasn’t the backbone providing them?
- NMR data not as clean as I pretended
- Have to mess around with side chains
- What about other applications, anyhow?
Methods for finding trilaterative orders automatically
104
Mostly bad news
- Finding K-trilaterative orders is NP-complete
:-(
- But also FPT
:-)
- Finding KDMDGP orders is NP-complete for all K
:-(
- It’s also really hard in practice, and methods don’t scale well
105
Definitions
- Trilateration Ordering Problem (TOP)
Given a connected graph G = (V, E) and a positive integer K, does G have a K-trilateration order?
- Contiguous Trilateration Ordering Problem (CTOP)
Given a connected graph G = (V, E) and a positive integer K, does G have a (K −1)-trilateration order such that Uv = {v − 1, . . . , v − K} for each v > K? Both problems are in NP
106
Hardness of TOP
- Essentially due to finding the initial clique
– brute force: test all
n
K
- subsets of V
–
n
K
- is O(nK), polytime if K fixed
- Reduction from K-Clique problem:
Given a graph, does it have a K-clique?
[Mucherino et al., OPTL 2012]
107
Reduction from K-Clique
reduction
G G′ = G ⊕ K
- If K-Clique instance is YES
– start with α = (initial clique of G, K) – induction: if αv−1 defined, pick αv at shortest path distance 1 from α
- If K-Clique instance is NO
– By contradiction: suppose ∃ trilateration order α in G′ – Initial clique α[K] = (α1, . . . , αK) must have K − 1 vertices in G, 1 in K – αK+1 must be in G, hence ∃ K-clique in G [Cassioli et al., DAM]
108
Once the initial clique is known
Greedily grow a trilateration order α
- Initialize α with initial K-clique K
- Let W = V K
- ∀v > K av = |vertices in K adjacent to v|
// at termination, av will be the number of adjacent predecessors of v
- While W = ∅:
- 1. choose v ∈ W with largest av
- 2. if av < K instance is NO
- 3. α ← (α, v)
- 4. for all u ∈ W adjacent to v, increase au
- 5. W ← W {v}
- Instance is YES
[Mucherino et al., OPTL 2012]
109
Greedy algorithm is correct
- Assume TOP instance is YES, proceed by induction
– start: by maximality, aK+1 > K – assume α is a valid TOP up to v − 1, suppose av < K – but instance is YES so there is another z ∈ W with az ≥ K – contradicts maximality of av
- Assume TOP instance is NO
– algorithm termination at W = ∅ contradicts the NO – hence it must terminate with W = ∅ and “NO” answer
110
Complexity
- Outer while loop: O(n)
- Choice of largest av: O(n)
- Inner loop on W: O(n)
- Overall: O(n2)
- If we add brute force initial clique: O(2Kn2)
- Polytime if K fixed, FPT otherwise
111
CTOP is hard
- Reduction from Hamiltonian Path (HP)
Given a graph G, does it have a path passing through each vertex exactly once?
- α a H. path in G ⇒ ∀v = 1, n αv is adjacent to αv−1, αv+1
- Apart from initial 1-clique α1
every αv is adjacent to its immediate predecessor
- ⇒ α is a KDMDGP order in G with K = 1
- HP is the same as KDMDGP with K = 1
- ⇒ By inclusion, KDMDGP is NP-hard
[Cassioli et al., DAM]
112
CTOP is hard for all K
- Reduction from HP
reduction
1 2 3 C1 C2 C3 C12 C13
- Technical proof
[Cassioli et al., DAM]
113
How do we find KDMDGP orders?
Mathematical optimization & CPLEX
- xvi = 1 iff vertex v has rank i in the order
- Each vertex has a unique order rank:
∀v ∈ V
- i∈¯
n
xvi = 1;
- Each rank value is assigned a unique vertex:
∀i ∈ ¯ n
- v∈V
xvi = 1;
- There must be an initial K-clique:
∀v ∈ V, i ∈ {2, . . . , K}
- u∈N(v)
- j<i
xuj ≥ (i − 1)xvi;
- Each vertex with rank > K must have at least K contiguous adjacent predecessors
∀v ∈ V, i > K
- u∈N(v)
- i−K≤j<i
xuj ≥ Kxvi.
- Do not expect too much; scales up to 100 vertices
114
How about those 10k-atom backbones?
We have Carlile for those
- Note the repetitions — they serve a purpose!
- Repetition orders are also hard to find for any K
- . . . but Carlile knows how to handcraft them!
[Lavor et al. JOGO 2013]
115
And what about the side-chains?
The Carlile+Antonio tool!
[Costa et al. JOGO, submitted]
116
Approximate realizations
- 1. Applications
- 2. Definition
- 3. Complexity primer
- 4. Complexity of the DGP
- 5. Number of solutions
- 6. Mathematical optimization formulations
- 7. Realizing complete graphs
- 8. The Branch-and-Prune algorithm
- 9. Symmetry in the KDMDGP
- 10. Tractability of protein instances
- 11. Finding vertex orders
- 12. Approximate realizations
117
Data errors
The “distance = real number” paradigm is a lie!
- Covalent bonds are fairly precise
- NMR data is a mess [Berger, J. ACM 1999]
– experimental errors yield intervals [dL
uv, dU uv]
– NMR outputs frequencies of (atom type pair, distance value) weighted graph reconstruction yields systematic error – some atom type pairs yield more error (“only trust H—H”)
- Properties of specific molecules give rise to other constraints
- The protein graph may not be (K − 1)-trilaterative based on
the backbone
118
The Lavorder comes to the rescue!
- Carlile’s handcrafted repetition orders properties:
– repetitions allow a “virtual backbone” of H atoms only – discretization edges: {v, v − i} covalent bonds for i ∈ {1, 2}, {v, v − 3} sometimes covalent sometimes from NMR – most NMR data restricted to pruning edges
- When dv,v−3 is an interval: intersect two spheres with sph. shell
dL dU
- Discretize circular segments and run BP with modified S
Algorithm no longer exhaustive
119
Die Symmetriktheoried¨ ammerung
- Intervals and discretization break the theory of symmetries
- duv
- Only some bounds for the number b of BP solutions:
∃ℓ, k 2ℓqk ≤ b ≤ 2n−3qM
q = |discretization points|, M = |NMR discretization edges|
120
But at least it’s producing results
Joint work with Institut Pasteur
[Cassioli et al., BMC Bioinf., submitted]
121
General approximate methods
- All these methods are specialized to
protein distance data from NMR
- What about general approximate methods?
- Assume large-sized input data with errors
- No assumptions on graph structure
122
Ingredients
- PDM = Partial Distance Matrix (a representation of G)
- EDM = Euclidean Distance Matrix
- 1. Complete the given PDM d to a symmetric matrix D
- 2. Find a realization x (in some dimension ¯
K) s.t. the EDM (xu − xv) is “close” to D
- 3. Project x from dimension ¯
K to dimension K, keeping pairwise distances approximately equal
123
Completing the distance matrix
- ∀{u, v} ∈ E let Duv = length of the shortest path u → v
- Use Floyd-Warshall’s algorithm O(n3)
1: // n × n array Dij to store distances 2: D = 0 3: for {i, j} ∈ E do 4:
Dij = dij
5: end for 6: for k ∈ V do 7:
for j ∈ V do
8:
for i ∈ V do
9:
if Dik + Dkj < Dij then
10:
// Dij fails to satisfy triangle inequality, update
11:
Dij = Dik + Dkj
12:
end if
13:
end for
14:
end for
15: end for
124
Finding a realization
- Let’s give ourselves many dimensions, say ¯
K = n
- Attempt to find x : V → Rn with (xu − xv2) ≈ (Duv)
- If we had the Gram matrix B of x, then:
- 1. find eigen(value/vector) matrices Λ, Y of B
- 2. since B is PSD, Λ ≥ 0 ⇒
√ Λ exists
- 3. ⇒ B = Y ΛY ⊤ = (Y
√ Λ)(Y √ Λ)⊤
- 4. x = Y
√ Λ is such that xx⊤ = B
- Can we compute B from D?
125
Schoenberg’s theorem
- Standard method for computing B from D2
- Also known as classic MultiDimensional Scaling (MDS)
- Apply many algebraic manipulations to
d2
uv = xu − xv2 = xu⊤xu + xv⊤xv − 2xu⊤xv
where the centroid
- k≤n
xuk = 0 for all u ≤ n
- Get B = −1
2(In − 1 n1n)D2(In − 1 n1n), i.e.
xu · xv = 1 2n
- k≤n
(d2
uk + d2 kv) − d2 uv −
1 2n2
- h≤n
k≤n
d2
hk
- D “approximately” EDM ⇒ B “approximately” Gram
[Schoenberg, Annals of Mathematics, 1935]
126
Project to RK for a given K
- Only use the K largest eigenvalues of Λ
- Y [K] = K columns of Y corresp. to K largest eigenvalues
- Λ[K] = K largest eigenvalues of Λ on diagonal
- x = Y [K]
- Λ[K] is a K × n matrix
- Y [K] span the subspace where x “fills more space”, i.e. neglecting other
dimensions causes smaller errors w.r.t. the realization in Rn
This method is called Principal Component Analysis (PCA)
127
Isomap
Given K and PDM d:
- 1. D = FloydWarshall(d)
- 2. B = MDS(D)
- 3. x = PCA(B, K)
− → [Tenenbaum et al. Science 2000]
128
Some references
- L. Liberti, C. Lavor, N. Maculan, A. Mucherino, Euclidean distance geometry
and applications, SIAM Review, 56(1):3-69, 2014
- L. Liberti, B. Masson, J. Lee, C. Lavor, A. Mucherino, On the number
- f realizations of certain Henneberg graphs arising in protein conformation,
Discrete Applied Mathematics, 165:213-232, 2014
- L. Liberti, C. Lavor, A. Mucherino, The discretizable molecular distance geometry
problem seems easier on proteins, in [see below], 47-60
- A. Mucherino, C. Lavor, L. Liberti, N. Maculan (eds.), Distance Geometry: