Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX - - PowerPoint PPT Presentation

euclidean distance geometry
SMART_READER_LITE
LIVE PREVIEW

Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX - - PowerPoint PPT Presentation

Euclidean Distance Geometry Leo Liberti IBM Research, USA CNRS LIX Ecole Polytechnique, France MFD 2014, Campinas [L., Lavor: Introduction to Euclidean Distance Geometry , in preparation] Table of contents 1. Applications 2. Definition 3.


slide-1
SLIDE 1

Euclidean Distance Geometry

Leo Liberti

IBM Research, USA CNRS LIX Ecole Polytechnique, France MFD 2014, Campinas [L., Lavor: Introduction to Euclidean Distance Geometry, in preparation]

slide-2
SLIDE 2

Table of contents

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

2

slide-3
SLIDE 3

Applications

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

3

slide-4
SLIDE 4

Clock Synchronization

5 7 3 4 Atomic clock (S) 16:27 Chuck Alice Bob

16:20 16:22 16:24 16:26 16:28 16:30

A C S B

16:21 16:23 16:25 16:27 16:29 16:31

[Singer, 2011]

4

slide-5
SLIDE 5

Sensor network localization

29

0 / 2.50781

36

1 / 1.97906

51

2 / 1.13649

52

3 / 2.92955

80

4 / 2.5548

91

5 / 2.4284 175 / 2.49225 177 / 1.4276

87

176 / 2.73147 211 / 0.989917 214 / 2.31786

49

210 / 2.83291

84

213 / 3.02018

66

212 / 1.56417 268 / 1.76662 267 / 2.31042 271 / 1.75417

69

269 / 2.0622

73

270 / 3.10479

85

272 / 1.15897 346 / 1.50056

1 18

6 / 1.90126

37

7 / 2.61352

41

8 / 1.6604

45

9 / 1.92913

57

10 / 2.66407

58

11 / 0.658861

83

12 / 1.32064 110 / 1.29792 111 / 0.840347 113 / 2.14071 114 / 2.03576

47

112 / 2.5425 215 / 2.12848 217 / 3.0822 218 / 1.98508 216 / 2.70003 229 / 2.97748 231 / 1.63275 232 / 2.32966 230 / 2.92252 249 / 1.49322 250 / 1.4467 251 / 3.11231 289 / 2.5056

81

290 / 2.54013 291 / 1.97931

2 5

13 / 1.62227

14

14 / 2.01015

17

15 / 2.68422

20

16 / 3.05872

23

17 / 2.48332

42

18 / 1.05756

64

19 / 2.21356

67

20 / 2.42912

88

21 / 2.34146

93

22 / 1.0662 28 / 2.06104 29 / 1.80061 31 / 1.46243 33 / 1.38178 35 / 1.01186

33

30 / 3.15392

65

32 / 2.19726

70

34 / 3.17108 82 / 2.05285 83 / 1.9929 84 / 2.90088 85 / 0.965501 87 / 1.40579 88 / 2.51699 86 / 3.19003

98

89 / 2.50718 106 / 1.36517 107 / 2.2605 109 / 2.88336

55

108 / 2.81041 119 / 2.25758 123 / 0.666733 126 / 2.79445 122 / 1.26894 124 / 2.45029

71

125 / 2.61515 127 / 2.5523

56

120 / 2.46168

63

121 / 3.15779 143 / 2.62768 145 / 3.01294 146 / 1.62871

89

147 / 3.16128 144 / 2.60777 233 / 1.59139 234 / 3.07459 235 / 1.62072 316 / 1.57213 318 / 2.54084

92

317 / 3.16387 326 / 2.30061 325 / 2.97621 327 / 2.62621 350 / 3.18927

3

23 / 0.844888 26 / 2.43216 24 / 1.43953 25 / 2.52302 27 / 1.2847 329 / 2.97101 331 / 1.47226 330 / 2.81533

79

339 / 3.17026 195 / 3.02256 196 / 2.72116 194 / 3.16431

53

193 / 0.88823 319 / 1.78484 322 / 3.15941 320 / 1.19371 321 / 2.59083 332 / 2.88171

77

333 / 2.5427

6

36 / 2.65816 37 / 2.84506 39 / 3.00913

39

38 / 0.977096 40 / 0.147583

90

41 / 2.54617 223 / 1.11811 225 / 1.74291

72

224 / 2.873 257 / 2.65041

7 10

42 / 1.67711

16

43 / 1.28092

30

44 / 2.60399

44

45 / 1.93798 46 / 0.90946

78

47 / 1.95204

86

48 / 3.1702

97

49 / 2.91638 50 / 2.79803 59 / 2.64882 61 / 3.11897 63 / 2.57151 64 / 0.63344 65 / 1.47059

15

58 / 2.53596

21

60 / 2.72874

59

62 / 2.44092 98 / 2.45974 102 / 2.89582 101 / 3.00823 100 / 2.72757 103 / 1.23805 104 / 2.65248 105 / 1.56149

24

99 / 2.76855 182 / 2.80251 183 / 2.17394 178 / 0.684621 184 / 2.04844 186 / 2.36739 179 / 1.75353 181 / 1.27559

62

180 / 2.82612 185 / 2.02673 244 / 2.90515 245 / 2.54461246 / 1.4908 248 / 2.19914 242 / 2.10075 243 / 1.81286 247 / 2.70375 334 / 2.85895 335 / 2.74921 345 / 0.969838

96

344 / 3.01517

8

52 / 1.59789 53 / 3.1109

13

51 / 2.96353 54 / 2.6062

94

55 / 2.44249 81 / 2.44372

31

80 / 3.09041 352 / 3.17363

95

353 / 1.69606

9

56 / 2.23711 57 / 2.87413 95 / 2.43097 97 / 1.79475 90 / 0.575857 91 / 1.84882

60

92 / 3.11441

61

93 / 0.745853

74

94 / 1.08674 96 / 1.91462 132 / 2.48921 134 / 1.67584 128 / 2.41661 129 / 2.54924 130 / 0.857403 131 / 1.03428 133 / 1.33929 295 / 2.78515 296 / 3.03355 297 / 2.79379 292 / 2.23004 293 / 2.59059

76

294 / 2.25052

11 22

66 / 2.54335

25

67 / 1.59039

26

68 / 2.55644

35

69 / 1.22183

43

70 / 1.26323

46

71 / 0.657841 72 / 1.45548 73 / 1.45696 74 / 2.43597

99

75 / 0.710271 135 / 1.03939 137 / 2.64809 138 / 2.36917 139 / 2.85603 141 / 1.99344 142 / 0.632129

28

136 / 3.03397 140 / 1.63894 150 / 2.05658 151 / 2.68427 152 / 1.33314 153 / 1.85365 155 / 1.0403 156 / 1.22973 157 / 2.29037 154 / 2.67744 158 / 3.05556 159 / 2.34031 160 / 1.82595 162 / 2.70381 163 / 2.43938 161 / 1.90326 203 / 0.991465 204 / 1.81584 206 / 2.66247 207 / 2.45539 208 / 2.2551 209 / 1.2466

48

205 / 3.06568 236 / 1.90515 238 / 2.5944 239 / 2.72018 240 / 3.16484 241 / 0.720351 237 / 2.87679 252 / 0.86256 254 / 0.843728 255 / 2.43018 256 / 1.26966 253 / 2.83326 262 / 0.880856 263 / 3.06774 264 / 1.88647 261 / 1.9719 348 / 2.26962 349 / 2.1111 351 / 3.02766

12

78 / 2.35652 76 / 0.500095 77 / 3.13454 79 / 2.02458 173 / 2.85379 172 / 2.85207 174 / 2.12157 281 / 1.77315 280 / 2.13908 187 / 0.741716 188 / 1.28288 304 / 2.67349 298 / 3.19922 301 / 3.10155 303 / 1.2464

68

299 / 2.68553 300 / 2.93307

82

302 / 2.70564 306 / 3.166 308 / 2.46248 305 / 0.378469 307 / 1.95331 341 / 2.70349 340 / 1.87485 354 / 2.04535 149 / 1.71292

32

148 / 2.63438

19

117 / 2.4561 118 / 1.55824 115 / 1.40904

50

116 / 0.648045 259 / 2.59804 260 / 2.60371 258 / 2.04883 265 / 2.53475 266 / 1.20584 285 / 3.04711 284 / 1.28966 286 / 0.444549 287 / 2.49879 283 / 0.739388 282 / 2.4463 288 / 2.40494 312 / 2.0258 313 / 1.03139 314 / 2.65429 315 / 1.70638

34

189 / 2.92806

38

190 / 2.07339

40

191 / 2.80145 192 / 2.35256 324 / 2.63064 323 / 2.58729

27

171 / 2.23482 167 / 2.38609 164 / 1.92141 165 / 2.6582 166 / 2.28031 168 / 2.41683 169 / 0.955716 170 / 0.967808 199 / 2.18351 197 / 0.88776 198 / 1.51159 200 / 0.583974 201 / 2.87535 202 / 1.31584 220 / 2.98526 219 / 1.26904 221 / 0.308754 222 / 1.8407 226 / 1.33417227 / 3.09905 228 / 1.31378 328 / 1.65874 338 / 1.28689 337 / 1.83675

75

336 / 3.11488 347 / 3.11676 310 / 2.37803 309 / 1.9722 311 / 1.13749 276 / 3.05065 277 / 2.40579 273 / 2.81885 275 / 3.02541 274 / 2.49444

54

278 / 3.02172 279 / 1.17834 343 / 1.31891 342 / 2.50522

− → [Yemini, 1978]

5

slide-6
SLIDE 6

Protein conformation from NMR data

1 2 3 4 5 9 6 8 10 7 11

− → [Crippen & Havel 1988]

6

slide-7
SLIDE 7

Clock synchronization: solutions

|xA − xB| = 7 |xA − xC| = 3 |xA − xS| = 5 |xB − xC| = 4 xS = 16:27.

5 7 3 4

xA xB xC xS

xS = 16:27 xA = 16:22 xB=16:15 xC=16:19 xB=16:15 xC=16:25 xB=16:29 xC=16:19 xB=16:29 xC=16:25 xA = 16:32 xB=16:25 xC=16:29 xB=16:25 xC=16:35 xB=16:39 xC=16:29 xB=16:39 xC=16:35

7

slide-8
SLIDE 8

Definition

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

8

slide-9
SLIDE 9

Distance Geometry Problem (DGP)

Given: Determine whether ∃: – a simple graph G = (V, E) – an edge function d : E → R≥0 – an integer K ∈ N a realization x : V → RK s.t. ∀{u, v} ∈ E xu − xv2 = duv 1 2 3 4 1 √3 1 1 1 1 1 2 3 4

Let n = |V |

9

slide-10
SLIDE 10

More applications

  • Autonomous underwater vehicles [Bahr et al. 2009]
  • Statics of rigid structures [Maxwell 1864]
  • Matrix completion [Laurent 2009]
  • Statistics [Boer 2013]
  • Psychology [Kruskal 1964]

[Liberti et al., SIREV 2014]

10

slide-11
SLIDE 11

Complexity primer

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

11

slide-12
SLIDE 12

Definitions

  • Decision problem: mathematical YES/NO-type question depending
  • n a parameter vector π
  • Instance: same as above with π replaced by given values v
  • Certificate: proof that a given answer is true
  • P: all decision problems solvable in at most p(|π|) steps

where p is a polynomial

  • NP: all decision problems with |YES certificate| ≤ p(|π|)

where p is a polynomial

12

slide-13
SLIDE 13

Reductions

  • P, Q: decision problems
  • If ∃ algorithm A which:
  • 1. reformulates instances ¯

P of P into instances ¯ Q of Q

  • 2. has answer( ¯

P) = YES iff answer(A( ¯ Q)) = YES

  • 3. is polytime in the instance size | ¯

P| then A is a reduction of P to Q

13

slide-14
SLIDE 14

NP-hardness

  • Q is NP-hard if every problem in NP reduces to Q
  • Q is NP-complete if it is NP-hard and is in NP

Why does it work? any P in NP

polytime reduction

− − − − − − − − − − − − − − − − − − → Q: how hard?

  • Suppose Q easier than P
  • Solve P by reducing to Q in polytime and then solve Q
  • Then P as easy as Q, against assumption
  • ⇒ Q at least as hard as P

So if Q is NP-hard it is as hard as any problem in NP

⇒ Q is as hard as the hardest problem in NP

14

slide-15
SLIDE 15

NP-hardness proofs

Given a new problem Q, take any known NP-hard problem P and reduce it to Q Why does it work? P: NP-hard

polytime reduction

− − − − − − − − − − − − − − − − − − → Q: how hard?

  • As before: Suppose . . . (etc.) ⇒ Q at least as hard as P
  • Since P is NP-hard, it is hardest in NP, and so is Q

⇒ Q is NP-hard

15

slide-16
SLIDE 16

Complexity of the DGP

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

16

slide-17
SLIDE 17

DGP ∈ NP?

  • NP: YES/NO problems with polytime-checkable proofs for YES
  • DGP is a YES/NO problem
  • DGP1 ∈ NP, since duv = |xu − xv| ⇒ (d ∈ Q → x ∈ Q)
  • Solutions might involve irrational numbers when K > 1
  • Some empirical evidence that DGP ∈ NP [Beeker et al. 2013]

17

slide-18
SLIDE 18

The DGP is NP-hard

Partition Given a = (a1, . . . , an) ∈ Nn, ∃ I ⊆ {1, . . . , n} s.t.

i∈I

ai =

i∈I

ai ?

  • Reduce (NP-hard) Partition to DGP1
  • a −

→ cycle C with V (C) = {1, . . . , n}, E(C) = {{1, 2}, . . . , {n, 1}}

  • For i < n let di,i+1 = ai, and dn,n+1 = dn1 = an
  • E.g. for a = (1, 4, 1, 3, 3), get cycle graph:

1 2 3 4 5 1 4 1 3 3

[Saxe, 1979]

18

slide-19
SLIDE 19

Partition is YES ⇒ DGP1 is YES

  • Given: I ⊂ {1, . . . , n} s.t.

i∈I

ai =

i∈I

ai

  • Construct: realization x of C in R
  • 1. x1 = 0

// start

  • 2. induction step: suppose xi known

if i ∈ I let xi+1 = xi + di,i+1

// go right

else xi+1 = xi − di,i+1

// go left

  • Correctness proof: by the same induction

but careful when i = n: have to show xn+1 = x1

19

slide-20
SLIDE 20

I = {1, 2, 3}

1 2 3 4 5 6 x1

slide-21
SLIDE 21

1 2 3 4 5 6 x1 x2 1 ∈ I

slide-22
SLIDE 22

1 2 3 4 5 6 x1 x2 x3 1 ∈ I 2 ∈ I

slide-23
SLIDE 23

1 2 3 4 5 6 x1 x2 x3 x4 1 ∈ I 2 ∈ I 3 ∈ I

slide-24
SLIDE 24

1 2 3 4 5 6 x1 x2 x3 x4 x5 1 ∈ I 2 ∈ I 3 ∈ I 4 ∈ I

slide-25
SLIDE 25

1 2 3 4 5 6 x6 = x1? x2 x3 x4 x5 1 ∈ I 2 ∈ I 3 ∈ I 4 ∈ I 5 ∈ I

slide-26
SLIDE 26

Partition is YES ⇒ DGP1 is YES

(1) =

  • i∈I

(xi+1 − xi) =

  • i∈I

di,i+1 = =

  • i∈I

ai =

  • i∈I

ai = =

  • i∈I

di,i+1 =

  • i∈I

(xi − xi+1) = (2) (1) = (2) ⇒

  • i∈I

(xi+1 − xi) =

  • i∈I

(xi − xi+1) ⇒

  • i≤n

(xi+1 − xi) = ⇒ (xn+1 − xn) + (xn − xn−1) + · · · + (x3 − x2) + (x2 − x1) = ⇒ xn+1 = x1

26

slide-27
SLIDE 27

Partition is NO ⇒ DGP1 is NO

  • By contradiction: suppose DGP1 is YES, x realization of C
  • F = {{u, v} ∈ E(C) | xu ≤ xv}, E(C)F = {{u, v} ∈ E(C) | xu > xv}
  • Trace x1, . . . , xn: follow edges in F (→) and in E(C) F (←)

1 2 3

  • 1
  • 2
  • 3

x1 x2 x3 x4 x5

  • {u,v}∈F

(xv − xu) =

  • {u,v}∈F

(xu − xv)

  • {u,v}∈F

|xu − xv| =

  • {u,v}∈F

|xu − xv|

  • {u,v}∈F

duv =

  • {u,v}∈F

duv

  • Let J = {i < n | {i, i + 1} ∈ F} ∪ {n | {n, 1} ∈ F}

  • i∈J

ai =

  • i∈J

ai

  • So J solves Partition instance, contradiction
  • ⇒ DGP is NP-hard, DGP1 is NP-complete

27

slide-28
SLIDE 28

Number of solutions

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

28

slide-29
SLIDE 29

With congruences

  • (G, K): DGP instance
  • ˜

X ⊆ RKn: set of solutions

  • Congruence: composition of translations, rotations, reflections
  • C = set of congruences in RK
  • x ∼ y means ∃ρ ∈ C (y = ρx):

distances in x are preserved in y through ρ

  • ⇒ if | ˜

X| > 0, | ˜ X| = 2ℵ0

29

slide-30
SLIDE 30

Modulo congruences

  • Congruence is an equivalence relation ∼ on ˜

X (reflexive, symmetric, transitive) ∼

  • Partitions ˜

X into equivalence classes

  • X = ˜

X/∼: sets of representatives of equivalence classes

  • Focus on |X| rather than | ˜

X|

30

slide-31
SLIDE 31

Cardinality of X

  • infeasible ⇔ |X| = 0
  • rigid graph ⇔ |X| < ℵ0
  • globally rigid graph ⇔ |X| = 1
  • flexible graph ⇔ |X| = 2ℵ0
  • |X| = ℵ0: impossible by Milnor’s theorem

31

slide-32
SLIDE 32

Milnor’s theorem implies |X| = ℵ0

  • System S of polynomial equations of degree d

∀i ≤ m pi(x1, . . . , xnK) = 0

  • Let X be the set of x ∈ RnK satisfying S
  • Number of connected components of X is ≤ d(2d − 1)nK−1

[Milnor 1964]

  • If |X| is countable then G cannot be flexible

⇒ incongruent elements of X are separate connected components ⇒ by Milnor’s theorem, there’s finitely many of them

32

slide-33
SLIDE 33

Examples

V 1 = {1, 2, 3} E1 = {{u, v} | u < v} d1 = 1

x1 x2 x3

ρ congruence in R2 ⇒ ρx valid realization |X| = 1 V 2 = V 1 ∪ {4} E2 = E1 ∪ {{1, 4}, {2, 4}} d2 = 1 ∧ d14 = √ 2

x1 x2 x3 x4

ρ reflects x4 wrt x1, x2 ⇒ ρx valid realization |X| = 2 ( , ) V 3 = V 2 E3 = {{u, u + 1}|u ≤ 3} ∪ {1, 4} d1 = 1

x1 x2 x3 x4

ρ rotates x2x3, x1x4 by θ ⇒ ρx valid realization |X| is uncountable ( , , , , . . .)

33

slide-34
SLIDE 34

Mathematical optimization formulations

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

34

slide-35
SLIDE 35

System of quadratic constraints ∀{u, v} ∈ E xu − xv2 = d2

uv

  • Around 10 vertices
  • Computationally useless

35

slide-36
SLIDE 36

Quadratic objective min

x∈RnK

  • {u,v}∈E

(xu − xv2 − d2

uv)2

  • Globally optimal value zero iff x is a realization of G
  • sBB: 10-100 vertices, exact solutions
  • heuristics: 100-1000 vertices, poor quality

[Lavor et al., 2006]

36

slide-37
SLIDE 37

Convexity and concavity max

x∈RnK

  • {u,v}∈E

xu − xv2 ∀{u, v} ∈ E xu − xv2 ≤ d2

uv

  • Convex constraints, concave objective
  • Computationally no better than “quadratic objective”

37

slide-38
SLIDE 38

Pointwise reformulation max

x∈RnK

  • {u,v}∈E,k≤K

θuvk(xuk − xvk) ∀{u, v} ∈ E xu − xv2 ≤ d2

uv

  • Convex subproblem in stochastic iterative heuristics

“guess θ and solve”

  • 100-1000 vertices, good quality

[L. IOS14/MAGO14(slides)]

38

slide-39
SLIDE 39

SDP formulation min

X0

  • {u,v}∈E

(Xuu + Xvv − 2Xuv) ∀{u, v} ∈ E Xuu + Xvv − 2Xuv ≥ d2

uv

  • Similar to those of Ye, Wolkowicz — works better for proteins
  • 100 vertices, good quality

[D’Ambrosio et al., in progress]

39

slide-40
SLIDE 40

Realizing complete graphs

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

40

slide-41
SLIDE 41

Cliques

2-clique 1 2 3-clique 1 2 3 4-clique 1 2 3 4 (K + 1)-clique = K-clique ⊕ a vertex

41

slide-42
SLIDE 42

Triangulation

1 2 3

1 1 2

− → 1 2 3

1 2

Example: realize triangle on a line

  • From x3 − x1 = 2 and x3 − x2 = 1 get

x2

3 − 2x1x3 + x2 1

= 4 (1) x2

3 − 2x2x3 + x2 2

= 1. (2)

  • (??) − (??) yields

2x3(x1 − x2) = x2

1 − x2 2 − 3

⇒ 2x3 = 4,

  • Hence x3 = 2

42

slide-43
SLIDE 43

Realizing a (K + 1)-clique in RK−1

  • Apply triangulation inductively on K

assume x1, . . . , xK ∈ RK−1 known, compute y = xK+1

  • K quadratic eqns (∀j ≤ K y − xj2 = d2

j,K+1) in K − 1 vars

  

y2 − 2x1 · y + x12 = d2

1,K+1

[1] . . . . . . y2 − 2xK · y + xK2 = d2

K,K+1

[K]

  • Form system ∀j ≤ K ([j] − [K])

  

2(x1 − xK) · y = x12 − xK2 − d2

1,K+1 + d2 K,K+1

[1] − [K] . . . . . . 2(xK−1 − xK) · y = xK−12 − xK2 − d2

K−1,K+1 + d2 K,K+1

[K − 1] − [K]

  • This is a (K − 1) × (K − 1) linear system Ay = b

Solve to find y [Dong, Wu 2002]

43

slide-44
SLIDE 44

“Solve”?

  • 1. What if A is singular?
  • 2. Or: A nonsingular but instance is NO

44

slide-45
SLIDE 45

Singularity: rkA = K − 2

One row xj − xK of A depends on the others

K = 2 triangle in R1 x1 − x2 = 0

x1 =x2 x3? x3?

K = 3 4-clique in R2 x1, x2, x3 on a line

x1 x2 x3 x4? x4?

K = 4 5-clique in R3 x1, . . . , x4 in a plane

x1 x2 x3 x4 x5? x5?

Trend continues: rk A = K − 2 ⇒ |X| = 2 (see later)

45

slide-46
SLIDE 46

Singularity: rkA = K − 3

Two rows xj − xk depend on the others

K = 3 4-clique in R2 x1 = x2 = x3

x1=x2=x3 x4?

K = 4 5-clique in R3 x1, . . . , x4 on a line

x1 x2 x3 x4 x5?

Trend continues: [Hendrickson, 1992]

  • Thm. 5.8. If a graph G is connected, flexible and has more than K vertices, |X|

contains almost always a submanifold diffeomorphic to a circle

46

slide-47
SLIDE 47

Hendrickson’s theorem also applies to non-cliques

47

slide-48
SLIDE 48

Nonsingular matrix A with NO instance

  • Infeasible quadratic system ∀j ≤ K + 1 xj − xK2 = djK
  • Take differences, get nonsingular A and value for xK
  • . . . but it’s wrong!

Shit happens! Every time you solve the linear system Ay = b check feasibility with quadratic system

48

slide-49
SLIDE 49

Algorithm for realizing complete graphs in RK

  • Assume:

(i) G = (V, E) complete (ii) |V | = n ≥ K + 2 (iii) we know x1, . . . , xK+1

  • Increase K: we know how to realize xK+2 in RK
  • Use this inductively for each i ∈ {K + 2, . . . , n}

49

slide-50
SLIDE 50

Algorithm for realizing complete graphs in RK

// realize next vertex iteratively for i ∈ {K + 2, . . . , n} do // use (K + 1) immediate adjacent predecessors to compute xi if rkA = K then xi = A−1b // A, b defined as above else xi = ∞ // A singular, mark ∞ and exit break end if // check that xi is feasible w.r.t. other distances for {j ∈ N(i) | j < i} do if xi − xj = dij then // if not, mark infeasible and exit loop ∗ xi = ∅ break end if end for if xi = ∅ then break end if end for return x

∗ the “ignore trouble” policy, a.k.a. “ignore probability zero events”

50

slide-51
SLIDE 51

Complexity of Alg. 1

  • Outer loop: O(n)
  • Rank and inverse of A: O(K3)
  • Inner loop: O(n)
  • Get O(n2K3)
  • But in most applications K is fixed
  • Get O(n2)

But how do we find the realization of the first K + 1 vertices?

51

slide-52
SLIDE 52

Realizing (K + 1)-cliques in RK

  • Realizing (K + 1)-cliques in RK−1 yields “flat simplices”

(e.g. triangles on lines)

  • Use “natural” embedding dimension RK
  • Same reasoning as above:

get system Ay = b where y = xK+1 and Aj = 2(xj − xK)

  • But now A is (K − 1) × K
  • Same as previous case with A singular

52

slide-53
SLIDE 53

Almost square

How can you solve the following system Ay = b:

  

a11 a12 . . . a1K . . . . . . ... . . . aK−1,1 aK−1,2 . . . aK−1,K

     

y1 . . . yK−1

   =   

b1 . . . bK−1

  

where A has one more columns than rows and rank K − 1?

53

slide-54
SLIDE 54

Basics and nonbasics

  • Since rk A = K − 1, ∃ K − 1 linearly independent columns
  • B: set of their indices
  • N : index of remaining columns
  • B: (K − 1) × (K − 1) square matrix of columns in B
  • ⇒ B is nonsingular
  • Can partition columns as A = (B|N)

Column j corresponds to variable yj

  • Variables yB are called basic variables
  • Variable yN is called nonbasic variable

54

slide-55
SLIDE 55

The dictionary

(B|N)y = b ⇒ ByB + NyN = b ⇒ yB = B−1b − B−1NyN Basics expressed in function of nonbasic

55

slide-56
SLIDE 56

One quadratic equation

  • From value of yN, can use dictionary to get y
  • Use one quadratic equation
  • 1. Pick any h ∈ {1, . . . , K − 1}, equation is xh − y2

2 = d2 hK

  • 2. y = (yB|yN)⊤
  • 3. Replace yB with B−1b − B−1NyN in equation
  • 4. Solve resulting quadratic equation in one variable yN
  • 5. Get 0,1 or 2 values for yN
  • 6. ⇒ Get 0,1 or 2 positions for xK+1

56

slide-57
SLIDE 57

What if B−1N is zero?

  • yB = B−1b − B−1NyN reduces to yB = B−1b
  • Use one quadratic equation
  • 1. Pick any h ∈ {1, . . . , K − 1}, equation is xh − y2

2 = d2 hK

  • 2. y = (yB|yN)⊤
  • 3. Replace yB with B−1b in equation
  • 4. Solve resulting quadratic equation in one variable yN
  • 5. Get 0,1 or 2 values for yN
  • 6. ⇒ Get 0,1 or 2 positions for xK+1

57

slide-58
SLIDE 58

The difference

  • B−1N = 0: yN

dictionary

− − − − − − − → yB

  • Different values y+

N = y− N −

→ y+, y− with different components

  • B−1N = 0: yB

quadratic eqn.

− − − − − − − − − − → yN

  • Even if y+

N = y− N, K − 1 components of y+, y− are equal

aff(x1, . . . , xK−1) = {y ∈ RK | yN = 0}

58

slide-59
SLIDE 59

The case of no solutions

  • No realizations exist for this (K + 1)-clique in RK
  • DGP instance is NO

59

slide-60
SLIDE 60

The case of one solution

  • Assume for simplicity: N = K, h = 1, B−1N = 0

Then xh − y2 = d2

h,K+1 becomes: λy2

K − 2µyK + ν

= 0, where λ =

  • ℓ,j<K

1 + β2

ℓja2 jK

µ = x1K +

  • ℓ,j<K

βℓjajK(βℓjbℓ − x1ℓ) ν =

  • ℓ,j<K

βℓjbℓ(βℓjbℓ − 2x1ℓ) + x12 − d2

1,K+1

  • (Exactly one solution for yK) ⇔ µ2 = λν, not a tautology
  • The set of all (K + 1)-clique DGP instances in RK s.t. µ2 = λν

has Lebesgue measure 0

  • Ignore them, they happen with probability∗ 0!

∗ Assuming continuous distributions over the reals. For floating point number, who knows? . . .

but we’ll ignore these instances anyhow 60

slide-61
SLIDE 61

Discriminant > 0, = 0

4 4 4 1 3 1 2 3 1 2 3 2

61

slide-62
SLIDE 62

The case of two solutions

  • K spheres SK−1

1

, . . . , SK−1

K

in RK

centered at x1, . . . , xK with radii d1,K+1, . . . , dK,K+1

  • xK+1 must be at the intersection of SK−1

1

, . . . , SK−1

K

  • If

j SK−1 j

= ∅, then |

j SK−1 j

| = 2 in general

  • will not mention “probability 0” or “in general” anymore

[Coope 2000]

62

slide-63
SLIDE 63

Mirror images

  • Let x+ = {x1, . . . , xK, x+

K+1}, x− = {x1, . . . , xK, x− K+1}

assume dim aff(x1, . . . , xK) = K (†)

  • Theorem

x+, x− ∈ RK are reflections w.r.t. hyperplane defined by x1, . . . , xK

  • Proof
  • 1. x+, x− congruent by construction
  • 2. ∀i ≤ K xi ∈ x+ ∩ x− → x+, x− not translations
  • 3. |x+ ∩ x−| = K < |x+| = |x−| → x+, x− not rotations by (†)
  • 4. ⇒ must be reflections

63

slide-64
SLIDE 64

Algorithm for realizing (K + 1)-cliques in RK

// realize 1 at the origin x1 = (0, . . . , 0) // realize next vertex iteratively for ℓ ∈ {2, . . . , K + 1} do // at most two positions in Rℓ−1 for vertex ℓ S =

i<ℓ

Sℓ−2

i

if S = ∅ then // warn if infeasible return ∅ end if // arbitrarily choose one of the two points choose any xℓ ∈ S end for // return feasible realization return x

64

slide-65
SLIDE 65

Complexity of Alg. 2

  • Outer loop: O(K)
  • Gaussian elimination on A: O(K3)
  • Some messing about to obtain x+

K+1, x− K+1: +O(K2)

  • Get O(K4)
  • But in most applications K is fixed
  • Get O(1)

65

slide-66
SLIDE 66

Back to complete graphs

  • Alg. 2: realize 1, . . . , K + 1 in RK: O(1)
  • Alg. 1: Realize K + 2, . . . , n: O(n2)
  • ⇒ O(n2)
  • What about |X|?

– Alg. 1 is deterministic: one solution from x1, . . . , xK+1 – Alg. 2 is stochastic: pick one of two values K times

⇒ |X| = 2K

66

slide-67
SLIDE 67

K-trilaterative graphs

  • In Alg. 1 we only need each v > K + 1 to have K + 1 adjacent

predecessors in order to find a unique solution for xv

  • Determination of xv from K+1 adjacent predecessors: K-trilateration
  • K-trilaterative graph:

(i) has a vertex order ensuring this property (ii) the initial K + 1 vertices induce a (K + 1)-clique the order is called K-trilateration order

  • Alg. 1 realizes all K-trilaterative graphs

The DGP restricted to K-trilaterative graphs in RK is easy [Eren et al. 2004]

67

slide-68
SLIDE 68

The story so far

  • Lots of nice applications
  • DGP is NP-hard
  • May have 0, 1, finitely many or 2ℵ0 solutions modulo congruences
  • Continuous optimization techniques don’t scale well
  • Using K + 1 adjacent predecessors, realize K-trilaterative graphs

in RK in polytime

  • Do we need K + 1 adjacent predecessors, or can we do with

less?

68

slide-69
SLIDE 69

The Branch-and-Prune algorithm

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

69

slide-70
SLIDE 70

Fewer adjacent predecessors

  • Alg. 2 only needs K adjacent predecessor
  • Extend to n vertices: (K − 1)-trilaterative graphs
  • Can we realize (K − 1)-trilaterative graphs in RK?
  • A small case: graph consisting of two K + 1 cliques

1 1 1 5 5 5 2 2 2 3 3 3 4 4 4

70

slide-71
SLIDE 71

Take a closer look. . .

5 5 2 2 3 3 4 4

  • Realization of a K + 1 clique in RK knowing x1, . . . , xK
  • We know how to do that!
  • Consistent with 2 solutions for x5, reflected across plane through

x2, x3, x4

71

slide-72
SLIDE 72

Discretization and pruning edges

  • (K − 1)-trilaterative graph G = (V, E):

∀v > K ∃Uv ⊂ V (|Uv| = K ∧ ∀u ∈ Uv(u < v) ∧ {u, v} ∈ E)

  • Discretization edges:

ED = {{u, v} ∈ E | u, v ≤ K}

  • initial clique

∪ {{u, v} ∈ E | v > K ∧ u ∈ Uv}

  • vertex order
  • Pruning edges EP = E ED

10 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 18 19

72

slide-73
SLIDE 73

Role of discretization edges

Missing discretization edge ⇒ non-rigid structure ⇒ X uncountable Else: X finite

73

slide-74
SLIDE 74

Role of pruning edges

No pruning edges: 8 incongruent realizations in R2 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

74

slide-75
SLIDE 75

Role of pruning edges

Pruning edge {1, 4}: only 4 realizations remain valid 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

75

slide-76
SLIDE 76

Motivation Protein backbones

  • Total order < on V
  • Covalent bond distances: {u − 1, u} ∈ E
  • Covalent bond angles: {u − 2, u} ∈ E
  • NMR experiments: {u − 3, u} ∈ E

(and other edges {u, v} with v − u > 3) Generalize “3” to K

[Lavor et al., COAP 2012]

76

slide-77
SLIDE 77

KDMDGP graphs

K = 2 K = 3 1 2 3 4 5 Generalization of protein backbone order: v > K is adjacent to K immediate predecessors v − 1, . . . , v − K

KDMDGP: Discretizable Molecular Distance Geometry Problem

77

slide-78
SLIDE 78

The Branch-and-Prune (BP) algorithm

BP(v, ¯ x, X):

  • 1. Given v > K, realization ¯

x = (x1, . . . , xv−1)

  • 2. Compute S =
  • u∈Uv

SK−1

u

  • 3. For each xv ∈ S s.t. ∀{u, v} ∈ EP (u < v → xu − xv = duv)

(a) let x = (¯ x, xv) (b) if v = n add x to X, else call BP(v + 1, x, X)

  • Recursive: starts with BP(K + 1, (x1, . . . , xK), ∅)
  • All realizations in X are incongruent∗
  • Can be easily modified to find only p solutions for given p
  • Applies to all (K − 1)-trilaterative graphs in RK
  • Specialize to KDMDGP graph by setting Uv = {v − 1, . . . , v − K}

∗ with probability 1, and aside from one reflection at v = K + 1

[L. et al. ITOR 2008]

78

slide-79
SLIDE 79

Complexity of BP

  • Most operations are O(Kh) for some fixed h ⇒ O(1)
  • Distance check at Step 3: O(n)
  • Recursion on at most 2 branches at each call: binary tree
  • Only recurse when v > K, v < n: 2n−K nodes
  • Overall O(n2n−K) = O(2n)

Worst-case exponential behaviour

79

slide-80
SLIDE 80

Hardness of KDMDGP

  • The KDMDGP is NP-hard for each K

– every DGP instance is also DMDGP if K = 1 – reduction from Partition can be extended to any K

  • (K − 1)-trilateration graphs are NP-hard by inclusion
  • No polytime algorithm unless P=NP

Trilaterative graphs in RK are complexitywise borderline at K

80

slide-81
SLIDE 81

Correctness

Thm. When BP terminates, X contains every incongruent realization of G Proof.

  • Let ¯

y be any realization of G

  • Since G has an initial K-clique, can rotate/translate/reflect ¯

y to y[K] = x[K] for all x ∈ X

  • BP exhaustively constructs every extension of x[K] which is feasible

with all distances, so y ∈ X

for a realization y, y[h] = (y1, . . . , yh) is the initial segment of y 81

slide-82
SLIDE 82

Two examples

82

slide-83
SLIDE 83

Empirical observations

  • Fast: up to 10k vertices in a few seconds on 2010 hardware
  • Precise: errors in range O(10−9)-O(10−12)
  • Number of solutions always a power of 2:
  • bvious if EP = ∅, but otherwise mysterious
  • Linear-time behaviour on proteins:

this really shouldn’t happen

83

slide-84
SLIDE 84

Symmetry in the KDMDGP

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

[L. et al. DAM 2014]

84

slide-85
SLIDE 85

Partial reflections

  • For each v > K, let

gv(x) = (x1, . . . , xv−1, Rv

x(xv), . . . , Rv x(xn))

be the partial reflection of x w.r.t. v

xv−3 xv−2 xv−1

  • Note: the gv’s are idempotent operators
  • GD = (V, ED): subgraph of G given by discretization edges
  • ∀v > K reflection Rv

x gives a binary choice in general∗

  • XD ⊂ RnK contains 2n−K incongruent realizations of GD

∗ subsequent results hold “with probability 1”

85

slide-86
SLIDE 86

Discretization group

  • GD = gv | v > K: the discretization group of G w.r.t. K

subgroup of a Cartesian product of reflection groups

  • An element g ∈ GD has the form
  • v>K

gav

v , where av ∈ {0, 1}

  • Action of GD on XD: g(x) =
  • gaK+1

K+1 ◦ · · · ◦ gan n

  • (x)

86

slide-87
SLIDE 87

Commutativity of partial reflections

Lemma A GD is Abelian Proof Assume K < u < v. Then gugv(x) = gu(x1, . . . , xv−1, Rv

x(xv), . . . , Rv x(xn))

= (x1 . . . , xu−1, Ru

gv(x)(xu), . . . , Ru gv(x)Rv x(xv), . . . , Ru gv(x)Rv x(xn))

= (x1 . . . , xu−1, Ru

x(xu), . . . , Rv gu(x)Ru x(xv), . . . , Rv gu(x)Ru x(xn))

= gv(x1, . . . , xu−1, Ru

x(xu), . . . , Ru x(xn))

= gvgu(x) where equality of these terms holds by a Technical Lemma (next slide)

[L. et al. 2013]

87

slide-88
SLIDE 88

Commutativity of partial reflections

Technical Lemma (Proof sketch for K = 2) Let y⊥Aff(xv−1, . . . , xv−K) and ρy = Rv

x

ρz ρy t O z y ρzy ρyt ρzt ρρzy ρzρyt = ρρzyρzt

88

slide-89
SLIDE 89

One realization generates all others

Lemma B The action of GD on XD is transitive

K = 2

x y g g g5

4 3

∃g ∈ GD (y = g(x)): namely, y = g5(g4(g3(x)))

Proof By induction on v: assume result holds to v − 1 with g′, then either it holds for v and g = g′, else flip and let g = gvg′ [L. et al. 2013]

89

slide-90
SLIDE 90

Structure and invariance

  • GD is Abelian and generated by n − K idempotent elements

⇒ GD ∼ = Cn−K

2

  • GD ≤ Aut(XD) by construction

90

slide-91
SLIDE 91

Solution sets

  • X: set of incongruent realizations of G
  • GD defined on same vertices but fewer edges

⇒ fewer distance constraints on realizations ⇒ more realizations

  • All realizations of G are also realizations of GD

⇒ X ⊆ XD

91

slide-92
SLIDE 92

Losing invariance on pruning edges

Lemma C Let W uv = {u + K + 1, . . . , v} be the range of {u, v} ∀x ∈ X, u, w, v ∈ V (w ∈ W uv ↔ xu − xv = gw(x)u − gw(x)v) Proof sketch for K = 2

u w v

Corollary If {u, v} ∈ EP and w ∈ W uv, gw(x) ∈ X

[L. et al. 2013]

92

slide-93
SLIDE 93

Pruning group

Define: Γ = {gw ∈ GD | w > K ∧ ∀{u, v} ∈ EP (w ∈ W uv)}

GP

= Γ Lemma D X is invariant w.r.t. GP Proof Follows by corollary, invariance of XD w.r.t. GD and X ⊆ XD

93

slide-94
SLIDE 94

Transitivity of the pruning group

Lemma E The action of GP on X is transitive

  • Given x, y ∈ X, aim to show ∃g ∈ GP (y = g(x))
  • Lemma B ⇒ ∃g ∈ GD with y = g(x) ∈ XD
  • Suppose g ∈ GP and aim for a contradiction
  • ⇒ ∃{u, v} ∈ EP and w ∈ W uv s.t. gw is a component of g
  • Lemma C ⇒ gw(x)u − gw(x)v = duv
  • If w is the only such vertex, y = g(x) = x against hypothesis, done
  • Suppose ∃ another z ∈ W uv s.t. gz is a component of g
  • Set of cases s.t. xu − xv = gzgw(x)u − gzgw(x)v given gw(x)u − gw(x)v =

xu − xv = gz(x)u − gz(x)v has Lebesgue measure 0 in all DGP inputs

  • By induction, holds for any number of components gz of g with z ∈ W uv
  • ⇒ y = g(x) = x against hypothesis, done

[L. et al. 2013]

94

slide-95
SLIDE 95

The main result

Theorem |X| = 2|Γ|

  • Lemma A ⇒ GD ∼

= Cn−K

2

⇒ |GD| = 2n−K

  • GP ≤ GD ⇒ ∃ℓ ∈ N (GP ∼

= Cℓ

2) , with ℓ = |Γ|

  • Lemma E ⇒ ∀x ∈ X GPx = X
  • Idempotency ⇒ ∀g ∈ GP

g−1 = g ⇒ ∀g, h ∈ GP, x ∈ X (gx = hx → h−1gx = x → hgx = x → hg = I → h = g−1= g) ⇒ the mapping GPx → GP given by gx → g is injective

  • ∀g, h ∈ GP, x ∈ X (g = h → gx = hx)

⇒ the mapping gx → g is surjective

  • ⇒ the mapping gx → g is a bijection
  • ⇒ |GPx| = |GP|
  • ⇒ ∀x ∈ X

|X| = |GPx| = |GP| = 2|Γ| [L. et al. 2013]

95

slide-96
SLIDE 96

Symmetry-aware BP

  • Don’t need to explore all branches of BP tree
  • Build Γ as a pre-processing step
  • Run BP, terminating as soon as |X| = 1
  • For each g ∈ GP , compute gx

[Mucherino et al. JBCB 2012]

96

slide-97
SLIDE 97

Complexity

  • Computing Γ: O(mn)
  • 1. initialize indicator vector ι = (ιK+1, . . . , ιn) for gv ∈ Γ
  • 2. initialize ι = 1
  • 3. for each {u, v} ∈ EP and w ∈ W uv let ιw = 0
  • BP: O(2n)
  • Compute gx for each g ∈ GP: O(2|Γ|)
  • Overall: O(2n)
  • Gains depend on the instance

97

slide-98
SLIDE 98

Tractability of protein instances

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

[L. et al. 2013]

98

slide-99
SLIDE 99

Let’s handle the BP tree

1 2 3 4 29 5 17 6 13 7 12 8 9 10 11 14 15 16 18 22 19 21 20 23 24 25 26 27 28 30 42 31 38 32 37 33 34 35 36 39 40 41 43 47 44 46 45 48 49 50 51 52 53

Max depth: n, looks good! Aim to prove width is bounded

99

slide-100
SLIDE 100

Number of solutions at each BP tree level

Depends on range of longer pruning edge incident to level v

2 4 8 16 32 64 128 256 512 256 128 64 32 16 8 2 1 2 3 4 5 6 7 8 9 2 4 8 16 32 64 128 2 4 8 16 32 64 2 4 8 16 32 2 4 8 16 2 4 8 2 4 2 4 K+ Green path 8 nodes at level K + 4 {3, K+5} ∈ EP ⇒ gK+4, gK+5 ∈ Γ ⇒ no symmetry at levels K + 4 and K + 5 ⇒ only 4 nodes at level K + 5 no pruning edges pruning edges make graph K-trilaterative 100

slide-101
SLIDE 101

Periodic pruning edges

1 16 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17

11 5 6 7 8 9 2 4 8 16 32 64 128 256 512 256 128 64 32 16 8 2 2 4 8 16 32 64 128 2 4 8 16 32 64 2 4 8 16 32 2 4 8 16 2 4 8 2 4 2 4 12 10 ... 4

  • 2ℓ growth up to level ℓ, then constant: O(2ℓn) nodes in BP tree
  • BP is Fixed-Parameter Tractable (FPT) in a bunch of cases
  • For all tested protein backbones, ℓ ≤ 5 ⇒ BP linear on proteins!

101

slide-102
SLIDE 102

The story so far

  • Nice applications, problem is hard, could have many solutions
  • Continuous methods don’t scale
  • If certain vertex orders are present, use mixed-combinatorial methods
  • Realize K-trilaterative in polytime but (K − 1)-trilaterative are

hard

  • If adjacent predecessors are immediate, theory of symmetries
  • Number of solutions is a power of two
  • For proteins, BP is linear time
  • How do we find these vertex orders?

102

slide-103
SLIDE 103

Finding vertex orders

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

[Cassioli et al., DAM]

103

slide-104
SLIDE 104

. . . wasn’t the backbone providing them?

  • NMR data not as clean as I pretended
  • Have to mess around with side chains
  • What about other applications, anyhow?

Methods for finding trilaterative orders automatically

104

slide-105
SLIDE 105

Mostly bad news

  • Finding K-trilaterative orders is NP-complete

:-(

  • But also FPT

:-)

  • Finding KDMDGP orders is NP-complete for all K

:-(

  • It’s also really hard in practice, and methods don’t scale well

105

slide-106
SLIDE 106

Definitions

  • Trilateration Ordering Problem (TOP)

Given a connected graph G = (V, E) and a positive integer K, does G have a K-trilateration order?

  • Contiguous Trilateration Ordering Problem (CTOP)

Given a connected graph G = (V, E) and a positive integer K, does G have a (K −1)-trilateration order such that Uv = {v − 1, . . . , v − K} for each v > K? Both problems are in NP

106

slide-107
SLIDE 107

Hardness of TOP

  • Essentially due to finding the initial clique

– brute force: test all

n

K

  • subsets of V

n

K

  • is O(nK), polytime if K fixed
  • Reduction from K-Clique problem:

Given a graph, does it have a K-clique?

[Mucherino et al., OPTL 2012]

107

slide-108
SLIDE 108

Reduction from K-Clique

reduction

G G′ = G ⊕ K

  • If K-Clique instance is YES

– start with α = (initial clique of G, K) – induction: if αv−1 defined, pick αv at shortest path distance 1 from α

  • If K-Clique instance is NO

– By contradiction: suppose ∃ trilateration order α in G′ – Initial clique α[K] = (α1, . . . , αK) must have K − 1 vertices in G, 1 in K – αK+1 must be in G, hence ∃ K-clique in G [Cassioli et al., DAM]

108

slide-109
SLIDE 109

Once the initial clique is known

Greedily grow a trilateration order α

  • Initialize α with initial K-clique K
  • Let W = V K
  • ∀v > K av = |vertices in K adjacent to v|

// at termination, av will be the number of adjacent predecessors of v

  • While W = ∅:
  • 1. choose v ∈ W with largest av
  • 2. if av < K instance is NO
  • 3. α ← (α, v)
  • 4. for all u ∈ W adjacent to v, increase au
  • 5. W ← W {v}
  • Instance is YES

[Mucherino et al., OPTL 2012]

109

slide-110
SLIDE 110

Greedy algorithm is correct

  • Assume TOP instance is YES, proceed by induction

– start: by maximality, aK+1 > K – assume α is a valid TOP up to v − 1, suppose av < K – but instance is YES so there is another z ∈ W with az ≥ K – contradicts maximality of av

  • Assume TOP instance is NO

– algorithm termination at W = ∅ contradicts the NO – hence it must terminate with W = ∅ and “NO” answer

110

slide-111
SLIDE 111

Complexity

  • Outer while loop: O(n)
  • Choice of largest av: O(n)
  • Inner loop on W: O(n)
  • Overall: O(n2)
  • If we add brute force initial clique: O(2Kn2)
  • Polytime if K fixed, FPT otherwise

111

slide-112
SLIDE 112

CTOP is hard

  • Reduction from Hamiltonian Path (HP)

Given a graph G, does it have a path passing through each vertex exactly once?

  • α a H. path in G ⇒ ∀v = 1, n αv is adjacent to αv−1, αv+1
  • Apart from initial 1-clique α1

every αv is adjacent to its immediate predecessor

  • ⇒ α is a KDMDGP order in G with K = 1
  • HP is the same as KDMDGP with K = 1
  • ⇒ By inclusion, KDMDGP is NP-hard

[Cassioli et al., DAM]

112

slide-113
SLIDE 113

CTOP is hard for all K

  • Reduction from HP

reduction

1 2 3 C1 C2 C3 C12 C13

  • Technical proof

[Cassioli et al., DAM]

113

slide-114
SLIDE 114

How do we find KDMDGP orders?

Mathematical optimization & CPLEX

  • xvi = 1 iff vertex v has rank i in the order
  • Each vertex has a unique order rank:

∀v ∈ V

  • i∈¯

n

xvi = 1;

  • Each rank value is assigned a unique vertex:

∀i ∈ ¯ n

  • v∈V

xvi = 1;

  • There must be an initial K-clique:

∀v ∈ V, i ∈ {2, . . . , K}

  • u∈N(v)
  • j<i

xuj ≥ (i − 1)xvi;

  • Each vertex with rank > K must have at least K contiguous adjacent predecessors

∀v ∈ V, i > K

  • u∈N(v)
  • i−K≤j<i

xuj ≥ Kxvi.

  • Do not expect too much; scales up to 100 vertices

114

slide-115
SLIDE 115

How about those 10k-atom backbones?

We have Carlile for those

  • Note the repetitions — they serve a purpose!
  • Repetition orders are also hard to find for any K
  • . . . but Carlile knows how to handcraft them!

[Lavor et al. JOGO 2013]

115

slide-116
SLIDE 116

And what about the side-chains?

The Carlile+Antonio tool!

[Costa et al. JOGO, submitted]

116

slide-117
SLIDE 117

Approximate realizations

  • 1. Applications
  • 2. Definition
  • 3. Complexity primer
  • 4. Complexity of the DGP
  • 5. Number of solutions
  • 6. Mathematical optimization formulations
  • 7. Realizing complete graphs
  • 8. The Branch-and-Prune algorithm
  • 9. Symmetry in the KDMDGP
  • 10. Tractability of protein instances
  • 11. Finding vertex orders
  • 12. Approximate realizations

117

slide-118
SLIDE 118

Data errors

The “distance = real number” paradigm is a lie!

  • Covalent bonds are fairly precise
  • NMR data is a mess [Berger, J. ACM 1999]

– experimental errors yield intervals [dL

uv, dU uv]

– NMR outputs frequencies of (atom type pair, distance value) weighted graph reconstruction yields systematic error – some atom type pairs yield more error (“only trust H—H”)

  • Properties of specific molecules give rise to other constraints
  • The protein graph may not be (K − 1)-trilaterative based on

the backbone

118

slide-119
SLIDE 119

The Lavorder comes to the rescue!

  • Carlile’s handcrafted repetition orders properties:

– repetitions allow a “virtual backbone” of H atoms only – discretization edges: {v, v − i} covalent bonds for i ∈ {1, 2}, {v, v − 3} sometimes covalent sometimes from NMR – most NMR data restricted to pruning edges

  • When dv,v−3 is an interval: intersect two spheres with sph. shell

dL dU

  • Discretize circular segments and run BP with modified S

Algorithm no longer exhaustive

119

slide-120
SLIDE 120

Die Symmetriktheoried¨ ammerung

  • Intervals and discretization break the theory of symmetries
  • duv
  • Only some bounds for the number b of BP solutions:

∃ℓ, k 2ℓqk ≤ b ≤ 2n−3qM

q = |discretization points|, M = |NMR discretization edges|

120

slide-121
SLIDE 121

But at least it’s producing results

Joint work with Institut Pasteur

[Cassioli et al., BMC Bioinf., submitted]

121

slide-122
SLIDE 122

General approximate methods

  • All these methods are specialized to

protein distance data from NMR

  • What about general approximate methods?
  • Assume large-sized input data with errors
  • No assumptions on graph structure

122

slide-123
SLIDE 123

Ingredients

  • PDM = Partial Distance Matrix (a representation of G)
  • EDM = Euclidean Distance Matrix
  • 1. Complete the given PDM d to a symmetric matrix D
  • 2. Find a realization x (in some dimension ¯

K) s.t. the EDM (xu − xv) is “close” to D

  • 3. Project x from dimension ¯

K to dimension K, keeping pairwise distances approximately equal

123

slide-124
SLIDE 124

Completing the distance matrix

  • ∀{u, v} ∈ E let Duv = length of the shortest path u → v
  • Use Floyd-Warshall’s algorithm O(n3)

1: // n × n array Dij to store distances 2: D = 0 3: for {i, j} ∈ E do 4:

Dij = dij

5: end for 6: for k ∈ V do 7:

for j ∈ V do

8:

for i ∈ V do

9:

if Dik + Dkj < Dij then

10:

// Dij fails to satisfy triangle inequality, update

11:

Dij = Dik + Dkj

12:

end if

13:

end for

14:

end for

15: end for

124

slide-125
SLIDE 125

Finding a realization

  • Let’s give ourselves many dimensions, say ¯

K = n

  • Attempt to find x : V → Rn with (xu − xv2) ≈ (Duv)
  • If we had the Gram matrix B of x, then:
  • 1. find eigen(value/vector) matrices Λ, Y of B
  • 2. since B is PSD, Λ ≥ 0 ⇒

√ Λ exists

  • 3. ⇒ B = Y ΛY ⊤ = (Y

√ Λ)(Y √ Λ)⊤

  • 4. x = Y

√ Λ is such that xx⊤ = B

  • Can we compute B from D?

125

slide-126
SLIDE 126

Schoenberg’s theorem

  • Standard method for computing B from D2
  • Also known as classic MultiDimensional Scaling (MDS)
  • Apply many algebraic manipulations to

d2

uv = xu − xv2 = xu⊤xu + xv⊤xv − 2xu⊤xv

where the centroid

  • k≤n

xuk = 0 for all u ≤ n

  • Get B = −1

2(In − 1 n1n)D2(In − 1 n1n), i.e.

xu · xv = 1 2n

  • k≤n

(d2

uk + d2 kv) − d2 uv −

1 2n2

  • h≤n

k≤n

d2

hk

  • D “approximately” EDM ⇒ B “approximately” Gram

[Schoenberg, Annals of Mathematics, 1935]

126

slide-127
SLIDE 127

Project to RK for a given K

  • Only use the K largest eigenvalues of Λ
  • Y [K] = K columns of Y corresp. to K largest eigenvalues
  • Λ[K] = K largest eigenvalues of Λ on diagonal
  • x = Y [K]
  • Λ[K] is a K × n matrix
  • Y [K] span the subspace where x “fills more space”, i.e. neglecting other

dimensions causes smaller errors w.r.t. the realization in Rn

This method is called Principal Component Analysis (PCA)

127

slide-128
SLIDE 128

Isomap

Given K and PDM d:

  • 1. D = FloydWarshall(d)
  • 2. B = MDS(D)
  • 3. x = PCA(B, K)

− → [Tenenbaum et al. Science 2000]

128

slide-129
SLIDE 129

Some references

  • L. Liberti, C. Lavor, N. Maculan, A. Mucherino, Euclidean distance geometry

and applications, SIAM Review, 56(1):3-69, 2014

  • L. Liberti, B. Masson, J. Lee, C. Lavor, A. Mucherino, On the number
  • f realizations of certain Henneberg graphs arising in protein conformation,

Discrete Applied Mathematics, 165:213-232, 2014

  • L. Liberti, C. Lavor, A. Mucherino, The discretizable molecular distance geometry

problem seems easier on proteins, in [see below], 47-60

  • A. Mucherino, C. Lavor, L. Liberti, N. Maculan (eds.), Distance Geometry:

Theory, Methods and Applications, Springer, New York, 2013

THE END