SLIDE 1

Distance Geometry in Data Science

Leo Liberti, CNRS LIX Ecole Polytechnique liberti@lix.polytechnique.fr CNMAC 2017

1 / 160

slide-2
SLIDE 2

Line of reasoning for this talk

  • 1. Graphs and weighted graphs necessary to model data
  • 2. Computers can “reason by analogy” (clustering)
  • 3. Clustering on vectors allows more flexibility
  • 4. Need to embed (weighted) graphs into Euclidean spaces
  • 5. High dimensions make clustering expensive/unstable
  • 6. Use random projections to reduce dimensions

2 / 160

slide-3
SLIDE 3

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

3 / 160

slide-4
SLIDE 4

Reasoning

The philosophical motivation to distance geometry

4 / 160

slide-5
SLIDE 5

Modes of rational thought

hypothesis

abduction

  • observation

induction

prediction

deduction

[Arist. 24a, Peirce CP, Putnam 79, Eco 83]

5 / 160

slide-6
SLIDE 6

Modes of rational thought

All humans are mortal (hypothesis), Socrates is human (prediction), hence Socrates is mortal (observation)

◮ deduction (hypothesis + prediction → observation)

TRUTH; logician, mathematician

◮ induction (observation + prediction → hypothesis)

CAUSALITY; physicist, chemist, biologist

◮ abduction (hypothesis + observation → prediction)

PLAUSIBILITY; everyone else

6 / 160

slide-7
SLIDE 7

Abduction might infer falsehoods

◮ Peirce:

  • 1. All beans in this bag are white
  • 2. There is a white bean next to this bag
  • 3. The bean was in the bag
  • 4. but what if the bean wasn’t in the bag?

◮ Sherlock Holmes wannabe:

  • 1. People who walk in the park have their shoes full of dirt
  • 2. John’s shoes are dirty
  • 3. John walked in the park
  • 4. but what if John did not walk in the park?

Only deduction infers truth

[Desclés, Jackiewicz 2006]

7 / 160

slide-8
SLIDE 8

Statistician’s abduction

observation

hypothesis1 → prediction1
hypothesis2 → prediction2
hypothesis3 → prediction3
hypothesis4 → prediction4
hypothesis5 → prediction5

◮ Evaluate P(observation | hypothesisi → predictioni) ∀i
◮ Choose the inference i with largest probability

8 / 160

slide-9
SLIDE 9

Example

Observation: white bean beside bag

bag of white beans → bean was in bag: 0.3
white bean field close by → bean came from field: 0.25
farmer market yesterday → bean came from market: 0.1
kid was playing with beans → kid lost a bean: 0.15
UFOs fueled with beans → bean clearly a UFO sign: 0.2

◮ Repeat experiences, collect data
◮ Probability distribution ⇐ frequency / personal conviction

9 / 160

slide-10
SLIDE 10

Compare different observations

hypothesis → prediction: P(· | white bean beside bag), P(· | red bean beside bag)

bag of white beans → bean was in bag: 0.3, 0.01
white bean field close by → bean came from field: 0.25, 0.01
farmer market yesterday → bean came from market: 0.1, 0.49
kid was playing with beans → kid lost a bean: 0.15, 0.29
UFOs fueled with beans → bean clearly a UFO sign: 0.2, 0.2

◮ Repeat experiences, collect data
◮ Probability distribution ⇐ frequency / personal conviction

10 / 160

slide-11
SLIDE 11

Subsection 1 Relations, graphs, distances

11 / 160

slide-12
SLIDE 12

Modelling a good prediction

Observation graph

◮ set V of observations ◮ set I of inferences (hypotheses ∧ predictions) ◮ ∀v ∈ V get probability distribution P v on I ◮ relation E if u, v ∈ V have similar distributions on I ◮ F = (V, E): observation graph ◮ relation ∼ if h, k ∈ I not contradictory ◮ Densest subgraphs U with every hu ∼ ku (for u ∈ U)

richest observation sets with non-contradictory inferences

Think of Sherlock Holmes: set of clues compatible with most likely consistent explanations

12 / 160

slide-13
SLIDE 13

Example

◮ V = {u, v} where u = white bean, v = red bean

lots of beans (both red and white) found next to all-white bean bag

◮ largest combined probabilities:

  • 1. farmer market: 0.59
  • 2. kid playing: 0.34
  • 3. UFO fuel: 0.4

◮ UFO hovering above market square → farmer market disbanded

⇒ 1 ∼ 2 ∧ 2 ∼ 3 but ¬(1 ∼ 3)

◮ Observation graph:

◮ P(u ∪ v | 1 ∨ 2) = 0.93 > 0.74 = P(u ∪ v | 2 ∨ 3) ◮ ⇒ U = V = {u, v}, E = {{u, v}} ◮ with scaled edge weight 0.93/(0.93 + 0.74) = 0.55 13 / 160

slide-14
SLIDE 14

Where we are and where we are going

◮ Relations on observations

encoding most likely compatible predictions ⇒ graphs

◮ Similarity

probability / magnitude / intensity ⇒ weighted graphs

◮ “Machine intelligence” by analogy:

clustering in graphs

◮ More refined clustering techniques?

◮ pull in tools from linear algebra ◮ work with vectors rather than graphs ◮ Euclidean embeddings of weighted graphs

◮ Distances lose “resolution” in high dimensions ◮ Project into lower dimensional spaces

14 / 160

slide-15
SLIDE 15

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

15 / 160

slide-16
SLIDE 16

Clustering

“Machine intelligence”: analogy based on proximity

16 / 160

slide-17
SLIDE 17

Subsection 1 Clustering in graphs

17 / 160

slide-18
SLIDE 18

Example graph

◮ Goal: find partition in densest subgraphs

18 / 160

slide-19
SLIDE 19

Modularity clustering

“Modularity is the fraction of the edges that fall within a cluster minus the expected fraction if edges were distributed at random.”

◮ “at random” = random graphs over same degree sequence
◮ degree sequence = (k1, . . . , kn) where ki = |N(i)|
◮ “expected” = all possible “half-edge” recombinations
◮ expected edges between u, v: kukv/(2m) where m = |E|
◮ mod(u, v) = Auv − kukv/(2m)
◮ mod(G) = ∑_{{u,v}} mod(u, v) xuv
  where xuv = 1 if u, v in the same cluster and 0 otherwise
◮ “Natural extension” to weighted graphs: ku = ∑_v Auv, m = ∑_{u,v} Auv

[Girvan & Newman 2002]
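To make this concrete, here is a minimal numpy sketch (my own, assuming numpy is available; the helper name modularity_sum is mine) computing the slide's mod(G) for a given cluster assignment:

    import numpy as np

    def modularity_sum(A, labels):
        # mod(G) as above: sum of mod(u,v) over pairs {u,v} in the same cluster
        # (Newman's normalised modularity divides this quantity by 2m)
        A = np.asarray(A, dtype=float)
        k = A.sum(axis=1)                      # (weighted) degree sequence
        two_m = A.sum()                        # equals 2m for an undirected graph
        same = np.equal.outer(labels, labels)  # same[u,v] = True iff u,v share a cluster
        M = (A - np.outer(k, k) / two_m) * same
        return np.triu(M, 1).sum()             # sum over unordered pairs u < v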

slide-20
SLIDE 20

Use modularity to define clustering

◮ What is the “best clustering”?
◮ Maximize discrepancy between actual and expected
  “as far away as possible from average”

  max ∑_{{u,v}∈E} mod(u, v) xuv
  ∀u ∈ V, v ∈ V   xuv ∈ {0, 1}

◮ Issue: trivial solution x = 1 “one big cluster”
◮ Idea: treat clusters as cliques (even if zero weight),
  then clique partitioning constraints for transitivity
  (if i, j ∈ C and j, k ∈ C then i, k ∈ C):

  ∀i < j < k   xij + xjk − xik ≤ 1
  ∀i < j < k   xij − xjk + xik ≤ 1
  ∀i < j < k   −xij + xjk + xik ≤ 1
  ∀{i, j} ∈ E   xij =

[Aloise et al. 2010]

slide-21
SLIDE 21

Maximizing the modularity of a graph

◮ Formulation above is a Mathematical Program (MP)

◮ MP is a formal language for describing optimization problems ◮ each MP consists of: ◮ parameters (input) ◮ decision variables (output) ◮ objective function(s) ◮ explicit and implicit constraints ◮ broad MP classification:

LP, SDP, cNLP, NLP, MILP, cMINLP, MINLP

◮ Modularity Maximization MP is a MILP ◮ MILP is NP-hard but ∃ technologically advanced solvers ◮ Otherwise, use (fast) heuristics ◮ This method decides the number of clusters

[Cafieri et al. 2014] 21 / 160

slide-22
SLIDE 22

The resulting clustering

22 / 160

slide-23
SLIDE 23

Subsection 2 Clustering in Euclidean spaces

23 / 160

slide-24
SLIDE 24

Minimum sum-of-squares clustering

◮ MSSC, a.k.a. the k-means problem ◮ Given points p1, . . . , pn ∈ Rm, find clusters C1, . . . , Cd

min ∑_{j≤d} ∑_{i∈Cj} ‖pi − centroid(Cj)‖²₂   where centroid(Cj) = (1/|Cj|) ∑_{i∈Cj} pi

◮ k-means alg.: given initial clustering C1, . . . , Cd
  1: ∀j ≤ d compute yj = centroid(Cj)
  2: ∀i ≤ n, j ≤ d if yj is the closest centroid to pi let xij = 1, else 0
  3: ∀j ≤ d update Cj ← {pi | xij = 1 ∧ i ≤ n}
  4: repeat until stability

In “k-means”, “k” is the number of clusters, here denoted by d (note that d is given); a short sketch follows below.

[MacQueen 1967, Aloise et al. 2012]
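A minimal numpy sketch of the iteration above (my own; the initialization by random sampling is an assumption, and only numpy is required):

    import numpy as np

    def kmeans(P, d, iters=100, seed=0):
        # P: n x m array of points (rows), d: number of clusters (the "k" of k-means)
        rng = np.random.default_rng(seed)
        Y = P[rng.choice(len(P), d, replace=False)]          # initial centroids
        for _ in range(iters):
            # step 2: assign each point to its closest centroid
            labels = np.argmin(((P[:, None, :] - Y[None, :, :])**2).sum(-1), axis=1)
            # step 3: recompute centroids (keep old centroid if a cluster empties)
            newY = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else Y[j]
                             for j in range(d)])
            if np.allclose(newY, Y):                          # step 4: stability
                break
            Y = newY
        return labels, Y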

24 / 160

slide-25
SLIDE 25

MP formulation

min_{x,y,s} ∑_{i≤n} ∑_{j≤d} ‖pi − yj‖²₂ xij
∀j ≤ d   (1/sj) ∑_{i≤n} pi xij = yj
∀i ≤ n   ∑_{j≤d} xij = 1
∀j ≤ d   ∑_{i≤n} xij = sj
∀j ≤ d   yj ∈ Rm
x ∈ {0, 1}^{nd},  s ∈ N^d                (MSSC)

MINLP: nonconvex terms; continuous, binary and integer variables

25 / 160

slide-26
SLIDE 26

Reformulations

The (MSSC) formulation has the same optima as:

min_{x,y,P} ∑_{i≤n} ∑_{j≤d} Pij xij
∀i ≤ n, j ≤ d   ‖pi − yj‖²₂ ≤ Pij
∀j ≤ d   ∑_{i≤n} pi xij = ∑_{i≤n} yj xij
∀i ≤ n   ∑_{j≤d} xij = 1
∀j ≤ d   yj ∈ [min_{i≤n} pia, max_{i≤n} pia]   (componentwise over the coordinates a)
x ∈ {0, 1}^{nd},  P ∈ [0, P^U]^{nd}

◮ Only nonconvexities: products of bounded continuous variables by binary variables
◮ Caveat: cannot have empty clusters

26 / 160

slide-27
SLIDE 27

Products of binary and continuous vars.

◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0, 1} and y ∈ [0, 1] is bounded
◮ Replace xy by a new variable z (means “either z = 0 or z = y”)
◮ Adjoin the following constraints:

  z ∈ [0, 1]
  y − (1 − x) ≤ z ≤ y + (1 − x)
  −x ≤ z ≤ x

◮ ⇒ Everything’s linear now! (see the small check below) [Fortet 1959]
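A tiny plain-Python check (mine, not from the slides) that the constraints above do force z = xy for binary x and y ∈ [0, 1]:

    # For each (x, y), the feasible interval for z collapses to the single point x*y.
    for x in (0, 1):
        for y in (0.0, 0.3, 1.0):
            lo = max(y - (1 - x), -x, 0.0)   # largest lower bound on z
            hi = min(y + (1 - x), x, 1.0)    # smallest upper bound on z
            assert abs(lo - hi) < 1e-9 and abs(lo - x * y) < 1e-9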

27 / 160

slide-28
SLIDE 28

Products of binary and continuous vars.

◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0, 1} and y ∈ [yL, yU] is bounded
◮ Replace xy by a new variable z (means “either z = 0 or z = y”)
◮ Adjoin the following constraints:

  z ∈ [min(yL, 0), max(yU, 0)]
  y − (1 − x) max(|yL|, |yU|) ≤ z ≤ y + (1 − x) max(|yL|, |yU|)
  −x max(|yL|, |yU|) ≤ z ≤ x max(|yL|, |yU|)

◮ ⇒ Everything’s linear now! [L. et al. 2009]

28 / 160

slide-29
SLIDE 29

MSSC is a convex MINLP

min_{x,y,P,χ,ξ} ∑_{i≤n} ∑_{j≤d} χij
∀i ≤ n, j ≤ d   0 ≤ χij ≤ Pij
∀i ≤ n, j ≤ d   Pij − (1 − xij) P^U ≤ χij ≤ xij P^U
∀i ≤ n, j ≤ d   ‖pi − yj‖²₂ ≤ Pij      ⇐ convex
∀j ≤ d   ∑_{i≤n} pi xij = ∑_{i≤n} ξij
∀i ≤ n, j ≤ d   yj − (1 − xij) max(|yL|, |yU|) ≤ ξij ≤ yj + (1 − xij) max(|yL|, |yU|)
∀i ≤ n, j ≤ d   −xij max(|yL|, |yU|) ≤ ξij ≤ xij max(|yL|, |yU|)
∀i ≤ n   ∑_{j≤d} xij = 1
∀j ≤ d   yj ∈ [yL, yU]
x ∈ {0, 1}^{nd},  P ∈ [0, P^U]^{nd},  χ ∈ [0, P^U]^{nd},  ∀i ≤ n, j ≤ d  ξij ∈ [min(yL, 0), max(yU, 0)]

yj, ξij, yL, yU are vectors in Rm

29 / 160

slide-30
SLIDE 30

Solving the MSSC

◮ k-means

◮ heuristic (optimum not guaranteed) ◮ fast, well-known, lots of analyses ◮ scales reasonably well ◮ implemented in practically all languages

◮ convex MINLP

◮ exact (guaranteed global optima) ◮ reasonably fast only for small sizes ◮ scales exponentially ◮ Solvers: KNITRO (commercial), Bonmin (free)

need an MP language interpreter (AMPL)

30 / 160

slide-31
SLIDE 31

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

31 / 160

slide-32
SLIDE 32

Metric embeddings

Mapping metric spaces into each other

32 / 160

slide-33
SLIDE 33

Graphs with shortest path metric

E.g. mathematical genealogy skeleton

33 / 160

slide-34
SLIDE 34

The distance matrix

(only for a subgraph; shortest-path distances)

           Euler Thibaut Pfaff Lagrange Laplace Möbius Gudermann Dirksen Gauss
Kästner      10     1      1      9        8      2       2        2      2
Euler               11     9      1        3     10      12       12      8
Thibaut                    2     10       10      3       1        1      3
Pfaff                             8        8      1       3        3      1
Lagrange                                   2      9      11       11      7
Laplace                                           9      11       11      7
Möbius                                                    4        4      2
Gudermann                                                          2      4
Dirksen                                                                   4

D = the corresponding symmetric 10 × 10 matrix (zero diagonal), with rows and columns ordered Kästner, Euler, Thibaut, Pfaff, Lagrange, Laplace, Möbius, Gudermann, Dirksen, Gauss

34 / 160

slide-35
SLIDE 35

Subsection 1 Fréchet embeddings in ℓ∞

35 / 160

slide-36
SLIDE 36

ℓ∞ embeddings

◮ Given a metric space (X, d) with distance matrix D = (dij)
◮ Consider i-th row δi = (di1, . . . , din) of D
◮ Embed i ∈ X by vector δi ∈ Rn
◮ Define f(X) = {δ1, . . . , δn}, f(d(i, j)) = ‖f(i) − f(j)‖∞
◮ Thm.: (f(X), ℓ∞) is a metric space with distance matrix D
◮ If (X, d) not a metric space, ∃i, j ∈ X (d(i, j) ≠ ‖δi − δj‖∞)

◮ Issue: embedding is high-dimensional (Rn) [Kuratowski 1935]
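A short numpy sketch of the Fréchet embedding with a sanity check of the theorem on a small metric distance matrix (helper name is mine):

    import numpy as np

    def frechet_linf_embedding(D):
        # point i is embedded as the i-th row of D; distances are taken in the sup norm
        return np.asarray(D, dtype=float).copy()

    D = np.array([[0, 2, 3],
                  [2, 0, 4],
                  [3, 4, 0]], dtype=float)       # a small metric distance matrix
    X = frechet_linf_embedding(D)
    linf = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    assert np.allclose(linf, D)                   # l_infinity distances reproduce D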

36 / 160

slide-37
SLIDE 37

Proof

◮ Consider i, j ∈ X with distance d(i, j) = dij
◮ Then f(d(i, j)) = ‖δi − δj‖∞ = max_{k≤n} |dik − djk| ≤ max_{k≤n} |dij| = dij
  (|dik − djk| ≤ dij by the triangle inequality)
◮ max_{k≤n} |dik − djk| is achieved when k ∈ {i, j} ⇒ f(d(i, j)) = dij

37 / 160

slide-38
SLIDE 38

Genealogical similarity example

Fréchet embedding

in the 3 most significant (rotated) dimensions

38 / 160

slide-39
SLIDE 39

Subsection 2 Approximate embeddings in ℓ2

39 / 160

slide-40
SLIDE 40

Lower dimensional (approximate) embeddings

◮ Given distance matrix, find approximate Euclidean embedding ◮ Application: visualize a metric space

e.g. embed genealogy tree in R3 (some errors allowed)

◮ For visualization purposes, K ∈ {1, 2, 3}

for other purposes, K < n Classical methods

◮ Multi-Dimensional Scaling (MDS) ◮ Principal Component Analysis (PCA)

40 / 160

slide-41
SLIDE 41

Classic Multidimensional Scaling

41 / 160

slide-42
SLIDE 42

Definition and history

◮ [I. Schoenberg, Remarks to Maurice Fréchet’s article “Sur la

définition axiomatique d’une classe d’espaces distanciés vectoriellement applicable sur l’espace de Hilbert”, Ann. Math., 1935]

◮ Question: Given n × n symmetric matrix D, what are necessary and

sufficient conditions s.t. D is a EDM1 corresponding to n points x1, . . . , xn ∈ RK with minimum K?

◮ Main theorem:
  Thm. D = (dij) is an EDM iff ½ (d²1i + d²1j − d²ij | 2 ≤ i, j ≤ n) is PSD of rank K

◮ PSD: positive semidefinite, all eigenvalues ≥ 0

1Euclidean Distance Matrix

42 / 160

slide-43
SLIDE 43

Gram in function of EDM

◮ x = (x1, . . . , xn) ⊆ RK, written as n × K matrix
◮ matrix G = xx⊤ = (xi · xj) is the Gram matrix of x
◮ Schoenberg’s theorem: relation between EDMs and Gram matrices
  G = −½ J D² J    (§)
◮ where D² = (d²ij) and J = In − (1/n) 11⊤

43 / 160

slide-44
SLIDE 44

Multidimensional scaling (MDS)

◮ Often get approximate EDMs D̃ from raw data (dissimilarities, discrepancies, differences)
◮ G̃ = −½ J D̃² J is an approximate Gram matrix
◮ Approximate Gram ⇒ spectral decomposition G̃ = P Λ̃ P⊤ need not have Λ̃ ≥ 0
◮ Let Λ be the closest PSD diagonal matrix to Λ̃: zero the negative components of Λ̃
◮ x = P √Λ is an “approximate realization” of D̃
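A minimal numpy sketch of this classic MDS recipe (classic_mds is my name; only numpy is assumed):

    import numpy as np

    def classic_mds(D, K):
        # D: n x n (approximate) Euclidean distance matrix; returns an n x K realization
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ (D**2) @ J                  # (approximate) Gram matrix
        evals, P = np.linalg.eigh(G)               # eigenvalues in ascending order
        idx = np.argsort(evals)[::-1][:K]          # keep the K largest
        L = np.clip(evals[idx], 0, None)           # zero the negative components
        return P[:, idx] * np.sqrt(L)              # x = P sqrt(Lambda)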

44 / 160

slide-45
SLIDE 45

Classic MDS: Main result

  • 1. Prove Schoenberg’s theorem: G = −½ J D² J
  • 2. Prove that a matrix is Gram iff it is PSD

45 / 160

slide-46
SLIDE 46

Classic MDS: Proof 1/3

◮ Assume zero centroid WLOG (can translate x as needed)
◮ Expand:  d²ij = ‖xi − xj‖²₂ = (xi − xj)(xi − xj) = xi xi + xj xj − 2 xi xj   (∗)
◮ Aim at “inverting” (∗) to express xi xj in function of d²ij
◮ Sum (∗) over i:  ∑_i d²ij = ∑_i xi xi + n xj xj − 2 xj ∑_i xi, and the last sum is 0 by the zero centroid
◮ Similarly for j, and divide by n, to get:
  (1/n) ∑_{i≤n} d²ij = (1/n) ∑_{i≤n} xi xi + xj xj   (†)
  (1/n) ∑_{j≤n} d²ij = xi xi + (1/n) ∑_{j≤n} xj xj   (‡)
◮ Sum (†) over j, get:  (1/n) ∑_{i,j} d²ij = n (1/n) ∑_i xi xi + ∑_j xj xj = 2 ∑_i xi xi
◮ Divide by n, get:  (1/n²) ∑_{i,j} d²ij = (2/n) ∑_i xi xi   (∗∗)

[Borg 2010]

slide-47
SLIDE 47

Classic MDS: Proof 2/3

◮ Rearrange (∗), (†), (‡) as follows:
  2 xi xj = xi xi + xj xj − d²ij   (1)
  xi xi = (1/n) ∑_j d²ij − (1/n) ∑_j xj xj   (2)
  xj xj = (1/n) ∑_i d²ij − (1/n) ∑_i xi xi   (3)
◮ Replace the LHS of Eq. (2)-(3) in Eq. (1), get
  2 xi xj = (1/n) ∑_k d²ik + (1/n) ∑_k d²kj − d²ij − (2/n) ∑_k xk xk
◮ By (∗∗) replace (2/n) ∑_k xk xk with (1/n²) ∑_{h,k} d²hk, get
  2 xi xj = (1/n) ∑_k (d²ik + d²kj) − d²ij − (1/n²) ∑_{h,k} d²hk   (§)
  which expresses xi xj in function of D

47 / 160

slide-48
SLIDE 48

Classic MDS: Proof 3/3

◮ Gram ⊆ PSD
  ◮ x is an n × K real matrix
  ◮ G = xx⊤ its Gram matrix
  ◮ For each y ∈ Rn we have yGy⊤ = y(xx⊤)y⊤ = (yx)(x⊤y⊤) = (yx)(yx)⊤ = ‖yx‖²₂ ≥ 0
  ◮ ⇒ G ⪰ 0
◮ PSD ⊆ Gram
  ◮ Let G ⪰ 0 be n × n
  ◮ Spectral decomposition: G = P Λ P⊤ (P orthogonal, Λ ≥ 0 diagonal)
  ◮ Λ ≥ 0 ⇒ √Λ exists
  ◮ G = P Λ P⊤ = (P√Λ)(√Λ⊤ P⊤) = (P√Λ)(P√Λ)⊤
  ◮ Let x = P√Λ, then G is the Gram matrix of x

48 / 160

slide-49
SLIDE 49

Principal Component Analysis

49 / 160

slide-50
SLIDE 50

Principal Component Analysis (PCA)

◮ MDS with fixed K
◮ Motivation: “draw” x = P√Λ in 2D or 3D, but rank(Λ) = K > 3
◮ Only keep the 2 or 3 largest components of Λ, zero the rest
◮ Get realization in desired space

[Pearson 1901]

slide-51
SLIDE 51

Mathematicians genealogy in 2D/3D

In 2D In 3D

51 / 160

slide-52
SLIDE 52

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

52 / 160

slide-53
SLIDE 53

Distance Geometry

Embedding weighted graphs in Euclidean spaces

53 / 160

slide-54
SLIDE 54

Distance Geometry Problem (DGP)

Given K ∈ N and G = (V, E, d) with d : E → R+, find x : V → RK s.t.

  ∀{i, j} ∈ E   ‖xi − xj‖²₂ = d²ij

Given a weighted graph, draw it so that edges are drawn as segments with lengths = weights

[Cayley 1841, Menger 1928, Schoenberg 1935, Yemini 1978]

slide-55
SLIDE 55

Subsection 1 DGP applications

55 / 160

slide-56
SLIDE 56

Some applications

◮ clock synchronization (K = 1) ◮ sensor network localization (K = 2) ◮ molecular structure from distance data (K = 3) ◮ autonomous underwater vehicles (K = 3) ◮ distance matrix completion (whatever K)

56 / 160

slide-57
SLIDE 57

Clock synchronization

From [Singer, Appl. Comput. Harmon. Anal. 2011] Determine a set of unknown timestamps from a partial measurements of their time differences

◮ K = 1 ◮ V : timestamps ◮ {u, v} ∈ E if known time difference between u, v ◮ d: values of the time differences

Used in time synchronization of distributed networks

57 / 160

slide-58
SLIDE 58

Clock synchronization

58 / 160

slide-59
SLIDE 59

Sensor network localization

From [Yemini, Proc. CDSN, 1978] The positioning problem arises when it is necessary to locate a set of geographically distributed objects using measurements of the distances between some object pairs

◮ K = 2 ◮ V : (mobile) sensors ◮ {u, v} ∈ E iff distance between u, v is measured ◮ d: distance values

Used whenever GPS not viable (e.g. underwater) duv ∝ ∼ battery consumption in P2P communication betw. u, v

59 / 160

slide-60
SLIDE 60

Sensor network localization

60 / 160

slide-61
SLIDE 61

Molecular structure from distance data

From [L. et al., SIAM Rev., 2014]

◮ K = 3 ◮ V : atoms ◮ {u, v} ∈ E iff distance between u, v is known ◮ d: distance values

Used whenever X-ray crystallography does not apply (e.g. liquid)

Covalent bond lengths and angles known precisely Distances 5.5 measured approximately by NMR

61 / 160

slide-62
SLIDE 62

Subsection 2 Complexity of the DGP

62 / 160

slide-63
SLIDE 63

Complexity class

◮ DGP1 with d : E → Q+ is in NP
  ◮ if instance is YES, ∃ realization x ∈ Rn×1
  ◮ translate x so that some component xi ∈ Q
  ◮ consider some other xj
  ◮ let ℓ = length of a shortest path p : i → j = ∑_{{u,v}∈p} duv ∈ Q
  ◮ then xj = xi ± ℓ ⇒ xj ∈ Q
  ◮ ⇒ verification of ∀{i, j} ∈ E |xi − xj| = dij in polytime
◮ DGPK may not be in NP for K > 1
  don’t know how to verify ‖xi − xj‖₂ = dij for x ∈ QnK

63 / 160

slide-64
SLIDE 64

Hardness

◮ Want to show DGP1 is NP-hard by reduction from Partition
  Given a = (a1, . . . , an) ∈ Nn, is there I ⊆ {1, . . . , n} s.t. ∑_{i∈I} ai = ∑_{i∉I} ai ?
◮ a → cycle C with V(C) = {1, . . . , n}, E(C) = {{1, 2}, . . . , {n, 1}}
◮ For i < n let di,i+1 = ai, and dn,n+1 = dn1 = an
◮ E.g. for a = (1, 4, 1, 3, 3), get cycle graph

[Saxe, 1979]

64 / 160

slide-65
SLIDE 65

Partition is YES ⇒ DGP1 is YES (1/2)

◮ Given: I ⊆ {1, . . . , n} s.t. ∑_{i∈I} ai = ∑_{i∉I} ai
◮ Construct: realization x of C in R
  • 1. x1 = 0   // start
  • 2. induction step: suppose xi known
       if i ∈ I let xi+1 = xi + di,i+1   // go right
       else let xi+1 = xi − di,i+1   // go left
◮ Correctness proof: by the same induction,
  but careful when i = n: have to show xn+1 = x1

65 / 160

slide-66
SLIDE 66

Partition is YES ⇒ DGP1 is YES (2/2)

∑_{i∈I} (xi+1 − xi) = ∑_{i∈I} di,i+1 = ∑_{i∈I} ai   (1)
∑_{i∉I} ai = ∑_{i∉I} di,i+1 = ∑_{i∉I} (xi − xi+1)   (2)

(1) = (2)  ⇒  ∑_{i∈I} (xi+1 − xi) = ∑_{i∉I} (xi − xi+1)
           ⇒  ∑_{i≤n} (xi+1 − xi) = 0
           ⇒  (xn+1 − xn) + (xn − xn−1) + · · · + (x3 − x2) + (x2 − x1) = 0
           ⇒  xn+1 = x1

66 / 160

slide-67
SLIDE 67

Partition is NO ⇒ DGP1 is NO

◮ By contradiction: suppose DGP1 is YES, x a realization of C
◮ F = {{u, v} ∈ E(C) | xu ≤ xv},  E(C) ∖ F = {{u, v} ∈ E(C) | xu > xv}
◮ Trace x1, . . . , xn: follow edges in F (→) and in E(C) ∖ F (←)

  ∑_{{u,v}∈F} (xv − xu) = ∑_{{u,v}∉F} (xu − xv)
  ∑_{{u,v}∈F} |xu − xv| = ∑_{{u,v}∉F} |xu − xv|
  ∑_{{u,v}∈F} duv = ∑_{{u,v}∉F} duv

◮ Let J = {i < n | {i, i + 1} ∈ F} ∪ {n | {n, 1} ∈ F}
  ⇒ ∑_{i∈J} ai = ∑_{i∉J} ai
◮ So J solves the Partition instance, contradiction
◮ ⇒ DGP is NP-hard; since ∈ NP, DGP1 is also NP-complete

67 / 160

slide-68
SLIDE 68

Subsection 3 Number of solutions

68 / 160

slide-69
SLIDE 69

Number of solutions

◮ (G, K): DGP instance ◮

˜ X ⊆ RKn: set of solutions

◮ Congruence: composition of translations, rotations, reflections ◮ C = set of congruences in RK ◮ x ∼ y means ∃ρ ∈ C (y = ρx):

distances in x are preserved in y through ρ

◮ ⇒ if | ˜

X| > 0, | ˜ X| = 2ℵ0

69 / 160

slide-70
SLIDE 70

Number of solutions modulo congruences

◮ Congruence is an equivalence relation ∼ on ˜

X (reflexive, symmetric, transitive)

◮ Partitions ˜

X into equivalence classes

◮ X = ˜

X/∼: sets of representatives of equivalence classes

◮ Focus on |X| rather than | ˜

X|

70 / 160

slide-71
SLIDE 71

Examples

71 / 160

slide-72
SLIDE 72

Rigidity, flexibility and |X|

◮ infeasible ⇔ |X| = 0 ◮ rigid graph ⇔ |X| < ℵ0 ◮ globally rigid graph ⇔ |X| = 1 ◮ flexible graph ⇔ |X| = 2ℵ0 ◮ DMDGP graphs ⇔ |X| a power of 2 ◮ |X| = ℵ0: impossible by Milnor’s theorem [Milnor 1964, L. et al. 2013]

72 / 160

slide-73
SLIDE 73

Milnor’s theorem implies |X| ≠ ℵ0

◮ System S of polynomial equations of degree 2

∀i ≤ m pi(x1, . . . , xnK) = 0

◮ Let X be the set of x ∈ RnK satisfying S ◮ Number of connected components of X is O(3nK)

[Milnor 1964]

◮ If |X| is countably ∞ then G cannot be flexible

⇒ incongruent elts of X are separate connected components ⇒ by Milnor’s theorem, there’s finitely many of them

73 / 160

slide-74
SLIDE 74

Subsection 4 MP based solution methods

74 / 160

slide-75
SLIDE 75

Direct methods

75 / 160

slide-76
SLIDE 76

System of quadratic equations

∀{u, v} ∈ E   ‖xu − xv‖² = d²uv   (4)

Computationally: useless (less than 10 vertices with K = 3 using Octave)

76 / 160

slide-77
SLIDE 77

Unconstrained Global Optimization

min_x ∑_{{u,v}∈E} (‖xu − xv‖² − d²uv)²   (5)

Globally optimal obj. fun. value of (5) is 0 iff x solves (4)

Computational experiments in [L. et al., 2006]:

◮ GO solvers from 10 years ago ◮ randomly generated protein data: ≤ 50 atoms ◮ cubic crystallographic grids: ≤ 64 atoms
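A hedged sketch of formulation (5) with a generic local optimizer (mine; assumes numpy and scipy, and only finds a local optimum in general — the toy instance and names are illustrative):

    import numpy as np
    from scipy.optimize import minimize

    def go_penalty(xflat, E, d2, n, K):
        # sum over edges of (||x_u - x_v||^2 - d_uv^2)^2
        x = xflat.reshape(n, K)
        return sum((np.sum((x[u] - x[v])**2) - d2[(u, v)])**2 for (u, v) in E)

    # toy instance: a unit square in K = 2 (edge list with squared lengths)
    n, K = 4, 2
    E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    d2 = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1, (0, 2): 2}
    res = minimize(go_penalty, np.random.default_rng(0).normal(size=n * K),
                   args=(E, d2, n, K))
    print(res.fun)   # ~0 for a YES instance, up to local optima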

77 / 160

slide-78
SLIDE 78

Constrained global optimization

◮ min_x ∑_{{u,v}∈E} | ‖xu − xv‖² − d²uv |  exactly reformulates (4)
◮ Relax objective f to concave part, remove constant term, rewrite min −f as max f
◮ Reformulate convex part of obj. fun. to convex constraints
◮ Exact reformulation:

  max_x ∑_{{u,v}∈E} ‖xu − xv‖²
  ∀{u, v} ∈ E   ‖xu − xv‖² ≤ d²uv        (6)

Theorem (Activity)
At a glob. opt. x∗ of a YES instance, all constraints of (6) are active

[Mencarelli et al. 2017]

78 / 160

slide-79
SLIDE 79

Semidefinite Programming

79 / 160

slide-80
SLIDE 80

Linearization

∀{i, j} ∈ E   ‖xi − xj‖²₂ = d²ij
⇒ ∀{i, j} ∈ E   ‖xi‖²₂ + ‖xj‖²₂ − 2 xi · xj = d²ij
⇒ ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij

where X = x x⊤, i.e. ∀i, j  Xij = xi xj

80 / 160

slide-81
SLIDE 81

Relaxation

X = x x⊤  ⇒  X − x x⊤ = 0   (relax)  ⇒  X − x x⊤ ⪰ 0

◮ Schur(X, x) = ( IK  x⊤ ; x  X ) ⪰ 0
◮ If x does not appear elsewhere ⇒ get rid of it (e.g. choose x = 0):
  replace Schur(X, x) ⪰ 0 by X ⪰ 0

Reason for this weird relaxation: there are efficient solvers for Semidefinite Programming (SDP)

81 / 160

slide-82
SLIDE 82

SDP relaxation

min F • X
∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
X ⪰ 0

  • How do we choose F?
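A sketch of this SDP relaxation in cvxpy (an assumption: cvxpy and a conic solver such as SCS are installed; dgp_sdp and realize are my names, and trace(X) — one of the choices discussed on the next slide — is used as a default F):

    import numpy as np
    import cvxpy as cp

    def dgp_sdp(n, E, d2, F=None):
        # E: list of edges (i, j); d2[(i, j)]: squared distance; F: objective matrix
        X = cp.Variable((n, n), PSD=True)
        cons = [X[i, i] + X[j, j] - 2 * X[i, j] == d2[(i, j)] for (i, j) in E]
        obj = cp.Minimize(cp.trace(X) if F is None else cp.trace(F @ X))
        cp.Problem(obj, cons).solve()
        return X.value

    def realize(X, K):
        # PCA on the Gram matrix X: keep the K largest eigenvalues, zero the rest
        w, P = np.linalg.eigh(X)
        idx = np.argsort(w)[::-1][:K]
        return P[:, idx] * np.sqrt(np.clip(w[idx], 0, None))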

82 / 160

slide-83
SLIDE 83

Some possible objective functions

◮ For protein conformation (“push-and-pull” the realization):
  max ∑_{{i,j}∈E} (Xii + Xjj − 2Xij)
  with = changed to ≤ in the constraints (or min and ≥)   [Dias & L. 2016]
◮ [Ye, 2003], application to wireless sensor localization:
  min tr(X)   (improves covariance estimator accuracy)
◮ How about “just random”? [Barvinok 1995]

83 / 160

slide-84
SLIDE 84

Computational evaluation

◮ Download protein files from Protein Data Bank (PDB)

they contain atom realizations

◮ Mimick a Nuclear Magnetic Resonance experiment

Keep only pairwise distances < 5.5

◮ Try and reconstruct the protein shape from those weighted graphs
◮ Quality evaluation of results:
  ◮ LDE(x) = max_{{i,j}∈E} | ‖xi − xj‖ − dij |
  ◮ MDE(x) = (1/|E|) ∑_{{i,j}∈E} | ‖xi − xj‖ − dij |

84 / 160

slide-85
SLIDE 85

Objective function tests

SDP solved with Mosek

SDP + PCA

Name           |V|  |E|   LDE (PP/Ye/Rnd)     MDE (PP/Ye/Rnd)   CPU (PP/Ye/Rnd)
C0700odd.1      15   39   3.31/4.57/4.44      1.92/2.52/2.50    0.13/0.07/0.08
C0700odd.C      36  242   10.61/4.85/4.85     3.02/3.02/3.02    0.69/0.43/0.44
C0700.odd.G     36  308   4.57/4.77/4.77      2.41/2.84/2.84    0.86/0.54/0.54
C0150alter.1    37  335   4.66/4.88/4.86      2.52/3.00/3.00    0.97/0.59/0.58
C0080create.1   60  681   7.17/4.86/4.86      3.08/3.19/3.19    2.48/1.46/1.46
tiny            37  335   4.66/4.88/4.88      2.52/3.00/3.00    0.97/0.60/0.60
1guu-1         150  959   10.20/4.93/4.93     3.43/3.43/3.43    9.23/5.68/5.70

SDP + PCA + NLP

Name           |V|  |E|   LDE (PP/Ye/Rnd)     MDE (PP/Ye/Rnd)   CPU (PP/Ye/Rnd)
1b03            89  456   0.00/0.00/0.00      0.00/0.00/0.00    8.69/6.28/9.91
1crn           138  846   0.81/0.81/0.81      0.07/0.07/0.07    33.33/31.32/44.48
1guu-1         150  959   0.97/4.93/0.92      0.10/3.43/0.08    56.45/7.89/65.33

[Dias & L., 2016]

85 / 160

slide-86
SLIDE 86

Empirical considerations

◮ Ye very fast but often imprecise
◮ Random good but nondeterministic
◮ Push-and-Pull relaxes Xii + Xjj − 2Xij = d²ij to Xii + Xjj − 2Xij ≥ d²ij, easier to satisfy feasibility
  ...will be useful in DDP later on

Focus on the Push-and-Pull objective

86 / 160

slide-87
SLIDE 87

Diagonally Dominant Programming

87 / 160

slide-88
SLIDE 88

When SDP solvers hit their size limit

◮ SDP solver: technological bottleneck ◮ How can we best use an LP solver? ◮ Diagonally Dominant (DD) matrices are PSD ◮ Not vice versa: inner approximate PSD cone Y 0 ◮ Idea by A.A. Ahmadi and co-authors

[Ahmadi & Majumdar 2014, Ahmadi & Hall 2015]

88 / 160

slide-89
SLIDE 89

Diagonally dominant matrices

n × n matrix X is DD if  ∀i ≤ n  Xii ≥ ∑_{j≠i} |Xij|.

E.g.        1 0.1 −0.2 0.04 0.1 1 −0.05 0.1 −0.2 −0.05 1 0.1 0.01 0.1 0.1 1 0.2 0.3 0.04 0.01 0.2 1 −0.3 0.3 −0.3 1       

89 / 160

slide-90
SLIDE 90

DD Linearization

∀i ≤ n   Xii ≥ ∑_{j≠i} |Xij|   (∗)

◮ introduce “sandwiching” variable T
◮ write |X| as T
◮ add constraints −T ≤ X ≤ T
◮ by the ≥ constraint sense, write (∗) as  Xii ≥ ∑_{j≠i} Tij
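A one-line numpy check of diagonal dominance (my helper; recall DD ⇒ PSD, but not vice versa):

    import numpy as np

    def is_diagonally_dominant(X):
        # Xii >= sum_{j != i} |Xij| for every row i
        X = np.asarray(X)
        off = np.abs(X).sum(axis=1) - np.abs(np.diag(X))
        return bool(np.all(np.diag(X) >= off))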

90 / 160

slide-91
SLIDE 91

DD Programming (DDP)

∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
X is DD

i.e.

∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
∀i ≤ n   ∑_{j≤n, j≠i} Tij ≤ Xii
−T ≤ X ≤ T

91 / 160

slide-92
SLIDE 92

DDP formulation for the DGP

min ∑_{{i,j}∈E} (Xii + Xjj − 2Xij)
∀{i, j} ∈ E   Xii + Xjj − 2Xij ≥ d²ij
∀i ≤ n   ∑_{j≤n, j≠i} Tij ≤ Xii
−T ≤ X ≤ T
T ≥ 0

[Dias & L., 2016]

92 / 160

slide-93
SLIDE 93

SDP vs. DDP: tests

Using “push-and-pull” objective in SDP SDP solved with Mosek, DDP with CPLEX

SDP/DDP + PCA

               SDP                          DDP
Name           LDE   MDE   CPU (modl/soln)  LDE   MDE   CPU (modl/soln)
C0700odd.1     0.79  0.34  0.06/0.12        0.38  0.30  0.15/0.15
C0700.odd.G    2.38  0.89  0.57/1.16        1.86  0.58  1.11/0.95
C0150alter.1   1.48  0.45  0.73/1.33        1.54  0.55  1.23/1.04
C0080create.1  2.49  0.82  1.63/7.86        0.98  0.67  3.39/4.07
1guu-1         0.50  0.15  6.67/684.89      1.00  0.85  37.74/153.17

93 / 160

slide-94
SLIDE 94

Subsection 5 Barvinok’s naive algorithm

94 / 160

slide-95
SLIDE 95

Concentration of measure

From [Barvinok, 1997]: “The value of a ‘well behaved’ function at a random point of a ‘big’ probability space X is ‘very close’ to the mean value of the function.” and “In a sense, measure concentration can be considered as an extension of the law of large numbers.”

95 / 160

slide-96
SLIDE 96

Concentration of measure

Given Lipschitz function f : X → R s.t. ∀x, y ∈ X |f(x) − f(y)| ≤ Lx − y2 for some L ≥ 0, there is concentration of measure if ∃ constants c, C s.t. ∀ε > 0 Px(|f(x) − E(f)| > ε) ≤ c e−Cε2/L2 ≡ “discrepancy from mean is unlikely”

96 / 160

slide-97
SLIDE 97

Barvinok’s theorem

Consider:
◮ for each k ≤ m, manifolds Xk = {x ∈ Rn | x⊤Qk x = ak}
◮ a feasibility problem x ∈ ∩_{k≤m} Xk
◮ its SDP relaxation ∀k ≤ m (Qk • X = ak), with soln. X̄

Let T = factor(X̄), y ∼ Nn(0, 1) and x′ = Ty.
Then ∃c and n0 ∈ N s.t. if n ≥ n0,
  Prob(∀k ≤ m  dist(x′, Xk) ≤ c √(‖X̄‖₂ ln n)) ≥ 0.9

IDEA: since x′ is “close” to each Xk, try local Nonlinear Programming (NLP)

97 / 160

slide-98
SLIDE 98

Application to the DGP

◮ ∀{i, j} ∈ E   Xij = {x ∈ RnK | ‖xi − xj‖²₂ = d²ij}
◮ DGP can be written as x ∈ ∩_{{i,j}∈E} Xij
◮ SDP relaxation: Xii + Xjj − 2Xij = d²ij ∧ X ⪰ 0, with soln. X̄
◮ Difference with Barvinok: x ∈ RKn, rk(X̄) ≤ K
◮ IDEA: sample y ∼ NnK(0, 1/√K)
◮ Thm. Barvinok’s theorem works in rank K [L. & Vu, unpublished]

98 / 160

slide-99
SLIDE 99

The heuristic

  • 1. Solve SDP relaxation of DGP, get soln. X̄
       (use DDP+LP if SDP+IPM too slow)
  • 2. a. T = factor(X̄)
       b. y ∼ NnK(0, 1/√K)
       c. x′ = Ty
  • 3. Use x′ as starting point for a local NLP solver on formulation
       min_x ∑_{{i,j}∈E} (‖xi − xj‖² − d²ij)²
       and return the improved solution x

[Dias & L., 2016]
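Steps 2a-2c in numpy (a sketch under the assumption that X̄ is available as a dense PSD matrix; barvinok_start is my name — the local NLP refinement of Step 3 is not included):

    import numpy as np

    def barvinok_start(Xbar, K, seed=0):
        # Xbar: n x n PSD solution of the SDP relaxation; returns an n x K starting point
        evals, P = np.linalg.eigh(Xbar)
        T = P * np.sqrt(np.clip(evals, 0, None))   # a factor with T T^T = Xbar
        y = np.random.default_rng(seed).normal(scale=1/np.sqrt(K),
                                               size=(Xbar.shape[0], K))
        return T @ y                               # x' = T y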

99 / 160

slide-100
SLIDE 100

SDP+Barvinok vs. DDP+Barvinok

               SDP                   DDP
Name           LDE   MDE   CPU       LDE   MDE   CPU
C0700odd.1     0.00  0.00  0.63      0.00  0.00  1.49
C0700.odd.G    0.00  0.00  21.67     0.42  0.01  30.51
C0150alter.1   0.00  0.00  29.30     0.00  0.00  34.13
C0080create.1  0.00  0.00  139.52    0.00  0.00  141.49
1b03           0.18  0.01  132.16    0.38  0.05  101.04
1crn           0.78  0.02  800.67    0.76  0.04  522.60
1guu-1         0.79  0.01  1900.48   0.90  0.04  667.03

Most of the CPU time taken by local NLP solver 100 / 160

slide-101
SLIDE 101

Subsection 6 Isomap for the DGP

101 / 160

slide-102
SLIDE 102

Isomap for DG

  • 1. Let D′ be the (square) weighted adjacency matrix of G
  • 2. Complete D′ to an approximate EDM D̃
  • 3. MDS/PCA on D̃ ⇒ obtain embedding x ∈ RK for given K

Vary Step 2 to generate Isomap heuristics

[Tenenbaum et al. 2000, L. & D’Ambrosio 2017]
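A sketch of the classic variant (Step 2 = shortest paths) in numpy, reusing the classic_mds sketch given earlier in the MDS section (isomap_dgp is my name; a connected graph is assumed):

    import numpy as np

    def isomap_dgp(W, K):
        # W: n x n weighted adjacency matrix of G (0 where there is no edge)
        n = W.shape[0]
        D = np.where(W > 0, W, np.inf)
        np.fill_diagonal(D, 0)
        for k in range(n):                          # Floyd-Warshall completion of D'
            D = np.minimum(D, D[:, [k]] + D[[k], :])
        return classic_mds(D, K)                    # MDS/PCA step (defined earlier)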

102 / 160

slide-103
SLIDE 103

Variants for Step 2

  • A. Floyd-Warshall all-shortest-paths algorithm on G (classic Isomap)
  • B. Find a spanning tree (SPT) of G, compute any embedding x̄ ∈ RK for the SPT, use its EDM
  • C. Solve a push-and-pull SDP relaxation, get soln. x̄ ∈ Rn, use its EDM
  • D. Solve an SDP relaxation with “Barvinok objective”, find x̄ ∈ Rr (with r ≤ ⌊(√(8|E| + 1) − 1)/2⌋), use its EDM
       (haven’t really talked about this, sorry)

Post-processing: x̃ as starting point for NLP descent in the GO formulation

[L. & D’Ambrosio 2017]

103 / 160

slide-104
SLIDE 104

Results

Comparison with dgsol [Moré, Wu 1997]

104 / 160

slide-105
SLIDE 105

Large instances

Name    |V|   |E|    mde (IsoNLP/dgsol)  lde (IsoNLP/dgsol)  CPU (IsoNLP/dgsol)
water    648  11939  0.005/0.15          0.557/0.81          26.98/15.16
3al1     678  17417  0.036/0.007         0.884/0.810         170.91/210.25
1hpv    1629  18512  0.074/0.078         0.936/0.932         374.01/60.28
il2     2084  45251  0.012/0.035         0.910/0.932         465.10/139.77
1tii    5684  69800  0.078/0.077         0.950/0.897         7400.48/454.375

105 / 160

slide-106
SLIDE 106

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

106 / 160

slide-107
SLIDE 107

Distance resolution limit

Clustering in high dimensions is unstable

107 / 160

slide-108
SLIDE 108

Nearest Neighbours

k-Nearest Neighbours (k-NN). Given:
◮ k ∈ N
◮ a distance function d : Rn × Rn → R+
◮ a set X ⊂ Rn
◮ a point z ∈ Rn ∖ X,
find the subset Y ⊂ X such that: (a) |Y| = k; (b) ∀y ∈ Y, x ∈ X (d(z, y) ≤ d(z, x))

◮ basic problem in data science ◮ pattern recognition, computational geometry, machine learning, data compression, robotics, recommender systems, information retrieval, natural language processing and more ◮ Example: Used in Step 2 of k-means:

assign points to closest centroid

[Cover & Hart 1967]

108 / 160

slide-109
SLIDE 109

With random variables

◮ Consider 1-NN
◮ Let ℓ = |X|
◮ Distance function family {dm : Rn × Rn → R+}m
◮ For each m:
  ◮ random variable Zm with some distribution over Rn
  ◮ for i ≤ ℓ, random variable Xmi with some distrib. over Rn
  ◮ the Xmi are iid w.r.t. i, and Zm is independent of all Xmi
◮ Dm_min = min_{i≤ℓ} dm(Zm, Xmi)
◮ Dm_max = max_{i≤ℓ} dm(Zm, Xmi)

109 / 160

slide-110
SLIDE 110

Distance Instability Theorem

◮ Let p > 0 be a constant ◮ If

∃i ≤ ℓ (dm(Zm, Xm

i ))p converges as m → ∞

then, for any ε > 0, closest and furthest point are at about the same distance

Note “∃i” suffices since ∀m we have Xm

i iid w.r.t. i

[Beyer et al. 1999]

110 / 160

slide-111
SLIDE 111

Distance Instability Theorem

◮ Let p > 0 be a constant
◮ If ∃i ≤ ℓ  lim_{m→∞} Var((dm(Zm, Xmi))^p / E((dm(Zm, Xmi))^p)) = 0
  then, for any ε > 0,  lim_{m→∞} P(Dm_max ≤ (1 + ε) Dm_min) = 1

Note “∃i” suffices since ∀m the Xmi are iid w.r.t. i

[Beyer et al. 1999]

111 / 160

slide-112
SLIDE 112

Preliminary results

◮ Lemma. {Bm}m seq. of rnd. vars with finite variance, s.t. lim_{m→∞} E(Bm) = b and lim_{m→∞} Var(Bm) = 0; then
  ∀ε > 0  lim_{m→∞} P(‖Bm − b‖ ≤ ε) = 1,  denoted Bm →P b
◮ Slutsky’s theorem. {Bm}m seq. of rnd. vars and g a continuous function; if Bm →P b and g(b) exists, then g(Bm) →P g(b)
◮ Corollary. If {Am}m, {Bm}m seq. of rnd. vars s.t. Am →P a and Bm →P b ≠ 0, then {Am/Bm}m →P a/b

112 / 160

slide-113
SLIDE 113

Proof

  • 1. µm = E((dm(Zm, Xmi))^p) is independent of i (since all Xmi are iid)
  • 2. Vm = (dm(Zm, Xmi))^p / µm →P 1:
       ◮ E(Vm) = 1 (rnd. var. over its mean) ⇒ lim_m E(Vm) = 1
       ◮ Hypothesis of thm. ⇒ lim_m Var(Vm) = 0
       ◮ Lemma ⇒ Vm →P 1
  • 3. Dm = ((dm(Zm, Xmi))^p / µm | i ≤ ℓ) →P (1, . . . , 1)   (Xmi iid)
  • 4. Slutsky’s thm. ⇒ min(Dm) →P min(1) = 1, similarly for max
  • 5. Corollary ⇒ max(Dm)/min(Dm) →P 1
  • 6. Dm_max / Dm_min = (µm max(Dm)) / (µm min(Dm)) →P 1
  • 7. Result follows (defn. of →P and Dm_max ≥ Dm_min)

113 / 160

slide-114
SLIDE 114

Subsection 1 When to start worrying

114 / 160

slide-115
SLIDE 115

When it applies

◮ iid random variables from any distribution ◮ Particular forms of correlation

e.g. Ui ∼ Uniform(0, √ i), X1 = U1, Xi = Ui + (Xi−1/2) for i > 1

◮ Variance tending to zero

e.g. Xi ∼ N(0, 1/i)

◮ Discrete uniform distribution on m-dimensional hypercube

for both data and query

◮ Computational experiments: instability already with n > 15
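The effect is easy to reproduce; a small numpy experiment (mine, with uniform data and query as in the hypercube case above):

    import numpy as np

    # ratio Dmax/Dmin of distances from a random query to 1000 random points
    rng = np.random.default_rng(0)
    for m in (2, 10, 100, 1000):
        X = rng.random((1000, m))          # data points, uniform on the hypercube
        z = rng.random(m)                  # query point
        d = np.linalg.norm(X - z, axis=1)
        print(m, d.max() / d.min())        # the ratio tends to 1 as m grows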

115 / 160

slide-116
SLIDE 116

Example of k-means in R100

116 / 160

slide-117
SLIDE 117

...and when it doesn’t

◮ Complete linear dependence on all distributions

can be reduced to NN in 1D

◮ Exact and approximate matching

query point = (or ≈) data point

◮ Query point in a well-separated cluster in data ◮ Implicitly low dimensionality

project; but NN must be stable in lower dim.

117 / 160

slide-118
SLIDE 118

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

118 / 160

slide-119
SLIDE 119

Random projections

The mathematics of big data

119 / 160

slide-120
SLIDE 120

The magic of random projections

◮ “Mathematics of big data” ◮ In a nutshell ◮ Clustering on A′ rather than A

yields approx. same results with arbitrarily high probability (wahp)

[Johnson & Lindenstrauss, 1984]

120 / 160

slide-121
SLIDE 121

The magic of random projections

◮ “Mathematics of big data” ◮ In a nutshell

  • 1. Given points A1, . . . , An ∈ Rm with m large and ε ∈ (0, 1)
  • 2. Pick “appropriate” k ≈ O((1/ε²) ln n)
  • 3. Sample a k × m matrix T (each comp. i.i.d. N(0, 1/√k))
  • 4. Consider projected points A′i = TAi ∈ Rk for i ≤ n
  • 5. With prob → 1 exponentially fast as k → ∞:
       ∀i, j ≤ n  (1 − ε)‖Ai − Aj‖ ≤ ‖A′i − A′j‖ ≤ (1 + ε)‖Ai − Aj‖

[Johnson & Lindenstrauss, 1984]

121 / 160

slide-122
SLIDE 122

The shape of a set of points

◮ Lose dimensions but not too much accuracy

Given A1, . . . , An ∈ Rm find k ≪ m and points A′1, . . . , A′n ∈ Rk s.t. A and A′ “have almost the same shape”

◮ What is the shape of a set of points?
  congruent sets have the same shape
◮ Approximate congruence ⇔ distortion:
  A, A′ have almost the same shape if
  ∀i < j ≤ n  (1 − ε)‖Ai − Aj‖ ≤ ‖A′i − A′j‖ ≤ (1 + ε)‖Ai − Aj‖
  for some small ε > 0

Assume norms are all Euclidean

122 / 160

slide-123
SLIDE 123

Losing dimensions = “projection”

In the plane, hopeless In 3D: no better

123 / 160

slide-124
SLIDE 124

Johnson-Lindenstrauss Lemma

Thm. Given A ⊆ Rm with |A| = n and ε > 0, there is k ∼ O((1/ε²) ln n) and a k × m matrix T s.t.
  ∀x, y ∈ A  (1 − ε)‖x − y‖ ≤ ‖Tx − Ty‖ ≤ (1 + ε)‖x − y‖

If the k × m matrix T is sampled componentwise from N(0, 1/√k), then A and TA have almost the same shape

[Johnson & Lindenstrauss, 1984]
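A minimal numpy sketch of the lemma in action (jl_project is my name; the constant in k = O(ε⁻² ln n) is omitted, so the exact value of k is illustrative only):

    import numpy as np

    def jl_project(A, eps, seed=0):
        # A: n x m data matrix (rows = points); returns the n x k projected points
        n, m = A.shape
        k = int(np.ceil(np.log(n) / eps**2))            # k = O(eps^-2 ln n)
        T = np.random.default_rng(seed).normal(scale=1/np.sqrt(k), size=(k, m))
        return A @ T.T

    # distortion check on random data
    A = np.random.default_rng(1).normal(size=(200, 10000))
    Ap = jl_project(A, 0.2)
    i, j = 3, 17
    print(np.linalg.norm(A[i] - A[j]), np.linalg.norm(Ap[i] - Ap[j]))  # close wahp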

124 / 160

slide-125
SLIDE 125

Sketch of a JLL proof by pictures

Thm.
Let T be a k × m rectangular matrix with each component sampled from N(0, 1/√k), and u ∈ Rm s.t. ‖u‖ = 1. Then E(‖Tu‖²) = 1

125 / 160

slide-126
SLIDE 126

Sampling to desired accuracy

◮ Distortion has low probability:
  ∀x, y ∈ A  P(‖Tx − Ty‖ ≤ (1 − ε)‖x − y‖) ≤ 1/n²
  ∀x, y ∈ A  P(‖Tx − Ty‖ ≥ (1 + ε)‖x − y‖) ≤ 1/n²
◮ Probability that ∃ a pair x, y ∈ A distorting Euclidean distance: union bound over the (n choose 2) pairs
  P(¬(A and TA have almost the same shape)) ≤ (n choose 2) · 2/n² = 1 − 1/n
  P(A and TA have almost the same shape) ≥ 1/n

⇒ re-sampling T gives JLL with arbitrarily high probability

[Dasgupta & Gupta, 2002]

126 / 160

slide-127
SLIDE 127

In practice

◮ Empirically, sample T very few times (e.g. once will do!)

  • n average Tx − Ty ≈ x − y, and distortion decreases

exponentially with n

We only need a logarithmic number of dimensions in function of the number of points Surprising fact: k is independent of the original number of dimensions m

127 / 160

slide-128
SLIDE 128

Subsection 1 More efficient clustering

128 / 160

slide-129
SLIDE 129

Clustering Google images

[L. & Lavor, in press]

129 / 160

slide-130
SLIDE 130

Recall k-means

◮ Input: X = set of images (as vectors in R|pixels|) ◮ Output: a k-partition of X

  • 1. pick k random centroid candidates
  • 2. assign points to closest centroid
  • 3. recompute correct centroid for assigned points
  • 4. repeat from Step 2 until stability

130 / 160

slide-131
SLIDE 131

Without random projections

VHimg = Map[Flatten[ImageData[#]] &, Himg]; VHcl = Timing[ClusteringComponents[VHimg, 3, 1]]

Out[29]= {0.405908, {1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3}}

Too slow!

131 / 160

slide-132
SLIDE 132

With random projections

Get["Projection.m"]; VKimg = JohnsonLindenstrauss[VHimg, 0.1]; VKcl = Timing[ClusteringComponents[VKimg, 3, 1]]

Out[34]= {0.002232, {1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3}}

From 0.405 CPU time to 0.00232

Same clustering

132 / 160

slide-133
SLIDE 133

Works on the MSSC MP formulation too!

min_{x,y,s} ∑_{i≤n} ∑_{j≤d} ‖Tpi − Tyj‖²₂ xij
∀j ≤ d   (1/sj) ∑_{i≤n} Tpi xij = Tyj
∀i ≤ n   ∑_{j≤d} xij = 1
∀j ≤ d   ∑_{i≤n} xij = sj
∀j ≤ d   yj ∈ Rm
x ∈ {0, 1}^{nd},  s ∈ N^d

where T is a k × m random projector

133 / 160

slide-134
SLIDE 134

Works on the MSSC MP formulation too!

min_{x,y′,s} ∑_{i≤n} ∑_{j≤d} ‖Tpi − y′j‖²₂ xij
∀j ≤ d   (1/sj) ∑_{i≤n} Tpi xij = y′j
∀i ≤ n   ∑_{j≤d} xij = 1
∀j ≤ d   ∑_{i≤n} xij = sj
∀j ≤ d   y′j ∈ Rk
x ∈ {0, 1}^{nd},  s ∈ N^d                (MSSC′)

◮ where k = O((1/ε²) ln n)
◮ less data, |y′| < |y| ⇒ get solutions faster
◮ Yields a smaller cMINLP

134 / 160

slide-135
SLIDE 135

Subsection 2 Random projections in Linear Programming

135 / 160

slide-136
SLIDE 136

The gist

◮ Let A, b be very large, consider LP

min{c⊤x | Ax = b ∧ x ≥ 0}

◮ T short & fat normally sampled ◮ Then

Ax = b ∧ x ≥ 0 ⇔ TAx = Tb ∧ x ≥ 0 wahp

[Vu & al. to appear]

136 / 160

slide-137
SLIDE 137

Losing dimensions

Restricted Linear Membership (RLMX)
Given A1, . . . , An, b ∈ Rm and X ⊆ Rn, ∃? x ∈ X s.t. b = ∑_{i≤n} xi Ai

Given X ⊆ Rn and b, A1, . . . , An ∈ Rm, find k ≪ m and b′, A′1, . . . , A′n ∈ Rk such that:

  ∃x ∈ X  b = ∑_{i≤n} xi Ai   (high dimensional)
  iff
  ∃x ∈ X  b′ = ∑_{i≤n} xi A′i   (low dimensional)

with high probability

137 / 160

slide-138
SLIDE 138

Projecting feasibility

138 / 160

slide-139
SLIDE 139

Projecting infeasibility (easy cases)

Thm. Let T : Rm → Rk be a JLL random projection and b, A1, . . . , An ∈ Rm an RLMX instance. For any given vector x ∈ X, we have:
(i) if b = ∑_{i=1}^n xi Ai, then Tb = ∑_{i=1}^n xi TAi
(ii) if b ≠ ∑_{i=1}^n xi Ai, then P(Tb ≠ ∑_{i=1}^n xi TAi) ≥ 1 − 2e^{−Ck}
(iii) if b ≠ ∑_{i=1}^n yi Ai for all y ∈ X ⊆ Rn, where |X| is finite, then
      P(∀y ∈ X  Tb ≠ ∑_{i=1}^n yi TAi) ≥ 1 − 2|X| e^{−Ck}
for some constant C > 0 (independent of n, k).

[Vu & al. to appear, arXiv:1507.00990v1/math.OC]

slide-140
SLIDE 140

Separating hyperplanes

When |X| is large, project separating hyperplanes instead

◮ Convex C ⊆ Rm, x ∉ C: then ∃ hyperplane c separating x, C
◮ In particular, true if C = cone(A1, . . . , An) for A ⊆ Rm
◮ We can show x ∈ C ⇔ Tx ∈ TC with high probability
◮ As above, if x ∈ C then Tx ∈ TC by linearity of T

Difficult part is proving the converse
We can also project point-to-cone distances

140 / 160

slide-141
SLIDE 141

Projecting the separation

Thm. Given b, A1, . . . , An ∈ Rm of unit norm s.t. b ∉ cone{A1, . . . , An} (pointed), ε > 0, c ∈ Rm s.t. c⊤b < −ε and c⊤Ai ≥ ε (i ≤ n), and T a random projector:
  P(Tb ∉ cone{TA1, . . . , TAn}) ≥ 1 − 4(n + 1) e^{−C(ε²−ε³)k}
for some constant C.

Proof. Let A be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A1, . . . , An}. Since A consists of 2(n + 1) events, by the JLL Corollary (squared version) and the union bound, we get P(A) ≥ 1 − 4(n + 1) e^{−C(ε²−ε³)k}. Now consider χ = b:
  ⟨Tc, Tb⟩ = ¼ (‖T(c + b)‖² − ‖T(c − b)‖²)
           ≤ ¼ (‖c + b‖² − ‖c − b‖²) + (ε/4)(‖c + b‖² + ‖c − b‖²)   (by JLL)
           = c⊤b + ε < 0
and similarly ⟨Tc, TAi⟩ ≥ 0.

[Vu et al. to appear, arXiv:1507.00990v1/math.OC]

slide-142
SLIDE 142

The feasibility projection theorem

Thm.

Given δ > 0, ∃ sufficiently large m ≤ n such that: for any LFP input A, b where A is m × n we can sample a random k × m matrix T with k ≪ m and

P(orig. LFP feasible ⇐ ⇒ proj. LFP feasible) ≥ 1 − δ

142 / 160

slide-143
SLIDE 143

Projecting optimality

143 / 160

slide-144
SLIDE 144

Notation

◮ P ≡ min{cx | Ax = b ∧ x ≥ 0} (original problem) ◮ TP ≡ min{cx | TAx = Tb ∧ x ≥ 0} (projected problem) ◮ v(P) = optimal objective function value of P ◮ v(TP) = optimal objective function value of TP

144 / 160

slide-145
SLIDE 145

The optimality projection theorem

◮ Assume feas(P) is bounded ◮ Assume all optima of P satisfy j xj ≤ θ for some given

θ > 0

(prevents cones from being “too flat”)

Thm. Given δ > 0, v(P) − δ ≤ v(TP) ≤ v(P) (∗) holds wahp

in fact (∗) holds with prob. 1 − 4ne−C(ε2−ε3)k where ε = δ/(2(θ + 1)η) and η = O(y2) where y is a dual optimal solution of P having minimum norm

[Vu & al. to appear]

145 / 160

slide-146
SLIDE 146

The easy part

Show v(TP) ≤ v(P):

◮ Constraints of P: Ax = b ∧ x ≥ 0 ◮ Constraints of TP: TAx = Tb ∧ x ≥ 0 ◮ ⇒ constraints of TP are lin. comb. of constraints of P ◮ ⇒ any solution of P is feasible in TP

(btw, the converse holds almost never)

◮ P and TP have the same objective function ◮ ⇒ TP is a relaxation of P ⇒ v(TP) ≤ v(P)

146 / 160

slide-147
SLIDE 147

The hard part (sketch)

◮ Eq. (7) is equivalent to P for δ = 0:
  cx = v(P) − δ
  Ax = b
  x ≥ 0        (7)
  Note: for δ > 0, Eq. (7) is infeasible
◮ By the feasibility projection theorem,
  cx = v(P) − δ
  TAx = Tb
  x ≥ 0
  is infeasible wahp for δ > 0
◮ Hence cx < v(P) − δ ∧ TAx = Tb ∧ x ≥ 0 is infeasible wahp
◮ ⇒ cx ≥ v(P) − δ holds wahp for x ∈ feas(TP)
◮ ⇒ v(P) − δ ≤ v(TP)

147 / 160

slide-148
SLIDE 148

Solution retrieval

148 / 160

slide-149
SLIDE 149

Projected solutions are infeasible in P

◮ Ax = b ⇒ TAx = Tb by linearity ◮ However,

Thm.

For x ≥ 0 s.t. TAx = Tb, Ax = b with probability zero

◮ Can’t get solution for original LFP using projected LFP!

149 / 160

slide-150
SLIDE 150

Solution retrieval from optimal basis

◮ Primal min{c⊤x | Ax = b ∧ x ≥ 0} ⇒

dual max{b⊤y | A⊤y ≤ c}

◮ Let x′ = sol(TP) and y′ = sol(dual(TP)) ◮ ⇒ (TA)⊤y′ = (A⊤T ⊤)y′ = A⊤(T ⊤y′) ≤ c ◮ ⇒ T ⊤y′ is a solution of dual(P) ◮ ⇒ we can compute an optimal basis J for P ◮ Solve AJxJ = b, get xJ, obtain a solution x∗ of P

150 / 160

slide-151
SLIDE 151

Solving large quantile regression LPs

151 / 160

slide-152
SLIDE 152

Regression

◮ multivariate random var. X, function y = f(X), sample {(ai, bi) ∈ Rp × R | i ≤ m}
◮ sample mean:  µ̂ = arg min_{µ∈R} ∑_{i≤m} (bi − µ)²
◮ sample mean conditional to X = A = (aij):  ν̂ = arg min_{ν∈Rp} ∑_{i≤m} (bi − ν ai)²

152 / 160

slide-153
SLIDE 153

Quantile regression

◮ sample median:
  ξ̂ = arg min_{ξ∈R} ∑_{i≤m} |bi − ξ| = arg min_{ξ∈R} ∑_{i≤m} (½ max(bi − ξ, 0) − ½ min(bi − ξ, 0))
◮ sample τ-quantile:
  ξ̂ = arg min_{ξ∈R} ∑_{i≤m} (τ max(bi − ξ, 0) − (1 − τ) min(bi − ξ, 0))
◮ sample τ-quantile conditional to X = A = (aij):
  β̂ = arg min_{β∈Rp} ∑_{i≤m} (τ max(bi − β ai, 0) − (1 − τ) min(bi − β ai, 0))

153 / 160

slide-154
SLIDE 154

Usefulness

A much better visibility of one’s own poverty!

154 / 160

slide-155
SLIDE 155

Linear Programming formulation

min τ u⁺ + (1 − τ) u⁻
A(β⁺ − β⁻) + u⁺ − u⁻ = b
β, u ≥ 0

◮ parameters: A is m × p, b ∈ Rm, τ ∈ R
◮ decision variables: β⁺, β⁻ ∈ Rp, u⁺, u⁻ ∈ Rm
◮ LP constraint matrix is m × (2p + 2m), with density p/(p + m): can be high
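A sketch of this LP with scipy's linprog (an assumption: scipy with the HiGHS backend is available; quantile_regression is my name):

    import numpy as np
    from scipy.optimize import linprog

    def quantile_regression(A, b, tau):
        # min tau*u+ + (1-tau)*u-  s.t.  A(beta+ - beta-) + u+ - u- = b, all vars >= 0
        m, p = A.shape
        c = np.concatenate([np.zeros(2*p), tau*np.ones(m), (1 - tau)*np.ones(m)])
        Aeq = np.hstack([A, -A, np.eye(m), -np.eye(m)])      # m x (2p + 2m)
        res = linprog(c, A_eq=Aeq, b_eq=b, bounds=(0, None), method="highs")
        x = res.x
        return x[:p] - x[p:2*p]                               # beta = beta+ - beta-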

155 / 160

slide-156
SLIDE 156

Large datasets

◮ Russia Longitudinal Monitoring Survey, household data

(hh1995f)

◮ m = 3783, p = 855 ◮ A = hf1995f, b = log avg(A) ◮ 18.5% dense ◮ poorly scaled data, CPLEX yields infeasible (!!!) after around

70s CPU

◮ quantreg in R fails

◮ 14596 RGB photos on my HD, scaled to 90 × 90 pixels

◮ m = 14596, p = 24300 ◮ each row of A is an image vector, b = A ◮ 62.4% dense ◮ CPLEX killed by OS after ≈30min (presumably for lack of

RAM) on 16GB

156 / 160

slide-157
SLIDE 157

Results on large datasets

Instance Projection Original τ m p k

  • pt

CPU feas

  • pt

CPU qnt err hh1995f 0.25 3783 856 411 0.00 8.53 0.038% 71.34 17.05 0.16 0.50 0.00 8.44 0.035% 89.17 15.25 0.05 0.75 0.00 8.46 0.041% 65.37 31.67 3.91 jpegs 0.25 14596 24300 506 0.00 231.83 0.51% 0.00 3.69E+5 0.04 0.50 0.00 227.54 0.51% 0.00 3.67E+5 0.05 0.75 0.00 228.57 0.51% 0.00 3.68E+5 0.05 random 0.25 1500 100 363 0.25 2.38 0.01% 1.06 6.00 0.00 0.50 0.40 2.51 0.01% 1.34 6.01 0.00 0.75 0.25 2.57 0.01% 1.05 5.64 0.00 0.25 2000 200 377 0.35 4.29 0.01% 2.37 21.40 0.00 0.50 0.55 4.37 0.01% 3.10 23.02 0.00 0.75 0.35 4.24 0.01% 2.42 21.99 0.00

feas = 100 Ax − b2 b1/m qnt err = qnt − proj. qnt2 # cols IPM with no simplex crossover: solution w/o opt. guarantee cannot trust results simplex method won’t work due to ill-scaling and size

157 / 160

slide-158
SLIDE 158

Outline

Reasoning Relations, graphs, distances Clustering Clustering in graphs Clustering in Euclidean spaces Metric embeddings Fréchet embeddings in ℓ∞ Embeddings in ℓ2 Classic MDS PCA Distance Geometry DGP applications Complexity of the DGP Number of solutions Solution methods Direct methods Semidefinite Programming Diagonal Dominance Barvinok’s naive algorithm Isomap for the DGP Distance resolution limit When to start worrying Random projections More efficient clustering Random projections in LP Projecting feasibility Projecting optimality Solution retrieval Quantile regression The end

158 / 160

slide-159
SLIDE 159

Summary

  • 1. Graphs and weighted graphs necessary to model data

Abduction, observation graphs, consistency relations

  • 2. Computers can “reason by analogy” (clustering)

Modularity clustering

  • 3. Clustering on vectors allows more flexibility

k-means, MSSC

  • 4. Need to embed (weighted) graphs into Euclidean spaces

Metric embeddings, Distance Geometry

  • 5. High dimensions make clustering expensive/unstable

Distance resolution limit

  • 6. Use random projections to reduce dimensions

Johnson-Lindenstrauss lemma

159 / 160

slide-160
SLIDE 160

Some of my references

  • 1. Vu, Poirion, L., Random Projections for Linear Programming, Math. Oper. Res., to appear
  • 2. L., Lavor, Euclidean Distance Geometry: an Introduction, Springer, Zürich, in press
  • 3. Mencarelli, Sahraoui, L., A multiplicative weights update algorithm for MINLP, Eur. J. Comp. Opt., 5:31-86, 2017
  • 4. L., D’Ambrosio, The Isomap algorithm in distance geometry, in Iliopoulos et al. (eds.), Proceedings of SEA, LIPICS 75(5)1:13, Dagstuhl Publishing, 2017
  • 5. Dias, L., Diagonally Dominant Programming in Distance Geometry, in Cerulli et al. (eds.), Proceedings of ISCO, LNCS 9849:225-246, Springer, Zürich, 2016
  • 6. L., Lavor, Maculan, Mucherino, Euclidean distance geometry and applications, SIAM Review, 56(1):3-69, 2014
  • 7. Cafieri, Hansen, L., Improving heuristics for network modularity maximization using an exact algorithm, Discr. Appl. Math., 163:65-72, 2014
  • 8. L., Lavor, Mucherino, The DMDGP seems easier on proteins, in Mucherino et al. (eds.), Distance Geometry: Theory, Methods, and Applications, Springer, New York, 2013
  • 9. Aloise, Hansen, L., An improved column generation algorithm for minimum sum-of-squares clustering, Math. Prog. A, 131:195-220, 2012
  • 10. Aloise, Cafieri, Caporossi, Hansen, Perron, L., Column generation algorithms for exact modularity maximization in networks, Phys. Rev. E, 82(4):046112, 2010
  • 11. L., Cafieri, Tarissan, Reformulations in Mathematical Programming: A Computational Approach, in Abraham et al. (eds.), Foundations of Computational Intelligence vol. 3, SCI 203:153-234, Springer, Heidelberg, 2009
  • 12. Lavor, L., Maculan, Computational Experience with the Molecular Distance Geometry Problem, in Pintér (ed.), Global Optimization: Scientific and Engineering Case Studies, Springer, Berlin, 2006

160 / 160