On Efficient Low Distortion Ultrametric Embedding



  1. On Efficient Low Distortion Ultrametric Embedding
     Vincent Cohen-Addad (CNRS & Google Zürich), Karthik C. S. (Tel Aviv University), Guillaume Lagarde (LaBRI)

  3. “Flat” clustering
     - Cluster analysis
     - Features for machine learning
     - Data compression
     - etc.

  4. Hierarchical clustering
     ● Recursive partitioning of the data
     ● n points → 2n-1 nested clusters with different granularities

  7. Ultrametric
     ● Recursive partitioning of the data
     ● n points → 2n-1 nested clusters with different granularities
     ● An ultrametric is a metric where the triangle inequality d(x,z) ≤ d(x,y) + d(y,z) is strengthened to the ultrametric inequality d(x,z) ≤ max(d(x,y), d(y,z))
     ● d(x,y) = value of the lowest common ancestor of x and y
     [Figure: a tree labeled with the values 150, 100, 111, 34, 41, 11, 14]
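
The two statements on this slide can be checked mechanically. The following is a minimal sketch, not taken from the talk: the tree, the node names `u` and `v`, and the values 11 and 34 are hypothetical, and `is_ultrametric` and `tree_distance` are illustrative helper names.

```python
from itertools import permutations

def is_ultrametric(points, d, tol=1e-12):
    """Check d(x,z) <= max(d(x,y), d(y,z)) on every ordered triple of distinct points."""
    for x, y, z in permutations(points, 3):
        if d(x, z) > max(d(x, y), d(y, z)) + tol:
            return False
    return True

# Hypothetical hierarchy: leaves a, b, c; internal nodes u (value 11) and v (value 34).
parent = {"a": "u", "b": "u", "c": "v", "u": "v", "v": None}
value = {"u": 11, "v": 34}

def ancestors(node):
    """Return the chain node, parent(node), ..., root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def tree_distance(x, y):
    """d(x,y) = value stored at the lowest common ancestor of leaves x and y."""
    if x == y:
        return 0
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return value[node]
    raise ValueError("x and y are not in the same tree")

leaves = ["a", "b", "c"]
```

With these definitions, `is_ultrametric(leaves, tree_distance)` holds, while the ordinary distance on the line, `lambda x, y: abs(x - y)` over the points 0, 1, 2, fails the check because d(0,2) = 2 exceeds max(d(0,1), d(1,2)) = 1.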

  9. Agglomerative algorithms
     ● average-linkage, single-linkage, Ward’s method, complete-linkage, …
     ● Produce an embedding of a metric into an ultrametric
     ● Bottom-up: proceed by agglomerating the pair of clusters of minimum dissimilarity
     ● Major drawback: quadratic running time
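
To make the bottom-up scheme concrete, here is a deliberately naive single-linkage sketch. It is an illustration only: the function name and the output format (a dictionary of merge heights) are assumptions, and this version rescans all cluster pairs at every step, so it is slower than the quadratic implementations the slide refers to.

```python
def single_linkage_ultrametric(points, w):
    """Naive bottom-up single-linkage.

    Returns {(i, j): height} for i < j, where height is the dissimilarity
    at which the clusters containing points i and j were merged.
    """
    clusters = [[i] for i in range(len(points))]
    delta = {}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-linkage dissimilarity: the closest cross pair.
                diss = min(w(points[i], points[j])
                           for i in clusters[a] for j in clusters[b])
                if best is None or diss < best[0]:
                    best = (diss, a, b)
        diss, a, b = best
        # Every cross pair gets the merge height as its ultrametric distance.
        for i in clusters[a]:
            for j in clusters[b]:
                delta[(min(i, j), max(i, j))] = diss
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return delta
```

For the points 0, 1, 3, 7 on the line with `w = lambda x, y: abs(x - y)`, the merges happen at heights 1, 2 and 4, and the resulting distances satisfy the ultrametric inequality.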

  11. Ultrametric
      Goal: given some dataset, efficiently find its best ultrametric representation
      Wait… the best?

  12. Problem statement: BEST ULTRAMETRIC FIT (BUF_∞)
      INPUT:
      ● a set V of n elements v_1, v_2, …, v_n
      ● a weight function w : V × V → R
      OUTPUT:
      ● an ultrametric Δ such that w(v_i, v_j) ≤ Δ(v_i, v_j) ≤ β · w(v_i, v_j) for all i, j, for the minimal value β
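
The objective can be evaluated directly for any candidate Δ. The sketch below assumes w is positive on distinct elements; `distortion` is an illustrative helper name, not something from the talk, and it only measures β for a given Δ rather than finding the best one.

```python
from itertools import combinations

def distortion(points, w, delta):
    """Return beta = max over pairs of delta(x, y) / w(x, y).

    Raises ValueError if delta does not dominate w, since the problem
    requires w(x, y) <= delta(x, y) for every pair.
    """
    beta = 1.0
    for x, y in combinations(points, 2):
        wxy, dxy = w(x, y), delta(x, y)
        if wxy <= 0:
            raise ValueError("w must be positive on distinct elements")
        if dxy < wxy:
            raise ValueError("delta must satisfy w <= delta")
        beta = max(beta, dxy / wxy)
    return beta

# Hypothetical instance: three points on the line, w = absolute difference.
pts = [0, 1, 3]
w = lambda x, y: abs(x - y)
flat = lambda x, y: 3                                    # every pair at distance 3
nested = lambda x, y: 1 if {x, y} == {0, 1} else 3       # {0, 1} merged first
```

Both candidates are ultrametrics that dominate w, but `distortion(pts, w, flat)` is 3.0 while `distortion(pts, w, nested)` is 1.5, so the nested hierarchy is the better fit on this instance.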

  14. Main results
      Setting: V ⊂ R^d
      Theorem 1 (upper bound). There are algorithms that produce, for Euclidean instances of BUF_∞ (w(v_i, v_j) = ||v_i - v_j||_2):
      ● for any γ > 1, a 5γ-approximation in time O(nd + n^(1+O(1/γ^2)))
      ● a √(log n)-approximation in time O(nd + n log^2 n)
      Theorem 2 (lower bounds), informal statement.
      ● Assuming the Strong Exponential Time Hypothesis (SETH), there is no algorithm running in subquadratic time that can approximate BUF_∞ within a factor 3/2 − o(1) for the L_∞ norm (w(v_i, v_j) = ||v_i - v_j||_∞)
      ● + another lower bound for the Euclidean metric under a “Colinearity Hypothesis”
      Note on SETH: SAT can’t be solved in time 2^(n(1−o(1)))

  15. Related work
      ● [CM10] (Carlsson and Mémoli) → study of linkage algorithms
      ● [Das15] (Dasgupta) → what is a good hierarchical clustering? (cost functions)
      ● [MW17] (Moseley and Wang), [CAKMTM18] (Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu) → good approximation guarantees for average-linkage for the (dual of) Dasgupta’s cost function & new algorithms; ‘beyond-worst-case’ scenario
      ● [CM15] (Cochez and Mou), [ACH19] (Abboud, Cohen-Addad, and Houdrouge) → subquadratic running time implementation of average-linkage and Ward’s method
      ● + many others [RP16, CC17, CAKMT17, CCN19, CCNY18, ...]

  17. Starting point
      ● Solves a slightly more general problem
      ● Provides an algorithm that runs in O(n^2) (given that queries to w are done in constant time)
      ● This algorithm is optimal

  28. APPROX-BUF: an approximation algorithm for BUF_∞
      1. Compute a γ-approximate MST T over the complete graph G
      2. Compute an α-estimate of the cut weights of the edges in T
      3. Compute the cartesian tree
      → This gives a γ · α-approximation to BUF_∞
      [Figures: an edge (a, b) of T with the two sides L and R of its cut, labeled 84; a tree with edge labels 10, 43, 13, 117, 80, 23, 8, 50, 61 and the start of its cartesian tree at 117]
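
Step 3 can be sketched on its own. The code below assumes the standard notion of a cartesian tree of an edge-labeled tree (the root is the edge with the largest label, and the two sides are handled recursively) and assumes the resulting distance between two nodes is the label at their lowest common ancestor, i.e. the largest label on the tree path between them. How the labels are obtained in steps 1 and 2 is not modeled; the function name, the input format and the example tree are hypothetical.

```python
def cartesian_tree_ultrametric(nodes, labeled_edges):
    """labeled_edges: list of (label, u, v) forming a spanning tree over nodes.

    Returns {frozenset({x, y}): largest label on the tree path from x to y},
    which is the value at the lowest common ancestor of x and y in the
    cartesian tree. Edges are processed in increasing label order with a
    union-find, so the edge that joins x and y is the largest on their path.
    The output lists all pairs, so this sketch is quadratic in len(nodes).
    """
    parent = {v: v for v in nodes}
    members = {v: [v] for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    delta = {}
    for label, u, v in sorted(labeled_edges):
        ru, rv = find(u), find(v)
        for x in members[ru]:
            for y in members[rv]:
                delta[frozenset((x, y))] = label
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru  # union by size
        parent[rv] = ru
        members[ru].extend(members.pop(rv))
    return delta

# Hypothetical path a - b - c - d with edge labels 3, 7, 2.
example = cartesian_tree_ultrametric(
    ["a", "b", "c", "d"], [(3, "a", "b"), (7, "b", "c"), (2, "c", "d")]
)
```

In the example, the edge labeled 7 becomes the root: every pair separated by it is at distance 7, while `a, b` stay at 3 and `c, d` at 2.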

  29. APPROX-BUF: an approximation algorithm for BUF_∞
      1. Compute a γ-approximate MST T over the complete graph G
         Fast implementation in Euclidean space of dimension d, based on γ-spanner constructions using Har-Peled, Indyk, Sidiropoulos:
         ● any γ > 1: locality sensitive hash family (Andoni and Indyk), time O(nd + n^(1+O(1/γ^2)))
         ● γ = √(log n): Lipschitz partitions (Charikar et al.), time O(nd + n log^2 n)
      2. Compute an α-estimate of the cut weights of the edges in T
      3. Compute the cartesian tree
      → This gives a γ · α-approximation to BUF_∞
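
As a point of comparison for step 1, an exact Euclidean MST (the case γ = 1) can be computed with Prim's algorithm. This is a plainly swapped-in baseline, not the spanner-based construction named on the slide: it evaluates all O(n^2) distances, so it does not reach the running times stated above. The function name and output format are assumptions.

```python
import math

def prim_mst(points):
    """Exact Euclidean MST via Prim's algorithm on the complete graph.

    points: list of coordinate tuples. Returns a list of (weight, u, v)
    with u, v indices into points. Uses O(n^2) distance evaluations.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    link = [-1] * n         # tree endpoint realizing best[v]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v], link[v] = dist, u
    return edges

# Hypothetical instance in the plane.
mst = prim_mst([(0, 0), (1, 0), (1, 2), (5, 2)])
```

On the four example points the tree consists of the edges 0-1, 1-2 and 2-3 with total weight 7.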

  30. APPROX-BUF: an approximation algorithm for BUF_∞
      1. Compute a γ-approximate MST T over the complete graph G
      2. Compute an α-estimate of the cut weights of the edges in T
         Fast implementation in Euclidean space of dimension d: tweak a union-find data structure and compute the cut weights bottom-up; α = 5, via the triangle inequality; time O(nd + n log n)
      3. Compute the cartesian tree
      → This gives a γ · 5-approximation to BUF_∞

  31. THEORY REAL LIFE


  33. Experiments: maximum distortion
      β = max over v_i, v_j of Δ(v_i, v_j) / w(v_i, v_j)
      Datasets:
      ● DIABETES: 768 samples, 8 features
      ● MICE: 1080 samples, 77 features
      ● PENDIGITS: 10992 samples, 16 features
      Methods:
      ● Approx-BUF: approx MST + approx cut weights
      ● Approx-BUF2: exact MST + approx cut weights

      Method          DIABETES   MICE   PENDIGITS
      Average         11.1       9.7    27.5
      Complete        18.5       11.8   33.8
      Single          6.0        4.9    14
      Ward            61.0       59.3   433.8
      Approx-BUF      41.0       51.2   109.8
      Approx-BUF2     9.6        9.4    37.2
      Farach et al.   6.0        4.9    13.9

  34. Experiments: running time
      [Figure: running times, in seconds]

  35. Conclusion
      ● Seems promising
      ● A good MST is crucial → can we compute a better one efficiently?
      ● Cut weights suffer from an approximation of α = 5 → can we do better?

  36. Thanks!
