On Efficient Low Distortion Ultrametric Embedding Vincent - - PowerPoint PPT Presentation



SLIDE 1

On Efficient Low Distortion Ultrametric Embedding

Vincent Cohen-Addad (CNRS & Google Zürich), Karthik C. S. (Tel Aviv University), Guillaume Lagarde (LaBRI)

SLIDE 2

“Flat” clustering

SLIDE 3

“Flat” clustering

  • Cluster analysis
  • Features for machine learning
  • Data compression
  • etc.
SLIDE 4

Hierarchical clustering

  • Recursive partitioning of the data
  • n points → 2n-1 nested clusters with different granularities
SLIDE 5

Ultrametric

  • Recursive partitioning of the data
  • n points → 2n-1 nested clusters with different granularities

Ultrametric: a metric in which the triangle inequality d(x,z) ≤ d(x,y) + d(y,z) is strengthened to the ultrametric inequality d(x,z) ≤ max(d(x,y), d(y,z))

[Figure: hierarchy over the data points, internal nodes labeled 150, 100, 111, 34, 14, 41, 11]
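To make the ultrametric inequality concrete, the sketch below brute-forces it over every triple of a small distance matrix. The function name `is_ultrametric` and the tolerance `tol` are illustrative choices, not part of the talk.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triple of distinct points.

    `d` is a symmetric n x n distance matrix given as a list of lists.
    Brute force over all ordered triples, so O(n^3).
    """
    n = len(d)
    for x, y, z in permutations(range(n), 3):
        if d[x][z] > max(d[x][y], d[y][z]) + tol:
            return False
    return True


# Three points on a line at 0, 1 and 3: a metric, but not an ultrametric.
line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
# Two points at distance 2 from each other, both at distance 5 from a third.
tree_like = [[0, 2, 5], [2, 0, 5], [5, 5, 0]]
print(is_ultrametric(line), is_ultrametric(tree_like))  # False True
```

On the line, d(0, 3) = 3 exceeds max(1, 2) = 2, so the check fails even though the triangle inequality holds.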

SLIDE 6

Ultrametric

  • Recursive partitioning of the data
  • n points → 2n-1 nested clusters with different granularities

Ultrametric: a metric in which the triangle inequality d(x,z) ≤ d(x,y) + d(y,z) is strengthened to the ultrametric inequality d(x,z) ≤ max(d(x,y), d(y,z))

[Figure: hierarchy over the data points, internal nodes labeled 150, 100, 111, 34, 14, 41, 11]

d(x,y) = value of the lowest common ancestor of x and y in the hierarchy
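The rule "d(x,y) = value of the lowest common ancestor" can be sketched directly on a toy hierarchy. The dict-based tree encoding, the node names and the helper `lca_distance` are hypothetical; the sketch assumes node values never decrease towards the root, which is what makes the result an ultrametric.

```python
def lca_distance(parent, label, x, y):
    """Ultrametric distance read off a hierarchy: the label of the lowest
    common ancestor (LCA) of x and y, or 0 when x == y.

    `parent` maps each node to its parent (None for the root) and `label`
    gives the value of each internal node. The result is an ultrametric as
    long as labels never decrease from a child to its parent.
    """
    if x == y:
        return 0
    ancestors = set()
    node = x
    while node is not None:          # collect x and all of its ancestors
        ancestors.add(node)
        node = parent[node]
    node = y
    while node not in ancestors:     # climb from y until we meet one of them
        node = parent[node]
    return label[node]


# Hypothetical hierarchy: leaves a and b merge at value 4, then join c at 10.
parent = {"a": "u", "b": "u", "u": "r", "c": "r", "r": None}
label = {"u": 4, "r": 10}
print(lca_distance(parent, label, "a", "b"))  # 4
print(lca_distance(parent, label, "a", "c"))  # 10
```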

SLIDE 8

Agglomerative algorithms

  • average-linkage, single-linkage, Ward’s method, complete-linkage, …
  • Produce an embedding of a metric into an ultrametric
  • Bottom-up: proceed by agglomerating the pair of clusters of minimum dissimilarity
SLIDE 9

Agglomerative algorithms

  • average-linkage, single-linkage, Ward’s method, complete-linkage, …
  • Produce an embedding of a metric into an ultrametric
  • Bottom-up: proceed by agglomerating the pair of clusters of minimum dissimilarity

Major drawback: quadratic running time
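A minimal sketch of the bottom-up scheme, assuming a plain dissimilarity matrix as input. It covers single-, complete- and average-linkage only (Ward's method is left out), and the function `agglomerate` is a hypothetical helper. It recomputes every linkage at each step, so it is cubic rather than quadratic; it only illustrates how the merge values define an ultrametric.

```python
from itertools import combinations


def agglomerate(d, linkage="average"):
    """Naive bottom-up (agglomerative) clustering on an n x n dissimilarity matrix.

    At each step the pair of clusters with minimum linkage dissimilarity is merged.
    Returns Delta with Delta[i][j] = dissimilarity at which i and j were first
    merged, i.e. the induced ultrametric. Recomputing every linkage from scratch
    makes this O(n^3); it is meant to be readable, not fast.
    """
    n = len(d)
    clusters = [[i] for i in range(n)]
    delta = [[0.0] * n for _ in range(n)]

    def dissimilarity(a, b):
        values = [d[x][y] for x in a for y in b]
        if linkage == "single":
            return min(values)
        if linkage == "complete":
            return max(values)
        return sum(values) / len(values)  # average-linkage

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dissimilarity(clusters[p[0]], clusters[p[1]]))
        height = dissimilarity(clusters[i], clusters[j])
        for x in clusters[i]:
            for y in clusters[j]:
                delta[x][y] = delta[y][x] = height
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: i < j, so index i is unaffected
    return delta


# Points 0, 1, 3, 7 on a line.
d = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
print(agglomerate(d, "single")[0][3], agglomerate(d, "complete")[0][3])  # 4 7
```

The two linkages disagree on how far apart the extreme points end up (4 versus 7), which is exactly the kind of difference the distortion experiments later measure.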

SLIDE 10

Ultrametric

Goal: given some dataset, find eiciently its best ultrametric representation

SLIDE 11

Ultrametric

Goal: given some dataset, find eiciently its best ultrametric representation wait… the best ?

SLIDE 12

Problem statement

BEST ULTRAMETRIC FIT (BUF∞)

INPUT:

  • a set V of n elements v1, v2, …, vn
  • a weight function w : V × V → ℝ

OUTPUT:

  • an ultrametric Δ such that, for all vi, vj ∈ V,

w(vi, vj) ≤ Δ(vi, vj) ≤ β · w(vi, vj)

for the minimal value β.
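For a candidate Δ, the smallest feasible β is simply the worst ratio Δ/w over all pairs. The helper below (a hypothetical name) computes it and reports `None` when the lower bound w ≤ Δ fails; it does not check that Δ is an ultrametric, and it assumes w > 0 off the diagonal.

```python
from itertools import combinations


def fit_distortion(w, delta):
    """Smallest beta with w <= Delta <= beta * w on every pair, or None when
    Delta violates the lower bound w <= Delta somewhere.

    Both arguments are symmetric n x n matrices; w is assumed positive
    off the diagonal.
    """
    beta = 1.0
    for i, j in combinations(range(len(w)), 2):
        if delta[i][j] < w[i][j]:
            return None
        beta = max(beta, delta[i][j] / w[i][j])
    return beta


w = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]      # points 0, 1, 3 on a line
delta = [[0, 3, 3], [3, 0, 3], [3, 3, 0]]  # every pair at distance 3: an ultrametric
print(fit_distortion(w, delta))  # 3.0
```

Here the closest pair (distance 1) is stretched to 3, which sets β = 3.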

SLIDE 13

Main results

Theorem 1 (upper bound) There are algorithms that produce, for Euclidean instances of BUF∞ ($V = \mathbb{R}^d$, $w(v_i, v_j) = \|v_i - v_j\|_2$):

  • for any γ > 1, a 5γ-approximation in time $O(nd + n^{1+O(1/\gamma^2)})$
  • a √(log n)-approximation in time $O(nd + n \log^2 n)$

SLIDE 14

Main results

Theorem 2 (lower bounds, informal statement)

  • Assuming the Strong Exponential Time Hypothesis (SETH), there is no algorithm running in subquadratic time that can approximate BUF∞ within a factor 3/2 − o(1) for the ℓ∞ norm, i.e. $V = \mathbb{R}^d$ and $w(v_i, v_j) = \|v_i - v_j\|_\infty$
  • Another lower bound holds for the Euclidean metric, $w(v_i, v_j) = \|v_i - v_j\|_2$, under a “Colinearity Hypothesis”

SETH: SAT on n variables can’t be solved in time $2^{n(1-o(1))}$

Theorem 1 (upper bound) There are algorithms that produce, for Euclidean instances of BUF∞:

  • for any γ > 1, a 5γ-approximation in time $O(nd + n^{1+O(1/\gamma^2)})$
  • a √(log n)-approximation in time $O(nd + n \log^2 n)$
SLIDE 15

Related work

  • [CM10] (Carlsson and Mémoli) → study of linkage algorithms
  • [Das15] (Dasgupta) → what is a good hierarchical clustering? (cost functions)
  • [MW17] (Moseley and Wang), [CAKMTM18] (Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu) → good approximation guarantees for average-linkage for the (dual of) Dasgupta’s cost function & new algorithms; ‘beyond-worst-case’ scenario
  • [CM15] (Cochez and Mou), [ACH19] (Abboud, Cohen-Addad, and Houdrouge) → subquadratic running time implementation of average-linkage and Ward’s method
  • + many others [RP16, CC17, CAKMT17, CCN19, CCNY18, ...]

SLIDE 16

Starting point

SLIDE 17

Starting point

  • Solves a slightly more general problem
  • Provides an algorithm that runs in O(n²) time (assuming queries to w take constant time)

  • This algorithm is optimal
SLIDE 18

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

SLIDE 19

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
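For intuition about step 1, here is the simplest possible choice: an exact MST (γ = 1) computed with the dense form of Prim's algorithm, which makes Θ(n²) distance evaluations. This is a stand-in, not the spanner-based construction the talk uses to reach subquadratic time; `euclidean_mst` is a hypothetical name and `math.dist` requires Python 3.8 or later.

```python
import math


def euclidean_mst(points):
    """Exact minimum spanning tree of the complete graph on `points` (gamma = 1),
    using the dense version of Prim's algorithm: O(n^2) distance evaluations.
    Returns a list of edges (u, v, length)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex to the tree
    link = [-1] * n        # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((link[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                length = math.dist(points[u], points[v])
                if length < best[v]:
                    best[v], link[v] = length, u
    return edges


print(euclidean_mst([(0, 0), (1, 0), (3, 0), (7, 0)]))
# [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 4.0)]
```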

SLIDE 20

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
SLIDE 21

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T

[Figure: an edge (a, b) of T]

SLIDE 22

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T

[Figure: an edge (a, b) of T, with groups of points L and R on either side]

SLIDE 23

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T

[Figure: an edge (a, b) of T between the groups L and R, annotated with the value 84]
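The slides do not spell out how L and R are chosen, so the sketch below makes an explicit assumption: the edges of T are added in increasing length (as in Kruskal's algorithm), L and R are the two components that edge (a, b) joins, and its cut weight is the largest w(x, y) with x ∈ L and y ∈ R. Under that assumption every pair is inspected once, giving an exact O(n²) computation; the talk's faster δ-estimate is a different procedure that this sketch does not reproduce. The name `cut_weights` is hypothetical.

```python
def cut_weights(n, tree_edges, w):
    """Exact cut weights of the edges of a spanning tree T on vertices 0..n-1.

    ASSUMPTION (our reading of the L / R figure): edges of T are added in
    increasing length, as in Kruskal's algorithm; when edge (a, b) joins the
    components L (containing a) and R (containing b), its cut weight is
    max w[x][y] over x in L, y in R. Each pair (x, y) is inspected exactly
    once, so the total cost is O(n^2).
    """
    members = {v: [v] for v in range(n)}  # component root -> its vertices
    root = list(range(n))                  # vertex -> its component root
    cw = {}
    for a, b, _length in sorted(tree_edges, key=lambda e: e[2]):
        ra, rb = root[a], root[b]
        cw[(a, b)] = max(w[x][y] for x in members[ra] for y in members[rb])
        if len(members[ra]) < len(members[rb]):  # relabel the smaller side
            ra, rb = rb, ra
        for v in members[rb]:
            root[v] = ra
        members[ra].extend(members.pop(rb))
    return cw


# Two tight pairs, {0, 1} and {2, 3}, far apart (points 0, 1, 10, 11 on a line).
w = [[0, 1, 10, 11], [1, 0, 9, 10], [10, 9, 0, 1], [11, 10, 1, 0]]
tree = [(0, 1, 1.0), (1, 2, 9.0), (2, 3, 1.0)]
print(cut_weights(4, tree, w))  # {(0, 1): 1, (2, 3): 1, (1, 2): 11}
```

The short edges only "see" their own pair, while the long middle edge carries the largest distance between the two groups (11, from point 0 to point 3).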

SLIDE 24

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T

[Figure: edges of T annotated with the values 117, 13, 10, 43, 50, 23, 61, 80, 8]

SLIDE 25

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
  3. Compute the cartesian tree

[Figure: the values 117, 13, 10, 43, 50, 23, 61, 80, 8]
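Treating the values in the figure as a sequence, a (max-)cartesian tree puts the largest value at the root and recurses on the values to its left and to its right. A standard stack-based construction runs in linear time; `cartesian_tree` and its return format are illustrative choices.

```python
def cartesian_tree(keys):
    """Max-cartesian tree of a sequence: the root is the largest key, and its
    left/right subtrees are the cartesian trees of the keys before/after it.

    Built in O(n) with a stack. Returns (root, left, right), where left[i] and
    right[i] are child indices of position i (-1 when absent).
    """
    n = len(keys)
    left, right = [-1] * n, [-1] * n
    stack = []  # indices on the current right spine; keys decrease from bottom to top
    for i, key in enumerate(keys):
        last_popped = -1
        while stack and keys[stack[-1]] < key:
            last_popped = stack.pop()
        left[i] = last_popped      # the popped run becomes the left subtree of i
        if stack:
            right[stack[-1]] = i   # i hangs to the right of a larger key
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


# The values shown on the slide, read left to right.
keys = [117, 13, 10, 43, 50, 23, 61, 80, 8]
root, left, right = cartesian_tree(keys)
print(keys[root], keys[right[root]], keys[left[right[root]]])  # 117 80 61
```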

SLIDE 27

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
  3. Compute the cartesian tree

[Figure: cartesian tree built from the values 117, 13, 10, 43, 50, 23, 61, 80, 8, with 117 at the root]

SLIDE 28

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
  3. Compute the cartesian tree → This gives a γ · δ-approximation to BUF∞

[Figure: cartesian tree built from the values 117, 13, 10, 43, 50, 23, 61, 80, 8, with 117 at the root]
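Putting the three steps together, one way to read off the ultrametric, assuming the cartesian tree of T is obtained by repeatedly splitting T at its edge of largest cut weight, is: Δ(x, y) is the largest cut weight on the path of T between x and y. Adding the edges of T in increasing cut weight with a union-find style merge computes exactly that. The example uses hand-computed cut weights for points 0, 1, 10, 11 on a line, with T the path 0-1-2-3; the helper name is hypothetical.

```python
def ultrametric_from_cut_weights(n, cut_weight):
    """Turn (estimated) cut weights on the edges of a spanning tree T into an
    ultrametric: Delta[x][y] = largest cut weight on the T-path between x and y,
    i.e. (under the assumption above) the label of their lowest common ancestor
    in the cartesian tree of T.

    Built bottom-up: add the edges of T in increasing cut weight and record,
    for every pair, the weight of the edge that first connects it.
    `cut_weight` maps tree edges (a, b) to their cut weights.
    """
    members = {v: [v] for v in range(n)}
    root = list(range(n))
    delta = [[0] * n for _ in range(n)]
    for (a, b), weight in sorted(cut_weight.items(), key=lambda item: item[1]):
        ra, rb = root[a], root[b]
        for x in members[ra]:
            for y in members[rb]:
                delta[x][y] = delta[y][x] = weight
        if len(members[ra]) < len(members[rb]):
            ra, rb = rb, ra
        for v in members[rb]:
            root[v] = ra
        members[ra].extend(members.pop(rb))
    return delta


# Cut weights of the tree 0 - 1 - 2 - 3 for points 0, 1, 10, 11 on a line.
delta = ultrametric_from_cut_weights(4, {(0, 1): 1, (2, 3): 1, (1, 2): 11})
print(delta)  # [[0, 1, 11, 11], [1, 0, 11, 11], [11, 11, 0, 1], [11, 11, 1, 0]]
```

Against the distances w of those four points, every pair satisfies w ≤ Δ, and the worst ratio is 11/9, attained by the pair (1, 2) at distance 9.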

SLIDE 29

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
  3. Compute the cartesian tree → This gives a γ · δ-approximation to BUF∞

Fast implementation of step 1 in Euclidean space of dimension d, based on the γ-spanner constructions of Har-Peled, Indyk, and Sidiropoulos:

  • any γ > 1: locality sensitive hash family (Andoni and Indyk), time $O(nd + n^{1+O(1/\gamma^2)})$
  • γ = √(log n): Lipschitz partitions (Charikar et al.), time $O(nd + n \log^2 n)$

SLIDE 30

APPROX-BUF: an approximation algorithm for BUF∞

APPROX-BUF

  1. Compute a γ-approximate MST T over the complete graph G
  2. Compute a δ-estimate of the cut weights of the edges in T
  3. Compute the cartesian tree → This gives a γ · 5-approximation to BUF∞

Fast implementation of step 2 in Euclidean space of dimension d: tweak a union-find data structure and compute the cut weights bottom-up; the triangle inequality gives δ = 5, in time $O(nd + n \log n)$.

SLIDE 31

THEORY REAL LIFE

SLIDE 33

Experiments: maximum distortion

Method         DIABETES   MICE  PENDIGITS
Average            11.1    9.7       27.5
Complete           18.5   11.8       33.8
Single              6.0    4.9         14
Ward               61.0   59.3      433.8
Approx-BUF         41.0   51.2      109.8
Approx-BUF2         9.6    9.4       37.2
Farach et al.       6.0    4.9       13.9

  • DIABETES -- 768 samples, 8 features
  • MICE -- 1080 samples, 77 features
  • PENDIGITS -- 10992 samples, 16 features

  • Approx-BUF: approx MST + approx cut weights
  • Approx-BUF2: exact MST + approx cut weights

$\beta = \max_{v_i, v_j} \Delta(v_i, v_j) / w(v_i, v_j)$

SLIDE 34

Experiments: running time

[Figure: running times, in seconds]

SLIDE 35

Conclusion

  • The approach seems promising
  • A good MST is crucial → can we compute a better one efficiently?
  • Cut weights suffer from an approximation factor of 5 → can we do better?
SLIDE 36

Thanks!