Approximating the Best–Fit Tree Under Lp Norms
Boulos Harb, Sampath Kannan and Andrew McGregor, UPenn
$L_p(D, T) = \left( \sum_{i,j} |D[i,j] - T[i,j]|^p \right)^{1/p}$
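As a rough illustration of the cost-of-fit definition, the following sketch computes the L_p distance between an input matrix D and a candidate matrix T. The function names `lp_cost` and `linf_cost`, the example matrices, and the choice to sum over unordered pairs i < j are assumptions made for this example only.

```python
from itertools import combinations

def lp_cost(D, T, p):
    """L_p cost-of-fit between two symmetric distance matrices.

    Assumption: the sum runs over unordered pairs i < j.
    """
    n = len(D)
    total = sum(abs(D[i][j] - T[i][j]) ** p
                for i, j in combinations(range(n), 2))
    return total ** (1.0 / p)

def linf_cost(D, T):
    """Limit case p -> infinity: the largest single discrepancy."""
    n = len(D)
    return max(abs(D[i][j] - T[i][j])
               for i, j in combinations(range(n), 2))

# Hypothetical 3-point example: T differs from D on two pairs, by 1 each.
D_example = [[0, 3, 5],
             [3, 0, 4],
             [5, 4, 0]]
T_example = [[0, 4, 5],
             [4, 0, 5],
             [5, 5, 0]]

l1 = lp_cost(D_example, T_example, 1)
l2 = lp_cost(D_example, T_example, 2)
linf = linf_cost(D_example, T_example)
```

Larger p puts more weight on the worst single discrepancy, which is why the L∞ and L1 problems behave so differently.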
$L_{\mathrm{rel}}(D, T) = \max\left\{ \frac{D[i,j]}{T[i,j]},\ \frac{T[i,j]}{D[i,j]} \right\}$
Ultrametric: weighted tree in which all leaves are equidistant from the root (three-point condition):
∀x, y, z ∈ [n] T[x, y] ≤ max{T[x, z], T[z, y]}
Tree metric (four-point condition):
∀w, x, y, z ∈ [n] T[w, x] + T[y, z] ≤ max{T[w, y] + T[x, z], T[w, z] + T[x, y]}
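The two conditions can be checked directly on a distance matrix. The sketch below is a literal, brute-force translation of the inequalities (cubic and quartic time), intended only to make the definitions concrete; the function names and the three example matrices are hypothetical.

```python
from itertools import product

def is_ultrametric(T, tol=1e-9):
    """Three-point condition: T[x][y] <= max(T[x][z], T[z][y]) for all x, y, z."""
    n = len(T)
    return all(T[x][y] <= max(T[x][z], T[z][y]) + tol
               for x, y, z in product(range(n), repeat=3))

def satisfies_four_point(T, tol=1e-9):
    """Four-point condition, checked over all (not necessarily distinct) 4-tuples."""
    n = len(T)
    return all(T[w][x] + T[y][z]
               <= max(T[w][y] + T[x][z], T[w][z] + T[x][y]) + tol
               for w, x, y, z in product(range(n), repeat=4))

# Hypothetical examples.
U = [[0, 2, 4],          # an ultrametric: leaves 0 and 1 merge at 2, then with 2 at 4
     [2, 0, 4],
     [4, 4, 0]]
P = [[0, 1, 3],          # points 0, 1, 3 on a line: a tree metric, not an ultrametric
     [1, 0, 2],
     [3, 2, 0]]
C = [[0, 1, 2, 1],       # shortest paths on a unit 4-cycle: not a tree metric
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
```

Here `U` passes both checks, `P` passes only the four-point check, and `C` fails the four-point check on the tuple (0, 2, 1, 3).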
[Figure: example tree with leaves Theorist, Computational Geometer, Shell Fish, Chimp, Orangutan, Bee, Fish, Wasp, Spider]
Exact construction of best-fit ultrametric under L∞
3-approximation of best-fit tree under L∞
$n^{1/p}$ approximation of best-fit non-contracting ultrametric under Lp
$O(\log^{1/p} n)$ approximation of best-fit line metric under Lp
Lp: $O(k \log n)^{1/p}$ approximation to best-fit tree, where k is the number of distinct distances in D
Lrel: $O(\log^2 n)$ approximation to best-fit ultrametric
Lp: $n^{1/p}$ approximation to best-fit tree
a) There exists a best-fit (under L1) ultrametric whose distances are a subset of $\{d_1, d_2, \ldots, d_k\}$.
b) There exists an ultrametric whose distances are a subset of $\{d_1, d_2, \ldots, d_k\}$ whose cost-of-fit is at most twice optimal (under Lp).
c) There exists an ultrametric with O(log n) distances whose cost-of-fit is at most twice optimal (under Lrel). [Assuming $d_k/d_1$ is polynomial in n.]
“Splitting Distance” of internal node v = Distance between leaves of the subtree rooted at v
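One way to read the splitting-distance definition: two leaves are at distance equal to the splitting distance of their lowest common ancestor. The sketch below builds the distance matrix under that reading. The nested-tuple tree encoding `(splitting_distance, [children])`, the function names, and the example tree are assumptions for illustration.

```python
def leaves(node):
    """Leaves are ints; internal nodes are (splitting_distance, [children])."""
    if isinstance(node, int):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]

def ultrametric_from_tree(root, n):
    """Set T[i][j] to the splitting distance of the lowest common ancestor of i and j."""
    T = [[0] * n for _ in range(n)]

    def fill(node):
        if isinstance(node, int):
            return
        d, children = node
        groups = [leaves(child) for child in children]
        # Leaves in different child subtrees are separated exactly at this node.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for i in groups[a]:
                    for j in groups[b]:
                        T[i][j] = T[j][i] = d
        for child in children:
            fill(child)

    fill(root)
    return T

# Hypothetical tree on leaves 0..4 with splitting distances 4 > 3 > 2 > 1.
tree = (4, [(2, [0, 1]), (3, [2, (1, [3, 4])])])
T_tree = ultrametric_from_tree(tree, 5)
```

The result satisfies the three-point condition whenever splitting distances do not increase from a node to its descendants, as in this example.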
Set length of inter-cluster edges to $d_k$: $T[i,j] = d_k$
All other lengths will be set to ≤ $d_{k-1}$: $T[i,j] \le d_{k-1}$
($|w_e|$ if e is split) + ($|w_e|$ if e is not split)
[Figure: example graph with edge weights +2, +1, +3, +1, +2]
Best-Fit Ultrametric Instance:
[Figure: graph with edge lengths 20, 11, 14, 17, 20, 18, 20, 20, 18, 20]
Possible Splitting Distances: 20, 18, 17, 14, 11
Top level clustering: Increase some lengths to 20 and decrease some length-20 edges to 18
Correlation Clustering Instance:
[Figure: graph with edge weights +9, +6, +3, +2, +2]
Cost of length changes = Cost of disagreements during clustering
Recurse:
[Figure: sub-instance with edge lengths 17, 11, 14; successive builds are labeled 18 and 14]
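The correspondence between length changes and clustering disagreements at the top level can be sketched as follows, under L1. A pair shorter than the largest distance $d_k$ pays $d_k - D[i,j]$ if it is split; a pair already at $d_k$ pays $d_k - d_{k-1}$ if it is kept together. The 4-point matrix, the function names, and the restriction to L1 are assumptions for this example; the slide's own 10-edge instance cannot be recovered from the transcript.

```python
from itertools import combinations

def top_level_weights(D):
    """Per-pair costs for the top-level split (assumes at least two distinct distances)."""
    n = len(D)
    dists = sorted({D[i][j] for i, j in combinations(range(n), 2)})
    d_k, d_k_minus_1 = dists[-1], dists[-2]
    split_cost, keep_cost = {}, {}
    for i, j in combinations(range(n), 2):
        if D[i][j] < d_k:
            split_cost[(i, j)] = d_k - D[i][j]    # raised to d_k if split
        else:
            keep_cost[(i, j)] = d_k - d_k_minus_1  # lowered to d_{k-1} if kept together
    return split_cost, keep_cost

def disagreement_cost(D, clusters):
    """L1 cost of the length changes forced by a top-level clustering."""
    label = {v: c for c, cluster in enumerate(clusters) for v in cluster}
    split_cost, keep_cost = top_level_weights(D)
    cost = sum(w for (i, j), w in split_cost.items() if label[i] != label[j])
    cost += sum(w for (i, j), w in keep_cost.items() if label[i] == label[j])
    return cost

# Hypothetical instance with lengths 11, 17, 18 and three edges of length 20.
D_cc = [[0, 11, 20, 20],
        [11, 0, 20, 17],
        [20, 20, 0, 18],
        [20, 17, 18, 0]]

cost_two_clusters = disagreement_cost(D_cc, [[0, 1], [2, 3]])
cost_one_cluster = disagreement_cost(D_cc, [[0, 1, 2, 3]])
cost_singletons = disagreement_cost(D_cc, [[0], [1], [2], [3]])
```

With clusters {0, 1} and {2, 3} only the length-17 pair is split, so the cost is 3; keeping everything together pays 2 for each of the three length-20 pairs.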
Cost of optimal clustering is ≤ OPT
Cost of our clustering is ≤ O(log n) OPT
Consider reducing maximum length to d and forcing a partition
“Push-down-cost(d)”: cost of reducing each length ≥ d to d
“Cutting-cost(d)”: cost of increasing each cut edge’s length to d
Best-Fit Ultrametric Instance:
[Figure: graph with edge lengths 19, 11, 14, 17, 20, 18, 19, 19, 18, 19]
Minimum Cut Instance:
[Figure: graph with edge weights 1, 9, 6, 3, 2, 1, 1, 2, 1 (split at 20); 8, 5, 2, 1, 1 (split at 19); 7, 4, 1, 1 (split at 18)]
Split at 20: Push-down cost = 0, Cut-cost = 3
Split at 19: Push-down cost = 1, Cut-cost = 1
Split at 18: Push-down cost = 5, Cut-cost = 0
[Figure: ultrametric tree with splitting distances $d_1, \ldots, d_{k-2}, d_{k-1}, d_k$; label $T[i,j] = d_{k-1}$]
$\min_d \big( \text{Push-down-cost}(d) + \text{Cutting-cost}(d) \big) \le \mathrm{OPT}$
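A minimal sketch of the quantity being minimized, under L1: for each candidate d, add the cost of pushing every length ≥ d down to d to the cost of the cheapest two-way cut, where each cut pair shorter than d is raised to d. The brute-force search over bipartitions stands in for a real minimum-cut algorithm; it, the function names, and the 4-point matrix are assumptions for illustration.

```python
from itertools import combinations

def push_down_cost(D, d):
    """L1 cost of reducing each length >= d to d."""
    n = len(D)
    return sum(D[i][j] - d
               for i, j in combinations(range(n), 2) if D[i][j] >= d)

def cutting_cost(D, d):
    """Cheapest two-way partition; each cut pair shorter than d pays d - D[i][j].

    Brute force over bipartitions, so only suitable for tiny inputs.
    """
    n = len(D)
    best = None
    # Vertex n-1 stays on side 0, so each bipartition is enumerated once.
    for mask in range(1, 2 ** (n - 1)):
        side = [(mask >> v) & 1 for v in range(n)]
        cost = sum(max(0, d - D[i][j])
                   for i, j in combinations(range(n), 2) if side[i] != side[j])
        if best is None or cost < best:
            best = cost
    return best

def best_split(D):
    """Return (total cost, d) minimizing push-down cost plus cutting cost."""
    n = len(D)
    candidates = sorted({D[i][j] for i, j in combinations(range(n), 2)})
    return min((push_down_cost(D, d) + cutting_cost(D, d), d) for d in candidates)

# Hypothetical instance with lengths 11, 17, 18 and three edges of length 20.
D_cut = [[0, 11, 20, 20],
         [11, 0, 20, 17],
         [20, 20, 0, 18],
         [20, 17, 18, 0]]

best_cost, best_d = best_split(D_cut)
```

On this instance, splitting at 20 costs 0 to push down and 2 to cut off vertex 2, while splitting at 18 costs 6 to push down and 0 to cut, so the minimum is attained at d = 20.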
An α-approx to the optimal “a-restricted ultrametric” (under Lp) can be used to construct a 3α-approx to the optimal tree metric (under Lp).
For all i, $T[a,i] = 2\mu$
For all i, j, $2\mu \ge T[i,j] \ge 2\,(\mu - \min(D[a,i], D[a,j]))$
where $\mu = \max_i D[a,i]$
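The a-restricted constraints can be checked mechanically for a candidate matrix T. The sketch below assumes the constraints apply to points other than a itself, and it checks only the two stated bounds, not whether T is an ultrametric; the function name and the example matrices are hypothetical.

```python
def is_a_restricted(T, D, a, tol=1e-9):
    """Check the a-restricted bounds on T relative to D and the base point a.

    Assumption: the constraints range over points i, j different from a.
    """
    n = len(D)
    mu = max(D[a][i] for i in range(n))
    others = [i for i in range(n) if i != a]
    # Every point must be at distance exactly 2*mu from a.
    if any(abs(T[a][i] - 2 * mu) > tol for i in others):
        return False
    # All other pairs must lie between the lower bound and 2*mu.
    for i in others:
        for j in others:
            if i == j:
                continue
            lower = 2 * (mu - min(D[a][i], D[a][j]))
            if not (lower - tol <= T[i][j] <= 2 * mu + tol):
                return False
    return True

# Hypothetical input with base point a = 0, so mu = 20 and 2*mu = 40.
D_res = [[0, 11, 20, 20],
         [11, 0, 20, 17],
         [20, 20, 0, 18],
         [20, 17, 18, 0]]
T_good = [[0, 40, 40, 40],
          [40, 0, 18, 18],
          [40, 18, 0, 10],
          [40, 18, 10, 0]]
T_bad = [[0, 40, 40, 40],
         [40, 0, 10, 18],   # T[1][2] = 10 is below the bound 2 * (20 - 11) = 18
         [40, 10, 0, 10],
         [40, 18, 10, 0]]
```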
Lp: $O(\min(n, k \log n))^{1/p}$ approximation, where k is the number of distinct distances in D
Lrel: $O(\log^2 n)$ approximation
Lp: $n^{1/p}$ approximation
Late Breaking News: Upcoming FOCS paper by Ailon and Charikar has improved results!