Faster Core-Set Construction and Data-Stream Algorithm in Fixed - - PowerPoint PPT Presentation

▶

Feb 02, 2023 287 likes •449 views

Faster Core-Set Construction and Data-Stream Algorithm in Fixed Dimentions Timothy M. Chan Main Results: 1) Diameter: O(n+1/ d-3/2 ), improvement from O(n+1/ d-1/2 ) 2) Width: O(n+1/ d-1 ) improvement from O(n+1/ 3(d-1)/2 ) 3)

SLIDE 1

Faster Core-Set Construction and Data-Stream Algorithm in Fixed Dimentions

Timothy M. Chan

SLIDE 2

Main Results:

1) Diameter: O(n+1/ε d-3/2), improvement from O(n+1/ε d-1/2) 2) Width: O(n+1/ε d-1) improvement from O(n+1/ε 3(d-1)/2) 3) Enclosing Cylinder: O(n+1/ε d-1) improvement from O(n+1/ε 3(d-1)/2) 4) Dynamic Data structures with smaller polylogarithmic factor for update.

SLIDE 3

Main Observation:

Let [E] represent integers {1,...,E} Let pi denote i-th coordinate of a given point p in d dimentional space. Fix δ>0 and suppose Eδ ≤ F ≤ E, where E and F are integers. Let P ⊆ [E]d-1 ×  be an n-point set For every ξ ∈[F]d-1 compute q[ξ]=argmaxp∈P(p1ξ1 + ....+ pd−1ξd−1 + pd ) This can be done in O(n+Ed-2F) total time.

SLIDE 4

Geometric Interpretation in 2D

SLIDE 5

Algorithms

1. for i ∈[E] do
2. for ξ2,...,ξd−1 ∈[F] do
3. r[i,ξ2,...,ξd−1] = argmaxp1 =i, p∈P(p2ξ2 + ...+ pd−1ξd−1 + pd )
4. for ξ2,...,ξd−1 ∈[F] do
5. for ξ1 ∈ [F] do
6. q[ξ1,...,ξd-1] = argmaxp(p1ξ1 + ...+ pd−1ξd−1 + pd )

where p ∈ {r[i,ξ1,...,ξd-1] | i ∈ [E]}

SLIDE 6

Geometric Interpretation in 2D

SLIDE 7

Running Time

Lines 2, 3 is d-1 dimentional subproblem of the same type. Line 4 is of size Fd-2 Line 5, 6 is 2 dimentional subproblem of the same type. We get running time T

d(n) =

d-1(ni) + Fd-2T 2(E) + O(EFd-2 + Fd-1) i=1 E

∑

Base case d = 2 can be handeled by explisidly constructing convex hull using radix-sort O(n + Eδ ) and Grahn's scan. Hence T

2(n) = O(n+F).

By induction and since F ≤ E we get T

d(n) = O(n+Ed-2F)

1. for i ∈[E] do
2. for ξ2,...,ξd−1 ∈[F] do
3. r[i,ξ2,...,ξd−1] = argmaxp1 =i, p∈P(p2ξ2 + ...+ pd−1ξd−1 + pd)
4. for ξ2,...,ξd−1 ∈[F] do
5. for ξ1 ∈ [F] do
6. q[ξ1,...,ξd-1] = argmaxp(p1ξ1 + ...+ pd−1ξd−1 + pd)

where p ∈ {r[i,ξ1,...,ξd-1] | i ∈ [E]}

SLIDE 8

Corollary

Given [E]k × d−k we cancompute nearest neighbor to each grid point is [F]k × {0}d−k it total time O(n+Ek-1F) Proof: Given ξ ∈[F]k × {0}d−k, argminp ∈ P p-ξ is also minimizing ξ

2 - 2p1ξ1 + ...+ 2pkξk + p 2 . Which is the same as to find

argmaxp∈P(2p1ξ1 + ...+ 2pkξk − p

2). pk+1 + ...+ pd = p' ∈ ,

SLIDE 9

Diameter

Theorem 1: Suppose origin ο ∈ box B, where the boundary ∂B is of distance at least 1 from ο. Given an ε-grid over ∂B, for any vector x ∃ a grid point ξ such that the angle ∠(ξ,x) between οξ and x is at most arccos(1-ε 2 / 8) Proof: By scaling assume that x ∈ ∂B. Clearly ∃ ξ : ξ-x ≤ ε / 2. Hence 2ξix ≥ ξ

2 + x 2 − ε 2 / 4 ≥ 2 ξ x − ε 2 / 4 ≥ 2 ξ x (1− ε 2 / 8).

Now since cos∠(ξ,x) = ξix / ξ x theorem follows.

SLIDE 10

Diameter Cont’d

Theorem: The diameter of n points set in d can be approximated to within a factor

f 1+ε in O(n+1/ε d-3/2)

Proof: 1) Compute 2 approximation by takin a random point which will be the origin ο and finding the point furtherst from it - v. The diameter Δ ⊆ ( v ,2 v ) 2) Round each point p ∈ P to point p' on an (εΔ) − grid. 3) Let Ξ be the points of a ε − grid over ∂[-1,1]d 4) Return the fartherst pair among {(pξ,qξ)} ξ∈Ξ where pξ,qξ ∈P maximize (p'ξ-q'ξ)iξ

SLIDE 11

Diameter Cont’d

Clearly this is the direct reduction to the main observation with E = O(1/ε) and F = O(1/ ε ); since we are maximizing p'iξ and minimizing q'iξ for each point on 2d grid of dimention d-1. We get the running time quoted. Remember that ∠(ξ, p'-q') ≤ arccos(1-ε/8). Then: (p'ξ − q'ξ)iξ ≥ (p' - q')iξ ⇒ by Cauch-Schwartz p'ξ − q'ξ ξ ≥ p' − q' ξ (1− ε / 8) ⇒ by definition of rounding pξ − qξ + εΔ ≥ ( p − q + εΔ)(1− ε / 8) since maxξ∈Ξ pξ − qξ is 1 + O(ε) approximation of diameter we can readjust ε to get approximation factor of 1+ε

SLIDE 12

Dynamic Cylinder Approx.

Let Rad(P) denote minimum radius of all cylinders enclosing point set P Let d(p, l) denote the distance between point p and line l. WLOG let ο be the origin. Let v be the point furtherst from ο. Theorem1: d(p, οv) ≤ 2( p v +1)Rad({o,v,p}) Proof: In the triangle ovp let h1,h2,h3 be the respective altitudes for ov, op and pv. 2Rad({ο,ν,p}) = min{h1,h2,h3}=h1 min{1, v p , v p − v }, since h1 = d(p,ov) h1 min{1, v p , v p − v } ≥ d(p,ov)( v p + v ).

SLIDE 13

Dynamic Cylinder

Approx. Cont’d

Theorem2: Given a stream of points in d we can maintain factor-18 approximation of minimum radius

ver all enclosing cylinders in a single pass and O(d) space and update time.

Proof: 1) start with two points o,v, set value of w = 0. insert(p): 2) w = max{w, Rad({o,v,p})} 3) if p > 2 v then v = p Let wf and vf refer to the final values of w and v, and let vi refer to the value after insertion i. Remember that vi > 2 vi-1 for all i.

SLIDE 14

Dynamic Cylinder

Approx. Cont’d

Assume that we just added point q. By theorem 1: d(q, ovj) ≤ 2(2+1)Rad({o,vj,q}) ≤ 6wf. For every i > j: d(q, ovi) ≤ d(q, ovi-1) + d(q', ovi) where q' is the orthoganal projection of q on ovi-1. By similarity of triangles: d(q', ovi) = ( q' / vi-1 )d(vi-1,ovi) and d(q', ovi) ≤ ( q / vi-1 )3wf Therefore: d(q, ovi) ≤ d(q, ovi-1) + 3wf q / vi−1

SLIDE 15

Dynamic Cylinder

Approx. Cont’d

We had d(q, ovi) ≤ d(q, ovi-1) + 3wf q / vi−1 Now due to the doubling property we get: d(q, ovf) ≤ d(q, ovj) + 3wf(1+1/ 2 +1/ 4 + ...) q / vj ≤ 6wf + 2(2)3wf = 18wf