lecture 8

Lecture 8 Barna Saha AT&T-Labs Research October 3, 2013 - PowerPoint PPT Presentation



  1. Lecture 8 Barna Saha AT&T-Labs Research October 3, 2013

  2. Outline Clustering K-Center

  3. K-Center
     ◮ Given a set of distinct points $P = \{p_1, p_2, \ldots, p_n\}$, find a set of $k$ points $Q \subset P$, $|Q| = k$, that minimizes $\max_i \min_{q \in Q} d(p_i, q)$, where $d$ is any metric.
     Suppose the optimal distance is $r$. If we know $r$, we can find a 2-approximation in $O(k)$ space.
     Thresholded Algorithm: When a new point arrives, if its minimum distance from the already opened centers is more than $2r$, open a center at that point. Otherwise, assign it to the nearest open center.
     If we know $a < r < b$, we can find a $(2+\epsilon)$-approximation in $O\left(\frac{k}{\epsilon} \log \frac{b}{a}\right)$ space.
     Theorem: A $(2+\epsilon)$-approximation can be found in $O\left(\frac{k}{\epsilon} \log \frac{1}{\epsilon}\right)$ space.
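The thresholded rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `thresholded_k_center`, the `dist` parameter, and the choice to return `None` when a $(k+1)$-th center would be needed are all assumptions made here.

```python
import math

def thresholded_k_center(stream, k, r, dist=math.dist):
    """One pass over `stream`; returns the opened centers, or None on failure.

    Assumes `r` is at least the optimal k-center radius (hypothetical input).
    """
    centers = []
    for p in stream:
        # Distance from p to its nearest open center (infinite if none is open).
        nearest = min((dist(p, c) for c in centers), default=math.inf)
        if nearest > 2 * r:
            if len(centers) == k:
                return None  # a (k+1)-th center would be needed: r was too small
            centers.append(p)
        # Otherwise p is implicitly assigned to its nearest open center.
    return centers
```

Only the centers are stored, which is where the $O(k)$ space bound comes from. Every point ends up within $2r$ of some center, and any two centers are more than $2r$ apart, so a guess $r$ that is at least the optimum never opens more than $k$ centers.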

  4. K-Center-Algorithm
     ◮ Read the first $k$ items in the input. This has error 0. Keep reading the input as long as the error remains 0.
     ◮ Suppose we see the first input that causes non-zero error. This gives a lower bound $a$ for $r$.
     ◮ Initialize and run the thresholded algorithm for $l_0 = a$, $l_1 = a(1+\epsilon)$, $l_2 = a(1+\epsilon)^2$, ..., $l_J = a(1+\epsilon)^J = O\left(\frac{1}{\epsilon}\right)$.
     ◮ If the thresholded algorithm declares "FAIL" (tries to open $k+1$ centers) for some $l_i$, $i \in [1, J]$, terminate the algorithm for all $l_{i'}$, $i' \le i$. Start running a thresholded algorithm for $l_{i'}(1+\epsilon)^{J+1}$ for $i' \in [0, i]$, using the summarization of threshold $l_{i'}$ as the initial input. [Stream-Strapping]
     ◮ Repeat the above steps until the end of the input. At that time, report the centers for the lowest estimate for which the thresholded algorithm is still running.
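The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the lecture's implementation: the names `insert`, `restart`, and `stream_strap_k_center` are hypothetical, the lower bound $a$ is taken as half the smallest pairwise distance among the first $k+1$ distinct points (one valid choice; the slide does not say how $a$ is computed), and a restarted run keeps raising its threshold until its summary fits. It assumes $\epsilon > 0$.

```python
import math
from itertools import chain, combinations

def insert(centers, p, k, thr, dist):
    """Thresholded rule at threshold thr; False means FAIL (needs k+1 centers)."""
    if min((dist(p, c) for c in centers), default=math.inf) > 2 * thr:
        if len(centers) == k:
            return False
        centers.append(p)
    return True

def restart(summary, p, k, thr, factor, dist):
    """Raise thr by `factor` until the old centers plus p fit into k centers."""
    while True:
        thr *= factor
        centers = []
        if all(insert(centers, q, k, thr, dist) for q in summary + [p]):
            return [thr, centers]

def stream_strap_k_center(stream, k, eps, J, dist=math.dist):
    """Returns (threshold, centers) for the lowest estimate still running."""
    it = iter(stream)
    prefix = []
    for p in it:  # read until k + 1 distinct points have been seen
        if p not in prefix:
            prefix.append(p)
        if len(prefix) == k + 1:
            break
    if len(prefix) <= k:
        return 0.0, prefix  # error 0: every point is its own center
    # Assumed lower bound a: half the closest pair among the k + 1 points.
    a = min(dist(p, q) for p, q in combinations(prefix, 2)) / 2
    factor = (1 + eps) ** (J + 1)
    runs = [[a * (1 + eps) ** i, []] for i in range(J + 1)]
    for p in chain(prefix, it):
        failed = [thr for thr, cs in runs if not insert(cs, p, k, thr, dist)]
        if failed:
            cutoff = max(failed)
            # Restart every run at or below the failed threshold, seeding it
            # with that run's own centers (its summary) and the current point.
            runs = [restart(cs, p, k, thr, factor, dist) if thr <= cutoff
                    else [thr, cs] for thr, cs in runs]
    thr, centers = min(runs, key=lambda run: run[0])
    return thr, centers
```

Each run stores at most $k$ centers and there are $J + 1$ runs, so the memory used is $O(kJ)$ points regardless of the stream length.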

  5. K-center, Sketch Analysis
     ◮ Suppose the end threshold is $R$ and it is updated $i$ times: $R_0$, $R_0(1+\epsilon)^{J+1}$, $R_0(1+\epsilon)^{2(J+1)}$, ..., $R_0(1+\epsilon)^{i(J+1)}$.
     ◮ $i = 0$: $Q_1 = P_1 = [p_1, p_2, \ldots, p_j]$.
       $\mathrm{Error}(Q_1) = \mathrm{Error}(P_1) \le 2R_0$
       $\mathrm{OPT}(Q_1) > \frac{R_0}{1+\epsilon}$
       $\mathrm{Error}(Q_1) \le 2R_0 \le (2+2\epsilon)\,\mathrm{OPT}(Q_1)$
     ◮ $i = 1$: $Q_2 = [q_1, q_2, \ldots, q_k, p_{j+1}, p_{j+2}, \ldots, p_{j'}]$, $P_2 = [p_{j+1}, p_{j+2}, \ldots, p_{j'}]$.
       Terminates with $R_1 = R_0(1+\epsilon)^{J+1}$ but not with $\frac{R_1}{1+\epsilon}$.
       $\mathrm{Error}(Q_2) \le 2R_1$
       $\mathrm{OPT}(Q_2) > \frac{R_1}{1+\epsilon}$
       $\mathrm{Error}(Q_2) \le 2R_1 \le (2+2\epsilon)\,\mathrm{OPT}(Q_2)$

  6. K-center, Sketch Analysis
     ◮ Relationships between $\mathrm{Error}(Q_2)$ and $\mathrm{Error}(P_1 \cup P_2)$, and between $\mathrm{OPT}(Q_2)$ and $\mathrm{OPT}(P_1 \cup P_2)$:
       $\mathrm{Error}(P_1 \cup P_2) \le \mathrm{Error}(Q_2) + \mathrm{Error}(Q_1) \le 2R_1 + 2R_0 = 2R_1\left(1 + \frac{1}{(1+\epsilon)^{J+1}}\right)$
       $\mathrm{OPT}(P_1 \cup P_2) \ge \mathrm{OPT}(Q_2) - \mathrm{Error}(Q_1) \ge \frac{R_1}{1+\epsilon} - 2R_0 = \frac{R_1}{1+\epsilon}\left(1 - \frac{2}{(1+\epsilon)^{J}}\right)$
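Dividing the upper bound on $\mathrm{Error}(P_1 \cup P_2)$ by the lower bound on $\mathrm{OPT}(P_1 \cup P_2)$ gives the approximation ratio after one restart. The following worked step is a sketch that is not on the slides; it assumes $(1+\epsilon)^J > 2$ so that the denominator is positive, and the constant $c$ below is a hypothetical name introduced only for illustration.

```latex
\frac{\mathrm{Error}(P_1 \cup P_2)}{\mathrm{OPT}(P_1 \cup P_2)}
\le \frac{2R_1\left(1 + \frac{1}{(1+\epsilon)^{J+1}}\right)}
         {\frac{R_1}{1+\epsilon}\left(1 - \frac{2}{(1+\epsilon)^{J}}\right)}
= 2(1+\epsilon)\,\frac{1 + (1+\epsilon)^{-(J+1)}}{1 - 2(1+\epsilon)^{-J}}
```

If $J$ is chosen so that $(1+\epsilon)^J \ge c/\epsilon$ for a constant $c \ge 4$ and $\epsilon \le 1$, the last fraction is $1 + O(\epsilon)$, so the ratio is $2 + O(\epsilon)$.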

  7. K-Median
     ◮ When we know the optimum solution $r$: set $f = \frac{r}{k(1+\log n)}$.
     ◮ When considering point $x$, let $\delta$ be the distance to the nearest open center. Open a center at $x$ with probability $\frac{\delta}{f}$. Otherwise, assign $x$ to the nearest open center.
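The randomized rule above can be sketched as follows. This is a minimal illustration, not the lecture's code: the function name `online_k_median_pass` is hypothetical, the probability is capped at 1 (an assumption, since $\delta / f$ can exceed 1), the natural logarithm is used because the slide does not state the base, and a seeded `random.Random` is used only to make runs repeatable.

```python
import math
import random

def online_k_median_pass(stream, k, r, n, dist=math.dist, rng=None):
    """One pass of the randomized rule; returns (centers, assignment_cost).

    Assumes r > 0 is the known optimum value and n >= 1 is the stream length.
    """
    rng = rng or random.Random(0)
    f = r / (k * (1 + math.log(n)))  # facility cost from the slide
    centers, cost = [], 0.0
    for x in stream:
        # Distance to the nearest open center (infinite before any is open).
        delta = min((dist(x, c) for c in centers), default=math.inf)
        if rng.random() < min(delta / f, 1.0):
            centers.append(x)  # open a center at x
        else:
            cost += delta      # assign x to its nearest open center
    return centers, cost
```

The first point always opens a center because its distance to the empty center set is infinite, and a point that coincides with an open center is never opened again.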

  8. K-Median
     Setting the initial estimate: Error after reading the $(k+1)$-th point.
     How many copies to maintain? $O\left(\frac{1}{\epsilon} \log \frac{1}{\epsilon}\right)$. But it needs $O\left(\frac{1}{\epsilon} \log n\right)$ copies of Stream-Strap to boost the confidence.
     When to declare that an individual estimate is wrong? If the error becomes more than $4(1+\epsilon)L$ or the run opens more than $k' \simeq \frac{k}{\epsilon'} \log n$ centers.
     Initial Summary: $k'$ centers weighted by the number of points assigned to those centers.
     Final Output: Run a K-median offline algorithm on the selected $k'$ weighted centers.

  9. K-Means++
     ◮ Extension of K-means clustering: minimizes the within-cluster sum of squared error.
     ◮ The initial choice of centers is crucial to guarantee quicker convergence and an approximation bound.
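The slide does not spell out how the initial centers are chosen. As a hedged illustration, the sketch below uses the seeding rule commonly associated with k-means++: the first center is picked uniformly at random, and each later center is sampled with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeans_pp_seed` and the seeded generator are assumptions made here.

```python
import math
import random

def kmeans_pp_seed(points, k, rng=None, dist=math.dist):
    """Pick up to k initial centers from `points` by squared-distance sampling."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(dist(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:
            break  # every point coincides with a chosen center
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

Points that are already centers have weight zero and cannot be picked again, so the returned centers are distinct whenever the input points are.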
