Algorithms with provable guarantees for clustering problems
Ola Svensson
Where to place rescue centers?
Build k centers so as to minimize sum of travel distances
More generally: build k centers so as to optimize some objective
Median and Center
MEDIAN: Open a point/facility on the real line so as to minimize the sum of distances from the clients
- Moving the facility one way: decrease distance for 3 clients, increase distance for 6 clients
- Moving it the other way: decrease distance for 6 clients, increase distance for 3 clients
CENTER: Open a point/facility on the real line so as to minimize the max distance over all clients
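A minimal sketch of the two one-dimensional objectives, assuming clients are given as numbers on the real line. The client positions and the helper names (`sum_of_distances`, `max_distance`, `best_median_point`, `best_center_point`) are illustrative and not from the slides. It uses two standard facts: a median of the positions minimizes the sum of distances, and the midpoint of the two extreme clients minimizes the max distance.

```python
from statistics import median

def sum_of_distances(clients, x):
    # MEDIAN objective: total distance from all clients to the facility at x
    return sum(abs(c - x) for c in clients)

def max_distance(clients, x):
    # CENTER objective: distance from the farthest client to the facility at x
    return max(abs(c - x) for c in clients)

def best_median_point(clients):
    # A median of the positions minimizes the sum of distances
    return median(clients)

def best_center_point(clients):
    # The midpoint of the leftmost and rightmost client minimizes the max distance
    return (min(clients) + max(clients)) / 2

# Hypothetical instance: 6 clients on the left, 3 on the right
clients = [0, 1, 2, 3, 4, 5, 20, 21, 22]
m = best_median_point(clients)
c = best_center_point(clients)
print(m, sum_of_distances(clients, m), max_distance(clients, m))
print(c, sum_of_distances(clients, c), max_distance(clients, c))
```

The two optima differ on this instance: the median sits among the six clients on the left, while the center sits halfway between the extremes.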
K-Median and K-Center
K-MEDIAN: Open k points/facilities in a metric space so as to minimize the sum of distances from the clients
K-CENTER: Open k points/facilities in a metric space so as to minimize the max distance over all clients
Mathematical formulation of objective functions
General problem parameterized by $p \ge 1$: find a set $S$ of k points/facilities in a metric space so as to minimize
$$\left( \sum_{j \in \text{clients}} d(j, S)^p \right)^{1/p}$$
where $d(j, S)$ is the distance from client $j$ to the closest facility in $S$.
K-MEDIAN: $p = 1$
K-CENTER: $p = \infty$
K-MEANS: $p = 2$. Actually, k-means minimizes $\sum_{j \in \text{clients}} d(j, S)^2$ (without the square root) under the Euclidean metric
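The general objective can be evaluated directly. The sketch below assumes clients and facilities are coordinate tuples and uses the Euclidean distance `math.dist` as the metric; the function name `clustering_cost` and the sample points are illustrative only.

```python
import math

def clustering_cost(clients, facilities, p, dist=math.dist):
    # d(j, S): distance from each client j to its closest facility in S
    d = [min(dist(j, s) for s in facilities) for j in clients]
    if math.isinf(p):
        # p = infinity is the k-center objective (max distance)
        return max(d)
    # (sum_j d(j, S)^p)^(1/p); p = 1 is the k-median objective
    return sum(x ** p for x in d) ** (1.0 / p)

# Hypothetical instance with distances 0, 5 and 10 to the single facility
clients = [(0, 0), (3, 4), (6, 8)]
S = [(0, 0)]
k_median_cost = clustering_cost(clients, S, 1)
k_center_cost = clustering_cost(clients, S, math.inf)
l2_cost = clustering_cost(clients, S, 2)  # square root of the k-means cost
```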
Facility Location
Facility Location: Open facilities in a metric space so as to minimize sum of distances from clients + opening costs
ALL THESE PROBLEMS ARE INTRACTABLE (NP-HARD) IN THE WORST CASE
Solving intractable problems
- Heuristics
  - good for "typical" instances
  - bad instances do not happen too often
[Figure: number of cities in instances solved to optimality (log scale, 1 to 16384) by decade (50's to 00's)]
Dantzig, Fulkerson, and Johnson solve a 49-city instance to optimality
Applegate, Bixby, Chvatal, Cook, and Helsgaun solve a 24978-city instance
Sweden has only 9 million inhabitants ≈ 360 persons/city
- Approximation Algorithms
  - Perhaps we can efficiently find a reasonably good solution?
Approximation ratio α: the worst case, over all instances, of the ratio between the cost of the solution found and the cost of an optimal solution
- α = 1 is an exact polynomial-time algorithm
- α = 1.01 means the algorithm finds a solution with at most 1% higher cost than optimal
GOAL: Complete understanding of worst-case behavior
State of the Art
Problem             Approximation                            Hardness
Facility Location   1.488 [Li'11]                            1.463 [Guha & Khuller'98]
K-Center            2 [Gonzalez'85, Hochbaum & Shmoys'85]    2 [Hsu & Nemhauser'79]
K-Median            2.67 [Byrka et al.'15]                   1+2/e [Jain et al.'02]
K-Means             9 [Kanungo et al.'04]                    1.0013 [Lee, Schmidt, Wright'15]
Even better: Approximation algorithms (can be) achieved by standard LP relaxations and techniques transfer between problems
A 2-APPROXIMATION ALGORITHM FOR K-CENTER
Greedy K-Center
Open any point
For i = 2, …, k
    Open the point farthest away from the already opened points
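The greedy rule above can be sketched in a few lines. This version assumes points are coordinate tuples with the Euclidean distance `math.dist`, that centers may only be opened at input points, and that "any point" means the first point in the list; the sample instance is hypothetical.

```python
import math

def greedy_k_center(points, k, dist=math.dist):
    # "Open any point": here simply the first one (an arbitrary choice)
    centers = [points[0]]
    # d[j] = distance from points[j] to its closest opened center
    d = [dist(p, centers[0]) for p in points]
    for _ in range(2, k + 1):
        # Open the point farthest away from the already opened points
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(points[far])
        d = [min(d[j], dist(points[j], points[far])) for j in range(len(points))]
    # Return the centers and the resulting k-center objective value
    return centers, max(d)

# Hypothetical instance: three well-separated pairs of points
points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
centers, radius = greedy_k_center(points, 3)
```

Maintaining the array `d` keeps each round linear in the number of points, so the whole procedure takes O(nk) distance evaluations.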
Analysis
Consider an optimal solution, with value OPT, and the corresponding Voronoi diagram (each cell contains the clients closest to one optimal center)
Case I: We opened one point in each cell
Every client is within distance ≤ OPT of the optimal center of its cell, and so is the point we opened in that cell. By the triangle inequality the client is within distance ≤ OPT + OPT = 2 · OPT of our point.
In this case any client is connected within distance ≤ 2 · OPT
Case II: We did not open one point in each cell, so we opened two points in a single cell
Both points are within distance ≤ OPT of that cell's optimal center, so they are within distance ≤ 2 · OPT of each other. The later of the two was, when opened, the point farthest from the already opened points, so at that time every client was within distance ≤ 2 · OPT of an opened point.
Also in this case any client is connected within distance ≤ 2 · OPT
THEOREM:
The above greedy algorithm is a 2-approximation for k-Center
Gonzalez'85, Hochbaum & Shmoys'85
ALGORITHMS FOR FACILITY LOCATION AND K-MEDIAN
LINEAR PROGRAMMING RELAXATION
LP Relaxation for Facility Location
LINEAR PROGRAM:
- $y_i$ takes value 1 if facility $i$ is opened and 0 otherwise
- $x_{ij}$ takes value 1 if client $j$ is connected to facility $i$ and 0 otherwise
Here $F$ is the set of facilities, $C$ the set of clients, $f_i$ the opening cost of facility $i$, and $d_{ij}$ the distance between $i$ and $j$.
minimize    $\sum_{i \in F} f_i y_i + \sum_{i \in F, j \in C} d_{ij} x_{ij}$    (opening cost + connection cost)
subject to  $\sum_{i \in F} x_{ij} = 1$ for all $j \in C$    (every client is connected)
            $x_{ij} \le y_i$ for all $i \in F, j \in C$    (clients are connected to open facilities)
            $x_{ij}, y_i \in [0,1]$ for all $i \in F, j \in C$
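To make the relaxation concrete, the sketch below evaluates the LP objective and checks the LP constraints for a given (possibly fractional) solution. It does not solve the LP. The function names and the two-facility instance are illustrative assumptions, with `f[i]` the opening costs, `d[i][j]` the distances, and `y[i]`, `x[i][j]` the LP values.

```python
def lp_objective(f, d, y, x):
    # Opening cost: sum_i f_i * y_i
    opening = sum(f[i] * y[i] for i in range(len(f)))
    # Connection cost: sum_{i,j} d_ij * x_ij
    connection = sum(d[i][j] * x[i][j]
                     for i in range(len(f)) for j in range(len(d[i])))
    return opening + connection

def is_lp_feasible(y, x, tol=1e-9):
    n_fac, n_cli = len(y), len(x[0])
    # Every client is (fractionally) connected: sum_i x_ij = 1
    for j in range(n_cli):
        if abs(sum(x[i][j] for i in range(n_fac)) - 1.0) > tol:
            return False
    for i in range(n_fac):
        # 0 <= y_i <= 1
        if not (-tol <= y[i] <= 1.0 + tol):
            return False
        # 0 <= x_ij <= y_i: clients only connect to the opened fraction
        for j in range(n_cli):
            if not (-tol <= x[i][j] <= y[i] + tol):
                return False
    return True

# Hypothetical instance: 2 facilities, 2 clients, each client sits on one facility
f = [4.0, 4.0]
d = [[0.0, 2.0], [2.0, 0.0]]
y_frac = [0.5, 0.5]
x_frac = [[0.5, 0.5], [0.5, 0.5]]
cost_frac = lp_objective(f, d, y_frac, x_frac)
```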
ALGORITHMS USING RELAXATION
Randomized Rounding
Interpret $y_i$ as the probability that facility i is opened
Open each facility i with probability $y_i$
Connect each client to the closest opened facility
PROBLEM:
- With constant probability, a client has no facility opened close to it
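A small sketch of independent randomized rounding and of the problem noted above. It assumes, purely for illustration, a client that is fractionally served by m nearby facilities with $y_i = 1/m$ each; then none of them opens with probability $(1 - 1/m)^m$, which is close to $1/e$. The function names are hypothetical.

```python
import random

def randomized_rounding(y, rng):
    # Open each facility i independently with probability y[i]
    return [i for i, yi in enumerate(y) if rng.random() < yi]

def prob_none_opened(y_near):
    # Probability that no facility in the given group is opened
    p = 1.0
    for yi in y_near:
        p *= 1.0 - yi
    return p

# Hypothetical client whose nearby facilities have total y-value 1
m = 100
y_near = [1.0 / m] * m
p_fail = prob_none_opened(y_near)  # (1 - 1/m)^m

opened = randomized_rounding(y_near, random.Random(0))
```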
Dependent Rounding
Grow and select balls
- While possible, select the ball with the smallest radius that is disjoint from the selected balls
- => Every client has a "fall back" path of length 3 times its radius
Open each facility i with probability $y_i$ subject to a facility being opened in each selected ball
Connect each client to the closest opened facility
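The ball-selection step can be sketched as follows, under my own reading of the slide: each client j comes with a ball of radius $r_j$ centered at j, and two balls are disjoint when their centers are more than the sum of their radii apart. How the radii are grown from the LP solution is not specified here, so the radii below are made-up inputs. Processing balls by increasing radius is equivalent to repeatedly picking the smallest ball disjoint from the selected ones.

```python
import math

def select_disjoint_balls(balls, dist=math.dist):
    # balls: list of (center, radius), one per client (assumed input format)
    selected = []
    # Consider balls in order of increasing radius
    for j in sorted(range(len(balls)), key=lambda j: balls[j][1]):
        cj, rj = balls[j]
        # Select the ball only if it is disjoint from every selected ball
        if all(dist(cj, balls[s][0]) > rj + balls[s][1] for s in selected):
            selected.append(j)
    return selected

# Hypothetical balls on a line
balls = [((0, 0), 1.0), ((1.5, 0), 2.0), ((10, 0), 1.5), ((11, 0), 3.0)]
selected = select_disjoint_balls(balls)
```

Under these assumptions a client j whose ball is not selected intersects a selected ball of radius at most $r_j$. Its center is then within $2 r_j$, and a facility opened inside that ball is within $3 r_j$, which matches the fall-back bound stated above.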
First constant approximation algorithm
THEOREM:
"dependent rounding" gives a 3.16-approximation algorithm
Shmoys, Tardos, Aardal'97
Impressive progress based on same LP
THEOREM:
"dependent rounding" gives a (1+2/e)-approximation algorithm
Chudak & Shmoys'99
THEOREM:
Primal-dual gives a 3-approximation algorithm, later improved to 1.6 and then to 1.52
Jain & Vazirani'01, Jain et al.'03, Mahdian et al.'02
THEOREM:
"dependent rounding" + primal-dual gives a 1.5-approximation algorithm
Byrka'07
THEOREM:
"dependent rounding" + primal-dual gives a 1.488-approximation algorithm
Li'11
ALMOST TIGHT: It is NP-hard to do better than 1.463
Guha & Khuller'99
Relation to k-Median
K-MEDIAN: same as facility location, but with the hard constraint that at most k facilities are opened (and no opening costs in the objective).
Relationship to facility location: simple economy
- If the price of opening facilities is cheap, many facilities will be opened
- If the price of opening facilities is expensive, few facilities will be opened
=> Find a price so that ≈ k facilities are opened
First exploited by Jain & Vazirani'01 to give fast and elegant approximation algorithms for k-median based on algorithms for facility location
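The pricing idea can be illustrated with a toy sketch. It is not the Jain & Vazirani algorithm: the facility location "solver" here is a brute-force search on a tiny instance where clients and candidate facilities are the same points on a line, and a plain bisection looks for a uniform opening price at which at most k facilities are opened. All names and the instance are hypothetical.

```python
from itertools import combinations

def solve_facility_location(points, price):
    # Brute force over all non-empty facility sets (tiny instances only).
    # Ties are broken in favor of fewer facilities.
    best = None
    for size in range(1, len(points) + 1):
        for S in combinations(points, size):
            connection = sum(min(abs(c - s) for s in S) for c in points)
            cost = price * size + connection
            if best is None or cost < best[0]:
                best = (cost, S)
    return best[1]

def k_median_via_price(points, k, iterations=40):
    # At price 0 every point is opened; at the upper price one facility is optimal
    lo = 0.0
    hi = float(sum(abs(c - points[0]) for c in points)) + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if len(solve_facility_location(points, mid)) > k:
            lo = mid  # too many facilities opened: raise the price
        else:
            hi = mid  # at most k facilities opened: try a lower price
    # Invariant: the solution at price hi opens at most k facilities
    return solve_facility_location(points, hi)

points = [0, 1, 10, 11, 20, 21]
S = k_median_via_price(points, 3)
```

In general the number of opened facilities can jump past k as the price changes, so bisection alone only guarantees at most k facilities. Handling that gap is where the real algorithms do additional work.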
Relaxing hard constraint for k-Median
- Difficulty is the hard constraint that we can open at most k facilities
THEOREM:
An r-pseudo-approximation algorithm that opens k+c facilities can be turned into an (r+ε)-approximation algorithm that opens k facilities and runs in time $n^{O(c/\varepsilon)}$
Li & S.'12
Together with an improved "pseudo-approximation" this gives
THEOREM:
There is a 2.73-approximation algorithm for k-Median
Li & S.'12
THEOREM:
There is a 2.67-approximation algorithm for k-Median
Byrka et al.'15
State of the Art
Problem             Approximation                            Hardness
Facility Location   1.488 [Li'11]                            1.463 [Guha & Khuller'98]
K-Center            2 [Gonzalez'85, Hochbaum & Shmoys'85]    2 [Hsu & Nemhauser'79]
K-Median            2.67 [Byrka et al.'15]                   1+2/e [Jain et al.'02]
K-Means             9 [Kanungo et al.'04]                    1.0013 [Lee, Schmidt, Wright'15]
Techniques developed transfer between the different problems
What is his problem?
Facilities have Capacities
Each potential facility i has a capacity $U_i$ that regulates how many clients the facility can accept (in the pictured example every facility has capacity 3)
State of the Art
Capacitated         Approximation                   Hardness
Facility Location   5 [Bansal, Garg, Gupta'12]      1.463 [Guha & Khuller'98]
K-Center            9 [An et al.'14]                3 [Cygan et al.'12]
K-Median            -                               1+2/e [Jain et al.'02]
K-Means             -                               1.0013 [Lee, Schmidt, Wright'15]
No "uniform" approach
Standard LP has unbounded integrality gap
APPRECIATE THE DIFFICULTY
Special case of Capacitated Facility Location
Special case: all distances are 0
INPUT: n clients, set of facilities with capacities and opening costs
GOAL: find a subset of facilities so that
1. Total capacity is at least n
2. Opening costs are minimized
This is the Minimum Knapsack Problem
Standard LP has bad integrality gap
Strengthened using knapsack-cover inequalities
Add a constraint for each subset of facilities "that we suppose to be open"
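For small demands the minimum knapsack problem can be solved exactly by dynamic programming. The sketch below is a standard 0/1 covering DP, not something taken from the slides. The instance uses the five (opening cost, capacity) pairs and the 20 clients from the knapsack-cover example, as I read them from the garbled figure.

```python
def min_knapsack(facilities, demand):
    # facilities: list of (opening_cost, capacity); demand: number of clients n
    INF = float("inf")
    # best[c] = minimum opening cost to reach total capacity >= c
    best = [0] + [INF] * demand
    for cost, cap in facilities:
        # Go through capacities downwards so each facility is used at most once
        for c in range(demand, 0, -1):
            prev = best[max(0, c - cap)]
            if prev + cost < best[c]:
                best[c] = prev + cost
    return best[demand]

# (cost, capacity) pairs as read from the example; treat them as an assumption
facilities = [(2, 8), (0, 5), (1, 3), (10, 19), (0, 2)]
optimum = min_knapsack(facilities, 20)
```

The four cheap facilities only provide capacity 18, so any integral solution for 20 clients must pay for the expensive one.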
Knapsack-Cover Inequalities (Wolsey'75)
Example: clients 1 … 20 and five facilities with (opening cost, capacity) = (€2, ≤8), (€0, ≤5), (€1, ≤3), (€10, ≤19), (€0, ≤2)
- Suppose a subset S of facilities was already included in the solution
- Among the remaining facilities we must open capacity at least the residual demand: $\sum_{i \notin S} U_i y_i \ge n - U(S)$, where $U(S) = \sum_{i \in S} U_i$
- Strengthen, since there is no need to have higher capacity than the right-hand side: $\sum_{i \notin S} \min\{U_i,\, n - U(S)\}\, y_i \ge n - U(S)$
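A sketch of how one knapsack-cover inequality cuts off a fractional solution. It assumes the truncated form $\sum_{i \notin S} \min\{U_i, n - U(S)\}\, y_i \ge n - U(S)$ and the capacities and demand of the example as I read them. The fractional vector `y_frac` is my own illustration of a solution that satisfies the plain capacity constraint.

```python
def knapsack_cover_violated(caps, y, S, demand, tol=1e-9):
    # Residual demand once the facilities in S are assumed open
    residual = demand - sum(caps[i] for i in S)
    if residual <= 0:
        return False  # S alone covers the demand; the inequality is trivial
    # Truncate each remaining capacity at the residual demand
    lhs = sum(min(caps[i], residual) * y[i]
              for i in range(len(caps)) if i not in S)
    return lhs < residual - tol

caps = [8, 5, 3, 19, 2]   # capacities from the example (assumed reading)
demand = 20
S = {0, 1, 2, 4}          # the four cheap facilities, total capacity 18

# Fractional solution: open the cheap facilities fully and 2/19 of the big one
y_frac = [1.0, 1.0, 1.0, 2.0 / 19.0, 1.0]
# Integral solution: open the big facility and one free facility
y_int = [0.0, 1.0, 0.0, 1.0, 0.0]

frac_cut_off = knapsack_cover_violated(caps, y_frac, S, demand)
int_cut_off = knapsack_cover_violated(caps, y_int, S, demand)
```

The fractional solution meets the plain capacity constraint exactly but violates the truncated inequality for this S, while the integral solution satisfies it.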
Non-Trivial to Generalize to Facility Location
- Several proposed inequalities
  - Leung and Magnanti'89, Cornuejols, Sridharan, Thizy'91, Aardal'92, Aardal, Pochet and Wolsey'93, Deng and Simchi-Levi'93
  - Many recently proved insufficient: Kolliopoulos & Moysoglou'13
- Sequence of local search algorithms that give a 5-approximation algorithm
  - Uniform capacities: Korupolu, Plaxton, Rajaraman'00, Chudak & Williamson'05, Aggarwal et al.'13
  - General capacities: Pal, Tardos, Wexler'01, Bansal, Garg, Gupta'12
Recent progress
THEOREM:
A generalization of the knapsack-cover inequalities yields a "good" LP relaxation for capacitated facility location: there is a polynomial-time rounding algorithm that finds a solution whose cost is no more than a constant times LP-OPT.
An, Singh, Svensson'14
- The constant should be improved; the current, non-optimized constant is 288
- No known large lower bound on the integrality gap
- Rich family of techniques to tap into to analyze the relaxation
- Are the techniques flexible enough to apply to related problems?
TIME TO SUMMARIZE
- Many interesting techniques developed by studying these problems
- Quite good understanding of uncapacitated problems
- Increased understanding of capacitated ones
Better algorithms for k-Median and Facility Location?
More uniform treatment of capacitated problems?
- Integrality gap of relaxation for capacitated facility location?
- Is there a "good" compact relaxation?
- Constant factor for capacitated k-Median?
What about k-Means?