Composable Core-sets for Diversity and Coverage Maximization (PowerPoint presentation)

slide-1
SLIDE 1

Composable Core-sets for Diversity and Coverage Maximization

Piotr Indyk (MIT), Sepideh Mahabadi (MIT), Mohammad Mahdian (Google), Vahab S. Mirrokni (Google)

slide-5
SLIDE 5

Core-Set Definition

  • Setup

– Set of $n$ points $P$ in $d$-dimensional space
– Optimize a function $f$

  • $c$-Core-set: Small subset of points $S \subset P$ which suffices to $c$-approximate the optimal solution

  • Maximization: $\frac{f_{opt}(P)}{c} \le f_{opt}(S) \le f_{opt}(P)$

  • Example

– Optimization Function: Distance of the two farthest points
– 1-Core-set: Points on the convex hull.
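
The convex hull example can be checked numerically. The following is a minimal sketch, assuming points in the plane given as (x, y) tuples; the helper names `convex_hull` and `diameter` are illustrative and do not come from the slides. It builds the hull with Andrew's monotone chain and compares the farthest-pair distance on the hull with that on the full set.

```python
from itertools import combinations
from math import dist


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices via Andrew's monotone chain (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def diameter(points):
    """Distance of the two farthest points (brute force over all pairs)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 1)]
core = convex_hull(points)  # the 1-core-set for this objective
print(len(core), diameter(core), diameter(points))
```

Because the two farthest points of a planar set are always hull vertices, restricting to the hull loses nothing for this objective, which is what makes it a 1-core-set.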

slide-10
SLIDE 10

Composable Core-sets

  • Setup

– $P_1, P_2, \ldots, P_m$ are sets of points in $d$-dimensional space
– Optimize a function $f$ over their union $P$.

  • $c$-Composable Core-sets: Subsets of points $S_1 \subset P_1, S_2 \subset P_2, \ldots, S_m \subset P_m$ such that the solution of the union of the core-sets approximates the solution of the point sets.
  • Maximization: $\frac{1}{c} f_{opt}(P_1 \cup \cdots \cup P_m) \le f_{opt}(S_1 \cup \cdots \cup S_m) \le f_{opt}(P_1 \cup \cdots \cup P_m)$
  • Example: two farthest points
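
A tiny illustration of composability for the two-farthest-points objective, under the simplifying assumption (not made in the slides) that the points lie on a line. In that case the minimum and maximum of each part form a core-set, and the union of these per-part core-sets preserves the answer exactly. The names `coreset_1d` and `farthest_pair_distance` are hypothetical.

```python
def coreset_1d(part):
    """Extremes of one part: enough to recover the farthest pair on a line."""
    return [min(part), max(part)]


def farthest_pair_distance(values):
    """Distance of the two farthest points among 1-D values."""
    return max(values) - min(values)


# Three parts, e.g. held by three different machines.
parts = [[3.0, 7.5, 5.0], [-2.0, 1.0], [4.0, 9.0, 8.5, 6.0]]

# Each part is summarized independently; only the summaries are combined.
union_of_coresets = [x for part in parts for x in coreset_1d(part)]
all_points = [x for part in parts for x in part]

print(farthest_pair_distance(union_of_coresets), farthest_pair_distance(all_points))
```

The key property is that each summary is computed without looking at the other parts, yet the combined summaries still answer the question for the whole union.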
slide-13
SLIDE 13

Applications – Streaming Computation

  • Streaming Computation:

– Processing a sequence of $n$ data elements "on the fly"
– Limited storage

  • $c$-Composable Core-set of size $k$

– Chunks of size $\sqrt{nk}$, thus number of chunks = $\sqrt{n/k}$
– Core-set for each chunk
– Total Space: $k\sqrt{n/k} + \sqrt{nk} = O(\sqrt{nk})$
– Approximation Factor: $c$
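
The space accounting can be worked through with concrete numbers. This is a small sketch, assuming integer arithmetic and a hypothetical helper `streaming_budget`; it only reproduces the arithmetic of the chunking scheme, not a streaming algorithm itself.

```python
from math import isqrt


def streaming_budget(n, k):
    """Chunk size, number of chunks, and total stored points for the chunked scheme."""
    chunk_size = isqrt(n * k)           # about sqrt(n*k) points buffered at a time
    num_chunks = -(-n // chunk_size)    # ceil(n / chunk_size), about sqrt(n/k)
    stored_coresets = k * num_chunks    # one size-k core-set kept per chunk
    return chunk_size, num_chunks, stored_coresets + chunk_size


chunk_size, num_chunks, total = streaming_budget(n=1_000_000, k=100)
print(chunk_size, num_chunks, total)
```

With a million elements and core-sets of size 100, the buffer holds 10,000 points, there are 100 chunks, and the stored core-sets add another 10,000 points, so the total is twice $\sqrt{nk}$ rather than $n$.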

slide-15
SLIDE 15

Applications – Distributed Systems

  • Streaming Computation
  • Distributed System:

– Each machine holds a block of data.
– A composable core-set is computed and sent to the server

  • Map-Reduce Model:

– One round of Map-Reduce
– $\sqrt{n/k}$ mappers each getting $\sqrt{nk}$ points
– Mapper computes a composable core-set of size $k$
– Will be passed to a single reducer
slide-19
SLIDE 19

Applications – Similarity Search

  • Streaming Computation
  • Distributed System
  • Similarity Search: Small output size
  • Good to have result from each cluster: relevant and diverse
  • Diverse Near Neighbor Problem [Abbar, Amer-Yahia, Indyk, Mahabadi, WWW'13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG'13]

– uses Locality Sensitive Hashing (LSH) and Composable Core-sets techniques.

slide-23
SLIDE 23

Diversity Maximization Problem

  • A set of $n$ points $P$ in metric space $(\Delta, dist)$
  • Optimization Problem:

– Find a subset of $k$ points $S$ which maximizes Diversity

  • Diversity:

– Minimum pairwise distance (Remote Edge)
– Sum of Pairwise distances (Remote Clique)

  • Long list of variants [Chandra and Halldorsson '01]

(Figure: example instance with k = 4, n = 6)

slide-24
SLIDE 24

Diversity Functions

Diversity function over a set $S$ of $k$ points | Description
Remote-edge | Minimum Pairwise Distance: $\min_{p, q \in S,\, p \ne q} dist(p, q)$
Remote-clique | Sum of Pairwise Distances: $\sum_{p, q \in S} dist(p, q)$
Remote-tree | Weight of Minimum Spanning Tree (MST) of the set $S$
Remote-cycle | Weight of minimum Traveling Salesman Tour (TSP) of the set $S$
Remote-star | Weight of minimum star: $\min_{p \in S} \sum_{q \in S} dist(p, q)$
Remote-Pseudoforest | Sum of the distance of each point to its nearest neighbor: $\sum_{p \in S} \min_{q \in S,\, q \ne p} dist(p, q)$
Remote-Matching | Weight of minimum perfect Matching of the set $S$
Max-Coverage | How well the points cover each coordinate: $\sum_{i=1}^{d} \max_{p \in S} p_i$
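
Several of these measures are short enough to write down directly. The sketch below assumes Euclidean points given as tuples and counts each unordered pair once in the sum of pairwise distances (the slides do not say whether pairs are ordered). Function names are illustrative only; remote-cycle and remote-matching are omitted because they need a TSP or matching solver.

```python
from itertools import combinations
from math import dist


def remote_edge(S):
    """Minimum pairwise distance."""
    return min(dist(p, q) for p, q in combinations(S, 2))


def remote_clique(S):
    """Sum of pairwise distances, each unordered pair counted once."""
    return sum(dist(p, q) for p, q in combinations(S, 2))


def remote_star(S):
    """Weight of the cheapest star: best center's total distance to the rest."""
    return min(sum(dist(p, q) for q in S) for p in S)


def remote_pseudoforest(S):
    """Sum over points of the distance to the nearest other point."""
    n = len(S)
    return sum(min(dist(S[i], S[j]) for j in range(n) if j != i) for i in range(n))


def remote_tree(S):
    """Weight of a minimum spanning tree, computed with Prim's algorithm."""
    best = {i: dist(S[0], S[i]) for i in range(1, len(S))}
    total = 0.0
    while best:
        i = min(best, key=best.get)      # closest point not yet in the tree
        total += best.pop(i)
        for j in best:                   # relax distances through the new point
            best[j] = min(best[j], dist(S[i], S[j]))
    return total


S = [(0, 0), (3, 0), (3, 4), (0, 4)]     # corners of a 3-by-4 rectangle
print(remote_edge(S), remote_clique(S), remote_star(S),
      remote_pseudoforest(S), remote_tree(S))
```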

slide-25
SLIDE 25

Our Results

Diversity function | Offline approx. factor | Composable core-set approx. factor [Our Results]
Remote-edge (Minimum Pairwise Distance) | $O(1)$ [Tamir 91] [White 91] [Ravi et al 94] | $O(1)$
Remote-clique (Sum of Pairwise Distances) | $O(1)$ [Hassin et al 97] | $O(1)$
Remote-tree (Weight of MST) | $O(1)$ [Halldorsson et al 99] | $O(1)$
Remote-cycle (Weight of minimum TSP) | $O(1)$ [Halldorsson et al 99] | $O(1)$
Remote-star (Weight of minimum star) | $O(1)$ [Chandra & Halldorsson 01] | $O(1)$
Remote-Pseudoforest (Sum of the distance of each point to its nearest neighbor) | $O(\log k)$ [Chandra & Halldorsson 01] | $O(\log k)$
Remote-Matching (Weight of minimum perfect Matching) | $O(\log k)$ [Chandra & Halldorsson 01] | $O(\log k)$
Max-Coverage (How well the points cover each coordinate, $\sum_{i=1}^{d} \max_{p \in S} p_i$) | $O(1)$ [Feige 98] | No composable core-set of poly size in $k$ with approx. factor $\sqrt{k} / \log k$

slide-26
SLIDE 26

Review of Offline Algorithms

  • We have a set of $n$ points $P$
  • Goal: find a subset $S$ of size $k$ which maximizes the diversity

slide-29
SLIDE 29

The Greedy Algorithm

  • Used for minimum-pairwise distance
  • Greedy Algorithm [Ravi, Rosenkrantz, Tayi] [Gonzalez]

– Choose an arbitrary point
– Repeat k-1 times

» Add the point whose minimum distance to the currently chosen points is maximized

  • Remote-edge: computes a 2-approximate set
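
The greedy steps above can be sketched as follows. This is an illustrative version, assuming Euclidean points as tuples, `k` no larger than the number of points, and the first point as the arbitrary starting choice; `greedy_remote_edge` is a name chosen here.

```python
from itertools import combinations
from math import dist


def greedy_remote_edge(points, k, start=0):
    """Farthest-point greedy: repeatedly add the point farthest from the chosen set."""
    chosen = [points[start]]
    # nearest[i] = distance from points[i] to its closest chosen point
    nearest = [dist(p, chosen[0]) for p in points]
    for _ in range(k - 1):
        i = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(points[i])
        nearest = [min(d, dist(p, points[i])) for d, p in zip(nearest, points)]
    return chosen


def remote_edge(S):
    """Minimum pairwise distance of a set."""
    return min(dist(p, q) for p, q in combinations(S, 2))


points = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 8), (5, 7)]
picked = greedy_remote_edge(points, k=3)
print(picked, remote_edge(picked))
```

Maintaining the `nearest` array keeps each round linear in the number of points, so the whole run costs on the order of $nk$ distance evaluations.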

slide-35
SLIDE 35

Local Search Algorithm

  • Used for sum of pairwise distances
  • Algorithm [Abbassi, Mirrokni, Thakur]

– Initialize $S$ with an arbitrary set of $k$ points which contains the two farthest points
– While there exists a swap that improves diversity by a factor of $1 + \frac{\epsilon}{n}$

» Perform the swap

  • For Remote-Clique

– Number of rounds: $\log_{1+\frac{\epsilon}{n}} k^2 = O(\frac{n}{\epsilon} \log k)$
– Approximation factor is constant.
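
A compact sketch of this local search, assuming Euclidean points as tuples, $2 \le k \le n$, and a default `eps` picked here for illustration. It recomputes the objective from scratch for every candidate swap, which keeps the code short but is slower than an incremental update.

```python
from itertools import combinations
from math import dist


def remote_clique(S):
    """Sum of pairwise distances, each unordered pair counted once."""
    return sum(dist(p, q) for p, q in combinations(S, 2))


def local_search_remote_clique(points, k, eps=0.1):
    n = len(points)
    # Start from a set that contains the two farthest points, padded arbitrarily.
    a, b = max(combinations(range(n), 2),
               key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    current = [a, b] + [i for i in range(n) if i not in (a, b)][: k - 2]
    value = remote_clique([points[i] for i in current])
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in current:
                    continue
                trial = current[:pos] + [cand] + current[pos + 1:]
                trial_value = remote_clique([points[i] for i in trial])
                # Accept only swaps that improve by a factor of at least 1 + eps/n.
                if trial_value > (1 + eps / n) * value:
                    current, value, improved = trial, trial_value, True
                    break
            if improved:
                break
    return [points[i] for i in current]


points = [(0, 0), (1, 0), (0, 1), (10, 0), (10, 10), (0, 10)]
chosen = local_search_remote_clique(points, k=3)
print(chosen, remote_clique(chosen))
```

The multiplicative threshold is what bounds the number of rounds: every accepted swap grows the objective by a fixed factor, and the objective can grow by at most a factor of about $k^2$ from a start that already contains the farthest pair.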

slide-36
SLIDE 36

Composable Core-sets

  • Greedy Algorithm computes a 3-composable core-set for minimum pairwise distance
  • Local Search Algorithm computes a constant factor composable core-set for sum of pairwise distances.
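
To see how the greedy core-sets compose for minimum pairwise distance, the sketch below runs the farthest-point greedy on each part, then runs it again on the union of the per-part outputs. The partition, the value of `k`, and all names are made up for illustration. Combining the 3-composable core-set guarantee with the 2-approximation of the final greedy run gives an overall factor of at most 6.

```python
from itertools import combinations
from math import dist


def remote_edge(S):
    """Minimum pairwise distance of a set."""
    return min(dist(p, q) for p, q in combinations(S, 2))


def greedy(points, k):
    """Farthest-point greedy; returns at most k points."""
    chosen = [points[0]]
    nearest = [dist(p, chosen[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        i = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(points[i])
        nearest = [min(d, dist(p, points[i])) for d, p in zip(nearest, points)]
    return chosen


k = 3
parts = [
    [(0, 0), (1, 1), (2, 0), (1, 5)],
    [(10, 0), (11, 1), (12, 0), (11, 6)],
    [(5, 10), (6, 11), (5, 12), (0, 11)],
]

coresets = [greedy(part, k) for part in parts]   # computed independently per part
union = [p for cs in coresets for p in cs]       # what a single reducer would see
solution = greedy(union, k)                      # final answer from core-sets only
print(solution, remote_edge(solution))
```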

slide-47
SLIDE 47

Proof Idea

Let $P_1, \cdots, P_m$ be the sets of points, $P = \bigcup_i P_i$
Let $S_1, \cdots, S_m$ be their core-sets, $S = \bigcup_i S_i$
Goal: $div_k(S) \ge div_k(P) / c$
Let $OPT = \{o_1, \cdots, o_k\}$ be the optimal solution
Goal: $div_k(S) \ge div(OPT) / c$
Let $r$ be their maximum diversity, $r = \max_i div(S_i)$. Note: $div_k(S) \ge r$

Case 1: one of the $S_i$ has diversity as good as the optimum: $r \ge O(div(OPT))$
Case 2: $r \le O(div(OPT))$

  • Find a one-to-one mapping $\mu$ from $OPT = \{o_1, \cdots, o_k\}$ to $S = S_1 \cup \cdots \cup S_m$ s.t. $dist(o_i, \mu(o_i)) \le O(r)$
  • Replacing $o_i$ with $\mu(o_i)$ still has large diversity
  • $div(\{\mu(o_i)\})$ is approximately as good as $div(\{o_i\})$
  • The actual mapping $\mu$ depends on the specific diversity measure we are considering.
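
For the minimum pairwise distance measure, the replacement step in Case 2 can be made concrete with the triangle inequality. The following is a sketch, assuming the mapping moves each optimal point by at most $a \cdot r$ for some constant $a$ (the symbol $a$ is introduced here for illustration and is not from the slides):

```latex
\begin{align*}
dist(\mu(o_i), \mu(o_j))
  &\ge dist(o_i, o_j) - dist(o_i, \mu(o_i)) - dist(o_j, \mu(o_j)) \\
  &\ge div(OPT) - 2ar .
\end{align*}
```

So whenever $r \le div(OPT) / (4a)$, the mapped points are pairwise at distance at least $div(OPT)/2$, and they all lie in $S$.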
slide-50
SLIDE 50

Maximum k-Coverage

  • A set of $n$ points $P$ in $d$-dimensional space
  • Each dimension corresponds to a feature.
  • Goal: choose a set of $k$ points $S$ in $P$ which maximizes the total coverage:

– $cov(S) = \sum_{i=1}^{d} \max_{s \in S} s_i$

  • Special case, Hamming space:

– A collection of $n$ sets $P$
– Over the universe $U = \{1, \ldots, d\}$
– Goal: choose $k$ sets $S = \{S_1, \ldots, S_k\}$ in $P$ whose union is maximized.

  • Theorem: for any $\alpha < \frac{\sqrt{k}}{\log k}$ and any constant $\beta > 1$, there is no $\alpha$-composable core-set of size $k^\beta$
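
The coverage objective and its set-union special case fit in a few lines. This is a minimal sketch with integer feature values chosen for illustration; `coverage` and `union_size` are hypothetical names.

```python
def coverage(S):
    """cov(S): for each coordinate take the largest value among chosen points, then sum."""
    return sum(max(column) for column in zip(*S))


def union_size(sets, d):
    """Hamming-space special case: encode each set as a 0/1 vector over {1, ..., d}."""
    vectors = [[1 if j in s else 0 for j in range(1, d + 1)] for s in sets]
    return coverage(vectors)


S = [(2, 9, 0), (7, 1, 0), (5, 5, 4)]
print(coverage(S))                                 # 7 + 9 + 4
print(union_size([{1, 2}, {2, 3}, {5}], d=5))      # elements 1, 2, 3, 5
```

With 0/1 vectors the per-coordinate maximum is 1 exactly when some chosen set contains that element, so the sum counts the size of the union.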

slide-57
SLIDE 57

Proof Idea

Build a set of instances $P_1, \cdots, P_{O(k)}$; let $U = \{1, \cdots, O(k^4)\}$

  • Let $V_i$ be a subset of size $k$ of $U$
  • $P_i$ is a collection of subsets of size $\sqrt{k}$ from $V_i$
  • $P_i$ has cardinality $\binom{k}{\sqrt{k}}$

We show there exist $V_1, \cdots, V_{O(k)}$ such that

– $V_i \setminus V_1$ has size $\sqrt{k}$
– $V_i \setminus V_1$ and $V_j \setminus V_1$ are disjoint for $i \ne j$

  • Using $k$ sets everything in $\bigcup_i V_i$ can be covered, that is $O(k^{3/2})$ elements.
  • Using core-sets only $|V_1| + k \log k = O(k \log k)$ can be covered
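
Dividing the two bounds above, with constants ignored, shows the size of the gap between the true optimum and what can be recovered from the core-sets:

```latex
\[
\frac{k^{3/2}}{k \log k} \;=\; \frac{\sqrt{k}}{\log k} .
\]
```

This ratio grows with $k$, so no constant approximation factor is possible from core-sets in this construction.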

slide-63
SLIDE 63

Conclusion

  • Applications of composable core-sets
  • We showed construction of composable core-sets for a wide range of diversity measures
  • We showed non-existence of core-sets of polynomial size in $k$ for maximum coverage
  • Open Problems

– Are there any other applications of composable core-sets?
– Is there a general characterization of measures for which composable core-sets exist?
– Better approximation factors?

slide-64
SLIDE 64

Thank You!

Questions?