SLIDE 1 MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Andrea Pietracaprina
- Dip. Ingegneria dell’Informazione, Università di Padova
Joint work with:
- M. Ceccarello, G. Pucci (U. Padova), and E. Upfal (Brown U.)
VLDB’17
SLIDE 2 Outline
◮ Problem definition and applications
◮ Background
◮ Summary of results
◮ Our approach:
  ◮ Core-set construction
  ◮ Streaming implementation
  ◮ MapReduce implementation
◮ Experiments
◮ Diversity maximization under matroid constraints
◮ Summary and future work
SLIDE 3
Problem definition and applications
SLIDE 5 Diversity maximization
Objective: for a given dataset, determine the most diverse subset of given (small) size k
SLIDE 9 Applications
Aggregator websites (e.g., Google News)
◮ Documents (possibly clustered into categories)
◮ Diversified set of representative docs (from various clusters)
E-commerce
◮ Consideration set: products returned by a shopping portal in reply to a user query
◮ Diversity of returned products (w.r.t. unspecified attributes) correlates with user satisfaction
Facility location
◮ Franchise location (noncompetition)
◮ Strategic facilities (dispersion against simultaneous attacks)
SLIDE 13 Diversity maximization: formal definition
Given:
- 1. Set S of n points in a metric space ∆
- 2. Distance function d : ∆ × ∆ → R+ ∪ {0}
- 3. Integer k > 1
- 4. (Distance-based) diversity function div : $2^{\Delta}$ → R+ ∪ {0}
Return
$S^* \subset S$, $|S^*| = k$, such that $S^* = \arg\max_{S' \subseteq S,\ |S'| = k} \mathrm{div}(S')$
The k-diversity of S is $\mathrm{div}_k(S) = \mathrm{div}(S^*)$
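To make the definition concrete, the sketch below computes $S^*$ and $\mathrm{div}_k(S)$ by exhaustive search over all k-subsets. It assumes points are coordinate tuples under Euclidean distance (any metric d would do), and the helper names are ours, chosen for illustration. The search is exponential in k, which is exactly why approximation and core-sets matter for large S.

```python
from itertools import combinations
from math import dist


def remote_edge(points):
    """Remote-edge diversity: minimum pairwise distance."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def remote_clique(points):
    """Remote-clique diversity: sum of pairwise distances."""
    return sum(dist(p, q) for p, q in combinations(points, 2))


def k_diversity(S, k, div):
    """Exhaustive argmax of div over all k-subsets of S (tiny inputs only)."""
    best = max(combinations(S, k), key=div)
    return list(best), div(best)


if __name__ == "__main__":
    S = [(0, 0), (1, 0), (0, 1), (5, 5), (10, 0)]
    s_star, value = k_diversity(S, 3, remote_edge)
    print(s_star, value)
```

On this toy input the three points (0, 0), (5, 5), (10, 0) are returned, with remote-edge value √50: the near-duplicates (1, 0) and (0, 1) never help.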
SLIDE 19 Diversity functions studied in this work
Problem              div(S′)
Remote-edge          $\min_{p,q \in S',\ p \neq q} d(p,q)$
Remote-clique        $\sum_{\{p,q\} \subseteq S'} d(p,q)$
Remote-star          w(MSTAR(S′))
Remote-bipartition   w(MBIP(S′))
Remote-tree          w(MST(S′))
Remote-cycle         w(TSP(S′))
◮ G(S′): complete graph over S′ with edge weights given by d; w(·) is the total weight of the edges of a subgraph of G(S′)
◮ MSTAR(S′): min-weight star in G(S′)
◮ MBIP(S′): min-weight balanced bipartition of S′ in G(S′)
◮ MST(S′): min-weight spanning tree of G(S′)
◮ TSP(S′): min-weight tour of S′ in G(S′)
Except for remote-clique, all problems are max-min optimizations.
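A minimal sketch of the six diversity functions, assuming Euclidean points as tuples and reading the bipartition weight as the total weight of edges crossing a balanced cut (|Q| = ⌊|S′|/2⌋). Function names are ours. MBIP and TSP are solved by brute force, so this is only for checking small examples, not for use inside a scalable algorithm.

```python
from itertools import combinations, permutations
from math import dist


def remote_edge(P):
    return min(dist(p, q) for p, q in combinations(P, 2))


def remote_clique(P):
    return sum(dist(p, q) for p, q in combinations(P, 2))


def remote_star(P):
    # min-weight star: best center c, with edges from c to every other point
    return min(sum(dist(c, q) for q in P) for c in P)


def remote_bipartition(P):
    # min over balanced cuts (Q, P \ Q), |Q| = floor(|P|/2), of the crossing weight
    idx = range(len(P))
    best = float("inf")
    for Q in combinations(idx, len(P) // 2):
        Qs = set(Q)
        w = sum(dist(P[i], P[j]) for i in Qs for j in idx if j not in Qs)
        best = min(best, w)
    return best


def remote_tree(P):
    # w(MST) via Prim's algorithm on the complete graph G(P)
    best = {j: dist(P[0], P[j]) for j in range(1, len(P))}
    total = 0.0
    while best:
        j = min(best, key=best.get)      # closest point not yet in the tree
        total += best.pop(j)
        for i in best:                   # relax distances through the new tree vertex
            best[i] = min(best[i], dist(P[j], P[i]))
    return total


def remote_cycle(P):
    # exact TSP by brute force over tours starting and ending at P[0]
    first, rest = P[0], P[1:]
    best = float("inf")
    for perm in permutations(rest):
        tour = (first, *perm, first)
        best = min(best, sum(dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1)))
    return best
```

On the unit square [(0, 0), (1, 0), (1, 1), (0, 1)] these give 1, 4 + 2√2, 2 + √2, 4, 3 and 4 respectively; the balanced cut of weight 4 pairs opposite corners on the same side.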
SLIDE 20
Background
SLIDE 21 Previous work
Sequential approximation and hardness results
Problem              Approx. ratio   LB
Remote-edge          2               ≥ 2
Remote-clique        2               ≥ 2 − ε
Remote-star          2               –
Remote-bipartition   3               –
Remote-tree          4               ≥ 2
Remote-cycle         3               ≥ 2
Specialized results (hardness and better approximation ratios) exist for remote-clique and remote-edge under Euclidean distances.
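One well-known algorithm achieving ratio 2 for remote-edge is farthest-first traversal: start from an arbitrary point and repeatedly add the point farthest from those already chosen. The sketch below assumes Euclidean tuples; the function name is ours, and we do not claim this is the specific algorithm behind every entry of the table. The same traversal reappears later as Gonzalez's k′-center heuristic.

```python
from math import dist


def greedy_remote_edge(S, k):
    """Farthest-first traversal: a 2-approximation for remote-edge."""
    chosen = [S[0]]                          # arbitrary first point
    d_near = [dist(p, S[0]) for p in S]      # distance of each point to nearest chosen point
    while len(chosen) < k:
        i = max(range(len(S)), key=d_near.__getitem__)   # farthest remaining point
        chosen.append(S[i])
        d_near = [min(d_near[j], dist(S[j], S[i])) for j in range(len(S))]
    return chosen
```

Each step costs O(n) distance evaluations, so k points take O(nk) time, far cheaper than the exhaustive search.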
SLIDE 26 Previous work: β-core-set [Agarwal et al.’04]
◮ (Small) subset T ⊆ S such that $\mathrm{div}_k(T) \ge (1/\beta)\,\mathrm{div}_k(S)$
◮ T filters out redundancy
◮ An approximate solution can be computed on T
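The smallest β for which a given T is a β-core-set is simply the ratio $\mathrm{div}_k(S)/\mathrm{div}_k(T)$. The sketch below computes it by brute force for remote-edge, assuming Euclidean tuples; the helper names are hypothetical and the approach only scales to toy inputs.

```python
from itertools import combinations
from math import dist


def remote_edge(P):
    return min(dist(p, q) for p, q in combinations(P, 2))


def div_k(S, k, div=remote_edge):
    """k-diversity by exhaustive search over k-subsets."""
    return max(div(c) for c in combinations(S, k))


def core_set_ratio(S, T, k, div=remote_edge):
    """Smallest beta such that T is a beta-core-set of S: div_k(S) / div_k(T)."""
    return div_k(S, k, div) / div_k(T, k, div)


if __name__ == "__main__":
    # three clusters of near-duplicates; T keeps one point per cluster
    S = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0), (10, 0.1)]
    print(core_set_ratio(S, [(0, 0), (5, 5), (10, 0)], 3))
    print(core_set_ratio(S, [(0.1, 0), (5, 5), (10, 0.1)], 3))
```

Keeping one representative per cluster filters out the redundancy: the first T loses nothing (ratio 1), and even a less lucky choice of representatives stays within about 1% of the optimum.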
SLIDE 31 Previous work: β-composable core-set [Indyk et al.’14]
◮ Input partition S = S1 ∪ S2 ∪ · · · ∪ Sℓ
◮ β-composable core-sets Ti ⊂ Si ⇒ ∪Ti is a β-core-set
◮ Application to MapReduce and Streaming frameworks
SLIDE 35 Previous work: known β-composable core-sets for k-diversity maximization ([Indyk et al.’14, Aghamolaei et al.’15])
Problem              β        αseq   β · αseq
Remote-edge          3        2      6
Remote-clique        6 + ε    2      12 + ε
Remote-star          12       2      24
Remote-bipartition   18       3      54
Remote-tree          4        4      16
Remote-cycle         3        3      9
◮ αseq = best sequential approximation ratio
◮ General metric spaces
◮ Core-set size: k
SLIDE 39 Computational Frameworks for Massive Data Analysis
MapReduce
◮ Targets distributed cluster-based architectures
◮ Computation: sequence of rounds where data are partitioned into (small) subsets, processed in parallel
◮ Goals: few rounds, sublinear local space, linear overall space
Streaming
◮ Data provided as a continuous stream and processed using limited local space (much smaller than input size)
◮ Multiple passes over data may be allowed
◮ Goals: few passes, sublinear local space
Fact: Known composable core-sets for k-diversity maximization yield 1-pass/2-round Streaming/MapReduce algorithms with $O(\sqrt{kn})$ local space
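For the 2-round MapReduce case, the $O(\sqrt{kn})$ bound follows from balancing the two rounds. A short derivation, assuming an equal-size partition into ℓ parts and core-sets of size k (the symbol $M(\ell)$ for local space is ours):

```latex
% Round 1: \ell reducers, each holding n/\ell points and emitting a core-set of k points.
% Round 2: one reducer holding the union of the \ell core-sets, i.e. \ell k points.
M(\ell) = \max\Bigl\{\frac{n}{\ell},\; \ell k\Bigr\},
\qquad
\frac{n}{\ell} = \ell k \iff \ell = \sqrt{n/k}
\;\Longrightarrow\;
M\bigl(\sqrt{n/k}\bigr) = \sqrt{kn}.
```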
SLIDE 40 Doubling space
Doubling space: metric space such that any ball of radius r is covered by ≤ $2^D$ balls of radius r/2 (⇒ $(1/\varepsilon)^D$ balls of radius $\varepsilon r$), for some D = O(1), the doubling dimension.
E.g.:
◮ Euclidean spaces of constant dimension (under ℓ1 or ℓ2 norm)
◮ Shortest-path distances of mildly expanding topologies
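The $(1/\varepsilon)^D$ count follows by applying the doubling property repeatedly. A short derivation, assuming $\varepsilon = 2^{-j}$ for an integer $j \ge 1$ (for other ε, rounding j up costs at most an extra factor $2^D$, still a constant):

```latex
% Start from one ball of radius r; each application of the doubling property
% replaces every ball by at most 2^D balls of half the radius.
\#\{\text{balls of radius } r/2^{j}\} \;\le\; (2^{D})^{j} = (2^{j})^{D}.
% With \varepsilon = 2^{-j}, i.e. j = \log_2(1/\varepsilon):
\#\{\text{balls of radius } \varepsilon r\} \;\le\; (1/\varepsilon)^{D}.
```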
SLIDE 41
Summary of Results
SLIDE 46 Summary of results
For doubling spaces of dimension D, all div's, and ε > 0:
◮ (1 + ε)-(composable) core-sets
◮ 1-pass Streaming and 2-round MapReduce (MR) algorithms
  – approximation ratio: αseq + ε
  – local space requirements:
                 Streaming              MR (det.)                        MR (rand.)
  r-edge/cycle   $O(k/\varepsilon^D)$     $O(\sqrt{kn/\varepsilon^D})$     –
  other div's    $O(k^2/\varepsilon^D)$   $O(k\sqrt{n/\varepsilon^D})$     $O(\sqrt{kn\log n/\varepsilon^D})$
◮ One extra pass/round brings the deterministic space bounds for the other div's down to those for r-edge/cycle
SLIDE 47
Our approach
SLIDE 52 Core-set construction: algorithm
Input dataset: S. Optimal solution: OPT ⊂ S, with |OPT| = k.
MAIN IDEA: compute a core-set T such that each o ∈ OPT has a (distinct) proxy p(o) ∈ T with “small” d(o, p(o))
- 1. Partition S into k′ > k clusters of small radius
- 2. T = {cluster centers}
- 3. If injectivity of p(·) is required: T = {cluster centers} ∪ {f(k) ≤ k − 1 delegates per cluster}
SLIDE 53 Core-set construction: algorithm
◮ k = 3, k′ = 8
SLIDE 54 Core-set construction: algorithm
◮ Compute k′-center clustering
SLIDE 55 Core-set construction: algorithm
◮ No injectivity required: T = {cluster centers} (|T| = k′)
SLIDE 56 Core-set construction: algorithm
◮ Injectivity required: T = {1 center + 2 delegates per cluster}
SLIDE 59 Core-set construction: analysis
Setting for the analysis:
◮ S from a doubling space of dimension D
◮ Focus on remote-clique (similar for other div's)
◮ Fix ε ∈ (0, 1/2)
Theorem
If $k' = (16/\varepsilon)^D k$ and k − 1 delegates per cluster are taken, T is a (1 + ε)-core-set for S, of size at most $k' k = (16/\varepsilon)^D k^2$.
SLIDE 63 Core-set construction: analysis
Proof.
◮ Let $\rho = \mathrm{div}(OPT)/\binom{k}{2}$
Claim: The radius of an optimal k′-clustering of S is $r_{k'} \le (\varepsilon/8)\rho$
◮ ∃ an injective p(·) such that for each o ∈ OPT, p(o) ∈ T and $d(o, p(o)) \le 2 r_{k'} \le (\varepsilon/4)\rho$
◮ Hence:
$\mathrm{div}_k(T) \ge \sum_{\{o_1,o_2\} \subseteq OPT} d(p(o_1), p(o_2))$ (injectivity!) $\ge \mathrm{div}_k(S)/(1+\varepsilon)$ (claim + triangle inequality)
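The last inequality can be expanded as follows, taking the claim and the bound $d(o, p(o)) \le (\varepsilon/4)\rho$ as given. This is a worked filling-in of the triangle-inequality step, not a reproduction of the full proof:

```latex
% For each pair o_1, o_2 in OPT (triangle inequality, then the proxy bound):
d(p(o_1),p(o_2)) \;\ge\; d(o_1,o_2) - d(o_1,p(o_1)) - d(o_2,p(o_2))
                 \;\ge\; d(o_1,o_2) - \tfrac{\varepsilon}{2}\rho .
% Sum over the \binom{k}{2} pairs; p(OPT) is a k-subset of T by injectivity:
\mathrm{div}_k(T) \;\ge\; \sum_{\{o_1,o_2\}\subseteq OPT} d(p(o_1),p(o_2))
  \;\ge\; \mathrm{div}(OPT) - \binom{k}{2}\frac{\varepsilon}{2}\rho
  \;=\; \Bigl(1-\frac{\varepsilon}{2}\Bigr)\mathrm{div}_k(S).
% Since (1-\varepsilon/2)(1+\varepsilon) = 1 + \varepsilon(1-\varepsilon)/2 \ge 1 for \varepsilon \in (0,1/2):
\mathrm{div}_k(T) \;\ge\; \frac{\mathrm{div}_k(S)}{1+\varepsilon}.
```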
SLIDE 69 Core-set construction: composability
Composability
◮ Input partition S = S1 ∪ S2 ∪ · · · ∪ Sℓ
◮ Extract a core-set Ti ⊂ Si as before
◮ ∪Ti is a (1 + ε)-core-set of size at most $\ell k' k$ ⇒ the Ti's are (1 + ε)-composable core-sets
Saving space
◮ Initial random partition
◮ Build each Ti picking $\Theta(\max\{k/\ell, \log n\})$ delegates per cluster (rather than k − 1)
◮ ∪Ti is a (1 + ε)-core-set of size $O(\ell k' \max\{k/\ell, \log n\}) = O(k' \max\{k, \ell \log n\})$, w.h.p.
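A minimal sketch of the space-saving variant, assuming Euclidean tuples. The Θ(·) is replaced by `const * max(k/ell, ln n)` with a hypothetical constant `const`; capping the delegate count at k − 1 and taking the first members of each cluster as delegates are our own choices. Function names are illustrative only.

```python
import math
import random
from math import dist


def gonzalez(P, kp):
    """Farthest-first traversal: indices of at most kp centers of P."""
    centers = [0]
    d_near = [dist(p, P[0]) for p in P]
    while len(centers) < kp:
        i = max(range(len(P)), key=d_near.__getitem__)
        if d_near[i] == 0:
            break
        centers.append(i)
        d_near = [min(d_near[j], dist(P[j], P[i])) for j in range(len(P))]
    return centers


def local_core_set(P, kp, m):
    """Core-set of one part: each of the k' centers plus up to m delegates."""
    centers = gonzalez(P, kp)
    center_set = set(centers)
    members = {c: [] for c in centers}
    for j, p in enumerate(P):
        if j not in center_set:
            members[min(centers, key=lambda c: dist(p, P[c]))].append(j)
    return [P[c] for c in centers] + [P[j] for c in centers for j in members[c][:m]]


def composable_core_set(S, k, kp, ell, const=1.0, seed=0):
    """Random partition into ell parts; union of the local core-sets."""
    rng = random.Random(seed)
    parts = [[] for _ in range(ell)]
    for p in S:
        parts[rng.randrange(ell)].append(p)          # initial random partition
    m = min(k - 1, math.ceil(const * max(k / ell, math.log(len(S)))))
    T = []
    for P in parts:
        if P:                                        # a part may be empty for tiny inputs
            T.extend(local_core_set(P, kp, m))
    return T
```

For a 20 × 20 grid with k = 10, k′ = 8 and ℓ = 4, each cluster keeps m = 6 delegates instead of 9, so the union has at most 4 · 8 · 7 = 224 points rather than 4 · 8 · 10 = 320.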
SLIDE 72 Streaming implementation
◮ One pass
◮ Fix $k' = O(k/\varepsilon^D)$
◮ Run an adaptation of the 8-approximate k′-center algorithm of [Charikar et al.’04] to compute a (1 + ε)-core-set of size $O(k' \cdot k) = O(k^2/\varepsilon^D)$
◮ Run the sequential approximation algorithm on the core-set
◮ If D is unknown, it can be guessed in O(1) passes
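The sketch below illustrates a one-pass, doubling-style k′-center scheme that also keeps up to k − 1 delegates per cluster. It is a simplified variant in the spirit of the cited algorithm, with our own thresholds (attach within 2r, merge within 2r after doubling r), so its constants are looser than the 8-approximation on the slide. The class and method names are hypothetical. The invariant it maintains is that every point seen so far lies within 4r of a current center.

```python
from math import dist


class StreamingCoreSet:
    """One pass; at most kp clusters, each a [center, delegates] pair."""

    def __init__(self, k, kp):
        self.k, self.kp = k, kp
        self.r = 0.0        # current scale: every seen point is within 4r of a center
        self.clusters = []

    def add(self, p):
        # Attach p to the nearest center if within 2r, otherwise open a new cluster.
        if self.clusters:
            best = min(self.clusters, key=lambda cl: dist(p, cl[0]))
            if dist(p, best[0]) <= 2 * self.r:
                if len(best[1]) < self.k - 1:
                    best[1].append(p)          # keep p as a delegate if there is room
                return
        self.clusters.append([p, []])
        if len(self.clusters) > self.kp:
            self._merge()

    def _merge(self):
        if self.r == 0.0:
            # first overflow: centers are distinct, start from half their min distance
            cs = [cl[0] for cl in self.clusters]
            self.r = min(dist(a, b) for i, a in enumerate(cs) for b in cs[i + 1:]) / 2
        while len(self.clusters) > self.kp:
            self.r *= 2
            kept = []
            for center, dels in self.clusters:
                near = next((cl for cl in kept if dist(center, cl[0]) <= 2 * self.r), None)
                if near is None:
                    kept.append([center, dels])
                else:
                    # the merged center and its delegates become delegates of `near`
                    room = max(self.k - 1 - len(near[1]), 0)
                    near[1].extend(([center] + dels)[:room])
            self.clusters = kept

    def core_set(self):
        return [q for center, dels in self.clusters for q in [center] + dels]
```

After the pass, `core_set()` holds at most k′ · k points, on which the sequential approximation algorithm is then run.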
SLIDE 76 MapReduce implementation
◮ Input partition S = S1 ∪ S2 ∪ · · · ∪ Sℓ
◮ Two rounds:
  ◮ Round 1: run a 2-approximation k′-center algorithm (e.g., Gonzalez’s) in each Si to find Ti
  ◮ Round 2: run the best sequential algorithm for k-diversity on ∪Ti
◮ No need to know D! Stop when the radius of the k′-clustering is at most ε times the radius of the k-clustering
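A sequential simulation of the two rounds, assuming Euclidean tuples. Round 1 runs Gonzalez's traversal on each part and keeps adding centers until the current radius is at most ε times the radius recorded after k centers, so D never has to be known. Only centers are kept (no delegates), which suits remote-edge; the farthest-first greedy stands in for "the best sequential algorithm" in Round 2. All function names are ours.

```python
from math import dist


def gonzalez_until(P, k, eps):
    """Farthest-first traversal on one part, stopped by the radius rule."""
    centers = [0]
    d_near = [dist(p, P[0]) for p in P]      # distance to nearest center so far
    r_k = None
    while True:
        radius = max(d_near)                 # radius of the current clustering
        if len(centers) == k:
            r_k = radius                     # radius of the k-clustering
        if radius == 0 or (r_k is not None and radius <= eps * r_k):
            return [P[c] for c in centers]
        i = d_near.index(radius)             # farthest point becomes the next center
        centers.append(i)
        d_near = [min(d_near[j], dist(P[j], P[i])) for j in range(len(P))]


def greedy_remote_edge(P, k):
    chosen = [P[0]]
    d_near = [dist(p, P[0]) for p in P]
    while len(chosen) < k:
        i = max(range(len(P)), key=d_near.__getitem__)
        chosen.append(P[i])
        d_near = [min(d_near[j], dist(P[j], P[i])) for j in range(len(P))]
    return chosen


def two_round_mr(parts, k, eps):
    # Round 1 (one reducer per part in a real job): local core-sets T_i
    T = [q for P in parts for q in gonzalez_until(P, k, eps)]
    # Round 2 (single reducer): sequential algorithm on the union of the T_i
    return greedy_remote_edge(T, k)
```

In a real Spark job the list comprehension in Round 1 would become a per-partition map; the stopping rule is what lets k′ adapt to the data instead of being fixed from D.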
SLIDE 77
Experiments
SLIDE 81 Experiments: setup
Datasets:
◮ Synthetic data: from 4M to 1.6G points in R3
◮ Real data: musiXmatch dataset
  ◮ ≈ 250K songs
  ◮ Bag-of-words model, cosine distance
Platform:
◮ 16-node cluster (Intel i7)
◮ 18GB RAM / 256GB SSD per node
◮ 10Gbps Ethernet
◮ MapReduce: Apache Spark
◮ Streaming: simulation
SLIDE 82 Experiments: effectiveness of approach Streaming algorithm on real dataset
◮ X-axis: K ◮ Y-axis: approx-ratio (w.r.t. empirical best solution)
SLIDE 83 Experiments: comparison with state of the art
Our algorithm (CPPU) vs [Aghamolaei et al.’15] (AFZ)
◮ MapReduce using 16 machines
◮ 4M points in R3 (max feasible for AFZ)
◮ Our algorithm: k′ = 128
     Approximation        Time (s)
k    AFZ      CPPU        AFZ         CPPU
4    1.023    1.012       807.79      1.19
6    1.052    1.018       1,052.39    1.29
8    1.029    1.028       4,625.46    1.12
SLIDE 84 Experiments: scalability
Scalability in MapReduce w.r.t. input size
◮ Synthetic data
SLIDE 85 Experiments: scalability
Scalability in MapReduce w.r.t. number of processors
◮ Synthetic data: 100M points
◮ 1 processor ≡ streaming algorithm
SLIDE 86
Diversity maximization under matroid constraints
SLIDE 92 Diversity maximization under matroid constraints
◮ Consider diversity maximization problems where solutions are required to satisfy some given matroid constraint
◮ [Abbassi et al. KDD’13]: sequential (2 + ε)-approximation (based on local search) for remote-clique diversity under a matroid constraint
◮ Our approach can be generalized to provide (1 + ε)-composable core-sets for all div's under partition/transversal matroid constraints in doubling spaces. Main adaptations required:
  ◮ Selection of k′
  ◮ Selection of delegates in each cluster to include in the core-set
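For reference, the sketch below spells out what a partition-matroid constraint asks of a candidate solution, using the standard definition: the ground set is split into categories, and a set is feasible (independent) if it contains at most a given number of elements from each category. The function name and the news/sports example are hypothetical; transversal matroids are not covered.

```python
from collections import Counter


def is_independent(subset, category, capacity):
    """Partition-matroid independence: at most capacity[c] chosen items per category c."""
    counts = Counter(category[x] for x in subset)
    return all(n <= capacity.get(c, 0) for c, n in counts.items())


if __name__ == "__main__":
    category = {"a": "news", "b": "news", "c": "sports"}
    capacity = {"news": 1, "sports": 1}
    print(is_independent({"a", "c"}, category, capacity))   # True
    print(is_independent({"a", "b"}, category, capacity))   # False
```

In an aggregator setting, categories could be news clusters with capacity 1 each, forcing the diverse set to draw from distinct clusters.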
SLIDE 93
Summary and future work
SLIDE 95 Summary
◮ (1 + ε)-(composable) core-sets for diversity maximization in doubling spaces (also under partition/transversal matroid constraints)
◮ 1-pass/2-round Streaming/MapReduce algorithms
◮ Experiments on real and synthetic data show the effectiveness, efficiency and scalability of our approach
SLIDE 98 Future Work
◮ Further exploit randomization to lower space requirements in MapReduce
◮ Extend to broader classes of metric spaces
◮ Deal with general matroid constraints
◮ Incorporate other constraints (e.g., density)