Agenda
- Background
- Model
- Hamming Distance 1
- Triangle Finding
- Matrix Multiplication
cs848 Models and Applications of Distributed Data Processing Systems
The Problem
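Tradeoff between parallelism and communication cost in a map-reduce computation.
The finer the work of the reducers is partitioned, so that more parallelism can be extracted, the greater the total communication between mappers and reducers.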
Why It Is Important
computation.
Previous Work
communication cost in one-round map-reduce computations.
case.
algorithms that we might think of as truly parallel.
Two Parameters
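Replication rate r: the average number of key-value pairs to which each input is mapped by the mappers.
Reducer size q: the maximum number of inputs that a single reducer can receive.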
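Both parameters can be measured directly from what the mappers emit. The sketch below is a minimal illustration, assuming the mapper output is available as (input, reducer key) pairs; the function name `replication_rate_and_reducer_size` and this pair format are hypothetical. The q it returns is the largest load actually observed, which must stay within the reducer-size limit.

```python
from collections import defaultdict


def replication_rate_and_reducer_size(pairs, inputs):
    """Return (r, q) for one round of map-reduce.

    pairs:  (input, reducer_key) pairs emitted by the mappers (hypothetical format).
    inputs: the full set of inputs I.
    """
    received = defaultdict(set)  # reducer key -> inputs that reducer receives
    for x, key in pairs:
        received[key].add(x)
    total_pairs = sum(len(xs) for xs in received.values())
    r = total_pairs / len(inputs)                 # average key-value pairs per input
    q = max(len(xs) for xs in received.values())  # largest number of inputs at one reducer
    return r, q


# Inputs a, b, c; only b is replicated (to reducers 1 and 2).
r, q = replication_rate_and_reducer_size(
    [("a", 1), ("b", 1), ("b", 2), ("c", 2)], {"a", "b", "c"}
)
print(r, q)  # r = 4/3 (four pairs over three inputs), q = 2
```

Replicating more inputs lets each reducer receive fewer of them, which is exactly the tradeoff between r and q.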
Tradeoff
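$$r = \frac{\sum_{i=1}^{p} q_i}{|I|}$$
where $p$ is the number of reducers, $q_i \le q$ is the number of inputs assigned to reducer $i$, and $|I|$ is the total number of inputs.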
Mapping Schemas
For every output, at least one reducer is assigned all of the inputs for that output. We say such a reducer covers the output. This reducer need not be unique, and the same inputs may also be assigned to other reducers.
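The two conditions on a mapping schema can be checked mechanically. This is a minimal sketch, assuming reducers and outputs are both represented as sets of inputs; the function name and representation are hypothetical. The size test reflects the reducer-size parameter q, and the coverage test is the condition stated above.

```python
def is_valid_mapping_schema(reducers, outputs, q):
    """Check the mapping-schema conditions for reducer size q.

    reducers: list of sets, the inputs assigned to each reducer.
    outputs:  list of sets, the inputs each output depends on.
    """
    # No reducer may receive more than q inputs.
    if any(len(assigned) > q for assigned in reducers):
        return False
    # Every output must be covered: some reducer holds all of its inputs.
    return all(any(needed <= assigned for assigned in reducers) for needed in outputs)


# Hamming distance 1 on 2-bit strings: outputs are the pairs differing in one bit.
outputs = [{"00", "01"}, {"00", "10"}, {"01", "11"}, {"10", "11"}]
print(is_valid_mapping_schema([set(o) for o in outputs], outputs, q=2))    # True
print(is_valid_mapping_schema([{"00", "01"}, {"10", "11"}], outputs, q=2))  # False: {00, 10} is not covered
```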
Steps:
Q1: Is this assumption reasonable?
Q2: Can this approach be applied to most problems, or only to a few specific ones?
Proof in the technical report: F. N. Afrati, A. D. Sarma, S. Salihoglu, and J. D. Ullman. Upper and lower bounds on the cost of a map-reduce computation. CoRR, abs/1206.4377, 2012.
Upper Bound: Splitting Algorithm
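A minimal sketch of the splitting algorithm for Hamming distance 1, assuming b-bit strings and a number of segments c that divides b (the names `splitting_schema` and `distance_one_pairs` are hypothetical). Each string is sent to c reducers, one per segment, keyed by the segment index and the bits outside that segment. Two strings at distance 1 differ in exactly one segment and agree everywhere else, so they meet at that segment's reducer. Each reducer receives 2^(b/c) strings, giving q = 2^(b/c) and r = c = b / log2 q.

```python
from collections import defaultdict
from itertools import product


def splitting_schema(b, c):
    """Assign every b-bit string to c reducers; return reducer key -> set of strings."""
    if b % c != 0:
        raise ValueError("this sketch assumes c divides b")
    seg = b // c
    reducers = defaultdict(set)
    for bits in product("01", repeat=b):
        w = "".join(bits)
        for i in range(c):
            # Key: segment index i plus w with segment i masked out.
            rest = w[: i * seg] + "*" * seg + w[(i + 1) * seg :]
            reducers[(i, rest)].add(w)
    return reducers


def distance_one_pairs(reducers):
    """Each reducer emits the pairs of its strings that differ in exactly one bit."""
    found = set()
    for strings in reducers.values():
        for u in strings:
            for v in strings:
                if u < v and sum(x != y for x, y in zip(u, v)) == 1:
                    found.add((u, v))
    return found


b, c = 6, 3
reducers = splitting_schema(b, c)
q = max(len(s) for s in reducers.values())           # 2 ** (b // c) = 4
r = sum(len(s) for s in reducers.values()) / 2 ** b  # equals c
print(q, r, len(distance_one_pairs(reducers)))       # 4 3.0 192
```

All (b/2) * 2^b = 192 distance-1 pairs are found, confirming that the schema covers every output.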
Upper Bound for Large q: Replicas on Neighboring Reducers
easily to higher distance.
reducer.
Triangle: a set of three nodes such that the graph has an edge between each pair of them.
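The triangle definition can be checked directly. The sketch below is a serial illustration, assuming the graph is given as an undirected edge list with comparable node labels (the function name `find_triangles` is hypothetical). It is the kind of local computation a single reducer could run on the edges it receives, not the distributed algorithm itself.

```python
from collections import defaultdict


def find_triangles(edges):
    """Return every set of three nodes with an edge between each pair."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    triangles = set()
    for u, v in edges:
        if u == v:
            continue
        # Any common neighbour w of u and v closes the triangle {u, v, w}.
        for w in adj[u] & adj[v]:
            triangles.add(tuple(sorted((u, v, w))))
    return triangles


print(find_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))  # {(1, 2, 3)}
```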
partition the nodes into disjoint sets, such that the subgraph induced by each partition is either:
Matrix Multiplication Using Two Phases
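A minimal single-process simulation of the textbook two-phase scheme for C = AB: phase 1 groups entries of A and B by the shared index j and emits the products a_ij * b_jk keyed by (i, k); phase 2 sums the products for each (i, k). The sparse {(row, col): value} representation and the function name are hypothetical, and any grouping of indices used to control reducer size is not modeled here.

```python
from collections import defaultdict


def two_phase_matmul(A, B):
    """Multiply sparse matrices given as {(row, col): value} dicts in two simulated phases."""
    # Phase 1 map: key every entry by the shared index j.
    phase1 = defaultdict(lambda: ([], []))
    for (i, j), a in A.items():
        phase1[j][0].append((i, a))
    for (j, k), b in B.items():
        phase1[j][1].append((k, b))
    # Phase 1 reduce: for each j, emit ((i, k), a_ij * b_jk).
    partial = []
    for a_entries, b_entries in phase1.values():
        for i, a in a_entries:
            for k, b in b_entries:
                partial.append(((i, k), a * b))
    # Phase 2: group the products by (i, k) and sum them.
    C = defaultdict(int)
    for key, value in partial:
        C[key] += value
    return dict(C)


A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
print(two_phase_matmul(A, B))  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```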
bound-on-the-cost-of-a-map-reduce-computation
ullman.pdf