Large-Scale Data Management (Gestion de Données à Grande Échelle)
MapReduce and Hadoop
Francieli ZANON BOITO
francieli.zanon-boito@inria.fr
November 2018
Reference: slides by Thomas Ropars, Coursera
The MapReduce programming model and its open-source implementation (Apache Hadoop)
○ "The Google File System", S. Ghemawat et al., 2003
○ "MapReduce: Simplified Data Processing on Large Clusters", J. Dean and S. Ghemawat, 2004
○ Building the indexing system for Google Search
○ Extracting properties of web pages
○ Graph processing, etc.
○ The amount of data they handle grew too large
○ Google moved on to more efficient technologies
* https://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system
* https://dzone.com/articles/how-is-facebook-deploying-big-data
** http://yahoohadoop.tumblr.com/post/138739227316/hadoop-turns-10
*** http://labs.criteo.com/about-us/
○ Implemented by people working at Yahoo!, released in 2006
○ Notably, Facebook *
○ HDFS @ Yahoo!: 600PB on 35K servers **
○ Criteo: 42k cores, 150PB, 300k jobs per day ***
○ Map: a transformation operation
○ A function is applied to each element of the input set
○ map( f )[ x0, ..., xn ] = [ f(x0), ..., f(xn) ]
○ map(∗2)[2, 3, 6] = [4, 6, 12]
○ Reduce: an aggregation operation (fold)
○ reduce( f )[ x0, ..., xn ] = f(x0, f(x1, ..., f(xn-1, xn)))
○ reduce(+)[2, 3, 6] = (2 + (3 + 6)) = 11
○ In MapReduce, Reduce is applied to all the elements with the same key
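These two primitives exist in most languages; a quick Python illustration of the same semantics:

```python
from functools import reduce

# map: apply a function to each element independently
doubled = list(map(lambda x: x * 2, [2, 3, 6]))

# reduce: fold the elements pairwise with a binary function
total = reduce(lambda a, b: a + b, [2, 3, 6])

print(doubled)  # [4, 6, 12]
print(total)    # 11
```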
○ Handles the distribution of data and computation ○ Detects failures and automatically takes corrective action
○ Data parallelism (as opposed to task parallelism): running the same task on different data pieces in parallel ○ Move the computation instead of the data
■ Distributed file system is central ■ Execute tasks where their data is
○ Data replication by the distributed file system
○ Intermediate results are written to disk
○ Failed tasks are re-executed on other nodes
○ Tasks can be executed multiple times in parallel to deal with stragglers (slow nodes)
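The re-execution idea can be sketched as follows (a toy model, not Hadoop's scheduler; the function names are illustrative):

```python
def run_with_retries(task, nodes):
    # A failed task is simply re-executed on another node. This is safe
    # because map/reduce tasks are deterministic and side-effect free.
    for node in nodes:
        try:
            return task(node)
        except RuntimeError:
            continue  # this node failed: try the next one
    raise RuntimeError("task failed on every node")

# toy task: fails on n1, succeeds elsewhere
def word_count_task(node):
    if node == "n1":
        raise RuntimeError("node crashed")
    return f"done on {node}"

print(run_with_retries(word_count_task, ["n1", "n2", "n3"]))  # done on n2
```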
○ MapReduce ○ HDFS ○ YARN
Input: < 1, "aaa bb ccc" >, < 2, "aaa bb" >
Output: < "aaa", 2 >, < "bb", 2 >, < "ccc", 1 >
map(key, value):
    for each word in value:
        emit(word, 1)

Input (key = line number, value = line contents):
1, "aaa bb ccc"
2, "bb bb d"
3, "d aaa bb"
4, "d"

Map output (one pair per word):
"aaa", 1   "bb", 1   "ccc", 1
"bb", 1    "bb", 1   "d", 1
"d", 1     "aaa", 1  "bb", 1
"d", 1

reduce(key, values):
    result = 0
    for value in values:
        result += value
    emit(key, result)

Shuffle & Sort groups the values by key:
"aaa", [1, 1]   "bb", [1, 1, 1, 1]   "ccc", [1]   "d", [1, 1, 1]

Reduce output:
"aaa", 2   "bb", 4   "ccc", 1   "d", 3
But we generate a lot of intermediate data! Why not keep a centralized counter per word? Because a centralized counter would be a bottleneck; the intermediate pairs are the price we pay for scalability. Let's see how it works.
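The whole walkthrough can be simulated in plain Python (a toy, single-process model of the map, shuffle, and reduce phases, not the Hadoop API):

```python
from collections import defaultdict

def map_func(key, value):
    # emit one (word, 1) pair per word in the line
    for word in value.split():
        yield (word, 1)

def reduce_func(key, values):
    # sum all the counts seen for this word
    return (key, sum(values))

lines = {1: "aaa bb ccc", 2: "bb bb d", 3: "d aaa bb", 4: "d"}

# map phase: run map_func on every input record
intermediate = [pair for k, v in lines.items() for pair in map_func(k, v)]

# shuffle & sort: group the intermediate values by key
groups = defaultdict(list)
for word, count in intermediate:
    groups[word].append(count)

# reduce phase: one call per distinct key
result = dict(reduce_func(k, vs) for k, vs in groups.items())
print(result)  # {'aaa': 2, 'bb': 4, 'ccc': 1, 'd': 3}
```

Note that ten intermediate pairs are produced for four input lines; this is the intermediate-data cost mentioned above.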
○ MapReduce ○ HDFS ○ YARN
○ The programmer provides map and reduce functions that manipulate key-value pairs
○ The key and value types must be chosen so that the map output matches the reduce input
○ reduce is called once per key, with the list of all values associated with that key
Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle
We start with the input split into blocks and distributed over the nodes
We have one map task per input block (each task executes the map function multiple times), placed on the same node as its block to avoid data movement!
Now comes the Shuffle & Sort phase. First, each map task's output is sorted by key
The pairs are sent to the adequate reduce task (chosen by hashing the key). The number of reduce tasks is configurable
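The routing works by hashing: a pair goes to reducer hash(key) mod R, so all pairs with the same key reach the same reduce task. A small sketch (Hadoop's HashPartitioner does the equivalent with Java's hashCode; crc32 is used here so the result is deterministic):

```python
import zlib

NUM_REDUCERS = 2

def partition(key, num_reducers=NUM_REDUCERS):
    # deterministic hash of the key, mod the number of reduce tasks
    return zlib.crc32(key.encode()) % num_reducers

# every occurrence of a key is routed to the same reducer
assert partition("aaa") == partition("aaa")
assert all(0 <= partition(k) < NUM_REDUCERS for k in ["aaa", "bb", "ccc", "d"])
```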
Pairs that have the same key are grouped together
Run the reduce tasks
Now we have the (unsorted) output, distributed over some of the nodes
○ Sequential reads and writes only
○ Write-once-read-many file access (supports append and truncate)
○ Files are split into large blocks (recent versions default to 128MB)
○ Default replication factor is 3
○ NameNode: the clients' entry point, keeps all the metadata
○ DataNodes: store the file blocks
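A quick back-of-the-envelope helper for the numbers above (128MB blocks, replication factor 3); the function name is made up for illustration:

```python
import math

BLOCK_SIZE = 128 * 1024**2   # 128 MB, the recent default
REPLICATION = 3              # default replication factor

def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    # number of blocks the file is split into, and raw bytes stored
    # across the DataNodes once every block is replicated
    blocks = math.ceil(file_size / block_size)
    return blocks, file_size * replication

blocks, raw = hdfs_footprint(1024**3)   # a 1 GB file
print(blocks, raw // 1024**2)  # 8 3072  -> 8 blocks, 3 GB of raw storage
```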
[Diagram: HDFS cluster architecture. One NameNode (NN) and many DataNodes (DN), organized in racks; the machines in the same rack are connected by a switch.]
○ The NameNode checks permissions, etc
○ For each block, the client asks the NameNode for a list of destination DataNodes
○ The NameNode returns a list sorted by distance to the client
○ The client sends the block to the first (closest) DataNode
○ Each DataNode forwards it to the next DataNode in the list (to create the replicas)
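The pipeline can be modeled as a forwarding chain (a toy sketch; real DataNodes stream the block onward while still receiving it):

```python
def pipeline_write(block, datanodes):
    # the client only talks to the first (closest) DataNode;
    # each DataNode stores a replica, then forwards the block downstream
    stored = {}
    for dn in datanodes:   # e.g. D0 receives, forwards to D5, which forwards to D9
        stored[dn] = block
    return stored

replicas = pipeline_write(b"block-0-bytes", ["D0", "D5", "D9"])
print(sorted(replicas))  # ['D0', 'D5', 'D9']
```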
[Diagram: writing file A to HDFS, step by step]
Client -> NameNode: "Create file A"
NameNode -> Client: Ack
For each block:
    Client -> NameNode: "List of DataNodes?"
    NameNode -> Client: "D0, D5, and D9"
    Client -> D0: data
    D0 -> D5 -> D9: data (each DataNode forwards to the next to create the replicas)
    DataNodes -> Client: Ack
Client -> NameNode: "Done with file A"
○ For each block, the NameNode returns a list of the DataNodes that have that block, sorted by distance to the client
○ The client tries to read each block from the closest DataNode; if it is not available, it tries the others
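The fallback logic amounts to walking the replica list in order (a sketch; `alive` stands in for which DataNodes currently answer):

```python
def read_block(replicas, alive):
    # replicas is sorted by distance to the client;
    # try each in turn and fall back when a DataNode is down
    for dn in replicas:
        if dn in alive:
            return alive[dn]       # the block's bytes, served by this DataNode
    raise IOError("all replicas unavailable")

alive = {"D5": b"block-0", "D9": b"block-0"}   # D0 is down
print(read_block(["D0", "D5", "D9"], alive))   # b'block-0' (served by D5)
```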
[Diagram: reading file A from HDFS, step by step]
Client -> NameNode: "Read file A"
NameNode -> Client: "Block 0: D0, D5, and D9; Block 1: D6, D10, and D11"
Client -> D0: "Block 0?"
D0 -> Client: data
Client -> D6: "Block 1?"
D6 -> Client: data
Figure source: https://www.ibm.com/developerworks/library/bd-yarn-intro/index.html
○ MapReduce ○ HDFS ○ YARN
All those intermediate pairs have to be sent over the network during Shuffle & Sort. That is costly!
map(key, value):
    for each word in value:
        emit(word, 1)

reduce(key, values):
    result = 0
    for value in values:
        result += value
    emit(key, result)

Input: 1, "aaa bb ccc"   2, "bb bb d"   3, "d aaa bb"   4, "d"
Map output: "aaa", 1  "bb", 1  "ccc", 1  "bb", 1  "bb", 1  "d", 1  "d", 1  "aaa", 1  "bb", 1  "d", 1
Reduce output: "aaa", 2  "bb", 4  "ccc", 1  "d", 3
A combiner is a user-defined function that performs local aggregation on the map tasks, before the shuffle!
Map task 1 (lines 1 and 2):
1, "aaa bb ccc"
2, "bb bb d"
map output: "aaa", 1  "bb", 1  "ccc", 1  "bb", 1  "bb", 1  "d", 1
combiner output: "aaa", 1  "bb", 3  "ccc", 1  "d", 1

Map task 2 (lines 3 and 4):
3, "d aaa bb"
4, "d"
map output: "d", 1  "aaa", 1  "bb", 1  "d", 1
combiner output: "d", 2  "aaa", 1  "bb", 1

With the combiner, only 7 intermediate pairs are shuffled instead of 10.
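For word count the combiner can simply be the reduce logic itself, run locally on each map task's output (valid because addition is associative and commutative); a Python sketch:

```python
from collections import Counter

def combine(map_output):
    # local aggregation: merge pairs with the same key before the shuffle
    counts = Counter()
    for word, n in map_output:
        counts[word] += n
    return list(counts.items())

task1 = [("aaa", 1), ("bb", 1), ("ccc", 1), ("bb", 1), ("bb", 1), ("d", 1)]
task2 = [("d", 1), ("aaa", 1), ("bb", 1), ("d", 1)]

print(combine(task1))  # [('aaa', 1), ('bb', 3), ('ccc', 1), ('d', 1)]
print(combine(task2))  # [('d', 2), ('aaa', 1), ('bb', 1)]
```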
○ "The Google File System", S. Ghemawat, H. Gobioff, and S.-T. Leung, 2003
○ "MapReduce: Simplified Data Processing on Large Clusters", J. Dean and S. Ghemawat, 2004
○ Chapter 10 of Designing Data-Intensive Applications, by Martin Kleppmann
○ HDFS cartoon: https://wiki.scc.kit.edu/gridkaschool/upload/1/18/Hdfs-cartoon.pdf
○ MapReduce illustration: https://words.sdsc.edu/words-data-science/mapreduce