SLIDE 1

Large-Scale Data Management (Gestion de Données à Grande Échelle)

MapReduce and Hadoop

Francieli ZANON BOITO

francieli.zanon-boito@inria.fr November 2018

SLIDE 2

References

  • Slides by Thomas Ropars
  • Coursera – Big Data, University of California San Diego
  • The lecture notes of V. Leroy
  • Designing Data-Intensive Applications by Martin Kleppmann
  • Mining of Massive Datasets by Leskovec et al.


SLIDE 3

In today's class

  • The MapReduce paradigm for big data processing, and its most popular implementation (Apache Hadoop)
  • Main ideas and how it works
  • In the lab session (TP): put it into practice

SLIDE 4

History

  • First publications

○ "The Google File System", S. Ghemawat et al., 2003
○ "MapReduce: Simplified Data Processing on Large Clusters", J. Dean and S. Ghemawat, 2004

  • Used to implement several tasks:

○ Building the indexing system for Google Search
○ Extracting properties of web pages
○ Graph processing, etc.

  • Google does not use MapReduce anymore*

○ The amount of data they handle increased too much
○ They moved on to more efficient technologies

* https://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system

SLIDE 5

History

  • Apache Hadoop: open-source MapReduce framework

○ Implemented by people working at Yahoo!, released in 2006

  • Now it is a full ecosystem, used by many companies

○ Notably, Facebook*
○ HDFS @ Yahoo!: 600PB on 35K servers**
○ Criteo: 42K cores, 150PB, 300K jobs per day***

* https://dzone.com/articles/how-is-facebook-deploying-big-data
** http://yahoohadoop.tumblr.com/post/138739227316/hadoop-turns-10
*** http://labs.criteo.com/about-us/

SLIDE 6

Main elements

  • A distributed computing execution framework
  • Data represented as key-value pairs
  • A distributed file system
  • Two main operations on data: Map and Reduce


SLIDE 7

Map and Reduce

  • The Map operation

○ Transformation operation
○ A function is applied to each element of the input set
○ map( f )[ x0, ..., xn ] = [ f (x0), ..., f (xn) ]
○ map(∗2)[2, 3, 6] = [4, 6, 12]

  • The Reduce operation

○ Aggregation operation (fold)
○ reduce( f )[ x0, ..., xn ] = f ( x0, f ( x1, ..., f ( xn-1, xn ) ... ) )
○ reduce(+)[2, 3, 6] = (2 + (3 + 6)) = 11
○ In MapReduce, Reduce is applied to all the elements with the same key
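Both operations have direct counterparts as ordinary higher-order functions. A quick illustration in plain Python (not Hadoop code), reproducing the slide's two examples:

```python
from functools import reduce

# map: apply a function to each element of the input set
doubled = list(map(lambda x: x * 2, [2, 3, 6]))
print(doubled)  # [4, 6, 12]

# reduce: fold all the elements into a single value
total = reduce(lambda a, b: a + b, [2, 3, 6])
print(total)  # 11
```

Note that `functools.reduce` folds from the left, while the slide's notation folds from the right; for an associative operation like + the result is the same.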

SLIDE 8

Why is it popular?

  • “Simple” to program and execute

○ Handles the distribution of data and of the computation
○ Detects failures and automatically takes corrective actions

  • Scales to a large number of nodes

○ Data parallelism (as opposed to task parallelism): running the same task on different pieces of data in parallel
○ Move the computation instead of the data

■ The distributed file system is central
■ Execute tasks on the nodes where their data is

Code is small (just text), so it is cheaper to move the code than the data!

SLIDE 9

Why is it popular?

  • Fault tolerance

○ Data replication by the distributed file system
○ Intermediate results are written to disk
○ Failed tasks are re-executed on other nodes
○ Tasks can be executed multiple times in parallel to deal with stragglers (slow nodes)

SLIDE 10

Agenda

  • Introduction
  • A first MapReduce program
  • Apache Hadoop

○ MapReduce
○ HDFS
○ Yarn

  • Combiners

SLIDE 11

A first MapReduce program: word counter

  • We want to count the occurrences of words in a text
  • Input: a set of lines, each line is a pair < line number, line content >
  • Output: a set of pairs < word, number of occurrences >

Input:
< 1, "aaa bb ccc" >
< 2, "aaa bb" >

Output:
< "aaa", 2 >
< "bb", 2 >
< "ccc", 1 >

SLIDE 12

map(key, value):
    for each word in value:
        output pair(word, 1)

Input:
1, "aaa bb ccc"
2, "bb bb d"
3, "d aaa bb"
4, "d"

SLIDE 13

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1

SLIDE 14

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1
2, "bb bb d"     →  "bb", 1   "bb", 1   "d", 1

SLIDE 15

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1
2, "bb bb d"     →  "bb", 1   "bb", 1   "d", 1
3, "d aaa bb"    →  "d", 1   "aaa", 1   "bb", 1
4, "d"           →  "d", 1

SLIDE 16

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1
2, "bb bb d"     →  "bb", 1   "bb", 1   "d", 1
3, "d aaa bb"    →  "d", 1   "aaa", 1   "bb", 1
4, "d"           →  "d", 1

reduce(key, values):
    result = 0
    for value in values:
        result += value
    output pair(key, result)

SLIDE 17

map(key, value):
    for each word in value:
        output pair(word, 1)

reduce(key, values):
    result = 0
    for value in values:
        result += value
    output pair(key, result)

All the "aaa" pairs are grouped:  "aaa", [1,1]  →  reduce  →  "aaa", 2

SLIDE 18

map(key, value):
    for each word in value:
        output pair(word, 1)

reduce(key, values):
    result = 0
    for value in values:
        result += value
    output pair(key, result)

"aaa", [1,1]      →  "aaa", 2
"bb", [1,1,1,1]   →  "bb", 4

SLIDE 19

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1
2, "bb bb d"     →  "bb", 1   "bb", 1   "d", 1
3, "d aaa bb"    →  "d", 1   "aaa", 1   "bb", 1
4, "d"           →  "d", 1

reduce(key, values):
    result = 0
    for value in values:
        result += value
    output pair(key, result)

Final output:  "aaa", 2   "bb", 4   "ccc", 1   "d", 3
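The whole word-counter pipeline can be simulated in a few lines of plain Python. The names below (map_fn, reduce_fn, the explicit grouping dictionary) are illustrative stand-ins for what the framework does across many nodes:

```python
from collections import defaultdict

def map_fn(key, value):
    # one (word, 1) pair per word in the line
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # sum all the 1s emitted for this word
    return (key, sum(values))

lines = {1: "aaa bb ccc", 2: "bb bb d", 3: "d aaa bb", 4: "d"}

# Map phase: run map_fn once per input pair
pairs = [p for k, v in lines.items() for p in map_fn(k, v)]

# Shuffle & sort: group all values by key
groups = defaultdict(list)
for word, one in pairs:
    groups[word].append(one)

# Reduce phase: run reduce_fn once per key
result = dict(reduce_fn(k, vs) for k, vs in groups.items())
print(result)  # {'aaa': 2, 'bb': 4, 'ccc': 1, 'd': 3}
```

In the real framework the three phases run on different machines; here the grouping dictionary plays the role of the shuffle.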

SLIDE 20

Input:          1, "aaa bb ccc"   2, "bb bb d"   3, "d aaa bb"   4, "d"
Map output:     "aaa", 1   "bb", 1   "ccc", 1   "bb", 1   "bb", 1   "d", 1   "d", 1   "aaa", 1   "bb", 1   "d", 1
Reduce output:  "aaa", 2   "bb", 4   "ccc", 1   "d", 3

But we generate a lot of intermediate data! Why not keep a centralized counter per word? That's the price we pay for scalability! Let's see how it works.


SLIDE 21

Agenda

  • Introduction
  • A first MapReduce program
  • Apache Hadoop

○ MapReduce
○ HDFS
○ Yarn

  • Combiners

SLIDE 22

MapReduce

  • The developer defines:

○ map and reduce functions to manipulate key-value pairs
○ key and value types (the map output needs to match the reduce input)

  • The map function will be executed once per input pair
  • The reduce function will be executed once per existing key (with all the values associated with that key)

SLIDE 23

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

We start with the input separated into blocks and distributed over the nodes

SLIDE 24

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

We have one map task per input block (each task executes the map function multiple times), running on the same node as its block to avoid data movement!

SLIDE 25

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

Now comes the Shuffle & Sort phase: first, sort each map task's output by key

SLIDE 26

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

Send each pair to the adequate reduce task (chosen by hashing the key). The number of reduce tasks is configurable.
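The "adequate" reduce task is typically chosen by hashing the key modulo the number of reduce tasks, so every occurrence of a word lands on the same reducer. A small sketch (the partition function below is illustrative, not Hadoop's actual partitioner):

```python
import hashlib

NUM_REDUCE_TASKS = 3  # the number of reduce tasks is configurable

def partition(key, num_tasks=NUM_REDUCE_TASKS):
    # Use a stable hash so a given key always maps to the same reduce
    # task (Python's built-in hash() is randomized between runs).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_tasks

# all pairs sharing a key go to one reduce task:
for word in ["aaa", "bb", "ccc", "d"]:
    print(word, "→ reduce task", partition(word))
```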

SLIDE 27

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

Combine the pairs that have the same key

SLIDE 28

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

Run the reduce tasks

SLIDE 29

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

Now we have (unsorted) output that is distributed over some nodes

SLIDE 30

HDFS

  • Distributed file system for shared-nothing infrastructures
  • Main goals: scalability and fault tolerance, optimized for throughput
  • It is not POSIX-compliant

○ Sequential reads and writes only
○ Write-once-read-many file access (supports append and truncate)

SLIDE 31

HDFS

  • Files are partitioned into blocks and distributed over the nodes

○ Recently: 128MB blocks

  • Replicas are topology-aware (rack awareness)

○ Default replication factor is 3

  • Architecture:

○ NameNode: the clients' entry point
○ One DataNode per node
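Block count and storage footprint follow directly from these two parameters; for instance, for a hypothetical 1GB file:

```python
import math

BLOCK_SIZE = 128 * 2**20      # 128MB blocks
REPLICATION_FACTOR = 3        # default replication factor

file_size = 2**30             # a hypothetical 1GB file

blocks = math.ceil(file_size / BLOCK_SIZE)
copies = blocks * REPLICATION_FACTOR
print(blocks, copies)  # 8 blocks, hence 24 block replicas spread over the DataNodes
```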

SLIDE 32

[Diagram: the set of processing nodes, connected by a switch]

SLIDE 33

[Diagram: nodes organized in racks (communication is faster within the same rack)]

SLIDE 34

[Diagram: the NameNode (NN) and one DataNode (DN) per node, connected by switches]

SLIDE 35

Writing a file

  • The client asks the NameNode

○ The NameNode checks permissions, etc.

  • The NameNode allows the client to proceed
  • The client breaks the data into blocks

○ For each block, the client asks the NameNode for a list of destination DataNodes
○ The NameNode returns a list sorted by distance to the client

  • Each block is written:

○ The client sends it to the first (closest) DataNode
○ Each DataNode forwards it to the next DataNode in the list (to create the replicas)

  • When it is all written, the client acknowledges the file creation to the NameNode
  • The NameNode saves the information about the file (metadata) to disk
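The replication pipeline in the last step can be sketched as a tiny simulation (write_block and the in-memory storage dictionary are illustrative, not HDFS code):

```python
def write_block(block, datanodes, storage):
    """The client sends the block to the first (closest) DataNode;
    each DataNode stores its copy and forwards the block to the
    next one in the list, creating the replicas."""
    if not datanodes:
        return
    storage.setdefault(datanodes[0], []).append(block)  # store locally
    write_block(block, datanodes[1:], storage)          # forward downstream

storage = {}
write_block("block-0", ["D0", "D5", "D9"], storage)
print(storage)  # {'D0': ['block-0'], 'D5': ['block-0'], 'D9': ['block-0']}
```

The point of the pipeline is that the client only uploads each block once; the DataNodes create the remaining replicas among themselves.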

SLIDE 36

[Diagram: the NameNode (NN), the DataNodes (DN), and a client]

SLIDE 37

[Diagram: Client → NameNode: "Create file A"]

SLIDE 38

[Diagram: NameNode → Client: Ack]

SLIDE 39

[Diagram: Client → NameNode: "List of DataNodes?" — for each block!]

SLIDE 40

[Diagram: NameNode → Client: "D0, D5, and D9" — for each block!]

SLIDE 41

[Diagram: Client → first DataNode: data — for each block!]

SLIDE 42

[Diagram: each DataNode forwards the data to the next one in the list — for each block!]

SLIDE 43

[Diagram: Ack back to the client — for each block!]

SLIDE 44

[Diagram: Client → NameNode: "Done with file A"]

SLIDE 45

Reading a file

  • The client asks the NameNode for information about the file
  • The NameNode gives the client a list of blocks

○ For each block, a list of the DataNodes that have that block, sorted by distance to the client

  • The client reads the blocks sequentially

○ It tries to read each block from the closest DataNode; if that one is not available, it tries the others
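The fallback logic can be sketched as follows (read_block and the in-memory cluster dictionary are illustrative, not HDFS code):

```python
def read_block(block_id, replicas, cluster):
    """Try the DataNodes in the order given by the NameNode
    (closest first); fall back to the next replica on failure."""
    for dn in replicas:
        data = cluster.get(dn, {}).get(block_id)
        if data is not None:
            return data
    raise IOError(f"no available replica for {block_id}")

# D0 is down (absent from the cluster), so the client falls back to D5.
cluster = {"D5": {"block-0": "aaa bb ccc"}, "D9": {"block-0": "aaa bb ccc"}}
print(read_block("block-0", ["D0", "D5", "D9"], cluster))  # aaa bb ccc
```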

SLIDE 46

[Diagram: Client → NameNode: "Read file A"]

SLIDE 47

[Diagram: NameNode → Client: "Block 0: D0, D5, and D9; Block 1: D6, D10, and D11"]

SLIDE 48

[Diagram: Client → closest DataNode holding it: "Block 0?"]

SLIDE 49

[Diagram: DataNode → Client: data]

SLIDE 50

[Diagram: Client → closest DataNode holding it: "Block 1?"]

SLIDE 51

[Diagram: DataNode → Client: data]

SLIDE 52

Yarn

  • The cluster resource manager: dynamically allocates resources to jobs
  • Multiple engines (in addition to MapReduce) run in parallel on the cluster
  • Hierarchical architecture for scalability

SLIDE 53

Yarn architecture

Source: https://www.ibm.com/developerworks/library/bd-yarn-intro/index.html

SLIDE 54

Agenda

  • Introduction
  • A first MapReduce program
  • Apache Hadoop

○ MapReduce
○ HDFS
○ Yarn

  • Combiners

SLIDE 55

Figure from https://www.supinfo.com/articles/single/2807-introduction-to-the-mapreduce-life-cycle

That is costly!

SLIDE 56

map(key, value):
    for each word in value:
        output pair(word, 1)

1, "aaa bb ccc"  →  "aaa", 1   "bb", 1   "ccc", 1
2, "bb bb d"     →  "bb", 1   "bb", 1   "d", 1
3, "d aaa bb"    →  "d", 1   "aaa", 1   "bb", 1
4, "d"           →  "d", 1

reduce(key, values):
    result = 0
    for value in values:
        result += value
    output pair(key, result)

Final output:  "aaa", 2   "bb", 4   "ccc", 1   "d", 3

SLIDE 57

Combiner

User-defined function for local aggregation, applied on the map tasks' output (here!)

SLIDE 58

Map task 1:  1, "aaa bb ccc"   2, "bb bb d"
  map output: "aaa", 1   "bb", 1   "ccc", 1   "bb", 1   "bb", 1   "d", 1

Map task 2:  3, "d aaa bb"   4, "d"
  map output: "d", 1   "aaa", 1   "bb", 1   "d", 1

SLIDE 59

Map task 1:  1, "aaa bb ccc"   2, "bb bb d"
  map output:      "aaa", 1   "bb", 1   "ccc", 1   "bb", 1   "bb", 1   "d", 1
  combiner output: "aaa", 1   "bb", 3   "ccc", 1   "d", 1

Map task 2:  3, "d aaa bb"   4, "d"
  map output: "d", 1   "aaa", 1   "bb", 1   "d", 1

SLIDE 60

Map task 1:  1, "aaa bb ccc"   2, "bb bb d"
  map output:      "aaa", 1   "bb", 1   "ccc", 1   "bb", 1   "bb", 1   "d", 1
  combiner output: "aaa", 1   "bb", 3   "ccc", 1   "d", 1

Map task 2:  3, "d aaa bb"   4, "d"
  map output:      "d", 1   "aaa", 1   "bb", 1   "d", 1
  combiner output: "d", 2   "aaa", 1   "bb", 1
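The effect of the combiner on the two map tasks above can be reproduced in plain Python (map_task and combiner below are illustrative sketches, not the Hadoop API):

```python
from collections import Counter

def map_task(lines):
    # the raw map output of one task: a flat list of (word, 1) pairs
    return [(word, 1) for line in lines for word in line.split()]

def combiner(pairs):
    # local aggregation inside the map task, before the shuffle;
    # here it reuses the same summing logic as the reducer
    return list(Counter(word for word, _ in pairs).items())

task1 = combiner(map_task(["aaa bb ccc", "bb bb d"]))
task2 = combiner(map_task(["d aaa bb", "d"]))
print(task1)  # [('aaa', 1), ('bb', 3), ('ccc', 1), ('d', 1)]
print(task2)  # [('d', 2), ('aaa', 1), ('bb', 1)]
```

Task 1 now ships 4 pairs instead of 6 across the network; with realistic inputs the savings are much larger. This only works because the aggregation (a sum) is associative and commutative.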

SLIDE 61

Additional references

  • "MapReduce: Simplified Data Processing on Large Clusters", by J. Dean and S. Ghemawat
  • Suggested reading

○ Chapter 10 of Designing Data-Intensive Applications by Martin Kleppmann
○ HDFS cartoon: https://wiki.scc.kit.edu/gridkaschool/upload/1/18/Hdfs-cartoon.pdf
○ MapReduce illustration: https://words.sdsc.edu/words-data-science/mapreduce