Shuffle Phase Executed only in the case of one or more reducers - - PowerPoint PPT Presentation



SLIDE 1

Shuffle Phase

Executed only in the case of one or more reducers
Transfers data between the mappers and reducers
Groups records by their keys to ensure local processing in the reduce phase
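The grouping described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular framework implements it; the function name `shuffle` and the byte-sum routing rule are assumptions made for the example. With zero reducers (a map-only job) the step would simply be skipped.

```python
from collections import defaultdict

def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to one reducer and group values by key.

    map_outputs: one list of (key, value) pairs per mapper.
    Returns one dict per reducer, mapping key -> list of values.
    """
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            # Toy routing rule (an assumption): sum of the key's bytes modulo
            # the reducer count. Any deterministic function of the key works,
            # as long as equal keys always land on the same reducer.
            target = sum(key.encode("utf-8")) % num_reducers
            reducers[target][key].append(value)
    return [dict(r) for r in reducers]

map_outputs = [
    [("a", 1), ("b", 1)],   # output of Map 1
    [("a", 1), ("c", 1)],   # output of Map 2
    [("b", 1), ("a", 1)],   # output of Map 3
]
grouped = shuffle(map_outputs, num_reducers=2)
```

Because routing depends only on the key, all values for a key end up on a single reducer, which is what makes the reduce phase purely local.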

01/23/2018 15

SLIDE 2

Shuffle Phase

[Figure: map tasks Map1, Map2, Map3, …, MapM connected to reduce tasks Reduce1, Reduce2, …, ReduceN]

SLIDE 3

Shuffle Phase (Map-side)

[Figure: inside map task Mapi, the Input Split feeds map, which emits (k, v) pairs; Partition assigns each pair to a partition, shown with keys running from kA to kZ and partitions labeled 1 … N-1; the partitions are sent to Reduce1, Reduce2, …, ReduceN]
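The map-side steps (map, then partition) can be sketched as follows. This is a minimal sketch under stated assumptions: the slide only labels the step "Partition", so the hash partitioner, the per-partition sort, and the names `partition_of`, `map_side_shuffle`, and `word_count_map` are all illustrative choices.

```python
import zlib

def partition_of(key, num_reducers):
    # Hash partitioning (an assumption). zlib.crc32 is used because it is
    # stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_side_shuffle(records, map_fn, num_reducers):
    """Apply map_fn to each record of an input split and bucket its output."""
    partitions = [[] for _ in range(num_reducers)]
    for record in records:
        for key, value in map_fn(record):
            partitions[partition_of(key, num_reducers)].append((key, value))
    # Sort each partition by key so the reduce side only has to merge.
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
    return partitions

def word_count_map(line):
    # Hypothetical map function: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]

parts = map_side_shuffle(["b a", "c a b"], word_count_map, num_reducers=3)
```

Each map task produces one bucket per reducer, so bucket j of every map task is the data later fetched by reducer j.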

SLIDE 4

Shuffle Phase (Reduce-side)

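[Figure: inside reduce task Reducej, Copy fetches one partition (part1, part2, part3, …, partM) from each map task Map1, Map2, Map3, …, MapM; Sort combines the fetched (k, v) pairs into key order; Reduce consumes the sorted pairs]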
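The Copy and Sort steps on the reduce side can be sketched like this. It assumes, for illustration only, that every map task already sorted its partitions by key, so the "Sort" step reduces to a k-way merge; the function name `reduce_side_shuffle` and the nested-list layout are hypothetical.

```python
import heapq

def reduce_side_shuffle(map_partitions, reducer_index):
    """Collect and merge the data destined for one reducer.

    map_partitions[m][j] is the key-sorted partition that map task m
    produced for reducer j.
    """
    # Copy: take this reducer's partition from every map task.
    fetched = [parts[reducer_index] for parts in map_partitions]
    # Sort: the runs are already sorted, so merge them instead of re-sorting.
    return list(heapq.merge(*fetched, key=lambda kv: kv[0]))

map_partitions = [
    [[("a", 1), ("c", 1)], [("b", 1)]],            # Map 1: partition 0, partition 1
    [[("a", 1)],           [("b", 1), ("d", 1)]],  # Map 2
    [[("c", 1)],           []],                    # Map 3
]
merged = reduce_side_shuffle(map_partitions, reducer_index=0)
```

After the merge, equal keys sit next to each other, which is the precondition for grouping in the reduce phase.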

SLIDE 5

Reduce Phase

Apply the reduce function to each group of records that share the same key

[Figure: sorted (k, v) pairs grouped by key (k1, k2, k3, …, kN); one reduce call per group; the results of all reduce calls go to Output]

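One reduce call per key group can be sketched with `itertools.groupby`, which relies on the input already being sorted by key. The names `run_reduce` and `sum_reduce` are hypothetical, and summation stands in for whatever reduce function a job defines.

```python
from itertools import groupby
from operator import itemgetter

def run_reduce(sorted_pairs, reduce_fn):
    """Call reduce_fn once per run of consecutive pairs with the same key."""
    output = []
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        output.append(reduce_fn(key, values))
    return output

def sum_reduce(key, values):
    # Hypothetical reduce function: add up all values of one key.
    return (key, sum(values))

sorted_pairs = [("k1", 1), ("k1", 2), ("k2", 5), ("k3", 1), ("k3", 1), ("k3", 1)]
result = run_reduce(sorted_pairs, sum_reduce)
# result == [("k1", 3), ("k2", 5), ("k3", 3)]
```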
SLIDE 6

Output Writing

Materializes the final output to disk
All results from one process (mapper/reducer) are stored in a subdirectory
An OutputFormat is used to:
  Create any files in the output directory
  Write the output records one-by-one to the output
  Merge the results from all the tasks (if needed)

While the output writing runs in parallel, the final commit step runs on a single machine
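The write-then-commit pattern can be sketched on a local file system. This is only an illustration of the idea, not a framework API: the functions `write_task_output` and `commit_job`, the `_temporary` directory name, and the `part-NNNNN` file names are assumptions chosen for the example.

```python
import os
import tempfile

def write_task_output(job_dir, task_id, records):
    """Write one task's records into its own subdirectory (parallel-safe)."""
    task_dir = os.path.join(job_dir, "_temporary", task_id)
    os.makedirs(task_dir)
    path = os.path.join(task_dir, "part")
    with open(path, "w") as f:
        for key, value in records:
            f.write(f"{key}\t{value}\n")
    return path

def commit_job(job_dir):
    """Final commit: runs once, in a single process, after all tasks finish."""
    temp_root = os.path.join(job_dir, "_temporary")
    for index, task_id in enumerate(sorted(os.listdir(temp_root))):
        src = os.path.join(temp_root, task_id, "part")
        # Move the task's file into the output directory under its final name.
        os.rename(src, os.path.join(job_dir, f"part-{index:05d}"))
        os.rmdir(os.path.join(temp_root, task_id))
    os.rmdir(temp_root)

job_dir = tempfile.mkdtemp()
write_task_output(job_dir, "task_0", [("a", 3)])   # could run on machine 1
write_task_output(job_dir, "task_1", [("b", 2)])   # could run on machine 2
commit_job(job_dir)
final_files = sorted(os.listdir(job_dir))
```

Keeping each task's output in a private subdirectory until the commit means a failed or repeated task never leaves partial files in the final output.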

SLIDE 7

MapReduce Examples

Input: A log file
Filter
Aggregation
Conversion
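The three example jobs can be sketched with a tiny in-memory driver. Everything here is hypothetical: the slide does not give a log format, so the sample lines, the `run_job` helper, and the map/reduce functions are assumptions made for illustration. Filter and conversion are written as map-only jobs, while aggregation needs a reducer.

```python
from collections import defaultdict

def run_job(lines, map_fn, reduce_fn=None):
    """Run a toy MapReduce job over a list of input lines."""
    mapped = [pair for line in lines for pair in map_fn(line)]
    if reduce_fn is None:
        return mapped  # map-only job: no shuffle, no reduce
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]

# Hypothetical log format: client address, method, path, status code.
log = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 GET /about.html 200",
]

def filter_map(line):
    # Filter: keep only requests that ended with status 404.
    return [(line, None)] if line.endswith(" 404") else []

def count_map(line):
    # Aggregation, map side: emit (client address, 1).
    return [(line.split()[0], 1)]

def count_reduce(key, values):
    # Aggregation, reduce side: count requests per client address.
    return (key, sum(values))

def convert_map(line):
    # Conversion: rewrite the space-separated line as comma-separated.
    return [(",".join(line.split()), None)]

errors = run_job(log, filter_map)
hits = run_job(log, count_map, count_reduce)
csv_lines = run_job(log, convert_map)
```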

SLIDE 8

Advanced Issues

Map failures
Reduce failures
Straggler problem
Custom keys and values
Efficient sorting on serialized data
Pipeline MapReduce jobs
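As one illustration of "efficient sorting on serialized data", the sketch below sorts keys by comparing their serialized bytes directly, without deserializing them first. It assumes unsigned 32-bit integer keys and a big-endian encoding, chosen here because byte-wise order then matches numeric order; the function names are hypothetical and the slide does not prescribe this encoding.

```python
import struct

def serialize(key):
    # Big-endian unsigned 32-bit: lexicographic byte order equals numeric order.
    return struct.pack(">I", key)

def deserialize(raw):
    return struct.unpack(">I", raw)[0]

keys = [300, 2, 70000, 255, 256]
raw_sorted = sorted(serialize(k) for k in keys)   # compares raw bytes only
decoded = [deserialize(r) for r in raw_sorted]
# decoded == [2, 255, 256, 300, 70000]
```

The same trick does not work with a little-endian encoding, because the least significant byte would be compared first.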
