Hadoop Map Reduce
01/18/2018 1
MapReduce 2-in-1
A programming paradigm
A query execution engine
A kind of functional programming
We focus on the MapReduce execution engine of Hadoop through YARN
Overview
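To make the programming-paradigm side concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code: `map_fn`, `reduce_fn`, and `run_local` are hypothetical names, and the single-process runner only imitates what the execution engine does across a cluster. The point is that the user supplies two side-effect-free functions and the engine handles everything in between.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Sum all the counts that were emitted for the same word."""
    return [(word, sum(counts))]


def run_local(records, map_fn, reduce_fn):
    """Hypothetical single-process runner: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):  # visit keys in sorted order
        output.extend(reduce_fn(key, groups[key]))
    return output


lines = ["the quick brown fox", "the lazy dog", "The end"]
result = run_local(enumerate(lines), map_fn, reduce_fn)
print(result)
```

Because neither function touches shared state, the runner is free to apply them in any order or in parallel, which is the functional-programming property the engine relies on.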
[Figure: Developer, MR Program, Driver, MR Job, Master node, Slave nodes]
Driver
Job submission
Job preparation
Map
Shuffle
Reduce
Cleanup

Compatible Hadoop binaries
Cluster configuration files
Network access to the master node

Input and output paths
Map, reduce, and any other functions
Any additional user configuration
Key: String     Value: String
Input           hdfs://user/eldawy/README.txt
Output          hdfs://user/eldawy/wordcount
Mapper          edu.ucr.cs.cs226.eldawy.WordCount
Reducer         …
JAR File        …
User-defined    User-defined
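The configuration above is a flat map from string keys to string values, which is what makes it easy to ship to the master node. The sketch below illustrates that idea only. The key names (`input`, `output`, `mapper`, `user.defined.key`) and the helper functions are hypothetical, and JSON is a stand-in: Hadoop uses its own property names and its own serialization format.

```python
import json


def build_job_conf(input_path, output_path, mapper, extra=None):
    """Collect job settings as a flat string-to-string map (hypothetical keys)."""
    conf = {"input": input_path, "output": output_path, "mapper": mapper}
    for key, value in (extra or {}).items():
        conf[str(key)] = str(value)  # user-defined entries are stored as strings too
    return conf


def serialize(conf):
    """Encode the configuration as bytes; JSON is a stand-in format."""
    return json.dumps(conf, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Rebuild the configuration on the receiving side."""
    return json.loads(payload.decode("utf-8"))


conf = build_job_conf(
    "hdfs://user/eldawy/README.txt",
    "hdfs://user/eldawy/wordcount",
    "edu.ucr.cs.cs226.eldawy.WordCount",
    extra={"user.defined.key": 42},  # hypothetical user setting
)
payload = serialize(conf)
restored = deserialize(payload)
print(restored == conf)
```

Keeping every value a string means the receiver needs no knowledge of user-defined types; each component parses only the keys it understands.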
[Figure: Master node; Serialized over network]
[Figure: Configuration, JAR File, Master node]
HDFS → InputFormat#getSplits() → Split1, Split2, .., SplitM
Split1 → Mapper1, Split2 → Mapper2, .., SplitM → MapperM
FileInputSplit: Path, Start, End
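The splitting step can be sketched as follows. This is a simplified illustration, not Hadoop's implementation: `get_splits` and its `file_length` and `split_size` parameters are hypothetical, a fixed split size in bytes is assumed, and record boundaries and block locations are ignored. The tuple mirrors the Path, Start, End fields named above; Hadoop's own `FileSplit` class stores a start offset and a length instead of an end.

```python
from typing import List, NamedTuple


class FileInputSplit(NamedTuple):
    path: str
    start: int  # first byte of the split (inclusive)
    end: int    # one past the last byte of the split (exclusive)


def get_splits(path: str, file_length: int, split_size: int) -> List[FileInputSplit]:
    """Cut one file into consecutive byte ranges of at most split_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    start = 0
    while start < file_length:
        end = min(start + split_size, file_length)  # the last split may be shorter
        splits.append(FileInputSplit(path, start, end))
        start = end
    return splits


splits = get_splits("hdfs://user/eldawy/README.txt", file_length=300, split_size=128)
for mapper_id, split in enumerate(splits, start=1):
    # One mapper is created per split.
    print(f"Mapper{mapper_id} <- bytes [{split.start}, {split.end}) of {split.path}")
```

With these made-up sizes the file yields three splits and therefore three mappers, the last one covering only the remaining 44 bytes.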
Read the input
Apply the map function
Apply the combine function (if configured)
Store the map output
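The four steps of a map task can be sketched in plain Python as below. The names `run_map_task`, `tokenize`, and `sum_counts` are hypothetical, the whole split is assumed to fit in memory, and returning a list stands in for storing the map output.

```python
from collections import defaultdict


def tokenize(_offset, line):
    """Map function: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]


def sum_counts(word, counts):
    """Combine function: pre-aggregate counts locally, before any shuffle."""
    return [(word, sum(counts))]


def run_map_task(records, map_fn, combine_fn=None):
    """Run one map task over the records of a single split."""
    buffered = defaultdict(list)
    for key, value in records:                  # 1. read the input
        for out_key, out_value in map_fn(key, value):  # 2. apply the map function
            buffered[out_key].append(out_value)
    output = []
    for out_key in sorted(buffered):
        if combine_fn is not None:              # 3. apply the combine function, if configured
            output.extend(combine_fn(out_key, buffered[out_key]))
        else:
            output.extend((out_key, v) for v in buffered[out_key])
    return output                               # 4. stand-in for storing the map output


records = [(0, "a b a"), (1, "b a")]
plain = run_map_task(records, tokenize)
combined = run_map_task(records, tokenize, combine_fn=sum_counts)
print(len(plain), len(combined))
```

In this example the combiner shrinks the map output from five pairs to two, which is the reason to configure one: less data has to be stored and later moved to the reducers.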
[Figure: Master node; IS1, IS2, IS3, IS4, IS5, …, ISM]