Introduction to Hadoop
Distributed Data Processing
The idea of distributed databases is older than you might think:
Richard Peebles, Eric G. Manning: A Computer Architecture for Large (Distributed) Data Bases. VLDB 1975: 405-427
[Figure labels: Big input data; A cluster of machines; Final]
Master node
  Name node
  Resource manager
Slave nodes (six in the figure), each running
  Data node
  Node manager
Input file (600 MB), split into HDFS blocks: 128 MB, 128 MB, 128 MB, 128 MB, 88 MB
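The block sizes on this slide follow from simple division, which a short sketch can reproduce. The function name `split_into_blocks` is hypothetical, and the 128 MB block size is taken from the slide; this only illustrates the arithmetic and does not call any HDFS API.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file of this size would occupy.

    Hypothetical helper for illustration only; 128 MB is the block size
    shown on the slide.
    """
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever data is left over.
        sizes.append(remainder)
    return sizes


print(split_into_blocks(600))  # [128, 128, 128, 128, 88]
```

Only the last block is smaller than the block size: four full blocks cover 512 MB, and the remaining 88 MB go into a fifth block.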
[Figure: fifteen boxes labeled B]
Input text file:
if you cannot fly, then run, if you cannot run, then walk, if you cannot walk, then crawl, but whatever you do you have to keep moving forward

Output:
you: 5
cannot: 3
walk: 2
if: 3
…

Map(line) {
  split line into words
  for each word w
    emit (w, 1)
}

Reduce(w, c[]) {
  s = Sum(c)
  emit (w, s)
}
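The Map and Reduce steps above can be tried out in a single process. The following is a minimal sketch, not the Hadoop API: `map_fn`, `reduce_fn`, and `word_count` are hypothetical names, the grouping of pairs by word (the shuffle) is simulated with a dictionary, and words are assumed to be runs of letters compared case-insensitively.

```python
import re
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    # Assumption: a word is a run of letters; punctuation is dropped.
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group the pairs by word (simulated shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_fn(line):
            groups[word].append(one)
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


text = ("if you cannot fly, then run, if you cannot run, then walk, "
        "if you cannot walk, then crawl, but whatever you do you have "
        "to keep moving forward")
counts = word_count([text])
print(counts["you"], counts["cannot"], counts["walk"], counts["if"])  # 5 3 2 3
```

In a real cluster the map calls run in parallel on different blocks of the input, and the framework performs the grouping step between the map and reduce phases.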
Standalone mode: one JRE instance
Pseudo-distributed mode: Name node, Resource manager, Data node, Node manager
Cluster mode: RM (resource manager), NN (name node), NM (node manager), DN (data node)