Hadoop Distributed File System (HDFS)
HDFS Overview
- A distributed file system
- Built on the architecture of Google File System (GFS)
- Shares a similar architecture to many other common distributed storage engines such as Amazon S3 and …
[Figure: a name node and a set of data nodes; each data node holds blocks (B).]
Name node stores:
- File and directory names
- Block ordering and locations
- Capacity of data nodes
- Architecture of data nodes
Data nodes store:
- Block data
- Name node location
[Figure: meta data kept by the name node. File1 maps to a list of iNodes (Block 1, Block 2, Block 3, …), and File1 maps to a list of block locations on the data nodes.]
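The two mappings in the figure (file to ordered blocks, block to locations) can be sketched in plain Java. This is a toy in-memory model under stated assumptions, not the Hadoop implementation: `ToyNameNode`, `addBlock`, `blocksOf`, `locationsOf`, and the data node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of name node metadata. Hypothetical names, not Hadoop classes.
class ToyNameNode {
    // File name -> ordered list of block IDs.
    private final Map<String, List<Long>> fileToBlocks = new LinkedHashMap<>();
    // Block ID -> names of the data nodes that hold a copy of the block.
    private final Map<Long, List<String>> blockToLocations = new LinkedHashMap<>();
    private long nextBlockId = 1;

    // Appends a new block to the file and records where its copies live.
    long addBlock(String file, List<String> dataNodes) {
        long id = nextBlockId++;
        fileToBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(id);
        blockToLocations.put(id, new ArrayList<>(dataNodes));
        return id;
    }

    // Blocks of a file, in file order (empty if the file is unknown).
    List<Long> blocksOf(String file) {
        return fileToBlocks.getOrDefault(file, Collections.<Long>emptyList());
    }

    // Data nodes holding the given block (empty if the block is unknown).
    List<String> locationsOf(long blockId) {
        return blockToLocations.getOrDefault(blockId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.addBlock("File1", Arrays.asList("dn1", "dn2", "dn5"));
        nn.addBlock("File1", Arrays.asList("dn3", "dn4", "dn7"));
        for (long block : nn.blocksOf("File1")) {
            System.out.println("Block " + block + " is stored on " + nn.locationsOf(block));
        }
    }
}
```

Note that the block contents never appear in this structure; only names, ordering, and locations do, which is why the name node can stay small relative to the data it describes.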
[Figure: a file creator, the name node, and the data nodes; the creator calls Create(…).]
The creator process calls the create function, which translates to an RPC call at the name node.
The master node (the name node) creates three initial blocks:
1. The first on a machine
2. The second on a random machine in the same rack as the first machine
3. The third on a machine in another rack
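The three placement rules on this slide can be sketched as a small selection routine. This is only an illustration of the rules as the slide states them; the placement policy in a given Hadoop release may differ. `ToyPlacement`, `chooseTargets`, and the rack and node names are hypothetical, and the sketch assumes at least two racks with at least two machines in the first rack chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Toy placement routine. Hypothetical names, not Hadoop classes.
class ToyPlacement {
    // racks maps a rack name to the data nodes in that rack.
    // Assumes at least two racks, each with at least two data nodes.
    static List<String> chooseTargets(Map<String, List<String>> racks, Random rnd) {
        List<String> rackNames = new ArrayList<>(racks.keySet());

        // 1. First copy: a machine in some rack.
        String firstRack = rackNames.get(rnd.nextInt(rackNames.size()));
        List<String> sameRack = new ArrayList<>(racks.get(firstRack));
        String first = sameRack.remove(rnd.nextInt(sameRack.size()));

        // 2. Second copy: a random other machine in the same rack as the first.
        String second = sameRack.get(rnd.nextInt(sameRack.size()));

        // 3. Third copy: a machine in another rack.
        rackNames.remove(firstRack);
        String otherRack = rackNames.get(rnd.nextInt(rackNames.size()));
        List<String> others = racks.get(otherRack);
        String third = others.get(rnd.nextInt(others.size()));

        return Arrays.asList(first, second, third);
    }

    public static void main(String[] args) {
        Map<String, List<String>> racks = new TreeMap<>();
        racks.put("rackA", Arrays.asList("a1", "a2", "a3"));
        racks.put("rackB", Arrays.asList("b1", "b2"));
        System.out.println(chooseTargets(racks, new Random()));
    }
}
```

Keeping two copies in one rack and one copy elsewhere trades some write traffic across racks for the ability to survive the loss of a whole rack.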
[Figure: the file creator holds an OutputStream and sends data to data nodes 1, 2, and 3 through OutputStream#write.]
When a block is filled up, the creator contacts the name node to create the next block.
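The block roll-over during writing can be sketched as follows. The sketch tracks only how many bytes land in each block and counts how often a new block is requested; the request itself stands in for the call to the name node. `ToyBlockWriter`, `nameNodeCalls`, and the tiny block size are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy writer that splits a byte stream into fixed-size blocks.
// Hypothetical names, not Hadoop classes.
class ToyBlockWriter {
    private final int blockSize;
    private final List<Integer> blockLengths = new ArrayList<>();
    private int nameNodeCalls = 0;

    ToyBlockWriter(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = blockSize;
    }

    void write(byte[] data) {
        int remaining = data.length;
        while (remaining > 0) {
            int last = blockLengths.size() - 1;
            if (last < 0 || blockLengths.get(last) == blockSize) {
                // No block yet, or the current block is full:
                // this is where the creator would ask the name node for the next block.
                blockLengths.add(0);
                nameNodeCalls++;
                last++;
            }
            int n = Math.min(remaining, blockSize - blockLengths.get(last));
            blockLengths.set(last, blockLengths.get(last) + n);
            remaining -= n;
        }
    }

    List<Integer> blockLengths() { return blockLengths; }

    int nameNodeCalls() { return nameNodeCalls; }

    public static void main(String[] args) {
        ToyBlockWriter writer = new ToyBlockWriter(4);
        writer.write(new byte[10]);
        System.out.println(writer.blockLengths()); // prints [4, 4, 2]
    }
}
```

Only block allocation involves the name node in this sketch; the bytes themselves would go to the data nodes.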
If the file creator is running on one of the data nodes, the first block is always assigned to that node.
[Figure: a file reader, the name node, and the data nodes.]
The reader process calls the open function, which translates to an RPC call at the name node.
The name node locates the first block.
The name node returns an input stream for the file.
The reader calls InputStream#read(…) to read data from the data nodes.
When an end-of-block is reached, the name node locates the next block.
The InputStream#seek(pos) operation locates a block and positions the stream accordingly.
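Locating a block for a seek reduces to integer arithmetic when every block except possibly the last has the same size. The sketch below makes that assumption explicit; `ToySeek` and `locate` are hypothetical names, and the 128 MB block size is only an example value.

```java
// Toy seek arithmetic. Hypothetical names, not Hadoop classes.
class ToySeek {
    // Returns {blockIndex, offsetInBlock} for an absolute file position,
    // assuming all blocks before the last one are exactly blockSize bytes.
    static long[] locate(long pos, long blockSize) {
        if (pos < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("pos must be >= 0 and blockSize > 0");
        }
        return new long[] { pos / blockSize, pos % blockSize };
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // example block size
        long pos = 300L * 1024 * 1024;       // seek to the 300 MB mark
        long[] where = locate(pos, blockSize);
        System.out.println("block index = " + where[0] + ", offset in block = " + where[1]);
    }
}
```

If blocks can have different lengths, the same lookup would need the per-block offsets instead of a single division.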
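On open and seek, the replica to read from is chosen as follows:
1. If the reader's own machine holds a replica, that replica is chosen to read
2. Otherwise, a replica on a machine in the same rack is chosen
3. Otherwise, a replica on another machine is chosen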
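One way to read the replica choice on this slide is as a proximity preference: a copy on the reader's own machine first, then a copy in the reader's rack, then any other copy. The sketch below encodes that reading; it is an assumption about the slide's intent, and `ToyReplicaChooser`, `Replica`, and `choose` are hypothetical names. For determinism the last case returns the first listed replica rather than a random one.

```java
import java.util.Arrays;
import java.util.List;

// Toy replica chooser. Hypothetical names, not Hadoop classes.
class ToyReplicaChooser {
    static final class Replica {
        final String host;
        final String rack;
        Replica(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    static Replica choose(String readerHost, String readerRack, List<Replica> replicas) {
        if (replicas.isEmpty()) throw new IllegalArgumentException("no replicas");
        Replica sameRack = null;
        for (Replica r : replicas) {
            // 1. A replica on the reader's own machine wins immediately.
            if (r.host.equals(readerHost)) return r;
            // 2. Remember the first replica found in the reader's rack.
            if (sameRack == null && r.rack.equals(readerRack)) sameRack = r;
        }
        if (sameRack != null) return sameRack;
        // 3. Otherwise fall back to some other replica (here: the first listed).
        return replicas.get(0);
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("h7", "rack2"),
                new Replica("h3", "rack1"),
                new Replica("h9", "rack3"));
        System.out.println(choose("h1", "rack1", replicas).host); // prints h3
    }
}
```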
When self-reading occurs (the reader runs on the machine that stores the block), HDFS can make it much faster through a feature called short-circuit local reads.
Concatenate File 1 + File 2 + File 3 ➔ File 4
Rather than creating new blocks, HDFS can just change the metadata in the name node to delete File 1, File 2, and File 3, and assign their blocks to a new File 4 in the right order.
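The metadata-only concatenation described above can be sketched with a map from file name to block list. No block data is touched; the source entries are removed and their block IDs are attached to the new file in order. `ToyConcat` and its methods are hypothetical names, and the sketch follows the slide's description rather than the exact semantics of any Hadoop release.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy metadata-only concatenation. Hypothetical names, not Hadoop classes.
class ToyConcat {
    // File name -> ordered list of block IDs.
    final Map<String, List<Long>> fileToBlocks = new HashMap<>();

    void concat(String target, String... sources) {
        // Validate first so a missing source does not leave a half-done change.
        for (String src : sources) {
            if (!fileToBlocks.containsKey(src)) {
                throw new IllegalArgumentException("No such file: " + src);
            }
        }
        List<Long> merged = new ArrayList<>();
        for (String src : sources) {
            // Delete the source entry and hand its blocks to the target, in order.
            merged.addAll(fileToBlocks.remove(src));
        }
        fileToBlocks.put(target, merged);
    }

    public static void main(String[] args) {
        ToyConcat nn = new ToyConcat();
        nn.fileToBlocks.put("File1", Arrays.asList(1L, 2L));
        nn.fileToBlocks.put("File2", Arrays.asList(3L));
        nn.fileToBlocks.put("File3", Arrays.asList(4L, 5L));
        nn.concat("File4", "File1", "File2", "File3");
        System.out.println(nn.fileToBlocks); // prints {File4=[1, 2, 3, 4, 5]}
    }
}
```

The cost of the operation depends on the number of blocks, not on the number of bytes stored in them.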
HDFS API classes:
- FileSystem, with implementations DistributedFileSystem, LocalFileSystem, and S3FileSystem
- Path
- Configuration
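import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
Path path = new Path("…");
// Get the file system that owns this path
FileSystem fs = path.getFileSystem(conf);
// To get the local FS
fs = FileSystem.getLocal(conf);
// To get the default FS
fs = FileSystem.get(conf);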
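// Create a file and get a stream to write to it
FSDataOutputStream out = fs.create(path, …);
// Delete a file now, or when the file system is closed
fs.delete(path, recursive);
fs.deleteOnExit(path);
// Rename a file
fs.rename(oldPath, newPath);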
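// Open a file for reading
FSDataInputStream in = fs.open(path, …);
// Move to a position in the file
in.seek(pos);
// Try to switch to a different copy of the data at that position
in.seekToNewSource(pos);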
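// Concatenate files; srcs is a Path[]
fs.concat(destination, srcs);
// Get the metadata of a file
fs.getFileStatus(path);
// Get the block locations for a byte range; the last argument is a length, not an end offset
fs.getFileBlockLocations(path, start, len);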