Presented by Haoran Ma, Yifan Qiao
The Hadoop Distributed File System
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA
store them on a cluster of commodity hardware.
throughput of data access.
HDFS is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications.
can move to where data resides
Usually 128 MB
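To illustrate how a file maps onto fixed-size blocks, the sketch below computes the (offset, length) layout of a file, assuming the usual 128 MB block size. The function name `split_into_blocks` is hypothetical and not part of HDFS.

```python
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for each block of a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be shorter
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file needs three blocks: 128 MB, 128 MB and a final 44 MB block.
layout = split_into_blocks(300 * 1024 * 1024)
print(len(layout), [length // (1024 * 1024) for _, length in layout])  # 3 [128, 128, 44]
```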
Source: HDFS Tutorial – A Complete Hadoop HDFS Overview. DATAFLAIR TEAM.
Source: Hadoop HDFS Architecture Explanation and Assumptions. DATAFLAIR TEAM.
blocks, replicas and other details in memory
assigns tasks to them
Store Application Data
Source: Hadoop HDFS Architecture Explanation and Assumptions. DATAFLAIR TEAM.
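A toy model of the metadata the NameNode keeps in memory could look like the following: a namespace mapping each file path to its ordered list of block IDs, plus a block map from each block ID to the DataNodes holding a replica. The class and method names are hypothetical; the real NameNode uses inodes and much more compact structures.

```python
from collections import defaultdict
from typing import Dict, List, Set

class ToyNameNode:
    """Hypothetical in-memory metadata: only names and locations, never file data."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[int]] = {}               # file path -> ordered block IDs
        self.block_map: Dict[int, Set[str]] = defaultdict(set)  # block ID -> DataNodes with a replica
        self._next_block_id = 0

    def create(self, path: str) -> None:
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = []

    def add_block(self, path: str) -> int:
        """Allocate a new, unique block ID and append it to the file."""
        block_id = self._next_block_id
        self._next_block_id += 1
        self.namespace[path].append(block_id)
        return block_id

    def block_received(self, block_id: int, datanode: str) -> None:
        """Record a replica reported by a DataNode (e.g. via a block report)."""
        self.block_map[block_id].add(datanode)

    def get_block_locations(self, path: str) -> List[Set[str]]:
        return [self.block_map[b] for b in self.namespace[path]]

nn = ToyNameNode()
nn.create("/logs/day1")
b = nn.add_block("/logs/day1")
for dn in ("dn1", "dn2", "dn3"):
    nn.block_received(b, dn)
print(nn.get_block_locations("/logs/day1"))
```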
HDFS Client: a code library that exports the HDFS file system interface
tolerance?
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
Block Report Heartbeat
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
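DataNodes send periodic heartbeats to the NameNode. In the paper the default heartbeat interval is three seconds, and a DataNode that has not been heard from in ten minutes is considered out of service, so its replicas are treated as unavailable. A minimal sketch of that liveness check, with hypothetical names and timestamps passed in explicitly so the logic is easy to test:

```python
from typing import Dict, List

DEAD_NODE_TIMEOUT_S = 10 * 60  # silence after which a DataNode is considered out of service

class HeartbeatMonitor:
    """Hypothetical NameNode-side tracker of the last heartbeat from each DataNode."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float) -> None:
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> List[str]:
        return sorted(dn for dn, t in self.last_seen.items()
                      if now - t > DEAD_NODE_TIMEOUT_S)

mon = HeartbeatMonitor()
mon.heartbeat("dn1", now=0.0)
mon.heartbeat("dn2", now=0.0)
mon.heartbeat("dn1", now=590.0)   # dn1 keeps reporting; dn2 goes silent
print(mon.dead_nodes(now=601.0))  # ['dn2']
```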
What if NameNode fails?
Image = Checkpoint + Journal
the organization of application data as directories and files.
written to disk.
also stored in the local host’s native file system.
journal to create a new checkpoint and an empty journal.
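The relation Image = Checkpoint + Journal can be illustrated by replaying a journal of namespace operations on top of a checkpoint. The sketch assumes a toy namespace (a dict from path to block IDs) and hypothetical `(op, args)` journal records. It shows what a CheckpointNode does: combine the existing checkpoint and journal into a new checkpoint and start over with an empty journal. The same replay is how the NameNode rebuilds its image at startup.

```python
import copy
from typing import Dict, List, Tuple

Namespace = Dict[str, List[int]]   # toy image: path -> block IDs
JournalRecord = Tuple[str, tuple]  # (operation, arguments)

def apply(ns: Namespace, record: JournalRecord) -> None:
    """Apply one journaled namespace change to the image."""
    op, args = record
    if op == "create":
        (path,) = args
        ns[path] = []
    elif op == "add_block":
        path, block_id = args
        ns[path].append(block_id)
    elif op == "rename":
        src, dst = args
        ns[dst] = ns.pop(src)
    elif op == "delete":
        (path,) = args
        del ns[path]
    else:
        raise ValueError(f"unknown journal op: {op}")

def checkpoint(old: Namespace, journal: List[JournalRecord]) -> Tuple[Namespace, List[JournalRecord]]:
    """Return (new checkpoint, empty journal); the old checkpoint is left untouched."""
    image = copy.deepcopy(old)
    for record in journal:
        apply(image, record)
    return image, []

ckpt = {"/a": [1]}
journal = [("create", ("/b",)), ("add_block", ("/b", 2)), ("rename", ("/a", "/c"))]
new_ckpt, new_journal = checkpoint(ckpt, journal)
print(new_ckpt, new_journal)  # {'/b': [2], '/c': [1]} []
```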
system namespace that is always synchronized with the state of the NameNode.
stored in the system during upgrades.
system (both data and metadata).
Copy on Write
Figure: NameNode, BackupNode, CheckpointNode and DataNode (example): image in memory; checkpoint and journal on disk; BackupNode synchronizes; CheckpointNode combines and returns a new checkpoint and an empty journal; DataNode snapshot (only hard links).
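On the DataNode side, a snapshot does not copy block data. Instead, each DataNode creates a copy of its storage directory and hard-links the existing block files into it; removing a block later removes only one hard link, and modifications during appends use copy-on-write (not modeled here). A small sketch of the hard-link idea with `os.link` in a temporary directory; the directory names are made up, and the file system must support hard links.

```python
import os
import tempfile

def snapshot_storage_dir(current: str, snapshot: str) -> None:
    """Create `snapshot` and hard-link every block file of `current` into it
    (assumes a flat directory of block files)."""
    os.makedirs(snapshot)
    for name in os.listdir(current):
        os.link(os.path.join(current, name), os.path.join(snapshot, name))

def demo() -> bytes:
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current")
        snap = os.path.join(root, "snapshot")
        os.makedirs(current)
        block_path = os.path.join(current, "blk_1")
        with open(block_path, "wb") as f:
            f.write(b"block data")

        snapshot_storage_dir(current, snap)
        # Both names now refer to the same file: no block bytes were copied.
        assert os.stat(block_path).st_nlink == 2

        # Deleting the block afterwards removes only one hard link,
        # so the data is still reachable through the snapshot.
        os.remove(block_path)
        with open(os.path.join(snap, "blk_1"), "rb") as f:
            return f.read()

data = demo()
print(data)  # b'block data'
```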
(1) addBlock, (2) unique block IDs, (3) write to block
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
NameNode to get a lease and destination DataNodes
DataNodes in a pipelined fashion
after finishing the previous block
The visibility of the modification is not guaranteed!
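The pipelined write can be modeled as a chain: the client splits the block into packets and sends each packet only to the first DataNode, which stores it and forwards it to the next DataNode; acknowledgements travel back in reverse order. The sketch simulates this in-process. `DataNodeStub` and the 64 KB packet size are assumptions, and unlike real HDFS, which can push the next packet before earlier ones are acknowledged, this toy waits for each acknowledgement.

```python
from typing import List, Optional

PACKET_SIZE = 64 * 1024  # assumed packet size

class DataNodeStub:
    """Hypothetical DataNode that stores a packet and forwards it downstream."""

    def __init__(self, name: str, downstream: Optional["DataNodeStub"] = None) -> None:
        self.name = name
        self.downstream = downstream
        self.stored = bytearray()

    def receive(self, packet: bytes) -> List[str]:
        self.stored.extend(packet)  # write the packet locally
        acks = self.downstream.receive(packet) if self.downstream else []
        return acks + [self.name]   # acknowledgement flows back upstream

def write_block(data: bytes, pipeline: List[DataNodeStub]) -> int:
    """Client side: link the pipeline, push packets to the first DataNode only,
    and count packets acknowledged by every DataNode."""
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        upstream.downstream = downstream
    fully_acked = 0
    for off in range(0, len(data), PACKET_SIZE):
        acks = pipeline[0].receive(data[off:off + PACKET_SIZE])
        if len(acks) == len(pipeline):
            fully_acked += 1
    return fully_acked

nodes = [DataNodeStub("dn1"), DataNodeStub("dn2"), DataNodeStub("dn3")]
payload = bytes(200 * 1024)
acked = write_block(payload, nodes)
print(acked)                                           # 4
print(all(bytes(n.stored) == payload for n in nodes))  # True
```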
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
list of blocks and their replicas' locations
nearest replica, and so on
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
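For reads, the NameNode returns each block's replica locations; the client tries the nearest replica first and falls back to the next one if the read fails. The sketch assumes a simple topology distance (0 for the same node, 2 for the same rack, 4 for a different rack), and `flaky_read` is a hypothetical stand-in for the actual DataNode read.

```python
from typing import Callable, Dict, List

def distance(a: str, b: str, rack_of: Dict[str, str]) -> int:
    """Assumed topology distance between two hosts."""
    if a == b:
        return 0
    return 2 if rack_of[a] == rack_of[b] else 4

def read_block(client: str, replicas: List[str], rack_of: Dict[str, str],
               read_replica: Callable[[str], bytes]) -> bytes:
    """Try replicas from nearest to farthest; return the first successful read."""
    errors = []
    for dn in sorted(replicas, key=lambda dn: distance(client, dn, rack_of)):
        try:
            return read_replica(dn)
        except OSError as exc:  # e.g. DataNode unreachable or corrupt replica
            errors.append(f"{dn}: {exc}")
    raise OSError("all replicas failed: " + "; ".join(errors))

racks = {"c1": "r1", "dn1": "r1", "dn2": "r2", "dn3": "r2"}
tried: List[str] = []

def flaky_read(dn: str) -> bytes:
    tried.append(dn)
    if dn == "dn1":
        raise OSError("connection refused")
    return b"data from " + dn.encode()

data = read_block("c1", ["dn2", "dn1", "dn3"], racks, flaky_read)
print(data, tried)  # b'data from dn2' ['dn1', 'dn2']
```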
block
much as possible
replica of any block
the same block, provided there are sufficient racks on the cluster
Source: Understanding Hadoop Clusters and the Network. Brad Hedlund.
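The default policy described in the paper places the first replica on the writer's node and the second and third replicas on two different nodes in a different rack, which satisfies both rules above (no node holds more than one replica of a block, no rack holds more than two). A simplified sketch, assuming the writer runs on a DataNode and some other rack has at least two nodes; real HDFS also considers node load and free space.

```python
import random
from typing import Dict, List, Optional

def place_replicas(writer: str, rack_of: Dict[str, str],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Choose 3 DataNodes: the writer's node, then two distinct nodes on one other rack."""
    rng = rng or random.Random()
    local_rack = rack_of[writer]
    remote_racks = sorted({r for r in rack_of.values() if r != local_rack})
    # Only racks that can host two distinct replicas are eligible.
    candidates = [r for r in remote_racks
                  if sum(1 for n in rack_of if rack_of[n] == r) >= 2]
    if not candidates:
        raise ValueError("need another rack with at least two nodes")
    remote = rng.choice(candidates)
    remote_nodes = sorted(n for n in rack_of if rack_of[n] == remote)
    second, third = rng.sample(remote_nodes, 2)
    return [writer, second, third]

topology = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "b2": "rackB", "b3": "rackB"}
targets = place_replicas("a1", topology, random.Random(0))
print(targets)
```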
typical configuration:
applications when replicating blocks 3 times
(Naive estimation for a node failure probability during a year is ~9.2%)
lose some blocks
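The ~9.2% figure can be reproduced by compounding a monthly failure rate over twelve months. The sketch assumes a monthly per-node failure rate of about 0.8% and independent months.

```python
MONTHLY_FAILURE_RATE = 0.008  # assumption: ~0.8% of nodes fail in a given month

def annual_failure_probability(p_month: float, months: int = 12) -> float:
    """Probability that a node fails at least once over `months` independent months."""
    return 1 - (1 - p_month) ** months

p_year = annual_failure_probability(MONTHLY_FAILURE_RATE)
print(f"{p_year:.1%}")  # 9.2%
```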
Scenario | Read (MB/s per node) | Write (MB/s per node)
DFSIO | 66 | 40
7200 RPM desktop HDD [6] | < 130 (typical 50-120) | < 130 (typical 50-120)
Table 1: Contrived benchmark compared with typical HDD performance
Scenario | Read (MB/s per node) | Write (MB/s per node)
Busy cluster | 1.02 | 1.09
Table 2: HDFS performance in a production cluster
Bytes (TB) | Nodes | Maps | Reduces | Time (s) | HDFS I/O aggregate (GB/s) | HDFS I/O per node (MB/s)
1 | 1460 | 8000 | 2700 | 62 | 32 | 22.1
1000 | 3658 | 80000 | 20000 | 58500 | 34.2 | 9.35
Table 3: Sort benchmark
At 1000 TB the data are too large to fit in node memory, so intermediate results spill to disk and occupy disk bandwidth, which lowers the per-node HDFS I/O rate.
Operation | Throughput (ops/s)
Open file for read | 126,100
Create file | 5,600
Rename file | 8,300
Delete file | 20,700
DataNode heartbeat | 300,000
Block report (blocks/s) | 639,700
Table 4: NameNode throughput benchmark
involve modifying nodes, can become the bottleneck at large scale.
*: The title is from two great books: Six Easy Pieces: Essentials of Physics Explained by Its Most Brilliant Teacher, by Richard P. Feynman, and Operating Systems: Three Easy Pieces, by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau.
fails
parallel access
to schedule computation tasks to where the data reside
bandwidth
abstraction
[1] "The Hadoop Distributed File System". Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler.
[2] "Hadoop HDFS Architecture Explanation and Assumptions". DATAFLAIR TEAM. https://data-flair.training/blogs/hadoop-hdfs-architecture/
[3] "HDFS Tutorial – A Complete Hadoop HDFS Overview". DATAFLAIR TEAM. https://data-flair.training/blogs/hadoop-hdfs-tutorial/
[4] "HDFS Architecture". http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
[5] "Understanding Hadoop Clusters and the Network". Brad Hedlund. http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/
[6] "Speed Considerations". Seagate. https://web.archive.org/web/20110920075313/http://www.seagate.com/www/en-us/support/before_you_buy/speed_considerations