

SLIDE 1

Hadoop Distributed File System (HDFS)

SLIDE 2

HDFS Overview

  • A distributed file system
  • Built on the architecture of the Google File System (GFS)
  • Shares a similar architecture to many other common distributed storage engines such as Amazon S3 and Microsoft Azure
  • HDFS is a stand-alone storage engine and can be used in isolation from the query processing engine
  • Even if you do not use Hadoop MapReduce, you will probably still use HDFS

SLIDE 3

HDFS Topics

  • HDFS design and architecture
  • Fault tolerance in HDFS
  • Create (Write) a file
  • Stream reading
  • Structured reading
  • Java API
  • Command-line interface (HDFS Shell)

SLIDE 4

HDFS Architecture

[Diagram: a single name node and several data nodes; each data node stores a set of blocks (B).]

SLIDE 5

What is where?

  • Name node: file and directory names; block ordering and locations; capacity of the data nodes; architecture of the data nodes
  • Data nodes: block data; the name node's location

SLIDE 6

Data Loading

[Diagram: a 600 MB input file is split into HDFS blocks of 128 MB + 128 MB + 128 MB + 128 MB + 88 MB.]

The most common replication factor is three.
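With three-way replication, the 600 MB file above therefore consumes 3 × 600 MB = 1,800 MB of raw disk: each of its five blocks is stored on three different data nodes.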

SLIDE 7

HDFS Storage

[Diagram: blocks (B) spread across the data nodes.]

SLIDE 8

Analogy to Unix FS

The logical view is similar.

[Diagram: a directory tree rooted at /, with directories such as user/mary, user/chu, and etc/hadoop.]

SLIDE 9

Analogy to Unix FS

The physical model is comparable.

  • Unix: File1 → a list of inodes → Block 1, Block 2, Block 3, …
  • HDFS: File1 → metadata at the name node with a list of block locations → blocks (B) on the data nodes

SLIDE 10

Fault Tolerance in HDFS

SLIDE 11

Replication

  • The default fault tolerance mechanism in HDFS is replication
  • The most common replication factor is three
  • If one or two nodes are temporarily unavailable, the data is still accessible
  • If one or two nodes permanently fail, the master node replicates the under-replicated blocks to reach the desired replication factor
  • Drawback: reduced disk capacity

SLIDE 12

Erasure Coding

  • Uses advanced algorithms for recovery, e.g., Reed-Solomon, XOR

[Diagram: a data block and coded fragments f1, f2.]

SLIDE 13

Erasure Coding

  • Uses advanced algorithms for recovery, e.g., Reed-Solomon, XOR

[Diagram, continued: fragment f3.]

SLIDE 14

Overhead

  • Three-way replication
§ Overhead = 2/1 = 200%
  • Erasure coding
§ If we use a 5+2 scheme, as in the previous example
§ Overhead = 2/5 = 40%
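To make the recovery idea concrete, here is a minimal XOR-parity sketch in Java. This is an illustration only, not Hadoop's actual erasure-coding codec, and all names are hypothetical; a single parity fragment survives the loss of any one data fragment, while schemes like the 5+2 example add more parity to survive more losses.

public class XorParitySketch {
  // XOR two equal-length fragments byte by byte.
  static byte[] xor(byte[] a, byte[] b) {
    byte[] out = new byte[a.length];
    for (int i = 0; i < a.length; i++) out[i] = (byte) (a[i] ^ b[i]);
    return out;
  }

  public static void main(String[] args) {
    byte[] d1 = {1, 2, 3, 4};   // data fragment 1
    byte[] d2 = {5, 6, 7, 8};   // data fragment 2
    byte[] p = xor(d1, d2);     // parity fragment, stored separately

    // If d1 is lost, XOR-ing the survivors reconstructs it.
    byte[] recovered = xor(p, d2);
    System.out.println(java.util.Arrays.equals(recovered, d1)); // prints true
  }
}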

SLIDE 15

Writing to HDFS

SLIDE 16

HDFS Create

[Diagram: the file creator, the name node, and data nodes 1, 2, 3.]

SLIDE 17

HDFS Create

The creator process calls the create function, which translates to an RPC call at the name node.

[Diagram: file creator sends Create(…) to the name node; data nodes 1, 2, 3.]

SLIDE 18

HDFS Create

The master node creates one initial block (B1) with three replicas (r1, r2, r3):

  • 1. The first replica is assigned to a random machine
  • 2. The second replica is assigned to another random machine in a different rack
  • 3. The third replica is assigned to a random machine on the same rack as the second machine

SLIDE 19

Physical Cluster Layout

[Diagram: Rack #1 holds Node #1, Node #2, Node #3; Rack #2 holds Node #32, Node #33, Node #34.]

SLIDE 20

HDFS Create

[Diagram: the name node returns OutputStream(r1) to the file creator; block B1 has replicas r1, r2, r3.]

SLIDE 21–23

HDFS Create

[Diagram, shown in three steps: the creator calls OutputStream#write; the written bytes flow to replicas r1, r2, and r3 of block B1 on data nodes 1, 2, 3.]

SLIDE 24

HDFS Create

When a block is filled up, the creator contacts the name node to create the next block.

[Diagram: OutputStream#write → B2.r1; the file now has blocks B1 and B2, each with replicas r1, r2, r3.]

SLIDE 25

Notes about writing to HDFS

  • Data transfers of replicas are pipelined
  • The data does not go through the name node
  • Random writing is not supported
  • Appending to a file is supported, but it creates a new block (a minimal write sketch follows this list)
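As a concrete illustration of the create/write path above, here is a minimal Java sketch. It assumes a configured HDFS client (cluster settings on the classpath); the file path is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();       // loads the cluster settings
    FileSystem fs = FileSystem.get(conf);           // the default file system
    Path path = new Path("/user/demo/example.txt"); // hypothetical path

    // create(...) is the RPC to the name node shown in the slides; the
    // returned stream pipelines the bytes to the replicas without passing
    // through the name node.
    try (FSDataOutputStream out = fs.create(path)) {
      out.write("hello hdfs\n".getBytes("UTF-8"));
    } // close() completes the last block
  }
}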

SLIDE 26

Self-writing

If the file creator is running on one of the data nodes, the first replica is always assigned to that node. The second and third replicas are assigned as before, i.e., the second replica on a different rack and the third replica on the same rack as the second one.

[Diagram: file creator running on a data node; name node and remaining data nodes.]

SLIDE 27

Stream reading from HDFS

SLIDE 28

Reading from HDFS

  • Reading is relatively easy
  • No replication is needed
  • Replication can be exploited
  • Random reading is allowed

SLIDE 29

HDFS Read

The reader process calls the open(…) function, which translates to an RPC call at the name node.

[Diagram: file reader, name node, data nodes.]

SLIDE 30

HDFS Read

The name node locates the first block of the file and returns the address of one of the nodes that store that block. The name node returns an input stream for the file.

[Diagram: the name node returns an InputStream to the file reader.]

SLIDE 31

HDFS Read

[Diagram: the reader calls InputStream#read(…); the data comes from the data nodes.]

SLIDE 32

HDFS Read

When an end-of-block is reached, the name node locates the next block.

[Diagram: the name node returns the next block to the reader.]

SLIDE 33

HDFS Read

The InputStream#seek operation locates a block and positions the stream accordingly.

[Diagram: the reader calls seek(pos).]

SLIDE 34

Self-reading

On open and seek, the replica to read is chosen as follows:

  • 1. If the block is locally stored on the reader's machine, this replica is chosen for reading
  • 2. If not, a replica on another machine in the same rack is chosen
  • 3. Otherwise, any other random replica is chosen

When self-reading occurs, HDFS can make it much faster through a feature called short-circuit.

SLIDE 35

Notes About Reading

  • The API is much richer than the simple open/seek/close API (see the sketch after this list)
§ You can retrieve block locations
§ You can choose a specific replica to read
  • The same API is generalized to other file systems, including the local FS and S3
  • Review question: compare random-access reading in local file systems to HDFS
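A minimal read sketch against this API, assuming the hypothetical file written earlier; seek() provides the random access mentioned above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/demo/example.txt"); // hypothetical path

    try (FSDataInputStream in = fs.open(path)) {    // RPC to the name node
      in.seek(6);                                   // random access: skip "hello "
      byte[] buf = new byte[4];
      in.readFully(buf);                            // served by one of the replicas
      System.out.println(new String(buf, "UTF-8")); // prints "hdfs"
    }
  }
}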

SLIDE 36

Special Features in HDFS

SLIDE 37

HDFS Special Features

  • Node decommission
  • Load balancer
  • Cheap concatenation

SLIDE 38

Node Decommission

[Diagram: the blocks of a decommissioned data node are re-replicated onto the remaining data nodes.]

SLIDE 39

Load Balancing

[Diagram: blocks unevenly distributed across the data nodes.]

SLIDE 40

Load Balancing

Start the load balancer.

[Diagram: after the load balancer runs, blocks are spread evenly across the data nodes.]

SLIDE 41

Cheap Concatenation

Concatenate File 1 + File 2 + File 3 → File 4

Rather than creating new blocks, HDFS can just change the metadata in the name node to delete File 1, File 2, and File 3, and assign their blocks to a new File 4 in the right order.

SLIDE 42

HDFS Shell Command-line Interface (CLI)

SLIDE 43

HDFS Shell

  • Used for common operations
  • Its usage is similar to Unix shell commands
  • Basic operations include
§ ls, cp, mv, mkdir, rm, …
  • HDFS-specific operations include
§ copyToLocal, copyFromLocal, setrep, appendToFile, …

SLIDE 44

HDFS Shell

  • General format: hdfs dfs -<cmd> <arguments>
  • So, instead of: mkdir -p myproject/mydir
  • You will write: hdfs dfs -mkdir -p myproject/mydir
  • A list of shell commands with usage (a few examples follow):
§ https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
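A few illustrative invocations; the paths and file names are hypothetical, while the commands themselves come from the FileSystemShell documentation linked above:

hdfs dfs -ls /user
hdfs dfs -mkdir -p myproject/mydir
hdfs dfs -copyFromLocal data.csv myproject/mydir/
hdfs dfs -setrep 2 myproject/mydir/data.csv
hdfs dfs -copyToLocal myproject/mydir/data.csv ./data-copy.csv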

SLIDE 45

HDFS API

[Class diagram: the abstract FileSystem class with subclasses DistributedFileSystem, LocalFileSystem, and S3FileSystem; helper classes Path and Configuration.]

SLIDE 46

HDFS API Classes

  • Configuration: holds system configuration, such as where the master node is running, and default system parameters, e.g., replication factor and block size
  • Path: stores a path to a file or directory
  • FileSystem: an abstract class for file system operations

SLIDE 47

Fully Qualified Path

hdfs://masternode:9000/path/to/file

  • hdfs: the file system scheme; other possible values are file, ftp, s3, …
  • masternode: the name or IP address of the node that hosts the master of the file system
  • 9000: the port on which the master node is listening
  • /path/to/file: the absolute path of the file

SLIDE 48

Shorter Path Forms

  • file: relative path to the current working directory in the default file system
  • /path/to/file: absolute path to a file in the default* file system (as configured)
  • hdfs://path/to/file: use the default* values for the master node and port
  • hdfs://masternode/path/to/file: use the given masternode name or IP and the default* port

*All the defaults are in the Configuration object.

SLIDE 49

HDFS API (Java)

Create the file system:

Configuration conf = new Configuration();
Path path = new Path("…");
FileSystem fs = path.getFileSystem(conf);

// To get the local FS
fs = FileSystem.getLocal(conf);

// To get the default FS
fs = FileSystem.get(conf);

SLIDE 50

HDFS API

Create a new file:

FSDataOutputStream out = fs.create(path, …);

Delete a file:

fs.delete(path, recursive);
fs.deleteOnExit(path);

Rename a file:

fs.rename(oldPath, newPath);

SLIDE 51

HDFS API

Open a file:

FSDataInputStream in = fs.open(path, …);

Seek to a different location:

in.seek(pos);
in.seekToNewSource(pos);

SLIDE 52

HDFS API

Concatenate:

fs.concat(destination, src[]);

Get file metadata:

fs.getFileStatus(path);

Get block locations:

fs.getFileBlockLocations(path, from, to);
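For instance, a minimal sketch that prints where each block of a (hypothetical) file lives, combining getFileStatus and getFileBlockLocations from above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/user/demo/example.txt");  // hypothetical path

    FileStatus status = fs.getFileStatus(path);      // file metadata
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {                 // one entry per block
      System.out.println(b.getOffset() + "+" + b.getLength()
          + " on " + String.join(", ", b.getHosts()));
    }
  }
}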

SLIDE 53

Structured Reading

SLIDE 54

Data Files

  • In distributed big data processing, input files contain records, not just raw bytes
  • We need a way to read records from files in HDFS
  • For efficiency, we should split the file and read it in parallel

SLIDE 55

File Splitting

  • How to split the file?
§ By record: for fixed-size records
§ By size: for variable-size records
  • Consideration: splitting the file should be fast
  • We should not need to read the entire file to split it

[Diagram: an input file divided into splits.]

SLIDE 56

Default File Splitting

  • A split is created for each block
  • Advantages
§ Data locality
§ Efficiency
  • If the file is too small, a single block might be further split

[Diagram: an input file divided into Block 1, Block 2, Block 3, Block 4.]

SLIDE 57

Read data in one split

  • Read all the records that are inside the split
  • How to deal with records that overlap two splits?
  • In each split, we should read the records that start in that split

[Diagram: Blocks 1–4 of the input file, with a record overlapping a block boundary.]

SLIDE 58

Read data in every split

  • Which records will be read for each of the four splits?

[Diagram: an input file of ten records (record1 … record10) laid out over Split 1, Split 2, Split 3, and Split 4.]

SLIDE 59

Reading process

  • Split the file based on the file metadata
§ File size, block sizes, # of nodes
  • Each split is defined by:
§ File name, start offset, length
  • For each split (see the sketch after this list):
§ Seek to the start offset
§ Skip the first record (except for the first split)
§ Read until the beginning of the record goes beyond start + length
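A hedged Java sketch of this per-split loop for newline-delimited records; Hadoop's own record readers follow the same rule, but the path and split bounds here are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/user/demo/records.txt");  // hypothetical file
    long start = 128L * 1024 * 1024;                 // this split's start offset
    long length = 128L * 1024 * 1024;                // this split's length

    try (FSDataInputStream in = fs.open(path)) {
      in.seek(start);                                // jump to the split
      if (start != 0) readRecord(in);  // skip the record begun in the previous split
      // Read while the record *starts* inside [start, start + length);
      // the last record may extend past the split boundary.
      while (in.getPos() < start + length) {
        String record = readRecord(in);
        if (record == null) break;     // end of file
        System.out.println(record);    // stand-in for real record processing
      }
    }
  }

  // Reads one newline-delimited record; returns null at end of file.
  static String readRecord(FSDataInputStream in) throws Exception {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') sb.append((char) c);
    return (c == -1 && sb.length() == 0) ? null : sb.toString();
  }
}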

SLIDE 60

Conclusion

  • HDFS is a general-purpose distributed file system
  • It provides a write-once, read-many access interface
  • Supports random reading, which can be used for stream reading and structured reading
  • Provides a Unix-like shell
  • Provides a Java API for programming