BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement


SLIDE 1

BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement

Yang Yang, Qiang Cao, Hong Jiang, Li Yang, Jie Yao, Yuanyuan Dong, Puyuan Yang

Huazhong University of Science and Technology, University of Texas at Arlington, Alibaba Group

SLIDE 2

Outline

• Background
• BFO Design
• Evaluation
• Conclusion

SLIDE 3

Background

• Batch-file Operations
  • Accessing a batch of files
  • Many applications need batch-file operations
    • Backup applications
    • File-level data replication and archiving
    • Big data analytics systems
    • Social media and online shopping websites
  • Traditional access approaches access files one by one
    • Called the single-file access pattern
    • Inefficient for small files
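The single-file access pattern described above can be sketched as a per-file loop, where every file pays its own metadata lookup and data fetch. This is a minimal Python illustration (the demo file set and paths are hypothetical, not from the paper):

```python
import os
import tempfile

# Create a small demo file set (a stand-in for a real batch of files).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"file{i}.dat")
    with open(p, "wb") as f:
        f.write(b"x" * 4096)  # one 4KB "small file"
    paths.append(p)

def single_file_read(paths):
    """Traditional pattern: each file triggers its own metadata
    lookup (stat) and data fetch (open/read), one file at a time."""
    data = []
    for p in paths:
        os.stat(p)                 # metadata access
        with open(p, "rb") as f:
            data.append(f.read())  # data I/O
    return data

contents = single_file_read(paths)
print(len(contents), len(contents[0]))  # prints "5 4096"
```

For small files this loop alternates metadata and data I/O per file, which is exactly the interleaving the slides identify as inefficient.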

SLIDE 4

Background

• Small files in file systems
  • Desktop file system: more than 80% of accesses are to files smaller than 32KB.
  • Cloud and HPC cluster: 25%~40% of files are smaller than 4KB.
• Single-file access pattern for small files
  • Accessing metadata
  • Fetching file data, and so on
• I/O operations dominate batch-file access
  • Metadata access contributes 40% of the time for accessing a small file on disk.
  • Random data I/Os

SLIDE 5

Overall access performance

• Read performance

Setup:

  • File sets: 4GB data with different file sizes (i.e., from 4KB to 4MB)
  • Devices: HDD & SSD
  • Orders: Random & Sequential

[Charts: execution time (s) vs. file size (4KB–4MB, one file set per size). HDD random reads (HDD_R) range from 9704s at 4KB down to 37.1s at 4MB; HDD sequential reads (HDD_S) from 167.9s to 28.2s. SSD random reads (SSD_R) range from 227.8s to 10.4s; SSD sequential reads (SSD_S) from 87.3s to 8.5s.]

SLIDE 6

Overall access performance

• Read performance

Setup:

  • File sets: 4GB data with different file sizes (i.e., from 4KB to 4MB)
  • Devices: HDD & SSD
  • Orders: Random & Sequential

[Charts: same read-performance data as Slide 5.]

Large performance gap between the random and sequential orders, especially for small files: 57.8X on HDD and 2.6X on SSD at 4KB.

SLIDE 7

Overall access performance

• Read performance

Setup:

  • File sets: 4GB data with different file sizes (i.e., from 4KB to 4MB)
  • Devices: HDD & SSD
  • Orders: Random vs Sequential

[Charts: same read-performance data as Slide 5.]

Large performance gap among different file sizes.

SLIDE 8

Problem

• Write performance

[Charts: execution time (s) vs. file size (4KB–4MB). HDD random writes (HDD_R) range from 5138s at 4KB down to 56.1s at 4MB; HDD sequential writes (HDD_S) from 88.7s to 35.9s. SSD random writes (SSD_R) range from 92.4s to 12.4s; SSD sequential writes (SSD_S) from 58.8s to 11s.]

Setup:

  • File sets: 4GB data with different file sizes (i.e., from 4KB to 4MB)
  • Devices: HDD & SSD
  • Orders: Random vs Sequential

Observation: the single-file access approach is very inefficient

  • for small files (below 1MB);
  • in a random manner.
SLIDE 9

Related Works

• Application-level optimization (Fastcopy)
  • Multi-threading, large buffer
• Prefetching mechanism (Diskseen, ATC'07)
  • Depending on the future access behaviors
• Block-level I/O scheduler (split-level I/O scheduling, SOSP'15)
  • Serializing the file accesses
• Packing metadata and data together (CFFS, FAST'16)
  • Redesigning new file systems

SLIDE 10

Problem Analysis

• File Access behaviors
  • Reading a file set with three representative file systems

SLIDE 11

Problem Analysis

[Same content as Slide 10.]

SLIDE 12

Problem Analysis

• File Access behaviors
  • Reading a file set with three representative file systems
  • Writing a file set with three representative file systems

Insufficiency #1: The single-file access approach leads to back-and-forth seek operations between the metadata area and the data area, resulting in many non-sequential I/Os.

SLIDE 13

Problem Analysis

• File Access behaviors
• Data Access behaviors (excluding the metadata)

[Diagram: expected access order — the application's requests for files A–E against the on-disk block layout A C E D B.]

SLIDE 14

Problem Analysis

• File Access behaviors
• Data Access behaviors (excluding the metadata)

[Diagram: actual (alphabetic) access order A B C D E for files A–E against the on-disk block layout A C E D B, causing non-sequential block accesses.]

SLIDE 15

Problem Analysis

• File Access behaviors
• Data Access behaviors (excluding the metadata)

[Chart: logical block address (×10^6) over time (s), showing scattered, non-sequential accesses.]

Insufficiency #2: The single-file access approach is unaware of the underlying data layout, and may read these files in any order, also leading to random I/Os.

SLIDE 16

Outline

• Background
• BFO Design
  • BFOr
  • BFOw
• Evaluation
• Conclusion

SLIDE 17

BFOr

• Two-phase read
  • Objective: Separately read the metadata and file data of all accessed files in batches
  • Phase 1: scanning the inodes
  • Phase 2: fetching all files' data
• Layout-aware scheduler

[Figure: data-group organization for fetching file data (2MB and 128MB data groups shown).]
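The two-phase idea can be sketched as: phase 1 touches only metadata for the whole batch, phase 2 only data. Below is a minimal user-level Python sketch under that interpretation (the function name and demo files are assumptions; BFO itself works inside the file system, not through `stat`/`read` syscalls):

```python
import os
import tempfile

# Build a small demo file set (hypothetical stand-in paths).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"f{i}.dat")
    with open(p, "wb") as f:
        f.write(bytes([i]) * 1024)  # five 1KB files
    paths.append(p)

def two_phase_read(paths):
    # Phase 1: scan all inodes first, gathering metadata for the
    # whole batch instead of interleaving stat() with data reads.
    metas = [(p, os.stat(p).st_size) for p in paths]
    # Phase 2: fetch all files' data in one pass.
    out = {}
    for p, size in metas:
        with open(p, "rb") as f:
            out[p] = f.read(size)
    return out

data = two_phase_read(paths)
print(len(data), sum(len(v) for v in data.values()))  # prints "5 5120"
```

Separating the phases is what lets BFOr batch metadata I/O in one region and data I/O in another, instead of seeking between them per file.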

SLIDE 18

BFOr

• Two-phase read
• Layout-aware scheduler
  • Extracting the addresses from the inodes
  • Sorting the addresses of all files
  • Issuing read I/O in the order of the list

[Diagram: each Order_node records Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), and Num (4 bytes); the Order list maps onto disk blocks laid out in the order A C E D B.]
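The scheduler's three steps can be sketched as: build an Order_node per file extent, sort by on-disk start address, then issue I/O in sorted order. A minimal Python sketch (the `OrderNode` class and the example addresses are illustrative assumptions mirroring the slide's fields, not the paper's code):

```python
from dataclasses import dataclass

@dataclass
class OrderNode:
    """Mirrors the slide's Order_node fields: inode id (2 bytes),
    start-point (8 bytes), length (4 bytes), num (4 bytes)."""
    inode: int
    start: int   # starting block address extracted from the inode
    length: int  # extent length in bytes
    num: int     # position of this extent within the file

def build_order_list(nodes):
    # Sort all extents by on-disk start address, so reads are issued
    # in layout order rather than in the application's request order.
    return sorted(nodes, key=lambda n: n.start)

# Files A..E whose data happens to be laid out on disk as A C E D B
# (hypothetical addresses; File A matches the slide's example node).
nodes = [
    OrderNode(inode=1, start=3000, length=8192, num=0),  # File A
    OrderNode(inode=2, start=7000, length=4096, num=0),  # File B
    OrderNode(inode=3, start=4000, length=4096, num=0),  # File C
    OrderNode(inode=4, start=6000, length=4096, num=0),  # File D
    OrderNode(inode=5, start=5000, length=4096, num=0),  # File E
]
order = [n.inode for n in build_order_list(nodes)]
print(order)  # prints [1, 3, 5, 4, 2], i.e. A, C, E, D, B
```

Issuing reads in this sorted order turns the random I/O of Insufficiency #2 into a single sequential sweep over the data area.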

SLIDE 19

BFOr

• Two-phase read
• Layout-aware scheduler
  • Extracting the addresses from the inodes
  • Sorting the addresses of all files
  • Issuing read I/O in the order of the list

[Diagram: example Order_node for File A — Inode -> File A, Start-point -> 3000#, Length -> 8192 bytes, Num -> 0; disk blocks A B C D E are read in layout order A C E D B.]

SLIDE 20

BFOr

[Same content as Slide 19.]

SLIDE 21

BFOr

[Same content as Slide 19.]

SLIDE 22

BFOw

• Two-phase write
  • Phase 1: creating a global file to store all data at once
    • Creating the G inode for the global file
    • Creating the Order_list to record the order of the written files
  • Phase 2: creating all inodes for all files
    • Extracting the address from the G inode
    • Creating all inodes with the address information and the Order_list
      Current_FileAddr = Previous_FileAddr + FileLength
• Light-weight consistency strategy

[Diagram: disk blocks A B C D E written as one global file G.]
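The address derivation in Phase 2 follows directly from the sequential layout of the global file: each file starts where the previous one ends, per the slide's Current_FileAddr = Previous_FileAddr + FileLength. A small Python sketch of this arithmetic (the function name and example addresses are hypothetical):

```python
def assign_file_addresses(g_inode_start, file_lengths):
    """Phase 2 of BFOw: given the global file's start address (taken
    from the G inode) and the ordered per-file lengths (from the
    Order_list), derive each file's start address by accumulating
    the previous file's address plus its length."""
    addrs = []
    addr = g_inode_start
    for length in file_lengths:
        addrs.append(addr)
        addr += length  # Current_FileAddr = Previous_FileAddr + FileLength
    return addrs

# Global file written once starting at (hypothetical) address 10000,
# holding files A..E back to back.
lengths = [8192, 4096, 4096, 4096, 4096]
print(assign_file_addresses(10000, lengths))
# prints [10000, 18192, 22288, 26384, 30480]
```

Because the addresses are pure arithmetic over the Order_list, the per-file inodes can be created after the single large data write, without any extra data I/O.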

SLIDE 23

BFOw

[Same content as Slide 22; diagram build: per-file entries A–E appear alongside the global file G.]

SLIDE 24

BFOw

[Same content as Slide 22; diagram build: inodes for files A–E are created to point into the global file G.]

SLIDE 25

BFOw

• Two-phase write
• Light-weight consistency strategy
  • Writing the Order_list into journal files as an atomic operation
  • Recreating all inodes with the Order_list and the G inode

[Diagram: disk blocks A B C D E inside the global file G, with per-file inodes recreated from the journaled Order_list.]
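The journal write can be made atomic at user level with the classic write-to-temp, fsync, rename sequence: after a crash, either the old journal or the complete new one survives, never a torn one, so Phase 2 can always recreate the inodes from the Order_list and the G inode. A Python sketch under that assumption (the JSON encoding and function name are illustrative; the slides do not specify BFO's journal format):

```python
import json
import os
import tempfile

def journal_order_list(order_list, journal_path):
    """Persist the Order_list atomically: write it to a temp file in
    the same directory, fsync, then rename over the journal path.
    rename() within a filesystem is atomic on POSIX."""
    d = os.path.dirname(journal_path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(order_list, f)
            f.flush()
            os.fsync(f.fileno())  # force the journal data to disk
        os.rename(tmp, journal_path)  # atomic commit point
    except Exception:
        os.unlink(tmp)  # discard the partial temp file on failure
        raise

# Hypothetical Order_list entry for one file.
path = os.path.join(tempfile.mkdtemp(), "order.journal")
journal_order_list([{"inode": 1, "len": 8192}], path)
with open(path) as f:
    print(json.load(f)[0]["inode"])  # prints 1
```

The strategy is light-weight because only the small Order_list is journaled, not the file data, which is already safely written as the global file.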

SLIDE 26

Outline

• Background
• BFO Design
• Evaluation
• Conclusion

SLIDE 27

Experimental setup

• Prototyped BFO on ext4
• Intel Xeon E5 2620 @ 2.40GHz and 16GB RAM
• Storage devices
  • RAID0 with 5 Western Digital 7200RPM 4TB SAS HDDs
  • A Western Digital 4TB SAS HDD
  • A 480GB SAMSUNG 750 EVO SSD
• File sets
  • File sets created by Filebench: 4GB data with different file sizes (i.e., from 4KB to 4MB)
  • Linux-kernel source code

SLIDE 28

Read Performance

[Charts: execution time (s) vs. file size (4KB–4MB) comparing single-file read (Read_R, Read_S) with BFOr (BFOr_R, BFOr_S) under random and sequential orders on three storage configurations; annotated improvements: 42.1X, 22.4X, and 81.4%.]

SLIDE 29

Read Performance

[Charts: execution time (s) vs. file size (4KB–4MB) comparing single-file read (Read_R, Read_S) with BFOr (BFOr_R, BFOr_S); annotated improvements: 1.6X, 2X, and 1.8X.]

SLIDE 30

Write Performance

[Chart: execution time (s) vs. file size (4KB–4MB) for random write (RW), sequential write (SW), and BFOw on RAID, HDD, and SSD; annotated improvements: 71.8X, 111.4X, and 2.9X.]

SLIDE 31

Access Behaviors

SLIDE 32

Real-world Applications

The execution time of copying a file set with different storage devices. SHSP (SSSP): within the same partition of the same HDD (SSD); SHDP (SSDP): between different partitions of the same HDD (SSD).

[Chart annotation: 46.6%.]

SLIDE 33

Outline

• Background
• BFO Design
• Evaluation
• Conclusion

SLIDE 34

Conclusion

• We experimentally investigate the root cause of the inefficiency of the traditional single-file access pattern for batched files.
  • Seeking back and forth between the metadata area and the data area.
  • Accessing all files in random order.
• We present BFO for batch-file access, with optimized batch-file read (BFOr) and write (BFOw).
  • Two-phase access.
  • Layout-aware scheduler.
  • Light-weight consistency strategy.
• BFO improves access performance consistently, and removes a significant amount of random and non-sequential I/Os.

SLIDE 35

Thank You

Q&A