BORG: Block-reORGanization for Self-optimizing Storage Systems



SLIDE 1

BORG: Block-reORGanization for Self-optimizing Storage Systems

Medha Bhadkamkar Jorge Guerra Luis Useche Sam Burnett Jason Liptak Raju Rangaswami Vagelis Hristidis

Florida International University

March 9, 2009

1 / 33

SLIDE 2

Problem

◮ I/O is the bottleneck

  • Legacy filesystems favor sequential access.
  • Realistic workloads are not necessarily sequential.

◮ Proposed Solution

  • Co-locate data based on workload block access patterns.
  • Improve sequentiality.

SLIDE 3

Workload Characteristics that motivate BORG

◮ Workloads

  • office - browser, OpenOffice applications, gnuplot, etc.
  • developer - emacs, gcc, gdb, etc.
  • Subversion (SVN) server - source and document repository
  • Web server - department web server

◮ Workload Statistics Summary

Workload type   File system size [GB]   Total reads [GB]   Total writes [GB]
office                           8.29               6.49                0.32
developer                       45.59               3.82               10.46
SVN server                       2.39               0.29                0.62
web server                     169.54              21.07                2.24

SLIDE 4

Non-uniform Access Frequency Distribution

◮ Frequently accessed data is usually a small portion of the entire data.
◮ Frequently accessed data is spread over the entire disk area.

Workload type   File system size [GB]   Unique reads [GB]   Unique writes [GB]   Top 20% data access
office                           8.29                1.63                 0.22               51.40 %
developer                       45.59                2.57                 3.96               60.27 %
SVN server                       2.39                0.17                 0.18               45.79 %
web server                     169.54                7.32                 0.33               59.50 %
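
A rough sketch of how a "Top 20% data access" figure could be derived from a block trace. It assumes the metric is the share of all accesses that go to the most frequently accessed 20% of distinct blocks; the slides do not spell out the definition, and `top_fraction_share` and the toy trace are illustrative names and data only.

```python
from collections import Counter

def top_fraction_share(block_accesses, fraction=0.2):
    """Share of all accesses that hit the most frequently accessed
    `fraction` of distinct blocks (assumed definition of the metric)."""
    counts = Counter(block_accesses)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Toy trace: 10 distinct blocks, two of which are hot.
trace = [1] * 30 + [2] * 20 + list(range(3, 11)) * 2
share = top_fraction_share(trace)
print(f"Top 20% of blocks receive {share:.2%} of accesses")
```

A skewed result like this is the property the slide relies on: a small set of blocks absorbs most accesses, so co-locating them pays off.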

SLIDE 5

Non-uniform Access Frequency Distribution

[Figure: access frequency]

The Opportunity: Co-locating frequently accessed data can improve I/O performance.

SLIDE 6

Workload Characteristics - Partial Determinism

◮ Non-sequential accesses repeat in a block access sequence

Workload type   Partial determinism
office                      65.42 %
developer                   61.56 %
SVN server                  50.73 %
web server                  15.55 %

The Opportunity: Using partial determinism information can improve sequentiality of accesses.

SLIDE 7

Temporal Locality

◮ There is a substantial overlap in the working sets across days.

[Figure: data access overlap with Day 1 (%) for each day of the week (Day 1 to Day 7); series: all accesses, top 20% accesses]

The Opportunity: Using information about past I/O activity to optimize layout can improve performance.

SLIDE 8

BORG in a nutshell

◮ Uses block access patterns to identify hot block sequences in the workload
◮ Reorganizes blocks in a separate BORG OPTimized partition (BOPT)
◮ Assimilates write requests into the partition
◮ Operates in the background
◮ Can be dynamically inserted or removed when required
◮ Is independent of filesystems
◮ Maintains consistency through a persistent page-level indirection map

SLIDE 9

System Architecture

[Figure: I/O stack. User space: Application. Kernel space, in the order listed: VFS, Page Cache, File Systems (EXT3, JFS...), BORG Layer, I/O Scheduler, Device Driver. Legend: existing components, new components.]

SLIDE 10

System Architecture

[Figure: I/O stack with the BORG components. User space: Application. Kernel space, in the order listed: VFS, Page Cache, File Systems (EXT3, JFS...), BORG Layer, I/O Scheduler, Device Driver. Kernel-space components: I/O Profiler, Reconfigurator, I/O Indirector, BOPT-space. User-space components: Analyzer, Planner. Data passed between them: I/O Trace, Layout Plan. Legend: existing components, new components.]

SLIDE 11

System Architecture

[Figure: system architecture diagram, repeated from slide 10.]

SLIDE 12

I/O Profiler

◮ Each I/O operation is logged with:

  • Temporal attribute: timestamp
  • Process-level attributes: process ID, name
  • Block-level attributes: start LBA, length of I/O, mode (R/W)

Sample Trace

[Timestamp]      [PID]  [Exec.]   [StartLBA]  [Size]  [Mode]
705423195774700  5745   screen    6914207     32      R
705423259644748  5755   utempter  24379775    8       R
705423379492524  5755   utempter  24787567    8       R
705423421266908  5753   bash      7498311     24      R
705423454005104  5755   utempter  24793415    8       R
705423493292648  5753   bash      34543375    64      R
705423565122668  5766   stty      34543439    16      R
...              ...    ...       ...         ...     ...
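
A minimal sketch of reading trace lines in the six-column layout shown above. `TraceRecord` and `parse_trace_line` are hypothetical names, not part of BORG; the field order is taken from the sample header, and the units of the timestamp and size columns are left uninterpreted because the slide does not state them.

```python
from typing import NamedTuple

class TraceRecord(NamedTuple):
    timestamp: int
    pid: int
    executable: str
    start_lba: int
    size: int
    mode: str  # "R" or "W"

def parse_trace_line(line: str) -> TraceRecord:
    """Split one whitespace-separated trace line into typed fields."""
    ts, pid, exe, lba, size, mode = line.split()
    return TraceRecord(int(ts), int(pid), exe, int(lba), int(size), mode)

sample = """\
705423195774700 5745 screen 6914207 32 R
705423259644748 5755 utempter 24379775 8 R
705423421266908 5753 bash 7498311 24 R
"""
records = [parse_trace_line(line) for line in sample.splitlines() if line.strip()]
for rec in records:
    print(rec.executable, rec.start_lba, rec.size, rec.mode)
```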

SLIDE 13

System Architecture

[Figure: system architecture diagram, repeated from slide 10.]

SLIDE 14

Analyzer

◮ Builds a per-process directed, weighted graph
◮ Vertex is the per-request LBA range (start LBA, length)
◮ Edge is a temporal dependency between two ranges
◮ Weights represent frequency of access
◮ Graphs are merged into a single master access graph

[Figure: process graphs and the master access graph after merging.
Process graph vertices: r1:(0, 3), r2:(4, 2), r3:(8, 2) and s1:(1, 6), s2:(9, 1).
Master access graph vertices: r1:(0, 1); r1, s1:(1, 2); s1:(3, 1); r2, s1:(4, 2); s1:(6, 1); r3:(8, 1); r3, s2:(9, 1).
Edge weights in the figure are 1 or 2.]
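
A simplified sketch of the graph construction described on this slide. It assumes an edge is added between consecutive requests of the same process and that its weight counts how often that transition was observed. It does not split overlapping LBA ranges during the merge, which the figure appears to do. `build_process_graphs` and `merge_graphs` are illustrative names.

```python
from collections import Counter, defaultdict

def build_process_graphs(requests):
    """requests: iterable of (pid, start_lba, length) in time order.
    Returns {pid: Counter({(src_range, dst_range): weight})}."""
    graphs = defaultdict(Counter)
    last_range = {}
    for pid, start, length in requests:
        rng = (start, length)
        if pid in last_range:
            # Temporal dependency: this range followed the previous one.
            graphs[pid][(last_range[pid], rng)] += 1
        last_range[pid] = rng
    return graphs

def merge_graphs(graphs):
    """Sum edge weights across all per-process graphs (no range splitting)."""
    master = Counter()
    for edges in graphs.values():
        master.update(edges)
    return master

requests = [
    (1, 0, 3), (2, 1, 6), (1, 4, 2), (2, 9, 1),
    (1, 8, 2), (1, 0, 3), (1, 4, 2),
]
master = merge_graphs(build_process_graphs(requests))
for (src, dst), weight in sorted(master.items()):
    print(src, "->", dst, "weight", weight)
```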

SLIDE 15

Planner

◮ Uses the master access graph as input
◮ Chooses the most connected node for initial placement
◮ Chooses the node most connected to the already placed node-set
◮ Places it depending on the direction of the connecting edge

[Figure: example master access graph with nodes A to J and weighted edges (weights between 1 and 10)]

F → H → J → A → G → C → B → E → D
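
The greedy placement above can be sketched roughly as follows. Several details are assumptions rather than facts from the slides: "most connected" is taken to mean the largest total incident edge weight, a node reached by an edge from the placed set is appended after it, and a node with an edge into the placed set is put in front. `plan_layout` and the small example graph are illustrative only and do not reproduce the figure.

```python
from collections import defaultdict

def plan_layout(edges):
    """Greedy placement. edges maps (src, dst) -> weight; returns nodes in layout order."""
    degree = defaultdict(int)
    for (src, dst), weight in edges.items():
        degree[src] += weight
        degree[dst] += weight
    if not degree:
        return []

    # Assumption: "most connected" means largest total incident edge weight.
    first = max(sorted(degree), key=lambda n: degree[n])
    layout, placed = [first], {first}
    remaining = set(degree) - placed

    while remaining:
        best = None  # (weight, node, place_after)
        for (src, dst), weight in sorted(edges.items()):
            if src in placed and dst in remaining:
                candidate = (weight, dst, True)   # accessed after a placed node
            elif dst in placed and src in remaining:
                candidate = (weight, src, False)  # accessed before a placed node
            else:
                continue
            if best is None or candidate[0] > best[0]:
                best = candidate
        if best is None:
            # No edge reaches the placed set: fall back to the heaviest remaining node.
            node = max(sorted(remaining), key=lambda n: degree[n])
            place_after = True
        else:
            _, node, place_after = best
        if place_after:
            layout.append(node)
        else:
            layout.insert(0, node)
        placed.add(node)
        remaining.discard(node)
    return layout

edges = {("A", "B"): 5, ("B", "C"): 8, ("D", "B"): 3, ("C", "A"): 1}
layout = plan_layout(edges)
print(" -> ".join(layout))
```

In this example B is placed first, C is appended because of the heavy B-to-C edge, and A and D are placed in front because their edges point into the placed set.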

SLIDE 16

System Architecture

[Figure: system architecture diagram, repeated from slide 10.]

SLIDE 17

Reconfigurator

[Figure: Reconfigurator data flow between the Planner, the Reconfigurator, FS space (blocks A, B, C, D), and BOPT space (BOPT read cache and BOPT write buffer). Case shown: "Leaving", a block moving out of the BOPT.]

  1. Graph G
  2. Current Plan
  3. Writes plan
  4. Reads plan
  5. Reads from BOPT
  6. Writes to FS

Plan entry shown (Source → Dest.): C’ → C

SLIDE 18

Reconfigurator

[Figure: Reconfigurator data flow between the Planner, the Reconfigurator, FS space (blocks A, B, C, D), and BOPT space (BOPT read cache and BOPT write buffer). Case shown: "Relocate", a block moving within the BOPT.]

  1. Graph G
  2. Current Plan
  3. Writes plan
  4. Reads plan
  5. Reads from BOPT
  6. Writes to BOPT

Plan entries shown (Source → Dest.): C’ → C, D’ → D"

SLIDE 19

Reconfigurator

[Figure: Reconfigurator data flow between the Planner, the Reconfigurator, FS space (blocks A, B, C, D), and BOPT space (BOPT read cache and BOPT write buffer). Case shown: "Incoming", a block moving from the FS into the BOPT.]

  1. Graph G
  2. Current Plan
  3. Writes plan
  4. Reads plan
  5. Reads FS block
  6. Writes to BOPT

Plan entries shown (Source → Dest.): C’ → C, D’ → D", B → B’
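
A hedged sketch of how the three cases on the Reconfigurator slides (leaving, relocate, incoming) could be derived by comparing the current plan with a new one. It assumes each plan maps an FS block to its BOPT location; `classify_blocks` and the integer locations are hypothetical, and the copy steps themselves (reads and writes) are not modeled.

```python
def classify_blocks(current_plan, new_plan):
    """Compare two plans, each mapping FS block -> BOPT location."""
    leaving = sorted(b for b in current_plan if b not in new_plan)
    incoming = sorted(b for b in new_plan if b not in current_plan)
    relocate = sorted(
        b for b in new_plan
        if b in current_plan and current_plan[b] != new_plan[b]
    )
    return {"leaving": leaving, "relocate": relocate, "incoming": incoming}

# Mirrors the slides loosely: C leaves the BOPT, D moves inside it, B comes in.
current_plan = {"C": 0, "D": 1}
new_plan = {"B": 0, "D": 2}
moves = classify_blocks(current_plan, new_plan)
print(moves)
```

Each class maps to one copy direction in the figures: leaving blocks are read from the BOPT and written to the FS, relocated blocks are read from and written to the BOPT, and incoming blocks are read from the FS and written to the BOPT.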

SLIDE 20

System Architecture

[Figure: system architecture diagram, repeated from slide 10.]

SLIDE 21

I/O Indirector

[Figure: I/O Indirector handling a read request for block B. B has an entry in borg_map, so the read is served from B’ in BOPT space (BOPT read cache and BOPT write buffer) instead of FS space.]

borg_map

FS Block   BOPT Block   Dirty
B          B’
C          C’           1

SLIDE 22

I/O Indirector

[Figure: I/O Indirector handling a read request for block A. A has no entry in borg_map (marked X), so the read is served from A in FS space.]

borg_map

FS Block   BOPT Block   Dirty
B          B’
C          C’           1

SLIDE 23

I/O Indirector

[Figure: I/O Indirector handling a write request for block W. The write goes to W’ in the BOPT write buffer, and a new borg_map entry for W is added with the dirty flag set.]

borg_map

FS Block   BOPT Block   Dirty
B          B’
C          C’           1
W          W’           1
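
The three Indirector cases (mapped read, unmapped read, write) can be sketched as a small in-memory model. This is an illustration under assumptions, not BORG's kernel code: the `Indirector` class, its method names, the fallback to the FS when the write buffer is full, and marking an already mapped block dirty on write are all hypothetical choices.

```python
class Indirector:
    """Toy model of borg_map: FS block -> [BOPT block, dirty flag]."""

    def __init__(self, write_buffer_slots):
        self.borg_map = {}
        self.free_slots = list(write_buffer_slots)

    def read(self, fs_block):
        entry = self.borg_map.get(fs_block)
        if entry is None:
            return ("FS", fs_block)   # not mapped: read from the original location
        return ("BOPT", entry[0])     # mapped: redirect to the BOPT copy

    def write(self, fs_block):
        entry = self.borg_map.get(fs_block)
        if entry is None:
            if not self.free_slots:
                return ("FS", fs_block)  # assumption: buffer full, write in place
            entry = [self.free_slots.pop(0), True]
            self.borg_map[fs_block] = entry
        entry[1] = True  # the BOPT copy is now newer than the FS copy
        return ("BOPT", entry[0])

ind = Indirector(write_buffer_slots=["W'"])
ind.borg_map["B"] = ["B'", False]
ind.borg_map["C"] = ["C'", True]

print(ind.read("B"))   # redirected to the BOPT copy
print(ind.read("A"))   # served from FS space
print(ind.write("W"))  # absorbed by the BOPT write buffer
```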

SLIDE 24

Evaluation

Goals

◮ How effective is BORG?
◮ What are the overheads?
◮ When is it not effective?
◮ How sensitive is it to different parameters?

Setup

◮ Metric - Total disk busy times
◮ 5 hosts with different configurations
◮ Linux 2.6.22 kernel
◮ reiserfs and ext3

SLIDE 25

Busy times for Webserver

Setup

◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Evaluated BORG with cumulative and partial traces

[Figure: disk busy time (sec) for phases N1 to N5; series: Vanilla, BORG-C, BORG-P]

Summary

◮ 14-35% reduction in busy times for cumulative traces and 5-39% for partial traces.

SLIDE 26

Busy times for Webserver

Setup

◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Evaluated BORG with cumulative and partial traces

[Figure: disk busy time (sec) for reconfiguration phases R1 to R4; series: Vanilla, BORG-C, BORG-P]

Summary

◮ Busy times are higher in reconfiguration phases due to copy overheads.

SLIDE 27

BORG Overhead

Setup

◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Cumulative and partial traces

[Figure: time (sec) spent in the Analyzer, Planner, and Reconfigurator for reconfigurations R1 to R4, with bars labeled C and P]

Summary

◮ Linear increase in planning and analysis overheads for cumulative traces.

SLIDE 28

Sensitivity Analysis - Reconfiguration Interval

Setup

◮ Interval 8 hours - 3 days, 1 GB BOPT, with 50% write buffer

[Figure: reduction in busy time (%) versus reconfiguration interval (3 days, 2 days, 1 day, 12 hrs, 8 hrs); series: Developer, SVN]

Summary

◮ Smaller intervals lead to better performance for frequently changing workloads.

SLIDE 29

Sensitivity Analysis - BOPT Size

Setup

◮ BOPT size 256 MB - 8 GB, with 50% write buffer

[Figure: reduction in busy time (%) versus size of BOPT (256MB, 512MB, 1GB, 2GB, 4GB, 8GB); series: Developer, SVN]

Summary

◮ Developer: Performance increases as the BOPT size increases.
◮ SVN: Improvement stays the same due to the smaller working set size.

SLIDE 30

Sensitivity Analysis - Write Buffer Size Variation

Setup

◮ Write buffer 0 - 100%

[Figure: reduction in busy time (%) versus write buffer fraction (0%, 25%, 50%, 75%, 100%); series: Developer, SVN]

Summary

◮ An incorrectly sized write buffer can impact performance.

SLIDE 31

BORG Summary and Future Work

Conclusions

◮ BORG improves I/O sequentiality and restricts disk head movement
◮ Disk busy time reductions range from 6% to 50% for untuned systems
◮ Disk busy times can decrease by up to 80% with careful tuning
◮ BORG overheads are within acceptable limits

Future Work

◮ Exploring alternate layout strategies
◮ Automated reconfigurations
◮ Automated configuration of parameters

SLIDE 32

Thank you!

SLIDE 33

Related Work

◮ File System Level Approaches - LFS, PLACE, HFS, FS2
◮ Block Level Approaches - Cylinder Shuffling, Disk Caching Disk, ALIS
