A Case for Self-Optimizing File Systems Jason Liptak, Sam Burnett A Case for Self-Optimizing File Systems – p.1/19

Motivation
Current file systems do not take data access patterns into account. Most requests made are non-sequential. Improved disk access performance can result in improved application performance.

Application  CPU (s)  I/O wait (s)  Seq I/O (%)
firefox      1.95     5.43          13.97%
gedit        0.70     4.53          18.83%
gimp         2.67     4.37          50.49%
oowriter     4.83     10.82         10.47%
xemacs       0.74     3.59          27.03%
xinit        0.64     2.95          50.38%

Approach
Build a self-optimizing file system that organizes data on disk according to data usage patterns.
  Profile data access patterns.
  Analyze the data to create an abstraction.
  Plan new data layout.
  Reconfigure the data placement for optimal access.

Profiling
Every I/O request that passes through the kernel is recorded. The kernel talks in terms of Logical Block Numbers (LBNs). Records are of the form

  time pid program start length mode

  time is the current CPU clock value.
  pid is the process ID of the process that submitted the request.
  program is the executable name of the submitter.
  start is the first LBN of the request.
  length is the length of the request, in blocks.
  mode is read or write.
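A minimal sketch of how one such record could be represented and parsed. The slides give only the field order, so the whitespace-separated text encoding, the literal mode strings "read" and "write", and the names IORecord and parse_record are all assumptions made for illustration.

```python
from typing import NamedTuple


class IORecord(NamedTuple):
    time: int      # CPU clock value when the request was recorded
    pid: int       # process ID of the submitting process
    program: str   # executable name of the submitter
    start: int     # first LBN of the request
    length: int    # request length, in blocks
    mode: str      # "read" or "write" (assumed spelling)


def parse_record(line: str) -> IORecord:
    """Parse one record, assuming six whitespace-separated fields."""
    time, pid, program, start, length, mode = line.split()
    if mode not in ("read", "write"):
        raise ValueError(f"unexpected mode: {mode!r}")
    return IORecord(int(time), int(pid), program, int(start), int(length), mode)


# Hypothetical record, not taken from the authors' traces.
rec = parse_record("1024 314 gimp 4096 8 read")
```

A record parsed this way covers the LBNs from rec.start up to rec.start + rec.length - 1.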

Analysis
We model disk access as a directed graph. Vertices are LBNs. Edges are the transitions between LBNs in a sequence. One graph is constructed for each PID. These graphs are merged together to form a comprehensive graph used in planning.
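The graph construction described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: each request is reduced to its starting LBN, an edge weight counts how often one LBN directly follows another within the same PID, and merging sums the weights of identical edges. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_graphs(requests):
    """Build one transition graph per PID.

    requests: iterable of (pid, start_lbn) pairs in arrival order.
    Returns {pid: Counter mapping (src_lbn, dst_lbn) -> transition count}.
    """
    graphs = defaultdict(Counter)
    last_lbn = {}  # most recent LBN seen for each PID
    for pid, lbn in requests:
        if pid in last_lbn:
            graphs[pid][(last_lbn[pid], lbn)] += 1
        last_lbn[pid] = lbn
    return graphs


def merge_graphs(graphs):
    """Merge per-PID graphs by summing the weights of matching edges (assumed rule)."""
    merged = Counter()
    for graph in graphs.values():
        merged.update(graph)
    return merged


# Hypothetical interleaved trace from two processes.
trace = [(1, 0), (2, 0), (1, 8), (2, 8), (1, 4), (2, 9)]
per_pid = build_graphs(trace)
merged = merge_graphs(per_pid)
```

In this trace both processes move from LBN 0 to LBN 8, so that edge carries weight 2 in the merged graph, while the edges 8 to 4 and 8 to 9 each keep weight 1.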

Example
Here is an example graph, and how it would be merged.
[Figure: example access graphs and the merged result; vertices are labeled with pairs such as (0,1) and (8,2), and edges carry weights of 1 or 2.]

Planning
Use a greedy algorithm to place data:
1. Find the most connected vertex. Place this first. Call this vertex the “blob.”
2. Find the edge with the highest weight connected to the blob.
3. If the edge is coming into the blob, place the vertex at the other end of the edge before all other vertices. If the edge is coming out of the blob, place the next vertex after the blob.
4. Move all edges of the next vertex to the blob.
5. If there are edges connected to the blob, return to step 2.
6. If there are any vertices left in the graph, return to step 1.
7. Otherwise, we are done.
This algorithm can be thought of as a function from LBNs to LBNs: an LBN is given to the algorithm and it returns a new location for that LBN based on the optimization heuristic.
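The steps above can be sketched in Python. Several details are assumptions, since the slides do not settle them: "most connected" is taken to mean the largest total weight of incident edges, ties go to the smallest vertex, the blob is tracked as a membership set instead of by rewriting edges, and parallel edges from the blob to the same outside vertex are not combined. The name plan_layout is hypothetical.

```python
from collections import deque


def plan_layout(edges):
    """Greedy placement. edges: {(src, dst): weight}. Returns vertices in placement order."""
    vertices = {v for edge in edges for v in edge}
    # Self-loops cannot influence ordering, so ignore them.
    links = {(u, v): w for (u, v), w in edges.items() if u != v}

    def connectivity(v):
        # Assumed meaning of "most connected": total weight of incident edges.
        return sum(w for (a, b), w in links.items() if v in (a, b))

    remaining = set(vertices)
    layout = []
    while remaining:                                   # step 6
        seed = max(sorted(remaining), key=connectivity)  # step 1
        blob = deque([seed])
        members = {seed}
        remaining.discard(seed)
        while True:                                    # step 5
            # Edges with exactly one endpoint inside the blob.
            frontier = [(w, u, v) for (u, v), w in links.items()
                        if (u in members) != (v in members)]
            if not frontier:
                break
            _, u, v = max(frontier, key=lambda item: item[0])  # step 2
            if v in members:
                # Edge comes into the blob: place u before everything placed so far.
                blob.appendleft(u)
                newcomer = u
            else:
                # Edge leaves the blob: place v after the blob.
                blob.append(v)
                newcomer = v
            members.add(newcomer)                      # step 4, via membership
            remaining.discard(newcomer)
        layout.extend(blob)
    return layout


# Hypothetical graph, not the one from the slides.
example = {("A", "B"): 10, ("B", "C"): 8, ("D", "A"): 5, ("E", "F"): 2}
order = plan_layout(example)
print(order)  # ['D', 'A', 'B', 'C', 'E', 'F']
```

Here B is the most connected vertex, A and D are pulled in front of it by incoming edges, C is appended by an outgoing edge, and the disconnected pair E, F forms a second blob. The scan over all edges on every step makes this quadratic or worse, which is acceptable only for a sketch.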

Example
[Figure: example weighted graph with vertices A through J.]

Placement Algorithms
We use two placement techniques:
  The first places all LBNs sequentially. This allows for optimal performance, but is unrealistic.
  The second places LBNs in the original LBN space. This is more realistic, though in a real file system better performance will probably result.
Together, these two algorithms represent bounds on the performance of the system. Real file system performance will likely be somewhere in between.
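One possible reading of the two techniques, sketched as mappings from old LBN to new LBN. The slides do not define "the original LBN space" precisely; the second function assumes it means reusing exactly the set of LBNs that were originally occupied, handing them out in ascending order. Both function names and the base parameter are hypothetical.

```python
def place_sequential(layout, base=0):
    """Map each LBN, in planned order, to consecutive new LBNs starting at base."""
    return {lbn: base + i for i, lbn in enumerate(layout)}


def place_in_original_space(layout):
    """Map each LBN, in planned order, onto the originally used LBNs in ascending order.

    Assumption: the new locations are restricted to the old set of LBNs,
    so gaps between them remain.
    """
    slots = sorted(layout)
    return dict(zip(layout, slots))


# Hypothetical planned order of four LBNs.
layout = [40, 7, 1000, 8]
sequential = place_sequential(layout)
fragmented = place_in_original_space(layout)
```

With this input the sequential mapping packs the blocks into LBNs 0 through 3, while the second mapping keeps the gaps of the original layout (7, 8, 40, 1000) and only changes which block lives where.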

Reconfiguration
At some convenient time, rearrange the data according to the mapping provided in the planning stage. We simulate this by reading from old LBN locations, then reading from new LBN locations. This is the sequence of operations that would be performed to move data in a real self-optimizing file system.

Evaluation Techniques
Evaluate performance of:
  individual applications
  the entire system
Evaluate overhead of:
  profiling
  data processing

Application Performance
Collected disk access data for six Linux applications. Used this information to create a new disk layout. Simulated application disk access by performing the requests made by the application back-to-back. Compared this to the access time when using the optimized layout.

Application Performance
[Figure: bar chart of access time in seconds (axis 0 to 10) per application (xemacs, firefox, gedit, gimp, oowriter, xinit) for the Unoptimized, Sequential and Fragmented layouts.]

System Performance
Collected disk access data for three hosts over several days. For each host, created a new disk layout. Compared access times using back-to-back access.

System Performance
[Figure: bar chart of access time in seconds (axis 0 to 500) per host (twain, apocalypse, mark) for the Unoptimized, Sequential and Fragmented layouts.]

Profiling Overhead
Appears to be very small. At this time, all request data is sent to a remote server for storage. Perhaps more overhead if everything is done locally? May become much higher if we probe before caching occurs.

Processing Overhead
Processing (analysis, planning and reconfiguration) can be done “offline,” at whatever time is most convenient to the user. Current algorithms are slow. A real implementation would be optimized.

Host        Requests  Profiling (s)  Processing (s)  Overhead (%)  sec./req.
twain       43472     51336          789             1.54%         0.0181
apocalypse  68752     262615         1492            0.57%         0.0217
mark        181120    156531         12480           7.97%         0.0689

Overhead for Analysis, Planning and Reconfiguration
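The slides do not state how the last two columns are derived, but the figures are consistent with overhead being processing time divided by profiling time, and sec./req. being processing time divided by the number of requests. The short check below reproduces the table under that assumption; the function name is hypothetical.

```python
def overhead(requests, profiling_s, processing_s):
    """Return (overhead in percent, processing seconds per request).

    Assumed definitions: overhead = processing / profiling,
    sec./req. = processing / requests.
    """
    return 100.0 * processing_s / profiling_s, processing_s / requests


# (requests, profiling seconds, processing seconds) from the table above.
hosts = {
    "twain": (43472, 51336, 789),
    "apocalypse": (68752, 262615, 1492),
    "mark": (181120, 156531, 12480),
}

results = {host: overhead(*row) for host, row in hosts.items()}
for host, (pct, per_req) in results.items():
    print(f"{host:<11} {pct:5.2f}%  {per_req:.4f} s/req")
```

For twain this gives 789 / 51336, about 1.54%, and 789 / 43472, about 0.0181 seconds per request, matching the table.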

Implementation Issues
Obtain all relevant information from the kernel. Currently, information is obtained after the buffer cache. We would like information from before the cache.
Overhead:
  Where to store all the records (cannot dump to disk).
  When to perform calculations (online or offline).
How to efficiently move data for optimal placement.

Future Work
More rigorous testing of the layout algorithm. Find proper profiling point in kernel. Perhaps explore other algorithms? Implement an actual self-optimizing file system.
