Reducing Seek Overhead with Application-Directed Prefetching (PowerPoint PPT Presentation)


SLIDE 1

Reducing Seek Overhead with Application-Directed Prefetching

Steve VanDeBogart, Christopher Frost, Eddie Kohler University of California, Los Angeles http://libprefetch.cs.ucla.edu

SLIDE 2

Disks are Relatively Slow

              Average Seek Time   Throughput   Whetstone Instr./Sec.
1979          55 ms               0.5 MB/s     0.714 M
2009          8.5 ms              105 MB/s     2,057 M
Improvement   6.5x                210x         2,880x

1979: PDP 11/55 with an RL02 10MB disk 2009: Core 2 with a Seagate 7200.11 500GB disk
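The table above explains why seeks dominate: a random small read pays the full average seek before any data transfers. A sketch of the arithmetic (the function name and 4 KB request size are illustrative, not from the deck):

```c
#include <assert.h>

/* Effective throughput of random reads: each request pays one average
   seek (seek_s seconds) plus the transfer time at sequential speed. */
double effective_mb_per_s(double seek_s, double seq_mb_per_s, double req_kb) {
    double req_mb = req_kb / 1024.0;
    double t = seek_s + req_mb / seq_mb_per_s;   /* seconds per request */
    return req_mb / t;
}
```

With the 2009 numbers (8.5 ms seek, 105 MB/s sequential), random 4 KB reads deliver roughly 0.46 MB/s, so the 210x sequential-throughput improvement rarely materializes for seeky workloads.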

SLIDE 3

Workarounds

  • Buffer cache – Avoid redoing reads
  • Write batching – Avoid redoing writes
  • Disk scheduling – Reduce (expensive) seeks
  • Readahead – Overlap disk & CPU time
SLIDE 4

Readahead

  • Generally applies to sequential workloads
  • Harsh penalties for mispredicting accesses
  • Hard to predict nonsequential access patterns
  • Some workloads are nonsequential
  • Databases
  • Image / Video processing
  • Scientific workloads: simulations, experimental data, etc.

SLIDE 5

Nonsequential Access

  • Why so slow?
  • Seek costs
  • Possible solutions
  • More RAM
  • More spindles
  • Disk scheduling
  • Why are nonsequential access patterns often scheduled poorly?
  • Painful to get right
SLIDE 6

Example – Getting it Wrong

  • Programmer will access nonsequential dataset
  • Prefetch it

fadvise(fd, data_start, data_size, WILLNEED)

  • Now it's slower
  • Maybe prefetching evicted other useful data
  • Maybe the dataset is larger than the cache size
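The slide's fadvise(fd, data_start, data_size, WILLNEED) is shorthand for the real POSIX call, posix_fadvise(2). A minimal sketch (the wrapper name prefetch_range is an assumption, not part of the deck):

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>

/* Hint that bytes [start, start+len) will be read soon.
   Returns 0 on success, an errno value on failure. Note the kernel may
   evict other useful data to honor the hint, which is exactly the
   failure mode this slide describes. */
int prefetch_range(int fd, off_t start, off_t len) {
    return posix_fadvise(fd, start, len, POSIX_FADV_WILLNEED);
}
```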
SLIDE 7

Libprefetch

  • User space library
  • Provides new prefetching interface
  • Application-directed prefetching
  • Manages details of prefetching
  • Up to 20x improvement
  • Real applications (GIMP, SQLite)
  • Small modifications (< 1,000 lines per app)
SLIDE 8

Libprefetch Contributions

  • Microbenchmarks – Quantitatively understand the problem
  • Interface – Convenient interface to provide access information
  • Kernel – Some changes needed
  • Contention – Share resources
SLIDE 9

Outline

  • Related work
  • Microbenchmarks
  • Libprefetch interface
  • Results
SLIDE 10

Prefetching

  • Determining future accesses
  • Historic access patterns
  • Static analysis
  • Speculative execution
  • Application-directed
  • Using future accesses to influence I/O
SLIDE 11

Application-Directed Prefetching

  • Patterson (TIP, 1995), Cao (ACFS, 1996)
  • Roughly doubled performance
  • Tight memory constraints
  • Little reordering of disk requests
  • More in paper
SLIDE 12

Prefetching

Access pattern: 1, 6, 2, 8, 4, 7

[Timeline: no prefetching – the CPU stalls for each read 1 →6 →2 →8 →4 →7 in turn]

SLIDE 13

Prefetching

Access pattern: 1, 6, 2, 8, 4, 7

[Timelines: no prefetching vs. traditional prefetching, which overlaps I/O and CPU time]

SLIDE 14

Prefetching

Access pattern: 1, 6, 2, 8, 4, 7

[Timelines: no prefetching; traditional prefetching overlapping I/O and CPU time; traditional prefetching with a fast CPU, where total time is dominated by I/O]

SLIDE 15

Seek Performance

SLIDE 16

Seek Performance

SLIDE 17

Expensive Seeks

  • Minimizing expensive seeks with disk scheduling – reordering

Access pattern: 1, 6, 2, 8, 4, 7

[Diagram: servicing the requests in issue order vs. reordered by disk position as 1, 2, 4, 6, 7, 8]
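The reordering step above is, at its core, a sort of the pending requests by position. A sketch (the helper names are illustrative; a real scheduler sorts by on-disk block address, not file offset):

```c
#include <stdlib.h>

/* Comparator for request positions; avoids overflow from subtraction. */
static int cmp_pos(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Reorder pending requests so the disk sweeps once in position order
   instead of seeking back and forth. */
void reorder(long *reqs, size_t n) {
    qsort(reqs, n, sizeof *reqs, cmp_pos);
}
```

Applied to the slide's access pattern 1, 6, 2, 8, 4, 7, this yields the service order 1, 2, 4, 6, 7, 8.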

SLIDE 18

Reordering

  • Must buffer out of order requests
  • Reordering limited by buffer space

[Timelines: dependencies between the CPU's access order (1, 6, 2, 8, 4, 7) and the reordered I/O (1, 2, 4, 6, 7, 8); out-of-order reads are buffered until the CPU needs them]

SLIDE 19

Reorder Prefetching

Access pattern: 1, 6, 2, 8, 4, 7

[Timelines: traditional prefetching with a fast CPU vs. reorder prefetching with buffer sizes 3 and 6; a larger buffer permits more reordering and less seek time]

SLIDE 20

Buffer Size

Random access to a 256MB file with varying amounts of reordering allowed

SLIDE 21

Buffer Size

Random access to a 256MB file with varying amounts of reordering allowed

SLIDE 22

Buffer Size

SLIDE 23

Buffer Size

Random access to a 256MB file with varying amounts of reordering allowed

SLIDE 24

Buffer Size

  • Buffer size is important to performance
  • Too low: reordering capacity goes unused, lowering performance
  • Too high: useful data is evicted and performance drops
  • Start with all free memory plus buffer-cache memory
  • Libprefetch uses /proc to find free memory
  • Adjust the memory target as usage changes
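The deck says libprefetch reads free-memory figures from /proc. A sketch of parsing one field from /proc/meminfo's documented "Field: N kB" format (parse_meminfo_field is a hypothetical helper, not libprefetch's API):

```c
#include <stdio.h>
#include <string.h>

/* Return the value of `field` (in kB) from meminfo-formatted text,
   or -1 if the field is absent or malformed. */
long parse_meminfo_field(const char *text, const char *field) {
    const char *p = strstr(text, field);
    long kb;
    if (!p) return -1;
    p += strlen(field);
    if (sscanf(p, ": %ld kB", &kb) != 1) return -1;
    return kb;
}
```

In the real library the text would come from reading /proc/meminfo, and the initial memory target combines free memory with buffer-cache pages, as the slide describes.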
SLIDE 25

More Microbenchmarks

  • Request size
  • Large requests vs. small requests
  • Platter location
  • Start of disk vs. end of disk
  • Infill
  • Reading extra data to eliminate small seeks
SLIDE 26

Libprefetch Algorithm

  • Application-directed prefetching for deep, accurate access lists
  • Use as much memory as possible to maximize reordering
  • Reorder requests to minimize large seeks
SLIDE 27

Interface Outline

  • List of access entries
  • Callback
  • Supply access list incrementally
  • Non-invasive to existing applications
SLIDE 28

Example

c = register_client(callback, NULL);

[Diagram: File A and File B, each of length 450]

SLIDE 29

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 30

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 31

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);

Access list entry: file descriptor, file offset, marked flag

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 32

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);

Flags: append, clear, complete

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 33

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);

Returns the number of accepted entries; a "short" count means libprefetch's list is full.

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 34

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

fadvise(A, 100, WILL_NEED) ... fadvise(B, 150, WILL_NEED) ... fadvise(A, 200, WILL_NEED)

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 35

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 36

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);

Check access list; check in memory: fincore(A, 100, ...)

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 37

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);

Access list doesn't match: callback into the application to update it.

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 38

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);

void callback(void* arg, int markedFD, loff_t markedOffset,
              int requestedFD, loff_t requestedOffset);

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 39

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);

void callback(NULL, A, 100, B, 350) {
    a_list = compute_new_alist(B, 350);
    n = request_prefetching(c, a_list, 2, PF_SET | PF_DONE);
}

libprefetch_a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 40

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);

void callback(NULL, A, 100, B, 350) {
    a_list = compute_new_alist(B, 350);
    n = request_prefetching(c, a_list, 2, PF_SET | PF_DONE);
}

libprefetch_a_list = { {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 41

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);
...
pread(A, ..., 400);

libprefetch_a_list = { {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 42

Example

c = register_client(callback, NULL);
r1 = register_region(c, A, 75, 350);
r2 = register_region(c, B, 100, 200);
r3 = register_region(c, B, 300, 400);
a_list = { {A, 100, 1}, ... {B, 150, 0}, ... {A, 200, 1} };
n = request_prefetching(c, a_list, 3, PF_SET | PF_DONE);
pread(A, ..., 100);
...
pread(B, ..., 350);
...
pread(A, ..., 400);
pread(A, ..., 200);

End of access list; callback to get more information.

libprefetch_a_list = { {B, 150, 0}, ... {A, 200, 1} };

[Diagram: File A region 75-350; File B regions 100-200 and 300-400]

SLIDE 43

Interface Summary

  • Access list
  • Simply discloses application's intentions
  • Provided incrementally
  • Callback
  • Asks application for more information
  • Easily retrofitted into existing applications
  • Aids in debugging access list information
SLIDE 44

Libprefetch

  • Prefetching library
  • A few important kernel modifications
  • fincore() - File page in memory?
  • Modified fadvise() - Fetch/evict file page
  • Uses fadvise() to prefetch; manages details
  • When to prefetch
  • How much to prefetch
  • Right order for prefetching
SLIDE 45

Contention

  • Disk scheduling – the OS scheduler is OK
  • Memory for libprefetch behaves like bandwidth in TCP
  • Changes quickly
  • Performs poorly if oversubscribed
  • Use AIMD to determine the memory target
  • Decrease on a miss in the buffer cache
  • Increase when all prefetched data stays in memory
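The AIMD rule above can be sketched in a few lines. The constants (halving on a miss, +4 MB on success) are illustrative assumptions, not libprefetch's actual parameters:

```c
/* Additive-increase / multiplicative-decrease memory target:
   back off sharply when prefetched data was evicted (a buffer-cache
   miss), probe upward gently while everything stays resident. */
long aimd_update(long target_bytes, int buffer_cache_miss) {
    if (buffer_cache_miss)
        return target_bytes / 2;          /* multiplicative decrease */
    return target_bytes + (4L << 20);     /* additive increase: +4 MB */
}
```

As with TCP's congestion window, this converges toward a fair share of a resource (here buffer-cache memory) whose availability changes quickly.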
SLIDE 46

Contention (continued)
SLIDE 47

Evaluation Methodology

  • Pentium 4, 3.2GHz
  • 512MB of RAM
  • Seagate 7200.11 500GB SATA 3Gb/s
  • Silicon Image 3132-2 SATA controller
  • Logging over the network
SLIDE 48

Random Access

  • SQLite with a TPC-C-like dataset:

select * from Customer order by Zip_code;

  • Secondary key => resulting rows will be randomly located in the dataset
  • Total modifications: < 500 lines of code
SLIDE 49

Results: Random

  • SQLite secondary key query


SLIDE 50

Strided Accesses

  • GIMP
  • Array of image tiles
  • Row-major layout accessed in column-major order
  • Column-major layout accessed in row-major order
  • Total modifications: 679 lines
SLIDE 51

Results: Strided

  • GIMP blur
SLIDE 52

Sequential Access

  • Sequentially read a large file
  • Libprefetch should do just as well as readahead
SLIDE 53

Results: Sequential

SLIDE 54

Impact of AIMD

SLIDE 55

Performance with Contention

SLIDE 56

Conclusion

  • A relatively simple library can transform accesses to avoid slow operations
  • Microbenchmarks quantitatively show the causes of nonsequential slowness
  • An interface to easily retrofit existing applications
  • Libprefetch handles kernel and concurrency complications
  • Big performance gains (up to 20x) are possible for some workloads

SLIDE 57

SLIDE 58

Implementation Sketch

  • 1. Scan the access list – find enough entries to fill memory
  • 2. fadvise(DONT_NEED) old entries
  • 3. Sort new entries by file offset
  • 4. fadvise(WILL_NEED) new entries
  • 5. Return to the intercepted read
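The five steps above can be sketched as follows. The types (struct entry), the 4096-byte advice unit, and the helper names are assumptions for illustration; the real library tracks per-client state and sorts by disk layout rather than raw file offset:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>

struct entry { int fd; off_t off; };    /* one access-list entry */

/* Step 3's comparator: group by file, then order by offset. */
static int by_offset(const void *a, const void *b) {
    const struct entry *x = a, *y = b;
    if (x->fd != y->fd) return (x->fd > y->fd) - (x->fd < y->fd);
    return (x->off > y->off) - (x->off < y->off);
}

/* Called from an intercepted read: `next` holds the entries scanned
   from the access list (step 1), `old` the previous prefetch window. */
void refill_window(struct entry *old, size_t n_old,
                   struct entry *next, size_t n_next) {
    size_t i;
    for (i = 0; i < n_old; i++)      /* step 2: evict the old window */
        posix_fadvise(old[i].fd, old[i].off, 4096, POSIX_FADV_DONTNEED);
    qsort(next, n_next, sizeof *next, by_offset);   /* step 3: sort */
    for (i = 0; i < n_next; i++)     /* step 4: prefetch in disk order */
        posix_fadvise(next[i].fd, next[i].off, 4096, POSIX_FADV_WILLNEED);
    /* step 5: the caller returns to the intercepted read */
}
```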