Prefetching or Streaming Prediction

I/O Buffering and Caching

I/O accesses are reads or writes (e.g., to files).
Application access is arbitrary (offset, len).
Convert accesses to read/write of fixed-size blocks or pages.

I/O Buffering and Streaming

Blocks have an (object, logical block) identity.
Blocks/pages are cached in memory.
• Spatial and temporal locality
• Fetch/replacement issues just as VM paging
• Tradeoff of block size

I/O Effective Bandwidth

[Figure labels: g, BG; application processing; I/O initiation (e.g., syscall, driver, etc.); I/O access request latency (e.g., disk seek, network); block transfer (disk surface, bus, network); I/O completion overhead (e.g., block copy)]

Call this the gap g.
Define G to be transfer time per byte (bandwidth = 1/G).
Block size is B bytes; transfer time is BG.
What’s the effective bandwidth (throughput)?

Impact of Transfer Size

[Figure: curves of $B / (g_i + B G_i)$]

B = transfer size
g = overhead (µs)
G = inverse bandwidth
For these curves, G matches 32-bit 33MHz PCI and Myrinet LANai-4 link speed (132 MB/s).

Bubbles in the I/O Pipeline

The CPU and I/O units are both underutilized in this example.
In this case, latency is critical for throughput.
There are “bubbles” in the pipeline: how to overlap activity on the CPU and I/O units?
• Multiprogramming is one way
But what if there is only one task?
Goals: keep all units fully utilized to improve throughput. Hide the latency.
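The effective-bandwidth expression B/(g + BG) from the slides above can be evaluated numerically. The following is a minimal sketch, not taken from the slides: the function names and the 10 µs gap are assumptions chosen for illustration, while the 132 MB/s link speed is the figure quoted for the curves.

```python
def effective_bandwidth(block_bytes, gap_s, inv_bw_s_per_byte):
    """Effective bandwidth B / (g + B*G), in bytes per second."""
    if block_bytes <= 0:
        raise ValueError("block size must be positive")
    return block_bytes / (gap_s + block_bytes * inv_bw_s_per_byte)


def half_power_block_size(gap_s, inv_bw_s_per_byte):
    """Block size at which effective bandwidth is half of peak (B = g / G)."""
    return gap_s / inv_bw_s_per_byte


# G from the slides: 132 MB/s link speed, so G = 1 / 132e6 seconds per byte.
G = 1.0 / 132e6
# ASSUMPTION: a 10 microsecond gap per transfer, picked only for illustration.
ASSUMED_GAP_S = 10e-6

if __name__ == "__main__":
    for size in (512, 4096, 65536, 1 << 20):
        mb_per_s = effective_bandwidth(size, ASSUMED_GAP_S, G) / 1e6
        print(f"B = {size:>8} bytes: {mb_per_s:7.1f} MB/s")
```

Setting B = g/G makes the gap and the transfer time equal, so the effective bandwidth is exactly half of the peak 1/G. With the assumed 10 µs gap, that block size is about 1320 bytes; smaller transfers are dominated by the gap, larger ones approach the link speed.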

Prefetching or Streaming

Prediction

Compiler-driven
• Compile-time information about loop nests, etc.
Markov prediction
• “Learn” repeated patterns as program executes.
Pre-execution
• Execute the program speculatively, and watch its accesses
Query optimization or I/O-efficient algorithm
• “Choreograph” I/O accesses for complex operation
How to get application-level hints to the kernel?
Hinting or asynchronous I/O

Readahead

[Figure labels: App requests block n; App requests block n+1; System prefetches block n+2; System prefetches block n+3; blocks n, n+1, n+2]

Readahead: the system predictively issues I/Os in advance of need. This may use low-level asynchrony or create threads to issue the I/Os and wait for them to complete (e.g., RPC-based file systems such as NFS).

Prefetching and Streaming I/O: Examples

Parallel disks
• Latency for arm movement
Network data fetch (e.g., network memory)
• Fetch from server cache
• Latency for request propagation

Prefetching and I/O Scheduling

Asynchronous I/O or prefetching can expose more information to the I/O system, which may allow it to schedule accesses more efficiently.
E.g., read one large block with a single seek/rotation.

The I/O Pipeline and I/O Overhead

[Figure labels: Network data fetch: bandwidth-limited; Faster network: CPU-limited]

In this example, overhead rather than latency is the bottleneck for I/O throughput.
How important is it to reduce I/O overhead as I/O devices get faster?
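The readahead behavior described above (the app requests blocks n and n+1, and the system then prefetches n+2 and n+3) can be sketched as a small simulation. This is an illustrative sketch only: the class name `ReadaheadCache`, the `depth` parameter, and the fake fetch function are assumptions, and the prefetches run synchronously here, whereas a real system would issue them asynchronously so they overlap with application processing.

```python
class ReadaheadCache:
    """Toy sequential readahead: after two consecutive block reads,
    fetch the next `depth` blocks ahead of need."""

    def __init__(self, fetch_block, depth=2):
        self.fetch_block = fetch_block  # callable: block number -> data
        self.depth = depth              # how many blocks to stay ahead
        self.cache = {}                 # prefetched blocks not yet consumed
        self.last = None                # most recently requested block
        self.demand_fetches = 0         # reads the application had to wait for
        self.prefetches = 0             # reads issued in advance of need

    def read(self, n):
        if n in self.cache:
            data = self.cache.pop(n)    # hit: the block was prefetched earlier
        else:
            data = self.fetch_block(n)  # miss: the application stalls here
            self.demand_fetches += 1
        # Predict sequential access only when block n follows block n-1.
        if self.last is not None and n == self.last + 1:
            for k in range(n + 1, n + 1 + self.depth):
                if k not in self.cache:
                    self.cache[k] = self.fetch_block(k)
                    self.prefetches += 1
        self.last = n
        return data


def fake_fetch(n):
    # Hypothetical stand-in for a disk or network read.
    return f"block-{n}"


reader = ReadaheadCache(fake_fetch, depth=2)
data = [reader.read(n) for n in range(10)]
```

In this run only the first two reads stall. Every later block is already in the cache when requested, and the two blocks left in the cache at the end (10 and 11) are mispredictions that cost bandwidth without being used.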

Can Prefetching Hurt Performance?

Prefetching “trades bandwidth for latency”.
• Need some bandwidth to trade…
Mispredictions impose a cost.
How deeply should we prefetch?
• Prefetching requires memory for the prefetch buffer.
• Must prefetch deeply enough to absorb bursts.
• How much do I need to avoid stalls?
Fixed-depth vs. variable depth
• Forestall

File Block Buffer Cache

HASH(vnode, logical block)
Buffers with valid data are retained in memory in a buffer cache or file cache.
Each item in the cache is a buffer header pointing at a buffer.
Blocks from different files may be intermingled in the hash chains.
System data structures hold pointers to buffers only when I/O is pending or imminent.
- busy bit instead of refcount
- most buffers are “free”
Most systems use a pool of buffers in kernel memory as a staging area for memory<->disk transfers.

Why Are File Caches Effective?

1. Locality of reference: storage accesses come in clumps.
• spatial locality: If a process accesses data in block B, it is likely to reference other nearby data soon (e.g., the remainder of block B).
  example: reading or writing a file one byte at a time
• temporal locality: Recently accessed data is likely to be used again.
2. Read-ahead: if we can predict what blocks will be needed soon, we can prefetch them into the cache.
• most files are accessed sequentially

I/O Caching vs. Memory Caches

Associativity
software to track references
variable-cost backing storage (e.g., rotational)
what's different from paging?
• but don't need to sample to track references
Also: access properties are different

I/O Block Caching: When, What, Where?

[Figure label: Page/block cache]

Question: should I/O caching be the responsibility of the kernel?
…or…
Can/should we push it up to the application level?
(Be sure you understand the tradeoffs.)

Replacement

What’s the right cache replacement policy for sequentially accessed files?
How is replacement different from virtual memory page cache management?
How to control the impact of deep prefetching on the cache?
• Integrated caching and prefetching
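The buffer cache keyed by HASH(vnode, logical block), together with the replacement question for sequentially accessed files, can be illustrated with a short sketch. The slides do not name a replacement policy, so LRU is an assumption here, as are the class name `BufferCache`, the `read_block` callback, and the block counts. A Python dictionary stands in for the hash chains.

```python
from collections import OrderedDict


class BufferCache:
    """Toy block cache keyed by (vnode, logical block), with LRU replacement.
    LRU is an assumed policy for illustration, not one named by the slides."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # callable: (vnode, lblk) -> data
        self.buffers = OrderedDict()   # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, vnode, lblk):
        key = (vnode, lblk)            # blocks of different files share one table
        if key in self.buffers:
            self.buffers.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.buffers[key]
        self.misses += 1
        data = self.read_block(vnode, lblk)
        self.buffers[key] = data
        if len(self.buffers) > self.capacity:
            self.buffers.popitem(last=False)  # evict the least recently used
        return data


def scan_twice(file_blocks, capacity):
    """Read a file sequentially twice and report (hits, misses)."""
    cache = BufferCache(capacity, lambda vnode, lblk: f"{vnode}:{lblk}")
    for _ in range(2):
        for lblk in range(file_blocks):
            cache.get("vnode-A", lblk)
    return cache.hits, cache.misses


fits = scan_twice(file_blocks=4, capacity=4)
floods = scan_twice(file_blocks=5, capacity=4)
```

When the file fits in the cache, the second pass hits on every block. When the file is one block larger than the cache, LRU evicts each block just before it is needed again and the second pass misses on every block, which is one reason sequential access raises the replacement question posed above.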

Handling Updates in the File Cache

1. Blocks may be modified in memory once they have been brought into the cache.
   Modified blocks are dirty and must (eventually) be written back.
2. Once a block is modified in memory, the write back to disk may not be immediate (synchronous).
• Delayed writes absorb many small updates with one disk write.
  How long should the system hold dirty data in memory?
• Asynchronous writes allow overlapping of computation and disk update activity (write-behind).
  Do the write call for block n+1 while transfer of block n is in progress.
• Thus file caches also can improve performance for writes.

Write-Behind

This is write-behind.
Prediction? Performance? Memory cost? Reliability?

Delayed Writes

Block N, byte range i, j, k.
[Figure labels: Block N, Block N, Block N; Block N]
This is a delayed write strategy.
Prediction? Performance? Memory cost? Reliability?

Write Batching/Gathering

Block N, N+1, N+2.
[Figure labels: Block N, Block N+1, Block N+2; Block N to N+2]
This combines delayed write and write-behind.
Prediction? Performance? Memory cost? Reliability?

Exploiting Asynchrony in Writes

Advantages:
• Absorb multiple writes to the same block.
• Batch consecutive writes to a single contiguous transfer.
• Blocks often “die” in memory if file is removed after write.
• Give more latitude to the disk scheduler to reorder writes for best disk performance.
Disadvantages:
• Data may be lost in a failure.
• Writes may complete out of order.
  What is the state of the disk after a failure?
• When to execute writes?
  sync daemon with flush-on-close
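The first two advantages listed above (absorbing repeated writes to one block, and batching consecutive blocks into one contiguous transfer) can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `WriteBehindCache`, the `write_run` callback, and the 8-byte block size are hypothetical, and new blocks start zero-filled rather than being read from disk first.

```python
class WriteBehindCache:
    """Toy delayed-write cache: absorbs writes in memory, then flushes
    runs of consecutive dirty blocks as single transfers."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.dirty = {}  # block number -> bytearray holding the block contents

    def write(self, block, offset, data):
        if offset < 0 or offset + len(data) > self.block_size:
            raise ValueError("write must fit inside one block")
        # ASSUMPTION: a block not yet cached starts as zeros; a real cache
        # would read the block first before applying a partial update.
        buf = self.dirty.setdefault(block, bytearray(self.block_size))
        buf[offset:offset + len(data)] = data  # absorbed in memory, no I/O yet

    def flush(self, write_run):
        """Call write_run(start_block, payload) once per run of consecutive
        dirty blocks and return the number of transfers issued."""
        blocks = sorted(self.dirty)
        transfers = 0
        i = 0
        while i < len(blocks):
            j = i
            while j + 1 < len(blocks) and blocks[j + 1] == blocks[j] + 1:
                j += 1  # extend the run while block numbers are consecutive
            payload = b"".join(bytes(self.dirty[b]) for b in blocks[i:j + 1])
            write_run(blocks[i], payload)
            transfers += 1
            i = j + 1
        self.dirty.clear()
        return transfers


cache = WriteBehindCache(block_size=8)
cache.write(7, 0, b"ab")   # three small updates to block 7 ...
cache.write(7, 2, b"cd")
cache.write(7, 4, b"ef")
cache.write(8, 0, b"x")    # ... then blocks 8 and 9, adjacent to block 7
cache.write(9, 0, b"y")
cache.write(20, 0, b"z")   # an isolated block
issued = []
count = cache.flush(lambda start, payload: issued.append((start, payload)))
```

Six write calls become two transfers here: one covering blocks 7 to 9 and one for block 20. The sketch also shows the cost named under disadvantages, since everything held in `dirty` would be lost if a failure occurred before `flush` ran.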
