Truly Non-blocking Writes (PowerPoint PPT Presentation)


  1. Truly Non-blocking Writes. Luis Useche², Ricardo Koller², Raju Rangaswami², Akshat Verma¹. ¹IBM Research, India; ²School of Computing and Information Sciences, College of Engineering and Computing. HotStorage Workshop, 2011.

  2. Introduction. Memory access granularity is smaller than the disk's.

  3-11. Introduction (animation build). Because memory access granularity is smaller than the disk's, a write to an out-of-core page requires a full page fetch before it can complete. The diagram builds up the blocking sequence between the process, the OS, and the backing store: 1. Write (✗, target page not in memory); 2. Miss (page fault); 3. The OS issues the page fetch to the backing store; 4. The fetch completes and the page is now in memory; 5. Control returns to the process; 6. The write is retried and succeeds (✔). For writes: why wait for data that the application doesn't need?
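
  To make the blocking sequence above concrete, here is a minimal C-style sketch of how a conventional fault handler would service such a write fault. It is an illustration only; helper names such as alloc_frame, fetch_page_from_store, and map_page are hypothetical, not any real kernel API.

      /* Conventional (blocking) handling of a write fault on a non-resident page.
       * Sketch only: alloc_frame(), fetch_page_from_store() and map_page() are
       * hypothetical helpers, not a real kernel interface. */
      void write_fault_blocking(struct vmarea *vma, unsigned long addr)
      {
          struct frame *f = alloc_frame();              /* new physical frame      */
          /* Steps 3-4: issue the read and block until the whole page arrives.     */
          fetch_page_from_store(vma, addr, f);
          /* Step 5: map the now-populated page and return to the application.     */
          map_page(vma, addr, f, /* writable = */ 1);
          /* Step 6: the faulting store is re-executed and succeeds.               */
      }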

  12-20. Non-blocking Writes: Basic Approach (animation build). Instead of blocking the process, the OS captures the write and defers the fetch: 1. Write (✗, target page not in memory); 2. Miss (page fault); 3. The written data is buffered as a patch; 4. The OS issues the page fetch; 5. Control returns to the process immediately; 6. The fetch completes; 7. The patch is merged into the fetched page. Benefits: 1. Application execution time reduction; 2. Increased backing store bandwidth usage.
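
  A minimal sketch of this basic approach follows, again with hypothetical helper names rather than the authors' code. It assumes the written data and its length are known at fault time (the supervised case discussed in the Solution Challenges slides): the bytes are queued as a patch, the fetch is issued asynchronously, and the patches are applied when the page arrives.

      /* Non-blocking write: buffer the write as a patch, fetch the page
       * asynchronously, merge on completion.  All names are hypothetical. */
      struct patch {
          size_t        offset;          /* byte offset of the write in the page */
          size_t        len;             /* number of bytes written              */
          unsigned char data[64];        /* the written bytes (small write)      */
          struct patch *next;            /* a page may collect several patches   */
      };

      void write_fault_nonblocking(struct vmarea *vma, unsigned long addr,
                                   const void *src, size_t len)
      {
          struct frame *f = alloc_frame();
          patch_add(f, addr % PAGE_SIZE, src, len);           /* 3: buffer        */
          fetch_page_async(vma, addr, f, merge_on_complete);  /* 4: issue         */
          /* 5: return to the application without waiting for the fetch.          */
      }

      void merge_on_complete(struct vmarea *vma, unsigned long addr, struct frame *f)
      {
          for (struct patch *p = f->patches; p != NULL; p = p->next)
              memcpy(frame_data(f) + p->offset, p->data, p->len);   /* 7: merge   */
          map_page(vma, addr, f, /* writable = */ 1);
      }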

  21-23. Motivation → Higher Fault Rates. Memory is over-committed in virtualized environments; more processes run concurrently on multi-core machines and in virtualized environments; and the memory hierarchy is moving towards a more active and faster backing store.

  24-27. Motivation → % Non-blocking Faults. We calculate the percentage of faults that can benefit in all our workloads: Image Processing (rendering of SVG images), Developer (unit and performance testing), and Server (application, database, and mail server). Method: a simulator driven by full-system memory traces, with RAM set to 50% of the application footprint. Result: up to 80% of page faults can benefit. [Bar chart: % of non-blocking faults per workload (Image Proc, Developer, Server); y-axis 0-100%.]

  28. Related Work. Alternatives to non-blocking writes: Perfect DRAM provisioning (unpredictable or unbounded); Prefetching (can incur false positives and false negatives); Asynchronous system calls (1. do not work with memory-mapped pages, 2. written data is not immediately available for reading).

  29-35. Solution Challenges (animation build). A write reaches the OS in one of two ways: as a supervised write, i.e. a write(buf, nbytes, dest addr) system call, or as an unsupervised write, i.e. a store instruction that only raises fault(dest addr). In both cases the OS must build a patch { new buf, nbytes, dest addr }. Information available per non-blocking write:
      Information   | Supervised write() | Unsupervised fault
      Write offset  | ✔                  | ✔
      Data written  | ✔                  | ✗
      Size of data  | ✔                  | ✗
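
  A short sketch of the two entry points, with hypothetical helper names: a supervised write() carries everything the table asks for, while an unsupervised fault supplies only the destination address, so the data and its size must be recovered separately (see the next slide).

      /* Supervised write: the system call already carries offset, data and size,
       * so the patch can be created directly (patch_create() is hypothetical). */
      void on_supervised_write(const void *buf, size_t nbytes, unsigned long dest_addr)
      {
          patch_create(dest_addr % PAGE_SIZE,   /* write offset */
                       buf,                     /* data written */
                       nbytes);                 /* size of data */
      }

      /* Unsupervised write: only the faulting address is known; the written data
       * and its size must be recovered by one of the techniques on slides 36-38. */
      void on_unsupervised_fault(unsigned long dest_addr)
      {
          unsigned long offset = dest_addr % PAGE_SIZE;   /* known            */
          /* data written: unknown here; size of data: unknown here           */
          (void)offset;
      }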

  36-38. Handling Unsupervised Writes (animation build). Three ways to recover the missing write information, with trade-offs in speed, architecture independence, and memory use:
      Full-feature hardware: the page fault itself reports the needed write information. Fast ✔, all architectures ✗, low memory ✔.
      Opcode disassembly: decode the faulting store instruction (e.g. sw $t1, 0xff) to recover the offset, the data, and the size (4 bytes in the example). Fast ✔, all architectures ✗, low memory ✔.
      Page diff-merge: apply the write to a 0-buffer and a 1-buffer, then diff them against the fetched disk page to produce the updated page. Fast ✗, all architectures ✔, low memory ✗.
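
  The page diff-merge row is the least obvious, so here is a small, self-contained C sketch of one way to read it (my interpretation, not code from the paper): the faulting store is applied to a zero-filled and a one-filled scratch buffer; any bit where the two buffers now agree was written by the application, and any bit where they still differ is untouched and is taken from the page fetched from disk.

      #include <stdint.h>
      #include <string.h>
      #include <stdio.h>

      #define PAGE_SIZE 4096

      /* Merge step of the 0-buffer/1-buffer diff-merge.  zbuf started as all
       * zero bits, obuf as all one bits, and the application's store has been
       * applied to both; disk is the page fetched from the backing store. */
      void diff_merge(uint8_t *out, const uint8_t *disk,
                      const uint8_t *zbuf, const uint8_t *obuf)
      {
          for (size_t i = 0; i < PAGE_SIZE; i++) {
              uint8_t untouched = zbuf[i] ^ obuf[i];   /* 1-bits: never written   */
              out[i] = (disk[i] & untouched) | (zbuf[i] & ~untouched);
          }
      }

      int main(void)
      {
          static uint8_t zbuf[PAGE_SIZE];              /* starts all 0s            */
          static uint8_t obuf[PAGE_SIZE];              /* starts all 1s            */
          static uint8_t disk[PAGE_SIZE], out[PAGE_SIZE];
          memset(obuf, 0xff, sizeof obuf);
          memset(disk, 0xab, sizeof disk);             /* pretend on-disk contents */

          /* The application's write lands in both scratch buffers.               */
          memcpy(zbuf + 100, "hello", 5);
          memcpy(obuf + 100, "hello", 5);

          diff_merge(out, disk, zbuf, obuf);
          printf("%.5s %02x\n", (const char *)out + 100, out[99]);   /* hello ab  */
      }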

  39. Quantifying Benefits. 1. Fraction of non-blocking write faults ✔; 2. Outstanding write faults (over time); 3. Savings in execution time (new!). Virtual memory simulator: input is the RAM size and full-system memory traces, output is performance statistics. Memory size set to 50% of the workload footprint; creating patches is not required.

  40. Quantifying Benefits → Metric. How do we measure the additional parallelism? Outstanding Write Faults (OWF): the number of parallel write faults at any time. OWF ≤ OIO; OWF ≤ 1 for single-threaded applications; OWF ≥ 0 when using non-blocking writes. We need the variations over time as well, so we report E[OWF], the time-weighted average OWF. [Bar chart: E[OWF] per workload (Image Proc, Developer, Server); y-axis 0-50.]
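
  For concreteness, a time-weighted average of OWF can be computed as below. This is an illustrative snippet with a made-up trace, not the paper's simulator.

      #include <stdio.h>

      /* Time-weighted average of Outstanding Write Faults (E[OWF]): each observed
       * OWF level is weighted by how long it was held. */
      struct sample { double t; int owf; };       /* OWF level starting at time t */

      double e_owf(const struct sample *s, int n, double t_end)
      {
          double area = 0.0;
          for (int i = 0; i < n; i++) {
              double until = (i + 1 < n) ? s[i + 1].t : t_end;
              area += s[i].owf * (until - s[i].t);
          }
          return area / (t_end - s[0].t);
      }

      int main(void)
      {
          /* 0 outstanding write faults for 1s, then 3 for 2s, then 1 for 1s.     */
          struct sample trace[] = { {0.0, 0}, {1.0, 3}, {3.0, 1} };
          printf("E[OWF] = %.2f\n", e_owf(trace, 3, 4.0));   /* (0+6+1)/4 = 1.75  */
      }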

  41. Quantifying Benefits → Time Reduction. These results are not in the paper. Execution time = trace time + synchronous read time; the write time of dirty pages on eviction is ignored, so this is a rough estimate with an error proportional to the number of dirty pages evicted. [Bar chart: % execution time decrease per workload (Image Proc, Developer, Server); y-axis 0-60%.]

  42. Conclusions and Future Work. We presented non-blocking writes, a technique to eliminate read-before-writes: it reduces execution time and increases device usage. We estimate execution-time reductions of 0.1-54%. In the future we plan to implement non-blocking writes to better study their implications, including which workloads benefit from them.

  43. Questions?
