
slide-1
SLIDE 1

Truly Non-blocking Writes

Luis Useche (2), Ricardo Koller (2), Raju Rangaswami (2), Akshat Verma (1)

(1) IBM Research, India
(2) School of Computing and Information Sciences, College of Engineering and Computing

HotStorage Workshop, 2011

1 / 13

slide-11
SLIDE 11

Introduction

◮ Memory access granularity is smaller than disk’s
  ⇒ Writes to an out-of-core page require a full page fetch.

[Diagram: Process, OS, Backing Store; the fetched page holds 110 101 001]

  • 1. Write(✗)
  • 2. Miss
  • 3. Issue
  • 4. Complete
  • 5. Return
  • 6. Write(✔)

For writes: why wait for data that the application doesn’t need?

2 / 13

slide-20
SLIDE 20

Non-blocking Writes: Basic Approach

Process OS Backing Store

110 101 001

  • 1. Write(✗)
  • 2. Miss

Patch

  • 3. Buffer
  • 4. Issue
  • 5. Return
  • 6. Complete
  • 7. Merge

Benefits

  • 1. Application execution time reduction
  • 2. Increased backing store bandwidth usage
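
The buffer-and-merge steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Patch`, `PendingPage`, `buffer_write`, and `merge` are hypothetical, a page is modelled as a `bytes` object, and patches are assumed to fit inside the page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Patch:
    """One buffered write: where it lands in the page and what was written."""
    offset: int
    data: bytes


@dataclass
class PendingPage:
    """Hypothetical per-page state while the page fetch is still in flight."""
    patches: List[Patch] = field(default_factory=list)

    def buffer_write(self, offset: int, data: bytes) -> None:
        # Step 3 (Buffer): record the write instead of blocking on the fetch.
        self.patches.append(Patch(offset, bytes(data)))

    def merge(self, fetched: bytes) -> bytes:
        # Step 7 (Merge): replay patches in arrival order over the fetched
        # page, so later writes win where patches overlap.
        page = bytearray(fetched)
        for p in self.patches:
            page[p.offset:p.offset + len(p.data)] = p.data
        return bytes(page)


pending = PendingPage()
pending.buffer_write(2, b"XY")   # the process returns immediately (step 5)
pending.buffer_write(3, b"Z")    # a later write overlapping the first
merged = pending.merge(b"abcdefgh")  # the fetch completes (step 6), then merge
```

Replaying patches in arrival order is one simple way to keep overlapping writes consistent; a real system would also have to serve reads of already-patched bytes, which this sketch leaves out.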

3 / 13

slide-23
SLIDE 23

Motivation → Higher Fault Rates

◮ Memory is over-committed in virtualized environments
◮ More processes are running with multi-core and virtualized environments
◮ The memory hierarchy is moving towards a more active and faster backing store

4 / 13

slide-27
SLIDE 27

Motivation → % Non-blocking faults

◮ We calculate the % of faults that can benefit in all our workloads:
    Image Processing: Rendering of SVG images
    Developer: Unit and performance testing
    Server: Application, database, and mail server
◮ Simulator with full-system memory traces.
◮ RAM set to 50% of app footprint
◮ Up to 80% of page faults benefit

[Figure: % Non-Block Faults (y-axis, 20 to 100) by Workload: Image Proc, Developer, Server]

5 / 13

slide-28
SLIDE 28

Related Work

Alternatives to non-blocking writes:

Perfect DRAM Provision: Unpredictable or unbounded.
Prefetching: Can incur false positives and false negatives.
Asynchronous System Calls:
  • 1. Do not work with memory mapped pages
  • 2. Written data not immediately available for reading

6 / 13

slide-35
SLIDE 35

Solution Challenges

Process
  OS call:    write(buf, nbytes, dest addr)
  Store Inst: fault(dest addr)

patch{new buf, nbytes, dest addr}

Information Per Non-blocking Write

Information     Supervised (write())   Unsupervised (Fault)
Write Offset    ✔                      ✔
Data Written    ✔                      ✗
Size of Data    ✔                      ✗

7 / 13

slide-38
SLIDE 38

Handling Unsupervised Writes

Approach                Description                                       Fast   All Arch?   Low Mem?
Full Feature Hardware   fault()                                           ✔      ✗           ✔
Opcode Disassembly      sw $t1, 0xff: 4 bytes, data, offset               ✔      ✗           ✔
Page Diff-Merge         Disk Page or 0-buffer and 1-buffer, Updated Page  ✗      ✔           ✗

8 / 13

slide-39
SLIDE 39

Quantifying Benefits

  • 1. Fraction of non-blocking write faults ✔
  • 2. Outstanding write faults (over time)
  • 3. Savings in execution time (new!)

Virtual Memory Simulator
  Input: RAM size & Full System Memory Traces
  Output: Performance statistics

◮ Memory size set to 50% of each workload’s footprint
◮ Creating patches is not required

9 / 13

slide-40
SLIDE 40

Quantifying Benefits → Metric

◮ How to measure the additional parallelism?
◮ Outstanding Write Faults (OWF): # of parallel write faults at any time
    OWF ≤ OIO
    OWF ≤ 1 for single threaded applications
    OWF ≥ 0 when using non-blocking writes
◮ We need the variations over time as well
◮ E[OWF]: time-weighted average OWF

[Figure: E[OWF] (y-axis, 10 to 50) by Workload: Image Proc, Developer, Server]
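
One way to read "time-weighted average OWF" is as the integral of the OWF count over time divided by the trace duration. The sketch below assumes the simulator emits `(timestamp, owf)` samples and that each value holds until the next sample; the function name `expected_owf` and the sample format are illustrative assumptions, not taken from the slides.

```python
def expected_owf(samples, end_time):
    """Time-weighted average of a piecewise-constant OWF signal.

    samples: list of (timestamp, owf) pairs sorted by timestamp; each owf
    value is assumed to hold until the next sample (or until end_time).
    """
    if not samples:
        raise ValueError("need at least one sample")
    start = samples[0][0]
    if end_time <= start:
        raise ValueError("end_time must be after the first sample")

    # Each sample's interval ends at the next sample's timestamp,
    # and the last interval ends at end_time.
    boundaries = [t for t, _ in samples[1:]] + [end_time]
    weighted = 0.0
    for (t, owf), t_next in zip(samples, boundaries):
        weighted += owf * (t_next - t)
    return weighted / (end_time - start)


# 0 outstanding faults for 2 time units, 3 for 4 units, 1 for 4 units.
avg = expected_owf([(0, 0), (2, 3), (6, 1)], end_time=10)
```

Here the weighted sum is 0*2 + 3*4 + 1*4 = 16 over a duration of 10, so a short burst of parallel faults contributes in proportion to how long it lasts.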

10 / 13

slide-41
SLIDE 41

Quantifying Benefits → Time Reduction

◮ These results are not in the paper
◮ Execution time = Trace time + Synchronous read time
◮ Write time of dirty page on evictions ignored
◮ Rough estimate: error proportional to the number of dirty pages evicted

[Figure: % Exec. Time Decrease (y-axis, 20 to 60) by Workload: Image Proc, Developer, Server]

11 / 13

slide-42
SLIDE 42

Conclusions and Future Work

◮ We presented non-blocking writes: a technique to eliminate read-before-writes
    Reduced execution time
    Increased device usage
◮ We estimate execution time reductions of 0.1-54%
◮ In the future, we plan to implement non-blocking writes to better study their implications

What workloads benefit from Non-blocking writes?

12 / 13

slide-43
SLIDE 43

Questions?

13 / 13

slide-44
SLIDE 44

Virtual Memory Simulator

Input: RAM size & Mem Traces
Output:
  Per Entry: Timestamp and event (hit, miss, evict)
  Global: Performance stats.

◮ Writes to out-of-core pages considered non-blocking
◮ Non-blocking status revoked when:

  • 1. The page is read before I/O completion
  • 2. The page is evicted before I/O completion
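
The two revocation rules above can be sketched as a small trace-replay loop. Everything here is an assumption made for illustration: the event format `(time, kind, page)`, the event kinds `"write_miss"`, `"read"`, and `"evict"`, a fixed `io_latency`, and the choice to count fetches still pending at the end of the trace as non-blocking. The slides do not describe the simulator at this level of detail.

```python
def count_non_blocking(events, io_latency):
    """Return (non_blocking, total) write misses for a time-sorted trace."""
    pending = {}          # page -> time at which its fetch completes
    non_blocking = 0
    total = 0

    for time, kind, page in events:
        # Retire fetches that completed before this event; their write
        # misses kept non-blocking status.
        for done in [p for p, t in pending.items() if t <= time]:
            del pending[done]
            non_blocking += 1

        if kind == "write_miss":
            if page not in pending:      # further writes only add patches
                total += 1
                pending[page] = time + io_latency
        elif kind in ("read", "evict") and page in pending:
            # Rules 1 and 2: read or evicted before I/O completion.
            del pending[page]

    # Assumption: fetches still in flight at trace end were never revoked.
    non_blocking += len(pending)
    return non_blocking, total


trace = [
    (0, "write_miss", "A"),
    (1, "write_miss", "B"),
    (2, "read", "B"),        # B is read before its fetch completes: revoked
    (10, "read", "A"),       # A's fetch finished at t=5: stays non-blocking
    (11, "write_miss", "C"),
    (12, "evict", "C"),      # C is evicted before its fetch completes: revoked
]
result = count_non_blocking(trace, io_latency=5)
```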

11 / 13

slide-45
SLIDE 45

Quantifying Benefits → Full System Memory Traces

Modified x86 software-MMU QEMU to log all memory accesses:
◮ Instruction count, CR3, virtual/physical address, access-mode, page privileges.

Workloads
Type        #    Footprint Avg/Std (MB)
Server      10   294/158
Developer   4    269/183
Image       1    149/0

12 / 13

slide-52
SLIDE 52

Solution Approaches → Page Diff-Merge

  • 1. Write in two pages: 0-page and 1-page.
  • 2. Merge with AND and OR.

Process
  • 1. Write: 101

  • 2. Write into both pages
       1-page: 111 101 111
       0-page: 000 101 000

Backing Store
  • 3. Complete: 101 110 011

And (with 1-page): 101 100 011
Or (with 0-page):  101 101 011

13 / 13
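
The AND/OR merge on this slide can be checked with a short sketch. It assumes the process's stores land in both buffers, and that the final page is `(disk AND 1-page) OR 0-page`, which matches the bit patterns shown; the class name `DiffMergePage` and its methods are hypothetical, and each 3-bit group from the slide is stored in one byte for readability.

```python
class DiffMergePage:
    """Two shadow buffers that absorb writes while the disk page is fetched."""

    def __init__(self, size):
        self.zero_page = bytearray(size)             # starts as all 0 bits
        self.one_page = bytearray(b"\xff" * size)    # starts as all 1 bits

    def write(self, offset, data):
        # The same bytes are stored in both buffers. No offset or length
        # has to be recorded: written bits differ from at least one fill.
        self.zero_page[offset:offset + len(data)] = data
        self.one_page[offset:offset + len(data)] = data

    def merge(self, disk_page):
        # AND with the 1-page clears bits the process wrote as 0;
        # OR with the 0-page sets bits the process wrote as 1.
        # Untouched bits (1 in the 1-page, 0 in the 0-page) keep disk values.
        return bytes(
            (d & o) | z
            for d, o, z in zip(disk_page, self.one_page, self.zero_page)
        )


page = DiffMergePage(3)
page.write(1, bytes([0b101]))                    # steps 1 and 2 on the slide
disk = bytes([0b101, 0b110, 0b011])              # step 3: fetch completes
updated = page.merge(disk)
```

The cost is two page-sized buffers per pending page, which is consistent with the approach being marked ✗ under "Low Mem?" in the comparison of approaches.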