SLIDE 1

Chungbuk National University

Reducing Journaling Harm on Virtualized I/O systems

Eunji Lee, Hyokyung Bahn, Minseong Jeong, Sunghwan Kim, Jesung Yeon, Seunghoon Yoo, Sam H. Noh, Kang G. Shin

ACM/USENIX SYSTOR '16, June 6-8

SLIDE 2

Virtualization in Computer Systems

- Widely used in modern computer systems
  • From personal computing devices to cloud servers
- Provides flexibility, scalability, and energy savings
  • Decouples the software platform from the underlying hardware
- Accompanied by inefficiencies
  • Additional software layers
- Alternative "bare-metal" approach: para-virtualization
  • Requires modifying the guest OS
- 80% of cloud servers rely on full-virtualization hypervisors
  • VMware, Hyper-V, and QEMU-KVM

http://www.infoq.com/news/2012/10/Survey-Virtualization-Cloud

SLIDE 3

Our Work in Brief

- Challenge
  • Improve the efficiency of virtualization without compromising transparency
  • The layered SW stack is more painful on high-speed storage
- Guest journaling harms I/O performance in a virtualized environment
- Analyze the effectiveness of journaling and reduce the overhead with a new caching strategy
- The proposed strategy is implemented in QEMU-KVM and improves I/O performance by 3-32% on file and key-value store benchmarks

SLIDE 4-8

I/O Stack in Full Virtualization

- Both guest and host have their own file systems and buffer caches
- Guest I/O goes through the host cache
  • (-) Redundant data caching
  • (+) Large shared cache
  • (+) Buffering and merging effects
- Delivers better performance
  • Locality
  • Asynchronous writes
- What about high-speed storage?
  • The additional memory copy is more painful on high-speed storage (SSD)

[Figure: on a read of "A", the block is cached in both the guest's and the host's buffer cache]

SLIDE 9

Just Bypass a Host Cache!

- Performance comparison of using the host cache (Writeback) and bypassing it (Direct) on an SSD (e.g., QEMU's cache=writeback vs. cache=none drive options)

SLIDE 10

Just Bypass a Host Cache – NO!

- Using the host cache delivers 2.7x and 1.7x better performance on HDD and SSD, respectively, on average

[Chart: benefit of the host cache]

SLIDE 11

Journaling Effects in Virtualization

- A bit on journaling
  • Used to ensure data consistency in file systems
  • Ext4, JFS, ReiserFS, etc.
  • Writes new data to a journal area first, and updates the original data in its permanent file location only if logging succeeds
- Case of Ext4

[Figure: dirty data in the page cache is committed to the journal area every 5 s, then checkpointed to the file system]
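To make the mechanism concrete, here is a minimal C sketch of write-ahead journaling. All types and helpers (journal_append, journal_commit, checkpoint) are illustrative stubs, not ext4/jbd2 internals:

```c
/* Minimal write-ahead journaling sketch; helpers are illustrative stubs. */
struct buf;                           /* a dirty block in the page cache   */
void journal_append(struct buf *b);   /* log the block to the journal area */
int  journal_commit(void);            /* commit record + FLUSH; 0 = ok     */
void checkpoint(struct buf *b);       /* in-place update of the home copy  */

/* Runs periodically (ext4's default commit interval is 5 s). */
void commit_transaction(struct buf *dirty[], int n)
{
    for (int i = 0; i < n; i++)
        journal_append(dirty[i]);     /* 1. new data goes to the journal   */
    if (journal_commit() != 0)
        return;                       /* logging failed: home copy intact  */
    for (int i = 0; i < n; i++)
        checkpoint(dirty[i]);         /* 2. update the permanent location  */
}
```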

SLIDE 12

Harmful I/O Traffic by Journaling

- No locality: journal blocks are not accessed again unless the system crashes

[Figure: journal blocks (J) pass through the host buffer cache but are never hit]

SLIDE 13

Harmful I/O Traffic by Journaling

- Synchronous writes: a FLUSH command follows right after journaling, so immediate synchronization leaves no buffering effect

[Figure: Ext4 ordered-mode timeline: associated data blocks (D), then journal blocks (J) and a commit block (C), each followed by a FLUSH; the FLUSH after the journal data can be skipped]

SLIDE 14

Harmful I/O Traffic by Journaling

- Large footprint: the journal area is written completely sequentially in a large loop

[Figure: journal blocks (J) written in a large sequential loop evict useful data from the host buffer cache (cache pollution)]

SLIDE 15

Analyzing Journal Accesses

- Journal traffic accounts for 19% of total traffic on average, and up to 47%

SLIDE 16

Analyzing Journal Accesses

- The footprint of journal accesses accounts for 45.2% of the total footprint on average, and up to 84.8%

SLIDE 17

Analyzing Journal Accesses

- 86% of all sync operations are associated with journaling, on average
SLIDE 18

Pollution Defensive Caching (PDC)

- Filter out journal traffic from the host cache
- Two challenges
  • How to distinguish journal traffic from other traffic
  • How to convey this information to the host OS, so that it can decide whether or not to cache the data
- Approaches must still preserve transparency

[Figure: the guest OS issues journal (J) and file (F) blocks; the hypervisor and host OS write the journal data directly to storage, bypassing the host cache]

SLIDE 19

1. How to Identify Journal Traffic

- Implicit journal traffic detection
  • Maintain access flows, each with first and last LBAs, in a hash table
  • Monitor whether an incoming request extends a consecutive address range
  • Regard a range where consecutive writes form a large loop as the journal area (see the sketch below)

[Figure: timeline of the prediction period: implicit detection first, then explicit knowledge of the journal area]
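A minimal C sketch of this loop-detection idea, with a fixed-size flow table standing in for the paper's hash table; MAX_FLOWS and MIN_LOOP_LEN are assumed thresholds, not the paper's parameters:

```c
/* Illustrative sketch of implicit journal detection: track sequential
 * write flows and treat a flow that wraps back to its start as the
 * journal loop. */
#include <stdbool.h>
#include <stdint.h>

#define MAX_FLOWS    64
#define MIN_LOOP_LEN 1024   /* blocks: only a large loop counts as a journal */

struct flow { uint64_t first_lba, last_lba; bool active; };
static struct flow flows[MAX_FLOWS];

static uint64_t journal_start, journal_end;
static bool     journal_known;

/* Called on every guest write; returns true if it is journal traffic. */
bool classify_write(uint64_t lba, uint32_t nblocks)
{
    if (journal_known)
        return lba >= journal_start && lba < journal_end;

    for (int i = 0; i < MAX_FLOWS; i++) {
        struct flow *f = &flows[i];
        if (!f->active)
            continue;
        if (lba == f->last_lba) {            /* consecutive: extend the flow */
            f->last_lba += nblocks;
            return false;
        }
        if (lba == f->first_lba &&           /* wrapped around a large loop  */
            f->last_lba - f->first_lba >= MIN_LOOP_LEN) {
            journal_start = f->first_lba;
            journal_end   = f->last_lba;
            journal_known = true;
            return true;
        }
    }
    for (int i = 0; i < MAX_FLOWS; i++)      /* otherwise start a new flow   */
        if (!flows[i].active) {
            flows[i] = (struct flow){ lba, lba + nblocks, true };
            break;
        }
    return false;
}
```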

SLIDE 20

2. How to Communicate with the Host OS

- The posix_fadvise system call enables user applications to provide explicit hints to the OS
- Implement the POSIX_FADV_NOREUSE flag to bypass the host cache
- The host operating system commences direct I/O when this flag is set, and switches back to buffered I/O when it receives another call with the POSIX_FADV_NORMAL flag (see the sketch below)
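A minimal sketch of how the hint could be issued around journal writes. posix_fadvise() and both flags are standard POSIX, but the direct-I/O switch on POSIX_FADV_NOREUSE is the behavior this work adds to the host kernel, not stock Linux; write_journal_block is a hypothetical helper:

```c
/* Sketch: wrap journal writes with explicit cache hints. */
#include <fcntl.h>
#include <unistd.h>

void write_journal_block(int fd, const void *buf, size_t len, off_t off)
{
    /* Hint: this range will not be reused -> host may bypass its cache */
    posix_fadvise(fd, off, (off_t)len, POSIX_FADV_NOREUSE);

    pwrite(fd, buf, len, off);
    fsync(fd);   /* journaling issues a FLUSH right after the write */

    /* Restore normal buffered behavior for subsequent accesses */
    posix_fadvise(fd, off, (off_t)len, POSIX_FADV_NORMAL);
}
```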
SLIDE 21

Performance Evaluation

- Experimental setup

[Figure: guest and host I/O stacks (file system, read/write functions, buffer cache layer, block I/O layer, device driver), connected by system calls and page/buffer and bio/request data structures. PDC in the hypervisor separates journal data from regular data and selects among three host-cache modes: 1. do not use the cache, 2. use the cache with invalidation, 3. use the cache. Development: a modified interface in the I/O path and a selective caching routine]

SLIDE 22

Performance Evaluation

- PDC provides 8-32% higher IOPS than the original caching policy (WB) on an SSD

SLIDE 23

Performance Evaluation

- PDC considerably improves synchronous writes and compaction operations in a key-value store
  • The fillsync and compact operations improve performance by 33% and 18%, respectively

SLIDE 24-25

Cache Hit Ratio

- No significant difference in the hit ratio between the two policies, despite the high ratio of journal data in the footprint
- Journaling periodically writes small updates that are consecutive to previous accesses
- Thus journal traffic has little effect in evicting likely-to-be-accessed data from the host cache

SLIDE 26

Conclusion

- Analyze the journaling effect in fully virtualized systems
- Uncover that journal traffic deteriorates cache performance through synchronous writes with no locality
- Propose a new caching policy: pollution-defensive caching (PDC)
- Implemented in Linux 4.14 and QEMU-KVM
- Improves I/O performance by 3-32% on file and key-value store benchmarks

SLIDE 27

Reducing Write Amplification of Flash Storage through Cooperative Data Management with NVM

32nd International Conference on Massive Storage Systems and Technology (MSST), May 2016
Eunji Lee, Chungbuk National University; Julie Kim, Ewha University; Hyokyung Bahn, Ewha University; Sam H. Noh, UNIST

SLIDE 28

Write Amplification in SSD

- An undesirable phenomenon associated with flash memory
- The number of writes to storage is higher than the number of writes issued by the host
- A key factor limiting the stable performance and endurance of SSDs

[Chart: SSD performance fluctuates over time. Source: Radian Memory Systems]

SLIDE 29

Write Amplification in SSD

- Garbage collection (GC) is performed to recycle used blocks
- Valid pages in a victim block are copied out into a free block

[Figure: the host writes 4 pages (B' F' G' H'), but GC also copies 4 valid pages (A C D E), so flash performs 8 page writes in total: 2x writes, a write amplification factor of 2.0]
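In equation form, using the figure's numbers:

$$\mathrm{WAF} = \frac{\text{writes performed on flash}}{\text{writes issued by the host}} = \frac{4 + 4}{4} = 2.0$$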

SLIDE 30

Workload and WAF Relationship

- Analyze WAF with respect to workload characteristics
- Generate two synthetic workloads using SSDsim

[Charts: WAF (0-7) vs. SSD capacity (4 GB to 64 GB) for random and sequential writes: random writes show up to 6x write amplification, sequential writes show none]

- Random updates disperse the distribution of valid pages

SLIDE 31

Workload and WAF Relationship

- A real workload is a mixture of random and sequential accesses
- Observe WAF while varying the level of randomness

[Chart: WAF from 1.0 to about 1.4 as the random:sequential ratio varies from 100% random through 9:1 ... 1:9 to 100% sequential. Even a low level of randomness amplifies writes, and real workloads lie between the two extremes]

SLIDE 32

NVM Technology

- Becoming increasingly viable as leading semiconductor manufacturers are eagerly investing in it
  • Diablo Technologies Memory1: all-flash system memory module, 4 TB of memory
  • Intel and Micron 3D XPoint: an all-new memory technology, 8-10x denser than DRAM with 1000x lower latency than NAND
- Fast, scalable, and persistent memory is being realized in computer systems

SLIDE 33

Cooperative Data Management (CDM)

- Goal: reduce WAF by taking advantage of the non-volatility of caches
- Using NVM as a storage cache is a promising option

[Figure: a volatile cache vs. a non-volatile cache in front of flash storage, both holding updated pages (B' F' G' H') for blocks on flash]

SLIDE 34

Cooperative Data Management (CDM)

- In traditional systems, all valid pages in a victim block must be copied into a free block during GC
- 4 block writes!

[Figure: with a volatile cache, GC must copy the valid pages (A C D E) from the victim block into a free block]

SLIDE 35

Cooperative Data Management (CDM)

- CDM skips the copying of valid pages in GC if the data exists in the non-volatile cache
- Only one block write!

[Figure: the non-volatile cache holds A, C, B', and D, so GC skips copying A, C, and D from the victim block ("NO"); only E must be copied]

SLIDE 36

Cooperative Data Management (CDM)

- Finite state diagram (see the sketch below)
  • "Removable" state: the page can be erased when its data would otherwise have to be copied into another block; in all other respects it behaves the same as the valid state
  • A page is set to "Removable" when its data is cached in the non-volatile cache

[Figure: flash page state diagram extended with the Removable state]
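A C sketch of how GC could use the Removable state. The data structures and helpers (page/block layout, nvcache_contains, flash_copy_page) are assumptions for illustration, not the paper's implementation:

```c
/* Sketch: during GC, CDM skips copying a valid page whose data also
 * resides in the non-volatile cache, since that copy survives crashes. */
#include <stdbool.h>
#include <stdint.h>

enum page_state { FREE, VALID, INVALID, REMOVABLE };

#define PAGES_PER_BLOCK 64
struct page  { uint64_t lpn; enum page_state state; };
struct block { struct page pages[PAGES_PER_BLOCK]; };

bool nvcache_contains(uint64_t lpn);                 /* assumed NV-cache API */
void flash_copy_page(struct page *src, struct block *dst);

void gc_victim(struct block *victim, struct block *free_blk)
{
    for (int i = 0; i < PAGES_PER_BLOCK; i++) {
        struct page *p = &victim->pages[i];
        if (p->state == VALID)
            flash_copy_page(p, free_blk);            /* must be preserved    */
        else if (p->state == REMOVABLE && !nvcache_contains(p->lpn))
            flash_copy_page(p, free_blk);            /* cache copy is gone   */
        /* REMOVABLE pages still in the NV-cache are simply dropped */
    }
    /* the victim block can now be erased */
}
```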

SLIDE 37

Getting Real: Issues with CDM

- Implementation of CDM raises several issues depending on the architecture
- Feasible architectures
  • NV-cache as a page cache in the host
  • NV-cache as an in-storage cache

[Figure: left, the NV-cache serves as the host's page cache above flash storage; right, the host's page cache sits above a DRAM cache and an NV-cache inside the storage device]

SLIDE 38

1. NV-cache as a host cache

- Issue 1: Consistency
  • Updating data in the cache modifies the final copy
  • A crash during the update results in inconsistent data

[Figure: updating C to C' in place in the non-volatile cache; if a crash occurs mid-update, the only copy of C is left inconsistent]

SLIDE 39

1. NV-cache as a host cache

- Issue 1: Consistency
  • The solution is tied to specific file system implementations
  • Data consistency is managed in the file system layer with different techniques
  • File systems should be redesigned to account for the way CDM handles data

SLIDE 40

Case Study: Ext4 with CDM

- Finite state diagram: introduce additional states for cached data to track whether another copy remains in storage
- Update data with a copy-on-write technique if the cached data serves as the final copy (see the sketch below)
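A C sketch of the copy-on-write rule; the helper names (nvcache_lookup, nvcache_alloc, nvcache_commit_switch) and the cdm_update entry point are hypothetical, illustrating the rule rather than the paper's Ext4 changes:

```c
/* Sketch: if the NV-cached page is the final copy (no copy remains in
 * storage), never update it in place; write a new page and switch. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct cpage {
    uint64_t lpn;
    bool     final_copy;    /* true: no other copy exists in storage */
    char     data[PAGE_SIZE];
};

struct cpage *nvcache_lookup(uint64_t lpn);   /* assumed cache helpers */
struct cpage *nvcache_alloc(uint64_t lpn);
void nvcache_commit_switch(struct cpage *oldp, struct cpage *newp);

void cdm_update(uint64_t lpn, const void *buf)
{
    struct cpage *p = nvcache_lookup(lpn);
    if (!p)
        return;                                /* uncached path not shown  */
    if (p->final_copy) {
        struct cpage *n = nvcache_alloc(lpn);  /* write a new page first   */
        memcpy(n->data, buf, PAGE_SIZE);
        n->final_copy = true;
        nvcache_commit_switch(p, n);           /* atomic switch, free old  */
    } else {
        memcpy(p->data, buf, PAGE_SIZE);       /* safe: storage has a copy */
    }
}
```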

SLIDE 41

1. NV-cache as a host cache

- Issue 2: Communication overhead
  • Events in the cache and in storage must be communicated to each other synchronously
  • e.g., garbage collection, erase of a block, cached-data updates, etc.
- Designing a new interface is no longer a big deal
  • Recent packet-based interfaces like NVMe make it easy to piggyback additional information on the original data
- However, frequent communication to transfer this additional information can be a burden
  • Finding a way to relieve this overhead is left as future work
SLIDE 42

2. NV-cache as an in-storage cache

- A more deployable architecture
- No consistency issue
  • File systems already assume that data in storage can become inconsistent if the system crashes during updates
- No serious communication overhead
  • Sharing information inside a storage device is much cheaper and easier than synchronizing storage with a host cache
- Development can be achieved by a single party, the storage manufacturer
  • No need to change file systems

[Figure: the host's page cache sits above a DRAM cache and an NV-cache inside the flash storage device]

SLIDE 43

Performance Evaluation

- Trace-driven simulations with SSDsim
  • Developed by MSR as an extension of DiskSim
  • Emulates SLC NAND flash
SLIDE 44

Performance Evaluation

- Implement the in-storage NV-cache module and modify the storage controller to support CDM
- Compare with an NVM-basic model
  • Manages NVM like a volatile cache
  • Caches data on access and evicts with an LRU policy
SLIDE 45

Write Amplification Factor

- CDM reduces WAF by 2.1-17.6% and 4.3-38.2% in the JEDEC and OLTP workloads, respectively

SLIDE 46

Response Time

- Response time improves by 9.7% and 20.3% on average in JEDEC and OLTP, respectively

SLIDE 47

Standard Deviation of Response Time

- Reduces the standard deviation of response time by 31% and 39% on average in JEDEC and OLTP, respectively
- Relieves the performance fluctuation

SLIDE 48

Fini