Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks

Shengan Zheng†, Morteza Hoseinzadeh§, Steven Swanson§

† Shanghai Jiao Tong University § University of California, San Diego


Background

  • Non-volatile main memory (NVMM)
    – Byte-addressability
    – Persistence
    – Direct access (DAX)
    – Example devices: DRAM + Flash NVDIMM, 3D XPoint NVDIMM
  • NVMM file systems
    – PMFS, SCMFS, NOVA
    – EXT4-DAX, XFS-DAX
    – Capacity?


Motivation

[Chart: bandwidth (100 MB/s to over 10 GB/s) versus cost ($0.01 to over $10 per GB) for Hard Disk Drive, SATA SSD, NVMe SSD, Optane SSD, NVMM, and DRAM. Each faster tier costs more per GB.]



Tiered Storage System

  • SSD for speed
  • HDD for capacity

Tiered Storage System

  • NVMM for speed
  • Disks (SSD, HDD) for capacity


Ziggurat Overview

  • Intelligent data placement policy
    – Send writes to the most suitable tier
    – High NVMM space utilization
  • Accurate predictors
    – Predict the synchronicity of each file (synchronicity predictor)
    – Predict the size of future writes to each file (write size predictor)
  • Efficient migration mechanism
    – Only migrate cold data in cold files
    – Migrate file data in groups


Outline

  • Motivation
  • Data placement policy
  • Migration mechanism
  • Evaluation
  • Conclusion

Data Placement Policy

  • Although NVMM is the fastest tier in Ziggurat, file writes should not always go to NVMM.
  • The synchronicity predictor and the write size predictor together drive data placement.

[Decision diagram: writes to synchronously-updated files go to NVMM; for asynchronously-updated files, small writes go to NVMM while large writes go to disk.]
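The placement policy above can be sketched as a small decision function. This is an illustrative reconstruction from the slides; the tier names and the 4-block large-write threshold are assumptions, not Ziggurat's kernel code:

```python
# Illustrative sketch of Ziggurat's per-write tier choice.
# LARGE_WRITE_BLOCKS and the string tier names are assumptions.

LARGE_WRITE_BLOCKS = 4  # threshold separating "small" from "large" writes


def choose_tier(is_synchronous: bool, write_blocks: int) -> str:
    """Pick the tier for an incoming write.

    Synchronously-updated files and small writes go to NVMM, where
    persistence is cheap; large writes to asynchronously-updated files
    go to disk, whose sequential bandwidth they can exploit.
    """
    if is_synchronous:
        return "NVMM"
    if write_blocks < LARGE_WRITE_BLOCKS:
        return "NVMM"
    return "disk"


print(choose_tier(True, 8))    # synchronous file  -> NVMM
print(choose_tier(False, 2))   # small async write -> NVMM
print(choose_tier(False, 8))   # large async write -> disk
```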

Synchronicity Predictor

  • Predict whether the future accesses to a file are likely to be synchronous
  • Each write appends a write entry (offset, length) to the file log
  • Count the data blocks written since the last fsync(); fsync() resets the counter
    – fsync() arrives while the counter is below a threshold (4 blocks in this example): the file is classified as synchronously updated
    – The counter reaches the threshold before any fsync(): the file is classified as asynchronously updated

Example:

  • write(0,2); fsync(); write(2,2); fsync(); write(4,2); fsync();
    – Data blocks written at each fsync(): 2 / 4 → Synchronous
  • write(0,2); write(2,2); write(4,2); fsync();
    – Data blocks written climb to 6 / 4 before the fsync() → Asynchronous
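The counting logic above can be sketched in a few lines. This is an illustrative model of the slide example; the 4-block threshold and the reset-on-fsync behavior follow the animation, not Ziggurat's kernel implementation:

```python
# Toy synchronicity predictor: classify a file by how much data
# accumulates between fsync() calls. Threshold value is an assumption.

SYNC_THRESHOLD_BLOCKS = 4


class SynchronicityPredictor:
    def __init__(self):
        self.blocks_since_fsync = 0
        self.synchronous = True  # files start out presumed synchronous

    def on_write(self, length_blocks: int):
        self.blocks_since_fsync += length_blocks
        if self.blocks_since_fsync >= SYNC_THRESHOLD_BLOCKS:
            # Lots of data piled up without an fsync(): treat the file
            # as asynchronously updated.
            self.synchronous = False

    def on_fsync(self):
        if self.blocks_since_fsync < SYNC_THRESHOLD_BLOCKS:
            # fsync() arrived while little data was outstanding.
            self.synchronous = True
        self.blocks_since_fsync = 0


# Pattern 1: write(0,2); fsync(); write(2,2); fsync(); write(4,2); fsync();
p = SynchronicityPredictor()
for _ in range(3):
    p.on_write(2)
    p.on_fsync()
print(p.synchronous)  # True

# Pattern 2: write(0,2); write(2,2); write(4,2); fsync();
q = SynchronicityPredictor()
for _ in range(3):
    q.on_write(2)
q.on_fsync()
print(q.synchronous)  # False
```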

Write Size Predictor

  • Predict whether the incoming writes are both large and stable
  • Each write entry in the file log records (offset, length, counter)
  • On each write, look for a predecessor: a previous write entry covering the same blocks with the same offset and length
    – Predecessor found, length ≥ 4 blocks → large, stable write (counter incremented)
    – Predecessor found, length < 4 blocks → small, stable write
    – Predecessor not found, length ≥ 4 blocks → large, unstable write (counter reset)

Example, starting from a file log with entries (0,4,3), (4,4,1), (5,1,0), and (6,1,0):

  • write(0,4): predecessor (0,4,3) found, length ≥ 4 → large and stable; new entry (0,4,4)
  • write(6,1): predecessor (6,1,0) found, length < 4 → small but stable; new entry (6,1,0)
  • write(4,4): no single predecessor still covers blocks 4-7 → large but unstable; new entry (4,4,0)
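The three cases can be reproduced with a toy model of the file log. The block-map representation and the counter rules here are assumptions made for illustration, not Ziggurat's actual log walk:

```python
# Toy write size predictor. `block_map` maps a block number to the
# (offset, length, counter) write entry that last wrote it; a
# predecessor exists only when a single entry with the same offset and
# length was the last writer of every block in the new range.

LARGE_WRITE_BLOCKS = 4


def classify_write(block_map, offset, length):
    """Classify a write and install its new entry."""
    covering = {block_map.get(b) for b in range(offset, offset + length)}
    entry = covering.pop() if len(covering) == 1 else None
    found = entry is not None and (entry[0], entry[1]) == (offset, length)
    large = length >= LARGE_WRITE_BLOCKS

    # Counter counts consecutive large, stable writes (assumption).
    counter = entry[2] + 1 if (found and large) else 0
    new_entry = (offset, length, counter)
    for b in range(offset, offset + length):
        block_map[b] = new_entry

    return ("large" if large else "small",
            "stable" if found else "unstable",
            new_entry)


# File log from the slide: entries (0,4,3), (4,4,1), (5,1,0), (6,1,0).
blocks = {}
for e in [(0, 4, 3), (4, 4, 1), (5, 1, 0), (6, 1, 0)]:
    for b in range(e[0], e[0] + e[1]):
        blocks[b] = e

print(classify_write(blocks, 0, 4))  # ('large', 'stable', (0, 4, 4))
print(classify_write(blocks, 6, 1))  # ('small', 'stable', (6, 1, 0))
print(classify_write(blocks, 4, 4))  # ('large', 'unstable', (4, 4, 0))
```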


Cold Data Identification

  • Each write entry records (offset, length, mtime)
  • Average modification time (amtime): the mtimes of a file's write entries, averaged with each entry weighted by its length
    – Updated differentially: a new entry folds into the running average without rescanning the log

Example: for write entries (0,2), (2,2), and (4,2) with mtimes 2, 4, and 6:

    amtime = (2 ∗ 2 + 4 ∗ 2 + 6 ∗ 2) / (2 + 2 + 2) = 4

After a new entry (6,2) with mtime 8 arrives, the average is updated from the old value:

    amtime = (4 ∗ 6 + 8 ∗ 2) / (6 + 2) = 5
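The differential update can be sketched as a running weighted average. This toy tracker assumes non-overlapping writes, as in the slide example (an overwrite would need the stale entry's weight subtracted first):

```python
# Toy amtime tracker: length-weighted average of write-entry mtimes,
# updated incrementally without rescanning the file log.

class AmtimeTracker:
    def __init__(self):
        self.amtime = 0.0
        self.total_blocks = 0

    def on_write(self, length, mtime):
        # new_amtime = (old_amtime * old_blocks + mtime * length)
        #              / (old_blocks + length)
        new_total = self.total_blocks + length
        self.amtime = (self.amtime * self.total_blocks
                       + mtime * length) / new_total
        self.total_blocks = new_total


t = AmtimeTracker()
for length, mtime in [(2, 2), (2, 4), (2, 6)]:
    t.on_write(length, mtime)
print(t.amtime)  # 4.0  -> (2*2 + 4*2 + 6*2) / 6

t.on_write(2, 8)
print(t.amtime)  # 5.0  -> (4*6 + 8*2) / 8
```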


Cold Data Identification

  • Among files
    – Each CPU keeps a cold list of files sorted by amtime, from cold to hot
  • Within each file
    – Data blocks modified before the file's amtime are cold; blocks modified after it are hot

[Diagram: per-CPU cold lists (CPU 0, CPU 1) ordering files by amtime; in the example file with amtime = 5, the write entries with mtimes 2 and 4 are marked Cold and those with mtimes 6 and 8 are marked Hot.]
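Both levels of the classification can be sketched together. The data structures here (a per-file amtime map and a flat entry list) are illustrative assumptions:

```python
# Toy two-level cold-data identification: pick the coldest files first,
# then migrate only the blocks older than each file's amtime.

def coldest_files(file_amtimes):
    """Return files ordered coldest-first, i.e. by ascending amtime."""
    return sorted(file_amtimes, key=file_amtimes.get)


def cold_blocks(write_entries, amtime):
    """Blocks whose last modification predates the file's amtime."""
    cold = []
    for offset, length, mtime in write_entries:
        if mtime < amtime:
            cold.extend(range(offset, offset + length))
    return cold


files = {"a.log": 5, "b.db": 2, "c.tmp": 9}
print(coldest_files(files))  # ['b.db', 'a.log', 'c.tmp']

# Slide example: amtime = 5; entries written at mtimes 2, 4, 6, 8.
entries = [(0, 2, 2), (2, 2, 4), (4, 2, 6), (6, 2, 8)]
print(cold_blocks(entries, 5))  # [0, 1, 2, 3]  (the Cold/Cold/Hot/Hot split)
```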


Migration Mechanism

  • Basic migration
    – Migrate the coldest data blocks from NVMM to disk
    – Consistency is ensured throughout the migration

[Diagram, steps 1-5: the live cold pages (File Page 3 and File Page 4 in the example) are first copied to disk, a new write entry pointing at the disk copies is appended to the inode log, the log pointers are updated, and only then are the stale NVMM pages freed, so either the old or the new copy stays reachable from the log at every step.]

  • Reverse migration
    – Migrate data from disk back to NVMM
    – Used to handle mmap()
    – Helps read-dominated workloads

Migration Mechanism

  • Group migration
    – Coalesce adjacent write entries and migrate their data together
    – Utilize the high sequential bandwidth of disks
  • Benefits
    – Improve migration efficiency
    – Accelerate future reads
    – Reduce log size
    – Moderate disk fragmentation

[Diagram, steps 1-5: four adjacent write entries (Write 0-8K, Write 4-8K, Write 8-16K, Write 12-16K) cover file pages 1-4; their live pages are written to disk as one sequential 0-16K extent, a single Write 0-16K entry replaces the old entries in the inode log, and the stale NVMM pages are freed.]
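The coalescing step above can be sketched as a classic extent merge; the (offset, length) entry format is an assumption for illustration:

```python
# Toy group-migration coalescing: merge write entries that touch or
# overlap, so each merged extent leaves NVMM as one sequential disk
# write instead of several small ones.

def coalesce(entries):
    """Merge (offset, length) entries that are adjacent or overlapping."""
    merged = []
    for offset, length in sorted(entries):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Extends or overlaps the previous extent: grow it in place.
            prev_off, prev_len = merged[-1]
            end = max(prev_off + prev_len, offset + length)
            merged[-1] = (prev_off, end - prev_off)
        else:
            merged.append((offset, length))
    return merged


# Slide example, in KB: Write 0-8K, Write 4-8K, Write 8-16K, Write 12-16K
print(coalesce([(0, 8), (4, 4), (8, 8), (12, 4)]))  # [(0, 16)] -> one 0-16K write
```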

File Operations

[Animated diagram: a file's data blocks (0-23) move among NVMM, DRAM, and disk as the sequence write(0,8); write(8,8); fsync(); append(16,1); append(17,1); ... append(23,1); migrate(); read(16,8); mmap(0,8); runs. Writes land in the tier chosen by the placement policy, migrate() pushes cold blocks to disk, read() is served through the DRAM page cache, and mmap() brings the mapped range back into NVMM.]


Evaluation

  • Experimental setup
    – Dual-socket Intel Xeon E5 server
    – 256 GB DRAM
    – 187 GB Intel DC P4800X Optane SSD
    – 400 GB Intel DC P3600 NVMe SSD
    – Ubuntu 16.04 LTS, Linux kernel 4.13.0
  • Emulated NVMM
    – Exploits the NUMA effect on DRAM
    – NUMA node 1: NVMM emulation
    – NUMA node 0: processors and memory for applications
  • Disk-based file systems
    – EXT4-DJ (data journaling)
    – XFS-ML (metadata logging)
  • Ziggurat
    – Ziggurat-X (Optane/NVMe SSD + X GB of NVMM)
  • NVMM-based file systems
    – NOVA, EXT4-DAX, XFS-DAX


Filebench

  • With a large amount of NVMM, Ziggurat's performance nearly matches that of NVMM-only file systems.

[Chart: Fileserver and Varmail throughput, annotated "within 3%", 2.1X, and 2.6X.]


RocksDB

  • Ziggurat shows good performance for inserting file data with write-ahead logging (WAL).

[Chart: random insert and sequential insert throughput, annotated 9.9X, 42.4X, 5.7X, and 13.7X.]


SQLite

  • Ziggurat maintains near-NVMM performance because the hot journal files are frequently updated in NVMM.

[Chart: throughput in PERSIST and WAL journaling modes, annotated 1.6X, 4.8X, 1.4X, and 5.3X.]


Conclusion

  • Ziggurat fully utilizes the strengths of NVMM and disks to offer high file performance for a wide range of access patterns.
  • [Prediction] Ziggurat steers incoming writes to the most suitable tier based on the prediction results.
  • [Migration] Ziggurat coalesces adjacent data blocks and migrates them to disks in large chunks.
  • [Evaluation] Ziggurat achieves up to 38.9X and 46.5X throughput improvement compared with EXT4 and XFS, respectively, running on an SSD alone.


Thank you