Efficient Memory Disaggregation with Infiniswap (Juncheng Gu et al., presentation slides)



slide-1
SLIDE 1

Efficient Memory Disaggregation with Infiniswap

Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin

slide-2
SLIDE 2

Agenda

  • Motivation and related work
  • Design and system overview
  • Implementation and evaluation
  • Future work and conclusion

3/30/17 1

slide-3
SLIDE 3

Memory-intensive applications

slide-5
SLIDE 5

Performance degradation

[Figure: bar chart of normalized performance (y-axis, 0 to 1) for four applications with 100%, 75%, and 50% of their working sets in memory. Data labels, read in chart order and relative to 100% working sets in memory:]

  • VoltDB (TPC-C): 0.18 at 75%, 0.04 at 50%
  • Memcached (Facebook/FB SYS): 0.47 at 75%, 0.06 at 50%
  • PowerGraph (TunkRank): 0.94 at 75%, 0.12 at 50%
  • GraphX (PageRank): 0.97 at 75%, 0.04 at 50%

Memory overestimation

slide-15
SLIDE 15

Memory underutilization

  • Google Cluster Analysis[1]

[Figure: portion of memory (y-axis) over time in days (x-axis), with one curve for allocated memory and one for used memory. The chart marks allocated at 0.8 and used at 0.5, a gap of ≈30%.]

How to utilize ABU memory?

Can we utilize this memory?

[1] Reiss, Charles, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SoCC’12.

slide-20
SLIDE 20

Disaggregate free memory

[Figure: Machine 1 through Machine N, each drawn with used memory and free memory; a Memory Disaggregation Layer spans all machines. Legend: used memory, free memory, remote memory.]

slide-23
SLIDE 23

What are the challenges?

  • Minimize deployment overhead
      • No hardware design
      • No application modification
  • Tolerate failures
      • e.g., network disconnection, machine crash
  • Manage remote memory at scale
slide-24
SLIDE 24

Recent work on memory disaggregation

[Table: each system below is compared on four criteria: no HW design, no app modification, fault-tolerance, and scalability. Cell marks are not shown.]

  • Memory Blade[ISCA’09]
  • HPBD[CLUSTER’05] / NBDX[1]
  • RDMA key-value service (e.g. HERD[SIGCOMM’14], FaRM[NSDI’14])
  • Intel Rack Scale Architecture (RSA)[2]
  • Infiniswap

[1] https://github.com/accelio/NBDX
[2] http://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html

slide-25
SLIDE 25

Agenda

  • Motivation and related work
  • Design and system overview
  • Implementation and evaluation
  • Future work and conclusion

slide-26
SLIDE 26

System Overview

[Figure: Machine 1 runs Application1 and Application2 in user space. In kernel space, the Virtual Memory Manager (VMM) sits above the Infiniswap Block Device, which writes to the local disk (async) and to the RNIC (sync). Machine 2 runs an application and the Infiniswap Daemon in user space, behind its own RNIC.]

Infiniswap Block Device

  • Swap space
  • Request router

Local disk

  • [ASYNC] Back up swapped-out data
  • Tolerate remote memory failure

Infiniswap Daemon

  • Local memory region
  • Remote memory service

RDMA

  • One-sided operations
  • Bypass remote CPU
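
The write and read paths on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names are hypothetical, Python dictionaries stand in for remote memory and the local disk, and nothing here reflects the actual kernel module or its RDMA calls.

```python
from collections import deque


class SwapRouterSketch:
    """Toy model of the request router (hypothetical names throughout)."""

    def __init__(self):
        self.remote = {}            # page id -> data, stands in for remote memory
        self.disk = {}              # page id -> data, stands in for the local disk
        self.disk_queue = deque()   # pending asynchronous backup writes

    def write_page(self, page_id, data):
        # Synchronous path: the page-out completes once remote memory holds the page.
        self.remote[page_id] = data
        # Asynchronous path: the disk backup is only queued here.
        self.disk_queue.append((page_id, data))

    def flush_disk(self):
        # Drain queued backups to the local disk.
        while self.disk_queue:
            page_id, data = self.disk_queue.popleft()
            self.disk[page_id] = data

    def fail_remote(self):
        # Simulate a remote machine crash or a network disconnection.
        self.remote.clear()

    def read_page(self, page_id):
        # Prefer remote memory; fall back to the disk backup after a failure.
        if page_id in self.remote:
            return self.remote[page_id]
        self.flush_disk()  # make pending backups visible before reading
        return self.disk[page_id]


router = SwapRouterSketch()
router.write_page(7, b"swapped-out page")
router.fail_remote()
print(router.read_page(7))  # served from the local disk backup
```

The point of the sketch is the split between the two paths: the application waits only for the remote write, while the disk copy exists to survive a remote failure.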
slide-31
SLIDE 31

How to meet the design objectives?

Objectives and ideas:

  • No hardware design, no application modification: remote paging
  • Fault-tolerance: local backup disk
  • Scalability: decentralized remote memory management

slide-32
SLIDE 32

One-to-many

[Figure: on Machine 1, Application1 and Application2 run in user space above the Virtual Memory Manager (VMM) and the Infiniswap Block Device in kernel space. The block device connects through its RNIC (sync) to Infiniswap Daemons on Machine 2 and Machine 3, and to its local disk (async).]

slide-33
SLIDE 33

Many-to-many

[Figure: Machine 1 and Machine 4 each run Application1 and Application2 above a Virtual Memory Manager (VMM) and an Infiniswap Block Device, each with its own local disk (async) and RNIC (sync). Machine 2 and Machine 3 each run an application and an Infiniswap Daemon in user space.]

How to scale remote memory?

  • How to find remote memory in the cluster?
  • Which remote mapping should be evicted?
slide-36
SLIDE 36

Management unit: memory page?

[Figure: one Infiniswap Block Device and three Infiniswap Daemons. A mapping table pairs each local page with a remote page, e.g. local page p100 with remote page <s1, p1>.]

  • 1GB = 256K entries
  • 1GB = 256K RTTs
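
The 256K figure follows from simple arithmetic if the page size is 4 KB, which is an assumption here since the slide does not state it. A minimal sketch, with hypothetical names:

```python
PAGE_SIZE = 4 * 1024   # assumed 4 KB pages; the slide does not state the page size
GIB = 1024 ** 3        # 1 GB of remote memory, taken as 2**30 bytes


def mapping_entries(capacity_bytes, unit_bytes):
    """Number of mapping entries needed to cover capacity at a given unit size."""
    return capacity_bytes // unit_bytes


pages_per_gib = mapping_entries(GIB, PAGE_SIZE)
print(pages_per_gib)           # 262144
print(pages_per_gib // 1024)   # 256, i.e. 256K entries per GB
```

With one mapping request per entry, the same count gives the 256K round trips on the slide. A coarser management unit shrinks both numbers in proportion to its size.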

slide-38
SLIDE 38

Management unit: memory slab!

[Figure: one Infiniswap Block Device and three Infiniswap Daemons, now managed in units of slabs.]

slide-40
SLIDE 40

Which remote machine should be selected?

[Figure: one Infiniswap Block Device choosing among three Infiniswap Daemons.]

Goal: balance memory utilization

  • Central controller
  • Decentralized approach

slide-44
SLIDE 44

Power of two choices[1]

[Figure: one Infiniswap Block Device and three Infiniswap Daemons.]

[1] Mitzenmacher, Michael. "The power of two choices in randomized load balancing." Ph.D. thesis, U.C. Berkeley, 1996.
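
The generic power-of-two-choices technique can be sketched as follows. This is a hedged illustration, not Infiniswap's placement code: the function names, the free-memory list, and the unit slab size are all assumptions made for the example.

```python
import random


def pick_daemon(free_memory, rng=random):
    """Sample two distinct daemons at random and return the index of the one
    with more free memory (ties go to the first sampled)."""
    a, b = rng.sample(range(len(free_memory)), 2)
    return a if free_memory[a] >= free_memory[b] else b


def place_slabs(num_slabs, free_memory, slab_size=1, seed=0):
    """Place slabs one at a time and return the remaining free memory per daemon."""
    rng = random.Random(seed)
    free = list(free_memory)
    for _ in range(num_slabs):
        chosen = pick_daemon(free, rng)
        free[chosen] -= slab_size   # hypothetical unit: one slab uses one memory unit
    return free


remaining = place_slabs(num_slabs=30, free_memory=[20] * 8)
print(remaining)
```

Each placement contacts only two machines, so no central controller has to track cluster-wide state, yet the less-loaded of the two candidates is always preferred.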

slide-46
SLIDE 46

Slab eviction

[Figure: an Infiniswap Daemon with slabs 1 to 4, drawn in three successive states. Regions are labeled remote memory and used memory; slabs are marked as mapped or unmapped.]

slide-49
SLIDE 49

Which slab should be evicted?

[Figure: an Infiniswap Daemon with slabs 1 to 4.]

  • Daemon: does not know the swap activities
  • Daemon: too expensive to query all the slabs

slide-51
SLIDE 51

Power of multiple choices[1]

Select E least-active slabs from E+E’ random slabs

[Figure: an Infiniswap Daemon with slabs 1 to 4; after eviction, slabs 1, 2, and 4 remain.]

[1] Park, Gahyun. "A generalization of multiple choice balls-into-bins." PODC’11.
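
The selection rule above can be sketched in a few lines. The names and the activity values are hypothetical; in particular, the dictionary lookup stands in for whatever query the daemon would send to learn how active a sampled slab is.

```python
import random


def choose_slabs_to_evict(activity, e, e_extra, rng=random):
    """Sample e + e_extra mapped slabs at random and return the e least active.

    activity maps a slab id to a hypothetical activity measure. Only the
    sampled slabs are looked up, which models querying a subset of slabs.
    """
    sample_size = min(len(activity), e + e_extra)
    candidates = rng.sample(list(activity), sample_size)
    candidates.sort(key=lambda slab: activity[slab])
    return candidates[:e]


# Hypothetical activity values for four mapped slabs.
activity = {1: 50, 2: 5, 3: 70, 4: 20}
print(choose_slabs_to_evict(activity, e=1, e_extra=1))
```

A larger E’ makes the choice closer to the true least-active slabs at the cost of more queries, while E’ = 0 degenerates to random eviction.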

slide-54
SLIDE 54

Agenda

  • Motivation and related work
  • Design and system overview
  • Implementation and evaluation
  • Future work and conclusion

slide-55
SLIDE 55

Implementation

  • Connection Management
      • One RDMA connection per active block device-daemon pair
  • Control Plane
      • SEND, RECV
  • Data Plane
      • One-sided RDMA READ, WRITE

[Figure: the Infiniswap Block Device (kernel space) and the Infiniswap Daemon (user space), connected by RDMA.]

slide-56
SLIDE 56

What are we expecting from Infiniswap?

  • Application performance
  • Cluster memory utilization
  • Network usage
  • Eviction overhead
  • Fault-tolerance overhead
  • Performance as a block device

slide-57
SLIDE 57

Evaluation

  • 32-node cluster
  • 2 x 8 cores (32 vcores)
  • 64GB DRAM
  • 56Gbps InfiniBand NIC
  • InfiniBand network

slide-58
SLIDE 58

Application performance

  • 50% working sets in memory
  • Application performance is improved by 2-16x

[Figure: bar chart of normalized performance (y-axis, 0 to 1), relative to 100% working sets in memory, comparing disk and Infiniswap with 50% working sets in memory. Data labels, read in chart order:]

  • VoltDB (TPC-C): 0.04 with disk, 0.66 with Infiniswap
  • Memcached (Facebook/FB SYS): 0.06 with disk, 0.77 with Infiniswap
  • PowerGraph (TunkRank): 0.12 with disk, 0.61 with Infiniswap
  • GraphX (PageRank): 0.04 with disk, 0.08 with Infiniswap
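
The 2-16x range can be checked against the chart's data labels. The sketch below assumes the labels appear in the same order as the applications on the x-axis, which the slide text does not confirm; the variable names are illustrative.

```python
# Data labels from the chart, assumed to follow the x-axis order of the applications.
apps = [
    "VoltDB (TPC-C)",
    "Memcached (Facebook/FB SYS)",
    "PowerGraph (TunkRank)",
    "GraphX (PageRank)",
]
disk = [0.04, 0.06, 0.12, 0.04]         # Disk + 50% working sets in memory
infiniswap = [0.66, 0.77, 0.61, 0.08]   # Infiniswap + 50% working sets in memory

# Improvement of Infiniswap over disk for each application, rounded to one decimal.
speedups = {app: round(i / d, 1) for app, d, i in zip(apps, disk, infiniswap)}

for app, factor in speedups.items():
    print(f"{app}: {factor}x")
```

Under that ordering assumption, the smallest and largest ratios bracket the 2-16x range quoted on the slide.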

slide-61
SLIDE 61

Cluster memory utilization

  • 90 containers (applications), mixing all applications and memory constraints
  • Cluster memory utilization is improved from 40.8% to 60% (1.47x)

[Figure: memory utilization (%) on the y-axis against rank of machines on the x-axis, with one series for Infiniswap and one for w/o Infiniswap.]

slide-62
SLIDE 62

Agenda

  • Motivation and related work
  • Design and system overview
  • Implementation and evaluation
  • Future work and conclusion

slide-63
SLIDE 63

Limitations and future work

  • Trade-off in fault-tolerance
      • Local disk is the bottleneck
      • Multiple remote replicas
      • Fault-tolerance vs. space-efficiency
  • Performance isolation among applications
      • W/o limitation on each application’s usage
      • W/o mapping between remote memory and applications
slide-64
SLIDE 64

Conclusion

  • Infiniswap: remote paging over RDMA
  • Application performance
  • Cluster memory utilization

  • Efficient, practical memory disaggregation
      • No hardware design
      • No application modification
      • Fault-tolerance
      • Scalability

Source code is coming soon!

https://github.com/Infiniswap/infiniswap.git

slide-65
SLIDE 65

Thank you!