Efficient Memory Disaggregation with Infiniswap
Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin
Agenda
- Motivation and related work
- Design and system overview
- Implementation and evaluation
- Future work and conclusion
3/30/17 1
Memory-intensive applications
Performance degradation
[Bar chart: normalized performance (y-axis, 0 to 1) of VoltDB (TPC-C), Memcached (Facebook/FB SYS), PowerGraph (TunkRank), and GraphX (PageRank) with 100%, 75%, and 50% of the working sets in memory. Data labels, in extracted order: 0.18, 0.47, 0.94, 0.97 and 0.04, 0.06, 0.12, 0.04.]
Memory overestimation
Memory underutilization
- Google Cluster Analysis[1]
[Chart: portion of memory (y-axis) over time in days (x-axis), with series Allocated and Used. Annotated values: 0.8 and 0.5, a gap of ≈30%.]
How to utilize ABU memory?
Can we utilize this memory?
[1] Reiss, Charles, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SoCC’12.
Disaggregate free memory
[Diagram: Machine 1, Machine 2, Machine 3, Machine 4, ..., Machine N, each showing used memory, free memory, and remote memory, joined by a Memory Disaggregation Layer.]
What are the challenges?
- Minimize deployment overhead
  - No hardware design
  - No application modification
- Tolerate failures
  - e.g. network disconnection, machine crash
- Manage remote memory at scale
Recent work on memory disaggregation
Criteria: No HW design, No app modification, Fault-tolerance, Scalability
- Memory Blade[ISCA’09]
- HPBD[CLUSTER’05] / NBDX[1]
- RDMA key-value service (e.g. HERD[SIGCOMM’14], FaRM[NSDI’14])
- Intel Rack Scale Architecture (RSA)[2]
- Infiniswap
[1] https://github.com/accelio/NBDX
[2] http://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html
Agenda
- Motivation and related work
- Design and system overview
- Implementation and evaluation
- Future work and conclusion
System Overview
[Diagram: Machine 1 runs Application1 and Application2 in user space; in kernel space, the Virtual Memory Manager (VMM) sits above the Infiniswap Block Device, which connects to the local disk and the RNIC. Machine 2 runs an application and the Infiniswap Daemon in user space, with its own RNIC. Paths are labeled Sync and Async.]
Infiniswap Block Device
- Swap space
- Request router
Local disk
- [ASYNC] backup swapped-out data
- Tolerate remote memory failure
Infiniswap Daemon
- Local memory region
- Remote memory service
RDMA
- One-sided operations
- Bypass remote CPU
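The roles listed for the block device and the local disk can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Infiniswap implementation: the class name ToyBlockDevice and its methods are hypothetical, plain dictionaries stand in for remote memory reached over RDMA and for the local disk, and a queue stands in for the asynchronous backup path.

```python
from collections import deque


class ToyBlockDevice:
    """Toy model only: dicts stand in for remote memory and the local disk."""

    def __init__(self):
        self.remote = {}            # page_id -> data held in remote memory
        self.disk = {}              # page_id -> data backed up on the local disk
        self.pending = deque()      # backup writes not yet flushed (async path)
        self.remote_alive = True    # flipped to False to simulate a remote failure

    def write(self, page_id, data):
        # Every swapped-out page is queued for the disk backup.
        self.pending.append((page_id, data))
        if self.remote_alive:
            # Synchronous path: the page is placed in remote memory right away.
            self.remote[page_id] = data
        else:
            # No remote memory available: the disk is the only copy, so flush now.
            self.flush()

    def flush(self):
        # Drain queued backups in arrival order so newer data is never overwritten.
        while self.pending:
            page_id, data = self.pending.popleft()
            self.disk[page_id] = data

    def read(self, page_id):
        if self.remote_alive and page_id in self.remote:
            return self.remote[page_id]
        # Remote memory failed or never held the page: fall back to the backup.
        self.flush()
        return self.disk[page_id]


if __name__ == "__main__":
    dev = ToyBlockDevice()
    dev.write(100, b"page contents")
    dev.remote_alive = False        # simulate a machine crash on the remote side
    print(dev.read(100))            # still readable from the local backup
```

The queue keeps disk writes off the critical path while remote memory is healthy, which matches the Sync and Async labels in the diagram; how the real system schedules those writes is not stated on the slides.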
How to meet the design objectives?
Objectives and ideas:
- No hardware design, no application modification: remote paging
- Fault-tolerance: local backup disk
- Scalability: decentralized remote memory management
One-to-many
[Diagram: one Infiniswap Block Device (below the VMM, with a local disk and an RNIC) connected to Infiniswap Daemons on two other machines; machines are labeled Machine 1 to Machine 3, paths Sync and Async.]
Many-to-many
[Diagram: two Infiniswap Block Devices, each with its own local disk and RNIC, connected to Infiniswap Daemons on two other machines; machines are labeled Machine 1 to Machine 4, paths Sync and Async.]
How to scale remote memory?
- How to find remote memory in the cluster?
- Which remote mapping should be evicted?
Management unit: memory page?
[Diagram: an Infiniswap Block Device connected to three Infiniswap Daemons.]
Mapping table: Local Page p100 -> Remote Page <s1, p1>
- 1GB = 256K entries
- 1GB = 256K RTTs
Management unit: memory slab!
[Diagram: an Infiniswap Block Device connected to three Infiniswap Daemons.]
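The "1GB = 256K entries" figure is consistent with 4 KB pages, and the arithmetic below makes the page-versus-slab comparison concrete. It is a minimal sketch: the 4 KB page size is inferred from the slide's numbers, and SLAB_SIZE is an assumed value chosen for illustration, since the slides do not state a slab size.

```python
GIB = 1024 ** 3
PAGE_SIZE = 4 * 1024     # 4 KB pages, inferred from "1GB = 256K entries"
SLAB_SIZE = 1 * GIB      # assumption for illustration; not stated on the slides


def mapping_entries(region_bytes, unit_bytes):
    """Mapping-table entries needed when each unit gets one entry (rounded up)."""
    return -(-region_bytes // unit_bytes)


pages_per_gib = mapping_entries(GIB, PAGE_SIZE)
slabs_per_gib = mapping_entries(GIB, SLAB_SIZE)

if __name__ == "__main__":
    print(pages_per_gib)   # 262144, i.e. 256K entries per GB
    print(slabs_per_gib)   # 1 entry per GB with the assumed slab size
```

With per-page management, the slide counts one round trip per entry as well, so the same number bounds the RTTs needed to map 1GB.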
Which remote machine should be selected?
Goal: balance memory utilization
- Central controller
- Decentralized approach
Power of two choices[1]
[Diagram: an Infiniswap Block Device choosing among three Infiniswap Daemons.]
[1] Mitzenmacher, Michael. "The power of two choices in randomized load balancing." Ph.D. thesis, U.C. Berkeley, 1996.
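A minimal sketch of power-of-two-choices placement, under stated assumptions: the block device is assumed to know each sampled machine's free memory (the free_gb dictionary), and the names pick_daemon and place_slabs are hypothetical. The slides do not show how free memory is queried, and the sketch does not handle machines running out of memory.

```python
import random


def pick_daemon(free_gb, rng=random):
    """Probe two distinct machines at random; return the one with more free memory."""
    first, second = rng.sample(sorted(free_gb), 2)
    return first if free_gb[first] >= free_gb[second] else second


def place_slabs(free_gb, num_slabs, slab_gb=1, rng=random):
    """Place slabs one at a time, updating the free-memory view after each choice."""
    free = dict(free_gb)            # work on a copy of the caller's view
    placement = []
    for _ in range(num_slabs):
        target = pick_daemon(free, rng)
        free[target] -= slab_gb
        placement.append(target)
    return placement, free


if __name__ == "__main__":
    cluster = {f"machine{i}": 32 for i in range(2, 10)}
    placement, free = place_slabs(cluster, num_slabs=16, rng=random.Random(0))
    print(placement)
    print(free)
```

Each placement contacts only two machines, so no central controller or global view is needed, which is the decentralized approach named above.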
Slab eviction
[Diagram, three frames: an Infiniswap Daemon with slabs 1, 2, 3, 4; regions labeled Remote Memory and Used Memory; legend: Mapped Slab, Unmapped Slab.]
Which slab should be evicted?
- Daemon: does not know the swap activities
- Daemon: too expensive to query all the slabs
Power of multiple choices[1]
Select E least-active slabs from E+E’ random slabs
[Diagram: an Infiniswap Daemon with slabs 1, 2, 3, 4; after eviction, slabs 1, 2, 4 remain.]
[1] Park, Gahyun. "A generalization of multiple choice balls-into-bins." PODC’11.
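The selection rule "E least-active slabs from E+E’ random slabs" can be sketched as follows. This is an illustration under assumptions, not the daemon's code: choose_victims is a hypothetical name, and the activity dictionary stands in for whatever per-slab activity measure the daemon obtains for the sampled slabs only.

```python
import random


def choose_victims(slabs, activity, e, e_extra, rng=random):
    """Return the e least-active slabs among e + e_extra randomly sampled ones."""
    sample_size = min(len(slabs), e + e_extra)
    candidates = rng.sample(list(slabs), sample_size)
    # Only the sampled slabs are ranked; the rest are never queried.
    candidates.sort(key=lambda slab: activity[slab])
    return candidates[:e]


if __name__ == "__main__":
    mapped = [1, 2, 3, 4]
    activity = {1: 50, 2: 40, 3: 0, 4: 70}   # made-up activity counts
    # Sampling all four slabs here, so the least-active one is always found.
    print(choose_victims(mapped, activity, e=1, e_extra=3))   # [3]
```

Larger values of E’ make the choice closer to the true least-active slabs at the cost of more queries; the slides do not state the value used.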
Agenda
- Motivation and related work
- Design and system overview
- Implementation and evaluation
- Future work and conclusion
Implementation
- Connection Management
  - One RDMA connection per active block device-daemon pair
- Control Plane
  - SEND, RECV
- Data Plane
  - One-sided RDMA READ, WRITE
[Diagram: the Infiniswap Block Device (kernel space) and the Infiniswap Daemon (user space) connected over RDMA.]
What are we expecting from Infiniswap?
- Application performance
- Cluster memory utilization
- Network usage
- Eviction overhead
- Fault-tolerance overhead
- Performance as a block device
Evaluation
- 32-node cluster
- 2 x 8 cores (32 vcores), 64GB DRAM
- 56Gbps InfiniBand NIC
- InfiniBand Network
Application performance
- 50% working sets in memory
- Application performance is improved by 2-16x
[Bar chart: normalized performance (y-axis, 0 to 1) of VoltDB (TPC-C), Memcached (Facebook/FB SYS), PowerGraph (TunkRank), and GraphX (PageRank) for three configurations: 100% working sets in memory; Disk + 50% working sets in memory (data labels 0.04, 0.06, 0.12, 0.04); Infiniswap + 50% working sets in memory (data labels 0.66, 0.77, 0.61, 0.08).]
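The 2-16x range can be checked against the chart's data labels. The sketch below assumes the labels are listed in the same order as the applications on the axis, and that the first group of labels is the disk configuration and the second the Infiniswap configuration; both are readings of the extracted chart text, not something the slides state explicitly.

```python
APPS = [
    "VoltDB (TPC-C)",
    "Memcached (Facebook/FB SYS)",
    "PowerGraph (TunkRank)",
    "GraphX (PageRank)",
]
# Normalized performance with 50% of the working set in memory.
disk = [0.04, 0.06, 0.12, 0.04]
infiniswap = [0.66, 0.77, 0.61, 0.08]

# Improvement factor of Infiniswap over disk-backed swap, per application.
speedups = {app: i / d for app, d, i in zip(APPS, disk, infiniswap)}

if __name__ == "__main__":
    for app, factor in speedups.items():
        print(f"{app}: {factor:.1f}x")
```

Under that reading, the smallest factor is about 2x and the largest about 16x, which matches the range quoted on the slide.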
Cluster memory utilization
- 90 containers (applications), mixing all applications and memory constraints.
- Cluster memory utilization is improved from 40.8% to 60% (1.47x)
[Chart: memory utilization (%, 0 to 100) versus rank of machines, with series Infiniswap and w/o Infiniswap.]
Agenda
- Motivation and related work
- Design and system overview
- Implementation and evaluation
- Future work and conclusion
Limitations and future work
- Trade-off in fault-tolerance
  - Local disk is the bottleneck
  - Multiple remote replicas
  - Fault-tolerance vs. space-efficiency
- Performance isolation among applications
  - W/o limitation on each application’s usage
  - W/o mapping between remote memory and applications
Conclusion
- Infiniswap: remote paging over RDMA
  - Application performance
  - Cluster memory utilization
- Efficient, practical memory disaggregation
  - No hardware design
  - No application modification
  - Fault-tolerance
  - Scalability
Source code is coming soon!
https://github.com/Infiniswap/infiniswap.git
Thank You!