slide-1
SLIDE 1

LITE: Kernel RDMA Support for Datacenter Applications

Shin-Yeh Tsai, Yiying Zhang


slide-4
SLIDE 4

Timeline of network stack designs (userspace / kernel / hardware):

  • 1983: Berkeley Socket (kernel)
  • 1995: U-Net (userspace)
  • 2000s: TCP offload engine (hardware)
  • 2014: Arrakis, mTCP, IX (userspace)
  • RDMA, long used in HPC. 2017: RDMA in datacenters?

slide-5
SLIDE 5
RDMA (Remote Direct Memory Access)

  • Directly read/write remote memory
  • Bypasses the kernel
  • Zero-copy memory access
  • Benefits:
    – Low latency
    – High throughput
    – Low CPU utilization

[Figure: application in user space accesses remote memory directly, bypassing the remote CPU and both kernels]

slide-6
SLIDE 6

Things have worked well in HPC

  • Special hardware
  • Few applications
  • Cheaper developers


slide-8
SLIDE 8

RDMA-Based Datacenter Applications

  • Pilaf [ATC ’13]
  • FaRM [NSDI ’14]
  • HERD [SIGCOMM ’14]
  • DrTM [SOSP ’15]
  • FaRM+Xact [SOSP ’15]
  • Mojim [ASPLOS ’15]
  • DrTM+R [EuroSys ’16]
  • RSI [VLDB ’16]
  • HERD-RPC [ATC ’16]
  • Cell [ATC ’16]
  • FaSST [OSDI ’16]
  • Wukong [OSDI ’16]
  • NAM-DB [VLDB ’17]
  • Octopus [ATC ’17]
  • Hotpot [SoCC ’17]
  • APUS [SoCC ’17]

slide-9
SLIDE 9

Things have worked well in HPC:
  • Special hardware
  • Few applications
  • Cheaper developers

What about datacenters?
  • Commodity, cheaper hardware
  • Many (changing) applications
  • Resource sharing and isolation

slide-10
SLIDE 10

Native RDMA

[Figure: native RDMA stack across user space, kernel space, and hardware]
  • User space: the RDMA app plus its library handle everything (send/recv, connection management, memory management) and must track connections, queues, keys (node, lkey, rkey, addr), and memory spaces
  • Kernel space (OS): bypassed on the data path
  • Hardware (RNIC): performs permission checks and address mapping; caches PTEs and a key pair per registered region (lkey 1 … lkey n, rkey 1 … rkey n)



slide-19
SLIDE 19

[Figure: abstraction spectrum from userspace to hardware]
  • What developers want: high-level, easy to use, with resource sharing and isolation (the socket abstraction)
  • What RDMA offers: low-level, difficult to use, difficult to share
  • Abstraction mismatch → fat applications, no resource sharing



slide-27
SLIDE 27

[Figure: RDMA write throughput (requests/µs, up to ~6) vs. total registered MR size (1 MB to 1 GB), for 64 B and 1 KB writes; throughput drops sharply as the registered size grows]

On-NIC SRAM:
  1. Fetches and caches page table entries
  2. Stores security keys for every registered memory region

→ Expensive, unscalable hardware


slide-33
SLIDE 33

Are we removing too much from the kernel?
  • Fat applications, no resource sharing
  • Expensive, unscalable hardware

slide-34
SLIDE 34

Outline

  • Introduction and motivation
  • Overall design and abstraction
  • LITE internals
  • LITE applications
  • Conclusion



slide-40
SLIDE 40

Goal: high-level abstraction, protection, resource sharing, and performance isolation, without the kernel?

LITE: Local Indirection TiEr
  • High-level abstraction
  • Protection
  • Resource sharing
  • Performance isolation

slide-41
SLIDE 41

“All problems in computer science can be solved by another level of indirection.”
  – Butler Lampson

slide-42
SLIDE 42

[Figure: native RDMA, shown again for contrast: the user-level app and library in user space (node, lkey, rkey, addr; send/recv; connection and memory management; connections, queues, keys, memory space) sit directly on the RNIC (permission check, address mapping, cached PTEs, per-region keys); no kernel in between]


slide-44
SLIDE 44

LITE

[Figure: LITE moves connection, queue, key, and memory-space management out of the application and into the kernel; applications call the LITE APIs (memory APIs, RPC/messaging APIs, synchronization APIs); the RNIC underneath is unchanged: permission check, address mapping, cached PTEs, per-region keys]

→ Simpler applications

slide-45
SLIDE 45

LITE + RNIC

[Figure: kernel-level LITE implements the LITE APIs on top of RDMA verbs; the RNIC now needs only permission check, address mapping, and a single global lkey/rkey pair]

→ Simpler applications
→ Cheaper hardware, scalable performance


slide-48
SLIDE 48

Implementing remote memset: native RDMA vs. LITE

[Slide shows the two implementations side by side: the native-RDMA version carries its own connection, registration, and key management; the LITE version is a short sequence of LITE API calls]



slide-54
SLIDE 54

“All problems in computer science can be solved by another level of indirection”
  – David Wheeler (often attributed to Butler Lampson)

“except for the problem of too many layers of indirection”
  – David Wheeler

slide-55
SLIDE 55

Main challenge: How to preserve the performance benefits of RDMA?

slide-57
SLIDE 57

Design Principles

1. Indirection only at the local side for one-sided RDMA

[Figure: three stacks compared: Berkeley sockets go through the kernel on both nodes; native RDMA bypasses both kernels; LITE adds kernel indirection only on the local node, so one-sided RDMA still bypasses the remote CPU and kernel]


slide-60
SLIDE 60

Design Principles

1. Indirection only at the local side for one-sided RDMA
2. Avoid hardware indirection

[Figure: address mapping and permission checking move from the RNIC into kernel-level LITE, rather than being duplicated in both places]

→ No redundant indirection, scalable performance


slide-63
SLIDE 63

Design Principles

1. Indirection only at the local side for one-sided RDMA
2. Avoid hardware indirection
3. Hide kernel cost

“except for the problem of too many layers of indirection” – David Wheeler

→ Great performance and scalability

slide-64
SLIDE 64

Outline

  • Introduction and motivation
  • Overall design and abstraction
  • LITE internals
  • LITE applications
  • Conclusion



slide-68
SLIDE 68

LITE - Architecture

[Figure: user-level apps, kernel apps, and user-level RPC functions sit on the LITE abstraction; kernel-level LITE comprises:
  • LITE APIs: memory, RPC, messaging, synchronization, and management
  • LITE one-sided RDMA: global lkey/rkey, handle-to-address table (lh1 → addr1, lh2 → addr2), permission check, address mapping
  • LITE RPC: connections, queues, send/poll/recv, RDMA buffer management
LITE talks to the RNIC through the verbs abstraction and the RNIC driver]



slide-71
SLIDE 71

Onload Costly Operations

[Figure: connection, queue, key, and memory-space management move from the RNIC into kernel-level LITE]

  • Perform address mapping and protection in the kernel


slide-76
SLIDE 76

Avoid Hardware Indirection

Challenge: How to eliminate hardware indirection without changing hardware?

  • Register memory with physical addresses → the RNIC needs no PTEs
  • Register the whole memory at once → one global lkey/rkey instead of per-region keys

[Figure: the RNIC now holds only the global keys; per-region keys and cached PTEs are gone]


slide-86
SLIDE 86

LITE LMR and RDMA

[Figure: a userspace application holds an opaque handle lh; LITE in the kernel maps lh to an LMR, a table of (node, physical address) entries, e.g. node 1 → 0x45, node 4 → 0x27, and performs permission checks and QoS before issuing RDMA to the remote nodes]

LITE_read(lh, offset, size)
  • LITE looks up lh, checks permissions, translates the offset into a (node, physical address) pair, and issues the RDMA read


slide-89
SLIDE 89

LITE RDMA: MR-size scalability

[Figure: write throughput (requests/µs) vs. total MR size (1 MB to 1 GB) for 64 B and 1 KB requests; native RDMA write throughput collapses as MR size grows, while LITE_write stays flat]

LITE scales much better than native RDMA with respect to MR size and count.


slide-94
SLIDE 94

LITE RDMA latency

[Figure: latency (µs) vs. request size (8 B to 32 KB), issued from both kernel space and user space]

LITE adds only slight overhead even where native RDMA has no scalability issues.

slide-95
SLIDE 95
LITE RPC

  • RPC communication uses two RDMA write-with-immediates
  • One global busy-polling thread
  • Separate LMRs at the server for different RPC clients
  • Hides syscall cost behind the performance-critical path
  • Benefits:
    – Low latency
    – Low memory utilization
    – Low CPU utilization

slide-96
SLIDE 96

Outline

  • Introduction and motivation
  • Overall design and abstraction
  • LITE internals
  • LITE applications
  • Conclusion


slide-97
SLIDE 97

LITE Application Effort

  • Simple to use; needs no expert knowledge
  • Flexible, powerful abstraction
  • Easy to achieve optimized performance

Application       LOC    LOC using LITE   Student Days
LITE-Log          330    36               1
LITE-MapReduce    600*   49               4
LITE-Graph        1400   20               7
LITE-Kernel-DSM   3000   45               26
LITE-Graph-DSM    1300   5

* LITE-MapReduce is ported from the 3000-LOC Phoenix with ~600 lines changed or added


slide-99
SLIDE 99

MapReduce Results

  • LITE-MapReduce adapted from Phoenix [1]

[Figure: runtime (sec) of Hadoop, Phoenix, and LITE-MapReduce on 2, 4, and 8 nodes]

LITE-MapReduce outperforms Hadoop by 4.3x to 5.3x.

[1] Ranger et al., “Evaluating MapReduce for Multi-core and Multiprocessor Systems,” HPCA ’07.


slide-101
SLIDE 101

Graph Results

  • LITE-Graph built directly on LITE, following the PowerGraph design
  • Compared against Grappa and PowerGraph

[Figure: runtime (sec) of LITE-Graph, Grappa, and PowerGraph on 4 nodes x 4 threads and 7 nodes x 4 threads]

LITE-Graph outperforms PowerGraph by 3.5x to 5.6x.


slide-103
SLIDE 103

Conclusion

  • LITE virtualizes RDMA into a flexible abstraction
  • LITE preserves RDMA’s performance benefits
  • LITE divides work across user space, the kernel, and hardware
  • Indirection does not always degrade performance!
slide-104
SLIDE 104

Thank you! Questions?

Get LITE at: https://github.com/Wuklab/LITE

wuklab.io