

SLIDE 1

Tolerating Faults in Disaggregated Datacenters

Amanda Carbonari, Ivan Beschastnikh

University of British Columbia

HotNets ’17

SLIDE 2

Today’s Datacenters


SLIDE 3

The future: Disaggregation


SLIDE 4

The future: Disaggregation is coming

▷ HP The Machine
▷ UC Berkeley Firebox
▷ Intel Rack Scale Design
▷ Ericsson Hyperscale Datacenter System 8000

SLIDE 5

Disaggregation Research Space

▷ Flash/storage disaggregation [Klimovic et al. EuroSys’16, Legtchenko et al. HotStorage’17, Decibel NSDI’17]
▷ Memory disaggregation [Rao et al. ANCS’16, Gu et al. NSDI’17, Aguilera et al. SoCC’17]
▷ Network + disaggregation [R2C2 SIGCOMM’15, Gao et al. OSDI’16]
▷ Our research focus: how to build systems on DDCs

[Figure: rack diagram with a ToR switch connecting a CPU blade, a memory blade, and a storage blade]

SLIDE 6

Our Assumptions

▷ Rack-scale
▷ Partial disaggregation

[Figure: rack-scale racks behind a ToR switch with CPU, memory, and storage blades; under partial disaggregation the CPU blade keeps some local memory and the memory blade keeps some local CPUs]

SLIDE 7

What happens if a resource fails?

▷ DC (server): resources fate share
▷ DDC (disaggregated server): resources do not fate share
▷ How should applications observe resource failures?

DDC fate sharing should be enforced in the network.

SLIDE 8

Why enforce fate sharing in the network?

▷ Reasonable to assume legacy applications will run on DDCs unmodified
▷ All memory accesses are across the rack network
▷ Interposition layer = Software Defined Networking (SDN)

SLIDE 9

Fault tolerance in DDCs

▷ Fate sharing exposes a failure type to higher layers (failure granularity)
▷ Techniques inspired by related work
  ○ Distributed systems [Bonvin et al. SoCC’10, GFS OSDI’03, Shen et al. VLDB’14, Xu et al. ICDE’16]
  ○ HA VMs and systems [Bressoud et al. SOSP’95, Bernick et al. DSN’05, Remus NSDI’08]
  ○ HPC [Bronevetsky et al. PPoPP’03, Egwutuoha et al. Journal of Supercomputing’13]
▷ Open research question: how to integrate existing fault tolerance techniques into DDCs?

SLIDE 10

Fate Sharing Granularities


SLIDE 11

Tainted Fate Sharing

▷ Memory fails → CPU reading/using that memory fails with it
▷ CPU fails while writing to one replica → the inconsistent memory (v1) fails (propagation sketched below)
▷ Modularity vs. performance
▷ Open research question: implications of dynamic, in-network computation
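A minimal sketch of how this propagation could be computed, assuming the controller keeps a taint graph recording which resources have read from or written to which others; the names here (taints, failure_domain) and the addresses are hypothetical, not from the paper:

```python
from collections import defaultdict, deque

# Hypothetical taint graph: an edge a -> b means "if a fails, b is tainted
# and must fail with it".
taints = defaultdict(set)
taints["mem:10.0.0.2"].add("cpu:10.0.0.1")     # CPU has read/used this memory
taints["cpu:10.0.0.1"].add("mem:10.0.0.3/v1")  # CPU was mid-write to replica v1

def failure_domain(failed):
    """All resources that fail with `failed` under tainted fate sharing."""
    domain, queue = {failed}, deque([failed])
    while queue:
        for tainted in taints[queue.popleft()]:
            if tainted not in domain:
                domain.add(tainted)
                queue.append(tainted)
    return domain

print(failure_domain("mem:10.0.0.2"))
# -> {'mem:10.0.0.2', 'cpu:10.0.0.1', 'mem:10.0.0.3/v1'}
```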

SLIDE 12

Fate Sharing Granularities


Containers? Serverless?

DDC fate sharing should be both enforced by the network and programmable.

SLIDE 13

Programmable Fate Sharing

▷ Goal: describe an arbitrary fate sharing model and install it in the network
▷ Model specification includes (example sketch below)
  ○ Failure detection
  ○ Failure domain
  ○ Failure mitigation (optional)
▷ Open research questions:
  ○ Who should define the specification?
  ○ What workflow should be used to transform the specification into switch machine code?
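As a concrete, hypothetical illustration of such a specification, the sketch below captures the three parts as plain data that a controller could compile into switch rules; FateSpec and all field names are invented for illustration and are not from the paper:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical fate sharing specification: the three parts named on the slide
# (failure detection, failure domain, optional failure mitigation) as plain data.

@dataclass
class FateSpec:
    detection: str                    # e.g. "heartbeat every 10 ms"
    domain: Dict[str, List[str]]      # failed resource -> resources that fail with it
    mitigation: Optional[str] = None  # e.g. "redirect to replica", or None

# Server-like fate sharing for one disaggregated "server": if the CPU fails,
# its memory fails with it, and vice versa.
server_spec = FateSpec(
    detection="heartbeat every 10 ms",
    domain={
        "cpu:10.0.0.1": ["mem:10.0.0.2"],
        "mem:10.0.0.2": ["cpu:10.0.0.1"],
    },
    mitigation=None,
)

if __name__ == "__main__":
    failed = "mem:10.0.0.2"
    print("failure domain of", failed, "=", server_spec.domain[failed])
```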

SLIDE 14

Proposed Workflow


SLIDE 15

Fate Sharing Specification

▷ Provides interface between components
▷ High-level language → high-level networking language [1] → compiles to switch
▷ Open research questions:
  ○ Spec verification?
  ○ Language and switch requirements for expressiveness?

[1] FatTire HotSDN’13, NetKAT POPL’14, Merlin CoNEXT’14, P4 CCR’14, SNAP SIGCOMM’16

SLIDE 16

Vision: programmable, in-network fate sharing

Open research questions
▷ Failure semantics for GPUs? Storage?
▷ Switch or controller failure?
▷ Correlated failures?
▷ Other non-traditional fate sharing models?

Thank you!

SLIDE 17

Backup slides


SLIDE 18


SLIDE 19

In-Network Memory Replication

▷ Port mirror CPU operations to memory replicas; automatically recover a replica during failure (sketch below)
▷ Challenges: coherency, network delay, etc.
▷ Different assumptions than previous work
  ○ Persistent storage backings [Sinfonia SOSP’07, RAMCloud SOSP’11, FaRM NSDI’14, Infiniswap NSDI’17]
▷ Must consider network requirements
  ○ Combined solutions [GFS OSDI’03, Ceph OSDI’06]
  ○ Performance sensitive [Costa et al. OSDI’96]
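A minimal sketch of the port-mirroring idea, assuming an abstract match/action interface on the ToR switch; install_rule, the rule format, and the addresses are illustrative and not a real switch API:

```python
# Hypothetical sketch: in-network memory replication by port mirroring.
# A write destined for the primary memory blade is also forwarded to each
# replica's switch port; on failure, traffic is redirected to a replica.

PRIMARY_MEM = "10.0.0.2"
PRIMARY_PORT = 2
REPLICAS = {"10.0.0.3": 3, "10.0.0.4": 4}   # replica IP -> switch egress port

switch_rules = []

def install_rule(match, actions):
    """Stand-in for pushing a match/action rule to the ToR switch."""
    switch_rules.append({"match": match, "actions": actions})

# Mirror memory writes: forward to the primary and copy to every replica.
install_rule(
    match={"dst_ip": PRIMARY_MEM, "op": "mem_write"},
    actions=[("output", PRIMARY_PORT)] + [("output", p) for p in REPLICAS.values()],
)

def fail_over(replica_ip):
    """On primary failure, redirect the primary's traffic to a surviving replica."""
    install_rule(
        match={"dst_ip": PRIMARY_MEM},
        actions=[("set_dst_ip", replica_ip), ("output", REPLICAS[replica_ip])],
    )

fail_over("10.0.0.3")
print(switch_rules)
```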

SLIDE 20

In-Network CPU Checkpointing

▷ Controller checkpoints processor state to remote memory (state attached to operation packets; sketch below)
▷ Challenges: consistent client view, checkpoint retention, non-idempotent operations, etc.
▷ Different requirements than previous work
  ○ Low tail-latency [Remus NSDI’08, Bressoud et al. SOSP’95]
▷ Similar trade-offs (application specific vs. generality)
  ○ Protocol [DMTCP IPDPS’09, Bronevetsky et al. PPoPP’03]
  ○ Workflow [Shen et al. VLDB’14, Xu et al. ICDE’16]
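A rough sketch of the checkpointing flow under these assumptions: CPU state rides along on operation packets, is stored in remote memory, and recovery restores the latest checkpoint onto a spare CPU; all structures and names are hypothetical:

```python
# Hypothetical sketch: in-network CPU checkpointing. Each memory operation
# carries a copy of the CPU's architectural state, which the controller stores
# in remote memory as a checkpoint; on CPU failure, execution resumes from the
# last checkpoint. Retention policy and non-idempotent operations are omitted.

checkpoints = {}     # checkpoint id -> CPU state stored in remote memory
last_ckpt_id = 0

def send_mem_op(op, addr, value, cpu_state):
    """Attach the current CPU state to the operation packet; the controller
    strips it off and records it as a checkpoint."""
    global last_ckpt_id
    last_ckpt_id += 1
    checkpoints[last_ckpt_id] = dict(cpu_state)
    return {"op": op, "addr": addr, "value": value, "ckpt": last_ckpt_id}

def recover_cpu():
    """On CPU failure, restore the most recent checkpoint onto a spare CPU."""
    return checkpoints[last_ckpt_id]

pkt = send_mem_op("write", 0x1000, 42, {"pc": 0x400080, "regs": [0] * 16})
print("resumed state:", recover_cpu())
```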

SLIDE 21

Passive Application Monitoring

▷ Defines what information must be collected during normal execution (sketch below)
  ○ Domain table
  ○ Context information
  ○ Application protocol headers

Domain table: cpu_ip | memory_ip | start | ack (e.g., x.x.x.x | x.x.x.x | ts | ta)
Context information: src IP | src port | dst IP | dst port | rtype | op | tstamp
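A small sketch of how the controller might maintain these records while passively observing mirrored request packets; the field names follow the tables above, but the code and helper names are illustrative:

```python
import time
from dataclasses import dataclass

# Hypothetical monitoring state: a domain-table entry ties a CPU to the memory
# it uses (with start/ack timestamps); a context record captures per-request headers.

@dataclass
class DomainEntry:
    cpu_ip: str
    memory_ip: str
    start: float   # when the CPU first touched this memory
    ack: float     # last acknowledged operation

@dataclass
class ContextRecord:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    rtype: str     # request type, e.g. read/write
    tstamp: float

def observe(pkt, domain, context_log):
    """Update monitoring state from a mirrored request packet (no data-path changes)."""
    key = (pkt["src_ip"], pkt["dst_ip"])
    now = time.time()
    entry = domain.setdefault(key, DomainEntry(pkt["src_ip"], pkt["dst_ip"], now, now))
    entry.ack = now
    context_log.append(ContextRecord(pkt["src_ip"], pkt["src_port"],
                                     pkt["dst_ip"], pkt["dst_port"],
                                     pkt["rtype"], now))

domain, log = {}, []
observe({"src_ip": "10.0.0.1", "src_port": 7000, "dst_ip": "10.0.0.2",
         "dst_port": 9000, "rtype": "write"}, domain, log)
print(domain)
```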

SLIDE 22

Application Failure Notification

▷ Spec defines notification semantics
▷ When controller gets notified of failure → notifies application

SLIDE 23

Active Failure Mitigation

▷ Defines how to generate a failure domain and what rules to install on the switch (sketch below)
▷ Compares every domain entry to failed resource to build failure domain
▷ Installs rules based on mitigation action
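A minimal sketch of this mitigation loop, assuming the domain table from the monitoring slide and an abstract install_rule interface; all names, addresses, and the drop action are illustrative:

```python
# Hypothetical active failure mitigation: scan the domain table for entries
# that reference the failed resource, form the failure domain, then install
# switch rules according to the spec's mitigation action.

domain_table = [
    {"cpu_ip": "10.0.0.1", "memory_ip": "10.0.0.2"},
    {"cpu_ip": "10.0.0.5", "memory_ip": "10.0.0.6"},
]

installed_rules = []

def install_rule(match, action):
    """Stand-in for pushing a rule to the ToR switch."""
    installed_rules.append({"match": match, "action": action})

def mitigate(failed_ip, action="drop"):
    # Build the failure domain: every resource sharing an entry with the failed one.
    failure_domain = set()
    for entry in domain_table:
        if failed_ip in entry.values():
            failure_domain.update(entry.values())

    # Enforce fate sharing: drop (or redirect) traffic to everything in the domain.
    for resource in failure_domain:
        install_rule(match={"dst_ip": resource}, action=action)
    return failure_domain

print(mitigate("10.0.0.2"))   # -> {'10.0.0.1', '10.0.0.2'}
print(installed_rules)
```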

SLIDE 24

In-Network Memory Recovery

Normal Execution


SLIDE 25

In-Network Memory Recovery


Under Failure