

SLIDE 1

Tolerating Faults in Disaggregated Datacenters

Amanda Carbonari, Ivan Beschastnikh

University of British Columbia

HotNets ’17

SLIDE 2

Today’s Datacenters


SLIDE 3

The future: Disaggregation


SLIDE 4

The future: Disaggregation is coming

▷ HP The Machine
▷ UC Berkeley Firebox
▷ Intel Rack Scale Design
▷ Ericsson Hyperscale Datacenter System 8000

SLIDE 5

Disaggregation Research Space

▷ Flash/storage disaggregation [Klimovic et al. EuroSys’16, Legtchenko et al. HotStorage’17, Decibel NSDI’17]
▷ Memory disaggregation [Rao et al. ANCS’16, Gu et al. NSDI’17, Aguilera et al. SoCC’17]
▷ Network + disaggregation [R2C2 SIGCOMM’15, Gao et al. OSDI’16]
▷ Our research focus: how to build systems on DDCs

[Figure: rack diagram with a ToR switch connecting a CPU blade, a memory blade, and a storage blade]

SLIDE 6

Our Assumptions

▷ Rack-scale
▷ Partial disaggregation

[Figure: rack-scale racks behind a ToR switch with CPU, memory, and storage blades; under partial disaggregation the CPU blade keeps some local memory and the memory blade keeps some local CPUs]

SLIDE 7

What happens if a resource fails?

▷ DC (server): resources fate share
▷ DDC (disaggregated server): resources do not fate share
▷ How should applications observe resource failures?

DDC fate sharing should be enforced in the network.

SLIDE 8

Why enforce fate sharing in the network?

▷ Reasonable to assume legacy applications will run on DDCs unmodified
▷ All memory accesses are across the rack network
▷ Interposition layer = Software Defined Networking (SDN)

SLIDE 9

Fault tolerance in DDCs

▷ Fate sharing exposes a failure type to higher layers (failure granularity)
▷ Techniques inspired by related work
  ○ Distributed systems [Bonvin et al. SoCC’10, GFS OSDI’03, Shen et al. VLDB’14, Xu et al. ICDE’16]
  ○ HA VMs and systems [Bressoud et al. SOSP’95, Bernick et al. DSN’05, Remus NSDI’08]
  ○ HPC [Bronevetsky et al. PPoPP’03, Egwutuoha et al. Journal of Supercomputing’13]
▷ Open research question: how to integrate existing fault tolerance techniques into DDCs?

SLIDE 10

Fate Sharing Granularities


SLIDE 11

Tainted Fate Sharing

▷ Memory fails → CPU reading/using that memory fails with it
▷ CPU fails while writing to one replica → the inconsistent memory (v1) fails (propagation sketched below)
▷ Modularity vs. performance
▷ Open research question: implications of dynamic, in-network computation
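A minimal sketch of how this propagation could be computed, assuming the controller keeps a taint graph recording which resources have read from or written to which others; the names here (taints, failure_domain) and the addresses are hypothetical, not from the paper:

```python
from collections import defaultdict, deque

# Hypothetical taint graph: an edge a -> b means "if a fails, b is tainted
# and must fail with it".
taints = defaultdict(set)
taints["mem:10.0.0.2"].add("cpu:10.0.0.1")     # CPU has read/used this memory
taints["cpu:10.0.0.1"].add("mem:10.0.0.3/v1")  # CPU was mid-write to replica v1

def failure_domain(failed):
    """All resources that fail with `failed` under tainted fate sharing."""
    domain, queue = {failed}, deque([failed])
    while queue:
        for tainted in taints[queue.popleft()]:
            if tainted not in domain:
                domain.add(tainted)
                queue.append(tainted)
    return domain

print(failure_domain("mem:10.0.0.2"))
# -> {'mem:10.0.0.2', 'cpu:10.0.0.1', 'mem:10.0.0.3/v1'}
```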

SLIDE 12

Fate Sharing Granularities


Containers? Serverless?

DDC fate sharing should be both enforced by the network and programmable.

SLIDE 13

Programmable Fate Sharing

▷ Goal: describe an arbitrary fate sharing model and install it in the network
▷ Model specification includes (example sketch below)
  ○ Failure detection
  ○ Failure domain
  ○ Failure mitigation (optional)
▷ Open research questions:
  ○ Who should define the specification?
  ○ What workflow should be used to transform the specification into switch machine code?
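As a concrete, hypothetical illustration of such a specification, the sketch below captures the three parts as plain data that a controller could compile into switch rules; FateSpec and all field names are invented for illustration and are not from the paper:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical fate sharing specification: the three parts named on the slide
# (failure detection, failure domain, optional failure mitigation) as plain data.

@dataclass
class FateSpec:
    detection: str                    # e.g. "heartbeat every 10 ms"
    domain: Dict[str, List[str]]      # failed resource -> resources that fail with it
    mitigation: Optional[str] = None  # e.g. "redirect to replica", or None

# Server-like fate sharing for one disaggregated "server": if the CPU fails,
# its memory fails with it, and vice versa.
server_spec = FateSpec(
    detection="heartbeat every 10 ms",
    domain={
        "cpu:10.0.0.1": ["mem:10.0.0.2"],
        "mem:10.0.0.2": ["cpu:10.0.0.1"],
    },
    mitigation=None,
)

if __name__ == "__main__":
    failed = "mem:10.0.0.2"
    print("failure domain of", failed, "=", server_spec.domain[failed])
```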

SLIDE 14

Proposed Workflow


SLIDE 15

Fate Sharing Specification

▷ Provides interface between components
▷ High-level language → high-level networking language [1] → compiles to switch
▷ Open research questions:
  ○ Spec verification?
  ○ Language and switch requirements for expressiveness?

[1] FatTire HotSDN’13, NetKAT POPL’14, Merlin CoNEXT’14, P4 CCR’14, SNAP SIGCOMM’16

SLIDE 16

Vision: programmable, in-network fate sharing

Open research questions
▷ Failure semantics for GPUs? Storage?
▷ Switch or controller failure?
▷ Correlated failures?
▷ Other non-traditional fate sharing models?

Thank you!

SLIDE 17

Backup slides


SLIDE 18


SLIDE 19

In-Network Memory Replication

▷ Port mirror CPU operations to memory replicas; automatically recover a replica during failure (sketch below)
▷ Challenges: coherency, network delay, etc.
▷ Different assumptions than previous work
  ○ Persistent storage backings [Sinfonia SOSP’07, RAMCloud SOSP’11, FaRM NSDI’14, Infiniswap NSDI’17]
▷ Must consider network requirements
  ○ Combined solutions [GFS OSDI’03, Ceph OSDI’06]
  ○ Performance sensitive [Costa et al. OSDI’96]
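A minimal sketch of the port-mirroring idea, assuming an abstract match/action interface on the ToR switch; install_rule, the rule format, and the addresses are illustrative and not a real switch API:

```python
# Hypothetical sketch: in-network memory replication by port mirroring.
# A write destined for the primary memory blade is also forwarded to each
# replica's switch port; on failure, traffic is redirected to a replica.

PRIMARY_MEM = "10.0.0.2"
PRIMARY_PORT = 2
REPLICAS = {"10.0.0.3": 3, "10.0.0.4": 4}   # replica IP -> switch egress port

switch_rules = []

def install_rule(match, actions):
    """Stand-in for pushing a match/action rule to the ToR switch."""
    switch_rules.append({"match": match, "actions": actions})

# Mirror memory writes: forward to the primary and copy to every replica.
install_rule(
    match={"dst_ip": PRIMARY_MEM, "op": "mem_write"},
    actions=[("output", PRIMARY_PORT)] + [("output", p) for p in REPLICAS.values()],
)

def fail_over(replica_ip):
    """On primary failure, redirect the primary's traffic to a surviving replica."""
    install_rule(
        match={"dst_ip": PRIMARY_MEM},
        actions=[("set_dst_ip", replica_ip), ("output", REPLICAS[replica_ip])],
    )

fail_over("10.0.0.3")
print(switch_rules)
```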

SLIDE 20

In-Network CPU Checkpointing

▷ Controller checkpoints processor state to remote memory (state attached to operation packets; sketch below)
▷ Challenges: consistent client view, checkpoint retention, non-idempotent operations, etc.
▷ Different requirements than previous work
  ○ Low tail-latency [Remus NSDI’08, Bressoud et al. SOSP’95]
▷ Similar trade-offs (application specific vs. generality)
  ○ Protocol [DMTCP IPDPS’09, Bronevetsky et al. PPoPP’03]
  ○ Workflow [Shen et al. VLDB’14, Xu et al. ICDE’16]
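A rough sketch of the checkpointing flow under these assumptions: CPU state rides along on operation packets, is stored in remote memory, and recovery restores the latest checkpoint onto a spare CPU; all structures and names are hypothetical:

```python
# Hypothetical sketch: in-network CPU checkpointing. Each memory operation
# carries a copy of the CPU's architectural state, which the controller stores
# in remote memory as a checkpoint; on CPU failure, execution resumes from the
# last checkpoint. Retention policy and non-idempotent operations are omitted.

checkpoints = {}     # checkpoint id -> CPU state stored in remote memory
last_ckpt_id = 0

def send_mem_op(op, addr, value, cpu_state):
    """Attach the current CPU state to the operation packet; the controller
    strips it off and records it as a checkpoint."""
    global last_ckpt_id
    last_ckpt_id += 1
    checkpoints[last_ckpt_id] = dict(cpu_state)
    return {"op": op, "addr": addr, "value": value, "ckpt": last_ckpt_id}

def recover_cpu():
    """On CPU failure, restore the most recent checkpoint onto a spare CPU."""
    return checkpoints[last_ckpt_id]

pkt = send_mem_op("write", 0x1000, 42, {"pc": 0x400080, "regs": [0] * 16})
print("resumed state:", recover_cpu())
```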

SLIDE 21

Passive Application Monitoring

▷ Defines what information must be collected during normal execution (sketch below)
  ○ Domain table
  ○ Context information
  ○ Application protocol headers

Domain table: cpu_ip | memory_ip | start | ack (e.g., x.x.x.x | x.x.x.x | ts | ta)
Context information: src IP | src port | dst IP | dst port | rtype | op | tstamp
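A small sketch of how the controller might maintain these records while passively observing mirrored request packets; the field names follow the tables above, but the code and helper names are illustrative:

```python
import time
from dataclasses import dataclass

# Hypothetical monitoring state: a domain-table entry ties a CPU to the memory
# it uses (with start/ack timestamps); a context record captures per-request headers.

@dataclass
class DomainEntry:
    cpu_ip: str
    memory_ip: str
    start: float   # when the CPU first touched this memory
    ack: float     # last acknowledged operation

@dataclass
class ContextRecord:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    rtype: str     # request type, e.g. read/write
    tstamp: float

def observe(pkt, domain, context_log):
    """Update monitoring state from a mirrored request packet (no data-path changes)."""
    key = (pkt["src_ip"], pkt["dst_ip"])
    now = time.time()
    entry = domain.setdefault(key, DomainEntry(pkt["src_ip"], pkt["dst_ip"], now, now))
    entry.ack = now
    context_log.append(ContextRecord(pkt["src_ip"], pkt["src_port"],
                                     pkt["dst_ip"], pkt["dst_port"],
                                     pkt["rtype"], now))

domain, log = {}, []
observe({"src_ip": "10.0.0.1", "src_port": 7000, "dst_ip": "10.0.0.2",
         "dst_port": 9000, "rtype": "write"}, domain, log)
print(domain)
```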

SLIDE 22

Application Failure Notification

▷ Spec defines notification semantics
▷ When controller gets notified of failure → notifies application

SLIDE 23

Active Failure Mitigation

▷ Defines how to generate a failure domain and what rules to install on the switch (sketch below)
▷ Compares every domain entry to failed resource to build failure domain
▷ Installs rules based on mitigation action
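A minimal sketch of this mitigation loop, assuming the domain table from the monitoring slide and an abstract install_rule interface; all names, addresses, and the drop action are illustrative:

```python
# Hypothetical active failure mitigation: scan the domain table for entries
# that reference the failed resource, form the failure domain, then install
# switch rules according to the spec's mitigation action.

domain_table = [
    {"cpu_ip": "10.0.0.1", "memory_ip": "10.0.0.2"},
    {"cpu_ip": "10.0.0.5", "memory_ip": "10.0.0.6"},
]

installed_rules = []

def install_rule(match, action):
    """Stand-in for pushing a rule to the ToR switch."""
    installed_rules.append({"match": match, "action": action})

def mitigate(failed_ip, action="drop"):
    # Build the failure domain: every resource sharing an entry with the failed one.
    failure_domain = set()
    for entry in domain_table:
        if failed_ip in entry.values():
            failure_domain.update(entry.values())

    # Enforce fate sharing: drop (or redirect) traffic to everything in the domain.
    for resource in failure_domain:
        install_rule(match={"dst_ip": resource}, action=action)
    return failure_domain

print(mitigate("10.0.0.2"))   # -> {'10.0.0.1', '10.0.0.2'}
print(installed_rules)
```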

SLIDE 24

In-Network Memory Recovery

Normal Execution


SLIDE 25

In-Network Memory Recovery


Under Failure