The Time-less Datacenter
Paul Borrill and Alan H. Karp
Earth Computing
The Datacenter Resilience Company
Stanford EE Computer Systems Colloquium
Wednesday, November 16, 2016
http://ee380.stanford.edu
Systems can fail in catastrophic ways leading to death or tremendous financial loss. Although there are many potential causes including physical failure, human error, and environmental factors, design errors are increasingly becoming the most serious culprit*
*NASA Formal Methods Program: https://shemesh.larc.nasa.gov/fm/fm-why-new.html
Reliable Consensus
solution in an environment where messages can get lost)
failed link assume about a partner or cohort on the other side?
FLP Result
Key Idea:
network we can’t tell if event at process R happened before event at process Q, unless P caused R in some way
they are stable
through failure & healing, iff you have AIT on each link
(AIT) in the Link
[Figure: space-time diagram of processes P, Q and R along a time axis t. Each event carries a vector-clock label, from P:0 Q:-- R:-- through P:4 Q:5 R:5. Regions are marked Causal History and Future Effect, bounded by lines of slope ≤ c.]
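The labels in the figure are vector clocks, and the "can't tell which happened first" point can be sketched with a small comparison routine. This is a minimal illustration, not code from the talk: the function names are hypothetical, and treating a "--" entry as "no knowledge yet" (lower than any counter) is an assumption.

```python
from typing import Dict

Clock = Dict[str, int]  # process name -> counter; a missing key stands for "--"


def _get(clock: Clock, proc: str) -> int:
    # Assumption: "--" (no knowledge of that process) sorts below every counter.
    return clock.get(proc, -1)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a is in the causal history of b (a <= b everywhere, < somewhere)."""
    procs = set(a) | set(b)
    no_greater = all(_get(a, p) <= _get(b, p) for p in procs)
    strictly_less = any(_get(a, p) < _get(b, p) for p in procs)
    return no_greater and strictly_less


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event can have caused the other."""
    procs = set(a) | set(b)
    differ = any(_get(a, p) != _get(b, p) for p in procs)
    return differ and not happened_before(a, b) and not happened_before(b, a)


# Labels taken from the figure.
q2 = {"Q": 2, "R": 1}           # Q  P:-- Q:2 R:1
p1 = {"P": 1, "Q": 2, "R": 1}   # P  P:1  Q:2 R:1
p2 = {"P": 2, "Q": 2, "R": 1}   # P  P:2  Q:2 R:1
q3 = {"Q": 3, "R": 1}           # Q  P:-- Q:3 R:1
```

With these labels, `happened_before(q2, p1)` holds because P has already seen Q's second event, while `p2` and `q3` are concurrent: no ordering between them can be recovered from the clocks alone.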
to solve the problem
(no safety proof)
partitions (no liveness proof)
understand & get right.
Trees make roles robust, easier to understand & verify
the problem
are the problem
retry to distinguish between delays & drops … but … retries* ruin TCP’s ordering guarantees
simple, but fresh perspective
Peter Bailis, Kyle Kingsbury. The Network is Reliable.
* Application retries (i.e. opening a new socket)
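The point about retries can be made concrete with a toy simulation. It is a sketch under simplifying assumptions: each `Connection` (a hypothetical class, not a real socket) is a FIFO queue, so ordering holds within one connection but not across the old and the new one.

```python
from collections import deque
from typing import Deque, List


class Connection:
    """Toy stand-in for one TCP connection: FIFO among its own messages only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.in_flight: Deque[str] = deque()

    def send(self, msg: str) -> None:
        self.in_flight.append(msg)

    def deliver_one(self, inbox: List[str]) -> None:
        # Hand the oldest in-flight message to the receiver, if any.
        if self.in_flight:
            inbox.append(self.in_flight.popleft())


def simulate() -> List[str]:
    inbox: List[str] = []
    old = Connection("old")
    new = Connection("new")

    old.send("write-1")   # delayed in the network, not dropped

    # The sender times out. It cannot tell a delay from a drop,
    # so it opens a new socket and retries there.
    new.send("write-1")
    new.send("write-2")

    new.deliver_one(inbox)
    new.deliver_one(inbox)
    old.deliver_one(inbox)  # the delayed original finally arrives
    return inbox
```

`simulate()` returns `["write-1", "write-2", "write-1"]`: the receiver sees a duplicate of the first write after the second one, even though each connection on its own delivered in order.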
Switches are
They Drop, Reorder, Delay and Duplicate Packets
Delta Amazon Google Apple Netflix Paypal …
Earth Computing
Tim Berners-Lee
World Wide Web
Key Idea:
2 Simple Sets of Rules
Cloud Computing
Mere mortals can now get their computers to talk to each other
Mere mortals can now manage their infrastructures
ONE WAY LINKS
TWO WAY LINKS
CAS
Key Idea: Lock-Free data structures
Shared Memory
Concurrent Safety, Non-Blocking
{While (CAS(oldvalue,newvalue, ) != new value}

AIT
Key Idea: Recoverable Atomic Tokens
Reversible Token
Deterministic Recoverability, Durable Indivisible Property
{Transfer (AIT(tokenID,Notify=NO, ) != Continue}
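The CAS retry loop on the slide can be sketched as follows. Python exposes no hardware compare-and-swap, so the `AtomicInt` class here is a hypothetical stand-in that emulates atomicity with a lock; the retry loop in `atomic_increment` is the part that mirrors the slide, and the example is illustrative rather than truly lock-free.

```python
import threading


class AtomicInt:
    """Emulated atomic cell. The internal lock stands in for hardware CAS."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()

    def load(self) -> int:
        with self._guard:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        # Install `new` only if the cell still holds `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(cell: AtomicInt) -> int:
    # Classic CAS loop: read, compute, try to publish, retry on interference.
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old + 1


def run(n_threads: int = 4, per_thread: int = 1000) -> int:
    cell = AtomicInt()

    def worker() -> None:
        for _ in range(per_thread):
            atomic_increment(cell)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

No increment is lost: a thread whose snapshot has gone stale simply fails its swap and tries again, so `run(4, 1000)` ends at 4000.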
C2C Lattice of Cells & Links
[Figure: today's datacenter network hierarchy. Legend: DC, GW = Gateway, SN = Spine Node, LN = Leaf Node, ToR]
Today’s Networking: Servers & Switches
EARTH Computing: Cells & Links
Servers, Any to Any (IP) addressing
Today: Internal Segregation Firewalls
EARTH: Dynamic Confinement Domains
The Datacenter Simplified
The Datacenter Today
Split infrastructure into:
Cloud: datacenter accessed by untrusted legacy protocols
Earth: dynamic, resilient, programmable topologies
Core: where data is immutable, secure, protected, & resilient to perturbations (failures, disasters, attacks)
Cloudplane Outside World EarthCore
Data Center
Confidential | Earth Computing Inc.
EarthCore
[Figure: Fabric of cells, each consisting of a Cell Agent and six NICs]
EARTH Computing Link Protocol (ECLP)
[Figure: cells, each with a Cell Agent and six NICs, joined NIC to NIC by cables]
*Knowledge and Common Knowledge in a Distributed Environment – Joseph Y. Halpern & Yoram Moses ’90 (initial version 1984).
Two Phase Commit Paxos Link Reversal
the communication liveness test, and we can avoid blocking on agent ready, by having the link store the token on the receiving half of the link. If there is a failure, both sides know; and both sides know what to do next.
chosen and then restart, a solution is impossible unless some information can be remembered by an agent that has failed and restarted”. The assumption is when a node has failed and restarted, it can’t remember the state it needs to recover. With AIT, the other half of the link can tell it the state to recover from.
some edges. Transforming an arbitrary directed acyclic input graph into an output graph with at least one route from each node to a special destination node. The resulting graph can thus be used to route messages in a loop-free manner. Links store the direction of the arrow (head and tail); AIT facilitates the atomic swap of the arrow’s tail and head to maintain loop-free routes during failure and recovery.
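The link-reversal idea can be sketched with the textbook "full reversal" rule: any node other than the destination that has no outgoing link reverses all of its links, and this repeats until every node has a way out. This is a generic illustration, not the talk's AIT-based mechanism; the function name and the `max_rounds` guard are assumptions, and the graph is assumed to be connected and to contain the destination.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (tail, head): a link pointing tail -> head


def full_link_reversal(edges: Iterable[Edge], destination: str,
                       max_rounds: int = 10_000) -> Set[Edge]:
    """Reorient a DAG so that every node has a route to `destination`."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    for _ in range(max_rounds):
        has_outgoing = {tail for tail, _ in edges}
        # A sink is a non-destination node with no outgoing link.
        sinks = sorted(n for n in nodes
                       if n != destination and n not in has_outgoing)
        if not sinks:
            return edges
        for sink in sinks:
            incoming = [(u, v) for (u, v) in edges if v == sink]
            for (u, v) in incoming:
                # Swap the arrow's tail and head.
                edges.remove((u, v))
                edges.add((v, u))
    raise RuntimeError("no convergence; is the destination reachable?")


# Routes toward D were A->B->D and A->C->D; then the link B-D fails.
after_failure = {("A", "B"), ("A", "C"), ("C", "D")}
repaired = full_link_reversal(after_failure, destination="D")
```

After the failure, B is left with no outgoing link, so it reverses its link to A, and traffic from B then flows B to A to C to D. The result stays loop-free because reversing every link at a sink cannot close a cycle.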
Courtesy: Adrian Colyer, The Morning Paper. https://blog.acolyer.org/2015/02/16/knowledge-and-common-knowledge-in-a-distributed-environment/
Confidential | Earth Computing Inc. | Paul Borrill
[Figure: a cell with its Agent and six NICs. Legend:]
Cell Hardware (containing Processor, Memory, Storage and (e.g. 6) physical network ports)
Cell Software Primary Agent
Network Interface Controller
Network Connector
Entanglement Synchronization Domain
TRAPHs (Tree-gRAPHs)
Elasticity, Migration, Failover
Bare Metal
NAL
Network Asset Layer
Cloudplane Outside World EarthCore