SLIDE 1

Distributed Systems and Databases of the Globe Unite! The Cloud, the Edge and Blockchains

Amr El Abbadi

University of California, Santa Barbara

Divy Agrawal, Mohammad Amiri, Sujaya Maiyya, Faisal Nawab (UCSC),

Victor Zakhary.

OPODIS 2018 1

SLIDE 2

Protocols Supporting the Cloud

  • Scalability: shard or partition the data
    → Commit protocols (2PC)
  • Fault-tolerance and fast access: replicate the data
    → State machine replication and consensus protocols (Paxos)

SLIDE 3

Google's Spanner (diagram), layered across Datacenters A, B, …, Z:

  • Application access tier
  • Application execution tier: transactions via 2PL + 2PC
  • Storage tier: abstract replication via Paxos

SLIDE 4

A Path for Unification

SLIDE 5

PAXOS

SLIDE 6

Paxos: No-Failure Case

  • Leader Election: Initially, a leader is elected by a majority quorum.
  • Replication: Leader replicates new updates to a majority quorum.
  • Decision: Propagate decision to all asynchronously

Diagram: Leader Election → Fault-Tolerant Agreement (proposer to a majority) → Decision (leader propagates asynchronously).
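The safety of these phases rests on any two majority quorums intersecting, which is what lets a newly elected leader discover any value a prior leader replicated. A minimal sketch (illustrative names, not from the talk):

```python
from itertools import combinations

def majority_quorums(nodes):
    """All quorums of minimal majority size over a node set."""
    size = len(nodes) // 2 + 1
    return [set(q) for q in combinations(nodes, size)]

acceptors = ["A", "B", "C", "D", "E"]
quorums = majority_quorums(acceptors)
# Any two majority quorums share at least one node, so the quorum that
# elects a new leader always overlaps the quorum that accepted a value.
assert all(q1 & q2 for q1 in quorums for q2 in quorums)
```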

SLIDE 7

Paxos: Failure Case

  • Leader Election: If the leader fails, a new leader is elected

Diagram: Leader Election → Fault-Tolerant Agreement (majority) → Decision. Leader election also performs value discovery, in case agreement has already been reached.

SLIDE 8

Atomic Commitment

SLIDE 9

Two-Phase Commit: No-Failure Case

  • Leader: Initially, a coordinator is chosen by the transaction manager
  • Value Discovery: The coordinator collects votes from ALL cohorts
    • If all vote yes, Decision = Commit; if any votes no or fails, Decision = Abort
  • Fault-Tolerance: Make the decision persistent on disk
  • Decision: Send the decision to all cohorts

Diagram: Value Discovery (coordinator polls all cohorts) → Decision; the decision is made fault-tolerant by storing it on disk.
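The coordinator's decision rule can be sketched in a few lines (a toy model of the vote-collection step, not a full 2PC implementation):

```python
def commit_decision(votes):
    """Coordinator's rule in 2PC: commit only if EVERY cohort voted yes;
    a 'no' vote or a missing (failed / timed-out) vote forces abort."""
    return "commit" if votes and all(v == "yes" for v in votes) else "abort"

assert commit_decision(["yes", "yes", "yes"]) == "commit"
assert commit_decision(["yes", "no", "yes"]) == "abort"
assert commit_decision(["yes", None, "yes"]) == "abort"  # a cohort failed
```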

SLIDE 10

Three Phase Commit

  • 2PC can block
  • Solution: Three-Phase Commit (3PC)
    • Replicate the decision to other cohorts (as in Paxos) to avoid blocking on site failure

Diagram: Value Discovery (coordinator polls all cohorts) → Fault-Tolerant Agreement (majority of cohorts) → Decision.

SLIDE 11

Three-Phase Commit: Termination

  • If the leader fails or is partitioned: elect a new leader and execute the termination protocol

Diagram: Leader Election & Value Discovery (majority of cohorts) → Fault-Tolerant Agreement (majority) → Decision.

SLIDE 12

Common phases observed?

  • Paxos and 2PC/3PC are leader-based protocols
  • Agreement on a single value is the main goal
  • Both ensure fault tolerance for the decided value
  • Both disseminate the decision, typically asynchronously

SLIDE 13

Consensus & Commitment (C&C) Framework

Phases: Leader Election → Value Discovery → Fault-Tolerant Agreement → Decision

SLIDE 14

Paxos Atomic Commitment (PAC)

  • Any process can terminate a transaction: leader election
  • No separate termination protocol (like Paxos)

Diagram: Leader Election & Value Discovery (leader gathers from a majority of cohorts) → Fault-Tolerant Agreement (majority) → Decision (sent to all).

SLIDE 15

2PC/State Machine Replication (SMR)

  • Alternative approach to achieve fault‐tolerance
  • Replicate state of each process for persistence
  • Spanner and Gray and Lamport 2006
  • Layered architecture: 2PC on top of SMR
  • 2PC among coordinator and cohorts
  • SMR among shard leaders and replicas

Diagram: 2PC runs between the coordinator and cohorts; each process is made persistent by replicating its state across a leader and replicas (fault-tolerant persistence).

SLIDE 16

2PC/State Machine Replication (SMR)

  • Alternative approach to achieve fault‐tolerance
  • Replicate state of each process for persistence
  • Spanner and Gray and Lamport 2006
  • Layered architecture: 2PC on top of SMR
  • 2PC among leaders of coordinator and cohorts
  • SMR among shard leader and replicas

Diagram: Value Discovery and Decision run between the coordinator leader and cohort leaders; each leader persists to a majority of its replicas (fault-tolerant persistence).

SLIDE 17

Generalized‐PAC (G‐PAC)

  • Follows the abstractions of C&C
  • Flattened architecture:
  • No notion of cohort leader and replica

  • Coordinator → all identical replicas

  • Saves one round trip of communication
  • Related to other protocols that consolidate consensus and commitment, like TAPIR [Zhang SOSP 2015] and Janus [Mu OSDI 2016]
    • Restrictive assumptions

SLIDE 18

G‐PAC (Generalized Paxos Atomic Commit)

Diagram: Leader Election + Value Discovery (coordinator gathers a majority of replicas from ALL cohorts) → Fault-Tolerant Agreement (a majority of replicas from a majority of cohorts) → Decision (all).
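The two quorum conditions in the diagram can be captured as predicates over per-cohort acknowledgements; a sketch with hypothetical helper names (not from the paper):

```python
def value_discovery_ok(acks, cohorts):
    """G-PAC value discovery: a majority of replicas from EVERY cohort."""
    return all(len(acks.get(c, set())) > len(replicas) // 2
               for c, replicas in cohorts.items())

def agreement_ok(acks, cohorts):
    """G-PAC fault-tolerant agreement: a majority of replicas
    from a MAJORITY of cohorts."""
    satisfied = sum(1 for c, replicas in cohorts.items()
                    if len(acks.get(c, set())) > len(replicas) // 2)
    return satisfied > len(cohorts) // 2

cohorts = {"c1": {"r1", "r2", "r3"}, "c2": {"r4", "r5", "r6"}}
acks = {"c1": {"r1", "r2"}, "c2": {"r4"}}   # only c1 reaches a majority
assert not value_discovery_ok(acks, cohorts)
assert not agreement_ok(acks, cohorts)      # 1 of 2 cohorts is not a majority
acks["c2"] = {"r4", "r5"}
assert value_discovery_ok(acks, cohorts) and agreement_ok(acks, cohorts)
```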

SLIDE 19

Consensus & Commitment (C&C) Framework

  • Useful for modeling many existing data-management protocols, as well as for proposing new ones

Phases: Leader Election → Value Discovery → Fault-Tolerant Agreement → Decision

SLIDE 20

Consensus for Edge Data Management

SLIDE 21

The future of web/cloud applications

  • Emerging technologies
  • Business Analytics
  • Virtual/Augmented Reality
  • Data Science
  • Sensors/IoT

SLIDE 22

The Cloud

  • Big potential, but bigger challenges
  • Application Requirements:
  • Real‐time (low latency)
  • Continuous data flows (high throughput)
  • Challenge 1: the cloud is far away (hundreds of milliseconds to seconds)

SLIDE 23

Is there a principled approach to decentralize the cloud for large scale replication?

SLIDE 24

Edge Data Management

SLIDE 25

We are making the world a better place through Paxos algorithms

SLIDE 26

Flexible Paxos [Howard et al. OPODIS 2016]

  • Majority quorums for BOTH Leader Election AND Replication are too conservative

Diagram: seven nodes (1-7); both the Leader Election and Replication quorums are four-node majorities.

SLIDE 27

Flexible Paxos

  • Generalized Quorum Condition: only Leader Election Quorums and Replication Quorums must intersect
  • Decouple Leader Election Quorums from Replication Quorums
    • Replication Quorums can be arbitrarily small, as long as every Leader Election Quorum intersects every Replication Quorum

  • No changes to Paxos algorithms
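With size-based quorums, the generalized condition reduces to a simple arithmetic check; a minimal sketch (illustrative helper name):

```python
def flexible_paxos_safe(n, le_quorum_size, rep_quorum_size):
    """FPaxos condition: every leader-election quorum must intersect
    every replication quorum; for size-based quorums over n nodes
    this holds iff |LE| + |Rep| > n."""
    return le_quorum_size + rep_quorum_size > n

# Classic Paxos on 7 nodes: two majorities (4 + 4 > 7).
assert flexible_paxos_safe(7, 4, 4)
# FPaxos trade-off: a replication quorum of 2 is safe if leader
# election uses 6 of the 7 nodes (6 + 2 > 7) ...
assert flexible_paxos_safe(7, 6, 2)
# ... but 4 + 3 is not: quorums of sizes 4 and 3 may be disjoint.
assert not flexible_paxos_safe(7, 4, 3)
```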

Diagram: seven nodes (1-7); a large Leader Election Quorum paired with a small Replication Quorum.

SLIDE 28

Back to Edge Data Management

  • Edge persistence: edge datacenters store copies of data
  • Storage offloading: data placed in the edge near users

SLIDE 29

Beirut 2018

  • A zone: a mutually exclusive set of f nodes
    • A datacenter plus edge nodes, or
    • Edge nodes only
SLIDE 30

An edge‐aware Paxos

  • Direct application of Flexible Paxos to zones.
  • Elect a leader zone rather than a leader node

Diagram: Zones 1-4, with one zone designated the Leader Zone.

Paxos:
  • Replicate updates to a majority of all nodes
  • Leader election: a majority of all nodes

Edge Paxos:
  • Replicate updates to a majority of nodes in the leader zone
  • Leader election: a majority from within all zones

SLIDE 31

An edge‐aware, mobile Paxos

Diagram: Leader Election spans Zones 1-4; replication is local within each zone.

SLIDE 32

CAN WE DO BETTER???

SLIDE 33

Expanding Quorums

  • Dynamic Expanding Leader Election Quorums:
    • A leader announces the Replication Quorum it will use
    • Future Leader Election Quorums need intersect only the announced quorums
  • Implementation:
    • Intent Replication Quorums are piggybacked in the leader-election phase
    • To detect intents, Leader Election Quorums must intersect
    • If an announcement is detected, the Leader Election Quorum expands to intersect the announced Intent Replication Quorums
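The expansion step can be sketched as follows (a hypothetical helper, not the talk's algorithm: it greedily adds one node per disjoint intent quorum until intersection is restored):

```python
def expand_le_quorum(le_quorum, announced_intents):
    """Grow a leader-election quorum until it intersects every
    announced intent replication quorum."""
    quorum = set(le_quorum)
    for intent in announced_intents:
        if not quorum & intent:
            quorum.add(min(intent))  # pull in one node of that intent quorum
    return quorum

base = {"a1", "a2"}
intents = [{"a1", "a3"}, {"z1", "z2"}]      # the second intent is disjoint
expanded = expand_le_quorum(base, intents)
# After expansion, the leader-election quorum intersects every intent.
assert all(expanded & intent for intent in intents)
```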

SLIDE 34

Expanding Quorums example

Diagram: a leader elected with {intent: zone 1} replicates locally within Zone 1; a later leader election announcing {intent: zone 5} triggers a Leader Election expansion to cover Zone 5.

SLIDE 35

Leader Zone Expanding Quorums

  • Can we design smaller Leader Election quorums?
  • Leader Zone: assign one zone as the Leader Manager Zone
  • Leader Election quorums: a majority of nodes in the Leader Manager Zone
    • All Leader Election quorums intersect
  • Use Intent Quorums to expand Leader Election Quorums
    • Especially useful if the aspiring leaders are close to each other

SLIDE 36

Allowing for Mobility: Leader Handoff

  • Treat leadership as a logical role instead of a physical one
  • Relinquish leadership to another node when the user moves
  • Note: the node hosting the previous leader remains functional

Diagram: the leader in Zone A calls relinquish(), handing its current state and log slots to a new leader in Zone C.

SLIDE 37

Dynamic Paxos: a natural marriage with Edge Computing

SLIDE 38

Blockchains

  • Many interesting (controversial?) problems in new guises:
    • Distributed Systems: consensus, replication, etc.
    • Data Management: transactions, replication, commitment, etc.

SLIDE 39

Origins of Blockchain: Traditional Banking Systems

SLIDE 40

Bitcoin

SLIDE 41

Traditional Banking Systems

  • From a database and distributed-computing perspective:
    • Identities and Signatures
      • You are your signature: IDENTITY → private and public digital signatures
    • Ledger
      • The balance of each identity (saved in a DB) → a blockchain (basically a linked list!)
    • Transactions
      • Move money from one identity to another
      • Concurrency control to serialize transactions → mining and Proof of Work
      • Typically backed by a transaction log
      • The log is persistent → replication to the whole world
      • The log is immutable and tamper-free (end-users trust this) → hash pointers

SLIDE 42

A Bitcoin Big Picture

Diagram: a chain of signed transfers: Signature(…→Alice) → sign with Sk-Alice over Pk-Bob → Signature(Alice→Bob) → sign with Sk-Bob over Pk-Diana → Signature(Bob→Diana) → …

  • A bitcoin is a chain of digital signatures
  • Coin owners digitally sign their coins to transfer them to other recipients
  • Alice gives a bitcoin to Bob, Bob gives it to Diana, etc.

SLIDE 43

Double Spending

  • Spending the same digital cash asset more than once
  • Impossible with physical cash
  • Prevented in traditional banking systems through concurrency control

Diagram: Bob signs the same coin to both Diana ("I took her car") and Marty ("I took his ring").

SLIDE 44

Double Spending Prevention

  • Classic Approach: Centralization, Concurrency Control, etc
  • Blockchain Approach: State Machine Replication (SMR)
  • A network of nodes maintains a ledger (or log)
  • Network nodes work to agree on transaction order
  • Serializing transactions on every coin prevents double spending

SLIDE 45

The Ledger (or log)

  • Where is the ledger stored?
  • Each network node maintains its copy of the ledger
  • Transactions are grouped into blocks
  • How is the ledger tamper-free?
    • Blocks are connected through hash pointers (SHA-256)
    • Each block contains the hash of the previous block

Diagram: blocks linked by Hash() pointers.
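The hash-pointer construction can be demonstrated with a toy ledger (illustrative only, not Bitcoin's actual block format): changing any historical block changes its hash, which breaks the pointer stored in its successor.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, txs):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "txs": txs})

def verify(chain):
    """Tamper check: every block must hold the hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, ["Alice->Bob: 1 BTC"])
append_block(chain, ["Bob->Diana: 1 BTC"])
assert verify(chain)
chain[0]["txs"] = ["Alice->Mallory: 1 BTC"]   # tamper with history
assert not verify(chain)                      # detected via hash pointers
```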

SLIDE 46

Making Progress

  • The ledger is fully replicated to all network nodes
  • To make progress:
  • Network nodes validate new transactions to make sure that:
    • Transactions in the new block do not conflict with each other
    • Transactions in the new block do not conflict with previous blocks' transactions
  • Network nodes need to agree on the next block to be added to the blockchain → Consensus

SLIDE 47

Mining Details: Block Creation

Diagram: each network node assembles transactions TX1 … TXn into a candidate block.

SLIDE 48

Can Network Nodes Use Consensus?

SLIDE 49

Consensus Protocols

  • Consensus protocols require all participants to be known a priori
  • Permissioned vs. permissionless settings
    • Permissionless setting: network nodes freely join or leave at any time
SLIDE 50

Nakamoto’s Consensus: Proof of Work (PoW)

  • Intuitively, network nodes race to solve a puzzle: a lottery
  • This puzzle is computationally expensive
  • Once a network node finds (mines) a solution:
    • It adds its block of transactions to the blockchain
    • It multicasts the solution to the other network nodes
    • The other network nodes verify and accept the solution
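The puzzle race can be illustrated with a toy proof-of-work loop (a low difficulty is used so it runs quickly; real Bitcoin targets are vastly harder, and the header string here is just a placeholder):

```python
import hashlib

def mine(header, difficulty_bits=16):
    """Toy proof of work: find a nonce such that
    SHA256(header || nonce) < target, i.e. the hash starts
    with `difficulty_bits` zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1

nonce, digest = mine("prev_hash|merkle_root|timestamp|target_bits")
# Verifying the solution is cheap: recomputing one hash suffices.
assert int(digest, 16) < 1 << (256 - 16)
```

Finding a solution takes many hash attempts, but checking one takes a single hash, which is what lets the other nodes verify and accept a winning block cheaply.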

SLIDE 51

Mining Details: Mining

Diagram: a block holds transactions TX1 … TXn plus a TXreward, and a header containing: Version, Previous Block Hash, Merkle Tree Root Hash, Time Stamp, Current Target Bits, and Nonce. Mining searches for a nonce such that SHA256(header) < D.

SLIDE 52

Mining Big Picture

SLIDE 53

Forks: Double Spending

  • Transactions in the forked blocks might conflict with each other
  • Could lead to double spending
  • Forks have to be eliminated
    • Transactions in an eliminated block have to be resubmitted
  • Miners join the longest chain to resolve forks
  • A transaction is "committed" once its block is 6 blocks deep

Diagram: Bob tries to double-spend the same coin in two transactions, one on each fork branch.

SLIDE 54

Permissioned Blockchain

  • Run a blockchain among known, identified participants
  • Secures the interactions among a group of entities that have a common goal but do not fully trust each other
  • Consensus using PBFT
  • Fast
  • IBM's Hyperledger Fabric

SLIDE 55

Atomic Swaps

  • Allow transactions to span multiple blockchains
    • E.g., swap Bitcoin with Ethereum
  • The goal: swap assets across multiple blockchains such that
    • If all parties conform to the protocol, all swaps take place
    • If some coalition deviates from the protocol, no conforming party ends up worse off
    • No coalition has an incentive to deviate from the protocol
  • TierNolan, Atomic swap using cut and choose, https://bitcointalk.org/index.php?topic=193281.msg2224949#msg2224949 (2013)
  • Herlihy, Maurice. "Atomic cross-chain swaps." PODC 2018

SLIDE 56

Atomic Swap Example

  • Alice wants to trade Bitcoin for Ethereum with Bob

SLIDE 57

Atomic Swap Example

  • Alice wants to trade Bitcoin for Ethereum with Bob

Diagram: Alice creates a secret s and calculates its hash h = H(s); she now holds s and h.
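Alice's commitment step can be sketched with SHA-256 (a toy hashlock, not an actual on-chain contract): publishing h commits to s without revealing it, and later only a correct preimage unlocks the funds.

```python
import hashlib
import secrets

# Alice's side: pick a secret s and publish only its hash h = H(s).
s = secrets.token_bytes(32)
h = hashlib.sha256(s).hexdigest()

def redeem(claimed_secret, locked_hash):
    """A hashlock releases funds only to whoever reveals a preimage
    of the locked hash."""
    return hashlib.sha256(claimed_secret).hexdigest() == locked_hash

assert redeem(s, h)                   # revealing s unlocks the contract
assert not redeem(b"wrong guess", h)  # anything else fails
```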

SLIDE 58

Atomic Swap Example

  • Alice wants to trade X Bitcoin for Y Ethereum with Bob

Diagram: Alice publishes T1 on the Bitcoin blockchain: move X bitcoins to Bob if Bob provides a secret s with h = H(s).

SLIDE 59

Atomic Swap Example

  • Now, h is announced in Bitcoin blockchain and made public

Diagram: Alice's X bitcoins are locked in T1's smart contract; Bob publishes T2 on the Ethereum blockchain: move Y Ethereum to Alice if Alice provides a secret s with h = H(s).

SLIDE 60

Atomic Swap Example

  • Now, for Alice to execute T2 and redeem Y Ethereum, she reveals s

Diagram: Alice's X bitcoins are locked in T1's smart contract and Bob's Y Ethereum in T2's smart contract.

SLIDE 61

Atomic Swap Example

  • Revealing s, executes T2. Now s is public in Ethereum’s blockchain

Diagram: executing T2 reveals s publicly on the Ethereum blockchain.

SLIDE 62

Atomic Swap Example

  • Now, Bob uses s to execute T1 and redeem his Bitcoins

Diagram: Bob submits s to T1 on the Bitcoin blockchain and redeems the X bitcoins.


SLIDE 64

Atomic Swap Example: What can go wrong?

  • Alice locks her X bitcoins on the Bitcoin blockchain through T1
  • Bob sees T1 but refuses to publish T2
  • Now Alice's bitcoins are locked for good
  • A conforming Alice ends up worse off because Bob doesn't follow the protocol
  • Prevention:
    • Use timelocks to expire a contract
    • Specify that an expired contract is refunded to its creator

SLIDE 65

Atomic Swap Example: Timelocks

  • T1: Move X bitcoins to Bob if Bob provides a secret s with h = H(s)
  • T2: Move Y Ethereum to Alice if Alice provides a secret s with h = H(s)
  • T3: Refund T1 to Alice if Bob does not execute T1 within 48 hours
  • T4: Refund T2 to Bob if Alice does not execute T2 within 24 hours

How do we determine the time period of a timelock?

SLIDE 66

Timelocks

  • Timelocks are set so that no conforming party ends up worse off:
    • Force Alice to reveal s before Bob's contract (T2) expires
    • Allow enough time for Bob to execute T1 after Alice executes T2
    • If Alice never reveals s, both contracts expire and are refunded
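These constraints can be expressed as a simple check on the two refund times (hypothetical helper; delta stands for the time one party needs to publish a transaction and have it confirmed):

```python
def timelocks_safe(alice_refund_h, bob_refund_h, delta_h):
    """Sketch: the swap is safe if Bob's contract (T2) expires first and
    Alice's contract (T1) stays live at least one publish-and-confirm
    period (delta) longer, so Bob can still redeem after s is revealed."""
    return bob_refund_h + delta_h <= alice_refund_h

# The slide's numbers: T1 refundable after 48h, T2 after 24h, leaving
# Bob a 24h window to use the revealed s; safe whenever delta <= 24h.
assert timelocks_safe(alice_refund_h=48, bob_refund_h=24, delta_h=6)
# Equal expiries are unsafe: Bob may have no time to redeem with s.
assert not timelocks_safe(alice_refund_h=24, bob_refund_h=24, delta_h=6)
```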

SLIDE 67

Atomic Swap Modeling [Herlihy, PODC 2018]

  • A cross-chain swap is modeled as a directed graph D = (V, A)
    • Vertices V are parties; arcs A are proposed asset transfers
  • There is a known time bound Δ
    • Δ should be enough for one party to publish a contract to a blockchain and for a second party to confirm that the contract has been published
  • Generalizes the basic solution.
SLIDE 68

Parting Thoughts

  • Building global-scale data management systems

Diagram: the intersection of Distributed Systems, Data Management, Security and Privacy, and Economics.