Dependability and Security with Clouds‐of‐Clouds

lessons learned from n years of research

Miguel Correia

WORKSHOP ON DEPENDABILITY AND INTEROPERABILITY IN HETEROGENEOUS CLOUDS (DIHC13), August 27th 2013, Aachen, Germany

Outline

  • Motivation
  • Opportunities and challenges
  • Storage – DepSky
  • Processing – BFT MapReduce
  • Services – EBAWA
  • Conclusions

(Storage, processing, and services are the 3 example clouds‐of‐clouds.)


Motivation

Clouds are complex, so they fail

These faults can stop services, corrupt state and execution: Byzantine faults


Cloud‐of‐Clouds

  • Consumer runs a service on a set of clouds forming a virtual cloud, what we call a cloud‐of‐clouds

  • Related to the notion of federation of clouds

– “Federation of clouds” suggests a virtual cloud created by providers
– “Cloud‐of‐clouds” suggests a virtual cloud created by consumers, possibly for improving dependability and security


Cloud‐of‐Clouds dependability+security

  • There is cloud redundancy and diversity

  • so even if some clouds fail, a cloud‐of‐clouds that implements replication can still guarantee:

– Availability – if some stop, the others are still there
– Integrity – they can vote on which data is correct
– Disaster‐tolerance – clouds can be geographically far apart
– No vendor lock‐in – several clouds anyway


Opportunities and challenges

Replication / geo‐replication in clouds

  • Provides opportunities and challenges

  • Some data from Amazon EC2

– Not different clouds, but close enough
– Data collected ~hourly during August 2–15, 2013
– One micro instance (virtual server) per Amazon region


Geographical redundancy and diversity

[Map: Amazon EC2 regions and availability zones]

  • Each region is completely independent
  • Each availability zone (AZ) is isolated
  • Note: map drawn by the author; positions may not be accurate

Network redundancy and diversity


[Graph: AS‐level connectivity observed between the EC2 regions]

  • Autonomous systems (ASs) provide another level of diversity (most ISPs have more than one)
  • ISPs observed on August 2nd (a few changes were observed over the 2 weeks)
  • This is not the complete graph; several edges are missing

Two labels on an edge mean a different path per direction; the label that counts is the one closest to the destination.


Latency: high and variable

[Plot: RTT (ms) between region pairs nc‐or, si‐sp, ir‐sy, ir‐to, ir‐sp, ir‐nc, ir‐nv, sampled from 02‐08‐2013 to 13‐08‐2013; vertical axis up to 700 ms.]

Compare with 0.2 ms in an Ethernet LAN…

Throughput: low and variable

[Plot: throughput (Mbit/s) for the same region pairs, sampled from 02‐08‐2013 to 13‐08‐2013; vertical axis up to 100 Mbit/s.]

  • Same pairs as in the previous slide, but in roughly the opposite order (nc‐or, ir‐nv, ir‐nc, ir‐sp, ir‐to, ir‐sy, si‐sp)
  • Important: the throughput is higher with better instances (we used micro)


Economic cost (data transfer)

  • Cost for data transfer IN to EC2 from the Internet: $0
  • Cost for data transfer OUT from EC2 to the Internet:

– Vertical axis is data transferred (1 GB – 611 TB) and has a logarithmic scale

[Chart: data transferred vs. cost (US $); costs of roughly $1, $12, $120, $1,200, $8,300, and $29,300.]

Data obtained in Aug. 2013 at http://aws.amazon.com/ec2/pricing/

CAP theorem

  • It is impossible for a web service to provide the following three guarantees:

– Consistency
– Availability
– Partition‐tolerance

  • Network diversity suggests partitions are unlikely

– Nodes may get isolated, but not sets of nodes from others
– But relaxed consistency may be offered if they happen
– Current research topic; we won’t address it


Storage – DepSky

DepSky

  • (Client‐side) library for cloud‐of‐clouds storage

– File storage, similar to Amazon S3: read/write data, etc.

  • Uses storage clouds as they are:

– No specific code in the cloud

  • Data is updatable

– Byzantine quorum replication protocols for consistency

[Diagram: a DepSky client runs the replication protocols over four storage clouds: Amazon S3, Nirvanix, Rackspace, Windows Azure.]


Write protocol

[Diagram: DepSky write protocol over clouds A–D. The client first sends WRITE FILE with the data D to all four clouds and collects ACKs; it then sends WRITE METADATA and collects ACKs.]
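A minimal sketch of this two‐phase write, assuming an in‐memory dict per cloud in place of a real storage‐cloud API and a keyed hash in place of the real signatures; the n−f quorum logic follows the figure, everything else (names, helpers) is illustrative.

```python
# Sketch of the DepSky write protocol over in-memory "clouds" (dicts).
# sign() is a toy keyed hash standing in for real public-key signatures.
import hashlib
from concurrent.futures import ThreadPoolExecutor

N, F = 4, 1                          # n = 3f+1 clouds, at most f faulty
QUORUM = N - F                       # enough ACKs to proceed
clouds = [dict() for _ in range(N)]
SIGNING_KEY = b"writer-private-key"  # toy stand-in for a private key

def sign(blob: bytes) -> str:
    return hashlib.sha256(SIGNING_KEY + blob).hexdigest()

def put(cloud, key, value) -> bool:
    cloud[key] = value               # a real client calls the cloud's REST API
    return True                      # the ACK

def write(name: str, data: bytes, version: int):
    with ThreadPoolExecutor(max_workers=N) as pool:
        # Phase 1: WRITE FILE to all clouds; a real client proceeds after
        # the first n-f ACKs (here we simply wait for all).
        acks = pool.map(lambda c: put(c, f"{name}-v{version}", data), clouds)
        assert sum(acks) >= QUORUM
        # Phase 2: WRITE METADATA (version number + signed file digest).
        meta = {"version": version, "digest": hashlib.sha256(data).hexdigest()}
        meta["sig"] = sign(repr(sorted(meta.items())).encode())
        acks = pool.map(lambda c: put(c, f"{name}-meta", meta), clouds)
        assert sum(acks) >= QUORUM

write("report", b"qwjda sjkhd ahsd", version=1)
```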

Read protocol

[Diagram: DepSky read protocol over clouds A–D. The client sends REQUEST METADATA to all clouds, picks the highest version number from the replies, and sends REQUEST FILE to a single cloud (the fastest or cheapest one), which returns the file D.]

The file is fetched from other clouds if the signature doesn’t match the file.
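A matching sketch of the read side, reusing clouds, sign, and hashlib from the write sketch above; picking the highest version and falling back on a signature mismatch mirror the figure and the note above.

```python
# Sketch of the DepSky read protocol (continues the write sketch).
def verify(meta) -> bool:
    body = {k: v for k, v in meta.items() if k != "sig"}
    return meta["sig"] == sign(repr(sorted(body.items())).encode())

def read(name: str) -> bytes:
    # Phase 1: REQUEST METADATA from the clouds; keep correctly signed
    # replies and pick the highest version number.
    metas = [c[f"{name}-meta"] for c in clouds if f"{name}-meta" in c]
    best = max((m for m in metas if verify(m)), key=lambda m: m["version"])
    # Phase 2: REQUEST FILE from one cloud at a time (a real client orders
    # them by speed or price); fall back if the digest doesn't match.
    for cloud in clouds:
        data = cloud.get(f"{name}-v{best['version']}")
        if data is not None and hashlib.sha256(data).hexdigest() == best["digest"]:
            return data
    raise RuntimeError("no cloud returned a file matching the signed digest")

assert read("report") == b"qwjda sjkhd ahsd"
```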


Limitations of the solution so far

  • Data is accessible by cloud providers
  • Requires n×|Data| storage space

[Diagram: clouds A–D each store a full copy of the data.]

Combining erasure codes and secret sharing

[Diagram: the data is encrypted with a random key K; K is split into shares S1–S4 using secret sharing, and the encrypted data is dispersed into fragments F1–F4 using an erasure code; cloud i stores the pair (Fi, Si). This is done only for the data, not the metadata.]

The inverse process reconstructs the data from any f+1 shares/fragments. The data is encrypted, so it can’t be read at a cloud! And it occupies only twice the size of the original data, not 4 times!
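A toy end‐to‐end sketch of the combination. Every cryptographic piece here is a deliberate stand‐in: a hash‐derived keystream instead of a real cipher, and degree‐f polynomial evaluation over a prime field playing both roles (key hidden in the constant term with a random coefficient for the secret sharing; ciphertext halves as the coefficients for the erasure code). A real deployment would use proper cipher, secret‐sharing, and erasure‐code libraries.

```python
# Toy version of encrypt + share the key + erasure-code the ciphertext.
import os, hashlib

P = 2**127 - 1                 # prime field; all shared values must be < P
N, F = 4, 1                    # 4 clouds; recover from any f+1 = 2 pieces

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def keystream_xor(data: bytes, key: int) -> bytes:
    """Toy stream cipher: XOR with a hash-derived keystream (NOT real crypto)."""
    kb, out, ctr = key.to_bytes(15, "big"), bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(kb + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, out))

key = int.from_bytes(os.urandom(15), "big")
ciphertext = keystream_xor(b"secret business data", key)

# Secret sharing: key K is the constant term, higher coefficients are random,
# so any single share S_i = poly(i) reveals nothing about K.
coeffs = [key] + [int.from_bytes(os.urandom(15), "big") for _ in range(F)]
shares = [(x, poly_eval(coeffs, x)) for x in range(1, N + 1)]

# Erasure code with the same machinery: the two halves of the ciphertext are
# the coefficients (no randomness), so any 2 of the 4 fragments recover both.
half = (len(ciphertext) + 1) // 2
blocks = [int.from_bytes(ciphertext[:half], "big"),
          int.from_bytes(ciphertext[half:], "big")]
fragments = [(x, poly_eval(blocks, x)) for x in range(1, N + 1)]
# Cloud i stores the pair (F_i, S_i) = (fragments[i], shares[i]).

def recover_two(p1, p2):
    """Solve a degree-1 polynomial mod P from two evaluation points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y1 - y2) * pow(x1 - x2, -1, P) % P
    return (y1 - slope * x1) % P, slope

# Reading: any f+1 = 2 clouds suffice (here clouds 1 and 3, then 2 and 4).
recovered_key, _ = recover_two(shares[0], shares[2])
b0, b1 = recover_two(fragments[1], fragments[3])
recovered = keystream_xor(b0.to_bytes(half, "big") +
                          b1.to_bytes(len(ciphertext) - half, "big"),
                          recovered_key)
assert recovered == b"secret business data"
```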


DepSky latency

100 KB files, clients on PlanetLab nodes

DepSky read latency is close to that of the cloud with the best latency

DepSky write latency is close to that of the cloud with the worst latency

Lessons from DepSky

  • Provides: availability, integrity, disaster‐tolerance, no vendor lock‐in, confidentiality

  • Insights:

– Some clouds can be faulty, so we need Byzantine quorum system protocols (to reason about subsets of clouds)
– Signed data allows reading from a single cloud, so faster or cheaper than the average
– Erasure codes can reduce the size of the data stored
– Secret sharing can be used to store cryptographic keys in clouds (avoiding the need for a key distribution service)


Processing – BFT MapReduce

What is MapReduce?

  • Programming model + execution environment

– Introduced by Google in 2004
– Used for processing large data sets in clusters of servers

  • Hadoop MapReduce, an open‐source MapReduce

– The most used, the one we have been using
– Includes HDFS, a file system for large files


MapReduce basic idea

[Diagram: map tasks and reduce tasks running on sets of servers.]
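For concreteness, the model in a few lines: an in‐process imitation of map, group‐by‐key, reduce (the programming model only, not the Hadoop API).

```python
# Word count in the MapReduce style: map emits (key, value) pairs, the
# framework groups values by key, and reduce aggregates each group.
from collections import defaultdict

def map_fn(split: str):
    for word in split.split():
        yield word, 1

def reduce_fn(word: str, counts: list):
    return word, sum(counts)

splits = ["the cloud of clouds", "the clouds"]   # one input split per server
groups = defaultdict(list)
for split in splits:                             # map phase (parallel in MR)
    for key, value in map_fn(split):
        groups[key].append(value)
result = dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase
print(result)   # {'the': 2, 'cloud': 1, 'of': 1, 'clouds': 2}
```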

Job submission and execution

[Diagram: a client submits a job; the job tracker starts and monitors the map/reduce tasks on the task trackers.]


The problem

  • The original Hadoop MR tolerates the most common faults

– The job tracker detects and recovers crashed/stalled map/reduce tasks
– Detects corrupted files (a hash is stored with each block)

  • But execution can be corrupted: tasks can return wrong output
  • and clouds can suffer outages


BFT MapReduce

  • Basic idea: replicate tasks in different clouds and vote on the results returned by the replicas

– Inputs initially stored in all clouds

[Diagram: the same job running replicated across clouds 1–4.]


Original MR – Map perspective

[Diagram: each map task executes once.]

BFT MR – Map perspective

[Diagram: each map task has replicas in different clouds, and a vote is taken over the replicated outputs.]


Original MR – Reduce perspective

[Diagram: each reduce task executes once over the map outputs.]

BFT MR – Reduce perspective

[Diagram: each reduce task has replicas in different clouds, and a vote is taken over the replicated outputs.]


Deferred execution

  • Faults are uncommon; consider a maximum of f faults
  • The JT creates only f+1 replicas in f+1 clouds (f clouds in standby)
  • If results differ or one cloud stops, request 1 more replica (up to f more)
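A sketch of that loop for a single task, under stated assumptions: run_in_cloud is a hypothetical stand‐in for dispatching a task replica to a cloud, and the vote is over output digests.

```python
# Deferred execution for one task: start f+1 replicas, launch more (one at
# a time, up to f extra) only on disagreement, accept f+1 matching digests.
import hashlib
from collections import Counter

F = 1
clouds = ["cloud-A", "cloud-B", "cloud-C"]       # 2f+1 clouds available

def run_in_cloud(cloud: str, task: str) -> str:
    output = f"result-of-{task}"                 # deterministic task output
    return hashlib.sha256(output.encode()).hexdigest()

def execute(task: str) -> str:
    digests = [run_in_cloud(c, task) for c in clouds[:F + 1]]
    for spare in clouds[F + 1:]:                 # standby clouds
        digest, votes = Counter(digests).most_common(1)[0]
        if votes >= F + 1:                       # f+1 equal outputs: accept
            return digest
        digests.append(run_in_cloud(spare, task))
    digest, votes = Counter(digests).most_common(1)[0]
    assert votes >= F + 1, "more than f faulty clouds"
    return digest

print(execute("wordcount-split-17"))
```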

Distributed job tracker

  • The job tracker controls all task executions in the task trackers (e.g., start task, detect faults)

– If the job tracker is in one cloud, it is separated from many task trackers by the Internet:

  • high latency to control operations
  • single point of failure

  • Distributed job tracker

– Each cloud has one job tracker (JT)
– Each JT controls the tasks in its cloud, no “remote control”


WAN communication

  • All this communication goes through the WAN => high delay and $ cost

– The data transferred per pair can even be the size of the split (e.g., MBs)

Solution: digest communication

  • Reduce tasks fetch the map task outputs:

– Intra‐cloud fetch: the output is fetched normally
– Inter‐cloud fetch: only a hash of the output is fetched
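A sketch of the distinction, with fetch_full and fetch_digest as hypothetical stand‐ins for the shuffle transfers.

```python
# Digest communication: a reduce task fetches the full map output only from
# its own cloud and compares mere hashes with the replicas in other clouds.
import hashlib

map_outputs = {"cloud-A": b"k1:3 k2:5", "cloud-B": b"k1:3 k2:5"}  # replicas

def fetch_full(cloud: str) -> bytes:    # intra-cloud: cheap and fast
    return map_outputs[cloud]

def fetch_digest(cloud: str) -> str:    # inter-cloud: only a hash on the WAN
    return hashlib.sha256(map_outputs[cloud]).hexdigest()

local = fetch_full("cloud-A")
if hashlib.sha256(local).hexdigest() == fetch_digest("cloud-B"):
    print("replicas agree: reduce proceeds on the local copy")
else:
    print("mismatch: fetch the full output from another cloud and re-vote")
```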

Makespan varying parallelism

[Plot: makespan (s) vs. maximum number of map/reduce tasks executed in parallel (10–50), for the original MR and for BFT‐MR; BFT‐MR’s makespan is roughly 2× the original’s.]

  • 2× is good: 2× the tasks with fixed resources => the communication delay has low impact
  • Estimated analytically based on another BFT‐MR we implemented; f=1

Lessons from BFT MapReduce

  • Provides: availability, integrity, disaster‐tolerance, no vendor lock‐in (no confidentiality)

  • Insights:

– Tasks can be replicated in different clouds to mask faulty executions / faulty clouds
– Defer execution to reduce the number of tasks executed when there are no faults
– Control components should be placed in all clouds to avoid control operations between clouds (high delays)
– Send only digests between clouds to avoid huge communication delays and costs


Services – EBAWA

State Machine Replication (SMR)

  • Can be used to replicate “any” service

– Ex: file systems, key‐value stores, DBs, authentication services, coordination services, …
– All replicas start in the same state
– All replicas execute the same requests in the same order

[Diagram: clients issue requests; a total order multicast orders the requests; the servers/replicas implement the service.]

  • Fault‐tolerant because operation does not depend on all replicas: f can be faulty
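The SMR invariant in a few lines of Python; the hard part, the total order multicast that produces the agreed request log (PBFT, EBAWA, …), is elided here.

```python
# Replicas that start in the same state and execute the same deterministic
# requests in the same order end in the same state, so f wrong replies are
# outvoted by the correct majority.
class KVReplica:
    def __init__(self):
        self.store = {}                     # same initial state everywhere

    def execute(self, request):             # deterministic operations only
        op, key, *args = request
        if op == "put":
            self.store[key] = args[0]
        return self.store.get(key)

ordered_log = [("put", "x", 1), ("put", "y", 2), ("get", "x")]  # agreed order
replicas = [KVReplica() for _ in range(4)]  # e.g., n = 3f+1 = 4 for PBFT
for request in ordered_log:
    replies = [r.execute(request) for r in replicas]
assert all(r.store == replicas[0].store for r in replicas)
print("client takes the majority reply:", replies)
```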


BFT SMR is expensive in WANs

  • Example: PBFT (Castro & Liskov ’99)

– Several communication steps, messages, votes
– OK for LANs, but if the steps go through a WAN…

[Diagram: PBFT message pattern with f=1 and 4 replicas: the client’s request reaches the primary (replica 1), followed by pre‐prepare, prepare, and commit rounds among replicas 1–4, then the replies to the client.]

EBAWA

Efficient Byzantine Algorithm for Wide Area networks

  • EBAWA is a BFT SMR algorithm like PBFT…
  • …but with a set of mechanisms for making it efficient in WANs…
  • …which make it adequate for clouds‐of‐clouds


Unique Sequential Identifier Generator service (USIG)

  • Replicas include a trusted module: USIG

– Local module, implemented to be trusted (e.g., in hardware), with a simple interface
– Simple: a monotonic counter + a cryptographic mechanism

  • Interface:

– createUI: assigns a signed unique identifier to a message m
– verifyUI: checks if the unique identifier is valid for message m
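A toy USIG built from a monotonic counter plus an HMAC; a real USIG is isolated in trusted hardware, and the shared key below merely stands in for whatever mechanism lets the trusted modules of other replicas verify identifiers.

```python
# Toy USIG: createUI binds a never-repeating counter value to a message;
# verifyUI checks the binding. Code outside the module cannot make it
# assign the same counter value to two different messages.
import hmac, hashlib

class USIG:
    def __init__(self, key: bytes):
        self._key, self._counter = key, 0

    def create_ui(self, message: bytes):
        self._counter += 1                  # monotonic, never reused
        tag = hmac.new(self._key, message + self._counter.to_bytes(8, "big"),
                       hashlib.sha256).hexdigest()
        return self._counter, tag

    def verify_ui(self, message: bytes, counter: int, tag: str) -> bool:
        expected = hmac.new(self._key, message + counter.to_bytes(8, "big"),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, tag)

usig = USIG(b"shared-by-trusted-modules")
counter, tag = usig.create_ui(b"PREPARE batch-7")
assert usig.verify_ui(b"PREPARE batch-7", counter, tag)
assert not usig.verify_ui(b"PREPARE batch-8", counter, tag)
```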

Benefits of USIG

  • USIG prevents certain kinds of faults/misbehavior

– Faulty replicas can’t send 2 messages with the same identifier

  • This allows cutting:

– The number of servers from 3f+1 to 2f+1
– The number of communication steps by one (lower latency)

  • Together they greatly reduce the number of messages:

[Diagram: EBAWA message pattern with f=1: request, prepare, commit, reply among 2f+1 = 3 replicas.]


Rotating primary

  • The primary only orders a batch of requests per view; then the next replica becomes the primary

– Prevents performance attacks (e.g., a faulty server slows down the service) – critical in WANs due to high timeouts
– Reduces latency, as the client can access the closest replica
– Provides load balancing

[Diagram: request/prepare/commit/reply with a rotating primary: one replica is the primary of views {0, 3, 6, …}, another of views {1, 4, 7, …}, and another of views {2, 5, 8, …}.]
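The rotation itself is just arithmetic on the view number; a minimal illustration with 2f+1 = 3 replicas.

```python
# Each replica is the primary of the views congruent to its index mod n,
# orders one batch, and hands over; no replica keeps the role long enough
# to mount a sustained performance attack.
N = 3                                   # n = 2f+1 replicas, f = 1

def primary(view: int) -> int:
    return view % N

for view in range(6):
    print(f"view {view}: replica {primary(view)} orders the next batch")
```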

Asynchronous views

  • A replica starts an agreement as soon as it receives a client request, by sending a prepare message

– Servers without pending client requests skip their turn by sending a special message


Performance in PlanetLab – Europe

  • Nodes:

– Portugal (client)
– France
– Italy
– Germany (primary)
– Spain

  • EBAWA’s average latency is 43% lower than JPBFT’s

Lessons from EBAWA

  • Provides: availability, integrity, disaster‐tolerance, no vendor lock‐in (no confidentiality)

  • Insights:

– Reducing the communication steps (with the USIG) reduces the latency
– Reducing the number of replicas (with the USIG) reduces costs
– Rotating the primary prevents performance attacks, provides load balancing, and lets the client access the closest replica (reducing latency)
– Asynchronous views reduce waiting, thus latency
– Waiting for n−f clouds allows disregarding the f with the highest RTT


Conclusions

Lessons learned

  • Clouds‐of‐clouds: a solution for consumers to create dependable & secure clouds on top of existing cloud offerings

– We’ve seen clouds‐of‐clouds for: storage, processing, services

  • Usable, or latency/cost too high?

– Latency: if we disregard processing delays, the latency is a few RTTs, but the same holds for “normal” clouds (e.g., a minimum of 2 RTTs for an HTTP request)
– Cost: higher, but dependability & security aren’t free


Lessons learned

  • Important design goals:

– to reduce the number of communication steps
– to reduce the data sent out of the individual clouds
– to reduce the number of messages
– to reduce the size of the data stored
– to reduce the number of replicas
– to do control locally in every cloud

  • We’ve seen several mechanisms to tackle these goals:

– Byzantine quorum system protocols; self‐verifiable (signed) files; erasure codes; task replication; deferred task execution; local control components; digest communication between clouds; the USIG service; rotating primary; asynchronous views; waiting for n−f replicas

Thank you!

Further reading:

  • My paper in DIHC 2013’s post‐proceedings
  • A. N. Bessani et al. DepSky: Dependable and Secure Storage in a Cloud‐of‐Clouds. ACM Transactions on Storage, to appear (also at EuroSys 2011)
  • M. Correia et al. On the Feasibility of Byzantine Fault‐Tolerant MapReduce in Clouds‐of‐Clouds. In Proc. DISCCO 2012
  • G. S. Veronese et al. EBAWA: Efficient Byzantine Agreement for Wide‐Area Networks. In Proc. HASE 2010

All available at: http://homepages.gsd.inesc-id.pt/~mpc/