STYX: Stream Processing with Trustworthy Cloud-based Execution



slide-1
SLIDE 1

STYX: Stream Processing with Trustworthy Cloud-based Execution

Julian Stephen, Savvas Savvides, Vinaitheerthan Sundaram, Masoud Saeida Ardekani, Patrick Eugster October 6, 2016

Purdue University

slide-2
SLIDE 2

Table of contents

  • 1. Overview
  • 2. Ensuring confidentiality in the cloud
  • 3. Challenges in encrypted stream processing
  • 4. Architecture
  • 5. STYX abstractions and key update
  • 6. Evaluation
  • 7. Related work and conclusion


slide-3
SLIDE 3

Overview

slide-4
SLIDE 4

Introduction

Compute clouds

  • Data analytics platforms
  • Cost-efficiency, 'on-demand' compute, low infrastructure setup cost

IoT

  • 26 billion smart devices connected to a network by 2020
  • Fine-grained user behavior tracking to capture, personalize and/or monetize user experience

Stream processing

  • Analytics on real-time streaming data (continuous queries)
  • Many systems over the last few years: Apache Storm, Apache Spark, Apache Flink, Apache Samza, Amazon Kinesis

slide-10
SLIDE 10

Vulnerabilities - 1

[Figure: diagram with labels A, f(A), A]

slide-11
SLIDE 11

Vulnerabilities - 2

Real problems


slide-12
SLIDE 12

Ensuring confidentiality in the cloud

slide-17
SLIDE 17

Confidentiality in the cloud

Fully homomorphic encryption (FHE)

  • Allows arbitrary computation on encrypted data

Plaintext: A → f → f(A). Encrypted: with key k, E(A, k) → f′ → f′(E(A, k)), and D(f′(E(A, k))) = f(A).

slide-19
SLIDE 19

Confidentiality in the cloud

Fully homomorphic encryption (FHE)

  • Allows arbitrary computation on encrypted data
  • Prohibitive overhead

Partially homomorphic encryption (PHE)

  • Allows certain operations to be performed over encrypted text
  • AHE: D(E(x1) ψ E(x2)) = x1 + x2, where ψ is the operation applied to the ciphertexts
  • Schemes: AHE (additive), MHE (multiplicative), OPE (order-preserving), DET (deterministic)

Conjecture: many data analytics jobs can be performed securely using a combination of partially homomorphic encryption schemes
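The AHE identity above can be tried out with a toy Paillier cryptosystem, where ψ is multiplication modulo n². This is a minimal sketch for illustration only: the class name `PaillierSketch`, the 512-bit key size, and the choice g = n + 1 are assumptions, not details taken from the slides.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy Paillier cryptosystem: multiplying ciphertexts adds the plaintexts. */
class PaillierSketch {
    final BigInteger n;              // public modulus n = p * q
    final BigInteger nSquared;       // ciphertexts live modulo n^2
    final BigInteger g;              // generator, here g = n + 1
    private final BigInteger lambda; // private: lcm(p - 1, q - 1)
    private final BigInteger mu;     // private: lambda^-1 mod n (valid for g = n + 1)
    private final SecureRandom rng = new SecureRandom();

    PaillierSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, rng);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(bits / 2, rng);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger p1 = p.subtract(BigInteger.ONE);
        BigInteger q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));
        mu = lambda.modInverse(n);
    }

    /** E(m) = g^m * r^n mod n^2 for a random r coprime to n; requires 0 <= m < n. */
    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), rng);
        } while (r.signum() == 0 || r.compareTo(n) >= 0
                || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** D(c) = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n. */
    BigInteger decrypt(BigInteger c) {
        BigInteger l = c.modPow(lambda, nSquared).subtract(BigInteger.ONE).divide(n);
        return l.multiply(mu).mod(n);
    }

    /** The ciphertext operation ψ: multiplication modulo n^2. */
    BigInteger add(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    public static void main(String[] args) {
        PaillierSketch ph = new PaillierSketch(512);
        BigInteger c1 = ph.encrypt(BigInteger.valueOf(17));
        BigInteger c2 = ph.encrypt(BigInteger.valueOf(25));
        System.out.println(ph.decrypt(ph.add(c1, c2))); // prints 42
    }
}
```

The party that calls `add` needs only the public modulus, which is why an untrusted node can aggregate values it cannot read.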


slide-24
SLIDE 24

Vulnerabilities - 3

[Figure: A → E(A, k) is sent to the cloud, which computes f′(E(A, k)); E(A, k) without k does not yield A]

slide-25
SLIDE 25

Challenges in encrypted stream processing

slide-26
SLIDE 26

Challenges in encrypted stream processing

  • Programmer effort
  • Need to identify encryption scheme for each input data stream, perform cryptographic equivalent of required operation
  • if (Stream1.f1 < 100) return; else ...
  • sum = sum + Stream2.f3;

slide-27
SLIDE 27

Challenges in encrypted stream processing

  • Key change
  • PHE requires all tuples in an aggregate function to be encrypted with the same key

[Figure: stream of tuples A through G]

slide-32
SLIDE 32

Challenges in encrypted stream processing

  • Programmer effort
  • Need to identify encryption scheme for each input data stream, perform cryptographic equivalent of required operation
  • Key change
  • PHE requires all tuples in an aggregate function to be encrypted with the same key
  • Deployment optimizations
  • Deployment parameters specified for plaintext data may not be optimal when computation happens on encrypted data
  • Limitations of PHE
  • PHE may not support a sequence of operations, requiring trusted nodes to perform remaining computation
  • Constants and initialization
  • Variables must be initialized using the encrypted value of the initialization constant
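The point about constants and initialization can be made concrete with an additively homomorphic sum. The sketch below assumes a toy Paillier scheme with tiny fixed primes (p = 61, q = 53, chosen only so the run is reproducible; they offer no security), and the class and method names are hypothetical. It compares an accumulator that starts from E(0) with one that starts from plaintext 0.

```java
import java.math.BigInteger;

/** Why an encrypted accumulator must start from E(0), not plaintext 0 (toy Paillier). */
class EncryptedInitSketch {
    // Assumed toy parameters: tiny fixed primes keep the demo deterministic.
    static final BigInteger P = BigInteger.valueOf(61);
    static final BigInteger Q = BigInteger.valueOf(53);
    static final BigInteger N = P.multiply(Q);
    static final BigInteger N2 = N.multiply(N);
    static final BigInteger LAMBDA = BigInteger.valueOf(780); // lcm(60, 52)
    static final BigInteger MU = LAMBDA.modInverse(N);

    /** E(m) = (1 + n)^m * r^n mod n^2; r is passed in so the output is reproducible. */
    static BigInteger encrypt(long m, long r) {
        BigInteger g = N.add(BigInteger.ONE);
        return g.modPow(BigInteger.valueOf(m), N2)
                .multiply(BigInteger.valueOf(r).modPow(N, N2))
                .mod(N2);
    }

    static BigInteger decrypt(BigInteger c) {
        BigInteger l = c.modPow(LAMBDA, N2).subtract(BigInteger.ONE).divide(N);
        return l.multiply(MU).mod(N);
    }

    /** Homomorphic sum: multiply every ciphertext into the accumulator modulo n^2. */
    static BigInteger sum(BigInteger init, BigInteger[] encryptedReadings) {
        BigInteger acc = init;
        for (BigInteger c : encryptedReadings) {
            acc = acc.multiply(c).mod(N2);
        }
        return acc;
    }

    public static void main(String[] args) {
        BigInteger[] readings = { encrypt(5, 2), encrypt(7, 3), encrypt(30, 4) };

        // Correct: the accumulator starts from an encryption of the constant 0.
        BigInteger good = sum(encrypt(0, 5), readings);
        System.out.println(decrypt(good)); // prints 42

        // Incorrect: plaintext 0 zeroes the product, so the readings are lost.
        BigInteger bad = sum(BigInteger.ZERO, readings);
        System.out.println(decrypt(bad).equals(BigInteger.valueOf(42))); // prints false
    }
}
```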


slide-33
SLIDE 33

Architecture

slide-34
SLIDE 34

STYX architecture

[Figure: STYX architecture. Components: Program (STYX API, annotations), Homomorphism analysis, Analytical model, Topology scheduling, Topology execution; split between a trusted tier and the untrusted cloud]

Execution flow

  • User submits program written using system (STYX) API
  • Homomorphism analysis identifies crypto systems required to execute the graph
  • Analytical model identifies deployment profile
  • Scheduler assigns tasks to nodes
  • Runtime executes tasks
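The slides do not say how the homomorphism analysis step is implemented. As a rough illustration of the idea only, the sketch below maps each operation applied to a stream field to the PHE scheme that supports it. The `FieldUse` model, the `Op` enum, and the operation-to-scheme table are assumptions based on the usual meaning of AHE, MHE, OPE and DET.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of scheme inference: which PHE scheme does each field need? */
class HomomorphismAnalysisSketch {
    enum Op { ADD, MULTIPLY, COMPARE, EQUALS }
    enum Scheme { AHE, MHE, OPE, DET }

    /** One use of a stream field by an operation in the program graph. */
    static final class FieldUse {
        final String field;
        final Op op;
        FieldUse(String field, Op op) { this.field = field; this.op = op; }
    }

    /** Assumed mapping from an operation to the scheme that supports it. */
    static Scheme schemeFor(Op op) {
        switch (op) {
            case ADD: return Scheme.AHE;      // sums over ciphertexts
            case MULTIPLY: return Scheme.MHE; // products over ciphertexts
            case COMPARE: return Scheme.OPE;  // order comparisons
            default: return Scheme.DET;       // equality checks
        }
    }

    /** Collects, per field, every scheme required by the operations applied to it. */
    static Map<String, Set<Scheme>> analyze(List<FieldUse> uses) {
        Map<String, Set<Scheme>> required = new TreeMap<>();
        for (FieldUse u : uses) {
            required.computeIfAbsent(u.field, k -> EnumSet.noneOf(Scheme.class))
                    .add(schemeFor(u.op));
        }
        return required;
    }

    public static void main(String[] args) {
        // The two statements from the "Programmer effort" slide:
        //   if (Stream1.f1 < 100) ...   and   sum = sum + Stream2.f3;
        List<FieldUse> uses = List.of(
                new FieldUse("Stream1.f1", Op.COMPARE),
                new FieldUse("Stream2.f3", Op.ADD));
        System.out.println(analyze(uses)); // prints {Stream1.f1=[OPE], Stream2.f3=[AHE]}
    }
}
```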


slide-35
SLIDE 35

STYX abstractions and key update

slide-36
SLIDE 36

STYX abstraction

Group sum in a sliding window

/** Track sum of values per group per time slot */
public class SlotBasedSum<T> {
    ...
    public void updateSum(T group, int slot, SecField val) {
        SecField[] sums = objGroupSum.get(group);
        if (sums == null) {
            // First tuple for this group: allocate and initialize one sum per slot
            sums = new SecField[this.numSlots];
            init(sums, val);
            objGroupSum.put(group, sums);
        }
        // Addition over encrypted fields goes through SecureOper
        sums[slot] = SecureOper.add(sums[slot], val);
    }
}

slide-38
SLIDE 38

Without STYX abstractions

Group sum in a sliding window (Storm)

import java.math.BigInteger;

public class SlotBasedSum<T> {
    BigInteger publicKey = readPubKey();

    public void updateSum(T group, int slot, BigInteger value) {
        BigInteger[] sums = objGroupSum.get(group);
        if (sums == null) {
            sums = new BigInteger[this.numSlots];
            init(sums, "AHE");
            objGroupSum.put(group, sums);
        }
        // Additive homomorphism by hand: multiply ciphertexts modulo publicKey^2
        sums[slot] = sums[slot].multiply(value)
                .mod(publicKey.multiply(publicKey));
    }
}

slide-41
SLIDE 41

Key change

Challenges

  • Functions that aggregate data over a sliding window make it impossible to change the encryption key without disrupting output

Problem

[Figure: stream of tuples A through G]

slide-45
SLIDE 45

Key change

Challenges

  • Continuous queries that aggregate data over a sliding window make it impossible to change the encryption key without disrupting output

Solution

[Figure: stream of tuples A through G in which D and E appear twice]
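The slides show the solution only as a figure in which some tuples appear twice. One plausible reading, offered here as an assumption rather than as the STYX mechanism, is that tuples are sent under both the old and the new key for one window length, so the new-key window is already full when the operator switches keys. In the sketch below, plaintext values tagged with a key epoch stand in for ciphertexts, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sliding-window sum that survives a key change via a duplicated overlap (assumed design). */
class KeyChangeSketch {
    private final int windowSize;
    private int activeEpoch;
    private final Deque<Long> active = new ArrayDeque<>();  // window under the key in use
    private final Deque<Long> pending = new ArrayDeque<>(); // window filling under the next key

    KeyChangeSketch(int windowSize, int firstEpoch) {
        this.windowSize = windowSize;
        this.activeEpoch = firstEpoch;
    }

    /** Feeds one tuple; returns the window sum if the tuple produces output, else null. */
    Long accept(int keyEpoch, long value) {
        if (keyEpoch == activeEpoch) {
            push(active, value);
            return sum(active);
        }
        // Duplicate under the next key: fill the pending window without emitting output.
        push(pending, value);
        if (pending.size() == windowSize) {
            // The new-key window is complete, so switching keys loses no history.
            active.clear();
            active.addAll(pending);
            pending.clear();
            activeEpoch = keyEpoch;
        }
        return null;
    }

    private void push(Deque<Long> window, long value) {
        window.addLast(value);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
    }

    private static long sum(Deque<Long> window) {
        long s = 0;
        for (long v : window) {
            s += v;
        }
        return s;
    }

    /** Runs a stream of {keyEpoch, value} pairs and collects the emitted sums. */
    static List<Long> run(int windowSize, int[][] stream) {
        KeyChangeSketch agg = new KeyChangeSketch(windowSize, stream[0][0]);
        List<Long> out = new ArrayList<>();
        for (int[] t : stream) {
            Long s = agg.accept(t[0], t[1]);
            if (s != null) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values 4, 5, 6 are sent under both epoch 0 and epoch 1 (the overlap).
        int[][] stream = {
            {0, 1}, {0, 2}, {0, 3}, {0, 4}, {1, 4}, {0, 5},
            {1, 5}, {0, 6}, {1, 6}, {1, 7}, {1, 8}
        };
        System.out.println(run(3, stream)); // prints [1, 3, 6, 9, 12, 15, 18, 21]
    }
}
```

The emitted sums match what a window of size 3 would produce over the values 1 to 8 with no key change, so the output is not disrupted.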


slide-50
SLIDE 50

Evaluation

slide-51
SLIDE 51

Evaluation - 1

IoT Bench

  • Smart meter data over a 24-hour period at the rate of 1 reading per minute from 443 unique homes, totaling 637,526 records
  • 4 EC2 m3.large nodes
  • Each tuple comprises a timestamp, meter ID, and meter reading

[Figure: throughput (#tuples/sec) of STYX and Storm for queries Q1 to Q6; axis ticks from 2000 to 18000]

slide-52
SLIDE 52

Evaluation - 2

Performance when keys change

  • New York taxi route data (10 GB)
  • Application finds the top 10 most frequent routes during the last 30 minutes of taxi servicing
  • 9 EC2 m3.large nodes

[Figure: response time (ms) over time (s) while keys change]

slide-53
SLIDE 53

Related work and conclusion

slide-54
SLIDE 54

Related work

Prior work on encrypted computation

  • [Popa et al.; SOSP’11]
  • [Gentry; STOC’09]
  • [Stephen et al.; VLDB’13]

Other approaches

  • Using secure processors (e.g., Intel SGX)

slide-55
SLIDE 55

Conclusion

  • Confidentiality breaches pose a serious threat to adoption and utilization of cloud resources
  • PHE has proven to be effective for various batch workloads
  • STYX leverages PHE for stream data analytics in the cloud
  • Makes it easier for programmers to use PHE
  • Automatically translates the program and optimizes deployment parameters

slide-56
SLIDE 56

Questions
